Hacker News

Moral of the post: don't do premature optimization. It's a common adage, but this is a good reminder and a concrete example of it.

Aside from one case, where OP argued that the queries were so rare compared to the data updates that maintaining the index was more expensive than the scans it saved. Which is also a pretty classic point when you're taught about indexes.

What I recently learned the hard way about indexes is that they're slow when you have to read "large" chunks of a table to find an answer, in my case computing a median for an area. I'm indexing geospatial data, and Europe+NA are overrepresented. When you view an area the size of ~Germany, the query would take 20 minutes and compute the median over ~3% of all records, whereas a full table scan (whole world) took something like 40 seconds, with the median being computed over the same 3%; it would just evaluate the WHERE clause against every row instead of finding rows to include via the index. That's the power of a sequential read compared to walking an R-tree.

I haven't had such bad experiences with B-tree indexes, though I'm not sure whether they would behave just as badly here. On the other hand, if you ask for an area the size of Amsterdam, the index is normally fast, and for <50 rows it's basically instant, whereas the full table scan would still take the same 40 seconds.
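To make the shape of that query concrete, here's a rough sketch of the two plans in MariaDB-style SQL. The table, column, and index names are invented for illustration (not the actual schema), and the bounding box is only an approximation of Germany:

```sql
-- Hypothetical schema: one row per data point, with an index
-- on (lat, lon). All identifiers here are made up.

-- Index path: for a Germany-sized box (~3% of all rows) the
-- planner walks the index, then fetches each matching row
-- individually: lots of random I/O on unclustered data.
SELECT DISTINCT MEDIAN(value) OVER ()
FROM points
WHERE lat BETWEEN 47.3 AND 55.1
  AND lon BETWEEN  5.9 AND 15.0;

-- Full-scan path: read the whole table sequentially and let the
-- WHERE clause filter every row. In MariaDB you can force this
-- with an index hint on the (hypothetical) index:
SELECT DISTINCT MEDIAN(value) OVER ()
FROM points IGNORE INDEX (points_latlon_idx)
WHERE lat BETWEEN 47.3 AND 55.1
  AND lon BETWEEN  5.9 AND 15.0;
```

The sequential plan wins once the matching fraction is large enough that per-row random fetches cost more than streaming the whole table.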



The cause of this is likely the N+1-like behaviour of indexes on unclustered data. Two options for speeding this up a ton (they don't make sense to use together):

1. Cluster your data using the GiST/R-tree index. The PostGIS docs [1] have a great explanation.

2. Use the R-tree as a covering index, i.e. add your associated data as extra dimensions in the index. The GiST index becomes (lat, long, weather_c, avg_altitude), etc. This avoids having to load a separate heap page for each row matched in the spatial index [2].

[1]: https://postgis.net/workshops/postgis-intro/clusterindex.htm...

[2]: https://www.postgresql.org/docs/current/indexes-index-only-s...
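For reference, the two options might look roughly like this in PostgreSQL/PostGIS syntax. The table and column names are illustrative (borrowed from the example above), and INCLUDE columns on GiST indexes require PostgreSQL 12+:

```sql
-- Option 1: physically reorder the table to match the spatial
-- index, so rows that are close in space are close on disk.
-- CLUSTER is a one-time rewrite; it is not maintained as new
-- rows arrive, so it suits mostly-static data.
CREATE INDEX measurements_geom_idx
  ON measurements USING GIST (geom);
CLUSTER measurements USING measurements_geom_idx;

-- Option 2: make the index covering, so index-only scans can
-- answer the query without touching the heap at all.
CREATE INDEX measurements_covering_idx
  ON measurements USING GIST (geom)
  INCLUDE (weather_c, avg_altitude);
```

Option 1 keeps the index small but pays a periodic rewrite; option 2 fattens the index but keeps it useful for freshly inserted rows.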


Thanks! I don't think MariaDB offers the former, but it's good to know this exists, and I may want to look elsewhere. This isn't the only thing I ran into with MariaDB, but I'm also hesitant to commit to maintaining another service into eternity when the existing database has done fine for everything (more than a decade of projects) up until this project. If I had to redo it, I'd definitely try this out to see if it's worth it! And at work it's a different equation anyway. So it's useful info :)

The latter I considered, but I didn't think it would cut the time by more than half, whereas it would need an order of magnitude or more of a speedup to be useful. Doing a full table scan and caching the result for certain zoom levels was the solution for me.


Adding an index is an incredibly expensive task on a dataset so big that it needs one to be added.

It's something that is likely to require massive engineering effort on a live system.

Removing an index, on the other hand, is never an issue.

It's not at all premature optimization, it's simply basic software design.


Read OP's post please. They're talking about systems where it was acceptable to always do full table scans, in one case on a table of 350 rows.

Not "an incredibly expensive task on a dataset so big that it needs one". That's why I read OP's post as recommending against premature optimization (without using those words literally).



