We sped up time series by 20-30x (rerun.io)
165 points by Tycho87 on Feb 14, 2024 | 19 comments


Disclosure: I know one of the founders, so obviously I'm a bit biased.

That being said, the engineering of Rerun fills me with joy. This thing is just crazy fast and really usable. Absolute gem.

Btw, @Rerun folks, you can update your README on GitHub and remove the time-series shortcoming now :). Glad to see that landed!


egui is an absolute blessing and a great example of high-quality, healthy open source. I know a couple of people who use it for literally everything now.

On the topic of READMEs, though, their GitHub also still says that their beta will be publicly available on February 15th (I guess they don't mean tomorrow). In any case, I reckon they've got their priorities sorted.


egui author here - thanks for the kind words

Haha, yeah, we open sourced Rerun a year ago minus one day. I guess I'll need to update that readme


Thanks! Haha, yeah, definitely the downside of putting disclaimers everywhere.

There's another order of magnitude of performance left to squeeze out in theory, so I guess we can always leave it in until that's done too


Question.

With a 20x-30x improvement in performance I really have to ask: how did you manage to fuck up your code's design so badly that you've left so much performance on the table?


Did you read the post?


I did, and it was basic stuff. The entity-component-system datastore's entity hierarchy maps path components to data like 3-float vectors for points, and the optimizations are about basic locality and rendering. 3D programs like Houdini do this stuff and don't even call it a feature, let alone a database. Bragging about a 30x speedup is broadcasting to anyone knowledgeable that it was structured poorly initially.

It seems like the modern database blog meta is to brag about the basics like they are a new discovery.


The broadcasting I see is that they focused first on making something people liked and then focused on making it fast.


Doubling the speed is an optimization; room for a 30x speedup means something was very wrong. That makes it an optimization only relative to yourself. Even porting something from JavaScript to C++ should only give about a 10x speedup. If one hardware generation to the next were a 30x improvement, the first version would be considered completely broken.


Interesting approach. Recently (in SwiftUI) I did something similar using Ramer–Douglas–Peucker to retain only significant points. Not as sophisticated, but it took about half a day to implement. Render lag went from annoying to barely noticeable.


Yep, that trusty simple algorithm always works. I have some Grafana dashboards that plot a lot of 3D data (x, y, z lines coloured by some sensor readings) using a Plotly plugin. I can't pre-aggregate too much in the database or I'll lose too much detail, so I ended up adding a little ad-hoc JS implementation to reduce the number of points by a factor of ten on the fly for plotting. Works great.

I was just about to write down that it would be too difficult to do RDP simplification in the (ClickHouse) database, but then I recalled PostGIS has it built in, and lo and behold, there is also something in ClickHouse for this [0]. Back to the drawing board.

[0] https://clickhouse.com/codebrowser/ClickHouse/contrib/boost/...


(Rerun engineer here.) That's actually a pretty good idea for egui_plots itself (the lib we use to render the plots), as it would be of general usefulness. The aggregation described in the article is somewhat specific to time series.

I'm not super familiar with the Ramer–Douglas–Peucker algorithm itself, but I've used implementations of it, and from the looks of it its CPU cost would largely be offset by the savings in triangulation done by egui's renderer (also currently done on the CPU).



Cool! I don't know Ramer–Douglas–Peucker; how does that work?


It's a simplification algorithm for a series of points. Briefly: if you have a group of points lying relatively close to the straight line between the first and last point in the group, all of the middle points can be removed.

Given a first and last point, it finds the point furthest from the straight line connecting them. If that furthest point is above a minimum threshold distance from the line, it keeps it and recurses on the (first, furthest) and (furthest, last) sub-ranges; otherwise it drops everything in between.
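
In code it looks roughly like this (a minimal Rust sketch, since Rerun/egui are Rust projects; the function names and the tuple-based point type are my own, not from any particular library):

    // Ramer–Douglas–Peucker: always keep the endpoints; recursively keep the
    // point farthest from the first-last chord whenever it exceeds `epsilon`.
    fn rdp(points: &[(f64, f64)], epsilon: f64, out: &mut Vec<(f64, f64)>) {
        if points.len() < 3 {
            out.extend_from_slice(points);
            return;
        }
        let first = points[0];
        let last = points[points.len() - 1];
        // Find the interior point farthest from the chord.
        let mut max_dist = 0.0;
        let mut index = 0;
        for (i, &p) in points.iter().enumerate().take(points.len() - 1).skip(1) {
            let d = perp_dist(p, first, last);
            if d > max_dist {
                max_dist = d;
                index = i;
            }
        }
        if max_dist > epsilon {
            // Keep the farthest point and recurse on both halves.
            rdp(&points[..=index], epsilon, out);
            out.pop(); // the next call pushes points[index] again
            rdp(&points[index..], epsilon, out);
        } else {
            // Everything between first and last is within tolerance: drop it.
            out.push(first);
            out.push(last);
        }
    }

    // Perpendicular distance from `p` to the line through `a` and `b`.
    fn perp_dist(p: (f64, f64), a: (f64, f64), b: (f64, f64)) -> f64 {
        let (dx, dy) = (b.0 - a.0, b.1 - a.1);
        let len = dx.hypot(dy);
        if len == 0.0 {
            (p.0 - a.0).hypot(p.1 - a.1)
        } else {
            ((p.0 - a.0) * dy - (p.1 - a.1) * dx).abs() / len
        }
    }

With a small epsilon you keep almost everything; crank it up and long near-straight runs collapse to their endpoints, which is why it works so well for plot decimation.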


A similar idea would be to fit a linear model to a sequence of points and keep only the outliers, plus the two points at the ends of the segment.
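
Something like this, maybe (a minimal Rust sketch of that idea using a plain least-squares fit; the name keep_outliers and the threshold parameter are just illustrative):

    // Fit y = a*x + b by least squares, then keep the two endpoints plus any
    // point whose residual from the fitted line exceeds `threshold`.
    fn keep_outliers(points: &[(f64, f64)], threshold: f64) -> Vec<(f64, f64)> {
        let n = points.len();
        if n < 3 {
            return points.to_vec();
        }
        // Means of x and y.
        let (mut sx, mut sy) = (0.0, 0.0);
        for &(x, y) in points {
            sx += x;
            sy += y;
        }
        let (mx, my) = (sx / n as f64, sy / n as f64);
        // Slope and intercept from centered sums.
        let (mut sxx, mut sxy) = (0.0, 0.0);
        for &(x, y) in points {
            sxx += (x - mx) * (x - mx);
            sxy += (x - mx) * (y - my);
        }
        let a = if sxx == 0.0 { 0.0 } else { sxy / sxx };
        let b = my - a * mx;
        // Endpoints always survive; interior points only if they are outliers.
        let mut kept = vec![points[0]];
        for &(x, y) in &points[1..n - 1] {
            if (y - (a * x + b)).abs() > threshold {
                kept.push((x, y));
            }
        }
        kept.push(points[n - 1]);
        kept
    }

Unlike RDP this does a single global fit, so in practice you'd probably apply it per window or per segment rather than to a whole wiggly series at once.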


This is very cool. I'm going to give the DICOM viewer a shot and see how it looks. Project is seriously cool.


Not sure what the point of DICOM is... Is it only to make images proprietary?


Rerun is one of those projects I want to find an excuse to use more. So well thought out, so well engineered. Props to the engineers.



