We sped up time series by 20-30x (rerun.io)
165 points by Tycho87 on Feb 14, 2024 | 19 comments


Disclosure: I know one of the founders, so obviously I'm a bit biased.

That being said, the engineering of Rerun fills me with joy. This thing is just crazy fast and really usable. Absolute gem.

Btw, @Rerun folks, you can update your README on GitHub and remove the time-series shortcoming now :). Glad to see that landed!


egui is an absolute blessing and a great example of high-quality, healthy open source. I know a couple of people who use it for literally everything now.

On the topic of READMEs, though, their GitHub also still says that their beta will be publicly available on February 15th (I guess they don't mean tomorrow). In any case, I reckon they've got their priorities sorted.


egui author here - thanks for the kind words

Haha, yeah, we open sourced Rerun a year ago minus one day. I guess I'll need to update that readme


Thanks! Haha, yeah, definitely the downside of putting disclaimers everywhere.

There's another order of magnitude of performance left to squeeze out in theory, so I guess we can always leave it in until that's done too


Question.

With a 20x-30x improvement in performance I really have to ask: how did you manage to fuck up your code's design so badly that you've left so much performance on the table?


Did you read the post?


I did, and it was basic stuff. The entity-component-system datastore's entity hierarchy maps path components to data like 3-float vectors for points, and the optimizations are about basic locality and rendering. 3D programs like Houdini do this stuff and don't even call it a feature, let alone a database. Bragging about a 30x speedup is broadcasting to anyone knowledgeable that it was structured poorly initially.

It seems like the modern database blog meta is to brag about the basics like they are a new discovery.


The broadcasting I see is that they focused first on making something people liked and then focused on making it fast.


Doubling the speed is an optimization; room for a 30x speedup means something was very wrong. That makes it an optimization only relative to yourself. Even porting something from JavaScript to C++ should only give about a 10x speedup. If one hardware generation to the next were a 30x improvement, the first version would be considered completely broken.


Interesting approach. Recently (in SwiftUI) I did something similar using Ramer–Douglas–Peucker to retain only significant points. Not as sophisticated, but it took about half a day to implement. Render lag went from annoying to barely noticeable.


Yep, that trusty simple algorithm always works. I have some Grafana dashboards that plot a lot of 3D data (x, y, z lines coloured by some sensor readings) using a Plotly plugin. I can't pre-aggregate too much in the database or I'll lose too much detail, so I ended up adding a little ad-hoc JS implementation to reduce the number of points by a factor of ten on the fly for plotting. Works great.

I was just about to write down that it would be too difficult to do RDP simplification in the (ClickHouse) database, but then I recalled PostGIS has it built in, and lo and behold, there is also something in ClickHouse for this [0]. Back to the drawing board.

[0] https://clickhouse.com/codebrowser/ClickHouse/contrib/boost/...


(Rerun engineer here.) That's actually a pretty good idea for egui_plots itself (the lib we use to render the plots), as it would be of general usefulness. The aggregation described in the article is somewhat specific to time series.

I'm not super familiar with the Ramer–Douglas–Peucker algorithm itself, but I've used implementations of it, and from the looks of it its CPU cost would largely be offset by the savings in triangulation done by egui's renderer (also currently done on the CPU).



Cool! I don't know Ramer–Douglas–Peucker; how does that work?


It's a simplification algorithm for a series of points. Briefly: if you have a group of points lying relatively close to the straight line between the first and last point in the group, all of the middle points can be removed.

Given a first and last point, it finds the point furthest from the straight line connecting them. If that furthest point is above a minimum threshold distance from the line, it keeps it and recurses on the (first, furthest) and (furthest, last) sub-ranges; otherwise it drops everything in between.
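
In code it looks roughly like this (a minimal Rust sketch, since Rerun/egui are Rust projects; the function names and the tuple-based point type are my own, not from any particular library):

    // Ramer–Douglas–Peucker: always keep the endpoints; recursively keep the
    // point farthest from the first-last chord whenever it exceeds `epsilon`.
    fn rdp(points: &[(f64, f64)], epsilon: f64, out: &mut Vec<(f64, f64)>) {
        if points.len() < 3 {
            out.extend_from_slice(points);
            return;
        }
        let first = points[0];
        let last = points[points.len() - 1];
        // Find the interior point farthest from the chord.
        let mut max_dist = 0.0;
        let mut index = 0;
        for (i, &p) in points.iter().enumerate().take(points.len() - 1).skip(1) {
            let d = perp_dist(p, first, last);
            if d > max_dist {
                max_dist = d;
                index = i;
            }
        }
        if max_dist > epsilon {
            // Keep the farthest point and recurse on both halves.
            rdp(&points[..=index], epsilon, out);
            out.pop(); // the next call pushes points[index] again
            rdp(&points[index..], epsilon, out);
        } else {
            // Everything between first and last is within tolerance: drop it.
            out.push(first);
            out.push(last);
        }
    }

    // Perpendicular distance from `p` to the line through `a` and `b`.
    fn perp_dist(p: (f64, f64), a: (f64, f64), b: (f64, f64)) -> f64 {
        let (dx, dy) = (b.0 - a.0, b.1 - a.1);
        let len = dx.hypot(dy);
        if len == 0.0 {
            (p.0 - a.0).hypot(p.1 - a.1)
        } else {
            ((p.0 - a.0) * dy - (p.1 - a.1) * dx).abs() / len
        }
    }

With a small epsilon you keep almost everything; crank it up and long near-straight runs collapse to their endpoints, which is why it works so well for plot decimation.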


A similar idea would be to fit a linear model to a sequence of points and keep only the outliers, plus the two points at the ends of the segment.
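
Something like this, maybe (a minimal Rust sketch of that idea using a plain least-squares fit; the name keep_outliers and the threshold parameter are just illustrative):

    // Fit y = a*x + b by least squares, then keep the two endpoints plus any
    // point whose residual from the fitted line exceeds `threshold`.
    fn keep_outliers(points: &[(f64, f64)], threshold: f64) -> Vec<(f64, f64)> {
        let n = points.len();
        if n < 3 {
            return points.to_vec();
        }
        // Means of x and y.
        let (mut sx, mut sy) = (0.0, 0.0);
        for &(x, y) in points {
            sx += x;
            sy += y;
        }
        let (mx, my) = (sx / n as f64, sy / n as f64);
        // Slope and intercept from centered sums.
        let (mut sxx, mut sxy) = (0.0, 0.0);
        for &(x, y) in points {
            sxx += (x - mx) * (x - mx);
            sxy += (x - mx) * (y - my);
        }
        let a = if sxx == 0.0 { 0.0 } else { sxy / sxx };
        let b = my - a * mx;
        // Endpoints always survive; interior points only if they are outliers.
        let mut kept = vec![points[0]];
        for &(x, y) in &points[1..n - 1] {
            if (y - (a * x + b)).abs() > threshold {
                kept.push((x, y));
            }
        }
        kept.push(points[n - 1]);
        kept
    }

Unlike RDP this does a single global fit, so in practice you'd probably apply it per window or per segment rather than to a whole wiggly series at once.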


This is very cool. I'm going to give the DICOM viewer a shot and see how it looks. Project is seriously cool.


Not sure what the point of DICOM is... Is it only to make images proprietary?


Rerun is one of those projects I want to find an excuse to use more. So well thought out, so well engineered. Props to the engineers.



