whilo's comments

whilo · 2025-04-12T02:22:24

As a German from Mannheim now living on the Canadian West Coast, I have to say that this is exactly the mindset that makes it so difficult to innovate in Germany. While people have a top education and know everything they need to, they don't have a "digital mindset". I now think of myself as a computing process and see myself as a cyberneticist in the German philosophical tradition of Hegel, Marx, Hilbert, Gödel and Bloch, but even I mistrust innovations more often than necessary.

Almost all the science fiction and cybernetic work of the last 75 years came either out of the Eastern Bloc or out of the US. There is basically no German sci-fi vision, and there is extreme reluctance toward the kind of speculative, big-picture thinking pursued in Silicon Valley. Software companies grow quickly and need a very different approach; Silicon Valley mostly understands this scaling aspect and winner-take-all markets really well. German investors are cheap and extremely risk-averse. I tried to build a software company in Germany, and it is hard.

While a lot of the current AI work was also done in Germany, e.g. by Schmidhuber, Germans are stuck in their business model. I recommend Münchau's book "Kaput" (or one of his podcast interviews) on how poorly Germans have adapted to the non-industrial aspects of a modern economy (read: "services"). I really hope that tech-founder thinking in the vein of Benz, Bosch or Siemens returns to Germany in a modern form. But I don't see it yet, and Germans are still very reactive and conservative toward larger changes. The Greens tried to think a bit outside the box and were heavily punished for it. In general, there is basically no political representation for building a new, successful economy. At best there is the nice little narrative about the long-established "Mittelstand", which has produced almost no software companies. The first step right now would be to own the idea of the EU and to want to win instead of complaining.

pmags · 2025-04-12T22:42:26

I'm not German and have only made a couple of short visits to Germany, so I have no basis on which to judge your statement that Germans don't have a digital mindset.

But if that is indeed true, I find it equally interesting that Germany has been an important center for the development of electronic music. Berlin in particular is "arguably the world capital of underground electronic music" (https://www.nytimes.com/2018/06/21/arts/music/women-djs-berl...)

331c8c71 · 2025-04-12T05:58:11

Hmmm, does Jürgen Schmidhuber live in Germany? I'd have thought primarily Ticino, where he has spent the majority of his career, if I understand correctly.

whilo · 2025-04-13T20:19:01

No, he lives in Switzerland now (Lugano, as far as I know). Switzerland is a big magnet for European talent, as is Germany to a lesser degree. My point, though, was that Germany was not lacking the thinkers to drive a technological revolution; it is rather the society that does not really have the mindset for radical change and forward thinking.

whilo · on July 25, 2020

Hey, one of the core architects of Datahike here. As a Clojure company we are also super happy that Nubank made Clojure and Datomic much more credible with this move. While Datomic is obviously much more mature, it is important to understand that our goals have a different scope than Datomic's. Datomic is mostly built as a more convenient backend database for corporate environments and is highly tailored towards AWS and a business environment where the costs of operating and depending on these cloud services are acceptable; that is the most profitable, but only a small, slice of the whole market. Even if Datomic gets open-sourced, it will not automatically be built with design goals other than these in mind.

Our goals, on the other hand, go even beyond this backend market: we really want to use Datalog as a distributed systems environment and extend Datahike to all endpoints, including the browser and IoT development:

https://www.youtube.com/watch?v=A2CZwOHOb6U

Our main intention was never to just build an open-source Datomic, but it made too much sense not to do it as a first step. In fact, we also really hope that Datomic will be open-sourced so that we can merge our efforts. But given the current governance model of Clojure and Datomic, we do not yet foresee that open-sourcing Datomic alone would address a large part of our plans. We are already ahead of Datomic in a few areas:

For instance, we have funded development of ClojureScript support, and unlike Datomic, all our efforts were aimed at this from the beginning. We provide a set of libraries and abstractions that can stand on their own and that you can compose in different ways, rather than a top-down design that we then unbundle into libraries. This made it much easier for us to evolve and reuse our stack despite our pivot from replikativ to Datahike.

Regarding maturity we have worked hard during quarantine to address some of our pain points:

1. We significantly improved our write throughput, and Datomic-level performance is within reach (close to release): https://github.com/replikativ/datahike/pull/201
2. We have a first version of our server API available and will extend it over the next months to provide Datomic-style local querying: https://github.com/replikativ/datahike-server/
3. We recently provided Java bindings: https://lambdaforge.io/2020/05/25/java-api.html

Over the last year we also built a cooperative that has more than 5 people working on this full-time and we aim to grow even faster next year and really bring Datalog to the community and the masses. If Datomic gets properly open-sourced, we will get there even faster.

synthc · on July 25, 2020

Thanks for clarifying. I agree that Datomic is expensive to run and that open-sourcing it would not change this overnight; a lightweight and cost-effective alternative would be great.

How will you handle security when accessing a Datahike backend from the browser? In the past I've used Datomic indirectly from the browser for internal tools, via a custom REST API that ran the queries, but for external access it was not clear how to limit the queries to the parts of the database the user had permission to view.

What are your plans for IoT development? I found that Datomic is not a good fit for timeseries data; does Datahike offer any advantages?

whilo · on July 25, 2020

Interesting. Honestly, we have not thought much about time series data yet, but I think we should be able to provide custom indices and extend Datalog with more efficient query primitives if this is necessary. Can you elaborate a bit? A few years ago I used HDF5 binary blobs in Datomic for tensors of experimental recordings (parameter evolution in spiking neural networks), and it is definitely possible to integrate external index data structures, but eventually the query engine will need to be aware of how to join them efficiently.

With respect to security, our current approach is to shard access rights and encryption on the database level and simply provide many databases, one per user. This is obviously not the most space-efficient approach, but it is the most general one. If users can share access keys and data, we can also do structural sharing between these instances and factorize further. We envision potentially joining over dozens of distributed Datahike instances in a global address space within a single query. Since the indices are amortized data structures, it does not make much sense to encrypt chunks of them for different users, as this defeats the optimality guarantees of B+-trees; you could get very bad scan behaviour over huge ranges of encrypted Datoms. How have you tried to partition the data? This is an interesting problem.
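To make the "one database per user" idea concrete, here is a minimal Python sketch; it is not Datahike's API. `DatabasePerUser`, `UserDatabase` and the HMAC-based key derivation are hypothetical choices for illustration, datoms are simplified to `(entity, attribute, value)` tuples, and the actual encryption of each database is left out. The point it shows is that access control reduces to routing: a query only ever sees the caller's own database.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class UserDatabase:
    """One user's private database: its own datoms and its own key."""
    owner: str
    key: bytes
    # Datoms are simplified to (entity, attribute, value) tuples.
    datoms: set = field(default_factory=set)


class DatabasePerUser:
    """Access rights sharded on the database level: each user gets a
    separate database, so queries never need per-datom permission checks."""

    def __init__(self, master_secret: bytes):
        self._master = master_secret
        self._dbs = {}

    def _derive_key(self, user: str) -> bytes:
        # One distinct key per database, derived from a single master secret
        # (HMAC-SHA256 used as a simple KDF; the encryption itself is omitted).
        return hmac.new(self._master, user.encode(), hashlib.sha256).digest()

    def db_for(self, user: str) -> UserDatabase:
        # Lazily create the user's database on first access.
        if user not in self._dbs:
            self._dbs[user] = UserDatabase(user, self._derive_key(user))
        return self._dbs[user]

    def transact(self, user: str, datoms) -> None:
        self.db_for(user).datoms.update(datoms)

    def query(self, user: str, attribute: str) -> list:
        # A query only ever touches the caller's own database.
        return sorted(d for d in self.db_for(user).datoms if d[1] == attribute)


store = DatabasePerUser(b"master-secret")
store.transact("alice", {(1, ":item/name", "lamp")})
store.transact("bob", {(1, ":item/name", "chair")})
print(store.query("alice", ":item/name"))  # [(1, ':item/name', 'lamp')]
```

The price is the space overhead mentioned above: nothing is shared between the per-user databases unless structural sharing is added on top.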

We can also expose the datahike-server query endpoint directly, and you can write static checks for access-right restrictions. So far we only do this to limit the usage of builtin functions to safe ones, but you could go ahead and do the same for more complex access controls. Some work in this direction for Datahike has also been done here: https://github.com/theronic/eacl Doing this openly on the internet will also require a resource model to fend off denial-of-service attacks; fortunately, Datalog engines can have powerful query planners, and we can restrict our runtime to limited budgets as well.
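A rough sketch of both ideas, a static whitelist check on incoming queries and a per-query work budget, assuming queries arrive as plain Python data (a dict with a `"where"` list) rather than as EDN. `SAFE_FNS`, `check_query`, `Budget` and `scan_attribute` are hypothetical names; a real check would also have to walk rules, `or`/`not` clauses and nested expressions.

```python
# Hypothetical whitelist of builtin functions a client query may call.
SAFE_FNS = frozenset({"=", "!=", "<", ">", "<=", ">=", "str", "count"})


class UnsafeQuery(Exception):
    pass


class BudgetExceeded(Exception):
    pass


def check_query(query: dict, safe_fns=SAFE_FNS) -> None:
    """Static check: reject any function clause whose function is not whitelisted.
    A clause whose first element is itself a list is treated as a call,
    e.g. [["<", "?a", 18]] stands for the Datalog clause [(< ?a 18)]."""
    for clause in query.get("where", []):
        head = clause[0]
        if isinstance(head, list):
            fn = head[0]
            if fn not in safe_fns:
                raise UnsafeQuery(f"function {fn!r} is not allowed")


class Budget:
    """Counts work units; raises once a query exceeds its allowance."""

    def __init__(self, max_steps: int):
        self.remaining = max_steps

    def tick(self, steps: int = 1) -> None:
        self.remaining -= steps
        if self.remaining < 0:
            raise BudgetExceeded("query exceeded its resource budget")


def scan_attribute(datoms, attribute: str, budget: Budget) -> list:
    # Every datom visited costs one unit, so huge scans are cut off early.
    out = []
    for d in datoms:
        budget.tick()
        if d[1] == attribute:
            out.append(d)
    return out


query = {"find": ["?e"], "where": [["?e", ":user/age", "?a"], [["<", "?a", 18]]]}
check_query(query)  # passes: "<" is whitelisted
datoms = [(i, ":user/age", i) for i in range(1000)]
try:
    scan_attribute(datoms, ":user/age", Budget(100))
except BudgetExceeded as err:
    print(err)  # query exceeded its resource budget
```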

synthc · on July 26, 2020

For timeseries data I encoded an [entityId,timestamp,attribute] tuple into a big integer, using an order-preserving mapping to ensure that the datoms were sorted by the timestamp. This provided the right functionality; for example, using seek-datoms we could retrieve the datoms with timestamps within some range, but performance was poor. I think a custom index could help a lot here. We also had problems with the database growing too large, and needed to manually shard the database over time.
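The order-preserving encoding can be sketched with fixed-width bit packing: as long as the timestamp and attribute id fit their bit widths, comparing the packed integers gives the same order as comparing `(entity, timestamp, attribute)` tuples. In this Python sketch `bisect` stands in for `seek-datoms`; the bit widths and function names are assumptions, not the mapping that was actually used.

```python
import bisect

T_BITS = 64  # assumed timestamp width, e.g. milliseconds since the epoch
A_BITS = 16  # assumed attribute-id width


def encode(entity: int, timestamp: int, attr: int) -> int:
    """Pack (entity, timestamp, attr) into one integer whose natural order
    equals the lexicographic order of the tuples."""
    if entity < 0 or not 0 <= timestamp < 1 << T_BITS or not 0 <= attr < 1 << A_BITS:
        raise ValueError("component out of range for the fixed bit widths")
    return (entity << (T_BITS + A_BITS)) | (timestamp << A_BITS) | attr


def decode(key: int) -> tuple:
    # Inverse of encode: peel the fields off from the low bits upward.
    attr = key & ((1 << A_BITS) - 1)
    timestamp = (key >> A_BITS) & ((1 << T_BITS) - 1)
    entity = key >> (T_BITS + A_BITS)
    return entity, timestamp, attr


def seek_range(sorted_keys: list, entity: int, t_start: int, t_end: int) -> list:
    """All (entity, timestamp, attr) entries with t_start <= timestamp < t_end,
    found by seeking into the sorted keys instead of scanning them all."""
    lo = bisect.bisect_left(sorted_keys, encode(entity, t_start, 0))
    hi = bisect.bisect_left(sorted_keys, encode(entity, t_end, 0))
    return [decode(k) for k in sorted_keys[lo:hi]]


keys = sorted(encode(*t) for t in [(1, 100, 3), (1, 200, 3), (1, 300, 3), (2, 150, 3)])
print(seek_range(keys, 1, 100, 300))  # [(1, 100, 3), (1, 200, 3)]
```

Because the entity occupies the highest bits, this layout keeps each entity's datoms contiguous and time-ordered within it; a range over all entities at once would need the timestamp in the highest bits instead.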

A Datalog equivalent of TimescaleDB (which extends Postgres with timeseries optimizations and time-based table partitioning) would be great.

For client access I tried to define access rules based on attributes (similar to how many GraphQL frameworks handle this), and I tried to express them using Datalog rules. For example, users have permission to access :user/items and :items/blabla, so a user X can access [X :user/items Y] and [Y :items/blabla Z]. Some experiments were promising, but it was slow and I did not find a good way to integrate this.
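One way to read these attribute-based rules is as reachability: starting from the user's own entity, a datom is visible if it can be reached by following only permitted attributes, which is the fixpoint a recursive Datalog rule would compute. A minimal Python sketch under that reading, with hypothetical names (`ALLOWED_ATTRS`, `visible_datoms`). Unlike the example above, it does not track which attribute is allowed at which hop, and any value that is also an entity id in the toy data is treated as a reference.

```python
from collections import deque

# Hypothetical permission set, mirroring [X :user/items Y] and [Y :items/blabla Z].
ALLOWED_ATTRS = frozenset({":user/items", ":items/blabla"})


def visible_datoms(datoms, user, allowed=ALLOWED_ATTRS):
    """Return the datoms `user` may see: those reachable from `user` by
    following only allowed attributes (breadth-first fixpoint)."""
    by_entity = {}
    for e, a, v in datoms:
        by_entity.setdefault(e, []).append((e, a, v))

    seen_entities = {user}
    queue = deque([user])
    result = []
    while queue:
        e = queue.popleft()
        for datom in by_entity.get(e, []):
            _, a, v = datom
            if a not in allowed:
                continue  # attribute not covered by any rule: hidden
            result.append(datom)
            # Values that are entities in this toy db are followed as references.
            if v in by_entity and v not in seen_entities:
                seen_entities.add(v)
                queue.append(v)
    return result


datoms = [
    ("alice", ":user/items", "i1"),
    ("i1", ":items/blabla", "z1"),
    ("i1", ":items/secret", "s"),
    ("bob", ":user/items", "i2"),
    ("i2", ":items/blabla", "z2"),
]
print(visible_datoms(datoms, "alice"))
# [('alice', ':user/items', 'i1'), ('i1', ':items/blabla', 'z1')]
```

Computing this per request touches every reachable datom, which is one reason such rule-based filtering gets slow unless the engine specializes for it.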

whilo · on July 27, 2020

I see, so your problem was that you wanted to scan over all Datoms of one entity over a time period, and you would have needed an EVAT index? In Datahike it would be fairly simple to add new indices like this.

Yes, access management must not incur a large overhead; that is why many systems have a separate, restricted way to express and track rules. My hunch is that it would still be better to keep it in Datalog and specialize the query engine so that it is fast on these (potentially restricted) rules and relations.

invisiblerobot · on July 26, 2020

>>> but for external access it was not clear how to limit the queries to the parts of the database the user had permission to view.

You can filter your entire db on the server to only include a subset of datoms.
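A minimal Python sketch of such a filtered view, roughly in the spirit of Datomic's `filter`, which hands each datom to a user-supplied predicate. `FilteredDb` and the ownership predicate are hypothetical. The counter makes the cost visible: the predicate runs for every candidate datom on every scan, so the work grows with the amount of data touched.

```python
from typing import Callable, Iterable, Iterator, Optional

Datom = tuple  # simplified (entity, attribute, value)


class FilteredDb:
    """A read-only view of a datom collection that hides every datom the
    predicate rejects. The predicate is re-evaluated for each candidate datom
    on every scan; nothing is cached."""

    def __init__(self, datoms: Iterable[Datom], pred: Callable[[Datom], bool]):
        self._datoms = list(datoms)
        self._pred = pred
        self.pred_calls = 0  # counts predicate evaluations, to expose the overhead

    def scan(self, attribute: Optional[str] = None) -> Iterator[Datom]:
        for d in self._datoms:
            if attribute is not None and d[1] != attribute:
                continue
            self.pred_calls += 1
            if self._pred(d):
                yield d


owned_by = {1: "alice", 2: "bob", 3: "alice"}
alice_view = FilteredDb(
    [(1, ":item/name", "lamp"), (2, ":item/name", "chair"), (3, ":item/price", 10)],
    pred=lambda d: owned_by.get(d[0]) == "alice",
)
print(list(alice_view.scan(":item/name")))  # [(1, ':item/name', 'lamp')]
print(alice_view.pred_calls)  # 2
```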

synthc · on July 26, 2020

I've tried this, but found it too slow for large databases