Is geometric, topological, and algebraic ML/data analysis actually used in the i...

dpflan · on July 15, 2024

I believe a use-case(s) receiving attention is drug design, protein design, chemical design, etc.

Here is a summer school by the London Geometry and Machine Learning group where research topics are shared and discussed. - https://www.logml.ai/

Here is another group, a weekly reading group on graphs and geometry: https://portal.valencelabs.com/logg

heyitsguay · on July 15, 2024

As someone who did an applied math PhD before drifting towards ML, it's worth pointing out that these applied math groups typically talk about applications, but the real question is whether they are actually used for the stated application in practice due to outperforming methods that use less pretty math. Typically (in every case i have seen) the answer is "no", and the mathematicians don't even really care about solving the applied problems nor fully understand what it would mean to do so. It's just a source of grant-justifiable abstract problems.

I would love to be proven wrong though!

dpflan · on July 15, 2024

Indeed, the ivory tower has nice chats and ideas and is a cool place to hang out, but does application actually occur.

funnygiraffe · on July 15, 2024

Thanks. That's certainly very interesting. Albeit it seems to me that the number of jobs doing geometric and topological ML/AI work in the drug or protein design space would be quite limited, because any discovery ultimately has to be validated through a wet lab process (or perhaps phase 1-3 clinical trials for drugs) which is expensive and time-consuming. However, I'm very uninformed and perhaps there is indeed a sizable job market here.

dpflan · on July 15, 2024

I think the job market in general for this kind of stuff is "small"; but you can find jobs. Look at Isomoprhic Labs for example. There are new AI/ML companies that have emerged in recent years, helped by success of things like AlphaFold. I think your question is really: does this research actually creates tangible results? If it did, it would be able to create more jobs to support it by virtue of being economically successfully and therefore growing?

llm_trw · on July 15, 2024

I've had some success using hyperbolic embeddings for bert like models.

It's not something that the companies I've worked for advertised or wrote papers about.

jqgatsby · on July 15, 2024

Hyperbolic embeddings have been an interest of mine ever since the Max Nickel paper. Would love to connect directly to discuss this topic if you're open. here's my email: https://photos.app.goo.gl/1khCwXBsVBuEP6xF7

llm_trw · on July 15, 2024

Not much to discuss really, I just monkey patched a different metric function, then results for our use case became substantially better after training a model from scratch on the same data compared to the previous euclidean model trained from scratch.

I'm currently working on massive multi agent orchestration so don't have my head in that side of things currently.

nborwankar · on July 16, 2024

Can you share what kinds of problems were conducive to hyperbolic embeddings in your experience. Also, separately, are you saying companies are using these in practice but don’t talk about them because of the advantage they give? Or am I reading too much into your last sentence.

llm_trw · on July 16, 2024

They are better at separating clusters and keep the fact that distances under the correct metric also provide semantic information. The issue is that training is longer and you need at least 32, and ideally 64 bit floats during training and inference.

And possibly.

The company I did the work for kept it very quiet. Bert like models are small enough that you can train them a a work station today so there is a lot less prestige in them than 5 years ago, which is why for profit companies don't write papers on them any more.

fjork · on July 15, 2024

I don't think there's much use currently. But I kinda like the direction of the paper anyway. Most mathematical objects in ML have geometric or topological structure, implicitly defined. By making that structure explicit, we at worst have a fresh new perspective on some ML thing. Like how viewing the complex numbers on a 2d cartesian plane often clicks more for students compared to the dry algebraic perspective. So even in the worst case I think there's some pedagogical clarity here.