
What LLM do you guys use for fast inference for voice/phone agents? I feel like to get really good latency I need to "cheat" with Cerebras, Groq, or SambaNova.

Haiku 4.5 is very good but still seems to be adding a second of latency.


Does this use CLIP or something to get embeddings for each image and normal text embeddings for the text fields, and then feed the top N results to a VLM (LLM) to select the best answer(s)?
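
Something like the sketch below is what I'm picturing (just my guess at the pipeline, using sentence-transformers' CLIP model; ask_vlm() is a hypothetical helper, not anything from this project):

    # Guessed pipeline, not the project's actual code. Assumes sentence-transformers;
    # ask_vlm() is a hypothetical call into whatever VLM you use.
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("clip-ViT-B-32")  # embeds images and short texts into one space

    def answer(question, image_paths, text_fields, n=5):
        q_emb = model.encode(question, convert_to_tensor=True)
        img_embs = model.encode([Image.open(p) for p in image_paths], convert_to_tensor=True)
        txt_embs = model.encode(text_fields, convert_to_tensor=True)
        img_hits = util.semantic_search(q_emb, img_embs, top_k=n)[0]
        txt_hits = util.semantic_search(q_emb, txt_embs, top_k=n)[0]
        candidates = ([image_paths[h["corpus_id"]] for h in img_hits] +
                      [text_fields[h["corpus_id"]] for h in txt_hits])
        return ask_vlm(question, candidates)  # hypothetical: VLM picks the best answer(s)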

What's the advantage of this over using llamaindex?

Although, having asked that question, I'll be honest: the last time I used llamaindex, it seemed like everything had to be shoehorned in because using that library was a foregone conclusion, even though ChromaDB ended up doing just about all the work, since the built-in vector store that llamaindex ships has strangely bad performance at any scale.

I do like how simple the llamaindex DocumentStore (or whatever it's called) is, where you can just point it at a directory. But it seems that when using a specific vector DB you often can't do that.

I guess the other thing people do is put everything in postgres. Do people use pgvector to store image embeddings?
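
If anyone has done the pgvector thing, I imagine it looks roughly like this (a sketch with made-up table names; the 512 dims match CLIP ViT-B/32, and the embeddings here are stand-ins):

    # Rough pgvector sketch; schema and names are made up.
    # pgvector's <=> operator is cosine distance.
    import psycopg2

    emb = q_emb = [0.0] * 512  # stand-ins for real CLIP embeddings

    def to_literal(v):  # pgvector accepts '[x1,x2,...]' string literals
        return "[" + ",".join(str(x) for x in v) + "]"

    conn = psycopg2.connect("dbname=mydb")
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("""CREATE TABLE IF NOT EXISTS images (
                       id bigserial PRIMARY KEY,
                       path text,
                       embedding vector(512));""")
    cur.execute("INSERT INTO images (path, embedding) VALUES (%s, %s)",
                ("beach.jpg", to_literal(emb)))
    cur.execute("SELECT path FROM images ORDER BY embedding <=> %s LIMIT 5",
                (to_literal(q_emb),))
    conn.commit()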


LlamaIndex relies heavily on RAG-style approaches, i.e., using items whose embedding vectors are close to the embedding vector of the question (what you describe). RAG-style approaches work great if the answer depends only on a small part of the data, e.g., if the right answer can be extracted from a few top-N documents.

It's less applicable if the answer cannot be extracted from a small data subset. E.g., you want to count the number of pictures showing red cars in your database (rather than retrieving a few pictures of red cars). Or, let's say you want to tag beach holiday pictures with all the people who appear in them. That's another scenario where you cannot easily work with RAG. ThalamusDB supports such scenarios; e.g., you could use the query below:

    SELECT H.pic
    FROM HolidayPictures H, ProfilePictures P
    WHERE NLFILTER(H.pic, 'this is a picture of the beach')
      AND NLJOIN(H.pic, P.pic, 'the same person appears in both pictures');

ThalamusDB handles scenarios where the LLM has to look at large data sets and uses a few techniques to make that more efficient. E.g., see here (https://arxiv.org/abs/2510.08489) for the implementation of the semantic join algorithm.
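
To see why this needs special care: the naive semantic join makes one LLM call per pair of rows, which blows up quickly. Roughly like this (a simplified sketch, not our actual implementation; llm_yes_no() stands in for the model call):

    # Naive semantic join: O(|left| * |right|) LLM calls.
    # llm_yes_no() is a stand-in for one VLM/LLM predicate evaluation.
    def naive_nljoin(left_pics, right_pics, condition):
        matches = []
        for a in left_pics:
            for b in right_pics:
                # e.g. condition = 'the same person appears in both pictures'
                if llm_yes_no(condition, a, b):
                    matches.append((a, b))
        return matches

The algorithm described in the paper is about avoiding most of those pairwise calls.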

A few other things to consider:

1) ThalamusDB supports SQL with semantic operators. Lay users may prefer the natural language query interfaces offered by other frameworks. But people who are familiar with SQL might prefer writing SQL-style queries for maximum precision.

2) ThalamusDB offers various ways to restrict the per-query processing overheads, e.g., time and token limits. If the limit is reached, it actually returns a partial result (e.g., lower and upper bounds for query aggregates, subsets of result rows ...). Other frameworks do not return anything useful if query processing is interrupted before it's complete.


We use a vector db (Qdrant) to store embeddings of images and text and built a search UI atop it.


Cool. And the other person implies that the queries can search across all rows if necessary? For example, if all images have people in them and the question is which images show the same people. Or are you talking about a different project?


I think the previous post refers to a different project. But yes: ThalamusDB can process all rows if necessary, including matching all images that have the same persons in them.



Doesn't seem high enough resolution.


On the Vectrex you could only draw lines between 256 x 256 grid points, so in theory 800 x 600 with antialiasing would be enough. But I dunno if it would have the same contrast; OLED is as good as you can get, I guess.


On a tiny screen like that, I suspect 800x600 is high enough DPI to fake the lines themselves to the point where the pixels aren't discernible to the eye.

This alone still wouldn't remotely resemble a real vector display...

They would also need to accurately simulate the glow/bloom of the lines, and the phosphor decay rate over time that leads to effects like the "trail" behind the bullets in Asteroids. That is all extremely feasible. In a lot of ways, much easier than emulating a raster CRT display.
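
The core of it is only a few lines per frame; here's a hedged numpy sketch of the usual approach (the decay and bloom constants are made up and would need tuning per game):

    # Phosphor persistence + bloom sketch (numpy/scipy; constants are guesses).
    import numpy as np
    from scipy.ndimage import gaussian_filter

    DECAY = 0.85  # brightness kept per frame; lower = faster fade, shorter trails
    BLOOM = 2.5   # gaussian sigma in pixels for the glow around bright lines

    def step(persistence, new_lines):
        # fade what the phosphor still emits, then draw this frame's lines on top
        persistence = np.maximum(persistence * DECAY, new_lines)
        # a blurred copy added back approximates the bloom/halo
        display = np.clip(persistence + gaussian_filter(persistence, sigma=BLOOM), 0.0, 1.0)
        return persistence, display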

However, I have never seen a commercial emulation product do this with any competency.

Presumably because the number of people who would actually care is not large enough to affect the sales figures in any meaningful way.


Not really. One of the advantages of vector displays is the fact that the drawn lines are razor sharp with zero aliasing. Another is the fact that the hardware has very fine control over the brightness, allowing for very bright or very dim lines to be drawn. The bright ones are brighter than could be replicated with raster CRT displays, and combined with slow-decay phosphors made for some beautiful "trail" effects. A pixelated display of any sort can only yield a rough approximation at best.


    and combined with slow-decay phosphors made for some beautiful "trail" effects
Thank you. This is such an under-appreciated aspect of vector games' unique look on real hardware.

    A pixelated display of any sort can only yield a rough approximation at best.
Why do you feel this way? With sufficient DPI, to me this is fairly easy to achieve. A few examples of emulation that look like they're doing a very good job:

I think they have the bloom dialed up way too high, and maybe the trails aren't prominent enough, but I assume those are easy things to tweak.

https://www.youtube.com/watch?v=Z4lHsVueSj0

https://www.youtube.com/watch?v=RtUtfBWDgmA

https://www.youtube.com/watch?v=aKjs1rWnwSk


Last time I played a well-maintained Asteroids cabinet, bullets had obvious bloom, but I was surprised to not see a trail. There wasn't any noticeable bloom or trails on the other objects. I believe the arcade monitors have fast decay phosphor like in regular TV sets, so any trail would come from persistence of vision, probably due to the brightness of the bullet.

I'm not sure about the Vectrex CRT, it may have longer persistence phosphor.


The Asteroids I've played had a slow-decay phosphor and trails on the bullets (not so much the asteroids, UFOs, etc). If the cabinet you played had its tube replaced with a TV picture tube, its display characteristics may have changed.


The bloom might be all right if they could replicate the intensity. Maybe with an OLED and sufficient HDR color depth, but I'm not seeing that here. It doesn't look like they did much CRT effect processing on the second two. The fireballs in Star Wars should glow the way the bullets in Asteroids do (albeit with quicker phosphor decay so not much in the way of trails).


Why disagree with a first-hand account without any personal experience yourself?


I had one client who made their whole thing about knowledge graphs, which I worked on because I needed money and it was interesting, but I am still a little suspicious that they may have had "knowledgebase" and "knowledge graphs" mixed up and did not know about vector search.

I think for that particular use case, something like filtering the vector search based on tags for each document, and then (maybe) a relatively inexpensive and fast LLM reranking step, could have worked as well or better. But the reranker is not necessarily important if you use a strong model and include enough results.
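
Concretely, I mean something like this (a sketch using ChromaDB's metadata filter; rerank() is a hypothetical call to a cheap reranking model):

    # Tag-filtered vector search, then an optional rerank pass.
    import chromadb

    client = chromadb.Client()
    col = client.get_or_create_collection("docs")

    def search(question, tag, k=20):
        hits = col.query(query_texts=[question], n_results=k,
                         where={"tag": tag})  # restrict by document tag first
        docs = hits["documents"][0]
        # rerank() is hypothetical; with a strong downstream model and enough
        # results, you can often just return docs directly
        return rerank(question, docs) if docs else []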


It looks to me like n8n has suddenly taken over, to the point where automation is now almost synonymous with n8n for most people.

As someone who builds workflows using my own agent system, which is based on checklists, subtasks, and tool calling, with new features or tools added as Python plugins, it is harder and harder to find an "AI Automation" project where people haven't predetermined that I have to use n8n.

It's ridiculous.

I actually think that defaulting to creating workflows in raw code is not an ideal outcome, though, because it feels inaccessible to non-programmers.

But I think within X months a lot of people will find out how bad the licensing issues with n8n are and migrate to something similar to my system, where workflows are run by agents that have a delegate_subtask command or commands to manage checklists, etc. Most workflows can be described in natural language and managed easily by strong models, provided the agents have the right tool commands and the system has a scheduling/trigger mechanism.

But give it another X months, or a year or so, and many will start using general-purpose computer-use agents that they treat much like human employees. One of the biggest gaps, regardless of how you do it, is the inconvenience of setting up OAuth 2, plus the operational and bottom-line issue of running all of your API requests through some centralized service like n8n.

So we will see people who have agent systems, like myself (mindroot on GitHub), start building in computer- and browser-use capabilities, along with recipes for accessing websites, creating API keys, etc., for their users.

Also, there is inevitably going to be something along the lines of OAuth, or similar, that will allow agents to sign up for services and create credentials on behalf of users to solve this type of problem.

But one of the big advantages n8n has with users right now is that they have OAuth set up with literally everything.


> Also there inevitably is going to be something along the lines of OAuth or similar that will allow agents to sign up for services and create credentials on behalf of users to solve this type of problem.

I agree with you, but the real solution to this is an API.


When did this happen?

I’ve been using n8n for two or so years. Just feels like I woke up one day and everyone else was using it too.


It didn't help that its competitor "Make" was hard to find on Google :/ Naming matters, folks!


LLMs have an obvious application here: removing the command limitations. https://github.com/UlfarErl/lampgpt

"FYI, it is now possible to play all of the Infocom games with a phenomenal parser using LampGPT, with just ./lampgpt.py -O gamename."


I guess I never had a problem with the Infocom parser -- it already seemed so much more advanced than other adventure games, which generally only understood two-word commands like "GET KEY". In Infocom games you could say things like "get the key and put it in the bag".


Ugh, I'm so glad this exists; I tried some text adventures a few years ago and struggled to get into them due in part to having to cooperate with a rather baroque user interface.

I feel like this could really open them up to a new generation.


I do see a lot of potential applications for LLMs around having natural dialogue with NPCs in future games. Also, LLMs translating natural language into well-defined structured parser instructions might finally enable Strongbad to "get ye flask".


"Rules, triggers, and workflows can be embedded directly into payments, making them smarter and adaptable." -- are they smart contracts? What kind of workflows? Does the workflow involve sending messages to agents or making HTTP requests?


> Does the workflow involve sending messages to agents or making HTTP requests?

It looks like it will be an API for agents [1].

[1] https://cloud.google.com/blog/products/ai-machine-learning/a...


Ah, I see their end goal here. You can use a full browser to download data from websites (which is what most people will still do), but they provide a temptation to just pay Cloudflare to bypass their Turnstile (which corporations might go for; then they'll probably try to expand this to the average person too).

Basically, "pay to bypass our CAPTCHA!".


I wonder if there could be some kind of platform where you have to pay a $5 deposit or something to be able to post bugs. If you waste people's time with total nonsense then you lose the $5 and can no longer report. If it's less egregious than this, like they at least made a human effort, then maybe you keep some of the deposit. Although maybe $10 or $50 would be better.


GLTron or Armagetron Advanced?

