
The last time I checked (a few days ago) it only had an "Upload Image" option... and I have been playing with Gemini on and off for months and I have never been able to actually upload an image.

It's basically what I've come to expect from most Google products at this point: half-baked, buggy, confusing, not intuitive.


It definitely has the ability to upload normal files; the + button has several options.

If you don't have it, you might be in a Google feature flag jail-- this happens frustratingly often, where 99.9% of users have a feature flag enabled but your account just gets stuck with the flag off with no way to resolve it. It's the absolute worst part about Google.


There's not a lot of detail in the announcement but I assume this is some kind of RAG system. I wonder if it will cover some short time period (past week, past month?) or if they are trying to cover the whole time period since the knowledge cutoff of the current model.


My guess is that they’ll just stuff a few daily headlines into the prompt so that queries about current affairs have some context, rather than re-training the model. Total guess obviously.


RAG isn't re-training. You can have vector embeddings of all AP news in a vector DB, then when prompted, find related news via similarity search, and add the most similar (and thus related) ones to the context.

Here's some simple example code in Go, for RAG with 5000 arXiv paper abstracts: https://github.com/philippgille/chromem-go/tree/v0.7.0/examp... (full disclosure it's using a simple vector DB I wrote)


Good point - possibly just a limited version of this, although I don’t know how they’d handle a rolling time window in the vector DB to limit results to just recent stories?
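
One way it could plausibly work (a guess, not something the announcement spells out): store each article's publication timestamp as metadata next to its embedding, and filter on that timestamp at query time, so the similarity search only ranks documents inside the rolling window. A minimal sketch in Go, with a made-up in-memory slice standing in for the real vector DB:

  package main

  import (
      "fmt"
      "math"
      "sort"
      "time"
  )

  // Doc is one embedded news story plus the metadata needed for a rolling window.
  type Doc struct {
      Text        string
      PublishedAt time.Time
      Embedding   []float32
  }

  // cosine returns the cosine similarity of two equal-length vectors.
  func cosine(a, b []float32) float64 {
      var dot, na, nb float64
      for i := range a {
          dot += float64(a[i]) * float64(b[i])
          na += float64(a[i]) * float64(a[i])
          nb += float64(b[i]) * float64(b[i])
      }
      return dot / (math.Sqrt(na) * math.Sqrt(nb))
  }

  // queryRecent drops anything older than the window before ranking by similarity,
  // so stale stories never make it into the prompt context.
  func queryRecent(docs []Doc, query []float32, window time.Duration, k int) []Doc {
      cutoff := time.Now().Add(-window)
      var recent []Doc
      for _, d := range docs {
          if d.PublishedAt.After(cutoff) {
              recent = append(recent, d)
          }
      }
      sort.Slice(recent, func(i, j int) bool {
          return cosine(recent[i].Embedding, query) > cosine(recent[j].Embedding, query)
      })
      if len(recent) > k {
          recent = recent[:k]
      }
      return recent
  }

  func main() {
      docs := []Doc{
          {Text: "two-month-old story", PublishedAt: time.Now().AddDate(0, -2, 0), Embedding: []float32{1, 0}},
          {Text: "story from this morning", PublishedAt: time.Now().Add(-3 * time.Hour), Embedding: []float32{0.9, 0.1}},
      }
      for _, d := range queryRecent(docs, []float32{1, 0}, 7*24*time.Hour, 3) {
          fmt.Println(d.Text) // only "story from this morning" survives the window
      }
  }

Most real vector DBs expose some kind of metadata filter, so in practice the pre-filter above would just be a published-after-cutoff condition attached to the query; the other obvious option is to re-index on a schedule and evict anything that has aged out of the window.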


What mechanism would make it possible to enforce non-paywalled, non-authenticated access to public web pages? This is a classic "tragedy of the commons" type of issue.

The AI companies are signing deals with large media and publishing companies to get access to data without the threat of legal action. But nobody is going to voluntarily make deals with millions of personal blogs, vintage car forums, local book clubs, etc. and set up a micropayment system.

Any attempt to force some kind of micropayment or "prove you are not a robot" system will add a lot of friction for actual users and will be easily circumvented. If you are LinkedIn and you can devote a large portion of your R&D budget to this, you can maybe get it to work. But if you're running a blog on stamp collecting, you probably will not.


If this is a GPT-generated joke, I'd say they cracked AGI.


Whenever I see one of these posts, I click just to see if the proposed solution to testing the output of an LLM is to use the output of an LLM... and in almost all cases it is. It doesn't matter how many buzzwords and acronyms you use to describe what you're doing, at the end of the day it's turtles all the way down.

The issue is not the technology. When it comes to natural language (LLM responses that are sentences, prose, etc.) there is no actual standard by which you can even judge the output. There is no gold standard for natural language. Otherwise language would be boring. There is also no simple method for determining truth... philosophers have been discussing this for thousands of years and after all that effort we now know that... ¯\_(ツ)_/¯... and also, Earth is Flat and Birds Are Not Real.

Take, for example, the first sentence of my comment: "Whenever I see one of these posts, I click just to see if the proposed solution to testing the output of an LLM is to use the output of an LLM... and in almost all cases it is." This is absolutely true, in my own head, as my selective memory is choosing to remember that one time I clicked on a similar post on HN. But beyond the simple question of whether it is true or not, even an army of human fact checkers and literature majors could probably not come up with a definitive and logical analysis regarding the quality and veracity of my prose. Is it even a grammatically correct sentence structure... with the run-on ellipsis and what not... ??? Is it meant to be funny? Or snarky? Who knows ¯\_(ツ)_/¯ WTF is that random pile of punctuation marks in the middle of that sentence... does the LLM even have a token for that?


The output of an LLM is often qualitative, not quantitative, and to test that, you need something that can judge the quality.

You're not debating philosophy with the LLM, you're just asking it if the answer matches (semantically) to the expected one.

I usually test LLM output quality with the following prompt (simplified):

"An AI assistant was tasked with {task}. The relevant information for their task was {context}. Their answer is {answer}. The correct answer should be something like {ground truth}. Is their answer correct?"

Then you can spice it up with chain of thought, asking it to judge against preferred criteria/dimensions and output a score, etc... you can go as wild as you'd like. But even this simple approach tends to work really well.
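
As a concrete illustration, wiring that judge prompt into an automated eval might look something like the sketch below (Go). The callLLM stub and the "Verdict:" convention are made up for this example, not any particular vendor's API:

  package main

  import (
      "fmt"
      "strings"
  )

  // callLLM is a stand-in for whatever chat-completion client you actually use.
  func callLLM(prompt string) string {
      // ...send the prompt to your model of choice and return its reply...
      return "The answer states the same fact as the ground truth.\nVerdict: CORRECT"
  }

  // judge asks a second model whether an answer semantically matches the expected one.
  func judge(task, context, answer, groundTruth string) bool {
      prompt := fmt.Sprintf(
          "An AI assistant was tasked with %q. The relevant information for their task was %q. "+
              "Their answer is %q. The correct answer should be something like %q. "+
              "Think step by step, then finish with a single line reading 'Verdict: CORRECT' or 'Verdict: INCORRECT'.",
          task, context, answer, groundTruth)
      reply := callLLM(prompt)
      return strings.Contains(reply, "Verdict: CORRECT")
  }

  func main() {
      ok := judge(
          "answering the user's geography question",
          "the user asked for the capital of Australia",
          "It's Canberra.",
          "Canberra",
      )
      fmt.Println("eval passed:", ok)
  }

Asking for an explicit verdict line (or a JSON score) keeps the parsing trivial; the chain-of-thought part mostly just makes the verdict more reliable.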

> turtles all the way down.

Saying "LLM testing LLM" is bad is like saying "computer testing computer" is bad. Yet automated tests have value. And just as unit tests will not prove your program is bug-free, LLM evals won't guarantee 100% correctness. But they're an incredibly useful tool.

In my experience working on pretty complex multi-agent multi-step systems, trying to get those to work without an eval framework in place is like playing whack-a-mole, only way less fun.


Too late to edit, but here's a great, really in-depth post about using LLMs as judges to evaluate LLM outputs (when you don't have the ground truth for everything): https://cameronrwolfe.substack.com/p/finetuned-judge (it's about fine-tuning LLMs to do it, but the first part is a good intro to why and how).


> "An AI assistant was tasked with {task}. The relevant information for their task was {context}. Their answer is {answer}. The correct answer should be something like {ground truth}. Is their answer correct?"

If you have a ground truth, what was the purpose of asking the AI assistant for an answer in the first place?


When you're writing a test, you usually know the correct answer for that specific combination of input parameters.


Looking back at it, I must have been very tired when I wrote that!

Or maybe I was thinking about cases where the ground truth is difficult to establish.


The cat is dead. The cat is no longer alive. These are equivalent enough, usually, but they fail string comparison.
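
Right -- an exact-match assertion can't see that those two sentences state the same fact, which is why the eval has to fall back to something semantic (an embedding-similarity threshold, or the judge prompt upthread). A trivial sketch of the check that fails:

  package main

  import "fmt"

  func main() {
      got := "The cat is no longer alive."
      want := "The cat is dead."
      // Naive string assertion: fails even though both sentences report the same fact,
      // so the test has to delegate "is this the same meaning?" to something semantic.
      fmt.Println("exact match:", got == want) // prints: exact match: false
  }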


Or, say, if you did AI voice calling and the goal was XYZ: did the conversation get, in so many words, to XYZ?


See von Neumann's work on reliable machines from unreliable processes.

I wouldn't doubt that if each layer of an LLM added some additional check for an unreliable process, you could eventually make something reliable from the unreliable.


I don't see how von Neumann's work here helps eliminate the problem; it's arguably not particularly different from "just use more LLMs". His key result was to come up with a sufficient number of redundant computations to get the error below a threshold, which is still unreliable. This problem is worse because the fundamental issue is even trying to quantify what "correct" means.

Your suggestion of evaluating accuracy at the layer level necessarily implies there's some method of quantifiably detecting hallucinations. That is not necessarily possible given the particular attention models, or even mathematically possible given an "infer this from finite text with no ability for independent verification" setup.
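
For what it's worth, the redundancy arithmetic von Neumann's result rests on is easy to sketch if you grant the (generous) assumption that each check fails independently with the same probability -- which is exactly the assumption that correlated LLM errors break. A quick best-of-n majority-vote calculation:

  package main

  import (
      "fmt"
      "math"
  )

  // binom returns n choose k.
  func binom(n, k int) float64 {
      res := 1.0
      for i := 1; i <= k; i++ {
          res *= float64(n-k+i) / float64(i)
      }
      return res
  }

  // majorityError is the probability that more than half of n independent checks
  // are wrong, when each individual check is wrong with probability p.
  func majorityError(n int, p float64) float64 {
      var total float64
      for k := n/2 + 1; k <= n; k++ {
          total += binom(n, k) * math.Pow(p, float64(k)) * math.Pow(1-p, float64(n-k))
      }
      return total
  }

  func main() {
      // With a 10% per-check error rate, redundancy drives the error down fast,
      // but it never reaches zero and depends entirely on the checks being independent.
      for _, n := range []int{1, 3, 5, 9} {
          fmt.Printf("n=%d checks: majority wrong with probability %.4f\n", n, majorityError(n, 0.10))
      }
  }

With p = 0.10 that works out to roughly 2.8% wrong for a majority of 3 and about 0.09% for a majority of 9: better, never zero, and only as good as the independence assumption.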



> Who knows ¯\_(ツ)_/¯ WTF is that random pile of punctuation marks in the middle of that sentence... does the LLM even have a token for that?

  $ ollama run llama3.2
  >>> What does ¯\_(ツ)_/¯ mean?
  A popular internet meme!

  ¯\_(ツ)_/¯ is a Unicode character that represents a face with a
  surprised or confused expression. The three parts of the character are
  often interpreted as:

  1. `¯` - a horizontal bar, indicating skepticism or "not buying it"
  2. `(ツ)` - a smiling face with a tongue-out gesture, showing amusement
  or frustration
  3. `/¯` - an arrow pointing downwards, suggesting "that's just not
  right" or "good riddance"

  Together, the character is often used to express exasperation,
  disappointment, or sarcasm in response to a situation that seems
  ridiculous, unfair, or annoying. It's commonly used in text messages,
  social media posts, and online forums.

  Example: "You think you can eat an entire pizza by yourself?¯\_(ツ)_/¯"

  >>> /clear
  Cleared session context
  >>> ¯\_(ツ)_/¯
  ¯\_(ツ)_/¯

  >>> ¯\_(ツ)_/¯
  I see you're trying to express a mix of sass and Japanese flair! Keep
  'em coming, I'll keep responding in kind!

  >>> ¯\_(ツ)_/¯
  The eternal "shrug emoji" of the internet. You win, you've conveyed an
  entire emotional spectrum with just three characters. Well played,
  friend!

  >>> ¯\_(ツ)_/¯
  I think we've reached a new level of emoji sophistication. You're not
  even trying to convey anything anymore; it's just a masterclass in
  subtlety and understatement. Bravo!

Sometimes it’s nonsense, sometimes it’s approximately right (“shrug emoji”), sometimes it barks up the wrong tree (“Japanese flair”), sometimes it’s just vapid drivel.


It's an "Eastern"-style emoticon (https://en.wikipedia.org/wiki/List_of_emoticons#Eastern), though they're more common in the West nowadays (but not enough for your virtual keyboards to have them built in...).


> barks up the wrong tree (“Japanese flair”)

It just knows more than you. Google says:

katakana letter tu (U+30C4) - ツ


I used the expression “barks up the wrong tree” on purpose. I know the character is Japanese, but that’s not how it’s being used.


If you're running a company that is paying multiple vendors for basic AI features and LLM functionality, it might be worth doing the calculation of how much of that functionality might be covered by getting all of your employees on iOS and macOS...


There are lots of valid use cases for speech synthesis and text-to-speech technology, and there are like 1 or 2 valid/legal use cases for voice cloning that I can think of. Ignoring the moral and ethical questions, why would anybody devote time and resources to building a company around a very niche solution... one in which your customer churn rate is partially dependent on users not ending up in prison?

edit: typo


Had to chuckle when I looked at the Digital Advertising Alliance WebChoices browser tool (in Safari or any browser with cross-site tracking disabled). It allows you to opt out of being tracked, as long as you enable cross-site tracking and let them add a cookie. ¯\_(ツ)_/¯


This makes sense on a number of fronts.

1. If you have capital to invest, you could do worse than AI startups at the moment.

2. Nvidia's long-term threat is not just direct competitors (AMD, Intel), but the big cloud players going to in-house chips. Supporting the next wave of your customers makes sense.

3. Using Nvidia is the path of least resistance right now. If you only invest in startups using your products (and you are an active investor), you give startups another reason to avoid taking a risk on the alternative.

edit: typo


But this also means that if another AI winter were to come, Nvidia would be hit on two fronts.


Okay but we could always go back to GPU mining, right? ;-)


Nothing venture-capitalized, nothing gained!


I assume if one of the names in the paper was O'Shaughnessy you would immediately think: "Irish immigrant!" Schmidt? German immigrant!

