
Why not use both? I just built a pipeline for document data extraction that uses PaddleOCR, then Gemini 3 to check and fix errors. It gets close to 99.9% on extraction from financial statements, finally on par with humans.


I did the opposite: Tesseract to get bboxes, words, and chars, and then Mistral on the clips with some reasonable reflow to preserve geometry. Paddle wasn't working on my local machine (until I found RapidOCR). Surya was also very good, but because you can't really tweak any knobs, when it failed it just kinda failed. Overall, Surya > RapidOCR w/ Paddle > docTR > Tesseract, while the latter gave me the most granularity when I needed it.

Edit: Gemini 2.0 was good enough for VLM cleanup, and now 2.5 or above with structured output make reconstruction even easier.
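The "reasonable reflow to preserve geometry" step can be sketched in a few lines. This is a hypothetical version (function name, field names, and tolerance are my own, though the word dicts mirror the shape of Tesseract's image_to_data output): group words into lines by vertical proximity, then order each line left to right, so the VLM sees text in reading order.

```python
def reflow(words, line_tol=10):
    """words: list of dicts with 'text', 'left', 'top' pixel coords.

    Returns plain text with one output line per detected text line.
    """
    lines = []  # each entry: (top of first word in line, [words])
    for w in sorted(words, key=lambda w: (w["top"], w["left"])):
        for top, ws in lines:
            # same line if vertical positions are within tolerance
            if abs(top - w["top"]) <= line_tol:
                ws.append(w)
                break
        else:
            lines.append((w["top"], [w]))
    return "\n".join(
        " ".join(w["text"] for w in sorted(ws, key=lambda w: w["left"]))
        for top, ws in sorted(lines, key=lambda l: l[0])
    )

# Toy input: a financial-statement-like row plus a second row
words = [
    {"text": "Total", "left": 0, "top": 5},
    {"text": "Assets", "left": 60, "top": 7},
    {"text": "1,234", "left": 300, "top": 6},
    {"text": "Cash", "left": 0, "top": 40},
]
text = reflow(words)  # -> "Total Assets 1,234\nCash"
```

A fixed pixel tolerance is crude; a production version would scale it by median word height, but the grouping idea is the same.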


This is The Way. Remember, AI doesn't have to replace existing solutions; it can tactfully supplement them.



The Break Fast Club.


The cheap easy take: it's tragically ironic that the software running the infrastructure in Silicon Valley is such a problem


It's a shame that SF politics are so dysfunctional it can't have a metro at the same level of quality as, say, North Korea.


North Korea? If you think it is a good example of a low bar of transit quality/safety to meet, then you’re comically far off.


You think that's setting the bar too high or too low?


Too high. I think the NK transit system is incomparably safer and cleaner than BART.

Riding without a ticket? Jail.

Littering on the platform? Straight to jail, right away!

Doing any violent crime in NK transit? Believe it or not - death by firing squad.

Here is a quick overview of how the system works: https://youtu.be/eiyfwZVAzGw?si=CnOMa8F6NkiyhifE


We're in agreement about the facts on the ground.

Setting aside safety for a moment, consider just hygiene: BART is shockingly dirty. That suggests mismanagement, above and beyond just a lack of deterrence of criminality.

As for safety -- firing squads are probably not in the cards, but would jailing the violent be too much to hope for?


SF doesn't run BART, though.

Not saying SF politics is great, but at least point to the correct boogeyman.


I didn't know that, but please accept SF as a sloppy metonym for bay area. :-)


Maybe it's to be expected, though, that high salaries there depress the incentive to work these jobs even more than in other cities?


No. It is pretty typical for anything gov to be pretty bad. Most don't work there due to how bureaucratic it is, rather than the comp. This is what my friends who work in gov say, at least.


There is a strong correlation between hiring low-end people and being, or becoming, ever more bureaucratic. Bureaucracy, like everything else, is there for a reason.


And yet NYC .gov sites, apps, and functionality make SF still look like a shantytown after all this time.


Beating a bar that is on the floor is none too impressive.


This dead horse ain’t gonna beat itself back to life. Might as well give it the ol’ college try, eh?


BART is a government organization and all California government employee pay is public. You can see that BART has about 40 software engineers and they earn about 70% of the market rate:

https://transparentcalifornia.com/salaries/search/?q=compute...

https://transparentcalifornia.com/salaries/search/?q=compute...

It seems to me that they are over-worked & under-paid and are doing a good job given the circumstances.

NIMBYs have blocked BART in Silicon Valley. BART doesn't reach Menlo Park, Palo Alto, Stanford, Mountain View, Sunnyvale, Los Altos, Santa Clara, or Cupertino. A few years ago, it finally reached San Jose.

A separate train (CalTrain) goes from SF through Silicon Valley. Last year they switched to electric trains which are faster and run more frequently. The SF CalTrain station is inconvenient (20-mins walk from downtown, under a highway), but they are working to extend CalTrain to the central SF station: https://en.wikipedia.org/wiki/Salesforce_Transit_Center#Futu... .

So Silicon Valley transit is getting better, slowly.


BART barely goes into Silicon Valley. Fremont was the closest stop up until 2017. Now it gets to North San Jose. Even if it was funded, any further extension wouldn't be complete for over a decade.


I'll bite: Silicon Valley isn't known for good infrastructure, we are just able to roll back changes very easily. The cost of getting software wrong for BART is far higher than if my div is padded incorrectly.


So this could universally decrease the memory requirements of unquantized LLMs by 30%? Seems big if true.


Not as big when Q8 quantization is already considered overkill and cuts it down to 50% (and a flat 2x speed boost without any additional compute overhead mind you) and the more common Q4KM is more like 30%. Definitely interesting if it can be added to existing quantization, but K quants do already use different precision levels for different layers depending on general perplexity impact which is similar to this entropy metric they use, e.g. Q6 using a mix of 4 bits and 8 bits. And that's not even considering calibrated imatrix which does something conceptually similar to FFT to compress even higher.
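For readers unfamiliar with the numbers above, here is a toy illustration of why Q8 halves fp16 memory and what lower bit widths cost in accuracy. This is plain per-tensor round-to-nearest, far cruder than the per-block K-quant schemes being discussed, but the lossiness tradeoff is the same in kind:

```python
import numpy as np

def quantize(w, bits):
    """Symmetric round-to-nearest quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax        # one scale per tensor (toy; real
                                          # schemes use per-block scales)
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)  # stand-in weight tensor

q8, s8 = quantize(w, 8)
q4, s4 = quantize(w, 4)
err8 = np.abs(dequantize(q8, s8) - w).max()
err4 = np.abs(dequantize(q4, s4) - w).max()
# err4 > err8: fewer bits, coarser grid, more reconstruction error.
# Neither is zero, i.e. quantization is lossy at any bit width.
```

Mixed-precision schemes like Q6_K pick `bits` per block or per layer based on a sensitivity estimate, rather than using one width everywhere.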


Quantization is not lossless.


Nobody really cares if it meets a strict definition of lossless.


I do? I spend a ton of time post-training models for creative tasks.

The effects of model quantization are usually qualified in terms of performance on benchmaxxed tasks with strong logit probabilities, temp 0, and a "right" answer the model has to pick. Or even worse they'll be measured on metrics that don't map to anything except themselves like perplexity (https://arxiv.org/pdf/2407.09141)

I agree Q8 is strong but I also think the effects of quantization are constantly being underappreciated. People are often talking about how these models perform while fundamentally using 10+ variants of a single model with distinct performance profiles.

Even knowing the bits per weight used isn't enough to know how exactly a given quant method is affecting the model: https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-ggufs


If you've trained your own models you would be aware of quantization aware training.


"Nobody really cares if it meets a strict definition of lossless" != "quantization can be done haphazardly."


If you're trying to really snarkily refer to the article on Dynamic Quants 2.0 and how carefully developed they were: they're comparing their quants to the methodology 99.99% of quants out there use.

The problem is not that people are making quants "haphazardly", it's that people keep parroting that various quants are "practically lossless" when they actually have absolutely no clue how lossy they are given how application specific the concept is for something as multidimensional as an LLM.

The moment anyone tries a little harder to quantify how lossy they are, we repeatedly find that the answer is "not by any reasonable definition of lossless". Even their example where Q4 is <1% away on MMLU 5-shot is probably massively helped by a calibration dataset that maps to MMLU-style tasks really well, just like constantly using WikiText massively helps models that were trained on... tons of text from Wikipedia.

So unless you're doing your own calibrated quantization with your own dataset (which is not impossible, but also not near common), even their "non-haphazard" method could have a noticeable impact on performance.


Wasn't referring to that.

You are saying that people are using quantized models haphazardly and talking about them haphazardly. I'll grant it's not the exact same thing as making them haphazardly, but I think you took the point.

The terms shouldn't be used here. They aren't helpful. You are either getting good results or you are not. It shouldn't be treated differently from further training on dataset d. The weights changed - how much better or worse at task Y did it just get?


The term is perfectly fine to use here because choosing a quantization strategy to deploy already has enough variables:

- quality for your specific application

- time to first token

- inter-token latency

- memory usage (varies even for a given bits per weight)

- generation of hardware required to run

Of those the hardest to measure is consistently "quality for your specific application".

It's so hard to measure robustly that many will take significantly worse performance on the other fronts just to not have to try to measure it... which is how you end up with full precision deployments of a 405b parameter model: https://openrouter.ai/meta-llama/llama-3.1-405b-instruct/pro...

When people are paying multiples more for compute to side-step a problem, language and technology that allows you to erase it from the equation is valid.


You say that as though people know these things for the full precision deployment and their use case.

Some have the capability to figure it and can do it for both full precision and quantized. Most don't and cannot.


And when you consider that the usual final step in the pipeline is that a sampler goes ham on the probabilities and just picks some random nonsense, the tolerance for lossy compression is fairly high.

In fact, there's this funny occurrence where Q4 models on occasion perform better than their fp16 counterparts on benchmarks ran with top_k=1 since the outputs are slightly more random and they can less deterministically blunder past the local maximum into a more correct solution.


We got an oral at ICLR for calling out how shit samplers like top_p and top_k are. Use min_p!
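For context, min_p is nearly a one-liner. This sketch (parameter name `p_base` assumed; implementations vary) keeps only tokens whose probability is at least `p_base` times the top token's probability, so the cutoff adapts to model confidence: peaked distributions prune hard, flat ones keep more options.

```python
import numpy as np

def min_p_filter(probs, p_base=0.1):
    """Zero out tokens below p_base * max(probs), then renormalize."""
    keep = probs >= p_base * probs.max()
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

probs = np.array([0.50, 0.30, 0.15, 0.04, 0.01])
out = min_p_filter(probs, p_base=0.1)
# threshold = 0.1 * 0.50 = 0.05, so the last two tokens are dropped
# and the surviving three are renormalized; sampling then proceeds
# from `out` as usual.
```

Contrast with top_k, which keeps a fixed count regardless of the shape of the distribution, and top_p, which can keep a long tail of near-junk tokens when the distribution is flat.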


True yep, I wish more people benchmarked models with more representative sampler settings and then took the average of 5 or 10 responses.


That's not true if there are measurable performance differences.


"strict" means something. People, including yourself, only care if there is a practical difference in performance. "this is lossless and that isn't lossless" is a completely useless statement in this realm. In many domains lossy compression is either not tolerated, not legal or not practical.


If you get any accuracy degradation with full 8 bits of precision you're doing it wrong.


Or your model wasn't trained so well (weights are too spiky)


Seems reductive.


This paper is basically statistical mechanics with a quantum veneer. Two major issues:

1. Scale: They're simulating just 13 qubits with QuTiP and making grand claims about quantum thermodynamics. The computational complexity they're glossing over here is astronomical. Anyone who's actually worked with quantum systems knows you can't just handwave away the scaling problems.

2. Measurement Problem: Their whole argument about instantaneous vs time-averaged measurements is just repackaging the quantum measurement problem without actually solving anything. They're doing the same philosophical shell game that every "breakthrough" quantum paper does by moving around where they put the observer and pretending they've discovered something profound.


I disagree with you on both fronts.

1. The main underpinning of this article is the analytical theory they come up with, independent of their simulation. The fact that it explains a few qubits well is exactly why this is interesting. If you were to scale up their model, a spin-1/2 Ising model, you would effectively get a classical magnet, which is obviously well described by classical thermodynamics. It's in the limit of small systems that quantum mechanics makes thermodynamics tricky.

2. Their time averaging is just to remove fluctuations in the state, not avoid the measurement problem. They're looking at time averages of the density matrix, which still yields a quantum object that will collapse upon measurement. And as their mathematical model points out, this can be true for arbitrary time averaging windows, the limits just change respectively as smaller time averages allow for larger fluctuations. There's nothing being swept under the rug here.
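The time-averaging point is easy to check numerically. This toy uses a single qubit precessing under a sigma_x Hamiltonian (my stand-in, not the paper's Ising chain): the window-averaged density matrix is still Hermitian with unit trace, i.e. a bona fide quantum state that yields Born-rule probabilities on measurement.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)  # Pauli X, toy Hamiltonian

def U(t):
    # exp(-i * sx * t) = cos(t) I - i sin(t) sx
    return np.cos(t) * np.eye(2) - 1j * np.sin(t) * sx

psi0 = np.array([1, 0], dtype=complex)          # start in |0>
rho0 = np.outer(psi0, psi0.conj())

# time-average rho(t) = U(t) rho0 U(t)^dagger over one period
ts = np.linspace(0, 2 * np.pi, 500)
rho_bar = np.mean([U(t) @ rho0 @ U(t).conj().T for t in ts], axis=0)

trace = np.trace(rho_bar).real   # still 1: a valid (mixed) state
p_up = rho_bar[0, 0].real        # z-basis Born probability, ~0.5 here
```

The averaged state is mixed rather than pure, but nothing about measurement has been dodged: you still pick a basis and read probabilities off the diagonal.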


Quantum mechanics is statistical mechanics in the complex numbers.


Quantum mechanics is Markov chains in imaginary time.


Can you explain that?


State transitions are probabilistic and operators have complex coefficients.


State transitions are deterministic, it's only measurement that is probabilistic.


Even that is arguable. Subjective experience is probabilistic… kinda.


Do atoms decay deterministically?


As long as they are isolated, their state is a superposition of all possible states and evolves deterministically, with the amplitude of each of these "sub-states" evolving perfectly deterministically. If you want to perform a measurement, you choose a possible decomposition of the superposition state and measure along that axis; you'll get one of the values along that axis, with a probability that is the squared modulus of the (complex) amplitude of that value.
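Concretely, for a normalized two-level superposition the probabilities fall straight out of the amplitudes:

```python
import numpy as np

# state (|0> + i|1>) / sqrt(2): complex amplitudes, equal weight
psi = np.array([1, 1j]) / np.sqrt(2)

# Born rule: probability of each outcome is the squared modulus
# of the amplitude along the chosen measurement axis
probs = np.abs(psi) ** 2   # -> [0.5, 0.5], summing to 1
```

The evolution of `psi` between measurements is unitary and deterministic; only the readout is random.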


Yes, aka. continuously. Interactions with larger systems makes it appear discontinuous.


I saw the best minds of my generation pithposting on hn.



It was funny to hear the same guy who was warning that LLMs were getting too powerful now talking about the limits of available original training data.


Citadel or Jump?


I really like the elegant simplicity of tagging the screen elements like that and not obfuscating it away.

Nice work too!


thanks! I took my inspiration from the Vimium browser plugin (https://chromewebstore.google.com/detail/vimium/dbepggeogbai...): it has a shortcut, F, that lets you choose any element on the website to navigate from

thanks vim!


Is this part of the reason Apple decided to support RCS? They knew the iMessage system would get opened up eventually anyway...


My jaw dropped to see algorithmic complexity laid out so clearly in a 3D space like that. I wish I was smart enough to know if it's accurate or not.


To know, you must perform intellectual work, not merely be smart. I bet you are smart enough.


What a nice comment!! This has been a big failing of my mental model. I always believed if I was smart enough I should understand things without effort. Still trying to unlearn this....


That is a surprisingly common fallacy actually; I think you will find this book quite helpful to overcome it: https://www.penguinrandomhouse.com/books/44330/mindset-by-ca...


Aw thanks for such encouragement all


Unfortunately you must look closely at the details to deeply understand how something works. Even when I already have a decent mental heuristic about how an algorithm works, I get a much richer understanding by calculating the output of an algorithm by hand.

At least for me, I don't really understand something until I can see all of the moving parts and figure out how they work together. Until then, I just see a black box that does surprising things when poked.


It's also important to learn how to "teach yourself".

Understanding transformers will be really hard if you don't understand basic fully connected feedforward networks (multilayer perceptrons). And learning those is a bit challenging if you don't understand a single unit perceptron.

Transformers have the additional challenge of having a bit weird terminology. Keys, queries and values kinda make sense from a traditional information retrieval literature but they're more a metaphor in the attention system. "Attention" and other mentalistic/antrophomorphic terminology can also easily mislead intuitions.

Getting a good "learning path" is usually a teacher's main task, but you can learn to figure those by yourself by trying to find some part of the thing you can get a grasp of.

Most complicated seeming things (especially in tech) aren't really that complicated "to get". You just have to know a lot of stuff that the thing builds on.
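On the terminology point: stripped of the query/key/value metaphor, single-head attention is just a soft lookup. A minimal numpy sketch (arbitrary dimensions, no masking, no learned projections):

```python
import numpy as np

def attention(Q, K, V):
    """Each query scores every key; softmax turns scores into weights;
    the output is the weighted average of the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # query-key similarity, scaled
    # row-wise softmax (shift by max for numerical stability)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V                   # blend of values per query

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 query positions, dim 4
K = rng.normal(size=(5, 4))  # 5 key positions
V = rng.normal(size=(5, 4))  # one value per key
out = attention(Q, K, V)     # shape (3, 4)
```

Nothing "attends" in any mentalistic sense: it is a similarity-weighted average, and the full transformer layer wraps this in learned linear projections and multiple heads.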


99% perspiration, 1% inspiration, as the adage goes... and I completely agree.

The frustration for the curious is that there is more than you can ever learn. You encounter something new and exciting, but then you realize that to really get to the spot where you can contribute will take at least a year or six, and that will require dropping other priorities.

