Hacker News | abetlen's comments

I would add the following two numbers if you're generating realtime text or speech for human consumption:

- Human Reading Speed (English): ~250 words per minute

- Human Speaking Speed (English): ~150 words per minute

Should be treated like the Doherty Threshold [1] for generative content.

[1] https://lawsofux.com/doherty-threshold/


Human reading speed varies by a factor of 10 or more between individuals, while speaking speed is much more consistent.


Even my own reading speed varies by a factor of 5 day to day, depending on how much reading I've been doing, sleep I've gotten, etc.


Plus, whether I am reading light fiction versus technical documentation.


> speaking speed is much more consistent.

Is it? I've noticed a huge variance in speaking speed in the US, but it tends to vary more between regions rather than individuals.


There are exceptions for languages where the rapidity of speech really varies according to context, such as Spanish.


But I'd say LLMs produce content faster than I can read or write it, because they can produce content which is really dense.

Ask GPT-4 a question and then answer it yourself. Maybe your answer will be as good as or better than GPT-4's, but GPT-4 writes its answer a lot faster.


It certainly doesn't produce content as fast as I can read it.


Only if you use gpt-4. gpt-3.5-turbo is much faster, and gpt-4 is only going to get faster as GPUs get faster.


Yep. I use GPT-4 extensively and exclusively, and the comment I was replying to mentioned GPT-4. I can't wait for it to get faster.


Bing also uses GPT-4 and it is very fast. Microsoft spends more on compute.


It doesn't exclusively use GPT-4. You might be right that their GPT-4 is much faster, but you're also not always seeing GPT-4 with them.


I'm pretty sure it at least mostly uses GPT-4.


afaict OpenAI's instance is massively overloaded; you can see this in the 32k context model actually being faster in practice rather than slower.


Dense content? Not in my experience. It seems really overly verbose to me.


Prompt it to be information dense in its response then.


So I must specifically ask for it, but it's not at all the default.


I get it, but it is just about infinitely configurable to your specific needs so it doesn't bother me too much what the default response is.


Also worth checking out https://github.com/saharNooby/rwkv.cpp which is based on Georgi's library and offers support for the RWKV family of models which are Apache-2.0 licensed.


I’ve got some of their smaller Raven models running locally on my M1 (only 16GB of RAM).

I’m also in the middle of making it user friendly to run these models on all platforms (built with Flutter). The first macOS release will be out before this weekend: https://github.com/BrutalCoding/shady.ai


You can see for yourself (assuming you have the model weights) https://github.com/abetlen/llama-cpp-python

I get around ~140 ms per token running a 13B parameter model on a ThinkPad laptop with a 14 core Intel i7-9750 processor. Because it's CPU inference, the initial prompt processing takes longer than on GPU, so total latency is still higher than I'd like. I'm working on some caching solutions that should make this bearable for things like chat.


This is not true: GPT-3 can perform chain-of-thought reasoning through in-context learning, either with one/few-shot examples or zero-shot by starting a prompt with "let's think step by step" (less reliable).

GPT-3.5 (what's being used here) is a little better at zero-shot in-context learning as it's been instruction fine-tuned, so it only needs to be given the general format in the context.


I think you're focusing on a few narrow examples where LLMs are underperforming and generalising about the technology as a whole. This ignores the fact that Microsoft already has a successful LLM-based product in the market with GitHub Copilot. It's a real tool (not a party-trick technology) that people actually pay for and use every day.

Search is one application, and it might be crap right now, but for Microsoft it only needs to provide incremental value, for Google it's life or death. Microsoft is still better positioned in both the enterprise (Azure, Office365, Teams) and developer (Github, VSCode) markets.


Copilot mostly spews distracting nonsense, but when it’s useful (like with repetitive boilerplate where it doesn’t have to “think” much) it’s really nice. But if that’s the bar, I don’t think we're ready for something like search, which is much more difficult and important to get right before the average person gets more good than harm from it.


Few people seem to know this, but you can disable auto-suggest in Copilot, so it only suggests things when you proactively ask it to. I only prompt it when I know it will be helpful and it's a huge time saver when used that way.


Sometimes, Copilot is brilliant. I have encountered solutions that are miles better than anything I had found on the internet or expected to find in the first place.

One issue involved heavy numerical computation with numpy, and it found a library call that covered exactly my problem.


I've had similar experiences. Sometimes it just knows what you want and saves you a minute searching. Sometimes way more than a minute.

But I find it also hallucinates in code, coming up with function calls that aren't in the API but would sound like a natural thing to call.

Overall it's a positive though; it's pretty easy to tell with your other coding tools if the suggestion is something made up, and the benefits of filling in your next little thought are very real.


Do you consider things like extrapolating out the else half of an if-else, given the if half, as boilerplate?

These tools are incredible productivity boosts if you leverage them well.

Here's a sample from GPT: a low effort question and a code dump that would get you flamed on Stack Overflow.

https://cdn.discordapp.com/attachments/263091858505334784/10...

I love it. As long as we continue to use these tools as augmentative, it's just going to get better and better.


Google's search results are pretty terrible. I actually have a hard time telling which is a result and which is an ad anymore tbh. I really don't think the bar is that high.


Maybe the internet is actually that terrible now, and Google is just the messenger?


The internet has been terrible since Yahoo dominated search.

In fact, it was the glut of SEO nonsense like keyword stuffing that PageRank countered.

If Google search sucks, someone will make one that doesn’t suck, and people will switch.


Search still relies on content that doesn't suck though, and like GP said, if the internet today sucks, then the competing search will also suck.


The internet is fucking awesome, and has been for decades.


The profit incentive is for search to suck. Making it shitty is what brings in the money.


The internet is terrible and Google is the reason.


Ok everyone, enjoy your SEO spam


That sounds like an endorsement of their ads platform?


>a few narrow examples

It's Microsoft's own advertisement.


My go-to heuristic is three point estimation, basically a weighted average of the best, worst, and average case [0].

(Best + Worst + 4 * Average) / 6

One nice property is that it imposes a distribution that adjusts for longer tailed risks.

https://en.wikipedia.org/wiki/Three-point_estimation


So one question here is: Why reduce the distribution (with long tail or whatever) to a single estimate number? If the distribution represents the range of possible outcomes well, then the single number throws away most of the information in the distribution.


I strongly agree; giving people the distribution conveys a lot of information, especially if everyone is clear on what the parameters of that distribution mean (i.e. what does the low estimate mean?).

At the same time, there are occasions where it can be useful to collapse a distribution for some types of reports, or for quickly looking across estimates.


If you run a Kubernetes cluster for self-hosting software or development, I highly recommend setting up a Tailscale subnet router [1]. This will allow you to access any IP (pods or services) in your cluster from any of your Tailscale-connected computers. You can even configure Tailscale DNS to point to the DNS server in your cluster to connect using the service names directly, e.g. http://my-service.namespace.svc.cluster.local

[1] https://tailscale.com/kb/1185/kubernetes/#subnet-router
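For reference, the router side boils down to enabling forwarding and advertising your cluster's CIDRs; the ranges below are common kubeadm defaults and are assumptions, so substitute your cluster's actual pod and service ranges:

```shell
# On the node (or pod) acting as the subnet router:
sudo sysctl -w net.ipv4.ip_forward=1

# Advertise the pod and service CIDRs (assumed values; check your cluster)
tailscale up --advertise-routes=10.244.0.0/16,10.96.0.0/12

# Then approve the advertised routes in the Tailscale admin console.
```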


pikchr is awesome. A project I did recently was a WASM-compiled pikchr library to generate diagrams directly in the browser [1]. Here's a very early demo of a live editor you can play around with [2].

Not fully-featured yet but what I'd like to eventually do is set it up in a similar way to the mermaidjs editor [3]. They encode the entire diagram in the url. That makes it really easy to link to from markdown documents and has the nice benefit that the diagram is immutable for a given url so you don't need a backend to store anything.

[1]: https://www.npmjs.com/package/pikchr-js

[2]: https://pikchr-editor.insert-mode.dev/

[3]: https://mermaid-js.github.io/mermaid-live-editor


Cool trick, thanks for sharing. I don't get why there isn't a suitable snscanf function that takes the buffer length as an argument and returns the number of bytes parsed?


fmemopen takes the buffer length, and the buffer does not need to be \0-terminated, so instead of strlen you can also just pass the buffer size.

The number of bytes parsed can be fetched with the scanf %n format specifier.


If you're getting into this stuff a great resource I found is the ZipCPU Tutorial [0] by Dan Gisselquist. The tutorial covers both Verilog design and Formal Verification methods. It uses open source tools like Verilator and SymbiYosys so getting started is pretty easy.

[0]: https://zipcpu.com/tutorial/


I just want to share with people that in no uncertain terms Dan Gisselquist/ZipCPU is transphobic, and many members of the open FPGA community are not fans of him.

[0]: https://twitter.com/whitequark/status/1333484002708254728

[1]: https://zipcpu.com/blog/2020/11/26/zipcpu-biz.html

