Also worth checking out is https://github.com/saharNooby/rwkv.cpp, which is based on Georgi's library and supports the RWKV family of models (Apache-2.0 licensed).
I’ve got some of their smaller Raven models running locally on my M1 (only 16GB of RAM).
I'm also in the middle of making it user-friendly to run these models on all platforms (built with Flutter). The first macOS release will be out before this weekend: https://github.com/BrutalCoding/shady.ai
I get around 140 ms per token running a 13B-parameter model on a ThinkPad laptop with a 14-core Intel i7-9750 processor. Because it's CPU inference, the initial prompt processing takes longer than it would on a GPU, so total latency is still higher than I'd like. I'm working on some caching solutions that should make this bearable for things like chat.
This is not true: GPT-3 can perform chain-of-thought reasoning through in-context learning, either with one- or few-shot examples or zero-shot by adding "let's think step by step" to the prompt (less reliable).
GPT-3.5 (what's being used here) is a little better at zero-shot in-context learning because it's been instruction fine-tuned, so it only needs to be given the general format in the context.
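Rough sketch of what the zero-shot variant looks like in practice (the question is made up, and this assumes the openai Python package; the only important part is the trailing "Let's think step by step."):

```python
# Zero-shot chain-of-thought sketch: no worked examples in the context,
# just a nudge to produce intermediate reasoning before the final answer.
# Assumes the `openai` package and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        # Appending "Let's think step by step." is the zero-shot CoT trigger.
        {"role": "user", "content": question + "\n\nLet's think step by step."},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
```

The one/few-shot version just swaps the trigger phrase for a couple of worked question/answer examples placed ahead of the real question.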
I think you're focusing on a few narrow examples where LLMs are underperforming and generalising about the technology as a whole. This ignores the fact that Microsoft already has a successful LLM-based product in the market with GitHub Copilot. It's a real tool (not a party-trick technology) that people actually pay for and use every day.
Search is one application, and it might be crap right now, but for Microsoft it only needs to provide incremental value; for Google it's life or death. Microsoft is still better positioned in both the enterprise (Azure, Office 365, Teams) and developer (GitHub, VS Code) markets.
Copilot mostly spews distracting nonsense, but when it's useful (like with repetitive boilerplate where it doesn't have to "think" much) it's really nice. But if that's the bar, I don't think we're ready for something like search, which is much more difficult and important to get right for the average person to get more good than harm from it.
Few people seem to know this, but you can disable auto-suggest in Copilot, so it only suggests things when you proactively ask it to. I only prompt it when I know it will be helpful and it's a huge time saver when used that way.
Sometimes, Copilot is brilliant. I have encountered solutions that are miles better than anything I had found on the internet or expected to find in the first place.
The problem involved heavy numerical computation with numpy, and it found a library call that covered exactly my case.
I've had similar experiences. Sometimes it just knows what you want and saves you a minute searching. Sometimes way more than a minute.
But I find it also hallucinates in code, coming up with function calls that aren't in the API but would sound like a natural thing to call.
Overall it's a positive though: it's pretty easy to tell, with your other coding tools, when a suggestion is for something made up, and the benefits of it filling in your next little thought are very real.
Google's search results are pretty terrible. I actually have a hard time telling which is a result and which is an ad anymore tbh. I really don't think the bar is that high.
So one question here is: Why reduce the distribution (with long tail or whatever) to a single estimate number? If the distribution represents the range of possible outcomes well, then the single number throws away most of the information in the distribution.
I strongly agree; giving people the distribution conveys a lot of information, especially if everyone is clear on what the parameters of that distribution mean (i.e., what does the low estimate mean?).
At the same time, there are occasions when it can be useful to collapse a distribution to a single number, for some types of reports or for quickly looking across estimates.
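To make the "throws away most of the information" point concrete, here's a toy sketch (the lognormal shape and the numbers are invented purely for illustration): the point estimate looks tidy, but the p10/p90 spread is what a reader actually needs to plan around.

```python
# Toy example: a task estimate modeled as a long-tailed (lognormal)
# distribution. The numbers are made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
estimate_days = rng.lognormal(mean=np.log(5), sigma=0.6, size=100_000)

point_estimate = estimate_days.mean()
p10, p50, p90 = np.percentile(estimate_days, [10, 50, 90])

print(f"single number (mean): {point_estimate:.1f} days")
print(f"p10 / p50 / p90:      {p10:.1f} / {p50:.1f} / {p90:.1f} days")
# The mean alone can't tell you that the 90th percentile is roughly
# double the median, which is exactly the long-tail information lost
# when the distribution is collapsed to one number.
```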
If you run a Kubernetes cluster for self-hosting software or development, I highly recommend setting up a Tailscale subnet router [1]. This will allow you to access any IP (pods or services) in your cluster from any of your Tailscale-connected computers. You can even configure Tailscale DNS to point to the DNS server in your cluster so you can connect using service names directly, e.g. http://my-service.namespace.svc.cluster.local
pikchr is awesome. A project I did recently was a WASM-compiled pikchr library to generate diagrams directly in the browser [1]. Here's a very early demo of a live editor you can play around with [2].
It's not fully featured yet, but what I'd like to eventually do is set it up in a similar way to the mermaidjs editor [3]. They encode the entire diagram in the URL. That makes it really easy to link to from markdown documents, and it has the nice benefit that the diagram is immutable for a given URL, so you don't need a backend to store anything.
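For anyone curious, the encoding side is simple. Here's a rough sketch of the idea; the editor host is a placeholder, and the exact scheme (deflate + base64 in the URL fragment) is just one reasonable choice, similar in spirit to what the mermaid editor does:

```python
# Sketch of "the whole diagram lives in the URL": compress the pikchr
# source and stash it in the URL fragment, so a given URL always renders
# the same diagram and no backend storage is needed.
# The editor host below is a placeholder, not the real project URL.
import base64
import zlib

def encode_diagram(src: str) -> str:
    return base64.urlsafe_b64encode(zlib.compress(src.encode("utf-8"))).decode("ascii")

def decode_diagram(fragment: str) -> str:
    return zlib.decompress(base64.urlsafe_b64decode(fragment.encode("ascii"))).decode("utf-8")

source = 'box "Hello"; arrow; box "pikchr"'
url = f"https://pikchr-editor.example/#{encode_diagram(source)}"
print(url)

# Round-trip check: the fragment alone reconstructs the diagram source.
assert decode_diagram(url.split("#", 1)[1]) == source
```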
Cool trick, thanks for sharing. I don't get why there isn't a suitable snscanf function that takes the buffer length as an argument and returns the number of bytes parsed.
If you're getting into this stuff, a great resource I found is the ZipCPU Tutorial [0] by Dan Gisselquist. The tutorial covers both Verilog design and formal verification methods. It uses open source tools like Verilator and SymbiYosys, so getting started is pretty easy.
I just want to share with people that, in no uncertain terms, Dan Gisselquist/ZipCPU is transphobic, and many members of the Open FPGA community are not fans of him.
- Human Reading Speed (English): ~250 words per minute
- Human Speaking Speed (English): ~150 words per minute
These should be treated like the Doherty Threshold [1] for generative content.
[1] https://lawsofux.com/doherty-threshold/
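Back-of-the-envelope, those speeds translate into a generation budget per token (the ~1.3 tokens per English word used below is an assumed rule of thumb for GPT-style tokenizers, not a measurement):

```python
# Rough conversion from human reading/speaking speed to the token
# throughput a model needs to stay ahead of the user.
# TOKENS_PER_WORD is an assumed rule-of-thumb figure, not a measurement.
TOKENS_PER_WORD = 1.3

for label, words_per_minute in [("reading", 250), ("speaking", 150)]:
    tokens_per_second = words_per_minute * TOKENS_PER_WORD / 60
    print(f"{label}: ~{tokens_per_second:.1f} tok/s "
          f"(~{1000 / tokens_per_second:.0f} ms per token budget)")
```

Under those assumptions, generation at roughly 185 ms per token or faster keeps pace with a reader, and voice interfaces get a looser budget of around 300 ms per token.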