It is different this time, though. Take a look at this open source project.[1]
This is a system that lets you talk to NPCs in video games. It's a collection of off-the-shelf components held together by some Python code (a rough sketch appears below). The components do this:
- Listen to the user talking and convert speech to text.
- Watch the user's facial expressions via webcam.
- Watch the game, and use face recognition on the game images to determine what character is being addressed.
- Run the user's text through an LLM preloaded with about 30 lines of info about the NPC to generate a reply.
- Generate voice output in a synthesized voice matched to the character's persona.
- Modify the image of the character on screen to animate their facial expressions to match the voice output. This is done on the output image, not by animating the 3D character.
Five years ago, that was science fiction. A year ago, half that stuff wouldn't work right. Now it's someone's hobby project.
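For a sense of how thin the glue can be, here is a minimal sketch of that loop. This is not the project's actual code: it approximates the pipeline with common off-the-shelf libraries, and the webcam expression reading and on-screen facial animation steps are omitted.

```python
# A minimal sketch of the glue loop, not the project's actual code.
# Assumes: speech_recognition, face_recognition, openai (<1.0), pyttsx3.
import speech_recognition as sr
import face_recognition
import openai
import pyttsx3

def listen_to_player() -> str:
    """Capture the player's speech and convert it to text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

def identify_npc(game_frame, known_faces: dict):
    """Face recognition on the rendered frame to find who is being addressed."""
    encodings = face_recognition.face_encodings(game_frame)
    if not encodings:
        return None
    for name, known_encoding in known_faces.items():
        if face_recognition.compare_faces([known_encoding], encodings[0])[0]:
            return name
    return None

def npc_reply(persona: str, player_text: str) -> str:
    """An LLM preloaded with ~30 lines of NPC background generates the reply."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": player_text},
        ],
    )
    return response.choices[0].message.content

def speak(text: str) -> None:
    """Voice output; the real project matches the voice to the persona."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```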
I think this is a pretty good illustration of why incumbents are likely to capture much of the value, though. Some simple scripts using OSS libraries can do some pretty amazing stuff you'd previously have needed advanced research teams to even attempt.
So the majority of the value gets captured not by companies focused on writing new components that nobody else can match, but by the incumbents: those with the wealth of proprietary data to feed into the components, the infrastructure to run the more infrastructure-dependent models at scale, or the customer base to milk with AI-enabled versions of their existing products or with AI consulting services.
Don't get me wrong, it is cool for indie game developers to be able to procedurally generate NPC conversations. But indie game developers are less likely than Microsoft to capture the value from being able to generate content this easily.
But it isn't different. People have been using things like Markov chains to experiment with NPC dialogue for well over a decade.
It never got widespread adoption because it's just not interesting, and LLMs are no different here. The dialogue is still empty, despite being deeper and more grammatically complex than previous attempts.
If every farmer in an RPG hands out the same "collect 20 bear asses" quest it doesn't matter if they all have "detailed" randomly generated backstories and can opine about the game world, real world philosophy, or the 2024 US elections.
I actually think it makes a world of difference when NPCs can opine about the game world. It's so much more immersive.
Have you ever gone to a living history museum? (Old Sturbridge Village, my favorite of the ones I've visited, is one example.) All these people in character, able to talk about the period: it makes for an amazing experience.
In traditional video games, if we push any deeper, whether deliberately or by accident, we see the cracks in the universe. "Oh, I spoke to this person again, and they said the same thing to me." AI can help fix those cracks and fill them in wherever the player ventures.
This certainly doesn't change Fortnite, but I think it could change immersive RPGs and MMOs.
"Living History" is a well crafted written experience, not procedurally generated slop.
The issue here is that LLMs can only act in character if the world has already been built and written. If the prompts are that pre-chewed, you may as well write the dialogue directly and get even better results.
Take Solaire of Astora. He's an interesting NPC not because of any depth in his dialogue, but because of how well attuned he is to the world and the game itself: a true believer in the old god, a beacon of optimism in a depressed, dying world, and someone who sets the tone of the co-op multiplayer as silly and fun.
> "Living History" is a well crafted written experience, not procedurally generated slop
I've known people who lived and worked at a living history museum, and their experience was much closer to improvisational comedy than to a scripted interaction. Sure, they were riffing on their historical knowledge instead of cracking jokes, but it was not scripted.
> It just never got widespread adoption because it's just not interesting.
True. There have been NPC systems where the NPCs had motivations and a life of their own, even when no one was around. Those haven't helped gameplay much.
The current problem is that LLMs don't know enough about the game world. There has been recent progress on that.[1]
This is a (2022) article. Interestingly, a lot has changed in just under nine months in the AI world. GPT-4 has arrived, and it's actually an AI crunch, not a gain. I wrote about this in another post I submitted, but the gist is that bootstrapped startups and incumbents will be the true winners, while VC-backed startups won't be, because there is no moat in AI to defend their high valuations.
I work on applying AI to a specific enterprise domain. It's not like you can crawl our data on the open web. Getting any data from clients takes years of lawyer struggles and chicken-and-egg problems. And once you've fine-tuned models to a client's expectations, they are not going to go through that process again with someone else.
And the moat in B2C AI is owning tons of personal data and habits, the way Google and Facebook do. It's just not truly utilized with GPT models yet.
I heard from the founder of a large AI company recently on this topic. Data is an okay moat, but in this craze we'll see the power of data shrink. Companies are getting enough VC funding ($10M-$100M+) to buy any data they need. A better model can also make up for a lack of better data.
Instead, the best moat is to know that your product isn't a thin replicable wrapper for ChatGPT but instead has a large surface area, with lots of well-built features. Continue building those features at a fast pace, and you can win.
GPT-3.5, GPT-4, and similar models are fairly new. There are many uses of the technology that remain unexplored by product people, not to mention the technology is getting more advanced and gaining capabilities by the week (or day, or month, depending on your perspective).
The new AI platform may over time enable more products than even the shift to mobile.
ADDED:
What's implied is that a moat could still be built, and some kinds of moats are not yet well known or prevalent. Proprietary data is often mentioned. But the application on top of LLMs (or LFMs) also need not be just a thin layer with little technical barrier.
Really tough one to guess. If the small, laptop-run models win (i.e., become useful enough), the value may be captured by the commons, with various applications (glue code, essentially) on top. A bit like the early internet: good for startups.
Likely Nvidia, AWS, Azure, and Google Cloud will capture a lot of the value. OpenAI might too, but they are playing a game of tennis where they are at advantage but could still lose.
Now that the era of free money is over, and you have to pay nonzero interest, profits matter again. Is anybody in the AI space actually profitable? Is OpenAI losing money on every token to build volume?
This. Why does Silicon Valley always miss the effing obvious?
The best way to assess a startup's value is to play a bank evaluating them for a traditional no-frills loan. How risky a debt the bank considers it is a fair measure of the value of the company (and in most non-public cases, it will be negative EV, future revenues be damned). Not the BS analyses made by IBD teams at banks, and not the "valuations" ascribed to the startup by its cash-rich, opportunity-deprived VCs.
> The best way to assess a startup's value is to play a bank evaluating them for a traditional no-frills loan.
In Shoe Dog, Nike co-founder Phil Knight describes how the banks kept refusing to lend the company money because they refused to value it based on future cash flows. Eventually, Nike switched to another bank. The first bank could have made a lot of money there.
In general, even Buffett, after years of very conservative valuations ("cigar butt" investing), switched to "buying great companies at fair prices." Why would you buy a company that barely keeps up with inflation when you could buy one that literally grows exponentially? If you hop from cigar butt to cigar butt, you can also grow your money exponentially, but it's harder because you pay more in taxes and brokerage fees and have to work harder at finding the right entries and exits. Conversely, if you were as clever as the Nomad Investment Partnership and had just bought and held Costco, Berkshire, and Amazon from 2005 to now, you would have gotten great returns on investment without having to do a thing.
One could also use that story to illustrate my point. Phil Knight was being transparent with his banks about the books. His American bank thought he was cooking them; his Japanese bank saw the growth rate of Nike's cash flows and loaned him the money on that basis. It was not idle speculation in an ivory tower (or a Sand Hill Road office), like most VC investing today. How many VCs even commission a due-diligence audit in the final stages of their Series D+ investments?
The second thing is that Nike had the cash flow to show in its books, unlike most of today's startups. Stuff was moving off the shelves super fast, and they were making a neat profit on every sale. It wasn't like a tech startup deliberately underpricing itself at first, then worrying when users don't stick around after future price hikes. To put it another way, Nike would be attractive to a PE firm today, unlike most startups.
I completely agree with you that some (or many) VCs spend money on ridiculous business models. On the one hand, it seems like a waste of resources. On the other hand, you could also say it's a great way for innovation to happen. Maybe some ideas made no sense at all but still worked and led to a technological breakthrough. If VCs won't fund moonshot ideas with many millions, then who will? Apart from a few exceptions, most universities I've been to are absolutely terrible at getting people and resources aligned toward a common goal.
There's a difference between VCs funding a moonshot idea and VCs funding some stupid idea. For instance, Amgen and Genentech were founded by VCs, with VC funding, as were Google, Amazon, etc. The difference back then was that VCs actually did their due diligence as part of their duty. Back then, it was much harder to get VC funding in the first place: the science had to be at least 90% sound, the metrics had to be solid, and the numbers laid out with an open book. For example, Google only raised a $25M Series A, by which point it was already widely known, almost a household name.
My critique is not about the VC funding model. It's about VCs not doing any due diligence these days but just chasing the next hype cycle.
That's a different kind of business problem. Each transaction is profitable but profits are not sufficient to grow fast. This is different from each transaction being a loss.
There are an estimated 1 billion knowledge workers worldwide. The alleged operating costs of OpenAI are around $700,000/day, which is roughly $0.26 billion/year. Add to that salaries and retraining. Salaries: 375 employees at an average of $350,000/year comes to about $0.13 billion/year. And retraining seems to cost on the order of tens of millions per training run.
With the right subscription fee it does seem possible to balance the books and be profitable. Especially when they start selling bulk contracts to governments and schools and big corporations.
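A quick back-of-envelope makes this concrete. The costs are the figures above; the uptake rate and subscription price are my own illustrative assumptions, not reported numbers.

```python
# Back-of-envelope on OpenAI's alleged costs vs. plausible subscription
# revenue. Uptake rate and price below are illustrative assumptions.
compute = 700_000 * 365            # ~$255.5M/year alleged operating costs
salaries = 375 * 350_000           # ~$131.3M/year payroll
retraining = 50_000_000            # "tens of millions" per training run
total_costs = compute + salaries + retraining  # ~$437M/year

knowledge_workers = 1_000_000_000
subscribers = knowledge_workers * 0.002        # assume 0.2% uptake
revenue = subscribers * 20 * 12                # assume $20/month
print(f"costs ~${total_costs / 1e6:.0f}M/yr, revenue ~${revenue / 1e6:.0f}M/yr")
# costs ~$437M/yr, revenue ~$480M/yr
```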
Reflecting on this nine months later, it feels like a lot of people misread the pace of innovation and where the inertia actually lies. A couple of aspects I thought of when I heard about the Jasper layoffs:
1. A lot of value was supposed to come from selling to enterprises. The narrative was that they would move slowly, and hence nimble startups could sell to them and generate quick revenue.
The assumptions were really tested on this one. First, the virality and popularity meant any engineering leader working on AI-related projects got social capital and prestige (and a promotion) inside the company, making it preferable for companies to build rather than buy. An API form factor helped immensely in getting to a proof of concept within a day. Second, for those buying, many startups (in LLMOps) ended up selling the same thing, so buyers slowed down to evaluate. Third, data privacy issues meant no enterprise was willing to go for cloud solutions.
2. A lot of startups never picked up the tougher problems, e.g., training an open source model or offering fine-tuning as a service. The core work of changing the underlying behavior of a model was picked up in open source, but most startups never touched it. This partly has to do with what got the hype: an LLM wrapper would show off a cool demo and get shared widely, encouraging others to build something similar rather than go deep.
A very clear indication of this was how OpenAI and then Anthropic stopped offering fine-tuning on newer models, electing to just enable zero/few-shot learning and bigger context windows. Easy for them, but tough for customers who really wanted a customized solution.
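To make the substitution concrete: fine-tuning bakes customization into the model's weights, while few-shot (in-context) learning moves it into the prompt. A hypothetical sketch, with a made-up ticket-classification task and the legacy openai client:

```python
# Few-shot learning instead of fine-tuning: the task examples live in
# the prompt, not in the weights. Hypothetical task; legacy openai (<1.0).
import openai

FEW_SHOT_PROMPT = """Classify each ticket as BILLING, BUG, or OTHER.

Ticket: "I was charged twice this month." -> BILLING
Ticket: "The app crashes when I log in." -> BUG
Ticket: "Do you have a careers page?" -> OTHER
Ticket: "{ticket}" ->"""

def classify(ticket: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": FEW_SHOT_PROMPT.format(ticket=ticket)}],
    )
    return response.choices[0].message.content.strip()
```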
There are still very cool moonshots out there that could unlock the value not captured by incumbents. At this point, my working assumption is that for an AI startup to capture value, it has to go deeper into the stack and offer a service its competitors would take real effort to replicate (and that enterprises, by extension, would take time to build themselves). E.g., training an open source model locally for search and summarization over proprietary data. I know BCG[1] did it pretty well and got spectacular results.
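As a sketch of what a local open model over proprietary data can look like for the search half (summarization would sit on top), here is semantic search with a locally run embedding model. The model choice and documents are placeholders, not what BCG used.

```python
# Semantic search over private documents with a locally run open model.
# Placeholder model and documents; not BCG's actual setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small model, runs locally
docs = [
    "Q3 sales grew 12% in EMEA.",                # stand-ins for proprietary data
    "The Atlas project ships in May.",
]
doc_embeddings = model.encode(docs, convert_to_tensor=True)

def search(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents closest to the query in embedding space."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=top_k)[0]
    return [docs[hit["corpus_id"]] for hit in hits]

print(search("When does Atlas launch?"))  # -> ['The Atlas project ships in May.']
```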
Why would I use a Google LLM or a Facebook LLM over OpenAI's LLM?
Google and Facebook are today's knowledge dealers. They do not profit from providing an LLM that sidesteps all their products and gives you the answer you are searching for directly. They want to influence your eyeballs. They will try to do this by injecting their own thought manipulation crap in their LLMs. I instinctively would not trust them. I would want an LLM that is pure in some sense. Unfortunately, even OpenAI is already debased, but for another reason.
But here you can see the value that a startup can provide over the current incumbents. A startup can provide an unadulterated knowledge base of the internet and be profitable. Whether that is OpenAI or one of its competitors I do not know, but Google and Facebook cannot do that. There is no gain for them.
I’m too much of a bitchy queen to get past the author’s writing style, so I only got through a few paragraphs.
At the end of the day, the incumbents are mostly providing the APIs or the hardware to do anything significant. There may be a handful of outliers, but it seems a vast majority of AI startups these days are a new iteration of resellers.
Before, it was hosting that was resold; now it is APIs or, if their customers have any gumption, TPUs and GPUs, which is arguably still hosting.
I don’t see startups ruling AI.
Also, a lot of people misattribute the label of startup to established companies. I could go on a rant about tech journalists being the cause, but I’ll just say OpenAI is not a startup.
The reason startups did not capture much value in the last wave of AI is that incumbents held the data, and ML was primarily a feature added to someone else's product and distribution channel.
ChatGPT changed that because OpenAI developed an algorithm/model good enough to offer a consumer-grade conversational interface, and they scraped the web to train it.
That is, they offered a whole product and nailed distribution so they could own the relationship with the user.
[1] https://github.com/AkshitIreddy/Interactive-LLM-Powered-NPCs