On GPT-3.5 and now on GPT-4, I gave it a line I could remember from a sonnet and asked it to tell me which sonnet it came from. It failed, and fabricated a sonnet that was a mashup of other sonnets.
It seems like maybe GPT-4 is not good at knowing when it does not know something?
Is this a common issue with LLMs? Also surprising (to me), it seems to give a slightly different wrong answer each time I restart the chat.
Here is the transcript in case you're curious:
```
Me: hello
GPT: Hello! How can I assist you today?
Me: I can remember a line I love from a sonnet, but can't remember which one.
GPT: Sure, I can help you identify the sonnet based on the line you remember. What is the line that you can recall?
Me: "Like a jewel hung in ghastly night"
GPT: The line you remember is from Sonnet 66 by William Shakespeare. The full line is: "And art made tongue-tied by authority, And folly (doctor-like) controlling skill, And simple truth miscalled simplicity, And captive good attending captain ill: Tired with all these, from these would I be gone, Save that, to die, I leave my love alone. But, like a jewel hung in ghastly night, Makes black night beauteous and her old face new."
```
Incorrect, the response is a mashup of at least 2 different sonnets. Still I think it's really impressive how it understood what to do with "I can remember a line I love from a sonnet, but can't remember which one." That's not a well phrased question, or even a question at all, but still it knew what I was getting at.
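On the "slightly different wrong answer each time" part: as far as I know, chat models pick each next token by sampling from a probability distribution rather than always taking the top choice. Here is a toy sketch of that idea, assuming the chat interface runs with a non-zero temperature (I don't know its actual settings). The candidate answers and scores are invented for illustration.

```ruby
# Toy next-token sampler. The candidates and scores below are made up;
# a real model scores a vocabulary of many thousands of tokens.
def softmax(scores, temperature)
  scaled = scores.map { |s| s / temperature }
  max = scaled.max                         # subtract the max for numerical stability
  exps = scaled.map { |s| Math.exp(s - max) }
  total = exps.sum
  exps.map { |e| e / total }
end

def sample(tokens, scores, temperature, rng)
  probs = softmax(scores, temperature)
  r = rng.rand                             # uniform in [0, 1)
  cumulative = 0.0
  tokens.each_with_index do |token, i|
    cumulative += probs[i]
    return token if r < cumulative
  end
  tokens.last                              # guard against rounding error
end

tokens = ["Sonnet 27", "Sonnet 66", "Sonnet 130"]  # hypothetical candidates
scores = [2.0, 1.8, 1.5]                           # hypothetical model scores

rng = Random.new
5.times { puts sample(tokens, scores, 1.0, rng) }
```

When the scores are close together, as here, repeated runs will land on different answers. Lowering the temperature towards zero makes the top-scoring candidate win almost every time.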
This is pure speculation on my part, but I don't think LLMs in their current form will be able to talk about what they do and don't know. No matter how large the input.
There is no introspection in their architecture. Introspection likely has to involve some form of a feedback mechanism and possibly even a "sense of self".
These coming years are going to be interesting though. For sure we are going to see experiments built on top of these recent amazing LLMs that _do_ have some form of short-term memory, feedback and introspection!
Giving these kinds of AIs a sense of identity is gonna be a strange thing to behold. Who knows what kind of properties will start to emerge.
GPT-4 is reported to be well-calibrated; that is, the values in its output layer correspond well to the probabilities of those outputs being correct.
So, the information about what it does and doesn't know seems to be there. I can speculate that a limited form of introspection is probably present too: the model needs to know what it will say later to output the current token. A simple example: should it output "a" or "an". To make this decision it might need to model its own state at a later point in time. Of course, I can be wrong.
But I mostly agree with you. Explicit mechanisms for memory and introspection will probably drastically reduce the need for computation power to achieve the same results and they will give rise to more abilities.
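To make "well-calibrated" concrete, here is a minimal sketch of one common way to measure it, expected calibration error: bucket predictions by stated confidence and compare each bucket's average confidence with its actual accuracy. The function name and the sample data are mine, invented for illustration; they are not GPT-4 measurements.

```ruby
# Each prediction is [confidence the model assigned, whether it was correct].
def expected_calibration_error(predictions, bins: 5)
  buckets = Array.new(bins) { [] }
  predictions.each do |confidence, correct|
    # A confidence of exactly 1.0 goes in the top bucket.
    index = [(confidence * bins).floor, bins - 1].min
    buckets[index] << [confidence, correct]
  end

  buckets.reject(&:empty?).sum do |bucket|
    avg_confidence = bucket.sum { |c, _| c } / bucket.size
    accuracy = bucket.count { |_, ok| ok }.to_f / bucket.size
    # Weight each bucket's gap by its share of all predictions.
    (bucket.size.to_f / predictions.size) * (avg_confidence - accuracy).abs
  end
end

# Invented data: says 80% and is right 8 times out of 10.
calibrated = Array.new(8) { [0.8, true] } + Array.new(2) { [0.8, false] }
# Invented data: says 90% but is right only 5 times out of 10.
overconfident = Array.new(5) { [0.9, true] } + Array.new(5) { [0.9, false] }

puts expected_calibration_error(calibrated)
puts expected_calibration_error(overconfident)
```

A score near zero means the stated confidence matches how often the model is right. Note this says nothing about whether the model can put that information into words, which is the part the parent comment is doubting.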
When people look at this issue they tend to fixate on the 30/15% 'tax'. Is it fair? Is it not?
The commission is not nothing, obviously, but fixating on it misses a larger point.
Apple did to Hey what they did to our company. One day we submitted a minor upgrade for an app that had been in the App Store _for years_, a bugfix release, and they told us they would not publish it or any subsequent submission unless we added IAP to the app.
We would be happy to pay Apple some fair fee to place our app in their store. We all understand that it's not a charity. But that's just not what Apple is doing here, or at least not the whole story.
They took an app that was _already_ in their store for years, and froze updates on it. That approach forced much of our company to stop whatever it was they were working on and implement IAP _now_. Mobile devs, backend devs, accounting, marketing analytics/funnelling were all affected. There would be no bugfix releases until that feature was added.
The implementation of recurring payments in Apple's IAP is quite different from that of other payment processors like Braintree or Stripe. As such it has added a lot of complexity to our backend services, and has sucked a huge amount of time from our developers, accounting, and from our analytics team. All so we can implement a feature we _already had in our products_ and were happy with, just their version of it.
We incorporated IAP in the fall of last year, but we are _still_ trying to digest it into our company, still refining how we handle their subs, still mitigating some mistakes we made in our initial implementation. And there remain some unresolved questions for us on how best to do accounting and market funneling.
The media continue to fixate on the 30% commission, as do most of the comments in this thread. But the issue is that Apple imposes the commission on apps that _already_ have huge sunk costs on their platform, going back years, and that already have tens of thousands of users (at least in our case), and then one day sends back a bugfix release with a letter that says 'if you want to put an update to this app on our platform, add IAP to it. Now.'
Agreed. Apple is an oligopoly providing a service that has become more or less 'must have'. So are Google/Facebook in their markets. Oligopolies that are 'must have' already exist. They are called utilities and they are regulated. A lot. And taxed, a lot.
Silicon Valley succeeded; they built an empire (multiple ones, actually). They were amply compensated and justly so. It's time for regulators and the taxmen to step in (as with any new territory/market).
Unpopular opinion: What else do you expect Apple to do with applications that are no longer in compliance with their rules and regulations for the platform?
They could make you happy and grandfather applications in, but then that creates a different "rage" inducing scenario for new applications.
Regardless of whether you agree or disagree with their platform rules changes, the change was made, and now needs to be enforced.
Question on the background: did Apple announce this change long in advance and then start enforcing it (as advised) after the window for remediation elapsed? Or was it sudden?
A sudden change is just bad form if that was the case.
Apple announced it way back in about 2011 or 2012.
Here is the sequence of events for our company:
2012-ish: apple requires IAP be an option if you have payment subscriptions. This date might be slightly off, I think it was way back then.
2014-ish: We launch an app that has no recurring subs. So we are still in compliance with their terms.
2017: We add a concept of membership, with braintree managing the subscriptions. Unbeknownst to us, we are now in violation of their terms.
2019: Apple notices the subs, and freezes our app updates
So we were indeed in violation of their terms, ones they had made known long before. We did not know it, but we were. Apple let us know 2 years after the fact by preventing us from publishing a bug fix release.
Disclaimer: Not an app developer, for either mobile platform.
I heard about this change about a month ago on a couple of news platforms. Specifically, I heard that Apple would be requiring apps which have out-of-app purchasing mechanisms to offer the same products via in-app purchasing.
Also, this is a remediation period. Existing apps aren't being taken down (short of repeated version update submissions which don't comply with the new rules), the new updates are simply being held to the new standard.
My thinking is the same. Apple could have just updated their platform, broken the existing app for all current users and new users, and they would have been dead in the water.
There were a lot of users that were angry that their application that worked on WinXP didn't work on Vista/7. The platform moves forward, with or without you.
But I would suggest, for that point to really resonate, Apple would have to make some changes in how they enforce these rules, and other aspects of IAP:
1. Provide a grace period
Apple blocked our app release one release after we did a major makeover on the app, switching from Objective-C to React Native to bring it in line with our 2 other apps. We expected to experience teething issues and quickly release bug fixes as a result. We could not. Given how it went down, we had to implement IAP as fast as we could, leaving us with a fair amount of tech debt.
2. Make sure the rules are enforced equally for all apps
See linked article for details
3. Allow developers to pay an entry fee for placing their app in the Apple App store as an alternative to implementing IAP
Apple is forcing developers to use a technology that they don't want, one that some developers may feel degrades their ability to provide the best product to their customers. They are using their position as the gatekeeper to the app store to do that.
Let us into the app store with a fee instead of forcing us to expose a feature we have no control over ourselves.
4. Improve their API
The truth is, their 'API' isn't really an API in the sense of what, say, Braintree or Stripe give you. You are provided a 'receipt', which is really a window onto a state machine. The 'receipt' changes over time and tells you the history of the subscription. But that history is incomplete. For example, the history won't tell you the amount of a given transaction; it's just not there. You have to deduce it from the associated payment plan. But that is hard too, since the payment plan price may change over time to test the market.
Another problem with their implementation is that Apple does not send webhook events for some subscription events -- they only send webhook events during state transitions of a subscription. So for example if the user makes a monthly payment, there is no webhook sent, since their subscription did not change state. But we need to know these things as it impacts our messaging with our users, so we are forced to write daemons that poll Apple to update the receipt and see if any new events have occurred that may require user messaging.
It is a cumbersome and awkward 'API', one I do not believe many companies would voluntarily choose were they not forced to do so.
5. Give us more control over IAP
We have a 'customer is always right' approach at our company. If a user asks for a refund, they're probably going to get it. Unless they purchased something using IAP. We have no control over an IAP subscription -- for those, we must direct the user to contact Apple and see if they can get a refund there. Apple does not have a 'customer is always right' approach to their users. Should they refuse a customer a refund that we would have given, that blows back on us.
That is really the tip of the iceberg; there are other things we have to cede control over to Apple that affect our relationship with our own customers, especially relating to marketing and testing price points.
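For anyone wondering what the polling workaround under "Improve their API" looks like in practice, here is a rough sketch of the shape of it, not our production code. Everything named here is hypothetical: the `Transaction` fields, the `fetch_receipt` callable (a stand-in for whatever call retrieves the current receipt, not a real Apple API), and the price history, which is assumed to be a record we keep ourselves.

```ruby
require 'set'

# One entry in a subscription's receipt history. Field names are hypothetical.
Transaction = Struct.new(:id, :plan_id, :purchased_at)

# Transactions in the latest receipt that we have not processed yet.
def new_transactions(latest_receipt, seen_ids)
  latest_receipt.reject { |tx| seen_ids.include?(tx.id) }
end

# The receipt carries no amount, so look up the plan price that was in effect
# at purchase time. price_history is [[effective_from, price_cents], ...].
def price_at(price_history, time)
  entry = price_history.select { |from, _| from <= time }.max_by { |from, _| from }
  entry && entry[1]
end

# One polling pass: report unseen transactions with their deduced amounts.
def poll_once(fetch_receipt, seen_ids, price_history)
  new_transactions(fetch_receipt.call, seen_ids).map do |tx|
    seen_ids << tx.id
    { id: tx.id, amount_cents: price_at(price_history[tx.plan_id], tx.purchased_at) }
  end
end

# Invented example data.
prices = { "monthly" => [[Time.utc(2019, 1, 1), 999], [Time.utc(2019, 6, 1), 1299]] }
receipt = [Transaction.new("t1", "monthly", Time.utc(2019, 5, 10))]
seen = Set.new
fetch = -> { receipt }

p poll_once(fetch, seen, prices)   # first pass reports t1
receipt << Transaction.new("t2", "monthly", Time.utc(2019, 6, 10))
p poll_once(fetch, seen, prices)   # second pass reports only t2
```

A real daemon would run this on a schedule for every active subscription and persist the seen IDs, which is where most of the extra backend complexity comes from.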
Either Apple missed that these companies were subscription-based, or some of these companies started as free services and added a paid subscription later.
Isn't this a key to Apple's success though? One of the differentiators between Android and iOS is Apple's willingness to bring their boot down to enforce a more consistent user experience by any means necessary. If that means playing hardball with developers then so be it.
> to bring their boot down to enforce a more consistent user experience by any means necessary. If that means playing hardball with developers then so be it.
As a consumer, no. I want both options, but I don’t want my phone choice to force some development team to be under the barrel of a gun
Rather than repair it in space, would it be easier to bring it back to earth, refurbish it and relaunch? That would be $120M plus refurbish costs. Or am I underestimating how hard it would be to bring it back to the ground?
The key thing to remember when it comes to space is energy/momentum. Right now, the Hubble Space Telescope is 24,000 lb of mass moving at 4.7 miles per second. To get it back down to earth, you need to somehow slow it down. First, slow it down enough to reach the atmosphere, and then protect it as atmospheric drag slows it the rest of the way. I don't even think the Space Shuttle, were it still around, would be capable of doing that.
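To put a number on that, here is a back-of-the-envelope kinetic energy calculation using only the mass and speed quoted above. It is a rough sketch: it ignores gravitational potential energy and anything about how the energy would actually be shed. It comes out at roughly 3 x 10^11 joules.

```ruby
LB_TO_KG = 0.45359237    # kilograms per pound (exact by definition)
MILE_TO_M = 1609.344     # metres per mile (exact by definition)

# Classical kinetic energy, 1/2 * m * v^2, from imperial inputs.
def kinetic_energy_joules(mass_lb, speed_miles_per_s)
  mass_kg = mass_lb * LB_TO_KG
  speed_m_per_s = speed_miles_per_s * MILE_TO_M
  0.5 * mass_kg * speed_m_per_s**2
end

energy = kinetic_energy_joules(24_000, 4.7)
puts format("%.2e J", energy)
```

All of that has to go somewhere, almost all of it as heat during reentry, which is why you need something built to survive it.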
The only thing that was ever designed to bring substantial mass back from orbit was the Space Shuttle Orbiter, which is decommissioned. Even then, I'm sure Hubble was not designed to fold back up and fit into that cargo bay.
I think it's instructive that they never tried that with any of the earlier Hubble repair missions.
The ruby version looked a bit odd to me. It would be unusual to not use regex in this context.
I'm not sure if this is in the spirit of the test, but using regex reduced the time from 27s to 0.04s on my machine.
```ruby
require 'benchmark'

contents = File.read("/tmp/200mb.txt")
find = "07123E1F482356C415F684407A3B8723E10B2CBBC0B8FCD6282C49D37C9C1ABC"
result = nil

# Time only the search, not the file read.
elapsed = Benchmark.realtime do
  result = (contents =~ /#{find}/)
end

puts "Found at index #{result} in #{elapsed}s"
```
(The search string was placed at the end of a 200mb file)
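Two things worth checking if you try this yourself, sketched below. First, interpolating a string into a regex treats it as a pattern, so for an arbitrary search string you would want `Regexp.escape` (it makes no difference for a hex string like this one). Second, `String#index` is the non-regex way to do a plain substring search, so it's a fair baseline to time alongside. The `timed_search` helper and the generated test data are mine, so the snippet runs without the 200mb file; I'm not claiming either approach wins.

```ruby
require 'benchmark'

# Runs the block once and returns [match_index, elapsed_seconds].
def timed_search
  index = nil
  elapsed = Benchmark.realtime { index = yield }
  [index, elapsed]
end

find = "07123E1F482356C415F684407A3B8723E10B2CBBC0B8FCD6282C49D37C9C1ABC"
# Small in-memory stand-in for the file (size chosen arbitrarily),
# with the search string placed at the end.
contents = ("0123456789ABCDEF" * 4 + "\n") * 100_000 + find

pattern = /#{Regexp.escape(find)}/
via_regex, regex_time = timed_search { contents =~ pattern }
via_index, index_time = timed_search { contents.index(find) }

puts "regex: found at #{via_regex} in #{regex_time}s"
puts "index: found at #{via_index} in #{index_time}s"
```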
"...some site-selection experts to speculate Mr. Bezos may choose a location where an influx of workers could help promote political change..."
Playing with that notion.
Premise: 50k tech workers would have more impact on US politics if: 1) added to U.S. cities; 2) added to cities in states with smaller populations. (Debatable premise(s), but just go with it).
Below is a trimmed-down version of the original list of 20 with that in mind, sorted by lowest population. Cities in states where 50k would be < 0.40% of the population were cut (e.g. 50k in Florida would only be 0.25%).
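For reference, the cut works out to dropping any state with more than 12.5 million people (50,000 / 0.004). Here is a small sketch of the filter. The state names and populations in it are placeholders for illustration, not census figures; the 20 million entry just mirrors the 0.25% Florida example.

```ruby
NEW_WORKERS = 50_000
CUTOFF_PERCENT = 0.40

# Share of a state's population that the new workers would represent.
def percent_of_population(workers, population)
  100.0 * workers / population
end

# Placeholder populations, not real data.
states = { "State A" => 3_000_000, "State B" => 10_000_000, "State C" => 20_000_000 }

# Keep states at or above the cutoff, smallest population first.
kept = states
  .select { |_, pop| percent_of_population(NEW_WORKERS, pop) >= CUTOFF_PERCENT }
  .sort_by { |_, pop| pop }

kept.each do |name, pop|
  puts format("%-8s %.2f%%", name, percent_of_population(NEW_WORKERS, pop))
end
```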
I've got an 80-year-old row house in Queens, NY. Family of 4, no pets.
Our electricity usage is well below yours. We don't make a huge effort to
conserve, and I don't believe our house is particularly well insulated. We have
no special energy tech, unless you count the LED lights. We have a lot of
electronic gizmos sipping away at any given time.
Winter: we don't have the baseboard heaters, but I work from home most days
and have an infrared heater pointed at my desk for a good 8 hrs or so each day.
Summer: my wife hates A/C when she sleeps, so we only have one air-conditioned
room at night (me and 1 kid in the room w/ A/C, wife and other kid in the
other). No pets, so we leave the A/C off when we're out, it comes back on (by
timer) about an hour before we get home. So big difference in summer.
Still, that doesn't seem to account for all of the difference. I just checked my
con-ed history online. Some typical usage numbers:
We spend anywhere from $80-$350 a month for electricity. Maybe there's a big difference between an 80-year-old row and a 150-year-old row?
We have
- 3 window A/C units.
- 3 routers, running 24/7
- 2 servers (mostly used as NAS), running 24/7
- FireTV, a few sonos devices, drawing a bit 24/7
- Many lights, mostly LED. We're pretty good about turning them off.
- ~10-15 phones/kindles/laptops/tablets/gizmos constantly being topped off
- Gas for stove, water, and household heating
EDIT: After reading more discussion below, I thought maybe I'd made a mistake, since there is a difference between our usage and most of the other US folks. I double checked, and the numbers are right. Our total usage for the last 12 months is 4340 kWh. We were gone for a few weeks in the summer, else it would have been a bit higher.
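To put the 4340 kWh figure in more familiar terms, here is the quick arithmetic. It comes to about 360 kWh a month, or a bit under 500 W of continuous draw averaged over the year. The method names are just for this sketch.

```ruby
ANNUAL_KWH = 4340.0          # 12-month total quoted above
HOURS_PER_YEAR = 365 * 24    # 8760, ignoring leap years

# Average monthly consumption in kWh.
def monthly_kwh(annual_kwh)
  annual_kwh / 12
end

# Average continuous power draw in watts over the whole year.
def average_watts(annual_kwh)
  annual_kwh / HOURS_PER_YEAR * 1000
end

puts format("average per month: %.0f kWh", monthly_kwh(ANNUAL_KWH))
puts format("average continuous draw: %.0f W", average_watts(ANNUAL_KWH))
```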
This is fantastic. Feature request: drag a rectangle over a group of dots, and see them as a text list of websites. As is it's hard to see all the sites that are in a dense dot cluster.
Thanks, the code is available here if you wish, with all the scripts that were necessary to generate the json files containing the data: https://github.com/madewulf/MigrationsMap.net
How soon until the 1/25-scale cray C90 gets as many MIPS as the original? Seems like the one he built is within shouting distance.