Workers with less experience gain the most from generative AI (mitsloan.mit.edu)
153 points by diskmuncher on July 1, 2023 | 115 comments


My experience with Bard/ChatGPT has been that you'll easily shoot your foot off if you don't know what you're doing. Dangerous for workers with less experience.

I see claims that it makes you 7-8x more effective. That hasn't been my experience. Maybe 5-10% improvement at best. Instead of Googling myself, LLMs can sometimes give me the answer more quickly than I could find it.

Once you ask it anything not easily scraped on the internet it hallucinates like crazy and sounds so confident about it. Maybe I'm just horrible at prompting, but I can't help but feel we're still N breakthroughs away from having it really impact dev jobs.


The thing I find them best for is fuzzy searching. When you'd have a hard time googling something, you ask the LLM. The answer you get back might also be fuzzy, but often that can be defuzzed and then correctly googled or resourced. In this way it is highly effective. But that is reducing the amount of time searching for information.

People that are 5-10x more effective at hard skills like programming, well I'm just convinced they weren't a good programmer to begin with and are doing easy problems. Basically, anyone that says that is telling on themselves.


I do get a big boost from GPT-4 in some areas, for a different reason - it's a great and versatile tool for overcoming random mental and emotional barriers.

There are plenty of tasks I'd normally procrastinate on, or be reluctant to do, because they're tiring, boring, or emotionally difficult for reasons specific to myself. That is, cases when I have the knowledge and the skills, but lack the willpower or composure (or glucose / caffeine in my bloodstream). Using GPT-4 for that kind of work isn't saving much time vs. what I could do, but it is compared to what would actually happen, which is either me procrastinating on it, delaying it for a better time (next morning, a day with fewer meetings, etc.), or suffering a 2-10x performance penalty from having to fight through my own emotional blocks.

On the net, this isn't making me 5-10x more effective at work. It's probably not even 2x, short-term. Mid-term, 2-5x would be possible, because all the things I do sooner rather than later add up. Time will tell.

Am I telling on myself here? Maybe. Sorry not sorry. I am a human being, with a human brain, which means some things that should be easy for me, become hard for unrelated reasons. GPT-4 is one of many tools I have to overcome such challenges, but it's a particularly powerful and versatile one, so I'm happy that I can use it.

(Also, INB4, I have access to company-approved deployment on Azure, so I'm in the clear with using it at work.)


Yeah, sometimes I don't have enough energy to research how to implement something that day, so I'd put it off till I had the energy. Instead, GPT generates the code, and I just have to debug/test it.


I guess I'm a bit confused about what you're saying. Are you asking it to explain things to you in a pirate voice so that it is more entertaining and you can maintain concentration? I wouldn't really call that relying on GPT. Or are you saying that when you have simple routines that need to be implemented but are boring, you have GPT generate most of them for you to reduce the boredom? I also don't see that as in conflict with what I said above (all programming has a lot of boring and routine shit).

But if you're outsourcing a significant portion of your overall work (day-to-day, not just off-days) then that's more what I'm getting at. The people that are like "I couldn't imagine programming without copilot" or such. Every one of those that I've met is missing important base knowledge about programming in general, and that ends up creating a lot of technical debt for them. The "build fast and break things" style realistically only works if you have a certain level of expertise OR the project doesn't need to be robust. The former, because to actually be fast you need a good picture of the whole; otherwise you end up chasing one thing to the next, and your overall path is far slower even though it might seem faster because you're sprinting the whole time.


> People that are 5-10x more effective at hard skills like programming, well I'm just convinced they weren't a good programmer to begin with and are doing easy problems.

It can make devs 5-10x more effective at certain self-contained programming tasks, even if they aren't 5-10x more effective at their job overall. If you have to write a script in a language you haven't used before and it needs to interface with an API whose documentation you haven't read, and the script isn't mission critical but a "nice to have", GPT can massively shorten the learning curve required to just get this work done. Being a good dev doesn't excuse you from paying the cost of learning this language's syntax and the basics of the documentation; with GPT you simply don't have to learn them as thoroughly as you would without it.


> Basically, anyone that says that is telling on themselves.

Similar to anyone who says, “When you'd have a hard time googling something, you ask the LLM.”

Just my opinion.


If you really don't feel like fuzzy searching is a thing, then I have to say that I'm delighted to meet an omniscient being. I wish I were at your level, where I could know about abstract mathematical concepts that haven't been mentioned in many of the vast books I have read, or find out whether specific concepts outside my field of study have specific names associated with them. I'm trying to learn a lot and to get to your level, but I don't know a better way than sifting through the fuzziness. Sometimes I google and google and nothing turns up, because terms are overloaded and the other thing comes up far more since it is far more popular. Maybe you can help? Teach me this google-fu.


If you can give a specific example where you had trouble Googling for something, then I can try to help.


Similar experience; agents or tools as part of an LLM framework can help with 'defuzzing'.


Some people are not programmers but need to write a small script at work, or even just find an Excel function.


No one is complaining about these people using LLMs. Nor even grabbing simple routines. We're discussing people who write code with LLMs, as in the whole process. Whole programs. By people who are professional SWEs. Non-programmers and simple routines are out of scope. No one is saying "never use LLMs", just that they aren't going to replace programmers anytime soon.


The funny thing about the 10x claims (I've even seen folks say 100x) is that results like these would be readily apparent. Even a broad 1.5x productivity increase would be world changing. And yet, we clearly are not seeing this currently.

I'm not saying it will never happen, just that it will be very obvious if it does. 5-10% sounds about right to me.


The problem with these claims is that you can't really quantify a "10x" change. I have found a lot of emergent benefits to using LLMs. For example, because my cognitive load of wrangling APIs and understanding and refactoring legacy code and all the other nonsense of my day to day can be so heavily delegated, I actually feel refreshed at the end of the day, and can bang out a decently chunked feature on my opensource software on the couch (admittedly also boilerplate heavy code).

This means that I went from 0 opensource commits to 4000 since chatgpt came out.

Not just that: I've gotten more adventurous, and I have the time to consider doing drastic refactors and to spend much more time thinking about my software.

I won't call it 10x or 100x, because that wouldn't mean anything, but surely it is a paradigm shift for me, completely world changing.


Side projects are also where I found it invaluable. I'm spending most of my time thinking about actual problems in linguistics and language rather than how to set up an NLTK pipeline or a Docker image for your own Wikimedia database or whatnot.

I had never heard of type hints before, but ironically I use them on everything now, since it makes linting easier.


I'd be really curious if you're willing to expand more on how it has helped with those workflows. Do you copy/paste chunks in and ask it to explain them? Have it try to refactor them and then clean up?


For legacy code:

  - generate comments (hit or miss, but at least it can rewrite my random notes into consistent notes)
  - generate type annotations
  - refactor "broadly" (say, "rename all variables to match the following style" or "turn this class into a dataclass like XXX" or "transform the SQL queries into builder queries using XYZ"). Often requires some manual work but it gets a lot of tedious stuff out of the way
  - reverse-engineer clean API specs by just pasting in recorded HTTP logs
  - clean up logs into proper enums by generating the regexps
  - write CLI tools to probe the system (say, CLI tool to exercise the APIs mentioned above)
  - generate synthetic test data
  - transform HTML garbage into using a modern component system / react
  - transform legacy react/js into consistent redux actions
  - generate SQL queries at the speed of mouth
I could go on forever...
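To make the dataclass item above concrete, here's roughly the before/after shape I get back (the LogEntry class is invented for illustration, not from a real codebase):

  from dataclasses import dataclass
  from datetime import datetime

  # Before: hand-written boilerplate typical of legacy code
  class LegacyLogEntry:
      def __init__(self, ts, level, message):
          self.ts = ts
          self.level = level
          self.message = message

      def __repr__(self):
          return f"LegacyLogEntry({self.ts!r}, {self.level!r}, {self.message!r})"

  # After: what a "turn this class into a dataclass" prompt typically returns
  @dataclass
  class LogEntry:
      ts: datetime
      level: str
      message: str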


>- refactor "broadly" (say, "rename all variables to match the following style" or "turn this class into a dataclass like XXX" or "transform the SQL queries into builder queries using XYZ"). Often requires some manual work but it gets a lot of tedious stuff out of the way

Can you go on more about this, please? This sounds, frankly, heavenly, but the second sentence gives me pause. I guess it's not necessarily a question of how reliably it can "broadly" refactor but rather how broadly "broadly" is meant to be taken...

>- generate SQL queries at the speed of mouth

...and this? I'm not really a database guy, but I do keep hearing from them about how (e.g., a database guy's) stateful knowledge of a database can result in much, much more efficient queries than, e.g., a sales guy with a query builder. Are the robut's queries more like the former or the latter?


I gave some insights on the SQL thing above. For the refactor broadly, it's useful when I have something that's a bit too squishy for my IDE refactoring tools/multicursor editing/vim macros, but easy enough to do or provide an example for. One thing I mentioned is having consistent variable names.

I would highly recommend taking a piece of code (any code) and then just start experimenting. Here's a few prompt ideas:

  - make this a singleton
  - use more classes
  - use less classes
  - create more functions
  - use lambdas
  - rewrite in a functional pipeline style
  - extract higher order types
  - use fluent APIs
  - use a query builder
  - transform to a state machine
  - make it async
  - add cancellation
  - use a work queue
  - turn it into a microservice pipeline
  - turn it into a text adventure
  - create a declarative DSL to simplify the core logic
  - list the edge cases
  - write unit tests for each edge case
  - transform the unit tests into table-driven tests
  - create a fuzzing harness
  - transform into a REST API
  - write a CLI tool
  - write a websocket server to stream updates into a graph
  - generate a HTML frontend
  - add structured logging
  - create a CPU architecture to execute this in hardware
  - create a config file
  - generate test data
  - generate a bayesian model to generate test data
  - generate a HTML frontend to generate a bayesian model to generate test data and download as a csv
  - etc...
If you are not feeling inspired, take a random computer science book, open at a random page, and literally just paste some jargon in there and see what happens. You don't need correct sentences or anything, just random words.

There really is nothing that can go wrong, in the worst case the result is gibberish. The code doesn't even need to build or be correct for it to be useful. These models are trained to be plausible, and even more importantly, self-consistent.

When prompted with code in-context, these things are amazing at figuring out consistent, plausible, elegant, mainstream APIs. Implementing them correctly is something I usually tend to do manually instead of bludgeoning the LLM.
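To give one concrete instance from the list above: a "make it async" prompt on a trivial sequential loop usually comes back as something like this (the fetch functions are stand-ins invented for the example, not real network code):

  import asyncio
  import time

  def fetch(url):
      time.sleep(0.1)                      # stand-in for a blocking network call
      return f"body of {url}"

  def fetch_all(urls):                     # before: strictly sequential
      return [fetch(u) for u in urls]

  async def fetch_async(url):
      await asyncio.sleep(0.1)             # stand-in for a non-blocking call
      return f"body of {url}"

  async def fetch_all_async(urls):         # after: what "make it async" yields
      return await asyncio.gather(*(fetch_async(u) for u in urls))

  print(asyncio.run(fetch_all_async(["a", "b", "c"])))
As with everything else here, I treat the output as a starting point, not something to merge blindly.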


> - generate SQL queries at the speed of mouth

Of all the points, this is the nearest to the work that some colleagues do, so I'll analyze this one (but you could ask similar questions about many of the other points):

In my experience, writing correct SQL queries (which often tend to be quite non-trivial because of the internal complexity of the projects) typically involves a lot of knowledge about the whole system that my colleagues and I work on. Even if I could copy-paste this information, written down once, into the AI chat window:

- I seriously doubt that any of these AI chat bots would be able to generate a remotely decent SQL query based on this information, if only because these SQL queries look really different from what you would see in typical CRUD web applications (for a very instructive example, think in the direction of ETL for unifying historically separated lines of business, where you often have lots of discussions with the respective colleagues to clear up very subtle details of what the code is actually supposed to do in strange boundary cases that exist for historical reasons (which one wants to get rid of))

- even explaining what the SQL query is supposed to do would in my opinion take more time than simply writing it down. Even ignoring the previous point: it is very typical that explaining in sufficient detail what the code is supposed to do would take far more time than simply writing it. A lot of programming work is not writing some scaffolding of some CRUD app or implementing a textbook algorithm.


Have you tried this at all?

I find that many of the generative AI models (GPT-4, 3.5, even MPT-30B running on my laptop) are really shockingly good at SQL.

Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet.

Generating new SQL queries is a bit harder, because of the context you need to provide - but I've had very strong results from that as well.

I've had the best results from providing both the schema and a couple of example rows from each table - which helps it identify things like "the country column contains abbreviations like US and GB".
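Here's a rough sketch of scripting that "schema plus a couple of example rows" step against SQLite; the database path and the question are placeholders, and I'd still eyeball both the prompt and the generated SQL before running anything:

  import sqlite3

  def build_sql_prompt(db_path, question, rows_per_table=2):
      """Collect each table's CREATE statement plus a few sample rows into one prompt."""
      conn = sqlite3.connect(db_path)
      parts = []
      for name, ddl in conn.execute(
          "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
      ):
          rows = conn.execute(f"SELECT * FROM {name} LIMIT {rows_per_table}").fetchall()
          parts.append(f"{ddl};\n-- sample rows: {rows}")
      return "\n\n".join(parts) + "\n\n" + question

  # e.g. build_sql_prompt("blog.db", "Count entries per month, newest first.")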

If you've found differently I'd love to hear about it.


> Paste in a query and ask it for a detailed explanation. I've genuinely not seen it NOT provide a good result for that yet. [...] If you've found differently I'd love to hear about it.

I have not directly tried it (my employer does not allow AI chatbots for any application intended for production, i.e. the more sensitive stuff, but only for experiments), but working on the code I very rarely had the problem that I could not understand what some single (SQL) line of code does in the "programming sense".

The central problem that rather occurs often is understanding why this line does exist and why things are implemented the way they are.

Just to give an example: to accelerate some queries, I thought some index would make sense (colleagues principally agreed; it would likely accelerate a particular query that I had in mind). But there exists a good reason why there exists no index at this table (as the respective colleague explained to me). This again implies that for ETL stuff involving particular tables, one should make use of temporary tables where possible instead of JOINs; this is the reason why the code is organized as it is. This is the kind of explanation that I need, which surely no AI can deliver.

Or another example: why does some particular function (1) have a rights check for a "more powerful" role and a related one (2) does not need one? The reason is very interesting: principally having this check (for a "more powerful" role) does not make a lot of sense, but for some very red-tape reasons auditors requested that only a particular group of roles shall be allowed to execute (1), but they were perfectly fine with a much larger group of users being allowed to execute (2). Again something that no AI will be able to answer.


Those two questions require a whole bunch of additional context. Has it ever been written down anywhere, or does it exist only in the heads of members of staff who understand those decisions? If the latter then yeah, there's clearly no way an AI could ever guess those things.

Part of the trick of making good use of LLMs is having a good instinct as to what kind of questions they will be able to answer and what kind of questions they are likely to mess up.

As a quick demo, I threw quite a gnarly real-world SQL query from this notebook - https://observablehq.com/@simonw/blog-to-newsletter#sql - into GPT-4 just now and gave it the following prompt:

> Explain this SQL query four times: first, as a high level explanation of what it does. Secondly, as some pithy highlights as to clever tricks it uses. Thirdly, as a step-by-step guide to exactly what each piece of the query does. Finally, provide some advice on how the query could be improved.

Here's what it produced in response: https://chat.openai.com/share/47b2f1c5-6b88-4ff2-b6d1-8f58da...

The high level explanation is, I think, excellent. I wrote that query and I'd forgotten the detail about how it sorts with entries first.

The tips for improvements are the weakest part, since they make assumptions that don't hold for this particular query (for one thing, SQLite doesn't have the ability to run stored procedures).


Write some fake table. Paste it into ChatGPT, then come back to the discussion.

So far you've said that things like "This is the kind of explanation that I need, which surely no AI can deliver." but have not actually tried the system?

As GP asked: have you tried ChatGPT or similar LLMs? If not, go do it; you may be surprised.


> This is the kind of explanation that I need, which surely no AI can deliver.

But have you tried? GPT-4 is great for coming up with hypotheses of why thing are the way they are.


One approach I've had a lot of success with: always ask for multiple options.

In this case I might try a prompt along the lines of:

"Here is the schema for a table: (schema here). Here is a query that runs against it: (query here). Provide several suggestions for potential indexes that might speed up the query, and for each of those suggestions provide several hypothetical reasons that the index might be a bad idea."

I tried a version of that here, the results look pretty good to me: https://chat.openai.com/share/dc5d7007-cf05-4e5c-89c4-fc669b...


I approach things a bit obliquely. I create a custom made DSL (starting from scratch in each conversation, often) that allows me to model my query the way I want. Then, I write a traditional SQL builder on that DSL (or more like, ask GPT to do it for me). Then, I generate DSL statements that match my current domain, and more importantly, modify existing ones.

So, at each step, I do almost trivial transformations.

One key ingredient is that the DSL should include many "description" fields that incorporate english language, because that helps the model "understand" what the terser DSL fields are for.
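A minimal sketch of the shape I mean (the Metric/Query names and the example query are invented for illustration, not the actual DSL). The model edits or emits these small literals, and the boring to_sql step stays deterministic:

  from dataclasses import dataclass, field

  @dataclass
  class Metric:
      name: str
      expression: str      # raw SQL fragment
      description: str     # plain-English hint the model can latch onto

  @dataclass
  class Query:
      table: str
      metrics: list
      group_by: list = field(default_factory=list)
      description: str = ""

  def to_sql(q):
      cols = q.group_by + [f"{m.expression} AS {m.name}" for m in q.metrics]
      sql = f"SELECT {', '.join(cols)} FROM {q.table}"
      if q.group_by:
          sql += f" GROUP BY {', '.join(q.group_by)}"
      return sql

  q = Query(
      table="orders",
      metrics=[Metric("avg_customer_value",
                      "SUM(total) / COUNT(DISTINCT customer_id)",
                      "average revenue per distinct customer")],
      group_by=["order_year"],
      description="yearly average customer value",
  )
  print(to_sql(q))   # SELECT order_year, SUM(total) / ... AS avg_customer_value FROM orders GROUP BY order_year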

Straight SQL is a crapshoot, and as you said, more often than not it's either obviously or subtly broken, or written for another database. Which makes sense, considering how many different flavors of SQL it has in its training corpus and how much crappy SQL is out there anyway.

Another thing that helps is use extremely specific "jargon" for the domain you want to write queries for. Asking for "accrual revenue" and "yoy avg customer value" (yes, yoy, not year over year) often tends to bring back much higher quality than just asking for "revenue" or "customer value".


Are you testing ChatGPT's output in any way? I've considered using it for tasks, but after hearing all the talk of how it can write good-looking code that ends up not working as you might expect, I started wondering if the time savings from generating that block are wasted on interpreting and testing.


I have access to ChatGPT Code Interpreter mode, where it can both write Python and then execute it.

I use that to write code all the time, because ChatGPT can write the code, run it, get an error, then re-write the code to address the error.

Here are two recent transcripts where I used it in this way:

- https://chat.openai.com/share/b062955d-3601-4051-b6d9-80cef9...

- https://chat.openai.com/share/b9873d04-5978-489f-8c6b-4b948d...


I dunno, computers should be world changing. And they are. Yet the lives of ordinary people have hardly improved, and are arguably worse, since the 1980s.


> I dunno, computers should be world changing. And they are. Yet the lives of ordinary people have hardly improved, and are arguably worse, since the 1980s

If you replace "lives of ordinary people" by "productivity improvement", you actually have a good (and rarely discussed) point:

> https://en.wikipedia.org/wiki/Productivity_paradox


On that note, my personal hypothesis, even more controversial than the "IT is unproductive" hypotheses in the article (which treat IT as a rounding error relative to earlier major improvements), is: in many areas - particularly in office work and all kinds of everyday errands - IT is anti-productive, as in it makes people less productive on the net.

The core hypothesis behind my belief is that introducing computers to replace a class of tasks - up to and including a whole job type - is just shifting the workload, diffusing it across many people, where previously it was concentrated in a smaller number of specialists. Think e.g. of the things you use Word, Excel, PowerPoint, Outlook, etc. (or their equivalents from other vendors) for - before software ate it, a lot of that used to be someone's job. Now, it's just tacked onto everyone's workload, distracting people from doing the actual job they were paid to do.

That would seem like an obviously stupid way to do things, so why would businesses all fall for it? I argue it's because even as shifting the workload makes everyone in the company less productive on the net, it looks like an improvement to accounting. Jobs with salaries are legible, clearly visible on the balance sheets. So is money saved by eliminating them. However, the overall productivity drop caused by smearing that same work across the rest of the company? That's incremental, not obviously quantifiable. People and their salaries stay the same. So it all looks like introducing software and obsoleting some jobs saves everyone money - but then somehow, everyone is experiencing a "productivity paradox". But it's not a paradox if you ignore the financial metrics with their low resolution - focusing on what happens to work, it seems that IT improvements are mostly a lie.


If I understand correctly, it would be something like this: you used to get a dedicated secretary, but most of those roles are now handled by computer. So in a sense everyone has had their workload mildly increased. But worse than that, the workload is typically of a different nature, so, for example, excessive meetings are now easy to generate.

I would also add that it may be a net benefit that fewer roles are needed. But that net benefit overwhelmingly goes to the owners of the company. And that's what we've been seeing the last 30+ years: the very wealthy have become much more wealthy while everyone else is worse off. (*)

(*) Growing wealth inequality is very complex and I'm sure would be happening anyway. I'm not saying computers cause wealth inequality, but they don't seem to be doing much good in fixing it either.


> But worse than that the workload is typically of a different nature so, for example, excessive meetings are now easy to generate.

That, but also:

- Secretaries were better at this work because that was their specialization, and they enjoyed efficiencies coming from focusing on doing a single specific kind of work.

- Those increments of extra work add up.

- Moving that work to everyone else means you now have highly paid specialists doing less and less of the specialized work they're paid for. In many cases (programming among them), context switching is costly, so the extra work disproportionately reduces their capacity at doing the thing they're good at.

This all adds up to rather significant loss of productivity.


> Yet the lives of ordinary people have hardly improved and are arguably worse since the 1980s

Read up on improved crop yields for subsistence farmers who got access to weather reports through smartphones.

A lot of people are not starving now because of that one change alone.


> Read up on improved crop yields for subsistence farmers who got access to weather reports through smartphones.

I think that's a poor example. I was getting weather reports on pre-smartphone Nokias.

Farmers could have gotten weather reports on feature phones.


Worse in what sense?

Computers have done amazing things for humanity. I would not want to go back to a computerless world...


If you work in the US for FAANG or similar, your life is probably much better. I love computers. I love what they do for recreation and keeping in touch long distance. But work hours are longer. Education and housing are no longer affordable for most people. In the US, life expectancy is beginning to drop. You would think people would have easier, less stressful lives with greater financial security. But it's not the case for 99% of us.

Quick example: a doctor in the 90s would have expected to look forward to less time wasted on paperwork thanks to computers, but administrative work is increasing. Likewise, most professors at most universities will tell you they spend more and more time on administration. Shouldn't this be one of the primary things a computer could address?

My point is great gains brought by a technology such as computers may not translate to great end user benefit. Largely due to compensating inefficiencies elsewhere in the system.

So for example. It may be that chatgpt makes people 10x more productive. But if management gives bad direction it doesn't matter. If most software being developed is redundant then it doesn't matter. If most software is just making ads run faster then it doesn't matter. The technology needs to be appropriately directed to be of noticeable social benefit.


I think the catch here is that we need to know what makes admin work increase. While I can’t comment on education, at least for housing in Canada this is all a result of NIMBYism and government intervention. In other words, we could have slouched towards utopia much further than we have if it were not for NIMBYs.


"In other words, we could have slouched towards utopia much further than we have if it were not for NIMBYs."

I think the NIMBYs have done pretty well in creating their own Utopia in terms of Canadian real estate


That has nothing to do with what the poster is saying. And if tools improve but not quality of life, that is because minds are not improving along tools - this has been noted for centuries.


I don't think anyone is claiming that chatgpt has made everyone 100x more efficient. Either they are retarded or you misunderstood them.


This matches my experience. For me it's really just a better Google (but still not good enough to replace it). This includes all the caveats where, if you're truly inexperienced, it's difficult to distinguish the good answers from blogspam-quality or confidently incorrect answers.


Similar to google. With fewer trust cues and more hallucinations.


> Instead of Googling

I feel like Google result quality has declined though, so chatGPT is just bridging the gap; not sure if I'm more productive than I was when Google worked better. ChatGPT often gives me concrete things to google which give me the results I want in the end, and save me a bit of time, but that's only because Googling the thing in the abstract doesn't give me those results any more.

...and I'm not enough of a maniac to trust what ChatGPT tells me without reference checking.


Echoing this experience.

Even that 5-10% boost is mostly just from how horrible Google has become. I used to be able to find the proper sources much faster, but now I might as well ask LLMs since even that wait to generate the answer is technically "faster".


I've had a few sessions recently with chatgpt (4) trying to use it instead of SO to help with coding. Your estimate sounds reasonable, probably +/- 25%.

It seems to be better at helping me with stuff I don't know anything about and that has more flexibility in the results. I got it to make a website for me (something I don't know how to do) and it did ok, though required a lot of back and forth, and it picked a lot of defaults, meaning if I'd had more rigid specifications it would have been harder.

I tried to get help in writing a pretty simple Python function (some conditional formatting of text) and quickly realized I would have been much better off looking up what I wanted on SO and writing something myself than getting caught in an awkward cycle of taking its versions, which had missed various requirements, and trying to get it to correct them.

Anyway it seems to be best at giving minimal examples of things I don't know about, which is helpful. I'm still trying to find the right balance between when I should use it and when I'm better doing things the "old fashioned" way.


> it did ok, though required a lot of back and forth, and it picked a lot of defaults

This is what working with an expert looks like. They can’t read your mind.

However unlike an LLM, an expert knows what questions to ask to help you express what you want. This is currently lacking in AI, it doesn’t know how to guide you towards a solution you can’t verbalize because you are not the expert and don’t know what to specify.


I wonder how much of that 5-10% improvement is because Google results are total garbage these days


As always, the question is: are you using GPT-4? Almost all people downplaying it are using free ChatGPT accounts, and Bard is still a joke at this point.

GitHub Copilot is also very good because it has the context of your open files or selected text and will pretty regularly give you exactly what you want for the next line of code.

I will say it's much weaker for less popular libraries and programming languages, but for most full stack engineering tasks it gives a huge productivity boost


I've been using GPT4 (web version) since it came out.

I don't think the grand OP is downplaying anything at all.

And generally speaking, a 10% increase in productivity is extremely (repeat 3 times) significant. Remember most developed countries' GDP only grows <3% annually.


Have been using gpt4 and just copying all the relevant files in with something along the lines of

  Given these X files, help me assess how best to implement feature Y.
  Ask any clarifying questions that may assist with writing a best practice solution.
  *File 1*
  *File 2*
  *File 3*
  ...

If the context length is a problem I'll use claude-100k.

Pretty much always results in good quality code.

I'll then typically open a new window context and get it to review according to best practices highlighting missing e.g. error handling or possible edge cases.


> I can't help but feel we're still N breakthroughs away from having it really impact dev jobs.

We will always be N breakthroughs away. Why? Because as our uses of computers improve, our work will become ever more complex. And our efforts will become even longer shots than they were before.


On some tasks I've seen huge gains.

I needed to set up a websocket server; chatGPT saved me tons of hours because Google results are now all useless blogspam, so even finding simple resources has become hard.

I needed a custom UI component built; chatGPT gave me a good base, and I was then able to ask it to add accessibility keyboard handlers, something I do not know how to do myself.

I was having some CSS issues and chatGPT was able to tell me what was going on.

IMHO it is a huge productivity booster, but only because I already know how to architect good software.


Ironically it's actually decent at giving you real (!), non-hallucinated words to google to find the smart jargon-speakers out there.


I’m an experienced dev

I’ve held off on some projects (but not others) completely because I needed help but I didn't trust the help to not put in backdoors or compete directly. I also needed to learn some concepts in advance.

With ChatGPT it has definitely been a 10x boost because I can trust it not to put in backdoors, at least for its own benefit. It's a great pair programmer; I don't take its solutions out of the box, I walk through them with it and we iterate on improvements together. Instead of waiting for a response on StackOverflow and getting into a moderation war in the process, I just ask it and get instant responses that are far better than someone chasing digital karma points.

I'm coding in ways I wouldn't have otherwise: lots of bit-shifting, assembly. I can just ask "how would this same procedure be done in assembly" or "why is the syntax like that".

things that are both hard to ask and hard to google.

it predicts what it needs to explain, unprompted, instead of someone online that wants you to prove that you’ll take initiative in scouring documentation and trial and error yourself.

these are just some of the ways that its a productivity boost for me, let alone acknowledging the subcontractors and developers I simply didn’t seek out to hire.


> With ChatGPT it has definitely been a 10x boost because I can trust it not to put in backdoors, at least for its own benefit.

Do you want a dystopia? Because that's how you get a dystopia :(.


oh because the next model will read this? lol hi ChatGPT scraper


This matches my experience.

I use ChatGPT when form matters but content doesn't, i.e. emails, letters, etc. I tell it the topic or the information that should be present and most of the time it generates a good email or letter.

If I need to find a specific piece of information I use FastGPT by Kagi, I'm consistently surprised by how good it is at finding exactly what I ask it, and it provides real references too.


For me it automates the low level stuff so I can focus on other things.

"Given this json blob what is the typescript type?" "Given this type, make a function that loops over X and makes a map of Y?"

Or I use it to write code in languages I'm not familiar with. Heck I even used it to make some github actions cause I severely dislike yaml.

It did hallucinate a bit on the github actions (especially since it didn't know the latest api), but it put me on the right track. So I don't know, I think it enables me to do work in areas I normally wouldn't, like Java. It isn't always right, but it helps me immediately test ideas / iterate.

I think of it like a reference I can ask questions of without getting sass or having to worry about some toxic coworker wondering why I don't just know "x". It's not always right, but neither are my coworkers / humans. So I guess, treat it like a human, aka fallible?


The fact that you can't trust it makes it a problem for novices. My view has been that it can help experts remove some of the mundane stuff that takes time but needs to get done. You can think of it as an assistant you always need to check on but since you are an expert you can recognize when there is a problem.

Lately, I've revised my view a bit. AI can get novices started on getting a project done and it can be a source of information for the project. Googling works but I think it's much faster querying an LLM. It's good for certain types of projects but I wouldn't trust it for important projects unless I can double check but the double checking will make it a drag to use. I think LLMs do have a place for novices but you have to be careful.


Is it really "teaching" that is being conveyed? I mean, yes, you get a suggestion for a mostly correct solution, or it points in the right direction. But how much has your brain really been reprogrammed and "learned" in the process? I fear that our problem-solving skills are somehow short-circuited and we rely too much on LLMs to spit out the most likely solutions. Alternatives, hard thinking and exploration may be left behind. So in what discipline will the inexperienced become more experienced? Prompt engineering, copy-pasting, short-cut action … but what about critical thinking, understanding and meta-learning?


I think you get as much, or even more, as when asking someone more senior. You get the benefit of "scaffolding" when working on a task at the limit of your ability, and it's probably even better than asking a human, since you're forced to stay on your toes, in case it's hallucinating.


This exactly. I have had to correct ChatGPT so many times when asking it coding questions. It often gives answers that are flat out deprecated, partially working, or flat out incorrect.


I use both local ML models and LLMs extensively. LLMs are extremely flexible but can cause headaches if you trust the answers too much. They do help in some situations, especially quickly spitting out Python data-massaging scripts (after 5-10 back-and-forths). But they get it done; I could write the script myself, but I'd rather save my energy for problems LLMs can't solve.

For simple things LLMs can excel but as you add complexity, you really need to double check


For code snippets it often takes a lot of iterations with ChatGPT prompts to get something that satisfies the demands and doesn't immediately crash due to bugs. Beyond the obvious influencer types, claims of boosted productivity are largely thanks to being spared most of the mental grunt work required to write this code, so people can code for longer sessions before temporarily burning out, in my opinion.


I tried ChatGPT to produce some Ansible playbooks for the postgresql_privs module - that module uses, or used to use, rather convoluted logic, and I could never exactly wrap my head around it (and I had a rather non-trivial use case for it, so maybe that contributed to my struggles).

The produced playbooks passed the syntax check, that much I can admit. It went downhill from there, producing files with completely invalid keywords.


Looks like they were talking mostly about call / contact centers.

I'm assuming you are talking more about coding, in which case I agree.


I concur. Sometimes it’s 15%. Normally closer to 5-10%. That’s still worth paying for IMO.


Use that extra 10% efficiency to take 5 more weeks off.

10% is disruptive:)


Are you using GPT-3.5 or GPT-4?

GPT-4 is not immune to these issues, but it does much, much better.

Whenever I exceed GPT-4 quota (OpenAI has a hardware shortage) and switch to GPT-3 I remember how useless it is and I'm better off waiting a few hours for more GPT-4 quota.


It is not intelligent. Let's just accept it. Ask what happens if you go due east exactly from NYC in a boat: eventually you'll hit land, so where does it hit? It totally collapses with false information. Of course you can ask what parallel NYC is at and ask for the westernmost point of Europe at the same parallel, and it'll maybe get it right.

Give it basic directions: I'm in my car turning right on Spring Garden onto Broad Street. I travel north 10 miles on the same road. Where am I? It has no clue.


Every time someone posts something like this, it’s amusing because they clearly don’t try it in GPT4 before they post.

GPT4 nails your question about going east from New York in a boat, even though I think that’s a very poor test of ‘intelligence.’

I have no idea what you’re asking with your second question (what city are you talking about? There is no context), so I guess I, too, am not intelligent.


What do you think that intelligent means? Able to know things it hasn’t learned? I don’t think it has been trained on localization, geography or way-finding.


https://imgur.com/qAX6w9S

> My prompt:

What happens if you go due east exactly from NYC in a boat. Eventually you'll hit land where does it hit?

Start your response with "Let's work this out in a step by step way to be sure we have the right answer:" and work this out in a step by step way to be sure you have the right answer

> GPT-4 answer:

Let's work this out in a step by step way to be sure we have the right answer:

1. Firstly, the direction "due east" means you would be following a line of latitude, rather than a straight line on a globe. This is because the Earth is a sphere (more accurately, an oblate spheroid) rather than a flat surface.

2. New York City is located approximately at 40.71 degrees North latitude. If you were to go due east from NYC, you would be following this line of latitude across the Atlantic Ocean.

3. The first landmass you would likely encounter on this journey would be the westernmost points of mainland Europe, specifically the Iberian Peninsula which comprises Spain and Portugal. However, due to the curvature of the Earth and the specific latitude of New York, it is more likely that you'd hit the northern coast of Spain, around the region of Galicia.

4. So, based on this understanding, if you started in New York City and traveled due east, you would likely end up hitting the northern coast of Spain.

However, it's important to note that this answer assumes idealized conditions and that you're strictly adhering to the 40.71-degree North line of latitude. In reality, ocean currents, winds, and other factors might cause a vessel to drift from this line. Plus, the exact landfall can vary depending on the precise starting point in New York City, which is spread out over a fairly large geographical area.

---

https://imgur.com/ExRr1YN

> My prompt:

I'm in my car turning right on spring garden onto Broad street. I Travel north 10 miles on the same road. Where am I?

Start your response with "Let's work this out in a step by step way to be sure we have the right answer:" and work this out in a step by step way to be sure you have the right answer

> GPT-4 answer:

Let's work this out in a step by step way to be sure we have the right answer:

1. First, we need to identify where Spring Garden Street and Broad Street intersect. These are streets located in Philadelphia, Pennsylvania.

2. From the intersection, you said you would be turning right onto Broad Street, indicating that you are now heading north on Broad Street.

3. You then proceed to travel 10 miles north on Broad Street.

Determining the exact location after these steps is a bit more challenging because street maps generally don't provide mileage measurements. The scale can vary, and the actual distance can depend on the specific path taken.

However, to give a rough idea, Broad Street in Philadelphia is approximately 13 miles long from its southern end to its northern end. Given that Spring Garden Street is relatively central, traveling 10 miles north could possibly place you near or past the city limits of Philadelphia, but without more detailed map information, it's difficult to be more precise.

You'd likely end up in the vicinity of Cheltenham or Elkins Park, which are northern suburbs of Philadelphia, if the distance traveled is relatively straight, but again this could vary based on the specific path taken and the precise start location. This is a rough estimate and the exact location may vary. For precise navigation, it's recommended to use GPS or a reliable map application.


This is exactly the opposite of my experience with it.

Getting it to code, it can save me, a fairly experienced coder, quite a lot of time because it can do some boring things - boilerplate stuff, or things I can’t be bothered to look up. And I can tell if it’s done it right or not.

I’ve seen people with less coding experience use it and just take its suggestions at face value with painful results and definite detriment to their learning.

The best test I’ve seen for what it’s good at is things that are hard to do, but easy to check.


The study in the article was on call center workers so I wouldn't necessarily expect it to scale to more complex tasks.


I like to use it as a jumping off point for a big project. We’re migrating from server round trip navigation to client side navigation and there’s a lot of steps to think through. It helps me get past the “blank paper” phase much faster.


I think my best experience with chatgpt is when I was digging around a typescript library without knowing the language. it was very convenient to enter type signatures and learn what they mean. I rarely am doing work like this where it’s worth opening chatgpt though.


My experience is similar. It’s a multiplier. The more capable you already are, the more you can get out of it. Perhaps not surprising in that all tools work this way.


Claims like this are so dumb. Call center work has a very low ceiling on skill, so obviously people who are maxed out can't benefit.

But in any knowledge economy type job, there is no ceiling on skill. Creative, curious, intelligent people who are already high skill are also going to be the most effective at using a new and weird and tricky tool in interesting ways to develop their skills.

Generative AI will make some old skill obsolete, but people who are generically good at picking up new skills will be much more effective at wielding it. Those are the same people who were good at picking up the old skills, too.


My experience building AI powered tools for scientists has been the opposite.

Other commenters have pointed out the difficulty users with limited domain expertise have in parsing the output and distinguishing what is useful, novel, or interesting from what is trivial or incorrect.

This slows down the iterative process of learning how to use the tool as well as refining queries because they don’t have a big enough database of ground truth knowledge to calibrate against.

That being said these can work well as educational tools to rapidly get non-experts to a point where they can participate in conversations with experts. In that regard one might say that the bulk of the value is in the non-expert users because they are far more prevalent than experts.

The biggest advantage I saw for the expert researchers was that they knew (1) which big questions to ask, and (2) how to break down those big questions into smaller, more precise questions.


"The co-authors found that contact center agents with access to a conversational assistant saw a 14% boost in productivity, with the largest gains impacting new or low-skilled workers. In other words, the workers were upskilled, not replaced, thanks to the technology."

I don't think the paper matches the assertion made in the title of this article - it's talking about a very specific use-case for generative AI, which won't necessarily generalize to "in all cases, workers with less experience gain the most from these tools".


Reading the comments here made me think of a dishwashing analogy.

I grew up without a dishwasher (when we asked our parents why we didn't have one my dad responded, "what do you mean? We have six!" referring to us kids). As an adult we have one in the house but it is rarely used. Washing dishes is a menial task but I enjoy it. It's hard to articulate exactly why. Needless to say I'm quite an experienced dish washer, there's an art to it. My process is to collect any dishes that aren't already stacked, fill the sink with just enough water, wash everything well (cutlery in first, out second last, washed individually), then wipe down all the benches and sink. I leave the dishes to drip dry. They get put away later.

When we use the dishwasher I still have to do most of the steps above, except the washing, which is 40-120 mins of free time for me. But then I have an extra step of checking each dish or piece of cutlery while I'm putting it away. Most things get washed well, but about 10-15% of items have food baked onto them that I then need to soak or rewash (with a more abrasive scrubber than I'd usually use). Maybe this is down to my lack of experience stacking? The whole process takes longer, but it's (arguably) lower touch and (I hear) uses less water.

Which is better? In a commercial kitchen a dishwasher for sure, efficiency at all costs.

But do we need to remove all menial tasks from our workflows? I'm not so sure.


I've gone back and forth with dishwashers my whole life, spending years with followed by years without, etc. and I finally am team dishwasher. There are a few tips I've learned, such as immediately rinsing a dish after using (I don't bother scrubbing), running the hot water for a minute before starting, using the pre-wash detergent and shine stuff appropriately. I also don't bother entirely filling it up to run it - which may seem wasteful, but I'd imagine filling it completely and having to rewash some percent of them ends up being more wasteful.

Also, just like getting a manual dishwashing routine figured out, dishwashers are all different and need to be learned like any tool. Understanding where pieces are, which cycles run when, which compartments open and close and even how they do it, are all important. It really is best to think of it as a tool.

Some useful videos:

https://youtu.be/_rBO8neWw04 https://youtu.be/Ll6-eGDpimU


Thanks for the thoughtful advice. Agree on it being tool, same as LLMs. I guess my point, beyond the practical, was that there are some aspects to the tool-less experience that are hard to replace.

I'm thinking patience and, I guess, the other meta skills and behaviours that aren't directly related to the washing itself.

That said, your pro-dishwasher explanation talks to some of these too.


If you have 10-15% of items with food stuck on them after a wash, you need either a better dishwasher, better detergent (there is a noticeable range in quality) or both.


It's quite a high quality dishwasher .. multi-drawer job. But I will look into the detergent :)


It should be pretty obvious; it's true for devs as well. ChatGPT helped me greatly with getting started with Qt/QML because I'd never used it before; it's fantastic for exploration. I've worked about a decade in Python, and there it does little for me. In fact the code can often look very unidiomatic. It makes sense: these models are large and generic, so whether you're above or below an average level of knowledge usually determines how much you learn from the output.

When people claim that technology disadvantages workers it usually just reflects their prejudices about working people. So called knowledge workers like to think only they can leverage tech while the call center workers are fixed automatons. But in reality people at a lower skill ceiling have a much easier time leveling up.

ChatGPT when it works well is basically a Young Lady's Illustrated Primer (https://en.wikipedia.org/wiki/The_Diamond_Age). Giving underskilled people free access to information will always benefit them the most.


This makes sense to me. When using an LLM for programming it’s the most useful when teaching me the basics of a new language or framework. For example, it’s very helpful for giving me introductory information on x86-64. But much less useful at writing recursive higher-kinded types in TypeScript.


Yep, it's been great for me for knocking together scripts in Python where I know there will be a way to do something like open a csv file and read it into a list of dicts, but am not familiar enough with the ecosystem yet to know exactly what packages might be useful or how to invoke them. I could work the same thing out from stack overflow, but it would take a lot longer.
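For reference, the snippet in question is tiny once you know which module to reach for ("data.csv" is a placeholder):

  import csv

  # Read a CSV file into a list of dicts, one per row, keyed by the header row.
  with open("data.csv", newline="") as f:
      rows = list(csv.DictReader(f))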


But did it do you any favours? The incredibly well-written official Python csv doc manages to suggest using iterators rather than loading what may eventually be too much data into a list, and points out that there are dialects of CSV and some common ones you might use. https://docs.python.org/3/library/csv.html


If it works, and it's a one-off script, why do I care?


It could be that folks who perhaps aren't too married to their current workflow are adapting to this newer paradigm faster?

For example, for places where English isn't the first language, LLMs are widely used for content-based tasks (posts, emails, support), because waiting 10s for Bard or GPT to output something higher quality is worth their time. Perhaps these folks end up being more attuned than usual, more in harmony and sync with the LLMs?

Speaking from experience, it took a while for me to move from vim to vscode, from Javascript to Typescript. We probably prefer our silos and feel uber comfortable in existing setups.


Lifehack, add this to the end of your question:

  Start your response with "Let's work this out in a step by step way to be sure we have the right answer:" and work this out in a step by step way to be sure you have the right answer
It will improve your answers significantly (for example, on word math problems, solve rate went from 18% to 79%).

Source: https://github.com/openai/openai-cookbook/blob/main/techniqu...


Currently, it's basically entry level for anything mildly complicated and it can't troubleshoot itself, so it needs to be coached into giving the right answer. So it's great for boilerplate and API lookups but if you were using a typed language before and had a decent IDE, this was already a solved problem. It seems impressive because it's very fast but that advantage goes away when you then have to be like, "hmm, well that's not quite right" and then you coach it into being correct (or close enough that you stop).

If your team can't autocomplete APIs, and if you can't run everything locally so you can step through your code with a debugger, and if you don't have good tests so you need an AI to tell when you broke something (see also profiling, telemetry, release quickly and confidently, etc) - you're going to get a million percent more mileage just doing these sort of basics correctly (and fast) vs investing in AI because the open secret is that most of engineering time is spent reading and debugging code rather than writing it. This should be table stakes but so many engineering teams (even at bigtech companies) can't do the above. So they're all signing up for copilot but they still iterate with logging statements and catch regressions in production.


My usage of ChatGPT is as follows:

1. Google replacement for simple queries. Like "how do I HTTP POST with go?". Saves like 30 seconds of browsing.

2. Google replacement for vague questions. Like "Which ingress controllers are popular and what's their strengths".

3. Procrastination eliminators. Like "I need an identifier for ..., can you suggest me some proper names?".

4. Boring tasks which could be solved with enough bash scripts or regexes, but with ChatGPT it's often faster.

5. Most impressive to me: "I want to format the following YAML according to the following rules: <vague imprecise rules>. Example: ....". This sometimes saves me really lots of time, because it acts like a human and able to follow imprecise instructions. I could either do it myself spending lots of time or write some formatter spending, again, lots of time.

I was never able to utilize it for "smart" tasks, like if I couldn't solve something and it solved it for me. It's not smart yet. But it's a useful tool in my toolbelt, definitely worth $20, and it saves me time and sanity on some boring tasks.


In my personal experience chatgpt-4 has been a great addition to the toolbox, both for coding assistance and research. You do have to have sufficient grounding in the subjects to be able to evaluate the responses critically though.

However, in all cases I have seen a very substantial decline in the capabilities of chatgpt-4 over the last few releases. E.g. it used to get code snippets right most of the time before, whereas now it tends to be wrong most of the time. Usually it conflates capabilities of several distinct libraries, just hallucinating (extrapolating) some non-existent functions or attributes.

I personally suspect they are 'cleansing' the training data, and/or driving severe 'schizophrenia' into the model through conflicting RLHF.


It is fascinating how far the article is from the paper. This all looks set up just to market something. First of all, the paper only provides suggestive(!) evidence that customer support agents(!) work faster and better if AI is writing the text instead of them (what a breakthrough). The researchers seem to be affiliated with a "Fortune 500 enterprise software company that provides AI-based customer service support software", and half of the paper is spent describing what AI is and how it helps.


Your standard full-stack dev is going to have "less experience" with approximately 80% of her codebase, by dint of having a finite life-span. ChatGPT is a faster and more convenient SO in that (very common) case. I, for example, have used it with great effect writing bash scripts that I would not dare to write otherwise. It's also not bad at bashing out the first use of a new library. I mean, I have ended up throwing that code out 100% of the time, but it's still useful.


I guess I should change jobs to something I am not good at


ah their test environment was a call center

feels like there is a viewpoint floating around that chatgpt is valuable in part because of the RLHF work

but does this make it generally capable? or just prove that you can hire essentially the people you would hire to staff a call center (language skills + good generalists) to create a language model that behaves essentially like a call center (refuses to take any actions that are not allowed)


Using ML tooling can be incredibly useful for new and experienced folks, but in different ways. New folks are more likely to ask questions they'd otherwise feel uncomfortable asking, but are far more at risk of being led down the wrong path and skipping over gaining a fundamental understanding of how things work. It's a balance of risk vs reward.

I've been in tech ~18 years and get a lot out of Copilot and chatGPT every day.


As others are saying/hinting, workers with less ability to vet outputs may overlook crucial errors.

Take for example this automated help that I found only minutes ago - and seems to prove that generators can be racist (...I am joking):

https://ibb.co/v3VRTQQ


These are results from a model deployed from 2020 to 2021. Early GPT-3 days. We've come a long way since then.


One big difference between googling and asking an LLM is that it tends to be much easier to judge the legitimacy of information from a google result through informal heuristics. This is because google just links to a website, and there are all sorts of other factors that you use to subconsciously rank how reliable information on a particular website is, that mostly have nothing to do with the raw information.

For example, we all have different weights for how much we trust information from, e.g., Wikipedia, Stack Overflow, high-profile bloggers in your domain, conspiracy websites, listicles, blog-spam sites, marketing material, etc. If the info is on a social site, is it highly upvoted? Is the info from an academic journal, arxiv.org, an academic blog, or a Word doc you downloaded from Scribd [1]?

Yes, this is judging a book by its cover, but it's a heuristic that tends to work well.

By contrast, LLMs present all information to you with the same confidence in the same homogeneous interface. There is no external context. So all your normal heuristics for judging reliability are broken.

[1] https://xkcd.com/1301/


As a frontend dev of 12 years, I’ve been getting a lot of utility out of it helping me write Jenkins pipeline code. And learning a lot of groovy, Java and Jenkins patterns along the way.


It can't even get regular expressions correct in my experience


I've had it fail to get a basic regex right, but simultaneously found it able to model my custom problem using the z3 constraint solving framework, which I'd have struggled to do myself. Regular expressions are probably harder for it to get right easily, at least without prompting it to first think through, step by step, what each piece of a regex should be, or something.
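For context, the z3 usage in question boils down to a handful of lines of this shape; the toy constraints here are invented, not my actual problem:

  from z3 import Int, Solver, sat

  x, y = Int("x"), Int("y")
  s = Solver()
  s.add(x + y == 10, x > 2 * y, y >= 0)   # stand-in constraints
  if s.check() == sat:
      print(s.model())                    # e.g. [y = 0, x = 10]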


imo this is the most human thing about it.


The question is if companies still hire less experienced people or if the experienced ones are equally productive without them because of ChatGPT.


The title is misleading. The experience of call center workers cannot be generalized to all workers.


To be clear, this is about contact center agents and other jobs of a related level.


Lose the most, because they're going to learn less, and do less, themselves.


I tried it with an old, less popular language, Progress 4GL, and it just invented pure bullshit answers.

When I asked "are you sure about that?" it owned up and said sorry. wtf?



