I think we need much better benchmarks to capture the real complexity of typical day-to-day development.
I gave it my typical CI bootstrapping task:
> Generate gitlab ci yaml file for a hybrid front-end/backend project. Frontend is under /frontend and is a node project, packaged with yarn, built with vite to the /backend/public folder. The backend is a python flask server built with poetry. The deployable artifact should be uploaded to a private pypi registry on pypi.example.com. Use best practices recommended by tool usage.
and it generated scripts with docker run commands [1]:
This feels more like "connect the dots", or a very rough sketch that might end up completely replaced. The commands themselves seem fine (yarn install && yarn build, poetry build && poetry publish), but the docker run wrappers would be better expressed simply as an "image:" attribute on each job. When I asked about that, I got a generic "why docker is useful" non-answer.
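To illustrate what I mean by the "image:" attribute: in GitLab CI each job can declare the container it runs in, which removes the docker run wrappers entirely. A minimal sketch (job names and image tags are my own, not what the model emitted):

```yaml
# Each job declares its own container image; no `docker run` wrapper needed.
build-frontend:
  image: node:20        # current LTS, rather than the node 14 the model picked
  script:
    - cd frontend
    - yarn install
    - yarn build

build-backend:
  image: python:3.12
  script:
    - pip install poetry   # poetry isn't in the base python image
    - cd backend
    - poetry build
```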
It also introduced a parallel build stage: the frontend and backend are built at the same time, even though my prompt deliberately included a serial dependency: the frontend output goes into the backend project. The parallel approach would of course be better if the pipeline correctly assembled the final artifact before uploading, but it doesn't. Somewhat surprisingly, the node install and poetry install could actually run in parallel as written, yet the generated code runs them serially.
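The serial dependency I had in mind can be made explicit in GitLab CI with "artifacts:" and "needs:": the frontend job publishes its vite output, and the backend job waits for it and downloads it before packaging. A sketch, again with illustrative job names:

```yaml
stages:
  - build
  - package

build-frontend:
  stage: build
  image: node:20
  script:
    - cd frontend && yarn install && yarn build   # vite configured to emit to ../backend/public
  artifacts:
    paths:
      - backend/public/

build-backend:
  stage: package
  image: python:3.12
  needs: ["build-frontend"]   # explicit ordering: fetches the frontend artifacts first
  script:
    - pip install poetry
    - cd backend && poetry build
```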
It also uses outdated versions of tools. Python 3.8 still seems OK and shows up in many online examples due to compatibility quirks with compiled libraries, but node 14 is more than 3 years old now; the current node LTS is 20.
For comparison, here's the chatgpt4 version [2]:
Not perfect, but it catches a lot more nuance:
- Uses python as the base image, but adds node to it (not a big fan of installing tools during the build, but at least it took care of that set-up)
- Took care of passing the artefacts built by the frontend, and explicitly navigates to the correct directories (cd frontend ; ... ; cd ../backend)
- The --no-dev flag passed to `poetry install` is a great touch
- Added "artifacts:" for a good troubleshooting experience
- Gave the job an "only: main" qualifier, so it at least considered a branching strategy
- Disabled virtualenv creation in poetry. I'm not a fan, but it makes sense on CI
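Taken together, the practices in that list sketch out roughly this kind of job. To be clear, this is my own reconstruction, not either model's output: the image tag, the apt-based node install, and the `pypi-private` repository alias are all illustrative assumptions (the alias would need `poetry config repositories.pypi-private https://pypi.example.com` plus credentials configured separately):

```yaml
build-and-publish:
  image: python:3.12                 # python base image, node bolted on below
  only:
    - main                           # branch qualifier, as in the generated file
  script:
    # node added onto the python base image (the "installing tools during build" caveat)
    - apt-get update && apt-get install -y nodejs npm
    - npm install -g yarn
    - pip install poetry
    - poetry config virtualenvs.create false      # no virtualenv on CI
    - cd frontend && yarn install && yarn build && cd ../backend
    - poetry install --no-dev                     # skip dev dependencies
    - poetry build
    - poetry publish --repository pypi-private    # illustrative alias for pypi.example.com
  artifacts:
    paths:
      - backend/dist/                # keep the built wheel/sdist for troubleshooting
```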
I would typically add even more complexity to that file (for example, using commitizen for releases), and only with gpt4 do I feel confident it won't fall apart completely.
EDIT: Yes, gpt4 did ok-ish with releases. When I pointed out some flaws, it responded with:
> You're correct on both counts, and I appreciate your attention to detail.
Links:
- [1] https://www.phind.com/agent?cache=clsye0lmt0019lg08bg09l2cf
- [2] https://chat.openai.com/share/67d50b56-3b68-4873-aa56-20f634...