
Agreed. I would also point out maintainability. How do you test some SQL logic in isolation?

Additionally, in this day and age where enriching the data by running it through some ML model isn't that rare, doing it in SQL by exposing the model through an API and invoking a UDF on a per-row basis is extremely inefficient because of network RTT. In my opinion it is much better to use something like Apache Beam: load the model into the memory of your workers and run predictions "locally" on batches of data at a time.
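
To make the per-row vs. batch point concrete, here is a minimal plain-Python sketch (not Beam code). `ToyModel`, `enrich_per_row`, `enrich_in_batches` and `batched` are hypothetical names made up for illustration, and the call counter merely stands in for the network round trips a per-row UDF would pay.

```python
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive lists holding at most batch_size rows each."""
    batch: List[float] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


class ToyModel:
    """Hypothetical stand-in for an ML model held in worker memory."""

    def __init__(self) -> None:
        self.calls = 0  # counts predict() invocations

    def predict(self, xs: Sequence[float]) -> List[float]:
        self.calls += 1
        return [2.0 * x + 1.0 for x in xs]


def enrich_per_row(model: ToyModel, rows: Sequence[float]) -> List[float]:
    # One invocation per row, analogous to a UDF calling an API for every row.
    return [model.predict([r])[0] for r in rows]


def enrich_in_batches(model: ToyModel, rows: Sequence[float], batch_size: int) -> List[float]:
    # One invocation per batch, analogous to running predictions "locally".
    out: List[float] = []
    for batch in batched(rows, batch_size):
        out.extend(model.predict(batch))
    return out
```

With 10 rows and a batch size of 4, the batched version invokes the model 3 times instead of 10 and returns the same values. If every invocation were a network call, that difference is the RTT cost I mean.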

On the other hand, I see the value in expressing "simple" logic in SQL, especially when joining a series of tabular sources. That's why I am super happy with the Apache Beam SQL extensions (https://beam.apache.org/releases/pydoc/2.30.0/apache_beam.tr...), which, IMHO, give you the best of both worlds.



> How do you test some SQL logic in isolation?

I do this using SQL:

1. Extract an 'ephemeral model' into a separate model file.

2. Mock out this model in the upstream model in unit tests: https://github.com/EqualExperts/dbt-unit-testing

3. Write unit tests for this model.
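
The same idea can be sketched outside dbt. This is a rough illustration only, not the dbt-unit-testing API: it uses Python's built-in `sqlite3` module, and the `orders` table, `MODEL_SQL` query and `run_model` helper are all made up for the example.

```python
import sqlite3

# Hypothetical logic under test: total order amount per customer.
MODEL_SQL = """
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""


def run_model(mock_orders):
    """Run MODEL_SQL against an in-memory table filled with mocked rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # The mocked source table replaces the real input of the query.
        conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", mock_orders)
        return conn.execute(MODEL_SQL).fetchall()
    finally:
        conn.close()


def test_totals_per_customer():
    # Input and expected output are both small tables.
    mock_orders = [(1, 10.0), (1, 5.0), (2, 7.5)]
    expected = [(1, 15.0), (2, 7.5)]
    assert run_model(mock_orders) == expected
```

The input rows and the expected rows are both plain tables, which is why I find these tests easy to read.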

This is no different from regular software development in a language like Java.

I would argue it's even better, because the unit tests are always in tabular format and pretty easy to understand. Java unit tests, on the other hand, are never read by devs in practice.

> in this day and age where enriching the data by running it through some ML model isn't that rare,

Still pretty rare. This constitutes a very minor percentage of ETL in a typical enterprise.


> Java unit tests, on the other hand, are never read by devs in practice.

You should get to know better Java developers. :-)


Haha, same! People saying SQL is unreadable need to get to know better SQL developers.

SQL is so pretty, and so very readable and maintainable, if written by people who know what they are doing.



