
I've worked with many data scientists whose typical SQL usage was to load entire tables (or apply only simple WHERE filtering) into Pandas / R dataframes and then do all their work there. I think it's a combination of Pandas and R having much simpler APIs, a ton of documentation on Stack Overflow, and modern hardware being good enough that you can load a big enough chunk of your dataset in memory on a laptop.


I mostly use SAS, and I tend to prefer plain SQL queries. Where I typically depart from SQL and jump into code is for what SAS calls "BY-group processing" (example: https://support.sas.com/kb/26/013.html#)

I am not as familiar with R. The last time I worked in R (some years ago), the equivalent R code was something like this (caution: I'm no expert at writing R, so there might be a better / more intuitive way)...

library(dplyr); library(tidyr)
Output_data <- merge(x = T1, y = T2, by = "Date", all.x = TRUE) %>% mutate(My_var = NAME) %>% fill(My_var)

In SQL, the equivalent would need to use OVER (PARTITION BY), which is less intuitive for me to write.
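For what it's worth, here's a rough sketch of that fill-forward with window functions, reusing the placeholder names from the R snippet above (T1, T2, Date, NAME) and assuming a dialect that supports IGNORE NULLS (e.g. Oracle, Snowflake, BigQuery):

    -- left join, then carry the last non-null NAME forward down the Date order
    SELECT t1.Date,
           LAST_VALUE(t2.NAME IGNORE NULLS) OVER (
             ORDER BY t1.Date
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS My_var
    FROM T1 t1
    LEFT JOIN T2 t2
      ON t1.Date = t2.Date;

If there were by-groups you'd add PARTITION BY the grouping column inside the OVER clause, which is roughly what SAS's BY-group processing gives you for free.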


Hard for me to imagine anyone who finds the Pandas API more intuitive than plain old SQL. I can't do anything in Pandas without looking up syntax.


Guilty! I live in data.table in R, which is essentially an ideological implementation of SQL, but with much terser syntax.

https://cran.r-project.org/web/packages/data.table/vignettes...


Never feel guilty about using that superior workflow when your dataset can comfortably reside in memory.



