We were using 500+ GB of memory at peak and expected that to grow. If I remember correctly, we didn't go with Polars because we needed to run custom apply functions on DataFrames. Polars had them, but the function took a tuple (not a DataFrame or dict), which makes for really error-prone code when you've got 20+ columns. Dask and Spark both supported a batch transform operation, so the function took a pandas DataFrame as both input and output.
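To give a rough idea of the batch-transform style, here's a minimal sketch using Dask's map_partitions (the column names and the transform itself are made up for illustration). Each partition arrives as an ordinary pandas DataFrame, so you can reference columns by name instead of unpacking a long tuple by position.

```python
import pandas as pd
import dask.dataframe as dd

def enrich(part: pd.DataFrame) -> pd.DataFrame:
    # The partition is a plain pandas DataFrame, so columns are
    # addressed by name rather than by tuple index.
    part = part.copy()
    part["margin"] = part["revenue"] - part["cost"]  # hypothetical columns
    return part

pdf = pd.DataFrame({"revenue": [100.0, 250.0, 80.0],
                    "cost":    [60.0, 200.0, 90.0]})
ddf = dd.from_pandas(pdf, npartitions=2)

# meta tells Dask the output schema without computing anything yet.
result = ddf.map_partitions(enrich, meta=enrich(pdf.iloc[:0]))
print(result.compute())
```

With 20+ columns, referencing `part["some_column"]` by name is much harder to get wrong than keeping track of which tuple position holds which field.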