
I am not optimizing much for insertion speed, and I avoid GIN, GiST, and hash indexes.

The schema is nicely normalized.

I had trouble with hash indexes requiring hundreds of gigabytes of memory to rebuild.

Postgres B-Trees are painless and (very) fast.

E.g., querying one table by id (redacted):

    EXPLAIN ANALYZE SELECT * FROM table_name WHERE id = [ID_VALUE];

    Index Scan using table_name_pkey on table_name  (cost=0.71..2.93 rows=1 width=32) (actual time=0.042..0.042 rows=0 loops=1)
     Index Cond: (id =  '[ID_VALUE]'::bigint)
    Planning Time: 0.056 ms
    Execution Time: 0.052 ms
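
That plan comes from the primary key's implicit B-Tree. As a minimal sketch of the kind of DDL involved (the table and column names here are hypothetical), CREATE INDEX builds a B-Tree by default, while GIN, GiST and hash indexes have to be requested explicitly:

    -- Hypothetical table: the PRIMARY KEY constraint creates a B-Tree index automatically.
    CREATE TABLE events (
        id         bigint PRIMARY KEY,
        payload    jsonb,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- A plain CREATE INDEX is also a B-Tree unless another access method is named.
    CREATE INDEX events_created_at_idx ON events (created_at);

    -- The index types avoided above must be asked for explicitly, e.g.:
    -- CREATE INDEX events_payload_idx ON events USING gin (payload);
    -- CREATE INDEX events_id_hash_idx ON events USING hash (id);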

Here’s a zpool iostat 1:

    # zpool iostat 1
    operations     bandwidth 
     read  write   read  write
    -----  -----  -----  -----
     148K   183K  1.23G  2.43G
     151K   180K  1.25G  2.36G
     151K   177K  1.25G  2.33G
     148K   153K  1.23G  2.13G


Have you measured _transactions_ per second, for both "insert only" and "mixed insert/update/delete" workloads? That may be a more telling metric for the stress it puts on the system. There's a big difference between 1 transaction with 150k inserts and 150k transactions with 1 insert each, especially when the workload shifts to include updates in the middle of those.
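
If you haven't measured it yet, pgbench with two custom scripts is one way to get those numbers: one script doing only inserts, one mixing inserts, updates and deletes. A rough sketch, with a hypothetical events(id, payload) table and made-up file names; pgbench prints a tps figure at the end of each run:

    -- insert_only.sql (pgbench custom script)
    \set id random(1, 100000000)
    INSERT INTO events (id, payload) VALUES (:id, '{}') ON CONFLICT DO NOTHING;

    -- mixed.sql
    \set id random(1, 100000000)
    INSERT INTO events (id, payload) VALUES (:id, '{}') ON CONFLICT DO NOTHING;
    UPDATE events SET payload = '{"n": 1}' WHERE id = :id;
    DELETE FROM events WHERE id = :id AND :id % 10 = 0;

    -- Run each for 60 seconds with 16 client connections:
    --   pgbench -n -c 16 -j 4 -T 60 -f insert_only.sql mydb
    --   pgbench -n -c 16 -j 4 -T 60 -f mixed.sql mydb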

If you’re doing insert only, you might benefit from bulk-loading with COPY rather than pushing individual INSERTs through the typical query interface.
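
For reference, a minimal sketch of that approach using COPY (the table, columns and file path are hypothetical); \copy in psql does the same thing, reading the file on the client side:

    -- Server-side bulk load; the file must be readable by the Postgres server process.
    COPY events (id, payload, created_at) FROM '/path/to/events.csv' WITH (FORMAT csv);

    -- Client-side equivalent from psql:
    -- \copy events (id, payload, created_at) FROM 'events.csv' WITH (FORMAT csv)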


thank you for the response and stats! very cool



