
I'm currently working on it :)

See you in two weeks I hope


anything we can see already?


In the short term, these kinds of investments can hype up a stock and create a small bump.

However, in the long term, as the hype dies down, so will the stock prices.

At the end of the day, I think it will be a transfer of wealth from shareholders to Nvidia and power companies.


I just wish that AMD (and, pie in the sky, Intel) had gotten their shit together enough that these flaming dumptrucks full of money would have actually resulted in a competitive GPU market.

Honestly, Zuckerberg (seemingly the only CEO willing to actually invest in an open AI ecosystem for the obvious benefits it brings them) should just invest a few million in hiring a few real firmware hackers to port all the ML CUDA code into an agnostic layer that AMD can build to.


Groq seems to be well positioned to give Nvidia a run for their money, actually.


> as the hype dies down, so will the stock prices.

*Depending on govt interventions


I strongly disagree.

I worked at a company with a world-class QA team. They were amazing and I can't say enough nice things about them. They were comprehensive and professional and amazing human beings. They had great attention to detail and they catalogued a huge spreadsheet of manual things to test. Engineers loved working with them.

However -- the end result was that engineers got lazy. They were throwing code over to QA while barely testing it themselves. They were entirely reliant on manual QA, so every release bounced back and forth several times before it shipped. Sometimes, we had feature branches being tested for months, creating HUGE merge conflicts.

Of course, management noticed this was inefficient, so they formed another team dedicated to automated QA. But their coverage was always tiny, and they didn't have the resources to cover every release, so everyone wanted to continue using manual QA for CYA purposes.

When I started my own company, I hired some of my old engineering coworkers. I decided to not hire QA at all, which was controversial because we _loved_ our old QA team. However, the end result was that we were much faster.

1. It forced us to invest heavily in automation (parallelizing the bejesus out of everything, so it runs in <15min), making us much faster

2. Engineers had a _lot_ of motivation to test things well themselves because there was no CYA culture. They couldn't throw things over a wall and wash their hands of any accountability.

We also didn't end up with the lack of end-to-end tests the author alludes to. Almost all of our tests were functional / integration tests that ran on top of a docker-compose setup that simulated production pretty well. After all, are unit tests where you mock every data source helpful at all? We invested a lot of time in making realistic fixtures.
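For concreteness, a minimal sketch of the kind of test we leaned on (the schema and names here are made up): the database is the real Postgres from docker-compose, seeded with realistic fixtures, not a mock.

    import os

    import psycopg2
    import pytest

    @pytest.fixture
    def db():
        # docker-compose exposes a throwaway Postgres for the test run.
        conn = psycopg2.connect(os.environ.get(
            "TEST_DATABASE_URL", "postgresql://app:app@localhost:5433/app_test"))
        yield conn
        conn.rollback()
        conn.close()

    def test_invoice_total_includes_tax(db):
        with db.cursor() as cur:
            cur.execute("INSERT INTO invoices (customer_id, subtotal_cents, tax_cents) "
                        "VALUES (%s, %s, %s)", (42, 10000, 825))
            cur.execute("SELECT subtotal_cents + tax_cents FROM invoices "
                        "WHERE customer_id = %s", (42,))
            assert cur.fetchone()[0] == 10825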

Sure, we released some small bugs. But we never had huge, show stopping bugs because engineers acted as owners, carefully testing the worst-case scenarios themselves.

The only downside was that we were slower to catch subtle, not-caught-by-Sentry bugs -- things like UX transition weirdness. But that was mostly okay.

Now, there is still a use case for manual QA in higher-risk domains -- it's a question of risk tolerance. However, most applications don't fall into that category.


This phenomenon is very well described by Elisabeth Hendrickson’s “Better Testing, Worse Quality” article.


False dichotomy. Poor dev practice isn't fixed by eliminating QA; it's fixed by improving dev practice. The "five whys" can help.


If you're working without a net, you're going to be more careful. And 5 whys is not a particularly great overall practice. https://qualitysafety.bmj.com/content/26/8/671


You're also more likely to die when mistakes inevitably happen. Dismissing reason is another strategy not likely to work well in your favor.


I used to do a similar thing, then I realized it was a potential problem.

Let's say you have an account at AcmeCo. Let's say AcmeCo has a breach and I can see your password hash. Let's say the company uses a weak password hash (e.g. MD5) or no salt, so it's easy to look up in a rainbow table.

From this rainbow table, I can look up your hash and see that your password is "lulzSecret2$AcmeCo".

Now let's say you're in another leak from BetaCo. Similar situation -- I see that your password is "lulzSecret2$BetaCo2". Maybe the trailing 2 is because you were forced to rotate your password once.

It doesn't take a genius to guess what your algorithm is.

But we can take it up another level. Maybe I'll try all the major banks and guess passwords using your algorithm ("lulzSecret2$bofa", "lulzSecret2$chase"). Most banks require 2FA, but most of the time they keep it to text-based 2FA.

If I know your phone number from one of the breaches (happens all the time), maybe I can hijack your SIM card (this also happens all the time) and boom, I'm into your bank account.
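To make it concrete, here's a rough sketch (purely illustrative; the site names and suffixes are made up) of the candidate generator an attacker could write once the pattern is obvious:

    # Purely illustrative: once two leaks reveal the "lulzSecret2$<site>" pattern,
    # generating guesses for other sites is trivial.
    base = "lulzSecret2$"
    sites = ["bofa", "chase", "wellsfargo", "citi"]
    tweaks = ["", "1", "2", "!", "2022"]  # common rotation suffixes

    candidates = [f"{base}{site}{tweak}" for site in sites for tweak in tweaks]
    for candidate in candidates:
        print(candidate)  # in practice, fed into a credential-stuffing tool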


Assuming the function is a cryptographically appropriate hash function, you can reduce the risk of the suggested attack to almost nil, considering the number of inputs you'd need for such an attack.
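For example, a minimal sketch (assuming a strong, secret master key; the names are made up and this isn't a substitute for a password manager): derive each site's password with HMAC-SHA256, so one leaked output says nothing about the others.

    import base64
    import hashlib
    import hmac

    def site_password(master_secret: bytes, site: str) -> str:
        # Unlike "base string + site name", an attacker who recovers one derived
        # password can't compute the others without the master secret.
        digest = hmac.new(master_secret, site.encode(), hashlib.sha256).digest()
        # Encode/truncate to fit typical site rules (illustrative only).
        return base64.urlsafe_b64encode(digest)[:20].decode() + "!1"

    master = b"a long, random, never-reused master secret"
    print(site_password(master, "acmeco.example"))
    print(site_password(master, "betaco.example"))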


Reading the docs, I'm a fan of your authentication / session management. When anyone pushes JWTs on me, I get sad. Session revoking is a necessary part of any authentication system and making a Redis call doesn't take that long.
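A minimal sketch of that revocation point (Python + redis-py purely for illustration; the names are made up): the per-request check is one fast Redis round-trip, and revoking is just deleting a key.

    import redis

    r = redis.Redis(decode_responses=True)
    SESSION_TTL = 60 * 60 * 24  # 24h

    def create_session(session_id: str, user_id: str) -> None:
        r.setex(f"session:{session_id}", SESSION_TTL, user_id)

    def get_user(session_id: str):
        # None means expired or revoked -- something a bare JWT can't tell you.
        return r.get(f"session:{session_id}")

    def revoke_session(session_id: str) -> None:
        r.delete(f"session:{session_id}")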

However, the downside of custom session managers is that other services might not be able to read/write the created session. For example, I'm currently trying to get off of Express and onto Fastify. Unfortunately, @fastify/session isn't perfectly compatible with express-session (although I'm working on it). I would have a similar issue if I introduced Next.js + Blitz... Sometimes, I wish there was a shared protocol for sessions between languages/libraries!


As an aside, this is the first time I've heard about Flightcontrol. Super impressed! The biggest con of something like Vercel is that you can't be on your own AWS VPC. An RDS instance with a public IP address (which Vercel's docs endorse) is a dealbreaker for me.

But... wouldn't a Terraform module accomplish something similar? Our own stack is something like Codepipeline + Fargate + ALB + Cloudwatch + Cloudfront and we basically just forked https://github.com/cloudposse/terraform-aws-ecs-web-app


As someone working on a roughly similar product (withcoherence.com), I would highlight a few key reasons to prefer a platform over a TF module:

- Maintainability: better to not have to maintain, audit, and improve the TF yourself, or keep it up to date with best practices.

- Multiple environments: how do you keep pipelines, CF distros, etc. up to date with all your active branches? Even more, enhancements like automatically using spot instances for test envs. Maybe you already did that improvement or plan to do it, but it's a good example of where a platform might get ahead of your own fork.

- Discoverability / SPOF on the team: how do you train a new dev on how to use these tools? It's much easier to train them on a nice UI. What if the person who forked that TF repo leaves the company?

- Integration with other environment types: Coherence is unique in this respect, but how does your cloud footprint map to CI/CD testing envs, development envs, etc.? Along with other open questions like how you provide SSH access to the team across environments when needed. More stuff to fork and maintain... With something like Coherence, all of these questions are answered for you in one sane way (we configure a Cloud IDE and Cloud Shell automatically for all your environments).

- Cross-cloud and cross-region support: migrating your app to another provider or service is easier if supported by the automation tools. Coherence supports GCP and AWS, for example.

All in all, buy vs. build is a tough question but generally SaaS wins once it is a viable option for real-world teams. In developer experience, we are still in the early innings on convincing folks it's worth it not to reinvent the wheel in-house.

Would love anyone interested to check out our free trial and feel free to ping hn@withcoherence.com with feedback or questions!


I think the whole point is you don't have to daisy chain a bunch of stuff. It's batteries included.


Open S3 Buckets as a service


Back at my old job in ~2016, we built a cheap homegrown data warehouse via Postgres, SQLite and Lambda.

Basically, it worked like this:

- All of our data lived in compressed SQLite DBs on S3.

- Upon receiving a query, Postgres would use a custom foreign data wrapper we built.

- This FDW would forward the query to a web service.

- This web service would start one lambda per SQLite file. Each lambda would fetch the file, query it, and return the result to the web service.

- This web service would re-issue lambdas as needed and return the results to the FDW.

- Postgres (hosted on a memory-optimized EC2 instance) would aggregate.

It was straight magic. Separated compute + storage with basically zero cost and better performance than Redshift and Vertica. All of our data was time-series data, so it was extraordinarily easy to partition.

It was also considerably cheaper than Athena. On Athena, our queries would cost us ~$5/TB (which hasn't changed to this day!), so it was easily >$100 for most queries, and we were running thousands of queries per hour.

I still think, to this day, that the inevitable open-source solution for DWs might look like this. Insert your data as SQLite or DuckDB into a bucket, pop in a Postgres extension, create a FDW, and `terraform apply` the lambdas + api gateway. It'll be harder for non-timeseries data, but you can probably make something that partitions on other keys.
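For a sense of how small each piece was, here's a rough sketch of the per-partition lambda (hypothetical names; the real service passed the rewritten query and S3 key in the invocation payload):

    import gzip
    import sqlite3
    import tempfile

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        bucket = event["bucket"]   # partition bucket
        key = event["key"]         # one compressed SQLite partition
        query = event["query"]     # partition-local SQL pushed down by the FDW

        with tempfile.NamedTemporaryFile(suffix=".db") as tmp:
            obj = s3.get_object(Bucket=bucket, Key=key)
            tmp.write(gzip.decompress(obj["Body"].read()))
            tmp.flush()

            conn = sqlite3.connect(tmp.name)
            try:
                rows = conn.execute(query).fetchall()
            finally:
                conn.close()

        # Postgres (via the FDW) handles the cross-partition aggregation.
        return {"rows": rows}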


We do something similar, but:

- instead of S3, we now use R2.

- instead of Postgres+Sqlite3, we use DuckDB+CSV/Parquet.

- instead of Lambda, we use AWS AppRunner (considering moving it to Fly.io or Workers).

It worked gloriously for a variety of analytical workloads, even if it's slower than it would have been on Clickhouse/Timescale/Redshift/Elasticsearch.
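For anyone curious what that looks like, here's a minimal sketch using DuckDB's Python bindings (we use the JS bindings, but the idea is identical; the bucket, paths, and credentials are made up): point httpfs at R2's S3-compatible endpoint and query Parquet in place.

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # R2 exposes an S3-compatible API at <account-id>.r2.cloudflarestorage.com
    con.execute("SET s3_endpoint='<account-id>.r2.cloudflarestorage.com'")
    con.execute("SET s3_region='auto'")  # R2 accepts "auto" as the region
    con.execute("SET s3_access_key_id='...'")
    con.execute("SET s3_secret_access_key='...'")

    rows = con.execute("""
        SELECT date_trunc('day', ts) AS day, count(*) AS events
        FROM read_parquet('s3://analytics/events/*.parquet')
        GROUP BY 1
        ORDER BY 1
    """).fetchall()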


How has your experience been with DuckDB in production? It is a relatively new project. How is its reliability?


For our scale and request patterns (easily-partitioned / 0.1 qps), no major issues, but the JavaScript bindings that I use (which are different from their wasm bindings) leave a lot to be desired. To DuckDB's credit, they seem to have top-notch C++ and Python bindings that even support the efficient memory-mapped Arrow format, which is purpose-built for cross-language / cross-process use, in addition to being a top-notch in-memory representation for Pandas-like data frames.

Granted, DuckDB is in constant development, but it doesn't yet have a native cross-version export/import feature (since its developers claim DuckDB hasn't reached enough maturity to stabilise its on-disk format just yet).

I also keep an eye on https://h2oai.github.io/db-benchmark/

As for Arrow-backed query engines, Pola.rs and DataFusion in particular sound the most exciting to me.

It also remains to be seen how Databricks' delta.io develops (it might come in handy for much, much larger data warehouses).


I've looked into this but saw hugely variable throughput, sometimes as little as 20 MB/second. Even at full throughput, I think S3 single-key performance maxes out at ~130 MB/second. How did you get these huge S3 blobs into lambda in a reasonable amount of time?


* With larger lambdas you get more predictable performance; 2GB RAM lambdas should get you ~90MB/s [0]

* Assuming you can parse faster than you read from S3 (true for most workloads?) that read throughput is your bottleneck.

* Set a target query time, e.g. 1s. That means that for queries to finish in 1s, each record on S3 has to be 90MB or smaller.

* Partition your data in such a way that each record on S3 is smaller than 90 MBs.

* Forgot to mention: you can also do parallel reads from S3; depending on your data format / parsing speed, that might be something to look into as well (rough sketch below).

This is somewhat of a simplified guide (e.g. for some workloads merging data takes time, and we're not including that here) but should be good enough to start with.

[0] - https://bryson3gps.wordpress.com/2021/04/01/a-quick-look-at-...
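And a rough sketch of the parallel-read idea from the last bullet (hypothetical names): split the object into byte ranges and fetch them concurrently, since a single GET usually won't saturate the lambda's network.

    from concurrent.futures import ThreadPoolExecutor

    import boto3

    s3 = boto3.client("s3")

    def fetch_range(bucket, key, start, end):
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        return resp["Body"].read()

    def parallel_get(bucket, key, chunk_mb=16):
        size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
        chunk = chunk_mb * 1024 * 1024
        ranges = [(start, min(start + chunk, size) - 1)
                  for start in range(0, size, chunk)]
        with ThreadPoolExecutor(max_workers=8) as pool:
            parts = pool.map(lambda r: fetch_range(bucket, key, *r), ranges)
        return b"".join(parts)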


How large were the SQLite database files you were working with here?

I've been thinking about building systems that store SQLite in S3 and pull them to a lambda for querying, but I'm nervous about how feasible it is based on database file size and how long it would take to perform the fetch.

I honestly hadn't thought about compressing them, but that would obviously be a big win.


Looks like you already answered my question here: https://news.ycombinator.com/item?id=31487825


Do you have a blog post or something similar where you go into more details on this architecture? I’d be very interested in reading it!


Just so you know I tweeted a link to this and it's got quite a bit of attention on Twitter: https://twitter.com/simonw/status/1529134311806410752


This sounds very similar to Trino’s (and by extension, Athena’s) architecture.

SQLite -> Parquet (for columnar instead of row storage)

Lambda -> Worker Tasks

FDW -> Connector

Postgres Aggregation -> Worker Stage

We run it in Kubernetes (EKS) with auto-scaling, so that works sort of like lambda.


Quick question, you on LinkedIn? Please send it my way


curious: how large was each compressed s3 sqlite db?


Sorry, it's been a while, I forget :(

We had to balance between making the files too big (which would be slow) and making them too small (too many lambdas to start)

I _think_ they were around ~10 GB each, but I might be off by an order of magnitude.



SeedFi | Engineers, Designers, PMs | SF, NYC, Atlanta or REMOTE (USA) | seedfi.com

SeedFi is a fintech startup that builds products for Americans living paycheck to paycheck. We're focused on improving our customers' financial health by helping them build savings and improve their credit score. Our products aim to permanently get people out of debt cycles. So far, the response we've seen has been really amazing: trustpilot.com/review/seedfi.com

SeedFi has about 40 employees and is growing fast. We raised over $34M from top tier VCs like A16Z and from major social impact funds and we've already helped our customers build millions of dollars in savings.

https://www.seedfi.com/jobs

You can also ask me questions via the email on my profile.


Think of it, billions of `0000-00-00 00:00:00`s!


The fact that their main product is "Elastic Cloud" and AWS EC2 means "Elastic Compute Cloud" is... unfortunate


In this case, EC2 precedes Elastic Inc. by 6 years and Elastic Cloud by 9, so can't blame Amazon for that one.

