
We have been working on how to track Y Combinator companies (and other incubators, for that matter) and I wanted to share a few lessons:

Identifying them in the first place is easy; even with a simple scraper you can save them to Notion, Google Sheets, etc. Firecrawl or Exa are pretty good for that.
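A minimal sketch of such a scraper in Python with requests and BeautifulSoup (the URL and CSS selectors are placeholders; a real listing page may be JS-rendered, which is where Firecrawl or Exa come in):

  # Hypothetical sketch: scrape a company listing and save name + website to CSV.
  # LISTING_URL and the selectors are placeholders, not a real endpoint.
  import csv
  import requests
  from bs4 import BeautifulSoup

  LISTING_URL = "https://example.com/yc-companies"  # placeholder

  html = requests.get(LISTING_URL, timeout=10).text
  soup = BeautifulSoup(html, "html.parser")

  with open("companies.csv", "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(["name", "website"])
      for card in soup.select(".company"):  # placeholder selector
          name = card.select_one(".name").get_text(strip=True)
          site = card.select_one("a")["href"]
          writer.writerow([name, site])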

The hard part is keeping the list current and "actionable":

* you need a way to cluster startups; most VCs (and their theses) are sector-driven, so a sector taxonomy is ideal. But classifying startups into sectors is extremely difficult; you cannot rely on official documentation, as it either does not exist or is too vague
* you cannot rely on the startup name as the unique identifier; it can change, and it might not be unique
* startups pivot, so you need to keep your lists up to date
* you need to keep your database up to date; you can't rely on news, as information is scattered all over the place, and you can't rely on generic alerts. You need signals like "two new use cases on website"

What we have seen work:

* use two types of taxonomies: a broad one (NACE codes are a great way to start) and an AI-based one (a simple classifier based on the company website)
* use the domain as the unique identifier instead of the name
* monitor live signals; that's usually the hardest part. The simplest (but also most expensive) way is to capture certain pages (about us, career pages, etc.) regularly and use AI to get a "diff" (see the sketch below)
* usually Google Sheets is enough to start, but move quickly to a more stable database like Notion or Airtable (CRMs work too, but tend to be too overloaded)
* use N8N to glue all of that together with a few simple prompts

If there's interest, I can also share a technical breakdown and N8N files to get started.
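To illustrate the "diff" signal, here is a minimal Python sketch. Assumptions: the OpenAI SDK as the LLM and local files as snapshot storage; in practice this logic would live inside the N8N workflow with a proper database:

  # Sketch of the page-diff signal: snapshot a page, compare it with the
  # previous snapshot, and ask an LLM to summarize meaningful changes.
  import hashlib
  import pathlib
  import requests
  from openai import OpenAI  # needs OPENAI_API_KEY set

  SNAPSHOTS = pathlib.Path("snapshots")
  client = OpenAI()

  def check_page(domain: str, path: str = "/about") -> str | None:
      url = f"https://{domain}{path}"
      new = requests.get(url, timeout=10).text
      SNAPSHOTS.mkdir(exist_ok=True)
      snap = SNAPSHOTS / (hashlib.sha1(url.encode()).hexdigest() + ".html")
      old = snap.read_text() if snap.exists() else ""
      snap.write_text(new)
      if not old or old == new:
          return None  # first visit or no change: no signal
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder model choice
          messages=[{
              "role": "user",
              "content": f"Summarize meaningful changes (new use cases, pivot "
                         f"signals) between two versions of {url}.\n\n"
                         f"OLD:\n{old[:8000]}\n\nNEW:\n{new[:8000]}",
          }],
      )
      return resp.choices[0].message.content

Note that the domain, not the company name, is the key for the snapshot.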

PS: you can read our full breakdown (in German, however) on our blog: https://www.researchly.at/post/y-combinator-companies-finden...


Generic studies like the one from Semrush (https://www.semrush.com/blog/ai-mode-comparison-study/) offer an overview of the sources AI-powered search engines, such as Perplexity, use.

While Semrush analyzed a large number of sources (150,000), their findings are broad and not always useful for specific topics because sources can vary greatly between segments. For example, Semrush found Reddit and Wikipedia to be frequently cited, but in our research, neither of these sources appeared. Additionally, Semrush did not disclose the prompts they used, whereas our study uses prompts based on G2’s "Most Popular Software Categories."

To provide more targeted insights, we analyzed around 2,000 sources in response to prompts focused specifically on software categories from G2, with a focus on Perplexity as the AI search engine.

Some takeaways:

* "Corporate" websites dominate citations, accounting for 76.07% of all sources.
* The most frequently cited domains in the "Corporate" category include established, "older" domains like ions.de and zendesk.de, as well as less well-known sites such as botpress.com.
* Comparison pages like www.crmsystem.de do not dominate the citations, even though they might be expected to rank higher due to their likely greater neutrality compared to corporate sources.
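For context, the aggregation itself is straightforward; a Python/pandas sketch, assuming one row per cited source with illustrative column names:

  # Sketch: compute category shares and top domains from the raw citations.
  # Column names (prompt, domain, category) are illustrative.
  import pandas as pd

  df = pd.read_csv("citations.csv")  # one row per cited source

  share = df["category"].value_counts(normalize=True).mul(100).round(2)
  print(share)  # e.g. Corporate ~76% of all sources

  top_corporate = df.loc[df["category"] == "Corporate", "domain"].value_counts().head(10)
  print(top_corporate)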

We will update the report in the next few weeks to analyze around 200 prompts (10 software categories from G2, 20 prompts each), which will lead to around 4,000 URLs.


Hi ricardobeat, you just perfectly touched on our CTO's main concern. :) Although we are not a large organization (we are a three-person team), we have quite a few technologies in play.

Most of that was pushed by me, the CEO; some of it, admittedly, was motivated by "red shiny object" syndrome.

We use Coolify to accelerate deployment (on top of Azure), so it does not interfere with Bolt and Lovable.

Our backend developer used Bolt in the past to build frontends for his backends. I use Lovable primarily as an alternative to Figma, i.e. for communicating new ideas. None of that ends up in the production product (there we mainly use Cursor).

v0 is in Assess because we heard good things about its full-stack capabilities (something that Lovable/Bolt cannot really do).


Hi pella, thanks for your comment, especially "the reason behind the status is key."

We built the radar primarily to have a data-driven reasoning behind it. I have explained our approach in more detail here: https://blog.tryresearchly.com/articles/built-own-technology...

Here is the summary: we score each technology across four dimensions: market adoption (how many big companies are really using it and seeing results), relative impact (on our bottom line), associated risks, and internal priorities.

For each dimension we have a scorecard along the lines of: if five top-100 startups are using it publicly, it gets 3 points for market adoption. The scorecard is far from perfect, but it gives us a good, repeatable algorithm across time and trends. A rough sketch of the idea is below.
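A hypothetical sketch of one scorecard dimension in Python (the thresholds and the scoring rule are illustrative, not our real values):

  # Illustrative scorecard: thresholds and weights are examples only.
  def market_adoption_score(public_top100_users: int) -> int:
      """Points for market adoption, from top-100 startups publicly using the tech."""
      if public_top100_users >= 5:
          return 3
      if public_top100_users >= 2:
          return 2
      return 1 if public_top100_users == 1 else 0

  def total_score(adoption: int, impact: int, priority: int, risk: int) -> int:
      # Risks count against a technology; the other dimensions count for it.
      return adoption + impact + priority - risk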

In the case of GPT-Codex: it is on hold because the perceived relative impact for us is low (we already have a good setup with alternative tools: Bolt + Cursor). In the article above, I have also linked to our Google Sheet. It contains the scores (some redacted), including the underlying data.

Also, if you click on some of the trends (e.g. GEO), it links to our written-out rationale (e.g. https://blog.tryresearchly.com/articles/wie-wir-uns-auf-geo-...)

Thanks for the O'Reilly radar. I didn't know that one.


> In the case of GPT-Codex: it is on hold because the perceived relative impact for us is low

Interesting ... so for you "Hold" ~= "low impact / monitor", while ThoughtWorks uses "Hold" more like "don't start anything new" (https://www.thoughtworks.com/en-us/insights/blog/technology-...) [1]. Personally, I've started to read "Hold" in their sense, basically as "not recommended" for new work.

That's why I first thought "GPT-Codex" (one of my favorite models) was already "not recommended." :-)

[1]

  ""
  Hold: The original intent of the hold ring was "proceed with caution", to   represent technologies that were too new to reasonably assess yet. But it has evolved into more of a "don't start anything new with this technology." You may be constrained to use it for existing projects because it is so deeply embedded into the tech portfolio, but you should think twice about using this technology for new development. 
  """

Thanks for the links - makes much more sense now!

EDIT:

Zalando definition ( https://opensource.zalando.com/tech-radar/ )

  "HOLD — Technologies not recommended to be used for new projects. Technologies that we think are not (yet) worth to (further) invest in. HOLD technologies should not be used for new projects, but usually can be continued for existing projects."


Exactly. Maybe I need to update our ring definitions, or at least define them on the radar. Thanks!

BTW: what's your experience with GPT-Codex?


Still in the honeymoon phase with gpt-codex

  codex -m gpt-5-codex -c model_reasoning_effort="high" 
It's my current favorite, with claude-code as the runner-up.

As someone with aphantasia[1] I naturally lean toward backend / abstract modeling (maps, technology radars, databases), so non-visual tools like codex-cli, claude-code, or even https://omarchy.org/ are especially appealing. I haven't yet seen a technology radar that makes a visual vs. non-visual distinction. If you ever run a survey, adding such a category could be interesting; surprising patterns might emerge.

[1] https://hn.algolia.com/?query=aphantasia


There’s a lot of talk about how AI will replace content creators. I doubt that.

To improve our lowest-ranking blog posts (<100 impressions; some AI-written, some hybrid, some completely manual), I built an N8N workflow which checks SEO keyword density, internal linking, and on-page issues. It took me 16 full hours to build a basic workflow. As you can see in the blog post, the workflow is quite complex and still only scratches the surface. And: this workflow only serves to generate recommendations on how to improve blog posts. Rewriting them is a whole new category. A simplified sketch of the checks is below.
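For a flavor of what the workflow checks, a simplified Python sketch of the core checks (illustrative only; the real workflow runs in N8N and has many more steps):

  # Simplified on-page checks: keyword density, internal links, basic metadata.
  import re
  import requests
  from bs4 import BeautifulSoup
  from urllib.parse import urlparse

  def check_post(url: str, keyword: str) -> dict:
      html = requests.get(url, timeout=10).text
      soup = BeautifulSoup(html, "html.parser")
      words = re.findall(r"\w+", soup.get_text(" ", strip=True).lower())
      host = urlparse(url).netloc
      internal = [
          a["href"] for a in soup.find_all("a", href=True)
          if a["href"].startswith("/") or host in a["href"]
      ]
      return {
          "keyword_density_pct": round(100 * words.count(keyword.lower()) / max(len(words), 1), 2),
          "internal_links": len(internal),
          "missing_title": soup.title is None,
          "missing_meta_description": soup.find("meta", attrs={"name": "description"}) is None,
          "h1_count": len(soup.find_all("h1")),
      }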

Based on that, I do not see how "ChatGPT will replace content creators".

In the blog post I also share the N8N workflow and a detailed dive into caveats.


My thoughts:

## 700 million weekly users (Google: approx. 3 billion daily)

Are weekly users really a good metric? After all, if you use it just weekly, is it really affecting your life?

## 18 billion messages weekly (Google: approx. 15 billion daily)

Are messages a good metric? Do I send a lot of messages because I can't get the right answer?

## Use Cases

* 70% private use
* Writing, searching for information, and advice are the most common uses, each accounting for 30%
* Coding: 4% of total usage (where are my vibe coders?)

It seems that most use cases are a more advanced form of wikiHow. The reason is that AI usage is still very early and most people don't know what they can do with it. In the future, we'll see a lot of specialized SaaS tools unbundling use cases and building advanced applications where users do not realize that they are using gen AI.


While building our startups, it's important for me to think about moats—ways to protect us from "AI".

There are many guides on building moats, such as this one on Hacker News (https://news.ycombinator.com/item?id=42620994).

But for me, these guides just list moats and do not explain why they work. As such, I wanted a framework for assessing the assets that contribute to network effects.


:D That shouldn't surprise me, but it does. Is there anything Doom can't run on?


This week marks the IAA (International Motor Show Germany; one of the most important car trade shows). To explore the strategies that OEMs are pursuing, I analyzed their recent earnings calls.

Here's the TL;DR:

- Elon Musk might get a $1 trillion pay package if he comes back to the office and delivers robotaxis and robots as promised (Elon Musk is busy with a lot of things besides cars)
- Volkswagen is executing a product-heavy, China-local strategy (30 new models in 2024!) and partnering with US and Chinese brands for soft- and hardware
- While for Tesla full autonomy is just around the corner, Ford is focusing on Level 2/3
- Ford has announced a new Model T equivalent product innovation


I touched on that point in the post (briefly), but I hadn't considered the downsides of that constant shilling. That's a good point. Maybe that's also why they have paused their affiliate program.

