Launch HN: Secoda (YC S21) – Searchable Company Data
97 points by Etai on Oct 29, 2021 | 29 comments
Hey HN, we’re Etai and Andrew, and together with our team, we’re building Secoda (https://www.secoda.co). We do search (that actually works!) for your company data. The name Secoda stands for "Searchable Company Data".

As a company grows, so does its data. Tables, metrics, queries, and dashboards often become isolated and are difficult to find. Even with great practices, organizations still struggle to get value out of their data - up to 73% of all enterprise data goes unused. One of the big contributors to this problem is that organizations create data silos by not documenting and centralizing their data knowledge in a single place where every employee can access information about data.

Andrew and I experienced this problem firsthand at the last company we worked at. Andrew led the Product team and I led the Operations team, and we found that it was extremely difficult to find, understand and use data without looping in someone on the data team to help. The problem was that we only had 1 employee on the data team who supported over 100 employees asking questions about how to find and use company data, which meant that it would take around 2 weeks to get an answer to any data request.

Other data management tools focus on listing all data resources, regardless of their relevance or accuracy - you generally just get a list of what's available, but not in a form that's very meaningful. We adopted some of these tools in our last jobs but found that they created an overwhelming index of too many tables, dashboards and queries that weren’t relevant to most employees. This meant that even after adopting a tool to solve the problem, most employees still couldn’t use it to find, understand and use data.

Our approach to solving this problem is to build Secoda as a tool that helps data teams curate metadata for less technical employees. Instead of listing every resource, data teams can use our tool to curate and document data for specific departments or roles. As a result, employees who are less familiar with data will not be overloaded by information that is irrelevant or too technical. Our goal is basically to be like Google search for in-company data. You enter what you need and you get back the relevant information. We integrate into databases, data warehouses, BI, and transformation tools and offer both an on-prem and cloud-hosted deployment.

Over the last six months, our team has been working closely with our early adopters to improve the product. Today, we’re excited to share the launch of our self-service product with the HN community. You can now sign up to Secoda, connect your database or data warehouse and start using Secoda without a sales call. We offer a free 14-day trial (no credit card required). After the free trial, we charge per editor, per month. If you’d like, you can also take a look at this video of us setting up our Secoda workspace: https://www.loom.com/share/f41b317441554a36930b9cfe4c91a45f.

We're also hiring for a number of roles, which you can find here: https://www.workatastartup.com/companies/secoda.

We’d love to hear about your experiences with data discovery and any ideas/feedback/questions you might have about what we’re building!



A very cool approach. When I worked at Uber, there were internal tools to query data sources. In practice, I found few people outside data scientists knew how to query the data, or knew what was available. I was an engineer and barely used this tool, until a peer DS showed me how to do this. Even then I was overwhelmed by the number of tables to join, knowing what data source contained what data, or knowing which tables I had access to query or join on.

A few questions:

1. How do you go about permission? This was a major question at Uber (where permissions were put in place early enough). Especially with GDPR and other regulations, you cannot have anyone access any data.

2. What about PII? Some data needs to be stored, but cannot be viewed except for very, very few people and with a strong audit trail. This is a more specialized case for #1.

3. How do you see the tool "spread" the most within companies? I would assume that easy sharing is how people learn about this, then try it themselves... but would love to hear what you actually see.


1. How do you go about permission?

We have pretty advanced RBAC in Secoda. You can make anyone a viewer, guest, admin or editor in the workspace. Viewers and Editors are only able to see the information. Secondly, we allow you to create "groups" for different functions in the organization (i.e. marketing, sales etc.). You can choose to share any resource with a specific user or group. This works similarly to the RBAC that Notion uses, which means that only the right people are seeing the right information in Secoda. Lastly, we allow data teams to create "collections" of information, which can be shared with specific groups or specific users. Without sounding biased, I think this is where Secoda excels as a product.

2. What about PII? Some data needs to be stored, but cannot be viewed except for very, very few people and with a strong audit trail. This is a more specialized case for #1.

We have the ability to auto-tag PII at the table and column level. Any PII data won't be viewable without permission from the admin.

3. How do you see the tool "spread" the most within companies? I would assume that easy sharing is how people learn about this, then try it themselves... but would love to hear what you actually see.

Usually the Slack integration is the best way to spread Secoda. With our Slack integration, any employee can search for information by typing /secoda in Slack. You can also push information from Slack to Secoda and vice versa. This exposes Secoda to new employees in the place they work.
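
To make answers 1 and 2 above concrete, here is a minimal sketch of role, group and PII checks. It is not Secoda's implementation or API: the `User` and `Resource` classes, the `PII_HINTS` name heuristic, and the rule that admins see everything are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical model: names and rules are illustrative, not Secoda's API.
PII_HINTS = ("email", "phone", "ssn", "address", "birth")


@dataclass
class User:
    name: str
    role: str                      # "admin", "editor", "viewer" or "guest"
    groups: set = field(default_factory=set)


@dataclass
class Resource:
    name: str
    columns: list
    shared_users: set = field(default_factory=set)
    shared_groups: set = field(default_factory=set)
    pii_allowed: set = field(default_factory=set)  # users an admin has approved

    def pii_columns(self):
        """Auto-tag columns whose names contain a PII hint (naive heuristic)."""
        return [c for c in self.columns if any(h in c.lower() for h in PII_HINTS)]


def can_view(user, resource):
    """Visible to admins and to users or groups the resource is shared with."""
    if user.role == "admin":
        return True
    return (user.name in resource.shared_users
            or bool(user.groups & resource.shared_groups))


def visible_columns(user, resource):
    """Return the columns a user may see, hiding PII unless an admin approved it."""
    if not can_view(user, resource):
        return []
    if user.role == "admin" or user.name in resource.pii_allowed:
        return list(resource.columns)
    tagged = set(resource.pii_columns())
    return [c for c in resource.columns if c not in tagged]


orders = Resource("orders", ["order_id", "customer_email", "total"],
                  shared_groups={"marketing"})
ana = User("ana", "viewer", {"marketing"})
bob = User("bob", "viewer", {"sales"})
print(visible_columns(ana, orders))  # shared via group, PII column hidden
print(visible_columns(bob, orders))  # not shared with bob's group
```

A real system would also need the audit trail the question asks about; that is left out here to keep the sketch short.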


Do you support, or do you have plans to support, AD / LDAP directories? Google Groups?


Wait, is it curated, or is it search? To me "search" implies that the tool _discovers_ my stuff and makes it searchable. If I have to tell it about my stuff ("curation"), then that's just a metadata catalog.

The reason the distinction matters is that if it's curation-based, the onus is still on the data team to document all relevant assets, which they could already do, and have already demonstrated they don't want to.

Now, it could still be a good metadata catalog! Most of what's out there is bad. But if that's what you're shooting for, pitching it as "search" will be confusing.


That's a great point, we think of Secoda as _both_ a search and curation tool. The search portion of the tool is accomplished through the no-code integrations. When you connect Snowflake, for example, we extract metadata about the tables in Snowflake such as the columns, descriptions, number of queries run, people who are frequently using that table, etc. All that information becomes searchable in the catalog. After a company integrates all of their data sources, we see that teams leverage the curation capabilities of the product. Editors can add documentation through our documentation editor to provide additional context about the data resources that are discoverable in Secoda. In addition, teams can add shared definitions of metrics, answer questions that are asked by data consumers, and create ad hoc analyses that are also discoverable.
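
As a rough illustration of the "extract metadata, then make it searchable" flow described above, here is a minimal sketch. It does not connect to Snowflake and is not how Secoda works internally: the `TABLES` records, their field names, and the ranking rule (matching tokens first, then query count) are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical metadata, standing in for what an integration might extract.
TABLES = [
    {"name": "orders", "columns": ["order_id", "customer_id", "total"],
     "description": "One row per customer order", "query_count": 120},
    {"name": "customers", "columns": ["customer_id", "signup_date"],
     "description": "Customer accounts", "query_count": 45},
    {"name": "web_events", "columns": ["event_id", "page", "ts"],
     "description": "Raw clickstream events", "query_count": 300},
]


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(tables):
    """Map each token to the set of table names whose metadata mentions it."""
    index = defaultdict(set)
    for table in tables:
        text = " ".join([table["name"], table["description"], *table["columns"]])
        for token in tokenize(text):
            index[token].add(table["name"])
    return index


def search(query, tables, index):
    """Rank tables by number of matching query tokens, then by query_count."""
    popularity = {t["name"]: t["query_count"] for t in tables}
    hits = defaultdict(int)
    for token in set(tokenize(query)):
        for name in index.get(token, ()):
            hits[name] += 1
    return sorted(hits, key=lambda n: (-hits[n], -popularity[n]))


index = build_index(TABLES)
print(search("customer order", TABLES, index))
```

Curated documentation could be folded into the same index by appending its text to each table's searchable string before tokenizing.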


As you may already know, integrations are the heart and soul for products like this. I'm assuming you're already being bombarded by potential/current users asking "when will you have integration X?".

What is your strategy to scale out & maintain integrations? Speaking from experience, it's not something that is easy to scale out unless you have a dedicated team whose job is to build them out, or you have some third-party provider like CData providing OOTB connectors for your product.

(On a side note, this looks fantastic. Are you hiring any product folks, by any chance? I have significant experience tackling this same problem.)


You're spot on. One thing we're doing to deal with the long tail of integrations is opening up an API to customers.

We're also considering open sourcing that part of the product, but haven't made a firm decision on that yet. Would love to chat if you're open to it. We're definitely looking for people in product. Feel free to send me an email to etai@secoda.co


Responded!


This would go a long way to addressing one of the key needs that we’ve been planning around: a central library to manage all the different documents and datasets that are accumulated by different teams.

We’ve sketched out an initial solution which looks a lot like Secoda, except focused on csv files, the concept being to check csv data sets into the library, add metadata, and then define how to bridge it into the central data store.

I’ll dig further into the website, it looks like you’ve done a lot of good work avoiding repeatedly addressing the same questions!


Ideally, no team has to answer the same question twice once they start using Secoda. In reality, there are times when a question looks similar but is defined differently. We're trying to suggest resources to people who ask questions so they become more self-service, similar to an Intercom knowledge hub, but for data.


Congrats on launching!

So how do you compare to a Data Catalog like datahub? https://datahubproject.io/

From the video you looked very similar to them as a metadata consumer and they provide extensive API integrations so you can add basically any set of metadata you want including slack, jira etc. They're also offering a hosted version.

Their metadata is indexed into a tuneable ES cluster so you can fiddle with relevance etc. to your heart's content.

What's your big differentiator?


Thank you! Secoda is different from DataHub in a few ways:

1. If you're using the DataHub open source solution, it requires a data engineer to get the platform up and running and maintained, which can be fairly expensive depending on the salary of the data engineer. Secoda has 15+ no-code integrations that can be set up in 5 minutes and is a fully managed solution. We are releasing a metadata API that will be available before the end of the year, in case an organization is using a product that we do not currently integrate with.

2. Acryl (managed version of DataHub) is mainly focused on the data catalog, which they do a great job of. However, they don't provide the questions, dictionary, and visualization components that we provide in addition to the catalog. These additional components add more context around data knowledge and are focused on helping non-technical users understand company data, whereas the data catalog is focused more on helping technical data users understand company data.

3. Also, if you're using Acryl, you'll have to get in touch with their team to get a demo of the product. For Secoda, you can sign up at https://app.secoda.co and try out a free trial of the product without having to talk with our team. We do offer demos if people are interested though.


Hey Etai and team, Congrats on the launch! I’m so glad to see several teams trying to tackle this hard problem of complexity in the modern data stack.

I’m Shirshanka, the founder of the DataHub project, occasional responder to HN threads and reachable at https://slack.datahubproject.io :)

I wanted to respond to some of the text here since DataHub and Acryl Data were directly mentioned.

1. We’ve heard repeatedly from the community that DataHub quickstart just works in 5 mins or less (besides a current known issue with M1: thanks Apple!). Once people are able to show value with the quickstart and the pre-packaged connectors that connect up to 20+ systems, they quickly move towards a deployment model based on helm, that is open source and maintained by the Acryl team. All of this requires no code. Deploying DataHub using the provided helm charts is also quite easy based on what we’re hearing from the community.

2. Acryl Data is reimagining what a data catalog can do: data discovery, data observability and federated data governance. We believe that techniques like semantic knowledge graphs are only useful and reliable if they are built on top of a live and fresh operational metadata graph. Also, we see ourselves not just as an “end user tool”, but as a central fabric through which metadata is stored and transformed before integrating in other tools. As a result, we are intentionally API-first and stream-first.

3. We already offer the open source DataHub demo at https://demo.datahubproject.io. People talk to Acryl Data after they have already tried out the open source product and they are looking for a managed version that has more to offer.


Congrats on the launch, and good luck!

How is it different from Glean? https://www.glean.com


Enterprise search is a hard problem, but this looks pretty slick.

What's your tech stack?

Did you create the integrations from scratch, or use something like Zapier?


We use React, Flask, Neo4j and Elasticsearch, and it's all hosted in AWS. Integrations are built from scratch.


Looks great and reminds me of Moma at Google.


We always have a hard time figuring out how to define our source of truth and which tables/graphs to use for measuring our KPIs. Definitely going to reach out about using Secoda... would save us a lot of number wrangling during our weekly syncs.


Looking forward to showing you around the product! :)


We've been using Secoda since the Meet the Batch thread. It's come a long way already since then. Great product! Hard part is still getting the team to use it, but we've been making inroads.


That thread was at https://news.ycombinator.com/item?id=28156461, for those curious.


We have exactly this problem (and have tried to solve it in various ways) and launched Secoda internally at PartnerStack a few days ago.

Excited to see it roll out in the org and build a solid data knowledge base.


Excited for your update after a few months of usage.


Wicked! Super excited to have you build the knowledge base at PartnerStack with Secoda.


Congrats on the launch! How is this different from amundsen / stemma?


Very nice! Congrats - pretty awesome tool.


Thanks!


How is this different than ThoughtSpot?


Secoda is different from ThoughtSpot in a few ways:

1. It is likely that you'll need to set up a 1:1 meeting with a ThoughtSpot expert to help get your company up and running on the software. Our goal with Secoda is to be a self-service platform that is designed for any company, large or small, to get their data knowledge base set up in 5 minutes.

2. ThoughtSpot's price point is typically in the six figure range, which is much higher than Secoda's price point that starts at $29/editor on the platform.

3. ThoughtSpot's core focus is providing answers to data questions through visualizations. Secoda takes a more comprehensive approach to documenting data knowledge. In addition to having visualizations that help answer questions, we also provide a shared data dictionary for defining metrics, as well as a catalog that can store tables, dashboards, jobs, and many other data resources.



