Hacker News | ck_one's comments

Is that actually possible? Can we do a live test here?

Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town

How much do I have to pay you to get it?


I think it could be feasible to get an ad in front of "35-year-old dentists living on the 400 block of Elm street in local town" who have bought product X, but I've never seen a transaction-by-transaction purchase history for sale.


How much you got?

Never ask a salesperson how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend to see if they will do it for that amount. Salespeople will always shoot high, hoping not to leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than what you're actually willing to spend, for the same but converse reason.


Ok, so nobody here knows directly of any case where such data has been purchased, or vaguely similar, and we have no pricing information whatsoever available, but we are somehow completely knowledgeable about it being possible and how to do it? That sounds unlikely.


The supposedly in-the-know responses here are full of bravado but not much other than "trust me, bro"


https://news.ycombinator.com/item?id=44565878

Yea, you know everything, don't you.


Wow the Transunion business site, that really proves it huh.


Experian is known to sell the data they have. Why is this even in question? If I provide you Experian's website, you would give the same BS response?

Let me google this for you...

https://duckduckgo.com/?q=how+to+buy+data+from+a+data+broker...


The conversation was for buying transaction data from specific people, something that many seem to insist is easy and cheap and doable. Meanwhile if you actually read the responses to that search you smugly cited you'll find that no one seems to know how to actually do anything remotely like this. Yes this data is definitely harvested and it seems like you should be able to buy it in bulk from someone somewhere, but again no one seems to know where or how much or what the purchase minimum would be etc.


You asked for an example, and one was given. If you're saying you don't know how to send an email to a business page offering the services described here, no comment in this section can help you.


Yeah people fail to provide examples but continue to be doomers about how easy it is.


Been busy, but since you seem to have been unable to find anything by searching on your own for the past 6 hours, here's something I found with a quick little search.

https://datarade.ai/data-categories/food-grocery-transaction...

Have we really lost the ability to use search functionality??


Of course people do. 5 seconds spent doing the most sparse-ass research will help you find plenty of stuff. If people don't respond, I imagine it's for fear of 1) outing the specific area they work in, or 2) realizing these kinds of comments aren't generally made in good faith, so responding is generally a complete waste of time.

I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.

https://www.transunion.com/business

Pretty much every one of the big players people typically associate with other areas, such as personal credit, has some feet in this space somewhere. Then there are the hundreds of lesser-known fly-by-night guys that have their own DBs, built off of mostly the same data but correlated in different ways and sold to different audiences.

There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.


I didn’t ask for a link to a company that can do it. I want pricing. I am saying that nobody here is willing to share anything even approaching specific pricing, which makes me very much doubt that any of them have the direct transaction experience they are claiming. I don’t doubt that underwater welding exists, but I do doubt that anyone in this thread has done it, or has any direct experience with it.


>There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences.

Okay but then why not name at least a couple such services. Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from transunion is hardly revealing of anything.


Credit card companies are known to sell data. https://www.cbsnews.com/news/mastercard-credit-card-customer...

Mobile service providers are known to have sold data. https://www.fcc.gov/document/fcc-fines-largest-wireless-carr...

Auto makers are known to sell data. https://www.caranddriver.com/news/a61711288/automakers-sold-...

You act like it doesn't happen, yet time and time again we learn about companies selling whatever data they can collect.

I can't believe we are still questioning this fact

What else do you need to know?


I think you misunderstand. I'm not doubting that it happens widely and pervasively. It's evident that this is the case. I just requested examples based on some of the very specific claims made here despite many ambiguities in how they were phrased.

Anyhow, thanks for taking the time to include some links.


For the most part, readers here are against it. Just because someone doesn’t know how to do it does not mean it is not doable. If it were not doable, these companies would not exist. I’ve already spent more time than I care on the topic. So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know. You can only lead hostess to water, but you can’t make it drink.


Hostess? You mean horses no?

>So if you want to think that people are collecting the data and not selling it to interested parties, the, boy, I don’t know.

As I very clearly said above, I don't doubt it at all, I was just asking for any clarification on who to whom.


And you were given them. So why keep up this persistent, obstinate line of questioning and persistent downvoting? It's transparent and tired. Industry experts chime in on this stuff all the time; it isn't done in backrooms, it's in the open. The only barrier to you knowing is your own ignorance.


Persistent questioning is usually a good thing for debate, by the way, and I wasn't asking anything unreasonable. I kept asking because I was curious and hoping for answers from anyone willing to provide concrete details for claims that struck me as short on hard details, even though I don't doubt the widespread existence of personal data selling at all, as I've repeatedly mentioned. As for downvoting, not sure what you mean. I can only say I've never once downvoted a single comment in all my time commenting on this site. I think the whole downvote thing is detestable and infantile. I'll even upvote things I disagree with just to counter it if I see them go grey.


Literally all anyone is asking for is one single concrete example of a site where you can roll up and buy personal information.


But what type of range are we talking? Tens, hundreds, thousands?


It could also mean that if you have to ask... or the first rule of data brokering...

Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games

Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheelhouse.


If you need John Oliver to do it maybe it's not such a big problem? If no one here is able to provide a single concrete example, maybe it's not real?


John Oliver likes to spend HBO's money to do things others can't do while entertaining the rest of us. I'm not spending my money on something to prove what is known as possible for you. At this point, even with receipts, you're coming across as someone that would argue that grass is not green, or water isn't wet, and fire isn't hot.

Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people doing this with first-hand knowledge have too much to do to bother trying to convert doubting Thomas over here.


All of this started because in response to an extremely concrete question, what's the cost of transaction data for a tightly constrained population, you replied with a smug non-answer about the greed of salespeople. These questions only got "belligerent" because every single answer has been nonsense insisting that it's super easy and cheap but also I couldn't possibly name a single site where this data is sold or provide even an order of magnitude of cost. Or maybe now it requires HBO levels of funding, who knows.


I offered sage advice on how to negotiate when you don’t know a firm price on anything whether that be data or a car or a home remodeling. If you want to say that advice was a smug answer then that’s on you. Every answer after has just gone further and further off the rails


Nah there's no way you actually watch John Oliver because that was really funny. Anyways, you mentioned earlier that we wouldn't believe you even if you posted receipts but that's actually exactly what we want to see. Like, just the name of a business, the thing that was sold, and the price.


In many fields there is no moat. It's an execution battle and it comes down to one question: can the startup innovate faster and get to the customers, or can the incumbent defend its existing distribution well enough?

Microsoft owns GitHub and VSCode yet cursor was able to out execute them. Legora is moving very quickly in the legal space. Not clear yet who will win.


> Microsoft owns GitHub and VSCode yet cursor was able to out execute them

Really? My startup is under 30 people. We develop in the open (source available) and are extremely willing to try new process or tooling if it'll gain us an edge -- but we're also subject to SOC2.

Our own evaluation was that Cursor et al. isn't worth the headache of the compliance paperwork. Copilot + VSCode is playing rapid catch-up and is a far easier "yes".

How large is the intersection of companies who a) believe Cursor has a substantive edge in capability, and b) have willingness to send Cursor their code (and go through the headaches of various vendor reviews and declarations)?


Windsurf was acquired for $3B by OAI and it's clearly the worse of the two. Cursor is trying to raise at a $10B valuation and has $300MM in ARR in less than two years.

So in short, yes, companies do appear to be showing some willingness to send Cursor their code, even with all the headache associated with getting a new vendor.


Could you elaborate on the 'headaches of various vendor reviews and declarations'? (I thought paperwork etc was table stakes for a SOC 2 certificate)


A business is nothing more than defined, established, repeatable systems and processes. That is the difficulty with tooling businesses.


It's better than perplexity on certain tasks! Pretty impressive!

Feedback: the mermaid charts are a bit annoying when they are not needed

Out of curiosity, what web index do you guys use right now, and does it make sense to build your own index at some point?


What object detection model do you use?


Is tesseract even ML based? Oh, this piece of software is more than 19 years old, perhaps there are other ways to do good, cheap OCR now. Does Gemini have an OCR library, internally? For other LLMs, I had the feeling that the LLM scripts a few lines of python to do the actual heavy lifting with a common OCR framework.


Custom-trained YOLOv8. I've moved on since then and the work was done in 2023. You'd get better results for much less today.


Quetzal as a product is inevitable. It will make it even easier to target international markets from day one with a tiny team. Congrats on the launch! You rock! Greetings from section 4D!


What's your next step? Why did you decide to focus on enzyme design?


We think enzymes are super cool! You can build molecular assembly lines at the atomic scale with them. Many pharmaceuticals are already manufactured with enzymes such as the diabetes drug Januvia. Engineering them is a big bottleneck though - takes years and millions of dollars. We want to speed this up with AI-powered design. Next step is ligand-protein prediction capability of AlphaFold3, which is also super useful for modelling enzyme-substrate interactions.


Possibly because it dovetails with pharma mfg and [potentially] food mfg. Could see a case made for enzymatically brewed 'meat inks' [very sorry for this term ;p] for 3d printing the next gen of lab meats.


Can anyone recommend a method to deduplicate PDFs? The hash is often different but the content and metadata are 99.99% the same.


You might want to strip metadata before doing a comparison, using exiftool. Even though exiftool was originally written for EXIF metadata on JPGs, these days it supports a lot of metadata standards, including PDF. This command will do it, assuming you set filename=`basename your.pdf .pdf`:

    exiftool -all= -o "${filename}.stripped.pdf" "${filename}.pdf"

That won't help you with small differences in the contents, but might help with small differences in metadata. Running `md5sum` on the stripped PDF should give more reliable dedupe results.
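
If it helps, here is a minimal Python sketch of the "hash the stripped files and group them" step. It assumes the PDFs have already been stripped as above; the function names are made up for illustration, and it uses SHA-256 rather than `md5sum`, though any digest works for grouping.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths by digest and keep only groups with more than one file."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Something like `find_duplicates(Path(".").glob("*.stripped.pdf"))` would then list each set of byte-identical files. It only catches exact matches after stripping, so it has the same limitation mentioned above for content differences.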

I was recently working on a similar problem for JPG, RAW, and MP4 files (photo/video backup) so it is fresh in my mind.


I would consider rasterizing the PDFs and then hashing the resulting bitmaps.
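
A rough sketch of the hashing half of that idea, in Python. It assumes the rasterizing is done elsewhere (for example with a tool such as Poppler's `pdftoppm`) and that each page arrives as raw pixel bytes plus its dimensions; `bitmap_fingerprint` and the tuple layout are hypothetical, not any library's API.

```python
import hashlib


def bitmap_fingerprint(pages):
    """Digest a document from its rendered pages.

    `pages` is an iterable of (width, height, pixel_bytes) tuples, one per
    page, in page order. The layout is an assumption for this sketch.
    """
    h = hashlib.sha256()
    for width, height, pixels in pages:
        # Include the dimensions so differently shaped pages never collide.
        h.update(f"{width}x{height}:{len(pixels)}:".encode("ascii"))
        h.update(pixels)
    return h.hexdigest()
```

Two PDFs with the same fingerprint rendered to identical bitmaps, whatever their metadata or internal structure. This is still an exact match, so it likely only works if every file is rendered with the same tool and resolution; a single differing pixel changes the digest.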


cp?


There is also active development for tinnitus/noise induced hearing loss. Dr Chen from Harvard is working on it. He was also involved in the research linked in the article. They are getting closer to a cure. Here is an interview with him: https://www.youtube.com/watch?v=lJr86MUYJ8M


Pretty cool what they are working on. However, I wish there were more funding for restoring hair cells, whose loss is the root cause of hearing loss for most people.

Researchers are getting closer. Dr. Chen from Harvard was able to regenerate hair cells in mature mice last year.

The problem is also becoming more widespread. 30 million people in the US and 400 million people worldwide have disabling hearing loss. Regenerating hair cells and the synapses around them would also cure tinnitus. 30 million x $5k for a treatment = $150B market (probably way bigger with an aging population).

I think we probably need more rich tech billionaires to get affected to attract large funding.

What billionaires that you know are affected besides:

- Brad Jacobs

- Ryan from Flexport/Founders Fund


Can you share some insights on what the issues were?


I wouldn't want to go into specifics, as it's easy to look like I was slinging mud.

But if you have seen the issues that Starliner has had recently, I would echo the statement from there, that valves are hard. Very hard. Everything to do with pressurised systems in space is hard. But the propulsion team worked it through and are now seeing the fruits of the long hours.

