One aspect of the problem here is the difficulty in running a clinical trial, particularly at the recruitment stage. The covid-19 trials all had a surfeit of participants because of a pandemic, but with modern cancer treatment trials the qualification requirements significantly cut down on the eligible population.
This, in itself, isn't a huge obstacle. The problem is the state of healthcare data systems. It's next to impossible to perform high-quality search (even by individuals approved to do so by the IRB). The state of the art in most places is regex searching in SQL.
This is something we have the power to contribute to. Bringing modern search capabilities to important datasets like health (while maintaining HIPAA compliance) is a much better use of engineering time than mining spyware data for creepy insights...
[Disclosure: I contributed heavily to one of the major medical search products on the market. We dealt with organisations that expended tens of thousands of dollars and many months per candidate for recruitment. Using some very straightforward IR tech we literally found all their candidates in a few minutes, plus many more. But there is so much more to do!]
Yes, very true. Beyond just access to clinical data, there are often major differences in how the same conditions are recorded across provider organizations, depending on EHR data models and local practices. Researchers who want to use data from multiple organizations typically have to put a huge amount of work into their data pipelines for cleansing and normalization. Some standards development organizations such as HL7 (including their various FHIR accelerators) are now writing more detailed and specific implementation guides to improve data quality and consistency, so I would encourage technologists to contribute to those projects.