Biochemical Pathway Maps (biochemical-pathways.com)
232 points by exar0815 on June 17, 2021 | 79 comments


For the last year, I've been self-learning synthetic biology by working to become one of the few people to have ever genetically modified fungi in a diy lab environment[1][2][3]. It occurred to me that genetic engineering is largely a process of hacking biosynthetic pathways; that is, we take heterologous genes from organisms which have evolved novel biochemical pathways and we put them into other organisms that are really good at biosynthesizing those compounds at-scale (like Saccharomyces cerevisiae aka yeast).

One thing I've observed is that there seems to be no universal, well-annotated, structured database for biosynthetic pathways. Updates and additions to known pathways are published unstructured in papers, often in graphical form, which makes structuring the data even more challenging.

Were this to exist, it would be possible to build an app that let you easily design DNA plasmids for testing all kinds of incredible biosynthesis experiments. For instance, you would be able to select an organism [E. coli], enter your starting substrate [glucose], select a desired metabolite [psilocybin] and out would come a DNA sequence for a plasmid that contains all the necessary parts (promoters, genes and terminators) for transforming E. coli to produce psilocybin.
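A minimal sketch of what the core of such a tool might look like, assuming a structured reaction table already existed. The reaction table, gene names and part names here are all hypothetical placeholders, not curated biochemistry; the search is a plain breadth-first search from substrate to target, and the "design" step only lists parts rather than producing a real sequence.

```python
from collections import deque

# Toy reaction table: (substrate, product, gene). All entries are hypothetical
# placeholders, not curated biochemistry.
REACTIONS = [
    ("glucose", "intermediate_1", "geneA"),
    ("intermediate_1", "intermediate_2", "geneB"),
    ("intermediate_1", "side_product", "geneX"),
    ("intermediate_2", "target", "geneC"),
]


def find_pathway(reactions, substrate, target):
    """Breadth-first search for the shortest chain of genes from substrate to target."""
    outgoing = {}
    for src, dst, gene in reactions:
        outgoing.setdefault(src, []).append((dst, gene))

    queue = deque([(substrate, [])])
    seen = {substrate}
    while queue:
        compound, genes = queue.popleft()
        if compound == target:
            return genes
        for nxt, gene in outgoing.get(compound, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, genes + [gene]))
    return None  # no route exists in the table


def design_cassettes(genes, promoter="PROMOTER", terminator="TERMINATOR"):
    """Wrap each gene in a promoter/terminator pair (part names only, no sequences)."""
    return [(promoter, gene, terminator) for gene in genes]


genes = find_pathway(REACTIONS, "glucose", "target")
print(design_cassettes(genes))
```

Everything hard (choosing promoters for the host, codon usage, toxicity, vector size) is deliberately left out of this sketch.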

In my opinion, such a tool could profoundly impact science, industry and human well-being.

[1] http://everymanbio.com/

[2] https://www.instagram.com/everymanbio/

[3] https://youtube.com/everymanbio


There are several databases that annotate metabolic pathways, although you are right that we lack a comprehensive integrated data source. Some of the larger ones:

Reactome: https://reactome.org/

KEGG: https://www.kegg.jp/kegg/pathway.html

Wikipathways: https://www.wikipathways.org/index.php/WikiPathways

BioCarta: https://www.hsls.pitt.edu/obrc/index.php?page=URL1151008585

SMPDB: https://smpdb.ca/

ConsensusPathDB: http://cpdb.molgen.mpg.de/

Regarding the issue of parsing pathway figure data from published papers: pathway-figure-ocr (https://github.com/hiplot/pathway-figure-ocr/) is a project by the developers of WikiPathways that is trying to solve that problem.


Fun historical note: Steve Sprang (1) added a T terminator to Inkpad for iPad (2), developed originally under the company name Taptrix, which was in a YC'10 batch (3) after his first app, Brushes, was an early smash success on the iPhone (4). He added it at my request the day he open-sourced it (5), because I wanted a T terminator for drawing biochemical pathway maps!

(1) https://news.ycombinator.com/user?id=ssprang

(2) http://inkpad.art/

(3) https://news.ycombinator.com/item?id=6665261

(4) https://www.macworld.com/article/198770/brushes.html

(5) https://github.com/sprang/Inkpad/issues/22


That's a great list, but it's not just about annotated metabolic pathways; there also needs to be information about the gene(s) that have been characterized as being involved in those pathways, along with information about the source organisms and potential target organisms for production. Same with respect to substrates needed for intermediary pathways.


Gene Ontology (GO) is what you are looking for. KEGG is also useful. Welcome to systems biology. Mapping transcriptional networks/pathways is more challenging and a major focus of current research across all model organisms.


PathWhiz is pretty general (https://smpdb.ca/pathwhiz) and covers most of that. Give it a go! You can sign in as a guest and start creating pathways. All of the gene information is included and pulled in from UniProt as well as a bunch of other sources. Small molecule data is integrated with HMDB, DrugBank, and a bunch of other sources as well. (Disclaimer: although I didn't work on this tool, I worked on the first version of SMPDB.)


All that information is available through various databases and APIs, but the exact pipeline you are looking for might not be easily accessible via a nice user interface. It requires a lot of "glue" code to connect the data from one resource to another.


As far as I know genetic construct design remains a mostly manual process because the decision tree that goes into creating a viable clone is complicated to say the least. There are a lot of variables to optimize, from choosing the right expression vector, to cutting and gluing the right genes at the right spot. I think that in many cases, the science behind genetic engineering is mostly empirical. Biologists will often spend a long time to figure out how to express one particular product, and then apply that knowledge to similar products, but if they move to something vastly different, there is no guarantee of success.

Automating the process is a valuable effort nevertheless. I think that, because of its complexity, it would require a combination of knowledge-graph-based reasoning and AI.

There are databases such as Addgene: https://www.addgene.org/search/catalog/plasmids/?q=Cas9, in which you can search existing plasmids created by the scientific community. I think this could be a valuable resource for a machine learning approach.


Can you please list them? Then I can write all the glue code and pull them together.


My idea is to essentially be the glue and build the interface.


Some notes towards those ends:

WikiPathways supports advanced queries via their SPARQL API and UI. See [1] and [2]. I find WikiPathways nice because it lets logged-in users create and edit pathways, with a low barrier to entry.

I've been building a way to find related genes using biochemical pathways [3]. The source code linked there includes practical examples for fetching information on genes in those pathways, which you rightly note is needed for something compelling. That and other code there might help spark ideas for you on how to glue together various biochemistry and molecular biology APIs to achieve your vision.

I'm currently working on a way to drastically expand the set of organisms and pathways covered by WikiPathways. Yeast has 66 pathways there, compared to 1319 for human. By doing fast ortholog detection at runtime (using another SPARQL API, provided by OrthoDB [4]) I'm hoping to be able to convert relevant annotated pathways across organisms, e.g. human to yeast, mouse to rat, Arabidopsis to rice -- and vice versa.

[1] http://sparql.wikipathways.org

[2] https://www.wikipathways.org/index.php/Help:WikiPathways_Spa...

[3] https://eweitz.github.io/ideogram/related-genes?q=RAD51&org=...

[4] https://sparql.orthodb.org
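As a rough illustration of the kind of glue code this involves, here is a sketch that builds a SPARQL request listing WikiPathways pathways for one organism. Several things are assumptions rather than anything confirmed in this thread: the `/sparql` path on the endpoint, the `wp:` vocabulary terms, the plain string literal for the organism name, and the `format` parameter. It only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Assumed endpoint path; the thread only gives http://sparql.wikipathways.org
ENDPOINT = "http://sparql.wikipathways.org/sparql"


def build_query(organism):
    """Return a SPARQL query listing pathway titles for one organism.

    The organism name is interpolated as-is, so pass trusted input only.
    """
    return f"""
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?pathway ?title WHERE {{
  ?pathway a wp:Pathway ;
           dc:title ?title ;
           wp:organismName "{organism}" .
}}
ORDER BY ?title
""".strip()


def build_url(organism):
    """Encode the query as a GET URL using the SPARQL protocol's `query` parameter."""
    params = {
        "query": build_query(organism),
        "format": "application/sparql-results+json",  # assumed server option
    }
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_url("Saccharomyces cerevisiae")
print(url)
```

If the endpoint behaves as assumed, fetching that URL should return the yeast pathways mentioned above; the same pattern would apply to the OrthoDB endpoint with its own vocabulary.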


KEGG is a very useful index into known genes for this, I used it all the time when mining for possible enzymes in the past.


Agreed! And if it were complemented with added information about the organism's transcription machinery (promoters, terminators, etc.), it could empower folks to generate DNA to test a new or modified metabolic pathway in a manner that's never been tried before.


As has been pointed out by others, there already exist databases that annotate many key metabolic pathways; however, I think the reason the database you describe doesn't exist is that there's a LOT more very customized work in recreating synthetic pathways in cells. Minimally, as you say, you'll produce plasmids that code for all the proteins you need. In practice, factors affecting your design decisions include, but are not limited to:

* What your ultimate goals are (proof of concept versus lab/industrial-scale synthesis)

* Your synthetic vehicle, i.e. in which organism you will be recreating this pathway

* Whether the relevant pathway/genes already exist in some form in your vehicle, and may be affected by an introduction of foreign genes

* How you will detect expression of your foreign proteins (the fact that your genes are expressed doesn't mean the corresponding proteins are produced, and if you can't confirm protein expression, the inevitable troubleshooting will be five kinds of hell)

* What kinds of promoters you'll need to drive gene expression

* How stable the proteins in this pathway are, and if you need to modify them to increase stability

* Whether your proteins, after being produced, need extra tweaks (post-translational modifications, in the parlance) in order to work

* How much DNA you'll need

* How you'll deliver the DNA, i.e. what size/kind of delivery vector

* How you'll make the delivery vector

* Whether your proteins will fold properly once produced in the vehicle, and whether they will localize to the right parts of the cell

* Whether your proteins are toxic to your vehicle cell

* etc.


What resources are you using to self-learn synthetic biology? I'm also interested in the topic.

I'm looking for something more practical and hands-on at this point. I was planning to start my home lab in the next month or so, and start with some basic bacteria projects. So far I have found the following to be useful:

1. edX Principles of Synthetic Biology: https://www.edx.org/course/principles-of-synthetic-biology

2. Coursera systems biology: https://www.coursera.org/specializations/systems-biology

3. Coursera Industrial biotechnology https://www.coursera.org/learn/industrial-biotech

4. SBOL https://sbolstandard.org

5. This GitHub: https://github.com/websemantics/awesome-synthetic-biology

6. https://barricklab.org/twiki/bin/view/Lab

7. Synthetic biology primer (book)


I use two microbiology and mycology books I bought on OfferUp for $10, YouTube and papers. The vast majority of my learning comes from breaking larger technical goals into a series of smaller hands-on projects and then troubleshooting my way through. It's hard to give general advice on synbio since it's such a broad field and my journey principally involves a focus on fungi. In my case, I learned to grow all kinds of mushrooms and fungus at home. That led me to learning DNA barcoding of fungi, which has now led me to DNA transformation of fungi. Feel free to send me more info about the direction you want to head in and I'd be happy to give you some feedback.


It’s probably helpful to know that these biochemical maps aren’t complete. It’s the best set of pathways that we can pull together based on current knowledge.

Your idea of a program where you enter starting point and desired product is great in theory, but it’s similar to saying “oh, creating DropBox is easy, it’s just an interface linked to Amazon Cloud”. It’s missing about 99% of the stuff in the background nobody sees.


Thank you, I keep having to explain to programmers that get excited about doing biology discoveries in short periods that you can't achieve that the way you achieve a prototype of a web service in a weekend... They often have the impression they are the ones doing the most complex thing in the world... And they think they understand the necessary bricks after a few hours of reading... Ahhh hubris


The fundamental disconnect seems to be that programming is working with defined languages and environments created by humans, with tools created by humans to make further programming by humans easier. Biology is the result of billions of years of natural selection and random processes on hyper-complicated chemical pathways. Take something as simple as the epigenetics of expressing a gene: if it's energetically unfavorable or even toxic, in many systems some sort of downregulation such as DNA methylation is going to occur and provide a competitive advantage in the culture. That's a single gotcha of thousands. Obviously, engineering biology in a robust and reliable way is quite doable, but, ya know, ya gotta know what you're doing and know what you're getting into.


Programming is a great analogy to biology if:

- there is no documentation to start with

- you can only learn the language through trial and error and development of crude frameworks that are accurate 75% of the time

- you have no ability to look at the environment or operating system beyond “trying things” and making assumptions based on results

- the environment itself resists most attempts at modifying it through feedback mechanisms (but you can’t see these either, just the effects)

- no two computers + environments are identical. Code you write will mostly work in different computers, but occasionally won’t and sometimes will just brick (kill) the computer

- when you learn a rule about changing a variable to produce effect Y, that rule doesn’t apply in 9 out of 10 scenarios


for the database, KEGG Pathways comes to mind immediately: https://www.genome.jp/pathway/map01100

many papers mention KEGG and use its references and structure as resources. https://molecular-cancer.biomedcentral.com/articles/10.1186/... (I'm not saying there's anything special about this paper in particular)


Is KEGG open access?



I would have a look at Rhea, which is not a pathway database but a database of the chemical reactions that are out there in biology. This can be combined with the data in UniProt to build metabolic pathways, and then with metanetx.org to build draft metabolic models.

We recently had a collaboration with AWS Open Data+Neptune to show how easy it is to combine Rhea, ChEBI and UniProt in your own database.

[1] https://www.rhea-db.org

[2] https://www.metanetx.org

[3] https://academic.oup.com/bioinformatics/article/36/6/1896/56...

[4] https://aws.amazon.com/blogs/industries/exploring-the-unipro...


Before I start binging on every single aspect of your social media, I gotta know:

Have you tried applying your hobby towards reducing the cost of healthcare in America?

Could you somehow make monoclonal antibodies in your petri dishes?

Could you diagnose what kind of flu virus you have?

Because the phrase "self-learning synthetic biology" immediately rang a bell inside my mind that screamed "small business dedicated to cheaper healthcare", and is something that I have considered dedicating my free time towards.


> Have you tried applying your hobby towards reducing the cost of healthcare in America?

That isn't my stated goal, but I would imagine that sharing how to genetically engineer an incredibly powerful organism to express heterologous genes for less than $3000 in a kitchen lab could open up many possibilities!

> Could you diagnose what kind of flu virus you have?

Is it possible to extract DNA from a bodily fluid, perform PCR and read the sequence results to determine flu variant? I'd have to check on if there is a known DNA sequence that barcodes flu viral variants, but presuming there is, then yes - all of this is possible from a home lab currently.


Influenza is an RNA virus so you'll need to reverse transcribe it.
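To make that step concrete, here is a small sketch of what reverse transcription does at the sequence level: it produces a complementary DNA (cDNA) strand from an RNA template, which can then go into ordinary PCR. The input sequence is an arbitrary made-up example, not a real influenza fragment.

```python
# Watson-Crick partners for an RNA template copied into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the cDNA strand (written 5'->3') complementary to an RNA written 5'->3'."""
    rna = rna.upper()
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


print(reverse_transcribe("AUGGC"))  # GCCAT
```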


Yes, with a nanopore device that's doable if you take a sample that has a high enough viral load. You may even be able to get reads without amplification.


> Have you tried applying your hobby towards reducing the cost of healthcare in America?

This isn't so much a technological problem, but more a politics/incentives issue, as evidenced by the comparably lower cost of healthcare in Europe. Startups might be able to make small reductions in the cost, but truly solving the issue requires re-organising the entire healthcare system.

> Could you somehow make monoclonal antibodies in your petri dishes?

This wouldn't be particularly useful. In order to use it in humans, you need a ton of quality control and regulatory approvals, and this is where a lot of the manufacturing cost crops up.


agreed! a couple things seem to be holding this back:

- the academic publishing model incentivizes groups to build their own tools

- the small-ish market for something like this has kept commercial software from taking off (Genomatica started by building a tool like this in the early 2000s, before pivoting to bioprocess development)

- it's really hard to specify pathways in concrete physical terms. Even a chemical like glucose is actually a collection of pseudo-isomers (alpha & beta D-glucose). And try firmly defining a "gene" in your database!

that being said, there's a ton of work going on in this field and many cool projects to follow: https://dd-decaf.eu, https://biocyc.org, http://bigg.ucsd.edu, http://metanetx.org


The database would definitely need to define some boundaries and limitations, but I still think there is much opportunity in coalescing well-defined metabolic and genetic data and empowering folks to generate feasible genetic constructs.


You might also want some pathways to be pre-validated to work together in certain cellular contexts, like they have been doing with the BioBricks project


Absolutely. This is probably the best and only way to start.


Check out BRENDA [1]. It's the best one I've found for annotated pathways.

With regard to your tool idea, it is still very expensive to generate DNA from scratch. Designing a theoretical plasmid won't get you all the way there if you can't source the parent DNA sequence. We usually pay ~$0.30/base for DNA synthesis. Plus, I'm not sure what the metabolic pathway of psilocybin is, but you may need several plasmids to recreate the whole thing in E. coli (plasmids are generally limited to ~10-15 kb). An interesting idea, but definitely no small feat of engineering.

[1] https://www.brenda-enzymes.org/pathway_index.php
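A back-of-the-envelope version of that cost estimate, using only the ~$0.30/base figure and the ~10-15 kb plasmid size quoted above. Real quotes vary by vendor and construct, so treat the numbers as illustrative.

```python
PRICE_PER_BASE_USD = 0.30  # figure quoted in the comment above


def synthesis_cost(length_bp, price_per_base=PRICE_PER_BASE_USD):
    """Cost in USD of synthesizing length_bp bases from scratch."""
    return length_bp * price_per_base


for length_bp in (10_000, 15_000):
    print(f"{length_bp} bp -> ${synthesis_cost(length_bp):,.0f}")
```

At that rate a single full-size plasmid comes to roughly $3,000 to $4,500, and a pathway split across several plasmids multiplies that.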


People sometimes ask me how come a biologist might end up programming (or vice versa).

Showing them a map like this tends to clear things up pretty quickly!


yeah, but the programmer will believe the chart.

the biologist, who knows better, will regard the chart as “our current best attempt at documentation, subject to change at any time at all”

My background is biomedical. This or a very similar chart was used in the last week of lectures on control systems as a lesson in humility.

It was meant to demonstrate both how complex things can be — and then, when the professor recolored the lines on a section according to the confidence in their correctness as a function of recent research, how uncertain our knowledge of that complexity really is.

He had a backup slide where an entirely new graph, with almost no topological similarity to the metabolic one, was created for a few of the nodes that participate in other nonmetabolic pathways.


"the programmer will believe the chart."

I see you've not done much legacy software maintenance.. ;)


> yeah, but the programmer will believe the chart.

I'm a programmer and I do understand that these maps are "as much as we currently know" and the "we currently know" is expanding rapidly. I like to see biology as reverse engineering alien technology made by a much more advanced civilization. Of course that's not true — it all evolved over billions of years — but the resemblance is totally there.

The one thing that really strikes me about biology is how young of a science it actually is and how much progress we've made in so little time. It was only 1953 when the structure and function of DNA was discovered. In terms of history, it's not even yesterday. Yet now, 68 years later, we're literally injecting people with correctly encoded instructions to make a harmless part of a deadly virus to save their lives. This kinda blows my mind.


>I like to see biology as reverse engineering alien technology made by a much more advanced civilization. Of course that's not true — it all evolved over billions of years — but the resemblance is totally there.

Be really careful here, though, otherwise you'll hoodwink yourself.

The two famous things that come to mind for me are these, both of which remind me to tread carefully:

https://blogs.sciencemag.org/pipeline/archives/2007/11/06/an...

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/E...


most of the edges in this graph were elucidated using radiocarbon tracing, and they are generally well-established. I don't think there have been major corrections to the Calvin cycle in 50 years. Ditto for glycolysis: at best, people find alternative paths and excursions, but they're not on-path for core biology.

These metabolic charts, like the results of spectroscopy, actually represent some of the best, highest quality scientific reference data that exists.


Agree with you 100%. This chart is just barely scratching the surface


It's a pretty cool field!


I don't ever see anything nearly this complex in software. Programming has encapsulation/boundaries. In biology, everything can have an effect on everything else (with side channels appearing everywhere, even having influences via thermodynamic laws, pH, etc).


Yes and there are also the non enzymatic reactions that the genetics-focused people tend to ignore, but they have a big importance on a metabolome and its evolution over time after extraction.


Eh? There's quite some encapsulation in biology too. Often literally.

But yeah, subsystems don't need to be kept to a size that will fit inside a regular programmer's working memory.


These maps are fun and fascinating, but can be misleading for the supplement enthusiasts.

In many cases you can trace pathways upstream until arriving at a supplement or nutrient that can be bought online. People erroneously assume that consuming that nutrient will achieve some downstream effects in certain pathways because all of the arrows connect. It's not that simple, though. For the most part our bodies are excellent at absorbing the nutrients we need and regulating everything as appropriate.


i had this poster on my wall (actually, the predecessor, which you could order for free by physical mail) my entire sophomore year while taking biochemistry. I used to joke to people that by the end, I would have memorized most of it, and that did actually turn out to be true. Even today, some 28 years later, I remember the hexose monophosphate shunt.


> which you could order for free by physical mail

This one can be ordered as well.


and here [0] is where you can order it. you can ask for paper copies of the poster to be mailed to you (free) from the roche philanthropy department! it took about a month for mine to arrive, i think, but they're really awesome looking and i now have the 'cellular and molecular processes' one hanging behind me in my front room, so that i look impressively clever on zoom calls ;)

0. https://www.roche.com/sustainability/philanthropy/science_ed...


I ordered one about 1 year ago, but never received it. Possibly because I didn't have a 'proper' organisation (e.g. a university dept) that I could list in the form.


i simply listed my perfectly ordinary residential flat's address in the uk. and, as far as i can tell from the text in the link i gave above, they are giving these away to anyone who wants one as an educational public service, hence the fact that it's the philanthropy department that runs the scheme.

> We believe all people should have access to important scientific information to help their research and education. [...] Roche provides copies as a free service to the public.

they are pretty cool to have, with very high quality two colour printing, and you also get a wee booklet with an index to the two posters to help you find things. there's also a book [0] by the same author with much more detail, and obviously much more portable, but sadly much more expensive ;(

so, i'd try asking for them again, if i was you.

0. https://blackwells.co.uk/bookshop/product/9780470146842


Thanks, I will.


the difference is, at this point in my career, I no longer need this poster in physical form. When I return to the office, there will be a copy on every fridge. Also I transitioned from biology to cloud biology, this whole chart is just a JSON file.


Related discussion from about 6 months back (including where to find the PDF version).

https://news.ycombinator.com/item?id=25158249


Even earlier discussion, also with a link to the posters:

https://news.ycombinator.com/item?id=23448398


Beautiful! And complex from start to finish.

I wonder what formalisms/representations are there to manage the complexity of metabolic pathways. For example, say that that figure is 100% accurate, and furthermore, "all metadata needed" in terms of reaction rates, etc. for each individual reaction is available. If one wants to stage an intervention (say, suppress the production of compound X without altering anything else significantly), what kind of program would be able to find a solution?
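One very simplified way to frame that question as a program is a knockout search over a reaction network. The sketch below is purely topological: it ignores reaction rates and stoichiometric coefficients, which quantitative approaches such as flux balance analysis do take into account. The network and compound names are made up for illustration.

```python
# Each reaction: name -> (set of substrates, set of products). Toy data only.
REACTIONS = {
    "r1": ({"S"}, {"A"}),
    "r2": ({"A"}, {"X"}),
    "r3": ({"A"}, {"B"}),
    "r4": ({"B"}, {"Y"}),
}


def producible(reactions, seeds):
    """Forward closure: every compound reachable from the seed compounds."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= have and not products <= have:
                have |= products
                changed = True
    return have


def single_knockouts(reactions, seeds, suppress, keep):
    """Reactions whose removal blocks `suppress` while all of `keep` stays producible."""
    hits = []
    for name in sorted(reactions):
        remaining = {k: v for k, v in reactions.items() if k != name}
        have = producible(remaining, seeds)
        if suppress not in have and keep <= have:
            hits.append(name)
    return hits


print(single_knockouts(REACTIONS, {"S"}, suppress="X", keep={"Y"}))  # ['r2']
```

Here only removing r2 stops X without also cutting off Y; removing r1 would block both.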


It’s cool to see this on HN. It’s a common activity in metabolic modeling & metabolic engineering circles to build new pathway visualization tools (I spent a chunk of my PhD on one called “Escher”). I keep waiting for a tool to come along that’s good enough to be sticky and win this little market. But it might be more like IDEs, where there are some classics (pathway tools = emacs?), and an endless supply of new entrants.


People have tried that, but the problem is that every group has its tools in different languages, different flow management systems, etc.


My friend who studied biochemistry in Turku University once had this kind of poster on his wall, but it had all the pathways overlaid on a human body. When I first saw it, I studied it for the better part of an hour.

I have been trying to find an image of this poster without success. The detail level was like on these Roche posters, but the style was more like in these:

https://www.tocris.com/literature/life-science-posters


I love these posters! I had them on my bedroom wall as a teenager. (I had very little idea about most of what it is describing, but I thought it looked cool and hey, it’s free!)


Pretty cool !! I just ordered mine on their website. They provide it in two parts : 1. Metabolic Pathways (139cm x 98cm / 4.5feet x 3 feet) 2. Cellular and molecular processes (115cm x 98cm / 4feet x 3feet)


How do I begin to understand this? Where do you start?


You start with what you eat. There are three (four if you include drinking) main categories:

Protein, Carbohydrates, Fat

Your goal is to take reduced carbon (carbon bound to hydrogen) and metabolize it to oxidized carbon (carbon bound to oxygen). That process releases energy your cells can use.

The problem with oxidizing carbon is that it creates toxic products ("free" electrons) and involves splitting atmospheric oxygen (O2) into O-, which is very reactive. So then you need a whole host of other pathways to contain that reactive compound.

If you really want to understand it, follow the carbon as it comes in, and trace it through to CO2. That's the pathway, and everything else is to support that.
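As a worked example of "following the carbon", the overall oxidation of glucose is C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O. A short sketch that checks the atoms balance on both sides:

```python
from collections import Counter

# Elemental composition of each species.
FORMULAS = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "O2": {"O": 2},
    "CO2": {"C": 1, "O": 2},
    "H2O": {"H": 2, "O": 1},
}


def count_atoms(side):
    """Total atoms per element for a dict of {species: stoichiometric coefficient}."""
    total = Counter()
    for species, coefficient in side.items():
        for element, n in FORMULAS[species].items():
            total[element] += coefficient * n
    return total


reactants = {"glucose": 1, "O2": 6}
products = {"CO2": 6, "H2O": 6}
print(count_atoms(reactants) == count_atoms(products))  # True
```

All six carbons that come in as glucose leave as CO2; the pathways on the map are the many intermediate steps between those two states.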


MIT 7.016 Introductory Biology, Fall 2018 https://www.youtube.com/playlist?list=PLUl4u3cNGP63LmSVIVzy584-ZbjbJ-Y63

MIT 5.111 Principles of Chemical Science, Fall 2014 https://www.youtube.com/playlist?list=PLUl4u3cNGP63LOmB3_O0x...

Textbooks (free PDFs available on gen.lib.rus.ec):

Lehninger, Principles of Biochemistry https://www.macmillanlearning.com/college/ca/product/Lehning...

Molecular Biology of the Cell https://www.ncbi.nlm.nih.gov/books/NBK21054/


Citric-Acid Cycle, Krebs cycle

Glucose, Acetyl-CoA, Pyruvate, ADP, ATP, NAD+, NADH, NADP, NADPH


It’s amazing how these things happen in the volume of a cubic micrometer. There is, however, a big misleading element: not all of these pathways operate in the same cell and not all of them are equally strong. The arrows are not of equal “thickness”, if you like. That’s the big problem in molecular biology and biochemistry: all these arrows are always depicted as being of equal strength.


That is some fault-tolerant spaghetti code!


The conciseness of DNA coding for all that proteins do is almost beyond human intelligence in its subtlety and obfuscation.


It's not clear yet whether it's beyond our intelligence or not.


And each kind of cell, life-stage of cell and its history will have an impact on what part of those mapping are really in use at a given time.


On the commercial side of this is Elsevier Pathway studio [1]

[1] https://www.elsevier.com/solutions/pathway-studio-biological...


I didn't know such a map existed. I wonder if I can get one as a poster!



Thank you ! I've just ordered it. I hope I'll receive it :)


I remember reading someone doing the math on these and finding that statistics-wise, it was essentially impossible for these metabolic pathways to have evolved through randomness given the age of our planet.


All the calculations that I've seen with similar claims are very wrong. Do you have a link so I can take a look?

Just an important detail. It's not just randomness. It's randomness + natural selection.


It was part of the book, Arrival of the Fittest. The author somehow quantified the scale of the space of all possible chemical reactions and it was just so huge that the chance of the citric acid cycle emerging was just infinitesimal, like 1 in 10^100 or something. Like winning the lottery 10 times in a row type of thing. Found it an intriguing idea. Would be curious to see the counter argument, though it's likely to be beyond me mathematically.


This chart is exactly the hacked-together kluge I expect of a randomly evolved solution.

Keep in mind that as soon as an element is added to the system - and it works - that element becomes sticky. Rinse and repeat, and you get more and more complex systems over time.


Would be great to have a citation on this



