Whether or not the original Higgs discovery decay channels used ML, confirming that it really was the Higgs required measuring its decay to b-quarks, and b-tagging has used ML since the LHC started taking data.
Over the lifetime of the LHC, the backgrounds got around 10x smaller at the same "efficiency" (the fraction of true b-quarks tagged, if you want to be pedantic about the definitions). We've used NNs in b-tagging for decades now, so it was always possible to dial in a threshold for tagging that was e.g. 70% efficient.
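For concreteness, here's a minimal sketch of what "dialing in a threshold for a given efficiency" means. The scores and distributions below are made up for illustration; real taggers use the experiments' own discriminants and calibrated working points.

    import numpy as np

    # Hypothetical classifier scores: higher = more b-like.
    # scores_b are true b-jets, scores_bkg are light/charm background jets.
    rng = np.random.default_rng(0)
    scores_b = rng.normal(loc=2.0, scale=1.0, size=100_000)
    scores_bkg = rng.normal(loc=0.0, scale=1.0, size=100_000)

    target_efficiency = 0.70  # e.g. the "70% working point"

    # Pick the threshold so that 70% of true b-jets pass the cut.
    threshold = np.quantile(scores_b, 1.0 - target_efficiency)

    efficiency = np.mean(scores_b >= threshold)    # ~0.70 by construction
    mistag_rate = np.mean(scores_bkg >= threshold) # background passing the cut
    rejection = 1.0 / mistag_rate                  # the number people quote

    print(f"threshold={threshold:.3f} eff={efficiency:.3f} rejection={rejection:.1f}")

"10x smaller background at the same efficiency" then just means the rejection at a fixed working point (say 70%) went up by roughly an order of magnitude as the taggers improved.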
Transformers cut the backgrounds by a further factor of a few in the last few years, though [1].
[1]: https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PLOTS/FTAG-20...