Congratulations on what looks like a very challenging task. I'm assuming some of those tests hit a database. How have you dealt with that? I'd expect a single instance, even on a powerful bare-metal server, to be a roadblock in this situation. A few insights on the Docker/containerization part of it would also be nice!
Our test-running infrastructure spins up a pool of database instances on each worker machine, one for each worker process. The test setup and teardown code handles schema management: it hooks into our DB access layer to create and clean up database tables only if a given test uses them.
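
To make the "only if they're used" part concrete, here is a minimal sketch of the idea, not the actual infrastructure. It assumes an in-memory SQLite database standing in for one worker's instance, and the names `SCHEMAS`, `LazySchemaDB`, and `existing_tables` are hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical schema registry: table name -> DDL statement.
SCHEMAS = {
    "users": "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "orders": "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)",
}


class LazySchemaDB:
    """Stand-in for a DB access layer that creates tables on first use."""

    def __init__(self, conn):
        self.conn = conn
        self.touched = set()  # tables created during the current test

    def table(self, name):
        # Hook: create the table the first time a test asks for it.
        if name not in self.touched:
            self.conn.execute(SCHEMAS[name])
            self.touched.add(name)
        return name

    def teardown(self):
        # Drop only the tables this test actually used.
        for name in self.touched:
            self.conn.execute(f"DROP TABLE {name}")
        self.touched.clear()


def existing_tables(conn):
    """List the tables currently present in the SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    return sorted(row[0] for row in rows)


# Assumption: each worker process would own one database like this one.
conn = sqlite3.connect(":memory:")
db = LazySchemaDB(conn)

# A "test" that touches only the users table.
db.conn.execute(f"INSERT INTO {db.table('users')} (name) VALUES ('ada')")
tables_during_test = existing_tables(conn)

db.teardown()
tables_after_test = existing_tables(conn)
```

The `orders` table is never created here because the test never asks for it, which is where the savings would come from when most tests touch only a small part of a large schema.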