Believe it or not, it's as simple as averaging or adding the gradients from each training example before applying them to the model weights. The same thing happens when you train a model using batches of inputs.
It actually isn't. You need a synchronizer or a batch size of one, or else "strange things" can happen and you waste a lot of cycles. Alternatively, you can make non-trivial changes to your network structure to enable distributed training.
It really is that simple. Yes, there are many different approaches to this (which can become quite clever and complex, as is true of training in general), but in most cases it really does boil down to adding or averaging the gradients.
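To illustrate the claim, here's a minimal sketch (toy linear model and data of my own invention): for squared error, the average of per-example gradients equals the gradient of the mean loss over the batch, which is why summing/averaging worker gradients mirrors ordinary batched training.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)         # model weights
X = rng.normal(size=(8, 3))    # a "batch" of 8 inputs
y = rng.normal(size=8)         # targets

def grad_single(w, x, t):
    # gradient of 0.5 * (w.x - t)^2 with respect to w
    return (w @ x - t) * x

# "Distributed" view: each worker computes a gradient on one example,
# then the gradients are averaged before the weight update.
avg_of_grads = np.mean([grad_single(w, x, t) for x, t in zip(X, y)], axis=0)

# "Batch" view: one gradient of the mean squared-error loss over the batch.
batch_grad = X.T @ (X @ w - y) / len(y)

print(np.allclose(avg_of_grads, batch_grad))  # True
```

This is the synchronous data-parallel case; the caveats in the reply above (stale gradients, asynchrony) are about what happens when that averaging step isn't synchronized.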