Also, if your code has a lot of branching (most of my work wouldn’t benefit from offloading to a GPU), or if the amount of data processed in parallel at any one time is too small to make up for the cost of memory transfer, it can be the right approach and provide a huge performance boost.
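
To put rough numbers on the memory-transfer point, here is a back-of-the-envelope sketch in Python. The helper name `offload_pays_off` and every figure in it (throughputs, link bandwidth, launch overhead) are made up for illustration and are not measurements. The model is deliberately crude: it ignores branch divergence and assumes transfer and compute don't overlap.

```python
# Hypothetical figures, chosen only to illustrate the trade-off.
CPU_BYTES_PER_S = 1e9      # assumed CPU processing throughput
GPU_BYTES_PER_S = 50e9     # assumed GPU processing throughput
LINK_BYTES_PER_S = 10e9    # assumed host<->device transfer bandwidth
LAUNCH_OVERHEAD_S = 20e-6  # assumed fixed cost per offload


def offload_pays_off(n_bytes,
                     cpu_bytes_per_s=CPU_BYTES_PER_S,
                     gpu_bytes_per_s=GPU_BYTES_PER_S,
                     link_bytes_per_s=LINK_BYTES_PER_S,
                     launch_overhead_s=LAUNCH_OVERHEAD_S):
    """Return True if the modeled GPU time beats the modeled CPU time."""
    cpu_time = n_bytes / cpu_bytes_per_s
    # Input goes to the device and results come back, so count the link twice.
    transfer_time = 2 * n_bytes / link_bytes_per_s
    gpu_time = launch_overhead_s + transfer_time + n_bytes / gpu_bytes_per_s
    return gpu_time < cpu_time


# A tiny batch is dominated by the fixed overhead; a large one amortizes it.
print(offload_pays_off(1_000))        # False under these assumptions
print(offload_pays_off(100_000_000))  # True under these assumptions
```

With these invented numbers, a 1 KB batch loses to the CPU because the fixed overhead alone is bigger than the CPU time, while a 100 MB batch comes out ahead. Plug in figures measured on your own hardware before drawing any conclusion.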