I'm working on a Ph.D. in GPU architecture, and this course is the real deal. It goes beyond how to run things on a GPU and into analyzing the runtime and work efficiency of algorithms suited to the GPU.
I took the course at UIUC (ECE 408) 2 years ago. While the assignments weren't too challenging, I thought they were thorough in covering the material from class, and the material from class came straight from Professor Hwu's book.
Plus, the final exam was extremely harsh so I wouldn't have called it a joke.
Compared to the physics classes or that algorithms class, I could put my brain on autopilot. Sure, the final was hard, but mostly because I didn't have enough time to write code in a Word document, and people who went to the official exam area actually got significantly more time (I took it a few years before you).
Many things were missing from that class, including how to improve performance by making sure each thread in a warp reads memory in the optimal access size, for example by loading a float4 (16 bytes) instead of a single float.
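For anyone who hasn't seen the float4 trick, here is a minimal sketch of what I mean. It is my own illustration, not something from the course or the book: the kernel names and the `run_scale_check` helper are made up for this example. The idea is that each thread reads and writes one 16-byte `float4` instead of one 4-byte `float`, so the same data moves with a quarter of the load/store instructions, which often helps bandwidth-bound kernels. It assumes the buffers are 16-byte aligned (pointers returned by `cudaMalloc` are) and handles a length that is not a multiple of 4 with a small scalar pass.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Scalar version: each thread loads and stores one 4-byte float.
__global__ void scale_scalar(const float* in, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

// Vectorized version: each thread loads and stores one 16-byte float4.
// Assumption: in/out are 16-byte aligned and n4 is the number of float4 elements.
__global__ void scale_float4(const float4* in, float4* out, float a, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = in[i];            // one 128-bit load
        v.x *= a; v.y *= a; v.z *= a; v.w *= a;
        out[i] = v;                  // one 128-bit store
    }
}

// Hypothetical host helper: scales n floats by 2 on the GPU and
// returns true if every result matches the CPU computation.
bool run_scale_check(int n) {
    const float a = 2.0f;
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 1024);

    float *d_in = nullptr, *d_out = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int n4 = n / 4;            // elements covered by float4 accesses
    const int tail = n - 4 * n4;     // 0..3 leftover floats
    if (n4 > 0) {
        scale_float4<<<(n4 + threads - 1) / threads, threads>>>(
            reinterpret_cast<const float4*>(d_in),
            reinterpret_cast<float4*>(d_out), a, n4);
    }
    if (tail > 0) {
        // Leftover elements go through the scalar kernel.
        scale_scalar<<<1, threads>>>(d_in + 4 * n4, d_out + 4 * n4, a, tail);
    }

    // cudaMemcpy on the default stream waits for the kernels above.
    bool ok = cudaMemcpy(h_out.data(), d_out, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == a * h_in[i]);
    return ok;
}
```

Calling `run_scale_check(1 << 20)` from a `main` and compiling with `nvcc` should be enough to try it. Whether the float4 version is actually faster depends on the GPU and the kernel, so profile both before committing to it.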
I could have learned the same stuff by watching his MOOC - which is what OP got bored of doing.
Wen-mei Hwu's lectures on "Advanced Algorithmic Techniques for GPUs" (first lecture slides: http://iccs.lbl.gov/assets/docs/2011-01-24/lecture1_computat...) are a gold mine of GPU programming techniques. I believe he has published several books on the topic too, and released a benchmark suite (Parboil http://impact.crhc.illinois.edu/parboil/parboil.aspx) optimized with these techniques.