If performance really is of concern, qsort() or similar functions probably shouldn't be used. Instead a dedicated function, which allows to choose a specific algorithm (qsort() doesn't have to use quicksort) and also doesn't involve calling a comparison function pointed to by a function pointer, can be used.
One interesting thing I have learned recently is that GCC can do inlining through function pointers.
That doesn't make your point about dedicated function being better idea for performance but it's one thing "std::sort is faster by design" people often miss.
From GCC documentation:
>>-findirect-inlining
Inline also indirect calls that are discovered to be known at compile time thanks to previous inlining. This option has any effect only when inlining itself is turned on by the -finline-functions or -finline-small-functions options.
I have no idea when it can and when it can't do that to be honest. I tested qsort vs std::sort on my machine on my data and performance was the same (Windows, MinGW, GCC 4.8, -flto enabled) but other people reported different results.
I would hope that on modern compilers on modern architectures, this would be done with zero branches. That is the case in a quick test on my machine (gcc 4.6.3).
The simplest way to implement that would be two SETxx instructions and a subtract on 386+, SLT/SGT on MIPS, two subtracts (or one compare and one subtract) and two predicated moves on ARM. No branching required at all.