It depends. SIMD float -> scalar float is fast, since both live in the same registers. If you're pulling out lane 0 you don't even need to do anything (it's just a type cast). For other lanes you need a shuffle first.
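A minimal sketch of the float case, assuming x86 with SSE. The helper names `lane0` and `lane2` are made up for illustration; only the intrinsics are real.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_cvtss_f32, _mm_shuffle_ps */

/* Hypothetical helper: read lane 0.
   The scalar float already sits in the low 32 bits of the XMM register,
   so compilers typically emit no instruction for this. */
static float lane0(__m128 v) {
    return _mm_cvtss_f32(v);
}

/* Hypothetical helper: read lane 2.
   Shuffle the wanted element down to lane 0, then read it as above. */
static float lane2(__m128 v) {
    __m128 s = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtss_f32(s);
}
```

The result stays in an XMM register the whole time, which is why the only real cost for lanes 1 to 3 is the shuffle.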
SIMD integer -> scalar integer has to move the value into a separate (general-purpose) register, so there is a small penalty (about 3 cycles, IIRC).
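A rough sketch of the integer case, assuming x86 with SSE2. Again, `int_lane0` and `int_lane2` are hypothetical names, not part of any library.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_cvtsi128_si32, _mm_shuffle_epi32 */

/* Hypothetical helper: read integer lane 0.
   This needs a real move from the XMM register into a general-purpose
   register, which is where the extra latency comes from. */
static int int_lane0(__m128i v) {
    return _mm_cvtsi128_si32(v);
}

/* Hypothetical helper: read integer lane 2.
   Shuffle the element down to lane 0, then do the same cross-register move. */
static int int_lane2(__m128i v) {
    __m128i s = _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtsi128_si32(s);
}
```

If SSE4.1 is available, `_mm_extract_epi32` from `<smmintrin.h>` can pull out a given lane in one step, but the value still has to cross over to the general-purpose registers.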