It's easier for large, rich companies with infrastructure and datasets. It's very hard for small startups to build useful real-world models from scratch, so you see most people building on top of SD and APIs. But that limits what you can build; for example, it's very hard to build realistic photo editing on top of Stable Diffusion.
I wrote it from the perspective of a small startup (<10 people, bootstrapped or small funding). I think it's far cheaper and easier to build a nice, competitive mobile app/SaaS than to build a really useful model.
But yes, I agree, it will be very competitive with much smaller margins.
I've tried it. Sure, it's good, but it's not even close to the real thing. But yes, it's getting cheaper through better hardware, better data and better architectures. Also, it builds on Facebook's models that were trained for months on thousands of A100 GPUs.