By "the Internet" they mean the IPv4 space, right? There are only 3.681 billion public IPv4 addresses, so scanning them all is a trivial problem given enough parallelism.
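To put a rough number on "trivial", here is a back-of-the-envelope sketch. The address count is the figure quoted above; the probe rate and worker count are made-up assumptions for illustration, not anything Shodan has published, and the model ignores retries, rate limiting, and scanning more than one port.

```python
# Rough estimate of how long one pass over the public IPv4 space takes.
# PUBLIC_IPV4 is the figure from the comment above; the probe rate and
# worker count are illustrative assumptions only.
PUBLIC_IPV4 = 3_681_000_000


def scan_seconds(addresses: int, probes_per_second: int, workers: int = 1) -> float:
    """Seconds needed to send one probe to every address.

    Assumes the work splits evenly across workers and that each worker
    sustains probes_per_second with no retries or back-off.
    """
    if probes_per_second <= 0 or workers <= 0:
        raise ValueError("probes_per_second and workers must be positive")
    return addresses / (probes_per_second * workers)


if __name__ == "__main__":
    for workers in (1, 10, 100):
        seconds = scan_seconds(PUBLIC_IPV4, 1_000_000, workers)
        print(f"{workers:>3} worker(s): {seconds:,.1f} s")
```

Under those assumptions a single machine sending a million probes per second covers one port across the whole space in 3,681 seconds, i.e. about an hour, and the time drops linearly as you add workers.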
They're working on scanning IPv6 as well. They got in trouble a few years back after they were observed harvesting IPv6 addresses by running a public NTP server[1].
Searching Shodan for SSH servers that are not on port 22 probably gives you back a Venn diagram containing circles for "people who think security by obscurity works" and "people who think their stuff is important enough to 'hide' by configuring non-standard port numbers".
The intersection there probably has some interesting low-hanging fruit in it...
(There's a third circle in that Venn diagram which I sometimes sit in, labeled "people who change port numbers to keep log file noise lower", which, while maybe a valid choice, also opens you up to being thought of as "interesting, possibly low-hanging fruit" by the sort of people who think those things.)
You can probably get a pretty good idea of the v6 space by checking domain name registrations, certificate transparency, logging requests from v6 addresses, etc.
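As a minimal sketch of the "logging requests from v6 addresses" idea: the function below pulls unique, globally routable IPv6 addresses out of access-log lines. The log layout (client address as the first whitespace-separated field) and the name `harvest_ipv6` are assumptions made up for this example; a real log format would need its own parsing.

```python
import ipaddress
from typing import Iterable, Set


def harvest_ipv6(log_lines: Iterable[str]) -> Set[ipaddress.IPv6Address]:
    """Collect unique global IPv6 client addresses from log lines.

    Assumes (hypothetically) that the client address is the first
    whitespace-separated field, optionally wrapped in [brackets].
    """
    seen: Set[ipaddress.IPv6Address] = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        token = fields[0].strip("[]")
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # first field is not an IP address
        # Keep only IPv6, and skip link-local, loopback, documentation
        # and other non-global ranges.
        if addr.version == 6 and addr.is_global:
            seen.add(addr)
    return seen


if __name__ == "__main__":
    sample = [
        "2606:4700:4700::1111 - - GET /",
        "192.0.2.1 - - GET /",
        "fe80::1 - - GET /",
    ]
    for address in sorted(harvest_ipv6(sample)):
        print(address)
```

The same filter would apply to addresses gathered from DNS AAAA records for names found in registrations or certificate transparency logs; only the input source changes.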
> Shodan has servers located around the world that crawl the Internet 24/7 to provide the latest Internet intelligence.