At best this lets you conclude that a URL could be valid. Is that really useful? Is the goal here to catch typos? Because you'd still miss an awful lot of typos.
If you really want your URL shortener to reject bad URLs, then you need to actually test fetching each URL (and even then...)
As an aside, I'd instantly fail any library that validates against a list of known TLDs. That was a bad idea when people were doing it a decade ago. It's completely impractical now.
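To make the "could be valid" point concrete, here's a minimal sketch in Python (the thread names no language, so this is purely illustrative). A structural check like this accepts anything that parses as an absolute http(s) URL — including typo'd hostnames — and deliberately avoids a TLD whitelist:

```python
from urllib.parse import urlparse

def looks_like_url(candidate: str) -> bool:
    """Structural check only: confirms the string *could* be a URL.

    It says nothing about whether the host exists or responds."""
    parsed = urlparse(candidate)
    # Require an http(s) scheme and a non-empty host. No TLD whitelist:
    # new TLDs appear faster than any hard-coded list gets updated.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

# Both of these pass the structural check; only one resolves to a real host.
print(looks_like_url("http://example.com/"))          # True
print(looks_like_url("http://tihs-is-a-typo.example/"))  # True as well
print(looks_like_url("not a url"))                    # False
```

Note how little this buys you: the typo'd hostname sails right through, which is exactly the objection above.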
My exact use case was the following: the user clicks a bookmarklet that passes the current URL in the browser as a query string parameter to a URL shortener script. The validation is then performed before the URL is shortened.
In that scenario, and with the given requirements, I can’t think of a case where the validation would fail. There’s no need to worry about protocol-relative URLs, etc.
(Keep in mind that this page is 4 years old — I very well may have missed something.)
> If you really want your URL shortener to reject bad URLs, then you need to actually test fetching each URL (and even then...)
I disagree. http://example.com/ might experience downtime at some point, but that doesn’t mean it’s suddenly an invalid URL.
> As an aside, I'd instantly fail any library that validates against a list of known TLDs. That was a bad idea when people were doing it a decade ago. It's completely impractical now.
I still don't quite follow the purpose of the validation. Is it meant to guard against malicious use? In normal use, I would think that pretty much any URL that's good enough for the browser sending it would be good enough for the link shortener.