There have been a few issues lately with broken links in docs (e.g. #20818, #21019). We should investigate automated checking of links in generated or published docs.
Comment From: dreis2211
I used http://validator.w3.org/checklink today to find what is hopefully one last issue. Maybe you can take a look at that for inspiration and/or validation of results.
As you can see, there are some URLs which are generally reachable via a browser, but not if you simply curl them, e.g.:
curl -I https://redis.io
HTTP/1.1 404 Not Found
Server: nginx/1.10.2
Date: Wed, 22 Apr 2020 16:50:24 GMT
Content-Length: 3673
Connection: keep-alive
Comment From: wilkinsona
Interesting finding. Thanks, @dreis2211. Looks like redis.io changes behaviour based on the User-Agent. It 404s for HEAD requests too. It responds with a 200 for GET or HEAD if you spoof the user agent and pretend to be a browser.
Comment From: dreis2211
Yeah, there are some which are also behind HTTP Basic auth. My point being: some URLs are false positives, and any eventual tooling will need a bit more magic for those than for others.
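To make the false-positive cases above concrete, here is a minimal sketch of a link check that sends a browser-like User-Agent and retries with GET when HEAD fails. It is only an illustration, not a proposal for the actual tooling: the names `BROWSER_UA`, `build_request`, `fetch_status`, and `check_link` are hypothetical, and the User-Agent value is an arbitrary assumption.

```python
import urllib.error
import urllib.request

# Assumption: an arbitrary browser-like User-Agent; any realistic value would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, method="HEAD", user_agent=BROWSER_UA):
    """Build a request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(
        url, method=method, headers={"User-Agent": user_agent}
    )


def fetch_status(url, method="HEAD", timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(build_request(url, method), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_link(url, fetch=fetch_status):
    """Try HEAD first, then fall back to GET before reporting a link as broken.

    Returns a (reachable, status) tuple.
    """
    status = fetch(url, "HEAD")
    if status is None or status >= 400:
        status = fetch(url, "GET")
    return status is not None and status < 400, status
```

Calling `check_link("https://redis.io")` would return a `(reachable, status)` pair. URLs behind HTTP Basic auth would still come back as 401 with this sketch, so real tooling would likely also need an allow-list or credentials for those.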
Comment From: danielmenezesbr
Istio.io [1] [2] uses html-proofer and linkinator to test rendered HTML files to make sure they're accurate.