From: Todd Zullinger <tmz@pobox.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, "Matt Burke" <spraints@gmail.com>,
"Victoria Dye" <vdye@github.com>,
"Matthias Aßhauer" <mha1993@live.de>
Subject: Re: Migration of git-scm.com to a static web site: ready for review/testing
Date: Fri, 17 Nov 2023 21:57:44 -0500
Message-ID: <ZVgoKPAg6jKZk_M6@pobox.com>
In-Reply-To: <4dd9b45d-b352-d8ba-3314-96ab48f7abf7@gmx.de>
Hi Johannes,
Johannes Schindelin wrote:
>> For checking links, a tool like linkchecker[1] is very handy.
>> This is run against the local docs in the Fedora package
>> builds to catch broken links.
>
> Hmm, `linkchecker` is really slow for me, even locally.
Yeah, it took an hour and a half to run for me, both on an
old laptop and a fast server with plenty of threads,
bandwidth, and memory.
Checking the git HTML documentation, which is largely the
only place I've used it, takes under 30 seconds. It has been
very helpful in catching broken links in the docs during the
build, and the time is short enough that I never minded.
> Granted, the added cross-references now increase the number of hyperlinks
> to check, but after I let the program run for a bit over an hour to look
> at https://git-scm.com/ (for comparison), it is now running on the local
> build (i.e. the `public/` folder generated by Hugo, not even an HTTP
> server) for over 45 minutes and still not done:
>
> -- snip --
> [...]
> 10 threads active, 112977 links queued, 206443 links in 100001 URLs checked, runtime 48 minutes, 46 seconds
> 10 threads active, 113455 links queued, 206689 links in 100001 URLs checked, runtime 48 minutes, 52 seconds
> 10 threads active, 113829 links queued, 206874 links in 100001 URLs checked, runtime 48 minutes, 57 seconds
> 10 threads active, 114230 links queued, 207136 links in 100001 URLs checked, runtime 49 minutes, 3 seconds
> 10 threads active, 114731 links queued, 207498 links in 100001 URLs checked, runtime 49 minutes, 9 seconds
> -- snap --
I would have thought that bumping the number of threads a
lot would really help, but I ran it on a dual Xeon system
with 40 threads and it took about the same time. Perhaps I
should have raised the thread count to double the system
processor count or more.
> Maybe something is going utterly wrong because the number
> of links seems to be dramatically larger than what the
> https://git-scm.com/ reported; Maybe linkchecker broke out
> of the `public/` directory and now indexes my entire
> harddrive ;-)
Heh, hopefully not. :)
I wondered if there were circular links that it was picking
up and not de-duplicating. I may try to run it with the
--verbose option which logs all checked URLs. Maybe that
will turn up something. It sure seems like there are a _lot_
of links here.
There is a --recursion-level option which might be helpful.
The --ignore-url and/or --no-follow-url options may also be
useful.
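As a rough sketch of how the options mentioned above might be
combined, here is a small Python helper that only assembles a
linkchecker command line and prints it. The helper name, the
recursion depth, the regular expressions, and the target path
are all hypothetical placeholders, not values anyone has tested
against the site.

```python
import shlex


def linkchecker_cmd(target, threads=10, recursion_level=None,
                    ignore_url=(), no_follow_url=(), verbose=False):
    """Build a linkchecker argument list; nothing is executed here."""
    cmd = ["linkchecker", f"--threads={threads}"]
    if recursion_level is not None:
        # Stop following links beyond this depth.
        cmd.append(f"--recursion-level={recursion_level}")
    # URLs matching these regexes are skipped entirely.
    cmd += [f"--ignore-url={regex}" for regex in ignore_url]
    # URLs matching these regexes are checked but not recursed into.
    cmd += [f"--no-follow-url={regex}" for regex in no_follow_url]
    if verbose:
        # Log every checked URL, which may help spot repeated links.
        cmd.append("--verbose")
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    cmd = linkchecker_cmd(
        "public/index.html",
        recursion_level=3,
        ignore_url=[r"^mailto:"],
        no_follow_url=[r"/docs/"],
    )
    print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to
review the regexes before committing to a long run.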
Though even if it's (very) slow, it might be worth running
to flush out some initial issues before making the site
live. Letting it run in the background for a few hours is
probably less effort than fielding a number of bug reports
about broken URLs here and there. :)
Of course, it would be even better if it were fast enough to
run as part of the site build process to catch broken links
before each deployment, but that would need to be measured
in some relatively small number of seconds instead of the
hours it seems to take now. :/
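For what it's worth, a much narrower check could plausibly run
in seconds. The sketch below is not linkchecker and does far
less: it only looks at internal links in a local build
directory, checks each distinct target on disk once, and
ignores external URLs and fragments. It assumes the site is
served from "/" at the top of the build directory (for example
Hugo's `public/`) and that a directory link resolves to its
`index.html`; the function and class names are made up for
illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(root):
    """Return sorted (source page, href) pairs whose local target is missing."""
    root = os.path.abspath(root)
    exists = {}  # resolved path -> bool, so each target is checked only once
    broken = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".html"):
                continue
            page = os.path.join(dirpath, name)
            collector = HrefCollector()
            with open(page, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for href in collector.hrefs:
                url, _fragment = urldefrag(href)
                parts = urlparse(url)
                if parts.scheme or parts.netloc or not parts.path:
                    continue  # external URL, or a same-page "#fragment"
                path = unquote(parts.path)
                # Assumption: "/" maps to the top of the build directory.
                base = root if path.startswith("/") else dirpath
                target = os.path.normpath(os.path.join(base, path.lstrip("/")))
                if target not in exists:
                    exists[target] = (
                        os.path.isfile(target)
                        or os.path.isfile(os.path.join(target, "index.html"))
                    )
                if not exists[target]:
                    broken.append((os.path.relpath(page, root), href))
    return sorted(broken)


if __name__ == "__main__":
    for page, href in broken_internal_links("public"):
        print(f"{page}: broken link {href}")
```

This would not replace a full run for external links, but it
might be cheap enough to run on every build.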
--
Todd