From: Donald Hunter <donald.hunter@gmail.com>
To: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Vegard Nossum <vegard.nossum@oracle.com>,
Akira Yokosawa <akiyks@gmail.com>,
Jani Nikula <jani.nikula@linux.intel.com>,
Randy Dunlap <rdunlap@infradead.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] docs: drop the version constraints for sphinx and dependencies
Date: Mon, 18 Mar 2024 16:44:55 +0000
Message-ID: <m21q8732wo.fsf@gmail.com>
In-Reply-To: <20240301141800.30218-1-lukas.bulwahn@gmail.com> (Lukas Bulwahn's message of "Fri, 1 Mar 2024 15:18:00 +0100")
Lukas Bulwahn <lukas.bulwahn@gmail.com> writes:
> As discussed (see Links), there is some inertia to move to the recent
> Sphinx versions for the doc build environment.
>
> [...]
>
> Link: https://lore.kernel.org/linux-doc/874jf4m384.fsf@meer.lwn.net/
> Link: https://lore.kernel.org/linux-doc/20240226093854.47830-1-lukas.bulwahn@gmail.com/
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> ---
> v1 -> v2:
> drop jinja2 as suggested by Vegard.
> add tags from v1 review
>
> Documentation/doc-guide/sphinx.rst | 11 ++++++-----
> Documentation/sphinx/requirements.txt | 7 ++-----
> scripts/sphinx-pre-install | 19 +++----------------
> 3 files changed, 11 insertions(+), 26 deletions(-)
Apologies if I am a little late to the party here - I am just catching
up with the changes on docs-next.
I went to install Sphinx 2.4.4 using requirements.txt for some doc work
and hit the upstream Sphinx dependency breakage. So I pulled docs-next
with the intention of sending a patch to requirements.txt with pinned
dependencies. When I noticed that things had already moved on in
docs-next, I decided to spend some time investigating the performance
regression that has been present in Sphinx since 3.0.0.
With Sphinx 2.4.4 I always get timings in this ballpark:
% time make htmldocs
...
real 4m5.417s
user 17m0.379s
sys 1m11.889s
With Sphinx 7.2.6 it's typically over 9 minutes:
% time make htmldocs
...
real 9m0.533s
user 15m38.397s
sys 1m0.907s
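As a rough check on these two runs, the short Python sketch below converts the `time` output into seconds and compares them. The helper names `parse_duration` and `summarise` are illustrative only. With the figures above, wall-clock time grows by about 2.2x while user+sys CPU time actually drops slightly, so the average number of busy cores falls from roughly 4.5 to roughly 1.8. That suggests, but does not prove, that the newer build spends much more time in serial phases.

```python
import re

# Matches the "XmY.ZZZs" durations printed by the shell's `time` builtin.
_DURATION = re.compile(r"(\d+)m(\d+(?:\.\d+)?)s")


def parse_duration(text):
    """Convert a duration such as '4m5.417s' to seconds (float)."""
    m = _DURATION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def summarise(real, user, system):
    """Return (wall-clock seconds, average number of busy cores)."""
    wall = parse_duration(real)
    cpu = parse_duration(user) + parse_duration(system)
    return wall, cpu / wall


# Figures copied from the `make htmldocs` runs above.
old_wall, old_cores = summarise("4m5.417s", "17m0.379s", "1m11.889s")  # Sphinx 2.4.4
new_wall, new_cores = summarise("9m0.533s", "15m38.397s", "1m0.907s")  # Sphinx 7.2.6

print(f"slowdown: {new_wall / old_wall:.2f}x")            # slowdown: 2.20x
print(f"busy cores: {old_cores:.1f} -> {new_cores:.1f}")  # busy cores: 4.5 -> 1.8
```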
I collected profiling data using cProfile:
export srctree=`pwd`
export BUILDDIR=`pwd`/Documentation/output
python3 -m cProfile -o profile.dat ./sphinx_latest/bin/sphinx-build \
-b html \
-c ./Documentation \
-d ./Documentation/output/.doctrees \
-D version=6.8.0 -D release= \
-D kerneldoc_srctree=. -D kerneldoc_bin=./scripts/kernel-doc \
./Documentation \
./Documentation/output
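The interactive pstats session shown next can also be scripted, which is handy when comparing profiles taken with several Sphinx versions. This is a minimal sketch using only the standard cProfile and pstats modules; the function names `report`, `linear_find` and `demo` are illustrative and not part of Sphinx or the kernel tree.

```python
import cProfile
import io
import pstats


def report(source, callers_of=None, limit=10):
    """Return the text of a 'sort tottime' / 'stats N' / 'callers X' session.

    `source` is either a file written by `python3 -m cProfile -o profile.dat`
    or a live cProfile.Profile object.
    """
    out = io.StringIO()
    stats = pstats.Stats(source, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    if callers_of is not None:
        # The restriction is a regex matched against "file:line(function)".
        stats.print_callers(callers_of)
    return out.getvalue()


def linear_find(items, target):
    """Toy workload: a linear scan, similar in shape to a list-based lookup."""
    return [x for x in items if x == target]


def demo():
    """Profile the toy workload and return the pstats report for it."""
    prof = cProfile.Profile()
    prof.enable()
    data = list(range(2000))
    for target in range(0, 2000, 10):
        linear_find(data, target)
    prof.disable()
    return report(prof, callers_of="linear_find")


if __name__ == "__main__":
    print(demo())
    # For a real build profile: print(report("profile.dat", callers_of="c.py:153"))
```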
Here's some of the profiling output:
$ python3 -m pstats profile.dat
Welcome to the profile statistics browser.
profile.dat% sort tottime
profile.dat% stats 10
Fri Mar 15 17:09:39 2024 profile.dat
3960680702 function calls (3696376639 primitive calls) in 1394.384 seconds
Ordered by: internal time
List reduced from 6733 to 10 due to restriction <10>
           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        770364892  165.102    0.000  165.102    0.000 sphinx/domains/c.py:153(__eq__)
           104124  163.968    0.002  544.788    0.005 sphinx/domains/c.py:1731(_find_named_symbols)
        543888397  123.767    0.000  176.685    0.000 sphinx/domains/c.py:1679(children_recurse_anon)
             4292   74.081    0.017   74.081    0.017 {method 'poll' of 'select.poll' objects}
        631233096   69.389    0.000  246.017    0.000 sphinx/domains/c.py:1746(candidates)
121406721/3359598   65.689    0.000   76.762    0.000 docutils/nodes.py:202(_fast_findall)
          3477076   64.387    0.000   65.758    0.000 sphinx/util/nodes.py:633(_copy_except__document)
        544032973   52.950    0.000   52.950    0.000 sphinx/domains/c.py:156(is_anon)
    79012597/3430   36.395    0.000   36.395    0.011 sphinx/domains/c.py:1656(clear_doc)
        286882978   31.271    0.000   31.279    0.000 {built-in method builtins.isinstance}
profile.dat% callers c.py:153
Ordered by: internal time
List reduced from 6733 to 4 due to restriction <'c.py:153'>
Function                            was called by...
                                       ncalls  tottime  cumtime
sphinx/domains/c.py:153(__eq__)  <- 631153346  134.803  134.803  sphinx/domains/c.py:1731(_find_named_symbols)
                                       154878    0.041    0.041  sphinx/domains/c.py:2085(find_identifier)
                                    139056533   30.259   30.259  sphinx/domains/c.py:2116(direct_lookup)
                                          135    0.000    0.000  sphinx/util/cfamily.py:89(__eq__)
From that you can see there is significant call amplification from
_find_named_symbols (~104k calls) to __eq__ (~631 million calls), i.e.
about 6,000 comparisons per lookup, plus several other expensive
functions. Looking at the code [1], you can see why: it does a linear
walk of a list to find matching symbols. When adding new symbols it
walks the whole list to check for duplicates, so every insertion pays
the worst-case cost, with ~13k symbols in a list during the doc build.
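To make the amplification concrete, here is a deliberately simplified Python model. It is not Sphinx's actual symbol code, and the class names `Name` and `ListScope` are hypothetical: children live in a plain list and every insertion scans the whole list for a duplicate before appending. Adding n distinct names then costs n(n-1)/2 equality checks, which for ~13k symbols is roughly 84.5 million comparisons in this model.

```python
class Name:
    """Hypothetical identifier type that counts how often it is compared."""

    comparisons = 0

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        Name.comparisons += 1
        return isinstance(other, Name) and self.text == other.text

    def __hash__(self):
        return hash(self.text)


class ListScope:
    """Children stored in a list; every lookup walks every entry."""

    def __init__(self):
        self.children = []

    def find(self, name):
        # Exhaustive walk with no early exit, so duplicates can be reported.
        return [child for child in self.children if child == name]

    def add(self, name):
        if self.find(name):
            raise ValueError(f"duplicate symbol: {name.text}")
        self.children.append(name)


def comparisons_for(n):
    """Equality checks needed to add n distinct names to an empty scope."""
    Name.comparisons = 0
    scope = ListScope()
    for i in range(n):
        scope.add(Name(f"sym{i}"))
    return Name.comparisons


for n in (250, 500, 1000):
    print(n, comparisons_for(n))  # 31125, 124750, 499500: ~4x per doubling
```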
I have an experimental fix that uses a dict for lookups. With the fix, I
consistently get times in the sub-5-minute range:
% time make htmldocs
...
real 4m27.085s
user 10m56.985s
sys 0m56.385s
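The general idea of a dict-based lookup can be sketched in Python as well. This is a hypothetical illustration of the technique, not the actual patch: `IndexedScope` and the `kind` field are invented for the example, and the per-name list assumes that several declarations may share a name. A duplicate check now inspects only the bucket for that name instead of every sibling.

```python
from collections import defaultdict


class IndexedScope:
    """Hypothetical scope that indexes its children by name."""

    def __init__(self):
        self.children = []                # keeps insertion order
        self.by_name = defaultdict(list)  # name -> children with that name
        self.examined = 0                 # candidates compared so far

    def find(self, name, kind):
        matches = []
        # .get() avoids creating empty buckets on a miss.
        for child_name, child_kind in self.by_name.get(name, ()):
            self.examined += 1
            if child_kind == kind:
                matches.append((child_name, child_kind))
        return matches

    def add(self, name, kind):
        if self.find(name, kind):
            raise ValueError(f"duplicate symbol: {name} ({kind})")
        entry = (name, kind)
        self.children.append(entry)
        self.by_name[name].append(entry)


scope = IndexedScope()
for i in range(13_000):
    scope.add(f"sym{i}", "function")
print(scope.examined)  # 0: each new name lands in an empty bucket
```

Lookup cost becomes proportional to the number of same-named siblings rather than the total number of siblings. The trade-off in a design like this is keeping the index in sync when symbols are removed, for example when a document's symbols are cleared and re-read.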
I expect there are other speedups to be found. I will clean up my Sphinx
changes and share them on a GitHub branch (as well as push them
upstream) so that others can try them out.
For some reason, if I run sphinx-build manually with -j 12 (I have a
12-core machine) I get better performance than make htmldocs:
% sphinx-build -j 12 ...
...
real 3m56.074s
user 9m52.775s
sys 0m52.905s
I haven't had a chance to look at what makes the difference here, but
will investigate when I have time.
Cheers,
Donald.
[1] https://github.com/sphinx-doc/sphinx/blob/ff252861a7b295e8dd8085ea9f6ed85e085273fc/sphinx/domains/c/_symbol.py#L235-L283
> diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
> index 5d47ed443949..5017f307c8a4 100644
> --- a/Documentation/sphinx/requirements.txt
> +++ b/Documentation/sphinx/requirements.txt
> @@ -1,6 +1,3 @@
> -# jinja2>=3.1 is not compatible with Sphinx<4.0
> -jinja2<3.1
> -# alabaster>=0.7.14 is not compatible with Sphinx<=3.3
> -alabaster<0.7.14
> -Sphinx==2.4.4
> +alabaster
> +Sphinx
> pyyaml