From: Donald Hunter <donald.hunter@gmail.com>
To: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	 Mauro Carvalho Chehab <mchehab@kernel.org>,
	 Vegard Nossum <vegard.nossum@oracle.com>,
	 Akira Yokosawa <akiyks@gmail.com>,
	 Jani Nikula <jani.nikula@linux.intel.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] docs: drop the version constraints for sphinx and dependencies
Date: Mon, 18 Mar 2024 16:44:55 +0000
Message-ID: <m21q8732wo.fsf@gmail.com>
In-Reply-To: <20240301141800.30218-1-lukas.bulwahn@gmail.com> (Lukas Bulwahn's message of "Fri, 1 Mar 2024 15:18:00 +0100")

Lukas Bulwahn <lukas.bulwahn@gmail.com> writes:

> As discussed (see Links), there is some inertia to move to the recent
> Sphinx versions for the doc build environment.
>
> [...]
>
> Link: https://lore.kernel.org/linux-doc/874jf4m384.fsf@meer.lwn.net/
> Link: https://lore.kernel.org/linux-doc/20240226093854.47830-1-lukas.bulwahn@gmail.com/
> Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
> Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
> ---
> v1 -> v2:
>   drop jinja2 as suggested by Vegard.
>   add tags from v1 review
>
>  Documentation/doc-guide/sphinx.rst    | 11 ++++++-----
>  Documentation/sphinx/requirements.txt |  7 ++-----
>  scripts/sphinx-pre-install            | 19 +++----------------
>  3 files changed, 11 insertions(+), 26 deletions(-)

Apologies if I am a little late to the party here - I am just catching
up with the changes on docs-next.

I went to install Sphinx 2.4.4 using requirements.txt for some doc work
and hit the upstream Sphinx dependency breakage. So I pulled docs-next
with the intention of sending a patch to requirements.txt with pinned
dependencies. When I noticed that things had already moved on in
docs-next, I decided to spend some time investigating the performance
regression that has been present in Sphinx from 3.0.0 until now.

With Sphinx 2.4.4 I always get timings in this ballpark:

% time make htmldocs
...
real	4m5.417s
user	17m0.379s
sys	1m11.889s

With Sphinx 7.2.6 it's typically over 9 minutes:

% time make htmldocs
...
real	9m0.533s
user	15m38.397s
sys	1m0.907s

I collected profiling data using cProfile:

export srctree=`pwd`
export BUILDDIR=`pwd`/Documentation/output
python3 -m cProfile -o profile.dat ./sphinx_latest/bin/sphinx-build \
    -b html \
    -c ./Documentation \
    -d ./Documentation/output/.doctrees \
    -D version=6.8.0 -D release= \
    -D kerneldoc_srctree=. -D kerneldoc_bin=./scripts/kernel-doc \
    ./Documentation \
    ./Documentation/output

Here's some of the profiling output:

$ python3 -m pstats profile.dat
Welcome to the profile statistics browser.
profile.dat% sort tottime
profile.dat% stats 10
Fri Mar 15 17:09:39 2024    profile.dat

         3960680702 function calls (3696376639 primitive calls) in 1394.384 seconds

   Ordered by: internal time
   List reduced from 6733 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
770364892  165.102    0.000  165.102    0.000 sphinx/domains/c.py:153(__eq__)
   104124  163.968    0.002  544.788    0.005 sphinx/domains/c.py:1731(_find_named_symbols)
543888397  123.767    0.000  176.685    0.000 sphinx/domains/c.py:1679(children_recurse_anon)
     4292   74.081    0.017   74.081    0.017 {method 'poll' of 'select.poll' objects}
631233096   69.389    0.000  246.017    0.000 sphinx/domains/c.py:1746(candidates)
121406721/3359598   65.689    0.000   76.762    0.000 docutils/nodes.py:202(_fast_findall)
  3477076   64.387    0.000   65.758    0.000 sphinx/util/nodes.py:633(_copy_except__document)
544032973   52.950    0.000   52.950    0.000 sphinx/domains/c.py:156(is_anon)
79012597/3430   36.395    0.000   36.395    0.011 sphinx/domains/c.py:1656(clear_doc)
286882978   31.271    0.000   31.279    0.000 {built-in method builtins.isinstance}

profile.dat% callers c.py:153
   Ordered by: internal time
   List reduced from 6733 to 4 due to restriction <'c.py:153'>

Function                            was called by...
                                       ncalls  tottime  cumtime
sphinx/domains/c.py:153(__eq__)  <- 631153346  134.803  134.803  sphinx/domains/c.py:1731(_find_named_symbols)
                                       154878    0.041    0.041  sphinx/domains/c.py:2085(find_identifier)
                                    139056533   30.259   30.259  sphinx/domains/c.py:2116(direct_lookup)
                                          135    0.000    0.000  sphinx/util/cfamily.py:89(__eq__)
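
For anyone who prefers to script this rather than use the interactive
browser, the same report can be produced with the pstats module. The
following is a minimal sketch, not what was run above: the busy()
workload and the report() helper are hypothetical stand-ins so that the
example runs without a kernel tree, and profile.dat is written to a
temporary directory.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy(n):
    """Toy workload standing in for the real sphinx-build run."""
    return sum(i * i for i in range(n))


def report(profile_path, limit=10, callers_of=None):
    """Return the text that 'sort tottime', 'stats N' and 'callers' print."""
    out = io.StringIO()
    stats = pstats.Stats(profile_path, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    if callers_of is not None:
        # The restriction is a regex matched against file:line(function).
        stats.print_callers(callers_of)
    return out.getvalue()


# Write a small profile so the example is self-contained.
profile_path = os.path.join(tempfile.mkdtemp(), "profile.dat")
profiler = cProfile.Profile()
profiler.runcall(busy, 100_000)
profiler.dump_stats(profile_path)

text = report(profile_path, limit=5, callers_of="busy")
print(text)
```

Pointing report() at the profile.dat from the sphinx-build run, with
callers_of set to r"c\.py:153", should give output equivalent to the
session above.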

From that you can see there is a significant call amplification from
_find_named_symbols (~104k calls) to __eq__ (~631 million calls, or
roughly 6,000 comparisons per call), plus several other expensive
functions. Looking at the code [1], you can see why: it does a list walk
to find matching symbols. When adding new symbols it does an exhaustive
walk to check for duplicates, so you get worst-case performance, with
~13k symbols in a list during the doc build.

I have an experimental fix that uses a dict for lookups. With the fix, I
consistently get times in the sub-5-minute range:

% time make htmldocs
...
real	4m27.085s
user	10m56.985s
sys	0m56.385s
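
The difference between the two lookup strategies can be illustrated with
a small sketch. This is neither the Sphinx code nor the experimental
fix: the Name class, add_unique_list() and add_unique_dict() are
hypothetical, and N is kept small so it runs quickly. It only counts how
many __eq__ calls each duplicate check costs when every added symbol is
new.

```python
class Name:
    """Hypothetical stand-in for a symbol identifier; counts __eq__ calls."""

    eq_calls = 0

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        Name.eq_calls += 1
        return isinstance(other, Name) and self.text == other.text

    def __hash__(self):
        return hash(self.text)


def add_unique_list(symbols, name):
    """Duplicate check by walking the list; a new name scans every entry."""
    for existing in symbols:
        if existing == name:
            return False
    symbols.append(name)
    return True


def add_unique_dict(index, name):
    """Duplicate check through a dict keyed by the identifier text."""
    if name.text in index:
        return False
    index[name.text] = name
    return True


N = 2_000  # assumption: far fewer than the ~13k symbols in the doc build

symbols = []
Name.eq_calls = 0
for i in range(N):
    add_unique_list(symbols, Name(f"sym{i}"))
list_eq_calls = Name.eq_calls

index = {}
Name.eq_calls = 0
for i in range(N):
    add_unique_dict(index, Name(f"sym{i}"))
dict_eq_calls = Name.eq_calls

print(f"list walk: {list_eq_calls} __eq__ calls")
print(f"dict:      {dict_eq_calls} __eq__ calls")
```

The list walk needs N*(N-1)/2 comparisons to add N distinct names, while
the dict version hashes the string key and never calls Name.__eq__.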

I expect there are other speedups to be found. I will clean up my Sphinx
changes and share them on a GitHub branch (as well as push them
upstream) so that others can try them out.

For some reason, if I run sphinx-build manually with -j 12 (I have a
12-core machine) I get better performance than make htmldocs:

% sphinx-build -j 12 ...
...
real	3m56.074s
user	9m52.775s
sys	0m52.905s

I haven't had a chance to look at what makes the difference here, but
will investigate when I have time.

Cheers,
Donald.

[1] https://github.com/sphinx-doc/sphinx/blob/ff252861a7b295e8dd8085ea9f6ed85e085273fc/sphinx/domains/c/_symbol.py#L235-L283

> diff --git a/Documentation/sphinx/requirements.txt b/Documentation/sphinx/requirements.txt
> index 5d47ed443949..5017f307c8a4 100644
> --- a/Documentation/sphinx/requirements.txt
> +++ b/Documentation/sphinx/requirements.txt
> @@ -1,6 +1,3 @@
> -# jinja2>=3.1 is not compatible with Sphinx<4.0
> -jinja2<3.1
> -# alabaster>=0.7.14 is not compatible with Sphinx<=3.3
> -alabaster<0.7.14
> -Sphinx==2.4.4
> +alabaster
> +Sphinx
>  pyyaml
