From: Mauro Carvalho Chehab <mchehab@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Jani Nikula <jani.nikula@intel.com>,
ksummit-discuss@lists.linuxfoundation.org,
ksummit@lists.linux.dev,
Markus Heiser <markus.heiser@darmarit.de>
Subject: Re: [Ksummit-discuss] [TECH TOPIC] What kernel documentation could be
Date: Sat, 25 Jun 2022 10:10:29 +0100 [thread overview]
Message-ID: <20220625101029.67f14c4c@sal.lan> (raw)
In-Reply-To: <87tu891xfv.fsf@meer.lwn.net>
Em Fri, 24 Jun 2022 16:57:56 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:
> Mauro Carvalho Chehab <mchehab@kernel.org> writes:
>
> > Em Thu, 23 Jun 2022 07:40:45 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >> I've actually, in a spare moment or two, been doing some profiling of
> >> the kernel docs build and trying to track down the sources of the
> >> slowness. I am thinking that nearly 700 *million* calls to the iterator
> >> for the C-domain symbol list might have something to do with it...
> >
> > Wow, that's a lot!
>
> Just for the curious ...
>
> The Sphinx C domain code makes this elaborate tree data structure out of
> all the identifier names it has picked up - including the names of
> function parameters when they are provided in prototypes for some
> reason. This is the structure that is consulted whenever we want to
> resolve a cross-reference, which is fairly often with automarkup
> enabled.
>
> How does it work? The algorithm is, as far as I can tell:
>
> - Serialize the whole tree in a sort of breadth-first traversal
>
> - Make a linear pass through the *entire* list, comparing the
> identifier name with each entry and accumulating a list of all the
> matches found.
>
> - Return the first match and throw away the rest.
>
> The result is O(n^2) behavior and, in the kernel docs build, n gets to
> be fairly large.
>
> I went in with a crowbar and sledgehammer and replaced some of the list
> searches with a dict lookup, resulting in about a 20% speedup in the
> full htmldocs build with Sphinx 5.0.2.
Great to have a 20% speedup here! Yet, I would expect an even better
performance improvement by replacing 700 million calls from linear search
to dict, as it would change from O(n^2) to O(1) at the average case [1],
but the speedup gain actually depends on the actual number of symbols we
have defined.
[1] According with:
https://wiki.python.org/moin/TimeComplexity#dict
Worse case is O(n) - when collisions are common which is unlikely.
> A couple of automarkup
> optimizations result in about a 27% speedup overall. And I didn't break
> any cross-references.
Great!
> I think it's possible to do better, but this is a start. I'll post the
> automarkup changes as soon as I can, but I need to verify them across
> the whole range of Sphinx versions.
> For the Sphinx stuff, I'll need to
> turn my hatchetwork into something presentable and figure out how to
> contribute it upstream, but it seems worth the effort.
Indeed.
Regards,
Mauro
next prev parent reply other threads:[~2022-06-25 9:10 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-17 20:57 [TECH TOPIC] What kernel documentation could be Jonathan Corbet
2022-06-17 20:57 ` [Ksummit-discuss] " Jonathan Corbet
2022-06-17 21:48 ` Laurent Pinchart
2022-06-17 21:48 ` Laurent Pinchart
2022-06-27 15:18 ` Jonathan Corbet
2022-06-18 8:24 ` Mauro Carvalho Chehab
2022-06-18 8:24 ` Mauro Carvalho Chehab
2022-06-18 11:03 ` Miguel Ojeda
2022-06-18 11:16 ` Mauro Carvalho Chehab
2022-06-18 11:16 ` Mauro Carvalho Chehab
2022-06-18 14:37 ` Miguel Ojeda
2022-06-23 9:18 ` Jani Nikula
2022-06-23 9:57 ` Mauro Carvalho Chehab
2022-06-23 10:30 ` Jani Nikula
2022-06-23 13:40 ` Jonathan Corbet
2022-06-24 7:33 ` Mauro Carvalho Chehab
2022-06-24 16:37 ` Markus Heiser
2022-06-27 15:27 ` Jonathan Corbet
2022-06-27 15:31 ` Christoph Hellwig
2022-06-28 7:43 ` Mauro Carvalho Chehab
2022-06-28 7:57 ` Geert Uytterhoeven
2022-06-28 11:01 ` Mauro Carvalho Chehab
2022-07-02 12:43 ` Stephen Rothwell
2022-06-24 22:57 ` Jonathan Corbet
2022-06-25 9:10 ` Mauro Carvalho Chehab [this message]
2022-06-25 14:00 ` Jonathan Corbet
2022-06-25 18:11 ` Linus Torvalds
2022-06-26 7:55 ` Mauro Carvalho Chehab
2022-06-26 9:26 ` Mauro Carvalho Chehab
2022-06-26 9:53 ` Mauro Carvalho Chehab
2022-06-27 15:28 ` Liam Howlett
2022-06-27 15:54 ` Christian Brauner
2022-06-27 16:27 ` Mark Brown
2022-06-28 10:53 ` Mauro Carvalho Chehab
2022-06-28 16:13 ` Luck, Tony
2022-06-27 15:34 ` Jonathan Corbet
2022-06-27 17:07 ` Linus Torvalds
2022-07-01 8:48 ` [PATCH 0/4] Address some issues with sphinx detection Mauro Carvalho Chehab
2022-07-01 8:48 ` [PATCH 1/4] scripts: sphinx-pre-install: fix venv version check logic Mauro Carvalho Chehab
2022-07-01 8:48 ` [PATCH 2/4] scripts: sphinx-pre-install: report broken venv Mauro Carvalho Chehab
2022-07-01 8:48 ` [PATCH 3/4] scripts: sphinx-pre-install: check for PDF min version later on Mauro Carvalho Chehab
2022-07-01 8:48 ` [PATCH 4/4] scripts: sphinx-pre-install: provide both venv and package installs Mauro Carvalho Chehab
2022-07-02 10:11 ` [PATCH v2 0/5] Address some issues with sphinx detection Mauro Carvalho Chehab
2022-07-02 10:11 ` [PATCH v2 1/5] scripts: sphinx-pre-install: fix venv version check logic Mauro Carvalho Chehab
2022-07-02 10:11 ` [PATCH v2 2/5] scripts: sphinx-pre-install: report broken venv Mauro Carvalho Chehab
2022-07-02 10:11 ` [PATCH v2 3/5] scripts: sphinx-pre-install: check for PDF min version later on Mauro Carvalho Chehab
2022-07-02 10:11 ` [PATCH v2 4/5] scripts: sphinx-pre-install: provide both venv and package installs Mauro Carvalho Chehab
2022-07-02 10:11 ` [PATCH v2 5/5] scripts: sphinx-pre-install: place a warning for Sphinx >= 3.0 Mauro Carvalho Chehab
2022-07-05 4:15 ` [PATCH v2 0/5] Address some issues with sphinx detection Akira Yokosawa
2022-07-06 14:31 ` Akira Yokosawa
2022-07-07 20:33 ` Mauro Carvalho Chehab
2022-07-07 18:45 ` Jonathan Corbet
2022-07-07 20:25 ` Mauro Carvalho Chehab
2022-07-07 20:15 ` Mauro Carvalho Chehab
2022-07-08 11:34 ` Expectation to --no-pdf option (was Re: [PATCH v2 0/5] Address some issues with sphinx detection) Akira Yokosawa
2022-07-08 14:02 ` Jonathan Corbet
2022-07-08 14:59 ` Mauro Carvalho Chehab
2022-07-08 15:27 ` Akira Yokosawa
2022-07-08 23:01 ` Akira Yokosawa
2022-07-09 7:59 ` Mauro Carvalho Chehab
2022-07-11 11:23 ` Akira Yokosawa
2022-08-01 23:30 ` [PATCH v2 0/5] Address some issues with sphinx detection Tomasz Warniełło
2022-07-02 10:52 ` [Ksummit-discuss] [TECH TOPIC] What kernel documentation could be Mauro Carvalho Chehab
2022-06-25 17:43 ` Steven Rostedt
2022-06-25 17:48 ` Laurent Pinchart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220625101029.67f14c4c@sal.lan \
--to=mchehab@kernel.org \
--cc=corbet@lwn.net \
--cc=jani.nikula@intel.com \
--cc=ksummit-discuss@lists.linuxfoundation.org \
--cc=ksummit@lists.linux.dev \
--cc=markus.heiser@darmarit.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.