From: Jani Nikula <jani.nikula@intel.com>
To: Markus Heiser <markus.heiser@darmarit.de>
Cc: Jonathan Corbet <corbet@lwn.net>,
Mauro Carvalho Chehab <mchehab@infradead.org>,
Daniel Vetter <daniel.vetter@ffwll.ch>,
Matthew Wilcox <mawilcox@microsoft.com>,
"linux-doc \@ vger . kernel . org List"
<linux-doc@vger.kernel.org>,
"linux-kernel \@ vger . kernel . org List"
<linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP)
Date: Wed, 25 Jan 2017 22:59:08 +0200 [thread overview]
Message-ID: <87o9yu500z.fsf@intel.com> (raw)
In-Reply-To: <8B5A4B93-5497-4BEE-8A88-51A7C0E75E32@darmarit.de>
On Wed, 25 Jan 2017, Markus Heiser <markus.heiser@darmarit.de> wrote:
> Am 25.01.2017 um 11:24 schrieb Jani Nikula <jani.nikula@intel.com>:
>
>> Markus, thanks for your work on this.
>
> Thanks for your comments!
>
>> Excuse me for my bluntness, but I think changing everything in a single
>> commit, or even a few commits, is strictly not acceptable.
>
> OK, I understand.
>
>> When I changed *small* things in scripts/kernel-doc, I would make
>> htmldocs before and after the change, and recursively diff the produced
>> output to ensure there were no surprises. We already have enough
>> documentation that a manual eyeballing of the output is simply not
>> sufficient to ensure things don't break.
>
>> The diff in output between before and after this series? 160k lines of
>> unified diff without context ('diff -u0 -r old new | wc -l').
>>
>> Many of the changes are improvements on the result, such as using proper
>> <div> tags for function parameter lists etc., but clearly changing the
>> output should be independent of changing the parser, so we have some
>> chance of validating the parser.
>
>
> Hmm ... I try to sort my thoughts on this:
>
> The both parser are generating reST output. We have tested reST output
> so it should be enough to compare the reST from the old one with the
> new one ... at least theoretical.
>
> But the problem I see here is, that the perl script generates a
> reST output which I can't use. As an example we can take a look at
> the man-page builder I shipped in the series.
Sorry, I still don't understand *why* you can't use the same rst. Your
explanation seems to relate to man pages, but man pages come
*afterwards*, and are a separate improvement. I know you talk about lack
of proper structure and all that, but *why* can it strictly not be used,
if the *current* rst clearly can be used?
BR,
Jani.
>
> https://www.mail-archive.com/linux-doc@vger.kernel.org/msg09017.html
>
> In the commit message there is a small example:
>
> <section docname="basics ...>
> <title>get_sd_load_idx</title>
> <kernel_doc_man manpage="get_sd_load_idx.9"/>
> ...
> <desc desctype="function" domain="c"....>
> <desc_signature ....>
> <desc_type>int</desc_type>
> ...
>
> You see that it has <section> tag with childs <title/>, <kernel_doc_man/>
> and so on. This structured markup is used by builders, they navigate through
> the structured tree picking up nodes and spit out some man-page html, or
> whatever builder it is.
>
> ATM the perl parser generates a reST output which does not have such
> a structured tree, so the builder can't navigate in.
>
> So, what I mean is, the new parser has to generate a complete different reST
> output and thats why we can't compare the perl parser with python one on a reST
> basis ... and if reST is different, HTML is different :(
>
> So we do not have any chance to track regression when switching from
> the old to the new parser.
>
> Thats are my thoughts on this topic, may be you have a solution for this?
>
>>
>>> Ideally at the time of merging, we would be able to build the docs with
>>> *either* kerneldoc.
>>
>> I'd be fine with switching over in a single commit that doesn't
>> drastically change the output.
>
> One solution might be to improve the reST output of the perl script
> first, so that it produce something which has a structure and we
> all can agree on (short: reST output is the reference, ATM the reference
> need some improvements)
>
> If this is a way we like to go, I can send a patch for the perl script,
> so that we can commit one a reST reference.
>
>> A drop-in replacement. But that's not the
>> case here.
>>
>>> - I'll have to try it out to see how noisy it is. I'm not opposed to
>>> stricter checks; indeed, they could be a good thing. But we might want
>>> to have an option so we can cut back on the noise by default.
>>
>> The increase in 'make htmldocs' build log was from 1521 to 2791 lines in
>> my tree. Arguably there was useful extra diagnosis, but some of it was
>> the printouts of long lists of definitions that were not found, one per
>> line. So it could be condensed without losing info too.
>
> Yes, this was just a 1:1 merge from my POC, there are a lot of things
> which could be meld down. ATM, for me it is important to get a feedback
> on the functionalities and concepts of kernel-doc apps (RFC).
>
>> On to performance. With the default build options the new system was
>> noticeably slower than the current one, with a 50% increase on my
>> machine. But what really caught me by surprise was that passing
>> SPHINXOPTS=-j5 to parallelize worked better on the current system,
>> making the new one a whopping 70% slower. Of course, the argument is
>> that the proposed parser does more and is better, but due to the
>> monolithic change it's impossible to pinpoint the culprit or do a proper
>> cost/benefit analysis on this. Again, this calls for a more broken down
>> series of patches to make the changes.
>
> Ups, I have to look closer ... I thought the py-solution is faster
> since it does not for processes and does some caching.
>
>> Finally, while I'd love to see scripts/kernel-doc go, I do have to ask
>> if changing roughly 3k lines of Perl to roughly 3k lines of Python (*)
>> really makes everything better? They both still parse everything using a
>> large pile of regular expressions and a clunky state machine. When I
>> look at the code, I'm afraid I do not get that liberating feeling of
>> throwing out old junk in favor of something small or elegant or even
>> obviously more maintainable than the old one. The new one offers more
>> features, but repeatedly we face the problem that it's all lumped in
>> together with the parser change. We should be able to look at the parser
>> change and the other improvements separately.
>>
>> That said, perhaps having an elegant parser (perhaps based on a compiler
>> plugin) is incompatible with the idea of making it a bug-for-bug drop-in
>> replacement of the old one, and it's something we need to think about.
>
> Before I started implementing the parser I thought about separating
> parsing from generating reST. I played a bit with pycparser
>
> https://github.com/eliben/pycparser
>
> but I realized that the coverage of those parser might be not
> enough for the kernel sources. At this time you mentioned sparse.
> I haven't had time to at sparse but I guess that this is the
> tool.
>
> -- Markus --
>
>
>> All in all I think the message should be clear: this needs to be split
>> into small, incremental changes. Just like we do everything in the
>> kernel.
>>
>>
>> BR,
>> Jani.
>>
>>
>> (*) Please do not get hung up on these numbers. The Python version does
>> more in some ways, but adds more deps such as fspath that's not
>> included in the figures, and the Perl version outputs more
>> formats. It's not an apples to apples comparison. Let's just say
>> they are somewhere in the same ballpark.
>>
>> --
>> Jani Nikula, Intel Open Source Technology Center
>
--
Jani Nikula, Intel Open Source Technology Center
next prev parent reply other threads:[~2017-01-25 20:59 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-24 19:52 [RFC PATCH v1 0/6] pure python kernel-doc parser and more Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 1/6] kernel-doc: pure python kernel-doc parser (preparation) Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP) Markus Heiser
2017-01-25 0:13 ` Jonathan Corbet
2017-01-25 6:37 ` Daniel Vetter
2017-01-25 7:37 ` Markus Heiser
2017-01-25 10:24 ` Jani Nikula
2017-01-25 10:35 ` Daniel Vetter
2017-01-25 19:07 ` Markus Heiser
2017-01-25 20:59 ` Jani Nikula [this message]
2017-01-26 9:54 ` Markus Heiser
2017-01-26 10:16 ` Jani Nikula
2017-01-26 18:50 ` Jonathan Corbet
2017-01-26 19:26 ` Jani Nikula
2017-01-27 9:46 ` Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 3/6] kernel-doc: add kerneldoc-lint command Markus Heiser
2017-01-25 6:38 ` Daniel Vetter
2017-01-25 8:21 ` Jani Nikula
2017-01-25 9:34 ` Markus Heiser
2017-01-25 10:08 ` Jani Nikula
2017-01-24 19:52 ` [RFC PATCH v1 4/6] kernel-doc: insert TODOs on kernel-doc errors Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 5/6] kernel-doc: add kerneldoc-src2rst command Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 6/6] kernel-doc: add man page builder (target mandocs) Markus Heiser
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o9yu500z.fsf@intel.com \
--to=jani.nikula@intel.com \
--cc=corbet@lwn.net \
--cc=daniel.vetter@ffwll.ch \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=markus.heiser@darmarit.de \
--cc=mawilcox@microsoft.com \
--cc=mchehab@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox