Re: [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jani Nikula <jani.nikula@intel.com>
To: Markus Heiser <markus.heiser@darmarit.de>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Mauro Carvalho Chehab <mchehab@infradead.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	"linux-doc \@ vger . kernel . org List"
	<linux-doc@vger.kernel.org>,
	"linux-kernel \@ vger . kernel . org List" 
	<linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP)
Date: Wed, 25 Jan 2017 22:59:08 +0200	[thread overview]
Message-ID: <87o9yu500z.fsf@intel.com> (raw)
In-Reply-To: <8B5A4B93-5497-4BEE-8A88-51A7C0E75E32@darmarit.de>

On Wed, 25 Jan 2017, Markus Heiser <markus.heiser@darmarit.de> wrote:
> Am 25.01.2017 um 11:24 schrieb Jani Nikula <jani.nikula@intel.com>:
>
>> Markus, thanks for your work on this.
>
> Thanks for your comments!
>
>> Excuse me for my bluntness, but I think changing everything in a single
>> commit, or even a few commits, is strictly not acceptable.
>
> OK, I understand.
>
>> When I changed *small* things in scripts/kernel-doc, I would make
>> htmldocs before and after the change, and recursively diff the produced
>> output to ensure there were no surprises. We already have enough
>> documentation that a manual eyeballing of the output is simply not
>> sufficient to ensure things don't break.
>
>> The diff in output between before and after this series? 160k lines of
>> unified diff without context ('diff -u0 -r old new | wc -l').
>> 
>> Many of the changes are improvements on the result, such as using proper
>> <div> tags for function parameter lists etc., but clearly changing the
>> output should be independent of changing the parser, so we have some
>> chance of validating the parser.
>
>
> Hmm ... I try to sort my thoughts on this:
>
> The both parser are generating reST output. We have tested reST output
> so it should be enough to compare the reST from the old one with the
> new one ... at least theoretical.
>
> But the problem I see here is, that the perl script generates a
> reST output which I can't use. As an example we can take a look at
> the man-page builder I shipped in the series.

Sorry, I still don't understand *why* you can't use the same rst. Your
explanation seems to relate to man pages, but man pages come
*afterwards*, and are a separate improvement. I know you talk about lack
of proper structure and all that, but *why* can it strictly not be used,
if the *current* rst clearly can be used?

BR,
Jani.


>
>  https://www.mail-archive.com/linux-doc@vger.kernel.org/msg09017.html
>
> In the commit message there is a small example:
>
>   <section docname="basics ...>
>     <title>get_sd_load_idx</title>
>     <kernel_doc_man manpage="get_sd_load_idx.9"/>
>     ...
>     <desc desctype="function" domain="c"....>
>       <desc_signature ....>
>         <desc_type>int</desc_type>
>         ...
>
> You see that it has <section> tag with childs <title/>, <kernel_doc_man/> 
> and so on. This structured markup is used by builders, they navigate through
> the structured tree picking up nodes and spit out some man-page html, or
> whatever builder it is.
>
> ATM the perl parser generates a reST output which does not have such
> a structured tree, so the builder can't navigate in.
>
> So, what I mean is, the new parser has to generate a complete different reST
> output and thats why we can't compare the perl parser with python one on a reST
> basis ... and if reST is different, HTML is different :(
>
> So we do not have any chance to track regression when switching from
> the old to the new parser.
>
> Thats are my thoughts on this topic, may be you have a solution for this?
>
>> 
>>>   Ideally at the time of merging, we would be able to build the docs with
>>>   *either* kerneldoc.
>> 
>> I'd be fine with switching over in a single commit that doesn't
>> drastically change the output.
>
> One solution might be to improve the reST output of the perl script
> first, so that it produce something which has a structure and we
> all can agree on (short: reST output is the reference, ATM the reference
> need some improvements)
>
> If this is a way we like to go, I can send a patch for the perl script,
> so that we can commit one a reST reference.
>
>> A drop-in replacement. But that's not the
>> case here.
>> 
>>> - I'll have to try it out to see how noisy it is.  I'm not opposed to
>>>   stricter checks; indeed, they could be a good thing.  But we might want
>>>   to have an option so we can cut back on the noise by default.
>> 
>> The increase in 'make htmldocs' build log was from 1521 to 2791 lines in
>> my tree. Arguably there was useful extra diagnosis, but some of it was
>> the printouts of long lists of definitions that were not found, one per
>> line. So it could be condensed without losing info too.
>
> Yes, this was just a 1:1 merge from my POC, there are a lot of things
> which could be meld down. ATM, for me it is important to get a feedback 
> on the functionalities and concepts of kernel-doc apps (RFC).
>
>> On to performance. With the default build options the new system was
>> noticeably slower than the current one, with a 50% increase on my
>> machine. But what really caught me by surprise was that passing
>> SPHINXOPTS=-j5 to parallelize worked better on the current system,
>> making the new one a whopping 70% slower. Of course, the argument is
>> that the proposed parser does more and is better, but due to the
>> monolithic change it's impossible to pinpoint the culprit or do a proper
>> cost/benefit analysis on this. Again, this calls for a more broken down
>> series of patches to make the changes.
>
> Ups, I have to look closer ... I thought the py-solution is faster 
> since it does not for processes and does some caching.
>
>> Finally, while I'd love to see scripts/kernel-doc go, I do have to ask
>> if changing roughly 3k lines of Perl to roughly 3k lines of Python (*)
>> really makes everything better? They both still parse everything using a
>> large pile of regular expressions and a clunky state machine. When I
>> look at the code, I'm afraid I do not get that liberating feeling of
>> throwing out old junk in favor of something small or elegant or even
>> obviously more maintainable than the old one. The new one offers more
>> features, but repeatedly we face the problem that it's all lumped in
>> together with the parser change. We should be able to look at the parser
>> change and the other improvements separately.
>> 
>> That said, perhaps having an elegant parser (perhaps based on a compiler
>> plugin) is incompatible with the idea of making it a bug-for-bug drop-in
>> replacement of the old one, and it's something we need to think about.
>
> Before I started implementing the parser I thought about separating
> parsing from generating reST. I played a bit with pycparser
>
>   https://github.com/eliben/pycparser
>
> but I realized that the coverage of those parser might be not
> enough for the kernel sources. At this time you mentioned sparse.
> I haven't had time to at sparse but I guess that this is the
> tool.
>
> -- Markus --
>
>
>> All in all I think the message should be clear: this needs to be split
>> into small, incremental changes. Just like we do everything in the
>> kernel.
>> 
>> 
>> BR,
>> Jani.
>> 
>> 
>> (*) Please do not get hung up on these numbers. The Python version does
>>    more in some ways, but adds more deps such as fspath that's not
>>    included in the figures, and the Perl version outputs more
>>    formats. It's not an apples to apples comparison. Let's just say
>>    they are somewhere in the same ballpark.
>> 
>> -- 
>> Jani Nikula, Intel Open Source Technology Center
>

-- 
Jani Nikula, Intel Open Source Technology Center

next prev parent reply	other threads:[~2017-01-25 20:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-24 19:52 [RFC PATCH v1 0/6] pure python kernel-doc parser and more Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 1/6] kernel-doc: pure python kernel-doc parser (preparation) Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 2/6] kernel-doc: replace kernel-doc perl parser with a pure python one (WIP) Markus Heiser
2017-01-25  0:13   ` Jonathan Corbet
2017-01-25  6:37     ` Daniel Vetter
2017-01-25  7:37       ` Markus Heiser
2017-01-25 10:24     ` Jani Nikula
2017-01-25 10:35       ` Daniel Vetter
2017-01-25 19:07       ` Markus Heiser
2017-01-25 20:59         ` Jani Nikula [this message]
2017-01-26  9:54           ` Markus Heiser
2017-01-26 10:16             ` Jani Nikula
2017-01-26 18:50         ` Jonathan Corbet
2017-01-26 19:26           ` Jani Nikula
2017-01-27  9:46             ` Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 3/6] kernel-doc: add kerneldoc-lint command Markus Heiser
2017-01-25  6:38   ` Daniel Vetter
2017-01-25  8:21     ` Jani Nikula
2017-01-25  9:34       ` Markus Heiser
2017-01-25 10:08         ` Jani Nikula
2017-01-24 19:52 ` [RFC PATCH v1 4/6] kernel-doc: insert TODOs on kernel-doc errors Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 5/6] kernel-doc: add kerneldoc-src2rst command Markus Heiser
2017-01-24 19:52 ` [RFC PATCH v1 6/6] kernel-doc: add man page builder (target mandocs) Markus Heiser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o9yu500z.fsf@intel.com \
    --to=jani.nikula@intel.com \
    --cc=corbet@lwn.net \
    --cc=daniel.vetter@ffwll.ch \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=markus.heiser@darmarit.de \
    --cc=mawilcox@microsoft.com \
    --cc=mchehab@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox