Linux Manual Pages development
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: Alejandro Colomar <alx@kernel.org>,
	Ian Rogers <irogers@google.com>, David Airlie <airlied@gmail.com>,
	Simona Vetter <simona@ffwll.ch>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Jonathan Corbet <corbet@lwn.net>,
	dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-man@vger.kernel.org,
	groff@gnu.org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
Date: Sat, 2 Nov 2024 19:50:23 -0500
Message-ID: <20241103005023.kdv5bkpqkpmsom5g@illithid>
In-Reply-To: <ZyZ4Tfxfr7M-EqUo@riva.ucam.org>

Hi Colin,

At 2024-11-02T19:06:53+0000, Colin Watson wrote:
> How embarrassing.  Could somebody please file a bug on
> https://gitlab.com/man-db/man-db/-/issues to remind me to fix that?

Done; <https://gitlab.com/man-db/man-db/-/issues/46>.

> lexgrog(1) is a useful (if oddly-named, sorry) debugging tool, but if
> you focus on that then you'll end up with a design that's not very
> useful.  What really matters is indexing the whole system's manual
> pages, and mandb(8) does not do that by invoking lexgrog(1) one page
> at a time, but rather by running more or less the same code
> in-process.

Ah, I see it now--"lexgrog.l" is in both the Automake macros
"lexgrog_SOURCES" and "mandb_SOURCES".  Nice and DRY!

> I already know that getting acceptable performance for
> this requires care, as illustrated by one of the NEWS entries for
> man-db 2.10.0:
> 
>  * Significantly improve `mandb(8)` and `man -K` performance in the
>    common case where pages are of moderate size and compressed using
>    `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test
>    system.
> 
> ... so I'm prepared to bet that forking nroff one page at a time will
> be unacceptably slow.

Probably, but there is little reason to run nroff that way (as of groff
1.23).  Formatting several pages in a single run already works well,
but I have ideas for further hardening groff's man(7) and mdoc(7)
packages such that they return to a well-defined state when changing
input documents.

> (This also combines with the fact that man-db applies some sandboxing
> when it's calling nroff just in case it might happen that a
> moderately-sized C++ project has less than 100% perfect security when
> doing text processing, which I'm sure everyone agrees would never
> happen.)

Inconceivable, yes!  But fortunately you can run nroff over N documents
and pay its own startup overhead costs as well as those of sandboxing
only once.
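
To make the cost difference concrete, here is a minimal Python sketch
contrasting the two invocation styles.  It is an illustration only, not
code from man-db or groff; the helper names are hypothetical, and it
assumes an "nroff" that accepts "-man" followed by any number of file
operands.

```python
import subprocess
from typing import Sequence


def per_page_argvs(pages: Sequence[str]) -> list[list[str]]:
    """One nroff command per page: N forks, N startups (hypothetical helper)."""
    return [["nroff", "-man", page] for page in pages]


def batch_nroff_argv(pages: Sequence[str]) -> list[str]:
    """A single nroff command covering every page: one fork, one startup."""
    return ["nroff", "-man", *pages]


def format_batch(pages: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the batch command once and capture everything it writes.

    Assumption: nroff is installed and on PATH.  The formatted pages come
    back concatenated on stdout, with nothing marking where one ends.
    """
    return subprocess.run(
        batch_nroff_argv(pages),
        capture_output=True,
        text=True,
        check=False,  # the caller decides what a nonzero exit status means
    )
```

Any sandbox wrapped around the single batch command is likewise set up
once rather than once per page.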

> If it were possible to run nroff over a whole batch of pages and get
> output for each of them in one go, then maaaaybe.

That's already true when formatting entire pages.  It's how this
collection was created:

https://www.gnu.org/software/groff/manual/groff-man-pages.utf8.txt

(...best viewed with "less -R")

With the `-d EXTRACT` feature I have in mind, in its
as-simple-as-possible first-cut form, the problem you anticipate...

> man-db would need a reliable way to associate each line (or sometimes
> multiple lines) of output with each source file,

...would remain.  I'll have to think of a good way to write out
"metadata" (the input file name and the arguments to the `TH` request)
as each page is encountered, and of an interface to enable that.  I
don't see it happening before groff 1.25.

> and of course care would be needed around error handling and so on.

I need to give this thought, too.  What sorts of error scenarios do you
foresee?  GNU troff itself, if it can't open a file to be formatted,
reports an error diagnostic and continues to the next `argv` string
until it reaches the end of input.
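
The "report and continue" behavior described above can be sketched in
Python as follows.  This is an illustration of the pattern only, not
GNU troff's code; the function name and the wording of the diagnostic
are made up for the example.

```python
from typing import Iterable, TextIO


def process_inputs(paths: Iterable[str], out: TextIO, err: TextIO) -> int:
    """Copy each readable file to `out`, in argument order.

    A file that cannot be opened produces a diagnostic on `err` and is
    skipped; processing continues with the next argument.  Returns the
    number of files that could not be opened.
    """
    failures = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as stream:
                for line in stream:
                    out.write(line)
        except OSError as exc:
            # Hypothetical message format; a real tool words this its own way.
            err.write(f"error: cannot open '{path}': {exc.strerror}\n")
            failures += 1
    return failures
```

A caller indexing many pages could compare the returned count against
zero to decide whether the batch needs a second look, while still
getting output for every page that was readable.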

> I can see the appeal, in terms of processing the actual language
> rather than a pile of hacks that try to guess what to do with it

...a major selling point, IMO...

> but on the other hand this starts to feel like a much less natural fit
> for the way nroff is run in every other situation, where you're
> processing one document at a time.

This I disagree with.  Or perhaps more precisely, it's another example
of the exception (man(1)) swallowing the rule (nroff/troff).  nroff and
troff were written as Unix filters; they read the standard input stream
(and/or argument list)[1], do some processing, and write to standard
output.[2]

Historically, troff (or one of its preprocessors) was commonly used with
multiple input files to catenate them.

Here's an example of this practice from 1980.

https://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/doc/pascal/makefile

Regards,
Branden

[1] ...including this option from Seventh Edition Unix (1979) or
    earlier, which survives in GNU troff to this day.

     -i     Read standard input after the input files are
            exhausted.

[2] Seventh Edition troff didn't write to stdout by default, but tried
    to open the typesetter device.  But it had an option to write to
    standard output.

     -t     Direct output to the standard output instead of the
            phototypesetter.

    Running old school Unix under emulation these days, you _have_ to
    use this option to avoid the dreaded "Typesetter busy." diagnostic.

    When Kernighan refactored troff for device-independence, he
    reseated it more squarely in the Unix filter tradition by writing
    its plain-text page description language to stdout.  The output
    driver, such as "dpost" for PostScript, also read its standard
    input, and could thus become just one more stage in a pipeline.
    [CSTR #97]
