Linux Manual Pages development
From: Alejandro Colomar <alx@kernel.org>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Ian Rogers <irogers@google.com>, David Airlie <airlied@gmail.com>,
	Simona Vetter <simona@ffwll.ch>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Jonathan Corbet <corbet@lwn.net>,
	dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-man@vger.kernel.org,
	cjwatson@debian.org, groff@gnu.org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
Date: Sat, 2 Nov 2024 11:39:37 +0100
Message-ID: <20241102103937.ose4y72a7yl3dcmz@devuan>
In-Reply-To: <20241102100837.anfonowxfx4ekn3d@illithid>


Hi Branden,

On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
> 
> Hi Alex,
> 
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now.  That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > > 
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> > 
> > That change might be controversial.
> 
> Then let those with objections step forward and make them!

Sure!  But that in itself (and the length of your mail) is a strong
reason to have this in a separate commit.  :)

I'm not opposed to the change.  Only cautious.

> 
> (I may be one of them; see below.)
> 
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
> 
> Not _all_ software, surely.  Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world.  Most of those have withered on the vine.

Ahh, yeah, I committed the same mistake I criticise in others every now
and then.  $all does not really mean "all".  (-Wall, `make all`, ...)

I meant all [that I care about], which is basically groff(1) and
mandoc(1).  :)

> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use.  Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.

Yup.

> 
> Here's a sample input.
> 
> $ cat /tmp/proc_pid_fdinfo_mini.5
> .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
> .SH Name
> .IR /proc/ pid /fdinfo " \- information about file descriptors"
> .SH Description
> Text text text text.
> 
> Starting with formatters, let's see how they do.
> 
> $ nroff -man /tmp/proc_pid_fdinfo_mini.5
> proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)
> 
> Name
>        /proc/pid/fdinfo - information about file descriptors
> 
> Description
>        Text text text text.
> 
> example                           2024‐11‐02           proc_pid_fdinfo_mini(5)
> $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)
> 
> Name
>        /proc/pid/fdinfo - information about file descriptors
> 
> Description
>        Text text text text.
> 
> example                           2024-11-02           proc_pid_fdinfo_mini(5)
> $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)
> 
> 
> 
> Name
>        /proc/pid/fdinfo - information about file descriptors
> 
> Description
>        Text text text text.
> 
> 
> 
> example                           2024-11-02           proc_pid_fdinfo_mini(5)
> $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
> 
>        proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
> 
>        Name
>             /proc/pid/fdinfo - information about file descriptors
> 
>        Description
>             Text text text text.
> 
>        Page 1                                        (printed 11/2/2024)
> 
> I leave the execution of these to perceive the correct font style
> changes as an exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
> 
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program.  But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
> 
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
> 
> Oh, damn.  I wasn't expecting that.  Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
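For context on why the NAME line is the sensitive part: an indexer has to turn the roff source back into "topic - summary" text, and with `.IR` that means emulating the alternating-font macro, which joins its arguments with no space between them.  Below is a minimal shell sketch of that flattening.  `name_line_text` is a hypothetical helper, not how lexgrog(1) works internally, and it handles only plain lines and a single call to an alternating-font macro.

```shell
# Hypothetical helper (not part of man-db or groff): flatten one NAME-section
# line into plain "topic - summary" text, as an indexer would need to.
# Assumptions: the line is either plain text or a single call to an
# alternating-font macro (.BI .BR .IB .IR .RB .RI); arguments are separated
# by spaces, and double quotes only group an argument (no "" escapes).
name_line_text()
{
	printf '%s\n' "$1" | awk '
	{
		out = $0
		if ($0 ~ /^\.(BI|BR|IB|IR|RB|RI)( |$)/) {
			rest = substr($0, 5)   # skip the 3-char macro name and a space
			out = ""; arg = ""; inq = 0
			for (i = 1; i <= length(rest); i++) {
				c = substr(rest, i, 1)
				if (c == "\"") { inq = !inq; continue }
				if (c == " " && !inq) { out = out arg; arg = ""; continue }
				arg = arg c
			}
			# Alternating-font macros join their arguments with no space.
			out = out arg
		}
		gsub(/\\-/, "-", out)   # the \- escape is a minus sign
		print out
	}'
}
```

With the sample page above, `name_line_text '.IR /proc/ pid /fdinfo " \- information about file descriptors"'` prints `/proc/pid/fdinfo - information about file descriptors`, the same text that nroff and mandoc rendered.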
> 
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while.  Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
> 
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output.  (This is
> similar to an m4 feature known as the "black hole diversion".)

Sounds good.  And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?
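The proposed black-hole diversion would live inside groff's man(7) package, and `-d EXTRACT=...` does not exist yet.  As a rough text-level approximation, under the assumptions stated in the comments, a hypothetical `extract_section` filter keeps the `.TH` line and the requested `.SH` section and drops everything else, so its output can still be fed to a formatter:

```shell
# Hypothetical filter (not a groff or man-db feature): approximate the
# proposed "-d EXTRACT=..." behavior on roff source by keeping the .TH line
# and the named .SH section, and dropping everything else.
# Assumptions: the heading is on the .SH line itself; quotes around the
# title are ignored, and matching is case-insensitive, so "Name" == "NAME".
extract_section()
{
	local file="$1" want="$2"

	awk -v want="$want" '
	function title(line) {
		sub(/^\.SH[ \t]*/, "", line)
		gsub(/"/, "", line)
		return toupper(line)
	}
	/^\.TH([ \t]|$)/ { print; next }
	/^\.SH([ \t]|$)/ { keep = (title($0) == toupper(want)) }
	keep { print }
	' "$file"
}
```

For example, `extract_section /tmp/proc_pid_fdinfo_mini.5 Name | nroff -man` would format only the Name section.  Because this works on source text, it shares the limits of any regex scraper; the formatter-side version would not need to re-parse roff syntax at all.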

> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself.  It's not clear
> to me why it wasn't done back in the 1980s.

Not enough activation energy, probably, as with most things.

> lexgrog(1) itself will of course have to stay around for years to come,

You can make it a wrapper around groff(1) with flags, no?

> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
> 
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
> 
> So you could do things like:
> 
> nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3

I certainly use this sort of thing already:

	#  man_section()  prints specific manual page sections (DESCRIPTION, SYNOPSIS,
	# ...) of all manual pages in a directory (or in a single manual page file).
	# Usage example:  .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';

	EX_USAGE=64;  # sysexits.h: command-line usage error

	man_section()
	{
		if [ $# -lt 2 ]; then
			>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
			return $EX_USAGE;
		fi

		local page="$1";
		shift;
		# Keep each section name as one word, so 'SEE ALSO' is not split.
		local sects=("$@");

		find "$page" -type f \
		|xargs wc -l \
		|grep -v -e '\b1 ' -e '\btotal\b' \
		|awk '{ print $2 }' \
		|sort \
		|while read -r manpage; do
			(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
			 for s in "${sects[@]}"; do
				<"$manpage" \
				sed -n \
					-e "/^\.SH $s/p" \
					-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
			 done;) \
			|mandoc -Tutf8 2>/dev/null \
			|col -pbx;
		done;
	}

	#  man_lsfunc()  prints the name of all C functions declared in the SYNOPSIS
	# of all manual pages in a directory (or in a single manual page file).
	# Each name is printed on a separate line.
	# Requires sed_rm_ccomments(), which is defined elsewhere (not shown here).
	# Usage example:  .../man-pages$ man_lsfunc man2;

	man_lsfunc()
	{
		if [ $# -lt 1 ]; then
			>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
			return $EX_USAGE;
		fi

		for arg in "$@"; do
			man_section "$arg" 'SYNOPSIS';
		done \
		|sed_rm_ccomments \
		|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
		|grep '^[0-9]' \
		|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
		|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
		|uniq;
	}

	#  man_lsvar()  prints the name of all C variables declared in the SYNOPSIS
	# of all manual pages in a directory (or in a single manual page file).
	# Each name is printed on a separate line.
	# Requires sed_rm_ccomments(), which is defined elsewhere (not shown here).
	# Usage example:  .../man-pages$ man_lsvar man3;

	man_lsvar()
	{
		if [ $# -lt 1 ]; then
			>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
			return $EX_USAGE;
		fi

		for arg in "$@"; do
			man_section "$arg" 'SYNOPSIS';
		done \
		|sed_rm_ccomments \
		|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
		|pcregrep -Mn \
		  -e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
		  -e '^ +extern [\w ]+ \**[\w ]+; *$' \
		|grep '^[0-9]' \
		|grep -v 'typedef' \
		|sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
		|sed    's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
		|uniq;
	}

grepc(1) was even derived from those scripts.

> 
> and:
> 
> nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8

While I haven't used this yet, it's probably because it's quite complex
to implement with regexes, not because it wouldn't be useful.
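A regex-only approximation of the `OPTIONS/-b` case is possible if one accepts some simplifications.  The following hypothetical `extract_tagged_paragraph` filter tracks the current `.SH` section and prints the `.TP` paragraph whose tag's first word equals the requested tag; the assumptions listed in its comments are exactly where it would break on real pages:

```shell
# Hypothetical filter (not a groff feature): print the .TP paragraph whose
# tag's first word is TAG inside the .SH section SECTION, approximating the
# proposed EXTRACT="SECTION/TAG" syntax on roff source.
# Assumptions: the tag is on the line right after .TP; a paragraph ends at
# the next .TP, .SS, .P, .PP, .LP, or .SH; only simple font markup
# (a leading macro, quotes, \fB-style escapes, \-) needs to be stripped.
extract_tagged_paragraph()
{
	local file="$1" section="$2" tag="$3"

	awk -v section="$section" -v tag="$tag" '
	function plain(line) {
		sub(/^\.[A-Z]+[ \t]*/, "", line)   # drop a leading font macro
		gsub(/"/, "", line)
		gsub(/\\f[BIRP]/, "", line)
		gsub(/\\-/, "-", line)
		return line
	}
	/^\.SH([ \t]|$)/ {
		t = $0
		sub(/^\.SH[ \t]*/, "", t)
		gsub(/"/, "", t)
		insect = (toupper(t) == toupper(section))
		printing = 0
		next
	}
	!insect { next }
	/^\.(TP|SS|P|PP|LP)([ \t]|$)/ {
		printing = 0
		if ($0 ~ /^\.TP/) { want_tag = 1; tp = $0 }
		next
	}
	want_tag {
		want_tag = 0
		split(plain($0), w, " ")
		if (w[1] == tag) { printing = 1; print tp }
	}
	printing { print }
	' "$file"
}
```

Something like `extract_tagged_paragraph man8/zic.8 OPTIONS -b` would then print the roff source of that one option, assuming the page marks its options with `.TP`.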

> 
> ...does this sound appetizing to anyone?

Certainly.

> > Also, many other pages might need to be changed accordingly for
> > consistency.
> 
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting.  I'm sorry for prompting churn, Ian.
> 
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1).  And it's not just for reviewing diffs, but also for writing
> > them.  Semantic newlines reduce the amount of work for producing the
> > diffs.
> 
> It's a real win for diffs.

And diffs are a real win for text.  Thus, semantic newlines are a real
win for text.  "Write poems, not prose."  (Any chance we may get that
warning added to groff(1)?  :D)
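Until such a warning exists, a crude approximation is easy to script.  This hypothetical `check_semantic_newlines` is not an existing groff warning or man-pages check; it reports text lines where one sentence ends and another starts on the same line, and, being regex-based, it will also flag abbreviations:

```shell
# Hypothetical lint (not an existing groff warning): report roff text lines
# where a sentence ends and another one starts on the same line.
# Assumptions: sentence ends are ".", "!", or "?" followed by blanks and
# more text; abbreviations such as "e.g. foo" will be reported too.
check_semantic_newlines()
{
	awk '
	{
		c = substr($0, 1, 1)
		if (c == "." || c == "\047")   # control line: request or macro
			next
	}
	/[.!?][ \t]+[^ \t]/ {
		printf "%s:%d: sentence ends mid-line\n", FILENAME, FNR
		status = 1
	}
	END { exit status }
	' "$@"
}
```

Running `check_semantic_newlines page.5` prints one `file:line:` diagnostic per offending line and exits with status 1 if it found any, which makes it usable in a pre-commit hook.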


Cheers,
Alex

> 
> Here's a very recent example from groff.
> 
> diff --git a/man/groff.7.man b/man/groff.7.man
> index 1fb635f2b..1d248b237 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -1281,6 +1281,7 @@ .SH Identifiers
>  typeface,
>  color,
>  special character or character class,
> +hyphenation language code,
>  environment,
>  or stream.
>  .
> 
> 
> (So recent that in fact I haven't pushed that yet.)
> 
> Lists like the foregoing are common in man pages.
> 
> Regards,
> Branden
> 
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
> 
>     https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
> 
>     You can do stricter string comparisons in GNU troff.  And I've
>     thought of some syntactic sugar for performing them that wouldn't
>     break backward compatibility.
> 
> [4] To really land the feature, we need automatic tag generation from
>     input text (we don't want to make the man page author construct
>     their own tags).  Another reason we want the construction to be
>     automatic is to make the tags unique when multiple man pages are
>     formatted in one run, as one might do when making a book of man
>     pages.  Automatic tagging will also enable the slaying of two other
>     ancient dragons.
> 
>     1.  deep internal links for PDF bookmarks
>     2.  pod2man's `IX`-happy output; the widespread use of this
>         nonstandard macro confuses way too many novice page authors, and
>         bloats document size.
> 
>     Another feature we'll really want to do this right is improved string
>     processing facilities.  That, too, is something that will pay
>     dividends in several areas.  With a proper string iterator in the
>     formatter (and a couple more conditional operators),[5] it will be
>     possible to write a string library as a macro file, slimming down the
>     formatter itself a little and making macro writers' lives easier.
>     We're only two days into the month and this has already come up on
>     the groff list.
> 
>     https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
> 
> [5] https://savannah.gnu.org/bugs/?62264
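Footnote [4]'s requirement, tags that stay unique when many pages are formatted in one run, can be sketched outside the formatter.  The naming scheme below (page name, section, then a slug of the heading, plus a counter for repeats) is an assumption made for illustration, not anything groff does, and `make_tags` is a hypothetical name:

```shell
# Hypothetical tag generator (not a groff feature): read lines of
# "page<TAB>section<TAB>heading" and print one anchor tag per line.
# The page-and-section prefix keeps tags from different pages apart when
# several pages are formatted in one run; a counter suffix keeps repeated
# headings within the run apart.
make_tags()
{
	awk -F '\t' '
	{
		slug = tolower($3)
		gsub(/[^a-z0-9]+/, "-", slug)   # each run of other chars -> "-"
		gsub(/^-+|-+$/, "", slug)        # trim hyphens at both ends
		tag = $1 "." $2 "-" slug
		if (++seen[tag] > 1)
			tag = tag "-" seen[tag]
		print tag
	}'
}
```

For example, `printf 'bsearch\t3\tRETURN VALUE\nzic\t8\tOPTIONS/-b\n' | make_tags` prints `bsearch.3-return-value` and `zic.8-options-b`.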



-- 
<https://www.alejandro-colomar.es/>


