From: Alejandro Colomar <alx@kernel.org>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Ian Rogers <irogers@google.com>, David Airlie <airlied@gmail.com>,
Simona Vetter <simona@ffwll.ch>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
Jonathan Corbet <corbet@lwn.net>,
dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-man@vger.kernel.org,
cjwatson@debian.org, groff@gnu.org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
Date: Sat, 2 Nov 2024 11:39:37 +0100
Message-ID: <20241102103937.ose4y72a7yl3dcmz@devuan>
In-Reply-To: <20241102100837.anfonowxfx4ekn3d@illithid>
Hi Branden,
On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
>
> Hi Alex,
>
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now. That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > >
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> >
> > That change might be controversial.
>
> Then let those with objections step forward and make them!
Sure! But that in itself (and the length of your mail) is a strong
reason to have this in a separate commit. :)

I'm not opposed to the change, only cautious.
>
> (I may be one of them; see below.)
>
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
>
> Not _all_ software, surely. Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world. Most of those have withered on the vine.
Ahh, yeah, I made the same mistake I criticise in others every now
and then. $all does not really mean "all". (-Wall, `make all`, ...)

I meant all [that I care about], which is basically groff(1) and
mandoc(1). :)
> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use. Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.
Yup.
>
> Here's a sample input.
>
> $ cat /tmp/proc_pid_fdinfo_mini.5
> .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
> .SH Name
> .IR /proc/ pid /fdinfo " \- information about file descriptors"
> .SH Description
> Text text text text.
>
> Starting with formatters, let's see how they do.
>
> $ nroff -man /tmp/proc_pid_fdinfo_mini.5
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024‐11‐02 proc_pid_fdinfo_mini(5)
> $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
>
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
>
> proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> Page 1 (printed 11/2/2024)
>
> I leave the execution of these to perceive the correct font style
> changes as an exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
>
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program. But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
>
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
>
> Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
>
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while. Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
>
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output. (This is
> similar to an m4 feature known as the "black hole diversion".)
Sounds good. And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?
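
If the formatter did emit only the formatted Name section, what would
remain for such a wrapper is roughly splitting that text into topics and
a summary. Below is a minimal sketch of that last step, assuming the
formatted text uses a plain ASCII " - " separator and comma-separated
topics. split_name() is a hypothetical name used for illustration, and
the proposed EXTRACT option does not exist yet; real lexgrog(1) handles
many more cases.

```shell
# split_name (hypothetical): read formatted NAME text on stdin, e.g.
# "a, b - summary", and print one "topic<TAB>summary" line per topic.
# Returns non-zero if no " - " separator is found.
split_name()
{
	tr '\n' ' ' \
	|sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//' \
	|awk '{
		i = index($0, " - ");
		if (i == 0)
			exit 1;                 # not "name - summary"
		topics  = substr($0, 1, i - 1);
		summary = substr($0, i + 3);
		n = split(topics, t, /, */);
		for (k = 1; k <= n; k++)
			printf "%s\t%s\n", t[k], summary;
	}';
}
```

It would be fed the formatter's output for the Name section on standard
input, whichever way that output ends up being produced.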
> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself. It's not clear
> to me why it wasn't done back in the 1980s.
Not enough activation energy, probably, as with most things.
> lexgrog(1) itself will of course have to stay around for years to come,
You can make it a wrapper around groff(1) with flags, no?
> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
>
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
>
> So you could do things like:
>
> nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3
I certainly use this.
# man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
# ...) of all manual pages in a directory (or in a single manual page file).
# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';

man_section()
{
	if [ $# -lt 2 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
		return $EX_USAGE;
	fi

	local page="$1";
	shift;
	local sect="$*";

	# List the pages, skipping one-line files and the "total" line of wc(1).
	find "$page" -type f \
	|xargs wc -l \
	|grep -v -e '\b1 ' -e '\btotal\b' \
	|awk '{ print $2 }' \
	|sort \
	|while read -r manpage; do
		# Keep the header (.TH up to the first .SH), then each requested
		# section up to the next .SH, and format the result.
		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
		 for s in $sect; do
			<"$manpage" \
			sed -n \
				-e "/^\.SH $s/p" \
				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
		 done;) \
		|mandoc -Tutf8 2>/dev/null \
		|col -pbx;
	done;
}

# man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed on a separate line.
# Usage example: .../man-pages$ man_lsfunc man2;

man_lsfunc()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	# sed_rm_ccomments is a helper defined elsewhere (not shown here).
	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
	|grep '^[0-9]' \
	|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
	|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
	|uniq;
}

# man_lsvar() prints the name of all C variables declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed on a separate line.
# Usage example: .../man-pages$ man_lsvar man3;

man_lsvar()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	# Drop function prototypes first, then keep only 'extern' declarations.
	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
	|pcregrep -Mn \
		-e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
		-e '^ +extern [\w ]+ \**[\w ]+; *$' \
	|grep '^[0-9]' \
	|grep -v 'typedef' \
	|sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
	|sed 's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
	|uniq;
}

Even grepc(1) derived from those scripts.
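
The section slicing that these functions do with sed(1) ranges can also
be sketched in awk(1), which makes it easy to match the whole heading
(including a quoted one such as .SH "SEE ALSO") instead of a prefix.
This is only an illustration under those assumptions: sect_src() is a
hypothetical name, it works on the roff source rather than on formatted
output, and it is not part of man-pages or groff.

```shell
# sect_src FILE NAME (hypothetical): print the roff source of section
# NAME from FILE, heading line included.  NAME is matched ignoring case.
sect_src()
{
	awk -v want="$2" '
	/^\./ && $1 == ".SH" {
		name = $0;
		sub(/^\.SH[ \t]*/, "", name);   # drop the macro name
		gsub(/"/, "", name);            # tolerate .SH "SEE ALSO"
		sub(/[ \t]+$/, "", name);       # drop trailing blanks
		inside = (toupper(name) == toupper(want));
	}
	inside { print }
	' "$1";
}
```

For example, "sect_src page.5 NAME" would print the .SH line and the
lines up to (but not including) the next .SH, and the result could then
be piped to a formatter together with the page's .TH line.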
>
> and:
>
> nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8
While I haven't used this yet, it's probably because it's quite complex
to implement with regexes, not because it wouldn't be useful.
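
To give an idea of what a source-level approximation involves, here is a
rough awk(1) sketch that prints one tagged paragraph of one section. It
assumes the tag sits on the single line after .TP, treats .TP, .SS, .PP,
.LP, .P, and .SH as paragraph boundaries, and ignores .TQ and anything
fancier. tp_src() is a hypothetical name, and this is not how the
proposed EXTRACT feature would work inside the formatter.

```shell
# tp_src FILE SECTION TAG (hypothetical): print the source of the .TP
# paragraph in SECTION whose tag line contains TAG as a word.
tp_src()
{
	awk -v sect="$2" -v tag="$3" '
	/^\./ && $1 == ".SH" {
		name = $0;
		sub(/^\.SH[ \t]*/, "", name);
		gsub(/"/, "", name);
		sub(/[ \t]+$/, "", name);
		in_sect = (toupper(name) == toupper(sect));
		pending = 0; hit = 0;
		next;
	}
	!in_sect { next }
	/^\./ && ($1 == ".SS" || $1 == ".PP" || $1 == ".LP" || $1 == ".P") {
		pending = 0; hit = 0;     # these end a tagged paragraph
		next;
	}
	/^\./ && $1 == ".TP" {
		pending = 1; hit = 0; held = $0;
		next;
	}
	pending {
		pending = 0;
		line = $0;
		gsub(/\\-/, "-", line);   # \-b becomes -b
		gsub(/"/, " ", line);     # quotes only delimit arguments
		n = split(line, w, /[ \t]+/);
		for (i = 1; i <= n; i++)
			if (w[i] == tag)
				hit = 1;
		if (hit)
			print held;
	}
	hit { print }
	' "$1";
}
```

A call such as "tp_src man8/zic.8 OPTIONS -b" would then print the .TP
line, the tag line, and the paragraph text, provided the page follows
the simple layout assumed above.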
>
> ...does this sound appetizing to anyone?
Certainly.
> > Also, many other pages might need to be changed accordingly for
> > consistency.
>
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting. I'm sorry for prompting churn, Ian.
>
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1). And it's not just for reviewing diffs, but also for writing
> > them. Semantic newlines reduce the amount of work for producing the
> > diffs.
>
> It's a real win for diffs.
And diffs are a real win for text. Thus, semantic newlines are a real
win for text. "Write poems, not prose." (Any chance we may get that
warning added to groff(1)? :D)

Cheers,
Alex

>
> Here's a very recent example from groff.
>
> diff --git a/man/groff.7.man b/man/groff.7.man
> index 1fb635f2b..1d248b237 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -1281,6 +1281,7 @@ .SH Identifiers
> typeface,
> color,
> special character or character class,
> +hyphenation language code,
> environment,
> or stream.
> .
>
>
> (So recent that in fact I haven't pushed that yet.)
>
> Lists like the foregoing are common in man pages.
>
> Regards,
> Branden
>
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
>
> https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
>
> You can do stricter string comparisons in GNU troff. And I've
> thought of some syntactic sugar for performing them that wouldn't
> break backward compatibility.
>
> [4] To really land the feature, we need automatic tag generation from
> input text (we don't want to make the man page author construct
> their own tags). Another reason we want the construction to be
> automatic is to make the tags unique when multiple man pages are
> formatted in one run, as one might do when making a book of man
> pages. Automatic tagging will also enable the slaying of two other
> ancient dragons.
>
> 1. deep internal links for PDF bookmarks
> 2. pod2man's `IX`-happy output; the widespread use of this
> nonstandard macro confuses way too many novice page authors, and
> bloats document size.
>
> Another feature we'll really want to do this right is improved string
> processing facilities. That, too, is something that will pay
> dividends in several areas. With a proper string iterator in the
> formatter (and a couple more conditional operators),[5] it will be
> possible to write a string library as a macro file, slimming down the
> formatter itself a little and making macro writers' lives easier.
> We're only two days into the month and this has already come up on
> the groff list.
>
> https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
>
> [5] https://savannah.gnu.org/bugs/?62264
--
<https://www.alejandro-colomar.es/>