From: Alejandro Colomar <alx@kernel.org>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Ian Rogers <irogers@google.com>, David Airlie <airlied@gmail.com>,
Simona Vetter <simona@ffwll.ch>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
Jonathan Corbet <corbet@lwn.net>,
dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-man@vger.kernel.org,
cjwatson@debian.org, groff@gnu.org
Subject: Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
Date: Sat, 2 Nov 2024 11:39:37 +0100
Message-ID: <20241102103937.ose4y72a7yl3dcmz@devuan>
In-Reply-To: <20241102100837.anfonowxfx4ekn3d@illithid>
Hi Branden,

On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
>
> Hi Alex,
>
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now. That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > >
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> >
> > That change might be controversial.
>
> Then let those with objections step forward and make them!

Sure! But that in itself (and the length of your mail) is a strong
reason to have this in a separate commit. :)

I'm not opposed to the change. Only cautious.

>
> (I may be one of them; see below.)
>
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
>
> Not _all_ software, surely. Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world. Most of those have withered on the vine.

Ahh, yeah, I made the same mistake I criticise in others every now
and then. $all does not really mean "all". (-Wall, `make all`, ...)

I meant all [of which I care], which is basically groff(1) and
mandoc(1). :)

> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use. Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.
Yup.
>
> Here's a sample input.
>
> $ cat /tmp/proc_pid_fdinfo_mini.5
> .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
> .SH Name
> .IR /proc/ pid /fdinfo " \- information about file descriptors"
> .SH Description
> Text text text text.
>
> Starting with formatters, let's see how they do.
>
> $ nroff -man /tmp/proc_pid_fdinfo_mini.5
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024‐11‐02 proc_pid_fdinfo_mini(5)
> $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
>
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
>
> proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> Page 1 (printed 11/2/2024)
>
> I leave the execution of these to perceive the correct font style
> changes as an exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
>
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program. But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
>
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
>
> Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
>
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while. Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
>
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output. (This is
> similar to an m4 feature known as the "black hole diversion".)
Sounds good. And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?
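
As a rough illustration only: the EXTRACT option discussed above is a
proposal and does not exist in groff(1) today, so the following is a
source-level stand-in written with awk(1). The function name
extract_section is hypothetical. It prints the requested .SH section of
a man(7) source file without formatting it, and compares section names
case-insensitively so that both "Name" and "NAME" match.

```shell
# extract_section FILE NAME
# Hypothetical stand-in for the proposed 'groff -man -d EXTRACT=NAME'.
# Prints the .SH line for NAME and every line up to the next .SH.
# It only looks at the man(7) source; it does not format anything.
extract_section()
{
	awk -v want="$2" '
		/^\.SH[ \t]/ {
			name = $0;
			sub(/^\.SH[ \t]+/, "", name);   # drop the macro call
			sub(/[ \t]+$/, "", name);       # drop trailing blanks
			gsub(/"/, "", name);            # drop quotes around the title
			show = (toupper(name) == toupper(want));
		}
		show { print }
	' "$1";
}
```

The output is still roff source, so a real wrapper would also have to
keep the page's .TH line and pipe the result through a formatter, much
like man_section() does with mandoc(1).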
> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself. It's not clear
> to me why it wasn't done back in the 1980s.

Not enough activation energy, probably, as with most stuff.

> lexgrog(1) itself will of course have to stay around for years to come,

You can make it a wrapper around groff(1) with flags, no?

> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
>
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
>
> So you could do things like:
>
> nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3

I certainly use this.

# man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
# ...) of all manual pages in a directory (or in a single manual page file).
# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';

man_section()
{
	if [ $# -lt 2 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
		return $EX_USAGE;
	fi

	local page="$1";
	shift;
	# Keep each section name as one array element,
	# so that a multi-word name such as 'SEE ALSO' is not split.
	local sect=("$@");

	find "$page" -type f \
	|xargs wc -l \
	|grep -v -e '\b1 ' -e '\btotal\b' \
	|awk '{ print $2 }' \
	|sort \
	|while read -r manpage; do
		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
		 for s in "${sect[@]}"; do
			<"$manpage" \
			sed -n \
				-e "/^\.SH $s/p" \
				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
		 done;) \
		|mandoc -Tutf8 2>/dev/null \
		|col -pbx;
	done;
}

# man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsfunc man2;

man_lsfunc()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
	|grep '^[0-9]' \
	|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
	|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
	|uniq;
}

# man_lsvar() prints the name of all C variables declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsvar man3;

man_lsvar()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
	|pcregrep -Mn \
		-e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
		-e '^ +extern [\w ]+ \**[\w ]+; *$' \
	|grep '^[0-9]' \
	|grep -v 'typedef' \
	|sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
	|sed 's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
	|uniq;
}

Even grepc(1) was derived from those scripts.
>
> and:
>
> nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8

I haven't done anything like this yet, but that's probably because it's
quite complex to implement with regexes, not because it wouldn't be useful.
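
To give an idea of the work involved, here is a minimal awk(1) sketch
that approximates "SECTION/TAG" selection at the source level. It is not
the proposed groff feature, and the function name extract_tag is
hypothetical. It assumes that each tag sits on the line right after
.TP, that a leading font macro such as .B or .BI can simply be dropped,
and that a paragraph ends at the next .TP or .SH; real pages (with .TQ,
.SS, or nested .RS blocks) would need more care.

```shell
# extract_tag FILE SECTION TAG
# Hypothetical source-level approximation of EXTRACT="SECTION/TAG".
# Within section SECTION, prints the .TP paragraph whose tag line
# contains TAG as a separate word.
extract_tag()
{
	awk -v sect="$2" -v tag="$3" '
		/^\.SH[ \t]/ {
			name = $0;
			sub(/^\.SH[ \t]+/, "", name);
			sub(/[ \t]+$/, "", name);
			gsub(/"/, "", name);
			insect = (toupper(name) == toupper(sect));
			show = 0;
			pending = 0;
			next;
		}
		!insect { next }
		/^\.TP/ {
			# The tag is on the next input line; decide when we see it.
			pending = 1;
			show = 0;
			next;
		}
		pending {
			pending = 0;
			t = $0;
			sub(/^\.[A-Z][A-Z]?[ \t]+/, "", t);   # drop a leading font macro
			gsub(/\\-/, "-", t);                  # turn the minus escape into "-"
			gsub(/"/, "", t);                     # drop argument quotes
			n = split(t, w, /[ \t,]+/);
			for (i = 1; i <= n; i++)
				if (w[i] == tag)
					show = 1;
			if (show)
				print ".TP";
		}
		show { print }
	' "$1";
}
```

With this, something like `extract_tag man8/zic.8 OPTIONS -b` would be
the source-level counterpart of the EXTRACT="OPTIONS/-b" example; the
output is unformatted roff.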
>
> ...does this sound appetizing to anyone?
Certainly.
> > Also, many other pages might need to be changed accordingly for
> > consistency.
>
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting. I'm sorry for prompting churn, Ian.
>
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1). And it's not just for reviewing diffs, but also for writing
> > them. Semantic newlines reduce the amount of work for producing the
> > diffs.
>
> It's a real win for diffs.

And diffs are a real win for text. Thus, semantic newlines are a real
win for text. "Write poems, not prose." (Any chance we may get that
warning added to groff(1)? :D)

Cheers,
Alex

>
> Here's a very recent example from groff.
>
> diff --git a/man/groff.7.man b/man/groff.7.man
> index 1fb635f2b..1d248b237 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -1281,6 +1281,7 @@ .SH Identifiers
> typeface,
> color,
> special character or character class,
> +hyphenation language code,
> environment,
> or stream.
> .
>
>
> (So recent that in fact I haven't pushed that yet.)
>
> Lists like the foregoing are common in man pages.
>
> Regards,
> Branden
>
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
>
> https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
>
> You can do stricter string comparisons in GNU troff. And I've
> thought of some syntactic sugar for performing them that wouldn't
> break backward compatibility.
>
> [4] To really land the feature, we need automatic tag generation from
> input text (we don't want to make the man page author construct
> their own tags). Another reason we want the construction to be
> automatic is to make the tags unique when multiple man pages are
> formatted in one run, as one might do when making a book of man
> pages. Automatic tagging will also enable the slaying of two other
> ancient dragons.
>
> 1. deep internal links for PDF bookmarks
> 2. pod2man's `IX`-happy output; the widespread use of this
> nonstandard macro confuses way too many novice page authors, and
> bloats document size.
>
> Another feature we'll really want to do this right is improved string
> processing facilities. That, too, is something that will pay
> dividends in several areas. With a proper string iterator in the
> formatter (and a couple more conditional operators),[5] it will be
> possible to write a string library as a macro file, slimming down the
> formatter itself a little and making macro writers' lives easier.
> We're only two days into the month and this has already come up on
> the groff list.
>
> https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
>
> [5] https://savannah.gnu.org/bugs/?62264
--
<https://www.alejandro-colomar.es/>