* [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
@ 2024-10-15 21:17 Ian Rogers
From: Ian Rogers @ 2024-10-15 21:17 UTC
To: Alejandro Colomar, G . Branden Robinson
Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc,
linux-kernel, linux-man, Ian Rogers
When /proc/pid/fdinfo was part of proc.5 man page the indentation made
sense. As a standalone man page the indentation doesn't need to be so
far over to the right. Remove the initial tagged paragraph and move the
styling to the initial summary description.
Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
---
man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 34 deletions(-)
diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
index 1e23bbe02..8678caf4a 100644
--- a/man/man5/proc_pid_fdinfo.5
+++ b/man/man5/proc_pid_fdinfo.5
@@ -6,20 +6,19 @@
.\"
.TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
.SH NAME
-/proc/pid/fdinfo/ \- information about file descriptors
+.IR /proc/ pid /fdinfo " \- information about file descriptors"
.SH DESCRIPTION
-.TP
-.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
-This is a subdirectory containing one entry for each file which the
-process has open, named by its file descriptor.
-The files in this directory are readable only by the owner of the process.
-The contents of each file can be read to obtain information
-about the corresponding file descriptor.
-The content depends on the type of file referred to by the
-corresponding file descriptor.
-.IP
+Since Linux 2.6.22,
+this subdirectory contains one entry for each file that process
+.I pid
+has open, named by its file descriptor. The files in this directory
+are readable only by the owner of the process. The contents of each
+file can be read to obtain information about the corresponding file
+descriptor. The content depends on the type of file referred to by
+the corresponding file descriptor.
+.P
For regular files and directories, we see something like:
-.IP
+.P
.in +4n
.EX
.RB "$" " cat /proc/12015/fdinfo/4"
@@ -28,7 +27,7 @@ flags: 01002002
mnt_id: 21
.EE
.in
-.IP
+.P
The fields are as follows:
.RS
.TP
@@ -51,7 +50,6 @@ this field incorrectly displayed the setting of
at the time the file was opened,
rather than the current setting of the close-on-exec flag.
.TP
-.I
.I mnt_id
This field, present since Linux 3.15,
.\" commit 49d063cb353265c3af701bab215ac438ca7df36d
@@ -59,13 +57,13 @@ is the ID of the mount containing this file.
See the description of
.IR /proc/ pid /mountinfo .
.RE
-.IP
+.P
For eventfd file descriptors (see
.BR eventfd (2)),
we see (since Linux 3.8)
.\" commit cbac5542d48127b546a23d816380a7926eee1c25
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -74,16 +72,16 @@ mnt_id: 10
eventfd\-count: 40
.EE
.in
-.IP
+.P
.I eventfd\-count
is the current value of the eventfd counter, in hexadecimal.
-.IP
+.P
For epoll file descriptors (see
.BR epoll (7)),
we see (since Linux 3.8)
.\" commit 138d22b58696c506799f8de759804083ff9effae
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -93,7 +91,7 @@ tfd: 9 events: 19 data: 74253d2500000009
tfd: 7 events: 19 data: 74253d2500000007
.EE
.in
-.IP
+.P
Each of the lines beginning
.I tfd
describes one of the file descriptors being monitored via
@@ -110,13 +108,13 @@ descriptor.
The
.I data
field is the data value associated with this file descriptor.
-.IP
+.P
For signalfd file descriptors (see
.BR signalfd (2)),
we see (since Linux 3.8)
.\" commit 138d22b58696c506799f8de759804083ff9effae
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -125,7 +123,7 @@ mnt_id: 10
sigmask: 0000000000000006
.EE
.in
-.IP
+.P
.I sigmask
is the hexadecimal mask of signals that are accepted via this
signalfd file descriptor.
@@ -135,12 +133,12 @@ and
.BR SIGQUIT ;
see
.BR signal (7).)
-.IP
+.P
For inotify file descriptors (see
.BR inotify (7)),
we see (since Linux 3.8)
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -150,7 +148,7 @@ inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8
inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:27261900802dfd73
.EE
.in
-.IP
+.P
Each of the lines beginning with "inotify" displays information about
one file or directory that is being monitored.
The fields in this line are as follows:
@@ -168,19 +166,19 @@ The ID of the device where the target file resides (in hexadecimal).
.I mask
The mask of events being monitored for the target file (in hexadecimal).
.RE
-.IP
+.P
If the kernel was built with exportfs support, the path to the target
file is exposed as a file handle, via three hexadecimal fields:
.IR fhandle\-bytes ,
.IR fhandle\-type ,
and
.IR f_handle .
-.IP
+.P
For fanotify file descriptors (see
.BR fanotify (7)),
we see (since Linux 3.8)
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -190,7 +188,7 @@ fanotify flags:0 event\-flags:88002
fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:4f261900a82dfd73
.EE
.in
-.IP
+.P
The fourth line displays information defined when the fanotify group
was created via
.BR fanotify_init (2):
@@ -210,7 +208,7 @@ argument given to
.BR fanotify_init (2)
(expressed in hexadecimal).
.RE
-.IP
+.P
Each additional line shown in the file contains information
about one of the marks in the fanotify group.
Most of these fields are as for inotify, except:
@@ -228,16 +226,16 @@ The events mask for this mark
The mask of events that are ignored for this mark
(expressed in hexadecimal).
.RE
-.IP
+.P
For details on these fields, see
.BR fanotify_mark (2).
-.IP
+.P
For timerfd file descriptors (see
.BR timerfd (2)),
we see (since Linux 3.17)
.\" commit af9c4957cf212ad9cf0bee34c95cb11de5426e85
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
--
2.47.0.rc1.288.g06298d1525-goog
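
The fdinfo files described by this page are plain "key: value" text, so they are easy to read from a script. The following is a minimal Python sketch, not part of the patch: the helper names `parse_fdinfo` and `decode_common` are made up for illustration, and the sample text is assembled from the values shown in the page's example. It relies on the page's description of `pos` and `mnt_id` as decimal numbers and `flags` as an octal number.

```python
def parse_fdinfo(text):
    """Parse simple 'key: value' lines of an fdinfo file into a dict.

    Simplifications in this sketch (hypothetical helper, not a kernel API):
    - lines without a colon are skipped;
    - lines whose key contains whitespace (such as the "inotify wd:..."
      records) are skipped;
    - repeated keys (such as epoll's "tfd") keep only the last occurrence.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key or any(ch.isspace() for ch in key):
            continue
        fields[key] = value.strip()
    return fields


def decode_common(fields):
    """Convert the three fields common to all fd types into integers."""
    return {
        "pos": int(fields["pos"]),         # decimal file offset
        "flags": int(fields["flags"], 8),  # octal access mode and status flags
        "mnt_id": int(fields["mnt_id"]),   # decimal mount ID
    }


if __name__ == "__main__":
    sample = "pos:\t0\nflags:\t01002002\nmnt_id:\t21\n"
    print(decode_common(parse_fdinfo(sample)))
```

On a live system the text would come from reading `/proc/<pid>/fdinfo/<fd>`, which the page says is readable only by the owner of the process.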

* [PATCH v2 2/3] proc_pid_fdinfo.5: Add subsection headers for different fd types
From: Ian Rogers @ 2024-10-15 21:17 UTC
To: Alejandro Colomar, G . Branden Robinson
Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc,
linux-kernel, linux-man, Ian Rogers

Make the sections about eventfd, epoll, signalfd, inotify, fanotify,
timerfd better separated with a clearer subsection header.

Signed-off-by: Ian Rogers <irogers@google.com>
---
man/man5/proc_pid_fdinfo.5 | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
index 8678caf4a..02eceac04 100644
--- a/man/man5/proc_pid_fdinfo.5
+++ b/man/man5/proc_pid_fdinfo.5
@@ -57,6 +57,7 @@ is the ID of the mount containing this file.
See the description of
.IR /proc/ pid /mountinfo .
.RE
+.SS eventfd
.P
For eventfd file descriptors (see
.BR eventfd (2)),
@@ -75,6 +76,7 @@ eventfd\-count: 40
.P
.I eventfd\-count
is the current value of the eventfd counter, in hexadecimal.
+.SS epoll
.P
For epoll file descriptors (see
.BR epoll (7)),
@@ -108,6 +110,7 @@ descriptor.
The
.I data
field is the data value associated with this file descriptor.
+.SS signalfd
.P
For signalfd file descriptors (see
.BR signalfd (2)),
@@ -133,6 +136,7 @@ and
.BR SIGQUIT ;
see
.BR signal (7).)
+.SS inotify
.P
For inotify file descriptors (see
.BR inotify (7)),
@@ -173,6 +177,7 @@ file is exposed as a file handle, via three hexadecimal fields:
.IR fhandle\-type ,
and
.IR f_handle .
+.SS fanotify
.P
For fanotify file descriptors (see
.BR fanotify (7)),
@@ -229,6 +234,7 @@ The mask of events that are ignored for this mark
.P
For details on these fields, see
.BR fanotify_mark (2).
+.SS timerfd
.P
For timerfd file descriptors (see
.BR timerfd (2)),
--
2.47.0.rc1.288.g06298d1525-goog

* [PATCH v2 3/3] proc_pid_fdinfo.5: Add DRM subsection
From: Ian Rogers @ 2024-10-15 21:17 UTC
To: Alejandro Colomar, G . Branden Robinson
Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc,
linux-kernel, linux-man, Ian Rogers

Add description of DRM fdinfo information based on the Linux kernel's
`Documentation/gpu/drm-usage-stats.rst`:
https://docs.kernel.org/gpu/drm-usage-stats.html

Signed-off-by: Ian Rogers <irogers@google.com>
---
man/man5/proc_pid_fdinfo.5 | 94 ++++++++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)

diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
index 02eceac04..bb6c07527 100644
--- a/man/man5/proc_pid_fdinfo.5
+++ b/man/man5/proc_pid_fdinfo.5
@@ -300,5 +300,99 @@ fields contain the values that
.BR timerfd_gettime (2)
on this file descriptor would return.)
.RE
+.SS Direct Rendering Manager
+.P
+DRM drivers can optionally choose to expose usage stats through
+/proc/pid/fdinfo/. For example:
+.P
+.in +4n
+.EX
+pos: 0
+flags: 02100002
+mnt_id: 26
+ino: 284
+drm-driver: i915
+drm-client-id: 39
+drm-pdev: 0000:00:02.0
+drm-total-system0: 6044 KiB
+drm-shared-system0: 0
+drm-active-system0: 0
+drm-resident-system0: 6044 KiB
+drm-purgeable-system0: 1688 KiB
+drm-total-stolen-system0: 0
+drm-shared-stolen-system0: 0
+drm-active-stolen-system0: 0
+drm-resident-stolen-system0: 0
+drm-purgeable-stolen-system0: 0
+drm-engine-render: 346249 ns
+drm-engine-copy: 0 ns
+drm-engine-video: 0 ns
+drm-engine-capacity-video: 2
+drm-engine-video-enhance: 0 ns
+.EE
+.TP
+.IR drm-driver: " .+ (mandatory)"
+The name this driver registered.
+.TP
+.IR drm-pdev: " <aaaa:bb:cc.d>"
+For PCI devices this should contain the PCI slot address of the device
+in question.
+.TP
+.IR drm-client-id: " [0-9]+"
+Unique value relating to the open DRM file descriptor used to
+distinguish duplicated and shared file descriptors.
+.P
+GPUs usually contain multiple execution engines. Each shall be given a
+stable and unique name (<engine_name>), with possible values
+documented in the driver specific documentation.
+.TP
+.IR drm-engine-<engine_name>: " [0-9]+ ns"
+GPU engine utilization, time spent busy executing workloads for this client.
+.TP
+.IR drm-engine-capacity-<engine_name>: " [0-9]+"
+Capacity of the engine if not 1, cannot be 0.
+.TP
+.IR drm-cycles-<engine_name>: " [0-9]+"
+Contains the number of busy cycles for the given engine. Values are
+not required to be constantly monotonic, but are required to catch up
+with the previously reported larger value within a reasonable
+period. Upon observing a value lower than what was previously read,
+userspace is expected to stay with that larger previous value until a
+monotonic update is seen.
+.TP
+.IR drm-total-cycles-<engine_name>: " [0-9]+"
+Contains the total number cycles for the given engine. This is a
+timestamp in GPU unspecified unit that matches the update rate of
+drm-cycles-<engine_name>. For drivers that implement this interface,
+the engine utilization can be calculated entirely on the GPU clock
+domain, without considering the CPU sleep time between 2 samples.
+.P
+Each possible memory type which can be used to store buffer objects by
+the GPU in question shall be given a stable and unique name <region>.
+The name "memory" is reserved to refer to normal system memory.
+.TP
+.IR drm-memory-<region>: " [0-9]+ [KiB|MiB]"
+The amount of storage currently consumed by the buffer objects belong
+to this client, in the respective memory region.
+.IP
+Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
+indicating kibi- or mebi-bytes.
+.TP
+.IR drm-shared-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are shared with another file (e.g., have more
+than a single handle).
+.TP
+.IR drm-total-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that including shared and private memory.
+.TP
+.IR drm-resident-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are resident in the specified region.
+.TP
+.IR drm-purgeable-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are purgeable.
+.TP
+.IR drm-active-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are active on one or more engines.
+
.SH SEE ALSO
.BR proc (5)
--
2.47.0.rc1.288.g06298d1525-goog
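
The `drm-engine-<engine_name>` values are cumulative busy times, so a monitoring tool has to sample the file twice and work with the difference. The Python sketch below illustrates one way that could look. It is not taken from the patch or from any existing tool: the function names are hypothetical, the caller is assumed to supply the wall-clock interval between the two samples, and dividing by `drm-engine-capacity-<engine_name>` to normalize an engine group is an assumption about how a consumer might want to present the number.

```python
import re

# Matches numeric drm-* lines such as "drm-engine-render: 346249 ns" or
# "drm-total-system0: 6044 KiB". Non-numeric values (drm-driver, drm-pdev)
# do not match and are ignored by this sketch.
_DRM_LINE = re.compile(r"^(drm-[\w-]+):\s*(\d+)(?:\s+(\S+))?\s*$")

# Memory sizes default to bytes; 'KiB' and 'MiB' are the optional specifiers.
_UNIT_SCALE = {None: 1, "ns": 1, "KiB": 1024, "MiB": 1024 * 1024}


def parse_drm_numeric(text):
    """Return {key: int} for numeric drm-* lines (memory sizes in bytes)."""
    out = {}
    for line in text.splitlines():
        m = _DRM_LINE.match(line)
        if m and m.group(3) in _UNIT_SCALE:
            out[m.group(1)] = int(m.group(2)) * _UNIT_SCALE[m.group(3)]
    return out


def engine_utilization(prev, curr, engine, wall_ns):
    """Busy fraction of `engine` for this client between two samples.

    `wall_ns` is the elapsed time between the samples, measured by the
    caller. Normalizing by the engine capacity is this sketch's assumption.
    """
    key = f"drm-engine-{engine}"
    capacity = curr.get(f"drm-engine-capacity-{engine}", 1)
    busy_ns = max(curr[key] - prev[key], 0)
    return busy_ns / (wall_ns * capacity)


def clamp_monotonic(previous, observed):
    """Keep the larger earlier value until the counter catches up.

    This follows the rule stated for drm-cycles-<engine_name>.
    """
    return max(previous, observed)


if __name__ == "__main__":
    first = parse_drm_numeric("drm-engine-render: 1000 ns\n")
    second = parse_drm_numeric("drm-engine-render: 501000 ns\n")
    print(engine_utilization(first, second, "render", wall_ns=1_000_000))
```

For drivers that export `drm-cycles-<engine_name>` and `drm-total-cycles-<engine_name>`, the same difference-of-two-samples idea applies with the total-cycles delta as the denominator, so no CPU-side clock is needed.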

* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
From: Alejandro Colomar @ 2024-11-01 13:24 UTC
To: Ian Rogers
Cc: G . Branden Robinson, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man

On Tue, Oct 15, 2024 at 02:17:17PM -0700, Ian Rogers wrote:
> When /proc/pid/fdinfo was part of proc.5 man page the indentation made
> sense. As a standalone man page the indentation doesn't need to be so
> far over to the right. Remove the initial tagged paragraph and move the
> styling to the initial summary description.
>
> Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
> 1 file changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
> index 1e23bbe02..8678caf4a 100644
> --- a/man/man5/proc_pid_fdinfo.5
> +++ b/man/man5/proc_pid_fdinfo.5
> @@ -6,20 +6,19 @@
> .\"
> .TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
> .SH NAME
> -/proc/pid/fdinfo/ \- information about file descriptors
> +.IR /proc/ pid /fdinfo " \- information about file descriptors"

I wouldn't add formatting here for now. That's something I prefer to be
cautious about, and if we do it, we should do it in a separate commit.

> .SH DESCRIPTION
> -.TP
> -.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
> -This is a subdirectory containing one entry for each file which the
> -process has open, named by its file descriptor.
> -The files in this directory are readable only by the owner of the process.
> -The contents of each file can be read to obtain information
> -about the corresponding file descriptor.
> -The content depends on the type of file referred to by the
> -corresponding file descriptor.
> -.IP
> +Since Linux 2.6.22,

You could move this information to a HISTORY section.

> +this subdirectory contains one entry for each file that process
> +.I pid
> +has open, named by its file descriptor. The files in this directory

Please don't reflow existing text. Please read about semantic newlines
in man-pages(7):

    $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
    Use semantic newlines
        In the source of a manual page, new sentences should be started
        on new lines, long sentences should be split into lines at clause
        breaks (commas, semicolons, colons, and so on), and long clauses
        should be split at phrase boundaries. This convention, sometimes
        known as "semantic newlines", makes it easier to see the effect
        of patches, which often operate at the level of individual
        sentences, clauses, or phrases.

Have a lovely day!
Alex

> +are readable only by the owner of the process. The contents of each
> +file can be read to obtain information about the corresponding file
> +descriptor. The content depends on the type of file referred to by
> +the corresponding file descriptor.
> +.P
> For regular files and directories, we see something like:
> -.IP
> +.P
> .in +4n
> .EX
> .RB "$" " cat /proc/12015/fdinfo/4"
> @@ -28,7 +27,7 @@ flags: 01002002
> mnt_id: 21
> .EE
> .in
> -.IP
> +.P
> The fields are as follows:
> .RS
> .TP
> @@ -51,7 +50,6 @@ this field incorrectly displayed the setting of
> at the time the file was opened,
> rather than the current setting of the close-on-exec flag.
> .TP
> -.I
> .I mnt_id
> This field, present since Linux 3.15,
> .\" commit 49d063cb353265c3af701bab215ac438ca7df36d
> @@ -59,13 +57,13 @@ is the ID of the mount containing this file.
> See the description of
> .IR /proc/ pid /mountinfo .
> .RE
> -.IP
> +.P
> For eventfd file descriptors (see
> .BR eventfd (2)),
> we see (since Linux 3.8)
> .\" commit cbac5542d48127b546a23d816380a7926eee1c25
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -74,16 +72,16 @@ mnt_id: 10
> eventfd\-count: 40
> .EE
> .in
> -.IP
> +.P
> .I eventfd\-count
> is the current value of the eventfd counter, in hexadecimal.
> -.IP
> +.P
> For epoll file descriptors (see
> .BR epoll (7)),
> we see (since Linux 3.8)
> .\" commit 138d22b58696c506799f8de759804083ff9effae
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -93,7 +91,7 @@ tfd: 9 events: 19 data: 74253d2500000009
> tfd: 7 events: 19 data: 74253d2500000007
> .EE
> .in
> -.IP
> +.P
> Each of the lines beginning
> .I tfd
> describes one of the file descriptors being monitored via
> @@ -110,13 +108,13 @@ descriptor.
> The
> .I data
> field is the data value associated with this file descriptor.
> -.IP
> +.P
> For signalfd file descriptors (see
> .BR signalfd (2)),
> we see (since Linux 3.8)
> .\" commit 138d22b58696c506799f8de759804083ff9effae
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -125,7 +123,7 @@ mnt_id: 10
> sigmask: 0000000000000006
> .EE
> .in
> -.IP
> +.P
> .I sigmask
> is the hexadecimal mask of signals that are accepted via this
> signalfd file descriptor.
> @@ -135,12 +133,12 @@ and
> .BR SIGQUIT ;
> see
> .BR signal (7).)
> -.IP
> +.P
> For inotify file descriptors (see
> .BR inotify (7)),
> we see (since Linux 3.8)
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -150,7 +148,7 @@ inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8
> inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:27261900802dfd73
> .EE
> .in
> -.IP
> +.P
> Each of the lines beginning with "inotify" displays information about
> one file or directory that is being monitored.
> The fields in this line are as follows:
> @@ -168,19 +166,19 @@ The ID of the device where the target file resides (in hexadecimal).
> .I mask
> The mask of events being monitored for the target file (in hexadecimal).
> .RE
> -.IP
> +.P
> If the kernel was built with exportfs support, the path to the target
> file is exposed as a file handle, via three hexadecimal fields:
> .IR fhandle\-bytes ,
> .IR fhandle\-type ,
> and
> .IR f_handle .
> -.IP
> +.P
> For fanotify file descriptors (see
> .BR fanotify (7)),
> we see (since Linux 3.8)
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -190,7 +188,7 @@ fanotify flags:0 event\-flags:88002
> fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:4f261900a82dfd73
> .EE
> .in
> -.IP
> +.P
> The fourth line displays information defined when the fanotify group
> was created via
> .BR fanotify_init (2):
> @@ -210,7 +208,7 @@ argument given to
> .BR fanotify_init (2)
> (expressed in hexadecimal).
> .RE
> -.IP
> +.P
> Each additional line shown in the file contains information
> about one of the marks in the fanotify group.
> Most of these fields are as for inotify, except:
> @@ -228,16 +226,16 @@ The events mask for this mark
> The mask of events that are ignored for this mark
> (expressed in hexadecimal).
> .RE
> -.IP
> +.P
> For details on these fields, see
> .BR fanotify_mark (2).
> -.IP
> +.P
> For timerfd file descriptors (see
> .BR timerfd (2)),
> we see (since Linux 3.17)
> .\" commit af9c4957cf212ad9cf0bee34c95cb11de5426e85
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> --
> 2.47.0.rc1.288.g06298d1525-goog
>

--
<https://www.alejandro-colomar.es/>

* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
From: Ian Rogers @ 2024-11-01 18:19 UTC
To: Alejandro Colomar
Cc: G . Branden Robinson, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man

On Fri, Nov 1, 2024 at 6:24 AM Alejandro Colomar <alx@kernel.org> wrote:
>
> On Tue, Oct 15, 2024 at 02:17:17PM -0700, Ian Rogers wrote:
> > When /proc/pid/fdinfo was part of proc.5 man page the indentation made
> > sense. As a standalone man page the indentation doesn't need to be so
> > far over to the right. Remove the initial tagged paragraph and move the
> > styling to the initial summary description.
> >
> > Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
> > 1 file changed, 32 insertions(+), 34 deletions(-)
> >
> > diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
> > index 1e23bbe02..8678caf4a 100644
> > --- a/man/man5/proc_pid_fdinfo.5
> > +++ b/man/man5/proc_pid_fdinfo.5
> > @@ -6,20 +6,19 @@
> > .\"
> > .TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
> > .SH NAME
> > -/proc/pid/fdinfo/ \- information about file descriptors
> > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
>
> I wouldn't add formatting here for now. That's something I prefer to be
> cautious about, and if we do it, we should do it in a separate commit.

I'll move it to a separate patch. Is the caution due to a lack of test
infrastructure? That could be something to get resolved, perhaps
through Google summer-of-code and the like.

> > .SH DESCRIPTION
> > -.TP
> > -.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
> > -This is a subdirectory containing one entry for each file which the
> > -process has open, named by its file descriptor.
> > -The files in this directory are readable only by the owner of the process.
> > -The contents of each file can be read to obtain information
> > -about the corresponding file descriptor.
> > -The content depends on the type of file referred to by the
> > -corresponding file descriptor.
> > -.IP
> > +Since Linux 2.6.22,
>
> You could move this information to a HISTORY section.

Sure, tbh I'm not sure anybody cares about this information and it
could be as well to delete it. Sorry people running 17 year old
kernels. For now I'll try to leave it unchanged.

> > +this subdirectory contains one entry for each file that process
> > +.I pid
> > +has open, named by its file descriptor. The files in this directory
>
> Please don't reflow existing text. Please read about semantic newlines
> in man-pages(7):
>
>     $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
>     Use semantic newlines
>         In the source of a manual page, new sentences should be started
>         on new lines, long sentences should be split into lines at clause
>         breaks (commas, semicolons, colons, and so on), and long clauses
>         should be split at phrase boundaries. This convention, sometimes
>         known as "semantic newlines", makes it easier to see the effect
>         of patches, which often operate at the level of individual
>         sentences, clauses, or phrases.

I'll update for v3 but I'm reminded of `git diff --word-diff=color` so
perhaps this recommendation is outdated.

Thanks,
Ian

> Have a lovely day!
> Alex
>
> > +are readable only by the owner of the process. The contents of each
> > +file can be read to obtain information about the corresponding file
> > +descriptor. The content depends on the type of file referred to by
> > +the corresponding file descriptor.
> > +.P
> > For regular files and directories, we see something like:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > .RB "$" " cat /proc/12015/fdinfo/4"
> > @@ -28,7 +27,7 @@ flags: 01002002
> > mnt_id: 21
> > .EE
> > .in
> > -.IP
> > +.P
> > The fields are as follows:
> > .RS
> > .TP
> > @@ -51,7 +50,6 @@ this field incorrectly displayed the setting of
> > at the time the file was opened,
> > rather than the current setting of the close-on-exec flag.
> > .TP
> > -.I
> > .I mnt_id
> > This field, present since Linux 3.15,
> > .\" commit 49d063cb353265c3af701bab215ac438ca7df36d
> > @@ -59,13 +57,13 @@ is the ID of the mount containing this file.
> > See the description of
> > .IR /proc/ pid /mountinfo .
> > .RE
> > -.IP
> > +.P
> > For eventfd file descriptors (see
> > .BR eventfd (2)),
> > we see (since Linux 3.8)
> > .\" commit cbac5542d48127b546a23d816380a7926eee1c25
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -74,16 +72,16 @@ mnt_id: 10
> > eventfd\-count: 40
> > .EE
> > .in
> > -.IP
> > +.P
> > .I eventfd\-count
> > is the current value of the eventfd counter, in hexadecimal.
> > -.IP
> > +.P
> > For epoll file descriptors (see
> > .BR epoll (7)),
> > we see (since Linux 3.8)
> > .\" commit 138d22b58696c506799f8de759804083ff9effae
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -93,7 +91,7 @@ tfd: 9 events: 19 data: 74253d2500000009
> > tfd: 7 events: 19 data: 74253d2500000007
> > .EE
> > .in
> > -.IP
> > +.P
> > Each of the lines beginning
> > .I tfd
> > describes one of the file descriptors being monitored via
> > @@ -110,13 +108,13 @@ descriptor.
> > The
> > .I data
> > field is the data value associated with this file descriptor.
> > -.IP
> > +.P
> > For signalfd file descriptors (see
> > .BR signalfd (2)),
> > we see (since Linux 3.8)
> > .\" commit 138d22b58696c506799f8de759804083ff9effae
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -125,7 +123,7 @@ mnt_id: 10
> > sigmask: 0000000000000006
> > .EE
> > .in
> > -.IP
> > +.P
> > .I sigmask
> > is the hexadecimal mask of signals that are accepted via this
> > signalfd file descriptor.
> > @@ -135,12 +133,12 @@ and
> > .BR SIGQUIT ;
> > see
> > .BR signal (7).)
> > -.IP
> > +.P
> > For inotify file descriptors (see
> > .BR inotify (7)),
> > we see (since Linux 3.8)
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -150,7 +148,7 @@ inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8
> > inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:27261900802dfd73
> > .EE
> > .in
> > -.IP
> > +.P
> > Each of the lines beginning with "inotify" displays information about
> > one file or directory that is being monitored.
> > The fields in this line are as follows:
> > @@ -168,19 +166,19 @@ The ID of the device where the target file resides (in hexadecimal).
> > .I mask
> > The mask of events being monitored for the target file (in hexadecimal).
> > .RE
> > -.IP
> > +.P
> > If the kernel was built with exportfs support, the path to the target
> > file is exposed as a file handle, via three hexadecimal fields:
> > .IR fhandle\-bytes ,
> > .IR fhandle\-type ,
> > and
> > .IR f_handle .
> > -.IP
> > +.P
> > For fanotify file descriptors (see
> > .BR fanotify (7)),
> > we see (since Linux 3.8)
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -190,7 +188,7 @@ fanotify flags:0 event\-flags:88002
> > fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:4f261900a82dfd73
> > .EE
> > .in
> > -.IP
> > +.P
> > The fourth line displays information defined when the fanotify group
> > was created via
> > .BR fanotify_init (2):
> > @@ -210,7 +208,7 @@ argument given to
> > .BR fanotify_init (2)
> > (expressed in hexadecimal).
> > .RE
> > -.IP
> > +.P
> > Each additional line shown in the file contains information
> > about one of the marks in the fanotify group.
> > Most of these fields are as for inotify, except:
> > @@ -228,16 +226,16 @@ The events mask for this mark
> > The mask of events that are ignored for this mark
> > (expressed in hexadecimal).
> > .RE
> > -.IP
> > +.P
> > For details on these fields, see
> > .BR fanotify_mark (2).
> > -.IP
> > +.P
> > For timerfd file descriptors (see
> > .BR timerfd (2)),
> > we see (since Linux 3.17)
> > .\" commit af9c4957cf212ad9cf0bee34c95cb11de5426e85
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > --
> > 2.47.0.rc1.288.g06298d1525-goog
> >
>
> --
> <https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-01 18:19 ` Ian Rogers @ 2024-11-01 20:07 ` Alejandro Colomar 2024-11-02 10:08 ` G. Branden Robinson 2024-11-02 23:10 ` [PATCH 0/3] Add mansect(1) program and manual page Alejandro Colomar 1 sibling, 1 reply; 35+ messages in thread From: Alejandro Colomar @ 2024-11-01 20:07 UTC (permalink / raw) To: Ian Rogers Cc: G . Branden Robinson, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man [-- Attachment #1: Type: text/plain, Size: 4319 bytes --] Hi Ian, On Fri, Nov 01, 2024 at 11:19:18AM -0700, Ian Rogers wrote: > On Fri, Nov 1, 2024 at 6:24 AM Alejandro Colomar <alx@kernel.org> wrote: > > > > On Tue, Oct 15, 2024 at 02:17:17PM -0700, Ian Rogers wrote: > > > When /proc/pid/fdinfo was part of proc.5 man page the indentation made > > > sense. As a standalone man page the indentation doesn't need to be so > > > far over to the right. Remove the initial tagged pragraph and move the > > > styling to the initial summary description. > > > > > > Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com> > > > Signed-off-by: Ian Rogers <irogers@google.com> > > > --- > > > man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++-------------------- > > > 1 file changed, 32 insertions(+), 34 deletions(-) > > > > > > diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5 > > > index 1e23bbe02..8678caf4a 100644 > > > --- a/man/man5/proc_pid_fdinfo.5 > > > +++ b/man/man5/proc_pid_fdinfo.5 > > > @@ -6,20 +6,19 @@ > > > .\" > > > .TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)" > > > .SH NAME > > > -/proc/pid/fdinfo/ \- information about file descriptors > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors" > > > > I wouldn't add formatting here for now. That's something I prefer to be > > cautious about, and if we do it, we should do it in a separate commit. 
> > I'll move it to a separate patch. Is the caution due to a lack of test > infrastructure? That could be something to get resolved, perhaps > through Google summer-of-code and the like. That change might be controversial. We'd first need to check that all software that reads the NAME section would behave well for this. Also, many other pages might need to be changed accordingly for consistency. For testing infrastructure I think we're good. The makefile already does a lot of testing. > > > > .SH DESCRIPTION > > > -.TP > > > -.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)" > > > -This is a subdirectory containing one entry for each file which the > > > -process has open, named by its file descriptor. > > > -The files in this directory are readable only by the owner of the process. > > > -The contents of each file can be read to obtain information > > > -about the corresponding file descriptor. > > > -The content depends on the type of file referred to by the > > > -corresponding file descriptor. > > > -.IP > > > +Since Linux 2.6.22, > > > > You could move this information to a HISTORY section. > > Sure, tbh I'm not sure anybody cares about this information and it > could be as well to delete it. Sorry people running 17 year old > kernels. For now I'll try to leave it unchanged. I would like to keep it in HISTORY. You never know when it'll be useful and it's just one line or a few; it won't hurt. > > > > +this subdirectory contains one entry for each file that process > > > +.I pid > > > +has open, named by its file descriptor. The files in this directory > > > > Please don't reflow existing text. 
Please read about semantic newlines > > in man-pages(7): > > > > $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p' > > Use semantic newlines > > In the source of a manual page, new sentences should be started > > on new lines, long sentences should be split into lines at clause > > breaks (commas, semicolons, colons, and so on), and long clauses > > should be split at phrase boundaries. This convention, sometimes > > known as "semantic newlines", makes it easier to see the effect > > of patches, which often operate at the level of individual sen‐ > > tences, clauses, or phrases. > > I'll update for v3 but I'm reminded of `git diff --word-diff=color` so > perhaps this recommendation is outdated. No, this isn't outdated, since that reduces the quality of the diff. Also, I review a lot of patches in the mail client, without running git(1). And it's not just for reviewing diffs, but also for writing them. Semantic newlines reduce the amount of work for producing the diffs. And lastly, the source code reads much better if it's logically divided in phrases. > > Thanks, > Ian -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
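The diff-quality argument for semantic newlines can be checked with a small experiment. The following is only a sketch under stated assumptions: the sample sentences, the file names, and the `changed` helper are made up for illustration, and `fmt -w 60` stands in for an author who re-wraps the paragraph after every edit. It applies the same one-phrase edit to both layouts and counts the lines that diff(1) reports as changed.

```shell
#!/bin/sh
# Sketch: the same edit, diffed under two source layouts.
# All sample text and file names below are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Layout 1: semantic newlines (one sentence or clause per line).
cat >"$tmp/sem_old" <<'EOF'
This subdirectory contains one entry for each open file,
named by its file descriptor.
The files in this directory are readable only by the owner of the process.
EOF

# The edit: reword one phrase in the first clause.
sed 's/each open file/each file that the process has open/' \
	"$tmp/sem_old" >"$tmp/sem_new"

# Layout 2: the same two texts, but filled and re-wrapped to 60 columns.
fmt -w 60 "$tmp/sem_old" >"$tmp/fill_old"
fmt -w 60 "$tmp/sem_new" >"$tmp/fill_new"

# changed OLD NEW: count removed or added lines in plain diff output.
changed() {
	diff "$1" "$2" | grep -c '^[<>]'
}

echo "semantic newlines: $(changed "$tmp/sem_old" "$tmp/sem_new") changed lines"
echo "re-wrapped text:   $(changed "$tmp/fill_old" "$tmp/fill_new") changed lines"
```

With semantic newlines the edit touches exactly one source line, so the diff shows one removed and one added line. The count for the re-wrapped variant depends on where fmt(1) breaks the lines, so no figure is claimed for it here.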
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-01 20:07 ` Alejandro Colomar @ 2024-11-02 10:08 ` G. Branden Robinson 2024-11-02 10:39 ` Alejandro Colomar 2024-11-02 19:06 ` Colin Watson 0 siblings, 2 replies; 35+ messages in thread From: G. Branden Robinson @ 2024-11-02 10:08 UTC (permalink / raw) To: Alejandro Colomar Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, cjwatson, groff [-- Attachment #1: Type: text/plain, Size: 8067 bytes --] [adding Colin Watson to CC; and the groff list because I started musing] Hi Alex, At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote: > > > > -/proc/pid/fdinfo/ \- information about file descriptors > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors" > > > > > > I wouldn't add formatting here for now. That's something I prefer > > > to be cautious about, and if we do it, we should do it in a > > > separate commit. > > > > I'll move it to a separate patch. Is the caution due to a lack of > > test infrastructure? That could be something to get resolved, > > perhaps through Google summer-of-code and the like. > > That change might be controversial. Then let those with objections step forward and make them! (I may be one of them; see below.) > We'd first need to check that all software that reads the NAME section > would behave well for this. Not _all_ software, surely. Anybody can write a craptastic man(7) scraper, and several have, mainly back when Web 1.0 was going to eat the world. Most of those have withered on the vine. This is the _Linux_ man-pages project, so what matters are (1) man page formatters and (2) man page indexers that GNU/Linux systems actually use. Where people get nervous with the "NAME" section is because of the indexer; if one's man(7) _formatter_ can't handle an `IR` call, it hasn't earned the name. Here's a sample input. 
$ cat /tmp/proc_pid_fdinfo_mini.5 .TH proc_pid_fdinfo_mini 5 2024-11-02 "example" .SH Name .IR /proc/ pid /fdinfo " \- information about file descriptors" .SH Description Text text text text. Starting with formatters, let's see how they do. $ nroff -man /tmp/proc_pid_fdinfo_mini.5 proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5) Name /proc/pid/fdinfo - information about file descriptors Description Text text text text. example 2024‐11‐02 proc_pid_fdinfo_mini(5) $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5) Name /proc/pid/fdinfo - information about file descriptors Description Text text text text. example 2024-11-02 proc_pid_fdinfo_mini(5) $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5) Name /proc/pid/fdinfo - information about file descriptors Description Text text text text. example 2024-11-02 proc_pid_fdinfo_mini(5) $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5) Name /proc/pid/fdinfo - information about file descriptors Description Text text text text. Page 1 (printed 11/2/2024) I leave the execution of these to perceive the correct font style changes as an exercise for the reader, but they all get the "/proc/pid/fdinfo" line right. On GNU/Linux systems, the only man page indexer I know of is Colin Watson's man-db--specifically, its mandb(8) program. But it's nicely designed so that the "topic and summary description extraction" task is delegated to a standalone tool, lexgrog(1), and we can use that. $ lexgrog /tmp/proc_pid_fdinfo_mini.5 /tmp/proc_pid_fdinfo_mini.5: parse failed Oh, damn. I wasn't expecting that. 
Maybe this is what defeats Michael Kerrisk's scraper with respect to groff's man pages.[1] Well, I can find a silver lining here, because it gives me an even better reason than I had to pitch an idea I've been kicking around for a while. Why not enhance groff man(7) to support a mode where _it_ will spit out the "Name"/"NAME" section, and only that, _for_ you? This would be as easy as checking for an option, say '-d EXTRACT=Name', and having the package's "TH" and "SH" macro definitions divert (literally, with the `di` request) everything _except_ the section of interest to a diversion that is then never called/output. (This is similar to an m4 feature known as the "black hole diversion".) All of the features necessary to implement this[2] were part of troff as far as back as the birth of the man(7) package itself. It's not clear to me why it wasn't done back in the 1980s. lexgrog(1) itself will of course have to stay around for years to come, but this could take a significant distraction off of Colin's plate--I believe I have seen him grumble about how much *roff syntax he has to parse to have the feature be workable, and that's without upstart groff maintainers exploring up to every boundary that existed even in 1979 and cheerfully exercising their findings in man pages. I also of course have ideas for generalizing the feature, so that you can request any (sub)section by name, and, with a bit more ambition,[4] paragraph tags (`TP`) too. So you could do things like: nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3 and: nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8 ...does this sound appetizing to anyone? > Also, many other pages might need to be changed accordingly for > consistency. I withdraw the suggestion until lexgrog(1) flexes its own muscles, or has groff(1) do the lifting. I'm sorry for prompting churn, Ian. > No, this isn't outdated, since that reduces the quality of the diff. 
> Also, I review a lot of patches in the mail client, without running > git(1). And it's not just for reviewing diffs, but also for writing > them. Semantic newlines reduce the amount of work for producing the > diffs. It's a real win for diffs. Here's a very recent example from groff. diff --git a/man/groff.7.man b/man/groff.7.man index 1fb635f2b..1d248b237 100644 --- a/man/groff.7.man +++ b/man/groff.7.man @@ -1281,6 +1281,7 @@ .SH Identifiers typeface, color, special character or character class, +hyphenation language code, environment, or stream. . (So recent that in fact I haven't pushed that yet.) Lists like the foregoing are common in man pages. Regards, Branden [1] https://man7.org/linux/man-pages/dir_by_project.html#groff [2] String definitions, "string comparisons"[3], and diversions. [3] strictly, "formatted output comparisons" https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html You can do stricter string comparisons in GNU troff. And I've thought of some syntactic sugar for performing them that wouldn't break backward compatibility. [4] To really land the feature, we need automatic tag generation from input text (we don't want to make the man page author construct their own tags). Another reason we want the construction to be automatic is to make the tags unique when multiple man pages are formatted in one run, as one might do when making a book of man pages. Automatic tagging will also enable the slaying of two other ancient dragons. 1. deep internal links for PDF bookmarks 2. pod2man's `IX`-happy output; the widespread use of this nonstandard macro confuses way too many novice page authors, and bloats document size. Another feature we'll really want to do this right is improved string processing facilities. That, too, is something that will pay dividends in several areas. 
With a proper string iterator in the formatter (and a couple more conditional operators),[5] it will be possible to write a string library as a macro file, slimming down the formatter itself a little and making macro writers' lives easier. We're only two days into the month and this has already come up on the groff list. https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html [5] https://savannah.gnu.org/bugs/?62264 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-02 10:08 ` G. Branden Robinson @ 2024-11-02 10:39 ` Alejandro Colomar 2024-11-02 21:36 ` Alejandro Colomar 2024-11-03 4:05 ` G. Branden Robinson 2024-11-02 19:06 ` Colin Watson 1 sibling, 2 replies; 35+ messages in thread From: Alejandro Colomar @ 2024-11-02 10:39 UTC (permalink / raw) To: G. Branden Robinson Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, cjwatson, groff [-- Attachment #1: Type: text/plain, Size: 12063 bytes --] Hi Branden, On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote: > [adding Colin Watson to CC; and the groff list because I started musing] > > Hi Alex, > > At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote: > > > > > -/proc/pid/fdinfo/ \- information about file descriptors > > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors" > > > > > > > > I wouldn't add formatting here for now. That's something I prefer > > > > to be cautious about, and if we do it, we should do it in a > > > > separate commit. > > > > > > I'll move it to a separate patch. Is the caution due to a lack of > > > test infrastructure? That could be something to get resolved, > > > perhaps through Google summer-of-code and the like. > > > > That change might be controversial. > > Then let those with objections step forward and make them! Sure! But that in itself (and the length of your mail) makes a strong reason to have this in a separate commit. :) I'm not opposed to the change. Only cautious. > > (I may be one of them; see below.) > > > We'd first need to check that all software that reads the NAME section > > would behave well for this. > > Not _all_ software, surely. Anybody can write a craptastic man(7) > scraper, and several have, mainly back when Web 1.0 was going to eat the > world. Most of those have withered on the vine. 
Ahh, yeah, I committed the same mistake I criticise in others every now and then. $all does not really mean "all". (-Wall, `make all`, ...) I meant all [of which I care], which is basically groff(1) and mandoc(1). :) > This is the _Linux_ man-pages project, so what matters are (1) man page > formatters and (2) man page indexers that GNU/Linux systems actually > use. Where people get nervous with the "NAME" section is because of the > indexer; if one's man(7) _formatter_ can't handle an `IR` call, it > hasn't earned the name. Yup. > > Here's a sample input. > > $ cat /tmp/proc_pid_fdinfo_mini.5 > .TH proc_pid_fdinfo_mini 5 2024-11-02 "example" > .SH Name > .IR /proc/ pid /fdinfo " \- information about file descriptors" > .SH Description > Text text text text. > > Starting with formatters, let's see how they do. > > $ nroff -man /tmp/proc_pid_fdinfo_mini.5 > proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5) > > Name > /proc/pid/fdinfo - information about file descriptors > > Description > Text text text text. > > example 2024‐11‐02 proc_pid_fdinfo_mini(5) > $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul > proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5) > > Name > /proc/pid/fdinfo - information about file descriptors > > Description > Text text text text. > > example 2024-11-02 proc_pid_fdinfo_mini(5) > $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul > proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5) > > > > Name > /proc/pid/fdinfo - information about file descriptors > > Description > Text text text text. > > > > example 2024-11-02 proc_pid_fdinfo_mini(5) > $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul > > proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5) > > Name > /proc/pid/fdinfo - information about file descriptors > > Description > Text text text text. 
> > Page 1 (printed 11/2/2024) > > I leave the execution of these to perceive the correct font style > changes as an exercise for the reader, but they all get the > "/proc/pid/fdinfo" line right. > > On GNU/Linux systems, the only man page indexer I know of is Colin > Watson's man-db--specifically, its mandb(8) program. But it's nicely > designed so that the "topic and summary description extraction" task is > delegated to a standalone tool, lexgrog(1), and we can use that. > > $ lexgrog /tmp/proc_pid_fdinfo_mini.5 > /tmp/proc_pid_fdinfo_mini.5: parse failed > > Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael > Kerrisk's scraper with respect to groff's man pages.[1] > > Well, I can find a silver lining here, because it gives me an even > better reason than I had to pitch an idea I've been kicking around for a > while. Why not enhance groff man(7) to support a mode where _it_ will > spit out the "Name"/"NAME" section, and only that, _for_ you? > > This would be as easy as checking for an option, say '-d EXTRACT=Name', > and having the package's "TH" and "SH" macro definitions divert > (literally, with the `di` request) everything _except_ the section of > interest to a diversion that is then never called/output. (This is > similar to an m4 feature known as the "black hole diversion".) Sounds good. And then lexgrog(1) would be a one-liner that calls groff(1) with the appropriate flag, right? > All of the features necessary to implement this[2] were part of troff as > far as back as the birth of the man(7) package itself. It's not clear > to me why it wasn't done back in the 1980s. Not enough energy of activation, probably, as with most stuff. > lexgrog(1) itself will of course have to stay around for years to come, You can make it a wrapper around groff(1) with flags, no? 
> but this could take a significant distraction off of Colin's plate--I > believe I have seen him grumble about how much *roff syntax he has to > parse to have the feature be workable, and that's without upstart groff > maintainers exploring up to every boundary that existed even in 1979 and > cheerfully exercising their findings in man pages. > > I also of course have ideas for generalizing the feature, so that you > can request any (sub)section by name, and, with a bit more ambition,[4] > paragraph tags (`TP`) too. > > So you could do things like: > > nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3 I certainly use this. # man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS, # ...) of all manual pages in a directory (or in a single manual page file). # Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO'; man_section() { if [ $# -lt 2 ]; then >&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>..."; return $EX_USAGE; fi local page="$1"; shift; local sect="$*"; find "$page" -type f \ |xargs wc -l \ |grep -v -e '\b1 ' -e '\btotal\b' \ |awk '{ print $2 }' \ |sort \ |while read -r manpage; do (sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage"; for s in $sect; do <"$manpage" \ sed -n \ -e "/^\.SH $s/p" \ -e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}"; done;) \ |mandoc -Tutf8 2>/dev/null \ |col -pbx; done; } # man_lsfunc() prints the name of all C functions declared in the SYNOPSIS # of all manual pages in a directory (or in a single manual page file). 
# Each name is printed in a separate line # Usage example: .../man-pages$ man_lsfunc man2; man_lsfunc() { if [ $# -lt 1 ]; then >&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>..."; return $EX_USAGE; fi for arg in "$@"; do man_section "$arg" 'SYNOPSIS'; done \ |sed_rm_ccomments \ |pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \ |grep '^[0-9]' \ |sed -E 's/syscall\(SYS_(\w*),?/\1(/' \ |sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \ |uniq; } # man_lsvar() prints the name of all C variables declared in the SYNOPSIS # of all manual pages in a directory (or in a single manual page file). # Each name is printed in a separate line # Usage example: .../man-pages$ man_lsvar man3; man_lsvar() { if [ $# -lt 1 ]; then >&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>..."; return $EX_USAGE; fi for arg in "$@"; do man_section "$arg" 'SYNOPSIS'; done \ |sed_rm_ccomments \ |pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \ |pcregrep -Mn \ -e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \ -e '^ +extern [\w ]+ \**[\w ]+; *$' \ |grep '^[0-9]' \ |grep -v 'typedef' \ |sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \ |sed 's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \ |uniq; } Even grepc(1) derived from those scripts. > > and: > > nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8 While I haven't used this yet, it's probably because it's quite complex to implement with regexes, not because it wouldn't be useful. > > ...does this sound appetizing to anyone? Certainly. > > Also, many other pages might need to be changed accordingly for > > consistency. > > I withdraw the suggestion until lexgrog(1) flexes its own muscles, or > has groff(1) do the lifting. I'm sorry for prompting churn, Ian. > > > No, this isn't outdated, since that reduces the quality of the diff. > > Also, I review a lot of patches in the mail client, without running > > git(1). And it's not just for reviewing diffs, but also for writing > > them. 
Semantic newlines reduce the amount of work for producing the > > diffs. > > It's a real win for diffs. And diffs are a real win for text. Thus, semantic newlines are a real win for text. "Write poems, not prose." (Any chance we may get that warning added to groff(1)? :D) Cheers, Alex > > Here's a very recent example from groff. > > diff --git a/man/groff.7.man b/man/groff.7.man > index 1fb635f2b..1d248b237 100644 > --- a/man/groff.7.man > +++ b/man/groff.7.man > @@ -1281,6 +1281,7 @@ .SH Identifiers > typeface, > color, > special character or character class, > +hyphenation language code, > environment, > or stream. > . > > > (So recent that in fact I haven't pushed that yet.) > > Lists like the foregoing are common in man pages. > > Regards, > Branden > > [1] https://man7.org/linux/man-pages/dir_by_project.html#groff > [2] String definitions, "string comparisons"[3], and diversions. > [3] strictly, "formatted output comparisons" > > https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html > > You can do stricter string comparisons in GNU troff. And I've > thought of some syntactic sugar for performing them that wouldn't > break backward compatibility. > > [4] To really land the feature, we need automatic tag generation from > input text (we don't want to make the man page author construct > their own tags). Another reason we want the construction to be > automatic is to make the tags unique when multiple man pages are > formatted in one run, as one might do when making a book of man > pages. Automatic tagging will also enable the slaying of two other > ancient dragons. > > 1. deep internal links for PDF bookmarks > 2. pod2man's `IX`-happy output; the widespread use of this > nonstandard macro confuses way too many novice page authors, and > bloats document size. > > Another feature we'll really want to do this right is improved string > processing facilities. That, too, is something that will pay > dividends in several areas. 
With a proper string iterator in the > formatter (and a couple more conditional operators),[5] it will be > possible to write a string library as a macro file, slimming down the > formatter itself a little and making macro writers' lives easier. > We're only two days into the month and this has already come up on > the groff list. > > https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html > > [5] https://savannah.gnu.org/bugs/?62264 -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 10:39 ` Alejandro Colomar
@ 2024-11-02 21:36 ` Alejandro Colomar
2024-11-02 23:47 ` Colin Watson
2024-11-03 4:05 ` G. Branden Robinson
1 sibling, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 21:36 UTC (permalink / raw)
To: G. Branden Robinson, cjwatson
Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, cjwatson, groff

[-- Attachment #1: Type: text/plain, Size: 4844 bytes --]

Hi Branden, Colin,

On Sat, Nov 02, 2024 at 11:40:13AM +0100, Alejandro Colomar wrote:
> > I also of course have ideas for generalizing the feature, so that you
> > can request any (sub)section by name, and, with a bit more ambition,[4]
> > paragraph tags (`TP`) too.
> >
> > So you could do things like:
> >
> >	nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3
>
> I certainly use this.
>
> # man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
> # ...) of all manual pages in a directory (or in a single manual page file).
> # Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';
>
> man_section()
> {
>	if [ $# -lt 2 ]; then
>		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
>		return $EX_USAGE;
>	fi
>
>	local page="$1";
>	shift;
>	local sect="$*";
>
>	find "$page" -type f \
>	|xargs wc -l \
>	|grep -v -e '\b1 ' -e '\btotal\b' \
>	|awk '{ print $2 }' \
>	|sort \
>	|while read -r manpage; do
>		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
>		for s in $sect; do
>			<"$manpage" \
>			sed -n \
>				-e "/^\.SH $s/p" \
>				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
>		done;) \
>		|mandoc -Tutf8 2>/dev/null \
>		|col -pbx;
>	done;
> }

On the other hand, you may want to just package this small shell script
(or rather a part of it) as a program. How about this?

	$ cat /usr/local/bin/mansect
	#!/bin/sh

	if [ $# -lt 1 ]; then
		>&2 echo "Usage: $0 SECTION [FILE ...]";
		exit 1;
	fi

	s="$1";
	shift;

	if test -z "$*"; then
		sed -n \
			-e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
			-e '/^\.SH '"$s"'$/p' \
			-e '/^\.SH '"$s"'$/,/^\.SH/{/^\.SH/!p}' \
			;
	else
		find "$@" -not -type d \
		| xargs wc -l \
		| sed '${/ total$/d}' \
		| grep -v '\b1 ' \
		| awk '{ print $2 }' \
		| xargs -L1 sed -n \
			-e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
			-e '/^\.SH '"$s"'$/p' \
			-e '/^\.SH '"$s"'$/,/^\.SH/{/^\.SH/!p}' \
			;
	fi;

This only filters the source of the page, producing output that's
suitable for the groff pipeline.

	alx@devuan:~$ man -w proc | xargs cat | mansect NAME
	.TH proc 5 2024-06-15 "Linux man-pages 6.9.1-158-g2ac94c631"
	.SH NAME
	proc \- process information, system information, and sysctl pseudo-filesystem

	alx@devuan:~$ man -w strtol strtoul | xargs mansect 'NAME'
	.TH strtol 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
	.SH NAME
	strtol, strtoll, strtoq \- convert a string to a long integer
	.TH strtoul 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
	.SH NAME
	strtoul, strtoull, strtouq \- convert a string to an unsigned long integer

You can request several sections with a regex:

	$ man -w strtol strtoul | xargs mansect '\(NAME\|SEE ALSO\)'
	.TH strtol 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
	.SH NAME
	strtol, strtoll, strtoq \- convert a string to a long integer
	.SH SEE ALSO
	.BR atof (3),
	.BR atoi (3),
	.BR atol (3),
	.BR strtod (3),
	.BR strtoimax (3),
	.BR strtoul (3)
	.TH strtoul 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
	.SH NAME
	strtoul, strtoull, strtouq \- convert a string to an unsigned long integer
	.SH SEE ALSO
	.BR a64l (3),
	.BR atof (3),
	.BR atoi (3),
	.BR atol (3),
	.BR strtod (3),
	.BR strtol (3),
	.BR strtoumax (3)

And it can then be piped to groff(1) to format the entire set of pages:

	$ man -w strtol strtoul | xargs mansect '\(NAME\|SEE ALSO\)' | groff -man -Tutf8
	strtol(3) Library Functions Manual strtol(3)

	NAME
	       strtol, strtoll, strtoq - convert a string to a long integer

	SEE ALSO
	       atof(3), atoi(3), atol(3), strtod(3), strtoimax(3), strtoul(3)

	Linux man‐pages 6.9.1‐158‐g2ac... 2024‐07‐23 strtol(3)
	───────────────────────────────────────────────────────────────────────────────
	strtoul(3) Library Functions Manual strtoul(3)

	NAME
	       strtoul, strtoull, strtouq - convert a string to an unsigned long integer

	SEE ALSO
	       a64l(3), atof(3), atoi(3), atol(3), strtod(3), strtol(3), strtoumax(3)

	Linux man‐pages 6.9.1‐158‐g2ac... 2024‐07‐23 strtoul(3)

This is quite naive, and will not work with pages that define their own
stuff, since this script is not groff(1). But it should be as fast as
is possible, which is what Colin wants, is as simple as it can be (and
thus relatively safe), and should work with most pages (as far as
indexing is concerned, probably all?).

Have a lovely night!
Alex

--
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 35+ messages in thread
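To make the sed(1) filter in the mansect script above easier to follow, here is a minimal sketch that runs the same three expressions on a tiny page. The page contents and the `extract` wrapper are hypothetical and exist only for this illustration; the `{/^\.SH/!p}` form without a semicolon before the closing brace is assumed to be run with GNU sed, as in the transcripts above.

```shell
#!/bin/sh
# Sketch: the three sed expressions from mansect, applied to a made-up page.
page=$(mktemp)
trap 'rm -f "$page"' EXIT

cat >"$page" <<'EOF'
.TH demo 3 2024-11-02 "example"
.SH NAME
demo \- a made-up page for illustration
.SH SYNOPSIS
.B demo
.SH SEE ALSO
.BR mansect (1)
EOF

# extract SECTION FILE (hypothetical helper):
#   1st expression: print from .TH up to, but not including, the first .SH;
#   2nd expression: print the heading line of the requested section;
#   3rd expression: print the section body, stopping before the next .SH.
extract() {
	sed -n \
		-e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
		-e '/^\.SH '"$1"'$/p' \
		-e '/^\.SH '"$1"'$/,/^\.SH/{/^\.SH/!p}' \
		"$2"
}

extract NAME "$page"
```

The `.TH` line is always kept, so the result is still a page that a formatter accepts. Because the section name is spliced into the regular expression unescaped, a pattern such as `'\(NAME\|SEE ALSO\)'` selects several sections at once, which is what the transcripts above rely on.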
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 21:36 ` Alejandro Colomar
@ 2024-11-02 23:47 ` Colin Watson
2024-11-03 0:05 ` Alejandro Colomar
0 siblings, 1 reply; 35+ messages in thread
From: Colin Watson @ 2024-11-02 23:47 UTC (permalink / raw)
To: Alejandro Colomar
Cc: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
dri-devel, linux-doc, linux-kernel, linux-man, groff

On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> This is quite naive, and will not work with pages that define their own
> stuff, since this script is not groff(1). But it should be as fast as
> is possible, which is what Colin wants, is as simple as it can be (and
> thus relatively safe), and should work with most pages (as far as
> indexing is concerned, probably all?).

I seem to be being invoked here for something I actually don't think I
want at all, which suggests that wires have been crossed somewhere. Can
you explain why I'd want to replace some part of a fairly well-optimized
and established C program with a shell pipeline? I'm pretty certain it
would not be faster, at least.

Thanks,

--
Colin Watson (he/him) [cjwatson@debian.org]

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 23:47 ` Colin Watson
@ 2024-11-03 0:05 ` Alejandro Colomar
2024-11-03 0:07 ` Alejandro Colomar
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 0:05 UTC (permalink / raw)
To: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
dri-devel, linux-doc, linux-kernel, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 1882 bytes --]

Hi Colin,

On Sat, Nov 02, 2024 at 11:47:14PM +0000, Colin Watson wrote:
> On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> > This is quite naive, and will not work with pages that define their own
> > stuff, since this script is not groff(1). But it should be as fast as
> > is possible, which is what Colin wants, is as simple as it can be (and
> > thus relatively safe), and should work with most pages (as far as
> > indexing is concerned, probably all?).
>
> I seem to be being invoked here for something I actually don't think I
> want at all, which suggests that wires have been crossed somewhere. Can
> you explain why I'd want to replace some part of a fairly well-optimized
> and established C program with a shell pipeline? I'm pretty certain it
> would not be faster, at least.

Are you sure? With a small tweak, I get the following comparison:

	alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc
	lexgrog: can't resolve man7/groff_man.7
	12475 99295 919842

	real 0m6.166s
	user 0m5.132s
	sys 0m1.336s
	alx@devuan:~/src/linux/man-pages/man-pages/main$ time mansect NAME man/ \
	| groff -man -Tutf8 | wc
	9830 27109 689478

	real 0m0.156s
	user 0m0.219s
	sys 0m0.019s

Yes, I'm working with uncompressed pages. We'd need to add support for
handling compressed pages. Also, we'd need to compare the performance
of lexgrog(1) with compressed pages. But for a starter, this suggests
some good performance.

(I say with a small tweak, because the version I've posted uses
xargs -L1, but I've tested for performance without the -L1, which is
the main bottleneck. It has no consequences for the NAME. I need to
work out some nasty details with sed -n1 for the generic version,
though.)

Have a lovely night!
Alex

--
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 0:05 ` Alejandro Colomar
@ 2024-11-03 0:07 ` Alejandro Colomar
2024-11-03 0:24 ` Colin Watson
2024-11-03 0:47 ` Colin Watson
2 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 0:07 UTC (permalink / raw)
To: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet,
dri-devel, linux-doc, linux-kernel, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 2111 bytes --]

On Sun, Nov 03, 2024 at 01:05:42AM +0100, Alejandro Colomar wrote:
> Hi Colin,
>
> On Sat, Nov 02, 2024 at 11:47:14PM +0000, Colin Watson wrote:
> > On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> > > This is quite naive, and will not work with pages that define their own
> > > stuff, since this script is not groff(1). But it should be as fast as
> > > is possible, which is what Colin wants, is as simple as it can be (and
> > > thus relatively safe), and should work with most pages (as far as
> > > indexing is concerned, probably all?).
> >
> > I seem to be being invoked here for something I actually don't think I
> > want at all, which suggests that wires have been crossed somewhere. Can
> > you explain why I'd want to replace some part of a fairly well-optimized
> > and established C program with a shell pipeline? I'm pretty certain it
> > would not be faster, at least.
>
> Are you sure? With a small tweak, I get the following comparison:
>
> alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc
> lexgrog: can't resolve man7/groff_man.7
> 12475 99295 919842
>
> real 0m6.166s
> user 0m5.132s
> sys 0m1.336s
> alx@devuan:~/src/linux/man-pages/man-pages/main$ time mansect NAME man/ \
> | groff -man -Tutf8 | wc
> 9830 27109 689478
>
> real 0m0.156s
> user 0m0.219s
> sys 0m0.019s
>
> Yes, I'm working with uncompressed pages. We'd need to add support for
> handling compressed pages. Also, we'd need to compare the performance
> of lexgrog(1) with compressed pages. But for a starter, this suggests
> some good performance.
>
> (I say with a small tweak, because the version I've posted uses
> xargs -L1, but I've tested for performance without the -L1, which is
> the main bottleneck. It has no consequences for the NAME. I need to
> work out some nasty details with sed -n1 for the generic version,

s/n1/n/

> though.)
>
>
> Have a lovely night!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>

--
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 0:05 ` Alejandro Colomar 2024-11-03 0:07 ` Alejandro Colomar @ 2024-11-03 0:24 ` Colin Watson 2024-11-03 0:42 ` Alejandro Colomar 2024-11-03 0:47 ` Colin Watson 2 siblings, 1 reply; 35+ messages in thread From: Colin Watson @ 2024-11-03 0:24 UTC (permalink / raw) To: Alejandro Colomar Cc: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff On Sun, Nov 03, 2024 at 01:05:34AM +0100, Alejandro Colomar wrote: > Are you sure? With a small tweak, I get the following comparison: > > alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc > lexgrog: can't resolve man7/groff_man.7 > 12475 99295 919842 Comparing anything to lexgrog isn't very interesting; it's a debugging tool and is not in itself very performance-sensitive. As I've explained elsewhere, the interesting thing is mandb, which uses the same code in-process to scan a whole tree of pages in one go. I do not expect to ever want to replace that with a shell pipeline. -- Colin Watson (he/him) [cjwatson@debian.org] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 0:24 ` Colin Watson @ 2024-11-03 0:42 ` Alejandro Colomar 0 siblings, 0 replies; 35+ messages in thread From: Alejandro Colomar @ 2024-11-03 0:42 UTC (permalink / raw) To: G. Branden Robinson, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 1787 bytes --] [CC trimmed] Hi Colin, On Sun, Nov 03, 2024 at 12:24:54AM +0000, Colin Watson wrote: > On Sun, Nov 03, 2024 at 01:05:34AM +0100, Alejandro Colomar wrote: > > Are you sure? With a small tweak, I get the following comparison: > > > > alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc > > lexgrog: can't resolve man7/groff_man.7 > > 12475 99295 919842 > > Comparing anything to lexgrog isn't very interesting; it's a debugging > tool and is not in itself very performance-sensitive. As I've explained > elsewhere, the interesting thing is mandb, which uses the same code > in-process to scan a whole tree of pages in one go. I do not expect to > ever want to replace that with a shell pipeline. I don't know how to compare to mandb(8), since it does other stuff, and skips some when things haven't changed. In any case, if this is of any use, you may use it to compare, if you have an idea of what's more or less the percentage of time that mandb(8) spends on this task: alx@devuan:~/src/linux/man-pages/man-pages/master$ time mansect NAME man/ | wc 4851 23548 169216 real 0m0.044s user 0m0.033s sys 0m0.015s alx@devuan:~/src/linux/man-pages/man-pages/master$ time mandb man/ |& wc 30 179 2487 real 0m1.341s user 0m1.065s sys 0m0.302s alx@devuan:~/src/linux/man-pages/man-pages/master$ time mandb man/ |& wc 15 80 1116 real 0m0.030s user 0m0.013s sys 0m0.008s This has been run on the Linux man-pages repository, with uncompressed pages. I've optimized mansect(1) to be 3x faster, and slightly simpler and more robust, compared to the version posted on the list (and xargs doesn't need -L1 anymore). 
Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 0:05 ` Alejandro Colomar 2024-11-03 0:07 ` Alejandro Colomar 2024-11-03 0:24 ` Colin Watson @ 2024-11-03 0:47 ` Colin Watson 2024-11-03 1:09 ` G. Branden Robinson 2024-11-03 1:59 ` Alejandro Colomar 2 siblings, 2 replies; 35+ messages in thread From: Colin Watson @ 2024-11-03 0:47 UTC (permalink / raw) To: Alejandro Colomar Cc: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff I'm not trying to stop you committing whatever you want to your repository, of course, but I want to be clear that this doesn't actually solve the right problem for manual page indexing. The point of the parsing code in mandb(8) - and I'm not claiming that it's great code or the perfect design, just that it works most of the time - is to extract the names and summary-descriptions from each page so that they can be used by tools such as apropos(1) and whatis(1). Splitting on section boundaries is just the simplest part of that problem, and I don't think that doing it in a separate program really gains anything. (That's leaving aside things like localized man pages, which I know some folks on the groff list tend to sniff at but I think they're important, and the fact that the NAME section has both semantic and presentational meaning means that like it or not the parser needs to be aware of this.) -- Colin Watson (he/him) [cjwatson@debian.org] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 0:47 ` Colin Watson @ 2024-11-03 1:09 ` G. Branden Robinson 2024-11-03 1:18 ` Colin Watson 2024-11-03 1:59 ` Alejandro Colomar 1 sibling, 1 reply; 35+ messages in thread From: G. Branden Robinson @ 2024-11-03 1:09 UTC (permalink / raw) To: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 1346 bytes --] Hi Colin, At 2024-11-03T00:47:23+0000, Colin Watson wrote: > (That's leaving aside things like localized man pages, which I know > some folks on the groff list tend to sniff I can think of only one, the maintainer of a rival formatter. ;-) > at but I think they're important, Me too. I agree with the sniffer that no language is ever likely to reach 100% parity with English in something like the Debian distribution, but more modest domains exist. I've put effort into l10n issues in man(7) and in groff generally. In particular, I really want seamless multilingual document support and achievement of that goal will be, I think, much closer in groff 1.24. (My pending push is gated on deciding how to change the me(7) and ms(7) packages to accommodate a formatter-level fix to an ugly wart in the l10n department; see <https://savannah.gnu.org/bugs/?66387>.) > and the fact that the NAME section has both semantic and > presentational meaning means that like it or not the parser needs to > be aware of this.) Even if mandb(8) doesn't run groff to extract the summary descriptions/ apropos lines, I think this feature might be useful to you for coverage/regression testing. Presumably, for valid inputs, groff and mandb(8) should reach similar conclusions about how the text of a "Name" section is to be formatted. 
Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 1:09 ` G. Branden Robinson @ 2024-11-03 1:18 ` Colin Watson 0 siblings, 0 replies; 35+ messages in thread From: Colin Watson @ 2024-11-03 1:18 UTC (permalink / raw) To: G. Branden Robinson Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff (now with some local vim macros fixed to stop accidentally corrupting the To: lines of some of my outgoing emails ...) On Sat, Nov 02, 2024 at 08:09:29PM -0500, G. Branden Robinson wrote: > At 2024-11-03T00:47:23+0000, Colin Watson wrote: > > and the fact that the NAME section has both semantic and > > presentational meaning means that like it or not the parser needs to > > be aware of this.) > > Even if mandb(8) doesn't run groff to extract the summary descriptions/ > apropos lines, I think this feature might be useful to you for > coverage/regression testing. Presumably, for valid inputs, groff and > mandb(8) should reach similar conclusions about how the text of a "Name" > section is to be formatted. Yes, that's a good point and I agree with that. -- Colin Watson (he/him) [cjwatson@debian.org] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 0:47 ` Colin Watson 2024-11-03 1:09 ` G. Branden Robinson @ 2024-11-03 1:59 ` Alejandro Colomar 2024-11-03 14:32 ` Colin Watson 1 sibling, 1 reply; 35+ messages in thread From: Alejandro Colomar @ 2024-11-03 1:59 UTC (permalink / raw) To: G. Branden Robinson, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 1762 bytes --] Hi Colin, On Sun, Nov 03, 2024 at 12:47:23AM +0000, Colin Watson wrote: > I'm not trying to stop you committing whatever you want to your > repository, of course, but I want to be clear that this doesn't actually > solve the right problem for manual page indexing. The point of the > parsing code in mandb(8) - and I'm not claiming that it's great code or > the perfect design, just that it works most of the time - is to extract > the names and summary-descriptions from each page so that they can be > used by tools such as apropos(1) and whatis(1). Splitting on section > boundaries is just the simplest part of that problem, and I don't think > that doing it in a separate program really gains anything. Splitting on section boundaries is the minimum thing so that mandb(8) can use groff(1) directly to parse the section (instead of rolling your own man(7) parser). groff(1) could also be used --avoiding a shell script--, but that would need a new feature in groff(1) --which Branden has suggested--. I prefer avoiding the growth of groff(1), if a simple sed(1) invocation can do it. The script will be useful for now to me, so I'll probably commit it. Feel free to use it if you find it useful. (If so, please let me know so that I keep the interface stable.) Cheers, Alex > (That's leaving aside things like localized man pages, which I know some > folks on the groff list tend to sniff at but I think they're important, > and the fact that the NAME section has both semantic and presentational > meaning means that like it or not the parser needs to be aware of this.) 
> > -- > Colin Watson (he/him) [cjwatson@debian.org] > > -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 1:59 ` Alejandro Colomar @ 2024-11-03 14:32 ` Colin Watson 0 siblings, 0 replies; 35+ messages in thread From: Colin Watson @ 2024-11-03 14:32 UTC (permalink / raw) To: Alejandro Colomar; +Cc: G. Branden Robinson, linux-man, groff On Sun, Nov 03, 2024 at 02:59:34AM +0100, Alejandro Colomar wrote: > On Sun, Nov 03, 2024 at 12:47:23AM +0000, Colin Watson wrote: > > I'm not trying to stop you committing whatever you want to your > > repository, of course, but I want to be clear that this doesn't actually > > solve the right problem for manual page indexing. The point of the > > parsing code in mandb(8) - and I'm not claiming that it's great code or > > the perfect design, just that it works most of the time - is to extract > > the names and summary-descriptions from each page so that they can be > > used by tools such as apropos(1) and whatis(1). Splitting on section > > boundaries is just the simplest part of that problem, and I don't think > > that doing it in a separate program really gains anything. > > Splitting on section boundaries is the minimum thing so that mandb(8) > can use groff(1) directly to parse the section (instead of rolling your > own man(7) parser). No, it doesn't help, because mandb(8) still has to do a bunch of other man(7) parsing on top of that (including the problem that caused me to be CCed into this thread in the first place). Delegating just the section splitting to a separate tool would add quite a bit of complexity without removing the need for man-db's own parser. A separate tool is only useful if it solves the whole problem at hand, rather than maybe 10% of it. And even then it would need some careful thought around integration. Thanks, -- Colin Watson (he/him) [cjwatson@debian.org] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-02 10:39 ` Alejandro Colomar 2024-11-02 21:36 ` Alejandro Colomar @ 2024-11-03 4:05 ` G. Branden Robinson 1 sibling, 0 replies; 35+ messages in thread From: G. Branden Robinson @ 2024-11-03 4:05 UTC (permalink / raw) To: Alejandro Colomar Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, cjwatson, groff [-- Attachment #1: Type: text/plain, Size: 2262 bytes --] Hi Alex, At 2024-11-02T11:39:37+0100, Alejandro Colomar wrote: > And diffs are a real win for text. Thus, semantic newlines are a real > win for text. "Write poems, not prose." (Any chance we may get that > warning added to groff(1)? :D) Yes, but I've kicked it out to groff 1.25 because a gift-wrapped opportunity came along. We get to retire a warning category and its number. groff(7) [1.23.0]: Warnings ... el 16 The el request was encountered with no prior corresponding ie request. groff 1.24.0 [in preparation] NEWS: * The "el" warning category has been withdrawn. If enabled (which it was not by default), the formatter would emit a diagnostic if it inferred an imbalance between `ie` and `el` requests. Unfortunately its technique wasn't reliable and sometimes spuriously issued these warnings, and making it perfectly reliable did not look tractable. We recommend using brace escape sequences `\{` and `\}` to ensure that your control flow structures remain maintainable. This was a 35-year-old bug (or incomplete feature) in GNU troff that as far as I know first came to attention 10 years ago when the then-Heirloom Doctools maintainer pointed out an incompatibility between AT&T troff (from which Heirloom Doctools descends) and GNU troff. https://savannah.gnu.org/bugs/?45502 More recently, Paul Eggert scored big-time grognard points by actually depending on the AT&T troff behavior in the zic(8) man page. 
https://savannah.gnu.org/bugs/?65474 We therefore _had_ to fix it. The consequence is that the warning category `el` and bit 4 in the warning mask integer are undefined for groff 1.24. This was irresistible serendipity, because this warning category was (1) not enabled by default and (2) probably used only by people who wouldn't object to style warnings anyway. In groff 1.25, I want to revive bit 4 as new warning category `style`. Ending sentences before the end of a text line is something we can warn about as discussed a while back, and I plan to do so. https://lists.gnu.org/archive/html/groff/2022-06/msg00052.html I've been collecting specimens of other contemplated style warnings. https://savannah.gnu.org/bugs/?62776 Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-02 10:08 ` G. Branden Robinson 2024-11-02 10:39 ` Alejandro Colomar @ 2024-11-02 19:06 ` Colin Watson 2024-11-03 0:50 ` G. Branden Robinson 1 sibling, 1 reply; 35+ messages in thread From: Colin Watson @ 2024-11-02 19:06 UTC (permalink / raw) To: G. Branden Robinson Cc: Alejandro Colomar, Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote: > On GNU/Linux systems, the only man page indexer I know of is Colin > Watson's man-db--specifically, its mandb(8) program. But it's nicely > designed so that the "topic and summary description extraction" task is > delegated to a standalone tool, lexgrog(1), and we can use that. > > $ lexgrog /tmp/proc_pid_fdinfo_mini.5 > /tmp/proc_pid_fdinfo_mini.5: parse failed > > Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael > Kerrisk's scraper with respect to groff's man pages.[1] How embarrassing. Could somebody please file a bug on https://gitlab.com/man-db/man-db/-/issues to remind me to fix that? (Of course there'll be a lead time for fixes to get into distributions.) > Well, I can find a silver lining here, because it gives me an even > better reason than I had to pitch an idea I've been kicking around for a > while. Why not enhance groff man(7) to support a mode where _it_ will > spit out the "Name"/"NAME" section, and only that, _for_ you? > > This would be as easy as checking for an option, say '-d EXTRACT=Name', > and having the package's "TH" and "SH" macro definitions divert > (literally, with the `di` request) everything _except_ the section of > interest to a diversion that is then never called/output. (This is > similar to an m4 feature known as the "black hole diversion".) 
> > All of the features necessary to implement this[2] were part of troff as > far as back as the birth of the man(7) package itself. It's not clear > to me why it wasn't done back in the 1980s. > > lexgrog(1) itself will of course have to stay around for years to come, > but this could take a significant distraction off of Colin's plate--I > believe I have seen him grumble about how much *roff syntax he has to > parse to have the feature be workable, and that's without upstart groff > maintainers exploring up to every boundary that existed even in 1979 and > cheerfully exercising their findings in man pages. lexgrog(1) is a useful (if oddly-named, sorry) debugging tool, but if you focus on that then you'll end up with a design that's not very useful. What really matters is indexing the whole system's manual pages, and mandb(8) does not do that by invoking lexgrog(1) one page at a time, but rather by running more or less the same code in-process. I already know that getting acceptable performance for this requires care, as illustrated by one of the NEWS entries for man-db 2.10.0: * Significantly improve `mandb(8)` and `man -K` performance in the common case where pages are of moderate size and compressed using `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test system. ... so I'm prepared to bet that forking nroff one page at a time will be unacceptably slow. (This also combines with the fact that man-db applies some sandboxing when it's calling nroff just in case it might happen that a moderately-sized C++ project has less than 100% perfect security when doing text processing, which I'm sure everyone agrees would never happen.) If it were possible to run nroff over a whole batch of pages and get output for each of them in one go, then maaaaybe. man-db would need a reliable way to associate each line (or sometimes multiple lines) of output with each source file, and of course care would be needed around error handling and so on. 
I can see the appeal, in terms of processing the actual language rather than a pile of hacks that try to guess what to do with it - but on the other hand this starts to feel like a much less natural fit for the way nroff is run in every other situation, where you're processing one document at a time. Cheers, -- Colin Watson (he/him) [cjwatson@debian.org] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-02 19:06 ` Colin Watson @ 2024-11-03 0:50 ` G. Branden Robinson 2024-11-03 1:55 ` Colin Watson 0 siblings, 1 reply; 35+ messages in thread From: G. Branden Robinson @ 2024-11-03 0:50 UTC (permalink / raw) To: Alejandro Colomar, Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff [-- Attachment #1: Type: text/plain, Size: 4954 bytes --] Hi Colin, At 2024-11-02T19:06:53+0000, Colin Watson wrote: > How embarrassing. Could somebody please file a bug on > https://gitlab.com/man-db/man-db/-/issues to remind me to fix that? Done; <https://gitlab.com/man-db/man-db/-/issues/46>. > lexgrog(1) is a useful (if oddly-named, sorry) debugging tool, but if > you focus on that then you'll end up with a design that's not very > useful. What really matters is indexing the whole system's manual > pages, and mandb(8) does not do that by invoking lexgrog(1) one page > at a time, but rather by running more or less the same code > in-process. Ah, I see it now--"lexgrog.l" is in both the Automake macros "lexgrog_SOURCES" and "mandb_SOURCES". Nice and DRY! > I already know that getting acceptable performance for > this requires care, as illustrated by one of the NEWS entries for > man-db 2.10.0: > > * Significantly improve `mandb(8)` and `man -K` performance in the > common case where pages are of moderate size and compressed using > `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test > system. > > ... so I'm prepared to bet that forking nroff one page at a time will > be unacceptably slow. Probably, but there is little reason to run nroff that way (as of groff 1.23). It already works well, but I have ideas for further hardening groff's man(7) and mdoc(7) packages such that they return to a well-defined state when changing input documents. 
> (This also combines with the fact that man-db applies some sandboxing > when it's calling nroff just in case it might happen that a > moderately-sized C++ project has less than 100% perfect security when > doing text processing, which I'm sure everyone agrees would never > happen.) Inconceivable, yes! But fortunately you can run nroff over N documents and pay its own startup overhead costs as well as those of sandboxing only once. > If it were possible to run nroff over a whole batch of pages and get > output for each of them in one go, then maaaaybe. That's already true for formatting the entire page. It's how this was created. https://www.gnu.org/software/groff/manual/groff-man-pages.utf8.txt (...best viewed with "less -R") With the `-d EXTRACT` feature I have in mind, in its as-simple-as-possible first-cut form, the problem you anticipate... > man-db would need a reliable way to associate each line (or sometimes > multiple lines) of output with each source file, ...would remain. I'll have to think of a good way to write out "metadata" (the input file name and the arguments to the `TH` request) as each page is encountered, and of an interface to enable that. I don't see it happening before groff 1.25. > and of course care would be needed around error handling and so on. I need to give this thought, too. What sorts of error scenarios do you foresee? GNU troff itself, if it can't open a file to be formatted, reports an error diagnostic and continues to the next `argv` string until it reaches the end of input. > I can see the appeal, in terms of processing the actual language > rather than a pile of hacks that try to guess what to do with it ...a major selling point, IMO... > but on the other hand this starts to feel like a much less natural fit > for the way nroff is run in every other situation, where you're > processing one document at a time. This I disagree with. 
Or perhaps more precisely, it's another example of the exception (man(1)) swallowing the rule (nroff/troff). nroff and troff were written as Unix filters; they read the standard input stream (and/or argument list)[1], do some processing, and write to standard output.[2] Historically, troff (or one of its preprocessors) was commonly used with multiple input files to catenate them. Here's an example of this practice from 1980. https://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/doc/pascal/makefile Regards, Branden [1] ...including this option from Seventh Edition Unix (1979) or earlier, which survives in GNU troff to this day. -i Read standard input after the input files are exhausted. [2] Seventh Edition troff didn't write to stdout by default, but tried to open the typesetter device. But it had an option to write to standard output. -t Direct output to the standard output instead of the phototypesetter. Running old school Unix under emulation these days, you _have_ to use this option to avoid the dreaded "Typesetter busy." diagnostic. When Kernighan refactored troff for device-independence, he reseated it more squarely in the Unix filter tradition by writing its plain-text page description language to stdout. The output driver, such as "dpost" for PostScript, also read its standard input, and could thus become just one more stage in a pipeline. [CSTR #97] [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page 2024-11-03 0:50 ` G. Branden Robinson @ 2024-11-03 1:55 ` Colin Watson 0 siblings, 0 replies; 35+ messages in thread From: Colin Watson @ 2024-11-03 1:55 UTC (permalink / raw) To: G. Branden Robinson Cc: Alejandro Colomar, Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man, groff On Sat, Nov 02, 2024 at 07:50:23PM -0500, G. Branden Robinson wrote: > At 2024-11-02T19:06:53+0000, Colin Watson wrote: > > How embarrassing. Could somebody please file a bug on > > https://gitlab.com/man-db/man-db/-/issues to remind me to fix that? > > Done; <https://gitlab.com/man-db/man-db/-/issues/46>. Thanks, working on it. > > I already know that getting acceptable performance for > > this requires care, as illustrated by one of the NEWS entries for > > man-db 2.10.0: > > > > * Significantly improve `mandb(8)` and `man -K` performance in the > > common case where pages are of moderate size and compressed using > > `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test > > system. > > > > ... so I'm prepared to bet that forking nroff one page at a time will > > be unacceptably slow. > > Probably, but there is little reason to run nroff that way (as of groff > 1.23). It already works well, but I have ideas for further hardening > groff's man(7) and mdoc(7) packages such that they return to a > well-defined state when changing input documents. Being able to keep track of which output goes with which input pages is critical to the indexer, though (as you acknowledge later in your reply). It can't just throw the whole lot at nroff and call it a day. One other thing: mandb/lexgrog also looks for preprocessing filter hints in pages (`'\" te` and the like). This is obscure, to be sure, but either a replacement would need to do the same thing or we'd need to be certain that it's no longer required. 
> > and of course care would be needed around error handling and so on. > > I need to give this thought, too. What sorts of error scenarios do you > foresee? GNU troff itself, if it can't open a file to be formatted, > reports an error diagnostic and continues to the next `argv` string > until it reaches the end of input. That might be sufficient, or man-db might need to be able to detect which pages had errors. I'm not currently sure. > > but on the other hand this starts to feel like a much less natural fit > > for the way nroff is run in every other situation, where you're > > processing one document at a time. > > This I disagree with. Or perhaps more precisely, it's another example > of the exception (man(1)) swallowing the rule (nroff/troff). nroff and > troff were written as Unix filters; they read the standard input stream > (and/or argument list)[1], do some processing, and write to standard > output.[2] > > Historically, troff (or one of its preprocessors) was commonly used with > multiple input files to catenate them. But this application is not conceptually like catenation (even if it might be possible to implement it that way). The collection of all manual pages on a system is not like one long document that happens to be split over multiple files, certainly not from an indexer's point of view. -- Colin Watson (he/him) [cjwatson@debian.org] ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 0/3] Add mansect(1) program and manual page 2024-11-01 18:19 ` Ian Rogers 2024-11-01 20:07 ` Alejandro Colomar @ 2024-11-02 23:10 ` Alejandro Colomar 2024-11-02 23:10 ` [PATCH] CONTRIBUTING.d/patches: Document new features alongside the features Alejandro Colomar ` (4 more replies) 1 sibling, 5 replies; 35+ messages in thread From: Alejandro Colomar @ 2024-11-02 23:10 UTC (permalink / raw) To: linux-man, branden, cjwatson; +Cc: groff, Alejandro Colomar [-- Attachment #1: Type: text/plain, Size: 2600 bytes --] Hi Branden, Colin, I'm proposing the addition of this program to the Linux man-pages repository, as a spin-off of the man_section() shell function that we already have. Eventually, we could move it to a separate repository, if it is more appropriate. Could you please review? (And also give any opinions you have about it.) It originally supports man(7) only, but we probably can extend it for mdoc(7) easily. Here's the manual page, for ease of review: $ MANWIDTH=64 man man1/mansect.1 | cat mansect(1) General Commands Manual mansect(1) NAME mansect - print the source code of sections of manual pages SYNOPSIS mansect section [file ...] DESCRIPTION The mansect command prints the source code of the section of the given manual‐page files. If no files are speci‐ fied, the standard input is used. section is a basic regular expression. The TH line is unconditionally printed. The output of this program is suitable for piping to the groff(1) pipeline. 
EXAMPLES $ man -w strtol strtoul | xargs mansect '\(NAME\|SEE ALSO\)' .TH strtol 3 2024‐07‐23 "Linux man‐pages 6.9.1" .SH NAME strtol, strtoll, strtoq - convert a string to a long integer .SH SEE ALSO .BR atof (3), .BR atoi (3), .BR atol (3), .BR strtod (3), .BR strtoimax (3), .BR strtoul (3) .TH strtoul 3 2024‐07‐23 "Linux man‐pages 6.9.1" .SH NAME strtoul, strtoull, strtouq - convert a string to an unsigned long integer .SH SEE ALSO .BR a64l (3), .BR atof (3), .BR atoi (3), .BR atol (3), .BR strtod (3), .BR strtol (3), .BR strtoumax (3) SEE ALSO lexgrog(1), groff(1), man(1) Linux man‐pages (unrelea... (date) mansect(1) What do you think of it? Have a lovely night! Alex Alejandro Colomar (2): src/bin/mansect, mansect.1: Add program and its manual page scripts/bash_aliases: man_section(), man_lsfunc(), man_lsvar(): Use mansect(1) Vincent Lefevre (1): signal.7: Better description for SIGFPE man/man1/mansect.1 | 61 ++++++++++++++++++++++++++++++++++++++++++++ man/man7/signal.7 | 2 +- scripts/bash_aliases | 38 +++++---------------------- src/bin/mansect | 33 ++++++++++++++++++++++++ 4 files changed, 101 insertions(+), 33 deletions(-) create mode 100644 man/man1/mansect.1 create mode 100755 src/bin/mansect -- 2.39.5 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
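For readers who want to experiment with the behaviour described in the manual page above, here is a minimal sketch of section extraction. It is not the mansect(1) script from this patch set, which, judging from the thread, relies on sed(1). The function name extract_section is hypothetical, the sketch uses awk(1), and it treats the section argument as an extended (not basic) regular expression. It assumes unquoted section titles and man(7) input only.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real mansect(1): print the .TH line and
# every section whose title matches the extended regex given as $1.
# Assumptions: man(7) source, unquoted ".SH TITLE" lines.
extract_section() {
    sect=$1
    shift
    awk -v sect="$sect" '
        # A .TH line starts a new page: always print it, reset state.
        /^\.TH / { keep = 0; print; next }
        # A .SH line opens a new section: decide whether to keep it.
        /^\.SH / { keep = ($0 ~ ("^\\.SH (" sect ")$")) }
        # Print lines while inside a wanted section.
        keep
    ' "$@"
}
```

As in the manual page, the output should be suitable for piping into the formatter, for example `extract_section 'NAME|SEE ALSO' man/man3/strtol.3 | groff -man -Tutf8`, assuming a page exists at that path.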
* [PATCH] CONTRIBUTING.d/patches: Document new features alongside the features
2024-11-02 23:10 ` [PATCH 0/3] Add mansect(1) program and manual page Alejandro Colomar
@ 2024-11-02 23:10 ` Alejandro Colomar
2024-11-02 23:17 ` Alejandro Colomar
2024-11-02 23:10 ` [PATCH 1/3] signal.7: Better description for SIGFPE Alejandro Colomar
` (3 subsequent siblings)
4 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 23:10 UTC (permalink / raw)
To: linux-man, branden, cjwatson, Günther Noack, Jiri Olsa
Cc: groff, Alejandro Colomar
[-- Attachment #1: Type: text/plain, Size: 1956 bytes --]
Link: <https://lwn.net/Articles/989380/>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
[offlist]
Hi Günther, Jiri,
I've prepared a draft of this contributing process that we talked about.
I won't officially post it until the other situation (sponsoring) is
resolved, but we can discuss it in private if you want.
Have a lovely night!
Alex
CONTRIBUTING.d/patches | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/CONTRIBUTING.d/patches b/CONTRIBUTING.d/patches
index fedb163d3..0562ded66 100644
--- a/CONTRIBUTING.d/patches
+++ b/CONTRIBUTING.d/patches
@@ -131,6 +131,26 @@ Description
 	to the list. See also <CONTRIBUTING.d/git> for instructions for
 	configuring git-send-email(1) to use neomutt(1) as a driver.
 
+    New kernel/libc features
+	If you write a new kernel or libc feature, you should document it
+	in the same patch set that adds the feature, including any
+	patches to the manual pages. The entire patch set consisting of
+	both the feature and its manual page should be sent to all
+	recipients for a better review process. That can be done with
+	the following procedure:
+
+	1) Generate the kernel or libc patch set, with a cover letter,
+	   and using --thread in git-format-patch(1) (as specified in
+	   our ./CONTRIBUTING.d/git). This will generate a Message-ID
+	   header field in the cover letter.
+
+	2) Generate the man-pages patch set using
+	   --in-reply-to="<message-id>", where <message-id> is the value
+	   of the header field of the cover letter.
+
+	3) Send first the kernel/libc patch set, and then the man-pages
+	   one, so that they have a consistent order.
+
 See also
 	CONTRIBUTING
 	CONTRIBUTING.d/*
--
2.39.2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
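The three-step procedure in the draft above could be sketched in shell roughly as follows. This is an illustration only and not part of the patch: cover_msgid is a hypothetical helper, the directory names and revision ranges in the comments are placeholders, and it assumes the cover letter carries a single Message-ID header in its mail headers, as git-format-patch(1) --thread produces.

```shell
# Hypothetical helper (not in the patch): print the Message-ID value,
# angle brackets included, of a cover letter written by
# git-format-patch(1) --cover-letter --thread.
cover_msgid()
{
    sed -n 's/^[Mm]essage-[Ii][Dd]: *//p' "$1" | head -n 1
}

# Sketch of the procedure; paths and ranges below are placeholders.
#
# 1) In the kernel or libc tree:
#      git format-patch --cover-letter --thread -o ../feature/ origin/master..HEAD
#
# 2) In the man-pages tree:
#      id="$(cover_msgid ../feature/0000-cover-letter.patch)"
#      git format-patch --in-reply-to="$id" -o ../manual/ origin/master..HEAD
#
# 3) Send the feature first, then the documentation:
#      git send-email ../feature/*.patch
#      git send-email ../manual/*.patch
```

Extracting the identifier from the generated file, rather than copying it by hand, keeps the two series threaded under the same cover letter.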
* Re: [PATCH] CONTRIBUTING.d/patches: Document new features alongside the features
2024-11-02 23:10 ` [PATCH] CONTRIBUTING.d/patches: Document new features alongside the features Alejandro Colomar
@ 2024-11-02 23:17 ` Alejandro Colomar
0 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 23:17 UTC (permalink / raw)
To: linux-man, branden, cjwatson, Günther Noack, Jiri Olsa; +Cc: groff
[-- Attachment #1.1.1: Type: text/plain, Size: 2222 bytes --]
Oops, this was sent by accident. :)
On Sun, Nov 03, 2024 at 12:10:18AM +0100, Alejandro Colomar wrote:
> Link: <https://lwn.net/Articles/989380/>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Günther Noack <gnoack@google.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>
> [offlist]
>
> Hi Günther, Jiri,
>
> I've prepared a draft of this contributing process that we talked about.
> I won't officially post it until the other situation (sponsoring) is
> resolved, but we can discuss it in private if you want.
>
>
> Have a lovely night!
> Alex
>
> CONTRIBUTING.d/patches | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/CONTRIBUTING.d/patches b/CONTRIBUTING.d/patches
> index fedb163d3..0562ded66 100644
> --- a/CONTRIBUTING.d/patches
> +++ b/CONTRIBUTING.d/patches
> @@ -131,6 +131,26 @@ Description
>  	to the list. See also <CONTRIBUTING.d/git> for instructions for
>  	configuring git-send-email(1) to use neomutt(1) as a driver.
>
> +    New kernel/libc features
> +	If you write a new kernel or libc feature, you should document it
> +	in the same patch set that adds the feature, including any
> +	patches to the manual pages. The entire patch set consisting of
> +	both the feature and its manual page should be sent to all
> +	recipients for a better review process. That can be done with
> +	the following procedure:
> +
> +	1) Generate the kernel or libc patch set, with a cover letter,
> +	   and using --thread in git-format-patch(1) (as specified in
> +	   our ./CONTRIBUTING.d/git). This will generate a Message-ID
> +	   header field in the cover letter.
> +
> +	2) Generate the man-pages patch set using
> +	   --in-reply-to="<message-id>", where <message-id> is the value
> +	   of the header field of the cover letter.
> +
> +	3) Send first the kernel/libc patch set, and then the man-pages
> +	   one, so that they have a consistent order.
> +
>  See also
>  	CONTRIBUTING
>  	CONTRIBUTING.d/*
> --
> 2.39.2
>
--
<https://www.alejandro-colomar.es/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/3] signal.7: Better description for SIGFPE
2024-11-02 23:10 ` [PATCH 0/3] Add mansect(1) program and manual page Alejandro Colomar
2024-11-02 23:10 ` [PATCH] CONTRIBUTING.d/patches: Document new features alongside the features Alejandro Colomar
@ 2024-11-02 23:10 ` Alejandro Colomar
2024-11-02 23:17 ` Alejandro Colomar
2024-11-02 23:10 ` [PATCH 2/3] src/bin/mansect, mansect.1: Add program and its manual page Alejandro Colomar
` (2 subsequent siblings)
4 siblings, 1 reply; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 23:10 UTC (permalink / raw)
To: linux-man, branden, cjwatson; +Cc: groff, Vincent Lefevre, Alejandro Colomar
[-- Attachment #1: Type: text/plain, Size: 1247 bytes --]
From: Vincent Lefevre <vincent@vinc17.net>
SIGFPE has comment "Floating-point exception", which corresponds to
the FPE acronym. But this is misleading as this signal may also be
generated by an integer division by 0.
Change it to "Erroneous arithmetic operation" from POSIX.
Note: the GNU C Library manual says "fatal arithmetic error".
Link: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/signal.h.html>
Link: <https://www.gnu.org/software/libc/manual/html_node/Program-Error-Signals.html>
Signed-off-by: Vincent Lefevre <vincent@vinc17.net>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man/man7/signal.7 | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/man/man7/signal.7 b/man/man7/signal.7
index 7a9e91cc7..d19f171b3 100644
--- a/man/man7/signal.7
+++ b/man/man7/signal.7
@@ -373,7 +373,7 @@ .SS Standard signals
 SIGCLD \- Ign A synonym for \fBSIGCHLD\fP
 SIGCONT P1990 Cont Continue if stopped
 SIGEMT \- Term Emulator trap
-SIGFPE P1990 Core Floating-point exception
+SIGFPE P1990 Core Erroneous arithmetic operation
 SIGHUP P1990 Term Hangup detected on controlling terminal
 or death of controlling process
 SIGILL P1990 Core Illegal Instruction
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 1/3] signal.7: Better description for SIGFPE
2024-11-02 23:10 ` [PATCH 1/3] signal.7: Better description for SIGFPE Alejandro Colomar
@ 2024-11-02 23:17 ` Alejandro Colomar
0 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 23:17 UTC (permalink / raw)
To: linux-man, branden, cjwatson; +Cc: groff, Vincent Lefevre
[-- Attachment #1.1: Type: text/plain, Size: 1465 bytes --]
Oops, this was sent by accident.
On Sun, Nov 03, 2024 at 12:10:27AM +0100, Alejandro Colomar wrote:
> From: Vincent Lefevre <vincent@vinc17.net>
>
> SIGFPE has comment "Floating-point exception", which corresponds to
> the FPE acronym. But this is misleading as this signal may also be
> generated by an integer division by 0.
>
> Change it to "Erroneous arithmetic operation" from POSIX.
> Note: the GNU C Library manual says "fatal arithmetic error".
>
> Link: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/signal.h.html>
> Link: <https://www.gnu.org/software/libc/manual/html_node/Program-Error-Signals.html>
> Signed-off-by: Vincent Lefevre <vincent@vinc17.net>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
> man/man7/signal.7 | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/man/man7/signal.7 b/man/man7/signal.7
> index 7a9e91cc7..d19f171b3 100644
> --- a/man/man7/signal.7
> +++ b/man/man7/signal.7
> @@ -373,7 +373,7 @@ .SS Standard signals
>  SIGCLD \- Ign A synonym for \fBSIGCHLD\fP
>  SIGCONT P1990 Cont Continue if stopped
>  SIGEMT \- Term Emulator trap
> -SIGFPE P1990 Core Floating-point exception
> +SIGFPE P1990 Core Erroneous arithmetic operation
>  SIGHUP P1990 Term Hangup detected on controlling terminal
>  or death of controlling process
>  SIGILL P1990 Core Illegal Instruction
> --
> 2.39.5
>
--
<https://www.alejandro-colomar.es/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 2/3] src/bin/mansect, mansect.1: Add program and its manual page
2024-11-02 23:10 ` [PATCH 0/3] Add mansect(1) program and manual page Alejandro Colomar
2024-11-02 23:10 ` [PATCH] CONTRIBUTING.d/patches: Document new features alongside the features Alejandro Colomar
2024-11-02 23:10 ` [PATCH 1/3] signal.7: Better description for SIGFPE Alejandro Colomar
@ 2024-11-02 23:10 ` Alejandro Colomar
2024-11-02 23:10 ` [PATCH 3/3] scripts/bash_aliases: man_section(), man_lsfunc(), man_lsvar(): Use mansect(1) Alejandro Colomar
2024-11-03 1:16 ` [PATCH v2 0/4] Add mansect(1) Alejandro Colomar
4 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 23:10 UTC (permalink / raw)
To: linux-man, branden, cjwatson; +Cc: groff, Alejandro Colomar
[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]
Cc: "G. Branden Robinson" <branden@debian.org>
Cc: Colin Watson <cjwatson@debian.org>
Cc: <groff@gnu.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man/man1/mansect.1 | 61 ++++++++++++++++++++++++++++++++++++++++++++++
src/bin/mansect | 33 +++++++++++++++++++++++++
2 files changed, 94 insertions(+)
create mode 100644 man/man1/mansect.1
create mode 100755 src/bin/mansect
diff --git a/man/man1/mansect.1 b/man/man1/mansect.1
new file mode 100644
index 000000000..f46dc0609
--- /dev/null
+++ b/man/man1/mansect.1
@@ -0,0 +1,61 @@
+.\" Copyright 2024, Alejandro Colomar <alx@kernel.org>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH mansect 1 (date) "Linux man-pages (unreleased)"
+.SH NAME
+mansect
+\-
+print the source code of sections of manual pages
+.SH SYNOPSIS
+.B mansect
+.I section
+.RI [ file\~ .\|.\|.]
+.SH DESCRIPTION
+The
+.B mansect
+command prints the source code of the
+.I section
+of the given manual-page files.
+If no files are specified,
+the standard input is used.
+.P
+.I section
+is a basic regular expression.
+.P
+The
+.B TH
+line is unconditionally printed.
+.P
+The output of this program is suitable for piping to the
+.BR groff (1)
+pipeline.
+.SH EXAMPLES
+.EX
+.RB $\~ "man -w strtol strtoul | xargs mansect '\[rs](NAME\[rs]|SEE ALSO\[rs])'"
+\&.TH strtol 3 2024-07-23 "Linux man-pages 6.9.1"
+\&.SH NAME
+strtol, strtoll, strtoq \- convert a string to a long integer
+\&.SH SEE ALSO
+\&.BR atof (3),
+\&.BR atoi (3),
+\&.BR atol (3),
+\&.BR strtod (3),
+\&.BR strtoimax (3),
+\&.BR strtoul (3)
+\&.TH strtoul 3 2024-07-23 "Linux man-pages 6.9.1"
+\&.SH NAME
+strtoul, strtoull, strtouq \- convert a string to an unsigned long integer
+\&.SH SEE ALSO
+\&.BR a64l (3),
+\&.BR atof (3),
+\&.BR atoi (3),
+\&.BR atol (3),
+\&.BR strtod (3),
+\&.BR strtol (3),
+\&.BR strtoumax (3)
+.EE
+.SH SEE ALSO
+.BR lexgrog (1),
+.BR groff (1),
+.BR man (1)
diff --git a/src/bin/mansect b/src/bin/mansect
new file mode 100755
index 000000000..a35d387b1
--- /dev/null
+++ b/src/bin/mansect
@@ -0,0 +1,33 @@
+#!/bin/sh
+#
+# Copyright 2020-2024, Alejandro Colomar <alx@kernel.org>
+# SPDX-License-Identifier: GPL-3.0-or-later
+
+
+if test $# -lt 1; then
+	>&2 echo "Usage: $0 SECTION [FILE ...]";
+	return 1;
+fi;
+
+s="$1";
+shift;
+
+
+if test $# -lt 1; then
+	sed -n \
+		-e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
+		-e '/^\.SH '"$s"'$/p' \
+		-e '/^\.SH '"$s"'$/,/^\.SH/{/^\.SH/!p}' \
+		;
+else
+	find "$@" -not -type d \
+	| xargs wc -l \
+	| sed '${/ total$/d}' \
+	| grep -v '\b1 ' \
+	| awk '{ print $2 }' \
+	| xargs -L1 sed -n \
+		-e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
+		-e '/^\.SH '"$s"'$/p' \
+		-e '/^\.SH '"$s"'$/,/^\.SH/{/^\.SH/!p}' \
+		;
+fi;
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
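The three sed(1) expressions used by the script in this patch can be tried in isolation. The following is a minimal sketch, assuming GNU sed (which accepts the closing brace without a preceding semicolon); print_section is a name invented here for illustration and is not part of the patch.

```shell
# print_section REGEX
# Read man(7) source on standard input and print:
#   - everything from the .TH line up to, but not including, the first .SH
#   - the heading of each section whose title matches the basic regular
#     expression REGEX
#   - the body of that section, up to, but not including, the next .SH
print_section()
{
    sed -n \
        -e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
        -e '/^\.SH '"$1"'$/p' \
        -e '/^\.SH '"$1"'$/,/^\.SH/{/^\.SH/!p}'
}
```

For example, `print_section 'NAME' <man3/strtol.3` would be expected to print the TH line followed by the NAME section only. The range address in the third expression starts at the matching heading and ends at the next `.SH` line; the `{/^\.SH/!p}` block then prints every line in that range except the two headings, and the second expression is what prints the matching heading itself.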
* [PATCH 3/3] scripts/bash_aliases: man_section(), man_lsfunc(), man_lsvar(): Use mansect(1)
2024-11-02 23:10 ` [PATCH 0/3] Add mansect(1) program and manual page Alejandro Colomar
` (2 preceding siblings ...)
2024-11-02 23:10 ` [PATCH 2/3] src/bin/mansect, mansect.1: Add program and its manual page Alejandro Colomar
@ 2024-11-02 23:10 ` Alejandro Colomar
2024-11-03 1:16 ` [PATCH v2 0/4] Add mansect(1) Alejandro Colomar
4 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-02 23:10 UTC (permalink / raw)
To: linux-man, branden, cjwatson; +Cc: groff, Alejandro Colomar
[-- Attachment #1: Type: text/plain, Size: 1991 bytes --]
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
scripts/bash_aliases | 38 ++++++--------------------------------
1 file changed, 6 insertions(+), 32 deletions(-)
diff --git a/scripts/bash_aliases b/scripts/bash_aliases
index e461707c8..0b0b5e08a 100644
--- a/scripts/bash_aliases
+++ b/scripts/bash_aliases
@@ -40,35 +40,13 @@ sed_rm_ccomments()
 
 # man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
 # ...) of all manual pages in a directory (or in a single manual page file).
-# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';
+# Usage example: .../man-pages$ man_section '\(SYNOPSIS\|SEE ALSO\)' man2/;
 
 man_section()
 {
-	if [ $# -lt 2 ]; then
-		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
-		return $EX_USAGE;
-	fi
-
-	local page="$1";
-	shift;
-	local sect="$*";
-
-	find "$page" -type f \
-	|xargs wc -l \
-	|grep -v -e '\b1 ' -e '\btotal\b' \
-	|awk '{ print $2 }' \
-	|sort \
-	|while read -r manpage; do
-		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
-		for s in $sect; do
-			<"$manpage" \
-			sed -n \
-				-e "/^\.SH $s/p" \
-				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
-		done;) \
-		|mandoc -Tutf8 2>/dev/null \
-		|col -pbx;
-	done;
+	mansect "$@" \
+	| mandoc -Tutf8 2>/dev/null \
+	| col -pbx;
 }
 
 # man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
@@ -83,9 +61,7 @@ man_lsfunc()
 		return $EX_USAGE;
 	fi
 
-	for arg in "$@"; do
-		man_section "$arg" 'SYNOPSIS';
-	done \
+	man_section 'SYNOPSIS' "$@";
 	|sed_rm_ccomments \
 	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
 	|grep '^[0-9]' \
@@ -106,9 +82,7 @@ man_lsvar()
 		return $EX_USAGE;
 	fi
 
-	for arg in "$@"; do
-		man_section "$arg" 'SYNOPSIS';
-	done \
+	man_section 'SYNOPSIS' "$@";
 	|sed_rm_ccomments \
 	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
 	|pcregrep -Mn \
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 0/4] Add mansect(1)
2024-11-02 23:10 ` [PATCH 0/3] Add mansect(1) program and manual page Alejandro Colomar
` (3 preceding siblings ...)
2024-11-02 23:10 ` [PATCH 3/3] scripts/bash_aliases: man_section(), man_lsfunc(), man_lsvar(): Use mansect(1) Alejandro Colomar
@ 2024-11-03 1:16 ` Alejandro Colomar
2024-11-03 1:16 ` [PATCH v2 1/4] src/bin/mansect, mansect.1: Add program and its manual page Alejandro Colomar
` (3 more replies)
4 siblings, 4 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 1:16 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, groff, cjwatson, branden
[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]
Hi Colin, Branden,
I've further optimized the script to be 3x faster, simpler and more
robust. It now also prints the filename in the output, by calling
preconv(1), which is necessary for doing the job that mandb(8) does (see
patch 4/4).
Cheers,
Alex
Alejandro Colomar (4):
  src/bin/mansect, mansect.1: Add program and its manual page
  scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use mansect(1)
  scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use pcre2grep(1) instead of pcregrep(1)
  src/bin/mansect: Preprocess with preconv(1)
man/man1/mansect.1 | 61 ++++++++++++++++++++++++++++++++++++++++++++
scripts/bash_aliases | 51 +++++++-----------------------------
src/bin/mansect | 27 ++++++++++++++++++++
3 files changed, 97 insertions(+), 42 deletions(-)
create mode 100644 man/man1/mansect.1
create mode 100755 src/bin/mansect
Range-diff against v0 (ignoring v1):
-: --------- > 1: 5ccf08a11 src/bin/mansect, mansect.1: Add program and its manual page
-: --------- > 2: ef793bf0a scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use mansect(1)
-: --------- > 3: 0464c22ec scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use pcre2grep(1) instead of pcregrep(1)
-: --------- > 4: 929d1df17 src/bin/mansect: Preprocess with preconv(1)
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v2 1/4] src/bin/mansect, mansect.1: Add program and its manual page
2024-11-03 1:16 ` [PATCH v2 0/4] Add mansect(1) Alejandro Colomar
@ 2024-11-03 1:16 ` Alejandro Colomar
2024-11-03 1:17 ` [PATCH v2 2/4] scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use mansect(1) Alejandro Colomar
` (2 subsequent siblings)
3 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 1:16 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, groff, cjwatson, branden
[-- Attachment #1: Type: text/plain, Size: 2676 bytes --]
Cc: "G. Branden Robinson" <branden@debian.org>
Cc: Colin Watson <cjwatson@debian.org>
Cc: <groff@gnu.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
man/man1/mansect.1 | 61 ++++++++++++++++++++++++++++++++++++++++++++++
src/bin/mansect | 26 ++++++++++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 man/man1/mansect.1
create mode 100755 src/bin/mansect
diff --git a/man/man1/mansect.1 b/man/man1/mansect.1
new file mode 100644
index 000000000..c9e9138e7
--- /dev/null
+++ b/man/man1/mansect.1
@@ -0,0 +1,61 @@
+.\" Copyright 2024, Alejandro Colomar <alx@kernel.org>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH mansect 1 (date) "Linux man-pages (unreleased)"
+.SH NAME
+mansect
+\-
+print the source code of sections of manual pages
+.SH SYNOPSIS
+.B mansect
+.I section
+.RI [ file\~ .\|.\|.]
+.SH DESCRIPTION
+The
+.B mansect
+command prints the source code of the
+.I section
+of the given manual-page files.
+If no files are specified,
+the standard input is used.
+.P
+.I section
+is an extended regular expression.
+.P
+The
+.B TH
+line is unconditionally printed.
+.P
+The output of this program is suitable for piping to the
+.BR groff (1)
+pipeline.
+.SH EXAMPLES
+.EX
+.RB $\~ "man -w strtol strtoul | xargs mansect '\[rs](NAME\[rs]|SEE ALSO\[rs])'"
+\&.TH strtol 3 2024-07-23 "Linux man-pages 6.9.1"
+\&.SH NAME
+strtol, strtoll, strtoq \- convert a string to a long integer
+\&.SH SEE ALSO
+\&.BR atof (3),
+\&.BR atoi (3),
+\&.BR atol (3),
+\&.BR strtod (3),
+\&.BR strtoimax (3),
+\&.BR strtoul (3)
+\&.TH strtoul 3 2024-07-23 "Linux man-pages 6.9.1"
+\&.SH NAME
+strtoul, strtoull, strtouq \- convert a string to an unsigned long integer
+\&.SH SEE ALSO
+\&.BR a64l (3),
+\&.BR atof (3),
+\&.BR atoi (3),
+\&.BR atol (3),
+\&.BR strtod (3),
+\&.BR strtol (3),
+\&.BR strtoumax (3)
+.EE
+.SH SEE ALSO
+.BR lexgrog (1),
+.BR groff (1),
+.BR man (1)
diff --git a/src/bin/mansect b/src/bin/mansect
new file mode 100755
index 000000000..a13a6b534
--- /dev/null
+++ b/src/bin/mansect
@@ -0,0 +1,26 @@
+#!/bin/sh
+#
+# Copyright 2020-2024, Alejandro Colomar <alx@kernel.org>
+# SPDX-License-Identifier: GPL-3.0-or-later
+
+
+if test $# -lt 1; then
+	>&2 printf '%s\n' "$(basename "$0"): error: Too few arguments."
+	return 1;
+fi;
+
+s="$1";
+shift;
+
+
+if test $# -lt 1; then
+	cat;
+else
+	find -L "$@" -not -type d \
+	| xargs grep -l '^\.TH ' \
+	| xargs cat;
+fi \
+| sed -En \
+	-e '/^\.TH /p' \
+	-e '/^\.SH '"$s"'$/p' \
+	-e '/^\.SH '"$s"'$/,/^\.(TH|SH)/{/^\.(TH|SH)/!p}';
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
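Because v2 switches to `sed -E` and concatenates all pages into one stream, the section argument is now an extended regular expression and a `.TH` line also terminates a section. The sketch below isolates that final pipeline stage; print_section_ere is a name made up for this illustration, and a sed(1) supporting -E (GNU or BSD) is assumed. Note that under ERE an alternation would be written `(NAME|SEE ALSO)`; as far as I can tell, the backslashed form still shown in the EXAMPLES section would match literal parentheses and a literal bar instead.

```shell
# print_section_ere REGEX
# Read one or more concatenated man(7) pages on standard input.
# Print every .TH line, plus the heading and body of each section whose
# title matches the extended regular expression REGEX.  A section body
# ends at the next .SH or .TH line, so pages can follow one another.
print_section_ere()
{
    sed -En \
        -e '/^\.TH /p' \
        -e '/^\.SH '"$1"'$/p' \
        -e '/^\.SH '"$1"'$/,/^\.(TH|SH)/{/^\.(TH|SH)/!p}'
}
```

A call such as `cat man3/strtol.3 man3/strtoul.3 | print_section_ere '(NAME|SEE ALSO)'` would be expected to produce output of the same shape as the manual page's example.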
* [PATCH v2 2/4] scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use mansect(1)
2024-11-03 1:16 ` [PATCH v2 0/4] Add mansect(1) Alejandro Colomar
2024-11-03 1:16 ` [PATCH v2 1/4] src/bin/mansect, mansect.1: Add program and its manual page Alejandro Colomar
@ 2024-11-03 1:17 ` Alejandro Colomar
2024-11-03 1:17 ` [PATCH v2 3/4] scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use pcre2grep(1) instead of pcregrep(1) Alejandro Colomar
2024-11-03 1:17 ` [PATCH v2 4/4] src/bin/mansect: Preprocess with preconv(1) Alejandro Colomar
3 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 1:17 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, groff, cjwatson, branden
[-- Attachment #1: Type: text/plain, Size: 2232 bytes --]
Remove the man_section() function, and call the mansect(1) program
instead.
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
scripts/bash_aliases | 45 ++++++--------------------------------------
1 file changed, 6 insertions(+), 39 deletions(-)
diff --git a/scripts/bash_aliases b/scripts/bash_aliases
index e461707c8..25425c389 100644
--- a/scripts/bash_aliases
+++ b/scripts/bash_aliases
@@ -38,39 +38,6 @@ sed_rm_ccomments()
 ########################################################################
 # Linux man-pages
 
-# man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
-# ...) of all manual pages in a directory (or in a single manual page file).
-# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';
-
-man_section()
-{
-	if [ $# -lt 2 ]; then
-		>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
-		return $EX_USAGE;
-	fi
-
-	local page="$1";
-	shift;
-	local sect="$*";
-
-	find "$page" -type f \
-	|xargs wc -l \
-	|grep -v -e '\b1 ' -e '\btotal\b' \
-	|awk '{ print $2 }' \
-	|sort \
-	|while read -r manpage; do
-		(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
-		for s in $sect; do
-			<"$manpage" \
-			sed -n \
-				-e "/^\.SH $s/p" \
-				-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
-		done;) \
-		|mandoc -Tutf8 2>/dev/null \
-		|col -pbx;
-	done;
-}
-
 # man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
 # of all manual pages in a directory (or in a single manual page file).
 # Each name is printed in a separate line
@@ -83,9 +50,9 @@ man_lsfunc()
 		return $EX_USAGE;
 	fi
 
-	for arg in "$@"; do
-		man_section "$arg" 'SYNOPSIS';
-	done \
+	mansect 'SYNOPSIS' "$@" \
+	|mandoc -Tutf8 2>/dev/null \
+	|col -pbx \
 	|sed_rm_ccomments \
 	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
 	|grep '^[0-9]' \
@@ -106,9 +73,9 @@ man_lsvar()
 		return $EX_USAGE;
 	fi
 
-	for arg in "$@"; do
-		man_section "$arg" 'SYNOPSIS';
-	done \
+	mansect 'SYNOPSIS' "$@" \
+	|mandoc -Tutf8 2>/dev/null \
+	|col -pbx \
 	|sed_rm_ccomments \
 	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
 	|pcregrep -Mn \
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 3/4] scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use pcre2grep(1) instead of pcregrep(1)
2024-11-03 1:16 ` [PATCH v2 0/4] Add mansect(1) Alejandro Colomar
2024-11-03 1:16 ` [PATCH v2 1/4] src/bin/mansect, mansect.1: Add program and its manual page Alejandro Colomar
2024-11-03 1:17 ` [PATCH v2 2/4] scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use mansect(1) Alejandro Colomar
@ 2024-11-03 1:17 ` Alejandro Colomar
2024-11-03 1:17 ` [PATCH v2 4/4] src/bin/mansect: Preprocess with preconv(1) Alejandro Colomar
3 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 1:17 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, groff, cjwatson, branden
[-- Attachment #1: Type: text/plain, Size: 1113 bytes --]
pcregrep(1) is obsolete.
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
scripts/bash_aliases | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/scripts/bash_aliases b/scripts/bash_aliases
index 25425c389..98b466410 100644
--- a/scripts/bash_aliases
+++ b/scripts/bash_aliases
@@ -54,7 +54,7 @@ man_lsfunc()
 	|mandoc -Tutf8 2>/dev/null \
 	|col -pbx \
 	|sed_rm_ccomments \
-	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
+	|pcre2grep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
 	|grep '^[0-9]' \
 	|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
 	|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
@@ -77,8 +77,8 @@ man_lsvar()
 	|mandoc -Tutf8 2>/dev/null \
 	|col -pbx \
 	|sed_rm_ccomments \
-	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
-	|pcregrep -Mn \
+	|pcre2grep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
+	|pcre2grep -Mn \
 		-e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
 		-e '^ +extern [\w ]+ \**[\w ]+; *$' \
 	|grep '^[0-9]' \
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 4/4] src/bin/mansect: Preprocess with preconv(1)
2024-11-03 1:16 ` [PATCH v2 0/4] Add mansect(1) Alejandro Colomar
` (2 preceding siblings ...)
2024-11-03 1:17 ` [PATCH v2 3/4] scripts/bash_aliases: man_lsfunc(), man_lsvar(): Use pcre2grep(1) instead of pcregrep(1) Alejandro Colomar
@ 2024-11-03 1:17 ` Alejandro Colomar
3 siblings, 0 replies; 35+ messages in thread
From: Alejandro Colomar @ 2024-11-03 1:17 UTC (permalink / raw)
To: linux-man; +Cc: Alejandro Colomar, groff, cjwatson, branden
[-- Attachment #1: Type: text/plain, Size: 736 bytes --]
This doesn't process the pages in a significant way, and has the benefit
that it writes the name of the pages in the output.
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
src/bin/mansect | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/bin/mansect b/src/bin/mansect
index a13a6b534..e1e83a8d8 100755
--- a/src/bin/mansect
+++ b/src/bin/mansect
@@ -14,13 +14,14 @@ shift;
 
 
 if test $# -lt 1; then
-	cat;
+	preconv;
 else
 	find -L "$@" -not -type d \
 	| xargs grep -l '^\.TH ' \
-	| xargs cat;
+	| xargs preconv;
 fi \
 | sed -En \
+	-e '/^\.lf 1 /p' \
 	-e '/^\.TH /p' \
 	-e '/^\.SH '"$s"'$/p' \
 	-e '/^\.SH '"$s"'$/,/^\.(TH|SH)/{/^\.(TH|SH)/!p}';
--
2.39.5
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related [flat|nested] 35+ messages in thread
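With this change, each page in the output is preceded by a `.lf 1 <file>` request emitted by preconv(1), which is what lets a consumer tell which file a block came from. A minimal sketch of recovering those names follows; list_pages is a hypothetical helper, not part of the series, and it assumes the `.lf 1 ` prefix that the added sed expression matches.

```shell
# list_pages
# Read mansect-style output on standard input and print the file name
# recorded in each ".lf 1 <file>" request, one per line.
list_pages()
{
    sed -n 's/^\.lf 1 //p'
}
```

For instance, `mansect 'NAME' man3/ | list_pages` would be expected to list each page that contributed output, which is the per-file information a mandb(8)-like consumer needs.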