* [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
@ 2024-10-15 21:17 Ian Rogers
2024-10-15 21:17 ` [PATCH v2 2/3] proc_pid_fdinfo.5: Add subsection headers for different fd types Ian Rogers
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Ian Rogers @ 2024-10-15 21:17 UTC (permalink / raw)
To: Alejandro Colomar, G . Branden Robinson
Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc,
linux-kernel, linux-man, Ian Rogers
When /proc/pid/fdinfo was part of the proc.5 man page, the indentation made
sense. As a standalone man page, the indentation doesn't need to be so
far over to the right. Remove the initial tagged paragraph and move the
styling to the initial summary description.
Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
---
man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 34 deletions(-)
diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
index 1e23bbe02..8678caf4a 100644
--- a/man/man5/proc_pid_fdinfo.5
+++ b/man/man5/proc_pid_fdinfo.5
@@ -6,20 +6,19 @@
.\"
.TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
.SH NAME
-/proc/pid/fdinfo/ \- information about file descriptors
+.IR /proc/ pid /fdinfo " \- information about file descriptors"
.SH DESCRIPTION
-.TP
-.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
-This is a subdirectory containing one entry for each file which the
-process has open, named by its file descriptor.
-The files in this directory are readable only by the owner of the process.
-The contents of each file can be read to obtain information
-about the corresponding file descriptor.
-The content depends on the type of file referred to by the
-corresponding file descriptor.
-.IP
+Since Linux 2.6.22,
+this subdirectory contains one entry for each file that process
+.I pid
+has open, named by its file descriptor. The files in this directory
+are readable only by the owner of the process. The contents of each
+file can be read to obtain information about the corresponding file
+descriptor. The content depends on the type of file referred to by
+the corresponding file descriptor.
+.P
For regular files and directories, we see something like:
-.IP
+.P
.in +4n
.EX
.RB "$" " cat /proc/12015/fdinfo/4"
@@ -28,7 +27,7 @@ flags: 01002002
mnt_id: 21
.EE
.in
-.IP
+.P
The fields are as follows:
.RS
.TP
@@ -51,7 +50,6 @@ this field incorrectly displayed the setting of
at the time the file was opened,
rather than the current setting of the close-on-exec flag.
.TP
-.I
.I mnt_id
This field, present since Linux 3.15,
.\" commit 49d063cb353265c3af701bab215ac438ca7df36d
@@ -59,13 +57,13 @@ is the ID of the mount containing this file.
See the description of
.IR /proc/ pid /mountinfo .
.RE
-.IP
+.P
For eventfd file descriptors (see
.BR eventfd (2)),
we see (since Linux 3.8)
.\" commit cbac5542d48127b546a23d816380a7926eee1c25
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -74,16 +72,16 @@ mnt_id: 10
eventfd\-count: 40
.EE
.in
-.IP
+.P
.I eventfd\-count
is the current value of the eventfd counter, in hexadecimal.
-.IP
+.P
For epoll file descriptors (see
.BR epoll (7)),
we see (since Linux 3.8)
.\" commit 138d22b58696c506799f8de759804083ff9effae
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -93,7 +91,7 @@ tfd: 9 events: 19 data: 74253d2500000009
tfd: 7 events: 19 data: 74253d2500000007
.EE
.in
-.IP
+.P
Each of the lines beginning
.I tfd
describes one of the file descriptors being monitored via
@@ -110,13 +108,13 @@ descriptor.
The
.I data
field is the data value associated with this file descriptor.
-.IP
+.P
For signalfd file descriptors (see
.BR signalfd (2)),
we see (since Linux 3.8)
.\" commit 138d22b58696c506799f8de759804083ff9effae
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -125,7 +123,7 @@ mnt_id: 10
sigmask: 0000000000000006
.EE
.in
-.IP
+.P
.I sigmask
is the hexadecimal mask of signals that are accepted via this
signalfd file descriptor.
@@ -135,12 +133,12 @@ and
.BR SIGQUIT ;
see
.BR signal (7).)
-.IP
+.P
For inotify file descriptors (see
.BR inotify (7)),
we see (since Linux 3.8)
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -150,7 +148,7 @@ inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8
inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:27261900802dfd73
.EE
.in
-.IP
+.P
Each of the lines beginning with "inotify" displays information about
one file or directory that is being monitored.
The fields in this line are as follows:
@@ -168,19 +166,19 @@ The ID of the device where the target file resides (in hexadecimal).
.I mask
The mask of events being monitored for the target file (in hexadecimal).
.RE
-.IP
+.P
If the kernel was built with exportfs support, the path to the target
file is exposed as a file handle, via three hexadecimal fields:
.IR fhandle\-bytes ,
.IR fhandle\-type ,
and
.IR f_handle .
-.IP
+.P
For fanotify file descriptors (see
.BR fanotify (7)),
we see (since Linux 3.8)
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
@@ -190,7 +188,7 @@ fanotify flags:0 event\-flags:88002
fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:4f261900a82dfd73
.EE
.in
-.IP
+.P
The fourth line displays information defined when the fanotify group
was created via
.BR fanotify_init (2):
@@ -210,7 +208,7 @@ argument given to
.BR fanotify_init (2)
(expressed in hexadecimal).
.RE
-.IP
+.P
Each additional line shown in the file contains information
about one of the marks in the fanotify group.
Most of these fields are as for inotify, except:
@@ -228,16 +226,16 @@ The events mask for this mark
The mask of events that are ignored for this mark
(expressed in hexadecimal).
.RE
-.IP
+.P
For details on these fields, see
.BR fanotify_mark (2).
-.IP
+.P
For timerfd file descriptors (see
.BR timerfd (2)),
we see (since Linux 3.17)
.\" commit af9c4957cf212ad9cf0bee34c95cb11de5426e85
the following fields:
-.IP
+.P
.in +4n
.EX
pos: 0
--
2.47.0.rc1.288.g06298d1525-goog
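
As an illustration of the plain "key: value" fields shown in the page
examples above, here is a minimal Python sketch of how a reader might
decode them. It is not part of the patch. The helper names parse_fdinfo
and decode_sigmask are hypothetical, and the sample text is made up to
resemble the excerpts in the page. It assumes that flags is printed in
octal, that eventfd-count and sigmask are hexadecimal, and that bit n-1
of sigmask corresponds to signal n. Multi-field lines such as the
"tfd:" and "inotify" lines are not handled.

```python
def parse_fdinfo(text):
    """Collect single-valued 'key: value' lines of an fdinfo file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon and lines such as "inotify wd:2 ...".
        if sep and key and " " not in key:
            fields[key] = value.strip()
    return fields


def decode_sigmask(hex_mask):
    """Return the signal numbers set in a signalfd sigmask (bit n-1 is signal n)."""
    mask = int(hex_mask, 16)
    return [n + 1 for n in range(mask.bit_length()) if (mask >> n) & 1]


# Illustrative sample only; real files live under /proc/<pid>/fdinfo/<fd>.
sample = "pos:\t0\nflags:\t02\nmnt_id:\t10\nsigmask:\t0000000000000006\n"

info = parse_fdinfo(sample)
flags = int(info["flags"], 8)            # flags is an octal number
signals = decode_sigmask(info["sigmask"])  # signal numbers, see signal(7)
print(flags, signals)
```

On Linux, signal numbers 2 and 3 are SIGINT and SIGQUIT, which matches
the sigmask example in the page.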
* [PATCH v2 2/3] proc_pid_fdinfo.5: Add subsection headers for different fd types
2024-10-15 21:17 [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page Ian Rogers
@ 2024-10-15 21:17 ` Ian Rogers
2024-10-15 21:17 ` [PATCH v2 3/3] proc_pid_fdinfo.5: Add DRM subsection Ian Rogers
2024-11-01 13:24 ` [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page Alejandro Colomar
2 siblings, 0 replies; 20+ messages in thread
From: Ian Rogers @ 2024-10-15 21:17 UTC (permalink / raw)
To: Alejandro Colomar, G . Branden Robinson
Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc,
linux-kernel, linux-man, Ian Rogers
Separate the sections about eventfd, epoll, signalfd, inotify, fanotify,
and timerfd more clearly by giving each one a subsection header.
Signed-off-by: Ian Rogers <irogers@google.com>
---
man/man5/proc_pid_fdinfo.5 | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
index 8678caf4a..02eceac04 100644
--- a/man/man5/proc_pid_fdinfo.5
+++ b/man/man5/proc_pid_fdinfo.5
@@ -57,6 +57,7 @@ is the ID of the mount containing this file.
See the description of
.IR /proc/ pid /mountinfo .
.RE
+.SS eventfd
.P
For eventfd file descriptors (see
.BR eventfd (2)),
@@ -75,6 +76,7 @@ eventfd\-count: 40
.P
.I eventfd\-count
is the current value of the eventfd counter, in hexadecimal.
+.SS epoll
.P
For epoll file descriptors (see
.BR epoll (7)),
@@ -108,6 +110,7 @@ descriptor.
The
.I data
field is the data value associated with this file descriptor.
+.SS signalfd
.P
For signalfd file descriptors (see
.BR signalfd (2)),
@@ -133,6 +136,7 @@ and
.BR SIGQUIT ;
see
.BR signal (7).)
+.SS inotify
.P
For inotify file descriptors (see
.BR inotify (7)),
@@ -173,6 +177,7 @@ file is exposed as a file handle, via three hexadecimal fields:
.IR fhandle\-type ,
and
.IR f_handle .
+.SS fanotify
.P
For fanotify file descriptors (see
.BR fanotify (7)),
@@ -229,6 +234,7 @@ The mask of events that are ignored for this mark
.P
For details on these fields, see
.BR fanotify_mark (2).
+.SS timerfd
.P
For timerfd file descriptors (see
.BR timerfd (2)),
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v2 3/3] proc_pid_fdinfo.5: Add DRM subsection
2024-10-15 21:17 [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page Ian Rogers
2024-10-15 21:17 ` [PATCH v2 2/3] proc_pid_fdinfo.5: Add subsection headers for different fd types Ian Rogers
@ 2024-10-15 21:17 ` Ian Rogers
2024-11-01 13:24 ` [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page Alejandro Colomar
2 siblings, 0 replies; 20+ messages in thread
From: Ian Rogers @ 2024-10-15 21:17 UTC (permalink / raw)
To: Alejandro Colomar, G . Branden Robinson
Cc: David Airlie, Simona Vetter, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, Jonathan Corbet, dri-devel, linux-doc,
linux-kernel, linux-man, Ian Rogers
Add a description of the DRM fdinfo information based on the Linux kernel's
`Documentation/gpu/drm-usage-stats.rst`:
https://docs.kernel.org/gpu/drm-usage-stats.html
Signed-off-by: Ian Rogers <irogers@google.com>
---
man/man5/proc_pid_fdinfo.5 | 94 ++++++++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)
diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
index 02eceac04..bb6c07527 100644
--- a/man/man5/proc_pid_fdinfo.5
+++ b/man/man5/proc_pid_fdinfo.5
@@ -300,5 +300,99 @@ fields contain the values that
.BR timerfd_gettime (2)
on this file descriptor would return.)
.RE
+.SS Direct Rendering Manager
+.P
+DRM drivers can optionally expose usage stats through
+/proc/pid/fdinfo/. For example:
+.P
+.in +4n
+.EX
+pos: 0
+flags: 02100002
+mnt_id: 26
+ino: 284
+drm-driver: i915
+drm-client-id: 39
+drm-pdev: 0000:00:02.0
+drm-total-system0: 6044 KiB
+drm-shared-system0: 0
+drm-active-system0: 0
+drm-resident-system0: 6044 KiB
+drm-purgeable-system0: 1688 KiB
+drm-total-stolen-system0: 0
+drm-shared-stolen-system0: 0
+drm-active-stolen-system0: 0
+drm-resident-stolen-system0: 0
+drm-purgeable-stolen-system0: 0
+drm-engine-render: 346249 ns
+drm-engine-copy: 0 ns
+drm-engine-video: 0 ns
+drm-engine-capacity-video: 2
+drm-engine-video-enhance: 0 ns
+.EE
+.TP
+.IR drm-driver: " .+ (mandatory)"
+The name this driver registered.
+.TP
+.IR drm-pdev: " <aaaa:bb:cc.d>"
+For PCI devices this should contain the PCI slot address of the device
+in question.
+.TP
+.IR drm-client-id: " [0-9]+"
+Unique value relating to the open DRM file descriptor used to
+distinguish duplicated and shared file descriptors.
+.P
+GPUs usually contain multiple execution engines. Each shall be given a
+stable and unique name (<engine_name>), with possible values
+documented in the driver specific documentation.
+.TP
+.IR drm-engine-<engine_name>: " [0-9]+ ns"
+GPU engine utilization: the time spent busy executing workloads for this client.
+.TP
+.IR drm-engine-capacity-<engine_name>: " [0-9]+"
+Capacity of the engine if it is not 1; the value cannot be 0.
+.TP
+.IR drm-cycles-<engine_name>: " [0-9]+"
+Contains the number of busy cycles for the given engine. Values are
+not required to be constantly monotonic, but are required to catch up
+with the previously reported larger value within a reasonable
+period. Upon observing a value lower than what was previously read,
+userspace is expected to stay with that larger previous value until a
+monotonic update is seen.
+.TP
+.IR drm-total-cycles-<engine_name>: " [0-9]+"
+Contains the total number of cycles for the given engine. This is a
+timestamp in an unspecified GPU unit that matches the update rate of
+drm-cycles-<engine_name>. For drivers that implement this interface,
+the engine utilization can be calculated entirely on the GPU clock
+domain, without considering the CPU sleep time between 2 samples.
+.P
+Each possible memory type which can be used to store buffer objects by
+the GPU in question shall be given a stable and unique name <region>.
+The name "memory" is reserved to refer to normal system memory.
+.TP
+.IR drm-memory-<region>: " [0-9]+ [KiB|MiB]"
+The amount of storage currently consumed by the buffer objects belonging
+to this client, in the respective memory region.
+.IP
+Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
+indicating kibi- or mebi-bytes.
+.TP
+.IR drm-shared-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are shared with another file (e.g., have more
+than a single handle).
+.TP
+.IR drm-total-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers, including shared and private memory.
+.TP
+.IR drm-resident-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are resident in the specified region.
+.TP
+.IR drm-purgeable-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are purgeable.
+.TP
+.IR drm-active-<region>: " [0-9]+ [KiB|MiB]"
+The total size of buffers that are active on one or more engines.
+
.SH SEE ALSO
.BR proc (5)
--
2.47.0.rc1.288.g06298d1525-goog
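
As an illustration of the DRM keys described in the patch above, here is
a minimal Python sketch of how userspace might read them. It is not
part of the patch. The names parse_drm, clamp_monotonic and utilization
are hypothetical, the regular expressions are assumptions derived from
the key patterns listed in the page, and dividing by the engine
capacity when computing utilization is also an assumption. Sizes
without a unit are taken to be bytes, as the page states.

```python
import re

# Patterns derived from the key formats in the page (an assumption).
_ENGINE_RE = re.compile(r"^drm-engine-([^:\s]+):\s*(\d+) ns$")
_SIZE_RE = re.compile(
    r"^drm-(memory|total|shared|active|resident|purgeable)-([^:\s]+):"
    r"\s*(\d+)(?: (KiB|MiB))?$"
)
_UNITS = {None: 1, "KiB": 1024, "MiB": 1024 * 1024}


def parse_drm(text):
    """Return (engine busy time in ns, buffer sizes in bytes) from fdinfo text."""
    engines, memory = {}, {}
    for line in text.splitlines():
        m = _ENGINE_RE.match(line)
        if m:
            engines[m.group(1)] = int(m.group(2))
            continue
        m = _SIZE_RE.match(line)
        if m:
            kind, region, value, unit = m.groups()
            memory[(kind, region)] = int(value) * _UNITS[unit]
    return engines, memory


def clamp_monotonic(previous, current):
    """Keep the larger previous value until a monotonic update is seen."""
    return max(previous, current)


def utilization(busy_before_ns, busy_after_ns, wall_ns, capacity=1):
    """Fraction of the sampling interval that the engine was busy for this client."""
    return (busy_after_ns - busy_before_ns) / (wall_ns * capacity)


sample = (
    "drm-driver: i915\n"
    "drm-total-system0: 6044 KiB\n"
    "drm-shared-system0: 0\n"
    "drm-engine-render: 346249 ns\n"
    "drm-engine-capacity-video: 2\n"
    "drm-engine-video-enhance: 0 ns\n"
)
engines, memory = parse_drm(sample)
print(engines, memory)
```

Sampling the same file twice and passing the two drm-engine values and
the elapsed wall-clock time to utilization() gives a per-client busy
fraction for that engine.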
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-10-15 21:17 [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page Ian Rogers
2024-10-15 21:17 ` [PATCH v2 2/3] proc_pid_fdinfo.5: Add subsection headers for different fd types Ian Rogers
2024-10-15 21:17 ` [PATCH v2 3/3] proc_pid_fdinfo.5: Add DRM subsection Ian Rogers
@ 2024-11-01 13:24 ` Alejandro Colomar
2024-11-01 18:19 ` Ian Rogers
2 siblings, 1 reply; 20+ messages in thread
From: Alejandro Colomar @ 2024-11-01 13:24 UTC (permalink / raw)
To: Ian Rogers
Cc: G . Branden Robinson, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man
On Tue, Oct 15, 2024 at 02:17:17PM -0700, Ian Rogers wrote:
> When /proc/pid/fdinfo was part of proc.5 man page the indentation made
> sense. As a standalone man page the indentation doesn't need to be so
> far over to the right. Remove the initial tagged paragraph and move the
> styling to the initial summary description.
>
> Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
> 1 file changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
> index 1e23bbe02..8678caf4a 100644
> --- a/man/man5/proc_pid_fdinfo.5
> +++ b/man/man5/proc_pid_fdinfo.5
> @@ -6,20 +6,19 @@
> .\"
> .TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
> .SH NAME
> -/proc/pid/fdinfo/ \- information about file descriptors
> +.IR /proc/ pid /fdinfo " \- information about file descriptors"
I wouldn't add formatting here for now. That's something I prefer to be
cautious about, and if we do it, we should do it in a separate commit.
> .SH DESCRIPTION
> -.TP
> -.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
> -This is a subdirectory containing one entry for each file which the
> -process has open, named by its file descriptor.
> -The files in this directory are readable only by the owner of the process.
> -The contents of each file can be read to obtain information
> -about the corresponding file descriptor.
> -The content depends on the type of file referred to by the
> -corresponding file descriptor.
> -.IP
> +Since Linux 2.6.22,
You could move this information to a HISTORY section.
> +this subdirectory contains one entry for each file that process
> +.I pid
> +has open, named by its file descriptor. The files in this directory
Please don't reflow existing text. Please read about semantic newlines
in man-pages(7):
$ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
Use semantic newlines
In the source of a manual page, new sentences should be started
on new lines, long sentences should be split into lines at clause
breaks (commas, semicolons, colons, and so on), and long clauses
should be split at phrase boundaries. This convention, sometimes
known as "semantic newlines", makes it easier to see the effect
of patches, which often operate at the level of individual sen‐
tences, clauses, or phrases.
Have a lovely day!
Alex
> +are readable only by the owner of the process. The contents of each
> +file can be read to obtain information about the corresponding file
> +descriptor. The content depends on the type of file referred to by
> +the corresponding file descriptor.
> +.P
> For regular files and directories, we see something like:
> -.IP
> +.P
> .in +4n
> .EX
> .RB "$" " cat /proc/12015/fdinfo/4"
> @@ -28,7 +27,7 @@ flags: 01002002
> mnt_id: 21
> .EE
> .in
> -.IP
> +.P
> The fields are as follows:
> .RS
> .TP
> @@ -51,7 +50,6 @@ this field incorrectly displayed the setting of
> at the time the file was opened,
> rather than the current setting of the close-on-exec flag.
> .TP
> -.I
> .I mnt_id
> This field, present since Linux 3.15,
> .\" commit 49d063cb353265c3af701bab215ac438ca7df36d
> @@ -59,13 +57,13 @@ is the ID of the mount containing this file.
> See the description of
> .IR /proc/ pid /mountinfo .
> .RE
> -.IP
> +.P
> For eventfd file descriptors (see
> .BR eventfd (2)),
> we see (since Linux 3.8)
> .\" commit cbac5542d48127b546a23d816380a7926eee1c25
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -74,16 +72,16 @@ mnt_id: 10
> eventfd\-count: 40
> .EE
> .in
> -.IP
> +.P
> .I eventfd\-count
> is the current value of the eventfd counter, in hexadecimal.
> -.IP
> +.P
> For epoll file descriptors (see
> .BR epoll (7)),
> we see (since Linux 3.8)
> .\" commit 138d22b58696c506799f8de759804083ff9effae
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -93,7 +91,7 @@ tfd: 9 events: 19 data: 74253d2500000009
> tfd: 7 events: 19 data: 74253d2500000007
> .EE
> .in
> -.IP
> +.P
> Each of the lines beginning
> .I tfd
> describes one of the file descriptors being monitored via
> @@ -110,13 +108,13 @@ descriptor.
> The
> .I data
> field is the data value associated with this file descriptor.
> -.IP
> +.P
> For signalfd file descriptors (see
> .BR signalfd (2)),
> we see (since Linux 3.8)
> .\" commit 138d22b58696c506799f8de759804083ff9effae
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -125,7 +123,7 @@ mnt_id: 10
> sigmask: 0000000000000006
> .EE
> .in
> -.IP
> +.P
> .I sigmask
> is the hexadecimal mask of signals that are accepted via this
> signalfd file descriptor.
> @@ -135,12 +133,12 @@ and
> .BR SIGQUIT ;
> see
> .BR signal (7).)
> -.IP
> +.P
> For inotify file descriptors (see
> .BR inotify (7)),
> we see (since Linux 3.8)
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -150,7 +148,7 @@ inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8
> inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:27261900802dfd73
> .EE
> .in
> -.IP
> +.P
> Each of the lines beginning with "inotify" displays information about
> one file or directory that is being monitored.
> The fields in this line are as follows:
> @@ -168,19 +166,19 @@ The ID of the device where the target file resides (in hexadecimal).
> .I mask
> The mask of events being monitored for the target file (in hexadecimal).
> .RE
> -.IP
> +.P
> If the kernel was built with exportfs support, the path to the target
> file is exposed as a file handle, via three hexadecimal fields:
> .IR fhandle\-bytes ,
> .IR fhandle\-type ,
> and
> .IR f_handle .
> -.IP
> +.P
> For fanotify file descriptors (see
> .BR fanotify (7)),
> we see (since Linux 3.8)
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> @@ -190,7 +188,7 @@ fanotify flags:0 event\-flags:88002
> fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:4f261900a82dfd73
> .EE
> .in
> -.IP
> +.P
> The fourth line displays information defined when the fanotify group
> was created via
> .BR fanotify_init (2):
> @@ -210,7 +208,7 @@ argument given to
> .BR fanotify_init (2)
> (expressed in hexadecimal).
> .RE
> -.IP
> +.P
> Each additional line shown in the file contains information
> about one of the marks in the fanotify group.
> Most of these fields are as for inotify, except:
> @@ -228,16 +226,16 @@ The events mask for this mark
> The mask of events that are ignored for this mark
> (expressed in hexadecimal).
> .RE
> -.IP
> +.P
> For details on these fields, see
> .BR fanotify_mark (2).
> -.IP
> +.P
> For timerfd file descriptors (see
> .BR timerfd (2)),
> we see (since Linux 3.17)
> .\" commit af9c4957cf212ad9cf0bee34c95cb11de5426e85
> the following fields:
> -.IP
> +.P
> .in +4n
> .EX
> pos: 0
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
--
<https://www.alejandro-colomar.es/>
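
As an illustration of the semantic-newlines convention quoted from
man-pages(7) in this message, here is a minimal Python sketch that flags
source lines where a new sentence starts in the middle of a line. It is
a rough heuristic, not a tool used by the man-pages project. The name
find_reflowed_lines is hypothetical, and the check will report false
positives for abbreviations followed by a capitalized word.

```python
import re

# A sentence end followed by the start of another sentence on the same line.
_MID_LINE_SENTENCE = re.compile(r"[.!?]\s+[A-Z]")


def find_reflowed_lines(source):
    """Return 1-based numbers of text lines that start a sentence mid-line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Lines starting with "." or "'" are roff requests, not running text.
        if line.startswith((".", "'")):
            continue
        if _MID_LINE_SENTENCE.search(line):
            hits.append(number)
    return hits


# The reflowed text from the patch under review.
sample = (
    "Since Linux 2.6.22,\n"
    "this subdirectory contains one entry for each file that process\n"
    ".I pid\n"
    "has open, named by its file descriptor. The files in this directory\n"
    "are readable only by the owner of the process. The contents of each\n"
)
reflowed = find_reflowed_lines(sample)
print(reflowed)
```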
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-01 13:24 ` [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page Alejandro Colomar
@ 2024-11-01 18:19 ` Ian Rogers
2024-11-01 20:07 ` Alejandro Colomar
0 siblings, 1 reply; 20+ messages in thread
From: Ian Rogers @ 2024-11-01 18:19 UTC (permalink / raw)
To: Alejandro Colomar
Cc: G . Branden Robinson, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man
On Fri, Nov 1, 2024 at 6:24 AM Alejandro Colomar <alx@kernel.org> wrote:
>
> On Tue, Oct 15, 2024 at 02:17:17PM -0700, Ian Rogers wrote:
> > When /proc/pid/fdinfo was part of proc.5 man page the indentation made
> > sense. As a standalone man page the indentation doesn't need to be so
> > far over to the right. Remove the initial tagged paragraph and move the
> > styling to the initial summary description.
> >
> > Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
> > 1 file changed, 32 insertions(+), 34 deletions(-)
> >
> > diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
> > index 1e23bbe02..8678caf4a 100644
> > --- a/man/man5/proc_pid_fdinfo.5
> > +++ b/man/man5/proc_pid_fdinfo.5
> > @@ -6,20 +6,19 @@
> > .\"
> > .TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
> > .SH NAME
> > -/proc/pid/fdinfo/ \- information about file descriptors
> > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
>
> I wouldn't add formatting here for now. That's something I prefer to be
> cautious about, and if we do it, we should do it in a separate commit.
I'll move it to a separate patch. Is the caution due to a lack of test
infrastructure? That could be something to get resolved, perhaps
through Google summer-of-code and the like.
> > .SH DESCRIPTION
> > -.TP
> > -.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
> > -This is a subdirectory containing one entry for each file which the
> > -process has open, named by its file descriptor.
> > -The files in this directory are readable only by the owner of the process.
> > -The contents of each file can be read to obtain information
> > -about the corresponding file descriptor.
> > -The content depends on the type of file referred to by the
> > -corresponding file descriptor.
> > -.IP
> > +Since Linux 2.6.22,
>
> You could move this information to a HISTORY section.
Sure, tbh I'm not sure anybody cares about this information and it
could be as well to delete it. Sorry people running 17 year old
kernels. For now I'll try to leave it unchanged.
> > +this subdirectory contains one entry for each file that process
> > +.I pid
> > +has open, named by its file descriptor. The files in this directory
>
> Please don't reflow existing text. Please read about semantic newlines
> in man-pages(7):
>
> $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
> Use semantic newlines
> In the source of a manual page, new sentences should be started
> on new lines, long sentences should be split into lines at clause
> breaks (commas, semicolons, colons, and so on), and long clauses
> should be split at phrase boundaries. This convention, sometimes
> known as "semantic newlines", makes it easier to see the effect
> of patches, which often operate at the level of individual sen‐
> tences, clauses, or phrases.
I'll update for v3 but I'm reminded of `git diff --word-diff=color` so
perhaps this recommendation is outdated.
Thanks,
Ian
> Have a lovely day!
> Alex
>
> > +are readable only by the owner of the process. The contents of each
> > +file can be read to obtain information about the corresponding file
> > +descriptor. The content depends on the type of file referred to by
> > +the corresponding file descriptor.
> > +.P
> > For regular files and directories, we see something like:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > .RB "$" " cat /proc/12015/fdinfo/4"
> > @@ -28,7 +27,7 @@ flags: 01002002
> > mnt_id: 21
> > .EE
> > .in
> > -.IP
> > +.P
> > The fields are as follows:
> > .RS
> > .TP
> > @@ -51,7 +50,6 @@ this field incorrectly displayed the setting of
> > at the time the file was opened,
> > rather than the current setting of the close-on-exec flag.
> > .TP
> > -.I
> > .I mnt_id
> > This field, present since Linux 3.15,
> > .\" commit 49d063cb353265c3af701bab215ac438ca7df36d
> > @@ -59,13 +57,13 @@ is the ID of the mount containing this file.
> > See the description of
> > .IR /proc/ pid /mountinfo .
> > .RE
> > -.IP
> > +.P
> > For eventfd file descriptors (see
> > .BR eventfd (2)),
> > we see (since Linux 3.8)
> > .\" commit cbac5542d48127b546a23d816380a7926eee1c25
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -74,16 +72,16 @@ mnt_id: 10
> > eventfd\-count: 40
> > .EE
> > .in
> > -.IP
> > +.P
> > .I eventfd\-count
> > is the current value of the eventfd counter, in hexadecimal.
> > -.IP
> > +.P
> > For epoll file descriptors (see
> > .BR epoll (7)),
> > we see (since Linux 3.8)
> > .\" commit 138d22b58696c506799f8de759804083ff9effae
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -93,7 +91,7 @@ tfd: 9 events: 19 data: 74253d2500000009
> > tfd: 7 events: 19 data: 74253d2500000007
> > .EE
> > .in
> > -.IP
> > +.P
> > Each of the lines beginning
> > .I tfd
> > describes one of the file descriptors being monitored via
> > @@ -110,13 +108,13 @@ descriptor.
> > The
> > .I data
> > field is the data value associated with this file descriptor.
> > -.IP
> > +.P
> > For signalfd file descriptors (see
> > .BR signalfd (2)),
> > we see (since Linux 3.8)
> > .\" commit 138d22b58696c506799f8de759804083ff9effae
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -125,7 +123,7 @@ mnt_id: 10
> > sigmask: 0000000000000006
> > .EE
> > .in
> > -.IP
> > +.P
> > .I sigmask
> > is the hexadecimal mask of signals that are accepted via this
> > signalfd file descriptor.
> > @@ -135,12 +133,12 @@ and
> > .BR SIGQUIT ;
> > see
> > .BR signal (7).)
> > -.IP
> > +.P
> > For inotify file descriptors (see
> > .BR inotify (7)),
> > we see (since Linux 3.8)
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -150,7 +148,7 @@ inotify wd:2 ino:7ef82a sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8
> > inotify wd:1 ino:192627 sdev:800001 mask:800afff ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:27261900802dfd73
> > .EE
> > .in
> > -.IP
> > +.P
> > Each of the lines beginning with "inotify" displays information about
> > one file or directory that is being monitored.
> > The fields in this line are as follows:
> > @@ -168,19 +166,19 @@ The ID of the device where the target file resides (in hexadecimal).
> > .I mask
> > The mask of events being monitored for the target file (in hexadecimal).
> > .RE
> > -.IP
> > +.P
> > If the kernel was built with exportfs support, the path to the target
> > file is exposed as a file handle, via three hexadecimal fields:
> > .IR fhandle\-bytes ,
> > .IR fhandle\-type ,
> > and
> > .IR f_handle .
> > -.IP
> > +.P
> > For fanotify file descriptors (see
> > .BR fanotify (7)),
> > we see (since Linux 3.8)
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > @@ -190,7 +188,7 @@ fanotify flags:0 event\-flags:88002
> > fanotify ino:19264f sdev:800001 mflags:0 mask:1 ignored_mask:0 fhandle\-bytes:8 fhandle\-type:1 f_handle:4f261900a82dfd73
> > .EE
> > .in
> > -.IP
> > +.P
> > The fourth line displays information defined when the fanotify group
> > was created via
> > .BR fanotify_init (2):
> > @@ -210,7 +208,7 @@ argument given to
> > .BR fanotify_init (2)
> > (expressed in hexadecimal).
> > .RE
> > -.IP
> > +.P
> > Each additional line shown in the file contains information
> > about one of the marks in the fanotify group.
> > Most of these fields are as for inotify, except:
> > @@ -228,16 +226,16 @@ The events mask for this mark
> > The mask of events that are ignored for this mark
> > (expressed in hexadecimal).
> > .RE
> > -.IP
> > +.P
> > For details on these fields, see
> > .BR fanotify_mark (2).
> > -.IP
> > +.P
> > For timerfd file descriptors (see
> > .BR timerfd (2)),
> > we see (since Linux 3.17)
> > .\" commit af9c4957cf212ad9cf0bee34c95cb11de5426e85
> > the following fields:
> > -.IP
> > +.P
> > .in +4n
> > .EX
> > pos: 0
> > --
> > 2.47.0.rc1.288.g06298d1525-goog
> >
>
> --
> <https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-01 18:19 ` Ian Rogers
@ 2024-11-01 20:07 ` Alejandro Colomar
2024-11-02 10:08 ` G. Branden Robinson
0 siblings, 1 reply; 20+ messages in thread
From: Alejandro Colomar @ 2024-11-01 20:07 UTC (permalink / raw)
To: Ian Rogers
Cc: G . Branden Robinson, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man
Hi Ian,
On Fri, Nov 01, 2024 at 11:19:18AM -0700, Ian Rogers wrote:
> On Fri, Nov 1, 2024 at 6:24 AM Alejandro Colomar <alx@kernel.org> wrote:
> >
> > On Tue, Oct 15, 2024 at 02:17:17PM -0700, Ian Rogers wrote:
> > > When /proc/pid/fdinfo was part of proc.5 man page the indentation made
> > > sense. As a standalone man page the indentation doesn't need to be so
> > > far over to the right. Remove the initial tagged paragraph and move the
> > > styling to the initial summary description.
> > >
> > > Suggested-by: G. Branden Robinson <g.branden.robinson@gmail.com>
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > > man/man5/proc_pid_fdinfo.5 | 66 ++++++++++++++++++--------------------
> > > 1 file changed, 32 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/man/man5/proc_pid_fdinfo.5 b/man/man5/proc_pid_fdinfo.5
> > > index 1e23bbe02..8678caf4a 100644
> > > --- a/man/man5/proc_pid_fdinfo.5
> > > +++ b/man/man5/proc_pid_fdinfo.5
> > > @@ -6,20 +6,19 @@
> > > .\"
> > > .TH proc_pid_fdinfo 5 (date) "Linux man-pages (unreleased)"
> > > .SH NAME
> > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> >
> > I wouldn't add formatting here for now. That's something I prefer to be
> > cautious about, and if we do it, we should do it in a separate commit.
>
> I'll move it to a separate patch. Is the caution due to a lack of test
> infrastructure? That could be something to get resolved, perhaps
> through Google summer-of-code and the like.
That change might be controversial. We'd first need to check that all
software that reads the NAME section would behave well for this.
Also, many other pages might need to be changed accordingly for
consistency.
For testing infrastructure I think we're good. The makefile already
does a lot of testing.
>
> > > .SH DESCRIPTION
> > > -.TP
> > > -.IR /proc/ pid /fdinfo/ " (since Linux 2.6.22)"
> > > -This is a subdirectory containing one entry for each file which the
> > > -process has open, named by its file descriptor.
> > > -The files in this directory are readable only by the owner of the process.
> > > -The contents of each file can be read to obtain information
> > > -about the corresponding file descriptor.
> > > -The content depends on the type of file referred to by the
> > > -corresponding file descriptor.
> > > -.IP
> > > +Since Linux 2.6.22,
> >
> > You could move this information to a HISTORY section.
>
> Sure, tbh I'm not sure anybody cares about this information and it
> could be as well to delete it. Sorry people running 17 year old
> kernels. For now I'll try to leave it unchanged.
I would like to keep it in HISTORY. You never know when it'll be useful
and it's just one line or a few; it won't hurt.
>
> > > +this subdirectory contains one entry for each file that process
> > > +.I pid
> > > +has open, named by its file descriptor. The files in this directory
> >
> > Please don't reflow existing text. Please read about semantic newlines
> > in man-pages(7):
> >
> > $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
> > Use semantic newlines
> > In the source of a manual page, new sentences should be started
> > on new lines, long sentences should be split into lines at clause
> > breaks (commas, semicolons, colons, and so on), and long clauses
> > should be split at phrase boundaries. This convention, sometimes
> > known as "semantic newlines", makes it easier to see the effect
> > of patches, which often operate at the level of individual sen‐
> > tences, clauses, or phrases.
>
> I'll update for v3 but I'm reminded of `git diff --word-diff=color` so
> perhaps this recommendation is outdated.
No, this isn't outdated, since that reduces the quality of the diff.
Also, I review a lot of patches in the mail client, without running
git(1). And it's not just for reviewing diffs, but also for writing
them. Semantic newlines reduce the amount of work for producing the
diffs. And lastly, the source code reads much better if it's logically
divided in phrases.
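A minimal sketch of the diff argument, assuming a POSIX shell with
diff(1) and fold(1). The function names, the temporary files, and the
sample list are made up for illustration. It inserts one list item and
counts the lines diff(1) reports as changed, once with one clause per
source line and once with the same words re-wrapped at a fixed width.

```shell
#!/bin/sh
# Illustrative only: names and sample text are not from man-pages.

# Count the lines that diff(1) marks as removed ("<") or added (">").
changed_lines()
{
    diff "$1" "$2" | grep -c '^[<>]'
}

semantic_newline_demo()
{
    tmp=$(mktemp -d) || return 1

    # Semantic newlines: one clause per source line.
    printf '%s\n' 'typeface,' 'color,' 'environment,' 'or stream.' \
        >"$tmp/sem.old"
    printf '%s\n' 'typeface,' 'color,' 'hyphenation language code,' \
        'environment,' 'or stream.' >"$tmp/sem.new"

    # Filled text: the same words, re-wrapped at 30 columns.
    for v in old new; do
        { tr '\n' ' ' <"$tmp/sem.$v" | fold -s -w 30; echo; } \
            >"$tmp/fill.$v"
    done

    echo "semantic $(changed_lines "$tmp/sem.old" "$tmp/sem.new")"
    echo "filled $(changed_lines "$tmp/fill.old" "$tmp/fill.new")"
    rm -rf "$tmp"
}

semantic_newline_demo
```

With semantic newlines the insertion shows up as a single added line;
with re-wrapped text the same insertion usually drags neighbouring
lines into the diff as well.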
>
> Thanks,
> Ian
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-01 20:07 ` Alejandro Colomar
@ 2024-11-02 10:08 ` G. Branden Robinson
2024-11-02 10:39 ` Alejandro Colomar
2024-11-02 19:06 ` Colin Watson
0 siblings, 2 replies; 20+ messages in thread
From: G. Branden Robinson @ 2024-11-02 10:08 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, cjwatson, groff
[adding Colin Watson to CC; and the groff list because I started musing]
Hi Alex,
At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > >
> > > I wouldn't add formatting here for now. That's something I prefer
> > > to be cautious about, and if we do it, we should do it in a
> > > separate commit.
> >
> > I'll move it to a separate patch. Is the caution due to a lack of
> > test infrastructure? That could be something to get resolved,
> > perhaps through Google summer-of-code and the like.
>
> That change might be controversial.
Then let those with objections step forward and make them!
(I may be one of them; see below.)
> We'd first need to check that all software that reads the NAME section
> would behave well for this.
Not _all_ software, surely. Anybody can write a craptastic man(7)
scraper, and several have, mainly back when Web 1.0 was going to eat the
world. Most of those have withered on the vine.
This is the _Linux_ man-pages project, so what matters are (1) man page
formatters and (2) man page indexers that GNU/Linux systems actually
use. Where people get nervous with the "NAME" section is because of the
indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
hasn't earned the name.
Here's a sample input.
$ cat /tmp/proc_pid_fdinfo_mini.5
.TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
.SH Name
.IR /proc/ pid /fdinfo " \- information about file descriptors"
.SH Description
Text text text text.
Starting with formatters, let's see how they do.
$ nroff -man /tmp/proc_pid_fdinfo_mini.5
proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
Name
/proc/pid/fdinfo - information about file descriptors
Description
Text text text text.
example 2024‐11‐02 proc_pid_fdinfo_mini(5)
$ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
Name
/proc/pid/fdinfo - information about file descriptors
Description
Text text text text.
example 2024-11-02 proc_pid_fdinfo_mini(5)
$ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
Name
/proc/pid/fdinfo - information about file descriptors
Description
Text text text text.
example 2024-11-02 proc_pid_fdinfo_mini(5)
$ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
Name
/proc/pid/fdinfo - information about file descriptors
Description
Text text text text.
Page 1 (printed 11/2/2024)
I leave running these commands to see the font style changes
as an exercise for the reader, but they all get the
"/proc/pid/fdinfo" line right.
On GNU/Linux systems, the only man page indexer I know of is Colin
Watson's man-db--specifically, its mandb(8) program. But it's nicely
designed so that the "topic and summary description extraction" task is
delegated to a standalone tool, lexgrog(1), and we can use that.
$ lexgrog /tmp/proc_pid_fdinfo_mini.5
/tmp/proc_pid_fdinfo_mini.5: parse failed
Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
Kerrisk's scraper with respect to groff's man pages.[1]
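To get a rough idea of how many pages could be affected, one could
pre-scan the sources for macro calls inside the NAME section. The
sketch below is only a heuristic: the function name name_has_macros is
made up, it does not run lexgrog(1), and it merely assumes that a macro
call such as `IR` in NAME is what triggers the parse failure.

```shell
#!/bin/sh
# name_has_macros FILE...
# Print the path of each man(7) source whose NAME (or Name) section
# contains a macro call, i.e. a line starting with "." plus a letter.
# Comment lines (.\") do not count.  Hypothetical helper; it knows
# nothing about what lexgrog(1) really accepts.
name_has_macros()
{
    for f in "$@"; do
        awk '
            /^\.SH/ {
                in_name = ($0 ~ /^\.SH[ \t]+"?(NAME|Name)"?[ \t]*$/)
                next
            }
            in_name && /^\.[A-Za-z]/ { found = 1; exit }
            END { exit !found }
        ' "$f" && printf '%s\n' "$f"
    done
}
```

Usage would look like `name_has_macros man5/*.5`, run from the top of
a source tree; an empty result means no NAME section in those files
uses a macro call.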
Well, I can find a silver lining here, because it gives me an even
better reason than I had to pitch an idea I've been kicking around for a
while. Why not enhance groff man(7) to support a mode where _it_ will
spit out the "Name"/"NAME" section, and only that, _for_ you?
This would be as easy as checking for an option, say '-d EXTRACT=Name',
and having the package's "TH" and "SH" macro definitions divert
(literally, with the `di` request) everything _except_ the section of
interest to a diversion that is then never called/output. (This is
similar to an m4 feature known as the "black hole diversion".)
All of the features necessary to implement this[2] were part of troff as
far back as the birth of the man(7) package itself. It's not clear
to me why it wasn't done back in the 1980s.
lexgrog(1) itself will of course have to stay around for years to come,
but this could take a significant distraction off of Colin's plate--I
believe I have seen him grumble about how much *roff syntax he has to
parse to have the feature be workable, and that's without upstart groff
maintainers exploring up to every boundary that existed even in 1979 and
cheerfully exercising their findings in man pages.
I also of course have ideas for generalizing the feature, so that you
can request any (sub)section by name, and, with a bit more ambition,[4]
paragraph tags (`TP`) too.
So you could do things like:
nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3
and:
nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8
...does this sound appetizing to anyone?
> Also, many other pages might need to be changed accordingly for
> consistency.
I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
has groff(1) do the lifting. I'm sorry for prompting churn, Ian.
> No, this isn't outdated, since that reduces the quality of the diff.
> Also, I review a lot of patches in the mail client, without running
> git(1). And it's not just for reviewing diffs, but also for writing
> them. Semantic newlines reduce the amount of work for producing the
> diffs.
It's a real win for diffs.
Here's a very recent example from groff.
diff --git a/man/groff.7.man b/man/groff.7.man
index 1fb635f2b..1d248b237 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -1281,6 +1281,7 @@ .SH Identifiers
typeface,
color,
special character or character class,
+hyphenation language code,
environment,
or stream.
.
(So recent that in fact I haven't pushed that yet.)
Lists like the foregoing are common in man pages.
Regards,
Branden
[1] https://man7.org/linux/man-pages/dir_by_project.html#groff
[2] String definitions, "string comparisons"[3], and diversions.
[3] strictly, "formatted output comparisons"
https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
You can do stricter string comparisons in GNU troff. And I've
thought of some syntactic sugar for performing them that wouldn't
break backward compatibility.
[4] To really land the feature, we need automatic tag generation from
input text (we don't want to make the man page author construct
their own tags). Another reason we want the construction to be
automatic is to make the tags unique when multiple man pages are
formatted in one run, as one might do when making a book of man
pages. Automatic tagging will also enable the slaying of two other
ancient dragons.
1. deep internal links for PDF bookmarks
2. pod2man's `IX`-happy output; the widespread use of this
nonstandard macro confuses way too many novice page authors, and
bloats document size.
Another feature we'll really want to do this right is improved string
processing facilities. That, too, is something that will pay
dividends in several areas. With a proper string iterator in the
formatter (and a couple more conditional operators),[5] it will be
possible to write a string library as a macro file, slimming down the
formatter itself a little and making macro writers' lives easier.
We're only two days into the month and this has already come up on
the groff list.
https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
[5] https://savannah.gnu.org/bugs/?62264
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 10:08 ` G. Branden Robinson
@ 2024-11-02 10:39 ` Alejandro Colomar
2024-11-02 21:36 ` Alejandro Colomar
2024-11-03 4:05 ` G. Branden Robinson
2024-11-02 19:06 ` Colin Watson
1 sibling, 2 replies; 20+ messages in thread
From: Alejandro Colomar @ 2024-11-02 10:39 UTC (permalink / raw)
To: G. Branden Robinson
Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, cjwatson, groff
Hi Branden,
On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
>
> Hi Alex,
>
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now. That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > >
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> >
> > That change might be controversial.
>
> Then let those with objections step forward and make them!
Sure! But that in itself (and the length of your mail) makes a strong
reason to have this in a separate commit. :)
I'm not opposed to the change. Only cautious.
>
> (I may be one of them; see below.)
>
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
>
> Not _all_ software, surely. Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world. Most of those have withered on the vine.
Ahh, yeah, I committed the same mistake I criticise in others every now
and then. $all does not really mean "all". (-Wall, `make all`, ...)
I meant all [of which I care], which is basically groff(1) and
mandoc(1). :)
> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use. Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.
Yup.
>
> Here's a sample input.
>
> $ cat /tmp/proc_pid_fdinfo_mini.5
> .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
> .SH Name
> .IR /proc/ pid /fdinfo " \- information about file descriptors"
> .SH Description
> Text text text text.
>
> Starting with formatters, let's see how they do.
>
> $ nroff -man /tmp/proc_pid_fdinfo_mini.5
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024‐11‐02 proc_pid_fdinfo_mini(5)
> $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
>
>
> example 2024-11-02 proc_pid_fdinfo_mini(5)
> $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
>
> proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
>
> Name
> /proc/pid/fdinfo - information about file descriptors
>
> Description
> Text text text text.
>
> Page 1 (printed 11/2/2024)
>
> I leave the execution of these to perceive the correct font style
> changes as an exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
>
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program. But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
>
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
>
> Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
>
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while. Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
>
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output. (This is
> similar to an m4 feature known as the "black hole diversion".)
Sounds good. And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?
> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself. It's not clear
> to me why it wasn't done back in the 1980s.
Not enough energy of activation, probably, as with most stuff.
> lexgrog(1) itself will of course have to stay around for years to come,
You can make it a wrapper around groff(1) with flags, no?
> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
>
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
>
> So you could do things like:
>
> nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3
I certainly use this.
# man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
# ...) of all manual pages in a directory (or in a single manual page file).
# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';
man_section()
{
    if [ $# -lt 2 ]; then
        >&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
        return $EX_USAGE;
    fi

    local page="$1";
    shift;
    local sect="$*";

    find "$page" -type f \
    |xargs wc -l \
    |grep -v -e '\b1 ' -e '\btotal\b' \
    |awk '{ print $2 }' \
    |sort \
    |while read -r manpage; do
        (sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
         for s in $sect; do
            <"$manpage" \
            sed -n \
                -e "/^\.SH $s/p" \
                -e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
         done;) \
        |mandoc -Tutf8 2>/dev/null \
        |col -pbx;
    done;
}
# man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsfunc man2;
man_lsfunc()
{
    if [ $# -lt 1 ]; then
        >&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
        return $EX_USAGE;
    fi

    for arg in "$@"; do
        man_section "$arg" 'SYNOPSIS';
    done \
    |sed_rm_ccomments \
    |pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
    |grep '^[0-9]' \
    |sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
    |sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
    |uniq;
}
# man_lsvar() prints the name of all C variables declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsvar man3;
man_lsvar()
{
    if [ $# -lt 1 ]; then
        >&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
        return $EX_USAGE;
    fi

    for arg in "$@"; do
        man_section "$arg" 'SYNOPSIS';
    done \
    |sed_rm_ccomments \
    |pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
    |pcregrep -Mn \
        -e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
        -e '^ +extern [\w ]+ \**[\w ]+; *$' \
    |grep '^[0-9]' \
    |grep -v 'typedef' \
    |sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
    |sed 's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
    |uniq;
}
Even grepc(1) derived from those scripts.
>
> and:
>
> nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8
While I haven't used this yet, it's probably because it's quite complex
to implement with regexes, not because it wouldn't be useful.
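For what it's worth, a small awk state machine gets part of the way
without heavy regexes. This is a sketch under stated assumptions, not a
replacement for real parsing: man_tagged is a made-up name, the tag is
assumed to sit on the line right after `.TP`, and `.TQ`, `.RS`/`.RE`
nesting and page-local macros are ignored.

```shell
#!/bin/sh
# man_tagged SECTION TAG [FILE ...]
# Print the source of the first .TP paragraph in SECTION whose tag line
# contains TAG as a word.  Hypothetical helper, heuristic only.
man_tagged()
{
    sect=$1
    tag=$2
    shift 2
    awk -v sect="$sect" -v tag="$tag" '
        /^\.SH/ {
            if (printing) exit           # paragraph ends at next section
            name = $0
            sub(/^\.SH[ \t]+/, "", name)
            gsub(/"/, "", name)
            in_sect = (name == sect)
            next
        }
        !in_sect { next }
        /^\.TP/ {
            if (printing) exit           # paragraph ends at next .TP
            want_tag = 1
            tp = $0
            next
        }
        want_tag {
            want_tag = 0
            plain = $0
            gsub(/\\/, "", plain)        # "\-b" in the source becomes "-b"
            n = split(plain, word, /[ \t",]+/)
            for (i = 1; i <= n; i++)
                if (word[i] == tag) printing = 1
            if (printing) { print tp; print }
            next
        }
        printing { print }
    ' "$@"
}
```

Something like `man_tagged OPTIONS -b man8/zic.8` would then stand in
for the proposed EXTRACT="OPTIONS/-b", with all of the caveats above.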
>
> ...does this sound appetizing to anyone?
Certainly.
> > Also, many other pages might need to be changed accordingly for
> > consistency.
>
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting. I'm sorry for prompting churn, Ian.
>
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1). And it's not just for reviewing diffs, but also for writing
> > them. Semantic newlines reduce the amount of work for producing the
> > diffs.
>
> It's a real win for diffs.
And diffs are a real win for text. Thus, semantic newlines are a real
win for text. "Write poems, not prose." (Any chance we may get that
warning added to groff(1)? :D)
Cheers,
Alex
>
> Here's a very recent example from groff.
>
> diff --git a/man/groff.7.man b/man/groff.7.man
> index 1fb635f2b..1d248b237 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -1281,6 +1281,7 @@ .SH Identifiers
> typeface,
> color,
> special character or character class,
> +hyphenation language code,
> environment,
> or stream.
> .
>
>
> (So recent that in fact I haven't pushed that yet.)
>
> Lists like the foregoing are common in man pages.
>
> Regards,
> Branden
>
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
>
> https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
>
> You can do stricter string comparisons in GNU troff. And I've
> thought of some syntactic sugar for performing them that wouldn't
> break backward compatibility.
>
> [4] To really land the feature, we need automatic tag generation from
> input text (we don't want to make the man page author construct
> their own tags). Another reason we want the construction to be
> automatic is to make the tags unique when multiple man pages are
> formatted in one run, as one might do when making a book of man
> pages. Automatic tagging will also enable the slaying of two other
> ancient dragons.
>
> 1. deep internal links for PDF bookmarks
> 2. pod2man's `IX`-happy output; the widespread use of this
> nonstandard macro confuses way too many novice page authors, and
> bloats document size.
>
> Another feature we'll really want to do this right is improved string
> processing facilities. That, too, is something that will pay
> dividends in several areas. With a proper string iterator in the
> formatter (and a couple more conditional operators),[5] it will be
> possible to write a string library as a macro file, slimming down the
> formatter itself a little and making macro writers' lives easier.
> We're only two days into the month and this has already come up on
> the groff list.
>
> https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
>
> [5] https://savannah.gnu.org/bugs/?62264
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 10:08 ` G. Branden Robinson
2024-11-02 10:39 ` Alejandro Colomar
@ 2024-11-02 19:06 ` Colin Watson
2024-11-03 0:50 ` G. Branden Robinson
1 sibling, 1 reply; 20+ messages in thread
From: Colin Watson @ 2024-11-02 19:06 UTC (permalink / raw)
To: G. Branden Robinson
Cc: Alejandro Colomar, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program. But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
>
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
>
> Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
How embarrassing. Could somebody please file a bug on
https://gitlab.com/man-db/man-db/-/issues to remind me to fix that? (Of
course there'll be a lead time for fixes to get into distributions.)
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while. Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
>
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output. (This is
> similar to an m4 feature known as the "black hole diversion".)
>
> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself. It's not clear
> to me why it wasn't done back in the 1980s.
>
> lexgrog(1) itself will of course have to stay around for years to come,
> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
lexgrog(1) is a useful (if oddly-named, sorry) debugging tool, but if
you focus on that then you'll end up with a design that's not very
useful. What really matters is indexing the whole system's manual
pages, and mandb(8) does not do that by invoking lexgrog(1) one page at
a time, but rather by running more or less the same code in-process. I
already know that getting acceptable performance for this requires care,
as illustrated by one of the NEWS entries for man-db 2.10.0:
* Significantly improve `mandb(8)` and `man -K` performance in the common
case where pages are of moderate size and compressed using `zlib`: `mandb
-c` goes from 344 seconds to 10 seconds on a test system.
... so I'm prepared to bet that forking nroff one page at a time will be
unacceptably slow. (This also combines with the fact that man-db
applies some sandboxing when it's calling nroff just in case it might
happen that a moderately-sized C++ project has less than 100% perfect
security when doing text processing, which I'm sure everyone agrees
would never happen.)
If it were possible to run nroff over a whole batch of pages and get
output for each of them in one go, then maaaaybe. man-db would need a
reliable way to associate each line (or sometimes multiple lines) of
output with each source file, and of course care would be needed around
error handling and so on. I can see the appeal, in terms of processing
the actual language rather than a pile of hacks that try to guess what
to do with it - but on the other hand this starts to feel like a much
less natural fit for the way nroff is run in every other situation,
where you're processing one document at a time.
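To make the batching idea concrete, here is a toy sketch of "one
process, many pages, every output line tagged with its source file".
It is a plain source filter written with awk, not nroff and not the way
mandb(8) works internally; name_index is a made-up name, and the
tab-separated output format is only an assumption about what a consumer
might want.

```shell
#!/bin/sh
# name_index FILE...
# One awk process for the whole batch.  For each file, print
# "<source path><TAB><first plain text line of its NAME section>".
# Macro and comment lines in NAME are skipped, so a NAME line written
# with a macro call yields no entry at all.  Hypothetical helper.
name_index()
{
    awk '
        FNR == 1 { in_name = 0; done = 0 }    # new input file
        /^\.SH/ {
            in_name = ($0 ~ /^\.SH[ \t]+"?(NAME|Name)"?[ \t]*$/)
            next
        }
        in_name && !done && !/^\./ && NF {
            printf "%s\t%s\n", FILENAME, $0
            done = 1
        }
    ' "$@"
}
```

Keying each record on the file name is what lets the caller map output
back to pages after a single fork; error handling for unreadable or
malformed pages is left out entirely here.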
Cheers,
--
Colin Watson (he/him) [cjwatson@debian.org]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 10:39 ` Alejandro Colomar
@ 2024-11-02 21:36 ` Alejandro Colomar
2024-11-02 23:47 ` Colin Watson
2024-11-03 4:05 ` G. Branden Robinson
1 sibling, 1 reply; 20+ messages in thread
From: Alejandro Colomar @ 2024-11-02 21:36 UTC (permalink / raw)
To: G. Branden Robinson, cjwatson
Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, cjwatson, groff
Hi Branden, Colin,
On Sat, Nov 02, 2024 at 11:40:13AM +0100, Alejandro Colomar wrote:
> > I also of course have ideas for generalizing the feature, so that you
> > can request any (sub)section by name, and, with a bit more ambition,[4]
> > paragraph tags (`TP`) too.
> >
> > So you could do things like:
> >
> > nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3
>
> I certainly use this.
>
> # man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
> # ...) of all manual pages in a directory (or in a single manual page file).
> # Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';
>
> man_section()
> {
> if [ $# -lt 2 ]; then
> >&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
> return $EX_USAGE;
> fi
>
> local page="$1";
> shift;
> local sect="$*";
>
> find "$page" -type f \
> |xargs wc -l \
> |grep -v -e '\b1 ' -e '\btotal\b' \
> |awk '{ print $2 }' \
> |sort \
> |while read -r manpage; do
> (sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
> for s in $sect; do
> <"$manpage" \
> sed -n \
> -e "/^\.SH $s/p" \
> -e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
> done;) \
> |mandoc -Tutf8 2>/dev/null \
> |col -pbx;
> done;
> }
On the other hand, you may want to just package this small shell script
(or rather a part of it) as a program.
How about this?
$ cat /usr/local/bin/mansect
#!/bin/sh
if [ $# -lt 1 ]; then
    >&2 echo "Usage: $0 SECTION [FILE ...]";
    exit 1;    # 'return' is only valid in a function or a sourced file
fi
s="$1";
shift;
if test -z "$*"; then
    sed -n \
        -e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
        -e '/^\.SH '"$s"'$/p' \
        -e '/^\.SH '"$s"'$/,/^\.SH/{/^\.SH/!p}' \
        ;
else
    find "$@" -not -type d \
    | xargs wc -l \
    | sed '${/ total$/d}' \
    | grep -v '\b1 ' \
    | awk '{ print $2 }' \
    | xargs -L1 sed -n \
        -e '/^\.TH/,/^\.SH/{/^\.SH/!p}' \
        -e '/^\.SH '"$s"'$/p' \
        -e '/^\.SH '"$s"'$/,/^\.SH/{/^\.SH/!p}' \
        ;
fi;
This only filters the source of the page, producing output that's
suitable for the groff pipeline.
alx@devuan:~$ man -w proc | xargs cat | mansect NAME
.TH proc 5 2024-06-15 "Linux man-pages 6.9.1-158-g2ac94c631"
.SH NAME
proc \- process information, system information, and sysctl pseudo-filesystem
alx@devuan:~$ man -w strtol strtoul | xargs mansect 'NAME'
.TH strtol 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
.SH NAME
strtol, strtoll, strtoq \- convert a string to a long integer
.TH strtoul 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
.SH NAME
strtoul, strtoull, strtouq \- convert a string to an unsigned long integer
You can request several sections with a regex:
$ man -w strtol strtoul | xargs mansect '\(NAME\|SEE ALSO\)'
.TH strtol 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
.SH NAME
strtol, strtoll, strtoq \- convert a string to a long integer
.SH SEE ALSO
.BR atof (3),
.BR atoi (3),
.BR atol (3),
.BR strtod (3),
.BR strtoimax (3),
.BR strtoul (3)
.TH strtoul 3 2024-07-23 "Linux man-pages 6.9.1-158-g2ac94c631"
.SH NAME
strtoul, strtoull, strtouq \- convert a string to an unsigned long integer
.SH SEE ALSO
.BR a64l (3),
.BR atof (3),
.BR atoi (3),
.BR atol (3),
.BR strtod (3),
.BR strtol (3),
.BR strtoumax (3)
And it can then be piped to groff(1) to format the entire set of pages:
$ man -w strtol strtoul | xargs mansect '\(NAME\|SEE ALSO\)' | groff -man -Tutf8
strtol(3) Library Functions Manual strtol(3)
NAME
strtol, strtoll, strtoq - convert a string to a long integer
SEE ALSO
atof(3), atoi(3), atol(3), strtod(3), strtoimax(3), strtoul(3)
Linux man‐pages 6.9.1‐158‐g2ac... 2024‐07‐23 strtol(3)
───────────────────────────────────────────────────────────────────────────────
strtoul(3) Library Functions Manual strtoul(3)
NAME
strtoul, strtoull, strtouq - convert a string to an unsigned long integer
SEE ALSO
a64l(3), atof(3), atoi(3), atol(3), strtod(3), strtol(3), strtoumax(3)
Linux man‐pages 6.9.1‐158‐g2ac... 2024‐07‐23 strtoul(3)
This is quite naive, and will not work with pages that define their own
stuff, since this script is not groff(1). But it should be as fast as
is possible, which is what Colin wants, is as simple as it can be (and
thus relatively safe), and should work with most pages (as far as
indexing is concerned, probably all?).
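One concrete case of that naivety: the exact-match address
`/^\.SH NAME$/` misses a heading written as `.SH Name` (as in Branden's
sample page) or with a quoted title such as `.SH "SEE ALSO"`. A more
lenient heading test could look like the sketch below; the name
sh_heading_matches is made up, and it still understands nothing about
pages that redefine `SH`.

```shell
#!/bin/sh
# sh_heading_matches HEADING
# Read man(7) source on stdin; succeed if some .SH line names HEADING,
# ignoring letter case and optional double quotes around the title.
# Hypothetical helper, not part of the mansect script above.
sh_heading_matches()
{
    awk -v want="$1" '
        /^\.SH[ \t]/ {
            title = $0
            sub(/^\.SH[ \t]+/, "", title)
            gsub(/"/, "", title)
            sub(/[ \t]+$/, "", title)
            if (toupper(title) == toupper(want)) found = 1
        }
        END { exit !found }
    '
}

# The strict sed address prints nothing for a mixed-case heading ...
printf '%s\n' '.SH Name' | sed -n '/^\.SH NAME$/p'
# ... while the lenient test accepts it.
printf '%s\n' '.SH Name' | sh_heading_matches NAME && echo "matched"
```

Whether case-insensitive matching is wanted at all is a design choice;
it trades a few possible false positives for not missing pages that
use the mixed-case heading style.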
Have a lovely night!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 21:36 ` Alejandro Colomar
@ 2024-11-02 23:47 ` Colin Watson
2024-11-03 0:05 ` Alejandro Colomar
0 siblings, 1 reply; 20+ messages in thread
From: Colin Watson @ 2024-11-02 23:47 UTC (permalink / raw)
To: Alejandro Colomar
Cc: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> This is quite naive, and will not work with pages that define their own
> stuff, since this script is not groff(1). But it should be as fast as
> is possible, which is what Colin wants, is as simple as it can be (and
> thus relatively safe), and should work with most pages (as far as
> indexing is concerned, probably all?).
I seem to be being invoked here for something I actually don't think I
want at all, which suggests that wires have been crossed somewhere. Can
you explain why I'd want to replace some part of a fairly well-optimized
and established C program with a shell pipeline? I'm pretty certain it
would not be faster, at least.
Thanks,
--
Colin Watson (he/him) [cjwatson@debian.org]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 23:47 ` Colin Watson
@ 2024-11-03 0:05 ` Alejandro Colomar
2024-11-03 0:07 ` Alejandro Colomar
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Alejandro Colomar @ 2024-11-03 0:05 UTC (permalink / raw)
To: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
Hi Colin,
On Sat, Nov 02, 2024 at 11:47:14PM +0000, Colin Watson wrote:
> On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> > This is quite naive, and will not work with pages that define their own
> > stuff, since this script is not groff(1). But it should be as fast as
> > is possible, which is what Colin wants, is as simple as it can be (and
> > thus relatively safe), and should work with most pages (as far as
> > indexing is concerned, probably all?).
>
> I seem to be being invoked here for something I actually don't think I
> want at all, which suggests that wires have been crossed somewhere. Can
> you explain why I'd want to replace some part of a fairly well-optimized
> and established C program with a shell pipeline? I'm pretty certain it
> would not be faster, at least.
Are you sure? With a small tweak, I get the following comparison:
alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc
lexgrog: can't resolve man7/groff_man.7
12475 99295 919842
real 0m6.166s
user 0m5.132s
sys 0m1.336s
alx@devuan:~/src/linux/man-pages/man-pages/main$ time mansect NAME man/ \
| groff -man -Tutf8 | wc
9830 27109 689478
real 0m0.156s
user 0m0.219s
sys 0m0.019s
Yes, I'm working with uncompressed pages. We'd need to add support for
handling compressed pages. Also, we'd need to compare the performance
of lexgrog(1) with compressed pages. But for a starter, this suggests
some good performance.
(I say with a small tweak, because the version I've posted uses
xargs -L1, but I've tested for performance without the -L1, which is
the main bottleneck. It has no consequences for the NAME. I need to
work out some nasty details with sed -n1 for the generic version,
though.)
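To illustrate why `-L1` matters for speed: with `-L1`, xargs(1) starts one command per input line, whereas without it xargs packs as many arguments as fit into a single invocation. The sketch below uses `echo` as a stand-in for the real command; the function names are hypothetical.

```shell
#!/bin/sh
# Illustration only; echo stands in for the real command.
# One process per input line:
per_line() { xargs -L1 echo; }
# One process for the whole batch (as long as the arguments fit):
batched() { xargs echo; }

printf 'a\nb\nc\n' | per_line   # prints three lines: a, b, c
printf 'a\nb\nc\n' | batched    # prints one line: a b c
```

With thousands of pages, the per-line variant pays the process start-up cost once per page.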
Have a lovely night!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 0:05 ` Alejandro Colomar
@ 2024-11-03 0:07 ` Alejandro Colomar
2024-11-03 0:24 ` Colin Watson
2024-11-03 0:47 ` Colin Watson
2 siblings, 0 replies; 20+ messages in thread
From: Alejandro Colomar @ 2024-11-03 0:07 UTC (permalink / raw)
To: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
On Sun, Nov 03, 2024 at 01:05:42AM +0100, Alejandro Colomar wrote:
> Hi Colin,
>
> On Sat, Nov 02, 2024 at 11:47:14PM +0000, Colin Watson wrote:
> > On Sat, Nov 02, 2024 at 10:36:20PM +0100, Alejandro Colomar wrote:
> > > This is quite naive, and will not work with pages that define their own
> > > stuff, since this script is not groff(1). But it should be as fast as
> > > is possible, which is what Colin wants, is as simple as it can be (and
> > > thus relatively safe), and should work with most pages (as far as
> > > indexing is concerned, probably all?).
> >
> > I seem to be being invoked here for something I actually don't think I
> > want at all, which suggests that wires have been crossed somewhere. Can
> > you explain why I'd want to replace some part of a fairly well-optimized
> > and established C program with a shell pipeline? I'm pretty certain it
> > would not be faster, at least.
>
> Are you sure? With a small tweak, I get the following comparison:
>
> alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc
> lexgrog: can't resolve man7/groff_man.7
> 12475 99295 919842
>
> real 0m6.166s
> user 0m5.132s
> sys 0m1.336s
> alx@devuan:~/src/linux/man-pages/man-pages/main$ time mansect NAME man/ \
> | groff -man -Tutf8 | wc
> 9830 27109 689478
>
> real 0m0.156s
> user 0m0.219s
> sys 0m0.019s
>
> Yes, I'm working with uncompressed pages. We'd need to add support for
> handling compressed pages. Also, we'd need to compare the performance
> of lexgrog(1) with compressed pages. But for a starter, this suggests
> some good performance.
>
> (I say with a small tweak, because the version I've posted uses
> xargs -L1, but I've tested for performance without the -L1, which is
> the main bottleneck. It has no consequences for the NAME. I need to
> work out some nasty details with sed -n1 for the generic version,
s/n1/n/
> though.)
>
>
> Have a lovely night!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 0:05 ` Alejandro Colomar
2024-11-03 0:07 ` Alejandro Colomar
@ 2024-11-03 0:24 ` Colin Watson
2024-11-03 0:47 ` Colin Watson
2 siblings, 0 replies; 20+ messages in thread
From: Colin Watson @ 2024-11-03 0:24 UTC (permalink / raw)
To: Alejandro Colomar
Cc: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
On Sun, Nov 03, 2024 at 01:05:34AM +0100, Alejandro Colomar wrote:
> Are you sure? With a small tweak, I get the following comparison:
>
> alx@devuan:~/src/linux/man-pages/man-pages/main$ time lexgrog man/*/* | wc
> lexgrog: can't resolve man7/groff_man.7
> 12475 99295 919842
Comparing anything to lexgrog isn't very interesting; it's a debugging
tool and is not in itself very performance-sensitive. As I've explained
elsewhere, the interesting thing is mandb, which uses the same code
in-process to scan a whole tree of pages in one go. I do not expect to
ever want to replace that with a shell pipeline.
--
Colin Watson (he/him) [cjwatson@debian.org]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 0:05 ` Alejandro Colomar
2024-11-03 0:07 ` Alejandro Colomar
2024-11-03 0:24 ` Colin Watson
@ 2024-11-03 0:47 ` Colin Watson
2024-11-03 1:09 ` G. Branden Robinson
2 siblings, 1 reply; 20+ messages in thread
From: Colin Watson @ 2024-11-03 0:47 UTC (permalink / raw)
To: Alejandro Colomar
Cc: G. Branden Robinson, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
I'm not trying to stop you committing whatever you want to your
repository, of course, but I want to be clear that this doesn't actually
solve the right problem for manual page indexing. The point of the
parsing code in mandb(8) - and I'm not claiming that it's great code or
the perfect design, just that it works most of the time - is to extract
the names and summary-descriptions from each page so that they can be
used by tools such as apropos(1) and whatis(1). Splitting on section
boundaries is just the simplest part of that problem, and I don't think
that doing it in a separate program really gains anything.
(That's leaving aside things like localized man pages, which I know some
folks on the groff list tend to sniff at but I think they're important,
and the fact that the NAME section has both semantic and presentational
meaning means that like it or not the parser needs to be aware of this.)
--
Colin Watson (he/him) [cjwatson@debian.org]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 19:06 ` Colin Watson
@ 2024-11-03 0:50 ` G. Branden Robinson
2024-11-03 1:55 ` Colin Watson
0 siblings, 1 reply; 20+ messages in thread
From: G. Branden Robinson @ 2024-11-03 0:50 UTC (permalink / raw)
To: Alejandro Colomar, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
Hi Colin,
At 2024-11-02T19:06:53+0000, Colin Watson wrote:
> How embarrassing. Could somebody please file a bug on
> https://gitlab.com/man-db/man-db/-/issues to remind me to fix that?
Done; <https://gitlab.com/man-db/man-db/-/issues/46>.
> lexgrog(1) is a useful (if oddly-named, sorry) debugging tool, but if
> you focus on that then you'll end up with a design that's not very
> useful. What really matters is indexing the whole system's manual
> pages, and mandb(8) does not do that by invoking lexgrog(1) one page
> at a time, but rather by running more or less the same code
> in-process.
Ah, I see it now--"lexgrog.l" is in both the Automake macros
"lexgrog_SOURCES" and "mandb_SOURCES". Nice and DRY!
> I already know that getting acceptable performance for
> this requires care, as illustrated by one of the NEWS entries for
> man-db 2.10.0:
>
> * Significantly improve `mandb(8)` and `man -K` performance in the
> common case where pages are of moderate size and compressed using
> `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test
> system.
>
> ... so I'm prepared to bet that forking nroff one page at a time will
> be unacceptably slow.
Probably, but there is little reason to run nroff that way (as of groff
1.23). It already works well, but I have ideas for further hardening
groff's man(7) and mdoc(7) packages such that they return to a
well-defined state when changing input documents.
> (This also combines with the fact that man-db applies some sandboxing
> when it's calling nroff just in case it might happen that a
> moderately-sized C++ project has less than 100% perfect security when
> doing text processing, which I'm sure everyone agrees would never
> happen.)
Inconceivable, yes! But fortunately you can run nroff over N documents
and pay its own startup overhead costs as well as those of sandboxing
only once.
> If it were possible to run nroff over a whole batch of pages and get
> output for each of them in one go, then maaaaybe.
That's already true for formatting the entire page. It's how this was
created.
https://www.gnu.org/software/groff/manual/groff-man-pages.utf8.txt
(...best viewed with "less -R")
With the `-d EXTRACT` feature I have in mind, in its
as-simple-as-possible first-cut form, the problem you anticipate...
> man-db would need a reliable way to associate each line (or sometimes
> multiple lines) of output with each source file,
...would remain. I'll have to think of a good way to write out
"metadata" (the input file name and the arguments to the `TH` request)
as each page is encountered, and of an interface to enable that. I
don't see it happening before groff 1.25.
> and of course care would be needed around error handling and so on.
I need to give this thought, too. What sorts of error scenarios do you
foresee? GNU troff itself, if it can't open a file to be formatted,
reports an error diagnostic and continues to the next `argv` string
until it reaches the end of input.
> I can see the appeal, in terms of processing the actual language
> rather than a pile of hacks that try to guess what to do with it
...a major selling point, IMO...
> but on the other hand this starts to feel like a much less natural fit
> for the way nroff is run in every other situation, where you're
> processing one document at a time.
This I disagree with. Or perhaps more precisely, it's another example
of the exception (man(1)) swallowing the rule (nroff/troff). nroff and
troff were written as Unix filters; they read the standard input stream
(and/or argument list)[1], do some processing, and write to standard
output.[2]
Historically, troff (or one of its preprocessors) was commonly used with
multiple input files to catenate them.
Here's an example of this practice from 1980.
https://minnie.tuhs.org/cgi-bin/utree.pl?file=3BSD/usr/doc/pascal/makefile
Regards,
Branden
[1] ...including this option from Seventh Edition Unix (1979) or
earlier, which survives in GNU troff to this day.
-i Read standard input after the input files are
exhausted.
[2] Seventh Edition troff didn't write to stdout by default, but tried
to open the typesetter device. But it had an option to write to
standard output.
-t Direct output to the standard output instead of the
phototypesetter.
Running old school Unix under emulation these days, you _have_ to use
this option to avoid the dreaded "Typesetter busy." diagnostic.
When Kernighan refactored troff for device-independence, he
reseated it more squarely in the Unix filter tradition by writing
its plain-text page description language to stdout. The output
driver, such as "dpost" for PostScript, also read its standard input,
and could thus become just one more stage in a pipeline. [CSTR #97]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 0:47 ` Colin Watson
@ 2024-11-03 1:09 ` G. Branden Robinson
2024-11-03 1:18 ` Colin Watson
0 siblings, 1 reply; 20+ messages in thread
From: G. Branden Robinson @ 2024-11-03 1:09 UTC (permalink / raw)
To: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, groff
Hi Colin,
At 2024-11-03T00:47:23+0000, Colin Watson wrote:
> (That's leaving aside things like localized man pages, which I know
> some folks on the groff list tend to sniff
I can think of only one, the maintainer of a rival formatter. ;-)
> at but I think they're important,
Me too. I agree with the sniffer that no language is ever likely to
reach 100% parity with English in something like the Debian
distribution, but more modest domains exist.
I've put effort into l10n issues in man(7) and in groff generally. In
particular, I really want seamless multilingual document support and
achievement of that goal will be, I think, much closer in groff 1.24.
(My pending push is gated on deciding how to change the me(7) and ms(7)
packages to accommodate a formatter-level fix to an ugly wart in the
l10n department; see <https://savannah.gnu.org/bugs/?66387>.)
> and the fact that the NAME section has both semantic and
> presentational meaning means that like it or not the parser needs to
> be aware of this.)
Even if mandb(8) doesn't run groff to extract the summary descriptions/
apropos lines, I think this feature might be useful to you for
coverage/regression testing. Presumably, for valid inputs, groff and
mandb(8) should reach similar conclusions about how the text of a "Name"
section is to be formatted.
Regards,
Branden
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 1:09 ` G. Branden Robinson
@ 2024-11-03 1:18 ` Colin Watson
0 siblings, 0 replies; 20+ messages in thread
From: Colin Watson @ 2024-11-03 1:18 UTC (permalink / raw)
To: G. Branden Robinson
Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, groff
(now with some local vim macros fixed to stop accidentally corrupting
the To: lines of some of my outgoing emails ...)
On Sat, Nov 02, 2024 at 08:09:29PM -0500, G. Branden Robinson wrote:
> At 2024-11-03T00:47:23+0000, Colin Watson wrote:
> > and the fact that the NAME section has both semantic and
> > presentational meaning means that like it or not the parser needs to
> > be aware of this.)
>
> Even if mandb(8) doesn't run groff to extract the summary descriptions/
> apropos lines, I think this feature might be useful to you for
> coverage/regression testing. Presumably, for valid inputs, groff and
> mandb(8) should reach similar conclusions about how the text of a "Name"
> section is to be formatted.
Yes, that's a good point and I agree with that.
--
Colin Watson (he/him) [cjwatson@debian.org]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-03 0:50 ` G. Branden Robinson
@ 2024-11-03 1:55 ` Colin Watson
0 siblings, 0 replies; 20+ messages in thread
From: Colin Watson @ 2024-11-03 1:55 UTC (permalink / raw)
To: G. Branden Robinson
Cc: Alejandro Colomar, Ian Rogers, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Jonathan Corbet, dri-devel, linux-doc, linux-kernel, linux-man,
groff
On Sat, Nov 02, 2024 at 07:50:23PM -0500, G. Branden Robinson wrote:
> At 2024-11-02T19:06:53+0000, Colin Watson wrote:
> > How embarrassing. Could somebody please file a bug on
> > https://gitlab.com/man-db/man-db/-/issues to remind me to fix that?
>
> Done; <https://gitlab.com/man-db/man-db/-/issues/46>.
Thanks, working on it.
> > I already know that getting acceptable performance for
> > this requires care, as illustrated by one of the NEWS entries for
> > man-db 2.10.0:
> >
> > * Significantly improve `mandb(8)` and `man -K` performance in the
> > common case where pages are of moderate size and compressed using
> > `zlib`: `mandb -c` goes from 344 seconds to 10 seconds on a test
> > system.
> >
> > ... so I'm prepared to bet that forking nroff one page at a time will
> > be unacceptably slow.
>
> Probably, but there is little reason to run nroff that way (as of groff
> 1.23). It already works well, but I have ideas for further hardening
> groff's man(7) and mdoc(7) packages such that they return to a
> well-defined state when changing input documents.
Being able to keep track of which output goes with which input pages is
critical to the indexer, though (as you acknowledge later in your
reply). It can't just throw the whole lot at nroff and call it a day.
One other thing: mandb/lexgrog also looks for preprocessing filter hints
in pages (`'\" te` and the like). This is obscure, to be sure, but
either a replacement would need to do the same thing or we'd need to be
certain that it's no longer required.
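For readers unfamiliar with these hints: a page can list the preprocessors it needs as letters on its first line, after `'\" ` (for example `t` for tbl(1) and `e` for eqn(1)). The following is a rough sketch of detecting such a line, not the logic man-db actually uses; the function name `preproc_hints` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; NOT man-db's actual parsing code.
# Reads a page on stdin and prints the preprocessor letters hinted on
# its first line, or nothing if the first line carries no hint.
preproc_hints() {
    first=
    IFS= read -r first || :                        # first line only; tolerate EOF
    case $first in
        "'\\\" "*)                                 # line starts with: '\" + space
            printf '%s\n' "${first#????}" ;;       # strip those four characters
    esac
}
# Usage (the path is only an example):
#   preproc_hints < man/man5/some_page.5
```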
> > and of course care would be needed around error handling and so on.
>
> I need to give this thought, too. What sorts of error scenarios do you
> foresee? GNU troff itself, if it can't open a file to be formatted,
> reports an error diagnostic and continues to the next `argv` string
> until it reaches the end of input.
That might be sufficient, or man-db might need to be able to detect
which pages had errors. I'm not currently sure.
> > but on the other hand this starts to feel like a much less natural fit
> > for the way nroff is run in every other situation, where you're
> > processing one document at a time.
>
> This I disagree with. Or perhaps more precisely, it's another example
> of the exception (man(1)) swallowing the rule (nroff/troff). nroff and
> troff were written as Unix filters; they read the standard input stream
> (and/or argument list)[1], do some processing, and write to standard
> output.[2]
>
> Historically, troff (or one of its preprocessors) was commonly used with
> multiple input files to catenate them.
But this application is not conceptually like catenation (even if it
might be possible to implement it that way). The collection of all
manual pages on a system is not like one long document that happens to
be split over multiple files, certainly not from an indexer's point of
view.
--
Colin Watson (he/him) [cjwatson@debian.org]
* Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page
2024-11-02 10:39 ` Alejandro Colomar
2024-11-02 21:36 ` Alejandro Colomar
@ 2024-11-03 4:05 ` G. Branden Robinson
1 sibling, 0 replies; 20+ messages in thread
From: G. Branden Robinson @ 2024-11-03 4:05 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Ian Rogers, David Airlie, Simona Vetter, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, Jonathan Corbet, dri-devel,
linux-doc, linux-kernel, linux-man, cjwatson, groff
Hi Alex,
At 2024-11-02T11:39:37+0100, Alejandro Colomar wrote:
> And diffs are a real win for text. Thus, semantic newlines are a real
> win for text. "Write poems, not prose." (Any chance we may get that
> warning added to groff(1)? :D)
Yes, but I've kicked it out to groff 1.25 because a gift-wrapped
opportunity came along. We get to retire a warning category and its
number.
groff(7) [1.23.0]:
Warnings
...
el 16 The el request was encountered with no prior
corresponding ie request.
groff 1.24.0 [in preparation] NEWS:
* The "el" warning category has been withdrawn. If enabled (which it
was not by default), the formatter would emit a diagnostic if it
inferred an imbalance between `ie` and `el` requests. Unfortunately
its technique wasn't reliable and sometimes spuriously issued these
warnings, and making it perfectly reliable did not look tractable.
We recommend using brace escape sequences `\{` and `\}` to ensure
that your control flow structures remain maintainable.
This was a 35-year-old bug (or incomplete feature) in GNU troff that as
far as I know first came to attention 10 years ago when the
then-Heirloom Doctools maintainer pointed out an incompatibility between
AT&T troff (from which Heirloom Doctools descends) and GNU troff.
https://savannah.gnu.org/bugs/?45502
More recently, Paul Eggert scored big-time grognard points by actually
depending on the AT&T troff behavior in the zic(8) man page.
https://savannah.gnu.org/bugs/?65474
We therefore _had_ to fix it.
The consequence is that the warning category `el` and bit 4 in the
warning mask integer are undefined for groff 1.24.
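The "16" in the groff(7) excerpt and "bit 4" here are the same thing: a category at bit N contributes 2^N to the warning mask. A small sketch of that arithmetic, with hypothetical helper names that are not part of groff:

```shell
#!/bin/sh
# Illustration only; these helpers are hypothetical, not groff interfaces.
# Mask value contributed by a warning category occupying bit N ($1).
warning_value() {
    echo $((1 << $1))
}
# Succeeds if bit N ($2) is set in the warning mask ($1).
warning_enabled() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

warning_value 4    # prints 16, the number listed for "el"
```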
This was irresistible serendipity, because this warning category was (1)
not enabled by default and (2) probably used only by people who wouldn't
object to style warnings anyway.
In groff 1.25, I want to revive bit 4 as new warning category `style`.
Ending sentences before the end of a text line is something we can warn
about as discussed a while back, and I plan to do so.
https://lists.gnu.org/archive/html/groff/2022-06/msg00052.html
I've been collecting specimens of other contemplated style warnings.
https://savannah.gnu.org/bugs/?62776
Regards,
Branden