* [PATCH 0/2] proc.5: note broken v4.18 userspace promise
@ 2022-12-23 17:59 Ævar Arnfjörð Bjarmason
2022-12-23 17:59 ` [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) Ævar Arnfjörð Bjarmason
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-23 17:59 UTC (permalink / raw)
To: linux-man
Cc: Alejandro Colomar, Tejun Heo, Linus Torvalds, Craig Small,
Alexey Dobriyan, Alejandro Colomar, Michael Kerrisk,
Ævar Arnfjörð Bjarmason
Along with a trivial change in 1/2 (which would otherwise textually
conflict) this 2/2 notes that since Linux 4.18 the promise that the
"comm" field in /proc/PID/stat will be no longer than 15 characters
hasn't been true for the "kworker" processes.
This caveat was noted in a discussion on HN
https://news.ycombinator.com/item?id=34093845;
I myself have code in git/git@2d3491b117c (tr2: log N parent process
names on Linux, 2021-08-27) which won't do anything bad if this were
to be combined with PID recycling (with a kernel thread usurping the
PID of a process that used to belong to our parent), but it will
behave unexpectedly. I wrote that code against the promises made in
proc(5) at the time.
As I'm about to send this I notice that [1] was sent yesterday, which
textually conflicts with this submission[1]. I've added its authors to
the CC (I'm not on the linux-man list).
Personally I never read the note that the "comm" string would be
contained in parentheses as a promise that the kernel was going to
strip ")" from userspace names, or only allow balanced parentheses or
whatever.
I'd think most programmers would see a mention of a \0-delimited
string and assume that it would contain anything but "\0" (as is the
case here). Perhaps it's useful to some to include such a clarifying
blurb, but I personally find it superfluous.
Whereas the fix here is a fix for a promise we're currently making
which hasn't been true since v4.18.
1. https://lore.kernel.org/linux-man/Y6SJDbKBk471KE4k@p183
Ævar Arnfjörð Bjarmason (2):
proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1)
proc.5: the "comm" field can be longer than 16 bytes
man5/proc.5 | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--
2.39.0.1106.gf45ba805d1a
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) 2022-12-23 17:59 [PATCH 0/2] proc.5: note broken v4.18 userspace promise Ævar Arnfjörð Bjarmason @ 2022-12-23 17:59 ` Ævar Arnfjörð Bjarmason 2022-12-28 20:27 ` Alejandro Colomar 2022-12-23 17:59 ` [PATCH 2/2] proc.5: the "comm" field can be longer than 16 bytes Ævar Arnfjörð Bjarmason 2022-12-23 18:12 ` [PATCH 0/2] proc.5: note broken v4.18 userspace promise Linus Torvalds 2 siblings, 1 reply; 8+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-23 17:59 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, Tejun Heo, Linus Torvalds, Craig Small, Alexey Dobriyan, Alejandro Colomar, Michael Kerrisk, Ævar Arnfjörð Bjarmason With "ps -A" the output uses the "stat.comm" field, but "ps a" will display the value in "cmdline". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- man5/proc.5 | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/man5/proc.5 b/man5/proc.5 index 65a4c38e3..115c8592e 100644 --- a/man5/proc.5 +++ b/man5/proc.5 @@ -2087,7 +2087,11 @@ The affected fields are indicated with the marking [PT]. The process ID. .TP (2) \fIcomm\fP \ %s -The filename of the executable, in parentheses. +The filename of the executable, in parentheses. Tools such as +.BR ps (1) +may alternatively (or additionally) use +.IR /proc/ pid /cmdline. +.IP Strings longer than .B TASK_COMM_LEN (16) characters (including the terminating null byte) are silently truncated. -- 2.39.0.1106.gf45ba805d1a ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) 2022-12-23 17:59 ` [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) Ævar Arnfjörð Bjarmason @ 2022-12-28 20:27 ` Alejandro Colomar 0 siblings, 0 replies; 8+ messages in thread From: Alejandro Colomar @ 2022-12-28 20:27 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, linux-man Cc: Alejandro Colomar, Tejun Heo, Linus Torvalds, Craig Small, Alexey Dobriyan, Michael Kerrisk [-- Attachment #1.1: Type: text/plain, Size: 1565 bytes --] Hi Ævar, On 12/23/22 18:59, Ævar Arnfjörð Bjarmason wrote: > With "ps -A" the output uses the "stat.comm" field, but "ps a" will > display the value in "cmdline". > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > man5/proc.5 | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/man5/proc.5 b/man5/proc.5 > index 65a4c38e3..115c8592e 100644 > --- a/man5/proc.5 > +++ b/man5/proc.5 > @@ -2087,7 +2087,11 @@ The affected fields are indicated with the marking [PT]. > The process ID. > .TP > (2) \fIcomm\fP \ %s > -The filename of the executable, in parentheses. > +The filename of the executable, in parentheses. Tools such as Please use semantic newlines. See man-pages(7): Use semantic newlines In the source of a manual page, new sentences should be started on new lines, long sentences should be split into lines at clause breaks (com‐ mas, semicolons, colons, and so on), and long clauses should be split at phrase boundaries. This convention, sometimes known as "semantic newlines", makes it easier to see the effect of patches, which often operate at the level of individual sentences, clauses, or phrases. Cheers, Alex > +.BR ps (1) > +may alternatively (or additionally) use > +.IR /proc/ pid /cmdline. > +.IP > Strings longer than > .B TASK_COMM_LEN > (16) characters (including the terminating null byte) are silently truncated. -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] proc.5: the "comm" field can be longer than 16 bytes 2022-12-23 17:59 [PATCH 0/2] proc.5: note broken v4.18 userspace promise Ævar Arnfjörð Bjarmason 2022-12-23 17:59 ` [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) Ævar Arnfjörð Bjarmason @ 2022-12-23 17:59 ` Ævar Arnfjörð Bjarmason 2022-12-23 18:12 ` [PATCH 0/2] proc.5: note broken v4.18 userspace promise Linus Torvalds 2 siblings, 0 replies; 8+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-23 17:59 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, Tejun Heo, Linus Torvalds, Craig Small, Alexey Dobriyan, Alejandro Colomar, Michael Kerrisk, Ævar Arnfjörð Bjarmason Since torvalds/linux@6b59808bfe48 (workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}, 2018-05-18) the limit of 15 character comm names hasn't applied to "kworker" processes. This can be seen e.g. on my Linux v5.10 box: $ awk '{print $2}' /proc/*/stat 2>/dev/null | grep kworker | sort -R | head -n 5 (kworker/3:1-mm_percpu_wq) (kworker/0:0H-events_highpri) (kworker/1:1H-kblockd) (kworker/u16:1-events_unbound) (kworker/u17:0) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- man5/proc.5 | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/man5/proc.5 b/man5/proc.5 index 115c8592e..b23dd1479 100644 --- a/man5/proc.5 +++ b/man5/proc.5 @@ -2092,9 +2092,13 @@ The filename of the executable, in parentheses. Tools such as may alternatively (or additionally) use .IR /proc/ pid /cmdline. .IP -Strings longer than +For userspace, strings longer than .B TASK_COMM_LEN (16) characters (including the terminating null byte) are silently truncated. +Since Linux version 4.18.0 a longer limit of 64 (including the +terminating null byte) has applied to the kernel's own workqueue +workers (whose names start with "kworker/"). +.IP This is visible whether or not the executable is swapped out. .TP (3) \fIstate\fP \ %c -- 2.39.0.1106.gf45ba805d1a ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] proc.5: note broken v4.18 userspace promise 2022-12-23 17:59 [PATCH 0/2] proc.5: note broken v4.18 userspace promise Ævar Arnfjörð Bjarmason 2022-12-23 17:59 ` [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) Ævar Arnfjörð Bjarmason 2022-12-23 17:59 ` [PATCH 2/2] proc.5: the "comm" field can be longer than 16 bytes Ævar Arnfjörð Bjarmason @ 2022-12-23 18:12 ` Linus Torvalds 2022-12-28 20:26 ` Alejandro Colomar 2023-01-04 17:19 ` Tejun Heo 2 siblings, 2 replies; 8+ messages in thread From: Linus Torvalds @ 2022-12-23 18:12 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: linux-man, Alejandro Colomar, Tejun Heo, Craig Small, Alexey Dobriyan, Alejandro Colomar, Michael Kerrisk On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > Whereas the fix here is a fix for a promise we're currently making > which hasn't been true since v4.18. Hah. Ack. Did anybody ever actually notice? I wonder if the newer limit of 64 characters for kworkers shouldn't even be mentioned at all, and if the 16-byte truncation for user space should also be just removed. Those limits should never have been some documented API, they were always just implementation details, after all. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] proc.5: note broken v4.18 userspace promise 2022-12-23 18:12 ` [PATCH 0/2] proc.5: note broken v4.18 userspace promise Linus Torvalds @ 2022-12-28 20:26 ` Alejandro Colomar 2023-01-04 20:59 ` Ævar Arnfjörð Bjarmason 2023-01-04 17:19 ` Tejun Heo 1 sibling, 1 reply; 8+ messages in thread From: Alejandro Colomar @ 2022-12-28 20:26 UTC (permalink / raw) To: Linus Torvalds, Ævar Arnfjörð Bjarmason Cc: linux-man, Alejandro Colomar, Tejun Heo, Craig Small, Alexey Dobriyan, Michael Kerrisk [-- Attachment #1.1: Type: text/plain, Size: 1886 bytes --] Hi, On 12/23/22 19:12, Linus Torvalds wrote: > On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> >> Whereas the fix here is a fix for a promise we're currently making >> which hasn't been true since v4.18. > > Hah. Ack. Did anybody ever actually notice? > > I wonder if the newer limit of 64 characters for kworkers shouldn't > even be mentioned at all, and if the 16-byte truncation for user space > should also be just removed. > > Those limits should never have been some documented API, they were > always just implementation details, after all. > > Linus I agree. A variable implementation detail like this doesn't provide anything valuable to users; especially since there's no statbility promise at all. I'd rewrite to just remove the (16) implementation detail. Ævar, would you send an v2 that removes implementation details, rather than fixing the details? Thanks for the patch set! Cheers, Alex On 12/23/22 18:59, Ævar Arnfjörð Bjarmason wrote: > diff --git a/man5/proc.5 b/man5/proc.5 > index 115c8592e..b23dd1479 100644 > --- a/man5/proc.5 > +++ b/man5/proc.5 > @@ -2092,9 +2092,13 @@ The filename of the executable, in parentheses. Tools such as > may alternatively (or additionally) use > .IR/proc/ pid /cmdline. > .IP > -Strings longer than > +For userspace, strings longer than > .B TASK_COMM_LEN > (16) characters (including the terminating null byte) are silently truncated. > +Since Linux version 4.18.0 a longer limit of 64 (including the > +terminating null byte) has applied to the kernel's own workqueue > +workers (whose names start with "kworker/"). > +.IP > This is visible whether or not the executable is swapped out. > .TP > (3) \fIstate\fP \ %c -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] proc.5: note broken v4.18 userspace promise 2022-12-28 20:26 ` Alejandro Colomar @ 2023-01-04 20:59 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 8+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2023-01-04 20:59 UTC (permalink / raw) To: Alejandro Colomar Cc: Linus Torvalds, linux-man, Alejandro Colomar, Tejun Heo, Craig Small, Alexey Dobriyan, Michael Kerrisk On Wed, Dec 28 2022, Alejandro Colomar wrote: > [[PGP Signed Part:Undecided]] > Hi, > > On 12/23/22 19:12, Linus Torvalds wrote: >> On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason >> <avarab@gmail.com> wrote: >>> >>> Whereas the fix here is a fix for a promise we're currently making >>> which hasn't been true since v4.18. >> Hah. Ack. Did anybody ever actually notice? >> I wonder if the newer limit of 64 characters for kworkers shouldn't >> even be mentioned at all, and if the 16-byte truncation for user space >> should also be just removed. >> Those limits should never have been some documented API, they were >> always just implementation details, after all. >> Linus > Sorry about the late reply, holidays. > I agree. A variable implementation detail like this doesn't provide > anything valuable to users; especially since there's no statbility > promise at all. I'd rewrite to just remove the (16) implementation > detail. > > Ævar, would you send an v2 that removes implementation details, rather > than fixing the details? Maybe, because I'm not sure I'm qualified to document this anymore. My current patch just extends the description to cover the 4.18 divergence. Let's separate a few things here: A. The long-standing docs promise that it's limited to 16 bytes B. Since 4.18 that hasn't been true for (some of) the kernel's own processes, where the limit's been 64. C. Was the part of "A" where a limit was documented at all a good idea in retrospect? D. If "C"'s a "no" (which seems to be the consensus) what should the docs say? E. I hadn't mentioned this before, but the docs for prctl()'s PR_SET_NAME document the same 16 byte limit. I think the current behavior since 4.18 is a broken userspace promise, although admittedly a minor/obscure one. I think even if going forward the documentation is deliberately ambiguous about it, it would make sense to briefly document the 16 and 64 byte limits as past limits, to at least help to explain why current code parsing "/proc/*/stat" seems to be confident in those (or more commonly, the 16 bytes). The code I wrote was rather anal about that promise, but e.g. looking at htop(1)'s source code they've got a total limit of 2048 for this sort of line (MAX_READ). I'm sure if I went fishing I could find other similar cases (and probably some lower ones). I don't think it would be good to just leave it ambiguous for those trying to use this interface. They might assume any of 16 bytes (from finding the prctl() PR_SET_NAME docs), 64 bytes (from reading kernel sources), 255 (maximum filename length) etc. Wouldn't the least bad thing be to: * Cover "A" and "B" in passing, i.e. explain past promised / implemented limits. * Note that this is no guarantee, but that... * ...we might use up to N, where N is some sane limit (1024? 2048? 4096?). So programs that parse this now could just increase their fixed buffers, rather than having to use some getc()/realloc() loop, as they might if the interface makes no promises about an upper bound, and if they're being paranoid about future-proofing the parser. If so I have no opinion on what value of "N" would be sane, other than it seems best to pick something. ? > On 12/23/22 18:59, Ævar Arnfjörð Bjarmason wrote: >> diff --git a/man5/proc.5 b/man5/proc.5 >> index 115c8592e..b23dd1479 100644 >> --- a/man5/proc.5 >> +++ b/man5/proc.5 >> @@ -2092,9 +2092,13 @@ The filename of the executable, in > parentheses. Tools such as >> may alternatively (or additionally) use >> .IR/proc/ pid /cmdline. >> .IP >> -Strings longer than >> +For userspace, strings longer than >> .B TASK_COMM_LEN >> (16) characters (including the terminating null byte) are silently truncated. >> +Since Linux version 4.18.0 a longer limit of 64 (including the >> +terminating null byte) has applied to the kernel's own workqueue >> +workers (whose names start with "kworker/"). >> +.IP >> This is visible whether or not the executable is swapped out. >> .TP >> (3) \fIstate\fP \ %c ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/2] proc.5: note broken v4.18 userspace promise 2022-12-23 18:12 ` [PATCH 0/2] proc.5: note broken v4.18 userspace promise Linus Torvalds 2022-12-28 20:26 ` Alejandro Colomar @ 2023-01-04 17:19 ` Tejun Heo 1 sibling, 0 replies; 8+ messages in thread From: Tejun Heo @ 2023-01-04 17:19 UTC (permalink / raw) To: Linus Torvalds Cc: Ævar Arnfjörð Bjarmason, linux-man, Alejandro Colomar, Craig Small, Alexey Dobriyan, Alejandro Colomar, Michael Kerrisk, Yafang Shao On Fri, Dec 23, 2022 at 10:12:22AM -0800, Linus Torvalds wrote: > On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: > > > > Whereas the fix here is a fix for a promise we're currently making > > which hasn't been true since v4.18. > > Hah. Ack. Did anybody ever actually notice? > > I wonder if the newer limit of 64 characters for kworkers shouldn't > even be mentioned at all, and if the 16-byte truncation for user space > should also be just removed. > > Those limits should never have been some documented API, they were > always just implementation details, after all. In 2018, 6b59808bfe48 ("workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}") extended the length of the wq workers. 64 used for the formatting buffer size is definitely an implementation detail. Last year, Yafang (cc'd) added d6986ce24fc0 ("kthread: dynamically allocate memory to store kthread's full name") removed restrictions on kthread names completely. It now gets dynamically allocated if long than TASK_COMM_LEN. BTW, on allocation failure, the name gets silently truncated. I think it'd be better to fail kthread creation instead (not that we're likely to fail these allocations but still). I think the right thing to do here is saying that all kernel thread names can be longer than 16. Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-01-04 21:32 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-12-23 17:59 [PATCH 0/2] proc.5: note broken v4.18 userspace promise Ævar Arnfjörð Bjarmason 2022-12-23 17:59 ` [PATCH 1/2] proc.5: note that "cmdline" might be favored over "stat.comm" by ps(1) Ævar Arnfjörð Bjarmason 2022-12-28 20:27 ` Alejandro Colomar 2022-12-23 17:59 ` [PATCH 2/2] proc.5: the "comm" field can be longer than 16 bytes Ævar Arnfjörð Bjarmason 2022-12-23 18:12 ` [PATCH 0/2] proc.5: note broken v4.18 userspace promise Linus Torvalds 2022-12-28 20:26 ` Alejandro Colomar 2023-01-04 20:59 ` Ævar Arnfjörð Bjarmason 2023-01-04 17:19 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox