* [PATCH 0/2] taskstats: fix TGID exit version and tool message truncation
From: Yiyang Chen @ 2026-03-29 19:00 UTC
To: Balbir Singh
Cc: linux-kernel, Andrew Morton, Wang Yaxin, Fan Yu,
Dr . Thomas Orgis, Yiyang Chen
This series contains two independent fixes around taskstats.
The first patch fixes a long-standing taskstats bug where TGID exit
notifications can carry version == 0 because the cached signal->stats
aggregate is copied into the outgoing payload without restoring the
taskstats version field.
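A toy model of the problem, as a minimal sketch rather than the kernel code: the struct, the helper names, and the version constant below are all hypothetical stand-ins. It only illustrates that copying a zero-initialized aggregate whose accumulation step never sets the version leaves version == 0 in the outgoing record unless it is restored after the copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TASKSTATS_VERSION; the value is illustrative. */
#define TOY_TASKSTATS_VERSION 14

/* Hypothetical, heavily reduced stand-in for struct taskstats. */
struct toy_taskstats {
	uint16_t version;
	uint64_t cpu_delay_total;
};

/* Models an accumulate-only helper: adds data, never touches version. */
static void toy_accumulate(struct toy_taskstats *agg, uint64_t delay)
{
	assert(agg != NULL);
	agg->cpu_delay_total += delay;
}

/*
 * Models the exit path: the cached aggregate is copied into the
 * outgoing payload. The copy also overwrites the version field, so
 * it has to be set again afterwards.
 */
static void toy_build_exit_payload(struct toy_taskstats *out,
				   const struct toy_taskstats *cached,
				   int restore_version)
{
	assert(out != NULL && cached != NULL);
	memcpy(out, cached, sizeof(*out));
	if (restore_version)
		out->version = TOY_TASKSTATS_VERSION;
}
```

Calling toy_build_exit_payload() with restore_version == 0 models the current behaviour; passing 1 models the one-line fix.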
The second patch hardens the accounting sample tools against truncated
taskstats netlink messages by switching to recvmsg(), checking
MSG_TRUNC explicitly, and increasing the receive buffer size.
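The truncation check can be sketched in isolation. This is a minimal illustration, not the code from the patch: recv_checked() and demo_truncation() are hypothetical names, and the demo uses an AF_UNIX datagram socketpair instead of a netlink socket so that it runs without privileges. It assumes Linux behaviour, where recvmsg() sets MSG_TRUNC in msg_flags when a datagram does not fit the supplied buffer.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Receive one datagram into buf. Returns the byte count, or -1 with
 * errno set; a datagram that did not fit is reported as EMSGSIZE.
 */
static ssize_t recv_checked(int sd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr hdr;
	ssize_t ret;

	assert(buf != NULL && len > 0);
	memset(&hdr, 0, sizeof(hdr));
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;

	ret = recvmsg(sd, &hdr, 0);
	if (ret < 0)
		return -1;
	if (hdr.msg_flags & MSG_TRUNC) {
		errno = EMSGSIZE;
		return -1;
	}
	return ret;
}

/* Illustrative self-check: returns 0 if both cases behave as expected. */
static int demo_truncation(void)
{
	static const char payload[] = "0123456789";
	char small[4], large[64];
	int sv[2], ok;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* 10 bytes into a 4-byte buffer: must be flagged as truncated. */
	ok = send(sv[0], payload, 10, 0) == 10 &&
	     recv_checked(sv[1], small, sizeof(small)) == -1 &&
	     errno == EMSGSIZE;

	/* 10 bytes into a 64-byte buffer: must be delivered whole. */
	ok = ok && send(sv[0], payload, 10, 0) == 10 &&
	     recv_checked(sv[1], large, sizeof(large)) == 10;

	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

A plain recv() call returns only the number of bytes copied, which is why the flag check requires recvmsg() and its msg_flags field.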
Yiyang Chen (2):
taskstats: set version in TGID exit notifications
tools/accounting: handle truncated taskstats netlink messages
kernel/taskstats.c | 1 +
tools/accounting/getdelays.c | 41 ++++++++++++++++++++++++++++++++----
tools/accounting/procacct.c | 40 +++++++++++++++++++++++++++++++----
3 files changed, 74 insertions(+), 8 deletions(-)
base-commit: f242ac4a09443c6e2e0ec03d7e2a21b00cbb3907
--
2.43.0
* [PATCH 1/2] taskstats: set version in TGID exit notifications
From: Yiyang Chen @ 2026-03-29 19:00 UTC
To: Balbir Singh
Cc: linux-kernel, Andrew Morton, Wang Yaxin, Fan Yu,
Dr . Thomas Orgis, Yiyang Chen, stable
delay accounting started populating taskstats records with a valid
version field via fill_pid() and fill_tgid().
Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
interface send tgid once") changed the TGID exit path to send the
cached signal->stats aggregate directly instead of building the outgoing
record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
accumulates accounting data and never initializes stats->version.
As a result, TGID exit notifications can reach userspace with
version == 0 even though PID exit notifications and
TASKSTATS_CMD_GET replies carry a valid taskstats version.
Set stats->version = TASKSTATS_VERSION after copying the cached TGID
aggregate into the outgoing netlink payload so all taskstats records are
self-describing again.
Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
Cc: stable@vger.kernel.org
Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
---
kernel/taskstats.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 0cd680ccc7e5..73bd6a6a7893 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -649,6 +649,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
goto err;
memcpy(stats, tsk->signal->stats, sizeof(*stats));
+ stats->version = TASKSTATS_VERSION;
send:
send_cpu_listeners(rep_skb, listeners);
--
2.43.0
* [PATCH 2/2] tools/accounting: handle truncated taskstats netlink messages
From: Yiyang Chen @ 2026-03-29 19:00 UTC
To: Balbir Singh
Cc: linux-kernel, Andrew Morton, Wang Yaxin, Fan Yu,
Dr . Thomas Orgis, Yiyang Chen
procacct and getdelays use a fixed receive buffer for taskstats
generic netlink messages. A multi-threaded process exit can emit a
single PID+TGID notification large enough to exceed that buffer on
newer kernels.
Switch to recvmsg() so MSG_TRUNC is detected explicitly, increase the
message buffer size, and report truncated datagrams clearly instead of
misparsing them as fatal netlink errors.
Also print the taskstats version in debug output to make version
mismatches easier to diagnose while inspecting taskstats traffic.
Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
---
tools/accounting/getdelays.c | 41 ++++++++++++++++++++++++++++++++----
tools/accounting/procacct.c | 40 +++++++++++++++++++++++++++++++----
2 files changed, 73 insertions(+), 8 deletions(-)
diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 50792df27707..368a622ca027 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -60,7 +60,7 @@ int print_task_context_switch_counts;
}
/* Maximum size of response requested or message sent */
-#define MAX_MSG_SIZE 1024
+#define MAX_MSG_SIZE 2048
/* Maximum number of cpus expected to be specified in a cpumask */
#define MAX_CPUS 32
@@ -115,6 +115,32 @@ static int create_nl_socket(int protocol)
return -1;
}
+static int recv_taskstats_msg(int sd, struct msgtemplate *msg)
+{
+ struct sockaddr_nl nladdr;
+ struct iovec iov = {
+ .iov_base = msg,
+ .iov_len = sizeof(*msg),
+ };
+ struct msghdr hdr = {
+ .msg_name = &nladdr,
+ .msg_namelen = sizeof(nladdr),
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ };
+ int ret;
+
+ ret = recvmsg(sd, &hdr, 0);
+ if (ret < 0)
+ return -1;
+ if (hdr.msg_flags & MSG_TRUNC) {
+ errno = EMSGSIZE;
+ return -1;
+ }
+
+ return ret;
+}
+
static int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
__u8 genl_cmd, __u16 nla_type,
@@ -633,12 +659,16 @@ int main(int argc, char *argv[])
}
do {
- rep_len = recv(nl_sd, &msg, sizeof(msg), 0);
+ rep_len = recv_taskstats_msg(nl_sd, &msg);
PRINTF("received %d bytes\n", rep_len);
if (rep_len < 0) {
- fprintf(stderr, "nonfatal reply error: errno %d\n",
- errno);
+ if (errno == EMSGSIZE)
+ fprintf(stderr,
+ "dropped truncated taskstats netlink message, please increase MAX_MSG_SIZE\n");
+ else
+ fprintf(stderr, "nonfatal reply error: errno %d\n",
+ errno);
continue;
}
if (msg.n.nlmsg_type == NLMSG_ERROR ||
@@ -680,6 +710,9 @@ int main(int argc, char *argv[])
printf("TGID\t%d\n", rtid);
break;
case TASKSTATS_TYPE_STATS:
+ PRINTF("version %u\n",
+ ((struct taskstats *)
+ NLA_DATA(na))->version);
if (print_delays)
print_delayacct((struct taskstats *) NLA_DATA(na));
if (print_io_accounting)
diff --git a/tools/accounting/procacct.c b/tools/accounting/procacct.c
index e8dee05a6264..46e5986ad927 100644
--- a/tools/accounting/procacct.c
+++ b/tools/accounting/procacct.c
@@ -71,7 +71,7 @@ int print_task_context_switch_counts;
}
/* Maximum size of response requested or message sent */
-#define MAX_MSG_SIZE 1024
+#define MAX_MSG_SIZE 2048
/* Maximum number of cpus expected to be specified in a cpumask */
#define MAX_CPUS 32
@@ -121,6 +121,32 @@ static int create_nl_socket(int protocol)
return -1;
}
+static int recv_taskstats_msg(int sd, struct msgtemplate *msg)
+{
+ struct sockaddr_nl nladdr;
+ struct iovec iov = {
+ .iov_base = msg,
+ .iov_len = sizeof(*msg),
+ };
+ struct msghdr hdr = {
+ .msg_name = &nladdr,
+ .msg_namelen = sizeof(nladdr),
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ };
+ int ret;
+
+ ret = recvmsg(sd, &hdr, 0);
+ if (ret < 0)
+ return -1;
+ if (hdr.msg_flags & MSG_TRUNC) {
+ errno = EMSGSIZE;
+ return -1;
+ }
+
+ return ret;
+}
+
static int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
__u8 genl_cmd, __u16 nla_type,
@@ -239,6 +265,8 @@ void handle_aggr(int mother, struct nlattr *na, int fd)
PRINTF("TGID\t%d\n", rtid);
break;
case TASKSTATS_TYPE_STATS:
+ PRINTF("version %u\n",
+ ((struct taskstats *)NLA_DATA(na))->version);
if (mother == TASKSTATS_TYPE_AGGR_PID)
print_procacct((struct taskstats *) NLA_DATA(na));
if (fd) {
@@ -347,12 +375,16 @@ int main(int argc, char *argv[])
}
do {
- rep_len = recv(nl_sd, &msg, sizeof(msg), 0);
+ rep_len = recv_taskstats_msg(nl_sd, &msg);
PRINTF("received %d bytes\n", rep_len);
if (rep_len < 0) {
- fprintf(stderr, "nonfatal reply error: errno %d\n",
- errno);
+ if (errno == EMSGSIZE)
+ fprintf(stderr,
+ "dropped truncated taskstats netlink message, please increase MAX_MSG_SIZE\n");
+ else
+ fprintf(stderr, "nonfatal reply error: errno %d\n",
+ errno);
continue;
}
if (msg.n.nlmsg_type == NLMSG_ERROR ||
--
2.43.0
* Re: [PATCH 1/2] taskstats: set version in TGID exit notifications
From: Andrew Morton @ 2026-03-30 21:29 UTC
To: Yiyang Chen
Cc: Balbir Singh, linux-kernel, Wang Yaxin, Fan Yu, Dr . Thomas Orgis,
stable
On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@gmail.com> wrote:
> delay accounting started populating taskstats records with a valid
> version field via fill_pid() and fill_tgid().
>
> Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> interface send tgid once") changed the TGID exit path to send the
> cached signal->stats aggregate directly instead of building the outgoing
> record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> accumulates accounting data and never initializes stats->version.
>
> As a result, TGID exit notifications can reach userspace with
> version == 0 even though PID exit notifications and
> TASKSTATS_CMD_GET replies carry a valid taskstats version.
>
> Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> aggregate into the outgoing netlink payload so all taskstats records are
> self-describing again.
>
> Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
Thanks, lol, 20 years ago.
Can you explain how others can trigger this? Some combination of
steps which results in the bad output?
> Cc: stable@vger.kernel.org
Is there a chance of breaking existing userspace here? Some existing
userspace code which is expecting 0 here and will get surprised by this
change?
> --- a/kernel/taskstats.c
> +++ b/kernel/taskstats.c
> @@ -649,6 +649,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
> goto err;
>
> memcpy(stats, tsk->signal->stats, sizeof(*stats));
> + stats->version = TASKSTATS_VERSION;
>
> send:
> send_cpu_listeners(rep_skb, listeners);
* Re: [PATCH 1/2] taskstats: set version in TGID exit notifications
From: Yiyang Chen @ 2026-03-31 16:20 UTC
To: akpm
Cc: bsingharora, cyyzero16, fan.yu9, linux-kernel, stable,
thomas.orgis, wang.yaxin
On Tue, Mar 31, 2026 at 5:29 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@gmail.com> wrote:
>
> > delay accounting started populating taskstats records with a valid
> > version field via fill_pid() and fill_tgid().
> >
> > Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> > interface send tgid once") changed the TGID exit path to send the
> > cached signal->stats aggregate directly instead of building the outgoing
> > record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> > accumulates accounting data and never initializes stats->version.
> >
> > As a result, TGID exit notifications can reach userspace with
> > version == 0 even though PID exit notifications and
> > TASKSTATS_CMD_GET replies carry a valid taskstats version.
> >
> > Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> > aggregate into the outgoing netlink payload so all taskstats records are
> > self-describing again.
> >
> > Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
>
> Thanks, lol, 20 years ago.
>
> Can you explain how others can trigger this? Some combination of
> steps which results in the bad output?
Yes. This is easy to reproduce with `tools/accounting/getdelays.c`.
I have a small follow-up patch for that tool which:
1. increases the receive buffer/message size so the pid+tgid combined exit
notification is not dropped/truncated
2. prints `stats->version`.
With that patch, the reproducer is:
Terminal 1:
./getdelays -d -v -l -m 0
Terminal 2:
taskset -c 0 python3 -c 'import threading,time; t=threading.Thread(target=time.sleep,args=(0.1,)); t.start(); t.join()'
That produces both PID and TGID exit notifications for the same process. The PID
exit record reports a valid taskstats version, while the TGID exit record reports
`version 0`.
>
> > Cc: stable@vger.kernel.org
>
> Is there a chance of breaking existing userspace here? Some existing
> userspace code which is expecting 0 here and will get surprised by this
> change?
In practice, userspace uses `taskstats.version` to decide which fields are
present in `struct taskstats`, i.e. as a schema/version discriminator. A zero
version does not describe a valid taskstats layout, so it is hard to see how
userspace could use `0` as a meaningful or useful distinction here.
So I do not think fixing this in mainline should break sensible userspace. It
just restores consistency of the taskstats version semantics across
`TASKSTATS_CMD_GET`, PID exit notifications, and TGID exit notifications.
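The discriminator pattern described above can be sketched as follows. This is a hedged illustration only: field_present() and EXAMPLE_FIELD_SINCE are hypothetical, and the threshold value does not correspond to any specific taskstats field. It shows why a record with version 0 cannot be interpreted by a consumer that gates field access on the version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the first version that carries some newer field. */
#define EXAMPLE_FIELD_SINCE 9

/*
 * A consumer gates access to a field on the record's version. Every
 * field was introduced at some version >= 1, so a record reporting
 * version 0 advertises no fields at all.
 */
static bool field_present(uint16_t record_version, uint16_t since)
{
	assert(since > 0); /* version 0 never describes a layout */
	return record_version >= since;
}
```

With this kind of check, field_present(0, EXAMPLE_FIELD_SINCE) is false for every possible threshold, so a TGID exit record with version 0 would be treated as carrying no usable fields.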
To be honest, I'm not sure whether this should be backported to stable, but I
think mainline should still fix it.
Thanks,
Yiyang