* [PATCH 0/2] taskstats: fix TGID exit version and tool message truncation
From: Yiyang Chen @ 2026-03-29 19:00 UTC
To: Balbir Singh
Cc: linux-kernel, Andrew Morton, Wang Yaxin, Fan Yu, Dr . Thomas Orgis, Yiyang Chen

This series contains two independent fixes around taskstats.

The first patch fixes a long-standing taskstats bug where TGID exit
notifications can carry version == 0 because the cached signal->stats
aggregate is copied into the outgoing payload without restoring the
taskstats version field.

The second patch hardens the accounting sample tools against truncated
taskstats netlink messages by switching to recvmsg(), checking MSG_TRUNC
explicitly, and increasing the receive buffer size.

Yiyang Chen (2):
  taskstats: set version in TGID exit notifications
  tools/accounting: handle truncated taskstats netlink messages

 kernel/taskstats.c           |  1 +
 tools/accounting/getdelays.c | 41 ++++++++++++++++++++++++++++++++----
 tools/accounting/procacct.c  | 40 +++++++++++++++++++++++++++++++----
 3 files changed, 74 insertions(+), 8 deletions(-)

base-commit: f242ac4a09443c6e2e0ec03d7e2a21b00cbb3907
-- 
2.43.0

* [PATCH 1/2] taskstats: set version in TGID exit notifications
From: Yiyang Chen @ 2026-03-29 19:00 UTC
To: Balbir Singh
Cc: linux-kernel, Andrew Morton, Wang Yaxin, Fan Yu, Dr . Thomas Orgis, Yiyang Chen, stable

delay accounting started populating taskstats records with a valid
version field via fill_pid() and fill_tgid().

Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
interface send tgid once") changed the TGID exit path to send the
cached signal->stats aggregate directly instead of building the outgoing
record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
accumulates accounting data and never initializes stats->version.

As a result, TGID exit notifications can reach userspace with
version == 0 even though PID exit notifications and
TASKSTATS_CMD_GET replies carry a valid taskstats version.

Set stats->version = TASKSTATS_VERSION after copying the cached TGID
aggregate into the outgoing netlink payload so all taskstats records are
self-describing again.

Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
Cc: stable@vger.kernel.org
Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
---
 kernel/taskstats.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 0cd680ccc7e5..73bd6a6a7893 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -649,6 +649,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
 		goto err;
 
 	memcpy(stats, tsk->signal->stats, sizeof(*stats));
+	stats->version = TASKSTATS_VERSION;
 
 send:
 	send_cpu_listeners(rep_skb, listeners);
-- 
2.43.0

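The failure mode described in the commit message can be modelled outside the kernel. The following is a minimal sketch, not kernel code: `struct model_taskstats`, `MODEL_TASKSTATS_VERSION` and both helpers are hypothetical stand-ins, the constant's value is arbitrary, and the cached aggregate is assumed to start out zeroed, which is consistent with the version == 0 reported above.

```c
#include <string.h>

/* Hypothetical stand-in for TASKSTATS_VERSION; the value is arbitrary. */
#define MODEL_TASKSTATS_VERSION 1

/* Hypothetical two-field model; the real struct taskstats is much larger. */
struct model_taskstats {
	unsigned short version;
	unsigned long long cpu_run_total;
};

/*
 * Models the role the patch attributes to fill_tgid_exit(): it only
 * accumulates accounting data into the cached aggregate and never
 * writes the version field.
 */
static void model_fill_tgid_exit(struct model_taskstats *cached,
				 unsigned long long thread_cpu)
{
	cached->cpu_run_total += thread_cpu;
}

/*
 * Models the TGID branch of taskstats_exit(): the cached aggregate is
 * copied into the outgoing record. With fixed == 0 the copy carries
 * whatever version the cache holds; with fixed != 0 the version is
 * restored after the copy, as the patch does.
 */
static void model_build_tgid_exit(struct model_taskstats *out,
				  const struct model_taskstats *cached,
				  int fixed)
{
	memcpy(out, cached, sizeof(*out));
	if (fixed)
		out->version = MODEL_TASKSTATS_VERSION;
}
```

In this model the accounting totals survive the copy either way; only the version field differs between the unfixed and fixed paths, which is why the bug is invisible to a reader that never looks at the version.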
* Re: [PATCH 1/2] taskstats: set version in TGID exit notifications
From: Andrew Morton @ 2026-03-30 21:29 UTC
To: Yiyang Chen
Cc: Balbir Singh, linux-kernel, Wang Yaxin, Fan Yu, Dr . Thomas Orgis, stable

On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@gmail.com> wrote:

> delay accounting started populating taskstats records with a valid
> version field via fill_pid() and fill_tgid().
>
> Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> interface send tgid once") changed the TGID exit path to send the
> cached signal->stats aggregate directly instead of building the outgoing
> record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> accumulates accounting data and never initializes stats->version.
>
> As a result, TGID exit notifications can reach userspace with
> version == 0 even though PID exit notifications and
> TASKSTATS_CMD_GET replies carry a valid taskstats version.
>
> Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> aggregate into the outgoing netlink payload so all taskstats records are
> self-describing again.
>
> Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")

Thanks, lol, 20 years ago.

Can you explain how others can trigger this?  Some combination of
steps which results in the bad output?

> Cc: stable@vger.kernel.org

Is there a chance of breaking existing userspace here?  Some existing
userspace code which is expecting 0 here and will get surprised by this
change?

> --- a/kernel/taskstats.c
> +++ b/kernel/taskstats.c
> @@ -649,6 +649,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
>  		goto err;
>  
>  	memcpy(stats, tsk->signal->stats, sizeof(*stats));
> +	stats->version = TASKSTATS_VERSION;
>  
>  send:
>  	send_cpu_listeners(rep_skb, listeners);

* Re: [PATCH 1/2] taskstats: set version in TGID exit notifications
From: Yiyang Chen @ 2026-03-31 16:20 UTC
To: akpm
Cc: bsingharora, cyyzero16, fan.yu9, linux-kernel, stable, thomas.orgis, wang.yaxin

On Tue, Mar 31, 2026 at 5:29 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 30 Mar 2026 03:00:40 +0800 Yiyang Chen <cyyzero16@gmail.com> wrote:
>
> > delay accounting started populating taskstats records with a valid
> > version field via fill_pid() and fill_tgid().
> >
> > Later, commit ad4ecbcba728 ("[PATCH] delay accounting taskstats
> > interface send tgid once") changed the TGID exit path to send the
> > cached signal->stats aggregate directly instead of building the outgoing
> > record through fill_tgid(). Unlike fill_tgid(), fill_tgid_exit() only
> > accumulates accounting data and never initializes stats->version.
> >
> > As a result, TGID exit notifications can reach userspace with
> > version == 0 even though PID exit notifications and
> > TASKSTATS_CMD_GET replies carry a valid taskstats version.
> >
> > Set stats->version = TASKSTATS_VERSION after copying the cached TGID
> > aggregate into the outgoing netlink payload so all taskstats records are
> > self-describing again.
> >
> > Fixes: ad4ecbcba728 ("[PATCH] delay accounting taskstats interface send tgid once")
>
> Thanks, lol, 20 years ago.
>
> Can you explain how others can trigger this?  Some combination of
> steps which results in the bad output?

Yes. This is easy to reproduce with `tools/accounting/getdelays.c`.

I have a small follow-up patch for that tool which:

1. increases the receive buffer/message size so the pid+tgid combined exit
   notification is not dropped/truncated
2. prints `stats->version`.

With that patch, the reproducer is:

Terminal 1:
  ./getdelays -d -v -l -m 0

Terminal 2:
  taskset -c 0 python3 -c 'import threading,time; t=threading.Thread(target=time.sleep,args=(0.1,)); t.start(); t.join()'

That produces both PID and TGID exit notifications for the same
process. The PID exit record reports a valid taskstats version, while
the TGID exit record reports `version 0`.

> > Cc: stable@vger.kernel.org
>
> Is there a chance of breaking existing userspace here?  Some existing
> userspace code which is expecting 0 here and will get surprised by this
> change?

In practice, userspace uses `taskstats.version` to decide which fields
are present in `struct taskstats`, i.e. as a schema/version
discriminator. A zero version does not describe a valid taskstats
layout, so it is hard to see how userspace could use `0` as a meaningful
or useful distinction here.

So I do not think fixing this in mainline should break sensible
userspace. It just restores consistency of the taskstats version
semantics across `TASKSTATS_CMD_GET`, PID exit notifications, and TGID
exit notifications.

To be honest, I'm also not sure whether this should be backported to
stable. But I think mainline should still fix it.

Thanks,
Yiyang

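The "schema/version discriminator" argument above can be illustrated with a reader that gates field access on the record's version. This is only a sketch under stated assumptions: `struct demo_stats`, `DEMO_NEWER_MIN_VERSION` and `demo_read_newer_counter()` are hypothetical names invented for illustration, and the layout is not the real `struct taskstats`.

```c
#include <stdbool.h>

/* Hypothetical record layout for illustration only. */
struct demo_stats {
	unsigned short version;
	unsigned long long base_counter;  /* assumed present in every valid version */
	unsigned long long newer_counter; /* assumed added in DEMO_NEWER_MIN_VERSION */
};

/* Hypothetical version in which newer_counter first appeared. */
#define DEMO_NEWER_MIN_VERSION 5

/*
 * Read newer_counter only when the record's version says the field
 * exists. A record carrying version 0 fails this check, so a reader
 * written this way ignores the field even if the data is present.
 */
static bool demo_read_newer_counter(const struct demo_stats *s,
				    unsigned long long *out)
{
	if (s->version < DEMO_NEWER_MIN_VERSION)
		return false;
	*out = s->newer_counter;
	return true;
}
```

A reader of this shape has no branch in which version 0 selects a layout, so restoring the real version can only make more fields readable; it does not change how an already-valid record is interpreted.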
* [PATCH 2/2] tools/accounting: handle truncated taskstats netlink messages
From: Yiyang Chen @ 2026-03-29 19:00 UTC
To: Balbir Singh
Cc: linux-kernel, Andrew Morton, Wang Yaxin, Fan Yu, Dr . Thomas Orgis, Yiyang Chen

procacct and getdelays use a fixed receive buffer for taskstats generic
netlink messages. A multi-threaded process exit can emit a single
PID+TGID notification large enough to exceed that buffer on newer
kernels.

Switch to recvmsg() so MSG_TRUNC is detected explicitly, increase the
message buffer size, and report truncated datagrams clearly instead of
misparsing them as fatal netlink errors.

Also print the taskstats version in debug output to make version
mismatches easier to diagnose while inspecting taskstats traffic.

Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
---
 tools/accounting/getdelays.c | 41 ++++++++++++++++++++++++++++++++----
 tools/accounting/procacct.c  | 40 +++++++++++++++++++++++++++++++----
 2 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 50792df27707..368a622ca027 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -60,7 +60,7 @@ int print_task_context_switch_counts;
 	}
 
 /* Maximum size of response requested or message sent */
-#define MAX_MSG_SIZE	1024
+#define MAX_MSG_SIZE	2048
 /* Maximum number of cpus expected to be specified in a cpumask */
 #define MAX_CPUS	32
 
@@ -115,6 +115,32 @@ static int create_nl_socket(int protocol)
 	return -1;
 }
 
+static int recv_taskstats_msg(int sd, struct msgtemplate *msg)
+{
+	struct sockaddr_nl nladdr;
+	struct iovec iov = {
+		.iov_base = msg,
+		.iov_len = sizeof(*msg),
+	};
+	struct msghdr hdr = {
+		.msg_name = &nladdr,
+		.msg_namelen = sizeof(nladdr),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int ret;
+
+	ret = recvmsg(sd, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	if (hdr.msg_flags & MSG_TRUNC) {
+		errno = EMSGSIZE;
+		return -1;
+	}
+
+	return ret;
+}
+
 
 static int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
 	     __u8 genl_cmd, __u16 nla_type,
@@ -633,12 +659,16 @@ int main(int argc, char *argv[])
 	}
 
 	do {
-		rep_len = recv(nl_sd, &msg, sizeof(msg), 0);
+		rep_len = recv_taskstats_msg(nl_sd, &msg);
 		PRINTF("received %d bytes\n", rep_len);
 
 		if (rep_len < 0) {
-			fprintf(stderr, "nonfatal reply error: errno %d\n",
-				errno);
+			if (errno == EMSGSIZE)
+				fprintf(stderr,
+					"dropped truncated taskstats netlink message, please increase MAX_MSG_SIZE\n");
+			else
+				fprintf(stderr, "nonfatal reply error: errno %d\n",
+					errno);
 			continue;
 		}
 		if (msg.n.nlmsg_type == NLMSG_ERROR ||
@@ -680,6 +710,9 @@ int main(int argc, char *argv[])
 						printf("TGID\t%d\n", rtid);
 						break;
 					case TASKSTATS_TYPE_STATS:
+						PRINTF("version %u\n",
+						       ((struct taskstats *)
+							NLA_DATA(na))->version);
 						if (print_delays)
 							print_delayacct((struct taskstats *) NLA_DATA(na));
 						if (print_io_accounting)
diff --git a/tools/accounting/procacct.c b/tools/accounting/procacct.c
index e8dee05a6264..46e5986ad927 100644
--- a/tools/accounting/procacct.c
+++ b/tools/accounting/procacct.c
@@ -71,7 +71,7 @@ int print_task_context_switch_counts;
 	}
 
 /* Maximum size of response requested or message sent */
-#define MAX_MSG_SIZE	1024
+#define MAX_MSG_SIZE	2048
 /* Maximum number of cpus expected to be specified in a cpumask */
 #define MAX_CPUS	32
 
@@ -121,6 +121,32 @@ static int create_nl_socket(int protocol)
 	return -1;
 }
 
+static int recv_taskstats_msg(int sd, struct msgtemplate *msg)
+{
+	struct sockaddr_nl nladdr;
+	struct iovec iov = {
+		.iov_base = msg,
+		.iov_len = sizeof(*msg),
+	};
+	struct msghdr hdr = {
+		.msg_name = &nladdr,
+		.msg_namelen = sizeof(nladdr),
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int ret;
+
+	ret = recvmsg(sd, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	if (hdr.msg_flags & MSG_TRUNC) {
+		errno = EMSGSIZE;
+		return -1;
+	}
+
+	return ret;
+}
+
 
 static int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
 	     __u8 genl_cmd, __u16 nla_type,
@@ -239,6 +265,8 @@ void handle_aggr(int mother, struct nlattr *na, int fd)
 			PRINTF("TGID\t%d\n", rtid);
 			break;
 		case TASKSTATS_TYPE_STATS:
+			PRINTF("version %u\n",
+			       ((struct taskstats *)NLA_DATA(na))->version);
 			if (mother == TASKSTATS_TYPE_AGGR_PID)
 				print_procacct((struct taskstats *) NLA_DATA(na));
 			if (fd) {
@@ -347,12 +375,16 @@ int main(int argc, char *argv[])
 	}
 
 	do {
-		rep_len = recv(nl_sd, &msg, sizeof(msg), 0);
+		rep_len = recv_taskstats_msg(nl_sd, &msg);
 		PRINTF("received %d bytes\n", rep_len);
 
 		if (rep_len < 0) {
-			fprintf(stderr, "nonfatal reply error: errno %d\n",
-				errno);
+			if (errno == EMSGSIZE)
+				fprintf(stderr,
+					"dropped truncated taskstats netlink message, please increase MAX_MSG_SIZE\n");
+			else
+				fprintf(stderr, "nonfatal reply error: errno %d\n",
+					errno);
 			continue;
 		}
 		if (msg.n.nlmsg_type == NLMSG_ERROR ||
-- 
2.43.0

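The MSG_TRUNC check that the patch introduces can be tried without a taskstats listener. The sketch below applies the same recvmsg() pattern to any datagram socket; `recv_datagram_checked()` is a hypothetical helper name, and using an AF_UNIX datagram socketpair in place of the generic netlink socket is an assumption made so the example runs unprivileged on Linux.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Receive one datagram into buf. Returns the number of bytes received,
 * or -1 with errno set. A datagram larger than len is reported as
 * EMSGSIZE instead of being handed back silently cut short, mirroring
 * the approach of recv_taskstats_msg() in the patch.
 */
static ssize_t recv_datagram_checked(int sd, void *buf, size_t len)
{
	struct iovec iov = {
		.iov_base = buf,
		.iov_len = len,
	};
	struct msghdr hdr = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};
	ssize_t ret;

	ret = recvmsg(sd, &hdr, 0);
	if (ret < 0)
		return -1;
	/* The kernel sets MSG_TRUNC in msg_flags when the datagram did not fit. */
	if (hdr.msg_flags & MSG_TRUNC) {
		errno = EMSGSIZE;
		return -1;
	}

	return ret;
}
```

With plain recv() the caller would see only the byte count that fit in the buffer and could not tell a short message from a truncated one. The truncated datagram is still consumed from the socket, so the caller can report the drop and continue with the next message.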
Thread overview: 5+ messages
2026-03-29 19:00 [PATCH 0/2] taskstats: fix TGID exit version and tool message truncation Yiyang Chen
2026-03-29 19:00 ` [PATCH 1/2] taskstats: set version in TGID exit notifications Yiyang Chen
2026-03-30 21:29   ` Andrew Morton
2026-03-31 16:20     ` Yiyang Chen
2026-03-29 19:00 ` [PATCH 2/2] tools/accounting: handle truncated taskstats netlink messages Yiyang Chen