From: Yiyang Chen <cyyzero16@gmail.com>
To: thomas.orgis@uni-hamburg.de
Cc: akpm@linux-foundation.org, bsingharora@gmail.com,
cyyzero16@gmail.com, linux-kernel@vger.kernel.org,
oleg@redhat.com, wang.yaxin@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH] taskstats: retain dead thread stats in TGID queries
Date: Tue, 31 Mar 2026 01:55:16 +0800
Message-ID: <20260330175535.25616-1-cyyzero16@gmail.com>
In-Reply-To: <20260329165823.1e26001d@plasteblaster>
Hi Dr. Thomas,
> I can discern that this was a structurally simple (MPI) program that
> spawned one process per CPU core and probably had two extra threads per
> core for communication. It allocated 34 % more memory than it actually
> needed. This one program took so much of the job's resources that other
> processes don't really count. A bad HPC job has a long table of
> commands each contributing a little, down towards individual calls to
> 'cat' and the like. I want to see and present those cases.
>
> In another application, I collect statistics using accumulated CPU time
> and coremem per program binary to be able to tell which programs and
> (older) versions use how much of our cluster over the years.
>
> With a counter for total tasks over the group lifetime added to struct
> taskstats and the missing fields filled following your patch, I could
> get all this information with a lot less overhead via datasets only on
> tgid exit and would not have to count each task as it finishes. I
> always like less overhead for monitoring/accounting!
Thanks a lot for the detailed feedback and for sharing your use case!
> > Factor the per-task TGID accumulation into a helper and use it in both
> > fill_stats_for_tgid() and fill_tgid_exit(). This keeps the fields
> > retained for dead threads aligned with the fields already accounted for
> > live threads, and follows the existing taskstats TGID aggregation model,
> > which already accumulates delay accounting in fill_tgid_exit() and
> > combines it with a live-thread scan in fill_stats_for_tgid().
>
> Pardon my ignorance, as I do not have the time right now to dive back
> into kernel code: Should other fields of interest also be filled? Do we
> have all of them covered? Memory highwater marks are not per-task,
> right? But coremem, virtmem? I/O stats?
You're right that my current patch only covers
ac_etime/ac_utime/ac_stime/nvcsw/nivcsw and delay accounting.
I focused on the fields that fill_stats_for_tgid() already accumulates
for live threads, to fix the inconsistency where dead threads dropped
out of the totals in TGID queries. The patch also unifies the fields
used for TGID queries and exit notifications, so that dead threads are
counted correctly in both.
Adding the other fields makes sense as a follow-up patch, though.
That may require a minor refactoring to reuse some of the code
for PID taskstats accounting.
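To make the aggregation model concrete, here is a rough userspace sketch
of the idea. It is not the kernel code from the patch: struct mini_stats,
tgid_accumulate(), on_thread_exit() and query_tgid() are hypothetical
names, and the struct only mirrors a few of the struct taskstats fields
mentioned above. The point is just that one helper feeds both the exit
path and the query path, so both account for the same set of fields.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a few struct taskstats fields;
 * this is NOT the real kernel layout. */
struct mini_stats {
	uint64_t ac_utime;
	uint64_t ac_stime;
	uint64_t nvcsw;
	uint64_t nivcsw;
};

/* Hypothetical shared helper: add one thread's counters to a group total. */
static void tgid_accumulate(struct mini_stats *total,
			    const struct mini_stats *task)
{
	total->ac_utime += task->ac_utime;
	total->ac_stime += task->ac_stime;
	total->nvcsw += task->nvcsw;
	total->nivcsw += task->nivcsw;
}

/* Models the exit path: fold the exiting thread into the retained totals. */
static void on_thread_exit(struct mini_stats *retained,
			   const struct mini_stats *task)
{
	tgid_accumulate(retained, task);
}

/* Models the query path: start from the totals retained for dead threads,
 * then add every live thread using the same helper. */
static struct mini_stats query_tgid(const struct mini_stats *retained,
				    const struct mini_stats *live, int nlive)
{
	struct mini_stats out = *retained;

	for (int i = 0; i < nlive; i++)
		tgid_accumulate(&out, &live[i]);
	return out;
}
```

With this shape, a thread that exits before the query still contributes
through the retained totals, and extending the helper to a new field
would cover both paths at once.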
> Also, in the end, I'd strongly prefer this patch to include a
> user-visible change in the API, like an increased TASKSTATS_VERSION.
> There are no new fields added, but the interpretation of the data is
> different now for tgid.
My current thinking is not to bump TASKSTATS_VERSION,
since the struct layout and fields are unchanged.
But if maintainers think the semantic change should be versioned,
I’m happy to do that.
Thanks,
Yiyang Chen