From: Yiyang Chen
To: thomas.orgis@uni-hamburg.de
Cc: akpm@linux-foundation.org, bsingharora@gmail.com, cyyzero16@gmail.com, linux-kernel@vger.kernel.org, oleg@redhat.com, wang.yaxin@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH] taskstats: retain dead thread stats in TGID queries
Date: Tue, 31 Mar 2026 01:55:16 +0800
Message-ID: <20260330175535.25616-1-cyyzero16@gmail.com>
In-Reply-To: <20260329165823.1e26001d@plasteblaster>
References: <20260329165823.1e26001d@plasteblaster>

Hi Dr. Thomas

> I can discern that this was a structurally simple (MPI) program that
> spawned one process per CPU core and probably had two extra threads per
> core for communication. It allocated 34 % more memory than it actually
> needed. This one program took so much of the job's resources that other
> processes don't really count. A bad HPC job has a long table of
> commands each contributing a little, down towards individual calls to
> 'cat' and the like. I want to see and present those cases.
>
> In another application, I collect statistics using accumulated CPU time
> and coremem per program binary to be able to tell which programs and
> (older) versions use how much of our cluster over the years.
>
> With a counter for total tasks over the group lifetime added to struct
> taskstats and the missing fields filled following your patch, I could
> get all this information with a lot less overhead via datasets only on
> tgid exit and would not have to count each task as it finishes. I
> always like less overhead for monitoring/accounting!

Thanks a lot for the detailed feedback and for sharing your use case!

> > Factor the per-task TGID accumulation into a helper and use it in both
> > fill_stats_for_tgid() and fill_tgid_exit(). This keeps the fields
> > retained for dead threads aligned with the fields already accounted for
> > live threads, and follows the existing taskstats TGID aggregation model,
> > which already accumulates delay accounting in fill_tgid_exit() and
> > combines it with a live-thread scan in fill_stats_for_tgid().
>
> Pardon my ignorance, as I do not have the time right now to dive back
> into kernel code: Should other fields of interest also be filled? Do we
> have all of them covered? Memory highwater marks are not per-task,
> right? But coremem, virtmem? I/O stats?

You're right that my current patch only covers
ac_etime/ac_utime/ac_stime/nvcsw/nivcsw and delay accounting. I focused
on the fields that fill_stats_for_tgid() already accumulates for live
threads, in order to fix the inconsistency where the contributions of
dead threads were lost in TGID queries. The patch also unifies the
fields used for TGID queries and exit notifications, so that dead
threads are counted correctly in both.

Adding the other fields makes sense as a follow-up patch. That may
require a minor refactoring to reuse some of the code for PID taskstats
accounting.

> Also, in the end, I'd strongly prefer this patch to include a
> user-visible change in the API, like an increased TASKSTATS_VERSION.
> There are no new fields added, but the interpretation of the data is
> different now for tgid.

My current thinking is not to bump TASKSTATS_VERSION, since the struct
layout and fields are unchanged.
But if maintainers think the semantic change should be versioned, I’m
happy to do that.

Thanks,
Yiyang Chen
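
P.S. To make the intended TGID semantics concrete, here is a toy
user-space model in Python. It is only a sketch, not the kernel code:
the counter names follow struct taskstats, while Thread, ThreadGroup,
accumulate(), thread_exit() and query_tgid() are hypothetical names made
up for this illustration, and every field is treated as a plain additive
counter.

```python
from dataclasses import dataclass, field

# Counter names taken from struct taskstats; treating them all as simple
# additive counters is an assumption of this model.
FIELDS = ("ac_etime", "ac_utime", "ac_stime", "nvcsw", "nivcsw")


@dataclass(eq=False)  # identity comparison, so list.remove() finds the exact thread
class Thread:
    ac_etime: int = 0
    ac_utime: int = 0
    ac_stime: int = 0
    nvcsw: int = 0
    nivcsw: int = 0


@dataclass
class ThreadGroup:
    live: list = field(default_factory=list)
    # Totals retained from threads that have already exited.
    dead: Thread = field(default_factory=Thread)


def accumulate(total, thread):
    """Shared helper: add one thread's counters into a running total."""
    for name in FIELDS:
        setattr(total, name, getattr(total, name) + getattr(thread, name))


def thread_exit(group, thread):
    """Exit path: fold the dying thread into the retained totals."""
    group.live.remove(thread)
    accumulate(group.dead, thread)


def query_tgid(group):
    """TGID query: retained dead-thread totals plus a scan of live threads."""
    total = Thread()
    accumulate(total, group.dead)
    for thread in group.live:
        accumulate(total, thread)
    return total
```

In this model a TGID query returns the same totals before and after a
thread exits, because both paths go through the same helper. That is the
behaviour the patch aims for with the fields listed above.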