From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E571DC021A4 for ; Mon, 24 Feb 2025 17:06:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 80BBB28000C; Mon, 24 Feb 2025 12:06:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7944E280002; Mon, 24 Feb 2025 12:06:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E94F28000C; Mon, 24 Feb 2025 12:06:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 3D22A280002 for ; Mon, 24 Feb 2025 12:06:28 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id D8141A2C61 for ; Mon, 24 Feb 2025 17:06:27 +0000 (UTC) X-FDA: 83155466814.05.DAE6C4C Received: from mail-pj1-f46.google.com (mail-pj1-f46.google.com [209.85.216.46]) by imf06.hostedemail.com (Postfix) with ESMTP id 82F2D180010 for ; Mon, 24 Feb 2025 17:06:25 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=WnGjLc2D; spf=pass (imf06.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.216.46 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740416785; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=VTHS8iK3FsKYm/QQs//Dbn45NKszSJWQZPTjnfB9t0g=; b=LiyaRzyNtQJi4yO0SbEtR6FpwyDNGR+MBtBRMp5WIMT0KfULnUNWhy9AcptfReaeGzsOHi Vc5gTfemkUNdlyAOTJ+dZqPQZC5WmLTaNLyaSsE8NSGVmLWzuRd32xlor7PJr3tigCd0A7 HDQcPBE70b1OAshnwB8ypER2NUIxqN4= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=WnGjLc2D; spf=pass (imf06.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.216.46 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740416785; a=rsa-sha256; cv=none; b=s0p3ry+DsQyFi6lhCb8tR/fE/kIOGJY3sCli4U86dB8qGCL0K60KzQW7k275diNStDIaMq QJFkWyeUB5uOMAtnvWm48ELtyjQfapkyHKaCizqt8rmRTk0T66hOmUkV6FWYW1vd9zbM+o GH5YTMknSU1+H0cMtkTUUXB2XvRGPtg= Received: by mail-pj1-f46.google.com with SMTP id 98e67ed59e1d1-2fc042c9290so7256797a91.0 for ; Mon, 24 Feb 2025 09:06:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1740416784; x=1741021584; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=VTHS8iK3FsKYm/QQs//Dbn45NKszSJWQZPTjnfB9t0g=; b=WnGjLc2DRzlc1eaLa7sMJzhxugHthC2uIOB5TrZTbRFPHpE5tF1fn2J+MNj3ajRWhz Hl+XKfvTmINMQrCIM1wNXNkSzaIz00JXwirjeWzfcxvfLkicQ+1fEQ9iOH4tbhCUlJkz izvnNtZpV/9kK1I/oKAkslYvd3yu5b7PXSDSa6CBQBZ8SwQwS5wD5e+72E8e8er5Htqb Z2/l1X4PKHrn0q5T1YX3svDVcRu1Bpo/Fgd/lcfr/Qf6SVx8MGuDabtcepHibs8sHpW3 l1xX2Q7t2YUaXCXd6rfA2JnRet8SB92pTRmY2XTfHj1mQuitz9lFkG86ofNB4OH3f4NZ eXSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740416784; x=1741021584; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=VTHS8iK3FsKYm/QQs//Dbn45NKszSJWQZPTjnfB9t0g=; b=Vh3qvrTVhR+K2RgWM4cSCkGYAJ3G2EmrHtCsxXG/lRkCH2B0+JZA9Ez+N9dbys6olj XDWK8U7VbAZIMLdjwCCWrSBs0zj6ASwk4rDrup697G6bHGh4kWYvWhU2+EkQ4Kf37kkz ikpOHhtSeWd7DZLjcSJG8wUl6zBwPOrA8v7Oxpbqk+kkukwf+UgzhU3zCawt0bbjiIrD O/RnMJd0+rY7es82QRx2noVuOKNcGE0yD7zVVGOhcdQgx+Uw8dZvxRizqEev9uvyt5uw ZSsAmBCKIvoCO06Chqp67bbG8nug2YyTR5lXoh+BA+SxqbIGZY6fEL301cA+I6YTiZ5Z rIgA== X-Forwarded-Encrypted: i=1; AJvYcCVHcWasoBJ+8NBYO8FAOWN0NQvwD0FARINLRQtrmmWUC5DeUMfna00SVLrBFFeb55DX1N3eEPL1Wg==@kvack.org X-Gm-Message-State: AOJu0Yypr18rQA/4CwWCeUfFGzVVMFMbLt1XKZlhJIqFCC02z5hGDu4W i2x7VdftnueNscTY9WB9IZkYWtSBy1b/y9Yq2bB/5q57VrH1WmS3 X-Gm-Gg: ASbGncv+bHYmF5/ACh/zmprSAg+5+bveAVqZevcVpxZtTokWoR/FeDflmggmOJAVEvT RXYro8+GFqBRqcdIZr/Q1EqdUDkw6y1qhVuL0cwEfdZZu3Vplutj/KlmlkZccqDqIUkdTDmSVms SIuXD15MyU0ZiV8+xPBGnBP91XHWuhT1Q0yQHgAyog9jbNRO7Q3M8lKZx58NjkrEdxZFpqFdtfu 2hTchQNjG5WthvR2lP48E5sYE53ukqDcueBdogMUrCOFoNDyWOHbRUQLLE1GtoFjHYG8DKPULVL x17Bk8IfQEYG6Wm5+ArQM+EbMv57DxCVnvGH0vcgL9AZHyXuCZRLh8AnVA== X-Google-Smtp-Source: AGHT+IGMqhrUZAxEQJWkNDLXJrTxLwT4UhOoRT1uPGC+2lcRJOsdoKA540ITjl1yDP8hW2ZlJQlBRQ== X-Received: by 2002:a17:90b:1c90:b0:2fa:137f:5c5c with SMTP id 98e67ed59e1d1-2fce769a8a3mr21389387a91.1.1740416784161; Mon, 24 Feb 2025 09:06:24 -0800 (PST) Received: from ?IPV6:2a03:83e0:1151:15:19f:cf94:a905:e7ab? ([2620:10d:c090:500::6:77f7]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2fceb0a6902sm6852591a91.48.2025.02.24.09.06.22 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 24 Feb 2025 09:06:23 -0800 (PST) Message-ID: Date: Mon, 24 Feb 2025 09:06:21 -0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 01/11] cgroup: move rstat pointers into struct of their own To: Yosry Ahmed Cc: shakeel.butt@linux.dev, tj@kernel.org, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com References: <20250218031448.46951-1-inwardvessel@gmail.com> <20250218031448.46951-2-inwardvessel@gmail.com> Content-Language: en-US From: JP Kobryn In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 82F2D180010 X-Stat-Signature: 8irpkqs9xbjfnihwbdq8a147f9jhafgd X-Rspam-User: X-HE-Tag: 1740416785-848988 X-HE-Meta: U2FsdGVkX1/4pHihrBkLy19LjHonudryiHqRYbnVyMhjDoIWRnaBIIYmbK3CGJmhLqnksCPR2rJHAeNbM2874GdhokM1V2eiFD8VNhDzrAldAkOOdXRlVKrUPZrfdhyaRPANiKgA7uXzFVrQKS2+HsXd+09ZXqyjiMVYAwRqfU4AOwAeomgkg12MK7r86TyEeVkSZpS80W/u8HRbGsOzdn7EKc8NQEHAMwApjPNxNWTzfWg+ouOm6BguVFzEI/M8Nfx1yfOmOdl1DtHn3Yo467xbl+I6bKJCscL+Lw9U8OohRC1OcWiIIQLxWYsQnjAk0bNNhV8yrRUx9yjA2sa42oalLr3vEszy64aSuyY45Vt+78izHpPTJ4Wi2khJ58Iv0j0Ar5MONe8YYPto6ezfstL4rC9PjX4WPClvXGue/9BeiMm9+qdMXLrxC0FMHRbIhNBlTHp55Zwra986omkYx91qf7BCzZQJb5iISQ8iwSr/0CgLZONgb26mqyIYHvo8pDkT22WWA6PJvIT0n/2sF44aj1npoj6nWcpADIld1+oatbIIABFfOwJDAe9jqYNxI7vPBsoR1T8N6iup3rf0WbKLYO+mOeuiB19UofrpL6wnG8XLBbnZDXP5O2tmHrrifg/fff33XvEJ9afNQezt5gth8cPorh0ZiHQ66mh5tfUQLKSbYMqo5tAqrTvJuy1annpQqHVmkAedek1P4I/J/4jHCGzpTK6BePL72SDHveyKox52CfwsZ6P02w3TDZipYTtSHYYvHe0uRBUHkEIDbIsfMuDU+ScY69Cwo1/El0FFKY6OSxL5vybH+6KI1tnAHtT0iH5EHI+qq5QHKLbdNx2mZpkOi5M+PJg5e3NCjzAl8sDUNlFfBk1aUluuu7rem2Se+QGRriKENVZ+MI147TBYaO0xDOn37twBY+rlHY78I1e1613CM0oakYQChc/jf+xapa8TCTsk0LcQIiG +0RVw9JB ROn6qgdI7f8joX9hqrl2PcMJH8wntPBTX5P80E9RubALqV/QUHmytQIhGEWHqiC0g0z4X2m14Q5W/punkmlaOb9LnfXmBS0uX3c2cOAL01YNeBC7ya1EbAMMxnDgSS7iVGlGlSrkwWwgM4L0xQwI6MonLYaXWpLr9bJfeMwUxeCeFTl4IyBlbiBT0aY0nVbKPYegLQoiKMND6vydHYlfr1gZqyw+mEmF0LlsjTbl1sKArs6OoyccQhpEc5YU7Q8feaW5dwcG8R5Af1OvZS1TZ6oSciANbWPjip/blQtIjRwwtTUf+mutgB1/JP0NdRElws5qgINRWvPWfrHlQd4O3NmjPkDwsgsrh3VgKbhIgelw0r2BWpvN/XNdb3jWuqZTEw/ljQxorkDa9KsL4FNSiboL1UWk0F4+Qv3bykwqQjB/ZHQeo+GGL4SatUg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2/20/25 8:53 AM, Yosry Ahmed wrote: > On Mon, Feb 17, 2025 at 07:14:38PM -0800, JP Kobryn wrote: >> The rstat infrastructure makes use of pointers for list management. >> These pointers only exist as fields in the cgroup struct, so moving them >> into their own struct will allow them to be used elsewhere. The base >> stat entities are included with them for now. >> >> Signed-off-by: JP Kobryn >> --- >> include/linux/cgroup-defs.h | 90 +----------------- >> include/linux/cgroup_rstat.h | 92 +++++++++++++++++++ >> kernel/cgroup/cgroup.c | 3 +- >> kernel/cgroup/rstat.c | 27 +++--- >> .../selftests/bpf/progs/btf_type_tag_percpu.c | 4 +- >> 5 files changed, 112 insertions(+), 104 deletions(-) >> create mode 100644 include/linux/cgroup_rstat.h >> >> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h >> index 1b20d2d8ef7c..6b6cc027fe70 100644 >> --- a/include/linux/cgroup-defs.h >> +++ b/include/linux/cgroup-defs.h >> @@ -17,7 +17,7 @@ >> #include >> #include >> #include >> -#include >> +#include >> #include >> #include >> #include >> @@ -321,78 +321,6 @@ struct css_set { >> struct rcu_head rcu_head; >> }; >> >> -struct cgroup_base_stat { >> - struct task_cputime cputime; >> - >> -#ifdef CONFIG_SCHED_CORE >> - u64 forceidle_sum; >> -#endif >> - u64 ntime; >> -}; >> - >> -/* >> - * rstat - cgroup scalable recursive statistics. Accounting is done >> - * per-cpu in cgroup_rstat_cpu which is then lazily propagated up the >> - * hierarchy on reads. >> - * >> - * When a stat gets updated, the cgroup_rstat_cpu and its ancestors are >> - * linked into the updated tree. On the following read, propagation only >> - * considers and consumes the updated tree. This makes reading O(the >> - * number of descendants which have been active since last read) instead of >> - * O(the total number of descendants). >> - * >> - * This is important because there can be a lot of (draining) cgroups which >> - * aren't active and stat may be read frequently. The combination can >> - * become very expensive. By propagating selectively, increasing reading >> - * frequency decreases the cost of each read. >> - * >> - * This struct hosts both the fields which implement the above - >> - * updated_children and updated_next - and the fields which track basic >> - * resource statistics on top of it - bsync, bstat and last_bstat. >> - */ >> -struct cgroup_rstat_cpu { >> - /* >> - * ->bsync protects ->bstat. These are the only fields which get >> - * updated in the hot path. >> - */ >> - struct u64_stats_sync bsync; >> - struct cgroup_base_stat bstat; >> - >> - /* >> - * Snapshots at the last reading. These are used to calculate the >> - * deltas to propagate to the global counters. >> - */ >> - struct cgroup_base_stat last_bstat; >> - >> - /* >> - * This field is used to record the cumulative per-cpu time of >> - * the cgroup and its descendants. Currently it can be read via >> - * eBPF/drgn etc, and we are still trying to determine how to >> - * expose it in the cgroupfs interface. >> - */ >> - struct cgroup_base_stat subtree_bstat; >> - >> - /* >> - * Snapshots at the last reading. These are used to calculate the >> - * deltas to propagate to the per-cpu subtree_bstat. >> - */ >> - struct cgroup_base_stat last_subtree_bstat; >> - >> - /* >> - * Child cgroups with stat updates on this cpu since the last read >> - * are linked on the parent's ->updated_children through >> - * ->updated_next. >> - * >> - * In addition to being more compact, singly-linked list pointing >> - * to the cgroup makes it unnecessary for each per-cpu struct to >> - * point back to the associated cgroup. >> - * >> - * Protected by per-cpu cgroup_rstat_cpu_lock. >> - */ >> - struct cgroup *updated_children; /* terminated by self cgroup */ >> - struct cgroup *updated_next; /* NULL iff not on the list */ >> -}; >> - >> struct cgroup_freezer_state { >> /* Should the cgroup and its descendants be frozen. */ >> bool freeze; >> @@ -517,23 +445,9 @@ struct cgroup { >> struct cgroup *old_dom_cgrp; /* used while enabling threaded */ >> >> /* per-cpu recursive resource statistics */ >> - struct cgroup_rstat_cpu __percpu *rstat_cpu; >> + struct cgroup_rstat rstat; >> struct list_head rstat_css_list; >> >> - /* >> - * Add padding to separate the read mostly rstat_cpu and >> - * rstat_css_list into a different cacheline from the following >> - * rstat_flush_next and *bstat fields which can have frequent updates. >> - */ >> - CACHELINE_PADDING(_pad_); >> - >> - /* >> - * A singly-linked list of cgroup structures to be rstat flushed. >> - * This is a scratch field to be used exclusively by >> - * cgroup_rstat_flush_locked() and protected by cgroup_rstat_lock. >> - */ >> - struct cgroup *rstat_flush_next; >> - >> /* cgroup basic resource statistics */ >> struct cgroup_base_stat last_bstat; >> struct cgroup_base_stat bstat; >> diff --git a/include/linux/cgroup_rstat.h b/include/linux/cgroup_rstat.h >> new file mode 100644 >> index 000000000000..f95474d6f8ab >> --- /dev/null >> +++ b/include/linux/cgroup_rstat.h >> @@ -0,0 +1,92 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +#ifndef _LINUX_RSTAT_H >> +#define _LINUX_RSTAT_H >> + >> +#include >> + >> +struct cgroup_rstat_cpu; > > Why do we need the forward declaration instead of just defining struct > cgroup_rstat_cpu first? Also, why do we need a new header for these > definitions rather than just adding struct cgroup_rstat to > cgroup-defs.h? The new header was added so the cgroup_rstat type can be used in bpf cgroup-defs.h. As for the forward declaration, this was done so that updated_next and updated children fields of the cgroup_rstat_cpu can change type from from cgroup to cgroup_rstat. Regardless, based on the direction we are moving with bpf sharing the "self" tree, this new header will NOT be needed in v2. > >> + >> +/* >> + * rstat - cgroup scalable recursive statistics. Accounting is done >> + * per-cpu in cgroup_rstat_cpu which is then lazily propagated up the >> + * hierarchy on reads. >> + * >> + * When a stat gets updated, the cgroup_rstat_cpu and its ancestors are >> + * linked into the updated tree. On the following read, propagation only >> + * considers and consumes the updated tree. This makes reading O(the >> + * number of descendants which have been active since last read) instead of >> + * O(the total number of descendants). >> + * >> + * This is important because there can be a lot of (draining) cgroups which >> + * aren't active and stat may be read frequently. The combination can >> + * become very expensive. By propagating selectively, increasing reading >> + * frequency decreases the cost of each read. >> + * >> + * This struct hosts both the fields which implement the above - >> + * updated_children and updated_next - and the fields which track basic >> + * resource statistics on top of it - bsync, bstat and last_bstat. >> + */ >> +struct cgroup_rstat { >> + struct cgroup_rstat_cpu __percpu *rstat_cpu; >> + >> + /* >> + * Add padding to separate the read mostly rstat_cpu and >> + * rstat_css_list into a different cacheline from the following >> + * rstat_flush_next and containing struct fields which can have >> + * frequent updates. >> + */ >> + CACHELINE_PADDING(_pad_); >> + struct cgroup *rstat_flush_next; >> +}; >> + >> +struct cgroup_base_stat { >> + struct task_cputime cputime; >> + >> +#ifdef CONFIG_SCHED_CORE >> + u64 forceidle_sum; >> +#endif >> + u64 ntime; >> +}; >> + >> +struct cgroup_rstat_cpu { >> + /* >> + * Child cgroups with stat updates on this cpu since the last read >> + * are linked on the parent's ->updated_children through >> + * ->updated_next. >> + * >> + * In addition to being more compact, singly-linked list pointing >> + * to the cgroup makes it unnecessary for each per-cpu struct to >> + * point back to the associated cgroup. >> + */ >> + struct cgroup *updated_children; /* terminated by self */ >> + struct cgroup *updated_next; /* NULL if not on the list */ >> + >> + /* >> + * ->bsync protects ->bstat. These are the only fields which get >> + * updated in the hot path. >> + */ >> + struct u64_stats_sync bsync; >> + struct cgroup_base_stat bstat; >> + >> + /* >> + * Snapshots at the last reading. These are used to calculate the >> + * deltas to propagate to the global counters. >> + */ >> + struct cgroup_base_stat last_bstat; >> + >> + /* >> + * This field is used to record the cumulative per-cpu time of >> + * the cgroup and its descendants. Currently it can be read via >> + * eBPF/drgn etc, and we are still trying to determine how to >> + * expose it in the cgroupfs interface. >> + */ >> + struct cgroup_base_stat subtree_bstat; >> + >> + /* >> + * Snapshots at the last reading. These are used to calculate the >> + * deltas to propagate to the per-cpu subtree_bstat. >> + */ >> + struct cgroup_base_stat last_subtree_bstat; >> +}; >> + >> +#endif /* _LINUX_RSTAT_H */