From: Shakeel Butt <shakeel.butt@linux.dev>
To: Tejun Heo, Andrew Morton
Cc: JP Kobryn, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Alexei Starovoitov, Sebastian Andrzej Siewior,
	Michal Koutný, Harry Yoo, Yosry Ahmed, bpf@vger.kernel.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Meta kernel team
Subject: [PATCH v2 2/4] cgroup: make css_rstat_updated nmi safe
Date: Wed, 11 Jun 2025 15:15:30 -0700
Message-ID: <20250611221532.2513772-3-shakeel.butt@linux.dev>
In-Reply-To: <20250611221532.2513772-1-shakeel.butt@linux.dev>
References: <20250611221532.2513772-1-shakeel.butt@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

To make css_rstat_updated() able to run safely in nmi context, let's move
the rstat update tree creation to the flush side and use per-cpu lockless
lists in struct cgroup_subsys to track the css whose stats are updated on
that cpu. The struct cgroup_subsys_state now has a per-cpu lnode which
needs to be inserted into the corresponding per-cpu lhead of struct
cgroup_subsys. Since we want the insertion to be nmi safe, there can be
multiple inserters on the same cpu for the same lnode, where the inserters
come from stacked contexts like softirq, hardirq and nmi.
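For orientation, the per-cpu pieces described above look roughly like the
sketch below. This is a simplified illustration only, not the actual kernel
definitions; the fields come from the earlier patch in this series, and any
field or name beyond lnode and owner is a guess for illustration.

struct css_rstat_cpu {				/* simplified sketch */
	struct llist_node lnode;		/* points to itself while off-list */
	struct cgroup_subsys_state *owner;	/* css this per-cpu state belongs to */
	/* ... existing updated_children/updated_next tree links ... */
};

struct cgroup_subsys {				/* simplified sketch */
	struct llist_head __percpu *lhead;	/* per-cpu list of updated css, drained at flush */
	/* ... */
};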
The current llist does not provide a function to protect against the
scenario where multiple inserters can use the same lnode, so using
llist_add() out of the box is not safe here. However, we can protect
against multiple inserters using the same lnode by relying on the fact that
an llist node points to itself when it is not on a list: atomically reset
that pointer and let the winner of the reset act as the single inserter.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
Changes since v1:
- More clear code comment as suggested by Tejun.

 kernel/cgroup/rstat.c | 65 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index a5608ae2be27..a7550961dd12 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -138,13 +138,16 @@ void _css_rstat_cpu_unlock(struct cgroup_subsys_state *css, int cpu,
  * @css: target cgroup subsystem state
  * @cpu: cpu on which rstat_cpu was updated
  *
- * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
- * rstat_cpu->updated_children list. See the comment on top of
- * css_rstat_cpu definition for details.
+ * Atomically inserts the css in the ss's llist for the given cpu. This is
+ * reentrant safe i.e. safe against softirq, hardirq and nmi. The ss's llist
+ * will be processed at the flush time to create the update tree.
  */
 __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 {
-	unsigned long flags;
+	struct llist_head *lhead;
+	struct css_rstat_cpu *rstatc;
+	struct css_rstat_cpu __percpu *rstatc_pcpu;
+	struct llist_node *self;
 
 	/*
 	 * Since bpf programs can call this function, prevent access to
@@ -153,19 +156,44 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 	if (!css_uses_rstat(css))
 		return;
 
+	lockdep_assert_preemption_disabled();
+
+	/*
+	 * For archs without nmi safe cmpxchg or percpu ops support, ignore
+	 * the requests from nmi context.
+	 */
+	if ((!IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) ||
+	     !IS_ENABLED(CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS)) && in_nmi())
+		return;
+
+	rstatc = css_rstat_cpu(css, cpu);
+	/* If already on list return. */
+	if (llist_on_list(&rstatc->lnode))
+		return;
+
 	/*
-	 * Speculative already-on-list test. This may race leading to
-	 * temporary inaccuracies, which is fine.
+	 * This function can be reentered by irqs and nmis for the same cgroup
+	 * and may try to insert the same per-cpu lnode into the llist. Note
+	 * that llist_add() does not protect against such scenarios.
 	 *
-	 * Because @parent's updated_children is terminated with @parent
-	 * instead of NULL, we can tell whether @css is on the list by
-	 * testing the next pointer for NULL.
+	 * To protect against such stacked contexts of irqs/nmis, we use the
+	 * fact that lnode points to itself when not on a list and then use
+	 * this_cpu_cmpxchg() to atomically set to NULL to select the winner
+	 * which will call llist_add(). The losers can assume the insertion is
+	 * successful and the winner will eventually add the per-cpu lnode to
+	 * the llist.
 	 */
-	if (data_race(css_rstat_cpu(css, cpu)->updated_next))
+	self = &rstatc->lnode;
+	rstatc_pcpu = css->rstat_cpu;
+	if (this_cpu_cmpxchg(rstatc_pcpu->lnode.next, self, NULL) != self)
 		return;
 
-	flags = _css_rstat_cpu_lock(css, cpu, true);
+	lhead = ss_lhead_cpu(css->ss, cpu);
+	llist_add(&rstatc->lnode, lhead);
+}
 
+static void __css_process_update_tree(struct cgroup_subsys_state *css, int cpu)
+{
 	/* put @css and all ancestors on the corresponding updated lists */
 	while (true) {
 		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
@@ -191,8 +219,19 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
 
 		css = parent;
 	}
+}
+
+static void css_process_update_tree(struct cgroup_subsys *ss, int cpu)
+{
+	struct llist_head *lhead = ss_lhead_cpu(ss, cpu);
+	struct llist_node *lnode;
+
+	while ((lnode = llist_del_first_init(lhead))) {
+		struct css_rstat_cpu *rstatc;
 
-	_css_rstat_cpu_unlock(css, cpu, flags, true);
+		rstatc = container_of(lnode, struct css_rstat_cpu, lnode);
+		__css_process_update_tree(rstatc->owner, cpu);
+	}
 }
 
 /**
@@ -300,6 +339,8 @@ static struct cgroup_subsys_state *css_rstat_updated_list(
 
 	flags = _css_rstat_cpu_lock(root, cpu, false);
 
+	css_process_update_tree(root->ss, cpu);
+
 	/* Return NULL if this subtree is not on-list */
 	if (!rstatc->updated_next)
 		goto unlock_ret;
-- 
2.47.1
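For readers who want to see the single-inserter selection in isolation, here
is a minimal userspace analogue of the pattern used in css_rstat_updated().
It is a sketch only: it uses GCC/Clang __atomic builtins instead of the
kernel's this_cpu_cmpxchg(), the names node_init() and claim_and_add() are
made up for illustration, and the simplified push is not the real
llist_add().

#include <stdio.h>

struct node {
	struct node *next;	/* points to itself while not on the list */
};

static struct node *list_head;	/* simplified singly linked list head */

/* Analogue of init_llist_node(): mark @n as off-list. */
static void node_init(struct node *n)
{
	__atomic_store_n(&n->next, n, __ATOMIC_RELEASE);
}

/*
 * Try to claim @n for insertion. Only the context that flips n->next from
 * "self" to NULL wins and performs the actual add; the losers return and
 * may assume the insertion will happen.
 */
static int claim_and_add(struct node *n)
{
	struct node *expected = n;

	if (!__atomic_compare_exchange_n(&n->next, &expected, NULL,
					 0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
		return 0;	/* already claimed or already on the list */

	/* Winner: push onto the list (simplified stand-in for llist_add()). */
	n->next = list_head;
	list_head = n;
	return 1;
}

int main(void)
{
	struct node a;

	node_init(&a);
	printf("first attempt wins:   %d\n", claim_and_add(&a));	/* prints 1 */
	printf("second attempt loses: %d\n", claim_and_add(&a));	/* prints 0 */
	return 0;
}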