From: Waiman Long
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	Frederic Weisbecker, "Paul E. McKenney", Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
	Mel Gorman, Valentin Schneider, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Phil Auld, Costa Shulyupin,
	Gabriele Monaco, Cestmir Kalina, Waiman Long
Subject: [RFC PATCH 03/18] sched/isolation: Use RCU to delay successive housekeeping cpumask updates
Date: Fri, 8 Aug 2025 11:10:47 -0400
Message-ID: <20250808151053.19777-4-longman@redhat.com>
In-Reply-To: <20250808151053.19777-1-longman@redhat.com>
References: <20250808151053.19777-1-longman@redhat.com>

Even though there are two separate sets of housekeeping cpumasks, one
for access and one for update, the set being updated may still be in
use by callers of the housekeeping functions. Such a caller can end up
using an intermediate cpumask that is neither the old one nor the new
one.

To reduce the chance of this, introduce a delay between successive
housekeeping cpumask updates. One simple way is to make use of the RCU
grace period delay. Callers of the housekeeping APIs can optionally
hold rcu_read_lock() to eliminate the chance of using intermediate
housekeeping cpumasks.

Signed-off-by: Waiman Long
---
 kernel/sched/isolation.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index ee396ae13719..f26708667754 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -23,6 +23,9 @@ EXPORT_SYMBOL_GPL(housekeeping_overridden);
  * The housekeeping cpumasks can now be dynamically updated at run time.
  * Two set of cpumasks are kept. One set can be used while the other set are
  * being updated concurrently.
+ *
+ * rcu_read_lock() can optionally be held by housekeeping API callers to
+ * ensure stability of the cpumasks.
  */
 static DEFINE_RAW_SPINLOCK(cpumask_lock);
 struct housekeeping {
@@ -34,6 +37,8 @@ struct housekeeping {
 
 static struct housekeeping housekeeping;
 static bool sched_tick_offload_inited;
+static struct rcu_head rcu_gp[HK_TYPE_MAX];
+static unsigned long update_flags;
 
 bool housekeeping_enabled(enum hk_type type)
 {
@@ -267,6 +272,18 @@ static int __init housekeeping_isolcpus_setup(char *str)
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
 
+/*
+ * Bits in update_flags can only be turned on with cpumask_lock held and
+ * cleared by this RCU callback function.
+ */
+static void rcu_gp_end(struct rcu_head *rcu)
+{
+	int type = rcu - rcu_gp;
+
+	/* Atomically clear the corresponding flag bit */
+	clear_bit(type, &update_flags);
+}
+
 /**
  * housekeeping_exclude_cpumask - Update housekeeping cpumasks to exclude only the given cpumask
  * @cpumask: new cpumask to be excluded from housekeeping cpumasks
@@ -306,8 +323,21 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 	}
 #endif
 
+retry:
+	/*
+	 * If the RCU grace period for the previous update with conflicting
+	 * flag bits hasn't been completed yet, we have to wait for it.
+	 */
+	while (READ_ONCE(update_flags) & hk_flags)
+		synchronize_rcu();
+
 	raw_spin_lock(&cpumask_lock);
 
+	if (READ_ONCE(update_flags) & hk_flags) {
+		raw_spin_unlock(&cpumask_lock);
+		goto retry;
+	}
+
 	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX) {
 		int idx = ++housekeeping.seq_nrs[type] & 1;
 		struct cpumask *dst_cpumask = housekeeping.cpumasks[type][idx];
@@ -320,8 +350,11 @@ int housekeeping_exclude_cpumask(struct cpumask *cpumask, unsigned long hk_flags
 			housekeeping.flags |= BIT(type);
 		}
 		WRITE_ONCE(housekeeping.cpumask_ptrs[type], dst_cpumask);
+		set_bit(type, &update_flags);
 	}
 	raw_spin_unlock(&cpumask_lock);
+	for_each_set_bit(type, &hk_flags, HK_TYPE_MAX)
+		call_rcu(&rcu_gp[type], rcu_gp_end);
 
 	if (!housekeeping.flags && static_key_enabled(&housekeeping_overridden))
 		static_key_disable(&housekeeping_overridden.key);
-- 
2.50.0
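For readers following the gating logic, here is a rough single-threaded userspace model of why a second update has to wait. It is only a sketch under stated assumptions, not the kernel code: the hk_model_* names, the toy HK_TYPE_MAX value, the 64-bit masks, and the "all_cpus & ~isolated" mask computation are all hypothetical stand-ins, and hk_model_gp_end() is called by hand where the patch relies on call_rcu() invoking rcu_gp_end(). Where the patch calls synchronize_rcu() and retries, the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HK_TYPE_MAX 4	/* toy value; the real one comes from enum hk_type */

/* Toy model: two mask slots per type, readers dereference cur[type]. */
struct hk_model {
	uint64_t masks[HK_TYPE_MAX][2];
	const uint64_t *cur[HK_TYPE_MAX];
	unsigned int seq[HK_TYPE_MAX];
	unsigned long update_flags;	/* bit set: grace period still pending */
};

static void hk_model_init(struct hk_model *hk, uint64_t all_cpus)
{
	for (int type = 0; type < HK_TYPE_MAX; type++) {
		hk->masks[type][0] = all_cpus;
		hk->masks[type][1] = all_cpus;
		hk->cur[type] = &hk->masks[type][0];
		hk->seq[type] = 0;
	}
	hk->update_flags = 0;
}

/* Stand-in for the RCU callback: the grace period for @type has ended. */
static void hk_model_gp_end(struct hk_model *hk, int type)
{
	hk->update_flags &= ~(1UL << type);
}

/*
 * Exclude @isolated from every type selected in @hk_flags.
 * Returns false when a conflicting grace period is still pending, i.e.
 * the point where the patch would call synchronize_rcu() and retry.
 */
static bool hk_model_try_exclude(struct hk_model *hk, uint64_t all_cpus,
				 uint64_t isolated, unsigned long hk_flags)
{
	assert(hk_flags < (1UL << HK_TYPE_MAX));

	if (hk->update_flags & hk_flags)
		return false;

	for (int type = 0; type < HK_TYPE_MAX; type++) {
		if (!(hk_flags & (1UL << type)))
			continue;

		/* Flip to the slot that is not currently published. */
		int idx = ++hk->seq[type] & 1;

		hk->masks[type][idx] = all_cpus & ~isolated;	/* assumed computation */
		hk->cur[type] = &hk->masks[type][idx];
		hk->update_flags |= 1UL << type;
	}
	return true;
}
```

In this model, a reader that sampled cur[type] before the first update keeps seeing the old mask, because the first update writes the other slot. A second update would write into the very slot that reader still holds, so it is refused until the grace-period stand-in clears the flag bit. Updates to types whose bits are not pending go through immediately.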