Date: Tue, 22 Oct 2024 16:02:48 +0200
From: Frederic Weisbecker
To: Zqiang
Cc: paulmck@kernel.org, neeraj.upadhyay@kernel.org, joel@joelfernandes.org, urezki@gmail.com, boqun.feng@gmail.com, rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] rcu/nocb: Fix the WARN_ON_ONCE() in rcu_nocb_rdp_deoffload()
References: <20241022034117.2294-1-qiang.zhang1211@gmail.com>
In-Reply-To: <20241022034117.2294-1-qiang.zhang1211@gmail.com>
X-Mailing-List: rcu@vger.kernel.org

On Tue, Oct 22, 2024 at 11:41:17AM +0800, Zqiang wrote:
> Currently, running rcutorture test with torture_type=rcu fwd_progress=8
> n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
> test_boost=2, will trigger the following warning:
>
> WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
> RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
> [18839.537322] Call Trace:
> [18839.538006]
> [18839.538596] ? __warn+0x7e/0x120
> [18839.539491] ? rcu_nocb_rdp_deoffload+0x292/0x2a0
> [18839.540757] ? report_bug+0x18e/0x1a0
> [18839.541805] ? handle_bug+0x3d/0x70
> [18839.542837] ? exc_invalid_op+0x18/0x70
> [18839.543959] ? asm_exc_invalid_op+0x1a/0x20
> [18839.545165] ? rcu_nocb_rdp_deoffload+0x292/0x2a0
> [18839.546547] rcu_nocb_cpu_deoffload+0x70/0xa0
> [18839.547814] rcu_nocb_toggle+0x136/0x1c0
> [18839.548960] ? __pfx_rcu_nocb_toggle+0x10/0x10
> [18839.550073] kthread+0xd1/0x100
> [18839.550958] ? __pfx_kthread+0x10/0x10
> [18839.552008] ret_from_fork+0x2f/0x50
> [18839.553002] ? __pfx_kthread+0x10/0x10
> [18839.553968] ret_from_fork_asm+0x1a/0x30
> [18839.555038]
>
> CPU0                              CPU2                                CPU3
> //rcu_nocb_toggle                 //nocb_cb_wait                      //rcutorture
>
> // deoffload CPU1                 // process CPU1's rdp
> rcu_barrier()
>     rcu_segcblist_entrain()
>         rcu_segcblist_add_len(1);
>         // len == 2
>         // enqueue barrier
>         // callback to CPU1's
>         // rdp->cblist
>                                   rcu_do_batch()
>                                       // invoke CPU1's rdp->cblist
>                                       // callback
>                                       rcu_barrier_callback()
>                                                                       rcu_barrier()
>                                                                         mutex_lock(&rcu_state.barrier_mutex);
>                                                                         // still see len == 2
>                                                                         // enqueue barrier callback
>                                                                         // to CPU1's rdp->cblist
>                                                                         rcu_segcblist_entrain()
>                                                                             rcu_segcblist_add_len(1);
>                                                                             // len == 3
>                                       // decrement len
>                                       rcu_segcblist_add_len(-2);
>                                   kthread_parkme()
>
> // CPU1's rdp->cblist len == 1
> // Warn because there is
> // still a pending barrier
> // trigger warning
> WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
> cpus_read_unlock();
>
>                                                                       // wait CPU1 comes online
>                                                                       // invoke barrier callback on
>                                                                       // CPU1 rdp's->cblist
>                                                                       wait_for_completion(&rcu_state.barrier_completion);
> // deoffload CPU4
> cpus_read_lock()
>   rcu_barrier()
>     mutex_lock(&rcu_state.barrier_mutex);
>     // block on barrier_mutex
>     // wait rcu_barrier() on
>     // CPU3 to unlock barrier_mutex
>     // but CPU3 unlock barrier_mutex
>     // need to wait CPU1 comes online
>     // when CPU1 going online will block on cpus_write_lock
>
> The above scenario will not only trigger WARN_ON_ONCE(), but also
> trigger deadlock, this commit therefore check rdp->nocb_cb_sleep
> flags before invoke kthread_parkme(), and the kthread_parkme() is
> not invoke until there are no pending callbacks and set
> rdp->nocb_cb_sleep is true.
>
> Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
> Suggested-by: Frederic Weisbecker
> Signed-off-by: Zqiang

Applied with the below wordsmithing, thanks a lot!

---
From: Zqiang
Date: Tue, 22 Oct 2024 11:41:17 +0800
Subject: [PATCH] rcu/nocb: Fix missed RCU barrier on deoffloading

Currently, running rcutorture test with torture_type=rcu fwd_progress=8
n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60
test_boost=2, will trigger the following warning:

WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0
RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0
Call Trace:
 ? __warn+0x7e/0x120
 ? rcu_nocb_rdp_deoffload+0x292/0x2a0
 ? report_bug+0x18e/0x1a0
 ? handle_bug+0x3d/0x70
 ? exc_invalid_op+0x18/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? rcu_nocb_rdp_deoffload+0x292/0x2a0
 rcu_nocb_cpu_deoffload+0x70/0xa0
 rcu_nocb_toggle+0x136/0x1c0
 ? __pfx_rcu_nocb_toggle+0x10/0x10
 kthread+0xd1/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2f/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

CPU0                              CPU2                                CPU3
//rcu_nocb_toggle                 //nocb_cb_wait                      //rcutorture

// deoffload CPU1                 // process CPU1's rdp
rcu_barrier()
    rcu_segcblist_entrain()
        rcu_segcblist_add_len(1);
        // len == 2
        // enqueue barrier
        // callback to CPU1's
        // rdp->cblist
                                  rcu_do_batch()
                                      // invoke CPU1's rdp->cblist
                                      // callback
                                      rcu_barrier_callback()
                                                                      rcu_barrier()
                                                                        mutex_lock(&rcu_state.barrier_mutex);
                                                                        // still see len == 2
                                                                        // enqueue barrier callback
                                                                        // to CPU1's rdp->cblist
                                                                        rcu_segcblist_entrain()
                                                                            rcu_segcblist_add_len(1);
                                                                            // len == 3
                                      // decrement len
                                      rcu_segcblist_add_len(-2);
                                  kthread_parkme()

// CPU1's rdp->cblist len == 1
// Warn because there is
// still a pending barrier
// trigger warning
WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
cpus_read_unlock();

                                                                      // wait CPU1 to comes online and
                                                                      // invoke barrier callback on
                                                                      // CPU1 rdp's->cblist
                                                                      wait_for_completion(&rcu_state.barrier_completion);
// deoffload CPU4
cpus_read_lock()
  rcu_barrier()
    mutex_lock(&rcu_state.barrier_mutex);
    // block on barrier_mutex
    // wait rcu_barrier() on
    // CPU3 to unlock barrier_mutex
    // but CPU3 unlock barrier_mutex
    // need to wait CPU1 comes online
    // when CPU1 going online will block on cpus_write_lock

The above scenario will not only trigger a WARN_ON_ONCE(), but also
trigger a deadlock.

Thanks to nocb locking, a second racing rcu_barrier() on an offline CPU
will either observe the decremented callback counter down to 0 and spare
the callback enqueue, or rcuo will observe the new callback and keep
rdp->nocb_cb_sleep to false.

Therefore check rdp->nocb_cb_sleep before parking to make sure no
further rcu_barrier() is waiting on the rdp.

Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine")
Suggested-by: Frederic Weisbecker
Signed-off-by: Zqiang
Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree_nocb.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 16865475120b..2605dd234a13 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -891,7 +891,18 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
 					    nocb_cb_wait_cond(rdp));
 	if (kthread_should_park()) {
-		kthread_parkme();
+		/*
+		 * kthread_park() must be preceded by an rcu_barrier().
+		 * But yet another rcu_barrier() might have sneaked in between
+		 * the barrier callback execution and the callbacks counter
+		 * decrement.
+		 */
+		if (rdp->nocb_cb_sleep) {
+			rcu_nocb_lock_irqsave(rdp, flags);
+			WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist));
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			kthread_parkme();
+		}
 	} else if (READ_ONCE(rdp->nocb_cb_sleep)) {
 		WARN_ON(signal_pending(current));
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
-- 
2.46.0
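
For readers who want to step through the counter arithmetic from the diagram, the following is a rough single-threaded user-space model, not kernel code. Everything in it is an assumption made for illustration: struct model_rdp, model_entrain(), model_cb_pass() and model_park() are hypothetical names, locking is omitted, and the nocb_cb_sleep field is simplified to "the list was empty after the last pass". It only aims to show why parking unconditionally can leave one barrier callback behind, while gating the park on the sleep flag does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state; not the kernel's struct rcu_data. */
struct model_rdp {
	long len;           /* models the rdp->cblist callback count */
	bool nocb_cb_sleep; /* simplified model of rdp->nocb_cb_sleep */
};

/* Models an rcu_barrier() entraining one callback on this rdp. */
static void model_entrain(struct model_rdp *rdp)
{
	rdp->len += 1;
}

/*
 * Models one pass of the callback kthread: pick up the current batch,
 * optionally let a racing barrier enqueue before the late decrement,
 * then record whether the list ended up empty.
 */
static void model_cb_pass(struct model_rdp *rdp, bool racing_barrier)
{
	long batch = rdp->len;          /* callbacks invoked in this pass */

	if (racing_barrier)
		model_entrain(rdp);     /* second barrier still sees the old count */
	rdp->len -= batch;              /* counter decremented after invocation */
	rdp->nocb_cb_sleep = (rdp->len == 0);
}

/*
 * Models the park request. With check_sleep == false the thread parks right
 * after the racy pass (old behavior). With check_sleep == true it keeps
 * processing until the sleep flag is set (behavior after the patch).
 * Returns the number of callbacks still queued at park time.
 */
static long model_park(struct model_rdp *rdp, bool check_sleep)
{
	model_cb_pass(rdp, true);
	if (check_sleep) {
		while (!rdp->nocb_cb_sleep)
			model_cb_pass(rdp, false);
	}
	return rdp->len;
}
```

Starting from len == 1 and calling model_entrain() once reproduces the "len == 2" state from the diagram; model_park() with check_sleep set to false then returns 1 (the stranded barrier callback), and with check_sleep set to true it returns 0.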