From: Frederic Weisbecker <frederic@kernel.org>
To: kernel test robot <oliver.sang@intel.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org, rcu@vger.kernel.org
Subject: Re: [linus:master] [rcu/nocb] 7625926086: WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_try_bypass
Date: Tue, 11 Jul 2023 18:06:11 +0200
Message-ID: <ZK1983kQCrN+zCqs@lothringen>
In-Reply-To: <202307111535.4823c255-oliver.sang@intel.com>
On Tue, Jul 11, 2023 at 04:48:06PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_try_bypass" on:
>
> commit: 7625926086765123251f765d91fc3a70617d334d ("rcu/nocb: Fix shrinker race against callback enqueuer")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master Linux 6.5-rc1]
> [test failed on linux-next/master fe57d0d86f03a8b2afe2869a95477d0ed1824c96]
>
> in testcase: trinity
> version: trinity-i386-abe9de86-1_20230429
> with following parameters:
>
> runtime: 300s
> group: group-00
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> the WARNING only happens in a low rate (10 out of 204 runs as below).
> but since we never observed the issue on parent, and we also noticed
> there is a WARN_ON_ONCE in code change, so just report what we observed
> in our tests FYI.
>
> 5c83cedbaaad6dfe 7625926086765123251f765d91f
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :202           5%          10:204   dmesg.RIP:rcu_nocb_bypass_lock
>            :202           5%          10:204   dmesg.RIP:rcu_nocb_try_bypass
>            :202           5%          10:204   dmesg.WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_bypass_lock
>            :202           5%          10:204   dmesg.WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_try_bypass
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202307111535.4823c255-oliver.sang@intel.com
>
>
>
> [ 315.569574][ T16] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree_nocb.h:124
> rcu_nocb_try_bypass (kernel/rcu/tree_nocb.h:124 kernel/rcu/tree_nocb.h:513)

It's:

	static void rcu_nocb_wait_contended(struct rcu_data *rdp)
	{
		WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
		while (WARN_ON_ONCE(atomic_read(&rdp->nocb_lock_contended))) <<< HERE
			cpu_relax();
	}

This is called at enqueue time by rcu_nocb_try_bypass().
The places that can increment ->nocb_lock_contended are:

* rcu_nocb_try_bypass() itself, but it's not re-entrant
* rcu_nocb_rdp_deoffload(), but it executes on the local CPU with
  IRQs disabled, so rcu_nocb_try_bypass() can't race
* lazy_rcu_shrink_scan(): that's indeed the issue here
* rcu_barrier_entrain(), executed either locally with IRQs disabled
  or remotely when the target is dead, so it can't race with
  rcu_nocb_try_bypass()
* rcutree_migrate_callbacks(): locally with IRQs disabled, can't race
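
To make the pattern concrete, here is a rough user-space model of it. This
is only a sketch, not the kernel code: struct model_rdp and all the model_*()
names are made up for illustration, and it assumes the slow path of
rcu_nocb_bypass_lock() bumps ->nocb_lock_contended only after a failed
trylock and drops it again once the lock is acquired.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two rcu_data fields involved. */
struct model_rdp {
	atomic_flag bypass_lock;	/* models ->nocb_bypass_lock */
	atomic_int lock_contended;	/* models ->nocb_lock_contended */
};

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_rdp *rdp)
{
	return !atomic_flag_test_and_set_explicit(&rdp->bypass_lock,
						  memory_order_acquire);
}

static void model_unlock(struct model_rdp *rdp)
{
	atomic_flag_clear_explicit(&rdp->bypass_lock, memory_order_release);
}

/*
 * Assumed shape of the lock slow path: the counter is non-zero only
 * while a locker is spinning after a failed trylock.
 */
static void model_bypass_lock(struct model_rdp *rdp)
{
	if (model_trylock(rdp))
		return;				/* uncontended fast path */
	atomic_fetch_add(&rdp->lock_contended, 1);
	while (!model_trylock(rdp))
		;				/* cpu_relax() in the kernel */
	atomic_fetch_sub(&rdp->lock_contended, 1);
}

/* What the enqueue-side check sees: true means the WARN would fire. */
static bool model_would_warn(struct model_rdp *rdp)
{
	return atomic_load(&rdp->lock_contended) != 0;
}
```

In this model, model_would_warn() can only return true if some other context
is inside the slow path of model_bypass_lock() at the moment of the check,
which is what a remote locker racing with the enqueuer would produce.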

So indeed lazy_rcu_shrink_scan() is the only possible culprit. But then
I wonder, what is the purpose of ->nocb_lock_contended if contention is
not actually an allowed behaviour?

Paul, any recollection of that?

Thanks.