From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Naresh Kamboju <naresh.kamboju@linaro.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, patches@lists.linux.dev,
linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
akpm@linux-foundation.org, linux@roeck-us.net, shuah@kernel.org,
patches@kernelci.org, lkft-triage@lists.linaro.org,
pavel@denx.de, jonathanh@nvidia.com, f.fainelli@gmail.com,
sudipm.mukherjee@gmail.com, srw@sladewatkins.net, rwarsow@gmx.de,
conor@kernel.org, Chengming Zhou <zhouchengming@bytedance.com>,
Peter Zijlstra <peterz@infradead.org>,
Ovidiu Panait <ovidiu.panait@windriver.com>,
Ingo Molnar <mingo@kernel.org>, rcu <rcu@vger.kernel.org>
Subject: Re: [PATCH 5.15 000/183] 5.15.134-rc1 review
Date: Wed, 11 Oct 2023 05:05:04 +0000
Message-ID: <20231011050504.GA201855@google.com>
In-Reply-To: <433f5823-059c-4b51-8d18-8b356a5a507f@paulmck-laptop>

On Tue, Oct 10, 2023 at 06:34:35PM -0700, Paul E. McKenney wrote:
[...]
> > > > > > > It's also worth noting that the bug this fixes wasn't exposed until the
> > > > > > > maple tree (added in v6.1) was used for the IRQ descriptors (added in
> > > > > > > v6.5).
> > > > > >
> > > > > > Lots of latent bugs, to be sure, even with rcutorture. :-/
> > > > >
> > > > > The Right Thing is to fix the bug all the way back to the introduction,
> > > > > but what fallout makes the backport less desirable than living with the
> > > > > unexposed bug?
> > > >
> > > > You are quite right that it is possible for the risk of a backport to
> > > > exceed the risk of the original bug.
> > > >
> > > > I defer to Joel (CCed) on how best to resolve this in -stable.
> > >
> > > Maybe I am missing something but this issue should also be happening
> > > in mainline right?
> > >
> > > Even though mainline has 897ba84dc5aa ("rcu-tasks: Handle idle tasks
> > > for recently offlined CPUs") , the warning should still be happening
> > > due to Liam's "kernel/sched: Modify initial boot task idle setup"
> > > because the warning is just rearranged a bit but essentially the same.
> > >
> > > IMHO, the right thing to do then is to drop Liam's patch from 5.15 and
> > > fix it in mainline (using the ideas described in this thread), then
> > > backport both that new fix and Liam's patch to 5.15.
> > >
> > > Or is there a reason this warning does not show up on the mainline?
>
> There is not a whole lot of commonality between the v5.15.134 version of
> RCU Tasks Trace and that of mainline. In theory, in mainline, CPU hotplug
> is supposed to be disabled across all calls to trc_inspect_reader(),
> which means that there would not be any CPU coming or going.
>
> But there could potentially be some time between when a CPU was
> marked as online and its idle task was marked PF_IDLE. And in
> fact x86 start_secondary() invokes set_cpu_online() before it calls
> cpu_startup_entry(), and it is the latter that sets PF_IDLE.
>
> The same is true of alpha, arc, arm, arm64, csky, ia64, loongarch, mips,
> openrisc, parisc, powerpc, riscv, s390, sh, sparc32, sparc64, x86 xen,
> and xtensa, which is everybody.
>
> One reason why my testing did not reproduce this is because I was running
> against v6.6-rc1, and cff9b2332ab7 ("kernel/sched: Modify initial boot
> task idle setup") went into v6.6-rc3. An initial run merging in current
> mainline also failed to reproduce this, but I am running overnight.
> If that doesn't reproduce, I will try inserting delays between the
> set_cpu_online() and the cpu_startup_entry().

I thought the warning fires before set_cpu_online() is even called, because
in that situation ofl == true and the task has not yet been marked PF_IDLE:

	WARN_ON_ONCE(ofl && task_curr(t) && !is_idle_task(t));

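To make the window concrete, here is a rough userspace model of the three
bring-up phases discussed above. It is only a sketch and not kernel code:
the phase names, struct model_state, and the model_*() helpers are all
hypothetical, and it assumes that ofl simply mirrors "CPU not yet marked
online" and that the idle task counts as current for the whole bring-up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_PF_IDLE 0x2u	/* stand-in for the kernel's PF_IDLE flag */

enum bringup_phase {
	PHASE_BEFORE_ONLINE,	/* running, set_cpu_online() not yet called */
	PHASE_ONLINE_NOT_IDLE,	/* set_cpu_online() done, PF_IDLE not yet set */
	PHASE_IDLE_LOOP,	/* cpu_startup_entry() has set PF_IDLE */
};

struct model_state {
	bool ofl;		/* assumption: true until the CPU is marked online */
	bool curr;		/* assumption: the task is current on its CPU */
	unsigned int flags;	/* task flags, only MODEL_PF_IDLE is modeled */
};

/* Build the state an inspector would observe in a given phase. */
struct model_state model_phase(enum bringup_phase p)
{
	struct model_state s = { .ofl = false, .curr = true, .flags = 0 };

	switch (p) {
	case PHASE_BEFORE_ONLINE:
		s.ofl = true;
		break;
	case PHASE_ONLINE_NOT_IDLE:
		break;
	case PHASE_IDLE_LOOP:
		s.flags |= MODEL_PF_IDLE;
		break;
	}
	return s;
}

/* Same shape as the condition quoted above: ofl && curr && !idle. */
bool model_would_warn(struct model_state s)
{
	bool is_idle = (s.flags & MODEL_PF_IDLE) != 0;

	return s.ofl && s.curr && !is_idle;
}

/* In this model only the pre-online phase trips the condition. */
void model_selfcheck(void)
{
	assert(model_would_warn(model_phase(PHASE_BEFORE_ONLINE)));
	assert(!model_would_warn(model_phase(PHASE_ONLINE_NOT_IDLE)));
	assert(!model_would_warn(model_phase(PHASE_IDLE_LOOP)));
}
```

Under those assumptions, the gap between set_cpu_online() and
cpu_startup_entry() cannot trip the condition, because ofl is already
false by then. Only the earlier phase can.
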
> If this problem is real, fixes include:
>
> o Revert Liam's patch and make Tiny RCU's call_rcu() deal with
> the problem. This is overhead and non-tinyness, but to Joel's
> point, it might be best.
>
> o Go back to something more like Liam's original patch, which
> cleared PF_IDLE only for the boot CPU.
>
> o Set PF_IDLE before calling set_cpu_online(). This would work,
> but it would also be rather ugly, reaching into each and every
> architecture.
>
> o Move the call to set_cpu_online() into cpu_startup_entry().
> This would require some serious inspection to prove that it is
> safe, assuming that it is in fact safe.
>
> o Drop the WARN_ON_ONCE() from trc_inspect_reader(). Not all
> that excited by losing this diagnostic, but then again it
> has been a while since it has caught anything.
>
> o Change the WARN_ON_ONCE() in trc_inspect_reader() into a
> "return false" so as to retry later. Ditto, also not liking the
> possibility of indefinite deferral with no warning.

Just for completeness, two more options:

o	Since it is just a warning, check for task_struct::pid == 0 instead
	of is_idle_task()? Though PF_IDLE is also set in play_idle_precise().

o	Change the warning to:

	WARN_ON_ONCE(ofl && task_curr(t) && (!is_idle_task(t) && t->pid != 0));

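For comparison, a small userspace sketch of how the existing condition and
the two variants above treat a few kinds of task. It is not kernel code:
struct sketch_task and the sketch_*() helpers are hypothetical stand-ins,
and the example tasks (an early idle task with pid 0 and no PF_IDLE yet,
and a kthread that had PF_IDLE set via play_idle_precise()) are assumed
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
#define SKETCH_PF_IDLE 0x2u	/* stand-in for the kernel's PF_IDLE flag */

struct sketch_task {
	int pid;
	unsigned int flags;
	bool on_cpu;		/* models task_curr(t) */
};

static bool sketch_is_idle_task(const struct sketch_task *t)
{
	return (t->flags & SKETCH_PF_IDLE) != 0;
}

/* Existing shape: warn unless PF_IDLE is set. */
bool sketch_warn_current(bool ofl, const struct sketch_task *t)
{
	return ofl && t->on_cpu && !sketch_is_idle_task(t);
}

/* First variant: look only at pid == 0. */
bool sketch_warn_pid_only(bool ofl, const struct sketch_task *t)
{
	return ofl && t->on_cpu && t->pid != 0;
}

/* Second variant: warn only if neither PF_IDLE nor pid == 0 applies. */
bool sketch_warn_combined(bool ofl, const struct sketch_task *t)
{
	return ofl && t->on_cpu && (!sketch_is_idle_task(t) && t->pid != 0);
}

/* Walk the three example tasks through all three conditions. */
void sketch_selfcheck(void)
{
	struct sketch_task early_idle = { .pid = 0, .flags = 0, .on_cpu = true };
	struct sketch_task played_idle = { .pid = 1234, .flags = SKETCH_PF_IDLE, .on_cpu = true };
	struct sketch_task ordinary = { .pid = 42, .flags = 0, .on_cpu = true };

	/* Early idle task: only the existing condition warns. */
	assert(sketch_warn_current(true, &early_idle));
	assert(!sketch_warn_pid_only(true, &early_idle));
	assert(!sketch_warn_combined(true, &early_idle));

	/* PF_IDLE kthread: only the pid-only variant warns. */
	assert(!sketch_warn_current(true, &played_idle));
	assert(sketch_warn_pid_only(true, &played_idle));
	assert(!sketch_warn_combined(true, &played_idle));

	/* Ordinary task on an offline CPU: every variant still warns. */
	assert(sketch_warn_current(true, &ordinary));
	assert(sketch_warn_pid_only(true, &ordinary));
	assert(sketch_warn_combined(true, &ordinary));
}
```

In this model the combined form stays quiet for both the early idle task
and the PF_IDLE kthread, while still flagging an ordinary task running on
an offline CPU.
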
thanks,
- Joel