From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>, linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL rcu/next] rcu commits for 2.6.40
Date: Sat, 14 May 2011 08:31:18 -0700
Message-ID: <20110514153118.GA24311@linux.vnet.ibm.com>
In-Reply-To: <20110514142621.GB2258@linux.vnet.ibm.com>
On Sat, May 14, 2011 at 07:26:21AM -0700, Paul E. McKenney wrote:
> On Fri, May 13, 2011 at 02:08:21PM -0700, Yinghai Lu wrote:
> > On Thu, May 12, 2011 at 2:36 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > > On 05/12/2011 02:20 AM, Paul E. McKenney wrote:
> > >> On Thu, May 12, 2011 at 12:42:50AM -0700, Yinghai Lu wrote:
> > >>> On 05/12/2011 12:27 AM, Yinghai Lu wrote:
> > >>>> On 05/11/2011 11:03 PM, Ingo Molnar wrote:
> > >>>>>
> > >>>>> * Yinghai Lu <yinghai@kernel.org> wrote:
> > >>>>>
> > >>>>>> e59fb3120becfb36b22ddb8bd27d065d3cdca499 is the first bad commit
> > >>>>>> commit e59fb3120becfb36b22ddb8bd27d065d3cdca499
> > >>>>>> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > >>>>>> Date: Tue Sep 7 10:38:22 2010 -0700
> > >>>>>>
> > >>>>>> rcu: Decrease memory-barrier usage based on semi-formal proof
> > >>>>>
> > >>>>> Find below an (untested!) attempt at reverting it for debugging purposes: could
> > >>>>> you please try it, does your system now boot up fine?
> > >>>>>
> > >>>>> Thanks,
> > >>>>>
> > >>>>> Ingo
> > >>>>>
> > >>>>
> > >>>> yes, reverted manually that commit fix the problem.
> > >>>
> > >>> on system with 8 sockets westmere-ex
> > >>>
> > >>> it seems other commits after that commit contribute some delay too.
> > >>>
> > >>> [ 32.240739] cpu_dev_init done
> > >>> [ 73.587288] memory_dev_init done
> > >>
> > >> I am testing a revert of e59fb3120becfb36b22ddb8bd27d065d3cdca499 and
> > >> will chase down the delay.
> > >>
> > >
> > > it seems still need to revert following one in addition e59fb3120becfb36b22ddb8bd27d065d3cdca499.
> > >
> > > [root@mpk14-2404-239-158 linux-2.6]# git bisect good
> > > a26ac2455ffcf3be5c6ef92bc6df7182700f2114 is the first bad commit
> > > commit a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > Date: Wed Jan 12 14:10:23 2011 -0800
> > >
> > > rcu: move TREE_RCU from softirq to kthread
> > >
> > > If RCU priority boosting is to be meaningful, callback invocation must
> > > be boosted in addition to preempted RCU readers. Otherwise, in presence
> > > of CPU real-time threads, the grace period ends, but the callbacks don't
> > > get invoked. If the callbacks don't get invoked, the associated memory
> > > doesn't get freed, so the system is still subject to OOM.
> > >
> > > But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
> > > moves the callback invocations to a kthread, which can be boosted easily.
> > >
> > > Also add comments and properly synchronized all accesses to
> > > rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
> > >
> > > Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > >
> > > :040000 040000 e40306ac6405952c1d387325a98588442209abe8 efe9ea2f408c62daaccf49e6d1339dff3a74f049 M Documentation
> > > :040000 040000 8f9e7a8fa3a728d4ae58e2efb8ada7cf08aed00e 9b44deba45ba905c5d9b3cc314812f0ba3f7e639 M include
> > > :040000 040000 4b10b719a2d56ed4bc796a9f43775732bb5ff144 4db269277ccf607e1a6a7d7f4c2a7cf8d592d46a M kernel
> > > :040000 040000 881f102e6831381beed016ed240d690f6a2ccd5e 57d2fc6f84e47394c116bc617a9a0ef9b8b6dbd4 M tools
> >
> > so only revert e59fb3120becfb36b22ddb8bd27d065d3cdca499 is not enough.
> >
> > [ 315.248277] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > [ 315.285642] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> > [ 427.405283] INFO: rcu_sched_state detected stalls on CPUs/tasks: {
> > 0} (detected by 50, t=15002 jiffies)
> > [ 427.408267] sending NMI to all CPUs:
> > [ 427.419298] NMI backtrace for cpu 1
> > [ 427.420616] CPU 1
> >
> > Paul, can you make one clean revert for
> > | a26ac2455ffcf3be5c6ef92bc6df7182700f2114
> > | rcu: move TREE_RCU from softirq to kthread
>
> I will be continuing to look into a few things over the weekend, but
> if I cannot find the cause, then changing back to softirq might be the
> thing to do. It won't be so much a revert in the "git revert" sense
> due to later dependencies, but it could be shifted back from kthread
> to softirq. This would certainly decrease dependence on the scheduler,
> at least in the common case where ksoftirqd does not run.

So, upon reviewing Yinghai's RCU debugfs output after getting a good
night's sleep, I see that the dyntick nesting level is getting messed up.
This is shown by the "dt=7237/73" near the end of the debugfs info in
Yinghai's message from Tue, 10 May 2011 23:42:24 -0700. This says that
RCU believes that the CPU is not in dyntick-idle mode (7237 is an odd
number) and that there are 73 levels of not being in dyntick-idle
mode, which means at least 72 interrupt levels. Unless x86 interrupts
normally nest 72 levels deep...
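
To make that reading concrete, here is a small Python sketch that decodes
a "dt=" pair the way the paragraph above interprets it. The function name
decode_dt and the assumption that the field is exactly "counter/nesting"
are illustrative only; the real debugfs line may carry more fields.

```python
import re

def decode_dt(field):
    """Decode a 'dt=<counter>/<nesting>' pair as read in the message above.

    Assumption (illustrative only): the field holds exactly two integers.
    """
    match = re.fullmatch(r"dt=(\d+)/(\d+)", field)
    if match is None:
        raise ValueError(f"unrecognized dt field: {field!r}")
    counter, nesting = int(match.group(1)), int(match.group(2))
    return {
        "counter": counter,
        "nesting": nesting,
        # An odd counter means RCU believes the CPU is NOT in dyntick-idle mode.
        "rcu_sees_idle": counter % 2 == 0,
        # Per the message, 73 levels imply at least 72 interrupt levels,
        # so one level is taken to be ordinary non-idle execution.
        "min_irq_levels": max(nesting - 1, 0),
    }

info = decode_dt("dt=7237/73")
print(info)
```

For "dt=7237/73" this reports that RCU does not see the CPU as idle and
that at least 72 interrupt levels would be needed to explain the count.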

This situation will cause RCU to think that a given CPU is not in
dyntick-idle mode when it really is. This results in RCU waiting on
it to respond, and eventually waking it up, which would cause needless
grace-period delays.

Before commit e59fb31 ("rcu: Decrease memory-barrier usage based on
semi-formal proof"), rcu_enter_nohz() would have unconditionally caused
RCU to believe that the CPU was in dyntick-idle mode. After this commit,
RCU pays attention to the (broken) nesting count, though the broken
nesting level probably caused some trouble even before this commit.
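
The difference between the two behaviors can be sketched with a toy
model. This is not the kernel code: the class ToyDynticks, its fields,
and the honor_nesting switch are hypothetical names that only mirror the
description above, with one nesting level standing for ordinary non-idle
execution.

```python
class ToyDynticks:
    """Toy model of per-CPU dyntick state (not the kernel implementation)."""

    def __init__(self, honor_nesting):
        # True models the behavior described for after commit e59fb31,
        # False models the older unconditional behavior.
        self.honor_nesting = honor_nesting
        self.nesting = 1            # one level: CPU running non-idle code
        self.rcu_sees_idle = False

    def irq_enter(self):
        self.nesting += 1
        self.rcu_sees_idle = False

    def irq_exit(self):
        self.nesting -= 1
        if self.nesting == 0:
            self.rcu_sees_idle = True

    def enter_nohz(self):
        self.nesting -= 1
        if self.honor_nesting:
            # Idle is reported only if the nesting count really reached zero.
            self.rcu_sees_idle = self.nesting == 0
        else:
            # Old semantics: entering nohz always reports idle.
            self.rcu_sees_idle = True


def idle_after_leak(honor_nesting, leaked_irq_enters):
    """Return what RCU believes after some unpaired irq_enter() calls."""
    cpu = ToyDynticks(honor_nesting)
    for _ in range(leaked_irq_enters):
        cpu.irq_enter()             # misnesting: no matching irq_exit()
    cpu.enter_nohz()
    return cpu.rcu_sees_idle


print(idle_after_leak(honor_nesting=False, leaked_irq_enters=72))
print(idle_after_leak(honor_nesting=True, leaked_irq_enters=72))
```

With no leaked entries both variants report idle. Only the unbalanced
case separates them, and there the nesting-aware variant leaves RCU
waiting on a CPU that is actually idle.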

So I am restoring the old semantics where rcu_enter_nohz() unconditionally
tells RCU that the CPU really is in nohz mode. I am also adding
some WARN_ON_ONCE() statements that will hopefully help find where the
misnesting is occurring. I will also see if I can find the misnesting
myself, but I am not as familiar with the interrupt entry/exit code as I
should be. So I will create and sanity-test the patch and post it first,
and do the inspection afterwards.
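
As a rough illustration of how a warn-once check can help localize
misnesting, the sketch below flags the first time a nesting count is
found at an unexpected value on the way into idle. The helper
warn_on_once, the check site, and the expected value of 1 are assumptions
made for illustration; the message does not say where the WARN_ON_ONCE()
statements will go.

```python
_warned_sites = set()
warnings_log = []

def warn_on_once(site, condition):
    """Record a warning the first time `condition` is true at `site`.

    Loosely mimics the once-only behavior of the kernel's WARN_ON_ONCE();
    returns the condition so it can be used inside an `if`.
    """
    if condition and site not in _warned_sites:
        _warned_sites.add(site)
        warnings_log.append(site)
    return condition

def check_enter_nohz(nesting):
    # Hypothetical check: on entry to idle, exactly one non-idle level
    # should remain to be dropped.
    return warn_on_once("enter_nohz: nesting != 1", nesting != 1)

for observed in (1, 73, 73, 1):
    check_enter_nohz(observed)
print(warnings_log)
```

The repeated bad value is logged only once, which keeps the log readable
while still pointing at the first place the count went wrong.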

Thanx, Paul