From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Josh Triplett <josht@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, laijs@cn.fujitsu.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@polymtl.ca, dvhltc@us.ibm.com, niv@us.ibm.com,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	hugh.dickins@tiscali.co.uk, benh@kernel.crashing.org
Subject: Re: [PATCH -tip/core/rcu 1/6] Cleanups and fixes for RCU in face of heavy CPU-hotplug stress
Date: Tue, 18 Aug 2009 13:07:01 -0700
Message-ID: <20090818200701.GG6766@linux.vnet.ibm.com>
In-Reply-To: <20090818152643.GA5549@elte.hu>

On Tue, Aug 18, 2009 at 05:26:43PM +0200, Ingo Molnar wrote:
> 
> FYI, i've started triggering hangs in -tip testing recently, during 
> CPU hotplug tests:
> 
> [   57.632003] eth0: no IPv6 routers present
> [  103.564010] kmemleak: 29 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> [  200.380003] Hangcheck: hangcheck value past margin!
> [  248.192003] INFO: task S99local:2974 blocked for more than 120 seconds.
> [  248.194532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  248.202330] S99local      D 0000000c  6256  2974   2687 0x00000000
> [  248.208929]  9c7ebe90 00000086 6b67ef8b 0000000c 9f25a610 81a69869 00000001 820b6990
> [  248.216123]  820b6990 820b6990 9c6e4c20 9c6e4eb4 82c78990 00000000 6b993559 0000000c
> [  248.220616]  9c7ebe90 8105f22a 9c6e4eb4 9c6e4c20 00000001 9c7ebe98 9c7ebeb4 81a65cb3
> [  248.229990] Call Trace:
> [  248.234049]  [<81a69869>] ? _spin_unlock_irqrestore+0x22/0x37
> [  248.239769]  [<8105f22a>] ? prepare_to_wait+0x48/0x4e
> [  248.244796]  [<81a65cb3>] rcu_barrier_cpu_hotplug+0xaa/0xc9
> [  248.250343]  [<8105f029>] ? autoremove_wake_function+0x0/0x38
> [  248.256063]  [<81062cf2>] notifier_call_chain+0x49/0x71
> [  248.261263]  [<81062da0>] raw_notifier_call_chain+0x11/0x13
> [  248.266809]  [<81a0b475>] _cpu_down+0x272/0x288
> [  248.271316]  [<81a0b4d5>] cpu_down+0x4a/0xa2
> [  248.275563]  [<81a0c48a>] store_online+0x2a/0x5e
> [  248.280156]  [<81a0c460>] ? store_online+0x0/0x5e
> [  248.284836]  [<814ddc35>] sysdev_store+0x20/0x28
> [  248.289429]  [<8112e403>] sysfs_write_file+0xb8/0xe3
> [  248.294369]  [<8112e34b>] ? sysfs_write_file+0x0/0xe3
> [  248.299396]  [<810e4c8f>] vfs_write+0x91/0x120
> [  248.303817]  [<810e4dc1>] sys_write+0x40/0x65
> [  248.308150]  [<81002d73>] sysenter_do_call+0x12/0x28
> 
> config and bootlog attached. I'd suspect one of these patches:
> 
> 684ca5c: rcu: Fix typo in rcu_irq_exit() comment header
> b612ba8: rcu: Make rcupreempt_trace.c look at offline CPUs
> 8064d54: rcu: Make preemptable RCU scan all CPUs when summing RCU counters
> 2e59755: rcu: Simplify RCU CPU-hotplug notification
> 799e64f: cpu hotplug: Introduce cpu_notifier() to handle !HOTPLUG_CPU case
> 2756962: rcu: Split hierarchical RCU initialization into boot-time and CPU-online piece
> 
> Any ideas?

Gah...  I thought I had fixed that one!!!  I was seeing a deadlock
where rcu_barrier_cpu_hotplug() would register the three RCU callbacks,
then wait for them.  But in some situations, it would wait for them in
a state such that the grace period could not complete.  I convinced myself
that moving the wait back from CPU_DEAD to CPU_POST_DEAD solved the
problem.

I am going to take a more bullet-proof approach, switching from the
wait_for_completion() form to wait_event(), which will allow me to wait
for the previous hotplug operation's callbacks at the beginning of the
subsequent hotplug operation.
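
To make the ordering change concrete, here is a rough single-threaded
user-space model, not kernel code.  Every name in it (sim_grace_period(),
sim_cpu_down_deferred(), the in_hotplug_op flag, and so on) is made up for
illustration, and the assumption that a grace period can never complete
while a hotplug operation is in flight is a deliberate simplification of
the "some situations" described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: three callbacks are posted per CPU-offline operation. */
#define NR_BARRIER_CBS 3

int cbs_outstanding;   /* callbacks posted but not yet invoked */
bool in_hotplug_op;    /* modelling assumption: no grace period can end while set */

void sim_reset(void)
{
	cbs_outstanding = 0;
	in_hotplug_op = false;
}

/* Try to end a grace period, which invokes every posted callback.
 * Returns false if the grace period is stalled. */
bool sim_grace_period(void)
{
	if (in_hotplug_op)
		return false;
	cbs_outstanding = 0;
	return true;
}

/* wait_event()-style wait: re-check the condition after each bit of progress.
 * Returns false where a real waiter would sleep forever. */
bool sim_wait_for_cbs(void)
{
	while (cbs_outstanding != 0)
		if (!sim_grace_period())
			return false;
	return true;
}

/* Problematic ordering: post the callbacks, then wait for them inside the operation. */
bool sim_cpu_down_wait_inside(void)
{
	bool ok;

	assert(!in_hotplug_op);          /* operations do not nest in this model */
	in_hotplug_op = true;
	cbs_outstanding = NR_BARRIER_CBS;
	ok = sim_wait_for_cbs();         /* stalls: the grace period cannot end here */
	in_hotplug_op = false;
	return ok;
}

/* Deferred ordering: wait for the previous operation's callbacks first,
 * then post this operation's callbacks and return without waiting. */
bool sim_cpu_down_deferred(void)
{
	bool ok;

	assert(!in_hotplug_op);
	ok = sim_wait_for_cbs();         /* nothing blocks the grace period here */
	in_hotplug_op = true;
	cbs_outstanding = NR_BARRIER_CBS;
	in_hotplug_op = false;
	return ok;
}
```

In this model, sim_cpu_down_wait_inside() reports a stall, while repeated
calls to sim_cpu_down_deferred() each succeed and leave three callbacks
pending for the next call to wait on.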

I reserve the right to insert a short delay in the CPU-hotplug path
outside of any locks, but would imagine that people would prefer that
I avoid that sort of thing, at least until we have bulk CPU-hotplug
operations.

							Thanx, Paul


Thread overview: 38+ messages
2009-08-15 16:51 [PATCH -tip/core/rcu 1/6] Cleanups and fixes for RCU in face of heavy CPU-hotplug stress Paul E. McKenney
2009-08-15 16:53 ` [PATCH -tip/core/rcu 1/6] Split hierarchical RCU initialization into boot-time and CPU-online pieces Paul E. McKenney
2009-08-15 17:07   ` [tip:core/rcu] rcu: " tip-bot for Paul E. McKenney
2009-08-15 16:53 ` [PATCH -tip/core/rcu 2/6] Introduce cpu_notifier() to handle !HOTPLUG_CPU case Paul E. McKenney
2009-08-15 17:07   ` [tip:core/rcu] cpu hotplug: " tip-bot for Paul E. McKenney
2009-08-17 17:21   ` [PATCH -tip/core/rcu 2/6] " Josh Triplett
2009-08-17 18:28     ` Paul E. McKenney
2009-08-15 16:53 ` [PATCH -tip/core/rcu 3/6] Simplify RCU CPU-hotplug notification Paul E. McKenney
2009-08-15 17:07   ` [tip:core/rcu] rcu: " tip-bot for Paul E. McKenney
2009-08-20  4:02   ` [PATCH -tip/core/rcu 3/6] " Lai Jiangshan
2009-08-20  4:21     ` Paul E. McKenney
2009-08-15 16:53 ` [PATCH -tip/core/rcu 4/6] Make preemptable RCU scan all CPUs when summing RCU counters Paul E. McKenney
2009-08-15 17:07   ` [tip:core/rcu] rcu: " tip-bot for Paul E. McKenney
2009-08-15 16:53 ` [PATCH -tip/core/rcu 5/6] Make rcupreempt_trace.c look at offline CPUs Paul E. McKenney
2009-08-15 17:07   ` [tip:core/rcu] rcu: " tip-bot for Paul E. McKenney
2009-08-15 16:53 ` [PATCH -tip/core/rcu 6/6] Fix typo in rcu_irq_exit() comment header Paul E. McKenney
2009-08-15 17:00   ` Ingo Molnar
2009-08-15 17:10     ` Paul E. McKenney
2009-08-15 17:11       ` Ingo Molnar
2009-08-15 17:08   ` [tip:core/rcu] rcu: " tip-bot for Josh Triplett
2009-08-17 18:24 ` [PATCH -tip/core/rcu 1/6] Cleanups and fixes for RCU in face of heavy CPU-hotplug stress Josh Triplett
2009-08-17 19:20   ` Paul E. McKenney
2009-08-18 15:26     ` Ingo Molnar
2009-08-18 20:07       ` Paul E. McKenney [this message]
2009-08-19  6:06         ` Paul E. McKenney
2009-08-19 11:59           ` Ingo Molnar
2009-08-19 12:09           ` [tip:core/rcu] rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation tip-bot for Paul E. McKenney
2009-08-19 15:24           ` [PATCH -tip/core/rcu 1/6] Cleanups and fixes for RCU in face of heavy CPU-hotplug stress Mathieu Desnoyers
2009-08-19 16:38             ` Paul E. McKenney
2009-08-19 18:10               ` Mathieu Desnoyers
2009-08-19 18:31                 ` Paul E. McKenney
2009-08-20 14:03       ` Mathieu Desnoyers
2009-08-21 14:17         ` Ingo Molnar
2009-08-21 14:29           ` Steven Rostedt
2009-08-21 14:44             ` Ingo Molnar
2009-08-21 15:00               ` Mathieu Desnoyers
2009-08-21 15:37               ` Paul E. McKenney
2009-08-21 14:58           ` Mathieu Desnoyers
