From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Josh Triplett <josht@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, laijs@cn.fujitsu.com,
dipankar@in.ibm.com, akpm@linux-foundation.org,
dvhltc@us.ibm.com, niv@us.ibm.com, tglx@linutronix.de,
peterz@infradead.org, rostedt@goodmis.org,
hugh.dickins@tiscali.co.uk, benh@kernel.crashing.org
Subject: Re: [PATCH -tip/core/rcu 1/6] Cleanups and fixes for RCU in face of heavy CPU-hotplug stress
Date: Thu, 20 Aug 2009 10:03:35 -0400
Message-ID: <20090820140335.GA31773@Krystal>
In-Reply-To: <20090818152643.GA5549@elte.hu>
* Ingo Molnar (mingo@elte.hu) wrote:
>
> FYI, i've started triggering hangs in -tip testing recently, during
> CPU hotplug tests:
>
> [ 57.632003] eth0: no IPv6 routers present
> [ 103.564010] kmemleak: 29 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> [ 200.380003] Hangcheck: hangcheck value past margin!
> [ 248.192003] INFO: task S99local:2974 blocked for more than 120 seconds.
> [ 248.194532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 248.202330] S99local D 0000000c 6256 2974 2687 0x00000000
> [ 248.208929] 9c7ebe90 00000086 6b67ef8b 0000000c 9f25a610 81a69869 00000001 820b6990
> [ 248.216123] 820b6990 820b6990 9c6e4c20 9c6e4eb4 82c78990 00000000 6b993559 0000000c
> [ 248.220616] 9c7ebe90 8105f22a 9c6e4eb4 9c6e4c20 00000001 9c7ebe98 9c7ebeb4 81a65cb3
> [ 248.229990] Call Trace:
> [ 248.234049] [<81a69869>] ? _spin_unlock_irqrestore+0x22/0x37
> [ 248.239769] [<8105f22a>] ? prepare_to_wait+0x48/0x4e
> [ 248.244796] [<81a65cb3>] rcu_barrier_cpu_hotplug+0xaa/0xc9
> [ 248.250343] [<8105f029>] ? autoremove_wake_function+0x0/0x38
> [ 248.256063] [<81062cf2>] notifier_call_chain+0x49/0x71
> [ 248.261263] [<81062da0>] raw_notifier_call_chain+0x11/0x13
> [ 248.266809] [<81a0b475>] _cpu_down+0x272/0x288
> [ 248.271316] [<81a0b4d5>] cpu_down+0x4a/0xa2
> [ 248.275563] [<81a0c48a>] store_online+0x2a/0x5e
> [ 248.280156] [<81a0c460>] ? store_online+0x0/0x5e
> [ 248.284836] [<814ddc35>] sysdev_store+0x20/0x28
> [ 248.289429] [<8112e403>] sysfs_write_file+0xb8/0xe3
> [ 248.294369] [<8112e34b>] ? sysfs_write_file+0x0/0xe3
> [ 248.299396] [<810e4c8f>] vfs_write+0x91/0x120
> [ 248.303817] [<810e4dc1>] sys_write+0x40/0x65
> [ 248.308150] [<81002d73>] sysenter_do_call+0x12/0x28
>
> config and bootlog attached. I'd suspect one of these patches:
>
> 684ca5c: rcu: Fix typo in rcu_irq_exit() comment header
> b612ba8: rcu: Make rcupreempt_trace.c look at offline CPUs
> 8064d54: rcu: Make preemptable RCU scan all CPUs when summing RCU counters
> 2e59755: rcu: Simplify RCU CPU-hotplug notification
> 799e64f: cpu hotplug: Introduce cpu_notifier() to handle !HOTPLUG_CPU case
> 2756962: rcu: Split hierarchical RCU initialization into boot-time and CPU-online piece
>
> Any ideas?
>
[...]
> [ 0.484001] Booting processor 1 APIC 0x1 ip 0x6000
> [ 0.004000] Initializing CPU#1
> [ 0.004000] masked ExtINT on CPU#1
> [ 0.004000] Calibrating delay loop... 2007.04 BogoMIPS (lpj=4014080)
> [ 0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [ 0.004000] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.004000] CPU: Physical Processor ID: 0
> [ 0.004000] CPU: Processor Core ID: 1
> [ 0.004000] mce: CPU supports 5 MCE banks
> [ 0.588001] CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
I would not trust this architecture for synchronization tests. There have
been reports from the field of a hardware bug affecting the cmpxchg
instruction: the load fence normally implied by its semantics seems to be
missing. AFAIK, AMD never acknowledged the problem.

So far I have proposed a best-effort fix for this (patching in an lfence
after cmpxchg), but userspace would still be buggy.

http://lkml.indiana.edu/hypermail/linux/kernel/0904.2/03024.html

However, in this particular RCU case, __wait_event() calls
prepare_to_wait(), which uses set_current_state(). This in turn uses
xchg(), which has an implied memory barrier. That barrier is required to
order the update of the task state and wait queue before the condition
test that follows it.

It may be worth checking whether AMD screwed up the xchg instruction too
on this architecture.
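
To make the ordering argument concrete, here is a minimal userspace sketch
using C11 atomics. It is not the kernel code: struct waiter,
waiter_prepare_and_check() and waker_signal() are hypothetical names, and
a sequentially consistent atomic_exchange() stands in for the xchg() that
set_current_state() relies on. It only shows why the state store must be
ordered before the condition load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel task states. */
enum task_state { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

struct waiter {
	atomic_int state;	/* stands in for current->state */
	atomic_bool condition;	/* the condition the sleeper waits for */
};

/*
 * Sleeper side: publish the sleeping state, then test the condition.
 * Returns true if the caller should go to sleep.
 *
 * The seq_cst exchange acts as a full barrier, so the load of
 * "condition" cannot be satisfied before the store to "state" is
 * visible. Without that ordering, the waker could set the condition and
 * see TASK_RUNNING while the sleeper sees the old condition: a lost
 * wakeup.
 */
static bool waiter_prepare_and_check(struct waiter *w)
{
	assert(w != NULL);
	atomic_exchange(&w->state, TASK_UNINTERRUPTIBLE);
	if (atomic_load(&w->condition)) {
		/* Condition already true: undo the state change. */
		atomic_store(&w->state, TASK_RUNNING);
		return false;
	}
	return true;
}

/*
 * Waker side: set the condition, then look at the state.
 * Returns true if a sleeper was found and marked runnable.
 */
static bool waker_signal(struct waiter *w)
{
	int expected = TASK_UNINTERRUPTIBLE;

	assert(w != NULL);
	atomic_store(&w->condition, true);
	return atomic_compare_exchange_strong(&w->state, &expected,
					      TASK_RUNNING);
}
```

If the exchange did not actually provide the barrier in hardware, both
sides could miss each other, which would look like the hang reported
above.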

I am not saying that there is no other bug hidden somewhere, but until
this can be reproduced on a different architecture, I think we should
consider that the problem might be caused by missing hardware
synchronization.

Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68