From: "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Heiko Carstens <heiko.carstens-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org>
Cc: Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>,
"Rafael J. Wysocki" <rjw-KKrjLPT3xs0@public.gmane.org>,
Linux Kernel Mailing List
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Kernel Testers List
<kernel-testers-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Rusty Russell <rusty-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>,
Vegard Nossum
<vegard.nossum-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Peter Zijlstra
<a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>,
Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Dmitry Adamushko
<dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Steven Rostedt <srostedt-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine
Date: Tue, 11 Nov 2008 08:17:39 -0800
Message-ID: <20081111161739.GA6736@linux.vnet.ibm.com>
In-Reply-To: <20081111150132.GB9459-Pmgahw53EmNLmI7Nx2oIsGnsbthNF6/HVpNB7YpNyf8@public.gmane.org>
On Tue, Nov 11, 2008 at 04:01:32PM +0100, Heiko Carstens wrote:
> On Tue, Nov 11, 2008 at 06:35:05AM -0800, Paul E. McKenney wrote:
> > > > A process that would do nothing but onlining/offlining cpus would get
> > > > stuck after a while:
> > > >
> > > > 0 schedule+842 [0x342522]
> > > > 1 schedule_timeout+200 [0x342ec4]
> > > > 2 wait_for_common+362 [0x341fd6]
> > > > 3 wait_for_completion+54 [0x342146]
> > > > 4 __synchronize_sched+80 [0x81670]
> > > > 5 cpu_down+172 [0x33c030]
> > > > 6 store_online+96 [0x33c488]
> > > > 7 sysdev_store+52 [0x1bda84]
> > > > 8 sysfs_write_file+242 [0x1350ba]
> > > > 9 vfs_write+176 [0xd2028]
> > > > 10 sys_write+82 [0xd21ea]
> > > > 11 sysc_noemu+16 [0x269d8]
> > > >
> > > > All cpus are in cpu_idle and no other task in state TASK_INTERRUPTIBLE
> > > > or TASK_UNINTERRUPTIBLE. However it would continue to work as soon as
> > > > I login into the system or generate a console interrupt.
> > > > I'm going to look into the dump and see if I can figure out what is
> > > > broken here.
> > > > Dunno if it is the same bug or something else.
> > >
> > > [Cc:-ed Steven and Paul, since this backtrace seems to be RCU specific]
> > >
> > > Steven, Paul, any idea what could cause the hang? I think I would
> > > get lost in the RCU code...
> >
> > Hello, Heiko,
> >
> > Could you please apply the following debug patch (due to Jiangshan and
> > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
> > then mount debugfs after boot, for example, on /debug. This will
> > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
> > and "rcu_bh_data". Since you are still able to log in, could you
> > please send the contents of these three files?
>
> Hi Paul,
>
> could you attach the patch please? :)

Peter Z. beat you to it. ;-) See previous email.

> Does the patch also make sense if the system continues to work? That
> is the machine isn't stalled anymore as soon as I log in.
> On the other hand I do have a dump of the system and can look in
> whatever data structures you want. If that helps.

Ah!

I would like to see the value of rcu_ctrlblk.cpumask and also the value
of cpu_online_map. One guess would be that rcu_ctrlblk.cpumask has a
bit set that is -not- set in cpu_online_map, which would indicate that
RCU was incorrectly waiting on an offline CPU.

On the other hand, if all the bits set in rcu_ctrlblk.cpumask are also
set in cpu_online_map, then could you please dump out the instances of
the rcu_data per-CPU variable that correspond to the bits set in
rcu_ctrlblk.cpumask?

Finally, if no bits are set in rcu_ctrlblk.cpumask, the question would
be "why isn't the synchronize_sched() waking up?"
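
To make the three cases above concrete, here is a rough sketch in Python
rather than kernel code. It assumes the two masks have been read out of the
dump as plain integers, with bit n standing for CPU n. The names
classify_rcu_hang() and cpus_in() are made up for illustration and do not
exist anywhere in the kernel.

```python
def cpus_in(mask):
    """Return the CPU numbers whose bits are set in an integer bitmask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def classify_rcu_hang(rcu_cpumask, cpu_online_map):
    """Sort a stuck grace period into one of the three cases.

    Both arguments are hypothetical integer copies of rcu_ctrlblk.cpumask
    and cpu_online_map taken from a crash dump.
    """
    if rcu_cpumask == 0:
        # No CPU is being waited on, so the open question is why
        # synchronize_sched() was never woken up.
        return "no-cpus-pending"
    if rcu_cpumask & ~cpu_online_map:
        # At least one bit is set for a CPU that is not online.
        return "waiting-on-offline-cpu"
    # Every awaited CPU is online: look at rcu_data for those CPUs next.
    return "waiting-on-online-cpus"


if __name__ == "__main__":
    rcu_cpumask, cpu_online_map = 0b0100, 0b0011  # made-up example values
    print(classify_rcu_hang(rcu_cpumask, cpu_online_map))
    print("awaited CPUs:", cpus_in(rcu_cpumask))
```

In the last case, cpus_in(rcu_cpumask) gives the list of CPUs whose
rcu_data instances are worth dumping.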

BTW, I am assuming that you have the same config as Rafael, in other
words, that you are running Classic RCU rather than preemptable RCU.

The point of the patch is that it allows you to see this info by catting
out the /debug/rcu files, at least assuming that the system is healthy
enough to allow you to cat files. But if you already have a crash dump...

Thanx, Paul