From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: kordex - <kordex@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Badness at kernel/rcutree.c:1228
Date: Tue, 8 Dec 2009 07:35:58 -0800 [thread overview]
Message-ID: <20091208153557.GD6799@linux.vnet.ibm.com> (raw)
In-Reply-To: <8b8dd87a0912080354k69c9f508v8186f0009cf1bf8f@mail.gmail.com>
On Tue, Dec 08, 2009 at 01:54:38PM +0200, kordex - wrote:
> Hey,
>
> I did add the printk and warn_once lines you requested and the output
> from them is:
>
> Hierarchical RCU implementation.
> rcu_init() smp_processor_id() = 0
> rcu_init_percpu_data(0) called
> rcu_init_percpu_data(0) called
So both rcu_sched_state.beenonline and rcu_bh_state.beenonline have
been set to 1 by the line immediately preceding the one that printed
out "rcu_init_percpu_data(0) called". This is the only line that
assigns a value to the beenonline field.
My suggestion is to bisect the portion of the boot process immediately
preceding the WARN_ON_ONCE() from the prior boot to locate whatever
is clobbering that value.
Thanx, Paul
> Full dmesg at http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.printk.txt
>
> Captured just right after boot so it does not yet have anything else
> than just clean boot messages.
>
> --Mikko Kortelainen
>
> 2009/12/8 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Mon, Dec 07, 2009 at 08:58:35PM +0200, kordex - wrote:
> >> Hello,
> >>
> >> I encountered this and I am able to reproduce, unfortunately. I was
> >> syncing files from two large HDD drives and I saw difference between
> >> cksum so I first suspected faulty filesystem but it was not that. Then
> >> I ran couple of Seagate Seatools tests on these disks of mine, still
> >> nothing. I suspected there was something bad in rsync(1) code but
> >> still negative as tar worked in the same way. Ofcourse I did the dmesg
> >> checking but after reboot after checking the disks were fine. It
> >> showed up clean. So I went there and tried to resync my non raid disk
> >> with raid0 disk. Still same problem, data mismatch. I read something
> >> from google when installing new sata controller about lib_sata an sil
> >> 3114 and seagates disks not working so this time i though of checking
> >> dmesg. And there it was nice little surprise:
> >>
> >> ------------[ cut here ]------------
> >> Badness at kernel/rcutree.c:1228
> >> NIP: c004ecbc LR: c004f14c CTR: c007bd70
> >> REGS: df34dde0 TRAP: 0700 Not tainted (2.6.32)
> >> MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24024482 XER: 20000000
> >> TASK = d12c2d10[2734] 'buffer' THREAD: df34c000
> >> GPR00: 00000001 df34de90 d12c2d10 c0746498 c07465a0 00b1ed55 00000001 00138c4a
> >> GPR08: 00001032 c07b0000 00009032 00000008 44024488 1001c6ac 1002efd8 1002f064
> >> GPR16: 10060000 10070000 10070000 c073a590 c073a710 c0685204 00000000 c07a5520
> >> GPR24: c07a5520 0000000a df34c000 c0746498 c0746420 c0746528 00000009 c07465a0
> >> NIP [c004ecbc] __rcu_process_callbacks+0x34/0x48c
> >> LR [c004f14c] rcu_process_callbacks+0x38/0x4c
> >> Call Trace:
> >> [df34de90] [c004efec] __rcu_process_callbacks+0x364/0x48c (unreliable)
> >> [df34deb0] [c004f14c] rcu_process_callbacks+0x38/0x4c
> >> [df34ded0] [c0028e8c] __do_softirq+0xa4/0x120
> >> [df34df10] [c0006a0c] do_softirq+0x40/0x58
> >> [df34df20] [c0028c8c] irq_exit+0x38/0x80
> >> [df34df30] [c0010d28] timer_interrupt+0x11c/0x138
> >> [df34df40] [c00143a0] ret_from_except+0x0/0x14
> >> --- Exception: 901 at 0x10002ddc
> >> LR = 0x10002d80
> >> Instruction dump:
> >> 7c0802a6 bf410008 7c9f2378 7c7b1b78 90010024 8804000e 2f800000 40be002c
> >> 3d20c07b 8009a334 7c000034 5400d97e <0f000000> 2f800000 41be0010 38000001
> >> munin-update: page allocation failure. order:1, mode:0x20
> >> Call Trace:
> >> [c59bfa30] [c0008a24] show_stack+0x4c/0x14c (unreliable)
> >> [c59bfa70] [c0058154] __alloc_pages_nodemask+0x468/0x4b8
> >> [c59bfb00] [c0077448] cache_alloc_refill+0x2d8/0x55c
> >> [c59bfb50] [c0077830] kmem_cache_alloc+0x64/0xb0
> >> [c59bfb70] [c04862fc] sk_prot_alloc+0x2c/0x78
> >> [c59bfb90] [c04863e0] sk_clone+0x20/0x1b0
> >> [c59bfbb0] [c04bca70] inet_csk_clone+0x1c/0x8c
> >> [c59bfbc0] [c04d1358] tcp_create_openreq_child+0x20/0x2c4
> >> [c59bfbe0] [c04d00a0] tcp_v4_syn_recv_sock+0x58/0x17c
> >> [c59bfc00] [c04d11c4] tcp_check_req+0x268/0x3dc
> >> [c59bfc40] [c04cf87c] tcp_v4_do_rcv+0xa0/0x198
> >> [c59bfc70] [c04cfdb8] tcp_v4_rcv+0x444/0x6d4
> >> [c59bfca0] [c04b450c] ip_local_deliver+0x104/0x1d8
> >> [c59bfcc0] [c04b43d0] ip_rcv+0x508/0x540
> >> [c59bfcf0] [c049227c] netif_receive_skb+0x390/0x3bc
> >> [c59bfd20] [c0492344] process_backlog+0x9c/0x134
> >> [c59bfd50] [c0492b44] net_rx_action+0x80/0x190
> >> [c59bfd80] [c0028e8c] __do_softirq+0xa4/0x120
> >> [c59bfdc0] [c0006a0c] do_softirq+0x40/0x58
> >> [c59bfdd0] [c0028db4] local_bh_enable+0x7c/0x9c
> >> [c59bfde0] [c04852c8] release_sock+0x94/0xa8
> >> [c59bfe00] [c04dcd48] inet_stream_connect+0x224/0x29c
> >> [c59bfe50] [c0482a54] sys_connect+0x78/0xa8
> >> [c59bff00] [c0484088] sys_socketcall+0xf0/0x240
> >> [c59bff40] [c0013cf4] ret_from_syscall+0x0/0x38
> >> --- Exception: c01 at 0xfd51434
> >> LR = 0xff66550
> >> Mem-Info:
> >> DMA per-cpu:
> >> CPU 0: hi: 186, btch: 31 usd: 87
> >> active_anon:6308 inactive_anon:8347 isolated_anon:0
> >> active_file:52970 inactive_file:53351 isolated_file:0
> >> unevictable:0 dirty:12753 writeback:0 unstable:0
> >> free:1728 slab_reclaimable:2017 slab_unreclaimable:1020
> >> mapped:2045 shmem:410 pagetables:431 bounce:0
> >> DMA free:6912kB min:2884kB low:3604kB high:4324kB active_anon:25232kB
> >> inactive_anon:33388kB active_file:211880kB inactive_file:213404kB
> >> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
> >> mlocked:0kB dirty:51012kB writeback:0kB mapped:8180kB shmem:1640kB
> >> slab_reclaimable:8068kB slab_unreclaimable:4080kB kernel_stack:992kB
> >> pagetables:1724kB unstable:0kB bounce:0kB writeback_tmp:0kB
> >> pages_scanned:0 all_unreclaimable? no
> >> lowmem_reserve[]: 0 0 0 0
> >> DMA: 1728*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> >> 0*1024kB 0*2048kB 0*4096kB = 6912kB
> >> 106743 total pagecache pages
> >> 0 pages in swap cache
> >> Swap cache stats: add 0, delete 0, find 0/0
> >> Free swap = 779144kB
> >> Total swap = 779144kB
> >> 131072 pages RAM
> >> 0 pages HighMem
> >> 3391 pages reserved
> >> 14583 pages shared
> >> 121040 pages non-shared
> >>
> >> ...
> >>
> >> Full dmesg: http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.txt
> >> Kernel config: http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.config.txt
> >
> > Interesting! You appear to be running a UP build on a 32-bit PowerPC
> > platform, which presumably disables CPU hotplug. You are also running
> > CONFIG_TREE_RCU, which does require initialization. Your dmesg snippet
> > seems to start after RCU initialized itself, as it would normally print:
> >
> > Hierarchical RCU implementation.
> >
> > Are you able to capture console output that early? If so, could you
> > please try the attached patch?
> >
> > The behavior is consistent with the boot CPU thinking that it is not
> > online, which would indeed be strange. The patch checks on this and
> > also prints out at the point that ->beenonline should be initialized.
> >
> >> uname -a: Linux navi 2.6.32 #2 Mon Dec 7 13:35:57 EET 2009 ppc GNU/Linux
> >>
> >> And no I am not running any patches on this. Same occurred with
> >> 2.6.31.6 which I had patched with Con Kolivas' BFS so I upgraded to
> >> latest stable vanilla without any patches and still there it is. You
> >> know, it ain't too interesting to sync 1TB disks 5 times (yea I needed
> >> to do some testing for rsync) with slow controllers like these.
> >
> > You also have quite a few occurrences of the following splat in your
> > dmesg, which might well explain your sync-ing woes:
> >
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 153
> > active_anon:8938 inactive_anon:9432 isolated_anon:0
> > active_file:11943 inactive_file:91997 isolated_file:0
> > unevictable:0 dirty:4317 writeback:3464 unstable:0
> > free:1044 slab_reclaimable:1160 slab_unreclaimable:1123
> > mapped:3207 shmem:411 pagetables:471 bounce:0
> > DMA free:4176kB min:2884kB low:3604kB high:4324kB active_anon:35752kB
> > inactive_anon:37728kB active_file:47772kB inactive_file:367988kB
> > unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
> > mlocked:0kB dirty:17268kB writeback:13856kB mapped:12828kB shmem:1644kB
> > slab_reclaimable:4640kB slab_unreclaimable:4492kB kernel_stack:1040kB
> > pagetables:1884kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0 0
> > DMA: 1034*4kB 5*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> > 0*1024kB 0*2048kB 0*4096kB = 4176kB
> > 104353 total pagecache pages
> > 0 pages in swap cache
> > Swap cache stats: add 0, delete 0, find 0/0
> > Free swap = 779144kB
> > Total swap = 779144kB
> > 131072 pages RAM
> > 0 pages HighMem
> > 3391 pages reserved
> > 14353 pages shared
> > 120982 pages non-shared
> > mconf: page allocation failure. order:1, mode:0x20
> > Call Trace:
> > [d1197cb0] [c0008a24] show_stack+0x4c/0x14c (unreliable)
> > [d1197cf0] [c0058154] __alloc_pages_nodemask+0x468/0x4b8
> > [d1197d80] [c0077448] cache_alloc_refill+0x2d8/0x55c
> > [d1197dd0] [c0077780] __kmalloc+0xb4/0x100
> > [d1197df0] [c02aa6bc] tty_buffer_request_room+0xcc/0x148
> > [d1197e10] [c02aa8f4] tty_insert_flip_string+0x2c/0xa0
> > [d1197e30] [c02ab678] pty_write+0x40/0x70
> > [d1197e50] [c02a6244] n_tty_write+0x260/0x3bc
> > [d1197eb0] [c02a3780] tty_write+0x1bc/0x260
> > [d1197ef0] [c007b4d4] vfs_write+0xb8/0x180
> > [d1197f10] [c007b67c] sys_write+0x4c/0x8c
> > [d1197f40] [c0013cf4] ret_from_syscall+0x0/0x38
> > --- Exception: c01 at 0xfef4034
> > LR = 0xfe97624
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 153
> > active_anon:8938 inactive_anon:9432 isolated_anon:0
> > active_file:11943 inactive_file:91997 isolated_file:0
> > unevictable:0 dirty:4317 writeback:3336 unstable:0
> > free:1044 slab_reclaimable:1160 slab_unreclaimable:1123
> > mapped:3207 shmem:411 pagetables:471 bounce:0
> > DMA free:4176kB min:2884kB low:3604kB high:4324kB active_anon:35752kB
> > inactive_anon:37728kB active_file:47772kB inactive_file:367988kB
> > unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
> > mlocked:0kB dirty:17268kB writeback:13344kB mapped:12828kB shmem:1644kB
> > slab_reclaimable:4640kB slab_unreclaimable:4492kB kernel_stack:1040kB
> > pagetables:1884kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0 0
> > DMA: 1034*4kB 5*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
> > 0*1024kB 0*2048kB 0*4096kB = 4176kB
> > 104353 total pagecache pages
> > 0 pages in swap cache
> > Swap cache stats: add 0, delete 0, find 0/0
> > Free swap = 779144kB
> > Total swap = 779144kB
> > 131072 pages RAM
> > 0 pages HighMem
> > 3391 pages reserved
> > 14353 pages shared
> > 120982 pages non-shared
> > mconf: page allocation failure. order:1, mode:0x20
> > Call Trace:
> > [d1197cb0] [c0008a24] show_stack+0x4c/0x14c (unreliable)
> > [d1197cf0] [c0058154] __alloc_pages_nodemask+0x468/0x4b8
> > [d1197d80] [c0077448] cache_alloc_refill+0x2d8/0x55c
> > [d1197dd0] [c0077780] __kmalloc+0xb4/0x100
> > [d1197df0] [c02aa6bc] tty_buffer_request_room+0xcc/0x148
> > [d1197e10] [c02aa8f4] tty_insert_flip_string+0x2c/0xa0
> > [d1197e30] [c02ab678] pty_write+0x40/0x70
> > [d1197e50] [c02a6244] n_tty_write+0x260/0x3bc
> > [d1197eb0] [c02a3780] tty_write+0x1bc/0x260
> > [d1197ef0] [c007b4d4] vfs_write+0xb8/0x180
> > [d1197f10] [c007b67c] sys_write+0x4c/0x8c
> > [d1197f40] [c0013cf4] ret_from_syscall+0x0/0x38
> > --- Exception: c01 at 0xfef4034
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > rcu: diagnostic patch for kordex
> >
> > Make sure that the boot CPU has marked itself as online, print out
> > the running CPU's ID, and print a message from within rcu_init_percpu_data()
> > that includes the currently running CPU.
> >
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> not for inclusion
> > ---
> >
> > diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> > index 4001833..4f357c4 100644
> > --- a/kernel/rcupdate.c
> > +++ b/kernel/rcupdate.c
> > @@ -175,6 +175,8 @@ void __init rcu_init(void)
> > * this is called early in boot, before either interrupts
> > * or the scheduler are operational.
> > */
> > + WARN_ON_ONCE(!cpu_online(smp_processor_id()));
> > + printk(KERN_ALERT "rcu_init() smp_processor_id() = %d\n", smp_processor_id());
> > for_each_online_cpu(i)
> > rcu_barrier_cpu_hotplug(NULL, CPU_UP_PREPARE, (void *)(long)i);
> > }
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index f3077c0..207125b 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -1557,6 +1557,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
> > rdp->passed_quiesc = 0; /* We could be racing with new GP, */
> > rdp->qs_pending = 1; /* so set up to respond to current GP. */
> > rdp->beenonline = 1; /* We have now been online. */
> > + printk(KERN_ALERT "rcu_init_percpu_data(%d) called\n", cpu);
> > rdp->preemptable = preemptable;
> > rdp->passed_quiesc_completed = lastcomp - 1;
> > rdp->qlen_last_fqs_check = 0;
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2009-12-08 15:35 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-07 18:58 Badness at kernel/rcutree.c:1228 kordex -
2009-12-08 0:08 ` Paul E. McKenney
[not found] ` <8b8dd87a0912080352j36fa24bbvf301a4101d80e434@mail.gmail.com>
2009-12-08 11:54 ` kordex -
2009-12-08 15:35 ` Paul E. McKenney [this message]
[not found] ` <8b8dd87a0912080924q3890ea8o23f6d1cfab37e306@mail.gmail.com>
[not found] ` <20091208192207.GC6779@linux.vnet.ibm.com>
[not found] ` <20091208200657.GA12990@linux.vnet.ibm.com>
2009-12-09 1:41 ` kordex -
2009-12-09 2:08 ` Paul E. McKenney
[not found] ` <8b8dd87a0912090115i3c4877b8s2e47f84f9c66ff7@mail.gmail.com>
[not found] ` <20091209140334.GB6812@linux.vnet.ibm.com>
[not found] ` <8b8dd87a0912090657y2702ecd7x5f41beb3ab785ccf@mail.gmail.com>
[not found] ` <20091209165304.GB6938@linux.vnet.ibm.com>
[not found] ` <8b8dd87a0912090935s75de7011s905c3007311a6b6b@mail.gmail.com>
[not found] ` <20091209183042.GD6938@linux.vnet.ibm.com>
2009-12-09 18:36 ` kordex -
2009-12-09 19:18 ` Paul E. McKenney
[not found] ` <8b8dd87a0912091309m21b8e560pd68e47e7ec38f097@mail.gmail.com>
[not found] ` <20091210024809.GJ6938@linux.vnet.ibm.com>
2009-12-10 14:24 ` kordex -
2009-12-10 19:32 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091208153557.GD6799@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=kordex@gmail.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.