linux-fsdevel.vger.kernel.org archive mirror
* reiserfs unstable on large systems in 2.6.13-git9
@ 2005-09-11 22:30 Andi Kleen
  2005-09-11 23:21 ` Chris Mason
  0 siblings, 1 reply; 3+ messages in thread
From: Andi Kleen @ 2005-09-11 22:30 UTC
  To: reiserfs-dev, mason, jeffm; +Cc: linux-fsdevel


When I run even relatively minor stress on 8- or 16-core
Opterons, I get deadlocks like the one in the trace below.

At one point I also had a deadlock on a semaphore, with
all processes that did disk access going into D state.
(The backtrace was lost on that one, unfortunately, but all the
traces went through reiserfs_set_acl.)

 Watchdog detected LOCKUP on CPU 10
CPU 10
Modules linked in:
Pid: 21498, comm: reaim Not tainted 2.6.13-git9 #4
RIP: 0010:[<ffffffff80418e2f>] <ffffffff80418e2f>{.text.lock.spinlock+22}
RSP: 0018:ffff81013b59bc40  EFLAGS: 00000086
RAX: 0000000000000000 RBX: ffffffff804c2ba0 RCX: 00000000c0000100
RDX: 0000000000000000 RSI: ffff81023b6f30c0 RDI: ffffffff804c2ba8
RBP: 0000000000000282 R08: ffff81013b59a000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000246 R12: ffffffff804c2ba8
R13: ffff81023b6f30c0 R14: ffff81013b59bc50 R15: 00000000000001ff
FS:  00002aaaaaf3b0a0(0000) GS:ffffffff80603d00(0000) knlGS:00000000401c9c60
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007ffffffe048c CR3: 000000013dbdb000 CR4: 00000000000006a0
Process reaim (pid: 21498, threadinfo ffff81013b59a000, task ffff81023b6f30c0)
Stack: 0000000000000282 ffffffff80418ad0 0000000000000001 ffff81023b6f30c0
       ffffffff801313b0 ffff810134a0bc68 ffffffff804c2bb0 ffff81013b59bdd8
       ffff81013b59bdd8 ffff8102b586c678
Call Trace:<ffffffff80418ad0>{__down+160} <ffffffff801313b0>{default_wake_function+0}
       <ffffffff80418779>{__down_failed+53} <ffffffff80418f96>{.text.lock.kernel_lock+25}
       <ffffffff801c2bdc>{reiserfs_setattr+44} <ffffffff80418603>{__down_write+51}
       <ffffffff80199594>{notify_change+340} <ffffffff8017bf61>{do_truncate+65}
       <ffffffff8018d3e4>{may_open+468} <ffffffff8018ecfe>{open_namei+734}
       <ffffffff80417af3>{thread_return+0} <ffffffff8017bb97>{filp_open+39}
       <ffffffff8017b94b>{get_unused_fd+219} <ffffffff8017bc11>{do_sys_open+81}
       <ffffffff8010d91e>{system_call+126}

Code: 80 3f 00 7e f9 e9 90 fd ff ff f3 90 80 3f 00 7e f9 e9 9c fd
console shuts up ...
 NMI Watchdog detected LOCKUP on CPU 6
 ...

 

-Andi


* Re: reiserfs unstable on large systems in 2.6.13-git9
  2005-09-11 22:30 reiserfs unstable on large systems in 2.6.13-git9 Andi Kleen
@ 2005-09-11 23:21 ` Chris Mason
  2005-09-12  8:48   ` Andi Kleen
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Mason @ 2005-09-11 23:21 UTC
  To: Andi Kleen; +Cc: reiserfs-dev, mason, jeffm, linux-fsdevel

On Mon, 12 Sep 2005 00:30:46 +0200
Andi Kleen <ak@suse.de> wrote:

> 
> When I run even relatively minor stress on 8 or 16 core
> Opterons I get deadlocks like this: 

I'm assuming this goes away when acls are turned off?

-chris


* Re: reiserfs unstable on large systems in 2.6.13-git9
  2005-09-11 23:21 ` Chris Mason
@ 2005-09-12  8:48   ` Andi Kleen
  0 siblings, 0 replies; 3+ messages in thread
From: Andi Kleen @ 2005-09-12  8:48 UTC
  To: Chris Mason; +Cc: reiserfs-dev, mason, jeffm, linux-fsdevel

On Monday 12 September 2005 01:21, Chris Mason wrote:
> On Mon, 12 Sep 2005 00:30:46 +0200
> Andi Kleen <ak@suse.de> wrote:
> > When I run even relatively minor stress on 8 or 16 core
> > Opterons I get deadlocks like this:
>
> I'm assuming this goes away when acls are turned off?

No, it doesn't, although the oopses look different now.

BTW the second oops in 
https://bugzilla.novell.com/show_bug.cgi?id=105377
(for 2.6.13) looks similar too.

-Andi

(This is on a 16-core system, but I've seen it on other, smaller systems too.)

NMI Watchdog detected LOCKUP on CPU 11
CPU 11
Modules linked in:
Pid: 20408, comm: ls Not tainted 2.6.13-git9 #4
RIP: 0010:[<ffffffff80418bc9>] <ffffffff80418bc9>{_spin_lock_irqsave+9}
RSP: 0018:ffff81043e259e60  EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffffffff804c2ba0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00007fffff959fc0 RDI: ffffffff804c2ba8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffffffffff R11: 0000000000000246 R12: ffffffff804c2ba8
R13: ffff8102bd6eb540 R14: ffff81043e259e70 R15: 00007fffff95a000
FS:  00002aaaab31d6e0(0000) GS:ffffffff80603d80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaf9c450 CR3: 000000033bfca000 CR4: 00000000000006a0
Process ls (pid: 20408, threadinfo ffff81043e258000, task ffff8102bd6eb540)
Stack: 0000000000000286 ffffffff80418a7b 0000000000000000 ffff8102bd6eb540
       ffffffff801313b0 0000000000000000 0000000000000000 ffff810009d6ba38
       0000000000000000 0000000000000000
Call Trace:<ffffffff80418a7b>{__down+75} <ffffffff801313b0>{default_wake_function+0}
       <ffffffff80418779>{__down_failed+53} <ffffffff80418f96>{.text.lock.kernel_lock+25}
       <ffffffff8013d3c6>{sys_sysctl+38} <ffffffff8010d91e>{system_call+126}

Code: f0 fe 0f 0f 88 5b 02 00 00 48 8b 04 24 48 83 c4 08 c3 66 66
console shuts up ...
NMI Watchdog detected LOCKUP on CPU 10
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU 10
Modules linked in:
Pid: 25726, comm: reaim Not tainted 2.6.13-git9 #4
RIP: 0010:[<ffffffff80418e2f>] <ffffffff80418e2f>{.text.lock.spinlock+22}
RSP: 0018:ffff810133aebc40  EFLAGS: 00000086
RAX: 0000000000000000 RBX: ffffffff804c2ba0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810133aebdd8 RDI: ffffffff804c2ba8
RBP: ffff8103bfa26c78 R08: ffff81043ff72000 R09: 0000000000000000
R10: ffff81007f70c5e0 R11: 0000000000000246 R12: ffffffff804c2ba8
R13: ffff8101bc67c880 R14: ffff810133aebc50 R15: 00000000000001ff
FS:  00002aaaaaf3b0a0(0000) GS:ffffffff80603d00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaae1641e CR3: 000000013fedc000 CR4: 00000000000006a0
Process reaim (pid: 25726, threadinfo ffff810133aea000, task ffff8101bc67c880)
Stack: 0000000000000282 ffffffff80418a7b 0000000000000000 ffff8101bc67c880
       ffffffff801313b0 0000000000000000 0000000000000000 ffff8101bc67c880
       ffff810133aebdd8 ffff8103bfa26c78
Call Trace:<ffffffff80418a7b>{__down+75} <ffffffff801313b0>{default_wake_function+0}
       <ffffffff80418779>{__down_failed+53} <ffffffff80418f96>{.text.lock.kernel_lock+25}
       <ffffffff801c2bdc>{reiserfs_setattr+44} <ffffffff80418603>{__down_write+51}
       <ffffffff80199594>{notify_change+340} <ffffffff8017bf61>{do_truncate+65}
       <ffffffff8018d3e4>{may_open+468} <ffffffff8018ecfe>{open_namei+734}
       <ffffffff80417af3>{thread_return+0} <ffffffff8017bb97>{filp_open+39}
       <ffffffff8017b94b>{get_unused_fd+219} <ffffffff8017bc11>{do_sys_open+81}
       <ffffffff8010d91e>{system_call+126}

Code: 80 3f 00 7e f9 e9 90 fd ff ff f3 90 80 3f 00 7e f9 e9 9c fd
console shuts up ...
Badness in do_unblank_screen at drivers/char/vt.c:2831

Call Trace: <NMI> <ffffffff8028e8db>{do_unblank_screen+75} <ffffffff801203fc>{bust_spinlocks+28}
       <ffffffff8010edb5>{oops_end+21} <ffffffff8010f601>{die_nmi+113}
       <ffffffff80119695>{nmi_watchdog_tick+245} <ffffffff8010f492>{default_do_nmi+130}
       <ffffffff80119585>{do_nmi+69} <ffffffff8010e927>{nmi+127}
       <ffffffff80418e2f>{.text.lock.spinlock+22}  <EOE> <ffffffff80418a7b>{__down+75}
       <ffffffff801313b0>{default_wake_function+0} <ffffffff80418779>{__down_failed+53}
       <ffffffff80418f96>{.text.lock.kernel_lock+25} <ffffffff801c2bdc>{reiserfs_setattr+44}
       <ffffffff80418603>{__down_write+51} <ffffffff80199594>{notify_change+340}
       <ffffffff8017bf61>{do_truncate+65} <ffffffff8018d3e4>{may_open+468}
       <ffffffff8018ecfe>{open_namei+734} <ffffffff80417af3>{thread_return+0}
       <ffffffff8017bb97>{filp_open+39} <ffffffff8017b94b>{get_unused_fd+219}
       <ffffffff8017bc11>{do_sys_open+81} <ffffffff8010d91e>{system_call+126}

<0>Rebooting in 30 seconds..

