* [Bug 2367] New: kernel BUG at inode.c:334!
From: Martin J. Bligh @ 2004-03-25 17:32 UTC
To: linux-kernel; +Cc: bugzilla
http://bugme.osdl.org/show_bug.cgi?id=2367
Summary: kernel BUG at inode.c:334!
Kernel Version: 2.4.25
Status: NEW
Severity: normal
Owner: axboe@suse.de
Submitter: bugzilla@stone.nu
Distribution: Debian GNU/Linux
Hardware Environment: IBM Netfinity 6000 / ICP Vortex / EXP300 Disk cabinet
Software Environment: Vanilla Debian GNU/Linux (woody)
Problem Description:
kernel BUG at inode.c:334!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c014b8bc>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: f7b0606c ebx: 0000000f ecx: ee95c228 edx: f7b06000
esi: ee95c220 edi: f7b06064 ebp: f7b0606c esp: f031df88
ds: 0018 es: 0018 ss: 0018
Process fsstress (pid: 883, stackpage=f031d000)
Stack: f7b06000 f031c000 bffffb94 bffff918 f7b06064 c014bd16 f7b06000 00000000
c0137e4e 00000000 f031c000 400135cc c0137eab 00000000 c0106e77 4012ee48
4012ad74 080505a0 400135cc bffffb94 bffff918 00000024 0000002b 0000002b
Call Trace: [<c014bd16>] [<c0137e4e>] [<c0137eab>] [<c0106e77>]
Code: 0f 0b 4e 01 b5 a5 2a c0 89 d8 0c 08 24 f8 89 86 1c 01 00 00
>> EIP; c014b8bc <sync_inodes_sb+90/250> <=====
>> eax; f7b0606c <_end+3777e8e8/3865b8dc>
>> ecx; ee95c228 <_end+2e5d4aa4/3865b8dc>
>> edx; f7b06000 <_end+3777e87c/3865b8dc>
>> esi; ee95c220 <_end+2e5d4a9c/3865b8dc>
>> edi; f7b06064 <_end+3777e8e0/3865b8dc>
>> ebp; f7b0606c <_end+3777e8e8/3865b8dc>
>> esp; f031df88 <_end+2ff96804/3865b8dc>
Trace; c014bd16 <sync_inodes+36/4c>
Trace; c0137e4e <fsync_dev+3a/80>
Trace; c0137eab <sys_sync+7/10>
Trace; c0106e77 <system_call+33/38>
Code; c014b8bc <sync_inodes_sb+90/250>
00000000 <_EIP>:
Code; c014b8bc <sync_inodes_sb+90/250> <=====
0: 0f 0b ud2a <=====
Code; c014b8be <sync_inodes_sb+92/250>
2: 4e dec %esi
Code; c014b8bf <sync_inodes_sb+93/250>
3: 01 b5 a5 2a c0 89 add %esi,0x89c02aa5(%ebp)
Code; c014b8c5 <sync_inodes_sb+99/250>
9: d8 0c 08 fmuls (%eax,%ecx,1)
Code; c014b8c8 <sync_inodes_sb+9c/250>
c: 24 f8 and $0xf8,%al
Code; c014b8ca <sync_inodes_sb+9e/250>
e: 89 86 1c 01 00 00 mov %eax,0x11c(%esi)
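
A note on reading the Code: line above. On 2.4 i386 kernels built with verbose BUG() reporting, the BUG() macro is understood to emit ud2a followed by a 16-bit line number and a 32-bit pointer to the file-name string. Under that assumption the bytes right after 0f 0b are data rather than instructions, so the dec/add/fmuls lines in the ksymoops disassembly are most likely artifacts. The sketch below decodes that payload from the first eight Code: bytes. The struct and function names are hypothetical, and the layout is an assumption to verify against include/asm-i386/page.h in the tree in use.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed layout of a verbose BUG() site on 2.4 i386:
 *   0f 0b      ud2a
 *   u16 (LE)   __LINE__
 *   u32 (LE)   address of the __FILE__ string
 */
struct bug_site {
    unsigned line;      /* source line recorded by BUG() */
    uint32_t file_ptr;  /* kernel address of the file-name string */
};

/* Hypothetical helper: returns 0 on success, -1 if the bytes are too
 * short or do not start with ud2a. */
static int decode_bug_site(const uint8_t *code, size_t len, struct bug_site *out)
{
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    /* Both fields are little-endian on i386. */
    out->line = (unsigned)code[2] | ((unsigned)code[3] << 8);
    out->file_ptr = (uint32_t)code[4] |
                    ((uint32_t)code[5] << 8) |
                    ((uint32_t)code[6] << 16) |
                    ((uint32_t)code[7] << 24);
    return 0;
}

/* Prints the decoded site in the same spirit as the oops header. */
static void print_bug_site(const struct bug_site *site)
{
    printf("BUG() at line %u, file string at 0x%08x\n",
           site->line, (unsigned)site->file_ptr);
}
```

Feeding it the bytes 0f 0b 4e 01 b5 a5 2a c0 gives line 0x014e = 334, which agrees with the "kernel BUG at inode.c:334!" header, and a file-string address of 0xc02aa5b5.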
Steps to reproduce:
1) Download the latest LTP testsuite
2) Build and install the testsuite
3) Make sure the NFS server daemons are running
4) Export an XFS filesystem to be used for testing, globally, with root allowed.
ex: /mnt/xfs *(sync,rw,no_root_squash)
5) Change directory to where the LTP is installed
6) Change directory to testcases/bin/
7) Execute 'nfs_fsstress.sh' and follow the prompts.
a) Enter your hostname as the server
b) Enter the export filesystem name, i.e. /mnt/xfs
c) Enter "10" for the number of hours to execute.
The oops should occur within 2-3h or so.
This was found when trying to resolve XFS bug:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=309

* Re: [Bug 2367] New: kernel BUG at inode.c:334!
From: Marcelo Tosatti @ 2004-03-26 15:58 UTC
To: Martin J. Bligh; +Cc: linux-kernel, bugzilla, dwmw2, viro, Mika Fischer

On Thu, Mar 25, 2004 at 09:32:22AM -0800, Martin J. Bligh wrote:
> http://bugme.osdl.org/show_bug.cgi?id=2367
>
> Summary: kernel BUG at inode.c:334!
> Kernel Version: 2.4.25
> [...]

This is the second bug report of "BUG at inode.c:334" I have seen.
The other one was reported by Mika Fischer.

It's indeed not valid for I_LOCK or I_FREEING inodes to be on the
superblock dirty list. I cannot see how this is happening.

Martin, Mika, can you please apply the attached patch and rerun the tests?

It might give a bit more of a clue. Thanks.

[-- Attachment #2: inode-sync-debug.patch --]

--- fs/inode.c.orig 2004-03-26 12:30:01.961087616 -0300
+++ fs/inode.c 2004-03-26 12:42:44.992089272 -0300
@@ -330,8 +330,15 @@
 	list_del(&inode->i_list);
 	list_add(&inode->i_list, &inode->i_sb->s_locked_inodes);
 
-	if (inode->i_state & (I_LOCK|I_FREEING))
+	if (inode->i_state & (I_FREEING)) {
+		printk("inode->i_istate:%x \n", inode->i_state);
 		BUG();
+	}
+
+	if (inode->i_state & (I_LOCK)) {
+		printk("inode->i_istate:%x \n", inode->i_state);
+		BUG();
+	}
 
 	/* Set I_LOCK, reset I_DIRTY */
 	dirty = inode->i_state & I_DIRTY;

* 2.4: kernel BUG at inode.c:334!
From: Marcelo Tosatti @ 2004-03-26 18:39 UTC
To: Fredrik Steen; +Cc: linux-kernel, dwmw2

On Fri, Mar 26, 2004 at 04:40:00PM +0100, Fredrik Steen wrote:
> On [040326 16:20] Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> > On Thu, Mar 25, 2004 at 09:32:22AM -0800, Martin J. Bligh wrote:
> > > http://bugme.osdl.org/show_bug.cgi?id=2367
> >
> > This is the second bug report of "BUG at inode.c:334" I have seen.
> > The other one was reported by Mika Fischer.
> >
> > It's indeed not valid for I_LOCK or I_FREEING inodes to be on the
> > superblock dirty list. I cannot see how this is happening.
> >
> > Martin, Mika, can you please apply the attached patch and rerun the tests?
> >
> > It might give a bit more of a clue. Thanks.
> >
> > --- fs/inode.c.orig 2004-03-26 12:30:01.961087616 -0300
> [...]
>
> I ran the patch and got this:
> inode->i_istate:f
> Kernel BUG at inode.c:340!
> [...]

Hi Fredrik,

It seems Trond already figured it out: we are erroneously moving
locked inodes to the dirty list. He attached the following patch in
the bugzilla to fix the problem. Can you please give it a try?

--- linux-2.4.26-up/fs/inode.c.orig 2004-03-19 17:12:46.000000000 -0500
+++ linux-2.4.26-up/fs/inode.c 2004-03-26 13:01:23.000000000 -0500
@@ -319,7 +319,8 @@ void refile_inode(struct inode *inode)
 	if (!inode)
 		return;
 	spin_lock(&inode_lock);
-	__refile_inode(inode);
+	if (!(inode->i_state & I_LOCK))
+		__refile_inode(inode);
 	spin_unlock(&inode_lock);
 }
 
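
To make the reported value easier to read: the debug output "inode->i_istate:f" can be decoded bit by bit, and the guard in the patch above can be modelled in user space. The following is only a sketch. The flag values are assumed to match 2.4's include/linux/fs.h and should be checked against the tree in use; struct model_inode and the model_* and describe_i_state names are hypothetical stand-ins, and the inode_lock spinlock is deliberately left out.

```c
#include <stdio.h>

/* Assumed 2.4 inode state bits (verify against include/linux/fs.h). */
#define I_DIRTY_SYNC     1
#define I_DIRTY_DATASYNC 2
#define I_DIRTY_PAGES    4
#define I_LOCK           8
#define I_FREEING        16
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)

/* Hypothetical user-space stand-in for the part of struct inode used here. */
struct model_inode {
    unsigned long i_state;
    int refiled;    /* how many times the model moved the inode between lists */
};

/* Stand-in for __refile_inode(): only records that a move happened. */
static void model_refile_unguarded(struct model_inode *inode)
{
    inode->refiled++;
}

/* Mirrors the patched refile_inode(): locked inodes are left where they are.
 * The real function also takes inode_lock around this; omitted in the model. */
static void model_refile_inode(struct model_inode *inode)
{
    if (!inode)
        return;
    if (!(inode->i_state & I_LOCK))
        model_refile_unguarded(inode);
}

/* Writes the names of the set bits into buf, each followed by a space. */
static void describe_i_state(unsigned long state, char *buf, size_t len)
{
    snprintf(buf, len, "%s%s%s%s%s",
             (state & I_DIRTY_SYNC)     ? "I_DIRTY_SYNC "     : "",
             (state & I_DIRTY_DATASYNC) ? "I_DIRTY_DATASYNC " : "",
             (state & I_DIRTY_PAGES)    ? "I_DIRTY_PAGES "    : "",
             (state & I_LOCK)           ? "I_LOCK "           : "",
             (state & I_FREEING)        ? "I_FREEING "        : "");
}
```

With these values, 0xf decodes to all three dirty bits plus I_LOCK and no I_FREEING, which fits both the second BUG() in the debug patch firing and the diagnosis that a locked inode was being refiled. In the model, an inode with i_state 0xf is skipped by model_refile_inode(), while one with only I_DIRTY set is still moved.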

* Re: 2.4: kernel BUG at inode.c:334!
From: Jaco Kroon @ 2004-03-30 14:51 UTC
To: Marcelo Tosatti; +Cc: Fredrik Steen, linux-kernel, dwmw2

Hello

We were having similar problems on two oldish machines (about 5 years old, old prolines) and since the dpt_i2o driver isn't ported officially yet we were stuck with 2.4 for some time before we decided to just stuff it and switch to 2.6 using some patch for dpt_i2o. Now we are still having the same problem, but less regularly - it seems to die shortly after we run an addusers script that causes intensive io on /, which is using ext3. Unfortunately the stack traces don't get sent to a log file (how can I quickly rig this?) and both machines are production machines, i.e. when one goes down we run for all we are worth to hit that reset button.

I do however have a small machine at home that seems to be giving similar problems, but I'm not sure. I can't get stack traces in this case at all (APM kicks in and I can't get it back out after it crashes). I've now recompiled with full kernel debugging (everything under kernel hacking) and the only thing I get in the kernel logs are ??? suppressed messages from the kernel. It still dies. It also has periods where it just slows down to a stop (doesn't respond to pings for up to a minute at a time). It usually dies whilst compiling (heavy disk io).

One of the production machines and my machine at home currently run 2.6.4 and the other runs 2.4.25.

So this seems to be a more general problem (my co-worker suspects ext3 - since this bug report started with xfs that might not be the case). The only pattern we are seeing between all of these is that they serve as nfs servers (but mine at home still dies even when not serving nfs - it still is an nfs client when it dies though), are not the newest and greatest machines, and all of them use ext3 as their root file system. Oh, also, it usually happens shortly after, or during, intensive disk io - which matches up with what Mika mentioned. I've also tried disabling IO-APIC (which we're not even sure is supported, but APIC is), as well as pre-emption.

We don't suspect nfs on the production machines anymore since we managed to thrash the nfs exported dir for about an hour (keeping the server at load average 8.5) which makes use of reiserfs - we might've been lucky though. In almost all the cases these exports are relatively big though, and I noticed there is a problem there as well (we don't get the magical 1000 number quite yet).

Is there anything else I should/can take a look at? Is there any other way in which I can help find the problem? If I can just get somewhere to start ... (The patch below doesn't apply to 2.6 as far as I can see).

Apologies for the essay.

Jaco

Marcelo Tosatti wrote:
> On Fri, Mar 26, 2004 at 04:40:00PM +0100, Fredrik Steen wrote:
>> On [040326 16:20] Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>>> On Thu, Mar 25, 2004 at 09:32:22AM -0800, Martin J. Bligh wrote:
>>> > http://bugme.osdl.org/show_bug.cgi?id=2367
>>>
>>> This is the second bug report of "BUG at inode.c:334" I have seen.
>>> The other one was reported by Mika Fischer.
>>>
>>> It's indeed not valid for I_LOCK or I_FREEING inodes to be on the
>>> superblock dirty list. I cannot see how this is happening.
>>>
>>> Martin, Mika, can you please apply the attached patch and rerun the tests?
>>>
>>> It might give a bit more of a clue. Thanks.
>>>
>>> --- fs/inode.c.orig 2004-03-26 12:30:01.961087616 -0300
>> [...]
>>
>> I ran the patch and got this:
>> inode->i_istate:f
>> Kernel BUG at inode.c:340!
>> [...]
>
> Hi Fredrik,
>
> It seems Trond already figured it out: we are erroneously moving
> locked inodes to the dirty list. He attached the following patch in
> the bugzilla to fix the problem. Can you please give it a try?
>
> --- linux-2.4.26-up/fs/inode.c.orig 2004-03-19 17:12:46.000000000 -0500
> +++ linux-2.4.26-up/fs/inode.c 2004-03-26 13:01:23.000000000 -0500
> @@ -319,7 +319,8 @@ void refile_inode(struct inode *inode)
>  	if (!inode)
>  		return;
>  	spin_lock(&inode_lock);
> -	__refile_inode(inode);
> +	if (!(inode->i_state & I_LOCK))
> +		__refile_inode(inode);
>  	spin_unlock(&inode_lock);
>  }

* Re: 2.4: kernel BUG at inode.c:334!
From: Marcelo Tosatti @ 2004-03-30 16:51 UTC
To: Jaco Kroon; +Cc: Fredrik Steen, linux-kernel, dwmw2

On Tue, Mar 30, 2004 at 04:51:57PM +0200, Jaco Kroon wrote:
> So this seems to be a more general problem (my co-worker suspects ext3 -
> since this bug report started with xfs that might not be the case). The
> only pattern we are seeing between all of these is that they serve as
> nfs servers (but mine at home still dies even when not serving
> nfs - it still is an nfs client when it dies though), are not the newest
> and greatest machines, and all of them use ext3 as their root file
> system. Oh, also, it usually happens shortly after, or during, intensive
> disk io - which matches up with what Mika mentioned. I've also tried
> disabling IO-APIC (which we're not even sure is supported, but APIC is),
> as well as pre-emption.
>
> We don't suspect nfs on the production machines anymore since we managed
> to thrash the nfs exported dir for about an hour (keeping the server at
> load average 8.5) which makes use of reiserfs - we might've been lucky
> though. In almost all the cases these exports are relatively big
> though, and I noticed there is a problem there as well (we don't get the
> magical 1000 number quite yet).
>
> Is there anything else I should/can take a look at? Is there any other
> way in which I can help find the problem? If I can just get somewhere
> to start ... (The patch below doesn't apply to 2.6 as far as I can see).
>
> Apologies for the essay.

Jaco,

The "kernel BUG at inode.c:340" problem is fixed in 2.4.26-rc1. If that
was what you were hitting, can you try that on your servers?

About the other crashes, it's hard to help without more information.

Try attaching a serial cable to the box for a serial console.