public inbox for linux-kernel@vger.kernel.org
* OOPS caused by ext2 changes
@ 2002-04-16  1:16 Dave Hansen
  2002-04-16  3:27 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Hansen @ 2002-04-16  1:16 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Andrew Morton, linux-kernel

Andrew Morton and I discussed this earlier.  I have some more information 
now.  The problem: running "dbench 64" on a small (~120 MB) partition with 
a 1k block size produces Oopses.

This changeset:
http://linus.bkbits.net:8080/linux-2.5/patch@1.248.2.6?nav=index.html|ChangeSet|cset@1.248.2.6
is the culprit.  Without it applied, none of this happens.

The errors:
VFS: brelse: Trying to free free buffer
EXT2-fs error (device sd(8,6)): ext2_free_blocks: bit already cleared for block 101078
EXT2-fs error (device sd(8,6)): ext2_free_blocks: bit already cleared for block 101077

The kernel then follows these up with a bunch of Oopses.  I've included 
two of them below.  It looks to me like the buffer_head chain is getting 
changed unexpectedly.

For instance, this loop:
         do {
                 bh->b_end_io = NULL;
                 tail = bh;
                 bh = bh->b_this_page;
         } while (bh);
Oopses when bh ends up pointing at something invalid.
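
To illustrate why that loop is fragile, here is a minimal userspace sketch
(not the kernel code).  struct mock_bh and link_chain() are hypothetical
stand-ins that keep only the two fields the loop touches.  The last step,
which links the tail back to the head, is an assumption inferred from the
"mov %ecx,0x28(%edx)" at the end of the first disassembly further down.
The point is that the loop can only test for NULL: if any b_this_page
holds junk, the next iteration dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the fields the loop uses. */
struct mock_bh {
    void (*b_end_io)(struct mock_bh *bh);
    struct mock_bh *b_this_page;   /* next buffer on the same page */
};

/*
 * Walk a NULL-terminated chain, clear b_end_io on every entry, then
 * link the tail back to the head.  Returns the number of buffers visited.
 * Nothing here can tell a valid pointer from a corrupted one; only a
 * NULL link stops the walk.
 */
static int link_chain(struct mock_bh *head)
{
    struct mock_bh *bh = head, *tail;
    int n = 0;

    assert(head != NULL);          /* the loop body runs at least once */
    do {
        bh->b_end_io = NULL;
        tail = bh;
        bh = bh->b_this_page;
        n++;
    } while (bh);
    tail->b_this_page = head;      /* close the ring (assumed final step) */
    return n;
}
```

Calling link_chain() on a well-formed three-entry chain visits three
buffers and leaves the last entry pointing back at the first.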

I can reproduce these pretty easily.  They still happen in 2.5.8.

  Oops: 0002
CPU:    0
EIP:    0010:[<c013ef60>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 02000820   ebx: c4aadbe8   ecx: 02000820   edx: 00000000
esi: 00000400   edi: 00000000   ebp: 00000400   esp: f5ed1e9c
ds: 0018   es: 0018   ss: 0018
Stack: c4aadbe8 c013f271 c4aadbe8 00000400 fe0a7000 f5ed1ec4 c019e335 00000400
        00000000 f5ed1ec4 f54e0e94 c012c0eb f54e0e8c 00000000 c4aadbe8 c4aadbe8
        c4aadbe8 f54e0df0 c4aadbe8 00000400 c013fc42 f54e0df0 c4aadbe8 00000000
Call Trace: [<c013f271>] [<c019e335>] [<c012c0eb>] [<c013fc42>] [<c0176440>]
    [<c01752cc>] [<c0176440>] [<c0177cba>] [<c01486f0>] [<c01487ae>] [<c010728b>]
Code: c7 40 38 00 00 00 00 89 c2 8b 40 28 85 c0 75 f0 89 4a 28 b8

 >>EIP; c013ef60 <create_empty_buffers+30/60>   <=====
Trace; c013f270 <__block_prepare_write+80/300>
Trace; c019e334 <radix_tree_insert+14/30>
Trace; c012c0ea <__add_to_page_cache+1a/70>
Trace; c013fc42 <block_prepare_write+22/70>
Trace; c0176440 <ext2_get_block+0/3c0>
Trace; c01752cc <ext2_make_empty+3c/110>
Trace; c0176440 <ext2_get_block+0/3c0>
Trace; c0177cba <ext2_mkdir+8a/100>
Trace; c01486f0 <vfs_mkdir+60/90>
Trace; c01487ae <sys_mkdir+8e/d0>
Trace; c010728a <syscall_call+6/a>
Code;  c013ef60 <create_empty_buffers+30/60>
00000000 <_EIP>:
Code;  c013ef60 <create_empty_buffers+30/60>   <=====
    0:   c7 40 38 00 00 00 00      movl   $0x0,0x38(%eax)   <=====
Code;  c013ef66 <create_empty_buffers+36/60>
    7:   89 c2                     mov    %eax,%edx
Code;  c013ef68 <create_empty_buffers+38/60>
    9:   8b 40 28                  mov    0x28(%eax),%eax
Code;  c013ef6c <create_empty_buffers+3c/60>
    c:   85 c0                     test   %eax,%eax
Code;  c013ef6e <create_empty_buffers+3e/60>
    e:   75 f0                     jne    0 <_EIP>
Code;  c013ef70 <create_empty_buffers+40/60>
   10:   89 4a 28                  mov    %ecx,0x28(%edx)
Code;  c013ef72 <create_empty_buffers+42/60>
   13:   b8 00 00 00 00            mov    $0x0,%eax


EFLAGS: 00010202
eax: 00000008   ebx: f6349e50   ecx: f6ee6c00   edx: 00000008
esi: f4b1e800   edi: 00000003   ebp: f6349e68   esp: f6349e04
ds: 0018   es: 0018   ss: 0018
Stack: c017651d 00000008 f6349e1c 00000003 00000003 00011a2a 00000000 c718fec0
        000000f0 00000202 000000f0 c0132079 c718fec0 00000246 f6349e4c 00000000
        f4f7859c 00000000 f4b1e740 f4f784b4 00000000 00000000 00000400 c013e179
Call Trace: [<c017651d>] [<c0132079>] [<c013e179>] [<c013e6d1>] [<c013dd46>]
    [<c013e904>] [<c013efa2>] [<c0176410>] [<c012eb90>] [<c0176410>] [<c012cdc0>]
    [<c013be26>] [<c013b8a0>] [<c013bbce>] [<c0108cf7>]
Code: 8b 42 14 85 c0 74 05 f0 ff 4a 14 c3 c7 44 24 04 a0 8b 26 c0

 >>EIP; c013dea4 <__brelse+4/20>   <=====
Trace; c017651c <ext2_get_block+10c/470>
Trace; c0132078 <kmem_cache_alloc+78/120>
Trace; c013e178 <create_buffers+68/100>
Trace; c013e6d0 <__block_prepare_write+110/290>
Trace; c013dd46 <balance_dirty+6/50>
Trace; c013e904 <__block_commit_write+b4/e0>
Trace; c013efa2 <block_prepare_write+22/80>
Trace; c0176410 <ext2_get_block+0/470>
Trace; c012eb90 <generic_file_write+500/750>
Trace; c0176410 <ext2_get_block+0/470>
Trace; c012cdc0 <file_read_actor+0/100>
Trace; c013be26 <sys_write+96/d0>
Trace; c013b8a0 <generic_file_llseek+0/d0>
Trace; c013bbce <sys_lseek+6e/80>
Trace; c0108cf6 <syscall_call+6/a>
Code;  c013dea4 <__brelse+4/20>
00000000 <_EIP>:
Code;  c013dea4 <__brelse+4/20>   <=====
    0:   8b 42 14                  mov    0x14(%edx),%eax   <=====
Code;  c013dea6 <__brelse+6/20>
    3:   85 c0                     test   %eax,%eax
Code;  c013dea8 <__brelse+8/20>
    5:   74 05                     je     c <_EIP+0xc> c013deb0 <__brelse+10/20>
Code;  c013deaa <__brelse+a/20>
    7:   f0 ff 4a 14               lock decl 0x14(%edx)
Code;  c013deae <__brelse+e/20>
    b:   c3                        ret
Code;  c013deb0 <__brelse+10/20>
    c:   c7 44 24 04 a0 8b 26      movl   $0xc0268ba0,0x4(%esp,1)
Code;  c013deb6 <__brelse+16/20>
   13:   c0



-- 
Dave Hansen
haveblue@us.ibm.com



* Re: OOPS caused by ext2 changes
  2002-04-16  1:16 OOPS caused by ext2 changes Dave Hansen
@ 2002-04-16  3:27 ` Andrew Morton
  2002-04-16  5:31   ` David C. Hansen
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2002-04-16  3:27 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Alexander Viro, linux-kernel

Dave Hansen wrote:
> 
> Andrew Morton and I discussed this earlier.  I have some more information
> now.  The problem: "dbench 64" run on a small (~120meg) partition with
> 1k block sizes produces Oopses.
> 
> This changeset:
> http://linus.bkbits.net:8080/linux-2.5/patch@1.248.2.6?nav=index.html|ChangeSet|cset@1.248.2.6
> is the culprit.  Without it applied, none of this happens.
> 


It's vaguely surprising that that chunk is associated with the
problem.

However it seems that there's potential for a buffer reference
leak in ext2_get_branch:

        while (--depth) {
                bh = sb_bread(sb, le32_to_cpu(p->key));
                if (!bh)
                        goto failure;
                /* Reader: pointers */
                if (!verify_chain(chain, p))
                        goto changed;
                add_chain(++p, bh, (u32*)bh->b_data + *++offsets);
                /* Reader: end */
                if (!p->key)
                        goto no_block;
        }
        return NULL;

changed:
        *err = -EAGAIN;
        goto no_block;
failure:
        *err = -EIO;
no_block:
        return p;
}


See, sb_bread() bumps b_count, but on the `goto changed;'
branch we lose track of that buffer.

b_count is only 16 bits, so it's conceivable that the
count wraps to zero, and that is fatal.
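
The wrap-around is easy to demonstrate in userspace.  This is a sketch
only: it takes the 16-bit width stated above at face value and models the
count as a plain uint16_t.  leak_refs() is a hypothetical helper, not a
kernel function.

```c
#include <stdint.h>

/*
 * Model a 16-bit reference count that is incremented `leaks` times
 * without a matching decrement.  Returns the resulting count.
 */
static uint16_t leak_refs(uint16_t count, unsigned long leaks)
{
    while (leaks--)
        count++;           /* stored back into 16 bits: wraps modulo 65536 */
    return count;
}
```

Starting from a count of 1, leaking 65535 references brings the count
back to 0, at which point the buffer looks free while it is still in use.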

It would be interesting to replace that `goto changed;' 
with { __brelse(bh); goto changed; }.  Plus maybe a
debug printk to see if we are indeed hitting that path.

I don't think this is the bug actually - if we were
leaking bh refs that easily we'd get `busy buffer'
whines at unmount.  But it merits investigation.
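
A userspace sketch of the pattern being proposed: every early exit taken
after the reference has been acquired must drop it first.  The names here
(mock_buf, get_buf, put_buf, walk_step) are hypothetical stand-ins for the
buffer head, sb_bread() and __brelse(), and the `fix' flag exists only so
the two behaviours can be compared.  The fprintf() plays the role of the
debug printk.

```c
#include <errno.h>
#include <stdio.h>

struct mock_buf { int count; };               /* stand-in for b_count */

static struct mock_buf *get_buf(struct mock_buf *b) { b->count++; return b; }
static void put_buf(struct mock_buf *b)             { b->count--; }

/*
 * One iteration of the walk: take a reference, then bail out with
 * -EAGAIN if the chain changed underneath us.  With fix != 0 the
 * reference is dropped before leaving; with fix == 0 it is leaked.
 */
static int walk_step(struct mock_buf *b, int chain_changed, int fix)
{
    struct mock_buf *bh = get_buf(b);

    if (chain_changed) {
        fprintf(stderr, "walk_step: chain changed, count=%d\n", bh->count);
        if (fix)
            put_buf(bh);
        return -EAGAIN;
    }
    return 0;   /* success: the caller now owns the reference */
}
```

On the leaky path the count stays elevated after the error return; on the
fixed path it goes back to what it was before the call.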

-


* Re: OOPS caused by ext2 changes
  2002-04-16  3:27 ` Andrew Morton
@ 2002-04-16  5:31   ` David C. Hansen
  0 siblings, 0 replies; 3+ messages in thread
From: David C. Hansen @ 2002-04-16  5:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexander Viro, linux-kernel

Andrew Morton wrote:

>Dave Hansen wrote:
>
>>Andrew Morton and I discussed this earlier.  I have some more information
>>now.  The problem: "dbench 64" run on a small (~120meg) partition with
>>1k block sizes produces Oopses.
>>
>>This changeset:
>>http://linus.bkbits.net:8080/linux-2.5/patch@1.248.2.6?nav=index.html|ChangeSet|cset@1.248.2.6
>>is the culprit.  Without it applied, none of this happens.
>>
>However it seems that there's potential for a buffer reference
>leak in ext2_get_branch:
>
>See, sb_bread() bumps b_count, but on the `goto changed;'
>branch we lose track of that buffer.
>
>b_count is only 16 bits, so it's conceivable that the
>count wraps to zero, and that is fatal.
>
>It would be interesting to replace that `goto changed;' 
>with { __brelse(bh); goto changed; }.  Plus maybe a
>debug printk to see if we are indeed hitting that path.
>
Well, I'm a little bit clearer about what's going on now.  I noticed 
that verify_chain() is inline, and that is what is actually Oopsing.  
Any idea how we're getting 8 into edx?

edx: 00000008
Code;  c013dea4 <__brelse+4/20>   <=====
   0:   8b 42 14                  mov    0x14(%edx),%eax   <=====

Is junk getting into the Indirect array?
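
For reference, the address that instruction touches can be worked out
directly.  A sketch, assuming (from the disassembly, not from the headers)
that 0x14 is the offset of b_count within struct buffer_head, so that
edx = 8 means __brelse() was handed a bh pointer whose value is 8.
fault_address() and B_COUNT_OFFSET are hypothetical names.

```c
#include <stdint.h>

/* Displacement taken from `mov 0x14(%edx),%eax'; assumed to be b_count. */
#define B_COUNT_OFFSET 0x14u

/* Effective address of a base+displacement load on i386 (32-bit arithmetic). */
static uint32_t fault_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}
```

With base 8 and displacement 0x14 this gives 0x1c, an address in the first
page of memory, which is consistent with a NULL-ish pointer dereference
rather than a valid buffer_head.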

