public inbox for linux-kernel@vger.kernel.org
* latest -git: kernel BUG at fs/xfs/support/debug.c:54!
@ 2008-07-17 17:46 Vegard Nossum
  2008-07-17 19:05 ` Eric Sandeen
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-07-17 17:46 UTC
  To: Tim Shimmin, xfs; +Cc: linux-kernel, Johannes Weiner

Hi,

I got this with an intentionally corrupted filesystem:

Filesystem "loop1": Disabling barriers, not supported by the underlying device
XFS mounting filesystem loop1
Ending clean XFS mount for filesystem: loop1
Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
------------[ cut here ]------------
kernel BUG at fs/xfs/support/debug.c:54!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 12849, comm: grep Not tainted (2.6.26-03414-g33af79d #43)
EIP: 0060:[<c0386d89>] EFLAGS: 00210246 CPU: 1
EIP is at cmn_err+0x99/0xa0
EAX: ed75e000 EBX: c089047c ECX: ed75e000 EDX: 00000000
ESI: 00000000 EDI: 00200286 EBP: ed75fbbc ESP: ed75fba4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process grep (pid: 12849, ti=ed75e000 task=f1ee9fe0 task.ti=ed75e000)
Stack: c0855099 c0846a0d c0d0e8c0 00004946 df92a8f0 0000001e ed75fc2c c03540d4
       00000000 c089047c ed75fbfc 000025d0 00000000 0000001e 00004946 ed75fc54
       00000000 df92a8f0 df92abd8 000025d0 00000000 00000000 706f6f6c 00000031
Call Trace:
 [<c03540d4>] ? xfs_imap_to_bp+0x164/0x250
 [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c0354250>] ? xfs_itobp+0x90/0x180
 [<c0356e51>] ? xfs_iread+0xa1/0x280
 [<c034f216>] ? xfs_iget_core+0x1c6/0x6e0
 [<c034f82a>] ? xfs_iget+0xfa/0x170
 [<c0377546>] ? xfs_lookup+0xb6/0xc0
 [<c0382fba>] ? xfs_vn_lookup+0x4a/0x90
 [<c01ac110>] ? do_lookup+0x160/0x1b0
 [<c01adc38>] ? __link_path_walk+0x208/0xdc0
 [<c014f916>] ? up_read+0x16/0x30
 [<c034eabe>] ? xfs_iunlock+0xee/0x110
 [<c0382bdf>] ? xfs_vn_follow_link+0x3f/0x80
 [<c01ae327>] ? __link_path_walk+0x8f7/0xdc0
 [<c015906b>] ? trace_hardirqs_off+0xb/0x10
 [<c01ae844>] ? path_walk+0x54/0xb0
 [<c01aea45>] ? do_path_lookup+0x85/0x230
 [<c01af7a8>] ? __user_walk_fd+0x38/0x50
 [<c01a7fb1>] ? vfs_stat_fd+0x21/0x50
 [<c01590cd>] ? put_lock_stats+0xd/0x30
 [<c01bc81d>] ? mntput_no_expire+0x1d/0x110
 [<c01a8081>] ? vfs_stat+0x11/0x20
 [<c01a80a4>] ? sys_stat64+0x14/0x30
 [<c01a5a8f>] ? fput+0x1f/0x30
 [<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: 04 e8 00 eb 3d 00 89 fa b8 40 4a 92 c0 e8 70 d0 3d 00 85 f6 74
15 83 c4 0c 5b 5e 5f 5d c3 8d 74 26 00 c6 81 c0 e8 d0 c0 00 eb bc <0f>
0b eb fe 90 90 90 55 b9 04 00 00 00 89 e5 57 89 c7 31 c0 f3
EIP: [<c0386d89>] cmn_err+0x99/0xa0 SS:ESP 0068:ed75fba4
Kernel panic - not syncing: Fatal exception
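
The "magic=4946" in the log above can be compared with the expected XFS
on-disk inode magic, 0x494e (ASCII "IN"). The following is only a small
sketch, assuming the logged value is hexadecimal (the stack dump also
shows 00004946); the helper name describe_magic is made up for this
illustration and is not part of XFS.

```python
# Expected XFS on-disk inode magic: the ASCII bytes "IN".
XFS_DINODE_MAGIC = 0x494E


def describe_magic(found, expected=XFS_DINODE_MAGIC):
    """Compare a 16-bit magic value read from disk with the expected one.

    Hypothetical helper for illustration only.
    """
    diff = found ^ expected
    return {
        "found_ascii": found.to_bytes(2, "big").decode("ascii", errors="replace"),
        "expected_ascii": expected.to_bytes(2, "big").decode("ascii", errors="replace"),
        "xor": diff,
        # Number of bit positions in which the two values differ.
        "bits_flipped": bin(diff).count("1"),
    }


# 0x4946 is the value from the "bad inode magic/vsn" log line.
info = describe_magic(0x4946)
print(info)
```

Under that assumption the two values differ in a single bit, which is
consistent with a filesystem image that had individual bits flipped.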


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 17:46 latest -git: kernel BUG at fs/xfs/support/debug.c:54! Vegard Nossum
@ 2008-07-17 19:05 ` Eric Sandeen
  2008-07-17 19:18   ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2008-07-17 19:05 UTC
  To: Vegard Nossum; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

Vegard Nossum wrote:
> Hi,
> 
> I got this with an intentionally corrupted filesystem:
> 
> Filesystem "loop1": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem loop1
> Ending clean XFS mount for filesystem: loop1
> Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/support/debug.c:54!

running a debug XFS will turn all sorts of tests into panics that would
not otherwise crash and burn that way.

I think normally when testing intentionally corrupted filesystems, you
expect corruptions to be handled gracefully.  But in xfs's flavor of
debug, I'm not sure it's quite as true.

Perhaps the debug variant should not BUG() on disk corruption either,
but it'd probably be more relevant to test this on a non-debug build.

Does this corrupted fs survive better on non-debug xfs?

-Eric




* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 19:05 ` Eric Sandeen
@ 2008-07-17 19:18   ` Vegard Nossum
  2008-07-17 19:29     ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:18 UTC
  To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:05 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> Hi,
>>
>> I got this with an intentionally corrupted filesystem:
>>
>> Filesystem "loop1": Disabling barriers, not supported by the underlying device
>> XFS mounting filesystem loop1
>> Ending clean XFS mount for filesystem: loop1
>> Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
>> ------------[ cut here ]------------
>> kernel BUG at fs/xfs/support/debug.c:54!
>
> running a debug XFS will turn all sorts of tests into panics that would
> not otherwise crash and burn that way.
>
> I think normally when testing intentionally corrupted filesystems, you
> expect corruptions to be handled gracefully.  But in xfs's flavor of
> debug, I'm not sure it's quite as true.
>
> Perhaps the debug variant should not BUG() on disk corruption either,
> but it'd probably be more relevant to test this on a non-debug build.
>
> Does this corrupted fs survive better on non-debug xfs?

Thanks, you are right. I have adjusted my configuration, but I am
still able to produce this:

BUG: unable to handle kernel paging request at b62a66e0
IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 4174, comm: rm Not tainted (2.6.26-03414-g33af79d #44)
EIP: 0060:[<c030ef88>] EFLAGS: 00210296 CPU: 0
EIP is at xfs_alloc_fix_freelist+0x28/0x490
EAX: f63e8830 EBX: f490a000 ECX: f48e8000 EDX: b62a66e0
ESI: 00000000 EDI: f48e9d8c EBP: f48e9d6c ESP: f48e9ccc
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 4174, ti=f48e8000 task=f63d5fa0 task.ti=f48e8000)
Stack: 00000000 f63e8ac0 f63d5fa0 f63d64cc 00000002 00000000 f63d5fa0 f63e8830
       b62a66e0 f490a000 f73a3e10 c0b57c78 f49f2be0 c0ce8048 f49f24c0 00200046
       00000002 f48e9d20 c015908e f48e9d20 c01590cd f48e9d50 00200246 f63d6010
Call Trace:
 [<c015908e>] ? get_lock_stats+0x1e/0x50
 [<c01590cd>] ? put_lock_stats+0xd/0x30
 [<c030f453>] ? xfs_free_extent+0x63/0xd0
 [<c074955b>] ? down_read+0x5b/0x80
 [<c030f470>] ? xfs_free_extent+0x80/0xd0
 [<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
 [<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
 [<c03201ca>] ? xfs_bmap_finish+0x13a/0x180
 [<c03428d8>] ? xfs_itruncate_finish+0x1b8/0x400
 [<c035fa2b>] ? xfs_inactive+0x3bb/0x4e0
 [<c036b87a>] ? xfs_fs_clear_inode+0x8a/0xe0
 [<c01b962c>] ? clear_inode+0x7c/0x160
 [<c01b9c2e>] ? generic_delete_inode+0x10e/0x120
 [<c01b9d67>] ? generic_drop_inode+0x127/0x180
 [<c01b8be7>] ? iput+0x47/0x50
 [<c01af1bc>] ? do_unlinkat+0xec/0x170
 [<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c0104174>] ? restore_nocheck_notrace+0x0/0xe
 [<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c01af383>] ? sys_unlinkat+0x23/0x50
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: 8d 76 00 55 89 e5 57 89 c7 56 53 81 ec 94 00 00 00 8b 1f 89 95
70 ff ff ff 8b 57 0c 8b 40 04 89 5d 84 89 55 80 89 85 7c ff ff ff <80>
3a 00 0f 84 e7 02 00 00 c7 45 f0 00 00 00 00 8b 55 80 80 7a
EIP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490 SS:ESP 0068:f48e9ccc
Kernel panic - not syncing: Fatal exception

(Full log at http://folk.uio.no/vegardno/linux/log-1216322418.txt has
some more details.)


Vegard



* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 19:18   ` Vegard Nossum
@ 2008-07-17 19:29     ` Vegard Nossum
  2008-07-17 22:40       ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:29 UTC
  To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> Thanks, you are right. I have adjusted my configuration, but I am
> still able to produce this:
>
> BUG: unable to handle kernel paging request at b62a66e0
> IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490

FWIW, this is fs/xfs/xfs_alloc.c:1817:

        if (!pag->pagf_init) {


Vegard



* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 19:29     ` Vegard Nossum
@ 2008-07-17 22:40       ` Dave Chinner
  2008-07-19 13:16         ` Vegard Nossum
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2008-07-17 22:40 UTC
  To: Vegard Nossum
  Cc: Eric Sandeen, Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > Thanks, you are right. I have adjusted my configuration, but I am
> > still able to produce this:
> >
> > BUG: unable to handle kernel paging request at b62a66e0
> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
> 
> FWIW, this is fs/xfs/xfs_alloc.c:1817:
> 
>         if (!pag->pagf_init) {

Which kind of implies that we've got a bogus fsbno
that we're using as the basis of allocation.....

What is the corruption you are inducing? Can you produce
a xfs_metadump image of the filesystem and put it up somewhere
that we can access it?

I suspect that we are not validating the block numbers coming
out of the various btrees as landing inside the filesystem....
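
The kind of check meant here could be sketched as follows. This is a
userspace illustration in Python, not kernel code: the Geometry class
and the function names are hypothetical, and it assumes the usual XFS
encoding in which a filesystem block number packs the allocation group
number above agblklog bits of AG-relative block number. It also treats
every AG as full-sized, which is a simplification (the last AG may be
shorter).

```python
from dataclasses import dataclass


@dataclass
class Geometry:
    """Hypothetical subset of superblock geometry used for the check."""
    agcount: int   # number of allocation groups
    agblocks: int  # blocks per allocation group
    agblklog: int  # log2 of agblocks, rounded up


def split_fsbno(geo, fsbno):
    """Split a filesystem block number into (agno, agbno)."""
    agno = fsbno >> geo.agblklog
    agbno = fsbno & ((1 << geo.agblklog) - 1)
    return agno, agbno


def fsbno_is_valid(geo, fsbno, length=1):
    """Return True if the extent [fsbno, fsbno + length) lies inside one AG."""
    agno, agbno = split_fsbno(geo, fsbno)
    if agno >= geo.agcount:
        # An out-of-range AG number would index past the per-AG array.
        return False
    if agbno >= geo.agblocks:
        return False
    return agbno + length <= geo.agblocks


geo = Geometry(agcount=4, agblocks=1000, agblklog=10)
print(fsbno_is_valid(geo, (1 << 10) | 5))  # block 5 of AG 1
print(fsbno_is_valid(geo, (7 << 10) | 5))  # AG 7 does not exist
```

If a block number read from a corrupted btree fails such a check, the
caller could report corruption instead of using it to look up per-AG
state. Whether that is what happens in the oops above is an assumption.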

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 22:40       ` Dave Chinner
@ 2008-07-19 13:16         ` Vegard Nossum
  0 siblings, 0 replies; 6+ messages in thread
From: Vegard Nossum @ 2008-07-19 13:16 UTC
  To: Vegard Nossum, Eric Sandeen, Tim Shimmin, xfs, linux-kernel,
	Johannes Weiner

On Fri, Jul 18, 2008 at 12:40 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
>> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>> > Thanks, you are right. I have adjusted my configuration, but I am
>> > still able to produce this:
>> >
>> > BUG: unable to handle kernel paging request at b62a66e0
>> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
>>
>> FWIW, this is fs/xfs/xfs_alloc.c:1817:
>>
>>         if (!pag->pagf_init) {
>
> Which kind of implies that we've got a bogus fsbno
> that we're using as the basis of allocation.....
>
> What is the corruption you are inducing? Can you produce
> a xfs_metadump image of the filesystem and put it up somewhere
> that we can access it?
>
> I suspect that we are not validating the block numbers coming
> out of the various btrees as landing inside the filesystem....

The method of corruption is quite crude (but efficient); just flip a
number of bits at random before mounting.
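
A minimal sketch of that kind of bit flipping, for illustration only.
This is not the actual tool used here; the function name, the seed
parameter and the in-memory buffer are all assumptions made for the
example.

```python
import random


def flip_random_bits(data, count, seed=None):
    """Flip `count` distinct, randomly chosen bits in a bytearray, in place.

    Returns the list of flipped bit offsets so a run can be recorded.
    Hypothetical helper; a fixed `seed` makes the corruption repeatable.
    """
    rng = random.Random(seed)
    # Sampling without replacement ensures no bit is flipped back again.
    offsets = rng.sample(range(len(data) * 8), count)
    for bit in offsets:
        data[bit // 8] ^= 1 << (bit % 8)
    return offsets


# Demonstration on an in-memory buffer standing in for a disk image.
image = bytearray(4096)
flipped = flip_random_bits(image, 8, seed=1234)
print(sorted(flipped))
```

To corrupt a real image one would read the (unmounted) image file into a
bytearray, call the function, and write the result back before mounting.
Recording the seed or the returned offsets makes a crash reproducible.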

I got a different crash (NULL pointer) now, and I have a reproducible
case with a full disk image (it's only about 11M compressed, no
private/sensitive data). See
http://userweb.kernel.org/~vegard/bugs/20080719-xfs/

The way to reproduce:

    mount -o loop disk.xfs_idestroy_fork.bin /mnt
    rm -rf /mnt/*

And it should give something like this:

BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3966, comm: rm Not tainted (2.6.26-03421-g253a722 #49)
EIP: 0060:[<c0340ebf>] EFLAGS: 00210202 CPU: 1
EIP is at xfs_idestroy_fork+0x1f/0xe0
EAX: f5402a00 EBX: 00000000 ECX: f5ff0da0 EDX: 00000001
ESI: 00000001 EDI: f5402a00 EBP: f5fe5e7c ESP: f5fe5e70
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 3966, ti=f5fe4000 task=f5f1cfb0 task.ti=f5fe4000)
Stack: f5402a00 00000000 f5fe5ecc f5fe5ea4 c035f729 00000000 00000004 00000002
       f79e4180 f5ff0cd0 f5402a00 f5ff0520 00000001 f5fe5ee0 c035f91e 00000000
       00000000 00000000 00000001 f79e4180 f5f1cfb0 00000000 c01590ae f5ff0a40
Call Trace:
 [<c035f729>] ? xfs_inactive_attrs+0xe9/0x100
 [<c035f91e>] ? xfs_inactive+0x1de/0x4e0
 [<c01590ae>] ? get_lock_stats+0x1e/0x50
 [<c01590ed>] ? put_lock_stats+0xd/0x30
 [<c036b94a>] ? xfs_fs_clear_inode+0x8a/0xe0
 [<c01b964c>] ? clear_inode+0x7c/0x160
 [<c01b9c4e>] ? generic_delete_inode+0x10e/0x120
 [<c01b9d87>] ? generic_drop_inode+0x127/0x180
 [<c01b8c07>] ? iput+0x47/0x50
 [<c01af1dc>] ? do_unlinkat+0xec/0x170
 [<c0430a08>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad96>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c01af3a3>] ? sys_unlinkat+0x23/0x50
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: c9 c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 85 d2 89
1c 24 8d 58 38 89 74 24 04 89 d6 89 7c 24 08 89 c7 74 03 8b 58 34 <8b>
43 08 85 c0 74 10 0f bf 53 0c e8 c1 11 02 00 c7 43 08 00 00
EIP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0 SS:ESP 0068:f5fe5e70
---[ end trace 9a7a5b8ebfdbeebf ]---


Vegard

