latest -git: kernel BUG at fs/xfs/support/debug.c:54!

public inbox for linux-xfs@vger.kernel.org

* latest -git: kernel BUG at fs/xfs/support/debug.c:54!
@ 2008-07-17 17:46 Vegard Nossum
       [not found] ` <487F980F.7070708@redhat.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Vegard Nossum @ 2008-07-17 17:46 UTC
  To: Tim Shimmin, xfs; +Cc: linux-kernel, Johannes Weiner

Hi,

I got this with an intentionally corrupted filesystem:

Filesystem "loop1": Disabling barriers, not supported by the underlying device
XFS mounting filesystem loop1
Ending clean XFS mount for filesystem: loop1
Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
------------[ cut here ]------------
kernel BUG at fs/xfs/support/debug.c:54!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 12849, comm: grep Not tainted (2.6.26-03414-g33af79d #43)
EIP: 0060:[<c0386d89>] EFLAGS: 00210246 CPU: 1
EIP is at cmn_err+0x99/0xa0
EAX: ed75e000 EBX: c089047c ECX: ed75e000 EDX: 00000000
ESI: 00000000 EDI: 00200286 EBP: ed75fbbc ESP: ed75fba4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process grep (pid: 12849, ti=ed75e000 task=f1ee9fe0 task.ti=ed75e000)
Stack: c0855099 c0846a0d c0d0e8c0 00004946 df92a8f0 0000001e ed75fc2c c03540d4
       00000000 c089047c ed75fbfc 000025d0 00000000 0000001e 00004946 ed75fc54
       00000000 df92a8f0 df92abd8 000025d0 00000000 00000000 706f6f6c 00000031
Call Trace:
 [<c03540d4>] ? xfs_imap_to_bp+0x164/0x250
 [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c0354250>] ? xfs_itobp+0x90/0x180
 [<c0356e51>] ? xfs_iread+0xa1/0x280
 [<c034f216>] ? xfs_iget_core+0x1c6/0x6e0
 [<c034f82a>] ? xfs_iget+0xfa/0x170
 [<c0377546>] ? xfs_lookup+0xb6/0xc0
 [<c0382fba>] ? xfs_vn_lookup+0x4a/0x90
 [<c01ac110>] ? do_lookup+0x160/0x1b0
 [<c01adc38>] ? __link_path_walk+0x208/0xdc0
 [<c014f916>] ? up_read+0x16/0x30
 [<c034eabe>] ? xfs_iunlock+0xee/0x110
 [<c0382bdf>] ? xfs_vn_follow_link+0x3f/0x80
 [<c01ae327>] ? __link_path_walk+0x8f7/0xdc0
 [<c015906b>] ? trace_hardirqs_off+0xb/0x10
 [<c01ae844>] ? path_walk+0x54/0xb0
 [<c01aea45>] ? do_path_lookup+0x85/0x230
 [<c01af7a8>] ? __user_walk_fd+0x38/0x50
 [<c01a7fb1>] ? vfs_stat_fd+0x21/0x50
 [<c01590cd>] ? put_lock_stats+0xd/0x30
 [<c01bc81d>] ? mntput_no_expire+0x1d/0x110
 [<c01a8081>] ? vfs_stat+0x11/0x20
 [<c01a80a4>] ? sys_stat64+0x14/0x30
 [<c01a5a8f>] ? fput+0x1f/0x30
 [<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: 04 e8 00 eb 3d 00 89 fa b8 40 4a 92 c0 e8 70 d0 3d 00 85 f6 74
15 83 c4 0c 5b 5e 5f 5d c3 8d 74 26 00 c6 81 c0 e8 d0 c0 00 eb bc <0f>
0b eb fe 90 90 90 55 b9 04 00 00 00 89 e5 57 89 c7 31 c0 f3
EIP: [<c0386d89>] cmn_err+0x99/0xa0 SS:ESP 0068:ed75fba4
Kernel panic - not syncing: Fatal exception
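
A note on the "magic=4946" value in the log above: the XFS on-disk inode
magic is 0x494e (ASCII "IN"), so the value read back differs from the
expected one by a single bit. The sketch below only illustrates that
arithmetic. The helper name bit_distance16 and the macro
INODE_MAGIC_EXPECTED are made up for this example and are not taken from
the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Expected on-disk inode magic: the ASCII bytes 'I','N' (0x494e).
 * The macro name is hypothetical, chosen for this example only. */
#define INODE_MAGIC_EXPECTED 0x494eu

/* Count the bits that differ between two 16-bit values. */
static unsigned bit_distance16(uint16_t a, uint16_t b)
{
    uint16_t x = a ^ b;   /* set bits mark the positions that differ */
    unsigned n = 0;

    while (x) {
        x &= (uint16_t)(x - 1);   /* clear the lowest set bit */
        n++;
    }
    assert(n <= 16);      /* a 16-bit value has at most 16 set bits */
    return n;
}
```

With the value from the log, bit_distance16(0x4946, INODE_MAGIC_EXPECTED)
evaluates to 1, since 0x4946 ^ 0x494e == 0x0008.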


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036


* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
       [not found] ` <487F980F.7070708@redhat.com>
@ 2008-07-17 19:18   ` Vegard Nossum
  2008-07-17 19:29     ` Vegard Nossum
  0 siblings, 1 reply; 5+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:18 UTC
  To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:05 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> Hi,
>>
>> I got this with an intentionally corrupted filesystem:
>>
>> Filesystem "loop1": Disabling barriers, not supported by the underlying device
>> XFS mounting filesystem loop1
>> Ending clean XFS mount for filesystem: loop1
>> Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
>> ------------[ cut here ]------------
>> kernel BUG at fs/xfs/support/debug.c:54!
>
> running a debug XFS will turn all sorts of tests into panics that would
> not otherwise crash and burn that way.
>
> I think normally when testing intentionally corrupted filesystems, you
> expect corruptions to be handled gracefully.  But in xfs's flavor of
> debug, I'm not sure it's quite as true.
>
> Perhaps the debug variant should not BUG() on disk corruption either,
> but it'd probably be more relevant to test this on a non-debug build.
>
> Does this corrupted fs survive better on non-debug xfs?

Thanks, you are right. I have adjusted my configuration, but I am
still able to produce this:

BUG: unable to handle kernel paging request at b62a66e0
IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 4174, comm: rm Not tainted (2.6.26-03414-g33af79d #44)
EIP: 0060:[<c030ef88>] EFLAGS: 00210296 CPU: 0
EIP is at xfs_alloc_fix_freelist+0x28/0x490
EAX: f63e8830 EBX: f490a000 ECX: f48e8000 EDX: b62a66e0
ESI: 00000000 EDI: f48e9d8c EBP: f48e9d6c ESP: f48e9ccc
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 4174, ti=f48e8000 task=f63d5fa0 task.ti=f48e8000)
Stack: 00000000 f63e8ac0 f63d5fa0 f63d64cc 00000002 00000000 f63d5fa0 f63e8830
       b62a66e0 f490a000 f73a3e10 c0b57c78 f49f2be0 c0ce8048 f49f24c0 00200046
       00000002 f48e9d20 c015908e f48e9d20 c01590cd f48e9d50 00200246 f63d6010
Call Trace:
 [<c015908e>] ? get_lock_stats+0x1e/0x50
 [<c01590cd>] ? put_lock_stats+0xd/0x30
 [<c030f453>] ? xfs_free_extent+0x63/0xd0
 [<c074955b>] ? down_read+0x5b/0x80
 [<c030f470>] ? xfs_free_extent+0x80/0xd0
 [<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
 [<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
 [<c03201ca>] ? xfs_bmap_finish+0x13a/0x180
 [<c03428d8>] ? xfs_itruncate_finish+0x1b8/0x400
 [<c035fa2b>] ? xfs_inactive+0x3bb/0x4e0
 [<c036b87a>] ? xfs_fs_clear_inode+0x8a/0xe0
 [<c01b962c>] ? clear_inode+0x7c/0x160
 [<c01b9c2e>] ? generic_delete_inode+0x10e/0x120
 [<c01b9d67>] ? generic_drop_inode+0x127/0x180
 [<c01b8be7>] ? iput+0x47/0x50
 [<c01af1bc>] ? do_unlinkat+0xec/0x170
 [<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c0104174>] ? restore_nocheck_notrace+0x0/0xe
 [<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c01af383>] ? sys_unlinkat+0x23/0x50
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: 8d 76 00 55 89 e5 57 89 c7 56 53 81 ec 94 00 00 00 8b 1f 89 95
70 ff ff ff 8b 57 0c 8b 40 04 89 5d 84 89 55 80 89 85 7c ff ff ff <80>
3a 00 0f 84 e7 02 00 00 c7 45 f0 00 00 00 00 8b 55 80 80 7a
EIP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490 SS:ESP 0068:f48e9ccc
Kernel panic - not syncing: Fatal exception

(Full log at http://folk.uio.no/vegardno/linux/log-1216322418.txt has
some more details.)


Vegard



* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 19:18   ` Vegard Nossum
@ 2008-07-17 19:29     ` Vegard Nossum
  2008-07-17 22:40       ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:29 UTC
  To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> Thanks, you are right. I have adjusted my configuration, but I am
> still able to produce this:
>
> BUG: unable to handle kernel paging request at b62a66e0
> IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490

FWIW, this is fs/xfs/xfs_alloc.c:1817:

        if (!pag->pagf_init) {


Vegard



* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 19:29     ` Vegard Nossum
@ 2008-07-17 22:40       ` Dave Chinner
  2008-07-19 13:16         ` Vegard Nossum
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2008-07-17 22:40 UTC
  To: Vegard Nossum
  Cc: Eric Sandeen, Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > Thanks, you are right. I have adjusted my configuration, but I am
> > still able to produce this:
> >
> > BUG: unable to handle kernel paging request at b62a66e0
> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
> 
> FWIW, this is fs/xfs/xfs_alloc.c:1817:
> 
>         if (!pag->pagf_init) {

Which kind of implies that we've got a bogus fsbno
that we're using as the basis of allocation.....

What is the corruption you are inducing? Can you produce
a xfs_metadump image of the filesystem and put it up somewhere
that we can access it?

I suspect that we are not validating the block numbers coming
out of the various btrees as landing inside the filesystem....
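
As a rough illustration of the kind of validation meant here (a sketch
under stated assumptions, not the actual XFS code): an XFS filesystem
block number packs the allocation group number into its upper bits and
the AG-relative block number into the low sb_agblklog bits. A bounds
check could decode both halves and compare them against the filesystem
geometry before the value is used to look up per-AG state. The struct
fs_geom and the function fsbno_in_bounds below are hypothetical names
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; not the kernel's struct xfs_mount. */
struct fs_geom {
    uint32_t agcount;   /* number of allocation groups */
    uint32_t agblocks;  /* blocks per allocation group */
    uint8_t  agblklog;  /* log2 of agblocks, rounded up */
};

/* Return true only if fsbno decodes to an AG and an AG-relative block
 * that both lie inside the filesystem. Simplification: every AG is
 * assumed to be full size, although the last AG may be shorter. */
static bool fsbno_in_bounds(const struct fs_geom *g, uint64_t fsbno)
{
    assert(g->agblklog < 64);   /* keep the shifts below well defined */

    uint64_t agno  = fsbno >> g->agblklog;
    uint64_t agbno = fsbno & ((UINT64_C(1) << g->agblklog) - 1);

    return agno < g->agcount && agbno < g->agblocks;
}
```

A caller would reject the extent and report corruption when this returns
false, rather than using the decoded AG number to find per-AG data.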

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
  2008-07-17 22:40       ` Dave Chinner
@ 2008-07-19 13:16         ` Vegard Nossum
  0 siblings, 0 replies; 5+ messages in thread
From: Vegard Nossum @ 2008-07-19 13:16 UTC
  To: Vegard Nossum, Eric Sandeen, Tim Shimmin, xfs, linux-kernel,
	Johannes Weiner

On Fri, Jul 18, 2008 at 12:40 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
>> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>> > Thanks, you are right. I have adjusted my configuration, but I am
>> > still able to produce this:
>> >
>> > BUG: unable to handle kernel paging request at b62a66e0
>> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
>>
>> FWIW, this is fs/xfs/xfs_alloc.c:1817:
>>
>>         if (!pag->pagf_init) {
>
> Which kind of implies that we've got a bogus fsbno
> that we're using as the basis of allocation.....
>
> What is the corruption you are inducing? Can you produce
> a xfs_metadump image of the filesystem and put it up somewhere
> that we can access it?
>
> I suspect that we are not validating the block numbers coming
> out of the various btrees as landing inside the filesystem....

The method of corruption is quite crude (but effective): just flip a
number of bits at random in the filesystem image before mounting.
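
For anyone who wants to try something similar, here is a minimal sketch
of that kind of bit flipping. It is not the tool actually used for these
reports. The function names, the xorshift32 generator and the seed
parameter are all assumptions made for this example. It works on an
in-memory buffer; reading the image in and writing it back out is left
to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small xorshift32 generator so a given seed always flips the same bits,
 * which makes a crashing image reproducible. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Flip nflips pseudo-randomly chosen bits in buf. Positions may repeat,
 * in which case a bit is flipped back to its original value. */
static void flip_random_bits(unsigned char *buf, size_t len,
                             size_t nflips, uint32_t seed)
{
    assert(seed != 0);          /* xorshift32 gets stuck at zero */
    if (len == 0)
        return;

    uint32_t state = seed;

    for (size_t i = 0; i < nflips; i++) {
        /* Two draws give a 64-bit value, so large images are covered. */
        uint64_t hi = xorshift32(&state);
        uint64_t lo = xorshift32(&state);
        uint64_t bit = ((hi << 32) | lo) % ((uint64_t)len * 8);

        buf[bit / 8] ^= (unsigned char)(1u << (bit % 8));
    }
}
```

Because XOR is its own inverse, running the same seed and count over the
buffer a second time restores the original contents, which is handy for
narrowing down which flips matter.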

I got a different crash (NULL pointer) now, and I have a reproducible
case with a full disk image (it's only about 11M compressed, no
private/sensitive data). See
http://userweb.kernel.org/~vegard/bugs/20080719-xfs/

The way to reproduce:

    mount -o loop disk.xfs_idestroy_fork.bin /mnt
    rm -rf /mnt/*

And it should give something like this:

BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3966, comm: rm Not tainted (2.6.26-03421-g253a722 #49)
EIP: 0060:[<c0340ebf>] EFLAGS: 00210202 CPU: 1
EIP is at xfs_idestroy_fork+0x1f/0xe0
EAX: f5402a00 EBX: 00000000 ECX: f5ff0da0 EDX: 00000001
ESI: 00000001 EDI: f5402a00 EBP: f5fe5e7c ESP: f5fe5e70
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 3966, ti=f5fe4000 task=f5f1cfb0 task.ti=f5fe4000)
Stack: f5402a00 00000000 f5fe5ecc f5fe5ea4 c035f729 00000000 00000004 00000002
       f79e4180 f5ff0cd0 f5402a00 f5ff0520 00000001 f5fe5ee0 c035f91e 00000000
       00000000 00000000 00000001 f79e4180 f5f1cfb0 00000000 c01590ae f5ff0a40
Call Trace:
 [<c035f729>] ? xfs_inactive_attrs+0xe9/0x100
 [<c035f91e>] ? xfs_inactive+0x1de/0x4e0
 [<c01590ae>] ? get_lock_stats+0x1e/0x50
 [<c01590ed>] ? put_lock_stats+0xd/0x30
 [<c036b94a>] ? xfs_fs_clear_inode+0x8a/0xe0
 [<c01b964c>] ? clear_inode+0x7c/0x160
 [<c01b9c4e>] ? generic_delete_inode+0x10e/0x120
 [<c01b9d87>] ? generic_drop_inode+0x127/0x180
 [<c01b8c07>] ? iput+0x47/0x50
 [<c01af1dc>] ? do_unlinkat+0xec/0x170
 [<c0430a08>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c015ad96>] ? trace_hardirqs_on_caller+0x116/0x170
 [<c01af3a3>] ? sys_unlinkat+0x23/0x50
 [<c010407f>] ? sysenter_past_esp+0x78/0xc5
 =======================
Code: c9 c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 85 d2 89
1c 24 8d 58 38 89 74 24 04 89 d6 89 7c 24 08 89 c7 74 03 8b 58 34 <8b>
43 08 85 c0 74 10 0f bf 53 0c e8 c1 11 02 00 c7 43 08 00 00
EIP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0 SS:ESP 0068:f5fe5e70
---[ end trace 9a7a5b8ebfdbeebf ]---


Vegard


