* latest -git: kernel BUG at fs/xfs/support/debug.c:54!
@ 2008-07-17 17:46 Vegard Nossum
2008-07-17 19:05 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-07-17 17:46 UTC (permalink / raw)
To: Tim Shimmin, xfs; +Cc: linux-kernel, Johannes Weiner
Hi,
I got this with an intentionally corrupted filesystem:
Filesystem "loop1": Disabling barriers, not supported by the underlying device
XFS mounting filesystem loop1
Ending clean XFS mount for filesystem: loop1
Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
------------[ cut here ]------------
kernel BUG at fs/xfs/support/debug.c:54!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 12849, comm: grep Not tainted (2.6.26-03414-g33af79d #43)
EIP: 0060:[<c0386d89>] EFLAGS: 00210246 CPU: 1
EIP is at cmn_err+0x99/0xa0
EAX: ed75e000 EBX: c089047c ECX: ed75e000 EDX: 00000000
ESI: 00000000 EDI: 00200286 EBP: ed75fbbc ESP: ed75fba4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process grep (pid: 12849, ti=ed75e000 task=f1ee9fe0 task.ti=ed75e000)
Stack: c0855099 c0846a0d c0d0e8c0 00004946 df92a8f0 0000001e ed75fc2c c03540d4
00000000 c089047c ed75fbfc 000025d0 00000000 0000001e 00004946 ed75fc54
00000000 df92a8f0 df92abd8 000025d0 00000000 00000000 706f6f6c 00000031
Call Trace:
[<c03540d4>] ? xfs_imap_to_bp+0x164/0x250
[<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
[<c0354250>] ? xfs_itobp+0x90/0x180
[<c0356e51>] ? xfs_iread+0xa1/0x280
[<c034f216>] ? xfs_iget_core+0x1c6/0x6e0
[<c034f82a>] ? xfs_iget+0xfa/0x170
[<c0377546>] ? xfs_lookup+0xb6/0xc0
[<c0382fba>] ? xfs_vn_lookup+0x4a/0x90
[<c01ac110>] ? do_lookup+0x160/0x1b0
[<c01adc38>] ? __link_path_walk+0x208/0xdc0
[<c014f916>] ? up_read+0x16/0x30
[<c034eabe>] ? xfs_iunlock+0xee/0x110
[<c0382bdf>] ? xfs_vn_follow_link+0x3f/0x80
[<c01ae327>] ? __link_path_walk+0x8f7/0xdc0
[<c015906b>] ? trace_hardirqs_off+0xb/0x10
[<c01ae844>] ? path_walk+0x54/0xb0
[<c01aea45>] ? do_path_lookup+0x85/0x230
[<c01af7a8>] ? __user_walk_fd+0x38/0x50
[<c01a7fb1>] ? vfs_stat_fd+0x21/0x50
[<c01590cd>] ? put_lock_stats+0xd/0x30
[<c01bc81d>] ? mntput_no_expire+0x1d/0x110
[<c01a8081>] ? vfs_stat+0x11/0x20
[<c01a80a4>] ? sys_stat64+0x14/0x30
[<c01a5a8f>] ? fput+0x1f/0x30
[<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
[<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c010407f>] ? sysenter_past_esp+0x78/0xc5
=======================
Code: 04 e8 00 eb 3d 00 89 fa b8 40 4a 92 c0 e8 70 d0 3d 00 85 f6 74
15 83 c4 0c 5b 5e 5f 5d c3 8d 74 26 00 c6 81 c0 e8 d0 c0 00 eb bc <0f>
0b eb fe 90 90 90 55 b9 04 00 00 00 89 e5 57 89 c7 31 c0 f3
EIP: [<c0386d89>] cmn_err+0x99/0xa0 SS:ESP 0068:ed75fba4
Kernel panic - not syncing: Fatal exception
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
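
The log line above reports magic=4946, while XFS inodes on disk start with the magic number 0x494e (ASCII "IN"). A minimal sketch, in Python, of comparing the two values; the helper name differing_bits is made up for illustration and is not part of any XFS tool. That the difference comes from a flipped bit is an inference from the values, not something this report states.

```python
# XFS on-disk inode magic: the two ASCII bytes "IN".
XFS_DINODE_MAGIC = 0x494E
# Value taken from the log line "bad inode magic/vsn ... (magic=4946)".
observed_magic = 0x4946


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


flipped = differing_bits(XFS_DINODE_MAGIC, observed_magic)
print(f"expected {XFS_DINODE_MAGIC:#06x}, got {observed_magic:#06x}, "
      f"differing bit positions: {flipped}")
```

The two values differ in a single bit position, which fits the random bit-flipping described later in the thread.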

* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 17:46 latest -git: kernel BUG at fs/xfs/support/debug.c:54! Vegard Nossum
@ 2008-07-17 19:05 ` Eric Sandeen
2008-07-17 19:18 ` Vegard Nossum
0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2008-07-17 19:05 UTC (permalink / raw)
To: Vegard Nossum; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

Vegard Nossum wrote:
> Hi,
>
> I got this with an intentionally corrupted filesystem:
>
> Filesystem "loop1": Disabling barriers, not supported by the underlying device
> XFS mounting filesystem loop1
> Ending clean XFS mount for filesystem: loop1
> Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/support/debug.c:54!

running a debug XFS will turn all sorts of tests into panics that would
not otherwise crash and burn that way.

I think normally when testing intentionally corrupted filesystems, you
expect corruptions to be handled gracefully. But in xfs's flavor of
debug, I'm not sure it's quite as true.

Perhaps the debug variant should not BUG() on disk corruption either,
but it'd probably be more relevant to test this on a non-debug build.

Does this corrupted fs survive better on non-debug xfs?

-Eric

> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 12849, comm: grep Not tainted (2.6.26-03414-g33af79d #43)
> EIP: 0060:[<c0386d89>] EFLAGS: 00210246 CPU: 1
> EIP is at cmn_err+0x99/0xa0
> EAX: ed75e000 EBX: c089047c ECX: ed75e000 EDX: 00000000
> ESI: 00000000 EDI: 00200286 EBP: ed75fbbc ESP: ed75fba4
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process grep (pid: 12849, ti=ed75e000 task=f1ee9fe0 task.ti=ed75e000)
> Stack: c0855099 c0846a0d c0d0e8c0 00004946 df92a8f0 0000001e ed75fc2c c03540d4
> 00000000 c089047c ed75fbfc 000025d0 00000000 0000001e 00004946 ed75fc54
> 00000000 df92a8f0 df92abd8 000025d0 00000000 00000000 706f6f6c 00000031
> Call Trace:
> [<c03540d4>] ? xfs_imap_to_bp+0x164/0x250
> [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
> [<c0354250>] ? xfs_itobp+0x90/0x180
> [<c0356e51>] ? xfs_iread+0xa1/0x280
> [<c034f216>] ? xfs_iget_core+0x1c6/0x6e0
> [<c034f82a>] ? xfs_iget+0xfa/0x170
> [<c0377546>] ? xfs_lookup+0xb6/0xc0
> [<c0382fba>] ? xfs_vn_lookup+0x4a/0x90
> [<c01ac110>] ? do_lookup+0x160/0x1b0
> [<c01adc38>] ? __link_path_walk+0x208/0xdc0
> [<c014f916>] ? up_read+0x16/0x30
> [<c034eabe>] ? xfs_iunlock+0xee/0x110
> [<c0382bdf>] ? xfs_vn_follow_link+0x3f/0x80
> [<c01ae327>] ? __link_path_walk+0x8f7/0xdc0
> [<c015906b>] ? trace_hardirqs_off+0xb/0x10
> [<c01ae844>] ? path_walk+0x54/0xb0
> [<c01aea45>] ? do_path_lookup+0x85/0x230
> [<c01af7a8>] ? __user_walk_fd+0x38/0x50
> [<c01a7fb1>] ? vfs_stat_fd+0x21/0x50
> [<c01590cd>] ? put_lock_stats+0xd/0x30
> [<c01bc81d>] ? mntput_no_expire+0x1d/0x110
> [<c01a8081>] ? vfs_stat+0x11/0x20
> [<c01a80a4>] ? sys_stat64+0x14/0x30
> [<c01a5a8f>] ? fput+0x1f/0x30
> [<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
> [<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
> [<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
> [<c010407f>] ? sysenter_past_esp+0x78/0xc5
> =======================
> Code: 04 e8 00 eb 3d 00 89 fa b8 40 4a 92 c0 e8 70 d0 3d 00 85 f6 74
> 15 83 c4 0c 5b 5e 5f 5d c3 8d 74 26 00 c6 81 c0 e8 d0 c0 00 eb bc <0f>
> 0b eb fe 90 90 90 55 b9 04 00 00 00 89 e5 57 89 c7 31 c0 f3
> EIP: [<c0386d89>] cmn_err+0x99/0xa0 SS:ESP 0068:ed75fba4
> Kernel panic - not syncing: Fatal exception
>
>
> Vegard
>

* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 19:05 ` Eric Sandeen
@ 2008-07-17 19:18 ` Vegard Nossum
2008-07-17 19:29 ` Vegard Nossum
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:18 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:05 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> Hi,
>>
>> I got this with an intentionally corrupted filesystem:
>>
>> Filesystem "loop1": Disabling barriers, not supported by the underlying device
>> XFS mounting filesystem loop1
>> Ending clean XFS mount for filesystem: loop1
>> Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
>> ------------[ cut here ]------------
>> kernel BUG at fs/xfs/support/debug.c:54!
>
> running a debug XFS will turn all sorts of tests into panics that would
> not otherwise crash and burn that way.
>
> I think normally when testing intentionally corrupted filesystems, you
> expect corruptions to be handled gracefully. But in xfs's flavor of
> debug, I'm not sure it's quite as true.
>
> Perhaps the debug variant should not BUG() on disk corruption either,
> but it'd probably be more relevant to test this on a non-debug build.
>
> Does this corrupted fs survive better on non-debug xfs?

Thanks, you are right. I have adjusted my configuration, but I am
still able to produce this:

BUG: unable to handle kernel paging request at b62a66e0
IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 4174, comm: rm Not tainted (2.6.26-03414-g33af79d #44)
EIP: 0060:[<c030ef88>] EFLAGS: 00210296 CPU: 0
EIP is at xfs_alloc_fix_freelist+0x28/0x490
EAX: f63e8830 EBX: f490a000 ECX: f48e8000 EDX: b62a66e0
ESI: 00000000 EDI: f48e9d8c EBP: f48e9d6c ESP: f48e9ccc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 4174, ti=f48e8000 task=f63d5fa0 task.ti=f48e8000)
Stack: 00000000 f63e8ac0 f63d5fa0 f63d64cc 00000002 00000000 f63d5fa0 f63e8830
b62a66e0 f490a000 f73a3e10 c0b57c78 f49f2be0 c0ce8048 f49f24c0 00200046
00000002 f48e9d20 c015908e f48e9d20 c01590cd f48e9d50 00200246 f63d6010
Call Trace:
[<c015908e>] ? get_lock_stats+0x1e/0x50
[<c01590cd>] ? put_lock_stats+0xd/0x30
[<c030f453>] ? xfs_free_extent+0x63/0xd0
[<c074955b>] ? down_read+0x5b/0x80
[<c030f470>] ? xfs_free_extent+0x80/0xd0
[<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
[<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
[<c03201ca>] ? xfs_bmap_finish+0x13a/0x180
[<c03428d8>] ? xfs_itruncate_finish+0x1b8/0x400
[<c035fa2b>] ? xfs_inactive+0x3bb/0x4e0
[<c036b87a>] ? xfs_fs_clear_inode+0x8a/0xe0
[<c01b962c>] ? clear_inode+0x7c/0x160
[<c01b9c2e>] ? generic_delete_inode+0x10e/0x120
[<c01b9d67>] ? generic_drop_inode+0x127/0x180
[<c01b8be7>] ? iput+0x47/0x50
[<c01af1bc>] ? do_unlinkat+0xec/0x170
[<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0104174>] ? restore_nocheck_notrace+0x0/0xe
[<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
[<c01af383>] ? sys_unlinkat+0x23/0x50
[<c010407f>] ? sysenter_past_esp+0x78/0xc5
=======================
Code: 8d 76 00 55 89 e5 57 89 c7 56 53 81 ec 94 00 00 00 8b 1f 89 95 70 ff ff ff 8b 57 0c 8b 40 04 89 5d 84 89 55 80 89 85 7c ff ff ff <80> 3a 00 0f 84 e7 02 00 00 c7 45 f0 00 00 00 00 8b 55 80 80 7a
EIP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490 SS:ESP 0068:f48e9ccc
Kernel panic - not syncing: Fatal exception

(Full log at http://folk.uio.no/vegardno/linux/log-1216322418.txt has
some more details.)

Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 19:18 ` Vegard Nossum
@ 2008-07-17 19:29 ` Vegard Nossum
2008-07-17 22:40 ` Dave Chinner
0 siblings, 1 reply; 6+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:29 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> Thanks, you are right. I have adjusted my configuration, but I am
> still able to produce this:
>
> BUG: unable to handle kernel paging request at b62a66e0
> IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490

FWIW, this is fs/xfs/xfs_alloc.c:1817:

	if (!pag->pagf_init) {

Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 19:29 ` Vegard Nossum
@ 2008-07-17 22:40 ` Dave Chinner
2008-07-19 13:16 ` Vegard Nossum
0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2008-07-17 22:40 UTC (permalink / raw)
To: Vegard Nossum
Cc: Eric Sandeen, Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > Thanks, you are right. I have adjusted my configuration, but I am
> > still able to produce this:
> >
> > BUG: unable to handle kernel paging request at b62a66e0
> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
>
> FWIW, this is fs/xfs/xfs_alloc.c:1817:
>
> if (!pag->pagf_init) {

Which kind of implies that we've got a bogus fsbno
that we're using as the basis of allocation.....

What is the corruption you are inducing? Can you produce
a xfs_metadump image of the filesystem and put it up somewhere
that we can access it?

I suspect that we are not validating the block numbers coming
out of the various btrees as landing inside the filesystem....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
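
The kind of check suspected to be missing in the message above can be sketched as follows. This is a simplified illustration in Python, not the kernel's code: it assumes the usual XFS encoding in which a filesystem block number packs the allocation group number above sb_agblklog bits of in-group block number, and it ignores that the last allocation group may be shorter than the others. The function name and the geometry values are hypothetical.

```python
def fsbno_in_bounds(fsbno, agblklog, agcount, agblocks):
    """Return True if fsbno decodes to a block inside the filesystem.

    fsbno    -- filesystem block number read from (possibly corrupt) metadata
    agblklog -- number of low bits that hold the block number within an AG
    agcount  -- number of allocation groups in the filesystem
    agblocks -- number of blocks in each allocation group (simplified)
    """
    agno = fsbno >> agblklog                  # allocation group number
    agbno = fsbno & ((1 << agblklog) - 1)     # block number within that group
    return agno < agcount and agbno < agblocks


# Hypothetical geometry: 4 allocation groups of 40000 blocks, 16-bit agbno.
geometry = dict(agblklog=16, agcount=4, agblocks=40000)

good = (1 << 16) | 100        # AG 1, block 100
bad_ag = (7 << 16) | 100      # AG 7 does not exist
bad_block = (1 << 16) | 50000 # block 50000 is past the end of the AG

for name, value in [("good", good), ("bad_ag", bad_ag), ("bad_block", bad_block)]:
    print(name, fsbno_in_bounds(value, **geometry))
```

A block number that fails such a check would select a per-AG structure that was never set up, which is one way a lookup like the pag dereference quoted above could fault; whether that is what happened here is not established in this thread.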

* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 22:40 ` Dave Chinner
@ 2008-07-19 13:16 ` Vegard Nossum
0 siblings, 0 replies; 6+ messages in thread
From: Vegard Nossum @ 2008-07-19 13:16 UTC (permalink / raw)
To: Vegard Nossum, Eric Sandeen, Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Fri, Jul 18, 2008 at 12:40 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
>> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>> > Thanks, you are right. I have adjusted my configuration, but I am
>> > still able to produce this:
>> >
>> > BUG: unable to handle kernel paging request at b62a66e0
>> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
>>
>> FWIW, this is fs/xfs/xfs_alloc.c:1817:
>>
>> if (!pag->pagf_init) {
>
> Which kind of implies that we've got a bogus fsbno
> that we're using as the basis of allocation.....
>
> What is the corruption you are inducing? Can you produce
> a xfs_metadump image of the filesystem and put it up somewhere
> that we can access it?
>
> I suspect that we are not validating the block numbers coming
> out of the various btrees as landing inside the filesystem....

The method of corruption is quite crude (but efficient); just flip a
number of bits at random before mounting.

I got a different crash (NULL pointer) now, and I have a reproducible
case with a full disk image (it's only about 11M compressed, no
private/sensitive data). See

http://userweb.kernel.org/~vegard/bugs/20080719-xfs/

The way to reproduce:

	mount -o loop disk.xfs_idestroy_fork.bin /mnt
	rm -rf /mnt/*

And it should give something like this:

BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3966, comm: rm Not tainted (2.6.26-03421-g253a722 #49)
EIP: 0060:[<c0340ebf>] EFLAGS: 00210202 CPU: 1
EIP is at xfs_idestroy_fork+0x1f/0xe0
EAX: f5402a00 EBX: 00000000 ECX: f5ff0da0 EDX: 00000001
ESI: 00000001 EDI: f5402a00 EBP: f5fe5e7c ESP: f5fe5e70
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 3966, ti=f5fe4000 task=f5f1cfb0 task.ti=f5fe4000)
Stack: f5402a00 00000000 f5fe5ecc f5fe5ea4 c035f729 00000000 00000004 00000002
f79e4180 f5ff0cd0 f5402a00 f5ff0520 00000001 f5fe5ee0 c035f91e 00000000
00000000 00000000 00000001 f79e4180 f5f1cfb0 00000000 c01590ae f5ff0a40
Call Trace:
[<c035f729>] ? xfs_inactive_attrs+0xe9/0x100
[<c035f91e>] ? xfs_inactive+0x1de/0x4e0
[<c01590ae>] ? get_lock_stats+0x1e/0x50
[<c01590ed>] ? put_lock_stats+0xd/0x30
[<c036b94a>] ? xfs_fs_clear_inode+0x8a/0xe0
[<c01b964c>] ? clear_inode+0x7c/0x160
[<c01b9c4e>] ? generic_delete_inode+0x10e/0x120
[<c01b9d87>] ? generic_drop_inode+0x127/0x180
[<c01b8c07>] ? iput+0x47/0x50
[<c01af1dc>] ? do_unlinkat+0xec/0x170
[<c0430a08>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c015ad96>] ? trace_hardirqs_on_caller+0x116/0x170
[<c01af3a3>] ? sys_unlinkat+0x23/0x50
[<c010407f>] ? sysenter_past_esp+0x78/0xc5
=======================
Code: c9 c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 85 d2 89 1c 24 8d 58 38 89 74 24 04 89 d6 89 7c 24 08 89 c7 74 03 8b 58 34 <8b> 43 08 85 c0 74 10 0f bf 53 0c e8 c1 11 02 00 c7 43 08 00 00
EIP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0 SS:ESP 0068:f5fe5e70
---[ end trace 9a7a5b8ebfdbeebf ]---

Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
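
The corruption method described in the message above (flip a number of bits at random before mounting) can be sketched as below. This is only an illustration in Python; the thread does not say which tool or script was actually used, and the function name, buffer size, bit count and seed are all assumptions. It operates on an in-memory buffer, so reading and writing a real image file is left out.

```python
import random


def flip_random_bits(data, count, seed=None):
    """Flip `count` distinct, randomly chosen bits of `data` in place.

    data  -- bytearray holding (a copy of) the filesystem image
    count -- number of bits to flip
    seed  -- optional seed so that a corrupting run can be repeated
    Returns the sorted list of flipped bit offsets.
    """
    rng = random.Random(seed)
    # Sample without replacement so that no bit is flipped back again.
    positions = rng.sample(range(len(data) * 8), count)
    for pos in positions:
        data[pos // 8] ^= 1 << (pos % 8)
    return sorted(positions)


# Stand-in for an image: 4 KiB of zeros instead of a real filesystem.
image = bytearray(4096)
flipped = flip_random_bits(image, 16, seed=1234)
print(f"flipped {len(flipped)} bits, first offsets: {flipped[:4]}")
```

Recording the seed or the returned offsets alongside the image makes a crash easier to reproduce and to trace back to the damaged structure.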