* latest -git: kernel BUG at fs/xfs/support/debug.c:54!
@ 2008-07-17 17:46 Vegard Nossum
[not found] ` <487F980F.7070708@redhat.com>
0 siblings, 1 reply; 5+ messages in thread
From: Vegard Nossum @ 2008-07-17 17:46 UTC (permalink / raw)
To: Tim Shimmin, xfs; +Cc: linux-kernel, Johannes Weiner
Hi,
I got this with an intentionally corrupted filesystem:
Filesystem "loop1": Disabling barriers, not supported by the underlying device
XFS mounting filesystem loop1
Ending clean XFS mount for filesystem: loop1
Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
------------[ cut here ]------------
kernel BUG at fs/xfs/support/debug.c:54!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 12849, comm: grep Not tainted (2.6.26-03414-g33af79d #43)
EIP: 0060:[<c0386d89>] EFLAGS: 00210246 CPU: 1
EIP is at cmn_err+0x99/0xa0
EAX: ed75e000 EBX: c089047c ECX: ed75e000 EDX: 00000000
ESI: 00000000 EDI: 00200286 EBP: ed75fbbc ESP: ed75fba4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process grep (pid: 12849, ti=ed75e000 task=f1ee9fe0 task.ti=ed75e000)
Stack: c0855099 c0846a0d c0d0e8c0 00004946 df92a8f0 0000001e ed75fc2c c03540d4
00000000 c089047c ed75fbfc 000025d0 00000000 0000001e 00004946 ed75fc54
00000000 df92a8f0 df92abd8 000025d0 00000000 00000000 706f6f6c 00000031
Call Trace:
[<c03540d4>] ? xfs_imap_to_bp+0x164/0x250
[<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
[<c0354250>] ? xfs_itobp+0x90/0x180
[<c0356e51>] ? xfs_iread+0xa1/0x280
[<c034f216>] ? xfs_iget_core+0x1c6/0x6e0
[<c034f82a>] ? xfs_iget+0xfa/0x170
[<c0377546>] ? xfs_lookup+0xb6/0xc0
[<c0382fba>] ? xfs_vn_lookup+0x4a/0x90
[<c01ac110>] ? do_lookup+0x160/0x1b0
[<c01adc38>] ? __link_path_walk+0x208/0xdc0
[<c014f916>] ? up_read+0x16/0x30
[<c034eabe>] ? xfs_iunlock+0xee/0x110
[<c0382bdf>] ? xfs_vn_follow_link+0x3f/0x80
[<c01ae327>] ? __link_path_walk+0x8f7/0xdc0
[<c015906b>] ? trace_hardirqs_off+0xb/0x10
[<c01ae844>] ? path_walk+0x54/0xb0
[<c01aea45>] ? do_path_lookup+0x85/0x230
[<c01af7a8>] ? __user_walk_fd+0x38/0x50
[<c01a7fb1>] ? vfs_stat_fd+0x21/0x50
[<c01590cd>] ? put_lock_stats+0xd/0x30
[<c01bc81d>] ? mntput_no_expire+0x1d/0x110
[<c01a8081>] ? vfs_stat+0x11/0x20
[<c01a80a4>] ? sys_stat64+0x14/0x30
[<c01a5a8f>] ? fput+0x1f/0x30
[<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
[<c044a0f8>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c010407f>] ? sysenter_past_esp+0x78/0xc5
=======================
Code: 04 e8 00 eb 3d 00 89 fa b8 40 4a 92 c0 e8 70 d0 3d 00 85 f6 74
15 83 c4 0c 5b 5e 5f 5d c3 8d 74 26 00 c6 81 c0 e8 d0 c0 00 eb bc <0f>
0b eb fe 90 90 90 55 b9 04 00 00 00 89 e5 57 89 c7 31 c0 f3
EIP: [<c0386d89>] cmn_err+0x99/0xa0 SS:ESP 0068:ed75fba4
Kernel panic - not syncing: Fatal exception
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036
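
A small aside on the "bad inode magic" line in the log above: XFS on-disk inodes start with the magic 0x494e ("IN"), while the log reports magic=4946. The sketch below only illustrates how to see which bits differ between the two values. The helper name `differing_bits` is hypothetical, and reading the printed magic as hexadecimal is an assumption based on the value 00004946 also appearing in the stack dump.

```python
from typing import List

XFS_DINODE_MAGIC = 0x494E  # "IN", the on-disk inode magic XFS expects


def differing_bits(expected: int, observed: int) -> List[int]:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Value taken from "(magic=4946)" in the log; assumed to be hexadecimal.
observed_magic = 0x4946
print(differing_bits(XFS_DINODE_MAGIC, observed_magic))  # prints [3]
```

A single differing bit would be consistent with a deliberately bit-flipped image, although the log alone does not prove that this is how the magic was damaged.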
[parent not found: <487F980F.7070708@redhat.com>]
* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
[not found] ` <487F980F.7070708@redhat.com>
@ 2008-07-17 19:18 ` Vegard Nossum
2008-07-17 19:29 ` Vegard Nossum
0 siblings, 1 reply; 5+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:18 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:05 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> Hi,
>>
>> I got this with an intentionally corrupted filesystem:
>>
>> Filesystem "loop1": Disabling barriers, not supported by the underlying device
>> XFS mounting filesystem loop1
>> Ending clean XFS mount for filesystem: loop1
>> Device loop1 - bad inode magic/vsn daddr 9680 #30 (magic=4946)
>> ------------[ cut here ]------------
>> kernel BUG at fs/xfs/support/debug.c:54!
>
> running a debug XFS will turn all sorts of tests into panics that would
> not otherwise crash and burn that way.
>
> I think normally when testing intentionally corrupted filesystems, you
> expect corruptions to be handled gracefully. But in xfs's flavor of
> debug, I'm not sure it's quite as true.
>
> Perhaps the debug variant should not BUG() on disk corruption either,
> but it'd probably be more relevant to test this on a non-debug build.
>
> Does this corrupted fs survive better on non-debug xfs?

Thanks, you are right. I have adjusted my configuration, but I am
still able to produce this:

BUG: unable to handle kernel paging request at b62a66e0
IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 4174, comm: rm Not tainted (2.6.26-03414-g33af79d #44)
EIP: 0060:[<c030ef88>] EFLAGS: 00210296 CPU: 0
EIP is at xfs_alloc_fix_freelist+0x28/0x490
EAX: f63e8830 EBX: f490a000 ECX: f48e8000 EDX: b62a66e0
ESI: 00000000 EDI: f48e9d8c EBP: f48e9d6c ESP: f48e9ccc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 4174, ti=f48e8000 task=f63d5fa0 task.ti=f48e8000)
Stack: 00000000 f63e8ac0 f63d5fa0 f63d64cc 00000002 00000000 f63d5fa0 f63e8830
b62a66e0 f490a000 f73a3e10 c0b57c78 f49f2be0 c0ce8048 f49f24c0 00200046
00000002 f48e9d20 c015908e f48e9d20 c01590cd f48e9d50 00200246 f63d6010
Call Trace:
[<c015908e>] ? get_lock_stats+0x1e/0x50
[<c01590cd>] ? put_lock_stats+0xd/0x30
[<c030f453>] ? xfs_free_extent+0x63/0xd0
[<c074955b>] ? down_read+0x5b/0x80
[<c030f470>] ? xfs_free_extent+0x80/0xd0
[<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
[<c0361f1a>] ? kmem_zone_alloc+0x7a/0xc0
[<c03201ca>] ? xfs_bmap_finish+0x13a/0x180
[<c03428d8>] ? xfs_itruncate_finish+0x1b8/0x400
[<c035fa2b>] ? xfs_inactive+0x3bb/0x4e0
[<c036b87a>] ? xfs_fs_clear_inode+0x8a/0xe0
[<c01b962c>] ? clear_inode+0x7c/0x160
[<c01b9c2e>] ? generic_delete_inode+0x10e/0x120
[<c01b9d67>] ? generic_drop_inode+0x127/0x180
[<c01b8be7>] ? iput+0x47/0x50
[<c01af1bc>] ? do_unlinkat+0xec/0x170
[<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c0104174>] ? restore_nocheck_notrace+0x0/0xe
[<c0430938>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c015ad76>] ? trace_hardirqs_on_caller+0x116/0x170
[<c01af383>] ? sys_unlinkat+0x23/0x50
[<c010407f>] ? sysenter_past_esp+0x78/0xc5
=======================
Code: 8d 76 00 55 89 e5 57 89 c7 56 53 81 ec 94 00 00 00 8b 1f 89 95
70 ff ff ff 8b 57 0c 8b 40 04 89 5d 84 89 55 80 89 85 7c ff ff ff <80>
3a 00 0f 84 e7 02 00 00 c7 45 f0 00 00 00 00 8b 55 80 80 7a
EIP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490 SS:ESP 0068:f48e9ccc
Kernel panic - not syncing: Fatal exception

(Full log at http://folk.uio.no/vegardno/linux/log-1216322418.txt has
some more details.)

Vegard
* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 19:18 ` Vegard Nossum
@ 2008-07-17 19:29 ` Vegard Nossum
2008-07-17 22:40 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Vegard Nossum @ 2008-07-17 19:29 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> Thanks, you are right. I have adjusted my configuration, but I am
> still able to produce this:
>
> BUG: unable to handle kernel paging request at b62a66e0
> IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490

FWIW, this is fs/xfs/xfs_alloc.c:1817:

	if (!pag->pagf_init) {

Vegard
* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 19:29 ` Vegard Nossum
@ 2008-07-17 22:40 ` Dave Chinner
2008-07-19 13:16 ` Vegard Nossum
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2008-07-17 22:40 UTC (permalink / raw)
To: Vegard Nossum
Cc: Eric Sandeen, Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > Thanks, you are right. I have adjusted my configuration, but I am
> > still able to produce this:
> >
> > BUG: unable to handle kernel paging request at b62a66e0
> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
>
> FWIW, this is fs/xfs/xfs_alloc.c:1817:
>
> 	if (!pag->pagf_init) {

Which kind of implies that we've got a bogus fsbno
that we're using as the basis of allocation.....

What is the corruption you are inducing? Can you produce
a xfs_metadump image of the filesystem and put it up somewhere
that we can access it?

I suspect that we are not validating the block numbers coming
out of the various btrees as landing inside the filesystem....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
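
To make the suspicion above concrete, here is a minimal sketch of the kind of range check being discussed. It is not the kernel's code. It assumes the usual XFS encoding in which a filesystem block number (fsbno) carries the allocation group number in its high bits and the AG-relative block in its low `agblklog` bits. The `Geometry` class and `fsbno_is_valid` are hypothetical names, and the sketch ignores the fact that the last AG may be shorter than the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical stand-in for the superblock fields the check needs."""
    agcount: int   # number of allocation groups
    agblocks: int  # blocks per allocation group
    agblklog: int  # log2 of agblocks, rounded up


def fsbno_is_valid(geo: Geometry, fsbno: int) -> bool:
    """Return True if fsbno decodes to a block inside the filesystem."""
    if fsbno < 0:
        return False
    agno = fsbno >> geo.agblklog                # high bits: AG number
    agbno = fsbno & ((1 << geo.agblklog) - 1)   # low bits: block within the AG
    return agno < geo.agcount and agbno < geo.agblocks


geo = Geometry(agcount=4, agblocks=1000, agblklog=10)
print(fsbno_is_valid(geo, (3 << 10) | 999))  # last block of the last AG
print(fsbno_is_valid(geo, 4 << 10))          # AG number out of range
```

If a corrupted btree record yields an out-of-range AG number and that number is then used to index a per-AG array, the resulting pointer would presumably be garbage, which would fit the paging-request fault at `pag->pagf_init` reported earlier in the thread.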
* Re: latest -git: kernel BUG at fs/xfs/support/debug.c:54!
2008-07-17 22:40 ` Dave Chinner
@ 2008-07-19 13:16 ` Vegard Nossum
0 siblings, 0 replies; 5+ messages in thread
From: Vegard Nossum @ 2008-07-19 13:16 UTC (permalink / raw)
To: Vegard Nossum, Eric Sandeen, Tim Shimmin, xfs, linux-kernel, Johannes Weiner

On Fri, Jul 18, 2008 at 12:40 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Jul 17, 2008 at 09:29:39PM +0200, Vegard Nossum wrote:
>> On Thu, Jul 17, 2008 at 9:18 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
>> > Thanks, you are right. I have adjusted my configuration, but I am
>> > still able to produce this:
>> >
>> > BUG: unable to handle kernel paging request at b62a66e0
>> > IP: [<c030ef88>] xfs_alloc_fix_freelist+0x28/0x490
>>
>> FWIW, this is fs/xfs/xfs_alloc.c:1817:
>>
>> 	if (!pag->pagf_init) {
>
> Which kind of implies that we've got a bogus fsbno
> that we're using as the basis of allocation.....
>
> What is the corruption you are inducing? Can you produce
> a xfs_metadump image of the filesystem and put it up somewhere
> that we can access it?
>
> I suspect that we are not validating the block numbers coming
> out of the various btrees as landing inside the filesystem....

The method of corruption is quite crude (but efficient); just flip a
number of bits at random before mounting.

I got a different crash (NULL pointer) now, and I have a reproducible
case with a full disk image (it's only about 11M compressed, no
private/sensitive data). See

http://userweb.kernel.org/~vegard/bugs/20080719-xfs/

The way to reproduce:

mount -o loop disk.xfs_idestroy_fork.bin /mnt
rm -rf /mnt/*

And it should give something like this:

BUG: unable to handle kernel NULL pointer dereference at 00000008
IP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3966, comm: rm Not tainted (2.6.26-03421-g253a722 #49)
EIP: 0060:[<c0340ebf>] EFLAGS: 00210202 CPU: 1
EIP is at xfs_idestroy_fork+0x1f/0xe0
EAX: f5402a00 EBX: 00000000 ECX: f5ff0da0 EDX: 00000001
ESI: 00000001 EDI: f5402a00 EBP: f5fe5e7c ESP: f5fe5e70
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 3966, ti=f5fe4000 task=f5f1cfb0 task.ti=f5fe4000)
Stack: f5402a00 00000000 f5fe5ecc f5fe5ea4 c035f729 00000000 00000004 00000002
f79e4180 f5ff0cd0 f5402a00 f5ff0520 00000001 f5fe5ee0 c035f91e 00000000
00000000 00000000 00000001 f79e4180 f5f1cfb0 00000000 c01590ae f5ff0a40
Call Trace:
[<c035f729>] ? xfs_inactive_attrs+0xe9/0x100
[<c035f91e>] ? xfs_inactive+0x1de/0x4e0
[<c01590ae>] ? get_lock_stats+0x1e/0x50
[<c01590ed>] ? put_lock_stats+0xd/0x30
[<c036b94a>] ? xfs_fs_clear_inode+0x8a/0xe0
[<c01b964c>] ? clear_inode+0x7c/0x160
[<c01b9c4e>] ? generic_delete_inode+0x10e/0x120
[<c01b9d87>] ? generic_drop_inode+0x127/0x180
[<c01b8c07>] ? iput+0x47/0x50
[<c01af1dc>] ? do_unlinkat+0xec/0x170
[<c0430a08>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c015ad96>] ? trace_hardirqs_on_caller+0x116/0x170
[<c01af3a3>] ? sys_unlinkat+0x23/0x50
[<c010407f>] ? sysenter_past_esp+0x78/0xc5
=======================
Code: c9 c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 0c 85 d2 89
1c 24 8d 58 38 89 74 24 04 89 d6 89 7c 24 08 89 c7 74 03 8b 58 34 <8b>
43 08 85 c0 74 10 0f bf 53 0c e8 c1 11 02 00 c7 43 08 00 00
EIP: [<c0340ebf>] xfs_idestroy_fork+0x1f/0xe0 SS:ESP 0068:f5fe5e70
---[ end trace 9a7a5b8ebfdbeebf ]---

Vegard
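
The corruption method described in the last message (flip a number of bits at random before mounting) can be sketched roughly as follows. This is not the script used in the thread, which is not shown. The function name `flip_random_bits` and its parameters are hypothetical, and the sketch works on an in-memory copy of the image so the original file is never touched.

```python
import random
from typing import Optional


def flip_random_bits(data: bytes, count: int, seed: Optional[int] = None) -> bytes:
    """Return a copy of data with `count` randomly chosen bits inverted.

    The same bit may be picked more than once, so fewer than `count`
    bits can end up differing from the input.
    """
    rng = random.Random(seed)  # a fixed seed makes a corruption repeatable
    buf = bytearray(data)
    if not buf:
        return bytes(buf)
    for _ in range(count):
        pos = rng.randrange(len(buf) * 8)   # pick a bit index in the image
        buf[pos // 8] ^= 1 << (pos % 8)     # invert that single bit
    return bytes(buf)


# Usage on a stand-in buffer; a real run would read a filesystem image
# file, corrupt the bytes, and write them to a new file to loop-mount.
image = bytes(4096)
corrupted = flip_random_bits(image, 10, seed=1234)
print(sum(bin(a ^ b).count("1") for a, b in zip(image, corrupted)))
```

Recording the seed alongside each crash is one way to regenerate the same corrupted image later, which is useful when a developer asks for a reproducible case as happened in this thread.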