* corrupted log causes infinite loop at mount
From: Eric Sandeen @ 2006-10-13 19:46 UTC
To: xfs
While playing with some filesystem corruption testers, I ran into this.
http://sandeen.net/xfs.31.img.bz2
If you try to mount, it gets into xfs_buf_get_noaddr via log replay with
a len of 0, and I think this causes an infinite loop in the goto:
try_again:
	data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL);
	if (unlikely(data == NULL))
		goto fail_free_buf;

	/* check whether alignment matches.. */
	if ((__psunsigned_t)data !=
	    ((__psunsigned_t)data & ~target->bt_smask)) {
		/* .. else double the size and try again */
		kmem_free(data, malloc_len);
		malloc_len <<= 1;
		goto try_again;
	}
Up the callchain a bit there is an ASSERT that the size is > 0, but of
course that doesn't help on a non-debug kernel...
haven't had time to investigate beyond that.
-Eric
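
Editorial sketch of the suspected loop: if the length passed in is 0 and the alignment check keeps failing, the "double the size" step cannot make progress, because shifting zero left still gives zero. The helper name size_after_doublings is hypothetical and only models the arithmetic, not the XFS buffer code itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: apply the "malloc_len <<= 1" retry step n times
 * and return the resulting length.
 */
size_t size_after_doublings(size_t malloc_len, int n)
{
	assert(n >= 0);
	while (n-- > 0)
		malloc_len <<= 1;	/* 0 << 1 == 0, so a zero length never grows */
	return malloc_len;
}
```

Starting from 512, three retries reach 4096; starting from 0, any number of retries leaves the length at 0, so a retry loop with no other exit condition would spin.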

* Re: corrupted log causes infinite loop at mount
From: David Chatterton @ 2006-10-15 5:51 UTC
To: Eric Sandeen; +Cc: xfs

Eric,

Eric Sandeen wrote:
> While playing with some filesystem corruption testers, I ran into this.
>
> http://sandeen.net/xfs.31.img.bz2
>
> If you try to mount, it gets into xfs_buf_get_noaddr via log replay with
> a len of 0, and I think this causes an infinite loop in the goto:
>
> try_again:
> 	data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL);
> 	if (unlikely(data == NULL))
> 		goto fail_free_buf;
>
> 	/* check whether alignment matches.. */
> 	if ((__psunsigned_t)data !=
> 	    ((__psunsigned_t)data & ~target->bt_smask)) {
> 		/* .. else double the size and try again */
> 		kmem_free(data, malloc_len);
> 		malloc_len <<= 1;
> 		goto try_again;
> 	}
>
> Up the callchain a bit there is an ASSERT that the size is > 0, but of
> course that doesn't help on a non-debug kernel...
>
> haven't had time to investigate beyond that.
>
> -Eric
>

I assume the loop is further up the chain since kmem_alloc should return NULL
when asked to alloc 0. So then the problem also lies further up the chain in
checking for a 0 length before calling down, and/or not assuming we are out of
memory when xfs_buf_get_noaddr fails.

David

-- 
David Chatterton
XFS Engineering Manager
SGI Australia

* Re: corrupted log causes infinite loop at mount
From: Eric Sandeen @ 2006-10-15 14:02 UTC
To: chatz; +Cc: xfs

David Chatterton wrote:
> I assume the loop is further up the chain since kmem_alloc should return NULL
> when asked to alloc 0. So then the problem also lies further up the chain in
> checking for a 0 length before calling down, and/or not assuming we are out of
> memory when xfs_buf_get_noaddr fails.

Well, I set kdb breakpoints, and we only entered xfs_buf_get_noaddr once, so I
assume it's looping inside. But I was looking for bugs on, um, another
filesystem at the time, so didn't investigate much.

I can put it on my list of spare-time bugs to look at, or just thought you guys
may be interested as well.

-Eric

p.s. ok can't help but look just a bit further... a test module which does:

int __init test_init(void)
{
	void *data;
	int size = 0;

	data = kmalloc(size, GFP_KERNEL);
	if (data == NULL) {
		printk("got NULL for alloc return\n");
		return -1;
	} else {
		printk("allocated %d bytes at %p\n", size, data);
		return 0;
	}
}

yields:

allocated 0 bytes at ffff810029d88480

not NULL... nifty eh!

-Eric
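
Editorial sketch: since the test module shows that a zero-byte request can come back non-NULL, a caller that wants "NULL on zero" has to enforce it before reaching the allocator. The following is a userspace analogue using malloc rather than kmalloc; the wrapper name alloc_nonzero is hypothetical and is not part of XFS or the kernel.

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper: reject a zero-byte request up front instead of
 * relying on what the underlying allocator returns for size 0.
 */
void *alloc_nonzero(size_t size)
{
	if (size == 0)
		return NULL;
	return malloc(size);
}
```

With a wrapper like this, a retry loop that tests only for NULL would bail out on a zero length rather than depend on allocator behaviour.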

* Re: corrupted log causes infinite loop at mount
From: Eric Sandeen @ 2006-10-18 3:47 UTC
To: Eric Sandeen; +Cc: chatz, xfs

Eric Sandeen wrote:
> David Chatterton wrote:
>> I assume the loop is further up the chain since kmem_alloc should return
>> NULL when asked to alloc 0. So then the problem also lies further up the
>> chain in checking for a 0 length before calling down, and/or not assuming
>> we are out of memory when xfs_buf_get_noaddr fails.
>
> Well, I set kdb breakpoints, and we only entered xfs_buf_get_noaddr
> once, so I assume it's looping inside. But I was looking for bugs on,
> um, another filesystem at the time, so didn't investigate much.
>
> I can put it on my list of spare-time bugs to look at, or just thought
> you guys may be interested as well.

<spare-time>

well, as a quick fix, this seems to do the trick:

--- linux-2.6.18.orig/fs/xfs/xfs_log_recover.c
+++ linux-2.6.18/fs/xfs/xfs_log_recover.c
@@ -75,6 +75,9 @@ xlog_get_bp(
 	int		num_bblks)
 {
 	ASSERT(num_bblks > 0);
+	if (num_bblks <= 0) {
+		return NULL;
+	}
 
 	if (log->l_sectbb_log) {
 		if (num_bblks > 1)

but it's not the most helpful output:

XFS: Log inconsistent (didn't find previous header)
XFS: empty log check failed
XFS: log mount/recovery failed: error 5
XFS: log mount failed

... but not that bad I guess.
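
Editorial sketch of why the quick fix pairs the ASSERT with an explicit check: a debug-only assertion compiles away in a non-debug build, so only the runtime test protects the caller. This is a userspace model, not the kernel code; get_bp_sketch, the DEBUG switch and the 512-byte BLOCK_SIZE are assumptions made for illustration.

```c
#include <stdlib.h>

/* Debug-only assertion: expands to nothing unless DEBUG is defined. */
#ifdef DEBUG
#include <assert.h>
#define ASSERT(expr)	assert(expr)
#else
#define ASSERT(expr)	((void)0)
#endif

#define BLOCK_SIZE	512	/* assumed block size for this sketch */

/* Hypothetical stand-in for xlog_get_bp(): NULL for a non-positive count. */
void *get_bp_sketch(int num_bblks)
{
	ASSERT(num_bblks > 0);		/* only fires in debug builds */
	if (num_bblks <= 0)		/* still guards non-debug builds */
		return NULL;
	return malloc((size_t)num_bblks * BLOCK_SIZE);
}
```

In a non-debug build, get_bp_sketch(0) returns NULL and the caller sees an ordinary allocation failure, which matches the "log mount failed" style of error output shown above.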

it's getting the 0 allocation because last & start are equal here:

num_blks 0 last 2756 start 2756
 [<dec87bd5>] xlog_find_verify_log_record+0x45/0x2f2 [xfs]
 [<dec88254>] xlog_find_tail+0x20e/0xb8d [xfs]
 [<dec88be9>] xlog_recover+0x16/0x22d [xfs]
 [<dec84338>] xfs_log_mount+0x4e4/0x530 [xfs]
 [<dec8b2af>] xfs_mountfs+0xa58/0xf61 [xfs]
 [<dec7dd7b>] xfs_ioinit+0x1e/0x23 [xfs]
 [<dec91db8>] xfs_mount+0x7a8/0x875 [xfs]
 [<deca2713>] vfs_mount+0x17/0x1a [xfs]
 [<deca25b5>] xfs_fs_fill_super+0x6c/0x1b3 [xfs]
 [<c047bd7b>] get_sb_bdev+0xd1/0x11f
 [<deca1ac9>] xfs_fs_get_sb+0x20/0x25 [xfs]
 [<c047b933>] vfs_kern_mount+0x83/0xf6
 [<c047b9e8>] do_kern_mount+0x2d/0x3e
 [<c048ee67>] do_mount+0x5fe/0x671
 [<c048ef51>] sys_mount+0x77/0xae
 [<c0403fb3>] syscall_call+0x7/0xb

and they're equal because in xlog_find_zeroed():

	start_blk = last_blk - num_scan_bblks;
	/* here, start 2756 last 3268 num_scan_bblks 512 */

	/*
	 * We search for any instances of cycle number 0 that occur before
	 * our current estimate of the head.  What we're trying to detect is
	 *        1 ... | 0 | 1 | 0...
	 *                       ^ binary search ends here
	 */
	if ((error = xlog_find_verify_cycle(log, start_blk,
					 (int)num_scan_bblks, 0, &new_blk)))
		goto bp_err;
	if (new_blk != -1)
		last_blk = new_blk;
	/* now new last_blk == new_blk == 2756, same as start */

	/*
	 * Potentially backup over partial log record write.  We don't need
	 * to search the end of the log because we know it is zero.
	 */
	if ((error = xlog_find_verify_log_record(log, start_blk,
				&last_blk, 0)) == -1) {

Maybe that's enough for Tim to come up with a better check :)

-Eric
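
Editorial sketch of the block arithmetic described above, using the numbers from the message (last 3268, num_scan_bblks 512, new_blk 2756). It assumes, as the printed values suggest, that the num_blks handed down for allocation is last minus start; scan_range_len is a hypothetical name and the function models only the arithmetic, not the real xlog_find_zeroed().

```c
/*
 * Hypothetical model: start_blk is derived from the original last_blk,
 * then last_blk is overwritten by new_blk when the cycle search finds
 * a match (new_blk != -1). The return value is last - start.
 */
int scan_range_len(int last_blk, int num_scan_bblks, int new_blk)
{
	int start_blk = last_blk - num_scan_bblks;

	if (new_blk != -1)
		last_blk = new_blk;
	return last_blk - start_blk;
}
```

With the values from the corrupted image, start_blk is 3268 - 512 = 2756 and new_blk is also 2756, so the range length comes out as 0. When the search finds nothing (new_blk == -1) the length stays at 512.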