corrupted log causes infinite loop at mount

public inbox for linux-xfs@vger.kernel.org

* corrupted log causes infinite loop at mount
@ 2006-10-13 19:46 Eric Sandeen
  2006-10-15  5:51 ` David Chatterton
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2006-10-13 19:46 UTC (permalink / raw)
  To: xfs

While playing with some filesystem corruption testers, I ran into this.

http://sandeen.net/xfs.31.img.bz2

If you try to mount, it gets into xfs_buf_get_noaddr via log replay with
a len of 0, and I think this causes an infinite loop in the goto:

 try_again:
        data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL);
        if (unlikely(data == NULL))
                goto fail_free_buf;

        /* check whether alignment matches.. */
        if ((__psunsigned_t)data !=
            ((__psunsigned_t)data & ~target->bt_smask)) {
                /* .. else double the size and try again */
                kmem_free(data, malloc_len);
                malloc_len <<= 1;
                goto try_again;
        }

Up the callchain a bit there is an ASSERT that the size is > 0, but of
course that doesn't help on a non-debug kernel...
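The reason a zero length can spin forever is that doubling zero still gives zero, so every retry asks for the same thing. A minimal userspace sketch of that retry logic follows. It assumes the allocator hands back the same non-NULL, misaligned pointer for every zero-byte request; `fake_alloc`, `count_attempts`, the addresses, and the iteration cap are all hypothetical and exist only so the example terminates.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the allocator. Assumption: a zero-byte request
 * always returns the same small chunk at an address that is not sector
 * aligned, while any other size happens to come back aligned.
 */
static uintptr_t fake_alloc(size_t len)
{
        return len == 0 ? (uintptr_t)0x1020 : (uintptr_t)0x2000;
}

/*
 * Re-creates the "double the size and try again" loop, but gives up after
 * max_tries attempts. Returns the number of attempts actually made.
 */
static unsigned int count_attempts(size_t malloc_len, uintptr_t smask,
                                   unsigned int max_tries)
{
        unsigned int tries = 0;

        while (tries < max_tries) {
                uintptr_t data = fake_alloc(malloc_len);

                tries++;
                if (data == (data & ~smask))
                        break;          /* aligned: the loop exits here */
                malloc_len <<= 1;       /* 0 << 1 is still 0 */
        }
        return tries;
}
```

Under these assumptions, `count_attempts(0, 511, 1000)` uses all 1000 attempts, while `count_attempts(512, 511, 1000)` stops after the first one. Without the cap, the zero-length case would never leave the loop.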

I haven't had time to investigate beyond that.

-Eric

* Re: corrupted log causes infinite loop at mount
  2006-10-13 19:46 corrupted log causes infinite loop at mount Eric Sandeen
@ 2006-10-15  5:51 ` David Chatterton
  2006-10-15 14:02   ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: David Chatterton @ 2006-10-15  5:51 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric,

Eric Sandeen wrote:
> While playing with some filesystem corruption testers, I ran into this.
> 
> http://sandeen.net/xfs.31.img.bz2
> 
> If you try to mount, it gets into xfs_buf_get_noaddr via log replay with
> a len of 0, and I think this causes an infinite loop in the goto:
> 
>  try_again:
>         data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL);
>         if (unlikely(data == NULL))
>                 goto fail_free_buf;
> 
>         /* check whether alignment matches.. */
>         if ((__psunsigned_t)data !=
>             ((__psunsigned_t)data & ~target->bt_smask)) {
>                 /* .. else double the size and try again */
>                 kmem_free(data, malloc_len);
>                 malloc_len <<= 1;
>                 goto try_again;
>         }
> 
> Up the callchain a bit there is an ASSERT that the size is > 0, but of
> course that doesn't help on a non-debug kernel...
> 
> haven't had time to investigate beyond that.
> 
> -Eric
> 

I assume the loop is further up the chain since kmem_alloc should return NULL
when asked to alloc 0. So then the problem also lies further up the chain in
checking for a 0 length before calling down, and/or not assuming we are out of
memory when xfs_buf_get_noaddr fails.
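As a rough illustration of both points, here is a userspace sketch in which the length is checked before allocating and the caller can tell a bad length apart from a real out-of-memory condition. `alloc_log_buf` is a hypothetical helper, not an XFS function, and `malloc` stands in for the kernel allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate len bytes and say why it failed.
 * Returns 0 on success, -EINVAL for a zero length, -ENOMEM when the
 * allocator itself fails. *out is NULL on any failure.
 */
static int alloc_log_buf(size_t len, void **out)
{
        *out = NULL;
        if (len == 0)
                return -EINVAL;         /* bad request, not memory pressure */

        *out = malloc(len);
        if (*out == NULL)
                return -ENOMEM;         /* genuine allocation failure */

        return 0;
}
```

A caller that sees `-EINVAL` here could report corruption instead of treating the failure as an allocation problem.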

David

-- 
David Chatterton
XFS Engineering Manager
SGI Australia

* Re: corrupted log causes infinite loop at mount
  2006-10-15  5:51 ` David Chatterton
@ 2006-10-15 14:02   ` Eric Sandeen
  2006-10-18  3:47     ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2006-10-15 14:02 UTC (permalink / raw)
  To: chatz; +Cc: xfs

David Chatterton wrote:
> I assume the loop is further up the chain since kmem_alloc should return NULL
> when asked to alloc 0. So then the problem also lies further up the chain in
> checking for a 0 length before calling down, and/or not assuming we are out of
> memory when xfs_buf_get_noaddr fails.

Well, I set kdb breakpoints, and we only entered xfs_buf_get_noaddr once, so I 
assume it's looping inside.  But I was looking for bugs on, um, another 
filesystem at the time, so didn't investigate much.

I can put it on my list of spare-time bugs to look at; I just thought you guys 
might be interested as well.

-Eric

p.s. ok can't help but look just a bit further...

a test module which does:

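#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>

/* Ask kmalloc() for zero bytes and report what comes back. */
static int __init test_init(void)
{
         void *data;
         int size = 0;

         data = kmalloc(size, GFP_KERNEL);
         if (data == NULL) {
                 printk(KERN_INFO "got NULL for alloc return\n");
                 return -ENOMEM;
         }

         printk(KERN_INFO "allocated %d bytes at %p\n", size, data);
         kfree(data);    /* don't leak the test allocation */
         return 0;
}
module_init(test_init);
MODULE_LICENSE("GPL");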

yields:

allocated 0 bytes at ffff810029d88480

not NULL... nifty eh!
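The same question can be asked of userspace `malloc`, where the C standard leaves the result of a zero-byte request implementation-defined: it may be NULL or a unique pointer that must not be dereferenced. A small sketch, with `zero_alloc_is_nonnull` as a hypothetical helper name:

```c
#include <stdlib.h>

/* Returns 1 if a zero-byte malloc() yields a non-NULL pointer, else 0. */
static int zero_alloc_is_nonnull(void)
{
        void *data = malloc(0);
        int nonnull = (data != NULL);

        free(data);     /* free(NULL) is a no-op, so this is safe either way */
        return nonnull;
}
```

Either result is legal, so code that treats "NULL" as the only sign of a bad size cannot rely on the allocator to catch a zero length.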

-Eric

* Re: corrupted log causes infinite loop at mount
  2006-10-15 14:02   ` Eric Sandeen
@ 2006-10-18  3:47     ` Eric Sandeen
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Sandeen @ 2006-10-18  3:47 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: chatz, xfs

Eric Sandeen wrote:
> David Chatterton wrote:
>> I assume the loop is further up the chain since kmem_alloc should 
>> return NULL
>> when asked to alloc 0. So then the problem also lies further up the 
>> chain in
>> checking for a 0 length before calling down, and/or not assuming we 
>> are out of
>> memory when xfs_buf_get_noaddr fails.
> 
> Well, I set kdb breakpoints, and we only entered xfs_buf_get_noaddr 
> once, so I assume it's looping inside.  But I was looking for bugs on, 
> um, another filesystem at the time, so didn't investigate much.
> 
> I can put it on my list of spare-time bugs to look at, or just thought 
> you guys may be interested as well.

<spare-time>

well, as a quick fix, this seems to do the trick:

--- linux-2.6.18.orig/fs/xfs/xfs_log_recover.c
+++ linux-2.6.18/fs/xfs/xfs_log_recover.c
@@ -75,6 +75,9 @@ xlog_get_bp(
         int             num_bblks)
  {
         ASSERT(num_bblks > 0);
+       if (num_bblks <= 0) {
+               return NULL;
+       }

         if (log->l_sectbb_log) {
                 if (num_bblks > 1)

but it's not the most helpful output:

XFS: Log inconsistent (didn't find previous header)
XFS: empty log check failed
XFS: log mount/recovery failed: error 5
XFS: log mount failed

... but not that bad I guess.

it's getting the 0 allocation because last & start are equal here:

num_blks 0 last 2756 start 2756

  [<dec87bd5>] xlog_find_verify_log_record+0x45/0x2f2 [xfs]
  [<dec88254>] xlog_find_tail+0x20e/0xb8d [xfs]
  [<dec88be9>] xlog_recover+0x16/0x22d [xfs]
  [<dec84338>] xfs_log_mount+0x4e4/0x530 [xfs]
  [<dec8b2af>] xfs_mountfs+0xa58/0xf61 [xfs]
  [<dec7dd7b>] xfs_ioinit+0x1e/0x23 [xfs]
  [<dec91db8>] xfs_mount+0x7a8/0x875 [xfs]
  [<deca2713>] vfs_mount+0x17/0x1a [xfs]
  [<deca25b5>] xfs_fs_fill_super+0x6c/0x1b3 [xfs]
  [<c047bd7b>] get_sb_bdev+0xd1/0x11f
  [<deca1ac9>] xfs_fs_get_sb+0x20/0x25 [xfs]
  [<c047b933>] vfs_kern_mount+0x83/0xf6
  [<c047b9e8>] do_kern_mount+0x2d/0x3e
  [<c048ee67>] do_mount+0x5fe/0x671
  [<c048ef51>] sys_mount+0x77/0xae
  [<c0403fb3>] syscall_call+0x7/0xb

and they're equal because in xlog_find_zeroed():

         start_blk = last_blk - num_scan_bblks;

/* here, start 2756 last 3268 num_scan_bblks 512 */

         /*
          * We search for any instances of cycle number 0 that occur before
          * our current estimate of the head.  What we're trying to detect is
          *        1 ... | 0 | 1 | 0...
          *                       ^ binary search ends here
          */
         if ((error = xlog_find_verify_cycle(log, start_blk,
                                          (int)num_scan_bblks, 0, &new_blk)))
                 goto bp_err;
         if (new_blk != -1)
                 last_blk = new_blk;

/* now new last_blk == new_blk == 2756, same as start */

         /*
          * Potentially backup over partial log record write.  We don't need
          * to search the end of the log because we know it is zero.
          */
         if ((error = xlog_find_verify_log_record(log, start_blk,
                                 &last_blk, 0)) == -1) {

Maybe that's enough for Tim to come up with a better check :)
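The arithmetic behind the numbers above can be replayed in a few lines. This is only a sketch: it assumes the block count passed down is simply last minus start, as the debug line suggests, and `update_last` and `scan_len` are hypothetical names.

```c
/* Mirrors the update "if (new_blk != -1) last_blk = new_blk;". */
static int update_last(int last_blk, int new_blk)
{
        return new_blk != -1 ? new_blk : last_blk;
}

/* Assumed block count for the verify step: distance from start to last. */
static int scan_len(int start_blk, int last_blk)
{
        return last_blk - start_blk;
}
```

With last 3268 and num_scan_bblks 512, start is 3268 - 512 = 2756. Once the cycle check reports new_blk 2756, `update_last(3268, 2756)` gives 2756, and `scan_len(2756, 2756)` is 0, which is the zero-length request seen in the trace.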

-Eric
