public inbox for linux-xfs@vger.kernel.org
* Corruption errors with growfs
@ 2013-10-04  7:19 Fredrik Tolf
  2013-10-04 11:52 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Fredrik Tolf @ 2013-10-04  7:19 UTC
  To: xfs

Dear list,

I recently consolidated two filesystems that have previously been 
separate; I'll refer to them below as /home and /home/pub (since that's 
what they're actually called).

When I first did so, I needed to grow /home quite a bit to accommodate the 
files in /home/pub. This is on LVM, so I extended the LV quite a bit 
(from about 550 GB to about 3.5 TB), and tried to grow the filesystem. At 
this point I encountered an error about corruption, prompting me to 
unmount the filesystem and run xfs_repair on it. I did so, it completed 
successfully and retained the filesystem at the size it was supposed to be 
grown to, so I ascribed the errors to some latent corruption by some older 
kernel version or something and went on with my life.

However, today I tried to grow the filesystem by another 500 GB, 
encountering again a very similar error. Clearly, this couldn't just be 
left-over corruption from some earlier kernel bug since I'm still using 
the exact same kernel. What's worse, however, is that xfs_repair restored 
the filesystem to its size prior to running growfs, so it seems I can't 
grow the filesystem and am stuck at its current size.

Does someone know what is happening, and what I can do to fix it?

The kernel I'm running is vanilla Linux 3.10.5, and I'm using xfsprogs 
3.1.4 (standard Debian Squeeze version). I unfortunately lost the original 
error I got from growfs since I had to close that terminal to unmount the 
filesystem (and I didn't think of copying it), but this is in the dmesg, 
and probably more useful anyway:

[205909.076160] ffff88002e8f6200: 58 46 53 42 00 00 10 00 00 00 00 00 48 c0 00 00  XFSB........H...
[205909.085110] ffff88002e8f6210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[205909.093986] ffff88002e8f6220: f6 18 71 30 51 8c 4a 63 a5 86 cc e2 91 35 16 f3  ..q0Q.Jc.....5..
[205909.104116] ffff88002e8f6230: 00 00 00 00 04 00 00 04 00 00 00 00 00 00 00 80  ................
[205909.113001] XFS (dm-1): Internal error xfs_sb_read_verify at line 730 of file fs/xfs/xfs_mount.c.  Caller 0xffffffffa02471da
[205909.113001]
[205909.125884] CPU: 1 PID: 303 Comm: kworker/1:1H Not tainted 3.10.5 #1
[205909.136336] Hardware name:    /M57SLI-S4, BIOS FE 11/22/2007
[205909.138721] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[205909.146918]  ffffffff8138be52 ffff88003fc92dc0 ffffffffa0248fb5 ffffffffa02471da
[205909.159978]  ffffffff000002da ffff880037b36360 ffff8800314a3e00 0000000000000075
[205909.168660]  ffff880037cce000 ffff88002e8f6200 ffffffffa028be6d ffffffffa02471da
[205909.176469] Call Trace:
[205909.179155]  [<ffffffff8138be52>] ? dump_stack+0x10/0x1e
[205909.184593]  [<ffffffffa0248fb5>] ? xfs_corruption_error+0x54/0x6f [xfs]
[205909.191397]  [<ffffffffa02471da>] ? xfs_buf_iodone_work+0x40/0x77 [xfs]
[205909.198177]  [<ffffffffa028be6d>] ? xfs_sb_read_verify+0xa9/0xc8 [xfs]
[205909.204812]  [<ffffffffa02471da>] ? xfs_buf_iodone_work+0x40/0x77 [xfs]
[205909.211540]  [<ffffffff8138d038>] ? __schedule+0x514/0x541
[205909.217148]  [<ffffffffa02471da>] ? xfs_buf_iodone_work+0x40/0x77 [xfs]
[205909.223857]  [<ffffffff8104aaf2>] ? process_one_work+0x1f9/0x2fc
[205909.229974]  [<ffffffff8104ad52>] ? worker_thread+0x15d/0x268
[205909.235827]  [<ffffffff8104abf5>] ? process_one_work+0x2fc/0x2fc
[205909.241977]  [<ffffffff8104ea6b>] ? kthread_freezable_should_stop+0x56/0x56
[205909.249037]  [<ffffffff8104abf5>] ? process_one_work+0x2fc/0x2fc
[205909.255153]  [<ffffffff8104eb16>] ? kthread+0xab/0xb3
[205909.260311]  [<ffffffff8104ea6b>] ? kthread_freezable_should_stop+0x56/0x56
[205909.267380]  [<ffffffff8139336c>] ? ret_from_fork+0x7c/0xb0
[205909.273070]  [<ffffffff8104ea6b>] ? kthread_freezable_should_stop+0x56/0x56
[205909.280126] XFS (dm-1): Corruption detected. Unmount and run xfs_repair
[205909.286853] XFS (dm-1): metadata I/O error: block 0x32000000 ("xfs_trans_read_buf_map") error 117 numblks 1
[205909.296703] XFS (dm-1): error 117 reading secondary superblock for ag 16
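For reference, the first 16 bytes of the dumped buffer can be decoded by hand. This is only a minimal sketch, assuming the usual on-disk XFS superblock layout, which starts with a 32-bit magic number, a 32-bit block size and a 64-bit data block count, all big-endian; the variable names are purely illustrative:

```python
import struct

# First row of the hex dump above (address ...6200), copied verbatim.
first_row = bytes.fromhex("58 46 53 42 00 00 10 00 00 00 00 00 48 c0 00 00")

# Assumed layout: sb_magicnum (u32), sb_blocksize (u32), sb_dblocks (u64),
# stored big-endian on disk.
magic, blocksize, dblocks = struct.unpack(">IIQ", first_row)

print(magic.to_bytes(4, "big"))  # the magic number as raw bytes
print(blocksize)                 # filesystem block size in bytes
print(dblocks)                   # number of data blocks
```

The decoded block size and block count agree with the bsize=4096 and blocks=1220542464 that xfs_info reports, which suggests that the leading fields of this superblock are not what the verifier is objecting to.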

This is what xfs_info has to say about the filesystem:

meta-data=/dev/mapper/ravol-home isize=256    agcount=187, agsize=6553600 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=1220542464, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
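As a sanity check, the block number in the I/O error lines up with where the superblock of AG 16 should be. This is a minimal sketch, assuming that each AG's superblock sits in the first sector of the AG and that the "block 0x..." value in the kernel message is counted in 512-byte units; the function name is made up for illustration:

```python
AGSIZE_BLOCKS = 6553600  # agsize from xfs_info, in filesystem blocks
BLOCK_SIZE = 4096        # bsize from xfs_info, in bytes
BASIC_BLOCK = 512        # assumed unit of the block number in dmesg


def ag_start(agno):
    """Start of allocation group `agno`, in 512-byte units."""
    return agno * AGSIZE_BLOCKS * (BLOCK_SIZE // BASIC_BLOCK)


print(hex(ag_start(16)))
```

Under those assumptions AG 16 starts at 0x32000000, the same block the metadata I/O error names, so the failing read is indeed the secondary superblock of that AG.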

Thanks for reading!

--
Fredrik Tolf

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Corruption errors with growfs
  2013-10-04  7:19 Corruption errors with growfs Fredrik Tolf
@ 2013-10-04 11:52 ` Dave Chinner
  2013-10-13  2:54   ` Fredrik Tolf
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2013-10-04 11:52 UTC
  To: Fredrik Tolf; +Cc: xfs

On Fri, Oct 04, 2013 at 09:19:06AM +0200, Fredrik Tolf wrote:
> Dear list,
> 
> I recently consolidated two filesystems that have previously been
> separate; I'll refer to them below as /home and /home/pub (since
> that's what they're actually called).
> 
> When I first did so, I needed to grow /home quite a bit to
> accommodate the files in /home/pub. This is on LVM, so I extended the
> LV quite a bit (from about 550 GB to about 3.5 TB), and tried to
> grow the filesystem. At this point I encountered an error about
> corruption, prompting me to unmount the filesystem and run
> xfs_repair on it. I did so, it completed successfully and retained
> the filesystem at the size it was supposed to be grown to, so I
> ascribed the errors to some latent corruption by some older kernel
> version or something and went on with my life.
> 
> However, today I tried to grow the filesystem by another 500 GB,
> encountering again a very similar error. Clearly, this couldn't just
> be left-over corruption from some earlier kernel bug since I'm still
> using the exact same kernel. What's worse, however, is that
> xfs_repair restored the filesystem to its size prior to running
> growfs, so it seems I can't grow the filesystem and am stuck at its
> current size.
> 
> Does someone know what is happening, and what I can do to fix it?

Old kernel versions didn't zero the empty part of the secondary
superblocks when growing the filesystem. This commit in 3.8 fixed
the kernel growfs code not to put garbage in the new secondary
superblocks.

commit 1375cb65e87b327a8dd4f920c3e3d837fb40e9c2
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Oct 9 14:50:52 2012 +1100

    xfs: growfs: don't read garbage for new secondary superblocks

    When updating new secondary superblocks in a growfs operation, the
    superblock buffer is read from the newly grown region of the
    underlying device. This is not guaranteed to be zero, so violates
    the underlying assumption that the unused parts of superblocks are
    zero filled. Get a new buffer for these secondary superblocks to
    ensure that the unused regions are zero filled correctly.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>


The only time the kernel reads secondary superblocks is during a
growfs operation, so that's the only time the kernel will detect
such an error. More extensive validity tests were added during 3.9
and 3.10, and these now throw corruption errors over secondary
superblocks that have not been correctly zeroed.
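The kind of check involved can be sketched roughly as follows. This is only an illustration in Python, not the kernel verifier or the xfs_repair code; the function name and the `used_bytes` parameter (the offset where the in-use superblock fields end, which depends on the superblock version) are hypothetical:

```python
def stale_tail(sector, used_bytes):
    """Return the offsets of non-zero bytes past the in-use superblock fields."""
    return [off for off in range(used_bytes, len(sector)) if sector[off] != 0]


# Toy 512-byte sector: a plausible start, plus one leftover byte of garbage
# in the region that is supposed to be zero filled.
sector = bytearray(512)
sector[0:4] = b"XFSB"
sector[300] = 0xAB

# 256 is an arbitrary example value, not the real end of the superblock fields.
print(stale_tail(bytes(sector), used_bytes=256))
```

A secondary superblock that was written from a freshly zeroed buffer yields an empty list here; one that was read from the newly grown region of the device may not.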

To fix this, you need to grab xfsprogs from the git repo
(3.2.0-alpha will do) as this commit to xfs_repair detects and fixes
the corrupted superblocks:

commit cbd7508db4c9597889ad98d5f027542002e0e57c
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Thu Aug 15 02:26:40 2013 +0000

    xfs_repair: zero out unused parts of superblocks
    
    Prior to:
    1375cb65 xfs: growfs: don't read garbage for new secondary superblocks
    
    we ran the risk of allowing garbage in secondary superblocks
    beyond the in-use sb fields.  With kernels 3.10 and beyond, the
    verifiers will kick these out as invalid, but xfs_repair does
    not detect or repair this condition.
    
    There is superblock stale-data zeroing code, but it is under a
    narrow conditional - the bug addressed in the above commit did not
    meet that conditional.  So change this to check unconditionally.
    
    Further, the checking code was looking at the in-memory
    superblock buffer, which was zeroed prior to population, and
    would therefore never possibly show any stale data beyond the
    last up-rev superblock field.
    
    So instead, check the disk buffer for this garbage condition.
    
    If we detect garbage, we must zero out both the in-memory sb
    and the disk buffer; the former may contain unused data
    in up-rev sb fields which will be written back out; the latter
    may contain garbage beyond all fields, which won't be updated
    when we translate the in-memory sb back to disk.
    
    The V4 superblock case was zeroing out the sb_bad_features2
    field; we also fix that to leave that field alone.
    
    Lastly, use offsetof() instead of the tortured (__psint_t)
    casts & pointer math.
    
    Reported-by: Michael Maier <m1278468@allmail.net>
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Rich Johnston <rjohnston@sgi.com>
    Signed-off-by: Rich Johnston <rjohnston@sgi.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Corruption errors with growfs
  2013-10-04 11:52 ` Dave Chinner
@ 2013-10-13  2:54   ` Fredrik Tolf
  0 siblings, 0 replies; 3+ messages in thread
From: Fredrik Tolf @ 2013-10-13  2:54 UTC
  To: Dave Chinner; +Cc: xfs

On Fri, 4 Oct 2013, Dave Chinner wrote:
> To fix this, you need to grab xfsprogs from the git repo
> (3.2.0-alpha will do) as this commit to xfs_repair detects and fixes
> the corrupted superblocks

I see; that makes sense. Thanks a lot!

--
Fredrik Tolf
