* Corruption errors with growfs
From: Fredrik Tolf @ 2013-10-04 7:19 UTC
To: xfs
Dear list,

I recently consolidated two filesystems that had previously been
separate; I'll refer to them below as /home and /home/pub (since that's
what they're actually called).

When I first did so, I needed to grow /home quite a bit to accommodate
the files in /home/pub. This is on LVM, so I extended the LV (from about
550 GB to about 3.5 TB) and tried to grow the filesystem. At this point
I encountered an error about corruption, prompting me to unmount the
filesystem and run xfs_repair on it. I did so; it completed successfully
and retained the filesystem at the size it was supposed to be grown to,
so I ascribed the errors to latent corruption left by some older kernel
version and went on with my life.

However, today I tried to grow the filesystem by another 500 GB and
again encountered a very similar error. Clearly, this couldn't just be
left-over corruption from some earlier kernel bug, since I'm still using
the exact same kernel. What's worse, however, is that this time
xfs_repair restored the filesystem to its size prior to running growfs,
so it seems I can't grow the filesystem and am stuck at its current
size.

Does someone know what is happening, and what I can do to fix it?

The kernel I'm running is vanilla Linux 3.10.5, and I'm using xfsprogs
3.1.4 (standard Debian Squeeze version). I unfortunately lost the
original error I got from growfs since I had to close that terminal to
unmount the filesystem (and I didn't think of copying it), but this is
in the dmesg, and probably more useful anyway:
[205909.076160] ffff88002e8f6200: 58 46 53 42 00 00 10 00 00 00 00 00 48 c0 00 00 XFSB........H...
[205909.085110] ffff88002e8f6210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[205909.093986] ffff88002e8f6220: f6 18 71 30 51 8c 4a 63 a5 86 cc e2 91 35 16 f3 ..q0Q.Jc.....5..
[205909.104116] ffff88002e8f6230: 00 00 00 00 04 00 00 04 00 00 00 00 00 00 00 80 ................
[205909.113001] XFS (dm-1): Internal error xfs_sb_read_verify at line 730 of file fs/xfs/xfs_mount.c. Caller 0xffffffffa02471da
[205909.113001]
[205909.125884] CPU: 1 PID: 303 Comm: kworker/1:1H Not tainted 3.10.5 #1
[205909.136336] Hardware name: /M57SLI-S4, BIOS FE 11/22/2007
[205909.138721] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[205909.146918] ffffffff8138be52 ffff88003fc92dc0 ffffffffa0248fb5 ffffffffa02471da
[205909.159978] ffffffff000002da ffff880037b36360 ffff8800314a3e00 0000000000000075
[205909.168660] ffff880037cce000 ffff88002e8f6200 ffffffffa028be6d ffffffffa02471da
[205909.176469] Call Trace:
[205909.179155] [<ffffffff8138be52>] ? dump_stack+0x10/0x1e
[205909.184593] [<ffffffffa0248fb5>] ? xfs_corruption_error+0x54/0x6f [xfs]
[205909.191397] [<ffffffffa02471da>] ? xfs_buf_iodone_work+0x40/0x77 [xfs]
[205909.198177] [<ffffffffa028be6d>] ? xfs_sb_read_verify+0xa9/0xc8 [xfs]
[205909.204812] [<ffffffffa02471da>] ? xfs_buf_iodone_work+0x40/0x77 [xfs]
[205909.211540] [<ffffffff8138d038>] ? __schedule+0x514/0x541
[205909.217148] [<ffffffffa02471da>] ? xfs_buf_iodone_work+0x40/0x77 [xfs]
[205909.223857] [<ffffffff8104aaf2>] ? process_one_work+0x1f9/0x2fc
[205909.229974] [<ffffffff8104ad52>] ? worker_thread+0x15d/0x268
[205909.235827] [<ffffffff8104abf5>] ? process_one_work+0x2fc/0x2fc
[205909.241977] [<ffffffff8104ea6b>] ? kthread_freezable_should_stop+0x56/0x56
[205909.249037] [<ffffffff8104abf5>] ? process_one_work+0x2fc/0x2fc
[205909.255153] [<ffffffff8104eb16>] ? kthread+0xab/0xb3
[205909.260311] [<ffffffff8104ea6b>] ? kthread_freezable_should_stop+0x56/0x56
[205909.267380] [<ffffffff8139336c>] ? ret_from_fork+0x7c/0xb0
[205909.273070] [<ffffffff8104ea6b>] ? kthread_freezable_should_stop+0x56/0x56
[205909.280126] XFS (dm-1): Corruption detected. Unmount and run xfs_repair
[205909.286853] XFS (dm-1): metadata I/O error: block 0x32000000 ("xfs_trans_read_buf_map") error 117 numblks 1
[205909.296703] XFS (dm-1): error 117 reading secondary superblock for ag 16
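For anyone wanting to read the hex dump above by hand: the first 64 bytes can be decoded as the leading superblock fields. The following is only a minimal sketch, assuming the usual big-endian on-disk layout (magic, block size, data block count, realtime block and extent counts, UUID, log start, root inode); the function name decode_sb_head is made up for illustration and is not part of xfsprogs.

```python
import struct
import uuid

# First 64 bytes of the buffer, copied from the hex dump in the dmesg output.
DUMP = """
58 46 53 42 00 00 10 00 00 00 00 00 48 c0 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f6 18 71 30 51 8c 4a 63 a5 86 cc e2 91 35 16 f3
00 00 00 00 04 00 00 04 00 00 00 00 00 00 00 80
"""


def decode_sb_head(raw: bytes) -> dict:
    """Decode the leading superblock fields (assumed layout, big-endian)."""
    # 4-byte magic, 32-bit block size, then three 64-bit counters (32 bytes).
    magic, blocksize, dblocks, rblocks, rextents = struct.unpack_from(
        ">4sIQQQ", raw, 0
    )
    # 16-byte filesystem UUID follows at offset 32.
    fs_uuid = uuid.UUID(bytes=raw[32:48])
    # Log start block and root inode number, 64 bits each, at offset 48.
    logstart, rootino = struct.unpack_from(">QQ", raw, 48)
    return {
        "magic": magic,
        "blocksize": blocksize,
        "dblocks": dblocks,
        "rblocks": rblocks,
        "rextents": rextents,
        "uuid": fs_uuid,
        "logstart": logstart,
        "rootino": rootino,
    }


raw = bytes.fromhex("".join(DUMP.split()))
head = decode_sb_head(raw)
for name, value in head.items():
    print(f"{name:10s} {value}")
```

Under that assumed layout the magic is "XFSB", the block size is 4096 and the data block count is 1220542464, so the start of this superblock looks sane; the dump only covers the first 64 bytes and says nothing about the rest of the sector.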
This is what xfs_info has to say about the filesystem:
meta-data=/dev/mapper/ravol-home isize=256    agcount=187, agsize=6553600 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=1220542464, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
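The xfs_info numbers can be tied back to the dmesg output with a little arithmetic. This is a rough sketch, assuming that the block number in the "metadata I/O error" line is counted in 512-byte units and that each allocation group starts with its superblock; the helper names are invented for this example.

```python
SECTOR_SIZE = 512  # assumed unit of the "block 0x..." value in dmesg


def secondary_sb_daddr(agno: int, agsize_blks: int, blocksize: int) -> int:
    """Address (in 512-byte units) of the first block of AG number agno."""
    return agno * agsize_blks * (blocksize // SECTOR_SIZE)


def ag_count(dblocks: int, agsize_blks: int) -> int:
    """Number of AGs needed to cover dblocks (the last AG may be short)."""
    return -(-dblocks // agsize_blks)  # ceiling division


# Values taken from the xfs_info output above.
AGSIZE = 6553600
BSIZE = 4096
DBLOCKS = 1220542464

print(hex(secondary_sb_daddr(16, AGSIZE, BSIZE)))  # 0x32000000
print(ag_count(DBLOCKS, AGSIZE))                   # 187
```

Under those assumptions AG 16 starts at 0x32000000, which matches the block reported in the I/O error, and the AG count agrees with agcount=187.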
Thanks for reading!
--
Fredrik Tolf
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Corruption errors with growfs
From: Dave Chinner @ 2013-10-04 11:52 UTC
To: Fredrik Tolf; +Cc: xfs
On Fri, Oct 04, 2013 at 09:19:06AM +0200, Fredrik Tolf wrote:
> Does someone know what is happening, and what I can do to fix it?
Old kernel versions didn't zero the unused part of the secondary
superblocks when growing the filesystem, so the new secondary
superblocks could end up containing garbage. This commit in 3.8 fixed
the kernel growfs code:

commit 1375cb65e87b327a8dd4f920c3e3d837fb40e9c2
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Oct 9 14:50:52 2012 +1100

    xfs: growfs: don't read garbage for new secondary superblocks

    When updating new secondary superblocks in a growfs operation, the
    superblock buffer is read from the newly grown region of the
    underlying device. This is not guaranteed to be zero, so violates
    the underlying assumption that the unused parts of superblocks are
    zero filled. Get a new buffer for these secondary superblocks to
    ensure that the unused regions are zero filled correctly.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>

The only time the kernel reads secondary superblocks is during a
growfs operation, so that's the only time the kernel will detect
such an error. More extensive validity tests were added during 3.9
and 3.10, and these now throw corruption errors over secondary
superblocks that have not been correctly zeroed.
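To illustrate the kind of check involved, here is a minimal sketch that looks for nonzero bytes beyond the in-use fields of a superblock sector. It is not the kernel verifier: the used_len parameter (the length of the fields actually in use) is a hypothetical input that depends on the superblock version, and the sample sector is fabricated for the example.

```python
from typing import List


def nonzero_tail(sector: bytes, used_len: int) -> List[int]:
    """Return the offsets of nonzero bytes at or beyond used_len."""
    return [off for off in range(used_len, len(sector)) if sector[off] != 0]


# A made-up 512-byte sector: 64 "used" bytes, then zeroes.
USED_LEN = 64  # hypothetical; the real value depends on the sb version
clean = b"XFSB" + bytes(60) + bytes(512 - 64)

# The same sector with one stray byte in the supposedly unused region,
# as could happen if the buffer was read from uninitialised storage.
dirty = bytearray(clean)
dirty[300] = 0xAB

print(nonzero_tail(clean, USED_LEN))         # []
print(nonzero_tail(bytes(dirty), USED_LEN))  # [300]
```

A sector for which this returns a non-empty list is the sort of superblock the newer verifiers would reject, even though its in-use fields are perfectly valid.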
To fix this, you need to grab xfsprogs from the git repo
(3.2.0-alpha will do) as this commit to xfs_repair detects and fixes
the corrupted superblocks:

commit cbd7508db4c9597889ad98d5f027542002e0e57c
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Thu Aug 15 02:26:40 2013 +0000

    xfs_repair: zero out unused parts of superblocks

    Prior to:

      1375cb65 xfs: growfs: don't read garbage for new secondary superblocks

    we ran the risk of allowing garbage in secondary superblocks
    beyond the in-use sb fields. With kernels 3.10 and beyond, the
    verifiers will kick these out as invalid, but xfs_repair does
    not detect or repair this condition.

    There is superblock stale-data zeroing code, but it is under a
    narrow conditional - the bug addressed in the above commit did not
    meet that conditional. So change this to check unconditionally.

    Further, the checking code was looking at the in-memory
    superblock buffer, which was zeroed prior to population, and
    would therefore never possibly show any stale data beyond the
    last up-rev superblock field.

    So instead, check the disk buffer for this garbage condition.

    If we detect garbage, we must zero out both the in-memory sb
    and the disk buffer; the former may contain unused data
    in up-rev sb fields which will be written back out; the latter
    may contain garbage beyond all fields, which won't be updated
    when we translate the in-memory sb back to disk.

    The V4 superblock case was zeroing out the sb_bad_features2
    field; we also fix that to leave that field alone.

    Lastly, use offsetof() instead of the tortured (__psint_t)
    casts & pointer math.

    Reported-by: Michael Maier <m1278468@allmail.net>
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Rich Johnston <rjohnston@sgi.com>
    Signed-off-by: Rich Johnston <rjohnston@sgi.com>

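The repair step described in that commit message boils down to "if anything beyond the in-use fields is nonzero, zero it". A loose sketch of that idea follows; it is not the xfs_repair code, the function name zero_unused_tail and the used_len value are assumptions made for illustration, and the buffer is a fabricated example rather than real disk data.

```python
def zero_unused_tail(buf: bytearray, used_len: int) -> bool:
    """Zero everything at or beyond used_len; return True if buf changed."""
    tail = buf[used_len:]
    if not any(tail):
        return False  # unused region already clean, nothing to do
    buf[used_len:] = bytes(len(tail))
    return True


USED_LEN = 64  # hypothetical length of the in-use superblock fields

# Stand-in for the on-disk superblock buffer, with garbage past the fields.
disk_buf = bytearray(512)
disk_buf[:4] = b"XFSB"
disk_buf[200:204] = b"\xde\xad\xbe\xef"

first_pass = zero_unused_tail(disk_buf, USED_LEN)
second_pass = zero_unused_tail(disk_buf, USED_LEN)
print(first_pass, second_pass)
```

The in-use fields are left untouched, and a second pass over the same buffer finds nothing to fix, which is the behaviour one would hope for from a repaired superblock.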
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Corruption errors with growfs
From: Fredrik Tolf @ 2013-10-13 2:54 UTC
To: Dave Chinner; +Cc: xfs
On Fri, 4 Oct 2013, Dave Chinner wrote:
> To fix this, you need to grab xfsprogs from the git repo
> (3.2.0-alpha will do) as this commit to xfs_repair detects and fixes
> the corrupted superblocks
I see; that makes sense. Thanks a lot!
--
Fredrik Tolf