* Metadata corruption detected at xfs_agf block
@ 2016-07-18 11:25 Eryu Guan
2016-07-18 18:55 ` Eric Sandeen
0 siblings, 1 reply; 2+ messages in thread
From: Eryu Guan @ 2016-07-18 11:25 UTC
To: xfs
Hi,
I hit metadata corruption reported by xfs_repair after running fsstress
on the test XFS.
# xfs_repair -n /dev/mapper/testvg-testlv
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x59fa001/0x200
flfirst 118 in agf 3 too large (max = 118)
agf 118 freelist blocks bad, skipping freelist scan
sb_fdblocks 15716842, counted 15716838
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 2
- agno = 3
- agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
Kernel is 4.7-rc7, xfsprogs is v4.3.0 (v4.5.0/v4.7-rc1 reported no
corruption, I think that's because of commit 96f859d ("libxfs: pack the
agfl header structure so XFS_AGFL_SIZE is correct"))
This is similar to this thread:
new fs, xfs_admin new label, metadata corruption detected
http://oss.sgi.com/archives/xfs/2016-03/msg00297.html
which ended up as a new patch in the growfs code, commit ad747e3b2996
("xfs: Don't wrap growfs AGFL indexes"), so I think I'd better report
this similar issue anyway, though I'm not sure if it's really a bug.
It's not reproducible every time, but I can reproduce it by running this in a loop:
i=0; ret=0
mkfs -t xfs -f /dev/mapper/testvg-testlv
while [ $i -lt 10 -a $ret -eq 0 ]; do
    mount /dev/mapper/testvg-testlv /mnt/xfs
    fsstress -d /mnt/xfs -n 1000 -p 1000
    umount /mnt/xfs
    xfs_repair -n /dev/mapper/testvg-testlv
    ret=$?
    ((i++))
done
mkfs.xfs output
meta-data=/dev/mapper/testvg-testlv isize=512    agcount=4, agsize=3931136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=15724544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=7678, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
lvm info
[root@bootp-73-5-205 ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vda10 testvg lvm2 a-- 15.00g 0
/dev/vda7 testvg lvm2 a-- 15.00g 0
/dev/vda8 testvg lvm2 a-- 15.00g 0
/dev/vda9 testvg lvm2 a-- 15.00g 0
[root@bootp-73-5-205 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
testvg 4 1 0 wz--n- 59.98g 0
[root@bootp-73-5-205 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
testlv testvg -wi-a----- 59.98g
host info (x86_64 kvm guest running on RHEL6 host)
[root@bootp-73-5-205 ~]# uname -a
Linux localhost.localdomain 4.7.0-rc7 #21 SMP Fri Jul 15 12:50:03 CST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@bootp-73-5-205 ~]# free -m
total used free shared buff/cache available
Mem: 7983 113 5569 8 2299 7577
Swap: 8191 0 8191
[root@bootp-73-5-205 ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 13
Model name: QEMU Virtual CPU version (cpu64-rhel6)
Stepping: 3
CPU MHz: 2892.748
BogoMIPS: 5785.49
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0-3
Thanks,
Eryu
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Metadata corruption detected at xfs_agf block
2016-07-18 11:25 Metadata corruption detected at xfs_agf block Eryu Guan
@ 2016-07-18 18:55 ` Eric Sandeen
0 siblings, 0 replies; 2+ messages in thread
From: Eric Sandeen @ 2016-07-18 18:55 UTC
To: xfs
On 7/18/16 4:25 AM, Eryu Guan wrote:
> Hi,
>
> I hit metadata corruption reported by xfs_repair after running fsstress
> on the test XFS.
>
> # xfs_repair -n /dev/mapper/testvg-testlv
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> Metadata corruption detected at xfs_agf block 0x59fa001/0x200
> flfirst 118 in agf 3 too large (max = 118)
^^^ ^^^
FWIW, this confusing output was fixed by:
6aa32b4 xfs_repair: fix agf limit error messages
so today it would say:
flfirst 118 in agf 3 too large (max = 117)
> agf 118 freelist blocks bad, skipping freelist scan
> sb_fdblocks 15716842, counted 15716838
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 1
> - agno = 2
> - agno = 3
> - agno = 0
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
>
> Kernel is 4.7-rc7, xfsprogs is v4.3.0 (v4.5.0/v4.7-rc1 reported no
> corruption, I think that's because of commit 96f859d ("libxfs: pack the
> agfl header structure so XFS_AGFL_SIZE is correct"))
Hm, this does seem related.
> This is similar to this thread:
>
> new fs, xfs_admin new label, metadata corruption detected
> http://oss.sgi.com/archives/xfs/2016-03/msg00297.html
That one did have a growfs step, which you don't have, right?
> which ended up as a new patch in the growfs code, commit ad747e3b2996
> ("xfs: Don't wrap growfs AGFL indexes"), so I think I'd better report
> this similar issue anyway, though I'm not sure if it's really a bug.
Ok, interesting, I thought growfs was the only path to this.
/*
 * Size of the AGFL.  For CRC-enabled filesystems we steal a couple of
 * slots in the beginning of the block for a proper header with the
 * location information and CRC.
 */
#define XFS_AGFL_SIZE(mp) \
	(((mp)->m_sb.sb_sectsize - \
	 (xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
		sizeof(struct xfs_agfl) : 0)) / \
	  sizeof(xfs_agblock_t))
so the packed version of struct xfs_agfl is smaller (36 vs 40), and so
yields a larger XFS_AGFL_SIZE (119 vs 118 in this case) and thus a
larger possible index (118 vs 117)
The (older) repair code you ran thinks 117 is the max index, but the
(newer) kernel created 118. So this is newer kernel + older userspace,
that all makes sense so far.
xfs_alloc_get_freelist():
	be32_add_cpu(&agf->agf_flfirst, 1);
	xfs_trans_brelse(tp, agflbp);
	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp)) // 119
		agf->agf_flfirst = 0;
so I guess this is the non-growfs case that can hit this as well, and
we can end up with agf_flfirst == 118 when the repair code thinks
117 is the max permissible. It's just less likely than the growfs
case. Now, how to fix this one for all combinations... :(
-Eric