public inbox for linux-xfs@vger.kernel.org
* Metadata corruption detected at xfs_agf block
@ 2016-07-18 11:25 Eryu Guan
  2016-07-18 18:55 ` Eric Sandeen
  0 siblings, 1 reply; 2+ messages in thread
From: Eryu Guan @ 2016-07-18 11:25 UTC (permalink / raw)
  To: xfs

Hi,

I hit metadata corruption reported by xfs_repair after running fsstress
on the test XFS.

# xfs_repair -n /dev/mapper/testvg-testlv
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
Metadata corruption detected at xfs_agf block 0x59fa001/0x200
flfirst 118 in agf 3 too large (max = 118)
agf 118 freelist blocks bad, skipping freelist scan
sb_fdblocks 15716842, counted 15716838
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 0
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

Kernel is 4.7-rc7, xfsprogs is v4.3.0 (v4.5.0/v4.7-rc1 reported no
corruption, I think that's because of commit 96f859d ("libxfs: pack the
agfl header structure so XFS_AGFL_SIZE is correct"))

This is similar to this thread:

new fs, xfs_admin new label, metadata corruption detected
http://oss.sgi.com/archives/xfs/2016-03/msg00297.html

which ended up a new patch in growfs code, commit ad747e3b2996 ("xfs:
Don't wrap growfs AGFL indexes"), so I think I'd better report this
similar issue anyway, though I'm not sure if it's really a bug.

It's not reproducible every time, but I can reproduce it by running this loop:

i=0; ret=0
mkfs -t xfs -f /dev/mapper/testvg-testlv
while [ $i -lt 10 -a $ret -eq 0 ]; do
	mount /dev/mapper/testvg-testlv /mnt/xfs
	fsstress -d /mnt/xfs -n 1000 -p 1000
	umount /mnt/xfs
	xfs_repair -n /dev/mapper/testvg-testlv
	ret=$?
	((i++))
done

mkfs.xfs output
meta-data=/dev/mapper/testvg-testlv isize=512    agcount=4, agsize=3931136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=15724544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=7678, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

lvm info
[root@bootp-73-5-205 ~]# pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/vda10 testvg lvm2 a--  15.00g    0 
  /dev/vda7  testvg lvm2 a--  15.00g    0 
  /dev/vda8  testvg lvm2 a--  15.00g    0 
  /dev/vda9  testvg lvm2 a--  15.00g    0 
[root@bootp-73-5-205 ~]# vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  testvg   4   1   0 wz--n- 59.98g    0 
[root@bootp-73-5-205 ~]# lvs
  LV     VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  testlv testvg -wi-a----- 59.98g

host info (x86_64 kvm guest running on RHEL6 host)
[root@bootp-73-5-205 ~]# uname -a
Linux localhost.localdomain 4.7.0-rc7 #21 SMP Fri Jul 15 12:50:03 CST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@bootp-73-5-205 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983         113        5569           8        2299        7577
Swap:          8191           0        8191
[root@bootp-73-5-205 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 13
Model name:            QEMU Virtual CPU version (cpu64-rhel6)
Stepping:              3
CPU MHz:               2892.748
BogoMIPS:              5785.49
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-3

Thanks,
Eryu

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Metadata corruption detected at xfs_agf block
  2016-07-18 11:25 Metadata corruption detected at xfs_agf block Eryu Guan
@ 2016-07-18 18:55 ` Eric Sandeen
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Sandeen @ 2016-07-18 18:55 UTC (permalink / raw)
  To: xfs

On 7/18/16 4:25 AM, Eryu Guan wrote:
> Hi,
> 
> I hit metadata corruption reported by xfs_repair after running fsstress
> on the test XFS.
> 
> # xfs_repair -n /dev/mapper/testvg-testlv
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> Metadata corruption detected at xfs_agf block 0x59fa001/0x200
> flfirst 118 in agf 3 too large (max = 118)
          ^^^                           ^^^

FWIW, this confusing output was fixed by:

6aa32b4 xfs_repair: fix agf limit error messages

so today it would say:

flfirst 118 in agf 3 too large (max = 117)

> agf 118 freelist blocks bad, skipping freelist scan
> sb_fdblocks 15716842, counted 15716838
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 0
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> 
> Kernel is 4.7-rc7, xfsprogs is v4.3.0 (v4.5.0/v4.7-rc1 reported no
> corruption, I think that's because of commit 96f859d ("libxfs: pack the
> agfl header structure so XFS_AGFL_SIZE is correct"))

Hm, this does seem related.

> This is similar to this thread:
> 
> new fs, xfs_admin new label, metadata corruption detected
> http://oss.sgi.com/archives/xfs/2016-03/msg00297.html

That one did have a growfs step, which you don't have, right?

> which ended up a new patch in growfs code, commit ad747e3b2996 ("xfs:
> Don't wrap growfs AGFL indexes"), so I think I'd better report this
> similar issue anyway, though I'm not sure if it's really a bug.

Ok, interesting, I thought growfs was the only path to this.

/*
 * Size of the AGFL.  For CRC-enabled filesystems we steal a couple of
 * slots in the beginning of the block for a proper header with the
 * location information and CRC.
 */
#define XFS_AGFL_SIZE(mp) \
        (((mp)->m_sb.sb_sectsize - \
         (xfs_sb_version_hascrc(&((mp)->m_sb)) ? \
                sizeof(struct xfs_agfl) : 0)) / \
          sizeof(xfs_agblock_t))

So the packed version of struct xfs_agfl is smaller (36 vs 40 bytes), which
yields a larger XFS_AGFL_SIZE (119 vs 118 slots in this case) and thus a
larger possible index (118 vs 117).
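
To make that arithmetic concrete, here is a minimal standalone sketch. It is
not the libxfs code: agfl_slots() and agfl_max_index() are hypothetical helper
names, the 512-byte sector size comes from the mkfs output above, and the
36/40-byte header sizes are the packed/unpacked values mentioned above.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a libxfs function): number of AGFL slots for a
 * given sector size and header size, following the XFS_AGFL_SIZE()
 * arithmetic. Each slot holds one 32-bit AG block number.
 */
static unsigned int agfl_slots(unsigned int sectsize, unsigned int hdr_size)
{
	return (sectsize - hdr_size) / (unsigned int)sizeof(uint32_t);
}

/* Hypothetical helper: highest valid slot index, i.e. slot count minus one. */
static unsigned int agfl_max_index(unsigned int sectsize, unsigned int hdr_size)
{
	return agfl_slots(sectsize, hdr_size) - 1;
}
```

With sectsize = 512, a 36-byte header gives (512 - 36) / 4 = 119 slots and a
40-byte header gives (512 - 40) / 4 = 118 slots, so the two sides disagree by
exactly one on the largest legal index.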

The (older) repair code you ran thinks 117 is the max index, but the
(newer) kernel created 118.  So this is newer kernel + older userspace,
that all makes sense so far.

xfs_alloc_get_freelist():

        be32_add_cpu(&agf->agf_flfirst, 1);
        xfs_trans_brelse(tp, agflbp);
        if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp)) // 119
                agf->agf_flfirst = 0;

so I guess this is the non-growfs case that can hit this as well, and
we can end up with agf_flfirst == 118 when the repair code thinks
117 is the max permissible.  It's just less likely than the growfs
case.  Now, how to fix this one for all combinations...  :(
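
A rough userspace sketch of that wrap behaviour, only to illustrate the
mismatch. advance_flfirst() and max_flfirst_seen() are made-up names, plain
unsigned ints stand in for the big-endian on-disk fields, and the sizes 119
and 118 are the values worked out above; none of this is kernel code.

```c
/*
 * Hypothetical stand-in for the wrap logic quoted above: bump the free-list
 * head by one slot and wrap back to 0 once it reaches agfl_size.
 */
static unsigned int advance_flfirst(unsigned int flfirst, unsigned int agfl_size)
{
	flfirst++;
	if (flfirst == agfl_size)
		flfirst = 0;
	return flfirst;
}

/*
 * Hypothetical helper: largest head index observed while advancing 'steps'
 * times, starting from slot 0.
 */
static unsigned int max_flfirst_seen(unsigned int agfl_size, unsigned int steps)
{
	unsigned int i, cur = 0, max = 0;

	for (i = 0; i < steps; i++) {
		cur = advance_flfirst(cur, agfl_size);
		if (cur > max)
			max = cur;
	}
	return max;
}
```

With agfl_size = 119 the head legitimately lands on index 118 before
wrapping, whereas a checker that assumes 118 slots expects it to wrap after
117. That off-by-one is what the old repair trips over.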

-Eric 
 
