Re: [PATCH 0/6 v2] xfs: fix the bulkstat mess

From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: Re: [PATCH 0/6 v2] xfs: fix the bulkstat mess
Date: Wed, 5 Nov 2014 17:32:26 +1100
Message-ID: <20141105063226.GF28565@dastard>
In-Reply-To: <20141105060728.GE28565@dastard>

On Wed, Nov 05, 2014 at 05:07:28PM +1100, Dave Chinner wrote:
> On Wed, Nov 05, 2014 at 11:05:15AM +1100, Dave Chinner wrote:
> > Hi folks, this is version 2 of the bulkstat fixup series first
> > posted here:
> > 
> > http://oss.sgi.com/archives/xfs/2014-11/msg00057.html
> > 
> > Version 2 fixes the issues Brian found during review:
> > - chunk formatter error leakage (patch 3)
> > - moved main loop chunk formatter error handling from patch 4 to
> >   patch 5
> > - reworks last_agino updating in patch 6 to do post-formatting
> >   updates and added comments.
> > 
> > Comments and testing welcome.
> 
> I'm not 100% convinced that this fixes all the problems. I just
> created, dumped and restored a 10 million inode filesystem (about
> 50GB of dump file) and I found 102 missing files in the dump with no
> errors from xfsdump or xfsrestore.
> 
> The files are missing from just 4 directories out of about 1000
> directories containing equal numbers of files, so it's not a common
> trigger whatever the issue is. I'll keep digging...

OK, this looks like a problem with handling the last record in the
AGI btree:

$ for i in `cat s.diff | grep "^+/" | sed -e 's/^+//'` ; do ls -i $i; done |sort -n
163209114099 /mnt/scratch/2/dbc/5459605f~~~~~~~~RDJX8QBHPPMCGMD7YJQGYPD2
....
163209114129 /mnt/scratch/2/dbc/5459605f~~~~~~~~U820IYQFKS8A6QYCC8HU3ZBX
292057960758 /mnt/scratch/0/dcc/54596070~~~~~~~~9BUH5D5PZTGAC8BT1YL77OZ0
...
292057960769 /mnt/scratch/0/dcc/54596070~~~~~~~~DAO78GAAFNUZU8PH7Q0UZNRH
1395864555809 /mnt/scratch/1/e60/54596103~~~~~~~~GEMXGHYNREW409N7W9INBMVA
.....
1395864555841 /mnt/scratch/1/e60/54596103~~~~~~~~9XPK9FWHCE21AJ3EN023DU47
1653562593576 /mnt/scratch/5/e79/5459611c~~~~~~~~BSBZ6EUCT9HOIRQPMFZDVPQ5
.....
1653562593601 /mnt/scratch/5/e79/5459611c~~~~~~~~6QY1SO8ZGGNQESAGXSB3G3DH
$

xfs_db> convert inode 163209114099 agno
0x26 (38)
xfs_db> convert inode 163209114099 agino
0x571f3 (356851)
xfs_db> convert inode 163209114129 agino
0x57211 (356881)
xfs_db> agi 38
xfs_db> a root
xfs_db> a ptrs[2]
xfs_db> p
....
recs[1-234] = [startino,freecount,free]
......
228:[356352,0,0] 229:[356416,0,0] 230:[356512,0,0] 231:[356576,0,0]
232:[356672,0,0] 233:[356736,0,0] 234:[356832,14,0xfffc000000000000]

So the inodes in the first contiguous range all fall into the partial
final record in the AG.
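
To make that check concrete, here is a minimal Python sketch that decodes record 234 from the xfs_db output above. It assumes the usual inobt convention that one record covers 64 inodes starting at startino, and that bit i set in the free mask means inode startino + i is free. The helper name allocated_inodes is made up for illustration.

```python
INODES_PER_CHUNK = 64  # assumption: one inobt record covers 64 inodes


def allocated_inodes(startino, free_mask):
    """Return the AG-relative inode numbers one record marks as in use."""
    return [startino + i for i in range(INODES_PER_CHUNK)
            if not (free_mask >> i) & 1]


# Record 234 of AG 38, copied from the xfs_db output above.
startino, freecount, free_mask = 356832, 14, 0xfffc000000000000
in_use = allocated_inodes(startino, free_mask)

# The free count stored in the record should agree with the mask.
assert bin(free_mask).count("1") == freecount

# First and last missing inodes in this AG (agino values from xfs_db).
missing_first, missing_last = 356851, 356881
print("in-use range:", in_use[0], "-", in_use[-1])
print("missing inodes covered:",
      missing_first in in_use and missing_last in in_use)
```

Under those assumptions the last in-use inode in the record is 356881, which is also the last missing inode in this AG.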

xfs_db> convert inode 292057960758 agino
0x2d136 (184630)
.....
155:[184544,0,0] 156:[184608,30,0xfffffffc00000000]

Same.

xfs_db> convert inode 1395864555809 agino
0x2d121 (184609)
.....
155:[184544,0,0] 156:[184608,30,0xfffffffc00000000]

Same.

xfs_db> convert inode 1653562593576 agino
0x2d128 (184616)
....
155:[184544,0,0] 156:[184608,30,0xfffffffc00000000]

Same.

So they are all falling into the last btree record in the AG, and so
appear to have been skipped as a result of the same issue. At least
that gives me something to look at.
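
The same check can be sketched across all four ranges. This is illustrative only: it assumes the AG-relative inode number occupies the low 32 bits of the inode number on this filesystem, which is inferred from the xfs_db conversions above (the real width comes from the filesystem geometry and will differ elsewhere). The helper split_ino is hypothetical, not an XFS function.

```python
AGINO_BITS = 32  # assumption inferred from the output above, not a constant
INODES_PER_CHUNK = 64  # assumption: one inobt record covers 64 inodes


def split_ino(ino):
    """Split an inode number into (agno, agino) under the assumed width."""
    return ino >> AGINO_BITS, ino & ((1 << AGINO_BITS) - 1)


# (first missing inode of each range, startino of the last record shown)
cases = [
    (163209114099, 356832),
    (292057960758, 184608),
    (1395864555809, 184608),
    (1653562593576, 184608),
]

results = []
for ino, last_startino in cases:
    agno, agino = split_ino(ino)
    # A record spans INODES_PER_CHUNK inodes starting at startino.
    in_last = last_startino <= agino < last_startino + INODES_PER_CHUNK
    results.append((agno, agino, in_last))
    print(f"ino {ino}: agno {agno} agino {agino} in last record: {in_last}")
```

The agino values this produces can be compared against the xfs_db convert output above as a sanity check on the 32-bit assumption.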

Still, please review the patches I've already posted - if they are
fine I'll push them to Linus ASAP, and then add whatever I find
from this test later.

Cheers,

Dave.

PS: every AG I looked at had an identical inode allocation pattern.
Given that the directory entries and the file contents created are
all deterministic, it's reassuring to see that the allocator has
created identical metadata structure layouts on disk for a
repeating workload that creates identical user-visible
hierarchies...
-- 
Dave Chinner
david@fromorbit.com
