From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Eryu Guan <eguan@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [BUG] sb_fdblocks counting error caused by too large indlen returned from xfs_bmap_worst_indlen()
Date: Tue, 11 Jul 2017 10:08:59 +1000 [thread overview]
Message-ID: <20170711000859.GG17762@dastard> (raw)
In-Reply-To: <20170710161824.GI4103@magnolia>
On Mon, Jul 10, 2017 at 09:18:24AM -0700, Darrick J. Wong wrote:
> On Sun, Jul 09, 2017 at 10:08:18PM +0800, Eryu Guan wrote:
> > On Fri, Jul 07, 2017 at 11:49:37PM -0700, Darrick J. Wong wrote:
> > > On Fri, Jul 07, 2017 at 08:01:43PM +0800, Eryu Guan wrote:
> > > > Hi all,
> > > >
> > > > I recently hit a repeatable sb_fdblocks corruption as below:
> > > >
> > > > Phase 1 - find and verify superblock...
> > > > Phase 2 - using internal log
> > > > - zero log...
> > > > - scan filesystem freespace and inode maps...
> > > > sb_fdblocks 14538692, counted 14669764
> > > > - found root inode chunk
> > > > Phase 3 - for each AG...
> > > > ...
> > > >
> > > > And the count diff is always 14669764 - 14538692 = 131072 (128k). The
> > > > XFS in question was formated with "-m rmapbt=1 -b 1k" option.
> > > >
> > > > After turning on XFS_WARN and adding some debug printks (I appended the
> > > > detailed logs at the end of mail), I found that this was caused by too
> > > > large 'indlen' returned by xfs_bmap_worst_indlen(), which can't fit in a
> > > > 17 bits value (STARTBLOCKVALBITS is defined as 17), so the assert in
> > > > nullstartblock() failed: ASSERT(k < (1 << STARTBLOCKVALBITS));
> > > >
> > > > From the log, newlen = 151513, which needs 18 bits, so nullstartblock()
> > > > throws away the 18th bit, and the sb_fdblocks difference is always 2^17
> > > > = 131072.
> > >
> > > br_startblock is encoded in memory (and in the on-disk bmbt records) as
> > > a 52-bit unsigned integer. We signal a delayed allocation record by
> > > setting the uppermost STARTBLOCKMASKBITS (35) bits to 1 and stash the
> > > 'indlen' reservation (i.e. the worst case estimate of the space we need
> > > to grow the bmbt/rmapbt to map the entire delayed allocation) in the
> > > lower 17 bits of br_startblock. In theory this is ok because we're
> > > still quite a ways from having enough storage to create an fs where
> > > the upper bits in the agno part of an xfs_fsblock_t are actually set.
> >
> > This confirms what I read from the code, thanks! But I'm still curious
> > about how these numbers are chosen, especially STARTBLOCKMASKBITS is
> > defined as (15 + 20), where are they from?
>
> <shrug> Dave? :)
You mean these definitions?
#define STARTBLOCKVALBITS 17
#define STARTBLOCKMASKBITS (15 + 20)
Today: a history lesson. :)
Remember that these are stored encoded in a xfs_bmbt_rec on disk and
a xfs_bmbt_rec_host in memory, so they need to fit in this
definition both on disk and in memory:
/*
* Bmap btree record and extent descriptor.
* l0:63 is an extent flag (value 1 indicates non-normal).
* l0:9-62 are startoff.
* l0:0-8 and l1:21-63 are startblock.
* l1:0-20 are blockcount.
*/
#define BMBT_EXNTFLAG_BITLEN 1
#define BMBT_STARTOFF_BITLEN 54
#define BMBT_STARTBLOCK_BITLEN 52
#define BMBT_BLOCKCOUNT_BITLEN 21
We're looking at BMBT_STARTBLOCK_BITLEN here so it's obvious that
that 15 + 20 + 17 is 52 bits. And that the startblock encoding for
delayed allocation obivously fits inside the space in on disk and
host extent tree records correctly.
But what about the old "small block" format that was originally
used on 32 bit MIPS systems? That only had 32 bits in the start
block encoding *in memory*, so it should be clear that:
15 + 17 = 32 bits
Indeed, look back at the older code:
(http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=blob;f=fs/xfs/xfs_bmap_btree.h;h=a4555abb6622a5d2ac6c08ab0f585761d6ff4585;hb=HEAD)
#define STARTBLOCKMASKBITS (15 + XFS_BIG_BLKNOS * 20)
#define DSTARTBLOCKMASKBITS (15 + 20)
And you can see we had different definitions for in-memory and
on-disk start block masks, and that difference was the size of the
block addresses the compiled kernel could manage.
IOWs, the "15 + 20" is the old definition that recognised the
difference in block size between 32 bit and 64 bit systems could
originally support in memory vs what the 64-bit on-disk format
supported. We now only support 64bit in memory, so the in-memory and
on-disk definitions are now the same....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-07-11 0:16 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-07 12:01 [BUG] sb_fdblocks counting error caused by too large indlen returned from xfs_bmap_worst_indlen() Eryu Guan
2017-07-08 6:49 ` Darrick J. Wong
2017-07-09 14:08 ` Eryu Guan
2017-07-10 16:18 ` Darrick J. Wong
2017-07-11 0:08 ` Dave Chinner [this message]
2017-07-11 11:09 ` Eryu Guan
2017-09-02 7:49 ` Eryu Guan
2017-09-02 15:20 ` Darrick J. Wong
2017-09-03 4:01 ` Eryu Guan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170711000859.GG17762@dastard \
--to=david@fromorbit.com \
--cc=darrick.wong@oracle.com \
--cc=eguan@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).