From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>,
fstests <fstests@vger.kernel.org>,
linux-xfs@vger.kernel.org,
Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Subject: Re: [PATCH] common/populate: Ensure that S_IFDIR.FMT_BTREE is in btree format
Date: Tue, 6 Dec 2022 18:17:47 -0800 [thread overview]
Message-ID: <Y4/3y+Ibd4JqB2kp@magnolia> (raw)
In-Reply-To: <Y4/2ZUIm2MKs6UID@B-P7TQMD6M-0146.local>
On Wed, Dec 07, 2022 at 10:11:49AM +0800, Gao Xiang wrote:
> Hi Dave,
>
> On Wed, Dec 07, 2022 at 10:34:17AM +1100, Dave Chinner wrote:
> > On Fri, Dec 02, 2022 at 10:23:07AM +0800, Gao Xiang wrote:
> > > Hi Darrick,
> > >
> > > On Thu, Dec 01, 2022 at 07:52:44AM -0800, Darrick J. Wong wrote:
> > > > On Thu, Dec 01, 2022 at 04:12:08PM +0800, Gao Xiang wrote:
> > > > > Sometimes "$((128 * dblksz / 40))" dirents cannot make sure that
> > > > > S_IFDIR.FMT_BTREE could become btree format for its DATA fork.
> > > > >
> > > > > Actually we just observed it can fail after apply our inode
> > > > > extent-to-btree workaround. The root cause is that the kernel may be
> > > > > too good at allocating consecutive blocks so that the data fork is
> > > > > still in extents format.
> > > > >
> > > > > Therefore instead of using a fixed number, let's make sure the number
> > > > > of extents is large enough than (inode size - inode core size) /
> > > > > sizeof(xfs_bmbt_rec_t).
> > > > >
> > > > > Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > > Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
> > > > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> > > > > ---
> > > > > common/populate | 22 +++++++++++++++++++++-
> > > > > 1 file changed, 21 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/common/populate b/common/populate
> > > > > index 6e004997..e179a300 100644
> > > > > --- a/common/populate
> > > > > +++ b/common/populate
> > > > > @@ -71,6 +71,25 @@ __populate_create_dir() {
> > > > > done
> > > > > }
> > > > >
> > > > > +# Create a large directory and ensure that it's a btree format
> > > > > +__populate_create_btree_dir() {
> > > >
> > > > Since this encodes behavior specific to xfs, this ought to be called
> > > >
> > > > __populate_xfs_create_btree_dir
> > > >
> > > > or something like that.
> > > >
> > > > > + name="$1"
> > > > > + isize="$2"
> > > >
> > > > These ought to be local variables, e.g.
> > > >
> > > > local name="$1"
> > > > local isize="$2"
> > > >
> > > > So that they don't pollute the global name scope. Yay bash.
> > > >
> > > > > +
> > > > > + mkdir -p "${name}"
> > > > > + d=0
> > > > > + while true; do
> > > > > + creat=mkdir
> > > > > + test "$((d % 20))" -eq 0 && creat=touch
> > > > > + $creat "${name}/$(printf "%.08d" "$d")"
> > > > > + if [ "$((d % 40))" -eq 0 ]; then
> > > > > + nexts="$($XFS_IO_PROG -c "stat" $name | grep 'fsxattr.nextents' | sed -e 's/^.*nextents = //g' -e 's/\([0-9]*\).*$/\1/g')"
> > > >
> > > > _xfs_get_fsxattr...
> >
> > The grep/sed expression is also overly complex - it can easily be
> > replaced with just this:
> >
> > nexts=`$XFS_IO_PROG -c "stat" $name | sed -ne 's/^fsxattr.nextents = //p'
> >
> > The "-n" option to sed suppresses the printing of the stream
> > (pattern space) to the output as it processes the input, which gets
> > rid of the need for using grep to suppress non-matching input. The "p"
> > suffix to the search string forces matched patterns to be printed to
> > the output.
> >
> > This results in only matched, substituted pattern spaces to be
> > printed, avoiding the need for grep and multiple sed regexes to be
> > matched to strip the text down to just the integer string.
>
> I just copied it from another reference at that time as a copy-and-paste
> engineer.. Also note that Ziyang's new patch already use
> _xfs_get_fsxattr to get this field.
>
> >
> > > > > + [ "$nexts" -gt "$(((isize - 176) / 16))" ] && break
> > > >
> > > > Only need to calculate this once if you declare this at the top:
> > > >
> > > > # We need enough extents to guarantee that the data fork is in
> > > > # btree format. Cycling the mount to use xfs_db is too slow, so
> > > > # watch for when the extent count exceeds the space after the
> > > > # inode core.
> > > > local max_nextents="$(((isize - 176) / 16))"
> > > >
> > > > and then you can do:
> > > >
> > > > [[ $nexts -gt $max_nextents ]] && break
> > > >
> > > > Also not a fan of hardcoding 176 around fstests, but I don't know how
> > > > we'd detect that at all.
> > > >
> > > > # Number of bytes reserved for only the inode record, excluding the
> > > > # immediate fork areas.
> > > > _xfs_inode_core_bytes()
> > > > {
> > > > echo 176
> > > > }
> > > >
> > > > I guess? Or extract it from tests/xfs/122.out?
> > >
> > > Thanks for your comments.
> > >
> > > I guess hard-coded 176 in _xfs_inode_core_bytes() is fine for now
> > > (It seems a bit weird to extract a number from a test expected result..)
> >
> > Which is wrong when testing a v4 filesystem - in that case the inode
> > core size is 96 bytes and so max extents may be larger on v4
> > filesystems than v5 filesystems....
>
> Do we really care v4 fs for now since it's deprecated?... Darrick once also
> suggested using (isize / 16) but it seems it could take unnecessary time to
> prepare.. Or we could just use (isize - 96) / 16 to keep v4 work.
Well you /could/ make _xfs_inode_core_bytes grep xfs_info for 'crc=1'
and switch 176/96 on that. The only reason why the existing callers
hardcoded 176 is that (I think) they all require crc=1.
(Or they're h***** bash scripts and we've just gotten lucky the whole
time.)
--D
> Thanks,
> Gao Xiang
>
>
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
next prev parent reply other threads:[~2022-12-07 2:18 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-01 8:12 [PATCH] common/populate: Ensure that S_IFDIR.FMT_BTREE is in btree format Gao Xiang
2022-12-01 15:52 ` Darrick J. Wong
2022-12-02 2:23 ` Gao Xiang
2022-12-06 23:34 ` Dave Chinner
2022-12-07 2:11 ` Gao Xiang
2022-12-07 2:17 ` Darrick J. Wong [this message]
2022-12-07 21:48 ` Dave Chinner
2022-12-08 6:29 ` Ziyang Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y4/3y+Ibd4JqB2kp@magnolia \
--to=djwong@kernel.org \
--cc=ZiyangZhang@linux.alibaba.com \
--cc=david@fromorbit.com \
--cc=fstests@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox