From: Mark Fasheh <mfasheh@suse.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, jbacik@fb.com, clm@fb.com
Subject: Re: [PATCH 0/4] btrfs: update qgroups in drop snapshot
Date: Wed, 23 Sep 2015 14:49:26 -0700
Message-ID: <20150923214926.GD17854@wotan.suse.de>
In-Reply-To: <5602031A.1080905@cn.fujitsu.com>
On Wed, Sep 23, 2015 at 09:40:42AM +0800, Qu Wenruo wrote:
> Thanks for all your work on this.
> Comment on test case is inlined below.
No problem, thanks for the review Qu!
> >+# Create an fs tree of a given height at a target location. This is
> >+# done by aggressively creating inline extents to expand the number of
> >+# nodes required. We also add a traditional extent so that
> >+# drop_snapshot is forced to walk at least one extent that is not
> >+# stored in metadata.
> >+#
> >+# NOTE: The ability to vary tree height for this test is very useful
> >+# for debugging problems with drop_snapshot(). As a result we retain
> >+# that parameter even though the test below always does level 2 trees.
> >+_explode_fs_tree () {
> >+ local level=$1;
> >+ local loc="$2";
> >+ local bs=4095;
> >+ local cnt=1;
> >+ local n;
> >+
> >+ if [ -z "$loc" ]; then
> >+ echo "specify location for fileset"
> >+ exit 1;
> >+ fi
> >+
> >+ case $level in
> >+ 1)# this always reproduces level 1 trees
> >+ n=10;
>
> I'd prefer a more accurate calculation based on nodesize.
> That would allow using any possible nodesize.
How do we query nodesize? 'btrfs fi show' doesn't seem to give it.
But yeah in theory that sounds nice, I can certainly try it out. I'm not
really sure how complicated we want this whole function to be though. Right
now it should always work on all platforms.
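(Untested sketch of one way to query it, assuming the btrfs-show-super
tool from btrfs-progs prints a "nodesize" field; the field name and
tool may vary across progs versions:

	# Hypothetical helper: read the nodesize off the scratch device.
	# Older progs versions label this field "leafsize" instead.
	_get_nodesize() {
		btrfs-show-super $SCRATCH_DEV | awk '/^nodesize/ { print $2 }'
	}

If that pans out, it could feed the calculations below.)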
> For example, for $nodesize larger than 4K:
> l1_min_n = int(($nodesize - 101) / (4095 + 21 + 17)) + 1
>
> And for a 4K nodesize, two 2K inline extents will cause the fs_tree
> to be 2 levels, as a 4K nodesize won't allow a 4095-byte inline
> extent length.
>
> 101 is sizeof(struct btrfs_header).
> So $nodesize - 101 is the available space for a leaf.
> And 4095 is the maximum inline extent length, 21 is the inline
> file_extent header length, i.e. offsetof(struct btrfs_file_extent_item,
> disk_bytenr).
> 17 is sizeof(struct btrfs_disk_key).
>
> The above is the theoretical maximum; in practice a leaf will be
> split earlier than hitting that number.
> So that max_n should be good enough to create the desired tree level,
> and it makes the script run a little faster.
>
> >+ ;;
> >+ 2)# this always reproduces level 2 trees
> >+ n=1500
> For level 2 and higher trees, it will be
> l2_min_n = l1_min_n * (($nodesize - 101) / 33 + 1) ^ ($level - 1)
>
> 101 is still header size.
> 33 is sizeof(struct btrfs_key_ptr)
>
> >+ ;;
> >+ 3)# this always reproduces level 3 trees
> >+ n=1000000;
> >+ ;;
> >+ *)
> >+ echo "Can't make level $level trees";
> >+ exit 1;
> >+ ;;
> >+ esac
> >+
> >+ mkdir -p $loc
> >+ for i in `seq -w 1 $n`;
> >+ do
> >+ dd status=none if=/dev/zero of=$loc/file$i bs=$bs count=$cnt
> >+ done
> >+
> >+ bs=131072
> >+ cnt=1
> >+ dd status=none if=/dev/zero of=$loc/extentfile bs=$bs count=$cnt
> >+}
> >+
> >+# Force the default leaf size as the calculations for making our btree
> >+# heights are based on that.
> >+run_check _scratch_mkfs "--nodesize 16384"
>
> When using a given nodesize, it's better to consider other
> architectures like AArch64 that support a 64K page size.
> (btrfs doesn't support subpage nodesize, which means a filesystem
> with a nodesize smaller than the page size won't mount)
>
> I was told this several times when submitting qgroup-related test cases.
> See btrfs/017 and btrfs/091.
>
> But on the other hand, using a 64K nodesize is somewhat overkill and
> will make the test take more time.
>
> If it's OK to ignore the 64K page size case, I'd prefer to use a 4K
> nodesize, which makes it much faster to create an fs tree with 2~3
> levels.
Right, so 64K nodesize is the most 'compatible' nodesize.
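For reference, here's roughly what your formulas work out to in shell
for the 16K nodesize this test forces (untested sketch; the constants
101, 17, 21 and 33 are taken straight from your mail):

	nodesize=16384
	# l1_min_n = int((nodesize - 101) / (4095 + 21 + 17)) + 1
	l1_min_n=$(( (nodesize - 101) / (4095 + 21 + 17) + 1 ))
	# per-level fanout: (nodesize - 101) / 33 + 1
	fanout=$(( (nodesize - 101) / 33 + 1 ))
	# l2_min_n = l1_min_n * fanout ^ (level - 1), with level = 2
	l2_min_n=$(( l1_min_n * fanout ))
	echo "level 1: ~$l1_min_n files, level 2: ~$l2_min_n files"

That gives 4 and 1976 at 16K. The hardcoded n=1500 reaching level 2
despite sitting below the theoretical 1976 matches your note that
leaves split before hitting the maximum.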
> >+_scratch_mount
> >+
> >+# populate the default subvolume and create a snapshot ('snap1')
> >+_explode_fs_tree 1 $SCRATCH_MNT/files
> >+_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap1
> >+
> >+# create new btree nodes in this snapshot. They will become shared
> >+# with the next snapshot we create.
> >+_explode_fs_tree 1 $SCRATCH_MNT/snap1/files-snap1
> >+
> >+# create our final snapshot ('snap2'), populate it with
> >+# exclusively owned nodes.
> >+#
> >+# As a result of this action, snap2 will get an implied ref to the
> >+# 128K extent created in the default subvolume.
> >+_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT/snap1 $SCRATCH_MNT/snap2
> >+_explode_fs_tree 1 $SCRATCH_MNT/snap2/files-snap2
> >+
> >+# Enable qgroups now that we have our filesystem prepared. This
> >+# will kick off a scan which we will have to wait for.
> >+_run_btrfs_util_prog quota enable $SCRATCH_MNT
> >+_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
> >+
> >+# Remount to clear cache, force everything to disk
> >+_scratch_unmount
> >+_scratch_mount
>
> Is there anything special that needs to use umount/mount other than sync?
A couple times now it's been to my advantage to force btrfs to reread the
file trees. It might not be strictly necessary any more.
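If we ever want to drop the unmount, a sync plus a cache drop might be
a lighter-weight way to force the reread (untested sketch):

	sync
	# drop page cache, dentries and inodes so the trees get reread
	echo 3 > /proc/sys/vm/drop_caches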
> >+# Finally, delete snap1 to trigger btrfs_drop_snapshot(). This
> >+# snapshot is most interesting to delete because it will cause some
> >+# nodes to go exclusively owned for snap2, while some will stay shared
> >+# with the default subvolume. That exercises a maximum of the drop
> >+# snapshot/qgroup interactions.
> >+#
> >+# snap2's implied ref to the 128K extent in files/ can be lost by
> >+# the root finding code in qgroup accounting due to snap1 no longer
> >+# providing a path to it. This was fixed by the first two patches
> >+# referenced above.
> >+_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap1
> >+
> >+# There is no way from userspace to force btrfs_drop_snapshot to run
> >+# at a given time (even via mount/unmount). We must wait for it to
> >+# start and complete. This is the shortest time I have found on my
> >+# test systems which always allows drop_snapshot to run to completion.
> >+sleep 45
>
> Does "btrfs subv delete -c" help here?
Unfortunately not :( We need to wait for drop_snapshot() to get run. That
flag (from memory) just waits for the initial orphaning transaction to
finish.
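One thing that might tighten up the fixed sleep is polling for the
cleaner to finish, assuming 'btrfs subvolume list -d' prints nothing
once every deleted subvolume has been cleaned (untested sketch):

	# Poll instead of a fixed 45s sleep; give up after 120s so a
	# stuck cleaner can't hang the test forever.
	for i in `seq 1 120`; do
		[ -z "$($BTRFS_UTIL_PROG subvolume list -d $SCRATCH_MNT)" ] && break
		sleep 1
	done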
> >+
> >+_scratch_unmount
> >+
> >+# generate a qgroup report and look for inconsistent groups
> >+# - don't use _run_btrfs_util_prog here as it captures the output and
> >+# we need to grep it.
> >+$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1 | grep -E -q "Counts for qgroup.*are different"
> >+if [ $? -ne 0 ]; then
> >+ status=0
> >+fi
> Quite a nice idea to use btrfsck to check qgroup validation.
>
> But I don't see a reason not to use _run_btrfs_util_prog, as I
> don't think the grep is needed.
>
> If there is a bug in the return value of btrfsck, then I'm OK with
> this as a workaround.
>
> But if btrfsck --qgroup-report returns non-zero when it finds a
> qgroup mismatch, I think it's better to just call
> _run_btrfs_util_prog, as it already checks the return value.
btrfsck --qgroup-report returns zero unless there was an issue generating
the report so the grep there is the only way to catch this consistently.
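For what it's worth, the same check can be phrased so a mismatch fails
loudly instead of silently leaving status=1 (sketch, same commands as
the test):

	# grep's exit status drives the verdict directly.
	if $BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1 | \
		grep -Eq "Counts for qgroup.*are different"; then
		echo "qgroup counts are inconsistent"
	else
		status=0
	fi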
Thanks again,
--Mark
--
Mark Fasheh