From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Mark Fasheh <mfasheh@suse.de>, btrfs <linux-btrfs@vger.kernel.org>
Subject: About fi du and reflink/dedupe
Date: Fri, 22 Apr 2016 10:57:29 +0800
Message-ID: <0e9b715e-0cb3-c7a3-e40e-36c9be1f53bd@cn.fujitsu.com>
Hi Mark,
Thanks for your contribution of the btrfs filesystem du command.
However, there seems to be some strange behavior related to reflink (and,
further, in-band dedupe).
(The root cause lies quite deep in the kernel backref resolving code.)
["Exclusive" value not really exclusive]
When a file has 2 file extents, and the 2nd file extent points to the
same data as the 1st one, fi du gives the wrong answer.
The following commands create such a file easily.
# mkfs.btrfs -f /dev/sdb5
# mount /dev/sdb5 /mnt/test
# xfs_io -f -c "pwrite 0 128K" /mnt/test/tmp
# xfs_io -c "reflink /mnt/test/tmp 0 128K 128K" /mnt/test/tmp
# btrfs fi du /mnt/test
     Total   Exclusive  Set shared  Filename
 256.00KiB   256.00KiB           -  /mnt/test//tmp
 256.00KiB   256.00KiB       0.00B  /mnt/test/
Total seems to be OK, but I am confused by the exclusive value.
The above method creates only one real data extent, which takes
128K, so following the qgroup definition, its exclusive value should be
128K rather than 256K.
fi du uses the FIEMAP ioctl to get the extent map, and uses the SHARED
flag (FIEMAP_EXTENT_SHARED) to determine whether an extent is shared.
However, that SHARED flag doesn't handle a case like this, in which
ino and root are the same and only the extent offset is different.
What's more, if we modify btrfs_check_shared() to return the SHARED flag
for such a case, we will get a 0 exclusive value for it,
which is quite strange. (I assume the exclusive value should be 128K.)
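To make the arithmetic concrete, here is a simplified model of flag-based
accounting. It is only a sketch, not the btrfs-progs code: the FileExtent
type, the helper names and the disk bytenr value are hypothetical, and the
only rule modeled is "an extent counts as exclusive when its SHARED flag is
not set".

```python
from typing import List, NamedTuple

KIB = 1024


class FileExtent(NamedTuple):
    """One fiemap-style record (hypothetical, simplified)."""
    file_offset: int   # logical offset inside the file
    disk_bytenr: int   # where the data lives on disk
    length: int        # bytes covered by this file extent
    shared_flag: bool  # what the kernel reported as SHARED


def flag_based_exclusive(extents: List[FileExtent]) -> int:
    """Sum the length of every file extent not flagged SHARED."""
    return sum(e.length for e in extents if not e.shared_flag)


def self_reflinked_file(shared_flag: bool) -> List[FileExtent]:
    """Two file extents pointing at one 128K data extent.

    The disk bytenr is an arbitrary made-up value.
    """
    bytenr = 12 * 1024 * KIB
    return [
        FileExtent(0, bytenr, 128 * KIB, shared_flag),
        FileExtent(128 * KIB, bytenr, 128 * KIB, shared_flag),
    ]


# Flag not set for either extent: both are counted as exclusive.
print(flag_based_exclusive(self_reflinked_file(False)) // KIB)  # 256
# Flag set for both extents: nothing is counted as exclusive.
print(flag_based_exclusive(self_reflinked_file(True)) // KIB)   # 0
```

Neither result is 128K, because a per-extent flag cannot express "referenced
twice, but only by this one file".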
[Slow btrfs_check_shared() performance]
In the above case, btrfs fi du returns very quickly.
But when the file is in-band deduped and its size grows to 1G,
btrfs_check_shared() takes a lot of time to return, as it does a
backref walk.
This would be a huge problem for in-band dedupe.
[Possible solution]
Would you please consider judging shared extents in user space,
rather than relying on the SHARED flag from fiemap?
The workflow would be like:
1) Call fiemap, skipping the FIEMAP_EXTENT_SHARED flag
   (although we still need to modify the kernel to avoid btrfs_check_shared())
2) Get the disk bytenr and record it in a user-space bytenr pool
3) Compare each file extent's disk bytenr with the bytenr pool
Then, like qgroup, use this to build rfer/excl data for each inode.
At least, this method would handle the above exclusive value and avoid an
extremely long-running fiemap ioctl call in the in-band dedupe case.
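A rough sketch of steps 2) and 3), written in Python only for brevity. It
assumes the (disk bytenr, length) pairs have already been collected via
fiemap, and that file extents referring to the same data report the same disk
bytenr; partial references into a larger data extent are not handled. The
function name account() and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

KIB = 1024


def account(files: Dict[str, List[Tuple[int, int]]]) -> Dict[str, Dict[str, int]]:
    """Build qgroup-like rfer/excl numbers per inode.

    files maps an inode name to its list of (disk_bytenr, length) pairs.
    """
    pool = defaultdict(set)  # bytenr pool: disk_bytenr -> inodes referring to it
    sizes = {}               # disk_bytenr -> length of the data extent
    for ino, extents in files.items():
        for bytenr, length in extents:
            pool[bytenr].add(ino)
            sizes[bytenr] = max(sizes.get(bytenr, 0), length)

    result = {ino: {"rfer": 0, "excl": 0} for ino in files}
    for bytenr, inos in pool.items():
        for ino in inos:
            # Each data extent is counted once per inode, however many
            # file extents of that inode point at it.
            result[ino]["rfer"] += sizes[bytenr]
            if len(inos) == 1:
                result[ino]["excl"] += sizes[bytenr]
    return result


bytenr = 12 * 1024 * KIB  # arbitrary made-up disk bytenr
single = account({"tmp": [(bytenr, 128 * KIB), (bytenr, 128 * KIB)]})
print(single["tmp"]["excl"] // KIB)  # 128
```

With only the self-reflinked file present, excl comes out as 128K. If a
second inode referred to the same bytenr, excl would drop to 0 for both,
without any backref walk in the kernel.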
Thanks,
Qu