From: LuVar <luvar@plaintext.sk>
To: Zygo Blaxell <zblaxell@furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs deduplication and linux cache management
Date: Mon, 3 Nov 2014 15:09:11 +0100 (GMT+01:00)
Message-ID: <1959259002.1771415023751082.JavaMail.root@shiva>
In-Reply-To: <20141030160004.GK17395@hungrycats.org>
Thanks for the nice "replicate it yourself at home" example. On my machine it behaves exactly as it did in your test:
<code>
root@blackdawn:/home/luvar# sync; sysctl vm.drop_caches=1
vm.drop_caches = 1
root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m6.768s
user 0m0.016s
sys 0m0.599s
root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m5.259s
user 0m0.018s
sys 0m0.695s
root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m0.701s
user 0m0.014s
sys 0m0.288s
root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m0.286s
user 0m0.013s
sys 0m0.272s
</code>
If you don't mind my asking, is there any plan to optimize this behaviour? I know that btrfs is not like ZFS (which is a whole system, from the block device through the cache to the VFS), so would it be possible to implement such an optimization without a major patch to the Linux block cache/VFS cache?
Thanks, have a nice day,
--
LuVar
----- "Zygo Blaxell" <zblaxell@furryterror.org> wrote:
> On Thu, Oct 30, 2014 at 10:26:07AM +0100, luvar@plaintext.sk wrote:
> > Hi,
> > I want to ask whether deduplicated file content is cached in the Linux
> > kernel just once for two deduplicated files.
> >
> > To explain in depth:
> > - I use btrfs for the whole system, with a few subvolumes and
> >   compression enabled on some of them.
> > - I have two directories with the Eclipse SDK with slight differences
> >   (same version, different config).
> > - I assume that these directories are deduplicated, so the two
> >   Eclipse installations take up roughly as much space on the HDD as
> >   one would.
> > - I start one of the Eclipse installations.
> > - The Linux kernel caches all files opened during Eclipse startup
> >   (I have enough free RAM).
> > - I am just a happy, stupid Linux user:
> > 1. Will the kernel cache file content after decompression? (I think
> >    yes.)
> > 2. Will the cached data be in the VFS layer or in the block device layer?
>
> My guess based on behavior is the VFS layer. See below.
>
> > - When I launch the second Eclipse (different from the first, but
> >   deduplicated against it) after the first one:
> > 1. Will the second start require less data to be read from the HDD?
>
> No.
>
> > 2. Will the metadata for the second instance be read from the HDD? (I
> >    assume yes.)
>
> Yes (how could it not?).
>
> > 3. Will the actual data be read a second time? (I hope not.)
>
> Unfortunately, yes.
>
> This is my test:
>
> 1. Create a file full of compressible data that is big enough to take
> a few seconds to read from disk, but not too big to fit in RAM:
>
> yes $(date) | head -c 500m > a
>
> 2. Create a "deduplicated" (shared extent) copy of same:
>
> cp --reflink=always a b
>
> (use filefrag -v to verify both files have same physical extents)
>
> 3. Drop caches
>
> sync; sysctl vm.drop_caches=1
>
> 4. Time reading both files with cold and hot cache:
>
> time cat a > /dev/null
> time cat b > /dev/null
> time cat a > /dev/null
> time cat b > /dev/null
>
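For anyone who wants to repeat steps 3 and 4 on their own pair of files, here is a minimal sketch of how I would script it. The function names (read_seconds, drop_page_cache, cold_hot_test) are my own and not from the thread, it assumes GNU date (for %N), and dropping the cache only works as root:

```shell
#!/bin/sh
# Sketch only: automates the cold/hot cache read test from steps 3 and 4.
# Helper names are hypothetical; drop_page_cache must be run as root.

# Print the wall-clock seconds needed to read one file completely.
# Assumes GNU date, which supports %N (nanoseconds).
read_seconds() {
    start=$(date +%s.%N)
    cat "$1" > /dev/null
    end=$(date +%s.%N)
    echo "$start $end" | awk '{ printf "%.3f\n", $2 - $1 }'
}

# Flush dirty pages, then drop the page cache (requires root).
drop_page_cache() {
    sync && sysctl vm.drop_caches=1 > /dev/null
}

# Drop the cache once, then read the two files in the order a b a b,
# printing one "file<TAB>seconds" line per read.
cold_hot_test() {
    drop_page_cache || return 1
    for f in "$1" "$2" "$1" "$2"; do
        printf '%s\t%s\n' "$f" "$(read_seconds "$f")"
    done
}

# Usage: cold_hot_test a b
```

If the data were cached once per shared extent, only the first line of the output should show a large number.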
> Ideally, the first 'cat a' would load the file back from disk, so it
> will take a long time, and the other three would be very fast as the
> shared extent data would already be in RAM.
>
> That is not what happens on 3.17.1:
>
> time cat a > /dev/null
> real 0m18.870s
> user 0m0.017s
> sys 0m3.432s
>
> time cat b > /dev/null
> real 0m16.931s
> user 0m0.007s
> sys 0m3.357s
>
> time cat a > /dev/null
> real 0m0.141s
> user 0m0.001s
> sys 0m0.136s
>
> time cat b > /dev/null
> real 0m0.121s
> user 0m0.002s
> sys 0m0.116s
>
> Above we see that reading 'b' the first time takes almost as long as 'a'.
> The second reads are cached, so they finish two orders of magnitude
> faster.
>
> That suggests that deduplicated extents are read and cached as entirely
> separate copies of the data. The sys time for the first read of 'b'
> would imply separate decompression as well.
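One way to look at this more directly than by timing might be to ask the kernel how many pages of each file are resident in the page cache. The sketch below assumes fincore(1) from util-linux (2.30 or newer), which is not mentioned anywhere in this thread; resident_pages is a hypothetical wrapper name:

```shell
#!/bin/sh
# Sketch only: report how many pages of each file are resident in the
# page cache. Assumes fincore(1) from util-linux 2.30 or newer.

# Print "<resident pages> <file>" for every file given.
resident_pages() {
    fincore --noheadings --output PAGES,FILE "$@"
}

# Usage (as root, with the files a and b from the test):
#   sync; sysctl vm.drop_caches=1
#   cat a > /dev/null
#   resident_pages a b
```

If the two files really are cached separately, I would expect 'b' to show no resident pages after only 'a' has been read.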
>
> Compare the above result with a hardlink, which might behave more like
> what we expect:
>
> rm -f b
> ln a b
> sync; sysctl vm.drop_caches=1
>
> time cat a > /dev/null
> real 0m20.262s
> user 0m0.010s
> sys 0m3.376s
>
> time cat b > /dev/null
> real 0m0.125s
> user 0m0.003s
> sys 0m0.120s
>
> time cat a > /dev/null
> real 0m0.103s
> user 0m0.004s
> sys 0m0.097s
>
> time cat b > /dev/null
> real 0m0.098s
> user 0m0.002s
> sys 0m0.091s
>
> Above we clearly see that we read 'a' from disk only once, and use the
> cache three times.
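As far as I understand, the difference comes from the page cache being kept per inode: two hardlinks are one and the same inode, while a reflink copy is a new inode that only shares extents on disk. A minimal sketch to check which case a pair of paths falls into (same_inode is a hypothetical helper, and it assumes GNU stat):

```shell
#!/bin/sh
# Sketch only: tell whether two paths are the same inode (hardlinks) or
# separate inodes (e.g. reflink copies). same_inode is a hypothetical helper.

# Succeeds if both paths refer to the same device and inode number.
# Assumes GNU stat (-c with %d for device and %i for inode).
same_inode() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Usage:
#   same_inode a b && echo "one inode" || echo "separate inodes"
```

For the hardlink case above this should report one inode, and for the reflink copy it should report separate inodes.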