From: LuVar <luvar@plaintext.sk>
To: Zygo Blaxell <zblaxell@furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs deduplication and linux cache management
Date: Mon, 3 Nov 2014 15:09:11 +0100 (GMT+01:00)
Message-ID: <1959259002.1771415023751082.JavaMail.root@shiva>
In-Reply-To: <20141030160004.GK17395@hungrycats.org>
Thanks for the nice "replicate it at home yourself" example. On my machine it behaves precisely as in yours:
<code>
root@blackdawn:/home/luvar# sync; sysctl vm.drop_caches=1
vm.drop_caches = 1
root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m6.768s
user 0m0.016s
sys 0m0.599s
root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m5.259s
user 0m0.018s
sys 0m0.695s
root@blackdawn:/home/luvar# time cat /home/luvar/programs/adt-bundle-linux/sdk/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m0.701s
user 0m0.014s
sys 0m0.288s
root@blackdawn:/home/luvar# time cat /home/luvar/programs/android-sdk-linux/system-images/android-L/default/armeabi-v7a/userdata.img > /dev/null
real 0m0.286s
user 0m0.013s
sys 0m0.272s
</code>
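To put a number on the cold versus hot difference in the timings above, the ratios can be computed directly from the "real" values. This is only a minimal sketch; the helper names to_seconds and speedup are my own and do not come from any existing tool.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "0m6.768s" into plain seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of cold-cache to hot-cache wall time, to one decimal place.
# (Helper names are illustrative assumptions, not part of any tool.)
speedup() {
    cold=$(to_seconds "$1")
    hot=$(to_seconds "$2")
    awk -v c="$cold" -v h="$hot" 'BEGIN { printf "%.1f\n", c / h }'
}

speedup 0m6.768s 0m0.701s   # adt-bundle copy: first read vs. second read
speedup 0m5.259s 0m0.286s   # android-sdk copy: first read vs. second read
```

On my numbers this comes out at roughly 10x and 18x, and the first read of the second copy is almost as slow as the first read of the first copy.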
If you don't mind my asking, is there any plan to optimize this behaviour? I know that btrfs is not like ZFS (which covers the whole stack, from block device through cache to VFS), so would it be possible to implement such an optimization without a major patch to the Linux block cache/VFS cache?
Thanks, have a nice day,
--
LuVar
----- "Zygo Blaxell" <zblaxell@furryterror.org> wrote:
> On Thu, Oct 30, 2014 at 10:26:07AM +0100, luvar@plaintext.sk wrote:
> > Hi,
> > I want to ask whether deduplicated file content will be cached in the
> > Linux kernel just once for two deduplicated files.
> >
> > To explain in depth:
> > - I use btrfs for the whole system, with a few subvolumes and
> > compression on some of them.
> > - I have two directories with the Eclipse SDK with slight differences
> > (same version, different config).
> > - I assume that the given directories are deduplicated, so the two
> > Eclipse installations take up roughly as much space on the HDD as one
> > would.
> > - I will start one of the given Eclipse installations.
> > - The Linux kernel will cache all files opened during Eclipse startup
> > (I have enough free RAM).
> > - I am just a happy, stupid Linux user:
> > 1. Will the kernel cache file content after decompression? (I think
> > yes)
> > 2. Will the cached data be in the VFS layer or in the block device layer?
>
> My guess based on behavior is the VFS layer. See below.
>
> > - When I launch the second Eclipse (different from the first, but
> > deduplicated against it) after the first one:
> > 1. Will the second start require less data to be read from the HDD?
>
> No.
>
> > 2. Will metadata for the second instance be read from the HDD? (I assume
> > yes)
>
> Yes (how could it not?).
>
> > 3. Will the actual data be read a second time? (I hope not)
>
> Unfortunately, yes.
>
> This is my test:
>
> 1. Create a file full of compressible data that is big enough to take
> a few seconds to read from disk, but not too big to fit in RAM:
>
> yes $(date) | head -c 500m > a
>
> 2. Create a "deduplicated" (shared extent) copy of same:
>
> cp --reflink=always a b
>
> (use filefrag -v to verify both files have same physical extents)
>
> 3. Drop caches
>
> sync; sysctl vm.drop_caches=1
>
> 4. Time reading both files with cold and hot cache:
>
> time cat a > /dev/null
> time cat b > /dev/null
> time cat a > /dev/null
> time cat b > /dev/null
>
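The extent check mentioned in step 2 can be scripted instead of compared by eye. This is a minimal sketch, assuming the usual "filefrag -v" table layout from e2fsprogs, in which each extent row starts with an index and a colon and the third colon-separated column is the physical range. The function names physical_extents and same_extents are hypothetical.

```shell
#!/bin/sh
# Read `filefrag -v` output on stdin and print one physical extent range
# per line, e.g. "1234567..1234598". Assumes each extent row looks like
# "  0:  0..  31:  1234567..  1234598:  32:  flags" (colon-separated).
physical_extents() {
    awk -F: '/^ *[0-9]+:/ && NF >= 4 { gsub(/ /, "", $3); print $3 }'
}

# Succeed only if both files report the same, non-empty list of
# physical extents. (Hypothetical helper, not part of filefrag.)
same_extents() {
    x=$(filefrag -v "$1" | physical_extents)
    y=$(filefrag -v "$2" | physical_extents)
    [ -n "$x" ] && [ "$x" = "$y" ]
}
```

With the files from the steps above, "same_extents a b && echo shared" should print "shared" after the reflink copy, if the layout assumption holds on your version of filefrag.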
> Ideally, the first 'cat a' would load the file back from disk, so it
> will take a long time, and the other three would be very fast as the
> shared extent data would already be in RAM.
>
> That is not what happens on 3.17.1:
>
> time cat a > /dev/null
> real 0m18.870s
> user 0m0.017s
> sys 0m3.432s
>
> time cat b > /dev/null
> real 0m16.931s
> user 0m0.007s
> sys 0m3.357s
>
> time cat a > /dev/null
> real 0m0.141s
> user 0m0.001s
> sys 0m0.136s
>
> time cat b > /dev/null
> real 0m0.121s
> user 0m0.002s
> sys 0m0.116s
>
> Above we see that reading 'b' the first time takes almost as long as 'a'.
> The second reads are cached, so they finish two orders of magnitude
> faster.
>
> That suggests that deduplicated extents are read and cached as entirely
> separate copies of the data. The sys time for the first read of 'b'
> would imply separate decompression as well.
>
> Compare the above result with a hardlink, which might behave more like
> what we expect:
>
> rm -f b
> ln a b
> sync; sysctl vm.drop_caches=1
>
> time cat a > /dev/null
> real 0m20.262s
> user 0m0.010s
> sys 0m3.376s
>
> time cat b > /dev/null
> real 0m0.125s
> user 0m0.003s
> sys 0m0.120s
>
> time cat a > /dev/null
> real 0m0.103s
> user 0m0.004s
> sys 0m0.097s
>
> time cat b > /dev/null
> real 0m0.098s
> user 0m0.002s
> sys 0m0.091s
>
> Above we clearly see that we read 'a' from disk only once, and use the
> cache three times.