From: Zygo Blaxell <zblaxell@furryterror.org>
To: luvar@plaintext.sk
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs deduplication and linux cache management
Date: Thu, 30 Oct 2014 12:00:04 -0400
Message-ID: <20141030160004.GK17395@hungrycats.org>
In-Reply-To: <650272300.271414661167523.JavaMail.root@shiva>

On Thu, Oct 30, 2014 at 10:26:07AM +0100, luvar@plaintext.sk wrote:
> Hi,
> I want to ask, if deduplicated file content will be cached in linux kernel just once for two deduplicated files.
> 
> To explain in deep:
>  - I use btrfs for whole system with few subvolumes with some compression on some subvolumes.
>  - I have two directories with eclipse SDK with slightly differences (same version, different config)
>  - I assume that given directories is deduplicated and so two eclipse installations take place on hdd like one would (in rough estimation)
>  - I will start one of given eclipse
>  - linux kernel will cache all opened files during start of eclipse (I have enough free ram)
>  - I am just happy stupid linux user:
>     1. will kernel cache file content after decompression? (I think yes)
>     2. cached data will be in VFS layer or in block device layer?

My guess based on behavior is the VFS layer.  See below.

>  - When I will launch second eclipse (different from first, but deduplicated from first) after first one:
>     1. will second start require less data to be read from HDD?

No.

>     2. will be metadata for second instance read from hdd? (I assume yes)

Yes (how could it not?).

>     3. will be actual data read second time? (I hope not)

Unfortunately, yes.

This is my test:

1.  Create a file full of compressible data that is big enough to take
a few seconds to read from disk, but not too big to fit in RAM:

	yes $(date) | head -c 500m > a

2.  Create a "deduplicated" (shared extent) copy of same:

	cp --reflink=always a b

	(use filefrag -v to verify both files have same physical extents)

3.  Drop caches

	sync; sysctl vm.drop_caches=1

4.  Time reading both files with cold and hot cache:

	time cat a > /dev/null
	time cat b > /dev/null
	time cat a > /dev/null
	time cat b > /dev/null
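
Steps 3 and 4 can be wrapped in a small helper so the run is easy
to repeat.  This is only a sketch: the function name time_reads is
made up for illustration, it assumes GNU date (for %N), and it reports
wall-clock time only, not the user/sys split that 'time' gives.

```shell
#!/bin/sh
# Read each named file once, in the order given, and print the elapsed
# wall-clock time in milliseconds.  Assumes GNU date (%N = nanoseconds).
time_reads() {
    for f in "$@"; do
        start=$(date +%s%N)
        cat "$f" > /dev/null
        end=$(date +%s%N)
        echo "$f $(( (end - start) / 1000000 )) ms"
    done
}

# Usage (as root, in the directory holding the test files):
#   sync; sysctl vm.drop_caches=1
#   time_reads a b a b
```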

Ideally, the first 'cat a' would load the file back from disk, so it
will take a long time, and the other three would be very fast as the
shared extent data would already be in RAM.

That is not what happens on 3.17.1:

	time cat a > /dev/null
	real    0m18.870s
	user    0m0.017s
	sys     0m3.432s

	time cat b > /dev/null
	real    0m16.931s
	user    0m0.007s
	sys     0m3.357s

	time cat a > /dev/null
	real    0m0.141s
	user    0m0.001s
	sys     0m0.136s

	time cat b > /dev/null
	real    0m0.121s
	user    0m0.002s
	sys     0m0.116s

Above we see that reading 'b' the first time takes almost as long as 'a'.
The second reads are cached, so they finish two orders of magnitude
faster.
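
To put a number on "two orders of magnitude", the ratio of cold to
hot 'real' times from the run above can be worked out with awk.  The
helper name speedup is just for illustration:

```shell
#!/bin/sh
# Print how many times faster the hot (cached) read was than the cold one.
speedup() {
    awk -v cold="$1" -v hot="$2" 'BEGIN { printf "%.0fx\n", cold / hot }'
}

speedup 18.870 0.141   # first vs. second read of 'a' -> 134x
speedup 16.931 0.121   # first vs. second read of 'b' -> 140x
```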

That suggests that deduplicated extents are read and cached as entirely
separate copies of the data.  The sys time for the first read of 'b'
would imply separate decompression as well.
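
That conclusion only holds if 'a' and 'b' really do share extents,
which is what the filefrag -v check in step 2 is for.  A rough sketch
for automating that comparison follows.  The function names are made
up, and the parsing assumes the usual filefrag -v table layout, where
each extent row starts with an index and a colon and the third
colon-separated field is the physical offset range:

```shell
#!/bin/sh
# Print the physical_offset column of `filefrag -v` output read from stdin.
# Extent rows start with an extent index followed by a colon.
physical_extents() {
    awk -F: '/^ *[0-9]+:/ { gsub(/ /, "", $3); print $3 }'
}

# Succeed only if both files report the same, non-empty list of
# physical extent ranges.
same_extents() {
    x=$(filefrag -v "$1" | physical_extents)
    y=$(filefrag -v "$2" | physical_extents)
    [ -n "$x" ] && [ "$x" = "$y" ]
}

# Usage, with the files from steps 1 and 2:
#   same_extents a b && echo "shared" || echo "not shared"
```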

Compare the above result with a hardlink, which might behave more like
what we expect:

	rm -f b
	ln a b
	sync; sysctl vm.drop_caches=1

	time cat a > /dev/null
	real    0m20.262s
	user    0m0.010s
	sys     0m3.376s

	time cat b > /dev/null
	real    0m0.125s
	user    0m0.003s
	sys     0m0.120s

	time cat a > /dev/null
	real    0m0.103s
	user    0m0.004s
	sys     0m0.097s

	time cat b > /dev/null
	real    0m0.098s
	user    0m0.002s
	sys     0m0.091s

Above we clearly see that we read 'a' from disk only once, and use the
cache three times.
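
One plausible reading of the difference: a hardlink is a second name
for the same inode, while a reflink copy is a separate inode that
happens to point at the same extents, and the page cache is kept per
inode.  The inode part is easy to check.  A small sketch, assuming GNU
stat and a made-up helper name:

```shell
#!/bin/sh
# Succeed if both paths name the same inode on the same device.
# Assumes GNU stat (-c FORMAT; %d = device number, %i = inode number).
same_inode() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Usage:
#   same_inode a b && echo "one inode" || echo "separate inodes"
```

With 'ln a b' this should report one inode; with 'cp --reflink=always
a b' it should report separate inodes.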