From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: luvar@plaintext.sk, linux-btrfs@vger.kernel.org
Subject: Re: btrfs deduplication and linux cache management
Date: Thu, 30 Oct 2014 08:00:54 -0400
Message-ID: <54522876.5030607@gmail.com>
In-Reply-To: <650272300.271414661167523.JavaMail.root@shiva>
On 2014-10-30 05:26, luvar@plaintext.sk wrote:
> Hi,
> I want to ask whether deduplicated file content is cached by the Linux kernel just once for two deduplicated files.
>
> To explain in depth:
> - I use btrfs for the whole system, with a few subvolumes, and compression enabled on some of them.
> - I have two directories containing the Eclipse SDK with slight differences (same version, different config).
> - I assume that the two directories are deduplicated, so the two Eclipse installations take up roughly the space of one on the HDD.
> - I start one of the two Eclipse installations.
> - The Linux kernel caches all files opened during Eclipse startup (I have enough free RAM).
> - I am just a happy, stupid Linux user:
> 1. Will the kernel cache file content after decompression? (I think yes.)
> 2. Will the cached data be in the VFS layer or in the block device layer?
> - When I launch the second Eclipse (different from the first, but deduplicated against it) after the first one:
> 1. Will the second start require less data to be read from the HDD?
> 2. Will the metadata for the second instance be read from the HDD? (I assume yes.)
> 3. Will the actual data be read a second time? (I hope not.)
>
> Thanks for answers,
> have a nice day,
I don't know for certain, but here is how I understand things work in
this case:
1. Individual blocks are cached in the block device layer, which means
that the de-duplicated data would be cached at most as many times as
there are disks it is on (i.e. at most once for a single-device
filesystem, up to twice for a multi-device btrfs raid1 setup).
2. In the VFS layer, the cache handles decoded inodes (the actual file
metadata), dentries (the file's entry in the parent directory), and
individual pages of file content (after decompression). AFAIK, the VFS
layer's cache is pathname-based, so it would probably hold two copies
of the data, but after the metadata look-up, the second read wouldn't
need to go to the disk because of the block layer cache.
Overall, this means that while de-duplicated data may be cached more
than once, it shouldn't need to be reread from disk if there is still a
copy in cache. Metadata may or may not need to be read from the disk,
depending on what is in the VFS cache.
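
Since the above is only my understanding, it may be worth checking
empirically. A minimal sketch, assuming Python 3 is available: read one
file, then its de-duplicated twin, and compare how long each read takes.
The names timed_read and compare are made up for this illustration, and
wall-clock timing is only a rough indicator. For a meaningful result the
page cache should be cold before the first read (for example after a
reboot, or after writing 3 to /proc/sys/vm/drop_caches as root).

```python
import sys
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file and return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare(first, second):
    """Read `first`, then `second`, and return both measurements in order."""
    return timed_read(first), timed_read(second)


# Hypothetical usage: python3 readtime.py /path/to/copy_a /path/to/copy_b
if __name__ == "__main__" and len(sys.argv) == 3:
    results = compare(sys.argv[1], sys.argv[2])
    for path, (size, secs) in zip(sys.argv[1:], results):
        print(f"{path}: {size} bytes in {secs:.4f} s")
```

If the second read is much faster than the first, the shared data was
most likely served from a cache rather than from the disk. If both take
about the same time, it was probably read from the disk again.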