From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: luvar@plaintext.sk, linux-btrfs@vger.kernel.org
Subject: Re: btrfs deduplication and linux cache management
Date: Thu, 30 Oct 2014 08:00:54 -0400
Message-ID: <54522876.5030607@gmail.com>
In-Reply-To: <650272300.271414661167523.JavaMail.root@shiva>

On 2014-10-30 05:26, luvar@plaintext.sk wrote:
> Hi,
> I want to ask whether deduplicated file content is cached in the Linux kernel just once for two deduplicated files.
>
> To explain in depth:
>   - I use btrfs for the whole system, with a few subvolumes, some of which use compression.
>   - I have two directories containing the Eclipse SDK with slight differences (same version, different config).
>   - I assume that the given directories are deduplicated, so the two Eclipse installations take up roughly as much space on the HDD as one would.
>   - I will start one of the given Eclipse installations.
>   - The Linux kernel will cache all files opened during Eclipse startup (I have enough free RAM).
>   - I am just a happy, stupid Linux user:
>      1. Will the kernel cache file content after decompression? (I think yes.)
>      2. Will the cached data be in the VFS layer or in the block device layer?
>   - When I launch the second Eclipse (different from the first, but deduplicated against it) after the first one:
>      1. Will the second start require less data to be read from the HDD?
>      2. Will metadata for the second instance be read from the HDD? (I assume yes.)
>      3. Will the actual data be read a second time? (I hope not.)
>
> Thanks for answers,
> have a nice day,

I don't know for certain, but here is how I understand things to work
in this case:
1. Individual blocks are cached in the block device layer, which means
that the de-duplicated data would be cached at most as many times as
there are disks it is on (i.e. at most once for a single-device
filesystem, up to twice for a multi-device btrfs raid1 setup).
2. In the VFS layer, the cache handles decoded inodes (the actual file
metadata), dentries (the file's entry in the parent directory), and
individual pages of file content (after decompression).  AFAIK, the VFS
layer's cache is pathname based, so it would probably hold two copies
of the data, but after the metadata look-up, the second read wouldn't
need to go to the disk because of the block layer cache.

Overall, this means that while de-duplicated data may be cached more
than once, it shouldn't need to be reread from disk as long as a copy
is still in the cache.  Metadata may or may not need to be read from
the disk, depending on what is in the VFS cache.
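
Since the above is only my understanding, it may be worth measuring
rather than guessing.  Below is a minimal sketch, in Python, of one way
to do that: read one copy of a file, then its de-duplicated twin, and
compare how long each read takes.  The helper names (timed_read,
compare_reads) are made up for illustration and are not part of any
btrfs tooling.  It assumes you start from a cold cache (as root,
"sync; echo 3 > /proc/sys/vm/drop_caches" should do that) and that the
two paths really do share extents.  Wall-clock timing is a rough proxy
at best, so treat the numbers as a hint rather than proof.

```python
import time


def timed_read(path, chunk_size=1 << 20):
    """Read a file sequentially; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own userspace buffer; the kernel
    # caches are still used, which is what we want to observe.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_reads(first, second):
    """Read `first`, then `second`, and record size and timing for each.

    If the second read is served from a cache, it would be expected to
    finish noticeably faster than the first on a cold start.
    """
    results = {}
    for label, path in (("first", first), ("second", second)):
        nbytes, seconds = timed_read(path)
        results[label] = {"path": path, "bytes": nbytes, "seconds": seconds}
    return results
```

To use it, call compare_reads() with the same large file from each of
the two Eclipse directories and print the "seconds" values.  Running it
a second time without dropping caches gives a baseline for what a fully
cached read looks like on your machine.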


