From: Laurent Stacul <captain.stac@gmail.com>
To: Vito Caputo <vcaputo@pengaru.com>
Cc: Laurent Stacul <captain.stac@gmail.com>, linux-kernel@vger.kernel.org
Subject: Re: XFS/mmap reflink file question
Date: Thu, 19 Aug 2021 12:04:17 +0200
Message-ID: <YR4soQ7AV+mxL2ml@saturne.home>
In-Reply-To: <20210817221258.jb4pg77bdle7t2oj@shells.gnugeneration.com>

On Tue, Aug 17, 2021 at 03:12:58PM -0700, Vito Caputo wrote:
>On Tue, Aug 17, 2021 at 02:19:12PM +0200, Laurent Stacul wrote:
>> Hello,
>>
>> I spent much time digging into the mmap mechanism and I still don't have a
>> clear view on whether mmap'ing a file and a reflink to this file maps the data
>> twice in memory (this only applies if the filesystem supports the reflink
>> feature, like XFS).
>>
>> To describe my tests, I generate a file stored on an XFS partition and create a
>> reflink of it:
>>
>>     % dd if=/dev/zero of=./output.dat bs=1M count=24
>>     % cp --reflink -v output.dat output2.dat
>>     % xfs_bmap -v output.dat
>>     output.dat:
>>      EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL
>>        0: [0..49151]:      3756776..3805927  0 (3756776..3805927) 49152 100000
>>     % xfs_bmap -v output2.dat
>>     output2.dat:
>>      EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL
>>        0: [0..49151]:      3756776..3805927  0 (3756776..3805927) 49152 100000
>>
>> Then I mmap the first file twice using vmtouch tool:
>>
>>     % vmtouch -l output.dat&
>>     [1] 15870
>>     LOCKED 6144 pages (24M)
>>     % vmtouch -l output.dat&
>>     [2] 15872
>>     LOCKED 6144 pages (24M)
>>     % pmap -X 15872 | grep -e 'Pss' -e 'output' | awk '{if(NR>1)printf("%16s %4s %6s %10s %10s %10s\n", $1, $2, $4, $5, $7, $8)}'
>>          Address Perm Device      Inode        Rss        Pss
>>     7fcbb9eb9000 r--s  fc:10    3755268      24576      12288
>>
>> As we can see, the Proportional Set Size is, as expected, half of the Resident
>> Set Size because the memory is shared by the two processes. Now, I mmap the
>> reflink 'output2.dat' of 'output.dat':
>>
>>     % vmtouch -l output2.dat&
>>     [3] 15892
>>     LOCKED 6144 pages (24M)
>>     % pmap -X 15872 | grep -e 'Pss' -e 'output' | awk '{if(NR>1)printf("%16s %4s %6s %10s %10s %10s\n", $1, $2, $4, $5, $7, $8)}'
>>          Address Perm Device      Inode        Rss        Pss
>>     7fcbb9eb9000 r--s  fc:10    3755268      24576      12288
>>
>> The Pss of the file mmap'ed by the first process has not decreased (I expected
>> a value of Rss / 3 because I hoped the memory would be shared by the 3
>> processes). If I look at the process map of the last process, we can see that
>> a new memory area was allocated and locked:
>>
>>     % pmap -X 15892 | grep -e 'Pss' -e 'output' | awk '{if(NR>1)printf("%16s %4s %6s %10s %10s %10s\n", $1, $2, $4, $5, $7, $8)}'
>>           Address Perm Device      Inode        Rss        Pss
>>      7f5adc53f000 r--s  fc:10    3755269      24576      24576
>>
>> So my questions:
>> - Why can't we benefit from memory sharing when reflinked files are mmap'ed?
>>   It would be great because one application, in the context of containers,
>>   would be the possibility to share read-only areas between containers that
>>   are built from layer diffs that are reproducible between images. We can
>>   imagine a layer that brings some shared libraries into an image from a
>>   reproducible FS diff, so that containers would not load a library several
>>   times.
>> - I can think of many tricky cases with the behavior I was expecting (especially
>>   if a process has write access to the mapped area), but if you know a way or
>>   an option to achieve what I am trying to do, I would be glad to hear it.
>> - Conversely, don't hesitate to tell me my expectation is just crazy.
>>
>> Anyway, I am always looking forward to listening to valuable specialist insights.
>> Thanks in advance,
>>
>> stac
>>
>> PS: Please add me in CC if this message deserves an answer.
>>
>
>This is one of the major features overlayfs brings to the table over
>reflink's current implementation.
>
>With reflink copies you get distinct inodes and the data sharing
>occurs further down in the fs at the extent level, below the struct
>address_space instances.
>
>If memory serves Dave Chinner has given the issue some thought, but I
>haven't noticed/heard anything in terms of progress there.  Maybe
>he'll see this and chime in...
>
>Regards,
>Vito Caputo

Thanks for your answer. If I understand correctly, the reflink feature
cannot be used in the scenario I propose because reflinks are an
optimization occurring below the VFS, at the extent level. This makes
sense to me; I was not really confident this had a chance to work.
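
To illustrate the point about distinct inodes, here is a minimal sketch,
assuming GNU coreutils (cp --reflink=auto, stat -c) and a scratch directory
from mktemp; the file name hardlink.dat and the variable names are only
illustrative. The page cache is indexed per inode (one struct address_space
each), so a copy with its own inode gets its own cached pages even when the
extents on disk are shared, whereas a hard link reuses the inode:

```shell
#!/bin/sh
# Sketch only: compare inode numbers of a file, a (possibly reflinked) copy,
# and a hard link. Assumes GNU coreutils; names are illustrative.
set -eu
dir=$(mktemp -d)

dd if=/dev/zero of="$dir/output.dat" bs=1M count=1 2>/dev/null

# --reflink=auto shares extents where the filesystem supports it and
# silently falls back to a regular copy elsewhere. Either way the
# destination is a new inode.
cp --reflink=auto "$dir/output.dat" "$dir/output2.dat"

# A hard link is a second name for the same inode.
ln "$dir/output.dat" "$dir/hardlink.dat"

ino_orig=$(stat -c %i "$dir/output.dat")
ino_copy=$(stat -c %i "$dir/output2.dat")
ino_link=$(stat -c %i "$dir/hardlink.dat")

echo "output.dat    inode $ino_orig"
echo "output2.dat   inode $ino_copy (copy: different inode, separate page cache)"
echo "hardlink.dat  inode $ino_link (hard link: same inode, same page cache)"

rm -rf "$dir"
```

This matches the pmap output in the test, where the two mappings report
inodes 3755268 and 3755269.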

As you suggest, I will turn my efforts to overlayfs.
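
For the record, the Pss figures in the quoted test are consistent with this
explanation. A small sketch of the arithmetic, assuming every page of the
mapping is shared equally by all processes mapping the same inode (the
helper name pss is hypothetical):

```shell
#!/bin/sh
# Sketch only: Pss of a fully shared mapping is Rss divided by the number
# of processes mapping the same pages. Values are in kB, taken from the
# pmap output in the quoted test.
rss_kb=24576

# pss N: proportional share when N processes map the same pages
pss() { echo $(( rss_kb / $1 )); }

pss_two=$(pss 2)     # two vmtouch processes on output.dat
pss_three=$(pss 3)   # what sharing with the output2.dat mapping would give
pss_alone=$(pss 1)   # a mapping with no other sharer

echo "2 sharers: $pss_two kB"
echo "3 sharers: $pss_three kB"
echo "1 sharer:  $pss_alone kB"
```

The observed values are 12288 kB for output.dat (two sharers) and 24576 kB
for output2.dat (one sharer), so the third process did not join the
existing mapping's pages.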

Regards,
stac
