From: Matthew Wilcox <willy@infradead.org>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] Shared memory for shared extents
Date: Mon, 25 Oct 2021 16:43:23 +0100
Message-ID: <YXbQm6TxaWcLnpal@casper.infradead.org>
In-Reply-To: <20211025145301.hk627p2qcotxegrd@fiona>

On Mon, Oct 25, 2021 at 09:53:01AM -0500, Goldwyn Rodrigues wrote:
> On 2:43 23/10, Matthew Wilcox wrote:
> > On Fri, Oct 22, 2021 at 03:15:00PM -0500, Goldwyn Rodrigues wrote:
> > > This is an attempt to reduce the memory footprint by using a shared
> > > page(s) for shared extent(s) in the filesystem. I am hoping to start a
> > > discussion to iron out the details for implementation.
> >
> > When you say "Shared extents", you mean reflinks, which are COW, right?
>
> Yes, shared extents are extents which are shared on disk by two or more
> files. Yes, same as reflinks. Just to explain with an example:
>
> If two files, f1 and f2 have shared extent(s), and both files are read. Each
> file's mapping->i_pages will hold a copy of the contents of the shared
> extent on disk. So, f1->mapping will have one copy and f2->mapping will
> have another copy.
>
> For reads (and only reads), if we use underlying device's mapping, we
> can save on duplicate copy of the pages.

Yes; I'm familiar with the problem. Dave Chinner and I had a great
discussion about it at LCA a couple of years ago.

The implementation I've had in mind for a while is that the filesystem
either creates a separate inode for a shared extent, or (as you've
done here) uses the bdev's inode. We can discuss the pros/cons of
that separately.

To avoid the double-lookup problem, I was intending to generalise DAX
entries into PFN entries. That way, if the read() (or mmap read fault)
misses in the inode's cache, we can look up the shared extent cache,
and then cache the physical address of the memory in the inode.

That makes reclaim/eviction of the page in the shared extent more
expensive because you have to iterate all the inodes which share the
extent and remove the PFN entries before the page can be reused.

Perhaps we should have a Zoom meeting about this before producing duelling
patch series? I can host if you're interested.
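
To make the lookup and eviction trade-off described above concrete, here
is a rough userspace toy model in Python. It is not kernel code and not
taken from any patch in this thread. Every name in it (SharedExtentCache,
Inode, read_page, evict, the rmap table) is hypothetical, and the reverse
map is only one assumed way of finding "all the inodes which share the
extent". A Python object reference stands in for a PFN entry.

```python
from collections import defaultdict


class SharedExtentCache:
    """Toy stand-in for the shared cache (a separate inode or the bdev inode)."""

    def __init__(self, read_block):
        self.read_block = read_block   # callable: disk block number -> bytes
        self.pages = {}                # disk block -> cached page contents
        # Assumption: a reverse map from disk block to the (inode, index)
        # slots that cached a reference to the page.
        self.rmap = defaultdict(set)

    def get_page(self, block):
        page = self.pages.get(block)
        if page is None:
            # Miss in the shared cache: read from "disk" exactly once.
            page = self.pages[block] = self.read_block(block)
        return page

    def evict(self, block):
        """Drop a shared page; returns how many per-inode entries were removed."""
        users = self.rmap.pop(block, set())
        for inode, index in users:
            # The expensive part: every sharing inode must forget the page
            # before the memory can be reused.
            inode.entries.pop(index, None)
        self.pages.pop(block, None)
        return len(users)


class Inode:
    """Toy inode: extent_map maps file page index -> disk block."""

    def __init__(self, extent_map, shared):
        self.extent_map = extent_map
        self.shared = shared
        self.entries = {}              # file page index -> "PFN entry"

    def read_page(self, index):
        page = self.entries.get(index)
        if page is not None:
            return page                # hit: a single lookup, no second cache
        block = self.extent_map[index]
        page = self.shared.get_page(block)
        self.entries[index] = page     # cache the reference in this inode
        self.shared.rmap[block].add((self, index))
        return page


if __name__ == "__main__":
    shared = SharedExtentCache(lambda block: f"block {block}".encode())
    f1 = Inode({0: 100}, shared)       # f1 page 0 and f2 page 3 share block 100
    f2 = Inode({3: 100}, shared)
    assert f1.read_page(0) is f2.read_page(3)   # one copy in memory
    shared.evict(100)                            # must visit both inodes
```

In this model a repeated read of the same file page is satisfied from the
inode's own entries, so the second lookup is only paid on a miss, while
evict() has to touch one entry per sharing inode.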