linux-fsdevel.vger.kernel.org archive mirror
* How can we share page cache pages for reflinked files?
@ 2017-08-10  4:28 Dave Chinner
  2017-08-10  5:57 ` Kirill A. Shutemov
  2017-08-10 16:11 ` Matthew Wilcox
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2017-08-10  4:28 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm

Hi folks,

I've recently been looking into what is involved in sharing page
cache pages for shared extents in a filesystem. That is, create a
file, reflink it so there's two files but only one copy of the data
on disk, then read both files.  Right now, we get two copies of the
data in the page cache - one in each inode mapping tree.

If we scale this up to a container host that uses reflink trees for
its shared root images, there might be hundreds of copies of the
same data held in cache (i.e. one copy per container). Given that
the filesystem knows that the underlying data extent is shared when
we go to read it, it's relatively easy to add mechanisms to the
filesystem to return the same page for all attempts to read from a
shared extent, from any inode that shares it.

However, the problem I'm getting stuck on is that the page cache
itself can't handle inserting a single page into multiple page cache
mapping trees, i.e. the page has a single pointer to the mapping
address space, and the mapping has a single pointer back to the
owner inode. As such, a cached page has a 1:1 mapping to its host
inode and this structure seems to be assumed rather widely through
the code.
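
To make the 1:1 linkage concrete, here is a minimal userspace sketch
of the pointer chain described above. The struct and field names
(mapping, index, host, i_mapping) mirror the kernel's, but the
definitions are heavily simplified stand-ins for illustration, and
page_host() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the kernel's real definitions. */
struct inode;

struct address_space {
    struct inode *host;             /* single back pointer to the owner inode */
};

struct inode {
    unsigned long i_ino;
    struct address_space *i_mapping;
};

struct page {
    struct address_space *mapping;  /* exactly one mapping per page */
    unsigned long index;            /* page offset within that mapping */
};

/*
 * Hypothetical helper: follow page->mapping->host. Because each link is
 * a single pointer, a page can only ever resolve to one inode.
 */
static struct inode *page_host(const struct page *page)
{
    assert(page != NULL);
    return page->mapping ? page->mapping->host : NULL;
}
```

With two inodes whose mappings point back at them, a page inserted
into the first mapping resolves only to the first inode; there is no
slot in which to record the second.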

The problem is somewhat limited by the fact that only clean,
read-only pages would be shared, and the attempt to write/dirty
a shared page in a mapping would trigger a COW operation in the
filesystem which would invalidate that inode's shared page and
replace it with a new, inode-private page that could be written to.
This still requires us to be able to find the right inode from the
shared page context to run the COW operation. Luckily, the IO path
already has an inode pointer, and the page fault path provides us
with the inode via file_inode(vmf->vma->vm_file) so we don't
actually need page->mapping->host in these paths.
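
The COW-on-write behaviour described above could be modelled roughly
as follows. This is a userspace sketch under stated assumptions: the
model_* names, the sharers counter and the tiny page size are all
invented for illustration and say nothing about how a filesystem
would actually implement it. Note that the inode is passed in by the
caller, as in the IO and page fault paths, rather than being derived
from the page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16          /* tiny "page" for illustration only */

struct model_page {
    int sharers;                    /* number of inodes referencing this page */
    char data[MODEL_PAGE_SIZE];
};

struct model_inode {
    struct model_page *slot;        /* a single cached page, for brevity */
};

/*
 * Write one byte through 'inode'. If the cached page is shared, first
 * replace this inode's reference with a private copy (the COW step),
 * leaving the shared page untouched for the other inodes.
 * Returns 0 on success, -1 on failure.
 */
static int model_write(struct model_inode *inode, size_t off, char byte)
{
    assert(inode != NULL);
    struct model_page *pg = inode->slot;

    if (pg == NULL || off >= MODEL_PAGE_SIZE)
        return -1;

    if (pg->sharers > 1) {
        struct model_page *priv = malloc(sizeof(*priv));
        if (priv == NULL)
            return -1;
        memcpy(priv->data, pg->data, MODEL_PAGE_SIZE);
        priv->sharers = 1;
        pg->sharers--;              /* drop this inode's shared reference */
        inode->slot = priv;         /* inode now owns a private page */
        pg = priv;
    }

    pg->data[off] = byte;
    return 0;
}
```

A write through one of two sharing inodes leaves the other inode
still pointing at the unmodified shared page.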

Along these lines I've thought about using a "shared mapping" that
is associated with the filesystem rather than a specific inode (like
a bdev mapping), but that's no good because if page->mapping !=
inode->i_mapping the page is considered to have been invalidated and
is treated as invalid.
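
The check that defeats the "shared mapping" idea can be sketched like
this. The types and the page_valid_for_inode() helper are
hypothetical userspace stand-ins; only the comparison itself
(page->mapping against inode->i_mapping) comes from the description
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for illustration only. */
struct mapping_model { int id; };
struct cached_page   { struct mapping_model *mapping; };
struct inode_model   { struct mapping_model *i_mapping; };

/*
 * A page is only treated as valid for an inode if it still belongs to
 * that inode's own mapping. A page owned by a filesystem-wide shared
 * mapping would fail this test for every inode.
 */
static bool page_valid_for_inode(const struct cached_page *page,
                                 const struct inode_model *inode)
{
    assert(page != NULL && inode != NULL);
    return page->mapping == inode->i_mapping;
}
```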

Further - a page has a single, fixed index into the mapping tree
(i.e. page->index), so this prevents arbitrary page sharing across
inodes (the "deduplication triggered shared extent" case). And we
can't really get rid of the page index, because that's how the page
finds itself in a mapping tree.
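
As a small worked example of the index problem: if deduplication
makes the same data appear at offset 0 in one file and at offset
1MiB in another, the two files need different page indexes for the
same page. The helpers below are hypothetical, and the 4KiB page
size (a shift of 12) used in the usage note is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Convert a byte offset in a file to a page index for a given page shift. */
static unsigned long sketch_page_index(unsigned long long file_offset,
                                       unsigned int page_shift)
{
    assert(page_shift < 64);        /* shifting by >= width is undefined */
    return (unsigned long)(file_offset >> page_shift);
}

/* True only if one page->index value could serve both file offsets. */
static bool single_index_fits_both(unsigned long long off_a,
                                   unsigned long long off_b,
                                   unsigned int page_shift)
{
    return sketch_page_index(off_a, page_shift) ==
           sketch_page_index(off_b, page_shift);
}
```

With a shift of 12, offset 0 maps to index 0 and offset 1MiB maps to
index 256, so a single page->index cannot describe both references.
A plain whole-file reflink, where the shared data sits at the same
offset in both files, does not hit this.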

This leads me to think about crazy schemes like allocating a
"referring struct page" for every reference to a shared cache page
and chaining them all to the real struct page, sorta like we do for
compound pages. That would give us a unique struct page for each
mapping tree and solve many of the issues, but I'm not sure how
viable such a concept would be.
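
One way to picture the "referring struct page" idea is the userspace
sketch below. Everything here is hypothetical: the struct layouts,
the singly linked chain and the add_referrer() helper are
assumptions made purely to illustrate the shape of the scheme, not a
proposal for real kernel data structures.

```c
#include <assert.h>
#include <stdlib.h>

struct referring_page_model;

/* The one real page that actually holds the cached data. */
struct real_page_model {
    struct referring_page_model *referrers;  /* chain of references */
    unsigned int nr_referrers;
};

/* One per (mapping, index) reference to the shared real page. */
struct referring_page_model {
    void *mapping;                       /* the single mapping it lives in */
    unsigned long index;                 /* its index within that mapping */
    struct real_page_model *real;        /* the shared page it refers to */
    struct referring_page_model *next;   /* next referrer in the chain */
};

/*
 * Allocate a referrer for 'real' in the given mapping at the given
 * index and push it onto the real page's chain.
 * Returns NULL if allocation fails.
 */
static struct referring_page_model *
add_referrer(struct real_page_model *real, void *mapping, unsigned long index)
{
    assert(real != NULL);
    struct referring_page_model *ref = malloc(sizeof(*ref));

    if (ref == NULL)
        return NULL;
    ref->mapping = mapping;
    ref->index = index;
    ref->real = real;
    ref->next = real->referrers;
    real->referrers = ref;
    real->nr_referrers++;
    return ref;
}
```

Each referrer keeps its own mapping and index, so every mapping tree
would see a structure that looks 1:1, while the data lives in one
place. The cost is an extra allocation per reference and a chain to
walk whenever the real page changes state.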

I'm sure there are more issues than I've outlined here, but I haven't
gone deeper than this because I've got to solve the one-to-many
problem first.  I don't know if anyone has looked at this in any
detail, so I don't know what ideas, patches, crazy schemes, etc
might already exist out there. Right now I'm just looking for
information to narrow down what I need to look at - finding what
rabbit holes have already been explored and what dragons are already
known about would help an awful lot right now.

Anyone?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


