From: Jerome Glisse <jglisse@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Tejun Heo <tj@kernel.org>, Jan Kara <jack@suse.cz>,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH 00/14] Small step toward KSM for file back page.
Date: Wed, 7 Oct 2020 17:45:32 -0400
Message-ID: <20201007214532.GA3484657@redhat.com>
In-Reply-To: <20201007183316.GV20115@casper.infradead.org>

On Wed, Oct 07, 2020 at 07:33:16PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 07, 2020 at 01:54:19PM -0400, Jerome Glisse wrote:
> > On Wed, Oct 07, 2020 at 06:05:58PM +0100, Matthew Wilcox wrote:
> > > On Wed, Oct 07, 2020 at 10:48:35AM -0400, Jerome Glisse wrote:
> > > > On Wed, Oct 07, 2020 at 04:20:13AM +0100, Matthew Wilcox wrote:
> > > > > On Tue, Oct 06, 2020 at 09:05:49PM -0400, jglisse@redhat.com wrote:
> > > For other things (NUMA distribution), we can point to something which

[...]

> > > isn't a struct page and can be distinguished from a real struct page by a
> > > bit somewhere (I have ideas for at least three bits in struct page that
> > > could be used for this).  Then use a pointer in that data structure to
> > > point to the real page.  Or do NUMA distribution at the inode level.
> > > Have a way to get from (inode, node) to an address_space which contains
> > > just regular pages.
> > 
> > How do you find all the copies? KSM maintains a list for a reason.
> > The same would be needed here, because if you want to break the write
> > protection you need to find all the copies first. If you intend to walk
> > the page tables, then how do you synchronize to avoid more copies
> > spawning while you walk the reverse mapping? We could lock the struct
> > page, I guess. Also, how do you walk device page tables, which are
> > completely hidden from core mm?
> 
> You have the inode and you iterate over each mapping, looking up the page
> that's in each mapping.  Or you use the i_mmap tree to find the pages.

This would slow things down for everyone, as we would have to walk all
the mappings each time we try to write to a page. Also, we have a
mechanism for page writeback that avoids races between a thread trying
to write and writeback. We would need something similar here. Without
mediating this through struct page, I do not see how to keep this
reasonable from a performance point of view.
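
To make the cost argument concrete, here is a rough userspace sketch, not
kernel code. Every name in it (toy_page, toy_mapping, toy_inode and the two
helpers) is made up for illustration, and the layout is an assumption rather
than anything from the patch series. It contrasts walking every mapping of an
inode on a write with following a per-page list of copies, KSM-style:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MAPPINGS 8
#define TOY_NR_INDEX     16

/* Hypothetical stand-in for struct page. */
struct toy_page {
    int write_protected;
    struct toy_page *next_copy;   /* per-page list of copies */
};

/* Hypothetical stand-in for an address_space: one slot per file index. */
struct toy_mapping {
    struct toy_page *slots[TOY_NR_INDEX];
};

/* Hypothetical inode holding several mappings (e.g. one per NUMA node). */
struct toy_inode {
    struct toy_mapping *mappings[TOY_MAX_MAPPINGS];
    size_t nr_mappings;
};

/*
 * Approach A: no per-page list. To break write protection at `index`,
 * look the index up in every mapping of the inode. The cost is paid on
 * every write and grows with the number of mappings, even when most of
 * them hold no copy. Returns the number of lookups performed.
 */
size_t toy_unprotect_by_walk(struct toy_inode *inode, size_t index)
{
    size_t lookups = 0;

    assert(index < TOY_NR_INDEX);
    for (size_t i = 0; i < inode->nr_mappings; i++) {
        struct toy_page *page = inode->mappings[i]->slots[index];

        lookups++;
        if (page)
            page->write_protected = 0;
    }
    return lookups;
}

/*
 * Approach B: the copies are chained from the page itself, so only the
 * pages that actually exist are visited. Returns the number visited.
 */
size_t toy_unprotect_by_list(struct toy_page *page)
{
    size_t visited = 0;

    for (; page; page = page->next_copy) {
        page->write_protected = 0;
        visited++;
    }
    return visited;
}
```

The sketch leaves out locking entirely, which is the other half of the
concern: in both approaches something still has to stop new copies from
appearing while the walk is in progress.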


> > > I don't have time to work on all of these.  If there's one that
> > > particularly interests you, let's dive deep into it and figure out how
> > 
> > I care about KSM, duplicate NUMA copies (not only for CPU but also
> > for devices), and write protection or exclusive write access. In each
> > case you need a list of all the copies (for KSM, of the deduplicated
> > page). Having a special entry in the page cache does not sound like a
> > good option: in many code paths you would need to look up the page
> > cache again to find out if the page is in a special state. If you use
> > a bit flag in struct page, how do you get to the callback or to the
> > copy/alias? Walk all the page tables?
> 
> Like I said, something that _looks_ like a struct page.  At least looks
> enough like a struct page that you can pull a pointer out of the page
> cache and check the bit.  But since it's not actually a struct page,
> you can use the rest of the data structure for pointers to things you
> want to track.  Like the real struct page.

What I fear is the added cost: it means we need to do this lookup
every time to check, and we also need proper locking to avoid races.
Adding an ancillary struct and trying to keep everything synchronized
seems harder to me.
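
For reference, this is roughly how I read the "looks like a struct page"
idea, as a userspace sketch only. The names (toy_page, toy_proxy,
TOY_PG_PROXY, toy_resolve) and the choice of a flag in the first word are
assumptions made for the example, not an existing kernel interface:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit marking an entry as "not a real page". */
#define TOY_PG_PROXY 0x1UL

/* Hypothetical stand-in for struct page. */
struct toy_page {
    unsigned long flags;
    int data;
};

/*
 * Hypothetical look-alike: `flags` is the first member, as in toy_page,
 * so the bit can be tested before knowing which type the entry is.
 * The remaining space holds whatever needs tracking.
 */
struct toy_proxy {
    unsigned long flags;      /* always has TOY_PG_PROXY set */
    struct toy_page *real;    /* the real page behind this entry */
    void *tracking;           /* e.g. a list of copies; unused here */
};

/*
 * Every user of a page cache entry has to go through a check like this
 * one: test the bit, then possibly chase one more pointer.
 */
struct toy_page *toy_resolve(void *entry)
{
    assert(entry != NULL);

    /* Both structs start with `unsigned long flags`. */
    unsigned long flags = *(unsigned long *)entry;

    if (flags & TOY_PG_PROXY)
        return ((struct toy_proxy *)entry)->real;
    return (struct toy_page *)entry;
}
```

The extra branch and pointer chase in toy_resolve() is the per-lookup cost
mentioned above, and the sketch says nothing about how `real` and `tracking`
are kept consistent with the page under concurrent access.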

> 
> > I do not see how i am doing violence to struct page :) The basis of
> > my approach is to pass down the mapping. We always have the mapping
> > at the top of the stack (either syscall entry point on a file or
> > through the vma when working on virtual address).
> 
> Yes, you explained all that in Utah.  I wasn't impressed then, and I'm
> not impressed now.

Is this more a matter of taste, or is there something specific you do
not like?

Cheers,
Jérôme

