From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-s390@vger.kernel.org
Subject: Re: Inaccessible pages & folios
Date: Thu, 15 Apr 2021 11:28:14 +0200
Message-ID: <20210415112814.303f7f02@ibm-vm>
In-Reply-To: <20210412135514.GK2531743@casper.infradead.org>

On Mon, 12 Apr 2021 14:55:14 +0100
Matthew Wilcox <willy@infradead.org> wrote:

[...]

> 
> I was only thinking about the page cache case ...
> 
>         access_ret = arch_make_page_accessible(page);
>         /*
>          * If writeback has been triggered on a page that cannot be made
>          * accessible, it is too late to recover here.
>          */
>         VM_BUG_ON_PAGE(access_ret != 0, page);
> 
> ... where it seems all pages _can_ be made accessible.

yes, for that case it is straightforward

> > also, I assume you keep the semantic difference between get_page and
> > pin_page? that's also very important for us  
> 
> I haven't changed anything in gup.c yet.  Just trying to get the page
> cache to suck less right now.

fair enough :)
 
> > > So what you're saying is that the host might allocate, eg a 1GB
> > > folio for a guest, then the guest splits that up into smaller
> > > chunks (eg 1MB), and would only want one of those small chunks
> > > accessible to the hypervisor?  
> > 
> > qemu will allocate a big chunk of memory, and I/O would happen only
> > on small chunks (depending on what the guest does). I don't know
> > how swap and pagecache would behave in the folio scenario.
> > 
> > Also consider that currently we need 4k hardware pages for protected
> > guests (so folios would be ok, as long as they are backed by small
> > pages)
> > 
> > How and when are folios created actually?
> > 
> > is there a way to prevent creation of multi-page folios?  
> 
> Today there's no way to create multi-page folios because I haven't
> submitted the patch to add alloc_folio() and friends:
> 
> https://git.infradead.org/users/willy/pagecache.git/commitdiff/4fe26f7a28ffdc850cd016cdaaa74974c59c5f53
> 
> We do have a way to allocate compound pages and add them to the page
> cache, but that's only in use by tmpfs/shmem.
> 
> What will happen is that (for filesystems which support multipage
> folios), they'll be allocated by the page cache.  I expect other
> places will start to use folios after that (eg anonymous memory), but
> I don't know where all those places will be.  I hope not to be
> involved in that!
> 
> The general principle, though, is that the overhead of tracking
> memory in page-sized units is too high, and we need to use larger
> units by default. There are occasions when we need to do things to
> memory in smaller units, and for those, we can choose to either
> handle sub-folio things, or we can split a folio apart into smaller
> folios.
> 
> > > > a possible approach maybe would be to keep the _page variant,
> > > > and add a _folio wrapper around it    
> > > 
> > > Yes, we can do that.  It's what I'm currently doing for
> > > flush_dcache_folio().  
> > 
> > where would the page flags be stored? as I said, we really depend on
> > that bit to be set correctly to prevent potentially disruptive I/O
> > errors. It's ok if the bit overindicates protection (non-protected
> > pages can be marked as protected), but protected pages must at all
> > times have the bit set.
> > 
> > the reason why this hook exists at all, is to prevent secure pages
> > from being accidentally (or maliciously) fed into I/O  
> 
> You can still use PG_arch_1 on the sub-pages of a folio.  It's one of
> the things you'll have to decide, actually.  Does setting PG_arch_1 on
> the head page of the folio indicate that the entire page is
> accessible, or just that the head page is accessible?  Different page
> flags have made different decisions here.

ok then, I think the simplest and safest thing to do right now is to
keep the flag on each page


in short:
* pagecache -> you can put a loop or introduce a _folio wrapper for
  arch_make_page_accessible
* gup.c -> won't be touched for now, but when the time comes, the
  PG_arch_1 bit should be set for each page
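
A minimal sketch of the loop / _folio wrapper option from the summary above.
This is a userspace model, not kernel code: the mock_* types and functions
are hypothetical stand-ins, and the only behaviour assumed is what is
described in this thread (a per-page bit that may overindicate protection,
and a per-page hook that returns 0 on success).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page. */
struct mock_page {
	bool arch_1;     /* models PG_arch_1: set means "may be protected" */
	bool can_export; /* assumption: whether the arch can make it accessible */
};

/* Hypothetical stand-in for a folio: a run of pages. */
struct mock_folio {
	struct mock_page *pages;
	size_t nr_pages;
};

/* Models arch_make_page_accessible(): 0 on success, nonzero on failure. */
static int mock_make_page_accessible(struct mock_page *page)
{
	if (!page->arch_1)
		return 0;        /* bit clear: nothing to do */
	if (!page->can_export)
		return -1;       /* failure: the bit must stay set */
	page->arch_1 = false;    /* page is now accessible */
	return 0;
}

/*
 * Hypothetical _folio wrapper: call the per-page hook on every page and
 * stop at the first failure. The bit stays on each page, so pages that
 * were not reached keep their (possibly overindicating) state.
 */
static int mock_make_folio_accessible(struct mock_folio *folio)
{
	for (size_t i = 0; i < folio->nr_pages; i++) {
		int ret = mock_make_page_accessible(&folio->pages[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

Stopping at the first failure leaves the remaining pages marked as they
were, which is safe under the "overindication is ok" rule; whether the real
wrapper should instead continue past a failure is left open here.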



Thread overview: 6+ messages
2021-04-09 19:40 Inaccessible pages & folios Matthew Wilcox
2021-04-12 12:18 ` Claudio Imbrenda
2021-04-12 12:43   ` Matthew Wilcox
2021-04-12 13:37     ` Claudio Imbrenda
2021-04-12 13:55       ` Matthew Wilcox
2021-04-15  9:28         ` Claudio Imbrenda [this message]
