From: Jarkko Sakkinen <jarkko@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: codalist@telemann.coda.cs.cmu.edu, jaharkes@cs.cmu.edu,
    Nathaniel McCallum <nathaniel@profian.com>,
    linux-unionfs@vger.kernel.org, intel-gfx@lists.freedesktop.org,
    Dave Hansen <dave.hansen@linux.intel.com>,
    linux-mips@vger.kernel.org, dri-devel@lists.freedesktop.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Dave Hansen <dave.hansen@intel.com>,
    linux-fsdevel@vger.kernel.org,
    Andrew Morton <akpm@linux-foundation.org>,
    Reinette Chatre <reinette.chatre@intel.com>,
    linux-sgx@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH RFC v2] mm: Add f_ops->populate()
Date: Mon, 7 Mar 2022 17:43:14 +0200
Message-ID: <YiYoEiBklxQrb8Wj@iki.fi>
In-Reply-To: <YiYYvAWYgC+PKEx0@casper.infradead.org>

On Mon, Mar 07, 2022 at 02:37:48PM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 03:41:54PM -0800, Dave Hansen wrote:
> > In short: page faults stink. The core kernel has lots of ways of
> > avoiding page faults like madvise(MADV_WILLNEED) or mmap(MAP_POPULATE).
> > But, those only work on normal RAM that the core mm manages.
> >
> > SGX is weird. SGX memory is managed outside the core mm. It doesn't
> > have a 'struct page' and get_user_pages() doesn't work on it. Its VMAs
> > are marked with VM_IO. So, none of the existing methods for avoiding
> > page faults work on SGX memory.
> >
> > This essentially helps extend existing "normal RAM" kernel ABIs to work
> > for avoiding faults for SGX too. SGX users want to enjoy all of the
> > benefits of a delayed allocation policy (better resource use,
> > overcommit, NUMA affinity) but without the cost of millions of faults.
>
> We have a mechanism for dynamically reducing the number of page faults
> already; it's just buried in the page cache code. You have vma->vm_file,
> which contains a file_ra_state. You can use this to track where
> recent faults have been and grow the size of the region you fault in
> per page fault. You don't have to (indeed probably don't want to) use
> the same algorithm as the page cache, but the _principle_ is the same --
> were recent speculative faults actually used; should we grow the number
> of pages actually faulted in, or is this a random sparse workload where
> we want to allocate individual pages.
>
> Don't rely on the user to ask. They don't know.

This sounds like a possibility. I'll need to study it properly first
though. Thank you for pointing this out.

BR, Jarkko
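
The principle described in the quoted text (grow the number of pages
populated per fault while speculative population keeps being used, and
fall back to single pages for sparse access) can be illustrated with a
small userspace sketch in C. This is not code from the patch or from the
kernel. The struct `fault_window` and the function `fault_window_update()`
are hypothetical names invented for this illustration, and the doubling
policy is only one possible choice, not the page cache readahead algorithm.
The sketch assumes `max` is at least 1.

```c
#include <stddef.h>

/*
 * Hypothetical per-mapping state for this illustration only. It plays a
 * role loosely similar to the readahead state kept per open file, but
 * none of these fields exist in the kernel under these names.
 */
struct fault_window {
	unsigned long next;	/* first page index after the last populated range */
	unsigned int size;	/* pages populated by the previous fault (0 = none yet) */
	unsigned int max;	/* upper bound on pages populated per fault, >= 1 */
};

/*
 * Decide how many pages to populate for a fault on page index `page`.
 *
 * If the fault lands exactly where the previous populated range ended,
 * the earlier speculation was consumed in order, so the window doubles
 * (capped at `max`). Any other fault is treated as sparse access and the
 * window drops back to a single page.
 */
unsigned int fault_window_update(struct fault_window *fw, unsigned long page)
{
	if (fw->size != 0 && page == fw->next) {
		unsigned int grown = fw->size * 2;

		fw->size = grown > fw->max ? fw->max : grown;
	} else {
		fw->size = 1;
	}

	fw->next = page + fw->size;
	return fw->size;
}
```

A fault handler built along these lines would call the update function
once per fault and populate that many pages starting at the faulting
page. A real implementation would likely also want to check whether the
speculatively populated pages were actually touched, which this sketch
approximates only through the position of the next fault.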