From: Jason Gunthorpe <jgg@mellanox.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: "jglisse@redhat.com" <jglisse@redhat.com>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Benjamin LaHaise <bcrl@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme
Date: Mon, 27 Jan 2020 19:01:07 +0000
Message-ID: <20200127190100.GA1823@ziepe.ca>
In-Reply-To: <ba250f19-cc51-f1dc-3236-58be1f291db3@kernel.dk>

On Tue, Jan 21, 2020 at 08:54:22PM -0700, Jens Axboe wrote:
> On 1/21/20 7:31 PM, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > Direct I/O pins memory through GUP (get_user_pages), which blocks
> > several mm activities such as:
> >     - compaction
> >     - NUMA balancing
> >     - migration
> >     ...
> > 
> > It is also troublesome if the pinned pages are actually file-backed
> > pages that might go under writeback, in which case the pages cannot
> > be write-protected because of the direct-io pin (see the various
> > discussions about the recent GUP work [1]). This happens, for
> > instance, when the virtual address used as the buffer for a read
> > operation comes from an mmap of a regular file.
> > 
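
As a concrete illustration of that last case, a minimal userspace
sketch (file names are made up) where the read buffer is an mmap of
a regular file:

  /* cc -D_GNU_SOURCE dio-mmap.c   (_GNU_SOURCE for O_DIRECT) */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          long pgsz = sysconf(_SC_PAGESIZE);
          size_t len = 16 * pgsz;

          /* The buffer is file-backed: an mmap of a regular file. */
          int buf_fd = open("/tmp/buffer-file", O_RDWR | O_CREAT, 0600);
          if (buf_fd < 0 || ftruncate(buf_fd, len) < 0)
                  return 1;
          void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                           buf_fd, 0);
          if (buf == MAP_FAILED)
                  return 1;

          /* O_DIRECT read into that mapping: GUP pins the file-backed
           * pages for the whole I/O, and writeback of /tmp/buffer-file
           * cannot write-protect them while the I/O is in flight. */
          int data_fd = open("/tmp/data-file", O_RDONLY | O_DIRECT);
          if (data_fd < 0)
                  return 1;
          printf("read %zd bytes\n", read(data_fd, buf, len));
          return 0;
  }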
> > 
> > With direct-io or aio (asynchronous io), pages are pinned until
> > syscall completion (which depends on many factors: io size, block
> > device speed, ...). With io-uring, pages can stay pinned for an
> > indefinite amount of time.
> > 
> > 
> > So I would like to convert the direct io code (direct-io, aio and
> > io-uring) to obey mmu notifiers, and thus let memory management and
> > writeback treat these buffers like any other process memory.
> > 
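
As a rough sketch of what obeying mmu notifiers could look like for
such a long-lived buffer, using the mmu_interval_notifier API (the
dio_buf structure and its locking are made up for illustration):

  #include <linux/mm.h>
  #include <linux/mmu_notifier.h>
  #include <linux/mutex.h>

  struct dio_buf {
          struct mmu_interval_notifier notifier;
          struct mutex lock;
          struct page **pages;
          unsigned long uaddr;
          unsigned long npages;
          bool stale;             /* pages must be looked up again */
  };

  static bool dio_buf_invalidate(struct mmu_interval_notifier *mni,
                                 const struct mmu_notifier_range *range,
                                 unsigned long cur_seq)
  {
          struct dio_buf *buf = container_of(mni, struct dio_buf, notifier);

          if (!mmu_notifier_range_blockable(range))
                  return false;

          mutex_lock(&buf->lock);
          mmu_interval_set_seq(mni, cur_seq);
          buf->stale = true;      /* force re-validation before the next I/O */
          mutex_unlock(&buf->lock);
          return true;
  }

  static const struct mmu_interval_notifier_ops dio_buf_ops = {
          .invalidate = dio_buf_invalidate,
  };

  /* Called once when the buffer is set up, e.g. at registration time. */
  static int dio_buf_track(struct dio_buf *buf, struct mm_struct *mm)
  {
          return mmu_interval_notifier_insert(&buf->notifier, mm, buf->uaddr,
                                              buf->npages << PAGE_SHIFT,
                                              &dio_buf_ops);
  }
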
> > For direct-io and aio this mostly gives a way to wait on syscall
> > completion. For io-uring it means the buffer might need to be
> > re-validated (ie. looking up the pages again to get the new set of
> > pages for the buffer). The impact for io-uring is the delay needed
> > to look up the new pages or to wait on writeback (if necessary).
> > This only happens _if_ an invalidation event happens, which itself
> > should only happen under memory pressure or for NUMA activities.
> > 
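
The re-validation could then follow the usual notifier sequence-count
pattern, for example (again only a sketch, reusing the hypothetical
dio_buf above; the short-lived page lookup uses the pin_user_pages*()
helpers from the GUP work referenced in [1]):

  static int dio_buf_revalidate(struct dio_buf *buf)
  {
          unsigned long seq;
          int rc;

  again:
          seq = mmu_interval_read_begin(&buf->notifier);

          /* Look up the (possibly new) pages backing the buffer. */
          rc = pin_user_pages_fast(buf->uaddr, buf->npages, FOLL_WRITE,
                                   buf->pages);
          if (rc < 0)
                  return rc;
          if (rc != buf->npages) {
                  unpin_user_pages(buf->pages, rc);
                  return -EFAULT;
          }

          mutex_lock(&buf->lock);
          if (mmu_interval_read_retry(&buf->notifier, seq)) {
                  /* Invalidated while we were looking up pages, retry. */
                  mutex_unlock(&buf->lock);
                  unpin_user_pages(buf->pages, buf->npages);
                  goto again;
          }
          buf->stale = false;
          mutex_unlock(&buf->lock);
          return 0;
  }
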
> > There are ways to minimize the impact (for instance by using the
> > mmu notifier event type to ignore some invalidation cases).
> > 
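
For instance, the invalidate callback sketched earlier could look at
range->event and skip events that neither move nor free the pages
(which events are actually safe to skip depends on how the buffer is
used; this is purely illustrative):

  static bool dio_buf_invalidate_filtered(struct mmu_interval_notifier *mni,
                                          const struct mmu_notifier_range *range,
                                          unsigned long cur_seq)
  {
          /* Soft-dirty clearing only write-protects PTEs for dirty-page
           * tracking and neither moves nor frees the pages. */
          if (range->event == MMU_NOTIFY_SOFT_DIRTY)
                  return true;

          return dio_buf_invalidate(mni, range, cur_seq);
  }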
> > 
> > So I would like to discuss all this during LSF; it is mostly a
> > filesystem discussion with strong ties to mm.
> 
> I'd be interested in this topic, as it pertains to io_uring. The whole
> point of registered buffers is to avoid mapping overhead and page
> references.
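
(For context, "registered buffers" are pinned up front via
io_uring_register(), so the fixed read/write opcodes skip the per-I/O
page lookup and page references. A minimal liburing-based sketch:)

  #include <liburing.h>
  #include <stdlib.h>
  #include <sys/uio.h>

  static int setup_fixed_buffer(struct io_uring *ring, struct iovec *iov,
                                size_t len)
  {
          if (posix_memalign(&iov->iov_base, 4096, len))
                  return -1;
          iov->iov_len = len;
          /* One long-term pin up front; IORING_OP_READ_FIXED/WRITE_FIXED
           * requests then reuse these pages instead of doing a page
           * lookup and taking page references for every I/O. */
          return io_uring_register_buffers(ring, iov, 1);
  }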

I'd also be interested, as this pertains to the mmu notifiers and
related infrastructure that I've been involved with reworking lately.
Others are looking at doing different things with bios/skbs that are
related to this idea, so I feel it is a worthwhile topic.

This proposal sounds, at a high level, quite similar to what vhost is
doing today, where they want the effect of copy_to_user without paying
its cost, by accessing the user pages directly from the kernel and
keeping everything in sync with mmu notifiers.

> If we add extra overhead per operation for that, well... I'm
> assuming the above is strictly for file mapped pages? Or also page
> migration?

Generally, the performance profile we have seen elsewhere is that
applications which leave their memory alone see no impact, while
things get wonky during invalidations.

However, that assumes DMA devices that have some optimized HW way to
manage the locking.

In vhost, the performance concerns seem to revolve around locking
between the CPU access thread and the mmu notifier thread.

I'm curious about Jérôme's thinking on this, particularly when you mix
in longer lifetimes of skbs and bios and whatnot. At some point the
pages must become pinned, for instance while they are submitted to a
device for DMA.
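
One way to picture that window, with the pin taken right before
submission and dropped at completion (dio_buf and queue_dma() are
hypothetical):

  static int dio_buf_submit_dma(struct dio_buf *buf)
  {
          int rc;

          /* Short-term pin, held only while the request is outstanding. */
          rc = pin_user_pages_fast(buf->uaddr, buf->npages, FOLL_WRITE,
                                   buf->pages);
          if (rc < 0)
                  return rc;
          if (rc != buf->npages) {
                  unpin_user_pages(buf->pages, rc);
                  return -EFAULT;
          }
          return queue_dma(buf);          /* hypothetical driver submission */
  }

  static void dio_buf_dma_done(struct dio_buf *buf)
  {
          /* Unpin as soon as the device is done, so compaction, migration
           * and writeback are only blocked for the duration of the I/O. */
          unpin_user_pages(buf->pages, buf->npages);
  }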

Thanks,
Jason

Thread overview: 16+ messages
2020-01-22  2:31 [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme jglisse
2020-01-22  3:54 ` Jens Axboe
2020-01-22  4:57   ` Jerome Glisse
2020-01-22 11:59     ` Michal Hocko
2020-01-22 15:12       ` Jens Axboe
2020-01-22 16:54         ` Jerome Glisse
2020-01-22 17:04           ` Jens Axboe
2020-01-22 17:28             ` Jerome Glisse
2020-01-22 17:38               ` Jens Axboe
2020-01-22 17:40                 ` Jerome Glisse
2020-01-22 17:49                   ` Jens Axboe
2020-01-27 19:01   ` Jason Gunthorpe [this message]
2020-01-22  4:19 ` Dan Williams
2020-01-22  5:00   ` Jerome Glisse
2020-01-22 15:56     ` [Lsf-pc] " Dan Williams
2020-01-22 17:02       ` Jerome Glisse
