From: Andrea Arcangeli <aarcange@redhat.com>
To: Jerome Glisse <j.glisse@gmail.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Oleg Nesterov <oleg@redhat.com>, Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: GUP guarantees wrt to userspace mappings
Date: Mon, 2 May 2016 21:11:41 +0200
Message-ID: <20160502191141.GE12310@redhat.com>
In-Reply-To: <20160502152249.GA5827@gmail.com>

On Mon, May 02, 2016 at 05:22:49PM +0200, Jerome Glisse wrote:
> I think this is still fine as it means that the device will only read, and thus
> you can migrate to a different page (i.e. the guest is not expecting to read back
> anything written by the device, and the device writing to the page would be illegal
> and a proper IOMMU would forbid it). So it is like direct-io when you write
> from anonymous memory to a file.

Agreed. write=1 is so that if there's an O_DIRECT write() and the app
is only reading, there will be no COW generated on shared anonymous
memory/MAP_PRIVATE-filebacked.
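
A minimal userspace sketch of the direct-io case mentioned above, assuming
Linux and a filesystem that accepts O_DIRECT. The helper name
direct_write_from_anon() and its parameters are illustrative, not taken from
any existing code. For a write(), the kernel only needs to read the source
buffer, so pinning it should not require breaking COW.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* needed for O_DIRECT with glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and parameters are illustrative).
 * Writes `len` bytes from a private anonymous mapping to `path` using
 * O_DIRECT. Returns the number of bytes written, or -errno on failure.
 * `len` is assumed to be a multiple of the device's logical block size.
 */
ssize_t direct_write_from_anon(const char *path, size_t len)
{
	ssize_t ret;
	void *buf;
	int fd;

	/* mmap() returns page-aligned memory, which meets the usual
	 * O_DIRECT buffer alignment requirement. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;
	memset(buf, 0x5a, len);	/* fault the anonymous pages in */

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
	if (fd < 0) {
		ret = -errno;
		goto out_unmap;
	}

	/* The device reads from buf; nothing is written back into it. */
	ret = write(fd, buf, len);
	if (ret < 0)
		ret = -errno;

	close(fd);
out_unmap:
	munmap(buf, len);
	return ret;
}
```

Calling direct_write_from_anon() with a path and a length of 4096 returns
4096 on success. On a filesystem without O_DIRECT support the open() is
expected to fail, typically with EINVAL.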

> Now that I think about it again, I don't think it exists. pmdp_collapse_flush()
> will flush the TLB and thus send an IPI, but get_user_pages_fast() can't be
> preempted, so the flush will have to wait for any running get_user_pages_fast() to
> complete. Or am I misunderstanding the flush? So khugepaged is safe from the GUP_fast
> point of view, as the comment inside it says.

This is exactly correct, there's no race window.

The IPI (or the quiescent point, in the case of the RCU version of
gup_fast) is what pmdp_collapse_flush() relies on to flush away any
get_user_pages_fast() still in flight.

> Well, you can't rely on a special vma here. Qemu allocates anonymous memory and
> hands it over to the guest, then a guest driver (i.e. running in the guest, not on the
> host) tries to map that memory and needs a valid DMA address for it; this is when
> vfio (on the host kernel) starts pinning memory of a regular anonymous vma (on
> the host). That same memory might back some special vma with an ->mmap callback,
> but in the guest. The point is there is no driver on the host and no special vma.
> From the host's point of view this is anonymous memory, but from the guest's POV it is
> just memory.

It's quite important that it stays regular tmpfs/anon memory, as device
memory is managed by the device and we'd lose everything
(KSM, swapping, NUMA balancing, compaction, memory hot-unplug, CMA, etc.).
