From: Gleb Natapov <gleb@kernel.org>
To: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Rik van Riel <riel@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Mel Gorman <mgorman@suse.de>,
Andy Lutomirski <luto@amacapital.net>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Sasha Levin <sasha.levin@oracle.com>,
Jianyu Zhan <nasa4836@gmail.com>,
Paul Cassella <cassella@cray.com>,
Hugh Dickins <hughd@google.com>,
Peter Feiner <pfeiner@google.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem
Date: Thu, 18 Sep 2014 09:15:33 +0300
Message-ID: <20140918061533.GD30733@minantech.com>
In-Reply-To: <1410976308-7683-1-git-send-email-andreslc@google.com>

On Wed, Sep 17, 2014 at 10:51:48AM -0700, Andres Lagar-Cavilla wrote:
> When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory
> has been swapped out or is behind a filemap, this will trigger async
> readahead and return immediately. The rationale is that KVM will kick
> back the guest with an "async page fault" and allow for some other
> guest process to take over.
>
> If async PFs are enabled the fault is retried asap from an async
> workqueue. If not, it's retried immediately in the same code path. In
> either case the retry will not relinquish the mmap semaphore and will
> block on the IO. This is a bad thing, as other mmap semaphore users
> now stall as a function of swap or filemap latency.
>
> This patch ensures both the regular and async PF path re-enter the
> fault allowing for the mmap semaphore to be relinquished in the case
> of IO wait.
>
Reviewed-by: Gleb Natapov <gleb@kernel.org>
> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
>
> ---
> v1 -> v2
>
> * WARN_ON_ONCE -> VM_WARN_ON_ONCE
> * pagep == NULL skips the final retry
> * kvm_gup_retry -> kvm_gup_io
> * Comment updates throughout
> ---
> include/linux/kvm_host.h | 11 +++++++++++
> include/linux/mm.h | 1 +
> mm/gup.c | 4 ++++
> virt/kvm/async_pf.c | 4 +---
> virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++++++++++++++++++++---
> 5 files changed, 63 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 3addcbc..4c1991b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -198,6 +198,17 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, unsigned long hva,
> int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
> #endif
>
> +/*
> + * Carry out a gup that requires IO. Allow the mm to relinquish the mmap
> + * semaphore if the filemap/swap has to wait on a page lock. pagep == NULL
> + * controls whether we retry the gup one more time to completion in that case.
> + * Typically this is called after a FAULT_FLAG_RETRY_NOWAIT in the main tdp
> + * handler.
> + */
> +int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
> + unsigned long addr, bool write_fault,
> + struct page **pagep);
> +
> enum {
> OUTSIDE_GUEST_MODE,
> IN_GUEST_MODE,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ebc5f90..13e585f7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2011,6 +2011,7 @@ static inline struct page *follow_page(struct vm_area_struct *vma,
> #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */
> #define FOLL_NUMA 0x200 /* force NUMA hinting page fault */
> #define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */
> +#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */
>
> typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
> void *data);
> diff --git a/mm/gup.c b/mm/gup.c
> index 91d044b..af7ea3e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -281,6 +281,10 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma,
> fault_flags |= FAULT_FLAG_ALLOW_RETRY;
> if (*flags & FOLL_NOWAIT)
> fault_flags |= FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT;
> + if (*flags & FOLL_TRIED) {
> + VM_WARN_ON_ONCE(fault_flags & FAULT_FLAG_ALLOW_RETRY);
> + fault_flags |= FAULT_FLAG_TRIED;
> + }
>
> ret = handle_mm_fault(mm, vma, address, fault_flags);
> if (ret & VM_FAULT_ERROR) {
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index d6a3d09..5ff7f7f 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -80,9 +80,7 @@ static void async_pf_execute(struct work_struct *work)
>
> might_sleep();
>
> - down_read(&mm->mmap_sem);
> - get_user_pages(NULL, mm, addr, 1, 1, 0, NULL, NULL);
> - up_read(&mm->mmap_sem);
> + kvm_get_user_page_io(NULL, mm, addr, 1, NULL);
> kvm_async_page_present_sync(vcpu, apf);
>
> spin_lock(&vcpu->async_pf.lock);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7ef6b48..fa8a565 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1115,6 +1115,43 @@ static int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
> return __get_user_pages(tsk, mm, start, 1, flags, page, NULL, NULL);
> }
>
> +int kvm_get_user_page_io(struct task_struct *tsk, struct mm_struct *mm,
> + unsigned long addr, bool write_fault,
> + struct page **pagep)
> +{
> + int npages;
> + int locked = 1;
> + int flags = FOLL_TOUCH | FOLL_HWPOISON |
> + (pagep ? FOLL_GET : 0) |
> + (write_fault ? FOLL_WRITE : 0);
> +
> + /*
> + * If retrying the fault, we get here *not* having allowed the filemap
> + * to wait on the page lock. We should now allow waiting on the IO with
> + * the mmap semaphore released.
> + */
> + down_read(&mm->mmap_sem);
> + npages = __get_user_pages(tsk, mm, addr, 1, flags, pagep, NULL,
> + &locked);
> + if (!locked) {
> + VM_BUG_ON(npages != -EBUSY);
> +
> + if (!pagep)
> + return 0;
> +
> + /*
> + * The previous call has now waited on the IO. Now we can
> + * retry and complete. Pass TRIED to ensure we do not re
> + * schedule async IO (see e.g. filemap_fault).
> + */
> + down_read(&mm->mmap_sem);
> + npages = __get_user_pages(tsk, mm, addr, 1, flags | FOLL_TRIED,
> + pagep, NULL, NULL);
> + }
> + up_read(&mm->mmap_sem);
> + return npages;
> +}
> +
> static inline int check_user_page_hwpoison(unsigned long addr)
> {
> int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
> @@ -1177,9 +1214,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
> npages = get_user_page_nowait(current, current->mm,
> addr, write_fault, page);
> up_read(&current->mm->mmap_sem);
> - } else
> - npages = get_user_pages_fast(addr, 1, write_fault,
> - page);
> + } else {
> + /*
> + * By now we have tried gup_fast, and possibly async_pf, and we
> + * are certainly not atomic. Time to retry the gup, allowing
> + * mmap semaphore to be relinquished in the case of IO.
> + */
> + npages = kvm_get_user_page_io(current, current->mm, addr,
> + write_fault, page);
> + }
> if (npages != 1)
> return npages;
>
> --
> 2.1.0.rc2.206.gedb03e5
>
--
Gleb.
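
For readers tracing the control flow of the quoted kvm_get_user_page_io(),
here is a minimal userspace sketch of its two-pass logic. It is only a model
under stated assumptions, not kernel code: struct model_mm, model_gup(),
model_get_user_page_io() and MODEL_TRIED are hypothetical names, the read
semaphore is reduced to a counter, and "waiting on IO" is reduced to flipping
a flag. It shows that the first pass may drop the semaphore before waiting,
and that the second pass (the FOLL_TRIED analogue) runs only when the caller
wants the page back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_mm {
    int  sem_readers;        /* stands in for read holds on mmap_sem */
    bool page_resident;      /* false: the page needs IO before it can be mapped */
    int  io_waits_unlocked;  /* IO waits performed with the semaphore dropped */
    int  io_waits_locked;    /* IO waits performed with the semaphore held */
};

#define MODEL_TRIED 0x1      /* analogue of FOLL_TRIED in the quoted patch */

/*
 * Stand-in for __get_user_pages() on one page. If the page needs IO and the
 * caller passed a non-NULL 'locked', drop the semaphore, wait, and report
 * -EBUSY with *locked == 0. Otherwise wait with the semaphore held.
 */
static int model_gup(struct model_mm *mm, int flags, int *locked)
{
    if (mm->page_resident)
        return 1;

    if (locked && !(flags & MODEL_TRIED)) {
        mm->sem_readers--;           /* semaphore released before the wait */
        *locked = 0;
        mm->io_waits_unlocked++;
        mm->page_resident = true;    /* the IO has completed */
        return -EBUSY;
    }

    mm->io_waits_locked++;           /* no retry allowed: wait while holding it */
    mm->page_resident = true;
    return 1;
}

/* Mirrors the shape of the quoted kvm_get_user_page_io(). */
static int model_get_user_page_io(struct model_mm *mm, bool want_page)
{
    int locked = 1;
    int npages;

    mm->sem_readers++;                           /* down_read() */
    npages = model_gup(mm, 0, &locked);
    if (!locked) {
        assert(npages == -EBUSY);                /* analogue of VM_BUG_ON() */

        if (!want_page)
            return 0;                            /* semaphore already dropped */

        mm->sem_readers++;                       /* down_read() again */
        npages = model_gup(mm, MODEL_TRIED, NULL);
    }
    mm->sem_readers--;                           /* up_read() */
    return npages;
}
```

In this model a caller that wants the page gets 1 back after exactly one
unlocked IO wait, while a caller passing want_page == false (the async_pf
case, pagep == NULL) returns 0 right after the first pass. In both cases the
semaphore counter ends at zero.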