Re: [RFC PATCH] kvm,x86: Exit to user space in case of page fault error

From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	kvm@vger.kernel.org, virtio-fs@redhat.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] kvm,x86: Exit to user space in case of page fault error
Date: Tue, 30 Jun 2020 09:32:18 -0700
Message-ID: <20200630163218.GF7733@linux.intel.com>
In-Reply-To: <87h7usbkhq.fsf@vitty.brq.redhat.com>

On Tue, Jun 30, 2020 at 06:12:49PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@intel.com> writes:
> 
> > On Tue, Jun 30, 2020 at 05:43:54PM +0200, Vitaly Kuznetsov wrote:
> >> Vivek Goyal <vgoyal@redhat.com> writes:
> >> 
> >> > On Tue, Jun 30, 2020 at 05:13:54PM +0200, Vitaly Kuznetsov wrote:
> >> >> 
> >> >> > - If you retry in kernel, we will change the context completely that
> >> >> >   who was trying to access the gfn in question. We want to retain
> >> >> >   the real context and retain information who was trying to access
> >> >> >   gfn in question.
> >> >> 
> >> >> (Just so I understand the idea better) does the guest context matter to
> >> >> the host? Or, more specifically, are we going to do anything besides
> >> >> get_user_pages() which will actually analyze who triggered the access
> >> >> *in the guest*?
> >> >
> >> > When we exit to user space, qemu prints bunch of register state. I am
> >> > wondering what does that state represent. Does some of that traces
> >> > back to the process which was trying to access that hva? I don't
> >> > know.
> >> 
> >> We can get the full CPU state when the fault happens if we need to but
> >> generally we are not analyzing it. I can imagine looking at CPL, for
> >> example, but trying to distinguish guest's 'process A' from 'process B'
> >> may not be simple.
> >> 
> >> >
> >> > I think keeping a cache of error gfns might not be too bad from
> >> > implementation point of view. I will give it a try and see how
> >> > bad does it look.
> >> 
> >> Right; I'm only worried about the fact that every cache (or hash) has a
> >> limited size and under certain circumstances we may overflow it. When an
> >> overflow happens, we will follow the APF path again and this can go over
> >> and over. Maybe we can punch a hole in EPT/NPT making the PFN reserved/
> >> not-present so when the guest tries to access it again we trap the
> >> access in KVM and, if the error persists, don't follow the APF path?
> >
> > Just to make sure I'm somewhat keeping track, is the problem we're trying to
> > solve that the guest may not immediately retry the "bad" GPA and so KVM may
> > not detect that the async #PF already came back as -EFAULT or whatever? 
> 
> Yes. In Vivek's patch there's a single 'error_gfn' per vCPU which serves
> as an indicator whether to follow APF path or not.

A thought along the lines of your "punch a hole in the page tables" idea
would be to invalidate the SPTE (in the unlikely case it's present but not
writable) and tag it as being invalid for async #PF.  E.g. for !EPT,
there are 63 bits available for metadata.  For EPT, there's a measly 60,
assuming we want to avoid using SUPPRESS_VE.  The fully !present case would
be straightforward, but the !writable case would require extra work,
especially for shadow paging.

With the SPTE tagged, it'd "just" be a matter of hooking into the page fault
paths to detect the flag and disable async #PF.  For TDP that's not too bad,
e.g. pass in a flag to fast_page_fault() and propagate it to try_async_pf().
Not sure how to handle shadow paging, that code makes my head hurt just
looking at it.

It'd require tweaking is_shadow_present_pte() to be more precise, but that's
probably a good thing, and peanuts compared to handling the faults.
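
To make the tagging idea concrete, here is a minimal userspace sketch in C.
It is an illustration under stated assumptions, not KVM code: the bit layout
and every sketch_* name are hypothetical, it models only the !EPT case where
bit 0 is the present bit, and it does not model fast_page_fault() or
try_async_pf() themselves.  It only shows the shape of the check: a
non-present entry carries a software flag, and the fault path consults that
flag before choosing the async #PF route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT KVM's real SPTE encoding.  For !EPT a
 * non-present entry only needs bit 0 clear, so the other bits are free
 * for software metadata.  Bit 1 is an arbitrary choice for this sketch.
 */
#define SKETCH_PTE_PRESENT      (1ULL << 0)
#define SKETCH_PTE_NO_ASYNC_PF  (1ULL << 1)

static inline bool sketch_pte_present(uint64_t pte)
{
	return (pte & SKETCH_PTE_PRESENT) != 0;
}

/*
 * Invalidate the entry and tag it: the old translation is discarded and
 * only the software flag survives in the non-present entry.
 */
static inline uint64_t sketch_tag_no_async_pf(uint64_t old_pte)
{
	uint64_t tagged = SKETCH_PTE_NO_ASYNC_PF;

	(void)old_pte;
	assert(!sketch_pte_present(tagged));
	return tagged;
}

/* The flag is only meaningful on entries that are not present. */
static inline bool sketch_pte_no_async_pf(uint64_t pte)
{
	return !sketch_pte_present(pte) &&
	       (pte & SKETCH_PTE_NO_ASYNC_PF) != 0;
}

/*
 * Fault-path decision: take the async #PF route only if the guest can
 * accept one and the faulting entry has not been tagged as bad.
 */
static inline bool sketch_can_do_async_pf(uint64_t pte, bool guest_apf_ok)
{
	return guest_apf_ok && !sketch_pte_no_async_pf(pte);
}
```

Because the tag lives in the entry itself rather than in a per-vCPU
error_gfn or a fixed-size cache, there is nothing to overflow in this model;
the cost moves to the fault paths that have to notice the flag.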
