From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
virtio-fs-list <virtio-fs@redhat.com>,
vkuznets@redhat.com, pbonzini@redhat.com
Subject: Re: [PATCH v4] kvm,x86: Exit to user space in case page fault error
Date: Fri, 2 Oct 2020 14:13:14 -0700
Message-ID: <20201002211314.GE24460@linux.intel.com>
In-Reply-To: <20201002200214.GB10232@redhat.com>
On Fri, Oct 02, 2020 at 04:02:14PM -0400, Vivek Goyal wrote:
> On Fri, Oct 02, 2020 at 12:45:18PM -0700, Sean Christopherson wrote:
> > On Fri, Oct 02, 2020 at 03:27:34PM -0400, Vivek Goyal wrote:
> > > On Fri, Oct 02, 2020 at 11:30:37AM -0700, Sean Christopherson wrote:
> > > > On Fri, Oct 02, 2020 at 11:38:54AM -0400, Vivek Goyal wrote:
> > > > I don't think it's necessary to provide userspace with the register state of
> > > > the guest task that hit the bad page. Other than debugging, I don't see how
> > > > userspace can do anything useful with such information.
> > >
> > > I think debugging is the whole point, so that the user can figure out which
> > > access by a guest task resulted in the bad memory access. I would think this
> > > will be an important piece of information.
> >
> > But isn't this failure due to a truncation in the host? Why would we care
> > about debugging the guest? It hasn't done anything wrong, has it? Or am I
> > misunderstanding the original problem statement?
>
> I think you understood the problem statement right. If the guest has the right
> context, it just gives additional information about who tried to access
> the missing memory page.
Yes, but it's not actionable, e.g. QEMU can't do anything differently given
a guest RIP. It's useful information for hands-on debugging, but it can easily
be collected through other means when doing that kind of debugging.
> > > > To fully handle the situation, the guest needs to remove the bad page from
> > > > its memory pool. Once the page is offlined, the guest kernel's error
> > > > handling will kick in when a task accesses the bad page (or nothing ever
> > > > touches the bad page again and everyone is happy).
> > >
> > > This is not really a case of a bad page as such. It is more of a page
> > > gone missing/truncated. And no new user can map it. We just need to
> > > worry about existing users who already have it mapped.
> >
> > What do you mean by "no new user can map it"? Are you talking about guest
> > tasks or host tasks? If guest tasks, how would the guest know the page is
> > missing and thus prevent mapping the non-existent page?
>
> If a new task wants to mmap(), it will send a request to virtiofsd/qemu
> on the host. If the file has been truncated, then mapping beyond the file
> size will fail and the process will get an error. So it will not be able
> to map a page which has been truncated.
Ah. Is there anything that prevents the notification side of things from
being handled purely within the virtiofs layer? E.g. host notifies the guest
that a file got truncated, virtiofs driver in the guest invokes a kernel API
to remove the page(s).
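
Purely as an untested strawman of what I mean, with made-up names for the
virtiofs pieces (truncate_pagecache() and i_size_write() are the existing
kernel helpers; a DAX window would presumably need extra virtiofs-specific
invalidation on top):

static void virtiofs_handle_truncate_notify(struct inode *inode,
					    loff_t new_size)
{
	/* Publish the new size first. */
	i_size_write(inode, new_size);

	/*
	 * truncate_pagecache() drops pagecache beyond the new EOF and zaps
	 * existing mappings of that range, so a task touching it afterwards
	 * takes a fresh fault instead of silently reusing a stale page.
	 */
	truncate_pagecache(inode, new_size);
}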
> > > > Note, I'm not necessarily suggesting that QEMU piggyback its #MC injection
> > > > to handle this, but I suspect the resulting behavior will look quite similar,
> > > > e.g. notify the virtiofs driver in the guest, which does some magic to take
> > > > the offending region offline, and then guest tasks get SIGBUS or whatever.
> > > >
> > > > I also don't think it's KVM's responsibility to _directly_ handle such a
> > > > scenario. As I said in an earlier version, KVM can't possibly know _why_ a
> > > > page fault came back with -EFAULT, only userspace can connect the dots of
> > > > GPA -> HVA -> vm_area_struct -> file -> inject event. KVM definitely should
> > > > exit to userspace on the -EFAULT instead of hanging the guest, but that can
> > > > be done via a new request, as suggested.
> > >
> > > KVM at least should have a mechanism to report this back to the guest, and
> > > we don't have any. There only seems to be the #MC stuff for poisoned pages.
> > > I am not sure how much we can build on top of the #MC stuff to take care
> > > of this case. One problem with #MC I found was that it generates a
> > > synchronous #MC only on loads and not on stores, so all the code and
> > > error handling is written with the assumption that a synchronous #MC
> > > can happen only on a load.
> > >
> > > Stores generate a different kind of #MC, and asynchronously at that, so the
> > > caller will not know about it immediately. But in this case we need to know
> > > about the error in the context of the caller, for both loads and stores.
> > >
> > > Anyway, I agree that this patch does not solve the problem of race-free
> > > synchronous event injection into the guest. That's something yet
> > > to be solved, either by #VE or by #MC or by #foo.
> > >
> > > This patch is only doing two things.
> > >
> > > - Because we don't have a mechanism to report errors to the guest, use
> > > the second-best method and exit to user space.
> >
> > Not that it matters at this point, but IMO exiting to userspace is the
> > correct behavior, not simply "second best method". Again, KVM by design
> > does not know what lies behind any given HVA, and therefore should not be
> > making decisions about how to handle bad HVAs.
> >
> > > - Make behavior consistent between synchronous faults and async faults.
> > > Currently sync faults exit to user space on error while async
> > > faults spin infinitely.
> >
> > Yes, and this part can be done with a new request type. Adding a new
> > request doesn't commit KVM to any ABI changes, e.g. userspace will still
> > see an -EFAULT return, and nothing more. I.e. handling this via request
> > doesn't prevent switching to synchronous exits in the future, if such a
> > feature is required.
>
> I am really not sure what benefit this kvm request is bringing to the
> table. To me, maintaining a hash table and exiting when the guest retries
> is a much nicer design.
8 bytes per vCPU versus 512 bytes per vCPU, with no guesswork. I.e. the
request can guarantee the first access to a truncated file will be reported
to userspace.
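
Strictly as an untested sketch (KVM_REQ_APF_USER_EXIT and apf.error_gpa are
made up; the request/kick helpers are the existing ones):

/* Async #PF completion path, when gup() on the HVA came back -EFAULT: */
static void kvm_arch_async_pf_error(struct kvm_vcpu *vcpu, gpa_t gpa)
{
	vcpu->arch.apf.error_gpa = gpa;		/* the "8 bytes per vCPU" */
	kvm_make_request(KVM_REQ_APF_USER_EXIT, vcpu);
	kvm_vcpu_kick(vcpu);
}

/* And in vcpu_enter_guest(), before entering the guest: */
	if (kvm_check_request(KVM_REQ_APF_USER_EXIT, vcpu)) {
		/* Userspace still just sees -EFAULT from KVM_RUN, as today. */
		r = -EFAULT;
		goto out;
	}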
> In fact, once we have a mechanism to inject the error into the guest using an
> exception, we can easily plug that into the same path.
You're blurring the two things together. The first step is to fix the
infinite loop by exiting to userspace with -EFAULT. Notifying the guest of
the truncated file is a completely different problem to solve. Reporting
-EFAULT to userspace is a pure bug fix, and is not unique to DAX.
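
From userspace's perspective nothing new is needed; the only difference is
that KVM_RUN actually returns instead of the vCPU looping in the kernel.
Roughly, as a fragment of the usual vCPU run loop (handle_bad_guest_access()
is a hypothetical policy hook):

	ret = ioctl(vcpu_fd, KVM_RUN, 0);
	if (ret < 0 && errno == EFAULT) {
		/*
		 * Previously the async case never got here, the vCPU just
		 * kept retrying the fault.  What to do about the error is
		 * entirely userspace policy.
		 */
		return handle_bad_guest_access(vcpu_fd);
	}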
> If memory usage is a concern, we can reduce the hash table size to, say,
> 4 or 8.
>
> How is this change committing KVM to an ABI?
I didn't mean to imply this patch changes the ABI. What I was trying to say
is that going with the request-based implementation doesn't commit KVM to
new behavior, e.g. we can yank out the request implementation in favor of
something else if the need arises.
> > > Once we have a method to report errors back to the guest, then we should
> > > first report the error back to the guest. And only in the absence of such
> > > a mechanism should we exit to user space.
> >
> > I don't see the value in extending KVM's ABI to avoid the exit to userspace.
> > E.g. we could add a memslot flag to autoreflect -EFAULT as an event of some
> > form, but this is a slow path, so the extra time should be a non-issue.
>
> IIUC, you are saying that this becomes a property of the memory region. Some
> memory regions can choose to have their errors reported back to the guest
> (using some kind of flag in the memslot). And in the absence of such a flag,
> the default behavior will continue to be exiting to user space?
>
> I guess that's fine if we don't want to treat all the memory areas in the
> same way. I can't think of a reason why reporting errors for all areas to
> the guest is a bad idea.
Because it'd be bleeding host information to the guest. E.g. if there's a
host kernel bug that causes gup() to fail, then KVM would inject a random
fault into the guest, which is all kinds of bad.
> Let the guest either handle the error or die if it can't
> handle it. But that discussion is for some other day. Our first problem
> is that we don't have a race-free mechanism to report errors.
>
> Thanks
> Vivek
>