From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org, mhal@rbox.co
Subject: Re: [PATCH 2/4] KVM: x86/xen: Compatibility fixes for shared runstate area
Date: Wed, 23 Nov 2022 17:17:57 +0000 [thread overview]
Message-ID: <Y35VxflJBVjzloaj@google.com> (raw)
In-Reply-To: <a12c8e6123cf702bc882988f9da3be7bd096a2e3.camel@infradead.org>
On Tue, Nov 22, 2022, David Woodhouse wrote:
> On Tue, 2022-11-22 at 18:39 +0000, Sean Christopherson wrote:
> > On Sat, Nov 19, 2022, David Woodhouse wrote:
> > > From: David Woodhouse <
> > > dwmw@amazon.co.uk
> > > >
> > >
> > > The guest runstate area can be arbitrarily byte-aligned. In fact, even
> > > when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
> > > fields in the structure end up being unaligned due to the fact that the
> > > 32-bit ABI only aligns them to 32 bits.
> > >
> > > So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
> > > is buggy, because if it's unaligned then we can't update the whole field
> > > atomically; the low bytes might be observable before the _UPDATE bit is.
> > > Xen actually updates the *byte* containing that top bit, on its own. KVM
> > > should do the same.
> >
> > I think we're using the wrong APIs to update the runstate. The VMCS/VMCB pages
> > _need_ the host pfn, i.e. need to use a gpc (eventually). The Xen PV stuff on the
> > other hand most definitely doesn't need to know the pfn.
> >
> > The event channel code would be difficult to convert due to lack of uaccess
> > primitives, but I don't see anything in the runstate code that prevents KVM from
> > using a gfn_to_hva_cache. That will naturally handle page splits by sending them
> > down a slow path and would yield far simpler code.
> >
> > If taking the slow path is an issue, then the guest really should be fixed to not
> > split pages. And if that's not an acceptable answer, the gfn_to_hva_cache code
> > could be updated to use the fast path if the region is contiguous in the host
> > virtual address space.
> >
>
> Yeah, that's tempting. Going back to gfn_to_hva_cache was the first
> thing I tried. There are a handful of complexifying factors, none of
> which are insurmountable if we try hard enough.
>
> • Even if we fix the gfn_to_hva_cache to still use the fast path for
> more than the first page, it's still possible for the runstate to
> cross from one memslot to an adjacent one. We probably still need
> two of them, which is a large part of the ugliness of this patch.
Hrm. What if KVM requires that the runstate be contiguous in host virtual address
space? That wouldn't violate Xen's ABI since it's a KVM restriction, and it can't
break backwards compatibility since KVM doesn't even handle page splits, let alone
memslot splits. AFAIK, most (all?) userspace VMMs make guest RAM virtually
contiguous anyways.
Probably a moot point since I don't see a way around the "page got swapped out"
issue.
> • Accessing it via the userspace HVA requires coping with the case
> where the vCPU is being scheduled out, and it can't sleep. The
> previous code before commit 34d41d19e dealt with that by using
> pagefault_disable() and depending on the GHC fast path. But even
> so...
>
> • We also could end up having to touch page B, page A then page B
> again, potentially having to abort and leave the runstate with
> the XEN_RUNSTATE_UPDATE bit still set. I do kind of prefer the
> version which checks that both pages are present before it starts
> writing anything to the guest.
Ugh, and because of the in-atomic case, there's nothing KVM can do to remedy page B
getting swapped out. Actually, it doesn't even require a B=>A=>B pattern. Even
without a page split, the backing page could get swapped out between setting and
clearing.
> As I said, they're not insurmountable but that's why I ended up with
> the patch you see, after first attempting to use a gfn_to_hva_cache
> again. Happy to entertain suggestions.
What if we use a gfn_to_pfn_cache for the page containing XEN_RUNSTATE_UPDATE,
and then use kvm_vcpu_write_guest_atomic() if there's a page split? That would
avoid needing to acquire mutliple gpc locks, and the update could use a single
code flow by always constructing an on-stack representation. E.g. very roughly:
*update_bit = (vx->runstate_entry_time | XEN_RUNSTATE_UPDATE) >> 56;
smp_wmb();
if (khva)
memcpy(khva, rs_state, user_len);
else
kvm_vcpu_write_guest_in_atomic(vcpu, gpa, rs_state, user_len);
smp_wmb();
*update_bit = vx->runstate_entry_time >> 56;
smp_wmb();
where "khva" is NULL if there's a page split.
> I note we have a kvm_read_guest_atomic() and briefly pondered adding a
> 'write' version of same... but that doesn't seem to cope with crossing
> memslots either; it also assumes its caller only uses it to access
> within a single page.
We should fix the lack of page split handling. There are no bugs because KVM
only uses the helper for cases where the accesses are naturally aligned, but that's
bound to bite someone eventually. That would be an oppurtunity to dedup the code
that handles "segments" (ugh), e.g. instead of
gfn_t gfn = gpa >> PAGE_SHIFT;
int seg;
int offset = offset_in_page(gpa);
int ret;
while ((seg = next_segment(len, offset)) != 0) {
ret = kvm_vcpu_read_guest_page(vcpu, gfn, data, offset, seg);
if (ret < 0)
return ret;
offset = 0;
len -= seg;
data += seg;
++gfn;
}
return 0;
provide a macro that allows
kvm_for_each_chunk(...)
ret = kvm_vcpu_read_guest_page(vcpu, gfn, data, offset, seg);
if (ret)
return ret;
}
I also think we should rename the "atomic" helpers to be "in_atomic" (or inatomic
if we want to follow the kernel's terrible nomenclature, which I always read as
"not atomic"). I always think read_guest_atomic() means an "atomic read".
next prev parent reply other threads:[~2022-11-23 17:18 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-19 9:46 [PATCH 1/4] MAINTAINERS: Add KVM x86/xen maintainer list David Woodhouse
2022-11-19 9:46 ` [PATCH 2/4] KVM: x86/xen: Compatibility fixes for shared runstate area David Woodhouse
2022-11-19 11:38 ` Durrant, Paul
2022-11-19 11:46 ` David Woodhouse
2022-11-22 18:39 ` Sean Christopherson
2022-11-22 23:04 ` David Woodhouse
2022-11-23 17:17 ` Sean Christopherson [this message]
2022-11-23 17:58 ` David Woodhouse
2022-11-23 18:16 ` Sean Christopherson
2022-11-23 18:48 ` David Woodhouse
2022-11-22 23:46 ` Michal Luczaj
2022-11-23 0:03 ` David Woodhouse
2022-11-23 18:21 ` Sean Christopherson
2022-11-23 19:22 ` David Woodhouse
2022-11-23 19:32 ` Sean Christopherson
2022-11-23 20:07 ` David Woodhouse
2022-11-23 20:26 ` Sean Christopherson
2022-11-19 9:46 ` [PATCH 3/4] KVM: Update gfn_to_pfn_cache khva when it moves within the same page David Woodhouse
2022-11-19 11:40 ` Durrant, Paul
2022-11-22 16:23 ` Sean Christopherson
2022-11-22 16:49 ` Paul Durrant
2022-11-22 17:02 ` David Woodhouse
2022-11-22 19:09 ` Sean Christopherson
2022-11-22 23:13 ` David Woodhouse
2022-11-24 0:31 ` Paolo Bonzini
2022-11-24 0:45 ` David Woodhouse
2022-11-24 0:54 ` David Woodhouse
2022-11-24 1:11 ` Paolo Bonzini
2022-11-24 10:41 ` David Woodhouse
2022-11-24 11:30 ` David Woodhouse
2022-11-19 9:46 ` [PATCH 4/4] KVM: x86/xen: Add runstate tests for 32-bit mode and crossing page boundary David Woodhouse
2022-11-19 11:42 ` Durrant, Paul
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y35VxflJBVjzloaj@google.com \
--to=seanjc@google.com \
--cc=dwmw2@infradead.org \
--cc=kvm@vger.kernel.org \
--cc=mhal@rbox.co \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.