From: David Woodhouse <dwmw2@infradead.org>
To: Sean Christopherson <seanjc@google.com>,
Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Paul Durrant <paul@xen.org>,
kvm@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE
Date: Fri, 26 Jun 2026 08:45:32 +0100 [thread overview]
Message-ID: <8edfdca645f691cb856e80ade830d78925fdc19d.camel@infradead.org> (raw)
In-Reply-To: <aj21KctIXuf7b_5G@google.com>
[-- Attachment #1: Type: text/plain, Size: 6198 bytes --]
On Thu, 2026-06-25 at 16:09 -0700, Sean Christopherson wrote:
> On Tue, Apr 28, 2026, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > Commit 3617c0ee7decb ("KVM: x86/xen: Only write Xen hypercall page for
> > guest writes to MSR") blocked host-initiated writes from triggering the
> > Xen hypercall page setup, to fix an SRCU usage violation when the
> > hypercall MSR index collides with a real MSR written during vCPU reset.
> >
> > However, some VMMs legitimately need to trigger hypercall page setup
> > from host context. For example, a VMM may intercept the guest's MSR
> > write to track an epoch (for kexec/crash recovery), and then replay the
> > write as a host-initiated KVM_SET_MSRS to populate the hypercall page.
> > The host_initiated check breaks this use case.
> >
> > Add KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE as a new vcpu attribute
> > that explicitly invokes kvm_xen_write_hypercall_page() under proper
> > locking. This gives userspace a safe interface to trigger hypercall page
> > setup without going through the MSR write path, preserving the
> > host_initiated defence in depth while restoring the lost functionality.
>
> This is all kinda silly. Userspace provides KVM a blob, then userspace intercepts
> the MSR write that triggers doing something with said blob, only to call back into
> KVM to consume the blob that userspace provided in the first place.
>
> Any chance we can deprecate KVM's kvm_xen_write_hypercall_page(), and instead
> rely on userspace to fill the page? This extra bit obviously isn't much code to
> carry, but it's yet one more Xen thing to maintain, and we've accumulated a lot
> of those over the years...
We don't actually use the 'blob' mode. That was added in commit
ffde22ac53b6d in 2009 with a comment saying, "A generic mechanism to
delegate MSR writes to userspace seems overkill and risks encouraging
similar MSR abuse in the future. Thus this patch adds special support
for the Xen HVM MSR."
When João and I came along almost a decade later, in 23200b7a30de3
where we added hypercall interception support we said, "Since this
means KVM owns the ABI, dispense with the facility for the VMM to
provide its own copy of the hypercall pages; just fill them in directly
using VMCALL/VMMCALL as we do for the Hyper-V hypercall page."
I think we could probably rip out the blob mode without any fear of
breaking userspace. Even in 2018 I don't think we could even find the
alleged code from 2009 that used the old support. At least, not in
buildable and usable form?
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 91fd3673c09a..c16b4560c9e7 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -907,6 +907,13 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
> > {
> > int idx, r = -ENOENT;
> >
> > + /*
> > + * kvm_xen_write_hypercall_page() manages its own locking.
> > + * Handle it before taking xen_lock to avoid a deadlock.
>
> Do we actually want the side effects that necessitate taking xen.xen_lock? From
> a uAPI perspective, it's odd to effectively bundle KVM_XEN_ATTR_TYPE_LONG_MODE
> into KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE.
That's *guest* ABI, and it's derived from Xen behaviour. Xen will
'latch' its idea of whether a guest VM is 32-bit or 64-bit, for the
purpose of shared data structures (shared_info page, vcpu_info,
runstate).
Xen latches this from the current mode of the running vCPU in *two*
places:
• When the hypercall MSR is invoked
• When the guest sets the event channel GSI (HVM_PARAM_CALLBACK_IRQ).
Thus far, the former has been handled in the kernel (in the code you're
looking at), while the latter is why we have the ioctl to explicitly
latch the guest's long_mode from userspace too, as userspace handles
the HVMOP_set_param calls.
> The other question is, why does kvm_xen_write_hypercall_page() drop xen_lock
> when writing guest memory? That seems odd and unnecessary.
Huh? It takes the lock to do the thing that needs the lock, then drops
it. That is not "odd and unnecessary" at all.
You've been spending too long with these scope-guarded locks. I *hate*
them. I hate the way they slowly spread around the whole kernel, making
every lock holder hold their locks for just a *little* bit longer than
they need to, slowly increasing lock contention "just a little bit; it
doesn't matter" at a time. I hate the way they stop us thinking about
which locks are needed and in which order, and make it unclear whether
some action in the tail of a function actually *needed* the lock, or
was just caught up in it as collateral damage.
> > + */
> > + if (data->type == KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE)
> > + return kvm_xen_write_hypercall_page(vcpu, data->u.gpa) ? -EIO : 0;
>
> -EIO is rather weird, wouldn't -EINVAL be more appropriate? Ah, and both are
> wrong if copying the blob fails.
-EINVAL is more for "you asked me to do something that doesn't make
sense". -EIO is for "something went wrong when I tried".
Arguably, the thing that's most likely to go wrong is the
kvm_vcpu_write_guest() where it writes instructions[] to the guest, and
maybe that ought to be -EFAULT? But I'm not sure that's quite the right
semantic to return from the ioctl?
> > +
> > mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
> > idx = srcu_read_lock(&vcpu->kvm->srcu);
>
> Speaking of writing memory, kvm_xen_write_hypercall_page() expects the caller
> to be in a read-side SRCU critical section (I didn't actually run this with
> PROVE_LOCKING=y, but I don't think I'm missing anything?)
Yes, good catch. Thanks.
> So, if this uAPI is unavoidable seems like we want something like the below.
> Either that or guard all of kvm_xen_write_hypercall_page() with a lock, and put
> the entire thing in a helper so that KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE
> can be handled in a case-statement and doesn't need to grab SRCU on its own.
Makes sense (with the test, of course). Want me to put them together
and resend?
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
next prev parent reply other threads:[~2026-06-26 7:45 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 19:12 [PATCH] KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE David Woodhouse
2026-04-29 10:36 ` Paul Durrant
2026-06-02 17:01 ` David Woodhouse
2026-06-25 23:09 ` Sean Christopherson
2026-06-26 7:45 ` David Woodhouse [this message]
2026-06-26 13:40 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8edfdca645f691cb856e80ade830d78925fdc19d.camel@infradead.org \
--to=dwmw2@infradead.org \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kraxel@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=paul@xen.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=skhan@linuxfoundation.org \
--cc=tglx@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox