From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Sean Christopherson <seanjc@google.com>
Cc: maz@kernel.org, oupton@kernel.org, joey.gouly@arm.com,
suzuki.poulose@arm.com, yuzenghui@huawei.com,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
tabba@google.com, David.Hildenbrand@arm.com
Subject: Re: [RFC PATCH] KVM: arm64: Align KVM_EXIT_MEMORY_FAULT error codes with documentation
Date: Fri, 8 May 2026 09:55:17 +0100 [thread overview]
Message-ID: <af2k9btE5hS6lWHR@raptor> (raw)
In-Reply-To: <afyUkaQ1wtivyH8W@google.com>
Hi Sean,
On Thu, May 07, 2026 at 06:33:05AM -0700, Sean Christopherson wrote:
> On Thu, May 07, 2026, Alexandru Elisei wrote:
> > Hi Sean,
> >
> > (Resending this because I managed to mess up the headers, sorry for the
> > duplicate).
> >
> > Thanks for the explanations!
> >
> > On Wed, May 06, 2026 at 05:44:50AM -0700, Sean Christopherson wrote:
> > > On Wed, May 06, 2026, Alexandru Elisei wrote:
> > > > The documentation for KVM_EXIT_MEMORY_FAULT states:
> > > >
> > > > 'Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that
> > > > it accompanies a return code of '-1', not '0'! errno will always be set to
> > > > EFAULT or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT, userspace
> > > > should assume kvm_run.exit_reason is stale/undefined for all other error
> > > > numbers'.
> > > >
> > > > where a return code of '-1' is special because according to man 2 ioctl:
> > > >
> > > > 'On error, -1 is returned, and errno is set to indicate the error'.
> > > >
> > > > Putting the two together means that the ioctl KVM_RUN must 1) complete with
> > > > an error and 2) that error must must be either EFAULT or EHWPOISON for
> > > > userspace to detect a KVM_EXIT_MEMORY_FAULT VCPU exit.
> > >
> > > Yes and no. The key escape valve we (very deliberately) gave ourselves is this:
> > >
> > > userspace should assume kvm_run.exit_reason is stale/undefined for all other
> > > error numbers.
> > >
> > > As arm64 already does, that clause allows KVM to "speculatively" set exit_reason
> > > to KVM_EXIT_MEMORY_FAULT. Which is by design. The userspace flow is intended
> > > to be "if KVM_RUN returns EFAULT or EHWPOISON, then check for KVM_EXIT_MEMORY_FAULT
> > > to see if KVM provided more information about why the EFAULT/EHWPOISON error was
> > > returned".
> >
> > Hm... In general, "speculatively" populating exit_reason with
> > KVM_EXIT_MEMORY_FAULT when userspace is not intended to use that information
> > looks a bit dubious to me.
>
> Oh, for sure, it's not exactly ideal.
>
> > Why do the work if userspace is not supposed to use the information?
>
> Because not filling kvm_run when KVM is supposed to (per KVM's contract with
> userspace) would be a bug, whereas unnecessarily filling kvm_run is "just" wasted
> cycles (and not very many of them). x86 also has multiple flows where it fills
> kvm_run "speculatively", e.g. in low(ish) level helpers where it's not known if
> KVM will actually exit to userspace.
For arm64, it's not that hard to figure out that 0 from the fault handlers means
a return to guest:
fault handler returns 0 => kvm_handle_guest_abort() massages the 0 into 1 =>
kvm_vcpu_arch_ioctl() resumes loop.
Consequently anything other than 0 from the fault handlers means an exit to
userspace.
Not sure if that proves or disproves my point though :(
>
> Overall, for code like this, IMO it's also yields less complex KVM code, though
> I suppose it can also end up being more confusing for readers.
>
> > Regarding gmem_abort(). As I see it, if today someone writes userspace that
> > relies on any of the undocumented error codes propagated from kvm_gmem_get_pfn()
> > to handle KVM_EXIT_MEMORY_FAULT, that means that KVM can never use those error
> > codes for any other exit_reason in the future, because that userspace will
> > break.
>
> Hmm, if we wanted to defend against that, we could scribble kvm_run.exit_reason
> on the way out of KVM_RUN, e.g.
>
> diff --git virt/kvm/kvm_main.c virt/kvm/kvm_main.c
> index 89489996fbc1..76801d103dd9 100644
> --- virt/kvm/kvm_main.c
> +++ virt/kvm/kvm_main.c
> @@ -4475,6 +4475,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
> */
> rseq_virt_userspace_exit();
>
> + if (vcpu->run->exit_reason == KVM_EXIT_MEMORY_FAULT &&
> + r && r != -EFAULT && r != EHWPOISON)
^^^^^^^^^^
-EHWPOISON
> + vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> +
> trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> break;
> }
I was thinking something like this, to avoid populating KVM_EXIT_MEMORY_FAULT
information and then overwriting it later (I assume all architectures go
through the helper and don't open code it, haven't checked):
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4c14aee1fb06..6e1eeb511967 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2505,11 +2505,14 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
/* Max number of entries allowed for each kvm dirty ring */
#define KVM_DIRTY_RING_MAX_ENTRIES 65536
-static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, int error,
gpa_t gpa, gpa_t size,
bool is_write, bool is_exec,
bool is_private)
{
+ if (error != -EFAULT && error != -EHWPOISON)
+ return;
+
vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
vcpu->run->memory_fault.gpa = gpa;
vcpu->run->memory_fault.size = size;
arm64 sets exit_reason = KVM_EXIT_UNKNOWN before the run loop. After a
cursory look, I *think* x86 does the same, so the result would be similar
to what you propose, at least for these two architectures. We could also
set exit_reason to unknown if !-EFAULT && !-EHWPOISON to be sure.
Avoids leaking what memory the guest accesses, for the extra, extra
paranoid.
It would also make at least one person (me) less confused about why
KVM_EXIT_MEMORY_FAULT is populated when userspace is not supposed to
consume it :)
On the other hand, all call sites would need to be modified.
>
> I don't know that I'm convinced that level of paranoia is worth it though.
It's up to you, I don't feel strongly about it. If you do decide to go
ahead with it, whatever approach you choose, I can prepare the patch.
>
> > I'm sure this was all carefully considered when designing the interface, I was
> > just curious how this particular problem has been solved.
>
> Heh, I like to think we carefully considered the interface, but thinking of every
> possible way userspace can be silly is hard :-)
Agreed. That's why I think exposing strictly the minimum necessary
information to userspace is a good defence :)
Thanks,
Alex
prev parent reply other threads:[~2026-05-08 8:55 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-06 10:50 [RFC PATCH] KVM: arm64: Align KVM_EXIT_MEMORY_FAULT error codes with documentation Alexandru Elisei
2026-05-06 12:44 ` Sean Christopherson
2026-05-06 13:39 ` Alexandru Elisei, Sean Christopherson
2026-05-07 8:45 ` Alexandru Elisei
2026-05-07 13:33 ` Sean Christopherson
2026-05-08 8:55 ` Alexandru Elisei [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=af2k9btE5hS6lWHR@raptor \
--to=alexandru.elisei@arm.com \
--cc=David.Hildenbrand@arm.com \
--cc=joey.gouly@arm.com \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=maz@kernel.org \
--cc=oupton@kernel.org \
--cc=seanjc@google.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox