From: Sean Christopherson <seanjc@google.com>
To: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Michal Luczaj <mhal@rbox.co>,
Oliver Upton <oliver.upton@linux.dev>,
Colton Lewis <coltonlewis@google.com>
Subject: Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed
Date: Thu, 30 Nov 2023 08:09:27 -0800 [thread overview]
Message-ID: <ZWiztxAzjCAUw7cx@google.com> (raw)
In-Reply-To: <ebacaa61-4156-4948-a9f7-8ea7c0a49e4a@intel.com>
On Thu, Nov 30, 2023, Xiaoyao Li wrote:
> On 11/30/2023 3:22 AM, Sean Christopherson wrote:
> > On Mon, Nov 13, 2023, Xiaoyao Li wrote:
> > > On 11/9/2023 12:07 AM, Sean Christopherson wrote:
> > > > On Wed, Nov 08, 2023, Xiaoyao Li wrote:
> > > > > On 11/8/2023 9:09 AM, Sean Christopherson wrote:
> > > > > > Add yet another macro to the VM/vCPU ioctl() framework to detect when an
> > > > > > ioctl() failed because KVM killed/bugged the VM, i.e. when there was
> > > > > > nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of
> > > > > > a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with
> > > > > > -EIO, which can be quite misleading and ultimately waste user/developer
> > > > > > time.
> > > > > >
> > > > > > Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is
> > > > > > dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a
> > > > > > heuristic is obviously less than ideal, but practically speaking the logic
> > > > > > is bulletproof barring a KVM change, and any such change would arguably
> > > > > > break userspace, e.g. if KVM returns something other than -EIO.
> > > > >
> > > > > We hit similar issue when testing TDX VMs. Most failure of SEMCALL is
> > > > > handled with a KVM_BUG_ON(), which leads to vm dead. Then the following
> > > > > IOCTL from userspace (QEMU) and gets -EIO.
> > > > >
> > > > > Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD?
> > > >
> > > > Why? Even if KVM_EXIT_VM_DEAD somehow provided enough information to be useful
> > > > from an automation perspective, the VM is obviously dead. I don't see how the
> > > > VMM can do anything but log the error and tear down the VM. KVM_BUG_ON() comes
> > > > with a WARN, which will be far more helpful for a human debugger, e.g. because
> > > > all vCPUs would exit with KVM_EXIT_VM_DEAD, it wouldn't even identify which vCPU
> > > > initially triggered the issue.
> > >
> > > It's not about providing more helpful debugging info, but to provide a
> > > dedicated notification for VMM that "the VM is dead, all the following
> > > command may not response". With it, VMM can get rid of the tricky detection
> > > like this patch.
> >
> > But a VMM doesn't need this tricky detection, because this tricky detections isn't
> > about detecting that the VM is dead, it's all about helping a human debug why a
> > test failed.
> >
> > -EIO already effectively says "the VM is dead", e.g. QEMU isn't going to keep trying
> > to run vCPUs.
>
> If -EIO for KVM ioctl denotes "the VM is dead" is to be the officially
> announced API, I'm fine.
Yes, -EIO is effectively ABI at this point. Though there is the caveat that -EIO
doesn't guarantee KVM killed the VM, i.e. KVM could return -EIO for some other
reason (though that's highly unlikely for KVM_RUN at least).
> > Similarly, selftests assert either way, the goal is purely to print
> > out a unique error message to minimize the chances of confusing the human running
> > the test (or looking at results).
> >
> > > > Definitely a "no" on this one. As has been established by the guest_memfd series,
> > > > it's ok to return -1/errno with a valid exit_reason.
> > > >
> > > > > But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case.
> > > >
> > > > I doubt userspace relies on -EIO, but userpsace definitely relies on -1/errno being
> > > > returned when a fatal error.
> > >
> > > what about KVM_EXIT_SHUTDOWN? Or KVM_EXIT_INTERNAL_ERROR?
> >
> > I don't follow,
>
> I was trying to ask if KVM_EXIT_SHUTDOWN and KVM_EXIT_INTERNAL_ERROR are
> treated as fatal error by userspace.
Ah. Not really. SHUTDOWN isn't fatal per se, e.g. QEMU emulates a RESET if a
vCPU hits shutdown. INTERNAL_ERROR isn't always fatal on x86, e.g. QEMU ignores
(I think that's what happens) emulation failure when the vCPU is at CPL > 0 so
that guest userspace can't DoS the VM.
next prev parent reply other threads:[~2023-11-30 16:09 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-08 1:09 [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson
2023-11-08 1:09 ` [PATCH v2 1/2] KVM: selftests: Drop the single-underscore ioctl() helpers Sean Christopherson
2023-11-08 1:09 ` [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed Sean Christopherson
2023-11-08 10:06 ` Xiaoyao Li
2023-11-08 16:07 ` Sean Christopherson
2023-11-13 4:04 ` Xiaoyao Li
2023-11-29 19:22 ` Sean Christopherson
2023-11-30 3:04 ` Xiaoyao Li
2023-11-30 16:09 ` Sean Christopherson [this message]
2023-11-30 1:44 ` [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZWiztxAzjCAUw7cx@google.com \
--to=seanjc@google.com \
--cc=coltonlewis@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhal@rbox.co \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=xiaoyao.li@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.