* [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM
@ 2023-11-08 1:09 Sean Christopherson
2023-11-08 1:09 ` [PATCH v2 1/2] KVM: selftests: Drop the single-underscore ioctl() helpers Sean Christopherson
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Sean Christopherson @ 2023-11-08 1:09 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Sean Christopherson, Michal Luczaj,
Oliver Upton, Colton Lewis
Teach selftests' ioctl() macros to detect and report when an ioctl()
unexpectedly fails because KVM has killed and/or bugged the VM. Because
selftests does the right thing and tries to gracefully clean up VMs, a
bugged VM can generate confusing errors, e.g. when deleting memslots.
v2:
- Drop the ARM patch (not worth the churn).
- Drop macros for ioctls() that return file descriptors. Looking at this
with fresh eyes, I agree they do more harm than good. [Oliver]
v1: https://lore.kernel.org/all/20230804004226.1984505-1-seanjc@google.com
Sean Christopherson (2):
KVM: selftests: Drop the single-underscore ioctl() helpers
KVM: selftests: Add logic to detect if ioctl() failed because VM was
killed
.../selftests/kvm/include/kvm_util_base.h | 75 ++++++++++++-------
tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
2 files changed, 51 insertions(+), 26 deletions(-)
base-commit: 45b890f7689eb0aba454fc5831d2d79763781677
--
2.42.0.869.gea05f2083d-goog
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v2 1/2] KVM: selftests: Drop the single-underscore ioctl() helpers 2023-11-08 1:09 [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson @ 2023-11-08 1:09 ` Sean Christopherson 2023-11-08 1:09 ` [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed Sean Christopherson 2023-11-30 1:44 ` [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson 2 siblings, 0 replies; 10+ messages in thread From: Sean Christopherson @ 2023-11-08 1:09 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, linux-kernel, Sean Christopherson, Michal Luczaj, Oliver Upton, Colton Lewis Drop _kvm_ioctl(), _vm_ioctl(), and _vcpu_ioctl(), as they are no longer used by anything other than the no-underscores variants (and may have never been used directly). The single-underscore variants were never intended to be a "feature", they were a stopgap of sorts to ease the conversion to pretty printing ioctl() names when reporting errors. Opportunistically add a comment explaining when to use __KVM_IOCTL_ERROR() versus KVM_IOCTL_ERROR(). The single-underscore macros were subtly ensuring that the name of the ioctl() was printed on error, i.e. it's all too easy to overlook the fact that using __KVM_IOCTL_ERROR() is intentional. Signed-off-by: Sean Christopherson <seanjc@google.com> --- .../selftests/kvm/include/kvm_util_base.h | 42 +++++++++---------- 1 file changed, 19 insertions(+), 23 deletions(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h index a18db6a7b3cf..1f6193dc7d3a 100644 --- a/tools/testing/selftests/kvm/include/kvm_util_base.h +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h @@ -248,6 +248,13 @@ static inline bool kvm_has_cap(long cap) #define __KVM_SYSCALL_ERROR(_name, _ret) \ "%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno) +/* + * Use the "inner", double-underscore macro when reporting errors from within + * other macros so that the name of ioctl() and not its literal numeric value + * is printed on error. The "outer" macro is strongly preferred when reporting + * errors "directly", i.e. without an additional layer of macros, as it reduces + * the probability of passing in the wrong string. + */ #define __KVM_IOCTL_ERROR(_name, _ret) __KVM_SYSCALL_ERROR(_name, _ret) #define KVM_IOCTL_ERROR(_ioctl, _ret) __KVM_IOCTL_ERROR(#_ioctl, _ret) @@ -260,17 +267,13 @@ static inline bool kvm_has_cap(long cap) #define __kvm_ioctl(kvm_fd, cmd, arg) \ kvm_do_ioctl(kvm_fd, cmd, arg) - -#define _kvm_ioctl(kvm_fd, cmd, name, arg) \ +#define kvm_ioctl(kvm_fd, cmd, arg) \ ({ \ int ret = __kvm_ioctl(kvm_fd, cmd, arg); \ \ - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret)); \ + TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ }) -#define kvm_ioctl(kvm_fd, cmd, arg) \ - _kvm_ioctl(kvm_fd, cmd, #cmd, arg) - static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { } #define __vm_ioctl(vm, cmd, arg) \ @@ -279,16 +282,12 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { } kvm_do_ioctl((vm)->fd, cmd, arg); \ }) -#define _vm_ioctl(vm, cmd, name, arg) \ -({ \ - int ret = __vm_ioctl(vm, cmd, arg); \ - \ - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret)); \ -}) - #define vm_ioctl(vm, cmd, arg) \ - _vm_ioctl(vm, cmd, #cmd, arg) - +({ \ + int ret = __vm_ioctl(vm, cmd, arg); \ + \ + TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ +}) static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { } @@ -298,15 +297,12 @@ static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { } kvm_do_ioctl((vcpu)->fd, cmd, arg); \ }) -#define _vcpu_ioctl(vcpu, cmd, name, arg) \ -({ \ - int ret = __vcpu_ioctl(vcpu, cmd, arg); \ - \ - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret)); \ -}) - #define vcpu_ioctl(vcpu, cmd, arg) \ - _vcpu_ioctl(vcpu, cmd, #cmd, arg) +({ \ + int ret = __vcpu_ioctl(vcpu, cmd, arg); \ + \ + TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ +}) /* * Looks up and returns the value corresponding to the capability -- 2.42.0.869.gea05f2083d-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-08 1:09 [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson 2023-11-08 1:09 ` [PATCH v2 1/2] KVM: selftests: Drop the single-underscore ioctl() helpers Sean Christopherson @ 2023-11-08 1:09 ` Sean Christopherson 2023-11-08 10:06 ` Xiaoyao Li 2023-11-30 1:44 ` [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson 2 siblings, 1 reply; 10+ messages in thread From: Sean Christopherson @ 2023-11-08 1:09 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, linux-kernel, Sean Christopherson, Michal Luczaj, Oliver Upton, Colton Lewis Add yet another macro to the VM/vCPU ioctl() framework to detect when an ioctl() failed because KVM killed/bugged the VM, i.e. when there was nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with -EIO, which can be quite misleading and ultimately waste user/developer time. Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a heuristic is obviously less than ideal, but practically speaking the logic is bulletproof barring a KVM change, and any such change would arguably break userspace, e.g. if KVM returns something other than -EIO. Without the detection, tearing down a bugged VM yields a cryptic failure when deleting memslots: ==== Test Assertion Failure ==== lib/kvm_util.c:689: !ret pid=45131 tid=45131 errno=5 - Input/output error 1 0x00000000004036c3: __vm_mem_region_delete at kvm_util.c:689 2 0x00000000004042f0: kvm_vm_free at kvm_util.c:724 (discriminator 12) 3 0x0000000000402929: race_sync_regs at sync_regs_test.c:193 4 0x0000000000401cab: main at sync_regs_test.c:334 (discriminator 6) 5 0x0000000000416f13: __libc_start_call_main at libc-start.o:? 6 0x000000000041855f: __libc_start_main_impl at ??:? 7 0x0000000000401d40: _start at ??:? KVM_SET_USER_MEMORY_REGION failed, rc: -1 errno: 5 (Input/output error) Which morphs into a more pointed error message with the detection: ==== Test Assertion Failure ==== lib/kvm_util.c:689: false pid=80347 tid=80347 errno=5 - Input/output error 1 0x00000000004039ab: __vm_mem_region_delete at kvm_util.c:689 (discriminator 5) 2 0x0000000000404660: kvm_vm_free at kvm_util.c:724 (discriminator 12) 3 0x0000000000402ac9: race_sync_regs at sync_regs_test.c:193 4 0x0000000000401cb7: main at sync_regs_test.c:334 (discriminator 6) 5 0x0000000000418263: __libc_start_call_main at libc-start.o:? 6 0x00000000004198af: __libc_start_main_impl at ??:? 7 0x0000000000401d90: _start at ??:? KVM killed/bugged the VM, check the kernel log for clues Suggested-by: Michal Luczaj <mhal@rbox.co> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Colton Lewis <coltonlewis@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- .../selftests/kvm/include/kvm_util_base.h | 39 ++++++++++++++++--- tools/testing/selftests/kvm/lib/kvm_util.c | 2 +- 2 files changed, 35 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h index 1f6193dc7d3a..c7717942ddbb 100644 --- a/tools/testing/selftests/kvm/include/kvm_util_base.h +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h @@ -282,11 +282,40 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { } kvm_do_ioctl((vm)->fd, cmd, arg); \ }) +/* + * Assert that a VM or vCPU ioctl() succeeded, with extra magic to detect if + * the ioctl() failed because KVM killed/bugged the VM. To detect a dead VM, + * probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM since before + * selftests existed and (b) should never outright fail, i.e. is supposed to + * return 0 or 1. If KVM kills a VM, KVM returns -EIO for all ioctl()s for the + * VM and its vCPUs, including KVM_CHECK_EXTENSION. + */ +#define __TEST_ASSERT_VM_VCPU_IOCTL(cond, name, ret, vm) \ +do { \ + int __errno = errno; \ + \ + static_assert_is_vm(vm); \ + \ + if (cond) \ + break; \ + \ + if (errno == EIO && \ + __vm_ioctl(vm, KVM_CHECK_EXTENSION, (void *)KVM_CAP_USER_MEMORY) < 0) { \ + TEST_ASSERT(errno == EIO, "KVM killed the VM, should return -EIO"); \ + TEST_FAIL("KVM killed/bugged the VM, check the kernel log for clues"); \ + } \ + errno = __errno; \ + TEST_ASSERT(cond, __KVM_IOCTL_ERROR(name, ret)); \ +} while (0) + +#define TEST_ASSERT_VM_VCPU_IOCTL(cond, cmd, ret, vm) \ + __TEST_ASSERT_VM_VCPU_IOCTL(cond, #cmd, ret, vm) + #define vm_ioctl(vm, cmd, arg) \ ({ \ int ret = __vm_ioctl(vm, cmd, arg); \ \ - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ + __TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, vm); \ }) static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { } @@ -301,7 +330,7 @@ static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { } ({ \ int ret = __vcpu_ioctl(vcpu, cmd, arg); \ \ - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ + __TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, (vcpu)->vm); \ }) /* @@ -312,7 +341,7 @@ static inline int vm_check_cap(struct kvm_vm *vm, long cap) { int ret = __vm_ioctl(vm, KVM_CHECK_EXTENSION, (void *)cap); - TEST_ASSERT(ret >= 0, KVM_IOCTL_ERROR(KVM_CHECK_EXTENSION, ret)); + TEST_ASSERT_VM_VCPU_IOCTL(ret >= 0, KVM_CHECK_EXTENSION, ret, vm); return ret; } @@ -371,7 +400,7 @@ static inline int vm_get_stats_fd(struct kvm_vm *vm) { int fd = __vm_ioctl(vm, KVM_GET_STATS_FD, NULL); - TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_GET_STATS_FD, fd)); + TEST_ASSERT_VM_VCPU_IOCTL(fd >= 0, KVM_GET_STATS_FD, fd, vm); return fd; } @@ -583,7 +612,7 @@ static inline int vcpu_get_stats_fd(struct kvm_vcpu *vcpu) { int fd = __vcpu_ioctl(vcpu, KVM_GET_STATS_FD, NULL); - TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_GET_STATS_FD, fd)); + TEST_ASSERT_VM_VCPU_IOCTL(fd >= 0, KVM_CHECK_EXTENSION, fd, vcpu->vm); return fd; } diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 7a8af1821f5d..c847f942cd38 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -1227,7 +1227,7 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id) vcpu->vm = vm; vcpu->id = vcpu_id; vcpu->fd = __vm_ioctl(vm, KVM_CREATE_VCPU, (void *)(unsigned long)vcpu_id); - TEST_ASSERT(vcpu->fd >= 0, KVM_IOCTL_ERROR(KVM_CREATE_VCPU, vcpu->fd)); + TEST_ASSERT_VM_VCPU_IOCTL(vcpu->fd >= 0, KVM_CREATE_VCPU, vcpu->fd, vm); TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->run), "vcpu mmap size " "smaller than expected, vcpu_mmap_sz: %i expected_min: %zi", -- 2.42.0.869.gea05f2083d-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-08 1:09 ` [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed Sean Christopherson @ 2023-11-08 10:06 ` Xiaoyao Li 2023-11-08 16:07 ` Sean Christopherson 0 siblings, 1 reply; 10+ messages in thread From: Xiaoyao Li @ 2023-11-08 10:06 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On 11/8/2023 9:09 AM, Sean Christopherson wrote: > Add yet another macro to the VM/vCPU ioctl() framework to detect when an > ioctl() failed because KVM killed/bugged the VM, i.e. when there was > nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of > a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with > -EIO, which can be quite misleading and ultimately waste user/developer > time. > > Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is > dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a > heuristic is obviously less than ideal, but practically speaking the logic > is bulletproof barring a KVM change, and any such change would arguably > break userspace, e.g. if KVM returns something other than -EIO. We hit similar issue when testing TDX VMs. Most failure of SEMCALL is handled with a KVM_BUG_ON(), which leads to vm dead. Then the following IOCTL from userspace (QEMU) and gets -EIO. Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD? and replace -EIO with 0? yes, it's a ABI change. But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case. > Without the detection, tearing down a bugged VM yields a cryptic failure > when deleting memslots: > > ==== Test Assertion Failure ==== > lib/kvm_util.c:689: !ret > pid=45131 tid=45131 errno=5 - Input/output error > 1 0x00000000004036c3: __vm_mem_region_delete at kvm_util.c:689 > 2 0x00000000004042f0: kvm_vm_free at kvm_util.c:724 (discriminator 12) > 3 0x0000000000402929: race_sync_regs at sync_regs_test.c:193 > 4 0x0000000000401cab: main at sync_regs_test.c:334 (discriminator 6) > 5 0x0000000000416f13: __libc_start_call_main at libc-start.o:? > 6 0x000000000041855f: __libc_start_main_impl at ??:? > 7 0x0000000000401d40: _start at ??:? > KVM_SET_USER_MEMORY_REGION failed, rc: -1 errno: 5 (Input/output error) > > Which morphs into a more pointed error message with the detection: > > ==== Test Assertion Failure ==== > lib/kvm_util.c:689: false > pid=80347 tid=80347 errno=5 - Input/output error > 1 0x00000000004039ab: __vm_mem_region_delete at kvm_util.c:689 (discriminator 5) > 2 0x0000000000404660: kvm_vm_free at kvm_util.c:724 (discriminator 12) > 3 0x0000000000402ac9: race_sync_regs at sync_regs_test.c:193 > 4 0x0000000000401cb7: main at sync_regs_test.c:334 (discriminator 6) > 5 0x0000000000418263: __libc_start_call_main at libc-start.o:? > 6 0x00000000004198af: __libc_start_main_impl at ??:? > 7 0x0000000000401d90: _start at ??:? > KVM killed/bugged the VM, check the kernel log for clues > > Suggested-by: Michal Luczaj <mhal@rbox.co> > Cc: Oliver Upton <oliver.upton@linux.dev> > Cc: Colton Lewis <coltonlewis@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > .../selftests/kvm/include/kvm_util_base.h | 39 ++++++++++++++++--- > tools/testing/selftests/kvm/lib/kvm_util.c | 2 +- > 2 files changed, 35 insertions(+), 6 deletions(-) > > diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h > index 1f6193dc7d3a..c7717942ddbb 100644 > --- a/tools/testing/selftests/kvm/include/kvm_util_base.h > +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h > @@ -282,11 +282,40 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { } > kvm_do_ioctl((vm)->fd, cmd, arg); \ > }) > > +/* > + * Assert that a VM or vCPU ioctl() succeeded, with extra magic to detect if > + * the ioctl() failed because KVM killed/bugged the VM. To detect a dead VM, > + * probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM since before > + * selftests existed and (b) should never outright fail, i.e. is supposed to > + * return 0 or 1. If KVM kills a VM, KVM returns -EIO for all ioctl()s for the > + * VM and its vCPUs, including KVM_CHECK_EXTENSION. > + */ > +#define __TEST_ASSERT_VM_VCPU_IOCTL(cond, name, ret, vm) \ > +do { \ > + int __errno = errno; \ > + \ > + static_assert_is_vm(vm); \ > + \ > + if (cond) \ > + break; \ > + \ > + if (errno == EIO && \ > + __vm_ioctl(vm, KVM_CHECK_EXTENSION, (void *)KVM_CAP_USER_MEMORY) < 0) { \ > + TEST_ASSERT(errno == EIO, "KVM killed the VM, should return -EIO"); \ > + TEST_FAIL("KVM killed/bugged the VM, check the kernel log for clues"); \ > + } \ > + errno = __errno; \ > + TEST_ASSERT(cond, __KVM_IOCTL_ERROR(name, ret)); \ > +} while (0) > + > +#define TEST_ASSERT_VM_VCPU_IOCTL(cond, cmd, ret, vm) \ > + __TEST_ASSERT_VM_VCPU_IOCTL(cond, #cmd, ret, vm) > + > #define vm_ioctl(vm, cmd, arg) \ > ({ \ > int ret = __vm_ioctl(vm, cmd, arg); \ > \ > - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ > + __TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, vm); \ > }) > > static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { } > @@ -301,7 +330,7 @@ static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { } > ({ \ > int ret = __vcpu_ioctl(vcpu, cmd, arg); \ > \ > - TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(#cmd, ret)); \ > + __TEST_ASSERT_VM_VCPU_IOCTL(!ret, #cmd, ret, (vcpu)->vm); \ > }) > > /* > @@ -312,7 +341,7 @@ static inline int vm_check_cap(struct kvm_vm *vm, long cap) > { > int ret = __vm_ioctl(vm, KVM_CHECK_EXTENSION, (void *)cap); > > - TEST_ASSERT(ret >= 0, KVM_IOCTL_ERROR(KVM_CHECK_EXTENSION, ret)); > + TEST_ASSERT_VM_VCPU_IOCTL(ret >= 0, KVM_CHECK_EXTENSION, ret, vm); > return ret; > } > > @@ -371,7 +400,7 @@ static inline int vm_get_stats_fd(struct kvm_vm *vm) > { > int fd = __vm_ioctl(vm, KVM_GET_STATS_FD, NULL); > > - TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_GET_STATS_FD, fd)); > + TEST_ASSERT_VM_VCPU_IOCTL(fd >= 0, KVM_GET_STATS_FD, fd, vm); > return fd; > } > > @@ -583,7 +612,7 @@ static inline int vcpu_get_stats_fd(struct kvm_vcpu *vcpu) > { > int fd = __vcpu_ioctl(vcpu, KVM_GET_STATS_FD, NULL); > > - TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_GET_STATS_FD, fd)); > + TEST_ASSERT_VM_VCPU_IOCTL(fd >= 0, KVM_CHECK_EXTENSION, fd, vcpu->vm); > return fd; > } > > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c > index 7a8af1821f5d..c847f942cd38 100644 > --- a/tools/testing/selftests/kvm/lib/kvm_util.c > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c > @@ -1227,7 +1227,7 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id) > vcpu->vm = vm; > vcpu->id = vcpu_id; > vcpu->fd = __vm_ioctl(vm, KVM_CREATE_VCPU, (void *)(unsigned long)vcpu_id); > - TEST_ASSERT(vcpu->fd >= 0, KVM_IOCTL_ERROR(KVM_CREATE_VCPU, vcpu->fd)); > + TEST_ASSERT_VM_VCPU_IOCTL(vcpu->fd >= 0, KVM_CREATE_VCPU, vcpu->fd, vm); > > TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->run), "vcpu mmap size " > "smaller than expected, vcpu_mmap_sz: %i expected_min: %zi", ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-08 10:06 ` Xiaoyao Li @ 2023-11-08 16:07 ` Sean Christopherson 2023-11-13 4:04 ` Xiaoyao Li 0 siblings, 1 reply; 10+ messages in thread From: Sean Christopherson @ 2023-11-08 16:07 UTC (permalink / raw) To: Xiaoyao Li Cc: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On Wed, Nov 08, 2023, Xiaoyao Li wrote: > On 11/8/2023 9:09 AM, Sean Christopherson wrote: > > Add yet another macro to the VM/vCPU ioctl() framework to detect when an > > ioctl() failed because KVM killed/bugged the VM, i.e. when there was > > nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of > > a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with > > -EIO, which can be quite misleading and ultimately waste user/developer > > time. > > > > Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is > > dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a > > heuristic is obviously less than ideal, but practically speaking the logic > > is bulletproof barring a KVM change, and any such change would arguably > > break userspace, e.g. if KVM returns something other than -EIO. > > We hit similar issue when testing TDX VMs. Most failure of SEMCALL is > handled with a KVM_BUG_ON(), which leads to vm dead. Then the following > IOCTL from userspace (QEMU) and gets -EIO. > > Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD? Why? Even if KVM_EXIT_VM_DEAD somehow provided enough information to be useful from an automation perspective, the VM is obviously dead. I don't see how the VMM can do anything but log the error and tear down the VM. KVM_BUG_ON() comes with a WARN, which will be far more helpful for a human debugger, e.g. because all vCPUs would exit with KVM_EXIT_VM_DEAD, it wouldn't even identify which vCPU initially triggered the issue. Using an exit reason is a also bit tricky because it requires a vCPU, whereas a dead VM blocks anything and everything. > and replace -EIO with 0? yes, it's a ABI change. Definitely a "no" on this one. As has been established by the guest_memfd series, it's ok to return -1/errno with a valid exit_reason. > But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case. I doubt userspace relies on -EIO, but userpsace definitely relies on -1/errno being returned when a fatal error. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-08 16:07 ` Sean Christopherson @ 2023-11-13 4:04 ` Xiaoyao Li 2023-11-29 19:22 ` Sean Christopherson 0 siblings, 1 reply; 10+ messages in thread From: Xiaoyao Li @ 2023-11-13 4:04 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On 11/9/2023 12:07 AM, Sean Christopherson wrote: > On Wed, Nov 08, 2023, Xiaoyao Li wrote: >> On 11/8/2023 9:09 AM, Sean Christopherson wrote: >>> Add yet another macro to the VM/vCPU ioctl() framework to detect when an >>> ioctl() failed because KVM killed/bugged the VM, i.e. when there was >>> nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of >>> a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with >>> -EIO, which can be quite misleading and ultimately waste user/developer >>> time. >>> >>> Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is >>> dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a >>> heuristic is obviously less than ideal, but practically speaking the logic >>> is bulletproof barring a KVM change, and any such change would arguably >>> break userspace, e.g. if KVM returns something other than -EIO. >> >> We hit similar issue when testing TDX VMs. Most failure of SEMCALL is >> handled with a KVM_BUG_ON(), which leads to vm dead. Then the following >> IOCTL from userspace (QEMU) and gets -EIO. >> >> Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD? > > Why? Even if KVM_EXIT_VM_DEAD somehow provided enough information to be useful > from an automation perspective, the VM is obviously dead. I don't see how the > VMM can do anything but log the error and tear down the VM. KVM_BUG_ON() comes > with a WARN, which will be far more helpful for a human debugger, e.g. because > all vCPUs would exit with KVM_EXIT_VM_DEAD, it wouldn't even identify which vCPU > initially triggered the issue. It's not about providing more helpful debugging info, but to provide a dedicated notification for VMM that "the VM is dead, all the following command may not response". With it, VMM can get rid of the tricky detection like this patch. > Using an exit reason is a also bit tricky because it requires a vCPU, whereas a > dead VM blocks anything and everything. No argue of it. It cannot work for all the case, but at least it can make some case happier. >> and replace -EIO with 0? yes, it's a ABI change. > > Definitely a "no" on this one. As has been established by the guest_memfd series, > it's ok to return -1/errno with a valid exit_reason. > >> But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case. > > I doubt userspace relies on -EIO, but userpsace definitely relies on -1/errno being > returned when a fatal error. what about KVM_EXIT_SHUTDOWN? Or KVM_EXIT_INTERNAL_ERROR? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-13 4:04 ` Xiaoyao Li @ 2023-11-29 19:22 ` Sean Christopherson 2023-11-30 3:04 ` Xiaoyao Li 0 siblings, 1 reply; 10+ messages in thread From: Sean Christopherson @ 2023-11-29 19:22 UTC (permalink / raw) To: Xiaoyao Li Cc: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On Mon, Nov 13, 2023, Xiaoyao Li wrote: > On 11/9/2023 12:07 AM, Sean Christopherson wrote: > > On Wed, Nov 08, 2023, Xiaoyao Li wrote: > > > On 11/8/2023 9:09 AM, Sean Christopherson wrote: > > > > Add yet another macro to the VM/vCPU ioctl() framework to detect when an > > > > ioctl() failed because KVM killed/bugged the VM, i.e. when there was > > > > nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of > > > > a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with > > > > -EIO, which can be quite misleading and ultimately waste user/developer > > > > time. > > > > > > > > Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is > > > > dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a > > > > heuristic is obviously less than ideal, but practically speaking the logic > > > > is bulletproof barring a KVM change, and any such change would arguably > > > > break userspace, e.g. if KVM returns something other than -EIO. > > > > > > We hit similar issue when testing TDX VMs. Most failure of SEMCALL is > > > handled with a KVM_BUG_ON(), which leads to vm dead. Then the following > > > IOCTL from userspace (QEMU) and gets -EIO. > > > > > > Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD? > > > > Why? Even if KVM_EXIT_VM_DEAD somehow provided enough information to be useful > > from an automation perspective, the VM is obviously dead. I don't see how the > > VMM can do anything but log the error and tear down the VM. KVM_BUG_ON() comes > > with a WARN, which will be far more helpful for a human debugger, e.g. because > > all vCPUs would exit with KVM_EXIT_VM_DEAD, it wouldn't even identify which vCPU > > initially triggered the issue. > > It's not about providing more helpful debugging info, but to provide a > dedicated notification for VMM that "the VM is dead, all the following > command may not response". With it, VMM can get rid of the tricky detection > like this patch. But a VMM doesn't need this tricky detection, because this tricky detections isn't about detecting that the VM is dead, it's all about helping a human debug why a test failed. -EIO already effectively says "the VM is dead", e.g. QEMU isn't going to keep trying to run vCPUs. Similarly, selftests assert either way, the goal is purely to print out a unique error message to minimize the chances of confusing the human running the test (or looking at results). > > Definitely a "no" on this one. As has been established by the guest_memfd series, > > it's ok to return -1/errno with a valid exit_reason. > > > > > But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case. > > > > I doubt userspace relies on -EIO, but userpsace definitely relies on -1/errno being > > returned when a fatal error. > > what about KVM_EXIT_SHUTDOWN? Or KVM_EXIT_INTERNAL_ERROR? I don't follow, those are vcpu_run.exit_reason values, not errno values. Returning any flavor of KVM_EXIT_*, which are positive values, would break userspace, e.g. QEMU explicitly looks for "ret < 0", and glibc only treats small-ish negative values as errors, i.e. a postive return value will be propagated verbatim up to QEMU. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-29 19:22 ` Sean Christopherson @ 2023-11-30 3:04 ` Xiaoyao Li 2023-11-30 16:09 ` Sean Christopherson 0 siblings, 1 reply; 10+ messages in thread From: Xiaoyao Li @ 2023-11-30 3:04 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On 11/30/2023 3:22 AM, Sean Christopherson wrote: > On Mon, Nov 13, 2023, Xiaoyao Li wrote: >> On 11/9/2023 12:07 AM, Sean Christopherson wrote: >>> On Wed, Nov 08, 2023, Xiaoyao Li wrote: >>>> On 11/8/2023 9:09 AM, Sean Christopherson wrote: >>>>> Add yet another macro to the VM/vCPU ioctl() framework to detect when an >>>>> ioctl() failed because KVM killed/bugged the VM, i.e. when there was >>>>> nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of >>>>> a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with >>>>> -EIO, which can be quite misleading and ultimately waste user/developer >>>>> time. >>>>> >>>>> Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is >>>>> dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a >>>>> heuristic is obviously less than ideal, but practically speaking the logic >>>>> is bulletproof barring a KVM change, and any such change would arguably >>>>> break userspace, e.g. if KVM returns something other than -EIO. >>>> >>>> We hit similar issue when testing TDX VMs. Most failure of SEMCALL is >>>> handled with a KVM_BUG_ON(), which leads to vm dead. Then the following >>>> IOCTL from userspace (QEMU) and gets -EIO. >>>> >>>> Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD? >>> >>> Why? Even if KVM_EXIT_VM_DEAD somehow provided enough information to be useful >>> from an automation perspective, the VM is obviously dead. I don't see how the >>> VMM can do anything but log the error and tear down the VM. KVM_BUG_ON() comes >>> with a WARN, which will be far more helpful for a human debugger, e.g. because >>> all vCPUs would exit with KVM_EXIT_VM_DEAD, it wouldn't even identify which vCPU >>> initially triggered the issue. >> >> It's not about providing more helpful debugging info, but to provide a >> dedicated notification for VMM that "the VM is dead, all the following >> command may not response". With it, VMM can get rid of the tricky detection >> like this patch. > > But a VMM doesn't need this tricky detection, because this tricky detections isn't > about detecting that the VM is dead, it's all about helping a human debug why a > test failed. > > -EIO already effectively says "the VM is dead", e.g. QEMU isn't going to keep trying > to run vCPUs. If -EIO for KVM ioctl denotes "the VM is dead" is to be the officially announced API, I'm fine. > Similarly, selftests assert either way, the goal is purely to print > out a unique error message to minimize the chances of confusing the human running > the test (or looking at results). > >>> Definitely a "no" on this one. As has been established by the guest_memfd series, >>> it's ok to return -1/errno with a valid exit_reason. >>> >>>> But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case. >>> >>> I doubt userspace relies on -EIO, but userpsace definitely relies on -1/errno being >>> returned when a fatal error. >> >> what about KVM_EXIT_SHUTDOWN? Or KVM_EXIT_INTERNAL_ERROR? > > I don't follow, I was trying to ask if KVM_EXIT_SHUTDOWN and KVM_EXIT_INTERNAL_ERROR are treated as fatal error by userspace. > those are vcpu_run.exit_reason values, not errno values. Returning > any flavor of KVM_EXIT_*, which are positive values, would break userspace, e.g. > QEMU explicitly looks for "ret < 0", and glibc only treats small-ish negative > values as errors, i.e. a postive return value will be propagated verbatim up to > QEMU. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed 2023-11-30 3:04 ` Xiaoyao Li @ 2023-11-30 16:09 ` Sean Christopherson 0 siblings, 0 replies; 10+ messages in thread From: Sean Christopherson @ 2023-11-30 16:09 UTC (permalink / raw) To: Xiaoyao Li Cc: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On Thu, Nov 30, 2023, Xiaoyao Li wrote: > On 11/30/2023 3:22 AM, Sean Christopherson wrote: > > On Mon, Nov 13, 2023, Xiaoyao Li wrote: > > > On 11/9/2023 12:07 AM, Sean Christopherson wrote: > > > > On Wed, Nov 08, 2023, Xiaoyao Li wrote: > > > > > On 11/8/2023 9:09 AM, Sean Christopherson wrote: > > > > > > Add yet another macro to the VM/vCPU ioctl() framework to detect when an > > > > > > ioctl() failed because KVM killed/bugged the VM, i.e. when there was > > > > > > nothing wrong with the ioctl() itself. If KVM kills a VM, e.g. by way of > > > > > > a failed KVM_BUG_ON(), all subsequent VM and vCPU ioctl()s will fail with > > > > > > -EIO, which can be quite misleading and ultimately waste user/developer > > > > > > time. > > > > > > > > > > > > Use KVM_CHECK_EXTENSION on KVM_CAP_USER_MEMORY to detect if the VM is > > > > > > dead and/or bug, as KVM doesn't provide a dedicated ioctl(). Using a > > > > > > heuristic is obviously less than ideal, but practically speaking the logic > > > > > > is bulletproof barring a KVM change, and any such change would arguably > > > > > > break userspace, e.g. if KVM returns something other than -EIO. > > > > > > > > > > We hit similar issue when testing TDX VMs. Most failure of SEMCALL is > > > > > handled with a KVM_BUG_ON(), which leads to vm dead. Then the following > > > > > IOCTL from userspace (QEMU) and gets -EIO. > > > > > > > > > > Can we return a new KVM_EXIT_VM_DEAD on KVM_REQ_VM_DEAD? > > > > > > > > Why? Even if KVM_EXIT_VM_DEAD somehow provided enough information to be useful > > > > from an automation perspective, the VM is obviously dead. I don't see how the > > > > VMM can do anything but log the error and tear down the VM. KVM_BUG_ON() comes > > > > with a WARN, which will be far more helpful for a human debugger, e.g. because > > > > all vCPUs would exit with KVM_EXIT_VM_DEAD, it wouldn't even identify which vCPU > > > > initially triggered the issue. > > > > > > It's not about providing more helpful debugging info, but to provide a > > > dedicated notification for VMM that "the VM is dead, all the following > > > command may not response". With it, VMM can get rid of the tricky detection > > > like this patch. > > > > But a VMM doesn't need this tricky detection, because this tricky detections isn't > > about detecting that the VM is dead, it's all about helping a human debug why a > > test failed. > > > > -EIO already effectively says "the VM is dead", e.g. QEMU isn't going to keep trying > > to run vCPUs. > > If -EIO for KVM ioctl denotes "the VM is dead" is to be the officially > announced API, I'm fine. Yes, -EIO is effectively ABI at this point. Though there is the caveat that -EIO doesn't guarantee KVM killed the VM, i.e. KVM could return -EIO for some other reason (though that's highly unlikely for KVM_RUN at least). > > Similarly, selftests assert either way, the goal is purely to print > > out a unique error message to minimize the chances of confusing the human running > > the test (or looking at results). > > > > > > Definitely a "no" on this one. As has been established by the guest_memfd series, > > > > it's ok to return -1/errno with a valid exit_reason. > > > > > > > > > But I'm wondering if any userspace relies on -EIO behavior for VM DEAD case. > > > > > > > > I doubt userspace relies on -EIO, but userpsace definitely relies on -1/errno being > > > > returned when a fatal error. > > > > > > what about KVM_EXIT_SHUTDOWN? Or KVM_EXIT_INTERNAL_ERROR? > > > > I don't follow, > > I was trying to ask if KVM_EXIT_SHUTDOWN and KVM_EXIT_INTERNAL_ERROR are > treated as fatal error by userspace. Ah. Not really. SHUTDOWN isn't fatal per se, e.g. QEMU emulates a RESET if a vCPU hits shutdown. INTERNAL_ERROR isn't always fatal on x86, e.g. QEMU ignores (I think that's what happens) emulation failure when the vCPU is at CPL > 0 so that guest userspace can't DoS the VM. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM 2023-11-08 1:09 [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson 2023-11-08 1:09 ` [PATCH v2 1/2] KVM: selftests: Drop the single-underscore ioctl() helpers Sean Christopherson 2023-11-08 1:09 ` [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed Sean Christopherson @ 2023-11-30 1:44 ` Sean Christopherson 2 siblings, 0 replies; 10+ messages in thread From: Sean Christopherson @ 2023-11-30 1:44 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: kvm, linux-kernel, Michal Luczaj, Oliver Upton, Colton Lewis On Tue, 07 Nov 2023 17:09:51 -0800, Sean Christopherson wrote: > Teach selftests' ioctl() macros to detect and report when an ioctl() > unexpectedly fails because KVM has killed and/or bugged the VM. Because > selftests does the right thing and tries to gracefully clean up VMs, a > bugged VM can generate confusing errors, e.g. when deleting memslots. > > v2: > - Drop the ARM patch (not worth the churn). > - Drop macros for ioctls() that return file descriptors. Looking at this > with fresh eyes, I agree they do more harm than good. [Oliver] > > [...] Applied to kvm-x86 selftests. Xiaoyao, I definitely want to continue the conversation on improving the userspace experience when KVM kills a VM, but I don't see a reason to hold up "fixing" the selftests. [1/2] KVM: selftests: Drop the single-underscore ioctl() helpers https://github.com/kvm-x86/linux/commit/6542a0036928 [2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed https://github.com/kvm-x86/linux/commit/1b78d474ce4e -- https://github.com/kvm-x86/linux/tree/next ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-11-30 16:09 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-08 1:09 [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson 2023-11-08 1:09 ` [PATCH v2 1/2] KVM: selftests: Drop the single-underscore ioctl() helpers Sean Christopherson 2023-11-08 1:09 ` [PATCH v2 2/2] KVM: selftests: Add logic to detect if ioctl() failed because VM was killed Sean Christopherson 2023-11-08 10:06 ` Xiaoyao Li 2023-11-08 16:07 ` Sean Christopherson 2023-11-13 4:04 ` Xiaoyao Li 2023-11-29 19:22 ` Sean Christopherson 2023-11-30 3:04 ` Xiaoyao Li 2023-11-30 16:09 ` Sean Christopherson 2023-11-30 1:44 ` [PATCH v2 0/2] KVM: selftests: Detect if KVM bugged the VM Sean Christopherson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.