From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D961124DD09 for ; Tue, 30 Sep 2025 14:24:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759242251; cv=none; b=ogQKp3evMZoHrbQiv1jfmX+PmlusGKMzFz5gwdEGZGgzRexlf2Km2YTCPUZ+TwY4G9/QJYMusrhRnQUYF02NQ35PIZPXFOF7atjTz+6T0IsfVY3zr+jjhR+Ln6PMIPky7ps+yDkGwWc308C4mxq8e9tDzERND3/2Ab50DlBS6sk= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1759242251; c=relaxed/simple; bh=QX4I/lCarlYgvfsjBbMkBMBMASp/lik6TTXQ2+1tQ5Y=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ELQK2U6zjauwxD1puARMqccuusjw9/HiAZLyDiJbJYyOw1na9CLrz7djV6X8X1EpLuIEI9+TAL5eP4cSRmzkZVH6ry2L+pdvIhSFUsbQqU7K9iK+BX4Xk3YlCKIX3JC9Bqxjm4Bo2myLloyeA/vlsiJDD7aoAQA0DegMxfkZrdk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=2ZuA/d9x; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="2ZuA/d9x" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-332560b7171so5341726a91.0 for ; Tue, 30 Sep 2025 07:24:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1759242249; x=1759847049; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=oQlJNmaSy7Vsey9bZkxA7JGXtQsD4l92RLpVpPchWDw=; b=2ZuA/d9x4STm9M28mbYFfxncb31DO8LaUjtbpc8kFduhRUUhKMuoPgHzhkIc18RvMo 3tu2gJE+VPHxbBH5mK6CIECGIpAVw0gtx/VAX8wtt/oe3qUH8OoOXpQL1PZnQU65XWxp 9q2xkAWZMlUxbVXfo10rcPP/WcOfnuQ3RJxFMT/O8zwZQoCjIcboG4cB3JT2pVTAL6KC aCGt/3jflz4vAxsfl0igIcT+NJ0Msvjc1YVkgplV7ZG/yw+GJsJxEfQD1w6/A9BeVwYj 8YOBp8mXrn0cg9Dq+e1KKl6FmYR9BD1GQqODdApBozmg9gA6gPgGTPDdKfl1n7Llf4yY wgEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759242249; x=1759847049; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=oQlJNmaSy7Vsey9bZkxA7JGXtQsD4l92RLpVpPchWDw=; b=LCvFp4tdJuf+ub1S2qvpnZG9+7+usyo145Ojj+emumXPQ4UzRw0z2cciwdwg8xBTAu ImFUeBmArlfl2GRyfEGNWazkwqcUnkBvIPBpnDhjw5a81EY/fiN0UA/CX3R7yN06L2dW DQBmscbHwmwZ+g4SSYc0k5dHiNnkXFoEvMWsLSQRTqm1J0whdiocNtGv6DqobEk1lEpa FtVIbMugQo8fW8QwYFu9MqLD12AESw9m/6/eIkX8LINtVAPWNOyZMA2xUIctqsNeGMNB Ddo/R8DFFJxr0oHs/6yxSjIzaH+lpnJ9Dq4DTn4EacOg+uwu4YfHpBUW/713UUYztwJq 1JTQ== X-Forwarded-Encrypted: i=1; AJvYcCV2PKvOsbNUGblVhgKsZ38DAD5j53HcbBkSn5Z2mOIucEfMdD8c+JKbZtpodqOZJLRmYvvq7PWUf4reVRc=@vger.kernel.org X-Gm-Message-State: AOJu0Yx+YaGPmrc3fBW6duZkLeD/nnBILDI2WZUtdyPOzTo137eehqF6 B9qN6ixlHgjEeyFw63x1eB1YSfB8oxiENsXXkQP0AvNDIKCV65mStUS+PblbZqKr5R09wuHU/dD Xw36KFA== X-Google-Smtp-Source: AGHT+IFz/mOSeweoDFQEe1prKxm0oY3++suIaj9pQRWJfRm1ouaf2TCChsh41iBwxlNuzYnHzPTecD9i+ec= X-Received: from pjbbk12.prod.google.com ([2002:a17:90b:80c:b0:32d:7097:81f1]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:4ac4:b0:32e:8c6d:5928 with SMTP id 98e67ed59e1d1-3383abde248mr5248895a91.13.1759242249096; Tue, 30 Sep 2025 07:24:09 -0700 (PDT) Date: Tue, 30 Sep 2025 07:24:07 -0700 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250926163114.2626257-1-seanjc@google.com> <20250926163114.2626257-6-seanjc@google.com> Message-ID: Subject: Re: [PATCH 5/6] KVM: selftests: Add wrappers for mmap() and munmap() to assert success From: Sean Christopherson To: Ackerley Tng Cc: Paolo Bonzini , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Hildenbrand , Fuad Tabba Content-Type: text/plain; charset="us-ascii" On Tue, Sep 30, 2025, Ackerley Tng wrote: > Sean Christopherson writes: > > To be perfectly honest, I forgot test_util.h existed :-) > > Merging/dropping one of kvm_util.h vs test_util.h is a good idea. The > distinction is not clear and it's already kind of messy between the two. That's a topic for another day. > It's a common pattern in KVM selftests to have a syscall/ioctl wrapper > foo() that asserts defaults and a __foo() that doesn't assert anything > and allows tests to assert something else, but I have a contrary > opinion. > > I think it's better that tests be explicit about what they're testing > for, so perhaps it's better to use macros like TEST_ASSERT_EQ() to > explicitly call a function and check the results. No, foo() and __foo() is a well-established pattern in the kernel, and in KVM selftests it is a very well-established pattern for syscalls and ioctls. And I feel very, very strong about handling errors in the core infrastructure. Relying on developers to remember to add an assert is 100% guaranteed to result in missed asserts. That makes everyone's life painful, because inevitably an ioctl will fail on someone else's system, and then they're stuck debugging a super random failure with no insight into what the developer _meant_ to do. And requiring developers to write (i.e. copy+paste) boring, uninteresting code to handle failures adds a lot of friction to development, is a terrible use of developers' time, and results in _awful_ error messages. Bad or missing error messages in tests have easily wasted tens of hours of just _my_ time; I suspect the total cost throughout the KVM community can be measured in tens of days. E.g. pop quiz, what state did I clobber that generated this error message with a TEST_ASSERT_EQ(ret, 0)? Answer at the bottom. ==== Test Assertion Failure ==== lib/x86/processor.c:1128: ret == 0 pid=2456 tid=2456 errno=22 - Invalid argument 1 0x0000000000415465: vcpu_load_state at processor.c:1128 2 0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221 3 0x000000000040204d: main at hyperv_evmcs.c:286 4 0x000000000041df43: __libc_start_call_main at libc-start.o:? 5 0x00000000004200ec: __libc_start_main_impl at ??:? 6 0x0000000000402220: _start at ??:? 0xffffffffffffffff != 0 (ret != 0) You might say "oh, I can go look at the source". But what if you don't have the source because you got a test failure from CI? Or because the assert came from a bug report due to a failure in someone else's CI pipeline? That is not a contrived example. Before the ioctl assertion framework was added, KVM selftests was littered with such garbage. Note, I'm not blaming developers in any way. After having to add tens of asserts on KVM ioctls just to write a simple test, it's entirely natural to become fatigued and start throwing in TEST_ASSERT_EQ(ret, 0) or TEST_ASSERT(!ret, "ioctl failed"). There's also the mechanics of requiring the caller to assert. KVM ioctls that return a single value, e.g. register accessors, then need to use an out-param to communicate the value or error code, e.g. this val = vcpu_get_reg(vcpu, reg_id); TEST_ASSERT_EQ(val, 0); would become this: ret = vcpu_get_reg(vcpu, reg_id, &val); TEST_ASSERT_EQ(ret, 0); TEST_ASSERT_EQ(val, 0); But of course, the developer would bundle that into: TEST_ASSERT(!ret && !val, "get_reg failed"); And then the user is really sad when the "!val" condition fails, because they can't even tell. Again, this't a contrived example, it literally happend to me when dealing with the guest_memfd NUMA testcase, and was what prompted me to write this syscall framework. This also shows the typical error message that a developer will write. This TEST_ASSERT() failed on me due to a misguided cleanup I made: ret = syscall(__NR_get_mempolicy, &get_policy, &get_nodemask, maxnode, mem, MPOL_F_ADDR); TEST_ASSERT(!ret && get_policy == MPOL_DEFAULT && get_nodemask == 0, "Policy should be MPOL_DEFAULT and nodes zero"); generating this error message: ==== Test Assertion Failure ==== guest_memfd_test.c:120: !ret && get_policy == MPOL_DEFAULT && get_nodemask == 0 pid=52062 tid=52062 errno=22 - Invalid argument 1 0x0000000000404113: test_mbind at guest_memfd_test.c:120 (discriminator 6) 2 (inlined by) __test_guest_memfd at guest_memfd_test.c:409 (discriminator 6) 3 0x0000000000402320: test_guest_memfd at guest_memfd_test.c:432 4 (inlined by) main at guest_memfd_test.c:529 5 0x000000000041eda3: __libc_start_call_main at libc-start.o:? 6 0x0000000000420f4c: __libc_start_main_impl at ??:? 7 0x00000000004025c0: _start at ??:? Policy should be MPOL_DEFAULT and nodes zero At first glance, it would appear that get_mempolicy() failed with -EINVAL. Nope. ret==0, but errno was left set from an earlier syscall. It took me a few minutes of digging and a run with strace to figure out that get_mempolicy() succeeded. Constrast that with: kvm_get_mempolicy(&policy, &nodemask, maxnode, mem, MPOL_F_ADDR); TEST_ASSERT(policy == MPOL_DEFAULT && !nodemask, "Wanted MPOL_DEFAULT (%u) and nodemask 0x0, got %u and 0x%lx", MPOL_DEFAULT, policy, nodemask); ==== Test Assertion Failure ==== guest_memfd_test.c:120: policy == MPOL_DEFAULT && !nodemask pid=52700 tid=52700 errno=22 - Invalid argument 1 0x0000000000404915: test_mbind at guest_memfd_test.c:120 (discriminator 6) 2 (inlined by) __test_guest_memfd at guest_memfd_test.c:407 (discriminator 6) 3 0x0000000000402320: test_guest_memfd at guest_memfd_test.c:430 4 (inlined by) main at guest_memfd_test.c:527 5 0x000000000041eda3: __libc_start_call_main at libc-start.o:? 6 0x0000000000420f4c: __libc_start_main_impl at ??:? 7 0x00000000004025c0: _start at ??:? Wanted MPOL_DEFAULT (0) and nodemask 0x0, got 1 and 0x1 Yeah, there's still some noise with errno=22, but it's fairly clear that the returned values mismatches, and super obvious that the syscall succeeded when looking at the code. This is not a cherry-picked example. There are hundreds, if not thousands, of such asserts in KVM selftests and KVM-Unit-Tests in particular. And that's when developers _aren't_ forced to manually add boilerplate asserts in ioctls succeeding. For people that are completely new to KVM selftests, I can appreciate that it might take a while to acclimate to the foo() and __foo() pattern, but I have a hard time believing that it adds significant cognitive load after you've spent a decent amount of time in KVM selftests. And I 100% want to cater to the people that are dealing with KVM selftests day in, day out. > Or perhaps it should be more explicit, like in the name, that an > assertion is made within this function? No, that's entirely inflexible, will lead to confusion, and adds a copious amount of noise. E.g. this /* emulate hypervisor clearing CR4.OSXSAVE */ vcpu_sregs_get(vcpu, &sregs); sregs.cr4 &= ~X86_CR4_OSXSAVE; vcpu_sregs_set(vcpu, &sregs); versus /* emulate hypervisor clearing CR4.OSXSAVE */ vcpu_sregs_get_assert(vcpu, &sregs); sregs.cr4 &= ~X86_CR4_OSXSAVE; vcpu_sregs_set_assert(vcpu, &sregs); The "assert" is pure noise and makes it harder to see the "get" versus "set". If we instead annotate the the "no_assert" case, then we'll end up with ambigous cases where a developer won't be able to determine if an unannotated API asserts or not, and conflict cases where a "no_assert" API _does_ assert, just not on the primary ioctl it's invoking. IMO, foo() and __foo() is quite explicit once you become accustomed to the environment. > In many cases a foo() exists without the corresponding __foo(), which > seems to be discouraging testing for error cases. That's almost always because no one has needed __foo(). > Also, I guess especially for vcpu_run(), tests would like to loop/take > different actions based on different errnos and then it gets a bit > unwieldy to have to avoid functions that have assertions within them. vcpu_run() is a special case. KVM_RUN is so much more than a normal ioctl, and so having vcpu_run() follow the "standard" pattern isn't entirely feasible. Speaking of vcpu_run(), and directly related to idea of having developers manually do TEST_ASSERT_EQ(), one of the top items on my selftests todo list is to have vcpu_run() handle GUEST_ASSERT and GUEST_PRINTF whenever possible. Having to add UCALL_PRINTF handling just to get a debug message out of a test's guest code is beyond frustrating. Ditto for the 60+ tests that had to manually add UCALL_ABORT handling, which leads to tests having code like this, which then gets copy+pasted all over the place and becomes a nightmare to maintain. static void __vcpu_run_expect(struct kvm_vcpu *vcpu, unsigned int cmd) { struct ucall uc; vcpu_run(vcpu); switch (get_ucall(vcpu, &uc)) { case UCALL_ABORT: REPORT_GUEST_ASSERT(uc); break; default: if (uc.cmd == cmd) return; TEST_FAIL("Unexpected ucall: %lu", uc.cmd); } } > I can see people forgetting to add TEST_ASSERT_EQ()s to check results of > setup/teardown functions but I think those errors would surface some > other way anyway. Heh, I don't mean to be condescending, but I highly doubt you'll have this opinion after you've had to debug a completely unfamiliar test that's failing in weird ways, for the tenth time. > Not a strongly-held opinion, As you may have noticed, I have extremely strong opinions in this area :-) > and no major concerns on the naming either. It's a selftest after all and > IIUC we're okay to have selftest interfaces change anyway? Yes, changes are fine. It's the churn I want to avoid. Oh, and here's the "answer" to the TEST_ASSERT_EQ() failure: ==== Test Assertion Failure ==== include/kvm_util.h:794: !ret pid=43866 tid=43866 errno=22 - Invalid argument 1 0x0000000000415486: vcpu_sregs_set at kvm_util.h:794 (discriminator 4) 2 (inlined by) vcpu_load_state at processor.c:1125 (discriminator 4) 3 0x0000000000402805: save_restore_vm at hyperv_evmcs.c:221 4 0x000000000040204d: main at hyperv_evmcs.c:286 5 0x000000000041dfc3: __libc_start_call_main at libc-start.o:? 6 0x000000000042016c: __libc_start_main_impl at ??:? 7 0x0000000000402220: _start at ??:? KVM_SET_SREGS failed, rc: -1 errno: 22 (Invalid argument)