Re: kernel selftest max_guest_memory_test fails when using more that 256 vCPUs

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: mlevitsk@redhat.com
Cc: kvm@vger.kernel.org
Subject: Re: kernel selftest max_guest_memory_test fails when using more that 256 vCPUs
Date: Mon, 11 Mar 2024 17:05:18 -0700	[thread overview]
Message-ID: <Ze-cPqZDXnF-FEXj@google.com> (raw)
In-Reply-To: <f39788063fc3e63edb8ba0490ff17ed8cb6598da.camel@redhat.com>

On Mon, Mar 11, 2024, mlevitsk@redhat.com wrote:
> Hi,
> 
> Recently I debugged a failure of this selftest and this is what is happening:
> 
> For each vCPU this test runs the guest till it does the ucall, then it resets
> all the vCPU registers to their initial values (including RIP) and runs the guest again.
> I don't know if this is needed.
> 
> What happens however is that ucall code allocates the ucall struct prior to calling the host,
> and then expects the host to resume the guest, at which point the guest frees the struct.
> 
> However since the host manually resets the guest registers, the code that frees the ucall struct
> is never reached and thus the ucall struct is leaked.
> 
> Currently ucall code has a pool of KVM_MAX_VCPUS (512) objects, thus if the test is run with more
> than 256 vCPUs, the pool is exhausted and the test fails.
> 
> So either we need to:
>   - add a way to manually free the ucall struct for such tests from the host side.

Part of me wants to do something along these lines, as every GUEST_DONE() and
failed GUEST_ASSERT() is "leaking" a ucall structure.  But practically speaking,
freeing a ucall structure from anywhere except the vCPU context is bound to cause
more problems than it solves.

>   - remove the manual reset of the vCPUs register state from this test and
>   instead put the guest code in while(1) {} loop.

Definitely this one.  IIRC, the only reason I stuffed registers in the test was
because I was trying to force MMU reloads.  I can't think of any reason why a
simple infinite loop in the guest wouldn't work.  I'm pretty sure this is all
that's needed?

diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
index 6628dc4dda89..5f9950f41313 100644
--- a/tools/testing/selftests/kvm/max_guest_memory_test.c
+++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
@@ -22,10 +22,12 @@ static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 {
        uint64_t gpa;
 
-       for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
-               *((volatile uint64_t *)gpa) = gpa;
+       for (;;) {
+               for (gpa = start_gpa; gpa < end_gpa; gpa += stride)
+                       *((volatile uint64_t *)gpa) = gpa;
 
-       GUEST_DONE();
+               GUEST_DONE();
+       }
 }
 
 struct vcpu_info {
@@ -64,17 +66,12 @@ static void *vcpu_worker(void *data)
        struct kvm_vcpu *vcpu = info->vcpu;
        struct kvm_vm *vm = vcpu->vm;
        struct kvm_sregs sregs;
-       struct kvm_regs regs;
 
        vcpu_args_set(vcpu, 3, info->start_gpa, info->end_gpa, vm->page_size);
-
-       /* Snapshot regs before the first run. */
-       vcpu_regs_get(vcpu, &regs);
        rendezvous_with_boss();
 
        run_vcpu(vcpu);
        rendezvous_with_boss();
-       vcpu_regs_set(vcpu, &regs);
        vcpu_sregs_get(vcpu, &sregs);
 #ifdef __x86_64__
        /* Toggle CR0.WP to trigger a MMU context reset. */


>   - refactor the ucall code to not rely on a fixed pool of structs, making it
>   possible to tolerate small memory leaks like that (I don't like this to be
>   honest).

Heh, me neither.

next prev parent reply	other threads:[~2024-03-12  0:05 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-11 23:13 kernel selftest max_guest_memory_test fails when using more that 256 vCPUs mlevitsk
2024-03-12  0:05 ` Sean Christopherson [this message]
2024-03-12 11:37   ` Andrew Jones
2024-03-12 16:27     ` Sean Christopherson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6628dc4dda8 dfblob:5f9950f4131 )
 OR (
bs:"Re: kernel selftest max_guest_memory_test fails when using more that 256 vCPUs" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ze-cPqZDXnF-FEXj@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox