Re: [PATCH 3/4] KVM: selftests: Wait for all vCPU to be created before entering guest mode

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, Andrew Jones <drjones@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	Yanan Wang <wangyanan55@huawei.com>, Peter Xu <peterx@redhat.com>,
	Aaron Lewis <aaronlewis@google.com>
Subject: Re: [PATCH 3/4] KVM: selftests: Wait for all vCPU to be created before entering guest mode
Date: Fri, 12 Nov 2021 16:32:03 +0000	[thread overview]
Message-ID: <YY6XA4U1hRcbaG32@google.com> (raw)
In-Reply-To: <CALzav=fxuU9_jP7q3=qm5LYXbVkR-qUORR=EeiiOqa_GneZVdQ@mail.gmail.com>

On Thu, Nov 11, 2021, David Matlack wrote:
> On Thu, Nov 11, 2021 at 4:51 PM David Matlack <dmatlack@google.com> wrote:
> > > > @@ -196,6 +202,17 @@ static void *vcpu_thread_main(void *data)
> > > >  {
> > > >         struct vcpu_thread *vcpu = data;
> > > >
> > > > +       WRITE_ONCE(vcpu->running, true);
> > > > +
> > > > +       /*
> > > > +        * Wait for all vCPU threads to be up and running before calling the test-
> > > > +        * provided vCPU thread function. This prevents thread creation (which
> > > > +        * requires taking the mmap_sem in write mode) from interfering with the
> > > > +        * guest faulting in its memory.
> > > > +        */
> > > > +       while (!READ_ONCE(all_vcpu_threads_running))

This is way more convoluted than it needs to be:

	atomic_inc(&perf_test_args.nr_running_vcpus);

	while (atomic_read(&perf_test_args.nr_running_vcpus) < perf_test_args.nr_vcpus)
		cpu_relax();

diff --git a/tools/testing/selftests/kvm/include/perf_test_util.h b/tools/testing/selftests/kvm/include/perf_test_util.h
index df9f1a3a3ffb..ce9039ec8c18 100644
--- a/tools/testing/selftests/kvm/include/perf_test_util.h
+++ b/tools/testing/selftests/kvm/include/perf_test_util.h
@@ -30,6 +30,8 @@ struct perf_test_args {
        uint64_t host_page_size;
        uint64_t guest_page_size;
        int wr_fract;
+       int nr_vcpus;
+       atomic_t nr_running_vcpus;
 
        struct perf_test_vcpu_args vcpu_args[KVM_MAX_VCPUS];
 };


Alternatively it could be a countdown to zero so that only the atomic_t is needed.


> > > I can never remember the rules on this so I could be wrong, but you
> > > may want a cpu_relax() in that loop to prevent it from being optimized
> > > out. Maybe the READ_ONCE is sufficient though.
> >
> > READ_ONCE is sufficient to prevent the loop from being optimized out
> > but cpu_relax() is nice to have to play nice with our hyperthread
> > buddy.
> >
> > On that note there are a lot of spin waits in the KVM selftests and
> > none of the ones I've seen use cpu_relax().
> >
> > I'll take a look at adding cpu_relax() throughout the selftests in v2.
> >
> > >
> > > >         vcpu_thread_fn(&perf_test_args.vcpu_args[vcpu->vcpu_id]);
> > > >
> > > >         return NULL;
> > > > @@ -206,14 +223,23 @@ void perf_test_start_vcpu_threads(int vcpus, void (*vcpu_fn)(struct perf_test_vc
> > > >         int vcpu_id;
> > > >
> > > >         vcpu_thread_fn = vcpu_fn;
> > > > +       WRITE_ONCE(all_vcpu_threads_running, false);
> > > >
> > > >         for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
> > > >                 struct vcpu_thread *vcpu = &vcpu_threads[vcpu_id];
> > > >
> > > >                 vcpu->vcpu_id = vcpu_id;
> > > > +               WRITE_ONCE(vcpu->running, false);
> > >
> > > Do these need to be WRITE_ONCE? I don't think WRITE_ONCE provides any
> > > extra memory ordering guarantees and I don't know why the compiler
> > > would optimize these out. If they do need to be WRITE_ONCE, they
> > > probably merit comments.
> >
> > To be completely honest I'm not sure. I included WRITE_ONCE out of
> > caution to ensure the compiler does not reorder the writes with
> > respect to the READ_ONCE. I'll need to do a bit more research to
> > confirm if it's really necessary.
> 
> FWIW removing WRITE_ONCE and bumping the optimization level up to O3
> did not cause any problems. But this is no proof of course.
> 
> This quote from memory-barries.txt makes me think it'd be prudent to
> keep the WRITE_ONCE:
> 
>      You should assume that the compiler can move READ_ONCE() and
>      WRITE_ONCE() past code not containing READ_ONCE(), WRITE_ONCE(),
>      barrier(), or similar primitives.
> 
> So, for example, the compiler could potentially re-order READ_ONCE
> loop below after the write to all_vcpu_threads_running if we did not
> include WRITE_ONCE?

Practically speaking, no.  pthread_create() undoubtedly has barriers galore, so
unless @vcpus==0, all_vcpu_threads_running won't get re-ordered.

As noted above, READ_ONCE/WRITE_ONCE do provide _some_ guarantees about memory
ordering with respect to the _compiler_.  Notably, the compiler is allowed to
reorder non-ONCE loads/stores around {READ,WRITE}_ONCE.  And emphasis on "compiler".
On x86, that's effectively the same as memory ordering in hardware, because ignoring
WC memory, x86 is strongy ordered.  But arm64 and others are weakly ordered, in
which case {READ,WRITE}_ONCE do not provide any guarantees about how loads/stores
will complete when run on the CPU.

This is a bad example because there's not really a race to be had, e.g. aside from
pthread_create(), there's also the fact that all_vcpu_threads_running=false is a
likely nop since it's zero initialized.  

Here's a contrived example:

static void *vcpu_thread_main(void *data)
{
	while (!READ_ONCE(test_stage))
		cpu_relax();

	READ_ONCE(test_fn)();
}


int main(...)
{
	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++)
		pthread_create(&vcpu->thread, NULL, vcpu_thread_main, vcpu);

	test_fn = do_work;

	WRITE_ONCE(test_stage, 1);
}

On any architecture, the thread could observe a NULL test_fn because it's allowed
to reorder the store to test_fn around the store to test_stage.  Making it

	WRITE_ONCE(test_fn, do_work);
	WRITE_ONCE(test_stage, 1);

would solve the problem on x86 as it would effectively force the compiler to emit:

	test_stage = 0
	barrier() // because of pthread_create()
	test_fn    = do_work
	test_stage = 1

but on arm64 and others, hardware can complete the second store to test_stage
_before_ the store to test_fn, and so the threads could observe a NULL test_fn
even though the compiler was forced to generate a specific order of operations.

To ensure something like the above works on weakly-ordered architctures, the code
would should instead be something like:

static void *vcpu_thread_main(void *data)
{
        while (!READ_ONCE(test_stage))
                cpu_relax();

	smp_rmb();
        test_fn();
}

int main(...)
{
        for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++)
                pthread_create(&vcpu->thread, NULL, vcpu_thread_main, vcpu);

        test_fn = do_work;
	smp_wmb();
        test_stage = 1;
}

Final note, the READ_ONCE() in the while loop is still necessary because without
that the compiler would be allowed to assume that test_stage can't be changed
in that loop, e.g. on x86 could generate the following hang the task:

	mov [test_stage], %rax
1:	test %rax, %rax
	jnz 2f
	pause
	jmp 1b
2:

next prev parent reply	other threads:[~2021-11-12 16:32 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-11  0:12 [PATCH 0/4] KVM: selftests: Avoid mmap_sem contention during memory population David Matlack
2021-11-11  0:12 ` [PATCH 1/4] KVM: selftests: Start at iteration 0 instead of -1 David Matlack
2021-11-11  0:12 ` [PATCH 2/4] KVM: selftests: Move vCPU thread creation and joining to common helpers David Matlack
2021-11-11 18:08   ` Ben Gardon
2021-11-12  0:47     ` David Matlack
2021-11-11  0:12 ` [PATCH 3/4] KVM: selftests: Wait for all vCPU to be created before entering guest mode David Matlack
2021-11-11 18:17   ` Ben Gardon
2021-11-12  0:51     ` David Matlack
2021-11-12  1:46       ` David Matlack
2021-11-12 16:32         ` Sean Christopherson [this message]
2021-11-11  0:12 ` [PATCH 4/4] KVM: selftests: Use perf_test_destroy_vm in memslot_modification_stress_test David Matlack
2021-11-11 18:18   ` Ben Gardon
2021-11-16 11:13 ` [PATCH 0/4] KVM: selftests: Avoid mmap_sem contention during memory population Paolo Bonzini
2021-11-17  0:21   ` David Matlack

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:df9f1a3a3ff dfblob:ce9039ec8c1 )
 OR (
bs:"Re: [PATCH 3/4] KVM: selftests: Wait for all vCPU to be created before entering guest mode" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YY6XA4U1hRcbaG32@google.com \
    --to=seanjc@google.com \
    --cc=aaronlewis@google.com \
    --cc=bgardon@google.com \
    --cc=dmatlack@google.com \
    --cc=drjones@redhat.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.