From: Paolo Bonzini
Message-ID: <803ceaca-088a-99b8-1a43-821d3507fd9a@redhat.com>
Date: Wed, 21 Sep 2016 20:19:18 +0200
In-Reply-To: <20160921172444.GF13385@flamenco>
References: <1474289459-15242-1-git-send-email-pbonzini@redhat.com>
 <1474289459-15242-17-git-send-email-pbonzini@redhat.com>
 <20160921172444.GF13385@flamenco>
Subject: Re: [Qemu-devel] [PATCH 16/16] cpus-common: lock-free fast path for cpu_exec_start/end
To: "Emilio G. Cota"
Cc: qemu-devel@nongnu.org, serge.fdrv@gmail.com, alex.bennee@linaro.org,
 sergey.fedorov@linaro.org

On 21/09/2016 19:24, Emilio G. Cota wrote:
> On Mon, Sep 19, 2016 at 14:50:59 +0200, Paolo Bonzini wrote:
>> Set cpu->running without taking the cpu_list lock, only look at it if
>> there is a concurrent exclusive section.  This requires adding a new
>> field to CPUState, which records whether a running CPU is being counted
>> in pending_cpus.  When an exclusive section is started concurrently with
>> cpu_exec_start, cpu_exec_start can use the new field to wait for the end
>> of the exclusive section.
>>
>> This is a separate patch for easier bisection of issues.
>>
>> Signed-off-by: Paolo Bonzini
>> ---
>>  cpus-common.c              | 73 ++++++++++++++++++++++++++++++++++------
>>  docs/tcg-exclusive.promela | 53 ++++++++++++++++++++++++++++++++--
>>  include/qom/cpu.h          |  5 ++--
>>  3 files changed, 117 insertions(+), 14 deletions(-)
>>
>> diff --git a/cpus-common.c b/cpus-common.c
>> index f7ad534..46cf8ef 100644
>> --- a/cpus-common.c
>> +++ b/cpus-common.c
>> @@ -184,8 +184,12 @@ void start_exclusive(void)
>>
>>      /* Make all other cpus stop executing.  */
>>      pending_cpus = 1;
>> +
>> +    /* Write pending_cpus before reading other_cpu->running.  */
>> +    smp_mb();
>>      CPU_FOREACH(other_cpu) {
>>          if (other_cpu->running) {
>> +            other_cpu->has_waiter = true;
>>              pending_cpus++;
>>              qemu_cpu_kick(other_cpu);
>>          }
>> @@ -212,24 +216,75 @@ void end_exclusive(void)
>>  /* Wait for exclusive ops to finish, and begin cpu execution.  */
>>  void cpu_exec_start(CPUState *cpu)
>>  {
>> -    qemu_mutex_lock(&qemu_cpu_list_mutex);
>> -    exclusive_idle();
>>      cpu->running = true;
>> -    qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> +
>> +    /* Write cpu->running before reading pending_cpus.  */
>> +    smp_mb();
>> +
>> +    /* 1. start_exclusive saw cpu->running == true and pending_cpus >= 1.
>> +     * After taking the lock we'll see cpu->has_waiter == true and run---not
>> +     * for long because start_exclusive kicked us.  cpu_exec_end will
>> +     * decrement pending_cpus and signal the waiter.
>> +     *
>> +     * 2. start_exclusive saw cpu->running == false but pending_cpus >= 1.
>> +     * This includes the case when an exclusive item is running now.
>> +     * Then we'll see cpu->has_waiter == false and wait for the item to
>> +     * complete.
>> +     *
>> +     * 3. pending_cpus == 0.  Then start_exclusive is definitely going to
>> +     * see cpu->running == true, and it will kick the CPU.
>> +     */
>> +    if (pending_cpus) {
>> +        qemu_mutex_lock(&qemu_cpu_list_mutex);
>> +        if (!cpu->has_waiter) {
>> +            /* Not counted in pending_cpus, let the exclusive item
>> +             * run.  Since we have the lock, set cpu->running to true
>> +             * while holding it instead of retrying.
>> +             */
>> +            cpu->running = false;
>> +            exclusive_idle();
>> +            /* Now pending_cpus is zero.  */
>> +            cpu->running = true;
>> +        } else {
>> +            /* Counted in pending_cpus, go ahead.  */
>> +        }
>> +        qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> +    }
>
> wrt scenario (3): I don't think other threads will always see
> cpu->running == true.  Consider the following:
>
>   cpu0                                  cpu1
>   ----                                  ----
>
>   cpu->running = true;                  pending_cpus = 1;
>   smp_mb();                             smp_mb();
>   if (pending_cpus) { /* false */ }     CPU_FOREACH(other_cpu) {
>                                             if (other_cpu->running) { /* false */ }
>                                         }
>
> The barriers here don't guarantee that changes are immediately visible
> to others (for that we need strong ops, i.e. atomics).

No, this is not true.  Barriers order stores and loads within a thread
_and_ establish synchronizes-with edges.  The example above violates
causality:

- cpu0 stores cpu->running before loading pending_cpus

- because pending_cpus == 0, cpu1 stores pending_cpus = 1 after cpu0
  loads it

- cpu1 loads cpu->running after it stores pending_cpus

hence the only valid ordering is

	cpu->running = true        (cpu0 store)
	if (pending_cpus)          (cpu0 load)
	pending_cpus = 1           (cpu1 store)
	if (other_cpu->running)    (cpu1 load)

> Is there a performance (scalability) reason behind this patch?

Yes: it speeds up all cpu_exec_start/end, _not_ start/end_exclusive.

With this patch, as long as there are no start/end_exclusive calls
(which are supposed to be rare) there is no contention between multiple
CPUs doing cpu_exec_start/end.  Without it, as the number of CPUs grows,
the global cpu_list_mutex is going to become a bottleneck.

Paolo