Date: Wed, 21 Sep 2016 18:14:19 -0400
From: "Emilio G. Cota"
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org, serge.fdrv@gmail.com, alex.bennee@linaro.org, sergey.fedorov@linaro.org
Subject: Re: [Qemu-devel] [PATCH 16/16] cpus-common: lock-free fast path for cpu_exec_start/end
Message-ID: <20160921221419.GA30386@flamenco>
In-Reply-To: <803ceaca-088a-99b8-1a43-821d3507fd9a@redhat.com>

On Wed, Sep 21, 2016 at 20:19:18 +0200, Paolo Bonzini wrote:
(snip)
> No, this is not true. Barriers order stores and loads within a thread
> _and_ establish synchronizes-with edges.
>
> In the example above you are violating causality:
>
> - cpu0 stores cpu->running before loading pending_cpus
>
> - because pending_cpus == 0, cpu1 stores pending_cpus = 1 after cpu0
>   loads it
>
> - cpu1 loads cpu->running after it stores pending_cpus

OK. So I simplified the example to understand this better:

    cpu0                        cpu1
    ----                        ----
    { x = y = 0; r0 and r1 are private variables }

    x = 1                       y = 1
    smp_mb()                    smp_mb()
    r0 = y                      r1 = x

Turns out this is scenario 10 here: https://lwn.net/Articles/573436/
With both barriers in place, the outcome r0 == 0 && r1 == 0 is
forbidden: at least one CPU must observe the other's store. The source
of my confusion was not paying due attention to the smp_mb()'s, which
are necessary to maintain transitivity.

> > Is there a performance (scalability) reason behind this patch?
>
> Yes: it speeds up all cpu_exec_start/end, _not_ start/end_exclusive.
>
> With this patch, as long as there are no start/end_exclusive (which are
> supposed to be rare) there is no contention on multiple CPUs doing
> cpu_exec_start/end.
>
> Without it, as CPUs increase, the global cpu_list_mutex is going to
> become a bottleneck.

I see. Scalability-wise I wouldn't expect much improvement for MTTCG
full-system emulation, given that the iothread lock is still acquired
on every CPU loop exit (just like in KVM). For user-mode, however,
this should yield measurable improvements =D

Thanks,

		E.
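
P.S. For the archive: here is the litmus test above spelled out with
C11 atomics, as a compile-checked sketch. atomic_thread_fence with
memory_order_seq_cst stands in for smp_mb(); the names are mine, this
is not QEMU code:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <assert.h>
    #include <stdio.h>

    static atomic_int x, y;     /* zero-initialized, i.e. x = y = 0 */
    static int r0, r1;          /* private results, read after join */

    static void *cpu0(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        r0 = atomic_load_explicit(&y, memory_order_relaxed);
        return NULL;
    }

    static void *cpu1(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        r1 = atomic_load_explicit(&x, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;

        pthread_create(&t0, NULL, cpu0, NULL);
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        /* with both fences, r0 == 0 && r1 == 0 is forbidden */
        assert(r0 == 1 || r1 == 1);
        printf("r0=%d r1=%d\n", r0, r1);
        return 0;
    }

Build with -pthread; the assertion should never fire. Drop either
fence and the forbidden outcome becomes reachable even on x86, since
TSO allows exactly the store-load reordering this test exercises.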
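
P.P.S. To connect this back to the patch: as I understand it,
cpu_exec_start() plays the role of cpu0 above and start_exclusive()
that of cpu1. A rough self-contained sketch of that pairing, with
stand-in globals and the slow paths elided -- illustrative only, not
the actual patch:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int pending_cpus;  /* stand-in for the global counter */
    static atomic_bool running;      /* stand-in for cpu->running */

    /* Fast path of cpu_exec_start(): publish running, then check
     * whether an exclusive section is pending. */
    static void exec_start_sketch(void)
    {
        atomic_store_explicit(&running, true, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        if (atomic_load_explicit(&pending_cpus, memory_order_relaxed)) {
            /* slow path: take the CPU list lock and wait for the
             * exclusive section to finish (elided) */
        }
    }

    /* start_exclusive(): publish pending_cpus, then check which CPUs
     * are already running and must be waited for. */
    static void start_exclusive_sketch(void)
    {
        atomic_store_explicit(&pending_cpus, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        if (atomic_load_explicit(&running, memory_order_relaxed)) {
            /* wait for that CPU to leave cpu_exec (elided) */
        }
    }

The SB guarantee is exactly what makes this safe: the two stores
cannot both go unobserved, so either the entering CPU sees
pending_cpus != 0 and takes the slow path, or the exclusive starter
sees it running and waits for it.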