From: "Paul E. McKenney"
Subject: Re: [PATCH 4/6] kvm tools: Add rwlock wrapper
Date: Sun, 29 May 2011 20:07:26 -0700
Message-ID: <20110530030725.GM2668@linux.vnet.ibm.com>
References: <20110528183259.GA15019@elte.hu> <4DE1EA93.6040401@redhat.com>
 <20110529073550.GA21254@elte.hu> <4DE1FBA5.6080905@redhat.com>
 <20110529123755.GC26627@elte.hu> <4DE2409D.1050701@redhat.com>
 <20110529142747.GA15441@elte.hu> <4DE25F70.5080700@redhat.com>
 <20110529153854.GH2668@linux.vnet.ibm.com> <20110529193327.GE9835@elte.hu>
Reply-To: paulmck@linux.vnet.ibm.com
To: Ingo Molnar
Cc: Avi Kivity, Mathieu Desnoyers, Pekka Enberg, Sasha Levin,
 john@jfloren.net, kvm@vger.kernel.org, asias.hejun@gmail.com,
 gorcunov@gmail.com, prasadjoshi124@gmail.com
In-Reply-To: <20110529193327.GE9835@elte.hu>

On Sun, May 29, 2011 at 09:33:27PM +0200, Ingo Molnar wrote:
>
> * Paul E.
McKenney wrote:
>
> > On Sun, May 29, 2011 at 06:00:00PM +0300, Avi Kivity wrote:
> > > On 05/29/2011 05:27 PM, Ingo Molnar wrote:
> > > >* Avi Kivity wrote:
> > > >
> > > >> I don't understand how you expect per_cpu to work in userspace. As
> > > >> soon as you calculate the per-cpu address, it can be invalidated.
> > > >> It doesn't help that you get a signal; you've already calculated
> > > >> the address.
> > > >
> > > >I was thinking of some sort of transactional mechanism, a tightly
> > > >controlled set of assembly instructions updating the percpu fields,
> > > >where the migration event would be able to 'unroll' incomplete
> > > >modifications done to the 'wrong' percpu data structure. (It would be
> > > >rather complex and every percpu op would have to be an atomic because
> > > >there's always the chance that it's executed on the wrong CPU.)
> > > >
> > > >But note that we do not even need any notification if there's a
> > > >(local) lock on the percpu fields:
> > > >
> > > >It will work because it's statistically percpu: the lock will not
> > > >SMP-bounce between CPUs generally, so it will be very fast to
> > > >acquire/release it, and we get the full cache benefits of percpu
> > > >variables.
> > > >
> > > >The migration notification would still be useful to detect grace
> > > >periods at natural points - but as Paul pointed out polling it via
> > > >SIGALRM works as well. The two (migration and SIGALRM) could be
> > > >combined as well.
> > >
> > > I think it's way simpler to map cpu == thread. And in fact, when
> > > you run a Linux kernel in a kvm guest, that's what happens, since
> > > each vcpu _is_ a host thread.
> >
> > I have to agree with Avi here.  If a stop_machine()-like approach is
> > going to work, the updates have to be very rare, so any additional
> > cache-nonlocality from having lots of threads should not be a problem.
> > Especially given that in this particular case, there are exactly as
> > many CPUs as threads anyway.
> > The readers should only need to touch a
> > constant number of cache lines either way.
> >
> > Or am I missing something here?
>
> I was talking about the (still imaginary!) user-space tree-RCU code!
> :-)

Ah!  I did miss a turn in the discussion, then.  ;-)

My initial thought is to start with CPU==thread, with the CPU online and
offline operations used at thread creation and destruction time.  But
longer term, you are right that there would be cache-locality benefits
to splitting.  Perhaps more important, user-space testing could then
achieve much better coverage of the various race conditions.

> The stop_machine_run()-alike thing is for brlocks - for which Sasha
> sent patches already, see these patches on the kvm@vger.kernel.org
> list:
>
>   [PATCH 3/4] kvm tools: Add a brlock
>   [PATCH 4/4] kvm tools: Use brlock in MMIO and IOPORT
>
> Wrt. brlocks, we'll keep them as simple as possible and indeed no
> involved tricks are needed AFAICS. read_lock() will be a compiler
> barrier(), that's as fast as it gets :-)

Makes sense!

The other debugging use for the read-side primitives is to execute the
read-side ready-to-respond-to-kvm-pause code.  This can help catch bugs
where the developer put the br_read_lock() and the br_read_unlock() in
the wrong place.

							Thanx, Paul