From: George Dunlap
Subject: Re: [PATCHv5 3/3] p2m: convert p2m rwlock to percpu rwlock
Date: Tue, 22 Dec 2015 12:07:36 +0000
Message-ID: <56793D08.2030108@citrix.com>
In-Reply-To: <1450454920-11036-4-git-send-email-malcolm.crossley@citrix.com>
References: <1450454920-11036-1-git-send-email-malcolm.crossley@citrix.com>
 <1450454920-11036-4-git-send-email-malcolm.crossley@citrix.com>
To: Malcolm Crossley , JBeulich@suse.com, ian.campbell@citrix.com,
 andrew.cooper3@citrix.com, Marcos.Matsunaga@oracle.com, keir@xen.org,
 konrad.wilk@oracle.com, george.dunlap@eu.citrix.com
Cc: xen-devel@lists.xenproject.org, dario.faggioli@citrix.com,
 stefano.stabellini@citrix.com

On 18/12/15 16:08, Malcolm Crossley wrote:
> The per-domain p2m read lock suffers from significant contention when
> performing multi-queue block or network IO, due to the parallel
> grant map/unmap/copy operations occurring on the DomU's p2m.
>
> On multi-socket systems, the contention results in the locked
> compare-and-swap operation failing frequently, which leads to a tight
> loop of retries of the compare-and-swap operation. As the coherency
> fabric can only support a specific rate of compare-and-swap operations
> for a particular data location, taking the read lock itself becomes a
> bottleneck for p2m operations.
>
> Percpu rwlock p2m performance with the same configuration is approximately
> 64 Gbit/s, versus 48 Gbit/s with grant table percpu rwlocks only.
>
> Oprofile was used to determine the initial overhead of the read-write locks
> and to confirm the overhead was dramatically reduced by the percpu rwlocks.
>
> Note: altp2m users will not achieve a gain if they take an altp2m read lock
> simultaneously with the main p2m lock.
>
> Signed-off-by: Malcolm Crossley

Looks good, thanks:

Reviewed-by: George Dunlap

If you end up switching to always using the per-cpu pointer stored in the
percpu_rwlock struct, you can retain the Reviewed-by for those changes.

 -George
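
[Editor's note: for readers unfamiliar with the technique under review, the
sketch below illustrates the general per-CPU reader-marking idea that avoids
the shared compare-and-swap on the read path. It is not Xen's percpu_rwlock
implementation; the names (pcpu_rwlock_t, MAX_READERS, the slot parameter)
and the thread-slot indexing are illustrative assumptions only, and the real
code uses per-CPU data and falls back to a conventional rwlock while a
writer is active.]

/*
 * Minimal sketch of a per-CPU-style read/write lock: readers flag a
 * private, cache-line-padded slot instead of doing a compare-and-swap
 * on one shared word, so the read fast path does not bounce a single
 * cache line between sockets.  Illustrative only.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

#define MAX_READERS 64  /* illustrative cap on concurrent reader slots */

/* One flag per reader slot, padded to its own cache line. */
typedef struct {
    _Alignas(64) atomic_bool busy;
} reader_slot_t;

typedef struct {
    reader_slot_t reader[MAX_READERS];
    atomic_bool writer_pending;  /* writers publish here to stop new readers */
} pcpu_rwlock_t;

static void read_lock(pcpu_rwlock_t *l, unsigned int slot)
{
    for ( ;; )
    {
        /* Fast path: flag our private slot; no shared compare-and-swap,
         * hence no cross-socket cache-line contention. */
        atomic_store(&l->reader[slot].busy, true);
        if ( !atomic_load(&l->writer_pending) )
            return;

        /* A writer is pending: withdraw and wait for it to finish. */
        atomic_store(&l->reader[slot].busy, false);
        while ( atomic_load(&l->writer_pending) )
            sched_yield();
    }
}

static void read_unlock(pcpu_rwlock_t *l, unsigned int slot)
{
    atomic_store(&l->reader[slot].busy, false);
}

static void write_lock(pcpu_rwlock_t *l)
{
    bool expected = false;

    /* Writers are rare: serialise them on the shared flag... */
    while ( !atomic_compare_exchange_weak(&l->writer_pending, &expected, true) )
    {
        expected = false;
        sched_yield();
    }

    /* ...then wait for every in-flight reader to drain. */
    for ( unsigned int i = 0; i < MAX_READERS; i++ )
        while ( atomic_load(&l->reader[i].busy) )
            sched_yield();
}

static void write_unlock(pcpu_rwlock_t *l)
{
    atomic_store(&l->writer_pending, false);
}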