From: George Dunlap
Subject: Re: [PATCHv5 3/3] p2m: convert p2m rwlock to percpu rwlock
Date: Tue, 22 Dec 2015 12:07:36 +0000
Message-ID: <56793D08.2030108@citrix.com>
In-Reply-To: <1450454920-11036-4-git-send-email-malcolm.crossley@citrix.com>
References: <1450454920-11036-1-git-send-email-malcolm.crossley@citrix.com>
 <1450454920-11036-4-git-send-email-malcolm.crossley@citrix.com>
To: Malcolm Crossley , JBeulich@suse.com, ian.campbell@citrix.com,
 andrew.cooper3@citrix.com, Marcos.Matsunaga@oracle.com, keir@xen.org,
 konrad.wilk@oracle.com, george.dunlap@eu.citrix.com
Cc: xen-devel@lists.xenproject.org, dario.faggioli@citrix.com,
 stefano.stabellini@citrix.com

On 18/12/15 16:08, Malcolm Crossley wrote:
> The per-domain p2m read lock suffers from significant contention when
> performing multi-queue block or network IO, due to the parallel
> grant map/unmap/copy operations occurring on the DomU's p2m.
>
> On multi-socket systems, the contention results in the locked
> compare-and-swap operation failing frequently, which leads to a tight
> loop of retries of the compare-and-swap operation. As the coherency
> fabric can only support a specific rate of compare-and-swap operations
> for a particular data location, taking the read lock itself becomes a
> bottleneck for p2m operations.
>
> Percpu rwlock p2m performance with the same configuration is approximately
> 64 Gbit/s, versus 48 Gbit/s with grant table percpu rwlocks only.
>
> Oprofile was used to determine the initial overhead of the read-write locks
> and to confirm the overhead was dramatically reduced by the percpu rwlocks.
>
> Note: altp2m users will not achieve a gain if they take an altp2m read lock
> simultaneously with the main p2m lock.
>
> Signed-off-by: Malcolm Crossley

Looks good, thanks:

Reviewed-by: George Dunlap

If you end up switching to always using the per-cpu pointer stored in the
percpu_rwlock struct, you can retain the Reviewed-by for those changes.

 -George
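
[Editor's note: for readers unfamiliar with the technique under review, the
sketch below illustrates the general per-CPU reader-marking idea that avoids
the shared compare-and-swap on the read path. It is not Xen's percpu_rwlock
implementation; the names (pcpu_rwlock_t, MAX_READERS, the slot parameter)
and the thread-slot indexing are illustrative assumptions only, and the real
code uses per-CPU data and falls back to a conventional rwlock while a
writer is active.]

/*
 * Minimal sketch of a per-CPU-style read/write lock: readers flag a
 * private, cache-line-padded slot instead of doing a compare-and-swap
 * on one shared word, so the read fast path does not bounce a single
 * cache line between sockets.  Illustrative only.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

#define MAX_READERS 64  /* illustrative cap on concurrent reader slots */

/* One flag per reader slot, padded to its own cache line. */
typedef struct {
    _Alignas(64) atomic_bool busy;
} reader_slot_t;

typedef struct {
    reader_slot_t reader[MAX_READERS];
    atomic_bool writer_pending;  /* writers publish here to stop new readers */
} pcpu_rwlock_t;

static void read_lock(pcpu_rwlock_t *l, unsigned int slot)
{
    for ( ;; )
    {
        /* Fast path: flag our private slot; no shared compare-and-swap,
         * hence no cross-socket cache-line contention. */
        atomic_store(&l->reader[slot].busy, true);
        if ( !atomic_load(&l->writer_pending) )
            return;

        /* A writer is pending: withdraw and wait for it to finish. */
        atomic_store(&l->reader[slot].busy, false);
        while ( atomic_load(&l->writer_pending) )
            sched_yield();
    }
}

static void read_unlock(pcpu_rwlock_t *l, unsigned int slot)
{
    atomic_store(&l->reader[slot].busy, false);
}

static void write_lock(pcpu_rwlock_t *l)
{
    bool expected = false;

    /* Writers are rare: serialise them on the shared flag... */
    while ( !atomic_compare_exchange_weak(&l->writer_pending, &expected, true) )
    {
        expected = false;
        sched_yield();
    }

    /* ...then wait for every in-flight reader to drain. */
    for ( unsigned int i = 0; i < MAX_READERS; i++ )
        while ( atomic_load(&l->reader[i].busy) )
            sched_yield();
}

static void write_unlock(pcpu_rwlock_t *l)
{
    atomic_store(&l->writer_pending, false);
}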