From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sowmini Varadhan
Date: Wed, 01 Apr 2015 01:08:18 +0000
Subject: Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator
Message-Id: <551B4502.1020603@oracle.com>
List-Id:
References: <20150331180642.GA13314@oracle.com> <1427850091.20500.150.camel@kernel.crashing.org>
In-Reply-To: <1427850091.20500.150.camel@kernel.crashing.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Benjamin Herrenschmidt
Cc: aik@au1.ibm.com, anton@au1.ibm.com, paulus@samba.org, sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, davem@davemloft.net

On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote:
>> Having bravely said that..
>>
>> the IB team informs me that they see a 10% degradation using
>> the spin_lock as opposed to the trylock.
>>
>> one path going forward is to continue processing this patch-set
>> as is. I can investigate this further, and later revise the spin_lock
>> to the trylock, after we are certain that it is good/necessary.
>
> Have they tried using more pools instead ?

we just tried 32 instead of 16, no change to perf.

Looks like their current bottleneck is the find_next_zero_bit
(they can get a 2X perf improvement with the lock fragmentation,
but are then hitting a new ceiling, even with the trylock version)

I'm starting to wonder if some approximation of dma premapped
buffers may be needed. Doing a map/unmap on each packet is expensive.
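
For readers following the spin_lock vs. trylock discussion, here is a
minimal user-space sketch in Python of the two locking strategies over a
set of pools. It is not the code from the patch set: the names Pool,
alloc_blocking and alloc_trylock are hypothetical, threading.Lock stands
in for the kernel spinlock, and a plain list stands in for the bitmap
that find_next_zero_bit scans. The assumed trylock behaviour (skip a
pool whose lock is busy, then fall back to blocking) is one plausible
reading of "the trylock version", not a confirmed description of it.

```python
import threading


class Pool:
    """One slice of the map with its own lock (hypothetical model)."""

    def __init__(self, nbits):
        self.lock = threading.Lock()
        self.bits = [False] * nbits  # stand-in for the kernel bitmap
        self.hint = 0                # where the next search starts

    def find_next_zero_bit(self, start):
        # Linear scan; returns len(self.bits) when nothing is free
        # at or after `start`.
        for i in range(start, len(self.bits)):
            if not self.bits[i]:
                return i
        return len(self.bits)

    def alloc_locked(self):
        # Caller must hold self.lock.
        i = self.find_next_zero_bit(self.hint)
        if i == len(self.bits):
            i = self.find_next_zero_bit(0)  # wrap around once
        if i == len(self.bits):
            return None                     # pool is full
        self.bits[i] = True
        self.hint = i + 1
        return i


def alloc_blocking(pools, start):
    """spin_lock style: wait for each pool's lock in turn."""
    n = len(pools)
    for k in range(n):
        p = (start + k) % n
        with pools[p].lock:
            idx = pools[p].alloc_locked()
        if idx is not None:
            return (p, idx)
    return None


def alloc_trylock(pools, start):
    """trylock style: skip busy pools, then fall back to blocking."""
    n = len(pools)
    for k in range(n):
        p = (start + k) % n
        if not pools[p].lock.acquire(blocking=False):
            continue  # lock is contended, try the next pool
        try:
            idx = pools[p].alloc_locked()
        finally:
            pools[p].lock.release()
        if idx is not None:
            return (p, idx)
    return alloc_blocking(pools, start)
```

In both variants the linear scan runs while the pool lock is held, so
adding pools or switching to trylock spreads lock contention but does
not shorten the scan itself.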