From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753195AbcGMK1v (ORCPT <rfc822;w@1wt.eu>);
	Wed, 13 Jul 2016 06:27:51 -0400
Received: from mx2.suse.de ([195.135.220.15]:47686 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751689AbcGMK1l (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 13 Jul 2016 06:27:41 -0400
Date: Wed, 13 Jul 2016 12:27:18 +0200
From: Joerg Roedel <jroedel@suse.de>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>, iommu@lists.linux-foundation.org,
        Vincent.Wan@amd.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 16/20 v2] iommu/amd: Optimize map_sg and unmap_sg
Message-ID: <20160713102718.GD27306@suse.de>
References: <1467978311-28322-1-git-send-email-joro@8bytes.org>
 <1467978311-28322-17-git-send-email-joro@8bytes.org>
 <5784D597.4010703@arm.com>
 <20160712133042.GG12639@8bytes.org>
 <57850DF8.9040507@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <57850DF8.9040507@arm.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 12, 2016 at 04:34:16PM +0100, Robin Murphy wrote:
> The boundary masks for block devices are tricky to track down through so
> many layers of indirection in the common frameworks, but there are a lot
> of 64K ones there. After some more impromptu digging into the subject
> I've finally satisfied my curiosity - it seems this restriction stems
> from the ATA DMA PRD table format, so it could perhaps still be a real
> concern for anyone using some crusty old PCI IDE card in their modern
> system.

The boundary-mask is a capability of the underlying PCI device, no? The
ATA or whatever-stack above should have no influence on it.
> 
> Indeed, I wasn't suggesting making more than one call, just that
> alloc_iova_fast() is quite likely to have to fall back to alloc_iova()
> here, so there may be some mileage in going directly to the latter, with
> the benefit of then being able to rely on find_iova() later (since you
> know for sure you allocated out of the tree rather than the caches). My
> hunch is that dma_map_sg() tends to be called for bulk data transfer
> (block devices, DRM, etc.) so is probably a less contended path compared
> to the network layer hammering dma_map_single().

Using different functions for allocation would also require special
handling in the queued-freeing code, as I have to track the allocation
then to know wheter I free it with the _fast variant or not.

> > +	mask          = dma_get_seg_boundary(dev);
> > +	boundary_size = mask + 1 ? ALIGN(mask + 1, PAGE_SIZE) >> PAGE_SHIFT :
> > +				   1UL << (BITS_PER_LONG - PAGE_SHIFT);
> 
> (mask >> PAGE_SHIFT) + 1 ?

Should make no difference unless some of the first PAGE_SHIFT bits of
mask is 0 (which shouldn't happen).


	Joerg