Subject: Re: [PATCH v7 08/21] iommu/dma: support PCI P2PDMA pages in dma-iommu map_sg
From: Robin Murphy <robin.murphy@arm.com>
Date: Wed, 29 Jun 2022 13:07:42 +0100
To: Logan Gunthorpe, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-pci@vger.kernel.org, linux-mm@kvack.org, iommu@lists.linux-foundation.org
Cc: Minturn Dave B, Martin Oliveira, Ralph Campbell, Jason Gunthorpe, John Hubbard, Dave Hansen, Matthew Wilcox, Christian König, Chaitanya Kulkarni, Jason Ekstrand, Daniel Vetter, Bjorn Helgaas, Dan Williams, Stephen Bates, Ira Weiny, Christoph Hellwig, Xiong Jianxin
In-Reply-To: <20220615161233.17527-9-logang@deltatee.com>
References: <20220615161233.17527-1-logang@deltatee.com> <20220615161233.17527-9-logang@deltatee.com>
On 2022-06-15 17:12, Logan Gunthorpe wrote:
> When a PCI P2PDMA page is seen, set the IOVA length of the segment
> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
> apply the appropriate bus address to the segment. The IOVA is not
> created if the scatterlist only consists of P2PDMA pages.
> 
> A P2PDMA page may have three possible outcomes when being mapped:
>    1) If the data path between the two devices doesn't go through
>       the root port, then it should be mapped with a PCI bus address
>    2) If the data path goes through the host bridge, it should be mapped
>       normally with an IOMMU IOVA.
>    3) It is not possible for the two devices to communicate and thus
>       the mapping operation should fail (and it will return -EREMOTEIO).
> 
> Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to
> indicate bus address segments. On unmap, P2PDMA segments are skipped
> over when determining the start and end IOVA addresses.
> 
> With this change, the flags variable in the dma_map_ops is set to
> DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages.
> 
> Signed-off-by: Logan Gunthorpe
> Reviewed-by: Jason Gunthorpe
> ---
>   drivers/iommu/dma-iommu.c | 68 +++++++++++++++++++++++++++++++++++----
>   1 file changed, 61 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f90251572a5d..b01ca0c6a7ab 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -21,6 +21,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
>   		sg_dma_address(s) = DMA_MAPPING_ERROR;
>   		sg_dma_len(s) = 0;
>   
> +		if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {

Logically, should we not be able to use sg_is_dma_bus_address() here? I 
think it should be feasible, and simpler, to prepare the p2p segments 
up-front, such that at this point all we need to do is restore the 
original length (if even that, see below).
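i.e. assuming the map_sg path had already resolved the bus address and 
marked the segment up-front (say via the pci_p2pdma_map_segment() 
helper from patch 06), I'd imagine this hunk reducing to something like 
the following - a rough and entirely untested sketch, mind:

		if (sg_is_dma_bus_address(s)) {
			if (i > 0)
				cur = sg_next(cur);

			/*
			 * The bus address was already resolved at map
			 * time; just transfer it and the length to the
			 * output segment.
			 */
			sg_dma_unmark_bus_address(s);
			sg_dma_address(cur) = sg_dma_address(s);
			sg_dma_len(cur) = sg_dma_len(s);
			sg_dma_mark_bus_address(cur);
			count++;
			cur_len = 0;
			continue;
		}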
> +			if (i > 0)
> +				cur = sg_next(cur);
> +
> +			pci_p2pdma_map_bus_segment(s, cur);
> +			count++;
> +			cur_len = 0;
> +			continue;
> +		}
> +
>   		/*
>   		 * Now fill in the real DMA data. If...
>   		 * - there is a valid output segment to append to
> @@ -1158,6 +1169,8 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	struct iova_domain *iovad = &cookie->iovad;
>   	struct scatterlist *s, *prev = NULL;
>   	int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
> +	struct dev_pagemap *pgmap = NULL;
> +	enum pci_p2pdma_map_type map_type;
>   	dma_addr_t iova;
>   	size_t iova_len = 0;
>   	unsigned long mask = dma_get_seg_boundary(dev);
> @@ -1193,6 +1206,35 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		s_length = iova_align(iovad, s_length + s_iova_off);
>   		s->length = s_length;
>   
> +		if (is_pci_p2pdma_page(sg_page(s))) {
> +			if (sg_page(s)->pgmap != pgmap) {
> +				pgmap = sg_page(s)->pgmap;
> +				map_type = pci_p2pdma_map_type(pgmap, dev);
> +			}

There's a definite code smell here, but per above and below I think we 
*should* actually call the new helper instead of copy-pasting half of it.

> +
> +			switch (map_type) {
> +			case PCI_P2PDMA_MAP_BUS_ADDR:
> +				/*
> +				 * A zero length will be ignored by
> +				 * iommu_map_sg() and then can be detected

If that is required behaviour then it needs an explicit check in 
iommu_map_sg() to guarantee (and document) it. It's only by chance that 
__iommu_map() happens to return success for size == 0 *if* all the 
other arguments still line up, which is a far cry from a safe no-op.

However, rather than play yet more silly tricks, I think it would make 
even more sense to make iommu_map_sg() properly aware and able to skip 
direct p2p segments on its own. Once it becomes normal to pass mixed 
scatterlists around, it's only a matter of time until one ends up being 
handed to a driver which manages its own IOMMU domain, and then what?
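Roughly, I'd picture __iommu_map_sg()'s main loop growing a check along 
these lines (untested sketch, using the sg_is_dma_bus_address() flag 
from patch 04):

	while (i <= nents) {
		phys_addr_t s_phys = sg_phys(sg);

		if (len && s_phys != start + len) {
			/* End of a physically contiguous run - map it */
			ret = __iommu_map(domain, iova + mapped, start,
					len, prot, gfp);
			if (ret)
				goto out_err;

			mapped += len;
			len = 0;
		}

		/*
		 * Bus address segments carry nothing for the IOMMU to
		 * map; skip them and let the DMA layer fill them in.
		 */
		if (sg_is_dma_bus_address(sg))
			goto next;

		if (len) {
			len += sg->length;
		} else {
			len = sg->length;
			start = s_phys;
		}

next:
		if (++i < nents)
			sg = sg_next(sg);
	}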
> +				 * in __finalise_sg() to actually map the
> +				 * bus address.
> +				 */
> +				s->length = 0;
> +				continue;
> +			case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> +				/*
> +				 * Mapping through host bridge should be
> +				 * mapped with regular IOVAs, thus we
> +				 * do nothing here and continue below.
> +				 */
> +				break;
> +			default:
> +				ret = -EREMOTEIO;
> +				goto out_restore_sg;
> +			}
> +		}
> +
>   		/*
>   		 * Due to the alignment of our single IOVA allocation, we can
>   		 * depend on these assumptions about the segment boundary mask:
> @@ -1215,6 +1257,9 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		prev = s;
>   	}
>   
> +	if (!iova_len)
> +		return __finalise_sg(dev, sg, nents, 0);
> +
>   	iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
>   	if (!iova) {
>   		ret = -ENOMEM;
> @@ -1236,7 +1281,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   out_restore_sg:
>   	__invalidate_sg(sg, nents);
>   out:
> -	if (ret != -ENOMEM)
> +	if (ret != -ENOMEM && ret != -EREMOTEIO)
>   		return -EINVAL;
>   	return ret;
>   }
> @@ -1244,7 +1289,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   		int nents, enum dma_data_direction dir, unsigned long attrs)
>   {
> -	dma_addr_t start, end;
> +	dma_addr_t end, start = DMA_MAPPING_ERROR;

There are several things I don't like about this logic, I'd rather have 
"end = 0" here...

>   	struct scatterlist *tmp;
>   	int i;
>   
> @@ -1260,14 +1305,22 @@ static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
>   	/*
>   	 * The scatterlist segments are mapped into a single
>   	 * contiguous IOVA allocation, so this is incredibly easy.
>   	 */

[ This comment rather stops being true :( ]

> -	start = sg_dma_address(sg);
> -	for_each_sg(sg_next(sg), tmp, nents - 1, i) {

...then generalise the first-element special case here into a dedicated 
"walk to the first non-p2p element" loop...

> +	for_each_sg(sg, tmp, nents, i) {
> +		if (sg_is_dma_bus_address(tmp)) {
> +			sg_dma_unmark_bus_address(tmp);

[ Again I question what this actually achieves ]

> +			continue;
> +		}
>   		if (sg_dma_len(tmp) == 0)
>   			break;
> -		sg = tmp;
> +
> +		if (start == DMA_MAPPING_ERROR)
> +			start = sg_dma_address(tmp);
> +
> +		end = sg_dma_address(tmp) + sg_dma_len(tmp);
>   	}
> -	end = sg_dma_address(sg) + sg_dma_len(sg);
> -	__iommu_dma_unmap(dev, start, end - start);
> +
> +	if (start != DMA_MAPPING_ERROR)

...then "if (end)" here.

Thanks,
Robin.

> +		__iommu_dma_unmap(dev, start, end - start);
>   }
>   
>   static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
> @@ -1460,6 +1513,7 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
>   }
>   
>   static const struct dma_map_ops iommu_dma_ops = {
> +	.flags = DMA_F_PCI_P2PDMA_SUPPORTED,
>   	.alloc = iommu_dma_alloc,
>   	.free = iommu_dma_free,
>   	.alloc_pages = dma_common_alloc_pages,
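[ For concreteness, the unmap walk I'm suggesting above might end up 
looking something like this - again, only an untested sketch:

	dma_addr_t end = 0, start = 0;
	struct scatterlist *tmp;
	int i;

	/* Walk to the first non-p2p element to find the start IOVA... */
	for_each_sg(sg, tmp, nents, i) {
		if (sg_is_dma_bus_address(tmp)) {
			sg_dma_unmark_bus_address(tmp);
			continue;
		}
		if (sg_dma_len(tmp) == 0)
			break;

		start = sg_dma_address(tmp);
		break;
	}

	/* ...then track the end across the remaining IOVA segments */
	nents -= i;
	for_each_sg(tmp, tmp, nents, i) {
		if (sg_is_dma_bus_address(tmp)) {
			sg_dma_unmark_bus_address(tmp);
			continue;
		}
		if (sg_dma_len(tmp) == 0)
			break;

		end = sg_dma_address(tmp) + sg_dma_len(tmp);
	}

	if (end)
		__iommu_dma_unmap(dev, start, end - start);
]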