From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.1 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7FA03C10F13 for ; Tue, 16 Apr 2019 10:49:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 43E942075B for ; Tue, 16 Apr 2019 10:49:10 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="MXmROrMJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726946AbfDPKtJ (ORCPT ); Tue, 16 Apr 2019 06:49:09 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:19854 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726241AbfDPKtJ (ORCPT ); Tue, 16 Apr 2019 06:49:09 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Tue, 16 Apr 2019 03:49:12 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Tue, 16 Apr 2019 03:49:07 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Tue, 16 Apr 2019 03:49:07 -0700 Received: from [10.24.45.163] (10.124.1.5) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Tue, 16 Apr 2019 10:49:03 +0000 Subject: Re: [PATCH V2] PCI: tegra: Use the DMA-API to get the MSI address To: Robin Murphy , Lorenzo Pieralisi CC: , , , , , , , , , References: <1553004121-24606-1-git-send-email-vidyas@nvidia.com> <20190411094024.GC6227@red-moon> X-Nvconfidentiality: public From: Vidya Sagar Message-ID: <7d9ecd44-36a7-befb-621f-1e30ef8c69bd@nvidia.com> Date: Tue, 16 Apr 2019 16:18:46 +0530 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: X-Originating-IP: [10.124.1.5] X-ClientProxiedBy: HQMAIL108.nvidia.com (172.18.146.13) To HQMAIL101.nvidia.com (172.20.187.10) Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1555411752; bh=ONbEL5AFFXMOuTEoetcSzlVZs0Drbxqhe8FSI7EiSD0=; h=X-PGP-Universal:Subject:To:CC:References:X-Nvconfidentiality:From: Message-ID:Date:User-Agent:MIME-Version:In-Reply-To: X-Originating-IP:X-ClientProxiedBy:Content-Type:Content-Language: Content-Transfer-Encoding; b=MXmROrMJS/0cauPR5IqSXYRn37dJyFNpQcMPqwijsvrK+A2Pa/pRDo44NU8O/hGK4 I8yamL3/A4L1b2PqP3FmcmioMAPf5nAfzqFc01nl7fFttICrFWX2EhMmn5pKnlBCRB ENo1MFhXxtbmjCL4qeXUrFDf7PnLRODGKMS+pWPA6JF2nIxJub304voTQ4M001jQ+R yMECaNYNq/qe5r1gFZXVHphJqlUezL/r0M2scafJ8uUC/aTDqZSN6kMdAhVEsVq9tL L78TENTVgappAxn8Ydh2AckopXcbn3B2W/Q17i/bOdMWjyJ7vlN3meb8Z9QydKmHYL eaqYPjb/FBbZA== Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org On 4/12/2019 8:30 PM, Robin Murphy wrote: > On 11/04/2019 10:40, Lorenzo Pieralisi wrote: >> [+Robin] >> >> I would like Robin to have a look before merging this patch so >> that we agree that's the expected approach. >=20 > It's a bit crazy, but I guess it's not really any worse than the existing= implementation. As long the comments and commit message clearly document t= hat this is just trickery to reserve a 32-bit DMA address (which AFAICS the= y more or less do) I don't think I have a significant objection. >=20 > One suggestion I would make is using dma_alloc_attrs() with DMA_ATTR_NO_K= ERNEL_MAPPING, to make it clear that this is being set aside for 'special' = device purposes and is not intended to be accessed as a regular buffer (plu= s I believe this is non-coherent, so that should also skip the small overhe= ad of creating a non-cacheable remap). Thanks Robin for your inputs. I'll take care of changing API from dma_alloc= _coherent()/dma_free_coherent() to dma_alloc_attrs()/dma_free_attrs() in V3= patch. >=20 > Ultimately, it might make sense to have an API in the PCI core which can = look at memblock/bridge windows/etc. to find a suitable non-overlapping add= ress for this kind of root-complex-integrated doorbell without having to su= bvert the page allocator, as it seems to be a fairly common setup in 'embed= ded' IPs. >=20 > Robin. >=20 >> >> Lorenzo >> >> On Tue, Mar 19, 2019 at 07:32:01PM +0530, Vidya Sagar wrote: >>> Since the upstream MSI memory writes are generated by downstream device= s, >>> it is logically correct to have MSI target memory coming from the DMA p= ool >>> reserved for PCIe than from the general memory pool reserved for CPU >>> access. This avoids PCIe DMA addresses coinciding with MSI target addre= ss >>> thereby raising unwanted MSI interrupts. This patch also enforces to li= mit >>> the MSI target address to 32-bits to make it work for PCIe endponits th= at >>> support only 32-bit MSI target address and those endpoints that support >>> 64-bit MSI target address anyway work with 32-bit MSI target address. >>> >>> Signed-off-by: Vidya Sagar >>> Reviewed-by: Thierry Reding >>> Acked-by: Thierry Reding >>> --- >>> v2: >>> * changed 'phys' type to 'dma_addr_t' from 'u64' >>> * added a comment on why DMA mask is set to 32-bit >>> * replaced 'dma' with 'DMA' >>> >>> =C2=A0 drivers/pci/controller/pci-tegra.c | 35 ++++++++++++++++++++++++= ++--------- >>> =C2=A0 1 file changed, 26 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/pci/controller/pci-tegra.c b/drivers/pci/controlle= r/pci-tegra.c >>> index f4f53d092e00..f8173a5e352d 100644 >>> --- a/drivers/pci/controller/pci-tegra.c >>> +++ b/drivers/pci/controller/pci-tegra.c >>> @@ -231,9 +231,9 @@ struct tegra_msi { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct msi_controller chip; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 DECLARE_BITMAP(used, INT_PCI_MSI_NR); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct irq_domain *domain; >>> -=C2=A0=C2=A0=C2=A0 unsigned long pages; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct mutex lock; >>> -=C2=A0=C2=A0=C2=A0 u64 phys; >>> +=C2=A0=C2=A0=C2=A0 void *virt; >>> +=C2=A0=C2=A0=C2=A0 dma_addr_t phys; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int irq; >>> =C2=A0 }; >>> @@ -1536,7 +1536,7 @@ static int tegra_pcie_msi_setup(struct tegra_pcie= *pcie) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D platform_get_irq_byname(pdev, "m= si"); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (err < 0) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dev_err(dev, "fa= iled to get IRQ: %d\n", err); >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto err; >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto free_irq_domain; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 msi->irq =3D err; >>> @@ -1545,17 +1545,34 @@ static int tegra_pcie_msi_setup(struct tegra_pc= ie *pcie) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 tegra_msi_irq_chip.name, pcie); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (err < 0) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dev_err(dev, "fa= iled to request IRQ: %d\n", err); >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto err; >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto free_irq_domain; >>> +=C2=A0=C2=A0=C2=A0 } >>> + >>> +=C2=A0=C2=A0=C2=A0 /* Though the PCIe controller can address >32-bit a= ddress space, to >>> +=C2=A0=C2=A0=C2=A0=C2=A0 * facilitate endpoints that support only 32-b= it MSI target address, >>> +=C2=A0=C2=A0=C2=A0=C2=A0 * the mask is set to 32-bit to make sure that= MSI target address is >>> +=C2=A0=C2=A0=C2=A0=C2=A0 * always a 32-bit address >>> +=C2=A0=C2=A0=C2=A0=C2=A0 */ >>> +=C2=A0=C2=A0=C2=A0 err =3D dma_set_coherent_mask(dev, DMA_BIT_MASK(32)= ); >>> +=C2=A0=C2=A0=C2=A0 if (err < 0) { >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dev_err(dev, "failed to set= DMA coherent mask: %d\n", err); >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto free_irq; >>> +=C2=A0=C2=A0=C2=A0 } >>> + >>> +=C2=A0=C2=A0=C2=A0 msi->virt =3D dma_alloc_coherent(dev, PAGE_SIZE, &m= si->phys, GFP_KERNEL); >>> +=C2=A0=C2=A0=C2=A0 if (!msi->virt) { >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 dev_err(dev, "failed to all= ocate DMA memory for MSI\n"); >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D -ENOMEM; >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto free_irq; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> -=C2=A0=C2=A0=C2=A0 /* setup AFI/FPCI range */ >>> -=C2=A0=C2=A0=C2=A0 msi->pages =3D __get_free_pages(GFP_KERNEL, 0); >>> -=C2=A0=C2=A0=C2=A0 msi->phys =3D virt_to_phys((void *)msi->pages); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 host->msi =3D &msi->chip; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return 0; >>> -err: >>> +free_irq: >>> +=C2=A0=C2=A0=C2=A0 free_irq(msi->irq, pcie); >>> +free_irq_domain: >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 irq_domain_remove(msi->domain); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return err; >>> =C2=A0 } >>> @@ -1592,7 +1609,7 @@ static void tegra_pcie_msi_teardown(struct tegra_= pcie *pcie) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct tegra_msi *msi =3D &pcie->msi; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 unsigned int i, irq; >>> -=C2=A0=C2=A0=C2=A0 free_pages(msi->pages, 0); >>> +=C2=A0=C2=A0=C2=A0 dma_free_coherent(pcie->dev, PAGE_SIZE, msi->virt, = msi->phys); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (msi->irq > 0) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 free_irq(msi->ir= q, pcie); >>> --=20 >>> 2.7.4 >>>