Date: Mon, 25 Mar 2024 00:22:15 +0100
From: Christoph Hellwig
To: Jason Gunthorpe
Cc: Christoph Hellwig, Leon Romanovsky, Robin Murphy, Marek Szyprowski,
	Joerg Roedel, Will Deacon, Chaitanya Kulkarni, Jonathan Corbet,
	Jens Axboe, Keith Busch, Sagi Grimberg, Yishai Hadas,
	Shameer Kolothum, Kevin Tian, Alex Williamson, Jérôme Glisse,
	Andrew Morton, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
	linux-nvme@lists.infradead.org, kvm@vger.kernel.org,
	linux-mm@kvack.org, Bart Van Assche, Damien Le Moal, Amir Goldstein,
	"josef@toxicpanda.com", "Martin K. Petersen", "daniel@iogearbox.net",
	Dan Williams, "jack@suse.com", Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240324232215.GC20765@lst.de>
References: <20240306221400.GA8663@lst.de> <20240307000036.GP9225@ziepe.ca>
	<20240307150505.GA28978@lst.de> <20240307210116.GQ9225@ziepe.ca>
	<20240308164920.GA17991@lst.de> <20240308202342.GZ9225@ziepe.ca>
	<20240309161418.GA27113@lst.de> <20240319153620.GB66976@ziepe.ca>
	<20240321223910.GA22663@lst.de> <20240322184330.GL66976@ziepe.ca>
In-Reply-To: <20240322184330.GL66976@ziepe.ca>

On Fri, Mar 22, 2024 at 03:43:30PM -0300, Jason Gunthorpe wrote:
> If we are going to make caller provided uniformity a requirement, lets
> imagine a formal memory type idea to help keep this a little
> abstracted?
>
>  DMA_MEMORY_TYPE_NORMAL
>  DMA_MEMORY_TYPE_P2P_NOT_ACS
>  DMA_MEMORY_TYPE_ENCRYPTED
>  DMA_MEMORY_TYPE_BOUNCE_BUFFER  // ??
>
> Then maybe the driver flow looks like:
>
> 	if (transaction.memory_type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev)) {

Add a nice helper to make this somewhat readable, but yes.

> 	} else if (transaction.memory_type == DMA_MEMORY_TYPE_P2P_NOT_ACS) {
> 		num_hwsgls = transaction.num_sgls;
> 		for_each_range(transaction, range) {
> 			hwsgl[i].addr = dma_api_p2p_not_acs_map(range.start_physical, range.length, p2p_memory_provider);
> 			hwsgl[i].len = range.size;
> 		}
> 	} else {
> 		/* Must be DMA_MEMORY_TYPE_NORMAL, DMA_MEMORY_TYPE_ENCRYPTED, DMA_MEMORY_TYPE_BOUNCE_BUFFER? */
> 		num_hwsgls = transaction.num_sgls;
> 		for_each_range(transaction, range) {
> 			hwsgl[i].addr = dma_api_map_cpu_page(range.start_page, range.length);
> 			hwsgl[i].len = range.size;
> 		}
>

And these two are really the same except that we call a different map
helper underneath.  So I think as far as the driver is concerned
they should be the same, the DMA API just needs to key off the
memory type.

> And the hmm_range_fault case is sort of like:
>
> 		struct dma_api_iommu_state state;
> 		dma_api_iommu_start(&state, mr.num_pages);
>
> 		[..]
> 		hmm_range_fault(...)
> 		if (present)
> 			dma_link_page(&state, faulting_address_offset, page);
> 		else
> 			dma_unlink_page(&state, faulting_address_offset, page);
>
> Is this looking closer?

Yes.

> > > So I take it as a requirement that RDMA MUST make single MR's out of a
> > > hodgepodge of page types. RDMA MRs cannot be split.
Multiple MR's are > > > not a functional replacement for a single MR. > > > > But MRs consolidate multiple dma addresses anyway. > > I'm not sure I understand this? The RDMA MRs take a a list of PFNish address, (or SGLs with the enhanced MRs from Mellanox) and give you back a single rkey/lkey. > To go back to my main thesis - I would like a high performance low > level DMA API that is capable enough that it could implement > scatterlist dma_map_sg() and thus also implement any future > scatterlist_v2, bio, hmm_range_fault or any other thing we come up > with on top of it. This is broadly what I thought we agreed to at LSF > last year. I think the biggest underlying problem of the scatterlist based DMA implementation for IOMMUs is that it's trying to handle to much, that is magic coalescing even if the segments boundaries don't align with the IOMMU page size. If we can get rid of that misfeature I think we'd greatly simply the API and implementation.