From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 20 Mar 2024 10:55:36 +0200
From: Leon Romanovsky <leon@kernel.org>
To: Jason Gunthorpe, Christoph Hellwig
Cc: Robin Murphy, Marek Szyprowski, Joerg Roedel, Will Deacon,
	Chaitanya Kulkarni, Jonathan Corbet, Jens Axboe, Keith Busch,
	Sagi Grimberg, Yishai Hadas, Shameer Kolothum, Kevin Tian,
	Alex Williamson, Jérôme Glisse, Andrew Morton,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
	iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
	kvm@vger.kernel.org, linux-mm@kvack.org, Bart Van Assche,
	Damien Le Moal, Amir Goldstein, "josef@toxicpanda.com",
	"Martin K. Petersen", "daniel@iogearbox.net", Dan Williams,
	"jack@suse.com", Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240320085536.GA14887@unreal>
References: <20240306162022.GB28427@lst.de>
	<20240306174456.GO9225@ziepe.ca>
	<20240306221400.GA8663@lst.de>
	<20240307000036.GP9225@ziepe.ca>
	<20240307150505.GA28978@lst.de>
	<20240307210116.GQ9225@ziepe.ca>
	<20240308164920.GA17991@lst.de>
	<20240308202342.GZ9225@ziepe.ca>
	<20240309161418.GA27113@lst.de>
	<20240319153620.GB66976@ziepe.ca>
In-Reply-To: <20240319153620.GB66976@ziepe.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
On Tue, Mar 19, 2024 at 12:36:20PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 09, 2024 at 05:14:18PM +0100, Christoph Hellwig wrote:
> > On Fri, Mar 08, 2024 at 04:23:42PM -0400, Jason Gunthorpe wrote:
> > > > The DMA API callers really need to know what is P2P or not for
> > > > various reasons. And they should generally have that information
> > > > available, either from pin_user_pages that needs to special case
> > > > it or from the in-kernel I/O submitter that build it from P2P and
> > > > normal memory.
> > >
> > > I think that is a BIO thing. RDMA just calls with FOLL_PCI_P2PDMA and
> > > shoves the resulting page list into a scattertable. It never checks
> > > if any returned page is P2P - it has no reason to care. dma_map_sg()
> > > does all the work.
> >
> > Right now it does, but that's not really a good interface. If we have
> > a pin_user_pages variant that only pins until the next relevant P2P
> > boundary and tells you about it, we can significantly simplify the
> > overall interface.
>
> Sorry for the delay, I was away..

<...>

> Can we tweak what Leon has done to keep the hmm_range_fault support
> and non-uniformity for RDMA but add a uniformity optimized flow for
> BIO?

Something like this will do the trick.

>From 45e739e7073fb04bc168624f77320130bb3f9267 Mon Sep 17 00:00:00 2001
Message-ID: <45e739e7073fb04bc168624f77320130bb3f9267.1710924764.git.leonro@nvidia.com>
From: Leon Romanovsky
Date: Mon, 18 Mar 2024 11:16:41 +0200
Subject: [PATCH] mm/gup: add strict interface to pin user pages according to
 FOLL flag

All pin_user_pages*() and get_user_pages*() callers pin user pages while
only partially taking their p2p vs. non-p2p properties into account: if
the caller sets the FOLL_PCI_P2PDMA flag, the pinned pages may include
both p2p and "regular" pages, while without FOLL_PCI_P2PDMA only regular
pages are returned.

To make sure that, with FOLL_PCI_P2PDMA, only p2p pages are returned,
introduce a new internal FOLL_STRICT flag and provide a special
pin_user_pages_fast_strict() API call.
Signed-off-by: Leon Romanovsky
---
 include/linux/mm.h |  3 +++
 mm/gup.c           | 36 +++++++++++++++++++++++++++++++++++-
 mm/internal.h      |  4 +++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..910b65dde24a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2491,6 +2491,9 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
 void folio_add_pin(struct folio *folio);
 
+int pin_user_pages_fast_strict(unsigned long start, int nr_pages,
+			       unsigned int gup_flags, struct page **pages);
+
 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..11b5c626a4ab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -133,6 +133,10 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
 		return NULL;
 
+	if (flags & FOLL_STRICT)
+		if (flags & FOLL_PCI_P2PDMA && !is_pci_p2pdma_page(page))
+			return NULL;
+
 	if (flags & FOLL_GET)
 		return try_get_folio(page, refs);
@@ -232,6 +236,10 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
 		return -EREMOTEIO;
 
+	if (flags & FOLL_STRICT)
+		if (flags & FOLL_PCI_P2PDMA && !is_pci_p2pdma_page(page))
+			return -EREMOTEIO;
+
 	if (flags & FOLL_GET)
 		folio_ref_inc(folio);
 	else if (flags & FOLL_PIN) {
@@ -2243,6 +2251,8 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 	 * - FOLL_TOUCH/FOLL_PIN/FOLL_TRIED/FOLL_FAST_ONLY are internal only
 	 * - FOLL_REMOTE is internal only and used on follow_page()
 	 * - FOLL_UNLOCKABLE is internal only and used if locked is !NULL
+	 * - FOLL_STRICT is internal only and used to distinguish between p2p
+	 *   and "regular" pages.
 	 */
 	if (WARN_ON_ONCE(gup_flags & INTERNAL_GUP_FLAGS))
 		return false;
@@ -3187,7 +3197,8 @@ static int internal_get_user_pages_fast(unsigned long start,
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
 				       FOLL_FAST_ONLY | FOLL_NOFAULT |
-				       FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
+				       FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT |
+				       FOLL_STRICT)))
 		return -EINVAL;
 
 	if (gup_flags & FOLL_PIN)
@@ -3322,6 +3333,29 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
 }
 EXPORT_SYMBOL_GPL(pin_user_pages_fast);
 
+/**
+ * pin_user_pages_fast_strict() - variant of pin_user_pages_fast() that
+ * makes sure that only pages with the same properties are pinned.
+ *
+ * @start:      starting user address
+ * @nr_pages:   number of pages from start to pin
+ * @gup_flags:  flags modifying pin behaviour
+ * @pages:      array that receives pointers to the pages pinned.
+ *              Should be at least nr_pages long.
+ *
+ * Nearly the same as pin_user_pages_fast(), except that FOLL_STRICT is set.
+ *
+ * FOLL_STRICT means that only pages matching the specified FOLL_*
+ * properties are pinned.
+ */
+int pin_user_pages_fast_strict(unsigned long start, int nr_pages,
+			       unsigned int gup_flags, struct page **pages)
+{
+	if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN | FOLL_STRICT))
+		return -EINVAL;
+
+	return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
+}
+EXPORT_SYMBOL_GPL(pin_user_pages_fast_strict);
+
 /**
  * pin_user_pages_remote() - pin pages of a remote process
  *
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..7578837a0444 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1031,10 +1031,12 @@ enum {
 	FOLL_FAST_ONLY = 1 << 20,
 	/* allow unlocking the mmap lock */
 	FOLL_UNLOCKABLE = 1 << 21,
+	/* don't mix pages with different properties, e.g. p2p with "regular" ones */
+	FOLL_STRICT = 1 << 22,
 };
 
 #define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \
-			    FOLL_FAST_ONLY | FOLL_UNLOCKABLE)
+			    FOLL_FAST_ONLY | FOLL_UNLOCKABLE | FOLL_STRICT)
 
 /*
  * Indicates for which pages that are write-protected in the page table,
-- 
2.44.0