Date: Fri, 13 Jun 2025 15:21:39 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org,
	Andrew Morton, Alex Williamson, Zi Yan, Jason Gunthorpe, Alex Mastro,
	Nico Pache
Subject: Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
References: <20250613134111.469884-1-peterx@redhat.com> <20250613134111.469884-6-peterx@redhat.com>

On Fri, Jun 13, 2025 at 08:09:41PM +0200, David Hildenbrand wrote:
> On 13.06.25 15:41, Peter Xu wrote:
> > This patch enables best-effort mmap() for vfio-pci bars even without
> > MAP_FIXED, so as to utilize huge pfnmaps as much as possible. It should
> > also avoid userspace changes (switching to MAP_FIXED with pre-aligned VA
> > addresses) to start enabling huge pfnmaps on VFIO bars.
> >
> > Here the trick is making sure the MMIO PFNs will be aligned with the VAs
> > allocated from mmap() when !MAP_FIXED, so that whatever returned from
> > mmap(!MAP_FIXED) of vfio-pci MMIO regions will be automatically suitable
> > for huge pfnmaps as much as possible.
> >
> > To achieve that, a custom vfio_device's get_unmapped_area() for vfio-pci
> > devices is needed.
> >
> > Note that MMIO physical addresses should normally be guaranteed to be
> > always bar-size aligned, hence the bar offset can logically be directly
> > used to do the calculation. However to make it strict and clear (rather
> > than relying on spec details), we still try to fetch the bar's physical
> > addresses from pci_dev.resource[].
> >
> > Signed-off-by: Alex Williamson
>
> There is likely a
>
> Co-developed-by: Alex Williamson
>
> missing?

Would it mean the same if we use the two SoBs like what this patch uses?
I sincerely don't know the difference..
I hope it's fine to show that this patch was developed together. Please
let me know otherwise.

>
> > Signed-off-by: Peter Xu
> > ---
> >   drivers/vfio/pci/vfio_pci.c      |  3 ++
> >   drivers/vfio/pci/vfio_pci_core.c | 65 ++++++++++++++++++++++++++++++++
> >   include/linux/vfio_pci_core.h    |  6 +++
> >   3 files changed, 74 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 5ba39f7623bb..d9ae6cdbea28 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -144,6 +144,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
> >   	.detach_ioas = vfio_iommufd_physical_detach_ioas,
> >   	.pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas,
> >   	.pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
> > +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> > +	.get_unmapped_area = vfio_pci_core_get_unmapped_area,
> > +#endif
> >   };
> >
> >   static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 6328c3a05bcd..835bc168f8b7 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1641,6 +1641,71 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> >   	return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
> >   }
> >
> > +#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
> > +/*
> > + * Hint function to provide mmap() virtual address candidate so as to be
> > + * able to map huge pfnmaps as much as possible. It is done by aligning
> > + * the VA to the PFN to be mapped in the specific bar.
> > + *
> > + * Note that this function does the minimum check on mmap() parameters to
> > + * make the PFN calculation valid only. The majority of mmap() sanity check
> > + * will be done later in mmap().
> > + */
> > +unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
> > +					      struct file *file,
> > +					      unsigned long addr,
> > +					      unsigned long len,
> > +					      unsigned long pgoff,
> > +					      unsigned long flags)
>
> A very suboptimal way to indent this many parameters; just use two tabs at
> the beginning.

This is the default indentation from Emacs c-mode. Since this is a VFIO
file, I checked the file and it looks like there is not yet a strict
indentation rule across the whole file. I can switch to two tabs for sure
if nobody else disagrees.

>
> > +{
> > +	struct vfio_pci_core_device *vdev =
> > +		container_of(device, struct vfio_pci_core_device, vdev);
> > +	struct pci_dev *pdev = vdev->pdev;
> > +	unsigned long ret, phys_len, req_start, phys_addr;
> > +	unsigned int index;
> > +
> > +	index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>
> Could do
>
> unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
>
> at the very top.

Sure.

>
> > +
> > +	/* Currently, only bars 0-5 supports huge pfnmap */
> > +	if (index >= VFIO_PCI_ROM_REGION_INDEX)
> > +		goto fallback;
> > +
> > +	/* Bar offset */
> > +	req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
> > +	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
> > +
> > +	/*
> > +	 * Make sure we at least can get a valid physical address to do the
> > +	 * math. If this happens, it will probably fail mmap() later..
> > +	 */
> > +	if (req_start >= phys_len)
> > +		goto fallback;
> > +
> > +	phys_len = MIN(phys_len, len);
> > +	/* Calculate the start of physical address to be mapped */
> > +	phys_addr = pci_resource_start(pdev, index) + req_start;
> > +
> > +	/* Choose the alignment */
> > +	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE) {
> > +		ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> > +						   flags, PUD_SIZE, 0);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	if (phys_len >= PMD_SIZE) {
> > +		ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
> > +						   flags, PMD_SIZE, 0);
> > +		if (ret)
> > +			return ret;
>
> Similar to Jason, I wonder if that logic should reside in the core, and we
> only indicate the maximum page table level we support.

I replied. We can continue the discussion there.

Thanks,

-- 
Peter Xu
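
For readers following the quoted hunk, the arithmetic it relies on can be
sketched in plain userspace C. This is an illustration only, not the kernel
code: the constants are assumed x86-64 values (4 KiB pages, 2 MiB PMD, 1 GiB
PUD, region index stored above bit 40 of the file offset), and
congruent_va() is a hypothetical stand-in that only shows the property an
aligned VA search is after, namely that the VA and the physical address
share the same offset within a huge page. It does not model what
mm_get_unmapped_area_aligned() does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration (x86-64, 4 KiB base pages). */
#define PAGE_SHIFT            12
#define PMD_SIZE              (1ULL << 21)  /* 2 MiB */
#define PUD_SIZE              (1ULL << 30)  /* 1 GiB */
#define VFIO_PCI_OFFSET_SHIFT 40            /* region index sits above this bit */

/* Region (BAR) index encoded in the upper bits of the mmap page offset. */
static unsigned int region_index(uint64_t pgoff)
{
	return (unsigned int)(pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT));
}

/* Byte offset inside the BAR: the low VFIO_PCI_OFFSET_SHIFT bits. */
static uint64_t bar_offset(uint64_t pgoff)
{
	return (pgoff << PAGE_SHIFT) & ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1);
}

/* Largest huge-mapping size worth aligning for; 0 means no special alignment. */
static uint64_t pick_align(uint64_t map_len)
{
	if (map_len >= PUD_SIZE)
		return PUD_SIZE;
	if (map_len >= PMD_SIZE)
		return PMD_SIZE;
	return 0;
}

/*
 * Hypothetical helper: lowest VA at or above 'hint' whose offset within an
 * 'align'-sized block equals that of 'phys'.  'align' must be a power of two.
 */
static uint64_t congruent_va(uint64_t hint, uint64_t phys, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);

	uint64_t want = phys & (align - 1);
	uint64_t va = (hint & ~(align - 1)) | want;

	if (va < hint)
		va += align;
	return va;
}
```

With these assumptions, a page offset of (1ULL << 28) | 0x200 decodes to
region 1 at BAR offset 0x200000, and a 4 MiB mapping would ask for PMD
alignment.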