From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org
Cc:
	Andrew Morton, Alex Williamson, Zi Yan, Jason Gunthorpe, Alex Mastro,
	David Hildenbrand, Nico Pache, peterx@redhat.com
Subject: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Fri, 13 Jun 2025 09:41:11 -0400
Message-ID: <20250613134111.469884-6-peterx@redhat.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250613134111.469884-1-peterx@redhat.com>
References: <20250613134111.469884-1-peterx@redhat.com>

This patch enables best-effort huge pfnmaps for mmap() of vfio-pci BARs
even without MAP_FIXED, so that huge pfnmaps are used as much as possible.
It also avoids requiring userspace changes (switching to MAP_FIXED with
pre-aligned virtual addresses) before huge pfnmaps can be enabled on VFIO
BARs.

The trick is to make sure the MMIO PFNs are aligned with the VAs allocated
by mmap() when !MAP_FIXED, so that whatever is returned from
mmap(!MAP_FIXED) of a vfio-pci MMIO region is automatically suitable for
huge pfnmaps as much as possible.

To achieve that, a custom vfio_device get_unmapped_area() for vfio-pci
devices is needed.

Note that MMIO physical addresses are normally guaranteed to be BAR-size
aligned, hence the BAR offset could logically be used directly for the
calculation. However, to keep it strict and clear (rather than relying on
spec details), we still fetch the BAR's physical address from
pci_dev.resource[].

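The alignment rule described above can be sketched in plain userspace C.
This is only an illustration of the arithmetic, not the kernel
implementation: the helper name va_congruent_to_phys() and the 2 MiB
PMD_SIZE_ASSUMED value are assumptions made for the example. The idea is
that a huge mapping is possible only when the virtual address and the
physical address share the same offset within the huge-page alignment.

```c
#include <stdint.h>

/* Assumed huge-page size for the example (2 MiB, as for a PMD on x86-64). */
#define PMD_SIZE_ASSUMED (1ULL << 21)

/*
 * Hypothetical helper: return the lowest address >= va_hint whose offset
 * within 'align' equals the offset of 'phys' within 'align'.
 * 'align' must be a power of two.
 */
static uint64_t va_congruent_to_phys(uint64_t va_hint, uint64_t phys,
				     uint64_t align)
{
	uint64_t mask = align - 1;
	uint64_t want = phys & mask;		/* offset inside the huge page */
	uint64_t va = (va_hint & ~mask) | want;	/* same offset, rounded down */

	if (va < va_hint)			/* step to the next aligned slot */
		va += align;
	return va;
}
```

For example, with a made-up physical address of 0xfe201000 (a 2 MiB
aligned BAR base plus a 4 KiB offset) and a hint of 0x7f0000123000, the
helper returns 0x7f0000201000: the VA and the physical address then have
the same low 21 bits, so a PMD-sized entry can cover the aligned part of
the range.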
Signed-off-by: Alex Williamson
Signed-off-by: Peter Xu
---
 drivers/vfio/pci/vfio_pci.c      |  3 ++
 drivers/vfio/pci/vfio_pci_core.c | 65 ++++++++++++++++++++++++++++++++
 include/linux/vfio_pci_core.h    |  6 +++
 3 files changed, 74 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 5ba39f7623bb..d9ae6cdbea28 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -144,6 +144,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 	.pasid_attach_ioas	= vfio_iommufd_physical_pasid_attach_ioas,
 	.pasid_detach_ioas	= vfio_iommufd_physical_pasid_detach_ioas,
+#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
+	.get_unmapped_area	= vfio_pci_core_get_unmapped_area,
+#endif
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 6328c3a05bcd..835bc168f8b7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1641,6 +1641,71 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
 	return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
 }
 
+#ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP
+/*
+ * Hint function to provide mmap() virtual address candidate so as to be
+ * able to map huge pfnmaps as much as possible.  It is done by aligning
+ * the VA to the PFN to be mapped in the specific bar.
+ *
+ * Note that this function does the minimum check on mmap() parameters to
+ * make the PFN calculation valid only. The majority of mmap() sanity check
+ * will be done later in mmap().
+ */
+unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
+					      struct file *file,
+					      unsigned long addr,
+					      unsigned long len,
+					      unsigned long pgoff,
+					      unsigned long flags)
+{
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+	struct pci_dev *pdev = vdev->pdev;
+	unsigned long ret, phys_len, req_start, phys_addr;
+	unsigned int index;
+
+	index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+
+	/* Currently, only bars 0-5 support huge pfnmap */
+	if (index >= VFIO_PCI_ROM_REGION_INDEX)
+		goto fallback;
+
+	/* Bar offset */
+	req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
+	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
+
+	/*
+	 * Make sure we at least can get a valid physical address to do the
+	 * math.  If this happens, it will probably fail mmap() later.
+	 */
+	if (req_start >= phys_len)
+		goto fallback;
+
+	phys_len = MIN(phys_len, len);
+	/* Calculate the start of physical address to be mapped */
+	phys_addr = pci_resource_start(pdev, index) + req_start;
+
+	/* Choose the alignment */
+	if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE) {
+		ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
+						   flags, PUD_SIZE, 0);
+		if (ret)
+			return ret;
+	}
+
+	if (phys_len >= PMD_SIZE) {
+		ret = mm_get_unmapped_area_aligned(file, addr, len, phys_addr,
+						   flags, PMD_SIZE, 0);
+		if (ret)
+			return ret;
+	}
+
+fallback:
+	return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_get_unmapped_area);
+#endif
+
 static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
 					   unsigned int order)
 {
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index fbb472dd99b3..e59699e01901 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -119,6 +119,12 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
 			   size_t count, loff_t *ppos);
 ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
 			    size_t count, loff_t *ppos);
+unsigned long vfio_pci_core_get_unmapped_area(struct vfio_device *device,
+					      struct file *file,
+					      unsigned long addr,
+					      unsigned long len,
+					      unsigned long pgoff,
+					      unsigned long flags);
 int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma);
 void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
 int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
-- 
2.49.0
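
For readers following the pgoff arithmetic in the hunk above, here is a
rough userspace sketch of the same decoding and alignment choice. It is
not kernel code. The names decode_pgoff() and pick_alignment() are
hypothetical, and the constants are assumptions for the example: 4 KiB
pages, a region offset shift of 40 (the value VFIO_PCI_OFFSET_SHIFT has in
mainline as far as I know), 2 MiB PMDs and 1 GiB PUDs. Unlike the patch,
the sketch only reports the largest candidate alignment and does not retry
with a smaller one when the address space search fails.

```c
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED	12		/* 4 KiB pages (assumption) */
#define OFFSET_SHIFT_ASSUMED	40		/* region index lives above bit 40 */
#define PMD_SIZE_ASSUMED	(1ULL << 21)	/* 2 MiB (assumption) */
#define PUD_SIZE_ASSUMED	(1ULL << 30)	/* 1 GiB (assumption) */

struct decoded_pgoff {
	unsigned int index;	/* VFIO region index (BAR number for 0-5) */
	uint64_t bar_offset;	/* byte offset inside that region */
};

/* Split an mmap() page offset into region index and offset in the region. */
static struct decoded_pgoff decode_pgoff(uint64_t pgoff)
{
	struct decoded_pgoff d;

	d.index = (unsigned int)(pgoff >>
				 (OFFSET_SHIFT_ASSUMED - PAGE_SHIFT_ASSUMED));
	d.bar_offset = (pgoff << PAGE_SHIFT_ASSUMED) &
		       ((1ULL << OFFSET_SHIFT_ASSUMED) - 1);
	return d;
}

/*
 * Return the largest huge-page alignment worth trying, or 0 to fall back
 * to the default search. Mirrors the size checks in the patch.
 */
static uint64_t pick_alignment(uint64_t bar_len, uint64_t bar_offset,
			       uint64_t map_len)
{
	uint64_t usable;

	if (bar_offset >= bar_len)	/* offset outside the BAR: fall back */
		return 0;

	usable = bar_len < map_len ? bar_len : map_len;
	if (usable >= PUD_SIZE_ASSUMED)
		return PUD_SIZE_ASSUMED;
	if (usable >= PMD_SIZE_ASSUMED)
		return PMD_SIZE_ASSUMED;
	return 0;
}
```

With these assumptions, an mmap() file offset of (2ULL << 40) + 0x200000
corresponds to pgoff 0x20000200, which decodes to region index 2 and BAR
offset 0x200000. A 16 MiB BAR mapped in full yields a PMD-sized alignment,
while a 16 KiB BAR yields 0, meaning the default unmapped-area search is
used.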