From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org
Cc: Andrew Morton, Alex Williamson, Zi Yan, Jason Gunthorpe, Alex Mastro,
    David Hildenbrand, Nico Pache, peterx@redhat.com
Subject: [PATCH 0/5] mm/vfio: huge pfnmaps with !MAP_FIXED mappings
Date: Fri, 13 Jun 2025 09:41:06 -0400
Message-ID: <20250613134111.469884-1-peterx@redhat.com>
X-Mailer: git-send-email 2.49.0

[based on latest akpm/mm-new as of June 12th 2025, commit 19d47edf9]

This series enables !MAP_FIXED huge pfnmaps for vfio-pci.

Before this series, a userspace application in most cases needs to be
modified to benefit from huge mappings: it has to provide a
huge-size-aligned VA using MAP_FIXED. After this series, the application
can benefit from huge pfnmaps automatically after a kernel upgrade, with
no userspace modifications.

It's still best-effort, because the auto-alignment requires a larger VA
range to be allocated via the per-arch allocator. If a huge-mapping-aligned
VA cannot be allocated, it will still fall back to small mappings like
before. However, that is mostly a theoretical concern: in practice I don't
yet know of a case where it would fail on a 64-bit system.

So far, only vfio-pci is supported. But the logic should be applicable to
all drivers that support, or will support, huge pfnmaps.

Kudos go to Jason for the suggestion:

https://lore.kernel.org/r/20250530131050.GA233377@nvidia.com

Though instead of refactoring shmem, I found that we already have a
function we can directly reuse for THP calculations.

The idea is also fairly simple: make sure that whatever virtual address is
returned from an mmap() request on the MMIO BAR regions is
huge-size-aligned with the physical address of the corresponding BAR.

The series contains minimal mm changes; in reality it only renames and
exports the THP function so that it can be reused. That is patch 3.

Patches 1 and 2 are trivial cleanups that I found while looking at this
problem. They can be posted separately if anyone would like me to.

Patch 4 is the plumbing needed to wire vfio-pci over to the mmap()
operations of vfio_device.

Patch 5 is the real meat.

For testing: besides checkpatch and my daily cross-build harness, the unit
tests from either myself [1] (based on another test program from Alex) or
Alex all work fine. The alignments all look sane with mmap(!MAP_FIXED), and
huge mappings are properly installed.

Alex Mastro: please feel free to try this out with your internal tests.
The hope is that once this series is applied, your app gets huge pfnmaps
without any changes (with any pgoff specified).

Logically, there should be minimal dependencies on stable branches wherever
huge pfnmap is available.

Comments welcome, thanks.

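For illustration only, the best-effort alignment described above can be
sketched in plain userspace C. This is not the code from the series: the
helper names (pick_va, huge_aligned), the 2 MiB HUGE_SIZE and the idea of
passing the free range in as (start, avail) are all assumptions made for
the example. It shows the two points from the text: the VA is chosen so
that it shares the physical address's offset within a huge page, and if
the padded range is too small the plain (small-mapping) address is used.

```c
#include <stdint.h>

/* Assumed huge mapping size for the example: 2 MiB (typical PMD size). */
#define HUGE_SIZE ((uint64_t)2 << 20)

/*
 * Hypothetical helper: a huge mapping is only possible when VA and PA
 * have the same offset within a HUGE_SIZE-aligned block.
 */
static int huge_aligned(uint64_t va, uint64_t pa)
{
	return ((va ^ pa) & (HUGE_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: given a free VA range [start, start + avail),
 * return the lowest address in it that is congruent to pa modulo
 * HUGE_SIZE and still leaves room for len bytes. If no such address
 * fits, fall back to start, which means small mappings as before.
 */
static uint64_t pick_va(uint64_t start, uint64_t avail, uint64_t len,
			uint64_t pa)
{
	/* Distance from start to the next address congruent to pa. */
	uint64_t shift = (pa - start) & (HUGE_SIZE - 1);

	if (shift + len > avail)
		return start;	/* best-effort: no aligned fit */

	return start + shift;
}
```

Asking the allocator for len + HUGE_SIZE bytes guarantees that the shift
(always smaller than HUGE_SIZE) fits, which is why the aligned path needs
a larger VA range than the mapping itself.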
[1] https://github.com/xzpeter/clibs/blob/master/misc/vfio-pci-nofix.c
[2] https://github.com/awilliam/tests/blob/vfio-pci-device-map-alignment/vfio-pci-device-map-alignment.c

Peter Xu (5):
  mm: Deduplicate mm_get_unmapped_area()
  mm/hugetlb: Remove prepare_hugepage_range()
  mm: Rename __thp_get_unmapped_area to mm_get_unmapped_area_aligned
  vfio: Introduce vfio_device_ops.get_unmapped_area hook
  vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings

 arch/loongarch/include/asm/hugetlb.h | 14 ------
 arch/mips/include/asm/hugetlb.h      | 14 ------
 drivers/vfio/pci/vfio_pci.c          |  3 ++
 drivers/vfio/pci/vfio_pci_core.c     | 65 ++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c             | 18 ++++++++
 fs/hugetlbfs/inode.c                 |  8 +---
 include/asm-generic/hugetlb.h        |  8 ----
 include/linux/huge_mm.h              | 14 +++++-
 include/linux/hugetlb.h              |  6 ---
 include/linux/vfio.h                 |  7 +++
 include/linux/vfio_pci_core.h        |  6 +++
 mm/huge_memory.c                     |  6 ++-
 mm/mmap.c                            |  5 +--
 13 files changed, 120 insertions(+), 54 deletions(-)

-- 
2.49.0