Date: Wed, 14 Aug 2024 11:43:07 -0300
From: Jason Gunthorpe
To: Sean Christopherson
Cc: Peter Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Oscar Salvador, Axel Rasmussen, linux-arm-kernel@lists.infradead.org,
	x86@kernel.org, Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan,
	Andrew Morton, Catalin Marinas, Ingo Molnar, Alistair Popple,
	Borislav Petkov, David Hildenbrand, Thomas Gleixner,
	kvm@vger.kernel.org, Dave Hansen, Alex Williamson, Yan Zhao
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
Message-ID: <20240814144307.GP2032816@nvidia.com>
References: <20240809160909.1023470-1-peterx@redhat.com>
	<20240814123715.GB2032816@nvidia.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Aug 14, 2024 at 07:35:01AM -0700, Sean Christopherson wrote:
> On Wed, Aug 14, 2024, Jason Gunthorpe wrote:
> > On Fri, Aug 09, 2024 at 12:08:50PM -0400, Peter Xu wrote:
> > > Overview
> > > ========
> > >
> > > This series is based on mm-unstable, commit 98808d08fc0f of Aug 7th latest,
> > > plus dax 1g fix [1]. Note that this series should also apply if without
> > > the dax 1g fix series, but when without it, mprotect() will trigger similar
> > > errors otherwise on PUD mappings.
> > >
> > > This series implements huge pfnmaps support for mm in general. Huge pfnmap
> > > allows e.g. VM_PFNMAP vmas to map in either PMD or PUD levels, similar to
> > > what we do with dax / thp / hugetlb so far to benefit from TLB hits. Now
> > > we extend that idea to PFN mappings, e.g. PCI MMIO bars where it can grow
> > > as large as 8GB or even bigger.
> >
> > FWIW, I've started to hear people talk about needing this in the VFIO
> > context with VMs.
> >
> > vfio/iommufd will reassemble the contiguous range from the 4k PFNs to
> > setup the IOMMU, but KVM is not able to do it so reliably.
>
> Heh, KVM should very reliably do the exact opposite, i.e. KVM should never create
> a huge page unless the mapping is huge in the primary MMU. And that's very much
> by design, as KVM has no knowledge of what actually resides at a given PFN, and
> thus can't determine whether or not its safe to create a huge page if KVM happens
> to realize the VM has access to a contiguous range of memory.

Oh? Someone told me recently x86 kvm had code to reassemble contiguous
ranges?

I don't quite understand your safety argument. If the VMA has 1G of
contiguous physical memory described with 4K, it is definitely safe for
KVM to reassemble that same memory and represent it as 1G.

Jason
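
To make the "reassemble the contiguous range from the 4k PFNs" idea
concrete, here is a rough userspace sketch of the purely arithmetic
part of such a check. It is not taken from vfio, iommufd or KVM; the
function name pfns_form_aligned_block() and its parameters are
hypothetical. It only tests that the PFNs are consecutive and naturally
aligned, and it says nothing about what actually resides at those PFNs,
which is the concern raised in the quoted reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: returns true if the n base-page
 * PFNs in pfns[] could be described as one naturally aligned block of
 * 2^order base pages (order 9 for 2M and order 18 for 1G, assuming
 * 4K base pages).
 */
static bool pfns_form_aligned_block(const uint64_t *pfns, size_t n,
				    unsigned int order)
{
	assert(pfns != NULL);

	if (order >= 63)
		return false;

	uint64_t count = UINT64_C(1) << order;

	/* The range must cover exactly one block of 2^order pages. */
	if ((uint64_t)n != count)
		return false;

	/* The first PFN must be aligned to the block size. */
	if (pfns[0] & (count - 1))
		return false;

	/* Every following PFN must be the previous one plus one. */
	for (size_t i = 1; i < n; i++) {
		if (pfns[i] != pfns[0] + i)
			return false;
	}

	return true;
}
```

A caller would pass the PFNs backing one candidate range, e.g. 262144
entries with order 18 for a 1G block of 4K pages. A real mapping would
also need the virtual address to be aligned to the same size, which
this sketch does not look at.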