Date: Fri, 6 Jun 2025 10:57:34 -0700
References: <20250524013943.2832-1-ankita@nvidia.com>
 <20250524013943.2832-4-ankita@nvidia.com>
 <20250527002652.GM61950@nvidia.com>
Subject: Re: [PATCH v6 3/5] kvm: arm64: New memslot flag to indicate cacheable mapping
From: Sean Christopherson
To: Ankit Agrawal
Cc: Jason Gunthorpe, "maz@kernel.org", "oliver.upton@linux.dev",
 "joey.gouly@arm.com", "suzuki.poulose@arm.com", "yuzenghui@huawei.com",
 "catalin.marinas@arm.com", "will@kernel.org", "ryan.roberts@arm.com",
 "shahuang@redhat.com", "lpieralisi@kernel.org", "david@redhat.com",
 Aniket Agashe, Neo Jia, Kirti Wankhede, Krishnakant Jaju,
 "Tarun Gupta (SW-GPU)", Vikram Sethi, Andy Currid, Alistair Popple,
 John Hubbard, Dan Williams, Zhi Wang, Matt Ochs, Uday Dhoke, Dheeraj Nigam,
 "alex.williamson@redhat.com", "sebastianene@google.com",
 "coltonlewis@google.com", "kevin.tian@intel.com", "yi.l.liu@intel.com",
 "ardb@kernel.org", "akpm@linux-foundation.org", "gshan@redhat.com",
 "linux-mm@kvack.org", "ddutile@redhat.com", "tabba@google.com",
 "qperret@google.com", "kvmarm@lists.linux.dev",
 "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org",
 "maobibo@loongson.cn"

On Tue, May 27, 2025, Ankit Agrawal wrote:
> > I thought we agreed not to do this? Sean was strongly against it
> > right?

Yes.  NAK, at least for this implementation.  IMO, this has no business being
in KVM's uAPI, and it's a mess.  KVM x86 unconditionally supports cacheable
PFNMAP mappings, yet this:

	bool __weak kvm_arch_supports_cacheable_pfnmap(void)
	{
		return false;
	}

	if (kvm_arch_supports_cacheable_pfnmap())
		valid_flags |= KVM_MEM_ENABLE_CACHEABLE_PFNMAP;

means x86 will disallow KVM_MEM_ENABLE_CACHEABLE_PFNMAP.  Which is fine-ish
from a uAPI perspective, as the flag is documented as arm64-only, and we can
state that all other architectures always allow cacheable mappings.  But even
that is a mess, because KVM won't _guarantee_ the final mapping is cacheable.

On AMD, there's simply no sane way to force WB (KVM can't override guest PAT,
i.e. the memtype requested/set by the guest's stage-1 page tables).

On Intel, after years of pain, we _finally_ got KVM out of a mess where KVM was
forcing WB for all non-MMIO memory.
Only to have to immediately revert and add KVM_X86_QUIRK_IGNORE_GUEST_PAT
because buggy guest drivers were relying on KVM's behavior :-(

So there's zero chance of this memslot flag ever being supported on x86.
Which, again, is fine for uAPI.  But for internal code it's going to be all
kinds of confusing, because kvm_arch_supports_cacheable_pfnmap() is a flat out
lie.

And as proposed, the memslot flag also doesn't actually address Oliver's want:

  The memslot flag says userspace expects a particular GFN range to guarantee
                                                                    ^^^^^^^^^
  Write-Back semantics.

IIUC, what Oliver wants is:

	if (mapping_type_noncacheable(vma->vm_page_prot)) {
		if (new->flags & KVM_MEM_FORCE_CACHEABLE_PFNMAP)
			return -EINVAL;
	} else {
		if (!kvm_arch_supports_cacheable_pfnmap())
			return -EINVAL;
	}

That's at least a bit more palatable, as it doesn't create impossible
situations on x86, e.g. x86 simply doesn't support letting userspace force a
cacheable mapping.

And Oliver also stated:

  Whether or not FWB is employed for a particular region of IPA space is
  useful information for userspace deciding what it needs to do to access
  guest memory.

The above would only cover half of that, i.e. wouldn't prevent userspace from
getting surprised by a WB mapping.  So I think it would need to be this?

	if (mapping_type_noncacheable(vma->vm_page_prot) !=
	    !(new->flags & KVM_MEM_FORCE_CACHEABLE_PFNMAP))
		return -EINVAL;

Which I don't hate as much, but I still don't love it, as it's overly specific,
e.g. only helps with PFNMAP memory, and pushes a sanity check from userspace
into KVM.

Which is another complaint with this uAPI: it effectively assumes/implies
PFNMAP is device memory, but that's simply not true.  There are zero guarantees
with respect to what actually lies behind any given PFNMAP.  It could be device
memory, but it could also be regular RAM, or something in between.

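To make the difference between the two checks above concrete, here is a small
standalone userspace sketch (not kernel code).  It is an illustration under
stated assumptions: plain bools stand in for the VMA's page protection and the
memslot flag, and check_one_sided(), check_strict() and check_examples() are
hypothetical names that appear nowhere in the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins: vma_noncacheable models
 * mapping_type_noncacheable(vma->vm_page_prot), force_cacheable models
 * (new->flags & KVM_MEM_FORCE_CACHEABLE_PFNMAP), and arch_supports models
 * kvm_arch_supports_cacheable_pfnmap().
 */

/* One-sided check: the flag is rejected only on a non-cacheable VMA. */
int check_one_sided(bool vma_noncacheable, bool force_cacheable,
		    bool arch_supports)
{
	if (vma_noncacheable) {
		if (force_cacheable)
			return -EINVAL;
	} else {
		if (!arch_supports)
			return -EINVAL;
	}
	return 0;
}

/* Strict check: the flag must be set if and only if the VMA is cacheable. */
int check_strict(bool vma_noncacheable, bool force_cacheable)
{
	if (vma_noncacheable != !force_cacheable)
		return -EINVAL;
	return 0;
}

/* Walk all four combinations, assuming the arch supports cacheable PFNMAP. */
void check_examples(void)
{
	/* Cacheable VMA, no flag: the only case where the checks disagree. */
	assert(check_one_sided(false, false, true) == 0);
	assert(check_strict(false, false) == -EINVAL);

	/* Cacheable VMA, flag set: both accept. */
	assert(check_one_sided(false, true, true) == 0);
	assert(check_strict(false, true) == 0);

	/* Non-cacheable VMA, no flag: both accept. */
	assert(check_one_sided(true, false, true) == 0);
	assert(check_strict(true, false) == 0);

	/* Non-cacheable VMA, flag set: both reject. */
	assert(check_one_sided(true, true, true) == -EINVAL);
	assert(check_strict(true, true) == -EINVAL);
}
```

The only row where the two disagree is a cacheable VMA without the flag, i.e.
the "surprised by a WB mapping" case that the stricter check rejects.
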
I would much prefer we have a way for userspace to query the effective memtype
for a range of memory, either for a VMA or for a KVM mapping, and let
_userspace_ do whatever sanity checks it wants.  That seems like it would be
more generally useful, and would be feasible to support on multiple
architectures.  Though I'd probably prefer to avoid even that, e.g. in favor of
providing enough information in other ways so that userspace can (somewhat
easily) deduce how KVM will behave for a given mapping.

> > There is no easy way for VFIO to know to set it, and the kernel will
> > not allow switching a cachable VMA to non-cachable anyhow.
>
> > So all it does is make it harder to create a memslot.
>
> Oliver had mentioned earlier that he would still prefer a memslot flag as
> VMM should convey its intent through that flag:
>
> https://lore.kernel.org/all/aAdKCGCuwlUeUXKY@linux.dev/
>
> Oliver, could you please confirm if you are convinced with not having this
> flag? Can we rely on MT_NORMAL in vma mapping to convey this?

Is MT_NORMAL visible and/or controllable by userspace?