Date: Thu, 14 Sep 2023 17:33:53 -0700
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
From: Sean Christopherson
To: Ackerley Tng
Cc: pbonzini@redhat.com, maz@kernel.org, oliver.upton@linux.dev,
 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
 paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
 willy@infradead.org, akpm@linux-foundation.org, paul@paul-moore.com,
 jmorris@namei.org, serge@hallyn.com, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org,
 chao.p.peng@linux.intel.com, tabba@google.com, jarkko@kernel.org,
 yu.c.zhang@linux.intel.com, vannapurve@google.com,
 mail@maciej.szmigiero.name, vbabka@suse.cz, david@redhat.com,
 qperret@google.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com

On Thu, Sep 14, 2023, Ackerley Tng wrote:
> Sean Christopherson writes:
>
> > On Mon, Aug 28, 2023, Ackerley Tng wrote:
> >> Sean Christopherson writes:
> >> >> If we track struct kvm with the inode, then I think (a), (b) and (c) can
> >> >> be independent of the refcounting method. What do you think?
> >> >
> >> > No go. Because again, the inode (physical memory) is coupled to the virtual machine
> >> > as a thing, not to a "struct kvm". Or more concretely, the inode is coupled to an
> >> > ASID or an HKID, and there can be multiple "struct kvm" objects associated with a
> >> > single ASID. And at some point in the future, I suspect we'll have multiple KVM
> >> > objects per HKID too.
> >> >
> >> > The current SEV use case is for the migration helper, where two KVM objects share
> >> > a single ASID (the "real" VM and the helper). I suspect TDX will end up with
> >> > similar behavior where helper "VMs" can use the HKID of the "real" VM. For KVM,
> >> > that means multiple struct kvm objects being associated with a single HKID.
> >> >
> >> > To prevent use-after-free, KVM "just" needs to ensure the helper instances can't
> >> > outlive the real instance, i.e. can't use the HKID/ASID after the owning virtual
> >> > machine has been destroyed.
> >> >
> >> > To put it differently, "struct kvm" is a KVM software construct that _usually_,
> >> > but not always, is associated 1:1 with a virtual machine.
> >> >
> >> > And FWIW, stashing the pointer without holding a reference would not be a complete
> >> > solution, because it couldn't guard against KVM reusing a pointer. E.g. if a
> >> > struct kvm was unbound and then freed, KVM could reuse the same memory for a new
> >> > struct kvm, with a different ASID/HKID, and get a false negative on the rebinding
> >> > check.
> >>
> >> I agree that inode (physical memory) is coupled to the virtual machine
> >> as a more generic concept.
> >>
> >> I was hoping that in the absence of CC hardware providing a HKID/ASID,
> >> the struct kvm pointer could act as a representation of the "virtual
> >> machine". You're definitely right that KVM could reuse a pointer and so
> >> that idea doesn't stand.
> >>
> >> I thought about generating UUIDs to represent "virtual machines" in the
> >> absence of CC hardware, and this UUID could be transferred during
> >> intra-host migration, but this still doesn't take host userspace out of
> >> the TCB. A malicious host VMM could just use the migration ioctl to copy
> >> the UUID to a malicious dumper VM, which would then pass checks with a
> >> gmem file linked to the malicious dumper VM. This is fine for HKID/ASIDs
> >> because the memory is encrypted; with UUIDs there's no memory
> >> encryption.
> >
> > I don't understand what problem you're trying to solve. I don't see a need to
> > provide a single concrete representation/definition of a "virtual machine". E.g.
> > there's no need for a formal definition to securely perform intrahost migration,
> > KVM just needs to ensure that the migration doesn't compromise guest security,
> > functionality, etc.
> >
> > That gets a lot more complex if the target KVM instance (module, not "struct kvm")
> > is a different KVM, e.g. when migrating to a different host. Then there needs to
> > be a way to attest that the target is trusted and whatnot, but that still doesn't
> > require there to be a formal definition of a "virtual machine".
> >
> >> Circling back to the original topic, was associating the file with
> >> struct kvm at gmem file creation time meant to constrain the use of the
> >> gmem file to one struct kvm, or one virtual machine, or something else?
> >
> > It's meant to keep things as simple as possible (relatively speaking). A 1:1
> > association between a KVM instance and a gmem instance means we don't have to
> > worry about the edge cases and oddities I pointed out earlier in this thread.
> >
>
> I looked through this thread again and re-read the edge cases and
> oddities that were pointed out earlier (last paragraph at [1]) and I
> think I understand better, and I have just one last clarification.
>
> It was previously mentioned that binding on creation time simplifies the
> lifecycle of memory:
>
> "(a) prevent a different VM from *ever* binding to the gmem instance" [1]
>
> Does this actually mean
>
> "prevent a different struct kvm from *ever* binding to this gmem file"
>
> ?

Yes.

> If so, then binding on creation
>
> + Makes the gmem *file* (and not just the bindings xarray) the binding
>   between struct kvm and the file.

Yep.

> + Simplifies the KVM-userspace contract to "this gmem file can only be
>   used with this struct kvm"

Yep.

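To make the "this gmem file can only be used with this struct kvm" contract
concrete, here is a minimal userspace sketch. It is not the KVM
implementation: struct toy_kvm, struct toy_gmem and every toy_* helper are
hypothetical names invented for illustration. The sketch assumes the file
takes a reference on its owner at creation, which is what keeps the owner's
memory from being freed and reused while the file exists (the pointer-reuse
concern raised earlier in the thread).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm; only the refcount matters here. */
struct toy_kvm {
	int refcount;
};

/* Hypothetical stand-in for a gmem file, bound to one owner for life. */
struct toy_gmem {
	struct toy_kvm *owner;	/* set once at creation, never rebound */
};

static void toy_kvm_get(struct toy_kvm *kvm)
{
	kvm->refcount++;
}

static void toy_kvm_put(struct toy_kvm *kvm)
{
	assert(kvm->refcount > 0);
	kvm->refcount--;
}

/* Creation is the only place the owner is set; the file pins its owner. */
static void toy_gmem_create(struct toy_gmem *gmem, struct toy_kvm *kvm)
{
	toy_kvm_get(kvm);
	gmem->owner = kvm;
}

/* Any other toy_kvm is rejected for the whole lifetime of the file. */
static int toy_gmem_bind(const struct toy_gmem *gmem, const struct toy_kvm *kvm)
{
	return gmem->owner == kvm ? 0 : -EINVAL;
}

/* Releasing the file drops the reference taken at creation. */
static void toy_gmem_release(struct toy_gmem *gmem)
{
	toy_kvm_put(gmem->owner);
	gmem->owner = NULL;
}
```

Because the owner cannot reach a refcount of zero while the file holds its
reference, the pointer comparison in toy_gmem_bind() cannot be fooled by a
freed-and-reallocated object in this toy model. A stashed pointer without the
reference would not give that guarantee.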
> Binding on creation doesn't offer any way to block the contents of the
> inode from being used with another "virtual machine" though, since we
> can have more than one gmem file pointing to the same inode, and the
> other gmem file is associated with another struct kvm. (And a struct kvm
> isn't associated 1:1 with a virtual machine [2])

Yep.

> The point about an inode needing to be coupled to a virtual machine as a
> thing [2] led me to try to find a single concrete representation of a
> "virtual machine".
>
> Is locking inode contents to a "virtual machine" outside the scope of
> gmem?

Yes, because it's not gmem's responsibility to define "secure" (from a guest
perspective) or "safe" (from a platform stability and correctness perspective).

E.g. inserting additional vCPUs into the VM a la the SEV migration helper thing
is comically insecure without some way to attest the helper code. Building policy
into the host kernel/KVM to do that attestation or otherwise determine what code
is/isn't safe for the guest to run is firmly out-of-scope. KVM can certainly
provide the tools and help with enforcement, but the policy needs to be defined
elsewhere. Even for something like pKVM, where KVM is in the TCB, KVM still doesn't
define who/what to trust (though KVM is heavily involved in enforcing security
stuff).

And for platform safety, e.g. not allowing two VMs to use the same HKID (ignoring
helpers for the moment), that's a KVM problem but NOT a gmem problem. The point
I raised in [2] about a gmem inode and thus the HKID/ASID associated with the
inode being bound to the "virtual machine" still holds true, but (a) it's not a
1:1 correlation, e.g. a VM could utilize multiple gmem inodes (all with the same
HKID/ASID), and (b) the safety and functional correctness aspects aren't unique to
gmem, e.g. even when gmem isn't in the picture, KVM needs to make sure it
manages ASIDs correctly. The only difference with SNP in the picture is that if
KVM screws up ASID management, bad things happen to the host, not (just) the guest.

> If so, then it is fine to bind on creation time, use a VM ioctl
> over a system ioctl, and the method of refcounting in gmem v12 is okay.
>
> [1] https://lore.kernel.org/lkml/ZNKv9ul2I7A4V7IF@google.com/
> [2] https://lore.kernel.org/lkml/ZOO782YGRY0YMuPu@google.com/
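
The ASID/HKID lifetime rule quoted earlier in this thread (several struct kvm
objects may share one ASID, but a helper must not use it after the owning VM
is destroyed) can be sketched in a few lines of userspace C. This is only an
illustration under assumed names: struct toy_asid, struct toy_vm and the
toy_vm_* helpers are hypothetical and do not correspond to KVM's SEV or TDX
code. The sketch also assumes the caller keeps the toy_asid storage alive
for as long as any toy_vm points at it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ASID/HKID slot that belongs to the "real" VM. */
struct toy_asid {
	unsigned int id;
	bool live;	/* cleared when the owning VM is destroyed */
};

/* Hypothetical stand-in for struct kvm; several may share one ASID. */
struct toy_vm {
	struct toy_asid *asid;
	bool is_owner;
};

static void toy_vm_init_owner(struct toy_vm *vm, struct toy_asid *asid,
			      unsigned int id)
{
	assert(asid != NULL);
	asid->id = id;
	asid->live = true;
	vm->asid = asid;
	vm->is_owner = true;
}

/* A helper borrows the owner's ASID; it never gets one of its own. */
static int toy_vm_init_helper(struct toy_vm *helper, const struct toy_vm *owner)
{
	if (!owner->is_owner || !owner->asid->live)
		return -EINVAL;
	helper->asid = owner->asid;
	helper->is_owner = false;
	return 0;
}

/* Every use checks that the owning VM still exists. */
static int toy_vm_use_asid(const struct toy_vm *vm, unsigned int *id)
{
	if (!vm->asid->live)
		return -ENODEV;
	*id = vm->asid->id;
	return 0;
}

/* Destroying the owner revokes the ASID for all helpers at once. */
static void toy_vm_destroy(struct toy_vm *vm)
{
	if (vm->is_owner)
		vm->asid->live = false;
}
```

Note that none of this involves gmem: the sketch manages ASID lifetime purely
between the VM objects, which matches the point that platform safety is a KVM
problem rather than a gmem problem.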