From: Sean Christopherson
Date: Tue, 15 Aug 2023 13:03:51 -0700
Subject: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
To: kvm-riscv@lists.infradead.org

On Tue, Aug 15, 2023, Ackerley Tng wrote:
> Sean Christopherson writes:
>
> >> I feel that memslots form a natural way of managing usage of the gmem
> >> file. When a memslot is created, it is using the file; hence we take a
> >> refcount on the gmem file, and as memslots are removed, we drop
> >> refcounts on the gmem file.
> >
> > Yes and no.  It's definitely more natural *if* the goal is to allow guest_memfd
> > memory to exist without being attached to a VM.  But I'm not at all convinced
> > that we want to allow that, or that it has desirable properties.  With TDX and
> > SNP in particular, I'm pretty sure that allowing memory to outlive the VM is
> > very undesirable (more below).
> >
>
> This is a little confusing. With the file/inode split in gmem, where the
> physical memory/data is attached to the inode and the file represents
> the VM's view of that memory, won't the memory outlive the VM?

Doh, I overloaded the term "VM".  By "VM" I meant the virtual machine as a "thing"
the rest of the world sees and interacts with, not the original "struct kvm" object.

Because yes, you're absolutely correct that the memory will outlive "struct kvm",
but it won't outlive the virtual machine, and specifically won't outlive the
ASID (SNP) / HKID (TDX) to which it's bound.

> This [1] POC was built based on that premise, that the gmem inode can be
> linked to another file and handed off to another VM, to facilitate
> intra-host migration, where the point is to save the work of rebuilding
> the VM's memory in the destination VM.
>
> With this, the bindings don't outlive the VM, but the data/memory
> does. I think this split design you proposed is really nice.
>
> >> The KVM pointer is shared among all the bindings in gmem's xarray, and we can
> >> enforce that a gmem file is used only with one VM:
> >>
> >> + When binding a memslot to the file, if a kvm pointer exists, it must
> >>   be the same kvm as the one in this binding
> >> + When the binding to the last memslot is removed from a file, NULL the
> >>   kvm pointer.
> >
> > Nullifying the KVM pointer isn't sufficient, because without additional actions
> > userspace could extract data from a VM by deleting its memslots and then binding
> > the guest_memfd to an attacker-controlled VM.  Or more likely with TDX and SNP,
> > induce badness by coercing KVM into mapping memory into a guest with the wrong
> > ASID/HKID.
> >
> > I can think of three ways to handle that:
> >
> >   (a) prevent a different VM from *ever* binding to the gmem instance
> >   (b) free/zero physical pages when unbinding
> >   (c) free/zero when binding to a different VM
> >
> > Option (a) is easy, but that pretty much defeats the purpose of decoupling
> > guest_memfd from a VM.
> >
> > Option (b) isn't hard to implement, but it screws up the lifecycle of the memory,
> > e.g. would require freeing/zeroing memory when a memslot is deleted.  That isn't
> > necessarily a deal-breaker, but it runs counter to how KVM memslots currently
> > operate.  Memslots are basically just weird page tables, e.g. deleting a memslot
> > doesn't have any impact on the underlying data in memory.  TDX throws a wrench in
> > this as removing a page from the Secure EPT is effectively destructive to the data
> > (can't be mapped back in to the VM without zeroing the data), but IMO that's an
> > oddity with TDX and not necessarily something we want to carry over to other VM
> > types.
> >
> > There would also be performance implications (probably a non-issue in practice),
> > and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. what
> > should happen if the last memslot (binding) is deleted, but there are outstanding
> > userspace mappings?
> >
> > Option (c) is better from a lifecycle perspective, but it adds its own flavor of
> > complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> > (effectively the VM pointer), and so a deferred reclaim doesn't really work for
> > TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must not
> > outlive the VM; KVM can't reuse an ASID if there are pages assigned to that ASID
> > in the RMP, i.e. until all memory belonging to the VM has been fully freed.
> >
>
> If we are on the same page that the memory should outlive the VM but not
> the bindings, then associating the gmem inode to a new VM should be a
> feature and not a bug.
>
> What do we want to defend against here?
>
> (a) Malicious host VMM
>
> For a malicious host VMM to read guest memory (with TDX and SNP), it can
> create a new VM with the same HKID/ASID as the victim VM, rebind the
> gmem inode to a VM crafted with an image that dumps the memory.
>
> I believe it is not possible for userspace to arbitrarily select a
> matching HKID unless userspace uses the intra-host migration ioctls, but if the
> migration ioctl is used, then EPTs are migrated and the memory dumper VM
> can't successfully run a different image from the victim VM. If the
> dumper VM needs to run the same image as the victim VM, then it would be
> a successful migration rather than an attack. (Perhaps we need to clean
> up some #MCs here but that can be a separate patch).

From a guest security perspective, throw TDX and SNP out the window.
As far as the design of guest_memfd is concerned, I truly do not care what security
properties they provide, I only care about whether or not KVM's support for TDX and
SNP is clean, robust, and functionally correct.

Note, I'm not saying I don't care about TDX/SNP.  What I'm saying is that I don't
want to design something that is beneficial only to what is currently a very
niche class of VMs that require specific flavors of hardware.

> (b) Malicious host kernel
>
> A malicious host kernel can allow a malicious host VMM to re-use a HKID
> for the dumper VM, but this isn't something a better gmem design can
> defend against.

Yep, completely out-of-scope.

> (c) Attacks using gmem for software-protected VMs
>
> Attacks using gmem for software-protected VMs are possible since there
> is no real encryption with HKID/ASID (yet?). The selftest for [1]
> actually uses this lack of encryption to test that the destination VM
> can read the source VM's memory after the migration. In the POC [1], as
> long as the destination VM knows where in the inode's memory to read,
> it can read what it wants to.

Encryption is not required to protect guest memory from less privileged software.
The selftests don't rely on lack of encryption, they rely on KVM incorporating
host userspace into the TCB.

Just because this RFC doesn't remove the VMM from the TCB for SW-protected VMs,
doesn't mean we _can't_ remove the VMM from the TCB.  pKVM has already shown that
such an implementation is possible.  We didn't tackle pKVM-like support in the
initial implementation because it's non-trivial, doesn't yet have a concrete use
case to fund/drive development, and would have significantly delayed support for
the use cases people do actually care about.

There are certainly benefits from memory being encrypted, but it's neither a
requirement nor a panacea, as proven by the never-ending stream of speculative
execution attacks.

> This is a problem for software-protected VMs, but I feel that it is also a
> separate issue from gmem's design.

No, I don't want guest_memfd to be just a vehicle for SNP/TDX VMs.  Having line
of sight to removing host userspace from the TCB is absolutely a must-have for me,
and having line of sight to improving KVM's security posture for "regular" VMs is
even more of a must-have.  If guest_memfd doesn't provide us a very direct path to
(eventually) achieving those goals, then IMO it's a failure.

Which leads me to:

(d) Buggy components

Today, for all intents and purposes, guest memory *must* be mapped writable in
the VMM, which means it is all too easy for a benign-but-buggy host component to
corrupt guest memory.  There are ways to mitigate potential problems, e.g. by
developing userspace to adhere to the principle of least privilege inasmuch as
possible, but such mitigations would be far less robust than what can be achieved
via guest_memfd, and practically speaking I don't see us (Google, but also KVM in
general) making progress on deprivileging userspace without forcing the issue.

> >> Could binding gmem files not on creation, but at memslot configuration
> >> time be sufficient and simpler?
> >
> > After working through the flows, I think binding on-demand would simplify the
> > refcounting (stating the obvious), but complicate the lifecycle of the memory as
> > well as the contract between KVM and userspace,
>
> If we are on the same page that the memory should outlive the VM but not
> the bindings, does it still complicate the lifecycle of the memory and
> the userspace/KVM contract? Could it just be a different contract?

Not entirely sure I understand what you're asking.  Does this question go away
with my clarification about struct kvm vs. virtual machine?

> > and would break the separation of
> > concerns between the inode (physical memory / data) and file (VM's view / mappings).
>
> Binding on-demand is orthogonal to the separation of concerns between
> inode and file, because it can be built regardless of whether we do the
> gmem file/inode split.
>
> + This flip-the-refcounting POC is built with the file/inode split and
> + In [2] (the delayed binding approach to solve intra-host migration), I
>   also tried flipping the refcounting, and that without the gmem
>   file/inode split. (Refcounting in [2] is buggy because the file can't
>   take a refcount on KVM, but it would work without taking that refcount)
>
> [1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
> [2] https://github.com/googleprodkernel/linux-cc/commit/dd5ac5e53f14a1ef9915c9c1e4cc1006a40b49df
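
The single-VM binding rule and the rebinding concern raised in the thread can be
sketched as a small userspace model in C.  This is only an illustration under
stated assumptions, not the KVM implementation: `struct vm`, `struct gmem_inode`,
`struct gmem_file`, `gmem_bind()`, `gmem_unbind()` and `enum rebind_policy` are
all hypothetical names, and a tiny byte array stands in for guest pages.  The
naive policy only NULLs the kvm pointer on the last unbind; the second policy
models option (c), zeroing the data when a different VM binds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GMEM_PAGES 4
#define PAGE_BYTES 16 /* tiny "pages" keep the model readable */

/* Hypothetical stand-in for "struct kvm"; only its identity matters here. */
struct vm {
	int id;
};

/* Inode side of the split: owns the physical memory / data. */
struct gmem_inode {
	uint8_t data[GMEM_PAGES][PAGE_BYTES];
	const struct vm *last_owner; /* last VM that bound; used by option (c) */
};

/* File side of the split: one VM's view (its bindings) of the inode. */
struct gmem_file {
	struct gmem_inode *inode;
	const struct vm *kvm;     /* NULL while no memslot is bound */
	unsigned int nr_bindings; /* number of bound memslots */
};

enum rebind_policy {
	REBIND_NAIVE,          /* only track the kvm pointer */
	REBIND_ZERO_ON_NEW_VM, /* option (c): zero when a different VM binds */
};

/* Bind one memslot of @vm to @f.  Returns 0 on success, -1 on conflict. */
int gmem_bind(struct gmem_file *f, const struct vm *vm,
	      enum rebind_policy policy)
{
	/* Rule 1: if a kvm pointer exists, it must match the binding VM. */
	if (f->kvm && f->kvm != vm)
		return -1;

	if (!f->kvm) {
		/* Option (c): scrub data left behind by a different VM. */
		if (policy == REBIND_ZERO_ON_NEW_VM &&
		    f->inode->last_owner && f->inode->last_owner != vm)
			memset(f->inode->data, 0, sizeof(f->inode->data));

		f->kvm = vm;
		f->inode->last_owner = vm;
	}

	f->nr_bindings++;
	return 0;
}

/* Unbind one memslot.  Rule 2: NULL the kvm pointer on the last unbind. */
void gmem_unbind(struct gmem_file *f)
{
	assert(f->nr_bindings > 0);
	if (--f->nr_bindings == 0)
		f->kvm = NULL; /* the inode's data is deliberately untouched */
}
```

In this model, binding a second VM while the first is still bound fails, which
is the enforcement Ackerley describes.  Once the last binding is dropped under
`REBIND_NAIVE`, a different VM can bind and still read the previous VM's bytes,
which is the extraction path Sean points out; with `REBIND_ZERO_ON_NEW_VM` the
same sequence hands the new VM zeroed memory.  The model says nothing about the
TDX/SNP reclaim constraints, which are the reason option (c) is described as
hard in practice.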
U05QLiAgV2hhdCBJJ20gc2F5aW5nIGlzIHRoYXQgSSBkb24ndAp3YW50IHRvIGRlc2lnbiBzb21l dGhpbmcgdGhhdCBpcyBiZW5lZmljaWFsIG9ubHkgdG8gd2hhdCBpcyBjdXJyZW50bHkgYSB2ZXJ5 Cm5pY2hlIGNsYXNzIG9mIFZNcyB0aGF0IHJlcXVpcmUgc3BlY2lmaWMgZmxhdm9ycyBvZiBoYXJk d2FyZS4KCj4gKGIpIE1hbGljaW91cyBob3N0IGtlcm5lbAo+IAo+IEEgbWFsaWNpb3VzIGhvc3Qg a2VybmVsIGNhbiBhbGxvdyBhIG1hbGljaW91cyBob3N0IFZNTSB0byByZS11c2UgYSBIS0lECj4g Zm9yIHRoZSBkdW1wZXIgVk0sIGJ1dCB0aGlzIGlzbid0IHNvbWV0aGluZyBhIGJldHRlciBnbWVt IGRlc2lnbiBjYW4KPiBkZWZlbmQgYWdhaW5zdC4KClllcCwgY29tcGxldGVseSBvdXQtb2Ytc2Nv cGUuCgo+IChjKSBBdHRhY2tzIHVzaW5nIGdtZW0gZm9yIHNvZnR3YXJlLXByb3RlY3RlZCBWTXMK PiAKPiBBdHRhY2tzIHVzaW5nIGdtZW0gZm9yIHNvZnR3YXJlLXByb3RlY3RlZCBWTXMgYXJlIHBv c3NpYmxlIHNpbmNlIHRoZXJlCj4gaXMgbm8gcmVhbCBlbmNyeXB0aW9uIHdpdGggSEtJRC9BU0lE ICh5ZXQ/KS4gVGhlIHNlbGZ0ZXN0IGZvciBbMV0KPiBhY3R1YWxseSB1c2VzIHRoaXMgbGFjayBv ZiBlbmNyeXB0aW9uIHRvIHRlc3QgdGhhdCB0aGUgZGVzdGluYXRpb24gVk0KPiBjYW4gcmVhZCB0 aGUgc291cmNlIFZNJ3MgbWVtb3J5IGFmdGVyIHRoZSBtaWdyYXRpb24uIEluIHRoZSBQT0MgWzFd LCBhcwo+IGxvbmcgYXMgYm90aCBkZXN0aW5hdGlvbiBWTSBrbm93cyB3aGVyZSBpbiB0aGUgaW5v ZGUncyBtZW1vcnkgdG8gcmVhZCwKPiBpdCBjYW4gcmVhZCB3aGF0IGl0IHdhbnRzIHRvLgogCkVu Y3J5cHRpb24gaXMgbm90IHJlcXVpcmVkIHRvIHByb3RlY3QgZ3Vlc3QgbWVtb3J5IGZyb20gbGVz cyBwcml2aWxlZ2VkIHNvZnR3YXJlLgpUaGUgc2VsZnRlc3RzIGRvbid0IHJlbHkgb24gbGFjayBv ZiBlbmNyeXB0aW9uLCB0aGV5IHJlbHkgb24gS1ZNIGluY29ycG9yYXRpbmcKaG9zdCB1c2Vyc3Bh Y2UgaW50byB0aGUgVENCLgoKSnVzdCBiZWNhdXNlIHRoaXMgUkZDIGRvZXNuJ3QgcmVtb3ZlIHRo ZSBWTU0gZnJvbSB0aGUgVENCIGZvciBTVy1wcm90ZWN0ZWQgVk1TLApkb2Vzbid0IG1lYW4gd2Ug X2Nhbid0XyByZW1vdmUgdGhlIFZNTSBmcm9tIHRoZSBUQ0IuICBwS1ZNIGhhcyBhbHJlYWR5IHNo b3duIHRoYXQKc3VjaCBhbiBpbXBsZW1lbnRhdGlvbiBpcyBwb3NzaWJsZS4gIFdlIGRpZG4ndCB0 YWNrbGUgcEtWTS1saWtlIHN1cHBvcnQgaW4gdGhlCmluaXRpYWwgaW1wbGVtZW50YXRpb24gYmVj YXVzZSBpdCdzIG5vbi10cml2aWFsLCBkb2Vzbid0IHlldCBoYXZlIGEgY29uY3JldGUgdXNlCmNh c2UgdG8gZnVuZC9kcml2ZSBkZXZlbG9wbWVudCwgYW5kIHdvdWxkIGhhdmUgc2lnbmlmaWNhbnRs 
eSBkZWxheWVkIHN1cHBvcnQgZm9yCnRoZSB1c2UgY2FzZXMgcGVvcGxlIGRvIGFjdHVhbGx5IGNh cmUgYWJvdXQuCgpUaGVyZSBhcmUgY2VydGFpbmx5IGJlbmVmaXRzIGZyb20gbWVtb3J5IGJlaW5n IGVuY3J5cHRlZCwgYnV0IGl0J3MgbmVpdGhlciBhCnJlcXVpcmVtZW50IG5vciBhIHBhbmFjZWEs IGFzIHByb3ZlbiBieSB0aGUgbmV2ZXIgZW5kaW5nIHN0cmVhbSBvZiBzcGVjdWxhdGl2ZQpleGVj dXRpb24gYXR0YWNrcy4KIAo+IFRoaXMgaXMgYSBwcm9ibGVtIGZvciBzb2Z0d2FyZS1wcm90ZWN0 ZWQgVk1zLCBidXQgSSBmZWVsIHRoYXQgaXQgaXMgYWxzbyBhCj4gc2VwYXJhdGUgaXNzdWUgZnJv bSBnbWVtJ3MgZGVzaWduLgoKTm8sIEkgZG9uJ3Qgd2FudCBndWVzdF9tZW1mZCB0byBiZSBqdXN0 IGJlIGEgdmVoaWNsZSBmb3IgU05QL1REWCBWTXMuICBIYXZpbmcgbGluZQpvZiBzaWdodCB0byBy ZW1vdmluZyBob3N0IHVzZXJzcGFjZSBmcm9tIHRoZSBUQ0IgaXMgYWJzb2x1dGVseSBhIG11c3Qg aGF2ZSBmb3IgbWUsCmFuZCBoYXZpbmcgbGluZSBvZiBzaWdodCB0byBpbXByb3ZpbmcgS1ZNJ3Mg c2VjdXJpdHkgcG9zdHVyZSBmb3IgInJlZ3VsYXIiIFZNcyBpcwpldmVuIG1vcmUgb2YgYSBtdXN0 IGhhdmUuICBJZiBndWVzdF9tZW1mZCBkb2Vzbid0IHByb3ZpZGUgdXMgYSB2ZXJ5IGRpcmVjdCBw YXRoIHRvCihldmVudHVhbGx5KSBhY2hpZXZpbmcgdGhvc2UgZ29hbHMsIHRoZW4gSU1PIGl0J3Mg YSBmYWlsdXJlLgoKV2hpY2ggbGVhZHMgbWUgdG86CgooZCkgQnVnZ3kgY29tcG9uZW50cwoKVG9k YXksIGZvciBhbGwgaW50ZW50cyBhbmQgcHVycG9zZXMsIGd1ZXN0IG1lbW9yeSAqbXVzdCogYmUg bWFwcGVkIHdyaXRhYmxlIGluCnRoZSBWTU0sIHdoaWNoIG1lYW5zIGl0IGlzIGFsbCB0b28gZWFz eSBmb3IgYSBiZW5pZ24tYnV0LWJ1Z2d5IGhvc3QgY29tcG9uZW50IHRvCmNvcnJ1cHQgZ3Vlc3Qg bWVtb3J5LiAgVGhlcmUgYXJlIHdheXMgdG8gbWl0aWdhdGUgcG90ZW50aWFsIHByb2JsZW1zLCBl LmcuIGJ5CmRldmVsb3BpbmcgdXNlcnNwYWNlIHRvIGFkaGVyZSB0byB0aGUgcHJpbmNpcGxlIG9m IGxlYXN0IHByaXZpbGVnZSBpbmFzbXVjaCBhcwpwb3NzaWJsZSwgYnV0IHN1Y2ggbWl0aWdhdGlv bnMgd291bGQgYmUgZmFyIGxlc3Mgcm9idXN0IHRoYW4gd2hhdCBjYW4gYmUgYWNoaWV2ZWQKdmlh IGd1ZXN0X21lbWZkLCBhbmQgcHJhY3RpY2FsbHkgc3BlYWtpbmcgSSBkb24ndCBzZWUgdXMgKEdv b2dsZSwgYnV0IGFsc28gS1ZNIGluCmdlbmVyYWwpIG1ha2luZyBwcm9ncmVzcyBvbiBkZXByaXZp bGVnaW5nIHVzZXJzcGFjZSB3aXRob3V0IGZvcmNpbmcgdGhlIGlzc3VlLgoKPiA+PiBDb3VsZCBi aW5kaW5nIGdtZW0gZmlsZXMgbm90IG9uIGNyZWF0aW9uLCBidXQgYXQgbWVtc2xvdCBjb25maWd1 
cmF0aW9uCj4gPj4gdGltZSBiZSBzdWZmaWNpZW50IGFuZCBzaW1wbGVyPwo+ID4KPiA+IEFmdGVy IHdvcmtpbmcgdGhyb3VnaCB0aGUgZmxvd3MsIEkgdGhpbmsgYmluZGluZyBvbi1kZW1hbmQgd291 bGQgc2ltcGxpZnkgdGhlCj4gPiByZWZjb3VudGluZyAoc3RhdGluZyB0aGUgb2J2aW91cyksIGJ1 dCBjb21wbGljYXRlIHRoZSBsaWZlY3ljbGUgb2YgdGhlIG1lbW9yeSBhcwo+ID4gd2VsbCBhcyB0 aGUgY29udHJhY3QgYmV0d2VlbiBLVk0gYW5kIHVzZXJzcGFjZSwKPiAKPiBJZiB3ZSBhcmUgb24g dGhlIHNhbWUgcGFnZSB0aGF0IHRoZSBtZW1vcnkgc2hvdWxkIG91dGxpdmUgdGhlIFZNIGJ1dCBu b3QKPiB0aGUgYmluZGluZ3MsIGRvZXMgaXQgc3RpbGwgY29tcGxpY2F0ZSB0aGUgbGlmZWN5Y2xl IG9mIHRoZSBtZW1vcnkgYW5kCj4gdGhlIHVzZXJzcGFjZS9LVk0gY29udHJhY3Q/IENvdWxkIGl0 IGp1c3QgYmUgYSBkaWZmZXJlbnQgY29udHJhY3Q/CgpOb3QgZW50aXJlbHkgc3VyZSBJIHVuZGVy c3RhbmQgd2hhdCB5b3UncmUgYXNraW5nLiAgRG9lcyB0aGlzIHF1ZXN0aW9uIGdvIGF3YXkKd2l0 aCBteSBjbGFyaWZpY2F0aW9uIGFib3V0IHN0cnVjdCBrdm0gdnMuIHZpcnR1YWwgbWFjaGluZT8K Cj4gPiBhbmQgd291bGQgYnJlYWsgdGhlIHNlcGFyYXRpb24gb2YKPiA+IGNvbmNlcm5zIGJldHdl ZW4gdGhlIGlub2RlIChwaHlzaWNhbCBtZW1vcnkgLyBkYXRhKSBhbmQgZmlsZSAoVk0ncyB2aWV3 IC8gbWFwcGluZ3MpLgo+IAo+IEJpbmRpbmcgb24tZGVtYW5kIGlzIG9ydGhvZ29uYWwgdG8gdGhl IHNlcGFyYXRpb24gb2YgY29uY2VybnMgYmV0d2Vlbgo+IGlub2RlIGFuZCBmaWxlLCBiZWNhdXNl IGl0IGNhbiBiZSBidWlsdCByZWdhcmRsZXNzIG9mIHdoZXRoZXIgd2UgZG8gdGhlCj4gZ21lbSBm aWxlL2lub2RlIHNwbGl0Lgo+IAo+ICsgVGhpcyBmbGlwLXRoZS1yZWZjb3VudGluZyBQT0MgaXMg YnVpbHQgd2l0aCB0aGUgZmlsZS9pbm9kZSBzcGxpdCBhbmQKPiArIEluIFsyXSAodGhlIGRlbGF5 ZWQgYmluZGluZyBhcHByb2FjaCB0byBzb2x2ZSBpbnRyYS1ob3N0IG1pZ3JhdGlvbiksIEkKPiAg IGFsc28gdHJpZWQgZmxpcHBpbmcgdGhlIHJlZmNvdW50aW5nLCBhbmQgdGhhdCB3aXRob3V0IHRo ZSBnbWVtCj4gICBmaWxlL2lub2RlIHNwbGl0LiAoUmVmY291bnRpbmcgaW4gWzJdIGlzIGJ1Z2d5 IGJlY2F1c2UgdGhlIGZpbGUgY2FuJ3QKPiAgIHRha2UgYSByZWZjb3VudCBvbiBLVk0sIGJ1dCBp dCB3b3VsZCB3b3JrIHdpdGhvdXQgdGFraW5nIHRoYXQgcmVmY291bnQpCj4gCj4gWzFdIGh0dHBz Oi8vbG9yZS5rZXJuZWwub3JnL2xrbWwvY292ZXIuMTY5MTQ0Njk0Ni5naXQuYWNrZXJsZXl0bmdA Z29vZ2xlLmNvbS9ULwo+IFsyXSBodHRwczovL2dpdGh1Yi5jb20vZ29vZ2xlcHJvZGtlcm5lbC9s 
aW51eC1jYy9jb21taXQvZGQ1YWM1ZTUzZjE0YTFlZjk5MTVjOWMxZTRjYzEwMDZhNDBiNDlkZgoK X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KbGludXgtcmlz Y3YgbWFpbGluZyBsaXN0CmxpbnV4LXJpc2N2QGxpc3RzLmluZnJhZGVhZC5vcmcKaHR0cDovL2xp c3RzLmluZnJhZGVhZC5vcmcvbWFpbG1hbi9saXN0aW5mby9saW51eC1yaXNjdgo= From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 75DA5C001E0 for ; Tue, 15 Aug 2023 20:04:51 +0000 (UTC) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=tMP747Pn; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4RQMgP6PwLz3cRZ for ; Wed, 16 Aug 2023 06:04:49 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20221208 header.b=tMP747Pn; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=flex--seanjc.bounces.google.com (client-ip=2607:f8b0:4864:20::b49; helo=mail-yb1-xb49.google.com; envelope-from=3kdrbzaykdhoqcylhaemmejc.amkjglsvnna-bctjgqrq.mxjyzq.mpe@flex--seanjc.bounces.google.com; receiver=lists.ozlabs.org) Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4RQMfQ2F7Pz30KG for ; Wed, 16 Aug 2023 
06:03:56 +1000 (AEST) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d669fcad15cso5239102276.0 for ; Tue, 15 Aug 2023 13:03:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1692129833; x=1692734633; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=UdgiQF5eHDQGC93K0blyfdFyGr+2XYdNo+ZKSbJ8j6I=; b=tMP747PnrWIY3S2IgIlpTGvys1qcp7gvhf2tEkTincr/WVeeoLtSV4nAqvShetKLlE L1iZ2gBuYmK67h8Uw9IKCA2gN3qOM0GGC3XehHtQr0jMkW0ZJ/fXrRn6EfVMRIVGETms zlDBmwUShl2Q3VsZCRyHfSqWCKWeTFGcKSG1iAXT03s7rjvYB7wm+4XZLwdERLEqJnHl s+OEqS8CHaugxEaHqjEM4nmoUN0SaAqSQyAqE5Qleml+ks20SUEByr18WbK2nerA03bM m/EmYRW9TIB8/DRpffNeX6d54vgVpq23mcljzKseBhsuD7QuMUsIfvRucPdmyp9uk8JZ laEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692129833; x=1692734633; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=UdgiQF5eHDQGC93K0blyfdFyGr+2XYdNo+ZKSbJ8j6I=; b=QHLLMJeIk6H4msZf7bGBL264yVX2dLbH2UvxtAXKaF1J3tp0r7s5p53Cg1xGNGH3WS JAZZgFMmvv7eYFZbiiIn/n6PPii3dfCrlf2+b0F3nx2XAtfTl4WtZEWEd5Sz5+yB3BTx d+dvARtaMv6n6tdsZbS1uTrxuvDFgJ2wIR4WPZKjIlBbw1fF2dXEqLBIVvF+QGcrdbRc 0UuptGDJeqQ/LWA1PPqWNbyAG4h85KUUB/tNBrtGXPKeMBmjzu7w0KChhvWPIm/Hf0w2 +nZW9ULvqKv1gfpRJqf3Ms+Y5DJC75H8d3R+NOQhnuaWvYP/LVR7BPaEaBCpqB+jpW8B ZlTA== X-Gm-Message-State: AOJu0YxdB4kWzsmc3ZDmk+AoJaRy5dn++sMmKIY624sa4KDr/2pvsK/a 5cnU5Aok+v1AT2nVcBhJA5G0unl4Jb4= X-Google-Smtp-Source: AGHT+IF8PVKZ3/Mllt1qn4YqGQbrszG6AYhXRKKw5U6+m8LONIq9kWubfU+atsLqR25KKJSwV9mEz+vM5vc= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:565:b0:d18:73fc:40af with SMTP id a5-20020a056902056500b00d1873fc40afmr176842ybt.5.1692129833519; Tue, 15 Aug 2023 13:03:53 -0700 (PDT) Date: Tue, 15 Aug 2023 
13:03:51 -0700
Mime-Version: 1.0
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
From: Sean Christopherson
To: Ackerley Tng
Content-Type: text/plain; charset="utf-8"
X-BeenThere: linuxppc-dev@lists.ozlabs.org
Precedence: list
List-Id: Linux on PowerPC Developers Mail List
Cc: kvm@vger.kernel.org, david@redhat.com, yu.c.zhang@linux.intel.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, chao.p.peng@linux.intel.com,
 linux-riscv@lists.infradead.org, isaku.yamahata@gmail.com, paul@paul-moore.com,
 maz@kernel.org, chenhuacai@kernel.org, jmorris@namei.org, willy@infradead.org,
 wei.w.wang@intel.com, tabba@google.com, jarkko@kernel.org, serge@hallyn.com,
 mail@maciej.szmigiero.name, aou@eecs.berkeley.edu, vbabka@suse.cz,
 michael.roth@amd.com, paul.walmsley@sifive.com, kvmarm@lists.linux.dev,
 linux-arm-kernel@lists.infradead.org, qperret@google.com, liam.merwick@oracle.com,
 linux-mips@vger.kernel.org, oliver.upton@linux.dev,
 linux-security-module@vger.kernel.org, palmer@dabbelt.com,
 kvm-riscv@lists.infradead.org, anup@brainfault.org, linux-fsdevel@vger.kernel.org,
 pbonzini@redhat.com, akpm@linux-foundation.org, vannapurve@google.com,
 linuxppc-dev@lists.ozlabs.org, kirill.shutemov@linux.intel.com
Sender: "Linuxppc-dev"

On Tue, Aug 15, 2023, Ackerley Tng wrote:
> Sean Christopherson writes:
>
> >> I feel that memslots form a natural way of managing usage of the gmem
> >> file. When a memslot is created, it is using the file; hence we take a
> >> refcount on the gmem file, and as memslots are removed, we drop
> >> refcounts on the gmem file.
> >
> > Yes and no.  It's definitely more natural *if* the goal is to allow guest_memfd
> > memory to exist without being attached to a VM.  But I'm not at all convinced
> > that we want to allow that, or that it has desirable properties.  With TDX and
> > SNP in particular, I'm pretty sure that allowing memory to outlive the VM is
> > very undesirable (more below).
> >
>
> This is a little confusing, with the file/inode split in gmem where the
> physical memory/data is attached to the inode and the file represents
> the VM's view of that memory, won't the memory outlive the VM?

Doh, I overloaded the term "VM".  By "VM" I meant the virtual machine as a "thing"
the rest of the world sees and interacts with, not the original "struct kvm" object.

Because yes, you're absolutely correct that the memory will outlive "struct kvm",
but it won't outlive the virtual machine, and specifically won't outlive the
ASID (SNP) / HKID (TDX) to which it's bound.

> This [1] POC was built based on that premise, that the gmem inode can be
> linked to another file and handed off to another VM, to facilitate
> intra-host migration, where the point is to save the work of rebuilding
> the VM's memory in the destination VM.
>
> With this, the bindings don't outlive the VM, but the data/memory
> does. I think this split design you proposed is really nice.
>
> >> The KVM pointer is shared among all the bindings in gmem’s xarray, and we can
> >> enforce that a gmem file is used only with one VM:
> >>
> >> + When binding a memslot to the file, if a kvm pointer exists, it must
> >>   be the same kvm as the one in this binding
> >> + When the binding to the last memslot is removed from a file, NULL the
> >>   kvm pointer.
> >
> > Nullifying the KVM pointer isn't sufficient, because without additional actions
> > userspace could extract data from a VM by deleting its memslots and then binding
> > the guest_memfd to an attacker-controlled VM.  Or more likely with TDX and SNP,
> > induce badness by coercing KVM into mapping memory into a guest with the wrong
> > ASID/HKID.
> >
> > I can think of three ways to handle that:
> >
> >   (a) prevent a different VM from *ever* binding to the gmem instance
> >   (b) free/zero physical pages when unbinding
> >   (c) free/zero when binding to a different VM
> >
> > Option (a) is easy, but that pretty much defeats the purpose of decoupling
> > guest_memfd from a VM.
> >
> > Option (b) isn't hard to implement, but it screws up the lifecycle of the memory,
> > e.g. would require memory when a memslot is deleted.  That isn't necessarily a
> > deal-breaker, but it runs counter to how KVM memslots currently operate.  Memslots
> > are basically just weird page tables, e.g. deleting a memslot doesn't have any
> > impact on the underlying data in memory.  TDX throws a wrench in this as removing
> > a page from the Secure EPT is effectively destructive to the data (can't be mapped
> > back in to the VM without zeroing the data), but IMO that's an oddity with TDX and
> > not necessarily something we want to carry over to other VM types.
> >
> > There would also be performance implications (probably a non-issue in practice),
> > and weirdness if/when we get to sharing, linking and/or mmap()ing gmem.  E.g. what
> > should happen if the last memslot (binding) is deleted, but there are outstanding
> > userspace mappings?
> >
> > Option (c) is better from a lifecycle perspective, but it adds its own flavor of
> > complexity, e.g. the performant way to reclaim TDX memory requires the TDMR
> > (effectively the VM pointer), and so a deferred reclaim doesn't really work for
> > TDX.  And I'm pretty sure it *can't* work for SNP, because RMP entries must not
> > outlive the VM; KVM can't reuse an ASID if there are pages assigned to that ASID
> > in the RMP, i.e. until all memory belonging to the VM has been fully freed.
> >
>
> If we are on the same page that the memory should outlive the VM but not
> the bindings, then associating the gmem inode to a new VM should be a
> feature and not a bug.
>
> What do we want to defend against here?
>
> (a) Malicious host VMM
>
> For a malicious host VMM to read guest memory (with TDX and SNP), it can
> create a new VM with the same HKID/ASID as the victim VM, rebind the
> gmem inode to a VM crafted with an image that dumps the memory.
>
> I believe it is not possible for userspace to arbitrarily select a
> matching HKID unless userspace uses the intra-host migration ioctls, but if the
> migration ioctl is used, then EPTs are migrated and the memory dumper VM
> can't successfully run a different image from the victim VM. If the
> dumper VM needs to run the same image as the victim VM, then it would be
> a successful migration rather than an attack. (Perhaps we need to clean
> up some #MCs here but that can be a separate patch).

From a guest security perspective, throw TDX and SNP out the window.  As far as
the design of guest_memfd is concerned, I truly do not care what security properties
they provide, I only care about whether or not KVM's support for TDX and SNP is
clean, robust, and functionally correct.

Note, I'm not saying I don't care about TDX/SNP.  What I'm saying is that I don't
want to design something that is beneficial only to what is currently a very
niche class of VMs that require specific flavors of hardware.

> (b) Malicious host kernel
>
> A malicious host kernel can allow a malicious host VMM to re-use a HKID
> for the dumper VM, but this isn't something a better gmem design can
> defend against.

Yep, completely out-of-scope.

> (c) Attacks using gmem for software-protected VMs
>
> Attacks using gmem for software-protected VMs are possible since there
> is no real encryption with HKID/ASID (yet?). The selftest for [1]
> actually uses this lack of encryption to test that the destination VM
> can read the source VM's memory after the migration. In the POC [1], as
> long as the destination VM knows where in the inode's memory to read,
> it can read what it wants to.

Encryption is not required to protect guest memory from less privileged software.
The selftests don't rely on lack of encryption, they rely on KVM incorporating
host userspace into the TCB.

Just because this RFC doesn't remove the VMM from the TCB for SW-protected VMs,
doesn't mean we _can't_ remove the VMM from the TCB.  pKVM has already shown that
such an implementation is possible.  We didn't tackle pKVM-like support in the
initial implementation because it's non-trivial, doesn't yet have a concrete use
case to fund/drive development, and would have significantly delayed support for
the use cases people do actually care about.

There are certainly benefits from memory being encrypted, but it's neither a
requirement nor a panacea, as proven by the never ending stream of speculative
execution attacks.

> This is a problem for software-protected VMs, but I feel that it is also a
> separate issue from gmem's design.

No, I don't want guest_memfd to just be a vehicle for SNP/TDX VMs.  Having line
of sight to removing host userspace from the TCB is absolutely a must have for me,
and having line of sight to improving KVM's security posture for "regular" VMs is
even more of a must have.  If guest_memfd doesn't provide us a very direct path to
(eventually) achieving those goals, then IMO it's a failure.

Which leads me to:

(d) Buggy components

Today, for all intents and purposes, guest memory *must* be mapped writable in
the VMM, which means it is all too easy for a benign-but-buggy host component to
corrupt guest memory.  There are ways to mitigate potential problems, e.g. by
developing userspace to adhere to the principle of least privilege inasmuch as
possible, but such mitigations would be far less robust than what can be achieved
via guest_memfd, and practically speaking I don't see us (Google, but also KVM in
general) making progress on deprivileging userspace without forcing the issue.

> >> Could binding gmem files not on creation, but at memslot configuration
> >> time be sufficient and simpler?
> >
> > After working through the flows, I think binding on-demand would simplify the
> > refcounting (stating the obvious), but complicate the lifecycle of the memory as
> > well as the contract between KVM and userspace,
>
> If we are on the same page that the memory should outlive the VM but not
> the bindings, does it still complicate the lifecycle of the memory and
> the userspace/KVM contract? Could it just be a different contract?

Not entirely sure I understand what you're asking.  Does this question go away
with my clarification about struct kvm vs. virtual machine?

> > and would break the separation of
> > concerns between the inode (physical memory / data) and file (VM's view / mappings).
>
> Binding on-demand is orthogonal to the separation of concerns between
> inode and file, because it can be built regardless of whether we do the
> gmem file/inode split.
>
> + This flip-the-refcounting POC is built with the file/inode split and
> + In [2] (the delayed binding approach to solve intra-host migration), I
>   also tried flipping the refcounting, and that without the gmem
>   file/inode split. (Refcounting in [2] is buggy because the file can't
>   take a refcount on KVM, but it would work without taking that refcount)
>
> [1] https://lore.kernel.org/lkml/cover.1691446946.git.ackerleytng@google.com/T/
> [2] https://github.com/googleprodkernel/linux-cc/commit/dd5ac5e53f14a1ef9915c9c1e4cc1006a40b49df
ZXJlPwo+IAo+IChhKSBNYWxpY2lvdXMgaG9zdCBWTU0KPiAKPiBGb3IgYSBtYWxpY2lvdXMgaG9z dCBWTU0gdG8gcmVhZCBndWVzdCBtZW1vcnkgKHdpdGggVERYIGFuZCBTTlApLCBpdCBjYW4KPiBj cmVhdGUgYSBuZXcgVk0gd2l0aCB0aGUgc2FtZSBIS0lEL0FTSUQgYXMgdGhlIHZpY3RpbSBWTSwg cmViaW5kIHRoZQo+IGdtZW0gaW5vZGUgdG8gYSBWTSBjcmFmdGVkIHdpdGggYW4gaW1hZ2UgdGhh dCBkdW1wcyB0aGUgbWVtb3J5Lgo+IAo+IEkgYmVsaWV2ZSBpdCBpcyBub3QgcG9zc2libGUgZm9y IHVzZXJzcGFjZSB0byBhcmJpdHJhcmlseSBzZWxlY3QgYQo+IG1hdGNoaW5nIEhLSUQgdW5sZXNz IHVzZXJzcGFjZSB1c2VzIHRoZSBpbnRyYS1ob3N0IG1pZ3JhdGlvbiBpb2N0bHMsIGJ1dCBpZiB0 aGUKPiBtaWdyYXRpb24gaW9jdGwgaXMgdXNlZCwgdGhlbiBFUFRzIGFyZSBtaWdyYXRlZCBhbmQg dGhlIG1lbW9yeSBkdW1wZXIgVk0KPiBjYW4ndCBzdWNjZXNzZnVsbHkgcnVuIGEgZGlmZmVyZW50 IGltYWdlIGZyb20gdGhlIHZpY3RpbSBWTS4gSWYgdGhlCj4gZHVtcGVyIFZNIG5lZWRzIHRvIHJ1 biB0aGUgc2FtZSBpbWFnZSBhcyB0aGUgdmljdGltIFZNLCB0aGVuIGl0IHdvdWxkIGJlCj4gYSBz dWNjZXNzZnVsIG1pZ3JhdGlvbiByYXRoZXIgdGhhbiBhbiBhdHRhY2suIChQZXJoYXBzIHdlIG5l ZWQgdG8gY2xlYW4KPiB1cCBzb21lICNNQ3MgaGVyZSBidXQgdGhhdCBjYW4gYmUgYSBzZXBhcmF0 ZSBwYXRjaCkuCgpGcm9tIGEgZ3Vlc3Qgc2VjdXJpdHkgcGVyc3BlY3RpdmUsIHRocm93IFREWCBh bmQgU05QIG91dCB0aGUgd2luZG93LiAgQXMgZmFyIGFzCnRoZSBkZXNpZ24gb2YgZ3Vlc3RfbWVt ZmQgaXMgY29uY2VybmVkLCBJIHRydWx5IGRvIG5vdCBjYXJlIHdoYXQgc2VjdXJpdHkgcHJvcGVy dGllcwp0aGV5IHByb3ZpZGUsIEkgb25seSBjYXJlIGFib3V0IHdoZXRoZXIgb3Igbm90IEtWTSdz IHN1cHBvcnQgZm9yIFREWCBhbmQgU05QIGlzCmNsZWFuLCByb2J1c3QsIGFuZCBmdW5jdGlvbmFs bHkgY29ycmVjdC4KCk5vdGUsIEknbSBub3Qgc2F5aW5nIEkgZG9uJ3QgY2FyZSBhYm91dCBURFgv U05QLiAgV2hhdCBJJ20gc2F5aW5nIGlzIHRoYXQgSSBkb24ndAp3YW50IHRvIGRlc2lnbiBzb21l dGhpbmcgdGhhdCBpcyBiZW5lZmljaWFsIG9ubHkgdG8gd2hhdCBpcyBjdXJyZW50bHkgYSB2ZXJ5 Cm5pY2hlIGNsYXNzIG9mIFZNcyB0aGF0IHJlcXVpcmUgc3BlY2lmaWMgZmxhdm9ycyBvZiBoYXJk d2FyZS4KCj4gKGIpIE1hbGljaW91cyBob3N0IGtlcm5lbAo+IAo+IEEgbWFsaWNpb3VzIGhvc3Qg a2VybmVsIGNhbiBhbGxvdyBhIG1hbGljaW91cyBob3N0IFZNTSB0byByZS11c2UgYSBIS0lECj4g Zm9yIHRoZSBkdW1wZXIgVk0sIGJ1dCB0aGlzIGlzbid0IHNvbWV0aGluZyBhIGJldHRlciBnbWVt 
IGRlc2lnbiBjYW4KPiBkZWZlbmQgYWdhaW5zdC4KClllcCwgY29tcGxldGVseSBvdXQtb2Ytc2Nv cGUuCgo+IChjKSBBdHRhY2tzIHVzaW5nIGdtZW0gZm9yIHNvZnR3YXJlLXByb3RlY3RlZCBWTXMK PiAKPiBBdHRhY2tzIHVzaW5nIGdtZW0gZm9yIHNvZnR3YXJlLXByb3RlY3RlZCBWTXMgYXJlIHBv c3NpYmxlIHNpbmNlIHRoZXJlCj4gaXMgbm8gcmVhbCBlbmNyeXB0aW9uIHdpdGggSEtJRC9BU0lE ICh5ZXQ/KS4gVGhlIHNlbGZ0ZXN0IGZvciBbMV0KPiBhY3R1YWxseSB1c2VzIHRoaXMgbGFjayBv ZiBlbmNyeXB0aW9uIHRvIHRlc3QgdGhhdCB0aGUgZGVzdGluYXRpb24gVk0KPiBjYW4gcmVhZCB0 aGUgc291cmNlIFZNJ3MgbWVtb3J5IGFmdGVyIHRoZSBtaWdyYXRpb24uIEluIHRoZSBQT0MgWzFd LCBhcwo+IGxvbmcgYXMgYm90aCBkZXN0aW5hdGlvbiBWTSBrbm93cyB3aGVyZSBpbiB0aGUgaW5v ZGUncyBtZW1vcnkgdG8gcmVhZCwKPiBpdCBjYW4gcmVhZCB3aGF0IGl0IHdhbnRzIHRvLgogCkVu Y3J5cHRpb24gaXMgbm90IHJlcXVpcmVkIHRvIHByb3RlY3QgZ3Vlc3QgbWVtb3J5IGZyb20gbGVz cyBwcml2aWxlZ2VkIHNvZnR3YXJlLgpUaGUgc2VsZnRlc3RzIGRvbid0IHJlbHkgb24gbGFjayBv ZiBlbmNyeXB0aW9uLCB0aGV5IHJlbHkgb24gS1ZNIGluY29ycG9yYXRpbmcKaG9zdCB1c2Vyc3Bh Y2UgaW50byB0aGUgVENCLgoKSnVzdCBiZWNhdXNlIHRoaXMgUkZDIGRvZXNuJ3QgcmVtb3ZlIHRo ZSBWTU0gZnJvbSB0aGUgVENCIGZvciBTVy1wcm90ZWN0ZWQgVk1TLApkb2Vzbid0IG1lYW4gd2Ug X2Nhbid0XyByZW1vdmUgdGhlIFZNTSBmcm9tIHRoZSBUQ0IuICBwS1ZNIGhhcyBhbHJlYWR5IHNo b3duIHRoYXQKc3VjaCBhbiBpbXBsZW1lbnRhdGlvbiBpcyBwb3NzaWJsZS4gIFdlIGRpZG4ndCB0 YWNrbGUgcEtWTS1saWtlIHN1cHBvcnQgaW4gdGhlCmluaXRpYWwgaW1wbGVtZW50YXRpb24gYmVj YXVzZSBpdCdzIG5vbi10cml2aWFsLCBkb2Vzbid0IHlldCBoYXZlIGEgY29uY3JldGUgdXNlCmNh c2UgdG8gZnVuZC9kcml2ZSBkZXZlbG9wbWVudCwgYW5kIHdvdWxkIGhhdmUgc2lnbmlmaWNhbnRs eSBkZWxheWVkIHN1cHBvcnQgZm9yCnRoZSB1c2UgY2FzZXMgcGVvcGxlIGRvIGFjdHVhbGx5IGNh cmUgYWJvdXQuCgpUaGVyZSBhcmUgY2VydGFpbmx5IGJlbmVmaXRzIGZyb20gbWVtb3J5IGJlaW5n IGVuY3J5cHRlZCwgYnV0IGl0J3MgbmVpdGhlciBhCnJlcXVpcmVtZW50IG5vciBhIHBhbmFjZWEs IGFzIHByb3ZlbiBieSB0aGUgbmV2ZXIgZW5kaW5nIHN0cmVhbSBvZiBzcGVjdWxhdGl2ZQpleGVj dXRpb24gYXR0YWNrcy4KIAo+IFRoaXMgaXMgYSBwcm9ibGVtIGZvciBzb2Z0d2FyZS1wcm90ZWN0 ZWQgVk1zLCBidXQgSSBmZWVsIHRoYXQgaXQgaXMgYWxzbyBhCj4gc2VwYXJhdGUgaXNzdWUgZnJv 
bSBnbWVtJ3MgZGVzaWduLgoKTm8sIEkgZG9uJ3Qgd2FudCBndWVzdF9tZW1mZCB0byBiZSBqdXN0 IGJlIGEgdmVoaWNsZSBmb3IgU05QL1REWCBWTXMuICBIYXZpbmcgbGluZQpvZiBzaWdodCB0byBy ZW1vdmluZyBob3N0IHVzZXJzcGFjZSBmcm9tIHRoZSBUQ0IgaXMgYWJzb2x1dGVseSBhIG11c3Qg aGF2ZSBmb3IgbWUsCmFuZCBoYXZpbmcgbGluZSBvZiBzaWdodCB0byBpbXByb3ZpbmcgS1ZNJ3Mg c2VjdXJpdHkgcG9zdHVyZSBmb3IgInJlZ3VsYXIiIFZNcyBpcwpldmVuIG1vcmUgb2YgYSBtdXN0 IGhhdmUuICBJZiBndWVzdF9tZW1mZCBkb2Vzbid0IHByb3ZpZGUgdXMgYSB2ZXJ5IGRpcmVjdCBw YXRoIHRvCihldmVudHVhbGx5KSBhY2hpZXZpbmcgdGhvc2UgZ29hbHMsIHRoZW4gSU1PIGl0J3Mg YSBmYWlsdXJlLgoKV2hpY2ggbGVhZHMgbWUgdG86CgooZCkgQnVnZ3kgY29tcG9uZW50cwoKVG9k YXksIGZvciBhbGwgaW50ZW50cyBhbmQgcHVycG9zZXMsIGd1ZXN0IG1lbW9yeSAqbXVzdCogYmUg bWFwcGVkIHdyaXRhYmxlIGluCnRoZSBWTU0sIHdoaWNoIG1lYW5zIGl0IGlzIGFsbCB0b28gZWFz eSBmb3IgYSBiZW5pZ24tYnV0LWJ1Z2d5IGhvc3QgY29tcG9uZW50IHRvCmNvcnJ1cHQgZ3Vlc3Qg bWVtb3J5LiAgVGhlcmUgYXJlIHdheXMgdG8gbWl0aWdhdGUgcG90ZW50aWFsIHByb2JsZW1zLCBl LmcuIGJ5CmRldmVsb3BpbmcgdXNlcnNwYWNlIHRvIGFkaGVyZSB0byB0aGUgcHJpbmNpcGxlIG9m IGxlYXN0IHByaXZpbGVnZSBpbmFzbXVjaCBhcwpwb3NzaWJsZSwgYnV0IHN1Y2ggbWl0aWdhdGlv bnMgd291bGQgYmUgZmFyIGxlc3Mgcm9idXN0IHRoYW4gd2hhdCBjYW4gYmUgYWNoaWV2ZWQKdmlh IGd1ZXN0X21lbWZkLCBhbmQgcHJhY3RpY2FsbHkgc3BlYWtpbmcgSSBkb24ndCBzZWUgdXMgKEdv b2dsZSwgYnV0IGFsc28gS1ZNIGluCmdlbmVyYWwpIG1ha2luZyBwcm9ncmVzcyBvbiBkZXByaXZp bGVnaW5nIHVzZXJzcGFjZSB3aXRob3V0IGZvcmNpbmcgdGhlIGlzc3VlLgoKPiA+PiBDb3VsZCBi aW5kaW5nIGdtZW0gZmlsZXMgbm90IG9uIGNyZWF0aW9uLCBidXQgYXQgbWVtc2xvdCBjb25maWd1 cmF0aW9uCj4gPj4gdGltZSBiZSBzdWZmaWNpZW50IGFuZCBzaW1wbGVyPwo+ID4KPiA+IEFmdGVy IHdvcmtpbmcgdGhyb3VnaCB0aGUgZmxvd3MsIEkgdGhpbmsgYmluZGluZyBvbi1kZW1hbmQgd291 bGQgc2ltcGxpZnkgdGhlCj4gPiByZWZjb3VudGluZyAoc3RhdGluZyB0aGUgb2J2aW91cyksIGJ1 dCBjb21wbGljYXRlIHRoZSBsaWZlY3ljbGUgb2YgdGhlIG1lbW9yeSBhcwo+ID4gd2VsbCBhcyB0 aGUgY29udHJhY3QgYmV0d2VlbiBLVk0gYW5kIHVzZXJzcGFjZSwKPiAKPiBJZiB3ZSBhcmUgb24g dGhlIHNhbWUgcGFnZSB0aGF0IHRoZSBtZW1vcnkgc2hvdWxkIG91dGxpdmUgdGhlIFZNIGJ1dCBu 
b3QKPiB0aGUgYmluZGluZ3MsIGRvZXMgaXQgc3RpbGwgY29tcGxpY2F0ZSB0aGUgbGlmZWN5Y2xl IG9mIHRoZSBtZW1vcnkgYW5kCj4gdGhlIHVzZXJzcGFjZS9LVk0gY29udHJhY3Q/IENvdWxkIGl0 IGp1c3QgYmUgYSBkaWZmZXJlbnQgY29udHJhY3Q/CgpOb3QgZW50aXJlbHkgc3VyZSBJIHVuZGVy c3RhbmQgd2hhdCB5b3UncmUgYXNraW5nLiAgRG9lcyB0aGlzIHF1ZXN0aW9uIGdvIGF3YXkKd2l0 aCBteSBjbGFyaWZpY2F0aW9uIGFib3V0IHN0cnVjdCBrdm0gdnMuIHZpcnR1YWwgbWFjaGluZT8K Cj4gPiBhbmQgd291bGQgYnJlYWsgdGhlIHNlcGFyYXRpb24gb2YKPiA+IGNvbmNlcm5zIGJldHdl ZW4gdGhlIGlub2RlIChwaHlzaWNhbCBtZW1vcnkgLyBkYXRhKSBhbmQgZmlsZSAoVk0ncyB2aWV3 IC8gbWFwcGluZ3MpLgo+IAo+IEJpbmRpbmcgb24tZGVtYW5kIGlzIG9ydGhvZ29uYWwgdG8gdGhl IHNlcGFyYXRpb24gb2YgY29uY2VybnMgYmV0d2Vlbgo+IGlub2RlIGFuZCBmaWxlLCBiZWNhdXNl IGl0IGNhbiBiZSBidWlsdCByZWdhcmRsZXNzIG9mIHdoZXRoZXIgd2UgZG8gdGhlCj4gZ21lbSBm aWxlL2lub2RlIHNwbGl0Lgo+IAo+ICsgVGhpcyBmbGlwLXRoZS1yZWZjb3VudGluZyBQT0MgaXMg YnVpbHQgd2l0aCB0aGUgZmlsZS9pbm9kZSBzcGxpdCBhbmQKPiArIEluIFsyXSAodGhlIGRlbGF5 ZWQgYmluZGluZyBhcHByb2FjaCB0byBzb2x2ZSBpbnRyYS1ob3N0IG1pZ3JhdGlvbiksIEkKPiAg IGFsc28gdHJpZWQgZmxpcHBpbmcgdGhlIHJlZmNvdW50aW5nLCBhbmQgdGhhdCB3aXRob3V0IHRo ZSBnbWVtCj4gICBmaWxlL2lub2RlIHNwbGl0LiAoUmVmY291bnRpbmcgaW4gWzJdIGlzIGJ1Z2d5 IGJlY2F1c2UgdGhlIGZpbGUgY2FuJ3QKPiAgIHRha2UgYSByZWZjb3VudCBvbiBLVk0sIGJ1dCBp dCB3b3VsZCB3b3JrIHdpdGhvdXQgdGFraW5nIHRoYXQgcmVmY291bnQpCj4gCj4gWzFdIGh0dHBz Oi8vbG9yZS5rZXJuZWwub3JnL2xrbWwvY292ZXIuMTY5MTQ0Njk0Ni5naXQuYWNrZXJsZXl0bmdA Z29vZ2xlLmNvbS9ULwo+IFsyXSBodHRwczovL2dpdGh1Yi5jb20vZ29vZ2xlcHJvZGtlcm5lbC9s aW51eC1jYy9jb21taXQvZGQ1YWM1ZTUzZjE0YTFlZjk5MTVjOWMxZTRjYzEwMDZhNDBiNDlkZgoK X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KbGludXgtYXJt LWtlcm5lbCBtYWlsaW5nIGxpc3QKbGludXgtYXJtLWtlcm5lbEBsaXN0cy5pbmZyYWRlYWQub3Jn Cmh0dHA6Ly9saXN0cy5pbmZyYWRlYWQub3JnL21haWxtYW4vbGlzdGluZm8vbGludXgtYXJtLWtl cm5lbAo=