From mboxrd@z Thu Jan 1 00:00:00 1970 From: Sean Christopherson Date: Tue, 3 Oct 2023 08:59:27 -0700 Subject: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes In-Reply-To: References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-12-seanjc@google.com> Message-ID: List-Id: To: kvm-riscv@lists.infradead.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Tue, Oct 03, 2023, Fuad Tabba wrote: > Hi, > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index d2d913acf0df..f8642ff2eb9d 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > #define KVM_CAP_USER_MEMORY2 230 > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > + > > +struct kvm_memory_attributes { > > + __u64 address; > > + __u64 size; > > + __u64 attributes; > > + __u64 flags; > > +}; > > + > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > + > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > attributes from userspace. Why not? The whole thing falls apart if userspace doesn't *know* the state of a page, and the only way for userspace to know the state of a page at a given moment in time is if userspace controls the attributes. E.g. even if KVM were to provide a way for userspace to query attributes, the attributes exposed to usrspace would become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) since userspace couldn't prevent future changes. Why does pKVM need to prevent userspace from stating *its* view of attributes? If the goal is to reduce memory overhead, that can be solved by using an internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest attempts to access memory where pKVM and userspace don't agree on the state, generate an exit to userspace. Or kill the guest. Or do something else entirely. > However, we'd like to use the attributes xarray to track the sharing state of > guest pages at the host kernel. > > Moreover, we'd rather the default guest page state be PRIVATE, and > only specify which pages are shared. All pKVM guest pages start off as > private, and the majority will remain so. I would rather optimize kvm_vm_set_mem_attributes() to generate range-based xarray entries, at which point it shouldn't matter all that much whether PRIVATE or SHARED is the default "empty" state. We opted not to do that for the initial merge purely to keep the code as simple as possible (which is obviously still not exactly simple). With range-based xarray entries, the cost of tagging huge chunks of memory as PRIVATE should be a non-issue. And if that's not enough for whatever reason, I would rather define the polarity of PRIVATE on a per-VM basis, but only for internal storage. > I'm not sure if this is the best way to do this: One idea would be to move > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > lives as well. This would allow different architectures to specify their own > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > This wouldn't help in terms of preventing userspace from clearing attributes > (i.e., setting a 0 attribute) though. > > The other thing, which we need for pKVM anyway, is to make > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > kvm_main.c (already have a local patch for this that declares it in > kvm_host.h), That's no problem, but I am definitely opposed to KVM modifying attributes that are owned by userspace. > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. As above, I am opposed to pKVM having a completely different ABI for managing PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work for pKVM, then we've failed miserably and should revist the uAPI. From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 36FDC36B16 for ; Tue, 3 Oct 2023 15:59:29 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id 3f1490d57ef6-d81646fcf3eso1311280276.0 for ; Tue, 03 Oct 2023 08:59:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1696348769; x=1696953569; darn=lists.linux.dev; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=2W1/Lh1/41FhaQIxtDf0ppTtIKf6ft8gN3k/ReLJCNaVi90vGAJnGe86kQYYIalXSG bQ9LYnrouR/8NGnO8EJqstgJN8ve4g4+YwJfNbgneLRiHlB+s4qixODnVLHYB4HvmW+8 IMrAznGSBU1oRRWI3o5tTbg/14kGDZl5uz95fHLOm9X9nFSxZeormdApLkt9ZJbYOJ1d K/NmNpTXmSTy+0yQOfXpOtBMNArjI7OBfziVhgKD0myeCX7QoNrwzHB7lL+UtPY2UPI6 Lb2IW7sX35QVgUuOIihce9aAjgCfW8vMqddvMeFP3fA7TcOzfWJ2ehP/1yfPdliNTS8k RIhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696348769; x=1696953569; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=ca7CWTb6DH0GHpNUYUhL6Wkga2cTxRUVVjFcEdChNXv9f+434i83Thz8vTlNjwoRGI NyfkDf6qQVbXsoSjv7KJ7jr8dlp7LltDnzDGg7mCn96Y05bLLlZoc0DLDEJf+VNj75QY AKuAefSo09AUiCj7SFbcZKZlwIATNl4zcimNoZmeJUc0eKDeR61hpRtr25/eBV5vYhHY YWODkl7LyniQHAhCRf59kosSEwEhwcMu0X0cTWWPuxC3XpCo/GFyRK2MlR/WpqQ5CnDH LktBD0QZlR/nRW0jNEdX6JfiHCgwQ62q1xF6mSVJ39UvIAL3breePDwqB5NBYWkfZOjB xFbA== X-Gm-Message-State: AOJu0YyrqMkJ6iYaRJcZheZkNGu1AXEx29EUVJGa5fTEmrzhT9a7cJEP 6n1VXdBMg8+F8FLIPL/q/NfpuICa84Y= X-Google-Smtp-Source: AGHT+IHlsTPxnmIroL8cntBugIBpa7VGBcPZjKPldlj/dItcEWdxSvIkKSkj76hPSiTFqWvviFkle9Hc81s= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:3604:0:b0:d7a:c85c:725b with SMTP id d4-20020a253604000000b00d7ac85c725bmr227114yba.7.1696348768967; Tue, 03 Oct 2023 08:59:28 -0700 (PDT) Date: Tue, 3 Oct 2023 08:59:27 -0700 In-Reply-To: Precedence: bulk X-Mailing-List: kvmarm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-12-seanjc@google.com> Message-ID: Subject: Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes From: Sean Christopherson To: Fuad Tabba Cc: Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , "Matthew Wilcox (Oracle)" , Andrew Morton , Paul Moore , James Morris , "Serge E. Hallyn" , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, Chao Peng , Jarkko Sakkinen , Anish Moorthy , Yu Zhang , Isaku Yamahata , Xu Yilun , Vlastimil Babka , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" Content-Type: text/plain; charset="us-ascii" On Tue, Oct 03, 2023, Fuad Tabba wrote: > Hi, > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index d2d913acf0df..f8642ff2eb9d 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > #define KVM_CAP_USER_MEMORY2 230 > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > + > > +struct kvm_memory_attributes { > > + __u64 address; > > + __u64 size; > > + __u64 attributes; > > + __u64 flags; > > +}; > > + > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > + > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > attributes from userspace. Why not? The whole thing falls apart if userspace doesn't *know* the state of a page, and the only way for userspace to know the state of a page at a given moment in time is if userspace controls the attributes. E.g. even if KVM were to provide a way for userspace to query attributes, the attributes exposed to usrspace would become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) since userspace couldn't prevent future changes. Why does pKVM need to prevent userspace from stating *its* view of attributes? If the goal is to reduce memory overhead, that can be solved by using an internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest attempts to access memory where pKVM and userspace don't agree on the state, generate an exit to userspace. Or kill the guest. Or do something else entirely. > However, we'd like to use the attributes xarray to track the sharing state of > guest pages at the host kernel. > > Moreover, we'd rather the default guest page state be PRIVATE, and > only specify which pages are shared. All pKVM guest pages start off as > private, and the majority will remain so. I would rather optimize kvm_vm_set_mem_attributes() to generate range-based xarray entries, at which point it shouldn't matter all that much whether PRIVATE or SHARED is the default "empty" state. We opted not to do that for the initial merge purely to keep the code as simple as possible (which is obviously still not exactly simple). With range-based xarray entries, the cost of tagging huge chunks of memory as PRIVATE should be a non-issue. And if that's not enough for whatever reason, I would rather define the polarity of PRIVATE on a per-VM basis, but only for internal storage. > I'm not sure if this is the best way to do this: One idea would be to move > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > lives as well. This would allow different architectures to specify their own > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > This wouldn't help in terms of preventing userspace from clearing attributes > (i.e., setting a 0 attribute) though. > > The other thing, which we need for pKVM anyway, is to make > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > kvm_main.c (already have a local patch for this that declares it in > kvm_host.h), That's no problem, but I am definitely opposed to KVM modifying attributes that are owned by userspace. > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. As above, I am opposed to pKVM having a completely different ABI for managing PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work for pKVM, then we've failed miserably and should revist the uAPI. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 237CAE7AD75 for ; Tue, 3 Oct 2023 15:59:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:Message-ID: References:Mime-Version:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=31+jpTI+0vFu3wAQV3zSE4Y9Pcridjsb2bmQMw9l7mU=; b=C4yVTM2S4/JQuzaULXT7gRo8ni SnUGAzAOrInShsn074mMEAVis8n1Id2OObuKP8YFVxXgQ5Dxf6Shdkmb2pH9Dxp5XpLDfHqP8OB7D tXsnLERTRYLteGYcGdaGwIzvt1sZQYC55iPQyUv8AHDWBT8aT+dIuMdgmnVGEpfA3XRN7yGn38ENi bpyu+QK4jM8+0aK6UwUPdVQS6BySRka2R+t6Nx96akJ+uXs4Tug4Na0bA9srxYgEMgWLmMFzxq87F En0xXAKwKz6PznC3/+ZJh4BEFj2umRkKwAh/hzJL95zI2h5Ymfxz5Yy0gPMNjW+7gsXlSc2396IiH aPNEpnug==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1qnho5-00Ev2t-1e; Tue, 03 Oct 2023 15:59:33 +0000 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1qnho2-00Ev2A-2X for linux-riscv@lists.infradead.org; Tue, 03 Oct 2023 15:59:32 +0000 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d81646fcf3eso1311278276.0 for ; Tue, 03 Oct 2023 08:59:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1696348769; x=1696953569; darn=lists.infradead.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=Uurf1snVUCspgWuf04/nuRKIiF2JuCmrBORv4xcyNhmes3ZGaLLIgYqVWNDNRLmvCr Sp6/XR+uVl4ClNidiIn87HttBPVsAi04urIHfCuib9SBSUUbB5gipxR0W34TEy8rsR3/ fE/E+j35U1mCXLEqlVin+GxKkdw+tsMzNuVG/Ry1QvfcnnagShCzxDfiufWDDsKw3dF6 gxR9mg2v+YiL93w4rdKRB4yrnFwicjZEHdzMBMTRKvu4m9K9Vx9i1te/OVonOZ9pM21P a9K0lYmJLhDrllFpno9KICi2IiB0hdyxCZKz7QLCixXIRHQYAlwAdSLqpUnnrIv4YMCz iggQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696348769; x=1696953569; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=Q+Ukt2+0Xd3saYgxhmo4/7tr4K5gXRQvKKOhvjOFIJTDPJVlAWtkcAVCrlvBEXUtgJ TUF3zOAfLp5S0eGd2rFhLOMRZ+g5L8zt6AK1Rt757IjcKLdnWQ66mGaI4vhvpzoNTerO 4zU3Ty0NU3IdBd92i7F9e4TtQV7LQIv5MgHrp1l/Btkhw7x5AWcWIIajV0qzXsKrODQO UVG7B9n8npSULsrxBKQHVnitFIKI1uP3YKZNQQd0M2jOZNDCPHCCCwCgDeZ4N+2iDM3G RPt71g8hjLk7rrq+M58/maKpyUWJqM9prEiHrGBWNoW8vSRSFRF0UfIXFtx9ZdJ2vOUF 2QLQ== X-Gm-Message-State: AOJu0YwDk2CjMHME6mFK6XKCE8nvPLb4xXkgPdKCU79n5c49XLqFdac9 hNsPc5Rdt7ULc6NOfXS7GdsiAHaX9yg= X-Google-Smtp-Source: AGHT+IHlsTPxnmIroL8cntBugIBpa7VGBcPZjKPldlj/dItcEWdxSvIkKSkj76hPSiTFqWvviFkle9Hc81s= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:3604:0:b0:d7a:c85c:725b with SMTP id d4-20020a253604000000b00d7ac85c725bmr227114yba.7.1696348768967; Tue, 03 Oct 2023 08:59:28 -0700 (PDT) Date: Tue, 3 Oct 2023 08:59:27 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-12-seanjc@google.com> Message-ID: Subject: Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes From: Sean Christopherson To: Fuad Tabba Cc: Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , "Matthew Wilcox (Oracle)" , Andrew Morton , Paul Moore , James Morris , "Serge E. Hallyn" , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, Chao Peng , Jarkko Sakkinen , Anish Moorthy , Yu Zhang , Isaku Yamahata , Xu Yilun , Vlastimil Babka , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231003_085930_825958_BF698815 X-CRM114-Status: GOOD ( 31.89 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org On Tue, Oct 03, 2023, Fuad Tabba wrote: > Hi, > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index d2d913acf0df..f8642ff2eb9d 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > #define KVM_CAP_USER_MEMORY2 230 > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > + > > +struct kvm_memory_attributes { > > + __u64 address; > > + __u64 size; > > + __u64 attributes; > > + __u64 flags; > > +}; > > + > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > + > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > attributes from userspace. Why not? The whole thing falls apart if userspace doesn't *know* the state of a page, and the only way for userspace to know the state of a page at a given moment in time is if userspace controls the attributes. E.g. even if KVM were to provide a way for userspace to query attributes, the attributes exposed to usrspace would become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) since userspace couldn't prevent future changes. Why does pKVM need to prevent userspace from stating *its* view of attributes? If the goal is to reduce memory overhead, that can be solved by using an internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest attempts to access memory where pKVM and userspace don't agree on the state, generate an exit to userspace. Or kill the guest. Or do something else entirely. > However, we'd like to use the attributes xarray to track the sharing state of > guest pages at the host kernel. > > Moreover, we'd rather the default guest page state be PRIVATE, and > only specify which pages are shared. All pKVM guest pages start off as > private, and the majority will remain so. I would rather optimize kvm_vm_set_mem_attributes() to generate range-based xarray entries, at which point it shouldn't matter all that much whether PRIVATE or SHARED is the default "empty" state. We opted not to do that for the initial merge purely to keep the code as simple as possible (which is obviously still not exactly simple). With range-based xarray entries, the cost of tagging huge chunks of memory as PRIVATE should be a non-issue. And if that's not enough for whatever reason, I would rather define the polarity of PRIVATE on a per-VM basis, but only for internal storage. > I'm not sure if this is the best way to do this: One idea would be to move > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > lives as well. This would allow different architectures to specify their own > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > This wouldn't help in terms of preventing userspace from clearing attributes > (i.e., setting a 0 attribute) though. > > The other thing, which we need for pKVM anyway, is to make > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > kvm_main.c (already have a local patch for this that declares it in > kvm_host.h), That's no problem, but I am definitely opposed to KVM modifying attributes that are owned by userspace. > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. As above, I am opposed to pKVM having a completely different ABI for managing PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work for pKVM, then we've failed miserably and should revist the uAPI. _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 20458E7AD75 for ; Tue, 3 Oct 2023 16:00:30 +0000 (UTC) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20230601 header.b=OYPH+wwL; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4S0Mwr2C6xz3cnZ for ; Wed, 4 Oct 2023 03:00:28 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20230601 header.b=OYPH+wwL; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=flex--seanjc.bounces.google.com (client-ip=2607:f8b0:4864:20::b4a; helo=mail-yb1-xb4a.google.com; envelope-from=3ydoczqykdhulxtgcvzhhzex.vhfebgnqiiv-wxoeblml.hsetul.hkz@flex--seanjc.bounces.google.com; receiver=lists.ozlabs.org) Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4S0Mvp35vBz3c5F for ; Wed, 4 Oct 2023 02:59:32 +1100 (AEDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d81646fcf3eso1311283276.0 for ; Tue, 03 Oct 2023 08:59:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1696348769; x=1696953569; darn=lists.ozlabs.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=OYPH+wwLahyFnG8KHSWTnTcQwTHQ9RKcFqYMiSzseWDAKhWhPwOZ321Yp4vWj0A2xn Na9p27/PTcAqv5Z0mZzQZKeHjhYhhlpPkCTes8DzO1ciP7LhyHfshr/RJ+fqrZmQCXcl wYeI5a/AF669YgpHCJXz/79CB7pYz5+UZ/59txDAd9gvHEOLQ0HKiNdGWegGLHu5vNKu XhfGIRlVL/n0BMT8dcagQYqzwb920jY/bXZc3g1UUR6TINaeGEecpRYOhjC8K/0Jw9YJ BW52YiwJAzsgpdIGqMSfoyCdxzHrb1doq5p/eq60ywr3QAVIzdhEzaq7M9BvIO7IT70Q L/Kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696348769; x=1696953569; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=Ta3Rw8bqAi4ejPhakPZDc/vk40TXuVjTkx1bkCTPqNnGRgaLIR5cKgJfgW33Bbcckw xBsMnn/OHCJCsnKBDgDD+0DgEAI4wUMOdiDfSxdZ/KkLRq2h0iUsLCjz70aVUhxbpPcv FJwgEjesLj/HHoAcN2LT8J/iVjkDofOzFgy4H9evrqsbP+f+tOOiGSVnniJC885jXZb9 x+YWCfjt/kCPGU+gi1+bwJCMAFxRNtHqe21WGE2MZj6ME4aits/068Q6QzNTG2tLpMWU 5/wVIs4u9/3evMpoma39CxsmHERpJaZKLwYMp8fHx9q9LnJx0xsOeuXJeYMzaHwy5DOi 9eSg== X-Gm-Message-State: AOJu0YylfrM9S6qGwmn5onAm5gpqRyXodXjy7qkDBlilNI4BkIRRLUi6 sxGi7G/5hKs8FKkOjvFE5hrIb8cqLfs= X-Google-Smtp-Source: AGHT+IHlsTPxnmIroL8cntBugIBpa7VGBcPZjKPldlj/dItcEWdxSvIkKSkj76hPSiTFqWvviFkle9Hc81s= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:3604:0:b0:d7a:c85c:725b with SMTP id d4-20020a253604000000b00d7ac85c725bmr227114yba.7.1696348768967; Tue, 03 Oct 2023 08:59:28 -0700 (PDT) Date: Tue, 3 Oct 2023 08:59:27 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-12-seanjc@google.com> Message-ID: Subject: Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes From: Sean Christopherson To: Fuad Tabba Content-Type: text/plain; charset="us-ascii" X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kvm@vger.kernel.org, David Hildenbrand , Yu Zhang , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chao Peng , linux-riscv@lists.infradead.org, Isaku Yamahata , Paul Moore , Marc Zyngier , Huacai Chen , James Morris , "Matthew Wilcox \(Oracle\)" , Wang , Vlastimil Babka , Jarkko Sakkinen , "Serge E. Hallyn" , Maciej Szmigiero , Albert Ou , Michael Roth , Ackerley Tng , Paul Walmsley , kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, Isaku Yamahata , Quentin Perret , Liam Merwick , linux-mips@vger.kernel .org, Oliver Upton , linux-security-module@vger.kernel.org, Palmer Dabbelt , "Kirill A . Shutemov" , kvm-riscv@lists.infradead.org, Anup Patel , linux-fsdevel@vger.kernel.org, Paolo Bonzini , Andrew Morton , Vishal Annapurve , linuxppc-dev@lists.ozlabs.org, Xu Yilun , Anish Moorthy Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Tue, Oct 03, 2023, Fuad Tabba wrote: > Hi, > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index d2d913acf0df..f8642ff2eb9d 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > #define KVM_CAP_USER_MEMORY2 230 > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > + > > +struct kvm_memory_attributes { > > + __u64 address; > > + __u64 size; > > + __u64 attributes; > > + __u64 flags; > > +}; > > + > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > + > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > attributes from userspace. Why not? The whole thing falls apart if userspace doesn't *know* the state of a page, and the only way for userspace to know the state of a page at a given moment in time is if userspace controls the attributes. E.g. even if KVM were to provide a way for userspace to query attributes, the attributes exposed to usrspace would become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) since userspace couldn't prevent future changes. Why does pKVM need to prevent userspace from stating *its* view of attributes? If the goal is to reduce memory overhead, that can be solved by using an internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest attempts to access memory where pKVM and userspace don't agree on the state, generate an exit to userspace. Or kill the guest. Or do something else entirely. > However, we'd like to use the attributes xarray to track the sharing state of > guest pages at the host kernel. > > Moreover, we'd rather the default guest page state be PRIVATE, and > only specify which pages are shared. All pKVM guest pages start off as > private, and the majority will remain so. I would rather optimize kvm_vm_set_mem_attributes() to generate range-based xarray entries, at which point it shouldn't matter all that much whether PRIVATE or SHARED is the default "empty" state. We opted not to do that for the initial merge purely to keep the code as simple as possible (which is obviously still not exactly simple). With range-based xarray entries, the cost of tagging huge chunks of memory as PRIVATE should be a non-issue. And if that's not enough for whatever reason, I would rather define the polarity of PRIVATE on a per-VM basis, but only for internal storage. > I'm not sure if this is the best way to do this: One idea would be to move > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > lives as well. This would allow different architectures to specify their own > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > This wouldn't help in terms of preventing userspace from clearing attributes > (i.e., setting a 0 attribute) though. > > The other thing, which we need for pKVM anyway, is to make > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > kvm_main.c (already have a local patch for this that declares it in > kvm_host.h), That's no problem, but I am definitely opposed to KVM modifying attributes that are owned by userspace. > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. As above, I am opposed to pKVM having a completely different ABI for managing PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work for pKVM, then we've failed miserably and should revist the uAPI. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7943BE7AD75 for ; Tue, 3 Oct 2023 16:00:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:Message-ID: References:Mime-Version:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=7gnzWKaIE33mqwafSEwPsja8b1mpROjEfeVZec8iT5g=; b=swijbSbaGOm987ppxM8C5ZzjBX hp7H8vKgrRvJleOdvb2z34reuFIUXFrv6Kr+qmQoQqQ2NiEEKuKQLtxgUYT+wH2FfrnP4XXpAOUdc T2x5HgU7vHe1xUVUnhgTq51mRCaCzYyEpYJl7Iu5IDcFoqGzipwScBcW4gNq/maeNdLdDbx+9SbYb 51RnaCk+TtL6uHk3Z5xQCvfjxkJB81aHkIDqeeXmLwvmEYFVtA6A6JFC2tvB/0ozU2n9AQTbvvK9B cyGxWPOdpydCf8n4amB+2kQNEOqXaiL2TIktoUFMDI+/0j3o68zE3yxqvkzdOgXNUur7H3k172mrO SmFP643Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1qnhoA-00Ev6I-1p; Tue, 03 Oct 2023 15:59:38 +0000 Received: from mail-yb1-xb49.google.com ([2607:f8b0:4864:20::b49]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1qnho5-00Ev29-2k for linux-arm-kernel@lists.infradead.org; Tue, 03 Oct 2023 15:59:36 +0000 Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d81646fcf3eso1311277276.0 for ; Tue, 03 Oct 2023 08:59:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1696348769; x=1696953569; darn=lists.infradead.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=Uurf1snVUCspgWuf04/nuRKIiF2JuCmrBORv4xcyNhmes3ZGaLLIgYqVWNDNRLmvCr Sp6/XR+uVl4ClNidiIn87HttBPVsAi04urIHfCuib9SBSUUbB5gipxR0W34TEy8rsR3/ fE/E+j35U1mCXLEqlVin+GxKkdw+tsMzNuVG/Ry1QvfcnnagShCzxDfiufWDDsKw3dF6 gxR9mg2v+YiL93w4rdKRB4yrnFwicjZEHdzMBMTRKvu4m9K9Vx9i1te/OVonOZ9pM21P a9K0lYmJLhDrllFpno9KICi2IiB0hdyxCZKz7QLCixXIRHQYAlwAdSLqpUnnrIv4YMCz iggQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696348769; x=1696953569; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xP89tk8jeFUDqiHwgtcyinENLLmFDFAo+sp2wFgmRBw=; b=sqZVGD+OBArWVbLXB+xQ6VXKYvyj2NrIYThANkX5mD4kzz0cddlplmipPXymmXaAe+ ORzelCZN6ZTDIMeyguUsq3sV1HXn5IaZWUv00mppBXckAvLxvPt4q+n4WpOFQDEpbScJ 3FH20nXN75oi8W8s2FCtPAIUvsbYDWAQm/xf5Iw8s6k7iaY8DkYroEOiJWFRBPARKVdu hWK2YgwvINY+T28lW5ST27iGpSI6G03hwHEb5d7MDbvTiUAGMKAovIBHGKAWBj59nGFQ zLdIIqteRib0OHuges2diqIvoXD4csv98kpw80+Gx8rdunn0S99I+6csIl3Q4a53T3/q GubA== X-Gm-Message-State: AOJu0YzU3oFwcoh6mGYZiu8vFeBrj2WrNVgi2pgLGMwa4aSO7pPuQUR+ CYqh5unPC1RWW0o8dkvt+caE2vB1Uos= X-Google-Smtp-Source: AGHT+IHlsTPxnmIroL8cntBugIBpa7VGBcPZjKPldlj/dItcEWdxSvIkKSkj76hPSiTFqWvviFkle9Hc81s= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:3604:0:b0:d7a:c85c:725b with SMTP id d4-20020a253604000000b00d7ac85c725bmr227114yba.7.1696348768967; Tue, 03 Oct 2023 08:59:28 -0700 (PDT) Date: Tue, 3 Oct 2023 08:59:27 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-12-seanjc@google.com> Message-ID: Subject: Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes From: Sean Christopherson To: Fuad Tabba Cc: Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , "Matthew Wilcox (Oracle)" , Andrew Morton , Paul Moore , James Morris , "Serge E. Hallyn" , kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, Chao Peng , Jarkko Sakkinen , Anish Moorthy , Yu Zhang , Isaku Yamahata , Xu Yilun , Vlastimil Babka , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231003_085933_892121_1B6DD438 X-CRM114-Status: GOOD ( 33.41 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Oct 03, 2023, Fuad Tabba wrote: > Hi, > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index d2d913acf0df..f8642ff2eb9d 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -1227,6 +1227,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228 > > #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229 > > #define KVM_CAP_USER_MEMORY2 230 > > +#define KVM_CAP_MEMORY_ATTRIBUTES 231 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -2293,4 +2294,17 @@ struct kvm_s390_zpci_op { > > /* flags for kvm_s390_zpci_op->u.reg_aen.flags */ > > #define KVM_S390_ZPCIOP_REGAEN_HOST (1 << 0) > > > > +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */ > > +#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES _IOR(KVMIO, 0xd2, __u64) > > +#define KVM_SET_MEMORY_ATTRIBUTES _IOW(KVMIO, 0xd3, struct kvm_memory_attributes) > > + > > +struct kvm_memory_attributes { > > + __u64 address; > > + __u64 size; > > + __u64 attributes; > > + __u64 flags; > > +}; > > + > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) > > + > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED > attributes from userspace. Why not? The whole thing falls apart if userspace doesn't *know* the state of a page, and the only way for userspace to know the state of a page at a given moment in time is if userspace controls the attributes. E.g. even if KVM were to provide a way for userspace to query attributes, the attributes exposed to usrspace would become stale the instant KVM drops slots_lock (or whatever lock protects the attributes) since userspace couldn't prevent future changes. Why does pKVM need to prevent userspace from stating *its* view of attributes? If the goal is to reduce memory overhead, that can be solved by using an internal, non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest attempts to access memory where pKVM and userspace don't agree on the state, generate an exit to userspace. Or kill the guest. Or do something else entirely. > However, we'd like to use the attributes xarray to track the sharing state of > guest pages at the host kernel. > > Moreover, we'd rather the default guest page state be PRIVATE, and > only specify which pages are shared. All pKVM guest pages start off as > private, and the majority will remain so. I would rather optimize kvm_vm_set_mem_attributes() to generate range-based xarray entries, at which point it shouldn't matter all that much whether PRIVATE or SHARED is the default "empty" state. We opted not to do that for the initial merge purely to keep the code as simple as possible (which is obviously still not exactly simple). With range-based xarray entries, the cost of tagging huge chunks of memory as PRIVATE should be a non-issue. And if that's not enough for whatever reason, I would rather define the polarity of PRIVATE on a per-VM basis, but only for internal storage. > I'm not sure if this is the best way to do this: One idea would be to move > the definition of KVM_MEMORY_ATTRIBUTE_PRIVATE to > arch/*/include/asm/kvm_host.h, which is where kvm_arch_supported_attributes() > lives as well. This would allow different architectures to specify their own > attributes (i.e., instead we'd have a KVM_MEMORY_ATTRIBUTE_SHARED for pKVM). > This wouldn't help in terms of preventing userspace from clearing attributes > (i.e., setting a 0 attribute) though. > > The other thing, which we need for pKVM anyway, is to make > kvm_vm_set_mem_attributes() global, so that it can be called from outside of > kvm_main.c (already have a local patch for this that declares it in > kvm_host.h), That's no problem, but I am definitely opposed to KVM modifying attributes that are owned by userspace. > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES. As above, I am opposed to pKVM having a completely different ABI for managing PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work for pKVM, then we've failed miserably and should revist the uAPI. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel