X-Received: by 2002:a1c:29c3:: with SMTP id p186mr5819762wmp.22.1630436914992;
 Tue, 31 Aug 2021 12:08:34 -0700 (PDT)
Received: from [192.168.3.132] (p4ff23bf5.dip0.t-ipconnect.de. [79.242.59.245])
 by smtp.gmail.com with ESMTPSA id m3sm24311904wrg.45.2021.08.31.12.08.33
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Tue, 31 Aug 2021 12:08:34 -0700 (PDT)
Subject: Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
To: Yu Zhang
Cc: Sean Christopherson , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li ,
 Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Borislav Petkov , Andy Lutomirski , Andrew Morton , Joerg Roedel , Andi Kleen ,
 David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner ,
 Peter Zijlstra , Ingo Molnar , Varad Gautam , Dario Faggioli , x86@kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev, "Kirill A . Shutemov" ,
 "Kirill A . Shutemov" , Kuppuswamy Sathyanarayanan , Dave Hansen
References: <20210824005248.200037-1-seanjc@google.com>
 <307d385a-a263-276f-28eb-4bc8dd287e32@redhat.com>
 <20210827023150.jotwvom7mlsawjh4@linux.intel.com>
From: David Hildenbrand
Organization: Red Hat
Message-ID: <243bc6a3-b43b-cd18-9cbb-1f42a5de802f@redhat.com>
Date: Tue, 31 Aug 2021 21:08:33 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <20210827023150.jotwvom7mlsawjh4@linux.intel.com>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=redhat.com
 header.s=mimecast20190719 header.b="OU/3jr3e"; spf=none (imf04.hostedemail.com:
 domain of david@redhat.com has no SPF policy when checking 216.205.24.124)
 smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: 3CC6C50000A4
X-Stat-Signature: 1xtdzij5wus3benxsjik4g8rimmymb1h
X-HE-Tag: 1630436918-676067
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

On 27.08.21 04:31, Yu Zhang wrote:
> On Thu, Aug 26, 2021 at 12:15:48PM +0200, David Hildenbrand wrote:
>> On 24.08.21 02:52, Sean Christopherson wrote:
>>> The goal of this RFC is to try and align KVM, mm, and anyone else with skin in the
>>> game, on an acceptable direction for supporting guest private memory, e.g. for
>>> Intel's TDX. The TDX architecture effectively allows KVM guests to crash the
>>> host if guest private memory is accessible to host userspace, and thus does not
>>> play nice with KVM's existing approach of pulling the pfn and mapping level from
>>> the host page tables.
>>>
>>> This is by no means a complete patch; it's a rough sketch of the KVM changes that
>>> would be needed. The kernel side of things is completely omitted from the patch;
>>> the design concept is below.
>>>
>>> There's also a fair bit of hand waving on implementation details that shouldn't
>>> fundamentally change the overall ABI, e.g. how the backing store will ensure
>>> there are no mappings when "converting" to guest private.
>>>
>>
>> This is a lot of complexity and rather advanced approaches (not saying they
>> are bad, just that we try to teach the whole stack something completely
>> new).
>>
>>
>> What I think would really help is a list of requirements, such that
>> everybody is aware of what we actually want to achieve. Let me start:
>>
>> GFN: Guest Frame Number
>> EPFN: Encrypted Physical Frame Number
>>
>>
>> 1) An EPFN must not get mapped into more than one VM: it belongs exactly to
>> one VM. It must neither be shared between VMs in different processes nor
>> between VMs within a process.
>>
>>
>> 2) User space (well, and actually the kernel) must never access an EPFN:
>>
>>   - If we go for an fd, essentially all operations (read/write) have to
>>     fail.
>>   - If we have to map an EPFN into user space page tables (e.g., to
>>     simplify KVM), we could only allow fake swap entries such that "there
>>     is something" but it cannot be accessed and is flagged accordingly.
>>   - /proc/kcore and friends have to be careful as well and should not read
>>     this memory. So there has to be a way to flag these pages.
>>
>> 3) We need a way to express the GFN<->EPFN mapping and essentially assign an
>> EPFN to a GFN.
>>
>>
>> 4) Once we have assigned an EPFN to a GFN, that assignment must no longer
>> change. Further, an EPFN must not get assigned to multiple GFNs.
>>
>>
>> 5) There has to be a way to "replace" encrypted parts by "shared" parts
>> and the other way around.
>>
>> What else?
>
> Thanks a lot for this summary. A question about the requirements: do we or
> do we not have a plan to support assigning devices to the protected VM?

Good question, I assume that is stuff for the far far future.

>
> If yes, the fd-based solution may need to change the VFIO interface as well
> (though the fake swap entry solution needs to mess with VFIO too), because:
>
> 1> KVM uses VFIO when assigning devices into a VM.
>
> 2> Not knowing which GPA ranges may be used by the VM as DMA buffers, all
> guest pages will have to be mapped in the host IOMMU page table to host pages,
> which are pinned during the whole life cycle of the VM.
>
> 3> IOMMU mapping is done during VM creation time by VFIO and the IOMMU driver,
> in vfio_dma_do_map().
>
> 4> However, vfio_dma_do_map() needs the HVA to perform a GUP to get the HPA
> and pin the page.
>
> But if we are using the fd-based solution, not every GPA can have an HVA, thus
> the current VFIO interface to map and pin the GPA (IOVA) won't work. And I
> doubt that VFIO can be modified to support this easily.

I fully agree.
Maybe Intel folks have some idea what that's supposed to look like in the
future.

-- 
Thanks,

David / dhildenb
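
For illustration only, requirements 1), 3) and 4) from the quoted list can be read as invariants on a small bookkeeping structure. The sketch below is a toy model in Python, not kernel code and not anything proposed in this thread: the names PrivateMemoryMap, AssignmentError, assign() and lookup() are hypothetical, and conversion between private and shared (requirement 5) is deliberately not modeled.

```python
class AssignmentError(Exception):
    """Raised when an assignment would violate one of the requirements."""


class PrivateMemoryMap:
    """Toy model of the GFN<->EPFN mapping (hypothetical, for illustration)."""

    def __init__(self):
        # (vm_id, gfn) -> epfn
        self._gfn_to_epfn = {}
        # epfn -> (vm_id, gfn); one entry per EPFN means one owner, one GFN
        self._epfn_to_gfn = {}

    def assign(self, vm_id, gfn, epfn):
        key = (vm_id, gfn)
        # Requirements 1 and 4: an EPFN belongs to exactly one VM and must
        # not be assigned to more than one GFN.
        if epfn in self._epfn_to_gfn:
            raise AssignmentError(f"EPFN {epfn:#x} is already assigned")
        # Requirement 4: once a GFN has an EPFN, the assignment is fixed.
        if key in self._gfn_to_epfn:
            raise AssignmentError(f"GFN {gfn:#x} of VM {vm_id} is already backed")
        self._gfn_to_epfn[key] = epfn
        self._epfn_to_gfn[epfn] = key

    def lookup(self, vm_id, gfn):
        # Requirement 3: express the mapping; returns None if unassigned.
        return self._gfn_to_epfn.get((vm_id, gfn))


if __name__ == "__main__":
    m = PrivateMemoryMap()
    m.assign(vm_id=1, gfn=0x10, epfn=0x1000)
    print(hex(m.lookup(1, 0x10)))
```

Keeping a reverse map keyed only by EPFN makes both "one VM per EPFN" and "one GFN per EPFN" a single dictionary check; whether a real implementation would track ownership this way is an open question here.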