Date: Tue, 5 Apr 2022 18:30:35 +0000
From: Sean Christopherson
To: Andy Lutomirski
Cc: Quentin Perret, Steven Price, Chao Peng, kvm list,
	Linux Kernel Mailing List, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, Linux API, qemu-devel@nongnu.org,
	Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, "H. Peter Anvin",
	Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton,
	Mike Rapoport, "Maciej S . Szmigiero", Vlastimil Babka,
	Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov", "Nakajima, Jun",
	Dave Hansen, Andi Kleen, David Hildenbrand, Marc Zyngier, Will Deacon
Subject: Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
References: <80aad2f9-9612-4e87-a27a-755d3fa97c92@www.fastmail.com>
	<83fd55f8-cd42-4588-9bf6-199cbce70f33@www.fastmail.com>
	<54acbba9-f4fd-48c1-9028-d596d9f63069@www.fastmail.com>
In-Reply-To: <54acbba9-f4fd-48c1-9028-d596d9f63069@www.fastmail.com>
X-Mailing-List: kvm@vger.kernel.org

On Tue, Apr 05, 2022, Andy Lutomirski wrote:
> On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote:
> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> >> The best I can come up with is a special type of shared page that is not
> >> GUP-able and maybe not even mmappable, having a clear option for
> >> transitions to fail, and generally preventing the nasty cases from
> >> happening in the first place.
> >
> > Right, that sounds reasonable to me.
>
> At least as a v1, this is probably more straightforward than allowing mmap().
> Also, there's much to be said for a simpler, limited API, to be expanded if
> genuinely needed, as opposed to starting out with a very featureful API.

Regarding "genuinely needed", IMO the same applies to supporting this at all.
Without numbers from something at least approximating a real use case, we're
just speculating on which will be the most performant approach.

> >> Maybe there could be a special mode for the private memory fds in which
> >> specific pages are marked as "managed by this fd but actually shared".
> >> pread() and pwrite() would work on those pages, but not mmap().  (Or maybe
> >> mmap() but the resulting mappings would not permit GUP.)  And
> >> transitioning them would be a special operation on the fd that is specific
> >> to pKVM and wouldn't work on TDX or SEV.
> >
> > Aha, didn't think of pread()/pwrite(). Very interesting.
>
> There are plenty of use cases for which pread()/pwrite()/splice() will be as
> fast or even much faster than mmap()+memcpy().

...

> resume guest
> *** host -> hypervisor -> guest ***
> Guest unshares the page.
> *** guest -> hypervisor ***
> Hypervisor removes PTE.  TLBI.
> *** hypervisor -> guest ***
>
> Obviously considerable cleverness is needed to make a virt IOMMU like this
> work well, but still.
>
> Anyway, my suggestion is that the fd backing proposal get slightly modified
> to get it ready for multiple subtypes of backing object, which should be a
> pretty minimal change.  Then, if someone actually needs any of this
> cleverness, it can be added later.  In the mean time, the
> pread()/pwrite()/splice() scheme is pretty good.

Tangentially related to getting private-fd ready for multiple things, what about
implementing the pread()/pwrite()/splice() scheme in pKVM itself?  I.e. read() on
the VM fd, with the offset corresponding to gfn in some way.  Ditto for mmap() on
the VM fd, though that would require additional changes outside of pKVM.

That would allow pKVM to support in-place conversions without the private-fd
having to differentiate between the type of protected VM, and without having to
provide new APIs from the private-fd.  TDX, SNP, etc... Just Work by not
supporting the pKVM APIs.

And assuming we get multiple consumers down the road, pKVM will need to be able
to communicate the "true" state of a page to other consumers, because in addition
to being a consumer, pKVM is also an owner/enforcer analogous to the TDX Module
and the SEV PSP.
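
To make "offset corresponding to gfn in some way" a bit more concrete, here is
a rough userspace sketch.  It is purely illustrative: an ordinary temporary file
stands in for the VM fd, and the offset = gfn * PAGE_SIZE mapping, the 4 KiB
page size, and every helper name below are assumptions, not an existing or
proposed KVM/pKVM interface.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


def gfn_to_offset(gfn: int) -> int:
    """Hypothetical mapping from a guest frame number to a file offset."""
    return gfn * PAGE_SIZE


def write_guest_page(fd: int, gfn: int, data: bytes) -> int:
    """pwrite() data at the start of the page that backs 'gfn'."""
    if len(data) > PAGE_SIZE:
        raise ValueError("data must fit in a single page")
    return os.pwrite(fd, data, gfn_to_offset(gfn))


def read_guest_page(fd: int, gfn: int, length: int = PAGE_SIZE) -> bytes:
    """pread() up to 'length' bytes from the page that backs 'gfn'."""
    return os.pread(fd, length, gfn_to_offset(gfn))


def demo() -> bytes:
    # A plain temporary file stands in for the (hypothetical) VM fd.
    with tempfile.TemporaryFile() as backing:
        fd = backing.fileno()
        os.ftruncate(fd, 16 * PAGE_SIZE)  # pretend the guest has 16 pages
        write_guest_page(fd, 3, b"hello")
        return read_guest_page(fd, 3, 5)
```

The point of the sketch is only that the caller needs nothing beyond an fd and
a gfn-derived offset: no mmap(), hence no userspace mapping for GUP to pin.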