Subject: Re: [RESEND RFC PATCH 0/5] Remote mapping
From: Paolo Bonzini
Date: Fri, 4 Sep 2020 23:58:57 +0200
To: Andy Lutomirski
Cc: Andy Lutomirski, Adalbert Lazăr, Linux-MM, Linux API,
    Andrew Morton, Alexander Graf, Stefan Hajnoczi, Jerome Glisse,
    Mihai Donțu, Mircea Cirjaliu, Arnd Bergmann, Sargun Dhillon,
    Aleksa Sarai, Oleg Nesterov, Jann Horn, Kees Cook, Matthew Wilcox,
    Christian Brauner
Message-ID: <836cff86-e670-8c69-6cbd-b22c5b5538df@redhat.com>
In-Reply-To: <70D23368-A24D-4A15-8FC7-FA728D102475@amacapital.net>

On 04/09/20 22:34, Andy Lutomirski wrote:
> On Sep 4, 2020, at 1:09 PM, Paolo Bonzini wrote:
>> On 04/09/20 21:39, Andy Lutomirski wrote:
>>> I'm a little concerned
>>> that it's actually too clever and that maybe a more
>>> straightforward solution should be investigated. I personally
>>> rather dislike the KVM model in which the guest address space
>>> mirrors the host (QEMU) address space rather than being its own
>>> thing. In particular, the current model means that
>>> extra-special-strange mappings like SEV-encrypted memory are
>>> required to be present in the QEMU page tables in order for the
>>> guest to see them. (If I had noticed that last bit before it went
>>> upstream, I would have NAKked it. I would still like to see it
>>> deprecated and ideally eventually removed from the kernel. We
>>> have absolutely no business creating incoherent mappings like
>>> this.)
>>
>> NACK first and ask second, right Andy? I see that nothing has
>> changed since Alan Cox left Linux.
>
> NACKs are negotiable. And maybe someone can convince me that the SEV
> mapping scheme is reasonable, but I would be surprised.

So why say NACK? Any half-decent maintainer would hold off on merging
the patches at least until the discussion is over. Also I suppose any
deprecation proposal should come with a description of an alternative.

Anyway, for SEV the problem is DMA. There is no way to know in advance
which memory the guest will use for I/O; it can change at any time and
the same host-physical address can even be mapped both as C=0 and C=1
by the guest. There's no communication protocol between the guest and
the host to tell the host _which_ memory should be mapped in QEMU. (One
was added to support migration, but that doesn't even work with SEV-ES
processors, where migration is planned to happen mostly with help from
the guest, either in the firmware or somewhere else.)

But this is a digression.
(If you would like to continue the discussion, please trim the
recipient list and change the subject.)

> Regardless, you seem to be suggesting that you want to have enclave
> VMs in which the enclave can see some memory that the parent VM can’t
> see. How does this fit into the KVM mapping model? How does this
> remote mapping mechanism help? Do you want QEMU to have that memory
> mapped in its own pagetables?

There are three processes:

- the manager, which is the parent of the VMs and uses the pidfd_mem
  system call

- the primary VM

- the enclave VM(s)

The primary VM and the enclave VM(s) would each get a different memory
access file descriptor. QEMU would treat them no differently from any
other externally-provided memory backend, say hugetlbfs or memfd, so
yeah, they would be mmap-ed to userspace and the host virtual address
passed as usual to KVM.

Enclave VMs could be used to store secrets and perform crypto, for
example. The enclave is measured at boot; any keys or other stuff it
needs can be provided out-of-band from the manager.

The manager can decide at any time to hide some memory from the parent
VM (in order to give it to an enclave). This would actually be done on
request of the parent VM itself, and QEMU would probably be so kind as
to replace the "hole" left in the guest memory with zeroes. But QEMU is
untrusted, so the manager cannot rely on QEMU behaving well. Hence the
privilege separation model that was implemented here.

Actually Amazon has already created something like that and Andra-Irina
Paraschiv has posted patches on the list for this. Their implementation
is not open source, but this pidfd-mem concept is something that Andra,
Alexander Graf and I came up with as a way to 1) reimplement the
feature upstream, 2) satisfy Bitdefender's need for memory
introspection, and 3) add what seemed a useful interface anyway, for
example to replace PTRACE_{PEEK,POKE}DATA.
Though (3) would only need pread/pwrite, not mmap, which adds a lot of
the complexity.

> As it stands, the way that KVM memory mappings are created seems to
> be convenient, but it also seems to be resulting in increasing
> bizarre userspace mappings. At what point is the right solution to
> decouple KVM’s mappings from QEMU’s?

So what you are suggesting is that KVM manages its own address space
instead of host virtual addresses (and with no relationship to host
virtual addresses, it would be just a "cookie")? It would then need a
couple of ioctls to mmap/munmap (creating and deleting VMAs) into the
address space, and those cookies would be passed to
KVM_SET_USER_MEMORY_REGION. QEMU would still need access to these VMAs;
would it mmap a file descriptor provided by KVM?

All in all the implementation seems quite complex, and I don't
understand why it would avoid incoherent SEV mappings; what am I
missing?

Paolo