Subject: Re: [RESEND RFC PATCH 0/5] Remote mapping
To: Andy Lutomirski
Cc: Adalbert Lazăr, Linux-MM, Linux API, Andrew Morton, Alexander Graf,
 Stefan Hajnoczi, Jerome Glisse, Mihai Donțu, Mircea Cirjaliu,
 Arnd Bergmann, Sargun Dhillon, Aleksa Sarai, Oleg Nesterov, Jann Horn,
 Kees Cook, Matthew Wilcox, Christian Brauner
From: Paolo Bonzini
Date: Sat, 5 Sep 2020 20:27:29 +0200

On 05/09/20 01:17, Andy Lutomirski wrote:
> There's sev_pin_memory(), so QEMU must have at least some idea of
> which memory could potentially be encrypted. Is it in fact the case
> that QEMU doesn't know that some SEV pinned memory might actually be
> used for DMA until the guest tries to do DMA on that memory? If so,
> yuck.

Yes. All the memory is pinned, all the memory could potentially be used
for DMA (of garbage if it's encrypted). And it's the same for pretty
much all protected VM extensions (SEV, POWER, s390, Intel TDX).

>> The primary VM and the enclave VM(s) would each get a different memory
>> access file descriptor. QEMU would treat them no differently from any
>> other externally-provided memory backend, say hugetlbfs or memfd, so
>> yeah they would be mmap-ed to userspace and the host virtual address
>> passed as usual to KVM.
>
> Would the VM processes mmap() these descriptors, or would KVM learn
> how to handle that memory without it being mapped?

The idea is that the process mmaps them, QEMU would treat them just the
same as a hugetlbfs file descriptor for example.

>> The manager can decide at any time to hide some memory from the parent
>> VM (in order to give it to an enclave). This would actually be done on
>> request of the parent VM itself [...] But QEMU is
>> untrusted, so the manager cannot rely on QEMU behaving well. Hence the
>> privilege separation model that was implemented here.
>
> How does this work? Is there a revoke mechanism, or does the parent
> just munmap() the memory itself?

The parent has ioctls to add and remove memory from the pidfd-mem. So
unmapping is just calling the ioctl that removes a range.

>> So what you are suggesting is that KVM manages its own address space
>> instead of host virtual addresses (and with no relationship to host
>> virtual addresses, it would be just a "cookie")?
>
> [...] For this pidfd-mem scheme in particular, it might avoid the nasty
> corner case I mentioned.
> With pidfd-mem as in this patchset, I'm
> concerned about what happens when process A maps some process B
> memory, process B maps some of process A's memory, and there's a
> recursive mapping that results. Or when a process maps its own
> memory, for that matter.
>
> Or memfd could get fancier with operations to split memfds, remove
> pages from memfds, etc. Maybe that's overkill.

Doing it directly with memfd is certainly an option, especially since
MFD_HUGE_* exists. Basically you'd have a system call to create a
secondary view of the memfd, and the syscall interface could still be
very similar to what is in this patch, in particular the control/access
pair. This could probably also be used to implement Matthew Wilcox's
ideas.

I still believe that the pidfd-mem concept has merit as a
"capability-like" PTRACE_{PEEK,POKE}DATA replacement, but it would not
need any privilege separation or mmap support, only direct read/write.

So there are two concepts mixed in one interface in this patch, with two
completely different use cases. Merging them is clever, but perhaps too
clever. I can say that since it was my idea. :D

Thanks,

Paolo
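For illustration only (not part of the patch series), here is a minimal sketch of the "secondary view of the memfd" direction, approximated in userspace with existing calls (memfd_create, ftruncate, mmap). A read/write mapping of the whole memfd stands in for the manager's "control" side, and a second, read-only mapping of a single page stands in for an "access" view. The helper names make_backing, map_view and demo_views are hypothetical, and treating plain mappings as control/access views is an assumption: unlike the proposed syscall, nothing here can revoke the access view. The sketch uses regular shmem pages; passing MFD_HUGETLB plus an MFD_HUGE_* size would additionally need hugepages reserved on the host.

```c
#define _GNU_SOURCE             /* memfd_create() and MFD_CLOEXEC */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd of len bytes standing in for guest RAM. */
static int make_backing(size_t len)
{
    int fd = memfd_create("guest-ram", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: map len bytes of fd at offset off as a shared view. */
static void *map_view(int fd, size_t len, off_t off, int prot)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(off % page == 0);    /* mmap() offsets must be page-aligned */
    void *p = mmap(NULL, len, prot, MAP_SHARED, fd, off);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical demo: write through a read/write "control" view of two
 * pages, then read through a read-only "access" view of the second page.
 * Copies what the access view sees into out; returns 0 on success.
 */
static int demo_views(char *out, size_t outlen)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int fd = make_backing(2 * page);
    if (fd < 0)
        return -1;

    char *ctl = map_view(fd, 2 * page, 0, PROT_READ | PROT_WRITE);
    char *acc = map_view(fd, page, (off_t)page, PROT_READ);
    int ret = -1;

    if (ctl && acc && outlen > 0) {
        strcpy(ctl + page, "written via control view");
        /* acc[0] aliases ctl[page]: both map the same shmem page. */
        strncpy(out, acc, outlen - 1);
        out[outlen - 1] = '\0';
        ret = 0;
    }
    if (ctl)
        munmap(ctl, 2 * page);
    if (acc)
        munmap(acc, page);
    close(fd);
    return ret;
}
```

Called as demo_views(buf, sizeof buf), it returns 0 and leaves the string written through the control view in buf, because both mappings share the same underlying pages.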