From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: Eduardo Habkost, kvm@vger.kernel.org, Richard Henderson,
 Stefan Hajnoczi, qemu-devel@nongnu.org, Peter Xu,
 "Dr . David Alan Gilbert", Sebastien Boeuf, Igor Mammedov, Ani Sinha,
 Paolo Bonzini, Hui Zhu, Philippe Mathieu-Daudé
Subject: Re: [PATCH v1 00/12] virtio-mem: Expose device memory via multiple memslots
Date: Tue, 2 Nov 2021 07:35:31 -0400
Message-ID: <20211102072843-mutt-send-email-mst@kernel.org>
References: <20211027124531.57561-1-david@redhat.com> <20211101181352-mutt-send-email-mst@kernel.org>

On Tue, Nov 02, 2021 at 09:33:55AM +0100, David Hildenbrand wrote:
> On 01.11.21 23:15, Michael S. Tsirkin wrote:
> > On Wed, Oct 27, 2021 at 02:45:19PM +0200, David Hildenbrand wrote:
> >> This is the follow-up of [1], dropping auto-detection and vhost-user
> >> changes from the initial RFC.
> >>
> >> Based-on: 20211011175346.15499-1-david@redhat.com
> >>
> >> A virtio-mem device is represented by a single large RAM memory region
> >> backed by a single large mmap.
> >>
> >> Right now, we map that complete memory region into guest physical address
> >> space, resulting in a very large memory mapping, KVM memory slot, ...
> >> although only a small amount of memory might actually be exposed to the VM.
> >>
> >> For example, when starting a VM with a 1 TiB virtio-mem device that only
> >> exposes little device memory (e.g., 1 GiB) towards the VM initially,
> >> in order to hotplug more memory later, we waste a lot of memory on metadata
> >> for KVM memory slots (> 2 GiB!) and accompanying bitmaps. Although some
> >> optimizations in KVM are being worked on to reduce this metadata overhead
> >> on x86-64 in some cases, it remains a problem with nested VMs and there are
> >> other reasons why we would want to reduce the total memory slot size to a
> >> reasonable minimum.
> >>
> >> We want to:
> >> a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
> >>    inside QEMU KVM code where possible.
> >> b) Not always expose all device-memory to the VM, to reduce the attack
> >>    surface of malicious VMs without using userfaultfd.
> >
> > I'm confused by the mention of these security considerations,
> > and I expect users will be just as confused.
>
> Malicious VMs wanting to consume more memory than desired is only
> relevant when running untrusted VMs in some environments, and it can be
> caught differently, for example, by carefully monitoring and limiting
> the maximum memory consumption of a VM. We have the same issue already
> when using virtio-balloon to logically unplug memory. For me, it's a
> secondary concern (optimizing a) is much more important).
>
> Some users showed interest in having QEMU disallow access to unplugged
> memory, because coming up with a maximum memory consumption for a VM is
> hard. This is one step in that direction without having to run with
> uffd enabled all of the time.

Sorry about missing the memo - is there a lot of overhead associated
with uffd then?

> ("security" is somewhat the wrong word. We won't be able to steal any
> information from the hypervisor.)

Right. Let's just spell it out.
Further, removing memory still requires guest cooperation.

>
> > So let's say a user wants to not be exposed. What value for
> > the option should be used? What if a lower option is used?
> > Is there still some security advantage?
>
> My recommendation will be to use 1 memslot per gigabyte as default if
> possible in the configuration. If we have a virtio-mem device with a
> maximum size of 128 GiB, the suggestion will be to use memslots=128.
> Some setups will require less (e.g., vhost-user until adjusted, old
> KVM), some setups can allow for more. I assume that most users will
> later set "memslots=0", to enable auto-detection mode.
>
>
> Assume we have a virtio-mem device with a maximum size of 1 TiB and we
> hotplugged 1 GiB to the VM. With "memslots=1", the malicious VM could
> actually access the whole 1 TiB. With "memslots=1024", the malicious VM
> could only access an additional ~1 GiB. With "memslots=512", ~2 GiB.
> That's the reduced attack surface.
>
> Of course, it's different after we hotunplugged memory, before we have
> VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE support in QEMU, because all memory
> inside the usable region has to be accessible and we cannot "unplug" the
> memslots.
>
>
> Note: With upcoming VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE changes in QEMU,
> one will be able to disallow any access for malicious VMs by setting the
> memblock size just as big as the device block size.
>
> So with a 128 GiB virtio-mem device with memslots=128,block-size=1G, or
> with memslots=1024,block-size=128M we could make it impossible for a
> malicious VM to consume more memory than intended. But we lose
> flexibility due to the block size and the limited number of available
> memslots.
>
> But again, for "full protection against malicious VMs" I consider
> userfaultfd protection more flexible. This approach here gives some
> advantage, especially when having large virtio-mem devices that start
> out small.
>
> --
> Thanks,
>
> David / dhildenb
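
The memslot figures quoted above can be reproduced with a rough sketch.
This is not QEMU code. It assumes the device region is split evenly
across memslots and that a guest can reach roughly one memslot beyond
what is plugged, which is how the ~1 GiB and ~2 GiB numbers in the
quoted text work out. All function names are hypothetical, and
"memslots=0" (auto-detection) is not modeled.

```python
"""Back-of-the-envelope memslot arithmetic for a virtio-mem device.

Assumptions (not taken from QEMU source): the device region is split
evenly across memslots, and the extra memory a guest can reach is
bounded by about one memslot. All function names are hypothetical.
"""

MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memslot_size(device_size: int, memslots: int) -> int:
    """Return the size of one memslot for an evenly split device region."""
    # memslots=0 (auto-detection in the thread) is not modeled here.
    if memslots < 1:
        raise ValueError("memslots must be >= 1")
    if device_size % memslots:
        raise ValueError("device size must be a multiple of memslots")
    return device_size // memslots


def suggested_memslots(device_size: int) -> int:
    """One memslot per GiB of maximum device size, but at least one."""
    return max(1, device_size // GiB)


def approx_extra_exposure(device_size: int, plugged: int, memslots: int) -> int:
    """Approximate memory reachable beyond what is plugged (assumed bound)."""
    slot = memslot_size(device_size, memslots)
    # Never more than what is left of the device beyond the plugged part.
    return min(slot, device_size - plugged)


if __name__ == "__main__":
    for n in (1, 512, 1024):
        extra = approx_extra_exposure(TiB, GiB, n)
        print(f"memslots={n}: about {extra / GiB:g} GiB beyond the plugged 1 GiB")
    print("128 GiB device ->", suggested_memslots(128 * GiB), "memslots")
```

Under the same assumptions, memslot_size(128 * GiB, 128) gives 1 GiB and
memslot_size(128 * GiB, 1024) gives 128 MiB, matching the
memslots=128,block-size=1G and memslots=1024,block-size=128M pairings
mentioned in the quoted text.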