From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55123)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1cGoDj-0004Es-Tq for qemu-devel@nongnu.org;
    Tue, 13 Dec 2016 09:38:26 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1cGoDg-00088t-NC for qemu-devel@nongnu.org;
    Tue, 13 Dec 2016 09:38:19 -0500
Received: from mx1.redhat.com ([209.132.183.28]:51124)
    by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
    (Exim 4.71) (envelope-from ) id 1cGoDg-00088Y-F7
    for qemu-devel@nongnu.org; Tue, 13 Dec 2016 09:38:16 -0500
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
    (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by mx1.redhat.com (Postfix) with ESMTPS id 5D2D9C05AA6F
    for ; Tue, 13 Dec 2016 14:38:15 +0000 (UTC)
Date: Tue, 13 Dec 2016 16:38:14 +0200
From: "Michael S. Tsirkin"
Message-ID: <20161213163524-mutt-send-email-mst@kernel.org>
References: <1481089965-3888-3-git-send-email-peterx@redhat.com>
    <20161211051011-mutt-send-email-mst@kernel.org>
    <20161212015602.GJ28693@pxdev.xzpeter.org>
    <20161212123544.2139b842@t450s.home>
    <20161213033341.GA32222@pxdev.xzpeter.org>
    <20161212205150.3e7f7d3b@t450s.home>
    <20161213052429.GB32222@pxdev.xzpeter.org>
    <20161212224828.5cc9f841@t450s.home>
    <20161213061212.GC32222@pxdev.xzpeter.org>
    <20161213061747.0c152b86@t450s.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161213061747.0c152b86@t450s.home>
Subject: Re: [Qemu-devel] [PATCH for-2.9 2/2] intel_iommu: extend supported guest aw to 48 bits
To: Alex Williamson
Cc: Peter Xu, jasowang@redhat.com, famz@redhat.com, qemu-devel@nongnu.org

On Tue, Dec 13, 2016 at 06:17:47AM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 14:12:12 +0800
> Peter Xu wrote:
>
> > On Mon, Dec 12, 2016 at 10:48:28PM -0700, Alex Williamson wrote:
> > > On Tue, 13 Dec 2016 13:24:29 +0800
> > > Peter Xu wrote:
> > >
> > > > On Mon, Dec 12, 2016 at 08:51:50PM -0700, Alex Williamson wrote:
> > > >
> > > > [...]
> > > >
> > > > > > > I'm not sure how the vIOMMU supporting 39 bits or 48 bits is directly
> > > > > > > relevant to vfio, we're not sharing page tables. There is already a
> > > > > > > case today, without vIOMMU that you can make a guest which has more
> > > > > > > guest physical address space than the hardware IOMMU by overcommitting
> > > > > > > system memory. Generally this quickly resolves itself when we start
> > > > > > > pinning pages since the physical address width of the IOMMU is
> > > > > > > typically the same as the physical address width of the host system
> > > > > > > (ie. we exhaust the host memory).
> > > > > >
> > > > > > Hi, Alex,
> > > > > >
> > > > > > Here, does "hardware IOMMU" mean the IOMMU iova address space width?
> > > > > > For example, if guest has 48 bits physical address width (without
> > > > > > vIOMMU), but host hardware IOMMU only supports 39 bits for its iova
> > > > > > address space, could device assignment work in this case?
> > > > >
> > > > > The current usage depends entirely on what the user (VM) tries to map.
> > > > > You could expose a vIOMMU with a 64bit address width, but the moment
> > > > > you try to perform a DMA mapping with IOVA beyond bit 39 (if that's the
> > > > > host IOMMU address width), the ioctl will fail and the VM will abort.
> > > > > IOW, you can claim whatever vIOMMU address width you want, but if you
> > > > > lay out guest memory or devices in such a way that actually requires IOVA
> > > > > mapping beyond the host capabilities, you're going to abort. Likewise,
> > > > > without a vIOMMU if the guest memory layout is sufficiently sparse to
> > > > > require such IOVAs, you're going to abort. Thanks,
> > > >
> > > > Thanks for the explanation. I got the point.
> > > >
> > > > However, should we allow guest behaviors to affect the hypervisor? In this
> > > > case, if guest maps IOVA range over 39 bits (assuming vIOMMU is
> > > > declaring itself with 48 bits address width), the VM will crash. How
> > > > about we shrink vIOMMU address width to 39 bits during boot if we
> > > > detected that assigned devices are configured? IMHO no matter what we
> > > > do in the guest, the hypervisor should keep the guest alive from
> > > > hypervisor POV (emulation of the guest hardware should not be stopped
> > > > by guest behavior). If any operation in guest can cause hypervisor
> > > > down, isn't it a bug?
> > >
> > > Any case of the guest crashing the hypervisor (ie. the host) is a
> > > serious bug, but a guest causing its own VM to abort is an entirely
> > > different class, and in some cases justified.
> > > For instance, you only
> > > need a guest misbehaving in the virtio protocol to generate a VM
> > > abort. The cases Kevin raises make me reconsider because they are
> > > cases of a VM behaving properly, within the specifications of the
> > > hardware exposed to it, generating a VM abort, and in the case of vfio
> > > exposed through to a guest user, allow the VM to be susceptible to the
> > > actions of that user.
> > >
> > > Of course any time we tie VM hardware to a host constraint, we're
> > > asking for trouble. Your example of shrinking the vIOMMU address
> > > width to 39bits on boot highlights that. Clearly cold plug devices is
> > > only one scenario, what about hotplug devices? We cannot dynamically
> > > change the vIOMMU address width. What about migration, we could start
> > > the VM w/o an assigned device on a 48bit capable host and migrate it to
> > > a 39bit host and then attempt to hot add an assigned device. For the
> > > most compatibility, why would we ever configure the VM with a vIOMMU
> > > address width beyond the minimum necessary to support the potential
> > > populated guest physical memory? Thanks,
> > >
> > For now, I feel a tunable for the address width is more essential - let's
> > just name it as "aw-bits", which should only be used by advanced
> > users. By default, we can use an address width safe enough, like 39
> > bits (I assume that most pIOMMUs should support at least 39 bits).
> > User configurations can override (for now, we can limit the options to
> > only 39/48 bits).
> >
> > Then, we can temporarily live even without the interface to detect
> > host parameters - when the user specifies a specific width, he/she will
> > manage the rest (of course taking the risk of VM aborts).
>
> I'm sorry, what is the actual benefit of a 48-bit address width?
> Simply to be able to support larger memory VMs?
> In that case the
> address width should be automatically configured when necessary rather
> than providing yet another obscure user configuration.

I think we need to map out all the issues, and a tunable isn't a bad way
to experiment in order to do this.

> Minimally, if
> we don't have the support worked out for an option we should denote it
> as an experimental option by prefixing it with 'x-'. Once we make a
> non-experimental option, we're stuck with it, and it feels like this is
> being rushed through without a concrete requirement for supporting
> it. Thanks,
>
> Alex

That's a good idea I think. We'll rename once we have a better
understanding of what this depends on.

-- 
MST