From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven DuChene
Subject: Re: Device is ineligible for IOMMU domain attach due to platform RMRR requirement
Date: Sat, 07 Mar 2015 05:13:46 -0500
Message-ID: <54FACF5A.9080504@hp.com>
References: <54F9392B.3060102@hp.com> <1425622242.5200.368.camel@redhat.com> <54FA6C13.1000002@hp.com> <1425703389.4675.49.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Alex Williamson
Return-path:
Received: from g4t3427.houston.hp.com ([15.201.208.55]:57144 "EHLO g4t3427.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751879AbbCGKNL (ORCPT ); Sat, 7 Mar 2015 05:13:11 -0500
In-Reply-To: <1425703389.4675.49.camel@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Alex:
What would be the result of running an earlier kernel that did not have
your RMRR patch on a system that was known to have these problems with
RMRR issues? Would there possibly be some instability when trying to do
PCI passthrough of these same NVidia devices?

We have a debian install on one of these same systems and it is running
a 3.14.23-2 kernel and we are seeing some issues with PCI passthrough.
--
Steven DuChene

On 03/06/2015 11:43 PM, Alex Williamson wrote:
> On Fri, 2015-03-06 at 22:10 -0500, Steven DuChene wrote:
>> Alex:
>> Thanks for your quick reply and the information. One question though:
>> When you say contact the platform vendor, are you talking about the
>> vendor of the GPU card (NVidia) or the vendor of the system hardware
>> (HP)? I.E. is the problem in the system BIOS/firmware or in the firmware
>> of the GPU card?
>>
>> This seems like this is going to be the death-knell of PCI passthrough
>> as the likelihood of getting a system vendor to fix some obscure thing
>> like this seems remote.
> Hi Steven,
>
> The problem is in the system firmware; the platform vendor in your case
> is HP. The issue is actually very limited.
> Most platform vendors do
> not make use of RMRRs beyond the recommendations of the VT-d spec. This
> limits RMRRs in the general case to a small set of devices that are not
> generally used for PCI assignment anyway. An exemption even exists for
> RMRRs associated with USB devices since their usage is known to be
> limited to early boot. That effectively limits the scope for most
> vendors to UMA graphics where PCI assignment does not yet work anyway.
> I expect an exemption could also be added there once the RMRR usage is
> discovered and documented.
>
> In the case you've encountered, the RMRR usage is proprietary and we
> cannot know the extent of ongoing usage. We must therefore assume that
> it is in use and that the RMRR requirement of the platform must be
> honored.
>
> Obviously our goal with this change is not to pick on any specific
> vendor, but to restrict PCI assignment where it can be implemented
> safely, both for the platform and the VM. RMRRs present a restriction
> in how the IOVA space for a device can be used that we cannot continue
> to ignore and which presents implementation issues to support in a PCI
> device assignment model. HP engineers as well as the upstream community
> have been consulted on this change and agreed to the restriction. As I
> said, KVM is not the first hypervisor to implement this restriction and
> PCI assignment continues to be a valuable feature on those hypervisors.
> Even on affected systems, RMRRs typically only apply to physical PCI
> devices. The vast majority of PCI assignment applications are used with
> networking devices where SR-IOV is far more prevalent and where SR-IOV
> virtual functions are typically unencumbered by RMRRs.
>
> I believe this change is in the best interest of PCI assignment users,
> the scope of affected systems is not as widespread as it might seem from
> your perspective, and workarounds are often available for the most
> common use case in the form of SR-IOV VFs.
> Unfortunately we don't have
> SR-IOV for Nvidia Tesla cards, so again, all I can offer is to contact
> the platform vendor to see if there's any chance of a firmware update
> that might remove this restriction. Thanks,
>
> Alex
>
>
>> On 03/06/2015 01:10 AM, Alex Williamson wrote:
>>> On Fri, 2015-03-06 at 00:20 -0500, Steven DuChene wrote:
>>>> I am attempting on ubuntu 14.04 to configure PCI passthrough of an NVidia
>>>> K40 GPU card that is plugged into a HP DL580 rack mounted server.
>>>> I have done all of the pre-work I normally have done in the past with
>>>> pci-stub, vfio, etc., but when I try to execute a qemu-system-x86_64
>>>> command that works on a similar version of debian, I get the following
>>>> error in the dmesg:
>>>>
>>>> Device is ineligible for IOMMU domain attach due to platform RMRR
>>>> requirement. Contact your platform vendor.
>>>>
>>>> I have read through the patch description from Alex at:
>>>>
>>>> http://lists.linuxfoundation.org/pipermail/iommu/2014-June/008816.html
>>>>
>>>> and I have read the IOMMU documentation at:
>>>>
>>>> https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
>>>>
>>>> but I am still not really understanding if or what the fix is for this.
>>>>
>>>> The ubuntu 14.04 system where I am getting this error is running
>>>> 3.16.0-30-generic
>>>> The debian system where I can do similar PCI passthrough of an NVidia K2
>>>> GPU device is running a 3.14.29-4 kernel.
>>>>
>>>> Can anyone provide any insight into a fix or workaround for this?
>>> Hi Steven,
>>>
>>> The issue is that VT-d RMRRs are a platform imposed requirement that a
>>> device continue to have identity mapped access to a platform defined
>>> memory region at all times. This requirement is fundamentally
>>> incompatible with PCI device assignment where the address space of the
>>> assigned device is defined by the VM.
>>> The VT-d specification hints at
>>> this restriction (8.4):
>>>
>>>     The RMRR regions are expected to be used for legacy usages (such
>>>     as USB, UMA Graphics, etc.) requiring reserved memory. Platform
>>>     designers should avoid or limit use of reserved memory regions
>>>     since these require system software to create holes in the DMA
>>>     virtual address range available to system software and its
>>>     drivers.
>>>
>>> In order to support assignment of such devices and continue to honor the
>>> RMRR, reserved memory regions would need to be imposed on the guest.
>>> Doing this has a number of issues and it's not clear that it enables any
>>> usable configurations due to the lack of isolation often implied by the
>>> RMRRs. RMRRs themselves imply some sort of communication conduit to the
>>> platform, which it's also not clear should be allowed for a guest owned
>>> device.
>>>
>>> We also cannot continue the previous behavior of simply ignoring RMRRs
>>> for assigned devices. Not only does the platform require us to honor
>>> them, failing to do so could have implications for both the platform and
>>> the VM health and integrity.
>>>
>>> As indicated by the dmesg warning, users encountering this problem
>>> should contact their platform vendor, which is really the only course of
>>> action that I can recommend. Only the platform vendor can tell you why
>>> they've imposed this requirement for the device and potentially offer a
>>> remedy to remove that requirement. KVM is not the first hypervisor to
>>> impose this restriction for such devices. The referenced patch was
>>> tagged for stable, so you can expect that this change will eventually
>>> trickle through all the distributions. Sorry for the trouble, but it
>>> really was a necessary change. Thanks,
>>>
>>> Alex
>>>
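
The conflict described in the quoted text (an RMRR must stay identity mapped
while the VM defines the assigned device's address space) can be pictured as
a simple range-overlap check. The sketch below is only an illustration and is
not taken from the kernel patch. The names Region, overlaps and
conflicting_rmrrs are hypothetical, and the addresses are made up; a real
platform reports its RMRRs through the ACPI DMAR table.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """An inclusive address range (hypothetical helper type)."""
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True when two inclusive ranges share at least one address."""
    return a.start <= b.end and b.start <= a.end


def conflicting_rmrrs(guest_ranges: List[Region],
                      rmrrs: List[Region]) -> List[Region]:
    """Return the RMRRs that collide with any guest-defined IOVA range.

    An RMRR must remain identity mapped for the device, so any guest
    mapping that covers the same addresses cannot also be honored.
    """
    return [r for r in rmrrs if any(overlaps(r, g) for g in guest_ranges)]


if __name__ == "__main__":
    # Assumed values: a 2 GiB guest mapped at IOVA 0 and one 1 MiB RMRR.
    guest = [Region(0x0000_0000, 0x7FFF_FFFF)]
    rmrrs = [Region(0x7D00_0000, 0x7D0F_FFFF)]
    for r in conflicting_rmrrs(guest, rmrrs):
        print(f"RMRR {r.start:#x}-{r.end:#x} collides with guest IOVA space")
```

Avoiding the overlap would mean imposing a hole on the guest's memory map at
the RMRR address, which is the option the quoted text describes as having a
number of issues.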