From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steven DuChene <steven.duchene@hp.com>
Subject: Re: Device is ineligible for IOMMU domain attach due to platform
 RMRR requirement
Date: Sat, 07 Mar 2015 05:13:46 -0500
Message-ID: <54FACF5A.9080504@hp.com>
References: <54F9392B.3060102@hp.com> <1425622242.5200.368.camel@redhat.com>	 <54FA6C13.1000002@hp.com> <1425703389.4675.49.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Alex Williamson <alex.williamson@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from g4t3427.houston.hp.com ([15.201.208.55]:57144 "EHLO
	g4t3427.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751879AbbCGKNL (ORCPT <rfc822;kvm@vger.kernel.org>);
	Sat, 7 Mar 2015 05:13:11 -0500
In-Reply-To: <1425703389.4675.49.camel@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Alex:
What would be the result of running an earlier kernel, one without your
RMRR patch, on a system known to have these RMRR issues? Could there be
instability when doing PCI passthrough of these same NVidia devices?

We have a Debian install on one of these same systems; it is running
a 3.14.23-2 kernel, and we are seeing some issues with PCI passthrough.
--
Steven DuChene

On 03/06/2015 11:43 PM, Alex Williamson wrote:
> On Fri, 2015-03-06 at 22:10 -0500, Steven DuChene wrote:
>> Alex:
>> Thanks for your quick reply and the information. One question though:
>> When you say contact the platform vendor, are you talking about the
>> vendor of the GPU card (NVidia) or the vendor of the system hardware
>> (HP)? I.E. is the problem in the system BIOS/firmware or in the firmware
>> of the GPU card?
>>
>> It seems this is going to be the death knell of PCI passthrough,
>> as the chance of getting a system vendor to fix some obscure issue
>> like this seems remote.
> Hi Steven,
>
> The problem is in the system firmware; the platform vendor in your case
> is HP.  The scope of the issue is actually very limited.  Most
> platform vendors do not make use of RMRRs beyond the recommendations of
> the VT-d spec, which limits RMRRs in the general case to a small set of
> devices that are not generally used for PCI assignment anyway.  An
> exemption even exists for RMRRs associated with USB devices, since their
> usage is known to be limited to early boot.  That effectively limits
> the scope for most vendors to UMA graphics, where PCI assignment does
> not yet work anyway.
> I expect an exemption could also be added there once the RMRR usage is
> discovered and documented.
>
> In the case you've encountered, the RMRR usage is proprietary and we
> cannot know the extent of ongoing usage.  We must therefore assume that
> it is in use and that the RMRR requirement of the platform must be
> honored.
>
> Obviously our goal with this change is not to pick on any specific
> vendor, but to restrict PCI assignment to cases where it can be
> implemented safely, both for the platform and the VM.  RMRRs impose a
> restriction on how a device's IOVA space can be used; we cannot continue
> to ignore it, and it is difficult to support in a PCI device assignment
> model.  HP engineers as well as the upstream community have been
> consulted on this change and agreed to the restriction.  As I said, KVM
> is not the first hypervisor to implement this restriction, and PCI
> assignment continues to be a valuable feature on those hypervisors.
> Even on affected systems, RMRRs typically only apply to physical PCI
> devices.  The vast majority of PCI assignment applications are used with
> networking devices where SR-IOV is far more prevalent and where SR-IOV
> virtual functions are typically unencumbered by RMRRs.
>
> I believe this change is in the best interest of PCI assignment users,
> the scope of affected systems is not as widespread as it might seem from
> your perspective, and workarounds are often available for the most
> common use case in the form of SR-IOV VFs.  Unfortunately we don't have
> SR-IOV for Nvidia Tesla cards, so again, all I can offer is to contact
> the platform vendor to see if there's any chance of a firmware update
> that might remove this restriction.  Thanks,
>
> Alex
>
>
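The attach-time policy described above (refuse devices with an RMRR, except USB devices whose RMRR use is limited to early boot) can be modeled roughly as follows. This is a simplified Python sketch, not the kernel's intel-iommu code: the `PciDevice` record and `rmrr_blocks_assignment()` are hypothetical names, the `has_rmrr` flag stands in for whatever the firmware's DMAR table reports, and only the USB exemption is modeled (the possible future graphics exemption is left out). The class code 0x0C03 is the standard PCI class for USB host controllers.

```python
from dataclasses import dataclass

# Standard PCI class code (base class << 8 | subclass) for USB host controllers.
PCI_CLASS_SERIAL_USB = 0x0C03


@dataclass(frozen=True)
class PciDevice:
    """Hypothetical, simplified device record used only for this sketch."""
    name: str
    class_code: int   # (base class << 8) | subclass
    has_rmrr: bool    # assumption: firmware lists an RMRR for this device


def rmrr_blocks_assignment(dev: PciDevice) -> bool:
    """Return True if the device would be refused for guest assignment."""
    if not dev.has_rmrr:
        return False  # no RMRR: nothing to honor
    if dev.class_code == PCI_CLASS_SERIAL_USB:
        return False  # USB RMRRs are assumed to be used only during early boot
    return True       # unknown/proprietary RMRR use: assume it is still live


if __name__ == "__main__":
    devices = [
        PciDevice("GPU with firmware RMRR", 0x0302, True),
        PciDevice("USB controller with RMRR", 0x0C03, True),
        PciDevice("SR-IOV VF (no RMRR)", 0x0200, False),
    ]
    for d in devices:
        verdict = "ineligible" if rmrr_blocks_assignment(d) else "eligible"
        print(f"{d.name:28} {verdict}")
```

The sample devices are made up; the point is that the decision depends only on whether an RMRR exists and whether its usage is known to be harmless, not on how the guest would use the device.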
>> On 03/06/2015 01:10 AM, Alex Williamson wrote:
>>> On Fri, 2015-03-06 at 00:20 -0500, Steven DuChene wrote:
>>>> I am attempting to configure, on Ubuntu 14.04, PCI passthrough of an
>>>> NVidia K40 GPU card plugged into an HP DL580 rack-mounted server.
>>>> I have done all of the pre-work I have normally done in the past with
>>>> pci-stub, vfio, etc., but when I try to execute a qemu-system-x86_64
>>>> command that works on a similar version of Debian, I get the following
>>>> error in dmesg:
>>>>
>>>> Device is ineligible for IOMMU domain attach due to platform RMRR
>>>> requirement. Contact your platform vendor.
>>>>
>>>> I have read through the patch description from Alex at:
>>>>
>>>> http://lists.linuxfoundation.org/pipermail/iommu/2014-June/008816.html
>>>>
>>>> and I have read the IOMMU documentation at:
>>>>
>>>> https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
>>>>
>>>> but I still do not understand whether there is a fix for this, or what it is.
>>>>
>>>> The Ubuntu 14.04 system where I am getting this error is running
>>>> 3.16.0-30-generic.
>>>> The Debian system where I can do similar PCI passthrough of an NVidia K2
>>>> GPU device is running a 3.14.29-4 kernel.
>>>>
>>>> Can anyone provide any insight into a fix or workaround for this?
>>> Hi Steven,
>>>
>>> The issue is that VT-d RMRRs are a platform imposed requirement that a
>>> device continue to have identity mapped access to a platform defined
>>> memory region at all times.  This requirement is fundamentally
>>> incompatible with PCI device assignment where the address space of the
>>> assigned device is defined by the VM.  The VT-d specification hints at
>>> this restriction (8.4):
>>>
>>>           The RMRR regions are expected to be used for legacy usages (such
>>>           as USB, UMA Graphics, etc.) requiring reserved memory. Platform
>>>           designers should avoid or limit use of reserved memory regions
>>>           since these require system software to create holes in the DMA
>>>           virtual address range available to system software and its
>>>           drivers.
>>>
>>> In order to support assignment of such devices and continue to honor the
>>> RMRR, reserved memory regions would need to be imposed on the guest.
>>> Doing this has a number of issues and it's not clear that it enables any
>>> usable configurations due to the lack of isolation often implied by the
>>> RMRRs.  RMRRs themselves imply some sort of communication conduit to the
>>> platform, and it is also not clear that such a conduit should be allowed
>>> for a guest-owned device.
>>>
>>> We also cannot continue the previous behavior of simply ignoring RMRRs
>>> for assigned devices.  Not only does the platform require us to honor
>>> them, failing to do so could have implications for the health and
>>> integrity of both the platform and the VM.
>>>
>>> As indicated by the dmesg warning, users encountering this problem
>>> should contact their platform vendor, which is really the only course of
>>> action that I can recommend.  Only the platform vendor can tell you why
>>> they've imposed this requirement for the device and potentially offer a
>>> remedy to remove that requirement.  KVM is not the first hypervisor to
>>> impose this restriction for such devices.  The referenced patch was
>>> tagged for stable, so you can expect that this change will eventually
>>> trickle through all the distributions.  Sorry for the trouble, but it
>>> really was a necessary change.  Thanks,
>>>
>>> Alex
>>>
>
>
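The conflict described in the first reply can be made concrete. With device assignment, the device's DMA (IOVA) addresses are the guest's physical addresses, while an RMRR demands that a fixed host range stay identity mapped (IOVA equal to host physical address). The Python sketch below, with hypothetical function names and made-up addresses, only reports where a guest memory layout collides with RMRR ranges, i.e. the regions that would have to become reserved holes imposed on the guest.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) address range


def overlaps(a: Range, b: Range) -> bool:
    """True if two inclusive address ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def rmrr_conflicts(guest_ram: List[Range], rmrrs: List[Range]) -> List[Tuple[Range, Range]]:
    """Return (guest RAM range, RMRR) pairs that cannot both be honored.

    Under device assignment the device's IOVA space mirrors guest-physical
    addresses, so guest RAM whose addresses fall inside an RMRR collides with
    the identity mapping (IOVA == host physical address) the RMRR requires.
    """
    return [(ram, r) for ram in guest_ram for r in rmrrs if overlaps(ram, r)]


if __name__ == "__main__":
    # Made-up layout: a guest with 2 GiB of RAM starting at address 0.
    guest_ram = [(0x0000_0000, 0x7FFF_FFFF)]
    # Made-up RMRRs: one inside guest RAM, one above it.
    rmrrs = [(0x7A00_0000, 0x7A0F_FFFF), (0xE000_0000, 0xE00F_FFFF)]
    for ram, rmrr in rmrr_conflicts(guest_ram, rmrrs):
        print(f"guest RAM {ram[0]:#x}-{ram[1]:#x} overlaps RMRR {rmrr[0]:#x}-{rmrr[1]:#x}")
```

Detecting the overlap is the easy part; as the reply notes, carving such holes out of the guest raises its own problems, including the lack of isolation the RMRRs often imply.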