From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address
Date: Tue, 23 Feb 2016 20:00:19 +1100 [thread overview]
Message-ID: <56CC1FA3.5020003@ozlabs.ru> (raw)
In-Reply-To: <20160223062056.GU2808@voom.fritz.box>
On 02/23/2016 05:20 PM, David Gibson wrote:
> On Tue, Feb 23, 2016 at 01:11:35PM +1100, Alexey Kardashevskiy wrote:
>> On 02/22/2016 11:12 PM, David Gibson wrote:
>>> On Mon, Feb 22, 2016 at 10:36:19PM +1100, Alexey Kardashevskiy wrote:
>>>> On 02/22/2016 05:26 PM, David Gibson wrote:
>>>>> On Mon, Feb 22, 2016 at 05:09:39PM +1100, Alexey Kardashevskiy wrote:
>>>>>> Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
>>>>>> when new VFIO listener is added, all existing IOMMU mappings are replayed.
>>>>>> However there is a problem that the base address of an IOMMU memory region
>>>>>> (IOMMU MR) is ignored which is not a problem for the existing user (which is
>>>>>> pseries) with its default 32bit DMA window starting at 0 but it is if there is
>>>>>> another DMA window.
>>>>>>
>>>>>> This adjusts the replaying address by mr->addr.
>>>>>
>>>>> Uh.. this doesn't look right to me. AFAICT from the existing
>>>>> implementations the 'addr' parameter to the translate function is an
>>>>> offset within the memory region, which would make the original version
>>>>> correct.
>>>>
>>>> Ok, then spapr_tce_translate_iommu() needs to be fixed. Or I am missing
>>>> something here? The @addr field is definitely ignored now.
>>>
>>> I think at least one of us is missing something. In the normal AS
>>> translation path, IIUC, it will subtract the mr->addr value to give
>>> offsets which are passed to translate. It uses those offsets to find
>>> the TCE and returns a translated address. Where else the @addr be
>>> used?
>>
>> Ok, this patch is definitely wrong.
>>
>> The actual problem I am debugging is this:
>> 1. run guest with an emulated XHCI to have 2 DMA windows;
>> 2. hotplug vfio-pci device.
>> 3. observe failing vfio_dma_map().
>>
>> This triggers memory_region_register_iommu_notifier() +
>> memory_region_iommu_replay() which calls vfio_iommu_map_notify() in a loop.
>> For the second window (huge one, 0x800000000000000 bus address) it fails as
>> iotlb.iova's returned by spapr_tce_translate_iommu() are offsets within
>> IOMMU MR.
>>
>> But vfio_dma_map() (which is eventually called by vfio_iommu_map_notify())
>> needs an absolute iotlb->iova address.
>>
>> imho this should be correct:
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 2e02dc6..0d6b926 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -126,7 +126,7 @@ static IOMMUTLBEntry
>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>> hwaddr page_mask = IOMMU_PAGE_MASK(tcet->page_shift);
>>
>> tce = tcet->table[addr >> tcet->page_shift];
>> - ret.iova = addr & page_mask;
>> + ret.iova = (addr + iommu->addr) & page_mask;
>> ret.translated_addr = tce & page_mask;
>
> I wondered about that change, but I'd have to look closer to see if
> the iova field here is expected to be relative to the MR as well. It
> would be oddly inconsistent if it wasn't.
It is relative and it does not make sense as there is no source MR/AS in
iotlb (only target AS) so there is no use in such iova.
>
>> However this does not help either.
>>
>> Because at the moment I create hardware windows from a helper which is
>> called from spapr_phb_hot_plug_child() (HotplugHandlerClass::plug()
>> callback). And this is too late as pci_qdev_realize("vfio-pci") calls:
>>
>> vfio_connect_container -> ... -> vfio_listener_region_add -> .. ->
>> memory_region_iommu_replay
>> and the latter fails - there is no huge window in the host kernel yet.
>> What is the right way of fixing this?...
>
> Hmm.. I think we need a backend specific hook which will be called
> from vfio_listener_region_add() when it sees a new iommu region, but
> before it calls replay.
>
> Basically that notifier is enumerating to the VFIO subsystem where the
> guest has IOMMU windows. With DDW on the host, we need something to
> mirror the guest side windows into the host (or reject them, if
> they're not compatible.
The host kernel API consists of 2 ioctls basically - create and remove a
window. There is no "query" which would tell how many windows are there and
their parameters. When there is a first vfio-pci device, I basically go
through the list of emulated windows and create hardware windows for every
active window. But I do this from spapr-phb code where I have the complete
picture - how many windows, LIOBNs.
This vfio_listener_region_add() thing - it is called per IOMMU MR, i.e.
once or twice per PHB, in undefined order, not good (however the order
never changes in the reality). And how exactly do I connect VFIO container
and spapr PHB (presumably using QOM)? I cannot think of any good interface
now... Have another memory listener on PHB's AS in spapr_pci?
> Actually.. that hook should replace the test a little bit higher
> testing against container->min_iova and container->max_iova. With
> host-side DDW the set of allowable IOVAs is no longer a single, simple
> range - and it's now changeable.
I am missing the purpose of container->min_iova/max_iova, on x86 these are
constant values and on pseries - QEMU chooses the windows parameters when
creates the windows so this check is more like assert...
--
Alexey
next prev parent reply other threads:[~2016-02-23 9:01 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-22 6:09 [Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address Alexey Kardashevskiy
2016-02-22 6:26 ` David Gibson
2016-02-22 11:36 ` Alexey Kardashevskiy
2016-02-22 12:12 ` David Gibson
2016-02-23 2:11 ` Alexey Kardashevskiy
2016-02-23 6:20 ` David Gibson
2016-02-23 9:00 ` Alexey Kardashevskiy [this message]
2016-02-23 9:56 ` Paolo Bonzini
2016-02-23 11:25 ` David Gibson
2016-02-24 0:19 ` Alexey Kardashevskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56CC1FA3.5020003@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=david@gibson.dropbear.id.au \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).