qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address
Date: Tue, 23 Feb 2016 20:00:19 +1100	[thread overview]
Message-ID: <56CC1FA3.5020003@ozlabs.ru> (raw)
In-Reply-To: <20160223062056.GU2808@voom.fritz.box>

On 02/23/2016 05:20 PM, David Gibson wrote:
> On Tue, Feb 23, 2016 at 01:11:35PM +1100, Alexey Kardashevskiy wrote:
>> On 02/22/2016 11:12 PM, David Gibson wrote:
>>> On Mon, Feb 22, 2016 at 10:36:19PM +1100, Alexey Kardashevskiy wrote:
>>>> On 02/22/2016 05:26 PM, David Gibson wrote:
>>>>> On Mon, Feb 22, 2016 at 05:09:39PM +1100, Alexey Kardashevskiy wrote:
>>>>>> Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
>>>>>> when new VFIO listener is added, all existing IOMMU mappings are replayed.
>>>>>> However there is a problem that the base address of an IOMMU memory region
>>>>>> (IOMMU MR) is ignored which is not a problem for the existing user (which is
>>>>>> pseries) with its default 32bit DMA window starting at 0 but it is if there is
>>>>>> another DMA window.
>>>>>>
>>>>>> This adjusts the replaying address by mr->addr.
>>>>>
>>>>> Uh.. this doesn't look right to me.  AFAICT from the existing
>>>>> implementations the 'addr' parameter to the translate function is an
>>>>> offset within the memory region, which would make the original version
>>>>> correct.
>>>>
>>>> Ok, then spapr_tce_translate_iommu() needs to be fixed. Or I am missing
>>>> something here? The @addr field is definitely ignored now.
>>>
>>> I think at least one of us is missing something.  In the normal AS
>>> translation path, IIUC, it will subtract the mr->addr value to give
>>> offsets which are passed to translate.  It uses those offsets to find
>>> the TCE and returns a translated address.  Where else the @addr be
>>> used?
>>
>> Ok, this patch is definitely wrong.
>>
>> The actual problem I am debugging is this:
>> 1. run guest with an emulated XHCI to have 2 DMA windows;
>> 2. hotplug vfio-pci device.
>> 3. observe failing vfio_dma_map().
>>
>> This triggers memory_region_register_iommu_notifier() +
>> memory_region_iommu_replay() which calls vfio_iommu_map_notify() in a loop.
>> For the second window (huge one, 0x800000000000000 bus address) it fails as
>> iotlb.iova's returned by spapr_tce_translate_iommu() are offsets within
>> IOMMU MR.
>>
>> But vfio_dma_map() (which is eventually called by vfio_iommu_map_notify())
>> needs an absolute iotlb->iova address.
>>
>> imho this should be correct:
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 2e02dc6..0d6b926 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -126,7 +126,7 @@ static IOMMUTLBEntry
>> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>>           hwaddr page_mask = IOMMU_PAGE_MASK(tcet->page_shift);
>>
>>           tce = tcet->table[addr >> tcet->page_shift];
>> -        ret.iova = addr & page_mask;
>> +        ret.iova = (addr + iommu->addr) & page_mask;
>>           ret.translated_addr = tce & page_mask;
>
> I wondered about that change, but I'd have to look closer to see if
> the iova field here is expected to be relative to the MR as well.  It
> would be oddly inconsistent if it wasn't.


It is relative and it does not make sense as there is no source MR/AS in 
iotlb (only target AS) so there is no use in such iova.


>
>> However this does not help either.
>>
>> Because at the moment I create hardware windows from a helper which is
>> called from spapr_phb_hot_plug_child() (HotplugHandlerClass::plug()
>> callback). And this is too late as pci_qdev_realize("vfio-pci") calls:
>>
>> vfio_connect_container -> ... -> vfio_listener_region_add -> .. ->
>> memory_region_iommu_replay
>> and the latter fails - there is no huge window in the host kernel yet.
>> What is the right way of fixing this?...
>
> Hmm.. I think we need a backend specific hook which will be called
> from vfio_listener_region_add() when it sees a new iommu region, but
> before it calls replay.
>
> Basically that notifier is enumerating to the VFIO subsystem where the
> guest has IOMMU windows.  With DDW on the host, we need something to
> mirror the guest side windows into the host (or reject them, if
> they're not compatible.

The host kernel API consists of 2 ioctls basically - create and remove a 
window. There is no "query" which would tell how many windows are there and 
their parameters. When there is a first vfio-pci device, I basically go 
through the list of emulated windows and create hardware windows for every 
active window. But I do this from spapr-phb code where I have the complete 
picture - how many windows, LIOBNs.

This vfio_listener_region_add() thing - it is called per IOMMU MR, i.e. 
once or twice per PHB, in undefined order, not good (however the order 
never changes in the reality). And how exactly do I connect VFIO container 
and spapr PHB (presumably using QOM)? I cannot think of any good interface 
now... Have another memory listener on PHB's AS in spapr_pci?


> Actually.. that hook should replace the test a little bit higher
> testing against container->min_iova and container->max_iova.  With
> host-side DDW the set of allowable IOVAs is no longer a single, simple
> range - and it's now changeable.

I am missing the purpose of container->min_iova/max_iova, on x86 these are 
constant values and on pseries - QEMU chooses the windows parameters when 
creates the windows so this check is more like assert...


-- 
Alexey

  reply	other threads:[~2016-02-23  9:01 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-22  6:09 [Qemu-devel] [PATCH qemu] memory: Fix IOMMU replay base address Alexey Kardashevskiy
2016-02-22  6:26 ` David Gibson
2016-02-22 11:36   ` Alexey Kardashevskiy
2016-02-22 12:12     ` David Gibson
2016-02-23  2:11       ` Alexey Kardashevskiy
2016-02-23  6:20         ` David Gibson
2016-02-23  9:00           ` Alexey Kardashevskiy [this message]
2016-02-23  9:56             ` Paolo Bonzini
2016-02-23 11:25               ` David Gibson
2016-02-24  0:19               ` Alexey Kardashevskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56CC1FA3.5020003@ozlabs.ru \
    --to=aik@ozlabs.ru \
    --cc=david@gibson.dropbear.id.au \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).