From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:35248)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pmorel@linux.vnet.ibm.com>) id 1aOlMY-0001sT-0D
	for qemu-devel@nongnu.org; Thu, 28 Jan 2016 07:07:49 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pmorel@linux.vnet.ibm.com>) id 1aOlMS-0007a2-D6
	for qemu-devel@nongnu.org; Thu, 28 Jan 2016 07:07:45 -0500
Received: from e06smtp10.uk.ibm.com ([195.75.94.106]:55909)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pmorel@linux.vnet.ibm.com>) id 1aOlMR-0007Yz-W8
	for qemu-devel@nongnu.org; Thu, 28 Jan 2016 07:07:40 -0500
Received: from localhost
	by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use
	Only! Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <pmorel@linux.vnet.ibm.com>;
	Thu, 28 Jan 2016 12:07:37 -0000
Received: from b06cxnps3075.portsmouth.uk.ibm.com
	(d06relay10.portsmouth.uk.ibm.com [9.149.109.195])
	by d06dlp01.portsmouth.uk.ibm.com (Postfix) with ESMTP id 7096917D8056
	for <qemu-devel@nongnu.org>; Thu, 28 Jan 2016 12:07:44 +0000 (GMT)
Received: from d06av03.portsmouth.uk.ibm.com (d06av03.portsmouth.uk.ibm.com
	[9.149.37.213])
	by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with
	ESMTP id u0SC7ZqM3080682
	for <qemu-devel@nongnu.org>; Thu, 28 Jan 2016 12:07:35 GMT
Received: from d06av03.portsmouth.uk.ibm.com (localhost [127.0.0.1])
	by d06av03.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with
	ESMTP id u0SC7YVP021850
	for <qemu-devel@nongnu.org>; Thu, 28 Jan 2016 05:07:35 -0700
References: <1452611505-25478-1-git-send-email-pmorel@linux.vnet.ibm.com>
	<1452622595.9674.19.camel@redhat.com>
	<569FA454.6050409@linux.vnet.ibm.com>
	<1453304819.32741.277.camel@redhat.com>
	<56A0D9F4.1060708@linux.vnet.ibm.com>
	<1453500876.32741.465.camel@redhat.com>
	<1453501156.32741.468.camel@redhat.com>
	<56A787DC.6060905@linux.vnet.ibm.com>
	<1453827630.26652.71.camel@redhat.com>
	<56A88DD9.3040004@linux.vnet.ibm.com>
	<1453916634.6261.7.camel@redhat.com>
From: Pierre Morel <pmorel@linux.vnet.ibm.com>
Message-ID: <56AA0485.1090204@linux.vnet.ibm.com>
Date: Thu, 28 Jan 2016 13:07:33 +0100
MIME-Version: 1.0
In-Reply-To: <1453916634.6261.7.camel@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v3] vfio/common: Check iova with limit not
 with size
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, peter.maydell@linaro.org


On 01/27/2016 06:43 PM, Alex Williamson wrote:
> On Wed, 2016-01-27 at 10:28 +0100, Pierre Morel wrote:
>>   
>> On 01/26/2016 06:00 PM, Alex Williamson wrote:
>>> On Tue, 2016-01-26 at 15:51 +0100, Pierre Morel wrote:
>>>> On 01/22/2016 11:19 PM, Alex Williamson wrote:
>>>>> On Fri, 2016-01-22 at 15:14 -0700, Alex Williamson wrote:
>>>>>> On Thu, 2016-01-21 at 14:15 +0100, Pierre Morel wrote:
>>>>>>> On 01/20/2016 04:46 PM, Alex Williamson wrote:
>>>>>>>> On Wed, 2016-01-20 at 16:14 +0100, Pierre Morel wrote:
>>>>>>>>> On 01/12/2016 07:16 PM, Alex Williamson wrote:
>>>>>>>>>> On Tue, 2016-01-12 at 16:11 +0100, Pierre Morel wrote:
>>>>>>>>>>> In vfio_listener_region_add(), we try to validate that the region
>>>>>>>>>>> is
>>>>>>>>>>> not
>>>>>>>>>>> zero sized and hasn't overflowed the addresses space.
>>>>>>>>>>>   
>>>>>>>>>>> But the calculation uses the size of the region instead of
>>>>>>>>>>> using the region's limit (size - 1).
>>>>>>>>>>>   
>>>>>>>>>>> This leads to Int128 overflow when the region has
>>>>>>>>>>> been initialized to UINT64_MAX because in this case
>>>>>>>>>>> memory_region_init() transform the size from UINT64_MAX
>>>>>>>>>>> to int128_2_64().
>>>>>>>>>>>   
>>>>>>>>>>> Let's really use the limit by sustracting one to the size
>>>>>>>>>>> and take care to use the limit for functions using limit
>>>>>>>>>>> and size to call functions which need size.
>>>>>>>>>>>   
>>>>>>>>>>> Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
>>>>>>>>>>> ---
>>>>>>>>>>>   
>>>>>>>>>>> Changes from v2:
>>>>>>>>>>>          - all, just ignore v2, sorry about this,
>>>>>>>>>>>            this is build after v1
>>>>>>>>>>>   
>>>>>>>>>>> Changes from v1:
>>>>>>>>>>>          - adjust the tests by knowing we already substracted one to
>>>>>>>>>>> end.
>>>>>>>>>>>   
>>>>>>>>>>>       hw/vfio/common.c |   14 +++++++-------
>>>>>>>>>>>       1 files changed, 7 insertions(+), 7 deletions(-)
>>>>>>>>>>>   
>>>>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>>>>>>> index 6797208..a5f6643 100644
>>>>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>>>>> @@ -348,12 +348,12 @@ static void
>>>>>>>>>>> vfio_listener_region_add(MemoryListener *listener,
>>>>>>>>>>>           if (int128_ge(int128_make64(iova), llend)) {
>>>>>>>>>>>               return;
>>>>>>>>>>>           }
>>>>>>>>>>> -    end = int128_get64(llend);
>>>>>>>>>>> +    end = int128_get64(int128_sub(llend, int128_one()));
>>>>>>>>>>>       
>>>>>>>>>>> -    if ((iova < container->min_iova) || ((end - 1) > container-
>>>>>>>>>>>> max_iova)) {
>>>>>>>>>>> +    if ((iova < container->min_iova) || (end  > container-
>>>>>>>>>>>> max_iova)) {
>>>>>>>>>>>               error_report("vfio: IOMMU container %p can't map guest
>>>>>>>>>>> IOVA
>>>>>>>>>>> region"
>>>>>>>>>>>                            " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx,
>>>>>>>>>>> -                     container, iova, end - 1);
>>>>>>>>>>> +                     container, iova, end);
>>>>>>>>>>>               ret = -EFAULT;
>>>>>>>>>>>               goto fail;
>>>>>>>>>>>           }
>>>>>>>>>>> @@ -363,7 +363,7 @@ static void
>>>>>>>>>>> vfio_listener_region_add(MemoryListener *listener,
>>>>>>>>>>>           if (memory_region_is_iommu(section->mr)) {
>>>>>>>>>>>               VFIOGuestIOMMU *giommu;
>>>>>>>>>>>       
>>>>>>>>>>> -        trace_vfio_listener_region_add_iommu(iova, end - 1);
>>>>>>>>>>> +        trace_vfio_listener_region_add_iommu(iova, end);
>>>>>>>>>>>               /*
>>>>>>>>>>>                * FIXME: We should do some checking to see if the
>>>>>>>>>>>                * capabilities of the host VFIO IOMMU are adequate to
>>>>>>>>>>> model
>>>>>>>>>>> @@ -394,13 +394,13 @@ static void
>>>>>>>>>>> vfio_listener_region_add(MemoryListener *listener,
>>>>>>>>>>>                   section->offset_within_region +
>>>>>>>>>>>                   (iova - section->offset_within_address_space);
>>>>>>>>>>>       
>>>>>>>>>>> -    trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
>>>>>>>>>>> +    trace_vfio_listener_region_add_ram(iova, end, vaddr);
>>>>>>>>>>>       
>>>>>>>>>>> -    ret = vfio_dma_map(container, iova, end - iova, vaddr,
>>>>>>>>>>> section-
>>>>>>>>>>>> readonly);
>>>>>>>>>>> +    ret = vfio_dma_map(container, iova, end - iova + 1, vaddr,
>>>>>>>>>>> section->readonly);
>>>>>>>>>>>           if (ret) {
>>>>>>>>>>>               error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>>>>>>>>>>>                            "0x%"HWADDR_PRIx", %p) = %d (%m)",
>>>>>>>>>>> -                     container, iova, end - iova, vaddr, ret);
>>>>>>>>>>> +                     container, iova, end - iova + 1, vaddr,
>>>>>>>>>>> ret);
>>>>>>>>>>>               goto fail;
>>>>>>>>>>>           }
>>>>>>>>>>>       
>>>>>>>>>> Hmm, did we just push the overflow from one place to another?  If
>>>>>>>>>> we're
>>>>>>>>>> mapping a full region of size int128_2_64() starting at iova zero,
>>>>>>>>>> then
>>>>>>>>>> this becomes (0xffff_ffff_ffff_ffff - 0 + 1) = 0.  So I think we
>>>>>>>>>> need
>>>>>>>>>> to calculate size with 128bit arithmetic too and let it assert if
>>>>>>>>>> we
>>>>>>>>>> overflow, ie:
>>>>>>>>>>   
>>>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>>>>>>>>> index a5f6643..13ad90b 100644
>>>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>>>> @@ -321,7 +321,7 @@ static void
>>>>>>>>>> vfio_listener_region_add(MemoryListener *listener,
>>>>>>>>>>                                            MemoryRegionSection
>>>>>>>>>> *section)
>>>>>>>>>>       {
>>>>>>>>>>           VFIOContainer *container = container_of(listener,
>>>>>>>>>> VFIOContainer, listener);
>>>>>>>>>> -    hwaddr iova, end;
>>>>>>>>>> +    hwaddr iova, end, size;
>>>>>>>>>>           Int128 llend;
>>>>>>>>>>           void *vaddr;
>>>>>>>>>>           int ret;
>>>>>>>>>> @@ -348,7 +348,9 @@ static void
>>>>>>>>>> vfio_listener_region_add(MemoryListener *listener,
>>>>>>>>>>           if (int128_ge(int128_make64(iova), llend)) {
>>>>>>>>>>               return;
>>>>>>>>>>           }
>>>>>>>>>> +
>>>>>>>>>>           end = int128_get64(int128_sub(llend, int128_one()));
>>>>>>>>>> +    size = int128_get64(int128_sub(llend, int128_make64(iova)));
>>>>>>>>> here again, if iova is null, since llend is section->size (2^64) ...
>>>>>>>>>   
>>>>>>>>>>       
>>>>>>>>>>           if ((iova < container->min_iova) || (end  > container-
>>>>>>>>>>> max_iova)) {
>>>>>>>>>>               error_report("vfio: IOMMU container %p can't map guest
>>>>>>>>>> IOVA region"
>>>>>>>>>> @@ -396,11 +398,11 @@ static void
>>>>>>>>>> vfio_listener_region_add(MemoryListener *listener,
>>>>>>>>>>       
>>>>>>>>>>           trace_vfio_listener_region_add_ram(iova, end, vaddr);
>>>>>>>>>>       
>>>>>>>>>> -    ret = vfio_dma_map(container, iova, end - iova + 1, vaddr,
>>>>>>>>>> section->readonly);
>>>>>>>>>> +    ret = vfio_dma_map(container, iova, size, vaddr, section-
>>>>>>>>>>> readonly);
>>>>>>>>>>           if (ret) {
>>>>>>>>>>               error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>>>>>>>>>>                            "0x%"HWADDR_PRIx", %p) = %d (%m)",
>>>>>>>>>> -                     container, iova, end - iova + 1, vaddr, ret);
>>>>>>>>>> +                     container, iova, size, vaddr, ret);
>>>>>>>>>>               goto fail;
>>>>>>>>>>           }
>>>>>>>>>>       
>>>>>>>>>>   
>>>>>>>>>> Does that still solve your scenario?  Perhaps vfio-iommu-type1
>>>>>>>>>> should
>>>>>>>>>> have used first/last rather than start/size for mapping since we
>>>>>>>>>> seem
>>>>>>>>>> to have an off-by-one for mapping a full 64bit space.  Seems like
>>>>>>>>>> we
>>>>>>>>>> could do it with two calls to vfio_dma_map if we really wanted to.
>>>>>>>>>> Thanks,
>>>>>>>>>>   
>>>>>>>>>> Alex
>>>>>>>>>>   
>>>>>>>>> You are right, every try to solve this will push the overflow
>>>>>>>>> somewhere
>>>>>>>>> else.
>>>>>>>>>   
>>>>>>>>> There is just no way to express 2^64 with 64 bits, we have the
>>>>>>>>> int128()
>>>>>>>>> solution,
>>>>>>>>> but if we solve it here, we fall in the linux ioctl call anyway.
>>>>>>>>>   
>>>>>>>>> Intuitively, making two calls do not seem right to me.
>>>>>>>>>   
>>>>>>>>> But, what do you think of something like:
>>>>>>>>>   
>>>>>>>>> - creating a new VFIO extention
>>>>>>>>>   
>>>>>>>>> - and in ioctl(), since we have a flag entry in the
>>>>>>>>> vfio_iommu_type1_dma_map,
>>>>>>>>> may be adding a new flag meaning "map all virtual memory" ?
>>>>>>>>> or meaning "use first/last" ?
>>>>>>>>> I think this would break existing code unless we add a new VFIO
>>>>>>>>> extension.
>>>>>>>> Backup, is there ever a case where we actually need to map the entire
>>>>>>>> 64bit address space?  This is fairly well impossible on x86.  I'm
>>>>>>>> pointing out an issue, but I don't know that we need to solve it with
>>>>>>>> more than an assert since it's never likely to happen.  Thanks,
>>>>>>>>   
>>>>>>>> Alex
>>>>>>>>   
>>>>>>> If I understood right, IOVA is the IO virtual address,
>>>>>>> it is then possible to map the virtual address page 0xffff_ffff_ffff_f000
>>>>>>> to something reasonable inside the real memory.
>>>>>> It is.
>>>>>>   
>>>>>>> Eventual we do not need to map the last virtual page but
>>>>>>> I think that in a general case the all virtual memory, as viewed by the
>>>>>>> device through the IOMMU should be mapped to avoid any uninitialized
>>>>>>> virtual memory access.
>>>>>> When using vfio, a device only has access to the IOVA space which has
>>>>>> been explicitly mapped.  This would be a security issue otherwise since
>>>>>> kernel vfio can't rely on userspace to wipe the device IOVA space.
>>>> yes.
>>>>>>> It is the same reason that make us map the all virtual memory for the
>>>>>>> CPU MMU.
>>>>>> We don't really do that either, CPU mapping works based on page tables
>>>>>> and non-existent entries simply don't exist.  We don't fully populate
>>>>>> the page tables in advance, this would be a horrible waste of memory.
>>>>   
>>>> Alex,
>>>>   
>>>> I am not sure of that, when preparing DMA from the device, the
>>>> guest will provide the destination address and these destination addresses
>>>> will be translated by the IOMMU when the device start the DMA.
>>>>   
>>>> The guest can make any decision by preparing the DMA and if I have well
>>>> understood,
>>>> this is transparent to QEMU.
>>>> What is not transparent is the IOMMU translation.
>>>>   
>>>> Then, when the device starts the DMA the destination address can be
>>>> anything inside
>>>> the virtual memory and the IOMMU will translate this.
>>>> To be able to translate, a table entry for this virtual address must
>>>> exist in the IOMMU
>>>> page table.
>>>>   
>>>> If you have several level of page table you may only fill the first level
>>>> for all entries, and may be, have only one first level entry initialized
>>>> and the belonging
>>>> second level entries filled.
>>>> Which greatly reduces the size of the tables.
>>>>   
>>>> But if you do not fill one of the first level entry the behavior of the
>>>> IOMMU
>>>> and then of the DMA is done according to what ever has been left in this
>>>> entry.
>>> It seems like you're arguing that the guest is going to have a 2^64 bit
>>> address space for DMA targets.  Sure, the guest driver can program the
>>> device to DMA anywhere in that address space, but what should the IOMMU
>>> actually consider an valid DMA?  It has to be things that have been
>>> mapped, like guest physical RAM or peer-to-peer DMA targets.  How would
>>> the IOMMU handle a stray DMA for anything else?  By default the DMA
>>> target would not be mapped and the DMA would generate a fault.  How
>>> that fault is handled is specific to the architecture and platform.  If
>>> you want to provide the device with a full 2^64 bit address space where
>>> nothing will fault, you're going to need to do it via a lot of mappings
>>> pointing to the same host physical page.  I'm really not sure where
>>> we're going with this though.
>>   
>> Alex,
>>   
>> I think we misunderstood us.
>> As I do not know if I do not understand you or if you do not understand me,
>> I try to explain me better.
>> You may find the description obvious, the all is ok.
>>   
>> In the architectures I know, DMA and IOMMU are two separate things.
>> IOMMU must initialize table entries for the all space DMA is able to access.
>> It must do it, no exception allowed.
>> But some, and may be quite all the entries in the table are invalid
>> entries,
>> existing entries with the invalid bit (or no valid bit) set.
>>   
>> Most, if not all IOMMU have several level of tables.
>> Suppose a two level table mapping a 2^32 memory space.
>> You have first level with 4k entries with an indirection to the second
>> page level.
>> Second page level mapping 256 entries of 4k pages.
>>   
>> In this case, to have a valid page table accessing 1 byte somewhere in
>> memory you have:
>> 4k - 1 entries of first level with invalid bit set
>> 1 first level entry which is valid and point to a second level table
>>   
>> in second level table you have 255 entries with invalid bit set
>> one entry valid pointing to a page where the byte is.
>>   
>> This just to illustrate what I want to say when I say all entries are
>> initialized.
>> If one let garbage inside one entry it happens random things... we do
>> not want that.
>>   
>> Are we OK with this?
> Yes, basic page tables.
>
>>>>>>> May be I missed something, or may be I worry too much,
>>>>>>> but I see this as a restriction on the supported hardware
>>>>>>> if we compare host and guest hardware support compatibility.
>>>>>> I don't see the issue, there's arguably a bug in the API that doesn't
>>>>>> allow us to map the full 64bit IOVA space of a device in a single
>>>>>> mapping, but we can do it in two.  Besides, there's really no case
>>>>>> where a device needs a fully populated IOTLB unless you're actually
>>>>>> giving the device access to 16 EMB of memory.
>>>>> s/EMB/EB/  Or I suppose technically EiB
>>>> yes, I agree with this, we do not need to access so much memory.
>>>>   
>>>>>>> We can live with it, because in fact you are right and today
>>>>>>> I am not aware of a hardware wanting to access this page but a
>>>>>>> hardware designers knowing having a IOMMU may want to access exactly
>>>>>>> this kind of strange virtual page for special features and this would work
>>>>>>> on the host but not inside of the guest.
>>>>>> The API issue is not that we can't map 0xffff_ffff_ffff_f000, it's that
>>>>>> we can't map 0x0 through 0xffff_ffff_ffff_ffff in a single mapping
>>>>>> because we pass the size instead of the end address (where size here
>>>>>> would be 2^64).  We can map 0x0 through 0xffff_ffff_ffff_efff, followed
>>>>>> by 0xffff_ffff_ffff_f000 through 0xffff_ffff_ffff_ffff, but again, why
>>>>>> would you ever need to do this?  Thanks,
>>>>>>   
>>>>>> Alex
>>>> The thing is that It could be useful to say we map all the virtual memory.
>>> Why?  And again, you can do it, just not in a single mapping, which
>>> really seems like a theoretical problem since you're mapping the IOVA
>>> space of a device, which lives within and consumes a small portion of
>>> this address space, so you're pretty much always looking at doing
>>> multiple mappings.
>>   
>> My fault, I use the word map indifferently to refer to a valid mapping
>> or invalid mapping.
>> May be the source of the misunderstood.
>>   
>>>   
>>>> Having a size of 2^64 was a possibility.
>>>> On the other hands, with the actual implementation the
>>>> "memory_region_iommu_replay" would take on long long long time.
>>>>   
>>>> In fact, depending on the IOMMU capabilities and usage we do not need to
>>>> call
>>>> the "memory_region_iommu_replay" at that time or even not at all.
>>> The mapping itself would take a long time, even with GiB pages we're
>>> talking about populating 2^34 page table entries.
>>   
>> as seen earlier, it depends on the IOMMU architecture, some tables
>> can be ignored if the entry of the parent table is invalid.
>> And anyway, the host must do it.
>>   
>>>    Of course many of
>>> those entries would need to point to the same physical page to cover
>>> empty space since otherwise you'd need to run this on a processor that
>>> supports more than 2^64 bits of address space, but you probably don't
>>> want to waste lots of memory covering empty space, so that means a
>>> smaller page size, which means orders of magnitude more mappings and
>>> more space wasted in the page tables...  All of this should have some
>>> useful value, which is escaping me.  Thanks,
>>>   
>>> Alex
>>>   
>>   
>> Another source of misunderstanding can be that I may not understand
>> the QEMU code as good as I think.
>>   
>> And may be the last (I hope so) source of misunderstanding is that I did
>> not explain well
>> the goal of this which is the optimization of IOMMU for some architectures:
>>   
>> Depending on the IOMMU architecture you may not be forced to map the memory
>> with a valid entry from the beginning.
>> In fact you have the following possibility:
>> - map all memory with invalid entry, may be the host do it per default
>> (he better do)
>> - only update entries that the guest updated.
>>   
>> To do this you must intercept guest updates.
>> There are several possibilities to do the interception depending on the
>> host architecture,
>> the IOMMU and the IOMMU-TLB architecture.
>>   
>> In this case, you would map all the memory the DMA may theoretically access
>> with invalid entries, you have no need of replaying the IOMMU entries on
>> start, and you
>> will only updates entries on each DMA_MAP/UNMAP calls.
>>   
>> The migration case would need some kind of IOMMU replay but it is
>> another problem.
> You've still lost me

sorry, when I do not understand I ask until I understand.
and since I did not understand it follows stupid questions.

Qemu code is sometime hard  to understand for me and there is no room
for approximations.

>   on the actual problem you're trying to solve.  You
> want to invalidate all of the IOTLB entries for a device, but that's
> exactly the starting state of an IOMMU domain and you already have the
> ability to get back to that state by unmapping the entire address space,
> with the caveat that it takes two calls to flush the full 2^64 bit
> address space.

Thanks, I think that it is the point I did not full understand.
I think I understand better, thanks to your patience.

>    So don't you just need to fix the overflow issue that
> you've identified and add a workaround for the API issue?

OK, I try to do this.

>    I don't
> understand why you'd ever need to _map_, ie. create valid IOTLB entries,

me neither:-) .
I misunderstood, I do not need it.

> for the entire address space, but if you don't even want to track the
> highest DMA address in use to make unmapping more efficient, the tools
> are still there for you.  Thanks,
>
> Alex
>
>

thanks you very much for your patience and explanations,
I revisit the way to fix the overflow issue and contact you again
when I find something proper.

Pierre