qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jonah Palmer <jonah.palmer@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>,
	Si-Wei Liu <si-wei.liu@oracle.com>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	Lei Yang <leiyang@redhat.com>, Peter Xu <peterx@redhat.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Jason Wang <jasowang@redhat.com>
Subject: Re: [RFC 1/2] iova_tree: add an id member to DMAMap
Date: Mon, 29 Apr 2024 07:19:13 -0400	[thread overview]
Message-ID: <1e1cbe99-65e8-4292-b19b-8e054f5b1fca@oracle.com> (raw)
In-Reply-To: <CAJaqyWeyfPp5bh9iZrkwZshoStEHZ85P6t4TcEdmR5sDYhG4ug@mail.gmail.com>



On 4/29/24 4:14 AM, Eugenio Perez Martin wrote:
> On Thu, Apr 25, 2024 at 7:44 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>
>>
>>
>> On 4/24/2024 12:33 AM, Eugenio Perez Martin wrote:
>>> On Wed, Apr 24, 2024 at 12:21 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>
>>>>
>>>> On 4/22/2024 1:49 AM, Eugenio Perez Martin wrote:
>>>>> On Sat, Apr 20, 2024 at 1:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>>
>>>>>> On 4/19/2024 1:29 AM, Eugenio Perez Martin wrote:
>>>>>>> On Thu, Apr 18, 2024 at 10:46 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>>>>>> On 4/10/2024 3:03 AM, Eugenio Pérez wrote:
>>>>>>>>> IOVA tree is also used to track the mappings of virtio-net shadow
>>>>>>>>> virtqueue.  This mappings may not match with the GPA->HVA ones.
>>>>>>>>>
>>>>>>>>> This causes a problem when overlapped regions (different GPA but same
>>>>>>>>> translated HVA) exists in the tree, as looking them by HVA will return
>>>>>>>>> them twice.  To solve this, create an id member so we can assign unique
>>>>>>>>> identifiers (GPA) to the maps.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>>>> ---
>>>>>>>>>       include/qemu/iova-tree.h | 5 +++--
>>>>>>>>>       util/iova-tree.c         | 3 ++-
>>>>>>>>>       2 files changed, 5 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
>>>>>>>>> index 2a10a7052e..34ee230e7d 100644
>>>>>>>>> --- a/include/qemu/iova-tree.h
>>>>>>>>> +++ b/include/qemu/iova-tree.h
>>>>>>>>> @@ -36,6 +36,7 @@ typedef struct DMAMap {
>>>>>>>>>           hwaddr iova;
>>>>>>>>>           hwaddr translated_addr;
>>>>>>>>>           hwaddr size;                /* Inclusive */
>>>>>>>>> +    uint64_t id;
>>>>>>>>>           IOMMUAccessFlags perm;
>>>>>>>>>       } QEMU_PACKED DMAMap;
>>>>>>>>>       typedef gboolean (*iova_tree_iterator)(DMAMap *map);
>>>>>>>>> @@ -100,8 +101,8 @@ const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map);
>>>>>>>>>        * @map: the mapping to search
>>>>>>>>>        *
>>>>>>>>>        * Search for a mapping in the iova tree that translated_addr overlaps with the
>>>>>>>>> - * mapping range specified.  Only the first found mapping will be
>>>>>>>>> - * returned.
>>>>>>>>> + * mapping range specified and map->id is equal.  Only the first found
>>>>>>>>> + * mapping will be returned.
>>>>>>>>>        *
>>>>>>>>>        * Return: DMAMap pointer if found, or NULL if not found.  Note that
>>>>>>>>>        * the returned DMAMap pointer is maintained internally.  User should
>>>>>>>>> diff --git a/util/iova-tree.c b/util/iova-tree.c
>>>>>>>>> index 536789797e..0863e0a3b8 100644
>>>>>>>>> --- a/util/iova-tree.c
>>>>>>>>> +++ b/util/iova-tree.c
>>>>>>>>> @@ -97,7 +97,8 @@ static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
>>>>>>>>>
>>>>>>>>>           needle = args->needle;
>>>>>>>>>           if (map->translated_addr + map->size < needle->translated_addr ||
>>>>>>>>> -        needle->translated_addr + needle->size < map->translated_addr) {
>>>>>>>>> +        needle->translated_addr + needle->size < map->translated_addr ||
>>>>>>>>> +        needle->id != map->id) {
>>>>>>>> It looks this iterator can also be invoked by SVQ from
>>>>>>>> vhost_svq_translate_addr() -> iova_tree_find_iova(), where guest GPA
>>>>>>>> space will be searched on without passing in the ID (GPA), and exact
>>>>>>>> match for the same GPA range is not actually needed unlike the mapping
>>>>>>>> removal case. Could we create an API variant, for the SVQ lookup case
>>>>>>>> specifically? Or alternatively, add a special flag, say skip_id_match to
>>>>>>>> DMAMap, and the id match check may look like below:
>>>>>>>>
>>>>>>>> (!needle->skip_id_match && needle->id != map->id)
>>>>>>>>
>>>>>>>> I think vhost_svq_translate_addr() could just call the API variant or
>>>>>>>> pass DMAmap with skip_id_match set to true to svq_iova_tree_find_iova().
>>>>>>>>
>>>>>>> I think you're totally right. But I'd really like to not complicate
>>>>>>> the API of the iova_tree more.
>>>>>>>
>>>>>>> I think we can look for the hwaddr using memory_region_from_host and
>>>>>>> then get the hwaddr. It is another lookup though...
>>>>>> Yeah, that will be another means of doing translation without having to
>>>>>> complicate the API around iova_tree. I wonder how the lookup through
>>>>>> memory_region_from_host() may perform compared to the iova tree one, the
>>>>>> former looks to be an O(N) linear search on a linked list while the
>>>>>> latter would be roughly O(log N) on an AVL tree?
>>>>> Even worse, as the reverse lookup (from QEMU vaddr to SVQ IOVA) is
>>>>> linear too. It is not even ordered.
>>>> Oh Sorry, I misread the code and I should look for g_tree_foreach ()
>>>> instead of g_tree_search_node(). So the former is indeed linear
>>>> iteration, but it looks to be ordered?
>>>>
>>>> https://urldefense.com/v3/__https://github.com/GNOME/glib/blob/main/glib/gtree.c*L1115__;Iw!!ACWV5N9M2RV99hQ!Ng2rLfRd9tLyNTNocW50Mf5AcxSt0uF0wOdv120djff-z_iAdbujYK-jMi5UC1DZLxb1yLUv2vV0j3wJo8o$
>>> The GPA / IOVA are ordered but we're looking by QEMU's vaddr.
>>>
>>> If we have these translations:
>>> [0x1000, 0x2000] -> [0x10000, 0x11000]
>>> [0x2000, 0x3000] -> [0x6000, 0x7000]
>>>
>>> We will see them in this order, so we cannot stop the search at the first node.
>> Yeah, reverse lookup is unordered indeed, anyway.
>>
>>>
>>>>> But apart from this detail you're right, I have the same concerns with
>>>>> this solution too. If we see a hard performance regression we could go
>>>>> to more complicated solutions, like maintaining a reverse IOVATree in
>>>>> vhost-iova-tree too. First RFCs of SVQ did that actually.
>>>> Agreed, yeap we can use memory_region_from_host for now.  Any reason why
>>>> reverse IOVATree was dropped, lack of users? But now we have one!
>>>>
>>> No, it is just simplicity. We already have an user in the hot patch in
>>> the master branch, vhost_svq_vring_write_descs. But I never profiled
>>> enough to find if it is a bottleneck or not to be honest.
>> Right, without vIOMMU or a lot of virtqueues / mappings, it's hard to
>> profile and see the difference.
>>>
>>> I'll send the new series by today, thank you for finding these issues!
>> Thanks! In case you don't have bandwidth to add back reverse IOVA tree,
>> Jonah (cc'ed) may have interest in looking into it.
>>
> 
> Actually, yes. I've tried to solve it using:
> memory_region_get_ram_ptr -> It's hard to get this pointer to work
> without messing a lot with IOVATree.
> memory_region_find -> I'm totally unable to make it return sections
> that make sense
> flatview_for_each_range -> It does not return the same
> MemoryRegionsection as the listener, not sure why.
> 
> The only advance I have is that memory_region_from_host is able to
> tell if the vaddr is from the guest or not.
> 
> So I'm convinced there must be a way to do it with the memory
> subsystem, but I think the best way to do it ATM is to store a
> parallel tree with GPA-> SVQ IOVA translations. At removal time, if we
> find the entry in this new tree, we can directly remove it by GPA. If
> not, assume it is a host-only address like SVQ vrings, and remove by
> iterating on vaddr as we do now. It is guaranteed the guest does not
> translate to that vaddr and that that vaddr is unique in the tree
> anyway.
> 
> Does it sound reasonable? Jonah, would you be interested in moving this forward?
> 
> Thanks!
> 

Sure, I'd be more than happy to work on this stuff! I can probably get 
started on this either today or tomorrow.

Si-Wei mentioned something about these "reverse IOVATree" patches that 
were dropped; is this relevant to what you're asking here? Is it 
something I should base my work off of?

If there's any other relevant information about this issue that you 
think I should know, let me know. I'll start digging into this ASAP and 
will reach out if I need any guidance. :)

Jonah

>> -Siwei
>>
>>
>>>
>>>> Thanks,
>>>> -Siwei
>>>>> Thanks!
>>>>>
>>>>>> Of course,
>>>>>> memory_region_from_host() won't search out of the guest memory space for
>>>>>> sure. As this could be on the hot data path I have a little bit
>>>>>> hesitance over the potential cost or performance regression this change
>>>>>> could bring in, but maybe I'm overthinking it too much...
>>>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Siwei
>>>>>>>>>               return false;
>>>>>>>>>           }
>>>>>>>>>
>>
> 


  reply	other threads:[~2024-04-29 11:19 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-10 10:03 [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Eugenio Pérez
2024-04-10 10:03 ` [RFC 1/2] iova_tree: add an id member to DMAMap Eugenio Pérez
2024-04-18 20:46   ` Si-Wei Liu
2024-04-19  8:29     ` Eugenio Perez Martin
2024-04-19 23:49       ` Si-Wei Liu
2024-04-22  8:49         ` Eugenio Perez Martin
2024-04-23 22:20           ` Si-Wei Liu
2024-04-24  7:33             ` Eugenio Perez Martin
2024-04-25 17:43               ` Si-Wei Liu
2024-04-29  8:14                 ` Eugenio Perez Martin
2024-04-29 11:19                   ` Jonah Palmer [this message]
2024-04-30 18:11                     ` Eugenio Perez Martin
2024-05-01 22:08                       ` Si-Wei Liu
2024-05-02  6:18                         ` Eugenio Perez Martin
2024-05-07  9:12                           ` Si-Wei Liu
2024-04-30  5:54                   ` Si-Wei Liu
2024-04-30 17:19                     ` Eugenio Perez Martin
2024-05-01 23:13                       ` Si-Wei Liu
2024-05-02  6:44                         ` Eugenio Perez Martin
2024-05-08  0:52                           ` Si-Wei Liu
2024-05-08 15:25                             ` Eugenio Perez Martin
2024-04-10 10:03 ` [RFC 2/2] vdpa: identify aliased maps in iova_tree Eugenio Pérez
2024-04-12  6:46 ` [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Jason Wang
2024-04-12  7:56   ` Eugenio Perez Martin
2024-05-07  7:29     ` Jason Wang
2024-05-07 10:56       ` Eugenio Perez Martin
2024-05-08  2:29         ` Jason Wang
2024-05-08 17:15           ` Eugenio Perez Martin
2024-05-09  6:27             ` Jason Wang
2024-05-09  7:10               ` Eugenio Perez Martin
2024-05-10  4:28                 ` Jason Wang
2024-05-10  7:16                   ` Eugenio Perez Martin
2024-05-11  4:00                     ` Jason Wang
2024-05-13  6:27                       ` Eugenio Perez Martin
2024-05-13  8:28                         ` Jason Wang
2024-05-13  9:56                           ` Eugenio Perez Martin
2024-05-14  3:56                             ` Jason Wang
2024-07-24 16:59                               ` Jonah Palmer
2024-07-29 10:04                                 ` Eugenio Perez Martin
2024-07-29 17:50                                   ` Jonah Palmer
2024-07-29 18:20                                     ` Eugenio Perez Martin
2024-07-29 19:33                                       ` Jonah Palmer
2024-07-30  8:47                                   ` Jason Wang
2024-07-30 11:00                                     ` Eugenio Perez Martin
2024-07-30 12:31                                       ` Jonah Palmer
2024-07-31  9:56                                         ` Eugenio Perez Martin
2024-07-31 14:09                                           ` Jonah Palmer
2024-08-01  0:41                                             ` Si-Wei Liu
2024-08-01  8:22                                               ` Eugenio Perez Martin
2024-08-02  5:54                                                 ` Si-Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e1cbe99-65e8-4292-b19b-8e054f5b1fca@oracle.com \
    --to=jonah.palmer@oracle.com \
    --cc=dtatulea@nvidia.com \
    --cc=eperezma@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=leiyang@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=si-wei.liu@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).