From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Jonah Palmer <jonah.palmer@oracle.com>,
Jason Wang <jasowang@redhat.com>,
qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
Lei Yang <leiyang@redhat.com>, Peter Xu <peterx@redhat.com>,
Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree
Date: Thu, 1 Aug 2024 22:54:43 -0700
Message-ID: <fbf055d4-e791-464a-a801-699ab439b82c@oracle.com>
In-Reply-To: <CAJaqyWcLW3tTdQLM65voYzKQ_S-5ZTQh5NAQAzU88m=BTyWa5g@mail.gmail.com>
On 8/1/2024 1:22 AM, Eugenio Perez Martin wrote:
> On Thu, Aug 1, 2024 at 2:41 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Hi Jonah,
>>
>> On 7/31/2024 7:09 AM, Jonah Palmer wrote:
>>>>>>>>>>>> Let me clarify, correct me if I was wrong:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) IOVA allocator is still implemented via a tree, we just
>>>>>>>>>>>> don't need
>>>>>>>>>>>> to store how the IOVA is used
>>>>>>>>>>>> 2) A dedicated GPA -> IOVA tree, updated via listeners and is
>>>>>>>>>>>> used in
>>>>>>>>>>>> the datapath SVQ translation
>>>>>>>>>>>> 3) A linear mapping or another SVQ -> IOVA tree used for SVQ
>>>>>>>>>>>>
>>>>>>>>>>> His solution is composed of three trees:
>>>>>>>>>>> 1) One for the IOVA allocations, so we know where to allocate
>>>>>>>>>>> new ranges
>>>>>>>>>>> 2) One of the GPA -> SVQ IOVA translations.
>>>>>>>>>>> 3) Another one for SVQ vrings translations.
>>>>>>>>>>>
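As a rough illustration of that three-tree split, a minimal sketch on top of
QEMU's existing IOVATree API could look like the code below; the
VhostIOVATreeSketch container and the sketch_* helper are illustrative names
only, not the actual vhost-iova-tree layout:

#include "qemu/osdep.h"
#include "qemu/iova-tree.h"

/* Illustrative container only, not the actual VhostIOVATree layout. */
typedef struct VhostIOVATreeSketch {
    hwaddr iova_first, iova_last;   /* usable IOVA window of the device   */
    IOVATree *iova_map;             /* 1) allocator: which IOVA is taken  */
    IOVATree *gpa_map;              /* 2) GPA -> IOVA, memory listener    */
    IOVATree *svq_map;              /* 3) SVQ HVA -> IOVA, vrings/buffers */
} VhostIOVATreeSketch;

/*
 * Allocate a free IOVA range of map->size (an inclusive size, per DMAMap)
 * and record the result in whichever translation tree the caller owns
 * (gpa_map or svq_map).  map->translated_addr carries a GPA or an HVA
 * depending on the caller.
 */
static int sketch_map_alloc(VhostIOVATreeSketch *t, DMAMap *map,
                            IOVATree *dest)
{
    int r = iova_tree_alloc_map(t->iova_map, map, t->iova_first, t->iova_last);

    if (r != IOVA_OK) {
        return r;
    }
    return iova_tree_insert(dest, map);
}

The allocator tree only records which IOVA ranges are taken; the two
translation trees are what the listener and the SVQ code actually search.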
>>> For my understanding, say we have those 3 memory mappings:
>>>
>>> Map   HVA                                   GPA                           IOVA
>>> -------------------------------------------------------------------------------------------------
>>> (1)   [0x7f7903e00000, 0x7f7983e00000)   [0x0, 0x80000000)             [0x1000, 0x80000000)
>>> (2)   [0x7f7983e00000, 0x7f9903e00000)   [0x100000000, 0x2080000000)   [0x80001000, 0x2000001000)
>>> (3)   [0x7f7903ea0000, 0x7f7903ec0000)   [0xfeda0000, 0xfedc0000)      [0x2000001000, 0x2000021000)
>>>
>>> And then say when we go to unmap (e.g. vhost_vdpa_svq_unmap_ring)
>>> we're given an HVA of 0x7f7903eb0000, which fits in both the first and
>>> third mappings.
>>>
>>> The correct one to remove here would be the third mapping, right? Not
>>> only because the HVA range of the third mapping has a more "specific"
>>> or "tighter" range fit given an HVA of 0x7f7903eb0000 (which, as I
>>> understand, may not always be the case in other scenarios), but mainly
>>> because the HVA->GPA translation would give GPA 0xfedb0000, which only
>>> fits in the third mapping's GPA range. Am I understanding this correctly?
>> You're correct, we would still need a GPA -> IOVA tree for mapping and
>> unmapping guest memory. I've talked to Eugenio this morning and I think
>> he is now aligned. Granted, this GPA tree only covers part of the IOVA
>> space and doesn't contain ranges for host-only memory (e.g. memory
>> backing SVQ descriptors or buffers). We could create an API variant of
>> vhost_iova_tree_map_alloc() and vhost_iova_tree_map_remove() that not
>> only adds the IOVA -> HVA range to the HVA tree, but also manipulates
>> the GPA tree to maintain the guest memory mappings, and is only invoked
>> from the memory listener ops. That way the new API is distinguishable
>> from the one used in the SVQ mapping and unmapping path, which only
>> manipulates the HVA tree.
>>
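Building on the sketch above, such a listener-only variant could look roughly
like the following; the _gpa names are assumptions here, and the real names,
signatures and error handling are up to the series:

/*
 * Hypothetical listener-only variants (the _gpa names are made up here).
 * They allocate/release IOVA and keep the GPA -> IOVA tree in sync, so
 * they are called only from the vhost-vdpa memory listener, never from
 * the SVQ vring/bounce-buffer mapping path, which keeps using the
 * HVA-only API.
 */
static int vhost_iova_tree_map_alloc_gpa(VhostIOVATreeSketch *t, DMAMap *map)
{
    /* Here map->translated_addr carries the GPA of the section. */
    return sketch_map_alloc(t, map, t->gpa_map);
}

static void vhost_iova_tree_remove_gpa(VhostIOVATreeSketch *t, DMAMap map)
{
    /*
     * The listener's unmap path can recover the entry by GPA with
     * iova_tree_find_iova() before calling this, so 'map' already holds
     * the IOVA range to release.
     */
    iova_tree_remove(t->gpa_map, map);
    iova_tree_remove(t->iova_map, map);
}

The point is simply that only the memory listener touches gpa_map, while the
SVQ vring and bounce-buffer path keeps using an HVA-only variant.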
> Right, I think I understand both Jason's and your approach better, and
> I think it is the best one. Modifying the lookup API is hard, as the
> caller does not know whether the HVA being looked up is contained in
> guest memory or not. Modifying the add or remove regions path is
> easier, as those callbacks do know it.
Exactly.
>
>> I think the only case you may need to pay attention to in the
>> implementation is the SVQ address translation path, where, given an
>> HVA to translate, you would need to tell apart which tree to look up:
>> if this HVA is backed by guest memory you could use the
>> qemu_ram_block_from_host() API to infer the ram block and then the
>> GPA, so you end up doing a lookup in the GPA tree; otherwise the HVA
>> may come from the SVQ mappings, where you'd have to search the HVA
>> tree again for a host-mem-only range before you can claim the HVA is a
>> bogus/unmapped address...
> I'd leave this HVA -> IOVA tree for future performance optimization on
> top, and focus on the aliased maps for a first series.
>
> However, calling qemu_ram_block_from_host is actually not needed if
> the HVA tree contains all the translations, both SVQ and guest buffers
> in memory.
If we don't take aliased maps or overlapped HVAs into account, a lookup
through the HVA tree itself should work. However, I think calling
qemu_ram_block_from_host() further assures that we always deal with the
real ram block that backs the guest memory, which is hard to guarantee
with the IOVA -> HVA tree alone in case overlapped HVA ranges exist.
Since this is simple and reliable, and avoids building the HVA lookup
tree around extra assumptions or API implications in the memory
subsystem, I'd lean toward using the existing memory subsystem API to
simplify the implementation of the IOVA -> HVA tree (especially the
lookup routine).
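Still on top of the same sketch, the branch between the two lookups could go
roughly like this; sketch_hva_to_gpa() is a placeholder rather than a real
QEMU API, and whether the GPA is derived from the ram block or taken straight
from the guest's descriptor is left open:

#include "qemu/osdep.h"
#include "exec/cpu-common.h"    /* qemu_ram_block_from_host() */
#include "qemu/iova-tree.h"

/*
 * Placeholder, not a real QEMU API: stands in for the "ram block -> GPA"
 * step mentioned above.  How (or whether) this is done is exactly what
 * the series would have to decide.
 */
static bool sketch_hva_to_gpa(void *hva, hwaddr *gpa)
{
    (void)hva;
    (void)gpa;
    return false;
}

/* Translate an HVA of 'len' bytes into its SVQ IOVA mapping, if any. */
static const DMAMap *sketch_translate_hva(VhostIOVATreeSketch *t,
                                          void *hva, size_t len)
{
    ram_addr_t ram_off;
    DMAMap needle = { .size = len - 1 };    /* DMAMap sizes are inclusive */

    if (qemu_ram_block_from_host(hva, false, &ram_off)) {
        /* Backed by guest memory: look it up in the GPA -> IOVA tree. */
        hwaddr gpa;

        if (!sketch_hva_to_gpa(hva, &gpa)) {
            return NULL;
        }
        needle.translated_addr = gpa;
        return iova_tree_find_iova(t->gpa_map, &needle);
    }

    /* Otherwise it can only be an SVQ-owned mapping (vring, bounce buffer). */
    needle.translated_addr = (hwaddr)(uintptr_t)hva;
    return iova_tree_find_iova(t->svq_map, &needle);
}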
>
>> For now, this additional second lookup is sub-optimal but inevitable.
>> I think both of us agreed that you could start by implementing this
>> version first, and look for future opportunities to optimize the
>> lookup performance on top.
>>
> Right, thanks for explaining!
Thanks for the discussion!
-Siwei
>
>>> ---
>>>
>>> In the case where the first mapping here is removed (GPA [0x0,
>>> 0x80000000)), why do we use the word "reintroduce" here? As I
>>> understand it, when we remove a mapping, we're essentially
>>> invalidating the IOVA range associated with that mapping, right? In
>>> other words, the IOVA ranges here don't overlap, so removing a mapping
>>> whose HVA range overlaps another mapping's HVA range shouldn't affect
>>> the other mapping, since they have unique IOVA ranges. Is my
>>> understanding correct here, or am I missing something?
>> With the GPA tree I think this case should work fine. I've double
>> checked the implementation of the vhost-vdpa iotlb and don't see a red
>> flag there.
>>
>> Thanks,
>> -Siwei
>>
>
Thread overview: 50+ messages
2024-04-10 10:03 [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Eugenio Pérez
2024-04-10 10:03 ` [RFC 1/2] iova_tree: add an id member to DMAMap Eugenio Pérez
2024-04-18 20:46 ` Si-Wei Liu
2024-04-19 8:29 ` Eugenio Perez Martin
2024-04-19 23:49 ` Si-Wei Liu
2024-04-22 8:49 ` Eugenio Perez Martin
2024-04-23 22:20 ` Si-Wei Liu
2024-04-24 7:33 ` Eugenio Perez Martin
2024-04-25 17:43 ` Si-Wei Liu
2024-04-29 8:14 ` Eugenio Perez Martin
2024-04-29 11:19 ` Jonah Palmer
2024-04-30 18:11 ` Eugenio Perez Martin
2024-05-01 22:08 ` Si-Wei Liu
2024-05-02 6:18 ` Eugenio Perez Martin
2024-05-07 9:12 ` Si-Wei Liu
2024-04-30 5:54 ` Si-Wei Liu
2024-04-30 17:19 ` Eugenio Perez Martin
2024-05-01 23:13 ` Si-Wei Liu
2024-05-02 6:44 ` Eugenio Perez Martin
2024-05-08 0:52 ` Si-Wei Liu
2024-05-08 15:25 ` Eugenio Perez Martin
2024-04-10 10:03 ` [RFC 2/2] vdpa: identify aliased maps in iova_tree Eugenio Pérez
2024-04-12 6:46 ` [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Jason Wang
2024-04-12 7:56 ` Eugenio Perez Martin
2024-05-07 7:29 ` Jason Wang
2024-05-07 10:56 ` Eugenio Perez Martin
2024-05-08 2:29 ` Jason Wang
2024-05-08 17:15 ` Eugenio Perez Martin
2024-05-09 6:27 ` Jason Wang
2024-05-09 7:10 ` Eugenio Perez Martin
2024-05-10 4:28 ` Jason Wang
2024-05-10 7:16 ` Eugenio Perez Martin
2024-05-11 4:00 ` Jason Wang
2024-05-13 6:27 ` Eugenio Perez Martin
2024-05-13 8:28 ` Jason Wang
2024-05-13 9:56 ` Eugenio Perez Martin
2024-05-14 3:56 ` Jason Wang
2024-07-24 16:59 ` Jonah Palmer
2024-07-29 10:04 ` Eugenio Perez Martin
2024-07-29 17:50 ` Jonah Palmer
2024-07-29 18:20 ` Eugenio Perez Martin
2024-07-29 19:33 ` Jonah Palmer
2024-07-30 8:47 ` Jason Wang
2024-07-30 11:00 ` Eugenio Perez Martin
2024-07-30 12:31 ` Jonah Palmer
2024-07-31 9:56 ` Eugenio Perez Martin
2024-07-31 14:09 ` Jonah Palmer
2024-08-01 0:41 ` Si-Wei Liu
2024-08-01 8:22 ` Eugenio Perez Martin
2024-08-02 5:54 ` Si-Wei Liu [this message]