qemu-devel.nongnu.org archive mirror
From: Si-Wei Liu <si-wei.liu@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Lei Yang <leiyang@redhat.com>, Parav Pandit <parav@mellanox.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>
Subject: Re: [PATCH v2 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init
Date: Mon, 1 Apr 2024 23:19:32 -0700	[thread overview]
Message-ID: <58cf082c-fa54-48a6-aa49-e8b6cba60f53@oracle.com> (raw)
In-Reply-To: <CAJaqyWdDRqMEwVh6ZcVdnEZoXy-_9B2qk25eYcoVmeeTxgGm8g@mail.gmail.com>



On 2/14/2024 11:11 AM, Eugenio Perez Martin wrote:
> On Wed, Feb 14, 2024 at 7:29 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>> Hi Michael,
>>
>> On 2/13/2024 2:22 AM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 05, 2024 at 05:10:36PM -0800, Si-Wei Liu wrote:
>>>> Hi Eugenio,
>>>>
>>>> I thought this new code looks good to me and the original issue I saw with
>>>> x-svq=on should be gone. However, after rebasing my tree on top of this,
>>>> there's a new failure I found around setting up guest mappings at early
>>>> boot; please see attached the specific QEMU config and corresponding event
>>>> traces. Haven't checked into the details yet, but thought you would want to
>>>> be aware of it ahead of time.
>>>>
>>>> Regards,
>>>> -Siwei
>>> Eugenio were you able to reproduce? Siwei did you have time to
>>> look into this?
>> Didn't get a chance to look into the details yet in the past week, but
>> thought it may have something to do with the (internals of) iova tree
>> range allocation and the lookup routine. It started to fall apart at the
>> first vhost_vdpa_dma_unmap call showing up in the trace events, where it
>> should've gotten IOVA=0x2000001000, but an incorrect IOVA address
>> 0x1000 ended up being returned from the iova tree lookup routine.
>>
>> HVA                                 GPA                          IOVA
>> ------------------------------------------------------------------------------------------
>> Map
>> [0x7f7903e00000, 0x7f7983e00000)    [0x0, 0x80000000)            [0x1000, 0x80000000)
>> [0x7f7983e00000, 0x7f9903e00000)    [0x100000000, 0x2080000000)  [0x80001000, 0x2000001000)
>> [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000)     [0x2000001000, 0x2000021000)
>>
>> Unmap
>> [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000)     [0x1000, 0x20000) ???
>>                                     shouldn't it be [0x2000001000, 0x2000021000) ???
>>
It looks like the SVQ iova tree lookup routine vhost_iova_tree_find_iova(),
which is called from vhost_vdpa_listener_region_del(), can't properly
deal with overlapped regions. Specifically, q35's mch_realize() has the
following:

579     memory_region_init_alias(&mch->open_high_smram, OBJECT(mch), "smram-open-high",
580                              mch->ram_memory, MCH_HOST_BRIDGE_SMRAM_C_BASE,
581                              MCH_HOST_BRIDGE_SMRAM_C_SIZE);
582     memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000,
583                                         &mch->open_high_smram, 1);
584     memory_region_set_enabled(&mch->open_high_smram, false);

#0  0x0000564c30bf6980 in iova_tree_find_address_iterator (key=0x564c331cf8e0, value=0x564c331cf8e0, data=0x7fffb6d749b0) at ../util/iova-tree.c:96
#1  0x00007f5f66479654 in g_tree_foreach () at /lib64/libglib-2.0.so.0
#2  0x0000564c30bf6b53 in iova_tree_find_iova (tree=<optimized out>, map=map@entry=0x7fffb6d74a00) at ../util/iova-tree.c:114
#3  0x0000564c309da0a9 in vhost_iova_tree_find_iova (tree=<optimized out>, map=map@entry=0x7fffb6d74a00) at ../hw/virtio/vhost-iova-tree.c:70
#4  0x0000564c3085e49d in vhost_vdpa_listener_region_del (listener=0x564c331024c8, section=0x7fffb6d74aa0) at ../hw/virtio/vhost-vdpa.c:444
#5  0x0000564c309f4931 in address_space_update_topology_pass (as=as@entry=0x564c31ab1840 <address_space_memory>, old_view=old_view@entry=0x564c33364cc0, new_view=new_view@entry=0x564c333640f0, adding=adding@entry=false) at ../system/memory.c:977
#6  0x0000564c309f4dcd in address_space_set_flatview (as=0x564c31ab1840 <address_space_memory>) at ../system/memory.c:1079
#7  0x0000564c309f86d0 in memory_region_transaction_commit () at ../system/memory.c:1132
#8  0x0000564c309f86d0 in memory_region_transaction_commit () at ../system/memory.c:1117
#9  0x0000564c307cce64 in mch_realize (d=<optimized out>, errp=<optimized out>) at ../hw/pci-host/q35.c:584

However, it looks like iova_tree_find_address_iterator() only checks whether
the translated address (HVA) falls into the range when trying to locate
the desired IOVA, causing the first DMAMap that happens to overlap in
the translated address (HVA) space to be returned prematurely:

  89 static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
  90                                                 gpointer data)
  91 {
   :
   :
  99     if (map->translated_addr + map->size < needle->translated_addr ||
 100         needle->translated_addr + needle->size < map->translated_addr) {
 101         return false;
 102     }
 103
 104     args->result = map;
 105     return true;
 106 }

The QEMU trace file reveals that the first DMAMap below gets
returned incorrectly instead of the second, while the latter is what the
actual IOVA corresponds to:

HVA                                 GPA                        IOVA
[0x7f7903e00000, 0x7f7983e00000)    [0x0, 0x80000000)          [0x1000, 0x80001000)
[0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000)   [0x2000001000, 0x2000021000)


Maybe in addition to checking the HVA range, we should also match on GPA,
or at least require the sizes to match exactly?

> Yes, I'm still not able to reproduce. In particular, I don't know
> how the memory listener adds a region and then releases a region with a
> different size. I'm talking about these log entries:
>
> 1706854838.154394:vhost_vdpa_listener_region_add vdpa: 0x556d45c75140
> iova 0x0 llend 0x80000000 vaddr: 0x7f7903e00000 read-only: 0
> 452:vhost_vdpa_listener_region_del vdpa: 0x556d45c75140 iova 0x0 llend
> 0x7fffffff
Didn't see a different size here; if you were referring to the
discrepancy in the traces around llend, I thought the two values between
_add() and _del() have to be interpreted differently due to:

3d1e4d34 "vhost_vdpa: fix the input in
trace_vhost_vdpa_listener_region_del()"

Regards,
-Siwei
> Is it possible for you to also trace the skipped regions? We should
> add a debug trace there too...
>
> Thanks!
>
>> PS, I will be taking off from today and for the next two weeks. Will try
>> to help out looking more closely after I get back.
>>
>> -Siwei
>>>    Can't merge patches which are known to break things ...




Thread overview: 18+ messages
2024-02-01 18:09 [PATCH v2 0/7] Move memory listener register to vhost_vdpa_init Eugenio Pérez
2024-02-01 18:09 ` [PATCH v2 1/7] vdpa: check for iova tree initialized at net_client_start Eugenio Pérez
2024-02-01 18:09 ` [PATCH v2 2/7] vdpa: reorder vhost_vdpa_set_backend_cap Eugenio Pérez
2024-02-01 18:09 ` [PATCH v2 3/7] vdpa: set backend capabilities at vhost_vdpa_init Eugenio Pérez
2024-02-01 18:09 ` [PATCH v2 4/7] vdpa: add listener_registered Eugenio Pérez
2024-02-01 18:09 ` [PATCH v2 5/7] vdpa: reorder listener assignment Eugenio Pérez
2024-02-01 18:09 ` [PATCH v2 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init Eugenio Pérez
2024-02-06  1:10   ` Si-Wei Liu
2024-02-13 10:22     ` Michael S. Tsirkin
2024-02-13 16:26       ` Eugenio Perez Martin
2024-02-14 18:37         ` Si-Wei Liu
2024-02-14 18:29       ` Si-Wei Liu
2024-02-14 19:11         ` Eugenio Perez Martin
2024-04-02  6:19           ` Si-Wei Liu [this message]
2024-04-02 12:01             ` Eugenio Perez Martin
2024-04-03  6:53               ` Si-Wei Liu
2024-04-03  8:46                 ` Eugenio Perez Martin
2024-02-01 18:09 ` [PATCH v2 7/7] vdpa: move memory listener register to vhost_vdpa_init Eugenio Pérez
