From: Jason Wang <jasowang@redhat.com>
To: "Zhoujian (jay)" <jianjay.zhou@huawei.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: "mst@redhat.com" <mst@redhat.com>,
"Gonglei (Arei)" <arei.gonglei@huawei.com>,
"marcandre.lureau@redhat.com" <marcandre.lureau@redhat.com>,
"Huangweidong (C)" <weidong.huang@huawei.com>,
"wangxin (U)" <wangxinxin.wang@huawei.com>,
Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH] vhost-user: quit infinite loop while used memslots is more than the backend limit
Date: Wed, 20 Dec 2017 16:56:59 +0800 [thread overview]
Message-ID: <ce633ca0-92c8-de17-fc94-cd9454ca29ad@redhat.com> (raw)
In-Reply-To: <B2D15215269B544CADD246097EACE7473AB0D08F@DGGEMM505-MBS.china.huawei.com>
On 2017年12月19日 18:21, Zhoujian (jay) wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Tuesday, December 19, 2017 5:24 PM
>> To: Zhoujian (jay) <jianjay.zhou@huawei.com>; qemu-devel@nongnu.org
>> Cc: mst@redhat.com; Gonglei (Arei) <arei.gonglei@huawei.com>;
>> marcandre.lureau@redhat.com; Huangweidong (C) <weidong.huang@huawei.com>;
>> wangxin (U) <wangxinxin.wang@huawei.com>
>> Subject: Re: [RFC PATCH] vhost-user: quit infinite loop while used
>> memslots is more than the backend limit
>>
>>
>>
>> On 2017年12月13日 15:04, Jay Zhou wrote:
>>> Steps to 100% reproduce:
>>> (1) start a VM with two VFIO 82599 PCI NICs successfully
>>> (2) hot plug a RAM, which is attached successfully
>>> (3) hot plug another RAM, which is attached successfully
>>> (4) hot plug a vhost-user NIC, which is attached successfully
>>> (5) hot plug another vhost-user NIC, which is hanged
>>>
>>> then the qemu process infinitely and quickly print the logs below:
>>> [...]
>>> vhost backend memory slots limit is less than current number of
>> present memory slots
>>> failed to init vhost_net for queue 0
>>> vhost backend memory slots limit is less than current number of
>> present memory slots
>>> failed to init vhost_net for queue 0
>>> vhost backend memory slots limit is less than current number of
>> present memory slots
>>> failed to init vhost_net for queue 0
>>> [...]
>>>
>>> and the cpu utilization of the systemd-journal process on host is
>>> 100%, the reason is clear:
>>> "used_memslots > hdev->vhost_ops-
>>> vhost_backend_memslots_limit(hdev)"
>>> is true in vhost_dev_init, and the main thread in the infinite do and
>>> while loop in net_vhost_user_init.
>>>
>>> One way is to increase the vhost backend memory slots limit, some
>>> discussions are here:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04572.html
>>>
>>> I think if
>>> "used_memslots > hdev->vhost_ops-
>>> vhost_backend_memslots_limit(hdev)"
>>> is true, it does not make sense to try to initialize the vhost-user
>>> again and again, because the main thread can not handle other qmps at
>>> all, which means it can not reduce the number of memslots, it must be
>>> failed at last.
>> You should include the reason why we hit infinite loop in the commit log,
>> and what's more important, is this issue specific to memslots only?
>> What if you hit errors other than this in vhost_dev_init() for vhost-user?
> If the new hot plugged vhost-user NIC failed to initialize in vhost_dev_init,
> the s->started in net_vhost_user_init is always false, so we hit infinite loop
> in net_vhost_user_init().
> It is not specific to memslots only, e.g. if the backend lacks some feature
> masks, it will hit infinite loop too.
>
> Is it available to identify the cases that MAY be failed and MUST be failed
> at initialization?
> - If it must be failed, quit the loop at once.
> - If it may be failed, then we should do the loop to wait the connection
> established, i.e. wait the s->started becomes true. And some sleep time
> may be added to reduce the CPU utilization.
Then I think you probably need another flag e.g s->error and set it on
fatal error.
>
>> Is this better to disable event source in this case?
> The "event source" you mean is the logs printed?
> If it is disabled, the reason why initialization failed is not explicit,
> this is the drawback.
>
No I mean the thing that triggers the loop which is s->started here.
Thanks
> Regards,
> Jay
>
>> Thanks
>>
>>> Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
>>> ---
>>> hw/virtio/vhost.c | 3 +++
>>> include/hw/virtio/vhost.h | 2 ++
>>> net/vhost-user.c | 5 +++++
>>> 3 files changed, 10 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index
>>> e4290ce..1cc4a18 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -47,6 +47,8 @@ static unsigned int used_memslots;
>>> static QLIST_HEAD(, vhost_dev) vhost_devices =
>>> QLIST_HEAD_INITIALIZER(vhost_devices);
>>>
>>> +bool used_memslots_exceeded;
>>> +
>>> bool vhost_has_free_slot(void)
>>> {
>>> unsigned int slots_limit = ~0U;
>>> @@ -1254,6 +1256,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void
>> *opaque,
>>> if (used_memslots > hdev->vhost_ops-
>>> vhost_backend_memslots_limit(hdev)) {
>>> error_report("vhost backend memory slots limit is less"
>>> " than current number of present memory slots");
>>> + used_memslots_exceeded = true;
>>> r = -1;
>>> goto fail;
>>> }
>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>> index 467dc77..bcc66d9 100644
>>> --- a/include/hw/virtio/vhost.h
>>> +++ b/include/hw/virtio/vhost.h
>>> @@ -6,6 +6,8 @@
>>> #include "hw/virtio/virtio.h"
>>> #include "exec/memory.h"
>>>
>>> +extern bool used_memslots_exceeded;
>>> +
>>> /* Generic structures common for any vhost based device. */
>>> struct vhost_virtqueue {
>>> int kick;
>>> diff --git a/net/vhost-user.c b/net/vhost-user.c index
>>> c23927c..8b7bfff 100644
>>> --- a/net/vhost-user.c
>>> +++ b/net/vhost-user.c
>>> @@ -17,6 +17,7 @@
>>> #include "qemu/error-report.h"
>>> #include "qmp-commands.h"
>>> #include "trace.h"
>>> +#include "include/hw/virtio/vhost.h"
>>>
>>> typedef struct VhostUserState {
>>> NetClientState nc;
>>> @@ -311,6 +312,10 @@ static int net_vhost_user_init(NetClientState *peer,
>> const char *device,
>>> qemu_chr_fe_set_handlers(&s->chr, NULL, NULL,
>>> net_vhost_user_event, NULL, nc0->name,
>> NULL,
>>> true);
>>> + if (used_memslots_exceeded) {
>>> + error_report("used memslots exceeded the backend limit,
>> quit loop");
>>> + return -1;
>>> + }
>>> } while (!s->started);
>>>
>>> assert(s->vhost_net);
prev parent reply other threads:[~2017-12-20 8:57 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-13 7:04 [Qemu-devel] [RFC PATCH] vhost-user: quit infinite loop while used memslots is more than the backend limit Jay Zhou
2017-12-13 7:11 ` no-reply
2017-12-13 7:38 ` no-reply
2017-12-19 9:23 ` Jason Wang
2017-12-19 10:21 ` Zhoujian (jay)
2017-12-20 8:56 ` Jason Wang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ce633ca0-92c8-de17-fc94-cd9454ca29ad@redhat.com \
--to=jasowang@redhat.com \
--cc=arei.gonglei@huawei.com \
--cc=jianjay.zhou@huawei.com \
--cc=marcandre.lureau@redhat.com \
--cc=maxime.coquelin@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=wangxinxin.wang@huawei.com \
--cc=weidong.huang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).