qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Li Zhang <lizhang@suse.de>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: cfontana@suse.de, qemu-devel@nongnu.org, dgilbert@redhat.com,
	quintela@redhat.com
Subject: Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever
Date: Fri, 26 Nov 2021 18:44:21 +0100	[thread overview]
Message-ID: <3e726683-b91a-263e-a8c2-2a43a776748c@suse.de> (raw)
In-Reply-To: <YaEVuHJuWcwXISTI@redhat.com>


On 11/26/21 6:13 PM, Daniel P. Berrangé wrote:
> On Fri, Nov 26, 2021 at 06:00:24PM +0100, Li Zhang wrote:
>> On 11/26/21 5:51 PM, Daniel P. Berrangé wrote:
>>> On Fri, Nov 26, 2021 at 05:44:04PM +0100, Li Zhang wrote:
>>>> On 11/26/21 4:49 PM, Daniel P. Berrangé wrote:
>>>>> On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
>>>>>> When doing live migration with multifd channels 8, 16 or larger number,
>>>>>> the guest hangs in the presence of the network errors such as missing TCP ACKs.
>>>>>>
>>>>>> At sender's side:
>>>>>> The main thread is blocked on qemu_thread_join, migration_fd_cleanup
>>>>>> is called because one thread fails on qio_channel_write_all when
>>>>>> the network problem happens and other send threads are blocked on sendmsg.
>>>>>> They could not be terminated. So the main thread is blocked on qemu_thread_join
>>>>>> to wait for the threads terminated.
>>>>> Isn't the right answer here to ensure we've called 'shutdown' on
>>>>> all the FDs, so that the threads get kicked out of sendmsg, before
>>>>> trying to join the thread ?
>>>> If we shutdown the channels at sender's side, it could terminate send
>>>> threads. The receive threads are still waiting there.
>>>>
>>>>   From receiver's side, if wait semaphore is timeout, the channels can be
>>>> terminated at last. And the sender threads also be terminated at last.
>>> If something goes wrong on the sender side, the mgmt app should be
>>> tearing down the destination QEMU entirely, so I'm not sure we need
>>> to do anything special to deal with received threads.
>>>
>>> Using semtimedwait just feels risky because it will introduce false
>>> failures if the system/network is under high load such that the
>>> connections don't all establish within 1 second.
>> You are right. This may be a risk. I am not sure if the interval is proper,
>> we can set longer.
> I don't think any kind of timeout is right in this context. There should
> be a sem_post() invoked in every scenario where we want to tear down the
> recv thread. That should only be the case when we see the other end of
> the connection close IMHO.

OK,  I need to consider about that. It may be better to shutdown the 
channels from sender's side.


Thanks

Li

>
> Regards,
> Daniel


  reply	other threads:[~2021-11-26 17:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-26 15:31 [PATCH 0/2] migration: multifd live migration improvement Li Zhang
2021-11-26 15:31 ` [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever Li Zhang
2021-11-26 15:49   ` Daniel P. Berrangé
2021-11-26 16:44     ` Li Zhang
2021-11-26 16:51       ` Daniel P. Berrangé
2021-11-26 17:00         ` Li Zhang
2021-11-26 17:13           ` Daniel P. Berrangé
2021-11-26 17:44             ` Li Zhang [this message]
2021-11-29 11:20     ` Dr. David Alan Gilbert
2021-11-29 13:37       ` Li Zhang
2021-11-29 14:50         ` Dr. David Alan Gilbert
2021-11-29 15:34           ` Li Zhang
2021-12-01 12:11           ` Li Zhang
2021-12-01 12:22             ` Daniel P. Berrangé
2021-12-01 13:42               ` Li Zhang
2021-12-01 14:09                 ` Daniel P. Berrangé
2021-12-01 14:15                   ` Li Zhang
2021-11-29 14:58       ` Daniel P. Berrangé
2021-11-29 15:49         ` Dr. David Alan Gilbert
2021-12-06  9:28           ` Li Zhang
2021-11-26 16:33   ` Juan Quintela
2021-11-26 16:56     ` Li Zhang
2021-11-26 15:31 ` [PATCH 2/2] migration: Set the socket backlog number to reduce the chance of live migration failure Li Zhang
2021-11-26 16:32   ` Juan Quintela
2021-11-26 16:44     ` Li Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3e726683-b91a-263e-a8c2-2a43a776748c@suse.de \
    --to=lizhang@suse.de \
    --cc=berrange@redhat.com \
    --cc=cfontana@suse.de \
    --cc=dgilbert@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).