From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Lukas Straub <lukasstraub2@web.de>,
	Fabiano Rosas <farosas@suse.de>,
	Yong Huang <yong.huang@smartx.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH] multifd: Make the main thread yield periodically to the main loop
Date: Tue, 19 Aug 2025 16:03:01 -0400
Message-ID: <aKTYdUW_4j5qFXOx@x1.local>
In-Reply-To: <aKRpAP_8qjlNA20A@redhat.com>

On Tue, Aug 19, 2025 at 01:07:28PM +0100, Daniel P. Berrangé wrote:
> On Tue, Aug 19, 2025 at 02:03:26PM +0200, Lukas Straub wrote:
> > On Tue, 19 Aug 2025 11:31:03 +0100
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > 
> > > On Mon, Aug 11, 2025 at 10:53:11AM -0300, Fabiano Rosas wrote:
> > > > Lukas Straub <lukasstraub2@web.de> writes:
> > > >   
> > > > > On Fri, 8 Aug 2025 11:37:23 -0400
> > > > > Peter Xu <peterx@redhat.com> wrote:
> > > > >> ...
> > > > >> migrate_cancel() should really be an OOB command. It should be a superset
> > > > >> of the yank features, plus anything migration-specific beyond yanking the
> > > > >> channels, for example when the migration thread is blocked in PRE_SWITCHOVER.
> > > > >
> > > > > Hmm, I think the migration code should handle this properly even if the
> > > > > yank command is used. From the POV of migration, it sees that the
> > > > > connection broke with connection reset. That is the same error as if the
> > > > > other side crashes/is killed or a NAT/stateful firewall in between
> > > > > reboots.
> > > > >  
> > > > 
> > > > That should all work just fine, after a yank or after a detectable network
> > > > failure. The issue here seems to be that the destination recv is hanging
> > > > indefinitely. I don't think we ever played with socket timeout
> > > > configurations, or even with switching to non-blocking mode during the sync.
> > > > This is actually (AFAIK) the first time we get a hang that's not "just" a
> > > > synchronization issue in the migration code.
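A minimal sketch of the socket-timeout idea mentioned above, assuming plain POSIX sockets rather than QEMU's QIOChannel layer: SO_RCVTIMEO bounds how long a blocking recv() may wait, so a silent peer becomes an error the caller can handle instead of an indefinite hang. The helpers set_recv_timeout() and recv_with_timeout() are hypothetical names, not QEMU functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: limit how long a blocking recv() on fd may wait.
 * Returns 0 on success, -1 on error (errno set by setsockopt()). */
static int set_recv_timeout(int fd, long ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: returns bytes read, 0 on EOF, -1 on a hard error,
 * or -2 if the receive timeout expired with no data. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return -2;   /* peer went silent: the caller can fail the migration */
    }
    return n;
}
```

Picking the timeout is a judgment call: too short and a slow but healthy link fails the migration, too long and a dead peer is noticed late.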
> > > 
> > > Based on the stack trace, whether the socket is blocking or not isn't a
> > > problem - QEMU is stuck in a sem_wait call that will delay the coroutine,
> > > and thus the thread, indefinitely. IMHO the semaphore usage needs to be
> > > removed in favour of a synchronization mechanism that can integrate with
> > > the event loop such that the coroutine does not block.
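One possible reading of "a synchronization mechanism that can integrate with the event loop": replace the semaphore with a file descriptor the loop can poll. This sketch assumes Linux eventfd(2); worker_signal_synced(), poll_synced() and worker() are hypothetical names, not QEMU functions (QEMU's own EventNotifier wraps an eventfd on Linux in a similar spirit). When nothing has arrived within the timeout, the caller gets 0 and can return to the main loop instead of sitting in sem_wait().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Worker side: instead of sem_post(), add 1 to the eventfd counter. */
static void worker_signal_synced(int efd)
{
    uint64_t one = 1;
    ssize_t r = write(efd, &one, sizeof(one));
    (void)r;   /* sketch only: real code must handle write errors */
}

/* Main-loop side: returns how many workers signalled since the last call,
 * 0 if none did within timeout_ms, or -1 on error. The eventfd is assumed
 * to be created with EFD_NONBLOCK and without EFD_SEMAPHORE, so one read()
 * drains the whole counter. */
static int64_t poll_synced(int efd, int timeout_ms)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0) {
        return rc;   /* 0: nothing yet, go back to the main loop */
    }
    uint64_t count;
    ssize_t n = read(efd, &count, sizeof(count));
    if (n != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}

/* Hypothetical worker thread body: reaches the sync point and signals. */
static void *worker(void *arg)
{
    worker_signal_synced(*(int *)arg);
    return NULL;
}
```

The main loop would keep summing the returned counts until every channel has reported in. This only moves the wait out of the main thread; the workers themselves can still block in recvmsg(), which is the point raised next in the thread.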
> > > 
> > 
> > I don't think that is an issue. The semaphore is just there to sync
> > with the multifd threads, which are in turn blocking on recvmsg.
> > 
> > Without multifd the main thread would hang in recvmsg as well in this
> > scenario.
> 
> If it is using blocking I/O, that would hang, but that's another thing
> that should not be done.  The QIOChannel code supports using non-blocking
> sockets in a blocking manner by yielding the coroutine.
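A sketch of using a non-blocking socket "in a blocking manner", assuming plain POSIX calls: the fd is non-blocking, and whenever read() reports EAGAIN the code calls a wait hook. Here the hook simply poll()s; inside QEMU the corresponding step would be yielding the coroutine (qio_channel_yield()) so the thread goes back to the event loop. read_all(), set_nonblocking() and wait_with_poll() are hypothetical names for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hook called when no data is available yet. */
typedef void (*wait_readable_fn)(int fd);

/* Stand-in hook: block this thread until fd is readable. In a coroutine
 * design this is where the coroutine would yield instead. */
static void wait_with_poll(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    (void)poll(&pfd, 1, -1);
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read exactly len bytes from a non-blocking fd, calling wait() whenever
 * the socket has no data. Returns 0 on success, -1 on EOF or error. */
static int read_all(int fd, void *buf, size_t len, wait_readable_fn wait)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            wait(fd);        /* would be a coroutine yield in QEMU */
        } else if (n < 0 && errno == EINTR) {
            continue;        /* interrupted by a signal: just retry */
        } else {
            return -1;       /* EOF (n == 0) or a hard error */
        }
    }
    return 0;
}
```

Passing the wait step as a hook keeps the read loop identical whether the caller blocks a thread or yields a coroutine, which is the distinction the two replies around this quote are arguing about.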

The thing is, the multifd feature as a whole is built on a thread-based
model. It doesn't have any other coroutines to yield, AFAIU.

Instead, at some point I'd like the precopy load on the destination QEMU to
happen in a separate thread rather than in the main thread.

I tried it once, but it isn't trivial. Unlike savevm, quite a few places
assume the BQL is held while loading the VM. But maybe I should keep
trying until we've found all such spots, and then see whether we can
still move the load out eventually.

If that works some day, then multifd sync on the destination QEMU will
happen without the BQL by default.
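A rough sketch of that direction, using a plain pthread mutex as a hypothetical stand-in for the BQL: the load runs in its own thread, takes the lock only around sections assumed to need it, and waits at the multifd sync point without holding it, so the main thread stays free to run its loop. struct load_state, load_thread(), big_lock and the needs_lock predicate are all invented for illustration; which parts of the real load path need the BQL is exactly the open question above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for the BQL. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct load_state {
    int sections;                     /* number of sections to "load" */
    int loaded;                       /* sections loaded so far */
    bool synced;                      /* set once the sync point is passed */
    sem_t multifd_synced;             /* posted by (hypothetical) recv threads */
    bool (*needs_lock)(int section);  /* does this section need the lock? */
};

static void load_section(struct load_state *s, int section)
{
    (void)section;
    s->loaded++;   /* placeholder for real device-state loading */
}

static void *load_thread(void *opaque)
{
    struct load_state *s = opaque;

    for (int i = 0; i < s->sections; i++) {
        bool locked = s->needs_lock(i);
        if (locked) {
            pthread_mutex_lock(&big_lock);
        }
        load_section(s, i);
        if (locked) {
            pthread_mutex_unlock(&big_lock);
        }
    }
    /* Sync point: waiting here stalls only this thread, and the lock is
     * not held, so the main thread can keep taking it. */
    sem_wait(&s->multifd_synced);
    s->synced = true;
    return NULL;
}

/* Example predicate: pretend every even-numbered section needs the lock. */
static bool even_sections_need_lock(int section)
{
    return section % 2 == 0;
}
```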

Thanks,

-- 
Peter Xu




Thread overview: 20+ messages
2025-08-07  2:41 [PATCH] multifd: Make the main thread yield periodically to the main loop yong.huang
2025-08-07  9:32 ` Lukas Straub
2025-08-07  9:36 ` Lukas Straub
2025-08-08  2:36   ` Yong Huang
2025-08-08  7:01     ` Lukas Straub
2025-08-08  8:02       ` Yong Huang
2025-08-08 13:55         ` Fabiano Rosas
2025-08-08 15:37           ` Peter Xu
2025-08-11  2:25             ` Yong Huang
2025-08-11  7:03             ` Lukas Straub
2025-08-11 13:53               ` Fabiano Rosas
2025-08-19 10:31                 ` Daniel P. Berrangé
2025-08-19 12:03                   ` Lukas Straub
2025-08-19 12:07                     ` Daniel P. Berrangé
2025-08-19 20:03                       ` Peter Xu [this message]
2025-08-11  2:27           ` Yong Huang
2025-08-08  6:36 ` Yong Huang
2025-08-08 15:42 ` Peter Xu
2025-08-11  2:02   ` Yong Huang
2025-08-19 10:19 ` Daniel P. Berrangé
