qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Bryan Zhang <bryan.zhang@bytedance.com>,
	Prasad Pandit <ppandit@redhat.com>,
	Yuan Liu <yuan1.liu@intel.com>, Avihai Horon <avihaih@nvidia.com>,
	Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH 07/14] migration/multifd: Simplify locking in sender thread
Date: Thu, 1 Feb 2024 18:37:01 +0800	[thread overview]
Message-ID: <Zbt0TUeKR3UTJnC0@x1n> (raw)
In-Reply-To: <87o7d1jlu5.fsf@suse.de>

On Wed, Jan 31, 2024 at 05:21:06PM -0300, Fabiano Rosas wrote:
> peterx@redhat.com writes:
> 
> > From: Peter Xu <peterx@redhat.com>
> >
> > The sender thread will yield the p->mutex before IO starts, trying to not
> > block the requester thread.  This may be unnecessary lock optimizations,
> > because the requester can already read pending_job safely even without the
> > lock, because the requester is currently the only one who can assign a
> > task.
> 
> What about the coroutine yield at qio_channel_writev_full_all()? Is it
> safe from yield while holding a lock? Could the main loop dispatch the
> cleanup function, it calls join on the multifd thread and it deadlocks?

This should be fine, IMHO, as sender threads are never in a coroutine?
IOW, it should be qemu_in_coroutine()==false always.

> 
> >
> > Drop that lock complication on both sides:
> >
> >   (1) in the sender thread, always take the mutex until job done
> >   (2) in the requester thread, check pending_job clear lockless
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/multifd.c | 23 ++++++++++++++++-------
> >  1 file changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 6a4863edd2..4dc5af0a15 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -429,7 +429,9 @@ static int multifd_send_pages(void)
> >          return -1;
> >      }
> >  
> > +    /* We wait here, until at least one channel is ready */
> >      qemu_sem_wait(&multifd_send_state->channels_ready);
> > +
> >      /*
> >       * next_channel can remain from a previous migration that was
> >       * using more channels, so ensure it doesn't overflow if the
> > @@ -441,17 +443,26 @@ static int multifd_send_pages(void)
> >              return -1;
> >          }
> >          p = &multifd_send_state->params[i];
> > -        qemu_mutex_lock(&p->mutex);
> > +        /*
> > +         * Lockless read to p->pending_job is safe, because only multifd
> > +         * sender thread can clear it.
> > +         */
> >          if (!p->pending_job) {
> 
> The worst it could happen is we read at the same time the thread is
> clearing it and we loop to the next channel. So it doesn't need to be
> atomic either.

Yep.  Actually the worst case is when all the rest N-1 channels are all
busy, then we loop N more times to fetch this pending_job finally became
false, but only if a race, so should be fine.

I'll switch to qatomic_read|set() in v2, btw, which I forgot yesterday.  in
case some compiler does register-cache tricks here to avoid a dead loop.

> 
> > -            p->pending_job = true;
> >              next_channel = (i + 1) % migrate_multifd_channels();
> >              break;
> >          }
> > -        qemu_mutex_unlock(&p->mutex);
> >      }
> > +
> > +    qemu_mutex_lock(&p->mutex);
> 
> What data this lock protects now? Everything below here only happens
> after this thread sees pending_job==false. It seems we would only need a
> barrier on the multifd thread to make sure p->pending_job=false is
> ordered after everything.
> 
> Even for the "sync" case, it appears the lock is not needed as well?

Great question. :)

Let's see whether we can remove the lock.  Since I'll need to run for
today, I'll have a closer look tomorrow.

Hopefully we still keep this patch untouched? The goal of this patch
originally was only trying to simplify the sender thread on releasing lock
one more time.  Current change should avoid that already.

> 
> We might need to remove p->running first and move the kick from
> multifd_send_terminate_threads() into multifd_save_cleanup() like I
> suggested, but it seems like we could remove this lock.
> 
> Which would make sense, because there's nothing another thread would
> want to do with a channel's MultiFDSendParams unless the channel is idle
> waiting for work.
> 
> > +    /*
> > +     * Double check on pending_job==false with the lock.  In the future if
> > +     * we can have >1 requester thread, we can replace this with a "goto
> > +     * retry", but that is for later.
> > +     */
> > +    assert(p->pending_job == false);
> > +    p->pending_job = true;
> >      assert(!p->pages->num);
> >      assert(!p->pages->block);
> > -
> >      p->packet_num = multifd_send_state->packet_num++;
> 
> I noticed this line cannot be here. If the channel thread takes long to
> wakeup, the "sync" code will increment once more and overwrite this
> field. This and the identical line at multifd_send_sync_main() should go
> into multifd_send_fill_packet() I think.

Another good one.

This is similarly relevant to my effort to split pending_job into two:
these two things (job/sync) are potentially racy on using *p.

Moving it to threads will require an atomic op, but I'll do it, because
otherwise it's racy as you correctly pointed out.

Another work for me tomorrow; I'll prepare something.

> 
> >      multifd_send_state->pages = p->pages;
> >      p->pages = pages;
> > @@ -704,8 +715,6 @@ static void *multifd_send_thread(void *opaque)
> >              multifd_send_fill_packet(p);
> >              p->num_packets++;
> >              p->total_normal_pages += pages->num;
> > -            qemu_mutex_unlock(&p->mutex);
> > -
> >              trace_multifd_send(p->id, packet_num, pages->num, p->flags,
> >                                 p->next_packet_size);
> >  
> > @@ -725,6 +734,7 @@ static void *multifd_send_thread(void *opaque)
> >              ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> >                                                0, p->write_flags, &local_err);
> >              if (ret != 0) {
> > +                qemu_mutex_unlock(&p->mutex);
> >                  break;
> >              }
> >  
> > @@ -733,7 +743,6 @@ static void *multifd_send_thread(void *opaque)
> >  
> >              multifd_pages_reset(p->pages);
> >              p->next_packet_size = 0;
> > -            qemu_mutex_lock(&p->mutex);
> >              p->pending_job = false;
> >              qemu_mutex_unlock(&p->mutex);
> >          } else if (p->pending_sync) {
> 

-- 
Peter Xu



  reply	other threads:[~2024-02-01 10:37 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 10:30 [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups peterx
2024-01-31 10:30 ` [PATCH 01/14] migration/multifd: Drop stale comment for multifd zero copy peterx
2024-01-31 10:30 ` [PATCH 02/14] migration/multifd: multifd_send_kick_main() peterx
2024-01-31 10:31 ` [PATCH 03/14] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths peterx
2024-01-31 15:05   ` Fabiano Rosas
2024-02-01  9:28     ` Peter Xu
2024-02-01 13:30       ` Fabiano Rosas
2024-02-02  0:21         ` Peter Xu
2024-01-31 10:31 ` [PATCH 04/14] migration/multifd: Postpone reset of MultiFDPages_t peterx
2024-01-31 15:27   ` Fabiano Rosas
2024-02-01 10:01     ` Peter Xu
2024-02-01 15:21       ` Fabiano Rosas
2024-02-02  0:28         ` Peter Xu
2024-02-02  0:37           ` Peter Xu
2024-02-02 12:15             ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 05/14] migration/multifd: Drop MultiFDSendParams.normal[] array peterx
2024-01-31 16:02   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 06/14] migration/multifd: Separate SYNC request with normal jobs peterx
2024-01-31 18:45   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 07/14] migration/multifd: Simplify locking in sender thread peterx
2024-01-31 20:21   ` Fabiano Rosas
2024-02-01 10:37     ` Peter Xu [this message]
2024-01-31 10:31 ` [PATCH 08/14] migration/multifd: Drop pages->num check " peterx
2024-01-31 21:19   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 09/14] migration/multifd: Rename p->num_packets and clean it up peterx
2024-01-31 21:24   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 10/14] migration/multifd: Move total_normal_pages accounting peterx
2024-01-31 21:26   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 11/14] migration/multifd: Move trace_multifd_send|recv() peterx
2024-01-31 21:26   ` Fabiano Rosas
2024-01-31 10:31 ` [PATCH 12/14] migration/multifd: multifd_send_prepare_header() peterx
2024-01-31 21:32   ` Fabiano Rosas
2024-02-01 10:02     ` Peter Xu
2024-01-31 10:31 ` [PATCH 13/14] migration/multifd: Move header prepare/fill into send_prepare() peterx
2024-01-31 21:42   ` Fabiano Rosas
2024-02-01 10:15     ` Peter Xu
2024-02-02  3:57   ` Peter Xu
2024-01-31 10:31 ` [PATCH 14/14] migration/multifd: Forbid spurious wakeups peterx
2024-01-31 21:43   ` Fabiano Rosas
2024-02-01  6:01   ` Peter Xu
2024-01-31 22:49 ` [PATCH 00/14] migration/multifd: Refactor ->send_prepare() and cleanups Fabiano Rosas
2024-02-01  5:47   ` Peter Xu
2024-02-01 12:51     ` Avihai Horon
2024-02-01 21:46       ` Fabiano Rosas
2024-02-02  2:12         ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zbt0TUeKR3UTJnC0@x1n \
    --to=peterx@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=bryan.zhang@bytedance.com \
    --cc=farosas@suse.de \
    --cc=hao.xiang@bytedance.com \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yuan1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).