Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Juan Quintela <quintela@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org,  Peter Xu <peterx@redhat.com>,
	 Leonardo Bras <leobras@redhat.com>,
	 Elena Ufimtseva <elena.ufimtseva@oracle.com>
Subject: Re: [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore
Date: Thu, 19 Oct 2023 20:41:26 +0200	[thread overview]
Message-ID: <87a5se3161.fsf@secure.mitica> (raw)
In-Reply-To: <87lebyy5ac.fsf@suse.de> (Fabiano Rosas's message of "Thu, 19 Oct 2023 12:56:43 -0300")

Fabiano Rosas <farosas@suse.de> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> Fabiano Rosas <farosas@suse.de> wrote:
>>> Juan Quintela <quintela@redhat.com> writes:
>>>
>>
>> This is a common pattern for concurrency.  To not have your mutex locked
>> too long, you put a variable (that can only be tested/changed with the
>> lock) to explain that the "channel" is busy, the struct that lock
>> protects is not (see how we make sure that the channel don't use any
>> variable of the struct without the locking).
>
> Sure, but what purpose is to mark the channel as busy? The migration
> thread cannot access the p->packet anyway. From multifd_send_pages()
> perspective, as soon as the channel releases the lock to start with the
> IO, the packet has been sent. It could start preparing the next pages
> struct while the channel is doing IO. No?

ok, we remove the pending.
Then we are sending that packet.

But see what happens on multifd_send_pages()

channels_ready is 0.
this is channel 1
next_channel == 1
channel 0 gets ready, so it increases channels_ready.

static int multifd_send_pages(QEMUFile *f)
{

    qemu_sem_wait(&multifd_send_state->channels_ready);
    // we pass this

    next_channel %= migrate_multifd_channels();
    for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i];

        // remember that i == 0

        qemu_mutex_lock(&p->mutex);
        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }
        if (!p->pending_job) {
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        qemu_mutex_unlock(&p->mutex);

// We choose 1, to send the packet through it.
// channel 1 is busy.
// channel 0 is idle but receives no work.
    }
...
}

So the variable is there to differentiate what channels are busy/idle to
send the work to the idle channels.

> We don't touch p after the IO aside from p->pending_jobs-- and we
> already distribute the load uniformly by incrementing next_channel.

I know.  After multifd_send_threads() releases the mutex it will only
touch ->pending_job (taking the mutex 1st).

> I'm not saying this would be more performant, just wondering if it would
> be possible.

Yeap, but as said before quite suboptimal.

>> As said, we don't want that.  Because channels can go a different speeds
>> due to factors outside of our control.
>>
>> If the semaphore bothers you, you can change it to to a condition
>> variable, but you just move the complexity from one side to the other
>> (Initial implementation had a condition variable, but Paolo said that
>> the semaphore is more efficient, so he won)
>
> Oh, it doesn't bother me. I'm just trying to unequivocally understand
> it's effects. And if it logically follows that it's not necessary, only
> then remove it.

Both channels_ready and pending_job makes the scheme more performant.
Without them it will not fail, just work way slower.

In the example that just showed you, if we started always from channel 0
to search for a idle channel, we would even do worse (that would be an
actual error):

start with channels_ready == 0;
channels_ready is 0.
channel 1 gets ready, so it increases channels_ready.

static int multifd_send_pages(QEMUFile *f)
{
    qemu_sem_wait(&multifd_send_state->channels_ready);
    // we pass this

    for (i = 0;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i];

        // remember that i == 0

        qemu_mutex_lock(&p->mutex);
        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }
        if (!p->pending_job) {
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        // As there is no test to see if this is idle, we put the page here
        qemu_mutex_unlock(&p->mutex);

        // We put here the page info
    }
...
}

channel 2 guest ready, so it increses channels_ready

static int multifd_send_pages(QEMUFile *f)
{
    qemu_sem_wait(&multifd_send_state->channels_ready);
    // we pass this

    for (i = 0;; i = (i + 1) % migrate_multifd_channels()) {
        p = &multifd_send_state->params[i];

        // remember that i == 0

        qemu_mutex_lock(&p->mutex);
        if (p->quit) {
            error_report("%s: channel %d has already quit!", __func__, i);
            qemu_mutex_unlock(&p->mutex);
            return -1;
        }
        if (!p->pending_job) {
            p->pending_job++;
            next_channel = (i + 1) % migrate_multifd_channels();
            break;
        }
        // As there is no test to see if this is idle, we put the page here
        qemu_mutex_unlock(&p->mutex);

        // We put here the page info
        // channel 0 is still transmitting the 1st page
        // And overwrote the previous page info
   }
...
}

In this particular case, using next_channel in round robin would have
saved the case.  When you put info for a channel to consume
asynhronously, you need to mark somehow that the channel has finished to
use the data before ordering it to put do more job.

We can changing pending_job to a bool if you preffer.  I think that we
have nailed all the off_by_one errors by now (famous last words).

Later, Juan.

next prev parent reply	other threads:[~2023-10-19 18:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-12 14:06 [RFC PATCH v2 0/6] migration/multifd: Locking changes Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 1/6] migration/multifd: Remove channels_ready semaphore Fabiano Rosas
2023-10-19  9:06   ` Juan Quintela
2023-10-19 14:35     ` Peter Xu
2023-10-19 15:00       ` Juan Quintela
2023-10-19 15:46         ` Peter Xu
2023-10-19 18:28           ` Juan Quintela
2023-10-19 18:50             ` Peter Xu
2023-10-20  7:56               ` Juan Quintela
2023-10-19 14:55     ` Fabiano Rosas
2023-10-19 15:18       ` Juan Quintela
2023-10-19 15:56         ` Fabiano Rosas
2023-10-19 18:41           ` Juan Quintela [this message]
2023-10-19 19:04             ` Peter Xu
2023-10-20  7:53               ` Juan Quintela
2023-10-20 12:48                 ` Fabiano Rosas
2023-10-22 20:17                   ` Peter Xu
2023-10-12 14:06 ` [RFC PATCH v2 2/6] migration/multifd: Stop checking p->quit in multifd_send_thread Fabiano Rosas
2023-10-19  9:08   ` Juan Quintela
2023-10-19 14:58     ` Fabiano Rosas
2023-10-19 15:19       ` Peter Xu
2023-10-19 15:19       ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 3/6] migration/multifd: Decouple control flow from the SYNC packet Fabiano Rosas
2023-10-19 10:28   ` Juan Quintela
2023-10-19 15:31     ` Peter Xu
2023-10-12 14:06 ` [RFC PATCH v2 4/6] migration/multifd: Extract sem_done waiting into a function Fabiano Rosas
2023-10-12 14:06 ` [RFC PATCH v2 5/6] migration/multifd: Stop setting 'quit' outside of channels Fabiano Rosas
2023-10-19 10:35   ` Juan Quintela
2023-10-12 14:06 ` [RFC PATCH v2 6/6] migration/multifd: Bring back the 'ready' semaphore Fabiano Rosas
2023-10-19 10:43   ` Juan Quintela

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a5se3161.fsf@secure.mitica \
    --to=quintela@redhat.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=farosas@suse.de \
    --cc=leobras@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.