All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, michael@hinespot.com, quintela@redhat.com,
	lvivier@redhat.com, berrange@redhat.com
Subject: Re: [Qemu-devel] [PATCH 3/5] migration/rdma: Allow cancelling while waiting for wrid
Date: Wed, 12 Jul 2017 17:32:34 +0800	[thread overview]
Message-ID: <20170712093233.GI29326@pxdev.xzpeter.org> (raw)
In-Reply-To: <20170704184915.31586-4-dgilbert@redhat.com>

On Tue, Jul 04, 2017 at 07:49:13PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When waiting for a WRID, if the other side dies we end up waiting
> for ever with no way to cancel the migration.
> Cure this by poll()ing the fd first with a timeout and checking
> error flags and migration state.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/rdma.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 48 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 6111e10c70..7273ae9929 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1466,6 +1466,52 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>      return  0;
>  }
>  
> +/* Wait for activity on the completion channel.
> + * Returns 0 on success, none-0 on error.
> + */
> +static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
> +{
> +    /*
> +     * Coroutine doesn't start until migration_fd_process_incoming()
> +     * so don't yield unless we know we're running inside of a coroutine.
> +     */
> +    if (rdma->migration_started_on_destination) {
> +        yield_until_fd_readable(rdma->comp_channel->fd);
> +    } else {
> +        /* This is the source side, we're in a separate thread
> +         * or destination prior to migration_fd_process_incoming()
> +         * we can't yield; so we have to poll the fd.
> +         * But we need to be able to handle 'cancel' or an error
> +         * without hanging forever.
> +         */
> +        while (!rdma->error_state && !rdma->error_reported &&
> +               !rdma->received_error) {

Should we just check rdma->error_state? Since iiuc error_reported only
means whether error is reported (set to 1 if reported), and
received_error means whether we got explicit error from peer (I think
only via a RDMA_CONTROL_ERROR message), and it's 1 if received. IIUC
either error_reported or received_error will mean that error_state be
set already.

> +            GPollFD pfds[1];
> +            pfds[0].fd = rdma->comp_channel->fd;
> +            pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            /* 0.5s timeout, should be fine for a 'cancel' */
> +            switch (qemu_poll_ns(pfds, 1, 500 * 1000 * 1000)) {

(I would be more aggresive, say, 100ms, or even less, for better UI
 response with merely nothing lost. But 500ms is okay as well to me)

> +            case 1: /* fd active */
> +                return 0;
> +
> +            case 0: /* Timeout, go around again */
> +                break;
> +
> +            default: /* Error of some type */
> +                return -1;
> +            }
> +
> +            if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
> +                /* Bail out and let the cancellation happen */
> +                return -EPIPE;
> +            }
> +        }
> +    }
> +
> +    return rdma->error_state || rdma->error_reported ||
> +           rdma->received_error;

Similar question here. Maybe just return rdma->error_state?

Thanks,

-- 
Peter Xu

  reply	other threads:[~2017-07-12  9:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-04 18:49 [Qemu-devel] [PATCH 0/5] A bunch of RDMA fixes Dr. David Alan Gilbert (git)
2017-07-04 18:49 ` [Qemu-devel] [PATCH 1/5] migration/rdma: Fix race on source Dr. David Alan Gilbert (git)
2017-07-12  3:13   ` Peter Xu
2017-07-04 18:49 ` [Qemu-devel] [PATCH 2/5] migration: Close file on failed migration load Dr. David Alan Gilbert (git)
2017-07-12  3:20   ` Peter Xu
2017-07-12 11:00     ` Dr. David Alan Gilbert
2017-07-14  2:51       ` Peter Xu
2017-07-04 18:49 ` [Qemu-devel] [PATCH 3/5] migration/rdma: Allow cancelling while waiting for wrid Dr. David Alan Gilbert (git)
2017-07-12  9:32   ` Peter Xu [this message]
2017-07-12 12:36     ` Dr. David Alan Gilbert
2017-07-14  2:57       ` Peter Xu
2017-07-04 18:49 ` [Qemu-devel] [PATCH 4/5] migration/rdma: Safely convert control types Dr. David Alan Gilbert (git)
2017-07-12  3:24   ` Peter Xu
2017-07-04 18:49 ` [Qemu-devel] [PATCH 5/5] migration/rdma: Send error during cancelling Dr. David Alan Gilbert (git)
2017-07-12  3:42   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170712093233.GI29326@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=michael@hinespot.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.