From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org,
	"Daniel P . Berrange" <berrange@redhat.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 2/5] migration: Fix race on qemu_file_shutdown()
Date: Thu, 22 Sep 2022 16:43:23 +0100
Message-ID: <YyyCm4rpTPZA4ykp@work-vm>
In-Reply-To: <20220920223800.47467-3-peterx@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> In qemu_file_shutdown(), there's a possible race with the current order of
> operations.  There are two major things to do:
> 
>   (1) Do real shutdown() (e.g. shutdown() syscall on socket)
>   (2) Update qemufile's last_error
> 
> We must do (2) before (1) otherwise there can be a race condition like:
> 
>       page receiver                     other thread
>       -------------                     ------------
>       qemu_get_buffer()
>                                         do shutdown()
>         returns 0 (buffer all zero)
>         (meanwhile we didn't check this retcode)
>       try to detect IO error
>         last_error==NULL, IO okay
>       install ALL-ZERO page
>                                         set last_error
>       --> guest crash!
> 
> To fix this, we can also check retval of qemu_get_buffer(), but not all
> APIs can be properly checked and ultimately we still need to go back to
> qemu_file_get_error().  E.g. qemu_get_byte() doesn't return error.
> 
> Maybe some day a rework of qemufile API is really needed, but for now keep
> using qemu_file_get_error() and fix it by not allowing that race condition
> to happen.  Here shutdown() is indeed special because the last_error was
> emulated.  For real -EIO errors it'll always be set when e.g. sendmsg()
> error triggers so we won't miss those ones, only shutdown() is a bit tricky
> here.
> 
> Cc: Daniel P. Berrange <berrange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
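
For reference, the alternative mentioned above (checking the retval of
qemu_get_buffer()) might look roughly like the sketch below. It is only an
illustration under stated assumptions: MockReader, mock_get_buffer(),
mock_get_byte() and mock_read_page() are hypothetical names, not QEMU's API,
and having the mock record a sticky error on a short read is an assumption
of this sketch. It also shows why a byte-level getter cannot report failure
through its return value, so the sticky error still has to be consulted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader state for illustration; this is not QEMU's QEMUFile. */
typedef struct MockReader {
    const unsigned char *data;  /* backing bytes of the mock stream */
    size_t len;                 /* total bytes available */
    size_t pos;                 /* current read offset */
    int last_error;             /* sticky error; 0 means "no error" */
} MockReader;

/*
 * Buffer read: returns the number of bytes actually copied, so a short
 * read is visible to the caller. The unread tail is zero-filled.
 */
static size_t mock_get_buffer(MockReader *r, unsigned char *buf, size_t size)
{
    assert(r && buf);
    size_t avail = r->len - r->pos;
    size_t n = size < avail ? size : avail;

    memcpy(buf, r->data + r->pos, n);
    memset(buf + n, 0, size - n);
    r->pos += n;
    if (n < size && !r->last_error) {
        r->last_error = -EIO;   /* assumption: short read is an I/O error */
    }
    return n;
}

/*
 * Byte read: 0 is both a valid data byte and what a failed read yields,
 * so the return value alone cannot signal an error.
 */
static int mock_get_byte(MockReader *r)
{
    unsigned char b;

    mock_get_buffer(r, &b, 1);
    return b;
}

/* Page read that checks the return value first, then the sticky error. */
static int mock_read_page(MockReader *r, unsigned char *page, size_t size)
{
    if (mock_get_buffer(r, page, size) != size) {
        return -EIO;
    }
    return r->last_error;
}
```

A caller would install the page only when mock_read_page() returns 0; for
mock_get_byte() the only option is to look at last_error afterwards.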

Oh that's kind of fun,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/qemu-file.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 4f400c2e52..2d5f74ffc2 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -79,6 +79,30 @@ int qemu_file_shutdown(QEMUFile *f)
>      int ret = 0;
>  
>      f->shutdown = true;
> +
> +    /*
> +     * We must set qemufile error before the real shutdown(), otherwise
> +     * there can be a race window where we thought IO all went through
> +     * (because last_error==NULL) but actually IO has already stopped.
> +     *
> +     * Without correct ordering, the race can happen like this:
> +     *
> +     *      page receiver                     other thread
> +     *      -------------                     ------------
> +     *      qemu_get_buffer()
> +     *                                        do shutdown()
> +     *        returns 0 (buffer all zero)
> +     *        (we didn't check this retcode)
> +     *      try to detect IO error
> +     *        last_error==NULL, IO okay
> +     *      install ALL-ZERO page
> +     *                                        set last_error
> +     *      --> guest crash!
> +     */
> +    if (!f->last_error) {
> +        qemu_file_set_error(f, -EIO);
> +    }
> +
>      if (!qio_channel_has_feature(f->ioc,
>                                   QIO_CHANNEL_FEATURE_SHUTDOWN)) {
>          return -ENOSYS;
> @@ -88,9 +112,6 @@ int qemu_file_shutdown(QEMUFile *f)
>          ret = -EIO;
>      }
>  
> -    if (!f->last_error) {
> -        qemu_file_set_error(f, -EIO);
> -    }
>      return ret;
>  }
>  
> -- 
> 2.32.0
> 
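
The effect of the reordering can be replayed deterministically with a small
single-threaded model that runs the receiver in the window between the two
shutdown steps. This is a sketch under stated assumptions, not the QEMU
code: MockFile and all mock_* helpers are hypothetical, and the mock channel
simply hands out zero-filled data once it has been shut down.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUFile; only the fields this sketch needs. */
typedef struct MockFile {
    int last_error;     /* sticky error; 0 means "no error recorded" */
    bool channel_down;  /* true once the mock shutdown() has happened */
} MockFile;

/* Step (1): the "real" shutdown; later reads yield zero-filled data. */
static void mock_do_shutdown(MockFile *f)
{
    f->channel_down = true;
}

/* Step (2): record a sticky -EIO unless an error is already set. */
static void mock_set_error(MockFile *f)
{
    if (!f->last_error) {
        f->last_error = -EIO;
    }
}

/*
 * Receiver: fill the page from the channel, then consult only the sticky
 * error (the read's return value is ignored, as in the race description).
 * Returns true if the receiver would go on to install the page.
 */
static bool mock_receive_page(MockFile *f, unsigned char *page, size_t len)
{
    assert(f && page);
    memset(page, f->channel_down ? 0x00 : 0xAB, len);
    return f->last_error == 0;
}

/*
 * Run the receiver between the two shutdown steps, in either order.
 * Returns true if a bogus all-zero page would have been installed.
 */
static bool zero_page_installed(bool error_first)
{
    MockFile f = {0};
    unsigned char page[16];

    if (error_first) {
        mock_set_error(&f);
    } else {
        mock_do_shutdown(&f);
    }

    bool installed = mock_receive_page(&f, page, sizeof(page));

    if (error_first) {
        mock_do_shutdown(&f);
    } else {
        mock_set_error(&f);
    }

    return installed && page[0] == 0x00;
}
```

In this model zero_page_installed(false) (the old order) reports the bad
install, while zero_page_installed(true) (the patched order) does not: the
receiver may see an error slightly early, but it never sees zeroed data
without one.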
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


