qemu-devel.nongnu.org archive mirror
From: Juan Quintela <quintela@redhat.com>
To: Liang Li <liang.z.li@intel.com>
Cc: qemu-devel@nongnu.org, amit.shah@redhat.com, dgilbert@redhat.com
Subject: Re: [Qemu-devel] [PATCH] migration: Fix multi-thread compression bug
Date: Wed, 04 May 2016 11:11:55 +0200
Message-ID: <878tzquqzo.fsf@emacs.mitica>
In-Reply-To: <1462257521-16075-1-git-send-email-liang.z.li@intel.com> (Liang Li's message of "Tue, 3 May 2016 14:38:41 +0800")

Liang Li <liang.z.li@intel.com> wrote:
> Recently, a bug was reported in the multi-thread compression feature
> for live migration. The destination side will be blocked during live
> migration if there is a heavy workload on the host and a
> memory-intensive workload in the guest; this is most likely to
> happen when there is only one decompression thread.
>
> Some parts of the decompression code are incorrect:
> 1. The main thread, which receives data from the source side, enters a
> busy loop while waiting for a free decompression thread.
> 2. A lock is needed to protect decomp_param[idx].start, because
> it is checked in the main thread and updated in the decompression
> thread.
>
> Fix these two issues by following the code pattern for compression.
>
> Reported-by: Daniel P. Berrange <berrange@redhat.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
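
The fix described above replaces the main thread's busy loop with a blocking wait on a condition variable that workers signal when they finish. A minimal sketch of that pattern, assuming POSIX threads as a stand-in for QEMU's QemuMutex/QemuCond wrappers; `SlotPool`, `slot_pool_init()`, `slot_acquire()`, `slot_release()` and `MAX_SLOTS` are hypothetical names, not QEMU code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_SLOTS 8             /* hypothetical upper bound for the sketch */

/* Hypothetical stand-in for decomp_param[].done plus the
 * decomp_done_lock/decomp_done_cond pair added by the patch. */
typedef struct {
    bool done[MAX_SLOTS];       /* protected by lock; true == worker idle */
    int count;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} SlotPool;

void slot_pool_init(SlotPool *p, int count)
{
    assert(count > 0 && count <= MAX_SLOTS);
    p->count = count;
    for (int i = 0; i < count; i++) {
        p->done[i] = true;      /* every worker starts out idle */
    }
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cond, NULL);
}

/* Main-thread side: sleep (instead of spinning) until some slot is
 * idle, claim it while still holding the lock, and return its index. */
int slot_acquire(SlotPool *p)
{
    int idx;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        for (idx = 0; idx < p->count; idx++) {
            if (p->done[idx]) {
                break;
            }
        }
        if (idx < p->count) {
            break;
        }
        /* Releases p->lock while sleeping and re-takes it on wakeup. */
        pthread_cond_wait(&p->cond, &p->lock);
    }
    p->done[idx] = false;
    pthread_mutex_unlock(&p->lock);
    return idx;
}

/* Worker side: mark the slot idle again and wake the waiting thread. */
void slot_release(SlotPool *p, int idx)
{
    pthread_mutex_lock(&p->lock);
    p->done[idx] = true;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->lock);
}
```

Because pthread_cond_wait() may return spuriously, the sketch rescans all slots after every wakeup instead of assuming one has become free.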

This is a step in the right direction, so:
Reviewed-by: Juan Quintela <quintela@redhat.com>

but I am still not sure that this is
enough. If you have the chance, look at the multiple-fd code that I
posted; it is very similar here.


>  struct DecompressParam {

What protects start, and what protects done?


>      bool start;
> +    bool done;
>      QemuMutex mutex;
>      QemuCond cond;
>      void *des;
> @@ -287,6 +288,8 @@ static bool quit_comp_thread;
>  static bool quit_decomp_thread;
>  static DecompressParam *decomp_param;
>  static QemuThread *decompress_threads;
> +static QemuMutex decomp_done_lock;
> +static QemuCond decomp_done_cond;
>  
>  static int do_compress_ram_page(CompressParam *param);
>  
> @@ -834,6 +837,7 @@ static inline void start_compression(CompressParam *param)
>  
>  static inline void start_decompression(DecompressParam *param)
>  {

Here nothing protects done

> +    param->done = false;
>      qemu_mutex_lock(&param->mutex);
>      param->start = true;
>      qemu_cond_signal(&param->cond);
> @@ -2193,19 +2197,24 @@ static void *do_data_decompress(void *opaque)
>          qemu_mutex_lock(&param->mutex);

We are reading quit_decomp_thread here, and nothing protects it.
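
One way to make this check race-free is to write the quit flag while holding the same mutex the worker holds when it reads the flag, and to signal in that same critical section. A minimal sketch, assuming POSIX threads in place of QemuMutex/QemuCond and a per-worker `quit` field in place of the global quit_decomp_thread; `Worker`, `worker_init()`, `worker_wait()`, `worker_request_quit()` and `worker_thread()` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker state: quit lives next to start so that both
 * are covered by the same mutex. */
typedef struct {
    bool start;                 /* protected by mutex */
    bool quit;                  /* protected by mutex */
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} Worker;

void worker_init(Worker *w)
{
    w->start = false;
    w->quit = false;
    pthread_mutex_init(&w->mutex, NULL);
    pthread_cond_init(&w->cond, NULL);
}

/* Worker side: sleep until there is work or a quit request.
 * Returns false when the thread should exit. */
bool worker_wait(Worker *w)
{
    bool have_work;

    pthread_mutex_lock(&w->mutex);
    while (!w->start && !w->quit) {
        pthread_cond_wait(&w->cond, &w->mutex);
    }
    have_work = !w->quit;       /* both flags read under the same lock */
    if (have_work) {
        w->start = false;
    }
    pthread_mutex_unlock(&w->mutex);
    return have_work;
}

/* Control side: writing quit outside the lock would be a data race with
 * the read above; signalling outside the lock could additionally lose
 * the wakeup if it lands between the check and pthread_cond_wait(). */
void worker_request_quit(Worker *w)
{
    pthread_mutex_lock(&w->mutex);
    w->quit = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->mutex);
}

/* Thread body: loop until asked to quit. */
void *worker_thread(void *opaque)
{
    Worker *w = opaque;

    while (worker_wait(w)) {
        /* the real work (uncompress()) would go here */
    }
    return NULL;
}
```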


>          while (!param->start && !quit_decomp_thread) {
>              qemu_cond_wait(&param->cond, &param->mutex);
> +        }
> +        if (!quit_decomp_thread) {
>              pagesize = TARGET_PAGE_SIZE;
> -            if (!quit_decomp_thread) {
> -                /* uncompress() will return failed in some case, especially
> -                 * when the page is dirted when doing the compression, it's
> -                 * not a problem because the dirty page will be retransferred
> -                 * and uncompress() won't break the data in other pages.
> -                 */
> -                uncompress((Bytef *)param->des, &pagesize,
> -                           (const Bytef *)param->compbuf, param->len);
> -            }
> -            param->start = false;
> +            /* uncompress() will return failed in some case, especially
> +             * when the page is dirted when doing the compression, it's
> +             * not a problem because the dirty page will be retransferred
> +             * and uncompress() won't break the data in other pages.
> +             */
> +            uncompress((Bytef *)param->des, &pagesize,
> +                       (const Bytef *)param->compbuf, param->len);

We are calling uncompress() (a slow operation) with param->mutex held.
Is there any reason why we can't just copy the param->* variables into
locals and release the lock before calling it?
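
Copying the inputs into locals lets the worker drop param->mutex before doing the slow work. A minimal sketch of that shape, assuming POSIX threads in place of QemuMutex/QemuCond and using memcpy() purely as a placeholder for zlib's uncompress(); `Job`, `JobCopy`, `job_take()`, `job_run()` and `PAGE_SIZE_SKETCH` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096   /* hypothetical stand-in for TARGET_PAGE_SIZE */

/* Hypothetical mirror of the DecompressParam fields the worker reads. */
typedef struct {
    bool start;                 /* all fields protected by mutex */
    void *des;
    const unsigned char *compbuf;
    size_t len;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} Job;

/* Snapshot of one request, owned by the worker once taken. */
typedef struct {
    void *des;
    const unsigned char *src;
    size_t len;
} JobCopy;

/* Wait for a request, copy it out and clear start, all under the mutex.
 * start is cleared only when a request was actually taken. */
JobCopy job_take(Job *j)
{
    JobCopy c;

    pthread_mutex_lock(&j->mutex);
    while (!j->start) {
        pthread_cond_wait(&j->cond, &j->mutex);
    }
    c.des = j->des;
    c.src = j->compbuf;
    c.len = j->len;
    j->start = false;
    pthread_mutex_unlock(&j->mutex);
    return c;
}

/* Slow part, run with no lock held. memcpy() is only a placeholder for
 * uncompress(); the sketch assumes len <= PAGE_SIZE_SKETCH. */
void job_run(const JobCopy *c)
{
    assert(c->len <= PAGE_SIZE_SKETCH);
    memcpy(c->des, c->src, c->len);
}
```

Only the compbuf pointer is copied, which is safe as long as the main thread does not refill that buffer until the worker reports done; copying the buffer contents would be the alternative if that guarantee did not hold.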

>          }
> +        param->start = false;

Why are we setting start to false when we _are_ not decompressing a
page?  I think this line should be inside the loop.

>          qemu_mutex_unlock(&param->mutex);
> +
> +        qemu_mutex_lock(&decomp_done_lock);
> +        param->done = true;

Here param->done is protected by decomp_done_lock.

> +        qemu_cond_signal(&decomp_done_cond);
> +        qemu_mutex_unlock(&decomp_done_lock);
>      }
>  
>      return NULL;
> @@ -2219,10 +2228,13 @@ void migrate_decompress_threads_create(void)
>      decompress_threads = g_new0(QemuThread, thread_count);
>      decomp_param = g_new0(DecompressParam, thread_count);
>      quit_decomp_thread = false;
> +    qemu_mutex_init(&decomp_done_lock);
> +    qemu_cond_init(&decomp_done_cond);
>      for (i = 0; i < thread_count; i++) {
>          qemu_mutex_init(&decomp_param[i].mutex);
>          qemu_cond_init(&decomp_param[i].cond);
>          decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE));
> +        decomp_param[i].done = true;
>          qemu_thread_create(decompress_threads + i, "decompress",
>                             do_data_decompress, decomp_param + i,
>                             QEMU_THREAD_JOINABLE);
> @@ -2258,9 +2270,10 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>      int idx, thread_count;
>  
>      thread_count = migrate_decompress_threads();
> +    qemu_mutex_lock(&decomp_done_lock);

we took decomp_done_lock

>      while (true) {
>          for (idx = 0; idx < thread_count; idx++) {
> -            if (!decomp_param[idx].start) {
> +            if (decomp_param[idx].done) {

and we can protect done with it.

>                  qemu_get_buffer(f, decomp_param[idx].compbuf, len);
>                  decomp_param[idx].des = host;
>                  decomp_param[idx].len = len;

but these ones should be protected by decomp_param[idx].mutex, no?
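
Following that suggestion, the main thread would fill in every field the worker reads, and set start, inside one critical section on the per-worker mutex. A minimal sketch, assuming POSIX threads in place of QemuMutex/QemuCond and a memcpy() from a caller-supplied buffer in place of qemu_get_buffer(); `Request`, `request_submit()` and `COMPBUF_SIZE` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMPBUF_SIZE 64         /* hypothetical buffer size for the sketch */

/* Hypothetical mirror of the request fields in DecompressParam. */
typedef struct {
    bool start;                             /* protected by mutex */
    void *des;                              /* protected by mutex */
    size_t len;                             /* protected by mutex */
    unsigned char compbuf[COMPBUF_SIZE];    /* protected by mutex */
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} Request;

/* Main-thread side: publish a request to an idle worker. Since the
 * worker reads these fields only while holding the same mutex, it sees
 * either none of the new values or all of them. */
void request_submit(Request *r, const void *data, size_t len, void *host)
{
    assert(len <= COMPBUF_SIZE);
    pthread_mutex_lock(&r->mutex);
    memcpy(r->compbuf, data, len);  /* stands in for qemu_get_buffer() */
    r->des = host;
    r->len = len;
    r->start = true;
    pthread_cond_signal(&r->cond);
    pthread_mutex_unlock(&r->mutex);
}
```

When the worker is idle it is normally parked in pthread_cond_wait(), so this extra lock should be uncontended in the common case.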

> @@ -2270,8 +2283,11 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
>          }
>          if (idx < thread_count) {
>              break;
> +        } else {
> +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
>          }
>      }
> +    qemu_mutex_unlock(&decomp_done_lock);
>  }
>  
>  /*

Thanks, Juan.


Thread overview: 8+ messages
2016-05-03  6:38 [Qemu-devel] [PATCH] migration: Fix multi-thread compression bug Liang Li
2016-05-03  6:43 ` Li, Liang Z
2016-05-03 10:44 ` Daniel P. Berrange
2016-05-03 16:39 ` Dr. David Alan Gilbert
2016-05-04  1:10   ` Li, Liang Z
2016-05-04  9:03   ` Juan Quintela
2016-05-04  9:11 ` Juan Quintela [this message]
2016-05-04 10:03   ` Li, Liang Z
