Re: [Qemu-devel] [PATCH 1/1] Stop reinit of XBZRLE.lock

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: arei.gonglei@huawei.com, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH 1/1] Stop reinit of XBZRLE.lock
Date: Tue, 18 Mar 2014 20:47:16 +0000
Message-ID: <20140318204715.GE2715@work-vm>
In-Reply-To: <871txze7yc.fsf@blackfin.pond.sub.org>

* Markus Armbruster (armbru@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> writes:

<snip>

> > diff --git a/arch_init.c b/arch_init.c
> > index 60c975d..16474b5 100644
> > --- a/arch_init.c
> > +++ b/arch_init.c
> > @@ -167,10 +167,13 @@ static struct {
> >      /* Cache for XBZRLE, Protected by lock. */
> >      PageCache *cache;
> >      QemuMutex lock;
> > +    bool lock_init; /* True once we've init'd lock */
> >  } XBZRLE = {
> >      .encoded_buf = NULL,
> >      .current_buf = NULL,
> >      .cache = NULL,
> > +    /* .lock is initialised in ram_save_setup */
> > +    .lock_init = false
> >  };
> 
> Redundant initializers.

Given how subtle locking is, I'd rather make the initial state obvious,
even if the initializers are redundant.

> >  /* buffer used for XBZRLE decoding */
> >  static uint8_t *xbzrle_decoded_buf;
> > @@ -187,6 +190,11 @@ static void XBZRLE_cache_unlock(void)
> >          qemu_mutex_unlock(&XBZRLE.lock);
> >  }
> >  
> > +/* called from qmp_migrate_set_cache_size in main thread, possibly while
> > + * a migration is in progress.
> > + * A running migration maybe using the cache and might finish during this
> 
> may be
> 
> > + * call, hence changes to the cache are protected by XBZRLE.lock().
> > + */
> 
> Style nit, since I'm nitpicking spelling already: our winged comments
> usually look like
> 
> /*
>  * Text
>  */

Oops, yes; if I need to respin I'll fix that (hmm I wonder if the check
script could be tweaked to find those).

> >  int64_t xbzrle_cache_resize(int64_t new_size)
> >  {
> >      PageCache *new_cache, *cache_to_free;
> > @@ -195,9 +203,12 @@ int64_t xbzrle_cache_resize(int64_t new_size)
> >          return -1;
> >      }
> >  
> > -    /* no need to lock, the current thread holds qemu big lock */
> > +    /* The current thread holds qemu big lock, and we hold it while creating
> > +     * the cache in ram_save_setup, thus it's safe to test if the
> > +     * cache exists yet without it's own lock (which might not have been
> > +     * init'd yet)
> > +     */
> >      if (XBZRLE.cache != NULL) {
> > -        /* check XBZRLE.cache again later */
> >          if (pow2floor(new_size) == migrate_xbzrle_cache_size()) {
> >              return pow2floor(new_size);
> >          }
> > @@ -209,7 +220,10 @@ int64_t xbzrle_cache_resize(int64_t new_size)
> >          }
> >  
> >          XBZRLE_cache_lock();
> > -        /* the XBZRLE.cache may have be destroyed, check it again */
> > +        /* the migration might have finished between the check above and us
> > +         * taking the lock,  causing XBZRLE.cache to be destroyed
> > +         *   check it again
> > +         */
> >          if (XBZRLE.cache != NULL) {
> >              cache_to_free = XBZRLE.cache;
> >              XBZRLE.cache = new_cache;
> > @@ -744,7 +758,15 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >              DPRINTF("Error creating cache\n");
> >              return -1;
> >          }
> > -        qemu_mutex_init(&XBZRLE.lock);
> > +        /* mutex's can't be reinit'd without destroying them
> > +         * and we've not got a good place to destroy it that
> > +         * we can guarantee isn't being called when we might want
> > +         * to hold the lock.
> > +         */
> > +        if (!XBZRLE.lock_init) {
> > +            XBZRLE.lock_init = true;
> > +            qemu_mutex_init(&XBZRLE.lock);
> > +        }
> >          qemu_mutex_unlock_iothread();
> >  
> >          /* We prefer not to abort if there is no memory */
> 
> I detest how the locking works in xbzrle_cache_resize().

Yeh, it's tricky - I think the way to think about it is that the
lock protects the cache and its contents, not the pointer.
Except that's not really true when it swaps the new one in - hmm.
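
A minimal sketch of that swap step, assuming plain pthreads rather than
QemuMutex; Cache, cache_lock and the cache_* helpers are made-up names for
illustration, not the arch_init.c code. Unlike xbzrle_cache_resize(), the
sketch uses a statically initialised mutex, so it can take the lock for
every access to the pointer and has no unlocked first check:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PageCache; only the size matters here. */
typedef struct {
    size_t size;
} Cache;

/* Statically initialised, so it is valid before any migration starts. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static Cache *cache;            /* pointer and contents guarded by cache_lock */

static Cache *cache_new(size_t size)
{
    Cache *c = malloc(sizeof(*c));
    if (c) {
        c->size = size;
    }
    return c;
}

/* Migration start: install a cache. Returns 0 on success, -1 on failure. */
static int cache_create(size_t size)
{
    Cache *c = cache_new(size);
    if (!c) {
        return -1;
    }
    pthread_mutex_lock(&cache_lock);
    assert(cache == NULL);      /* one migration at a time in this sketch */
    cache = c;
    pthread_mutex_unlock(&cache_lock);
    return 0;
}

/* Migration end, possibly on another thread: drop the cache. */
static void cache_destroy(void)
{
    pthread_mutex_lock(&cache_lock);
    Cache *old = cache;
    cache = NULL;
    pthread_mutex_unlock(&cache_lock);
    free(old);                  /* free(NULL) is a no-op */
}

/*
 * Resize request from the main thread.
 * Returns 1 if a live cache was replaced, 0 if no cache existed,
 * -1 if the allocation failed.
 */
static int cache_resize(size_t new_size)
{
    Cache *replacement = cache_new(new_size);   /* allocate outside the lock */
    Cache *to_free;
    int swapped = 0;

    if (!replacement) {
        return -1;
    }
    pthread_mutex_lock(&cache_lock);
    if (cache != NULL) {        /* migration still running: swap it in */
        to_free = cache;
        cache = replacement;
        swapped = 1;
    } else {                    /* migration already finished: discard */
        to_free = replacement;
    }
    pthread_mutex_unlock(&cache_lock);
    free(to_free);
    return swapped;
}

/* Current cache size, or 0 if there is no cache. */
static size_t cache_size(void)
{
    pthread_mutex_lock(&cache_lock);
    size_t size = cache ? cache->size : 0;
    pthread_mutex_unlock(&cache_lock);
    return size;
}
```

The allocation and the free both happen outside the lock, so the lock is
only held for the pointer test and the swap itself.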

> The first XBZRLE.cache != NULL is not under XBZRLE.lock.  As you explain
> in the comment, this is required, because we foolishly delay lock
> initialization until a migration starts, and it's safe, because cache
> creation is also under the BQL.

Some of this is down to the interfaces between migration in general,
the devices being migrated, and the special cases for RAM/iterative migration.

1) A lot of the RAM migration data, including XBZRLE, is global data in
arch_init.c - this is bad.

2) The management of these is generally glued onto the
setup/iterate/complete set of methods used during migration. However,
those methods don't include any call that corresponds to the actual
start or end of migration, so there is nowhere sane to do any init
that is tied to the actual start or end of migration - this is bad.

3) Migration runs in a separate thread and could finish at any time - this
is expected but complicates life, especially when (2) means that
the data structures for RAM migration etc. are cleaned up when migration
finishes.

I don't know the history but (1) seems to arise from the
semi-arbitrary split between arch_init.c, memory.c, savevm.c, and probably
3 or 4 other files I've failed to mention.
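
Given (2) and (3), the workaround amounts to initialising the lock once and
never destroying it. A minimal sketch of that idea, assuming plain pthreads
instead of QemuMutex; the state struct, setup(), cleanup() and the init_calls
counter are hypothetical names for illustration only:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of the XBZRLE state. */
static struct {
    pthread_mutex_t lock;
    bool lock_init;     /* true once lock has been initialised */
    int init_calls;     /* counts pthread_mutex_init() calls, for the demo */
} state;

/*
 * Called at the start of every migration. The caller is assumed to hold
 * an outer lock (the BQL in QEMU), which serialises access to lock_init.
 */
static int setup(void)
{
    if (!state.lock_init) {
        if (pthread_mutex_init(&state.lock, NULL) != 0) {
            return -1;
        }
        state.lock_init = true;
        state.init_calls++;
    }
    return 0;
}

/*
 * Called when a migration ends, possibly from another thread. The lock is
 * deliberately left alive, so a thread that is about to take it never
 * finds it destroyed or reinitialised underneath it.
 */
static void cleanup(void)
{
    assert(state.lock_init);    /* setup() must have run at least once */
    pthread_mutex_lock(&state.lock);
    /* ... release per-migration data here ... */
    pthread_mutex_unlock(&state.lock);
}
```

Running setup(), cleanup(), setup() in sequence initialises the mutex only
once; the second setup() finds lock_init already set and leaves the lock
alone.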

> The second XBZRLE.cache != NULL *is* under XBZRLE.lock.  Required,
> because cache destruction is *not* under the BQL, only under
> XBZRLE.lock.
> 
> I'd very, very much prefer this to be made obviously safe: initialize
> XBZRLE.lock sufficiently early, then access XBZRLE.cache only under
> XBZRLE.lock.
> 
> Confusing and way too subtle for no good reason.  But your patch doesn't
> add subtlety, it explains it, and fixes a bug.  Therefore:
> 
> Reviewed-by: Markus Armbruster <armbru@redhat.com>

Thanks.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
