From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48217) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WIJsZ-00026B-Bm for qemu-devel@nongnu.org; Tue, 25 Feb 2014 10:25:13 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WIJsU-0006XQ-Hp for qemu-devel@nongnu.org; Tue, 25 Feb 2014 10:25:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:3258) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WIJsU-0006Ui-74 for qemu-devel@nongnu.org; Tue, 25 Feb 2014 10:25:02 -0500 From: Juan Quintela In-Reply-To: <33183CC9F5247A488A2544077AF19020815C8B93@SZXEMA503-MBS.china.huawei.com> (Gonglei's message of "Sat, 22 Feb 2014 08:52:06 +0000") References: <33183CC9F5247A488A2544077AF19020815C8B93@SZXEMA503-MBS.china.huawei.com> Date: Tue, 25 Feb 2014 16:24:54 +0100 Message-ID: <87sir7z03d.fsf@elfo.mitica> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v3] XBZRLE: Fix qemu crash when resize the xbzrle cache Reply-To: quintela@redhat.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Gonglei (Arei)" Cc: "chenliang (T)" , "owasserm@redhat.com" , "Huangweidong (C)" , "qemu-devel@nongnu.org" , "Dr. David Alan Gilbert" "Gonglei (Arei)" wrote: > Resizing the xbzrle cache during migration causes qemu-crash, > because the main-thread and migration-thread modify the xbzrle > cache size concurrently without lock-protection. > > Signed-off-by: ChenLiang > Signed-off-by: Gonglei > Reviewed-by: Dr. David Alan Gilbert Sorry for the late review. > --- > Changes against the previous version: > *Remove the temporary variable cache in the xbzrle_cache_resize. > > arch_init.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 55 insertions(+), 3 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index 80574a0..003640f 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -164,26 +164,69 @@ static struct { > uint8_t *encoded_buf; > /* buffer for storing page content */ > uint8_t *current_buf; > - /* Cache for XBZRLE */ > + /* Cache for XBZRLE, Protected by lock. */ > PageCache *cache; > + QemuMutex lock; > } XBZRLE = { > .encoded_buf = NULL, > .current_buf = NULL, > .cache = NULL, > + /* use the lock carefully */ > + .lock = {PTHREAD_MUTEX_INITIALIZER}, > }; Have you tried to compile for windows? I think this would make it break. We need to init it with qemu_mutex_init() in ram_save_setup() after the call to cache_init()? > /* buffer used for XBZRLE decoding */ > static uint8_t *xbzrle_decoded_buf; > > +static void XBZRLE_cache_lock(void) > +{ > + qemu_mutex_lock(&XBZRLE.lock); > +} > + if we change this to: if (migrate_use_xbzrle()) qemu_mutex_lock(&XBZRLE.lock); On both cache operations, we would remove the overhead in the no XBRLE case, right? > +static void XBZRLE_cache_unlock(void) > +{ > + qemu_mutex_unlock(&XBZRLE.lock); > +} > + > int64_t xbzrle_cache_resize(int64_t new_size) > { > + PageCache *new_cache, *old_cache; > + > if (new_size < TARGET_PAGE_SIZE) { > return -1; > } > > + XBZRLE_cache_lock(); > if (XBZRLE.cache != NULL) { > - return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) * > - TARGET_PAGE_SIZE; > + /* check XBZRLE.cache again later */ > + XBZRLE_cache_unlock(); > + if (pow2floor(new_size) == migrate_xbzrle_cache_size()) { > + return pow2floor(new_size); > + } > + new_cache = cache_init(new_size / TARGET_PAGE_SIZE, > + TARGET_PAGE_SIZE); > + if (!new_cache) { > + DPRINTF("Error creating cache\n"); > + return -1; > + } > + > + XBZRLE_cache_lock(); > + /* the XBZRLE.cache may have be destroyed, check it again */ > + if (XBZRLE.cache != NULL) { > + old_cache = XBZRLE.cache; > + XBZRLE.cache = new_cache; > + new_cache = NULL; > + } > + XBZRLE_cache_unlock(); > + > + if (NULL == new_cache) { > + cache_fini(old_cache); > + } else { > + cache_fini(new_cache); > + } > + } else { > + XBZRLE_cache_unlock(); > } Can we change this to: if (pow2floor(new_size) == migrate_xbzrle_cache_size()) { return pow2floor(new_size); } new_cache = cache_init(new_size / TARGET_PAGE_SIZE, TARGET_PAGE_SIZE); if (!new_cache) { DPRINTF("Error creating cache\n"); return -1; } XBZRLE_cache_lock(); /* the XBZRLE.cache may have be destroyed, check it again */ if (XBZRLE.cache != NULL) { cache_to_free = XBZRLE.cache; XBZRLE.cache = new_cache; } else { cache_to_free = new_cache; } XBZRLE_cache_unlock(); cache_fini(cache_to_free); I think that looking is clearer this way. BTW, we are losing _everything_ that is on the cache if we resize it. I don't know if that is what we want. If we have a big cache, and make it bigger because we are having too many misses, just dropping the whole cache looks like a bad idea :-( And my understanding is that we should do a: migrate_get_current()->xbzrle_cache_size = new_size; independently of what CACHE is equal or different from NULL? (this was already wrong before your patch, just asking because you are looking at the code). Thanks, Juan.