Date: Fri, 14 Feb 2014 09:06:13 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20140214090612.GA2316@work-vm>
References: <52F4E211.9080808@profihost.ag> <52F4E37D.4050204@profihost.ag>
 <52F5321C.4090605@profihost.ag> <20140207200204.GA5013@work-vm>
 <52F53DBC.8000209@profihost.ag> <52F68438.9050300@profihost.ag>
 <20140210160706.GK3545@work-vm> <52F92035.5040402@profihost.ag>
 <20140213200617.GK24733@work-vm> <52FD3696.8080103@profihost.ag>
In-Reply-To: <52FD3696.8080103@profihost.ag>
Subject: Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
To: Stefan Priebe <s.priebe@profihost.ag>
Cc: Paolo Bonzini, qemu-devel <qemu-devel@nongnu.org>, Alexandre DERUMIER, owasserm@redhat.com

* Stefan Priebe (s.priebe@profihost.ag) wrote:
>
> Am 13.02.2014 21:06, schrieb Dr. David Alan Gilbert:
> >* Stefan Priebe (s.priebe@profihost.ag) wrote:
> >>Am 10.02.2014 17:07, schrieb Dr. David Alan Gilbert:
> >>>* Stefan Priebe (s.priebe@profihost.ag) wrote:
> >>>>I could fix it by explicitly disabling xbzrle - it seems it's
> >>>>automatically on if I do not set the migration caps to false.
> >>>>
> >>>>So it seems to be an xbzrle bug.
> >>>
> >>>Stefan, can you give me some more info on your hardware and
> >>>migration setup; that stressapptest (which is a really nice
> >>>find!) really batters the memory and it means the migration
> >>>isn't converging for me, so I'm curious what your setup is.
> >>
> >>That one was developed by Google and has been known to me for a
> >>few years. Google found that memtest and co. are not good enough
> >>to stress test memory.
> >
> >Hi Stefan,
> >  I've just posted a patch to qemu-devel that fixes two bugs that
> >we found; I've only tried a small stressapptest run and it seems
> >to survive with them (where it didn't before); you might like to try
> >it if you're up for rebuilding qemu.
> >
> >It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'
> >
> >I'll try and get a larger run done myself, but I'd be interested to
> >hear if it fixes it for you (or anyone else who hit the problem).
>
> Yes, it works fine - no crash now, but it's slower than without XBZRLE ;-)
>
> Without XBZRLE: I needed migrate_downtime 4 and around 60s
> With XBZRLE: I needed migrate_downtime 16 and 240s

Hmm; how did that compare with the previous (broken) XBZRLE time?
(i.e. was XBZRLE always slower for you?)

If you're driving this from the hmp/command interface then the result
of the info migrate command at the end of each of those runs would be
interesting.
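For comparison, the kind of sequence I have in mind on the source
side's monitor is roughly the following (the cache size and the
destination address here are only placeholders, use whatever matches
your setup):

  (qemu) migrate_set_capability xbzrle on
  (qemu) migrate_set_cache_size 256m
  (qemu) migrate_set_downtime 4
  (qemu) migrate -d tcp:destination-host:4444
  (qemu) info migrate

If I remember the field names right, the xbzrle cache miss and
overflow counters in that info migrate output should show whether the
cache is simply too small for stressapptest's write rate.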
Another thing you could try is changing the xbzrle_cache_zero_page
function in arch_init.c that I added so that it reads:

static void xbzrle_cache_zero_page(ram_addr_t current_addr)
{
    if (ram_bulk_stage || !migrate_use_xbzrle()) {
        return;
    }

    /* Only refresh pages that are already in the XBZRLE cache;
     * don't add new cache entries for zero pages. */
    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
        return;
    }

    /* We don't care if this fails to allocate a new cache page
     * as long as it updated an old one */
    cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE);
}

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK