Date: Fri, 14 Feb 2014 09:06:13 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20140214090612.GA2316@work-vm>
References: <52F4E211.9080808@profihost.ag> <52F4E37D.4050204@profihost.ag>
 <52F5321C.4090605@profihost.ag> <20140207200204.GA5013@work-vm>
 <52F53DBC.8000209@profihost.ag> <52F68438.9050300@profihost.ag>
 <20140210160706.GK3545@work-vm> <52F92035.5040402@profihost.ag>
 <20140213200617.GK24733@work-vm> <52FD3696.8080103@profihost.ag>
In-Reply-To: <52FD3696.8080103@profihost.ag>
Subject: Re: [Qemu-devel] [pve-devel] QEMU LIve Migration - swap_free: Bad swap file entry
To: Stefan Priebe <s.priebe@profihost.ag>
Cc: Paolo Bonzini, qemu-devel <qemu-devel@nongnu.org>, Alexandre DERUMIER, owasserm@redhat.com

* Stefan Priebe (s.priebe@profihost.ag) wrote:
>
> Am 13.02.2014 21:06, schrieb Dr. David Alan Gilbert:
> >* Stefan Priebe (s.priebe@profihost.ag) wrote:
> >>Am 10.02.2014 17:07, schrieb Dr. David Alan Gilbert:
> >>>* Stefan Priebe (s.priebe@profihost.ag) wrote:
> >>>>I could fix it by explicitly disabling xbzrle - it seems it's
> >>>>automatically on if I do not set the migration caps to false.
> >>>>
> >>>>So it seems to be an xbzrle bug.
> >>>
> >>>Stefan, can you give me some more info on your hardware and
> >>>migration setup; that stressapptest (which is a really nice
> >>>find!) really batters the memory and it means the migration
> >>>isn't converging for me, so I'm curious what your setup is.
> >>
> >>That one was developed by Google and has been known to me for a
> >>few years. Google found that memtest and co. are not good enough
> >>to stress test memory.
> >
> >Hi Stefan,
> >  I've just posted a patch to qemu-devel that fixes two bugs that
> >we found; I've only tried a small stressapptest run and it seems
> >to survive with them (where it didn't before); you might like to try
> >it if you're up for rebuilding qemu.
> >
> >It's the one entitled '[PATCH] Fix two XBZRLE corruption issues'
> >
> >I'll try and get a larger run done myself, but I'd be interested to
> >hear if it fixes it for you (or anyone else who hit the problem).
>
> Yes, it works fine - no crash now, but it's slower than without XBZRLE ;-)
>
> Without XBZRLE: I needed migrate_downtime 4 and around 60s
> With XBZRLE: I needed migrate_downtime 16 and 240s

Hmm; how did that compare with the previous (broken) XBZRLE time?
(i.e. was XBZRLE always slower for you?)

If you're driving this from the hmp/command interface then the result
of the info migrate command at the end of each of those runs would be
interesting.
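For comparison, the kind of sequence I have in mind on the source
side's monitor is roughly the following (the cache size and the
destination address here are only placeholders, use whatever matches
your setup):

  (qemu) migrate_set_capability xbzrle on
  (qemu) migrate_set_cache_size 256m
  (qemu) migrate_set_downtime 4
  (qemu) migrate -d tcp:destination-host:4444
  (qemu) info migrate

If I remember the field names right, the xbzrle cache miss and
overflow counters in that info migrate output should show whether the
cache is simply too small for stressapptest's write rate.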
Another thing you could try is changing the xbzrle_cache_zero_page
function in arch_init.c that I added so that it reads:

static void xbzrle_cache_zero_page(ram_addr_t current_addr)
{
    if (ram_bulk_stage || !migrate_use_xbzrle()) {
        return;
    }

    /* Only refresh pages that are already in the XBZRLE cache;
     * don't add new cache entries for zero pages. */
    if (!cache_is_cached(XBZRLE.cache, current_addr)) {
        return;
    }

    /* We don't care if this fails to allocate a new cache page
     * as long as it updated an old one */
    cache_insert(XBZRLE.cache, current_addr, ZERO_TARGET_PAGE);
}

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK