From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:59374)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dwg@au1.ibm.com>) id 1TVAtp-0003Uk-O2
	for qemu-devel@nongnu.org; Sun, 04 Nov 2012 19:50:47 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dwg@au1.ibm.com>) id 1TVAto-0003lV-BD
	for qemu-devel@nongnu.org; Sun, 04 Nov 2012 19:50:45 -0500
Received: from e23smtp03.au.ibm.com ([202.81.31.145]:56033)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dwg@au1.ibm.com>) id 1TVAtn-0003kv-LU
	for qemu-devel@nongnu.org; Sun, 04 Nov 2012 19:50:44 -0500
Received: from /spool/local
	by e23smtp03.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use
	Only! Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <dwg@au1.ibm.com>;
	Mon, 5 Nov 2012 10:47:29 +1000
Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97])
	by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	qA50eEPP42991780
	for <qemu-devel@nongnu.org>; Mon, 5 Nov 2012 11:40:15 +1100
Received: from d23av03.au.ibm.com (loopback [127.0.0.1])
	by d23av03.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id
	qA50oPjd019119
	for <qemu-devel@nongnu.org>; Mon, 5 Nov 2012 11:50:25 +1100
Date: Mon, 5 Nov 2012 11:17:00 +1100
From: David Gibson <dwg@au1.ibm.com>
Message-ID: <20121105001700.GV27695@truffula.fritz.box>
References: <1351654988-13165-1-git-send-email-david@gibson.dropbear.id.au>
	<509106A0.7020400@redhat.com>
	<20121102031508.GN27695@truffula.fritz.box>
	<87625oi84n.fsf@elfo.mitica>
	<20121103150054.GT27695@truffula.fritz.box>
	<87d2ztfa9i.fsf@elfo.mitica>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87d2ztfa9i.fsf@elfo.mitica>
Subject: Re: [Qemu-devel] [PATCH] Fix off-by-1 error in RAM migration code
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Juan Quintela <quintela@redhat.com>
Cc: Orit Wasserman <owasserm@redhat.com>, aliguori@us.ibm.com, qemu-devel@nongnu.org

On Sun, Nov 04, 2012 at 08:17:29PM +0100, Juan Quintela wrote:
> David Gibson <dwg@au1.ibm.com> wrote:
> > On Fri, Nov 02, 2012 at 11:58:32AM +0100, Juan Quintela wrote:
> >> David Gibson <dwg@au1.ibm.com> wrote:
> >> > On Wed, Oct 31, 2012 at 01:08:16PM +0200, Orit Wasserman wrote:
> >> >> On 10/31/2012 05:43 AM, David Gibson wrote:
> >> 
> >> Reviewed-by: Juan Quintela <quintela@redhat.com> 
> >> 
> >> Good catch, I missunderstood the function when fixing a different bug,
> >> and never undrestood why it fixed it.
> >
> > Actually.. it just occurred to me that I think there has to be another
> > bug here somewhere..
> 
> I am at KVM Forum and LinuxCon for this week, so I can't test anything.
> 
> For some reason, I missunderstood bitmap_set() and though this was the
> value that we "initiliazed" the bitmap with.  So, I changed from 0 to 1,
> and ..... I was sending half the pages over the wire.  Yes, that is
> right, just half of them.  So clearly we have some bug somewhere
> else :-()

Ah.

> > I haven't actually observed any effects from the memory corruption -
> > though it's certainly a real bug.  I found this because another effect
> > of this bug is that migration_dirty_pages count was set to 1 more than
> > the actual number of dirty bits in the bitmap.  That meant the dirty
> > pages count was never reaching zero and so the migration/savevm never
> > terminated.
> 
> I wonder what is on page 0 on an x86, problably some BIOS data that
> never changes?  No clue about pseries.

On pseries it will be exception vectors.  So it will change on the
firmware->kernel transition.  And I think there may be a very few
special variables in there, so it will change later.  Note that the
migration_dirty_pages count will remain mismatched, whether or not
page 0 gets touched.

> > Except.. that every so often the migration *did* terminate (maybe 1
> > time in 5).  Also I kind of hope somebody would have noticed this
> > earlier if migrations never terminated on x86 too.  But as far as I
> > can tell, if initially mismatched like this it ought to be impossible
> > for the dirty page count to ever reach zero.  Which suggests there is
> > another bug with the dirty count tracking :(.
> 
> We use the dirty bitmap count to know how many pages are dirty, but once
> that the number is low enough, we just sent "the  rest" of the pages.
> So, it would always converge (or not) independent of that bug by one.
> We never test for zero dirty pages, we test "are we able to send this
> many pages" over "max_downtime".  So this explains why it works for you
> sometimes.

Hrm.  It explains why it works sometimes (phew the bug I though must
be there needn't be).  It doesn't explain why it mostly didn't - I
actually saw it spinning there on ram_save_iterate(), not completing
with one dirty page remaining.

But t

> > It's possible the memory corruption could account for this, of course
> > - since that in theory at least, could have almost any strange effect
> > on the program's behavior.  But that doesn't seem particularly likely
> > to me.
> 
> This depends on _what_ is on page zero, if that is differente for
> whatever we put there during boot, and it we ever wrote to that page
> again, we would mark that pge dirty anyways, so I would put that
> "corruption" problem as highly improbable.  Not that we shouldn't fix
> the bug, but I doubt that you are getting memory corruption due to this
> bug.
> 
> The only way that you can get memory corruption is if you write to that
> page just before you do migration, and then you never wrote to it again.
> What is on hardware page zero on pseries?  or it is just a normal page?

You misunderstand.  I wasn't talking about potential corruption of
guest memory because we fail to transmit page 0 (though that is also a
bug).  I'm talking about corruption of qemu internal memory because
the bitmap_set() writes one 1 bit beyond the end of the space
allocated for the migration bitmap - it has no kind of bounds checking.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson