Date: Thu, 7 Jan 2016 14:52:26 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] Reply: What's the advantages of POSTCOPY over CPU-THROTTLE?
To: "Zhangbo (Oscar)" <oscar.zhangbo@huawei.com>
Cc: "zhouyimin Zhou(Yimin)", Zhanghailiang, "Wangyufei (James)",
    Yanqiangjun, qemu-devel@nongnu.org, "Huangpeng (Peter)", Linqiangmin,
    Huangzhichao, jjherne@linux.vnet.ibm.com, "Herongguang (Stephen)"

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Zhangbo (Oscar) (oscar.zhangbo@huawei.com) wrote:
> > Thank you David and Jason!
> >
> > BTW, I noticed that VMware did similar work to ours, but the situation
> > is a little different: they proposed postcopy (under the name
> > QuickResume) in vSphere 4.1, but substituted it with SDPS (similar to
> > CPU-THROTTLE) from vSphere 5 on. Do you know the reason behind this?
> >     Reference: https://qianr.wordpress.com/2013/10/14/vmware-vm-live-migration-vmotion/
> > It's said that they had already introduced shared storage to avoid
> > losing the guest when the network connection is lost. So what was their
> > concern in dropping QuickResume?
>
> Hmm, that's a good summary; I'd not seen any detail of how VMware's
> systems worked before.
> I'm not exactly sure, but I think that's saying that they store the
> outstanding pages that haven't been transferred in a file on disk,
> so that they could be used to recover the VM later if the network failed.
> It's not clear to me from that description whether they do that only
> when the network fails, or as part of a normal postcopy flow.

I was thinking about this a bit more; recovering from a failed network
connection for postcopy might be doable. I can kind of see how to do it
*without* using a disk intermediate; with a disk it's a bit trickier.

Assuming the network connection is lost after we enter postcopy mode,
we've also lost a bit of state, because the source doesn't know exactly
which pages the destination has already received. This is also something
we don't actually keep track of on the destination; we leave it up to
the source to tell us when we're finished.

What we could do is:
  a) When the network connection fails, make sure we don't kill the
     destination, and go into some form of paused mode.
  b) Also make sure the source doesn't lose the migration state.
  c) Now get the destination to listen for a connection (something like
     migrate_incoming, except that we don't want to reset the state, and
     we do need the ability to specify a different network setup).
  d) Tell the source to connect to the destination again.
  e) The source does *not* carry on any background transfer - it only
     transfers pages that the destination asks for.
  f) We start a recovery thread on the destination that just walks all of
     memory, reading one byte from each page; it should get stuck on any
     outstanding pages and cause those pages to be requested (see the
     sketch after this list).
  g) The source must send a requested page even if it had previously sent
     it, because it might have been lost in network buffers.
  h) Once that recovery thread finishes, we know we've received all
     pages, so we're good.
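As a rough illustration of f) - purely a sketch, not against any real
QEMU API, and the recovery_args structure is made up - the thread relies
on guest RAM still being registered with userfaultfd, so that reading a
byte of a missing page blocks until the postcopy code has requested and
received that page from the reconnected source:

    #include <stddef.h>
    #include <stdint.h>

    struct recovery_args {          /* hypothetical argument block */
        uint8_t *hostaddr;          /* start of the RAMBlock's mapping */
        size_t   length;            /* length of the mapping in bytes */
        size_t   page_size;
    };

    static void *postcopy_recovery_thread(void *opaque)
    {
        struct recovery_args *args = opaque;
        volatile uint8_t tmp;
        size_t offset;

        for (offset = 0; offset < args->length; offset += args->page_size) {
            /* Touch one byte per page.  If the page is still missing,
             * the read faults, the userfault handler requests it from
             * the (reconnected) source, and we block here until it
             * arrives. */
            tmp = args->hostaddr[offset];
            (void)tmp;
        }
        /* Falling out of the loop means every page is now present. */
        return NULL;
    }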
That all sounds doable; the tricky bit is making sure the destination
copes with the failure well enough to be able to recover: it mustn't exit
or clean up the migration state when the network fails. Also, if a device
tries to access a page of memory, it had better not block the monitor,
since we'll need that to recover.

Could we do it via a file, like that description of VMware's? Again the
tricky bit is the pages that may or may not have been lost in the packet
buffer. If we did a migrate-to-file/snapshot then that would have all of
memory, rather than the nice small chunk that article describes, but we
could recover from a whole migrate-to-file if we did something special to
make the postcopy load from it (like the userfault-driven loadvm work
people have done).

To keep the file small we would have to be smarter. We'd have to include
all the pages that we *know* we haven't sent, but we'd also have to
include pages that might not have been received; e.g. re-send, say, the
last ~20MByte (enough for a few packet buffers?) into that file. Then the
destination would have to run the recovery thread as above, and also only
load pages it was asked for.

One trick to make this easier would be to have something that loaded this
recovery file and pretended to be a source VM; then we could use the
sequence above on the destination side without any changes. A rough
sketch of that follows.
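Very roughly - ignoring the real migration stream framing entirely; the
page_req wire format here is invented - that "pretend to be a source"
piece could be a tiny server that owns the recovery file (a flat image of
guest RAM in this sketch) and answers page requests from it:

    #include <stdint.h>
    #include <unistd.h>

    #define PAGE_SIZE 4096

    struct page_req {               /* made-up wire format */
        uint64_t offset;            /* guest RAM offset */
        uint32_t npages;            /* number of pages wanted */
    };

    static void serve_pages(int conn_fd, int ram_fd)
    {
        struct page_req req;
        char buf[PAGE_SIZE];
        uint32_t i;

        /* Loop until the destination's recovery thread has touched
         * every page and closes the connection. */
        while (read(conn_fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
            for (i = 0; i < req.npages; i++) {
                off_t at = req.offset + (off_t)i * PAGE_SIZE;
                /* As in g) above, it is always safe to serve a page
                 * even if the destination might already have it. */
                if (pread(ram_fd, buf, PAGE_SIZE, at) != PAGE_SIZE) {
                    return;         /* request outside the file */
                }
                if (write(conn_fd, buf, PAGE_SIZE) != PAGE_SIZE) {
                    return;
                }
            }
        }
    }

A real version would of course have to speak the destination's actual
return-path protocol, so that the destination side needs no changes.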
Dave

> > Are there any other prices we need to pay to have postcopy?
>
> The precopy phase in postcopy mode is slower than normal precopy
> migration, but I think that's the only other penalty.
>
> Dave
>
> >
> > -----Original Message-----
> > From: Jason J. Herne [mailto:jjherne@linux.vnet.ibm.com]
> > Sent: 7 January 2016 3:43
> > To: Dr. David Alan Gilbert; Zhangbo (Oscar)
> > Cc: zhouyimin Zhou(Yimin); Zhanghailiang; Yanqiangjun; Huangpeng (Peter); qemu-devel@nongnu.org; Herongguang (Stephen); Linqiangmin; Huangzhichao; Wangyufei (James)
> > Subject: Re: [Qemu-devel] What's the advantages of POSTCOPY over CPU-THROTTLE?
> >
> > On 01/06/2016 04:57 AM, Dr. David Alan Gilbert wrote:
> > > * Zhangbo (Oscar) (oscar.zhangbo@huawei.com) wrote:
> > >> Hi all:
> > >>     Postcopy is suitable for migrating guests that have high page
> > >> dirtying rates. It
> > >> 1. makes the guest run at the destination ASAP;
> > >> 2. makes the downtime of the guest small enough.
> > >>     If we don't take the 1st advantage into account, then its
> > >> benefit seems similar to CPU-THROTTLE: both of them keep the
> > >> guest's downtime small during migration.
> > >>
> > >>     CPU-THROTTLE makes the guest's dirty-page rate *smaller than
> > >> the network bandwidth*, so that to_send_page_number converges
> > >> across iterations and the last iteration achieves a small enough
> > >> downtime.
> > >>     If we adopt POSTCOPY here, the guest's dirty-page rate would
> > >> *become equal to the bandwidth*, because we have to fetch its
> > >> memory from the source side, via the network.
> > >>     Both of them would degrade the guest's performance, which may
> > >> in turn make the downtime larger.
> > >>
> > >>     So here comes the question: if we just compare POSTCOPY with
> > >> CPU-THROTTLE on their ability to decrease downtime, POSTCOPY seems
> > >> to have no advantage over CPU-THROTTLE. Is that right?
> > >>
> > >>     Meanwhile, are there any other benefits of POSTCOPY besides the
> > >> two mentioned above?
> > >
> > > It's a good question, and they do both try to solve the same
> > > problem. One problem with cpu-throttle is whether you can throttle
> > > the CPU enough to get the dirty rate below the rate of the network,
> > > and the answer to that is very workload dependent. On a large,
> > > many-core VM, even a little bit of CPU can dirty a lot of memory.
> > > Postcopy is guaranteed to finish migration, irrespective of the
> > > workload.
> > >
> > > Postcopy is pretty fine-grained, in that only threads that are
> > > accessing pages still on the source are blocked; since it allows
> > > the use of async page faults, it's even finer-grained than the vCPU
> > > level, so many threads come back up to full performance pretty
> > > quickly even if a few pages are left.
> > >
> >
> > Good answer, Dave. FWIW, I completely agree. Using cpu throttling can
> > help the situation, depending on workload. Postcopy will *always*
> > work. One possible side effect of postcopy is loss of the guest if
> > the network connection dies during the postcopy phase of migration.
> > This should be a very rare occurrence, however. So both methods have
> > their uses.
> >
> > --
> > -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK