From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33164)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1aH6eF-00036H-FL
	for qemu-devel@nongnu.org; Thu, 07 Jan 2016 04:14:24 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1aH6eB-0002QP-8y
	for qemu-devel@nongnu.org; Thu, 07 Jan 2016 04:14:23 -0500
Received: from mx1.redhat.com ([209.132.183.28]:44109)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1aH6eB-0002QD-1Y
	for qemu-devel@nongnu.org; Thu, 07 Jan 2016 04:14:19 -0500
Date: Thu, 7 Jan 2016 09:14:13 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20160107091413.GA2519@work-vm>
References: <0259E1C966E8C54AA93AA2B1240828E650F34EC8@szxema507-mbs.china.huawei.com>
	<20160106095731.GB2528@work-vm>
	<568D6E5B.1040206@linux.vnet.ibm.com>
	<0259E1C966E8C54AA93AA2B1240828E650F3536F@szxema507-mbs.china.huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <0259E1C966E8C54AA93AA2B1240828E650F3536F@szxema507-mbs.china.huawei.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] Reply: What's the advantages of POSTCOPY over CPU-THROTTLE?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Zhangbo (Oscar)" <oscar.zhangbo@huawei.com>
Cc: "zhouyimin Zhou(Yimin)" <zhouyimin@huawei.com>, Zhanghailiang <zhang.zhanghailiang@huawei.com>, "Wangyufei (James)" <james.wangyufei@huawei.com>, Yanqiangjun <yanqiangjun@huawei.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "Huangpeng (Peter)" <peter.huangpeng@huawei.com>, Linqiangmin <linqiangmin@huawei.com>, Huangzhichao <huangzhichao@huawei.com>, "jjherne@linux.vnet.ibm.com" <jjherne@linux.vnet.ibm.com>, "Herongguang (Stephen)" <herongguang.he@huawei.com>

* Zhangbo (Oscar) (oscar.zhangbo@huawei.com) wrote:
> Thank you David and Jason!
>
> BTW, I noticed that VMware did work similar to ours, but the situation is a little different:
>     they proposed postcopy (under the name QuickResume) in vSphere 4.1, but they replaced it with SDPS (similar to CPU-THROTTLE) from vSphere 5. Do you know the reason behind this?
>     Reference: https://qianr.wordpress.com/2013/10/14/vmware-vm-live-migration-vmotion/
> It's said that they've already introduced shared storage to avoid losing the guest when the network connection is lost. So what is their concern in dropping QuickResume?

Hmm that's a good summary; I'd not seen any detail of how vmware's
systems worked before.
I'm not exactly sure; but I think that's saying that they store the
outstanding pages that haven't been transferred in a file on disk
so that they could be used to recover the VM later if the network failed.
It's not clear to me from that description if they do that only when the
network fails, or as part of a normal postcopy flow.

> Is there any other price we need to pay to have postcopy?

The precopy phase in postcopy mode is slower than normal precopy migration,
but I think that's the only other penalty.
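
To illustrate the convergence argument quoted below (dirty rate versus
network bandwidth), here is a minimal Python sketch; it is not QEMU code.
The function name precopy_rounds, its parameters and the constant-dirty-rate
model are all assumptions made for illustration. Real guests dirty memory
unevenly, and throttling the CPU does not necessarily reduce the dirty rate
in proportion.

```python
GiB = 2**30


def precopy_rounds(ram_bytes, dirty_rate, bandwidth, max_downtime, max_rounds=30):
    """Simulate iterative precopy under a constant dirty rate (a simplification).

    Rates are in bytes per second, max_downtime is in seconds.
    Returns (history, converged), where history holds the number of bytes
    left to send at the start of each round.
    """
    remaining = float(ram_bytes)
    history = []
    for _ in range(max_rounds):
        history.append(remaining)
        send_time = remaining / bandwidth
        if send_time <= max_downtime:
            # What is left fits in the allowed downtime: stop the guest and finish.
            return history, True
        # While this round was being sent, the guest kept dirtying pages.
        # It cannot dirty more than all of RAM.
        remaining = min(float(ram_bytes), dirty_rate * send_time)
    return history, False


if __name__ == "__main__":
    # Hypothetical numbers: 8 GiB guest, 1 GiB/s link, 300 ms downtime budget.
    for rate in (0.5 * GiB, 2 * GiB):
        history, ok = precopy_rounds(8 * GiB, rate, 1 * GiB, max_downtime=0.3)
        print(f"dirty rate {rate / GiB:.1f} GiB/s: converged={ok}, rounds={len(history)}")
```

With these made-up numbers, a guest dirtying memory at half the link rate
converges after a handful of rounds, while one dirtying at twice the link
rate never gets there within the round limit. That second case is where
cpu-throttle has to pull the dirty rate down, or postcopy has to take over.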

Dave

>
> -----Original Message-----
> From: Jason J. Herne [mailto:jjherne@linux.vnet.ibm.com]
> Sent: January 7, 2016 3:43
> To: Dr. David Alan Gilbert; Zhangbo (Oscar)
> Cc: zhouyimin Zhou(Yimin); Zhanghailiang; Yanqiangjun; Huangpeng (Peter); qemu-devel@nongnu.org; Herongguang (Stephen); Linqiangmin; Huangzhichao; Wangyufei (James)
> Subject: Re: [Qemu-devel] What's the advantages of POSTCOPY over CPU-THROTTLE?
>
> On 01/06/2016 04:57 AM, Dr. David Alan Gilbert wrote:
> > * Zhangbo (Oscar) (oscar.zhangbo@huawei.com) wrote:
> >> Hi all:
> >>      Postcopy is suitable for migrating guests which have large page change rates. It
> >>      1 makes the guest run at the destination ASAP.
> >>      2 makes the downtime of the guest small enough.
> >>      If we don't take the 1st advantage into account, then its benefit seems similar to that of CPU-THROTTLE: both of them make the guest's downtime small during migration.
> >>
> >>      CPU-THROTTLE would make the guest's dirty-page rate *smaller than the network bandwidth*, in order to make the to_send_page_number in each iteration converge and achieve a small-enough downtime during the last iteration.
> >>      If we adopt POST-COPY here, the guest's dirty-page rate would *become equal to the bandwidth*, because we have to fetch its memory from the source side, via the network.
> >>      Both of them would introduce performance degradation in the guest, which may in turn make the downtime larger.
> >>
> >>      So, here comes the question: if we just compare POSTCOPY with CPU-THROTTLE for their advantages in decreasing downtime, POSTCOPY seems to have no advantage over CPU-THROTTLE, is that right?
> >>
> >>      Meanwhile, are there any other benefits of POSTCOPY besides the 2 mentioned above?
> >
> > It's a good question and they do both try and help solve the same problem.
> > One problem with cpu-throttle is whether you can throttle the CPU
> > enough to get the dirty-rate below the rate of the network, and the
> > answer to that is very workload dependent.  On a large, many-core VM,
> > even a little bit of CPU can dirty a lot of memory.  Postcopy is
> > guaranteed to finish migration, irrespective of the workload.
> >
> > Postcopy is pretty fine-grained, in that only threads that are
> > accessing pages that are still on the source are blocked, since it
> > allows the use of async page faults, that means it's even finer
> > grained than the vCPU level, so many threads come back up to full
> > performance pretty quickly even if there are a few pages left.
> >
>
> Good answer Dave. FWIW, I completely agree. Using cpu throttling can help the situation depending on workload. Postcopy will *always* work.
> One possible side effect of Postcopy is loss of the guest if the network connection dies during the postcopy phase of migration. This should be a very rare occurrence however. So both methods have their uses.
>
> --
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK