From: Yang Hongyang
Date: Fri, 31 Jul 2015 09:31:10 +0800
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
To: zhanghailiang, "Dr. David Alan Gilbert"
Cc: Li Zhijian, jan.kiszka@siemens.com, Jason Wang, "Dong, Eddie",
 peter.huangpeng@huawei.com, qemu-devel@nongnu.org, Gonglei,
 stefanha@redhat.com
Message-ID: <55BACFDE.1070004@cn.fujitsu.com>
In-Reply-To: <55BACF20.6000509@huawei.com>

On 07/31/2015 09:28 AM, zhanghailiang wrote:
> On 2015/7/31 9:08, Yang Hongyang wrote:
>> On 07/31/2015 01:53 AM, Dr. David Alan Gilbert wrote:
>>> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>>> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>>>>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>>>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>>>>>> A question here: packet comparison may be very tricky. For
>>>>>>>>>>>>>> example, some protocols use random data to generate an
>>>>>>>>>>>>>> unpredictable id or something else. One example is
>>>>>>>>>>>>>> ipv6_select_ident() in Linux. So COLO needs a mechanism to
>>>>>>>>>>>>>> make sure the PVM and SVM generate the same random data?
>>>>>>>>>>>>> Good question; connections carrying random data are a big
>>>>>>>>>>>>> problem for COLO. At present, they trigger checkpoint
>>>>>>>>>>>>> processing because of the differing random data.
>>>>>>>>>>>>> I don't think any mechanism can assure that two different
>>>>>>>>>>>>> machines generate the same random data. If you have any
>>>>>>>>>>>>> ideas, please tell us :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Frequent checkpointing can handle this scenario, but it may
>>>>>>>>>>>>> hurt performance. :(
>>>>>>>>>>>>>
>>>>>>>>>>>> The assumption is that, after a VM checkpoint, the SVM and PVM
>>>>>>>>>>>> have identical internal state, so the pattern used to generate
>>>>>>>>>>>> random data has a high probability of generating identical
>>>>>>>>>>>> data, at least for a short time...
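(For readers following the archive: the comparison being discussed is a
plain byte-wise compare of the PVM and SVM packet streams, so any
nondeterministic field looks like divergence. A minimal sketch of the
idea, assuming a hypothetical hook name colo_request_checkpoint() rather
than the real colo-proxy interfaces:)

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct pkt {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical hook: ask the migration side for a checkpoint. */
extern void colo_request_checkpoint(void);

/* Returns true if the PVM and SVM packets match and can be released. */
static bool colo_compare_pkt(const struct pkt *pvm, const struct pkt *svm)
{
    if (pvm->len != svm->len ||
        memcmp(pvm->data, svm->data, pvm->len) != 0) {
        /* An IPv6 fragment id from ipv6_select_ident() that differs
         * between PVM and SVM lands here, even though both guests are
         * semantically correct. */
        colo_request_checkpoint();
        return false;
    }
    return true;
}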
>>>>>>>>>>> They do diverge pretty quickly though; I have simple examples
>>>>>>>>>>> which reliably cause a checkpoint because of simple randomness
>>>>>>>>>>> in applications.
>>>>>>>>>>>
>>>>>>>>>>> Dave
>>>>>>>>>>>
>>>>>>>>>> And it will become even worse if a hwrng is used in the guest.
>>>>>>>>>
>>>>>>>>> Yes; it seems quite application dependent. (On IPv4) an ssh
>>>>>>>>> connection, once established, tends to work well without
>>>>>>>>> triggering checkpoints, and static web pages also work well.
>>>>>>>>> Examples of things that do cause more checkpoints are displaying
>>>>>>>>> guest statistics (e.g. running top in that ssh), which is timing
>>>>>>>>> dependent, and dynamically generated web pages that include a
>>>>>>>>> unique ID (bugzilla's password reset link in its front page was
>>>>>>>>> a fun one). I think establishing new encrypted connections also
>>>>>>>>> causes the same randomness.
>>>>>>>>>
>>>>>>>>> However, it's worth remembering that COLO is trying to reduce
>>>>>>>>> the number of checkpoints compared to a simple checkpointing
>>>>>>>>> world, which would be aiming to do a checkpoint ~100 times a
>>>>>>>>> second; for compute-bound workloads, or ones that don't expose
>>>>>>>>> the randomness that much, it can get checkpoints a few seconds
>>>>>>>>> in length, which greatly reduces the overhead.
>>>>>>>>>
>>>>>>>> Yes, that's the truth.
>>>>>>>> We can set two different modes for different scenarios, maybe
>>>>>>>> named 1) frequent checkpoint mode for multi-connection and
>>>>>>>> randomness scenarios, and 2) non-frequent checkpoint mode for
>>>>>>>> other scenarios.
>>>>>>>>
>>>>>>>> But that's the next plan; we are thinking about it.
>>>>>>>
>>>>>>> I have some code that tries to automatically switch between those;
>>>>>>> it measures the checkpoint lengths, and if they're consistently
>>>>>>> short it sends a different message byte to the secondary at the
>>>>>>> start of the checkpoint, so that it doesn't bother running. Every
>>>>>>> so often it then flips back to a COLO checkpoint to see if the
>>>>>>> checkpoints are still really fast.
>>>>>>>
>>>>>> Do you mean that when there are consistently short checkpoint runs,
>>>>>> you don't do a normal checkpoint but just send a special message to
>>>>>> the SVM? And resume common COLO mode once the checkpoint lengths
>>>>>> are no longer so short?
>>>>>
>>>>> We still have to do checkpoints, but we send a special message to
>>>>> the SVM so that the SVM just takes the checkpoint but does not run.
>>>>>
>>>>> I'll send the code after I've updated it to your current version,
>>>>> but it's quite rough/experimental.
>>>>>
>>>>> It works something like this:
>>>>>
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare  < After a few short runs
>>>>>            checkpoint
>>>>> -----------run PVM        SVM idle   \
>>>>> Passive                              | - repeat 'n' times
>>>>> mode       checkpoint                /
>>>>> -----------run PVM        run SVM
>>>>> COLO                                  < Still a short gap
>>>>> mode                      miscompare
>>>>> -----------run PVM        SVM idle   \
>>>>> Passive                              | - repeat 'n' times
>>>>> mode       checkpoint                /
>>>>> -----------run PVM        run SVM
>>>>> COLO                                  < long gap now, stay in COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>>
>>>>> So it saves the CPU time on the SVM, and the comparison traffic, and
>>>>> is automatic at switching into the passive mode.
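(A rough sketch of the adaptive switching described above, for anyone
who wants the shape of it before the patch is posted; the thresholds,
state names, and colo_send_passive_byte() hook are my assumptions, not
Dave's actual code:)

#include <stdbool.h>
#include <stdint.h>

enum colo_mode { COLO_MODE_COMPARE, COLO_MODE_PASSIVE };

#define SHORT_RUN_MS     100  /* a "short" gap between checkpoints */
#define SHORT_RUNS_LIMIT   3  /* consecutive short runs before going passive */
#define PASSIVE_REPEAT     5  /* 'n' passive checkpoints before re-probing */

static enum colo_mode mode = COLO_MODE_COMPARE;
static unsigned short_runs, passive_count;

/* Hypothetical hook: the message byte telling the SVM to take the
 * checkpoint but not run. */
extern void colo_send_passive_byte(void);

/* Called once per checkpoint with the length of the run preceding it. */
static void colo_checkpoint_notify(int64_t run_ms)
{
    if (mode == COLO_MODE_COMPARE) {
        short_runs = (run_ms < SHORT_RUN_MS) ? short_runs + 1 : 0;
        if (short_runs >= SHORT_RUNS_LIMIT) {
            /* SVM idles from here on: saves its CPU time and the
             * comparison traffic. */
            mode = COLO_MODE_PASSIVE;
            passive_count = 0;
            short_runs = 0;
        }
    } else if (++passive_count >= PASSIVE_REPEAT) {
        /* Flip back to a COLO checkpoint to see whether the workload
         * still miscompares quickly. */
        mode = COLO_MODE_COMPARE;
    }
    if (mode == COLO_MODE_PASSIVE) {
        colo_send_passive_byte();
    }
}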
>>>>>
>>>>> It used to be more useful, but the minimum COLO run time that you
>>>>> added a few versions ago helps a lot in the cases where there are
>>>>> miscompares, and the delay after the miscompare before you take the
>>>>> checkpoint also helps in the case where the data is very random.
>>>>
>>>> This is great! This is exactly what we were thinking about: when a
>>>> random-data scenario occurs, fall back to MC/Remus-like FT. Thank
>>>> you very much!
>>>> I have a question: do you also modify the colo-proxy kernel module?
>>>> Because in the fixed checkpoint mode, I think we need to buffer the
>>>> network packets and release them at the checkpoint.
>>>
>>> Yes, we do need to buffer and release them at the end, but I've not
>>> modified colo-proxy so far. Doesn't the current code on the PVM
>>> already need to buffer packets that are generated after the first
>>> miscompare
>>
>> Yes, they are buffered,
>>
>>> and before the checkpoint, and then release them at the checkpoint?
>>
>> but they will be released only if the packet comparison returns
>> identical. So in order to support this fallback mode, we would need to
>> modify it to release the packets at the checkpoint; it won't be too
>> much code though.
>>
> No, when we do a checkpoint, we send all the residual queued packets,
> so it is already supported.

Great, my memory is wrong, sorry...

>>> Dave
>>>
>>>>> Dave
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> -Gonglei
>>>>>>>
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Thanks,
Yang.
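P.S. For anyone skimming the archive: the buffer-and-release behaviour
we settled on above is roughly the following (a sketch only; the queue
layout and the net_release_packet() hook are made up for illustration,
not the colo-proxy module's actual API):

#include <stddef.h>
#include <stdlib.h>

struct pkt_node {
    struct pkt_node *next;
    unsigned char *data;
    size_t len;
};

/* Packets queued on the primary side since the last release point. */
static struct pkt_node *pending_head;

/* Hypothetical hook: actually put a packet on the wire. */
extern void net_release_packet(unsigned char *data, size_t len);

/* Called at checkpoint time: flush every residual queued packet, which
 * is why the fixed-checkpoint fallback mode needs no proxy change. */
static void colo_flush_pending(void)
{
    struct pkt_node *n = pending_head;

    while (n) {
        struct pkt_node *next = n->next;
        net_release_packet(n->data, n->len);
        free(n);
        n = next;
    }
    pending_head = NULL;
}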