From: Yang Hongyang
Date: Fri, 31 Jul 2015 09:31:10 +0800
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
To: zhanghailiang, "Dr. David Alan Gilbert"
Cc: Li Zhijian, jan.kiszka@siemens.com, Jason Wang, "Dong, Eddie",
 peter.huangpeng@huawei.com, qemu-devel@nongnu.org, Gonglei,
 stefanha@redhat.com
Message-ID: <55BACFDE.1070004@cn.fujitsu.com>
In-Reply-To: <55BACF20.6000509@huawei.com>

On 07/31/2015 09:28 AM, zhanghailiang wrote:
> On 2015/7/31 9:08, Yang Hongyang wrote:
>> On 07/31/2015 01:53 AM, Dr. David Alan Gilbert wrote:
>>> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>>> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>>>>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>>>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>>>>>> A question here: packet comparison may be very tricky. For
>>>>>>>>>>>>>> example, some protocols use random data to generate an
>>>>>>>>>>>>>> unpredictable id or something else. One example is
>>>>>>>>>>>>>> ipv6_select_ident() in Linux. So COLO needs a mechanism to
>>>>>>>>>>>>>> make sure the PVM and SVM generate the same random data?
>>>>>>>>>>>>> Good question; connections carrying random data are a big
>>>>>>>>>>>>> problem for COLO. At present, they trigger checkpoint
>>>>>>>>>>>>> processing because of the differing random data.
>>>>>>>>>>>>> I don't think any mechanism can assure that two different
>>>>>>>>>>>>> machines generate the same random data. If you have any
>>>>>>>>>>>>> ideas, please tell us :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Frequent checkpointing can handle this scenario, but it may
>>>>>>>>>>>>> hurt performance. :(
>>>>>>>>>>>>>
>>>>>>>>>>>> The assumption is that, after a VM checkpoint, the SVM and PVM
>>>>>>>>>>>> have identical internal state, so the pattern used to generate
>>>>>>>>>>>> random data has a high probability of generating identical
>>>>>>>>>>>> data, at least for a short time...
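(For readers following the archive: the comparison being discussed is a
plain byte-wise compare of the PVM and SVM packet streams, so any
nondeterministic field looks like divergence. A minimal sketch of the
idea, assuming a hypothetical hook name colo_request_checkpoint() rather
than the real colo-proxy interfaces:)

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct pkt {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical hook: ask the migration side for a checkpoint. */
extern void colo_request_checkpoint(void);

/* Returns true if the PVM and SVM packets match and can be released. */
static bool colo_compare_pkt(const struct pkt *pvm, const struct pkt *svm)
{
    if (pvm->len != svm->len ||
        memcmp(pvm->data, svm->data, pvm->len) != 0) {
        /* An IPv6 fragment id from ipv6_select_ident() that differs
         * between PVM and SVM lands here, even though both guests are
         * semantically correct. */
        colo_request_checkpoint();
        return false;
    }
    return true;
}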
>>>>>>>>>>> They do diverge pretty quickly though; I have simple examples
>>>>>>>>>>> which reliably cause a checkpoint because of simple randomness
>>>>>>>>>>> in applications.
>>>>>>>>>>>
>>>>>>>>>>> Dave
>>>>>>>>>>>
>>>>>>>>>> And it will become even worse if a hwrng is used in the guest.
>>>>>>>>>
>>>>>>>>> Yes; it seems quite application dependent. (On IPv4) an ssh
>>>>>>>>> connection, once established, tends to work well without
>>>>>>>>> triggering checkpoints, and static web pages also work well.
>>>>>>>>> Examples of things that do cause more checkpoints are displaying
>>>>>>>>> guest statistics (e.g. running top in that ssh), which is timing
>>>>>>>>> dependent, and dynamically generated web pages that include a
>>>>>>>>> unique ID (bugzilla's password reset link in its front page was
>>>>>>>>> a fun one). I think establishing new encrypted connections also
>>>>>>>>> causes the same randomness.
>>>>>>>>>
>>>>>>>>> However, it's worth remembering that COLO is trying to reduce
>>>>>>>>> the number of checkpoints compared to a simple checkpointing
>>>>>>>>> world, which would be aiming to do a checkpoint ~100 times a
>>>>>>>>> second; for compute-bound workloads, or ones that don't expose
>>>>>>>>> the randomness that much, it can get checkpoints a few seconds
>>>>>>>>> in length, which greatly reduces the overhead.
>>>>>>>>>
>>>>>>>> Yes, that's the truth.
>>>>>>>> We can set two different modes for different scenarios, maybe
>>>>>>>> named 1) frequent checkpoint mode for multi-connection and
>>>>>>>> randomness scenarios, and 2) non-frequent checkpoint mode for
>>>>>>>> other scenarios.
>>>>>>>>
>>>>>>>> But that's the next plan; we are thinking about it.
>>>>>>>
>>>>>>> I have some code that tries to automatically switch between those;
>>>>>>> it measures the checkpoint lengths, and if they're consistently
>>>>>>> short it sends a different message byte to the secondary at the
>>>>>>> start of the checkpoint, so that it doesn't bother running. Every
>>>>>>> so often it then flips back to a COLO checkpoint to see if the
>>>>>>> checkpoints are still really fast.
>>>>>>>
>>>>>> Do you mean that when there are consistently short checkpoint runs,
>>>>>> you don't do a normal checkpoint but just send a special message to
>>>>>> the SVM? And resume common COLO mode once the checkpoint lengths
>>>>>> are no longer so short?
>>>>>
>>>>> We still have to do checkpoints, but we send a special message to
>>>>> the SVM so that the SVM just takes the checkpoint but does not run.
>>>>>
>>>>> I'll send the code after I've updated it to your current version,
>>>>> but it's quite rough/experimental.
>>>>>
>>>>> It works something like this:
>>>>>
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare  < After a few short runs
>>>>>            checkpoint
>>>>> -----------run PVM        SVM idle   \
>>>>> Passive                              | - repeat 'n' times
>>>>> mode       checkpoint                /
>>>>> -----------run PVM        run SVM
>>>>> COLO                                  < Still a short gap
>>>>> mode                      miscompare
>>>>> -----------run PVM        SVM idle   \
>>>>> Passive                              | - repeat 'n' times
>>>>> mode       checkpoint                /
>>>>> -----------run PVM        run SVM
>>>>> COLO                                  < long gap now, stay in COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>> -----------run PVM        run SVM
>>>>> COLO
>>>>> mode                      miscompare
>>>>>            checkpoint
>>>>>
>>>>> So it saves the CPU time on the SVM, and the comparison traffic, and
>>>>> is automatic at switching into the passive mode.
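(A rough sketch of the adaptive switching described above, for anyone
who wants the shape of it before the patch is posted; the thresholds,
state names, and colo_send_passive_byte() hook are my assumptions, not
Dave's actual code:)

#include <stdbool.h>
#include <stdint.h>

enum colo_mode { COLO_MODE_COMPARE, COLO_MODE_PASSIVE };

#define SHORT_RUN_MS     100  /* a "short" gap between checkpoints */
#define SHORT_RUNS_LIMIT   3  /* consecutive short runs before going passive */
#define PASSIVE_REPEAT     5  /* 'n' passive checkpoints before re-probing */

static enum colo_mode mode = COLO_MODE_COMPARE;
static unsigned short_runs, passive_count;

/* Hypothetical hook: the message byte telling the SVM to take the
 * checkpoint but not run. */
extern void colo_send_passive_byte(void);

/* Called once per checkpoint with the length of the run preceding it. */
static void colo_checkpoint_notify(int64_t run_ms)
{
    if (mode == COLO_MODE_COMPARE) {
        short_runs = (run_ms < SHORT_RUN_MS) ? short_runs + 1 : 0;
        if (short_runs >= SHORT_RUNS_LIMIT) {
            /* SVM idles from here on: saves its CPU time and the
             * comparison traffic. */
            mode = COLO_MODE_PASSIVE;
            passive_count = 0;
            short_runs = 0;
        }
    } else if (++passive_count >= PASSIVE_REPEAT) {
        /* Flip back to a COLO checkpoint to see whether the workload
         * still miscompares quickly. */
        mode = COLO_MODE_COMPARE;
    }
    if (mode == COLO_MODE_PASSIVE) {
        colo_send_passive_byte();
    }
}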
>>>>>
>>>>> It used to be more useful, but the minimum COLO run time that you
>>>>> added a few versions ago helps a lot in the cases where there are
>>>>> miscompares, and the delay after the miscompare before you take the
>>>>> checkpoint also helps in the case where the data is very random.
>>>>
>>>> This is great! This is exactly what we were thinking about: when a
>>>> random-data scenario occurs, fall back to MC/Remus-like FT. Thank
>>>> you very much!
>>>> I have a question: do you also modify the colo-proxy kernel module?
>>>> Because in the fixed checkpoint mode, I think we need to buffer the
>>>> network packets and release them at the checkpoint.
>>>
>>> Yes, we do need to buffer and release them at the end, but I've not
>>> modified colo-proxy so far. Doesn't the current code on the PVM
>>> already need to buffer packets that are generated after the first
>>> miscompare
>>
>> Yes, they are buffered,
>>
>>> and before the checkpoint, and then release them at the checkpoint?
>>
>> but they will be released only if the packet comparison returns
>> identical. So in order to support this fallback mode, we would need to
>> modify it to release the packets at the checkpoint; it won't be too
>> much code though.
>>
> No, when we do a checkpoint, we send all the residual queued packets,
> so it is already supported.

Great, my memory is wrong, sorry...

>>> Dave
>>>
>>>>> Dave
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> -Gonglei
>>>>>>>
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

-- 
Thanks,
Yang.
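P.S. For anyone skimming the archive: the buffer-and-release behaviour
we settled on above is roughly the following (a sketch only; the queue
layout and the net_release_packet() hook are made up for illustration,
not the colo-proxy module's actual API):

#include <stddef.h>
#include <stdlib.h>

struct pkt_node {
    struct pkt_node *next;
    unsigned char *data;
    size_t len;
};

/* Packets queued on the primary side since the last release point. */
static struct pkt_node *pending_head;

/* Hypothetical hook: actually put a packet on the wire. */
extern void net_release_packet(unsigned char *data, size_t len);

/* Called at checkpoint time: flush every residual queued packet, which
 * is why the fixed-checkpoint fallback mode needs no proxy change. */
static void colo_flush_pending(void)
{
    struct pkt_node *n = pending_head;

    while (n) {
        struct pkt_node *next = n->next;
        net_release_packet(n->data, n->len);
        free(n);
        n = next;
    }
    pending_head = NULL;
}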