Date: Thu, 30 Jul 2015 12:56:40 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150730115640.GC2250@work-vm>
References: <55AC9859.3050100@cn.fujitsu.com> <55B9A6C6.6010008@redhat.com> <55B9CF3F.2060202@huawei.com> <20150730080339.GA2250@work-vm> <55B9DD38.2030706@redhat.com>
In-Reply-To: <55B9DD38.2030706@redhat.com>
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
To: Jason Wang
Cc: zhanghailiang, Li Zhijian, "jan.kiszka@siemens.com", "Dong, Eddie", "qemu-devel@nongnu.org", "peter.huangpeng", Gonglei, "stefanha@redhat.com", Yang Hongyang

* Jason Wang (jasowang@redhat.com) wrote:
>
>
> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> > * Dong, Eddie (eddie.dong@intel.com) wrote:
> >>>> A question here, the packet comparing may be very tricky. For example,
> >>>> some protocol use random data to generate unpredictable id or
> >>>> something else. One example is ipv6_select_ident() in Linux. So COLO
> >>>> needs a mechanism to make sure PVM and SVM can generate same random
> >>> data?
> >>> Good question, the random data connection is a big problem for COLO. At
> >>> present, it will trigger checkpoint processing because of the different random
> >>> data.
> >>> I don't think any mechanisms can assure two different machines generate the
> >>> same random data. If you have any ideas, pls tell us :)
> >>>
> >>> Frequent checkpoint can handle this scenario, but maybe will cause the
> >>> performance poor. :(
> >>>
> >> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
> > They do diverge pretty quickly though; I have simple examples which
> > reliably cause a checkpoint because of simple randomness in applications.
> >
> > Dave
> >
>
> And it will become even worse if hwrng is used in guest.

Yes; it seems quite application dependent. On IPv4, an ssh connection,
once established, tends to work well without triggering checkpoints,
and static web pages also work well. Examples of things that do cause
more checkpoints are displaying guest statistics (e.g. running top in
that ssh), which is timing dependent, and dynamically generated web
pages that include a unique ID (bugzilla's password reset link on its
front page was a fun one). I think establishing new encrypted
connections also causes the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world, which
would be aiming to do a checkpoint ~100 times a second. For
compute-bound workloads, or ones that don't expose the randomness that
much, it can get checkpoint intervals of a few seconds, which greatly
reduces the overhead.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
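
As a rough illustration of the behaviour discussed in this thread, here
is a toy model in Python. It is not COLO's code: the Guest class, the
respond() and run() helpers and the "static"/"dynamic" request names are
all hypothetical, and a per-guest PRNG stands in for whatever entropy
(hwrng, timing) the two VMs do not share. It only shows why identical
post-checkpoint state gives identical output for static content, while
a unique ID drawn from unshared entropy makes the outputs differ and
forces a checkpoint.

```python
import random


class Guest:
    """Toy stand-in for a VM (hypothetical, not a QEMU/COLO type)."""

    def __init__(self, entropy_seed):
        # State that a checkpoint would copy from PVM to SVM.
        self.counter = 0
        # Models hwrng/timing-derived randomness; never synced here.
        self.entropy = random.Random(entropy_seed)

    def respond(self, request):
        self.counter += 1
        if request == "static":
            # Output depends only on synced state.
            return f"page#{self.counter}".encode()
        # Dynamic page embeds a unique ID drawn from local entropy.
        unique_id = self.entropy.getrandbits(32)
        return f"page#{self.counter} id={unique_id:08x}".encode()


def run(requests, pvm, svm):
    """Feed the same requests to both guests; count forced checkpoints."""
    checkpoints = 0
    for req in requests:
        out_p = pvm.respond(req)
        out_s = svm.respond(req)
        if out_p != out_s:
            # Outputs differ: resync the SVM from the PVM (a "checkpoint").
            svm.counter = pvm.counter
            checkpoints += 1
    return checkpoints


# Different seeds model two machines with independent entropy sources.
static_cp = run(["static"] * 5, Guest(1), Guest(2))
dynamic_cp = run(["dynamic"] * 5, Guest(1), Guest(2))
print("static checkpoints:", static_cp)
print("dynamic checkpoints:", dynamic_cp)
```

In this model the static run never needs a checkpoint, while the dynamic
run needs one as soon as the two unique IDs differ. The real proxy
compares network packets rather than whole responses, so treat this
purely as a sketch of the idea.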