From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55088) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alEsn-000585-F2 for qemu-devel@nongnu.org; Wed, 30 Mar 2016 08:06:03 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1alEsh-00034h-De for qemu-devel@nongnu.org; Wed, 30 Mar 2016 08:05:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:54507) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alEsh-00034d-6R for qemu-devel@nongnu.org; Wed, 30 Mar 2016 08:05:51 -0400 Date: Wed, 30 Mar 2016 13:05:45 +0100 From: "Dr. David Alan Gilbert" Message-ID: <20160330120544.GB7580@work-vm> References: <1459326950-17708-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1459326950-17708-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH V2 0/3] Introduce COLO-compare List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Zhang Chen Cc: Li Zhijian , Gui jianfeng , Jason Wang , "eddie.dong" , qemu devel , Yang Hongyang , zhanghailiang * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote: > COLO-compare is a part of COLO project. It is used > to compare the network package to help COLO decide > whether to do checkpoint. Hi Zhang Chen, I've put comments on the individual patches, but some more general things: 1) Please add a coment giving the example of the command line for the primary and secondary use of this module - it helps make it easier to understand the patches. 2) There's no tracing in here - please add some; I found when I tried to get COLO working I needed to use lots of tracing and debugging to understand the packet flow. 3) Add comments; e.g. for each function say which thread is using it and where the packets are coming from; e.g. called from the main thread on the primary for packets arriving over the socket from the secondary. There's just so many packets going in so many directions it would make it easier to follow. 4) A more fundamental problem is what happens if the secondary never sends anything on the socket, the result is you end up running until the end of the long COLO checkpoint without triggering a discompare - in my world I added a timeout (400ms) for an unmatched packet from the primary, where if no matching packet was received a checkpoint would be triggered. 5) I see the packet comparison is still the simple memcmpy that you had in December; are you planning on doing anything more complicated; you must be seing most packets miscompare? You can see my current world at; https://github.com/orbitfp7/qemu/commits/orbit-wp4-colo-mar16 which has my basic TCP comparison (it's only tracking incoming connections) and I know it's not complete either. It mostly works OK, although I've got an occasional seg (which makes me wonder if I need to add the conn_list_lock I see you added). I'm also not doing any TCP reassembly which is probably needed. Dave > v2: > - add jhash.h > > v1: > - initial patch > > > Zhang Chen (3): > colo-compare: introduce colo compare initlization > colo-compare: track connection and enqueue packet > colo-compare: introduce packet comparison thread > > include/qemu/jhash.h | 59 ++++ > net/Makefile.objs | 1 + > net/colo-compare.c | 782 +++++++++++++++++++++++++++++++++++++++++++++++++++ > vl.c | 3 +- > 4 files changed, 844 insertions(+), 1 deletion(-) > create mode 100644 include/qemu/jhash.h > create mode 100644 net/colo-compare.c > > -- > 1.9.1 > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK