From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44331) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alo4x-0005El-Tj for qemu-devel@nongnu.org; Thu, 31 Mar 2016 21:40:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1alo4t-0000iI-Jl for qemu-devel@nongnu.org; Thu, 31 Mar 2016 21:40:51 -0400 Received: from [59.151.112.132] (port=47430 helo=heian.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alo4p-0000fZ-5w for qemu-devel@nongnu.org; Thu, 31 Mar 2016 21:40:47 -0400 References: <1459326950-17708-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> <20160330120544.GB7580@work-vm> <56FC92ED.8000401@cn.fujitsu.com> <20160331094305.GE2265@work-vm> From: Li Zhijian Message-ID: <56FDD184.9060201@cn.fujitsu.com> Date: Fri, 1 Apr 2016 09:40:20 +0800 MIME-Version: 1.0 In-Reply-To: <20160331094305.GE2265@work-vm> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH V2 0/3] Introduce COLO-compare List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: Zhang Chen , Gui jianfeng , Jason Wang , "eddie.dong" , qemu devel , Yang Hongyang , zhanghailiang On 03/31/2016 05:43 PM, Dr. David Alan Gilbert wrote: > * Li Zhijian (lizhijian@cn.fujitsu.com) wrote: >> >> >> On 03/30/2016 08:05 PM, Dr. David Alan Gilbert wrote: >>> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote: >>>> COLO-compare is a part of COLO project. It is used >>>> to compare the network package to help COLO decide >>>> whether to do checkpoint. >>> >>> Hi Zhang Chen, >>> I've put comments on the individual patches, but some more general things: >>> >>> 1) Please add a coment giving the example of the command line for the primary >>> and secondary use of this module - it helps make it easier to understand the patches. >>> >>> 2) There's no tracing in here - please add some; I found when I tried to get >>> COLO working I needed to use lots of tracing and debugging to understand the >>> packet flow. >>> >>> 3) Add comments; e.g. for each function say which thread is using it and where >>> the packets are coming from; e.g. >>> called from the main thread on the primary for packets arriving over the socket >>> from the secondary. >>> >>> There's just so many packets going in so many directions it would make it >>> easier to follow. >>> >>> 4) A more fundamental problem is what happens if the secondary never sends anything >>> on the socket, the result is you end up running until the end of the long COLO >>> checkpoint without triggering a discompare - in my world I added a timeout (400ms) >>> for an unmatched packet from the primary, where if no matching packet was received >>> a checkpoint would be triggered. >>> >>> 5) I see the packet comparison is still the simple memcmpy that you had in December; >>> are you planning on doing anything more complicated; you must be seing most packets >>> miscompare? >>> >>> You can see my current world at; https://github.com/orbitfp7/qemu/commits/orbit-wp4-colo-mar16 >>> which has my basic TCP comparison (it's only tracking incoming connections) and I know it's >>> not complete either. It mostly works OK, although I've got an occasional seg >>> (which makes me wonder if I need to add the conn_list_lock I see you added). I'm also >>> not doing any TCP reassembly which is probably needed. >>> >> Thank you very much for your comments. >> I just see you tree, you put in a lot of work(tcp comparison enhance, sequence/acknowledge >> number re-write, timeout...) >> >> Actually, this compare module is just in a RFC stage(only including compare frame), there are >> many works to be done: >> >> 1) Integrate to COLO frame(and Let COLO primary and secondary at running state) > > Yes; although I think you've had most of the code for that, also feel free to use > any of the bits I've changed. Thank you, we will add this part to next version. > >> 2) ip segment defrag > > Yes, this seems the hardest bit to me. It is needed for some workloads; for example > a simple case if just an ssh connection, the differences in timing between the primary > and secondary you see that the primary might send two short packets and the secondary > might send one long packet with the same data. OK, let it as a TODO issue > >> 3) comparison base on the sequence number(tcp and udp) if packet has >> Because tcp re-transmission is quit common. IRC, your code will compare the whole tcp >> packet(sequence number will be compare) > > Yes; once the secondary reworks the sequence number I found the seuence number > matched most of the time on the primary. well, let (3) (4) (5) as a TODO issue too or move it to an extra patchset(depending on 3) > One problem (depending on the traffic) > is that the ack number might not match and I'm not sure of the best fix, for example, > consider: > > a) Send packet 1 > b) Send packet 2 > c) Receive response > > The order of b & c is random - if the response on the same socket arrives (c) before (b) > then the ack number in packet 2 is different; on one workload this caused a lot of miscompares > but only if the timing is just wrong. Yes, I've had this problem too. the logic is a bit complicated, as kernel colo-proxy, we compare the tcp playload(excluding sequence/ack number) only. And introduce the 'max_ack'(max_ack = MAX(primary_max_ack, secondary_max_ack)) to guarantee the packet which ack is <= 'max_ack' can be release to client. > > (I also had to turn off TCP timestamping to get useful comparisons) Yes, it works. > >> 4) packet belongs to the same connection is sort by sequence number >> >> 5) Out-Of-Oder packet handle > > I think 4 & 5 both happen as part of the defrag; out of order packets were a problem for > me on some of the workloads; I had to turn off multiqueue in one case. > >> 6) cleanup the un-active conn_list which maybe closed. the simple way is to introduce a >> timer to record whether a connection have packet come within a timeout, connection gone >> beyond this timeout should be cleanup. > > At the moment that isn't a big problem because when you receive the next checkpoint you > can flush the list. > The only case where you have to deal with this is for continuous failover when the > original secondary is promoted to primary, then it's connection list has to live > on longer for any connections it created prior to the failover. > Perhaps this gets more complex with defrag? > Yes, that need comparison support save/restore operation. move to TODO list. >> 7) Dave point out above (4) It make sense, we will consider add this one. >> >> 8) something I miss... >> >> For Various reasons, not all the works can be done immediately, So we hope to discuss and >> decide which function have the high priority. >> Any comments and suggestions are welcome. > > Yes, there's a lot of work; as I say, feel free to use any of my patches > from the world above, I wasn't planning on doing much more work on that set. > >> IMO, a compare frame and a COLO frame hack patch could be simple enough. > > I think you'd have to show that you got some useful comparison matches; > if it almost always failed the comparison then I can't see the point. > In a word, we will enhance the comparison and try to add (1) and (7) in next version. PS, we will pick code from you tree. ^_^ Thanks Li Zhijian > Dave > >> >> Thanks >> Li >> >>> Dave >>> >>>> v2: >>>> - add jhash.h >>>> >>>> v1: >>>> - initial patch >>>> >>>> >>>> Zhang Chen (3): >>>> colo-compare: introduce colo compare initlization >>>> colo-compare: track connection and enqueue packet >>>> colo-compare: introduce packet comparison thread >>>> >>>> include/qemu/jhash.h | 59 ++++ >>>> net/Makefile.objs | 1 + >>>> net/colo-compare.c | 782 +++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> vl.c | 3 +- >>>> 4 files changed, 844 insertions(+), 1 deletion(-) >>>> create mode 100644 include/qemu/jhash.h >>>> create mode 100644 net/colo-compare.c >>>> >>>> -- >>>> 1.9.1 >>>> >>>> >>>> >>> -- >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >>> >>> >>> . >>> >> >> -- >> Best regards. >> Li Zhijian (8555) >> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > . >