From: Jason Wang
Date: Fri, 22 Jan 2016 11:15:54 +0800
Subject: Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
To: Wen Congyang, Zhang Chen, qemu devel
Cc: zhanghailiang, Li Zhijian, Gui jianfeng, "eddie.dong", "Dr. David Alan Gilbert", Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka@siemens.com, Yang Hongyang
Message-ID: <56A19EEA.4000700@redhat.com>
In-Reply-To: <569F61D7.3060502@cn.fujitsu.com>

On 01/20/2016 06:30 PM, Wen Congyang wrote:
> On 01/20/2016 06:19 PM, Jason Wang wrote:
>> >
>> >
>> > On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>> >> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>> >>>
>>>> >>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>> >>>>> Sure.
>>>>>> >>>>>
>>>>>> >>>>> Two main comments/suggestions:
>>>>>> >>>>>
>>>>>> >>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>> >>>>> (or another version of RFC) to me for a better understanding of the
>>>>>> >>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>> >>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>> >>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>> >>>>>
>>>>>> >>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>> >>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>> >>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>> >>>>> reused by other kinds of dataplane.
>>>>>> >>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>> >>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>> >>>>>
>>>>>> >>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>> >>>>> - mirror ingress traffic to secondary node
>>>>>> >>>>> - mirror egress traffic to packet comparing thread
>>>>>> >>>>>
>>>>>> >>>>> And in secondary node, you need two filters:
>>>>>> >>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>> >>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>> >>>>> traffic, and redirect egress traffic to the socket which could be
>>>>>> >>>>> polled by remote packet comparing thread.
>>>>>> >>>>> Thoughts?
>>>>>> >>>>>
>>>>>> >>>>> Thanks
>>>>>> >>>>>
>>>>>>> >>>>>> Thanks
>>>>>>> >>>>>> zhangchen
>>>>> >>>>
>>>>> >>>> Hi, Jason.
>>>>> >>>> We consider your suggestion to split/decouple
>>>>> >>>> the reusable parts from the codes.
>>>>> >>>> Due to filter plugin are traversed one by one in order
>>>>> >>>> we will split colo-proxy to three filters in each side.
>>>>> >>>>
>>>>> >>>> But in this plan, primary and secondary both have socket
>>>>> >>>> server, startup is a problem.
>>>> >>> I believe this issue could be solved by reusing socket chardev.
>>>> >>>
>>>>> >>>>
>>>>> >>>> [ASCII diagram; layout lost in extraction. Recoverable labels:]
>>>>> >>>> Primary qemu: guest, on top of a netfilter chain (filter execute
>>>>> >>>> order, left to right): mirror client (tx), redirect server (rx),
>>>>> >>>> compare (rx); redirect server feeds compare. Below the chain: tap,
>>>>> >>>> with "guest receive" and "guest send" arrows.
>>>>> >>>> Secondary qemu: guest, on top of a netfilter chain (filter execute
>>>>> >>>> order, left to right): mirror server (tx), TCP rewriter with
>>>>> >>>> "adjust ack" and "adjust seq" halves (all), redirect client (rx).
>>>>> >>>> A link runs between the primary and secondary filter chains.
>>>>> >>>>
>>>>> >>>> NOTE: filter direction is rx/tx/all
>>>>> >>>> rx: receive packets sent to the netdev
>>>>> >>>> tx: receive packets sent by the netdev
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>> >>> I still like to decouple comparer from netfilter. It have two obvious
>>>> >>> advantages:
>>>> >>>
>>>> >>> - make it can be reused by other dataplane (e.g vhost)
>>>> >>> - secondary redirector could redirect rx to comparer on primary node
>>>> >>> directly which simplify the design.
>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> guest recv packet route
>>>>> >>>>
>>>>> >>>> primary
>>>>> >>>>   tap --> mirror client filter
>>>>> >>>>     mirror client will send packet to guest, at the
>>>>> >>>>     same time, copy and forward packet to secondary
>>>>> >>>>     mirror server.
>>>>> >>>>
>>>>> >>>> secondary
>>>>> >>>>   mirror server filter --> TCP rewriter
>>>>> >>>>     if recv packet is TCP packet, we will adjust ack
>>>>> >>>>     and update TCP checksum, then send to secondary
>>>>> >>>>     guest. else directly send to guest.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> guest send packet route
>>>>> >>>>
>>>>> >>>> primary
>>>>> >>>>   guest --> redirect server filter
>>>>> >>>>     redirect server filter recv primary guest packet
>>>>> >>>>     but do nothing, just pass to next filter.
>>>>> >>>>
>>>>> >>>>   redirect server filter --> compare filter
>>>>> >>>>     compare filter recv primary guest packet then
>>>>> >>>>     waiting secondary redirect packet to compare it.
>>>>> >>>>     if packet same, send primary packet and clear secondary
>>>>> >>>>     packet, else send primary packet and do
>>>>> >>>>     checkpoint.
>>>>> >>>>
>>>>> >>>> secondary
>>>>> >>>>   guest --> TCP rewriter filter
>>>>> >>>>     if the packet is TCP packet, we will adjust seq
>>>>> >>>>     and update TCP checksum. then send it to
>>>>> >>>>     redirect client filter. else directly send to
>>>>> >>>>     redirect client filter.
>>>>> >>>>
>>>>> >>>>   redirect client filter --> redirect server filter
>>>>> >>>>     forward packet to primary
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> In failover scene (primary is down), the TCP rewriter will keep
>>>>> >>>> servicing
>>>>> >>>> for the TCP connection which is established after the last checkpoint.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> How about this plan?
>>>> >>> Sounds good.
>>>> >>>
>>>> >>> And there's indeed no need to differ client/server by reusing the socket
>>>> >>> chardev. E.g:
>>>> >>>
>>>> >>> In primary node:
>>>> >>>
>>>> >>> ...
>>>> >>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>> >>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>> >>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>> >>> -netdev tap,id=hn0
>>>> >>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>> >>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>> >> Why mirrorer has indev?
>> >
>> >
>> > As I said in the previous mails. I would like to decouple packet
>> > comparing from netfilter. You've already done most of this since the
>> > comparing is done in an independent thread. So the indev here is to
>> > mirror the packet sent by guest to the packet comparing thread.
>> >
>>> >> I think we can use traffic-redirector to do it.
>>> >> The command line is:
>>> >> -netdev tap,id=hn0
>>> >> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>> >> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>> >> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>> >> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>> >> out the packet.
>>> >>
>>> >> Also, we can merge the socketdev comparer1 and mirrorer0.
>> >
>> > It depends on whether or not packet comparing was done in a net filter
>> > (which I prefer not).
> I mean that: packet comparing is done in a thread, not a net filter.
> The flow of the packet sent from guest:
> 1. traffic-redirector: we will redirect the packet to comparer0, the next
> filter will never see it.
> 2. comparing thread: read it from socket chardev comparer0
> 3. call qemu_net_queue_send_iov() to send it back to the netdev.

Ok, looks like I missed something. My suggestion tries its best to keep
packet comparing from being tied to a filter or a netdev, but your
suggestion still needs it to be coupled with a netdev. Are there any
advantages to doing this (or is there a reason the packet must be sent
to the netdev after comparing)? If not, why not just mirror (duplicate
the packet and forward it to a chardev, and pass the original packet to
the next filter or netdev)? Also, calling qemu_net_queue_send_iov() on a
netdev from another thread may need some synchronization with the
iothread.

>
> Thanks
> Wen Congyang
>
>> >
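
A minimal sketch of the mirror versus redirect distinction discussed above. This is not QEMU code: `Filter`, `Mirror`, `Redirector`, and `run_chain` are hypothetical names chosen for illustration, and plain Python callables stand in for the socket chardevs and the netdev. It only shows the behavior described in the thread: a mirror copies the packet to its outdev and lets the original continue, while a redirector hands the packet to its outdev and the next filter never sees it.

```python
from typing import Callable, List

Packet = bytes
Sink = Callable[[Packet], None]  # stand-in for a chardev or a netdev


class Filter:
    """Hypothetical filter interface, for illustration only."""

    def handle(self, packet: Packet) -> bool:
        """Return True to pass the packet on, False to consume it."""
        raise NotImplementedError


class Mirror(Filter):
    """Copy each packet to outdev, then let the original continue."""

    def __init__(self, outdev: Sink) -> None:
        self.outdev = outdev

    def handle(self, packet: Packet) -> bool:
        self.outdev(bytes(packet))  # duplicate goes to the chardev stand-in
        return True                 # original continues down the chain


class Redirector(Filter):
    """Send each packet to outdev only; the chain stops here."""

    def __init__(self, outdev: Sink) -> None:
        self.outdev = outdev

    def handle(self, packet: Packet) -> bool:
        self.outdev(packet)
        return False                # later filters and the netdev never see it


def run_chain(filters: List[Filter], packet: Packet, netdev: Sink) -> None:
    """Pass a packet through the filters in order, then to the netdev."""
    for f in filters:
        if not f.handle(packet):
            return
    netdev(packet)


if __name__ == "__main__":
    comparer0: List[Packet] = []
    wire: List[Packet] = []

    run_chain([Mirror(comparer0.append)], b"pkt", wire.append)
    print("mirror:  ", comparer0, wire)

    comparer0.clear()
    wire.clear()
    run_chain([Redirector(comparer0.append)], b"pkt", wire.append)
    print("redirect:", comparer0, wire)
```

With the mirror, the comparing side and the netdev each receive the packet, so the comparer does not have to send anything back. With the redirector, only the comparing side receives it, which is why the redirect variant needs a step that returns the packet to the netdev afterwards.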