From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48834)
 by lists.gnu.org with esmtp (Exim 4.71) id 1aMUTw-0005et-IF
 for qemu-devel@nongnu.org; Fri, 22 Jan 2016 00:42:02 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 id 1aMUTr-0004SY-Gi
 for qemu-devel@nongnu.org; Fri, 22 Jan 2016 00:42:00 -0500
Received: from mx1.redhat.com ([209.132.183.28]:44545)
 by eggs.gnu.org with esmtp (Exim 4.71) id 1aMUTr-0004SU-7x
 for qemu-devel@nongnu.org; Fri, 22 Jan 2016 00:41:55 -0500
References: <1450780978-19123-1-git-send-email-zhangchen.fnst@cn.fujitsu.com>
 <568494B8.4080105@redhat.com> <5684E9EB.3070002@cn.fujitsu.com>
 <568A0527.9040001@redhat.com> <568A2A5F.3090608@cn.fujitsu.com>
 <568A3F80.8000806@redhat.com> <568A54C2.8050300@cn.fujitsu.com>
 <568CA327.4020103@redhat.com> <569C8EB7.3060507@cn.fujitsu.com>
 <569CB08F.4030607@redhat.com> <569EFF25.2020804@cn.fujitsu.com>
 <569F2F27.9000806@redhat.com> <569F5AFF.2050302@cn.fujitsu.com>
 <569F5F43.5030807@redhat.com> <569F61D7.3060502@cn.fujitsu.com>
 <56A19EEA.4000700@redhat.com> <56A1A1CA.8020008@cn.fujitsu.com>
From: Jason Wang
Message-ID: <56A1C112.3060402@redhat.com>
Date: Fri, 22 Jan 2016 13:41:38 +0800
MIME-Version: 1.0
In-Reply-To: <56A1A1CA.8020008@cn.fujitsu.com>
Content-Type: text/plain; charset=utf-8
Subject: Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
To: Wen Congyang, Zhang Chen, qemu devel
Cc: zhanghailiang, Li Zhijian, Gui jianfeng, "eddie.dong",
 "Dr. David Alan Gilbert", Huang peng, Gong lei, Stefan Hajnoczi,
 jan.kiszka@siemens.com, Yang Hongyang

On 01/22/2016 11:28 AM, Wen Congyang wrote:
> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>
>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>
>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - TCP analysis is missing in the current version; maybe you
>>>>>>>>>>>>> could point me to a git tree (or another version of the RFC)
>>>>>>>>>>>>> for a better understanding of the design. (Just a skeleton
>>>>>>>>>>>>> for TCP should be sufficient to discuss.)
>>>>>>>>>>>>> - I prefer to make the code as reusable as possible, so it's
>>>>>>>>>>>>> better to split/decouple the reusable parts from the code.
>>>>>>>>>>>>> A vague idea is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've
>>>>>>>>>>>>> achieved this 99% since the work is already done in a thread.
>>>>>>>>>>>>> Just let the thread poll sockets directly; then the comparing
>>>>>>>>>>>>> can be reused by other kinds of dataplane.
>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as a filter.
>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, in the primary node, you need just a traffic mirror,
>>>>>>>>>>>>> which does:
>>>>>>>>>>>>> - mirror ingress traffic to the secondary node
>>>>>>>>>>>>> - mirror egress traffic to the packet comparing thread
>>>>>>>>>>>>>
>>>>>>>>>>>>> And in the secondary node, you need two filters:
>>>>>>>>>>>>> - A TCP seq rewriter which adjusts the TCP sequence number.
>>>>>>>>>>>>> - A traffic redirector which redirects packets from a socket
>>>>>>>>>>>>> as ingress traffic, and redirects egress traffic to the
>>>>>>>>>>>>> socket which could be polled by the remote packet comparing
>>>>>>>>>>>>> thread.
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>> We considered your suggestion to split/decouple
>>>>>>>>>>> the reusable parts from the code.
>>>>>>>>>>> Because filter plugins are traversed one by one in order,
>>>>>>>>>>> we will split colo-proxy into three filters on each side.
>>>>>>>>>>>
>>>>>>>>>>> But in this plan, primary and secondary both have a socket
>>>>>>>>>>> server, so startup is a problem.
>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>
>>>>>>>>>>> [ASCII diagram; its column layout did not survive archiving.
>>>>>>>>>>> The labels that can still be read are listed below.]
>>>>>>>>>>>
>>>>>>>>>>> Primary qemu: guest, above a netfilter layer.
>>>>>>>>>>> Filter execute order (left to right):
>>>>>>>>>>>   mirror client (tx), redirect server (rx) --> compare (rx)
>>>>>>>>>>>
>>>>>>>>>>> Secondary qemu: guest, above a netfilter layer.
>>>>>>>>>>> Filter execute order (left to right):
>>>>>>>>>>>   mirror server (tx) --> TCP rewriter, adjust ack / adjust seq (all)
>>>>>>>>>>>   --> redirect client (rx)
>>>>>>>>>>>
>>>>>>>>>>> Below the primary: tap, linked to it by "guest receive" and
>>>>>>>>>>> "guest send".
>>>>>>>>>>>
>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>> rx: receive packets sent to the netdev
>>>>>>>>>>> tx: receive packets sent by the netdev
>>>>>>>>>>>
>>>>>>>>> I would still like to decouple the comparer from netfilter. It has
>>>>>>>>> two obvious advantages:
>>>>>>>>>
>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>> - the secondary redirector could redirect rx to the comparer on the
>>>>>>>>> primary node directly, which simplifies the design.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>> mirror client will send the packet to the guest and, at
>>>>>>>>>>> the same time, copy and forward the packet to the
>>>>>>>>>>> secondary mirror server.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>> if the received packet is a TCP packet, we will adjust
>>>>>>>>>>> ack and update the TCP checksum, then send it to the
>>>>>>>>>>> secondary guest. Otherwise, send it directly to the guest.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest send packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>> redirect server filter receives the primary guest packet
>>>>>>>>>>> but does nothing, just passes it to the next filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>> compare filter receives the primary guest packet, then
>>>>>>>>>>> waits for the secondary redirect packet to compare it with.
>>>>>>>>>>> If the packets are the same, send the primary packet and
>>>>>>>>>>> clear the secondary packet; otherwise send the primary
>>>>>>>>>>> packet and do a checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>> if the packet is a TCP packet, we will adjust seq
>>>>>>>>>>> and update the TCP checksum, then send it to the
>>>>>>>>>>> redirect client filter. Otherwise, send it directly to
>>>>>>>>>>> the redirect client filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>> forward packet to primary
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In the failover scenario (primary is down), the TCP rewriter
>>>>>>>>>>> will keep servicing the TCP connections which were established
>>>>>>>>>>> after the last checkpoint.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> How about this plan?
>>>>>>>>> Sounds good.
>>>>>>>>>
>>>>>>>>> And there's indeed no need to differentiate client/server when
>>>>>>>>> reusing the socket chardev.
>>>>>>>>> E.g:
>>>>>>>>>
>>>>>>>>> In primary node:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>> Why does the mirrorer have indev?
>>>>>
>>>>> As I said in the previous mails, I would like to decouple packet
>>>>> comparing from netfilter. You've already done most of this since the
>>>>> comparing is done in an independent thread. So the indev here is to
>>>>> mirror the packets sent by the guest to the packet comparing thread.
>>>>>
>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>> The command line is:
>>>>>>> -netdev tap,id=hn0
>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>> out the packet.
>>>>>>>
>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>> It depends on whether or not packet comparing is done in a net filter
>>>>> (which I prefer not).
>>> I mean that packet comparing is done in a thread, not a net filter.
>>> The flow of a packet sent from the guest:
>>> 1. traffic-redirector: we redirect the packet to comparer0; the next
>>> filter will never see it.
>>> 2. comparing thread: read it from socket chardev comparer0
>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>> Ok, looks like I missed something.
>>
>> My suggestion tries its best to keep the packet comparing from being
>> tied to a filter or netdev. But your suggestion still needs it to be
>> coupled with a netdev. Any advantages of doing this (or is there a reason
>> that the packet must be sent to the netdev after comparing)? If not,
>> why not just
> Yes, the packet must be sent to the netdev after comparing. If both
> the primary packet and secondary packet are the same (contain the same
> application-level data), we will drop the secondary packet and send the
> primary packet to the netdev. Otherwise, we will sync the state.

And drop the primary packet also here?

>
>> mirror (duplicate the packet and forward it to a chardev, and pass the
>> original packet to the next filter or netdev)? And doing
> We cannot send the packet to the netdev before comparing. We need to keep
> the connection after failover.
>
> Thanks
> Wen Congyang
>
>> qemu_net_queue_send_iov() to a netdev in another thread may need some
>> synchronization with iothread.
>>
>>> Thanks
>>> Wen Congyang
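
To make the open question above concrete, here is a minimal Python sketch
of the comparer's decision as it is described in this thread. It is not
code from the patch series. The names Verdict, compare_and_release and
drop_primary_on_mismatch are hypothetical, and byte equality is only an
assumed stand-in for "contains the same application-level data", which a
real comparer would have to check per protocol.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verdict:
    """Outcome of comparing one primary/secondary packet pair (hypothetical type)."""
    to_netdev: List[bytes] = field(default_factory=list)  # packets released to the netdev
    checkpoint: bool = False  # True when primary and secondary state must be synced


def compare_and_release(primary: bytes, secondary: bytes,
                        drop_primary_on_mismatch: bool = False) -> Verdict:
    """Decide what to release after comparing a packet pair.

    Assumption: "same" is modelled as byte equality for illustration only.
    """
    if primary == secondary:
        # Same payload: release the primary packet, discard the secondary one.
        return Verdict(to_netdev=[primary], checkpoint=False)
    # Mismatch: a state sync (checkpoint) is needed. Whether the primary
    # packet is still released is the point under discussion, so it is a flag.
    released = [] if drop_primary_on_mismatch else [primary]
    return Verdict(to_netdev=released, checkpoint=True)


if __name__ == "__main__":
    print(compare_and_release(b"GET /", b"GET /"))
    print(compare_and_release(b"GET /", b"GET /x"))
    print(compare_and_release(b"GET /", b"GET /x", drop_primary_on_mismatch=True))
```

With the flag left at its default the sketch follows the earlier
description in the thread (send the primary packet and do a checkpoint);
setting it models the alternative raised in the question above.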