qemu-devel.nongnu.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: Wen Congyang <wency@cn.fujitsu.com>,
	Zhang Chen <zhangchen.fnst@cn.fujitsu.com>,
	qemu devel <qemu-devel@nongnu.org>
Cc: zhanghailiang <zhang.zhanghailiang@huawei.com>,
	Li Zhijian <lizhijian@cn.fujitsu.com>,
	Gui jianfeng <guijianfeng@cn.fujitsu.com>,
	"eddie.dong" <eddie.dong@intel.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Huang peng <peter.huangpeng@huawei.com>,
	Gong lei <arei.gonglei@huawei.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	jan.kiszka@siemens.com,
	Yang Hongyang <hongyang.yang@easystack.cn>
Subject: Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
Date: Fri, 22 Jan 2016 13:41:38 +0800
Message-ID: <56A1C112.3060402@redhat.com>
In-Reply-To: <56A1A1CA.8020008@cn.fujitsu.com>



On 01/22/2016 11:28 AM, Wen Congyang wrote:
> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>
>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>
>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - TCP analysis is missing from the current version; maybe you could point
>>>>>>>>>>>>> me to a git tree (or another version of the RFC) for a better understanding
>>>>>>>>>>>>> of the design. (Just a skeleton for TCP should be sufficient to discuss.)
>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>> split/decouple the reusable parts from the code. So a vague idea is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>> 99% of this since the work is already done in a thread. Just let the thread
>>>>>>>>>>>>> poll sockets directly; then the comparing could be
>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, in the primary node, you need just a traffic mirror, which does:
>>>>>>>>>>>>> - mirror ingress traffic to the secondary node
>>>>>>>>>>>>> - mirror egress traffic to the packet comparing thread
>>>>>>>>>>>>>
>>>>>>>>>>>>> And in the secondary node, you need two filters:
>>>>>>>>>>>>> - A TCP seq rewriter which adjusts the TCP sequence number.
>>>>>>>>>>>>> - A traffic redirector which redirects packets from a socket as ingress
>>>>>>>>>>>>> traffic, and redirects egress traffic to the socket which could be
>>>>>>>>>>>>> polled by the remote packet comparing thread.
>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>> We have considered your suggestion to split/decouple
>>>>>>>>>>> the reusable parts from the code.
>>>>>>>>>>> Because filter plugins are traversed one by one in order,
>>>>>>>>>>> we will split colo-proxy into three filters on each side.
>>>>>>>>>>>
>>>>>>>>>>> But in this plan, primary and secondary both have a socket
>>>>>>>>>>> server, so startup is a problem.
>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>
>>>>>>>>>>> [ASCII diagram, damaged by line wrapping; summary of its contents:]
>>>>>>>>>>>
>>>>>>>>>>> Primary qemu (guest, netfilter, tap), filters in execute order:
>>>>>>>>>>>   mirror client (tx) -> redirect server (rx) -> compare (rx)
>>>>>>>>>>>   - tap -> mirror client -> guest (guest receive), with a link
>>>>>>>>>>>     from the mirror client to the secondary mirror server
>>>>>>>>>>>   - guest -> redirect server -> compare -> tap (guest send)
>>>>>>>>>>>
>>>>>>>>>>> Secondary qemu (guest, netfilter), filters in execute order:
>>>>>>>>>>>   mirror server (tx) -> TCP rewriter (all; adjust ack, adjust seq)
>>>>>>>>>>>   -> redirect client (rx)
>>>>>>>>>>>   - the redirect client links back to the primary redirect server
>>>>>>>>>>>
>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>>   rx: receive packets sent to the netdev
>>>>>>>>>>>   tx: receive packets sent by the netdev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> I would still like to decouple the comparer from netfilter. It has two obvious
>>>>>>>>> advantages:
>>>>>>>>>
>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>> - the secondary redirector could redirect rx to the comparer on the primary
>>>>>>>>> node directly, which simplifies the design.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>> The mirror client sends the packet to the guest and, at the
>>>>>>>>>>> same time, copies and forwards it to the secondary
>>>>>>>>>>> mirror server.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>> If the received packet is a TCP packet, we adjust the ack
>>>>>>>>>>> and update the TCP checksum, then send it to the secondary
>>>>>>>>>>> guest. Otherwise we send it directly to the guest.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest send packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>> The redirect server filter receives the primary guest packet
>>>>>>>>>>> but does nothing with it, just passes it to the next filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>> The compare filter receives the primary guest packet, then
>>>>>>>>>>> waits for the redirected secondary packet to compare against it.
>>>>>>>>>>> If the packets are the same, send the primary packet and clear the
>>>>>>>>>>> secondary packet; otherwise send the primary packet and do a
>>>>>>>>>>> checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>> If the packet is a TCP packet, we adjust the seq
>>>>>>>>>>> and update the TCP checksum, then send it to the
>>>>>>>>>>> redirect client filter. Otherwise we send it directly to the
>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>> Forward the packet to the primary.
>>>>>>>>>>>
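To illustrate the TCP rewriter step in the routes above, here is a minimal Python sketch (not QEMU code). It shifts the seq and/or ack field of a raw TCP segment by a fixed delta and recomputes the checksum over the IPv4 pseudo-header. The function names, the fixed per-connection delta, and the IPv4-only handling are assumptions made for illustration; the thread does not say how the offset is derived.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return (~ones_complement_sum(pseudo + segment)) & 0xFFFF


def rewrite_tcp(segment: bytes, src_ip: bytes, dst_ip: bytes,
                seq_delta: int = 0, ack_delta: int = 0) -> bytes:
    """Return a copy of the segment with seq/ack shifted and checksum fixed.

    seq_delta / ack_delta are hypothetical per-connection offsets.
    """
    seq, ack = struct.unpack_from("!II", segment, 4)
    out = bytearray(segment)
    struct.pack_into("!II", out, 4,
                     (seq + seq_delta) & 0xFFFFFFFF,  # 32-bit wraparound
                     (ack + ack_delta) & 0xFFFFFFFF)
    struct.pack_into("!H", out, 16, 0)  # zero the checksum field first
    struct.pack_into("!H", out, 16, tcp_checksum(src_ip, dst_ip, bytes(out)))
    return bytes(out)
```

In the terms of the plan above, packets leaving the secondary guest would be passed with a seq_delta, and packets entering it with an ack_delta.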
>>>>>>>>>>>
>>>>>>>>>>> In the failover case (primary is down), the TCP rewriter will keep
>>>>>>>>>>> servicing the TCP connections that were established after the last
>>>>>>>>>>> checkpoint.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> How about this plan?
>>>>>>>>> Sounds good.
>>>>>>>>>
>>>>>>>>> And there's indeed no need to distinguish client/server when reusing the
>>>>>>>>> socket chardev. E.g.:
>>>>>>>>>
>>>>>>>>> In primary node:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>> Why does the mirrorer have an indev?
>>>>>
>>>>> As I said in the previous mails, I would like to decouple packet
>>>>> comparing from netfilter. You've already done most of this since the
>>>>> comparing is done in an independent thread. So the indev here is to
>>>>> mirror the packet sent by the guest to the packet comparing thread.
>>>>>
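A rough Python sketch of what "the comparing thread polls sockets directly" could look like, independent of any filter or netdev. It uses the standard selectors module and datagram sockets so that each recv() returns one whole packet; a stream socket such as the socket chardev would need explicit framing, which is left out. The function name poll_both and the "primary"/"secondary" labels are assumptions for illustration only.

```python
import selectors
import socket


def poll_both(primary_sock: socket.socket, secondary_sock: socket.socket,
              timeout: float = 1.0) -> dict:
    """Read from both sockets until each has delivered at least one packet.

    Returns the packets received per side; stops early if nothing
    arrives within `timeout` seconds.
    """
    sel = selectors.DefaultSelector()
    sel.register(primary_sock, selectors.EVENT_READ, "primary")
    sel.register(secondary_sock, selectors.EVENT_READ, "secondary")
    received = {"primary": [], "secondary": []}
    try:
        while not (received["primary"] and received["secondary"]):
            events = sel.select(timeout)
            if not events:
                break  # timed out waiting for the other side
            for key, _mask in events:
                # key.data is the label given at registration time
                received[key.data].append(key.fileobj.recv(65535))
    finally:
        sel.close()
    return received
```

Because the function only needs two readable sockets, the same loop could be fed by a netfilter, vhost, or anything else that can write packets to a socket.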
>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>> The command line is:
>>>>>>> -netdev tap,id=hn0
>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>> out the packet.
>>>>>>>
>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>> (which I prefer not).
>>> I mean that packet comparing is done in a thread, not a net filter.
>>> The flow of the packet sent from guest:
>>> 1. traffic-redirector: we redirect the packet to comparer0; the next
>>>    filter will never see it.
>>> 2. comparing thread: read it from socket chardev comparer0
>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
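The difference between a mirror and a redirector in an ordered filter chain can be shown with a small Python model. This is only a sketch of the semantics discussed in this thread (a mirror copies and passes the packet on, a redirector consumes it so later filters never see it); the names run_chain, make_mirror and make_redirector are hypothetical and do not correspond to QEMU functions.

```python
def run_chain(filters, packet):
    """Pass a packet through filters in order; None means it was consumed."""
    for f in filters:
        packet = f(packet)
        if packet is None:
            return None  # a filter took the packet; stop traversing
    return packet


def make_mirror(sink):
    """Mirror: copy the packet to `sink` and pass the original on."""
    def mirror(pkt):
        sink.append(bytes(pkt))
        return pkt
    return mirror


def make_redirector(sink):
    """Redirector: hand the packet to `sink`; later filters never see it."""
    def redirect(pkt):
        sink.append(pkt)
        return None
    return redirect


# Usage: the mirror duplicates, the redirector swallows, the probe sees nothing.
mirrored, redirected, seen_by_next = [], [], []
chain = [make_mirror(mirrored), make_redirector(redirected),
         lambda pkt: (seen_by_next.append(pkt), pkt)[1]]
result = run_chain(chain, b"guest-packet")
```

With a redirector in the chain, the packet only reaches the netdev if something (here, the comparing thread) explicitly sends it back.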
>> Ok, looks like I missed something.
>>
>> My suggestion tries its best to keep packet comparing from being tied to a
>> filter or netdev. But your suggestion still needs it to be coupled with a
>> netdev. Any advantages of doing this (or is there a reason that the packet
>> must be sent to the netdev after comparing?). If not, why not just
> Yes, the packet must be sent to the netdev after comparing. If
> the primary packet and secondary packet are the same (contain the same
> application level data), we will drop the secondary packet, and send the
> primary packet to the netdev. Otherwise, we will sync the state.

And is the primary packet dropped here as well?
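To make the question concrete, here is a minimal Python sketch of the compare step as described in the quoted text: on a match the secondary copy is dropped and the primary packet goes to the netdev; on a mismatch the state is synced. What happens to the primary packet on a mismatch is exactly the open point, so it is left as an explicit flag rather than decided here. The class and parameter names are hypothetical, and byte-for-byte equality stands in for "same application level data".

```python
from collections import deque


class PacketComparer:
    """Pairs up primary and secondary packets in arrival order."""

    def __init__(self, send_to_netdev, sync_state, drop_primary_on_mismatch):
        self.primary = deque()
        self.secondary = deque()
        self.send_to_netdev = send_to_netdev  # callback: release a packet
        self.sync_state = sync_state          # callback: trigger a checkpoint
        # Open question in the thread, so the caller must choose explicitly.
        self.drop_primary_on_mismatch = drop_primary_on_mismatch

    def on_primary(self, pkt):
        self.primary.append(pkt)
        self._compare_ready()

    def on_secondary(self, pkt):
        self.secondary.append(pkt)
        self._compare_ready()

    def _compare_ready(self):
        # Compare only while both sides have a packet waiting.
        while self.primary and self.secondary:
            pri = self.primary.popleft()
            sec = self.secondary.popleft()
            if pri == sec:
                # Same content: secondary copy is dropped, primary is sent.
                self.send_to_netdev(pri)
            else:
                self.sync_state()
                if not self.drop_primary_on_mismatch:
                    self.send_to_netdev(pri)
```

Note that the primary packet is held in the queue until its secondary counterpart arrives, which is why it cannot simply be passed on to the netdev before comparing.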

>
>> mirror (duplicate the packet and forward it to a chardev, and pass the
>> original packet to the next filter or netdev)? And doing
> We cannot send the packet to the netdev before comparing. We need to keep
> the connection after failover.
>
> Thanks
> Wen Congyang
>
>> qemu_net_queue_send_iov() to a netdev in another thread may need some
>> synchronization with iothread.
>>
>>> Thanks
>>> Wen Congyang
>>>
>>
>>
>> .
>>
>
>
