From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55B5EA38.2030803@cn.fujitsu.com>
Date: Mon, 27 Jul 2015 16:22:16 +0800
From: Yang Hongyang
MIME-Version: 1.0
References: <55AC9859.3050100@cn.fujitsu.com>
 <20150720103208.GA12675@stefanha-thinkpad.redhat.com>
 <55B19F25.10905@redhat.com> <55B1F196.2000008@cn.fujitsu.com>
 <55B5A465.6030004@redhat.com> <55B5AB70.3030704@cn.fujitsu.com>
 <55B5B85B.1010009@redhat.com> <55B5C6E9.6090707@cn.fujitsu.com>
 <55B5DF9E.6020908@redhat.com> <55B5E2A3.40600@cn.fujitsu.com>
 <55B5E69F.20406@redhat.com>
In-Reply-To: <55B5E69F.20406@redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [POC] colo-proxy in qemu
To: Jason Wang, "Dong, Eddie", Stefan Hajnoczi, Li Zhijian
Cc: zhanghailiang, "jan.kiszka@siemens.com", "peter.huangpeng",
 "qemu-devel@nongnu.org", "Gonglei (Arei)", "stefanha@redhat.com",
 "dgilbert@redhat.com"

On 07/27/2015 04:06 PM, Jason Wang wrote:
>
> On 07/27/2015 03:49 PM, Yang Hongyang wrote:
>> On 07/27/2015 03:37 PM, Jason Wang wrote:
>>>
>>> On 07/27/2015 01:51 PM, Yang Hongyang wrote:
>>>> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>>>>
>>>>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>>>>
>>>>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>>>>
>>>>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>>>>
>>>>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>>>>> Hi Stefan:
>>>>>>>>>> Thanks for your comments!
>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>>>>> compare packets.
>>>>>>>>>>>
>>>>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>>>> Yes, that is the previous solution the COLO sub-community choose
>>>>>>>>>> to go, but we realized it might be not the best choices, and thus
>>>>>>>>>> we want to bring discussion back here :) More comments are welcome.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi:
>>>>>>>>>
>>>>>>>>> Could you pls describe more details on this decision? What's the
>>>>>>>>> reason that you realize it was not the best choice?
>>>>>>>>
>>>>>>>> Below is my opinion:
>>>>>>>>
>>>>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>>>>    implemented as a nf conntrack extension. Adding a extension
>>>>>>>>    need to modify the extension struct in-kernel, so recompile
>>>>>>>>    kernel is needed.
>>>>>>>
>>>>>>> There's no need to do all in kernel, you can use a separate
>>>>>>> process to do the comparing and trigger the state sync through
>>>>>>> monitor.
>>>>>>
>>>>>> I don't get it, colo-proxy kernel module using a kthread do the
>>>>>> comparing and trigger the state sync. We implemented it as a nf
>>>>>> conntrack extension module, so we need to extend the extension
>>>>>> struct in-kernel, although it just needs few lines changes to
>>>>>> kernel, but a recompile of kernel is needed. Are you talking about
>>>>>> not implement it as a nf conntrack extension?
>>>>>
>>>>> Yes, I mean implement the comparing in userspace but not in qemu.
>>>>
>>>> Yes, it is an alternative, that requires other components such as
>>>> netfilter userspace tools, it will add the complexity I think, we
>>>> wanted to implement a simple solution in QEMU. Another reason is
>>>> that using other userspace tools will affect the performance, the
>>>> context switch between kernel and userspace may be an overhead.
>>>>
>>>>>>>> 2. We need to recompile iptables/nftables to use together with
>>>>>>>>    the colo-proxy kernel module.
>>>>>>>> 3. Need to configure primary host to forward input packets to
>>>>>>>>    secondary as well as configure secondary to forward output
>>>>>>>>    packets to primary host, the network topology and
>>>>>>>>    configuration is too complex for a regular user.
>>>>>>>>
>>>>>>>
>>>>>>> You can use current kernel primitives to mirror the traffic of both
>>>>>>> PVM and SVM to another process without any modification of kernel.
>>>>>>> And qemu can offload all network configuration to management in
>>>>>>> this case. And what's more import, this works for vhost. Filtering
>>>>>>> in qemu won't work for vhost.
>>>>>>
>>>>>> We are using tc to mirror/forward packets now. Implement in QEMU do
>>>>>> have some limits, but there're also limits in kernel, if the packet
>>>>>> do not pass the host kernel TCP/IP stack, such as vhost-user.
>>>>>
>>>>> But the limits are much less than userspace, no? For vhost-user, maybe
>>>>> we could extend the backed to mirror the traffic also.
>>>>
>>>> IMO the limits are more or less. Besides, for mirror/forward packets,
>>>> using tc requires a separate physical nic or a vlan, the nic should not
>>>> be used for other purpose. if we implement it in QEMU, using an socket
>>>> connection to forward packets, we no longer need an separate nic, it
>>>> will reduce the network topology complexity.
>>>
>>> It depends on how do you design your user space. If you want using
>>> userspace to forward the packet, you can 1) use packet socket to capture
>>> all traffic on the tap that is used by VM 2) mirror the traffic to a new
>>> tap device, the user space can then read all traffic from this new tap
>>> device.
>>
>> Yes, but we can also do it in QEMU space, right?
>
> Right.
>
>> This will make life easier
>> because we do all in one solution within QEMU.
>
> But I'm not sure qemu is the right place to do this as you mention that
> it needs userspace protocol stack support.

We only need some simple features, such as defragmenting TCP packets and
analyzing TCP headers. Since QEMU already has the slirp userspace protocol
stack, that should not be a big deal.

-- 
Thanks,
Yang.
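
As a rough illustration of the "analyze TCP headers" step mentioned above, the sketch below pulls out the TCP fields that a packet comparison would likely look at. It is written in Python for brevity and is not QEMU or slirp code. The function name `parse_tcp_header` and the returned keys are hypothetical. It assumes a raw IPv4 packet with no link-layer header, rejects IP fragments instead of reassembling them, and does not verify checksums.

```python
import struct

TCP_PROTOCOL = 6  # IANA protocol number for TCP


def parse_tcp_header(packet: bytes) -> dict:
    """Parse the IPv4 and TCP headers of a raw IPv4 packet.

    Hypothetical helper for illustration only: no link-layer header,
    no fragment reassembly, no checksum verification.
    """
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")

    # IHL is counted in 32-bit words.
    ip_header_len = (packet[0] & 0x0F) * 4
    if ip_header_len < 20:
        raise ValueError("invalid IPv4 header length")

    total_len, = struct.unpack_from("!H", packet, 2)
    flags_frag, = struct.unpack_from("!H", packet, 6)
    # 0x2000 is the More Fragments flag, 0x1FFF the fragment offset.
    if flags_frag & 0x3FFF:
        raise ValueError("IP fragment; reassembly is not handled here")
    if packet[9] != TCP_PROTOCOL:
        raise ValueError("not a TCP packet")

    # Ignore any trailing padding beyond the IP total length.
    tcp = packet[ip_header_len:total_len]
    if len(tcp) < 20:
        raise ValueError("too short for a TCP header")

    src_port, dst_port, seq, ack, off_flags, window = struct.unpack_from(
        "!HHIIHH", tcp, 0
    )
    data_offset = (off_flags >> 12) * 4  # TCP header length in bytes
    if data_offset < 20 or data_offset > len(tcp):
        raise ValueError("invalid TCP data offset")

    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "flags": off_flags & 0x01FF,  # NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
        "window": window,
        "payload": tcp[data_offset:],
    }
```

A comparison could then, for example, match packets from the primary and the secondary on the port pair and sequence number and compare their payloads. Which fields actually matter for COLO is not settled in this thread.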