From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57552)
	by lists.gnu.org with esmtp (Exim 4.71) id 1ZJbVJ-0003Ha-QI
	for qemu-devel@nongnu.org; Mon, 27 Jul 2015 02:03:15 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	id 1ZJbVG-00052x-8o
	for qemu-devel@nongnu.org; Mon, 27 Jul 2015 02:03:13 -0400
Received: from [59.151.112.132] (port=11514 helo=heian.cn.fujitsu.com)
	by eggs.gnu.org with esmtp (Exim 4.71) id 1ZJbV7-00051K-0S
	for qemu-devel@nongnu.org; Mon, 27 Jul 2015 02:03:10 -0400
Message-ID: <55B5C98A.6090508@cn.fujitsu.com>
Date: Mon, 27 Jul 2015 14:02:50 +0800
From: Yang Hongyang
MIME-Version: 1.0
References: <55AF75E6.6070909@cn.fujitsu.com>
	<1437562536-20414-1-git-send-email-yanghy@cn.fujitsu.com>
	<55B082BE.2020703@redhat.com> <55B5C14F.5030808@cn.fujitsu.com>
In-Reply-To: <55B5C14F.5030808@cn.fujitsu.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] RFC/net: Add a net filter
To: Jason Wang, qemu-devel@nongnu.org
Cc: thuth@redhat.com, stefanha@redhat.com

On 07/27/2015 01:27 PM, Yang Hongyang wrote:
> On 07/23/2015 01:59 PM, Jason Wang wrote:
>>
>>
>> On 07/22/2015 06:55 PM, Yang Hongyang wrote:
>>> This patch adds a net filter between the network backend and NIC devices.
>>> All packets will pass through this filter.
>>> TODO:
>>>   multiqueue support.
>>>   plugin support.
>>>
>>>               +--------------+       +-------------+
>>> +----------+  |    filter    |       |frontend(NIC)|
>>> |   real   |  |              |       |             |
>>> |  network <--+backend       <-------+             |
>>> |  backend |  |         peer +------->  peer       |
>>> +----------+  +--------------+       +-------------+
>>>
>>> Usage:
>>>   -netdev tap,id=bn0  # you can use whatever backend as needed
>>>   -netdev filter,id=f0,backend=bn0,plugin=dump
>>>   -device e1000,netdev=f0
>>>
>>> Signed-off-by: Yang Hongyang
>>
>> Hi:
>>
>> Several questions:
>>
>> - Looks like we can do more than filter, so maybe something like
>>   traffic control or other is more suitable?
>
> The filter is just a transparent proxy of a backend if no filter plugin
> is inserted; it simply passes all packets through. Capturing all traffic
> is the purpose of the filter. As long as we have an entry point that sees
> all packets, we can do more, and this is what a filter plugin will do.
> There are some use cases I can think of:
> - dump: by using the filter, we can dump either output or input packets.
> - buffer: to buffer/release packets. This feature can be used with
>   macrocheckpointing, or other Remus-like VM FT solutions. You can
>   also supply an interval to a buffer plugin, which will release
>   packets at that interval.
> There may be other use cases based on this special backend.
>
>> - What's the advantages of introducing a new type of netdev?

You can take the filter as a full-featured network backend. By implementing
it as a new type of netdev, we can reuse the existing netdev design and as
much of the existing code as possible.

>> As far as I
>> can see, just replacing the dump function in Thomas' series with a
>> configurable function pointer will do the trick? (Probably with some
>> monitor commands). And then you won't even need to deal with vnet header
>> and offload stuff?
>
> I think the dump function focuses on every netdev: it adds a dump_enabled
> flag to NetClientState and dumps the packet whenever the netdev's receive
> is called. This filter focuses on packets between the backend and the
> frontend; it is a kind of injection into the network packet flow.
> So the semantics are different, I think.
>
>> - I'm not sure about the value of doing this, especially considering
>>   that the host (Linux) has a much more functional and powerful traffic
>>   control system.
>>
>> Thanks.
>>
>>
>>> ---
>>>  include/net/net.h |   3 +
>>>  net/Makefile.objs |   1 +
>>>  net/clients.h     |   3 +
>>>  net/filter.c      | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  net/net.c         |   6 +-
>>>  qapi-schema.json  |  23 ++++++-
>>>  6 files changed, 233 insertions(+), 3 deletions(-)
>>>  create mode 100644 net/filter.c
>>>
>>> diff --git a/include/net/net.h b/include/net/net.h
>>> index 6a6cbef..250f365 100644
>>> --- a/include/net/net.h
>>> +++ b/include/net/net.h
>>> @@ -45,6 +45,8 @@ typedef void (NetPoll)(NetClientState *, bool enable);
>>>  typedef int (NetCanReceive)(NetClientState *);
>>>  typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
>>>  typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
>>> +typedef ssize_t (NetReceiveFilter)(NetClientState *, NetClientState *,
>>> +                                   unsigned, const uint8_t *, size_t);
>>>  typedef void (NetCleanup) (NetClientState *);
>>>  typedef void (LinkStatusChanged)(NetClientState *);
>>>  typedef void (NetClientDestructor)(NetClientState *);
>>> @@ -64,6 +66,7 @@ typedef struct NetClientInfo {
>>>      NetReceive *receive;
>>>      NetReceive *receive_raw;
>>>      NetReceiveIOV *receive_iov;
>>> +    NetReceiveFilter *receive_filter;
>>>      NetCanReceive *can_receive;
>>>      NetCleanup *cleanup;
>>>      LinkStatusChanged *link_status_changed;
>>> diff --git a/net/Makefile.objs b/net/Makefile.objs
>>> index ec19cb3..914aec0 100644
>>> --- a/net/Makefile.objs
>>> +++ b/net/Makefile.objs
>>> @@ -13,3 +13,4 @@ common-obj-$(CONFIG_HAIKU) += tap-haiku.o
>>>  common-obj-$(CONFIG_SLIRP) += slirp.o
>>>  common-obj-$(CONFIG_VDE) += vde.o
>>>  common-obj-$(CONFIG_NETMAP) += netmap.o
>>> +common-obj-y += filter.o
>>> diff --git a/net/clients.h b/net/clients.h
>>> index d47530e..bcfb34b 100644
>>> --- a/net/clients.h
>>> +++ b/net/clients.h
>>> @@ -62,4 +62,7 @@ int net_init_netmap(const NetClientOptions *opts, const char *name,
>>>  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
>>>                          NetClientState *peer, Error **errp);
>>>
>>> +int net_init_filter(const NetClientOptions *opts, const char *name,
>>> +                    NetClientState *peer, Error **errp);
>>> +
>>>  #endif /* QEMU_NET_CLIENTS_H */
>>> diff --git a/net/filter.c b/net/filter.c
>>> new file mode 100644
>>> index 0000000..006c64a
>>> --- /dev/null
>>> +++ b/net/filter.c
>>> @@ -0,0 +1,200 @@
>>> +/*
>>> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>>> + * (a.k.a. Fault Tolerance or Continuous Replication)
>>> + *
>>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
>>> + * Copyright (c) 2015 FUJITSU LIMITED
>>> + * Copyright (c) 2015 Intel Corporation
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>>> + * later. See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "net/net.h"
>>> +#include "clients.h"
>>> +#include "qemu-common.h"
>>> +#include "qemu/error-report.h"
>>> +
>>> +typedef struct FILTERState {
>>> +    NetClientState nc;
>>> +    NetClientState *backend;
>>> +} FILTERState;
>>> +
>>> +static ssize_t filter_receive(NetClientState *nc, NetClientState *sender,
>>> +                              unsigned flags, const uint8_t *data, size_t size)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *issued_nc = NULL;
>>> +    ssize_t ret;
>>> +
>>> +    if (sender->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
>>> +        /* packet received from NIC */
>>> +        printf("packet received from NIC!!!\n");
>>> +        issued_nc = s->backend;
>>> +    } else {
>>> +        /* packet received from backend */
>>> +        printf("packet received from backend!!!\n");
>>> +        issued_nc = nc->peer;
>>> +    }
>>> +
>>> +    if (flags & QEMU_NET_PACKET_FLAG_RAW && issued_nc->info->receive_raw) {
>>> +        ret = issued_nc->info->receive_raw(issued_nc, data, size);
>>> +    } else {
>>> +        ret = issued_nc->info->receive(issued_nc, data, size);
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +static void filter_cleanup(NetClientState *nc)
>>> +{
>>> +    return;
>>> +}
>>> +
>>> +static bool filter_has_ufo(NetClientState *nc)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->has_ufo) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return backend->info->has_ufo(backend);
>>> +}
>>> +
>>> +static bool filter_has_vnet_hdr(NetClientState *nc)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->has_vnet_hdr) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return backend->info->has_vnet_hdr(backend);
>>> +}
>>> +
>>> +static bool filter_has_vnet_hdr_len(NetClientState *nc, int len)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->has_vnet_hdr_len) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return backend->info->has_vnet_hdr_len(backend, len);
>>> +}
>>> +
>>> +static void filter_using_vnet_hdr(NetClientState *nc, bool using_vnet_hdr)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->using_vnet_hdr) {
>>> +        return;
>>> +    }
>>> +
>>> +    backend->info->using_vnet_hdr(backend, using_vnet_hdr);
>>> +}
>>> +
>>> +static void filter_set_offload(NetClientState *nc, int csum, int tso4,
>>> +                               int tso6, int ecn, int ufo)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->set_offload) {
>>> +        return;
>>> +    }
>>> +
>>> +    backend->info->set_offload(backend, csum, tso4, tso6, ecn, ufo);
>>> +}
>>> +
>>> +static void filter_set_vnet_hdr_len(NetClientState *nc, int len)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->set_vnet_hdr_len) {
>>> +        return;
>>> +    }
>>> +
>>> +    backend->info->set_vnet_hdr_len(backend, len);
>>> +}
>>> +
>>> +static int filter_set_vnet_le(NetClientState *nc, bool is_le)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->set_vnet_le) {
>>> +        return -ENOSYS;
>>> +    }
>>> +
>>> +    return backend->info->set_vnet_le(backend, is_le);
>>> +}
>>> +
>>> +static int filter_set_vnet_be(NetClientState *nc, bool is_be)
>>> +{
>>> +    FILTERState *s = DO_UPCAST(FILTERState, nc, nc);
>>> +    NetClientState *backend = s->backend;
>>> +
>>> +    if (!backend->info->set_vnet_be) {
>>> +        return -ENOSYS;
>>> +    }
>>> +
>>> +    return backend->info->set_vnet_be(backend, is_be);
>>> +}
>>> +
>>> +static NetClientInfo net_filter_info = {
>>> +    .type = NET_CLIENT_OPTIONS_KIND_FILTER,
>>> +    .size = sizeof(FILTERState),
>>> +    .receive_filter = filter_receive,
>>> +    .cleanup = filter_cleanup,
>>> +    .has_ufo = filter_has_ufo,
>>> +    .has_vnet_hdr = filter_has_vnet_hdr,
>>> +    .has_vnet_hdr_len = filter_has_vnet_hdr_len,
>>> +    .using_vnet_hdr = filter_using_vnet_hdr,
>>> +    .set_offload = filter_set_offload,
>>> +    .set_vnet_hdr_len = filter_set_vnet_hdr_len,
>>> +    .set_vnet_le = filter_set_vnet_le,
>>> +    .set_vnet_be = filter_set_vnet_be,
>>> +};
>>> +
>>> +int net_init_filter(const NetClientOptions *opts, const char *name,
>>> +                    NetClientState *peer, Error **errp)
>>> +{
>>> +    NetClientState *nc;
>>> +    FILTERState *s;
>>> +    const NetdevFilterOptions *filter;
>>> +    char *backend_id = NULL;
>>> +    /* char *plugin = NULL; */
>>> +
>>> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_FILTER);
>>> +    filter = opts->filter;
>>> +    assert(filter->has_backend);
>>> +
>>> +    backend_id = filter->backend;
>>> +    /* plugin = filter->has_plugin ? filter->plugin : NULL; */
>>> +
>>> +    nc = qemu_new_net_client(&net_filter_info, peer, "filter", name);
>>> +    /*
>>> +     * TODO: Both backend and frontend packets will use this queue, we
>>> +     * double this queue's maxlen
>>> +     */
>>> +    s = DO_UPCAST(FILTERState, nc, nc);
>>> +    s->backend = qemu_find_netdev(backend_id);
>>> +    if (!s->backend) {
>>> +        error_setg(errp, "invalid backend name specified");
>>> +        return -1;
>>> +    }
>>> +
>>> +    s->backend->peer = nc;
>>> +    /*
>>> +     * TODO:
>>> +     *   init filter plugin
>>> +     */
>>> +    return 0;
>>> +}
>>> diff --git a/net/net.c b/net/net.c
>>> index 28a5597..466c6ff 100644
>>> --- a/net/net.c
>>> +++ b/net/net.c
>>> @@ -57,6 +57,7 @@ const char *host_net_devices[] = {
>>>      "tap",
>>>      "socket",
>>>      "dump",
>>> +    "filter",
>>>  #ifdef CONFIG_NET_BRIDGE
>>>      "bridge",
>>>  #endif
>>> @@ -571,7 +572,9 @@ ssize_t qemu_deliver_packet(NetClientState *sender,
>>>          return 0;
>>>      }
>>>
>>> -    if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
>>> +    if (nc->info->receive_filter) {
>>> +        ret = nc->info->receive_filter(nc, sender, flags, data, size);
>>> +    } else if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
>>>          ret = nc->info->receive_raw(nc, data, size);
>>>      } else {
>>>          ret = nc->info->receive(nc, data, size);
>>> @@ -886,6 +889,7 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>>>      const char *name,
>>>      NetClientState *peer, Error **errp) = {
>>>          [NET_CLIENT_OPTIONS_KIND_NIC] = net_init_nic,
>>> +        [NET_CLIENT_OPTIONS_KIND_FILTER] = net_init_filter,
>>>  #ifdef CONFIG_SLIRP
>>>          [NET_CLIENT_OPTIONS_KIND_USER] = net_init_slirp,
>>>  #endif
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index a0a45f7..3329973 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -2063,7 +2063,7 @@
>>>  # Add a network backend.
>>>  #
>>>  # @type: the type of network backend. Current valid values are 'user', 'tap',
>>> -#        'vde', 'socket', 'dump' and 'bridge'
>>> +#        'vde', 'socket', 'dump' , 'bridge' and 'filter'
>>>  #
>>>  # @id: the name of the new network backend
>>>  #
>>> @@ -2474,6 +2474,24 @@
>>>      '*vhostforce': 'bool' } }
>>>
>>>  ##
>>> +# @NetdevFilterOptions
>>> +#
>>> +# A net filter between network backend and NIC device
>>> +#
>>> +# @plugin: #optional a plugin represent a set of filter rules,
>>> +#          by default, if no plugin is supplied, the net filter will do
>>> +#          nothing but pass all packets to network backend.
>>> +#
>>> +# @backend: the network backend.
>>> +#
>>> +# Since 2.5
>>> +##
>>> +{ 'struct': 'NetdevFilterOptions',
>>> +  'data': {
>>> +    '*plugin': 'str',
>>> +    '*backend': 'str' } }
>>> +
>>> +##
>>>  # @NetClientOptions
>>>  #
>>>  # A discriminated record of network device traits.
>>> @@ -2496,7 +2514,8 @@
>>>      'bridge': 'NetdevBridgeOptions',
>>>      'hubport': 'NetdevHubPortOptions',
>>>      'netmap': 'NetdevNetmapOptions',
>>> -    'vhost-user': 'NetdevVhostUserOptions' } }
>>> +    'vhost-user': 'NetdevVhostUserOptions',
>>> +    'filter': 'NetdevFilterOptions'} }
>>>
>>>  ##
>>>  # @NetLegacy
>>
>> .
>>
>
-- 
Thanks,
Yang.