From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50688) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZgS9a-0001uD-Bz for qemu-devel@nongnu.org; Mon, 28 Sep 2015 02:43:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZgS9X-0002bt-1y for qemu-devel@nongnu.org; Mon, 28 Sep 2015 02:43:14 -0400 Received: from [59.151.112.132] (port=15314 helo=heian.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZgS9U-0002aN-Mp for qemu-devel@nongnu.org; Mon, 28 Sep 2015 02:43:10 -0400 Message-ID: <5608E173.7000006@cn.fujitsu.com> Date: Mon, 28 Sep 2015 14:42:59 +0800 From: Yang Hongyang MIME-Version: 1.0 References: <1442405768-23019-1-git-send-email-yanghy@cn.fujitsu.com> <1442405768-23019-10-git-send-email-yanghy@cn.fujitsu.com> <87si64qiyr.fsf@blackfin.pond.sub.org> <5604F54F.1000509@cn.fujitsu.com> <87bncqy1tn.fsf@blackfin.pond.sub.org> In-Reply-To: <87bncqy1tn.fsf@blackfin.pond.sub.org> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v11 09/12] netfilter: add a netbuffer filter List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: thuth@redhat.com, zhang.zhanghailiang@huawei.com, lizhijian@cn.fujitsu.com, jasowang@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com On 09/25/2015 11:07 PM, Markus Armbruster wrote: > Yang Hongyang writes: > >> On 09/24/2015 05:12 PM, Markus Armbruster wrote: >>> Yang Hongyang writes: >>> >>>> This filter is to buffer/release packets, this feature can be used >>>> when using MicroCheckpointing, or other Remus like VM FT solutions, you >>> >>> What's "Remus"? >> >> Remus is an opensource VM FT solution: >> paper: http://http//www.cs.ubc.ca/~andy/papers/remus-nsdi-final.pdf >> First implemented on Xen: >> http://wiki.xen.org/wiki/Remus >> >> MicroCheckpointing in QEMU is another Remus implementation. >> >>> >>>> can also use it to simulate the network delay. >>>> It has an interval option, if supplied, this filter will release >>>> packets by interval. >>> >>> Suggest "will delay packets by that time interval." >>> >>> Is interval really optional? >> >> It supposed to be optional. When the buffer filter has a user: >> http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg00363.html >> >> In this patchset, the zero interval check is removed. packets are >> released on demand through filter_buffer_release_all() api call. > > Understand. But is interval optional right at this point in this patch > series? It's not, will remove this optional description although we will make it optional soon. > >>> >>>> >>>> Usage: >>>> -netdev tap,id=bn0 >>>> -object filter-buffer,id=f0,netdev=bn0,chain=in,interval=1000 >>>> >>>> NOTE: >>>> the scale of interval is microsecond. >>> >>> Perhaps "interval is in microseconds". >> >> Better, thanks. >> >>> >>>> >>>> Signed-off-by: Yang Hongyang >>>> --- >>>> v11: add a fixme comment from Jason >>>> v10: use NetQueue flush api to flush packets >>>> sent_cb can not be called when we already return size >>>> v9: adjustment due to the qapi change >>>> v7: use QTAILQ_FOREACH_SAFE() when flush packets >>>> v6: move the interval check earlier and some comment adjust >>>> v5: remove dummy sent_cb >>>> change interval type from int64 to uint32 >>>> check interval!=0 when initialise >>>> rename FILTERBUFFERState to FilterBufferState >>>> v4: remove bh >>>> pass the packet to next filter instead of receiver >>>> v3: check packet's sender and sender->peer when flush it >>>> --- >>>> net/Makefile.objs | 1 + >>>> net/filter-buffer.c | 170 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> qemu-options.hx | 18 ++++++ >>>> vl.c | 7 ++- >>>> 4 files changed, 195 insertions(+), 1 deletion(-) >>>> create mode 100644 net/filter-buffer.c >>>> >>>> diff --git a/net/Makefile.objs b/net/Makefile.objs >>>> index 914aec0..5fa2f97 100644 >>>> --- a/net/Makefile.objs >>>> +++ b/net/Makefile.objs >>>> @@ -14,3 +14,4 @@ common-obj-$(CONFIG_SLIRP) += slirp.o >>>> common-obj-$(CONFIG_VDE) += vde.o >>>> common-obj-$(CONFIG_NETMAP) += netmap.o >>>> common-obj-y += filter.o >>>> +common-obj-y += filter-buffer.o >>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c >>>> new file mode 100644 >>>> index 0000000..ef94e91 >>>> --- /dev/null >>>> +++ b/net/filter-buffer.c >>>> @@ -0,0 +1,170 @@ >>>> +/* >>>> + * Copyright (c) 2015 FUJITSU LIMITED >>>> + * Author: Yang Hongyang >>>> + * >>>> + * This work is licensed under the terms of the GNU GPL, version 2 or >>>> + * later. See the COPYING file in the top-level directory. >>>> + */ >>>> + >>>> +#include "net/filter.h" >>>> +#include "net/queue.h" >>>> +#include "qemu-common.h" >>>> +#include "qemu/timer.h" >>>> +#include "qemu/iov.h" >>>> +#include "qapi/qmp/qerror.h" >>>> +#include "qapi-visit.h" >>>> +#include "qom/object.h" >>>> + >>>> +#define TYPE_FILTER_BUFFER "filter-buffer" >>>> + >>>> +#define FILTER_BUFFER(obj) \ >>>> + OBJECT_CHECK(FilterBufferState, (obj), TYPE_FILTER_BUFFER) >>>> + >>>> +struct FilterBufferState { >>>> + NetFilterState parent_obj; >>>> + >>>> + NetQueue *incoming_queue; >>>> + uint32_t interval; >>>> + QEMUTimer release_timer; >>>> +}; >>>> +typedef struct FilterBufferState FilterBufferState; >>> >>> Again, not splitting the declaration is more concise. >>> >>>> + >>>> +static void filter_buffer_flush(NetFilterState *nf) >>>> +{ >>>> + FilterBufferState *s = FILTER_BUFFER(nf); >>>> + >>>> + if (!qemu_net_queue_flush(s->incoming_queue)) { >>>> + /* Unable to empty the queue, purge remaining packets */ >>>> + qemu_net_queue_purge(s->incoming_queue, nf->netdev); >>>> + } >>>> +} >>> >>> This either flushes or purges incoming_queue, where "purge" means >>> dropping packets. Correct? >> >> I think it is correct. Dropping packets is allowed, even on real >> hardware, packets may lose. >> >>> >>>> + >>>> +static void filter_buffer_release_timer(void *opaque) >>>> +{ >>>> + NetFilterState *nf = opaque; >>>> + FilterBufferState *s = FILTER_BUFFER(nf); >>> >>> Style nit: blank line between declarations and statements, please. >>> >>>> + filter_buffer_flush(nf); >>> >>> Is purging correct here? > > When the timer expires, we flush as many buffered packets as we can, > then throw away the rest. Why throw them away? Shouldn't we leave them > in the buffer, and only throw away packets when the buffer is full? > >>>> + timer_mod(&s->release_timer, >>>> + qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + s->interval); >>> >>> Timer rearmed to fire again in s->interval microseconds. >> >> Yes. >> >>> >>>> +} >>>> + >>>> +/* filter APIs */ >>>> +static ssize_t filter_buffer_receive_iov(NetFilterState *nf, >>>> + NetClientState *sender, >>>> + unsigned flags, >>>> + const struct iovec *iov, >>>> + int iovcnt, >>>> + NetPacketSent *sent_cb) >>>> +{ >>>> + FilterBufferState *s = FILTER_BUFFER(nf); >>>> + >>>> + /* >>>> + * we return size when buffer a packet, the sender will take it as >>>> + * a already sent packet, so sent_cb should not be called later >>> >>> Humor me: when a comment has multiple sentences, start each one with a >>> capital letter, and end it with punctuation. >> >> Ok. >> >>> >>>> + * FIXME: even if guest can't receive packet for some reasons. Filter >>>> + * can still accept packet until its internal queue is full. >>>> + */ >>> >>> I'm not sure I understand the comment. >> >> This is taken from Jason's comments, may be he can have a better explain? >> >>> >>>> + qemu_net_queue_append_iov(s->incoming_queue, sender, flags, >>>> + iov, iovcnt, NULL); >>>> + return iov_size(iov, iovcnt); >>>> +} >>>> + >>>> +static void filter_buffer_cleanup(NetFilterState *nf) >>>> +{ >>>> + FilterBufferState *s = FILTER_BUFFER(nf); >>>> + >>>> + if (s->interval) { >>>> + timer_del(&s->release_timer); >>>> + } >>>> + >>>> + /* flush packets */ >>>> + if (s->incoming_queue) { >>>> + filter_buffer_flush(nf); >>> >>> I guess purging is correct here. >>> >>>> + g_free(s->incoming_queue); >>>> + } >>>> +} >>>> + >>>> +static void filter_buffer_setup(NetFilterState *nf, Error **errp) >>>> +{ >>>> + FilterBufferState *s = FILTER_BUFFER(nf); >>>> + >>>> + /* >>>> + * this check should be dropped when there're VM FT solutions like MC >>>> + * or COLO use this filter to release packets on demand. >>>> + */ >>> >>> If you end a sentence with a period, you get to start it with a capital >>> letter :) >> >> Ok, thanks. >> >>> >>> "there're" is odd. Perhaps something like "We may want to accept zero >>> interval when VM FT solutions like MC or COLO use this filter to release >>> packets on demand." >>> >>>> + if (!s->interval) { >>>> + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "interval", >>>> + "a non-zero interval"); >>> >>> How can this happen? Doesn't filter_buffer_set_interval() catch zero >>> intervals already? >> >> When user do not supply an interval parameter, filter_buffer_set_interval() >> won't be called, and the interval is 0. >> When we have actual user like I mentioned above, we should allow user to omit >> the interval option. > > Let's see whether I understand. > > If you specify a non-zero interval, filter_buffer_set_interval() stores > it. > > If you specify a zero interval, filter_buffer_set_interval() rejects it. > > If you specify no interval, it defaults to zero, and > filter_buffer_setup() fails right here. > > Therefore, interval is not optional right now. > > Correct? > >>>> + return; >>>> + } >>>> + >>>> + s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf); >>>> + if (s->interval) { >>> >>> Condition cannot be false. Same in filter_buffer_cleanup(). >>> >>>> + timer_init_us(&s->release_timer, QEMU_CLOCK_VIRTUAL, >>>> + filter_buffer_release_timer, nf); >>>> + timer_mod(&s->release_timer, >>>> + qemu_clock_get_us(QEMU_CLOCK_VIRTUAL) + s->interval); >>> >>> Timer armed to fire in s->interval microseconds. >>> >>> Unless I misunderstand something, this doesn't actually delay each >>> packet by s->interval microseconds, it batches packet delivery: all >>> packets arriving in a given interval are delayed until the end of the >>> interval. Correct? >> >> Correct. > > Documentation needs to spell that out. > >>>> + } >>>> +} >>>> + >>>> +static void filter_buffer_class_init(ObjectClass *oc, void *data) >>>> +{ >>>> + NetFilterClass *nfc = NETFILTER_CLASS(oc); >>>> + >>>> + nfc->setup = filter_buffer_setup; >>>> + nfc->cleanup = filter_buffer_cleanup; >>>> + nfc->receive_iov = filter_buffer_receive_iov; >>>> +} >>>> + >>>> +static void filter_buffer_get_interval(Object *obj, Visitor *v, >>>> void *opaque, >>>> + const char *name, Error **errp) >>>> +{ >>>> + FilterBufferState *s = FILTER_BUFFER(obj); >>>> + uint32_t value = s->interval; >>>> + >>>> + visit_type_uint32(v, &value, name, errp); >>>> +} >>>> + >>>> +static void filter_buffer_set_interval(Object *obj, Visitor *v, >>>> void *opaque, >>>> + const char *name, Error **errp) >>>> +{ >>>> + FilterBufferState *s = FILTER_BUFFER(obj); >>>> + Error *local_err = NULL; >>>> + uint32_t value; >>>> + >>>> + visit_type_uint32(v, &value, name, &local_err); >>>> + if (local_err) { >>>> + goto out; >>>> + } >>>> + if (!value) { >>>> + error_setg(&local_err, "Property '%s.%s' doesn't take value '%" >>>> + PRIu32 "'", object_get_typename(obj), name, value); >>> >>> What does it take? That's what the user wants to know... Perhaps >>> "Property '%s.%s' requires a positive value". >> >> Maybe better, thanks. >> >>> >>>> + goto out; >>>> + } >>>> + s->interval = value; >>>> + >>>> +out: >>>> + error_propagate(errp, local_err); >>>> +} >>>> + >>>> +static void filter_buffer_init(Object *obj) >>>> +{ >>>> + object_property_add(obj, "interval", "int", >>>> + filter_buffer_get_interval, >>>> + filter_buffer_set_interval, NULL, NULL, NULL); >>>> +} >>>> + >>>> +static const TypeInfo filter_buffer_info = { >>>> + .name = TYPE_FILTER_BUFFER, >>>> + .parent = TYPE_NETFILTER, >>>> + .class_init = filter_buffer_class_init, >>>> + .instance_init = filter_buffer_init, >>>> + .instance_size = sizeof(FilterBufferState), >>>> +}; >>>> + >>>> +static void register_types(void) >>>> +{ >>>> + type_register_static(&filter_buffer_info); >>>> +} >>>> + >>>> +type_init(register_types); >>>> diff --git a/qemu-options.hx b/qemu-options.hx >>>> index 7e147b8..b09f97f 100644 >>>> --- a/qemu-options.hx >>>> +++ b/qemu-options.hx >>>> @@ -3642,6 +3642,24 @@ in PEM format, in filenames >>>> @var{ca-cert.pem}, @var{ca-crl.pem} (optional), >>>> @var{server-cert.pem} (only servers), @var{server-key.pem} (only servers), >>>> @var{client-cert.pem} (only clients), and @var{client-key.pem} >>>> (only clients). >>>> >>>> +@item -object >>>> filter-buffer,id=@var{id},netdev=@var{netdevid}[,chain=@var{all|in|out}][,interval=@var{t}] >>>> + >>>> +Buffer network packets on netdev @var{netdevid}. >>>> +If interval @var{t} provided, will release packets by interval. >>>> +Interval scale: microsecond. >>>> + >>>> +If interval @var{t} not provided, you have to make sure the packets can be >>> >>> Are you sure omitting t works? filter_buffer_set_interval() rejects >>> zero... >> >> When we omit t, filter_buffer_set_interval() won't be called. > > But filter_buffer_setup() will be, and it will fail, won't it? > >>>> +released, either by manually remove this filter or call the >>>> release buffer API, >>>> +otherwise, the packets will be buffered forever. Use with caution. >>> >>> Please wrap your lines a bit earlier. >>> >>>> + >>>> +chain @var{all|in|out} is an option that can be applied to any >>>> netfilter, default is @option{all}. >>>> + >>>> +@option{all} means this filter will receive packets both sent >>>> to/from the netdev >>>> + >>>> +@option{in} means this filter will receive packets sent to the netdev >>>> + >>>> +@option{out} means this filter will receive packets sent from the netdev >>>> + >>> >>> I'd describe this like "filter is inserted in the receive / transmit >>> queue". >> >> thanks. >> >>> >>>> @end table >>>> >>>> ETEXI >>>> diff --git a/vl.c b/vl.c >>>> index ec589e2..3cf89d5 100644 >>>> --- a/vl.c >>>> +++ b/vl.c >>>> @@ -2794,7 +2794,12 @@ static bool object_create_initial(const char *type) >>>> if (g_str_equal(type, "rng-egd")) { >>>> return false; >>>> } >>>> - /* TODO: return false for concrete netfilters */ >>>> + >>>> + /* return false for concrete netfilters */ >>> >>> I find this comment useless, please drop it :) >> >> Ok, thanks. >> >>> >>>> + if (g_str_equal(type, "filter-buffer")) { >>>> + return false; >>>> + } >>>> + >>>> return true; >>>> } >>> . >>> > . > -- Thanks, Yang.