Date: Mon, 13 Jul 2015 12:52:45 +0800
From: Jason Wang
Subject: Re: [Qemu-devel] [PATCH] virtio-net: Drop net_virtio_info.can_receive
To: Stefan Hajnoczi
Cc: Fam Zheng, qemu-devel@nongnu.org, "Michael S. Tsirkin"

On 07/08/2015 06:50 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 07, 2015 at 04:45:41PM +0800, Jason Wang wrote:
>>
>> On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
>>> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
>>>> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
>>>>>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
>>>>>>> virtio_net_receive still does the check by calling
>>>>>>> virtio_net_can_receive; if the device or driver is not ready,
>>>>>>> the packet is dropped.
>>>>>>>
>>>>>>> This is necessary because returning false from can_receive
>>>>>>> complicates things: the peer would disable sending until we
>>>>>>> explicitly flush the queue.
>>>>>>>
>>>>>>> Signed-off-by: Fam Zheng
>>>>>>> ---
>>>>>>>  hw/net/virtio-net.c | 1 -
>>>>>>>  1 file changed, 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>> index d728233..dbef0d0 100644
>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>>>>>  static NetClientInfo net_virtio_info = {
>>>>>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>>>>>      .size = sizeof(NICState),
>>>>>>> -    .can_receive = virtio_net_can_receive,
>>>>>>>      .receive = virtio_net_receive,
>>>>>>>      .link_status_changed = virtio_net_set_link_status,
>>>>>>>      .query_rx_filter = virtio_net_query_rxfilter,
>>>>>> A side effect of this patch is that it will read and then drop
>>>>>> packets if the guest driver is not ok.
>>>>> I think that the semantics of .can_receive() and .receive() return
>>>>> values are currently incorrect in many NICs. They have
>>>>> .can_receive() functions that return false for conditions where
>>>>> .receive() would discard the packet. So what happens is that
>>>>> packets get queued when they should actually be discarded.
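For context, after this patch the only remaining check is the one
inside the receive path itself. Roughly, from the virtio-net code of
this period (simplified sketch; exact conditions and return
conventions vary between QEMU versions):

static int virtio_net_can_receive(NetClientState *nc)
{
    VirtIONet *n = qemu_get_nic_opaque(nc);
    VirtIODevice *vdev = VIRTIO_DEVICE(n);
    VirtIONetQueue *q = virtio_net_get_subqueue(nc);

    /* Not ready: VM stopped, queue not in use, rx vq not set up,
     * or the guest driver has not reached DRIVER_OK. */
    if (!vdev->vm_running ||
        nc->queue_index >= n->curr_queues ||
        !virtio_queue_ready(q->rx_vq) ||
        !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
        return 0;
    }
    return 1;
}

static ssize_t virtio_net_receive(NetClientState *nc,
                                  const uint8_t *buf, size_t size)
{
    /* With .can_receive dropped from net_virtio_info, the packet has
     * already been read from the backend by the time we get here; a
     * negative return means "consumed", so the net layer drops the
     * packet instead of queuing it. */
    if (!virtio_net_can_receive(nc)) {
        return -1;
    }
    ...
}

The behavioural difference under discussion is exactly this: a false
return from .can_receive() makes the net layer queue the packet and
hold off the sender, while a drop inside .receive() costs one backend
read per packet.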
>>>> Yes, but they are bugs more or less.
>>>>
>>>>> The purpose of the flow control (queuing) mechanism is to tell the
>>>>> sender to hold off until the receiver has more rx buffers
>>>>> available. It's a short-term thing that doesn't include link down,
>>>>> rx disable, or NIC reset states.
>>>>>
>>>>> Therefore, I think this patch will not introduce a regression. It
>>>>> is adjusting the code to stop queuing packets when they should
>>>>> actually be dropped.
>>>>>
>>>>> Thoughts?
>>>> I agree there's no functional issue. But it causes wasted CPU
>>>> cycles (consider a guest that is being flooded). Sometimes it may
>>>> even be dangerous. For tap we're probably ok since we have
>>>> 756ae78b, but for the other backends we don't have that.
>>> If the guest uses iptables rules or other mechanisms to drop bogus
>>> packets the cost is even higher than discarding them at the QEMU
>>> layer.
>> But that was the guest's choice.
>>
>>> What's more is that if you're using link down as a DoS mitigation
>>> strategy then you might as well hot unplug the NIC.
>>>
>>> Stefan
>> I think there are two problems for virtio-net:
>>
>> 1) Mitigation when the guest driver is ok. For tx we have either a
>> timer or a bh; for rx, and only for tap, we have 756ae78b. We
>> probably need fixes for the other backends.
> I agree. It's not directly related to the change Fam is proposing
> though.
>
>> 2) When the driver is not ok, the point is that we should not poll
>> the backend at all (I believe this is one of the main objectives of
>> the main loop). Things like tap_can_send(), and the commit that
>> dropped tap_can_send(), all followed this rule. But this patch does
>> not, so we end up with:
>>
>> - driver not ok, or no driver: qemu keeps reading and dropping
>>   packets.
> It's correct from a functionality perspective. When rx is disabled or
> the link is down, packets should be dropped.
>
> From a performance perspective I agree it would be better to drop
> packets earlier in the stack.
>
> If you want tap to drop packets efficiently during link down/receive
> disabled/etc:
>
> What's the best way to disable tap rx so the host kernel drops
> packets instead of queuing them?

There's no such API currently, especially for an unprivileged qemu.
But tuntap will drop packets once they exceed tx_queue_length, which
still looks much better than the current behaviour.

>
>> - driver ok but not enough rx buffers: qemu will disable tap read
>>   poll.
> QEMU relies on the kernel's socket receive buffer or netif receive
> queue to hold incoming packets. When QEMU enables tap read poll again
> it receives the queued packets.
>
> Queuing is necessary so that USB net emulation (the rx buffer is only
> 1 packet!) and maybe slirp work.

Yes.

>
> I think real NICs drop packets when the rx ring is exhausted, but we
> need to keep the queuing behavior unless someone proposes a way to
> eliminate it.

Yes, queueing seems useless if the backend can do it. So I'm ok with
the patch, and we can optimize in the future.

Thanks
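For reference, the "explicit flush" mentioned in the commit message is
what the rx virtqueue handler does: once the guest posts new rx
buffers, virtio-net flushes the packets the net layer queued while
delivery was held off. A roughly contemporary sketch (simplified;
helper names may differ slightly between QEMU versions):

static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIONet *n = VIRTIO_NET(vdev);
    int queue_index = vq2q(virtio_get_queue_index(vq));

    /* The guest has just added rx buffers: deliver any packets that
     * net/queue.c queued while the receive path was holding off the
     * sender, and re-enable delivery from the peer. */
    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
}

Without this flush, a peer that had been told to hold off would stay
blocked indefinitely, which is the complication the commit message
avoids by dropping .can_receive.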