Date: Mon, 13 Jul 2015 12:52:45 +0800
From: Jason Wang
Subject: Re: [Qemu-devel] [PATCH] virtio-net: Drop net_virtio_info.can_receive
To: Stefan Hajnoczi
Cc: Fam Zheng, qemu-devel@nongnu.org, "Michael S. Tsirkin"

On 07/08/2015 06:50 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 07, 2015 at 04:45:41PM +0800, Jason Wang wrote:
>>
>> On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
>>> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
>>>> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
>>>>>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
>>>>>>> virtio_net_receive still does the check by calling
>>>>>>> virtio_net_can_receive; if the device or driver is not ready,
>>>>>>> the packet is dropped.
>>>>>>>
>>>>>>> This is necessary because returning false from can_receive
>>>>>>> complicates things: the peer would disable sending until we
>>>>>>> explicitly flush the queue.
>>>>>>>
>>>>>>> Signed-off-by: Fam Zheng
>>>>>>> ---
>>>>>>>  hw/net/virtio-net.c | 1 -
>>>>>>>  1 file changed, 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>> index d728233..dbef0d0 100644
>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>>>>>  static NetClientInfo net_virtio_info = {
>>>>>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>>>>>      .size = sizeof(NICState),
>>>>>>> -    .can_receive = virtio_net_can_receive,
>>>>>>>      .receive = virtio_net_receive,
>>>>>>>      .link_status_changed = virtio_net_set_link_status,
>>>>>>>      .query_rx_filter = virtio_net_query_rxfilter,
>>>>>> A side effect of this patch is that it will read and then drop
>>>>>> packets if the guest driver is not ok.
>>>>> I think that the semantics of .can_receive() and .receive() return
>>>>> values are currently incorrect in many NICs. They have
>>>>> .can_receive() functions that return false for conditions where
>>>>> .receive() would discard the packet. So what happens is that
>>>>> packets get queued when they should actually be discarded.
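For context, after this patch the only remaining check is the one
inside the receive path itself. Roughly, from the virtio-net code of
this period (simplified sketch; exact conditions and return
conventions vary between QEMU versions):

static int virtio_net_can_receive(NetClientState *nc)
{
    VirtIONet *n = qemu_get_nic_opaque(nc);
    VirtIODevice *vdev = VIRTIO_DEVICE(n);
    VirtIONetQueue *q = virtio_net_get_subqueue(nc);

    /* Not ready: VM stopped, queue not in use, rx vq not set up,
     * or the guest driver has not reached DRIVER_OK. */
    if (!vdev->vm_running ||
        nc->queue_index >= n->curr_queues ||
        !virtio_queue_ready(q->rx_vq) ||
        !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
        return 0;
    }
    return 1;
}

static ssize_t virtio_net_receive(NetClientState *nc,
                                  const uint8_t *buf, size_t size)
{
    /* With .can_receive dropped from net_virtio_info, the packet has
     * already been read from the backend by the time we get here; a
     * negative return means "consumed", so the net layer drops the
     * packet instead of queuing it. */
    if (!virtio_net_can_receive(nc)) {
        return -1;
    }
    ...
}

The behavioural difference under discussion is exactly this: a false
return from .can_receive() makes the net layer queue the packet and
hold off the sender, while a drop inside .receive() costs one backend
read per packet.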
>>>> Yes, but they are bugs more or less.
>>>>
>>>>> The purpose of the flow control (queuing) mechanism is to tell the
>>>>> sender to hold off until the receiver has more rx buffers
>>>>> available. It's a short-term thing that doesn't include link down,
>>>>> rx disable, or NIC reset states.
>>>>>
>>>>> Therefore, I think this patch will not introduce a regression. It
>>>>> is adjusting the code to stop queuing packets when they should
>>>>> actually be dropped.
>>>>>
>>>>> Thoughts?
>>>> I agree there's no functional issue. But it causes wasted CPU
>>>> cycles (consider a guest that is being flooded). Sometimes it may
>>>> even be dangerous. For tap we're probably ok since we have
>>>> 756ae78b, but for the other backends we don't have that.
>>> If the guest uses iptables rules or other mechanisms to drop bogus
>>> packets the cost is even higher than discarding them at the QEMU
>>> layer.
>> But that was the guest's choice.
>>
>>> What's more is that if you're using link down as a DoS mitigation
>>> strategy then you might as well hot unplug the NIC.
>>>
>>> Stefan
>> I think there are two problems for virtio-net:
>>
>> 1) Mitigation when the guest driver is ok. For tx we have either a
>> timer or a bh; for rx, and only for tap, we have 756ae78b. We
>> probably need fixes for the other backends.
> I agree. It's not directly related to the change Fam is proposing
> though.
>
>> 2) When the driver is not ok, the point is that we should not poll
>> the backend at all (I believe this is one of the main objectives of
>> the main loop). Things like tap_can_send(), and the commit that
>> dropped tap_can_send(), all followed this rule. But this patch does
>> not, so we end up with:
>>
>> - driver not ok, or no driver: qemu keeps reading and dropping
>>   packets.
> It's correct from a functionality perspective. When rx is disabled or
> the link is down, packets should be dropped.
>
> From a performance perspective I agree it would be better to drop
> packets earlier in the stack.
>
> If you want tap to drop packets efficiently during link down/receive
> disabled/etc:
>
> What's the best way to disable tap rx so the host kernel drops
> packets instead of queuing them?

There's no such API currently, especially for an unprivileged qemu.
But tuntap will drop packets once they exceed tx_queue_length, which
still looks much better than the current behaviour.

>
>> - driver ok but not enough rx buffers: qemu will disable tap read
>>   poll.
> QEMU relies on the kernel's socket receive buffer or netif receive
> queue to hold incoming packets. When QEMU enables tap read poll again
> it receives the queued packets.
>
> Queuing is necessary so that USB net emulation (the rx buffer is only
> 1 packet!) and maybe slirp work.

Yes.

>
> I think real NICs drop packets when the rx ring is exhausted, but we
> need to keep the queuing behavior unless someone proposes a way to
> eliminate it.

Yes, queueing seems useless if the backend can do it. So I'm ok with
the patch, and we can optimize in the future.

Thanks
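For reference, the "explicit flush" mentioned in the commit message is
what the rx virtqueue handler does: once the guest posts new rx
buffers, virtio-net flushes the packets the net layer queued while
delivery was held off. A roughly contemporary sketch (simplified;
helper names may differ slightly between QEMU versions):

static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIONet *n = VIRTIO_NET(vdev);
    int queue_index = vq2q(virtio_get_queue_index(vq));

    /* The guest has just added rx buffers: deliver any packets that
     * net/queue.c queued while the receive path was holding off the
     * sender, and re-enable delivery from the peer. */
    qemu_flush_queued_packets(qemu_get_subqueue(n->nic, queue_index));
}

Without this flush, a peer that had been told to hold off would stay
blocked indefinitely, which is the complication the commit message
avoids by dropping .can_receive.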