netdev.vger.kernel.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Koichiro Den <den@klaipeden.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	virtualization@lists.linux-foundation.org,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
Date: Tue, 22 Aug 2017 10:50:41 +0800
Message-ID: <64d451ae-9944-e978-5a05-54bb1a62aaad@redhat.com>
In-Reply-To: <CAF=yD-LZ4=WAYfUtY7xRWi50FRSkrcOa+b7uc46xRnC4sbDCzQ@mail.gmail.com>



On 2017-08-21 23:41, Willem de Bruijn wrote:
> On Mon, Aug 21, 2017 at 8:33 AM, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2017-08-19 14:38, Koichiro Den wrote:
>>> Freeing on the xmit path can be delayed without bound, so we had
>>> better also invoke and clear the upper layer's zerocopy callback
>>> beforehand, to keep the upper layer from waiting in vain for an
>>> unbounded duration. For instance, this removes the possible deadlock
>>> in the case where the upper layer is a zerocopy-enabled vhost-net.
>>> This does not apply if napi_tx is enabled, since the callback will
>>> then be called in reasonable time.
>>>
>>> Signed-off-by: Koichiro Den <den@klaipeden.com>
>>> ---
>>>    drivers/net/virtio_net.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 4302f313d9a7..f7deaa5b7b50 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1290,6 +1290,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>          /* Don't wait up for transmitted skbs to be freed. */
>>>          if (!use_napi) {
>>> +               if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>>> +                       struct ubuf_info *uarg;
>>> +                       uarg = skb_shinfo(skb)->destructor_arg;
>>> +                       if (uarg->callback)
>>> +                           uarg->callback(uarg, true);
>>> +                       skb_shinfo(skb)->destructor_arg = NULL;
>>> +                       skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
>>> +               }
>>>                  skb_orphan(skb);
>>>                  nf_reset(skb);
>>>          }
>>
>>
>> Interesting, deadlock could be treated as a radical case of the discussion
>> here: https://patchwork.kernel.org/patch/3787671/.
>>
>> git grep shows more similar skb_orphan() cases. Do we need to change them
>> all (or some of them)?
> Most skb_orphan calls are not relevant to the issue of transmit delay.

Yes, but at least we should audit the ones in drivers/net.
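
For anyone auditing those call sites, the step the patch adds before
skb_orphan() can be modeled in userspace roughly as below. This is only a
minimal sketch: the mock_* types, MOCK_SKBTX_DEV_ZEROCOPY and its value, and
mock_zcopy_release_early() are hypothetical stand-ins, not kernel definitions.
It models the control flow only and says nothing about the lifetime of the
pages behind the skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; the real SKBTX_DEV_ZEROCOPY lives in the kernel. */
#define MOCK_SKBTX_DEV_ZEROCOPY (1u << 3)

/* Stand-in for the upper layer's completion context. */
struct mock_ubuf_info {
    void (*callback)(struct mock_ubuf_info *uarg, bool zerocopy_success);
    int completions;            /* how many times the callback has run */
};

/* Stand-in for the two skb_shinfo() fields the patch touches. */
struct mock_skb {
    unsigned int tx_flags;
    void *destructor_arg;
};

/* Example upper-layer callback: just count completions. */
static void mock_zerocopy_done(struct mock_ubuf_info *uarg, bool success)
{
    (void)success;
    uarg->completions++;
}

/*
 * Notify the upper layer now instead of when the skb is eventually freed,
 * then drop the zerocopy state so the callback cannot fire a second time.
 * Returns true if a zerocopy skb was handled, false otherwise.
 */
static bool mock_zcopy_release_early(struct mock_skb *skb)
{
    struct mock_ubuf_info *uarg;

    assert(skb != NULL);
    if (!(skb->tx_flags & MOCK_SKBTX_DEV_ZEROCOPY))
        return false;

    uarg = skb->destructor_arg;
    if (uarg && uarg->callback)
        uarg->callback(uarg, true);

    skb->destructor_arg = NULL;
    skb->tx_flags &= ~MOCK_SKBTX_DEV_ZEROCOPY;
    return true;
}
```

Calling mock_zcopy_release_early() a second time on the same skb is a no-op,
which is the property any similar call site would need to preserve.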

>
>> Actually, we may meet similar issues in many other places (e.g. netem).
> Netem is an interesting case. Because it is intended to mimic network
> delay, at least in the case where it calls skb_orphan, it may make
> sense to release all references, including calling skb_zcopy_clear.
>
> In general, zerocopy reverts to copy on all paths that may cause
> unbounded delay due to another process. Guarding against delay
> induced by the administrator is infeasible. It is always possible to
> just pause the nic. Netem is one instance of that, and not unbounded.

The problem is that an admin may intend to delay traffic on only one
interface, yet this can actually delay or stall all traffic inside a VM.

>
>> We need
>> to consider a complete solution for this. Identifying all the places that could
>> delay a packet is one approach.
> The issue described in the referenced patch seems like head of line
> blocking between two flows. If one flow delays zerocopy descriptor
> release from the vhost-net pool, it blocks all subsequent descriptors
> in that pool from being released, also delaying other flows that use
> the same descriptor pool. If the pool is empty, all transmission stops.
>
> Reverting to copy tx when the pool reaches a low watermark, as the
> patch does, fixes this.

One issue with the referenced patch is that sndbuf could be smaller than
the low watermark.
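
To make the head-of-line blocking and the low-watermark fallback concrete,
here is a minimal userspace sketch of a descriptor pool that releases slots
strictly in order. The names (zc_pool, POOL_SIZE, pool_*) and the pool size
are hypothetical and are not taken from vhost-net; it only models the
behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 8             /* hypothetical pool size */

struct zc_pool {
    bool done[POOL_SIZE];       /* completion seen for this slot */
    size_t head;                /* oldest slot not yet released */
    size_t tail;                /* next slot to hand out */
    size_t in_flight;           /* slots handed out and not yet released */
};

static void pool_init(struct zc_pool *p)
{
    memset(p, 0, sizeof(*p));
}

/* Hand out the next slot, or return -1 if the pool is exhausted. */
static int pool_submit(struct zc_pool *p)
{
    int slot;

    if (p->in_flight == POOL_SIZE)
        return -1;
    slot = (int)p->tail;
    p->done[slot] = false;
    p->tail = (p->tail + 1) % POOL_SIZE;
    p->in_flight++;
    return slot;
}

/* Record a completion; completions may arrive in any order. */
static void pool_complete(struct zc_pool *p, int slot)
{
    assert(slot >= 0 && slot < POOL_SIZE);
    p->done[slot] = true;
}

/*
 * Release only the contiguous run of completed slots at the head.
 * One delayed slot at the head blocks every later slot behind it.
 */
static size_t pool_release_in_order(struct zc_pool *p)
{
    size_t released = 0;

    while (p->in_flight > 0 && p->done[p->head]) {
        p->done[p->head] = false;
        p->head = (p->head + 1) % POOL_SIZE;
        p->in_flight--;
        released++;
    }
    return released;
}

/* Fall back to copying once the free slots drop to the low watermark. */
static bool pool_should_copy(const struct zc_pool *p, size_t low_watermark)
{
    return POOL_SIZE - p->in_flight <= low_watermark;
}
```

If slots 1 to 3 complete while slot 0 is still outstanding,
pool_release_in_order() releases nothing until slot 0 completes. The
watermark check only helps if the sender can actually queue enough packets
to reach it, which is the sndbuf concern.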

> Perhaps the descriptor pool should also be
> revised to allow out of order completions. Then there is no need to
> copy zerocopy packets whenever they may experience delay.

Yes, but as noted in the referenced thread, the Windows driver may treat
out-of-order completion as a bug.

>
> On the point of counting copy vs zerocopy: the new msg_zerocopy
> variant of ubuf_info has a field to record whether a deep copy was
> made. This can be used with vhost-net zerocopy, too.

Just to make sure I understand: it's still not clear to me how to reuse
this for vhost-net. For example, the zerocopy flag is in a union that
vhost_net does not use.
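
To illustrate the concern, here is a minimal userspace sketch. The layout of
ubuf_info_mock is an assumption modeled on how the two users appear to share
the structure (a desc/ctx view and an id/len/zerocopy view in one union); it
is not copied from the kernel headers, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: two views of the same storage (C11 anonymous members). */
struct ubuf_info_mock {
    void (*callback)(struct ubuf_info_mock *uarg, int zerocopy_success);
    union {
        struct {                    /* view a vhost-net style user fills in */
            unsigned long desc;
            void *ctx;
        };
        struct {                    /* view the msg_zerocopy variant uses */
            uint32_t id;
            uint16_t len;
            uint16_t zerocopy : 1;  /* the flag recording copy vs zerocopy */
            uint32_t bytelen;
        };
    };
};

/*
 * Start from an all-zero desc/ctx view, set the zerocopy flag through the
 * other view, and report whether the desc/ctx view was disturbed.
 * Returns 1 if the two views alias, 0 otherwise.
 */
static int zerocopy_flag_clobbers_desc_view(void)
{
    struct ubuf_info_mock u;

    memset(&u, 0, sizeof(u));
    assert(u.desc == 0 && u.ctx == NULL);

    u.zerocopy = 1;
    return u.desc != 0 || u.ctx != NULL;
}
```

Under this assumed layout a user of desc/ctx cannot also record the flag
without one field overwriting the other, so reusing it for vhost-net would
need the flag to live outside the union or vhost-net to stop using that view.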

Thanks
