All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Koichiro Den <den@klaipeden.com>,
	Jason Wang <jasowang@redhat.com>,
	virtualization@lists.linux-foundation.org,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
Date: Thu, 24 Aug 2017 23:50:53 +0300	[thread overview]
Message-ID: <20170824234551-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAF=yD-+9Ah8pC9i2w3Ad3WnhQit7Yo479pMrToty7priL6BFLw@mail.gmail.com>

On Thu, Aug 24, 2017 at 04:20:39PM -0400, Willem de Bruijn wrote:
> >> Traffic shaping can introduce msec timescale latencies.
> >>
> >> The delay may actually be a useful signal. If the guest does not
> >> orphan skbs early, TSQ will throttle the socket causing host
> >> queue build up.
> >>
> >> But, if completions are queued in-order, unrelated flows may be
> >> throttled as well. Allowing out of order completions would resolve
> >> this HoL blocking.
> >
> > We can allow out of order, no guests that follow virtio spec
> > will break. But this won't help in all cases
> > - a single slow flow can occupy the whole ring, you will not
> >   be able to make any new buffers available for the fast flow
> > - what host considers a single flow can be multiple flows for guest
> >
> > There are many other examples.
> 
> These examples are due to exhaustion of the fixed ubuf_info pool,
> right?

No - the ring size itself.

> We could use dynamic allocation or a resizable pool if these
> issues are serious enough.

We need some kind of limit on how many requests a guest can queue in the
host, or it's an obvious DoS attack vector. We used the ring size for
that.

> >> > Neither
> >> > do I see why would using tx interrupts within guest be a work around -
> >> > AFAIK windows driver uses tx interrupts.
> >>
> >> It does not address completion latency itself. What I meant was
> >> that in an interrupt-driver model, additional starvation issues,
> >> such as the potential deadlock raised at the start of this thread,
> >> or the timer delay observed before packets were orphaned in
> >> virtio-net in commit b0c39dbdc204, are mitigated.
> >>
> >> Specifically, it breaks the potential deadlock where sockets are
> >> blocked waiting for completions (to free up budget in sndbuf, tsq, ..),
> >> yet completion handling is blocked waiting for a new packet to
> >> trigger free_old_xmit_skbs from start_xmit.
> >
> > This talk of potential deadlock confuses me - I think you mean we would
> > deadlock if we did not orphan skbs in !use_napi - is that right?  If you
> > mean that you can drop skb orphan and this won't lead to a deadlock if
> > free skbs upon a tx interrupt, I agree, for sure.
> 
> Yes, that is what I meant.
> 
> >> >> That is the only thing keeping us from removing the HoL blocking in vhost-net zerocopy.
> >> >
> >> > We don't enable network watchdog on virtio but we could and maybe
> >> > should.
> >>
> >> Can you elaborate?
> >
> > The issue is that holding onto buffers for very long times makes guests
> > think they are stuck. This is funamentally because from guest point of
> > view this is a NIC, so it is supposed to transmit things out in
> > a timely manner. If host backs the virtual NIC by something that is not
> > a NIC, with traffic shaping etc introducing unbounded latencies,
> > guest will be confused.
> 
> That assumes that guests are fragile in this regard. A linux guest
> does not make such assumptions.

Yes it does. Examples above:
	> > - a single slow flow can occupy the whole ring, you will not
	> >   be able to make any new buffers available for the fast flow
	> > - what host considers a single flow can be multiple flows for guest

it's easier to see if you enable the watchdog timer for virtio. Linux
supports that.

> There are NICs with hardware
> rate limiting, so I'm not sure how much of a leap host os rate
> limiting is.

I don't know what happens if these NICs hold onto packets for seconds.

-- 
MST

  reply	other threads:[~2017-08-24 20:50 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-19  6:38 [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi Koichiro Den
2017-08-20 20:49 ` Willem de Bruijn
2017-08-21 12:40   ` Koichiro Den
2017-08-21 12:40   ` Koichiro Den
2017-08-22 12:11   ` Willem de Bruijn
2017-08-22 14:04     ` Koichiro Den
2017-08-22 17:19       ` Willem de Bruijn
2017-08-22 17:19       ` Willem de Bruijn
2017-08-23 14:26         ` Koichiro Den
2017-08-23 14:26         ` Koichiro Den
2017-08-22 14:04     ` Koichiro Den
2017-08-22 12:11   ` Willem de Bruijn
2017-08-21 12:33 ` Jason Wang
2017-08-21 12:33 ` Jason Wang
2017-08-21 12:58   ` Koichiro Den
2017-08-21 12:58   ` Koichiro Den
2017-08-21 15:41   ` Willem de Bruijn
2017-08-22  2:50     ` Jason Wang
2017-08-22  3:10       ` Willem de Bruijn
2017-08-22 11:47         ` Jason Wang
2017-08-22 13:42         ` Koichiro Den
2017-08-22 17:16           ` Willem de Bruijn
2017-08-22 17:16           ` Willem de Bruijn
2017-08-23 14:24             ` Koichiro Den
2017-08-23 14:24             ` Koichiro Den
2017-08-22  3:10       ` Willem de Bruijn
2017-08-22 17:55       ` Michael S. Tsirkin
2017-08-22 17:55       ` Michael S. Tsirkin
2017-08-22 18:01         ` David Miller
2017-08-22 18:01         ` David Miller
2017-08-22 18:28           ` Eric Dumazet
2017-08-22 18:39             ` Michael S. Tsirkin
2017-08-22 18:39             ` Michael S. Tsirkin
2017-08-23 14:28         ` Koichiro Den
2017-08-23 14:47           ` Koichiro Den
2017-08-23 14:47           ` Koichiro Den
2017-08-23 15:20           ` Willem de Bruijn
2017-08-23 15:20           ` Willem de Bruijn
2017-08-23 22:57             ` Michael S. Tsirkin
2017-08-24  3:28               ` Willem de Bruijn
2017-08-24  3:28               ` Willem de Bruijn
2017-08-24  4:34                 ` Michael S. Tsirkin
2017-08-24  4:34                 ` Michael S. Tsirkin
2017-08-24 13:50                 ` Michael S. Tsirkin
2017-08-24 20:20                   ` Willem de Bruijn
2017-08-24 20:50                     ` Michael S. Tsirkin [this message]
2017-08-25 22:44                       ` Willem de Bruijn
2017-08-25 22:44                       ` Willem de Bruijn
2017-08-25 23:32                         ` Michael S. Tsirkin
2017-08-26  1:03                           ` Willem de Bruijn
2017-08-26  1:03                           ` Willem de Bruijn
2017-08-29 19:35                             ` Willem de Bruijn
2017-08-29 19:42                               ` Michael S. Tsirkin
2017-08-29 19:53                                 ` Willem de Bruijn
2017-08-29 20:40                                   ` Michael S. Tsirkin
2017-08-29 22:55                                     ` Willem de Bruijn
2017-08-29 20:40                                   ` Michael S. Tsirkin
2017-08-29 19:53                                 ` Willem de Bruijn
2017-08-29 19:42                               ` Michael S. Tsirkin
2017-08-30  1:45                               ` Jason Wang
2017-08-30  1:45                               ` Jason Wang
2017-08-30  3:11                                 ` Willem de Bruijn
2017-08-30  3:11                                 ` Willem de Bruijn
2017-09-01  3:08                                   ` Jason Wang
2017-09-01  3:08                                   ` Jason Wang
2017-08-31 14:30                               ` Willem de Bruijn
2017-08-31 14:30                               ` Willem de Bruijn
2017-09-01  3:25                                 ` Jason Wang
2017-09-01  3:25                                 ` Jason Wang
2017-09-01 16:15                                   ` Willem de Bruijn
2017-09-01 16:17                                     ` Willem de Bruijn
2017-09-04  3:03                                       ` Jason Wang
2017-09-04  3:03                                       ` Jason Wang
2017-09-05 14:09                                         ` Willem de Bruijn
2017-09-06  3:27                                           ` Michael S. Tsirkin
2017-09-06  3:27                                           ` Michael S. Tsirkin
2017-09-01 16:17                                     ` Willem de Bruijn
2017-09-01 16:15                                   ` Willem de Bruijn
2017-08-25 23:32                         ` Michael S. Tsirkin
2017-08-24 20:50                     ` Michael S. Tsirkin
2017-08-24 20:20                   ` Willem de Bruijn
2017-08-23 22:57             ` Michael S. Tsirkin
2017-08-23 14:28         ` Koichiro Den
2017-08-22  2:50     ` Jason Wang
2017-08-21 15:41   ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170824234551-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=den@klaipeden.com \
    --cc=jasowang@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.