All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH 2/4] virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb.
Date: Wed, 3 Jun 2009 12:47:04 +0930	[thread overview]
Message-ID: <200906031247.05591.rusty@rustcorp.com.au> (raw)
In-Reply-To: <20090602234532.GA5417@gondor.apana.org.au>

On Wed, 3 Jun 2009 09:15:32 am Herbert Xu wrote:
> On Tue, Jun 02, 2009 at 11:25:57PM +0930, Rusty Russell wrote:
> > Or, we could just "return NETDEV_TX_BUSY;".  I like that :)
>
> No you should fix it so that you check the queue status after
> transmitting a packet so we never get into this state in the
> first place.

We could figure out if we can take the worst-case packet, and underutilize
our queue.  And fix the other *67* drivers.

Of course that doesn't even work, because we return NETDEV_TX_BUSY from dev.c!

"Hi, core netdevs here.  Don't use NETDEV_TX_BUSY.   Yeah, we can't figure out
how to avoid it either.  But y'know, just hack something together".

Herbert, we are *better* than this!

How's this?  Tested for the virtio_net driver here.


[RFC] net: fix double-tcpdump problem with NETDEV_TX_BUSY.

Herbert shares a distain for drivers returning TX_BUSY because
network taps see packets twice when it's used.  Unfortunately, it's
ubiquitous.

This patch marks packets by (ab)using the "peeked" bit in the skb.
This bit is currently used for packets queued in a socket; we reset it
in dev_queue_xmit and set it when we hand the packet to
dev_queue_xmit_nit.

We also reset it on incoming packets: this is safe, but it might be
sufficient to reset it only in the loopback driver?

diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1678,8 +1678,10 @@ int dev_hard_start_xmit(struct sk_buff *
 	int rc;
 
 	if (likely(!skb->next)) {
-		if (!list_empty(&ptype_all))
+		if (!list_empty(&ptype_all) && !skb->peeked) {
 			dev_queue_xmit_nit(skb, dev);
+			skb->peeked = true;
+		}
 
 		if (netif_needs_gso(dev, skb)) {
 			if (unlikely(dev_gso_segment(skb)))
@@ -1796,6 +1798,8 @@ int dev_queue_xmit(struct sk_buff *skb)
 	struct Qdisc *q;
 	int rc = -ENOMEM;
 
+	skb->peeked = false;
+
 	/* GSO will handle the following emulations directly. */
 	if (netif_needs_gso(dev, skb))
 		goto gso;
@@ -1942,6 +1946,8 @@ int netif_rx(struct sk_buff *skb)
 	if (!skb->tstamp.tv64)
 		net_timestamp(skb);
 
+	skb->peeked = false;
+
 	/*
 	 * The code is rearranged so that the path is the most
 	 * short when CPU is congested, but is still operating.


  parent reply	other threads:[~2009-06-03  3:17 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-29 14:16 [PATCH 2/4] virtio_net: return NETDEV_TX_BUSY instead of queueing an extra skb Rusty Russell
2009-06-02  8:07 ` Mark McLoughlin
2009-06-02  8:07 ` Mark McLoughlin
2009-06-02 14:04   ` Rusty Russell
2009-06-02 14:04   ` Rusty Russell
2009-06-02  9:05 ` Herbert Xu
2009-06-02  9:05 ` Herbert Xu
2009-06-02 13:55   ` Rusty Russell
2009-06-02 13:55   ` Rusty Russell
2009-06-02 23:45     ` Herbert Xu
2009-06-03  3:17       ` Rusty Russell
2009-06-03  3:17       ` Rusty Russell [this message]
2009-06-08  5:22         ` Herbert Xu
2009-06-08  5:22         ` Herbert Xu
2009-06-13 12:30           ` Rusty Russell
2009-06-13 12:30           ` Rusty Russell
2009-06-14  6:45             ` Herbert Xu
2009-06-14  6:45             ` Herbert Xu
2009-06-18  7:17               ` Rusty Russell
2009-06-18  7:34                 ` Herbert Xu
2009-06-19  3:37                   ` Rusty Russell
2009-06-19  3:37                   ` Rusty Russell
2009-06-19  4:36                     ` Herbert Xu
2009-06-19  4:36                     ` Herbert Xu
2009-06-19 13:50                       ` Rusty Russell
2009-06-19 14:10                         ` Herbert Xu
2009-06-19 14:10                         ` Herbert Xu
2009-06-22  2:39                           ` Rusty Russell
2009-06-22  2:39                           ` Rusty Russell
2009-06-19 13:50                       ` Rusty Russell
2009-06-22  5:46                       ` Krishna Kumar2
2009-06-22  7:34                         ` Herbert Xu
2009-06-22  7:34                         ` Herbert Xu
2009-06-22 13:41                           ` Krishna Kumar2
2009-06-22 13:41                           ` Krishna Kumar2
2009-06-22 18:25                           ` Matt Carlson
2009-06-22 18:25                           ` Matt Carlson
2009-06-23  2:54                             ` Herbert Xu
2009-06-23  2:54                             ` Herbert Xu
2009-06-22  5:46                       ` Krishna Kumar2
2009-06-18  7:34                 ` Herbert Xu
2009-06-18  7:17               ` Rusty Russell
2009-06-02 23:45     ` Herbert Xu
  -- strict thread matches above, loose matches on Subject: below --
2009-05-29 14:16 Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200906031247.05591.rusty@rustcorp.com.au \
    --to=rusty@rustcorp.com.au \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.