From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev <netdev@vger.kernel.org>
Subject: Re: [RFC] about "net: orphan frags on receive" insanity
Date: Thu, 27 Jun 2013 10:09:10 +0300 [thread overview]
Message-ID: <20130627070910.GA5489@redhat.com> (raw)
In-Reply-To: <1372275874.3301.206.camel@edumazet-glaptop>
On Wed, Jun 26, 2013 at 12:44:34PM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-26 at 22:22 +0300, Michael S. Tsirkin wrote:
>
> > The point is we don't know the final destination of the packet
> > until it's going through the stack.
> >
> > We don't want to trigger a copy for all data we get from tun:
> > we only want to do this if the data has a chance to get
> > queued somewhere indefinitely.
>
> I think you missed my point.
>
> I am pretty sure it should be done from netif_rx(), not from
> __netif_receive_skb_core()
> so that modern NIC devices do not have to pay this extra cost.
Yres, this is exactly what I thought you meant, but as I said,
this approach disables tun zero copy.
The point of the current code is to copy only if there
are any host protocols that consume the skb.
> # size net/core/dev_*.o
> text data bss dec hex filename
> 41928 963 752 43643 aa7b net/core/dev_before.o
> 41579 963 752 43294 a91e net/core/dev_after.o
So we have a lot of code like this:
if (pt_prev) {
ret = deliver_skb(skb, pt_prev, orig_dev);
}
pt_prev = NULL;
and this is what created all this.
But in practice pt_prev is set when we have multiple
consumers for a packet - this is the standard path:
if (pt_prev) {
if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
goto drop;
else
ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
}
So I think calls to deliver_skb
are not all that common - we should not make gcc inline
them so aggressively, let it make its own decisions.
Here's an alternative patch:
diff --git a/net/core/dev.c b/net/core/dev.c
index fc1e289..03cb51c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1642,9 +1642,9 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
}
EXPORT_SYMBOL_GPL(dev_forward_skb);
-static inline int deliver_skb(struct sk_buff *skb,
- struct packet_type *pt_prev,
- struct net_device *orig_dev)
+static int deliver_skb(struct sk_buff *skb,
+ struct packet_type *pt_prev,
+ struct net_device *orig_dev)
{
if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
return -ENOMEM;
This gives us more than half the gain of your patch
without breaking tun zero copy.
[mst@robin linux]$ size net/core/dev_orig.o
text data bss dec hex filename
47357 1270 720 49347 c0c3 net/core/dev_orig.o
[mst@robin linux]$ size net/core/dev.o
text data bss dec hex filename
47105 1270 720 49095 bfc7 net/core/dev.o
Maybe we should tag calls to if (pt_prev) before deliver_skb
as unlikely, this will move this code further from the hot path -
this needs some testing.
> Untested patch :
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fc1e289..3730318 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1646,8 +1646,6 @@ static inline int deliver_skb(struct sk_buff *skb,
> struct packet_type *pt_prev,
> struct net_device *orig_dev)
> {
> - if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> - return -ENOMEM;
> atomic_inc(&skb->users);
> return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> }
> @@ -3133,6 +3131,9 @@ int netif_rx(struct sk_buff *skb)
this is basically assuming tun will use netif_rx forever,
but I think it's quite possible that we'll
switch it to napi down the road.
> if (netpoll_rx(skb))
> return NET_RX_DROP;
>
> + if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> + return NET_RX_DROP;
> +
> net_timestamp_check(netdev_tstamp_prequeue, skb);
>
> trace_netif_rx(skb);
And this chunk means all tun data is immediately copied before
it's passed to net core, in effect, no zero copy transmit.
> @@ -3498,10 +3499,7 @@ ncls:
> }
>
> if (pt_prev) {
> - if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> - goto drop;
> - else
> - ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> + ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> } else {
> drop:
> atomic_long_inc(&skb->dev->rx_dropped);
>
prev parent reply other threads:[~2013-06-27 7:08 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-26 8:23 [RFC] about "net: orphan frags on receive" insanity Eric Dumazet
2013-06-26 9:56 ` Michael S. Tsirkin
2013-06-26 10:12 ` Eric Dumazet
2013-06-26 18:56 ` Michael S. Tsirkin
2013-06-26 19:09 ` Eric Dumazet
2013-06-26 19:22 ` Michael S. Tsirkin
2013-06-26 19:44 ` Eric Dumazet
2013-06-27 7:09 ` Michael S. Tsirkin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130627070910.GA5489@redhat.com \
--to=mst@redhat.com \
--cc=eric.dumazet@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).