netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev <netdev@vger.kernel.org>
Subject: Re: [RFC] about "net: orphan frags on receive" insanity
Date: Thu, 27 Jun 2013 10:09:10 +0300	[thread overview]
Message-ID: <20130627070910.GA5489@redhat.com> (raw)
In-Reply-To: <1372275874.3301.206.camel@edumazet-glaptop>

On Wed, Jun 26, 2013 at 12:44:34PM -0700, Eric Dumazet wrote:
> On Wed, 2013-06-26 at 22:22 +0300, Michael S. Tsirkin wrote:
> 
> > The point is we don't know the final destination of the packet
> > until it's going through the stack.
> > 
> > We don't want to trigger a copy for all data we get from tun:
> > we only want to do this if the data has a chance to get
> > queued somewhere indefinitely.
> 
> I think you missed my point.
> 
> I am pretty sure it should be done from netif_rx(), not from
> __netif_receive_skb_core()
> so that modern NIC devices do not have to pay this extra cost.

Yres, this is exactly what I thought you meant, but as I said,
this approach disables tun zero copy.
The point of the current code is to copy only if there
are any host protocols that consume the skb.

> # size net/core/dev_*.o
>    text	   data	    bss	    dec	    hex	filename
>   41928	    963	    752	  43643	   aa7b	net/core/dev_before.o
>   41579	    963	    752	  43294	   a91e	net/core/dev_after.o

So we have a lot of code like this:
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                }
                pt_prev = NULL;
and this is what created all this.

But in practice pt_prev is set when we have multiple
consumers for a packet - this is the standard path:
        if (pt_prev) {
                if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
                        goto drop;
                else
                        ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
        }
So I think calls to deliver_skb
are not all that common - we should not make gcc inline
them so aggressively, let it make its own decisions.

Here's an alternative patch:


diff --git a/net/core/dev.c b/net/core/dev.c
index fc1e289..03cb51c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1642,9 +1642,9 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
-static inline int deliver_skb(struct sk_buff *skb,
-			      struct packet_type *pt_prev,
-			      struct net_device *orig_dev)
+static int deliver_skb(struct sk_buff *skb,
+		       struct packet_type *pt_prev,
+		       struct net_device *orig_dev)
 {
 	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
 		return -ENOMEM;


This gives us more than half the gain of your patch
without breaking tun zero copy.

[mst@robin linux]$ size net/core/dev_orig.o 
   text    data     bss     dec     hex filename
  47357    1270     720   49347    c0c3 net/core/dev_orig.o
[mst@robin linux]$ size net/core/dev.o
   text    data     bss     dec     hex filename
  47105    1270     720   49095    bfc7 net/core/dev.o

Maybe we should tag calls to if (pt_prev) before deliver_skb
as unlikely, this will move this code further from the hot path -
this needs some testing.



> Untested patch :
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fc1e289..3730318 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1646,8 +1646,6 @@ static inline int deliver_skb(struct sk_buff *skb,
>  			      struct packet_type *pt_prev,
>  			      struct net_device *orig_dev)
>  {
> -	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> -		return -ENOMEM;
>  	atomic_inc(&skb->users);
>  	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>  }
> @@ -3133,6 +3131,9 @@ int netif_rx(struct sk_buff *skb)

this is basically assuming tun will use netif_rx forever,
but I think it's quite possible that we'll
switch it to napi down the road.

>  	if (netpoll_rx(skb))
>  		return NET_RX_DROP;
>  
> +	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> +		return NET_RX_DROP;
> +
>  	net_timestamp_check(netdev_tstamp_prequeue, skb);
>  
>  	trace_netif_rx(skb);

And this chunk means all tun data is immediately copied before
it's passed to net core, in effect, no zero copy transmit.

> @@ -3498,10 +3499,7 @@ ncls:
>  	}
>  
>  	if (pt_prev) {
> -		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> -			goto drop;
> -		else
> -			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> +		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>  	} else {
>  drop:
>  		atomic_long_inc(&skb->dev->rx_dropped);
>

      reply	other threads:[~2013-06-27  7:08 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-26  8:23 [RFC] about "net: orphan frags on receive" insanity Eric Dumazet
2013-06-26  9:56 ` Michael S. Tsirkin
2013-06-26 10:12   ` Eric Dumazet
2013-06-26 18:56     ` Michael S. Tsirkin
2013-06-26 19:09       ` Eric Dumazet
2013-06-26 19:22         ` Michael S. Tsirkin
2013-06-26 19:44           ` Eric Dumazet
2013-06-27  7:09             ` Michael S. Tsirkin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130627070910.GA5489@redhat.com \
    --to=mst@redhat.com \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).