From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
	netdev@vger.kernel.org, davem@davemloft.net,
	jeffrey.t.kirsher@intel.com
Subject: Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
Date: Fri, 22 Jun 2012 17:17:26 -0700	[thread overview]
Message-ID: <4FE50B16.6020606@intel.com> (raw)
In-Reply-To: <1340368430.4604.10280.camel@edumazet-glaptop>

On 06/22/2012 05:33 AM, Eric Dumazet wrote:
> On Thu, 2012-06-21 at 07:07 +0200, Eric Dumazet wrote:
>> On Wed, 2012-06-20 at 21:07 -0700, Alexander Duyck wrote:
>>> On 6/20/2012 6:21 AM, Eric Dumazet wrote:
>>>> +		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD |
>>>> +				       (NETDEV_FRAG_ORDER ? __GFP_COMP : 0),
>>>> +				       NETDEV_FRAG_ORDER);
>>> I was wondering whether you needed the check for NETDEV_FRAG_ORDER here.
>>> From what I can tell, setting __GFP_COMP on an order-0 page has no
>>> effect, since it only seems to get checked in prep_new_page, and that
>>> happens after a check of whether the page is order 0.
>> Good point, it seems some net drivers could be changed to remove
>> useless tests.
>>
>> I'll post some performance data as well.
> Here is the patch I tested here.
>
> Using 32768-byte allocations is actually nice for MTU=9000 traffic,
> since we can fit 3 frames per 32KB instead of only 2 frames (using the
> kmalloc-16384 slab).
>
> Also, I prefill page->_count with a high bias value, to avoid the
> get_page() we did for each allocated frag.
>
> In my profiles, the get_page() cost was dominant, because of false
> sharing with skb consumers (as they might run on different cpus)
>
> This way, when 32768 bytes are filled, we perform a single
> atomic_sub_return() and can recycle the page if we find we are the last
> user (this is what you did in your patch, when testing page->_count
> being 1)
This is working really nicely.  On my system, put_page() dropped to near
the bottom of my perf top runs, and netdev_alloc_frag() dropped from
about 4% CPU to 2%.

>
> Note : If I used max(PAGE_SIZE, 32768) for MAX_NETDEV_FRAGSIZE,
> gcc was not able to optimise get_order(MAX_NETDEV_FRAGSIZE), strange...
The issue is probably the type checking in the max macro.  You might
have better luck using max_t and specifying a type.
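Something along these lines is what I was thinking of (untested, just to
illustrate the max_t() suggestion):

	/*
	 * Hypothetical alternative: forcing both arguments to a common type
	 * avoids the type-checking temporaries in max() that may be getting
	 * in the optimizer's way.
	 */
	#define MAX_NETDEV_FRAGSIZE	max_t(unsigned int, 32768, PAGE_SIZE)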

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 5b21522..d31efa2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -296,9 +296,18 @@ EXPORT_SYMBOL(build_skb);
>  struct netdev_alloc_cache {
>  	struct page *page;
>  	unsigned int offset;
> +	unsigned int pagecnt_bias;
>  };
>  static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
>  
> +#if PAGE_SIZE > 32768
> +#define MAX_NETDEV_FRAGSIZE	PAGE_SIZE
> +#else
> +#define MAX_NETDEV_FRAGSIZE	32768
> +#endif
> +
> +#define NETDEV_PAGECNT_BIAS	(MAX_NETDEV_FRAGSIZE /		\
> +				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
>  /**
>   * netdev_alloc_frag - allocate a page fragment
>   * @fragsz: fragment size
I'm assuming the reason for using the size of skb_shared_info here is
that you don't expect any requests to be smaller than that?  That seems
reasonable, but is there any reason this couldn't be a smaller value
such as SMP_CACHE_BYTES?
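For example, something along these lines (purely illustrative, not
tested):

	/*
	 * Hypothetical variant: size the bias for the smallest fragment we
	 * might hand out (one cache line) instead of skb_shared_info.
	 */
	#define NETDEV_PAGECNT_BIAS	(MAX_NETDEV_FRAGSIZE / SMP_CACHE_BYTES)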

> @@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
>  	nc = &__get_cpu_var(netdev_alloc_cache);
>  	if (unlikely(!nc->page)) {
>  refill:
> -		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
> +		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
> +				       get_order(MAX_NETDEV_FRAGSIZE));
> +		if (unlikely(!nc->page))
> +			goto end;
> +recycle:
> +		atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
> +		nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
>  		nc->offset = 0;
>  	}
> -	if (likely(nc->page)) {
> -		if (nc->offset + fragsz > PAGE_SIZE) {
> -			put_page(nc->page);
> -			goto refill;
> -		}
> -		data = page_address(nc->page) + nc->offset;
> -		nc->offset += fragsz;
> -		get_page(nc->page);
> +	if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
> +		if (!atomic_sub_return(nc->pagecnt_bias,
> +				       &nc->page->_count))
> +			goto recycle;
> +		goto refill;
>  	}
> +	data = page_address(nc->page) + nc->offset;
> +	nc->offset += fragsz;
> +	nc->pagecnt_bias--; /* avoid get_page()/get_page() false sharing */
> +end:
>  	local_irq_restore(flags);
>  	return data;
>  }
> @@ -353,7 +369,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
>  	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
>  			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> -	if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
> +	if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
>  		void *data = netdev_alloc_frag(fragsz);
>  
>  		if (likely(data)) {
>
>
One idea I have been trying to make work is to run the offset in the
reverse direction, starting at MAX_NETDEV_FRAGSIZE and working back down
to 0.  The advantage of that approach is that you only have to perform
one check instead of two, and you can drop a goto.  The only problem I
have run into is that it doesn't seem to perform as well as your patch,
but I am still working out the details.
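Roughly, the hot path I have been experimenting with looks like the
sketch below (untested, reusing the names from your patch, with
refill/recycle setting nc->offset to MAX_NETDEV_FRAGSIZE instead of 0):

	if (nc->offset < fragsz) {
		/* not enough room left in this page */
		if (!atomic_sub_return(nc->pagecnt_bias, &nc->page->_count))
			goto recycle;
		goto refill;
	}
	nc->offset -= fragsz;
	data = page_address(nc->page) + nc->offset;
	nc->pagecnt_bias--;	/* decrement instead of calling get_page() */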

Thanks,

Alex

Thread overview: 19+ messages
2012-06-20  0:43 [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO Alexander Duyck
2012-06-20  1:49 ` Alexander Duyck
2012-06-20  5:36 ` Eric Dumazet
2012-06-20  8:17   ` Eric Dumazet
2012-06-20  8:44     ` Eric Dumazet
2012-06-20  9:04       ` David Miller
2012-06-20  9:14         ` Eric Dumazet
2012-06-20 13:21     ` Eric Dumazet
2012-06-21  4:07       ` Alexander Duyck
2012-06-21  5:07         ` Eric Dumazet
2012-06-22 12:33           ` Eric Dumazet
2012-06-23  0:17             ` Alexander Duyck [this message]
2012-06-29 23:04             ` Alexander Duyck
2012-06-30  8:39               ` Eric Dumazet
2012-06-21  5:56     ` David Miller
2012-06-20 16:30   ` Alexander Duyck
2012-06-20 17:14     ` Alexander Duyck
2012-06-20 18:41       ` Eric Dumazet
2012-06-20 20:10         ` Alexander Duyck
