From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
netdev@vger.kernel.org, davem@davemloft.net,
jeffrey.t.kirsher@intel.com
Subject: Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
Date: Fri, 29 Jun 2012 16:04:39 -0700 [thread overview]
Message-ID: <4FEE3487.9080408@intel.com> (raw)
In-Reply-To: <1340368430.4604.10280.camel@edumazet-glaptop>
On 06/22/2012 05:33 AM, Eric Dumazet wrote:
> Here is the patch I tested here.
>
> Using 32768 bytes allocations is actually nice for MTU=9000 traffic,
> since we can fit 3 frames per 32KB instead of only 2 frames (using
> kmalloc-16384 slab))
>
> Also, I prefill page->_count with a high bias value, to avoid the
> get_page() we did for each allocated frag.
>
> In my profiles, the get_page() cost was dominant, because of false
> sharing with skb consumers (as they might run on different cpus)
>
> This way, when 32768 bytes are filled, we perform a single
> atomic_sub_return() and can recycle the page if we find we are the last
> user (this is what you did in your patch, when testing page->_count
> being 1)
>
> Note : If I used max(PAGE_SIZE, 32678) for MAX_NETDEV_FRAGSIZE,
> gcc was not able to optimise get_order(MAX_NETDEV_FRAGSIZE), strange...
>
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 5b21522..d31efa2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -296,9 +296,18 @@ EXPORT_SYMBOL(build_skb);
> struct netdev_alloc_cache {
> struct page *page;
> unsigned int offset;
> + unsigned int pagecnt_bias;
> };
> static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
>
> +#if PAGE_SIZE > 32768
> +#define MAX_NETDEV_FRAGSIZE PAGE_SIZE
> +#else
> +#define MAX_NETDEV_FRAGSIZE 32768
> +#endif
> +
> +#define NETDEV_PAGECNT_BIAS (MAX_NETDEV_FRAGSIZE / \
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> /**
> * netdev_alloc_frag - allocate a page fragment
> * @fragsz: fragment size
> @@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
> nc = &__get_cpu_var(netdev_alloc_cache);
> if (unlikely(!nc->page)) {
> refill:
> - nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
> + nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
> + get_order(MAX_NETDEV_FRAGSIZE));
> + if (unlikely(!nc->page))
> + goto end;
> +recycle:
> + atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
> + nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
> nc->offset = 0;
> }
> - if (likely(nc->page)) {
> - if (nc->offset + fragsz > PAGE_SIZE) {
> - put_page(nc->page);
> - goto refill;
> - }
> - data = page_address(nc->page) + nc->offset;
> - nc->offset += fragsz;
> - get_page(nc->page);
> + if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
> + if (!atomic_sub_return(nc->pagecnt_bias,
> + &nc->page->_count))
> + goto recycle;
> + goto refill;
> }
> + data = page_address(nc->page) + nc->offset;
> + nc->offset += fragsz;
> + nc->pagecnt_bias--; /* avoid get_page()/get_page() false sharing */
> +end:
> local_irq_restore(flags);
> return data;
> }
> @@ -353,7 +369,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
> unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> - if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
> + if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
> void *data = netdev_alloc_frag(fragsz);
>
> if (likely(data)) {
>
>
I was wondering if there were any plans to clean this patch up and
submit it to net-next? If not, I can probably work on that since this
addressed the concerns I had in my original patch.
Thanks,
Alex
next prev parent reply other threads:[~2012-06-29 23:04 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-20 0:43 [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO Alexander Duyck
2012-06-20 1:49 ` Alexander Duyck
2012-06-20 5:36 ` Eric Dumazet
2012-06-20 8:17 ` Eric Dumazet
2012-06-20 8:44 ` Eric Dumazet
2012-06-20 9:04 ` David Miller
2012-06-20 9:14 ` Eric Dumazet
2012-06-20 13:21 ` Eric Dumazet
2012-06-21 4:07 ` Alexander Duyck
2012-06-21 5:07 ` Eric Dumazet
2012-06-22 12:33 ` Eric Dumazet
2012-06-23 0:17 ` Alexander Duyck
2012-06-29 23:04 ` Alexander Duyck [this message]
2012-06-30 8:39 ` Eric Dumazet
2012-06-21 5:56 ` David Miller
2012-06-20 16:30 ` Alexander Duyck
2012-06-20 17:14 ` Alexander Duyck
2012-06-20 18:41 ` Eric Dumazet
2012-06-20 20:10 ` Alexander Duyck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FEE3487.9080408@intel.com \
--to=alexander.h.duyck@intel.com \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=jeffrey.t.kirsher@intel.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.