From: Alexander Duyck <alexander.h.duyck@intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>,
netdev@vger.kernel.org, davem@davemloft.net,
jeffrey.t.kirsher@intel.com
Subject: Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
Date: Fri, 29 Jun 2012 16:04:39 -0700 [thread overview]
Message-ID: <4FEE3487.9080408@intel.com> (raw)
In-Reply-To: <1340368430.4604.10280.camel@edumazet-glaptop>
On 06/22/2012 05:33 AM, Eric Dumazet wrote:
> Here is the patch I tested here.
>
> Using 32768 bytes allocations is actually nice for MTU=9000 traffic,
> since we can fit 3 frames per 32KB instead of only 2 frames (using
> kmalloc-16384 slab))
>
> Also, I prefill page->_count with a high bias value, to avoid the
> get_page() we did for each allocated frag.
>
> In my profiles, the get_page() cost was dominant, because of false
> sharing with skb consumers (as they might run on different cpus)
>
> This way, when 32768 bytes are filled, we perform a single
> atomic_sub_return() and can recycle the page if we find we are the last
> user (this is what you did in your patch, when testing page->_count
> being 1)
>
> Note : If I used max(PAGE_SIZE, 32678) for MAX_NETDEV_FRAGSIZE,
> gcc was not able to optimise get_order(MAX_NETDEV_FRAGSIZE), strange...
>
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 5b21522..d31efa2 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -296,9 +296,18 @@ EXPORT_SYMBOL(build_skb);
> struct netdev_alloc_cache {
> struct page *page;
> unsigned int offset;
> + unsigned int pagecnt_bias;
> };
> static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
>
> +#if PAGE_SIZE > 32768
> +#define MAX_NETDEV_FRAGSIZE PAGE_SIZE
> +#else
> +#define MAX_NETDEV_FRAGSIZE 32768
> +#endif
> +
> +#define NETDEV_PAGECNT_BIAS (MAX_NETDEV_FRAGSIZE / \
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> /**
> * netdev_alloc_frag - allocate a page fragment
> * @fragsz: fragment size
> @@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
> nc = &__get_cpu_var(netdev_alloc_cache);
> if (unlikely(!nc->page)) {
> refill:
> - nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
> + nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
> + get_order(MAX_NETDEV_FRAGSIZE));
> + if (unlikely(!nc->page))
> + goto end;
> +recycle:
> + atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
> + nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
> nc->offset = 0;
> }
> - if (likely(nc->page)) {
> - if (nc->offset + fragsz > PAGE_SIZE) {
> - put_page(nc->page);
> - goto refill;
> - }
> - data = page_address(nc->page) + nc->offset;
> - nc->offset += fragsz;
> - get_page(nc->page);
> + if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
> + if (!atomic_sub_return(nc->pagecnt_bias,
> + &nc->page->_count))
> + goto recycle;
> + goto refill;
> }
> + data = page_address(nc->page) + nc->offset;
> + nc->offset += fragsz;
> + nc->pagecnt_bias--; /* avoid get_page()/get_page() false sharing */
> +end:
> local_irq_restore(flags);
> return data;
> }
> @@ -353,7 +369,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
> unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> - if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
> + if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
> void *data = netdev_alloc_frag(fragsz);
>
> if (likely(data)) {
>
>
I was wondering if there were any plans to clean this patch up and
submit it to net-next? If not, I can probably work on that since this
addressed the concerns I had in my original patch.
Thanks,
Alex
next prev parent reply other threads:[~2012-06-29 23:04 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-20 0:43 [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO Alexander Duyck
2012-06-20 1:49 ` Alexander Duyck
2012-06-20 5:36 ` Eric Dumazet
2012-06-20 8:17 ` Eric Dumazet
2012-06-20 8:44 ` Eric Dumazet
2012-06-20 9:04 ` David Miller
2012-06-20 9:14 ` Eric Dumazet
2012-06-20 13:21 ` Eric Dumazet
2012-06-21 4:07 ` Alexander Duyck
2012-06-21 5:07 ` Eric Dumazet
2012-06-22 12:33 ` Eric Dumazet
2012-06-23 0:17 ` Alexander Duyck
2012-06-29 23:04 ` Alexander Duyck [this message]
2012-06-30 8:39 ` Eric Dumazet
2012-06-21 5:56 ` David Miller
2012-06-20 16:30 ` Alexander Duyck
2012-06-20 17:14 ` Alexander Duyck
2012-06-20 18:41 ` Eric Dumazet
2012-06-20 20:10 ` Alexander Duyck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FEE3487.9080408@intel.com \
--to=alexander.h.duyck@intel.com \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=jeffrey.t.kirsher@intel.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).