* [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20  0:43 UTC
To: netdev; +Cc: davem, jeffrey.t.kirsher

This patch is meant to help improve system performance when
netdev_alloc_frag is used in scenarios in which buffers are short lived.
This is accomplished by allowing the page offset to be reset in the event
that the page count is 1.  I also reversed the direction in which we give
out sections of the page, so that we start at the end of the page and end
at the start.  The main motivation is that I preferred to have the offset
represent the amount of the page remaining to be used.

My primary test case was using ixgbe in combination with TCP.  With this
patch applied I saw CPU utilization drop from 3.4% to 3.0% for a single
thread of netperf receiving a TCP stream via ixgbe.

I also tested several scenarios in which the page reuse would not be
possible, such as UDP flows and routing.  In both of these scenarios I saw
no noticeable performance degradation compared to the kernel without this
patch.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/core/skbuff.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..eb3853c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -317,15 +317,22 @@ void *netdev_alloc_frag(unsigned int fragsz)
 	if (unlikely(!nc->page)) {
 refill:
 		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
-		nc->offset = 0;
 	}
 	if (likely(nc->page)) {
-		if (nc->offset + fragsz > PAGE_SIZE) {
+		unsigned int offset = PAGE_SIZE;
+
+		if (page_count(nc->page) != 1)
+			offset = nc->offset;
+
+		if (offset < fragsz) {
 			put_page(nc->page);
 			goto refill;
 		}
-		data = page_address(nc->page) + nc->offset;
-		nc->offset += fragsz;
+
+		offset -= fragsz;
+		nc->offset = offset;
+
+		data = page_address(nc->page) + offset;
 		get_page(nc->page);
 	}
 	local_irq_restore(flags);
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20  1:49 UTC
To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher, Eric Dumazet

On 6/19/2012 5:43 PM, Alexander Duyck wrote:
> This patch is meant to help improve system performance when
> netdev_alloc_frag is used in scenarios in which buffers are short lived.
> [... full patch quoted ...]

It looks like I forgot to add "--auto" to the command line when I sent
this out via stg mail, so I am just adding Eric to the Cc list on this
reply.  Sorry for the extra noise.

Thanks,

Alex
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-20  5:36 UTC
To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher

On Tue, 2012-06-19 at 17:43 -0700, Alexander Duyck wrote:
> This patch is meant to help improve system performance when
> netdev_alloc_frag is used in scenarios in which buffers are short lived.
> [... full patch quoted ...]

I tested this idea one month ago and got unconvincing results, because
the branch was taken half of the time.

The cases where the page can be reused are probably specific to ixgbe,
because it uses a different allocator for the frags themselves;
netdev_alloc_frag() is only used to allocate the skb head.

For typical NICs, we allocate frags to populate the RX ring _way_ before
the packet is received by the NIC.

Then, I played with using order-2 pages instead of order-0 ones if
PAGE_SIZE < 8192.

No clear win either, but you might try this too.
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-20  8:17 UTC
To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher

On Wed, 2012-06-20 at 07:36 +0200, Eric Dumazet wrote:
> I tested this idea one month ago and got unconvincing results, because
> the branch was taken half of the time.
>
> The cases where the page can be reused are probably specific to ixgbe,
> because it uses a different allocator for the frags themselves;
> netdev_alloc_frag() is only used to allocate the skb head.
>
> For typical NICs, we allocate frags to populate the RX ring _way_ before
> the packet is received by the NIC.
>
> Then, I played with using order-2 pages instead of order-0 ones if
> PAGE_SIZE < 8192.
>
> No clear win either, but you might try this too.

By the way, a big cost in netdev_alloc_frag() is the irq masking/restore.
We probably could have a version for softirq users...

Another idea would be to use a pool of pages, instead of a single one, if
we want to play with the "clear the offset if the page count is one" idea.

Strange, I did the order-2 allocation benchmarks again and got good
results this time, but with the latest net-next; maybe things have
changed since the last time I tried this.

(netdev_alloc_frag(), get_page_from_freelist() and put_page() are less
prevalent in perf results.)
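[Illustrative sketch, not part of the thread: one shape such a
softirq-only variant could take.  The name __netdev_alloc_frag_bh() is
hypothetical, and the function is only correct under the assumption that
every caller already runs in softirq context (e.g. NAPI poll), so the
per-cpu cache is never entered from hard irq.

/* Hypothetical sketch: same logic as netdev_alloc_frag(), minus the
 * local_irq_save()/local_irq_restore() pair.  Softirqs do not nest on
 * a given cpu, so the per-cpu cache needs no irq protection as long as
 * no hard irq path ever calls this.
 */
static void *__netdev_alloc_frag_bh(unsigned int fragsz)
{
	struct netdev_alloc_cache *nc;
	void *data = NULL;

	nc = &__get_cpu_var(netdev_alloc_cache);
	if (unlikely(!nc->page)) {
refill:
		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
		nc->offset = 0;
	}
	if (likely(nc->page)) {
		if (nc->offset + fragsz > PAGE_SIZE) {
			put_page(nc->page);
			goto refill;
		}
		data = page_address(nc->page) + nc->offset;
		nc->offset += fragsz;
		get_page(nc->page);
	}
	return data;
}

The hard-irq-safe netdev_alloc_frag() would remain for the remaining
non-NAPI callers.]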
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-20  8:44 UTC
To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher

On Wed, 2012-06-20 at 10:17 +0200, Eric Dumazet wrote:
> Strange, I did the order-2 allocation benchmarks again and got good
> results this time, but with the latest net-next; maybe things have
> changed since the last time I tried this.
>
> (netdev_alloc_frag(), get_page_from_freelist() and put_page() are less
> prevalent in perf results.)

Oh well, I now remember why I abandoned this: machines don't have
infinite memory after all...

(put_page assumes an order-0 page)
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: David Miller @ 2012-06-20  9:04 UTC
To: eric.dumazet; +Cc: alexander.h.duyck, netdev, jeffrey.t.kirsher

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 10:44:07 +0200

> (put_page assumes an order-0 page)

But it certainly can handle compound pages, I thought?
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-20  9:14 UTC
To: David Miller; +Cc: alexander.h.duyck, netdev, jeffrey.t.kirsher

On Wed, 2012-06-20 at 02:04 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 20 Jun 2012 10:44:07 +0200
>
>> (put_page assumes an order-0 page)
>
> But it certainly can handle compound pages, I thought?

Yes, I forgot the __GFP_COMP flag ;)

Oh well, time for a coffee.
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-20 13:21 UTC
To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher

On Wed, 2012-06-20 at 10:17 +0200, Eric Dumazet wrote:
> Strange, I did the order-2 allocation benchmarks again and got good
> results this time, but with the latest net-next; maybe things have
> changed since the last time I tried this.
>
> (netdev_alloc_frag(), get_page_from_freelist() and put_page() are less
> prevalent in perf results.)

In fact, since SLUB uses order-3 pages for kmalloc-2048, I felt lucky
enough to try that as well, and the results are really good, on ixgbe at
least.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..ffd2cba 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -299,6 +299,9 @@ struct netdev_alloc_cache {
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
 
+#define MAX_NETDEV_FRAGSIZE	max_t(unsigned int, PAGE_SIZE, 32768)
+#define NETDEV_FRAG_ORDER	get_order(MAX_NETDEV_FRAGSIZE)
+
 /**
  * netdev_alloc_frag - allocate a page fragment
  * @fragsz: fragment size
@@ -316,11 +319,13 @@ void *netdev_alloc_frag(unsigned int fragsz)
 	nc = &__get_cpu_var(netdev_alloc_cache);
 	if (unlikely(!nc->page)) {
 refill:
-		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD |
+				       (NETDEV_FRAG_ORDER ? __GFP_COMP : 0),
+				       NETDEV_FRAG_ORDER);
 		nc->offset = 0;
 	}
 	if (likely(nc->page)) {
-		if (nc->offset + fragsz > PAGE_SIZE) {
+		if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
 			put_page(nc->page);
 			goto refill;
 		}
@@ -353,7 +358,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
+	if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
 		void *data = netdev_alloc_frag(fragsz);
 
 		if (likely(data)) {
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-21  4:07 UTC
To: Eric Dumazet; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher

On 6/20/2012 6:21 AM, Eric Dumazet wrote:
> In fact, since SLUB uses order-3 pages for kmalloc-2048, I felt lucky
> enough to try that as well, and the results are really good, on ixgbe at
> least.
>
> [...]
> +		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD |
> +				       (NETDEV_FRAG_ORDER ? __GFP_COMP : 0),
> +				       NETDEV_FRAG_ORDER);
>  		nc->offset = 0;
>  	}

I was wondering if you needed the check for NETDEV_FRAG_ORDER here.
From what I can tell, setting __GFP_COMP for an order-0 page has no
effect, since it only gets checked in prep_new_page(), and that is after
a check to verify whether the page is order 0 or not.

Thanks,

Alex
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-21  5:07 UTC
To: Alexander Duyck; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher

On Wed, 2012-06-20 at 21:07 -0700, Alexander Duyck wrote:
> I was wondering if you needed the check for NETDEV_FRAG_ORDER here.
> From what I can tell, setting __GFP_COMP for an order-0 page has no
> effect, since it only gets checked in prep_new_page(), and that is after
> a check to verify whether the page is order 0 or not.

Good point; it seems some net drivers could be changed to remove useless
tests.

I'll post some performance data as well.
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-22 12:33 UTC
To: Alexander Duyck; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher

On Thu, 2012-06-21 at 07:07 +0200, Eric Dumazet wrote:
> Good point; it seems some net drivers could be changed to remove useless
> tests.
>
> I'll post some performance data as well.

Here is the patch I tested here.

Using 32768-byte allocations is actually nice for MTU=9000 traffic,
since we can fit 3 frames per 32KB instead of only 2 frames (using the
kmalloc-16384 slab).

Also, I prefill page->_count with a high bias value, to avoid the
get_page() we did for each allocated frag.

In my profiles, the get_page() cost was dominant, because of false
sharing with skb consumers (as they might run on different cpus).

This way, when 32768 bytes are filled, we perform a single
atomic_sub_return() and can recycle the page if we find we are the last
user (this is what you did in your patch, when testing page->_count
being 1).

Note: if I used max(PAGE_SIZE, 32768) for MAX_NETDEV_FRAGSIZE, gcc was
not able to optimise get_order(MAX_NETDEV_FRAGSIZE).  Strange...
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..d31efa2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -296,9 +296,18 @@ EXPORT_SYMBOL(build_skb);
 struct netdev_alloc_cache {
 	struct page *page;
 	unsigned int offset;
+	unsigned int pagecnt_bias;
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
 
+#if PAGE_SIZE > 32768
+#define MAX_NETDEV_FRAGSIZE	PAGE_SIZE
+#else
+#define MAX_NETDEV_FRAGSIZE	32768
+#endif
+
+#define NETDEV_PAGECNT_BIAS (MAX_NETDEV_FRAGSIZE /		\
+			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 /**
  * netdev_alloc_frag - allocate a page fragment
  * @fragsz: fragment size
@@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
 	nc = &__get_cpu_var(netdev_alloc_cache);
 	if (unlikely(!nc->page)) {
 refill:
-		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
+				       get_order(MAX_NETDEV_FRAGSIZE));
+		if (unlikely(!nc->page))
+			goto end;
+recycle:
+		atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
+		nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
 		nc->offset = 0;
 	}
-	if (likely(nc->page)) {
-		if (nc->offset + fragsz > PAGE_SIZE) {
-			put_page(nc->page);
-			goto refill;
-		}
-		data = page_address(nc->page) + nc->offset;
-		nc->offset += fragsz;
-		get_page(nc->page);
+	if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
+		if (!atomic_sub_return(nc->pagecnt_bias,
+				       &nc->page->_count))
+			goto recycle;
+		goto refill;
 	}
+	data = page_address(nc->page) + nc->offset;
+	nc->offset += fragsz;
+	nc->pagecnt_bias--; /* avoid get_page()/get_page() false sharing */
+end:
 	local_irq_restore(flags);
 	return data;
 }
@@ -353,7 +369,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
+	if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
 		void *data = netdev_alloc_frag(fragsz);
 
 		if (likely(data)) {
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-23  0:17 UTC
To: Eric Dumazet; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher

On 06/22/2012 05:33 AM, Eric Dumazet wrote:
> Here is the patch I tested here.
>
> Using 32768-byte allocations is actually nice for MTU=9000 traffic,
> since we can fit 3 frames per 32KB instead of only 2 frames (using the
> kmalloc-16384 slab).
>
> Also, I prefill page->_count with a high bias value, to avoid the
> get_page() we did for each allocated frag.
>
> In my profiles, the get_page() cost was dominant, because of false
> sharing with skb consumers (as they might run on different cpus).
>
> This way, when 32768 bytes are filled, we perform a single
> atomic_sub_return() and can recycle the page if we find we are the last
> user (this is what you did in your patch, when testing page->_count
> being 1).

This is working really nicely.  On my system, put_page dropped to
somewhere near the bottom of the perf top runs I was doing.  In
addition, netdev_alloc_frag dropped from about 4% CPU to 2%.

> Note: if I used max(PAGE_SIZE, 32768) for MAX_NETDEV_FRAGSIZE, gcc was
> not able to optimise get_order(MAX_NETDEV_FRAGSIZE).  Strange...

The issue is probably the type checking in the max macro.  You might
have better luck using max_t and specifying a type.

> +#define NETDEV_PAGECNT_BIAS (MAX_NETDEV_FRAGSIZE /		\
> +			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

I'm assuming the reason for using the size of skb_shared_info here is
because you don't expect any requests to be smaller than that?  I
suppose that is reasonable, but is there any reason why this couldn't
be a smaller value such as SMP_CACHE_BYTES?
> @@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
> [... remainder of the patch quoted ...]

One idea I had, which I have been trying to figure out how to make work,
would be to actually run the offset in the reverse direction, so that
you start at MAX_NETDEV_FRAGSIZE and work your way back down to 0.  The
advantage of that approach is that you then only have to perform one
check instead of two, and you can drop a goto.  The only problem I have
been running into is that it doesn't seem to perform as well as your
patch, but I am still working out the details.

Thanks,

Alex
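[Illustrative sketch, not a patch from the thread: one way the
descending-offset idea above could be combined with the pagecnt_bias
recycling from Eric's patch.  The macros and fields are the ones Eric's
patch defines; the offset counts down from MAX_NETDEV_FRAGSIZE toward 0,
so the single test "offset < fragsz" covers page exhaustion.

/* Hypothetical combination of the two ideas: descending offset plus
 * NETDEV_PAGECNT_BIAS recycling.  Uses MAX_NETDEV_FRAGSIZE,
 * NETDEV_PAGECNT_BIAS and nc->pagecnt_bias as defined in the patch
 * quoted above.
 */
void *netdev_alloc_frag(unsigned int fragsz)
{
	struct netdev_alloc_cache *nc;
	unsigned long flags;
	void *data = NULL;

	local_irq_save(flags);
	nc = &__get_cpu_var(netdev_alloc_cache);
	if (unlikely(!nc->page)) {
refill:
		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
				       get_order(MAX_NETDEV_FRAGSIZE));
		if (unlikely(!nc->page))
			goto end;
recycle:
		atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
		nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
		nc->offset = MAX_NETDEV_FRAGSIZE;
	}
	if (nc->offset < fragsz) {
		/* Page exhausted: recycle it if we were the last user. */
		if (!atomic_sub_return(nc->pagecnt_bias, &nc->page->_count))
			goto recycle;
		goto refill;
	}
	nc->offset -= fragsz;
	nc->pagecnt_bias--;
	data = page_address(nc->page) + nc->offset;
end:
	local_irq_restore(flags);
	return data;
}]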
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-29 23:04 UTC
To: Eric Dumazet; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher

On 06/22/2012 05:33 AM, Eric Dumazet wrote:
> Here is the patch I tested here.
>
> [... patch and explanation quoted in full ...]

I was wondering if there were any plans to clean this patch up and
submit it to net-next?
If not, I can probably work on that, since it addresses the concerns I
had with my original patch.

Thanks,

Alex
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-30  8:39 UTC
To: Alexander Duyck; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher

On Fri, 2012-06-29 at 16:04 -0700, Alexander Duyck wrote:
> I was wondering if there were any plans to clean this patch up and
> submit it to net-next?  If not, I can probably work on that, since it
> addresses the concerns I had with my original patch.

I used this patch for a while on my machines, but I am working on
something allowing a fallback to order-0 allocations if memory gets
fragmented.

This fallback should almost never happen, but we should have it, just
in case.
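[Illustrative sketch, not the code Eric describes: one plausible shape
for such a fallback, where the refill path simply steps down the
allocation order when the large allocation fails.  The helper name and
the caller-side order tracking are hypothetical; the eventual upstream
code may differ.

/* Hypothetical refill helper: try the big order first, degrade toward
 * order-0 under memory fragmentation.  The caller must remember the
 * order actually obtained, since the usable frag space depends on it.
 */
static struct page *netdev_frag_page_refill(unsigned int *order)
{
	gfp_t gfp = GFP_ATOMIC | __GFP_COLD | __GFP_COMP |
		    __GFP_NOWARN | __GFP_NORETRY;
	struct page *page;
	int o;

	for (o = get_order(MAX_NETDEV_FRAGSIZE); o > 0; o--) {
		page = alloc_pages(gfp, o);
		if (page) {
			*order = o;
			return page;
		}
	}
	/* Last resort: a plain order-0 page, without NOWARN/NORETRY. */
	page = alloc_page(GFP_ATOMIC | __GFP_COLD);
	*order = 0;
	return page;
}]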
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: David Miller @ 2012-06-21  5:56 UTC
To: eric.dumazet; +Cc: alexander.h.duyck, netdev, jeffrey.t.kirsher

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 20 Jun 2012 10:17:03 +0200

> By the way, a big cost in netdev_alloc_frag() is the irq
> masking/restore.  We probably could have a version for softirq
> users...

That's an extremely disappointing way for us to be losing cycles.

Everyone pays this price purely because:

1) __netdev_alloc_skb() uses netdev_alloc_frag(), and

2) #1 is invoked, either directly or indirectly, by tons of slow
   non-NAPI drivers.

This got me looking into the plethora of interfaces we let drivers use
to allocate RX buffers.  It's a big mess.

We have dev_alloc_skb(), which essentially calls __netdev_alloc_skb()
with a NULL device argument.

This is terrible, because it means that if we want to do something
interesting on a per-device level, we can't rely upon the device being
non-NULL in __netdev_alloc_skb().

I looked at the remaining dev_alloc_skb() users, and these are in
places which are allocating packets in a module which is one level
removed from the netdevice level.  For example, ATM and the infiniband
IPATH driver.

What these callers want is something more like:

static inline struct sk_buff *alloc_skb_and_reserve_pad(unsigned int length,
							gfp_t gfp_mask)
{
	struct sk_buff *skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
					  0, NUMA_NO_NODE);

	if (likely(skb))
		skb_reserve(skb, NET_SKB_PAD);

	return skb;
}

Then we won't have the NULL device case for __netdev_alloc_skb() any
more.
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20 16:30 UTC
To: Eric Dumazet; +Cc: netdev, davem, jeffrey.t.kirsher

On 06/19/2012 10:36 PM, Eric Dumazet wrote:
> On Tue, 2012-06-19 at 17:43 -0700, Alexander Duyck wrote:
>> [... full patch quoted ...]
> I tested this idea one month ago and got unconvincing results, because
> the branch was taken half of the time.
>
> The cases where the page can be reused are probably specific to ixgbe,
> because it uses a different allocator for the frags themselves;
> netdev_alloc_frag() is only used to allocate the skb head.

Actually, it is pretty much anywhere a copy-break type setup exists.  I
think ixgbe and a few other drivers have this type of setup, where
netdev_alloc_skb is called and the data is just copied into the buffer.
My thought was that if I can improve this one case without hurting the
other cases, I should just go ahead and submit it, since it is a net
win performance-wise.

I think one of the biggest advantages of this for ixgbe is that it
allows the buffer to become cache warm, so that writing the shared info
and copying the header contents becomes very cheap compared to
accessing a cache cold page.

> For typical NICs, we allocate frags to populate the RX ring _way_ before
> the packet is received by the NIC.
> Then, I played with using order-2 pages instead of order-0 ones if
> PAGE_SIZE < 8192.
>
> No clear win either, but you might try this too.

The biggest issue I see with an order-2 page is that it means the
memory is going to take much longer to cycle out of a shared page.  As
a result, changes like the one I just came up with would likely have
little to no benefit, because we would run out of room in the frags
list before we could start reusing a fresh page.

Thanks,

Alex
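[Illustrative sketch, not ixgbe's actual code: the copy-break setup
mentioned above, in rough driver-agnostic form.  The function name and
the 256-byte threshold are invented for illustration.

/* Copy-break receive path: for small frames, allocate a fresh skb head
 * and copy the packet out, so the large DMA buffer can be reused
 * immediately.
 */
#define RX_COPYBREAK_LEN 256

static struct sk_buff *rx_copybreak(struct net_device *dev,
				    const void *rx_buf, unsigned int len)
{
	struct sk_buff *skb;

	if (len > RX_COPYBREAK_LEN)
		return NULL;	/* caller keeps the page/frag path */

	/* The head allocated here comes from netdev_alloc_frag() (via
	 * __netdev_alloc_skb()) and is typically short lived, which is
	 * what makes the page reuse in the proposed patch pay off.
	 */
	skb = netdev_alloc_skb(dev, len);
	if (!skb)
		return NULL;

	skb_copy_to_linear_data(skb, rx_buf, len);
	skb_put(skb, len);
	return skb;
}]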
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20 17:14 UTC
To: Eric Dumazet; +Cc: netdev, davem, jeffrey.t.kirsher

On 06/20/2012 09:30 AM, Alexander Duyck wrote:
> On 06/19/2012 10:36 PM, Eric Dumazet wrote:
>> [... patch and earlier discussion quoted ...]
>> For typical NICs, we allocate frags to populate the RX ring _way_ before
>> the packet is received by the NIC.
>>
>> Then, I played with using order-2 pages instead of order-0 ones if
>> PAGE_SIZE < 8192.
>>
>> No clear win either, but you might try this too.
> The biggest issue I see with an order-2 page is that it means the
> memory is going to take much longer to cycle out of a shared page.  As
> a result, changes like the one I just came up with would likely have
> little to no benefit, because we would run out of room in the frags
> list before we could start reusing a fresh page.

Actually, I think I just realized what the difference is.  I was
looking at things with LRO disabled.  With LRO enabled, our hardware
RSC feature kind of defeats the whole point of GRO or TCP coalescing
anyway, since it will stuff 16 fragments into a single packet before we
even hand the packet off to the stack.

Thanks,

Alex
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Eric Dumazet @ 2012-06-20 18:41 UTC
To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher

On Wed, 2012-06-20 at 10:14 -0700, Alexander Duyck wrote:
> Actually, I think I just realized what the difference is.  I was
> looking at things with LRO disabled.  With LRO enabled, our hardware
> RSC feature kind of defeats the whole point of GRO or TCP coalescing
> anyway, since it will stuff 16 fragments into a single packet before we
> even hand the packet off to the stack.

I noticed LRO is now 'off' by default on ixgbe (net-next tree); I am
pretty sure it was 'on' some months ago?
* Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently with TCP and GRO
From: Alexander Duyck @ 2012-06-20 20:10 UTC
To: Eric Dumazet; +Cc: netdev, davem, jeffrey.t.kirsher

On 06/20/2012 11:41 AM, Eric Dumazet wrote:
> I noticed LRO is now 'off' by default on ixgbe (net-next tree); I am
> pretty sure it was 'on' some months ago?

It should be on by default unless you are doing some routing.  In that
case, as soon as the interface comes up, LRO is disabled by the stack.

Thanks,

Alex