* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
[not found] ` <CAPe+=50xs=MMqxHPgTiGa3qytL7633Cygf_vBWuAuq=ceLyacg@mail.gmail.com>
@ 2017-01-21 19:26 ` Eric Dumazet
2017-01-23 9:14 ` Jesper Dangaard Brouer
2017-01-21 20:31 ` Saeed Mahameed
2017-01-22 18:09 ` Tom Herbert
2 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2017-01-21 19:26 UTC (permalink / raw)
To: kernel netdev
Cc: Tom Herbert, Saeed Mahameed, netdev, Tariq Toukan, Davem,
Saeed Mahameed, brouer
On Sat, 2017-01-21 at 20:12 +0100, kernel netdev wrote:
>
> On 21 Jan 2017 at 7.10 PM, "Tom Herbert" <tom@herbertland.com> wrote:
> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
> > On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet
> > <eric.dumazet@gmail.com> wrote:
> >> From: Eric Dumazet <edumazet@google.com>
> >>
> >> A driver using dev_alloc_page() must not reuse a page allocated
> >> from the emergency memory reserve.
> >>
> >> Otherwise all packets using this page will be immediately dropped,
> >> except for the very specific sockets having the SOCK_MEMALLOC bit set.
> >>
> >> This issue might be hard to debug, because only a fraction of the
> >> received packets would be dropped.
> >
> > Hi Eric,
> >
> > When you say reuse, do you mean pointing to the same page from
> > several SKBs?
> >
> > Because in our page cache implementation we don't reuse pages that
> > were already passed to the stack; we just keep them in the page
> > cache until the ref count drops back to one, and then we recycle
> > them (i.e. they will be re-used only when no one else is using them).
> >
>
> Saeed,
>
> Speaking of the mlx page cache, can we remove this or at least make it
> optional to use. It is another example of complex functionality being
> put into drivers that makes things like backports more complicated and
> provides at best some marginal value. In the case of the mlx5e cache
> code the results from pktgen really weren't very impressive in the
> first place. Also, the cache suffers from HOL blocking, where we can
> block the whole cache due to an outstanding reference on just one page
> (something that you wouldn't see in pktgen but is likely to happen in
> real applications).
>
> (Sent from phone in car)
>
> To Tom: have you measured the effect of this page cache before
> claiming it is ineffective?
>
> My previous measurements show approx 20% speedup on a UDP test with
> delivery to a remote CPU.

I find this a bit strange. When you have time (i.e. not while driving
your car or during the weekend) please give more details, for example
on message size. Was it before skb_condense() was added?

> Removing the cache would of course be a good use case for speeding up
> the page allocator (PCP), which Mel Gorman and I are working on.
> AFAIK a current order-0 page costs 240 cycles. Mel has reduced that to
> 180, and without NUMA to 150 cycles. And with bulking this can be
> amortized to 80 cycles.
>
> --Jesper
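The check the patch calls for can be summarized in a few lines; a
minimal sketch (illustrative helper name, not necessarily the exact
mlx5e patch):

static bool rx_page_is_recyclable(struct page *page)
{
	/* Never recycle a page handed out from the emergency memory
	 * reserve: packets built on it are dropped for every socket
	 * that does not have SOCK_MEMALLOC set.
	 */
	if (page_is_pfmemalloc(page))
		return false;
	/* Drivers commonly also refuse remote-NUMA pages (an extra
	 * assumption here): cheaper to allocate a fresh local page.
	 */
	if (page_to_nid(page) != numa_node_id())
		return false;
	return true;
}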
^ permalink raw reply [flat|nested] 16+ messages in thread

* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
2017-01-21 19:26 ` Eric Dumazet
@ 2017-01-23 9:14 ` Jesper Dangaard Brouer
2017-01-24 9:21 ` Saeed Mahameed
0 siblings, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2017-01-23 9:14 UTC (permalink / raw)
To: Eric Dumazet
Cc: brouer, Tom Herbert, Saeed Mahameed, netdev, Tariq Toukan, Davem,
Saeed Mahameed
On Sat, 21 Jan 2017 11:26:49 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > My previous measurements show approx 20% speedup on a UDP test with
> > delivery to a remote CPU.
> >
> I find this a bit strange. When you have time (i.e. not while driving
> your car or during the weekend) please give more details, for example
> on message size.
I tested this with both 64-byte and 1500-byte packets. After I moved to
50G and 100G testing, I no longer need to use 64-byte packets to provoke
the bottlenecks in the stack ;-)
> Was it before skb_condense() was added?
I tested this just before skb_condense() was added. BUT skb_condense()
does not get activated with mlx5, because the driver uses build_skb(),
i.e. it is not using frags.
For people that don't realize this:
Eric's optimization in skb_condense() is about trading a remote-CPU
atomic refcnt (put_page) for a copy + a local-CPU refcnt decrement.
My measurements show cycle costs of local=31 vs. remote=208, thus an
estimated saving of around 177 cycles, which is spent on calling a
fairly complex function, __pskb_pull_tail(). This only works for more
complex SKBs with frags.
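For reference, the mechanics look roughly like this (a simplified
sketch of skb_condense(); see net/core/skbuff.c for the real code):

void skb_condense(struct sk_buff *skb)
{
	if (skb->data_len) {
		/* Only worth it when the frag data fits the tailroom
		 * and the skb is not shared with a clone.
		 */
		if (skb->data_len > skb->end - skb->tail ||
		    skb_cloned(skb))
			return;

		/* Pull frag data into the linear area and free the
		 * page frag(s) now: a local copy instead of a later
		 * remote atomic put_page().
		 */
		__pskb_pull_tail(skb, skb->data_len);
	}
	skb->truesize = SKB_TRUESIZE(skb->len);
}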
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
2017-01-23 9:14 ` Jesper Dangaard Brouer
@ 2017-01-24 9:21 ` Saeed Mahameed
0 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2017-01-24 9:21 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Eric Dumazet, Tom Herbert, Saeed Mahameed, netdev, Tariq Toukan,
Davem
On Mon, Jan 23, 2017 at 11:14 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Sat, 21 Jan 2017 11:26:49 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> > My previous measurements show approx 20% speedup on a UDP test with
>> > delivery to a remote CPU.
>> >
>> I find this a bit strange. When you have time (i.e. not while driving
>> your car or during the weekend) please give more details, for example
>> on message size.
>
> I tested this with both 64-byte and 1500-byte packets. After I moved to
> 50G and 100G testing, I no longer need to use 64-byte packets to provoke
> the bottlenecks in the stack ;-)
>
Exactly! For XDP-like use cases the page cache may be an unnecessary
optimization, but when you start testing typical TCP use cases over a
50/100G link you will need more buffers (pages) to host the traffic for
longer periods, and you will hit that bottleneck.
>> Was it before skb_condense() was added?
>
> I tested this just before skb_condense() was added. BUT skb_condense()
> does not get activated with mlx5, because the driver uses build_skb(),
> i.e. it is not using frags.
>
Well, we can always replace build_skb() with alloc_skb() +
memcpy(skb->data, headlen) + skb_add_rx_frag(payload).
Is it worth it? And is it healthy that both skb->data and
skb_shinfo(skb)->frags[i] point to the same page?
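As a sketch, that alternative would look something like this
(hypothetical helper, not mlx5 code; assumes 'data' points at the
packet start within the RX page at 'page_offset'):

static struct sk_buff *rx_build_skb_copyhdr(struct napi_struct *napi,
					    struct page *page,
					    unsigned int page_offset,
					    void *data, unsigned int len,
					    unsigned int headlen)
{
	struct sk_buff *skb = napi_alloc_skb(napi, headlen);

	if (unlikely(!skb))
		return NULL;

	/* Copy only the protocol headers into the linear area */
	memcpy(skb_put(skb, headlen), data, headlen);

	/* The payload stays in the RX page, attached as a frag */
	if (len > headlen) {
		get_page(page);	/* the frag holds its own page ref */
		skb_add_rx_frag(skb, 0, page, page_offset + headlen,
				len - headlen, PAGE_SIZE);
	}
	return skb;
}

Whether the header copy is worth it depends on headlen and on how hot
put_page() is; the above is just the shape of the code being discussed.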
> For people that don't realize this:
> Eric's optimization in skb_condense() is about trading remote CPU
> atomic refcnt (put_page) for copy + local CPU refcnt dec.
>
> My measurements show cycles cost local=31 vs. remote=208, thus a
> estimated saving around 177 cycles. Which is spend on calling a fairly
> complex function __pskb_pull_tail(), and only works for more complex
> SKBs with frags.
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
[not found] ` <CAPe+=50xs=MMqxHPgTiGa3qytL7633Cygf_vBWuAuq=ceLyacg@mail.gmail.com>
2017-01-21 19:26 ` Eric Dumazet
@ 2017-01-21 20:31 ` Saeed Mahameed
2017-01-22 17:50 ` Tom Herbert
2017-01-23 8:39 ` Jesper Dangaard Brouer
2017-01-22 18:09 ` Tom Herbert
2 siblings, 2 replies; 16+ messages in thread
From: Saeed Mahameed @ 2017-01-21 20:31 UTC (permalink / raw)
To: kernel netdev
Cc: Tom Herbert, Saeed Mahameed, netdev, Tariq Toukan, Davem,
Eric Dumazet, Jesper Dangaard Brouer
On Sat, Jan 21, 2017 at 9:12 PM, kernel netdev <netdev@brouer.com> wrote:
>
>
> On 21 Jan 2017 at 7.10 PM, "Tom Herbert" <tom@herbertland.com> wrote:
>
> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>>
>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>> the emergency memory reserve.
>>>
>>> Otherwise all packets using this page will be immediately dropped,
>>> except for the very specific sockets having the SOCK_MEMALLOC bit set.
>>>
>>> This issue might be hard to debug, because only a fraction of the
>>> received packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>
>> Because in our page cache implementation we don't reuse pages that
>> were already passed to the stack; we just keep them in the page cache
>> until the ref count drops back to one, and then we recycle them
>> (i.e. they will be re-used only when no one else is using them).
>>
>>
> Saeed,
>
> Speaking of the mlx page cache, can we remove this or at least make it
> optional to use. It is another example of complex functionality being
> put into drivers that makes things like backports more complicated and
Re complexity, I am not sure the mlx page cache is that complex;
we just wrap alloc_page/put_page with our own page cache calls.
Roughly, the page cache implementation is 200-300 LOC tops, all
concentrated in one place in the code.
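As an illustration of the shape of such a wrapper cache (a toy sketch
with hypothetical names, not the mlx5e code; note the
page_is_pfmemalloc() test, which is exactly the recycling hazard Eric's
patch addresses):

struct rx_page_cache {
	struct page *pages[256];
	unsigned int head;		/* next free slot */
};

static bool cache_put(struct rx_page_cache *c, struct page *page)
{
	if (c->head == ARRAY_SIZE(c->pages))
		return false;		/* cache full, caller put_page()s */
	if (page_is_pfmemalloc(page))
		return false;		/* emergency reserve: never recycle */
	c->pages[c->head++] = page;
	return true;
}

static struct page *cache_get(struct rx_page_cache *c)
{
	struct page *page;

	if (!c->head)
		return NULL;		/* empty: fall back to alloc_page() */

	page = c->pages[c->head - 1];
	/* Recycle only once the stack has dropped all its references */
	if (page_ref_count(page) != 1)
		return NULL;		/* still in flight: the HOL blocking
					 * Tom mentions */
	c->head--;
	return page;
}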
> provides at best some marginal value. In the case of the mlx5e cache
> code the results from pktgen really weren't very impressive in the
> first place. Also, the cache suffers from HOL blocking, where we can
Well, with pktgen you won't notice a huge improvement, since the pages
are freed in the stack directly from our rx receive handler
(gro_receive); those pages will go back to the page allocator and get
requested immediately again by the driver (no stress).

The real improvements are seen when you really stress the page
allocator with real use cases such as TCP/UDP with user applications,
where the pages are held longer than the driver needs them. The page
cache in this case plays an important role in reducing the stress on
the page allocator, since with those use cases we are juggling many
more pages than pktgen could.

With more stress (#cores TCP streams) our humble page cache sometimes
can't keep up! It fills up fast and then fails to recycle for a huge
percentage of the driver's page requests.

Before our own page cache we used dev_alloc_skb(), which used its own
cache, "page_frag_cache", and it worked nicely enough. So I don't
really recommend removing the page cache until we have a generic RX
page cache solution for all device drivers.
> block the whole cache due to an outstanding reference on just one page
> (something that you wouldn't see in pktgen but is likely to happen in
> real applications).
>
Re the HOL issue, we have some upcoming patches that would drastically
improve the HOL blocking issue (we will simply swap the HOL page on
every sample).
>
> (Sent from phone in car)
>
Drive safe :) ..
> To Tom: have you measured the effect of this page cache before
> claiming it is ineffective?
>
> My previous measurements show approx 20% speedup on a UDP test with
> delivery to a remote CPU.
>
> Removing the cache would of course be a good use case for speeding up
> the page allocator (PCP), which Mel Gorman and I are working on. AFAIK
> a current order-0 page costs 240 cycles. Mel has reduced that to 180,
> and without NUMA to 150 cycles. And with bulking this can be amortized
> to 80 cycles.
>
Are you trying to say that we won't need the cache if you manage to
deliver those optimizations?
Can you compare those optimizations with the page_frag_cache from
dev_alloc_skb()?
> --Jesper
>
^ permalink raw reply [flat|nested] 16+ messages in thread

* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
2017-01-21 20:31 ` Saeed Mahameed
@ 2017-01-22 17:50 ` Tom Herbert
2017-01-24 9:29 ` Saeed Mahameed
2017-01-23 8:39 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 16+ messages in thread
From: Tom Herbert @ 2017-01-22 17:50 UTC (permalink / raw)
To: Saeed Mahameed
Cc: kernel netdev, Saeed Mahameed, netdev, Tariq Toukan, Davem,
Eric Dumazet, Jesper Dangaard Brouer
On Sat, Jan 21, 2017 at 12:31 PM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Sat, Jan 21, 2017 at 9:12 PM, kernel netdev <netdev@brouer.com> wrote:
>>
>>
>> On 21 Jan 2017 at 7.10 PM, "Tom Herbert" <tom@herbertland.com> wrote:
>>
>> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.dumazet@gmail.com>
>>> wrote:
>>>> From: Eric Dumazet <edumazet@google.com>
>>>>
>>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>>> the emergency memory reserve.
>>>>
>>>> Otherwise all packets using this page will be immediately dropped,
>>>> except for the very specific sockets having the SOCK_MEMALLOC bit set.
>>>>
>>>> This issue might be hard to debug, because only a fraction of the
>>>> received packets would be dropped.
>>>
>>> Hi Eric,
>>>
>>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>>
>>> Because in our page cache implementation we don't reuse pages that
>>> were already passed to the stack; we just keep them in the page cache
>>> until the ref count drops back to one, and then we recycle them
>>> (i.e. they will be re-used only when no one else is using them).
>>>
>>>
>> Saeed,
>>
>> Speaking of the mlx page cache, can we remove this or at least make it
>> optional to use. It is another example of complex functionality being
>> put into drivers that makes things like backports more complicated and
>
> Re complexity, I am not sure the mlx page cache is that complex;
> we just wrap alloc_page/put_page with our own page cache calls.
> Roughly, the page cache implementation is 200-300 LOC tops, all
> concentrated in one place in the code.
>
Taken as part of the RX buffer management code, the whole thing is very
complicated and seems to be completely bereft of any comments in the
code as to how things are supposed to work.
>> provides at best some marginal value. In the case of the mlx5e cache
>> code the results from pktgen really weren't very impressive in the
>> first place. Also, the cache suffers from HOL blocking, where we can
>
> Well, with pktgen you won't notice a huge improvement, since the pages
> are freed in the stack directly from our rx receive handler
> (gro_receive); those pages will go back to the page allocator and get
> requested immediately again by the driver (no stress).
>
> The real improvements are seen when you really stress the page
> allocator with real use cases such as TCP/UDP with user applications,
> where the pages are held longer than the driver needs them. The page
> cache in this case plays an important role in reducing the stress on
> the page allocator, since with those use cases we are juggling many
> more pages than pktgen could.
>
> With more stress (#cores TCP streams) our humble page cache sometimes
> can't keep up! It fills up fast and then fails to recycle for a huge
> percentage of the driver's page requests.
>
> Before our own page cache we used dev_alloc_skb(), which used its own
> cache, "page_frag_cache", and it worked nicely enough. So I don't
> really recommend removing the page cache until we have a generic RX
> page cache solution for all device drivers.
>
>> block the whole cache due to an outstanding reference on just one page
>> (something that you wouldn't see in pktgen but is likely to happen in
>> real applications).
>>
>
> Re the HOL issue, we have some upcoming patches that would drastically
> improve the HOL blocking issue (we will simply swap the HOL page on
> every sample).
>
>>
>> (Sent from phone in car)
>>
>
> Drive safe :) ..
>
>> To Tom: have you measured the effect of this page cache before
>> claiming it is ineffective?
>>
>> My previous measurements show approx 20% speedup on a UDP test with
>> delivery to a remote CPU.
>>
>> Removing the cache would of course be a good use case for speeding up
>> the page allocator (PCP), which Mel Gorman and I are working on. AFAIK
>> a current order-0 page costs 240 cycles. Mel has reduced that to 180,
>> and without NUMA to 150 cycles. And with bulking this can be amortized
>> to 80 cycles.
>>
>
> Are you trying to say that we won't need the cache if you manage to
> deliver those optimizations?
> Can you compare those optimizations with the page_frag_cache from
> dev_alloc_skb()?
>
>> --Jesper
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
2017-01-22 17:50 ` Tom Herbert
@ 2017-01-24 9:29 ` Saeed Mahameed
0 siblings, 0 replies; 16+ messages in thread
From: Saeed Mahameed @ 2017-01-24 9:29 UTC (permalink / raw)
To: Tom Herbert
Cc: kernel netdev, Saeed Mahameed, netdev, Tariq Toukan, Davem,
Eric Dumazet, Jesper Dangaard Brouer
On Sun, Jan 22, 2017 at 7:50 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Sat, Jan 21, 2017 at 12:31 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Sat, Jan 21, 2017 at 9:12 PM, kernel netdev <netdev@brouer.com> wrote:
>>>
>>>
>>> On 21 Jan 2017 at 7.10 PM, "Tom Herbert" <tom@herbertland.com> wrote:
>>>
>>> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
>>> <saeedm@dev.mellanox.co.il> wrote:
>>>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.dumazet@gmail.com>
>>>> wrote:
>>>>> From: Eric Dumazet <edumazet@google.com>
>>>>>
>>>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>>>> the emergency memory reserve.
>>>>>
>>>>> Otherwise all packets using this page will be immediately dropped,
>>>>> except for the very specific sockets having the SOCK_MEMALLOC bit set.
>>>>>
>>>>> This issue might be hard to debug, because only a fraction of the
>>>>> received packets would be dropped.
>>>>
>>>> Hi Eric,
>>>>
>>>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>>>
>>>> Because in our page cache implementation we don't reuse pages that
>>>> were already passed to the stack; we just keep them in the page cache
>>>> until the ref count drops back to one, and then we recycle them
>>>> (i.e. they will be re-used only when no one else is using them).
>>>>
>>>>
>>> Saeed,
>>>
>>> Speaking of the mlx page cache, can we remove this or at least make it
>>> optional to use. It is another example of complex functionality being
>>> put into drivers that makes things like backports more complicated and
>>
>> Re complexity, I am not sure the mlx page cache is that complex;
>> we just wrap alloc_page/put_page with our own page cache calls.
>> Roughly, the page cache implementation is 200-300 LOC tops, all
>> concentrated in one place in the code.
>>
> Taken as part of the RX buffer management code, the whole thing is very
> complicated and seems to be completely bereft of any comments in the
> code as to how things are supposed to work.
>
The idea is fairly simple; maybe we could have made it cleaner if we
had better decoupled the page cache from the rx logic.
The page cache works in parallel to the rx logic and takes its own ref
counts; this way the rx logic continues working as if there were no
cache (get/put_page), but instead of the kernel get/put_page it calls
the internal cache get/put_cache_page.
Yeah, I agree, some comments could help. We will work on this when we
fix the HOL issues.
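In other words, the substitution is roughly this (a hypothetical
rx-path fragment with illustrative names; DMA mapping omitted):

static int rx_refill_one(struct rx_ring *ring, struct rx_desc *desc)
{
	/* was: struct page *page = dev_alloc_page(); */
	struct page *page = get_cache_page(ring->cache);

	if (unlikely(!page))
		return -ENOMEM;

	desc->page = page;
	return 0;
}

static void rx_release_one(struct rx_ring *ring, struct rx_desc *desc)
{
	/* was: put_page(desc->page); */
	put_cache_page(ring->cache, desc->page);
}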
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
2017-01-21 20:31 ` Saeed Mahameed
2017-01-22 17:50 ` Tom Herbert
@ 2017-01-23 8:39 ` Jesper Dangaard Brouer
2017-01-23 15:45 ` David Miller
1 sibling, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2017-01-23 8:39 UTC (permalink / raw)
To: Saeed Mahameed
Cc: brouer, Tom Herbert, Saeed Mahameed, netdev, Tariq Toukan, Davem,
Eric Dumazet
On Sat, 21 Jan 2017 22:31:26 +0200 Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> > My previous measurements show approx 20% speedup on a UDP test with
> > delivery to a remote CPU.
> >
> > Removing the cache would of course be a good use case for speeding up
> > the page allocator (PCP), which Mel Gorman and I are working on. AFAIK
> > a current order-0 page costs 240 cycles. Mel has reduced that to 180,
> > and without NUMA to 150 cycles. And with bulking this can be amortized
> > to 80 cycles.
> >
>
> Are you trying to say that we won't need the cache if you manage to
> deliver those optimizations?
These page alloc optimizations are good, but not fast enough to replace
your cache. Maybe I can improve it further, but it will be very hard to
compete with a recycle cache (it can skip many checks the page allocator
needs, as it knows the page state).
Said another way, the gap will be significantly smaller, and you will
not see a big boost from using this cache.
BUT there are other advantages to using a guaranteed recycle pool
facility (like the page_pool). Namely, (1) DMA overhead: keeping pages
DMA mapped to counter DMA+IOMMU overhead, and (2) RX zero-copy: it
opens up a software memory model solution for pre-VMA-mapping pages to
userspace (see [1] for the argument for how this avoids leaking kernel
memory, only exposing/leaking packet-data memory).
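For readers coming to this later: the page_pool API that eventually
landed upstream is used roughly like this (a simplified sketch from
memory, not a drop-in example):

#include <net/page_pool.h>

static struct page_pool *rx_create_pool(struct device *dev)
{
	struct page_pool_params pp = {
		.order     = 0,                 /* order-0 pages */
		.flags     = PP_FLAG_DMA_MAP,   /* pool keeps pages DMA mapped */
		.pool_size = 1024,
		.nid       = NUMA_NO_NODE,
		.dev       = dev,
		.dma_dir   = DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);	/* ERR_PTR() on failure */
}

static struct page *rx_refill(struct page_pool *pool, dma_addr_t *dma)
{
	/* Recycled pages come back still DMA mapped: advantage (1) */
	struct page *page = page_pool_dev_alloc_pages(pool);

	if (page)
		*dma = page_pool_get_dma_addr(page);
	return page;
}

static void rx_release(struct page_pool *pool, struct page *page)
{
	/* Return the page to the pool instead of the page allocator */
	page_pool_put_full_page(pool, page, false);
}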
> Can you compare those optimizations with the page_frag_cache from
> dev_alloc_skb()?
IMHO the page_frag_cache is indirectly a bulk page allocator facility:
it limits the number of times the page allocator is called, and thus
amortizes the cost of talking to the page allocator. Performance-wise
it is likely comparable to your page cache, and likely faster than the
80-cycle order-0 bulking.
BUT we generally worry about using these 32K pages, as it opens a
window for attackers to pin down memory.
[1] https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html
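For context, the page_frag_cache interface being compared is used
roughly like this (a sketch; the wrapper names are illustrative):

struct page_frag_cache nc;	/* in the kernel this is per-CPU */

void *rx_alloc_frag(unsigned int fragsz)
{
	/* Carves fragsz bytes out of a cached compound page (up to
	 * 32KB), so the page allocator is only hit when a page is
	 * exhausted: an implicit form of bulk allocation.
	 */
	return page_frag_alloc(&nc, fragsz, GFP_ATOMIC);
}

void rx_free_frag(void *data)
{
	page_frag_free(data);	/* drops the frag's page reference */
}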
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
2017-01-23 8:39 ` Jesper Dangaard Brouer
@ 2017-01-23 15:45 ` David Miller
0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2017-01-23 15:45 UTC (permalink / raw)
To: brouer; +Cc: saeedm, tom, saeedm, netdev, tariqt, eric.dumazet
From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Mon, 23 Jan 2017 09:39:40 +0100
> BUT there are other advantages to using a guaranteed recycle pool
> facility (like the page_pool). Namely, (1) DMA overhead: keeping pages
> DMA mapped to counter DMA+IOMMU overhead, and (2) RX zero-copy: it
> opens up a software memory model solution for pre-VMA-mapping pages to
> userspace (see [1] for the argument for how this avoids leaking kernel
> memory, only exposing/leaking packet-data memory).
+1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
[not found] ` <CAPe+=50xs=MMqxHPgTiGa3qytL7633Cygf_vBWuAuq=ceLyacg@mail.gmail.com>
2017-01-21 19:26 ` Eric Dumazet
2017-01-21 20:31 ` Saeed Mahameed
@ 2017-01-22 18:09 ` Tom Herbert
2 siblings, 0 replies; 16+ messages in thread
From: Tom Herbert @ 2017-01-22 18:09 UTC (permalink / raw)
To: kernel netdev
Cc: Saeed Mahameed, netdev, Tariq Toukan, Davem, Eric Dumazet,
Saeed Mahameed, Jesper Dangaard Brouer
On Sat, Jan 21, 2017 at 11:12 AM, kernel netdev <netdev@brouer.com> wrote:
>
>
> On 21 Jan 2017 at 7.10 PM, "Tom Herbert" <tom@herbertland.com> wrote:
>
> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>>
>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>> the emergency memory reserve.
>>>
>>> Otherwise all packets using this page will be immediately dropped,
>>> except for the very specific sockets having the SOCK_MEMALLOC bit set.
>>>
>>> This issue might be hard to debug, because only a fraction of the
>>> received packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, do you mean pointing to the same page from several SKBs?
>>
>> Because in our page cache implementation we don't reuse pages that
>> were already passed to the stack; we just keep them in the page cache
>> until the ref count drops back to one, and then we recycle them
>> (i.e. they will be re-used only when no one else is using them).
>>
> Saeed,
>
> Speaking of the mlx page cache, can we remove this or at least make it
> optional to use. It is another example of complex functionality being
> put into drivers that makes things like backports more complicated and
> provides at best some marginal value. In the case of the mlx5e cache
> code the results from pktgen really weren't very impressive in the
> first place. Also, the cache suffers from HOL blocking, where we can
> block the whole cache due to an outstanding reference on just one page
> (something that you wouldn't see in pktgen but is likely to happen in
> real applications).
>
>
> (Sent from phone in car)
>
> To Tom: have you measured the effect of this page cache before
> claiming it is ineffective?
No, I have not. TBH, I have spent most of the past few weeks trying to
debug a backport of the code from 4.9 to 4.6. Until we have a working
backport, performance is immaterial for our purposes. Unfortunately, we
are seeing some issues: the checksum faults I described previously, and
crashes on bad page refcounts which are presumably caused by the logic
in RX buffer processing. This is why I am now having to dissect the
code and try to disable things like the page cache that are not
essential functionality.
In any case, the HOL blocking issue is obvious from reading the code
and implies bimodal behavior; we don't need a test for that to know
it's bad, as we've seen the bad effects of it in many other contexts.
>
> My previous measurements show approx 20% speedup on a UDP test with
> delivery to a remote CPU.
>
> Removing the cache would of course be a good use case for speeding up
> the page allocator (PCP), which Mel Gorman and I are working on. AFAIK
> a current order-0 page costs 240 cycles. Mel has reduced that to 180,
> and without NUMA to 150 cycles. And with bulking this can be amortized
> to 80 cycles.
>
That would be great. If only I had a nickel for every time someone
started working on a driver and came to the conclusion that they needed
a custom memory allocator because the kernel allocator is so
inefficient!
Tom
> --Jesper
>
^ permalink raw reply [flat|nested] 16+ messages in thread