* irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  0:15 UTC
To: Eric Dumazet, Network Development

Hi Eric,

in commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
you mentioned that the reason for disabling interrupts
in __netdev_alloc_frag() is:

"- Must be IRQ safe (non NAPI drivers can use it)"

Is there a way to do this conditionally?

Without it I see a 10% performance gain in my RX tests
(from 6.9 Mpps to 7.7 Mpps), and __netdev_alloc_frag()
itself drops from 6.6% to 2.1% of the profile
(popf seems to be quite costly).
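
For reference, the relevant part of net/core/skbuff.c looks roughly
like this (trimmed down from memory, so take the details with a grain
of salt):

static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
{
	struct netdev_alloc_cache *nc;
	void *data = NULL;
	unsigned long flags;

	local_irq_save(flags);		/* the pushf/cli pair */
	nc = this_cpu_ptr(&netdev_alloc_cache);
	/* ... refill nc->frag.page if needed, carve off fragsz bytes ... */
	local_irq_restore(flags);	/* the costly popf */
	return data;
}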
Thanks
Alexei

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  1:52 UTC
To: Alexei Starovoitov; +Cc: Eric Dumazet, Network Development

On Wed, 2014-10-22 at 17:15 -0700, Alexei Starovoitov wrote:
> in commit 6f532612cc24 ("net: introduce netdev_alloc_frag()")
> you mentioned that the reason for disabling interrupts
> in __netdev_alloc_frag() is:
>
> "- Must be IRQ safe (non NAPI drivers can use it)"
>
> Is there a way to do this conditionally?
>
> Without it I see a 10% performance gain in my RX tests
> (from 6.9 Mpps to 7.7 Mpps), and __netdev_alloc_frag()
> itself drops from 6.6% to 2.1% of the profile
> (popf seems to be quite costly).

Well, your driver is probably a NAPI one, so you either need to
mask irqs or remove all non-NAPI drivers from Linux.

__netdev_alloc_frag() (via __netdev_alloc_skb()) is used by all of them.

The problem is that __netdev_alloc_frag() is generally deep inside the
call chain, so using a private pool might carry quite an overhead.

The same could be said for skb_queue_head() / skb_queue_tail() /
sock_queue_rcv_skb(): many callers don't need to block irqs.
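
For example, skb_queue_head() in net/core/skbuff.c is simply:

void skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk)
{
	unsigned long flags;

	spin_lock_irqsave(&list->lock, flags);
	__skb_queue_head(list, newsk);
	spin_unlock_irqrestore(&list->lock, flags);
}

so every caller pays for the irq-safe lock, whether it needs it or not.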

* Re: irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  2:22 UTC
To: Eric Dumazet; +Cc: Eric Dumazet, Network Development

On Wed, Oct 22, 2014 at 6:52 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Well, your driver is probably a NAPI one, so you either need to
> mask irqs or remove all non-NAPI drivers from Linux.

yeah, the 10G+ nics I care about are all napi :)

> __netdev_alloc_frag() (via __netdev_alloc_skb()) is used by all of them.
>
> The problem is that __netdev_alloc_frag() is generally deep inside the
> call chain, so using a private pool might carry quite an overhead.

yes. I was thinking: since dev is already passed into
__netdev_alloc_skb(), we could check whether the dev is registered
with napi via dev->napi_list and, if so, tell the inner
__netdev_alloc_frag() to skip the irq disabling (see the untested
sketch below).

don't know about skb_queue_head() and friends.
I'm only looking at pure rx now. One challenge at a time.
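
Roughly this; completely untested, and the extra irq_safe argument is
hypothetical:

struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
				   unsigned int length, gfp_t gfp_mask)
{
	/* dev->napi_list is non-empty once netif_napi_add() has run */
	bool napi_dev = dev && !list_empty(&dev->napi_list);

	/* ... compute fragsz as today ... */
	data = __netdev_alloc_frag(fragsz, gfp_mask, !napi_dev);
	/* ... */
}

static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask,
				 bool irq_safe)
{
	unsigned long flags = 0;
	void *data;

	if (irq_safe)
		local_irq_save(flags);
	/* NAPI callers run in softirq context, so plain this_cpu access
	 * remains preempt-safe even without the save */
	data = NULL; /* ... existing per-cpu frag cache logic ... */
	if (irq_safe)
		local_irq_restore(flags);
	return data;
}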

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  3:48 UTC
To: Alexei Starovoitov; +Cc: Eric Dumazet, Network Development

On Wed, 2014-10-22 at 19:22 -0700, Alexei Starovoitov wrote:
> yes. I was thinking: since dev is already passed into
> __netdev_alloc_skb(), we could check whether the dev is registered
> with napi via dev->napi_list and, if so, tell the inner
> __netdev_alloc_frag() to skip the irq disabling.

This does not matter. The problem is not _this_ device; the problem is
that another device might trigger a hard irq, and that hard irq could
corrupt your per-cpu data.

* Re: irq disable in __netdev_alloc_frag() ?
From: Alexander Duyck @ 2014-10-23  3:19 UTC
To: Eric Dumazet, Alexei Starovoitov; +Cc: Eric Dumazet, Network Development

On 10/22/2014 06:52 PM, Eric Dumazet wrote:
> __netdev_alloc_frag() (via __netdev_alloc_skb()) is used by all of them.
>
> The problem is that __netdev_alloc_frag() is generally deep inside the
> call chain, so using a private pool might carry quite an overhead.

Couldn't __netdev_alloc_frag() be forked into two functions, one that
is only called from inside the NAPI context and one that is called
from all other contexts? It would mean having to double the number of
pages being held per CPU, but I would think something like that would
be doable.
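
Something along these lines; the names are hypothetical and the
refill helper __alloc_frag() is elided, just to illustrate the split:

static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);

/* callable from any context: must mask hard irqs around the cache */
void *netdev_alloc_frag(unsigned int fragsz)
{
	unsigned long flags;
	void *data;

	local_irq_save(flags);
	data = __alloc_frag(this_cpu_ptr(&netdev_alloc_cache), fragsz);
	local_irq_restore(flags);
	return data;
}

/* NAPI (softirq) context only: no hard irq user can touch
 * napi_alloc_cache, so no masking is needed */
void *napi_alloc_frag(unsigned int fragsz)
{
	return __alloc_frag(this_cpu_ptr(&napi_alloc_cache), fragsz);
}

Thanks,

Alex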

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  3:51 UTC
To: Alexander Duyck; +Cc: Alexei Starovoitov, Eric Dumazet, Network Development

On Wed, 2014-10-22 at 20:19 -0700, Alexander Duyck wrote:
> Couldn't __netdev_alloc_frag() be forked into two functions, one that
> is only called from inside the NAPI context and one that is called
> from all other contexts? It would mean having to double the number of
> pages being held per CPU, but I would think something like that would
> be doable.

Possibly, but this looks like code bloat to me.

On my hosts, this hard irq masking is pure noise.

What CPU are you using, Alexander?

The same could be done for some kmem_cache_alloc() users: SLAB uses
hard irq masking even though some caches are never used from hard irq
context.

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  3:56 UTC
To: Alexander Duyck; +Cc: Alexei Starovoitov, Eric Dumazet, Network Development

On Wed, 2014-10-22 at 20:51 -0700, Eric Dumazet wrote:
> What CPU are you using, Alexander?

Sorry, the question was for Alexei ;)

* Re: irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  4:29 UTC
To: Eric Dumazet; +Cc: Alexander Duyck, Eric Dumazet, Network Development

On Wed, Oct 22, 2014 at 8:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Sorry, the question was for Alexei ;)

my numbers were from a just-released Xeon E5-1630 v3,
which is Haswell-EP.

I think Alexander's idea is worth pursuing. Code-wise the bloat is
minimal: one extra netdev_alloc_cache and a few 'if's to choose
between them. Most of the time only one of the two is in use, since
systems mixing napi and non-napi nics are very rare.

I think it should help the virtualization case as well,
since virtio_net is napi already.

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-23  5:14 UTC
To: Alexei Starovoitov; +Cc: Alexander Duyck, Eric Dumazet, Network Development

On Wed, 2014-10-22 at 21:29 -0700, Alexei Starovoitov wrote:
> I think Alexander's idea is worth pursuing. Code-wise the bloat is
> minimal: one extra netdev_alloc_cache and a few 'if's to choose
> between them.

Really, I doubt this will make any difference in real workloads,
where the bottleneck is memory bandwidth, not the time taken by
pushf/cli/popf.

What difference do you see on a real workload like netperf TCP_RR
or UDP_RR?

* Re: irq disable in __netdev_alloc_frag() ?
From: Alexei Starovoitov @ 2014-10-23  6:12 UTC
To: Eric Dumazet; +Cc: Alexander Duyck, Eric Dumazet, Network Development

On Wed, Oct 22, 2014 at 10:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Really, I doubt this will make any difference in real workloads,
> where the bottleneck is memory bandwidth, not the time taken by
> pushf/cli/popf.

there are different workloads; l2/l3 switching is very real as well.
For tcp_rr on the host the difference will be minimal, since the time
is spent elsewhere. For cases that don't use the tcp stack but do
pure l2/l3 processing, all such improvements add up.
imo the kernel should be optimized for all use cases when possible.

* Re: irq disable in __netdev_alloc_frag() ?
From: Jesper Dangaard Brouer @ 2014-10-27 20:35 UTC
To: Eric Dumazet
Cc: brouer, Alexander Duyck, Alexei Starovoitov, Eric Dumazet,
 Network Development, Christoph Lameter

On Wed, 22 Oct 2014 20:51:16 -0700, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On my hosts, this hard irq masking is pure noise.

On my hosts I can measure a significant difference between using
local_irq_disable() and local_irq_save(flags):

 *  2.860 ns cost for local_irq_disable() + local_irq_enable()
 * 14.840 ns cost for local_irq_save() + local_irq_restore()

That is quite significant in my nanosec world ;-)

> What CPU are you using, Alexander?

I'm using an E5-2695 (Ivy Bridge).

You can easily reproduce my results on your own system with my
time_bench_sample module:
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c#L173

> The same could be done for some kmem_cache_alloc() users: SLAB uses
> hard irq masking even though some caches are never used from hard irq
> context.

Sounds interesting.
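
For anyone who wants the gist without cloning the repo, the core of
the measurement boils down to something like this (a simplified
sketch, not the real module; illustrative names, x86_64 assumed):

#include <linux/module.h>
#include <linux/ktime.h>
#include <linux/irqflags.h>

static int __init irqbench_init(void)
{
	u64 i, ns, loops = 10000000;
	unsigned long flags;
	ktime_t t0, t1;

	t0 = ktime_get();
	for (i = 0; i < loops; i++) {
		local_irq_disable();
		local_irq_enable();
	}
	t1 = ktime_get();
	ns = ktime_to_ns(ktime_sub(t1, t0));
	pr_info("local_irq_{disable,enable}: %llu ps/op\n", ns * 1000 / loops);

	t0 = ktime_get();
	for (i = 0; i < loops; i++) {
		local_irq_save(flags);
		local_irq_restore(flags);
	}
	t1 = ktime_get();
	ns = ktime_to_ns(ktime_sub(t1, t0));
	pr_info("local_irq_{save,restore}:   %llu ps/op\n", ns * 1000 / loops);
	return 0;
}

static void __exit irqbench_exit(void)
{
}

module_init(irqbench_init);
module_exit(irqbench_exit);
MODULE_LICENSE("GPL");

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer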

* Re: irq disable in __netdev_alloc_frag() ?
From: Christoph Lameter @ 2014-10-28  2:30 UTC
To: Jesper Dangaard Brouer; +Cc: Eric Dumazet, Alexander Duyck,
 Alexei Starovoitov, Eric Dumazet, Network Development

On Mon, 27 Oct 2014, Jesper Dangaard Brouer wrote:
> > The same could be done for some kmem_cache_alloc() users: SLAB uses
> > hard irq masking even though some caches are never used from hard
> > irq context.
>
> Sounds interesting.

SLUB does not disable interrupts in the fast paths.
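
The fast path is a transaction id plus a cmpxchg on the per-cpu
freelist; heavily simplified from mm/slub.c (slow_path here stands in
for the real __slab_alloc, which may disable irqs):

redo:
	tid = this_cpu_read(s->cpu_slab->tid);
	object = this_cpu_read(s->cpu_slab->freelist);
	if (!object)
		return slow_path(s, gfpflags);
	if (!this_cpu_cmpxchg_double(s->cpu_slab->freelist,
				     s->cpu_slab->tid,
				     object, tid,
				     get_freepointer(s, object),
				     next_tid(tid)))
		goto redo;	/* interrupted or migrated: retry */
	return object;

An interrupt that allocates or frees in between bumps the tid, the
cmpxchg fails, and we simply retry.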

* Re: irq disable in __netdev_alloc_frag() ?
From: Eric Dumazet @ 2014-10-28  2:46 UTC
To: Christoph Lameter; +Cc: Jesper Dangaard Brouer, Eric Dumazet,
 Alexander Duyck, Alexei Starovoitov, Network Development

On Mon, Oct 27, 2014 at 7:30 PM, Christoph Lameter <cl@linux.com> wrote:
> SLUB does not disable interrupts in the fast paths.

Unfortunately, SLUB is more expensive than SLAB for many networking
workloads. The cost of disabling interrupts is pure noise compared to
cache line misses.

SLUB also behaves poorly compared to SLAB with alien caches, even with
the side effect that 'struct page' is 64-byte aligned under SLUB
instead of the 56 bytes it is with SLAB.

Note that I am not doing SLUB/SLAB tests every day, so it might be
better nowadays.

* Re: irq disable in __netdev_alloc_frag() ?
From: David Miller @ 2014-10-28  4:56 UTC
To: edumazet; +Cc: cl, brouer, eric.dumazet, alexander.duyck, ast, netdev

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 27 Oct 2014 19:46:20 -0700

> Unfortunately, SLUB is more expensive than SLAB for many networking
> workloads. The cost of disabling interrupts is pure noise compared to
> cache line misses.

And SLAB completely shits itself when lots of memory gets cached up
on a foreign node.

This discussion has happened many times: SLAB may be faster when
things work out nicely, but it behaves poorly wrt. keeping foreign
memory from being cached too aggressively. And there is a cost for
that, which is that foreign memory has to be properly balanced back
to its home node.

* Re: irq disable in __netdev_alloc_frag() ?
From: Christoph Lameter @ 2014-10-28 19:46 UTC
To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, Eric Dumazet,
 Alexander Duyck, Alexei Starovoitov, Network Development

On Mon, 27 Oct 2014, Eric Dumazet wrote:
> Unfortunately, SLUB is more expensive than SLAB for many networking
> workloads.

hackbench performs better with SLUB.

> Note that I am not doing SLUB/SLAB tests every day, so it might be
> better nowadays.

Well, we did some intensive work on these issues a couple of years ago.