netdev.vger.kernel.org archive mirror
* Understanding lock contention in __udp4_lib_mcast_deliver
@ 2013-06-27 19:22 Shawn Bohrer
  2013-06-27 19:58 ` Rick Jones
  2013-06-27 21:39 ` Or Gerlitz
  0 siblings, 2 replies; 10+ messages in thread
From: Shawn Bohrer @ 2013-06-27 19:22 UTC
  To: netdev

I'm looking for opportunities to improve the multicast receive
performance for our application, and I thought I'd spend some time
trying to understand what I thought might be a small/simple
improvement.  Profiling with perf I see that there is spin_lock
contention in __udp4_lib_mcast_deliver:

0.68%  swapper  [kernel.kallsyms]               [k] _raw_spin_lock
       |
       --- _raw_spin_lock
          |
          |--24.13%-- perf_adjust_freq_unthr_context.part.21
          |
          |--22.40%-- scheduler_tick
          |
          |--14.96%-- __udp4_lib_mcast_deliver
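
To put the call-graph figures above in absolute terms, here is a minimal sketch of the arithmetic. It assumes the percentages under _raw_spin_lock are relative to the parent entry (0.68% of all samples); if perf was run in absolute call-graph mode this does not apply. The helper name is hypothetical.

```python
# Assumption: child percentages in the call graph are relative to the
# parent entry, so the absolute share is parent% * child% / 100.

def absolute_share(parent_pct: float, child_pct: float) -> float:
    """Convert a parent-relative call-graph percentage to a share of all samples."""
    return parent_pct * child_pct / 100.0


SPIN_LOCK_TOTAL_PCT = 0.68  # _raw_spin_lock share of all samples

# Callers of _raw_spin_lock as listed in the perf output.
CALLERS = {
    "perf_adjust_freq_unthr_context.part.21": 24.13,
    "scheduler_tick": 22.40,
    "__udp4_lib_mcast_deliver": 14.96,
}

if __name__ == "__main__":
    for name, pct in CALLERS.items():
        print(f"{name}: {absolute_share(SPIN_LOCK_TOTAL_PCT, pct):.4f}% of samples")
```

Under that assumption, the __udp4_lib_mcast_deliver path accounts for roughly 0.10% of all samples.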

The lock appears to be the udp_hslot lock protecting sockets that have
been hashed based on the local port.  My application contains multiple
processes and each process listens on multiple multicast sockets.  We
open one socket per multicast addr:port per machine.  All of the local
UDP ports are unique.  The UDP hash table has 65536 entries and it
appears the hash function is simply the port number plus a
contribution from the network namespace (I believe the namespace part
should be constant since we're not using network namespaces.
Perhaps I should disable CONFIG_NET_NS).

$ dmesg | grep "UDP hash"
[    0.606472] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
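
The hash described above can be sketched as follows. This is only a model of my reading, not the kernel code: it assumes the slot is the port plus a per-namespace constant, masked to the table size. The function names and the namespace_mix parameter are hypothetical.

```python
# Sketch of a port-based UDP hash slot computation.
# Assumption: slot = (port + namespace_mix) & (entries - 1); the names
# below are illustrative and do not correspond to kernel symbols.

UDP_TABLE_ENTRIES = 65536  # from the dmesg line above


def udp_slot(port: int, namespace_mix: int = 0,
             entries: int = UDP_TABLE_ENTRIES) -> int:
    """Return the hash slot for a local port; entries must be a power of two."""
    if entries <= 0 or entries & (entries - 1):
        raise ValueError("entries must be a power of two")
    return (port + namespace_mix) & (entries - 1)


def slots_for_ports(ports, namespace_mix: int = 0):
    """Map each occupied slot to the list of ports that land in it."""
    table = {}
    for port in ports:
        table.setdefault(udp_slot(port, namespace_mix), []).append(port)
    return table
```

If this model is right, then with 65536 entries and a constant namespace contribution, distinct 16-bit ports always land in distinct slots, so unique ports alone should not share an hslot lock.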

At this point I'm confused why there is any contention over this
spin_lock.  We should have enough hash table entries to avoid
collisions and all of our ports are unique.  Assuming that part is
true I could imagine contention if the kernel is processing two
packets for the same multicast addr:port in parallel but is that
possible?  I'm using a Mellanox ConnectX-3 card that has multiple
receive queues, but I've always thought that all packets for a given
srcip:srcport:destip:destport would be delivered to a single queue
and thus not processed in parallel.
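
The queue-selection assumption above can be modelled as a hash of the 4-tuple reduced to a queue index. This is a simplified sketch: CRC32 stands in for whatever hash the NIC actually uses, and the function name and the default of 8 queues are hypothetical.

```python
import zlib

# Sketch of flow-based receive queue selection.
# Assumption: the queue is chosen from a hash over the 4-tuple. CRC32 is
# a stand-in for the NIC's real hash, and num_queues=8 is arbitrary.


def rx_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             num_queues: int = 8) -> int:
    """Return the receive queue index for a flow under this simplified model."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues
```

In this model every packet of one flow maps to the same queue. The guarantee is per flow only: packets with a different srcip:srcport sent to the same destip:destport form a different flow and may map to a different queue.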

There is obviously something wrong with my thinking, and I'd appreciate
it if someone could tell me what I'm missing.

Thanks,
Shawn



end of thread, other threads:[~2013-07-02 20:16 UTC | newest]

Thread overview: 10+ messages
2013-06-27 19:22 Understanding lock contention in __udp4_lib_mcast_deliver Shawn Bohrer
2013-06-27 19:58 ` Rick Jones
2013-06-27 20:20   ` Shawn Bohrer
2013-06-27 20:46     ` Rick Jones
2013-06-27 21:54       ` Shawn Bohrer
2013-06-27 22:03         ` Rick Jones
2013-06-27 22:44           ` Shawn Bohrer
2013-07-02 20:16             ` Eric Dumazet
2013-06-27 21:39 ` Or Gerlitz
2013-06-27 21:58   ` Shawn Bohrer
