All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hpe.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@mellanox.com>,
	Tariq Toukan <tariqt@mellanox.com>,
	brouer@redhat.com
Subject: Re: Netperf UDP issue with connected sockets
Date: Mon, 21 Nov 2016 17:03:51 +0100	[thread overview]
Message-ID: <20161121170351.50a09ee1@redhat.com> (raw)
In-Reply-To: <1479408683.8455.273.camel@edumazet-glaptop3.roam.corp.google.com>


On Thu, 17 Nov 2016 10:51:23 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> 
> > The point is I can see a socket Send-Q forming, thus we do know the
> > application have something to send. Thus, and possibility for
> > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > socket layer into qdisc layer, should be fairly simple (and rest of
> > xmit_more is already in place).    
> 
> 
> As I said, you are fooled by TX completions.

Obviously TX completions play a role yes, and I bet I can adjust the
TX completion to cause xmit_more to happen, at the expense of
introducing added latency.

The point is the "bloated" spinlock in __dev_queue_xmit is still caused
by the MMIO tailptr/doorbell.  The added cost occurs when enqueueing
packets, and result in the inability to get enough packets into the
qdisc for xmit_more going (on my system).  I argue that a bulk enqueue
API would allow us to get past the hurtle of transitioning into
xmit_more mode more easily.


> Please make sure to increase the sndbuf limits !
> 
> echo 2129920 >/proc/sys/net/core/wmem_default

Testing with this makes no difference.

 $ grep -H . /proc/sys/net/core/wmem_default
 /proc/sys/net/core/wmem_default:2129920


> lpaa23:~# sar -n DEV 1 10|grep eth1
                  IFACE   rxpck/s    txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
> 10:49:25         eth1      7.00 9273283.00      0.61 2187214.90      0.00      0.00      0.00
> 10:49:26         eth1      1.00 9230795.00      0.06 2176787.57      0.00      0.00      1.00
> 10:49:27         eth1      2.00 9247906.00      0.17 2180915.45      0.00      0.00      0.00
> 10:49:28         eth1      3.00 9246542.00      0.23 2180790.38      0.00      0.00      1.00
> Average:         eth1      2.50 9018045.70      0.25 2126893.82      0.00      0.00      0.50

Very impressive numbers 9.2Mpps TX.

What is this test?  What kind of traffic? Multiple CPUs?


> lpaa23:~# ethtool -S eth1|grep more; sleep 1;ethtool -S eth1|grep more
>      xmit_more: 2251366909
>      xmit_more: 2256011392
> 
> lpaa23:~# echo 2256011392-2251366909 | bc
> 4644483

The xmit_more definitely works on your system, but I cannot get it to
"kick-in" on my setup.  Once the xmit_more is active, then the
"bloated" spinlock problem should go way.


(Tests with "udp_flood --pmtu 3 --send")

Forcing TX completion to happen on the same CPU, no xmit_more:

 ~/git/network-testing/bin/ethtool_stats.pl --sec 2 --dev mlx5p2
 Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
 Ethtool(mlx5p2  ) stat:    104592908 (    104,592,908) <= tx0_bytes /sec
 Ethtool(mlx5p2  ) stat:        39059 (         39,059) <= tx0_nop /sec
 Ethtool(mlx5p2  ) stat:      1743215 (      1,743,215) <= tx0_packets /sec 
 Ethtool(mlx5p2  ) stat:    104719986 (    104,719,986) <= tx_bytes /sec
 Ethtool(mlx5p2  ) stat:    111774540 (    111,774,540) <= tx_bytes_phy /sec
 Ethtool(mlx5p2  ) stat:      1745333 (      1,745,333) <= tx_csum_partial /sec
 Ethtool(mlx5p2  ) stat:      1745333 (      1,745,333) <= tx_packets /sec
 Ethtool(mlx5p2  ) stat:      1746477 (      1,746,477) <= tx_packets_phy /sec
 Ethtool(mlx5p2  ) stat:    111483434 (    111,483,434) <= tx_prio1_bytes /sec
 Ethtool(mlx5p2  ) stat:      1741928 (      1,741,928) <= tx_prio1_packets /sec

Forcing TX completion to happen on remote CPU, some xmit_more:

 Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
 Ethtool(mlx5p2  ) stat:    128485892 (    128,485,892) <= tx0_bytes /sec
 Ethtool(mlx5p2  ) stat:        31840 (         31,840) <= tx0_nop /sec
 Ethtool(mlx5p2  ) stat:      2141432 (      2,141,432) <= tx0_packets /sec
 Ethtool(mlx5p2  ) stat:          350 (            350) <= tx0_xmit_more /sec
 Ethtool(mlx5p2  ) stat:    128486459 (    128,486,459) <= tx_bytes /sec
 Ethtool(mlx5p2  ) stat:    137052191 (    137,052,191) <= tx_bytes_phy /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_csum_partial /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_packets /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_packets_phy /sec
 Ethtool(mlx5p2  ) stat:    137051300 (    137,051,300) <= tx_prio1_bytes /sec
 Ethtool(mlx5p2  ) stat:      2141427 (      2,141,427) <= tx_prio1_packets /sec
 Ethtool(mlx5p2  ) stat:          350 (            350) <= tx_xmit_more /sec



>    PerfTop:   76969 irqs/sec  kernel:96.6%  exact: 100.0% [4000Hz cycles:pp],  (all, 48 CPUs)
>---------------------------------------------------------------------------------------------
>     11.64%  [kernel]  [k] skb_set_owner_w               
>      6.21%  [kernel]  [k] queued_spin_lock_slowpath     
>      4.76%  [kernel]  [k] _raw_spin_lock                
>      4.40%  [kernel]  [k] __ip_make_skb                 
>      3.10%  [kernel]  [k] sock_wfree                    
>      2.87%  [kernel]  [k] ipt_do_table                  
>      2.76%  [kernel]  [k] fq_dequeue                    
>      2.71%  [kernel]  [k] mlx4_en_xmit                  
>      2.50%  [kernel]  [k] __dev_queue_xmit              
>      2.29%  [kernel]  [k] __ip_append_data.isra.40      
>      2.28%  [kernel]  [k] udp_sendmsg                   
>      2.01%  [kernel]  [k] __alloc_skb                   
>      1.90%  [kernel]  [k] napi_consume_skb              
>      1.63%  [kernel]  [k] udp_send_skb                  
>      1.62%  [kernel]  [k] skb_release_data              
>      1.62%  [kernel]  [k] entry_SYSCALL_64_fastpath     
>      1.56%  [kernel]  [k] dev_hard_start_xmit           
>      1.55%  udpsnd    [.] __libc_send                   
>      1.48%  [kernel]  [k] netif_skb_features            
>      1.42%  [kernel]  [k] __qdisc_run                   
>      1.35%  [kernel]  [k] sk_dst_check                  
>      1.33%  [kernel]  [k] sock_def_write_space          
>      1.30%  [kernel]  [k] kmem_cache_alloc_node_trace   
>      1.29%  [kernel]  [k] __local_bh_enable_ip          
>      1.21%  [kernel]  [k] copy_user_enhanced_fast_string
>      1.08%  [kernel]  [k] __kmalloc_reserve.isra.40     
>      1.08%  [kernel]  [k] SYSC_sendto                   
>      1.07%  [kernel]  [k] kmem_cache_alloc_node         
>      0.95%  [kernel]  [k] ip_finish_output2             
>      0.95%  [kernel]  [k] ktime_get                     
>      0.91%  [kernel]  [k] validate_xmit_skb             
>      0.88%  [kernel]  [k] sock_alloc_send_pskb          
>      0.82%  [kernel]  [k] sock_sendmsg                  

My perf outputs below...

Forcing TX completion to happen on the same CPU, no xmit_more:

# Overhead  CPU  Command     Shared Object     Symbol                         
# ........  ...  ..........  ................. ...............................
#
    12.17%  000  udp_flood   [kernel.vmlinux]  [k] _raw_spin_lock             
     5.03%  000  udp_flood   [mlx5_core]       [k] mlx5e_sq_xmit              
     3.13%  000  udp_flood   [kernel.vmlinux]  [k] __ip_append_data.isra.47   
     2.85%  000  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64           
     2.75%  000  udp_flood   [mlx5_core]       [k] mlx5e_poll_tx_cq           
     2.61%  000  udp_flood   [kernel.vmlinux]  [k] sock_def_write_space       
     2.48%  000  udp_flood   [kernel.vmlinux]  [k] skb_set_owner_w            
     2.25%  000  udp_flood   [kernel.vmlinux]  [k] __alloc_skb                
     2.21%  000  udp_flood   [kernel.vmlinux]  [k] udp_sendmsg                
     2.19%  000  udp_flood   [kernel.vmlinux]  [k] __slab_free                
     2.08%  000  udp_flood   [kernel.vmlinux]  [k] sock_wfree                 
     2.06%  000  udp_flood   [kernel.vmlinux]  [k] __ip_make_skb              
     1.93%  000  udp_flood   [mlx5_core]       [k] mlx5e_get_cqe              
     1.93%  000  udp_flood   libc-2.17.so      [.] __libc_send                
     1.80%  000  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64_fastpath  
     1.64%  000  udp_flood   [kernel.vmlinux]  [k] kfree                      
     1.61%  000  udp_flood   [kernel.vmlinux]  [k] ip_finish_output2          
     1.59%  000  udp_flood   [kernel.vmlinux]  [k] __local_bh_enable_ip       
     1.57%  000  udp_flood   [kernel.vmlinux]  [k] __dev_queue_xmit           
     1.49%  000  udp_flood   [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     1.38%  000  udp_flood   [kernel.vmlinux]  [k] kmem_cache_alloc_node      
     1.30%  000  udp_flood   [kernel.vmlinux]  [k] dst_release                
     1.26%  000  udp_flood   [kernel.vmlinux]  [k] ksize                      
     1.26%  000  udp_flood   [kernel.vmlinux]  [k] sk_dst_check               
     1.22%  000  udp_flood   [kernel.vmlinux]  [k] SYSC_sendto                
     1.22%  000  udp_flood   [kernel.vmlinux]  [k] ip_send_check              


Forcing TX completion to happen on remote CPU, some xmit_more:

# Overhead  CPU  Command      Shared Object     Symbol                        
# ........  ...  ............ ................  ..............................
#
    11.67%  002  udp_flood   [kernel.vmlinux]  [k] _raw_spin_lock             
     7.61%  002  udp_flood   [kernel.vmlinux]  [k] skb_set_owner_w            
     6.15%  002  udp_flood   [mlx5_core]       [k] mlx5e_sq_xmit              
     3.05%  002  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64           
     2.89%  002  udp_flood   [kernel.vmlinux]  [k] __ip_append_data.isra.47   
     2.78%  000  swapper     [mlx5_core]       [k] mlx5e_poll_tx_cq           
     2.65%  002  udp_flood   [kernel.vmlinux]  [k] sk_dst_check               
     2.36%  002  udp_flood   [kernel.vmlinux]  [k] __alloc_skb                
     2.22%  002  udp_flood   [kernel.vmlinux]  [k] ip_finish_output2          
     2.07%  000  swapper     [kernel.vmlinux]  [k] __slab_free                
     2.06%  002  udp_flood   [kernel.vmlinux]  [k] udp_sendmsg                
     1.97%  002  udp_flood   [kernel.vmlinux]  [k] ksize                      
     1.92%  002  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64_fastpath  
     1.82%  002  udp_flood   [kernel.vmlinux]  [k] __ip_make_skb              
     1.79%  002  udp_flood   libc-2.17.so      [.] __libc_send                
     1.62%  002  udp_flood   [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     1.53%  002  udp_flood   [kernel.vmlinux]  [k] __local_bh_enable_ip       
     1.48%  002  udp_flood   [kernel.vmlinux]  [k] sock_alloc_send_pskb       
     1.43%  002  udp_flood   [kernel.vmlinux]  [k] __dev_queue_xmit           
     1.39%  002  udp_flood   [kernel.vmlinux]  [k] ip_send_check              
     1.39%  002  udp_flood   [kernel.vmlinux]  [k] kmem_cache_alloc_node      
     1.37%  002  udp_flood   [kernel.vmlinux]  [k] dst_release                
     1.21%  002  udp_flood   [kernel.vmlinux]  [k] udp_send_skb               
     1.18%  002  udp_flood   [kernel.vmlinux]  [k] __fget_light               
     1.16%  002  udp_flood   [kernel.vmlinux]  [k] kfree                      
     1.15%  000  swapper     [kernel.vmlinux]  [k] sock_wfree                 
     1.14%  002  udp_flood   [kernel.vmlinux]  [k] SYSC_sendto                

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

  parent reply	other threads:[~2016-11-21 16:03 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-03 14:59 High perf top ip_idents_reserve doing netperf UDP_STREAM Jesper Dangaard Brouer
2014-09-03 15:17 ` Eric Dumazet
2016-11-16 12:16   ` Netperf UDP issue with connected sockets Jesper Dangaard Brouer
2016-11-16 17:46     ` Rick Jones
2016-11-16 22:40       ` Jesper Dangaard Brouer
2016-11-16 22:50         ` Rick Jones
2016-11-17  0:34         ` Eric Dumazet
2016-11-17  8:16           ` Jesper Dangaard Brouer
2016-11-17 13:20             ` Eric Dumazet
2016-11-17 13:42               ` Jesper Dangaard Brouer
2016-11-17 14:17                 ` Eric Dumazet
2016-11-17 14:57                   ` Jesper Dangaard Brouer
2016-11-17 16:21                     ` Eric Dumazet
2016-11-17 18:30                       ` Jesper Dangaard Brouer
2016-11-17 18:51                         ` Eric Dumazet
2016-11-17 21:19                           ` Jesper Dangaard Brouer
2016-11-17 21:44                             ` Eric Dumazet
2016-11-17 23:08                               ` Rick Jones
2016-11-18  0:37                                 ` Julian Anastasov
2016-11-18  0:42                                   ` Rick Jones
2016-11-18 17:12                               ` Jesper Dangaard Brouer
2016-11-21 16:03                           ` Jesper Dangaard Brouer [this message]
2016-11-21 18:10                             ` Eric Dumazet
2016-11-29  6:58                               ` [WIP] net+mlx4: auto doorbell Eric Dumazet
2016-11-30 11:38                                 ` Jesper Dangaard Brouer
2016-11-30 15:56                                   ` Eric Dumazet
2016-11-30 19:17                                     ` Jesper Dangaard Brouer
2016-11-30 19:30                                       ` Eric Dumazet
2016-11-30 22:30                                         ` Jesper Dangaard Brouer
2016-11-30 22:40                                           ` Eric Dumazet
2016-12-01  0:27                                         ` Eric Dumazet
2016-12-01  1:16                                           ` Tom Herbert
2016-12-01  2:32                                             ` Eric Dumazet
2016-12-01  2:50                                               ` Eric Dumazet
2016-12-02 18:16                                                 ` Eric Dumazet
2016-12-01  5:03                                               ` Tom Herbert
2016-12-01 19:24                                                 ` Willem de Bruijn
2016-11-30 13:50                                 ` Saeed Mahameed
2016-11-30 15:44                                   ` Eric Dumazet
2016-11-30 16:27                                     ` Saeed Mahameed
2016-11-30 17:28                                       ` Eric Dumazet
2016-12-01 12:05                                       ` Jesper Dangaard Brouer
2016-12-01 14:24                                         ` Eric Dumazet
2016-12-01 16:04                                           ` Jesper Dangaard Brouer
2016-12-01 17:04                                             ` Eric Dumazet
2016-12-01 19:17                                               ` Jesper Dangaard Brouer
2016-12-01 20:11                                                 ` Eric Dumazet
2016-12-01 20:20                                               ` David Miller
2016-12-01 22:10                                                 ` Eric Dumazet
2016-12-02 14:23                                               ` Eric Dumazet
2016-12-01 21:32                                 ` Alexander Duyck
2016-12-01 22:04                                   ` Eric Dumazet
2016-11-17 17:34                     ` Netperf UDP issue with connected sockets David Laight
2016-11-17 22:39                       ` Alexander Duyck
2016-11-17 17:42             ` Rick Jones
2016-11-28 18:33             ` Rick Jones
2016-11-28 18:40               ` Rick Jones
2016-11-30 10:43               ` Jesper Dangaard Brouer
2016-11-30 17:42                 ` Rick Jones
2016-11-30 18:11                   ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161121170351.50a09ee1@redhat.com \
    --to=brouer@redhat.com \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=rick.jones2@hpe.com \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.