From: Brenden Blanco <bblanco@plumgrid.com>
To: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	davem@davemloft.net, netdev@vger.kernel.org, tom@herbertland.com,
	daniel@iogearbox.net, john.fastabend@gmail.com,
	Eran Ben Elisha <eranbe@mellanox.com>,
	Rana Shahout <ranas@mellanox.com>,
	Matan Barak <matanb@mellanox.com>
Subject: Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program
Date: Tue, 5 Apr 2016 21:05:04 -0700
Message-ID: <20160406040503.GA18574@gmail.com>
In-Reply-To: <5703C878.7050307@mellanox.com>

On Tue, Apr 05, 2016 at 05:15:20PM +0300, Or Gerlitz wrote:
> On 4/4/2016 9:50 PM, Alexei Starovoitov wrote:
> >On Mon, Apr 04, 2016 at 08:22:03AM -0700, Eric Dumazet wrote:
> >>A single flow is able to use 40Gbit on those 40Gbit NIC, so there is not
> >>a single 10GB trunk used for a given flow.
> >>
> >>This 14Mpps thing seems to be a queue limitation on mlx4.
> >yeah, could be queueing related. Multiple cpus can send ~30Mpps of the same 64 byte packet,
> >but mlx4 can only receive 14.5Mpps. Odd.
> >
> >Or (and other mellanox guys), what is really going on inside 40G nic?
> 
> Hi Alexei,
> 
> Not that I know everything that goes inside there, and not that if I
> knew it all I could have posted that here (I heard HWs sometimes
> have IP)... but, anyway, as for your questions:
> 
> ConnectX3 40Gbs NIC can receive > 10Gbs packet-worthy (14.5M) in
> single ring and Mellanox
> 100Gbs NICs can receive > 25Gbs packet-worthy (37.5M) in single
> ring, people that use DPDK (...) even see this numbers and AFAIU we
> now attempt to see that in the kernel with XDP :)
> 
> I realize that we might have some issues in the mlx4 driver
> reporting on HW drops. Eran (cc-ed) and Co are looking on that.
Thanks!
> 
> In parallel to doing so, I would suggest you to do some experiments
> that might shed some more light, if on the TX side you do
> 
> $ ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
> 
> On the RX side,  skip RSS and force the packets that match that
> traffic pattern to go to (say) ring (==action) 0
> 
> $ ethtool -U $DEV flow-type ip4 dst-mac $MAC dst-ip $IP action 0 loc 0

I added the module parameter:
  options mlx4_core log_num_mgm_entry_size=-1
With this I was able to reach >20 Mpps, regardless of the ethtool
settings mentioned above.

 25.31%  ksoftirqd/0   [mlx4_en]         [k] mlx4_en_process_rx_cq
 20.18%  ksoftirqd/0   [mlx4_en]         [k] mlx4_en_alloc_frags
  8.42%  ksoftirqd/0   [mlx4_en]         [k] mlx4_en_free_frag
  5.59%  swapper       [kernel.vmlinux]  [k] poll_idle
  5.38%  ksoftirqd/0   [kernel.vmlinux]  [k] get_page_from_freelist
  3.06%  ksoftirqd/0   [mlx4_en]         [k] mlx4_call_bpf
  2.73%  ksoftirqd/0   [mlx4_en]         [k] 0x000000000001cf94
  2.72%  ksoftirqd/0   [kernel.vmlinux]  [k] free_pages_prepare
  2.19%  ksoftirqd/0   [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
  2.08%  ksoftirqd/0   [kernel.vmlinux]  [k] sk_load_byte_positive_offset
  1.72%  ksoftirqd/0   [kernel.vmlinux]  [k] free_one_page
  1.59%  ksoftirqd/0   [kernel.vmlinux]  [k] bpf_map_lookup_elem
  1.30%  ksoftirqd/0   [mlx4_en]         [k] 0x000000000001cfc1
  1.07%  ksoftirqd/0   [kernel.vmlinux]  [k] __alloc_pages_nodemask
  1.00%  ksoftirqd/0   [mlx4_en]         [k] mlx4_alloc_pages.isra.23
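
To make the profile above easier to read at a glance, here is a rough
sketch that sums the listed samples per module. It only covers the
symbols shown in the listing (anything below the 1% cutoff is not
included), and the helper name share_by_module is made up for this
illustration, not part of perf.

```python
from collections import defaultdict

# perf output copied from the listing above (top entries only).
PROFILE = """\
 25.31%  ksoftirqd/0   [mlx4_en]         [k] mlx4_en_process_rx_cq
 20.18%  ksoftirqd/0   [mlx4_en]         [k] mlx4_en_alloc_frags
  8.42%  ksoftirqd/0   [mlx4_en]         [k] mlx4_en_free_frag
  5.59%  swapper       [kernel.vmlinux]  [k] poll_idle
  5.38%  ksoftirqd/0   [kernel.vmlinux]  [k] get_page_from_freelist
  3.06%  ksoftirqd/0   [mlx4_en]         [k] mlx4_call_bpf
  2.73%  ksoftirqd/0   [mlx4_en]         [k] 0x000000000001cf94
  2.72%  ksoftirqd/0   [kernel.vmlinux]  [k] free_pages_prepare
  2.19%  ksoftirqd/0   [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
  2.08%  ksoftirqd/0   [kernel.vmlinux]  [k] sk_load_byte_positive_offset
  1.72%  ksoftirqd/0   [kernel.vmlinux]  [k] free_one_page
  1.59%  ksoftirqd/0   [kernel.vmlinux]  [k] bpf_map_lookup_elem
  1.30%  ksoftirqd/0   [mlx4_en]         [k] 0x000000000001cfc1
  1.07%  ksoftirqd/0   [kernel.vmlinux]  [k] __alloc_pages_nodemask
  1.00%  ksoftirqd/0   [mlx4_en]         [k] mlx4_alloc_pages.isra.23
"""


def share_by_module(profile):
    """Sum the percentage column per module (third column, in brackets)."""
    totals = defaultdict(float)
    for line in profile.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        pct, module = fields[0], fields[2]
        totals[module.strip("[]")] += float(pct.rstrip("%"))
    return dict(totals)


shares = share_by_module(PROFILE)
for module, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{module:16s} {pct:6.2f}%")
```

Of the samples listed, most land in mlx4_en itself, with the page
allocator and bpf helpers in kernel.vmlinux making up the rest.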

> 
> to go back to RSS remove the rule
> 
> $ ethtool -U $DEV delete action 0
> 
> FWIW (not that I see how it helps you now), you can do HW drop on
> the RX side with ring -1
> 
> $ ethtool -U $DEV flow-type ip4 dst-mac $MAC dst-ip $IP action -1 loc 0
> 
> Or.
> 

Here also is the output from the two machines using a tool to get
ethtool delta stats at 1 second intervals:

----------- sender -----------
           tx_packets: 20,246,059
             tx_bytes: 1,214,763,540 bps    = 9,267.91 Mbps
            xmit_more: 19,463,226
        queue_stopped: 36,982
           wake_queue: 36,982
             rx_pause: 6,351
    tx_pause_duration: 124,974
  tx_pause_transition: 3,176
    tx_novlan_packets: 20,244,344
      tx_novlan_bytes: 1,295,629,440 bps    = 9,884.86 Mbps
          tx0_packets: 5,151,029
            tx0_bytes: 309,061,680 bps      = 2,357.95 Mbps
          tx1_packets: 5,094,532
            tx1_bytes: 305,671,920 bps      = 2,332.9 Mbps
          tx2_packets: 5,130,996
            tx2_bytes: 307,859,760 bps      = 2,348.78 Mbps
          tx3_packets: 5,135,513
            tx3_bytes: 308,130,780 bps      = 2,350.85 Mbps
                 UP 0: 9,389.68             Mbps = 100.00%
                 UP 0: 20,512,070           Tran/sec = 100.00%

----------- receiver -----------
           rx_packets: 20,207,929
             rx_bytes: 1,212,475,740 bps    = 9,250.45 Mbps
           rx_dropped: 236,604
    rx_pause_duration: 128,436
  rx_pause_transition: 3,258
             tx_pause: 6,516
    rx_novlan_packets: 20,208,906
      rx_novlan_bytes: 1,293,369,984 bps    = 9,867.62 Mbps
          rx0_packets: 20,444,526
            rx0_bytes: 1,226,671,560 bps    = 9,358.76 Mbps
