From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH RFC net-next 1/2] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
Date: Tue, 20 Feb 2024 16:32:06 +0100	[thread overview]
Message-ID: <20240220153206.AUZ_zP24@linutronix.de> (raw)
In-Reply-To: <07620deb-2b96-4bcc-a045-480568a27c58@kernel.org>

On 2024-02-20 13:57:24 [+0100], Jesper Dangaard Brouer wrote:
> > so I replaced nr_cpu_ids with 64 and booted with maxcpus=64 so that I can run
> > xdp-bench on the ixgbe.
> > 
> 
> Yes, ixgbe HW has limited TX queues, and XDP tries to allocate a
> hardware TX queue for every CPU in the system.  So, I guess you have too
> many CPUs in your system - lol.
> 
> Other drivers have a fallback to a locked XDP TX path, so this is also
> something to lookout for in the machine with i40e.

This locked XDP TX path starts at 64, but xdp_progs are rejected above 64 * 2.
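
For reference, a rough sketch of how such a fallback tends to look in a
driver's XDP xmit path. Purely illustrative and heavily simplified -- the
names (my_adapter, my_ring, my_ring_xmit_frame, ...) are made up and this is
not the actual ixgbe/i40e code:

/* Illustrative sketch only: how a driver can fall back to a shared,
 * spinlock-protected XDP TX ring once there are more CPUs than
 * hardware TX queues.
 */
#include <linux/cpumask.h>
#include <linux/smp.h>
#include <linux/spinlock.h>
#include <net/xdp.h>

struct my_ring {
	spinlock_t tx_lock;		/* only taken when rings are shared */
	/* descriptor ring state would live here */
};

struct my_adapter {
	unsigned int num_xdp_queues;	/* HW TX queues set aside for XDP */
	struct my_ring **xdp_ring;
};

/* Post one frame to the HW ring; stubbed out for the sketch. */
static int my_ring_xmit_frame(struct my_ring *ring, struct xdp_frame *xdpf)
{
	return 0;
}

static int my_xdp_xmit_one(struct my_adapter *adapter, struct xdp_frame *xdpf)
{
	unsigned int cpu = smp_processor_id();
	bool shared = adapter->num_xdp_queues < num_online_cpus();
	struct my_ring *ring = adapter->xdp_ring[cpu % adapter->num_xdp_queues];
	int ret;

	/* One queue per CPU: no lock needed.  More CPUs than queues:
	 * several CPUs map onto the same ring and must serialize.
	 */
	if (shared)
		spin_lock(&ring->tx_lock);
	ret = my_ring_xmit_frame(ring, xdpf);
	if (shared)
		spin_unlock(&ring->tx_lock);

	return ret;
}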

> > So: i40e sends, ixgbe receives.
> > 
> > -t 2
> > 
> > | Summary                 2,348,800 rx/s                  0 err/s
> > |   receive total         2,348,800 pkt/s         2,348,800 drop/s                0 error/s
> > |     cpu:0               2,348,800 pkt/s         2,348,800 drop/s                0 error/s
> > |   xdp_exception                 0 hit/s
> > 
> 
> This is way too low, with i40e sending.
> 
> On my system with only -t 1 my i40e driver can send with approx 15Mpps:
> 
>  Ethtool(i40e2) stat:     15028585 (  15,028,585) <= tx-0.packets /sec
>  Ethtool(i40e2) stat:     15028589 (  15,028,589) <= tx_packets /sec

-t 1 on ixgbe:
Show adapter(s) (eth1) statistics (ONLY that changed!)
Ethtool(eth1    ) stat:    107857263 (    107,857,263) <= tx_bytes /sec
Ethtool(eth1    ) stat:    115047684 (    115,047,684) <= tx_bytes_nic /sec
Ethtool(eth1    ) stat:      1797621 (      1,797,621) <= tx_packets /sec
Ethtool(eth1    ) stat:      1797636 (      1,797,636) <= tx_pkts_nic /sec
Ethtool(eth1    ) stat:    107857263 (    107,857,263) <= tx_queue_0_bytes /sec
Ethtool(eth1    ) stat:      1797621 (      1,797,621) <= tx_queue_0_packets /sec

-t 1 on i40e:
Ethtool(eno2np1 ) stat:           90 (             90) <= port.rx_bytes /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= port.rx_size_127 /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= port.rx_unicast /sec
Ethtool(eno2np1 ) stat:     79554379 (     79,554,379) <= port.tx_bytes /sec
Ethtool(eno2np1 ) stat:      1243037 (      1,243,037) <= port.tx_size_64 /sec
Ethtool(eno2np1 ) stat:      1243037 (      1,243,037) <= port.tx_unicast /sec
Ethtool(eno2np1 ) stat:           86 (             86) <= rx-32.bytes /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx-32.packets /sec
Ethtool(eno2np1 ) stat:           86 (             86) <= rx_bytes /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx_cache_waive /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx_packets /sec
Ethtool(eno2np1 ) stat:            1 (              1) <= rx_unicast /sec
Ethtool(eno2np1 ) stat:     74580821 (     74,580,821) <= tx-0.bytes /sec
Ethtool(eno2np1 ) stat:      1243014 (      1,243,014) <= tx-0.packets /sec
Ethtool(eno2np1 ) stat:     74580821 (     74,580,821) <= tx_bytes /sec
Ethtool(eno2np1 ) stat:      1243014 (      1,243,014) <= tx_packets /sec
Ethtool(eno2np1 ) stat:      1243037 (      1,243,037) <= tx_unicast /sec

Mine is slightly slower, but this seems to match what I see on the RX side.

> At this level, if you can verify that CPU:60 is 100% loaded, and the packet
> generator is sending more than the rx number, then it could work as a valid
> experiment.

i40e receiving on 8:
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni, 84.8 id,  0.0 wa,  0.0 hi, 15.2 si,  0.0 st 

ixgbe receiving on 13:
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni, 56.7 id,  0.0 wa,  0.0 hi, 43.3 si,  0.0 st 

Looks idle. On the sending side, kpktgend_0 is always at 100%.

> > -t 18
> > | Summary                 7,784,946 rx/s                  0 err/s
> > |   receive total         7,784,946 pkt/s         7,784,946 drop/s                0 error/s
> > |     cpu:60              7,784,946 pkt/s         7,784,946 drop/s                0 error/s
> > |   xdp_exception                 0 hit/s
> > 
> > after -t 18 it drops down to 2,…
> > Now I get worse results than before, since -t 8 says 7,5… and it did 8,4 in
> > the morning. Do you maybe have a .config for me, in case I did not enable
> > the performance switch?
> > 
> 
> I would look for root-cause with perf record +
>  perf report --sort cpu,comm,dso,symbol --no-children

perf top on the box while sending with ixgbe:
| Samples: 621K of event 'cycles', 4000 Hz, Event count (approx.): 49979376685 lost: 0/0 drop: 0/0
| Overhead  CPU  Command          Shared Object             Symbol
|   31.98%  000  kpktgend_0       [kernel]                  [k] xas_find
|    6.72%  000  kpktgend_0       [kernel]                  [k] pfn_to_dma_pte
|    5.63%  000  kpktgend_0       [kernel]                  [k] ixgbe_xmit_frame_ring
|    4.78%  000  kpktgend_0       [kernel]                  [k] dma_pte_clear_level
|    3.16%  000  kpktgend_0       [kernel]                  [k] __iommu_dma_unmap
|    2.30%  000  kpktgend_0       [kernel]                  [k] fq_ring_free_locked
|    1.99%  000  kpktgend_0       [kernel]                  [k] __domain_mapping
|    1.82%  000  kpktgend_0       [kernel]                  [k] iommu_dma_alloc_iova
|    1.80%  000  kpktgend_0       [kernel]                  [k] __iommu_map
|    1.72%  000  kpktgend_0       [kernel]                  [k] iommu_pgsize.isra.0
|    1.70%  000  kpktgend_0       [kernel]                  [k] __iommu_dma_map
|    1.63%  000  kpktgend_0       [kernel]                  [k] alloc_iova_fast
|    1.59%  000  kpktgend_0       [kernel]                  [k] _raw_spin_lock_irqsave
|    1.32%  000  kpktgend_0       [kernel]                  [k] iommu_map
|    1.30%  000  kpktgend_0       [kernel]                  [k] iommu_dma_map_page
|    1.23%  000  kpktgend_0       [kernel]                  [k] intel_iommu_iotlb_sync_map
|    1.21%  000  kpktgend_0       [kernel]                  [k] xa_find_after
|    1.17%  000  kpktgend_0       [kernel]                  [k] ixgbe_poll
|    1.06%  000  kpktgend_0       [kernel]                  [k] __iommu_unmap
|    1.04%  000  kpktgend_0       [kernel]                  [k] intel_iommu_unmap_pages
|    1.01%  000  kpktgend_0       [kernel]                  [k] free_iova_fast
|    0.96%  000  kpktgend_0       [pktgen]                  [k] pktgen_thread_worker
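
If I read the profile right, a large chunk of the ixgbe-side overhead is
per-packet DMA mapping through the Intel IOMMU (pfn_to_dma_pte,
__iommu_dma_map, alloc_iova_fast, ...). Roughly this happens for every
transmitted frame -- generic sketch only, not the ixgbe code, and
my_hw_post_tx() is a made-up stand-in:

/* Sketch of the per-packet DMA mapping any TX path does; with the
 * Intel IOMMU enabled, each map/unmap allocates/frees an IOVA and
 * walks/updates IO page tables, which is where the iommu/dma symbols
 * in the profile above come from.
 */
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

static void my_hw_post_tx(dma_addr_t dma, unsigned int len)
{
	/* write a TX descriptor and ring the doorbell; stubbed here */
}

static int my_tx_one(struct device *dev, struct sk_buff *skb)
{
	unsigned int len = skb_headlen(skb);
	dma_addr_t dma;

	/* IOVA allocation + IO page table update when the IOMMU is on */
	dma = dma_map_single(dev, skb->data, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	my_hw_post_tx(dma, len);

	/* later, on TX completion: tear the mapping down again */
	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}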

The i40e box while sending:
|Samples: 400K of event 'cycles:P', 4000 Hz, Event count (approx.): 80512443924 lost: 0/0 drop: 0/0
|Overhead  CPU  Command          Shared Object         Symbol
|  24.04%  000  kpktgend_0       [kernel]              [k] i40e_lan_xmit_frame
|  17.20%  019  swapper          [kernel]              [k] i40e_napi_poll
|   4.84%  019  swapper          [kernel]              [k] intel_idle_irq
|   4.20%  019  swapper          [kernel]              [k] napi_consume_skb
|   3.00%  000  kpktgend_0       [pktgen]              [k] pktgen_thread_worker
|   2.76%  008  swapper          [kernel]              [k] i40e_napi_poll
|   2.36%  000  kpktgend_0       [kernel]              [k] dma_map_page_attrs
|   1.93%  019  swapper          [kernel]              [k] dma_unmap_page_attrs
|   1.70%  008  swapper          [kernel]              [k] intel_idle_irq
|   1.44%  008  swapper          [kernel]              [k] __udp4_lib_rcv
|   1.44%  008  swapper          [kernel]              [k] __netif_receive_skb_core.constprop.0
|   1.40%  008  swapper          [kernel]              [k] napi_build_skb
|   1.28%  000  kpktgend_0       [kernel]              [k] kfree_skb_reason
|   1.27%  008  swapper          [kernel]              [k] ip_rcv_core
|   1.19%  008  swapper          [kernel]              [k] inet_gro_receive
|   1.01%  008  swapper          [kernel]              [k] kmem_cache_free.part.0

> --Jesper

Sebastian
