* XDP Performance Regression in recent kernel versions
@ 2024-06-18 15:28 Sebastiano Miano
2024-06-19 6:00 ` Tariq Toukan
2024-06-19 16:27 ` Jesper Dangaard Brouer
0 siblings, 2 replies; 20+ messages in thread
From: Sebastiano Miano @ 2024-06-18 15:28 UTC (permalink / raw)
To: bpf, netdev; +Cc: saeedm, tariqt, hawk, edumazet, kuba, pabeni
Hi folks,
I have been conducting some basic experiments with XDP and have
observed a significant performance regression in recent kernel
versions compared to v5.15.
My setup is the following:
- Hardware: Two machines connected back-to-back with 100G Mellanox
ConnectX-6 Dx.
- DUT: 2x16 core Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz.
- Software: xdp-bench program from [1] running on the DUT in both DROP
and TX modes (rough invocation sketched after this list).
- Traffic generator: Pktgen-DPDK sending traffic with a single 64B UDP
flow at ~130Mpps.
- Tests: Single core, HT disabled
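For reference, the benchmark was launched roughly as follows (a sketch;
the interface name is a placeholder and queue/pinning details depend on
the setup):
  # Use a single combined RX/TX queue so one core handles all traffic
  ethtool -L enp1s0f0 combined 1
  # XDP_DROP: drop every packet at the earliest point in the driver
  xdp-bench drop enp1s0f0
  # XDP_TX: swap MAC addresses and bounce packets back out the same port
  xdp-bench tx enp1s0f0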
Results:
Kernel version      | XDP_DROP | XDP_TX
--------------------|----------|---------
5.15                | 30Mpps   | 16.1Mpps
6.2                 | 21.3Mpps | 14.1Mpps
6.5                 | 19.9Mpps | 8.6Mpps
bpf-next (6.10-rc2) | 22.1Mpps | 9.2Mpps
I repeated the experiments multiple times and consistently obtained
similar results.
Are you aware of any performance regressions in recent kernel versions
that could explain these results?
[1] https://github.com/xdp-project/xdp-tools
* Re: XDP Performance Regression in recent kernel versions
2024-06-18 15:28 XDP Performance Regression in recent kernel versions Sebastiano Miano
@ 2024-06-19 6:00 ` Tariq Toukan
2024-06-19 15:17 ` Sebastiano Miano
2024-06-19 16:27 ` Jesper Dangaard Brouer
1 sibling, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2024-06-19 6:00 UTC (permalink / raw)
To: Sebastiano Miano, bpf, netdev
Cc: saeedm, tariqt, hawk, edumazet, kuba, pabeni, Gal Pressman, amira
On 18/06/2024 18:28, Sebastiano Miano wrote:
> Hi folks,
>
> I have been conducting some basic experiments with XDP and have
> observed a significant performance regression in recent kernel
> versions compared to v5.15.
>
Hi,
> My setup is the following:
> - Hardware: Two machines connected back-to-back with 100G Mellanox
> ConnectX-6 Dx.
> - DUT: 2x16 core Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz.
> - Software: xdp-bench program from [1] running on the DUT in both DROP
> and TX modes.
> - Traffic generator: Pktgen-DPDK sending traffic with a single 64B UDP
> flow at ~130Mpps.
> - Tests: Single core, HT disabled
>
> Results:
>
> Kernel version |-------| XDP_DROP |--------| XDP_TX |
> 5.15 30Mpps 16.1Mpps
> 6.2 21.3Mpps 14.1Mpps
> 6.5 19.9Mpps 8.6Mpps
> bpf-next (6.10-rc2) 22.1Mpps 9.2Mpps
>
> I repeated the experiments multiple times and consistently obtained
> similar results.
> Are you aware of any performance regressions in recent kernel versions
> that could explain these results?
>
> [1] https://github.com/xdp-project/xdp-tools
>
Thanks for your report.
I assume CPU utilization for the active core on the DUT is 100% in all
cases, right?
Can you please share some more details, like the relevant ethtool
counters and perf top output?
We'll check if this reproduces for us as well.
* Re: XDP Performance Regression in recent kernel versions
2024-06-19 6:00 ` Tariq Toukan
@ 2024-06-19 15:17 ` Sebastiano Miano
0 siblings, 0 replies; 20+ messages in thread
From: Sebastiano Miano @ 2024-06-19 15:17 UTC (permalink / raw)
To: Tariq Toukan
Cc: bpf, netdev, saeedm, hawk, edumazet, kuba, pabeni, Gal Pressman,
amira
On Wed, 19 Jun 2024 at 08:00, Tariq Toukan <tariqt@nvidia.com> wrote:
>
> Thanks for your report.
>
> I assume cpu util for the active core on the DUT is 100% in all cases,
> right?
Yes, that's correct.
The IRQ is also pinned to a core on the correct NUMA node, and I have
disabled CPU frequency scaling.
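Roughly along these lines (a sketch; the IRQ number and core are
placeholders for this setup):
  # Pin the active mlx5 channel IRQ (from /proc/interrupts) to one core
  # on the NUMA node local to the NIC
  echo 4 > /proc/irq/128/smp_affinity_list
  # Keep that core at a fixed frequency via the performance governor
  cpupower frequency-set -g performance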
>
> Can you please share some more details? Like relevant ethtool counters,
> and perf top output.
>
> We'll check if this repro for us as well.
Sure, below you can find the reports for the XDP_DROP and XDP_TX cases.
I am attaching only the ones for kernel v5.15 vs v6.5.
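They were captured with commands along these lines (a sketch; the
interface name and core number are placeholders):
  # NIC/driver counters after each run (zero-valued counters trimmed below)
  ethtool -S enp1s0f0
  # Profile the single busy core while traffic is flowing
  perf top -C 4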
--------------------------------------------------
ethtool output (5.15) - Missing counters are zero
--------------------------------------------------
NIC statistics:
rx_packets: 333854100
rx_bytes: 20031246044
tx_packets: 25
tx_bytes: 2070
rx_csum_unnecessary: 333854079
rx_xdp_drop: 3753342954
rx_xdp_redirect: 0
rx_xdp_tx_xmit: 5582660674
rx_xdp_tx_mpwqe: 175018775
rx_xdp_tx_inlnw: 8970048
rx_xdp_tx_nops: 378338337
rx_xdp_tx_full: 0
rx_xdp_tx_err: 0
rx_xdp_tx_cqe: 87229072
rx_cache_reuse: 9369255040
rx_cache_full: 68
rx_cache_empty: 16153471
rx_cache_busy: 193
rx_cache_waive: 15864256
rx_congst_umr: 158
ch_events: 448
ch_poll: 151091830
ch_arm: 301
rx_out_of_buffer: 990473555
rx_if_down_packets: 67469721
rx_steer_missed_packets: 1962570491
rx_vport_unicast_packets: 38460159194
rx_vport_unicast_bytes: 2461450188460
tx_vport_unicast_packets: 5582654212
tx_vport_unicast_bytes: 334959252764
tx_packets_phy: 5588396729
rx_packets_phy: 97052087562
tx_bytes_phy: 357657403514
rx_bytes_phy: 6211329423080
tx_mac_control_phy: 5745055
tx_pause_ctrl_phy: 5745055
rx_discards_phy: 58591428329
tx_discards_phy: 0
tx_errors_phy: 0
rx_undersize_pkts_phy: 0
rx_fragments_phy: 0
rx_jabbers_phy: 0
rx_64_bytes_phy: 97052040472
rx_65_to_127_bytes_phy: 3
rx_128_to_255_bytes_phy: 0
rx_256_to_511_bytes_phy: 26
rx_512_to_1023_bytes_phy: 0
rx_1024_to_1518_bytes_phy: 0
rx_1519_to_2047_bytes_phy: 0
rx_2048_to_4095_bytes_phy: 0
rx_4096_to_8191_bytes_phy: 0
rx_8192_to_10239_bytes_phy: 0
rx_prio0_bytes: 6211318150440
rx_prio0_packets: 38460533605
rx_prio0_discards: 58591314012
tx_prio0_bytes: 357288052986
tx_prio0_packets: 5582625883
tx_global_pause: 5745042
tx_global_pause_duration: 771103810
ch0_events: 55
ch0_poll: 146981606
ch0_arm: 35
ch0_aff_change: 6
ch0_force_irq: 0
ch0_eq_rearm: 0
rx0_packets: 70812690
rx0_bytes: 4248761400
rx0_csum_complete: 0
rx0_csum_complete_tail: 0
rx0_csum_complete_tail_slow: 0
rx0_csum_unnecessary: 70812671
rx0_csum_unnecessary_inner: 0
rx0_csum_none: 19
rx0_xdp_drop: 3753342954
rx0_xdp_redirect: 0
rx0_lro_packets: 0
rx0_lro_bytes: 0
rx0_ecn_mark: 0
rx0_removed_vlan_packets: 0
rx0_wqe_err: 0
rx0_mpwqe_filler_cqes: 0
rx0_mpwqe_filler_strides: 0
rx0_oversize_pkts_sw_drop: 0
rx0_buff_alloc_err: 0
rx0_cqe_compress_blks: 0
rx0_cqe_compress_pkts: 0
rx0_cache_reuse: 9368316609
rx0_cache_full: 2
rx0_cache_empty: 11519
rx0_cache_busy: 0
rx0_cache_waive: 0
rx0_congst_umr: 158
rx0_arfs_err: 0
rx0_recover: 0
rx0_xdp_tx_xmit: 5582664928
rx0_xdp_tx_mpwqe: 175018908
rx0_xdp_tx_inlnw: 8970048
rx0_xdp_tx_nops: 378338623
rx0_xdp_tx_full: 0
rx0_xdp_tx_err: 0
rx0_xdp_tx_cqes: 87229139
--------------------------------------------------
perf top output (5.15) - XDP_DROP
--------------------------------------------------
19.27% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
11.74% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
9.82% [kernel] [k] mlx5e_xdp_handle
9.43% [kernel] [k] mlx5e_alloc_rx_mpwqe
9.29% bpf_prog_xdp_basic_prog [k] bpf_prog_5f76c01f0ff23233_xdp_basic_prog
7.06% [kernel] [k] mlx5e_page_release_dynamic
6.95% [kernel] [k] mlx5e_poll_rx_cq
5.89% [kernel] [k] dma_sync_single_for_cpu
5.21% [kernel] [k] dma_sync_single_for_device
4.12% [kernel] [k] mlx5e_free_rx_mpwqe
1.65% [kernel] [k] mlx5e_poll_ico_cq
1.60% [kernel] [k] mlx5e_napi_poll
1.59% [kernel] [k] bpf_get_smp_processor_id
0.94% [kernel] [k] bpf_dispatcher_xdp_func
0.91% [kernel] [k] net_rx_action
0.90% bpf_prog_xdp_dispatcher [k] bpf_prog_17d608957d1f805a_xdp_dispatcher
0.90% [kernel] [k] bpf_dispatcher_xdp
0.64% [kernel] [k] mlx5e_post_rx_mpwqes
0.64% [kernel] [k] mlx5e_poll_xdpsq_cq
0.37% [kernel] [k] __softirqentry_text_start
--------------------------------------------------
perf top output (5.15) - XDP_TX
--------------------------------------------------
13.84% bpf_prog_xdp_swap_macs_prog [k] bpf_prog_0a3ad412f28cbb6d_xdp_swap_macs_prog
11.43% [kernel] [k] mlx5e_xmit_xdp_buff
10.69% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
9.79% [kernel] [k] mlx5e_xmit_xdp_frame_mpwqe
8.35% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
6.34% [kernel] [k] dma_sync_single_for_device
6.20% [kernel] [k] mlx5e_poll_rx_cq
5.62% [kernel] [k] mlx5e_page_release_dynamic
5.33% [kernel] [k] mlx5e_xdp_handle
5.21% [kernel] [k] mlx5e_alloc_rx_mpwqe
4.47% [kernel] [k] mlx5e_free_xdpsq_desc
3.26% [kernel] [k] dma_sync_single_for_cpu
1.47% [kernel] [k] mlx5e_xmit_xdp_frame_check_mpwqe
1.22% [kernel] [k] mlx5e_poll_xdpsq_cq
0.95% [kernel] [k] net_rx_action
0.90% [kernel] [k] bpf_get_smp_processor_id
0.80% [kernel] [k] mlx5e_napi_poll
0.69% [kernel] [k] mlx5e_xdp_mpwqe_session_start
0.63% [kernel] [k] mlx5e_poll_ico_cq
0.49% [kernel] [k] bpf_dispatcher_xdp
0.47% [kernel] [k] bpf_dispatcher_xdp_func
---------------------------------------------------------------------------------------
--------------------------------------------------
ethtool output (6.5) - Missing counters are zero
--------------------------------------------------
NIC statistics:
rx_packets: 7282880
rx_bytes: 436973482
tx_packets: 42
tx_bytes: 3556
rx_csum_unnecessary: 7282816
rx_xdp_drop: 7783331724
rx_xdp_redirect: 0
rx_xdp_tx_xmit: 46956452544
rx_xdp_tx_mpwqe: 4401807536
rx_xdp_tx_inlnw: 46951234092
rx_xdp_tx_nops: 4988835176
rx_xdp_tx_full: 0
rx_xdp_tx_err: 0
rx_xdp_tx_cqe: 733694572
rx_pp_alloc_fast: 3641784
rx_pp_alloc_slow: 8
rx_pp_alloc_slow_high_order: 0
rx_pp_alloc_empty: 8
rx_pp_alloc_refill: 0
rx_pp_alloc_waive: 0
rx_pp_recycle_cached: 3641280
ch_events: 505
ch_poll: 855423286
rx_out_of_buffer: 534918379
rx_if_down_packets: 4044804
rx_steer_missed_packets: 298
rx_vport_unicast_packets: 287214261626
rx_vport_unicast_bytes: 18381712744116
tx_vport_unicast_packets: 46956452544
tx_vport_unicast_bytes: 2817387157674
tx_packets_phy: 47000866603
rx_packets_phy: 728277471186
tx_bytes_phy: 3008055468662
rx_bytes_phy: 46609758231313
tx_mac_control_phy: 44414017
tx_pause_ctrl_phy: 44414017
rx_discards_phy: 441063206498
rx_64_bytes_phy: 728277470842
rx_65_to_127_bytes_phy: 133
rx_128_to_255_bytes_phy: 0
rx_256_to_511_bytes_phy: 211
rx_512_to_1023_bytes_phy: 0
rx_1024_to_1518_bytes_phy: 0
rx_1519_to_2047_bytes_phy: 0
rx_2048_to_4095_bytes_phy: 0
rx_4096_to_8191_bytes_phy: 0
rx_8192_to_10239_bytes_phy: 0
rx_buffer_passed_thres_phy: 1192226
rx_prio0_bytes: 46609758231313
rx_prio0_packets: 287214264688
rx_prio0_discards: 441063206498
tx_prio0_bytes: 3005212971574
tx_prio0_packets: 46956452586
tx_global_pause: 44414017
tx_global_pause_duration: 5961284324
ch0_events: 120
ch0_poll: 855423025
ch0_arm: 100
ch0_aff_change: 0
ch0_force_irq: 0
ch0_eq_rearm: 0
rx0_packets: 7282880
rx0_bytes: 436973482
rx0_csum_complete: 0
rx0_csum_complete_tail: 0
rx0_csum_complete_tail_slow: 0
rx0_csum_unnecessary: 7282816
rx0_csum_unnecessary_inner: 0
rx0_csum_none: 64
rx0_xdp_drop: 7783331724
rx0_xdp_redirect: 0
rx0_lro_packets: 0
rx0_lro_bytes: 0
rx0_gro_packets: 0
rx0_gro_bytes: 0
rx0_gro_skbs: 0
rx0_gro_match_packets: 0
rx0_gro_large_hds: 0
rx0_ecn_mark: 0
rx0_removed_vlan_packets: 0
rx0_wqe_err: 0
rx0_mpwqe_filler_cqes: 0
rx0_mpwqe_filler_strides: 0
rx0_oversize_pkts_sw_drop: 0
rx0_buff_alloc_err: 0
rx0_cqe_compress_blks: 0
rx0_cqe_compress_pkts: 0
rx0_congst_umr: 0
rx0_arfs_err: 0
rx0_recover: 0
rx0_pp_alloc_fast: 3641784
rx0_pp_alloc_slow: 8
rx0_pp_alloc_slow_high_order: 0
rx0_pp_alloc_empty: 8
rx0_pp_alloc_refill: 0
rx0_pp_alloc_waive: 0
rx0_pp_recycle_cached: 3641280
rx0_pp_recycle_cache_full: 0
rx0_pp_recycle_ring: 0
rx0_pp_recycle_ring_full: 0
rx0_pp_recycle_released_ref: 0
rx0_xdp_tx_xmit: 46956452544
rx0_xdp_tx_mpwqe: 4401807536
rx0_xdp_tx_inlnw: 46951234092
rx0_xdp_tx_nops: 4988835176
rx0_xdp_tx_full: 0
rx0_xdp_tx_err: 0
rx0_xdp_tx_cqes: 733694572
--------------------------------------------------
perf top output (6.5) - XDP_DROP
--------------------------------------------------
27.63% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
12.61% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
8.38% [kernel] [k] mlx5e_rx_cq_process_basic_cqe_comp
7.06% [kernel] [k] page_pool_put_defragged_page
6.45% [kernel] [k] mlx5e_xdp_handle
5.36% bpf_prog_xdp_basic_prog [k] bpf_prog_5f76c01f0ff23233_xdp_basic_prog
4.95% [kernel] [k] dma_sync_single_for_device
4.89% [kernel] [k] page_pool_alloc_pages
4.36% [kernel] [k] mlx5e_alloc_rx_mpwqe
3.70% [kernel] [k] dma_sync_single_for_cpu
2.71% [kernel] [k] mlx5e_page_release_fragmented.isra.0
2.09% [kernel] [k] bpf_dispatcher_xdp_func
1.95% [kernel] [k] mlx5e_free_rx_mpwqe
1.10% [kernel] [k] mlx5e_poll_ico_cq
1.07% [kernel] [k] bpf_get_smp_processor_id
1.05% [kernel] [k] mlx5e_napi_poll
0.85% [kernel] [k] mlx5e_poll_xdpsq_cq
0.61% [kernel] [k] net_rx_action
0.58% bpf_prog_xdp_dispatcher [k] bpf_prog_17d608957d1f805a_xdp_dispatcher
0.57% [kernel] [k] bpf_dispatcher_xdp
0.53% [kernel] [k] mlx5e_post_rx_mpwqes
0.27% [kernel] [k] __do_softirq
0.25% [kernel] [k] mlx5e_poll_tx_cq
--------------------------------------------------
perf top output (6.5) - XDP_TX
--------------------------------------------------
19.60% [kernel] [k] mlx5e_xdp_mpwqe_add_dseg
14.61% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
11.55% [kernel] [k] mlx5e_xmit_xdp_buff
5.85% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
5.73% bpf_prog_xdp_swap_macs_prog [k] bpf_prog_0a3a_xdp_swap_macs_prog
5.09% [kernel] [k] mlx5e_free_xdpsq_desc
5.08% [kernel] [k] dma_sync_single_for_device
4.66% [kernel] [k] mlx5e_xmit_xdp_frame_mpwqe
3.64% [kernel] [k] mlx5e_rx_cq_process_basic_cqe_comp
3.34% [kernel] [k] page_pool_put_defragged_page
3.04% [kernel] [k] mlx5e_xdp_handle
3.03% [kernel] [k] mlx5e_page_release_fragmented.isra.0
2.56% [kernel] [k] dma_sync_single_for_cpu
2.15% [kernel] [k] mlx5e_alloc_rx_mpwqe
1.96% [kernel] [k] page_pool_alloc_pages
1.06% [kernel] [k] mlx5e_xmit_xdp_frame_check_mpwqe
1.02% [kernel] [k] bpf_dispatcher_xdp_func
1.01% [kernel] [k] mlx5e_free_rx_mpwqe
0.84% [kernel] [k] mlx5e_poll_xdpsq_cq
0.62% [kernel] [k] mlx5e_xdpsq_get_next_pi
0.53% [kernel] [k] mlx5e_poll_ico_cq
0.48% [kernel] [k] bpf_get_smp_processor_id
0.48% [kernel] [k] net_rx_action
0.36% [kernel] [k] mlx5e_napi_poll
0.32% [kernel] [k] mlx5e_xdp_mpwqe_complete
0.25% [kernel] [k] bpf_dispatcher_xdp
0.22% bpf_prog_xdp_dispatcher [k] bpf_prog_17d6_xdp_dispatcher
0.21% [kernel] [k] mlx5e_post_rx_mpwqes
0.11% [kernel] [k] __do_softirq
* Re: XDP Performance Regression in recent kernel versions
2024-06-18 15:28 XDP Performance Regression in recent kernel versions Sebastiano Miano
2024-06-19 6:00 ` Tariq Toukan
@ 2024-06-19 16:27 ` Jesper Dangaard Brouer
2024-06-19 19:17 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 20+ messages in thread
From: Jesper Dangaard Brouer @ 2024-06-19 16:27 UTC (permalink / raw)
To: Sebastiano Miano, bpf, netdev, Toke Hoiland Jorgensen,
Toke Høiland-Jørgensen
Cc: saeedm, tariqt, edumazet, kuba, pabeni
On 18/06/2024 17.28, Sebastiano Miano wrote:
> Hi folks,
>
> I have been conducting some basic experiments with XDP and have
> observed a significant performance regression in recent kernel
> versions compared to v5.15.
>
> My setup is the following:
> - Hardware: Two machines connected back-to-back with 100G Mellanox
> ConnectX-6 Dx.
> - DUT: 2x16 core Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz.
> - Software: xdp-bench program from [1] running on the DUT in both DROP
> and TX modes.
> - Traffic generator: Pktgen-DPDK sending traffic with a single 64B UDP
> flow at ~130Mpps.
> - Tests: Single core, HT disabled
>
> Results:
>
> Kernel version |-------| XDP_DROP |--------| XDP_TX |
> 5.15 30Mpps 16.1Mpps
> 6.2 21.3Mpps 14.1Mpps
> 6.5 19.9Mpps 8.6Mpps
> bpf-next (6.10-rc2) 22.1Mpps 9.2Mpps
>
Around when I left Red Hat there was a project with [LNST] that used
xdp-bench for tracking and finding regressions like this.
Perhaps Toke can enlighten us on whether that project has caught similar
regressions?
[LNST] https://github.com/LNST-project/lnst
> I repeated the experiments multiple times and consistently obtained
> similar results.
> Are you aware of any performance regressions in recent kernel versions
> that could explain these results?
>
> [1] https://github.com/xdp-project/xdp-tools
--Jesper
* Re: XDP Performance Regression in recent kernel versions
2024-06-19 16:27 ` Jesper Dangaard Brouer
@ 2024-06-19 19:17 ` Toke Høiland-Jørgensen
2024-06-20 9:52 ` Daniel Borkmann
0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-06-19 19:17 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Sebastiano Miano, bpf, netdev
Cc: saeedm, tariqt, edumazet, kuba, pabeni, Samuel Dobron
Jesper Dangaard Brouer <hawk@kernel.org> writes:
> On 18/06/2024 17.28, Sebastiano Miano wrote:
>> Hi folks,
>>
>> I have been conducting some basic experiments with XDP and have
>> observed a significant performance regression in recent kernel
>> versions compared to v5.15.
>>
>> My setup is the following:
>> - Hardware: Two machines connected back-to-back with 100G Mellanox
>> ConnectX-6 Dx.
>> - DUT: 2x16 core Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz.
>> - Software: xdp-bench program from [1] running on the DUT in both DROP
>> and TX modes.
>> - Traffic generator: Pktgen-DPDK sending traffic with a single 64B UDP
>> flow at ~130Mpps.
>> - Tests: Single core, HT disabled
>>
>> Results:
>>
>> Kernel version |-------| XDP_DROP |--------| XDP_TX |
>> 5.15 30Mpps 16.1Mpps
>> 6.2 21.3Mpps 14.1Mpps
>> 6.5 19.9Mpps 8.6Mpps
>> bpf-next (6.10-rc2) 22.1Mpps 9.2Mpps
>>
>
> Around when I left Red Hat there were a project with [LNST] that used
> xdp-bench for tracking and finding regressions like this.
>
> Perhaps Toke can enlighten us, if that project have caught similar
> regressions?
>
> [LNST] https://github.com/LNST-project/lnst
Yes, actually, we have! Here's the bugzilla for it:
https://bugzilla.redhat.com/show_bug.cgi?id=2270408
I'm on PTO for the rest of this week, but adding Samuel who ran the
tests to Cc, he should be able to provide more information if needed.
-Toke
* Re: XDP Performance Regression in recent kernel versions
2024-06-19 19:17 ` Toke Høiland-Jørgensen
@ 2024-06-20 9:52 ` Daniel Borkmann
2024-06-21 12:35 ` Samuel Dobron
0 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2024-06-20 9:52 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
Sebastiano Miano, bpf, netdev
Cc: saeedm, tariqt, edumazet, kuba, pabeni, Samuel Dobron, netdev
On 6/19/24 9:17 PM, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <hawk@kernel.org> writes:
>> On 18/06/2024 17.28, Sebastiano Miano wrote:
>>> I have been conducting some basic experiments with XDP and have
>>> observed a significant performance regression in recent kernel
>>> versions compared to v5.15.
>>>
>>> My setup is the following:
>>> - Hardware: Two machines connected back-to-back with 100G Mellanox
>>> ConnectX-6 Dx.
>>> - DUT: 2x16 core Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz.
>>> - Software: xdp-bench program from [1] running on the DUT in both DROP
>>> and TX modes.
>>> - Traffic generator: Pktgen-DPDK sending traffic with a single 64B UDP
>>> flow at ~130Mpps.
>>> - Tests: Single core, HT disabled
>>>
>>> Results:
>>>
>>> Kernel version |-------| XDP_DROP |--------| XDP_TX |
>>> 5.15 30Mpps 16.1Mpps
>>> 6.2 21.3Mpps 14.1Mpps
>>> 6.5 19.9Mpps 8.6Mpps
>>> bpf-next (6.10-rc2) 22.1Mpps 9.2Mpps
>>>
>>
>> Around when I left Red Hat there were a project with [LNST] that used
>> xdp-bench for tracking and finding regressions like this.
>>
>> Perhaps Toke can enlighten us, if that project have caught similar
>> regressions?
>>
>> [LNST] https://github.com/LNST-project/lnst
>
> Yes, actually, we have! Here's the bugzilla for it:
> https://bugzilla.redhat.com/show_bug.cgi?id=2270408
> We compared performance of ELN and RHEL9 candidate kernels and noticed significant
> drop in XDP drop [1] on mlx5 (25G).
>
> On any rhel9 candidate kernel we are able to drop 19-20M pkts/sec but on an ELN
> kernels, we are reaching just 15M pkts/sec (CPU utillization remains the same -
> around 100%).
>
> We don't see such regression on ixgbe or i40e.
It looks like this has been known since March; was this ever reported to
Nvidia back then? :/
Given XDP is in the critical path for many in production, we should think about
regular performance reporting for the different vendors for each released kernel,
similar to here [0].
Out of curiosity, @Saeed: Is Nvidia internally regularly assessing XDP perf for mlx5
as part of QA? (Probably not, but I thought I'd ask.)
Thanks,
Daniel
[0] http://core.dpdk.org/perf-reports/
* Re: XDP Performance Regression in recent kernel versions
2024-06-20 9:52 ` Daniel Borkmann
@ 2024-06-21 12:35 ` Samuel Dobron
2024-06-24 11:46 ` Toke Høiland-Jørgensen
2024-06-30 11:43 ` Tariq Toukan
0 siblings, 2 replies; 20+ messages in thread
From: Samuel Dobron @ 2024-06-21 12:35 UTC (permalink / raw)
To: Daniel Borkmann, hawk
Cc: Toke Høiland-Jørgensen, Sebastiano Miano, bpf, netdev,
saeedm, tariqt, edumazet, kuba, pabeni
Hey all,
Yeah, we do tests for ELN kernels [1] on a regular basis. Since
~January of this year.
As already mentioned, mlx5 is the only driver affected by this regression.
Unfortunately, I think Jesper is actually hitting 2 regressions we noticed:
the one already mentioned by Toke, and another one [0] that was reported
in early February.
Btw, the issue mentioned by Toke has been moved to Jira, see [5].
Not sure all of you are able to see the content of [0]; Jira says it's
RH-confidential.
So I am not sure how much I can share without being fired :D. Anyway, the
affected kernels were released a while ago, so anyone can find it
on their own.
Basically, we detected a 5% regression on XDP_DROP+mlx5 (currently, we
don't have data for any other XDP mode) in kernel-5.14 compared to
previous builds.
From the test history, I can see (most likely) the same improvement
on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say the 20% drop has been
(partially) fixed?
For earlier 6.10 kernels we don't have data due to [3] (there is a
regression on XDP_DROP there as well, but I believe it's a turbo-boost
issue, as I mentioned in the issue).
So if you want to run tests on 6.10, please see [3].
Summary XDP_DROP+mlx5@25G:
kernel      pps      notes
<5.14       20.5M    baseline
>=5.14      19M      [0]
<6.4        19-20M   baseline for ELN kernels
>=6.4       15M      [4 and 5] (mentioned by Toke)
>=6.10      ???      [3]
>=6.10rc2   17-18M
> It looks like this is known since March, was this ever reported to Nvidia back
> then? :/
Not sure if that's a question for me; I was told that filing an issue in
Bugzilla/Jira is where our competences end. Who is supposed to report it
to them?
> Given XDP is in the critical path for many in production, we should think about
> regular performance reporting for the different vendors for each released kernel,
> similar to here [0].
I think this might be part of the upstream kernel testing with LNST?
Maybe Jesper knows more about that? Until then, I think I can let you
know about new regressions we catch.
Thanks,
Sam.
[0] https://issues.redhat.com/browse/RHEL-24054
[1] https://koji.fedoraproject.org/koji/search?terms=kernel-%5Cd.*eln*&type=build&match=regexp
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=2469107
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2282969
[4] https://bugzilla.redhat.com/show_bug.cgi?id=2270408
[5] https://issues.redhat.com/browse/RHEL-24054
* Re: XDP Performance Regression in recent kernel versions
2024-06-21 12:35 ` Samuel Dobron
@ 2024-06-24 11:46 ` Toke Høiland-Jørgensen
2024-06-30 10:25 ` Tariq Toukan
2024-06-30 11:43 ` Tariq Toukan
1 sibling, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-06-24 11:46 UTC (permalink / raw)
To: Samuel Dobron, Daniel Borkmann, hawk
Cc: Sebastiano Miano, bpf, netdev, saeedm, tariqt, edumazet, kuba,
pabeni
Samuel Dobron <sdobron@redhat.com> writes:
>> It looks like this is known since March, was this ever reported to Nvidia back
>> then? :/
>
> Not sure if that's a question for me, I was told, filling an issue in
> Bugzilla/Jira is where
> our competences end. Who is supposed to report it to them?
I don't think we have a formal reporting procedure, but I was planning
to send this to the list, referencing the Bugzilla entry. Seems I
dropped the ball on that; sorry! :(
Can we set up a better reporting procedure for this going forward? A
mailing list, or just a name we can put in reports? Or something else?
Tariq, any preferences?
-Toke
* Re: XDP Performance Regression in recent kernel versions
2024-06-24 11:46 ` Toke Høiland-Jørgensen
@ 2024-06-30 10:25 ` Tariq Toukan
2024-07-22 10:57 ` Samuel Dobron
0 siblings, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2024-06-30 10:25 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Samuel Dobron, Daniel Borkmann,
hawk
Cc: Sebastiano Miano, bpf, netdev, saeedm, edumazet, kuba, pabeni,
Dragos Tatulea
On 24/06/2024 14:46, Toke Høiland-Jørgensen wrote:
> Samuel Dobron <sdobron@redhat.com> writes:
>
>>> It looks like this is known since March, was this ever reported to Nvidia back
>>> then? :/
>>
>> Not sure if that's a question for me, I was told, filling an issue in
>> Bugzilla/Jira is where
>> our competences end. Who is supposed to report it to them?
>
> I don't think we have a formal reporting procedure, but I was planning
> to send this to the list, referencing the Bugzilla entry. Seems I
> dropped the ball on that; sorry! :(
>
> Can we set up a better reporting procedure for this going forward? A
> mailing list, or just a name we can put in reports? Or something else?
> Tariq, any preferences?
>
> -Toke
>
Hi,
Please Cc Dragos and me on XDP mailing list reports.
Regards,
Tariq
* Re: XDP Performance Regression in recent kernel versions
2024-06-21 12:35 ` Samuel Dobron
2024-06-24 11:46 ` Toke Høiland-Jørgensen
@ 2024-06-30 11:43 ` Tariq Toukan
2024-07-22 9:26 ` Dragos Tatulea
1 sibling, 1 reply; 20+ messages in thread
From: Tariq Toukan @ 2024-06-30 11:43 UTC (permalink / raw)
To: Samuel Dobron, Daniel Borkmann, hawk, Dragos Tatulea
Cc: Toke Høiland-Jørgensen, Sebastiano Miano, bpf, netdev,
saeedm, edumazet, kuba, pabeni
On 21/06/2024 15:35, Samuel Dobron wrote:
> Hey all,
>
> Yeah, we do tests for ELN kernels [1] on a regular basis. Since
> ~January of this year.
>
> As already mentioned, mlx5 is the only driver affected by this regression.
> Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
> the one already mentioned by Toke, another one [0] has been reported
> in early February.
> Btw. issue mentioned by Toke has been moved to Jira, see [5].
>
> Not sure all of you are able to see the content of [0], Jira says it's
> RH-confidental.
> So, I am not sure how much I can share without being fired :D. Anyway,
> affected kernels have been released a while ago, so anyone can find it
> on its own.
> Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
> don't have data for any other XDP mode) in kernel-5.14 compared to
> previous builds.
>
> From tests history, I can see (most likely) the same improvement
> on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
> (partially) fixed?
>
> For earlier 6.10. kernels we don't have data due to [3] (there is regression on
> XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
> in issue).
> So if you want to run tests on 6.10. please see [3].
>
> Summary XDP_DROP+mlx5@25G:
> kernel pps
> <5.14 20.5M baseline
>> =5.14 19M [0]
> <6.4 19-20M baseline for ELN kernels
>> =6.4 15M [4 and 5] (mentioned by Toke)
+ @Dragos
That's about when we added several changes to the RX datapath.
Most relevant are:
- Fully removing the in-driver RX page-cache.
- Refactoring to support XDP multi-buffer.
We tested XDP performance before submission; I don't recall that we
noticed such a degradation.
I'll check with Dragos as he probably has these reports.
>> =6.10 ??? [3]
>> =6.10rc2 17M-18M
>
>
>> It looks like this is known since March, was this ever reported to Nvidia back
>> then? :/
>
> Not sure if that's a question for me, I was told, filling an issue in
> Bugzilla/Jira is where
> our competences end. Who is supposed to report it to them?
>
>> Given XDP is in the critical path for many in production, we should think about
>> regular performance reporting for the different vendors for each released kernel,
>> similar to here [0].
>
> I think this might be the part of upstream kernel testing with LNST?
> Maybe Jesper
> knows more about that? Until then, I think, I can let you know about
> new regressions we catch.
>
> Thanks,
> Sam.
>
> [0] https://issues.redhat.com/browse/RHEL-24054
> [1] https://koji.fedoraproject.org/koji/search?terms=kernel-%5Cd.*eln*&type=build&match=regexp
> [2] https://koji.fedoraproject.org/koji/buildinfo?buildID=2469107
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2282969
> [4] https://bugzilla.redhat.com/show_bug.cgi?id=2270408
> [5] https://issues.redhat.com/browse/RHEL-24054
>
* Re: XDP Performance Regression in recent kernel versions
2024-06-30 11:43 ` Tariq Toukan
@ 2024-07-22 9:26 ` Dragos Tatulea
2024-07-23 9:52 ` Carolina Jubran
0 siblings, 1 reply; 20+ messages in thread
From: Dragos Tatulea @ 2024-07-22 9:26 UTC (permalink / raw)
To: Tariq Toukan, daniel@iogearbox.net, Carolina Jubran,
sdobron@redhat.com, hawk@kernel.org
Cc: toke@redhat.com, mianosebastiano@gmail.com, pabeni@redhat.com,
netdev@vger.kernel.org, edumazet@google.com, Saeed Mahameed,
bpf@vger.kernel.org, kuba@kernel.org
On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
>
> On 21/06/2024 15:35, Samuel Dobron wrote:
> > Hey all,
> >
> > Yeah, we do tests for ELN kernels [1] on a regular basis. Since
> > ~January of this year.
> >
> > As already mentioned, mlx5 is the only driver affected by this regression.
> > Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
> > the one already mentioned by Toke, another one [0] has been reported
> > in early February.
> > Btw. issue mentioned by Toke has been moved to Jira, see [5].
> >
> > Not sure all of you are able to see the content of [0], Jira says it's
> > RH-confidental.
> > So, I am not sure how much I can share without being fired :D. Anyway,
> > affected kernels have been released a while ago, so anyone can find it
> > on its own.
> > Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
> > don't have data for any other XDP mode) in kernel-5.14 compared to
> > previous builds.
> >
> > From tests history, I can see (most likely) the same improvement
> > on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
> > (partially) fixed?
> >
> > For earlier 6.10. kernels we don't have data due to [3] (there is regression on
> > XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
> > in issue).
> > So if you want to run tests on 6.10. please see [3].
> >
> > Summary XDP_DROP+mlx5@25G:
> > kernel pps
> > <5.14 20.5M baseline
> > > =5.14 19M [0]
> > <6.4 19-20M baseline for ELN kernels
> > > =6.4 15M [4 and 5] (mentioned by Toke)
>
> + @Dragos
>
> That's about when we added several changes to the RX datapath.
> Most relevant are:
> - Fully removing the in-driver RX page-cache.
> - Refactoring to support XDP multi-buffer.
>
> We tested XDP performance before submission, I don't recall we noticed
> such a degradation.
Adding Carolina to post her analysis on this.
>
> I'll check with Dragos as he probably has these reports.
>
We only noticed a 6% degradation for XDP_DROP.
https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@redhat.com/T/
> > > =6.10 ??? [3]
> > > =6.10rc2 17M-18M
> >
> >
> > > It looks like this is known since March, was this ever reported to Nvidia back
> > > then? :/
> >
> > Not sure if that's a question for me, I was told, filling an issue in
> > Bugzilla/Jira is where
> > our competences end. Who is supposed to report it to them?
> >
> > > Given XDP is in the critical path for many in production, we should think about
> > > regular performance reporting for the different vendors for each released kernel,
> > > similar to here [0].
> >
> > I think this might be the part of upstream kernel testing with LNST?
> > Maybe Jesper
> > knows more about that? Until then, I think, I can let you know about
> > new regressions we catch.
> >
> > Thanks,
> > Sam.
> >
> > [0] https://issues.redhat.com/browse/RHEL-24054
> > [1] https://koji.fedoraproject.org/koji/search?terms=kernel-%5Cd.*eln*&type=build&match=regexp
> > [2] https://koji.fedoraproject.org/koji/buildinfo?buildID=2469107
> > [3] https://bugzilla.redhat.com/show_bug.cgi?id=2282969
> > [4] https://bugzilla.redhat.com/show_bug.cgi?id=2270408
> > [5] https://issues.redhat.com/browse/RHEL-24054
> >
* Re: XDP Performance Regression in recent kernel versions
2024-06-30 10:25 ` Tariq Toukan
@ 2024-07-22 10:57 ` Samuel Dobron
0 siblings, 0 replies; 20+ messages in thread
From: Samuel Dobron @ 2024-07-22 10:57 UTC (permalink / raw)
To: Tariq Toukan
Cc: Toke Høiland-Jørgensen, Daniel Borkmann, hawk,
Sebastiano Miano, bpf, netdev, saeedm, edumazet, kuba, pabeni,
Dragos Tatulea
Hey,
Sorry for the wait.
I've started a discussion within our team about how to handle this, since
we don't have a reporting process defined. So it may take some time; I'll
let you know.
Thanks,
Sam.
On Sun, Jun 30, 2024 at 12:26 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
>
>
> On 24/06/2024 14:46, Toke Høiland-Jørgensen wrote:
> > Samuel Dobron <sdobron@redhat.com> writes:
> >
> >>> It looks like this is known since March, was this ever reported to Nvidia back
> >>> then? :/
> >>
> >> Not sure if that's a question for me, I was told, filling an issue in
> >> Bugzilla/Jira is where
> >> our competences end. Who is supposed to report it to them?
> >
> > I don't think we have a formal reporting procedure, but I was planning
> > to send this to the list, referencing the Bugzilla entry. Seems I
> > dropped the ball on that; sorry! :(
> >
> > Can we set up a better reporting procedure for this going forward? A
> > mailing list, or just a name we can put in reports? Or something else?
> > Tariq, any preferences?
> >
> > -Toke
> >
>
> Hi,
> Please add Dragos and me on XDP mailing list reports.
>
> Regards,
> Tariq
>
* Re: XDP Performance Regression in recent kernel versions
2024-07-22 9:26 ` Dragos Tatulea
@ 2024-07-23 9:52 ` Carolina Jubran
2024-07-24 15:36 ` Toke Høiland-Jørgensen
2024-07-30 11:04 ` Samuel Dobron
0 siblings, 2 replies; 20+ messages in thread
From: Carolina Jubran @ 2024-07-23 9:52 UTC (permalink / raw)
To: Dragos Tatulea, Tariq Toukan, daniel@iogearbox.net,
sdobron@redhat.com, hawk@kernel.org, mianosebastiano@gmail.com
Cc: toke@redhat.com, pabeni@redhat.com, netdev@vger.kernel.org,
edumazet@google.com, Saeed Mahameed, bpf@vger.kernel.org,
kuba@kernel.org
On 22/07/2024 12:26, Dragos Tatulea wrote:
> On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
>>
>> On 21/06/2024 15:35, Samuel Dobron wrote:
>>> Hey all,
>>>
>>> Yeah, we do tests for ELN kernels [1] on a regular basis. Since
>>> ~January of this year.
>>>
>>> As already mentioned, mlx5 is the only driver affected by this regression.
>>> Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
>>> the one already mentioned by Toke, another one [0] has been reported
>>> in early February.
>>> Btw. issue mentioned by Toke has been moved to Jira, see [5].
>>>
>>> Not sure all of you are able to see the content of [0], Jira says it's
>>> RH-confidental.
>>> So, I am not sure how much I can share without being fired :D. Anyway,
>>> affected kernels have been released a while ago, so anyone can find it
>>> on its own.
>>> Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
>>> don't have data for any other XDP mode) in kernel-5.14 compared to
>>> previous builds.
>>>
>>> From tests history, I can see (most likely) the same improvement
>>> on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
>>> (partially) fixed?
>>>
>>> For earlier 6.10. kernels we don't have data due to [3] (there is regression on
>>> XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
>>> in issue).
>>> So if you want to run tests on 6.10. please see [3].
>>>
>>> Summary XDP_DROP+mlx5@25G:
>>> kernel pps
>>> <5.14 20.5M baseline
>>>> =5.14 19M [0]
>>> <6.4 19-20M baseline for ELN kernels
>>>> =6.4 15M [4 and 5] (mentioned by Toke)
>>
>> + @Dragos
>>
>> That's about when we added several changes to the RX datapath.
>> Most relevant are:
>> - Fully removing the in-driver RX page-cache.
>> - Refactoring to support XDP multi-buffer.
>>
>> We tested XDP performance before submission, I don't recall we noticed
>> such a degradation.
>
> Adding Carolina to post her analysis on this.
Hey everyone,
After investigating the issue, it seems the performance degradation is
linked to the commit "x86/bugs: Report Intel retbleed vulnerability"
(6ad0ad2bf8a67).
This commit addresses the Intel retbleed vulnerability and introduces
mitigation measures that impact performance, especially the Spectre v2
mitigations.
Disabling these mitigations in the kernel arguments
(spectre_v2=off ibrs=off) resolved the degradation in my tests.
Could you try adding the mentioned parameters to your kernel arguments
and check if you still see the degradation?
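On a grub-based system that would be something like the following (a
sketch; the exact bootloader update command depends on the distro):
  # Append the overrides to the kernel command line and reboot
  grubby --update-kernel=ALL --args="spectre_v2=off ibrs=off"
  reboot
  # After reboot, confirm the reported mitigation state
  cat /sys/devices/system/cpu/vulnerabilities/spectre_v2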
Thank you,
Carolina.
>
>>
>> I'll check with Dragos as he probably has these reports.
>>
> We only noticed a 6% degradation for XDP_XDROP.
>
> https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@redhat.com/T/
>
>>>> =6.10 ??? [3]
>>>> =6.10rc2 17M-18M
>>>
>>>
>>>> It looks like this is known since March, was this ever reported to Nvidia back
>>>> then? :/
>>>
>>> Not sure if that's a question for me, I was told, filling an issue in
>>> Bugzilla/Jira is where
>>> our competences end. Who is supposed to report it to them?
>>>
>>>> Given XDP is in the critical path for many in production, we should think about
>>>> regular performance reporting for the different vendors for each released kernel,
>>>> similar to here [0].
>>>
>>> I think this might be the part of upstream kernel testing with LNST?
>>> Maybe Jesper
>>> knows more about that? Until then, I think, I can let you know about
>>> new regressions we catch.
>>>
>>> Thanks,
>>> Sam.
>>>
>>> [0] https://issues.redhat.com/browse/RHEL-24054
>>> [1] https://koji.fedoraproject.org/koji/search?terms=kernel-%5Cd.*eln*&type=build&match=regexp
>>> [2] https://koji.fedoraproject.org/koji/buildinfo?buildID=2469107
>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2282969
>>> [4] https://bugzilla.redhat.com/show_bug.cgi?id=2270408
>>> [5] https://issues.redhat.com/browse/RHEL-24054
>>>
>
* Re: XDP Performance Regression in recent kernel versions
2024-07-23 9:52 ` Carolina Jubran
@ 2024-07-24 15:36 ` Toke Høiland-Jørgensen
2024-07-25 12:27 ` Samuel Dobron
2024-07-26 8:09 ` Dragos Tatulea
2024-07-30 11:04 ` Samuel Dobron
1 sibling, 2 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-07-24 15:36 UTC (permalink / raw)
To: Carolina Jubran, Dragos Tatulea, Tariq Toukan,
daniel@iogearbox.net, sdobron@redhat.com, hawk@kernel.org,
mianosebastiano@gmail.com
Cc: pabeni@redhat.com, netdev@vger.kernel.org, edumazet@google.com,
Saeed Mahameed, bpf@vger.kernel.org, kuba@kernel.org
Carolina Jubran <cjubran@nvidia.com> writes:
> On 22/07/2024 12:26, Dragos Tatulea wrote:
>> On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
>>>
>>> On 21/06/2024 15:35, Samuel Dobron wrote:
>>>> Hey all,
>>>>
>>>> Yeah, we do tests for ELN kernels [1] on a regular basis. Since
>>>> ~January of this year.
>>>>
>>>> As already mentioned, mlx5 is the only driver affected by this regression.
>>>> Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
>>>> the one already mentioned by Toke, another one [0] has been reported
>>>> in early February.
>>>> Btw. issue mentioned by Toke has been moved to Jira, see [5].
>>>>
>>>> Not sure all of you are able to see the content of [0], Jira says it's
>>>> RH-confidental.
>>>> So, I am not sure how much I can share without being fired :D. Anyway,
>>>> affected kernels have been released a while ago, so anyone can find it
>>>> on its own.
>>>> Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
>>>> don't have data for any other XDP mode) in kernel-5.14 compared to
>>>> previous builds.
>>>>
>>>> From tests history, I can see (most likely) the same improvement
>>>> on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
>>>> (partially) fixed?
>>>>
>>>> For earlier 6.10. kernels we don't have data due to [3] (there is regression on
>>>> XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
>>>> in issue).
>>>> So if you want to run tests on 6.10. please see [3].
>>>>
>>>> Summary XDP_DROP+mlx5@25G:
>>>> kernel pps
>>>> <5.14 20.5M baseline
>>>>> =5.14 19M [0]
>>>> <6.4 19-20M baseline for ELN kernels
>>>>> =6.4 15M [4 and 5] (mentioned by Toke)
>>>
>>> + @Dragos
>>>
>>> That's about when we added several changes to the RX datapath.
>>> Most relevant are:
>>> - Fully removing the in-driver RX page-cache.
>>> - Refactoring to support XDP multi-buffer.
>>>
>>> We tested XDP performance before submission, I don't recall we noticed
>>> such a degradation.
>>
>> Adding Carolina to post her analysis on this.
>
> Hey everyone,
>
> After investigating the issue, it seems the performance degradation is
> linked to the commit "x86/bugs: Report Intel retbleed vulnerability"
> (6ad0ad2bf8a67).
Hmm, that commit is from June 2022, and according to Samuel's tests,
this issue was introduced sometime between commits b6dad5178cea and
40f71e7cd3c6 (both of which are dated in June 2023). Besides, if it was
a retbleed mitigation issue, that would affect other drivers as well,
no? Our testing only shows this regression on mlx5, not on the Intel
drivers.
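For anyone who wants to scan that range themselves, something like this
narrows it down to the driver and page_pool code (a sketch; the paths are
the usual upstream locations for that era):
  # Commits in the suspect range touching mlx5 or the page_pool core
  git log --oneline b6dad5178cea..40f71e7cd3c6 -- \
      drivers/net/ethernet/mellanox/mlx5/ \
      net/core/page_pool.c include/net/page_pool.h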
>>> I'll check with Dragos as he probably has these reports.
>>>
>> We only noticed a 6% degradation for XDP_XDROP.
>>
>> https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@redhat.com/T/
That message mentions that "This will be handled in a different patch
series by adding support for multi-packet per page." - did that ever go
in?
-Toke
* Re: XDP Performance Regression in recent kernel versions
2024-07-24 15:36 ` Toke Høiland-Jørgensen
@ 2024-07-25 12:27 ` Samuel Dobron
2024-07-26 8:09 ` Dragos Tatulea
1 sibling, 0 replies; 20+ messages in thread
From: Samuel Dobron @ 2024-07-25 12:27 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Carolina Jubran, Dragos Tatulea, Tariq Toukan,
daniel@iogearbox.net, hawk@kernel.org, mianosebastiano@gmail.com,
pabeni@redhat.com, netdev@vger.kernel.org, edumazet@google.com,
Saeed Mahameed, bpf@vger.kernel.org, kuba@kernel.org
Confirming that this is just an mlx5 issue; Intel is fine.
I just did a quick test with Spectre v2 mitigations disabled [0].
The performance remains the same, no difference at all.
Sam.
[0]:
$ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Vulnerable;
BHI: Vulnerable
On Wed, Jul 24, 2024 at 5:48 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Carolina Jubran <cjubran@nvidia.com> writes:
>
> > On 22/07/2024 12:26, Dragos Tatulea wrote:
> >> On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
> >>>
> >>> On 21/06/2024 15:35, Samuel Dobron wrote:
> >>>> Hey all,
> >>>>
> >>>> Yeah, we do tests for ELN kernels [1] on a regular basis. Since
> >>>> ~January of this year.
> >>>>
> >>>> As already mentioned, mlx5 is the only driver affected by this regression.
> >>>> Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
> >>>> the one already mentioned by Toke, another one [0] has been reported
> >>>> in early February.
> >>>> Btw. issue mentioned by Toke has been moved to Jira, see [5].
> >>>>
> >>>> Not sure all of you are able to see the content of [0], Jira says it's
> >>>> RH-confidental.
> >>>> So, I am not sure how much I can share without being fired :D. Anyway,
> >>>> affected kernels have been released a while ago, so anyone can find it
> >>>> on its own.
> >>>> Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
> >>>> don't have data for any other XDP mode) in kernel-5.14 compared to
> >>>> previous builds.
> >>>>
> >>>> From tests history, I can see (most likely) the same improvement
> >>>> on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
> >>>> (partially) fixed?
> >>>>
> >>>> For earlier 6.10. kernels we don't have data due to [3] (there is regression on
> >>>> XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
> >>>> in issue).
> >>>> So if you want to run tests on 6.10. please see [3].
> >>>>
> >>>> Summary XDP_DROP+mlx5@25G:
> >>>> kernel pps
> >>>> <5.14 20.5M baseline
> >>>>> =5.14 19M [0]
> >>>> <6.4 19-20M baseline for ELN kernels
> >>>>> =6.4 15M [4 and 5] (mentioned by Toke)
> >>>
> >>> + @Dragos
> >>>
> >>> That's about when we added several changes to the RX datapath.
> >>> Most relevant are:
> >>> - Fully removing the in-driver RX page-cache.
> >>> - Refactoring to support XDP multi-buffer.
> >>>
> >>> We tested XDP performance before submission, I don't recall we noticed
> >>> such a degradation.
> >>
> >> Adding Carolina to post her analysis on this.
> >
> > Hey everyone,
> >
> > After investigating the issue, it seems the performance degradation is
> > linked to the commit "x86/bugs: Report Intel retbleed vulnerability"
> > (6ad0ad2bf8a67).
>
> Hmm, that commit is from June 2022, and according to Samuel's tests,
> this issue was introduced sometime between commits b6dad5178cea and
> 40f71e7cd3c6 (both of which are dated in June 2023). Besides, if it was
> a retbleed mitigation issue, that would affect other drivers as well,
> no? Our testing only shows this regression on mlx5, not on the intel
> drivers.
>
>
> >>> I'll check with Dragos as he probably has these reports.
> >>>
> >> We only noticed a 6% degradation for XDP_XDROP.
> >>
> >> https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@redhat.com/T/
>
> That message mentions that "This will be handled in a different patch
> series by adding support for multi-packet per page." - did that ever go
> in?
>
> -Toke
>
* Re: XDP Performance Regression in recent kernel versions
2024-07-24 15:36 ` Toke Høiland-Jørgensen
2024-07-25 12:27 ` Samuel Dobron
@ 2024-07-26 8:09 ` Dragos Tatulea
2024-07-29 18:00 ` Samuel Dobron
1 sibling, 1 reply; 20+ messages in thread
From: Dragos Tatulea @ 2024-07-26 8:09 UTC (permalink / raw)
To: toke@redhat.com, Tariq Toukan, daniel@iogearbox.net,
Carolina Jubran, sdobron@redhat.com, hawk@kernel.org,
mianosebastiano@gmail.com
Cc: Saeed Mahameed, edumazet@google.com, netdev@vger.kernel.org,
kuba@kernel.org, pabeni@redhat.com, bpf@vger.kernel.org
Hi,
On Wed, 2024-07-24 at 17:36 +0200, Toke Høiland-Jørgensen wrote:
> Carolina Jubran <cjubran@nvidia.com> writes:
>
> > On 22/07/2024 12:26, Dragos Tatulea wrote:
> > > On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
> > > >
> > > > On 21/06/2024 15:35, Samuel Dobron wrote:
> > > > > Hey all,
> > > > >
> > > > > Yeah, we do tests for ELN kernels [1] on a regular basis. Since
> > > > > ~January of this year.
> > > > >
> > > > > As already mentioned, mlx5 is the only driver affected by this regression.
> > > > > Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
> > > > > the one already mentioned by Toke, another one [0] has been reported
> > > > > in early February.
> > > > > Btw. issue mentioned by Toke has been moved to Jira, see [5].
> > > > >
> > > > > Not sure all of you are able to see the content of [0], Jira says it's
> > > > > RH-confidental.
> > > > > So, I am not sure how much I can share without being fired :D. Anyway,
> > > > > affected kernels have been released a while ago, so anyone can find it
> > > > > on its own.
> > > > > Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
> > > > > don't have data for any other XDP mode) in kernel-5.14 compared to
> > > > > previous builds.
> > > > >
> > > > > From tests history, I can see (most likely) the same improvement
> > > > > on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
> > > > > (partially) fixed?
> > > > >
> > > > > For earlier 6.10. kernels we don't have data due to [3] (there is regression on
> > > > > XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
> > > > > in issue).
> > > > > So if you want to run tests on 6.10. please see [3].
> > > > >
> > > > > Summary XDP_DROP+mlx5@25G:
> > > > > kernel pps
> > > > > <5.14 20.5M baseline
> > > > > > =5.14 19M [0]
> > > > > <6.4 19-20M baseline for ELN kernels
> > > > > > =6.4 15M [4 and 5] (mentioned by Toke)
> > > >
> > > > + @Dragos
> > > >
> > > > That's about when we added several changes to the RX datapath.
> > > > Most relevant are:
> > > > - Fully removing the in-driver RX page-cache.
> > > > - Refactoring to support XDP multi-buffer.
> > > >
> > > > We tested XDP performance before submission, I don't recall we noticed
> > > > such a degradation.
> > >
> > > Adding Carolina to post her analysis on this.
> >
> > Hey everyone,
> >
> > After investigating the issue, it seems the performance degradation is
> > linked to the commit "x86/bugs: Report Intel retbleed vulnerability"
> > (6ad0ad2bf8a67).
>
> Hmm, that commit is from June 2022, [...]
>
The results from the very first mail in this thread from Sebastiano were
showing a 30Mpps -> 21.3Mpps XDP_DROP regression between 5.15 and 6.2. This
is what Carolina was focused on. Furthermore, the results from Samuel don't show
this regression. Seems like the discussion is now focused on the 6.4 regression?
> [...] and according to Samuel's tests,
> this issue was introduced sometime between commits b6dad5178cea and
> 40f71e7cd3c6 (both of which are dated in June 2023).
>
Thanks for the commit range (now I know how to decode ELN kernel versions :)).
Strangely, this range doesn't have anything suspicious. I would have
expected the page_pool or the XDP multi-buffer changes to show up in this
range, but they are already present in the working version... Anyway,
we'll keep looking.
> Besides, if it was
> a retbleed mitigation issue, that would affect other drivers as well,
> no? Our testing only shows this regression on mlx5, not on the intel
> drivers.
>
>
> > > > I'll check with Dragos as he probably has these reports.
> > > >
> > > We only noticed a 6% degradation for XDP_XDROP.
> > >
> > > https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@redhat.com/T/
>
> That message mentions that "This will be handled in a different patch
> series by adding support for multi-packet per page." - did that ever go
> in?
>
Nope, no XDP multi-packet per page yet.
Thanks,
Dragos
* Re: XDP Performance Regression in recent kernel versions
2024-07-26 8:09 ` Dragos Tatulea
@ 2024-07-29 18:00 ` Samuel Dobron
0 siblings, 0 replies; 20+ messages in thread
From: Samuel Dobron @ 2024-07-29 18:00 UTC (permalink / raw)
To: Dragos Tatulea
Cc: toke@redhat.com, Tariq Toukan, daniel@iogearbox.net,
Carolina Jubran, hawk@kernel.org, mianosebastiano@gmail.com,
Saeed Mahameed, edumazet@google.com, netdev@vger.kernel.org,
kuba@kernel.org, pabeni@redhat.com, bpf@vger.kernel.org
Ah, sorry.
Yes, I was talking about 6.4 regression.
I double-checked that v5.15 regression and I don't see anything
as significant as Sebastiano does. I ran a couple of tests for:
* kernel-5.10.0-0.rc6.90.eln105
* kernel-5.14.0-60.eln112
* kernel-5.15.0-0.rc7.53.eln113
* kernel-5.16.0-60.eln114
* kernel-6.11.0-0.rc0.20240724git786c8248dbd3.12.eln141
The results of XDP_DROP on the receiving side (the one that is dropping
packets) are more or less the same, ~20.5Mpps (17.5Mpps on 6.11, but
that's due to the 6.4 regression). The CPU is the bottleneck, so 100% CPU
utilization for all the kernels on both ends, generator and receiver. We
use pktgen as the generator; both generator and receiver machines use an
mlx5 NIC.
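For reference, with the in-kernel pktgen sample scripts a single 64B flow
is typically generated with something like the following (a sketch;
interface, destination IP and MAC are placeholders):
  # samples/pktgen in the kernel tree; one generator thread, 64B packets
  ./pktgen_sample03_burst_single_flow.sh -i enp1s0f0 -d 10.0.0.2 \
      -m aa:bb:cc:dd:ee:ff -s 64 -t 1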
However, I noticed that between 5.10 and 5.14 there is a 30Mpps->22Mpps
regression BUT at the GENERATOR side; CPU utilization remains the same
on both ends and the number of dropped packets on the receiver side is
the same as well (since it's CPU bottlenecked). Other drivers seem
to be unaffected.
That's probably something unrelated to Sebastiano's regression,
but I believe it's worth mentioning.
And so, no idea where Sebastiano's regression comes from. I can see
he uses ConnectX-6; we don't have those, only ConnectX-5. Could that
be the problem?
Thanks,
Sam.
On Fri, Jul 26, 2024 at 10:09 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Hi,
>
> On Wed, 2024-07-24 at 17:36 +0200, Toke Høiland-Jørgensen wrote:
> > Carolina Jubran <cjubran@nvidia.com> writes:
> >
> > > On 22/07/2024 12:26, Dragos Tatulea wrote:
> > > > On Sun, 2024-06-30 at 14:43 +0300, Tariq Toukan wrote:
> > > > >
> > > > > On 21/06/2024 15:35, Samuel Dobron wrote:
> > > > > > Hey all,
> > > > > >
> > > > > > Yeah, we do tests for ELN kernels [1] on a regular basis. Since
> > > > > > ~January of this year.
> > > > > >
> > > > > > As already mentioned, mlx5 is the only driver affected by this regression.
> > > > > > Unfortunately, I think Jesper is actually hitting 2 regressions we noticed,
> > > > > > the one already mentioned by Toke, another one [0] has been reported
> > > > > > in early February.
> > > > > > Btw. issue mentioned by Toke has been moved to Jira, see [5].
> > > > > >
> > > > > > Not sure all of you are able to see the content of [0], Jira says it's
> > > > > > RH-confidental.
> > > > > > So, I am not sure how much I can share without being fired :D. Anyway,
> > > > > > affected kernels have been released a while ago, so anyone can find it
> > > > > > on its own.
> > > > > > Basically, we detected 5% regression on XDP_DROP+mlx5 (currently, we
> > > > > > don't have data for any other XDP mode) in kernel-5.14 compared to
> > > > > > previous builds.
> > > > > >
> > > > > > From tests history, I can see (most likely) the same improvement
> > > > > > on 6.10rc2 (from 15Mpps to 17-18Mpps), so I'd say 20% drop has been
> > > > > > (partially) fixed?
> > > > > >
> > > > > > For earlier 6.10. kernels we don't have data due to [3] (there is regression on
> > > > > > XDP_DROP as well, but I believe it's turbo-boost issue, as I mentioned
> > > > > > in issue).
> > > > > > So if you want to run tests on 6.10. please see [3].
> > > > > >
> > > > > > Summary XDP_DROP+mlx5@25G:
> > > > > > kernel pps
> > > > > > <5.14 20.5M baseline
> > > > > > > =5.14 19M [0]
> > > > > > <6.4 19-20M baseline for ELN kernels
> > > > > > > =6.4 15M [4 and 5] (mentioned by Toke)
> > > > >
> > > > > + @Dragos
> > > > >
> > > > > That's about when we added several changes to the RX datapath.
> > > > > Most relevant are:
> > > > > - Fully removing the in-driver RX page-cache.
> > > > > - Refactoring to support XDP multi-buffer.
> > > > >
> > > > > We tested XDP performance before submission, I don't recall we noticed
> > > > > such a degradation.
> > > >
> > > > Adding Carolina to post her analysis on this.
> > >
> > > Hey everyone,
> > >
> > > After investigating the issue, it seems the performance degradation is
> > > linked to the commit "x86/bugs: Report Intel retbleed vulnerability"
> > > (6ad0ad2bf8a67).
> >
> > Hmm, that commit is from June 2022, [...]
> >
> The results from the very first mail in this thread from Sebastiano were
> showing a 30Mpps -> 21.3Mpps XDP_DROP regression between 5.15 and 6.2. This
> is what Carolina was focused on. Furthermore, the results from Samuel don't show
> this regression. Seems like the discussion is now focused on the 6.4 regression?
>
> > [...] and according to Samuel's tests,
> > this issue was introduced sometime between commits b6dad5178cea and
> > 40f71e7cd3c6 (both of which are dated in June 2023).
> >
> Thanks for the commit range (now I know how to decode ELN kernel versions :)).
> Strangely this range doesn't have anything suspicious. I would have expected to
> see the page_pool or the XDP multibuf changes would have shown up in this range.
> But they are already present in the working version... Anyway, we'll keep on
> looking.
>
> > Besides, if it were
> > a retbleed mitigation issue, it would affect other drivers as well,
> > no? Our testing only shows this regression on mlx5, not on the Intel
> > drivers.
> >
> >
> > > > > I'll check with Dragos as he probably has these reports.
> > > > >
> > > > We only noticed a 6% degradation for XDP_DROP.
> > > >
> > > > https://lore.kernel.org/netdev/b6fcfa8b-c2b3-8a92-fb6e-0760d5f6f5ff@redhat.com/T/
> >
> > That message mentions that "This will be handled in a different patch
> > series by adding support for multi-packet per page." - did that ever go
> > in?
> >
> Nope, no XDP multi-packet per page yet.
>
> Thanks,
> Dragos
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: XDP Performance Regression in recent kernel versions
2024-07-23 9:52 ` Carolina Jubran
2024-07-24 15:36 ` Toke Høiland-Jørgensen
@ 2024-07-30 11:04 ` Samuel Dobron
2024-12-11 13:20 ` Samuel Dobron
1 sibling, 1 reply; 20+ messages in thread
From: Samuel Dobron @ 2024-07-30 11:04 UTC (permalink / raw)
To: Carolina Jubran, Dragos Tatulea, Tariq Toukan,
daniel@iogearbox.net, hawk@kernel.org, mianosebastiano@gmail.com
Cc: toke@redhat.com, pabeni@redhat.com, netdev@vger.kernel.org,
edumazet@google.com, Saeed Mahameed, bpf@vger.kernel.org,
kuba@kernel.org
> Could you try adding the mentioned parameters to your kernel arguments
> and check if you still see the degradation?
Hey,
So I tried multiple kernels around v5.15 as well as a couple of previous
v6.xx ones, and there is no difference with Spectre v2 mitigations
enabled or disabled.
No difference on other drivers either.
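For reference, a minimal sketch of how the mitigation state can be
checked and toggled between runs (the sysfs paths and boot parameters
below are the standard upstream ones; nothing here is mlx5-specific):

    # report what the running kernel has active
    grep . /sys/devices/system/cpu/vulnerabilities/*
    cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
    cat /sys/devices/system/cpu/vulnerabilities/retbleed

    # reboot with mitigations disabled, either globally:
    #     mitigations=off
    # or selectively on the kernel command line:
    #     spectre_v2=off retbleed=off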
Sam.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: XDP Performance Regression in recent kernel versions
2024-07-30 11:04 ` Samuel Dobron
@ 2024-12-11 13:20 ` Samuel Dobron
2025-01-08 9:26 ` Carolina Jubran
0 siblings, 1 reply; 20+ messages in thread
From: Samuel Dobron @ 2024-12-11 13:20 UTC (permalink / raw)
To: Carolina Jubran, Dragos Tatulea, Tariq Toukan,
daniel@iogearbox.net, hawk@kernel.org, mianosebastiano@gmail.com
Cc: toke@redhat.com, pabeni@redhat.com, netdev@vger.kernel.org,
edumazet@google.com, Saeed Mahameed, bpf@vger.kernel.org,
kuba@kernel.org, Benjamin Poirier
Hey all,
We recently enabled tests for XDP_TX, so I was able to test that mode
as well.
The XDP_DROP performance regression is the same as I reported
a while ago. There is about a 20% regression in
kernel-6.4.0-0.rc6.20230616git40f71e7cd3c6.50.eln126 (broken)
compared to the previous kernel,
kernel-6.4.0-0.rc6.20230614gitb6dad5178cea.49.eln126 (baseline).
We don't see such a regression on other drivers.
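The upstream commits embedded in those two ELN snapshot names bound the
suspect window, so a plain git range query can list the candidate
changes (a sketch, assuming a local upstream tree; the paths below are
just the obvious places to look):

    git log --oneline b6dad5178cea..40f71e7cd3c6 -- \
        drivers/net/ethernet/mellanox/mlx5/ net/core/ kernel/bpf/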
The regression was partially fixed somewhere between eln126 and
kernel-6.10.0-0.rc2.20240606git2df0193e62cf.27.eln137 (partially
fixed), and performance since then has been 7-15% below the
baseline. So, nothing new.
XDP_TX is, however, more interesting.
When comparing the baseline with the broken kernel there is a 20-25%
performance drop (CPU utilization remains the same) on the mlx5 driver.
There is also a roughly 10% drop on other drivers. However, that got
fixed somewhere between the broken and the partially fixed kernels, and
on the most recent kernels we no longer see the regression on other
drivers. A 2-10% regression (depending on whether dpa or load-bytes is
used) remains on mlx5.
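For context, the drop and tx modes map onto xdp-bench from xdp-tools
([1] in the first mail); a minimal sketch of the invocations is below.
The interface name is a placeholder, and the dpa vs. load-bytes variants
select how packet data is accessed -- the exact option spelling depends
on the xdp-tools version in use:

    # XDP_DROP on the device under test (interface name is a placeholder)
    xdp-bench drop ens1f0
    # XDP_TX: bounce the frames back out of the same interface
    xdp-bench tx ens1f0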
The numbers look somewhat similar to a regression caused by the
Spectre/Meltdown mitigations, but based on my experiments there is no
difference with the mitigations enabled or disabled.
Hope this will help,
Sam.
On Tue, Jul 30, 2024 at 1:04 PM Samuel Dobron <sdobron@redhat.com> wrote:
>
> > Could you try adding the mentioned parameters to your kernel arguments
> > and check if you still see the degradation?
>
> Hey,
> So I tried multiple kernels around v5.15 as well as a couple of previous
> v6.xx ones, and there is no difference with Spectre v2 mitigations
> enabled or disabled.
>
> No difference on other drivers either.
>
>
> Sam.
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: XDP Performance Regression in recent kernel versions
2024-12-11 13:20 ` Samuel Dobron
@ 2025-01-08 9:26 ` Carolina Jubran
0 siblings, 0 replies; 20+ messages in thread
From: Carolina Jubran @ 2025-01-08 9:26 UTC (permalink / raw)
To: Samuel Dobron, Dragos Tatulea, Tariq Toukan, daniel@iogearbox.net,
hawk@kernel.org, mianosebastiano@gmail.com
Cc: toke@redhat.com, pabeni@redhat.com, netdev@vger.kernel.org,
edumazet@google.com, Saeed Mahameed, bpf@vger.kernel.org,
kuba@kernel.org, Benjamin Poirier
Hello,
Thank you, Sam, for the detailed information.
I have identified the specific kernel configuration change responsible
for the degradation between kernel versions
6.4.0-0.rc6.20230614gitb6dad5178cea.49.eln126 and
6.4.0-0.rc6.20230616git40f71e7cd3c6.50.eln126. The introduction of the
CONFIG_INIT_STACK_ALL_ZERO setting in the latter version has led to a
noticeable performance impact.
I am currently investigating why this change specifically affects mlx5.
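To illustrate the mechanism (a hedged sketch for illustration only; the
structure and function below are made up, not mlx5 code):
CONFIG_INIT_STACK_ALL_ZERO is a hardening option that builds the kernel
with -ftrivial-auto-var-init=zero, so every automatic variable is
zero-filled on function entry rather than left uninitialized. In a
per-packet path, a sizeable on-stack structure that used to be written
only where needed now pays for an extra zeroing pass on every call:

    /*
     * Hypothetical hot-path helper; the struct and function names are
     * made up for illustration and are not taken from the mlx5 driver.
     */
    struct rx_scratch {
            unsigned char hdr[128];
            unsigned int len;
            unsigned int flags;
    };

    int handle_rx_descriptor(const unsigned char *data, unsigned int len)
    {
            /*
             * With CONFIG_INIT_STACK_ALL_ZERO=y the compiler zero-fills
             * all of 'scratch' on every call, even though the fields used
             * below are written before they are read. At tens of Mpps that
             * extra per-packet zeroing becomes visible in the profile.
             */
            struct rx_scratch scratch;

            scratch.len = len < sizeof(scratch.hdr) ? len : sizeof(scratch.hdr);
            scratch.flags = 0;
            __builtin_memcpy(scratch.hdr, data, scratch.len);

            return scratch.hdr[0];
    }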
Thanks,
Carolina
On 11/12/2024 15:20, Samuel Dobron wrote:
> Hey all,
>
> We recently enabled tests for XDP_TX, so I was able to test that mode
> as well.
>
> The XDP_DROP performance regression is the same as I reported
> a while ago. There is about a 20% regression in
> kernel-6.4.0-0.rc6.20230616git40f71e7cd3c6.50.eln126 (broken)
> compared to the previous kernel,
> kernel-6.4.0-0.rc6.20230614gitb6dad5178cea.49.eln126 (baseline).
> We don't see such a regression on other drivers.
>
> The regression was partially fixed somewhere between eln126 and
> kernel-6.10.0-0.rc2.20240606git2df0193e62cf.27.eln137 (partially
> fixed), and performance since then has been 7-15% below the
> baseline. So, nothing new.
>
> XDP_TX is, however, more interesting.
> When comparing the baseline with the broken kernel there is a 20-25%
> performance drop (CPU utilization remains the same) on the mlx5 driver.
> There is also a roughly 10% drop on other drivers. However, that got
> fixed somewhere between the broken and the partially fixed kernels, and
> on the most recent kernels we no longer see the regression on other
> drivers. A 2-10% regression (depending on whether dpa or load-bytes is
> used) remains on mlx5.
>
> The numbers look somewhat similar to a regression caused by the
> Spectre/Meltdown mitigations, but based on my experiments there is no
> difference with the mitigations enabled or disabled.
>
> Hope this will help,
> Sam.
>
> On Tue, Jul 30, 2024 at 1:04 PM Samuel Dobron <sdobron@redhat.com> wrote:
>>
>>> Could you try adding the mentioned parameters to your kernel arguments
>>> and check if you still see the degradation?
>>
>> Hey,
>> So I tried multiple kernels around v5.15 as well as a couple of previous
>> v6.xx ones, and there is no difference with Spectre v2 mitigations
>> enabled or disabled.
>>
>> No difference on other drivers either.
>>
>>
>> Sam.
>>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-01-08 9:26 UTC | newest]
Thread overview: 20+ messages
2024-06-18 15:28 XDP Performance Regression in recent kernel versions Sebastiano Miano
2024-06-19 6:00 ` Tariq Toukan
2024-06-19 15:17 ` Sebastiano Miano
2024-06-19 16:27 ` Jesper Dangaard Brouer
2024-06-19 19:17 ` Toke Høiland-Jørgensen
2024-06-20 9:52 ` Daniel Borkmann
2024-06-21 12:35 ` Samuel Dobron
2024-06-24 11:46 ` Toke Høiland-Jørgensen
2024-06-30 10:25 ` Tariq Toukan
2024-07-22 10:57 ` Samuel Dobron
2024-06-30 11:43 ` Tariq Toukan
2024-07-22 9:26 ` Dragos Tatulea
2024-07-23 9:52 ` Carolina Jubran
2024-07-24 15:36 ` Toke Høiland-Jørgensen
2024-07-25 12:27 ` Samuel Dobron
2024-07-26 8:09 ` Dragos Tatulea
2024-07-29 18:00 ` Samuel Dobron
2024-07-30 11:04 ` Samuel Dobron
2024-12-11 13:20 ` Samuel Dobron
2025-01-08 9:26 ` Carolina Jubran