* [net-next v6 0/9] Add support for per-NAPI config via netlink
@ 2024-10-11 18:44 Joe Damato
2024-10-15 1:10 ` patchwork-bot+netdevbpf
0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-10-11 18:44 UTC (permalink / raw)
To: netdev
Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
sridhar.samudrala, willemdebruijn.kernel, edumazet, Joe Damato,
Alexander Lobakin, Breno Leitao, Daniel Jurgens, David Ahern,
David S. Miller, Donald Hunter, Jakub Kicinski,
Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
Jonathan Corbet, Leon Romanovsky, open list:DOCUMENTATION,
open list, open list:MELLANOX MLX4 core VPI driver,
Lorenzo Bianconi, Michael Chan, Mina Almasry, Paolo Abeni,
Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
Xuan Zhuo
Greetings:
Welcome to v6. Minor changes from v5 [1], please see changelog below.
There were no explicit comments from reviewers on the call outs in my
v5, so I'm retaining them from my previous cover letter just in case :)
A few important call outs for reviewers:
1. This revision seems to work (see below for a full walk through). I
think this is the behavior we talked about, but please let me know if
a use case is missing.
2. Re a previous point made by Stanislav regarding "taking over a NAPI
ID" when the channel count changes: mlx5 seems to call napi_disable
followed by netif_napi_del for the old queues and then calls
napi_enable for the new ones. In this RFC, the NAPI ID generation is
deferred to napi_enable. This means we won't end up with two of the
same NAPI IDs added to the hash at the same time.
Can we assume all drivers will napi_disable the old queues before
napi_enable the new ones?
- If yes: we might not need to worry about a NAPI ID takeover
function.
- If no: I'll need to make a change so that the NAPI ID generation
is deferred only for drivers which have opted into the config
space via calls to netif_napi_add_config
3. I made the decision to remove the WARN_ON_ONCE that (I think?)
Jakub previously suggested in alloc_netdev_mqs (WARN_ON_ONCE(txqs
!= rxqs);) because this was triggering on every kernel boot with my
mlx5 NIC.
4. I left the "maxqs = max(txqs, rxqs);" in alloc_netdev_mqs despite
thinking this is a bit strange. I think it's strange that we might
be short some number of NAPI configs, but it seems like most people
are in favor of this approach, so I've left it.
I'd appreciate thoughts from reviewers on the above items, if at all
possible.
Now, on to the implementation.
Firstly, this implementation moves certain settings to napi_struct so that
they are "per-NAPI", while taking care to respect existing sysfs
parameters which are interface wide and affect all NAPIs:
- NAPI ID
- gro_flush_timeout
- defer_hard_irqs
Furthermore:
- NAPI ID generation and addition to the hash is now deferred to
napi_enable, instead of during netif_napi_add
- NAPIs are removed from the hash during napi_disable, instead of
netif_napi_del.
- An array of "struct napi_config" is allocated in net_device.
IMPORTANT: The above changes affect all network drivers.
Optionally, drivers may opt-in to using their config space by calling
netif_napi_add_config instead of netif_napi_add.
If a driver does this, the NAPI being added is linked with an allocated
"struct napi_config" and the per-NAPI settings (including NAPI ID) are
persisted even as hardware queues are destroyed and recreated.
To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
- mlx5 which is the basis of the examples below
- mlx4 which has TX only NAPIs, just to highlight that case. I have
only compile tested this patch; I don't have this hardware.
- bnxt which I have only compiled tested. I don't have this
hardware.
NOTE: I only tested this on mlx5; I have no access to the other hardware
for which I provided patches. Hopefully other folks can help test :)
Here's how it works when I test it on my mlx5 system:
# start with 2 queues
$ ethtool -l eth4 | grep Combined | tail -1
Combined: 2
First, output the current NAPI settings:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now, set the global sysfs parameters:
$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'
Output current NAPI settings again:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now set NAPI ID 345, via its NAPI ID to specific values:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set \
--json='{"id": 345,
"defer-hard-irqs": 111,
"gro-flush-timeout": 11111}'
None
Now output current NAPI settings again to ensure only NAPI ID 345
changed:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 11111,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now, increase gro-flush-timeout only:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 345,
"gro-flush-timeout": 44444}'
None
Now output the current NAPI settings once more:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 44444,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now set NAPI ID 345 to have gro_flush_timeout of 0:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 345,
"gro-flush-timeout": 0}'
None
Check that NAPI ID 345 has a value of 0:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Change the queue count, ensuring that NAPI ID 345 retains its settings:
$ sudo ethtool -L eth4 combined 4
Check that the new queues have the system wide settings but that NAPI ID
345 remains unchanged:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 347,
'ifindex': 7,
'irq': 529},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now reduce the queue count below where NAPI ID 345 is indexed:
$ sudo ethtool -L eth4 combined 1
Check the output:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Re-increase the queue count to ensure NAPI ID 345 is re-assigned the same
values:
$ sudo ethtool -L eth4 combined 2
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Create new queues to ensure the sysfs globals are used for the new NAPIs
but that NAPI ID 345 is unchanged:
$ sudo ethtool -L eth4 combined 8
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:
$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'
Check that worked:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 344,
'ifindex': 7,
'irq': 327}]
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20241009005525.13651-1-jdamato@fastly.com/
v6:
- Added Jakub's Reviewed-by to all commit messages
- Added Eric's Reviewed-by to the commits for which he provided his
review (i.e. all patches except 4, 5, and 6)
- Updated the netlink doc for gro_flush_timeout in patch 4
- Added WARN_ON_ONCE to napi_hash_add_with_id as suggested by Eric
Dumazet to patch 5
v5: https://lore.kernel.org/netdev/20241009005525.13651-1-jdamato@fastly.com/
- Converted from an RFC to a PATCH
- Moved defer_hard_irqs above control-fields comment in napi_struct in
patch 1
- Moved gro_flush_timeout above control-fields comment in napi_struct in
patch 3
- Removed unnecessary idpf changes from patch 3
- Removed unnecessary kdoc in patch 5 for a parameter removed in an
earlier revision
- Removed unnecessary NULL check in patch 5
- Used tooling to autogenerate code for patch 6, which fixed the type and
range of NETDEV_A_NAPI_DEFER_HARD_IRQ.
rfcv4: https://lore.kernel.org/lkml/20241001235302.57609-1-jdamato@fastly.com/
- Updated commit messages of most patches
- Renamed netif_napi_add_storage to netif_napi_add_config in patch 5
- Added a NULL check in netdev_set_defer_hard_irqs and
netdev_set_gro_flush_timeout for netdev->napi_config in patch 5
- Removed the WARN_ON_ONCE suggested in an earlier revision
in alloc_netdev_mqs from patch 5; it triggers every time on my mlx5
machine at boot and needlessly spams the log
- Added a locking adjustment suggested by Stanislav to patch 6 to
protect napi_id in patch 5
- Removed napi_hash_del from netif_napi_del in patch 5. netif_napi_del
calls __netif_napi_del which itself calls napi_hash_del. The
original code thus resulted in two napi_hash_del calls, which is
incorrect.
- Removed the napi_hash_add from netif_napi_add_weight in patch 5.
NAPIs are added to the hash when napi_enable is called, instead.
- Moved the napi_restore_config to the top of napi_enable in patch 5.
- Simplified the logic in __netif_napi_del and removed napi_hash_del.
NAPIs are removed in napi_disable.
- Fixed merge conflicts in patch 6 so it applies cleanly
rfcv3: https://lore.kernel.org/netdev/20240912100738.16567-8-jdamato@fastly.com/T/
- Renamed napi_storage to napi_config
- Reordered patches
- Added defer_hard_irqs and gro_flush_timeout to napi_struct
- Attempt to save and restore settings on napi_disable/napi_enable
- Removed weight as a parameter to netif_napi_add_storage
- Updated driver patches to no longer pass in weight
rfcv2: https://lore.kernel.org/netdev/20240908160702.56618-1-jdamato@fastly.com/
- Almost total rewrite from v1
Joe Damato (9):
net: napi: Make napi_defer_hard_irqs per-NAPI
netdev-genl: Dump napi_defer_hard_irqs
net: napi: Make gro_flush_timeout per-NAPI
netdev-genl: Dump gro_flush_timeout
net: napi: Add napi_config
netdev-genl: Support setting per-NAPI config values
bnxt: Add support for persistent NAPI config
mlx5: Add support for persistent NAPI config
mlx4: Add support for persistent NAPI config to RX CQs
Documentation/netlink/specs/netdev.yaml | 28 ++++++
.../networking/net_cachelines/net_device.rst | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +-
drivers/net/ethernet/mellanox/mlx4/en_cq.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
include/linux/netdevice.h | 42 +++++++-
include/uapi/linux/netdev.h | 3 +
net/core/dev.c | 96 ++++++++++++++++---
net/core/dev.h | 88 +++++++++++++++++
net/core/net-sysfs.c | 4 +-
net/core/netdev-genl-gen.c | 18 ++++
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 57 +++++++++++
tools/include/uapi/linux/netdev.h | 3 +
14 files changed, 326 insertions(+), 25 deletions(-)
base-commit: d677aebd663ddc287f2b2bda098474694a0ca875
--
2.25.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2024-10-11 18:44 Joe Damato
@ 2024-10-15 1:10 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-10-15 1:10 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
sridhar.samudrala, willemdebruijn.kernel, edumazet,
aleksander.lobakin, leitao, danielj, dsahern, davem,
donald.hunter, kuba, hawk, jiri, johannes.berg, corbet, leon,
linux-doc, linux-kernel, linux-rdma, lorenzo, michael.chan,
almasrymina, pabeni, saeedm, bigeasy, tariqt, xuanzhuo
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 11 Oct 2024 18:44:55 +0000 you wrote:
> Greetings:
>
> Welcome to v6. Minor changes from v5 [1], please see changelog below.
>
> There were no explicit comments from reviewers on the call outs in my
> v5, so I'm retaining them from my previous cover letter just in case :)
>
> [...]
Here is the summary with links:
- [net-next,v6,1/9] net: napi: Make napi_defer_hard_irqs per-NAPI
https://git.kernel.org/netdev/net-next/c/f15e3b3ddb9f
- [net-next,v6,2/9] netdev-genl: Dump napi_defer_hard_irqs
https://git.kernel.org/netdev/net-next/c/516010460011
- [net-next,v6,3/9] net: napi: Make gro_flush_timeout per-NAPI
https://git.kernel.org/netdev/net-next/c/acb8d4ed5661
- [net-next,v6,4/9] netdev-genl: Dump gro_flush_timeout
https://git.kernel.org/netdev/net-next/c/0137891e7457
- [net-next,v6,5/9] net: napi: Add napi_config
https://git.kernel.org/netdev/net-next/c/86e25f40aa1e
- [net-next,v6,6/9] netdev-genl: Support setting per-NAPI config values
https://git.kernel.org/netdev/net-next/c/1287c1ae0fc2
- [net-next,v6,7/9] bnxt: Add support for persistent NAPI config
https://git.kernel.org/netdev/net-next/c/419365227496
- [net-next,v6,8/9] mlx5: Add support for persistent NAPI config
https://git.kernel.org/netdev/net-next/c/2a3372cafe02
- [net-next,v6,9/9] mlx4: Add support for persistent NAPI config to RX CQs
https://git.kernel.org/netdev/net-next/c/c9191eaa7285
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
@ 2024-12-18 11:22 Alex Lazar
2024-12-18 17:08 ` Joe Damato
0 siblings, 1 reply; 11+ messages in thread
From: Alex Lazar @ 2024-12-18 11:22 UTC (permalink / raw)
To: jdamato@fastly.com
Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
amritha.nambiar@intel.com, bigeasy@linutronix.de,
bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
lorenzo@kernel.org, michael.chan@broadcom.com,
mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
Hi Joe and all,
I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
The issue occurs when sending packets from one machine to another.
On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
back to the sender.
At some point, one packet (packet A) gets "stuck," and if we send a new packet
(packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
poll, and packet B gets stuck, and so on.
Your change involves moving napi_hash_del() and napi_hash_add() from
netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
If I move them back to netif_napi_del() and netif_napi_add_weight(),
the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
This issue occurs with both the new and old APIs (netif_napi_add/_config).
Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
I am debugging this, no breakthrough so far.
I would appreciate if you could look into this.
We can provide more details per request.
Regards,
Alexei Lazar
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2024-12-18 11:22 [net-next v6 0/9] Add support for per-NAPI config via netlink Alex Lazar
@ 2024-12-18 17:08 ` Joe Damato
2024-12-20 17:40 ` Joe Damato
0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-12-18 17:08 UTC (permalink / raw)
To: Alex Lazar
Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
amritha.nambiar@intel.com, bigeasy@linutronix.de,
bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
lorenzo@kernel.org, michael.chan@broadcom.com,
mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
> Hi Joe and all,
>
> I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
> sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
>
> The issue occurs when sending packets from one machine to another.
> On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
> back to the sender.
> At some point, one packet (packet A) gets "stuck," and if we send a new packet
> (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
> poll, and packet B gets stuck, and so on.
>
> Your change involves moving napi_hash_del() and napi_hash_add() from
> netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
> If I move them back to netif_napi_del() and netif_napi_add_weight(),
> the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
>
> This issue occurs with both the new and old APIs (netif_napi_add/_config).
> Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
> I am debugging this, no breakthrough so far.
>
> I would appreciate if you could look into this.
> We can provide more details per request.
I appreciate your report, but there is not a lot in your message to
help debug the issue.
Can you please:
1.) Verify that the kernel tree you are testing on has commit
cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
does not, can you pull in that commit and re-run your test and
report back if that fixes your problem?
2.) If (1) does not fix your problem, can you please reply with at
least the following information:
- Specify what device this is happening on (in case I have access
to one)
- Which driver is affected
- Which upstream kernel SHA you are building your test kernel from
- The reproducer program(s) with clear instructions on how exactly
to run it/them in order to reproduce the issue
Thanks,
Joe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2024-12-18 17:08 ` Joe Damato
@ 2024-12-20 17:40 ` Joe Damato
2024-12-23 8:17 ` Alex Lazar
0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-12-20 17:40 UTC (permalink / raw)
To: Alex Lazar, aleksander.lobakin@intel.com, almasrymina@google.com,
amritha.nambiar@intel.com, bigeasy@linutronix.de,
bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
lorenzo@kernel.org, michael.chan@broadcom.com,
mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
On Wed, Dec 18, 2024 at 09:08:58AM -0800, Joe Damato wrote:
> On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
> > Hi Joe and all,
> >
> > I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
> > sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
> >
> > The issue occurs when sending packets from one machine to another.
> > On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
> > back to the sender.
> > At some point, one packet (packet A) gets "stuck," and if we send a new packet
> > (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
> > poll, and packet B gets stuck, and so on.
> >
> > Your change involves moving napi_hash_del() and napi_hash_add() from
> > netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
> > If I move them back to netif_napi_del() and netif_napi_add_weight(),
> > the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
> >
> > This issue occurs with both the new and old APIs (netif_napi_add/_config).
> > Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
> > I am debugging this, no breakthrough so far.
> >
> > I would appreciate if you could look into this.
> > We can provide more details per request.
>
> I appreciate your report, but there is not a lot in your message to
> help debug the issue.
>
> Can you please:
>
> 1.) Verify that the kernel tree you are testing on has commit
> cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
> does not, can you pull in that commit and re-run your test and
> report back if that fixes your problem?
>
> 2.) If (1) does not fix your problem, can you please reply with at
> least the following information:
> - Specify what device this is happening on (in case I have access
> to one)
> - Which driver is affected
> - Which upstream kernel SHA you are building your test kernel from
> - The reproducer program(s) with clear instructions on how exactly
> to run it/them in order to reproduce the issue
I didn't hear back on the above, but wanted to let you know that
I'll be out of the office soon, so my responses/bandwidth for
helping to debug this will be limited over the next week or two.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2024-12-20 17:40 ` Joe Damato
@ 2024-12-23 8:17 ` Alex Lazar
2024-12-30 14:31 ` Joe Damato
0 siblings, 1 reply; 11+ messages in thread
From: Alex Lazar @ 2024-12-23 8:17 UTC (permalink / raw)
To: Joe Damato, aleksander.lobakin@intel.com, almasrymina@google.com,
amritha.nambiar@intel.com, bigeasy@linutronix.de,
bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
lorenzo@kernel.org, michael.chan@broadcom.com,
mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
On 20/12/2024 19:40, Joe Damato wrote:
> On Wed, Dec 18, 2024 at 09:08:58AM -0800, Joe Damato wrote:
>> On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
>>> Hi Joe and all,
>>>
>>> I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
>>> sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
>>>
>>> The issue occurs when sending packets from one machine to another.
>>> On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
>>> back to the sender.
>>> At some point, one packet (packet A) gets "stuck," and if we send a new packet
>>> (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
>>> poll, and packet B gets stuck, and so on.
>>>
>>> Your change involves moving napi_hash_del() and napi_hash_add() from
>>> netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
>>> If I move them back to netif_napi_del() and netif_napi_add_weight(),
>>> the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
>>>
>>> This issue occurs with both the new and old APIs (netif_napi_add/_config).
>>> Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
>>> I am debugging this, no breakthrough so far.
>>>
>>> I would appreciate if you could look into this.
>>> We can provide more details per request.
>>
>> I appreciate your report, but there is not a lot in your message to
>> help debug the issue.
>>
>> Can you please:
>>
>> 1.) Verify that the kernel tree you are testing on has commit
>> cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
>> does not, can you pull in that commit and re-run your test and
>> report back if that fixes your problem?
I verified that the kernel tree includes commit cecc1555a8c2 ("net: Make
napi_hash_lock irq safe"), but the issue still occurs.
>>
>> 2.) If (1) does not fix your problem, can you please reply with at
>> least the following information:
>> - Specify what device this is happening on (in case I have access
>> to one)
We are using two ConnectX-5 cards connected back-to-back.
>> - Which driver is affected
The affected driver is the MLX5 driver.
>> - Which upstream kernel SHA you are building your test kernel from
The upstream kernel SHA we are building is 9163b05eca1d ("Merge branch
'add-support-for-so_priority-cmsg'").
>> - The reproducer program(s) with clear instructions on how exactly
>> to run it/them in order to reproduce the issue
Test setup/configuration:
On one side, we use a Python script with the scapy.all library to create
UDP packets of size 1024, using port 19017 and the MAC/IP of the other side.
On the other side, we define an n-tuple filter (ethtool --config-ntuple
eth2 flow-type udp4 dst-port 19017 action 4) and run xdpsock (xdpsock -i
eth2 -N -q 4 --l2fwd -z -B).
In the test, we send a single packet each time, which is received and
sent back to the sender.
As part of the validation, we check the statistics on the other side and
notice a discrepancy between what xdpsock shows and what we see in the
driver (ethtool -S eth2 | grep "tx_xsk_xmit").
>
> I didn't hear back on the above, but wanted to let you know that
> I'll be out of the office soon, so my responses/bandwidth for
> helping to debug this will be limited over the next week or two.
Hi Joe,
Thanks for the quick response.
Comments inline, If you need more details or further clarification,
please let me know.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2024-12-23 8:17 ` Alex Lazar
@ 2024-12-30 14:31 ` Joe Damato
2025-01-10 18:16 ` Joe Damato
0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-12-30 14:31 UTC (permalink / raw)
To: Alex Lazar
Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
amritha.nambiar@intel.com, bigeasy@linutronix.de,
bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
lorenzo@kernel.org, michael.chan@broadcom.com,
mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>
>
> On 20/12/2024 19:40, Joe Damato wrote:
> > On Wed, Dec 18, 2024 at 09:08:58AM -0800, Joe Damato wrote:
> >> On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
> >>> Hi Joe and all,
> >>>
> >>> I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
> >>> sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
> >>>
> >>> The issue occurs when sending packets from one machine to another.
> >>> On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
> >>> back to the sender.
> >>> At some point, one packet (packet A) gets "stuck," and if we send a new packet
> >>> (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
> >>> poll, and packet B gets stuck, and so on.
> >>>
> >>> Your change involves moving napi_hash_del() and napi_hash_add() from
> >>> netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
> >>> If I move them back to netif_napi_del() and netif_napi_add_weight(),
> >>> the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
> >>>
> >>> This issue occurs with both the new and old APIs (netif_napi_add/_config).
> >>> Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
> >>> I am debugging this, no breakthrough so far.
> >>>
> >>> I would appreciate if you could look into this.
> >>> We can provide more details per request.
> >>
> >> I appreciate your report, but there is not a lot in your message to
> >> help debug the issue.
> >>
> >> Can you please:
> >>
> >> 1.) Verify that the kernel tree you are testing on has commit
> >> cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
> >> does not, can you pull in that commit and re-run your test and
> >> report back if that fixes your problem?
>
> I verified that the kernel tree includes commit cecc1555a8c2 ("net: Make
> napi_hash_lock irq safe"), but the issue still occurs.
OK, thanks for verifying that.
> >>
> >> 2.) If (1) does not fix your problem, can you please reply with at
> >> least the following information:
> >> - Specify what device this is happening on (in case I have access
> >> to one)
>
> We are using two ConnectX-5 cards connected back-to-back.
>
> >> - Which driver is affected
>
> The affected driver is the MLX5 driver.
>
> >> - Which upstream kernel SHA you are building your test kernel from
>
> The upstream kernel SHA we are building is 9163b05eca1d ("Merge branch
> 'add-support-for-so_priority-cmsg'").
I have access to ConnectX-5 Ex EN MCX516A-CDAT NICs and can build a
kernel based on the upstream SHA you mentioned.
> >> - The reproducer program(s) with clear instructions on how exactly
> >> to run it/them in order to reproduce the issue
>
> Test setup/configuration:
> On one side, we use a Python script with the scapy.all library to create
> UDP packets of size 1024, using port 19017 and the MAC/IP of the other side.
> On the other side, we define an n-tuple filter (ethtool --config-ntuple
> eth2 flow-type udp4 dst-port 19017 action 4) and run xdpsock (xdpsock -i
> eth2 -N -q 4 --l2fwd -z -B).
> In the test, we send a single packet each time, which is received and
> sent back to the sender.
> As part of the validation, we check the statistics on the other side and
> notice a discrepancy between what xdpsock shows and what we see in the
> driver (ethtool -S eth2 | grep "tx_xsk_xmit").
1. Can you please be much more specific when you say "notice a
discrepancy" and substitute that for specific data? What,
exactly, is a single run of the experiment that results in a lost
or delayed packet? Do you have tcpdump output? What is the
expected output and what is the actual output?
2. Can you provide the full source code? Both the reproducer and the
script to check the values?
You can feel free to create a github gist and link to it, then I
can git clone it and run it.
In the gist you can create a README.md that shows exactly what
commands I need to run to reproduce, the expected output from the
script, and the actual output.
Since I have access to this hardware, I should be able to
reproduce.
Ideally, I'll git clone your gist, run something like "make test"
(or similar) and reproduce the issue and get either "Test
succeeded" or Test failed; expected %d packets but received %d".
3. If this issue is with napi_disable/napi_enable, can you think of
a simpler reproducer? For example, wouldn't changing the queue
count cause the same issue without needing to involve xdp at all?
What about a simpler experiment with UDPing [1] and an ntuple
filter? Does it reproduce with plain old NAPI processing or is it
XDP specific?
> >
> > I didn't hear back on the above, but wanted to let you know that
> > I'll be out of the office soon, so my responses/bandwidth for
> > helping to debug this will be limited over the next week or two.
>
> Hi Joe,
>
> Thanks for the quick response.
> Comments inline, If you need more details or further clarification,
> please let me know.
As mentioned above and in my previous emails: please provide lot
more detail and make it as easy as possible for me to reproduce this
issue with the simplest reproducer possible and a much more detailed
explanation.
Please note: I will be out of the office until Jan 9 so my responses
will be limited until then.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2024-12-30 14:31 ` Joe Damato
@ 2025-01-10 18:16 ` Joe Damato
2025-01-10 18:26 ` Stanislav Fomichev
0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2025-01-10 18:16 UTC (permalink / raw)
To: Alex Lazar, aleksander.lobakin@intel.com, almasrymina@google.com,
bigeasy@linutronix.de, bjorn@rivosinc.com, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, lorenzo@kernel.org,
michael.chan@broadcom.com, mkarsten@uwaterloo.ca,
netdev@vger.kernel.org, pabeni@redhat.com, Saeed Mahameed,
sdf@fomichev.me, skhawaja@google.com, sridhar.samudrala@intel.com,
Tariq Toukan, willemdebruijn.kernel@gmail.com,
xuanzhuo@linux.alibaba.com, Gal Pressman, Nimrod Oren,
Dror Tennenbaum, Dragos Tatulea
On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
> >
[...]
> >
> > Hi Joe,
> >
> > Thanks for the quick response.
> > Comments inline, If you need more details or further clarification,
> > please let me know.
>
> As mentioned above and in my previous emails: please provide lot
> more detail and make it as easy as possible for me to reproduce this
> issue with the simplest reproducer possible and a much more detailed
> explanation.
>
> Please note: I will be out of the office until Jan 9 so my responses
> will be limited until then.
Just to follow up on this for anyone who missed the other thread,
Stanislav proposed a patch which _might_ fix the issue being hit
here.
Please see [1], try that patch, and report back if that patch fixes
the issue.
Thanks.
[1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@fomichev.me/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2025-01-10 18:16 ` Joe Damato
@ 2025-01-10 18:26 ` Stanislav Fomichev
2025-01-10 18:58 ` Martin Karsten
0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-01-10 18:26 UTC (permalink / raw)
To: Joe Damato, Alex Lazar, aleksander.lobakin@intel.com,
almasrymina@google.com, bigeasy@linutronix.de, bjorn@rivosinc.com,
Dan Jurgens, davem@davemloft.net, donald.hunter@gmail.com,
dsahern@kernel.org, edumazet@google.com, hawk@kernel.org,
jiri@resnulli.us, johannes.berg@intel.com, kuba@kernel.org,
leitao@debian.org, leon@kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, lorenzo@kernel.org,
michael.chan@broadcom.com, mkarsten@uwaterloo.ca,
netdev@vger.kernel.org, pabeni@redhat.com, Saeed Mahameed,
sdf@fomichev.me, skhawaja@google.com, sridhar.samudrala@intel.com,
Tariq Toukan, willemdebruijn.kernel@gmail.com,
xuanzhuo@linux.alibaba.com, Gal Pressman, Nimrod Oren,
Dror Tennenbaum, Dragos Tatulea
On 01/10, Joe Damato wrote:
> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
> > On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
> > >
>
> [...]
>
> > >
> > > Hi Joe,
> > >
> > > Thanks for the quick response.
> > > Comments inline, If you need more details or further clarification,
> > > please let me know.
> >
> > As mentioned above and in my previous emails: please provide lot
> > more detail and make it as easy as possible for me to reproduce this
> > issue with the simplest reproducer possible and a much more detailed
> > explanation.
> >
> > Please note: I will be out of the office until Jan 9 so my responses
> > will be limited until then.
>
> Just to follow up on this for anyone who missed the other thread,
> Stanislav proposed a patch which _might_ fix the issue being hit
> here.
>
> Please see [1], try that patch, and report back if that patch fixes
> the issue.
>
> Thanks.
>
> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@fomichev.me/
Note that it might help only if xsk is using busy-polling. Not sure
that's the case, it's relatively obscure feature :-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2025-01-10 18:26 ` Stanislav Fomichev
@ 2025-01-10 18:58 ` Martin Karsten
2025-01-12 8:05 ` Alex Lazar
0 siblings, 1 reply; 11+ messages in thread
From: Martin Karsten @ 2025-01-10 18:58 UTC (permalink / raw)
To: Stanislav Fomichev, Joe Damato, Alex Lazar,
aleksander.lobakin@intel.com, almasrymina@google.com,
bigeasy@linutronix.de, bjorn@rivosinc.com, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, lorenzo@kernel.org,
michael.chan@broadcom.com, netdev@vger.kernel.org,
pabeni@redhat.com, Saeed Mahameed, sdf@fomichev.me,
skhawaja@google.com, sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
On 2025-01-10 13:26, Stanislav Fomichev wrote:
> On 01/10, Joe Damato wrote:
>> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
>>> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>>>>
>>
>> [...]
>>
>>>>
>>>> Hi Joe,
>>>>
>>>> Thanks for the quick response.
>>>> Comments inline, If you need more details or further clarification,
>>>> please let me know.
>>>
>>> As mentioned above and in my previous emails: please provide lot
>>> more detail and make it as easy as possible for me to reproduce this
>>> issue with the simplest reproducer possible and a much more detailed
>>> explanation.
>>>
>>> Please note: I will be out of the office until Jan 9 so my responses
>>> will be limited until then.
>>
>> Just to follow up on this for anyone who missed the other thread,
>> Stanislav proposed a patch which _might_ fix the issue being hit
>> here.
>>
>> Please see [1], try that patch, and report back if that patch fixes
>> the issue.
>>
>> Thanks.
>>
>> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@fomichev.me/
>
> Note that it might help only if xsk is using busy-polling. Not sure
> that's the case, it's relatively obscure feature :-)
I believe I have reproduced Alex' issue using the methodology below and
your patch fixes it for me.
The experiment uses a server (tilly01) with mlx5 and a client (tilly02).
In the problem case, the 'response' packet gets stuck, but the next
'request' packets triggers both the stuck and the regular responses. The
pattern can also be seen in the tcpdump output at the client. Note that
the response packet is not a valid packet (only MAC addresses swapped,
not IP addresses), but tcpdump shows it regardless.
Thanks,
Martin
# on server tilly01
watch -n 0.5 "sudo ethtool -S ens2f1np1 | fgrep tx_xsk_xmit"
# on client tilly02
sudo tcpdump -qbi eno3d1 udp
# on client tilly02
while true; do
ssh tilly01 "sudo ifconfig ens2f1np1 down; sudo modprobe -r mlx5_ib;
sleep 1; sudo modprobe mlx5_ib; sudo ifconfig ens2f1np1 up"
ssh -f tilly01 "sudo ./bpf-examples/AF_XDP-example/xdpsock \
-i ens2f1np1 -N -q 4 --l2fwd -z -B >/dev/null 2>&1"
exp=1
for ((i=0;i<5;i++)); do
ssh tilly01 "sudo ethtool --config-ntuple ens2f1np1 flow-type udp4\
dst-port 19017 action 4 >/dev/null 2>&1"
for ((j=0;j<10;j++)); do
echo -n "$exp "
echo 'send(IP(dst="192.168.199.1",src="192.168.199.2")\
/UDP(dport=19017))' | sudo ./scapy/run_scapy >/dev/null 2>&1
cnt=$(ssh tilly01 ethtool -S ens2f1np1|grep -F tx_xsk_xmit\
|cut -f2 -d:)
[ $cnt -eq $exp ] || {
echo COUNTER WRONG
read x
}
((exp+=1))
done
ssh tilly01 sudo ethtool --config-ntuple ens2f1np1 delete 1023
done
echo reset
ssh tilly01 sudo killall xdpsock
done
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
2025-01-10 18:58 ` Martin Karsten
@ 2025-01-12 8:05 ` Alex Lazar
0 siblings, 0 replies; 11+ messages in thread
From: Alex Lazar @ 2025-01-12 8:05 UTC (permalink / raw)
To: Martin Karsten, Stanislav Fomichev, Joe Damato,
aleksander.lobakin@intel.com, almasrymina@google.com,
bigeasy@linutronix.de, bjorn@rivosinc.com, Dan Jurgens,
davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
leon@kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, lorenzo@kernel.org,
michael.chan@broadcom.com, netdev@vger.kernel.org,
pabeni@redhat.com, Saeed Mahameed, sdf@fomichev.me,
skhawaja@google.com, sridhar.samudrala@intel.com, Tariq Toukan,
willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea
On 10/01/2025 20:58, Martin Karsten wrote:
> On 2025-01-10 13:26, Stanislav Fomichev wrote:
>> On 01/10, Joe Damato wrote:
>>> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
>>>> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>>>>>
>>>
>>> [...]
>>>
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> Thanks for the quick response.
>>>>> Comments inline, If you need more details or further clarification,
>>>>> please let me know.
>>>>
>>>> As mentioned above and in my previous emails: please provide lot
>>>> more detail and make it as easy as possible for me to reproduce this
>>>> issue with the simplest reproducer possible and a much more detailed
>>>> explanation.
>>>>
>>>> Please note: I will be out of the office until Jan 9 so my responses
>>>> will be limited until then.
>>>
>>> Just to follow up on this for anyone who missed the other thread,
>>> Stanislav proposed a patch which _might_ fix the issue being hit
>>> here.
>>>
>>> Please see [1], try that patch, and report back if that patch fixes
>>> the issue.
>>>
>>> Thanks.
>>>
>>> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-
>>> sdf@fomichev.me/
>>
>> Note that it might help only if xsk is using busy-polling. Not sure
>> that's the case, it's relatively obscure feature :-)
>
> I believe I have reproduced Alex' issue using the methodology below and
> your patch fixes it for me.
>
> The experiment uses a server (tilly01) with mlx5 and a client (tilly02).
> In the problem case, the 'response' packet gets stuck, but the next
> 'request' packets triggers both the stuck and the regular responses. The
> pattern can also be seen in the tcpdump output at the client. Note that
> the response packet is not a valid packet (only MAC addresses swapped,
> not IP addresses), but tcpdump shows it regardless.
>
> Thanks,
> Martin
>
> # on server tilly01
> watch -n 0.5 "sudo ethtool -S ens2f1np1 | fgrep tx_xsk_xmit"
>
> # on client tilly02
> sudo tcpdump -qbi eno3d1 udp
>
> # on client tilly02
> while true; do
> ssh tilly01 "sudo ifconfig ens2f1np1 down; sudo modprobe -r mlx5_ib;
> sleep 1; sudo modprobe mlx5_ib; sudo ifconfig ens2f1np1 up"
> ssh -f tilly01 "sudo ./bpf-examples/AF_XDP-example/xdpsock \
> -i ens2f1np1 -N -q 4 --l2fwd -z -B >/dev/null 2>&1"
> exp=1
> for ((i=0;i<5;i++)); do
> ssh tilly01 "sudo ethtool --config-ntuple ens2f1np1 flow-type udp4\
> dst-port 19017 action 4 >/dev/null 2>&1"
> for ((j=0;j<10;j++)); do
> echo -n "$exp "
> echo 'send(IP(dst="192.168.199.1",src="192.168.199.2")\
> /UDP(dport=19017))' | sudo ./scapy/run_scapy >/dev/null 2>&1
> cnt=$(ssh tilly01 ethtool -S ens2f1np1|grep -F tx_xsk_xmit\
> |cut -f2 -d:)
> [ $cnt -eq $exp ] || {
> echo COUNTER WRONG
> read x
> }
> ((exp+=1))
> done
> ssh tilly01 sudo ethtool --config-ntuple ens2f1np1 delete 1023
> done
> echo reset
> ssh tilly01 sudo killall xdpsock
> done
>
Thanks to Joe Martin and Stanislav for introducing this fix and for your
efforts in solving this issue. I reviewed it over the weekend and
verified that it solves the problem.
Thanks,
Alex Lazar
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-01-12 8:05 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18 11:22 [net-next v6 0/9] Add support for per-NAPI config via netlink Alex Lazar
2024-12-18 17:08 ` Joe Damato
2024-12-20 17:40 ` Joe Damato
2024-12-23 8:17 ` Alex Lazar
2024-12-30 14:31 ` Joe Damato
2025-01-10 18:16 ` Joe Damato
2025-01-10 18:26 ` Stanislav Fomichev
2025-01-10 18:58 ` Martin Karsten
2025-01-12 8:05 ` Alex Lazar
-- strict thread matches above, loose matches on Subject: below --
2024-10-11 18:44 Joe Damato
2024-10-15 1:10 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).