[net-next v6 0/9] Add support for per-NAPI config via netlink

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [net-next v6 0/9] Add support for per-NAPI config via netlink
@ 2024-10-11 18:44 Joe Damato
  2024-10-15  1:10 ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-10-11 18:44 UTC (permalink / raw)
  To: netdev
  Cc: mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, edumazet, Joe Damato,
	Alexander Lobakin, Breno Leitao, Daniel Jurgens, David Ahern,
	David S. Miller, Donald Hunter, Jakub Kicinski,
	Jesper Dangaard Brouer, Jiri Pirko, Johannes Berg,
	Jonathan Corbet, Leon Romanovsky, open list:DOCUMENTATION,
	open list, open list:MELLANOX MLX4 core VPI driver,
	Lorenzo Bianconi, Michael Chan, Mina Almasry, Paolo Abeni,
	Saeed Mahameed, Sebastian Andrzej Siewior, Tariq Toukan,
	Xuan Zhuo

Greetings:

Welcome to v6. Minor changes from v5 [1], please see changelog below.

There were no explicit comments from reviewers on the call outs in my
v5, so I'm retaining them from my previous cover letter just in case :)

A few important call outs for reviewers:

  1. This revision seems to work (see below for a full walk through). I
     think this is the behavior we talked about, but please let me know if
     a use case is missing.
  
  2. Re a previous point made by Stanislav regarding "taking over a NAPI
     ID" when the channel count changes: mlx5 seems to call napi_disable
     followed by netif_napi_del for the old queues and then calls
     napi_enable for the new ones. In this RFC, the NAPI ID generation is
     deferred to napi_enable. This means we won't end up with two of the
     same NAPI IDs added to the hash at the same time.

     Can we assume all drivers will napi_disable the old queues before
     napi_enable the new ones?
       - If yes: we might not need to worry about a NAPI ID takeover
       	 function.
       - If no: I'll need to make a change so that the NAPI ID generation
       	 is deferred only for drivers which have opted into the config
       	 space via calls to netif_napi_add_config

  3. I made the decision to remove the WARN_ON_ONCE that (I think?)
     Jakub previously suggested in alloc_netdev_mqs (WARN_ON_ONCE(txqs
     != rxqs);) because this was triggering on every kernel boot with my
     mlx5 NIC.

  4. I left the "maxqs = max(txqs, rxqs);" in alloc_netdev_mqs despite
     thinking this is a bit strange. I think it's strange that we might
     be short some number of NAPI configs, but it seems like most people
     are in favor of this approach, so I've left it. 

I'd appreciate thoughts from reviewers on the above items, if at all
possible.

Now, on to the implementation.

Firstly, this implementation moves certain settings to napi_struct so that
they are "per-NAPI", while taking care to respect existing sysfs
parameters which are interface wide and affect all NAPIs:
  - NAPI ID
  - gro_flush_timeout
  - defer_hard_irqs

Furthermore:
   - NAPI ID generation and addition to the hash is now deferred to
     napi_enable, instead of during netif_napi_add
   - NAPIs are removed from the hash during napi_disable, instead of
     netif_napi_del.
   - An array of "struct napi_config" is allocated in net_device.

IMPORTANT: The above changes affect all network drivers.

Optionally, drivers may opt-in to using their config space by calling
netif_napi_add_config instead of netif_napi_add.

If a driver does this, the NAPI being added is linked with an allocated
"struct napi_config" and the per-NAPI settings (including NAPI ID) are
persisted even as hardware queues are destroyed and recreated.

To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
  - mlx5 which is the basis of the examples below
  - mlx4 which has TX only NAPIs, just to highlight that case. I have
    only compile tested this patch; I don't have this hardware.
  - bnxt which I have only compiled tested. I don't have this
    hardware.

NOTE: I only tested this on mlx5; I have no access to the other hardware
for which I provided patches. Hopefully other folks can help test :)

Here's how it works when I test it on my mlx5 system:

# start with 2 queues

$ ethtool -l eth4 | grep Combined | tail -1
Combined:       2

First, output the current NAPI settings:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now, set the global sysfs parameters:

$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'

Output current NAPI settings again:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now set NAPI ID 345, via its NAPI ID to specific values:

$ sudo ./tools/net/ynl/cli.py \
          --spec Documentation/netlink/specs/netdev.yaml \
          --do napi-set \
          --json='{"id": 345,
                   "defer-hard-irqs": 111,
                   "gro-flush-timeout": 11111}'
None

Now output current NAPI settings again to ensure only NAPI ID 345
changed:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 11111,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now, increase gro-flush-timeout only:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"id": 345,
                              "gro-flush-timeout": 44444}'
None

Now output the current NAPI settings once more:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 44444,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now set NAPI ID 345 to have gro_flush_timeout of 0:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"id": 345,
                              "gro-flush-timeout": 0}'
None

Check that NAPI ID 345 has a value of 0:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Change the queue count, ensuring that NAPI ID 345 retains its settings:

$ sudo ethtool -L eth4 combined 4

Check that the new queues have the system wide settings but that NAPI ID
345 remains unchanged:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 347,
  'ifindex': 7,
  'irq': 529},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Now reduce the queue count below where NAPI ID 345 is indexed:

$ sudo ethtool -L eth4 combined 1

Check the output:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Re-increase the queue count to ensure NAPI ID 345 is re-assigned the same
values:

$ sudo ethtool -L eth4 combined 2

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Create new queues to ensure the sysfs globals are used for the new NAPIs
but that NAPI ID 345 is unchanged:

$ sudo ethtool -L eth4 combined 8

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[...]
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:

$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'

Check that worked:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[...]
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 346,
  'ifindex': 7,
  'irq': 528},
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 345,
  'ifindex': 7,
  'irq': 527},
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 344,
  'ifindex': 7,
  'irq': 327}]

Thanks,
Joe

[1]: https://lore.kernel.org/netdev/20241009005525.13651-1-jdamato@fastly.com/

v6:
  - Added Jakub's Reviewed-by to all commit messages
  - Added Eric's Reviewed-by to the commits for which he provided his
    review (i.e. all patches except 4, 5, and 6)
  - Updated the netlink doc for gro_flush_timeout in patch 4
  - Added WARN_ON_ONCE to napi_hash_add_with_id as suggested by Eric
    Dumazet to patch 5

v5: https://lore.kernel.org/netdev/20241009005525.13651-1-jdamato@fastly.com/
  - Converted from an RFC to a PATCH
  - Moved defer_hard_irqs above control-fields comment in napi_struct in
    patch 1
  - Moved gro_flush_timeout above control-fields comment in napi_struct in
    patch 3
  - Removed unnecessary idpf changes from patch 3
  - Removed unnecessary kdoc in patch 5 for a parameter removed in an
    earlier revision
  - Removed unnecessary NULL check in patch 5
  - Used tooling to autogenerate code for patch 6, which fixed the type and
    range of NETDEV_A_NAPI_DEFER_HARD_IRQ.

rfcv4: https://lore.kernel.org/lkml/20241001235302.57609-1-jdamato@fastly.com/
  - Updated commit messages of most patches
  - Renamed netif_napi_add_storage to netif_napi_add_config in patch 5
  - Added a NULL check in netdev_set_defer_hard_irqs and
    netdev_set_gro_flush_timeout for netdev->napi_config in patch 5
  - Removed the WARN_ON_ONCE suggested in an earlier revision
    in alloc_netdev_mqs from patch 5; it triggers every time on my mlx5
    machine at boot and needlessly spams the log
  - Added a locking adjustment suggested by Stanislav to patch 6 to
    protect napi_id in patch 5
  - Removed napi_hash_del from netif_napi_del in patch 5. netif_napi_del
    calls __netif_napi_del which itself calls napi_hash_del. The
    original code thus resulted in two napi_hash_del calls, which is
    incorrect.
  - Removed the napi_hash_add from netif_napi_add_weight in patch 5.
    NAPIs are added to the hash when napi_enable is called, instead.
  - Moved the napi_restore_config to the top of napi_enable in patch 5.
  - Simplified the logic in __netif_napi_del and removed napi_hash_del.
    NAPIs are removed in napi_disable.
  - Fixed merge conflicts in patch 6 so it applies cleanly

rfcv3: https://lore.kernel.org/netdev/20240912100738.16567-8-jdamato@fastly.com/T/
  - Renamed napi_storage to napi_config
  - Reordered patches
  - Added defer_hard_irqs and gro_flush_timeout to napi_struct
  - Attempt to save and restore settings on napi_disable/napi_enable
  - Removed weight as a parameter to netif_napi_add_storage
  - Updated driver patches to no longer pass in weight

rfcv2: https://lore.kernel.org/netdev/20240908160702.56618-1-jdamato@fastly.com/
  - Almost total rewrite from v1

Joe Damato (9):
  net: napi: Make napi_defer_hard_irqs per-NAPI
  netdev-genl: Dump napi_defer_hard_irqs
  net: napi: Make gro_flush_timeout per-NAPI
  netdev-genl: Dump gro_flush_timeout
  net: napi: Add napi_config
  netdev-genl: Support setting per-NAPI config values
  bnxt: Add support for persistent NAPI config
  mlx5: Add support for persistent NAPI config
  mlx4: Add support for persistent NAPI config to RX CQs

 Documentation/netlink/specs/netdev.yaml       | 28 ++++++
 .../networking/net_cachelines/net_device.rst  |  3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  3 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c    |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 include/linux/netdevice.h                     | 42 +++++++-
 include/uapi/linux/netdev.h                   |  3 +
 net/core/dev.c                                | 96 ++++++++++++++++---
 net/core/dev.h                                | 88 +++++++++++++++++
 net/core/net-sysfs.c                          |  4 +-
 net/core/netdev-genl-gen.c                    | 18 ++++
 net/core/netdev-genl-gen.h                    |  1 +
 net/core/netdev-genl.c                        | 57 +++++++++++
 tools/include/uapi/linux/netdev.h             |  3 +
 14 files changed, 326 insertions(+), 25 deletions(-)


base-commit: d677aebd663ddc287f2b2bda098474694a0ca875
-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2024-10-11 18:44 Joe Damato
@ 2024-10-15  1:10 ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-10-15  1:10 UTC (permalink / raw)
  To: Joe Damato
  Cc: netdev, mkarsten, skhawaja, sdf, bjorn, amritha.nambiar,
	sridhar.samudrala, willemdebruijn.kernel, edumazet,
	aleksander.lobakin, leitao, danielj, dsahern, davem,
	donald.hunter, kuba, hawk, jiri, johannes.berg, corbet, leon,
	linux-doc, linux-kernel, linux-rdma, lorenzo, michael.chan,
	almasrymina, pabeni, saeedm, bigeasy, tariqt, xuanzhuo

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 11 Oct 2024 18:44:55 +0000 you wrote:
> Greetings:
> 
> Welcome to v6. Minor changes from v5 [1], please see changelog below.
> 
> There were no explicit comments from reviewers on the call outs in my
> v5, so I'm retaining them from my previous cover letter just in case :)
> 
> [...]

Here is the summary with links:
  - [net-next,v6,1/9] net: napi: Make napi_defer_hard_irqs per-NAPI
    https://git.kernel.org/netdev/net-next/c/f15e3b3ddb9f
  - [net-next,v6,2/9] netdev-genl: Dump napi_defer_hard_irqs
    https://git.kernel.org/netdev/net-next/c/516010460011
  - [net-next,v6,3/9] net: napi: Make gro_flush_timeout per-NAPI
    https://git.kernel.org/netdev/net-next/c/acb8d4ed5661
  - [net-next,v6,4/9] netdev-genl: Dump gro_flush_timeout
    https://git.kernel.org/netdev/net-next/c/0137891e7457
  - [net-next,v6,5/9] net: napi: Add napi_config
    https://git.kernel.org/netdev/net-next/c/86e25f40aa1e
  - [net-next,v6,6/9] netdev-genl: Support setting per-NAPI config values
    https://git.kernel.org/netdev/net-next/c/1287c1ae0fc2
  - [net-next,v6,7/9] bnxt: Add support for persistent NAPI config
    https://git.kernel.org/netdev/net-next/c/419365227496
  - [net-next,v6,8/9] mlx5: Add support for persistent NAPI config
    https://git.kernel.org/netdev/net-next/c/2a3372cafe02
  - [net-next,v6,9/9] mlx4: Add support for persistent NAPI config to RX CQs
    https://git.kernel.org/netdev/net-next/c/c9191eaa7285

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
@ 2024-12-18 11:22 Alex Lazar
  2024-12-18 17:08 ` Joe Damato
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Lazar @ 2024-12-18 11:22 UTC (permalink / raw)
  To: jdamato@fastly.com
  Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
	amritha.nambiar@intel.com, bigeasy@linutronix.de,
	bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	lorenzo@kernel.org, michael.chan@broadcom.com,
	mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
	Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
	sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea

Hi Joe and all,

I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")

The issue occurs when sending packets from one machine to another.
On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
back to the sender.
At some point, one packet (packet A) gets "stuck," and if we send a new packet
(packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
poll, and packet B gets stuck, and so on.

Your change involves moving napi_hash_del() and napi_hash_add() from
netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
If I move them back to netif_napi_del() and netif_napi_add_weight(),
the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).

This issue occurs with both the new and old APIs (netif_napi_add/_config).
Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
I am debugging this, no breakthrough so far.

I would appreciate if you could look into this.
We can provide more details per request.

Regards,
Alexei Lazar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2024-12-18 11:22 [net-next v6 0/9] Add support for per-NAPI config via netlink Alex Lazar
@ 2024-12-18 17:08 ` Joe Damato
  2024-12-20 17:40   ` Joe Damato
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-12-18 17:08 UTC (permalink / raw)
  To: Alex Lazar
  Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
	amritha.nambiar@intel.com, bigeasy@linutronix.de,
	bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	lorenzo@kernel.org, michael.chan@broadcom.com,
	mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
	Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
	sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea

On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
> Hi Joe and all,
> 
> I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
> sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
> 
> The issue occurs when sending packets from one machine to another.
> On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
> back to the sender.
> At some point, one packet (packet A) gets "stuck," and if we send a new packet
> (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
> poll, and packet B gets stuck, and so on.
> 
> Your change involves moving napi_hash_del() and napi_hash_add() from
> netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
> If I move them back to netif_napi_del() and netif_napi_add_weight(),
> the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
> 
> This issue occurs with both the new and old APIs (netif_napi_add/_config).
> Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
> I am debugging this, no breakthrough so far.
> 
> I would appreciate if you could look into this.
> We can provide more details per request.

I appreciate your report, but there is not a lot in your message to
help debug the issue.

Can you please:

1.) Verify that the kernel tree you are testing on has commit
cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
does not, can you pull in that commit and re-run your test and
report back if that fixes your problem?

2.) If (1) does not fix your problem, can you please reply with at
least the following information:
  - Specify what device this is happening on (in case I have access
    to one)
  - Which driver is affected
  - Which upstream kernel SHA you are building your test kernel from
  - The reproducer program(s) with clear instructions on how exactly
    to run it/them in order to reproduce the issue

Thanks,
Joe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2024-12-18 17:08 ` Joe Damato
@ 2024-12-20 17:40   ` Joe Damato
  2024-12-23  8:17     ` Alex Lazar
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-12-20 17:40 UTC (permalink / raw)
  To: Alex Lazar, aleksander.lobakin@intel.com, almasrymina@google.com,
	amritha.nambiar@intel.com, bigeasy@linutronix.de,
	bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	lorenzo@kernel.org, michael.chan@broadcom.com,
	mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
	Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
	sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea

On Wed, Dec 18, 2024 at 09:08:58AM -0800, Joe Damato wrote:
> On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
> > Hi Joe and all,
> > 
> > I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
> > sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
> > 
> > The issue occurs when sending packets from one machine to another.
> > On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
> > back to the sender.
> > At some point, one packet (packet A) gets "stuck," and if we send a new packet
> > (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
> > poll, and packet B gets stuck, and so on.
> > 
> > Your change involves moving napi_hash_del() and napi_hash_add() from
> > netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
> > If I move them back to netif_napi_del() and netif_napi_add_weight(),
> > the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
> > 
> > This issue occurs with both the new and old APIs (netif_napi_add/_config).
> > Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
> > I am debugging this, no breakthrough so far.
> > 
> > I would appreciate if you could look into this.
> > We can provide more details per request.
> 
> I appreciate your report, but there is not a lot in your message to
> help debug the issue.
> 
> Can you please:
> 
> 1.) Verify that the kernel tree you are testing on has commit
> cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
> does not, can you pull in that commit and re-run your test and
> report back if that fixes your problem?
> 
> 2.) If (1) does not fix your problem, can you please reply with at
> least the following information:
>   - Specify what device this is happening on (in case I have access
>     to one)
>   - Which driver is affected
>   - Which upstream kernel SHA you are building your test kernel from
>   - The reproducer program(s) with clear instructions on how exactly
>     to run it/them in order to reproduce the issue

I didn't hear back on the above, but wanted to let you know that
I'll be out of the office soon, so my responses/bandwidth for
helping to debug this will be limited over the next week or two.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2024-12-20 17:40   ` Joe Damato
@ 2024-12-23  8:17     ` Alex Lazar
  2024-12-30 14:31       ` Joe Damato
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Lazar @ 2024-12-23  8:17 UTC (permalink / raw)
  To: Joe Damato, aleksander.lobakin@intel.com, almasrymina@google.com,
	amritha.nambiar@intel.com, bigeasy@linutronix.de,
	bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	lorenzo@kernel.org, michael.chan@broadcom.com,
	mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
	Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
	sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea



On 20/12/2024 19:40, Joe Damato wrote:
> On Wed, Dec 18, 2024 at 09:08:58AM -0800, Joe Damato wrote:
>> On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
>>> Hi Joe and all,
>>>
>>> I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
>>> sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
>>>
>>> The issue occurs when sending packets from one machine to another.
>>> On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
>>> back to the sender.
>>> At some point, one packet (packet A) gets "stuck," and if we send a new packet
>>> (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
>>> poll, and packet B gets stuck, and so on.
>>>
>>> Your change involves moving napi_hash_del() and napi_hash_add() from
>>> netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
>>> If I move them back to netif_napi_del() and netif_napi_add_weight(),
>>> the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
>>>
>>> This issue occurs with both the new and old APIs (netif_napi_add/_config).
>>> Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
>>> I am debugging this, no breakthrough so far.
>>>
>>> I would appreciate if you could look into this.
>>> We can provide more details per request.
>>
>> I appreciate your report, but there is not a lot in your message to
>> help debug the issue.
>>
>> Can you please:
>>
>> 1.) Verify that the kernel tree you are testing on has commit
>> cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
>> does not, can you pull in that commit and re-run your test and
>> report back if that fixes your problem?

I verified that the kernel tree includes commit cecc1555a8c2 ("net: Make 
napi_hash_lock irq safe"), but the issue still occurs.

>>
>> 2.) If (1) does not fix your problem, can you please reply with at
>> least the following information:
>>    - Specify what device this is happening on (in case I have access
>>      to one)

We are using two ConnectX-5 cards connected back-to-back.

>>    - Which driver is affected

The affected driver is the MLX5 driver.

>>    - Which upstream kernel SHA you are building your test kernel from

The upstream kernel SHA we are building is 9163b05eca1d ("Merge branch 
'add-support-for-so_priority-cmsg'").

>>    - The reproducer program(s) with clear instructions on how exactly
>>      to run it/them in order to reproduce the issue

Test setup/configuration:
On one side, we use a Python script with the scapy.all library to create 
UDP packets of size 1024, using port 19017 and the MAC/IP of the other side.
On the other side, we define an n-tuple filter (ethtool --config-ntuple 
eth2 flow-type udp4 dst-port 19017 action 4) and run xdpsock (xdpsock -i 
eth2 -N -q 4 --l2fwd -z -B).
In the test, we send a single packet each time, which is received and 
sent back to the sender.
As part of the validation, we check the statistics on the other side and 
notice a discrepancy between what xdpsock shows and what we see in the 
driver (ethtool -S eth2 | grep "tx_xsk_xmit").

> 
> I didn't hear back on the above, but wanted to let you know that
> I'll be out of the office soon, so my responses/bandwidth for
> helping to debug this will be limited over the next week or two.

Hi Joe,

Thanks for the quick response.
Comments inline, If you need more details or further clarification, 
please let me know.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2024-12-23  8:17     ` Alex Lazar
@ 2024-12-30 14:31       ` Joe Damato
  2025-01-10 18:16         ` Joe Damato
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2024-12-30 14:31 UTC (permalink / raw)
  To: Alex Lazar
  Cc: aleksander.lobakin@intel.com, almasrymina@google.com,
	amritha.nambiar@intel.com, bigeasy@linutronix.de,
	bjorn@rivosinc.com, corbet@lwn.net, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	lorenzo@kernel.org, michael.chan@broadcom.com,
	mkarsten@uwaterloo.ca, netdev@vger.kernel.org, pabeni@redhat.com,
	Saeed Mahameed, sdf@fomichev.me, skhawaja@google.com,
	sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea

On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
> 
> 
> On 20/12/2024 19:40, Joe Damato wrote:
> > On Wed, Dec 18, 2024 at 09:08:58AM -0800, Joe Damato wrote:
> >> On Wed, Dec 18, 2024 at 11:22:33AM +0000, Alex Lazar wrote:
> >>> Hi Joe and all,
> >>>
> >>> I am part of the NVIDIA Eth drivers team, and we are experiencing a problem,
> >>> sibesced to this change: commit 86e25f40aa1e ("net: napi: Add napi_config")
> >>>
> >>> The issue occurs when sending packets from one machine to another.
> >>> On the receiver side, we have XSK (XDPsock) that receives the packet and sends it
> >>> back to the sender.
> >>> At some point, one packet (packet A) gets "stuck," and if we send a new packet
> >>> (packet B), it "pushes" the previous one. Packet A is then processed by the NAPI
> >>> poll, and packet B gets stuck, and so on.
> >>>
> >>> Your change involves moving napi_hash_del() and napi_hash_add() from
> >>> netif_napi_del() and netif_napi_add_weight() to napi_enable() and napi_disable().
> >>> If I move them back to netif_napi_del() and netif_napi_add_weight(),
> >>> the issue is resolved (I moved the entire if/else block, not just the napi_hash_del/add).
> >>>
> >>> This issue occurs with both the new and old APIs (netif_napi_add/_config).
> >>> Moving the napi_hash_add() and napi_hash_del() functions resolves it for both.
> >>> I am debugging this, no breakthrough so far.
> >>>
> >>> I would appreciate if you could look into this.
> >>> We can provide more details per request.
> >>
> >> I appreciate your report, but there is not a lot in your message to
> >> help debug the issue.
> >>
> >> Can you please:
> >>
> >> 1.) Verify that the kernel tree you are testing on has commit
> >> cecc1555a8c2 ("net: Make napi_hash_lock irq safe") included ? If it
> >> does not, can you pull in that commit and re-run your test and
> >> report back if that fixes your problem?
> 
> I verified that the kernel tree includes commit cecc1555a8c2 ("net: Make 
> napi_hash_lock irq safe"), but the issue still occurs.

OK, thanks for verifying that.
 
> >>
> >> 2.) If (1) does not fix your problem, can you please reply with at
> >> least the following information:
> >>    - Specify what device this is happening on (in case I have access
> >>      to one)
> 
> We are using two ConnectX-5 cards connected back-to-back.
> 
> >>    - Which driver is affected
> 
> The affected driver is the MLX5 driver.
> 
> >>    - Which upstream kernel SHA you are building your test kernel from
> 
> The upstream kernel SHA we are building is 9163b05eca1d ("Merge branch 
> 'add-support-for-so_priority-cmsg'").

I have access to ConnectX-5 Ex EN MCX516A-CDAT NICs and can build a
kernel based on the upstream SHA you mentioned.

> >>    - The reproducer program(s) with clear instructions on how exactly
> >>      to run it/them in order to reproduce the issue
> 
> Test setup/configuration:
> On one side, we use a Python script with the scapy.all library to create 
> UDP packets of size 1024, using port 19017 and the MAC/IP of the other side.
> On the other side, we define an n-tuple filter (ethtool --config-ntuple 
> eth2 flow-type udp4 dst-port 19017 action 4) and run xdpsock (xdpsock -i 
> eth2 -N -q 4 --l2fwd -z -B).
> In the test, we send a single packet each time, which is received and 
> sent back to the sender.
> As part of the validation, we check the statistics on the other side and 
> notice a discrepancy between what xdpsock shows and what we see in the 
> driver (ethtool -S eth2 | grep "tx_xsk_xmit").

1. Can you please be much more specific when you say "notice a
   discrepancy" and substitute that for specific data? What,
   exactly, is a single run of the experiment that results in a lost
   or delayed packet? Do you have tcpdump output? What is the
   expected output and what is the actual output?

2. Can you provide the full source code? Both the reproducer and the
   script to check the values?

   You can feel free to create a github gist and link to it, then I
   can git clone it and run it.

   In the gist you can create a README.md that shows exactly what
   commands I need to run to reproduce, the expected output from the
   script, and the actual output.

   Since I have access to this hardware, I should be able to
   reproduce.

   Ideally, I'll git clone your gist, run something like "make test"
   (or similar) and reproduce the issue and get either "Test
   succeeded" or Test failed; expected %d packets but received %d".

3. If this issue is with napi_disable/napi_enable, can you think of
   a simpler reproducer? For example, wouldn't changing the queue
   count cause the same issue without needing to involve xdp at all?

   What about a simpler experiment with UDPing [1] and an ntuple
   filter? Does it reproduce with plain old NAPI processing or is it
   XDP specific?

> > 
> > I didn't hear back on the above, but wanted to let you know that
> > I'll be out of the office soon, so my responses/bandwidth for
> > helping to debug this will be limited over the next week or two.
> 
> Hi Joe,
> 
> Thanks for the quick response.
> Comments inline, If you need more details or further clarification, 
> please let me know.

As mentioned above and in my previous emails: please provide lot
more detail and make it as easy as possible for me to reproduce this
issue with the simplest reproducer possible and a much more detailed
explanation.

Please note: I will be out of the office until Jan 9 so my responses
will be limited until then.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2024-12-30 14:31       ` Joe Damato
@ 2025-01-10 18:16         ` Joe Damato
  2025-01-10 18:26           ` Stanislav Fomichev
  0 siblings, 1 reply; 11+ messages in thread
From: Joe Damato @ 2025-01-10 18:16 UTC (permalink / raw)
  To: Alex Lazar, aleksander.lobakin@intel.com, almasrymina@google.com,
	bigeasy@linutronix.de, bjorn@rivosinc.com, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, lorenzo@kernel.org,
	michael.chan@broadcom.com, mkarsten@uwaterloo.ca,
	netdev@vger.kernel.org, pabeni@redhat.com, Saeed Mahameed,
	sdf@fomichev.me, skhawaja@google.com, sridhar.samudrala@intel.com,
	Tariq Toukan, willemdebruijn.kernel@gmail.com,
	xuanzhuo@linux.alibaba.com, Gal Pressman, Nimrod Oren,
	Dror Tennenbaum, Dragos Tatulea

On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
> > 

[...]

> > 
> > Hi Joe,
> > 
> > Thanks for the quick response.
> > Comments inline, If you need more details or further clarification, 
> > please let me know.
> 
> As mentioned above and in my previous emails: please provide lot
> more detail and make it as easy as possible for me to reproduce this
> issue with the simplest reproducer possible and a much more detailed
> explanation.
> 
> Please note: I will be out of the office until Jan 9 so my responses
> will be limited until then.

Just to follow up on this for anyone who missed the other thread,
Stanislav proposed a patch which _might_ fix the issue being hit
here.

Please see [1], try that patch, and report back if that patch fixes
the issue.

Thanks.

[1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@fomichev.me/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2025-01-10 18:16         ` Joe Damato
@ 2025-01-10 18:26           ` Stanislav Fomichev
  2025-01-10 18:58             ` Martin Karsten
  0 siblings, 1 reply; 11+ messages in thread
From: Stanislav Fomichev @ 2025-01-10 18:26 UTC (permalink / raw)
  To: Joe Damato, Alex Lazar, aleksander.lobakin@intel.com,
	almasrymina@google.com, bigeasy@linutronix.de, bjorn@rivosinc.com,
	Dan Jurgens, davem@davemloft.net, donald.hunter@gmail.com,
	dsahern@kernel.org, edumazet@google.com, hawk@kernel.org,
	jiri@resnulli.us, johannes.berg@intel.com, kuba@kernel.org,
	leitao@debian.org, leon@kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, lorenzo@kernel.org,
	michael.chan@broadcom.com, mkarsten@uwaterloo.ca,
	netdev@vger.kernel.org, pabeni@redhat.com, Saeed Mahameed,
	sdf@fomichev.me, skhawaja@google.com, sridhar.samudrala@intel.com,
	Tariq Toukan, willemdebruijn.kernel@gmail.com,
	xuanzhuo@linux.alibaba.com, Gal Pressman, Nimrod Oren,
	Dror Tennenbaum, Dragos Tatulea

On 01/10, Joe Damato wrote:
> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
> > On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
> > > 
> 
> [...]
> 
> > > 
> > > Hi Joe,
> > > 
> > > Thanks for the quick response.
> > > Comments inline, If you need more details or further clarification, 
> > > please let me know.
> > 
> > As mentioned above and in my previous emails: please provide lot
> > more detail and make it as easy as possible for me to reproduce this
> > issue with the simplest reproducer possible and a much more detailed
> > explanation.
> > 
> > Please note: I will be out of the office until Jan 9 so my responses
> > will be limited until then.
> 
> Just to follow up on this for anyone who missed the other thread,
> Stanislav proposed a patch which _might_ fix the issue being hit
> here.
> 
> Please see [1], try that patch, and report back if that patch fixes
> the issue.
> 
> Thanks.
> 
> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@fomichev.me/

Note that it might help only if xsk is using busy-polling. Not sure
that's the case, it's relatively obscure feature :-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2025-01-10 18:26           ` Stanislav Fomichev
@ 2025-01-10 18:58             ` Martin Karsten
  2025-01-12  8:05               ` Alex Lazar
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Karsten @ 2025-01-10 18:58 UTC (permalink / raw)
  To: Stanislav Fomichev, Joe Damato, Alex Lazar,
	aleksander.lobakin@intel.com, almasrymina@google.com,
	bigeasy@linutronix.de, bjorn@rivosinc.com, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, lorenzo@kernel.org,
	michael.chan@broadcom.com, netdev@vger.kernel.org,
	pabeni@redhat.com, Saeed Mahameed, sdf@fomichev.me,
	skhawaja@google.com, sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea

On 2025-01-10 13:26, Stanislav Fomichev wrote:
> On 01/10, Joe Damato wrote:
>> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
>>> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>>>>
>>
>> [...]
>>
>>>>
>>>> Hi Joe,
>>>>
>>>> Thanks for the quick response.
>>>> Comments inline, If you need more details or further clarification,
>>>> please let me know.
>>>
>>> As mentioned above and in my previous emails: please provide lot
>>> more detail and make it as easy as possible for me to reproduce this
>>> issue with the simplest reproducer possible and a much more detailed
>>> explanation.
>>>
>>> Please note: I will be out of the office until Jan 9 so my responses
>>> will be limited until then.
>>
>> Just to follow up on this for anyone who missed the other thread,
>> Stanislav proposed a patch which _might_ fix the issue being hit
>> here.
>>
>> Please see [1], try that patch, and report back if that patch fixes
>> the issue.
>>
>> Thanks.
>>
>> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@fomichev.me/
> 
> Note that it might help only if xsk is using busy-polling. Not sure
> that's the case, it's relatively obscure feature :-)

I believe I have reproduced Alex' issue using the methodology below and 
your patch fixes it for me.

The experiment uses a server (tilly01) with mlx5 and a client (tilly02). 
In the problem case, the 'response' packet gets stuck, but the next 
'request' packets triggers both the stuck and the regular responses. The 
pattern can also be seen in the tcpdump output at the client. Note that 
the response packet is not a valid packet (only MAC addresses swapped, 
not IP addresses), but tcpdump shows it regardless.

Thanks,
Martin

# on server tilly01
watch -n 0.5 "sudo ethtool -S ens2f1np1 | fgrep tx_xsk_xmit"

# on client tilly02
sudo tcpdump -qbi eno3d1 udp

# on client tilly02
while true; do
   ssh tilly01 "sudo ifconfig ens2f1np1 down; sudo modprobe -r mlx5_ib;
     sleep 1; sudo modprobe mlx5_ib; sudo ifconfig ens2f1np1 up"
   ssh -f tilly01 "sudo ./bpf-examples/AF_XDP-example/xdpsock \
     -i ens2f1np1 -N -q 4 --l2fwd -z -B >/dev/null 2>&1"
   exp=1
   for ((i=0;i<5;i++)); do
     ssh tilly01 "sudo ethtool --config-ntuple ens2f1np1 flow-type udp4\
       dst-port 19017 action 4 >/dev/null 2>&1"
     for ((j=0;j<10;j++)); do
       echo -n "$exp "
       echo 'send(IP(dst="192.168.199.1",src="192.168.199.2")\
         /UDP(dport=19017))' | sudo ./scapy/run_scapy >/dev/null 2>&1
       cnt=$(ssh tilly01 ethtool -S ens2f1np1|grep -F tx_xsk_xmit\
         |cut -f2 -d:)
       [ $cnt -eq $exp ] || {
         echo COUNTER WRONG
         read x
       }
       ((exp+=1))
     done
     ssh tilly01 sudo ethtool --config-ntuple ens2f1np1 delete 1023
   done
   echo reset
   ssh tilly01 sudo killall xdpsock
done


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [net-next v6 0/9] Add support for per-NAPI config via netlink
  2025-01-10 18:58             ` Martin Karsten
@ 2025-01-12  8:05               ` Alex Lazar
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Lazar @ 2025-01-12  8:05 UTC (permalink / raw)
  To: Martin Karsten, Stanislav Fomichev, Joe Damato,
	aleksander.lobakin@intel.com, almasrymina@google.com,
	bigeasy@linutronix.de, bjorn@rivosinc.com, Dan Jurgens,
	davem@davemloft.net, donald.hunter@gmail.com, dsahern@kernel.org,
	edumazet@google.com, hawk@kernel.org, jiri@resnulli.us,
	johannes.berg@intel.com, kuba@kernel.org, leitao@debian.org,
	leon@kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, lorenzo@kernel.org,
	michael.chan@broadcom.com, netdev@vger.kernel.org,
	pabeni@redhat.com, Saeed Mahameed, sdf@fomichev.me,
	skhawaja@google.com, sridhar.samudrala@intel.com, Tariq Toukan,
	willemdebruijn.kernel@gmail.com, xuanzhuo@linux.alibaba.com,
	Gal Pressman, Nimrod Oren, Dror Tennenbaum, Dragos Tatulea



On 10/01/2025 20:58, Martin Karsten wrote:
> On 2025-01-10 13:26, Stanislav Fomichev wrote:
>> On 01/10, Joe Damato wrote:
>>> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
>>>> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>>>>>
>>>
>>> [...]
>>>
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> Thanks for the quick response.
>>>>> Comments inline, If you need more details or further clarification,
>>>>> please let me know.
>>>>
>>>> As mentioned above and in my previous emails: please provide lot
>>>> more detail and make it as easy as possible for me to reproduce this
>>>> issue with the simplest reproducer possible and a much more detailed
>>>> explanation.
>>>>
>>>> Please note: I will be out of the office until Jan 9 so my responses
>>>> will be limited until then.
>>>
>>> Just to follow up on this for anyone who missed the other thread,
>>> Stanislav proposed a patch which _might_ fix the issue being hit
>>> here.
>>>
>>> Please see [1], try that patch, and report back if that patch fixes
>>> the issue.
>>>
>>> Thanks.
>>>
>>> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1- 
>>> sdf@fomichev.me/
>>
>> Note that it might help only if xsk is using busy-polling. Not sure
>> that's the case, it's relatively obscure feature :-)
> 
> I believe I have reproduced Alex' issue using the methodology below and 
> your patch fixes it for me.
> 
> The experiment uses a server (tilly01) with mlx5 and a client (tilly02). 
> In the problem case, the 'response' packet gets stuck, but the next 
> 'request' packets triggers both the stuck and the regular responses. The 
> pattern can also be seen in the tcpdump output at the client. Note that 
> the response packet is not a valid packet (only MAC addresses swapped, 
> not IP addresses), but tcpdump shows it regardless.
> 
> Thanks,
> Martin
> 
> # on server tilly01
> watch -n 0.5 "sudo ethtool -S ens2f1np1 | fgrep tx_xsk_xmit"
> 
> # on client tilly02
> sudo tcpdump -qbi eno3d1 udp
> 
> # on client tilly02
> while true; do
>    ssh tilly01 "sudo ifconfig ens2f1np1 down; sudo modprobe -r mlx5_ib;
>      sleep 1; sudo modprobe mlx5_ib; sudo ifconfig ens2f1np1 up"
>    ssh -f tilly01 "sudo ./bpf-examples/AF_XDP-example/xdpsock \
>      -i ens2f1np1 -N -q 4 --l2fwd -z -B >/dev/null 2>&1"
>    exp=1
>    for ((i=0;i<5;i++)); do
>      ssh tilly01 "sudo ethtool --config-ntuple ens2f1np1 flow-type udp4\
>        dst-port 19017 action 4 >/dev/null 2>&1"
>      for ((j=0;j<10;j++)); do
>        echo -n "$exp "
>        echo 'send(IP(dst="192.168.199.1",src="192.168.199.2")\
>          /UDP(dport=19017))' | sudo ./scapy/run_scapy >/dev/null 2>&1
>        cnt=$(ssh tilly01 ethtool -S ens2f1np1|grep -F tx_xsk_xmit\
>          |cut -f2 -d:)
>        [ $cnt -eq $exp ] || {
>          echo COUNTER WRONG
>          read x
>        }
>        ((exp+=1))
>      done
>      ssh tilly01 sudo ethtool --config-ntuple ens2f1np1 delete 1023
>    done
>    echo reset
>    ssh tilly01 sudo killall xdpsock
> done
> 

Thanks to Joe Martin and Stanislav for introducing this fix and for your 
efforts in solving this issue. I reviewed it over the weekend and 
verified that it solves the problem.

Thanks,
Alex Lazar


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-01-12  8:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18 11:22 [net-next v6 0/9] Add support for per-NAPI config via netlink Alex Lazar
2024-12-18 17:08 ` Joe Damato
2024-12-20 17:40   ` Joe Damato
2024-12-23  8:17     ` Alex Lazar
2024-12-30 14:31       ` Joe Damato
2025-01-10 18:16         ` Joe Damato
2025-01-10 18:26           ` Stanislav Fomichev
2025-01-10 18:58             ` Martin Karsten
2025-01-12  8:05               ` Alex Lazar
  -- strict thread matches above, loose matches on Subject: below --
2024-10-11 18:44 Joe Damato
2024-10-15  1:10 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).