[PATCH net 0/5] mlxsw: Fixes

* [PATCH net 0/5] mlxsw: Fixes
@ 2024-10-25 14:26 Petr Machata
  2024-10-25 14:26 ` [PATCH net 1/5] mlxsw: spectrum_ptp: Add missing verification before pushing Tx header Petr Machata
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Petr Machata @ 2024-10-25 14:26 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Danielle Ratson, Petr Machata,
	Ido Schimmel, Amit Cohen, mlxsw

In this patchset:

- Tx header should be pushed for each packet which is transmitted via
  Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make
  sure that there is both enough room to push the Tx header and that the
  SKB header is not cloned and can be modified.

- Commit b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers
  allocation") converted mlxsw to use page pool for Rx buffers allocation.
  Sync for CPU and for device should be done for Rx pages. In patches #2
  and #3, add the missing calls to sync pages for, respectively, CPU and
  the device.

- Patch #4 then fixes a bug in IPv6 GRE forwarding offload. Patch #5 adds
  a generic forwarding test that fails with mlxsw ports prior to the fix.

Amit Cohen (3):
  mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
  mlxsw: pci: Sync Rx buffers for CPU
  mlxsw: pci: Sync Rx buffers for device

Ido Schimmel (2):
  mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6
    address
  selftests: forwarding: Add IPv6 GRE remote change tests

 drivers/net/ethernet/mellanox/mlxsw/pci.c     | 25 ++++--
 .../ethernet/mellanox/mlxsw/spectrum_ipip.c   | 26 +++++-
 .../ethernet/mellanox/mlxsw/spectrum_ptp.c    |  7 ++
 .../selftests/net/forwarding/ip6gre_flat.sh   | 14 ++++
 .../net/forwarding/ip6gre_flat_key.sh         | 14 ++++
 .../net/forwarding/ip6gre_flat_keys.sh        | 14 ++++
 .../selftests/net/forwarding/ip6gre_hier.sh   | 14 ++++
 .../net/forwarding/ip6gre_hier_key.sh         | 14 ++++
 .../net/forwarding/ip6gre_hier_keys.sh        | 14 ++++
 .../selftests/net/forwarding/ip6gre_lib.sh    | 80 +++++++++++++++++++
 10 files changed, 212 insertions(+), 10 deletions(-)

-- 
2.45.0



* [PATCH net 1/5] mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
  2024-10-25 14:26 [PATCH net 0/5] mlxsw: Fixes Petr Machata
@ 2024-10-25 14:26 ` Petr Machata
  2024-10-25 14:26 ` [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU Petr Machata
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2024-10-25 14:26 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Danielle Ratson, Petr Machata,
	Ido Schimmel, Amit Cohen, mlxsw, Richard Cochran

From: Amit Cohen <amcohen@nvidia.com>

Tx header should be pushed for each packet which is transmitted via
Spectrum ASICs. The cited commit moved the call to skb_cow_head() from
mlxsw_sp_port_xmit() to functions which handle Tx header.

When mlxsw_sp->ptp_ops->txhdr_construct() is used to handle the Tx
header and txhdr_construct() is mlxsw_sp_ptp_txhdr_construct(),
skb_cow_head() is not called before the Tx header is pushed to the SKB.
This flow is relevant for Spectrum-1 and Spectrum-4, for PTP packets.

Add the missing call to skb_cow_head() to make sure that there is both
enough room to push the Tx header and that the SKB header is not cloned and
can be modified.
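
The two conditions above (enough headroom, header not shared) can be
illustrated with a small userspace sketch. This is a toy model, not
kernel code: struct toy_buf, toy_cow_head() and toy_push() are
hypothetical names made up for illustration and only approximate what
skb_cow_head() followed by a header push guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an sk_buff; NOT the kernel structure. */
struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	size_t headroom;	/* free bytes in front of the data */
	size_t len;		/* bytes of packet data */
	bool cloned;		/* header is shared with another user */
};

/* Hypothetical analogue of skb_cow_head(): returns 0 or -1 on failure. */
static int toy_cow_head(struct toy_buf *b, size_t needed)
{
	size_t new_headroom;
	unsigned char *copy;

	/* Private header with enough room: nothing to do. */
	if (!b->cloned && b->headroom >= needed)
		return 0;

	new_headroom = b->headroom > needed ? b->headroom : needed;
	copy = malloc(new_headroom + b->len);
	if (!copy)
		return -1;

	memcpy(copy + new_headroom, b->head + b->headroom, b->len);
	/* A shared buffer still belongs to the other user; only free a private one. */
	if (!b->cloned)
		free(b->head);
	b->head = copy;
	b->headroom = new_headroom;
	b->cloned = false;
	return 0;
}

/* Push hdr_len bytes in front of the data; only valid after toy_cow_head(). */
static unsigned char *toy_push(struct toy_buf *b, size_t hdr_len)
{
	assert(!b->cloned && b->headroom >= hdr_len);
	b->headroom -= hdr_len;
	b->len += hdr_len;
	return b->head + b->headroom;
}
```

Calling toy_push() without toy_cow_head() first trips the assertion in
the toy; in the driver the same omission is what this patch closes.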

An additional set will be sent to net-next to centralize the handling of
the Tx header by pushing it to every packet just before transmission.

Cc: Richard Cochran <richardcochran@gmail.com>
Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c
index 5b174cb95eb8..d94081c7658e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c
@@ -16,6 +16,7 @@
 #include "spectrum.h"
 #include "spectrum_ptp.h"
 #include "core.h"
+#include "txheader.h"
 
 #define MLXSW_SP1_PTP_CLOCK_CYCLES_SHIFT	29
 #define MLXSW_SP1_PTP_CLOCK_FREQ_KHZ		156257 /* 6.4nSec */
@@ -1684,6 +1685,12 @@ int mlxsw_sp_ptp_txhdr_construct(struct mlxsw_core *mlxsw_core,
 				 struct sk_buff *skb,
 				 const struct mlxsw_tx_info *tx_info)
 {
+	if (skb_cow_head(skb, MLXSW_TXHDR_LEN)) {
+		this_cpu_inc(mlxsw_sp_port->pcpu_stats->tx_dropped);
+		dev_kfree_skb_any(skb);
+		return -ENOMEM;
+	}
+
 	mlxsw_sp_txhdr_construct(skb, tx_info);
 	return 0;
 }
-- 
2.45.0



* [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU
  2024-10-25 14:26 [PATCH net 0/5] mlxsw: Fixes Petr Machata
  2024-10-25 14:26 ` [PATCH net 1/5] mlxsw: spectrum_ptp: Add missing verification before pushing Tx header Petr Machata
@ 2024-10-25 14:26 ` Petr Machata
  2024-10-25 15:00   ` Alexander Lobakin
  2024-10-25 14:26 ` [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device Petr Machata
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Petr Machata @ 2024-10-25 14:26 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Danielle Ratson, Petr Machata,
	Ido Schimmel, Amit Cohen, mlxsw, Jiri Pirko

From: Amit Cohen <amcohen@nvidia.com>

When an Rx packet is received, drivers should sync the pages for CPU, to
ensure the CPU reads the data written by the device and not stale
data from its cache.

Add the missing sync call in the Rx path, syncing the actual length of
data for each fragment.
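
As a worked example of "the actual length of data for each fragment",
the sketch below computes the per-page sync lengths for a given
byte_count. It is a userspace model under stated assumptions: a 4096
byte page, a made-up 512 byte value standing in for
MLXSW_PCI_RX_BUF_SW_OVERHEAD, and the assumption that whatever does not
fit in the linear part is spread over page-sized fragments. All toy_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size */
#define TOY_SW_OVERHEAD	512u	/* hypothetical stand-in for MLXSW_PCI_RX_BUF_SW_OVERHEAD */
#define TOY_MAX_PAGES	8

/*
 * Fill sync_len[] with the number of bytes to sync for each page of one
 * packet. Returns the number of pages used, or -1 if the packet does
 * not fit in TOY_MAX_PAGES pages.
 */
static int toy_rx_sync_lens(unsigned int byte_count,
			    unsigned int sync_len[TOY_MAX_PAGES])
{
	unsigned int linear;
	int n = 0;

	/* Same split as in the patch: first page holds the linear part. */
	linear = byte_count + TOY_SW_OVERHEAD <= TOY_PAGE_SIZE ?
		 byte_count : TOY_PAGE_SIZE - TOY_SW_OVERHEAD;
	sync_len[n++] = linear;
	byte_count -= linear;

	/* Remaining bytes go to fragments of at most one page each. */
	while (byte_count > 0) {
		unsigned int frag = byte_count < TOY_PAGE_SIZE ?
				    byte_count : TOY_PAGE_SIZE;

		if (n == TOY_MAX_PAGES)
			return -1;
		sync_len[n++] = frag;
		byte_count -= frag;
	}

	return n;
}
```

With these assumed constants, a 1000 byte packet needs one sync of 1000
bytes, while a 9000 byte packet needs three syncs of 3584, 4096 and
1320 bytes.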

Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 060e5b939211..2320a5f323b4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -389,15 +389,27 @@ static void mlxsw_pci_wqe_frag_unmap(struct mlxsw_pci *mlxsw_pci, char *wqe,
 	dma_unmap_single(&pdev->dev, mapaddr, frag_len, direction);
 }
 
-static struct sk_buff *mlxsw_pci_rdq_build_skb(struct page *pages[],
+static struct sk_buff *mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
+					       struct page *pages[],
 					       u16 byte_count)
 {
+	struct mlxsw_pci_queue *cq = q->u.rdq.cq;
 	unsigned int linear_data_size;
+	struct page_pool *page_pool;
 	struct sk_buff *skb;
 	int page_index = 0;
 	bool linear_only;
 	void *data;
 
+	linear_only = byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD <= PAGE_SIZE;
+	linear_data_size = linear_only ? byte_count :
+					 PAGE_SIZE -
+					 MLXSW_PCI_RX_BUF_SW_OVERHEAD;
+
+	page_pool = cq->u.cq.page_pool;
+	page_pool_dma_sync_for_cpu(page_pool, pages[page_index],
+				   MLXSW_PCI_SKB_HEADROOM, linear_data_size);
+
 	data = page_address(pages[page_index]);
 	net_prefetch(data);
 
@@ -405,11 +417,6 @@ static struct sk_buff *mlxsw_pci_rdq_build_skb(struct page *pages[],
 	if (unlikely(!skb))
 		return ERR_PTR(-ENOMEM);
 
-	linear_only = byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD <= PAGE_SIZE;
-	linear_data_size = linear_only ? byte_count :
-					 PAGE_SIZE -
-					 MLXSW_PCI_RX_BUF_SW_OVERHEAD;
-
 	skb_reserve(skb, MLXSW_PCI_SKB_HEADROOM);
 	skb_put(skb, linear_data_size);
 
@@ -425,6 +432,7 @@ static struct sk_buff *mlxsw_pci_rdq_build_skb(struct page *pages[],
 
 		page = pages[page_index];
 		frag_size = min(byte_count, PAGE_SIZE);
+		page_pool_dma_sync_for_cpu(page_pool, page, 0, frag_size);
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
 				page, 0, frag_size, PAGE_SIZE);
 		byte_count -= frag_size;
@@ -760,7 +768,7 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
 	if (err)
 		goto out;
 
-	skb = mlxsw_pci_rdq_build_skb(pages, byte_count);
+	skb = mlxsw_pci_rdq_build_skb(q, pages, byte_count);
 	if (IS_ERR(skb)) {
 		dev_err_ratelimited(&pdev->dev, "Failed to build skb for RDQ\n");
 		mlxsw_pci_rdq_pages_recycle(q, pages, num_sg_entries);
-- 
2.45.0



* [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device
  2024-10-25 14:26 [PATCH net 0/5] mlxsw: Fixes Petr Machata
  2024-10-25 14:26 ` [PATCH net 1/5] mlxsw: spectrum_ptp: Add missing verification before pushing Tx header Petr Machata
  2024-10-25 14:26 ` [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU Petr Machata
@ 2024-10-25 14:26 ` Petr Machata
  2024-10-25 15:02   ` Alexander Lobakin
  2024-10-25 14:26 ` [PATCH net 4/5] mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address Petr Machata
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Petr Machata @ 2024-10-25 14:26 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Danielle Ratson, Petr Machata,
	Ido Schimmel, Amit Cohen, mlxsw, Jiri Pirko

From: Amit Cohen <amcohen@nvidia.com>

Non-coherent architectures, like ARM, may require invalidating caches
before the device can use the DMA mapped memory, which means that before
posting pages to device, drivers should sync the memory for device.

Sync for device can be configured as page pool responsibility. Set the
relevant flag and define max_len for sync.

Cc: Jiri Pirko <jiri@resnulli.us>
Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 2320a5f323b4..d6f37456fb31 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -996,12 +996,13 @@ static int mlxsw_pci_cq_page_pool_init(struct mlxsw_pci_queue *q,
 	if (cq_type != MLXSW_PCI_CQ_RDQ)
 		return 0;
 
-	pp_params.flags = PP_FLAG_DMA_MAP;
+	pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 	pp_params.pool_size = MLXSW_PCI_WQE_COUNT * mlxsw_pci->num_sg_entries;
 	pp_params.nid = dev_to_node(&mlxsw_pci->pdev->dev);
 	pp_params.dev = &mlxsw_pci->pdev->dev;
 	pp_params.napi = &q->u.cq.napi;
 	pp_params.dma_dir = DMA_FROM_DEVICE;
+	pp_params.max_len = PAGE_SIZE;
 
 	page_pool = page_pool_create(&pp_params);
 	if (IS_ERR(page_pool))
-- 
2.45.0



* [PATCH net 4/5] mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
  2024-10-25 14:26 [PATCH net 0/5] mlxsw: Fixes Petr Machata
                   ` (2 preceding siblings ...)
  2024-10-25 14:26 ` [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device Petr Machata
@ 2024-10-25 14:26 ` Petr Machata
  2024-10-25 14:26 ` [PATCH net 5/5] selftests: forwarding: Add IPv6 GRE remote change tests Petr Machata
  2024-10-31  1:30 ` [PATCH net 0/5] mlxsw: Fixes patchwork-bot+netdevbpf
  5 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2024-10-25 14:26 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Danielle Ratson, Petr Machata,
	Ido Schimmel, Amit Cohen, mlxsw, Maksym Yaremchuk

From: Ido Schimmel <idosch@nvidia.com>

The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.

Changing the remote address of an ip6gre net device never worked
properly, but since the cited commit the following reproducer [1]
results in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.

Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.
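
The get-new-then-put-old ordering can be modeled with a small
reference-counted table. This is a userspace sketch only: the table,
its size and every toy_* function are hypothetical and merely mimic the
roles of mlxsw_sp_ipv6_addr_kvdl_index_get() and
mlxsw_sp_ipv6_addr_put() as described above.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ADDRS 4

struct toy_addr_entry {
	char addr[40];		/* textual IPv6 address */
	unsigned int refcount;	/* 0 means the slot is free */
};

static struct toy_addr_entry toy_table[TOY_MAX_ADDRS];

static struct toy_addr_entry *toy_lookup(const char *addr)
{
	for (int i = 0; i < TOY_MAX_ADDRS; i++)
		if (toy_table[i].refcount && !strcmp(toy_table[i].addr, addr))
			return &toy_table[i];
	return NULL;
}

static unsigned int toy_refcount(const char *addr)
{
	struct toy_addr_entry *e = toy_lookup(addr);

	return e ? e->refcount : 0;
}

/* Take a reference, inserting the address if needed. Returns 0 or -1. */
static int toy_addr_get(const char *addr)
{
	struct toy_addr_entry *e = toy_lookup(addr);

	if (e) {
		e->refcount++;
		return 0;
	}
	for (int i = 0; i < TOY_MAX_ADDRS; i++) {
		if (toy_table[i].refcount)
			continue;
		strncpy(toy_table[i].addr, addr, sizeof(toy_table[i].addr) - 1);
		toy_table[i].addr[sizeof(toy_table[i].addr) - 1] = '\0';
		toy_table[i].refcount = 1;
		return 0;
	}
	return -1;
}

/* Drop a reference; the slot becomes free when the count reaches zero. */
static void toy_addr_put(const char *addr)
{
	struct toy_addr_entry *e = toy_lookup(addr);

	assert(e);
	e->refcount--;
}

/* Same ordering as the fix: take the new address first, then drop the old. */
static int toy_change_remote(const char *old_addr, const char *new_addr)
{
	if (toy_addr_get(new_addr))
		return -1;
	toy_addr_put(old_addr);
	return 0;
}
```

When old_addr and new_addr are equal, the count goes up by one and back
down, so an unrelated configuration change leaves the entry untouched.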

[1]
 # ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
 # ip link set dev bla type ip6gre remote 2001:db8:3::1
 # ip link del dev bla
 # devlink dev reload pci/0000:01:00.0

[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_router_netdevice_event+0x55f/0x1240
 notifier_call_chain+0x5a/0xd0
 call_netdevice_notifiers_info+0x39/0x90
 unregister_netdevice_many_notify+0x63e/0x9d0
 rtnl_dellink+0x16b/0x3a0
 rtnetlink_rcv_msg+0x142/0x3f0
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x242/0x390
 netlink_sendmsg+0x1de/0x420
 ____sys_sendmsg+0x2bd/0x320
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
unreferenced object 0xffff898081f597a0 (size 32):
  comm "ip", pid 1626, jiffies 4294719324
  hex dump (first 32 bytes):
    20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01   ...............
    21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00  !Ia.............
  backtrace (crc fd9be911):
    [<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
    [<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
    [<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
    [<00000000743e7757>] notifier_call_chain+0x5a/0xd0
    [<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
    [<000000002509645d>] register_netdevice+0x5f7/0x7a0
    [<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
    [<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
    [<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
    [<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
    [<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
    [<00000000908bca63>] netlink_unicast+0x242/0x390
    [<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
    [<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
    [<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
    [<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0

Fixes: cf42911523e0 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_ipip.c   | 26 +++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
index d761a1235994..7ea798a4949e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
@@ -481,11 +481,33 @@ mlxsw_sp_ipip_ol_netdev_change_gre6(struct mlxsw_sp *mlxsw_sp,
 				    struct mlxsw_sp_ipip_entry *ipip_entry,
 				    struct netlink_ext_ack *extack)
 {
+	u32 new_kvdl_index, old_kvdl_index = ipip_entry->dip_kvdl_index;
+	struct in6_addr old_addr6 = ipip_entry->parms.daddr.addr6;
 	struct mlxsw_sp_ipip_parms new_parms;
+	int err;
 
 	new_parms = mlxsw_sp_ipip_netdev_parms_init_gre6(ipip_entry->ol_dev);
-	return mlxsw_sp_ipip_ol_netdev_change_gre(mlxsw_sp, ipip_entry,
-						  &new_parms, extack);
+
+	err = mlxsw_sp_ipv6_addr_kvdl_index_get(mlxsw_sp,
+						&new_parms.daddr.addr6,
+						&new_kvdl_index);
+	if (err)
+		return err;
+	ipip_entry->dip_kvdl_index = new_kvdl_index;
+
+	err = mlxsw_sp_ipip_ol_netdev_change_gre(mlxsw_sp, ipip_entry,
+						 &new_parms, extack);
+	if (err)
+		goto err_change_gre;
+
+	mlxsw_sp_ipv6_addr_put(mlxsw_sp, &old_addr6);
+
+	return 0;
+
+err_change_gre:
+	ipip_entry->dip_kvdl_index = old_kvdl_index;
+	mlxsw_sp_ipv6_addr_put(mlxsw_sp, &new_parms.daddr.addr6);
+	return err;
 }
 
 static int
-- 
2.45.0



* [PATCH net 5/5] selftests: forwarding: Add IPv6 GRE remote change tests
  2024-10-25 14:26 [PATCH net 0/5] mlxsw: Fixes Petr Machata
                   ` (3 preceding siblings ...)
  2024-10-25 14:26 ` [PATCH net 4/5] mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address Petr Machata
@ 2024-10-25 14:26 ` Petr Machata
  2024-10-31  1:30 ` [PATCH net 0/5] mlxsw: Fixes patchwork-bot+netdevbpf
  5 siblings, 0 replies; 12+ messages in thread
From: Petr Machata @ 2024-10-25 14:26 UTC (permalink / raw)
  To: netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Danielle Ratson, Petr Machata,
	Ido Schimmel, Amit Cohen, mlxsw

From: Ido Schimmel <idosch@nvidia.com>

Test that after changing the remote address of an ip6gre net device,
traffic is forwarded as expected. Test with both flat and hierarchical
topologies and with and without input / output keys.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../selftests/net/forwarding/ip6gre_flat.sh   | 14 ++++
 .../net/forwarding/ip6gre_flat_key.sh         | 14 ++++
 .../net/forwarding/ip6gre_flat_keys.sh        | 14 ++++
 .../selftests/net/forwarding/ip6gre_hier.sh   | 14 ++++
 .../net/forwarding/ip6gre_hier_key.sh         | 14 ++++
 .../net/forwarding/ip6gre_hier_keys.sh        | 14 ++++
 .../selftests/net/forwarding/ip6gre_lib.sh    | 80 +++++++++++++++++++
 7 files changed, 164 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/ip6gre_flat.sh b/tools/testing/selftests/net/forwarding/ip6gre_flat.sh
index 96c97064f2d3..becc7c3fc809 100755
--- a/tools/testing/selftests/net/forwarding/ip6gre_flat.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_flat.sh
@@ -8,6 +8,7 @@
 ALL_TESTS="
 	gre_flat
 	gre_mtu_change
+	gre_flat_remote_change
 "
 
 NUM_NETIFS=6
@@ -44,6 +45,19 @@ gre_mtu_change()
 	test_mtu_change
 }
 
+gre_flat_remote_change()
+{
+	flat_remote_change
+
+	test_traffic_ip4ip6 "GRE flat IPv4-in-IPv6 (new remote)"
+	test_traffic_ip6ip6 "GRE flat IPv6-in-IPv6 (new remote)"
+
+	flat_remote_restore
+
+	test_traffic_ip4ip6 "GRE flat IPv4-in-IPv6 (old remote)"
+	test_traffic_ip6ip6 "GRE flat IPv6-in-IPv6 (old remote)"
+}
+
 cleanup()
 {
 	pre_cleanup
diff --git a/tools/testing/selftests/net/forwarding/ip6gre_flat_key.sh b/tools/testing/selftests/net/forwarding/ip6gre_flat_key.sh
index ff9fb0db9bd1..e5335116a2fd 100755
--- a/tools/testing/selftests/net/forwarding/ip6gre_flat_key.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_flat_key.sh
@@ -8,6 +8,7 @@
 ALL_TESTS="
 	gre_flat
 	gre_mtu_change
+	gre_flat_remote_change
 "
 
 NUM_NETIFS=6
@@ -44,6 +45,19 @@ gre_mtu_change()
 	test_mtu_change
 }
 
+gre_flat_remote_change()
+{
+	flat_remote_change
+
+	test_traffic_ip4ip6 "GRE flat IPv4-in-IPv6 with key (new remote)"
+	test_traffic_ip6ip6 "GRE flat IPv6-in-IPv6 with key (new remote)"
+
+	flat_remote_restore
+
+	test_traffic_ip4ip6 "GRE flat IPv4-in-IPv6 with key (old remote)"
+	test_traffic_ip6ip6 "GRE flat IPv6-in-IPv6 with key (old remote)"
+}
+
 cleanup()
 {
 	pre_cleanup
diff --git a/tools/testing/selftests/net/forwarding/ip6gre_flat_keys.sh b/tools/testing/selftests/net/forwarding/ip6gre_flat_keys.sh
index 12c138785242..7e0cbfdefab0 100755
--- a/tools/testing/selftests/net/forwarding/ip6gre_flat_keys.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_flat_keys.sh
@@ -8,6 +8,7 @@
 ALL_TESTS="
 	gre_flat
 	gre_mtu_change
+	gre_flat_remote_change
 "
 
 NUM_NETIFS=6
@@ -44,6 +45,19 @@ gre_mtu_change()
 	test_mtu_change	gre
 }
 
+gre_flat_remote_change()
+{
+	flat_remote_change
+
+	test_traffic_ip4ip6 "GRE flat IPv4-in-IPv6 with ikey/okey (new remote)"
+	test_traffic_ip6ip6 "GRE flat IPv6-in-IPv6 with ikey/okey (new remote)"
+
+	flat_remote_restore
+
+	test_traffic_ip4ip6 "GRE flat IPv4-in-IPv6 with ikey/okey (old remote)"
+	test_traffic_ip6ip6 "GRE flat IPv6-in-IPv6 with ikey/okey (old remote)"
+}
+
 cleanup()
 {
 	pre_cleanup
diff --git a/tools/testing/selftests/net/forwarding/ip6gre_hier.sh b/tools/testing/selftests/net/forwarding/ip6gre_hier.sh
index 83b55c30a5c3..e0844495f3d1 100755
--- a/tools/testing/selftests/net/forwarding/ip6gre_hier.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_hier.sh
@@ -8,6 +8,7 @@
 ALL_TESTS="
 	gre_hier
 	gre_mtu_change
+	gre_hier_remote_change
 "
 
 NUM_NETIFS=6
@@ -44,6 +45,19 @@ gre_mtu_change()
 	test_mtu_change gre
 }
 
+gre_hier_remote_change()
+{
+	hier_remote_change
+
+	test_traffic_ip4ip6 "GRE hierarchical IPv4-in-IPv6 (new remote)"
+	test_traffic_ip6ip6 "GRE hierarchical IPv6-in-IPv6 (new remote)"
+
+	hier_remote_restore
+
+	test_traffic_ip4ip6 "GRE hierarchical IPv4-in-IPv6 (old remote)"
+	test_traffic_ip6ip6 "GRE hierarchical IPv6-in-IPv6 (old remote)"
+}
+
 cleanup()
 {
 	pre_cleanup
diff --git a/tools/testing/selftests/net/forwarding/ip6gre_hier_key.sh b/tools/testing/selftests/net/forwarding/ip6gre_hier_key.sh
index 256607916d92..741bc9c928eb 100755
--- a/tools/testing/selftests/net/forwarding/ip6gre_hier_key.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_hier_key.sh
@@ -8,6 +8,7 @@
 ALL_TESTS="
 	gre_hier
 	gre_mtu_change
+	gre_hier_remote_change
 "
 
 NUM_NETIFS=6
@@ -44,6 +45,19 @@ gre_mtu_change()
 	test_mtu_change gre
 }
 
+gre_hier_remote_change()
+{
+	hier_remote_change
+
+	test_traffic_ip4ip6 "GRE hierarchical IPv4-in-IPv6 with key (new remote)"
+	test_traffic_ip6ip6 "GRE hierarchical IPv6-in-IPv6 with key (new remote)"
+
+	hier_remote_restore
+
+	test_traffic_ip4ip6 "GRE hierarchical IPv4-in-IPv6 with key (old remote)"
+	test_traffic_ip6ip6 "GRE hierarchical IPv6-in-IPv6 with key (old remote)"
+}
+
 cleanup()
 {
 	pre_cleanup
diff --git a/tools/testing/selftests/net/forwarding/ip6gre_hier_keys.sh b/tools/testing/selftests/net/forwarding/ip6gre_hier_keys.sh
index ad1bcd6334a8..ad9eab4b1367 100755
--- a/tools/testing/selftests/net/forwarding/ip6gre_hier_keys.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_hier_keys.sh
@@ -8,6 +8,7 @@
 ALL_TESTS="
 	gre_hier
 	gre_mtu_change
+	gre_hier_remote_change
 "
 
 NUM_NETIFS=6
@@ -44,6 +45,19 @@ gre_mtu_change()
 	test_mtu_change gre
 }
 
+gre_hier_remote_change()
+{
+	hier_remote_change
+
+	test_traffic_ip4ip6 "GRE hierarchical IPv4-in-IPv6 with ikey/okey (new remote)"
+	test_traffic_ip6ip6 "GRE hierarchical IPv6-in-IPv6 with ikey/okey (new remote)"
+
+	hier_remote_restore
+
+	test_traffic_ip4ip6 "GRE hierarchical IPv4-in-IPv6 with ikey/okey (old remote)"
+	test_traffic_ip6ip6 "GRE hierarchical IPv6-in-IPv6 with ikey/okey (old remote)"
+}
+
 cleanup()
 {
 	pre_cleanup
diff --git a/tools/testing/selftests/net/forwarding/ip6gre_lib.sh b/tools/testing/selftests/net/forwarding/ip6gre_lib.sh
index 24f4ab328bd2..2d91281dc5b7 100644
--- a/tools/testing/selftests/net/forwarding/ip6gre_lib.sh
+++ b/tools/testing/selftests/net/forwarding/ip6gre_lib.sh
@@ -436,3 +436,83 @@ test_mtu_change()
 	check_err $?
 	log_test "ping GRE IPv6, packet size 1800 after MTU change"
 }
+
+topo_flat_remote_change()
+{
+	local old1=$1; shift
+	local new1=$1; shift
+	local old2=$1; shift
+	local new2=$1; shift
+
+	ip link set dev g1a type ip6gre local $new1 remote $new2
+        __addr_add_del g1a add "$new1/128"
+        __addr_add_del g1a del "$old1/128"
+	ip -6 route add $new2/128 via 2001:db8:10::2
+	ip -6 route del $old2/128
+
+	ip link set dev g2a type ip6gre local $new2 remote $new1
+        __addr_add_del g2a add "$new2/128"
+        __addr_add_del g2a del "$old2/128"
+	ip -6 route add vrf v$ol2 $new1/128 via 2001:db8:10::1
+	ip -6 route del vrf v$ol2 $old1/128
+}
+
+flat_remote_change()
+{
+	local old1=2001:db8:3::1
+	local new1=2001:db8:3::10
+	local old2=2001:db8:3::2
+	local new2=2001:db8:3::20
+
+	topo_flat_remote_change $old1 $new1 $old2 $new2
+}
+
+flat_remote_restore()
+{
+	local old1=2001:db8:3::10
+	local new1=2001:db8:3::1
+	local old2=2001:db8:3::20
+	local new2=2001:db8:3::2
+
+	topo_flat_remote_change $old1 $new1 $old2 $new2
+}
+
+topo_hier_remote_change()
+{
+	local old1=$1; shift
+	local new1=$1; shift
+	local old2=$1; shift
+	local new2=$1; shift
+
+        __addr_add_del dummy1 del "$old1/64"
+        __addr_add_del dummy1 add "$new1/64"
+	ip link set dev g1a type ip6gre local $new1 remote $new2
+	ip -6 route add vrf v$ul1 $new2/128 via 2001:db8:10::2
+	ip -6 route del vrf v$ul1 $old2/128
+
+        __addr_add_del dummy2 del "$old2/64"
+        __addr_add_del dummy2 add "$new2/64"
+	ip link set dev g2a type ip6gre local $new2 remote $new1
+	ip -6 route add vrf v$ul2 $new1/128 via 2001:db8:10::1
+	ip -6 route del vrf v$ul2 $old1/128
+}
+
+hier_remote_change()
+{
+	local old1=2001:db8:3::1
+	local new1=2001:db8:3::10
+	local old2=2001:db8:3::2
+	local new2=2001:db8:3::20
+
+	topo_hier_remote_change $old1 $new1 $old2 $new2
+}
+
+hier_remote_restore()
+{
+	local old1=2001:db8:3::10
+	local new1=2001:db8:3::1
+	local old2=2001:db8:3::20
+	local new2=2001:db8:3::2
+
+	topo_hier_remote_change $old1 $new1 $old2 $new2
+}
-- 
2.45.0



* Re: [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU
  2024-10-25 14:26 ` [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU Petr Machata
@ 2024-10-25 15:00   ` Alexander Lobakin
  2024-10-27  7:29     ` Amit Cohen
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Lobakin @ 2024-10-25 15:00 UTC (permalink / raw)
  To: Petr Machata, Amit Cohen
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Danielle Ratson,
	Ido Schimmel, mlxsw, Jiri Pirko

From: Petr Machata <petrm@nvidia.com>
Date: Fri, 25 Oct 2024 16:26:26 +0200

> From: Amit Cohen <amcohen@nvidia.com>
> 
> When an Rx packet is received, drivers should sync the pages for CPU, to
> ensure the CPU reads the data written by the device and not stale
> data from its cache.

[...]

> -static struct sk_buff *mlxsw_pci_rdq_build_skb(struct page *pages[],
> +static struct sk_buff *mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
> +					       struct page *pages[],
>  					       u16 byte_count)
>  {
> +	struct mlxsw_pci_queue *cq = q->u.rdq.cq;
>  	unsigned int linear_data_size;
> +	struct page_pool *page_pool;
>  	struct sk_buff *skb;
>  	int page_index = 0;
>  	bool linear_only;
>  	void *data;
>  
> +	linear_only = byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD <= PAGE_SIZE;
> +	linear_data_size = linear_only ? byte_count :
> +					 PAGE_SIZE -
> +					 MLXSW_PCI_RX_BUF_SW_OVERHEAD;

Maybe reformat the line while at it?

	linear_data_size = linear_only ? byte_count :
			   PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD;

> +
> +	page_pool = cq->u.cq.page_pool;
> +	page_pool_dma_sync_for_cpu(page_pool, pages[page_index],
> +				   MLXSW_PCI_SKB_HEADROOM, linear_data_size);

page_pool_dma_sync_for_cpu() already skips the headroom:

	dma_sync_single_range_for_cpu(pool->p.dev,
				      offset + pool->p.offset, ...

Since your pool->p.offset is MLXSW_PCI_SKB_HEADROOM, I believe you need
to pass 0 here.
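
The offset arithmetic behind this remark can be sketched in userspace.
This is a model, not the page pool implementation: struct toy_pool and
toy_sync_for_cpu() are hypothetical, and the only behavior taken from
the snippet above is that the pool's own offset is added to the offset
the caller passes. Whether the pool offset really equals the headroom
in mlxsw is the assumption under discussion, not something the sketch
settles.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the pool parameter the review calls pool->p.offset. */
struct toy_pool {
	size_t offset;
};

/* The byte range inside the page that ends up being synced. */
struct toy_range {
	size_t start;
	size_t len;
};

/* Hypothetical model: the pool offset is added to the caller's offset. */
static struct toy_range toy_sync_for_cpu(const struct toy_pool *pool,
					 size_t offset, size_t len)
{
	struct toy_range r;

	assert(pool != NULL);
	r.start = offset + pool->offset;
	r.len = len;
	return r;
}
```

If the pool offset already equals the headroom, passing the headroom
again starts the sync one headroom too late; passing 0 starts it at the
first byte of packet data.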

> +
>  	data = page_address(pages[page_index]);
>  	net_prefetch(data);

Thanks,
Olek


* Re: [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device
  2024-10-25 14:26 ` [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device Petr Machata
@ 2024-10-25 15:02   ` Alexander Lobakin
  2024-10-27  6:51     ` Amit Cohen
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Lobakin @ 2024-10-25 15:02 UTC (permalink / raw)
  To: Petr Machata, Amit Cohen
  Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Danielle Ratson,
	Ido Schimmel, mlxsw, Jiri Pirko

From: Petr Machata <petrm@nvidia.com>
Date: Fri, 25 Oct 2024 16:26:27 +0200

> From: Amit Cohen <amcohen@nvidia.com>
> 
> Non-coherent architectures, like ARM, may require invalidating caches
> before the device can use the DMA mapped memory, which means that before
> posting pages to device, drivers should sync the memory for device.
> 
> Sync for device can be configured as page pool responsibility. Set the
> relevant flag and define max_len for sync.
> 
> Cc: Jiri Pirko <jiri@resnulli.us>
> Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
> Signed-off-by: Amit Cohen <amcohen@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/pci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> index 2320a5f323b4..d6f37456fb31 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> @@ -996,12 +996,13 @@ static int mlxsw_pci_cq_page_pool_init(struct mlxsw_pci_queue *q,
>  	if (cq_type != MLXSW_PCI_CQ_RDQ)
>  		return 0;
>  
> -	pp_params.flags = PP_FLAG_DMA_MAP;
> +	pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
>  	pp_params.pool_size = MLXSW_PCI_WQE_COUNT * mlxsw_pci->num_sg_entries;
>  	pp_params.nid = dev_to_node(&mlxsw_pci->pdev->dev);
>  	pp_params.dev = &mlxsw_pci->pdev->dev;
>  	pp_params.napi = &q->u.cq.napi;
>  	pp_params.dma_dir = DMA_FROM_DEVICE;
> +	pp_params.max_len = PAGE_SIZE;

max_len is the maximum HW-writable area of a buffer. Headroom and
tailroom must be excluded. In your case

	pp_params.max_len = PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD;

>  
>  	page_pool = page_pool_create(&pp_params);
>  	if (IS_ERR(page_pool))

Thanks,
Olek


* RE: [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device
  2024-10-25 15:02   ` Alexander Lobakin
@ 2024-10-27  6:51     ` Amit Cohen
  2024-10-29 15:12       ` Alexander Lobakin
  0 siblings, 1 reply; 12+ messages in thread
From: Amit Cohen @ 2024-10-27  6:51 UTC (permalink / raw)
  To: Alexander Lobakin, Petr Machata
  Cc: netdev@vger.kernel.org, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Danielle Ratson, Ido Schimmel, mlxsw, Jiri Pirko



> -----Original Message-----
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Sent: Friday, 25 October 2024 18:03
> To: Petr Machata <petrm@nvidia.com>; Amit Cohen <amcohen@nvidia.com>
> Cc: netdev@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Simon Horman <horms@kernel.org>;
> Danielle Ratson <danieller@nvidia.com>; Ido Schimmel <idosch@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Jiri Pirko <jiri@resnulli.us>
> Subject: Re: [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device
> 
> From: Petr Machata <petrm@nvidia.com>
> Date: Fri, 25 Oct 2024 16:26:27 +0200
> 
> > From: Amit Cohen <amcohen@nvidia.com>
> >
> > Non-coherent architectures, like ARM, may require invalidating caches
> > before the device can use the DMA mapped memory, which means that before
> > posting pages to device, drivers should sync the memory for device.
> >
> > Sync for device can be configured as page pool responsibility. Set the
> > relevant flag and define max_len for sync.
> >
> > Cc: Jiri Pirko <jiri@resnulli.us>
> > Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
> > Signed-off-by: Amit Cohen <amcohen@nvidia.com>
> > Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> > Signed-off-by: Petr Machata <petrm@nvidia.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlxsw/pci.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> > index 2320a5f323b4..d6f37456fb31 100644
> > --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
> > +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> > @@ -996,12 +996,13 @@ static int mlxsw_pci_cq_page_pool_init(struct mlxsw_pci_queue *q,
> >  	if (cq_type != MLXSW_PCI_CQ_RDQ)
> >  		return 0;
> >
> > -	pp_params.flags = PP_FLAG_DMA_MAP;
> > +	pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> >  	pp_params.pool_size = MLXSW_PCI_WQE_COUNT * mlxsw_pci->num_sg_entries;
> >  	pp_params.nid = dev_to_node(&mlxsw_pci->pdev->dev);
> >  	pp_params.dev = &mlxsw_pci->pdev->dev;
> >  	pp_params.napi = &q->u.cq.napi;
> >  	pp_params.dma_dir = DMA_FROM_DEVICE;
> > +	pp_params.max_len = PAGE_SIZE;
> 
> max_len is the maximum HW-writable area of a buffer. Headroom and
> tailroom must be excluded. In your case
> 
> 	pp_params.max_len = PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD;
> 

The mlxsw driver uses fragmented buffers, and the page pool allocates the buffers for all scatter/gather entries.
For each packet, the HW-writable area of the buffer of the *first* entry is 'PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD', but for the other entries we map the full PAGE_SIZE to HW.
That's why we set the page pool to sync PAGE_SIZE and use offset=0.
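
A minimal sketch of that reasoning, as standalone user-space C rather than driver code. The EX_* constants and ex_* helpers are hypothetical stand-ins (the real PAGE_SIZE and MLXSW_PCI_RX_BUF_SW_OVERHEAD depend on the kernel configuration); the point is only that a pool-wide sync length has to cover the largest per-entry HW-writable area.

```c
#include <stddef.h>

/* Illustrative values only; not the real kernel/driver constants. */
#define EX_PAGE_SIZE   4096u
#define EX_SW_OVERHEAD  384u	/* stand-in for MLXSW_PCI_RX_BUF_SW_OVERHEAD */

/* Bytes the device may write into scatter/gather entry sg_index of a packet:
 * the first entry loses the software overhead, the others get a full page.
 */
static size_t ex_hw_writable_len(unsigned int sg_index)
{
	return sg_index == 0 ? EX_PAGE_SIZE - EX_SW_OVERHEAD : EX_PAGE_SIZE;
}

/* Largest HW-writable length over num_sg_entries entries. One pool serves
 * every entry, so a single pool-wide sync length must cover this maximum.
 */
static size_t ex_pool_sync_len(unsigned int num_sg_entries)
{
	size_t max_len = 0;

	for (unsigned int i = 0; i < num_sg_entries; i++) {
		size_t len = ex_hw_writable_len(i);

		if (len > max_len)
			max_len = len;
	}
	return max_len;
}
```

With more than one scatter/gather entry the maximum is the full page, which matches max_len = PAGE_SIZE in the patch; only a pool that never served non-first entries could use the smaller value.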

> >
> >  	page_pool = page_pool_create(&pp_params);
> >  	if (IS_ERR(page_pool))
> 
> Thanks,
> Olek


* RE: [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU
  2024-10-25 15:00   ` Alexander Lobakin
@ 2024-10-27  7:29     ` Amit Cohen
  0 siblings, 0 replies; 12+ messages in thread
From: Amit Cohen @ 2024-10-27  7:29 UTC (permalink / raw)
  To: Alexander Lobakin, Petr Machata
  Cc: netdev@vger.kernel.org, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Danielle Ratson, Ido Schimmel, mlxsw, Jiri Pirko



> -----Original Message-----
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Sent: Friday, 25 October 2024 18:00
> To: Petr Machata <petrm@nvidia.com>; Amit Cohen <amcohen@nvidia.com>
> Cc: netdev@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Simon Horman <horms@kernel.org>;
> Danielle Ratson <danieller@nvidia.com>; Ido Schimmel <idosch@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Jiri Pirko <jiri@resnulli.us>
> Subject: Re: [PATCH net 2/5] mlxsw: pci: Sync Rx buffers for CPU
> 
> From: Petr Machata <petrm@nvidia.com>
> Date: Fri, 25 Oct 2024 16:26:26 +0200
> 
> > From: Amit Cohen <amcohen@nvidia.com>
> >
> > When Rx packet is received, drivers should sync the pages for CPU, to
> > ensure the CPU reads the data written by the device and not stale
> > data from its cache.
> 
> [...]
> 
> > -static struct sk_buff *mlxsw_pci_rdq_build_skb(struct page *pages[],
> > +static struct sk_buff *mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
> > +					       struct page *pages[],
> >  					       u16 byte_count)
> >  {
> > +	struct mlxsw_pci_queue *cq = q->u.rdq.cq;
> >  	unsigned int linear_data_size;
> > +	struct page_pool *page_pool;
> >  	struct sk_buff *skb;
> >  	int page_index = 0;
> >  	bool linear_only;
> >  	void *data;
> >
> > +	linear_only = byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD <= PAGE_SIZE;
> > +	linear_data_size = linear_only ? byte_count :
> > +					 PAGE_SIZE -
> > +					 MLXSW_PCI_RX_BUF_SW_OVERHEAD;
> 
> Maybe reformat the line while at it?
> 
> 	linear_data_size = linear_only ? byte_count :
> 			   PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD;
> 
> > +
> > +	page_pool = cq->u.cq.page_pool;
> > +	page_pool_dma_sync_for_cpu(page_pool, pages[page_index],
> > +				   MLXSW_PCI_SKB_HEADROOM, linear_data_size);
> 
> page_pool_dma_sync_for_cpu() already skips the headroom:
> 
> 	dma_sync_single_range_for_cpu(pool->p.dev,
> 				      offset + pool->p.offset, ...
> 
> Since your pool->p.offset is MLXSW_PCI_SKB_HEADROOM, I believe you need
> to pass 0 here.

Our pool->p.offset is zero.
We use the page pool to allocate buffers for scatter/gather entries.
Only the first entry reserves headroom for software use, so we pass the headroom to page_pool_dma_sync_for_cpu() only for the first buffer of the packet.
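
To illustrate, here is a rough user-space model of which byte range gets synced for the CPU in each entry. It assumes, as quoted above, that page_pool_dma_sync_for_cpu() adds pool->p.offset to the offset the caller passes. The EX_* constants, struct ex_sync_range and the ex_* helpers are hypothetical names with made-up values, not driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; not the real kernel/driver constants. */
#define EX_PAGE_SIZE   4096u
#define EX_HEADROOM      64u	/* stand-in for MLXSW_PCI_SKB_HEADROOM */
#define EX_SW_OVERHEAD  384u	/* stand-in for MLXSW_PCI_RX_BUF_SW_OVERHEAD */

struct ex_sync_range {
	size_t offset;	/* effective offset into the page */
	size_t len;	/* number of bytes to sync */
};

/* Size of the linear part, following the computation in the quoted patch. */
static size_t ex_linear_data_size(size_t byte_count)
{
	bool linear_only = byte_count + EX_SW_OVERHEAD <= EX_PAGE_SIZE;

	return linear_only ? byte_count : EX_PAGE_SIZE - EX_SW_OVERHEAD;
}

/* Range synced for the CPU in entry sg_index of a byte_count-sized packet.
 * pool_offset models pool->p.offset, which is added to the caller's offset.
 */
static struct ex_sync_range ex_cpu_sync_range(size_t byte_count,
					      unsigned int sg_index,
					      size_t pool_offset)
{
	size_t linear = ex_linear_data_size(byte_count);
	struct ex_sync_range r = { pool_offset, 0 };

	if (sg_index == 0) {
		/* Only the first entry skips the software headroom. */
		r.offset += EX_HEADROOM;
		r.len = linear;
		return r;
	}

	/* Later entries start at the pool offset and hold whatever data is
	 * left after the linear part and the preceding full-page fragments.
	 */
	size_t remaining = byte_count - linear;
	size_t skipped = (size_t)(sg_index - 1) * EX_PAGE_SIZE;

	if (remaining > skipped) {
		remaining -= skipped;
		r.len = remaining < EX_PAGE_SIZE ? remaining : EX_PAGE_SIZE;
	}
	return r;
}
```

With pool_offset == 0 the first entry is synced starting at the headroom and every later entry from offset 0. If the pool offset were itself the headroom, passing the headroom again would skip it twice, which is the case the review comment was guarding against.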

> 
> > +
> >  	data = page_address(pages[page_index]);
> >  	net_prefetch(data);
> 
> Thanks,
> Olek


* Re: [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device
  2024-10-27  6:51     ` Amit Cohen
@ 2024-10-29 15:12       ` Alexander Lobakin
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Lobakin @ 2024-10-29 15:12 UTC (permalink / raw)
  To: Amit Cohen
  Cc: Petr Machata, netdev@vger.kernel.org, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Danielle Ratson, Ido Schimmel, mlxsw, Jiri Pirko

From: Amit Cohen <amcohen@nvidia.com>
Date: Sun, 27 Oct 2024 06:51:00 +0000

> 
> 
>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: Friday, 25 October 2024 18:03
>> To: Petr Machata <petrm@nvidia.com>; Amit Cohen <amcohen@nvidia.com>
>> Cc: netdev@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet
>> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Simon Horman <horms@kernel.org>;
>> Danielle Ratson <danieller@nvidia.com>; Ido Schimmel <idosch@nvidia.com>; mlxsw <mlxsw@nvidia.com>; Jiri Pirko <jiri@resnulli.us>
>> Subject: Re: [PATCH net 3/5] mlxsw: pci: Sync Rx buffers for device
>>
>> From: Petr Machata <petrm@nvidia.com>
>> Date: Fri, 25 Oct 2024 16:26:27 +0200
>>
>>> From: Amit Cohen <amcohen@nvidia.com>
>>>
>>> Non-coherent architectures, like ARM, may require invalidating caches
>>> before the device can use the DMA mapped memory, which means that before
>>> posting pages to device, drivers should sync the memory for device.
>>>
>>> Sync for device can be configured as page pool responsibility. Set the
>>> relevant flag and define max_len for sync.
>>>
>>> Cc: Jiri Pirko <jiri@resnulli.us>
>>> Fixes: b5b60bb491b2 ("mlxsw: pci: Use page pool for Rx buffers allocation")
>>> Signed-off-by: Amit Cohen <amcohen@nvidia.com>
>>> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
>>> Signed-off-by: Petr Machata <petrm@nvidia.com>
>>> ---
>>>  drivers/net/ethernet/mellanox/mlxsw/pci.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
>>> index 2320a5f323b4..d6f37456fb31 100644
>>> --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
>>> +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
>>> @@ -996,12 +996,13 @@ static int mlxsw_pci_cq_page_pool_init(struct mlxsw_pci_queue *q,
>>>  	if (cq_type != MLXSW_PCI_CQ_RDQ)
>>>  		return 0;
>>>
>>> -	pp_params.flags = PP_FLAG_DMA_MAP;
>>> +	pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
>>>  	pp_params.pool_size = MLXSW_PCI_WQE_COUNT * mlxsw_pci->num_sg_entries;
>>>  	pp_params.nid = dev_to_node(&mlxsw_pci->pdev->dev);
>>>  	pp_params.dev = &mlxsw_pci->pdev->dev;
>>>  	pp_params.napi = &q->u.cq.napi;
>>>  	pp_params.dma_dir = DMA_FROM_DEVICE;
>>> +	pp_params.max_len = PAGE_SIZE;
>>
>> max_len is the maximum HW-writable area of a buffer. Headroom and
>> tailroom must be excluded. In your case
>>
>> 	pp_params.max_len = PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD;
>>
> 
> mlxsw driver uses fragmented buffers and the page pool is used to allocate the buffers for all scatter/gather entries.
> For each packet, the HW-writable area of a buffer of the *first* entry is 'PAGE_SIZE - MLXSW_PCI_RX_BUF_SW_OVERHEAD', but for other entries we map PAGE_SIZE to HW.
> That's why we set page pool to sync PAGE_SIZE and use offset=0.

Oops, I didn't notice that this particular configuration has offset == 0, sorry.

Thanks,
Olek


* Re: [PATCH net 0/5] mlxsw: Fixes
  2024-10-25 14:26 [PATCH net 0/5] mlxsw: Fixes Petr Machata
                   ` (4 preceding siblings ...)
  2024-10-25 14:26 ` [PATCH net 5/5] selftests: forwarding: Add IPv6 GRE remote change tests Petr Machata
@ 2024-10-31  1:30 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-10-31  1:30 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, horms,
	danieller, idosch, amcohen, mlxsw

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 25 Oct 2024 16:26:24 +0200 you wrote:
> In this patchset:
> 
> - Tx header should be pushed for each packet which is transmitted via
>   Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make
>   sure that there is both enough room to push the Tx header and that the
>   SKB header is not cloned and can be modified.
> 
> [...]

Here is the summary with links:
  - [net,1/5] mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
    https://git.kernel.org/netdev/net/c/0a66e5582b51
  - [net,2/5] mlxsw: pci: Sync Rx buffers for CPU
    https://git.kernel.org/netdev/net/c/15f73e601a9c
  - [net,3/5] mlxsw: pci: Sync Rx buffers for device
    https://git.kernel.org/netdev/net/c/d0fbdc3ae9ec
  - [net,4/5] mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
    https://git.kernel.org/netdev/net/c/12ae97c531fc
  - [net,5/5] selftests: forwarding: Add IPv6 GRE remote change tests
    https://git.kernel.org/netdev/net/c/d7bd61fa0222

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


