* [PATCH net-next 01/12] mlxsw: core: Remove debug prints
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
@ 2025-02-04 11:04 ` Petr Machata
2025-02-04 11:04 ` [PATCH net-next 02/12] mlxsw: Check Rx local port in PCI code Petr Machata
` (11 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:04 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
mlxsw_core_skb_receive() calls dev_dbg_ratelimited(), but the printed
info can be obtained using simple tracing. Remove these calls to clean up
the code. A subsequent patch will change the 'rx_info' fields; without
these prints, some fields become unnecessary and can be removed.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 2bb2b77351bd..8becb08984a6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2948,9 +2948,6 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
bool found = false;
if (rx_info->is_lag) {
- dev_dbg_ratelimited(mlxsw_core->bus_info->dev, "%s: lag_id = %d, lag_port_index = 0x%x\n",
- __func__, rx_info->u.lag_id,
- rx_info->trap_id);
/* Upper layer does not care if the skb came from LAG or not,
* so just get the local_port for the lag port and push it up.
*/
@@ -2961,9 +2958,6 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
local_port = rx_info->u.sys_port;
}
- dev_dbg_ratelimited(mlxsw_core->bus_info->dev, "%s: local_port = %d, trap_id = 0x%x\n",
- __func__, local_port, rx_info->trap_id);
-
if ((rx_info->trap_id >= MLXSW_TRAP_ID_MAX) ||
(local_port >= mlxsw_core->max_ports))
goto drop;
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 02/12] mlxsw: Check Rx local port in PCI code
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
2025-02-04 11:04 ` [PATCH net-next 01/12] mlxsw: core: Remove debug prints Petr Machata
@ 2025-02-04 11:04 ` Petr Machata
2025-02-04 11:04 ` [PATCH net-next 03/12] mlxsw: Add struct mlxsw_pci_rx_pkt_info Petr Machata
` (10 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:04 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
When a packet is received from a port that is a member of a LAG, the CQE
contains info about the LAG, and later the core code checks which local
port is the Rx port. To support XDP, this check must also be done in the
PCI code, to find the relevant XDP program according to the Rx netdevice.
There is no point in checking the mapping twice, so as preparation for
XDP support, determine the Rx local port in the PCI code and fill this
info in 'rx_info'. Remove the now-unnecessary fields from 'rx_info'.
Set 'rx_info.local_port' earlier in the code, as this info will be used
to run XDP, which will be handled right after the local port is resolved.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 18 +++---------------
drivers/net/ethernet/mellanox/mlxsw/core.h | 7 +------
drivers/net/ethernet/mellanox/mlxsw/pci.c | 22 ++++++++++++----------
3 files changed, 16 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 8becb08984a6..392c0355d589 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2944,29 +2944,17 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
{
struct mlxsw_rx_listener_item *rxl_item;
const struct mlxsw_rx_listener *rxl;
- u16 local_port;
bool found = false;
- if (rx_info->is_lag) {
- /* Upper layer does not care if the skb came from LAG or not,
- * so just get the local_port for the lag port and push it up.
- */
- local_port = mlxsw_core_lag_mapping_get(mlxsw_core,
- rx_info->u.lag_id,
- rx_info->lag_port_index);
- } else {
- local_port = rx_info->u.sys_port;
- }
-
if ((rx_info->trap_id >= MLXSW_TRAP_ID_MAX) ||
- (local_port >= mlxsw_core->max_ports))
+ (rx_info->local_port >= mlxsw_core->max_ports))
goto drop;
rcu_read_lock();
list_for_each_entry_rcu(rxl_item, &mlxsw_core->rx_listener_list, list) {
rxl = &rxl_item->rxl;
if ((rxl->local_port == MLXSW_PORT_DONT_CARE ||
- rxl->local_port == local_port) &&
+ rxl->local_port == rx_info->local_port) &&
rxl->trap_id == rx_info->trap_id &&
rxl->mirror_reason == rx_info->mirror_reason) {
if (rxl_item->enabled)
@@ -2979,7 +2967,7 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
goto drop;
}
- rxl->func(skb, local_port, rxl_item->priv);
+ rxl->func(skb, rx_info->local_port, rxl_item->priv);
rcu_read_unlock();
return;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 1a871397a6df..72eb7dbf57ce 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -242,12 +242,7 @@ int mlxsw_reg_write(struct mlxsw_core *mlxsw_core,
const struct mlxsw_reg_info *reg, char *payload);
struct mlxsw_rx_info {
- bool is_lag;
- union {
- u16 sys_port;
- u16 lag_id;
- } u;
- u16 lag_port_index;
+ u16 local_port;
u8 mirror_reason;
int trap_id;
};
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 5b44c931b660..55ef185c9f5a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -761,6 +761,18 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
if (mlxsw_pci_cqe_crc_get(cqe_v, cqe))
byte_count -= ETH_FCS_LEN;
+ if (mlxsw_pci_cqe_lag_get(cqe_v, cqe)) {
+ u16 lag_id, lag_port_index;
+
+ lag_id = mlxsw_pci_cqe_lag_id_get(cqe_v, cqe);
+ lag_port_index = mlxsw_pci_cqe_lag_subport_get(cqe_v, cqe);
+ rx_info.local_port = mlxsw_core_lag_mapping_get(mlxsw_pci->core,
+ lag_id,
+ lag_port_index);
+ } else {
+ rx_info.local_port = mlxsw_pci_cqe_system_port_get(cqe);
+ }
+
err = mlxsw_pci_elem_info_pages_ref_store(q, elem_info, byte_count,
pages, &num_sg_entries);
if (err)
@@ -779,16 +791,6 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
skb_mark_for_recycle(skb);
- if (mlxsw_pci_cqe_lag_get(cqe_v, cqe)) {
- rx_info.is_lag = true;
- rx_info.u.lag_id = mlxsw_pci_cqe_lag_id_get(cqe_v, cqe);
- rx_info.lag_port_index =
- mlxsw_pci_cqe_lag_subport_get(cqe_v, cqe);
- } else {
- rx_info.is_lag = false;
- rx_info.u.sys_port = mlxsw_pci_cqe_system_port_get(cqe);
- }
-
rx_info.trap_id = mlxsw_pci_cqe_trap_id_get(cqe);
if (rx_info.trap_id == MLXSW_TRAP_ID_DISCARD_INGRESS_ACL ||
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 03/12] mlxsw: Add struct mlxsw_pci_rx_pkt_info
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
2025-02-04 11:04 ` [PATCH net-next 01/12] mlxsw: core: Remove debug prints Petr Machata
2025-02-04 11:04 ` [PATCH net-next 02/12] mlxsw: Check Rx local port in PCI code Petr Machata
@ 2025-02-04 11:04 ` Petr Machata
2025-02-04 11:04 ` [PATCH net-next 04/12] mlxsw: pci: Use mlxsw_pci_rx_pkt_info Petr Machata
` (9 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:04 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
When an Rx packet is received, the byte_count value from the CQE is used
to calculate how many scatter/gather entries are in use and the size of
each entry. This calculation is needed for syncing the buffers for the
CPU and for building the SKB. Once XDP is supported, these values will
also be used to create the XDP buffer.
To avoid recalculating the number of scatter/gather entries and the size
of each entry, add a dedicated structure to hold this info, along with
pointers to the pages. Initialize the new structure when an Rx packet is
received. This patch only initializes the structure; subsequent patches
will use it.
Add struct mlxsw_pci_rx_pkt_info to pci.h, as the next patch in this set
will use it from another file.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 57 +++++++++++++++++---
drivers/net/ethernet/mellanox/mlxsw/pci.h | 8 +++
drivers/net/ethernet/mellanox/mlxsw/pci_hw.h | 1 -
3 files changed, 57 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 55ef185c9f5a..aca1857a4e70 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -390,6 +390,49 @@ static void mlxsw_pci_wqe_frag_unmap(struct mlxsw_pci *mlxsw_pci, char *wqe,
dma_unmap_single(&pdev->dev, mapaddr, frag_len, direction);
}
+static u8 mlxsw_pci_num_sg_entries_get(u16 byte_count)
+{
+ return DIV_ROUND_UP(byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD,
+ PAGE_SIZE);
+}
+
+static int
+mlxsw_pci_rx_pkt_info_init(const struct mlxsw_pci *pci,
+ const struct mlxsw_pci_queue_elem_info *elem_info,
+ u16 byte_count,
+ struct mlxsw_pci_rx_pkt_info *rx_pkt_info)
+{
+ unsigned int linear_data_size;
+ u8 num_sg_entries;
+ bool linear_only;
+ int i;
+
+ num_sg_entries = mlxsw_pci_num_sg_entries_get(byte_count);
+ if (WARN_ON_ONCE(num_sg_entries > pci->num_sg_entries))
+ return -EINVAL;
+
+ rx_pkt_info->num_sg_entries = num_sg_entries;
+
+ linear_only = byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD <= PAGE_SIZE;
+ linear_data_size = linear_only ? byte_count :
+ PAGE_SIZE -
+ MLXSW_PCI_RX_BUF_SW_OVERHEAD;
+
+ for (i = 0; i < num_sg_entries; i++) {
+ unsigned int sg_entry_size;
+
+ sg_entry_size = i ? min(byte_count, PAGE_SIZE) :
+ linear_data_size;
+
+ rx_pkt_info->sg_entries_size[i] = sg_entry_size;
+ rx_pkt_info->pages[i] = elem_info->pages[i];
+
+ byte_count -= sg_entry_size;
+ }
+
+ return 0;
+}
+
static struct sk_buff *mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
struct page *pages[],
u16 byte_count)
@@ -470,12 +513,6 @@ static void mlxsw_pci_rdq_page_free(struct mlxsw_pci_queue *q,
false);
}
-static u8 mlxsw_pci_num_sg_entries_get(u16 byte_count)
-{
- return DIV_ROUND_UP(byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD,
- PAGE_SIZE);
-}
-
static int
mlxsw_pci_elem_info_pages_ref_store(const struct mlxsw_pci_queue *q,
const struct mlxsw_pci_queue_elem_info *el,
@@ -486,8 +523,6 @@ mlxsw_pci_elem_info_pages_ref_store(const struct mlxsw_pci_queue *q,
int i;
num_sg_entries = mlxsw_pci_num_sg_entries_get(byte_count);
- if (WARN_ON_ONCE(num_sg_entries > q->pci->num_sg_entries))
- return -EINVAL;
for (i = 0; i < num_sg_entries; i++)
pages[i] = el->pages[i];
@@ -743,6 +778,7 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
u16 consumer_counter_limit,
enum mlxsw_pci_cqe_v cqe_v, char *cqe)
{
+ struct mlxsw_pci_rx_pkt_info rx_pkt_info = {};
struct pci_dev *pdev = mlxsw_pci->pdev;
struct page *pages[MLXSW_PCI_WQE_SG_ENTRIES];
struct mlxsw_pci_queue_elem_info *elem_info;
@@ -773,6 +809,11 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
rx_info.local_port = mlxsw_pci_cqe_system_port_get(cqe);
}
+ err = mlxsw_pci_rx_pkt_info_init(q->pci, elem_info, byte_count,
+ &rx_pkt_info);
+ if (err)
+ goto out;
+
err = mlxsw_pci_elem_info_pages_ref_store(q, elem_info, byte_count,
pages, &num_sg_entries);
if (err)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.h b/drivers/net/ethernet/mellanox/mlxsw/pci.h
index cacc2f9fa1d4..74677feacbb5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.h
@@ -11,11 +11,19 @@
#define PCI_DEVICE_ID_MELLANOX_SPECTRUM3 0xcf70
#define PCI_DEVICE_ID_MELLANOX_SPECTRUM4 0xcf80
+#define MLXSW_PCI_WQE_SG_ENTRIES 3
+
#if IS_ENABLED(CONFIG_MLXSW_PCI)
int mlxsw_pci_driver_register(struct pci_driver *pci_driver);
void mlxsw_pci_driver_unregister(struct pci_driver *pci_driver);
+struct mlxsw_pci_rx_pkt_info {
+ struct page *pages[MLXSW_PCI_WQE_SG_ENTRIES];
+ unsigned int sg_entries_size[MLXSW_PCI_WQE_SG_ENTRIES];
+ u8 num_sg_entries;
+};
+
#else
static inline int
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
index 6bed495dcf0f..83d25f926287 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci_hw.h
@@ -64,7 +64,6 @@
#define MLXSW_PCI_EQE_COUNT (MLXSW_PCI_AQ_SIZE / MLXSW_PCI_EQE_SIZE)
#define MLXSW_PCI_EQE_UPDATE_COUNT 0x80
-#define MLXSW_PCI_WQE_SG_ENTRIES 3
#define MLXSW_PCI_WQE_TYPE_ETHERNET 0xA
/* pci_wqe_c
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 04/12] mlxsw: pci: Use mlxsw_pci_rx_pkt_info
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (2 preceding siblings ...)
2025-02-04 11:04 ` [PATCH net-next 03/12] mlxsw: Add struct mlxsw_pci_rx_pkt_info Petr Machata
@ 2025-02-04 11:04 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 05/12] mlxsw: pci: Add a separate function for syncing buffers for CPU Petr Machata
` (8 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:04 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
Pass the newly added structure as an argument to
mlxsw_pci_rdq_build_skb() and use it.
Remove mlxsw_pci_elem_info_pages_ref_store(), as mlxsw_pci_rx_pkt_info
already stores the pointers to the pages.
Pass mlxsw_pci_rdq_pages_alloc() the number of scatter/gather entries
stored in mlxsw_pci_rx_pkt_info.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 65 ++++++-----------------
1 file changed, 16 insertions(+), 49 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index aca1857a4e70..374b3f2f117d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -433,28 +433,23 @@ mlxsw_pci_rx_pkt_info_init(const struct mlxsw_pci *pci,
return 0;
}
-static struct sk_buff *mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
- struct page *pages[],
- u16 byte_count)
+static struct sk_buff *
+mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
+ const struct mlxsw_pci_rx_pkt_info *rx_pkt_info)
{
struct mlxsw_pci_queue *cq = q->u.rdq.cq;
unsigned int linear_data_size;
struct page_pool *page_pool;
struct sk_buff *skb;
- int page_index = 0;
- bool linear_only;
void *data;
+ int i;
- linear_only = byte_count + MLXSW_PCI_RX_BUF_SW_OVERHEAD <= PAGE_SIZE;
- linear_data_size = linear_only ? byte_count :
- PAGE_SIZE -
- MLXSW_PCI_RX_BUF_SW_OVERHEAD;
-
+ linear_data_size = rx_pkt_info->sg_entries_size[0];
page_pool = cq->u.cq.page_pool;
- page_pool_dma_sync_for_cpu(page_pool, pages[page_index],
+ page_pool_dma_sync_for_cpu(page_pool, rx_pkt_info->pages[0],
MLXSW_PCI_SKB_HEADROOM, linear_data_size);
- data = page_address(pages[page_index]);
+ data = page_address(rx_pkt_info->pages[0]);
net_prefetch(data);
skb = napi_build_skb(data, PAGE_SIZE);
@@ -464,23 +459,18 @@ static struct sk_buff *mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
skb_reserve(skb, MLXSW_PCI_SKB_HEADROOM);
skb_put(skb, linear_data_size);
- if (linear_only)
+ if (rx_pkt_info->num_sg_entries == 1)
return skb;
- byte_count -= linear_data_size;
- page_index++;
-
- while (byte_count > 0) {
+ for (i = 1; i < rx_pkt_info->num_sg_entries; i++) {
unsigned int frag_size;
struct page *page;
- page = pages[page_index];
- frag_size = min(byte_count, PAGE_SIZE);
+ page = rx_pkt_info->pages[i];
+ frag_size = rx_pkt_info->sg_entries_size[i];
page_pool_dma_sync_for_cpu(page_pool, page, 0, frag_size);
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
page, 0, frag_size, PAGE_SIZE);
- byte_count -= frag_size;
- page_index++;
}
return skb;
@@ -513,24 +503,6 @@ static void mlxsw_pci_rdq_page_free(struct mlxsw_pci_queue *q,
false);
}
-static int
-mlxsw_pci_elem_info_pages_ref_store(const struct mlxsw_pci_queue *q,
- const struct mlxsw_pci_queue_elem_info *el,
- u16 byte_count, struct page *pages[],
- u8 *p_num_sg_entries)
-{
- u8 num_sg_entries;
- int i;
-
- num_sg_entries = mlxsw_pci_num_sg_entries_get(byte_count);
-
- for (i = 0; i < num_sg_entries; i++)
- pages[i] = el->pages[i];
-
- *p_num_sg_entries = num_sg_entries;
- return 0;
-}
-
static int
mlxsw_pci_rdq_pages_alloc(struct mlxsw_pci_queue *q,
struct mlxsw_pci_queue_elem_info *elem_info,
@@ -780,11 +752,9 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
{
struct mlxsw_pci_rx_pkt_info rx_pkt_info = {};
struct pci_dev *pdev = mlxsw_pci->pdev;
- struct page *pages[MLXSW_PCI_WQE_SG_ENTRIES];
struct mlxsw_pci_queue_elem_info *elem_info;
struct mlxsw_rx_info rx_info = {};
struct sk_buff *skb;
- u8 num_sg_entries;
u16 byte_count;
int err;
@@ -814,19 +784,16 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
if (err)
goto out;
- err = mlxsw_pci_elem_info_pages_ref_store(q, elem_info, byte_count,
- pages, &num_sg_entries);
+ err = mlxsw_pci_rdq_pages_alloc(q, elem_info,
+ rx_pkt_info.num_sg_entries);
if (err)
goto out;
- err = mlxsw_pci_rdq_pages_alloc(q, elem_info, num_sg_entries);
- if (err)
- goto out;
-
- skb = mlxsw_pci_rdq_build_skb(q, pages, byte_count);
+ skb = mlxsw_pci_rdq_build_skb(q, &rx_pkt_info);
if (IS_ERR(skb)) {
dev_err_ratelimited(&pdev->dev, "Failed to build skb for RDQ\n");
- mlxsw_pci_rdq_pages_recycle(q, pages, num_sg_entries);
+ mlxsw_pci_rdq_pages_recycle(q, rx_pkt_info.pages,
+ rx_pkt_info.num_sg_entries);
goto out;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 05/12] mlxsw: pci: Add a separate function for syncing buffers for CPU
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (3 preceding siblings ...)
2025-02-04 11:04 ` [PATCH net-next 04/12] mlxsw: pci: Use mlxsw_pci_rx_pkt_info Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 06/12] mlxsw: pci: Store maximum number of ports Petr Machata
` (7 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
Currently, syncing for the CPU is done as part of building the SKB. Once
XDP is supported, this sync must be done earlier, before creating the
XDP buffer. Add a function for syncing buffers for the CPU and call it
early in mlxsw_pci_cqe_rdq_handle(), as a future patch will make the
driver handle XDP there.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 30 +++++++++++++++++------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 374b3f2f117d..5796d836a7ee 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -433,22 +433,34 @@ mlxsw_pci_rx_pkt_info_init(const struct mlxsw_pci *pci,
return 0;
}
+static void
+mlxsw_pci_sync_for_cpu(const struct mlxsw_pci_queue *q,
+ const struct mlxsw_pci_rx_pkt_info *rx_pkt_info)
+{
+ struct mlxsw_pci_queue *cq = q->u.rdq.cq;
+ struct page_pool *page_pool;
+ int i;
+
+ page_pool = cq->u.cq.page_pool;
+
+ for (i = 0; i < rx_pkt_info->num_sg_entries; i++) {
+ u32 offset = i ? 0 : MLXSW_PCI_SKB_HEADROOM;
+
+ page_pool_dma_sync_for_cpu(page_pool, rx_pkt_info->pages[i],
+ offset,
+ rx_pkt_info->sg_entries_size[i]);
+ }
+}
+
static struct sk_buff *
mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
const struct mlxsw_pci_rx_pkt_info *rx_pkt_info)
{
- struct mlxsw_pci_queue *cq = q->u.rdq.cq;
unsigned int linear_data_size;
- struct page_pool *page_pool;
struct sk_buff *skb;
void *data;
int i;
- linear_data_size = rx_pkt_info->sg_entries_size[0];
- page_pool = cq->u.cq.page_pool;
- page_pool_dma_sync_for_cpu(page_pool, rx_pkt_info->pages[0],
- MLXSW_PCI_SKB_HEADROOM, linear_data_size);
-
data = page_address(rx_pkt_info->pages[0]);
net_prefetch(data);
@@ -457,6 +469,7 @@ mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
return ERR_PTR(-ENOMEM);
skb_reserve(skb, MLXSW_PCI_SKB_HEADROOM);
+ linear_data_size = rx_pkt_info->sg_entries_size[0];
skb_put(skb, linear_data_size);
if (rx_pkt_info->num_sg_entries == 1)
@@ -468,7 +481,6 @@ mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
page = rx_pkt_info->pages[i];
frag_size = rx_pkt_info->sg_entries_size[i];
- page_pool_dma_sync_for_cpu(page_pool, page, 0, frag_size);
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
page, 0, frag_size, PAGE_SIZE);
}
@@ -784,6 +796,8 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
if (err)
goto out;
+ mlxsw_pci_sync_for_cpu(q, &rx_pkt_info);
+
err = mlxsw_pci_rdq_pages_alloc(q, elem_info,
rx_pkt_info.num_sg_entries);
if (err)
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 06/12] mlxsw: pci: Store maximum number of ports
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (4 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 05/12] mlxsw: pci: Add a separate function for syncing buffers for CPU Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 07/12] mlxsw: pci: Add PCI ports array Petr Machata
` (6 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
A subsequent patch will store the mapping between local port and
netdevice in the PCI driver. The motivation is to allow quick access to
the XDP program. When a packet is received, the Rx local port is known;
to run the XDP program we need to map the Rx local port to a netdevice,
as the XDP program is set per netdevice.
As preparation, store the maximum number of ports, queried from the
firmware, as part of the mlxsw_pci structure.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 5796d836a7ee..8af4050d5fc6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -137,6 +137,7 @@ struct mlxsw_pci {
bool skip_reset;
struct net_device *napi_dev_tx;
struct net_device *napi_dev_rx;
+ unsigned int max_ports;
};
static int mlxsw_pci_napi_devs_init(struct mlxsw_pci *mlxsw_pci)
@@ -171,6 +172,20 @@ static void mlxsw_pci_napi_devs_fini(struct mlxsw_pci *mlxsw_pci)
free_netdev(mlxsw_pci->napi_dev_tx);
}
+static int mlxsw_pci_max_ports_set(struct mlxsw_pci *mlxsw_pci)
+{
+ struct mlxsw_core *mlxsw_core = mlxsw_pci->core;
+ unsigned int max_ports;
+
+ if (!MLXSW_CORE_RES_VALID(mlxsw_core, MAX_SYSTEM_PORT))
+ return -EINVAL;
+
+ /* Switch ports are numbered from 1 to queried value */
+ max_ports = MLXSW_CORE_RES_GET(mlxsw_core, MAX_SYSTEM_PORT) + 1;
+ mlxsw_pci->max_ports = max_ports;
+ return 0;
+}
+
static char *__mlxsw_pci_queue_elem_get(struct mlxsw_pci_queue *q,
size_t elem_size, int elem_index)
{
@@ -2069,6 +2084,10 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
if (err)
goto err_napi_devs_init;
+ err = mlxsw_pci_max_ports_set(mlxsw_pci);
+ if (err)
+ goto err_max_ports_set;
+
err = mlxsw_pci_aqs_init(mlxsw_pci, mbox);
if (err)
goto err_aqs_init;
@@ -2086,6 +2105,7 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
err_request_eq_irq:
mlxsw_pci_aqs_fini(mlxsw_pci);
err_aqs_init:
+err_max_ports_set:
mlxsw_pci_napi_devs_fini(mlxsw_pci);
err_napi_devs_init:
err_requery_resources:
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 07/12] mlxsw: pci: Add PCI ports array
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (5 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 06/12] mlxsw: pci: Store maximum number of ports Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 08/12] mlxsw: Add APIs to init/fini PCI port Petr Machata
` (5 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
A future patch set will add support for XDP in the mlxsw driver. When a
packet is received, the Rx local port is provided by the CQE, and we
should check whether an XDP program is configured for this port.
To allow quick mapping from local port to netdevice and XDP program, add
an array of mlxsw_pci_port structures. Allocate the array as part of
init, according to the maximum number of ports. For now, the structure
only contains a pointer to the netdevice; subsequent patches will extend
it.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 30 +++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 8af4050d5fc6..563b9c0578f8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -102,6 +102,10 @@ struct mlxsw_pci_queue_type_group {
u8 count; /* number of queues in group */
};
+struct mlxsw_pci_port {
+ struct net_device *netdev;
+};
+
struct mlxsw_pci {
struct pci_dev *pdev;
u8 __iomem *hw_addr;
@@ -138,6 +142,7 @@ struct mlxsw_pci {
struct net_device *napi_dev_tx;
struct net_device *napi_dev_rx;
unsigned int max_ports;
+ struct mlxsw_pci_port *pci_ports;
};
static int mlxsw_pci_napi_devs_init(struct mlxsw_pci *mlxsw_pci)
@@ -186,6 +191,24 @@ static int mlxsw_pci_max_ports_set(struct mlxsw_pci *mlxsw_pci)
return 0;
}
+static int mlxsw_pci_ports_init(struct mlxsw_pci *mlxsw_pci)
+{
+ struct mlxsw_pci_port *pci_ports;
+
+ pci_ports = kcalloc(mlxsw_pci->max_ports,
+ sizeof(struct mlxsw_pci_port), GFP_KERNEL);
+ if (!pci_ports)
+ return -ENOMEM;
+
+ mlxsw_pci->pci_ports = pci_ports;
+ return 0;
+}
+
+static void mlxsw_pci_ports_fini(struct mlxsw_pci *mlxsw_pci)
+{
+ kfree(mlxsw_pci->pci_ports);
+}
+
static char *__mlxsw_pci_queue_elem_get(struct mlxsw_pci_queue *q,
size_t elem_size, int elem_index)
{
@@ -2088,6 +2111,10 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
if (err)
goto err_max_ports_set;
+ err = mlxsw_pci_ports_init(mlxsw_pci);
+ if (err)
+ goto err_ports_init;
+
err = mlxsw_pci_aqs_init(mlxsw_pci, mbox);
if (err)
goto err_aqs_init;
@@ -2105,6 +2132,8 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core,
err_request_eq_irq:
mlxsw_pci_aqs_fini(mlxsw_pci);
err_aqs_init:
+ mlxsw_pci_ports_fini(mlxsw_pci);
+err_ports_init:
err_max_ports_set:
mlxsw_pci_napi_devs_fini(mlxsw_pci);
err_napi_devs_init:
@@ -2135,6 +2164,7 @@ static void mlxsw_pci_fini(void *bus_priv)
free_irq(pci_irq_vector(mlxsw_pci->pdev, 0), mlxsw_pci);
mlxsw_pci_aqs_fini(mlxsw_pci);
+ mlxsw_pci_ports_fini(mlxsw_pci);
mlxsw_pci_napi_devs_fini(mlxsw_pci);
mlxsw_pci_fw_area_fini(mlxsw_pci);
mlxsw_pci_free_irq_vectors(mlxsw_pci);
--
2.47.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 08/12] mlxsw: Add APIs to init/fini PCI port
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (6 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 07/12] mlxsw: pci: Add PCI ports array Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 09/12] mlxsw: pci: Initialize XDP Rx queue info per RDQ Petr Machata
` (4 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
The previous patch added the PCI ports array to store the associated
netdevice for each local port. Add APIs which set/unset the netdevice
for a specific local port; these APIs will be used from
mlxsw_sp_port_create() and mlxsw_sp_port_remove(). For now, store only
the netdevice pointer; subsequent patches will extend this structure.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 13 +++++++++++++
drivers/net/ethernet/mellanox/mlxsw/core.h | 6 ++++++
drivers/net/ethernet/mellanox/mlxsw/pci.c | 21 +++++++++++++++++++++
3 files changed, 40 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 392c0355d589..628530e01b19 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -219,6 +219,19 @@ mlxsw_core_flood_mode(struct mlxsw_core *mlxsw_core)
}
EXPORT_SYMBOL(mlxsw_core_flood_mode);
+void mlxsw_core_bus_port_init(struct mlxsw_core *mlxsw_core, u16 local_port,
+ struct net_device *netdev)
+{
+ mlxsw_core->bus->port_init(mlxsw_core->bus_priv, local_port, netdev);
+}
+EXPORT_SYMBOL(mlxsw_core_bus_port_init);
+
+void mlxsw_core_bus_port_fini(struct mlxsw_core *mlxsw_core, u16 local_port)
+{
+ mlxsw_core->bus->port_fini(mlxsw_core->bus_priv, local_port);
+}
+EXPORT_SYMBOL(mlxsw_core_bus_port_fini);
+
void *mlxsw_core_driver_priv(struct mlxsw_core *mlxsw_core)
{
return mlxsw_core->driver_priv;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 72eb7dbf57ce..506fe50acdec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -40,6 +40,9 @@ enum mlxsw_cmd_mbox_config_profile_lag_mode
mlxsw_core_lag_mode(struct mlxsw_core *mlxsw_core);
enum mlxsw_cmd_mbox_config_profile_flood_mode
mlxsw_core_flood_mode(struct mlxsw_core *mlxsw_core);
+void mlxsw_core_bus_port_init(struct mlxsw_core *mlxsw_core, u16 local_port,
+ struct net_device *netdev);
+void mlxsw_core_bus_port_fini(struct mlxsw_core *mlxsw_core, u16 local_port);
void *mlxsw_core_driver_priv(struct mlxsw_core *mlxsw_core);
@@ -495,6 +498,9 @@ struct mlxsw_bus {
u32 (*read_frc_l)(void *bus_priv);
u32 (*read_utc_sec)(void *bus_priv);
u32 (*read_utc_nsec)(void *bus_priv);
+ void (*port_init)(void *bus_priv, u16 local_port,
+ struct net_device *netdev);
+ void (*port_fini)(void *bus_priv, u16 local_port);
enum mlxsw_cmd_mbox_config_profile_lag_mode (*lag_mode)(void *bus_priv);
enum mlxsw_cmd_mbox_config_profile_flood_mode (*flood_mode)(void *priv);
u8 features;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 563b9c0578f8..bd6c772a3384 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -2434,6 +2434,25 @@ mlxsw_pci_flood_mode(void *bus_priv)
return mlxsw_pci->flood_mode;
}
+static void mlxsw_pci_port_init(void *bus_priv, u16 local_port,
+ struct net_device *netdev)
+{
+ struct mlxsw_pci *mlxsw_pci = bus_priv;
+ struct mlxsw_pci_port *pci_port;
+
+ pci_port = &mlxsw_pci->pci_ports[local_port];
+ pci_port->netdev = netdev;
+}
+
+static void mlxsw_pci_port_fini(void *bus_priv, u16 local_port)
+{
+ struct mlxsw_pci *mlxsw_pci = bus_priv;
+ struct mlxsw_pci_port *pci_port;
+
+ pci_port = &mlxsw_pci->pci_ports[local_port];
+ pci_port->netdev = NULL;
+}
+
static const struct mlxsw_bus mlxsw_pci_bus = {
.kind = "pci",
.init = mlxsw_pci_init,
@@ -2445,6 +2464,8 @@ static const struct mlxsw_bus mlxsw_pci_bus = {
.read_frc_l = mlxsw_pci_read_frc_l,
.read_utc_sec = mlxsw_pci_read_utc_sec,
.read_utc_nsec = mlxsw_pci_read_utc_nsec,
+ .port_init = mlxsw_pci_port_init,
+ .port_fini = mlxsw_pci_port_fini,
.lag_mode = mlxsw_pci_lag_mode,
.flood_mode = mlxsw_pci_flood_mode,
.features = MLXSW_BUS_F_TXRX | MLXSW_BUS_F_RESET,
--
2.47.0
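[Editor's sketch, outside the patch itself: the bookkeeping that the two new bus callbacks perform can be modeled in plain C. `fake_netdev`, `MAX_PORTS`, and the function names below are illustrative stand-ins, not the driver's real types or sizes.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net_device. */
struct fake_netdev { int ifindex; };

#define MAX_PORTS 64

/* Mirrors struct mlxsw_pci_port: one slot per local port. */
struct pci_port { struct fake_netdev *netdev; };

static struct pci_port pci_ports[MAX_PORTS];

/* Analogous to mlxsw_pci_port_init(): associate a netdev with a local port. */
static void port_init(unsigned int local_port, struct fake_netdev *netdev)
{
	pci_ports[local_port].netdev = netdev;
}

/* Analogous to mlxsw_pci_port_fini(): clear the association. */
static void port_fini(unsigned int local_port)
{
	pci_ports[local_port].netdev = NULL;
}
```

Usage follows the patch's lifecycle: set the pointer before the netdevice is visible to the data path, clear it after it is gone.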
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next 09/12] mlxsw: pci: Initialize XDP Rx queue info per RDQ
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (7 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 08/12] mlxsw: Add APIs to init/fini PCI port Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 10/12] mlxsw: spectrum: Initialize PCI port with the relevant netdevice Petr Machata
` (3 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
In preparation for XDP support, register an Rx queue info structure for
each receive queue.
Each Rx queue is used by multiple net devices, so pass a dummy net device
(unregistered, ifindex 0) as the device.
Pass a queue index of 0, since the net devices are registered by the
driver as single-queue.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index bd6c772a3384..b102be38d29d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -14,6 +14,7 @@
#include <linux/log2.h>
#include <linux/string.h>
#include <net/page_pool/helpers.h>
+#include <net/xdp.h>
#include "pci_hw.h"
#include "pci.h"
@@ -93,6 +94,7 @@ struct mlxsw_pci_queue {
} eq;
struct {
struct mlxsw_pci_queue *cq;
+ struct xdp_rxq_info xdp_rxq;
} rdq;
} u;
};
@@ -624,6 +626,11 @@ static int mlxsw_pci_rdq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
cq->u.cq.dq = q;
q->u.rdq.cq = cq;
+ err = __xdp_rxq_info_reg(&q->u.rdq.xdp_rxq, mlxsw_pci->napi_dev_rx, 0,
+ cq->u.cq.napi.napi_id, PAGE_SIZE);
+ if (err)
+ goto err_xdp_rxq_info_reg;
+
mlxsw_pci_queue_doorbell_producer_ring(mlxsw_pci, q);
for (i = 0; i < q->count; i++) {
@@ -633,7 +640,7 @@ static int mlxsw_pci_rdq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
for (j = 0; j < mlxsw_pci->num_sg_entries; j++) {
err = mlxsw_pci_rdq_page_alloc(q, elem_info, j);
if (err)
- goto rollback;
+ goto err_rdq_page_alloc;
}
/* Everything is set up, ring doorbell to pass elem to HW */
q->producer_counter++;
@@ -642,13 +649,15 @@ static int mlxsw_pci_rdq_init(struct mlxsw_pci *mlxsw_pci, char *mbox,
return 0;
-rollback:
+err_rdq_page_alloc:
for (i--; i >= 0; i--) {
elem_info = mlxsw_pci_queue_elem_info_get(q, i);
for (j--; j >= 0; j--)
mlxsw_pci_rdq_page_free(q, elem_info, j);
j = mlxsw_pci->num_sg_entries;
}
+ xdp_rxq_info_unreg(&q->u.rdq.xdp_rxq);
+err_xdp_rxq_info_reg:
q->u.rdq.cq = NULL;
cq->u.cq.dq = NULL;
mlxsw_cmd_hw2sw_rdq(mlxsw_pci->core, q->num);
@@ -663,6 +672,7 @@ static void mlxsw_pci_rdq_fini(struct mlxsw_pci *mlxsw_pci,
int i, j;
mlxsw_cmd_hw2sw_rdq(mlxsw_pci->core, q->num);
+ xdp_rxq_info_unreg(&q->u.rdq.xdp_rxq);
for (i = 0; i < q->count; i++) {
elem_info = mlxsw_pci_queue_elem_info_get(q, i);
for (j = 0; j < mlxsw_pci->num_sg_entries; j++)
--
2.47.0
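[Editor's sketch: the patch also renames the rollback label and adds the new registration to the unwind path. The error-path ordering it relies on, register first, then allocate, and on allocation failure undo the registration, can be modeled in plain C; the globals and the `rdq_init()` name are illustrative, not the driver's code.]

```c
#include <assert.h>
#include <stdbool.h>

static int registered;   /* stands in for the xdp_rxq_info registration */
static int pages_allocd; /* stands in for the RDQ page allocations */

/* Toy version of the init path in mlxsw_pci_rdq_init(): register the Rx
 * queue info first, then allocate pages; on allocation failure, unwind in
 * reverse order (the labelled goto chain is modeled with plain control
 * flow here). */
static int rdq_init(bool alloc_fails)
{
	registered = 1;                 /* __xdp_rxq_info_reg() */
	if (alloc_fails) {              /* mlxsw_pci_rdq_page_alloc() fails */
		registered = 0;         /* err path: xdp_rxq_info_unreg() */
		return -1;
	}
	pages_allocd = 1;
	return 0;
}
```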
* [PATCH net-next 10/12] mlxsw: spectrum: Initialize PCI port with the relevant netdevice
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (8 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 09/12] mlxsw: pci: Initialize XDP Rx queue info per RDQ Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 11/12] mlxsw: Set some SKB fields in bus driver Petr Machata
` (2 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
When a netdevice is associated with a local port, set the netdevice as part
of the PCI ports array. When a port is removed, unset the relevant netdevice.
This will be useful for XDP support, to allow quick access to the relevant
netdevice given the local port from the CQE.
Init is done before the netdevice is registered and de-init is done after
the netdevice is unregistered, so there is never concurrent access to the
array between the control path and the data path.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d714311fd884..6b77e087fe47 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1543,6 +1543,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u16 local_port,
mlxsw_core_port_netdev_link(mlxsw_sp->core, local_port,
mlxsw_sp_port, dev);
mlxsw_sp_port->dev = dev;
+ mlxsw_core_bus_port_init(mlxsw_sp->core, local_port, dev);
mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
mlxsw_sp_port->local_port = local_port;
mlxsw_sp_port->pvid = MLXSW_SP_DEFAULT_VID;
@@ -1758,6 +1759,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u16 local_port,
err_dev_addr_init:
free_percpu(mlxsw_sp_port->pcpu_stats);
err_alloc_stats:
+ mlxsw_core_bus_port_fini(mlxsw_sp->core, local_port);
free_netdev(dev);
err_alloc_etherdev:
mlxsw_core_port_fini(mlxsw_sp->core, local_port);
@@ -1793,6 +1795,7 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u16 local_port)
mlxsw_sp_port_buffers_fini(mlxsw_sp_port);
free_percpu(mlxsw_sp_port->pcpu_stats);
WARN_ON_ONCE(!list_empty(&mlxsw_sp_port->vlans_list));
+ mlxsw_core_bus_port_fini(mlxsw_sp->core, local_port);
free_netdev(mlxsw_sp_port->dev);
mlxsw_core_port_fini(mlxsw_sp->core, local_port);
mlxsw_sp_port_swid_set(mlxsw_sp, local_port,
--
2.47.0
* [PATCH net-next 11/12] mlxsw: Set some SKB fields in bus driver
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (9 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 10/12] mlxsw: spectrum: Initialize PCI port with the relevant netdevice Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 11:05 ` [PATCH net-next 12/12] mlxsw: Validate local port from CQE in PCI code Petr Machata
2025-02-04 15:56 ` [PATCH net-next 00/12] mlxsw: Preparations for XDP support Alexei Starovoitov
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
Currently, skb->dev and skb->protocol are set in the switch driver
(i.e., 'mlxsw_spectrum'). Previous patches added a ports array to the bus
driver, so we can get the netdevice for a given local port. There is no
real reason not to set skb->dev and skb->protocol when the SKB is created,
so move the relevant code to the bus driver. This is needed as a
preparation for using xdp_build_skb_from_buff(), which takes care of
calling eth_type_trans().
eth_type_trans() moves skb->data to point after the Ethernet header, so
skb->len is decreased accordingly. Add ETH_HLEN when the per-CPU stats are
updated, to preserve the current behavior, which also counts the Ethernet
header length.
eth_type_trans() sets skb->dev, so do not handle this in the driver.
Note that for EMADs, the local port in the CQE is zero and there is no
relevant netdevice; for such packets, do not set skb->dev and skb->protocol.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/pci.c | 13 ++++++++++---
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 5 +----
drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c | 6 +-----
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index b102be38d29d..b560c21fd3ef 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -494,7 +494,8 @@ mlxsw_pci_sync_for_cpu(const struct mlxsw_pci_queue *q,
static struct sk_buff *
mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
- const struct mlxsw_pci_rx_pkt_info *rx_pkt_info)
+ const struct mlxsw_pci_rx_pkt_info *rx_pkt_info,
+ struct net_device *netdev)
{
unsigned int linear_data_size;
struct sk_buff *skb;
@@ -513,7 +514,7 @@ mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
skb_put(skb, linear_data_size);
if (rx_pkt_info->num_sg_entries == 1)
- return skb;
+ goto out;
for (i = 1; i < rx_pkt_info->num_sg_entries; i++) {
unsigned int frag_size;
@@ -525,6 +526,10 @@ mlxsw_pci_rdq_build_skb(struct mlxsw_pci_queue *q,
page, 0, frag_size, PAGE_SIZE);
}
+out:
+ if (netdev)
+ skb->protocol = eth_type_trans(skb, netdev);
+
return skb;
}
@@ -814,6 +819,7 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
struct pci_dev *pdev = mlxsw_pci->pdev;
struct mlxsw_pci_queue_elem_info *elem_info;
struct mlxsw_rx_info rx_info = {};
+ struct mlxsw_pci_port *pci_port;
struct sk_buff *skb;
u16 byte_count;
int err;
@@ -851,7 +857,8 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
if (err)
goto out;
- skb = mlxsw_pci_rdq_build_skb(q, &rx_pkt_info);
+ pci_port = &mlxsw_pci->pci_ports[rx_info.local_port];
+ skb = mlxsw_pci_rdq_build_skb(q, &rx_pkt_info, pci_port->netdev);
if (IS_ERR(skb)) {
dev_err_ratelimited(&pdev->dev, "Failed to build skb for RDQ\n");
mlxsw_pci_rdq_pages_recycle(q, rx_pkt_info.pages,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 6b77e087fe47..a7d2e3716283 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2340,15 +2340,12 @@ void mlxsw_sp_rx_listener_no_mark_func(struct sk_buff *skb,
return;
}
- skb->dev = mlxsw_sp_port->dev;
-
pcpu_stats = this_cpu_ptr(mlxsw_sp_port->pcpu_stats);
u64_stats_update_begin(&pcpu_stats->syncp);
pcpu_stats->rx_packets++;
- pcpu_stats->rx_bytes += skb->len;
+ pcpu_stats->rx_bytes += skb->len + ETH_HLEN;
u64_stats_update_end(&pcpu_stats->syncp);
- skb->protocol = eth_type_trans(skb, skb->dev);
napi_gro_receive(mlxsw_skb_cb(skb)->rx_md_info.napi, skb);
}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
index 1f9c1c86839f..2a69f1815e5a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
@@ -72,16 +72,12 @@ static int mlxsw_sp_rx_listener(struct mlxsw_sp *mlxsw_sp, struct sk_buff *skb,
return -EINVAL;
}
- skb->dev = mlxsw_sp_port->dev;
-
pcpu_stats = this_cpu_ptr(mlxsw_sp_port->pcpu_stats);
u64_stats_update_begin(&pcpu_stats->syncp);
pcpu_stats->rx_packets++;
- pcpu_stats->rx_bytes += skb->len;
+ pcpu_stats->rx_bytes += skb->len + ETH_HLEN;
u64_stats_update_end(&pcpu_stats->syncp);
- skb->protocol = eth_type_trans(skb, skb->dev);
-
return 0;
}
--
2.47.0
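[Editor's sketch: the byte-accounting change above can be modeled in plain C. eth_type_trans() pulls the 14-byte Ethernet header, shrinking skb->len, so the listener adds ETH_HLEN back to keep counting whole frames. The helper names and `ETH_HLEN_LOCAL` are illustrative stand-ins for the kernel's skb handling.]

```c
#include <assert.h>

#define ETH_HLEN_LOCAL 14  /* Ethernet header length, ETH_HLEN in the kernel */

/* Toy model of the relevant effect of eth_type_trans(): it pulls the
 * Ethernet header, so skb->len shrinks by ETH_HLEN. */
static unsigned int len_after_eth_type_trans(unsigned int frame_len)
{
	return frame_len - ETH_HLEN_LOCAL;
}

/* The stats update in the listeners now adds ETH_HLEN back, so rx_bytes
 * keeps counting whole frames as before the patch. */
static unsigned int rx_bytes_accounted(unsigned int skb_len_after_trans)
{
	return skb_len_after_trans + ETH_HLEN_LOCAL;
}
```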
* [PATCH net-next 12/12] mlxsw: Validate local port from CQE in PCI code
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (10 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 11/12] mlxsw: Set some SKB fields in bus driver Petr Machata
@ 2025-02-04 11:05 ` Petr Machata
2025-02-04 15:56 ` [PATCH net-next 00/12] mlxsw: Preparations for XDP support Alexei Starovoitov
12 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2025-02-04 11:05 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, netdev
Cc: Amit Cohen, Ido Schimmel, Petr Machata, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf,
mlxsw
From: Amit Cohen <amcohen@nvidia.com>
Currently, there is a check in the core code to validate that the received
local port does not exceed the number of ports in the switch. The next
patch will have to validate it also in the PCI code, before accessing the
pci_ports array. There is no reason to check it twice, so move this check
to the PCI code.
Note that 'mlxsw_pci->max_ports' and 'mlxsw_core->max_ports' store the same
value, which is read from firmware.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 3 +--
drivers/net/ethernet/mellanox/mlxsw/pci.c | 3 +++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 628530e01b19..962283bbfe18 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2959,8 +2959,7 @@ void mlxsw_core_skb_receive(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
const struct mlxsw_rx_listener *rxl;
bool found = false;
- if ((rx_info->trap_id >= MLXSW_TRAP_ID_MAX) ||
- (rx_info->local_port >= mlxsw_core->max_ports))
+ if (rx_info->trap_id >= MLXSW_TRAP_ID_MAX)
goto drop;
rcu_read_lock();
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index b560c21fd3ef..778493b21318 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -845,6 +845,9 @@ static void mlxsw_pci_cqe_rdq_handle(struct mlxsw_pci *mlxsw_pci,
rx_info.local_port = mlxsw_pci_cqe_system_port_get(cqe);
}
+ if (rx_info.local_port >= mlxsw_pci->max_ports)
+ goto out;
+
err = mlxsw_pci_rx_pkt_info_init(q->pci, elem_info, byte_count,
&rx_pkt_info);
if (err)
--
2.47.0
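[Editor's sketch: the relocated check is a plain bounds test before indexing the per-port array. `local_port_valid()` is an illustrative helper; in the driver the comparison is inlined in mlxsw_pci_cqe_rdq_handle(), and max_ports comes from firmware.]

```c
#include <assert.h>
#include <stdbool.h>

/* A local port taken from the CQE must index within the pci_ports array;
 * out-of-range values cause the packet to be dropped ("goto out"). */
static bool local_port_valid(unsigned int local_port, unsigned int max_ports)
{
	return local_port < max_ports;
}
```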
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-04 11:04 [PATCH net-next 00/12] mlxsw: Preparations for XDP support Petr Machata
` (11 preceding siblings ...)
2025-02-04 11:05 ` [PATCH net-next 12/12] mlxsw: Validate local port from CQE in PCI code Petr Machata
@ 2025-02-04 15:56 ` Alexei Starovoitov
2025-02-04 15:59 ` Amit Cohen
12 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-02-04 15:56 UTC (permalink / raw)
To: Petr Machata
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, Network Development, Amit Cohen, Ido Schimmel,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, bpf, mlxsw
On Tue, Feb 4, 2025 at 11:06 AM Petr Machata <petrm@nvidia.com> wrote:
>
> Amit Cohen writes:
>
> A future patch set will add support for XDP in mlxsw driver. This set adds
> some preparations.
Why?
What is the goal here?
My understanding is that mlxsw is a hw switch and skb-s are used to
implement tap functionality for few listeners.
The volume of such packets is supposed to be small.
Even if XDP is added there is a huge mismatch in packet rates.
Hence the question. Why bother?
* RE: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-04 15:56 ` [PATCH net-next 00/12] mlxsw: Preparations for XDP support Alexei Starovoitov
@ 2025-02-04 15:59 ` Amit Cohen
2025-02-04 16:02 ` Alexei Starovoitov
0 siblings, 1 reply; 22+ messages in thread
From: Amit Cohen @ 2025-02-04 15:59 UTC (permalink / raw)
To: Alexei Starovoitov, Petr Machata
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Andrew Lunn, Network Development, Ido Schimmel,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, bpf, mlxsw
> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Tuesday, 4 February 2025 17:56
> To: Petr Machata <petrm@nvidia.com>
> Cc: David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Andrew Lunn <andrew+netdev@lunn.ch>; Network Development <netdev@vger.kernel.org>; Amit Cohen
> <amcohen@nvidia.com>; Ido Schimmel <idosch@nvidia.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann
> <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John Fastabend <john.fastabend@gmail.com>; bpf
> <bpf@vger.kernel.org>; mlxsw <mlxsw@nvidia.com>
> Subject: Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
>
> On Tue, Feb 4, 2025 at 11:06 AM Petr Machata <petrm@nvidia.com> wrote:
> >
> > Amit Cohen writes:
> >
> > A future patch set will add support for XDP in mlxsw driver. This set adds
> > some preparations.
>
> Why?
> What is the goal here?
> My understanding is that mlxsw is a hw switch and skb-s are used to
> implement tap functionality for few listeners.
> The volume of such packets is supposed to be small.
> Even if XDP is added there is a huge mismatch in packet rates.
> Hence the question. Why bother?
You're right, most packets should be handled by HW; XDP is mainly useful for telemetry.
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-04 15:59 ` Amit Cohen
@ 2025-02-04 16:02 ` Alexei Starovoitov
2025-02-04 17:26 ` Amit Cohen
0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2025-02-04 16:02 UTC (permalink / raw)
To: Amit Cohen
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Andrew Lunn, Network Development, Ido Schimmel,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, bpf, mlxsw
On Tue, Feb 4, 2025 at 3:59 PM Amit Cohen <amcohen@nvidia.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Sent: Tuesday, 4 February 2025 17:56
> > To: Petr Machata <petrm@nvidia.com>
> > Cc: David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> > <pabeni@redhat.com>; Andrew Lunn <andrew+netdev@lunn.ch>; Network Development <netdev@vger.kernel.org>; Amit Cohen
> > <amcohen@nvidia.com>; Ido Schimmel <idosch@nvidia.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann
> > <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John Fastabend <john.fastabend@gmail.com>; bpf
> > <bpf@vger.kernel.org>; mlxsw <mlxsw@nvidia.com>
> > Subject: Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
> >
> > On Tue, Feb 4, 2025 at 11:06 AM Petr Machata <petrm@nvidia.com> wrote:
> > >
> > > Amit Cohen writes:
> > >
> > > A future patch set will add support for XDP in mlxsw driver. This set adds
> > > some preparations.
> >
> > Why?
> > What is the goal here?
> > My understanding is that mlxsw is a hw switch and skb-s are used to
> > implement tap functionality for few listeners.
> > The volume of such packets is supposed to be small.
> > Even if XDP is added there is a huge mismatch in packet rates.
> > Hence the question. Why bother?
>
> You're right, most of packets should be handled by HW, XDP is mainly useful for telemetry.
Why skb path is not enough?
* RE: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-04 16:02 ` Alexei Starovoitov
@ 2025-02-04 17:26 ` Amit Cohen
2025-02-05 17:09 ` Jakub Kicinski
0 siblings, 1 reply; 22+ messages in thread
From: Amit Cohen @ 2025-02-04 17:26 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Andrew Lunn, Network Development, Ido Schimmel,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, bpf, mlxsw
> -----Original Message-----
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Sent: Tuesday, 4 February 2025 18:02
> To: Amit Cohen <amcohen@nvidia.com>
> Cc: Petr Machata <petrm@nvidia.com>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Andrew Lunn <andrew+netdev@lunn.ch>; Network Development
> <netdev@vger.kernel.org>; Ido Schimmel <idosch@nvidia.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann
> <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John Fastabend <john.fastabend@gmail.com>; bpf
> <bpf@vger.kernel.org>; mlxsw <mlxsw@nvidia.com>
> Subject: Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
>
> On Tue, Feb 4, 2025 at 3:59 PM Amit Cohen <amcohen@nvidia.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Sent: Tuesday, 4 February 2025 17:56
> > > To: Petr Machata <petrm@nvidia.com>
> > > Cc: David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> > > <pabeni@redhat.com>; Andrew Lunn <andrew+netdev@lunn.ch>; Network Development <netdev@vger.kernel.org>; Amit Cohen
> > > <amcohen@nvidia.com>; Ido Schimmel <idosch@nvidia.com>; Alexei Starovoitov <ast@kernel.org>; Daniel Borkmann
> > > <daniel@iogearbox.net>; Jesper Dangaard Brouer <hawk@kernel.org>; John Fastabend <john.fastabend@gmail.com>; bpf
> > > <bpf@vger.kernel.org>; mlxsw <mlxsw@nvidia.com>
> > > Subject: Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
> > >
> > > On Tue, Feb 4, 2025 at 11:06 AM Petr Machata <petrm@nvidia.com> wrote:
> > > >
> > > > Amit Cohen writes:
> > > >
> > > > A future patch set will add support for XDP in mlxsw driver. This set adds
> > > > some preparations.
> > >
> > > Why?
> > > What is the goal here?
> > > My understanding is that mlxsw is a hw switch and skb-s are used to
> > > implement tap functionality for few listeners.
> > > The volume of such packets is supposed to be small.
> > > Even if XDP is added there is a huge mismatch in packet rates.
> > > Hence the question. Why bother?
> >
> > You're right, most of packets should be handled by HW, XDP is mainly useful for telemetry.
>
> Why skb path is not enough?
We get better packet rates using XDP; this can be useful, for example, to redirect packets to a server for analysis.
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-04 17:26 ` Amit Cohen
@ 2025-02-05 17:09 ` Jakub Kicinski
2025-02-15 14:02 ` Simon Horman
0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2025-02-05 17:09 UTC (permalink / raw)
To: Amit Cohen
Cc: Alexei Starovoitov, Petr Machata, David S. Miller, Eric Dumazet,
Paolo Abeni, Andrew Lunn, Network Development, Ido Schimmel,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, bpf, mlxsw
On Tue, 4 Feb 2025 17:26:43 +0000 Amit Cohen wrote:
> > > You're right, most of packets should be handled by HW, XDP is
> > > mainly useful for telemetry.
> >
> > Why skb path is not enough?
>
> We get better packet rates using XDP, this can be useful to redirect
> packets to a server for analysis for example.
TBH I also feel a little ambivalent about adding advanced software
features to mlxsw. You have a dummy device off which you hang the NAPIs,
the page pools, and now the RXQ objects. That already works poorly with
our APIs. How are you going to handle the XDP side? Program per port,
I hope? But the basic fact remains that only fallback traffic goes thru
the XDP program which is not the normal Linux model, routing is after
XDP.
On one hand it'd be great if upstream switch drivers could benefit from
the advanced features. On the other the HW is clearly not capable of
delivering in line with how NICs work, so we're signing up for a stream
of corner cases, bugs and incompatibility. Dunno.
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-05 17:09 ` Jakub Kicinski
@ 2025-02-15 14:02 ` Simon Horman
2025-02-15 16:10 ` Jakub Kicinski
0 siblings, 1 reply; 22+ messages in thread
From: Simon Horman @ 2025-02-15 14:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Amit Cohen, Alexei Starovoitov, Petr Machata, David S. Miller,
Eric Dumazet, Paolo Abeni, Andrew Lunn, Network Development,
Ido Schimmel, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, bpf, mlxsw
On Wed, Feb 05, 2025 at 09:09:58AM -0800, Jakub Kicinski wrote:
> On Tue, 4 Feb 2025 17:26:43 +0000 Amit Cohen wrote:
> > > > You're right, most of packets should be handled by HW, XDP is
> > > > mainly useful for telemetry.
> > >
> > > Why skb path is not enough?
> >
> > We get better packet rates using XDP, this can be useful to redirect
> > packets to a server for analysis for example.
>
> TBH I also feel a little ambivalent about adding advanced software
> features to mlxsw. You have a dummy device off which you hang the NAPIs,
> the page pools, and now the RXQ objects. That already works poorly with
> our APIs. How are you going to handle the XDP side? Program per port,
> I hope? But the basic fact remains that only fallback traffic goes thru
> the XDP program which is not the normal Linux model, routing is after
> XDP.
>
> On one hand it'd be great if upstream switch drivers could benefit from
> the advanced features. On the other the HW is clearly not capable of
> delivering in line with how NICs work, so we're signing up for a stream
> of corner cases, bugs and incompatibility. Dunno.
FWIW, I do think that as this driver is actively maintained by the vendor,
and this is a grey zone, it is reasonable to allow the vendor to decide if
they want the burden of this complexity to gain some performance.
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-15 14:02 ` Simon Horman
@ 2025-02-15 16:10 ` Jakub Kicinski
2025-02-16 9:26 ` Simon Horman
2025-02-17 9:35 ` Ido Schimmel
0 siblings, 2 replies; 22+ messages in thread
From: Jakub Kicinski @ 2025-02-15 16:10 UTC (permalink / raw)
To: Simon Horman
Cc: Amit Cohen, Alexei Starovoitov, Petr Machata, David S. Miller,
Eric Dumazet, Paolo Abeni, Andrew Lunn, Network Development,
Ido Schimmel, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, bpf, mlxsw
On Sat, 15 Feb 2025 14:02:52 +0000 Simon Horman wrote:
> > TBH I also feel a little ambivalent about adding advanced software
> > features to mlxsw. You have a dummy device off which you hang the NAPIs,
> > the page pools, and now the RXQ objects. That already works poorly with
> > our APIs. How are you going to handle the XDP side? Program per port,
> > I hope? But the basic fact remains that only fallback traffic goes thru
> > the XDP program which is not the normal Linux model, routing is after
> > XDP.
> >
> > On one hand it'd be great if upstream switch drivers could benefit from
> > the advanced features. On the other the HW is clearly not capable of
> > delivering in line with how NICs work, so we're signing up for a stream
> > of corner cases, bugs and incompatibility. Dunno.
>
> FWIIW, I do think that as this driver is actively maintained by the vendor,
> and this is a grey zone, it is reasonable to allow the vendor to decide if
> they want the burden of this complexity to gain some performance.
Yes, I left this series in PW for an extra couple of days expecting
a discussion but I suppose my email was taken as a final judgment.
The object separation can be faked more accurately, and analyzed
(in the cover letter) to give us more confidence that the divergence
won't create problems.
The "actively maintained" part is true and very much appreciated, but
it's both something that may easily change, and is hard to objectively
adjudicate. Reporting results to the upstream CI would be much more
objective and hopefully easier to maintain, were the folks supporting
mlxsw to "join a startup", or otherwise disengage.
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-15 16:10 ` Jakub Kicinski
@ 2025-02-16 9:26 ` Simon Horman
2025-02-17 9:35 ` Ido Schimmel
1 sibling, 0 replies; 22+ messages in thread
From: Simon Horman @ 2025-02-16 9:26 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Amit Cohen, Alexei Starovoitov, Petr Machata, David S. Miller,
Eric Dumazet, Paolo Abeni, Andrew Lunn, Network Development,
Ido Schimmel, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, bpf, mlxsw
On Sat, Feb 15, 2025 at 08:10:43AM -0800, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 14:02:52 +0000 Simon Horman wrote:
> > > TBH I also feel a little ambivalent about adding advanced software
> > > features to mlxsw. You have a dummy device off which you hang the NAPIs,
> > > the page pools, and now the RXQ objects. That already works poorly with
> > > our APIs. How are you going to handle the XDP side? Program per port,
> > > I hope? But the basic fact remains that only fallback traffic goes thru
> > > the XDP program which is not the normal Linux model, routing is after
> > > XDP.
> > >
> > > On one hand it'd be great if upstream switch drivers could benefit from
> > > the advanced features. On the other the HW is clearly not capable of
> > > delivering in line with how NICs work, so we're signing up for a stream
> > > of corner cases, bugs and incompatibility. Dunno.
> >
> > FWIW, I do think that as this driver is actively maintained by the vendor,
> > and this is a grey zone, it is reasonable to allow the vendor to decide if
> > they want the burden of this complexity to gain some performance.
>
> Yes, I left this series in PW for an extra couple of days expecting
> a discussion but I suppose my email was taken as a final judgment.
Yes, I was trying to spur that discussion.
> The object separation can be faked more accurately, and analyzed
> (in the cover letter) to give us more confidence that the divergence
> won't create problems.
>
> The "actively maintained" part is true and very much appreciated, but
> it's both something that may easily change, and is hard to objectively
> adjudicate. Reporting results to the upstream CI would be much more
> objective and hopefully easier to maintain, were the folks supporting
> mlxsw to "join a startup", or otherwise disengage.
A good point. Things can change. And that may leave upstream maintainers
carrying the can.
* Re: [PATCH net-next 00/12] mlxsw: Preparations for XDP support
2025-02-15 16:10 ` Jakub Kicinski
2025-02-16 9:26 ` Simon Horman
@ 2025-02-17 9:35 ` Ido Schimmel
1 sibling, 0 replies; 22+ messages in thread
From: Ido Schimmel @ 2025-02-17 9:35 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Simon Horman, Amit Cohen, Alexei Starovoitov, Petr Machata,
David S. Miller, Eric Dumazet, Paolo Abeni, Andrew Lunn,
Network Development, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, bpf, mlxsw
On Sat, Feb 15, 2025 at 08:10:43AM -0800, Jakub Kicinski wrote:
> On Sat, 15 Feb 2025 14:02:52 +0000 Simon Horman wrote:
> > > TBH I also feel a little ambivalent about adding advanced software
> > > features to mlxsw. You have a dummy device off which you hang the NAPIs,
> > > the page pools, and now the RXQ objects. That already works poorly with
> > > our APIs. How are you going to handle the XDP side? Program per port,
> > > I hope? But the basic fact remains that only fallback traffic goes thru
> > > the XDP program which is not the normal Linux model, routing is after
> > > XDP.
> > >
> > > On one hand it'd be great if upstream switch drivers could benefit from
> > > the advanced features. On the other the HW is clearly not capable of
> > > delivering in line with how NICs work, so we're signing up for a stream
> > > of corner cases, bugs and incompatibility. Dunno.
> >
> > FWIW, I do think that as this driver is actively maintained by the vendor,
> > and this is a grey zone, it is reasonable to allow the vendor to decide if
> > they want the burden of this complexity to gain some performance.
>
> Yes, I left this series in PW for an extra couple of days expecting
> a discussion but I suppose my email was taken as a final judgment.
Yes.
> The object separation can be faked more accurately, and analyzed
> (in the cover letter) to give us more confidence that the divergence
> won't create problems.
Unlike regular NICs, this device has more ports than Rx queues, so we
cannot associate an Rx queue with a net device. Like you said, this is
why NAPI instances and RXQ objects are associated with a dummy net
device. However, there are already drivers such as mtk that have the
same problem and do the same thing. The only API change that we made in
this regard is adding a net device argument to xdp_build_skb_from_buff()
instead of having it use rxq->dev.
Regarding the invocation of XDP programs, they are of course invoked on
a per-port basis. It's just that the driver first needs to look up the
XDP program in an internal array based on the Rx port in the completion
info.
Regarding motivation, one use case we thought about is telemetry. For
example, today you can configure a tc filter with a sample action that
will mirror one out of N packets to the CPU. The driver identifies such
packets according to the trap ID in the completion info and then passes
them to the psample module with various metadata that it extracted from
the completion info (e.g., latency, egress queue occupancy, if sampled
on egress). Some users don't want to process these packets locally, but
instead have them sent together with the metadata to a server for
processing. If XDP programs had access to this metadata we could do this
on the CPU with relatively low overhead. However, this is not supported
with tc-bpf, so you might tell me that it shouldn't be supported with
XDP either.