* [PATCH net-next v2 0/3] Add native mode XDP support
@ 2025-02-10 10:33 Meghana Malladi
From: Meghana Malladi @ 2025-02-10 10:33 UTC
To: rogerq, danishanwar, pabeni, kuba, edumazet, davem, andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, m-malladi, schnelle, glaroque,
rdunlap, diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast,
srk, Vignesh Raghavendra
This series adds native XDP support using page_pool.
XDP zero copy support is not included in this patch series.
Patch 1/3: Replaces skb with page pool for Rx buffer allocation
Patch 2/3: Adds prueth_swdata struct for SWDATA for all swdata cases
Patch 3/3: Introduces native mode XDP support
v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
Changes since v1 (v2-v1):
- Add missing SoBs for all patches
1/3:
- Recycle pages wherever necessary using skb_mark_for_recycle()
- Use napi_build_skb() instead of build_skb()
- Update with correct frag_size argument in napi_build_skb()
- Use napi_gro_receive() instead of netif_receive_skb()
- Use PP_FLAG_DMA_SYNC_DEV to enable DMA sync with device
- Use page_pool_dma_sync_for_cpu() to sync Rx page pool for CPU
3/3:
- Fix XDP typo in the commit message
- Add XDP feature flags using xdp_set_features_flag()
- Use xdp_build_skb_from_buff() when an XDP program has run
All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
Roger Quadros (3):
net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
net: ti: icssg-prueth: introduce and use prueth_swdata struct for
SWDATA
net: ti: icssg-prueth: Add XDP support
drivers/net/ethernet/ti/Kconfig | 1 +
drivers/net/ethernet/ti/icssg/icssg_common.c | 427 ++++++++++++++----
drivers/net/ethernet/ti/icssg/icssg_config.h | 2 +-
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 129 +++++-
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 50 +-
.../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 23 +-
6 files changed, 540 insertions(+), 92 deletions(-)
base-commit: acdefab0dcbc3833b5a734ab80d792bb778517a0
--
2.43.0
* [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
From: Meghana Malladi @ 2025-02-10 10:33 UTC
From: Roger Quadros <rogerq@kernel.org>
This is to prepare for native XDP support.
The page pool API is faster at allocating pages than
__alloc_skb(). The drawback is that it works at PAGE_SIZE granularity,
so we are not efficient in memory usage,
i.e. we use PAGE_SIZE (4KB) of memory for a 1.5KB max packet size.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
---
v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
Changes since v1 (v2-v1):
- Recycle pages wherever necessary using skb_mark_for_recycle()
- Use napi_build_skb() instead of build_skb()
- Update with correct frag_size argument in napi_build_skb()
- Use napi_gro_receive() instead of netif_receive_skb()
- Use PP_FLAG_DMA_SYNC_DEV to enable DMA sync with device
- Use page_pool_dma_sync_for_cpu() to sync Rx page pool for CPU
All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
drivers/net/ethernet/ti/Kconfig | 1 +
drivers/net/ethernet/ti/icssg/icssg_common.c | 174 ++++++++++++------
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 14 +-
.../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 21 ++-
4 files changed, 140 insertions(+), 70 deletions(-)
diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 0d5a862cd78a..b461281d31b6 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -204,6 +204,7 @@ config TI_ICSSG_PRUETH_SR1
select PHYLIB
select TI_ICSS_IEP
select TI_K3_CPPI_DESC_POOL
+ select PAGE_POOL
depends on PRU_REMOTEPROC
depends on NET_SWITCHDEV
depends on ARCH_K3 && OF && TI_K3_UDMA_GLUE_LAYER
diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index 74f0f200a89d..c3c1e2bf461e 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -45,6 +45,11 @@ void prueth_cleanup_rx_chns(struct prueth_emac *emac,
struct prueth_rx_chn *rx_chn,
int max_rflows)
{
+ if (rx_chn->pg_pool) {
+ page_pool_destroy(rx_chn->pg_pool);
+ rx_chn->pg_pool = NULL;
+ }
+
if (rx_chn->desc_pool)
k3_cppi_desc_pool_destroy(rx_chn->desc_pool);
@@ -461,17 +466,17 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
}
EXPORT_SYMBOL_GPL(prueth_init_rx_chns);
-int prueth_dma_rx_push(struct prueth_emac *emac,
- struct sk_buff *skb,
- struct prueth_rx_chn *rx_chn)
+int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
+ struct prueth_rx_chn *rx_chn,
+ struct page *page, u32 buf_len)
{
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_rx;
- u32 pkt_len = skb_tailroom(skb);
dma_addr_t desc_dma;
dma_addr_t buf_dma;
void **swdata;
+ buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
if (!desc_rx) {
netdev_err(ndev, "rx push: failed to allocate descriptor\n");
@@ -479,25 +484,18 @@ int prueth_dma_rx_push(struct prueth_emac *emac,
}
desc_dma = k3_cppi_desc_pool_virt2dma(rx_chn->desc_pool, desc_rx);
- buf_dma = dma_map_single(rx_chn->dma_dev, skb->data, pkt_len, DMA_FROM_DEVICE);
- if (unlikely(dma_mapping_error(rx_chn->dma_dev, buf_dma))) {
- k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
- netdev_err(ndev, "rx push: failed to map rx pkt buffer\n");
- return -EINVAL;
- }
-
cppi5_hdesc_init(desc_rx, CPPI5_INFO0_HDESC_EPIB_PRESENT,
PRUETH_NAV_PS_DATA_SIZE);
k3_udma_glue_rx_dma_to_cppi5_addr(rx_chn->rx_chn, &buf_dma);
- cppi5_hdesc_attach_buf(desc_rx, buf_dma, skb_tailroom(skb), buf_dma, skb_tailroom(skb));
+ cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
swdata = cppi5_hdesc_get_swdata(desc_rx);
- *swdata = skb;
+ *swdata = page;
- return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, 0,
+ return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
desc_rx, desc_dma);
}
-EXPORT_SYMBOL_GPL(prueth_dma_rx_push);
+EXPORT_SYMBOL_GPL(prueth_dma_rx_push_mapped);
u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns)
{
@@ -541,12 +539,16 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
u32 buf_dma_len, pkt_len, port_id = 0;
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_rx;
- struct sk_buff *skb, *new_skb;
dma_addr_t desc_dma, buf_dma;
+ struct page *page, *new_page;
+ struct page_pool *pool;
+ struct sk_buff *skb;
void **swdata;
u32 *psdata;
+ void *pa;
int ret;
+ pool = rx_chn->pg_pool;
ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
if (ret) {
if (ret != -ENODATA)
@@ -558,15 +560,10 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
return 0;
desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
-
swdata = cppi5_hdesc_get_swdata(desc_rx);
- skb = *swdata;
-
- psdata = cppi5_hdesc_get_psdata(desc_rx);
- /* RX HW timestamp */
- if (emac->rx_ts_enabled)
- emac_rx_timestamp(emac, skb, psdata);
+ page = *swdata;
+ page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
@@ -574,32 +571,51 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
pkt_len -= 4;
cppi5_desc_get_tags_ids(&desc_rx->hdr, &port_id, NULL);
- dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
- skb->dev = ndev;
- new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
/* if allocation fails we drop the packet but push the
- * descriptor back to the ring with old skb to prevent a stall
+ * descriptor back to the ring with old page to prevent a stall
*/
- if (!new_skb) {
+ new_page = page_pool_dev_alloc_pages(pool);
+ if (unlikely(!new_page)) {
+ new_page = page;
ndev->stats.rx_dropped++;
- new_skb = skb;
- } else {
- /* send the filled skb up the n/w stack */
- skb_put(skb, pkt_len);
- if (emac->prueth->is_switch_mode)
- skb->offload_fwd_mark = emac->offload_fwd_mark;
- skb->protocol = eth_type_trans(skb, ndev);
- napi_gro_receive(&emac->napi_rx, skb);
- ndev->stats.rx_bytes += pkt_len;
- ndev->stats.rx_packets++;
+ goto requeue;
+ }
+
+ /* prepare skb and send to n/w stack */
+ pa = page_address(page);
+ skb = napi_build_skb(pa, PAGE_SIZE);
+ if (!skb) {
+ ndev->stats.rx_dropped++;
+ page_pool_recycle_direct(pool, page);
+ goto requeue;
}
+ skb_reserve(skb, PRUETH_HEADROOM);
+ skb_put(skb, pkt_len);
+ skb->dev = ndev;
+
+ psdata = cppi5_hdesc_get_psdata(desc_rx);
+ /* RX HW timestamp */
+ if (emac->rx_ts_enabled)
+ emac_rx_timestamp(emac, skb, psdata);
+
+ if (emac->prueth->is_switch_mode)
+ skb->offload_fwd_mark = emac->offload_fwd_mark;
+ skb->protocol = eth_type_trans(skb, ndev);
+
+ skb_mark_for_recycle(skb);
+ napi_gro_receive(&emac->napi_rx, skb);
+ ndev->stats.rx_bytes += pkt_len;
+ ndev->stats.rx_packets++;
+
+requeue:
/* queue another RX DMA */
- ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_chns);
+ ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
+ PRUETH_MAX_PKT_SIZE);
if (WARN_ON(ret < 0)) {
- dev_kfree_skb_any(new_skb);
+ page_pool_recycle_direct(pool, new_page);
ndev->stats.rx_errors++;
ndev->stats.rx_dropped++;
}
@@ -611,22 +627,17 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
{
struct prueth_rx_chn *rx_chn = data;
struct cppi5_host_desc_t *desc_rx;
- struct sk_buff *skb;
- dma_addr_t buf_dma;
- u32 buf_dma_len;
+ struct page_pool *pool;
+ struct page *page;
void **swdata;
+ pool = rx_chn->pg_pool;
+
desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
swdata = cppi5_hdesc_get_swdata(desc_rx);
- skb = *swdata;
- cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
- k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
-
- dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len,
- DMA_FROM_DEVICE);
+ page = *swdata;
+ page_pool_recycle_direct(pool, page);
k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
-
- dev_kfree_skb_any(skb);
}
static int prueth_tx_ts_cookie_get(struct prueth_emac *emac)
@@ -907,29 +918,71 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
}
EXPORT_SYMBOL_GPL(icssg_napi_rx_poll);
+static struct page_pool *prueth_create_page_pool(struct prueth_emac *emac,
+ struct device *dma_dev,
+ int size)
+{
+ struct page_pool_params pp_params = { 0 };
+ struct page_pool *pool;
+
+ pp_params.order = 0;
+ pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+ pp_params.pool_size = size;
+ pp_params.nid = dev_to_node(emac->prueth->dev);
+ pp_params.dma_dir = DMA_BIDIRECTIONAL;
+ pp_params.dev = dma_dev;
+ pp_params.napi = &emac->napi_rx;
+ pp_params.max_len = PAGE_SIZE;
+
+ pool = page_pool_create(&pp_params);
+ if (IS_ERR(pool))
+ netdev_err(emac->ndev, "cannot create rx page pool\n");
+
+ return pool;
+}
+
int prueth_prepare_rx_chan(struct prueth_emac *emac,
struct prueth_rx_chn *chn,
int buf_size)
{
- struct sk_buff *skb;
+ struct page_pool *pool;
+ struct page *page;
int i, ret;
+ pool = prueth_create_page_pool(emac, chn->dma_dev, chn->descs_num);
+ if (IS_ERR(pool))
+ return PTR_ERR(pool);
+
+ chn->pg_pool = pool;
+
for (i = 0; i < chn->descs_num; i++) {
- skb = __netdev_alloc_skb_ip_align(NULL, buf_size, GFP_KERNEL);
- if (!skb)
- return -ENOMEM;
+ /* NOTE: we're not using memory efficiently here.
+ * 1 full page (4KB?) used here instead of
+ * PRUETH_MAX_PKT_SIZE (~1.5KB?)
+ */
+ page = page_pool_dev_alloc_pages(pool);
+ if (!page) {
+ netdev_err(emac->ndev, "couldn't allocate rx page\n");
+ ret = -ENOMEM;
+ goto recycle_alloc_pg;
+ }
- ret = prueth_dma_rx_push(emac, skb, chn);
+ ret = prueth_dma_rx_push_mapped(emac, chn, page, buf_size);
if (ret < 0) {
netdev_err(emac->ndev,
- "cannot submit skb for rx chan %s ret %d\n",
+ "cannot submit page for rx chan %s ret %d\n",
chn->name, ret);
- kfree_skb(skb);
- return ret;
+ page_pool_recycle_direct(pool, page);
+ goto recycle_alloc_pg;
}
}
return 0;
+
+recycle_alloc_pg:
+ prueth_reset_rx_chan(&emac->rx_chns, PRUETH_MAX_RX_FLOWS, false);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(prueth_prepare_rx_chan);
@@ -958,6 +1011,9 @@ void prueth_reset_rx_chan(struct prueth_rx_chn *chn,
prueth_rx_cleanup, !!i);
if (disable)
k3_udma_glue_disable_rx_chn(chn->rx_chn);
+
+ page_pool_destroy(chn->pg_pool);
+ chn->pg_pool = NULL;
}
EXPORT_SYMBOL_GPL(prueth_reset_rx_chan);
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
index 329b46e9ee53..c7b906de18af 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
@@ -33,6 +33,8 @@
#include <linux/dma/k3-udma-glue.h>
#include <net/devlink.h>
+#include <net/xdp.h>
+#include <net/page_pool/helpers.h>
#include "icssg_config.h"
#include "icss_iep.h"
@@ -131,6 +133,7 @@ struct prueth_rx_chn {
u32 descs_num;
unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
char name[32];
+ struct page_pool *pg_pool;
};
/* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
@@ -210,6 +213,10 @@ struct prueth_emac {
struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
};
+/* The buf includes headroom compatible with both skb and xdpf */
+#define PRUETH_HEADROOM_NA (max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN)
+#define PRUETH_HEADROOM ALIGN(PRUETH_HEADROOM_NA, sizeof(long))
+
/**
* struct prueth_pdata - PRUeth platform data
* @fdqring_mode: Free desc queue mode
@@ -410,9 +417,10 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
struct prueth_rx_chn *rx_chn,
char *name, u32 max_rflows,
u32 max_desc_num);
-int prueth_dma_rx_push(struct prueth_emac *emac,
- struct sk_buff *skb,
- struct prueth_rx_chn *rx_chn);
+int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
+ struct prueth_rx_chn *rx_chn,
+ struct page *page, u32 buf_len);
+unsigned int prueth_rxbuf_total_len(unsigned int len);
void emac_rx_timestamp(struct prueth_emac *emac,
struct sk_buff *skb, u32 *psdata);
enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev);
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
index 64a19ff39562..aeeb8a50376b 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
@@ -274,10 +274,12 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
struct prueth_rx_chn *rx_chn = &emac->rx_mgm_chn;
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_rx;
- struct sk_buff *skb, *new_skb;
+ struct page *page, *new_page;
dma_addr_t desc_dma, buf_dma;
u32 buf_dma_len, pkt_len;
+ struct sk_buff *skb;
void **swdata;
+ void *pa;
int ret;
ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
@@ -299,32 +301,35 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
}
swdata = cppi5_hdesc_get_swdata(desc_rx);
- skb = *swdata;
+ page = *swdata;
cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
- new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
+ new_page = page_pool_dev_alloc_pages(rx_chn->pg_pool);
/* if allocation fails we drop the packet but push the
* descriptor back to the ring with old skb to prevent a stall
*/
- if (!new_skb) {
+ if (!new_page) {
netdev_err(ndev,
- "skb alloc failed, dropped mgm pkt from flow %d\n",
+ "page alloc failed, dropped mgm pkt from flow %d\n",
flow_id);
- new_skb = skb;
+ new_page = page;
skb = NULL; /* return NULL */
} else {
/* return the filled skb */
+ pa = page_address(page);
+ skb = napi_build_skb(pa, PAGE_SIZE);
skb_put(skb, pkt_len);
}
/* queue another DMA */
- ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_mgm_chn);
+ ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
+ PRUETH_MAX_PKT_SIZE);
if (WARN_ON(ret < 0))
- dev_kfree_skb_any(new_skb);
+ page_pool_recycle_direct(rx_chn->pg_pool, new_page);
return skb;
}
--
2.43.0
* [PATCH net-next v2 2/3] net: ti: icssg-prueth: introduce and use prueth_swdata struct for SWDATA
From: Meghana Malladi @ 2025-02-10 10:33 UTC
From: Roger Quadros <rogerq@kernel.org>
We have different cases for SWDATA (skb, page, cmd, etc.),
so it is better to have a dedicated data structure for it.
We can embed the type field inside the struct and use it
to interpret the data in completion handlers.
Increase the SWDATA size to 48 bytes so we have some room to add
more data if required.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
---
v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
Changes since v1 (v2-v1): None
drivers/net/ethernet/ti/icssg/icssg_common.c | 47 ++++++++++++-------
drivers/net/ethernet/ti/icssg/icssg_config.h | 2 +-
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 6 +++
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 18 +++++++
.../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 4 +-
5 files changed, 58 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index c3c1e2bf461e..a124c5773551 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -136,12 +136,12 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_tx;
struct netdev_queue *netif_txq;
+ struct prueth_swdata *swdata;
struct prueth_tx_chn *tx_chn;
unsigned int total_bytes = 0;
struct sk_buff *skb;
dma_addr_t desc_dma;
int res, num_tx = 0;
- void **swdata;
tx_chn = &emac->tx_chns[chn];
@@ -163,12 +163,18 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
swdata = cppi5_hdesc_get_swdata(desc_tx);
/* was this command's TX complete? */
- if (emac->is_sr1 && *(swdata) == emac->cmd_data) {
+ if (emac->is_sr1 && (void *)(swdata) == emac->cmd_data) {
prueth_xmit_free(tx_chn, desc_tx);
continue;
}
- skb = *(swdata);
+ if (swdata->type != PRUETH_SWDATA_SKB) {
+ netdev_err(ndev, "tx_complete: invalid swdata type %d\n", swdata->type);
+ budget++;
+ continue;
+ }
+
+ skb = swdata->data.skb;
prueth_xmit_free(tx_chn, desc_tx);
ndev = skb->dev;
@@ -472,9 +478,9 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
{
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_rx;
+ struct prueth_swdata *swdata;
dma_addr_t desc_dma;
dma_addr_t buf_dma;
- void **swdata;
buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
@@ -490,7 +496,8 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
swdata = cppi5_hdesc_get_swdata(desc_rx);
- *swdata = page;
+ swdata->type = PRUETH_SWDATA_PAGE;
+ swdata->data.page = page;
return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
desc_rx, desc_dma);
@@ -539,11 +546,11 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
u32 buf_dma_len, pkt_len, port_id = 0;
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_rx;
+ struct prueth_swdata *swdata;
dma_addr_t desc_dma, buf_dma;
struct page *page, *new_page;
struct page_pool *pool;
struct sk_buff *skb;
- void **swdata;
u32 *psdata;
void *pa;
int ret;
@@ -561,7 +568,11 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
swdata = cppi5_hdesc_get_swdata(desc_rx);
- page = *swdata;
+ if (swdata->type != PRUETH_SWDATA_PAGE) {
+ netdev_err(ndev, "rx_pkt: invalid swdata->type %d\n", swdata->type);
+ return 0;
+ }
+ page = swdata->data.page;
page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
@@ -627,16 +638,18 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
{
struct prueth_rx_chn *rx_chn = data;
struct cppi5_host_desc_t *desc_rx;
+ struct prueth_swdata *swdata;
struct page_pool *pool;
struct page *page;
- void **swdata;
pool = rx_chn->pg_pool;
desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
swdata = cppi5_hdesc_get_swdata(desc_rx);
- page = *swdata;
- page_pool_recycle_direct(pool, page);
+ if (swdata->type == PRUETH_SWDATA_PAGE) {
+ page = swdata->data.page;
+ page_pool_recycle_direct(pool, page);
+ }
k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
}
@@ -673,13 +686,13 @@ enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev
struct prueth_emac *emac = netdev_priv(ndev);
struct prueth *prueth = emac->prueth;
struct netdev_queue *netif_txq;
+ struct prueth_swdata *swdata;
struct prueth_tx_chn *tx_chn;
dma_addr_t desc_dma, buf_dma;
u32 pkt_len, dst_tag_id;
int i, ret = 0, q_idx;
bool in_tx_ts = 0;
int tx_ts_cookie;
- void **swdata;
u32 *epib;
pkt_len = skb_headlen(skb);
@@ -741,7 +754,8 @@ enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev
k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
cppi5_hdesc_attach_buf(first_desc, buf_dma, pkt_len, buf_dma, pkt_len);
swdata = cppi5_hdesc_get_swdata(first_desc);
- *swdata = skb;
+ swdata->type = PRUETH_SWDATA_SKB;
+ swdata->data.skb = skb;
/* Handle the case where skb is fragmented in pages */
cur_desc = first_desc;
@@ -844,15 +858,16 @@ static void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
{
struct prueth_tx_chn *tx_chn = data;
struct cppi5_host_desc_t *desc_tx;
+ struct prueth_swdata *swdata;
struct sk_buff *skb;
- void **swdata;
desc_tx = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool, desc_dma);
swdata = cppi5_hdesc_get_swdata(desc_tx);
- skb = *(swdata);
+ if (swdata->type == PRUETH_SWDATA_SKB) {
+ skb = swdata->data.skb;
+ dev_kfree_skb_any(skb);
+ }
prueth_xmit_free(tx_chn, desc_tx);
-
- dev_kfree_skb_any(skb);
}
irqreturn_t prueth_rx_irq(int irq, void *dev_id)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_config.h b/drivers/net/ethernet/ti/icssg/icssg_config.h
index c884e9fa099e..eab84e11d80e 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_config.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_config.h
@@ -20,7 +20,7 @@ struct icssg_flow_cfg {
#define PRUETH_PKT_TYPE_CMD 0x10
#define PRUETH_NAV_PS_DATA_SIZE 16 /* Protocol specific data size */
-#define PRUETH_NAV_SW_DATA_SIZE 16 /* SW related data size */
+#define PRUETH_NAV_SW_DATA_SIZE 48 /* SW related data size */
#define PRUETH_MAX_TX_DESC 512
#define PRUETH_MAX_RX_DESC 512
#define PRUETH_MAX_RX_FLOWS 1 /* excluding default flow */
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
index 00ed97860547..e5e4efe485f6 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
@@ -1522,6 +1522,12 @@ static int prueth_probe(struct platform_device *pdev)
np = dev->of_node;
+ if (sizeof(struct prueth_swdata) > PRUETH_NAV_SW_DATA_SIZE) {
+ dev_err(dev, "insufficient SW_DATA size: %d vs %ld\n",
+ PRUETH_NAV_SW_DATA_SIZE, sizeof(struct prueth_swdata));
+ return -ENOMEM;
+ }
+
prueth = devm_kzalloc(dev, sizeof(*prueth), GFP_KERNEL);
if (!prueth)
return -ENOMEM;
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
index c7b906de18af..2c8585255b7c 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
@@ -136,6 +136,24 @@ struct prueth_rx_chn {
struct page_pool *pg_pool;
};
+enum prueth_swdata_type {
+ PRUETH_SWDATA_INVALID = 0,
+ PRUETH_SWDATA_SKB,
+ PRUETH_SWDATA_PAGE,
+ PRUETH_SWDATA_CMD,
+};
+
+union prueth_data {
+ struct sk_buff *skb;
+ struct page *page;
+ u32 cmd;
+};
+
+struct prueth_swdata {
+ union prueth_data data;
+ enum prueth_swdata_type type;
+};
+
/* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
* and lower three are lower priority channels or threads.
*/
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
index aeeb8a50376b..7bbe0808b3ec 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
@@ -275,10 +275,10 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
struct net_device *ndev = emac->ndev;
struct cppi5_host_desc_t *desc_rx;
struct page *page, *new_page;
+ struct prueth_swdata *swdata;
dma_addr_t desc_dma, buf_dma;
u32 buf_dma_len, pkt_len;
struct sk_buff *skb;
- void **swdata;
void *pa;
int ret;
@@ -301,7 +301,7 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
}
swdata = cppi5_hdesc_get_swdata(desc_rx);
- page = *swdata;
+ page = swdata->data.page;
cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
--
2.43.0
* [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
From: Meghana Malladi @ 2025-02-10 10:33 UTC
From: Roger Quadros <rogerq@kernel.org>
Add native XDP support. We do not support zero copy yet.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
---
v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
Changes since v1 (v2-v1):
- Fix XDP typo in the commit message
- Add XDP feature flags using xdp_set_features_flag()
- Use xdp_build_skb_from_buff() when an XDP program has run
All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
drivers/net/ethernet/ti/icssg/icssg_common.c | 226 +++++++++++++++++--
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 123 +++++++++-
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 18 ++
3 files changed, 353 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
index a124c5773551..b01750a2d57e 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_common.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
@@ -98,11 +98,19 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
{
struct cppi5_host_desc_t *first_desc, *next_desc;
dma_addr_t buf_dma, next_desc_dma;
+ struct prueth_swdata *swdata;
u32 buf_dma_len;
first_desc = desc;
next_desc = first_desc;
+ swdata = cppi5_hdesc_get_swdata(desc);
+ if (swdata->type == PRUETH_SWDATA_PAGE) {
+ page_pool_recycle_direct(swdata->rx_chn->pg_pool,
+ swdata->data.page);
+ goto free_desc;
+ }
+
cppi5_hdesc_get_obuf(first_desc, &buf_dma, &buf_dma_len);
k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &buf_dma);
@@ -126,6 +134,7 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
k3_cppi_desc_pool_free(tx_chn->desc_pool, next_desc);
}
+free_desc:
k3_cppi_desc_pool_free(tx_chn->desc_pool, first_desc);
}
EXPORT_SYMBOL_GPL(prueth_xmit_free);
@@ -139,6 +148,7 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
struct prueth_swdata *swdata;
struct prueth_tx_chn *tx_chn;
unsigned int total_bytes = 0;
+ struct xdp_frame *xdpf;
struct sk_buff *skb;
dma_addr_t desc_dma;
int res, num_tx = 0;
@@ -168,20 +178,29 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
continue;
}
- if (swdata->type != PRUETH_SWDATA_SKB) {
+ switch (swdata->type) {
+ case PRUETH_SWDATA_SKB:
+ skb = swdata->data.skb;
+ ndev->stats.tx_bytes += skb->len;
+ ndev->stats.tx_packets++;
+ total_bytes += skb->len;
+ napi_consume_skb(skb, budget);
+ break;
+ case PRUETH_SWDATA_XDPF:
+ xdpf = swdata->data.xdpf;
+ ndev->stats.tx_bytes += xdpf->len;
+ ndev->stats.tx_packets++;
+ total_bytes += xdpf->len;
+ xdp_return_frame(xdpf);
+ break;
+ default:
netdev_err(ndev, "tx_complete: invalid swdata type %d\n", swdata->type);
+ prueth_xmit_free(tx_chn, desc_tx);
budget++;
continue;
}
- skb = swdata->data.skb;
prueth_xmit_free(tx_chn, desc_tx);
-
- ndev = skb->dev;
- ndev->stats.tx_packets++;
- ndev->stats.tx_bytes += skb->len;
- total_bytes += skb->len;
- napi_consume_skb(skb, budget);
num_tx++;
}
@@ -498,6 +517,7 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
swdata = cppi5_hdesc_get_swdata(desc_rx);
swdata->type = PRUETH_SWDATA_PAGE;
swdata->data.page = page;
+ swdata->rx_chn = rx_chn;
return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
desc_rx, desc_dma);
@@ -540,7 +560,156 @@ void emac_rx_timestamp(struct prueth_emac *emac,
ssh->hwtstamp = ns_to_ktime(ns);
}
-static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
+/**
+ * emac_xmit_xdp_frame - transmits an XDP frame
+ * @emac: emac device
+ * @xdpf: data to transmit
+ * @page: page from page pool if already DMA mapped
+ * @q_idx: queue id
+ *
+ * Return: XDP state
+ */
+int emac_xmit_xdp_frame(struct prueth_emac *emac,
+ struct xdp_frame *xdpf,
+ struct page *page,
+ unsigned int q_idx)
+{
+ struct cppi5_host_desc_t *first_desc;
+ struct net_device *ndev = emac->ndev;
+ struct prueth_tx_chn *tx_chn;
+ dma_addr_t desc_dma, buf_dma;
+ struct prueth_swdata *swdata;
+ u32 *epib;
+ int ret;
+
+ void *data = xdpf->data;
+ u32 pkt_len = xdpf->len;
+
+ if (q_idx >= PRUETH_MAX_TX_QUEUES) {
+ netdev_err(ndev, "xdp tx: invalid q_id %d\n", q_idx);
+ return ICSSG_XDP_CONSUMED; /* drop */
+ }
+
+ tx_chn = &emac->tx_chns[q_idx];
+
+ if (page) { /* already DMA mapped by page_pool */
+ buf_dma = page_pool_get_dma_addr(page);
+ buf_dma += xdpf->headroom + sizeof(struct xdp_frame);
+ } else { /* Map the linear buffer */
+ buf_dma = dma_map_single(tx_chn->dma_dev, data, pkt_len, DMA_TO_DEVICE);
+ if (dma_mapping_error(tx_chn->dma_dev, buf_dma)) {
+ netdev_err(ndev, "xdp tx: failed to map data buffer\n");
+ return ICSSG_XDP_CONSUMED; /* drop */
+ }
+ }
+
+ first_desc = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
+ if (!first_desc) {
+ netdev_dbg(ndev, "xdp tx: failed to allocate descriptor\n");
+ if (!page)
+ dma_unmap_single(tx_chn->dma_dev, buf_dma, pkt_len, DMA_TO_DEVICE);
+ goto drop_free_descs; /* drop */
+ }
+
+ cppi5_hdesc_init(first_desc, CPPI5_INFO0_HDESC_EPIB_PRESENT,
+ PRUETH_NAV_PS_DATA_SIZE);
+ cppi5_hdesc_set_pkttype(first_desc, 0);
+ epib = first_desc->epib;
+ epib[0] = 0;
+ epib[1] = 0;
+
+ /* Set the dst tag: bits 8..15 carry the internal queue id used by
+ * the firmware, while bits 0..7 carry the port number for directed
+ * packets in switch mode operation.
+ */
+ cppi5_desc_set_tags_ids(&first_desc->hdr, 0, (emac->port_id | (q_idx << 8)));
+ k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
+ cppi5_hdesc_attach_buf(first_desc, buf_dma, pkt_len, buf_dma, pkt_len);
+ swdata = cppi5_hdesc_get_swdata(first_desc);
+ if (page) {
+ swdata->type = PRUETH_SWDATA_PAGE;
+ swdata->data.page = page;
+ /* we assume page came from RX channel page pool */
+ swdata->rx_chn = &emac->rx_chns;
+ } else {
+ swdata->type = PRUETH_SWDATA_XDPF;
+ swdata->data.xdpf = xdpf;
+ }
+
+ cppi5_hdesc_set_pktlen(first_desc, pkt_len);
+ desc_dma = k3_cppi_desc_pool_virt2dma(tx_chn->desc_pool, first_desc);
+
+ ret = k3_udma_glue_push_tx_chn(tx_chn->tx_chn, first_desc, desc_dma);
+ if (ret) {
+ netdev_err(ndev, "xdp tx: push failed: %d\n", ret);
+ goto drop_free_descs;
+ }
+
+ return ICSSG_XDP_TX;
+
+drop_free_descs:
+ prueth_xmit_free(tx_chn, first_desc);
+ return ICSSG_XDP_CONSUMED;
+}
+EXPORT_SYMBOL_GPL(emac_xmit_xdp_frame);
+
+/**
+ * emac_run_xdp - run an XDP program
+ * @emac: emac device
+ * @xdp: XDP buffer containing the frame
+ * @page: page with RX data if already DMA mapped
+ *
+ * Return: XDP state
+ */
+static int emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp,
+ struct page *page)
+{
+ int err, result = ICSSG_XDP_PASS;
+ struct bpf_prog *xdp_prog;
+ struct xdp_frame *xdpf;
+ int q_idx;
+ u32 act;
+
+ xdp_prog = READ_ONCE(emac->xdp_prog);
+
+ act = bpf_prog_run_xdp(xdp_prog, xdp);
+ switch (act) {
+ case XDP_PASS:
+ break;
+ case XDP_TX:
+ /* Send packet to TX ring for immediate transmission */
+ xdpf = xdp_convert_buff_to_frame(xdp);
+ if (unlikely(!xdpf))
+ goto drop;
+
+ q_idx = smp_processor_id() % emac->tx_ch_num;
+ result = emac_xmit_xdp_frame(emac, xdpf, page, q_idx);
+ if (result == ICSSG_XDP_CONSUMED)
+ goto drop;
+ break;
+ case XDP_REDIRECT:
+ err = xdp_do_redirect(emac->ndev, xdp, xdp_prog);
+ if (err)
+ goto drop;
+ result = ICSSG_XDP_REDIR;
+ break;
+ default:
+ bpf_warn_invalid_xdp_action(emac->ndev, xdp_prog, act);
+ fallthrough;
+ case XDP_ABORTED:
+drop:
+ trace_xdp_exception(emac->ndev, xdp_prog, act);
+ fallthrough; /* handle aborts by dropping packet */
+ case XDP_DROP:
+ result = ICSSG_XDP_CONSUMED;
+ page_pool_recycle_direct(emac->rx_chns.pg_pool, page);
+ break;
+ }
+
+ return result;
+}
+
+static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id, int *xdp_state)
{
struct prueth_rx_chn *rx_chn = &emac->rx_chns;
u32 buf_dma_len, pkt_len, port_id = 0;
@@ -551,10 +720,12 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
struct page *page, *new_page;
struct page_pool *pool;
struct sk_buff *skb;
+ struct xdp_buff xdp;
u32 *psdata;
void *pa;
int ret;
+ *xdp_state = 0;
pool = rx_chn->pg_pool;
ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
if (ret) {
@@ -594,9 +765,21 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
goto requeue;
}
- /* prepare skb and send to n/w stack */
pa = page_address(page);
- skb = napi_build_skb(pa, PAGE_SIZE);
+ if (emac->xdp_prog) {
+ xdp_init_buff(&xdp, PAGE_SIZE, &rx_chn->xdp_rxq);
+ xdp_prepare_buff(&xdp, pa, PRUETH_HEADROOM, pkt_len, false);
+
+ *xdp_state = emac_run_xdp(emac, &xdp, page);
+ if (*xdp_state == ICSSG_XDP_PASS)
+ skb = xdp_build_skb_from_buff(&xdp);
+ else
+ goto requeue;
+ } else {
+ /* prepare skb and send to n/w stack */
+ skb = napi_build_skb(pa, PAGE_SIZE);
+ }
+
if (!skb) {
ndev->stats.rx_dropped++;
page_pool_recycle_direct(pool, page);
@@ -859,14 +1042,25 @@ static void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
struct prueth_tx_chn *tx_chn = data;
struct cppi5_host_desc_t *desc_tx;
struct prueth_swdata *swdata;
+ struct xdp_frame *xdpf;
struct sk_buff *skb;
desc_tx = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool, desc_dma);
swdata = cppi5_hdesc_get_swdata(desc_tx);
- if (swdata->type == PRUETH_SWDATA_SKB) {
+
+ switch (swdata->type) {
+ case PRUETH_SWDATA_SKB:
skb = swdata->data.skb;
dev_kfree_skb_any(skb);
+ break;
+ case PRUETH_SWDATA_XDPF:
+ xdpf = swdata->data.xdpf;
+ xdp_return_frame(xdpf);
+ break;
+ default:
+ break;
}
+
prueth_xmit_free(tx_chn, desc_tx);
}
@@ -901,15 +1095,18 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
PRUETH_RX_FLOW_DATA_SR1 : PRUETH_RX_FLOW_DATA;
int flow = emac->is_sr1 ?
PRUETH_MAX_RX_FLOWS_SR1 : PRUETH_MAX_RX_FLOWS;
+ int xdp_state_or = 0;
int num_rx = 0;
int cur_budget;
+ int xdp_state;
int ret;
while (flow--) {
cur_budget = budget - num_rx;
while (cur_budget--) {
- ret = emac_rx_packet(emac, flow);
+ ret = emac_rx_packet(emac, flow, &xdp_state);
+ xdp_state_or |= xdp_state;
if (ret)
break;
num_rx++;
@@ -919,6 +1116,9 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
break;
}
+ if (xdp_state_or & ICSSG_XDP_REDIR)
+ xdp_do_flush();
+
if (num_rx < budget && napi_complete_done(napi_rx, num_rx)) {
if (unlikely(emac->rx_pace_timeout_ns)) {
hrtimer_start(&emac->rx_hrtimer,
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
index e5e4efe485f6..a360a1d6f8d7 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
@@ -559,6 +559,33 @@ const struct icss_iep_clockops prueth_iep_clockops = {
.perout_enable = prueth_perout_enable,
};
+static int prueth_create_xdp_rxqs(struct prueth_emac *emac)
+{
+ struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
+ struct page_pool *pool = emac->rx_chns.pg_pool;
+ int ret;
+
+ ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
+ if (ret)
+ return ret;
+
+ ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL, pool);
+ if (ret)
+ xdp_rxq_info_unreg(rxq);
+
+ return ret;
+}
+
+static void prueth_destroy_xdp_rxqs(struct prueth_emac *emac)
+{
+ struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
+
+ if (!xdp_rxq_info_is_reg(rxq))
+ return;
+
+ xdp_rxq_info_unreg(rxq);
+}
+
static int icssg_prueth_add_mcast(struct net_device *ndev, const u8 *addr)
{
struct net_device *real_dev;
@@ -780,10 +807,14 @@ static int emac_ndo_open(struct net_device *ndev)
if (ret)
goto free_tx_ts_irq;
- ret = k3_udma_glue_enable_rx_chn(emac->rx_chns.rx_chn);
+ ret = prueth_create_xdp_rxqs(emac);
if (ret)
goto reset_rx_chn;
+ ret = k3_udma_glue_enable_rx_chn(emac->rx_chns.rx_chn);
+ if (ret)
+ goto destroy_xdp_rxqs;
+
for (i = 0; i < emac->tx_ch_num; i++) {
ret = k3_udma_glue_enable_tx_chn(emac->tx_chns[i].tx_chn);
if (ret)
@@ -809,6 +840,8 @@ static int emac_ndo_open(struct net_device *ndev)
* any SKB for completion. So set false to free_skb
*/
prueth_reset_tx_chan(emac, i, false);
+destroy_xdp_rxqs:
+ prueth_destroy_xdp_rxqs(emac);
reset_rx_chn:
prueth_reset_rx_chan(&emac->rx_chns, max_rx_flows, false);
free_tx_ts_irq:
@@ -880,6 +913,8 @@ static int emac_ndo_stop(struct net_device *ndev)
prueth_reset_rx_chan(&emac->rx_chns, max_rx_flows, true);
+ prueth_destroy_xdp_rxqs(emac);
+
napi_disable(&emac->napi_rx);
hrtimer_cancel(&emac->rx_hrtimer);
@@ -1024,6 +1059,90 @@ static int emac_ndo_vlan_rx_del_vid(struct net_device *ndev,
return 0;
}
+/**
+ * emac_xdp_xmit - Implements ndo_xdp_xmit
+ * @dev: netdev
+ * @n: number of frames
+ * @frames: array of XDP buffer pointers
+ * @flags: XDP extra info
+ *
+ * Return: number of frames successfully sent. Failed frames
+ * will be freed by the XDP core.
+ *
+ * For error cases, a negative errno code is returned and no frames
+ * are transmitted (the caller must handle freeing the frames).
+ **/
+static int emac_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
+ u32 flags)
+{
+ struct prueth_emac *emac = netdev_priv(dev);
+ unsigned int q_idx;
+ int nxmit = 0;
+ int i;
+
+ q_idx = smp_processor_id() % emac->tx_ch_num;
+
+ if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+ return -EINVAL;
+
+ for (i = 0; i < n; i++) {
+ struct xdp_frame *xdpf = frames[i];
+ int err;
+
+ err = emac_xmit_xdp_frame(emac, xdpf, NULL, q_idx);
+ if (err != ICSSG_XDP_TX)
+ break;
+ nxmit++;
+ }
+
+ return nxmit;
+}
+
+/**
+ * emac_xdp_setup - add/remove an XDP program
+ * @emac: emac device
+ * @bpf: XDP program
+ *
+ * Return: Always 0 (Success)
+ **/
+static int emac_xdp_setup(struct prueth_emac *emac, struct netdev_bpf *bpf)
+{
+ struct bpf_prog *prog = bpf->prog;
+ xdp_features_t val;
+
+ val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+ NETDEV_XDP_ACT_NDO_XMIT;
+ xdp_set_features_flag(emac->ndev, val);
+
+ if (!emac->xdpi.prog && !prog)
+ return 0;
+
+ WRITE_ONCE(emac->xdp_prog, prog);
+
+ xdp_attachment_setup(&emac->xdpi, bpf);
+
+ return 0;
+}
+
+/**
+ * emac_ndo_bpf - implements ndo_bpf for icssg_prueth
+ * @ndev: network adapter device
+ * @bpf: XDP program
+ *
+ * Return: 0 on success, error code on failure.
+ **/
+static int emac_ndo_bpf(struct net_device *ndev, struct netdev_bpf *bpf)
+{
+ struct prueth_emac *emac = netdev_priv(ndev);
+
+ switch (bpf->command) {
+ case XDP_SETUP_PROG:
+ return emac_xdp_setup(emac, bpf);
+ default:
+ return -EINVAL;
+ }
+}
+
static const struct net_device_ops emac_netdev_ops = {
.ndo_open = emac_ndo_open,
.ndo_stop = emac_ndo_stop,
@@ -1038,6 +1157,8 @@ static const struct net_device_ops emac_netdev_ops = {
.ndo_fix_features = emac_ndo_fix_features,
.ndo_vlan_rx_add_vid = emac_ndo_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = emac_ndo_vlan_rx_del_vid,
+ .ndo_bpf = emac_ndo_bpf,
+ .ndo_xdp_xmit = emac_xdp_xmit,
};
static int prueth_netdev_init(struct prueth *prueth,
diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
index 2c8585255b7c..fb8dc8e12c19 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
@@ -8,6 +8,8 @@
#ifndef __NET_TI_ICSSG_PRUETH_H
#define __NET_TI_ICSSG_PRUETH_H
+#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
#include <linux/etherdevice.h>
#include <linux/genalloc.h>
#include <linux/if_vlan.h>
@@ -134,6 +136,7 @@ struct prueth_rx_chn {
unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
char name[32];
struct page_pool *pg_pool;
+ struct xdp_rxq_info xdp_rxq;
};
enum prueth_swdata_type {
@@ -141,16 +144,19 @@ enum prueth_swdata_type {
PRUETH_SWDATA_SKB,
PRUETH_SWDATA_PAGE,
PRUETH_SWDATA_CMD,
+ PRUETH_SWDATA_XDPF,
};
union prueth_data {
struct sk_buff *skb;
struct page *page;
u32 cmd;
+ struct xdp_frame *xdpf;
};
struct prueth_swdata {
union prueth_data data;
+ struct prueth_rx_chn *rx_chn;
enum prueth_swdata_type type;
};
@@ -161,6 +167,12 @@ struct prueth_swdata {
#define PRUETH_MAX_TX_TS_REQUESTS 50 /* Max simultaneous TX_TS requests */
+/* XDP BPF state */
+#define ICSSG_XDP_PASS 0
+#define ICSSG_XDP_CONSUMED BIT(0)
+#define ICSSG_XDP_TX BIT(1)
+#define ICSSG_XDP_REDIR BIT(2)
+
/* Minimum coalesce time in usecs for both Tx and Rx */
#define ICSSG_MIN_COALESCE_USECS 20
@@ -229,6 +241,8 @@ struct prueth_emac {
unsigned long rx_pace_timeout_ns;
struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
+ struct bpf_prog *xdp_prog;
+ struct xdp_attachment_info xdpi;
};
/* The buf includes headroom compatible with both skb and xdpf */
@@ -467,5 +481,9 @@ void prueth_put_cores(struct prueth *prueth, int slice);
/* Revision specific helper */
u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns);
+int emac_xmit_xdp_frame(struct prueth_emac *emac,
+ struct xdp_frame *xdpf,
+ struct page *page,
+ unsigned int q_idx);
#endif /* __NET_TI_ICSSG_PRUETH_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
2025-02-10 10:33 ` [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support Meghana Malladi
@ 2025-02-11 18:39 ` kernel test robot
2025-02-11 22:20 ` kernel test robot
2025-02-12 16:03 ` Roger Quadros
2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-02-11 18:39 UTC (permalink / raw)
To: Meghana Malladi, rogerq, danishanwar, pabeni, kuba, edumazet,
davem, andrew+netdev
Cc: llvm, oe-kbuild-all, bpf, linux-arm-kernel, linux-kernel, netdev,
u.kleine-koenig, krzysztof.kozlowski, dan.carpenter, m-malladi,
schnelle, glaroque, rdunlap, diogo.ivo, jan.kiszka,
john.fastabend, hawk, daniel, ast, srk, Vignesh Raghavendra
Hi Meghana,
kernel test robot noticed the following build errors:
[auto build test ERROR on acdefab0dcbc3833b5a734ab80d792bb778517a0]
url: https://github.com/intel-lab-lkp/linux/commits/Meghana-Malladi/net-ti-icssg-prueth-Use-page_pool-API-for-RX-buffer-allocation/20250210-183805
base: acdefab0dcbc3833b5a734ab80d792bb778517a0
patch link: https://lore.kernel.org/r/20250210103352.541052-4-m-malladi%40ti.com
patch subject: [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
config: arm64-allmodconfig (https://download.01.org/0day-ci/archive/20250212/202502120205.w04H4d0q-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250212/202502120205.w04H4d0q-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502120205.w04H4d0q-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/net/ethernet/ti/icssg/icssg_prueth.c:568:50: error: no member named 'napi_id' in 'struct xdp_rxq_info'
568 | ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
| ~~~ ^
1 error generated.
vim +568 drivers/net/ethernet/ti/icssg/icssg_prueth.c
561
562 static int prueth_create_xdp_rxqs(struct prueth_emac *emac)
563 {
564 struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
565 struct page_pool *pool = emac->rx_chns.pg_pool;
566 int ret;
567
> 568 ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
569 if (ret)
570 return ret;
571
572 ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL, pool);
573 if (ret)
574 xdp_rxq_info_unreg(rxq);
575
576 return ret;
577 }
578
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
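
[Editorial note on the build error above: struct xdp_rxq_info no longer
exposes a napi_id member on net-next, so the NAPI id has to come from the
driver's own NAPI context rather than from the (still unregistered) rxq.
An untested sketch of the likely fix — the emac->napi_rx field name is
taken from the driver sources quoted elsewhere in this thread:]

```diff
-	ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
+	ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, emac->napi_rx.napi_id);
```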
* Re: [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
2025-02-10 10:33 ` [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support Meghana Malladi
2025-02-11 18:39 ` kernel test robot
@ 2025-02-11 22:20 ` kernel test robot
2025-02-12 16:03 ` Roger Quadros
2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-02-11 22:20 UTC (permalink / raw)
To: Meghana Malladi, rogerq, danishanwar, pabeni, kuba, edumazet,
davem, andrew+netdev
Cc: oe-kbuild-all, bpf, linux-arm-kernel, linux-kernel, netdev,
u.kleine-koenig, krzysztof.kozlowski, dan.carpenter, m-malladi,
schnelle, glaroque, rdunlap, diogo.ivo, jan.kiszka,
john.fastabend, hawk, daniel, ast, srk, Vignesh Raghavendra
Hi Meghana,
kernel test robot noticed the following build errors:
[auto build test ERROR on acdefab0dcbc3833b5a734ab80d792bb778517a0]
url: https://github.com/intel-lab-lkp/linux/commits/Meghana-Malladi/net-ti-icssg-prueth-Use-page_pool-API-for-RX-buffer-allocation/20250210-183805
base: acdefab0dcbc3833b5a734ab80d792bb778517a0
patch link: https://lore.kernel.org/r/20250210103352.541052-4-m-malladi%40ti.com
patch subject: [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20250212/202502120546.Y6ri4qi6-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250212/202502120546.Y6ri4qi6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502120546.Y6ri4qi6-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/net/ethernet/ti/icssg/icssg_prueth.c: In function 'prueth_create_xdp_rxqs':
>> drivers/net/ethernet/ti/icssg/icssg_prueth.c:568:55: error: 'struct xdp_rxq_info' has no member named 'napi_id'
568 | ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
| ^~
vim +568 drivers/net/ethernet/ti/icssg/icssg_prueth.c
561
562 static int prueth_create_xdp_rxqs(struct prueth_emac *emac)
563 {
564 struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
565 struct page_pool *pool = emac->rx_chns.pg_pool;
566 int ret;
567
> 568 ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
569 if (ret)
570 return ret;
571
572 ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL, pool);
573 if (ret)
574 xdp_rxq_info_unreg(rxq);
575
576 return ret;
577 }
578
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
2025-02-10 10:33 ` [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation Meghana Malladi
@ 2025-02-12 13:56 ` Roger Quadros
2025-02-18 10:10 ` Malladi, Meghana
0 siblings, 1 reply; 17+ messages in thread
From: Roger Quadros @ 2025-02-12 13:56 UTC (permalink / raw)
To: Meghana Malladi, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
On 10/02/2025 12:33, Meghana Malladi wrote:
> From: Roger Quadros <rogerq@kernel.org>
>
> This is to prepare for native XDP support.
>
> The page pool API is faster at allocating pages than
> __alloc_skb(). The drawback is that it works at PAGE_SIZE
> granularity, so memory usage is less efficient:
> i.e. we use a full PAGE_SIZE (4KB) buffer for a 1.5KB max packet size.
>
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
> ---
> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>
> Changes since v1 (v2-v1):
> - Recycle pages wherever necessary using skb_mark_for_recycle()
> - Use napi_build_skb() instead of build_skb()
> - Update with correct frag_size argument in napi_build_skb()
> - Use napi_gro_receive() instead of netif_receive_skb()
> - Use PP_FLAG_DMA_SYNC_DEV to enable DMA sync with device
> - Use page_pool_dma_sync_for_cpu() to sync Rx page pool for CPU
>
> All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
>
> drivers/net/ethernet/ti/Kconfig | 1 +
> drivers/net/ethernet/ti/icssg/icssg_common.c | 174 ++++++++++++------
> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 14 +-
> .../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 21 ++-
> 4 files changed, 140 insertions(+), 70 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
> index 0d5a862cd78a..b461281d31b6 100644
> --- a/drivers/net/ethernet/ti/Kconfig
> +++ b/drivers/net/ethernet/ti/Kconfig
> @@ -204,6 +204,7 @@ config TI_ICSSG_PRUETH_SR1
> select PHYLIB
> select TI_ICSS_IEP
> select TI_K3_CPPI_DESC_POOL
> + select PAGE_POOL
> depends on PRU_REMOTEPROC
> depends on NET_SWITCHDEV
> depends on ARCH_K3 && OF && TI_K3_UDMA_GLUE_LAYER
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
> index 74f0f200a89d..c3c1e2bf461e 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
> @@ -45,6 +45,11 @@ void prueth_cleanup_rx_chns(struct prueth_emac *emac,
> struct prueth_rx_chn *rx_chn,
> int max_rflows)
> {
> + if (rx_chn->pg_pool) {
> + page_pool_destroy(rx_chn->pg_pool);
> + rx_chn->pg_pool = NULL;
> + }
> +
> if (rx_chn->desc_pool)
> k3_cppi_desc_pool_destroy(rx_chn->desc_pool);
>
> @@ -461,17 +466,17 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
> }
> EXPORT_SYMBOL_GPL(prueth_init_rx_chns);
>
> -int prueth_dma_rx_push(struct prueth_emac *emac,
> - struct sk_buff *skb,
> - struct prueth_rx_chn *rx_chn)
> +int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
> + struct prueth_rx_chn *rx_chn,
> + struct page *page, u32 buf_len)
> {
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_rx;
> - u32 pkt_len = skb_tailroom(skb);
> dma_addr_t desc_dma;
> dma_addr_t buf_dma;
> void **swdata;
>
> + buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
> desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
> if (!desc_rx) {
> netdev_err(ndev, "rx push: failed to allocate descriptor\n");
> @@ -479,25 +484,18 @@ int prueth_dma_rx_push(struct prueth_emac *emac,
> }
> desc_dma = k3_cppi_desc_pool_virt2dma(rx_chn->desc_pool, desc_rx);
>
> - buf_dma = dma_map_single(rx_chn->dma_dev, skb->data, pkt_len, DMA_FROM_DEVICE);
> - if (unlikely(dma_mapping_error(rx_chn->dma_dev, buf_dma))) {
> - k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
> - netdev_err(ndev, "rx push: failed to map rx pkt buffer\n");
> - return -EINVAL;
> - }
> -
> cppi5_hdesc_init(desc_rx, CPPI5_INFO0_HDESC_EPIB_PRESENT,
> PRUETH_NAV_PS_DATA_SIZE);
> k3_udma_glue_rx_dma_to_cppi5_addr(rx_chn->rx_chn, &buf_dma);
> - cppi5_hdesc_attach_buf(desc_rx, buf_dma, skb_tailroom(skb), buf_dma, skb_tailroom(skb));
> + cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
>
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - *swdata = skb;
> + *swdata = page;
>
> - return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, 0,
> + return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
> desc_rx, desc_dma);
> }
> -EXPORT_SYMBOL_GPL(prueth_dma_rx_push);
> +EXPORT_SYMBOL_GPL(prueth_dma_rx_push_mapped);
>
> u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns)
> {
> @@ -541,12 +539,16 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> u32 buf_dma_len, pkt_len, port_id = 0;
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_rx;
> - struct sk_buff *skb, *new_skb;
> dma_addr_t desc_dma, buf_dma;
> + struct page *page, *new_page;
> + struct page_pool *pool;
> + struct sk_buff *skb;
> void **swdata;
> u32 *psdata;
> + void *pa;
> int ret;
>
> + pool = rx_chn->pg_pool;
> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
> if (ret) {
> if (ret != -ENODATA)
> @@ -558,15 +560,10 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> return 0;
>
> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
> -
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - skb = *swdata;
> -
> - psdata = cppi5_hdesc_get_psdata(desc_rx);
> - /* RX HW timestamp */
> - if (emac->rx_ts_enabled)
> - emac_rx_timestamp(emac, skb, psdata);
> + page = *swdata;
>
Unnecessary blank line.
> + page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
> k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
> @@ -574,32 +571,51 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> pkt_len -= 4;
> cppi5_desc_get_tags_ids(&desc_rx->hdr, &port_id, NULL);
>
> - dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>
> - skb->dev = ndev;
> - new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
> /* if allocation fails we drop the packet but push the
> - * descriptor back to the ring with old skb to prevent a stall
> + * descriptor back to the ring with old page to prevent a stall
> */
> - if (!new_skb) {
> + new_page = page_pool_dev_alloc_pages(pool);
> + if (unlikely(!new_page)) {
> + new_page = page;
> ndev->stats.rx_dropped++;
> - new_skb = skb;
> - } else {
> - /* send the filled skb up the n/w stack */
> - skb_put(skb, pkt_len);
> - if (emac->prueth->is_switch_mode)
> - skb->offload_fwd_mark = emac->offload_fwd_mark;
> - skb->protocol = eth_type_trans(skb, ndev);
> - napi_gro_receive(&emac->napi_rx, skb);
> - ndev->stats.rx_bytes += pkt_len;
> - ndev->stats.rx_packets++;
> + goto requeue;
> + }
> +
> + /* prepare skb and send to n/w stack */
> + pa = page_address(page);
> + skb = napi_build_skb(pa, PAGE_SIZE);
> + if (!skb) {
> + ndev->stats.rx_dropped++;
> + page_pool_recycle_direct(pool, page);
> + goto requeue;
> }
>
> + skb_reserve(skb, PRUETH_HEADROOM);
> + skb_put(skb, pkt_len);
> + skb->dev = ndev;
> +
> + psdata = cppi5_hdesc_get_psdata(desc_rx);
> + /* RX HW timestamp */
> + if (emac->rx_ts_enabled)
> + emac_rx_timestamp(emac, skb, psdata);
> +
> + if (emac->prueth->is_switch_mode)
> + skb->offload_fwd_mark = emac->offload_fwd_mark;
> + skb->protocol = eth_type_trans(skb, ndev);
> +
> + skb_mark_for_recycle(skb);
> + napi_gro_receive(&emac->napi_rx, skb);
> + ndev->stats.rx_bytes += pkt_len;
> + ndev->stats.rx_packets++;
> +
> +requeue:
> /* queue another RX DMA */
> - ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_chns);
> + ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
> + PRUETH_MAX_PKT_SIZE);
> if (WARN_ON(ret < 0)) {
> - dev_kfree_skb_any(new_skb);
> + page_pool_recycle_direct(pool, new_page);
> ndev->stats.rx_errors++;
> ndev->stats.rx_dropped++;
> }
> @@ -611,22 +627,17 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
> {
> struct prueth_rx_chn *rx_chn = data;
> struct cppi5_host_desc_t *desc_rx;
> - struct sk_buff *skb;
> - dma_addr_t buf_dma;
> - u32 buf_dma_len;
> + struct page_pool *pool;
> + struct page *page;
> void **swdata;
>
> + pool = rx_chn->pg_pool;
> +
here too.
> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - skb = *swdata;
> - cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
> - k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
> -
> - dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len,
> - DMA_FROM_DEVICE);
> + page = *swdata;
> + page_pool_recycle_direct(pool, page);
> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
> -
> - dev_kfree_skb_any(skb);
> }
>
> static int prueth_tx_ts_cookie_get(struct prueth_emac *emac)
> @@ -907,29 +918,71 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
> }
> EXPORT_SYMBOL_GPL(icssg_napi_rx_poll);
>
> +static struct page_pool *prueth_create_page_pool(struct prueth_emac *emac,
> + struct device *dma_dev,
> + int size)
> +{
> + struct page_pool_params pp_params = { 0 };
> + struct page_pool *pool;
> +
> + pp_params.order = 0;
> + pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> + pp_params.pool_size = size;
> + pp_params.nid = dev_to_node(emac->prueth->dev);
> + pp_params.dma_dir = DMA_BIDIRECTIONAL;
> + pp_params.dev = dma_dev;
> + pp_params.napi = &emac->napi_rx;
> + pp_params.max_len = PAGE_SIZE;
> +
> + pool = page_pool_create(&pp_params);
> + if (IS_ERR(pool))
> + netdev_err(emac->ndev, "cannot create rx page pool\n");
> +
> + return pool;
> +}
> +
> int prueth_prepare_rx_chan(struct prueth_emac *emac,
> struct prueth_rx_chn *chn,
> int buf_size)
> {
> - struct sk_buff *skb;
> + struct page_pool *pool;
> + struct page *page;
> int i, ret;
>
> + pool = prueth_create_page_pool(emac, chn->dma_dev, chn->descs_num);
> + if (IS_ERR(pool))
> + return PTR_ERR(pool);
> +
> + chn->pg_pool = pool;
> +
> for (i = 0; i < chn->descs_num; i++) {
> - skb = __netdev_alloc_skb_ip_align(NULL, buf_size, GFP_KERNEL);
> - if (!skb)
> - return -ENOMEM;
> + /* NOTE: we're not using memory efficiently here.
> + * 1 full page (4KB?) used here instead of
> + * PRUETH_MAX_PKT_SIZE (~1.5KB?)
> + */
> + page = page_pool_dev_alloc_pages(pool);
Did you evaluate Ido's suggestion to use page_pool_alloc_frag()?
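
[For reference, a fragment-based allocation might look roughly like the
following. This is an untested sketch reusing the driver's own names;
whether the pool needs additional setup for fragment allocation, and the
exact per-buffer size including headroom, would need to be verified:]

```diff
-		page = page_pool_dev_alloc_pages(pool);
+		page = page_pool_dev_alloc_frag(pool, &offset,
+						PRUETH_HEADROOM + PRUETH_MAX_PKT_SIZE);
```

[The DMA address pushed to the ring would then become
page_pool_get_dma_addr(page) + offset + PRUETH_HEADROOM, with the offset
carried in swdata so the completion path can recycle the right fragment.]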
> + if (!page) {
> + netdev_err(emac->ndev, "couldn't allocate rx page\n");
> + ret = -ENOMEM;
> + goto recycle_alloc_pg;
> + }
>
> - ret = prueth_dma_rx_push(emac, skb, chn);
> + ret = prueth_dma_rx_push_mapped(emac, chn, page, buf_size);
> if (ret < 0) {
> netdev_err(emac->ndev,
> - "cannot submit skb for rx chan %s ret %d\n",
> + "cannot submit page for rx chan %s ret %d\n",
> chn->name, ret);
> - kfree_skb(skb);
> - return ret;
> + page_pool_recycle_direct(pool, page);
> + goto recycle_alloc_pg;
> }
> }
>
> return 0;
> +
> +recycle_alloc_pg:
> + prueth_reset_rx_chan(&emac->rx_chns, PRUETH_MAX_RX_FLOWS, false);
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(prueth_prepare_rx_chan);
>
> @@ -958,6 +1011,9 @@ void prueth_reset_rx_chan(struct prueth_rx_chn *chn,
> prueth_rx_cleanup, !!i);
> if (disable)
> k3_udma_glue_disable_rx_chn(chn->rx_chn);
> +
> + page_pool_destroy(chn->pg_pool);
> + chn->pg_pool = NULL;
> }
> EXPORT_SYMBOL_GPL(prueth_reset_rx_chan);
>
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> index 329b46e9ee53..c7b906de18af 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> @@ -33,6 +33,8 @@
> #include <linux/dma/k3-udma-glue.h>
>
> #include <net/devlink.h>
> +#include <net/xdp.h>
> +#include <net/page_pool/helpers.h>
>
> #include "icssg_config.h"
> #include "icss_iep.h"
> @@ -131,6 +133,7 @@ struct prueth_rx_chn {
> u32 descs_num;
> unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
> char name[32];
> + struct page_pool *pg_pool;
> };
>
> /* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
> @@ -210,6 +213,10 @@ struct prueth_emac {
> struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
> };
>
> +/* The buf includes headroom compatible with both skb and xdpf */
> +#define PRUETH_HEADROOM_NA (max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN)
> +#define PRUETH_HEADROOM ALIGN(PRUETH_HEADROOM_NA, sizeof(long))
> +
> /**
> * struct prueth_pdata - PRUeth platform data
> * @fdqring_mode: Free desc queue mode
> @@ -410,9 +417,10 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
> struct prueth_rx_chn *rx_chn,
> char *name, u32 max_rflows,
> u32 max_desc_num);
> -int prueth_dma_rx_push(struct prueth_emac *emac,
> - struct sk_buff *skb,
> - struct prueth_rx_chn *rx_chn);
> +int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
> + struct prueth_rx_chn *rx_chn,
> + struct page *page, u32 buf_len);
> +unsigned int prueth_rxbuf_total_len(unsigned int len);
> void emac_rx_timestamp(struct prueth_emac *emac,
> struct sk_buff *skb, u32 *psdata);
> enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev);
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
> index 64a19ff39562..aeeb8a50376b 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
> @@ -274,10 +274,12 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
> struct prueth_rx_chn *rx_chn = &emac->rx_mgm_chn;
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_rx;
> - struct sk_buff *skb, *new_skb;
> + struct page *page, *new_page;
> dma_addr_t desc_dma, buf_dma;
> u32 buf_dma_len, pkt_len;
> + struct sk_buff *skb;
Can we get rid of the SKB entirely from the management channel code?
The packet received on this channel is never passed to user space, so
I don't see why an SKB is required.
> void **swdata;
> + void *pa;
> int ret;
>
> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
> @@ -299,32 +301,35 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
> }
>
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - skb = *swdata;
> + page = *swdata;
> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>
> dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>
> - new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
> + new_page = page_pool_dev_alloc_pages(rx_chn->pg_pool);
> /* if allocation fails we drop the packet but push the
> * descriptor back to the ring with old skb to prevent a stall
> */
> - if (!new_skb) {
> + if (!new_page) {
> netdev_err(ndev,
> - "skb alloc failed, dropped mgm pkt from flow %d\n",
> + "page alloc failed, dropped mgm pkt from flow %d\n",
> flow_id);
> - new_skb = skb;
> + new_page = page;
> skb = NULL; /* return NULL */
> } else {
> /* return the filled skb */
> + pa = page_address(page);
> + skb = napi_build_skb(pa, PAGE_SIZE);
> skb_put(skb, pkt_len);
> }
>
> /* queue another DMA */
> - ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_mgm_chn);
> + ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
> + PRUETH_MAX_PKT_SIZE);
> if (WARN_ON(ret < 0))
> - dev_kfree_skb_any(new_skb);
> + page_pool_recycle_direct(rx_chn->pg_pool, new_page);
>
> return skb;
> }
--
cheers,
-roger
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next v2 2/3] net: ti: icssg-prueth: introduce and use prueth_swdata struct for SWDATA
2025-02-10 10:33 ` [PATCH net-next v2 2/3] net: ti: icssg-prueth: introduce and use prueth_swdata struct for SWDATA Meghana Malladi
@ 2025-02-12 14:12 ` Roger Quadros
2025-02-18 11:02 ` Malladi, Meghana
0 siblings, 1 reply; 17+ messages in thread
From: Roger Quadros @ 2025-02-12 14:12 UTC (permalink / raw)
To: Meghana Malladi, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
Hi Meghana,
On 10/02/2025 12:33, Meghana Malladi wrote:
> From: Roger Quadros <rogerq@kernel.org>
>
> We have different cases for SWDATA (skb, page, cmd, etc)
> so it is better to have a dedicated data structure for that.
> We can embed the type field inside the struct and use it
> to interpret the data in completion handlers.
>
> Increase SWDATA size to 48 so we have some room to add
> more data if required.
>
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
> ---
> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>
> Changes since v1 (v2-v1): None
>
> drivers/net/ethernet/ti/icssg/icssg_common.c | 47 ++++++++++++-------
> drivers/net/ethernet/ti/icssg/icssg_config.h | 2 +-
> drivers/net/ethernet/ti/icssg/icssg_prueth.c | 6 +++
> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 18 +++++++
> .../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 4 +-
> 5 files changed, 58 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
> index c3c1e2bf461e..a124c5773551 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
> @@ -136,12 +136,12 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_tx;
> struct netdev_queue *netif_txq;
> + struct prueth_swdata *swdata;
> struct prueth_tx_chn *tx_chn;
> unsigned int total_bytes = 0;
> struct sk_buff *skb;
> dma_addr_t desc_dma;
> int res, num_tx = 0;
> - void **swdata;
>
> tx_chn = &emac->tx_chns[chn];
>
> @@ -163,12 +163,18 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
> swdata = cppi5_hdesc_get_swdata(desc_tx);
>
> /* was this command's TX complete? */
> - if (emac->is_sr1 && *(swdata) == emac->cmd_data) {
> + if (emac->is_sr1 && (void *)(swdata) == emac->cmd_data) {
> prueth_xmit_free(tx_chn, desc_tx);
> continue;
> }
>
> - skb = *(swdata);
> + if (swdata->type != PRUETH_SWDATA_SKB) {
> + netdev_err(ndev, "tx_complete: invalid swdata type %d\n", swdata->type);
> + budget++;
> + continue;
We are leaking the TX descriptor here. Need to call prueth_xmit_free(tx_chn, desc_tx)?
> + }
> +
> + skb = swdata->data.skb;
> prueth_xmit_free(tx_chn, desc_tx);
>
> ndev = skb->dev;
> @@ -472,9 +478,9 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
> {
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_rx;
> + struct prueth_swdata *swdata;
> dma_addr_t desc_dma;
> dma_addr_t buf_dma;
> - void **swdata;
>
> buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
> desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
> @@ -490,7 +496,8 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
> cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
>
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - *swdata = page;
> + swdata->type = PRUETH_SWDATA_PAGE;
> + swdata->data.page = page;
>
> return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
> desc_rx, desc_dma);
> @@ -539,11 +546,11 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> u32 buf_dma_len, pkt_len, port_id = 0;
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_rx;
> + struct prueth_swdata *swdata;
> dma_addr_t desc_dma, buf_dma;
> struct page *page, *new_page;
> struct page_pool *pool;
> struct sk_buff *skb;
> - void **swdata;
> u32 *psdata;
> void *pa;
> int ret;
> @@ -561,7 +568,11 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>
> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - page = *swdata;
> + if (swdata->type != PRUETH_SWDATA_PAGE) {
> + netdev_err(ndev, "rx_pkt: invalid swdata->type %d\n", swdata->type);
What about freeing the RX descriptor?
> + return 0;
> + }
Need a blank line here.
> + page = swdata->data.page;
>
Drop this blank line.
> page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
> @@ -627,16 +638,18 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
> {
> struct prueth_rx_chn *rx_chn = data;
> struct cppi5_host_desc_t *desc_rx;
> + struct prueth_swdata *swdata;
> struct page_pool *pool;
> struct page *page;
> - void **swdata;
>
> pool = rx_chn->pg_pool;
>
> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - page = *swdata;
> - page_pool_recycle_direct(pool, page);
> + if (swdata->type == PRUETH_SWDATA_PAGE) {
> + page = swdata->data.page;
> + page_pool_recycle_direct(pool, page);
> + }
Need a blank line here.
> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
> }
>
> @@ -673,13 +686,13 @@ enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev
> struct prueth_emac *emac = netdev_priv(ndev);
> struct prueth *prueth = emac->prueth;
> struct netdev_queue *netif_txq;
> + struct prueth_swdata *swdata;
> struct prueth_tx_chn *tx_chn;
> dma_addr_t desc_dma, buf_dma;
> u32 pkt_len, dst_tag_id;
> int i, ret = 0, q_idx;
> bool in_tx_ts = 0;
> int tx_ts_cookie;
> - void **swdata;
> u32 *epib;
>
> pkt_len = skb_headlen(skb);
> @@ -741,7 +754,8 @@ enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev
> k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
> cppi5_hdesc_attach_buf(first_desc, buf_dma, pkt_len, buf_dma, pkt_len);
> swdata = cppi5_hdesc_get_swdata(first_desc);
> - *swdata = skb;
> + swdata->type = PRUETH_SWDATA_SKB;
> + swdata->data.skb = skb;
>
> /* Handle the case where skb is fragmented in pages */
> cur_desc = first_desc;
> @@ -844,15 +858,16 @@ static void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
> {
> struct prueth_tx_chn *tx_chn = data;
> struct cppi5_host_desc_t *desc_tx;
> + struct prueth_swdata *swdata;
> struct sk_buff *skb;
> - void **swdata;
>
> desc_tx = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool, desc_dma);
> swdata = cppi5_hdesc_get_swdata(desc_tx);
> - skb = *(swdata);
> + if (swdata->type == PRUETH_SWDATA_SKB) {
> + skb = swdata->data.skb;
> + dev_kfree_skb_any(skb);
> + }
Need a blank line here.
> prueth_xmit_free(tx_chn, desc_tx);
> -
> - dev_kfree_skb_any(skb);
> }
>
> irqreturn_t prueth_rx_irq(int irq, void *dev_id)
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_config.h b/drivers/net/ethernet/ti/icssg/icssg_config.h
> index c884e9fa099e..eab84e11d80e 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_config.h
> +++ b/drivers/net/ethernet/ti/icssg/icssg_config.h
> @@ -20,7 +20,7 @@ struct icssg_flow_cfg {
>
> #define PRUETH_PKT_TYPE_CMD 0x10
> #define PRUETH_NAV_PS_DATA_SIZE 16 /* Protocol specific data size */
> -#define PRUETH_NAV_SW_DATA_SIZE 16 /* SW related data size */
> +#define PRUETH_NAV_SW_DATA_SIZE 48 /* SW related data size */
Why do you need 48? I think it should fit in 16.
> #define PRUETH_MAX_TX_DESC 512
> #define PRUETH_MAX_RX_DESC 512
> #define PRUETH_MAX_RX_FLOWS 1 /* excluding default flow */
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> index 00ed97860547..e5e4efe485f6 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> @@ -1522,6 +1522,12 @@ static int prueth_probe(struct platform_device *pdev)
>
> np = dev->of_node;
>
> + if (sizeof(struct prueth_swdata) > PRUETH_NAV_SW_DATA_SIZE) {
> + dev_err(dev, "insufficient SW_DATA size: %d vs %ld\n",
> + PRUETH_NAV_SW_DATA_SIZE, sizeof(struct prueth_swdata));
> + return -ENOMEM;
> + }
> +
Can this be made a build-time check instead of a runtime one?
> prueth = devm_kzalloc(dev, sizeof(*prueth), GFP_KERNEL);
> if (!prueth)
> return -ENOMEM;
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> index c7b906de18af..2c8585255b7c 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> @@ -136,6 +136,24 @@ struct prueth_rx_chn {
> struct page_pool *pg_pool;
> };
>
> +enum prueth_swdata_type {
> + PRUETH_SWDATA_INVALID = 0,
> + PRUETH_SWDATA_SKB,
> + PRUETH_SWDATA_PAGE,
> + PRUETH_SWDATA_CMD,
> +};
> +
> +union prueth_data {
> + struct sk_buff *skb;
> + struct page *page;
> + u32 cmd;
> +};
> +
> +struct prueth_swdata {
> + union prueth_data data;
> + enum prueth_swdata_type type;
> +};
Can we rewrite this like so, with type first and the union embedded inside?
(Note the union needs a member name — a tagged union without a declarator
declares nothing in standard C.)
struct prueth_swdata {
	enum prueth_swdata_type type;
	union prueth_data {
		struct sk_buff *skb;
		struct page *page;
		u32 cmd;
	} data;
};
> +
> /* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
> * and lower three are lower priority channels or threads.
> */
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
> index aeeb8a50376b..7bbe0808b3ec 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
> @@ -275,10 +275,10 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
> struct net_device *ndev = emac->ndev;
> struct cppi5_host_desc_t *desc_rx;
> struct page *page, *new_page;
> + struct prueth_swdata *swdata;
> dma_addr_t desc_dma, buf_dma;
> u32 buf_dma_len, pkt_len;
> struct sk_buff *skb;
> - void **swdata;
> void *pa;
> int ret;
>
> @@ -301,7 +301,7 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
> }
>
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> - page = *swdata;
> + page = swdata->data.page;
> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>
--
cheers,
-roger
* Re: [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
2025-02-10 10:33 ` [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support Meghana Malladi
2025-02-11 18:39 ` kernel test robot
2025-02-11 22:20 ` kernel test robot
@ 2025-02-12 16:03 ` Roger Quadros
2025-02-18 18:29 ` Malladi, Meghana
2 siblings, 1 reply; 17+ messages in thread
From: Roger Quadros @ 2025-02-12 16:03 UTC (permalink / raw)
To: Meghana Malladi, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
On 10/02/2025 12:33, Meghana Malladi wrote:
> From: Roger Quadros <rogerq@kernel.org>
>
> Add native XDP support. We do not support zero copy yet.
>
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
> ---
> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>
> Changes since v1 (v2-v1):
> - Fix XDP typo in the commit message
> - Add XDP feature flags using xdp_set_features_flag()
> - Use xdp_build_skb_from_buff() when XDP ran
>
> All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
>
> drivers/net/ethernet/ti/icssg/icssg_common.c | 226 +++++++++++++++++--
> drivers/net/ethernet/ti/icssg/icssg_prueth.c | 123 +++++++++-
> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 18 ++
> 3 files changed, 353 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
> index a124c5773551..b01750a2d57e 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
> @@ -98,11 +98,19 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
> {
> struct cppi5_host_desc_t *first_desc, *next_desc;
> dma_addr_t buf_dma, next_desc_dma;
> + struct prueth_swdata *swdata;
> u32 buf_dma_len;
>
> first_desc = desc;
> next_desc = first_desc;
>
> + swdata = cppi5_hdesc_get_swdata(desc);
> + if (swdata->type == PRUETH_SWDATA_PAGE) {
> + page_pool_recycle_direct(swdata->rx_chn->pg_pool,
> + swdata->data.page);
If swdata->data.page->pp already contains the page_pool, then you can avoid
passing rx_chn around via swdata altogether.
> + goto free_desc;
> + }
> +
> cppi5_hdesc_get_obuf(first_desc, &buf_dma, &buf_dma_len);
> k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &buf_dma);
>
> @@ -126,6 +134,7 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
> k3_cppi_desc_pool_free(tx_chn->desc_pool, next_desc);
> }
>
> +free_desc:
> k3_cppi_desc_pool_free(tx_chn->desc_pool, first_desc);
> }
> EXPORT_SYMBOL_GPL(prueth_xmit_free);
> @@ -139,6 +148,7 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
> struct prueth_swdata *swdata;
> struct prueth_tx_chn *tx_chn;
> unsigned int total_bytes = 0;
> + struct xdp_frame *xdpf;
> struct sk_buff *skb;
> dma_addr_t desc_dma;
> int res, num_tx = 0;
> @@ -168,20 +178,29 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
> continue;
> }
>
> - if (swdata->type != PRUETH_SWDATA_SKB) {
> + switch (swdata->type) {
> + case PRUETH_SWDATA_SKB:
> + skb = swdata->data.skb;
> + ndev->stats.tx_bytes += skb->len;
> + ndev->stats.tx_packets++;
dev_sw_netstats_tx_add() instead?
> + total_bytes += skb->len;
> + napi_consume_skb(skb, budget);
> + break;
> + case PRUETH_SWDATA_XDPF:
> + xdpf = swdata->data.xdpf;
> + ndev->stats.tx_bytes += xdpf->len;
> + ndev->stats.tx_packets++;
here too
> + total_bytes += xdpf->len;
> + xdp_return_frame(xdpf);
> + break;
> + default:
> netdev_err(ndev, "tx_complete: invalid swdata type %d\n", swdata->type);
ndev->stats.tx_dropped++
> + prueth_xmit_free(tx_chn, desc_tx);
> budget++;
> continue;
> }
>
> - skb = swdata->data.skb;
> prueth_xmit_free(tx_chn, desc_tx);
> -
> - ndev = skb->dev;
> - ndev->stats.tx_packets++;
> - ndev->stats.tx_bytes += skb->len;
> - total_bytes += skb->len;
> - napi_consume_skb(skb, budget);
> num_tx++;
> }
>
> @@ -498,6 +517,7 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
> swdata = cppi5_hdesc_get_swdata(desc_rx);
> swdata->type = PRUETH_SWDATA_PAGE;
> swdata->data.page = page;
> + swdata->rx_chn = rx_chn;
>
> return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
> desc_rx, desc_dma);
> @@ -540,7 +560,156 @@ void emac_rx_timestamp(struct prueth_emac *emac,
> ssh->hwtstamp = ns_to_ktime(ns);
> }
>
> -static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> +/**
> + * emac_xmit_xdp_frame - transmits an XDP frame
> + * @emac: emac device
> + * @xdpf: data to transmit
> + * @page: page from page pool if already DMA mapped
> + * @q_idx: queue id
> + *
> + * Return: XDP state
> + */
> +int emac_xmit_xdp_frame(struct prueth_emac *emac,
> + struct xdp_frame *xdpf,
> + struct page *page,
> + unsigned int q_idx)
> +{
> + struct cppi5_host_desc_t *first_desc;
> + struct net_device *ndev = emac->ndev;
> + struct prueth_tx_chn *tx_chn;
> + dma_addr_t desc_dma, buf_dma;
> + struct prueth_swdata *swdata;
> + u32 *epib;
> + int ret;
> +
> + void *data = xdpf->data;
> + u32 pkt_len = xdpf->len;
> +
> + if (q_idx >= PRUETH_MAX_TX_QUEUES) {
> + netdev_err(ndev, "xdp tx: invalid q_id %d\n", q_idx);
ndev->stats.tx_dropped++;
> + return ICSSG_XDP_CONSUMED; /* drop */
> + }
> +
> + tx_chn = &emac->tx_chns[q_idx];
> +
> + if (page) { /* already DMA mapped by page_pool */
> + buf_dma = page_pool_get_dma_addr(page);
> + buf_dma += xdpf->headroom + sizeof(struct xdp_frame);
> + } else { /* Map the linear buffer */
> + buf_dma = dma_map_single(tx_chn->dma_dev, data, pkt_len, DMA_TO_DEVICE);
> + if (dma_mapping_error(tx_chn->dma_dev, buf_dma)) {
> + netdev_err(ndev, "xdp tx: failed to map data buffer\n");
ndev->stats.tx_dropped++;
> + return ICSSG_XDP_CONSUMED; /* drop */
> + }
> + }
> +
> + first_desc = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
> + if (!first_desc) {
> + netdev_dbg(ndev, "xdp tx: failed to allocate descriptor\n");
> + if (!page)
> + dma_unmap_single(tx_chn->dma_dev, buf_dma, pkt_len, DMA_TO_DEVICE);
Better to do the k3_cppi_desc_pool_alloc() before the DMA mapping
so it is easier to clean up on failure.
> + goto drop_free_descs; /* drop */
> + }
> +
> + cppi5_hdesc_init(first_desc, CPPI5_INFO0_HDESC_EPIB_PRESENT,
> + PRUETH_NAV_PS_DATA_SIZE);
> + cppi5_hdesc_set_pkttype(first_desc, 0);
> + epib = first_desc->epib;
> + epib[0] = 0;
> + epib[1] = 0;
> +
> + /* set dst tag to indicate internal qid at the firmware which is at
> + * bit8..bit15. bit0..bit7 indicates port num for directed
> + * packets in case of switch mode operation
> + */
> + cppi5_desc_set_tags_ids(&first_desc->hdr, 0, (emac->port_id | (q_idx << 8)));
> + k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
> + cppi5_hdesc_attach_buf(first_desc, buf_dma, pkt_len, buf_dma, pkt_len);
> + swdata = cppi5_hdesc_get_swdata(first_desc);
> + if (page) {
> + swdata->type = PRUETH_SWDATA_PAGE;
> + swdata->data.page = page;
> + /* we assume page came from RX channel page pool */
> + swdata->rx_chn = &emac->rx_chns;
> + } else {
> + swdata->type = PRUETH_SWDATA_XDPF;
> + swdata->data.xdpf = xdpf;
> + }
> +
> + cppi5_hdesc_set_pktlen(first_desc, pkt_len);
> + desc_dma = k3_cppi_desc_pool_virt2dma(tx_chn->desc_pool, first_desc);
> +
> + ret = k3_udma_glue_push_tx_chn(tx_chn->tx_chn, first_desc, desc_dma);
> + if (ret) {
> + netdev_err(ndev, "xdp tx: push failed: %d\n", ret);
> + goto drop_free_descs;
> + }
> +
> + return ICSSG_XDP_TX;
> +
> +drop_free_descs:
ndev->stats.tx_dropped++;
> + prueth_xmit_free(tx_chn, first_desc);
This will also unmap the DMA buffer in all cases, so you may need to add
a flag to prueth_xmit_free() to skip the unmap for certain cases.
> + return ICSSG_XDP_CONSUMED;
> +}
> +EXPORT_SYMBOL_GPL(emac_xmit_xdp_frame);
> +
> +/**
> + * emac_run_xdp - run an XDP program
> + * @emac: emac device
> + * @xdp: XDP buffer containing the frame
> + * @page: page with RX data if already DMA mapped
> + *
> + * Return: XDP state
> + */
> +static int emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp,
> + struct page *page)
> +{
> + int err, result = ICSSG_XDP_PASS;
> + struct bpf_prog *xdp_prog;
> + struct xdp_frame *xdpf;
> + int q_idx;
> + u32 act;
> +
> + xdp_prog = READ_ONCE(emac->xdp_prog);
> +
Unnecessary blank line.
> + act = bpf_prog_run_xdp(xdp_prog, xdp);
> + switch (act) {
> + case XDP_PASS:
return ICSSG_XDP_PASS;
> + break;
> + case XDP_TX:
> + /* Send packet to TX ring for immediate transmission */
> + xdpf = xdp_convert_buff_to_frame(xdp);
> + if (unlikely(!xdpf))
ndev->stats.tx_dropped++;
> + goto drop;
> +
> + q_idx = smp_processor_id() % emac->tx_ch_num;
> + result = emac_xmit_xdp_frame(emac, xdpf, page, q_idx);
> + if (result == ICSSG_XDP_CONSUMED)
> + goto drop;
Increment TX stats?
return ICSSG_XDP_TX;
> + break;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(emac->ndev, xdp, xdp_prog);
> + if (err)
> + goto drop;
> + result = ICSSG_XDP_REDIR;
return ICSSG_XDP_REDIR;
> + break;
> + default:
> + bpf_warn_invalid_xdp_action(emac->ndev, xdp_prog, act);
> + fallthrough;
> + case XDP_ABORTED:
> +drop:
> + trace_xdp_exception(emac->ndev, xdp_prog, act);
> + fallthrough; /* handle aborts by dropping packet */
> + case XDP_DROP:
ndev->stats.rx_dropped++;
> + result = ICSSG_XDP_CONSUMED;
> + page_pool_recycle_direct(emac->rx_chns.pg_pool, page);
> + break;
> + }
> +
> + return result;
> +}
> +
> +static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id, int *xdp_state)
> {
> struct prueth_rx_chn *rx_chn = &emac->rx_chns;
> u32 buf_dma_len, pkt_len, port_id = 0;
> @@ -551,10 +720,12 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> struct page *page, *new_page;
> struct page_pool *pool;
> struct sk_buff *skb;
> + struct xdp_buff xdp;
> u32 *psdata;
> void *pa;
> int ret;
>
> + *xdp_state = 0;
> pool = rx_chn->pg_pool;
> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
> if (ret) {
> @@ -594,9 +765,21 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
> goto requeue;
> }
>
> - /* prepare skb and send to n/w stack */
> pa = page_address(page);
> - skb = napi_build_skb(pa, PAGE_SIZE);
> + if (emac->xdp_prog) {
> + xdp_init_buff(&xdp, PAGE_SIZE, &rx_chn->xdp_rxq);
> + xdp_prepare_buff(&xdp, pa, PRUETH_HEADROOM, pkt_len, false);
> +
> + *xdp_state = emac_run_xdp(emac, &xdp, page);
> + if (*xdp_state == ICSSG_XDP_PASS)
> + skb = xdp_build_skb_from_buff(&xdp);
> + else
> + goto requeue;
> + } else {
> + /* prepare skb and send to n/w stack */
> + skb = napi_build_skb(pa, PAGE_SIZE);
> + }
> +
> if (!skb) {
> ndev->stats.rx_dropped++;
> page_pool_recycle_direct(pool, page);
> @@ -859,14 +1042,25 @@ static void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
> struct prueth_tx_chn *tx_chn = data;
> struct cppi5_host_desc_t *desc_tx;
> struct prueth_swdata *swdata;
> + struct xdp_frame *xdpf;
> struct sk_buff *skb;
>
> desc_tx = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool, desc_dma);
> swdata = cppi5_hdesc_get_swdata(desc_tx);
> - if (swdata->type == PRUETH_SWDATA_SKB) {
> +
> + switch (swdata->type) {
> + case PRUETH_SWDATA_SKB:
> skb = swdata->data.skb;
> dev_kfree_skb_any(skb);
> + break;
> + case PRUETH_SWDATA_XDPF:
> + xdpf = swdata->data.xdpf;
> + xdp_return_frame(xdpf);
> + break;
What about PRUETH_SWDATA_PAGE?
> + default:
> + break;
> }
> +
> prueth_xmit_free(tx_chn, desc_tx);
> }
>
> @@ -901,15 +1095,18 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
> PRUETH_RX_FLOW_DATA_SR1 : PRUETH_RX_FLOW_DATA;
> int flow = emac->is_sr1 ?
> PRUETH_MAX_RX_FLOWS_SR1 : PRUETH_MAX_RX_FLOWS;
> + int xdp_state_or = 0;
> int num_rx = 0;
> int cur_budget;
> + int xdp_state;
> int ret;
>
> while (flow--) {
> cur_budget = budget - num_rx;
>
> while (cur_budget--) {
> - ret = emac_rx_packet(emac, flow);
> + ret = emac_rx_packet(emac, flow, &xdp_state);
> + xdp_state_or |= xdp_state;
> if (ret)
> break;
> num_rx++;
> @@ -919,6 +1116,9 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
> break;
> }
>
> + if (xdp_state_or & ICSSG_XDP_REDIR)
> + xdp_do_flush();
> +
> if (num_rx < budget && napi_complete_done(napi_rx, num_rx)) {
> if (unlikely(emac->rx_pace_timeout_ns)) {
> hrtimer_start(&emac->rx_hrtimer,
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> index e5e4efe485f6..a360a1d6f8d7 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> @@ -559,6 +559,33 @@ const struct icss_iep_clockops prueth_iep_clockops = {
> .perout_enable = prueth_perout_enable,
> };
>
> +static int prueth_create_xdp_rxqs(struct prueth_emac *emac)
> +{
> + struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
> + struct page_pool *pool = emac->rx_chns.pg_pool;
> + int ret;
> +
> + ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
But who sets rxq->napi_id?
I think you need to use emac->napi_rx.napi_id here.
> + if (ret)
> + return ret;
> +
> + ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL, pool);
> + if (ret)
> + xdp_rxq_info_unreg(rxq);
> +
> + return ret;
> +}
> +
> +static void prueth_destroy_xdp_rxqs(struct prueth_emac *emac)
> +{
> + struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
> +
> + if (!xdp_rxq_info_is_reg(rxq))
> + return;
> +
> + xdp_rxq_info_unreg(rxq);
> +}
> +
> static int icssg_prueth_add_mcast(struct net_device *ndev, const u8 *addr)
> {
> struct net_device *real_dev;
> @@ -780,10 +807,14 @@ static int emac_ndo_open(struct net_device *ndev)
> if (ret)
> goto free_tx_ts_irq;
>
> - ret = k3_udma_glue_enable_rx_chn(emac->rx_chns.rx_chn);
> + ret = prueth_create_xdp_rxqs(emac);
> if (ret)
> goto reset_rx_chn;
>
> + ret = k3_udma_glue_enable_rx_chn(emac->rx_chns.rx_chn);
> + if (ret)
> + goto destroy_xdp_rxqs;
> +
> for (i = 0; i < emac->tx_ch_num; i++) {
> ret = k3_udma_glue_enable_tx_chn(emac->tx_chns[i].tx_chn);
> if (ret)
> @@ -809,6 +840,8 @@ static int emac_ndo_open(struct net_device *ndev)
> * any SKB for completion. So set false to free_skb
> */
> prueth_reset_tx_chan(emac, i, false);
> +destroy_xdp_rxqs:
> + prueth_destroy_xdp_rxqs(emac);
> reset_rx_chn:
> prueth_reset_rx_chan(&emac->rx_chns, max_rx_flows, false);
> free_tx_ts_irq:
> @@ -880,6 +913,8 @@ static int emac_ndo_stop(struct net_device *ndev)
>
> prueth_reset_rx_chan(&emac->rx_chns, max_rx_flows, true);
>
Please drop this blank line.
> + prueth_destroy_xdp_rxqs(emac);
> +
Here too.
> napi_disable(&emac->napi_rx);
> hrtimer_cancel(&emac->rx_hrtimer);
>
> @@ -1024,6 +1059,90 @@ static int emac_ndo_vlan_rx_del_vid(struct net_device *ndev,
> return 0;
> }
>
> +/**
> + * emac_xdp_xmit - Implements ndo_xdp_xmit
> + * @dev: netdev
> + * @n: number of frames
> + * @frames: array of XDP buffer pointers
> + * @flags: XDP extra info
> + *
> + * Return: number of frames successfully sent. Failed frames
> + * will be free'ed by XDP core.
> + *
> + * For error cases, a negative errno code is returned and no-frames
> + * are transmitted (caller must handle freeing frames).
> + **/
> +static int emac_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
> + u32 flags)
> +{
> + struct prueth_emac *emac = netdev_priv(dev);
> + unsigned int q_idx;
> + int nxmit = 0;
> + int i;
> +
> + q_idx = smp_processor_id() % emac->tx_ch_num;
> +
> + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
> + return -EINVAL;
> +
> + for (i = 0; i < n; i++) {
> + struct xdp_frame *xdpf = frames[i];
> + int err;
> +
> + err = emac_xmit_xdp_frame(emac, xdpf, NULL, q_idx);
> + if (err != ICSSG_XDP_TX)
> + break;
> + nxmit++;
> + }
> +
> + return nxmit;
> +}
> +
> +/**
> + * emac_xdp_setup - add/remove an XDP program
> + * @emac: emac device
> + * @bpf: XDP program
> + *
> + * Return: Always 0 (Success)
> + **/
> +static int emac_xdp_setup(struct prueth_emac *emac, struct netdev_bpf *bpf)
> +{
> + struct bpf_prog *prog = bpf->prog;
> + xdp_features_t val;
> +
> + val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
> + NETDEV_XDP_ACT_NDO_XMIT;
> + xdp_set_features_flag(emac->ndev, val);
> +
> + if (!emac->xdpi.prog && !prog)
> + return 0;
> +
> + WRITE_ONCE(emac->xdp_prog, prog);
> +
> + xdp_attachment_setup(&emac->xdpi, bpf);
> +
> + return 0;
> +}
> +
> +/**
> + * emac_ndo_bpf - implements ndo_bpf for icssg_prueth
> + * @ndev: network adapter device
> + * @bpf: XDP program
> + *
> + * Return: 0 on success, error code on failure.
> + **/
> +static int emac_ndo_bpf(struct net_device *ndev, struct netdev_bpf *bpf)
> +{
> + struct prueth_emac *emac = netdev_priv(ndev);
> +
> + switch (bpf->command) {
> + case XDP_SETUP_PROG:
> + return emac_xdp_setup(emac, bpf);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> static const struct net_device_ops emac_netdev_ops = {
> .ndo_open = emac_ndo_open,
> .ndo_stop = emac_ndo_stop,
> @@ -1038,6 +1157,8 @@ static const struct net_device_ops emac_netdev_ops = {
> .ndo_fix_features = emac_ndo_fix_features,
> .ndo_vlan_rx_add_vid = emac_ndo_vlan_rx_add_vid,
> .ndo_vlan_rx_kill_vid = emac_ndo_vlan_rx_del_vid,
> + .ndo_bpf = emac_ndo_bpf,
> + .ndo_xdp_xmit = emac_xdp_xmit,
> };
>
> static int prueth_netdev_init(struct prueth *prueth,
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> index 2c8585255b7c..fb8dc8e12c19 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> @@ -8,6 +8,8 @@
> #ifndef __NET_TI_ICSSG_PRUETH_H
> #define __NET_TI_ICSSG_PRUETH_H
>
> +#include <linux/bpf.h>
> +#include <linux/bpf_trace.h>
> #include <linux/etherdevice.h>
> #include <linux/genalloc.h>
> #include <linux/if_vlan.h>
> @@ -134,6 +136,7 @@ struct prueth_rx_chn {
> unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
> char name[32];
> struct page_pool *pg_pool;
> + struct xdp_rxq_info xdp_rxq;
> };
>
> enum prueth_swdata_type {
> @@ -141,16 +144,19 @@ enum prueth_swdata_type {
> PRUETH_SWDATA_SKB,
> PRUETH_SWDATA_PAGE,
> PRUETH_SWDATA_CMD,
> + PRUETH_SWDATA_XDPF,
> };
>
> union prueth_data {
> struct sk_buff *skb;
> struct page *page;
> u32 cmd;
> + struct xdp_frame *xdpf;
> };
>
> struct prueth_swdata {
> union prueth_data data;
> + struct prueth_rx_chn *rx_chn;
> enum prueth_swdata_type type;
> };
>
> @@ -161,6 +167,12 @@ struct prueth_swdata {
>
> #define PRUETH_MAX_TX_TS_REQUESTS 50 /* Max simultaneous TX_TS requests */
>
> +/* XDP BPF state */
> +#define ICSSG_XDP_PASS 0
> +#define ICSSG_XDP_CONSUMED BIT(0)
> +#define ICSSG_XDP_TX BIT(1)
> +#define ICSSG_XDP_REDIR BIT(2)
> +
> /* Minimum coalesce time in usecs for both Tx and Rx */
> #define ICSSG_MIN_COALESCE_USECS 20
>
> @@ -229,6 +241,8 @@ struct prueth_emac {
> unsigned long rx_pace_timeout_ns;
>
> struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
> + struct bpf_prog *xdp_prog;
> + struct xdp_attachment_info xdpi;
> };
>
> /* The buf includes headroom compatible with both skb and xdpf */
> @@ -467,5 +481,9 @@ void prueth_put_cores(struct prueth *prueth, int slice);
>
> /* Revision specific helper */
> u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns);
> +int emac_xmit_xdp_frame(struct prueth_emac *emac,
> + struct xdp_frame *xdpf,
> + struct page *page,
> + unsigned int q_idx);
>
> #endif /* __NET_TI_ICSSG_PRUETH_H */
--
cheers,
-roger
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
2025-02-12 13:56 ` Roger Quadros
@ 2025-02-18 10:10 ` Malladi, Meghana
2025-02-18 12:24 ` Diogo Ivo
2025-02-24 13:20 ` Roger Quadros
0 siblings, 2 replies; 17+ messages in thread
From: Malladi, Meghana @ 2025-02-18 10:10 UTC (permalink / raw)
To: Roger Quadros, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
Hi Roger,
Thanks for reviewing this patch series.
On 2/12/2025 7:26 PM, Roger Quadros wrote:
>
>
> On 10/02/2025 12:33, Meghana Malladi wrote:
>> From: Roger Quadros <rogerq@kernel.org>
>>
>> This is to prepare for native XDP support.
>>
>> The page pool API is faster at allocating pages than
>> __alloc_skb(). The drawback is that it works at PAGE_SIZE granularity,
>> so it is not efficient in memory usage,
>> i.e. we use PAGE_SIZE (4KB) of memory for a 1.5KB max packet size.
>>
>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
>> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
>> ---
>> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>>
>> Changes since v1 (v2-v1):
>> - Recycle pages wherever necessary using skb_mark_for_recycle()
>> - Use napi_build_skb() instead of build_skb()
>> - Update with correct frag_size argument in napi_build_skb()
>> - Use napi_gro_receive() instead of netif_receive_skb()
>> - Use PP_FLAG_DMA_SYNC_DEV to enable DMA sync with device
>> - Use page_pool_dma_sync_for_cpu() to sync Rx page pool for CPU
>>
>> All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
>>
>> drivers/net/ethernet/ti/Kconfig | 1 +
>> drivers/net/ethernet/ti/icssg/icssg_common.c | 174 ++++++++++++------
>> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 14 +-
>> .../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 21 ++-
>> 4 files changed, 140 insertions(+), 70 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
>> index 0d5a862cd78a..b461281d31b6 100644
>> --- a/drivers/net/ethernet/ti/Kconfig
>> +++ b/drivers/net/ethernet/ti/Kconfig
>> @@ -204,6 +204,7 @@ config TI_ICSSG_PRUETH_SR1
>> select PHYLIB
>> select TI_ICSS_IEP
>> select TI_K3_CPPI_DESC_POOL
>> + select PAGE_POOL
>> depends on PRU_REMOTEPROC
>> depends on NET_SWITCHDEV
>> depends on ARCH_K3 && OF && TI_K3_UDMA_GLUE_LAYER
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
>> index 74f0f200a89d..c3c1e2bf461e 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
>> @@ -45,6 +45,11 @@ void prueth_cleanup_rx_chns(struct prueth_emac *emac,
>> struct prueth_rx_chn *rx_chn,
>> int max_rflows)
>> {
>> + if (rx_chn->pg_pool) {
>> + page_pool_destroy(rx_chn->pg_pool);
>> + rx_chn->pg_pool = NULL;
>> + }
>> +
>> if (rx_chn->desc_pool)
>> k3_cppi_desc_pool_destroy(rx_chn->desc_pool);
>>
>> @@ -461,17 +466,17 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
>> }
>> EXPORT_SYMBOL_GPL(prueth_init_rx_chns);
>>
>> -int prueth_dma_rx_push(struct prueth_emac *emac,
>> - struct sk_buff *skb,
>> - struct prueth_rx_chn *rx_chn)
>> +int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>> + struct prueth_rx_chn *rx_chn,
>> + struct page *page, u32 buf_len)
>> {
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_rx;
>> - u32 pkt_len = skb_tailroom(skb);
>> dma_addr_t desc_dma;
>> dma_addr_t buf_dma;
>> void **swdata;
>>
>> + buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
>> desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
>> if (!desc_rx) {
>> netdev_err(ndev, "rx push: failed to allocate descriptor\n");
>> @@ -479,25 +484,18 @@ int prueth_dma_rx_push(struct prueth_emac *emac,
>> }
>> desc_dma = k3_cppi_desc_pool_virt2dma(rx_chn->desc_pool, desc_rx);
>>
>> - buf_dma = dma_map_single(rx_chn->dma_dev, skb->data, pkt_len, DMA_FROM_DEVICE);
>> - if (unlikely(dma_mapping_error(rx_chn->dma_dev, buf_dma))) {
>> - k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>> - netdev_err(ndev, "rx push: failed to map rx pkt buffer\n");
>> - return -EINVAL;
>> - }
>> -
>> cppi5_hdesc_init(desc_rx, CPPI5_INFO0_HDESC_EPIB_PRESENT,
>> PRUETH_NAV_PS_DATA_SIZE);
>> k3_udma_glue_rx_dma_to_cppi5_addr(rx_chn->rx_chn, &buf_dma);
>> - cppi5_hdesc_attach_buf(desc_rx, buf_dma, skb_tailroom(skb), buf_dma, skb_tailroom(skb));
>> + cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
>>
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - *swdata = skb;
>> + *swdata = page;
>>
>> - return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, 0,
>> + return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
>> desc_rx, desc_dma);
>> }
>> -EXPORT_SYMBOL_GPL(prueth_dma_rx_push);
>> +EXPORT_SYMBOL_GPL(prueth_dma_rx_push_mapped);
>>
>> u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns)
>> {
>> @@ -541,12 +539,16 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> u32 buf_dma_len, pkt_len, port_id = 0;
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_rx;
>> - struct sk_buff *skb, *new_skb;
>> dma_addr_t desc_dma, buf_dma;
>> + struct page *page, *new_page;
>> + struct page_pool *pool;
>> + struct sk_buff *skb;
>> void **swdata;
>> u32 *psdata;
>> + void *pa;
>> int ret;
>>
>> + pool = rx_chn->pg_pool;
>> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
>> if (ret) {
>> if (ret != -ENODATA)
>> @@ -558,15 +560,10 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> return 0;
>>
>> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
>> -
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - skb = *swdata;
>> -
>> - psdata = cppi5_hdesc_get_psdata(desc_rx);
>> - /* RX HW timestamp */
>> - if (emac->rx_ts_enabled)
>> - emac_rx_timestamp(emac, skb, psdata);
>> + page = *swdata;
>>
> Unnecessary blank line.
>
OK, noted. I will address all the cosmetic changes you suggested in v3.
>> + page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
>> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>> k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
>> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>> @@ -574,32 +571,51 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> pkt_len -= 4;
>> cppi5_desc_get_tags_ids(&desc_rx->hdr, &port_id, NULL);
>>
>> - dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>>
>> - skb->dev = ndev;
>> - new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
>> /* if allocation fails we drop the packet but push the
>> - * descriptor back to the ring with old skb to prevent a stall
>> + * descriptor back to the ring with old page to prevent a stall
>> */
>> - if (!new_skb) {
>> + new_page = page_pool_dev_alloc_pages(pool);
>> + if (unlikely(!new_page)) {
>> + new_page = page;
>> ndev->stats.rx_dropped++;
>> - new_skb = skb;
>> - } else {
>> - /* send the filled skb up the n/w stack */
>> - skb_put(skb, pkt_len);
>> - if (emac->prueth->is_switch_mode)
>> - skb->offload_fwd_mark = emac->offload_fwd_mark;
>> - skb->protocol = eth_type_trans(skb, ndev);
>> - napi_gro_receive(&emac->napi_rx, skb);
>> - ndev->stats.rx_bytes += pkt_len;
>> - ndev->stats.rx_packets++;
>> + goto requeue;
>> + }
>> +
>> + /* prepare skb and send to n/w stack */
>> + pa = page_address(page);
>> + skb = napi_build_skb(pa, PAGE_SIZE);
>> + if (!skb) {
>> + ndev->stats.rx_dropped++;
>> + page_pool_recycle_direct(pool, page);
>> + goto requeue;
>> }
>>
>> + skb_reserve(skb, PRUETH_HEADROOM);
>> + skb_put(skb, pkt_len);
>> + skb->dev = ndev;
>> +
>> + psdata = cppi5_hdesc_get_psdata(desc_rx);
>> + /* RX HW timestamp */
>> + if (emac->rx_ts_enabled)
>> + emac_rx_timestamp(emac, skb, psdata);
>> +
>> + if (emac->prueth->is_switch_mode)
>> + skb->offload_fwd_mark = emac->offload_fwd_mark;
>> + skb->protocol = eth_type_trans(skb, ndev);
>> +
>> + skb_mark_for_recycle(skb);
>> + napi_gro_receive(&emac->napi_rx, skb);
>> + ndev->stats.rx_bytes += pkt_len;
>> + ndev->stats.rx_packets++;
>> +
>> +requeue:
>> /* queue another RX DMA */
>> - ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_chns);
>> + ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
>> + PRUETH_MAX_PKT_SIZE);
>> if (WARN_ON(ret < 0)) {
>> - dev_kfree_skb_any(new_skb);
>> + page_pool_recycle_direct(pool, new_page);
>> ndev->stats.rx_errors++;
>> ndev->stats.rx_dropped++;
>> }
>> @@ -611,22 +627,17 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
>> {
>> struct prueth_rx_chn *rx_chn = data;
>> struct cppi5_host_desc_t *desc_rx;
>> - struct sk_buff *skb;
>> - dma_addr_t buf_dma;
>> - u32 buf_dma_len;
>> + struct page_pool *pool;
>> + struct page *page;
>> void **swdata;
>>
>> + pool = rx_chn->pg_pool;
>> +
> here too.
>
>> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - skb = *swdata;
>> - cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>> - k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
>> -
>> - dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len,
>> - DMA_FROM_DEVICE);
>> + page = *swdata;
>> + page_pool_recycle_direct(pool, page);
>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>> -
>> - dev_kfree_skb_any(skb);
>> }
>>
>> static int prueth_tx_ts_cookie_get(struct prueth_emac *emac)
>> @@ -907,29 +918,71 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
>> }
>> EXPORT_SYMBOL_GPL(icssg_napi_rx_poll);
>>
>> +static struct page_pool *prueth_create_page_pool(struct prueth_emac *emac,
>> + struct device *dma_dev,
>> + int size)
>> +{
>> + struct page_pool_params pp_params = { 0 };
>> + struct page_pool *pool;
>> +
>> + pp_params.order = 0;
>> + pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
>> + pp_params.pool_size = size;
>> + pp_params.nid = dev_to_node(emac->prueth->dev);
>> + pp_params.dma_dir = DMA_BIDIRECTIONAL;
>> + pp_params.dev = dma_dev;
>> + pp_params.napi = &emac->napi_rx;
>> + pp_params.max_len = PAGE_SIZE;
>> +
>> + pool = page_pool_create(&pp_params);
>> + if (IS_ERR(pool))
>> + netdev_err(emac->ndev, "cannot create rx page pool\n");
>> +
>> + return pool;
>> +}
>> +
>> int prueth_prepare_rx_chan(struct prueth_emac *emac,
>> struct prueth_rx_chn *chn,
>> int buf_size)
>> {
>> - struct sk_buff *skb;
>> + struct page_pool *pool;
>> + struct page *page;
>> int i, ret;
>>
>> + pool = prueth_create_page_pool(emac, chn->dma_dev, chn->descs_num);
>> + if (IS_ERR(pool))
>> + return PTR_ERR(pool);
>> +
>> + chn->pg_pool = pool;
>> +
>> for (i = 0; i < chn->descs_num; i++) {
>> - skb = __netdev_alloc_skb_ip_align(NULL, buf_size, GFP_KERNEL);
>> - if (!skb)
>> - return -ENOMEM;
>> + /* NOTE: we're not using memory efficiently here.
>> + * 1 full page (4KB?) used here instead of
>> + * PRUETH_MAX_PKT_SIZE (~1.5KB?)
>> + */
>> + page = page_pool_dev_alloc_pages(pool);
>
> Did you evaluate Ido's suggestion to use page_pool_alloc_frag()?
>
Yes, I have done some research on this. IMO, page_pool_alloc_frag() does
fit our use case for reducing the memory footprint. The downside would be
the overhead of the atomic operations on page->pp_ref_count; also, a page
cannot be recycled until page->pp_ref_count drops back to 1, i.e., if
page fragments are not freed properly it will cause continuous page
allocation, eventually leading to a leaky page pool.
Based on this presentation:
https://archive.fosdem.org/2020/schedule/event/xdp_and_page_pool_api/attachments/paper/3625/export/events/attachments/xdp_and_page_pool_api/paper/3625/XDP_and_page_pool.pdf
Under the XDP memory model section: it is quoted that -
- "Cannot allocate page fragments to support it (e.g. through
napi_alloc_skb())"
- "Rx buffers must be recycled to get high speed"
- "Optimized for one packet per page"
- "Supports split-page model (usually driver is in charge of recycling)"
FWIW, to ensure simplicity and avoid leaky page pools it is better to
use the entire page instead of fragments, at least for XDP. Please
correct me if my understanding is wrong.
>> + if (!page) {
>> + netdev_err(emac->ndev, "couldn't allocate rx page\n");
>> + ret = -ENOMEM;
>> + goto recycle_alloc_pg;
>> + }
>>
>> - ret = prueth_dma_rx_push(emac, skb, chn);
>> + ret = prueth_dma_rx_push_mapped(emac, chn, page, buf_size);
>> if (ret < 0) {
>> netdev_err(emac->ndev,
>> - "cannot submit skb for rx chan %s ret %d\n",
>> + "cannot submit page for rx chan %s ret %d\n",
>> chn->name, ret);
>> - kfree_skb(skb);
>> - return ret;
>> + page_pool_recycle_direct(pool, page);
>> + goto recycle_alloc_pg;
>> }
>> }
>>
>> return 0;
>> +
>> +recycle_alloc_pg:
>> + prueth_reset_rx_chan(&emac->rx_chns, PRUETH_MAX_RX_FLOWS, false);
>> +
>> + return ret;
>> }
>> EXPORT_SYMBOL_GPL(prueth_prepare_rx_chan);
>>
>> @@ -958,6 +1011,9 @@ void prueth_reset_rx_chan(struct prueth_rx_chn *chn,
>> prueth_rx_cleanup, !!i);
>> if (disable)
>> k3_udma_glue_disable_rx_chn(chn->rx_chn);
>> +
>> + page_pool_destroy(chn->pg_pool);
>> + chn->pg_pool = NULL;
>> }
>> EXPORT_SYMBOL_GPL(prueth_reset_rx_chan);
>>
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> index 329b46e9ee53..c7b906de18af 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> @@ -33,6 +33,8 @@
>> #include <linux/dma/k3-udma-glue.h>
>>
>> #include <net/devlink.h>
>> +#include <net/xdp.h>
>> +#include <net/page_pool/helpers.h>
>>
>> #include "icssg_config.h"
>> #include "icss_iep.h"
>> @@ -131,6 +133,7 @@ struct prueth_rx_chn {
>> u32 descs_num;
>> unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
>> char name[32];
>> + struct page_pool *pg_pool;
>> };
>>
>> /* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
>> @@ -210,6 +213,10 @@ struct prueth_emac {
>> struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
>> };
>>
>> +/* The buf includes headroom compatible with both skb and xdpf */
>> +#define PRUETH_HEADROOM_NA (max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN)
>> +#define PRUETH_HEADROOM ALIGN(PRUETH_HEADROOM_NA, sizeof(long))
>> +
>> /**
>> * struct prueth_pdata - PRUeth platform data
>> * @fdqring_mode: Free desc queue mode
>> @@ -410,9 +417,10 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
>> struct prueth_rx_chn *rx_chn,
>> char *name, u32 max_rflows,
>> u32 max_desc_num);
>> -int prueth_dma_rx_push(struct prueth_emac *emac,
>> - struct sk_buff *skb,
>> - struct prueth_rx_chn *rx_chn);
>> +int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>> + struct prueth_rx_chn *rx_chn,
>> + struct page *page, u32 buf_len);
>> +unsigned int prueth_rxbuf_total_len(unsigned int len);
>> void emac_rx_timestamp(struct prueth_emac *emac,
>> struct sk_buff *skb, u32 *psdata);
>> enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev);
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>> index 64a19ff39562..aeeb8a50376b 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>> @@ -274,10 +274,12 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
>> struct prueth_rx_chn *rx_chn = &emac->rx_mgm_chn;
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_rx;
>> - struct sk_buff *skb, *new_skb;
>> + struct page *page, *new_page;
>> dma_addr_t desc_dma, buf_dma;
>> u32 buf_dma_len, pkt_len;
>> + struct sk_buff *skb;
>
> Can we get rid of SKB entirely from the management channel code?
> The packet received on this channel is never passed to user space so
> I don't see why SKB is required.
>
Yes, I agree that the SKB here is never passed to the network stack, so
it is redundant. But honestly I am not sure how that can be done,
because the callers of this function access skb->data from the returned
skb, and the same can't be done with a page (how would we pass the same
data using a page?).
Also, as you are aware, we are no longer supporting SR1 devices, so I
don't have any SR1 device handy to test these changes and ensure nothing
is broken if I remove the SKB entirely.
>> void **swdata;
>> + void *pa;
>> int ret;
>>
>> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
>> @@ -299,32 +301,35 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
>> }
>>
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - skb = *swdata;
>> + page = *swdata;
>> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>>
>> dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>>
>> - new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
>> + new_page = page_pool_dev_alloc_pages(rx_chn->pg_pool);
>> /* if allocation fails we drop the packet but push the
>> * descriptor back to the ring with old skb to prevent a stall
>> */
>> - if (!new_skb) {
>> + if (!new_page) {
>> netdev_err(ndev,
>> - "skb alloc failed, dropped mgm pkt from flow %d\n",
>> + "page alloc failed, dropped mgm pkt from flow %d\n",
>> flow_id);
>> - new_skb = skb;
>> + new_page = page;
>> skb = NULL; /* return NULL */
>> } else {
>> /* return the filled skb */
>> + pa = page_address(page);
>> + skb = napi_build_skb(pa, PAGE_SIZE);
>> skb_put(skb, pkt_len);
>> }
>>
>> /* queue another DMA */
>> - ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_mgm_chn);
>> + ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
>> + PRUETH_MAX_PKT_SIZE);
>> if (WARN_ON(ret < 0))
>> - dev_kfree_skb_any(new_skb);
>> + page_pool_recycle_direct(rx_chn->pg_pool, new_page);
>>
>> return skb;
>> }
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next v2 2/3] net: ti: icssg-prueth: introduce and use prueth_swdata struct for SWDATA
2025-02-12 14:12 ` Roger Quadros
@ 2025-02-18 11:02 ` Malladi, Meghana
0 siblings, 0 replies; 17+ messages in thread
From: Malladi, Meghana @ 2025-02-18 11:02 UTC (permalink / raw)
To: Roger Quadros, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
On 2/12/2025 7:42 PM, Roger Quadros wrote:
> Hi Meghana,
>
> On 10/02/2025 12:33, Meghana Malladi wrote:
>> From: Roger Quadros <rogerq@kernel.org>
>>
>> We have different cases for SWDATA (skb, page, cmd, etc)
>> so it is better to have a dedicated data structure for that.
>> We can embed the type field inside the struct and use it
>> to interpret the data in completion handlers.
>>
>> Increase SWDATA size to 48 so we have some room to add
>> more data if required.
>>
>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
>> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
>> ---
>> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>>
>> Changes since v1 (v2-v1): None
>>
>> drivers/net/ethernet/ti/icssg/icssg_common.c | 47 ++++++++++++-------
>> drivers/net/ethernet/ti/icssg/icssg_config.h | 2 +-
>> drivers/net/ethernet/ti/icssg/icssg_prueth.c | 6 +++
>> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 18 +++++++
>> .../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 4 +-
>> 5 files changed, 58 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
>> index c3c1e2bf461e..a124c5773551 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
>> @@ -136,12 +136,12 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_tx;
>> struct netdev_queue *netif_txq;
>> + struct prueth_swdata *swdata;
>> struct prueth_tx_chn *tx_chn;
>> unsigned int total_bytes = 0;
>> struct sk_buff *skb;
>> dma_addr_t desc_dma;
>> int res, num_tx = 0;
>> - void **swdata;
>>
>> tx_chn = &emac->tx_chns[chn];
>>
>> @@ -163,12 +163,18 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>> swdata = cppi5_hdesc_get_swdata(desc_tx);
>>
>> /* was this command's TX complete? */
>> - if (emac->is_sr1 && *(swdata) == emac->cmd_data) {
>> + if (emac->is_sr1 && (void *)(swdata) == emac->cmd_data) {
>> prueth_xmit_free(tx_chn, desc_tx);
>> continue;
>> }
>>
>> - skb = *(swdata);
>> + if (swdata->type != PRUETH_SWDATA_SKB) {
>> + netdev_err(ndev, "tx_complete: invalid swdata type %d\n", swdata->type);
>> + budget++;
>> + continue;
>
> We are leaking the tx descriptor here. need to call prueth_xmit_free(tx_chn, desc_tx)?
>
Yes agreed, I will add prueth_xmit_free() here.
>> + }
>> +
>> + skb = swdata->data.skb;
>> prueth_xmit_free(tx_chn, desc_tx);
>>
>> ndev = skb->dev;
>> @@ -472,9 +478,9 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>> {
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_rx;
>> + struct prueth_swdata *swdata;
>> dma_addr_t desc_dma;
>> dma_addr_t buf_dma;
>> - void **swdata;
>>
>> buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
>> desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
>> @@ -490,7 +496,8 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>> cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
>>
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - *swdata = page;
>> + swdata->type = PRUETH_SWDATA_PAGE;
>> + swdata->data.page = page;
>>
>> return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
>> desc_rx, desc_dma);
>> @@ -539,11 +546,11 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> u32 buf_dma_len, pkt_len, port_id = 0;
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_rx;
>> + struct prueth_swdata *swdata;
>> dma_addr_t desc_dma, buf_dma;
>> struct page *page, *new_page;
>> struct page_pool *pool;
>> struct sk_buff *skb;
>> - void **swdata;
>> u32 *psdata;
>> void *pa;
>> int ret;
>> @@ -561,7 +568,11 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>>
>> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - page = *swdata;
>> + if (swdata->type != PRUETH_SWDATA_PAGE) {
>> + netdev_err(ndev, "rx_pkt: invalid swdata->type %d\n", swdata->type);
>
> what about freeing the rx descriptor?
>
Yes, will do that.
>> + return 0;
>> + }
>
> need new line.
>
>> + page = swdata->data.page;
>>
> drop new line.
>
>> page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
>> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>> @@ -627,16 +638,18 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
>> {
>> struct prueth_rx_chn *rx_chn = data;
>> struct cppi5_host_desc_t *desc_rx;
>> + struct prueth_swdata *swdata;
>> struct page_pool *pool;
>> struct page *page;
>> - void **swdata;
>>
>> pool = rx_chn->pg_pool;
>>
>> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - page = *swdata;
>> - page_pool_recycle_direct(pool, page);
>> + if (swdata->type == PRUETH_SWDATA_PAGE) {
>> + page = swdata->data.page;
>> + page_pool_recycle_direct(pool, page);
>> + }
>
> need new line.
>
>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>> }
>>
>> @@ -673,13 +686,13 @@ enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev
>> struct prueth_emac *emac = netdev_priv(ndev);
>> struct prueth *prueth = emac->prueth;
>> struct netdev_queue *netif_txq;
>> + struct prueth_swdata *swdata;
>> struct prueth_tx_chn *tx_chn;
>> dma_addr_t desc_dma, buf_dma;
>> u32 pkt_len, dst_tag_id;
>> int i, ret = 0, q_idx;
>> bool in_tx_ts = 0;
>> int tx_ts_cookie;
>> - void **swdata;
>> u32 *epib;
>>
>> pkt_len = skb_headlen(skb);
>> @@ -741,7 +754,8 @@ enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev
>> k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
>> cppi5_hdesc_attach_buf(first_desc, buf_dma, pkt_len, buf_dma, pkt_len);
>> swdata = cppi5_hdesc_get_swdata(first_desc);
>> - *swdata = skb;
>> + swdata->type = PRUETH_SWDATA_SKB;
>> + swdata->data.skb = skb;
>>
>> /* Handle the case where skb is fragmented in pages */
>> cur_desc = first_desc;
>> @@ -844,15 +858,16 @@ static void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
>> {
>> struct prueth_tx_chn *tx_chn = data;
>> struct cppi5_host_desc_t *desc_tx;
>> + struct prueth_swdata *swdata;
>> struct sk_buff *skb;
>> - void **swdata;
>>
>> desc_tx = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool, desc_dma);
>> swdata = cppi5_hdesc_get_swdata(desc_tx);
>> - skb = *(swdata);
>> + if (swdata->type == PRUETH_SWDATA_SKB) {
>> + skb = swdata->data.skb;
>> + dev_kfree_skb_any(skb);
>> + }
>
> need new line.
>
>> prueth_xmit_free(tx_chn, desc_tx);
>> -
>> - dev_kfree_skb_any(skb);
>> }
>>
>> irqreturn_t prueth_rx_irq(int irq, void *dev_id)
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_config.h b/drivers/net/ethernet/ti/icssg/icssg_config.h
>> index c884e9fa099e..eab84e11d80e 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_config.h
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_config.h
>> @@ -20,7 +20,7 @@ struct icssg_flow_cfg {
>>
>> #define PRUETH_PKT_TYPE_CMD 0x10
>> #define PRUETH_NAV_PS_DATA_SIZE 16 /* Protocol specific data size */
>> -#define PRUETH_NAV_SW_DATA_SIZE 16 /* SW related data size */
>> +#define PRUETH_NAV_SW_DATA_SIZE 48 /* SW related data size */
>
> Why do you need 48? I think it should fit in 16.
>
Yes, I overlooked the fact that swdata will be a union; I will update
this. But if we add rx_chn to prueth_swdata in the next patch, we will
need to increase this to 24.
>> #define PRUETH_MAX_TX_DESC 512
>> #define PRUETH_MAX_RX_DESC 512
>> #define PRUETH_MAX_RX_FLOWS 1 /* excluding default flow */
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> index 00ed97860547..e5e4efe485f6 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> @@ -1522,6 +1522,12 @@ static int prueth_probe(struct platform_device *pdev)
>>
>> np = dev->of_node;
>>
>> + if (sizeof(struct prueth_swdata) > PRUETH_NAV_SW_DATA_SIZE) {
>> + dev_err(dev, "insufficient SW_DATA size: %d vs %ld\n",
>> + PRUETH_NAV_SW_DATA_SIZE, sizeof(struct prueth_swdata));
>> + return -ENOMEM;
>> + }
>> +
>
> Can this be made a build time check instead of runtime?
>
Sure, I believe the BUILD_BUG_ON_MSG() macro can be used as a build-time
check. I will use that.
>> prueth = devm_kzalloc(dev, sizeof(*prueth), GFP_KERNEL);
>> if (!prueth)
>> return -ENOMEM;
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> index c7b906de18af..2c8585255b7c 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> @@ -136,6 +136,24 @@ struct prueth_rx_chn {
>> struct page_pool *pg_pool;
>> };
>>
>> +enum prueth_swdata_type {
>> + PRUETH_SWDATA_INVALID = 0,
>> + PRUETH_SWDATA_SKB,
>> + PRUETH_SWDATA_PAGE,
>> + PRUETH_SWDATA_CMD,
>> +};
>> +
>> +union prueth_data {
>> + struct sk_buff *skb;
>> + struct page *page;
>> + u32 cmd;
>> +};
>> +
>> +struct prueth_swdata {
>> + union prueth_data data;
>> + enum prueth_swdata_type type;
>> +};
>
> Can we re-write this like so with type first and union embedded inside?
>
> struct prueth_swdata {
> enum prueth_swdata_type type;
> union prueth_data {
> struct sk_buff *skb;
> struct page *page;
> u32 cmd;
> };
> };
>
Yeah, OK, I will make these changes.
>
>> +
>> /* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
>> * and lower three are lower priority channels or threads.
>> */
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>> index aeeb8a50376b..7bbe0808b3ec 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>> @@ -275,10 +275,10 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
>> struct net_device *ndev = emac->ndev;
>> struct cppi5_host_desc_t *desc_rx;
>> struct page *page, *new_page;
>> + struct prueth_swdata *swdata;
>> dma_addr_t desc_dma, buf_dma;
>> u32 buf_dma_len, pkt_len;
>> struct sk_buff *skb;
>> - void **swdata;
>> void *pa;
>> int ret;
>>
>> @@ -301,7 +301,7 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
>> }
>>
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> - page = *swdata;
>> + page = swdata->data.page;
>> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
2025-02-18 10:10 ` Malladi, Meghana
@ 2025-02-18 12:24 ` Diogo Ivo
2025-02-18 12:28 ` [EXTERNAL] " Malladi, Meghana
2025-02-24 13:20 ` Roger Quadros
1 sibling, 1 reply; 17+ messages in thread
From: Diogo Ivo @ 2025-02-18 12:24 UTC (permalink / raw)
To: Malladi, Meghana, Roger Quadros, danishanwar, pabeni, kuba,
edumazet, davem, andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra, diogo.ivo
Hi Meghana,
On 2/18/25 10:10 AM, Malladi, Meghana wrote:
> On 2/12/2025 7:26 PM, Roger Quadros wrote:
>> Can we get rid of SKB entirely from the management channel code?
>> The packet received on this channel is never passed to user space so
>> I don't see why SKB is required.
>>
>
> Yes I do agree with you on the fact the SKB here is not passed to the
> network stack, hence this is redundant. But honestly I am not sure how
> that can be done, because the callers of this function access skb->data
> from the skb which is returned and the same can't be done with page (how
> to pass the same data using page?)
> Also as you are aware we are not currently supporting SR1 devices
> anymore, hence I don't have any SR1 devices handy to test these changes
> and ensure nothing is broken if I remove SKB entirely.
I have some SR1 devices available and would be happy to test these
proposed changes in case they are feasible.
Best regards,
Diogo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [EXTERNAL] Re: [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
2025-02-18 12:24 ` Diogo Ivo
@ 2025-02-18 12:28 ` Malladi, Meghana
0 siblings, 0 replies; 17+ messages in thread
From: Malladi, Meghana @ 2025-02-18 12:28 UTC (permalink / raw)
To: Diogo Ivo, Roger Quadros, danishanwar, pabeni, kuba, edumazet,
davem, andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
Hi Diogo,
On 2/18/2025 5:54 PM, Diogo Ivo wrote:
> Hi Meghana,
>
> On 2/18/25 10:10 AM, Malladi, Meghana wrote:
>> On 2/12/2025 7:26 PM, Roger Quadros wrote:
>>> Can we get rid of SKB entirely from the management channel code?
>>> The packet received on this channel is never passed to user space so
>>> I don't see why SKB is required.
>>>
>>
>> Yes, I agree that the SKB here is not passed to the
>> network stack, so it is redundant. But honestly I am not sure how
>> that can be done, because the callers of this function access skb->data
>> from the returned skb, and the same can't be done with a page (how
>> would we pass the same data using a page?)
>> Also, as you are aware, we no longer support SR1 devices,
>> so I don't have any SR1 devices handy to test these changes
>> and ensure nothing breaks if I remove the SKB entirely.
>
> I have some SR1 devices available and would be happy to test these
> proposed changes in case they are feasible.
>
That's awesome. Once the changes have been aligned with Roger, I will
share them with you for testing before posting v3. Thanks for
your support.
> Best regards,
> Diogo
>
* Re: [PATCH net-next v2 0/3] Add native mode XDP support
2025-02-10 10:33 [PATCH net-next v2 0/3] Add native mode XDP support Meghana Malladi
` (2 preceding siblings ...)
2025-02-10 10:33 ` [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support Meghana Malladi
@ 2025-02-18 16:02 ` Jesper Dangaard Brouer
2025-02-18 17:55 ` Malladi, Meghana
3 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2025-02-18 16:02 UTC (permalink / raw)
To: Meghana Malladi, rogerq, danishanwar, pabeni, kuba, edumazet,
davem, andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, daniel, ast, srk,
Vignesh Raghavendra
On 10/02/2025 11.33, Meghana Malladi wrote:
> This series adds native XDP support using page_pool.
Please also describe *what driver* this adds XDP support for.
This cover letter will (by netdev maintainers) be part of the merge
commit text. Thus, please mention the driver name in the text.
This also applies to the Subject line. It should either be prefixed with
"net: ti: icssg-prueth:" as you did for the other patches, or be renamed
to e.g.: "Add native mode XDP support for driver ti/icssg-prueth".
Thanks,
--Jesper
* Re: [PATCH net-next v2 0/3] Add native mode XDP support
2025-02-18 16:02 ` [PATCH net-next v2 0/3] Add native mode " Jesper Dangaard Brouer
@ 2025-02-18 17:55 ` Malladi, Meghana
0 siblings, 0 replies; 17+ messages in thread
From: Malladi, Meghana @ 2025-02-18 17:55 UTC (permalink / raw)
To: Jesper Dangaard Brouer, rogerq, danishanwar, pabeni, kuba,
edumazet, davem, andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, daniel, ast, srk,
Vignesh Raghavendra
On 2/18/2025 9:32 PM, Jesper Dangaard Brouer wrote:
>
> On 10/02/2025 11.33, Meghana Malladi wrote:
>> This series adds native XDP support using page_pool.
>
> Please also describe *what driver* this adds XDP support for.
>
> This cover letter will (by netdev maintainers) be part of the merge
> commit text. Thus, please mention the driver name in the text.
>
> This also applies to the Subject line. It should either be prefixed with
> "net: ti: icssg-prueth:" as you did for the other patches, or be renamed
> to e.g.: "Add native mode XDP support for driver ti/icssg-prueth".
>
Oh ok got it. Will update the subject line accordingly in v3 and follow
the same for my upcoming patches. Thanks.
> Thanks,
> --Jesper
* Re: [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support
2025-02-12 16:03 ` Roger Quadros
@ 2025-02-18 18:29 ` Malladi, Meghana
0 siblings, 0 replies; 17+ messages in thread
From: Malladi, Meghana @ 2025-02-18 18:29 UTC (permalink / raw)
To: Roger Quadros, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
On 2/12/2025 9:33 PM, Roger Quadros wrote:
>
>
> On 10/02/2025 12:33, Meghana Malladi wrote:
>> From: Roger Quadros <rogerq@kernel.org>
>>
>> Add native XDP support. We do not support zero copy yet.
>>
>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
>> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
>> ---
>> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>>
>> Changes since v1 (v2-v1):
>> - Fix XDP typo in the commit message
>> - Add XDP feature flags using xdp_set_features_flag()
>> - Use xdp_build_skb_from_buff() when XDP ran
>>
>> All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
>>
>> drivers/net/ethernet/ti/icssg/icssg_common.c | 226 +++++++++++++++++--
>> drivers/net/ethernet/ti/icssg/icssg_prueth.c | 123 +++++++++-
>> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 18 ++
>> 3 files changed, 353 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
>> index a124c5773551..b01750a2d57e 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
>> @@ -98,11 +98,19 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
>> {
>> struct cppi5_host_desc_t *first_desc, *next_desc;
>> dma_addr_t buf_dma, next_desc_dma;
>> + struct prueth_swdata *swdata;
>> u32 buf_dma_len;
>>
>> first_desc = desc;
>> next_desc = first_desc;
>>
>> + swdata = cppi5_hdesc_get_swdata(desc);
>> + if (swdata->type == PRUETH_SWDATA_PAGE) {
>> + page_pool_recycle_direct(swdata->rx_chn->pg_pool,
>> + swdata->data.page);
>
> if swdata->data.page.pp already contains the page_pool then you can avoid
> passing around rx_chn via swdata altogether.
>
Oh ok, I didn't know the page also carries a pointer to its page_pool. Will
remove rx_chn from swdata then.
>> + goto free_desc;
>> + }
>> +
>> cppi5_hdesc_get_obuf(first_desc, &buf_dma, &buf_dma_len);
>> k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &buf_dma);
>>
>> @@ -126,6 +134,7 @@ void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
>> k3_cppi_desc_pool_free(tx_chn->desc_pool, next_desc);
>> }
>>
>> +free_desc:
>> k3_cppi_desc_pool_free(tx_chn->desc_pool, first_desc);
>> }
>> EXPORT_SYMBOL_GPL(prueth_xmit_free);
>> @@ -139,6 +148,7 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>> struct prueth_swdata *swdata;
>> struct prueth_tx_chn *tx_chn;
>> unsigned int total_bytes = 0;
>> + struct xdp_frame *xdpf;
>> struct sk_buff *skb;
>> dma_addr_t desc_dma;
>> int res, num_tx = 0;
>> @@ -168,20 +178,29 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>> continue;
>> }
>>
>> - if (swdata->type != PRUETH_SWDATA_SKB) {
>> + switch (swdata->type) {
>> + case PRUETH_SWDATA_SKB:
>> + skb = swdata->data.skb;
>> + ndev->stats.tx_bytes += skb->len;
>> + ndev->stats.tx_packets++;
>
> dev_sw_netstats_tx_add() instead?
>
Ok, will use this instead.
>> + total_bytes += skb->len;
>> + napi_consume_skb(skb, budget);
>> + break;
>> + case PRUETH_SWDATA_XDPF:
>> + xdpf = swdata->data.xdpf;
>> + ndev->stats.tx_bytes += xdpf->len;
>> + ndev->stats.tx_packets++;
> here too
>
>> + total_bytes += xdpf->len;
>> + xdp_return_frame(xdpf);
>> + break;
>> + default:
>> netdev_err(ndev, "tx_complete: invalid swdata type %d\n", swdata->type);
>
> ndev->stats.tx_dropped++
>
yeah, will add it.
>> + prueth_xmit_free(tx_chn, desc_tx);
>> budget++;
>> continue;
>> }
>>
>> - skb = swdata->data.skb;
>> prueth_xmit_free(tx_chn, desc_tx);
>> -
>> - ndev = skb->dev;
>> - ndev->stats.tx_packets++;
>> - ndev->stats.tx_bytes += skb->len;
>> - total_bytes += skb->len;
>> - napi_consume_skb(skb, budget);
>> num_tx++;
>> }
>>
>> @@ -498,6 +517,7 @@ int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>> swdata->type = PRUETH_SWDATA_PAGE;
>> swdata->data.page = page;
>> + swdata->rx_chn = rx_chn;
>>
>> return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
>> desc_rx, desc_dma);
>> @@ -540,7 +560,156 @@ void emac_rx_timestamp(struct prueth_emac *emac,
>> ssh->hwtstamp = ns_to_ktime(ns);
>> }
>>
>> -static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> +/**
>> + * emac_xmit_xdp_frame - transmits an XDP frame
>> + * @emac: emac device
>> + * @xdpf: data to transmit
>> + * @page: page from page pool if already DMA mapped
>> + * @q_idx: queue id
>> + *
>> + * Return: XDP state
>> + */
>> +int emac_xmit_xdp_frame(struct prueth_emac *emac,
>> + struct xdp_frame *xdpf,
>> + struct page *page,
>> + unsigned int q_idx)
>> +{
>> + struct cppi5_host_desc_t *first_desc;
>> + struct net_device *ndev = emac->ndev;
>> + struct prueth_tx_chn *tx_chn;
>> + dma_addr_t desc_dma, buf_dma;
>> + struct prueth_swdata *swdata;
>> + u32 *epib;
>> + int ret;
>> +
>> + void *data = xdpf->data;
>> + u32 pkt_len = xdpf->len;
>> +
>> + if (q_idx >= PRUETH_MAX_TX_QUEUES) {
>> + netdev_err(ndev, "xdp tx: invalid q_id %d\n", q_idx);
>
> ndev->stats.tx_dropped++;
>
Instead of adding this here, I will add it in the caller of the function
so the caller can increase the stats based on what this function
returns. If it returns ICSSG_XDP_CONSUMED, then tx_dropped++.
>> + return ICSSG_XDP_CONSUMED; /* drop */
>> + }
>> +
>> + tx_chn = &emac->tx_chns[q_idx];
>> +
>> + if (page) { /* already DMA mapped by page_pool */
>> + buf_dma = page_pool_get_dma_addr(page);
>> + buf_dma += xdpf->headroom + sizeof(struct xdp_frame);
>> + } else { /* Map the linear buffer */
>> + buf_dma = dma_map_single(tx_chn->dma_dev, data, pkt_len, DMA_TO_DEVICE);
>> + if (dma_mapping_error(tx_chn->dma_dev, buf_dma)) {
>> + netdev_err(ndev, "xdp tx: failed to map data buffer\n");
>
> ndev->stats.tx_dropped++;
>
Same.
>> + return ICSSG_XDP_CONSUMED; /* drop */
>> + }
>> + }
>> +
>> + first_desc = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
>> + if (!first_desc) {
>> + netdev_dbg(ndev, "xdp tx: failed to allocate descriptor\n");
>> + if (!page)
>> + dma_unmap_single(tx_chn->dma_dev, buf_dma, pkt_len, DMA_TO_DEVICE);
>
> Better to do the k3_cppi_desc_pool_alloc() before the DMA mapping
> so it is easier to clean up on failure.
>
Ok, will move it above.
>> + goto drop_free_descs; /* drop */
>> + }
>> +
>> + cppi5_hdesc_init(first_desc, CPPI5_INFO0_HDESC_EPIB_PRESENT,
>> + PRUETH_NAV_PS_DATA_SIZE);
>> + cppi5_hdesc_set_pkttype(first_desc, 0);
>> + epib = first_desc->epib;
>> + epib[0] = 0;
>> + epib[1] = 0;
>> +
>> + /* set dst tag to indicate internal qid at the firmware which is at
>> + * bit8..bit15. bit0..bit7 indicates port num for directed
>> + * packets in case of switch mode operation
>> + */
>> + cppi5_desc_set_tags_ids(&first_desc->hdr, 0, (emac->port_id | (q_idx << 8)));
>> + k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &buf_dma);
>> + cppi5_hdesc_attach_buf(first_desc, buf_dma, pkt_len, buf_dma, pkt_len);
>> + swdata = cppi5_hdesc_get_swdata(first_desc);
>> + if (page) {
>> + swdata->type = PRUETH_SWDATA_PAGE;
>> + swdata->data.page = page;
>> + /* we assume page came from RX channel page pool */
>> + swdata->rx_chn = &emac->rx_chns;
>> + } else {
>> + swdata->type = PRUETH_SWDATA_XDPF;
>> + swdata->data.xdpf = xdpf;
>> + }
>> +
>> + cppi5_hdesc_set_pktlen(first_desc, pkt_len);
>> + desc_dma = k3_cppi_desc_pool_virt2dma(tx_chn->desc_pool, first_desc);
>> +
>> + ret = k3_udma_glue_push_tx_chn(tx_chn->tx_chn, first_desc, desc_dma);
>> + if (ret) {
>> + netdev_err(ndev, "xdp tx: push failed: %d\n", ret);
>> + goto drop_free_descs;
>> + }
>> +
>> + return ICSSG_XDP_TX;
>> +
>> +drop_free_descs:
>
> ndev->stats.tx_dropped++;
>
Same, will let the caller decide to increase the stats.
>> + prueth_xmit_free(tx_chn, first_desc);
>
> this will also unmap the dma_buffer for all cases. So maybe you need
> to add a flag to prueth_xmit_free() to skip unmap for certain cases.
>
Can you specify for which cases the unmap should be skipped and for
which it shouldn't?
>> + return ICSSG_XDP_CONSUMED;
>> +}
>> +EXPORT_SYMBOL_GPL(emac_xmit_xdp_frame);
>> +
>> +/**
>> + * emac_run_xdp - run an XDP program
>> + * @emac: emac device
>> + * @xdp: XDP buffer containing the frame
>> + * @page: page with RX data if already DMA mapped
>> + *
>> + * Return: XDP state
>> + */
>> +static int emac_run_xdp(struct prueth_emac *emac, struct xdp_buff *xdp,
>> + struct page *page)
>> +{
>> + int err, result = ICSSG_XDP_PASS;
>> + struct bpf_prog *xdp_prog;
>> + struct xdp_frame *xdpf;
>> + int q_idx;
>> + u32 act;
>> +
>> + xdp_prog = READ_ONCE(emac->xdp_prog);
>> +
> unnecessary new line.
>
Ok, will remove it.
>> + act = bpf_prog_run_xdp(xdp_prog, xdp);
>> + switch (act) {
>> + case XDP_PASS:
>
> return ICSSG_XDP_PASS;
>
result is initialized to ICSSG_XDP_PASS, so after the break the
function returns ICSSG_XDP_PASS.
>> + break;
>> + case XDP_TX:
>> + /* Send packet to TX ring for immediate transmission */
>> + xdpf = xdp_convert_buff_to_frame(xdp);
>> + if (unlikely(!xdpf))
> ndev->stats.tx_dropped++;
>
will add this after the drop label, as it applies to all conditions
where a drop happens.
>> + goto drop;
>> +
>> + q_idx = smp_processor_id() % emac->tx_ch_num;
>> + result = emac_xmit_xdp_frame(emac, xdpf, page, q_idx);
>> + if (result == ICSSG_XDP_CONSUMED)
>> + goto drop;
>
> increment tx stats?
>
same
> return ICSSG_XDP_TX;
>
emac_xmit_xdp_frame() returns ICSSG_XDP_TX if the packet hasn't been
dropped (ICSSG_XDP_CONSUMED).
>> + break;
>> + case XDP_REDIRECT:
>> + err = xdp_do_redirect(emac->ndev, xdp, xdp_prog);
>> + if (err)
>> + goto drop;
>> + result = ICSSG_XDP_REDIR;
>
> return ICSSG_XDP_REDIR
The break handles this return.
>> + break;
>> + default:
>> + bpf_warn_invalid_xdp_action(emac->ndev, xdp_prog, act);
>> + fallthrough;
>> + case XDP_ABORTED:
>> +drop:
>> + trace_xdp_exception(emac->ndev, xdp_prog, act);
>> + fallthrough; /* handle aborts by dropping packet */
>> + case XDP_DROP:
>
> ndev->stats.rx_dropped++;
>
Yes, I will add it under XDP_DROP so it handles both the XDP_DROP
switch case and the drop label.
>> + result = ICSSG_XDP_CONSUMED;
>> + page_pool_recycle_direct(emac->rx_chns.pg_pool, page);
>> + break;
>> + }
>> +
>> + return result;
>> +}
>> +
>> +static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id, int *xdp_state)
>> {
>> struct prueth_rx_chn *rx_chn = &emac->rx_chns;
>> u32 buf_dma_len, pkt_len, port_id = 0;
>> @@ -551,10 +720,12 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> struct page *page, *new_page;
>> struct page_pool *pool;
>> struct sk_buff *skb;
>> + struct xdp_buff xdp;
>> u32 *psdata;
>> void *pa;
>> int ret;
>>
>> + *xdp_state = 0;
>> pool = rx_chn->pg_pool;
>> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
>> if (ret) {
>> @@ -594,9 +765,21 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>> goto requeue;
>> }
>>
>> - /* prepare skb and send to n/w stack */
>> pa = page_address(page);
>> - skb = napi_build_skb(pa, PAGE_SIZE);
>> + if (emac->xdp_prog) {
>> + xdp_init_buff(&xdp, PAGE_SIZE, &rx_chn->xdp_rxq);
>> + xdp_prepare_buff(&xdp, pa, PRUETH_HEADROOM, pkt_len, false);
>> +
>> + *xdp_state = emac_run_xdp(emac, &xdp, page);
>> + if (*xdp_state == ICSSG_XDP_PASS)
>> + skb = xdp_build_skb_from_buff(&xdp);
>> + else
>> + goto requeue;
>> + } else {
>> + /* prepare skb and send to n/w stack */
>> + skb = napi_build_skb(pa, PAGE_SIZE);
>> + }
>> +
>> if (!skb) {
>> ndev->stats.rx_dropped++;
>> page_pool_recycle_direct(pool, page);
>> @@ -859,14 +1042,25 @@ static void prueth_tx_cleanup(void *data, dma_addr_t desc_dma)
>> struct prueth_tx_chn *tx_chn = data;
>> struct cppi5_host_desc_t *desc_tx;
>> struct prueth_swdata *swdata;
>> + struct xdp_frame *xdpf;
>> struct sk_buff *skb;
>>
>> desc_tx = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool, desc_dma);
>> swdata = cppi5_hdesc_get_swdata(desc_tx);
>> - if (swdata->type == PRUETH_SWDATA_SKB) {
>> +
>> + switch (swdata->type) {
>> + case PRUETH_SWDATA_SKB:
>> skb = swdata->data.skb;
>> dev_kfree_skb_any(skb);
>> + break;
>> + case PRUETH_SWDATA_XDPF:
>> + xdpf = swdata->data.xdpf;
>> + xdp_return_frame(xdpf);
>> + break;
>
> what about PRUETH_SWDATA_PAGE?
>
This gets handled inside prueth_xmit_free().
>> + default:
>> + break;
>> }
>> +
>> prueth_xmit_free(tx_chn, desc_tx);
>> }
>>
>> @@ -901,15 +1095,18 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
>> PRUETH_RX_FLOW_DATA_SR1 : PRUETH_RX_FLOW_DATA;
>> int flow = emac->is_sr1 ?
>> PRUETH_MAX_RX_FLOWS_SR1 : PRUETH_MAX_RX_FLOWS;
>> + int xdp_state_or = 0;
>> int num_rx = 0;
>> int cur_budget;
>> + int xdp_state;
>> int ret;
>>
>> while (flow--) {
>> cur_budget = budget - num_rx;
>>
>> while (cur_budget--) {
>> - ret = emac_rx_packet(emac, flow);
>> + ret = emac_rx_packet(emac, flow, &xdp_state);
>> + xdp_state_or |= xdp_state;
>> if (ret)
>> break;
>> num_rx++;
>> @@ -919,6 +1116,9 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
>> break;
>> }
>>
>> + if (xdp_state_or & ICSSG_XDP_REDIR)
>> + xdp_do_flush();
>> +
>> if (num_rx < budget && napi_complete_done(napi_rx, num_rx)) {
>> if (unlikely(emac->rx_pace_timeout_ns)) {
>> hrtimer_start(&emac->rx_hrtimer,
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> index e5e4efe485f6..a360a1d6f8d7 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> @@ -559,6 +559,33 @@ const struct icss_iep_clockops prueth_iep_clockops = {
>> .perout_enable = prueth_perout_enable,
>> };
>>
>> +static int prueth_create_xdp_rxqs(struct prueth_emac *emac)
>> +{
>> + struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
>> + struct page_pool *pool = emac->rx_chns.pg_pool;
>> + int ret;
>> +
>> + ret = xdp_rxq_info_reg(rxq, emac->ndev, 0, rxq->napi_id);
>
> but who sets rxq->napi_id?
>
> I think you need to use emac->napi_rx.napi_id
>
Yes, I have updated it with "emac->napi_rx.napi_id"
>> + if (ret)
>> + return ret;
>> +
>> + ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL, pool);
>> + if (ret)
>> + xdp_rxq_info_unreg(rxq);
>> +
>> + return ret;
>> +}
>> +
>> +static void prueth_destroy_xdp_rxqs(struct prueth_emac *emac)
>> +{
>> + struct xdp_rxq_info *rxq = &emac->rx_chns.xdp_rxq;
>> +
>> + if (!xdp_rxq_info_is_reg(rxq))
>> + return;
>> +
>> + xdp_rxq_info_unreg(rxq);
>> +}
>> +
>> static int icssg_prueth_add_mcast(struct net_device *ndev, const u8 *addr)
>> {
>> struct net_device *real_dev;
>> @@ -780,10 +807,14 @@ static int emac_ndo_open(struct net_device *ndev)
>> if (ret)
>> goto free_tx_ts_irq;
>>
>> - ret = k3_udma_glue_enable_rx_chn(emac->rx_chns.rx_chn);
>> + ret = prueth_create_xdp_rxqs(emac);
>> if (ret)
>> goto reset_rx_chn;
>>
>> + ret = k3_udma_glue_enable_rx_chn(emac->rx_chns.rx_chn);
>> + if (ret)
>> + goto destroy_xdp_rxqs;
>> +
>> for (i = 0; i < emac->tx_ch_num; i++) {
>> ret = k3_udma_glue_enable_tx_chn(emac->tx_chns[i].tx_chn);
>> if (ret)
>> @@ -809,6 +840,8 @@ static int emac_ndo_open(struct net_device *ndev)
>> * any SKB for completion. So set false to free_skb
>> */
>> prueth_reset_tx_chan(emac, i, false);
>> +destroy_xdp_rxqs:
>> + prueth_destroy_xdp_rxqs(emac);
>> reset_rx_chn:
>> prueth_reset_rx_chan(&emac->rx_chns, max_rx_flows, false);
>> free_tx_ts_irq:
>> @@ -880,6 +913,8 @@ static int emac_ndo_stop(struct net_device *ndev)
>>
>> prueth_reset_rx_chan(&emac->rx_chns, max_rx_flows, true);
>>
> Please drop new line.
>
Okay.
>> + prueth_destroy_xdp_rxqs(emac);
>> +
> here too.
>
>> napi_disable(&emac->napi_rx);
>> hrtimer_cancel(&emac->rx_hrtimer);
>>
>> @@ -1024,6 +1059,90 @@ static int emac_ndo_vlan_rx_del_vid(struct net_device *ndev,
>> return 0;
>> }
>>
>> +/**
>> + * emac_xdp_xmit - Implements ndo_xdp_xmit
>> + * @dev: netdev
>> + * @n: number of frames
>> + * @frames: array of XDP buffer pointers
>> + * @flags: XDP extra info
>> + *
>> + * Return: number of frames successfully sent. Failed frames
>> + * will be free'ed by XDP core.
>> + *
>> + * For error cases, a negative errno code is returned and no-frames
>> + * are transmitted (caller must handle freeing frames).
>> + **/
>> +static int emac_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
>> + u32 flags)
>> +{
>> + struct prueth_emac *emac = netdev_priv(dev);
>> + unsigned int q_idx;
>> + int nxmit = 0;
>> + int i;
>> +
>> + q_idx = smp_processor_id() % emac->tx_ch_num;
>> +
>> + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
>> + return -EINVAL;
>> +
>> + for (i = 0; i < n; i++) {
>> + struct xdp_frame *xdpf = frames[i];
>> + int err;
>> +
>> + err = emac_xmit_xdp_frame(emac, xdpf, NULL, q_idx);
>> + if (err != ICSSG_XDP_TX)
>> + break;
>> + nxmit++;
>> + }
>> +
>> + return nxmit;
>> +}
>> +
>> +/**
>> + * emac_xdp_setup - add/remove an XDP program
>> + * @emac: emac device
>> + * @bpf: XDP program
>> + *
>> + * Return: Always 0 (Success)
>> + **/
>> +static int emac_xdp_setup(struct prueth_emac *emac, struct netdev_bpf *bpf)
>> +{
>> + struct bpf_prog *prog = bpf->prog;
>> + xdp_features_t val;
>> +
>> + val = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
>> + NETDEV_XDP_ACT_NDO_XMIT;
>> + xdp_set_features_flag(emac->ndev, val);
>> +
>> + if (!emac->xdpi.prog && !prog)
>> + return 0;
>> +
>> + WRITE_ONCE(emac->xdp_prog, prog);
>> +
>> + xdp_attachment_setup(&emac->xdpi, bpf);
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * emac_ndo_bpf - implements ndo_bpf for icssg_prueth
>> + * @ndev: network adapter device
>> + * @bpf: XDP program
>> + *
>> + * Return: 0 on success, error code on failure.
>> + **/
>> +static int emac_ndo_bpf(struct net_device *ndev, struct netdev_bpf *bpf)
>> +{
>> + struct prueth_emac *emac = netdev_priv(ndev);
>> +
>> + switch (bpf->command) {
>> + case XDP_SETUP_PROG:
>> + return emac_xdp_setup(emac, bpf);
>> + default:
>> + return -EINVAL;
>> + }
>> +}
>> +
>> static const struct net_device_ops emac_netdev_ops = {
>> .ndo_open = emac_ndo_open,
>> .ndo_stop = emac_ndo_stop,
>> @@ -1038,6 +1157,8 @@ static const struct net_device_ops emac_netdev_ops = {
>> .ndo_fix_features = emac_ndo_fix_features,
>> .ndo_vlan_rx_add_vid = emac_ndo_vlan_rx_add_vid,
>> .ndo_vlan_rx_kill_vid = emac_ndo_vlan_rx_del_vid,
>> + .ndo_bpf = emac_ndo_bpf,
>> + .ndo_xdp_xmit = emac_xdp_xmit,
>> };
>>
>> static int prueth_netdev_init(struct prueth *prueth,
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> index 2c8585255b7c..fb8dc8e12c19 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> @@ -8,6 +8,8 @@
>> #ifndef __NET_TI_ICSSG_PRUETH_H
>> #define __NET_TI_ICSSG_PRUETH_H
>>
>> +#include <linux/bpf.h>
>> +#include <linux/bpf_trace.h>
>> #include <linux/etherdevice.h>
>> #include <linux/genalloc.h>
>> #include <linux/if_vlan.h>
>> @@ -134,6 +136,7 @@ struct prueth_rx_chn {
>> unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
>> char name[32];
>> struct page_pool *pg_pool;
>> + struct xdp_rxq_info xdp_rxq;
>> };
>>
>> enum prueth_swdata_type {
>> @@ -141,16 +144,19 @@ enum prueth_swdata_type {
>> PRUETH_SWDATA_SKB,
>> PRUETH_SWDATA_PAGE,
>> PRUETH_SWDATA_CMD,
>> + PRUETH_SWDATA_XDPF,
>> };
>>
>> union prueth_data {
>> struct sk_buff *skb;
>> struct page *page;
>> u32 cmd;
>> + struct xdp_frame *xdpf;
>> };
>>
>> struct prueth_swdata {
>> union prueth_data data;
>> + struct prueth_rx_chn *rx_chn;
>> enum prueth_swdata_type type;
>> };
>>
>> @@ -161,6 +167,12 @@ struct prueth_swdata {
>>
>> #define PRUETH_MAX_TX_TS_REQUESTS 50 /* Max simultaneous TX_TS requests */
>>
>> +/* XDP BPF state */
>> +#define ICSSG_XDP_PASS 0
>> +#define ICSSG_XDP_CONSUMED BIT(0)
>> +#define ICSSG_XDP_TX BIT(1)
>> +#define ICSSG_XDP_REDIR BIT(2)
>> +
>> /* Minimum coalesce time in usecs for both Tx and Rx */
>> #define ICSSG_MIN_COALESCE_USECS 20
>>
>> @@ -229,6 +241,8 @@ struct prueth_emac {
>> unsigned long rx_pace_timeout_ns;
>>
>> struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
>> + struct bpf_prog *xdp_prog;
>> + struct xdp_attachment_info xdpi;
>> };
>>
>> /* The buf includes headroom compatible with both skb and xdpf */
>> @@ -467,5 +481,9 @@ void prueth_put_cores(struct prueth *prueth, int slice);
>>
>> /* Revision specific helper */
>> u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns);
>> +int emac_xmit_xdp_frame(struct prueth_emac *emac,
>> + struct xdp_frame *xdpf,
>> + struct page *page,
>> + unsigned int q_idx);
>>
>> #endif /* __NET_TI_ICSSG_PRUETH_H */
>
* Re: [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation
2025-02-18 10:10 ` Malladi, Meghana
2025-02-18 12:24 ` Diogo Ivo
@ 2025-02-24 13:20 ` Roger Quadros
1 sibling, 0 replies; 17+ messages in thread
From: Roger Quadros @ 2025-02-24 13:20 UTC (permalink / raw)
To: Malladi, Meghana, danishanwar, pabeni, kuba, edumazet, davem,
andrew+netdev
Cc: bpf, linux-arm-kernel, linux-kernel, netdev, u.kleine-koenig,
krzysztof.kozlowski, dan.carpenter, schnelle, glaroque, rdunlap,
diogo.ivo, jan.kiszka, john.fastabend, hawk, daniel, ast, srk,
Vignesh Raghavendra
On 18/02/2025 12:10, Malladi, Meghana wrote:
> Hi Roger,
> Thanks for reviewing this patch series.
>
> On 2/12/2025 7:26 PM, Roger Quadros wrote:
>>
>>
>> On 10/02/2025 12:33, Meghana Malladi wrote:
>>> From: Roger Quadros <rogerq@kernel.org>
>>>
>>> This is to prepare for native XDP support.
>>>
>>> The page pool API is faster at allocating pages than
>>> __alloc_skb(). The drawback is that it works at PAGE_SIZE granularity
>>> so we are not efficient in memory usage.
>>> i.e. we are using PAGE_SIZE (4KB) memory for 1.5KB max packet size.
>>>
>>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>>> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
>>> Signed-off-by: Meghana Malladi <m-malladi@ti.com>
>>> ---
>>> v1: https://lore.kernel.org/all/20250122124951.3072410-1-m-malladi@ti.com/
>>>
>>> Changes since v1 (v2-v1):
>>> - Recycle pages wherever necessary using skb_mark_for_recycle()
>>> - Use napi_build_skb() instead of build_skb()
>>> - Update with correct frag_size argument in napi_build_skb()
>>> - Use napi_gro_receive() instead of netif_receive_skb()
>>> - Use PP_FLAG_DMA_SYNC_DEV to enable DMA sync with device
>>> - Use page_pool_dma_sync_for_cpu() to sync Rx page pool for CPU
>>>
>>> All the above changes have been suggested by Ido Schimmel <idosch@idosch.org>
>>>
>>> drivers/net/ethernet/ti/Kconfig | 1 +
>>> drivers/net/ethernet/ti/icssg/icssg_common.c | 174 ++++++++++++------
>>> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 14 +-
>>> .../net/ethernet/ti/icssg/icssg_prueth_sr1.c | 21 ++-
>>> 4 files changed, 140 insertions(+), 70 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
>>> index 0d5a862cd78a..b461281d31b6 100644
>>> --- a/drivers/net/ethernet/ti/Kconfig
>>> +++ b/drivers/net/ethernet/ti/Kconfig
>>> @@ -204,6 +204,7 @@ config TI_ICSSG_PRUETH_SR1
>>> select PHYLIB
>>> select TI_ICSS_IEP
>>> select TI_K3_CPPI_DESC_POOL
>>> + select PAGE_POOL
>>> depends on PRU_REMOTEPROC
>>> depends on NET_SWITCHDEV
>>> depends on ARCH_K3 && OF && TI_K3_UDMA_GLUE_LAYER
>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_common.c b/drivers/net/ethernet/ti/icssg/icssg_common.c
>>> index 74f0f200a89d..c3c1e2bf461e 100644
>>> --- a/drivers/net/ethernet/ti/icssg/icssg_common.c
>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_common.c
>>> @@ -45,6 +45,11 @@ void prueth_cleanup_rx_chns(struct prueth_emac *emac,
>>> struct prueth_rx_chn *rx_chn,
>>> int max_rflows)
>>> {
>>> + if (rx_chn->pg_pool) {
>>> + page_pool_destroy(rx_chn->pg_pool);
>>> + rx_chn->pg_pool = NULL;
>>> + }
>>> +
>>> if (rx_chn->desc_pool)
>>> k3_cppi_desc_pool_destroy(rx_chn->desc_pool);
>>> @@ -461,17 +466,17 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
>>> }
>>> EXPORT_SYMBOL_GPL(prueth_init_rx_chns);
>>> -int prueth_dma_rx_push(struct prueth_emac *emac,
>>> - struct sk_buff *skb,
>>> - struct prueth_rx_chn *rx_chn)
>>> +int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>>> + struct prueth_rx_chn *rx_chn,
>>> + struct page *page, u32 buf_len)
>>> {
>>> struct net_device *ndev = emac->ndev;
>>> struct cppi5_host_desc_t *desc_rx;
>>> - u32 pkt_len = skb_tailroom(skb);
>>> dma_addr_t desc_dma;
>>> dma_addr_t buf_dma;
>>> void **swdata;
>>> + buf_dma = page_pool_get_dma_addr(page) + PRUETH_HEADROOM;
>>> desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
>>> if (!desc_rx) {
>>> netdev_err(ndev, "rx push: failed to allocate descriptor\n");
>>> @@ -479,25 +484,18 @@ int prueth_dma_rx_push(struct prueth_emac *emac,
>>> }
>>> desc_dma = k3_cppi_desc_pool_virt2dma(rx_chn->desc_pool, desc_rx);
>>> - buf_dma = dma_map_single(rx_chn->dma_dev, skb->data, pkt_len, DMA_FROM_DEVICE);
>>> - if (unlikely(dma_mapping_error(rx_chn->dma_dev, buf_dma))) {
>>> - k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>>> - netdev_err(ndev, "rx push: failed to map rx pkt buffer\n");
>>> - return -EINVAL;
>>> - }
>>> -
>>> cppi5_hdesc_init(desc_rx, CPPI5_INFO0_HDESC_EPIB_PRESENT,
>>> PRUETH_NAV_PS_DATA_SIZE);
>>> k3_udma_glue_rx_dma_to_cppi5_addr(rx_chn->rx_chn, &buf_dma);
>>> - cppi5_hdesc_attach_buf(desc_rx, buf_dma, skb_tailroom(skb), buf_dma, skb_tailroom(skb));
>>> + cppi5_hdesc_attach_buf(desc_rx, buf_dma, buf_len, buf_dma, buf_len);
>>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>>> - *swdata = skb;
>>> + *swdata = page;
>>> - return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, 0,
>>> + return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, PRUETH_RX_FLOW_DATA,
>>> desc_rx, desc_dma);
>>> }
>>> -EXPORT_SYMBOL_GPL(prueth_dma_rx_push);
>>> +EXPORT_SYMBOL_GPL(prueth_dma_rx_push_mapped);
>>> u64 icssg_ts_to_ns(u32 hi_sw, u32 hi, u32 lo, u32 cycle_time_ns)
>>> {
>>> @@ -541,12 +539,16 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>>> u32 buf_dma_len, pkt_len, port_id = 0;
>>> struct net_device *ndev = emac->ndev;
>>> struct cppi5_host_desc_t *desc_rx;
>>> - struct sk_buff *skb, *new_skb;
>>> dma_addr_t desc_dma, buf_dma;
>>> + struct page *page, *new_page;
>>> + struct page_pool *pool;
>>> + struct sk_buff *skb;
>>> void **swdata;
>>> u32 *psdata;
>>> + void *pa;
>>> int ret;
>>> + pool = rx_chn->pg_pool;
>>> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
>>> if (ret) {
>>> if (ret != -ENODATA)
>>> @@ -558,15 +560,10 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>>> return 0;
>>> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
>>> -
>>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>>> - skb = *swdata;
>>> -
>>> - psdata = cppi5_hdesc_get_psdata(desc_rx);
>>> - /* RX HW timestamp */
>>> - if (emac->rx_ts_enabled)
>>> - emac_rx_timestamp(emac, skb, psdata);
>>> + page = *swdata;
>>>
>> Unnecessary blank line.
>>
>
> Ok noted. I will address all the cosmetic changes suggested by you in v3.
>
>>> + page_pool_dma_sync_for_cpu(pool, page, 0, PAGE_SIZE);
>>> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>>> k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
>>> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>>> @@ -574,32 +571,51 @@ static int emac_rx_packet(struct prueth_emac *emac, u32 flow_id)
>>> pkt_len -= 4;
>>> cppi5_desc_get_tags_ids(&desc_rx->hdr, &port_id, NULL);
>>> - dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
>>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>>> - skb->dev = ndev;
>>> - new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
>>> /* if allocation fails we drop the packet but push the
>>> - * descriptor back to the ring with old skb to prevent a stall
>>> + * descriptor back to the ring with old page to prevent a stall
>>> */
>>> - if (!new_skb) {
>>> + new_page = page_pool_dev_alloc_pages(pool);
>>> + if (unlikely(!new_page)) {
>>> + new_page = page;
>>> ndev->stats.rx_dropped++;
>>> - new_skb = skb;
>>> - } else {
>>> - /* send the filled skb up the n/w stack */
>>> - skb_put(skb, pkt_len);
>>> - if (emac->prueth->is_switch_mode)
>>> - skb->offload_fwd_mark = emac->offload_fwd_mark;
>>> - skb->protocol = eth_type_trans(skb, ndev);
>>> - napi_gro_receive(&emac->napi_rx, skb);
>>> - ndev->stats.rx_bytes += pkt_len;
>>> - ndev->stats.rx_packets++;
>>> + goto requeue;
>>> + }
>>> +
>>> + /* prepare skb and send to n/w stack */
>>> + pa = page_address(page);
>>> + skb = napi_build_skb(pa, PAGE_SIZE);
>>> + if (!skb) {
>>> + ndev->stats.rx_dropped++;
>>> + page_pool_recycle_direct(pool, page);
>>> + goto requeue;
>>> }
>>> + skb_reserve(skb, PRUETH_HEADROOM);
>>> + skb_put(skb, pkt_len);
>>> + skb->dev = ndev;
>>> +
>>> + psdata = cppi5_hdesc_get_psdata(desc_rx);
>>> + /* RX HW timestamp */
>>> + if (emac->rx_ts_enabled)
>>> + emac_rx_timestamp(emac, skb, psdata);
>>> +
>>> + if (emac->prueth->is_switch_mode)
>>> + skb->offload_fwd_mark = emac->offload_fwd_mark;
>>> + skb->protocol = eth_type_trans(skb, ndev);
>>> +
>>> + skb_mark_for_recycle(skb);
>>> + napi_gro_receive(&emac->napi_rx, skb);
>>> + ndev->stats.rx_bytes += pkt_len;
>>> + ndev->stats.rx_packets++;
>>> +
>>> +requeue:
>>> /* queue another RX DMA */
>>> - ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_chns);
>>> + ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
>>> + PRUETH_MAX_PKT_SIZE);
>>> if (WARN_ON(ret < 0)) {
>>> - dev_kfree_skb_any(new_skb);
>>> + page_pool_recycle_direct(pool, new_page);
>>> ndev->stats.rx_errors++;
>>> ndev->stats.rx_dropped++;
>>> }
>>> @@ -611,22 +627,17 @@ static void prueth_rx_cleanup(void *data, dma_addr_t desc_dma)
>>> {
>>> struct prueth_rx_chn *rx_chn = data;
>>> struct cppi5_host_desc_t *desc_rx;
>>> - struct sk_buff *skb;
>>> - dma_addr_t buf_dma;
>>> - u32 buf_dma_len;
>>> + struct page_pool *pool;
>>> + struct page *page;
>>> void **swdata;
>>> + pool = rx_chn->pg_pool;
>>> +
>> here too.
>>
>>> desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
>>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>>> - skb = *swdata;
>>> - cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>>> - k3_udma_glue_rx_cppi5_to_dma_addr(rx_chn->rx_chn, &buf_dma);
>>> -
>>> - dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len,
>>> - DMA_FROM_DEVICE);
>>> + page = *swdata;
>>> + page_pool_recycle_direct(pool, page);
>>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>>> -
>>> - dev_kfree_skb_any(skb);
>>> }
>>> static int prueth_tx_ts_cookie_get(struct prueth_emac *emac)
>>> @@ -907,29 +918,71 @@ int icssg_napi_rx_poll(struct napi_struct *napi_rx, int budget)
>>> }
>>> EXPORT_SYMBOL_GPL(icssg_napi_rx_poll);
>>> +static struct page_pool *prueth_create_page_pool(struct prueth_emac *emac,
>>> + struct device *dma_dev,
>>> + int size)
>>> +{
>>> + struct page_pool_params pp_params = { 0 };
>>> + struct page_pool *pool;
>>> +
>>> + pp_params.order = 0;
>>> + pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
>>> + pp_params.pool_size = size;
>>> + pp_params.nid = dev_to_node(emac->prueth->dev);
>>> + pp_params.dma_dir = DMA_BIDIRECTIONAL;
>>> + pp_params.dev = dma_dev;
>>> + pp_params.napi = &emac->napi_rx;
>>> + pp_params.max_len = PAGE_SIZE;
>>> +
>>> + pool = page_pool_create(&pp_params);
>>> + if (IS_ERR(pool))
>>> + netdev_err(emac->ndev, "cannot create rx page pool\n");
>>> +
>>> + return pool;
>>> +}
>>> +
>>> int prueth_prepare_rx_chan(struct prueth_emac *emac,
>>> struct prueth_rx_chn *chn,
>>> int buf_size)
>>> {
>>> - struct sk_buff *skb;
>>> + struct page_pool *pool;
>>> + struct page *page;
>>> int i, ret;
>>> + pool = prueth_create_page_pool(emac, chn->dma_dev, chn->descs_num);
>>> + if (IS_ERR(pool))
>>> + return PTR_ERR(pool);
>>> +
>>> + chn->pg_pool = pool;
>>> +
>>> for (i = 0; i < chn->descs_num; i++) {
>>> - skb = __netdev_alloc_skb_ip_align(NULL, buf_size, GFP_KERNEL);
>>> - if (!skb)
>>> - return -ENOMEM;
>>> + /* NOTE: we're not using memory efficiently here.
>>> + * 1 full page (4KB?) used here instead of
>>> + * PRUETH_MAX_PKT_SIZE (~1.5KB?)
>>> + */
>>> + page = page_pool_dev_alloc_pages(pool);
>>
>> Did you evaluate Ido's suggestion to use page_pool_alloc_frag()?
>>
>
> Yes, I have done some research on this. IMO, page_pool_alloc_frag() does fit our use case and would reduce the memory footprint. The downside would be the overhead of the atomic operations on page->pp_ref_count; also, a page cannot be recycled until its page->pp_ref_count drops back to 1, i.e., if the page fragments are not freed properly it will cause continuous page allocation, eventually leading to a leaky page pool.
>
> Based on this presentation: https://archive.fosdem.org/2020/schedule/event/xdp_and_page_pool_api/attachments/paper/3625/export/events/attachments/xdp_and_page_pool_api/paper/3625/XDP_and_page_pool.pdf
> Under the XDP memory model section: it is quoted that -
> - "Cannot allocate page fragments to support it (e.g. through napi_alloc_skb())"
> - "Rx buffers must be recycled to get high speed"
> - "Optimized for one packet per page"
> - "Supports split-page model (usually driver is in charge of recycling)"
>
> FWIW, to keep things simple and avoid leaky page pools, it is better to use an entire page instead of fragments, at least for XDP. Please do correct me if my understanding is wrong.
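
For reference, a frag-based refill path would look roughly like the sketch below. This is an untested sketch against the mainline page_pool helpers API; prueth_dma_rx_push_frag() is a hypothetical name, and the truesize layout mirrors what other frag-based drivers do. It also shows where the pp_ref_count concern comes in:

```c
/* Untested sketch: allocate a page fragment per RX descriptor instead of
 * a full page. Each frag holds a pp_ref_count reference on the page, so
 * the page is only returned to the pool once *every* frag on it has been
 * freed -- this is the "leaky pool" risk discussed above.
 */
static int prueth_dma_rx_push_frag(struct prueth_emac *emac,
				   struct prueth_rx_chn *rx_chn)
{
	/* headroom + max frame + shared_info tail, like other drivers */
	unsigned int truesize = PRUETH_HEADROOM +
				SKB_DATA_ALIGN(PRUETH_MAX_PKT_SIZE) +
				SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	unsigned int offset;
	struct page *page;

	page = page_pool_alloc_frag(rx_chn->pg_pool, &offset, truesize,
				    GFP_ATOMIC);
	if (!page)
		return -ENOMEM;

	/* The descriptor swdata would then need to carry both the page
	 * and the offset, and every completion path must free the frag
	 * (not recycle the whole page) via page_pool_put_full_page().
	 */
	return 0;
}
```

With order-0 pages and ~1.5 KB frames this would roughly halve the memory footprint, at the cost of the shared-ownership bookkeeping described above.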
>
>>> + if (!page) {
>>> + netdev_err(emac->ndev, "couldn't allocate rx page\n");
>>> + ret = -ENOMEM;
>>> + goto recycle_alloc_pg;
>>> + }
>>> - ret = prueth_dma_rx_push(emac, skb, chn);
>>> + ret = prueth_dma_rx_push_mapped(emac, chn, page, buf_size);
>>> if (ret < 0) {
>>> netdev_err(emac->ndev,
>>> - "cannot submit skb for rx chan %s ret %d\n",
>>> + "cannot submit page for rx chan %s ret %d\n",
>>> chn->name, ret);
>>> - kfree_skb(skb);
>>> - return ret;
>>> + page_pool_recycle_direct(pool, page);
>>> + goto recycle_alloc_pg;
>>> }
>>> }
>>> return 0;
>>> +
>>> +recycle_alloc_pg:
>>> + prueth_reset_rx_chan(&emac->rx_chns, PRUETH_MAX_RX_FLOWS, false);
>>> +
>>> + return ret;
>>> }
>>> EXPORT_SYMBOL_GPL(prueth_prepare_rx_chan);
>>> @@ -958,6 +1011,9 @@ void prueth_reset_rx_chan(struct prueth_rx_chn *chn,
>>> prueth_rx_cleanup, !!i);
>>> if (disable)
>>> k3_udma_glue_disable_rx_chn(chn->rx_chn);
>>> +
>>> + page_pool_destroy(chn->pg_pool);
>>> + chn->pg_pool = NULL;
>>> }
>>> EXPORT_SYMBOL_GPL(prueth_reset_rx_chan);
>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>>> index 329b46e9ee53..c7b906de18af 100644
>>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>>> @@ -33,6 +33,8 @@
>>> #include <linux/dma/k3-udma-glue.h>
>>> #include <net/devlink.h>
>>> +#include <net/xdp.h>
>>> +#include <net/page_pool/helpers.h>
>>> #include "icssg_config.h"
>>> #include "icss_iep.h"
>>> @@ -131,6 +133,7 @@ struct prueth_rx_chn {
>>> u32 descs_num;
>>> unsigned int irq[ICSSG_MAX_RFLOWS]; /* separate irq per flow */
>>> char name[32];
>>> + struct page_pool *pg_pool;
>>> };
>>> /* There are 4 Tx DMA channels, but the highest priority is CH3 (thread 3)
>>> @@ -210,6 +213,10 @@ struct prueth_emac {
>>> struct netdev_hw_addr_list vlan_mcast_list[MAX_VLAN_ID];
>>> };
>>> +/* The buf includes headroom compatible with both skb and xdpf */
>>> +#define PRUETH_HEADROOM_NA (max(XDP_PACKET_HEADROOM, NET_SKB_PAD) + NET_IP_ALIGN)
>>> +#define PRUETH_HEADROOM ALIGN(PRUETH_HEADROOM_NA, sizeof(long))
>>> +
>>> /**
>>> * struct prueth_pdata - PRUeth platform data
>>> * @fdqring_mode: Free desc queue mode
>>> @@ -410,9 +417,10 @@ int prueth_init_rx_chns(struct prueth_emac *emac,
>>> struct prueth_rx_chn *rx_chn,
>>> char *name, u32 max_rflows,
>>> u32 max_desc_num);
>>> -int prueth_dma_rx_push(struct prueth_emac *emac,
>>> - struct sk_buff *skb,
>>> - struct prueth_rx_chn *rx_chn);
>>> +int prueth_dma_rx_push_mapped(struct prueth_emac *emac,
>>> + struct prueth_rx_chn *rx_chn,
>>> + struct page *page, u32 buf_len);
>>> +unsigned int prueth_rxbuf_total_len(unsigned int len);
>>> void emac_rx_timestamp(struct prueth_emac *emac,
>>> struct sk_buff *skb, u32 *psdata);
>>> enum netdev_tx icssg_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev);
>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>>> index 64a19ff39562..aeeb8a50376b 100644
>>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth_sr1.c
>>> @@ -274,10 +274,12 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
>>> struct prueth_rx_chn *rx_chn = &emac->rx_mgm_chn;
>>> struct net_device *ndev = emac->ndev;
>>> struct cppi5_host_desc_t *desc_rx;
>>> - struct sk_buff *skb, *new_skb;
>>> + struct page *page, *new_page;
>>> dma_addr_t desc_dma, buf_dma;
>>> u32 buf_dma_len, pkt_len;
>>> + struct sk_buff *skb;
>>
>> Can we get rid of SKB entirely from the management channel code?
>> The packet received on this channel is never passed to user space so
>> I don't see why SKB is required.
>>
>
> Yes, I agree that the SKB here is never passed up the network stack, so it is redundant. But honestly I am not sure how that can be done, because the callers of this function access skb->data from the returned skb, and the same can't be done with a page (how do we pass the same data using a page?).
> Also, as you are aware, we no longer support SR1 devices, so I don't have an SR1 device handy to test these changes and make sure nothing breaks if I remove the SKB entirely.
>
Since no headroom is used here, skb->data points at the same address as the
start of the page buffer, so you can just use the page pointer instead.
Also ensure that you recycle the page back to the page pool once you are
done with it.
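Something along these lines (untested sketch; function and field names
taken from the patch, the caller side is illustrative only):

```c
/* Untested sketch: return the page itself from the management-channel
 * receive path instead of wrapping it in an skb. The caller reads the
 * response via page_address() and recycles the page when done.
 */
static struct page *prueth_process_rx_mgm(struct prueth_emac *emac,
					  u32 flow_id)
{
	/* ... pop descriptor, unmap, free descriptor as before ... */
	struct page *page = *swdata;

	/* on refill failure, requeue the old page and return NULL,
	 * mirroring the existing "drop but prevent stall" logic */
	return page;
}

/* caller side, instead of skb->data: */
/*
 *	page = prueth_process_rx_mgm(emac, flow_id);
 *	if (page) {
 *		void *data = page_address(page);
 *		... parse management response from data ...
 *		page_pool_recycle_direct(rx_chn->pg_pool, page);
 *	}
 */
```

That removes the napi_build_skb()/skb_put() pair from this path entirely,
which also sidesteps the unchecked napi_build_skb() return value there.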
>>> void **swdata;
>>> + void *pa;
>>> int ret;
>>> ret = k3_udma_glue_pop_rx_chn(rx_chn->rx_chn, flow_id, &desc_dma);
>>> @@ -299,32 +301,35 @@ static struct sk_buff *prueth_process_rx_mgm(struct prueth_emac *emac,
>>> }
>>> swdata = cppi5_hdesc_get_swdata(desc_rx);
>>> - skb = *swdata;
>>> + page = *swdata;
>>> cppi5_hdesc_get_obuf(desc_rx, &buf_dma, &buf_dma_len);
>>> pkt_len = cppi5_hdesc_get_pktlen(desc_rx);
>>> dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
>>> k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
>>> - new_skb = netdev_alloc_skb_ip_align(ndev, PRUETH_MAX_PKT_SIZE);
>>> + new_page = page_pool_dev_alloc_pages(rx_chn->pg_pool);
>>> /* if allocation fails we drop the packet but push the
>>> * descriptor back to the ring with old skb to prevent a stall
>>> */
>>> - if (!new_skb) {
>>> + if (!new_page) {
>>> netdev_err(ndev,
>>> - "skb alloc failed, dropped mgm pkt from flow %d\n",
>>> + "page alloc failed, dropped mgm pkt from flow %d\n",
>>> flow_id);
>>> - new_skb = skb;
>>> + new_page = page;
>>> skb = NULL; /* return NULL */
>>> } else {
>>> /* return the filled skb */
>>> + pa = page_address(page);
>>> + skb = napi_build_skb(pa, PAGE_SIZE);
>>> skb_put(skb, pkt_len);
>>> }
>>> /* queue another DMA */
>>> - ret = prueth_dma_rx_push(emac, new_skb, &emac->rx_mgm_chn);
>>> + ret = prueth_dma_rx_push_mapped(emac, &emac->rx_chns, new_page,
>>> + PRUETH_MAX_PKT_SIZE);
>>> if (WARN_ON(ret < 0))
>>> - dev_kfree_skb_any(new_skb);
>>> + page_pool_recycle_direct(rx_chn->pg_pool, new_page);
>>> return skb;
>>> }
>>
>
--
cheers,
-roger
Thread overview: 17+ messages:
2025-02-10 10:33 [PATCH net-next v2 0/3] Add native mode XDP support Meghana Malladi
2025-02-10 10:33 ` [PATCH net-next v2 1/3] net: ti: icssg-prueth: Use page_pool API for RX buffer allocation Meghana Malladi
2025-02-12 13:56 ` Roger Quadros
2025-02-18 10:10 ` Malladi, Meghana
2025-02-18 12:24 ` Diogo Ivo
2025-02-18 12:28 ` [EXTERNAL] " Malladi, Meghana
2025-02-24 13:20 ` Roger Quadros
2025-02-10 10:33 ` [PATCH net-next v2 2/3] net: ti: icssg-prueth: introduce and use prueth_swdata struct for SWDATA Meghana Malladi
2025-02-12 14:12 ` Roger Quadros
2025-02-18 11:02 ` Malladi, Meghana
2025-02-10 10:33 ` [PATCH net-next v2 3/3] net: ti: icssg-prueth: Add XDP support Meghana Malladi
2025-02-11 18:39 ` kernel test robot
2025-02-11 22:20 ` kernel test robot
2025-02-12 16:03 ` Roger Quadros
2025-02-18 18:29 ` Malladi, Meghana
2025-02-18 16:02 ` [PATCH net-next v2 0/3] Add native mode " Jesper Dangaard Brouer
2025-02-18 17:55 ` Malladi, Meghana