Netdev List
* Re: [Intel-wired-lan] [PATCH iwl-next v5 1/4] igc: remove unused autoneg_failed field
From: Ruinskiy, Dima @ 2026-06-14  7:16 UTC
  To: KhaiWenTan, anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
	davem, edumazet, kuba, pabeni
  Cc: intel-wired-lan, netdev, linux-kernel, faizal.abdul.rahim,
	hong.aun.looi, hector.blanco.alcaine, khai.wen.tan, Faizal Rahim,
	Aleksandr Loktionov
In-Reply-To: <20260507214706.309984-2-khai.wen.tan@linux.intel.com>

On 08/05/2026 0:47, KhaiWenTan wrote:
> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> 
> autoneg_failed in struct igc_mac_info is never set in the igc driver.
> Remove the field and the dead code checking it in
> igc_config_fc_after_link_up().
> 
> The field originates from the e1000/e1000e fiber/serdes forced-link
> path, where MAC-level autoneg timeout sets it to signal the flow-control
> code to force pause. igc supports only copper, so it never needs to set
> this field.
> 
> Reviewed-by: Looi Hong Aun <hong.aun.looi@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> Signed-off-by: Khai Wen Tan <khai.wen.tan@linux.intel.com>
> ---
>   drivers/net/ethernet/intel/igc/igc_hw.h  |  1 -
>   drivers/net/ethernet/intel/igc/igc_mac.c | 16 +---------------
>   2 files changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h b/drivers/net/ethernet/intel/igc/igc_hw.h
> index be8a49a86d09..86ab8f566f44 100644
> --- a/drivers/net/ethernet/intel/igc/igc_hw.h
> +++ b/drivers/net/ethernet/intel/igc/igc_hw.h
> @@ -92,7 +92,6 @@ struct igc_mac_info {
>   	bool asf_firmware_present;
>   	bool arc_subsystem_valid;
>   
> -	bool autoneg_failed;
>   	bool get_link_status;
>   };
>   
> diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
> index 7ac6637f8db7..142beb9ae557 100644
> --- a/drivers/net/ethernet/intel/igc/igc_mac.c
> +++ b/drivers/net/ethernet/intel/igc/igc_mac.c
> @@ -438,28 +438,14 @@ void igc_config_collision_dist(struct igc_hw *hw)
>    * Checks the status of auto-negotiation after link up to ensure that the
>    * speed and duplex were not forced.  If the link needed to be forced, then
>    * flow control needs to be forced also.  If auto-negotiation is enabled
> - * and did not fail, then we configure flow control based on our link
> - * partner.
> + * then we configure flow control based on our link partner.
>    */
>   s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>   {
>   	u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
> -	struct igc_mac_info *mac = &hw->mac;
>   	u16 speed, duplex;
>   	s32 ret_val = 0;
>   
> -	/* Check for the case where we have fiber media and auto-neg failed
> -	 * so we had to force link.  In this case, we need to force the
> -	 * configuration of the MAC to match the "fc" parameter.
> -	 */
> -	if (mac->autoneg_failed)
> -		ret_val = igc_force_mac_fc(hw);
> -
> -	if (ret_val) {
> -		hw_dbg("Error forcing flow control settings\n");
> -		goto out;
> -	}
> -
>   	/* In auto-neg, we need to check and see if Auto-Neg has completed,
>   	 * and if so, how the PHY and link partner has flow control
>   	 * configured.
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>

* Re: [PATCH v7 1/5] net/mlx5: free mlx5_st_idx_data on final dealloc
From: Michael Gur @ 2026-06-14  7:06 UTC
  To: Zhiping Zhang, netdev; +Cc: kvm, linux-rdma, linux-pci, dri-devel
In-Reply-To: <20260611161546.4075580-2-zhipingz@meta.com>


On 6/11/2026 7:11 PM, Zhiping Zhang wrote:
> When the last reference to an ST table entry is dropped,
> mlx5_st_dealloc_index() removed the entry from idx_xa but leaked the
> backing mlx5_st_idx_data allocation. Repeated alloc/dealloc cycles
> therefore accumulate one struct mlx5_st_idx_data per cycle.
>
> Free idx_data after the xa_erase() so the lifetime of the bookkeeping
> struct matches the lifetime of the ST entry it tracks.
>
> Fixes: 888a7776f4fb ("net/mlx5: Add support for device steering tag")
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/lib/st.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
> index 997be91f0a13..7cedc348790d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
> @@ -175,6 +175,7 @@ int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index)
>   
>   	if (refcount_dec_and_test(&idx_data->usecount)) {
>   		xa_erase(&st->idx_xa, st_index);
> +		kfree(idx_data);
>   		/* We leave PCI config space as was before, no mkey will refer to it */
>   	}
>   

Reviewed-by: Michael Gur <michaelgur@nvidia.com>

Thanks.


* Re:Re: [PATCH v3] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
From: Ding Hui @ 2026-06-14  6:14 UTC
  To: kuba
  Cc: alexandre.torgue, andrew+netdev, davem, dinghui1111, dinghui,
	edumazet, j.raczynski, linux-arm-kernel, linux-kernel,
	linux-stm32, liuxuanjun, maxime.chevallier, mcoquelin.stm32,
	netdev, pabeni, rmk+kernel, xiasanbo, yangchen11
In-Reply-To: <20260608193059.78e05dce@kernel.org>

At 2026-06-09 10:30:59, "Jakub Kicinski" <kuba@kernel.org> wrote:
>On Thu,  4 Jun 2026 22:45:54 +0800 Ding Hui wrote:
>> +/**
>> + * stmmac_reinit_rx_descriptors - re-program RX descriptor buffer addresses
>> + *				   after stmmac_clear_descriptors()
>> + * @priv: driver private structure
>> + * @dma_conf: structure holding the dma data
>> + * @queue: RX queue index
>
>nit:
>
>kernel-doc script says:
>
>Warning: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1733 No description found for return value of 'stmmac_reinit_rx_descriptors'
>
>You need a Returns: statement in this kdoc
>-- 
>pw-bot: cr

Sorry for the late reply. I will send a new version with the Returns: section added. Thanks.


* Re:Re: [PATCH v3] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers
From: Ding Hui @ 2026-06-14  6:02 UTC
  To: j.raczynski
  Cc: alexandre.torgue, andrew+netdev, davem, dinghui1111, dinghui,
	edumazet, kuba, linux-arm-kernel, linux-kernel, linux-stm32,
	liuxuanjun, maxime.chevallier, mcoquelin.stm32, netdev, pabeni,
	rmk+kernel, xiasanbo, yangchen11
In-Reply-To: <aiaORbb0lZVxDg8L@AMDC4622.eu.corp.samsungelectronics.net>

At 2026-06-08 17:41:25, "Jakub Raczynski" <j.raczynski@samsung.com> wrote:
>On Thu, Jun 04, 2026 at 10:45:54PM +0800, Ding Hui wrote:
>> From: Ding Hui <dinghui@lixiang.com>
>> +	for (queue = 0; queue < priv->plat->rx_queues_to_use; queue++) {
>> +		ret = stmmac_reinit_rx_descriptors(priv, &priv->dma_conf,
>> +						   queue);
>> +		if (ret) {
>> +			netdev_err(priv->dev,
>> +				   "%s: rx desc reinit failed on queue %u\n",
>> +				   __func__, queue);
>> +			mutex_unlock(&priv->lock);
>> +			rtnl_unlock();
>> +			return ret;
>> +		}
>> +	}
>
>This is not directly related to the patch, but rather stmmac_resume() itself,
>but doesn't this return and hw_setup one leave bunch of descriptor memory
>hanging and effectively leaked?
>
>> +
>>  	ret = stmmac_hw_setup(ndev);
>>  	if (ret < 0) {
>>  		netdev_err(priv->dev, "%s: Hw setup failed\n", __func__);
>> -- 
>

You are right that both error paths leave the descriptor rings and RX
buffers allocated without explicit cleanup. However, I would describe the
memory as "hanging" rather than "leaked":

The memory is not permanently leaked. All RX buffers allocated in the
error path are stored in dma_conf->rx_queue[q].buf_pool[].page (or
.xdp for XSK queues), and the DMA descriptor rings themselves remain
reachable via priv->dma_conf. When the user eventually brings the
interface down, stmmac_release() -> free_dma_desc_resources() will
free everything correctly.

If it would be welcome, I can submit a follow-up patch that adds proper
cleanup to stmmac_resume()'s error paths (calling free_dma_desc_resources()
and marking the device as not running). I'd prefer to keep that separate
from this fix so its scope stays narrow.
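For illustration, the single-exit unwind such a follow-up could use is sketched here in userspace C. All names (struct fake_priv, fake_open(), fake_resume(), fake_free_dma_desc_resources()) are hypothetical stand-ins, not stmmac code; a real fix would call free_dma_desc_resources() and must still release priv->lock and the RTNL lock on the way out, which this sketch omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's DMA state (not the stmmac structs). */
struct fake_priv {
	void *desc_ring;	/* stands in for the DMA descriptor rings */
	void *rx_bufs;		/* stands in for the RX buffer pool */
	bool running;
};

/* Stand-in for the allocation done when the interface is opened. */
static int fake_open(struct fake_priv *p)
{
	p->desc_ring = calloc(1, 256);
	p->rx_bufs = calloc(1, 4096);
	if (!p->desc_ring || !p->rx_bufs) {
		free(p->desc_ring);	/* free(NULL) is a no-op */
		free(p->rx_bufs);
		p->desc_ring = NULL;
		p->rx_bufs = NULL;
		return -ENOMEM;
	}
	p->running = true;
	return 0;
}

/* Stand-in for free_dma_desc_resources(); tolerates already-freed state. */
static void fake_free_dma_desc_resources(struct fake_priv *p)
{
	free(p->rx_bufs);
	free(p->desc_ring);
	p->rx_bufs = NULL;
	p->desc_ring = NULL;
}

/*
 * Resume sketch. fail_at selects a simulated failure:
 * 0 = none, 1 = RX descriptor reinit, 2 = hardware setup.
 */
static int fake_resume(struct fake_priv *p, int fail_at)
{
	int ret;

	if (fail_at == 1) {
		ret = -ENOMEM;	/* e.g. RX buffer refill failed */
		goto err_free;
	}
	if (fail_at == 2) {
		ret = -EIO;	/* e.g. hardware setup failed */
		goto err_free;
	}
	return 0;

err_free:
	/* Single exit: free rings and buffers, then mark the device stopped. */
	fake_free_dma_desc_resources(p);
	p->running = false;
	return ret;
}
```

Routing both failures through one label keeps the cleanup in one place, so a later error path cannot forget to free the rings or clear the running state.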

>Other than that, I don't see any obvious issues.
>

Thanks for the review.


* [PATCH net-next] r8169: migrate Rx path to page_pool
From: atharva-potdar @ 2026-06-14  5:41 UTC
  To: hkallweit1, nic_swsd, andrew+netdev, davem, edumazet, kuba,
	pabeni
  Cc: netdev, atharva-potdar

Replace the driver-managed Rx model, which copies each frame into a newly
allocated skb, with zero-copy page_pool buffers in preparation for XDP
support.

Key changes:
- Allocate order-0 pages via page_pool instead of alloc_pages + dma_map
- Build skbs directly from pages with napi_build_skb (zero-copy)
- Add rtl8169_rx_refill() to replenish descriptors after processing
- Track dirty_rx boundary for efficient refill scheduling
- Cap max_mtu to R8169_RX_BUF_SIZE - VLAN_ETH_HLEN - ETH_FCS_LEN
  (order-0 pages can't support arbitrary jumbo frames)
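As a worked example of the max_mtu cap in the last bullet, the arithmetic can be checked with a small userspace calculation. It assumes 4 KiB pages, 64-byte cache lines (used by SKB_DATA_ALIGN()) and a 320-byte struct skb_shared_info; that last value depends on kernel version and config, so treat the result as illustrative.

```c
#include <assert.h>

/* Assumed build-dependent values; check them against the target kernel. */
#define PAGE_SIZE_ASSUMED	4096u	/* order-0 page */
#define SMP_CACHE_BYTES_ASSUMED	64u	/* alignment used by SKB_DATA_ALIGN() */
#define SHINFO_SIZE_ASSUMED	320u	/* sizeof(struct skb_shared_info), varies */

/* Fixed values as defined in the kernel headers. */
#define XDP_PACKET_HEADROOM	256u
#define VLAN_ETH_HLEN		18u
#define ETH_FCS_LEN		4u

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Mirrors R8169_RX_HEADROOM = ALIGN(XDP_PACKET_HEADROOM, 8). */
static unsigned int r8169_rx_headroom(void)
{
	return ALIGN_UP(XDP_PACKET_HEADROOM, 8u);
}

/* Mirrors R8169_RX_BUF_SIZE: page minus headroom minus aligned shinfo. */
static unsigned int r8169_rx_buf_size(void)
{
	return PAGE_SIZE_ASSUMED - r8169_rx_headroom() -
	       ALIGN_UP(SHINFO_SIZE_ASSUMED, SMP_CACHE_BYTES_ASSUMED);
}

/* Mirrors page_pool_mtu in rtl_init_one(). */
static unsigned int r8169_page_pool_mtu(void)
{
	return r8169_rx_buf_size() - VLAN_ETH_HLEN - ETH_FCS_LEN;
}
```

Under these assumptions the Rx buffer is 3520 bytes and the MTU cap is 3498 bytes, far below what the previous SZ_16K - 1 buffer allowed.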

Tested on RTL8168h with iperf3 (~470 Mbps, 0 retransmits) and
1000 pings (0 drops).
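The dirty_rx/cur_rx bookkeeping can be modelled in userspace to see how refill catches up after an allocation failure. A minimal sketch, assuming a hypothetical 8-entry ring and a simulated allocator budget; struct rx_ring, ring_consume() and ring_refill() are illustrative names, and only the loop shape follows rtl8169_rx_refill() in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RX_DESC 8u	/* small ring for illustration; the driver uses 256 */

/* Hypothetical ring model: a non-zero slot means "buffer attached". */
struct rx_ring {
	int buf[NUM_RX_DESC];
	uint32_t cur_rx;	/* next descriptor the Rx loop will consume */
	uint32_t dirty_rx;	/* first descriptor that may need a new buffer */
	int allocs_left;	/* simulated allocator budget */
};

/* Rx side: consume n completed descriptors, detaching their buffers. */
static void ring_consume(struct rx_ring *r, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++, r->cur_rx++)
		r->buf[r->cur_rx % NUM_RX_DESC] = 0;
}

/* Refill side: same loop shape as rtl8169_rx_refill() in the patch. */
static void ring_refill(struct rx_ring *r)
{
	uint32_t dirty_rx = r->dirty_rx;

	while (dirty_rx != r->cur_rx) {
		uint32_t entry = dirty_rx % NUM_RX_DESC;

		if (!r->buf[entry]) {
			if (r->allocs_left == 0)
				break;	/* allocation failed: retried on next call */
			r->allocs_left--;
			r->buf[entry] = 1;
		}
		dirty_rx++;
	}
	r->dirty_rx = dirty_rx;
}
```

Because the ring size is a power of two, `index % NUM_RX_DESC` stays continuous when the u32 counters wrap, so no explicit wrap handling is needed.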

Signed-off-by: atharva-potdar <atharvapotdar07@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 128 ++++++++++++++--------
 1 file changed, 85 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index ec4fc21fa..9d8d678ac 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -31,6 +31,7 @@
 #include <linux/unaligned.h>
 #include <net/ip6_checksum.h>
 #include <net/netdev_queues.h>
+#include <net/page_pool/helpers.h>
 #include <net/phy/realtek_phy.h>
 
 #include "r8169.h"
@@ -70,7 +71,9 @@
 #define InterFrameGap	0x03	/* 3 means InterFrameGap = the shortest one */
 
 #define R8169_REGS_SIZE		256
-#define R8169_RX_BUF_SIZE	(SZ_16K - 1)
+#define R8169_RX_HEADROOM	ALIGN(XDP_PACKET_HEADROOM, 8)
+#define R8169_RX_BUF_SIZE	(PAGE_SIZE - R8169_RX_HEADROOM - \
+				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 #define NUM_TX_DESC	256	/* Number of Tx descriptor registers */
 #define NUM_RX_DESC	256	/* Number of Rx descriptor registers */
 #define R8169_TX_RING_BYTES	(NUM_TX_DESC * sizeof(struct TxDesc))
@@ -737,6 +740,7 @@ struct rtl8169_private {
 	enum mac_version mac_version;
 	enum rtl_dash_type dash_type;
 	u32 cur_rx; /* Index into the Rx descriptor buffer of next Rx pkt. */
+	u32 dirty_rx; /* Index of first Rx descriptor needing a new buffer */
 	u32 cur_tx; /* Index into the Tx descriptor buffer of next Rx pkt. */
 	u32 dirty_tx;
 	struct TxDesc *TxDescArray;	/* 256-aligned Tx descriptor ring */
@@ -745,6 +749,8 @@ struct rtl8169_private {
 	dma_addr_t RxPhyAddr;
 	struct page *Rx_databuff[NUM_RX_DESC];	/* Rx data buffers */
 	struct ring_info tx_skb[NUM_TX_DESC];	/* Tx data buffers */
+	struct page_pool *page_pool;
+	u32 rx_buf_sz;
 	u16 cp_cmd;
 	u16 tx_lpi_timer;
 	u32 irq_mask;
@@ -4148,37 +4154,27 @@ static int rtl8169_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static void rtl8169_mark_to_asic(struct RxDesc *desc)
+static void rtl8169_mark_to_asic(struct RxDesc *desc, u32 rx_buf_sz)
 {
 	u32 eor = le32_to_cpu(desc->opts1) & RingEnd;
 
 	desc->opts2 = 0;
 	/* Force memory writes to complete before releasing descriptor */
 	dma_wmb();
-	WRITE_ONCE(desc->opts1, cpu_to_le32(DescOwn | eor | R8169_RX_BUF_SIZE));
+	WRITE_ONCE(desc->opts1, cpu_to_le32(DescOwn | eor | rx_buf_sz));
 }
 
 static struct page *rtl8169_alloc_rx_data(struct rtl8169_private *tp,
 					  struct RxDesc *desc)
 {
-	struct device *d = tp_to_dev(tp);
-	int node = dev_to_node(d);
-	dma_addr_t mapping;
 	struct page *data;
 
-	data = alloc_pages_node(node, GFP_KERNEL, get_order(R8169_RX_BUF_SIZE));
+	data = page_pool_dev_alloc_pages(tp->page_pool);
 	if (!data)
 		return NULL;
 
-	mapping = dma_map_page(d, data, 0, R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
-	if (unlikely(dma_mapping_error(d, mapping))) {
-		netdev_err(tp->dev, "Failed to map RX DMA!\n");
-		__free_pages(data, get_order(R8169_RX_BUF_SIZE));
-		return NULL;
-	}
-
-	desc->addr = cpu_to_le64(mapping);
-	rtl8169_mark_to_asic(desc);
+	desc->addr = cpu_to_le64(page_pool_get_dma_addr(data) + R8169_RX_HEADROOM);
+	rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
 
 	return data;
 }
@@ -4187,15 +4183,17 @@ static void rtl8169_rx_clear(struct rtl8169_private *tp)
 {
 	int i;
 
-	for (i = 0; i < NUM_RX_DESC && tp->Rx_databuff[i]; i++) {
-		dma_unmap_page(tp_to_dev(tp),
-			       le64_to_cpu(tp->RxDescArray[i].addr),
-			       R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
-		__free_pages(tp->Rx_databuff[i], get_order(R8169_RX_BUF_SIZE));
+	for (i = 0; i < NUM_RX_DESC; i++) {
+		if (!tp->Rx_databuff[i])
+			continue;
+		page_pool_put_full_page(tp->page_pool, tp->Rx_databuff[i], true);
 		tp->Rx_databuff[i] = NULL;
 		tp->RxDescArray[i].addr = 0;
 		tp->RxDescArray[i].opts1 = 0;
 	}
+
+	page_pool_destroy(tp->page_pool);
+	tp->page_pool = NULL;
 }
 
 static int rtl8169_rx_fill(struct rtl8169_private *tp)
@@ -4221,11 +4219,28 @@ static int rtl8169_rx_fill(struct rtl8169_private *tp)
 
 static int rtl8169_init_ring(struct rtl8169_private *tp)
 {
+	struct page_pool_params pp_params = { 0 };
+
 	rtl8169_init_ring_indexes(tp);
+	tp->dirty_rx = 0;
+	tp->rx_buf_sz = R8169_RX_BUF_SIZE;
 
 	memset(tp->tx_skb, 0, sizeof(tp->tx_skb));
 	memset(tp->Rx_databuff, 0, sizeof(tp->Rx_databuff));
 
+	pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+	pp_params.order = 0;
+	pp_params.pool_size = NUM_RX_DESC;
+	pp_params.nid = dev_to_node(tp_to_dev(tp));
+	pp_params.dev = tp_to_dev(tp);
+	pp_params.dma_dir = DMA_FROM_DEVICE;
+	pp_params.offset = R8169_RX_HEADROOM;
+	pp_params.max_len = tp->rx_buf_sz;
+
+	tp->page_pool = page_pool_create(&pp_params);
+	if (IS_ERR(tp->page_pool))
+		return PTR_ERR(tp->page_pool);
+
 	return rtl8169_rx_fill(tp);
 }
 
@@ -4312,7 +4327,7 @@ static void rtl_reset_work(struct rtl8169_private *tp)
 	rtl8169_cleanup(tp);
 
 	for (i = 0; i < NUM_RX_DESC; i++)
-		rtl8169_mark_to_asic(tp->RxDescArray + i);
+		rtl8169_mark_to_asic(tp->RxDescArray + i, tp->rx_buf_sz);
 
 	napi_enable(&tp->napi);
 	rtl_hw_start(tp);
@@ -4776,9 +4791,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 	for (count = 0; count < budget; count++, tp->cur_rx++) {
 		unsigned int pkt_size, entry = tp->cur_rx % NUM_RX_DESC;
 		struct RxDesc *desc = tp->RxDescArray + entry;
+		struct page *page;
 		struct sk_buff *skb;
-		const void *rx_buf;
-		dma_addr_t addr;
 		u32 status;
 
 		status = le32_to_cpu(READ_ONCE(desc->opts1));
@@ -4791,6 +4805,9 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 		 */
 		dma_rmb();
 
+		page = tp->Rx_databuff[entry];
+		tp->Rx_databuff[entry] = NULL;
+
 		if (unlikely(status & RxRES)) {
 			if (net_ratelimit())
 				netdev_warn(dev, "Rx ERROR. status = %08x\n",
@@ -4802,9 +4819,9 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 				dev->stats.rx_crc_errors++;
 
 			if (!(dev->features & NETIF_F_RXALL))
-				goto release_descriptor;
+				goto recycle;
 			else if (status & RxRWT || !(status & (RxRUNT | RxCRC)))
-				goto release_descriptor;
+				goto recycle;
 		}
 
 		pkt_size = status & GENMASK(13, 0);
@@ -4817,24 +4834,23 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 		if (unlikely(rtl8169_fragmented_frame(status))) {
 			dev->stats.rx_dropped++;
 			dev->stats.rx_length_errors++;
-			goto release_descriptor;
+			goto recycle;
 		}
 
-		skb = napi_alloc_skb(&tp->napi, pkt_size);
+		dma_sync_single_for_cpu(d,
+					page_pool_get_dma_addr(page) +
+					R8169_RX_HEADROOM,
+					pkt_size, DMA_FROM_DEVICE);
+
+		skb = napi_build_skb(page_address(page), PAGE_SIZE);
 		if (unlikely(!skb)) {
 			dev->stats.rx_dropped++;
-			goto release_descriptor;
+			goto recycle;
 		}
 
-		addr = le64_to_cpu(desc->addr);
-		rx_buf = page_address(tp->Rx_databuff[entry]);
-
-		dma_sync_single_for_cpu(d, addr, pkt_size, DMA_FROM_DEVICE);
-		prefetch(rx_buf);
-		skb_copy_to_linear_data(skb, rx_buf, pkt_size);
-		skb->tail += pkt_size;
-		skb->len = pkt_size;
-		dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
+		skb_reserve(skb, R8169_RX_HEADROOM);
+		skb_put(skb, pkt_size);
+		skb_mark_for_recycle(skb);
 
 		rtl8169_rx_csum(skb, status);
 		skb->protocol = eth_type_trans(skb, dev);
@@ -4847,13 +4863,34 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
 		napi_gro_receive(&tp->napi, skb);
 
 		dev_sw_netstats_rx_add(dev, pkt_size);
-release_descriptor:
-		rtl8169_mark_to_asic(desc);
+
+		continue;
+
+recycle:
+		page_pool_put_full_page(tp->page_pool, page, true);
 	}
 
 	return count;
 }
 
+static void rtl8169_rx_refill(struct rtl8169_private *tp)
+{
+	u32 dirty_rx = tp->dirty_rx;
+
+	while (dirty_rx != tp->cur_rx) {
+		u32 entry = dirty_rx % NUM_RX_DESC;
+
+		if (!tp->Rx_databuff[entry]) {
+			tp->Rx_databuff[entry] = rtl8169_alloc_rx_data(tp,
+								       tp->RxDescArray + entry);
+			if (!tp->Rx_databuff[entry])
+				break;
+		}
+		dirty_rx++;
+	}
+	tp->dirty_rx = dirty_rx;
+}
+
 static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct rtl8169_private *tp = dev_instance;
@@ -4921,6 +4958,7 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 	rtl_tx(dev, tp, budget);
 
 	work_done = rtl_rx(dev, tp, budget);
+	rtl8169_rx_refill(tp);
 
 	if (work_done < budget && napi_complete_done(napi, work_done))
 		rtl_irq_enable(tp);
@@ -5775,8 +5813,12 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	jumbo_max = rtl_jumbo_max(tp);
-	if (jumbo_max)
-		dev->max_mtu = jumbo_max;
+	if (jumbo_max) {
+		unsigned int page_pool_mtu;
+
+		page_pool_mtu = R8169_RX_BUF_SIZE - VLAN_ETH_HLEN - ETH_FCS_LEN;
+		dev->max_mtu = min_t(int, jumbo_max, page_pool_mtu);
+	}
 
 	rtl_set_irq_mask(tp);
 
@@ -5808,7 +5850,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	if (jumbo_max)
 		netdev_info(dev, "jumbo features [frames: %d bytes, tx checksumming: %s]\n",
-			    jumbo_max, tp->mac_version <= RTL_GIGA_MAC_VER_06 ?
+			    dev->max_mtu, tp->mac_version <= RTL_GIGA_MAC_VER_06 ?
 			    "ok" : "ko");
 
 	if (tp->dash_type != RTL_DASH_NONE) {
-- 
2.54.0


* [PATCH v4 5/6] pds_core: add host backed memory support for firmware
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner, Vamsi Atluri,
	Nikhil P . Rao
In-Reply-To: <20260614050052.1048328-1-nikhil.rao@amd.com>

From: Vamsi Atluri <Vamsi.Atluri@amd.com>

Some newer AMD/Pensando cards have limited on-card memory, and some
components, particularly in the control plane, need more memory than is
available. This patch adds support for host-backed DMA memory that the
firmware can use in those cases.
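The host memory exchange added here follows a GET_COUNT, QUERY, ADD sequence, with DEL sent for any queried request the host cannot satisfy. A minimal userspace model of that walk, assuming 4 KiB pages; the ex_* names and the mock firmware are hypothetical, and only the operation order, the 4 MiB per-request limit and the stop-at-first-failure behavior are taken from the diff. Allocation and DMA mapping are not modelled; the sketch only computes the page order that get_order() would yield.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT		12			/* assumes 4 KiB pages */
#define EX_HOST_MEM_MAX_CONTIG	(4u * 1024 * 1024)	/* as PDSC_HOST_MEM_MAX_CONTIG */
#define EX_HOST_MEM_MAX_COUNT	256u			/* as PDSC_HOST_MEM_MAX_COUNT */

/* Hypothetical firmware-side request, as QUERY would report it. */
struct ex_req {
	uint32_t size;
	uint16_t tag;
};

/* Hypothetical mock firmware that counts ADD and DEL operations. */
struct ex_fw {
	const struct ex_req *reqs;
	uint16_t count;		/* value GET_COUNT returns */
	unsigned int adds;
	unsigned int dels;
};

/* Smallest order with (page size << order) >= size; size must be > 0. */
static unsigned int ex_get_order(uint32_t size)
{
	unsigned int order = 0;
	uint32_t pages = (size - 1) >> EX_PAGE_SHIFT;

	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/*
 * GET_COUNT -> QUERY -> ADD for each request, sending DEL for a queried
 * request that cannot be satisfied and stopping there, like the loop in
 * pdsc_host_mem_add(). orders[] must hold at least fw->count entries.
 * Returns the number of requests added.
 */
static unsigned int ex_host_mem_add(struct ex_fw *fw, unsigned int *orders)
{
	unsigned int count = fw->count < EX_HOST_MEM_MAX_COUNT ?
			     fw->count : EX_HOST_MEM_MAX_COUNT;
	unsigned int i;

	for (i = 0; i < count; i++) {
		const struct ex_req *req = &fw->reqs[i];	/* QUERY */

		if (req->size == 0 || req->size > EX_HOST_MEM_MAX_CONTIG) {
			fw->dels++;	/* firmware expects ADD or DEL after QUERY */
			break;
		}
		orders[i] = ex_get_order(req->size);	/* allocation order */
		fw->adds++;				/* ADD */
	}
	return i;
}
```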

Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Vamsi Atluri <Vamsi.Atluri@amd.com>
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
---
 drivers/net/ethernet/amd/pds_core/core.c | 160 +++++++++++++++++++++++
 drivers/net/ethernet/amd/pds_core/core.h |  20 +++
 drivers/net/ethernet/amd/pds_core/main.c |   4 +-
 include/linux/pds/pds_core_if.h          |  64 +++++++++
 4 files changed, 247 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index 705cab7b0727..d1695ca95440 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -487,6 +487,7 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 		pdsc->viftype_status = NULL;
 	}
 
+	pdsc_host_mem_free(pdsc);
 	pdsc_dev_uninit(pdsc);
 
 	set_bit(PDSC_S_FW_DEAD, &pdsc->state);
@@ -496,6 +497,7 @@ int pdsc_start(struct pdsc *pdsc)
 {
 	pds_core_intr_mask(&pdsc->intr_ctrl[pdsc->adminqcq.intx],
 			   PDS_CORE_INTR_MASK_CLEAR);
+	pdsc_host_mem_add(pdsc);
 
 	return 0;
 }
@@ -658,3 +660,161 @@ void pdsc_health_thread(struct work_struct *work)
 out_unlock:
 	mutex_unlock(&pdsc->config_lock);
 }
+
+static void pdsc_host_mem_del_one(struct pdsc *pdsc, u16 tag, u8 reason)
+{
+	union pds_core_dev_comp comp = {};
+	union pds_core_dev_cmd cmd = {
+		.host_mem.opcode = PDS_CORE_CMD_HOST_MEM,
+		.host_mem.oper = PDS_CORE_HOST_MEM_DEL,
+		.host_mem.tag = cpu_to_le16(tag),
+		.host_mem.reason = reason,
+	};
+
+	dev_dbg(pdsc->dev, "Sending devcmd for mem del tag %d\n", tag);
+	pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+static int pdsc_host_mem_add_one(struct pdsc *pdsc, int index)
+{
+	struct pdsc_host_mem *hm = &pdsc->host_mem_reqs[index];
+	union pds_core_dev_comp comp = {};
+	union pds_core_dev_cmd cmd = {};
+	int err;
+
+	cmd.host_mem.opcode = PDS_CORE_CMD_HOST_MEM;
+	cmd.host_mem.oper = PDS_CORE_HOST_MEM_QUERY;
+	cmd.host_mem.index = cpu_to_le16(index);
+	dev_dbg(pdsc->dev, "Sending devcmd for mem query index %d\n", index);
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+	if (err || comp.status != PDS_RC_SUCCESS) {
+		dev_err(pdsc->dev, "mem query failed err %d status %d\n",
+			err, comp.status);
+		return err ? err : -EIO;
+	}
+	hm->size = le32_to_cpu(comp.host_mem.size);
+	hm->tag = le16_to_cpu(comp.host_mem.tag);
+	dev_dbg(pdsc->dev, "mem query returned size %d tag %d\n",
+		hm->size, hm->tag);
+
+	if (!hm->size || hm->size > PDSC_HOST_MEM_MAX_CONTIG) {
+		dev_err(pdsc->dev, "invalid size %d for tag %d\n",
+			hm->size, hm->tag);
+		err = -EINVAL;
+		goto err_del;
+	}
+
+	hm->order = get_order(hm->size);
+	hm->pg = alloc_pages(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN, hm->order);
+	if (!hm->pg) {
+		dev_warn(pdsc->dev, "alloc order %d failed for tag %d\n",
+			 hm->order, hm->tag);
+		err = -ENOMEM;
+		goto err_del;
+	}
+
+	hm->pa = dma_map_page(pdsc->dev, hm->pg, 0, hm->size,
+			      DMA_BIDIRECTIONAL);
+	if (dma_mapping_error(pdsc->dev, hm->pa)) {
+		dev_err(pdsc->dev, "dma map failed for tag %d size %d\n",
+			hm->tag, hm->size);
+		__free_pages(hm->pg, hm->order);
+		hm->pg = NULL;
+		err = -EIO;
+		goto err_del;
+	}
+
+	/* Track this allocation so pdsc_host_mem_free() can clean it up */
+	pdsc->num_host_mem_reqs++;
+
+	memset(&cmd, 0, sizeof(cmd));
+	memset(&comp, 0, sizeof(comp));
+	cmd.host_mem.opcode = PDS_CORE_CMD_HOST_MEM;
+	cmd.host_mem.oper = PDS_CORE_HOST_MEM_ADD;
+	cmd.host_mem.tag = cpu_to_le16(hm->tag);
+	cmd.host_mem.size = cpu_to_le32(hm->size);
+	cmd.host_mem.buf_pa = cpu_to_le64(hm->pa);
+
+	dev_dbg(pdsc->dev, "Sending devcmd for mem add tag %d size %d pa %pad\n",
+		hm->tag, hm->size, &hm->pa);
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+	if (err || comp.status != PDS_RC_SUCCESS) {
+		dev_err(pdsc->dev, "mem add failed err %d status %d for tag %d\n",
+			err, comp.status, hm->tag);
+		err = err ? err : -EIO;
+		goto err_del;
+	}
+	dev_dbg(pdsc->dev, "mem add completed for tag %d\n", hm->tag);
+
+	return 0;
+
+err_del:
+	/* After MEM_QUERY succeeds, firmware expects MEM_ADD or MEM_DEL */
+	pdsc_host_mem_del_one(pdsc, hm->tag, PDS_RC_ENOMEM);
+	return err;
+}
+
+void pdsc_host_mem_add(struct pdsc *pdsc)
+{
+	union pds_core_dev_comp comp = {};
+	union pds_core_dev_cmd cmd = {};
+	u16 count;
+	int err;
+	int i;
+
+	if (!(pdsc->dev_ident.capabilities &
+	     cpu_to_le64(PDS_CORE_DEV_CAP_HOST_MEM)))
+		return;
+
+	cmd.host_mem.opcode = PDS_CORE_CMD_HOST_MEM;
+	cmd.host_mem.oper = PDS_CORE_HOST_MEM_GET_COUNT;
+	cmd.host_mem.index = cpu_to_le16(PDSC_HOST_MEM_MAX_COUNT);
+	cmd.host_mem.max_contig = cpu_to_le32(PDSC_HOST_MEM_MAX_CONTIG);
+	dev_dbg(pdsc->dev, "Sending devcmd for mem get count max_contig %u\n",
+		PDSC_HOST_MEM_MAX_CONTIG);
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+	if (err || comp.status != PDS_RC_SUCCESS) {
+		dev_err(pdsc->dev, "mem get count failed err %d status %d\n",
+			err, comp.status);
+		return;
+	}
+
+	count = min(le16_to_cpu(comp.host_mem.count),
+		    PDSC_HOST_MEM_MAX_COUNT);
+	dev_dbg(pdsc->dev, "mem get count returned count %d\n", count);
+	if (count == 0)
+		return;
+
+	pdsc->host_mem_reqs = kzalloc_objs(*pdsc->host_mem_reqs, count,
+					   GFP_KERNEL);
+	if (!pdsc->host_mem_reqs) {
+		dev_err(pdsc->dev, "failed to alloc host_mem_reqs array\n");
+		return;
+	}
+
+	for (i = 0; i < count; i++) {
+		err = pdsc_host_mem_add_one(pdsc, i);
+		if (err)
+			break;
+	}
+}
+
+void pdsc_host_mem_free(struct pdsc *pdsc)
+{
+	int i;
+
+	if (!pdsc->host_mem_reqs)
+		return;
+
+	for (i = 0; i < pdsc->num_host_mem_reqs; i++) {
+		dma_unmap_page(pdsc->dev, pdsc->host_mem_reqs[i].pa,
+			       pdsc->host_mem_reqs[i].size,
+			       DMA_BIDIRECTIONAL);
+		__free_pages(pdsc->host_mem_reqs[i].pg,
+			     pdsc->host_mem_reqs[i].order);
+	}
+
+	kfree(pdsc->host_mem_reqs);
+	pdsc->host_mem_reqs = NULL;
+	pdsc->num_host_mem_reqs = 0;
+}
diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h
index c686f0bbbaeb..53e5a6f0af9c 100644
--- a/drivers/net/ethernet/amd/pds_core/core.h
+++ b/drivers/net/ethernet/amd/pds_core/core.h
@@ -5,6 +5,7 @@
 #define _PDSC_H_
 
 #include <linux/debugfs.h>
+#include <linux/mmzone.h>
 #include <net/devlink.h>
 
 #include <linux/pds/pds_common.h>
@@ -23,6 +24,10 @@
 #define PDSC_SETUP_RECOVERY	false
 #define PDSC_SETUP_INIT		true
 
+/* Use fixed 4MB max to avoid cpu_to_le32() truncation on large-page configs */
+#define PDSC_HOST_MEM_MAX_CONTIG (4 * 1024 * 1024)
+#define PDSC_HOST_MEM_MAX_COUNT  256
+
 struct pdsc_deferred_dma {
 	struct list_head list;
 	dma_addr_t dma_addr;
@@ -149,6 +154,14 @@ struct pdsc_viftype {
 	struct pds_auxiliary_dev *padev;
 };
 
+struct pdsc_host_mem {
+	u32 size;
+	u16 tag;
+	u8 order;
+	struct page *pg;
+	dma_addr_t pa;
+};
+
 /* No state flags set means we are in a steady running state */
 enum pdsc_state_flags {
 	PDSC_S_FW_DEAD,		    /* stopped, wait on startup or recovery */
@@ -210,6 +223,9 @@ struct pdsc {
 	struct pdsc_viftype *viftype_status;
 	struct work_struct pci_reset_work;
 
+	struct pdsc_host_mem *host_mem_reqs;
+	u16 num_host_mem_reqs;
+
 	struct pds_core_component_list_info fw_components;
 };
 
@@ -287,6 +303,7 @@ void pdsc_debugfs_add_viftype(struct pdsc *pdsc);
 void pdsc_debugfs_add_irqs(struct pdsc *pdsc);
 void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq);
 void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq);
+void pdsc_debugfs_add_host_mem(struct pdsc *pdsc);
 
 int pdsc_err_to_errno(enum pds_core_status_code code);
 bool pdsc_is_fw_running(struct pdsc *pdsc);
@@ -345,6 +362,9 @@ void pdsc_fw_down(struct pdsc *pdsc);
 void pdsc_fw_up(struct pdsc *pdsc);
 void pdsc_pci_reset_thread(struct work_struct *work);
 
+void pdsc_host_mem_add(struct pdsc *pdsc);
+void pdsc_host_mem_free(struct pdsc *pdsc);
+
 void pdsc_deferred_dma_add(struct pdsc *pdsc, struct pdsc_deferred_dma *entry,
 			   dma_addr_t dma_addr, void *va, size_t size,
 			   enum dma_data_direction dir);
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 02ff5e51f617..a6f327f1c1d5 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -21,6 +21,8 @@ static const struct pci_device_id pdsc_id_table[] = {
 };
 MODULE_DEVICE_TABLE(pci, pdsc_id_table);
 
+static void pdsc_stop_health_thread(struct pdsc *pdsc);
+
 static void pdsc_wdtimer_cb(struct timer_list *t)
 {
 	struct pdsc *pdsc = timer_container_of(pdsc, t, wdtimer);
@@ -437,7 +439,7 @@ static void pdsc_remove(struct pci_dev *pdev)
 		pdsc_sriov_configure(pdev, 0);
 		pdsc_auxbus_dev_del(pdsc, pdsc, &pdsc->padev);
 
-		timer_shutdown_sync(&pdsc->wdtimer);
+		pdsc_stop_health_thread(pdsc);
 		if (pdsc->wq)
 			destroy_workqueue(pdsc->wq);
 
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
index cc1f180aee55..8a0c962b6472 100644
--- a/include/linux/pds/pds_core_if.h
+++ b/include/linux/pds/pds_core_if.h
@@ -46,6 +46,7 @@ enum pds_core_cmd_opcode {
 	PDS_CORE_CMD_SEND_COMPONENT	= 9,
 	PDS_CORE_CMD_FINALIZE_UPDATE	= 10,
 	PDS_CORE_CMD_MATCH_RECORD_DESC	= 11,
+	PDS_CORE_CMD_HOST_MEM		= 12,
 
 	/* SR/IOV commands */
 	PDS_CORE_CMD_VF_GETATTR		= 60,
@@ -110,9 +111,11 @@ struct pds_core_drv_identity {
 /**
  * enum pds_core_dev_capability - Device capabilities
  * @PDS_CORE_DEV_CAP_PLDM_FW_UPDATE: Device only supports FW update via PLDM
+ * @PDS_CORE_DEV_CAP_HOST_MEM: Device supports host memory for fw use
  */
 enum pds_core_dev_capability {
 	PDS_CORE_DEV_CAP_PLDM_FW_UPDATE = BIT(0),
+	PDS_CORE_DEV_CAP_HOST_MEM = BIT(1),
 };
 
 #define PDS_DEV_TYPE_MAX	16
@@ -838,6 +841,65 @@ struct pds_core_match_record_desc_comp {
 	u8 rsvd;
 };
 
+/**
+ * enum pds_core_host_mem_oper - HOST_MEM sub-operations
+ * @PDS_CORE_HOST_MEM_GET_COUNT: Query number of memory requests
+ * @PDS_CORE_HOST_MEM_QUERY:     Query details of a memory request
+ * @PDS_CORE_HOST_MEM_ADD:       Provide allocated memory to firmware
+ * @PDS_CORE_HOST_MEM_DEL:       Notify firmware of memory deallocation
+ */
+enum pds_core_host_mem_oper {
+	PDS_CORE_HOST_MEM_GET_COUNT	= 0,
+	PDS_CORE_HOST_MEM_QUERY		= 1,
+	PDS_CORE_HOST_MEM_ADD		= 2,
+	PDS_CORE_HOST_MEM_DEL		= 3,
+};
+
+/**
+ * struct pds_core_host_mem_cmd - HOST_MEM command
+ * @opcode:     Opcode PDS_CORE_CMD_HOST_MEM
+ * @oper:       Operation (enum pds_core_host_mem_oper)
+ * @index:      Memory request index (GET_COUNT: max_count, QUERY: index)
+ * @tag:        Tag for this memory request (ADD/DEL)
+ * @reason:     Reason for deletion (DEL only)
+ * @rsvd:       Reserved
+ * @max_contig: Maximum contiguous memory size (GET_COUNT only)
+ * @size:       Size of memory in bytes (ADD only)
+ * @buf_pa:     DMA address of memory (ADD only)
+ *
+ * Unified command for all host memory operations. Fields are reused
+ * across operations to minimize opcode space usage.
+ */
+struct pds_core_host_mem_cmd {
+	u8     opcode;
+	u8     oper;
+	__le16 index;
+	__le16 tag;
+	u8     reason;
+	u8     rsvd;
+	__le32 max_contig;
+	__le32 size;
+	__le64 buf_pa;
+};
+
+/**
+ * struct pds_core_host_mem_comp - HOST_MEM completion
+ * @status:       Status of the command (enum pds_core_status_code)
+ * @oper:         Operation that was performed
+ * @count:        Number of memory requests (GET_COUNT)
+ * @size:         Size of memory request in bytes (QUERY)
+ * @tag:          Tag for this memory request (QUERY/DEL)
+ * @rsvd:         Reserved
+ */
+struct pds_core_host_mem_comp {
+	u8     status;
+	u8     oper;
+	__le16 count;
+	__le32 size;
+	__le16 tag;
+	u8     rsvd[6];
+};
+
 /*
  * union pds_core_dev_cmd - Overlay of core device command structures
  */
@@ -861,6 +923,7 @@ union pds_core_dev_cmd {
 	struct pds_core_send_component_cmd     send_component;
 	struct pds_core_finalize_update_cmd    finalize_update;
 	struct pds_core_match_record_desc_cmd  match_record_desc;
+	struct pds_core_host_mem_cmd           host_mem;
 };
 
 /*
@@ -886,6 +949,7 @@ union pds_core_dev_comp {
 	struct pds_core_send_component_comp     send_component;
 	struct pds_core_finalize_update_comp    finalize_update;
 	struct pds_core_match_record_desc_comp  match_record_desc;
+	struct pds_core_host_mem_comp           host_mem;
 };
 
 /**
-- 
2.43.0


* [PATCH v4 3/6] pds_core: add PLDM firmware update support via devlink flash
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner, Nikhil P . Rao
In-Reply-To: <20260614050052.1048328-1-nikhil.rao@amd.com>

From: Brett Creeley <brett.creeley@amd.com>

Implement PLDM FW Update in the pds_core driver using the upstream
pldmfw API. This allows updating an entire PLDM FW package at once
or updating specific firmware components by name.

Flash the entire image:
  devlink dev flash pci/0000:b5:00.0 file firmware.pldmfw

Flash a specific component from the PLDM FW package:
  devlink dev flash pci/0000:b5:00.0 \
    file firmware.pldmfw component fw.cpld

Per-component update uses driver-defined component names (fw.mainfw,
fw.cpld, etc.). Not all components support per-component update;
devlink rejects the request if the specified component cannot
be updated.
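A minimal userspace sketch of the name handling described above, assuming hypothetical type IDs (the real PDS_CORE_FW_TYPE_* values are defined in pds_core_if.h and are not reproduced here). An optional "fw." prefix is stripped, the remainder is looked up in a name table, and a request is rejected with -ENOENT unless the device reports a component of that type. In the patch these steps are done by pdsc_name_to_fw_type(), pdsc_component_type_exists() and pdsc_component_id_matches_type(); the sketch_* functions and struct below are illustrative stand-ins only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))
#define SKETCH_PREFIX "fw."

/* Hypothetical type IDs; 0 means "unknown", as in the patch. */
enum sketch_fw_type {
	SKETCH_FW_TYPE_NONE = 0,
	SKETCH_FW_TYPE_MAIN,
	SKETCH_FW_TYPE_BOOT,
	SKETCH_FW_TYPE_CPLD,
};

static const char *const sketch_type_names[] = {
	[SKETCH_FW_TYPE_MAIN] = "mainfw",
	[SKETCH_FW_TYPE_BOOT] = "bootloader",
	[SKETCH_FW_TYPE_CPLD] = "cpld",
};

/* Strip an optional "fw." prefix, then look the name up; 0 if unknown. */
static int sketch_name_to_type(const char *name)
{
	size_t plen = strlen(SKETCH_PREFIX);
	size_t i;

	if (strncmp(name, SKETCH_PREFIX, plen) == 0)
		name += plen;
	for (i = 1; i < SKETCH_ARRAY_LEN(sketch_type_names); i++)
		if (sketch_type_names[i] &&
		    strcmp(name, sketch_type_names[i]) == 0)
			return (int)i;
	return SKETCH_FW_TYPE_NONE;
}

/* One entry of the component list reported by the device (hypothetical). */
struct sketch_dev_component {
	unsigned short identifier;
	int type;
};

/* Up-front check: -ENOENT unless the device has a component of this type. */
static int sketch_check_requested(const char *name,
				  const struct sketch_dev_component *c,
				  size_t n)
{
	int type = sketch_name_to_type(name);
	size_t i;

	if (type == SKETCH_FW_TYPE_NONE)
		return -ENOENT;
	for (i = 0; i < n; i++)
		if (c[i].type == type)
			return 0;
	return -ENOENT;
}

/* Per package component: skip it unless its identifier maps to the
 * requested type on this device.
 */
static int sketch_should_skip(int requested_type, unsigned short id,
			      const struct sketch_dev_component *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].identifier == id && c[i].type == requested_type)
			return 0;
	return 1;
}
```

For example, if a device reports only mainfw and cpld components, a request for fw.bootloader fails with -ENOENT before any data is sent, while a request for fw.cpld flashes only the package component whose identifier maps to the cpld type.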

Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
---
 .../device_drivers/ethernet/amd/pds_core.rst  |  88 ++
 drivers/net/ethernet/amd/Kconfig              |   1 +
 drivers/net/ethernet/amd/pds_core/core.h      |  30 +-
 drivers/net/ethernet/amd/pds_core/dev.c       |  79 ++
 drivers/net/ethernet/amd/pds_core/devlink.c   |   2 +-
 drivers/net/ethernet/amd/pds_core/fw.c        | 767 +++++++++++++++++-
 drivers/net/ethernet/amd/pds_core/main.c      |   7 +
 include/linux/pds/pds_core_if.h               | 402 +++++++++
 8 files changed, 1371 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
index 9e8a16c44102..71f0222589bb 100644
--- a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
+++ b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
@@ -102,6 +102,94 @@ currently in use, and that bank will used for the next boot::
   # devlink dev flash pci/0000:b5:00.0 \
             file pensando/dsc_fw_1.63.0-22.tar
 
+Firmware Management (PLDM)
+==========================
+
+Firmware that supports PLDM can be updated using the devlink flash command
+with a PLDM firmware package. The entire package can be updated at once::
+
+  # devlink dev flash pci/0000:b5:00.0 file firmware.pldmfw
+
+Individual components can also be updated by specifying the component name::
+
+  # devlink dev flash pci/0000:b5:00.0 \
+            file firmware.pldmfw component fw.cpld
+
+Per-component update uses driver-defined component names (fw.mainfw,
+fw.cpld, etc.). Not all components support per-component update;
+devlink rejects the request if the specified component cannot
+be updated.
+
+Info versions (PLDM)
+====================
+
+Firmware that supports PLDM reports component versions using driver-defined
+names. The driver reports the following component versions:
+
+.. list-table:: devlink info versions for PLDM-capable firmware
+   :widths: 5 5 90
+
+   * - Name
+     - Type
+     - Description
+   * - ``fw``
+     - running
+     - Version of firmware running on the device
+   * - ``fw.mainfw``
+     - running, stored
+     - Main firmware
+   * - ``fw.mainfw.gold``
+     - stored
+     - Gold (recovery) firmware
+   * - ``fw.bootloader``
+     - running, stored
+     - Boot loader
+   * - ``fw.cpld``
+     - running, stored
+     - CPLD
+   * - ``fw.secure``
+     - running, stored
+     - Secure boot firmware
+   * - ``fw.fpga``
+     - running, stored
+     - FPGA configuration
+   * - ``fw.suc.mainfw``
+     - running, stored
+     - System Unit Controller main firmware
+   * - ``fw.suc.bootloader``
+     - running, stored
+     - System Unit Controller bootloader
+   * - ``fw.uboot``
+     - running, stored
+     - U-Boot bootloader
+   * - ``asic.id``
+     - fixed
+     - The ASIC type for this device
+   * - ``asic.rev``
+     - fixed
+     - The revision of the ASIC for this device
+
+Example output::
+
+  $ devlink dev info pci/0000:00:05.0
+  pci/0000:00:05.0:
+    driver pds_core
+    serial_number FLM18420073
+    versions:
+        fixed:
+          asic.id 0x0
+          asic.rev 0x0
+        running:
+          fw.bootloader 1.2.3
+          fw.mainfw 1.3.0
+          fw.cpld 3.18
+          fw 1.3.0
+        stored:
+          fw.bootloader 1.2.3
+          fw.mainfw.gold 1.2.0
+          fw.mainfw 1.3.0
+          fw.cpld 3.18
+
 Health Reporters
 ================
 
diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index e35991141a1a..743e3d4b6b94 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -171,6 +171,7 @@ config PDS_CORE
 	depends on 64BIT && PCI
 	select AUXILIARY_BUS
 	select NET_DEVLINK
+	select PLDMFW
 	help
 	  This enables the support for the AMD/Pensando Core device family of
 	  adapters.  More specific information on this driver can be
diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h
index b7fe9ad73349..c686f0bbbaeb 100644
--- a/drivers/net/ethernet/amd/pds_core/core.h
+++ b/drivers/net/ethernet/amd/pds_core/core.h
@@ -23,6 +23,14 @@
 #define PDSC_SETUP_RECOVERY	false
 #define PDSC_SETUP_INIT		true
 
+struct pdsc_deferred_dma {
+	struct list_head list;
+	dma_addr_t dma_addr;
+	void *va;
+	size_t size;
+	enum dma_data_direction dir;
+};
+
 struct pdsc_dev_bar {
 	void __iomem *vaddr;
 	phys_addr_t bus_addr;
@@ -185,6 +193,8 @@ struct pdsc {
 	struct mutex devcmd_lock;	/* lock for dev_cmd operations */
 	struct mutex config_lock;	/* lock for configuration operations */
 	spinlock_t adminq_lock;		/* lock for adminq operations */
+	struct list_head deferred_dma_list;
+	spinlock_t deferred_dma_lock;	/* lock for deferred DMA list */
 	refcount_t adminq_refcnt;
 	struct pds_core_dev_info_regs __iomem *info_regs;
 	struct pds_core_dev_cmd_regs __iomem *cmd_regs;
@@ -199,6 +209,8 @@ struct pdsc {
 	u64 last_eid;
 	struct pdsc_viftype *viftype_status;
 	struct work_struct pci_reset_work;
+
+	struct pds_core_component_list_info fw_components;
 };
 
 /** enum pds_core_dbell_bits - bitwise composition of dbell values.
@@ -281,8 +293,16 @@ bool pdsc_is_fw_running(struct pdsc *pdsc);
 bool pdsc_is_fw_good(struct pdsc *pdsc);
 int pdsc_devcmd(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 		union pds_core_dev_comp *comp, int max_seconds);
+int pdsc_devcmd_with_data(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+			  const void *data, size_t data_len,
+			  union pds_core_dev_comp *comp, int max_seconds);
+int pdsc_devcmd_with_data_nomsg(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+				const void *data, size_t data_len,
+				union pds_core_dev_comp *comp, int max_seconds);
 int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 		       union pds_core_dev_comp *comp, int max_seconds);
+int pdsc_devcmd_locked_nomsg(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+			     union pds_core_dev_comp *comp, int max_seconds);
 int pdsc_devcmd_init(struct pdsc *pdsc);
 int pdsc_devcmd_reset(struct pdsc *pdsc);
 int pdsc_dev_init(struct pdsc *pdsc);
@@ -315,11 +335,19 @@ void pdsc_process_adminq(struct pdsc_qcq *qcq);
 void pdsc_work_thread(struct work_struct *work);
 irqreturn_t pdsc_adminq_isr(int irq, void *data);
 
-int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
+int pdsc_firmware_update(struct pdsc *pdsc,
+			 struct devlink_flash_update_params *params,
 			 struct netlink_ext_ack *extack);
+int pdsc_get_component_info(struct pdsc *pdsc);
+const char *pdsc_fw_type_to_name(u8 type);
 
 void pdsc_fw_down(struct pdsc *pdsc);
 void pdsc_fw_up(struct pdsc *pdsc);
 void pdsc_pci_reset_thread(struct work_struct *work);
 
+void pdsc_deferred_dma_add(struct pdsc *pdsc, struct pdsc_deferred_dma *entry,
+			   dma_addr_t dma_addr, void *va, size_t size,
+			   enum dma_data_direction dir);
+void pdsc_deferred_dma_free(struct pdsc *pdsc);
+
 #endif /* _PDSC_H_ */
diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c
index 5c0ca3d0b000..6082b28915db 100644
--- a/drivers/net/ethernet/amd/pds_core/dev.c
+++ b/drivers/net/ethernet/amd/pds_core/dev.c
@@ -206,15 +206,53 @@ static int __pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 	else
 		memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));
 
+	if (err != -ETIMEDOUT && err != -EAGAIN)
+		pdsc_deferred_dma_free(pdsc);
+
 	return err;
 }
 
+void pdsc_deferred_dma_add(struct pdsc *pdsc, struct pdsc_deferred_dma *entry,
+			   dma_addr_t dma_addr, void *va, size_t size,
+			   enum dma_data_direction dir)
+{
+	entry->dma_addr = dma_addr;
+	entry->va = va;
+	entry->size = size;
+	entry->dir = dir;
+
+	spin_lock(&pdsc->deferred_dma_lock);
+	list_add_tail(&entry->list, &pdsc->deferred_dma_list);
+	spin_unlock(&pdsc->deferred_dma_lock);
+}
+
+void pdsc_deferred_dma_free(struct pdsc *pdsc)
+{
+	struct pdsc_deferred_dma *entry, *tmp;
+
+	spin_lock(&pdsc->deferred_dma_lock);
+	list_for_each_entry_safe(entry, tmp, &pdsc->deferred_dma_list, list) {
+		dma_unmap_single(pdsc->dev, entry->dma_addr,
+				 entry->size, entry->dir);
+		kfree(entry->va);
+		list_del(&entry->list);
+		kfree(entry);
+	}
+	spin_unlock(&pdsc->deferred_dma_lock);
+}
+
 int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 		       union pds_core_dev_comp *comp, int max_seconds)
 {
 	return __pdsc_devcmd_locked(pdsc, cmd, comp, max_seconds, true);
 }
 
+int pdsc_devcmd_locked_nomsg(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+			     union pds_core_dev_comp *comp, int max_seconds)
+{
+	return __pdsc_devcmd_locked(pdsc, cmd, comp, max_seconds, false);
+}
+
 int pdsc_devcmd(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 		union pds_core_dev_comp *comp, int max_seconds)
 {
@@ -227,6 +265,47 @@ int pdsc_devcmd(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 	return err;
 }
 
+static int __pdsc_devcmd_with_data(struct pdsc *pdsc,
+				   union pds_core_dev_cmd *cmd,
+				   const void *data, size_t data_len,
+				   union pds_core_dev_comp *comp,
+				   int max_seconds, bool do_msg)
+{
+	int err;
+
+	mutex_lock(&pdsc->devcmd_lock);
+	if (!pdsc->cmd_regs) {
+		err = -ENXIO;
+		goto unlock;
+	}
+	if (data_len > sizeof(pdsc->cmd_regs->data)) {
+		err = -ENOSPC;
+		goto unlock;
+	}
+	memcpy_toio(&pdsc->cmd_regs->data, data, data_len);
+	err = __pdsc_devcmd_locked(pdsc, cmd, comp, max_seconds, do_msg);
+unlock:
+	mutex_unlock(&pdsc->devcmd_lock);
+
+	return err;
+}
+
+int pdsc_devcmd_with_data(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+			  const void *data, size_t data_len,
+			  union pds_core_dev_comp *comp, int max_seconds)
+{
+	return __pdsc_devcmd_with_data(pdsc, cmd, data, data_len,
+				       comp, max_seconds, true);
+}
+
+int pdsc_devcmd_with_data_nomsg(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+				const void *data, size_t data_len,
+				union pds_core_dev_comp *comp, int max_seconds)
+{
+	return __pdsc_devcmd_with_data(pdsc, cmd, data, data_len,
+				       comp, max_seconds, false);
+}
+
 int pdsc_devcmd_init(struct pdsc *pdsc)
 {
 	union pds_core_dev_comp comp = {};
diff --git a/drivers/net/ethernet/amd/pds_core/devlink.c b/drivers/net/ethernet/amd/pds_core/devlink.c
index 2ea97e1c5939..3b763ee1715e 100644
--- a/drivers/net/ethernet/amd/pds_core/devlink.c
+++ b/drivers/net/ethernet/amd/pds_core/devlink.c
@@ -90,7 +90,7 @@ int pdsc_dl_flash_update(struct devlink *dl,
 {
 	struct pdsc *pdsc = devlink_priv(dl);
 
-	return pdsc_firmware_update(pdsc, params->fw, extack);
+	return pdsc_firmware_update(pdsc, params, extack);
 }
 
 static char *fw_slotnames[] = {
diff --git a/drivers/net/ethernet/amd/pds_core/fw.c b/drivers/net/ethernet/amd/pds_core/fw.c
index fa626719e68d..0ef34869fdc0 100644
--- a/drivers/net/ethernet/amd/pds_core/fw.c
+++ b/drivers/net/ethernet/amd/pds_core/fw.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2023 Advanced Micro Devices, Inc */
 
+#include <linux/pldmfw.h>
+#include <linux/vmalloc.h>
+
 #include "core.h"
 
 /* The worst case wait for the install activity is about 25 minutes when
@@ -14,6 +17,46 @@
 /* Number of periodic log updates during fw file download */
 #define PDSC_FW_INTERVAL_FRACTION	32
 
+#define PDSC_FW_COMPONENT_PREFIX		"fw."
+#define PDSC_FW_COMPONENT_FULL_NAME_BUFLEN \
+	(sizeof(PDSC_FW_COMPONENT_PREFIX) + PDS_CORE_FW_COMPONENT_NAME_BUFLEN)
+
+/* Driver-defined component type to name mapping */
+static const char * const pdsc_fw_type_names[] = {
+	[PDS_CORE_FW_TYPE_MAIN]      = "mainfw",
+	[PDS_CORE_FW_TYPE_BOOT]      = "bootloader",
+	[PDS_CORE_FW_TYPE_CPLD]      = "cpld",
+	[PDS_CORE_FW_TYPE_SECURE]    = "secure",
+	[PDS_CORE_FW_TYPE_FPGA]      = "fpga",
+	[PDS_CORE_FW_TYPE_SUC_MAIN]  = "suc.mainfw",
+	[PDS_CORE_FW_TYPE_SUC_BOOT]  = "suc.bootloader",
+	[PDS_CORE_FW_TYPE_UBOOT]     = "uboot",
+};
+
+const char *pdsc_fw_type_to_name(u8 type)
+{
+	if (type < ARRAY_SIZE(pdsc_fw_type_names) && pdsc_fw_type_names[type])
+		return pdsc_fw_type_names[type];
+	return NULL;
+}
+
+static u8 pdsc_name_to_fw_type(const char *name)
+{
+	size_t prefix_len;
+	int i;
+
+	prefix_len = str_has_prefix(name, PDSC_FW_COMPONENT_PREFIX);
+	if (prefix_len)
+		name += prefix_len;
+
+	for (i = 1; i < ARRAY_SIZE(pdsc_fw_type_names); i++) {
+		if (pdsc_fw_type_names[i] &&
+		    !strcmp(name, pdsc_fw_type_names[i]))
+			return i;
+	}
+	return 0;
+}
+
 static int pdsc_devcmd_fw_download_locked(struct pdsc *pdsc, u64 addr,
 					  u32 offset, u32 length)
 {
@@ -23,7 +66,7 @@ static int pdsc_devcmd_fw_download_locked(struct pdsc *pdsc, u64 addr,
 		.fw_download.addr = cpu_to_le64(addr),
 		.fw_download.length = cpu_to_le32(length),
 	};
-	union pds_core_dev_comp comp;
+	union pds_core_dev_comp comp = {};
 
 	return pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
 }
@@ -95,8 +138,9 @@ static int pdsc_fw_status_long_wait(struct pdsc *pdsc,
 	return err;
 }
 
-int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
-			 struct netlink_ext_ack *extack)
+static int pdsc_legacy_firmware_update(struct pdsc *pdsc,
+				       const struct firmware *fw,
+				       struct netlink_ext_ack *extack)
 {
 	u32 buf_sz, copy_sz, offset;
 	struct devlink *dl;
@@ -195,3 +239,720 @@ int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
 						   NULL, 0, 0);
 	return err;
 }
+
+struct pdsc_component_priv {
+	u16 component_id;
+	bool skip;
+	struct list_head list_entry;
+};
+
+struct pds_core_fwu_priv {
+	struct pldmfw context;
+	struct devlink_flash_update_params *params;
+	struct netlink_ext_ack *extack;
+	struct pdsc *pdsc;
+	struct list_head components;
+};
+
+static void pdsc_free_fwu_priv(struct pds_core_fwu_priv *priv)
+{
+	struct pdsc_component_priv *component_priv, *tmp;
+
+	list_for_each_entry_safe(component_priv, tmp, &priv->components,
+				 list_entry) {
+		list_del(&component_priv->list_entry);
+		kfree(component_priv);
+	}
+}
+
+static int pdsc_devcmd_match_record_desc(struct pdsc *pdsc, u16 desc_type,
+					 u16 desc_size, const u8 *desc_data,
+					 u8 *match)
+{
+	union pds_core_dev_cmd cmd = {
+		.match_record_desc.opcode = PDS_CORE_CMD_MATCH_RECORD_DESC,
+		.match_record_desc.ver = 1,
+		.match_record_desc.type = cpu_to_le16(desc_type),
+		.match_record_desc.size = cpu_to_le16(desc_size),
+	};
+	union pds_core_dev_comp comp = {};
+	int err;
+
+	err = pdsc_devcmd_with_data(pdsc, &cmd, desc_data, desc_size,
+				    &comp, pdsc->devcmd_timeout);
+	*match = comp.match_record_desc.match;
+
+	return err;
+}
+
+static bool pdsc_match_record_descs(struct pldmfw *context,
+				    struct pldmfw_record *record)
+{
+	struct pds_core_fwu_priv *priv =
+		container_of(context, struct pds_core_fwu_priv, context);
+	struct pdsc *pdsc = priv->pdsc;
+	struct pldmfw_desc_tlv *desc;
+
+	if (!pldmfw_op_pci_match_record(context, record))
+		return false;
+
+	list_for_each_entry(desc, &record->descs, entry) {
+		u8 match;
+		int err;
+
+		switch (desc->type) {
+		/* skip types checked in pldmfw_op_pci_match_record */
+		case PLDM_DESC_ID_PCI_VENDOR_ID:
+		case PLDM_DESC_ID_PCI_DEVICE_ID:
+		case PLDM_DESC_ID_PCI_SUBVENDOR_ID:
+		case PLDM_DESC_ID_PCI_SUBDEV_ID:
+			continue;
+		}
+
+		if (!desc->size)
+			return false;
+
+		err = pdsc_devcmd_match_record_desc(pdsc, desc->type,
+						    desc->size, desc->data,
+						    &match);
+		if (err) {
+			dev_err(pdsc->dev,
+				"match_record_desc failed type: 0x%04x size: %u, err %d\n",
+				desc->type, desc->size, err);
+			return false;
+		}
+		/* all record descriptors must match */
+		if (!match)
+			return false;
+	}
+
+	return true;
+}
+
+static int pdsc_devcmd_send_package_data(struct pdsc *pdsc, u64 addr,
+					 u16 length, u16 offset, u16 total_len)
+{
+	union pds_core_dev_cmd cmd = {
+		.send_pkg_data.opcode = PDS_CORE_CMD_SEND_PKG_DATA,
+		.send_pkg_data.ver = 1,
+		.send_pkg_data.data_pa = cpu_to_le64(addr),
+		.send_pkg_data.data_len = cpu_to_le16(length),
+		.send_pkg_data.offset = cpu_to_le16(offset),
+		.send_pkg_data.total_len = cpu_to_le16(total_len),
+	};
+	union pds_core_dev_comp comp = {};
+
+	return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+static int pdsc_send_package_data(struct pldmfw *context, const u8 *data,
+				  u16 length)
+{
+	struct pds_core_fwu_priv *priv =
+		container_of(context, struct pds_core_fwu_priv, context);
+	struct pdsc_deferred_dma *deferred;
+	struct device *dev = context->dev;
+	struct pdsc *pdsc = priv->pdsc;
+	dma_addr_t dma_addr;
+	u8 *package_data;
+	u32 offset;
+	int err;
+
+	if (!length)
+		return 0;
+
+	deferred = kmalloc_obj(*deferred, GFP_KERNEL);
+	if (!deferred)
+		return -ENOMEM;
+
+	package_data = kmemdup(data, length, GFP_KERNEL);
+	if (!package_data) {
+		kfree(deferred);
+		return -ENOMEM;
+	}
+
+	dma_addr = dma_map_single(dev, package_data, length, DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, dma_addr)) {
+		dev_err(dev, "Failed to dma_map package_data length 0x%x\n",
+			length);
+		kfree(package_data);
+		kfree(deferred);
+		return -ENOMEM;
+	}
+
+	for (offset = 0; offset < length; offset += PDS_PAGE_SIZE) {
+		u32 copy_sz;
+
+		copy_sz = min_t(unsigned int, PDS_PAGE_SIZE, length - offset);
+		err = pdsc_devcmd_send_package_data(pdsc, dma_addr + offset,
+						    copy_sz, offset, length);
+		if (err) {
+			dev_err(dev,
+				"send_package_data failed off 0x%x len 0x%x: %pe\n",
+				offset, copy_sz, ERR_PTR(err));
+			break;
+		}
+	}
+
+	if (err == -ETIMEDOUT || err == -EAGAIN) {
+		pdsc_deferred_dma_add(pdsc, deferred, dma_addr,
+				      package_data, length, DMA_TO_DEVICE);
+		return err;
+	}
+
+	kfree(deferred);
+	dma_unmap_single(dev, dma_addr, length, DMA_TO_DEVICE);
+	kfree(package_data);
+	return err;
+}
+
+static bool pdsc_component_type_exists(struct pdsc *pdsc, u8 type)
+{
+	int i;
+
+	for (i = 0; i < pdsc->fw_components.num_components; i++) {
+		if (pdsc->fw_components.info[i].component_type == type)
+			return true;
+	}
+	return false;
+}
+
+static u8 pdsc_get_component_type_by_id(struct pdsc *pdsc, u16 component_id)
+{
+	int i;
+
+	for (i = 0; i < pdsc->fw_components.num_components; i++) {
+		struct pds_core_fw_component_info *info =
+			&pdsc->fw_components.info[i];
+
+		if (info->identifier == component_id)
+			return info->component_type;
+	}
+	return 0;
+}
+
+static bool pdsc_component_id_matches_type(struct pdsc *pdsc,
+					   u16 component_id, u8 type)
+{
+	int i;
+
+	for (i = 0; i < pdsc->fw_components.num_components; i++) {
+		struct pds_core_fw_component_info *info =
+			&pdsc->fw_components.info[i];
+
+		if (info->identifier == component_id &&
+		    info->component_type == type)
+			return true;
+	}
+	return false;
+}
+
+static bool pdsc_skip_component(struct pds_core_fwu_priv *priv,
+				u16 component_id)
+{
+	struct pdsc_component_priv *component_priv;
+
+	list_for_each_entry(component_priv, &priv->components, list_entry) {
+		if (component_priv->component_id == component_id)
+			return component_priv->skip;
+	}
+
+	return false;
+}
+
+static int pdsc_send_component_table(struct pldmfw *context,
+				     struct pldmfw_component *component,
+				     u8 transfer_flag)
+{
+	struct pds_core_fwu_priv *priv =
+		container_of(context, struct pds_core_fwu_priv, context);
+	struct pds_core_component_tbl *component_tbl;
+	struct pdsc_component_priv *component_priv;
+	struct device *dev = context->dev;
+	union pds_core_dev_comp comp = {};
+	union pds_core_dev_cmd cmd = {};
+	struct pdsc *pdsc = priv->pdsc;
+	bool skip_component = false;
+	u8 requested_type = 0;
+	u16 buf_sz, tbl_sz;
+	int err = 0;
+
+	dev_dbg(dev,
+		"requested component %s classification %u id %u activation_method %u ver_len %d ver_str %.*s index %u size %u transfer_flag 0x%02x\n",
+		priv->params->component, component->classification,
+		component->identifier, component->activation_method,
+		component->version_len, component->version_len,
+		component->version_string, component->index,
+		component->component_size, transfer_flag);
+
+	component_priv = kzalloc_obj(*component_priv, GFP_KERNEL);
+	if (!component_priv)
+		return -ENOMEM;
+
+	if (priv->params->component) {
+		requested_type = pdsc_name_to_fw_type(priv->params->component);
+		if (component->identifier > U8_MAX ||
+		    !pdsc_component_id_matches_type(pdsc,
+						    component->identifier,
+						    requested_type)) {
+			skip_component = true;
+			goto add_component_priv;
+		}
+	}
+
+	buf_sz = sizeof(pdsc->cmd_regs->data);
+	tbl_sz = struct_size(component_tbl, version_str,
+			     component->version_len);
+	if (tbl_sz > buf_sz) {
+		dev_err(dev, "component_tbl size %d too big, max size: %d\n",
+			tbl_sz, buf_sz);
+		err = -ENOSPC;
+		goto free_component_priv;
+	}
+	component_tbl = kzalloc(tbl_sz, GFP_KERNEL);
+	if (!component_tbl) {
+		err = -ENOMEM;
+		goto free_component_priv;
+	}
+
+	component_tbl->comparison_stamp =
+		cpu_to_le32(component->comparison_stamp);
+	component_tbl->classification = cpu_to_le16(component->classification);
+	component_tbl->identifier = cpu_to_le16(component->identifier);
+	component_tbl->transfer_flag = transfer_flag;
+	component_tbl->version_str_type = component->version_type;
+	component_tbl->version_str_len = component->version_len;
+	memcpy(component_tbl->version_str, component->version_string,
+	       component->version_len);
+
+	cmd.send_component_tbl.opcode = PDS_CORE_CMD_SEND_COMPONENT_TBL;
+	cmd.send_component_tbl.ver = 1;
+	cmd.send_component_tbl.slot_id = PDS_CORE_FW_SLOT_INVALID;
+
+	err = pdsc_devcmd_with_data(pdsc, &cmd, component_tbl, tbl_sz,
+				    &comp, pdsc->devcmd_timeout);
+	kfree(component_tbl);
+	if (err) {
+		dev_err(dev, "Failed sending component table: %pe\n",
+			ERR_PTR(err));
+		goto free_component_priv;
+	}
+
+	if (comp.send_component_tbl.response == 1 &&
+	    comp.send_component_tbl.response_code ==
+		PDS_CORE_COMPONENT_PREREQS_NOT_MET)
+		skip_component = true;
+
+add_component_priv:
+	component_priv->skip = skip_component;
+	component_priv->component_id = component->identifier;
+	list_add(&component_priv->list_entry, &priv->components);
+
+	return 0;
+
+free_component_priv:
+	kfree(component_priv);
+	return err;
+}
+
+int pdsc_get_component_info(struct pdsc *pdsc)
+{
+	union pds_core_dev_cmd cmd = {
+		.get_component_info.opcode = PDS_CORE_CMD_GET_COMPONENT_INFO,
+		.get_component_info.ver = 1,
+	};
+	struct pds_core_component_list_info *list_info;
+	struct pdsc_deferred_dma *deferred;
+	union pds_core_dev_comp comp = {};
+	dma_addr_t dma_addr;
+	u8 num_components;
+	int err, i;
+
+	deferred = kmalloc_obj(*deferred);
+	if (!deferred)
+		return -ENOMEM;
+
+	list_info = kzalloc(PDS_PAGE_SIZE, GFP_KERNEL);
+	if (!list_info) {
+		kfree(deferred);
+		return -ENOMEM;
+	}
+
+	dma_addr = dma_map_single(pdsc->dev, list_info, PDS_PAGE_SIZE,
+				  DMA_FROM_DEVICE);
+	if (dma_mapping_error(pdsc->dev, dma_addr)) {
+		dev_err(pdsc->dev,
+			"Failed to dma_map component_list_info length %d\n",
+			PDS_PAGE_SIZE);
+		kfree(list_info);
+		kfree(deferred);
+		return -ENOMEM;
+	}
+
+	cmd.get_component_info.data_len = cpu_to_le16(PDS_PAGE_SIZE);
+	cmd.get_component_info.data_pa = cpu_to_le64(dma_addr);
+
+	err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout * 2);
+	if (err == -ETIMEDOUT || err == -EAGAIN) {
+		pdsc_deferred_dma_add(pdsc, deferred, dma_addr, list_info,
+				      PDS_PAGE_SIZE, DMA_FROM_DEVICE);
+		return err;
+	}
+
+	kfree(deferred);
+	dma_unmap_single(pdsc->dev, dma_addr, PDS_PAGE_SIZE, DMA_FROM_DEVICE);
+	if (err)
+		goto out;
+
+	if (comp.get_component_info.ver == 0) {
+		/* Don't support backward compatibility as version 0 has
+		 * alignment issues, so give a hint to users to update
+		 * their firmware
+		 */
+		dev_warn_once(pdsc->dev,
+			      "Incompatible get_component_info version %u reported by firmware\n",
+			      comp.get_component_info.ver);
+		err = 0;
+		goto out;
+	}
+
+	num_components = list_info->num_components;
+	if (num_components > PDS_CORE_FW_COMPONENT_LIST_LEN) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	pdsc->fw_components.num_components = num_components;
+	for (i = 0; i < num_components; i++) {
+		struct pds_core_fw_component_info *info =
+			&pdsc->fw_components.info[i];
+
+		memcpy(info, &list_info->info[i], sizeof(*info));
+		info->version[PDS_CORE_FW_COMPONENT_VER_BUFLEN - 1] = 0;
+		info->name[PDS_CORE_FW_COMPONENT_NAME_BUFLEN - 1] = 0;
+	}
+
+out:
+	kfree(list_info);
+	return err;
+}
+
+static int pdsc_devcmd_send_component(struct pdsc *pdsc,
+				      struct pds_core_flash_component *info,
+				      u16 info_sz, dma_addr_t addr, u32 length,
+				      u32 offset, u16 slot_id,
+				      union pds_core_dev_comp *comp)
+{
+	union pds_core_dev_cmd cmd = {
+		.send_component.opcode = PDS_CORE_CMD_SEND_COMPONENT,
+		.send_component.ver = 1,
+		.send_component.operation = PDS_CORE_SEND_COMPONENT_START,
+		.send_component.data_pa = cpu_to_le64(addr),
+		.send_component.data_len = cpu_to_le32(length),
+		.send_component.offset = cpu_to_le32(offset),
+		.send_component.slot_id = slot_id,
+	};
+	unsigned long timeout = 300 * HZ;
+	unsigned long start_time;
+	unsigned long end_time;
+	int err;
+
+	start_time = jiffies;
+	end_time = start_time + timeout;
+	do {
+		/* prevent noisy/benign devcmd failures */
+		err = pdsc_devcmd_with_data_nomsg(pdsc, &cmd, info, info_sz,
+						  comp, 60);
+		if (err != -EAGAIN)
+			break;
+
+		/* if required, subsequent commands check status of
+		 * PDS_CORE_CMD_SEND_COMPONENT command, which returns
+		 * EAGAIN while the command is still running,
+		 * else we get the final command status.
+		 */
+		cmd.send_component.operation = PDS_CORE_SEND_COMPONENT_STATUS;
+		msleep(20);
+	} while (time_before(jiffies, end_time));
+
+	if (err == -EAGAIN || err == -ETIMEDOUT)
+		dev_err(pdsc->dev, "PDS_CORE_CMD_SEND_COMPONENT timed out\n");
+
+	return err;
+}
+
+static int pdsc_flash_component_chunk(struct pdsc *pdsc, struct device *dev,
+				      struct pds_core_flash_component *info,
+				      u16 info_sz, const u8 *data, u16 copy_sz,
+				      u32 offset, u8 slot_id,
+				      union pds_core_dev_comp *comp)
+{
+	struct pdsc_deferred_dma *deferred;
+	dma_addr_t dma_addr;
+	u8 *component_data;
+	int err;
+
+	deferred = kmalloc_obj(*deferred, GFP_KERNEL);
+	if (!deferred)
+		return -ENOMEM;
+
+	component_data = kmemdup(data, copy_sz, GFP_KERNEL);
+	if (!component_data) {
+		kfree(deferred);
+		return -ENOMEM;
+	}
+
+	dma_addr = dma_map_single(dev, component_data, copy_sz, DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, dma_addr)) {
+		dev_err(dev,
+			"Failed to dma_map component_data at offset 0x%x copy_sz 0x%x\n",
+			offset, copy_sz);
+		kfree(component_data);
+		kfree(deferred);
+		return -ENOMEM;
+	}
+
+	err = pdsc_devcmd_send_component(pdsc, info, info_sz, dma_addr,
+					 copy_sz, offset, slot_id, comp);
+	if (err == -ETIMEDOUT || err == -EAGAIN) {
+		pdsc_deferred_dma_add(pdsc, deferred, dma_addr,
+				      component_data, copy_sz, DMA_TO_DEVICE);
+		return err;
+	}
+
+	kfree(deferred);
+	dma_unmap_single(dev, dma_addr, copy_sz, DMA_TO_DEVICE);
+	kfree(component_data);
+
+	return err;
+}
+
+static int pdsc_flash_component(struct pldmfw *context,
+				struct pldmfw_component *component)
+{
+	char component_name_buf[sizeof(PDSC_FW_COMPONENT_PREFIX) + 16];
+	struct pds_core_fwu_priv *priv =
+		container_of(context, struct pds_core_fwu_priv, context);
+	struct pds_core_flash_component *component_info;
+	const char *component_name = NULL;
+	struct device *dev = context->dev;
+	struct pdsc *pdsc = priv->pdsc;
+	u16 buf_sz, info_sz;
+	struct devlink *dl;
+	u8 component_type;
+	u32 total_len;
+	u32 offset;
+	int err;
+
+	if (pdsc_skip_component(priv, component->identifier))
+		return 0;
+
+	component_type = pdsc_get_component_type_by_id(pdsc,
+						       component->identifier);
+	if (component_type) {
+		const char *type_name = pdsc_fw_type_to_name(component_type);
+
+		if (type_name) {
+			snprintf(component_name_buf, sizeof(component_name_buf),
+				 "%s%s", PDSC_FW_COMPONENT_PREFIX, type_name);
+			component_name = component_name_buf;
+		}
+	}
+
+	total_len = component->component_size;
+	dev_dbg(dev,
+		"component name %s class %u id %u act_meth %u ver_str %.*s index %u size %u\n",
+		component_name ?: "(unknown)", component->classification,
+		component->identifier, component->activation_method,
+		component->version_len, component->version_string,
+		component->index, component->component_size);
+
+	buf_sz = sizeof(pdsc->cmd_regs->data);
+	info_sz = struct_size(component_info, version_str,
+			      component->version_len);
+	if (info_sz > buf_sz) {
+		dev_err(dev, "component_info size %d too big, max size: %d\n",
+			info_sz, buf_sz);
+		return -ENOSPC;
+	}
+	component_info = vzalloc(info_sz);
+	if (!component_info)
+		return -ENOMEM;
+
+	component_info->comparison_stamp =
+		cpu_to_le32(component->comparison_stamp);
+	component_info->image_size = cpu_to_le32(total_len);
+	component_info->classification = cpu_to_le16(component->classification);
+	component_info->identifier = cpu_to_le16(component->identifier);
+	component_info->options = cpu_to_le16(component->options);
+	component_info->version_str_type = component->version_type;
+	component_info->version_str_len = component->version_len;
+	memcpy(component_info->version_str, component->version_string,
+	       component->version_len);
+
+	dl = priv_to_devlink(pdsc);
+
+	offset = 0;
+	while (offset < total_len) {
+		union pds_core_dev_comp comp = {};
+		u16 copy_sz;
+
+		copy_sz = min_t(unsigned int, PDS_PAGE_SIZE,
+				total_len - offset);
+
+		err = pdsc_flash_component_chunk(pdsc, dev, component_info,
+						 info_sz,
+						 component->component_data +
+						 offset, copy_sz, offset,
+						 PDS_CORE_FW_SLOT_INVALID,
+						 &comp);
+		if (err &&
+		    comp.send_component.compat_response &&
+		    (comp.send_component.compat_response_code ==
+		     PDS_CORE_COMPONENT_STAMP_IDENTICAL ||
+		     comp.send_component.compat_response_code ==
+		     PDS_CORE_COMPONENT_STAMP_LOWER)) {
+			err = 0;
+			devlink_flash_update_status_notify(dl, "Skipped",
+							   component_name,
+							   0, 0);
+			goto skip_component;
+		}
+
+		if (err) {
+			dev_err(dev,
+				"send_component failed offset 0x%x len 0x%x: %pe\n",
+				offset, copy_sz, ERR_PTR(err));
+			goto err_out;
+		}
+
+		offset += copy_sz;
+		devlink_flash_update_status_notify(dl,
+						   "Erasing/Flashing",
+						   component_name, offset,
+						   total_len);
+	}
+
+	vfree(component_info);
+	return 0;
+
+err_out:
+	devlink_flash_update_status_notify(dl,
+					   "Erasing/Flashing Component Failed",
+					   component_name, 0, 0);
+skip_component:
+	vfree(component_info);
+	return err;
+}
+
+static int pdsc_devcmd_finalize_update(struct pdsc *pdsc)
+{
+	union pds_core_dev_cmd cmd = {
+		.finalize_update.opcode = PDS_CORE_CMD_FINALIZE_UPDATE,
+		.finalize_update.ver = 1,
+	};
+	union pds_core_dev_comp comp = {};
+
+	return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+}
+
+static int pdsc_finalize_update(struct pldmfw *context)
+{
+	struct pds_core_fwu_priv *priv =
+		container_of(context, struct pds_core_fwu_priv, context);
+	const char *component_name = priv->params->component;
+	unsigned long start_time, end_time;
+	struct device *dev = context->dev;
+	struct pdsc *pdsc = priv->pdsc;
+	struct devlink *dl;
+	int err;
+
+	dl = priv_to_devlink(pdsc);
+
+	start_time = jiffies;
+	end_time = start_time + (PDSC_FW_INSTALL_TIMEOUT * HZ);
+	do {
+		err = pdsc_devcmd_finalize_update(pdsc);
+		if (err != -EAGAIN)
+			break;
+
+		dev_dbg(dev, "retrying finalize_update: %pe\n", ERR_PTR(err));
+		msleep(20);
+	} while (time_before(jiffies, end_time));
+
+	if (err) {
+		devlink_flash_update_status_notify(dl, "Finalize Update Failed",
+						   component_name, 0, 0);
+		dev_err(dev, "finalize_update failed: %pe\n", ERR_PTR(err));
+		return err;
+	}
+
+	devlink_flash_update_status_notify(dl, "Finalized Update",
+					   component_name, 0, 0);
+	return 0;
+}
+
+static const struct pldmfw_ops pdsc_pldmfw_ops = {
+	.match_record = pdsc_match_record_descs,
+	.send_package_data = pdsc_send_package_data,
+	.send_component_table = pdsc_send_component_table,
+	.flash_component = pdsc_flash_component,
+	.finalize_update = pdsc_finalize_update
+};
+
+static int pdsc_pldm_firmware_update(struct pdsc *pdsc,
+				     struct devlink_flash_update_params *params,
+				     struct netlink_ext_ack *extack,
+				     const struct firmware *fw)
+{
+	struct pds_core_fwu_priv priv = {};
+	int err;
+
+	if (!pdsc->fw_components.num_components) {
+		err = pdsc_get_component_info(pdsc);
+		if (err) {
+			dev_err(pdsc->dev,
+				"Failed to get component info: %pe\n",
+				ERR_PTR(err));
+			return err;
+		}
+	}
+
+	if (params->component) {
+		u8 type = pdsc_name_to_fw_type(params->component);
+
+		if (!type || !pdsc_component_type_exists(pdsc, type))
+			return -ENOENT;
+	}
+
+	INIT_LIST_HEAD(&priv.components);
+	priv.context.ops = &pdsc_pldmfw_ops;
+	priv.context.dev = pdsc->dev;
+	priv.params = params;
+	priv.pdsc = pdsc;
+
+	err = pldmfw_flash_image(&priv.context, fw);
+	pdsc_free_fwu_priv(&priv);
+
+	return err;
+}
+
+int pdsc_firmware_update(struct pdsc *pdsc,
+			 struct devlink_flash_update_params *params,
+			 struct netlink_ext_ack *extack)
+{
+	int err;
+
+	if (pdsc->dev_ident.version >= PDS_CORE_IDENTITY_VERSION_2 &&
+	    pdsc->dev_ident.capabilities &
+		cpu_to_le64(PDS_CORE_DEV_CAP_PLDM_FW_UPDATE))
+		err = pdsc_pldm_firmware_update(pdsc, params, extack,
+						params->fw);
+	else
+		err = pdsc_legacy_firmware_update(pdsc, params->fw, extack);
+
+	/* Invalidate cached component info so next info_get refreshes */
+	pdsc->fw_components.num_components = 0;
+
+	return err;
+}
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 22db78343eb0..02ff5e51f617 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -246,6 +246,8 @@ static int pdsc_init_pf(struct pdsc *pdsc)
 	mutex_init(&pdsc->devcmd_lock);
 	mutex_init(&pdsc->config_lock);
 	spin_lock_init(&pdsc->adminq_lock);
+	INIT_LIST_HEAD(&pdsc->deferred_dma_list);
+	spin_lock_init(&pdsc->deferred_dma_lock);
 
 	mutex_lock(&pdsc->config_lock);
 	set_bit(PDSC_S_FW_DEAD, &pdsc->state);
@@ -311,6 +313,7 @@ static int pdsc_init_pf(struct pdsc *pdsc)
 		destroy_workqueue(pdsc->wq);
 	mutex_destroy(&pdsc->config_lock);
 	mutex_destroy(&pdsc->devcmd_lock);
+	pdsc_deferred_dma_free(pdsc);
 	pci_free_irq_vectors(pdsc->pdev);
 	pdsc_unmap_bars(pdsc);
 err_out_release_regions:
@@ -452,6 +455,8 @@ static void pdsc_remove(struct pci_dev *pdev)
 	}
 
 	pci_disable_device(pdev);
+	if (!pdev->is_virtfn)
+		pdsc_deferred_dma_free(pdsc);
 
 	ida_free(&pdsc_ida, pdsc->uid);
 	pdsc_debugfs_del_dev(pdsc);
@@ -499,6 +504,8 @@ static void pdsc_reset_prepare(struct pci_dev *pdev)
 	pci_release_regions(pdev);
 	if (pci_is_enabled(pdev))
 		pci_disable_device(pdev);
+	if (!pdev->is_virtfn)
+		pdsc_deferred_dma_free(pdsc);
 }
 
 static void pdsc_reset_done(struct pci_dev *pdev)
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
index 619186f26b5b..cc1f180aee55 100644
--- a/include/linux/pds/pds_core_if.h
+++ b/include/linux/pds/pds_core_if.h
@@ -40,6 +40,13 @@ enum pds_core_cmd_opcode {
 	PDS_CORE_CMD_FW_DOWNLOAD	= 4,
 	PDS_CORE_CMD_FW_CONTROL		= 5,
 
+	PDS_CORE_CMD_GET_COMPONENT_INFO	= 6,
+	PDS_CORE_CMD_SEND_PKG_DATA	= 7,
+	PDS_CORE_CMD_SEND_COMPONENT_TBL	= 8,
+	PDS_CORE_CMD_SEND_COMPONENT	= 9,
+	PDS_CORE_CMD_FINALIZE_UPDATE	= 10,
+	PDS_CORE_CMD_MATCH_RECORD_DESC	= 11,
+
 	/* SR/IOV commands */
 	PDS_CORE_CMD_VF_GETATTR		= 60,
 	PDS_CORE_CMD_VF_SETATTR		= 61,
@@ -100,6 +107,14 @@ struct pds_core_drv_identity {
 	char   driver_ver_str[32];
 };
 
+/**
+ * enum pds_core_dev_capability - Device capabilities
+ * @PDS_CORE_DEV_CAP_PLDM_FW_UPDATE: Device only supports FW update via PLDM
+ */
+enum pds_core_dev_capability {
+	PDS_CORE_DEV_CAP_PLDM_FW_UPDATE = BIT(0),
+};
+
 #define PDS_DEV_TYPE_MAX	16
 /**
  * struct pds_core_dev_identity - Device identity information
@@ -119,6 +134,9 @@ struct pds_core_drv_identity {
  *		      value in usecs to device units using:
  *		      device units = usecs * mult / div
  * @vif_types:        How many of each VIF device type is supported
+ * @max_fw_slots:     Maximum number of fw slots/components
+ *		      only supported on version >= PDS_CORE_IDENTITY_VERSION_2
+ * @rsvd2:	      Word boundary padding
  * @capabilities:     Device capabilities
  *		      only supported on version >= PDS_CORE_IDENTITY_VERSION_2
  */
@@ -133,6 +151,8 @@ struct pds_core_dev_identity {
 	__le32 intr_coal_mult;
 	__le32 intr_coal_div;
 	__le16 vif_types[PDS_DEV_TYPE_MAX];
+	__le16 max_fw_slots;
+	u8     rsvd2[6];
 	__le64 capabilities;
 };
 
@@ -279,11 +299,20 @@ enum pds_core_fw_control_oper {
 	PDS_CORE_FW_GET_LIST               = 7,
 };
 
+/**
+ * enum pds_core_fw_slot - Firmware slot identifiers
+ * @PDS_CORE_FW_SLOT_INVALID: Let firmware select slot based on package metadata
+ * @PDS_CORE_FW_SLOT_A:       Primary firmware slot A
+ * @PDS_CORE_FW_SLOT_B:       Primary firmware slot B
+ * @PDS_CORE_FW_SLOT_GOLD:    Gold/recovery firmware slot
+ * @PDS_CORE_FW_SLOT_MAX:     Sentinel value indicating no slot resolved
+ */
 enum pds_core_fw_slot {
 	PDS_CORE_FW_SLOT_INVALID    = 0,
 	PDS_CORE_FW_SLOT_A	    = 1,
 	PDS_CORE_FW_SLOT_B          = 2,
 	PDS_CORE_FW_SLOT_GOLD       = 3,
+	PDS_CORE_FW_SLOT_MAX        = 0xff,
 };
 
 /**
@@ -450,6 +479,365 @@ struct pds_core_vf_ctrl_comp {
 	u8	status;
 };
 
+/**
+ * struct pds_core_send_pkg_data_cmd - Send package data command
+ * @opcode: Opcode PDS_CORE_CMD_SEND_PKG_DATA
+ * @ver: Driver's max supported version of this command
+ * @total_len: Total length of the package data
+ * @offset: Offset in the package data, non-zero if multiple commands are
+ *	    needed for sending the package data
+ * @data_len: Length of data stored at data_pa
+ * @data_pa: Data physical address for DMA to device
+ *
+ * The package data may be too large to store in a single buffer, so multiple
+ * PDS_CORE_CMD_SEND_PKG_DATA devcmds may be needed.
+ */
+struct pds_core_send_pkg_data_cmd {
+	u8 opcode;
+	u8 ver;
+	__le16 total_len;
+	__le16 offset;
+	__le16 data_len;
+	__le64 data_pa;
+};
+
+/**
+ * struct pds_core_send_pkg_data_comp - Send package data completion
+ * @status: Status of the command (enum pds_core_status_code)
+ * @ver: Device's max supported version of this command
+ * @rsvd: Word boundary padding
+ */
+struct pds_core_send_pkg_data_comp {
+	u8 status;
+	u8 ver;
+	u8 rsvd[2];
+};
+
+/**
+ * struct pds_core_component_tbl - Component table details
+ * @comparison_stamp: Comparison stamp used for component version checks
+ * @classification: Vendor specific classification info
+ * @identifier: Component's ID
+ * @transfer_flag: Part of the component table this request represents
+ * @version_str_type: Type of string stored in @version_str
+ * @version_str_len: Length of @version_str
+ * @version_str: Component version information
+ */
+struct pds_core_component_tbl {
+	__le32 comparison_stamp;
+	__le16 classification;
+	__le16 identifier;
+	u8     transfer_flag;
+	u8     version_str_type;
+	u8     version_str_len;
+	u8     version_str[];
+};
+
+/**
+ * struct pds_core_send_component_tbl_cmd - Send component table command
+ * @opcode: Opcode PDS_CORE_CMD_SEND_COMPONENT_TBL
+ * @ver: Driver's max supported version of this command
+ * @slot_id: enum pds_core_fw_slot
+ * @rsvd: Word boundary padding
+ *
+ * Expects to find component table info (struct pds_core_component_tbl)
+ * in cmd_regs->data.  Driver should keep the devcmd interface locked
+ * while preparing the component table info.
+ */
+struct pds_core_send_component_tbl_cmd {
+	u8 opcode;
+	u8 ver;
+	u8 slot_id;
+	u8 rsvd;
+};
+
+enum pds_core_component_resp_code {
+	PDS_CORE_COMPONENT_VALID = 0x0,
+	PDS_CORE_COMPONENT_STAMP_IDENTICAL = 0x1,
+	PDS_CORE_COMPONENT_STAMP_LOWER = 0x2,
+	PDS_CORE_COMPONENT_STAMP_OR_VERSION_INVALID = 0x3,
+	PDS_CORE_COMPONENT_CONFLICT = 0x4,
+	PDS_CORE_COMPONENT_PREREQS_NOT_MET = 0x5,
+	PDS_CORE_COMPONENT_NOT_SUPPORTED = 0x6,
+	PDS_CORE_COMPONENT_FW_TYPE_INVALID = 0xd0,
+};
+
+/**
+ * struct pds_core_send_component_tbl_comp - Send component table completion
+ * @status: Status of the command (enum pds_core_status_code)
+ * @ver: Device's max supported version of this command
+ * @completion_code: Component completion code
+ * @response: Component response
+ * @response_code: Component response code
+ * @slot_id: Actual slot_id of the component (enum pds_core_fw_slot)
+ * @rsvd: Word boundary padding
+ */
+struct pds_core_send_component_tbl_comp {
+	u8 status;
+	u8 ver;
+	u8 completion_code;
+	u8 response;
+	u8 response_code;
+	u8 slot_id;
+	u8 rsvd[2];
+};
+
+/**
+ * enum pds_core_send_component_op - PDS_CORE_CMD_SEND_COMPONENT operation
+ * @PDS_CORE_SEND_COMPONENT_START: Initial operation to start transfer
+ * @PDS_CORE_SEND_COMPONENT_STATUS: Subsequent calls to check on the status
+ *	of PDS_CORE_CMD_SEND_COMPONENT
+ */
+enum pds_core_send_component_op {
+	PDS_CORE_SEND_COMPONENT_START = 0,
+	PDS_CORE_SEND_COMPONENT_STATUS = 1,
+};
+
+#define PDS_CORE_FW_COMPONENT_ID_INVALID 0xFFFF
+/**
+ * struct pds_core_flash_component - Component details
+ * @comparison_stamp: Comparison stamp used for component version checks
+ * @image_size: Component image size
+ * @classification: Vendor specific classification info
+ * @identifier: Component's ID
+ * @options: Component options
+ * @rsvd: Word boundary padding
+ * @version_str_type: Type of string stored in @version_str
+ * @version_str_len: Length of @version_str
+ * @version_str: Component version information
+ */
+struct pds_core_flash_component {
+	__le32 comparison_stamp;
+	__le32 image_size;
+	__le16 classification;
+	__le16 identifier;
+	__le16 options;
+	u8 rsvd[3];
+	u8 version_str_type;
+	u8 version_str_len;
+	u8 version_str[];
+};
+
+/**
+ * struct pds_core_send_component_cmd - Send component command
+ * @opcode: Opcode PDS_CORE_CMD_SEND_COMPONENT
+ * @ver: Driver's max supported version of this command
+ * @slot_id: enum pds_core_fw_slot
+ * @operation: enum pds_core_send_component_op
+ * @offset: Offset into the component, non-zero if multiple commands
+ *	    are needed for a single component
+ * @data_len: Length of this part of the component stored at @data_pa
+ * @rsvd: Word boundary padding
+ * @data_pa: DMA address of the component
+ *
+ * A component may be too large to store in a single buffer, so multiple
+ * PDS_CORE_CMD_SEND_COMPONENT devcmds may be needed.
+ *
+ * Expects to find flash component info (struct pds_core_flash_component)
+ * in cmd_regs->data. Driver should keep the devcmd interface locked
+ * while preparing and sending the flash component info.
+ */
+struct pds_core_send_component_cmd {
+	u8 opcode;
+	u8 ver;
+	u8 slot_id;
+	u8 operation;
+	__le32 offset;
+	__le32 data_len;
+	u8 rsvd[4];
+	__le64 data_pa;
+};
+
+/**
+ * struct pds_core_send_component_comp - Send component completion
+ * @status: Status of the command (enum pds_core_status_code)
+ * @ver: Device's max supported version of this command
+ * @completion_code: Completion code
+ * @compat_response: Compatibility response (0 = Component can be updated)
+ * @compat_response_code: Compatibility response code
+ * @rsvd: Word boundary padding
+ */
+struct pds_core_send_component_comp {
+	u8 status;
+	u8 ver;
+	u8 completion_code;
+	u8 compat_response;
+	u8 compat_response_code;
+	u8 rsvd[3];
+};
+
+/**
+ * enum pds_core_fw_component_type - Firmware component type
+ * @PDS_CORE_FW_TYPE_UNKNOWN: Unknown component type
+ * @PDS_CORE_FW_TYPE_MAIN: Main firmware
+ * @PDS_CORE_FW_TYPE_BOOT: Boot loader
+ * @PDS_CORE_FW_TYPE_CPLD: CPLD firmware
+ * @PDS_CORE_FW_TYPE_SECURE: Secure firmware
+ * @PDS_CORE_FW_TYPE_FPGA: FPGA configuration
+ * @PDS_CORE_FW_TYPE_SUC_MAIN: System Unit Controller firmware
+ * @PDS_CORE_FW_TYPE_SUC_BOOT: System Unit Controller bootloader
+ * @PDS_CORE_FW_TYPE_UBOOT: U-Boot bootloader
+ *
+ * Gold/recovery variants are identified by slot_id == PDS_CORE_FW_SLOT_GOLD
+ * and reported with a ".gold" suffix (e.g., fw.mainfw.gold).
+ */
+enum pds_core_fw_component_type {
+	PDS_CORE_FW_TYPE_UNKNOWN   = 0,
+	PDS_CORE_FW_TYPE_MAIN      = 1,
+	PDS_CORE_FW_TYPE_BOOT      = 2,
+	PDS_CORE_FW_TYPE_CPLD      = 3,
+	PDS_CORE_FW_TYPE_SECURE    = 4,
+	PDS_CORE_FW_TYPE_FPGA      = 5,
+	PDS_CORE_FW_TYPE_SUC_MAIN  = 6,
+	PDS_CORE_FW_TYPE_SUC_BOOT  = 7,
+	PDS_CORE_FW_TYPE_UBOOT     = 8,
+};
+
+/**
+ * enum pds_core_component_info_flags - Component info flags
+ * @PDS_CORE_FW_COMPONENT_INFO_F_RUNNING: Component is currently running
+ * @PDS_CORE_FW_COMPONENT_INFO_F_STARTUP: Component version on next FW boot
+ * @PDS_CORE_FW_COMPONENT_INFO_F_FIXED: Component is fixed and cannot be updated
+ * @PDS_CORE_FW_COMPONENT_INFO_F_UPDATE_BY_NAME: Component can be updated
+ *	by name
+ */
+enum pds_core_component_info_flags {
+	PDS_CORE_FW_COMPONENT_INFO_F_RUNNING = BIT(0),
+	PDS_CORE_FW_COMPONENT_INFO_F_STARTUP = BIT(1),
+	PDS_CORE_FW_COMPONENT_INFO_F_FIXED = BIT(2),
+	PDS_CORE_FW_COMPONENT_INFO_F_UPDATE_BY_NAME = BIT(3),
+};
+
+/**
+ * struct pds_core_fw_component_info - GET_COMPONENT_INFO entry
+ * @name: Component's name
+ * @component_type: enum pds_core_fw_component_type
+ * @rsvd: Word boundary padding
+ * @flags: enum pds_core_component_info_flags
+ * @identifier: Component's identifier
+ * @slot_id: Component's slot identifier
+ * @version: Component's version
+ */
+struct pds_core_fw_component_info {
+#define PDS_CORE_FW_COMPONENT_NAME_BUFLEN 24
+	char name[PDS_CORE_FW_COMPONENT_NAME_BUFLEN];
+	u8 component_type;
+	u8 rsvd[3];
+	__le16 flags;
+	u8 identifier;
+	u8 slot_id;
+#define PDS_CORE_FW_COMPONENT_VER_BUFLEN 32
+	char version[PDS_CORE_FW_COMPONENT_VER_BUFLEN];
+};
+
+#define PDS_CORE_FW_COMPONENT_LIST_LEN	((PDS_PAGE_SIZE - 8) / \
+		sizeof(struct pds_core_fw_component_info))
+
+/**
+ * struct pds_core_component_list_info - GET_COMPONENT_INFO completion data
+ * @num_components: Number of valid components
+ * @rsvd: Word boundary padding
+ * @info: List of valid components
+ */
+struct pds_core_component_list_info {
+	u8 num_components;
+	u8 rsvd[7];
+	struct pds_core_fw_component_info info[PDS_CORE_FW_COMPONENT_LIST_LEN];
+};
+
+/**
+ * struct pds_core_get_component_info_cmd - GET_COMPONENT_INFO command
+ * @opcode: PDS_CORE_CMD_GET_COMPONENT_INFO
+ * @ver: Driver's max supported version of this command
+ * @data_len: Length of data at data_pa
+ * @rsvd: Word boundary padding
+ * @data_pa: DMA address of data
+ *
+ * FW populates struct pds_core_component_list_info pointed to by @data_pa
+ */
+struct pds_core_get_component_info_cmd {
+	u8 opcode;
+	u8 ver;
+	__le16 data_len;
+	u8 rsvd[4];
+	__le64 data_pa;
+};
+
+/**
+ * struct pds_core_get_component_info_comp - GET_COMPONENT_INFO completion
+ * @status: enum pds_core_status_code
+ * @ver: Device's max supported version of this command
+ * @rsvd: Word boundary padding
+ */
+struct pds_core_get_component_info_comp {
+	u8 status;
+	u8 ver;
+	u8 rsvd[2];
+};
+
+/**
+ * struct pds_core_finalize_update_cmd - FINALIZE_UPDATE command
+ * @opcode: PDS_CORE_CMD_FINALIZE_UPDATE
+ * @ver: Driver's max supported version of this command
+ * @rsvd: Word boundary padding
+ *
+ * Sent by the driver after all components are updated to finalize the update
+ */
+struct pds_core_finalize_update_cmd {
+	u8 opcode;
+	u8 ver;
+	u8 rsvd[2];
+};
+
+/**
+ * struct pds_core_finalize_update_comp - FINALIZE_UPDATE completion
+ * @status: enum pds_core_status_code
+ * @ver: Device's max supported version of this command
+ * @rsvd: Word boundary padding
+ */
+struct pds_core_finalize_update_comp {
+	u8 status;
+	u8 ver;
+	u8 rsvd[2];
+};
+
+/**
+ * struct pds_core_match_record_desc_cmd - MATCH_RECORD_DESC command
+ * @opcode: PDS_CORE_CMD_MATCH_RECORD_DESC
+ * @ver: Driver's max supported version of this command
+ * @type: PLDM Descriptor Identifier Type
+ * @size: Length of the Descriptor Identifier Value
+ * @rsvd: Word boundary padding
+ *
+ * Expects to find the Descriptor Identifier Data in cmd_regs->data. Driver
+ * should keep the devcmd interface locked while preparing and sending this
+ * command.
+ */
+struct pds_core_match_record_desc_cmd {
+	u8 opcode;
+	u8 ver;
+	__le16 type;
+	__le16 size;
+	u8 rsvd[2];
+};
+
+/**
+ * struct pds_core_match_record_desc_comp - MATCH_RECORD_DESC completion
+ * @status: enum pds_core_status_code
+ * @ver: Device's max supported version of this command
+ * @match: Whether or not the Record Descriptor matches the device
+ * @rsvd: Word boundary padding
+ *
+ * When status is PDS_RC_SUCCESS, then @match is valid, otherwise it's
+ * undefined.
+ */
+struct pds_core_match_record_desc_comp {
+	u8 status;
+	u8 ver;
+	u8 match;
+	u8 rsvd;
+};
+
 /*
  * union pds_core_dev_cmd - Overlay of core device command structures
  */
@@ -466,6 +854,13 @@ union pds_core_dev_cmd {
 	struct pds_core_vf_setattr_cmd   vf_setattr;
 	struct pds_core_vf_getattr_cmd   vf_getattr;
 	struct pds_core_vf_ctrl_cmd      vf_ctrl;
+
+	struct pds_core_get_component_info_cmd get_component_info;
+	struct pds_core_send_pkg_data_cmd      send_pkg_data;
+	struct pds_core_send_component_tbl_cmd send_component_tbl;
+	struct pds_core_send_component_cmd     send_component;
+	struct pds_core_finalize_update_cmd    finalize_update;
+	struct pds_core_match_record_desc_cmd  match_record_desc;
 };
 
 /*
@@ -484,6 +879,13 @@ union pds_core_dev_comp {
 	struct pds_core_vf_setattr_comp   vf_setattr;
 	struct pds_core_vf_getattr_comp   vf_getattr;
 	struct pds_core_vf_ctrl_comp      vf_ctrl;
+
+	struct pds_core_get_component_info_comp get_component_info;
+	struct pds_core_send_pkg_data_comp      send_pkg_data;
+	struct pds_core_send_component_tbl_comp send_component_tbl;
+	struct pds_core_send_component_comp     send_component;
+	struct pds_core_finalize_update_comp    finalize_update;
+	struct pds_core_match_record_desc_comp  match_record_desc;
 };
 
 /**
-- 
2.43.0
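
The GET_COMPONENT_INFO layout above relies on each struct
pds_core_fw_component_info entry being exactly 64 bytes with no implicit
padding. The following is a minimal sketch that checks this arithmetic at compile time. It
assumes PDS_PAGE_SIZE is 4096, and it uses uint16_t as a size-equivalent
stand-in for __le16. The mirror struct and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: PDS_PAGE_SIZE is 4096 bytes. */
#define PDS_PAGE_SIZE 4096

#define PDS_CORE_FW_COMPONENT_NAME_BUFLEN 24
#define PDS_CORE_FW_COMPONENT_VER_BUFLEN  32

/* Hypothetical userspace mirror of struct pds_core_fw_component_info;
 * uint16_t has the same size as __le16 (byte order is not modelled). */
struct fw_component_info {
	char name[PDS_CORE_FW_COMPONENT_NAME_BUFLEN];
	uint8_t component_type;
	uint8_t rsvd[3];
	uint16_t flags;
	uint8_t identifier;
	uint8_t slot_id;
	char version[PDS_CORE_FW_COMPONENT_VER_BUFLEN];
};

#define FW_COMPONENT_LIST_LEN \
	((PDS_PAGE_SIZE - 8) / sizeof(struct fw_component_info))

/* Hypothetical mirror of struct pds_core_component_list_info */
struct component_list_info {
	uint8_t num_components;
	uint8_t rsvd[7];
	struct fw_component_info info[FW_COMPONENT_LIST_LEN];
};

/* 24 + 1 + 3 + 2 + 1 + 1 + 32 = 64 bytes, flags naturally aligned */
static_assert(offsetof(struct fw_component_info, flags) == 28, "flags offset");
static_assert(offsetof(struct fw_component_info, version) == 32, "version offset");
static_assert(sizeof(struct fw_component_info) == 64, "entry size");

/* The 8-byte list header plus the entries must fit in one page */
static_assert(FW_COMPONENT_LIST_LEN == 63, "list length");
static_assert(sizeof(struct component_list_info) <= PDS_PAGE_SIZE, "fits in page");
```

With those assumptions, PDS_CORE_FW_COMPONENT_LIST_LEN evaluates to
(4096 - 8) / 64 = 63. The whole list then occupies 8 + 63 * 64 = 4040 bytes.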


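In pdsc_finalize_update(), the driver retries the devcmd only while the device
returns -EAGAIN, and it stops once PDSC_FW_INSTALL_TIMEOUT expires. The following is a minimal
userspace sketch of that retry-until-deadline pattern. fake_finalize(), the
attempt counters and the timeout argument are hypothetical.
CLOCK_MONOTONIC stands in for jiffies and time_before().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for pdsc_devcmd_finalize_update(): returns
 * -EAGAIN for the first busy_calls attempts, then succeeds. */
static int busy_calls = 3;
static int attempts;

static int fake_finalize(void)
{
	attempts++;
	return attempts <= busy_calls ? -EAGAIN : 0;
}

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Retry only on -EAGAIN. Success or any other error ends the loop at
 * once. If the deadline passes first, -EAGAIN is returned. */
static int finalize_with_retry(double timeout_sec)
{
	const struct timespec delay = { 0, 20 * 1000 * 1000 }; /* 20 ms */
	double end_time = now_sec() + timeout_sec;
	int err;

	do {
		err = fake_finalize();
		if (err != -EAGAIN)
			break;
		nanosleep(&delay, NULL);
	} while (now_sec() < end_time);

	return err;
}
```

A single exit test on the result is enough. Checking !err separately is
redundant, because 0 != -EAGAIN already breaks out of the loop.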

* [PATCH v4 4/6] pds_core: add PLDM component info display
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner
In-Reply-To: <20260614050052.1048328-1-nikhil.rao@amd.com>

From: Brett Creeley <brett.creeley@amd.com>

Add detailed component information display via devlink info. This
allows users to see individual firmware components and their versions.
Components are reported as fixed, running, or stored based on their
firmware-provided flags.

Example output:
  $ devlink dev info pci/0000:00:05.0
  versions:
    fixed:
      asic.id 0x0
      asic.rev 0x0
    running:
      fw.bootloader 1.2.3
      fw.uboot 1.60.0-73
      fw 1.60.0-73
      fw.cpld 3.18
    stored:
      fw.bootloader 1.2.3
      fw.uboot 1.60.0-73
      fw.uboot.gold 1.50.0-22
      fw.gold 1.50.0-22
      fw 1.60.0-73
      fw.cpld 3.18
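
The naming rule in pdsc_dl_report_component() and its inverse in
pdsc_name_to_fw_type() can be sketched in userspace C as follows. This is a
minimal sketch, not the driver code. The enum values are copied from
pds_core_if.h in this series. The name table is a partial subset: the "uboot"
entry is inferred from the example output above. The literal "fw." prefix is
an assumption about PDSC_FW_COMPONENT_PREFIX, whose value is not shown here.

```c
#include <stdio.h>
#include <string.h>

/* Values from enum pds_core_fw_component_type and enum pds_core_fw_slot */
enum { FW_TYPE_UNKNOWN = 0, FW_TYPE_MAIN = 1, FW_TYPE_BOOT = 2,
       FW_TYPE_CPLD = 3, FW_TYPE_SECURE = 4, FW_TYPE_UBOOT = 8 };
enum { FW_SLOT_GOLD = 3 };

/* Partial table. MAIN is NULL because it is reported as plain "fw". */
static const char *const type_names[] = {
	[FW_TYPE_MAIN]   = NULL,
	[FW_TYPE_BOOT]   = "bootloader",
	[FW_TYPE_CPLD]   = "cpld",
	[FW_TYPE_SECURE] = "secure",
	[FW_TYPE_UBOOT]  = "uboot",
};

#define NUM_TYPES (sizeof(type_names) / sizeof(type_names[0]))

/* Build the devlink version name. Returns 0 if the type has no name
 * and should not be reported, 1 otherwise. */
static int component_name(unsigned int type, unsigned int slot,
			  char *buf, size_t len)
{
	const char *suffix = slot == FW_SLOT_GOLD ? ".gold" : "";

	if (type == FW_TYPE_MAIN) {
		snprintf(buf, len, "fw%s", suffix);
		return 1;
	}
	if (type >= NUM_TYPES || !type_names[type])
		return 0;
	snprintf(buf, len, "fw.%s%s", type_names[type], suffix);
	return 1;
}

/* Inverse mapping for "devlink dev flash ... component <name>".
 * Gold-slot names are not parsed in this sketch. */
static unsigned int name_to_type(const char *name)
{
	unsigned int i;

	if (!strcmp(name, "fw"))
		return FW_TYPE_MAIN;
	if (!strncmp(name, "fw.", 3))
		name += 3;
	for (i = 0; i < NUM_TYPES; i++)
		if (type_names[i] && !strcmp(name, type_names[i]))
			return i;
	return FW_TYPE_UNKNOWN;
}
```

Under this rule the old "fw.mainfw" name no longer resolves. This matches
the documentation change, which now lists the main firmware as "fw".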

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
---
 .../device_drivers/ethernet/amd/pds_core.rst  |  16 +--
 drivers/net/ethernet/amd/pds_core/devlink.c   | 132 +++++++++++++++++-
 drivers/net/ethernet/amd/pds_core/fw.c        |  14 +-
 3 files changed, 147 insertions(+), 15 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
index 71f0222589bb..9444daeaae1b 100644
--- a/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
+++ b/Documentation/networking/device_drivers/ethernet/amd/pds_core.rst
@@ -115,8 +115,8 @@ Individual components can also be updated by specifying the component name::
   # devlink dev flash pci/0000:b5:00.0 \
             file firmware.pldmfw component fw.cpld
 
-Per-component update uses driver-defined component names (fw.mainfw,
-fw.cpld, etc.). Not all components support per-component update -
+Per-component update uses driver-defined component names (fw, fw.cpld,
+etc.). Not all components support per-component update -
 devlink will reject the request if the specified component cannot
 be updated.
 
@@ -133,12 +133,9 @@ names. The driver reports the following component versions:
      - Type
      - Description
    * - ``fw``
-     - running
-     - Version of firmware running on the device
-   * - ``fw.mainfw``
      - running, stored
      - Main firmware
-   * - ``fw.mainfw.gold``
+   * - ``fw.gold``
      - stored
      - Gold (recovery) firmware
    * - ``fw.bootloader``
@@ -181,13 +178,12 @@ Example output::
           asic.rev 0x0
         running:
           fw.bootloader 1.2.3
-          fw.mainfw 1.3.0
-          fw.cpld 3.18
           fw 1.3.0
+          fw.cpld 3.18
         stored:
           fw.bootloader 1.2.3
-          fw.mainfw.gold 1.2.0
-          fw.mainfw 1.3.0
+          fw.gold 1.2.0
+          fw 1.3.0
           fw.cpld 3.18
 
 Health Reporters
diff --git a/drivers/net/ethernet/amd/pds_core/devlink.c b/drivers/net/ethernet/amd/pds_core/devlink.c
index 3b763ee1715e..2f40b97affd6 100644
--- a/drivers/net/ethernet/amd/pds_core/devlink.c
+++ b/drivers/net/ethernet/amd/pds_core/devlink.c
@@ -93,14 +93,105 @@ int pdsc_dl_flash_update(struct devlink *dl,
 	return pdsc_firmware_update(pdsc, params, extack);
 }
 
+static int pdsc_dl_report_component(struct devlink_info_req *req,
+				    struct pds_core_fw_component_info *info)
+{
+	enum devlink_info_version_type ver_type;
+	u16 flags = le16_to_cpu(info->flags);
+	char *ver = info->version;
+	const char *name;
+	char buf[32];
+
+	/* Main firmware is reported as generic "fw" */
+	if (info->component_type == PDS_CORE_FW_TYPE_MAIN) {
+		if (info->slot_id == PDS_CORE_FW_SLOT_GOLD)
+			snprintf(buf, sizeof(buf), "fw.gold");
+		else
+			snprintf(buf, sizeof(buf), "fw");
+	} else {
+		name = pdsc_fw_type_to_name(info->component_type);
+		if (!name)
+			return 0;
+
+		if (info->slot_id == PDS_CORE_FW_SLOT_GOLD)
+			snprintf(buf, sizeof(buf), "fw.%s.gold", name);
+		else
+			snprintf(buf, sizeof(buf), "fw.%s", name);
+	}
+
+	ver_type = DEVLINK_INFO_VERSION_TYPE_NONE;
+	if (flags & PDS_CORE_FW_COMPONENT_INFO_F_UPDATE_BY_NAME)
+		ver_type = DEVLINK_INFO_VERSION_TYPE_COMPONENT;
+
+	if (flags & PDS_CORE_FW_COMPONENT_INFO_F_FIXED) {
+		int err;
+
+		err = devlink_info_version_fixed_put(req, buf, ver);
+		if (err)
+			return err;
+	}
+
+	if (flags & PDS_CORE_FW_COMPONENT_INFO_F_RUNNING) {
+		int err;
+
+		err = devlink_info_version_running_put_ext(req, buf,
+							   ver, ver_type);
+		if (err)
+			return err;
+	}
+
+	if (flags & PDS_CORE_FW_COMPONENT_INFO_F_STARTUP) {
+		int err;
+
+		err = devlink_info_version_stored_put_ext(req, buf,
+							  ver, ver_type);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int pdsc_dl_component_info_get(struct devlink *dl,
+				      struct devlink_info_req *req,
+				      struct netlink_ext_ack *extack)
+{
+	struct pds_core_component_list_info *list_info;
+	struct pdsc *pdsc = devlink_priv(dl);
+	u8 num_components;
+	int err;
+	int i;
+
+	if (!pdsc->fw_components.num_components) {
+		err = pdsc_get_component_info(pdsc);
+		if (err) {
+			dev_err(pdsc->dev, "Failed to get component info: %pe\n",
+				ERR_PTR(err));
+			return err;
+		}
+	}
+
+	list_info = &pdsc->fw_components;
+	num_components = min_t(u16, list_info->num_components,
+			       le16_to_cpu(pdsc->dev_ident.max_fw_slots));
+	for (i = 0; i < num_components; i++) {
+		err = pdsc_dl_report_component(req, &list_info->info[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static char *fw_slotnames[] = {
 	"fw.goldfw",
 	"fw.mainfwa",
 	"fw.mainfwb",
 };
 
-int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
-		     struct netlink_ext_ack *extack)
+static int pdsc_dl_fw_list_info_get(struct devlink *dl,
+				    struct devlink_info_req *req,
+				    struct netlink_ext_ack *extack)
 {
 	union pds_core_dev_cmd cmd = {
 		.fw_control.opcode = PDS_CORE_CMD_FW_CONTROL,
@@ -134,12 +225,49 @@ int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
 			return err;
 	}
 
+	return 0;
+}
+
+static int pdsc_dl_info_get_v1(struct devlink *dl,
+			       struct devlink_info_req *req,
+			       struct netlink_ext_ack *extack)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	int err;
+
+	err = pdsc_dl_fw_list_info_get(dl, req, extack);
+	if (err)
+		dev_warn_once(pdsc->dev, "Failed to get fw list: %pe\n",
+			      ERR_PTR(err));
+
+	/* Version 1: report fw from dev_info (running only) */
 	err = devlink_info_version_running_put(req,
 					       DEVLINK_INFO_VERSION_GENERIC_FW,
 					       pdsc->dev_info.fw_version);
 	if (err)
 		return err;
 
+	return 0;
+}
+
+int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
+		     struct netlink_ext_ack *extack)
+{
+	struct pdsc *pdsc = devlink_priv(dl);
+	char buf[32];
+	int err;
+
+	if (pdsc->dev_ident.version >= PDS_CORE_IDENTITY_VERSION_2) {
+		err = pdsc_dl_component_info_get(dl, req, extack);
+		if (err)
+			dev_warn_once(pdsc->dev, "Failed to get component info: %pe\n",
+				      ERR_PTR(err));
+	} else {
+		err = pdsc_dl_info_get_v1(dl, req, extack);
+		if (err)
+			return err;
+	}
+
 	snprintf(buf, sizeof(buf), "0x%x", pdsc->dev_info.asic_type);
 	err = devlink_info_version_fixed_put(req,
 					     DEVLINK_INFO_VERSION_GENERIC_ASIC_ID,
diff --git a/drivers/net/ethernet/amd/pds_core/fw.c b/drivers/net/ethernet/amd/pds_core/fw.c
index 0ef34869fdc0..1374d7bea013 100644
--- a/drivers/net/ethernet/amd/pds_core/fw.c
+++ b/drivers/net/ethernet/amd/pds_core/fw.c
@@ -21,9 +21,11 @@
 #define PDSC_FW_COMPONENT_FULL_NAME_BUFLEN \
 	(sizeof(PDSC_FW_COMPONENT_PREFIX) + PDS_CORE_FW_COMPONENT_NAME_BUFLEN)
 
-/* Driver-defined component type to name mapping */
+/* Driver-defined component type to name mapping.
+ * PDS_CORE_FW_TYPE_MAIN is NULL - handled specially as "fw" without prefix.
+ */
 static const char * const pdsc_fw_type_names[] = {
-	[PDS_CORE_FW_TYPE_MAIN]      = "mainfw",
+	[PDS_CORE_FW_TYPE_MAIN]      = NULL,
 	[PDS_CORE_FW_TYPE_BOOT]      = "bootloader",
 	[PDS_CORE_FW_TYPE_CPLD]      = "cpld",
 	[PDS_CORE_FW_TYPE_SECURE]    = "secure",
@@ -45,6 +47,10 @@ static u8 pdsc_name_to_fw_type(const char *name)
 	size_t prefix_len;
 	int i;
 
+	/* "fw" without suffix maps to main firmware */
+	if (!strcmp(name, "fw"))
+		return PDS_CORE_FW_TYPE_MAIN;
+
 	prefix_len = str_has_prefix(name, PDSC_FW_COMPONENT_PREFIX);
 	if (prefix_len)
 		name += prefix_len;
@@ -752,7 +758,9 @@ static int pdsc_flash_component(struct pldmfw *context,
 	if (component_type) {
 		const char *type_name = pdsc_fw_type_to_name(component_type);
 
-		if (type_name) {
+		if (component_type == PDS_CORE_FW_TYPE_MAIN) {
+			component_name = "fw";
+		} else if (type_name) {
 			snprintf(component_name_buf, sizeof(component_name_buf),
 				 "%s%s", PDSC_FW_COMPONENT_PREFIX, type_name);
 			component_name = component_name_buf;
-- 
2.43.0



* [PATCH v4 6/6] pds_core: add debugfs support for host backed memory
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner, Vamsi Atluri
In-Reply-To: <20260614050052.1048328-1-nikhil.rao@amd.com>

From: Vamsi Atluri <Vamsi.Atluri@amd.com>

Add a debugfs file to display host memory allocations, including the
tag, size, order, and physical address of each memory request.
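
The following is a userspace sketch of the table that host_mem_show() produces. It is
useful for checking the column layout. The struct and print_host_mem() are
hypothetical mirrors of the driver fields. "allocated" stands in for the
hm->pg check. "0x%016" PRIx64 approximates the kernel's %pad specifier for
a 64-bit dma_addr_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the pdsc_host_mem fields that are printed */
struct host_mem_req {
	unsigned int tag;
	unsigned int size;
	unsigned int order;
	uint64_t pa;
	int allocated;		/* stands in for hm->pg != NULL */
};

/* Print the table using the same column widths as host_mem_show().
 * Returns the number of rows printed. */
static int print_host_mem(FILE *out, const struct host_mem_req *reqs,
			  unsigned int n)
{
	unsigned int i;
	int rows = 0;

	if (!reqs || n == 0) {
		fputs("No host memory allocated\n", out);
		return 0;
	}

	fprintf(out, "Host memory requests: %u\n\n", n);
	fputs("Tag    Size         Order  PA\n", out);
	fputs("---    ----         -----  --\n", out);

	for (i = 0; i < n; i++) {
		const struct host_mem_req *hm = &reqs[i];

		if (!hm->allocated)	/* unallocated entries are skipped */
			continue;
		fprintf(out, "%-6u %-12u %-6u 0x%016" PRIx64 "\n",
			hm->tag, hm->size, hm->order, hm->pa);
		rows++;
	}
	return rows;
}
```

The header line still reports the total number of requests. Entries without
backing pages are omitted from the rows.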

Signed-off-by: Vamsi Atluri <Vamsi.Atluri@amd.com>
---
 drivers/net/ethernet/amd/pds_core/core.c    |  2 +
 drivers/net/ethernet/amd/pds_core/core.h    |  1 +
 drivers/net/ethernet/amd/pds_core/debugfs.c | 45 +++++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
index d1695ca95440..c0a8966dfff9 100644
--- a/drivers/net/ethernet/amd/pds_core/core.c
+++ b/drivers/net/ethernet/amd/pds_core/core.c
@@ -487,6 +487,7 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 		pdsc->viftype_status = NULL;
 	}
 
+	pdsc_debugfs_del_host_mem(pdsc);
 	pdsc_host_mem_free(pdsc);
 	pdsc_dev_uninit(pdsc);
 
@@ -498,6 +499,7 @@ int pdsc_start(struct pdsc *pdsc)
 	pds_core_intr_mask(&pdsc->intr_ctrl[pdsc->adminqcq.intx],
 			   PDS_CORE_INTR_MASK_CLEAR);
 	pdsc_host_mem_add(pdsc);
+	pdsc_debugfs_add_host_mem(pdsc);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/amd/pds_core/core.h b/drivers/net/ethernet/amd/pds_core/core.h
index 53e5a6f0af9c..b453493c093f 100644
--- a/drivers/net/ethernet/amd/pds_core/core.h
+++ b/drivers/net/ethernet/amd/pds_core/core.h
@@ -304,6 +304,7 @@ void pdsc_debugfs_add_irqs(struct pdsc *pdsc);
 void pdsc_debugfs_add_qcq(struct pdsc *pdsc, struct pdsc_qcq *qcq);
 void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq);
 void pdsc_debugfs_add_host_mem(struct pdsc *pdsc);
+void pdsc_debugfs_del_host_mem(struct pdsc *pdsc);
 
 int pdsc_err_to_errno(enum pds_core_status_code code);
 bool pdsc_is_fw_running(struct pdsc *pdsc);
diff --git a/drivers/net/ethernet/amd/pds_core/debugfs.c b/drivers/net/ethernet/amd/pds_core/debugfs.c
index 810a0cd9bcac..ef0a1b7d159b 100644
--- a/drivers/net/ethernet/amd/pds_core/debugfs.c
+++ b/drivers/net/ethernet/amd/pds_core/debugfs.c
@@ -178,3 +178,48 @@ void pdsc_debugfs_del_qcq(struct pdsc_qcq *qcq)
 	debugfs_remove_recursive(qcq->dentry);
 	qcq->dentry = NULL;
 }
+
+static int host_mem_show(struct seq_file *seq, void *v)
+{
+	struct pdsc *pdsc = seq->private;
+	struct pdsc_host_mem *hm;
+	int i;
+
+	if (!pdsc->host_mem_reqs || pdsc->num_host_mem_reqs == 0) {
+		seq_puts(seq, "No host memory allocated\n");
+		return 0;
+	}
+
+	seq_printf(seq, "Host memory requests: %u\n\n",
+		   pdsc->num_host_mem_reqs);
+	seq_puts(seq, "Tag    Size         Order  PA\n");
+	seq_puts(seq, "---    ----         -----  --\n");
+
+	for (i = 0; i < pdsc->num_host_mem_reqs; i++) {
+		hm = &pdsc->host_mem_reqs[i];
+
+		if (!hm->pg)
+			continue;
+
+		seq_printf(seq, "%-6u %-12u %-6u %pad\n",
+			   hm->tag, hm->size, hm->order, &hm->pa);
+	}
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(host_mem);
+
+void pdsc_debugfs_add_host_mem(struct pdsc *pdsc)
+{
+	if (!(pdsc->dev_ident.capabilities &
+	      cpu_to_le64(PDS_CORE_DEV_CAP_HOST_MEM)))
+		return;
+
+	debugfs_create_file("host_mem", 0400, pdsc->dentry,
+			    pdsc, &host_mem_fops);
+}
+
+void pdsc_debugfs_del_host_mem(struct pdsc *pdsc)
+{
+	debugfs_lookup_and_remove("host_mem", pdsc->dentry);
+}
-- 
2.43.0



* [PATCH v4 2/6] pds_core: add support for identity version 2
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner
In-Reply-To: <20260614050052.1048328-1-nikhil.rao@amd.com>

From: Brett Creeley <brett.creeley@amd.com>

Add a new capabilities field in struct pds_core_dev_identity,
which requires bumping the identity version to 2, i.e.
PDS_CORE_IDENTITY_VERSION_2. If version 2 negotiation fails,
the driver quietly falls back to version 1. If version 1
negotiation also fails, the driver fails to load.

Another patch in the series will make use of the capabilities
field.
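
The fallback described above can be sketched as follows. The sketch is modelled
on pdsc_identify_ver(), where do_msg is true only for the version 1
attempt, so a failed version 2 probe stays silent. fake_identify() and
dev_max_ver are hypothetical stand-ins for the identify devcmd. The wrapper
that calls pdsc_identify_ver() is not shown in this excerpt, so its exact
shape here is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define IDENTITY_VERSION_1 1
#define IDENTITY_VERSION_2 2

/* Hypothetical device that accepts identify versions up to dev_max_ver */
static int dev_max_ver = IDENTITY_VERSION_1;

static int fake_identify(int ver)
{
	return ver <= dev_max_ver ? 0 : -EINVAL;
}

/* Try one version. Only the version 1 attempt reports an error. */
static int identify_ver(int ver)
{
	bool do_msg = ver == IDENTITY_VERSION_1;
	int err = fake_identify(ver);

	if (err && do_msg)
		fprintf(stderr, "Cannot identify device: %d\n", err);
	return err;
}

/* Returns the negotiated version, or a negative errno if v1 fails. */
static int identify(void)
{
	int err;

	if (!identify_ver(IDENTITY_VERSION_2))
		return IDENTITY_VERSION_2;

	/* v2 failed without a message; fall back to v1 */
	err = identify_ver(IDENTITY_VERSION_1);
	return err ? err : IDENTITY_VERSION_1;
}
```

When version 1 is negotiated, the patch also clears dev_ident.capabilities,
because older firmware does not fill that field.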

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
---
 drivers/net/ethernet/amd/pds_core/dev.c | 39 ++++++++++++++++++++-----
 include/linux/pds/pds_core_if.h         |  4 +++
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c
index dd9989cfe6b3..5c0ca3d0b000 100644
--- a/drivers/net/ethernet/amd/pds_core/dev.c
+++ b/drivers/net/ethernet/amd/pds_core/dev.c
@@ -250,15 +250,17 @@ int pdsc_devcmd_reset(struct pdsc *pdsc)
 	return pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
 }
 
-static int pdsc_devcmd_identify_locked(struct pdsc *pdsc)
+static int pdsc_devcmd_identify_locked(struct pdsc *pdsc, u8 drv_ident_ver,
+				       bool do_msg)
 {
 	union pds_core_dev_comp comp = {};
 	union pds_core_dev_cmd cmd = {
 		.identify.opcode = PDS_CORE_CMD_IDENTIFY,
-		.identify.ver = PDS_CORE_IDENTITY_VERSION_1,
+		.identify.ver = drv_ident_ver,
 	};
 
-	return pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
+	return __pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout,
+				    do_msg);
 }
 
 static void pdsc_init_devinfo(struct pdsc *pdsc)
@@ -281,8 +283,9 @@ static void pdsc_init_devinfo(struct pdsc *pdsc)
 	dev_dbg(pdsc->dev, "fw_version %s\n", pdsc->dev_info.fw_version);
 }
 
-static int pdsc_identify(struct pdsc *pdsc)
+static int pdsc_identify_ver(struct pdsc *pdsc, u8 drv_ident_ver)
 {
+	bool do_msg = drv_ident_ver == PDS_CORE_IDENTITY_VERSION_1;
 	struct pds_core_drv_identity drv = {};
 	size_t sz;
 	int err;
@@ -305,17 +308,24 @@ static int pdsc_identify(struct pdsc *pdsc)
 	sz = min_t(size_t, sizeof(drv), sizeof(pdsc->cmd_regs->data));
 	memcpy_toio(&pdsc->cmd_regs->data, &drv, sz);
 
-	err = pdsc_devcmd_identify_locked(pdsc);
+	err = pdsc_devcmd_identify_locked(pdsc, drv_ident_ver, do_msg);
 	if (!err) {
 		sz = min_t(size_t, sizeof(pdsc->dev_ident),
 			   sizeof(pdsc->cmd_regs->data));
 		memcpy_fromio(&pdsc->dev_ident, &pdsc->cmd_regs->data, sz);
+
+		/* V1 firmware doesn't set capabilities, so the field may
+		 * contain garbage from the outgoing driver identity.
+		 */
+		if (pdsc->dev_ident.version < PDS_CORE_IDENTITY_VERSION_2)
+			pdsc->dev_ident.capabilities = 0;
 	}
 	mutex_unlock(&pdsc->devcmd_lock);
 
 	if (err) {
-		dev_err(pdsc->dev, "Cannot identify device: %pe\n",
-			ERR_PTR(err));
+		if (do_msg)
+			dev_err(pdsc->dev, "Cannot identify device: %pe\n",
+				ERR_PTR(err));
 		return err;
 	}
 
@@ -334,6 +344,21 @@ static int pdsc_identify(struct pdsc *pdsc)
 	return 0;
 }
 
+static int pdsc_identify(struct pdsc *pdsc)
+{
+	int err;
+
+	/* Older firmware rejects anything but PDS_CORE_IDENTITY_VERSION_1
+	 * instead of returning the max supported identity version, so retry if
+	 * firmware doesn't support PDS_CORE_IDENTITY_VERSION_2
+	 */
+	err = pdsc_identify_ver(pdsc, PDS_CORE_IDENTITY_VERSION_2);
+	if (err)
+		err = pdsc_identify_ver(pdsc, PDS_CORE_IDENTITY_VERSION_1);
+
+	return err;
+}
+
 void pdsc_dev_uninit(struct pdsc *pdsc)
 {
 	if (pdsc->intr_info) {
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
index 17a87c1a55d7..619186f26b5b 100644
--- a/include/linux/pds/pds_core_if.h
+++ b/include/linux/pds/pds_core_if.h
@@ -119,6 +119,8 @@ struct pds_core_drv_identity {
  *		      value in usecs to device units using:
  *		      device units = usecs * mult / div
  * @vif_types:        How many of each VIF device type is supported
+ * @capabilities:     Device capabilities
+ *		      only supported on version >= PDS_CORE_IDENTITY_VERSION_2
  */
 struct pds_core_dev_identity {
 	u8     version;
@@ -131,9 +133,11 @@ struct pds_core_dev_identity {
 	__le32 intr_coal_mult;
 	__le32 intr_coal_div;
 	__le16 vif_types[PDS_DEV_TYPE_MAX];
+	__le64 capabilities;
 };
 
 #define PDS_CORE_IDENTITY_VERSION_1	1
+#define PDS_CORE_IDENTITY_VERSION_2	2
 
 /**
  * struct pds_core_dev_identify_cmd - Driver/device identify command
-- 
2.43.0

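The negotiation added by this patch can be modelled in userspace as
below. This is a sketch under stated assumptions, not driver code:
struct fake_fw simulates firmware that accepts identity versions up to
max_ver, error_msgs counts what would have been logged, and the
0xdeadbeef value stands in for stale bytes left in the shared command
data region. It shows the three properties the commit message describes:
version 2 is tried silently, version 1 is the last resort and reports
failure, and capabilities are zeroed when the device answers with
version 1.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define IDENTITY_VERSION_1 1
#define IDENTITY_VERSION_2 2

/* Hypothetical firmware model: highest identity version it accepts. */
struct fake_fw {
	uint8_t max_ver;	/* 0 means identify always fails */
	uint64_t caps;		/* capabilities reported by v2 firmware */
};

struct dev_ident {
	uint8_t version;
	uint64_t capabilities;
};

static unsigned int error_msgs;	/* errors that would have been logged */

static int identify_ver(const struct fake_fw *fw, uint8_t ver,
			struct dev_ident *id)
{
	/* Only the last-resort v1 attempt is allowed to complain. */
	bool do_msg = ver == IDENTITY_VERSION_1;

	if (ver > fw->max_ver) {
		if (do_msg) {
			error_msgs++;
			fprintf(stderr, "Cannot identify device: %d\n", -EINVAL);
		}
		return -EINVAL;
	}

	id->version = ver;
	/* 0xdeadbeef models stale bytes left in the shared data region. */
	id->capabilities = ver >= IDENTITY_VERSION_2 ? fw->caps : 0xdeadbeefULL;

	/* v1 firmware never writes capabilities, so do not trust them. */
	if (id->version < IDENTITY_VERSION_2)
		id->capabilities = 0;
	return 0;
}

static int identify(const struct fake_fw *fw, struct dev_ident *id)
{
	/* Old firmware rejects v2 outright instead of reporting its max. */
	int err = identify_ver(fw, IDENTITY_VERSION_2, id);

	if (err)
		err = identify_ver(fw, IDENTITY_VERSION_1, id);
	return err;
}
```

With max_ver = 1 the call succeeds on the second attempt without
printing anything; with max_ver = 0 exactly one error line is emitted.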


* [PATCH v4 1/6] pds_core: add support for quiet devcmd failures
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC (permalink / raw)
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner
In-Reply-To: <20260614050052.1048328-1-nikhil.rao@amd.com>

From: Brett Creeley <brett.creeley@amd.com>

Until now no caller has needed to control whether devcmd failures
are printed. Some failures, however, are expected rather than generic
errors, for example a "not supported" status from older firmware.
Add support for suppressing these messages. This will be used when
adding support to negotiate PDS_CORE_IDENTITY_VERSION_2.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
---
 drivers/net/ethernet/amd/pds_core/dev.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/amd/pds_core/dev.c b/drivers/net/ethernet/amd/pds_core/dev.c
index bded6b33289c..dd9989cfe6b3 100644
--- a/drivers/net/ethernet/amd/pds_core/dev.c
+++ b/drivers/net/ethernet/amd/pds_core/dev.c
@@ -126,7 +126,8 @@ static const char *pdsc_devcmd_str(int opcode)
 	}
 }
 
-static int pdsc_devcmd_wait(struct pdsc *pdsc, u8 opcode, int max_seconds)
+static int __pdsc_devcmd_wait(struct pdsc *pdsc, u8 opcode, int max_seconds,
+			      const bool do_msg)
 {
 	struct device *dev = pdsc->dev;
 	unsigned long start_time;
@@ -179,7 +180,7 @@ static int pdsc_devcmd_wait(struct pdsc *pdsc, u8 opcode, int max_seconds)
 
 	status = pdsc_devcmd_status(pdsc);
 	err = pdsc_err_to_errno(status);
-	if (err && err != -EAGAIN)
+	if (do_msg && err && err != -EAGAIN)
 		dev_err(dev, "DEVCMD %d %s failed, status=%d err %d %pe\n",
 			opcode, pdsc_devcmd_str(opcode), status, err,
 			ERR_PTR(err));
@@ -187,8 +188,9 @@ static int pdsc_devcmd_wait(struct pdsc *pdsc, u8 opcode, int max_seconds)
 	return err;
 }
 
-int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
-		       union pds_core_dev_comp *comp, int max_seconds)
+static int __pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+				union pds_core_dev_comp *comp, int max_seconds,
+				const bool do_msg)
 {
 	int err;
 
@@ -197,7 +199,7 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 
 	memcpy_toio(&pdsc->cmd_regs->cmd, cmd, sizeof(*cmd));
 	pdsc_devcmd_dbell(pdsc);
-	err = pdsc_devcmd_wait(pdsc, cmd->opcode, max_seconds);
+	err = __pdsc_devcmd_wait(pdsc, cmd->opcode, max_seconds, do_msg);
 
 	if ((err == -ENXIO || err == -ETIMEDOUT) && pdsc->wq)
 		queue_work(pdsc->wq, &pdsc->health_work);
@@ -207,6 +209,12 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 	return err;
 }
 
+int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
+		       union pds_core_dev_comp *comp, int max_seconds)
+{
+	return __pdsc_devcmd_locked(pdsc, cmd, comp, max_seconds, true);
+}
+
 int pdsc_devcmd(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 		union pds_core_dev_comp *comp, int max_seconds)
 {
-- 
2.43.0

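The split introduced here (a double-underscore variant that takes
do_msg, with the old entry point kept as a wrapper that passes true)
leaves every existing caller's behavior unchanged. A small userspace
model of the same idea follows; devcmd_opt(), devcmd() and the two
counters are hypothetical, and the pdsc->wq check is omitted. Note that
in the patch only the log line depends on do_msg: health work is still
queued on -ENXIO or -ETIMEDOUT, and the model keeps that behavior.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned int logged;		/* error lines that would be printed */
static unsigned int health_kicks;	/* models queue_work(health_work) */

/* 'err' is the errno already decoded from the device status. */
static int devcmd_opt(int err, bool do_msg)
{
	/* -EAGAIN is transient and is never reported. */
	if (do_msg && err && err != -EAGAIN) {
		logged++;
		fprintf(stderr, "DEVCMD failed: err %d\n", err);
	}
	/* Quiet mode hides the message only; recovery still runs. */
	if (err == -ENXIO || err == -ETIMEDOUT)
		health_kicks++;
	return err;
}

/* Existing callers keep the old, always-verbose behavior. */
static int devcmd(int err)
{
	return devcmd_opt(err, true);
}
```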


* [PATCH net-next v4 0/6] pds_core: Add PLDM firmware update and host backed memory support
From: Nikhil P. Rao @ 2026-06-14  5:00 UTC (permalink / raw)
  To: netdev
  Cc: Brett Creeley, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Eric Joyner, Nikhil P. Rao

This series adds PLDM-based firmware update support to the pds_core
driver. PLDM (Platform Level Data Model) is a DMTF standard for firmware
management that provides a vendor-neutral interface for firmware updates.

The implementation uses the kernel's pldmfw library for package parsing
and component matching. Users can update entire firmware packages or
individual components via devlink flash. Component information is
displayed via devlink info, showing firmware versions and update status
for each component.

The series also adds host backed memory support, allowing firmware to
request memory pages from the host for its operations.
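
Host memory is allocated with alloc_pages() (per the changelog below),
which hands out 2^order physically contiguous pages; this is why the
debugfs table in this series reports an Order column. The changelog
also notes that PDSC_HOST_MEM_MAX_CONTIG became a fixed 4MB constant,
but the page order needed for a chunk of that size still depends on
PAGE_SIZE. The sketch below is a userspace re-creation of the idea
behind the kernel's get_order() helper, not a copy of it;
order_for_size() is a hypothetical name.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size.
 * Assumes page_size is a non-zero power of two and size > 0. */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}
```

So a 4MB chunk is an order-10 allocation with 4KB pages but only
order 6 with 64KB pages.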

Changes since v3:
- Changed "fw.mainfw" to just "fw" for main firmware (Jakub Kicinski).
  Gold slot main firmware is reported as "fw.gold".
- Removed redundant memset before alloc_pages (Paolo Abeni)
- Changed dev_err to dev_warn for alloc_pages failure (Paolo Abeni)
- Only report dev_info.fw_version for identity version 1 (version 2+
  reports firmware via PLDM component info)
- Fixed checkpatch alignment issue by extracting pdsc_dl_info_get_v1()
  helper function

Changes since v2:
- Use driver-defined component names instead of passing through firmware
  names (Jakub Kicinski). Added component_type enum that firmware populates,
  driver maps to stable names like fw.mainfw, fw.goldfw, fw.bootloader.
  Added documentation of firmware version names to pds_core.rst.
- Fixed bugs identified by sashiko:
  Patch 2 (identity version 2):
  - Fix comment using wrong macro names (IDENTIFY vs IDENTITY)

  Patch 3 (PLDM firmware update):
  - DMA-after-free on EAGAIN/ETIMEDOUT: when a command times out or
    returns busy, firmware may still be accessing the DMA buffer; defer
    freeing until a subsequent command succeeds
  - Use dev_warn_once for incompatible firmware version (ver==0)
  - Clear component cache after flash to show updated versions

  Patch 4 (component info):
  - Fix min_t(u8) truncation of max_fw_slots (u16) to min_t(u16)
  - Fix F_FIXED early return skipping F_RUNNING flag check
  - Don't fail devlink info if component query fails; use dev_warn_once
    and continue to report generic fields (fw, asic.id, serial_number)

  Patch 5 (host backed memory):
  - Switch from adminq to devcmd; fixes both workqueue self-deadlock
    during recovery (adminq completion runs on same wq as health_thread)
    and health_work re-queued after cancel (adminq timeout re-queues work)
  - Remove MEM_DEL from teardown path; fixes both MEM_DEL sent twice
    for same tag and num_host_mem_reqs ambiguous semantics (now only
    tracks pages to free). pci_clear_master guarantees DMA quiescence.
  - Fix PDSC_HOST_MEM_MAX_CONTIG to 4MB constant (was arch-dependent)
  - Not fixed: pdsc_host_mem_add() failure ignored; partial host memory
    is acceptable and firmware handles fewer regions than requested

  Patch 6 (debugfs):
  - Move pdsc_debugfs_del_host_mem() before pdsc_host_mem_free() to
    fix use-after-free race with debugfs readers
  - Use %u for unsigned types and %pad for dma_addr_t
  - Remove "file exists" check (now dead code since teardown removes file)
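
The DMA-after-free fix listed for patch 3 above describes a general
rule: after -EAGAIN or -ETIMEDOUT the device may still be reading the
buffer, so it must not be freed yet. One way to express that rule is
sketched below. All names are hypothetical, and the sketch assumes
commands are serialized, so that a later successful command implies the
device has retired every earlier request; the actual patch may use a
different condition or data structure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical command buffer; 'next' links it while parked. */
struct cmd_buf {
	struct cmd_buf *next;
	unsigned char data[64];
};

struct cmd_ctx {
	struct cmd_buf *parked;	/* buffers the device may still read */
	unsigned int nparked;
};

static void release_parked(struct cmd_ctx *ctx)
{
	while (ctx->parked) {
		struct cmd_buf *b = ctx->parked;

		ctx->parked = b->next;
		free(b);
		ctx->nparked--;
	}
}

/* Call once per command with the buffer it used and its result. */
static int cmd_done(struct cmd_ctx *ctx, struct cmd_buf *buf, int err)
{
	if (err == -EAGAIN || err == -ETIMEDOUT) {
		/* Ownership is unknown: keep the buffer alive for now. */
		buf->next = ctx->parked;
		ctx->parked = buf;
		ctx->nparked++;
		return err;
	}
	/* Any other completion means this buffer is no longer in use. */
	free(buf);
	/* Assumption: success implies all earlier requests were retired. */
	if (!err)
		release_parked(ctx);
	return err;
}
```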

Note: The following fix was submitted separately via net:
- DMA in flight during teardown (call pci_clear_master before freeing
  host memory):
  https://lore.kernel.org/all/20260604213637.3844317-1-nikhil.rao@amd.com/

Changes since v1:
- Removed redefinition of __counted_by kernel primitive (Jakub Kicinski)
- Fixed kdoc warnings in pds_core_if.h
- Fixed checkpatch warnings
- Fixed bugs identified by sashiko:
  Patch 2 (identity version 2):
  - Zero data region before firmware commands
  - Suppress expected error message during identify probe

  Patch 3 (PLDM firmware update):
  - Memory leak in pdsc_send_component_image() error path
  - Memory leak in pdsc_flash_component() error path
  - Missing devcmd_lock in pdsc_devcmd_finalize_update()
  - Fixed dma_mapping_error() return value handling (returns boolean, not error code)
  - Skip logic for components with index > 255

  Patch 4 (component info):
  - Added generic fw version display for all identity versions
  - Handle components with both RUNNING and STARTUP flags

  Patch 5 (host backed memory):
  - Race between pdsc_remove and health thread (use-after-free)
  - Set missing index field in MEM_QUERY command
  - Host memory allocation size and zeroing
  - Don't free host memory on MEM_ADD timeout (firmware may still be using it)

  Patch 6 (debugfs):
  - Fix dentry reference leak in debugfs_lookup (missing dput)

- Improvements:
  - Cache component info to avoid repeated firmware queries (patch 4)

Note: The following fix for an existing bug was submitted separately
via net:
- Timeout error overwritten with stale status:
  https://lore.kernel.org/netdev/20260515212907.998028-1-nikhil.rao@amd.com/

Link to v3: https://lore.kernel.org/netdev/20260608-upstream_v3_clean-v3-0-7b077cd5f334@amd.com/

Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>

Brett Creeley (4):
  pds_core: add support for quiet devcmd failures
  pds_core: add support for identity version 2
  pds_core: add PLDM firmware update support via devlink flash
  pds_core: add PLDM component info display

Vamsi Atluri (2):
  pds_core: add host backed memory support for firmware
  pds_core: add debugfs support for host backed memory

 .../device_drivers/ethernet/amd/pds_core.rst  |  84 ++
 drivers/net/ethernet/amd/Kconfig              |   1 +
 drivers/net/ethernet/amd/pds_core/core.c      | 162 ++++
 drivers/net/ethernet/amd/pds_core/core.h      |  51 +-
 drivers/net/ethernet/amd/pds_core/debugfs.c   |  45 +
 drivers/net/ethernet/amd/pds_core/dev.c       | 136 ++-
 drivers/net/ethernet/amd/pds_core/devlink.c   | 134 ++-
 drivers/net/ethernet/amd/pds_core/fw.c        | 775 +++++++++++++++++-
 drivers/net/ethernet/amd/pds_core/main.c      |  11 +-
 include/linux/pds/pds_core_if.h               | 470 +++++++++++
 10 files changed, 1849 insertions(+), 20 deletions(-)


base-commit: 903db046d5579bef0ea699eae4b279dd6455fc9f
--
2.43.0



* Re: [PATCH] staging: remove obsolete network and 16-bit video drivers
From: Al Viro @ 2026-06-14  4:28 UTC (permalink / raw)
  To: Gabriel Ramos; +Cc: andrew, netdev, linux-kernel, linux-staging
In-Reply-To: <20260614021629.656478-1-maguraa53@gmail.com>

On Sat, Jun 13, 2026 at 11:16:29PM -0300, Gabriel Ramos wrote:
> From: Gabriel Ramos Barbosa Mota <maguraa53@gmail.com>
> 
> These drivers are currently in the staging tree and have been identified as unmaintained and obsolete. They lack modern hardware support, do not integrate with the current device model, and represent technical debt. This patch removes these drivers to clean up the staging area and reduce the overall maintenance burden, as there is no evidence of active users or ongoing development for these specific components. Signed-off-by: Gabriel Ramos <maguraa53@gmail.com>
> ---
>  teste_disco.img | Bin 0 -> 20971520 bytes
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 teste_disco.img
> 
> diff --git a/teste_disco.img b/teste_disco.img
> new file mode 100644
> index 0000000000000000000000000000000000000000..13145eb3ecf765e330220f06cf8ad7553c3d9ff2
> GIT binary patch
[snip]

What this patch actually does is drop a 20MB binary into the top-level
directory, apparently containing an empty (and dirty) ext4 image
(according to debugfs, that is).

WTF?


* Re: [RFC PATCH net-next 0/7] net: airoha: add EN7581 SOE ESP packet offload
From: Jihong Min @ 2026-06-14  4:18 UTC (permalink / raw)
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>



On 6/14/26 13:00, Jihong Min wrote:
> Add Secure Offload Engine (SOE) support for the Airoha EN7581 Ethernet
> driver. SOE provides inline ESP packet offload for native ESP and NAT-T
> traffic, with the Ethernet/QDMA path used to submit packets to the SOE
> block and the PPE path used to bind eligible ESP flows. NETIF_F_GSO_ESP
> and NETIF_F_HW_ESP_TX_CSUM are intentionally left out for now and will be
> revisited separately for feasibility.
> 
> This is posted as RFC because the code was originally developed and tested
> against an OpenWrt 6.18 Airoha tree, not against the current upstream
> net-next driver. The original OpenWrt commit used as the source for this
> RFC is available at:
> https://github.com/hurryman2212/OpenW1700k-test/commit/7c1b5e662f7790b3d23ed143beadc1dcbf6d15f7
> 
> The SOE part is intentionally linked into the airoha Ethernet module
> instead of being exposed as an independent crypto or platform driver. The
> user-visible ESP offload control is a netdev capability: xfrmdev_ops and
> NETIF_F_HW_ESP live on the target netdev, and the feature can be controlled
> through the usual netdev feature path. SOE also shares the FE/QDMA/PPE
> datapath, private queues, DSA conduit handling and netdev lifetime owned by
> airoha_eth.
> 
> Patch 1 adds xdo_dev_packet_xmit() because the existing XFRM packet
> offload transmit path does not provide a hook for hardware whose ESP engine
> is reached through device-specific packet forwarding. SOE needs to consume
> the skb, add a hardware hop descriptor, steer it to a private QDMA path and
> return the final transmit status. Drivers that do not implement the
> optional callback keep the existing XFRM output behavior.
> 
> Jihong Min (7):
>   xfrm: allow packet offload drivers to own transmit
>   dt-bindings: net: airoha: add EN7581 SOE
>   arm64: dts: airoha: add EN7581 SOE node
>   net: airoha: add SOE registers and driver state
>   net: airoha: add QDMA support for SOE packets
>   net: airoha: add PPE support for SOE flows
>   net: airoha: add SOE XFRM packet offload support
> 
>  .../bindings/net/airoha,en7581-soe.yaml       |   48 +
>  MAINTAINERS                                   |    1 +
>  arch/arm64/boot/dts/airoha/en7581.dtsi        |    6 +
>  drivers/net/ethernet/airoha/Kconfig           |   13 +
>  drivers/net/ethernet/airoha/Makefile          |    1 +
>  drivers/net/ethernet/airoha/airoha_eth.c      |  668 +++++-
>  drivers/net/ethernet/airoha/airoha_eth.h      |   40 +
>  drivers/net/ethernet/airoha/airoha_ppe.c      |  606 +++++-
>  drivers/net/ethernet/airoha/airoha_regs.h     |   16 +
>  drivers/net/ethernet/airoha/airoha_soe.c      | 1896 +++++++++++++++++
>  drivers/net/ethernet/airoha/airoha_soe.h      |  126 ++
>  include/linux/netdevice.h                     |    8 +
>  include/linux/soc/airoha/airoha_offload.h     |    5 +
>  net/xfrm/xfrm_output.c                        |   11 +
>  14 files changed, 3342 insertions(+), 103 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml
>  create mode 100644 drivers/net/ethernet/airoha/airoha_soe.c
>  create mode 100644 drivers/net/ethernet/airoha/airoha_soe.h
> 

I noticed, after posting this RFC, that I forgot to include the
following trailer while preparing the latest patch series:

Assisted-by: Codex:gpt-5.5

These patches were written and tested with AI assistance, although I've
reviewed the resulting code and test results. I'll include the trailer
properly in future revisions or submissions. Sorry.


Sincerely,
Jihong Min

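Patch 1 of this RFC, as described in the quoted cover letter, adds an
optional xdo_dev_packet_xmit() hook: a driver that implements it
consumes the skb and returns the final transmit status, while drivers
without it keep the existing XFRM output path. The dispatch rule can be
sketched as follows. The struct and function names and the
int-returning signature are assumptions for illustration; the real
callback prototype is not shown in this message.

```c
#include <stddef.h>

/* Hypothetical packet and ops types for illustration only. */
struct pkt {
	int consumed_by_hw;
	int sent_by_stack;
};

struct offload_ops {
	/* Optional: if set, the driver owns the packet from here on and
	 * returns the final transmit status. */
	int (*packet_xmit)(struct pkt *p);
};

/* Stand-in for the existing XFRM output path. */
static int stack_output(struct pkt *p)
{
	p->sent_by_stack = 1;
	return 0;
}

static int output_dispatch(const struct offload_ops *ops, struct pkt *p)
{
	if (ops && ops->packet_xmit)
		return ops->packet_xmit(p);	/* driver consumes the packet */
	return stack_output(p);			/* behavior unchanged */
}

/* Example driver hook, e.g. steering to a private hardware queue. */
static int hw_packet_xmit(struct pkt *p)
{
	p->consumed_by_hw = 1;
	return 0;
}
```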

* [RFC PATCH net-next 7/7] net: airoha: add SOE XFRM packet offload support
From: Jihong Min @ 2026-06-14  4:00 UTC (permalink / raw)
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

Add the EN7581 Secure Offload Engine provider. The provider programs ESP
SAs, exposes NETIF_F_HW_ESP through xfrmdev_ops, submits encrypt and
decrypt packets through the QDMA SOE path, and handles SOE completion
delivery.

Mirror the XFRM ops to DSA user devices whose CPU conduit is an Airoha
netdev so packet offload remains available through switch ports.

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
 drivers/net/ethernet/airoha/Kconfig      |   13 +
 drivers/net/ethernet/airoha/Makefile     |    1 +
 drivers/net/ethernet/airoha/airoha_soe.c | 1896 ++++++++++++++++++++++
 3 files changed, 1910 insertions(+)
 create mode 100644 drivers/net/ethernet/airoha/airoha_soe.c
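
The airoha_soe.c diff that follows builds SA_CTRL and SA_CMD words with
FIELD_PREP()/GENMASK() and programs each 64-bit byte or packet
threshold as two 32-bit register writes via lower_32_bits() and
upper_32_bits(). The userspace sketch below re-creates those helpers
(32-bit masks only, relying on the GCC/Clang builtin __builtin_ctz) to
show the resulting register values. The SK_ macro names are
hypothetical; the field positions are copied from the patch's
#defines.

```c
#include <stdint.h>

/* 32-bit userspace stand-ins for BIT(), GENMASK() and FIELD_PREP(). */
#define SK_BIT(n)		(1u << (n))
#define SK_GENMASK(h, l)	((~0u >> (31 - (h))) & (~0u << (l)))
#define SK_FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Field positions copied from the patch's AIROHA_SOE_* definitions. */
#define SA_CTRL_WR		SK_BIT(0)
#define SA_CTRL_IDX		SK_GENMASK(15, 8)
#define SA_CMD_ENC		SK_BIT(0)
#define SA_CMD_CIPHER		SK_GENMASK(3, 1)
#define SA_CMD_AES_KEY_LEN	SK_GENMASK(8, 7)
#define CIPHER_AES_GCM		2
#define AES_KEY_256		2

/* Value written to SA_CTRL to commit the staged SA into slot 'index'. */
static uint32_t sa_ctrl_word(unsigned int index)
{
	return SK_FIELD_PREP(SA_CTRL_IDX, index) | SA_CTRL_WR;
}

/* SA_CMD bits for an AES-256-GCM encrypt SA (VLD and mode bits omitted). */
static uint32_t sa_cmd_gcm256_encrypt(void)
{
	return SA_CMD_ENC |
	       SK_FIELD_PREP(SA_CMD_CIPHER, CIPHER_AES_GCM) |
	       SK_FIELD_PREP(SA_CMD_AES_KEY_LEN, AES_KEY_256);
}

/* A 64-bit threshold goes out as a low word and a high word. */
static void split_u64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)v;
	*hi = (uint32_t)(v >> 32);
}
```

For example, committing slot 5 writes 0x501 to SA_CTRL, and an
AES-256-GCM encrypt SA carries SA_CMD bits 0x105 before
airoha_soe_program_sa_locked() ORs in the VLD bit.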

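airoha_soe_get_ref() and airoha_soe_put_ref() in the diff that follows
combine a dead flag checked under state_lock with
refcount_inc_not_zero(), so no new reference can be taken once
teardown starts, and the last put signals the released completion. The
sketch below models that pattern with C11 atomics. The atomic_flag
spinlock stands in for spin_lock_irqsave(), the released flag stands in
for complete(), and engine_kill() is a hypothetical teardown step that
marks the object dead and drops the owner's initial reference.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct engine {
	atomic_flag lock;	/* stands in for state_lock */
	atomic_uint refcnt;	/* reaching 0 means fully released */
	bool dead;		/* only accessed with lock held */
	atomic_bool released;	/* stands in for complete(&released) */
};

static void sk_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void sk_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Userspace analogue of refcount_inc_not_zero(). */
static bool inc_not_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	while (old != 0) {
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

static void engine_init(struct engine *e)
{
	atomic_flag_clear(&e->lock);
	atomic_init(&e->refcnt, 1);	/* the owner's reference */
	e->dead = false;
	atomic_init(&e->released, false);
}

static struct engine *engine_get(struct engine *e)
{
	bool alive;

	if (!e)
		return NULL;
	sk_lock(&e->lock);
	alive = !e->dead && inc_not_zero(&e->refcnt);
	sk_unlock(&e->lock);
	return alive ? e : NULL;
}

static void engine_put(struct engine *e)
{
	/* fetch_sub returns the old value: 1 means this was the last ref. */
	if (e && atomic_fetch_sub(&e->refcnt, 1) == 1)
		atomic_store(&e->released, true);
}

/* Hypothetical teardown: block new users, then drop the owner's ref. */
static void engine_kill(struct engine *e)
{
	sk_lock(&e->lock);
	e->dead = true;
	sk_unlock(&e->lock);
	engine_put(e);
}
```

In this model, setting dead under the same lock that engine_get() holds
makes the check-and-increment atomic with respect to teardown: a getter
either takes its reference before dead is set or sees dead and backs
off.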
diff --git a/drivers/net/ethernet/airoha/Kconfig b/drivers/net/ethernet/airoha/Kconfig
index ad3ce501e7a5..a20e9dd0bfde 100644
--- a/drivers/net/ethernet/airoha/Kconfig
+++ b/drivers/net/ethernet/airoha/Kconfig
@@ -31,4 +31,17 @@ config NET_AIROHA_FLOW_STATS
 	help
 	  Enable Aiorha flowtable statistic counters.
 
+config NET_AIROHA_SOE
+	bool "Airoha SOE ESP offload support"
+	depends on NET_AIROHA
+	depends on INET
+	select XFRM
+	select XFRM_OFFLOAD
+	help
+	  Enable support for the Airoha Secure Offload Engine used by
+	  the Ethernet driver for ESP packet offload. This option only
+	  adds the provider and netdev plumbing; ESP offload is still
+	  advertised at runtime only when the SOE block and required
+	  packet offload path are available.
+
 endif #NET_VENDOR_AIROHA
diff --git a/drivers/net/ethernet/airoha/Makefile b/drivers/net/ethernet/airoha/Makefile
index 94468053e34b..b68b8f614b0e 100644
--- a/drivers/net/ethernet/airoha/Makefile
+++ b/drivers/net/ethernet/airoha/Makefile
@@ -6,4 +6,5 @@
 obj-$(CONFIG_NET_AIROHA) += airoha-eth.o
 airoha-eth-y := airoha_eth.o airoha_ppe.o
 airoha-eth-$(CONFIG_DEBUG_FS) += airoha_ppe_debugfs.o
+airoha-eth-$(CONFIG_NET_AIROHA_SOE) += airoha_soe.o
 obj-$(CONFIG_NET_AIROHA_NPU) += airoha_npu.o
diff --git a/drivers/net/ethernet/airoha/airoha_soe.c b/drivers/net/ethernet/airoha/airoha_soe.c
new file mode 100644
index 000000000000..3a240ed44d7f
--- /dev/null
+++ b/drivers/net/ethernet/airoha/airoha_soe.c
@@ -0,0 +1,1896 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Airoha Secure Offload Engine (SOE) provider for the Ethernet driver.
+ *
+ * This file owns the EN7581 SOE packet-offload glue used by airoha_eth:
+ * xfrm state programming, hop-descriptor TX metadata, SOE RX completion
+ * decoding, and DSA proxy netdev binding. The SOE block is reached through
+ * the FE/QDMA packet fabric and is initialized by the Ethernet driver rather
+ * than by a separate platform driver.
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitfield.h>
+#include <linux/completion.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/iopoll.h>
+#include <linux/io.h>
+#include <linux/ipv6.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/moduleparam.h>
+#include <linux/netdevice.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/udp.h>
+#include <linux/unaligned.h>
+
+#include <net/dst.h>
+#include <net/esp.h>
+#include <net/gso.h>
+#include <net/ip.h>
+#include <net/net_namespace.h>
+#include <net/xfrm.h>
+
+#include "airoha_eth.h"
+#include "airoha_regs.h"
+#include "airoha_soe.h"
+
+#define AIROHA_SOE_NUM_SA 32
+#define AIROHA_SOE_QDMA_HOP_DESC_LEN 32
+#define AIROHA_SOE_KEY_WORDS 8
+#define AIROHA_SOE_ADDR_WORDS 4
+#define AIROHA_SOE_SA_TIMEOUT_US 1000
+#define AIROHA_SOE_SA_FREE_TIMEOUT HZ
+#define AIROHA_SOE_HOP_DESC0_ENCRYPT 0xffffff81ULL
+#define AIROHA_SOE_HOP_DESC0_DECRYPT 0xffffff82ULL
+#define AIROHA_SOE_HOP_DESC1 0x1ff00000000ULL
+#define AIROHA_SOE_QDMA_TX_RING 2
+#define AIROHA_SOE_TXMSG2_DEFAULT 0xff00ffff
+
+/* This is the packet/IPsec SOE window at 0x1fbfa000. EN7581 E2 exposes
+ * this register block for packet processing, not standalone crypto offload.
+ */
+#define AIROHA_SOE_GLB_CFG 0x000
+#define AIROHA_SOE_GLB_CFG_ENC_EN BIT(0)
+#define AIROHA_SOE_GLB_CFG_DEC_EN BIT(1)
+#define AIROHA_SOE_CONT_ICV_CTRL 0x004
+#define AIROHA_SOE_INT_EN 0x020
+#define AIROHA_SOE_INT_STS 0x024
+#define AIROHA_SOE_INT_ALL GENMASK(15, 0)
+#define AIROHA_SOE_CNT_CLR 0x04c
+#define AIROHA_SOE_CNT_CLR_ALL BIT(0)
+#define AIROHA_SOE_SA_CTRL 0x100
+#define AIROHA_SOE_SA_DONE 0x104
+#define AIROHA_SOE_SA_CMD 0x110
+#define AIROHA_SOE_BCNT_THSHD_32_SOFT 0x114
+#define AIROHA_SOE_BCNT_THSHD_64_SOFT 0x118
+#define AIROHA_SOE_SA_SPI 0x11c
+#define AIROHA_SOE_SA_UDP_PORT 0x120
+#define AIROHA_SOE_SA_ENC_KEY(n) (0x124 + (n) * 4)
+#define AIROHA_SOE_SA_HMAC_KEY(n) (0x144 + (n) * 4)
+#define AIROHA_SOE_SA_SRC_ADDR(n) (0x164 + (n) * 4)
+#define AIROHA_SOE_SA_DST_ADDR(n) (0x174 + (n) * 4)
+#define AIROHA_SOE_ICV_OK_LO_CNT 0x184
+#define AIROHA_SOE_ICV_OK_HI_CNT 0x188
+#define AIROHA_SOE_ICV_FAIL_LO_CNT 0x18c
+#define AIROHA_SOE_ICV_FAIL_HI_CNT 0x190
+#define AIROHA_SOE_CON_ICV_FAIL_CNT 0x194
+#define AIROHA_SOE_SEQ_NUM_LO 0x198
+#define AIROHA_SOE_SEQ_NUM_HI 0x19c
+#define AIROHA_SOE_BCNT_LO 0x1a0
+#define AIROHA_SOE_BCNT_HI 0x1a4
+#define AIROHA_SOE_FLOW_LAB_DSCP 0x1a8
+#define AIROHA_SOE_BCNT_80 0x1ac
+#define AIROHA_SOE_BCNT_THSHD_80 0x1b0
+#define AIROHA_SOE_BCNT_THSHD_32_HARD 0x1b4
+#define AIROHA_SOE_BCNT_THSHD_64_HARD 0x1b8
+#define AIROHA_SOE_SEQ_THSHD_32_SOFT 0x1bc
+#define AIROHA_SOE_SEQ_THSHD_64_SOFT 0x1c0
+#define AIROHA_SOE_SEQ_THSHD_32_HARD 0x1c4
+#define AIROHA_SOE_SEQ_THSHD_64_HARD 0x1c8
+#define AIROHA_SOE_SA_CTRL_WR BIT(0)
+#define AIROHA_SOE_SA_CTRL_IDX GENMASK(15, 8)
+#define AIROHA_SOE_SA_DONE_W1C BIT(0)
+
+#define AIROHA_SOE_SA_CMD_ENC BIT(0)
+#define AIROHA_SOE_SA_CMD_CIPHER GENMASK(3, 1)
+#define AIROHA_SOE_SA_CMD_HASH GENMASK(6, 4)
+#define AIROHA_SOE_SA_CMD_AES_KEY_LEN GENMASK(8, 7)
+#define AIROHA_SOE_SA_CMD_ESN_EN BIT(9)
+#define AIROHA_SOE_SA_CMD_OUT_IPV6 BIT(10)
+#define AIROHA_SOE_SA_CMD_ESP_MODE BIT(11) /* 0=tunnel, 1=transport */
+#define AIROHA_SOE_SA_CMD_NAT_EN BIT(12)
+#define AIROHA_SOE_SA_CMD_ANTI_RPLY_EN BIT(13)
+#define AIROHA_SOE_SA_CMD_ANTI_RPLY_WDW GENMASK(15, 14)
+#define AIROHA_SOE_SA_CMD_SN_ERR_DROP BIT(16)
+#define AIROHA_SOE_SA_CMD_PAD_ERR_DROP BIT(17)
+#define AIROHA_SOE_SA_CMD_ICV_ERR_DROP BIT(18)
+#define AIROHA_SOE_SA_CMD_GCM_ICV_LEN GENMASK(25, 24)
+#define AIROHA_SOE_SA_CMD_DEC_UDP_PARSER_EN BIT(29)
+#define AIROHA_SOE_SA_CMD_VLD BIT(31)
+
+#define AIROHA_SOE_CIPHER_AES_CBC 1
+#define AIROHA_SOE_CIPHER_AES_GCM 2
+#define AIROHA_SOE_HASH_HMAC_SHA1_96 1
+#define AIROHA_SOE_HASH_HMAC_SHA256_128 2
+#define AIROHA_SOE_AES_KEY_128 0
+#define AIROHA_SOE_AES_KEY_192 1
+#define AIROHA_SOE_AES_KEY_256 2
+#define AIROHA_SOE_QDMA_QUEUE_ENCRYPT 8
+#define AIROHA_SOE_QDMA_QUEUE_DECRYPT 9
+#define AIROHA_SOE_NATT_PORT 4500
+#define AIROHA_SOE_HOP_FLAG_ENCRYPTED 3
+#define AIROHA_SOE_HOP_FLAG_DECRYPTED 4
+#define AIROHA_SOE_HOP_FLAG_ERROR_BASE 5
+#define AIROHA_SOE_HOP_INFO_ENCRYPT 2
+#define AIROHA_SOE_HOP_INFO_DECRYPT 3
+
+static unsigned int airoha_soe_rx_trace_packets;
+module_param_named(soe_rx_trace_packets, airoha_soe_rx_trace_packets, uint,
+		   0600);
+MODULE_PARM_DESC(soe_rx_trace_packets,
+		 "Number of SOE RX completion IPv4 headers to log");
+
+enum airoha_soe_ctx_dir {
+	AIROHA_SOE_CTX_OUT,
+	AIROHA_SOE_CTX_IN,
+};
+
+struct airoha_soe_ctx {
+	struct list_head list;
+	enum airoha_soe_ctx_dir dir;
+	union {
+		struct dst_entry *dst;
+		struct {
+			struct xfrm_state *x;
+			struct airoha_gdm_dev *gdm_dev;
+			struct net_device *dev;
+			__be32 saddr;
+			__be16 sport;
+			u16 foe_hash;
+			u32 foe_reason;
+			u8 sa_index;
+			bool foe_valid;
+			u32 mark;
+		} rx;
+	};
+};
+
+struct airoha_soe_sa {
+	struct airoha_soe *soe;
+	unsigned int index;
+	u32 cmd;
+	u32 spi;
+
+	spinlock_t lock; /* Protects in-flight context queues and dead. */
+	struct list_head tx_queue;
+	struct list_head rx_queue;
+	struct completion idle;
+	unsigned int inflight;
+	bool dead;
+};
+
+struct airoha_soe_xfrm_state {
+	struct airoha_gdm_dev *dev;
+	struct airoha_soe *soe;
+	struct airoha_soe_sa *sa;
+	bool counted;
+};
+
+struct airoha_soe_sa_cfg {
+	u32 cmd;
+	u32 spi;
+	u32 udp_port;
+	u32 enc_key[AIROHA_SOE_KEY_WORDS];
+	u32 hmac_key[AIROHA_SOE_KEY_WORDS];
+	u32 src_addr[AIROHA_SOE_ADDR_WORDS];
+	u32 dst_addr[AIROHA_SOE_ADDR_WORDS];
+	u64 soft_byte_limit;
+	u64 hard_byte_limit;
+	u64 soft_packet_limit;
+	u64 hard_packet_limit;
+};
+
+struct airoha_soe_rx_info {
+	int packet_len;
+	bool encap;
+	__be16 sport;
+	__be16 dport;
+	__be32 spi;
+};
+
+struct airoha_soe {
+	struct device *dev;
+	void __iomem *base;
+
+	/* Serialize SA table programming and software slot ownership. */
+	struct mutex sa_lock;
+	unsigned long sa_map;
+	struct airoha_soe_sa __rcu *sa[AIROHA_SOE_NUM_SA];
+	atomic_t pending_rx;
+
+	spinlock_t state_lock; /* Protects dead against concurrent users. */
+	refcount_t refcnt;
+	struct completion released;
+	bool dead;
+};
+
+static const struct xfrmdev_ops airoha_soe_xfrmdev_ops;
+static const struct xfrmdev_ops airoha_soe_dsa_xfrmdev_ops;
+
+static struct airoha_soe *airoha_soe_get_ref(struct airoha_soe *soe)
+{
+	unsigned long flags;
+	bool alive;
+
+	if (!soe)
+		return NULL;
+
+	spin_lock_irqsave(&soe->state_lock, flags);
+	alive = !soe->dead && refcount_inc_not_zero(&soe->refcnt);
+	spin_unlock_irqrestore(&soe->state_lock, flags);
+
+	return alive ? soe : NULL;
+}
+
+static void airoha_soe_put_ref(struct airoha_soe *soe)
+{
+	if (soe && refcount_dec_and_test(&soe->refcnt))
+		complete(&soe->released);
+}
+
+bool airoha_soe_available(struct airoha_soe *soe)
+{
+	unsigned long flags;
+	bool available;
+
+	if (!soe)
+		return false;
+
+	spin_lock_irqsave(&soe->state_lock, flags);
+	available = !soe->dead;
+	spin_unlock_irqrestore(&soe->state_lock, flags);
+
+	return available;
+}
+
+u32 airoha_soe_features(struct airoha_soe *soe)
+{
+	return airoha_soe_available(soe) ? AIROHA_SOE_FEATURE_ESP : 0;
+}
+
+static u64 airoha_soe_limit(u64 limit)
+{
+	return limit == XFRM_INF ? U64_MAX : limit;
+}
+
+static int airoha_soe_wait_sa_done(struct airoha_soe *soe)
+{
+	u32 done;
+	int err;
+
+	err = readl_poll_timeout(soe->base + AIROHA_SOE_SA_DONE, done,
+				 done & AIROHA_SOE_SA_DONE_W1C, 1,
+				 AIROHA_SOE_SA_TIMEOUT_US);
+	writel(0, soe->base + AIROHA_SOE_SA_CTRL);
+	writel(AIROHA_SOE_SA_DONE_W1C, soe->base + AIROHA_SOE_SA_DONE);
+
+	return err;
+}
+
+static int airoha_soe_commit_sa(struct airoha_soe *soe, unsigned int index)
+{
+	u32 ctrl;
+
+	/* SA registers are a single staging window committed by index. */
+	writel(AIROHA_SOE_SA_DONE_W1C, soe->base + AIROHA_SOE_SA_DONE);
+	ctrl = FIELD_PREP(AIROHA_SOE_SA_CTRL_IDX, index) |
+	       AIROHA_SOE_SA_CTRL_WR;
+	writel(ctrl, soe->base + AIROHA_SOE_SA_CTRL);
+
+	return airoha_soe_wait_sa_done(soe);
+}
+
+static void airoha_soe_write_key(void __iomem *base, u32 reg, const u32 *key)
+{
+	unsigned int i;
+
+	for (i = 0; i < AIROHA_SOE_KEY_WORDS; i++)
+		writel(key[i], base + reg + i * sizeof(u32));
+}
+
+static void airoha_soe_write_addr(void __iomem *base, u32 reg, const u32 *addr)
+{
+	unsigned int i;
+
+	for (i = 0; i < AIROHA_SOE_ADDR_WORDS; i++)
+		writel(addr[i], base + reg + i * sizeof(u32));
+}
+
+static int airoha_soe_program_sa_locked(struct airoha_soe *soe,
+					unsigned int index,
+					const struct airoha_soe_sa_cfg *cfg)
+{
+	void __iomem *base = soe->base;
+
+	writel(cfg->cmd | AIROHA_SOE_SA_CMD_VLD, base + AIROHA_SOE_SA_CMD);
+	writel(lower_32_bits(cfg->soft_byte_limit),
+	       base + AIROHA_SOE_BCNT_THSHD_32_SOFT);
+	writel(upper_32_bits(cfg->soft_byte_limit),
+	       base + AIROHA_SOE_BCNT_THSHD_64_SOFT);
+	writel(cfg->spi, base + AIROHA_SOE_SA_SPI);
+	writel(cfg->udp_port, base + AIROHA_SOE_SA_UDP_PORT);
+	airoha_soe_write_key(base, AIROHA_SOE_SA_ENC_KEY(0), cfg->enc_key);
+	airoha_soe_write_key(base, AIROHA_SOE_SA_HMAC_KEY(0), cfg->hmac_key);
+	airoha_soe_write_addr(base, AIROHA_SOE_SA_SRC_ADDR(0), cfg->src_addr);
+	airoha_soe_write_addr(base, AIROHA_SOE_SA_DST_ADDR(0), cfg->dst_addr);
+
+	writel(0, base + AIROHA_SOE_ICV_OK_LO_CNT);
+	writel(0, base + AIROHA_SOE_ICV_OK_HI_CNT);
+	writel(0, base + AIROHA_SOE_ICV_FAIL_LO_CNT);
+	writel(0, base + AIROHA_SOE_ICV_FAIL_HI_CNT);
+	writel(0, base + AIROHA_SOE_CON_ICV_FAIL_CNT);
+	writel(0, base + AIROHA_SOE_SEQ_NUM_LO);
+	writel(0, base + AIROHA_SOE_SEQ_NUM_HI);
+	writel(0, base + AIROHA_SOE_BCNT_LO);
+	writel(0, base + AIROHA_SOE_BCNT_HI);
+	writel(0, base + AIROHA_SOE_FLOW_LAB_DSCP);
+	writel(0, base + AIROHA_SOE_BCNT_80);
+	writel(0xffffffff, base + AIROHA_SOE_BCNT_THSHD_80);
+	writel(lower_32_bits(cfg->hard_byte_limit),
+	       base + AIROHA_SOE_BCNT_THSHD_32_HARD);
+	writel(upper_32_bits(cfg->hard_byte_limit),
+	       base + AIROHA_SOE_BCNT_THSHD_64_HARD);
+	writel(lower_32_bits(cfg->soft_packet_limit),
+	       base + AIROHA_SOE_SEQ_THSHD_32_SOFT);
+	writel(upper_32_bits(cfg->soft_packet_limit),
+	       base + AIROHA_SOE_SEQ_THSHD_64_SOFT);
+	writel(lower_32_bits(cfg->hard_packet_limit),
+	       base + AIROHA_SOE_SEQ_THSHD_32_HARD);
+	writel(upper_32_bits(cfg->hard_packet_limit),
+	       base + AIROHA_SOE_SEQ_THSHD_64_HARD);
+
+	return airoha_soe_commit_sa(soe, index);
+}
+
+static int airoha_soe_clear_sa_locked(struct airoha_soe *soe,
+				      unsigned int index)
+{
+	struct airoha_soe_sa_cfg cfg = {};
+
+	return airoha_soe_program_sa_locked(soe, index, &cfg);
+}
+
+static void airoha_soe_copy_words(u32 *dst, const u8 *src, unsigned int bits)
+{
+	unsigned int words = bits / (BITS_PER_BYTE * sizeof(u32));
+	unsigned int i;
+
+	for (i = 0; i < words && i < AIROHA_SOE_KEY_WORDS; i++)
+		dst[i] = get_unaligned_be32(src + i * sizeof(u32));
+}
+
+static int airoha_soe_aes_key_len(unsigned int bits,
+				  struct netlink_ext_ack *extack, u32 *val)
+{
+	switch (bits) {
+	case 128:
+		*val = AIROHA_SOE_AES_KEY_128;
+		return 0;
+	case 192:
+		*val = AIROHA_SOE_AES_KEY_192;
+		return 0;
+	case 256:
+		*val = AIROHA_SOE_AES_KEY_256;
+		return 0;
+	default:
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports AES-128/192/256 keys only");
+		return -EOPNOTSUPP;
+	}
+}
+
+static int airoha_soe_build_algo(struct xfrm_state *x,
+				 struct airoha_soe_sa_cfg *cfg,
+				 struct netlink_ext_ack *extack)
+{
+	u32 key_len;
+	u32 field;
+	int err;
+
+	if (x->aead) {
+		if (strcmp(x->aead->alg_name, "rfc4106(gcm(aes))")) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "SOE supports rfc4106(gcm(aes)) AEAD only");
+			return -EOPNOTSUPP;
+		}
+
+		if (x->aead->alg_key_len < 32) {
+			NL_SET_ERR_MSG_MOD(extack, "invalid AEAD key length");
+			return -EINVAL;
+		}
+
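+		/* rfc4106(gcm(aes)) key material ends with a 32-bit salt, so
+		 * AES-128 arrives as alg_key_len == 160 (128-bit key + salt).
+		 */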
+		key_len = x->aead->alg_key_len - 32;
+		err = airoha_soe_aes_key_len(key_len, extack, &field);
+		if (err)
+			return err;
+
+		cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_CIPHER,
+				       AIROHA_SOE_CIPHER_AES_GCM);
+		cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_AES_KEY_LEN, field);
+		switch (x->aead->alg_icv_len) {
+		case 64:
+			field = 0;
+			break;
+		case 96:
+			field = 1;
+			break;
+		case 128:
+			field = 2;
+			break;
+		default:
+			NL_SET_ERR_MSG_MOD(extack,
+					   "SOE supports 64/96/128-bit GCM ICV only");
+			return -EOPNOTSUPP;
+		}
+		cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_GCM_ICV_LEN, field);
+		airoha_soe_copy_words(cfg->enc_key, x->aead->alg_key, key_len);
+		cfg->hmac_key[0] =
+			get_unaligned_be32(x->aead->alg_key + key_len / 8);
+		return 0;
+	}
+
+	if (!x->ealg || strcmp(x->ealg->alg_name, "cbc(aes)")) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports cbc(aes) encryption only");
+		return -EOPNOTSUPP;
+	}
+
+	err = airoha_soe_aes_key_len(x->ealg->alg_key_len, extack, &field);
+	if (err)
+		return err;
+
+	cfg->cmd |=
+		FIELD_PREP(AIROHA_SOE_SA_CMD_CIPHER, AIROHA_SOE_CIPHER_AES_CBC);
+	cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_AES_KEY_LEN, field);
+	airoha_soe_copy_words(cfg->enc_key, x->ealg->alg_key,
+			      x->ealg->alg_key_len);
+
+	if (!x->aalg) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE CBC mode requires HMAC authentication");
+		return -EOPNOTSUPP;
+	}
+
+	if (!strcmp(x->aalg->alg_name, "hmac(sha1)")) {
+		if (x->aalg->alg_key_len != 160 ||
+		    x->aalg->alg_trunc_len != 96) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "SOE supports HMAC-SHA1-96 only");
+			return -EOPNOTSUPP;
+		}
+		field = AIROHA_SOE_HASH_HMAC_SHA1_96;
+	} else if (!strcmp(x->aalg->alg_name, "hmac(sha256)")) {
+		if (x->aalg->alg_key_len != 256 ||
+		    x->aalg->alg_trunc_len != 128) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "SOE supports HMAC-SHA256-128 only");
+			return -EOPNOTSUPP;
+		}
+		field = AIROHA_SOE_HASH_HMAC_SHA256_128;
+	} else {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports HMAC-SHA1/SHA256 only");
+		return -EOPNOTSUPP;
+	}
+
+	cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_HASH, field);
+	airoha_soe_copy_words(cfg->hmac_key, x->aalg->alg_key,
+			      x->aalg->alg_key_len);
+
+	return 0;
+}
+
+static int airoha_soe_build_replay(struct xfrm_state *x,
+				   struct airoha_soe_sa_cfg *cfg,
+				   struct netlink_ext_ack *extack)
+{
+	u32 window;
+
+	if ((x->props.flags & XFRM_STATE_ESN) ||
+	    x->repl_mode == XFRM_REPLAY_MODE_ESN) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE ESN is not supported yet");
+		return -EOPNOTSUPP;
+	}
+
+	window = x->replay_esn ? x->replay_esn->replay_window :
+				 x->props.replay_window;
+	if (!window)
+		return 0;
+
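+	/* (window - 1) / 64, clamped to 3: windows of 1..64 map to 0,
+	 * 65..128 to 1, 129..192 to 2 and anything larger to 3.
+	 */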
+	cfg->cmd |= AIROHA_SOE_SA_CMD_ANTI_RPLY_EN;
+	cfg->cmd |= FIELD_PREP(AIROHA_SOE_SA_CMD_ANTI_RPLY_WDW,
+			       min_t(u32, (window - 1) / 64, 3));
+
+	return 0;
+}
+
+static int airoha_soe_build_sa(struct xfrm_state *x,
+			       struct airoha_soe_sa_cfg *cfg,
+			       struct netlink_ext_ack *extack)
+{
+	int err;
+
+	if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports XFRM packet offload only");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->xso.dir != XFRM_DEV_OFFLOAD_OUT &&
+	    x->xso.dir != XFRM_DEV_OFFLOAD_IN) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE supports in/out SAs only");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->id.proto != IPPROTO_ESP) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE supports ESP only");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->props.family != AF_INET || x->outer_mode.family != AF_INET) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports IPv4 outer tunnels only");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->props.mode != XFRM_MODE_TUNNEL) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE supports tunnel mode only");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->encap && x->encap->encap_type != UDP_ENCAP_ESPINUDP) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports native ESP or UDP_ENCAP_ESPINUDP");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->tfcpad) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE does not support TFC padding");
+		return -EOPNOTSUPP;
+	}
+
+	cfg->cmd = AIROHA_SOE_SA_CMD_SN_ERR_DROP |
+		   AIROHA_SOE_SA_CMD_PAD_ERR_DROP |
+		   AIROHA_SOE_SA_CMD_ICV_ERR_DROP;
+	if (x->xso.dir == XFRM_DEV_OFFLOAD_OUT) {
+		cfg->cmd |= AIROHA_SOE_SA_CMD_ENC;
+		if (x->encap)
+			cfg->cmd |= AIROHA_SOE_SA_CMD_NAT_EN;
+		cfg->src_addr[0] = be32_to_cpu(x->props.saddr.a4);
+		cfg->dst_addr[0] = be32_to_cpu(x->id.daddr.a4);
+	} else if (x->encap) {
+		/* RX submit passes the full UDP/4500 packet to SOE. Ask the
+		 * decrypt parser to consume the UDP header before ESP decap.
+		 */
+		cfg->cmd |= AIROHA_SOE_SA_CMD_DEC_UDP_PARSER_EN;
+	}
+
+	err = airoha_soe_build_algo(x, cfg, extack);
+	if (err)
+		return err;
+
+	err = airoha_soe_build_replay(x, cfg, extack);
+	if (err)
+		return err;
+
+	cfg->spi = be32_to_cpu(x->id.spi);
+	if (x->encap) {
+		/* The NAT-T port word stores dport above sport. */
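+		/* e.g. dport 4500 with sport 4500 yields 0x11941194. */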
+		cfg->udp_port = (u32)ntohs(x->encap->encap_dport) << 16 |
+				ntohs(x->encap->encap_sport);
+	}
+	cfg->soft_byte_limit = airoha_soe_limit(x->lft.soft_byte_limit);
+	cfg->hard_byte_limit = airoha_soe_limit(x->lft.hard_byte_limit);
+	cfg->soft_packet_limit = airoha_soe_limit(x->lft.soft_packet_limit);
+	cfg->hard_packet_limit = airoha_soe_limit(x->lft.hard_packet_limit);
+
+	return 0;
+}
+
+static int airoha_soe_alloc_sa(struct airoha_soe *soe, struct xfrm_state *x,
+			       struct netlink_ext_ack *extack,
+			       struct airoha_soe_sa **sa)
+{
+	struct airoha_soe_sa_cfg cfg = {};
+	struct airoha_soe_sa *new_sa;
+	unsigned int i;
+	int err;
+
+	if (!soe || !sa || !airoha_soe_available(soe)) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE provider is unavailable");
+		return -ENODEV;
+	}
+
+	err = airoha_soe_build_sa(x, &cfg, extack);
+	if (err)
+		return err;
+
+	new_sa = kzalloc_obj(*new_sa, GFP_KERNEL);
+	if (!new_sa)
+		return -ENOMEM;
+
+	mutex_lock(&soe->sa_lock);
+	for (i = 0; i < AIROHA_SOE_NUM_SA; i++) {
+		if (!(soe->sa_map & BIT(i)))
+			break;
+	}
+	if (i == AIROHA_SOE_NUM_SA) {
+		mutex_unlock(&soe->sa_lock);
+		kfree(new_sa);
+		return -ENOSPC;
+	}
+
+	err = airoha_soe_program_sa_locked(soe, i, &cfg);
+	if (err) {
+		mutex_unlock(&soe->sa_lock);
+		kfree(new_sa);
+		return err;
+	}
+
+	new_sa->soe = soe;
+	new_sa->index = i;
+	new_sa->cmd = cfg.cmd;
+	new_sa->spi = cfg.spi;
+	spin_lock_init(&new_sa->lock);
+	INIT_LIST_HEAD(&new_sa->tx_queue);
+	INIT_LIST_HEAD(&new_sa->rx_queue);
+	init_completion(&new_sa->idle);
+	rcu_assign_pointer(soe->sa[i], new_sa);
+	soe->sa_map |= BIT(i);
+	mutex_unlock(&soe->sa_lock);
+
+	*sa = new_sa;
+	return 0;
+}
+
+static void airoha_soe_mark_sa_dead(struct airoha_soe_sa *sa)
+{
+	if (!sa)
+		return;
+
+	spin_lock_bh(&sa->lock);
+	sa->dead = true;
+	if (!sa->inflight)
+		complete(&sa->idle);
+	spin_unlock_bh(&sa->lock);
+}
+
+static void airoha_soe_free_ctx(struct airoha_soe_ctx *ctx)
+{
+	if (!ctx)
+		return;
+
+	if (ctx->dir == AIROHA_SOE_CTX_OUT)
+		dst_release(ctx->dst);
+	else if (ctx->rx.x)
+		xfrm_state_put(ctx->rx.x);
+	kfree(ctx);
+}
+
+static void airoha_soe_purge_ctx_list(struct list_head *head)
+{
+	struct airoha_soe_ctx *ctx, *tmp;
+
+	list_for_each_entry_safe(ctx, tmp, head, list) {
+		list_del(&ctx->list);
+		airoha_soe_free_ctx(ctx);
+	}
+}
+
+static void airoha_soe_forget_rx_ctx_list(struct airoha_soe_sa *sa)
+{
+	if (!list_empty(&sa->rx_queue))
+		atomic_sub((int)list_count_nodes(&sa->rx_queue),
+			   &sa->soe->pending_rx);
+}
+
+static void airoha_soe_abort_sa(struct airoha_soe_sa *sa)
+{
+	LIST_HEAD(rx_queue);
+	LIST_HEAD(tx_queue);
+
+	if (!sa)
+		return;
+
+	spin_lock_bh(&sa->lock);
+	sa->dead = true;
+	airoha_soe_forget_rx_ctx_list(sa);
+	list_splice_init(&sa->tx_queue, &tx_queue);
+	list_splice_init(&sa->rx_queue, &rx_queue);
+	sa->inflight = 0;
+	complete(&sa->idle);
+	spin_unlock_bh(&sa->lock);
+
+	airoha_soe_purge_ctx_list(&tx_queue);
+	airoha_soe_purge_ctx_list(&rx_queue);
+}
+
+static void airoha_soe_free_sa(struct airoha_soe_sa *sa)
+{
+	LIST_HEAD(rx_queue);
+	LIST_HEAD(tx_queue);
+	struct airoha_soe *soe;
+
+	if (!sa)
+		return;
+
+	soe = sa->soe;
+	airoha_soe_mark_sa_dead(sa);
+	if (!wait_for_completion_timeout(&sa->idle, AIROHA_SOE_SA_FREE_TIMEOUT))
+		dev_warn(soe->dev,
+			 "timed out waiting for SOE SA%u in-flight packets\n",
+			 sa->index);
+
+	mutex_lock(&soe->sa_lock);
+	if (sa->index < AIROHA_SOE_NUM_SA &&
+	    rcu_access_pointer(soe->sa[sa->index]) == sa) {
+		airoha_soe_clear_sa_locked(soe, sa->index);
+		RCU_INIT_POINTER(soe->sa[sa->index], NULL);
+		soe->sa_map &= ~BIT(sa->index);
+	}
+	mutex_unlock(&soe->sa_lock);
+	synchronize_rcu();
+
+	spin_lock_bh(&sa->lock);
+	airoha_soe_forget_rx_ctx_list(sa);
+	list_splice_init(&sa->tx_queue, &tx_queue);
+	list_splice_init(&sa->rx_queue, &rx_queue);
+	spin_unlock_bh(&sa->lock);
+	airoha_soe_purge_ctx_list(&tx_queue);
+	airoha_soe_purge_ctx_list(&rx_queue);
+
+	kfree(sa);
+}
+
+static struct airoha_soe_ctx *airoha_soe_pop_ctx(struct airoha_soe_sa *sa,
+						 enum airoha_soe_ctx_dir dir)
+{
+	struct list_head *head;
+	struct airoha_soe_ctx *ctx = NULL;
+
+	head = dir == AIROHA_SOE_CTX_OUT ? &sa->tx_queue : &sa->rx_queue;
+
+	spin_lock_bh(&sa->lock);
+	if (!list_empty(head)) {
+		ctx = list_first_entry(head, struct airoha_soe_ctx, list);
+		list_del(&ctx->list);
+		if (dir == AIROHA_SOE_CTX_IN)
+			atomic_dec(&sa->soe->pending_rx);
+	}
+
+	if (ctx && !WARN_ON_ONCE(!sa->inflight)) {
+		sa->inflight--;
+		if (sa->dead && !sa->inflight)
+			complete(&sa->idle);
+	}
+	spin_unlock_bh(&sa->lock);
+
+	return ctx;
+}
+
+static int airoha_soe_prepare_ip_headers(struct sk_buff *skb)
+{
+	unsigned int hdr_len;
+
+	if (!pskb_may_pull(skb, 1))
+		return -EINVAL;
+
+	switch (skb->data[0] & 0xf0) {
+	case 0x40:
+		hdr_len = sizeof(struct iphdr);
+		skb->protocol = htons(ETH_P_IP);
+		break;
+	case 0x60:
+		hdr_len = sizeof(struct ipv6hdr);
+		skb->protocol = htons(ETH_P_IPV6);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (!pskb_may_pull(skb, hdr_len))
+		return -EINVAL;
+
+	skb_reset_network_header(skb);
+	skb_set_transport_header(skb, hdr_len);
+
+	return 0;
+}
+
+static void airoha_soe_trace_rx_complete(struct sk_buff *skb,
+					 const struct airoha_soe_ctx *ctx,
+					 const struct xfrm_state *x)
+{
+	unsigned int trace = READ_ONCE(airoha_soe_rx_trace_packets);
+	const struct iphdr *iph;
+
+	if (!trace || skb->protocol != htons(ETH_P_IP))
+		return;
+
+	iph = ip_hdr(skb);
+	pr_info("airoha_eth: SOE RX complete dev=%s saddr=%pI4 daddr=%pI4 proto=%u len=%u mark=0x%x spi=0x%08x natt=%u foe=%u hash=0x%04x sa=%u\n",
+		ctx->rx.dev->name, &iph->saddr, &iph->daddr, iph->protocol,
+		ntohs(iph->tot_len), skb->mark, ntohl(x->id.spi),
+		x->encap ? 1 : 0, ctx->rx.foe_valid, ctx->rx.foe_hash,
+		ctx->rx.sa_index);
+	WRITE_ONCE(airoha_soe_rx_trace_packets, trace - 1);
+}
+
+static int airoha_soe_push_l2_header(struct sk_buff *skb)
+{
+	static const u8 ipv4_l2_header[ETH_HLEN] = {
+		0x00, 0x0c, 0xe7, 0x20, 0x21, 0x12, 0x00,
+		0x0c, 0xe7, 0x20, 0x22, 0x62, 0x08, 0x00,
+	};
+	static const u8 ipv6_l2_header[ETH_HLEN] = {
+		0x00, 0x0c, 0xe7, 0x20, 0x21, 0x12, 0x00,
+		0x0c, 0xe7, 0x20, 0x22, 0x62, 0x86, 0xdd,
+	};
+	const u8 *l2_header;
+	int err;
+
+	err = airoha_soe_prepare_ip_headers(skb);
+	if (err)
+		return err;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		l2_header = ipv4_l2_header;
+	else
+		l2_header = ipv6_l2_header;
+
+	/* TDMA/SOE port 7 expects an Ethernet-looking frame before the SOE hop. */
+	memcpy(skb_push(skb, ETH_HLEN), l2_header, ETH_HLEN);
+
+	return 0;
+}
+
+static void airoha_soe_push_hop_desc(struct sk_buff *skb, unsigned int sa_index,
+				     bool encrypt, int foe_idx)
+{
+	u32 hop_direction = encrypt ? AIROHA_SOE_HOP_INFO_ENCRYPT :
+				      AIROHA_SOE_HOP_INFO_DECRYPT;
+	u64 desc3 = ((u64)(u16)((hop_direction << 4) | 0x80) << 48) |
+		    ((u64)(sa_index & 0x3f) << 40) | 0x05dc0000ULL;
+	u64 desc2 = 0;
+	__le64 desc[4] = {};
+
+	if (foe_idx >= 0)
+		desc2 = (u64)(foe_idx & 0xffff) << 32;
+
+	desc[0] = cpu_to_le64(encrypt ? AIROHA_SOE_HOP_DESC0_ENCRYPT :
+					AIROHA_SOE_HOP_DESC0_DECRYPT);
+	desc[1] = cpu_to_le64(AIROHA_SOE_HOP_DESC1);
+	desc[2] = cpu_to_le64(desc2);
+	desc[3] = cpu_to_le64(desc3);
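+	/* Byte 28 is bits 39:32 of the little-endian desc[3] word. */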
+	((u8 *)desc)[28] = sa_index;
+
+	/* The FE/QDMA hop descriptor is consumed by PSE port 7 before SOE. */
+	memcpy(skb_push(skb, AIROHA_SOE_QDMA_HOP_DESC_LEN), desc, sizeof(desc));
+}
+
+static int airoha_soe_submit_skb(struct airoha_soe_sa *sa,
+				 struct airoha_gdm_dev *dev,
+				 struct sk_buff *skb,
+				 struct airoha_soe_ctx *ctx)
+{
+	struct net_device *netdev = netdev_from_priv(dev);
+	u32 queue = ctx->dir == AIROHA_SOE_CTX_OUT ?
+			    AIROHA_SOE_QDMA_QUEUE_ENCRYPT :
+			    AIROHA_SOE_QDMA_QUEUE_DECRYPT;
+	bool encrypt = ctx->dir == AIROHA_SOE_CTX_OUT;
+	unsigned int headroom = AIROHA_SOE_QDMA_HOP_DESC_LEN + ETH_HLEN;
+	struct list_head *head;
+	u32 msg0, msg1;
+	int foe_idx = -1;
+	int err;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		err = skb_checksum_help(skb);
+		if (err)
+			return err;
+	}
+
+	err = skb_cow_head(skb, headroom);
+	if (err)
+		return err;
+
+	err = airoha_soe_push_l2_header(skb);
+	if (err)
+		return err;
+
+	msg0 = FIELD_PREP(QDMA_ETH_TXMSG_SOE_SA_MASK, sa->index & 0x3f);
+	msg1 = FIELD_PREP(QDMA_ETH_TXMSG_METER_MASK, 0x7f) |
+	       FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, 7) |
+	       FIELD_PREP(QDMA_ETH_TXMSG_NBOQ_MASK, queue) |
+	       QDMA_ETH_TXMSG_HOP_MASK |
+	       FIELD_PREP(QDMA_ETH_TXMSG_ACNT_G1_MASK, 0x1f) |
+	       FIELD_PREP(QDMA_ETH_TXMSG_ACNT_G0_MASK, 0x3f);
+
+	if (ctx->dir == AIROHA_SOE_CTX_IN && ctx->rx.foe_valid &&
+	    ctx->rx.foe_hash != AIROHA_RXD4_FOE_ENTRY)
+		foe_idx = ctx->rx.foe_hash;
+
+	airoha_soe_push_hop_desc(skb, sa->index, encrypt, foe_idx);
+
+	skb->dev = netdev;
+	skb_set_queue_mapping(skb, AIROHA_SOE_QDMA_TX_RING);
+
+	if (!dev->soe_xmit_skb)
+		return -ENODEV;
+
+	head = ctx->dir == AIROHA_SOE_CTX_OUT ? &sa->tx_queue : &sa->rx_queue;
+	spin_lock_bh(&sa->lock);
+	if (sa->dead) {
+		spin_unlock_bh(&sa->lock);
+		return -ENOENT;
+	}
+
+	/* Completion descriptors carry only SA/hop flags, so keep skb context here. */
+	list_add_tail(&ctx->list, head);
+	sa->inflight++;
+	if (ctx->dir == AIROHA_SOE_CTX_IN)
+		atomic_inc(&sa->soe->pending_rx);
+	reinit_completion(&sa->idle);
+
+	err = dev->soe_xmit_skb(dev, skb, msg0, msg1,
+				AIROHA_SOE_TXMSG2_DEFAULT);
+	if (err) {
+		list_del(&ctx->list);
+		if (ctx->dir == AIROHA_SOE_CTX_IN)
+			atomic_dec(&sa->soe->pending_rx);
+		sa->inflight--;
+		if (sa->dead && !sa->inflight)
+			complete(&sa->idle);
+	}
+	spin_unlock_bh(&sa->lock);
+
+	return err;
+}
+
+int airoha_soe_xmit(struct airoha_soe_sa *sa, struct airoha_gdm_dev *dev,
+		    struct sk_buff *skb, struct xfrm_state *x)
+{
+	struct airoha_soe_ctx *ctx;
+	struct dst_entry *path;
+	struct dst_entry *dst;
+	int err;
+
+	if (!sa || !dev || !skb || !x || x->xso.dir != XFRM_DEV_OFFLOAD_OUT)
+		return -EINVAL;
+
+	if (skb_is_gso(skb))
+		return -EOPNOTSUPP;
+
+	dst = skb_dst(skb);
+	if (!dst)
+		return -EHOSTUNREACH;
+
+	path = xfrm_dst_path(dst);
+	if (!path)
+		return -EHOSTUNREACH;
+
+	ctx = kzalloc_obj(*ctx, GFP_ATOMIC);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->dir = AIROHA_SOE_CTX_OUT;
+	dst_hold(path);
+	ctx->dst = path;
+
+	err = airoha_soe_submit_skb(sa, dev, skb, ctx);
+	if (err) {
+		airoha_soe_free_ctx(ctx);
+		return err;
+	}
+
+	return 0;
+}
+
+static bool airoha_soe_rx_parse_ipv4(struct sk_buff *skb,
+				     struct airoha_soe_rx_info *info)
+{
+	struct ip_esp_hdr *esph;
+	struct udphdr *uh;
+	struct iphdr *iph;
+	int iphlen;
+	int udp_len;
+	int packet_len;
+
+	if (skb->protocol != htons(ETH_P_IP)) {
+		if (!pskb_may_pull(skb, 1) || (skb->data[0] >> 4) != 4)
+			return false;
+
+		skb->protocol = htons(ETH_P_IP);
+	}
+
+	if (!pskb_may_pull(skb, sizeof(*iph)))
+		return false;
+
+	iph = ip_hdr(skb);
+	if (iph->version != 4 || ip_is_fragment(iph))
+		return false;
+
+	iphlen = iph->ihl * 4;
+	packet_len = ntohs(iph->tot_len);
+	if (iphlen < sizeof(*iph) || packet_len > skb->len)
+		return false;
+
+	if (iph->protocol == IPPROTO_ESP) {
+		if (packet_len <= iphlen + sizeof(*esph) ||
+		    !pskb_may_pull(skb, iphlen + sizeof(*esph)))
+			return false;
+
+		esph = (struct ip_esp_hdr *)(skb->data + iphlen);
+		if (!esph->spi)
+			return false;
+
+		info->packet_len = packet_len;
+		info->encap = false;
+		info->sport = 0;
+		info->dport = 0;
+		info->spi = esph->spi;
+
+		return true;
+	}
+
+	if (iph->protocol != IPPROTO_UDP ||
+	    !pskb_may_pull(skb, iphlen + sizeof(*uh) + sizeof(*esph)))
+		return false;
+
+	uh = (struct udphdr *)(skb->data + iphlen);
+	udp_len = ntohs(uh->len);
+	if (uh->dest != htons(AIROHA_SOE_NATT_PORT) ||
+	    udp_len <= sizeof(*uh) + sizeof(*esph) ||
+	    iphlen + udp_len != packet_len || packet_len > skb->len)
+		return false;
+
+	esph = (struct ip_esp_hdr *)(skb->data + iphlen + sizeof(*uh));
+	if (!esph->spi)
+		return false;
+
+	info->packet_len = packet_len;
+	info->encap = true;
+	info->sport = uh->source;
+	info->dport = uh->dest;
+	info->spi = esph->spi;
+
+	return true;
+}
+
+/* Plain ESP/NAT-T first arrives as normal RX, then is bounced to SOE decrypt. */
+bool airoha_soe_rx_plain_skb(struct airoha_gdm_dev *dev, struct sk_buff *skb,
+			     struct net_device *rx_dev, u16 foe_hash,
+			     u32 foe_reason, bool foe_valid)
+{
+	struct airoha_soe_xfrm_state *state;
+	struct airoha_soe_rx_info info = {};
+	struct airoha_soe_ctx *ctx;
+	xfrm_address_t daddr = {};
+	struct xfrm_state *x;
+	int err;
+
+	if (!dev || !skb || !rx_dev)
+		return false;
+
+	if (!dev->eth->soe || !(rx_dev->features & NETIF_F_HW_ESP))
+		return false;
+
+	if (!atomic_read(&dev->soe_xfrm_state_count))
+		return false;
+
+	/* The packet is still in the driver RX path after eth_type_trans(). */
+	skb_reset_network_header(skb);
+	if (!airoha_soe_rx_parse_ipv4(skb, &info))
+		return false;
+
+	if (skb->len != info.packet_len && pskb_trim(skb, info.packet_len))
+		return false;
+
+	daddr.a4 = ip_hdr(skb)->daddr;
+	x = xfrm_input_state_lookup(dev_net(rx_dev), skb->mark, &daddr,
+				    info.spi, IPPROTO_ESP, AF_INET);
+	if (!x)
+		return false;
+
+	if (x->xso.dir != XFRM_DEV_OFFLOAD_IN)
+		goto put_state;
+	if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+		goto put_state;
+	if (x->xso.dev != rx_dev)
+		goto put_state;
+	if ((info.encap &&
+	     (!x->encap || x->encap->encap_type != UDP_ENCAP_ESPINUDP)) ||
+	    (!info.encap && x->encap))
+		goto put_state;
+
+	if (info.encap && info.dport != x->encap->encap_dport)
+		goto put_state;
+
+	state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+	if (!state || state->dev != dev || !state->sa)
+		goto put_state;
+
+	ctx = kzalloc_obj(*ctx, GFP_ATOMIC);
+	if (!ctx)
+		goto put_state;
+
+	ctx->dir = AIROHA_SOE_CTX_IN;
+	ctx->rx.x = x;
+	ctx->rx.gdm_dev = dev;
+	ctx->rx.dev = rx_dev;
+	ctx->rx.saddr = ip_hdr(skb)->saddr;
+	ctx->rx.sport = info.sport;
+	ctx->rx.foe_hash = foe_hash;
+	ctx->rx.foe_reason = foe_reason;
+	ctx->rx.sa_index = state->sa->index;
+	ctx->rx.foe_valid = foe_valid;
+	ctx->rx.mark = skb->mark;
+
+	err = airoha_soe_submit_skb(state->sa, dev, skb, ctx);
+	if (err) {
+		airoha_soe_free_ctx(ctx);
+		goto drop_state;
+	}
+
+	return true;
+
+drop_state:
+	kfree_skb(skb);
+	return true;
+put_state:
+	xfrm_state_put(x);
+	return false;
+}
+
+static bool airoha_soe_complete_out(struct sk_buff *skb,
+				    struct airoha_soe_ctx *ctx)
+{
+	struct dst_entry *dst = ctx->dst;
+	struct net *net;
+	int err;
+
+	ctx->dst = NULL;
+	if (!pskb_may_pull(skb, ETH_HLEN + 1))
+		goto drop;
+	skb_pull(skb, ETH_HLEN);
+
+	err = airoha_soe_prepare_ip_headers(skb);
+	if (err)
+		goto drop;
+
+	/* Re-enter dst_output() with the original dst after hardware ESP encode. */
+	skb->protocol = htons(ETH_P_IP);
+	skb_dst_drop(skb);
+	skb_dst_set(skb, dst);
+	skb->ignore_df = 1;
+	net = dev_net(dst->dev);
+	kfree(ctx);
+	dst_output(net, NULL, skb);
+
+	return true;
+
+drop:
+	dst_release(dst);
+	kfree(ctx);
+	kfree_skb(skb);
+	return true;
+}
+
+static bool airoha_soe_complete_in(struct sk_buff *skb,
+				   struct airoha_soe_ctx *ctx)
+{
+	struct xfrm_state *x = ctx->rx.x;
+	struct net_device *rx_dev = ctx->rx.dev;
+	struct xfrm_offload *xo;
+	struct sec_path *sp;
+	int err;
+
+	if (!pskb_may_pull(skb, ETH_HLEN + 1))
+		goto drop;
+	skb_pull(skb, ETH_HLEN);
+
+	err = airoha_soe_prepare_ip_headers(skb);
+	if (err)
+		goto drop;
+
+	skb->dev = rx_dev;
+	skb->mark = ctx->rx.mark;
+	skb->ip_summed = CHECKSUM_NONE;
+	skb_reset_mac_header(skb);
+	skb_reset_mac_len(skb);
+	skb->pkt_type = PACKET_HOST;
+	skb->encapsulation = 0;
+	skb_dst_drop(skb);
+
+	if (x->encap && x->encap->encap_type == UDP_ENCAP_ESPINUDP &&
+	    (ctx->rx.saddr != x->props.saddr.a4 ||
+	     ctx->rx.sport != x->encap->encap_sport)) {
+		xfrm_address_t ipaddr = {
+			.a4 = ctx->rx.saddr,
+		};
+
+		km_new_mapping(x, &ipaddr, ctx->rx.sport);
+	}
+
+	/* Attach the SA to a secpath marked CRYPTO_DONE so the xfrm stack
+	 * treats the packet as already decrypted by hardware.
+	 */
+	sp = secpath_set(skb);
+	if (!sp)
+		goto drop;
+
+	if (sp->len == XFRM_MAX_DEPTH) {
+		secpath_reset(skb);
+		goto drop;
+	}
+
+	sp->xvec[sp->len++] = x;
+	sp->olen++;
+	ctx->rx.x = NULL;
+	xo = xfrm_offload(skb);
+	if (!xo) {
+		secpath_reset(skb);
+		goto drop;
+	}
+
+	xo->flags = CRYPTO_DONE;
+	xo->status = CRYPTO_SUCCESS;
+
+	airoha_soe_trace_rx_complete(skb, ctx, x);
+
+	/* SOE decrypt completion reaches the CPU before the routed plaintext
+	 * packet has selected its final egress port. Preserve the original FOE
+	 * hash and SA hop until the Ethernet xmit path can bind that decrypt
+	 * entry with the completed L2/PSE descriptor.
+	 */
+	if (ctx->rx.foe_valid)
+		airoha_ppe_soe_mark_skb(&ctx->rx.gdm_dev->eth->ppe->dev, skb,
+					ctx->rx.foe_hash, ctx->rx.sa_index,
+					AIROHA_SOE_HOP_INFO_DECRYPT);
+
+	kfree(ctx);
+	netif_rx(skb);
+
+	return true;
+
+drop:
+	airoha_soe_free_ctx(ctx);
+	kfree_skb(skb);
+	return true;
+}
+
+bool airoha_soe_rx_skb(struct airoha_soe *soe, struct sk_buff *skb,
+		       unsigned int sa_index, u32 hop_flags)
+{
+	struct airoha_soe_ctx *ctx;
+	struct airoha_soe_sa *sa;
+
+	if (!soe || !skb || sa_index >= AIROHA_SOE_NUM_SA)
+		return false;
+
+	rcu_read_lock();
+	sa = rcu_dereference(soe->sa[sa_index]);
+	if (!sa) {
+		rcu_read_unlock();
+		return false;
+	}
+
+	if (hop_flags >= AIROHA_SOE_HOP_FLAG_ERROR_BASE) {
+		ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_OUT);
+		if (!ctx)
+			ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_IN);
+		rcu_read_unlock();
+		airoha_soe_free_ctx(ctx);
+		kfree_skb(skb);
+		return true;
+	}
+
+	if (hop_flags == AIROHA_SOE_HOP_FLAG_ENCRYPTED) {
+		ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_OUT);
+		rcu_read_unlock();
+		if (!ctx) {
+			kfree_skb(skb);
+			return true;
+		}
+		return airoha_soe_complete_out(skb, ctx);
+	}
+
+	if (hop_flags == AIROHA_SOE_HOP_FLAG_DECRYPTED) {
+		ctx = airoha_soe_pop_ctx(sa, AIROHA_SOE_CTX_IN);
+		rcu_read_unlock();
+		if (!ctx) {
+			kfree_skb(skb);
+			return true;
+		}
+		return airoha_soe_complete_in(skb, ctx);
+	}
+
+	rcu_read_unlock();
+	return false;
+}
+
+bool airoha_soe_has_pending_rx(struct airoha_soe *soe)
+{
+	if (!soe)
+		return false;
+
+	return !!atomic_read(&soe->pending_rx);
+}
+
+int airoha_soe_xfrm_ppe_info(const struct dst_entry *dst, u8 *sa_index, u8 *hop)
+{
+	struct airoha_soe_xfrm_state *state;
+	struct net_device *netdev;
+	struct xfrm_state *x;
+
+	if (!dst || !sa_index || !hop)
+		return -EINVAL;
+
+	x = dst_xfrm(dst);
+	if (!x || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+		return -EOPNOTSUPP;
+
+	state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+	if (!state || !state->sa)
+		return -ENODEV;
+
+	if (!state->dev)
+		return -ENODEV;
+
+	netdev = netdev_from_priv(state->dev);
+	if (netdev != x->xso.dev || !(netdev->features & NETIF_F_HW_ESP))
+		return -ENODEV;
+
+	switch (x->xso.dir) {
+	case XFRM_DEV_OFFLOAD_OUT:
+		*hop = AIROHA_SOE_HOP_INFO_ENCRYPT;
+		break;
+	case XFRM_DEV_OFFLOAD_IN:
+		*hop = AIROHA_SOE_HOP_INFO_DECRYPT;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	*sa_index = state->sa->index;
+
+	return 0;
+}
+
+static int airoha_soe_xfrm_state_add(struct net_device *dev,
+				     struct xfrm_state *x,
+				     struct netlink_ext_ack *extack)
+{
+	struct airoha_soe_xfrm_state *state;
+	struct airoha_gdm_dev *gdm_dev;
+	struct airoha_soe *soe;
+	gfp_t gfp;
+	int err;
+
+	if (dev->xfrmdev_ops != &airoha_soe_xfrmdev_ops ||
+	    !(dev->features & NETIF_F_HW_ESP))
+		return -EOPNOTSUPP;
+
+	gdm_dev = netdev_priv(dev);
+	soe = airoha_soe_get_ref(gdm_dev->eth->soe);
+	if (!soe)
+		return -ENODEV;
+
+	gfp = (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ) ? GFP_ATOMIC :
+							   GFP_KERNEL;
+	state = kzalloc_obj(*state, gfp);
+	if (!state) {
+		airoha_soe_put_ref(soe);
+		return -ENOMEM;
+	}
+
+	state->dev = gdm_dev;
+	state->soe = soe;
+
+	if (x->xso.flags & XFRM_DEV_OFFLOAD_FLAG_ACQ)
+		goto out;
+
+	err = airoha_soe_alloc_sa(soe, x, extack, &state->sa);
+	if (err)
+		goto err_free;
+
+	atomic_inc(&gdm_dev->soe_xfrm_state_count);
+	state->counted = true;
+out:
+	x->xso.offload_handle = (unsigned long)state;
+	return 0;
+
+err_free:
+	kfree(state);
+	airoha_soe_put_ref(soe);
+	return err;
+}
+
+static void airoha_soe_xfrm_state_delete(struct net_device *dev,
+					 struct xfrm_state *x)
+{
+	struct airoha_soe_xfrm_state *state;
+
+	state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+	if (state && state->sa) {
+		airoha_ppe_soe_flush_sa(state->dev->eth->ppe, state->sa->index);
+		airoha_soe_abort_sa(state->sa);
+	}
+}
+
+static void airoha_soe_xfrm_state_free(struct net_device *dev,
+				       struct xfrm_state *x)
+{
+	struct airoha_soe_xfrm_state *state;
+
+	state = (struct airoha_soe_xfrm_state *)xchg(&x->xso.offload_handle, 0);
+	if (!state)
+		return;
+
+	if (state->sa) {
+		airoha_ppe_soe_flush_sa(state->dev->eth->ppe,
+					state->sa->index);
+		airoha_soe_free_sa(state->sa);
+	}
+	if (state->counted)
+		atomic_dec(&state->dev->soe_xfrm_state_count);
+	airoha_soe_put_ref(state->soe);
+	kfree(state);
+}
+
+static bool airoha_soe_xfrm_offload_ok(struct sk_buff *skb,
+				       struct xfrm_state *x)
+{
+	struct airoha_soe_xfrm_state *state;
+	struct net_device *dev = x->xso.dev;
+
+	if (!dev || !(dev->features & NETIF_F_HW_ESP))
+		return false;
+
+	if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET ||
+	    x->xso.dir != XFRM_DEV_OFFLOAD_OUT)
+		return false;
+
+	state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+
+	return state && state->sa;
+}
+
+static int airoha_soe_xfrm_packet_xmit_gso(struct sk_buff *skb,
+					   struct xfrm_state *x,
+					   struct airoha_soe_xfrm_state *state)
+{
+	struct sk_buff *segs, *nskb;
+	int err;
+
+	segs = skb_gso_segment(skb, 0);
+	if (IS_ERR(segs)) {
+		XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTERROR);
+		kfree_skb(skb);
+		return PTR_ERR(segs);
+	}
+
+	consume_skb(skb);
+
+	skb_list_walk_safe(segs, skb, nskb) {
+		skb_mark_not_on_list(skb);
+		err = airoha_soe_xmit(state->sa, state->dev, skb, x);
+		if (err) {
+			XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTERROR);
+			kfree_skb(skb);
+			kfree_skb_list(nskb);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static int airoha_soe_xfrm_packet_xmit(struct sk_buff *skb,
+				       struct xfrm_state *x)
+{
+	struct airoha_soe_xfrm_state *state;
+	struct net_device *netdev;
+	int err = -EHOSTUNREACH;
+
+	state = (struct airoha_soe_xfrm_state *)x->xso.offload_handle;
+	if (!state || !state->sa || !state->dev)
+		goto drop;
+
+	netdev = netdev_from_priv(state->dev);
+	if (netdev->xfrmdev_ops != &airoha_soe_xfrmdev_ops ||
+	    !(netdev->features & NETIF_F_HW_ESP))
+		goto drop;
+
+	if (skb_is_gso(skb))
+		return airoha_soe_xfrm_packet_xmit_gso(skb, x, state);
+
+	err = airoha_soe_xmit(state->sa, state->dev, skb, x);
+	if (err)
+		goto drop;
+
+	return 0;
+
+drop:
+	XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTERROR);
+	kfree_skb(skb);
+	return err;
+}
+
+static int airoha_soe_xfrm_policy_add(struct xfrm_policy *x,
+				      struct netlink_ext_ack *extack)
+{
+	if (x->xdo.type != XFRM_DEV_OFFLOAD_PACKET) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE supports XFRM packet policies only");
+		return -EOPNOTSUPP;
+	}
+
+	if (xfrm_policy_id2dir(x->index) >= XFRM_POLICY_MAX) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE does not offload socket policies");
+		return -EOPNOTSUPP;
+	}
+
+	if (x->xfrm_nr != 1 ||
+	    x->xfrm_vec[0].id.proto != IPPROTO_ESP) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "SOE offloads ESP policies only");
+		return -EOPNOTSUPP;
+	}
+
+	if (!x->xdo.dev || !(x->xdo.dev->features & NETIF_F_HW_ESP)) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE ESP offload is disabled");
+		return -EOPNOTSUPP;
+	}
+
+	switch (x->xdo.dir) {
+	case XFRM_DEV_OFFLOAD_IN:
+	case XFRM_DEV_OFFLOAD_OUT:
+		return 0;
+	default:
+		NL_SET_ERR_MSG_MOD(extack, "SOE supports in/out policies only");
+		return -EOPNOTSUPP;
+	}
+}
+
+static struct net_device *airoha_soe_dsa_conduit_get(struct net_device *dev)
+{
+	struct net_device *conduit;
+	struct dsa_port *dp;
+
+	if (!dsa_user_dev_check(dev))
+		return NULL;
+
+	/* DSA users expose XFRM, but SOE is attached to their CPU conduit. */
+	dp = dsa_port_from_netdev(dev);
+	if (IS_ERR(dp) || !dp->cpu_dp)
+		return NULL;
+
+	conduit = dsa_port_to_conduit(dp);
+	if (!conduit || conduit->xfrmdev_ops != &airoha_soe_xfrmdev_ops)
+		return NULL;
+
+	dev_hold(conduit);
+
+	return conduit;
+}
+
+static int airoha_soe_dsa_xfrm_state_add(struct net_device *dev,
+					 struct xfrm_state *x,
+					 struct netlink_ext_ack *extack)
+{
+	struct net_device *conduit;
+	int err;
+
+	conduit = airoha_soe_dsa_conduit_get(dev);
+	if (!conduit) {
+		NL_SET_ERR_MSG_MOD(extack, "SOE DSA conduit is unavailable");
+		return -EOPNOTSUPP;
+	}
+
+	err = airoha_soe_xfrm_state_add(conduit, x, extack);
+	dev_put(conduit);
+
+	return err;
+}
+
+static void airoha_soe_dsa_xfrm_state_delete(struct net_device *dev,
+					     struct xfrm_state *x)
+{
+	airoha_soe_xfrm_state_delete(dev, x);
+}
+
+static void airoha_soe_dsa_xfrm_state_free(struct net_device *dev,
+					   struct xfrm_state *x)
+{
+	airoha_soe_xfrm_state_free(dev, x);
+}
+
+static bool airoha_soe_dsa_xfrm_offload_ok(struct sk_buff *skb,
+					   struct xfrm_state *x)
+{
+	return airoha_soe_xfrm_offload_ok(skb, x);
+}
+
+static int airoha_soe_dsa_xfrm_policy_add(struct xfrm_policy *x,
+					  struct netlink_ext_ack *extack)
+{
+	return airoha_soe_xfrm_policy_add(x, extack);
+}
+
+static int airoha_soe_dsa_xfrm_packet_xmit(struct sk_buff *skb,
+					   struct xfrm_state *x)
+{
+	return airoha_soe_xfrm_packet_xmit(skb, x);
+}
+
+static const struct xfrmdev_ops airoha_soe_xfrmdev_ops = {
+	.xdo_dev_state_add = airoha_soe_xfrm_state_add,
+	.xdo_dev_state_delete = airoha_soe_xfrm_state_delete,
+	.xdo_dev_state_free = airoha_soe_xfrm_state_free,
+	.xdo_dev_offload_ok = airoha_soe_xfrm_offload_ok,
+	.xdo_dev_policy_add = airoha_soe_xfrm_policy_add,
+	.xdo_dev_packet_xmit = airoha_soe_xfrm_packet_xmit,
+};
+
+static const struct xfrmdev_ops airoha_soe_dsa_xfrmdev_ops = {
+	.xdo_dev_state_add = airoha_soe_dsa_xfrm_state_add,
+	.xdo_dev_state_delete = airoha_soe_dsa_xfrm_state_delete,
+	.xdo_dev_state_free = airoha_soe_dsa_xfrm_state_free,
+	.xdo_dev_offload_ok = airoha_soe_dsa_xfrm_offload_ok,
+	.xdo_dev_policy_add = airoha_soe_dsa_xfrm_policy_add,
+	.xdo_dev_packet_xmit = airoha_soe_dsa_xfrm_packet_xmit,
+};
+
+static void airoha_soe_dsa_proxy_enable(struct net_device *dev)
+{
+	struct net_device *conduit;
+
+	conduit = airoha_soe_dsa_conduit_get(dev);
+	if (!conduit)
+		return;
+
+	if (dev->xfrmdev_ops && dev->xfrmdev_ops != &airoha_soe_dsa_xfrmdev_ops)
+		goto out;
+
+	/* Mirror ESP capability onto DSA users while programming SAs on the conduit. */
+	dev->xfrmdev_ops = &airoha_soe_dsa_xfrmdev_ops;
+	dev->hw_features |= NETIF_F_HW_ESP;
+	dev->hw_enc_features |= NETIF_F_HW_ESP;
+	dev->wanted_features |= NETIF_F_HW_ESP;
+
+	conduit->wanted_features |= NETIF_F_HW_ESP;
+	netdev_update_features(conduit);
+	netdev_update_features(dev);
+out:
+	dev_put(conduit);
+}
+
+static void airoha_soe_dsa_proxy_clear(struct net_device *dev)
+{
+	if (dev->xfrmdev_ops != &airoha_soe_dsa_xfrmdev_ops)
+		return;
+
+	dev->wanted_features &= ~NETIF_F_HW_ESP;
+	dev->hw_features &= ~NETIF_F_HW_ESP;
+	dev->hw_enc_features &= ~NETIF_F_HW_ESP;
+	netdev_update_features(dev);
+	dev->xfrmdev_ops = NULL;
+}
+
+static void airoha_soe_dsa_proxy_scan(bool enable)
+{
+	struct net_device *dev;
+
+	for_each_netdev(&init_net, dev) {
+		if (enable)
+			airoha_soe_dsa_proxy_enable(dev);
+		else
+			airoha_soe_dsa_proxy_clear(dev);
+	}
+}
+
+static int airoha_soe_netdev_event(struct notifier_block *nb,
+				   unsigned long event, void *ptr)
+{
+	switch (event) {
+	case NETDEV_REGISTER:
+	case NETDEV_CHANGENAME:
+		airoha_soe_dsa_proxy_scan(true);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block airoha_soe_netdev_notifier = {
+	.notifier_call = airoha_soe_netdev_event,
+};
+
+void airoha_soe_build_netdev(struct net_device *netdev,
+			     airoha_soe_xmit_skb_t xmit_skb)
+{
+	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+	atomic_set(&dev->soe_xfrm_state_count, 0);
+	dev->soe_xmit_skb = xmit_skb;
+
+	if (!xmit_skb ||
+	    !(airoha_soe_features(dev->eth->soe) & AIROHA_SOE_FEATURE_ESP))
+		return;
+
+	netdev->xfrmdev_ops = &airoha_soe_xfrmdev_ops;
+	netdev->hw_features |= NETIF_F_HW_ESP;
+	netdev->hw_enc_features |= NETIF_F_HW_ESP;
+}
+
+void airoha_soe_teardown_netdev(struct net_device *netdev)
+{
+	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+	if (netdev->xfrmdev_ops == &airoha_soe_xfrmdev_ops)
+		netdev->xfrmdev_ops = NULL;
+	dev->soe_xmit_skb = NULL;
+}
+
+int airoha_soe_set_features(struct net_device *netdev,
+			    netdev_features_t features)
+{
+	netdev_features_t changed = (netdev->features ^ features) &
+				    NETIF_F_HW_ESP;
+	struct airoha_gdm_dev *dev = netdev_priv(netdev);
+
+	if (!changed)
+		return 0;
+
+	if ((features & NETIF_F_HW_ESP) &&
+	    !(airoha_soe_features(dev->eth->soe) & AIROHA_SOE_FEATURE_ESP))
+		return -EOPNOTSUPP;
+
+	if (atomic_read(&dev->soe_xfrm_state_count)) {
+		netdev_err(netdev,
+			   "cannot change ESP features with active SAs\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static struct device_node *airoha_soe_find_node(struct airoha_eth *eth)
+{
+	struct device_node *parent, *np;
+
+	if (!eth->dev->of_node)
+		return NULL;
+
+	parent = of_get_parent(eth->dev->of_node);
+	if (!parent)
+		return NULL;
+
+	/* SOE is a sibling DT node; Ethernet owns the provider lifetime. */
+	for_each_child_of_node(parent, np) {
+		if (!of_device_is_available(np) ||
+		    !of_device_is_compatible(np, "airoha,en7581-soe"))
+			continue;
+
+		of_node_put(parent);
+		return np;
+	}
+
+	of_node_put(parent);
+
+	return NULL;
+}
+
+int airoha_soe_init(struct airoha_eth *eth)
+{
+	struct device *dev = eth->dev;
+	struct device_node *np;
+	struct resource res;
+	struct airoha_soe *soe;
+	void __iomem *base;
+	int err;
+
+	np = airoha_soe_find_node(eth);
+	if (!np)
+		return 0;
+
+	err = of_address_to_resource(np, 0, &res);
+	if (err)
+		goto put_node;
+
+	base = devm_ioremap_resource(dev, &res);
+	if (IS_ERR(base)) {
+		err = PTR_ERR(base);
+		goto put_node;
+	}
+
+	soe = devm_kzalloc(dev, sizeof(*soe), GFP_KERNEL);
+	if (!soe) {
+		err = -ENOMEM;
+		goto put_node;
+	}
+
+	soe->dev = dev;
+	soe->base = base;
+	mutex_init(&soe->sa_lock);
+	spin_lock_init(&soe->state_lock);
+	refcount_set(&soe->refcnt, 1);
+	init_completion(&soe->released);
+
+	/* Enable the packet engines; reset leaves SOE present but idle. */
+	writel(AIROHA_SOE_INT_ALL, base + AIROHA_SOE_INT_STS);
+	writel(AIROHA_SOE_CNT_CLR_ALL, base + AIROHA_SOE_CNT_CLR);
+	writel(AIROHA_SOE_INT_ALL, base + AIROHA_SOE_INT_EN);
+	writel(AIROHA_SOE_GLB_CFG_ENC_EN | AIROHA_SOE_GLB_CFG_DEC_EN,
+	       base + AIROHA_SOE_GLB_CFG);
+
+	err = register_netdevice_notifier(&airoha_soe_netdev_notifier);
+	if (err)
+		goto disable_soe;
+
+	eth->soe = soe;
+
+	rtnl_lock();
+	airoha_soe_dsa_proxy_scan(true);
+	rtnl_unlock();
+
+	of_node_put(np);
+
+	return 0;
+
+disable_soe:
+	writel(0, base + AIROHA_SOE_GLB_CFG);
+	writel(0, base + AIROHA_SOE_INT_EN);
+	writel(0xffffffff, base + AIROHA_SOE_INT_STS);
+put_node:
+	of_node_put(np);
+
+	return err;
+}
+
+void airoha_soe_deinit(struct airoha_eth *eth)
+{
+	struct airoha_soe *soe = eth->soe;
+	unsigned long flags;
+
+	if (!soe)
+		return;
+
+	eth->soe = NULL;
+
+	spin_lock_irqsave(&soe->state_lock, flags);
+	soe->dead = true;
+	spin_unlock_irqrestore(&soe->state_lock, flags);
+
+	rtnl_lock();
+	airoha_soe_dsa_proxy_scan(false);
+	rtnl_unlock();
+	unregister_netdevice_notifier(&airoha_soe_netdev_notifier);
+
+	airoha_soe_put_ref(soe);
+	wait_for_completion(&soe->released);
+
+	writel(0, soe->base + AIROHA_SOE_GLB_CFG);
+	writel(0, soe->base + AIROHA_SOE_INT_EN);
+	writel(0xffffffff, soe->base + AIROHA_SOE_INT_STS);
+}
-- 
2.53.0
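
In the patch above, airoha_soe_set_features() only checks changes to the NETIF_F_HW_ESP bit. The userspace sketch below restates its decision order so that each case is easy to verify. SKETCH_F_HW_ESP and sketch_set_features() are hypothetical names, and the flag value is arbitrary rather than the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for NETIF_F_HW_ESP; the real bit value differs. */
#define SKETCH_F_HW_ESP (UINT64_C(1) << 0)

/*
 * Hypothetical model of airoha_soe_set_features():
 * - requests that leave the ESP bit unchanged always succeed;
 * - enabling ESP requires SOE ESP support (-EOPNOTSUPP otherwise);
 * - any ESP toggle is refused while SAs are installed (-EBUSY).
 */
static int sketch_set_features(uint64_t cur, uint64_t wanted,
			       bool soe_has_esp, int active_sas)
{
	uint64_t changed = (cur ^ wanted) & SKETCH_F_HW_ESP;

	if (!changed)
		return 0;
	if ((wanted & SKETCH_F_HW_ESP) && !soe_has_esp)
		return -EOPNOTSUPP;
	if (active_sas)
		return -EBUSY;
	return 0;
}
```

The support check runs first, so a request to enable ESP on a device without SOE ESP support fails with -EOPNOTSUPP even while SAs are active. Clearing the bit is refused, with -EBUSY, only when SAs are still installed.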

* [RFC PATCH net-next 6/7] net: airoha: add PPE support for SOE flows
From: Jihong Min @ 2026-06-14  4:00 UTC
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

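Add PPE metadata handling for SOE flows, so that decrypted packets carry
their original FOE hash and SA context until the normal egress path is
known, and so that XFRM flowtable entries can be programmed with the SOE
SA index and hop.

Also enable IPv4/IPv6 IPsec handling in PPE_FLOW_CFG, clear the whole FOE
table and its stats at init, and add the ppe_bind_rate,
soe_inline_bind_delay_packets and soe_inline_force_commit_probe_window
module parameters.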

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
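In the diff below, airoha_ppe_soe_mark_skb() carries the decrypt-side FOE context by tagging skb->mark with a magic byte and a 16-bit FOE hash, while the SA index and hop stay in ppe->soe_meta[hash]. airoha_ppe_soe_xmit_skb() later recovers the hash and clears the tag. The sketch below models that bit layout in userspace, using the mask values from the patch. The helper names are hypothetical, and field_prep()/field_get() stand in for the kernel's FIELD_PREP()/FIELD_GET() via the GCC/Clang __builtin_ctz() builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mark layout constants copied from the patch. */
#define SOE_MARK_MAGIC      0x5e000000u
#define SOE_MARK_MAGIC_MASK 0xff000000u
#define SOE_MARK_HASH_MASK  0x00ffff00u

/* Hypothetical userspace FIELD_PREP()/FIELD_GET() for a contiguous mask. */
static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	return (val << __builtin_ctz(mask)) & mask;
}

static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

/* Models airoha_ppe_soe_mark_skb(): set the magic, stash the FOE hash. */
static uint32_t soe_mark_set(uint32_t mark, uint16_t hash)
{
	mark &= ~(SOE_MARK_MAGIC_MASK | SOE_MARK_HASH_MASK);
	return mark | SOE_MARK_MAGIC | field_prep(SOE_MARK_HASH_MASK, hash);
}

/* Models airoha_ppe_soe_skb_marked(). */
static bool soe_mark_is_set(uint32_t mark)
{
	return (mark & SOE_MARK_MAGIC_MASK) == SOE_MARK_MAGIC;
}

/* The hash that airoha_ppe_soe_xmit_skb() extracts from the mark. */
static uint16_t soe_mark_hash(uint32_t mark)
{
	return (uint16_t)field_get(SOE_MARK_HASH_MASK, mark);
}

/* Models the clear_mark path: only bits 0..7 of the mark remain. */
static uint32_t soe_mark_clear(uint32_t mark)
{
	return mark & ~(SOE_MARK_MAGIC_MASK | SOE_MARK_HASH_MASK);
}
```

As the last helper shows, the tag overwrites bits 8..31 of any mark the skb already carried, and clearing the tag does not restore them. Only the low byte survives the round trip.
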
 drivers/net/ethernet/airoha/airoha_ppe.c  | 606 +++++++++++++++++++++-
 include/linux/soc/airoha/airoha_offload.h |   5 +
 2 files changed, 585 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 91bcc55a6ac6..faa5f04d4c7b 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -6,18 +6,77 @@
 
 #include <linux/ip.h>
 #include <linux/ipv6.h>
+#include <linux/kstrtox.h>
+#include <linux/moduleparam.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
 #include <linux/rhashtable.h>
+#include <linux/sysfs.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 #include <net/ipv6.h>
+#include <net/netfilter/nf_flow_table.h>
 #include <net/pkt_cls.h>
+#include <net/route.h>
 
 #include "airoha_regs.h"
 #include "airoha_eth.h"
+#include "airoha_soe.h"
 
 static DEFINE_MUTEX(flow_offload_mutex);
 static DEFINE_SPINLOCK(ppe_lock);
 
+#define AIROHA_FOE_HOP0			GENMASK(31, 29)
+#define AIROHA_FOE_HOP1			GENMASK(28, 26)
+#define AIROHA_FOE_HOP2			GENMASK(25, 23)
+#define AIROHA_FOE_HOP3			GENMASK(22, 20)
+#define AIROHA_FOE_HOP_MASK						\
+	(AIROHA_FOE_HOP0 | AIROHA_FOE_HOP1 | AIROHA_FOE_HOP2 |		\
+	 AIROHA_FOE_HOP3)
+#define AIROHA_PPE_SOE_DEFAULT_TUNNEL_MTU	1500
+#define AIROHA_PPE_SOE_MAGIC_IPSEC		0x72a1
+#define AIROHA_PPE_SOE_PORT_AG			0x3f
+#define AIROHA_PPE_SOE_CHANNEL			2
+#define AIROHA_PPE_SOE_META_TIMEOUT_MS		1000
+#define AIROHA_PPE_SOE_MAGIC_GDM4		0x729a
+#define AIROHA_PPE_SOE_MARK_MAGIC		0x5e000000
+#define AIROHA_PPE_SOE_MARK_MAGIC_MASK		0xff000000
+#define AIROHA_PPE_SOE_MARK_HASH_MASK		0x00ffff00
+#define AIROHA_PPE_DEFAULT_BIND_RATE		0x1e
+#define AIROHA_PPE_SOE_BIND_DELAY_PACKETS	2
+#define AIROHA_PPE_FORCE_COMMIT_PROBE_WINDOW	0
+
+struct airoha_ppe_soe_tuple_info {
+	unsigned int tunnel_mtu;
+	u8 sa_index;
+	u8 hop;
+};
+
+static unsigned int airoha_ppe_bind_rate = AIROHA_PPE_DEFAULT_BIND_RATE;
+static unsigned int airoha_ppe_soe_inline_bind_delay_packets =
+	AIROHA_PPE_SOE_BIND_DELAY_PACKETS;
+static unsigned int airoha_ppe_soe_inline_force_commit_probe_window =
+	AIROHA_PPE_FORCE_COMMIT_PROBE_WINDOW;
+static struct airoha_ppe *airoha_ppe_active;
+
+static int airoha_ppe_set_bind_rate(const char *val,
+				    const struct kernel_param *kp);
+static int airoha_ppe_get_bind_rate(char *buf, const struct kernel_param *kp);
+
+module_param_named(soe_inline_force_commit_probe_window,
+		   airoha_ppe_soe_inline_force_commit_probe_window, uint, 0600);
+MODULE_PARM_DESC(soe_inline_force_commit_probe_window,
+		 "Adjacent FOE slots searched before force-commit");
+module_param_named(soe_inline_bind_delay_packets,
+		   airoha_ppe_soe_inline_bind_delay_packets, uint, 0600);
+MODULE_PARM_DESC(soe_inline_bind_delay_packets,
+		 "CPU-visible SOE decrypt packets before binding FOE entry");
+module_param_call(ppe_bind_rate, airoha_ppe_set_bind_rate,
+		  airoha_ppe_get_bind_rate, NULL, 0600);
+__MODULE_PARM_TYPE(ppe_bind_rate, "uint");
+MODULE_PARM_DESC(ppe_bind_rate,
+		 "PPE bind-rate threshold for L2B and bind fields");
+
 static const struct rhashtable_params airoha_flow_table_params = {
 	.head_offset = offsetof(struct airoha_flow_table_entry, node),
 	.key_offset = offsetof(struct airoha_flow_table_entry, cookie),
@@ -78,6 +137,17 @@ bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index)
 	return airoha_fe_rr(eth, REG_PPE_GLO_CFG(index)) & PPE_GLO_CFG_EN_MASK;
 }
 
+static void airoha_ppe_apply_bind_rate(struct airoha_eth *eth, int ppe_idx)
+{
+	u32 rate = min_t(u32, READ_ONCE(airoha_ppe_bind_rate),
+			 FIELD_MAX(PPE_BIND_RATE_BIND_MASK));
+
+	airoha_fe_rmw(eth, REG_PPE_BIND_RATE(ppe_idx),
+		      PPE_BIND_RATE_L2B_BIND_MASK | PPE_BIND_RATE_BIND_MASK,
+		      FIELD_PREP(PPE_BIND_RATE_L2B_BIND_MASK, rate) |
+			      FIELD_PREP(PPE_BIND_RATE_BIND_MASK, rate));
+}
+
 static u32 airoha_ppe_get_timestamp(struct airoha_ppe *ppe)
 {
 	return airoha_fe_get(ppe->eth, REG_FE_FOE_TS,
@@ -157,15 +227,14 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
 			      FIELD_PREP(PPE_DRAM_TB_NUM_ENTRY_MASK,
 					 dram_num_entries));
 
-		airoha_fe_rmw(eth, REG_PPE_BIND_RATE(i),
-			      PPE_BIND_RATE_L2B_BIND_MASK |
-			      PPE_BIND_RATE_BIND_MASK,
-			      FIELD_PREP(PPE_BIND_RATE_L2B_BIND_MASK, 0x1e) |
-			      FIELD_PREP(PPE_BIND_RATE_BIND_MASK, 0x1e));
+		airoha_ppe_apply_bind_rate(eth, i);
 
 		airoha_fe_wr(eth, REG_PPE_HASH_SEED(i), PPE_HASH_SEED);
 		airoha_fe_clear(eth, REG_PPE_PPE_FLOW_CFG(i),
 				PPE_FLOW_CFG_IP6_6RD_MASK);
+		airoha_fe_set(eth, REG_PPE_PPE_FLOW_CFG(i),
+			      PPE_FLOW_CFG_IP4_IPSEC_MASK |
+				      PPE_FLOW_CFG_IP6_IPSEC_MASK);
 
 		for (p = 0; p < ARRAY_SIZE(eth->ports); p++)
 			airoha_fe_rmw(eth, REG_PPE_MTU(i, p),
@@ -509,6 +578,162 @@ static int airoha_ppe_foe_entry_set_ipv6_tuple(struct airoha_foe_entry *hwe,
 	return 0;
 }
 
+static int airoha_ppe_soe_fill_inner_ipv4_data(struct sk_buff *skb,
+					       struct airoha_flow_data *data,
+					       int *type, int *l4proto)
+{
+	unsigned int ip_offset = ETH_HLEN;
+	union {
+		struct tcphdr tcp;
+		struct udphdr udp;
+	} ports;
+	struct iphdr iph_buf, *iph;
+	unsigned int l4_offset;
+	struct udphdr *udp;
+	struct tcphdr *tcp;
+
+	if (skb_headlen(skb) < ETH_HLEN)
+		return -EINVAL;
+
+	memcpy(&data->eth, skb->data, ETH_HLEN);
+	if (data->eth.h_proto != htons(ETH_P_IP))
+		return -EAFNOSUPPORT;
+
+	iph = skb_header_pointer(skb, ip_offset, sizeof(iph_buf), &iph_buf);
+	if (!iph || iph->ihl < 5 || iph->version != 4)
+		return -EINVAL;
+
+	l4_offset = ip_offset + iph->ihl * 4;
+	data->v4.src_addr = iph->saddr;
+	data->v4.dst_addr = iph->daddr;
+	*l4proto = iph->protocol;
+
+	switch (iph->protocol) {
+	case IPPROTO_TCP:
+		tcp = skb_header_pointer(skb, l4_offset, sizeof(ports.tcp),
+					 &ports.tcp);
+		if (!tcp)
+			return -EINVAL;
+
+		data->src_port = tcp->source;
+		data->dst_port = tcp->dest;
+		*type = PPE_PKT_TYPE_IPV4_HNAPT;
+		break;
+	case IPPROTO_UDP:
+		udp = skb_header_pointer(skb, l4_offset, sizeof(ports.udp),
+					 &ports.udp);
+		if (!udp)
+			return -EINVAL;
+
+		data->src_port = udp->source;
+		data->dst_port = udp->dest;
+		*type = PPE_PKT_TYPE_IPV4_HNAPT;
+		break;
+	default:
+		*type = PPE_PKT_TYPE_IPV4_ROUTE;
+		break;
+	}
+
+	return 0;
+}
+
+static int airoha_ppe_foe_entry_set_soe_fields(struct airoha_foe_entry *hwe,
+					       u8 sa_index, u8 hop,
+					       unsigned int tunnel_mtu)
+{
+	int type;
+
+	if (hop > FIELD_MAX(AIROHA_FOE_HOP0))
+		return -ERANGE;
+
+	type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1);
+	switch (type) {
+	case PPE_PKT_TYPE_IPV4_HNAPT:
+	case PPE_PKT_TYPE_IPV4_ROUTE:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	tunnel_mtu = min_t(unsigned int, tunnel_mtu,
+			   FIELD_MAX(AIROHA_FOE_TUNNEL_MTU));
+
+	/* SOE FOE entries store the hop selector and SA index here. */
+	hwe->ipv4.rsv[0] &= ~AIROHA_FOE_HOP_MASK;
+	hwe->ipv4.rsv[0] |= FIELD_PREP(AIROHA_FOE_HOP0, hop);
+
+	hwe->ipv4.data &= ~(AIROHA_FOE_ACTDP | AIROHA_FOE_TUNNEL_ID);
+	hwe->ipv4.data |= AIROHA_FOE_TUNNEL |
+			  FIELD_PREP(AIROHA_FOE_ACTDP, sa_index);
+
+	hwe->ipv4.l2.meter &= ~AIROHA_FOE_TUNNEL_MTU;
+	hwe->ipv4.l2.meter |= FIELD_PREP(AIROHA_FOE_TUNNEL_MTU, tunnel_mtu);
+
+	hwe->ib1 &= ~AIROHA_FOE_IB1_BIND_TUNNEL_DECAP;
+
+	return 0;
+}
+
+static int
+airoha_ppe_soe_get_tuple_info(const struct flow_offload_tuple *tuple,
+			      struct airoha_ppe_soe_tuple_info *info)
+{
+	int err;
+
+	if (!tuple || !info)
+		return -EINVAL;
+	if (tuple->xmit_type != FLOW_OFFLOAD_XMIT_XFRM)
+		return -EOPNOTSUPP;
+
+	err = airoha_soe_xfrm_ppe_info(tuple->dst_cache, &info->sa_index,
+				       &info->hop);
+	if (err)
+		return err;
+
+	info->tunnel_mtu = tuple->mtu ? tuple->mtu :
+				      AIROHA_PPE_SOE_DEFAULT_TUNNEL_MTU;
+
+	return 0;
+}
+
+static int
+airoha_ppe_foe_entry_set_soe_info(struct airoha_foe_entry *hwe,
+				  const struct flow_offload_tuple *tuple)
+{
+	struct airoha_ppe_soe_tuple_info info;
+	int err;
+
+	if (!tuple)
+		return 0;
+	if (tuple->xmit_type != FLOW_OFFLOAD_XMIT_XFRM)
+		return 0;
+
+	err = airoha_ppe_soe_get_tuple_info(tuple, &info);
+	if (err)
+		return err;
+
+	err = airoha_ppe_foe_entry_set_soe_fields(hwe, info.sa_index,
+						  info.hop, info.tunnel_mtu);
+	if (err)
+		return err;
+
+	/* XFRM packet-offload entries are not plain Ethernet/IP entries:
+	 * the PPE must tag them as SOE/IPsec work and submit them through the
+	 * SOE-facing channel/port aggregation path. Without these fields the
+	 * entry can still become BND, but traffic falls back to the slow path
+	 * instead of the inline encrypt/decrypt datapath.
+	 */
+	hwe->ipv4.l2.common.etype = AIROHA_PPE_SOE_MAGIC_IPSEC;
+	hwe->ipv4.data &= ~AIROHA_FOE_CHANNEL;
+	hwe->ipv4.data |= FIELD_PREP(AIROHA_FOE_CHANNEL,
+				     AIROHA_PPE_SOE_CHANNEL);
+	hwe->ipv4.ib2 &= ~AIROHA_FOE_IB2_PORT_AG;
+	hwe->ipv4.ib2 |= FIELD_PREP(AIROHA_FOE_IB2_PORT_AG,
+				    AIROHA_PPE_SOE_PORT_AG);
+
+	return 0;
+}
+
 static u32 airoha_ppe_foe_get_entry_hash(struct airoha_ppe *ppe,
 					 struct airoha_foe_entry *hwe)
 {
@@ -633,6 +858,9 @@ static void airoha_ppe_foe_flow_stats_update(struct airoha_ppe *ppe,
 		meter = &hwe->ipv4.l2.meter;
 	}
 
+	if (*data & AIROHA_FOE_TUNNEL)
+		return;
+
 	pse_port = FIELD_GET(AIROHA_FOE_IB2_PSE_PORT, *ib2);
 	if (pse_port == FE_PSE_PORT_CDM4)
 		return;
@@ -868,16 +1096,129 @@ airoha_ppe_foe_commit_subflow_entry(struct airoha_ppe *ppe,
 	return 0;
 }
 
-static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
-					struct sk_buff *skb,
-					u32 hash, bool rx_wlan)
+static void airoha_ppe_soe_meta_store(struct airoha_ppe_soe_meta *meta,
+				      u32 key_hash, u16 foe_hash, u8 sa_index,
+				      u8 hop)
 {
-	struct airoha_flow_table_entry *e;
-	struct airoha_foe_bridge br = {};
-	struct airoha_foe_entry *hwe;
-	bool commit_done = false;
-	struct hlist_node *n;
-	u32 index, state;
+	u8 seen = 1;
+
+	if (READ_ONCE(meta->valid) &&
+	    READ_ONCE(meta->key_hash) == key_hash &&
+	    READ_ONCE(meta->foe_hash) == foe_hash &&
+	    READ_ONCE(meta->sa_index) == sa_index &&
+	    READ_ONCE(meta->hop) == hop &&
+	    time_before_eq(jiffies, READ_ONCE(meta->expires)))
+		seen = min_t(u8, READ_ONCE(meta->seen) + 1, U8_MAX);
+
+	WRITE_ONCE(meta->key_hash, key_hash);
+	WRITE_ONCE(meta->foe_hash, foe_hash);
+	WRITE_ONCE(meta->sa_index, sa_index);
+	WRITE_ONCE(meta->hop, hop);
+	WRITE_ONCE(meta->seen, seen);
+	WRITE_ONCE(meta->expires,
+		   jiffies + msecs_to_jiffies(AIROHA_PPE_SOE_META_TIMEOUT_MS));
+	WRITE_ONCE(meta->valid, 1);
+}
+
+void airoha_ppe_soe_mark_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
+			     u16 hash, u8 sa_index, u8 hop)
+{
+	struct airoha_ppe *ppe;
+	u32 ppe_hash_mask;
+
+	if (!dev || !skb)
+		return;
+
+	ppe = dev->priv;
+	if (!ppe || !ppe->soe_meta)
+		return;
+
+	ppe_hash_mask = airoha_ppe_get_total_num_entries(ppe) - 1;
+	if (hash > ppe_hash_mask)
+		return;
+
+	/* SOE decrypt completion is CPU-visible before normal routing has
+	 * selected the plaintext egress netdev. Keep the original encrypted FOE
+	 * hash and SA hop briefly on the skb so airoha_dev_xmit() can finish
+	 * the PPE entry once the final egress descriptor is known.
+	 */
+	airoha_ppe_soe_meta_store(&ppe->soe_meta[hash], hash, hash, sa_index,
+				  hop);
+	ppe->foe_check_time[hash] = 0;
+
+	skb->mark &= ~(AIROHA_PPE_SOE_MARK_MAGIC_MASK |
+		       AIROHA_PPE_SOE_MARK_HASH_MASK);
+	skb->mark |= AIROHA_PPE_SOE_MARK_MAGIC |
+		     FIELD_PREP(AIROHA_PPE_SOE_MARK_HASH_MASK, hash);
+}
+
+bool airoha_ppe_soe_skb_marked(struct sk_buff *skb)
+{
+	return skb && ((skb->mark & AIROHA_PPE_SOE_MARK_MAGIC_MASK) ==
+		       AIROHA_PPE_SOE_MARK_MAGIC);
+}
+
+void airoha_ppe_soe_xmit_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
+			     struct net_device *netdev)
+{
+	struct airoha_foe_entry entry, tmpl, *hwe;
+	struct airoha_flow_data data = {};
+	struct airoha_ppe_soe_meta *meta;
+	u32 ppe_hash_mask, key_hash;
+	struct airoha_gdm_dev *gdm;
+	struct airoha_ppe *ppe;
+	unsigned long expires;
+	u16 hash;
+	int err, l4proto, type;
+	u8 sa_index, hop;
+	u8 seen;
+
+	if (!dev || !skb || !netdev)
+		return;
+
+	if ((skb->mark & AIROHA_PPE_SOE_MARK_MAGIC_MASK) !=
+	    AIROHA_PPE_SOE_MARK_MAGIC)
+		return;
+
+	ppe = dev->priv;
+	if (!ppe || !ppe->soe_meta)
+		goto clear_mark;
+
+	ppe_hash_mask = airoha_ppe_get_total_num_entries(ppe) - 1;
+	key_hash = FIELD_GET(AIROHA_PPE_SOE_MARK_HASH_MASK, skb->mark);
+	if (key_hash > ppe_hash_mask)
+		goto clear_mark;
+
+	meta = &ppe->soe_meta[key_hash];
+	if (!READ_ONCE(meta->valid))
+		goto clear_mark;
+
+	if (READ_ONCE(meta->key_hash) != key_hash)
+		goto clear_mark;
+
+	expires = READ_ONCE(meta->expires);
+	if (time_after(jiffies, expires)) {
+		WRITE_ONCE(meta->valid, 0);
+		goto clear_mark;
+	}
+
+	hash = READ_ONCE(meta->foe_hash);
+	if (hash > ppe_hash_mask) {
+		WRITE_ONCE(meta->valid, 0);
+		goto clear_mark;
+	}
+
+	seen = READ_ONCE(meta->seen);
+	if (seen <= READ_ONCE(airoha_ppe_soe_inline_bind_delay_packets))
+		goto clear_mark;
+
+	err = airoha_ppe_soe_fill_inner_ipv4_data(skb, &data, &type, &l4proto);
+	if (err)
+		goto clear_mark;
+
+	sa_index = READ_ONCE(meta->sa_index);
+	hop = READ_ONCE(meta->hop);
+	WRITE_ONCE(meta->valid, 0);
 
 	spin_lock_bh(&ppe_lock);
 
@@ -885,13 +1226,120 @@ static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
 	if (!hwe)
 		goto unlock;
 
-	state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
-	if (state == AIROHA_FOE_STATE_BIND)
+	switch (FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1)) {
+	case PPE_PKT_TYPE_IPV4_HNAPT:
+	case PPE_PKT_TYPE_IPV4_ROUTE:
+		break;
+	default:
 		goto unlock;
+	}
 
-	index = airoha_ppe_foe_get_entry_hash(ppe, hwe);
-	hlist_for_each_entry_safe(e, n, &ppe->foe_flow[index], list) {
+	err = airoha_ppe_foe_entry_prepare(ppe->eth, &tmpl, netdev, type,
+					   &data, l4proto);
+	if (err)
+		goto unlock;
+
+	memcpy(&entry, hwe, sizeof(entry));
+	entry.ib1 &= ~(AIROHA_FOE_IB1_BIND_STATE |
+		       AIROHA_FOE_IB1_BIND_KEEPALIVE |
+		       AIROHA_FOE_IB1_BIND_TIMESTAMP);
+	entry.ib1 |= FIELD_PREP(AIROHA_FOE_IB1_BIND_STATE,
+				AIROHA_FOE_STATE_BIND) |
+		     AIROHA_FOE_IB1_BIND_TTL;
+	entry.ib1 = (entry.ib1 & (AIROHA_FOE_IB1_BIND_PACKET_TYPE |
+				  AIROHA_FOE_IB1_BIND_UDP)) |
+		    (tmpl.ib1 & ~(AIROHA_FOE_IB1_BIND_PACKET_TYPE |
+				  AIROHA_FOE_IB1_BIND_UDP));
+	entry.ipv4.ib2 = tmpl.ipv4.ib2;
+	entry.ipv4.data = tmpl.ipv4.data;
+	memcpy(&entry.ipv4.l2, &tmpl.ipv4.l2, sizeof(entry.ipv4.l2));
+
+	gdm = netdev_priv(netdev);
+	if (gdm->port && gdm->port->id == AIROHA_GDM4_IDX)
+		entry.ipv4.l2.common.etype = AIROHA_PPE_SOE_MAGIC_GDM4;
+
+	if (FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, entry.ib1) ==
+	    PPE_PKT_TYPE_IPV4_HNAPT)
+		memcpy(&entry.ipv4.new_tuple, &entry.ipv4.orig_tuple,
+		       sizeof(entry.ipv4.new_tuple));
+
+	/* Commit the original decrypt entry only after the normal transmit path
+	 * has provided the final plaintext egress descriptor. Binding it at SOE
+	 * RX completion would miss this device-specific L2/PSE state.
+	 */
+	err = airoha_ppe_foe_entry_set_soe_fields(&entry, sa_index, hop,
+						  AIROHA_PPE_SOE_DEFAULT_TUNNEL_MTU);
+	if (!err)
+		airoha_ppe_foe_commit_entry(ppe, &entry, hash, false);
+
+unlock:
+	spin_unlock_bh(&ppe_lock);
+clear_mark:
+	skb->mark &= ~(AIROHA_PPE_SOE_MARK_MAGIC_MASK |
+		       AIROHA_PPE_SOE_MARK_HASH_MASK);
+}
+
+void airoha_ppe_soe_flush_sa(struct airoha_ppe *ppe, u8 sa_index)
+{
+	u32 num_entries, hash;
+
+	if (!ppe)
+		return;
+
+	num_entries = airoha_ppe_get_total_num_entries(ppe);
+
+	spin_lock_bh(&ppe_lock);
+	for (hash = 0; hash < num_entries; hash++) {
+		struct airoha_foe_entry *hwe;
+		u32 state, type;
+
+		hwe = airoha_ppe_foe_get_entry_locked(ppe, hash);
+		if (!hwe)
+			continue;
+
+		state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
+		if (state != AIROHA_FOE_STATE_BIND)
+			continue;
+
+		type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1);
+		if (type != PPE_PKT_TYPE_IPV4_HNAPT &&
+		    type != PPE_PKT_TYPE_IPV4_ROUTE)
+			continue;
+
+		if (!(hwe->ipv4.data & AIROHA_FOE_TUNNEL))
+			continue;
+
+		if (FIELD_GET(AIROHA_FOE_ACTDP, hwe->ipv4.data) != sa_index)
+			continue;
+
+		/* NAT-T data and IKE control both use UDP/4500. A stale SOE
+		 * bound entry can otherwise keep sending later IKE_AUTH packets
+		 * to the SOE path after the SA has been deleted.
+		 */
+		hwe->ib1 &= ~AIROHA_FOE_IB1_BIND_STATE;
+		hwe->ib1 |= FIELD_PREP(AIROHA_FOE_IB1_BIND_STATE,
+				       AIROHA_FOE_STATE_INVALID);
+		airoha_ppe_foe_commit_entry(ppe, hwe, hash, false);
+	}
+	spin_unlock_bh(&ppe_lock);
+}
+
+static bool airoha_ppe_foe_try_flow_commit_bucket(struct airoha_ppe *ppe,
+						  struct airoha_foe_entry *hwe,
+						  u32 hash, u32 probe_index,
+						  bool rx_wlan,
+						  bool allow_l2_subflow)
+{
+	struct airoha_flow_table_entry *e;
+	struct hlist_node *n;
+	bool commit_done = false;
+	u32 state;
+
+	hlist_for_each_entry_safe(e, n, &ppe->foe_flow[probe_index], list) {
 		if (e->type == FLOW_TYPE_L2_SUBFLOW) {
+			if (!allow_l2_subflow)
+				continue;
+
 			state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
 			if (state != AIROHA_FOE_STATE_BIND) {
 				e->hash = 0xffff;
@@ -908,6 +1356,51 @@ static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
 		e->hash = hash;
 	}
 
+	return commit_done;
+}
+
+static void airoha_ppe_foe_insert_entry(struct airoha_ppe *ppe,
+					struct sk_buff *skb,
+					u32 hash, bool rx_wlan)
+{
+	struct airoha_flow_table_entry *e;
+	struct airoha_foe_bridge br = {};
+	struct airoha_foe_entry *hwe;
+	bool commit_done = false;
+	u32 index, mask, state, window;
+	unsigned int i;
+
+	spin_lock_bh(&ppe_lock);
+
+	hwe = airoha_ppe_foe_get_entry_locked(ppe, hash);
+	if (!hwe)
+		goto unlock;
+
+	state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
+	if (state == AIROHA_FOE_STATE_BIND)
+		goto unlock;
+
+	index = airoha_ppe_foe_get_entry_hash(ppe, hwe);
+	commit_done =
+		airoha_ppe_foe_try_flow_commit_bucket(ppe, hwe, hash, index,
+						      rx_wlan, true);
+
+	mask = airoha_ppe_get_total_num_entries(ppe) - 1;
+	window = min_t(u32,
+		       READ_ONCE(airoha_ppe_soe_inline_force_commit_probe_window),
+		       mask);
+	for (i = 1; !commit_done && i <= window; i++) {
+		u32 candidates[2] = { (index + i) & mask, (index - i) & mask };
+		unsigned int j;
+
+		for (j = 0; !commit_done && j < ARRAY_SIZE(candidates); j++) {
+			commit_done =
+				airoha_ppe_foe_try_flow_commit_bucket(ppe, hwe,
+								      hash, candidates[j],
+								      rx_wlan, false);
+		}
+	}
+
 	if (commit_done)
 		goto unlock;
 
@@ -940,8 +1433,9 @@ airoha_ppe_foe_l2_flow_commit_entry(struct airoha_ppe *ppe,
 				       airoha_l2_flow_table_params);
 }
 
-static int airoha_ppe_foe_flow_commit_entry(struct airoha_ppe *ppe,
-					    struct airoha_flow_table_entry *e)
+static int
+airoha_ppe_foe_flow_commit_entry(struct airoha_ppe *ppe,
+				 struct airoha_flow_table_entry *e)
 {
 	int type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, e->data.ib1);
 	u32 hash;
@@ -1057,6 +1551,7 @@ static int airoha_ppe_entry_idle_time(struct airoha_ppe *ppe,
 static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
 					   struct flow_cls_offload *f)
 {
+	const struct flow_offload_tuple *tuple = (const void *)f->cookie;
 	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
 	struct airoha_flow_table_entry *e;
 	struct airoha_flow_data data = {};
@@ -1183,7 +1678,9 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
 		flow_rule_match_ipv4_addrs(rule, &addrs);
 		data.v4.src_addr = addrs.key->src;
 		data.v4.dst_addr = addrs.key->dst;
-		airoha_ppe_foe_entry_set_ipv4_tuple(&hwe, &data, false);
+		err = airoha_ppe_foe_entry_set_ipv4_tuple(&hwe, &data, false);
+		if (err)
+			return err;
 	}
 
 	if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
@@ -1228,6 +1725,10 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
 			return err;
 	}
 
+	err = airoha_ppe_foe_entry_set_soe_info(&hwe, tuple);
+	if (err)
+		return err;
+
 	e = kzalloc_obj(*e);
 	if (!e)
 		return -ENOMEM;
@@ -1350,16 +1851,26 @@ static int airoha_ppe_flow_offload_cmd(struct airoha_eth *eth,
 	return -EOPNOTSUPP;
 }
 
-static int airoha_ppe_flush_sram_entries(struct airoha_ppe *ppe)
+static int airoha_ppe_flush_entries(struct airoha_ppe *ppe)
 {
+	u32 ppe_num_entries = airoha_ppe_get_total_num_entries(ppe);
 	u32 sram_num_entries = airoha_ppe_get_total_sram_num_entries(ppe);
 	struct airoha_foe_entry *hwe = ppe->foe;
 	int i, err = 0;
 
+	memset(hwe, 0, ppe_num_entries * sizeof(*hwe));
+	if (ppe->foe_stats) {
+		u32 ppe_num_stats_entries =
+			airoha_ppe_get_total_num_stats_entries(ppe);
+
+		memset(ppe->foe_stats, 0,
+		       ppe_num_stats_entries * sizeof(*ppe->foe_stats));
+	}
+	dma_wmb();
+
 	for (i = 0; i < sram_num_entries; i++) {
 		int err;
 
-		memset(&hwe[i], 0, sizeof(*hwe));
 		err = airoha_ppe_foe_commit_sram_entry(ppe, i);
 		if (err)
 			break;
@@ -1368,6 +1879,37 @@ static int airoha_ppe_flush_sram_entries(struct airoha_ppe *ppe)
 	return err;
 }
 
+static int airoha_ppe_set_bind_rate(const char *val,
+				    const struct kernel_param *kp)
+{
+	struct airoha_ppe *ppe;
+	unsigned long rate;
+	int err, i;
+
+	err = kstrtoul(val, 0, &rate);
+	if (err)
+		return err;
+	if (rate > FIELD_MAX(PPE_BIND_RATE_BIND_MASK))
+		return -ERANGE;
+
+	WRITE_ONCE(airoha_ppe_bind_rate, (unsigned int)rate);
+
+	mutex_lock(&flow_offload_mutex);
+	ppe = READ_ONCE(airoha_ppe_active);
+	if (ppe) {
+		for (i = 0; i < ppe->eth->soc->num_ppe; i++)
+			airoha_ppe_apply_bind_rate(ppe->eth, i);
+	}
+	mutex_unlock(&flow_offload_mutex);
+
+	return 0;
+}
+
+static int airoha_ppe_get_bind_rate(char *buf, const struct kernel_param *kp)
+{
+	return sysfs_emit(buf, "%u\n", READ_ONCE(airoha_ppe_bind_rate));
+}
+
 static struct airoha_npu *airoha_ppe_npu_get(struct airoha_eth *eth)
 {
 	struct airoha_npu *npu = airoha_npu_get(eth->dev);
@@ -1601,12 +2143,20 @@ int airoha_ppe_init(struct airoha_eth *eth)
 			return -ENOMEM;
 	}
 
-	ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries,
-					   GFP_KERNEL);
+	ppe->foe_check_time =
+		devm_kzalloc(eth->dev,
+			     ppe_num_entries * sizeof(*ppe->foe_check_time),
+			     GFP_KERNEL);
 	if (!ppe->foe_check_time)
 		return -ENOMEM;
 
-	err = airoha_ppe_flush_sram_entries(ppe);
+	ppe->soe_meta = devm_kzalloc(eth->dev,
+				     ppe_num_entries * sizeof(*ppe->soe_meta),
+				     GFP_KERNEL);
+	if (!ppe->soe_meta)
+		return -ENOMEM;
+
+	err = airoha_ppe_flush_entries(ppe);
 	if (err)
 		return err;
 
@@ -1622,6 +2172,8 @@ int airoha_ppe_init(struct airoha_eth *eth)
 	if (err)
 		goto error_l2_flow_table_destroy;
 
+	WRITE_ONCE(airoha_ppe_active, ppe);
+
 	return 0;
 
 error_l2_flow_table_destroy:
@@ -1636,6 +2188,8 @@ void airoha_ppe_deinit(struct airoha_eth *eth)
 {
 	struct airoha_npu *npu;
 
+	WRITE_ONCE(airoha_ppe_active, NULL);
+
 	mutex_lock(&flow_offload_mutex);
 
 	npu = rcu_replace_pointer(eth->npu, NULL,
diff --git a/include/linux/soc/airoha/airoha_offload.h b/include/linux/soc/airoha/airoha_offload.h
index 7589fccfeef6..120dbd274c89 100644
--- a/include/linux/soc/airoha/airoha_offload.h
+++ b/include/linux/soc/airoha/airoha_offload.h
@@ -11,7 +11,12 @@
 #include <linux/workqueue.h>
 
 enum {
+	PPE_CPU_REASON_UN_HIT = 0x0d,
+	PPE_CPU_REASON_HIT_UNBIND = 0x0e,
 	PPE_CPU_REASON_HIT_UNBIND_RATE_REACHED = 0x0f,
+	PPE_CPU_REASON_HIT_BIND_FORCE_CPU = 0x16,
+	PPE_CPU_REASON_HIT_BIND_EXCEED_MTU = 0x1c,
+	PPE_CPU_REASON_NOT_THROUGH_PPE = 0x1e,
 };
 
 struct airoha_ppe_dev {
-- 
2.53.0
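
airoha_ppe_foe_insert_entry() in the patch above first tries the entry's home bucket (L2 subflows allowed). It then tries neighbouring buckets at index+i and index-i for i = 1..window (L2 subflows skipped), stopping at the first successful commit. The window comes from soe_inline_force_commit_probe_window. The sketch below uses a hypothetical probe_order() helper that only lists the buckets visited, assuming a power-of-two table so that mask = entries - 1, as in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: records the bucket order probed by
 * airoha_ppe_foe_insert_entry() when no commit succeeds early.
 * index is assumed to be in range already (index <= mask).
 * Returns the number of entries written to out.
 */
static size_t probe_order(uint32_t index, uint32_t mask, uint32_t window,
			  uint32_t *out, size_t out_len)
{
	size_t n = 0;

	/* Same clamp as min_t(u32, window, mask) in the driver. */
	if (window > mask)
		window = mask;

	if (n < out_len)
		out[n++] = index;	/* home bucket */

	for (uint32_t i = 1; i <= window && n + 2 <= out_len; i++) {
		/* Unsigned wraparound plus the mask handles both edges. */
		out[n++] = (index + i) & mask;
		out[n++] = (index - i) & mask;
	}

	return n;
}
```

Because the window is clamped to mask rather than mask / 2, a large window visits some buckets twice, as the last test case shows. With the default window of 0, only the home bucket is probed.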

* [RFC PATCH net-next 5/7] net: airoha: add QDMA support for SOE packets
From: Jihong Min @ 2026-06-14  4:00 UTC
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

Add QDMA RX/TX plumbing for SOE packets, including the SOE RX ring,
coherent RX slots, SOE completion decoding, and the private QDMA submit
helper used by the SOE provider. Wire the Ethernet netdev feature and
lifetime hooks through the SOE stubs.

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 668 ++++++++++++++++++++---
 1 file changed, 591 insertions(+), 77 deletions(-)

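The sketch below models, in user space, how airoha_qdma_set_rx_done_irq() routes a queue's RX-done interrupt. Every bank whose pin mask contains BIT(qid) gets the bit. The interrupt register is picked by comparing qid with RX_DONE_HIGH_OFFSET. The SOE ring falls back to bank 0 when no mask covers it. Only the QDMA1 masks are copied from this patch. The QDMA0 RX_IRQn_BANK_PIN_MASK values and RX_DONE_HIGH_OFFSET live in airoha_regs.h and are not shown, so SKETCH_RX_DONE_HIGH_OFFSET = 16 is an assumption. Return values 1 and 2 stand for QDMA_INT_REG_IDX1 and QDMA_INT_REG_IDX2.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQ_BANKS 4
/* Assumption: real RX_DONE_HIGH_OFFSET is defined in airoha_regs.h. */
#define SKETCH_RX_DONE_HIGH_OFFSET 16
/* Matches AIROHA_SOE_RX_RING in this patch. */
#define SKETCH_SOE_RX_RING 10

/* QDMA1 masks, copied from the AIROHA_QDMA_WAN_RX_IRQn_BANK_PIN_MASK defines. */
static const uint32_t qdma1_rx_bank_mask[NUM_IRQ_BANKS] = {
	0x0000839f, 0x7f800400, 0x00000000, 0x00000040,
};

/* Return a bitmap of the IRQ banks that get RX queue @qid's done bit. */
static unsigned int rx_banks_for_queue(const uint32_t masks[NUM_IRQ_BANKS],
				       int qid)
{
	unsigned int banks = 0;

	assert(qid >= 0 && qid < 32);
	for (int i = 0; i < NUM_IRQ_BANKS; i++)
		if (masks[i] & (UINT32_C(1) << qid))
			banks |= 1u << i;

	/* The SOE ring falls back to bank 0 when no bank mask covers it. */
	if (!banks && qid == SKETCH_SOE_RX_RING)
		banks = 1u;

	return banks;
}

/* Low queues use INT register index 1, high queues index 2. */
static int rx_done_int_reg(int qid)
{
	return qid < SKETCH_RX_DONE_HIGH_OFFSET ? 1 : 2;
}

/* Bit position inside the selected register. */
static uint32_t rx_done_int_bit(int qid)
{
	return UINT32_C(1) << (qid % SKETCH_RX_DONE_HIGH_OFFSET);
}
```

With the QDMA1 table, ring 10 is already covered by bank 1 (0x7f800400 contains 0x400), so the bank-0 fallback only matters for a table that leaves ring 10 out.
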
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 5f1a118875fb..42fd30c12ed7 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -6,6 +6,7 @@
 #include <linux/of.h>
 #include <linux/of_net.h>
 #include <linux/of_reserved_mem.h>
+#include <linux/moduleparam.h>
 #include <linux/platform_device.h>
 #include <linux/tcp.h>
 #include <linux/u64_stats_sync.h>
@@ -16,6 +17,67 @@
 
 #include "airoha_regs.h"
 #include "airoha_eth.h"
+#include "airoha_soe.h"
+
+/* QDMA1 uses a different RX IRQ bank layout than QDMA0 on EN7581. */
+#define AIROHA_QDMA_WAN_RX_IRQ0_BANK_PIN_MASK 0x0000839f
+#define AIROHA_QDMA_WAN_RX_IRQ1_BANK_PIN_MASK 0x7f800400
+#define AIROHA_QDMA_WAN_RX_IRQ2_BANK_PIN_MASK 0x00000000
+#define AIROHA_QDMA_WAN_RX_IRQ3_BANK_PIN_MASK 0x00000040
+
+static unsigned int airoha_qdma0_rx_irq_bank_mask[AIROHA_MAX_NUM_IRQ_BANKS] = {
+	RX_IRQ0_BANK_PIN_MASK,
+	RX_IRQ1_BANK_PIN_MASK,
+	RX_IRQ2_BANK_PIN_MASK,
+	RX_IRQ3_BANK_PIN_MASK,
+};
+
+static unsigned int airoha_qdma1_rx_irq_bank_mask[AIROHA_MAX_NUM_IRQ_BANKS] = {
+	AIROHA_QDMA_WAN_RX_IRQ0_BANK_PIN_MASK,
+	AIROHA_QDMA_WAN_RX_IRQ1_BANK_PIN_MASK,
+	AIROHA_QDMA_WAN_RX_IRQ2_BANK_PIN_MASK,
+	AIROHA_QDMA_WAN_RX_IRQ3_BANK_PIN_MASK,
+};
+
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+#define AIROHA_SOE_RX_RING 10
+#define AIROHA_SOE_RX_ALLOC_NDESC 2048
+#define AIROHA_SOE_RX_NDESC_DEFAULT 512
+#define AIROHA_SOE_RX_BUF_SIZE 4096
+#define AIROHA_SOE_RX_BUF_SIZE_MIN 128
+#define AIROHA_SOE_RX_BUF_SIZE_MAX 16384
+#define AIROHA_SOE_DECRYPT_FOE_REASON_MASK \
+	(BIT(PPE_CPU_REASON_UN_HIT) | BIT(PPE_CPU_REASON_HIT_UNBIND) | \
+	 BIT(PPE_CPU_REASON_HIT_UNBIND_RATE_REACHED))
+
+static int airoha_soe_rx_poll_desc_budget = AIROHA_SOE_RX_NDESC_DEFAULT;
+static int airoha_soe_rx_max_frame_descs = 128;
+static int airoha_soe_probe_rx_ndesc = AIROHA_SOE_RX_NDESC_DEFAULT;
+static int airoha_soe_probe_rx_buf_size = AIROHA_SOE_RX_BUF_SIZE;
+static bool airoha_soe_probe_rx_coherent = true;
+static int airoha_soe_probe_rx_scatter = 1;
+
+module_param_named(soe_rx_poll_desc_budget, airoha_soe_rx_poll_desc_budget, int,
+		   0600);
+MODULE_PARM_DESC(soe_rx_poll_desc_budget,
+		 "Maximum SOE RX descriptors consumed per poll");
+module_param_named(soe_rx_max_frame_descs, airoha_soe_rx_max_frame_descs, int,
+		   0600);
+MODULE_PARM_DESC(soe_rx_max_frame_descs,
+		 "Maximum descriptors per SOE RX frame before dropping the chain");
+module_param_named(soe_probe_rx_ndesc, airoha_soe_probe_rx_ndesc, int, 0600);
+MODULE_PARM_DESC(soe_probe_rx_ndesc, "SOE RX descriptor count");
+module_param_named(soe_probe_rx_buf_size, airoha_soe_probe_rx_buf_size, int,
+		   0600);
+MODULE_PARM_DESC(soe_probe_rx_buf_size, "SOE RX buffer size");
+module_param_named(soe_probe_rx_coherent, airoha_soe_probe_rx_coherent, bool,
+		   0600);
+MODULE_PARM_DESC(soe_probe_rx_coherent, "Use coherent SOE RX buffers");
+module_param_named(soe_probe_rx_scatter, airoha_soe_probe_rx_scatter, int,
+		   0600);
+MODULE_PARM_DESC(soe_probe_rx_scatter,
+		 "SOE RX scatter mode: 0 disabled, 1 enabled");
+#endif
 
 u32 airoha_rr(void __iomem *base, u32 offset)
 {
@@ -71,6 +133,100 @@ static void airoha_qdma_irq_disable(struct airoha_irq_bank *irq_bank,
 	airoha_qdma_set_irqmask(irq_bank, index, mask, 0);
 }
 
+static unsigned int *airoha_qdma_rx_irq_bank_masks(struct airoha_qdma *qdma)
+{
+	struct airoha_eth *eth = qdma->eth;
+	int id = qdma - &eth->qdma[0];
+
+	return id ? airoha_qdma1_rx_irq_bank_mask :
+		    airoha_qdma0_rx_irq_bank_mask;
+}
+
+static u32 airoha_qdma_rx_irq_bank_mask(struct airoha_qdma *qdma, int bank)
+{
+	unsigned int *masks = airoha_qdma_rx_irq_bank_masks(qdma);
+
+	if (bank < 0 || bank >= AIROHA_MAX_NUM_IRQ_BANKS)
+		return 0;
+
+	return READ_ONCE(masks[bank]);
+}
+
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+static u32 airoha_qdma_rx_irq_all_bank_mask(struct airoha_qdma *qdma)
+{
+	u32 mask = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(qdma->irq_banks); i++)
+		mask |= airoha_qdma_rx_irq_bank_mask(qdma, i);
+
+	return mask;
+}
+#endif
+
+static void airoha_qdma_apply_rx_irq_bank_masks(struct airoha_qdma *qdma)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(qdma->irq_banks); i++) {
+		struct airoha_irq_bank *irq_bank = &qdma->irq_banks[i];
+		u32 mask = airoha_qdma_rx_irq_bank_mask(qdma, i);
+
+		airoha_qdma_set_irqmask(irq_bank, QDMA_INT_REG_IDX0,
+					RX_COHERENT_LOW_INT_MASK,
+					INT_RX0_MASK(mask));
+		airoha_qdma_set_irqmask(irq_bank, QDMA_INT_REG_IDX1,
+					RX_NO_CPU_DSCP_LOW_INT_MASK |
+						RX_DONE_LOW_INT_MASK,
+					INT_RX1_MASK(mask));
+		airoha_qdma_set_irqmask(irq_bank, QDMA_INT_REG_IDX2,
+					RX_NO_CPU_DSCP_HIGH_INT_MASK |
+						RX_DONE_HIGH_INT_MASK,
+					INT_RX2_MASK(mask));
+		airoha_qdma_set_irqmask(irq_bank, QDMA_INT_REG_IDX3,
+					RX_COHERENT_HIGH_INT_MASK,
+					INT_RX3_MASK(mask));
+	}
+}
+
+static void airoha_qdma_set_rx_done_irq(struct airoha_qdma *qdma, int qid,
+					bool enable)
+{
+	int i, intr_reg;
+	u32 mask;
+
+	intr_reg = qid < RX_DONE_HIGH_OFFSET ? QDMA_INT_REG_IDX1 :
+					       QDMA_INT_REG_IDX2;
+	mask = BIT(qid % RX_DONE_HIGH_OFFSET);
+
+	for (i = 0; i < ARRAY_SIZE(qdma->irq_banks); i++) {
+		if (!(BIT(qid) & airoha_qdma_rx_irq_bank_mask(qdma, i)))
+			continue;
+
+		if (enable)
+			airoha_qdma_irq_enable(&qdma->irq_banks[i], intr_reg,
+					       mask);
+		else
+			airoha_qdma_irq_disable(&qdma->irq_banks[i], intr_reg,
+						mask);
+	}
+
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (qid != AIROHA_SOE_RX_RING)
+		return;
+
+	if (BIT(qid) & airoha_qdma_rx_irq_all_bank_mask(qdma))
+		return;
+
+	/* SOE RX10 may not be covered by the shared bank mask; use bank 0. */
+	if (enable)
+		airoha_qdma_irq_enable(&qdma->irq_banks[0], intr_reg, mask);
+	else
+		airoha_qdma_irq_disable(&qdma->irq_banks[0], intr_reg, mask);
+#endif
+}
+
 static int airoha_set_macaddr(struct airoha_gdm_dev *dev, const u8 *addr)
 {
 	u8 ref_addr[ETH_ALEN] __aligned(2);
@@ -532,6 +688,11 @@ static int airoha_fe_init(struct airoha_eth *eth)
 	airoha_fe_wr(eth, REG_FE_CDM1_OQ_MAP2, BIT(4));
 	airoha_fe_wr(eth, REG_FE_CDM1_OQ_MAP3, BIT(28));
 
+	/* SOE/TDMA output depends on PSE shared-buffer flow control instead
+	 * of leaving port-local sharing disabled.
+	 */
+	airoha_fe_clear(eth, REG_PSE_FC_CFG, PSE_TDMA_SHARE_BUF_DIS_MASK);
+
 	airoha_fe_vip_setup(eth);
 	airoha_fe_pse_ports_init(eth);
 
@@ -597,20 +758,29 @@ static int airoha_qdma_fill_rx_queue(struct airoha_queue *q)
 		int offset;
 		u32 val;
 
-		page = page_pool_dev_alloc_frag(q->page_pool, &offset,
-						q->buf_size);
-		if (!page)
-			break;
+		if (q->rx_coherent) {
+			/* Coherent slots avoid page_pool recycling for SOE frames. */
+			offset = q->head * q->buf_size;
+			e->buf = q->rx_coherent_buf + offset;
+			e->dma_addr = q->rx_coherent_dma + offset;
+			e->dma_len = q->buf_size;
+		} else {
+			page = page_pool_dev_alloc_frag(q->page_pool, &offset,
+							q->buf_size);
+			if (!page)
+				break;
+
+			offset += AIROHA_RX_HEADROOM;
+			e->buf = page_address(page) + offset;
+			e->dma_addr = page_pool_get_dma_addr(page) + offset;
+			e->dma_len =
+				SKB_WITH_OVERHEAD(AIROHA_RX_LEN(q->buf_size));
+		}
 
 		q->head = (q->head + 1) % q->ndesc;
 		q->queued++;
 		nframes++;
 
-		offset += AIROHA_RX_HEADROOM;
-		e->buf = page_address(page) + offset;
-		e->dma_addr = page_pool_get_dma_addr(page) + offset;
-		e->dma_len = SKB_WITH_OVERHEAD(AIROHA_RX_LEN(q->buf_size));
-
 		val = FIELD_PREP(QDMA_DESC_LEN_MASK, e->dma_len);
 		WRITE_ONCE(desc->ctrl, cpu_to_le32(val));
 		WRITE_ONCE(desc->addr, cpu_to_le32(e->dma_addr));
@@ -652,92 +822,210 @@ airoha_qdma_get_gdm_dev(struct airoha_eth *eth, struct airoha_qdma_desc *desc)
 	return port->devs[d] ? port->devs[d] : ERR_PTR(-ENODEV);
 }
 
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+static bool airoha_qdma_rx_is_soe(u32 msg0)
+{
+	u32 hop_flags = FIELD_GET(QDMA_ETH_RXMSG_HOP_FLAGS_MASK, msg0);
+
+	return hop_flags >= 3 && hop_flags <= 7;
+}
+#endif
+
 static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
 {
-	enum dma_data_direction dir = page_pool_get_dma_dir(q->page_pool);
+	enum dma_data_direction dir;
 	struct airoha_qdma *qdma = q->qdma;
 	struct airoha_eth *eth = qdma->eth;
 	int qid = q - &qdma->q_rx[0];
+	int desc_budget = q->ndesc;
+	int desc_done = 0;
 	int done = 0;
 
-	while (done < budget) {
+	dir = q->rx_coherent ? DMA_FROM_DEVICE :
+			       page_pool_get_dma_dir(q->page_pool);
+
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (airoha_soe_rx_poll_desc_budget > 0)
+		desc_budget = min(airoha_soe_rx_poll_desc_budget, q->ndesc);
+#endif
+
+	while (q->queued > 0 && done < budget && desc_done < desc_budget) {
 		struct airoha_queue_entry *e = &q->entry[q->tail];
 		struct airoha_qdma_desc *desc = &q->desc[q->tail];
-		u32 hash, reason, msg1, desc_ctrl;
-		struct airoha_gdm_dev *dev;
+		u32 hash, reason, msg0, msg1, msg2, desc_ctrl;
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+		/* Scattered SOE completions only tag the first descriptor. */
+		bool partial_soe = q->skb && !q->skb->dev;
+#endif
+		struct airoha_gdm_dev *dev = NULL;
+		struct net_device *rx_dev = NULL;
 		struct net_device *netdev;
+		bool soe_pkt = false;
 		int data_len, len;
-		struct page *page;
+		struct page *page = NULL;
 
 		desc_ctrl = le32_to_cpu(READ_ONCE(desc->ctrl));
 		if (!(desc_ctrl & QDMA_DESC_DONE_MASK))
 			break;
 
 		dma_rmb();
+		desc_done++;
 
 		q->tail = (q->tail + 1) % q->ndesc;
 		q->queued--;
 
-		dma_sync_single_for_cpu(eth->dev, e->dma_addr, e->dma_len,
-					dir);
+		if (!q->rx_coherent)
+			dma_sync_single_for_cpu(eth->dev, e->dma_addr,
+						e->dma_len, dir);
+
+		if (!q->rx_coherent)
+			page = virt_to_head_page(e->buf);
+
+		if (q->rx_drop_chain) {
+			if (!FIELD_GET(QDMA_DESC_MORE_MASK, desc_ctrl)) {
+				q->rx_drop_chain = false;
+				q->rx_frame_descs = 0;
+				done++;
+			}
+			if (!q->rx_coherent)
+				page_pool_put_full_page(q->page_pool, page,
+							true);
+			continue;
+		}
 
-		page = virt_to_head_page(e->buf);
 		len = FIELD_GET(QDMA_DESC_LEN_MASK, desc_ctrl);
-		data_len = q->skb ? AIROHA_RX_LEN(q->buf_size) : e->dma_len;
+		data_len = q->skb && !q->rx_coherent ?
+				   AIROHA_RX_LEN(q->buf_size) :
+				   e->dma_len;
 		if (!len || data_len < len)
 			goto free_frag;
 
-		dev = airoha_qdma_get_gdm_dev(eth, desc);
-		if (IS_ERR(dev))
-			goto free_frag;
+		msg0 = le32_to_cpu(READ_ONCE(desc->msg0));
+		msg1 = le32_to_cpu(READ_ONCE(desc->msg1));
+		msg2 = le32_to_cpu(READ_ONCE(desc->msg2));
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+		soe_pkt = partial_soe || airoha_qdma_rx_is_soe(msg0);
+#endif
+		if (!soe_pkt) {
+			dev = airoha_qdma_get_gdm_dev(eth, desc);
+			if (IS_ERR(dev))
+				goto free_frag;
+			netdev = netdev_from_priv(dev);
+			rx_dev = netdev;
+		}
 
-		netdev = netdev_from_priv(dev);
 		if (!q->skb) { /* first buffer */
-			q->skb = napi_build_skb(e->buf - AIROHA_RX_HEADROOM,
-						q->buf_size);
+			if (q->rx_coherent) {
+				q->skb = napi_alloc_skb(&q->napi, len);
+				if (q->skb)
+					skb_put_data(q->skb, e->buf, len);
+			} else {
+				void *buf = e->buf - AIROHA_RX_HEADROOM;
+
+				q->skb = napi_build_skb(buf, q->buf_size);
+			}
 			if (!q->skb)
 				goto free_frag;
 
-			skb_reserve(q->skb, AIROHA_RX_HEADROOM);
-			__skb_put(q->skb, len);
-			skb_mark_for_recycle(q->skb);
-			q->skb->dev = netdev;
-			q->skb->protocol = eth_type_trans(q->skb, netdev);
-			q->skb->ip_summed = CHECKSUM_UNNECESSARY;
+			q->rx_drop_chain = false;
+			q->rx_frame_descs = 1;
+			if (!q->rx_coherent) {
+				skb_reserve(q->skb, AIROHA_RX_HEADROOM);
+				__skb_put(q->skb, len);
+				skb_mark_for_recycle(q->skb);
+			}
+			q->skb->dev = soe_pkt ? NULL : netdev;
+			q->skb->ip_summed = soe_pkt ? CHECKSUM_NONE :
+						      CHECKSUM_UNNECESSARY;
 			skb_record_rx_queue(q->skb, qid);
+			if (soe_pkt) {
+				q->soe_rx_msg0 = msg0;
+				q->soe_rx_msg2 = msg2;
+			}
+			if (!soe_pkt)
+				q->skb->protocol = eth_type_trans(q->skb,
+								  netdev);
 		} else { /* scattered frame */
-			struct skb_shared_info *shinfo = skb_shinfo(q->skb);
-			int nr_frags = shinfo->nr_frags;
-
-			if (nr_frags >= ARRAY_SIZE(shinfo->frags))
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+			if (soe_pkt && airoha_soe_rx_max_frame_descs > 0 &&
+			    q->rx_frame_descs >=
+				    airoha_soe_rx_max_frame_descs) {
+				q->rx_drop_chain =
+					FIELD_GET(QDMA_DESC_MORE_MASK, desc_ctrl);
 				goto free_frag;
-
-			skb_add_rx_frag(q->skb, nr_frags, page,
-					e->buf - page_address(page), len,
-					q->buf_size);
+			}
+#endif
+			q->rx_frame_descs++;
+			if (q->rx_coherent) {
+				if (skb_tailroom(q->skb) < len) {
+					unsigned int needed;
+
+					needed = len - skb_tailroom(q->skb);
+					if (pskb_expand_head(q->skb, 0, needed,
+							     GFP_ATOMIC))
+						goto free_frag;
+				}
+				skb_put_data(q->skb, e->buf, len);
+			} else {
+				struct skb_shared_info *shinfo =
+					skb_shinfo(q->skb);
+				int nr_frags = shinfo->nr_frags;
+
+				if (nr_frags >= ARRAY_SIZE(shinfo->frags))
+					goto free_frag;
+
+				skb_add_rx_frag(q->skb, nr_frags, page,
+						e->buf - page_address(page),
+						len, q->buf_size);
+			}
 		}
 
 		if (FIELD_GET(QDMA_DESC_MORE_MASK, desc_ctrl))
 			continue;
 
+		q->rx_drop_chain = false;
+		q->rx_frame_descs = 0;
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+		if (soe_pkt) {
+			u32 hop_flags = FIELD_GET(QDMA_ETH_RXMSG_HOP_FLAGS_MASK,
+						  q->soe_rx_msg0);
+			u32 sa_index = FIELD_GET(QDMA_ETH_RXMSG_SW_UDF_MASK,
+						 q->soe_rx_msg2);
+
+			done++;
+			if (!airoha_soe_rx_skb(eth->soe, q->skb, sa_index,
+					       hop_flags))
+				dev_kfree_skb(q->skb);
+			q->skb = NULL;
+			continue;
+		}
+#endif
+
 		if (netdev_uses_dsa(netdev)) {
 			struct airoha_gdm_port *port = dev->port;
+			struct dsa_port *cpu_dp = netdev->dsa_ptr;
+			u32 sptag = FIELD_GET(QDMA_ETH_RXMSG_SPTAG, msg0);
 
 			/* PPE module requires untagged packets to work
 			 * properly and it provides DSA port index via the
 			 * DMA descriptor. Report DSA tag to the DSA stack
 			 * via skb dst info.
 			 */
-			u32 msg0 = le32_to_cpu(READ_ONCE(desc->msg0));
-			u32 sptag = FIELD_GET(QDMA_ETH_RXMSG_SPTAG, msg0);
-
 			if (sptag < ARRAY_SIZE(port->dsa_meta) &&
 			    port->dsa_meta[sptag])
 				skb_dst_set_noref(q->skb,
 						  &port->dsa_meta[sptag]->dst);
+
+			if (cpu_dp && cpu_dp->ds) {
+				struct dsa_port *dp =
+					dsa_to_port(cpu_dp->ds, sptag);
+
+				if (dp && dsa_port_is_user(dp) &&
+				    dp->cpu_dp == cpu_dp && dp->user)
+					rx_dev = dp->user;
+			}
 		}
 
-		msg1 = le32_to_cpu(READ_ONCE(desc->msg1));
 		hash = FIELD_GET(AIROHA_RXD4_FOE_ENTRY, msg1);
 		if (hash != AIROHA_RXD4_FOE_ENTRY)
 			skb_set_hash(q->skb, jhash_1word(hash, 0),
@@ -749,18 +1037,54 @@ static int airoha_qdma_rx_process(struct airoha_queue *q, int budget)
 					     false);
 
 		done++;
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+		/* Native ESP/NAT-T packets enter through the normal RX path
+		 * first. If they match an inbound packet-offload SA, consume
+		 * the encrypted skb here and resubmit it to SOE before GRO
+		 * takes ownership of the packet. SOE decrypts the original
+		 * ESP/NAT-T packet; after GRO or the normal RX stack processes
+		 * the skb, it is no longer a suitable hardware decrypt
+		 * candidate. Keep the RX FOE hash/reason so the decrypt
+		 * completion can later bind the PPE flow after egress is known.
+		 */
+		if (hash != AIROHA_RXD4_FOE_ENTRY) {
+			bool foe_valid = false;
+
+			if (reason < 32)
+				foe_valid = AIROHA_SOE_DECRYPT_FOE_REASON_MASK &
+					    BIT(reason);
+			if (airoha_soe_rx_plain_skb(dev, q->skb, rx_dev, hash,
+						    reason, foe_valid)) {
+				q->skb = NULL;
+				continue;
+			}
+		} else if (airoha_soe_rx_plain_skb(dev, q->skb, rx_dev, hash,
+						   reason, false)) {
+			q->skb = NULL;
+			continue;
+		}
+#endif
 		napi_gro_receive(&q->napi, q->skb);
 		q->skb = NULL;
 		continue;
 free_frag:
+		q->rx_drop_chain = !!FIELD_GET(QDMA_DESC_MORE_MASK, desc_ctrl);
+		q->rx_frame_descs = 0;
+		done++;
 		if (q->skb) {
 			dev_kfree_skb(q->skb);
 			q->skb = NULL;
 		}
-		page_pool_put_full_page(q->page_pool, page, true);
+		if (!q->rx_coherent)
+			page_pool_put_full_page(q->page_pool, page, true);
 	}
 	airoha_qdma_fill_rx_queue(q);
 
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (desc_done && !done)
+		return 1;
+#endif
+
 	return done;
 }
 
@@ -776,17 +1100,9 @@ static int airoha_qdma_rx_napi_poll(struct napi_struct *napi, int budget)
 
 	if (done < budget && napi_complete(napi)) {
 		struct airoha_qdma *qdma = q->qdma;
-		int i, qid = q - &qdma->q_rx[0];
-		int intr_reg = qid < RX_DONE_HIGH_OFFSET ? QDMA_INT_REG_IDX1
-							 : QDMA_INT_REG_IDX2;
-
-		for (i = 0; i < ARRAY_SIZE(qdma->irq_banks); i++) {
-			if (!(BIT(qid) & RX_IRQ_BANK_PIN_MASK(i)))
-				continue;
+		int qid = q - &qdma->q_rx[0];
 
-			airoha_qdma_irq_enable(&qdma->irq_banks[i], intr_reg,
-					       BIT(qid % RX_DONE_HIGH_OFFSET));
-		}
+		airoha_qdma_set_rx_done_irq(qdma, qid, true);
 	}
 
 	return done;
@@ -795,7 +1111,7 @@ static int airoha_qdma_rx_napi_poll(struct napi_struct *napi, int budget)
 static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 				     struct airoha_qdma *qdma, int ndesc)
 {
-	const struct page_pool_params pp_params = {
+	struct page_pool_params pp_params = {
 		.order = 0,
 		.pool_size = 256,
 		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
@@ -809,8 +1125,21 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 	int qid = q - &qdma->q_rx[0], thr;
 	dma_addr_t dma_addr;
 
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (qid == AIROHA_SOE_RX_RING) {
+		ndesc = max(ndesc, AIROHA_SOE_RX_ALLOC_NDESC);
+		if (airoha_soe_probe_rx_ndesc > 0)
+			ndesc = clamp(airoha_soe_probe_rx_ndesc, 1,
+				      AIROHA_SOE_RX_ALLOC_NDESC);
+	}
+#endif
+
 	q->buf_size = PAGE_SIZE / 2;
 	q->qdma = qdma;
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (qid == AIROHA_SOE_RX_RING)
+		q->rx_coherent = airoha_soe_probe_rx_coherent;
+#endif
 
 	q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
 				GFP_KERNEL);
@@ -821,19 +1150,45 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 				      &dma_addr, GFP_KERNEL);
 	if (!q->desc)
 		return -ENOMEM;
+	q->desc_dma = dma_addr;
 
-	q->page_pool = page_pool_create(&pp_params);
-	if (IS_ERR(q->page_pool)) {
-		int err = PTR_ERR(q->page_pool);
+	if (!q->rx_coherent) {
+		q->page_pool = page_pool_create(&pp_params);
+		if (IS_ERR(q->page_pool)) {
+			int err = PTR_ERR(q->page_pool);
 
-		q->page_pool = NULL;
-		return err;
+			q->page_pool = NULL;
+			return err;
+		}
+	}
+
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (qid == AIROHA_SOE_RX_RING) {
+		size_t buf_size;
+		int max_buf_size;
+
+		q->rx_alloc_ndesc = ndesc;
+		max_buf_size = q->rx_coherent ? AIROHA_SOE_RX_BUF_SIZE_MAX :
+						AIROHA_SOE_RX_BUF_SIZE;
+		q->buf_size = clamp(airoha_soe_probe_rx_buf_size,
+				    AIROHA_SOE_RX_BUF_SIZE_MIN, max_buf_size);
+		if (q->rx_coherent) {
+			buf_size = q->buf_size * ndesc;
+			q->rx_coherent_buf =
+				dmam_alloc_coherent(eth->dev, buf_size,
+						    &q->rx_coherent_dma,
+						    GFP_KERNEL);
+			if (!q->rx_coherent_buf)
+				return -ENOMEM;
+			q->rx_coherent_buf_size = buf_size;
+		}
 	}
+#endif
 
 	q->ndesc = ndesc;
 	netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
 
-	airoha_qdma_wr(qdma, REG_RX_RING_BASE(qid), dma_addr);
+	airoha_qdma_wr(qdma, REG_RX_RING_BASE(qid), q->desc_dma);
 	airoha_qdma_rmw(qdma, REG_RX_RING_SIZE(qid),
 			RX_RING_SIZE_MASK,
 			FIELD_PREP(RX_RING_SIZE_MASK, ndesc));
@@ -843,7 +1198,14 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 			FIELD_PREP(RX_RING_THR_MASK, thr));
 	airoha_qdma_rmw(qdma, REG_RX_DMA_IDX(qid), RX_RING_DMA_IDX_MASK,
 			FIELD_PREP(RX_RING_DMA_IDX_MASK, q->head));
-	airoha_qdma_set(qdma, REG_RX_SCATTER_CFG(qid), RX_RING_SG_EN_MASK);
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (qid == AIROHA_SOE_RX_RING && !airoha_soe_probe_rx_scatter)
+		airoha_qdma_clear(qdma, REG_RX_SCATTER_CFG(qid),
+				  RX_RING_SG_EN_MASK);
+	else
+#endif
+		airoha_qdma_set(qdma, REG_RX_SCATTER_CFG(qid),
+				RX_RING_SG_EN_MASK);
 
 	airoha_qdma_fill_rx_queue(q);
 
@@ -859,11 +1221,14 @@ static void airoha_qdma_cleanup_rx_queue(struct airoha_queue *q)
 	while (q->queued) {
 		struct airoha_queue_entry *e = &q->entry[q->tail];
 		struct airoha_qdma_desc *desc = &q->desc[q->tail];
-		struct page *page = virt_to_head_page(e->buf);
 
-		dma_sync_single_for_cpu(eth->dev, e->dma_addr, e->dma_len,
-					page_pool_get_dma_dir(q->page_pool));
-		page_pool_put_full_page(q->page_pool, page, false);
+		if (!q->rx_coherent) {
+			struct page *page = virt_to_head_page(e->buf);
+
+			dma_sync_single_for_cpu(eth->dev, e->dma_addr, e->dma_len,
+						page_pool_get_dma_dir(q->page_pool));
+			page_pool_put_full_page(q->page_pool, page, false);
+		}
 		/* Reset DMA descriptor */
 		WRITE_ONCE(desc->ctrl, 0);
 		WRITE_ONCE(desc->addr, 0);
@@ -1045,8 +1410,24 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 			airoha_qdma_rmw(qdma, REG_IRQ_CLEAR_LEN(id),
 					IRQ_CLEAR_LEN_MASK, 0x80);
 		airoha_qdma_rmw(qdma, REG_IRQ_CLEAR_LEN(id),
-				IRQ_CLEAR_LEN_MASK, (done & 0x7f));
+				IRQ_CLEAR_LEN_MASK,
+				(done & 0x7f));
+	}
+
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (done && airoha_soe_has_pending_rx(eth->soe)) {
+		int i;
+
+		/* SOE decrypt completions can lag behind TX cleanup IRQs. */
+		for (i = 0; i < ARRAY_SIZE(eth->qdma); i++) {
+			struct airoha_queue *rxq =
+				&eth->qdma[i].q_rx[AIROHA_SOE_RX_RING];
+
+			if (rxq->ndesc)
+				napi_schedule(&rxq->napi);
+		}
 	}
+#endif
 
 	if (done < budget && napi_complete(napi))
 		airoha_qdma_irq_enable(&qdma->irq_banks[0], QDMA_INT_REG_IDX0,
@@ -1346,16 +1727,11 @@ static int airoha_qdma_hw_init(struct airoha_qdma *qdma)
 	for (i = 0; i < ARRAY_SIZE(qdma->irq_banks); i++) {
 		/* clear pending irqs */
 		airoha_qdma_wr(qdma, REG_INT_STATUS(i), 0xffffffff);
-		/* setup rx irqs */
-		airoha_qdma_irq_enable(&qdma->irq_banks[i], QDMA_INT_REG_IDX0,
-				       INT_RX0_MASK(RX_IRQ_BANK_PIN_MASK(i)));
-		airoha_qdma_irq_enable(&qdma->irq_banks[i], QDMA_INT_REG_IDX1,
-				       INT_RX1_MASK(RX_IRQ_BANK_PIN_MASK(i)));
-		airoha_qdma_irq_enable(&qdma->irq_banks[i], QDMA_INT_REG_IDX2,
-				       INT_RX2_MASK(RX_IRQ_BANK_PIN_MASK(i)));
-		airoha_qdma_irq_enable(&qdma->irq_banks[i], QDMA_INT_REG_IDX3,
-				       INT_RX3_MASK(RX_IRQ_BANK_PIN_MASK(i)));
 	}
+	airoha_qdma_apply_rx_irq_bank_masks(qdma);
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	airoha_qdma_set_rx_done_irq(qdma, AIROHA_SOE_RX_RING, true);
+#endif
 	/* setup tx irqs */
 	airoha_qdma_irq_enable(&qdma->irq_banks[0], QDMA_INT_REG_IDX0,
 			       TX_COHERENT_LOW_INT_MASK | INT_TX_MASK);
@@ -2176,6 +2552,110 @@ int airoha_get_fe_port(struct airoha_gdm_dev *dev)
 	}
 }
 
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+int airoha_qdma_xmit_skb(struct airoha_gdm_dev *dev, struct sk_buff *skb,
+			 u32 msg0, u32 msg1, u32 msg2)
+{
+	struct net_device *netdev = netdev_from_priv(dev);
+	struct airoha_qdma *qdma = dev->qdma;
+	u32 nr_frags, len;
+	struct airoha_queue_entry *e;
+	struct netdev_queue *txq;
+	struct airoha_queue *q;
+	LIST_HEAD(tx_list);
+	int i = 0, qid;
+	void *data;
+	u16 index;
+
+	qid = airoha_qdma_get_txq(qdma, skb_get_queue_mapping(skb));
+	q = &qdma->q_tx[qid];
+	if (WARN_ON_ONCE(!q->ndesc))
+		return -ENODEV;
+
+	spin_lock_bh(&q->lock);
+
+	txq = skb_get_tx_queue(netdev, skb);
+	nr_frags = 1 + skb_shinfo(skb)->nr_frags;
+	if (q->queued + nr_frags >= q->ndesc) {
+		netif_tx_stop_queue(txq);
+		q->txq_stopped = true;
+		spin_unlock_bh(&q->lock);
+		return -EBUSY;
+	}
+
+	len = skb_headlen(skb);
+	data = skb->data;
+
+	e = list_first_entry(&q->tx_list, struct airoha_queue_entry, list);
+	index = e - q->entry;
+
+	while (true) {
+		struct airoha_qdma_desc *desc = &q->desc[index];
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+		dma_addr_t addr;
+		u32 val;
+
+		addr = dma_map_single(netdev->dev.parent, data, len,
+				      DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(netdev->dev.parent, addr)))
+			goto error_unmap;
+
+		list_move_tail(&e->list, &tx_list);
+		e->skb = i == nr_frags - 1 ? skb : NULL;
+		e->dma_addr = addr;
+		e->dma_len = len;
+
+		e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
+				     list);
+		index = e - q->entry;
+
+		val = FIELD_PREP(QDMA_DESC_LEN_MASK, len);
+		if (i < nr_frags - 1)
+			val |= FIELD_PREP(QDMA_DESC_MORE_MASK, 1);
+		WRITE_ONCE(desc->ctrl, cpu_to_le32(val));
+		WRITE_ONCE(desc->addr, cpu_to_le32(addr));
+		val = FIELD_PREP(QDMA_DESC_NEXT_ID_MASK, index);
+		WRITE_ONCE(desc->data, cpu_to_le32(val));
+		WRITE_ONCE(desc->msg0, cpu_to_le32(msg0));
+		WRITE_ONCE(desc->msg1, cpu_to_le32(msg1));
+		WRITE_ONCE(desc->msg2, cpu_to_le32(msg2));
+
+		if (++i == nr_frags)
+			break;
+
+		data = skb_frag_address(frag);
+		len = skb_frag_size(frag);
+	}
+	q->queued += i;
+
+	skb_tx_timestamp(skb);
+	netdev_tx_sent_queue(txq, skb->len);
+	if (q->ndesc - q->queued < q->free_thr) {
+		netif_tx_stop_queue(txq);
+		q->txq_stopped = true;
+	}
+
+	/* SOE submits do not run in the regular ndo_start_xmit batching path. */
+	airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
+			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
+
+	spin_unlock_bh(&q->lock);
+
+	return 0;
+
+error_unmap:
+	list_for_each_entry(e, &tx_list, list) {
+		dma_unmap_single(netdev->dev.parent, e->dma_addr, e->dma_len,
+				 DMA_TO_DEVICE);
+		e->dma_addr = 0;
+	}
+	list_splice(&tx_list, &q->tx_list);
+	spin_unlock_bh(&q->lock);
+
+	return -ENOMEM;
+}
+#endif
+
 static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 				   struct net_device *netdev)
 {
@@ -2185,6 +2665,7 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	struct airoha_queue_entry *e;
 	struct netdev_queue *txq;
 	struct airoha_queue *q;
+	bool soe_decrypt_skb = false;
 	LIST_HEAD(tx_list);
 	int i = 0, qid;
 	void *data;
@@ -2223,6 +2704,11 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	       FIELD_PREP(QDMA_ETH_TXMSG_FPORT_MASK, fport) |
 	       FIELD_PREP(QDMA_ETH_TXMSG_METER_MASK, 0x7f);
 
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (dev->eth->ppe)
+		soe_decrypt_skb = airoha_ppe_soe_skb_marked(skb);
+#endif
+
 	q = &qdma->q_tx[qid];
 	if (WARN_ON_ONCE(!q->ndesc))
 		goto error;
@@ -2293,13 +2779,24 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 		q->txq_stopped = true;
 	}
 
-	if (netif_xmit_stopped(txq) || !netdev_xmit_more())
+	if (netif_xmit_stopped(txq) || !netdev_xmit_more() ||
+	    soe_decrypt_skb)
 		airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid),
 				TX_RING_CPU_IDX_MASK,
 				FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
 
 	spin_unlock_bh(&q->lock);
 
+	/* SOE decrypt flow binding needs the final egress netdev and QDMA
+	 * descriptor context. The SOE RX path marks only candidate packets; bind
+	 * only after the current plaintext packet has been queued and kicked so
+	 * the newly bound decrypt entry cannot race the CPU-transmitted packet.
+	 */
+#if IS_REACHABLE(CONFIG_NET_AIROHA_SOE)
+	if (soe_decrypt_skb)
+		airoha_ppe_soe_xmit_skb(&dev->eth->ppe->dev, skb, netdev);
+#endif
+
 	return NETDEV_TX_OK;
 
 error_unmap:
@@ -3096,6 +3593,12 @@ static int airoha_dev_tc_setup(struct net_device *dev,
 	}
 }
 
+static int airoha_dev_set_features(struct net_device *netdev,
+				   netdev_features_t features)
+{
+	return airoha_soe_set_features(netdev, features);
+}
+
 static const struct net_device_ops airoha_netdev_ops = {
 	.ndo_init		= airoha_dev_init,
 	.ndo_open		= airoha_dev_open,
@@ -3105,6 +3608,7 @@ static const struct net_device_ops airoha_netdev_ops = {
 	.ndo_start_xmit		= airoha_dev_xmit,
 	.ndo_get_stats64        = airoha_dev_get_stats64,
 	.ndo_set_mac_address	= airoha_dev_set_macaddr,
+	.ndo_set_features	= airoha_dev_set_features,
 	.ndo_setup_tc		= airoha_dev_tc_setup,
 };
 
@@ -3230,6 +3734,7 @@ static int airoha_alloc_gdm_device(struct airoha_eth *eth,
 	dev->eth = eth;
 	dev->nbq = nbq;
 	port->devs[index] = dev;
+	airoha_soe_build_netdev(netdev, airoha_qdma_xmit_skb);
 
 	return 0;
 }
@@ -3409,10 +3914,14 @@ static int airoha_probe(struct platform_device *pdev)
 	strscpy(eth->napi_dev->name, "qdma_eth", sizeof(eth->napi_dev->name));
 	platform_set_drvdata(pdev, eth);
 
-	err = airoha_hw_init(pdev, eth);
+	err = airoha_soe_init(eth);
 	if (err)
 		goto error_netdev_free;
 
+	err = airoha_hw_init(pdev, eth);
+	if (err)
+		goto error_soe_deinit;
+
 	for (i = 0; i < ARRAY_SIZE(eth->qdma); i++)
 		airoha_qdma_start_napi(&eth->qdma[i]);
 
@@ -3457,11 +3966,14 @@ static int airoha_probe(struct platform_device *pdev)
 			netdev = netdev_from_priv(dev);
 			if (netdev->reg_state == NETREG_REGISTERED)
 				unregister_netdev(netdev);
+			airoha_soe_teardown_netdev(netdev);
 			of_node_put(netdev->dev.of_node);
 		}
 		airoha_metadata_dst_free(port);
 	}
 	airoha_hw_cleanup(eth);
+error_soe_deinit:
+	airoha_soe_deinit(eth);
 error_netdev_free:
 	free_netdev(eth->napi_dev);
 	platform_set_drvdata(pdev, NULL);
@@ -3492,12 +4004,14 @@ static void airoha_remove(struct platform_device *pdev)
 				continue;
 
 			netdev = netdev_from_priv(dev);
+			airoha_soe_teardown_netdev(netdev);
 			unregister_netdev(netdev);
 			of_node_put(netdev->dev.of_node);
 		}
 		airoha_metadata_dst_free(port);
 	}
 	airoha_hw_cleanup(eth);
+	airoha_soe_deinit(eth);
 
 	free_netdev(eth->napi_dev);
 	platform_set_drvdata(pdev, NULL);
-- 
2.53.0

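The descriptor-chain handling that this patch adds to airoha_qdma_rx_process() behaves like a small state machine. A frame grows one descriptor at a time while QDMA_DESC_MORE_MASK is set. An SOE frame that would exceed soe_rx_max_frame_descs is freed, and the rest of that chain is discarded until a descriptor arrives without the MORE bit. The sketch below is a hypothetical model: the struct, enum and function names are invented. Unlike the driver, it applies the cap to every frame, not only to SOE completions.

```c
#include <assert.h>
#include <stdbool.h>

struct rx_chain_state {
	int frame_descs;	/* descriptors in the frame being assembled */
	bool drop_chain;	/* discarding the tail of a dropped frame */
};

enum rx_verdict {
	RX_PENDING,		/* descriptor appended, more to come */
	RX_DELIVER,		/* last descriptor, frame complete */
	RX_DROPPED,		/* frame exceeded the cap and was freed */
	RX_DISCARD_TAIL,	/* descriptor belongs to an already dropped frame */
};

/* Feed one completed descriptor; max_descs <= 0 disables the cap. */
static enum rx_verdict rx_chain_step(struct rx_chain_state *s, bool more,
				     int max_descs)
{
	if (s->drop_chain) {
		if (!more)
			s->drop_chain = false;	/* chain fully drained */
		return RX_DISCARD_TAIL;
	}

	/* Only a continuation descriptor can push a frame over the cap. */
	if (s->frame_descs > 0 && max_descs > 0 &&
	    s->frame_descs >= max_descs) {
		s->drop_chain = more;
		s->frame_descs = 0;
		return RX_DROPPED;
	}

	s->frame_descs++;
	if (more)
		return RX_PENDING;

	s->frame_descs = 0;
	return RX_DELIVER;
}
```

With a cap of 2, a frame of exactly two descriptors is delivered, while a third descriptor drops the frame and drains its tail.
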

* [RFC PATCH net-next 4/7] net: airoha: add SOE registers and driver state
From: Jihong Min @ 2026-06-14  4:00 UTC
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

Add the FE/PPE/QDMA register definitions and shared driver state needed
by the Secure Offload Engine. Add a small SOE-facing header with stubs so
later Ethernet and PPE changes remain buildable until the provider is
enabled.

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.h  |  40 +++++++
 drivers/net/ethernet/airoha/airoha_regs.h |  16 +++
 drivers/net/ethernet/airoha/airoha_soe.h  | 126 ++++++++++++++++++++++
 3 files changed, 182 insertions(+)
 create mode 100644 drivers/net/ethernet/airoha/airoha_soe.h

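This patch adds coherent SOE RX fields to struct airoha_queue (rx_alloc_ndesc, rx_coherent_buf, rx_coherent_dma, rx_coherent_buf_size). Patch 5/7 fills them in from airoha_qdma_init_rx_queue(). The ring and buffer sizes come from the soe_probe_rx_ndesc and soe_probe_rx_buf_size module parameters after clamping. One coherent area of buf_size * ndesc bytes is then allocated, and airoha_qdma_fill_rx_queue() points slot head at offset head * buf_size. The user-space sketch below reproduces that arithmetic. The limits are copied from the AIROHA_SOE_RX_* defines. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits copied from the AIROHA_SOE_RX_* defines in patch 5/7. */
#define SOE_RX_ALLOC_NDESC	2048
#define SOE_RX_BUF_SIZE		4096
#define SOE_RX_BUF_SIZE_MIN	128
#define SOE_RX_BUF_SIZE_MAX	16384

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : v > hi ? hi : v;
}

struct soe_ring_layout {
	int ndesc;
	int buf_size;
	size_t coherent_bytes;	/* 0 when page_pool fragments are used */
};

/* Apply the same max()/clamp() sequence as airoha_qdma_init_rx_queue(). */
static struct soe_ring_layout soe_ring_plan(int ndesc, int param_ndesc,
					    int param_buf_size, bool coherent)
{
	struct soe_ring_layout l;
	int max_buf = coherent ? SOE_RX_BUF_SIZE_MAX : SOE_RX_BUF_SIZE;

	if (ndesc < SOE_RX_ALLOC_NDESC)
		ndesc = SOE_RX_ALLOC_NDESC;
	if (param_ndesc > 0)
		ndesc = clamp_int(param_ndesc, 1, SOE_RX_ALLOC_NDESC);

	l.ndesc = ndesc;
	l.buf_size = clamp_int(param_buf_size, SOE_RX_BUF_SIZE_MIN, max_buf);
	l.coherent_bytes = coherent ? (size_t)l.buf_size * ndesc : 0;
	return l;
}

/* Byte offset of slot @head; the CPU and DMA addresses share it. */
static size_t soe_slot_offset(const struct soe_ring_layout *l, int head)
{
	assert(head >= 0 && head < l->ndesc);
	return (size_t)head * l->buf_size;
}
```

With the default parameters (512 coherent descriptors of 4096 bytes), the ring needs 2 MiB of coherent memory.
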
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 46b1c31939de..c5f09aedd7e7 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -16,6 +16,8 @@
 #include <linux/soc/airoha/airoha_offload.h>
 #include <net/dsa.h>
 
+struct airoha_soe;
+
 #define AIROHA_MAX_NUM_GDM_PORTS	4
 #define AIROHA_MAX_NUM_GDM_DEVS		2
 #define AIROHA_MAX_NUM_QDMA		2
@@ -189,18 +191,31 @@ struct airoha_queue {
 	spinlock_t lock;
 	struct airoha_queue_entry *entry;
 	struct airoha_qdma_desc *desc;
+	/* Preserved for RX ring reprogramming; dmam_alloc_coherent() owns it. */
+	dma_addr_t desc_dma;
 	u16 head;
 	u16 tail;
 
 	int queued;
 	int ndesc;
+	/* SOE RX rings may use coherent slots instead of page_pool fragments. */
+	int rx_alloc_ndesc;
 	int free_thr;
 	int buf_size;
 	bool txq_stopped;
+	bool rx_coherent;
+	bool rx_drop_chain;
+	void *rx_coherent_buf;
+	dma_addr_t rx_coherent_dma;
+	size_t rx_coherent_buf_size;
 
 	struct napi_struct napi;
 	struct page_pool *page_pool;
 	struct sk_buff *skb;
+	/* First SOE descriptor metadata is kept while a scattered frame is built. */
+	int rx_frame_descs;
+	u32 soe_rx_msg0;
+	u32 soe_rx_msg2;
 
 	struct list_head tx_list;
 };
@@ -434,6 +449,16 @@ struct airoha_foe_stats64 {
 	u64 packets;
 };
 
+struct airoha_ppe_soe_meta {
+	unsigned long expires;
+	u32 key_hash;
+	u16 foe_hash;
+	u8 valid;
+	u8 sa_index;
+	u8 hop;
+	u8 seen;
+};
+
 struct airoha_flow_data {
 	struct ethhdr eth;
 
@@ -552,6 +577,11 @@ struct airoha_gdm_dev {
 
 	u32 flags;
 	int nbq;
+	/* Prevent toggling NETIF_F_HW_ESP while programmed SAs still exist. */
+	atomic_t soe_xfrm_state_count;
+	/* Private SOE submit path into this GDM's active QDMA instance. */
+	int (*soe_xmit_skb)(struct airoha_gdm_dev *dev, struct sk_buff *skb,
+			    u32 msg0, u32 msg1, u32 msg2);
 
 	struct airoha_hw_stats stats;
 };
@@ -581,6 +611,7 @@ struct airoha_ppe {
 
 	struct hlist_head *foe_flow;
 	u16 *foe_check_time;
+	struct airoha_ppe_soe_meta *soe_meta;
 
 	struct airoha_foe_stats *foe_stats;
 	dma_addr_t foe_stats_dma;
@@ -621,6 +652,7 @@ struct airoha_eth {
 
 	struct airoha_qdma qdma[AIROHA_MAX_NUM_QDMA];
 	struct airoha_gdm_port *ports[AIROHA_MAX_NUM_GDM_PORTS];
+	struct airoha_soe *soe;
 };
 
 u32 airoha_rr(void __iomem *base, u32 offset);
@@ -676,6 +708,14 @@ static inline bool airoha_is_7583(struct airoha_eth *eth)
 int airoha_get_fe_port(struct airoha_gdm_dev *dev);
 bool airoha_is_valid_gdm_dev(struct airoha_eth *eth,
 			     struct airoha_gdm_dev *dev);
+int airoha_qdma_xmit_skb(struct airoha_gdm_dev *dev, struct sk_buff *skb,
+			 u32 msg0, u32 msg1, u32 msg2);
+void airoha_ppe_soe_mark_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
+			     u16 hash, u8 sa_index, u8 hop);
+bool airoha_ppe_soe_skb_marked(struct sk_buff *skb);
+void airoha_ppe_soe_xmit_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
+			     struct net_device *netdev);
+void airoha_ppe_soe_flush_sa(struct airoha_ppe *ppe, u8 sa_index);
 
 void airoha_ppe_set_cpu_port(struct airoha_gdm_dev *dev, u8 ppe_id, u8 fport);
 bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
diff --git a/drivers/net/ethernet/airoha/airoha_regs.h b/drivers/net/ethernet/airoha/airoha_regs.h
index 436f3c8779c1..27e158d0fa4b 100644
--- a/drivers/net/ethernet/airoha/airoha_regs.h
+++ b/drivers/net/ethernet/airoha/airoha_regs.h
@@ -82,6 +82,10 @@
 #define PSE_SHARE_USED_MTHD_MASK	GENMASK(31, 16)
 #define PSE_SHARE_USED_HTHD_MASK	GENMASK(15, 0)
 
+/* TDMA/SOE port 7 needs shared-buffer flow control enabled in the PSE. */
+#define REG_PSE_FC_CFG			0x0098
+#define PSE_TDMA_SHARE_BUF_DIS_MASK	BIT(23)
+
 #define REG_GDM_MISC_CFG		0x0148
 #define GDM2_RDM_ACK_WAIT_PREF_MASK	BIT(9)
 #define GDM2_CHN_VLD_MODE_MASK		BIT(5)
@@ -252,6 +256,8 @@
 #define PPE_GLO_CFG_EN_MASK			BIT(0)
 
 #define REG_PPE_PPE_FLOW_CFG(_n)		(((_n) ? PPE2_BASE : PPE1_BASE) + 0x204)
+#define PPE_FLOW_CFG_IP6_IPSEC_MASK		BIT(28)
+#define PPE_FLOW_CFG_IP4_IPSEC_MASK		BIT(27)
 #define PPE_FLOW_CFG_IP6_HASH_GRE_KEY_MASK	BIT(20)
 #define PPE_FLOW_CFG_IP4_HASH_GRE_KEY_MASK	BIT(19)
 #define PPE_FLOW_CFG_IP4_HASH_FLOW_LABEL_MASK	BIT(18)
@@ -851,6 +857,8 @@
 #define QDMA_DESC_NEXT_ID_MASK		GENMASK(15, 0)
 /* TX MSG0 */
 #define QDMA_ETH_TXMSG_MIC_IDX_MASK	BIT(30)
+/* SOE submit metadata: msg0 carries SA index, msg1 selects port 7 OQ8/OQ9. */
+#define QDMA_ETH_TXMSG_SOE_SA_MASK	GENMASK(29, 24)
 #define QDMA_ETH_TXMSG_SP_TAG_MASK	GENMASK(29, 14)
 #define QDMA_ETH_TXMSG_ICO_MASK		BIT(13)
 #define QDMA_ETH_TXMSG_UCO_MASK		BIT(12)
@@ -873,6 +881,11 @@
 
 /* RX MSG0 */
 #define QDMA_ETH_RXMSG_SPTAG		GENMASK(21, 14)
+/* SOE completion metadata can use the full 16-bit SP tag word. */
+#define QDMA_ETH_RXMSG_SPTAG_FULL	GENMASK(29, 14)
+/* SOE completion metadata returned by the QDMA RX descriptor. */
+#define QDMA_ETH_RXMSG_SOE_MASK		BIT(10)
+#define QDMA_ETH_RXMSG_HOP_FLAGS_MASK	GENMASK(2, 0)
 /* RX MSG1 */
 #define QDMA_ETH_RXMSG_DEI_MASK		BIT(31)
 #define QDMA_ETH_RXMSG_IP6_MASK		BIT(30)
@@ -883,6 +896,9 @@
 #define QDMA_ETH_RXMSG_SPORT_MASK	GENMASK(25, 21)
 #define QDMA_ETH_RXMSG_CRSN_MASK	GENMASK(20, 16)
 #define QDMA_ETH_RXMSG_PPE_ENTRY_MASK	GENMASK(15, 0)
+/* RX MSG2 */
+/* SW_UDF carries the SA index for SOE completion frames. */
+#define QDMA_ETH_RXMSG_SW_UDF_MASK	GENMASK(31, 24)
 
 struct airoha_qdma_desc {
 	__le32 rsv;
diff --git a/drivers/net/ethernet/airoha/airoha_soe.h b/drivers/net/ethernet/airoha/airoha_soe.h
new file mode 100644
index 000000000000..0bde2e9c6b5b
--- /dev/null
+++ b/drivers/net/ethernet/airoha/airoha_soe.h
@@ -0,0 +1,126 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Ethernet-facing declarations for the Airoha Secure Offload Engine (SOE)
+ * packet offload provider.
+ *
+ * airoha_eth owns SOE lifetime and calls these helpers to expose xfrm
+ * ESP/NAT-T offload on its netdevs. When CONFIG_NET_AIROHA_SOE is disabled,
+ * the stubs keep the Ethernet driver buildable without SOE support.
+ */
+
+#ifndef AIROHA_SOE_H
+#define AIROHA_SOE_H
+
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/kconfig.h>
+#include <linux/netdev_features.h>
+#include <linux/types.h>
+
+struct airoha_soe;
+struct airoha_soe_sa;
+struct airoha_eth;
+struct airoha_gdm_dev;
+struct device;
+struct dst_entry;
+struct net_device;
+struct netlink_ext_ack;
+struct sk_buff;
+struct xfrm_state;
+
+#define AIROHA_SOE_FEATURE_ESP		BIT(0)
+
+typedef int (*airoha_soe_xmit_skb_t)(struct airoha_gdm_dev *dev,
+				     struct sk_buff *skb, u32 msg0, u32 msg1,
+				     u32 msg2);
+
+#if IS_ENABLED(CONFIG_NET_AIROHA_SOE)
+int airoha_soe_init(struct airoha_eth *eth);
+void airoha_soe_deinit(struct airoha_eth *eth);
+bool airoha_soe_available(struct airoha_soe *soe);
+u32 airoha_soe_features(struct airoha_soe *soe);
+void airoha_soe_build_netdev(struct net_device *dev,
+			     airoha_soe_xmit_skb_t xmit_skb);
+void airoha_soe_teardown_netdev(struct net_device *dev);
+int airoha_soe_set_features(struct net_device *dev,
+			    netdev_features_t features);
+bool airoha_soe_rx_skb(struct airoha_soe *soe, struct sk_buff *skb,
+		       unsigned int sa_index, u32 hop_flags);
+bool airoha_soe_rx_plain_skb(struct airoha_gdm_dev *dev,
+			     struct sk_buff *skb, struct net_device *rx_dev,
+			     u16 foe_hash, u32 foe_reason, bool foe_valid);
+bool airoha_soe_has_pending_rx(struct airoha_soe *soe);
+int airoha_soe_xfrm_ppe_info(const struct dst_entry *dst, u8 *sa_index,
+			     u8 *hop);
+int airoha_soe_xmit(struct airoha_soe_sa *sa, struct airoha_gdm_dev *dev,
+		    struct sk_buff *skb, struct xfrm_state *x);
+#else
+static inline int airoha_soe_init(struct airoha_eth *eth)
+{
+	return 0;
+}
+
+static inline void airoha_soe_deinit(struct airoha_eth *eth)
+{
+}
+
+static inline bool airoha_soe_available(struct airoha_soe *soe)
+{
+	return false;
+}
+
+static inline u32 airoha_soe_features(struct airoha_soe *soe)
+{
+	return 0;
+}
+
+static inline void airoha_soe_build_netdev(struct net_device *dev,
+					   airoha_soe_xmit_skb_t xmit_skb)
+{
+}
+
+static inline void airoha_soe_teardown_netdev(struct net_device *dev)
+{
+}
+
+static inline int airoha_soe_set_features(struct net_device *dev,
+					  netdev_features_t features)
+{
+	return 0;
+}
+
+static inline bool airoha_soe_rx_skb(struct airoha_soe *soe,
+				     struct sk_buff *skb,
+				     unsigned int sa_index, u32 hop_flags)
+{
+	return false;
+}
+
+static inline bool airoha_soe_rx_plain_skb(struct airoha_gdm_dev *dev,
+					   struct sk_buff *skb,
+					   struct net_device *rx_dev,
+					   u16 foe_hash, u32 foe_reason,
+					   bool foe_valid)
+{
+	return false;
+}
+
+static inline bool airoha_soe_has_pending_rx(struct airoha_soe *soe)
+{
+	return false;
+}
+
+static inline int airoha_soe_xfrm_ppe_info(const struct dst_entry *dst,
+					   u8 *sa_index, u8 *hop)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int airoha_soe_xmit(struct airoha_soe_sa *sa,
+				  struct airoha_gdm_dev *dev,
+				  struct sk_buff *skb, struct xfrm_state *x)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif /* AIROHA_SOE_H */
-- 
2.53.0

* [RFC PATCH net-next 3/7] arm64: dts: airoha: add EN7581 SOE node
From: Jihong Min @ 2026-06-14  4:00 UTC (permalink / raw)
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

Describe the EN7581 SOE register window and interrupt so the Ethernet
driver can discover and initialize the packet offload engine.

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
 arch/arm64/boot/dts/airoha/en7581.dtsi | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/airoha/en7581.dtsi b/arch/arm64/boot/dts/airoha/en7581.dtsi
index ff6908a76e8e..a3c1033d2437 100644
--- a/arch/arm64/boot/dts/airoha/en7581.dtsi
+++ b/arch/arm64/boot/dts/airoha/en7581.dtsi
@@ -347,6 +347,12 @@ i2c1: i2c@1fbf8100 {
 			status = "disabled";
 		};
 
+		soe: soe@1fbfa000 {
+			compatible = "airoha,en7581-soe";
+			reg = <0x0 0x1fbfa000 0x0 0x268>;
+			interrupts = <GIC_SPI 79 IRQ_TYPE_LEVEL_HIGH>;
+		};
+
 		eth: ethernet@1fb50000 {
 			compatible = "airoha,en7581-eth";
 			reg = <0 0x1fb50000 0 0x2600>,
-- 
2.53.0


* [RFC PATCH net-next 2/7] dt-bindings: net: airoha: add EN7581 SOE
From: Jihong Min @ 2026-06-14  4:00 UTC (permalink / raw)
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

Document the EN7581 Secure Offload Engine register window and interrupt
used by the Ethernet driver for ESP packet offload, and add the new
binding to the Airoha Ethernet MAINTAINERS entry.

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
 .../bindings/net/airoha,en7581-soe.yaml       | 48 +++++++++++++++++++
 MAINTAINERS                                   |  1 +
 2 files changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml

diff --git a/Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml b/Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml
new file mode 100644
index 000000000000..24aecafecc70
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/airoha,en7581-soe.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Airoha EN7581 Secure Offload Engine
+
+maintainers:
+  - Lorenzo Bianconi <lorenzo@kernel.org>
+
+description:
+  The Secure Offload Engine provides inline ESP packet offload resources used
+  by the Airoha Ethernet controller.
+
+properties:
+  compatible:
+    const: airoha,en7581-soe
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    soc {
+      #address-cells = <2>;
+      #size-cells = <2>;
+
+      soe@1fbfa000 {
+        compatible = "airoha,en7581-soe";
+        reg = <0 0x1fbfa000 0 0x268>;
+        interrupts = <GIC_SPI 79 IRQ_TYPE_LEVEL_HIGH>;
+      };
+    };
+...
diff --git a/MAINTAINERS b/MAINTAINERS
index cc1dde0c9067..7c338e670572 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -757,6 +757,7 @@ L:	linux-mediatek@lists.infradead.org (moderated for non-subscribers)
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
+F:	Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml
 F:	drivers/net/ethernet/airoha/
 
 AIROHA PCIE PHY DRIVER
-- 
2.53.0


* [RFC PATCH net-next 1/7] xfrm: allow packet offload drivers to own transmit
From: Jihong Min @ 2026-06-14  4:00 UTC (permalink / raw)
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min
In-Reply-To: <20260614040032.1567994-1-hurryman2212@gmail.com>

Packet offload drivers can currently program state and validate whether
an skb can be offloaded, but they cannot take ownership of a packet that
needs driver-specific TX preparation in place of the regular XFRM output
path.

Add an optional xdo_dev_packet_xmit() callback. Drivers that implement it
consume the skb and return the final TX status; all other drivers keep
the existing XFRM output path.

Signed-off-by: Jihong Min <hurryman2212@gmail.com>
---
 include/linux/netdevice.h |  8 ++++++++
 net/xfrm/xfrm_output.c    | 11 +++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7f4f0837c09f..1552eb81ddf0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1048,6 +1048,14 @@ struct xfrmdev_ops {
 	int	(*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
 	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
 	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
+	/* Optional packet-offload TX path for devices that need
+	 * driver-specific transmit preparation instead of continuing through
+	 * the regular XFRM output path, such as adding offload metadata or
+	 * steering the packet to a private transmit queue. The driver consumes
+	 * skb and returns the final transmit status.
+	 */
+	int	(*xdo_dev_packet_xmit)(struct sk_buff *skb,
+				       struct xfrm_state *x);
 };
 #endif
 
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index cc35c2fcbbe0..9f11559b0221 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -770,6 +770,17 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
 	}
 
 	if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET) {
+#ifdef CONFIG_XFRM_OFFLOAD
+		const struct xfrmdev_ops *ops;
+#endif
+
+#ifdef CONFIG_XFRM_OFFLOAD
+		ops = x->xso.dev->xfrmdev_ops;
+		/* Callback validates, consumes skb and returns final TX status. */
+		if (ops && ops->xdo_dev_packet_xmit)
+			return ops->xdo_dev_packet_xmit(skb, x);
+#endif
+
 		if (!xfrm_dev_offload_ok(skb, x)) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
 			kfree_skb(skb);
-- 
2.53.0


* [RFC PATCH net-next 0/7] net: airoha: add EN7581 SOE ESP packet offload
From: Jihong Min @ 2026-06-14  4:00 UTC (permalink / raw)
  To: netdev, Lorenzo Bianconi
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, Simon Horman, Herbert Xu, Steffen Klassert,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, devicetree,
	Matthias Brugger, AngeloGioacchino Del Regno, linux-arm-kernel,
	linux-mediatek, Christian Marangi, Felix Fietkau, linux-kernel,
	Jihong Min

Add Secure Offload Engine (SOE) support for the Airoha EN7581 Ethernet
driver. SOE provides inline ESP packet offload for native ESP and NAT-T
traffic, with the Ethernet/QDMA path used to submit packets to the SOE
block and the PPE path used to bind eligible ESP flows. NETIF_F_GSO_ESP
and NETIF_F_HW_ESP_TX_CSUM are intentionally left out for now; their
feasibility will be evaluated separately.

This is posted as RFC because the code was originally developed and tested
against an OpenWrt 6.18 Airoha tree, not against the current upstream
net-next driver. The original OpenWrt commit used as the source for this
RFC is available at:
https://github.com/hurryman2212/OpenW1700k-test/commit/7c1b5e662f7790b3d23ed143beadc1dcbf6d15f7

The SOE part is intentionally linked into the airoha Ethernet module
instead of being exposed as an independent crypto or platform driver. The
user-visible ESP offload control is a netdev capability: xfrmdev_ops and
NETIF_F_HW_ESP live on the target netdev, and the feature can be controlled
through the usual netdev feature path. SOE also shares the FE/QDMA/PPE
datapath, private queues, DSA conduit handling and netdev lifetime owned by
airoha_eth.

Patch 1 adds xdo_dev_packet_xmit() because the existing XFRM packet
offload transmit path does not provide a hook for hardware whose ESP engine
is reached through device-specific packet forwarding. SOE needs to consume
the skb, add a hardware hop descriptor, steer it to a private QDMA path and
return the final transmit status. Drivers that do not implement the
optional callback keep the existing XFRM output behavior.

Jihong Min (7):
  xfrm: allow packet offload drivers to own transmit
  dt-bindings: net: airoha: add EN7581 SOE
  arm64: dts: airoha: add EN7581 SOE node
  net: airoha: add SOE registers and driver state
  net: airoha: add QDMA support for SOE packets
  net: airoha: add PPE support for SOE flows
  net: airoha: add SOE XFRM packet offload support

 .../bindings/net/airoha,en7581-soe.yaml       |   48 +
 MAINTAINERS                                   |    1 +
 arch/arm64/boot/dts/airoha/en7581.dtsi        |    6 +
 drivers/net/ethernet/airoha/Kconfig           |   13 +
 drivers/net/ethernet/airoha/Makefile          |    1 +
 drivers/net/ethernet/airoha/airoha_eth.c      |  668 +++++-
 drivers/net/ethernet/airoha/airoha_eth.h      |   40 +
 drivers/net/ethernet/airoha/airoha_ppe.c      |  606 +++++-
 drivers/net/ethernet/airoha/airoha_regs.h     |   16 +
 drivers/net/ethernet/airoha/airoha_soe.c      | 1896 +++++++++++++++++
 drivers/net/ethernet/airoha/airoha_soe.h      |  126 ++
 include/linux/netdevice.h                     |    8 +
 include/linux/soc/airoha/airoha_offload.h     |    5 +
 net/xfrm/xfrm_output.c                        |   11 +
 14 files changed, 3342 insertions(+), 103 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/airoha,en7581-soe.yaml
 create mode 100644 drivers/net/ethernet/airoha/airoha_soe.c
 create mode 100644 drivers/net/ethernet/airoha/airoha_soe.h

-- 
2.53.0

* Re: [PATCH net] psample: zero the netlink attribute padding in PSAMPLE_ATTR_DATA
From: Xiang Mei @ 2026-06-14  3:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, yotam.gi, edumazet, pabeni, horms, bestswngs
In-Reply-To: <20260613171052.0dbdfea4@kernel.org>

On Sat, Jun 13, 2026 at 5:10 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 13 Jun 2026 17:02:39 -0700 Xiang Mei wrote:
> > > >       if (data_len) {
> > > > -             int nla_len = nla_total_size(data_len);
> > > >               struct nlattr *nla;
> > > >
> > > > -             nla = skb_put(nl_skb, nla_len);
> > > > -             nla->nla_type = PSAMPLE_ATTR_DATA;
> > > > -             nla->nla_len = nla_attr_size(data_len);
> > > > +             nla = nla_reserve(nl_skb, PSAMPLE_ATTR_DATA, data_len);
> > > > +             if (!nla)
> > > > +                     goto error;
> > > >
> > > >               if (skb_copy_bits(skb, 0, nla_data(nla), data_len))
> > > >                       goto error;
> > > >
> > > > Let me know if the new patch makes sense.
> > >
> > > I assumed the author intentionally was avoiding the memset for
> > > the memory we will override with data. Otherwise the whole dance
> > > could be avoided and nla_put() would have been the answer.
> >
> > The reason nla_put() isn't used here is that the payload source is the
> > sampled skb, which can be nonlinear, so the data has to be gathered with
> > skb_copy_bits() rather than a flat memcpy().
>
> That too.
>
> > There is no nla_put() variant that takes an skb source, hence the
> > reserve-then-copy.
> >
> > But that doesn't require open-coding the attribute: nla_reserve() only
> > memsets the alignment padding (nla_padlen), never the data region, so
> > nla_reserve() + skb_copy_bits() writes every byte exactly once with no
> > redundant memset over the payload. The bug was that the open-coded
> > version dropped that padding-zero step.
>
> I find it hard to believe that nla_reserve() was not considered.
> It's a widely used function in the networking stack.
> Please move on.

Thanks for taking the time to discuss this. v3 has been sent.

Xiang

* [PATCH net v2] psample: use nla_reserve() for PSAMPLE_ATTR_DATA
From: Xiang Mei @ 2026-06-14  3:49 UTC (permalink / raw)
  To: kuba, netdev
  Cc: davem, yotam.gi, edumazet, pabeni, horms, bestswngs, Xiang Mei

psample_sample_packet() open-codes the PSAMPLE_ATTR_DATA attribute and
reserves nla_total_size(data_len) bytes but only writes NLA_HDRLEN +
data_len of them.  When data_len is not a multiple of 4 the trailing
alignment padding is left uninitialised, leaking stale slab memory to
every listener on the PSAMPLE_NL_MCGRP_SAMPLE multicast group.

Use nla_reserve(), which lays out the header and zeroes the padding, and
copy the payload into the reserved area with skb_copy_bits().

Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
v2: use nla_reserve() so the padding is zeroed and no info leaks

 net/psample/psample.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/psample/psample.c b/net/psample/psample.c
index 7763662036fb..6a714a4b4992 100644
--- a/net/psample/psample.c
+++ b/net/psample/psample.c
@@ -476,12 +476,11 @@ void psample_sample_packet(struct psample_group *group,
 		goto error;
 
 	if (data_len) {
-		int nla_len = nla_total_size(data_len);
 		struct nlattr *nla;
 
-		nla = skb_put(nl_skb, nla_len);
-		nla->nla_type = PSAMPLE_ATTR_DATA;
-		nla->nla_len = nla_attr_size(data_len);
+		nla = nla_reserve(nl_skb, PSAMPLE_ATTR_DATA, data_len);
+		if (!nla)
+			goto error;
 
 		if (skb_copy_bits(skb, 0, nla_data(nla), data_len))
 			goto error;
-- 
2.43.0


* Re: [PATCH bpf v5 1/2] bpf: Run generic devmap egress prog on private skb
From: sun jian @ 2026-06-14  3:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, bpf, Network Development, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Stanislav Fomichev, Shuah Khan, Jiayuan Chen,
	Toke Høiland-Jørgensen, Menglong Dong, Emil Tsalapatis
In-Reply-To: <20260613130343.3984878b@kernel.org>

On Sun, Jun 14, 2026 at 4:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 13 Jun 2026 10:53:35 -0700 Alexei Starovoitov wrote:
> > On Sat, Jun 13, 2026 at 10:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Fri, 12 Jun 2026 19:40:31 +0800 Sun Jian wrote:
> > > > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > >
> > > I did not suggest this
> >
> > ohh. I didn't follow discussion closely.
> > Do you want me to revert the whole set or just remove that line?
>
> It's alright, I was just being grumpy.
>
> Maybe we can amend Chris's prompts to catch these?
> Noobs tend to add Suggested-by for people who simply commented
> on patches.

Hi Jakub, Alexei,

Sorry about that. I had meant to drop that Suggested-by tag before
sending v5, but missed it.

No action needed from my side.

Thanks,
Sun Jian
