Re: [RFC PATCH net-next 6/7] net: airoha: add PPE support for SOE flows

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Jihong Min" <hurryman2212@gmail.com>
Cc: devicetree@vger.kernel.org, robh@kernel.org, conor+dt@kernel.org
Subject: Re: [RFC PATCH net-next 6/7] net: airoha: add PPE support for SOE flows
Date: Mon, 15 Jun 2026 04:58:07 +0000	[thread overview]
Message-ID: <20260615045807.AD8021F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260614040032.1567994-7-hurryman2212@gmail.com>

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:

New issues:
- [Critical] Type confusion from blindly casting `f->cookie` allows unprivileged users to crash the kernel via TC flower.
- [High] The driver destructively overwrites `skb->mark` with private hardware metadata, breaking user-defined firewall marks.
- [High] Lack of IPv4 fragment validation in `airoha_ppe_soe_fill_inner_ipv4_data` leads to garbage L4 port extraction.
- [High] Lockless updates of the `soe_meta` array cause data races and torn reads, corrupting IPsec state.
- [High] Overlapping identical memory pointers are passed to `memcpy`, triggering undefined behavior.
- [High] Hardware polling on 16K+ table entries inside a spinlock triggers CPU soft lockups.
- [Medium] Massive contiguous memory allocations via `devm_kzalloc` risk persistent driver initialization failures.

Pre-existing issues:
- [High] A shadowed error variable in `airoha_ppe_flush_entries` silently masks hardware initialization failures.
--

commit 00c38592fc258c1ad98cdbaaadb9925cf653ba43
Author: Jihong Min <hurryman2212@gmail.com>

net: airoha: add PPE support for SOE flows

Add PPE metadata handling for SOE flows so decrypted packets can carry
their original FOE/SA context until the normal egress path is known, and
so XFRM flowtable entries can be programmed with SOE SA and hop
information.

> diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
> index 91bcc55a6ac67..faa5f04d4c7b3 100644
> --- a/drivers/net/ethernet/airoha/airoha_ppe.c
> +++ b/drivers/net/ethernet/airoha/airoha_ppe.c
> @@ -578,6 +604,57 @@ static int airoha_ppe_foe_entry_set_ipv6_tuple(struct airoha_foe_entry *hwe,
> +static int airoha_ppe_soe_fill_inner_ipv4_data(struct sk_buff *skb,
> +					       struct airoha_flow_data *data,
> +					       int *type, int *l4proto)
> +{
> +	unsigned int ip_offset = ETH_HLEN;
> +	union {
> +		struct tcphdr tcp;
> +		struct udphdr udp;
> +	} ports;
> +	struct iphdr iph_buf, *iph;
> +	unsigned int l4_offset;
> +	struct udphdr *udp;
> +	struct tcphdr *tcp;
> +
> +	if (skb_headlen(skb) < ETH_HLEN)
> +		return -EINVAL;
> +
> +	memcpy(&data->eth, skb->data, ETH_HLEN);
> +	if (data->eth.h_proto != htons(ETH_P_IP))
> +		return -EAFNOSUPPORT;
> +
> +	iph = skb_header_pointer(skb, ip_offset, sizeof(iph_buf), &iph_buf);
> +	if (!iph || iph->ihl < 5 || iph->version != 4)
> +		return -EINVAL;
> +
> +	l4_offset = ip_offset + iph->ihl * 4;

[Severity: High]
Does this code handle IPv4 fragments safely?

If iph->frag_off indicates a trailing fragment, the L4 header is not present.

Because there is no ip_is_fragment(iph) check before calculating l4_offset,
the function will read arbitrary payload data, misinterpret it as ports,
and program incorrect tuples into the hardware offload tables, leading to
corrupted flows and dropped fragments.

[ ... ]

> @@ -868,16 +1096,129 @@ airoha_ppe_foe_commit_subflow_entry(struct airoha_ppe *ppe,
> +static void airoha_ppe_soe_meta_store(struct airoha_ppe_soe_meta *meta,
> +				      u32 key_hash, u16 foe_hash, u8 sa_index,
> +				      u8 hop)
>  {
> +	u8 seen = 1;
> +
> +	if (READ_ONCE(meta->valid) &&
> +	    READ_ONCE(meta->key_hash) == key_hash &&
> +	    READ_ONCE(meta->foe_hash) == foe_hash &&
> +	    READ_ONCE(meta->sa_index) == sa_index &&
> +	    READ_ONCE(meta->hop) == hop &&
> +	    time_before_eq(jiffies, READ_ONCE(meta->expires)))
> +		seen = min_t(u8, READ_ONCE(meta->seen) + 1, U8_MAX);
> +
> +	WRITE_ONCE(meta->key_hash, key_hash);
> +	WRITE_ONCE(meta->foe_hash, foe_hash);
> +	WRITE_ONCE(meta->sa_index, sa_index);
> +	WRITE_ONCE(meta->hop, hop);
> +	WRITE_ONCE(meta->seen, seen);
> +	WRITE_ONCE(meta->expires,
> +		   jiffies + msecs_to_jiffies(AIROHA_PPE_SOE_META_TIMEOUT_MS));
> +	WRITE_ONCE(meta->valid, 1);

[Severity: High]
Can these lockless updates cause torn reads or data races?

Because there are no CPU memory barriers (like smp_store_release) and no
locks to prevent concurrent hash bucket collisions, writes are performed
sequentially.

Does this mean valid = 1 could be reordered before the data is visible to
other CPUs?

> +}
> +
> +void airoha_ppe_soe_mark_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> +			     u16 hash, u8 sa_index, u8 hop)
> +{
> +	struct airoha_ppe *ppe;
> +	u32 ppe_hash_mask;
> +
> +	if (!dev || !skb)
> +		return;
> +
> +	ppe = dev->priv;
> +	if (!ppe || !ppe->soe_meta)
> +		return;
> +
> +	ppe_hash_mask = airoha_ppe_get_total_num_entries(ppe) - 1;
> +	if (hash > ppe_hash_mask)
> +		return;
> +
> +	/* SOE decrypt completion is CPU-visible before normal routing has
> +	 * selected the plaintext egress netdev. Keep the original encrypted FOE
> +	 * hash and SA hop briefly on the skb so airoha_dev_xmit() can finish
> +	 * the PPE entry once the final egress descriptor is known.
> +	 */
> +	airoha_ppe_soe_meta_store(&ppe->soe_meta[hash], hash, hash, sa_index,
> +				  hop);
> +	ppe->foe_check_time[hash] = 0;
> +
> +	skb->mark &= ~(AIROHA_PPE_SOE_MARK_MAGIC_MASK |
> +		       AIROHA_PPE_SOE_MARK_HASH_MASK);
> +	skb->mark |= AIROHA_PPE_SOE_MARK_MAGIC |
> +		     FIELD_PREP(AIROHA_PPE_SOE_MARK_HASH_MASK, hash);

[Severity: High]
Does this destructively overwrite the user-space visible skb->mark?

Network drivers are forbidden from repurposing the global UAPI skb->mark
field for internal cross-stack state tracking.

Could this destroy any existing fwmark applied by userspace
(via iptables, nftables, or tc), breaking policy routing and firewall rules?

> +}
> +
> +bool airoha_ppe_soe_skb_marked(struct sk_buff *skb)
> +{
> +	return skb && ((skb->mark & AIROHA_PPE_SOE_MARK_MAGIC_MASK) ==
> +		       AIROHA_PPE_SOE_MARK_MAGIC);
> +}
> +
> +void airoha_ppe_soe_xmit_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
> +			     struct net_device *netdev)
> +{
> +	struct airoha_foe_entry entry, tmpl, *hwe;
> +	struct airoha_flow_data data = {};
> +	struct airoha_ppe_soe_meta *meta;
> +	u32 ppe_hash_mask, key_hash;
> +	struct airoha_gdm_dev *gdm;
> +	struct airoha_ppe *ppe;
> +	unsigned long expires;
> +	u16 hash;
> +	int err, l4proto, type;
> +	u8 sa_index, hop;
> +	u8 seen;
> +
> +	if (!dev || !skb || !netdev)
> +		return;
> +
> +	if ((skb->mark & AIROHA_PPE_SOE_MARK_MAGIC_MASK) !=
> +	    AIROHA_PPE_SOE_MARK_MAGIC)
> +		return;
> +
> +	ppe = dev->priv;
> +	if (!ppe || !ppe->soe_meta)
> +		goto clear_mark;
> +
> +	ppe_hash_mask = airoha_ppe_get_total_num_entries(ppe) - 1;
> +	key_hash = FIELD_GET(AIROHA_PPE_SOE_MARK_HASH_MASK, skb->mark);
> +	if (key_hash > ppe_hash_mask)
> +		goto clear_mark;
> +
> +	meta = &ppe->soe_meta[key_hash];
> +	if (!READ_ONCE(meta->valid))
> +		goto clear_mark;

[Severity: High]
Are these reads gated by valid without acquire barriers?

Without smp_load_acquire, a reading CPU can observe valid == 1 while reading
torn, mixed data from two racing packets. This programs the hardware with an
incorrect SA index, breaking IPsec tunnel isolation.

[ ... ]

> +clear_mark:
> +	skb->mark &= ~(AIROHA_PPE_SOE_MARK_MAGIC_MASK |
> +		       AIROHA_PPE_SOE_MARK_HASH_MASK);

[Severity: High]
Is the hijacked mark consumed and zeroed out on the transmit path here?
This finalizes the loss of the original fwmark.

> +}
> +
> +void airoha_ppe_soe_flush_sa(struct airoha_ppe *ppe, u8 sa_index)
> +{
> +	u32 num_entries, hash;
> +
> +	if (!ppe)
> +		return;
> +
> +	num_entries = airoha_ppe_get_total_num_entries(ppe);
> +
> +	spin_lock_bh(&ppe_lock);

[Severity: High]
Is it safe to hold this spinlock continuously across a massive loop that
interacts with hardware?

When flushing IPsec state, this loops over every PPE table entry (up to
24,576 entries) while continuously holding a single spin_lock_bh(&ppe_lock).

During the iteration of SRAM entries, airoha_ppe_foe_get_entry_locked()
issues MMIO reads via read_poll_timeout_atomic(), which can delay up to
100us per entry.

Multiplying this delay by thousands of loop iterations can block the CPU for
tens to hundreds of milliseconds. Can this trigger watchdog soft lockup
warnings and drop unrelated network traffic?

> +	for (hash = 0; hash < num_entries; hash++) {
> +		struct airoha_foe_entry *hwe;
> +		u32 state, type;
> +
> +		hwe = airoha_ppe_foe_get_entry_locked(ppe, hash);
> +		if (!hwe)
> +			continue;
> +
> +		state = FIELD_GET(AIROHA_FOE_IB1_BIND_STATE, hwe->ib1);
> +		if (state != AIROHA_FOE_STATE_BIND)
> +			continue;
> +
> +		type = FIELD_GET(AIROHA_FOE_IB1_BIND_PACKET_TYPE, hwe->ib1);
> +		if (type != PPE_PKT_TYPE_IPV4_HNAPT &&
> +		    type != PPE_PKT_TYPE_IPV4_ROUTE)
> +			continue;
> +
> +		if (!(hwe->ipv4.data & AIROHA_FOE_TUNNEL))
> +			continue;
> +
> +		if (FIELD_GET(AIROHA_FOE_ACTDP, hwe->ipv4.data) != sa_index)
> +			continue;
> +
> +		/* NAT-T data and IKE control both use UDP/4500. A stale SOE
> +		 * bound entry can otherwise keep sending later IKE_AUTH packets
> +		 * to the SOE path after the SA has been deleted.
> +		 */
> +		hwe->ib1 &= ~AIROHA_FOE_IB1_BIND_STATE;
> +		hwe->ib1 |= FIELD_PREP(AIROHA_FOE_IB1_BIND_STATE,
> +				       AIROHA_FOE_STATE_INVALID);
> +		airoha_ppe_foe_commit_entry(ppe, hwe, hash, false);

[Severity: High]
Does this pass overlapping identical memory pointers to memcpy(), triggering
undefined behavior?

The pointer returned by airoha_ppe_foe_get_entry_locked() points directly to
the ppe->foe array.

This exact pointer is then passed as hwe into airoha_ppe_foe_commit_entry(),
which recalculates the destination pointing to the exact same address:

    struct airoha_foe_entry *hwe = ppe->foe + hash * sizeof(*hwe);
    ...
    memcpy(&hwe->d, &e->d, sizeof(*hwe) - sizeof(hwe->ib1));

The C standard forbids overlapping pointers in memcpy(), which can trigger
fortify source panics or silent data corruption.

> +	}
> +	spin_unlock_bh(&ppe_lock);
> +}

[ ... ]

> @@ -1057,6 +1551,7 @@ static int airoha_ppe_entry_idle_time(struct airoha_ppe *ppe,
>  static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
>  					   struct flow_cls_offload *f)
>  {
> +	const struct flow_offload_tuple *tuple = (const void *)f->cookie;

[Severity: Critical]
Could this lead to a kernel panic from a type confusion if the rule was
added via TC flower?

The driver accepts generic FLOW_CLS_REPLACE commands, which can come from the
Netfilter flowtable or TC flower. This blindly casts f->cookie to
struct flow_offload_tuple *.

However, if the rule was added via TC flower, cookie actually points to a
struct cls_fl_filter.

The driver then passes this invalid pointer to
airoha_ppe_foe_entry_set_soe_info(), which reads a garbage xmit_type.

If it coincidentally matches FLOW_OFFLOAD_XMIT_XFRM, the code calls
airoha_soe_xfrm_ppe_info(tuple->dst_cache), dereferencing a random garbage
pointer. This allows any user capable of configuring TC flower rules to
trigger an oops.

>  	struct flow_rule *rule = flow_cls_offload_flow_rule(f);
>  	struct airoha_flow_table_entry *e;

[ ... ]

> @@ -1350,16 +1851,26 @@ static int airoha_ppe_flow_offload_cmd(struct airoha_eth *eth,
>  	return -EOPNOTSUPP;
>  }
>  
> -static int airoha_ppe_flush_sram_entries(struct airoha_ppe *ppe)
> +static int airoha_ppe_flush_entries(struct airoha_ppe *ppe)
>  {
> +	u32 ppe_num_entries = airoha_ppe_get_total_num_entries(ppe);
>  	u32 sram_num_entries = airoha_ppe_get_total_sram_num_entries(ppe);
>  	struct airoha_foe_entry *hwe = ppe->foe;
>  	int i, err = 0;
>  
> +	memset(hwe, 0, ppe_num_entries * sizeof(*hwe));
> +	if (ppe->foe_stats) {
> +		u32 ppe_num_stats_entries =
> +			airoha_ppe_get_total_num_stats_entries(ppe);
> +
> +		memset(ppe->foe_stats, 0,
> +		       ppe_num_stats_entries * sizeof(*ppe->foe_stats));
> +	}
> +	dma_wmb();
> +
>  	for (i = 0; i < sram_num_entries; i++) {
>  		int err;
>  
> -		memset(&hwe[i], 0, sizeof(*hwe));
>  		err = airoha_ppe_foe_commit_sram_entry(ppe, i);
>  		if (err)
>  			break;
>  	}
>  
>  	return err;

[Severity: High]
This is a pre-existing issue, but does this shadowed error variable silently
mask hardware initialization failures?

Inside this initialization loop, a block-scoped int err; shadows the
function-scoped err = 0;.

If airoha_ppe_foe_commit_sram_entry() fails (e.g., due to a hardware poll
timeout), the loop breaks, but the function returns the outer err which
remains 0.

Does this silently ignore hardware initialization errors, leaving the PPE
table in an inconsistent state?

>  }

[ ... ]

> @@ -1601,12 +2143,20 @@ int airoha_ppe_init(struct airoha_eth *eth)
>  			return -ENOMEM;
>  	}
>  
> -	ppe->foe_check_time = devm_kzalloc(eth->dev, ppe_num_entries,
> -					   GFP_KERNEL);
> +	ppe->foe_check_time =
> +		devm_kzalloc(eth->dev,
> +			     ppe_num_entries * sizeof(*ppe->foe_check_time),
> +			     GFP_KERNEL);
>  	if (!ppe->foe_check_time)
>  		return -ENOMEM;
>  
> -	err = airoha_ppe_flush_sram_entries(ppe);
> +	ppe->soe_meta = devm_kzalloc(eth->dev,
> +				     ppe_num_entries * sizeof(*ppe->soe_meta),
> +				     GFP_KERNEL);

[Severity: Medium]
Could this fail on systems with fragmented memory or high uptime?

Allocating nearly 600 KB (an order-8 contiguous allocation) without a
fallback is extremely prone to failure and will prevent the network interface
from probing. Could this use devm_kvzalloc() to allow virtual memory
fallbacks?

> +	if (!ppe->soe_meta)
> +		return -ENOMEM;
> +
> +	err = airoha_ppe_flush_entries(ppe);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260614040032.1567994-1-hurryman2212@gmail.com?part=6

next prev parent reply	other threads:[~2026-06-15  4:58 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-14  4:00 [RFC PATCH net-next 0/7] net: airoha: add EN7581 SOE ESP packet offload Jihong Min
2026-06-14  4:00 ` [RFC PATCH net-next 1/7] xfrm: allow packet offload drivers to own transmit Jihong Min
2026-06-15  4:58   ` sashiko-bot
2026-06-14  4:00 ` [RFC PATCH net-next 2/7] dt-bindings: net: airoha: add EN7581 SOE Jihong Min
2026-06-14  4:00 ` [RFC PATCH net-next 3/7] arm64: dts: airoha: add EN7581 SOE node Jihong Min
2026-06-15  4:58   ` sashiko-bot
2026-06-14  4:00 ` [RFC PATCH net-next 4/7] net: airoha: add SOE registers and driver state Jihong Min
2026-06-14  4:00 ` [RFC PATCH net-next 5/7] net: airoha: add QDMA support for SOE packets Jihong Min
2026-06-15  4:58   ` sashiko-bot
2026-06-14  4:00 ` [RFC PATCH net-next 6/7] net: airoha: add PPE support for SOE flows Jihong Min
2026-06-15  4:58   ` sashiko-bot [this message]
2026-06-14  4:00 ` [RFC PATCH net-next 7/7] net: airoha: add SOE XFRM packet offload support Jihong Min
2026-06-15  4:58   ` sashiko-bot
2026-06-14  4:18 ` [RFC PATCH net-next 0/7] net: airoha: add EN7581 SOE ESP packet offload Jihong Min

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260615045807.AD8021F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=hurryman2212@gmail.com \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox