[PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path

Netdev List
 help / color / mirror / Atom feed

* [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path
@ 2026-06-26 15:35 Selvamani Rajagopal via B4 Relay
  2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 UTC (permalink / raw)
  To: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, linux-kernel, Andrew Lunn, Parthiban Veerasooran,
	Selvamani Rajagopal

Now the traffic is handled in threaded IRQ, and the
disable_traffic flag is checked before handling the
data, new race condition is exposed, in which
buffer may leak, if threaded IRQ interrupts the
trasmit path midway.

With this change, disable_traffic and waiting_tx_skb
pointer are protected by spin lock/unlock pair.

This is highlighted in Sashiko review
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260611-level-trigger-v5-0-4533a9e85ce2%40onsemi.com

Also on buffer overrun condition, probably due to loss of
SPI data chunks, receive path doesn't see the expected
data chunk with end_valid bit set. As a result, driver
keeps adding data chunks to the skb before running out
of space and kernel panic is seen.

With this change, before adding data to the skb, if there
is no space, skb is freed and driver starts looking for
new frame by looking for a data chunk with start_valid
bit set.

[  705.405490] skbuff: skb_over_panic: text:ffffffd2eb72a264 len:1600 put:64 head:ffffff804e5cdc40 data:ffffff804e5cdc80 tail:0x680 end:0x640 dev:eth1
[  705.405569] ------------[ cut here ]------------
[  705.405575] kernel BUG at net/core/skbuff.c:214!
[  705.405589] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP

[ 6703.427690] Call trace:
[  705.925157]  skb_panic+0x58/0x68 (P)
[  705.928726]  skb_put+0x74/0x80
[  705.931772]  oa_tc6_update_rx_skb+0x44/0x98 [oa_tc6_mod]
[  705.937084]  oa_tc6_macphy_threaded_irq+0x3f4/0x900 [oa_tc6_mod]
[  705.943084]  irq_thread_fn+0x34/0xb8
[  705.946654]  irq_thread+0x1a0/0x300
[  705.950134]  kthread+0x138/0x150
[  705.953356]  ret_from_fork+0x10/0x20

Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
---
Changes in v2:
- Improvment to how error -EAGAIN is handled. Took care of
  couple of use cases where start_bit and end_bit may be missing or
  repeated due to lost data chunks.
- Protected handling of waiting_tx_skb pointer with spin lock
- Link to v1: https://lore.kernel.org/r/20260621-fix-race-condition-and-crash-v1-0-87e290d9357f@onsemi.com

---
Selvamani Rajagopal (2):
      net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
      net: ethernet: oa_tc6: Improvement in buffer overflow handling

 drivers/net/ethernet/oa_tc6.c | 91 ++++++++++++++++++++++++++++++-------------
 1 file changed, 64 insertions(+), 27 deletions(-)
---
base-commit: 805185b7c7a1069e407b6f7b3bc98e44d415f484
change-id: 20260621-fix-race-condition-and-crash-94d055a665c4

Best regards,
-- 
Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
  2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay
@ 2026-06-26 15:35 ` Selvamani Rajagopal via B4 Relay
  2026-06-30  2:15   ` Jakub Kicinski
  2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay
  2026-06-30  5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran
  2 siblings, 1 reply; 10+ messages in thread
From: Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 UTC (permalink / raw)
  To: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, linux-kernel, Andrew Lunn, Parthiban Veerasooran,
	Selvamani Rajagopal

From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>

Threaded IRQ uses waiting_tx_skb. Transmit path also uses
this pointer without any mutual exclusion protection. As a
result, it might leak skb buffer, particularly threaded IRQ
runs in the middle of tranmsmit path, near skb_linearize.

Fixes: b542d13fab0f ("net: ethernet: oa_tc6: Interrupt is active low, level triggered.")
Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>

---
changes in v2:
  added the missing prefix to the title
---
 drivers/net/ethernet/oa_tc6.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 0727d53345a3..3fd4851ee66d 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -672,10 +672,16 @@ static void oa_tc6_cleanup_ongoing_tx_skb(struct oa_tc6 *tc6)
 
 static void oa_tc6_cleanup_waiting_tx_skb(struct oa_tc6 *tc6)
 {
-	if (tc6->waiting_tx_skb) {
+	struct sk_buff *skb;
+
+	spin_lock_bh(&tc6->tx_skb_lock);
+	skb = tc6->waiting_tx_skb;
+	tc6->waiting_tx_skb = NULL;
+	spin_unlock_bh(&tc6->tx_skb_lock);
+
+	if (skb) {
 		tc6->netdev->stats.tx_dropped++;
-		kfree_skb(tc6->waiting_tx_skb);
-		tc6->waiting_tx_skb = NULL;
+		kfree_skb(skb);
 	}
 }
 
@@ -1250,11 +1256,6 @@ EXPORT_SYMBOL_GPL(oa_tc6_zero_align_receive_frame_enable);
  */
 netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
 {
-	if (tc6->disable_traffic || tc6->waiting_tx_skb) {
-		netif_stop_queue(tc6->netdev);
-		return NETDEV_TX_BUSY;
-	}
-
 	if (skb_linearize(skb)) {
 		dev_kfree_skb_any(skb);
 		tc6->netdev->stats.tx_dropped++;
@@ -1262,6 +1263,11 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
 	}
 
 	spin_lock_bh(&tc6->tx_skb_lock);
+	if (tc6->disable_traffic || tc6->waiting_tx_skb) {
+		netif_stop_queue(tc6->netdev);
+		spin_unlock_bh(&tc6->tx_skb_lock);
+		return NETDEV_TX_BUSY;
+	}
 	tc6->waiting_tx_skb = skb;
 	spin_unlock_bh(&tc6->tx_skb_lock);
 

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
  2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay
@ 2026-06-30  2:15   ` Jakub Kicinski
  2026-06-30  4:16     ` Selvamani Rajagopal
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2026-06-30  2:15 UTC (permalink / raw)
  To: Selvamani Rajagopal via B4 Relay
  Cc: Selvamani.Rajagopal, Parthiban Veerasooran, Andrew Lunn,
	Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, Andrew Lunn

On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay
wrote:
> Threaded IRQ uses waiting_tx_skb. Transmit path also uses
> this pointer without any mutual exclusion protection. As a
> result, it might leak skb buffer, particularly threaded IRQ
> runs in the middle of tranmsmit path, near skb_linearize.

Can you say more ? only xmit sets waiting_tx_skb, the IRQ
clears it. So why is IRQ racing with xmit leading to drops?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
  2026-06-30  2:15   ` Jakub Kicinski
@ 2026-06-30  4:16     ` Selvamani Rajagopal
  2026-06-30 22:46       ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Selvamani Rajagopal @ 2026-06-30  4:16 UTC (permalink / raw)
  To: Jakub Kicinski, Selvamani Rajagopal via B4 Relay
  Cc: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
	David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Subject: Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two
> different kernel instances
> 
> 
> On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay
> wrote:
> > Threaded IRQ uses waiting_tx_skb. Transmit path also uses
> > this pointer without any mutual exclusion protection. As a
> > result, it might leak skb buffer, particularly threaded IRQ
> > runs in the middle of tranmsmit path, near skb_linearize.
> 
> Can you say more ? only xmit sets waiting_tx_skb, the IRQ
> clears it. So why is IRQ racing with xmit leading to drops?

I believe xmit path and IRQ thread would run in different kernel instances. Imagine oa_tc6_try_spi_transfer
call fails in threaded IRQ. It would set disable_irq. If xmit function didn't see that when it checked, but it is set
before placing skb buffer in the waiting_tx_skb pointer (due to skb_linearize for example), the skb would be stuck
in waiting_tx_skb.

Also, See the Sashiko review that gave a SMP use-case. If you search for CPU0 0r CPU1, you would find the use case.

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260611-level-trigger-v5-0-4533a9e85ce2%40onsemi.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
  2026-06-30  4:16     ` Selvamani Rajagopal
@ 2026-06-30 22:46       ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2026-06-30 22:46 UTC (permalink / raw)
  To: Selvamani Rajagopal
  Cc: Selvamani Rajagopal via B4 Relay, Parthiban Veerasooran,
	Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet,
	Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrew Lunn

On Tue, 30 Jun 2026 04:16:24 +0000 Selvamani Rajagopal wrote:
> > On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay
> > wrote:  
> > > Threaded IRQ uses waiting_tx_skb. Transmit path also uses
> > > this pointer without any mutual exclusion protection. As a
> > > result, it might leak skb buffer, particularly threaded IRQ
> > > runs in the middle of tranmsmit path, near skb_linearize.  
> > 
> > Can you say more ? only xmit sets waiting_tx_skb, the IRQ
> > clears it. So why is IRQ racing with xmit leading to drops?  
> 
> I believe xmit path and IRQ thread would run in different kernel
> instances. Imagine oa_tc6_try_spi_transfer call fails in threaded
> IRQ. It would set disable_irq. If xmit function didn't see that when
> it checked, but it is set before placing skb buffer in the
> waiting_tx_skb pointer (due to skb_linearize for example), the skb
> would be stuck in waiting_tx_skb.

Perhaps, but wouldn't that cause a stall not a leak?

Please do your digging and submit high quality patches which don't
require research. We get 150 patches a day in netdev, and all
maintainers have day jobs (contrary to popular belief)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling
  2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay
  2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay
@ 2026-06-26 15:35 ` Selvamani Rajagopal via B4 Relay
  2026-06-30  2:29   ` Jakub Kicinski
  2026-06-30  5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran
  2 siblings, 1 reply; 10+ messages in thread
From: Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 UTC (permalink / raw)
  To: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, linux-kernel, Andrew Lunn, Parthiban Veerasooran,
	Selvamani Rajagopal

From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>

When oversubscribed traffic causes lot of buffer overflow errors,
probably due to loss of data chunks, driver fails to find a
data chunk with end_valid bit set, before it runs out of sk buffer
space. As a result, assert is seen during skb_put.

Now check is made if tail + len > end, driver abandons the current
data and starts look for a data chunk with start_valid bit,
that is a new frame.

Fixes: d70a0d8f2f2d ("net: ethernet: oa_tc6: implement receive path to receive rx ethernet frames")
Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>

---
changes in v2
  - Check rx_skb pointer before new allocation and NULL before use.
---
 drivers/net/ethernet/oa_tc6.c | 69 +++++++++++++++++++++++++++++++------------
 1 file changed, 50 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 3fd4851ee66d..c59daa032e70 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -692,6 +692,12 @@ static void oa_tc6_free_pending_skbs(struct oa_tc6 *tc6)
 	oa_tc6_cleanup_waiting_tx_skb(tc6);
 }
 
+static void oa_tc6_look_for_new_frame(struct oa_tc6 *tc6)
+{
+	tc6->rx_buf_overflow = true;
+	oa_tc6_cleanup_ongoing_rx_skb(tc6);
+}
+
 /* If the failure is at SPI interface level, masking and clearing
  * the interrupt of the device won't work. Since SPI interrupt is
  * disabled, it should stop the repeated interrupts.
@@ -729,8 +735,7 @@ static int oa_tc6_process_extended_status(struct oa_tc6 *tc6)
 	}
 
 	if (FIELD_GET(STATUS0_RX_BUFFER_OVERFLOW_ERROR, value)) {
-		tc6->rx_buf_overflow = true;
-		oa_tc6_cleanup_ongoing_rx_skb(tc6);
+		oa_tc6_look_for_new_frame(tc6);
 		net_err_ratelimited("%s: Receive buffer overflow error\n",
 				    tc6->netdev->name);
 		return -EAGAIN;
@@ -811,13 +816,35 @@ static void oa_tc6_submit_rx_skb(struct oa_tc6 *tc6)
 	tc6->rx_skb = NULL;
 }
 
-static void oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length)
+/* On oversubscribed traffic condition, particularly with overwhelming rx
+ * buffer overflow errors, there could be data chunk loss. If tail + length
+ * goes beyond end pointer, that is an indication that the data chunk with
+ * end_valid bit is lost. Time to look for a data chunk with start_valid bit.
+ *
+ * If rx_skb is NULL, it is time to start looking for data chunk with
+ * start_bit.
+ */
+static int oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length)
 {
+	if (!tc6->rx_skb ||
+	    (tc6->rx_skb->tail + length) > tc6->rx_skb->end) {
+		oa_tc6_look_for_new_frame(tc6);
+		return -EAGAIN;
+	}
+
 	memcpy(skb_put(tc6->rx_skb, length), payload, length);
+	return 0;
 }
 
+/* On overwhelming rx buffer overflow errors, due to data chunk loss, it is
+ * possible that we get two data chunks with start_valid bit set, without
+ * end_valid bit set in between. In this case, rx_skb would have a valid
+ * buffer pointer. We should release, if a valid pointer is found before
+ * allocating a new one.
+ */
 static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6)
 {
+	oa_tc6_cleanup_ongoing_rx_skb(tc6);
 	tc6->rx_skb = netdev_alloc_skb_ip_align(tc6->netdev, tc6->netdev->mtu +
 						ETH_HLEN + ETH_FCS_LEN);
 	if (!tc6->rx_skb) {
@@ -837,7 +864,9 @@ static int oa_tc6_prcs_complete_rx_frame(struct oa_tc6 *tc6, u8 *payload,
 	if (ret)
 		return ret;
 
-	oa_tc6_update_rx_skb(tc6, payload, size);
+	ret = oa_tc6_update_rx_skb(tc6, payload, size);
+	if (ret)
+		return ret;
 
 	oa_tc6_submit_rx_skb(tc6);
 
@@ -852,22 +881,24 @@ static int oa_tc6_prcs_rx_frame_start(struct oa_tc6 *tc6, u8 *payload, u16 size)
 	if (ret)
 		return ret;
 
-	oa_tc6_update_rx_skb(tc6, payload, size);
-
-	return 0;
+	return oa_tc6_update_rx_skb(tc6, payload, size);
 }
 
-static void oa_tc6_prcs_rx_frame_end(struct oa_tc6 *tc6, u8 *payload, u16 size)
+static int oa_tc6_prcs_rx_frame_end(struct oa_tc6 *tc6, u8 *payload, u16 size)
 {
-	oa_tc6_update_rx_skb(tc6, payload, size);
+	int ret;
 
-	oa_tc6_submit_rx_skb(tc6);
+	ret = oa_tc6_update_rx_skb(tc6, payload, size);
+	if (!ret)
+		oa_tc6_submit_rx_skb(tc6);
+	return ret;
 }
 
-static void oa_tc6_prcs_ongoing_rx_frame(struct oa_tc6 *tc6, u8 *payload,
-					 u32 footer)
+static int oa_tc6_prcs_ongoing_rx_frame(struct oa_tc6 *tc6, u8 *payload,
+					u32 footer)
 {
-	oa_tc6_update_rx_skb(tc6, payload, OA_TC6_CHUNK_PAYLOAD_SIZE);
+	return oa_tc6_update_rx_skb(tc6, payload,
+				    OA_TC6_CHUNK_PAYLOAD_SIZE);
 }
 
 static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
@@ -880,6 +911,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
 	bool start_valid = FIELD_GET(OA_TC6_DATA_FOOTER_START_VALID, footer);
 	bool end_valid = FIELD_GET(OA_TC6_DATA_FOOTER_END_VALID, footer);
 	u16 size;
+	int ret;
 
 	/* Restart the new rx frame after receiving rx buffer overflow error */
 	if (start_valid && tc6->rx_buf_overflow)
@@ -907,8 +939,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
 	/* Process the chunk with only rx frame end */
 	if (end_valid && !start_valid) {
 		size = end_byte_offset + 1;
-		oa_tc6_prcs_rx_frame_end(tc6, data, size);
-		return 0;
+		return oa_tc6_prcs_rx_frame_end(tc6, data, size);
 	}
 
 	/* Process the chunk with previous rx frame end and next rx frame
@@ -921,7 +952,9 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
 		 */
 		if (tc6->rx_skb) {
 			size = end_byte_offset + 1;
-			oa_tc6_prcs_rx_frame_end(tc6, data, size);
+			ret = oa_tc6_prcs_rx_frame_end(tc6, data, size);
+			if (ret)
+				return ret;
 		}
 		size = OA_TC6_CHUNK_PAYLOAD_SIZE - start_byte_offset;
 		return oa_tc6_prcs_rx_frame_start(tc6,
@@ -930,9 +963,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data,
 	}
 
 	/* Process the chunk with ongoing rx frame data */
-	oa_tc6_prcs_ongoing_rx_frame(tc6, data, footer);
-
-	return 0;
+	return oa_tc6_prcs_ongoing_rx_frame(tc6, data, footer);
 }
 
 static u32 oa_tc6_get_rx_chunk_footer(struct oa_tc6 *tc6, u16 footer_offset)

-- 
2.43.0



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling
  2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay
@ 2026-06-30  2:29   ` Jakub Kicinski
  2026-06-30  4:41     ` Selvamani Rajagopal
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2026-06-30  2:29 UTC (permalink / raw)
  To: Selvamani Rajagopal via B4 Relay
  Cc: Selvamani.Rajagopal, Parthiban Veerasooran, Andrew Lunn,
	Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, linux-kernel, Andrew Lunn

On Fri, 26 Jun 2026 08:35:19 -0700 Selvamani Rajagopal via B4 Relay
wrote:
> When oversubscribed traffic causes lot of buffer overflow errors,
> probably due to loss of data chunks, driver fails to find a
> data chunk with end_valid bit set, before it runs out of sk buffer
> space. As a result, assert is seen during skb_put.
> 
> Now check is made if tail + len > end, driver abandons the current
> data and starts look for a data chunk with start_valid bit,
> that is a new frame.

This sounds rather scary. The driver seems to have no length
information to confirm it got all the chunks. So if it missed
a middle chunk we will never know? At the very least this seems
like something we should increment rx_error for, not just rx_dropped?

Regarding the patch itself, I'm not clear on why we need to look
for new frame. Will we not notice the start bit immediately and
call oa_tc6_allocate_rx_skb() (if there is indeed a start bit in the
stream?)

So handling skb-already-exists in oa_tc6_allocate_rx_skb() seems
like enough to start a new frame.

Sashiko has another comment:

https://sashiko.dev/#/patchset/20260626-fix-race-condition-and-crash-v2-1-b6c5c10e604f@onsemi.com
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling
  2026-06-30  2:29   ` Jakub Kicinski
@ 2026-06-30  4:41     ` Selvamani Rajagopal
  0 siblings, 0 replies; 10+ messages in thread
From: Selvamani Rajagopal @ 2026-06-30  4:41 UTC (permalink / raw)
  To: Jakub Kicinski, Selvamani Rajagopal via B4 Relay
  Cc: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
	David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Monday, June 29, 2026 7:30 PM
> Subject: Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling
> 
> 
> This sounds rather scary. The driver seems to have no length
> information to confirm it got all the chunks. So if it missed
> a middle chunk we will never know? At the very least this seems
> like something we should increment rx_error for, not just rx_dropped?

If middle chunk is lost, we won't know. If we look at prcs_rx_chunk_payload, it relies on start_valid 
and end_valid bits only. start_byte_offset and end_byte_offset are always associated with the above
two bits.

When rx buffer overflow occurs, we don't know whether to trust the data chunk or lost, if lost,
how many lost, so, we drop everything and start looking for new frame indicated by "start_bit"
That's why drop count is incremented. 

If we dig deeper, though we increment the dropped count by one. It is possible that we may have lost 
more than one frame.

Function process_spi_data_rx_buf  bails out of the "for loop" with number_of_rx_chunks,
after EAGAIN error is seen. We don't go through all the rx chunks that are already received, when 
we start looking for next data chunks with "start_bit". 

That improvement is for another time. I didn't want to complicate the changes for this effort.

> 
> Regarding the patch itself, I'm not clear on why we need to look
> for new frame. Will we not notice the start bit immediately and
> call oa_tc6_allocate_rx_skb() (if there is indeed a start bit in the
> stream?)
> 

The chunk with "start_bit" could be next chunk or "nth" chunk. We don't know.
We keep throwing away incoming data chunks until we see a chunk with data_bit.

> So handling skb-already-exists in oa_tc6_allocate_rx_skb() seems
> like enough to start a new frame.

I don't think we can rely on this. What if we get two continuous data chunks with "start_bit" set
without end_valid bit? Later skb would overwrite the previous one. (either due to corruption or lost chunks)

> 
> Sashiko has another comment:

Let me read at the Sashiko review 

> 
> https://sashiko.dev/#/patchset/20260626-fix-race-condition-and-crash-v2-1-
> b6c5c10e604f@onsemi.com
> <https://sashiko.dev/#/patchset/20260626-fix-race-condition-and-crash-v2-1-b6c5c10e604f@onsemi.com
> =sashiko.dev>
> --
> pw-bot: cr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path
  2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay
  2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay
  2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay
@ 2026-06-30  5:38 ` Parthiban.Veerasooran
  2026-06-30  5:57   ` Selvamani Rajagopal
  2 siblings, 1 reply; 10+ messages in thread
From: Parthiban.Veerasooran @ 2026-06-30  5:38 UTC (permalink / raw)
  To: Selvamani.Rajagopal
  Cc: netdev, linux-kernel, andrew, andrew+netdev, pier.beruto, davem,
	edumazet, kuba, pabeni

Hi Selvamani,

On 26/06/26 9:05 pm, Selvamani Rajagopal via B4 Relay wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Now the traffic is handled in threaded IRQ, and the
> disable_traffic flag is checked before handling the
> data, new race condition is exposed, in which
> buffer may leak, if threaded IRQ interrupts the
> trasmit path midway.
> 
> With this change, disable_traffic and waiting_tx_skb
> pointer are protected by spin lock/unlock pair.
> 
> This is highlighted in Sashiko review
> https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260611-level-trigger-v5-0-4533a9e85ce2%40onsemi.com
> 
> Also on buffer overrun condition, probably due to loss of
> SPI data chunks, receive path doesn't see the expected
> data chunk with end_valid bit set. As a result, driver
> keeps adding data chunks to the skb before running out
> of space and kernel panic is seen.
> 
> With this change, before adding data to the skb, if there
> is no space, skb is freed and driver starts looking for
> new frame by looking for a data chunk with start_valid
> bit set.
> 
> [  705.405490] skbuff: skb_over_panic: text:ffffffd2eb72a264 len:1600 put:64 head:ffffff804e5cdc40 data:ffffff804e5cdc80 tail:0x680 end:0x640 dev:eth1
> [  705.405569] ------------[ cut here ]------------
> [  705.405575] kernel BUG at net/core/skbuff.c:214!
> [  705.405589] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
> 
> [ 6703.427690] Call trace:
> [  705.925157]  skb_panic+0x58/0x68 (P)
> [  705.928726]  skb_put+0x74/0x80
> [  705.931772]  oa_tc6_update_rx_skb+0x44/0x98 [oa_tc6_mod]
> [  705.937084]  oa_tc6_macphy_threaded_irq+0x3f4/0x900 [oa_tc6_mod]
> [  705.943084]  irq_thread_fn+0x34/0xb8
> [  705.946654]  irq_thread+0x1a0/0x300
> [  705.950134]  kthread+0x138/0x150
> [  705.953356]  ret_from_fork+0x10/0x20
> 
> Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> ---
> Changes in v2:
> - Improvment to how error -EAGAIN is handled. Took care of
>    couple of use cases where start_bit and end_bit may be missing or
>    repeated due to lost data chunks.
> - Protected handling of waiting_tx_skb pointer with spin lock
> - Link to v1: https://lore.kernel.org/r/20260621-fix-race-condition-and-crash-v1-0-87e290d9357f@onsemi.com
I performed a one-hour quick test using two instances of the LAN8651 
MAC-PHY Click (Test Case 2), and it ran without any crashes. Thank you 
for the fixes.

Best regards,
Parthiban V
> 
> ---
> Selvamani Rajagopal (2):
>        net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
>        net: ethernet: oa_tc6: Improvement in buffer overflow handling
> 
>   drivers/net/ethernet/oa_tc6.c | 91 ++++++++++++++++++++++++++++++-------------
>   1 file changed, 64 insertions(+), 27 deletions(-)
> ---
> base-commit: 805185b7c7a1069e407b6f7b3bc98e44d415f484
> change-id: 20260621-fix-race-condition-and-crash-94d055a665c4
> 
> Best regards,
> --
> Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path
  2026-06-30  5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran
@ 2026-06-30  5:57   ` Selvamani Rajagopal
  0 siblings, 0 replies; 10+ messages in thread
From: Selvamani Rajagopal @ 2026-06-30  5:57 UTC (permalink / raw)
  To: Parthiban.Veerasooran@microchip.com
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	andrew@lunn.ch, andrew+netdev@lunn.ch, Piergiorgio Beruto,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com

> -----Original Message-----
> From: Parthiban.Veerasooran@microchip.com <Parthiban.Veerasooran@microchip.com>
> Subject: Re: [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path
> 
> 
>
> I performed a one-hour quick test using two instances of the LAN8651
> MAC-PHY Click (Test Case 2), and it ran without any crashes. Thank you
> for the fixes.


Great. Appreciate your help. Thanks


> 
> Best regards,
> Parthiban V
> >
> > ---


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-06-30 22:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay
2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay
2026-06-30  2:15   ` Jakub Kicinski
2026-06-30  4:16     ` Selvamani Rajagopal
2026-06-30 22:46       ` Jakub Kicinski
2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay
2026-06-30  2:29   ` Jakub Kicinski
2026-06-30  4:41     ` Selvamani Rajagopal
2026-06-30  5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran
2026-06-30  5:57   ` Selvamani Rajagopal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox