* [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path
@ 2026-06-26 15:35 Selvamani Rajagopal via B4 Relay
2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 UTC (permalink / raw)
To: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Andrew Lunn, Parthiban Veerasooran,
Selvamani Rajagopal
Now the traffic is handled in threaded IRQ, and the
disable_traffic flag is checked before handling the
data, new race condition is exposed, in which
buffer may leak, if threaded IRQ interrupts the
trasmit path midway.
With this change, disable_traffic and waiting_tx_skb
pointer are protected by spin lock/unlock pair.
This is highlighted in Sashiko review
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260611-level-trigger-v5-0-4533a9e85ce2%40onsemi.com
Also on buffer overrun condition, probably due to loss of
SPI data chunks, receive path doesn't see the expected
data chunk with end_valid bit set. As a result, driver
keeps adding data chunks to the skb before running out
of space and kernel panic is seen.
With this change, before adding data to the skb, if there
is no space, skb is freed and driver starts looking for
new frame by looking for a data chunk with start_valid
bit set.
[ 705.405490] skbuff: skb_over_panic: text:ffffffd2eb72a264 len:1600 put:64 head:ffffff804e5cdc40 data:ffffff804e5cdc80 tail:0x680 end:0x640 dev:eth1
[ 705.405569] ------------[ cut here ]------------
[ 705.405575] kernel BUG at net/core/skbuff.c:214!
[ 705.405589] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 6703.427690] Call trace:
[ 705.925157] skb_panic+0x58/0x68 (P)
[ 705.928726] skb_put+0x74/0x80
[ 705.931772] oa_tc6_update_rx_skb+0x44/0x98 [oa_tc6_mod]
[ 705.937084] oa_tc6_macphy_threaded_irq+0x3f4/0x900 [oa_tc6_mod]
[ 705.943084] irq_thread_fn+0x34/0xb8
[ 705.946654] irq_thread+0x1a0/0x300
[ 705.950134] kthread+0x138/0x150
[ 705.953356] ret_from_fork+0x10/0x20
Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
---
Changes in v2:
- Improvment to how error -EAGAIN is handled. Took care of
couple of use cases where start_bit and end_bit may be missing or
repeated due to lost data chunks.
- Protected handling of waiting_tx_skb pointer with spin lock
- Link to v1: https://lore.kernel.org/r/20260621-fix-race-condition-and-crash-v1-0-87e290d9357f@onsemi.com
---
Selvamani Rajagopal (2):
net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances
net: ethernet: oa_tc6: Improvement in buffer overflow handling
drivers/net/ethernet/oa_tc6.c | 91 ++++++++++++++++++++++++++++++-------------
1 file changed, 64 insertions(+), 27 deletions(-)
---
base-commit: 805185b7c7a1069e407b6f7b3bc98e44d415f484
change-id: 20260621-fix-race-condition-and-crash-94d055a665c4
Best regards,
--
Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances 2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 ` Selvamani Rajagopal via B4 Relay 2026-06-30 2:15 ` Jakub Kicinski 2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay 2026-06-30 5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran 2 siblings, 1 reply; 10+ messages in thread From: Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 UTC (permalink / raw) To: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: netdev, linux-kernel, Andrew Lunn, Parthiban Veerasooran, Selvamani Rajagopal From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com> Threaded IRQ uses waiting_tx_skb. Transmit path also uses this pointer without any mutual exclusion protection. As a result, it might leak skb buffer, particularly threaded IRQ runs in the middle of tranmsmit path, near skb_linearize. Fixes: b542d13fab0f ("net: ethernet: oa_tc6: Interrupt is active low, level triggered.") Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com> --- changes in v2: added the missing prefix to the title --- drivers/net/ethernet/oa_tc6.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c index 0727d53345a3..3fd4851ee66d 100644 --- a/drivers/net/ethernet/oa_tc6.c +++ b/drivers/net/ethernet/oa_tc6.c @@ -672,10 +672,16 @@ static void oa_tc6_cleanup_ongoing_tx_skb(struct oa_tc6 *tc6) static void oa_tc6_cleanup_waiting_tx_skb(struct oa_tc6 *tc6) { - if (tc6->waiting_tx_skb) { + struct sk_buff *skb; + + spin_lock_bh(&tc6->tx_skb_lock); + skb = tc6->waiting_tx_skb; + tc6->waiting_tx_skb = NULL; + spin_unlock_bh(&tc6->tx_skb_lock); + + if (skb) { tc6->netdev->stats.tx_dropped++; - kfree_skb(tc6->waiting_tx_skb); - tc6->waiting_tx_skb = NULL; + kfree_skb(skb); } } @@ -1250,11 +1256,6 @@ EXPORT_SYMBOL_GPL(oa_tc6_zero_align_receive_frame_enable); */ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb) { - if (tc6->disable_traffic || tc6->waiting_tx_skb) { - netif_stop_queue(tc6->netdev); - return NETDEV_TX_BUSY; - } - if (skb_linearize(skb)) { dev_kfree_skb_any(skb); tc6->netdev->stats.tx_dropped++; @@ -1262,6 +1263,11 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb) } spin_lock_bh(&tc6->tx_skb_lock); + if (tc6->disable_traffic || tc6->waiting_tx_skb) { + netif_stop_queue(tc6->netdev); + spin_unlock_bh(&tc6->tx_skb_lock); + return NETDEV_TX_BUSY; + } tc6->waiting_tx_skb = skb; spin_unlock_bh(&tc6->tx_skb_lock); -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances 2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay @ 2026-06-30 2:15 ` Jakub Kicinski 2026-06-30 4:16 ` Selvamani Rajagopal 0 siblings, 1 reply; 10+ messages in thread From: Jakub Kicinski @ 2026-06-30 2:15 UTC (permalink / raw) To: Selvamani Rajagopal via B4 Relay Cc: Selvamani.Rajagopal, Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel, Andrew Lunn On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay wrote: > Threaded IRQ uses waiting_tx_skb. Transmit path also uses > this pointer without any mutual exclusion protection. As a > result, it might leak skb buffer, particularly threaded IRQ > runs in the middle of tranmsmit path, near skb_linearize. Can you say more ? only xmit sets waiting_tx_skb, the IRQ clears it. So why is IRQ racing with xmit leading to drops? ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances 2026-06-30 2:15 ` Jakub Kicinski @ 2026-06-30 4:16 ` Selvamani Rajagopal 2026-06-30 22:46 ` Jakub Kicinski 0 siblings, 1 reply; 10+ messages in thread From: Selvamani Rajagopal @ 2026-06-30 4:16 UTC (permalink / raw) To: Jakub Kicinski, Selvamani Rajagopal via B4 Relay Cc: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Subject: Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two > different kernel instances > > > On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay > wrote: > > Threaded IRQ uses waiting_tx_skb. Transmit path also uses > > this pointer without any mutual exclusion protection. As a > > result, it might leak skb buffer, particularly threaded IRQ > > runs in the middle of tranmsmit path, near skb_linearize. > > Can you say more ? only xmit sets waiting_tx_skb, the IRQ > clears it. So why is IRQ racing with xmit leading to drops? I believe xmit path and IRQ thread would run in different kernel instances. Imagine oa_tc6_try_spi_transfer call fails in threaded IRQ. It would set disable_irq. If xmit function didn't see that when it checked, but it is set before placing skb buffer in the waiting_tx_skb pointer (due to skb_linearize for example), the skb would be stuck in waiting_tx_skb. Also, See the Sashiko review that gave a SMP use-case. If you search for CPU0 0r CPU1, you would find the use case. https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260611-level-trigger-v5-0-4533a9e85ce2%40onsemi.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances 2026-06-30 4:16 ` Selvamani Rajagopal @ 2026-06-30 22:46 ` Jakub Kicinski 0 siblings, 0 replies; 10+ messages in thread From: Jakub Kicinski @ 2026-06-30 22:46 UTC (permalink / raw) To: Selvamani Rajagopal Cc: Selvamani Rajagopal via B4 Relay, Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn On Tue, 30 Jun 2026 04:16:24 +0000 Selvamani Rajagopal wrote: > > On Fri, 26 Jun 2026 08:35:18 -0700 Selvamani Rajagopal via B4 Relay > > wrote: > > > Threaded IRQ uses waiting_tx_skb. Transmit path also uses > > > this pointer without any mutual exclusion protection. As a > > > result, it might leak skb buffer, particularly threaded IRQ > > > runs in the middle of tranmsmit path, near skb_linearize. > > > > Can you say more ? only xmit sets waiting_tx_skb, the IRQ > > clears it. So why is IRQ racing with xmit leading to drops? > > I believe xmit path and IRQ thread would run in different kernel > instances. Imagine oa_tc6_try_spi_transfer call fails in threaded > IRQ. It would set disable_irq. If xmit function didn't see that when > it checked, but it is set before placing skb buffer in the > waiting_tx_skb pointer (due to skb_linearize for example), the skb > would be stuck in waiting_tx_skb. Perhaps, but wouldn't that cause a stall not a leak? Please do your digging and submit high quality patches which don't require research. We get 150 patches a day in netdev, and all maintainers have day jobs (contrary to popular belief) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling 2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay 2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 ` Selvamani Rajagopal via B4 Relay 2026-06-30 2:29 ` Jakub Kicinski 2026-06-30 5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran 2 siblings, 1 reply; 10+ messages in thread From: Selvamani Rajagopal via B4 Relay @ 2026-06-26 15:35 UTC (permalink / raw) To: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: netdev, linux-kernel, Andrew Lunn, Parthiban Veerasooran, Selvamani Rajagopal From: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com> When oversubscribed traffic causes lot of buffer overflow errors, probably due to loss of data chunks, driver fails to find a data chunk with end_valid bit set, before it runs out of sk buffer space. As a result, assert is seen during skb_put. Now check is made if tail + len > end, driver abandons the current data and starts look for a data chunk with start_valid bit, that is a new frame. Fixes: d70a0d8f2f2d ("net: ethernet: oa_tc6: implement receive path to receive rx ethernet frames") Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com> --- changes in v2 - Check rx_skb pointer before new allocation and NULL before use. --- drivers/net/ethernet/oa_tc6.c | 69 +++++++++++++++++++++++++++++++------------ 1 file changed, 50 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c index 3fd4851ee66d..c59daa032e70 100644 --- a/drivers/net/ethernet/oa_tc6.c +++ b/drivers/net/ethernet/oa_tc6.c @@ -692,6 +692,12 @@ static void oa_tc6_free_pending_skbs(struct oa_tc6 *tc6) oa_tc6_cleanup_waiting_tx_skb(tc6); } +static void oa_tc6_look_for_new_frame(struct oa_tc6 *tc6) +{ + tc6->rx_buf_overflow = true; + oa_tc6_cleanup_ongoing_rx_skb(tc6); +} + /* If the failure is at SPI interface level, masking and clearing * the interrupt of the device won't work. Since SPI interrupt is * disabled, it should stop the repeated interrupts. @@ -729,8 +735,7 @@ static int oa_tc6_process_extended_status(struct oa_tc6 *tc6) } if (FIELD_GET(STATUS0_RX_BUFFER_OVERFLOW_ERROR, value)) { - tc6->rx_buf_overflow = true; - oa_tc6_cleanup_ongoing_rx_skb(tc6); + oa_tc6_look_for_new_frame(tc6); net_err_ratelimited("%s: Receive buffer overflow error\n", tc6->netdev->name); return -EAGAIN; @@ -811,13 +816,35 @@ static void oa_tc6_submit_rx_skb(struct oa_tc6 *tc6) tc6->rx_skb = NULL; } -static void oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length) +/* On oversubscribed traffic condition, particularly with overwhelming rx + * buffer overflow errors, there could be data chunk loss. If tail + length + * goes beyond end pointer, that is an indication that the data chunk with + * end_valid bit is lost. Time to look for a data chunk with start_valid bit. + * + * If rx_skb is NULL, it is time to start looking for data chunk with + * start_bit. + */ +static int oa_tc6_update_rx_skb(struct oa_tc6 *tc6, u8 *payload, u8 length) { + if (!tc6->rx_skb || + (tc6->rx_skb->tail + length) > tc6->rx_skb->end) { + oa_tc6_look_for_new_frame(tc6); + return -EAGAIN; + } + memcpy(skb_put(tc6->rx_skb, length), payload, length); + return 0; } +/* On overwhelming rx buffer overflow errors, due to data chunk loss, it is + * possible that we get two data chunks with start_valid bit set, without + * end_valid bit set in between. In this case, rx_skb would have a valid + * buffer pointer. We should release, if a valid pointer is found before + * allocating a new one. + */ static int oa_tc6_allocate_rx_skb(struct oa_tc6 *tc6) { + oa_tc6_cleanup_ongoing_rx_skb(tc6); tc6->rx_skb = netdev_alloc_skb_ip_align(tc6->netdev, tc6->netdev->mtu + ETH_HLEN + ETH_FCS_LEN); if (!tc6->rx_skb) { @@ -837,7 +864,9 @@ static int oa_tc6_prcs_complete_rx_frame(struct oa_tc6 *tc6, u8 *payload, if (ret) return ret; - oa_tc6_update_rx_skb(tc6, payload, size); + ret = oa_tc6_update_rx_skb(tc6, payload, size); + if (ret) + return ret; oa_tc6_submit_rx_skb(tc6); @@ -852,22 +881,24 @@ static int oa_tc6_prcs_rx_frame_start(struct oa_tc6 *tc6, u8 *payload, u16 size) if (ret) return ret; - oa_tc6_update_rx_skb(tc6, payload, size); - - return 0; + return oa_tc6_update_rx_skb(tc6, payload, size); } -static void oa_tc6_prcs_rx_frame_end(struct oa_tc6 *tc6, u8 *payload, u16 size) +static int oa_tc6_prcs_rx_frame_end(struct oa_tc6 *tc6, u8 *payload, u16 size) { - oa_tc6_update_rx_skb(tc6, payload, size); + int ret; - oa_tc6_submit_rx_skb(tc6); + ret = oa_tc6_update_rx_skb(tc6, payload, size); + if (!ret) + oa_tc6_submit_rx_skb(tc6); + return ret; } -static void oa_tc6_prcs_ongoing_rx_frame(struct oa_tc6 *tc6, u8 *payload, - u32 footer) +static int oa_tc6_prcs_ongoing_rx_frame(struct oa_tc6 *tc6, u8 *payload, + u32 footer) { - oa_tc6_update_rx_skb(tc6, payload, OA_TC6_CHUNK_PAYLOAD_SIZE); + return oa_tc6_update_rx_skb(tc6, payload, + OA_TC6_CHUNK_PAYLOAD_SIZE); } static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data, @@ -880,6 +911,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data, bool start_valid = FIELD_GET(OA_TC6_DATA_FOOTER_START_VALID, footer); bool end_valid = FIELD_GET(OA_TC6_DATA_FOOTER_END_VALID, footer); u16 size; + int ret; /* Restart the new rx frame after receiving rx buffer overflow error */ if (start_valid && tc6->rx_buf_overflow) @@ -907,8 +939,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data, /* Process the chunk with only rx frame end */ if (end_valid && !start_valid) { size = end_byte_offset + 1; - oa_tc6_prcs_rx_frame_end(tc6, data, size); - return 0; + return oa_tc6_prcs_rx_frame_end(tc6, data, size); } /* Process the chunk with previous rx frame end and next rx frame @@ -921,7 +952,9 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data, */ if (tc6->rx_skb) { size = end_byte_offset + 1; - oa_tc6_prcs_rx_frame_end(tc6, data, size); + ret = oa_tc6_prcs_rx_frame_end(tc6, data, size); + if (ret) + return ret; } size = OA_TC6_CHUNK_PAYLOAD_SIZE - start_byte_offset; return oa_tc6_prcs_rx_frame_start(tc6, @@ -930,9 +963,7 @@ static int oa_tc6_prcs_rx_chunk_payload(struct oa_tc6 *tc6, u8 *data, } /* Process the chunk with ongoing rx frame data */ - oa_tc6_prcs_ongoing_rx_frame(tc6, data, footer); - - return 0; + return oa_tc6_prcs_ongoing_rx_frame(tc6, data, footer); } static u32 oa_tc6_get_rx_chunk_footer(struct oa_tc6 *tc6, u16 footer_offset) -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling 2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay @ 2026-06-30 2:29 ` Jakub Kicinski 2026-06-30 4:41 ` Selvamani Rajagopal 0 siblings, 1 reply; 10+ messages in thread From: Jakub Kicinski @ 2026-06-30 2:29 UTC (permalink / raw) To: Selvamani Rajagopal via B4 Relay Cc: Selvamani.Rajagopal, Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni, netdev, linux-kernel, Andrew Lunn On Fri, 26 Jun 2026 08:35:19 -0700 Selvamani Rajagopal via B4 Relay wrote: > When oversubscribed traffic causes lot of buffer overflow errors, > probably due to loss of data chunks, driver fails to find a > data chunk with end_valid bit set, before it runs out of sk buffer > space. As a result, assert is seen during skb_put. > > Now check is made if tail + len > end, driver abandons the current > data and starts look for a data chunk with start_valid bit, > that is a new frame. This sounds rather scary. The driver seems to have no length information to confirm it got all the chunks. So if it missed a middle chunk we will never know? At the very least this seems like something we should increment rx_error for, not just rx_dropped? Regarding the patch itself, I'm not clear on why we need to look for new frame. Will we not notice the start bit immediately and call oa_tc6_allocate_rx_skb() (if there is indeed a start bit in the stream?) So handling skb-already-exists in oa_tc6_allocate_rx_skb() seems like enough to start a new frame. Sashiko has another comment: https://sashiko.dev/#/patchset/20260626-fix-race-condition-and-crash-v2-1-b6c5c10e604f@onsemi.com -- pw-bot: cr ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling 2026-06-30 2:29 ` Jakub Kicinski @ 2026-06-30 4:41 ` Selvamani Rajagopal 0 siblings, 0 replies; 10+ messages in thread From: Selvamani Rajagopal @ 2026-06-30 4:41 UTC (permalink / raw) To: Jakub Kicinski, Selvamani Rajagopal via B4 Relay Cc: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto, David S. Miller, Eric Dumazet, Paolo Abeni, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Monday, June 29, 2026 7:30 PM > Subject: Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling > > > This sounds rather scary. The driver seems to have no length > information to confirm it got all the chunks. So if it missed > a middle chunk we will never know? At the very least this seems > like something we should increment rx_error for, not just rx_dropped? If middle chunk is lost, we won't know. If we look at prcs_rx_chunk_payload, it relies on start_valid and end_valid bits only. start_byte_offset and end_byte_offset are always associated with the above two bits. When rx buffer overflow occurs, we don't know whether to trust the data chunk or lost, if lost, how many lost, so, we drop everything and start looking for new frame indicated by "start_bit" That's why drop count is incremented. If we dig deeper, though we increment the dropped count by one. It is possible that we may have lost more than one frame. Function process_spi_data_rx_buf bails out of the "for loop" with number_of_rx_chunks, after EAGAIN error is seen. We don't go through all the rx chunks that are already received, when we start looking for next data chunks with "start_bit". That improvement is for another time. I didn't want to complicate the changes for this effort. > > Regarding the patch itself, I'm not clear on why we need to look > for new frame. Will we not notice the start bit immediately and > call oa_tc6_allocate_rx_skb() (if there is indeed a start bit in the > stream?) > The chunk with "start_bit" could be next chunk or "nth" chunk. We don't know. We keep throwing away incoming data chunks until we see a chunk with data_bit. > So handling skb-already-exists in oa_tc6_allocate_rx_skb() seems > like enough to start a new frame. I don't think we can rely on this. What if we get two continuous data chunks with "start_bit" set without end_valid bit? Later skb would overwrite the previous one. (either due to corruption or lost chunks) > > Sashiko has another comment: Let me read at the Sashiko review > > https://sashiko.dev/#/patchset/20260626-fix-race-condition-and-crash-v2-1- > b6c5c10e604f@onsemi.com > <https://sashiko.dev/#/patchset/20260626-fix-race-condition-and-crash-v2-1-b6c5c10e604f@onsemi.com > =sashiko.dev> > -- > pw-bot: cr ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path 2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay 2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay 2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay @ 2026-06-30 5:38 ` Parthiban.Veerasooran 2026-06-30 5:57 ` Selvamani Rajagopal 2 siblings, 1 reply; 10+ messages in thread From: Parthiban.Veerasooran @ 2026-06-30 5:38 UTC (permalink / raw) To: Selvamani.Rajagopal Cc: netdev, linux-kernel, andrew, andrew+netdev, pier.beruto, davem, edumazet, kuba, pabeni Hi Selvamani, On 26/06/26 9:05 pm, Selvamani Rajagopal via B4 Relay wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > Now the traffic is handled in threaded IRQ, and the > disable_traffic flag is checked before handling the > data, new race condition is exposed, in which > buffer may leak, if threaded IRQ interrupts the > trasmit path midway. > > With this change, disable_traffic and waiting_tx_skb > pointer are protected by spin lock/unlock pair. > > This is highlighted in Sashiko review > https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260611-level-trigger-v5-0-4533a9e85ce2%40onsemi.com > > Also on buffer overrun condition, probably due to loss of > SPI data chunks, receive path doesn't see the expected > data chunk with end_valid bit set. As a result, driver > keeps adding data chunks to the skb before running out > of space and kernel panic is seen. > > With this change, before adding data to the skb, if there > is no space, skb is freed and driver starts looking for > new frame by looking for a data chunk with start_valid > bit set. > > [ 705.405490] skbuff: skb_over_panic: text:ffffffd2eb72a264 len:1600 put:64 head:ffffff804e5cdc40 data:ffffff804e5cdc80 tail:0x680 end:0x640 dev:eth1 > [ 705.405569] ------------[ cut here ]------------ > [ 705.405575] kernel BUG at net/core/skbuff.c:214! > [ 705.405589] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP > > [ 6703.427690] Call trace: > [ 705.925157] skb_panic+0x58/0x68 (P) > [ 705.928726] skb_put+0x74/0x80 > [ 705.931772] oa_tc6_update_rx_skb+0x44/0x98 [oa_tc6_mod] > [ 705.937084] oa_tc6_macphy_threaded_irq+0x3f4/0x900 [oa_tc6_mod] > [ 705.943084] irq_thread_fn+0x34/0xb8 > [ 705.946654] irq_thread+0x1a0/0x300 > [ 705.950134] kthread+0x138/0x150 > [ 705.953356] ret_from_fork+0x10/0x20 > > Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com> > --- > Changes in v2: > - Improvment to how error -EAGAIN is handled. Took care of > couple of use cases where start_bit and end_bit may be missing or > repeated due to lost data chunks. > - Protected handling of waiting_tx_skb pointer with spin lock > - Link to v1: https://lore.kernel.org/r/20260621-fix-race-condition-and-crash-v1-0-87e290d9357f@onsemi.com I performed a one-hour quick test using two instances of the LAN8651 MAC-PHY Click (Test Case 2), and it ran without any crashes. Thank you for the fixes. Best regards, Parthiban V > > --- > Selvamani Rajagopal (2): > net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances > net: ethernet: oa_tc6: Improvement in buffer overflow handling > > drivers/net/ethernet/oa_tc6.c | 91 ++++++++++++++++++++++++++++++------------- > 1 file changed, 64 insertions(+), 27 deletions(-) > --- > base-commit: 805185b7c7a1069e407b6f7b3bc98e44d415f484 > change-id: 20260621-fix-race-condition-and-crash-94d055a665c4 > > Best regards, > -- > Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com> > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path 2026-06-30 5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran @ 2026-06-30 5:57 ` Selvamani Rajagopal 0 siblings, 0 replies; 10+ messages in thread From: Selvamani Rajagopal @ 2026-06-30 5:57 UTC (permalink / raw) To: Parthiban.Veerasooran@microchip.com Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, andrew@lunn.ch, andrew+netdev@lunn.ch, Piergiorgio Beruto, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com > -----Original Message----- > From: Parthiban.Veerasooran@microchip.com <Parthiban.Veerasooran@microchip.com> > Subject: Re: [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path > > > > I performed a one-hour quick test using two instances of the LAN8651 > MAC-PHY Click (Test Case 2), and it ran without any crashes. Thank you > for the fixes. Great. Appreciate your help. Thanks > > Best regards, > Parthiban V > > > > --- ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-06-30 22:46 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-26 15:35 [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Selvamani Rajagopal via B4 Relay 2026-06-26 15:35 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: Protect skb pointer used by two different kernel instances Selvamani Rajagopal via B4 Relay 2026-06-30 2:15 ` Jakub Kicinski 2026-06-30 4:16 ` Selvamani Rajagopal 2026-06-30 22:46 ` Jakub Kicinski 2026-06-26 15:35 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: Improvement in buffer overflow handling Selvamani Rajagopal via B4 Relay 2026-06-30 2:29 ` Jakub Kicinski 2026-06-30 4:41 ` Selvamani Rajagopal 2026-06-30 5:38 ` [PATCH net v2 0/2] Fix to possible skb leak due to race condtion in tx path Parthiban.Veerasooran 2026-06-30 5:57 ` Selvamani Rajagopal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox