* [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib
@ 2024-11-22 10:21 Parthiban Veerasooran
2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
0 siblings, 2 replies; 12+ messages in thread
From: Parthiban Veerasooran @ 2024-11-22 10:21 UTC (permalink / raw)
To: andrew+netdev, davem, edumazet, kuba, pabeni
Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
jacob.e.keller
This patch series contain the below fixes.
- Infinite loop error when tx credits becomes 0.
- Race condition between tx skb reference pointers.
v2:
- Added mutex lock to protect tx skb reference handling.
Parthiban Veerasooran (2):
net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes
0
net: ethernet: oa_tc6: fix tx skb race condition between reference
pointers
drivers/net/ethernet/oa_tc6.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
base-commit: fcc79e1714e8c2b8e216dc3149812edd37884eef
--
2.34.1
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran @ 2024-11-22 10:21 ` Parthiban Veerasooran 2024-11-26 10:44 ` Paolo Abeni 2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran 1 sibling, 1 reply; 12+ messages in thread From: Parthiban Veerasooran @ 2024-11-22 10:21 UTC (permalink / raw) To: andrew+netdev, davem, edumazet, kuba, pabeni Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller SPI thread wakes up to perform SPI transfer whenever there is an TX skb from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is transferred based on the availability tx credits in the MAC-PHY which is reported from the previous SPI transfer. Sometimes there is a possibility that TX skb is available to transmit but there is no tx credits from MAC-PHY. In this case, there will not be any SPI transfer but the thread will be running in an endless loop until tx credits available again. So checking the availability of tx credits along with TX skb will prevent the above infinite loop. When the tx credits available again that will be notified through interrupt which will trigger the SPI transfer to get the available tx credits. Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> --- drivers/net/ethernet/oa_tc6.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c index f9c0dcd965c2..4c8b0ca922b7 100644 --- a/drivers/net/ethernet/oa_tc6.c +++ b/drivers/net/ethernet/oa_tc6.c @@ -1111,8 +1111,9 @@ static int oa_tc6_spi_thread_handler(void *data) /* This kthread will be waken up if there is a tx skb or mac-phy * interrupt to perform spi transfer with tx chunks. */ - wait_event_interruptible(tc6->spi_wq, tc6->waiting_tx_skb || - tc6->int_flag || + wait_event_interruptible(tc6->spi_wq, tc6->int_flag || + (tc6->waiting_tx_skb && + tc6->tx_credits) || kthread_should_stop()); if (kthread_should_stop()) -- 2.34.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran @ 2024-11-26 10:44 ` Paolo Abeni 2024-11-27 8:43 ` Parthiban.Veerasooran 0 siblings, 1 reply; 12+ messages in thread From: Paolo Abeni @ 2024-11-26 10:44 UTC (permalink / raw) To: Parthiban Veerasooran, andrew+netdev, davem, edumazet, kuba Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller On 11/22/24 11:21, Parthiban Veerasooran wrote: > SPI thread wakes up to perform SPI transfer whenever there is an TX skb > from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is > transferred based on the availability tx credits in the MAC-PHY which is > reported from the previous SPI transfer. Sometimes there is a possibility > that TX skb is available to transmit but there is no tx credits from > MAC-PHY. In this case, there will not be any SPI transfer but the thread > will be running in an endless loop until tx credits available again. > > So checking the availability of tx credits along with TX skb will prevent > the above infinite loop. When the tx credits available again that will be > notified through interrupt which will trigger the SPI transfer to get the > available tx credits. > > Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames") > > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Please, avoid empty lines between the Fixes tag and the SoB Thanks, Paolo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 2024-11-26 10:44 ` Paolo Abeni @ 2024-11-27 8:43 ` Parthiban.Veerasooran 0 siblings, 0 replies; 12+ messages in thread From: Parthiban.Veerasooran @ 2024-11-27 8:43 UTC (permalink / raw) To: pabeni Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet, kuba Hi Paolo, On 26/11/24 4:14 pm, Paolo Abeni wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On 11/22/24 11:21, Parthiban Veerasooran wrote: >> SPI thread wakes up to perform SPI transfer whenever there is an TX skb >> from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is >> transferred based on the availability tx credits in the MAC-PHY which is >> reported from the previous SPI transfer. Sometimes there is a possibility >> that TX skb is available to transmit but there is no tx credits from >> MAC-PHY. In this case, there will not be any SPI transfer but the thread >> will be running in an endless loop until tx credits available again. >> >> So checking the availability of tx credits along with TX skb will prevent >> the above infinite loop. When the tx credits available again that will be >> notified through interrupt which will trigger the SPI transfer to get the >> available tx credits. >> >> Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames") >> >> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> >> Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> > > Please, avoid empty lines between the Fixes tag and the SoB Oh yes, will correct it in the next version. Best regards, Parthiban V > > Thanks, > > Paolo > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran 2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran @ 2024-11-22 10:21 ` Parthiban Veerasooran 2024-11-26 10:41 ` Paolo Abeni 1 sibling, 1 reply; 12+ messages in thread From: Parthiban Veerasooran @ 2024-11-22 10:21 UTC (permalink / raw) To: andrew+netdev, davem, edumazet, kuba, pabeni Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller There are two skb pointers to manage tx skb's enqueued from n/w stack. waiting_tx_skb pointer points to the tx skb which needs to be processed and ongoing_tx_skb pointer points to the tx skb which is being processed. SPI thread prepares the tx data chunks from the tx skb pointed by the ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is processed, the tx skb pointed by the waiting_tx_skb is assigned to ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL. Whenever there is a new tx skb from n/w stack, it will be assigned to waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb handled in two different threads. Consider a scenario where the SPI thread processed an ongoing_tx_skb and it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer without doing any NULL check. At this time, if the waiting_tx_skb pointer is NULL then ongoing_tx_skb pointer is also assigned with NULL. After that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w stack and there is a chance to overwrite the tx skb pointer with NULL in the SPI thread. Finally one of the tx skb will be left as unhandled, resulting packet missing and memory leak. To overcome the above issue, protect the moving of tx skb reference from waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread can't access the waiting_tx_skb pointer until the current thread completes moving the tx skb reference safely. Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames") Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> --- drivers/net/ethernet/oa_tc6.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c index 4c8b0ca922b7..2ddb1faafe55 100644 --- a/drivers/net/ethernet/oa_tc6.c +++ b/drivers/net/ethernet/oa_tc6.c @@ -113,6 +113,7 @@ struct oa_tc6 { struct mii_bus *mdiobus; struct spi_device *spi; struct mutex spi_ctrl_lock; /* Protects spi control transfer */ + struct mutex tx_skb_lock; /* Protects tx skb handling */ void *spi_ctrl_tx_buf; void *spi_ctrl_rx_buf; void *spi_data_tx_buf; @@ -1004,8 +1005,10 @@ static u16 oa_tc6_prepare_spi_tx_buf_for_tx_skbs(struct oa_tc6 *tc6) for (used_tx_credits = 0; used_tx_credits < tc6->tx_credits; used_tx_credits++) { if (!tc6->ongoing_tx_skb) { + mutex_lock(&tc6->tx_skb_lock); tc6->ongoing_tx_skb = tc6->waiting_tx_skb; tc6->waiting_tx_skb = NULL; + mutex_unlock(&tc6->tx_skb_lock); } if (!tc6->ongoing_tx_skb) break; @@ -1240,6 +1243,7 @@ struct oa_tc6 *oa_tc6_init(struct spi_device *spi, struct net_device *netdev) tc6->netdev = netdev; SET_NETDEV_DEV(netdev, &spi->dev); mutex_init(&tc6->spi_ctrl_lock); + mutex_init(&tc6->tx_skb_lock); /* Set the SPI controller to pump at realtime priority */ tc6->spi->rt = true; -- 2.34.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran @ 2024-11-26 10:41 ` Paolo Abeni 2024-11-27 10:49 ` Parthiban.Veerasooran 0 siblings, 1 reply; 12+ messages in thread From: Paolo Abeni @ 2024-11-26 10:41 UTC (permalink / raw) To: Parthiban Veerasooran, andrew+netdev, davem, edumazet, kuba Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller On 11/22/24 11:21, Parthiban Veerasooran wrote: > There are two skb pointers to manage tx skb's enqueued from n/w stack. > waiting_tx_skb pointer points to the tx skb which needs to be processed > and ongoing_tx_skb pointer points to the tx skb which is being processed. > > SPI thread prepares the tx data chunks from the tx skb pointed by the > ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is > processed, the tx skb pointed by the waiting_tx_skb is assigned to > ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL. > Whenever there is a new tx skb from n/w stack, it will be assigned to > waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb > handled in two different threads. > > Consider a scenario where the SPI thread processed an ongoing_tx_skb and > it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer > without doing any NULL check. At this time, if the waiting_tx_skb pointer > is NULL then ongoing_tx_skb pointer is also assigned with NULL. After > that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w > stack and there is a chance to overwrite the tx skb pointer with NULL in > the SPI thread. Finally one of the tx skb will be left as unhandled, > resulting packet missing and memory leak. > To overcome the above issue, protect the moving of tx skb reference from > waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread > can't access the waiting_tx_skb pointer until the current thread completes > moving the tx skb reference safely. A mutex looks overkill. Why don't you use a spinlock? why locking only one side (the writer) would be enough? Could you please report the exact sequence of events in a time diagram leading to the bug, something alike the following? CPU0 CPU1 oa_tc6_start_xmit ... oa_tc6_spi_thread_handler ... Thanks, Paolo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-11-26 10:41 ` Paolo Abeni @ 2024-11-27 10:49 ` Parthiban.Veerasooran 2024-11-27 12:11 ` Paolo Abeni 0 siblings, 1 reply; 12+ messages in thread From: Parthiban.Veerasooran @ 2024-11-27 10:49 UTC (permalink / raw) To: pabeni Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet, kuba Hi Paolo, On 26/11/24 4:11 pm, Paolo Abeni wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On 11/22/24 11:21, Parthiban Veerasooran wrote: >> There are two skb pointers to manage tx skb's enqueued from n/w stack. >> waiting_tx_skb pointer points to the tx skb which needs to be processed >> and ongoing_tx_skb pointer points to the tx skb which is being processed. >> >> SPI thread prepares the tx data chunks from the tx skb pointed by the >> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is >> processed, the tx skb pointed by the waiting_tx_skb is assigned to >> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL. >> Whenever there is a new tx skb from n/w stack, it will be assigned to >> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb >> handled in two different threads. >> >> Consider a scenario where the SPI thread processed an ongoing_tx_skb and >> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer >> without doing any NULL check. At this time, if the waiting_tx_skb pointer >> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After >> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w >> stack and there is a chance to overwrite the tx skb pointer with NULL in >> the SPI thread. Finally one of the tx skb will be left as unhandled, >> resulting packet missing and memory leak. >> To overcome the above issue, protect the moving of tx skb reference from >> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread >> can't access the waiting_tx_skb pointer until the current thread completes >> moving the tx skb reference safely. > > A mutex looks overkill. Why don't you use a spinlock? why locking only > one side (the writer) would be enough? Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will become like below, mutex_lock(&tc6->tx_skb_lock); tc6->waiting_tx_skb = skb; mutex_unlock(&tc6->tx_skb_lock); As both are not called from atomic context and they are allowed to sleep, I used mutex rather than spinlock. > > Could you please report the exact sequence of events in a time diagram > leading to the bug, something alike the following? > > CPU0 CPU1 > oa_tc6_start_xmit > ... > oa_tc6_spi_thread_handler > ... Good case: ---------- Consider waiting_tx_skb is NULL. Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) --------------------------- ----------------------------------- - if waiting_tx_skb is NULL - waiting_tx_skb = skb - if ongoing_tx_skb is NULL - ongoing_tx_skb = waiting_tx_skb - waiting_tx_skb = NULL ... - ongoing_tx_skb = NULL - if waiting_tx_skb is NULL - waiting_tx_skb = skb - if ongoing_tx_skb is NULL - ongoing_tx_skb = waiting_tx_skb - waiting_tx_skb = NULL ... - ongoing_tx_skb = NULL .... Bad case: --------- Consider waiting_tx_skb is NULL. Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) --------------------------- ----------------------------------- - if waiting_tx_skb is NULL - waiting_tx_skb = skb - if ongoing_tx_skb is NULL - ongoing_tx_skb = waiting_tx_skb - waiting_tx_skb = NULL ... - ongoing_tx_skb = NULL - if waiting_tx_skb is NULL - if ongoing_tx_skb is NULL - ongoing_tx_skb = waiting_tx_skb - waiting_tx_skb = skb - waiting_tx_skb = NULL ... - ongoing_tx_skb = NULL - if waiting_tx_skb is NULL - waiting_tx_skb = skb So one tx skb is left as unhandled and the pointer is overwritten with NULL. Hope this clarifies the race condition. Best regards, Parthiban V > > Thanks, > > Paolo > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-11-27 10:49 ` Parthiban.Veerasooran @ 2024-11-27 12:11 ` Paolo Abeni 2024-11-28 5:43 ` Parthiban.Veerasooran 0 siblings, 1 reply; 12+ messages in thread From: Paolo Abeni @ 2024-11-27 12:11 UTC (permalink / raw) To: Parthiban.Veerasooran Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet, kuba On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote: > On 26/11/24 4:11 pm, Paolo Abeni wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >> >> On 11/22/24 11:21, Parthiban Veerasooran wrote: >>> There are two skb pointers to manage tx skb's enqueued from n/w stack. >>> waiting_tx_skb pointer points to the tx skb which needs to be processed >>> and ongoing_tx_skb pointer points to the tx skb which is being processed. >>> >>> SPI thread prepares the tx data chunks from the tx skb pointed by the >>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is >>> processed, the tx skb pointed by the waiting_tx_skb is assigned to >>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL. >>> Whenever there is a new tx skb from n/w stack, it will be assigned to >>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb >>> handled in two different threads. >>> >>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and >>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer >>> without doing any NULL check. At this time, if the waiting_tx_skb pointer >>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After >>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w >>> stack and there is a chance to overwrite the tx skb pointer with NULL in >>> the SPI thread. Finally one of the tx skb will be left as unhandled, >>> resulting packet missing and memory leak. >>> To overcome the above issue, protect the moving of tx skb reference from >>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread >>> can't access the waiting_tx_skb pointer until the current thread completes >>> moving the tx skb reference safely. >> >> A mutex looks overkill. Why don't you use a spinlock? why locking only >> one side (the writer) would be enough? > Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will > become like below, > > mutex_lock(&tc6->tx_skb_lock); > tc6->waiting_tx_skb = skb; > mutex_unlock(&tc6->tx_skb_lock); > > As both are not called from atomic context and they are allowed to > sleep, I used mutex rather than spinlock. >> >> Could you please report the exact sequence of events in a time diagram >> leading to the bug, something alike the following? >> >> CPU0 CPU1 >> oa_tc6_start_xmit >> ... >> oa_tc6_spi_thread_handler >> ... > Good case: > ---------- > Consider waiting_tx_skb is NULL. > > Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) > --------------------------- ----------------------------------- > - if waiting_tx_skb is NULL > - waiting_tx_skb = skb > - if ongoing_tx_skb is NULL > - ongoing_tx_skb = waiting_tx_skb > - waiting_tx_skb = NULL > ... > - ongoing_tx_skb = NULL > - if waiting_tx_skb is NULL > - waiting_tx_skb = skb > - if ongoing_tx_skb is NULL > - ongoing_tx_skb = waiting_tx_skb > - waiting_tx_skb = NULL > ... > - ongoing_tx_skb = NULL > .... > > Bad case: > --------- > Consider waiting_tx_skb is NULL. > > Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) > --------------------------- ----------------------------------- > - if waiting_tx_skb is NULL > - waiting_tx_skb = skb > - if ongoing_tx_skb is NULL AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs() then ongoing_tx_skb can not be NULL, due to the previous check in: https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064 This looks like a single reader/single write scenarios that does not need any lock to ensure consistency. Do you observe any memory leak in real life scenarios? BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler are possibly lacking memory barriers to avoid missing wake-ups. Cheers, Paolo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-11-27 12:11 ` Paolo Abeni @ 2024-11-28 5:43 ` Parthiban.Veerasooran 2024-12-02 3:37 ` Parthiban.Veerasooran 0 siblings, 1 reply; 12+ messages in thread From: Parthiban.Veerasooran @ 2024-11-28 5:43 UTC (permalink / raw) To: pabeni Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet, kuba Hi Paolo, On 27/11/24 5:41 pm, Paolo Abeni wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote: >> On 26/11/24 4:11 pm, Paolo Abeni wrote: >>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >>> >>> On 11/22/24 11:21, Parthiban Veerasooran wrote: >>>> There are two skb pointers to manage tx skb's enqueued from n/w stack. >>>> waiting_tx_skb pointer points to the tx skb which needs to be processed >>>> and ongoing_tx_skb pointer points to the tx skb which is being processed. >>>> >>>> SPI thread prepares the tx data chunks from the tx skb pointed by the >>>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is >>>> processed, the tx skb pointed by the waiting_tx_skb is assigned to >>>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL. >>>> Whenever there is a new tx skb from n/w stack, it will be assigned to >>>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb >>>> handled in two different threads. >>>> >>>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and >>>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer >>>> without doing any NULL check. At this time, if the waiting_tx_skb pointer >>>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After >>>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w >>>> stack and there is a chance to overwrite the tx skb pointer with NULL in >>>> the SPI thread. Finally one of the tx skb will be left as unhandled, >>>> resulting packet missing and memory leak. >>>> To overcome the above issue, protect the moving of tx skb reference from >>>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread >>>> can't access the waiting_tx_skb pointer until the current thread completes >>>> moving the tx skb reference safely. >>> >>> A mutex looks overkill. Why don't you use a spinlock? why locking only >>> one side (the writer) would be enough? >> Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will >> become like below, >> >> mutex_lock(&tc6->tx_skb_lock); >> tc6->waiting_tx_skb = skb; >> mutex_unlock(&tc6->tx_skb_lock); >> >> As both are not called from atomic context and they are allowed to >> sleep, I used mutex rather than spinlock. >>> >>> Could you please report the exact sequence of events in a time diagram >>> leading to the bug, something alike the following? >>> >>> CPU0 CPU1 >>> oa_tc6_start_xmit >>> ... >>> oa_tc6_spi_thread_handler >>> ... >> Good case: >> ---------- >> Consider waiting_tx_skb is NULL. >> >> Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) >> --------------------------- ----------------------------------- >> - if waiting_tx_skb is NULL >> - waiting_tx_skb = skb >> - if ongoing_tx_skb is NULL >> - ongoing_tx_skb = waiting_tx_skb >> - waiting_tx_skb = NULL >> ... >> - ongoing_tx_skb = NULL >> - if waiting_tx_skb is NULL >> - waiting_tx_skb = skb >> - if ongoing_tx_skb is NULL >> - ongoing_tx_skb = waiting_tx_skb >> - waiting_tx_skb = NULL >> ... >> - ongoing_tx_skb = NULL >> .... >> >> Bad case: >> --------- >> Consider waiting_tx_skb is NULL. >> >> Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) >> --------------------------- ----------------------------------- >> - if waiting_tx_skb is NULL >> - waiting_tx_skb = skb >> - if ongoing_tx_skb is NULL > > AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in > oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs() > then ongoing_tx_skb can not be NULL, due to the previous check in: > > https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064 > > This looks like a single reader/single write scenarios that does not > need any lock to ensure consistency. > > Do you observe any memory leak in real life scenarios? > > BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler > are possibly lacking memory barriers to avoid missing wake-ups. Actually the transmit flow control is done using the TXC reported from MAC-PHY and it is done in the below for loop. TXC is Transmit Credit Count represents the rooms available to place the tx chunks in the MAC-PHY. https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1004 - Consider a scenario where the TXC reported from the previous transfer is 10 and ongoing_tx_skb holds an tx ethernet frame which can be transported in 20 TXCs and waiting_tx_skb is still NULL. tx_credits = 10; /* 21 are filled in the previous transfer */ ongoing_tx_skb = 20; waiting_tx_skb = NULL; /* Still NULL */ - So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true. - After oa_tc6_prepare_spi_tx_buf_for_tx_skbs() ongoing_tx_skb = 10; waiting_tx_skb = NULL; /* Still NULL */ - Perform SPI transfer. - Process SPI rx buffer to get the TXC from footers. - Now let's assume previously filled 21 TXCs are freed so we are good to transport the next remaining 10 tx chunks from ongoing_tx_skb. tx_credits = 21; ongoing_tx_skb = 10; waiting_tx_skb = NULL; - So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again. - In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs() ongoing_tx_skb = NULL; waiting_tx_skb = NULL; Now the below bad case might happen, Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) --------------------------- ----------------------------------- - if waiting_tx_skb is NULL - if ongoing_tx_skb is NULL - ongoing_tx_skb = waiting_tx_skb - waiting_tx_skb = skb - waiting_tx_skb = NULL ... - ongoing_tx_skb = NULL - if waiting_tx_skb is NULL - waiting_tx_skb = skb Hope this clarifies. Best regards, Parthiban V > > Cheers, > > Paolo > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-11-28 5:43 ` Parthiban.Veerasooran @ 2024-12-02 3:37 ` Parthiban.Veerasooran 2024-12-04 2:52 ` Jakub Kicinski 0 siblings, 1 reply; 12+ messages in thread From: Parthiban.Veerasooran @ 2024-12-02 3:37 UTC (permalink / raw) To: pabeni Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet, kuba Hi Paolo, Does the below reply clarifies your question and shall I proceed to prepare the next version? OR still do you have any comments on that? Best regards, Parthiban V On 28/11/24 11:13 am, Parthiban.Veerasooran@microchip.com wrote: > Hi Paolo, > > On 27/11/24 5:41 pm, Paolo Abeni wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >> >> On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote: >>> On 26/11/24 4:11 pm, Paolo Abeni wrote: >>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >>>> >>>> On 11/22/24 11:21, Parthiban Veerasooran wrote: >>>>> There are two skb pointers to manage tx skb's enqueued from n/w stack. >>>>> waiting_tx_skb pointer points to the tx skb which needs to be processed >>>>> and ongoing_tx_skb pointer points to the tx skb which is being processed. >>>>> >>>>> SPI thread prepares the tx data chunks from the tx skb pointed by the >>>>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is >>>>> processed, the tx skb pointed by the waiting_tx_skb is assigned to >>>>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL. >>>>> Whenever there is a new tx skb from n/w stack, it will be assigned to >>>>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb >>>>> handled in two different threads. >>>>> >>>>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and >>>>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer >>>>> without doing any NULL check. At this time, if the waiting_tx_skb pointer >>>>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After >>>>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w >>>>> stack and there is a chance to overwrite the tx skb pointer with NULL in >>>>> the SPI thread. Finally one of the tx skb will be left as unhandled, >>>>> resulting packet missing and memory leak. >>>>> To overcome the above issue, protect the moving of tx skb reference from >>>>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread >>>>> can't access the waiting_tx_skb pointer until the current thread completes >>>>> moving the tx skb reference safely. >>>> >>>> A mutex looks overkill. Why don't you use a spinlock? why locking only >>>> one side (the writer) would be enough? >>> Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will >>> become like below, >>> >>> mutex_lock(&tc6->tx_skb_lock); >>> tc6->waiting_tx_skb = skb; >>> mutex_unlock(&tc6->tx_skb_lock); >>> >>> As both are not called from atomic context and they are allowed to >>> sleep, I used mutex rather than spinlock. >>>> >>>> Could you please report the exact sequence of events in a time diagram >>>> leading to the bug, something alike the following? >>>> >>>> CPU0 CPU1 >>>> oa_tc6_start_xmit >>>> ... >>>> oa_tc6_spi_thread_handler >>>> ... >>> Good case: >>> ---------- >>> Consider waiting_tx_skb is NULL. >>> >>> Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) >>> --------------------------- ----------------------------------- >>> - if waiting_tx_skb is NULL >>> - waiting_tx_skb = skb >>> - if ongoing_tx_skb is NULL >>> - ongoing_tx_skb = waiting_tx_skb >>> - waiting_tx_skb = NULL >>> ... >>> - ongoing_tx_skb = NULL >>> - if waiting_tx_skb is NULL >>> - waiting_tx_skb = skb >>> - if ongoing_tx_skb is NULL >>> - ongoing_tx_skb = waiting_tx_skb >>> - waiting_tx_skb = NULL >>> ... >>> - ongoing_tx_skb = NULL >>> .... >>> >>> Bad case: >>> --------- >>> Consider waiting_tx_skb is NULL. >>> >>> Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) >>> --------------------------- ----------------------------------- >>> - if waiting_tx_skb is NULL >>> - waiting_tx_skb = skb >>> - if ongoing_tx_skb is NULL >> >> AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in >> oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs() >> then ongoing_tx_skb can not be NULL, due to the previous check in: >> >> https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064 >> >> This looks like a single reader/single write scenarios that does not >> need any lock to ensure consistency. >> >> Do you observe any memory leak in real life scenarios? >> >> BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler >> are possibly lacking memory barriers to avoid missing wake-ups. > Actually the transmit flow control is done using the TXC reported from > MAC-PHY and it is done in the below for loop. TXC is Transmit Credit > Count represents the rooms available to place the tx chunks in the MAC-PHY. > > https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1004 > > - Consider a scenario where the TXC reported from the previous transfer > is 10 and ongoing_tx_skb holds an tx ethernet frame which can be > transported in 20 TXCs and waiting_tx_skb is still NULL. > tx_credits = 10; /* 21 are filled in the previous transfer */ > ongoing_tx_skb = 20; > waiting_tx_skb = NULL; /* Still NULL */ > - So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true. > - After oa_tc6_prepare_spi_tx_buf_for_tx_skbs() > ongoing_tx_skb = 10; > waiting_tx_skb = NULL; /* Still NULL */ > - Perform SPI transfer. > - Process SPI rx buffer to get the TXC from footers. > - Now let's assume previously filled 21 TXCs are freed so we are good to > transport the next remaining 10 tx chunks from ongoing_tx_skb. > tx_credits = 21; > ongoing_tx_skb = 10; > waiting_tx_skb = NULL; > - So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again. > - In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs() > ongoing_tx_skb = NULL; > waiting_tx_skb = NULL; > > Now the below bad case might happen, > > Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler) > --------------------------- ----------------------------------- > - if waiting_tx_skb is NULL > - if ongoing_tx_skb is NULL > - ongoing_tx_skb = waiting_tx_skb > - waiting_tx_skb = skb > - waiting_tx_skb = NULL > ... > - ongoing_tx_skb = NULL > - if waiting_tx_skb is NULL > - waiting_tx_skb = skb > > Hope this clarifies. > > Best regards, > Parthiban V >> >> Cheers, >> >> Paolo >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-12-02 3:37 ` Parthiban.Veerasooran @ 2024-12-04 2:52 ` Jakub Kicinski 2024-12-04 8:28 ` Parthiban.Veerasooran 0 siblings, 1 reply; 12+ messages in thread From: Jakub Kicinski @ 2024-12-04 2:52 UTC (permalink / raw) To: Parthiban.Veerasooran Cc: pabeni, netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet On Mon, 2 Dec 2024 03:37:01 +0000 Parthiban.Veerasooran@microchip.com wrote: > Does the below reply clarifies your question and shall I proceed to > prepare the next version? OR still do you have any comments on that? In case you're still waiting for a response - yes, please proceed with v2, add the diagram requested by Paolo to the commit message.. and we will see if there are more questions then :) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers 2024-12-04 2:52 ` Jakub Kicinski @ 2024-12-04 8:28 ` Parthiban.Veerasooran 0 siblings, 0 replies; 12+ messages in thread From: Parthiban.Veerasooran @ 2024-12-04 8:28 UTC (permalink / raw) To: kuba Cc: pabeni, netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller, andrew+netdev, davem, edumazet Hi Jakub, On 04/12/24 8:22 am, Jakub Kicinski wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe > > On Mon, 2 Dec 2024 03:37:01 +0000 Parthiban.Veerasooran@microchip.com > wrote: >> Does the below reply clarifies your question and shall I proceed to >> prepare the next version? OR still do you have any comments on that? > > In case you're still waiting for a response - yes, please proceed with > v2, add the diagram requested by Paolo to the commit message.. and we > will see if there are more questions then :) Sure will do then. Thanks for the info. Best regards, Parthiban V ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-12-04 8:28 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran 2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran 2024-11-26 10:44 ` Paolo Abeni 2024-11-27 8:43 ` Parthiban.Veerasooran 2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran 2024-11-26 10:41 ` Paolo Abeni 2024-11-27 10:49 ` Parthiban.Veerasooran 2024-11-27 12:11 ` Paolo Abeni 2024-11-28 5:43 ` Parthiban.Veerasooran 2024-12-02 3:37 ` Parthiban.Veerasooran 2024-12-04 2:52 ` Jakub Kicinski 2024-12-04 8:28 ` Parthiban.Veerasooran
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).