[PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib
@ 2024-11-22 10:21 Parthiban Veerasooran
  2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
  2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
  0 siblings, 2 replies; 12+ messages in thread
From: Parthiban Veerasooran @ 2024-11-22 10:21 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
	jacob.e.keller

This patch series contain the below fixes.

- Infinite loop error when tx credits becomes 0.
- Race condition between tx skb reference pointers.

v2:
- Added mutex lock to protect tx skb reference handling.


Parthiban Veerasooran (2):
  net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes
    0
  net: ethernet: oa_tc6: fix tx skb race condition between reference
    pointers

 drivers/net/ethernet/oa_tc6.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


base-commit: fcc79e1714e8c2b8e216dc3149812edd37884eef
-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0
  2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
@ 2024-11-22 10:21 ` Parthiban Veerasooran
  2024-11-26 10:44   ` Paolo Abeni
  2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
  1 sibling, 1 reply; 12+ messages in thread
From: Parthiban Veerasooran @ 2024-11-22 10:21 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
	jacob.e.keller

SPI thread wakes up to perform SPI transfer whenever there is an TX skb
from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is
transferred based on the availability tx credits in the MAC-PHY which is
reported from the previous SPI transfer. Sometimes there is a possibility
that TX skb is available to transmit but there is no tx credits from
MAC-PHY. In this case, there will not be any SPI transfer but the thread
will be running in an endless loop until tx credits available again.

So checking the availability of tx credits along with TX skb will prevent
the above infinite loop. When the tx credits available again that will be
notified through interrupt which will trigger the SPI transfer to get the
available tx credits.

Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
---
 drivers/net/ethernet/oa_tc6.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index f9c0dcd965c2..4c8b0ca922b7 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -1111,8 +1111,9 @@ static int oa_tc6_spi_thread_handler(void *data)
 		/* This kthread will be waken up if there is a tx skb or mac-phy
 		 * interrupt to perform spi transfer with tx chunks.
 		 */
-		wait_event_interruptible(tc6->spi_wq, tc6->waiting_tx_skb ||
-					 tc6->int_flag ||
+		wait_event_interruptible(tc6->spi_wq, tc6->int_flag ||
+					 (tc6->waiting_tx_skb &&
+					 tc6->tx_credits) ||
 					 kthread_should_stop());
 
 		if (kthread_should_stop())
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
  2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
@ 2024-11-22 10:21 ` Parthiban Veerasooran
  2024-11-26 10:41   ` Paolo Abeni
  1 sibling, 1 reply; 12+ messages in thread
From: Parthiban Veerasooran @ 2024-11-22 10:21 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
	jacob.e.keller

There are two skb pointers to manage tx skb's enqueued from n/w stack.
waiting_tx_skb pointer points to the tx skb which needs to be processed
and ongoing_tx_skb pointer points to the tx skb which is being processed.

SPI thread prepares the tx data chunks from the tx skb pointed by the
ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
processed, the tx skb pointed by the waiting_tx_skb is assigned to
ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
Whenever there is a new tx skb from n/w stack, it will be assigned to
waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
handled in two different threads.

Consider a scenario where the SPI thread processed an ongoing_tx_skb and
it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
without doing any NULL check. At this time, if the waiting_tx_skb pointer
is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
stack and there is a chance to overwrite the tx skb pointer with NULL in
the SPI thread. Finally one of the tx skb will be left as unhandled,
resulting packet missing and memory leak.

To overcome the above issue, protect the moving of tx skb reference from
waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
can't access the waiting_tx_skb pointer until the current thread completes
moving the tx skb reference safely.

Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
---
 drivers/net/ethernet/oa_tc6.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 4c8b0ca922b7..2ddb1faafe55 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -113,6 +113,7 @@ struct oa_tc6 {
 	struct mii_bus *mdiobus;
 	struct spi_device *spi;
 	struct mutex spi_ctrl_lock; /* Protects spi control transfer */
+	struct mutex tx_skb_lock; /* Protects tx skb handling */
 	void *spi_ctrl_tx_buf;
 	void *spi_ctrl_rx_buf;
 	void *spi_data_tx_buf;
@@ -1004,8 +1005,10 @@ static u16 oa_tc6_prepare_spi_tx_buf_for_tx_skbs(struct oa_tc6 *tc6)
 	for (used_tx_credits = 0; used_tx_credits < tc6->tx_credits;
 	     used_tx_credits++) {
 		if (!tc6->ongoing_tx_skb) {
+			mutex_lock(&tc6->tx_skb_lock);
 			tc6->ongoing_tx_skb = tc6->waiting_tx_skb;
 			tc6->waiting_tx_skb = NULL;
+			mutex_unlock(&tc6->tx_skb_lock);
 		}
 		if (!tc6->ongoing_tx_skb)
 			break;
@@ -1240,6 +1243,7 @@ struct oa_tc6 *oa_tc6_init(struct spi_device *spi, struct net_device *netdev)
 	tc6->netdev = netdev;
 	SET_NETDEV_DEV(netdev, &spi->dev);
 	mutex_init(&tc6->spi_ctrl_lock);
+	mutex_init(&tc6->tx_skb_lock);

 	/* Set the SPI controller to pump at realtime priority */
 	tc6->spi->rt = true;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
@ 2024-11-26 10:41   ` Paolo Abeni
  2024-11-27 10:49     ` Parthiban.Veerasooran
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2024-11-26 10:41 UTC (permalink / raw)
  To: Parthiban Veerasooran, andrew+netdev, davem, edumazet, kuba
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller



On 11/22/24 11:21, Parthiban Veerasooran wrote:
> There are two skb pointers to manage tx skb's enqueued from n/w stack.
> waiting_tx_skb pointer points to the tx skb which needs to be processed
> and ongoing_tx_skb pointer points to the tx skb which is being processed.
> 
> SPI thread prepares the tx data chunks from the tx skb pointed by the
> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
> processed, the tx skb pointed by the waiting_tx_skb is assigned to
> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
> Whenever there is a new tx skb from n/w stack, it will be assigned to
> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
> handled in two different threads.
> 
> Consider a scenario where the SPI thread processed an ongoing_tx_skb and
> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
> without doing any NULL check. At this time, if the waiting_tx_skb pointer
> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
> stack and there is a chance to overwrite the tx skb pointer with NULL in
> the SPI thread. Finally one of the tx skb will be left as unhandled,
> resulting packet missing and memory leak.
> To overcome the above issue, protect the moving of tx skb reference from
> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
> can't access the waiting_tx_skb pointer until the current thread completes
> moving the tx skb reference safely.

A mutex looks overkill. Why don't you use a spinlock? why locking only
one side (the writer) would be enough?

Could you please report the exact sequence of events in a time diagram
leading to the bug, something alike the following?

CPU0					CPU1
oa_tc6_start_xmit
 ...
					oa_tc6_spi_thread_handler
					 ...

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0
  2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
@ 2024-11-26 10:44   ` Paolo Abeni
  2024-11-27  8:43     ` Parthiban.Veerasooran
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2024-11-26 10:44 UTC (permalink / raw)
  To: Parthiban Veerasooran, andrew+netdev, davem, edumazet, kuba
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller

On 11/22/24 11:21, Parthiban Veerasooran wrote:
> SPI thread wakes up to perform SPI transfer whenever there is an TX skb
> from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is
> transferred based on the availability tx credits in the MAC-PHY which is
> reported from the previous SPI transfer. Sometimes there is a possibility
> that TX skb is available to transmit but there is no tx credits from
> MAC-PHY. In this case, there will not be any SPI transfer but the thread
> will be running in an endless loop until tx credits available again.
> 
> So checking the availability of tx credits along with TX skb will prevent
> the above infinite loop. When the tx credits available again that will be
> notified through interrupt which will trigger the SPI transfer to get the
> available tx credits.
> 
> Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
> 
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>

Please, avoid empty lines between the Fixes tag and the SoB

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0
  2024-11-26 10:44   ` Paolo Abeni
@ 2024-11-27  8:43     ` Parthiban.Veerasooran
  0 siblings, 0 replies; 12+ messages in thread
From: Parthiban.Veerasooran @ 2024-11-27  8:43 UTC (permalink / raw)
  To: pabeni
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet, kuba

Hi Paolo,

On 26/11/24 4:14 pm, Paolo Abeni wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 11/22/24 11:21, Parthiban Veerasooran wrote:
>> SPI thread wakes up to perform SPI transfer whenever there is an TX skb
>> from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is
>> transferred based on the availability tx credits in the MAC-PHY which is
>> reported from the previous SPI transfer. Sometimes there is a possibility
>> that TX skb is available to transmit but there is no tx credits from
>> MAC-PHY. In this case, there will not be any SPI transfer but the thread
>> will be running in an endless loop until tx credits available again.
>>
>> So checking the availability of tx credits along with TX skb will prevent
>> the above infinite loop. When the tx credits available again that will be
>> notified through interrupt which will trigger the SPI transfer to get the
>> available tx credits.
>>
>> Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
>>
>> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
>> Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
> 
> Please, avoid empty lines between the Fixes tag and the SoB
Oh yes, will correct it in the next version.

Best regards,
Parthiban V
> 
> Thanks,
> 
> Paolo
> 
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-11-26 10:41   ` Paolo Abeni
@ 2024-11-27 10:49     ` Parthiban.Veerasooran
  2024-11-27 12:11       ` Paolo Abeni
  0 siblings, 1 reply; 12+ messages in thread
From: Parthiban.Veerasooran @ 2024-11-27 10:49 UTC (permalink / raw)
  To: pabeni
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet, kuba

Hi Paolo,

On 26/11/24 4:11 pm, Paolo Abeni wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 11/22/24 11:21, Parthiban Veerasooran wrote:
>> There are two skb pointers to manage tx skb's enqueued from n/w stack.
>> waiting_tx_skb pointer points to the tx skb which needs to be processed
>> and ongoing_tx_skb pointer points to the tx skb which is being processed.
>>
>> SPI thread prepares the tx data chunks from the tx skb pointed by the
>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
>> processed, the tx skb pointed by the waiting_tx_skb is assigned to
>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
>> Whenever there is a new tx skb from n/w stack, it will be assigned to
>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
>> handled in two different threads.
>>
>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and
>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
>> without doing any NULL check. At this time, if the waiting_tx_skb pointer
>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
>> stack and there is a chance to overwrite the tx skb pointer with NULL in
>> the SPI thread. Finally one of the tx skb will be left as unhandled,
>> resulting packet missing and memory leak.
>> To overcome the above issue, protect the moving of tx skb reference from
>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
>> can't access the waiting_tx_skb pointer until the current thread completes
>> moving the tx skb reference safely.
> 
> A mutex looks overkill. Why don't you use a spinlock? why locking only
> one side (the writer) would be enough?
Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will 
become like below,

mutex_lock(&tc6->tx_skb_lock);
tc6->waiting_tx_skb = skb;
mutex_unlock(&tc6->tx_skb_lock);

As both are not called from atomic context and they are allowed to 
sleep, I used mutex rather than spinlock.
> 
> Could you please report the exact sequence of events in a time diagram
> leading to the bug, something alike the following?
> 
> CPU0                                    CPU1
> oa_tc6_start_xmit
>   ...
>                                          oa_tc6_spi_thread_handler
>                                           ...
Good case:
----------
Consider waiting_tx_skb is NULL.

Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
---------------------------	-----------------------------------
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
				- if ongoing_tx_skb is NULL
				- ongoing_tx_skb = waiting_tx_skb
				- waiting_tx_skb = NULL
				...
				- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
				- if ongoing_tx_skb is NULL
				- ongoing_tx_skb = waiting_tx_skb
				- waiting_tx_skb = NULL
				...
				- ongoing_tx_skb = NULL
....

Bad case:
---------
Consider waiting_tx_skb is NULL.

Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
---------------------------	-----------------------------------
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
				- if ongoing_tx_skb is NULL
				- ongoing_tx_skb = waiting_tx_skb
				- waiting_tx_skb = NULL
				...
				- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
				- if ongoing_tx_skb is NULL
				- ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
				- waiting_tx_skb = NULL
				...
				- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb

So one tx skb is left as unhandled and the pointer is overwritten with 
NULL. Hope this clarifies the race condition.

Best regards,
Parthiban V
> 
> Thanks,
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-11-27 10:49     ` Parthiban.Veerasooran
@ 2024-11-27 12:11       ` Paolo Abeni
  2024-11-28  5:43         ` Parthiban.Veerasooran
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Abeni @ 2024-11-27 12:11 UTC (permalink / raw)
  To: Parthiban.Veerasooran
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet, kuba

On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote:
> On 26/11/24 4:11 pm, Paolo Abeni wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>
>> On 11/22/24 11:21, Parthiban Veerasooran wrote:
>>> There are two skb pointers to manage tx skb's enqueued from n/w stack.
>>> waiting_tx_skb pointer points to the tx skb which needs to be processed
>>> and ongoing_tx_skb pointer points to the tx skb which is being processed.
>>>
>>> SPI thread prepares the tx data chunks from the tx skb pointed by the
>>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
>>> processed, the tx skb pointed by the waiting_tx_skb is assigned to
>>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
>>> Whenever there is a new tx skb from n/w stack, it will be assigned to
>>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
>>> handled in two different threads.
>>>
>>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and
>>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
>>> without doing any NULL check. At this time, if the waiting_tx_skb pointer
>>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
>>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
>>> stack and there is a chance to overwrite the tx skb pointer with NULL in
>>> the SPI thread. Finally one of the tx skb will be left as unhandled,
>>> resulting packet missing and memory leak.
>>> To overcome the above issue, protect the moving of tx skb reference from
>>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
>>> can't access the waiting_tx_skb pointer until the current thread completes
>>> moving the tx skb reference safely.
>>
>> A mutex looks overkill. Why don't you use a spinlock? why locking only
>> one side (the writer) would be enough?
> Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will 
> become like below,
> 
> mutex_lock(&tc6->tx_skb_lock);
> tc6->waiting_tx_skb = skb;
> mutex_unlock(&tc6->tx_skb_lock);
> 
> As both are not called from atomic context and they are allowed to 
> sleep, I used mutex rather than spinlock.
>>
>> Could you please report the exact sequence of events in a time diagram
>> leading to the bug, something alike the following?
>>
>> CPU0                                    CPU1
>> oa_tc6_start_xmit
>>   ...
>>                                          oa_tc6_spi_thread_handler
>>                                           ...
> Good case:
> ----------
> Consider waiting_tx_skb is NULL.
> 
> Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
> ---------------------------	-----------------------------------
> - if waiting_tx_skb is NULL
> - waiting_tx_skb = skb
> 				- if ongoing_tx_skb is NULL
> 				- ongoing_tx_skb = waiting_tx_skb
> 				- waiting_tx_skb = NULL
> 				...
> 				- ongoing_tx_skb = NULL
> - if waiting_tx_skb is NULL
> - waiting_tx_skb = skb
> 				- if ongoing_tx_skb is NULL
> 				- ongoing_tx_skb = waiting_tx_skb
> 				- waiting_tx_skb = NULL
> 				...
> 				- ongoing_tx_skb = NULL
> ....
> 
> Bad case:
> ---------
> Consider waiting_tx_skb is NULL.
> 
> Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
> ---------------------------	-----------------------------------
> - if waiting_tx_skb is NULL
> - waiting_tx_skb = skb
> 				- if ongoing_tx_skb is NULL

AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in
oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
then ongoing_tx_skb can not be NULL, due to the previous check in:

https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064

This looks like a single reader/single write scenarios that does not
need any lock to ensure consistency.

Do you observe any memory leak in real life scenarios?

BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler
are possibly lacking memory barriers to avoid missing wake-ups.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-11-27 12:11       ` Paolo Abeni
@ 2024-11-28  5:43         ` Parthiban.Veerasooran
  2024-12-02  3:37           ` Parthiban.Veerasooran
  0 siblings, 1 reply; 12+ messages in thread
From: Parthiban.Veerasooran @ 2024-11-28  5:43 UTC (permalink / raw)
  To: pabeni
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet, kuba

Hi Paolo,

On 27/11/24 5:41 pm, Paolo Abeni wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote:
>> On 26/11/24 4:11 pm, Paolo Abeni wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> On 11/22/24 11:21, Parthiban Veerasooran wrote:
>>>> There are two skb pointers to manage tx skb's enqueued from n/w stack.
>>>> waiting_tx_skb pointer points to the tx skb which needs to be processed
>>>> and ongoing_tx_skb pointer points to the tx skb which is being processed.
>>>>
>>>> SPI thread prepares the tx data chunks from the tx skb pointed by the
>>>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
>>>> processed, the tx skb pointed by the waiting_tx_skb is assigned to
>>>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
>>>> Whenever there is a new tx skb from n/w stack, it will be assigned to
>>>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
>>>> handled in two different threads.
>>>>
>>>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and
>>>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
>>>> without doing any NULL check. At this time, if the waiting_tx_skb pointer
>>>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
>>>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
>>>> stack and there is a chance to overwrite the tx skb pointer with NULL in
>>>> the SPI thread. Finally one of the tx skb will be left as unhandled,
>>>> resulting packet missing and memory leak.
>>>> To overcome the above issue, protect the moving of tx skb reference from
>>>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
>>>> can't access the waiting_tx_skb pointer until the current thread completes
>>>> moving the tx skb reference safely.
>>>
>>> A mutex looks overkill. Why don't you use a spinlock? why locking only
>>> one side (the writer) would be enough?
>> Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will
>> become like below,
>>
>> mutex_lock(&tc6->tx_skb_lock);
>> tc6->waiting_tx_skb = skb;
>> mutex_unlock(&tc6->tx_skb_lock);
>>
>> As both are not called from atomic context and they are allowed to
>> sleep, I used mutex rather than spinlock.
>>>
>>> Could you please report the exact sequence of events in a time diagram
>>> leading to the bug, something alike the following?
>>>
>>> CPU0                                    CPU1
>>> oa_tc6_start_xmit
>>>    ...
>>>                                           oa_tc6_spi_thread_handler
>>>                                            ...
>> Good case:
>> ----------
>> Consider waiting_tx_skb is NULL.
>>
>> Thread1 (oa_tc6_start_xmit)   Thread2 (oa_tc6_spi_thread_handler)
>> ---------------------------   -----------------------------------
>> - if waiting_tx_skb is NULL
>> - waiting_tx_skb = skb
>>                                - if ongoing_tx_skb is NULL
>>                                - ongoing_tx_skb = waiting_tx_skb
>>                                - waiting_tx_skb = NULL
>>                                ...
>>                                - ongoing_tx_skb = NULL
>> - if waiting_tx_skb is NULL
>> - waiting_tx_skb = skb
>>                                - if ongoing_tx_skb is NULL
>>                                - ongoing_tx_skb = waiting_tx_skb
>>                                - waiting_tx_skb = NULL
>>                                ...
>>                                - ongoing_tx_skb = NULL
>> ....
>>
>> Bad case:
>> ---------
>> Consider waiting_tx_skb is NULL.
>>
>> Thread1 (oa_tc6_start_xmit)   Thread2 (oa_tc6_spi_thread_handler)
>> ---------------------------   -----------------------------------
>> - if waiting_tx_skb is NULL
>> - waiting_tx_skb = skb
>>                                - if ongoing_tx_skb is NULL
> 
> AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in
> oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
> then ongoing_tx_skb can not be NULL, due to the previous check in:
> 
> https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064
> 
> This looks like a single reader/single write scenarios that does not
> need any lock to ensure consistency.
> 
> Do you observe any memory leak in real life scenarios?
> 
> BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler
> are possibly lacking memory barriers to avoid missing wake-ups.
Actually the transmit flow control is done using the TXC reported from 
MAC-PHY and it is done in the below for loop. TXC is Transmit Credit 
Count represents the rooms available to place the tx chunks in the MAC-PHY.

https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1004

- Consider a scenario where the TXC reported from the previous transfer 
is 10 and ongoing_tx_skb holds an tx ethernet frame which can be 
transported in 20 TXCs and waiting_tx_skb is still NULL.
	tx_credits = 10; /* 21 are filled in the previous transfer */
	ongoing_tx_skb = 20;
	waiting_tx_skb = NULL; /* Still NULL */
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
- After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
	ongoing_tx_skb = 10;
	waiting_tx_skb = NULL; /* Still NULL */
- Perform SPI transfer.
- Process SPI rx buffer to get the TXC from footers.
- Now let's assume previously filled 21 TXCs are freed so we are good to 
transport the next remaining 10 tx chunks from ongoing_tx_skb.
	tx_credits = 21;
	ongoing_tx_skb = 10;
	waiting_tx_skb = NULL;
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
- In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
	ongoing_tx_skb = NULL;
	waiting_tx_skb = NULL;

Now the below bad case might happen,

Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
---------------------------	-----------------------------------
- if waiting_tx_skb is NULL
				- if ongoing_tx_skb is NULL
				- ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
				- waiting_tx_skb = NULL
				...
				- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb

Hope this clarifies.

Best regards,
Parthiban V
> 
> Cheers,
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-11-28  5:43         ` Parthiban.Veerasooran
@ 2024-12-02  3:37           ` Parthiban.Veerasooran
  2024-12-04  2:52             ` Jakub Kicinski
  0 siblings, 1 reply; 12+ messages in thread
From: Parthiban.Veerasooran @ 2024-12-02  3:37 UTC (permalink / raw)
  To: pabeni
  Cc: netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet, kuba

Hi Paolo,

Does the below reply clarifies your question and shall I proceed to 
prepare the next version? OR still do you have any comments on that?

Best regards,
Parthiban V
On 28/11/24 11:13 am, Parthiban.Veerasooran@microchip.com wrote:
> Hi Paolo,
> 
> On 27/11/24 5:41 pm, Paolo Abeni wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>
>> On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote:
>>> On 26/11/24 4:11 pm, Paolo Abeni wrote:
>>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>>
>>>> On 11/22/24 11:21, Parthiban Veerasooran wrote:
>>>>> There are two skb pointers to manage tx skb's enqueued from n/w stack.
>>>>> waiting_tx_skb pointer points to the tx skb which needs to be processed
>>>>> and ongoing_tx_skb pointer points to the tx skb which is being processed.
>>>>>
>>>>> SPI thread prepares the tx data chunks from the tx skb pointed by the
>>>>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
>>>>> processed, the tx skb pointed by the waiting_tx_skb is assigned to
>>>>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
>>>>> Whenever there is a new tx skb from n/w stack, it will be assigned to
>>>>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
>>>>> handled in two different threads.
>>>>>
>>>>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and
>>>>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
>>>>> without doing any NULL check. At this time, if the waiting_tx_skb pointer
>>>>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
>>>>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
>>>>> stack and there is a chance to overwrite the tx skb pointer with NULL in
>>>>> the SPI thread. Finally one of the tx skb will be left as unhandled,
>>>>> resulting packet missing and memory leak.
>>>>> To overcome the above issue, protect the moving of tx skb reference from
>>>>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
>>>>> can't access the waiting_tx_skb pointer until the current thread completes
>>>>> moving the tx skb reference safely.
>>>>
>>>> A mutex looks overkill. Why don't you use a spinlock? why locking only
>>>> one side (the writer) would be enough?
>>> Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will
>>> become like below,
>>>
>>> mutex_lock(&tc6->tx_skb_lock);
>>> tc6->waiting_tx_skb = skb;
>>> mutex_unlock(&tc6->tx_skb_lock);
>>>
>>> As both are not called from atomic context and they are allowed to
>>> sleep, I used mutex rather than spinlock.
>>>>
>>>> Could you please report the exact sequence of events in a time diagram
>>>> leading to the bug, something alike the following?
>>>>
>>>> CPU0                                    CPU1
>>>> oa_tc6_start_xmit
>>>>     ...
>>>>                                            oa_tc6_spi_thread_handler
>>>>                                             ...
>>> Good case:
>>> ----------
>>> Consider waiting_tx_skb is NULL.
>>>
>>> Thread1 (oa_tc6_start_xmit)   Thread2 (oa_tc6_spi_thread_handler)
>>> ---------------------------   -----------------------------------
>>> - if waiting_tx_skb is NULL
>>> - waiting_tx_skb = skb
>>>                                 - if ongoing_tx_skb is NULL
>>>                                 - ongoing_tx_skb = waiting_tx_skb
>>>                                 - waiting_tx_skb = NULL
>>>                                 ...
>>>                                 - ongoing_tx_skb = NULL
>>> - if waiting_tx_skb is NULL
>>> - waiting_tx_skb = skb
>>>                                 - if ongoing_tx_skb is NULL
>>>                                 - ongoing_tx_skb = waiting_tx_skb
>>>                                 - waiting_tx_skb = NULL
>>>                                 ...
>>>                                 - ongoing_tx_skb = NULL
>>> ....
>>>
>>> Bad case:
>>> ---------
>>> Consider waiting_tx_skb is NULL.
>>>
>>> Thread1 (oa_tc6_start_xmit)   Thread2 (oa_tc6_spi_thread_handler)
>>> ---------------------------   -----------------------------------
>>> - if waiting_tx_skb is NULL
>>> - waiting_tx_skb = skb
>>>                                 - if ongoing_tx_skb is NULL
>>
>> AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in
>> oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
>> then ongoing_tx_skb can not be NULL, due to the previous check in:
>>
>> https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064
>>
>> This looks like a single reader/single write scenarios that does not
>> need any lock to ensure consistency.
>>
>> Do you observe any memory leak in real life scenarios?
>>
>> BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler
>> are possibly lacking memory barriers to avoid missing wake-ups.
> Actually the transmit flow control is done using the TXC reported from
> MAC-PHY and it is done in the below for loop. TXC is Transmit Credit
> Count represents the rooms available to place the tx chunks in the MAC-PHY.
> 
> https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1004
> 
> - Consider a scenario where the TXC reported from the previous transfer
> is 10 and ongoing_tx_skb holds an tx ethernet frame which can be
> transported in 20 TXCs and waiting_tx_skb is still NULL.
> 	tx_credits = 10; /* 21 are filled in the previous transfer */
> 	ongoing_tx_skb = 20;
> 	waiting_tx_skb = NULL; /* Still NULL */
> - So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
> - After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
> 	ongoing_tx_skb = 10;
> 	waiting_tx_skb = NULL; /* Still NULL */
> - Perform SPI transfer.
> - Process SPI rx buffer to get the TXC from footers.
> - Now let's assume previously filled 21 TXCs are freed so we are good to
> transport the next remaining 10 tx chunks from ongoing_tx_skb.
> 	tx_credits = 21;
> 	ongoing_tx_skb = 10;
> 	waiting_tx_skb = NULL;
> - So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
> - In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
> 	ongoing_tx_skb = NULL;
> 	waiting_tx_skb = NULL;
> 
> Now the below bad case might happen,
> 
> Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
> ---------------------------	-----------------------------------
> - if waiting_tx_skb is NULL
> 				- if ongoing_tx_skb is NULL
> 				- ongoing_tx_skb = waiting_tx_skb
> - waiting_tx_skb = skb
> 				- waiting_tx_skb = NULL
> 				...
> 				- ongoing_tx_skb = NULL
> - if waiting_tx_skb is NULL
> - waiting_tx_skb = skb
> 
> Hope this clarifies.
> 
> Best regards,
> Parthiban V
>>
>> Cheers,
>>
>> Paolo
>>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-02  3:37           ` Parthiban.Veerasooran
@ 2024-12-04  2:52             ` Jakub Kicinski
  2024-12-04  8:28               ` Parthiban.Veerasooran
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2024-12-04  2:52 UTC (permalink / raw)
  To: Parthiban.Veerasooran
  Cc: pabeni, netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet

On Mon, 2 Dec 2024 03:37:01 +0000 Parthiban.Veerasooran@microchip.com
wrote:
> Does the below reply clarifies your question and shall I proceed to 
> prepare the next version? OR still do you have any comments on that?

In case you're still waiting for a response - yes, please proceed with
v2, add the diagram requested by Paolo to the commit message.. and we
will see if there are more questions then :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-04  2:52             ` Jakub Kicinski
@ 2024-12-04  8:28               ` Parthiban.Veerasooran
  0 siblings, 0 replies; 12+ messages in thread
From: Parthiban.Veerasooran @ 2024-12-04  8:28 UTC (permalink / raw)
  To: kuba
  Cc: pabeni, netdev, linux-kernel, UNGLinuxDriver, jacob.e.keller,
	andrew+netdev, davem, edumazet

Hi Jakub,

On 04/12/24 8:22 am, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Mon, 2 Dec 2024 03:37:01 +0000 Parthiban.Veerasooran@microchip.com
> wrote:
>> Does the below reply clarifies your question and shall I proceed to
>> prepare the next version? OR still do you have any comments on that?
> 
> In case you're still waiting for a response - yes, please proceed with
> v2, add the diagram requested by Paolo to the commit message.. and we
> will see if there are more questions then :)
Sure will do then. Thanks for the info.

Best regards,
Parthiban V


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-12-04  8:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
2024-11-26 10:44   ` Paolo Abeni
2024-11-27  8:43     ` Parthiban.Veerasooran
2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
2024-11-26 10:41   ` Paolo Abeni
2024-11-27 10:49     ` Parthiban.Veerasooran
2024-11-27 12:11       ` Paolo Abeni
2024-11-28  5:43         ` Parthiban.Veerasooran
2024-12-02  3:37           ` Parthiban.Veerasooran
2024-12-04  2:52             ` Jakub Kicinski
2024-12-04  8:28               ` Parthiban.Veerasooran

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).