netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v3 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib
@ 2024-12-04 13:35 Parthiban Veerasooran
  2024-12-04 13:35 ` [PATCH net v3 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
  2024-12-04 13:35 ` [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
  0 siblings, 2 replies; 8+ messages in thread
From: Parthiban Veerasooran @ 2024-12-04 13:35 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
	jacob.e.keller

This patch series contain the below fixes.

- Infinite loop error when tx credits becomes 0.
- Race condition between tx skb reference pointers.

v2:
- Added mutex lock to protect tx skb reference handling.

v3:
- Added mutex protection in assigning new tx skb to waiting_tx_skb
  pointer.
- Explained the possible scenario for the race condition with the time
  diagram in the commit message.


Parthiban Veerasooran (2):
  net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes
    0
  net: ethernet: oa_tc6: fix tx skb race condition between reference
    pointers

 drivers/net/ethernet/oa_tc6.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)


base-commit: bb18265c3aba92b91a1355609769f3e967b65dee
-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net v3 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0
  2024-12-04 13:35 [PATCH net v3 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
@ 2024-12-04 13:35 ` Parthiban Veerasooran
  2024-12-04 13:35 ` [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
  1 sibling, 0 replies; 8+ messages in thread
From: Parthiban Veerasooran @ 2024-12-04 13:35 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
	jacob.e.keller

SPI thread wakes up to perform SPI transfer whenever there is an TX skb
from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is
transferred based on the availability tx credits in the MAC-PHY which is
reported from the previous SPI transfer. Sometimes there is a possibility
that TX skb is available to transmit but there is no tx credits from
MAC-PHY. In this case, there will not be any SPI transfer but the thread
will be running in an endless loop until tx credits available again.

So checking the availability of tx credits along with TX skb will prevent
the above infinite loop. When the tx credits available again that will be
notified through interrupt which will trigger the SPI transfer to get the
available tx credits.

Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
---
 drivers/net/ethernet/oa_tc6.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index f9c0dcd965c2..4c8b0ca922b7 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -1111,8 +1111,9 @@ static int oa_tc6_spi_thread_handler(void *data)
 		/* This kthread will be waken up if there is a tx skb or mac-phy
 		 * interrupt to perform spi transfer with tx chunks.
 		 */
-		wait_event_interruptible(tc6->spi_wq, tc6->waiting_tx_skb ||
-					 tc6->int_flag ||
+		wait_event_interruptible(tc6->spi_wq, tc6->int_flag ||
+					 (tc6->waiting_tx_skb &&
+					 tc6->tx_credits) ||
 					 kthread_should_stop());
 
 		if (kthread_should_stop())
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-04 13:35 [PATCH net v3 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
  2024-12-04 13:35 ` [PATCH net v3 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
@ 2024-12-04 13:35 ` Parthiban Veerasooran
  2024-12-10  0:11   ` Jakub Kicinski
  1 sibling, 1 reply; 8+ messages in thread
From: Parthiban Veerasooran @ 2024-12-04 13:35 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni
  Cc: parthiban.veerasooran, netdev, linux-kernel, UNGLinuxDriver,
	jacob.e.keller

There are two skb pointers to manage tx skb's enqueued from n/w stack.
waiting_tx_skb pointer points to the tx skb which needs to be processed
and ongoing_tx_skb pointer points to the tx skb which is being processed.

SPI thread prepares the tx data chunks from the tx skb pointed by the
ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
processed, the tx skb pointed by the waiting_tx_skb is assigned to
ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
Whenever there is a new tx skb from n/w stack, it will be assigned to
waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
handled in two different threads.

Consider a scenario where the SPI thread processed an ongoing_tx_skb and
it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
without doing any NULL check. At this time, if the waiting_tx_skb pointer
is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
stack and there is a chance to overwrite the tx skb pointer with NULL in
the SPI thread. Finally one of the tx skb will be left as unhandled,
resulting packet missing and memory leak.

- Consider the below scenario where the TXC reported from the previous
transfer is 10 and ongoing_tx_skb holds an tx ethernet frame which can be
transported in 20 TXCs and waiting_tx_skb is still NULL.
	tx_credits = 10; /* 21 are filled in the previous transfer */
	ongoing_tx_skb = 20;
	waiting_tx_skb = NULL; /* Still NULL */
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
- After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
	ongoing_tx_skb = 10;
	waiting_tx_skb = NULL; /* Still NULL */
- Perform SPI transfer.
- Process SPI rx buffer to get the TXC from footers.
- Now let's assume previously filled 21 TXCs are freed so we are good to
transport the next remaining 10 tx chunks from ongoing_tx_skb.
	tx_credits = 21;
	ongoing_tx_skb = 10;
	waiting_tx_skb = NULL;
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
- In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
	ongoing_tx_skb = NULL;
	waiting_tx_skb = NULL;

- Now the below bad case might happen,

Thread1 (oa_tc6_start_xmit)	Thread2 (oa_tc6_spi_thread_handler)
---------------------------	-----------------------------------
- if waiting_tx_skb is NULL
				- if ongoing_tx_skb is NULL
				- ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
				- waiting_tx_skb = NULL
				...
				- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb

To overcome the above issue, protect the moving of tx skb reference from
waiting_tx_skb pointer to ongoing_tx_skb pointer and assigning new tx skb
to waiting_tx_skb pointer, so that the other thread can't access the
waiting_tx_skb pointer until the current thread completes moving the tx
skb reference safely.

Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames")
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
---
 drivers/net/ethernet/oa_tc6.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index 4c8b0ca922b7..574bfd4e9356 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -113,6 +113,7 @@ struct oa_tc6 {
 	struct mii_bus *mdiobus;
 	struct spi_device *spi;
 	struct mutex spi_ctrl_lock; /* Protects spi control transfer */
+	struct mutex tx_skb_lock; /* Protects tx skb handling */
 	void *spi_ctrl_tx_buf;
 	void *spi_ctrl_rx_buf;
 	void *spi_data_tx_buf;
@@ -1004,8 +1005,10 @@ static u16 oa_tc6_prepare_spi_tx_buf_for_tx_skbs(struct oa_tc6 *tc6)
 	for (used_tx_credits = 0; used_tx_credits < tc6->tx_credits;
 	     used_tx_credits++) {
 		if (!tc6->ongoing_tx_skb) {
+			mutex_lock(&tc6->tx_skb_lock);
 			tc6->ongoing_tx_skb = tc6->waiting_tx_skb;
 			tc6->waiting_tx_skb = NULL;
+			mutex_unlock(&tc6->tx_skb_lock);
 		}
 		if (!tc6->ongoing_tx_skb)
 			break;
@@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
 		return NETDEV_TX_OK;
 	}
 
+	mutex_lock(&tc6->tx_skb_lock);
 	tc6->waiting_tx_skb = skb;
+	mutex_unlock(&tc6->tx_skb_lock);
 
 	/* Wake spi kthread to perform spi transfer */
 	wake_up_interruptible(&tc6->spi_wq);
@@ -1240,6 +1245,7 @@ struct oa_tc6 *oa_tc6_init(struct spi_device *spi, struct net_device *netdev)
 	tc6->netdev = netdev;
 	SET_NETDEV_DEV(netdev, &spi->dev);
 	mutex_init(&tc6->spi_ctrl_lock);
+	mutex_init(&tc6->tx_skb_lock);
 
 	/* Set the SPI controller to pump at realtime priority */
 	tc6->spi->rt = true;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-04 13:35 ` [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
@ 2024-12-10  0:11   ` Jakub Kicinski
  2024-12-12 12:33     ` Parthiban.Veerasooran
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2024-12-10  0:11 UTC (permalink / raw)
  To: Parthiban Veerasooran
  Cc: andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	UNGLinuxDriver, jacob.e.keller

On Wed, 4 Dec 2024 19:05:18 +0530 Parthiban Veerasooran wrote:
> @@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
>  		return NETDEV_TX_OK;
>  	}
>  
> +	mutex_lock(&tc6->tx_skb_lock);
>  	tc6->waiting_tx_skb = skb;
> +	mutex_unlock(&tc6->tx_skb_lock);

start_xmit runs in BH / softirq context. You can't take sleeping locks.
The lock has to be a spin lock. You could possibly try to use the
existing spin lock of the tx queue (__netif_tx_lock()) but that may be
more challenging to do cleanly from within a library..

Please make sure you test with builds including the
kernel/configs/debug.config Kconfigs.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-10  0:11   ` Jakub Kicinski
@ 2024-12-12 12:33     ` Parthiban.Veerasooran
  2024-12-13 10:35       ` Parthiban.Veerasooran
  0 siblings, 1 reply; 8+ messages in thread
From: Parthiban.Veerasooran @ 2024-12-12 12:33 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	UNGLinuxDriver, jacob.e.keller

Hi Jakub,

On 10/12/24 5:41 am, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Wed, 4 Dec 2024 19:05:18 +0530 Parthiban Veerasooran wrote:
>> @@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
>>                return NETDEV_TX_OK;
>>        }
>>
>> +     mutex_lock(&tc6->tx_skb_lock);
>>        tc6->waiting_tx_skb = skb;
>> +     mutex_unlock(&tc6->tx_skb_lock);
> 
> start_xmit runs in BH / softirq context. You can't take sleeping locks.
> The lock has to be a spin lock. You could possibly try to use the
> existing spin lock of the tx queue (__netif_tx_lock()) but that may be
> more challenging to do cleanly from within a library..
Thanks for the input. Yes, it looks like implementing a spin lock would 
be a right choice. I will implement it and do the testing as you 
suggested below and share the feedback.

Best regards,
Parthiban V
> 
> Please make sure you test with builds including the
> kernel/configs/debug.config Kconfigs.
> --
> pw-bot: cr


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-12 12:33     ` Parthiban.Veerasooran
@ 2024-12-13 10:35       ` Parthiban.Veerasooran
  2024-12-14  2:42         ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Parthiban.Veerasooran @ 2024-12-13 10:35 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	UNGLinuxDriver, jacob.e.keller

Hi Jakub,

On 12/12/24 6:03 pm, Parthiban.Veerasooran@microchip.com wrote:
> Hi Jakub,
> 
> On 10/12/24 5:41 am, Jakub Kicinski wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>
>> On Wed, 4 Dec 2024 19:05:18 +0530 Parthiban Veerasooran wrote:
>>> @@ -1210,7 +1213,9 @@ netdev_tx_t oa_tc6_start_xmit(struct oa_tc6 *tc6, struct sk_buff *skb)
>>>                 return NETDEV_TX_OK;
>>>         }
>>>
>>> +     mutex_lock(&tc6->tx_skb_lock);
>>>         tc6->waiting_tx_skb = skb;
>>> +     mutex_unlock(&tc6->tx_skb_lock);
>>
>> start_xmit runs in BH / softirq context. You can't take sleeping locks.
>> The lock has to be a spin lock. You could possibly try to use the
>> existing spin lock of the tx queue (__netif_tx_lock()) but that may be
>> more challenging to do cleanly from within a library..
> Thanks for the input. Yes, it looks like implementing a spin lock would
> be a right choice. I will implement it and do the testing as you
> suggested below and share the feedback.
I tried using spin_lock_bh() variants (as the softirq involved) on both 
start_xmit() and spi_thread() where the critical regions need to be 
protected and tested by enabling the Kconfigs in the 
kernel/configs/debug.config. Didn't notice any warnings in the dmesg log.

Note: Prior to the above test, purposefully I tried with spin_lock() 
variants on both the sides to check/simulate for the warnings using 
Kconfigs kernel/configs/debug.config. Got some warnings in the dmesg 
regarding deadlock which clarified the expected behavior. And then I 
proceeded with the above fix and it worked as expected.

If you agree, I will prepare the next version with this fix and post.

Best regards,
Parthiban V
> 
> Best regards,
> Parthiban V
>>
>> Please make sure you test with builds including the
>> kernel/configs/debug.config Kconfigs.
>> --
>> pw-bot: cr
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-13 10:35       ` Parthiban.Veerasooran
@ 2024-12-14  2:42         ` Jakub Kicinski
  2024-12-16  4:16           ` Parthiban.Veerasooran
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2024-12-14  2:42 UTC (permalink / raw)
  To: Parthiban.Veerasooran
  Cc: andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	UNGLinuxDriver, jacob.e.keller

On Fri, 13 Dec 2024 10:35:03 +0000 Parthiban.Veerasooran@microchip.com
wrote:
> >> start_xmit runs in BH / softirq context. You can't take sleeping locks.
> >> The lock has to be a spin lock. You could possibly try to use the
> >> existing spin lock of the tx queue (__netif_tx_lock()) but that may be
> >> more challenging to do cleanly from within a library..  
> > Thanks for the input. Yes, it looks like implementing a spin lock would
> > be a right choice. I will implement it and do the testing as you
> > suggested below and share the feedback.  
> I tried using spin_lock_bh() variants (as the softirq involved) on both 
> start_xmit() and spi_thread() where the critical regions need to be 
> protected and tested by enabling the Kconfigs in the 
> kernel/configs/debug.config. Didn't notice any warnings in the dmesg log.
> 
> Note: Prior to the above test, purposefully I tried with spin_lock() 
> variants on both the sides to check/simulate for the warnings using 
> Kconfigs kernel/configs/debug.config. Got some warnings in the dmesg 
> regarding deadlock which clarified the expected behavior. And then I 
> proceeded with the above fix and it worked as expected.
> 
> If you agree, I will prepare the next version with this fix and post.

Go ahead.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
  2024-12-14  2:42         ` Jakub Kicinski
@ 2024-12-16  4:16           ` Parthiban.Veerasooran
  0 siblings, 0 replies; 8+ messages in thread
From: Parthiban.Veerasooran @ 2024-12-16  4:16 UTC (permalink / raw)
  To: kuba
  Cc: andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	UNGLinuxDriver, jacob.e.keller

Hi Jakub,

On 14/12/24 8:12 am, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Fri, 13 Dec 2024 10:35:03 +0000 Parthiban.Veerasooran@microchip.com
> wrote:
>>>> start_xmit runs in BH / softirq context. You can't take sleeping locks.
>>>> The lock has to be a spin lock. You could possibly try to use the
>>>> existing spin lock of the tx queue (__netif_tx_lock()) but that may be
>>>> more challenging to do cleanly from within a library..
>>> Thanks for the input. Yes, it looks like implementing a spin lock would
>>> be a right choice. I will implement it and do the testing as you
>>> suggested below and share the feedback.
>> I tried using spin_lock_bh() variants (as the softirq involved) on both
>> start_xmit() and spi_thread() where the critical regions need to be
>> protected and tested by enabling the Kconfigs in the
>> kernel/configs/debug.config. Didn't notice any warnings in the dmesg log.
>>
>> Note: Prior to the above test, purposefully I tried with spin_lock()
>> variants on both the sides to check/simulate for the warnings using
>> Kconfigs kernel/configs/debug.config. Got some warnings in the dmesg
>> regarding deadlock which clarified the expected behavior. And then I
>> proceeded with the above fix and it worked as expected.
>>
>> If you agree, I will prepare the next version with this fix and post.
> 
> Go ahead.
Thanks for your comment. It is already posted for review.

https://patchwork.kernel.org/project/netdevbpf/patch/20241213123159.439739-3-parthiban.veerasooran@microchip.com/

Best regards,
Parthiban V


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-12-16  4:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-04 13:35 [PATCH net v3 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
2024-12-04 13:35 ` [PATCH net v3 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
2024-12-04 13:35 ` [PATCH net v3 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
2024-12-10  0:11   ` Jakub Kicinski
2024-12-12 12:33     ` Parthiban.Veerasooran
2024-12-13 10:35       ` Parthiban.Veerasooran
2024-12-14  2:42         ` Jakub Kicinski
2024-12-16  4:16           ` Parthiban.Veerasooran

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).