[PATCH v2] net: macb: Restart tx only if queue pointer is lagging

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
@ 2022-04-07 16:16 Tomas Melin
  2022-04-08  7:42 ` Claudiu.Beznea
  2022-04-12  1:19 ` Jakub Kicinski
  0 siblings, 2 replies; 6+ messages in thread
From: Tomas Melin @ 2022-04-07 16:16 UTC (permalink / raw)
  To: netdev; +Cc: nicolas.ferre, claudiu.beznea, davem, kuba, pabeni, Tomas Melin

commit 4298388574da ("net: macb: restart tx after tx used bit read")
added support for restarting transmission. Restarting tx does not work
in case controller asserts TXUBR interrupt and TQBP is already at the end
of the tx queue. In that situation, restarting tx will immediately cause
assertion of another TXUBR interrupt. The driver will end up in an infinite
interrupt loop which it cannot break out of.

For cases where TQBP is at the end of the tx queue, instead
only clear TX_USED interrupt. As more data gets pushed to the queue,
transmission will resume.

This issue was observed on a Xilinx Zynq-7000 based board.
During stress test of the network interface,
driver would get stuck on interrupt loop within seconds or minutes
causing CPU to stall.

Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
---
Changes v2:
- change referenced commit to use original commit ID instead of stable branch ID

 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 800d5ced5800..e475be29845c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue)
 	unsigned int head = queue->tx_head;
 	unsigned int tail = queue->tx_tail;
 	struct macb *bp = queue->bp;
+	unsigned int head_idx, tbqp;

 	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
 		queue_writel(queue, ISR, MACB_BIT(TXUBR));
@@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
 	if (head == tail)
 		return;

+	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
+	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
+	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
+
+	if (tbqp == head_idx)
+		return;
+
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
 }

-- 
2.35.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
  2022-04-07 16:16 [PATCH v2] net: macb: Restart tx only if queue pointer is lagging Tomas Melin
@ 2022-04-08  7:42 ` Claudiu.Beznea
  2022-04-08  8:47   ` Harini Katakam
  2022-04-12  1:19 ` Jakub Kicinski
  1 sibling, 1 reply; 6+ messages in thread
From: Claudiu.Beznea @ 2022-04-08  7:42 UTC (permalink / raw)
  To: tomas.melin, netdev
  Cc: Nicolas.Ferre, davem, kuba, pabeni, harini.katakam,
	shubhrajyoti.datta, michal.simek, pthombar, mparab, rafalo

Hi, Tomas,

I'm returning to this new thread.

Sorry for the long delay. I looked though my emails for the steps to
reproduce the bug that introduces macb_tx_restart() but haven't found them.
Though the code in this patch should not affect at all SAMA5D4.

I have tested anyway SAMA5D4 with and without your code and saw no issues.
In case Dave, Jakub want to merge it you can add my
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>

The only thing with this patch, as mention earlier, is that freeing of
packet N may depend on sending packet N+1 and if packet N+1 blocks again
the HW then the freeing of packets N, N+1 may depend on packet N+2 etc. But
from your investigation it seems hardware has some bugs.

FYI, I looked though Xilinx github repository and saw no patches on macb
that may be related to this issue.

Anyway, it would be good if there would be some replies from Xilinx or at
least Cadence people on this (previous thread at [1]).

Thank you,
Claudiu Beznea

[1]
https://lore.kernel.org/netdev/82276bf7-72a5-6a2e-ff33-f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e48ccfd

On 07.04.2022 19:16, Tomas Melin wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> commit 4298388574da ("net: macb: restart tx after tx used bit read")
> added support for restarting transmission. Restarting tx does not work
> in case controller asserts TXUBR interrupt and TQBP is already at the end
> of the tx queue. In that situation, restarting tx will immediately cause
> assertion of another TXUBR interrupt. The driver will end up in an infinite
> interrupt loop which it cannot break out of.
> 
> For cases where TQBP is at the end of the tx queue, instead
> only clear TX_USED interrupt. As more data gets pushed to the queue,
> transmission will resume.
> 
> This issue was observed on a Xilinx Zynq-7000 based board.
> During stress test of the network interface,
> driver would get stuck on interrupt loop within seconds or minutes
> causing CPU to stall.
> 
> Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
> ---
> Changes v2:
> - change referenced commit to use original commit ID instead of stable branch ID
> 
>  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 800d5ced5800..e475be29845c 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue *queue)
>         unsigned int head = queue->tx_head;
>         unsigned int tail = queue->tx_tail;
>         struct macb *bp = queue->bp;
> +       unsigned int head_idx, tbqp;
> 
>         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>                 queue_writel(queue, ISR, MACB_BIT(TXUBR));
> @@ -1665,6 +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
>         if (head == tail)
>                 return;
> 
> +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> +
> +       if (tbqp == head_idx)
> +               return;
> +
>         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>  }
> 
> --
> 2.35.1
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
  2022-04-08  7:42 ` Claudiu.Beznea
@ 2022-04-08  8:47   ` Harini Katakam
  2022-04-08  9:57     ` Tomas Melin
  0 siblings, 1 reply; 6+ messages in thread
From: Harini Katakam @ 2022-04-08  8:47 UTC (permalink / raw)
  To: Claudiu.Beznea@microchip.com, tomas.melin@vaisala.com,
	netdev@vger.kernel.org
  Cc: Nicolas.Ferre@microchip.com, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Shubhrajyoti Datta, Michal Simek,
	pthombar@cadence.com, mparab@cadence.com, rafalo@cadence.com

Hi Claudiu, Tomas,

> -----Original Message-----
> From: Claudiu.Beznea@microchip.com <Claudiu.Beznea@microchip.com>
> Sent: Friday, April 8, 2022 1:13 PM
> To: tomas.melin@vaisala.com; netdev@vger.kernel.org
> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Harini Katakam <harinik@xilinx.com>; Shubhrajyoti
> Datta <shubhraj@xilinx.com>; Michal Simek <michals@xilinx.com>;
> pthombar@cadence.com; mparab@cadence.com; rafalo@cadence.com
> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
> 
> Hi, Tomas,
> 
> I'm returning to this new thread.
> 
> Sorry for the long delay. I looked though my emails for the steps to
> reproduce the bug that introduces macb_tx_restart() but haven't found
> them.
> Though the code in this patch should not affect at all SAMA5D4.
> 
> I have tested anyway SAMA5D4 with and without your code and saw no
> issues.
> In case Dave, Jakub want to merge it you can add my
> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> The only thing with this patch, as mention earlier, is that freeing of packet N
> may depend on sending packet N+1 and if packet N+1 blocks again the HW
> then the freeing of packets N, N+1 may depend on packet N+2 etc. But from
> your investigation it seems hardware has some bugs.
> 
> FYI, I looked though Xilinx github repository and saw no patches on macb that
> may be related to this issue.
> 
> Anyway, it would be good if there would be some replies from Xilinx or at
> least Cadence people on this (previous thread at [1]).

Sorry for the delayed response.
I saw the condition you described and I'm not able to reproduce it.
But I agree with your assessment that restarting TX will not help in this case.
Also, the original patch restarting TX was also not reproduced on Zynq board
easily. We've had some users report the issue after > 1hr of traffic but that was
on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce the
original issue Claudiu described on any 5.xx kernel.

Based on the thread, there is one possibility for a HW bug that controller fails to
generate TCOMP when a TXUBR and restart conditions occur because these interrupts
are edge triggered on Zynq.

I'm going to check the errata and let you know if I find anything relevant and also
request Cadence folks to comment.
I'm sorry ask but is this condition reproducible on any later variants of the IP in Xilinx or
non-Xilinx devices?
Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23 (old versioning)

Regards,
Harini

> 
> Thank you,
> Claudiu Beznea
> 
> [1]
> https://lore.kernel.org/netdev/82276bf7-72a5-6a2e-ff33-
> f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
> 8ccfd
> 
> On 07.04.2022 19:16, Tomas Melin wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know
> > the content is safe
> >
> > commit 4298388574da ("net: macb: restart tx after tx used bit read")
> > added support for restarting transmission. Restarting tx does not work
> > in case controller asserts TXUBR interrupt and TQBP is already at the
> > end of the tx queue. In that situation, restarting tx will immediately
> > cause assertion of another TXUBR interrupt. The driver will end up in
> > an infinite interrupt loop which it cannot break out of.
> >
> > For cases where TQBP is at the end of the tx queue, instead only clear
> > TX_USED interrupt. As more data gets pushed to the queue, transmission
> > will resume.
> >
> > This issue was observed on a Xilinx Zynq-7000 based board.
> > During stress test of the network interface, driver would get stuck on
> > interrupt loop within seconds or minutes causing CPU to stall.
> >
> > Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
> > ---
> > Changes v2:
> > - change referenced commit to use original commit ID instead of stable
> > branch ID
> >
> >  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index 800d5ced5800..e475be29845c 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue
> *queue)
> >         unsigned int head = queue->tx_head;
> >         unsigned int tail = queue->tx_tail;
> >         struct macb *bp = queue->bp;
> > +       unsigned int head_idx, tbqp;
> >
> >         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> >                 queue_writel(queue, ISR, MACB_BIT(TXUBR)); @@ -1665,6
> > +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
> >         if (head == tail)
> >                 return;
> >
> > +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
> > + head));
> > +
> > +       if (tbqp == head_idx)
> > +               return;
> > +
> >         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > }
> >
> > --
> > 2.35.1
> >


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
  2022-04-08  8:47   ` Harini Katakam
@ 2022-04-08  9:57     ` Tomas Melin
  2022-04-08 12:02       ` Harini Katakam
  0 siblings, 1 reply; 6+ messages in thread
From: Tomas Melin @ 2022-04-08  9:57 UTC (permalink / raw)
  To: Harini Katakam, Claudiu.Beznea@microchip.com,
	netdev@vger.kernel.org
  Cc: Nicolas.Ferre@microchip.com, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Shubhrajyoti Datta, Michal Simek,
	pthombar@cadence.com, mparab@cadence.com, rafalo@cadence.com

Hi Claudiu, Harini,

On 08/04/2022 11:47, Harini Katakam wrote:
> Hi Claudiu, Tomas,
> 
>> -----Original Message-----
>> From: Claudiu.Beznea@microchip.com <Claudiu.Beznea@microchip.com>
>> Sent: Friday, April 8, 2022 1:13 PM
>> To: tomas.melin@vaisala.com; netdev@vger.kernel.org
>> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net; kuba@kernel.org;
>> pabeni@redhat.com; Harini Katakam <harinik@xilinx.com>; Shubhrajyoti
>> Datta <shubhraj@xilinx.com>; Michal Simek <michals@xilinx.com>;
>> pthombar@cadence.com; mparab@cadence.com; rafalo@cadence.com
>> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
>>
>> Hi, Tomas,
>>
>> I'm returning to this new thread.
>>
>> Sorry for the long delay. I looked though my emails for the steps to
>> reproduce the bug that introduces macb_tx_restart() but haven't found
>> them.
>> Though the code in this patch should not affect at all SAMA5D4.
>>
>> I have tested anyway SAMA5D4 with and without your code and saw no
>> issues.
>> In case Dave, Jakub want to merge it you can add my
>> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
>> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>

Thank you for the effort to review and test this! Also thanks for the
discussions around this issue to provide further insights.


>>
>> The only thing with this patch, as mention earlier, is that freeing of packet N
>> may depend on sending packet N+1 and if packet N+1 blocks again the HW
>> then the freeing of packets N, N+1 may depend on packet N+2 etc. But from
>> your investigation it seems hardware has some bugs.

Indeed, this is not behaviour I have encountered in any testing. If we
were ever to encounter such issue, then it would need to be handled in
separate manner. Perhaps call tx_interrupt() to progress the queue. But
then again, this does not seem to happen.

>>
>> FYI, I looked though Xilinx github repository and saw no patches on macb that
>> may be related to this issue.
>>
>> Anyway, it would be good if there would be some replies from Xilinx or at
>> least Cadence people on this (previous thread at [1]).
> 
> Sorry for the delayed response.
> I saw the condition you described and I'm not able to reproduce it.
> But I agree with your assessment that restarting TX will not help in this case.
> Also, the original patch restarting TX was also not reproduced on Zynq board
> easily. We've had some users report the issue after > 1hr of traffic but that was
> on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce the
> original issue Claudiu described on any 5.xx kernel.
> 
> Based on the thread, there is one possibility for a HW bug that controller fails to
> generate TCOMP when a TXUBR and restart conditions occur because these interrupts
> are edge triggered on Zynq.

This is interesting hypothesis and that would indeed lead to this situation.


> 
> I'm going to check the errata and let you know if I find anything relevant and also
> request Cadence folks to comment.
> I'm sorry ask but is this condition reproducible on any later variants of the IP in Xilinx or
> non-Xilinx devices?

I have not seen this issue on MPSoC (atleast yet). Indeed this issue
seems to require the correct timing conditions for being able to trigger it.

So any additional information that we might get about possible issues in
IP is welcomed. However, the hardware on the boards we have at hand will
still be the same so the patch as such is relevant.

BR,
Tomas



> Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23 (old versioning)
> 
> Regards,
> Harini
> 
>>
>> Thank you,
>> Claudiu Beznea
>>
>> [1]
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fnetdev%2F82276bf7-72a5-6a2e-ff33-&amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C352a532fe14b42ad01d508da193c6320%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637850044400650522%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=rsBnJEVlDqpSUIfL%2BuXzAgTUL4w9rqaR6A6OLAi9gNQ%3D&amp;reserved=0
>> f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
>> 8ccfd
>>
>> On 07.04.2022 19:16, Tomas Melin wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know
>>> the content is safe
>>>
>>> commit 4298388574da ("net: macb: restart tx after tx used bit read")
>>> added support for restarting transmission. Restarting tx does not work
>>> in case controller asserts TXUBR interrupt and TQBP is already at the
>>> end of the tx queue. In that situation, restarting tx will immediately
>>> cause assertion of another TXUBR interrupt. The driver will end up in
>>> an infinite interrupt loop which it cannot break out of.
>>>
>>> For cases where TQBP is at the end of the tx queue, instead only clear
>>> TX_USED interrupt. As more data gets pushed to the queue, transmission
>>> will resume.
>>>
>>> This issue was observed on a Xilinx Zynq-7000 based board.
>>> During stress test of the network interface, driver would get stuck on
>>> interrupt loop within seconds or minutes causing CPU to stall.
>>>
>>> Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
>>> ---
>>> Changes v2:
>>> - change referenced commit to use original commit ID instead of stable
>>> branch ID
>>>
>>>  drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>>> b/drivers/net/ethernet/cadence/macb_main.c
>>> index 800d5ced5800..e475be29845c 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> @@ -1658,6 +1658,7 @@ static void macb_tx_restart(struct macb_queue
>> *queue)
>>>         unsigned int head = queue->tx_head;
>>>         unsigned int tail = queue->tx_tail;
>>>         struct macb *bp = queue->bp;
>>> +       unsigned int head_idx, tbqp;
>>>
>>>         if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>>                 queue_writel(queue, ISR, MACB_BIT(TXUBR)); @@ -1665,6
>>> +1666,13 @@ static void macb_tx_restart(struct macb_queue *queue)
>>>         if (head == tail)
>>>                 return;
>>>
>>> +       tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>>> +       tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>>> +       head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
>>> + head));
>>> +
>>> +       if (tbqp == head_idx)
>>> +               return;
>>> +
>>>         macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>> }
>>>
>>> --
>>> 2.35.1
>>>
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
  2022-04-08  9:57     ` Tomas Melin
@ 2022-04-08 12:02       ` Harini Katakam
  0 siblings, 0 replies; 6+ messages in thread
From: Harini Katakam @ 2022-04-08 12:02 UTC (permalink / raw)
  To: Tomas Melin, Claudiu.Beznea@microchip.com, netdev@vger.kernel.org
  Cc: Nicolas.Ferre@microchip.com, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, Shubhrajyoti Datta, Michal Simek,
	pthombar@cadence.com, mparab@cadence.com, rafalo@cadence.com

Hi Tomas,

> -----Original Message-----
> From: Tomas Melin <tomas.melin@vaisala.com>
> Sent: Friday, April 8, 2022 3:27 PM
> To: Harini Katakam <harinik@xilinx.com>; Claudiu.Beznea@microchip.com;
> netdev@vger.kernel.org
> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net; kuba@kernel.org;
> pabeni@redhat.com; Shubhrajyoti Datta <shubhraj@xilinx.com>; Michal
> Simek <michals@xilinx.com>; pthombar@cadence.com;
> mparab@cadence.com; rafalo@cadence.com
> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
> 
> Hi Claudiu, Harini,
> 
> On 08/04/2022 11:47, Harini Katakam wrote:
> > Hi Claudiu, Tomas,
> >
> >> -----Original Message-----
> >> From: Claudiu.Beznea@microchip.com <Claudiu.Beznea@microchip.com>
> >> Sent: Friday, April 8, 2022 1:13 PM
> >> To: tomas.melin@vaisala.com; netdev@vger.kernel.org
> >> Cc: Nicolas.Ferre@microchip.com; davem@davemloft.net;
> kuba@kernel.org;
> >> pabeni@redhat.com; Harini Katakam <harinik@xilinx.com>; Shubhrajyoti
> >> Datta <shubhraj@xilinx.com>; Michal Simek <michals@xilinx.com>;
> >> pthombar@cadence.com; mparab@cadence.com; rafalo@cadence.com
> >> Subject: Re: [PATCH v2] net: macb: Restart tx only if queue pointer is
> lagging
> >>
> >> Hi, Tomas,
> >>
> >> I'm returning to this new thread.
> >>
> >> Sorry for the long delay. I looked though my emails for the steps to
> >> reproduce the bug that introduces macb_tx_restart() but haven't found
> >> them.
> >> Though the code in this patch should not affect at all SAMA5D4.
> >>
> >> I have tested anyway SAMA5D4 with and without your code and saw no
> >> issues.
> >> In case Dave, Jakub want to merge it you can add my
> >> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> >> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> Thank you for the effort to review and test this! Also thanks for the
> discussions around this issue to provide further insights.
> 
> 
> >>
> >> The only thing with this patch, as mention earlier, is that freeing of packet
> N
> >> may depend on sending packet N+1 and if packet N+1 blocks again the
> HW
> >> then the freeing of packets N, N+1 may depend on packet N+2 etc. But
> from
> >> your investigation it seems hardware has some bugs.
> 
> Indeed, this is not behaviour I have encountered in any testing. If we
> were ever to encounter such issue, then it would need to be handled in
> separate manner. Perhaps call tx_interrupt() to progress the queue. But
> then again, this does not seem to happen.
> 
> >>
> >> FYI, I looked though Xilinx github repository and saw no patches on macb
> that
> >> may be related to this issue.
> >>
> >> Anyway, it would be good if there would be some replies from Xilinx or at
> >> least Cadence people on this (previous thread at [1]).
> >
> > Sorry for the delayed response.
> > I saw the condition you described and I'm not able to reproduce it.
> > But I agree with your assessment that restarting TX will not help in this
> case.
> > Also, the original patch restarting TX was also not reproduced on Zynq
> board
> > easily. We've had some users report the issue after > 1hr of traffic but that
> was
> > on a 4.xx kernel and I'm afraid I don’t have a case where I can reproduce
> the
> > original issue Claudiu described on any 5.xx kernel.
> >
> > Based on the thread, there is one possibility for a HW bug that controller
> fails to
> > generate TCOMP when a TXUBR and restart conditions occur because
> these interrupts
> > are edge triggered on Zynq.
> 
> This is interesting hypothesis and that would indeed lead to this situation.
> 
> 
> >
> > I'm going to check the errata and let you know if I find anything relevant
> and also
> > request Cadence folks to comment.
> > I'm sorry ask but is this condition reproducible on any later variants of the IP
> in Xilinx or
> > non-Xilinx devices?
> 
> I have not seen this issue on MPSoC (atleast yet). Indeed this issue
> seems to require the correct timing conditions for being able to trigger it.
> 
> So any additional information that we might get about possible issues in
> IP is welcomed. However, the hardware on the boards we have at hand will
> still be the same so the patch as such is relevant.

Yes, agreed. The patch is still required.

Regards,
Harini

> 
> BR,
> Tomas
> 
> 
> 
> > Zynq US+ MPSoC has the r1p07 while Zynq has the older version IP r1p23
> (old versioning)
> >
> > Regards,
> > Harini
> >
> >>
> >> Thank you,
> >> Claudiu Beznea
> >>
> >> [1]
> >>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.k
> ernel.org%2Fnetdev%2F82276bf7-72a5-6a2e-ff33-
> &amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C352a532fe14b42ad
> 01d508da193c6320%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C6
> 37850044400650522%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=
> rsBnJEVlDqpSUIfL%2BuXzAgTUL4w9rqaR6A6OLAi9gNQ%3D&amp;reserved=
> 0
> >>
> f8fe0c5e4a90@microchip.com/T/#m644c84a8709a65c40b8fc15a589e83b24e4
> >> 8ccfd
> >>
<snip>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] net: macb: Restart tx only if queue pointer is lagging
  2022-04-07 16:16 [PATCH v2] net: macb: Restart tx only if queue pointer is lagging Tomas Melin
  2022-04-08  7:42 ` Claudiu.Beznea
@ 2022-04-12  1:19 ` Jakub Kicinski
  1 sibling, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2022-04-12  1:19 UTC (permalink / raw)
  To: Tomas Melin; +Cc: netdev, nicolas.ferre, claudiu.beznea, davem, pabeni

On Thu,  7 Apr 2022 19:16:59 +0300 Tomas Melin wrote:
> commit 4298388574da ("net: macb: restart tx after tx used bit read")
> added support for restarting transmission. Restarting tx does not work
> in case controller asserts TXUBR interrupt and TQBP is already at the end
> of the tx queue. In that situation, restarting tx will immediately cause
> assertion of another TXUBR interrupt. The driver will end up in an infinite
> interrupt loop which it cannot break out of.
> 
> For cases where TQBP is at the end of the tx queue, instead
> only clear TX_USED interrupt. As more data gets pushed to the queue,
> transmission will resume.
> 
> This issue was observed on a Xilinx Zynq-7000 based board.
> During stress test of the network interface,
> driver would get stuck on interrupt loop within seconds or minutes
> causing CPU to stall.
> 
> Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>

Applied, thanks. Commit 5ad7f18cd82c ("net: macb: Restart tx only if
queue pointer is lagging") in net.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-12  1:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-07 16:16 [PATCH v2] net: macb: Restart tx only if queue pointer is lagging Tomas Melin
2022-04-08  7:42 ` Claudiu.Beznea
2022-04-08  8:47   ` Harini Katakam
2022-04-08  9:57     ` Tomas Melin
2022-04-08 12:02       ` Harini Katakam
2022-04-12  1:19 ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).