* [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
@ 2023-06-22 19:03 Nick Child
2023-06-23 7:52 ` Simon Horman
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Nick Child @ 2023-06-22 19:03 UTC (permalink / raw)
To: netdev; +Cc: haren, ricklind, Nick Child
All ibmvnic resets make a call to netdev_tx_reset_queue() when
re-opening the device. netdev_tx_reset_queue() resets the num_queued
and num_completed byte counters. These stats are used by the Byte Queue
Limit (BQL) algorithm; the difference between the two tracks the number
of bytes currently sitting on the physical NIC. ibmvnic increases the
number of queued bytes through calls to netdev_tx_sent_queue() in the
driver's xmit function. When VIOS reports that it is done transmitting
bytes, the ibmvnic device increases the number of completed bytes
through calls to netdev_tx_completed_queue(). It is important to note
that the driver batches its transmit calls and num_queued is increased
every time an skb is added to the next batch, not necessarily when the
batch is sent to VIOS for transmission.
Unlike other reset types, a NON_FATAL reset will not flush the sub-CRQ
tx buffers. Therefore, it is possible for the batched skb array to be
partially full. So if there is a call to netdev_tx_reset_queue() when
re-opening the device, the value of num_queued (0) would not account
for the skbs that are currently batched. Eventually, when the batch is
sent to VIOS, the call to netdev_tx_completed_queue() would increase
num_completed to a value greater than num_queued. This causes a BUG_ON
crash:
ibmvnic 30000002: Firmware reports error, cause: adapter problem.
Starting recovery...
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
ibmvnic 30000002: tx error 600
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:27!
Oops: Exception in kernel mode, sig: 5
[....]
NIP dql_completed+0x28/0x1c0
LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
Call Trace:
ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
__handle_irq_event_percpu+0x98/0x270
---[ end trace ]---
Therefore, do not reset the dql stats when performing a NON_FATAL
reset. Simply clearing the queue's XOFF bit is sufficient.
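To make the accounting concrete, the usual BQL pattern in a driver looks
roughly like the sketch below (hypothetical foo_* code, not the actual
ibmvnic implementation): every byte reported through
netdev_tx_sent_queue() must eventually be matched by
netdev_tx_completed_queue(), and dql_completed() BUGs if a queue ever
completes more than it has queued since the last reset.

#include <linux/netdevice.h>

/* Hypothetical sketch of the BQL contract; not ibmvnic code. */
static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct netdev_queue *txq;

        txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        /* ... hand the skb to the hardware (or to a local batch) ... */

        /* BQL: bytes count as "queued" as soon as the skb is accepted */
        netdev_tx_sent_queue(txq, skb->len);
        return NETDEV_TX_OK;
}

static void foo_tx_complete(struct net_device *dev, int idx,
                            unsigned int pkts, unsigned int bytes)
{
        struct netdev_queue *txq = netdev_get_tx_queue(dev, idx);

        /* BQL: the same bytes count as "completed" once the NIC is done.
         * If netdev_tx_reset_queue() zeroes the counters in between, as the
         * unconditional call in __ibmvnic_open() can, completed exceeds
         * queued and dql_completed() hits its BUG_ON.
         */
        netdev_tx_completed_queue(txq, pkts, bytes);
}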
Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index c63d3ec9d328..5523ab52ff2b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1816,7 +1816,18 @@ static int __ibmvnic_open(struct net_device *netdev)
 		if (prev_state == VNIC_CLOSED)
 			enable_irq(adapter->tx_scrq[i]->irq);
 		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
-		netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
+		/* netdev_tx_reset_queue() will reset the dql stats and clear
+		 * the stack's queue state flag. During NON_FATAL resets, just
+		 * clear the bit; don't reset the stats because there could
+		 * be batched skbs waiting to be sent. If we reset the dql
+		 * stats, we risk num_completed being greater than num_queued,
+		 * which would cause a BUG_ON in dql_completed().
+		 */
+		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
+			clear_bit(__QUEUE_STATE_STACK_XOFF,
+				  &netdev_get_tx_queue(netdev, i)->state);
+		else
+			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
 	}
 
 	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
--
2.31.1
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-22 19:03 [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err Nick Child
@ 2023-06-23 7:52 ` Simon Horman
2023-06-23 12:41 ` Simon Horman
2023-06-23 12:24 ` Rick Lindsley
2023-06-24 22:19 ` Jakub Kicinski
2 siblings, 1 reply; 9+ messages in thread
From: Simon Horman @ 2023-06-23 7:52 UTC (permalink / raw)
To: Nick Child; +Cc: netdev, haren, ricklind
+ maintainers and blamed authors
On Thu, Jun 22, 2023 at 02:03:32PM -0500, Nick Child wrote:
> All ibmvnic resets make a call to netdev_tx_reset_queue() when
> re-opening the device. netdev_tx_reset_queue() resets the num_queued
> and num_completed byte counters. These stats are used by the Byte Queue
> Limit (BQL) algorithm; the difference between the two tracks the number
> of bytes currently sitting on the physical NIC. ibmvnic increases the
> number of queued bytes through calls to netdev_tx_sent_queue() in the
> driver's xmit function. When VIOS reports that it is done transmitting
> bytes, the ibmvnic device increases the number of completed bytes
> through calls to netdev_tx_completed_queue(). It is important to note
> that the driver batches its transmit calls and num_queued is increased
> every time an skb is added to the next batch, not necessarily when the
> batch is sent to VIOS for transmission.
>
> Unlike other reset types, a NON_FATAL reset will not flush the sub-CRQ
> tx buffers. Therefore, it is possible for the batched skb array to be
> partially full. So if there is a call to netdev_tx_reset_queue() when
> re-opening the device, the value of num_queued (0) would not account
> for the skbs that are currently batched. Eventually, when the batch is
> sent to VIOS, the call to netdev_tx_completed_queue() would increase
> num_completed to a value greater than num_queued. This causes a BUG_ON
> crash:
>
> ibmvnic 30000002: Firmware reports error, cause: adapter problem.
> Starting recovery...
> ibmvnic 30000002: tx error 600
> ibmvnic 30000002: tx error 600
> ibmvnic 30000002: tx error 600
> ibmvnic 30000002: tx error 600
> ------------[ cut here ]------------
> kernel BUG at lib/dynamic_queue_limits.c:27!
> Oops: Exception in kernel mode, sig: 5
> [....]
> NIP dql_completed+0x28/0x1c0
> LR ibmvnic_complete_tx.isra.0+0x23c/0x420 [ibmvnic]
> Call Trace:
> ibmvnic_complete_tx.isra.0+0x3f8/0x420 [ibmvnic] (unreliable)
> ibmvnic_interrupt_tx+0x40/0x70 [ibmvnic]
> __handle_irq_event_percpu+0x98/0x270
> ---[ end trace ]---
>
> Therefore, do not reset the dql stats when performing a NON_FATAL
> reset. Simply clearing the queue's XOFF bit is sufficient.
>
> Fixes: 0d973388185d ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
> Signed-off-by: Nick Child <nnac123@linux.ibm.com>
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index c63d3ec9d328..5523ab52ff2b 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1816,7 +1816,18 @@ static int __ibmvnic_open(struct net_device *netdev)
>  		if (prev_state == VNIC_CLOSED)
>  			enable_irq(adapter->tx_scrq[i]->irq);
>  		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
> -		netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
> +		/* netdev_tx_reset_queue() will reset the dql stats and clear
> +		 * the stack's queue state flag. During NON_FATAL resets, just
> +		 * clear the bit; don't reset the stats because there could
> +		 * be batched skbs waiting to be sent. If we reset the dql
> +		 * stats, we risk num_completed being greater than num_queued,
> +		 * which would cause a BUG_ON in dql_completed().
> +		 */
> +		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
> +			clear_bit(__QUEUE_STATE_STACK_XOFF,
> +				  &netdev_get_tx_queue(netdev, i)->state);
> +		else
> +			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
>  	}
> 
>  	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
> --
> 2.31.1
>
>
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-22 19:03 [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err Nick Child
2023-06-23 7:52 ` Simon Horman
@ 2023-06-23 12:24 ` Rick Lindsley
2023-06-24 22:19 ` Jakub Kicinski
2 siblings, 0 replies; 9+ messages in thread
From: Rick Lindsley @ 2023-06-23 12:24 UTC (permalink / raw)
To: Nick Child, netdev; +Cc: haren, ricklind
On 6/22/23 12:03, Nick Child wrote:
> Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: ricklind@us.ibm.com
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-23 7:52 ` Simon Horman
@ 2023-06-23 12:41 ` Simon Horman
0 siblings, 0 replies; 9+ messages in thread
From: Simon Horman @ 2023-06-23 12:41 UTC (permalink / raw)
To: Nick Child
Cc: netdev, haren, ricklind, mpe, kuba, tlfalcon, danymadden, npiggin,
christophe.leroy, davem, pabeni, edumazet, linuxppc-dev
On Fri, Jun 23, 2023 at 09:52:26AM +0200, Simon Horman wrote:
> + maintainers and blamed authors
A second time, because something went wrong with the first attempt.
> On Thu, Jun 22, 2023 at 02:03:32PM -0500, Nick Child wrote:
> > [...]
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-22 19:03 [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err Nick Child
2023-06-23 7:52 ` Simon Horman
2023-06-23 12:24 ` Rick Lindsley
@ 2023-06-24 22:19 ` Jakub Kicinski
2023-06-26 15:45 ` Nick Child
2023-06-26 19:06 ` Rick Lindsley
2 siblings, 2 replies; 9+ messages in thread
From: Jakub Kicinski @ 2023-06-24 22:19 UTC (permalink / raw)
To: Nick Child; +Cc: netdev, haren, ricklind
On Thu, 22 Jun 2023 14:03:32 -0500 Nick Child wrote:
> +		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
> +			clear_bit(__QUEUE_STATE_STACK_XOFF,
> +				  &netdev_get_tx_queue(netdev, i)->state);
Why are you trying to clear this bit?

If the completions will still come, the bit will be cleared (or not)
during completion handling (netdev_tx_completed_queue() et al.).

Drivers shouldn't be poking into queue state bits directly.
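For reference, the completion-side handling referred to here is roughly
the following (paraphrased from include/linux/netdevice.h of that era);
it is the path that clears the bit and reschedules the queue:

static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
                                             unsigned int pkts, unsigned int bytes)
{
#ifdef CONFIG_BQL
        if (unlikely(!bytes))
                return;

        dql_completed(&dev_queue->dql, bytes);

        /* barrier so a concurrent xmit path sees the updated dql state */
        smp_mb();

        if (unlikely(dql_avail(&dev_queue->dql) < 0))
                return;

        /* this is where __QUEUE_STATE_STACK_XOFF gets cleared */
        if (test_and_clear_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state))
                netif_schedule_queue(dev_queue);
#endif
}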
--
pw-bot: cr
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-24 22:19 ` Jakub Kicinski
@ 2023-06-26 15:45 ` Nick Child
2023-06-26 17:33 ` Jakub Kicinski
2023-06-26 19:06 ` Rick Lindsley
1 sibling, 1 reply; 9+ messages in thread
From: Nick Child @ 2023-06-26 15:45 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, haren, ricklind
On 6/24/23 17:19, Jakub Kicinski wrote:
> On Thu, 22 Jun 2023 14:03:32 -0500 Nick Child wrote:
>> +		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
>> +			clear_bit(__QUEUE_STATE_STACK_XOFF,
>> +				  &netdev_get_tx_queue(netdev, i)->state);
>
> Why are you trying to clear this bit?
>
> If the completions will still come the bit will be cleared (or not)
> during completion handling (netdev_tx_completed_queue() et al.)
>
> Drivers shouldn't be poking into queue state bits directly.
Most likely, yes, there could be some bytes that will get a completion,
which would clear this bit.

That being said, it is also possible that all bytes sent to the NIC are
already completed, in which case we would not get a completion. The
ibmvnic driver sends its skbs to the NIC in batches; it makes a call to
netdev_tx_sent_queue() every time an skb is added to the batch, which
is not necessarily every time the batch is sent to the NIC.

I am not sure what number of bytes causes dql to set
__QUEUE_STATE_STACK_XOFF, but I do know that it is possible for up to
15 skbs to be sitting in the queue's batch. If there are no outstanding
bytes on the NIC (i.e. we are not expecting a tx completion) and the
internal batch has 15 references per queue, is that enough to trip dql
and set __QUEUE_STATE_STACK_XOFF? If so, then we must clear
__QUEUE_STATE_STACK_XOFF when resetting.
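For reference, the bit is set by netdev_tx_sent_queue() itself: it goes
up whenever dql_avail() drops below zero after the bytes are accounted,
so the trigger is the adaptive limit rather than a fixed byte count.
Roughly (paraphrasing include/linux/netdevice.h):

static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
                                        unsigned int bytes)
{
#ifdef CONFIG_BQL
        dql_queued(&dev_queue->dql, bytes);

        if (likely(dql_avail(&dev_queue->dql) >= 0))
                return;

        /* no BQL budget left: tell the stack to stop feeding this queue */
        set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);

        /* pairs with the barrier in netdev_tx_completed_queue() */
        smp_mb();

        /* re-check in case a completion just freed up budget */
        if (unlikely(dql_avail(&dev_queue->dql) >= 0))
                clear_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);
#endif
}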
I had a feeling this would raise some eyebrows. The main intent is to
do everything that netdev_tx_reset_queue() does besides resetting
statistics. Touching a "*STACK*" flag in driver code feels wrong
(especially since a *DRV* flag exists), but I could not find an
appropriate upper-level function. I suppose an alternative is to read
this flag after the device finishes the reset and send the batch if it
is set. Is this any better?

As always, thanks for the review.

Nick
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-26 15:45 ` Nick Child
@ 2023-06-26 17:33 ` Jakub Kicinski
2023-06-26 19:23 ` Nick Child
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-06-26 17:33 UTC (permalink / raw)
To: Nick Child; +Cc: netdev, haren, ricklind
On Mon, 26 Jun 2023 10:45:38 -0500 Nick Child wrote:
> On 6/24/23 17:19, Jakub Kicinski wrote:
> > On Thu, 22 Jun 2023 14:03:32 -0500 Nick Child wrote:
> [...]
> >
> > Why are you trying to clear this bit?
> >
> > If the completions will still come, the bit will be cleared (or not)
> > during completion handling (netdev_tx_completed_queue() et al.).
> >
> > Drivers shouldn't be poking into queue state bits directly.
>
> Most likely, yes, there could be some bytes that will get a completion,
> which would clear this bit.
>
> That being said, it is also possible that all bytes sent to the NIC are
> already completed, in which case we would not get a completion. The
> ibmvnic driver sends its skbs to the NIC in batches; it makes a call to
> netdev_tx_sent_queue() every time an skb is added to the batch, which
> is not necessarily every time the batch is sent to the NIC.
If packets can get stuck in the driver, that needs a dedicated fix.
> I am not sure what number of bytes causes dql to set
> __QUEUE_STATE_STACK_XOFF, but I do know that it is possible for up to
> 15 skbs to be sitting in the queue's batch. If there are no outstanding
> bytes on the NIC (i.e. we are not expecting a tx completion) and the
> internal batch has 15 references per queue, is that enough to trip dql
> and set __QUEUE_STATE_STACK_XOFF? If so, then we must clear
> __QUEUE_STATE_STACK_XOFF when resetting.
You should only be doing the batching if xmit_more is set, really.
And xmit_more will not be set if the core is about to set
__QUEUE_STATE_STACK_XOFF.

With a correctly written driver, STACK_XOFF can only be set if we're
expecting a completion.
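Concretely, that interplay is what __netdev_tx_sent_queue() encodes; a
rough sketch of a batching xmit path using it (hypothetical foo_*
helpers, not ibmvnic's actual ndo_start_xmit):

#include <linux/netdevice.h>

static void foo_add_to_batch(struct net_device *dev, struct sk_buff *skb);
static void foo_flush_batch(struct net_device *dev);

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct netdev_queue *txq;

        txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));
        foo_add_to_batch(dev, skb);

        /* Accounts skb->len in BQL and returns true when the batch must be
         * flushed: when xmit_more is false, or when the queue was already
         * stopped. STACK_XOFF is only ever set on the !xmit_more path,
         * which also flushes the batch, so a completion is always pending
         * whenever STACK_XOFF is set.
         */
        if (__netdev_tx_sent_queue(txq, skb->len, netdev_xmit_more()))
                foo_flush_batch(dev);

        return NETDEV_TX_OK;
}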
> I had a feeling this would raise some eyebrows. The main intent is to
> do everything that netdev_tx_reset_queue() does besides resetting
> statistics. Touching a "*STACK*" flag in driver code feels wrong
> (especially since a *DRV* flag exists), but I could not find an
> appropriate upper-level function. I suppose an alternative is to read
> this flag after the device finishes the reset and send the batch if it
> is set. Is this any better?
AFAIU you shouldn't have to clear this flag. We need to reset DQL when
hard resetting the queue because we may lose completions. But if no
completions are lost, STACK_XOFF should be left alone.

Just clearing that flag is not valid either, because qdiscs will not be
rescheduled, so until some socket tries to xmit the next packet, the
data already queued will not move.
* RE: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-24 22:19 ` Jakub Kicinski
2023-06-26 15:45 ` Nick Child
@ 2023-06-26 19:06 ` Rick Lindsley
1 sibling, 0 replies; 9+ messages in thread
From: Rick Lindsley @ 2023-06-26 19:06 UTC (permalink / raw)
To: Jakub Kicinski, Nick Child; +Cc: netdev, haren, ricklind
On 6/24/23 15:19, Jakub Kicinski wrote:
> Why are you trying to clear this bit?
>
> If the completions will still come, the bit will be cleared (or not)
> during completion handling (netdev_tx_completed_queue() et al.).
>
> Drivers shouldn't be poking into queue state bits directly.
+		if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
+			clear_bit(__QUEUE_STATE_STACK_XOFF,
+				  &netdev_get_tx_queue(netdev, i)->state);
+		else
+			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
The problem is the call to dql_reset() in netdev_tx_reset_queue(). If we reset dql stats, we risk num_completed being greater than num_queued, which will cause a BUG_ON to fire in dql_completed().
static inline void netdev_tx_reset_queue(struct netdev_queue *q)
{
#ifdef CONFIG_BQL
	clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state);
	dql_reset(&q->dql);
#endif
}
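And the check that fires is the sanity test at the top of
dql_completed(), roughly (lib/dynamic_queue_limits.c):

void dql_completed(struct dql *dql, unsigned int count)
{
        unsigned int num_queued = READ_ONCE(dql->num_queued);

        /* Can't complete more than what's in queue */
        BUG_ON(count > num_queued - dql->num_completed);
        /* ... limit recalculation follows ... */
}

With the batched bytes wiped out of num_queued by dql_reset(), count can
exceed num_queued - num_completed and the BUG_ON fires.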
Existing code calls netdev_tx_reset_queue() unconditionally. When the error is non-fatal, though, we were tripping over the BUG_ON on those occasions when a few skbs were already submitted. The patch here is more about NOT calling dql_reset() than it is about clearing __QUEUE_STATE_STACK_XOFF. That was only included because it-was-the-other-thing-the-function-did. So ... maybe we don't care about __QUEUE_STATE_STACK_XOFF?
Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared? If not, then do we need to call or emulate netdev_tx_reset_queue() at all on a non-fatal error?
* Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
2023-06-26 17:33 ` Jakub Kicinski
@ 2023-06-26 19:23 ` Nick Child
0 siblings, 0 replies; 9+ messages in thread
From: Nick Child @ 2023-06-26 19:23 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, haren, ricklind
On 6/26/23 12:33, Jakub Kicinski wrote:
>
> If packets can get stuck in the driver, that needs a dedicated fix.
>
> You should only be doing the batching if xmit_more is set, really.
> And xmit_more will not be set if the core is about to set
> __QUEUE_STATE_STACK_XOFF.
>
I am not saying that packets can get stuck in the driver; if xmit_more
is false or the driver is hard resetting, then the batch gets sent.
During a non-fatal reset, we are simply re-establishing communications
with the NIC, and it is okay to keep the batch around. It is possible
that there are batched skbs because xmit_more and the non-fatal reset
work independently of each other and are not mutually exclusive. The
upper-level functions have no way of knowing when an issue on the NIC
will occur.
> With a correctly written driver, STACK_XOFF can only be set if we're
> expecting a completion.
>
> AFAIU you shouldn't have to clear this flag. We need to reset DQL when
> hard resetting the queue because we may lose completions. But if no
> completions are lost, STACK_XOFF should be left alone.
>
> Just clearing that flag is not valid either, because qdiscs will not be
> rescheduled, so until some socket tries to xmit the next packet, the
> data already queued will not move.
This addresses my concern. So if STACK_XOFF gets set, then xmit_more
will be false and our batch will get sent. This will eventually lead to
a completion, which will clear STACK_XOFF, so the reset should not have
to worry about clearing STACK_XOFF.

My worry was that STACK_XOFF would get set while the batch was only
partially full and xmit_more was true. If a non-fatal reset occurred at
that point, there would be no expected completions and a partially
filled batch, so I thought we would have to clear STACK_XOFF in order
to get the queue usable again. It sounds like this is not possible,
since xmit_more will be false whenever STACK_XOFF gets set.
On 6/26/23 14:06, Rick Lindsley wrote:
>
> Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared? If not,
> then do we need to call or emulate netdev_tx_reset_queue() at all on a
> non-fatal error?
After reading Jakub's review, I believe we should remove the part where
we clear STACK_XOFF.

If there are no objections, I will test these changes and send a v2.

Thanks
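P.S. For concreteness, the v2 would presumably reduce the hunk in
__ibmvnic_open() to something like the following untested sketch (the
actual v2 may differ):

		if (prev_state == VNIC_CLOSED)
			enable_irq(adapter->tx_scrq[i]->irq);
		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
		/* Don't reset the dql stats on NON_FATAL resets: skbs may
		 * still be batched with their bytes already counted as
		 * queued, and any pending completion will clear STACK_XOFF.
		 */
		if (adapter->reset_reason != VNIC_RESET_NON_FATAL)
			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));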
end of thread

Thread overview: 9+ messages
2023-06-22 19:03 [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err Nick Child
2023-06-23 7:52 ` Simon Horman
2023-06-23 12:41 ` Simon Horman
2023-06-23 12:24 ` Rick Lindsley
2023-06-24 22:19 ` Jakub Kicinski
2023-06-26 15:45 ` Nick Child
2023-06-26 17:33 ` Jakub Kicinski
2023-06-26 19:23 ` Nick Child
2023-06-26 19:06 ` Rick Lindsley