netdev.vger.kernel.org archive mirror
* [PATCH net v1] mlxbf_gige: fix receive packet race condition
@ 2022-09-08 20:28 David Thompson
  2022-09-19 21:17 ` Jakub Kicinski
  0 siblings, 1 reply; 3+ messages in thread
From: David Thompson @ 2022-09-08 20:28 UTC
  To: davem, edumazet, kuba, pabeni
  Cc: netdev, cai.huoqing, brgl, limings, David Thompson, Asmaa Mnebhi

Under heavy traffic, the BF2 Gigabit interface can
become unresponsive for periods of time (several minutes)
before eventually recovering.  This is due to a possible
race condition in the mlxbf_gige_rx_packet function, where
the function exits with producer and consumer indices equal
but there are remaining packet(s) to be processed. In order
to prevent this situation, disable receive DMA during the
processing of received packets.

Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Signed-off-by: David Thompson <davthompson@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
index afa3b92a6905..1490fbc74169 100644
--- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
+++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c
@@ -299,6 +299,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 
 	mlxbf_gige_handle_tx_complete(priv);
 
+	data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+	data &= ~MLXBF_GIGE_RX_DMA_EN;
+	writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
+
 	do {
 		remaining_pkts = mlxbf_gige_rx_packet(priv, &work_done);
 	} while (remaining_pkts && work_done < budget);
@@ -314,6 +318,10 @@ int mlxbf_gige_poll(struct napi_struct *napi, int budget)
 		data = readq(priv->base + MLXBF_GIGE_INT_MASK);
 		data &= ~MLXBF_GIGE_INT_MASK_RX_RECEIVE_PACKET;
 		writeq(data, priv->base + MLXBF_GIGE_INT_MASK);
+
+		data = readq(priv->base + MLXBF_GIGE_RX_DMA);
+		data |= MLXBF_GIGE_RX_DMA_EN;
+		writeq(data, priv->base + MLXBF_GIGE_RX_DMA);
 	}
 
 	return work_done;
-- 
2.30.1



* Re: [PATCH net v1] mlxbf_gige: fix receive packet race condition
  2022-09-08 20:28 [PATCH net v1] mlxbf_gige: fix receive packet race condition David Thompson
@ 2022-09-19 21:17 ` Jakub Kicinski
  2022-10-25 16:31   ` David Thompson
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Kicinski @ 2022-09-19 21:17 UTC
  To: David Thompson
  Cc: davem, edumazet, pabeni, netdev, cai.huoqing, brgl, limings,
	Asmaa Mnebhi

On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> Under heavy traffic, the BF2 Gigabit interface can
> become unresponsive for periods of time (several minutes)
> before eventually recovering.  This is due to a possible
> race condition in the mlxbf_gige_rx_packet function, where
> the function exits with producer and consumer indices equal
> but there are remaining packet(s) to be processed. In order
> to prevent this situation, disable receive DMA during the
> processing of received packets.

Pausing Rx DMA seems a little drastic, is the capacity of the NIC
buffer large enough to sink the traffic while the stack drains 
the ring?

Could you provide a little more detail on what the HW issue is? 
There is no less intrusive way we can fix it?


* RE: [PATCH net v1] mlxbf_gige: fix receive packet race condition
  2022-09-19 21:17 ` Jakub Kicinski
@ 2022-10-25 16:31   ` David Thompson
  0 siblings, 0 replies; 3+ messages in thread
From: David Thompson @ 2022-10-25 16:31 UTC
  To: Jakub Kicinski
  Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	netdev@vger.kernel.org, cai.huoqing@linux.dev, brgl@bgdev.pl,
	Liming Sun, Asmaa Mnebhi

> On Thu, 8 Sep 2022 16:28:53 -0400 David Thompson wrote:
> > Under heavy traffic, the BF2 Gigabit interface can become unresponsive
> > for periods of time (several minutes) before eventually recovering.
> > This is due to a possible race condition in the mlxbf_gige_rx_packet
> > function, where the function exits with producer and consumer indices
> > equal but there are remaining packet(s) to be processed. In order to
> > prevent this situation, disable receive DMA during the processing of
> > received packets.
> 
> > Pausing Rx DMA seems a little drastic, is the capacity of the NIC
> > buffer large enough to sink the traffic while the stack drains
> > the ring?
> 
> Could you provide a little more detail on what the HW issue is?
> There is no less intrusive way we can fix it?

Thank you for your insight, Jakub.  I will review this patch and see
whether the issue can be solved without pausing RX DMA.

FYI, a little background on the DMA operation in hardware:

Pausing RX DMA prevents the hardware from writing new packets to memory.
While DMA is paused, new packets are written to a 20KB buffer instead;
they are not forwarded to memory and the consumer index is not updated.
Once this buffer is full, packets are dropped.

Thanks, Dave
