* [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
@ 2024-08-05 23:38 longli
2024-08-06 15:56 ` Haiyang Zhang
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: longli @ 2024-08-05 23:38 UTC (permalink / raw)
To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
linux-kernel, linux-rdma
Cc: Long Li, stable
From: Long Li <longli@microsoft.com>
After napi_complete_done() is called, another NAPI may be running on
another CPU and ring the doorbell before the current CPU does. When
combined with unnecessary rings when there is no need to ARM the CQ, this
triggers error paths in the hardware.
Fix this by always ring the doorbell in sequence and avoid unnecessary
rings.
Cc: stable@vger.kernel.org
Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/mana_en.c | 24 ++++++++++++-------
include/net/mana/mana.h | 1 +
2 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index d2f07e179e86..7d08e23c6749 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1788,7 +1788,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
{
struct mana_cq *cq = context;
- u8 arm_bit;
int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
@@ -1799,16 +1798,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
mana_poll_tx_cq(cq);
w = cq->work_done;
-
- if (w < cq->budget &&
- napi_complete_done(&cq->napi, w)) {
- arm_bit = SET_ARM_BIT;
- } else {
- arm_bit = 0;
+ cq->work_done_since_doorbell += w;
+
+ if (w < cq->budget) {
+ mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
+ cq->work_done_since_doorbell = 0;
+ napi_complete_done(&cq->napi, w);
+ } else if (cq->work_done_since_doorbell >
+ cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
+ /* MANA hardware requires at least one doorbell ring every 8
+ * wraparounds of CQ even there is no need to ARM. This driver
+ * rings the doorbell as soon as we have execceded 4
+ * wraparounds.
+ */
+ mana_gd_ring_cq(gdma_queue, 0);
+ cq->work_done_since_doorbell = 0;
}
- mana_gd_ring_cq(gdma_queue, arm_bit);
-
return w;
}
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 6439fd8b437b..7caa334f4888 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -275,6 +275,7 @@ struct mana_cq {
/* NAPI data */
struct napi_struct napi;
int work_done;
+ int work_done_since_doorbell;
int budget;
};
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
2024-08-05 23:38 [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings longli
@ 2024-08-06 15:56 ` Haiyang Zhang
2024-08-06 15:59 ` Zhu Yanjun
2024-08-06 19:28 ` Dexuan Cui
2 siblings, 0 replies; 6+ messages in thread
From: Haiyang Zhang @ 2024-08-06 15:56 UTC (permalink / raw)
To: Long Li, KY Srinivasan, Wei Liu, Dexuan Cui, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shradha Gupta,
Simon Horman, Konstantin Taranov, Souradeep Chakrabarti,
Erick Archer, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
Cc: stable@vger.kernel.org
> -----Original Message-----
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> Sent: Monday, August 5, 2024 7:38 PM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Wei Liu <wei.liu@kernel.org>; Dexuan Cui
> <decui@microsoft.com>; David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Shradha Gupta <shradhagupta@linux.microsoft.com>;
> Simon Horman <horms@kernel.org>; Konstantin Taranov
> <kotaranov@microsoft.com>; Souradeep Chakrabarti
> <schakrabarti@linux.microsoft.com>; Erick Archer
> <erick.archer@outlook.com>; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> rdma@vger.kernel.org
> Cc: Long Li <longli@microsoft.com>; stable@vger.kernel.org
> Subject: [PATCH net] net: mana: Fix doorbell out of order violation and
> avoid unnecessary doorbell rings
>
> From: Long Li <longli@microsoft.com>
>
> After napi_complete_done() is called, another NAPI may be running on
> another CPU and ring the doorbell before the current CPU does. When
> combined with unnecessary rings when there is no need to ARM the CQ, this
> triggers error paths in the hardware.
>
> Fix this by always ring the doorbell in sequence and avoid unnecessary
> rings.
>
> Cc: stable@vger.kernel.org
> Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
> Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Thank you.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
2024-08-05 23:38 [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings longli
2024-08-06 15:56 ` Haiyang Zhang
@ 2024-08-06 15:59 ` Zhu Yanjun
2024-08-07 15:25 ` Long Li
2024-08-06 19:28 ` Dexuan Cui
2 siblings, 1 reply; 6+ messages in thread
From: Zhu Yanjun @ 2024-08-06 15:59 UTC (permalink / raw)
To: longli, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
linux-kernel, linux-rdma
Cc: stable
在 2024/8/6 7:38, longli@linuxonhyperv.com 写道:
> From: Long Li <longli@microsoft.com>
>
> After napi_complete_done() is called, another NAPI may be running on
> another CPU and ring the doorbell before the current CPU does. When
> combined with unnecessary rings when there is no need to ARM the CQ, this
> triggers error paths in the hardware.
>
> Fix this by always ring the doorbell in sequence and avoid unnecessary
> rings.
Trivial problem^_^
s/ring/ringing ?
Zhu Yanjun
>
> Cc: stable@vger.kernel.org
> Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ")
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
> drivers/net/ethernet/microsoft/mana/mana_en.c | 24 ++++++++++++-------
> include/net/mana/mana.h | 1 +
> 2 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d2f07e179e86..7d08e23c6749 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1788,7 +1788,6 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
> static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> {
> struct mana_cq *cq = context;
> - u8 arm_bit;
> int w;
>
> WARN_ON_ONCE(cq->gdma_cq != gdma_queue);
> @@ -1799,16 +1798,23 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> mana_poll_tx_cq(cq);
>
> w = cq->work_done;
> -
> - if (w < cq->budget &&
> - napi_complete_done(&cq->napi, w)) {
> - arm_bit = SET_ARM_BIT;
> - } else {
> - arm_bit = 0;
> + cq->work_done_since_doorbell += w;
> +
> + if (w < cq->budget) {
> + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> + cq->work_done_since_doorbell = 0;
> + napi_complete_done(&cq->napi, w);
> + } else if (cq->work_done_since_doorbell >
> + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
> + /* MANA hardware requires at least one doorbell ring every 8
> + * wraparounds of CQ even there is no need to ARM. This driver
> + * rings the doorbell as soon as we have execceded 4
> + * wraparounds.
> + */
> + mana_gd_ring_cq(gdma_queue, 0);
> + cq->work_done_since_doorbell = 0;
> }
>
> - mana_gd_ring_cq(gdma_queue, arm_bit);
> -
> return w;
> }
>
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> index 6439fd8b437b..7caa334f4888 100644
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -275,6 +275,7 @@ struct mana_cq {
> /* NAPI data */
> struct napi_struct napi;
> int work_done;
> + int work_done_since_doorbell;
> int budget;
> };
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
2024-08-05 23:38 [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings longli
2024-08-06 15:56 ` Haiyang Zhang
2024-08-06 15:59 ` Zhu Yanjun
@ 2024-08-06 19:28 ` Dexuan Cui
2024-08-07 15:26 ` Long Li
2 siblings, 1 reply; 6+ messages in thread
From: Dexuan Cui @ 2024-08-06 19:28 UTC (permalink / raw)
To: Long Li, KY Srinivasan, Haiyang Zhang, Wei Liu, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shradha Gupta,
Simon Horman, Konstantin Taranov, Souradeep Chakrabarti,
Erick Archer, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
Cc: stable@vger.kernel.org
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> Sent: Monday, August 5, 2024 4:38 PM
> [...]
> After napi_complete_done() is called, another NAPI may be running on
> another CPU and ring the doorbell before the current CPU does. When
Can you please share more details about "another NAPI"? Is it about busy_poll?
> combined with unnecessary rings when there is no need to ARM the CQ, this
> triggers error paths in the hardware.
>
> Fix this by always ring the doorbell in sequence and avoid unnecessary
> rings.
I'm not sure what "error paths in the hardware" means. It's better to describe
the user-visible consequence.
Maybe this is clearer:
When there is no need to arm the CQ from NAPI's perspective, the driver must
not combine "too many" arming operations due to a MANA hardware requirement:
the driver must ring the doorbell at least once within every 8 wraparounds of the CQ,
otherwise "XXX" would happen. //Dexuan: I don't know what the "XXX" is
Add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is
armed within 4 wraparounds of the CQ. //Dexuan: why not 8 or 7?
> + if (w < cq->budget) {
> + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> + cq->work_done_since_doorbell = 0;
> + napi_complete_done(&cq->napi, w);
> + } else if (cq->work_done_since_doorbell >
> + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
> + /* MANA hardware requires at least one doorbell ring every 8
s/ring every 8/arming within every 8/ ?
> + * wraparounds of CQ even there is no need to ARM. This
> driver
s/ARM/arming/ ?
s/even/even if/ ?
Thanks,
Dexuan
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
2024-08-06 15:59 ` Zhu Yanjun
@ 2024-08-07 15:25 ` Long Li
0 siblings, 0 replies; 6+ messages in thread
From: Long Li @ 2024-08-07 15:25 UTC (permalink / raw)
To: Zhu Yanjun, KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
Cc: stable@vger.kernel.org
> Subject: Re: [PATCH net] net: mana: Fix doorbell out of order violation and avoid
> unnecessary doorbell rings
>
> 在 2024/8/6 7:38, longli@linuxonhyperv.com 写道:
> > From: Long Li <longli@microsoft.com>
> >
> > After napi_complete_done() is called, another NAPI may be running on
> > another CPU and ring the doorbell before the current CPU does. When
> > combined with unnecessary rings when there is no need to ARM the CQ,
> > this triggers error paths in the hardware.
> >
> > Fix this by always ring the doorbell in sequence and avoid unnecessary
> > rings.
>
> Trivial problem^_^
>
> s/ring/ringing ?
>
> Zhu Yanjun
I'm sending v2 to fix this.
Thanks,
Long
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
2024-08-06 19:28 ` Dexuan Cui
@ 2024-08-07 15:26 ` Long Li
0 siblings, 0 replies; 6+ messages in thread
From: Long Li @ 2024-08-07 15:26 UTC (permalink / raw)
To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Wei Liu,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org
Cc: stable@vger.kernel.org
> Subject: RE: [PATCH net] net: mana: Fix doorbell out of order violation and avoid
> unnecessary doorbell rings
>
> > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> > Sent: Monday, August 5, 2024 4:38 PM
> > [...]
> > After napi_complete_done() is called, another NAPI may be running on
> > another CPU and ring the doorbell before the current CPU does. When
>
> Can you please share more details about "another NAPI"? Is it about busy_poll?
>
> > combined with unnecessary rings when there is no need to ARM the CQ,
> > this triggers error paths in the hardware.
> >
> > Fix this by always ring the doorbell in sequence and avoid unnecessary
> > rings.
>
> I'm not sure what "error paths in the hardware" means. It's better to describe the
> user-visible consequence.
>
> Maybe this is clearer:
>
> When there is no need to arm the CQ from NAPI's perspective, the driver must
> not combine "too many" arming operations due to a MANA hardware
> requirement:
> the driver must ring the doorbell at least once within every 8 wraparounds of the
> CQ, otherwise "XXX" would happen. //Dexuan: I don't know what the "XXX" is
>
> Add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is
> armed within 4 wraparounds of the CQ. //Dexuan: why not 8 or 7?
I'm sending v2 to address the details in the comments.
>
>
> > + if (w < cq->budget) {
> > + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> > + cq->work_done_since_doorbell = 0;
> > + napi_complete_done(&cq->napi, w);
> > + } else if (cq->work_done_since_doorbell >
> > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) {
> > + /* MANA hardware requires at least one doorbell ring every 8
> s/ring every 8/arming within every 8/ ?
>
> > + * wraparounds of CQ even there is no need to ARM. This
> > driver
>
> s/ARM/arming/ ?
> s/even/even if/ ?
Will fix this in v2.
Thanks,
Long
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-08-07 15:26 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-05 23:38 [PATCH net] net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings longli
2024-08-06 15:56 ` Haiyang Zhang
2024-08-06 15:59 ` Zhu Yanjun
2024-08-07 15:25 ` Long Li
2024-08-06 19:28 ` Dexuan Cui
2024-08-07 15:26 ` Long Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).