From: saeed@kernel.org
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Maxim Mikityanskiy <maximmi@mellanox.com>,
Tariq Toukan <tariqt@mellanox.com>,
Saeed Mahameed <saeedm@mellanox.com>,
Saeed Mahameed <saeedm@nvidia.com>
Subject: [net V2 03/15] net/mlx5e: Use synchronize_rcu to sync with NAPI
Date: Mon, 21 Sep 2020 17:30:49 -0700 [thread overview]
Message-ID: <20200922003101.529117-4-saeed@kernel.org> (raw)
In-Reply-To: <20200922003101.529117-1-saeed@kernel.org>
From: Maxim Mikityanskiy <maximmi@mellanox.com>
As described in the previous commit, napi_synchronize doesn't quite fit
the purpose when we just need to wait until the currently running NAPI
quits. Its implementation waits until NAPI is not running by polling and
waiting for 1ms in between. In cases where we need to deactivate one
queue (e.g., recovery flows) or where we deactivate them one-by-one
(deactivate channel flow), we may get stuck in napi_synchronize forever
if other queues keep NAPI active, causing a soft lockup. Depending on
kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result
in a kernel panic.
To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap
the whole NAPI in rcu_read_lock.
Fixes: acc6c5953af1 ("net/mlx5e: Split open/close channels to stages")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 14 ++------------
.../ethernet/mellanox/mlx5/core/en/xsk/setup.c | 3 +--
.../net/ethernet/mellanox/mlx5/core/en_main.c | 12 ++++--------
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 ++----------
.../net/ethernet/mellanox/mlx5/core/en_txrx.c | 17 +++++++++++++----
5 files changed, 22 insertions(+), 36 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index a33a1f762c70..40db27bf790b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -31,7 +31,6 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
{
struct xdp_buff *xdp = wi->umr.dma_info[page_idx].xsk;
u32 cqe_bcnt32 = cqe_bcnt;
- bool consumed;
/* Check packet size. Note LRO doesn't use linear SKB */
if (unlikely(cqe_bcnt > rq->hw_mtu)) {
@@ -51,10 +50,6 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
xsk_buff_dma_sync_for_cpu(xdp);
prefetch(xdp->data);
- rcu_read_lock();
- consumed = mlx5e_xdp_handle(rq, NULL, &cqe_bcnt32, xdp);
- rcu_read_unlock();
-
/* Possible flows:
* - XDP_REDIRECT to XSKMAP:
* The page is owned by the userspace from now.
@@ -70,7 +65,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
* allocated first from the Reuse Ring, so it has enough space.
*/
- if (likely(consumed)) {
+ if (likely(mlx5e_xdp_handle(rq, NULL, &cqe_bcnt32, xdp))) {
if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)))
__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
return NULL; /* page/packet was consumed by XDP */
@@ -88,7 +83,6 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
u32 cqe_bcnt)
{
struct xdp_buff *xdp = wi->di->xsk;
- bool consumed;
/* wi->offset is not used in this function, because xdp->data and the
* DMA address point directly to the necessary place. Furthermore, the
@@ -107,11 +101,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
return NULL;
}
- rcu_read_lock();
- consumed = mlx5e_xdp_handle(rq, NULL, &cqe_bcnt, xdp);
- rcu_read_unlock();
-
- if (likely(consumed))
+ if (likely(mlx5e_xdp_handle(rq, NULL, &cqe_bcnt, xdp)))
return NULL; /* page/packet was consumed by XDP */
/* XDP_PASS: copy the data from the UMEM to a new SKB. The frame reuse
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
index dd9df519d383..55e65a438de7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
@@ -106,8 +106,7 @@ int mlx5e_open_xsk(struct mlx5e_priv *priv, struct mlx5e_params *params,
void mlx5e_close_xsk(struct mlx5e_channel *c)
{
clear_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
- napi_synchronize(&c->napi);
- synchronize_rcu(); /* Sync with the XSK wakeup. */
+ synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
mlx5e_close_rq(&c->xskrq);
mlx5e_close_cq(&c->xskrq.cq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bde97b108db5..917c28e7f29e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -873,7 +873,7 @@ void mlx5e_activate_rq(struct mlx5e_rq *rq)
void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
{
clear_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
- napi_synchronize(&rq->channel->napi); /* prevent mlx5e_post_rx_wqes */
+ synchronize_rcu(); /* Sync with NAPI to prevent mlx5e_post_rx_wqes. */
}
void mlx5e_close_rq(struct mlx5e_rq *rq)
@@ -1318,12 +1318,10 @@ void mlx5e_tx_disable_queue(struct netdev_queue *txq)
static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
{
- struct mlx5e_channel *c = sq->channel;
struct mlx5_wq_cyc *wq = &sq->wq;
clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
- /* prevent netif_tx_wake_queue */
- napi_synchronize(&c->napi);
+ synchronize_rcu(); /* Sync with NAPI to prevent netif_tx_wake_queue. */
mlx5e_tx_disable_queue(sq->txq);
@@ -1398,10 +1396,8 @@ void mlx5e_activate_icosq(struct mlx5e_icosq *icosq)
void mlx5e_deactivate_icosq(struct mlx5e_icosq *icosq)
{
- struct mlx5e_channel *c = icosq->channel;
-
clear_bit(MLX5E_SQ_STATE_ENABLED, &icosq->state);
- napi_synchronize(&c->napi);
+ synchronize_rcu(); /* Sync with NAPI. */
}
void mlx5e_close_icosq(struct mlx5e_icosq *sq)
@@ -1480,7 +1476,7 @@ void mlx5e_close_xdpsq(struct mlx5e_xdpsq *sq)
struct mlx5e_channel *c = sq->channel;
clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
- napi_synchronize(&c->napi);
+ synchronize_rcu(); /* Sync with NAPI. */
mlx5e_destroy_sq(c->mdev, sq->sqn);
mlx5e_free_xdpsq_descs(sq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 65828af120b7..99d102c035b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1132,7 +1132,6 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
struct xdp_buff xdp;
struct sk_buff *skb;
void *va, *data;
- bool consumed;
u32 frag_size;
va = page_address(di->page) + wi->offset;
@@ -1144,11 +1143,8 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
prefetchw(va); /* xdp_frame data area */
prefetch(data);
- rcu_read_lock();
mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
- consumed = mlx5e_xdp_handle(rq, di, &cqe_bcnt, &xdp);
- rcu_read_unlock();
- if (consumed)
+ if (mlx5e_xdp_handle(rq, di, &cqe_bcnt, &xdp))
return NULL; /* page/packet was consumed by XDP */
rx_headroom = xdp.data - xdp.data_hard_start;
@@ -1438,7 +1434,6 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
struct sk_buff *skb;
void *va, *data;
u32 frag_size;
- bool consumed;
/* Check packet size. Note LRO doesn't use linear SKB */
if (unlikely(cqe_bcnt > rq->hw_mtu)) {
@@ -1455,11 +1450,8 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
prefetchw(va); /* xdp_frame data area */
prefetch(data);
- rcu_read_lock();
mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt32, &xdp);
- consumed = mlx5e_xdp_handle(rq, di, &cqe_bcnt32, &xdp);
- rcu_read_unlock();
- if (consumed) {
+ if (mlx5e_xdp_handle(rq, di, &cqe_bcnt32, &xdp)) {
if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))
__set_bit(page_idx, wi->xdp_xmit_bitmap); /* non-atomic */
return NULL; /* page/packet was consumed by XDP */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index de10b06bade5..d5868670f8a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -121,13 +121,17 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
struct mlx5e_xdpsq *xsksq = &c->xsksq;
struct mlx5e_rq *xskrq = &c->xskrq;
struct mlx5e_rq *rq = &c->rq;
- bool xsk_open = test_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
bool aff_change = false;
bool busy_xsk = false;
bool busy = false;
int work_done = 0;
+ bool xsk_open;
int i;
+ rcu_read_lock();
+
+ xsk_open = test_bit(MLX5E_CHANNEL_STATE_XSK, c->state);
+
ch_stats->poll++;
for (i = 0; i < c->num_tc; i++)
@@ -167,8 +171,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
busy |= busy_xsk;
if (busy) {
- if (likely(mlx5e_channel_no_affinity_change(c)))
- return budget;
+ if (likely(mlx5e_channel_no_affinity_change(c))) {
+ work_done = budget;
+ goto out;
+ }
ch_stats->aff_change++;
aff_change = true;
if (budget && work_done == budget)
@@ -176,7 +182,7 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
}
if (unlikely(!napi_complete_done(napi, work_done)))
- return work_done;
+ goto out;
ch_stats->arm++;
@@ -203,6 +209,9 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
ch_stats->force_irq++;
}
+out:
+ rcu_read_unlock();
+
return work_done;
}
--
2.26.2
next prev parent reply other threads:[~2020-09-22 0:31 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-22 0:30 [pull request][net V2 00/15] mlx5 fixes-2020-09-18 saeed
2020-09-22 0:30 ` [net V2 01/15] net/mlx5: Fix FTE cleanup saeed
2020-09-22 0:30 ` [net V2 02/15] net/mlx5e: Use RCU to protect rq->xdp_prog saeed
2020-09-22 0:30 ` saeed [this message]
2020-09-22 0:30 ` [net V2 04/15] net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready saeed
2020-09-22 0:30 ` [net V2 05/15] net/mlx5e: CT: Fix freeing ct_label mapping saeed
2020-09-22 0:30 ` [net V2 06/15] net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported saeed
2020-09-22 0:30 ` [net V2 07/15] net/mlx5e: Fix endianness when calculating pedit mask first bit saeed
2020-09-22 0:30 ` [net V2 08/15] net/mlx5e: Fix multicast counter not up-to-date in "ip -s" saeed
2020-09-22 0:30 ` [net V2 09/15] net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats() saeed
2020-09-22 0:30 ` [net V2 10/15] net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported saeed
2020-09-22 0:30 ` [net V2 11/15] net/mlx5e: kTLS, Fix napi sync and possible use-after-free saeed
2020-09-22 0:30 ` [net V2 12/15] net/mlx5e: kTLS, Add missing dma_unmap in RX resync saeed
2020-09-22 0:30 ` [net V2 13/15] net/mlx5e: kTLS, Fix leak on resync error flow saeed
2020-09-22 0:31 ` [net V2 14/15] net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock saeed
2020-09-22 0:31 ` [net V2 15/15] net/mlx5e: mlx5e_fec_in_caps() returns a boolean saeed
2020-09-22 0:36 ` [pull request][net V2 00/15] mlx5 fixes-2020-09-18 David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200922003101.529117-4-saeed@kernel.org \
--to=saeed@kernel.org \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=maximmi@mellanox.com \
--cc=netdev@vger.kernel.org \
--cc=saeedm@mellanox.com \
--cc=saeedm@nvidia.com \
--cc=tariqt@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).