From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Roi Dayan <roid@nvidia.com>,
Amir Tzin <amirtz@nvidia.com>, Aya Levin <ayal@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>
Subject: [v2 net 07/12] net/mlx5e: Wrap the tx reporter dump callback to extract the sq
Date: Thu, 23 Dec 2021 11:04:36 -0800 [thread overview]
Message-ID: <20211223190441.153012-8-saeed@kernel.org> (raw)
In-Reply-To: <20211223190441.153012-1-saeed@kernel.org>
From: Amir Tzin <amirtz@nvidia.com>
Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
mlx5e_txqsq *, but in TX-timeout-recovery flow the argument is actually
of type struct mlx5e_tx_timeout_ctx *.
mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
[mlx5_core]
Call Trace:
mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
devlink_health_do_dump.part.91+0x71/0xd0
devlink_health_report+0x157/0x1b0
mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
[mlx5_core]
? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
? update_load_avg+0x19b/0x550
? set_next_entity+0x72/0x80
? pick_next_task_fair+0x227/0x340
? finish_task_switch+0xa2/0x280
mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
process_one_work+0x1de/0x3a0
worker_thread+0x2d/0x3c0
? process_one_work+0x3a0/0x3a0
kthread+0x115/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
--[ end trace 51ccabea504edaff ]---
RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
end Kernel panic - not syncing: Fatal exception
To fix this bug add a wrapper for mlx5e_tx_reporter_dump_sq() which
extracts the sq from struct mlx5e_tx_timeout_ctx and set it as the
TX-timeout-recovery flow dump callback.
Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 4f4bc8726ec4..614cd9477600 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -466,6 +466,14 @@ static int mlx5e_tx_reporter_dump_sq(struct mlx5e_priv *priv, struct devlink_fms
return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
}
+static int mlx5e_tx_reporter_timeout_dump(struct mlx5e_priv *priv, struct devlink_fmsg *fmsg,
+ void *ctx)
+{
+ struct mlx5e_tx_timeout_ctx *to_ctx = ctx;
+
+ return mlx5e_tx_reporter_dump_sq(priv, fmsg, to_ctx->sq);
+}
+
static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv,
struct devlink_fmsg *fmsg)
{
@@ -561,7 +569,7 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
to_ctx.sq = sq;
err_ctx.ctx = &to_ctx;
err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
- err_ctx.dump = mlx5e_tx_reporter_dump_sq;
+ err_ctx.dump = mlx5e_tx_reporter_timeout_dump;
snprintf(err_str, sizeof(err_str),
"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u",
sq->ch_ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
--
2.33.1
next prev parent reply other threads:[~2021-12-23 19:04 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-23 19:04 [pull request][v2 net 00/12] mlx5 fixes 2021-12-22 Saeed Mahameed
2021-12-23 19:04 ` [v2 net 01/12] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources Saeed Mahameed
2021-12-24 3:30 ` patchwork-bot+netdevbpf
2021-12-23 19:04 ` [v2 net 02/12] net/mlx5: DR, Fix querying eswitch manager vport for ECPF Saeed Mahameed
2021-12-23 19:04 ` [v2 net 03/12] net/mlx5: Use first online CPU instead of hard coded CPU Saeed Mahameed
2021-12-23 19:04 ` [v2 net 04/12] net/mlx5: Fix error print in case of IRQ request failed Saeed Mahameed
2021-12-23 19:04 ` [v2 net 05/12] net/mlx5: Fix SF health recovery flow Saeed Mahameed
2021-12-23 19:04 ` [v2 net 06/12] net/mlx5: Fix tc max supported prio for nic mode Saeed Mahameed
2021-12-23 19:04 ` Saeed Mahameed [this message]
2021-12-23 19:04 ` [v2 net 08/12] net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled Saeed Mahameed
2021-12-23 19:04 ` [v2 net 09/12] net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow Saeed Mahameed
2021-12-23 19:04 ` [v2 net 10/12] net/mlx5e: Fix ICOSQ recovery flow for XSK Saeed Mahameed
2021-12-23 19:04 ` [v2 net 11/12] net/mlx5e: Delete forward rule for ct or sample action Saeed Mahameed
2021-12-23 19:04 ` [v2 net 12/12] net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()' Saeed Mahameed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211223190441.153012-8-saeed@kernel.org \
--to=saeed@kernel.org \
--cc=amirtz@nvidia.com \
--cc=ayal@nvidia.com \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=roid@nvidia.com \
--cc=saeedm@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.