From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, Roi Dayan <roid@nvidia.com>,
Amir Tzin <amirtz@nvidia.com>, Aya Levin <ayal@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>
Subject: [v2 net 07/12] net/mlx5e: Wrap the tx reporter dump callback to extract the sq
Date: Thu, 23 Dec 2021 11:04:36 -0800 [thread overview]
Message-ID: <20211223190441.153012-8-saeed@kernel.org> (raw)
In-Reply-To: <20211223190441.153012-1-saeed@kernel.org>
From: Amir Tzin <amirtz@nvidia.com>
Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
mlx5e_txqsq *, but in TX-timeout-recovery flow the argument is actually
of type struct mlx5e_tx_timeout_ctx *.
mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
[mlx5_core]
Call Trace:
mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
devlink_health_do_dump.part.91+0x71/0xd0
devlink_health_report+0x157/0x1b0
mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
[mlx5_core]
? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
? update_load_avg+0x19b/0x550
? set_next_entity+0x72/0x80
? pick_next_task_fair+0x227/0x340
? finish_task_switch+0xa2/0x280
mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
process_one_work+0x1de/0x3a0
worker_thread+0x2d/0x3c0
? process_one_work+0x3a0/0x3a0
kthread+0x115/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
--[ end trace 51ccabea504edaff ]---
RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
PKRU: 55555554
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
end Kernel panic - not syncing: Fatal exception
To fix this bug add a wrapper for mlx5e_tx_reporter_dump_sq() which
extracts the sq from struct mlx5e_tx_timeout_ctx and set it as the
TX-timeout-recovery flow dump callback.
Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 4f4bc8726ec4..614cd9477600 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -466,6 +466,14 @@ static int mlx5e_tx_reporter_dump_sq(struct mlx5e_priv *priv, struct devlink_fms
return mlx5e_health_fmsg_named_obj_nest_end(fmsg);
}
+static int mlx5e_tx_reporter_timeout_dump(struct mlx5e_priv *priv, struct devlink_fmsg *fmsg,
+ void *ctx)
+{
+ struct mlx5e_tx_timeout_ctx *to_ctx = ctx;
+
+ return mlx5e_tx_reporter_dump_sq(priv, fmsg, to_ctx->sq);
+}
+
static int mlx5e_tx_reporter_dump_all_sqs(struct mlx5e_priv *priv,
struct devlink_fmsg *fmsg)
{
@@ -561,7 +569,7 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
to_ctx.sq = sq;
err_ctx.ctx = &to_ctx;
err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
- err_ctx.dump = mlx5e_tx_reporter_dump_sq;
+ err_ctx.dump = mlx5e_tx_reporter_timeout_dump;
snprintf(err_str, sizeof(err_str),
"TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u",
sq->ch_ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
--
2.33.1
next prev parent reply other threads:[~2021-12-23 19:04 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-23 19:04 [pull request][v2 net 00/12] mlx5 fixes 2021-12-22 Saeed Mahameed
2021-12-23 19:04 ` [v2 net 01/12] net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources Saeed Mahameed
2021-12-24 3:30 ` patchwork-bot+netdevbpf
2021-12-23 19:04 ` [v2 net 02/12] net/mlx5: DR, Fix querying eswitch manager vport for ECPF Saeed Mahameed
2021-12-23 19:04 ` [v2 net 03/12] net/mlx5: Use first online CPU instead of hard coded CPU Saeed Mahameed
2021-12-23 19:04 ` [v2 net 04/12] net/mlx5: Fix error print in case of IRQ request failed Saeed Mahameed
2021-12-23 19:04 ` [v2 net 05/12] net/mlx5: Fix SF health recovery flow Saeed Mahameed
2021-12-23 19:04 ` [v2 net 06/12] net/mlx5: Fix tc max supported prio for nic mode Saeed Mahameed
2021-12-23 19:04 ` Saeed Mahameed [this message]
2021-12-23 19:04 ` [v2 net 08/12] net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled Saeed Mahameed
2021-12-23 19:04 ` [v2 net 09/12] net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow Saeed Mahameed
2021-12-23 19:04 ` [v2 net 10/12] net/mlx5e: Fix ICOSQ recovery flow for XSK Saeed Mahameed
2021-12-23 19:04 ` [v2 net 11/12] net/mlx5e: Delete forward rule for ct or sample action Saeed Mahameed
2021-12-23 19:04 ` [v2 net 12/12] net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()' Saeed Mahameed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211223190441.153012-8-saeed@kernel.org \
--to=saeed@kernel.org \
--cc=amirtz@nvidia.com \
--cc=ayal@nvidia.com \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=roid@nvidia.com \
--cc=saeedm@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).