From: Luis Chamberlain <mcgrof@kernel.org>
To: Igor Russkikh <irusskikh@marvell.com>
Cc: jeyu@kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
rostedt@goodmis.org, mingo@redhat.com, aquini@redhat.com,
cai@lca.pw, dyoung@redhat.com, bhe@redhat.com,
peterz@infradead.org, tglx@linutronix.de, gpiccoli@canonical.com,
pmladek@suse.com, tiwai@suse.de, schlad@suse.de,
andriy.shevchenko@linux.intel.com, keescook@chromium.org,
daniel.vetter@ffwll.ch, will@kernel.org,
mchehab+samsung@kernel.org, kvalo@codeaurora.org,
davem@davemloft.net, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Ariel Elior <aelior@marvell.com>,
GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>
Subject: Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()
Date: Tue, 12 May 2020 17:34:31 +0000
Message-ID: <20200512173431.GD11244@42.do-not-panic.com>
In-Reply-To: <e10b611e-f925-f12d-bcd2-ba60d86dd8d0@marvell.com>
On Tue, May 12, 2020 at 07:23:28PM +0300, Igor Russkikh wrote:
>
> >> So I think it's not a good place to insert this call.
> >> It's hard to find an exact good place to insert it in qed.
> >
> > Is there a way to check if what happened was indeed a fw crash?
>
> Our driver has two firmwares (slowpath and fastpath).
> For slowpath firmware the way to understand it crashed is to observe command
> response timeout. This is in qed_mcp.c, around "The MFW failed to respond to
> command" traceout.
Ok thanks.
> For fastpath this is tricky; I think you may leave the above place as the
> only place to invoke module_firmware_crashed().
So do you mean like the changes below?
diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c
index f4eebaabb6d0..95cb7da2542e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
@@ -7906,6 +7906,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
 		rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +
 				 REGDUMP_HEADER_SIZE, &feature_size);
 		if (!rc) {
+			module_firmware_crashed();
 			*(u32 *)((u8 *)buffer + offset) =
 			    qed_calc_regdump_header(cdev, GRC_DUMP,
 						    cur_engine,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 280527cc0578..a818cf09dccf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -566,6 +566,7 @@ _qed_mcp_cmd_and_union(struct qed_hwfn *p_hwfn,
 		DP_NOTICE(p_hwfn,
 			  "The MFW failed to respond to command 0x%08x [param 0x%08x].\n",
 			  p_mb_params->cmd, p_mb_params->param);
+		module_firmware_crashed();
 		qed_mcp_print_cpu_info(p_hwfn, p_ptt);
 		spin_lock_bh(&p_hwfn->mcp_info->cmd_lock);
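
To make the slowpath case concrete, here is a rough userspace sketch of the pattern, not the qed code: poll for a command response a bounded number of times and record a firmware crash only when the wait times out. Every name in it (MAX_POLLS, struct fake_fw, note_firmware_crashed(), send_cmd_and_wait()) is hypothetical, and note_firmware_crashed() merely stands in for module_firmware_crashed().

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_POLLS 5	/* hypothetical retry budget, not the value qed uses */

/* Stand-in for the bookkeeping module_firmware_crashed() would do. */
static unsigned int fw_crash_events;

/* Hypothetical replacement for module_firmware_crashed(). */
static void note_firmware_crashed(void)
{
	fw_crash_events++;
}

/* Fake firmware: answers on poll number respond_at, never if negative. */
struct fake_fw {
	int respond_at;
};

static bool fw_has_responded(const struct fake_fw *fw, int poll)
{
	return fw->respond_at >= 0 && poll >= fw->respond_at;
}

/* Returns 0 if the firmware answered in time, -1 on timeout. */
static int send_cmd_and_wait(const struct fake_fw *fw)
{
	for (int poll = 0; poll < MAX_POLLS; poll++) {
		if (fw_has_responded(fw, poll))
			return 0;
	}

	/* Only the timeout path is treated as a firmware crash. */
	fprintf(stderr, "firmware failed to respond to command\n");
	note_firmware_crashed();
	return -1;
}
```

With this shape, a firmware that answers late but within the budget never triggers the annotation; only the timeout branch does, which mirrors where the hunk above places the call.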
> >> One more thing is that AFAIU taint flag gets permanent on kernel, but for
> >> example our device can recover itself from some FW crashes, thus it'd be
> >> transparent for user.
> >
> > Similar things are *supposed* to be recoverable with other devices, however
> > this can also sometimes lead to a situation where devices are not usable
> > anymore, and require a full driver unload / load.
> >
> >> What's the logical purpose of module_firmware_crashed? Does it mean fatal
> >> unrecoverable error on device?
> >
> > It's just to annotate on the module and kernel that this has happened.
> >
> > I take it you may agree that firmware crashing *often* is not good
> > design, and these issues should be reported to / fixed by vendors. In cases
> > where driver bugs are reported it is good to see if a firmware crash has
> > happened before, so that during analysis this is ruled out.
>
> Probably, but still I see some misalignment here, in the sense that taint is
> about the kernel state, not about a hardware state indication.
The kernel carries the driver though, and the driver / subsystem can
often act strangely when this happens.
> devlink health could really be a much better candidate for such things.
That sounds fantastic, please Cc me on patches! However I still believe
we should register this event in the kernel for support purposes.
Luis
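
The point debated above, that the taint stays set even when the device recovers, can be illustrated with a small userspace sketch. It is an assumption-laden model rather than the kernel's taint code: TAINT_FW_CRASH_BIT, tainted_mask, struct fake_dev and the fw_*() helpers are all hypothetical names.

```c
#include <stdbool.h>

#define TAINT_FW_CRASH_BIT (1UL << 0)	/* hypothetical bit position */

/* Sticky record of past events: bits are only ever ORed in, never cleared. */
static unsigned long tainted_mask;

/* Hypothetical device whose health can change over time. */
struct fake_dev {
	bool healthy;
};

/* A firmware crash degrades the device and records the event. */
static void fw_crash(struct fake_dev *dev)
{
	dev->healthy = false;
	tainted_mask |= TAINT_FW_CRASH_BIT;
}

/* Recovery restores the device state but leaves the record in place. */
static void fw_recover(struct fake_dev *dev)
{
	dev->healthy = true;
}

/* What someone triaging a later bug report would check. */
static bool fw_crash_seen(void)
{
	return (tainted_mask & TAINT_FW_CRASH_BIT) != 0;
}
```

After fw_crash() followed by fw_recover(), the device reports healthy again while fw_crash_seen() still returns true. That separation is the argument for keeping the annotation as a history of what happened, while something like devlink health reports the current device state.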