From: Luis Chamberlain <mcgrof@kernel.org>
To: Igor Russkikh <irusskikh@marvell.com>
Cc: jeyu@kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
	rostedt@goodmis.org, mingo@redhat.com, aquini@redhat.com,
	cai@lca.pw, dyoung@redhat.com, bhe@redhat.com,
	peterz@infradead.org, tglx@linutronix.de, gpiccoli@canonical.com,
	pmladek@suse.com, tiwai@suse.de, schlad@suse.de,
	andriy.shevchenko@linux.intel.com, keescook@chromium.org,
	daniel.vetter@ffwll.ch, will@kernel.org,
	mchehab+samsung@kernel.org, kvalo@codeaurora.org,
	davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Ariel Elior <aelior@marvell.com>,
	GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>
Subject: Re: [EXT] [PATCH 09/15] qed: use new module_firmware_crashed()
Date: Sat, 9 May 2020 16:42:29 +0000
Message-ID: <20200509164229.GJ11244@42.do-not-panic.com>
In-Reply-To: <2aaddb69-2292-ff3f-94c7-0ab9dbc8e53c@marvell.com>

On Sat, May 09, 2020 at 09:32:51AM +0300, Igor Russkikh wrote:
> 
> > This makes use of the new module_firmware_crashed() to help
> > annotate when firmware for device drivers crash. When firmware
> > crashes devices can sometimes become unresponsive, and recovery
> > sometimes requires a driver unload / reload and in the worst cases
> > a reboot.
> > 
> > Using a taint flag allows us to annotate when this happens clearly.
> > 
> > Cc: Ariel Elior <aelior@marvell.com>
> > Cc: GR-everest-linux-l2@marvell.com
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > index f4eebaabb6d0..9cc6287b889b 100644
> > --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
> > @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
> >  						 REGDUMP_HEADER_SIZE,
> >  						 &feature_size);
> >  		if (!rc) {
> > +			module_firmware_crashed();
> >  			*(u32 *)((u8 *)buffer + offset) =
> >  			    qed_calc_regdump_header(cdev, PROTECTION_OVERRIDE,
> >  						    cur_engine,
> > @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
> >  		rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset +
> >  					REGDUMP_HEADER_SIZE, &feature_size);
> >  		if (!rc) {
> > +			module_firmware_crashed();
> >  			*(u32 *)((u8 *)buffer + offset) =
> >  			    qed_calc_regdump_header(cdev, FW_ASSERTS,
> >  						    cur_engine, feature_size,
> > @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
> >  		rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +
> >  				 REGDUMP_HEADER_SIZE, &feature_size);
> >  		if (!rc) {
> > +			module_firmware_crashed();
> >  			*(u32 *)((u8 *)buffer + offset) =
> >  			    qed_calc_regdump_header(cdev, GRC_DUMP,
> >  						    cur_engine,
> 
> 
> Hi Luis,
> 
> qed_dbg_all_data is used to gather a debug dump from the device. Failures
> inside it may happen for various reasons, but they normally do not indicate
> a FW failure.
> 
> So I think it's not a good place to insert this call.
> It's hard to find an exact good place to insert it in qed.

Is there a way to check if what happened was indeed a fw crash?

> One more thing is that, AFAIU, the taint flag is permanent on the kernel,
> but for example our device can recover itself from some FW crashes, so it
> would be transparent to the user.

Similar things are *supposed* to be recoverable with other devices; however,
this can also sometimes lead to a situation where devices are not usable
anymore and require a full driver unload / load.

> What's the logical purpose of module_firmware_crashed? Does it mean a fatal
> unrecoverable error on the device?

It's just to annotate on the module and kernel that this has happened.

I take it you may agree that firmware crashing *often* is not good design,
and that these issues should be reported to / fixed by vendors. In cases
where driver bugs are reported it is good to see if a firmware crash has
happened before, so that this can be ruled out during analysis.
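
To make the "annotate, don't judge" idea concrete, here is a minimal
userspace sketch of a sticky flag recorded both per module and globally.
It is not the kernel implementation: every name in it (sketch_module,
SKETCH_TAINT_FW_CRASH, sketch_firmware_crashed() and friends) is
hypothetical, and the bit value is made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bit; not the value used by any real kernel. */
#define SKETCH_TAINT_FW_CRASH (1UL << 0)

/* Hypothetical stand-in for a loaded driver module. */
struct sketch_module {
	const char *name;
	unsigned long taints;	/* sticky per-module flags */
};

/* Hypothetical stand-in for the global, kernel-wide taint mask. */
static unsigned long sketch_kernel_taints;

/* Record that firmware crashed; log only the first occurrence. */
static void sketch_firmware_crashed(struct sketch_module *mod)
{
	if (!(mod->taints & SKETCH_TAINT_FW_CRASH))
		fprintf(stderr, "%s: firmware crash recorded\n", mod->name);

	mod->taints |= SKETCH_TAINT_FW_CRASH;
	sketch_kernel_taints |= SKETCH_TAINT_FW_CRASH;
}

/*
 * The device recovering does not clear the flag: the annotation is
 * history ("a crash happened at some point"), not current health.
 */
static void sketch_device_recovered(struct sketch_module *mod)
{
	(void)mod;	/* intentionally leaves mod->taints untouched */
}

/* What someone triaging a bug report would check first. */
static bool sketch_fw_crash_seen(const struct sketch_module *mod)
{
	return (mod->taints & SKETCH_TAINT_FW_CRASH) != 0;
}
```

In this sketch a driver would call sketch_firmware_crashed() only from a
path it knows to be a firmware failure, and sketch_fw_crash_seen() stays
true after sketch_device_recovered(), which is the "permanent" behaviour
discussed above.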

  Luis
