From: Luis Chamberlain <mcgrof@kernel.org>
To: jeyu@kernel.org
Cc: akpm@linux-foundation.org, arnd@arndb.de, rostedt@goodmis.org,
mingo@redhat.com, aquini@redhat.com, cai@lca.pw,
dyoung@redhat.com, bhe@redhat.com, peterz@infradead.org,
tglx@linutronix.de, gpiccoli@canonical.com, pmladek@suse.com,
tiwai@suse.de, schlad@suse.de, andriy.shevchenko@linux.intel.com,
keescook@chromium.org, daniel.vetter@ffwll.ch, will@kernel.org,
mchehab+samsung@kernel.org, kvalo@codeaurora.org,
davem@davemloft.net, linux-kernel@vger.kernel.org,
Luis Chamberlain <mcgrof@kernel.org>
Subject: [RFC] taint: add module firmware crash taint support
Date: Fri, 8 May 2020 02:14:38 +0000 [thread overview]
Message-ID: <20200508021438.4373-1-mcgrof@kernel.org> (raw)
Device driver firmware can crash, and sometimes, this can leave your
system in a state which makes the device or subsystem completely
useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
of scraping some magical words from the kernel log, which is driver
specific, is much easier. So instead provide a helper which lets drivers
annotate this.
Once this happens, scrapers can easily scrape modules taint flags.
This will taint both the kernel and respective calling module.
The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as
this fact should in no way shape or form affect lockdep. This taint
is device driver specific.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
Below is the full diff stat of manual inspection throughout the kernel
when this happens. My methodology is to just scrape for "crash" and
then study the driver a bit to see if indeed it seems like that the
firmware crashes there. In *many* cases there is even infrastructure
for this, so this is sometimes clearly obvious. Some other times it
required a bit of deciphering.
The diff stat below is what I have so far, however the patch below
only includes the drivers that start with Q, as they were a source of
inspiration for this, and to make this RFC easier to read.
If this seems sensible, I can follow up with the kernel helper first,
and then tackle each subsystem independently.
I purposely skipped review of remoteproc and virtualization. That should
require its own separate careful review and considerations.
drivers/atm/nicstar.c | 1 +
drivers/bluetooth/hci_qca.c | 1 +
drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 2 ++
drivers/gpu/drm/msm/msm_gpu.c | 1 +
drivers/media/dvb-frontends/sp8870.c | 1 +
drivers/media/pci/ttpci/av7110.c | 1 +
drivers/misc/mic/cosm/cosm_scif_server.c | 1 +
drivers/net/ethernet/8390/axnet_cs.c | 4 +++-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 1 +
drivers/net/ethernet/brocade/bna/bfa_ioc.c | 1 +
drivers/net/ethernet/cavium/liquidio/lio_main.c | 1 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 1 +
drivers/net/ethernet/ibm/ehea/ehea_main.c | 2 ++
drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++
drivers/net/ipa/ipa_modem.c | 1 +
drivers/net/wimax/i2400m/rx.c | 1 +
drivers/net/wireless/ath/ath10k/pci.c | 2 ++
drivers/net/wireless/ath/ath10k/sdio.c | 2 ++
drivers/net/wireless/ath/ath10k/snoc.c | 1 +
drivers/net/wireless/ath/ath6kl/hif.c | 1 +
.../net/wireless/broadcom/brcm80211/brcmfmac/core.c | 1 +
drivers/net/wireless/marvell/mwl8k.c | 1 +
drivers/scsi/bfa/bfa_ioc.c | 1 +
drivers/scsi/megaraid/megaraid_sas_fusion.c | 1 +
drivers/scsi/qla4xxx/ql4_mbx.c | 2 ++
drivers/staging/media/omap4iss/iss.c | 1 +
include/linux/kernel.h | 3 ++-
include/linux/module.h | 13 +++++++++++++
include/trace/events/module.h | 3 ++-
kernel/module.c | 5 +++--
kernel/panic.c | 1 +
33 files changed, 58 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c
index f4eebaabb6d0..9cc6287b889b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
@@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
REGDUMP_HEADER_SIZE,
&feature_size);
if (!rc) {
+ module_firmware_crashed();
*(u32 *)((u8 *)buffer + offset) =
qed_calc_regdump_header(cdev, PROTECTION_OVERRIDE,
cur_engine,
@@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset +
REGDUMP_HEADER_SIZE, &feature_size);
if (!rc) {
+ module_firmware_crashed();
*(u32 *)((u8 *)buffer + offset) =
qed_calc_regdump_header(cdev, FW_ASSERTS,
cur_engine, feature_size,
@@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void *buffer)
rc = qed_dbg_grc(cdev, (u8 *)buffer + offset +
REGDUMP_HEADER_SIZE, &feature_size);
if (!rc) {
+ module_firmware_crashed();
*(u32 *)((u8 *)buffer + offset) =
qed_calc_regdump_header(cdev, GRC_DUMP,
cur_engine,
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 1d941d53fdc9..6bd0f3b518b9 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1767,6 +1767,7 @@ static void ath10k_pci_fw_dump_work(struct work_struct *work)
scnprintf(guid, sizeof(guid), "n/a");
ath10k_err(ar, "firmware crashed! (guid %s)\n", guid);
+ module_firmware_crashed();
ath10k_print_driver_info(ar);
ath10k_pci_dump_registers(ar, crash_data);
ath10k_ce_dump_registers(ar, crash_data);
@@ -2837,6 +2838,7 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
if (ret) {
if (ath10k_pci_has_fw_crashed(ar)) {
ath10k_warn(ar, "firmware crashed during chip reset\n");
+ module_firmware_crashed();
ath10k_pci_fw_crashed_clear(ar);
ath10k_pci_fw_crashed_dump(ar);
}
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index e2aff2254a40..d34ad289380f 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -794,6 +794,7 @@ static int ath10k_sdio_mbox_proc_dbg_intr(struct ath10k *ar)
/* TODO: Add firmware crash handling */
ath10k_warn(ar, "firmware crashed\n");
+ module_firmware_crashed();
/* read counter to clear the interrupt, the debug error interrupt is
* counter 0.
@@ -915,6 +916,7 @@ static int ath10k_sdio_mbox_proc_cpu_intr(struct ath10k *ar)
if (cpu_int_status & MBOX_CPU_STATUS_ENABLE_ASSERT_MASK) {
ath10k_err(ar, "firmware crashed!\n");
queue_work(ar->workqueue, &ar->restart_work);
+ module_firmware_crashed();
}
return ret;
}
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 354d49b1cd45..7cfc123c345c 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -1451,6 +1451,7 @@ void ath10k_snoc_fw_crashed_dump(struct ath10k *ar)
scnprintf(guid, sizeof(guid), "n/a");
ath10k_err(ar, "firmware crashed! (guid %s)\n", guid);
+ module_firmware_crashed();
ath10k_print_driver_info(ar);
ath10k_msa_dump_memory(ar, crash_data);
mutex_unlock(&ar->dump_mutex);
diff --git a/drivers/net/wireless/ath/ath6kl/hif.c b/drivers/net/wireless/ath/ath6kl/hif.c
index d1942537ea10..cfd838607544 100644
--- a/drivers/net/wireless/ath/ath6kl/hif.c
+++ b/drivers/net/wireless/ath/ath6kl/hif.c
@@ -120,6 +120,7 @@ static int ath6kl_hif_proc_dbg_intr(struct ath6kl_device *dev)
int ret;
ath6kl_warn("firmware crashed\n");
+ module_firmware_crashed();
/*
* read counter to clear the interrupt, the debug error interrupt is
diff --git a/drivers/scsi/qla4xxx/ql4_mbx.c b/drivers/scsi/qla4xxx/ql4_mbx.c
index 02636b4785c5..39a58ba5df0a 100644
--- a/drivers/scsi/qla4xxx/ql4_mbx.c
+++ b/drivers/scsi/qla4xxx/ql4_mbx.c
@@ -1016,6 +1016,8 @@ void qla4xxx_get_crash_record(struct scsi_qla_host * ha)
dma_addr_t crash_record_dma = 0;
uint32_t crash_record_size = 0;
+ module_firmware_crashed();
+
memset(&mbox_cmd, 0, sizeof(mbox_cmd));
memset(&mbox_sts, 0, sizeof(mbox_cmd));
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 04a5885cec1b..19e1541c82c7 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -601,7 +601,8 @@ extern enum system_states {
#define TAINT_LIVEPATCH 15
#define TAINT_AUX 16
#define TAINT_RANDSTRUCT 17
-#define TAINT_FLAGS_COUNT 18
+#define TAINT_FIRMWARE_CRASH 18
+#define TAINT_FLAGS_COUNT 19
struct taint_flag {
char c_true; /* character printed when tainted */
diff --git a/include/linux/module.h b/include/linux/module.h
index 1ad393e62bef..de9156dfe6bb 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module *mod)
bool is_module_sig_enforced(void);
void set_module_sig_enforced(void);
+void add_taint_module(struct module *mod, unsigned flag,
+ enum lockdep_ok lockdep_ok);
+
+static inline void module_firmware_crashed(void)
+{
+ add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
+}
+
#else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr)
@@ -844,6 +852,11 @@ void *dereference_module_function_descriptor(struct module *mod, void *ptr)
return ptr;
}
+static inline void module_firmware_crashed(void)
+{
+ add_taint(TAINT_FIRMWARE_CRASH, val);
+}
+
#endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
diff --git a/include/trace/events/module.h b/include/trace/events/module.h
index 097485c73c01..b749ea25affd 100644
--- a/include/trace/events/module.h
+++ b/include/trace/events/module.h
@@ -26,7 +26,8 @@ struct module;
{ (1UL << TAINT_OOT_MODULE), "O" }, \
{ (1UL << TAINT_FORCED_MODULE), "F" }, \
{ (1UL << TAINT_CRAP), "C" }, \
- { (1UL << TAINT_UNSIGNED_MODULE), "E" })
+ { (1UL << TAINT_UNSIGNED_MODULE), "E" }, \
+ { (1UL << TAINT_FIRMWARE_CRASH), "Q" })
TRACE_EVENT(module_load,
diff --git a/kernel/module.c b/kernel/module.c
index d626dc31b37f..06e6e645dbd4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -325,12 +325,13 @@ static inline int strong_try_module_get(struct module *mod)
return -ENOENT;
}
-static inline void add_taint_module(struct module *mod, unsigned flag,
- enum lockdep_ok lockdep_ok)
+void add_taint_module(struct module *mod, unsigned flag,
+ enum lockdep_ok lockdep_ok)
{
add_taint(flag, lockdep_ok);
set_bit(flag, &mod->taints);
}
+EXPORT_SYMBOL_GPL(add_taint_module);
/*
* A thread that wants to hold a reference to a module only while it
diff --git a/kernel/panic.c b/kernel/panic.c
index ec6d7d788ce7..504fb926947e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -384,6 +384,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
[ TAINT_LIVEPATCH ] = { 'K', ' ', true },
[ TAINT_AUX ] = { 'X', ' ', true },
[ TAINT_RANDSTRUCT ] = { 'T', ' ', true },
+ [ TAINT_FIRMWARE_CRASH ] = { 'Q', ' ', true },
};
/**
--
2.25.1
next reply other threads:[~2020-05-08 2:14 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-08 2:14 Luis Chamberlain [this message]
2020-05-08 2:18 ` [RFC] taint: add module firmware crash taint support Luis Chamberlain
2020-05-08 5:47 ` Kees Cook
2020-05-08 6:04 ` Luis Chamberlain
2020-05-08 6:11 ` Daniel Vetter
2020-05-08 6:24 ` Luis Chamberlain
2020-05-08 10:16 ` Andy Shevchenko
2020-05-08 13:49 ` Steven Rostedt
2020-05-08 14:04 ` Rafael Aquini
2020-05-08 15:37 ` Steven Rostedt
2020-05-08 20:37 ` kbuild test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200508021438.4373-1-mcgrof@kernel.org \
--to=mcgrof@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=andriy.shevchenko@linux.intel.com \
--cc=aquini@redhat.com \
--cc=arnd@arndb.de \
--cc=bhe@redhat.com \
--cc=cai@lca.pw \
--cc=daniel.vetter@ffwll.ch \
--cc=davem@davemloft.net \
--cc=dyoung@redhat.com \
--cc=gpiccoli@canonical.com \
--cc=jeyu@kernel.org \
--cc=keescook@chromium.org \
--cc=kvalo@codeaurora.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab+samsung@kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=schlad@suse.de \
--cc=tglx@linutronix.de \
--cc=tiwai@suse.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.