From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Paul Burton <paul.burton@mips.com>,
James Hogan <jhogan@kernel.org>,
Ralf Baechle <ralf@linux-mips.org>,
Huacai Chen <chenhc@lemote.com>,
linux-mips@linux-mips.org
Subject: [PATCH 4.14 04/54] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Date: Mon, 16 Jul 2018 09:35:01 +0200
Message-ID: <20180716073451.319910961@linuxfoundation.org>
In-Reply-To: <20180716073450.534886211@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Burton <paul.burton@mips.com>
commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream.
The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.
This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:
- Deadlock due to use of synchronous IPIs with interrupts disabled,
causing the CPU that's attempting to generate the backtrace output
to hang itself.
- Not succeed in generating the desired output from remote CPUs.
- Produce warnings about this from smp_call_function_many(), for
example:
[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
[42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
[42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
[42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
[42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
[42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
[42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
[42760.669424] ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [<c7521934>] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
[42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
[42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
[42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...
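For illustration, here is a toy single-threaded C model of the hazard behind that warning. It is not kernel code: struct model_cpu and the model_* functions are hypothetical. It assumes the classic scenario. Two CPUs, both with interrupts disabled, each issue a synchronous cross-call to the other and spin until it completes. A CPU can only run an incoming IPI handler while its interrupts are enabled, so neither request is ever handled. The model gives up after max_steps, where real hardware would simply hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; names are hypothetical, not kernel structures. */
struct model_cpu {
	bool irqs_disabled;
	bool ipi_pending;  /* a cross-call request is waiting for this CPU */
	bool call_done;    /* the request this CPU sent has been completed */
};

/* A CPU can only run a pending IPI handler while its interrupts are enabled. */
static void model_service_ipi(struct model_cpu *self, struct model_cpu *sender)
{
	if (self->ipi_pending && !self->irqs_disabled) {
		self->ipi_pending = false;
		sender->call_done = true;  /* handler ran: the sender may stop waiting */
	}
}

/*
 * CPUs A and B each send a synchronous cross-call to the other, then spin
 * until their own call completes. Returns true if both calls complete within
 * max_steps, false if the model gives up (a real CPU would spin forever).
 */
static bool model_mutual_sync_call(bool irqs_disabled, int max_steps)
{
	struct model_cpu a = { .irqs_disabled = irqs_disabled };
	struct model_cpu b = { .irqs_disabled = irqs_disabled };

	assert(max_steps > 0);
	a.ipi_pending = true;  /* request from B to A */
	b.ipi_pending = true;  /* request from A to B */

	for (int step = 0; step < max_steps; step++) {
		/* While spinning, each CPU can only make progress on its own inbox. */
		model_service_ipi(&a, &b);
		model_service_ipi(&b, &a);
		if (a.call_done && b.call_done)
			return true;
	}
	return false;
}
```

With interrupts enabled both calls complete on the first step. With interrupts disabled, model_mutual_sync_call() returns false no matter how large max_steps is.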
This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU hasn't responded to a
previous backtrace IPI - i.e. if it's hung - and we print a warning to
the console in this case.
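The serialization scheme described above can be sketched in userspace C11. This is a simplified single-threaded model, not the kernel implementation. A 64-bit atomic stands in for the backtrace_csd_busy cpumask, and a plain bool array stands in for the per-CPU call_single_data_t. model_raise() and model_handle() are hypothetical analogues of raise_backtrace() and handle_backtrace(). The property being modelled is that the sender atomically tests-and-sets a CPU's bit before reusing its slot, and only the handler clears it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: at most 64 CPUs, one bit per CPU. */
#define MODEL_NR_CPUS 64

/* Stand-in for the backtrace_csd_busy cpumask. */
static _Atomic unsigned long long busy_mask;

/* Stand-in for the per-CPU call_single_data_t: is a request outstanding? */
static bool request_pending[MODEL_NR_CPUS];

/*
 * Claim CPU @cpu's slot, like cpumask_test_and_set_cpu(): returns true if
 * the bit was already set, i.e. the previous request was never handled.
 */
static bool model_test_and_set(int cpu)
{
	unsigned long long bit = 1ULL << cpu;

	return atomic_fetch_or(&busy_mask, bit) & bit;
}

/* Sender side, analogous to raise_backtrace(). Returns requests queued. */
static int model_raise(unsigned long long mask)
{
	int sent = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(mask & (1ULL << cpu)))
			continue;
		if (model_test_and_set(cpu)) {
			printf("Unable to send backtrace request to CPU%d - perhaps it hung?\n",
			       cpu);
			continue;
		}
		request_pending[cpu] = true;  /* "send" the async request */
		sent++;
	}
	return sent;
}

/* Receiver side, analogous to handle_backtrace(): do the work, then release. */
static void model_handle(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!request_pending[cpu])
		return;
	request_pending[cpu] = false;  /* the backtrace would be printed here */
	atomic_fetch_and(&busy_mask, ~(1ULL << cpu));
}
```

Raising a backtrace for a CPU whose previous request is still outstanding prints the warning and skips that CPU. Once the handler has run for it, its slot can be used again. Because the bit is only cleared by the handler, the sender never reuses a slot for a CPU that has not yet handled its previous request.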
I've marked this for stable branches as far back as v4.9, to which it
applies cleanly. Strictly speaking the faulty MIPS implementation can be
traced further back to commit 856839b76836 ("MIPS: Add
arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel
versions v3.19 through v4.8 will require further work to backport due to
the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more
trigger_*_cpu_backtrace() methods").
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.9+
Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/mips/kernel/process.c | 45 ++++++++++++++++++++++++++++++---------------
1 file changed, 30 insertions(+), 15 deletions(-)
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -29,6 +29,7 @@
 #include <linux/kallsyms.h>
 #include <linux/random.h>
 #include <linux/prctl.h>
+#include <linux/nmi.h>
 
 #include <asm/asm.h>
 #include <asm/bootinfo.h>
@@ -655,28 +656,42 @@ unsigned long arch_align_stack(unsigned
 	return sp & ALMASK;
 }
 
-static void arch_dump_stack(void *info)
-{
-	struct pt_regs *regs;
+static DEFINE_PER_CPU(call_single_data_t, backtrace_csd);
+static struct cpumask backtrace_csd_busy;
 
-	regs = get_irq_regs();
-
-	if (regs)
-		show_regs(regs);
-	else
-		dump_stack();
+static void handle_backtrace(void *info)
+{
+	nmi_cpu_backtrace(get_irq_regs());
+	cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+static void raise_backtrace(cpumask_t *mask)
 {
-	long this_cpu = get_cpu();
+	call_single_data_t *csd;
+	int cpu;
 
-	if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
-		dump_stack();
+	for_each_cpu(cpu, mask) {
+		/*
+		 * If we previously sent an IPI to the target CPU & it hasn't
+		 * cleared its bit in the busy cpumask then it didn't handle
+		 * our previous IPI & it's not safe for us to reuse the
+		 * call_single_data_t.
+		 */
+		if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) {
+			pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n",
+				cpu);
+			continue;
+		}
 
-	smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+		csd = &per_cpu(backtrace_csd, cpu);
+		csd->func = handle_backtrace;
+		smp_call_function_single_async(cpu, csd);
+	}
+}
 
-	put_cpu();
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
 }
 
 int mips_get_process_fp_mode(struct task_struct *task)
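With this change the MIPS code only supplies the "raise" step. arch_trigger_cpumask_backtrace() hands the mask, exclude_self and raise_backtrace to the generic nmi_trigger_cpumask_backtrace(). The sketch below is a hypothetical single-threaded model of that generic helper's flow as we understand it from lib/nmi_backtrace.c in kernels of this era; treat the details as an assumption. It drops the calling CPU from the mask when exclude_self is set, dumps the calling CPU directly, and passes the remaining CPUs to the arch callback. All model_* names and hung_cpus are illustrative only. In the model the callback "delivers" immediately and the bounded wait the real helper performs is omitted, so the return value is simply the set of CPUs that never reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef unsigned long long model_mask_t;
#define MODEL_NR_CPUS 64

/* Stand-in for the generic helper's "CPUs still to report" mask. */
static model_mask_t pending_mask;

/* Hypothetical set of CPUs that are hung and never handle the request. */
static model_mask_t hung_cpus;

/* Stand-in for nmi_cpu_backtrace(): print a dump and mark this CPU as done. */
static void model_cpu_backtrace(int cpu)
{
	printf("CPU%d: backtrace\n", cpu);
	pending_mask &= ~(1ULL << cpu);
}

/* Arch callback type, analogous to raise_backtrace(cpumask_t *). */
typedef void (*model_raise_fn)(model_mask_t *mask);

/* Example callback: responsive CPUs "handle the IPI" immediately. */
static void model_raise(model_mask_t *mask)
{
	model_mask_t targets = *mask;  /* snapshot: handlers clear bits in *mask */

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if ((targets & (1ULL << cpu)) && !(hung_cpus & (1ULL << cpu)))
			model_cpu_backtrace(cpu);
}

/*
 * Assumed flow of the generic helper. Returns the CPUs that never reported;
 * the real helper first waits a bounded time for this mask to drain.
 */
static model_mask_t model_trigger_backtrace(model_mask_t mask, int this_cpu,
					    bool exclude_self, model_raise_fn raise_fn)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);

	pending_mask = mask;
	if (exclude_self)
		pending_mask &= ~(1ULL << this_cpu);

	/* The calling CPU dumps its own stack rather than IPI-ing itself. */
	if (pending_mask & (1ULL << this_cpu))
		model_cpu_backtrace(this_cpu);

	/* Everything left is handed to the arch-specific callback. */
	if (pending_mask)
		raise_fn(&pending_mask);

	return pending_mask;
}
```

For example, with CPU2 marked as hung, triggering a backtrace of CPUs 0-3 from CPU0 with exclude_self set dumps CPUs 1 and 3 and leaves only CPU2 outstanding. Keeping exclude_self handling and the local dump in generic code is what lets the arch callback stay as small as raise_backtrace().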