From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Hendrik Brueckner <brueckner@linux.ibm.com>,
Jiri Olsa <jolsa@redhat.com>, Kees Cook <keescook@chromium.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 4.19 49/53] perf/core: Fix perf_event_disable_inatomic() race
Date: Fri, 26 Apr 2019 21:40:46 -0400
Message-ID: <20190427014051.7522-49-sashal@kernel.org>
In-Reply-To: <20190427014051.7522-1-sashal@kernel.org>
From: Peter Zijlstra <peterz@infradead.org>
[ Upstream commit 1d54ad944074010609562da5c89e4f5df2f4e5db ]
Thomas-Mich Richter reported he triggered a WARN()ing from event_function_local()
on his s390. The problem boils down to:
    CPU-A                                 CPU-B

    perf_event_overflow()
      perf_event_disable_inatomic()
        @pending_disable = 1
        irq_work_queue();

    sched-out
      event_sched_out()
        @pending_disable = 0

                                          sched-in
                                          perf_event_overflow()
                                            perf_event_disable_inatomic()
                                              @pending_disable = 1;
                                              irq_work_queue(); // FAILS

    irq_work_run()
      perf_pending_event()
        if (@pending_disable)
          perf_event_disable_local(); // WHOOPS
The problem exists in generic code, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from its PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.
Adding an irq_work_sync() to event_sched_in() would work for all hardware
PMUs that properly use irq_work_run() but fails for software PMUs.
Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
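As a rough illustration of the encoding described above, here is a
single-threaded userspace C sketch that replays the race by hand. It is
not kernel code: every model_* name is hypothetical, plain loads and
stores stand in for READ_ONCE()/WRITE_ONCE(), and the "redirect" only
records the target CPU instead of calling irq_work_queue_on().

```c
#include <assert.h>

/* Hypothetical userspace model of the scheme; not the kernel implementation. */
#define MODEL_NO_PENDING (-1)

enum model_action {
	MODEL_NONE,            /* nothing was pending */
	MODEL_DISABLED_LOCAL,  /* disable ran on the CPU that requested it */
	MODEL_REDIRECTED,      /* request belongs to another CPU; work re-queued there */
};

struct model_event {
	int pending_disable;   /* -1: nothing pending, otherwise the requesting CPU */
	int disabled_on;       /* CPU where the disable finally ran, or -1 */
	int redirected_to;     /* CPU the work was redirected to, or -1 */
};

static void model_init(struct model_event *e)
{
	e->pending_disable = MODEL_NO_PENDING;
	e->disabled_on = -1;
	e->redirected_to = -1;
}

/* Stands in for perf_event_disable_inatomic(): record who asked. */
static void model_request_disable(struct model_event *e, int cpu)
{
	assert(cpu >= 0);      /* negative values are reserved for "nothing pending" */
	e->pending_disable = cpu;
}

/* Stands in for the pending_disable handling in event_sched_out(). */
static void model_sched_out(struct model_event *e)
{
	if (e->pending_disable >= 0)
		e->pending_disable = MODEL_NO_PENDING;
}

/* Stands in for perf_pending_event_disable() running on this_cpu. */
static enum model_action model_pending_disable(struct model_event *e, int this_cpu)
{
	int cpu = e->pending_disable;

	if (cpu < 0)
		return MODEL_NONE;

	if (cpu == this_cpu) {
		e->pending_disable = MODEL_NO_PENDING;
		e->disabled_on = this_cpu;
		return MODEL_DISABLED_LOCAL;
	}

	/* Another CPU made the request: hand the work over instead of acting here. */
	e->redirected_to = cpu;
	return MODEL_REDIRECTED;
}
```

Replaying the sequence from the diagram (request on CPU 0, sched-out,
request on CPU 1, stale work running on CPU 0) makes the model report a
redirect to CPU 1 rather than a local disable on CPU 0, which is the
behaviour the patch aims for.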
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/events/core.c | 52 ++++++++++++++++++++++++++++++-------
kernel/events/ring_buffer.c | 4 +--
2 files changed, 45 insertions(+), 11 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 87bd96399d1c..171b83ebed4a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2007,8 +2007,8 @@ event_sched_out(struct perf_event *event,
event->pmu->del(event, 0);
event->oncpu = -1;
- if (event->pending_disable) {
- event->pending_disable = 0;
+ if (READ_ONCE(event->pending_disable) >= 0) {
+ WRITE_ONCE(event->pending_disable, -1);
state = PERF_EVENT_STATE_OFF;
}
perf_event_set_state(event, state);
@@ -2196,7 +2196,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
void perf_event_disable_inatomic(struct perf_event *event)
{
- event->pending_disable = 1;
+ WRITE_ONCE(event->pending_disable, smp_processor_id());
+ /* can fail, see perf_pending_event_disable() */
irq_work_queue(&event->pending);
}
@@ -5803,10 +5804,45 @@ void perf_event_wakeup(struct perf_event *event)
}
}
+static void perf_pending_event_disable(struct perf_event *event)
+{
+ int cpu = READ_ONCE(event->pending_disable);
+
+ if (cpu < 0)
+ return;
+
+ if (cpu == smp_processor_id()) {
+ WRITE_ONCE(event->pending_disable, -1);
+ perf_event_disable_local(event);
+ return;
+ }
+
+ /*
+ * CPU-A CPU-B
+ *
+ * perf_event_disable_inatomic()
+ * @pending_disable = CPU-A;
+ * irq_work_queue();
+ *
+ * sched-out
+ * @pending_disable = -1;
+ *
+ * sched-in
+ * perf_event_disable_inatomic()
+ * @pending_disable = CPU-B;
+ * irq_work_queue(); // FAILS
+ *
+ * irq_work_run()
+ * perf_pending_event()
+ *
+ * But the event runs on CPU-B and wants disabling there.
+ */
+ irq_work_queue_on(&event->pending, cpu);
+}
+
static void perf_pending_event(struct irq_work *entry)
{
- struct perf_event *event = container_of(entry,
- struct perf_event, pending);
+ struct perf_event *event = container_of(entry, struct perf_event, pending);
int rctx;
rctx = perf_swevent_get_recursion_context();
@@ -5815,10 +5851,7 @@ static void perf_pending_event(struct irq_work *entry)
* and we won't recurse 'further'.
*/
- if (event->pending_disable) {
- event->pending_disable = 0;
- perf_event_disable_local(event);
- }
+ perf_pending_event_disable(event);
if (event->pending_wakeup) {
event->pending_wakeup = 0;
@@ -9969,6 +10002,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
init_waitqueue_head(&event->waitq);
+ event->pending_disable = -1;
init_irq_work(&event->pending, perf_pending_event);
mutex_init(&event->mmap_mutex);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 474b2ccdbe69..99c7f199f2d4 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -393,7 +393,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
* store that will be enabled on successful return
*/
if (!handle->size) { /* A, matches D */
- event->pending_disable = 1;
+ event->pending_disable = smp_processor_id();
perf_output_wakeup(handle);
local_set(&rb->aux_nest, 0);
goto err_put;
@@ -471,7 +471,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
if (wakeup) {
if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)
- handle->event->pending_disable = 1;
+ handle->event->pending_disable = smp_processor_id();
perf_output_wakeup(handle);
}
--
2.19.1