public inbox for linux-kernel@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Hendrik Brueckner <brueckner@linux.ibm.com>,
	Jiri Olsa <jolsa@redhat.com>, Kees Cook <keescook@chromium.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.0 74/79] perf/core: Fix perf_event_disable_inatomic() race
Date: Fri, 26 Apr 2019 21:38:33 -0400
Message-ID: <20190427013838.6596-74-sashal@kernel.org>
In-Reply-To: <20190427013838.6596-1-sashal@kernel.org>

From: Peter Zijlstra <peterz@infradead.org>

[ Upstream commit 1d54ad944074010609562da5c89e4f5df2f4e5db ]

Thomas-Mich Richter reported that he triggered a WARN() from
event_function_local() on his s390 system. The problem boils down to:

	CPU-A				CPU-B

	perf_event_overflow()
	  perf_event_disable_inatomic()
	    @pending_disable = 1
	    irq_work_queue();

	sched-out
	  event_sched_out()
	    @pending_disable = 0

					sched-in
					perf_event_overflow()
					  perf_event_disable_inatomic()
					    @pending_disable = 1;
					    irq_work_queue(); // FAILS

	irq_work_run()
	  perf_pending_event()
	    if (@pending_disable)
	      perf_event_disable_local(); // WHOOPS

The problem exists in generic code, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from its PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.

Adding an irq_work_sync() to event_sched_in() would work for all hardware
PMUs that properly use irq_work_run() but fails for software PMUs.

Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
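
The encoding described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code:
struct toy_event and every toy_*() helper are hypothetical stand-ins, and
the model assumes a single queued flag is enough to represent irq_work
refusing to queue an entry that is already pending.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical stand-in for the relevant parts of struct perf_event. */
struct toy_event {
	int pending_disable;	/* NO_CPU, or the CPU that requested the disable */
	bool work_queued;	/* models the irq_work entry being pending */
	int work_cpu;		/* CPU whose queue currently holds the work */
	int disabled_on;	/* CPU that performed the disable, NO_CPU if none */
};

static void toy_event_init(struct toy_event *e)
{
	e->pending_disable = NO_CPU;
	e->work_queued = false;
	e->work_cpu = NO_CPU;
	e->disabled_on = NO_CPU;
}

/* Models irq_work_queue{,_on}(): fails if the work is already queued. */
static bool toy_queue(struct toy_event *e, int cpu)
{
	if (e->work_queued)
		return false;
	e->work_queued = true;
	e->work_cpu = cpu;
	return true;
}

/* Models perf_event_disable_inatomic() running on @cpu. */
static bool toy_disable_inatomic(struct toy_event *e, int cpu)
{
	assert(cpu >= 0);
	e->pending_disable = cpu;	/* record who asked */
	return toy_queue(e, cpu);	/* can fail, like irq_work_queue() */
}

/* Models the pending_disable handling in event_sched_out(). */
static void toy_sched_out(struct toy_event *e)
{
	if (e->pending_disable >= 0)
		e->pending_disable = NO_CPU;
}

/*
 * Models the queued work running on @cpu and calling
 * perf_pending_event_disable(). Returns the CPU the work was redirected
 * to, or NO_CPU if nothing was redirected.
 */
static int toy_run_pending(struct toy_event *e, int cpu)
{
	int target = e->pending_disable;

	assert(e->work_queued && e->work_cpu == cpu);
	e->work_queued = false;		/* assumed: cleared before the callback */

	if (target < 0)
		return NO_CPU;		/* nothing left to disable */

	if (target == cpu) {
		e->pending_disable = NO_CPU;
		e->disabled_on = cpu;	/* stands in for the local disable */
		return NO_CPU;
	}

	toy_queue(e, target);		/* make up for the failed queue */
	return target;
}
```

Replaying the scenario from the diagram in this model: CPU 0 requests the
disable and queues the work, the event is scheduled out, CPU 1 requests
the disable and its queue attempt fails. When the work finally runs on
CPU 0 it sees a request from CPU 1 and redirects the work there instead of
disabling locally, and the second run on CPU 1 performs the disable.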

Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/events/core.c        | 52 ++++++++++++++++++++++++++++++-------
 kernel/events/ring_buffer.c |  4 +--
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2e2305a81047..124e1e3d06b9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2007,8 +2007,8 @@ event_sched_out(struct perf_event *event,
 	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
+	if (READ_ONCE(event->pending_disable) >= 0) {
+		WRITE_ONCE(event->pending_disable, -1);
 		state = PERF_EVENT_STATE_OFF;
 	}
 	perf_event_set_state(event, state);
@@ -2196,7 +2196,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
 
 void perf_event_disable_inatomic(struct perf_event *event)
 {
-	event->pending_disable = 1;
+	WRITE_ONCE(event->pending_disable, smp_processor_id());
+	/* can fail, see perf_pending_event_disable() */
 	irq_work_queue(&event->pending);
 }
 
@@ -5803,10 +5804,45 @@ void perf_event_wakeup(struct perf_event *event)
 	}
 }
 
+static void perf_pending_event_disable(struct perf_event *event)
+{
+	int cpu = READ_ONCE(event->pending_disable);
+
+	if (cpu < 0)
+		return;
+
+	if (cpu == smp_processor_id()) {
+		WRITE_ONCE(event->pending_disable, -1);
+		perf_event_disable_local(event);
+		return;
+	}
+
+	/*
+	 *  CPU-A			CPU-B
+	 *
+	 *  perf_event_disable_inatomic()
+	 *    @pending_disable = CPU-A;
+	 *    irq_work_queue();
+	 *
+	 *  sched-out
+	 *    @pending_disable = -1;
+	 *
+	 *				sched-in
+	 *				perf_event_disable_inatomic()
+	 *				  @pending_disable = CPU-B;
+	 *				  irq_work_queue(); // FAILS
+	 *
+	 *  irq_work_run()
+	 *    perf_pending_event()
+	 *
+	 * But the event runs on CPU-B and wants disabling there.
+	 */
+	irq_work_queue_on(&event->pending, cpu);
+}
+
 static void perf_pending_event(struct irq_work *entry)
 {
-	struct perf_event *event = container_of(entry,
-			struct perf_event, pending);
+	struct perf_event *event = container_of(entry, struct perf_event, pending);
 	int rctx;
 
 	rctx = perf_swevent_get_recursion_context();
@@ -5815,10 +5851,7 @@ static void perf_pending_event(struct irq_work *entry)
 	 * and we won't recurse 'further'.
 	 */
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
-		perf_event_disable_local(event);
-	}
+	perf_pending_event_disable(event);
 
 	if (event->pending_wakeup) {
 		event->pending_wakeup = 0;
@@ -9998,6 +10031,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
 
 	init_waitqueue_head(&event->waitq);
+	event->pending_disable = -1;
 	init_irq_work(&event->pending, perf_pending_event);
 
 	mutex_init(&event->mmap_mutex);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 878c62ec0190..015a9f3b893e 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -393,7 +393,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 		 * store that will be enabled on successful return
 		 */
 		if (!handle->size) { /* A, matches D */
-			event->pending_disable = 1;
+			event->pending_disable = smp_processor_id();
 			perf_output_wakeup(handle);
 			local_set(&rb->aux_nest, 0);
 			goto err_put;
@@ -481,7 +481,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
 
 	if (wakeup) {
 		if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)
-			handle->event->pending_disable = 1;
+			handle->event->pending_disable = smp_processor_id();
 		perf_output_wakeup(handle);
 	}
 
-- 
2.19.1

