All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] perf/core: Fix cgroup event list management
@ 2021-12-13  6:59 Namhyung Kim
  2021-12-13 20:30 ` Peter Zijlstra
  2021-12-13 21:08 ` Peter Zijlstra
  0 siblings, 2 replies; 4+ messages in thread
From: Namhyung Kim @ 2021-12-13  6:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, kernel test robot, Marco Elver

The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is accessed from current cpu and not protected by any locks.
But from the commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), this assumption does not hold true anymore.

In the perf_remove_from_context(), it can remove an event from the
context without an IPI when the context is not active.  I think it
assumes task event context, but it's possible for cpu event context
only with cgroup events can be inactive at the moment - and it might
become active soon.

If the event is enabled when it's about to be closed, it might call
perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
on a different cpu.

This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.

The following program can crash my box easily..

  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <linux/perf_event.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>

  //#define CGROUP_ROOT  "/dev/cgroup/devices"
  #define CGROUP_ROOT  "/sys/fs/cgroup"

  int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
                      int grp, unsigned long flags)
  {
    return syscall(SYS_perf_event_open, attr, pid, cpu, grp, flags);
  }

  int get_cgroup_fd(const char *grp)
  {
    char buf[128];

    snprintf(buf, sizeof(buf), "%s/%s", CGROUP_ROOT, grp);

    /* ignore failures */
    mkdir(buf, 0755);

    return open(buf, O_RDONLY);
  }

  int main(int argc, char *argv[])
  {
    struct perf_event_attr hw = {
      .type = PERF_TYPE_HARDWARE,
      .config = PERF_COUNT_HW_CPU_CYCLES,
    };
    struct perf_event_attr sw = {
      .type = PERF_TYPE_SOFTWARE,
      .config = PERF_COUNT_SW_CPU_CLOCK,
    };
    int cpus = sysconf(_SC_NPROCESSORS_ONLN);
    int fd[4][cpus];
    int cgrpA, cgrpB;

    cgrpA = get_cgroup_fd("A");
    cgrpB = get_cgroup_fd("B");
    if (cgrpA < 0 || cgrpB < 0) {
      printf("failed to get cgroup fd\n");
      return 1;
    }

    while (1) {
      int i;

      for (i = 0; i < cpus; i++) {
        fd[0][i] = perf_event_open(&hw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
        fd[1][i] = perf_event_open(&sw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
        fd[2][i] = perf_event_open(&hw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
        fd[3][i] = perf_event_open(&sw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
      }

      for (i = 0; i < cpus; i++) {
        close(fd[3][i]);
        close(fd[2][i]);
        close(fd[1][i]);
        close(fd[0][i]);
      }
    }
    return 0;
  }

Let's use IPI to prevent such crashes.

Similarly, I think perf_install_in_context() should use IPI for the
cgroup events too.

Reported-by: kernel test robot <lkp@intel.com>  # for build error
Cc: Marco Elver <elver@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
v2) simply use IPI for cgroup events

 kernel/events/core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30d94f68c5bd..9460c083acd9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
 	 * event_function_call() user.
 	 */
 	raw_spin_lock_irq(&ctx->lock);
-	if (!ctx->is_active) {
+	if (!ctx->is_active && !is_cgroup_event(event)) {
 		__perf_remove_from_context(event, __get_cpu_context(ctx),
 					   ctx, (void *)flags);
 		raw_spin_unlock_irq(&ctx->lock);
@@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
 	 * perf_event_attr::disabled events will not run and can be initialized
 	 * without IPI. Except when this is the first event for the context, in
 	 * that case we need the magic of the IPI to set ctx->is_active.
+	 * Similarly, cgroup events for the context also needs the IPI to
+	 * manipulate the cgrp_cpuctx_list.
 	 *
 	 * The IOC_ENABLE that is sure to follow the creation of a disabled
 	 * event will issue the IPI and reprogram the hardware.
 	 */
-	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+	if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+	    ctx->nr_events && !is_cgroup_event(event)) {
 		raw_spin_lock_irq(&ctx->lock);
 		if (ctx->task == TASK_TOMBSTONE) {
 			raw_spin_unlock_irq(&ctx->lock);

base-commit: 73743c3b092277febbf69b250ce8ebbca0525aa2
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-13 21:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-13  6:59 [PATCH v2] perf/core: Fix cgroup event list management Namhyung Kim
2021-12-13 20:30 ` Peter Zijlstra
2021-12-13 21:10   ` Namhyung Kim
2021-12-13 21:08 ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.