* [PATCH v2] perf/core: Fix cgroup event list management
@ 2021-12-13 6:59 Namhyung Kim
2021-12-13 20:30 ` Peter Zijlstra
2021-12-13 21:08 ` Peter Zijlstra
0 siblings, 2 replies; 4+ messages in thread
From: Namhyung Kim @ 2021-12-13 6:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
Ian Rogers, kernel test robot, Marco Elver
The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is accessed from current cpu and not protected by any locks.
But from the commit ef54c1a476ae ("perf: Rework
perf_event_exit_event()"), this assumption does not hold true anymore.
In the perf_remove_from_context(), it can remove an event from the
context without an IPI when the context is not active. I think it
assumes task event context, but it's possible for cpu event context
only with cgroup events can be inactive at the moment - and it might
become active soon.
If the event is enabled when it's about to be closed, it might call
perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
on a different cpu.
This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.
The following program can crash my box easily..
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/stat.h>
#include <sys/syscall.h>
//#define CGROUP_ROOT "/dev/cgroup/devices"
#define CGROUP_ROOT "/sys/fs/cgroup"
int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
int grp, unsigned long flags)
{
return syscall(SYS_perf_event_open, attr, pid, cpu, grp, flags);
}
int get_cgroup_fd(const char *grp)
{
char buf[128];
snprintf(buf, sizeof(buf), "%s/%s", CGROUP_ROOT, grp);
/* ignore failures */
mkdir(buf, 0755);
return open(buf, O_RDONLY);
}
int main(int argc, char *argv[])
{
struct perf_event_attr hw = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
};
struct perf_event_attr sw = {
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_CPU_CLOCK,
};
int cpus = sysconf(_SC_NPROCESSORS_ONLN);
int fd[4][cpus];
int cgrpA, cgrpB;
cgrpA = get_cgroup_fd("A");
cgrpB = get_cgroup_fd("B");
if (cgrpA < 0 || cgrpB < 0) {
printf("failed to get cgroup fd\n");
return 1;
}
while (1) {
int i;
for (i = 0; i < cpus; i++) {
fd[0][i] = perf_event_open(&hw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
fd[1][i] = perf_event_open(&sw, cgrpA, i, -1, PERF_FLAG_PID_CGROUP);
fd[2][i] = perf_event_open(&hw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
fd[3][i] = perf_event_open(&sw, cgrpB, i, -1, PERF_FLAG_PID_CGROUP);
}
for (i = 0; i < cpus; i++) {
close(fd[3][i]);
close(fd[2][i]);
close(fd[1][i]);
close(fd[0][i]);
}
}
return 0;
}
Let's use IPI to prevent such crashes.
Similarly, I think perf_install_in_context() should use IPI for the
cgroup events too.
Reported-by: kernel test robot <lkp@intel.com> # for build error
Cc: Marco Elver <elver@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
v2) simply use IPI for cgroup events
kernel/events/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30d94f68c5bd..9460c083acd9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
* event_function_call() user.
*/
raw_spin_lock_irq(&ctx->lock);
- if (!ctx->is_active) {
+ if (!ctx->is_active && !is_cgroup_event(event)) {
__perf_remove_from_context(event, __get_cpu_context(ctx),
ctx, (void *)flags);
raw_spin_unlock_irq(&ctx->lock);
@@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
* perf_event_attr::disabled events will not run and can be initialized
* without IPI. Except when this is the first event for the context, in
* that case we need the magic of the IPI to set ctx->is_active.
+ * Similarly, cgroup events for the context also needs the IPI to
+ * manipulate the cgrp_cpuctx_list.
*
* The IOC_ENABLE that is sure to follow the creation of a disabled
* event will issue the IPI and reprogram the hardware.
*/
- if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
+ if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
+ ctx->nr_events && !is_cgroup_event(event)) {
raw_spin_lock_irq(&ctx->lock);
if (ctx->task == TASK_TOMBSTONE) {
raw_spin_unlock_irq(&ctx->lock);
base-commit: 73743c3b092277febbf69b250ce8ebbca0525aa2
--
2.34.1.173.g76aa8bc2d0-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH v2] perf/core: Fix cgroup event list management
2021-12-13 6:59 [PATCH v2] perf/core: Fix cgroup event list management Namhyung Kim
@ 2021-12-13 20:30 ` Peter Zijlstra
2021-12-13 21:10 ` Namhyung Kim
2021-12-13 21:08 ` Peter Zijlstra
1 sibling, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2021-12-13 20:30 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
Ian Rogers, kernel test robot, Marco Elver
On Sun, Dec 12, 2021 at 10:59:36PM -0800, Namhyung Kim wrote:
> The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
> This list is accessed from current cpu and not protected by any locks.
> But from the commit ef54c1a476ae ("perf: Rework
> perf_event_exit_event()"), this assumption does not hold true anymore.
>
> In the perf_remove_from_context(), it can remove an event from the
> context without an IPI when the context is not active. I think it
"I tihnk" just doesn't cut it. That means I have to completely reverse
engineer your patch and it's assumptions. Which is more work for me :-(
> assumes task event context, but it's possible for cpu event context
> only with cgroup events can be inactive at the moment - and it might
> become active soon.
>
> If the event is enabled when it's about to be closed, it might call
> perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
> on a different cpu.
>
> This resulted in a crash due to an invalid list pointer access during
> the cgroup list traversal on the cpu which the event belongs to.
>
> The following program can crash my box easily..
Unless that's already public, you've just given the script kiddos ammo,
surely we don't need that.
> Let's use IPI to prevent such crashes.
Let's just not do random things and hope stuff 'works'. Either it is
correct or it is not.
> Similarly, I think perf_install_in_context() should use IPI for the
> cgroup events too.
Let's be sure, ok?
> Reported-by: kernel test robot <lkp@intel.com> # for build error
That's complete garbage, please don't do that.
> Cc: Marco Elver <elver@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> v2) simply use IPI for cgroup events
>
> kernel/events/core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 30d94f68c5bd..9460c083acd9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
> * event_function_call() user.
> */
> raw_spin_lock_irq(&ctx->lock);
> - if (!ctx->is_active) {
> + if (!ctx->is_active && !is_cgroup_event(event)) {
> __perf_remove_from_context(event, __get_cpu_context(ctx),
> ctx, (void *)flags);
> raw_spin_unlock_irq(&ctx->lock);
> @@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
> * perf_event_attr::disabled events will not run and can be initialized
> * without IPI. Except when this is the first event for the context, in
> * that case we need the magic of the IPI to set ctx->is_active.
> + * Similarly, cgroup events for the context also needs the IPI to
> + * manipulate the cgrp_cpuctx_list.
> *
> * The IOC_ENABLE that is sure to follow the creation of a disabled
> * event will issue the IPI and reprogram the hardware.
> */
> - if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
> + if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
> + ctx->nr_events && !is_cgroup_event(event)) {
> raw_spin_lock_irq(&ctx->lock);
> if (ctx->task == TASK_TOMBSTONE) {
> raw_spin_unlock_irq(&ctx->lock);
>
> base-commit: 73743c3b092277febbf69b250ce8ebbca0525aa2
What's junk like that doing ?
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH v2] perf/core: Fix cgroup event list management
2021-12-13 20:30 ` Peter Zijlstra
@ 2021-12-13 21:10 ` Namhyung Kim
0 siblings, 0 replies; 4+ messages in thread
From: Namhyung Kim @ 2021-12-13 21:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
Ian Rogers, kernel test robot, Marco Elver
Hi Peter,
On Mon, Dec 13, 2021 at 12:30 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sun, Dec 12, 2021 at 10:59:36PM -0800, Namhyung Kim wrote:
> > The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
> > This list is accessed from current cpu and not protected by any locks.
> > But from the commit ef54c1a476ae ("perf: Rework
> > perf_event_exit_event()"), this assumption does not hold true anymore.
> >
> > In the perf_remove_from_context(), it can remove an event from the
> > context without an IPI when the context is not active. I think it
>
> "I tihnk" just doesn't cut it. That means I have to completely reverse
> engineer your patch and it's assumptions. Which is more work for me :-(
If you are talking about my wording, ok I will use more assertive words
with the facts in the future. I should say that it doesn't work with
cgroup events which can change the active states during context
switches.
>
> > assumes task event context, but it's possible for cpu event context
> > only with cgroup events can be inactive at the moment - and it might
> > become active soon.
> >
> > If the event is enabled when it's about to be closed, it might call
> > perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
> > on a different cpu.
> >
> > This resulted in a crash due to an invalid list pointer access during
> > the cgroup list traversal on the cpu which the event belongs to.
> >
> > The following program can crash my box easily..
>
> Unless that's already public, you've just given the script kiddos ammo,
> surely we don't need that.
Understood, will take more care in the future.
But this requires root access to create cgroups or the sysctl
perf_event_paranoid of 0 to open cpu/cgroup events which
is restricted in the most distro. Anyway, I should be careful,
sorry about that.
>
> > Let's use IPI to prevent such crashes.
>
> Let's just not do random things and hope stuff 'works'. Either it is
> correct or it is not.
Right, but in this case, it's back to use IPI by removing the
optimization for cgroup events. I'll update the description.
>
> > Similarly, I think perf_install_in_context() should use IPI for the
> > cgroup events too.
>
> Let's be sure, ok?
I see.
>
> > Reported-by: kernel test robot <lkp@intel.com> # for build error
>
> That's complete garbage, please don't do that.
Got it.
>
> > Cc: Marco Elver <elver@google.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > v2) simply use IPI for cgroup events
> >
> > kernel/events/core.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 30d94f68c5bd..9460c083acd9 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2388,7 +2388,7 @@ static void perf_remove_from_context(struct perf_event *event, unsigned long fla
> > * event_function_call() user.
> > */
> > raw_spin_lock_irq(&ctx->lock);
> > - if (!ctx->is_active) {
> > + if (!ctx->is_active && !is_cgroup_event(event)) {
> > __perf_remove_from_context(event, __get_cpu_context(ctx),
> > ctx, (void *)flags);
> > raw_spin_unlock_irq(&ctx->lock);
> > @@ -2857,11 +2857,14 @@ perf_install_in_context(struct perf_event_context *ctx,
> > * perf_event_attr::disabled events will not run and can be initialized
> > * without IPI. Except when this is the first event for the context, in
> > * that case we need the magic of the IPI to set ctx->is_active.
> > + * Similarly, cgroup events for the context also needs the IPI to
> > + * manipulate the cgrp_cpuctx_list.
> > *
> > * The IOC_ENABLE that is sure to follow the creation of a disabled
> > * event will issue the IPI and reprogram the hardware.
> > */
> > - if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF && ctx->nr_events) {
> > + if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
> > + ctx->nr_events && !is_cgroup_event(event)) {
> > raw_spin_lock_irq(&ctx->lock);
> > if (ctx->task == TASK_TOMBSTONE) {
> > raw_spin_unlock_irq(&ctx->lock);
> >
> > base-commit: 73743c3b092277febbf69b250ce8ebbca0525aa2
>
> What's junk like that doing ?
It was recommended by the kernel test robot. I think it uses
the info to find where this patch applies to.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2] perf/core: Fix cgroup event list management
2021-12-13 6:59 [PATCH v2] perf/core: Fix cgroup event list management Namhyung Kim
2021-12-13 20:30 ` Peter Zijlstra
@ 2021-12-13 21:08 ` Peter Zijlstra
1 sibling, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2021-12-13 21:08 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
Ian Rogers, kernel test robot, Marco Elver
On Sun, Dec 12, 2021 at 10:59:36PM -0800, Namhyung Kim wrote:
> The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
> This list is accessed from current cpu and not protected by any locks.
> But from the commit ef54c1a476ae ("perf: Rework
> perf_event_exit_event()"), this assumption does not hold true anymore.
>
> In the perf_remove_from_context(), it can remove an event from the
> context without an IPI when the context is not active. I think it
> assumes task event context, but it's possible for cpu event context
Yes, event_function_call() in general doesn't work, but for cpu events
it does.
> only with cgroup events can be inactive at the moment - and it might
> become active soon.
It can't, we're holding ctx->mutex and ctx->lock, and since it's a cpu
event, that's cpuctx.
But yes, cgrp_cpuctx_list relies on being strictly per-cpu and I can't
come up with a better solution either, doing those IPIs suck but...
But please, put in a comment like:
/*
* Cgroup events are per-CPU events, and must IPI because of
* cgrp_cpuctx_list.
*/
if (!ctx->is_active || !is_cgroup_event(event)) {
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-12-13 21:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-13 6:59 [PATCH v2] perf/core: Fix cgroup event list management Namhyung Kim
2021-12-13 20:30 ` Peter Zijlstra
2021-12-13 21:10 ` Namhyung Kim
2021-12-13 21:08 ` Peter Zijlstra
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.