* [PATCH 01/19] lockdep: Fix might_fault()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:06 ` [tip: locking/core] lockdep/mm: Fix might_fault() lockdep check of current->mm->mmap_lock tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 02/19] perf: Fix pmus_lock vs pmus_srcu ordering Peter Zijlstra
` (19 subsequent siblings)
20 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
David Hildenbrand
Turns out that commit 9ec23531fd48 ("sched/preempt, mm/fault: Trigger
might_sleep() in might_fault() with disabled pagefaults") accidentally
(and unnecessarily) put the lockdep part of __might_fault() under
CONFIG_DEBUG_ATOMIC_SLEEP.
Cc: David Hildenbrand <david@redhat.com>
Fixes: 9ec23531fd48 ("sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
mm/memory.c | 2 --
1 file changed, 2 deletions(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6695,10 +6695,8 @@ void __might_fault(const char *file, int
if (pagefault_disabled())
return;
__might_sleep(file, line);
-#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
if (current->mm)
might_lock_read(&current->mm->mmap_lock);
-#endif
}
EXPORT_SYMBOL(__might_fault);
#endif
^ permalink raw reply	[flat|nested] 85+ messages in thread
* [tip: locking/core] lockdep/mm: Fix might_fault() lockdep check of current->mm->mmap_lock
2024-11-04 13:39 ` [PATCH 01/19] lockdep: Fix might_fault() Peter Zijlstra
@ 2025-03-01 20:06 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:06 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Linus Torvalds,
Andrew Morton, x86, linux-kernel
The following commit has been merged into the locking/core branch of tip:
Commit-ID: a1b65f3f7c6f7f0a08a7dba8be458c6415236487
Gitweb: https://git.kernel.org/tip/a1b65f3f7c6f7f0a08a7dba8be458c6415236487
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:10 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:32:41 +01:00
lockdep/mm: Fix might_fault() lockdep check of current->mm->mmap_lock
Turns out that this commit, about 10 years ago:
9ec23531fd48 ("sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults")
... accidentally (and unnecessarily) put the lockdep part of
__might_fault() under CONFIG_DEBUG_ATOMIC_SLEEP=y.
This is potentially notable because large distributions such as
Ubuntu are running with !CONFIG_DEBUG_ATOMIC_SLEEP.
Restore the debug check.
[ mingo: Update changelog. ]
Fixes: 9ec23531fd48 ("sched/preempt, mm/fault: Trigger might_sleep() in might_fault() with disabled pagefaults")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20241104135517.536628371@infradead.org
---
mm/memory.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 539c0f7..1dfad45 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6835,10 +6835,8 @@ void __might_fault(const char *file, int line)
if (pagefault_disabled())
return;
__might_sleep(file, line);
-#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
if (current->mm)
might_lock_read(&current->mm->mmap_lock);
-#endif
}
EXPORT_SYMBOL(__might_fault);
#endif
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 02/19] perf: Fix pmus_lock vs pmus_srcu ordering
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
2024-11-04 13:39 ` [PATCH 01/19] lockdep: Fix might_fault() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Fix pmus_lock vs. " tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event() Peter Zijlstra
` (18 subsequent siblings)
20 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Commit a63fbed776c7 ("perf/tracing/cpuhotplug: Fix locking order")
placed pmus_lock inside pmus_srcu, this makes perf_pmu_unregister()
trip lockdep.
Move the locking around so that only pmu_idr and the pmus list are
modified while holding pmus_lock. This avoids doing synchronize_srcu()
while holding pmus_lock and all is well again.
Fixes: a63fbed776c7 ("perf/tracing/cpuhotplug: Fix locking order")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11836,6 +11836,8 @@ void perf_pmu_unregister(struct pmu *pmu
{
mutex_lock(&pmus_lock);
list_del_rcu(&pmu->entry);
+ idr_remove(&pmu_idr, pmu->type);
+ mutex_unlock(&pmus_lock);
/*
* We dereference the pmu list under both SRCU and regular RCU, so
@@ -11845,7 +11847,6 @@ void perf_pmu_unregister(struct pmu *pmu
synchronize_rcu();
free_percpu(pmu->pmu_disable_count);
- idr_remove(&pmu_idr, pmu->type);
if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
if (pmu->nr_addr_filters)
device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
@@ -11853,7 +11854,6 @@ void perf_pmu_unregister(struct pmu *pmu
put_device(pmu->dev);
}
free_pmu_context(pmu);
- mutex_unlock(&pmus_lock);
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
^ permalink raw reply	[flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Fix pmus_lock vs. pmus_srcu ordering
2024-11-04 13:39 ` [PATCH 02/19] perf: Fix pmus_lock vs pmus_srcu ordering Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 2565e42539b120b81a68a58da961ce5d1e34eac8
Gitweb: https://git.kernel.org/tip/2565e42539b120b81a68a58da961ce5d1e34eac8
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:11 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:38:42 +01:00
perf/core: Fix pmus_lock vs. pmus_srcu ordering
Commit a63fbed776c7 ("perf/tracing/cpuhotplug: Fix locking order")
placed pmus_lock inside pmus_srcu, this makes perf_pmu_unregister()
trip lockdep.
Move the locking around so that only pmu_idr and the pmus list are
modified while holding pmus_lock. This avoids doing synchronize_srcu()
while holding pmus_lock and all is well again.
Fixes: a63fbed776c7 ("perf/tracing/cpuhotplug: Fix locking order")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135517.679556858@infradead.org
---
kernel/events/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6364319..11793d6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11939,6 +11939,8 @@ void perf_pmu_unregister(struct pmu *pmu)
{
mutex_lock(&pmus_lock);
list_del_rcu(&pmu->entry);
+ idr_remove(&pmu_idr, pmu->type);
+ mutex_unlock(&pmus_lock);
/*
* We dereference the pmu list under both SRCU and regular RCU, so
@@ -11948,7 +11950,6 @@ void perf_pmu_unregister(struct pmu *pmu)
synchronize_rcu();
free_percpu(pmu->pmu_disable_count);
- idr_remove(&pmu_idr, pmu->type);
if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
if (pmu->nr_addr_filters)
device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
@@ -11956,7 +11957,6 @@ void perf_pmu_unregister(struct pmu *pmu)
put_device(pmu->dev);
}
free_pmu_context(pmu);
- mutex_unlock(&pmus_lock);
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
2024-11-04 13:39 ` [PATCH 01/19] lockdep: Fix might_fault() Peter Zijlstra
2024-11-04 13:39 ` [PATCH 02/19] perf: Fix pmus_lock vs pmus_srcu ordering Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2024-11-04 15:36 ` Uros Bizjak
2025-03-01 20:07 ` [tip: perf/core] perf/core: Fix perf_pmu_register() vs. perf_init_event() tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 04/19] perf: Simplify perf_event_alloc() error path Peter Zijlstra
` (17 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
There is a fairly obvious race between perf_init_event() doing
idr_find() and perf_pmu_register() doing idr_alloc() with an
incompletely initialized pmu pointer.
Avoid by doing idr_alloc() on a NULL pointer to register the id, and
swizzling the real pmu pointer at the end using idr_replace().
Also making sure to not set pmu members after publishing the pmu, duh.
[ introduce idr_cmpxchg() in order to better handle the idr_replace()
error case -- if it were to return an unexpected pointer, it will
already have replaced the value and there is no going back. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11739,6 +11739,21 @@ static int pmu_dev_alloc(struct pmu *pmu
static struct lock_class_key cpuctx_mutex;
static struct lock_class_key cpuctx_lock;
+static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
+{
+ void *tmp, *val = idr_find(idr, id);
+
+ if (val != old)
+ return false;
+
+ tmp = idr_replace(idr, new, id);
+ if (IS_ERR(tmp))
+ return false;
+
+ WARN_ON_ONCE(tmp != val);
+ return true;
+}
+
int perf_pmu_register(struct pmu *pmu, const char *name, int type)
{
int cpu, ret, max = PERF_TYPE_MAX;
@@ -11765,7 +11780,7 @@ int perf_pmu_register(struct pmu *pmu, c
if (type >= 0)
max = type;
- ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
+ ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
if (ret < 0)
goto free_pdc;
@@ -11773,6 +11788,7 @@ int perf_pmu_register(struct pmu *pmu, c
type = ret;
pmu->type = type;
+ atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
ret = pmu_dev_alloc(pmu);
@@ -11821,14 +11837,22 @@ int perf_pmu_register(struct pmu *pmu, c
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
+ /*
+ * Now that the PMU is complete, make it visible to perf_try_init_event().
+ */
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
+ goto free_context;
list_add_rcu(&pmu->entry, &pmus);
- atomic_set(&pmu->exclusive_cnt, 0);
+
ret = 0;
unlock:
mutex_unlock(&pmus_lock);
return ret;
+free_context:
+ free_percpu(pmu->cpu_pmu_context);
+
free_dev:
if (pmu->dev && pmu->dev != PMU_NULL_DEV) {
device_del(pmu->dev);
^ permalink raw reply	[flat|nested] 85+ messages in thread
* Re: [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event()
2024-11-04 13:39 ` [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event() Peter Zijlstra
@ 2024-11-04 15:36 ` Uros Bizjak
2024-11-05 12:01 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Fix perf_pmu_register() vs. perf_init_event() tip-bot2 for Peter Zijlstra
1 sibling, 1 reply; 85+ messages in thread
From: Uros Bizjak @ 2024-11-04 15:36 UTC (permalink / raw)
To: Peter Zijlstra, mingo, lucas.demarchi
Cc: linux-kernel, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
On 4. 11. 24 14:39, Peter Zijlstra wrote:
> There is a fairly obvious race between perf_init_event() doing
> idr_find() and perf_pmu_register() doing idr_alloc() with an
> incompletely initialized pmu pointer.
>
> Avoid by doing idr_alloc() on a NULL pointer to register the id, and
> swizzling the real pmu pointer at the end using idr_replace().
>
> Also making sure to not set pmu members after publishing the pmu, duh.
>
> [ introduce idr_cmpxchg() in order to better handle the idr_replace()
> error case -- if it were to return an unexpected pointer, it will
> already have replaced the value and there is no going back. ]
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/events/core.c | 28 ++++++++++++++++++++++++++--
> 1 file changed, 26 insertions(+), 2 deletions(-)
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11739,6 +11739,21 @@ static int pmu_dev_alloc(struct pmu *pmu
> static struct lock_class_key cpuctx_mutex;
> static struct lock_class_key cpuctx_lock;
>
> +static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
> +{
> + void *tmp, *val = idr_find(idr, id);
> +
> + if (val != old)
> + return false;
> +
> + tmp = idr_replace(idr, new, id);
> + if (IS_ERR(tmp))
> + return false;
> +
> + WARN_ON_ONCE(tmp != val);
> + return true;
> +}
Can the above function be named idr_try_cmpxchg?
cmpxchg family of functions return an old value from the location and
one would expect that idr_cmpxchg() returns an old value from *idr, too.
idr_cmpxchg() function however returns success/failure status, and this
is also what functions from try_cmpxchg family return.
Uros.
^ permalink raw reply	[flat|nested] 85+ messages in thread
* Re: [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event()
2024-11-04 15:36 ` Uros Bizjak
@ 2024-11-05 12:01 ` Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-05 12:01 UTC (permalink / raw)
To: Uros Bizjak
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang
On Mon, Nov 04, 2024 at 04:36:26PM +0100, Uros Bizjak wrote:
>
>
> On 4. 11. 24 14:39, Peter Zijlstra wrote:
> > There is a fairly obvious race between perf_init_event() doing
> > idr_find() and perf_pmu_register() doing idr_alloc() with an
> > incompletely initialized pmu pointer.
> >
> > Avoid by doing idr_alloc() on a NULL pointer to register the id, and
> > swizzling the real pmu pointer at the end using idr_replace().
> >
> > Also making sure to not set pmu members after publishing the pmu, duh.
> >
> > [ introduce idr_cmpxchg() in order to better handle the idr_replace()
> > error case -- if it were to return an unexpected pointer, it will
> > already have replaced the value and there is no going back. ]
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/events/core.c | 28 ++++++++++++++++++++++++++--
> > 1 file changed, 26 insertions(+), 2 deletions(-)
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -11739,6 +11739,21 @@ static int pmu_dev_alloc(struct pmu *pmu
> > static struct lock_class_key cpuctx_mutex;
> > static struct lock_class_key cpuctx_lock;
> > +static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
> > +{
> > + void *tmp, *val = idr_find(idr, id);
> > +
> > + if (val != old)
> > + return false;
> > +
> > + tmp = idr_replace(idr, new, id);
> > + if (IS_ERR(tmp))
> > + return false;
> > +
> > + WARN_ON_ONCE(tmp != val);
> > + return true;
> > +}
>
> Can the above function be named idr_try_cmpxchg?
>
> cmpxchg family of functions return an old value from the location and one
> would expect that idr_cmpxchg() returns an old value from *idr, too.
> idr_cmpxchg() function however returns success/failure status, and this is
> also what functions from try_cmpxchg family return.
Fair enough -- OTOH, this function is very much not atomic. I considered
calling it idr_cas() so as to distance it from the cmpxchg family.
Also, it is local to perf, and not placed in idr.h or similar.
While the usage here is somewhat spurious, it gets used later on in the
series to better effect.
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Fix perf_pmu_register() vs. perf_init_event()
2024-11-04 13:39 ` [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event() Peter Zijlstra
2024-11-04 15:36 ` Uros Bizjak
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 003659fec9f6d8c04738cb74b5384398ae8a7e88
Gitweb: https://git.kernel.org/tip/003659fec9f6d8c04738cb74b5384398ae8a7e88
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:12 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:38:42 +01:00
perf/core: Fix perf_pmu_register() vs. perf_init_event()
There is a fairly obvious race between perf_init_event() doing
idr_find() and perf_pmu_register() doing idr_alloc() with an
incompletely initialized PMU pointer.
Avoid by doing idr_alloc() on a NULL pointer to register the id, and
swizzling the real struct pmu pointer at the end using idr_replace().
Also making sure to not set struct pmu members after publishing
the struct pmu, duh.
[ introduce idr_cmpxchg() in order to better handle the idr_replace()
error case -- if it were to return an unexpected pointer, it will
already have replaced the value and there is no going back. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135517.858805880@infradead.org
---
kernel/events/core.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 11793d6..823aa08 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11830,6 +11830,21 @@ free_dev:
static struct lock_class_key cpuctx_mutex;
static struct lock_class_key cpuctx_lock;
+static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
+{
+ void *tmp, *val = idr_find(idr, id);
+
+ if (val != old)
+ return false;
+
+ tmp = idr_replace(idr, new, id);
+ if (IS_ERR(tmp))
+ return false;
+
+ WARN_ON_ONCE(tmp != val);
+ return true;
+}
+
int perf_pmu_register(struct pmu *pmu, const char *name, int type)
{
int cpu, ret, max = PERF_TYPE_MAX;
@@ -11856,7 +11871,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
if (type >= 0)
max = type;
- ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
+ ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
if (ret < 0)
goto free_pdc;
@@ -11864,6 +11879,7 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
type = ret;
pmu->type = type;
+ atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
ret = pmu_dev_alloc(pmu);
@@ -11912,14 +11928,22 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
+ /*
+ * Now that the PMU is complete, make it visible to perf_try_init_event().
+ */
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
+ goto free_context;
list_add_rcu(&pmu->entry, &pmus);
- atomic_set(&pmu->exclusive_cnt, 0);
+
ret = 0;
unlock:
mutex_unlock(&pmus_lock);
return ret;
+free_context:
+ free_percpu(pmu->cpu_pmu_context);
+
free_dev:
if (pmu->dev && pmu->dev != PMU_NULL_DEV) {
device_del(pmu->dev);
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 04/19] perf: Simplify perf_event_alloc() error path
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (2 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 03/19] perf: Fix perf_pmu_register() vs perf_init_event() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
` (2 more replies)
2024-11-04 13:39 ` [PATCH 05/19] perf: Simplify perf_pmu_register() " Peter Zijlstra
` (16 subsequent siblings)
20 siblings, 3 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
The error cleanup sequence in perf_event_alloc() is a subset of the
existing _free_event() function (it must of course be).
Split this out into __free_event() and simplify the error path.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/perf_event.h | 16 +++--
kernel/events/core.c | 134 ++++++++++++++++++++++-----------------------
2 files changed, 76 insertions(+), 74 deletions(-)
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -652,13 +652,15 @@ struct swevent_hlist {
struct rcu_head rcu_head;
};
-#define PERF_ATTACH_CONTEXT 0x01
-#define PERF_ATTACH_GROUP 0x02
-#define PERF_ATTACH_TASK 0x04
-#define PERF_ATTACH_TASK_DATA 0x08
-#define PERF_ATTACH_ITRACE 0x10
-#define PERF_ATTACH_SCHED_CB 0x20
-#define PERF_ATTACH_CHILD 0x40
+#define PERF_ATTACH_CONTEXT 0x0001
+#define PERF_ATTACH_GROUP 0x0002
+#define PERF_ATTACH_TASK 0x0004
+#define PERF_ATTACH_TASK_DATA 0x0008
+#define PERF_ATTACH_ITRACE 0x0010
+#define PERF_ATTACH_SCHED_CB 0x0020
+#define PERF_ATTACH_CHILD 0x0040
+#define PERF_ATTACH_EXCLUSIVE 0x0080
+#define PERF_ATTACH_CALLCHAIN 0x0100
struct bpf_prog;
struct perf_cgroup;
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5246,6 +5246,8 @@ static int exclusive_event_init(struct p
return -EBUSY;
}
+ event->attach_state |= PERF_ATTACH_EXCLUSIVE;
+
return 0;
}
@@ -5253,14 +5255,13 @@ static void exclusive_event_destroy(stru
{
struct pmu *pmu = event->pmu;
- if (!is_exclusive_pmu(pmu))
- return;
-
/* see comment in exclusive_event_init() */
if (event->attach_state & PERF_ATTACH_TASK)
atomic_dec(&pmu->exclusive_cnt);
else
atomic_inc(&pmu->exclusive_cnt);
+
+ event->attach_state &= ~PERF_ATTACH_EXCLUSIVE;
}
static bool exclusive_event_match(struct perf_event *e1, struct perf_event *e2)
@@ -5319,40 +5320,20 @@ static void perf_pending_task_sync(struc
rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
}
-static void _free_event(struct perf_event *event)
+/* vs perf_event_alloc() error */
+static void __free_event(struct perf_event *event)
{
- irq_work_sync(&event->pending_irq);
- irq_work_sync(&event->pending_disable_irq);
- perf_pending_task_sync(event);
-
- unaccount_event(event);
+ if (event->attach_state & PERF_ATTACH_CALLCHAIN)
+ put_callchain_buffers();
- security_perf_event_free(event);
+ kfree(event->addr_filter_ranges);
- if (event->rb) {
- /*
- * Can happen when we close an event with re-directed output.
- *
- * Since we have a 0 refcount, perf_mmap_close() will skip
- * over us; possibly making our ring_buffer_put() the last.
- */
- mutex_lock(&event->mmap_mutex);
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- }
+ if (event->attach_state & PERF_ATTACH_EXCLUSIVE)
+ exclusive_event_destroy(event);
if (is_cgroup_event(event))
perf_detach_cgroup(event);
- if (!event->parent) {
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- }
-
- perf_event_free_bpf_prog(event);
- perf_addr_filters_splice(event, NULL);
- kfree(event->addr_filter_ranges);
-
if (event->destroy)
event->destroy(event);
@@ -5363,22 +5344,58 @@ static void _free_event(struct perf_even
if (event->hw.target)
put_task_struct(event->hw.target);
- if (event->pmu_ctx)
+ if (event->pmu_ctx) {
+ /*
+ * put_pmu_ctx() needs an event->ctx reference, because of
+ * epc->ctx.
+ */
+ WARN_ON_ONCE(!event->ctx);
+ WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
put_pmu_ctx(event->pmu_ctx);
+ }
/*
- * perf_event_free_task() relies on put_ctx() being 'last', in particular
- * all task references must be cleaned up.
+ * perf_event_free_task() relies on put_ctx() being 'last', in
+ * particular all task references must be cleaned up.
*/
if (event->ctx)
put_ctx(event->ctx);
- exclusive_event_destroy(event);
- module_put(event->pmu->module);
+ if (event->pmu)
+ module_put(event->pmu->module);
call_rcu(&event->rcu_head, free_event_rcu);
}
+/* vs perf_event_alloc() success */
+static void _free_event(struct perf_event *event)
+{
+ irq_work_sync(&event->pending_irq);
+ irq_work_sync(&event->pending_disable_irq);
+ perf_pending_task_sync(event);
+
+ unaccount_event(event);
+
+ security_perf_event_free(event);
+
+ if (event->rb) {
+ /*
+ * Can happen when we close an event with re-directed output.
+ *
+ * Since we have a 0 refcount, perf_mmap_close() will skip
+ * over us; possibly making our ring_buffer_put() the last.
+ */
+ mutex_lock(&event->mmap_mutex);
+ ring_buffer_attach(event, NULL);
+ mutex_unlock(&event->mmap_mutex);
+ }
+
+ perf_event_free_bpf_prog(event);
+ perf_addr_filters_splice(event, NULL);
+
+ __free_event(event);
+}
+
/*
* Used to free events which have a known refcount of 1, such as in error paths
* where the event isn't exposed yet and inherited events.
@@ -11922,8 +11939,10 @@ static int perf_try_init_event(struct pm
event->destroy(event);
}
- if (ret)
+ if (ret) {
+ event->pmu = NULL;
module_put(pmu->module);
+ }
return ret;
}
@@ -12251,7 +12270,7 @@ perf_event_alloc(struct perf_event_attr
* See perf_output_read().
*/
if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
- goto err_ns;
+ goto err;
if (!has_branch_stack(event))
event->attr.branch_sample_type = 0;
@@ -12259,7 +12278,7 @@ perf_event_alloc(struct perf_event_attr
pmu = perf_init_event(event);
if (IS_ERR(pmu)) {
err = PTR_ERR(pmu);
- goto err_ns;
+ goto err;
}
/*
@@ -12269,24 +12288,24 @@ perf_event_alloc(struct perf_event_attr
*/
if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
err = -EINVAL;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_output &&
!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) {
err = -EOPNOTSUPP;
- goto err_pmu;
+ goto err;
}
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
if (err)
- goto err_pmu;
+ goto err;
}
err = exclusive_event_init(event);
if (err)
- goto err_pmu;
+ goto err;
if (has_addr_filter(event)) {
event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
@@ -12294,7 +12313,7 @@ perf_event_alloc(struct perf_event_attr
GFP_KERNEL);
if (!event->addr_filter_ranges) {
err = -ENOMEM;
- goto err_per_task;
+ goto err;
}
/*
@@ -12319,41 +12338,22 @@ perf_event_alloc(struct perf_event_attr
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
err = get_callchain_buffers(attr->sample_max_stack);
if (err)
- goto err_addr_filters;
+ goto err;
+ event->attach_state |= PERF_ATTACH_CALLCHAIN;
}
}
err = security_perf_event_alloc(event);
if (err)
- goto err_callchain_buffer;
+ goto err;
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
return event;
-err_callchain_buffer:
- if (!event->parent) {
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- }
-err_addr_filters:
- kfree(event->addr_filter_ranges);
-
-err_per_task:
- exclusive_event_destroy(event);
-
-err_pmu:
- if (is_cgroup_event(event))
- perf_detach_cgroup(event);
- if (event->destroy)
- event->destroy(event);
- module_put(pmu->module);
-err_ns:
- if (event->hw.target)
- put_task_struct(event->hw.target);
- call_rcu(&event->rcu_head, free_event_rcu);
-
+err:
+ __free_event(event);
return ERR_PTR(err);
}
^ permalink raw reply	[flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify the perf_event_alloc() error path
2024-11-04 13:39 ` [PATCH 04/19] perf: Simplify perf_event_alloc() error path Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2025-03-06 7:57 ` [PATCH 04/19] perf: Simplify " Lai, Yi
2 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 02be310c2d24223efe1a0aec3c5bf04d78ac5ba2
Gitweb: https://git.kernel.org/tip/02be310c2d24223efe1a0aec3c5bf04d78ac5ba2
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:13 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:54:05 +01:00
perf/core: Simplify the perf_event_alloc() error path
The error cleanup sequence in perf_event_alloc() is a subset of the
existing _free_event() function (it must of course be).
Split this out into __free_event() and simplify the error path.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135517.967889521@infradead.org
---
include/linux/perf_event.h | 16 ++--
kernel/events/core.c | 138 ++++++++++++++++++------------------
2 files changed, 78 insertions(+), 76 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c4525ba..8c0117b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -673,13 +673,15 @@ struct swevent_hlist {
struct rcu_head rcu_head;
};
-#define PERF_ATTACH_CONTEXT 0x01
-#define PERF_ATTACH_GROUP 0x02
-#define PERF_ATTACH_TASK 0x04
-#define PERF_ATTACH_TASK_DATA 0x08
-#define PERF_ATTACH_ITRACE 0x10
-#define PERF_ATTACH_SCHED_CB 0x20
-#define PERF_ATTACH_CHILD 0x40
+#define PERF_ATTACH_CONTEXT 0x0001
+#define PERF_ATTACH_GROUP 0x0002
+#define PERF_ATTACH_TASK 0x0004
+#define PERF_ATTACH_TASK_DATA 0x0008
+#define PERF_ATTACH_ITRACE 0x0010
+#define PERF_ATTACH_SCHED_CB 0x0020
+#define PERF_ATTACH_CHILD 0x0040
+#define PERF_ATTACH_EXCLUSIVE 0x0080
+#define PERF_ATTACH_CALLCHAIN 0x0100
struct bpf_prog;
struct perf_cgroup;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6ccf363..1b8b1c8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5289,6 +5289,8 @@ static int exclusive_event_init(struct perf_event *event)
return -EBUSY;
}
+ event->attach_state |= PERF_ATTACH_EXCLUSIVE;
+
return 0;
}
@@ -5296,14 +5298,13 @@ static void exclusive_event_destroy(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
- if (!is_exclusive_pmu(pmu))
- return;
-
/* see comment in exclusive_event_init() */
if (event->attach_state & PERF_ATTACH_TASK)
atomic_dec(&pmu->exclusive_cnt);
else
atomic_inc(&pmu->exclusive_cnt);
+
+ event->attach_state &= ~PERF_ATTACH_EXCLUSIVE;
}
static bool exclusive_event_match(struct perf_event *e1, struct perf_event *e2)
@@ -5362,40 +5363,20 @@ static void perf_pending_task_sync(struct perf_event *event)
rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
}
-static void _free_event(struct perf_event *event)
+/* vs perf_event_alloc() error */
+static void __free_event(struct perf_event *event)
{
- irq_work_sync(&event->pending_irq);
- irq_work_sync(&event->pending_disable_irq);
- perf_pending_task_sync(event);
+ if (event->attach_state & PERF_ATTACH_CALLCHAIN)
+ put_callchain_buffers();
- unaccount_event(event);
+ kfree(event->addr_filter_ranges);
- security_perf_event_free(event);
-
- if (event->rb) {
- /*
- * Can happen when we close an event with re-directed output.
- *
- * Since we have a 0 refcount, perf_mmap_close() will skip
- * over us; possibly making our ring_buffer_put() the last.
- */
- mutex_lock(&event->mmap_mutex);
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- }
+ if (event->attach_state & PERF_ATTACH_EXCLUSIVE)
+ exclusive_event_destroy(event);
if (is_cgroup_event(event))
perf_detach_cgroup(event);
- if (!event->parent) {
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- }
-
- perf_event_free_bpf_prog(event);
- perf_addr_filters_splice(event, NULL);
- kfree(event->addr_filter_ranges);
-
if (event->destroy)
event->destroy(event);
@@ -5406,22 +5387,58 @@ static void _free_event(struct perf_event *event)
if (event->hw.target)
put_task_struct(event->hw.target);
- if (event->pmu_ctx)
+ if (event->pmu_ctx) {
+ /*
+ * put_pmu_ctx() needs an event->ctx reference, because of
+ * epc->ctx.
+ */
+ WARN_ON_ONCE(!event->ctx);
+ WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
put_pmu_ctx(event->pmu_ctx);
+ }
/*
- * perf_event_free_task() relies on put_ctx() being 'last', in particular
- * all task references must be cleaned up.
+ * perf_event_free_task() relies on put_ctx() being 'last', in
+ * particular all task references must be cleaned up.
*/
if (event->ctx)
put_ctx(event->ctx);
- exclusive_event_destroy(event);
- module_put(event->pmu->module);
+ if (event->pmu)
+ module_put(event->pmu->module);
call_rcu(&event->rcu_head, free_event_rcu);
}
+/* vs perf_event_alloc() success */
+static void _free_event(struct perf_event *event)
+{
+ irq_work_sync(&event->pending_irq);
+ irq_work_sync(&event->pending_disable_irq);
+ perf_pending_task_sync(event);
+
+ unaccount_event(event);
+
+ security_perf_event_free(event);
+
+ if (event->rb) {
+ /*
+ * Can happen when we close an event with re-directed output.
+ *
+ * Since we have a 0 refcount, perf_mmap_close() will skip
+ * over us; possibly making our ring_buffer_put() the last.
+ */
+ mutex_lock(&event->mmap_mutex);
+ ring_buffer_attach(event, NULL);
+ mutex_unlock(&event->mmap_mutex);
+ }
+
+ perf_event_free_bpf_prog(event);
+ perf_addr_filters_splice(event, NULL);
+
+ __free_event(event);
+}
+
/*
* Used to free events which have a known refcount of 1, such as in error paths
* where the event isn't exposed yet and inherited events.
@@ -12093,8 +12110,10 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
event->destroy(event);
}
- if (ret)
+ if (ret) {
+ event->pmu = NULL;
module_put(pmu->module);
+ }
return ret;
}
@@ -12422,7 +12441,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
* See perf_output_read().
*/
if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
- goto err_ns;
+ goto err;
if (!has_branch_stack(event))
event->attr.branch_sample_type = 0;
@@ -12430,7 +12449,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
pmu = perf_init_event(event);
if (IS_ERR(pmu)) {
err = PTR_ERR(pmu);
- goto err_ns;
+ goto err;
}
/*
@@ -12440,25 +12459,25 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
*/
if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
err = -EINVAL;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_output &&
(!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
event->attr.aux_pause || event->attr.aux_resume)) {
err = -EOPNOTSUPP;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_pause && event->attr.aux_resume) {
err = -EINVAL;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_start_paused) {
if (!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
err = -EOPNOTSUPP;
- goto err_pmu;
+ goto err;
}
event->hw.aux_paused = 1;
}
@@ -12466,12 +12485,12 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
if (err)
- goto err_pmu;
+ goto err;
}
err = exclusive_event_init(event);
if (err)
- goto err_pmu;
+ goto err;
if (has_addr_filter(event)) {
event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
@@ -12479,7 +12498,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
GFP_KERNEL);
if (!event->addr_filter_ranges) {
err = -ENOMEM;
- goto err_per_task;
+ goto err;
}
/*
@@ -12504,41 +12523,22 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
err = get_callchain_buffers(attr->sample_max_stack);
if (err)
- goto err_addr_filters;
+ goto err;
+ event->attach_state |= PERF_ATTACH_CALLCHAIN;
}
}
err = security_perf_event_alloc(event);
if (err)
- goto err_callchain_buffer;
+ goto err;
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
return event;
-err_callchain_buffer:
- if (!event->parent) {
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- }
-err_addr_filters:
- kfree(event->addr_filter_ranges);
-
-err_per_task:
- exclusive_event_destroy(event);
-
-err_pmu:
- if (is_cgroup_event(event))
- perf_detach_cgroup(event);
- if (event->destroy)
- event->destroy(event);
- module_put(pmu->module);
-err_ns:
- if (event->hw.target)
- put_task_struct(event->hw.target);
- call_rcu(&event->rcu_head, free_event_rcu);
-
+err:
+ __free_event(event);
return ERR_PTR(err);
}
* [tip: perf/core] perf/core: Simplify the perf_event_alloc() error path
2024-11-04 13:39 ` [PATCH 04/19] perf: Simplify perf_event_alloc() error path Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2025-03-06 7:57 ` [PATCH 04/19] perf: Simplify " Lai, Yi
2 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: c70ca298036c58a88686ff388d3d367e9d21acf0
Gitweb: https://git.kernel.org/tip/c70ca298036c58a88686ff388d3d367e9d21acf0
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:13 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:14 +01:00
perf/core: Simplify the perf_event_alloc() error path
The error cleanup sequence in perf_event_alloc() is a subset of the
existing _free_event() function (it must of course be).
Split this out into __free_event() and simplify the error path.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135517.967889521@infradead.org
---
include/linux/perf_event.h | 16 ++--
kernel/events/core.c | 138 ++++++++++++++++++------------------
2 files changed, 78 insertions(+), 76 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c4525ba..8c0117b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -673,13 +673,15 @@ struct swevent_hlist {
struct rcu_head rcu_head;
};
-#define PERF_ATTACH_CONTEXT 0x01
-#define PERF_ATTACH_GROUP 0x02
-#define PERF_ATTACH_TASK 0x04
-#define PERF_ATTACH_TASK_DATA 0x08
-#define PERF_ATTACH_ITRACE 0x10
-#define PERF_ATTACH_SCHED_CB 0x20
-#define PERF_ATTACH_CHILD 0x40
+#define PERF_ATTACH_CONTEXT 0x0001
+#define PERF_ATTACH_GROUP 0x0002
+#define PERF_ATTACH_TASK 0x0004
+#define PERF_ATTACH_TASK_DATA 0x0008
+#define PERF_ATTACH_ITRACE 0x0010
+#define PERF_ATTACH_SCHED_CB 0x0020
+#define PERF_ATTACH_CHILD 0x0040
+#define PERF_ATTACH_EXCLUSIVE 0x0080
+#define PERF_ATTACH_CALLCHAIN 0x0100
struct bpf_prog;
struct perf_cgroup;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6ccf363..1b8b1c8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5289,6 +5289,8 @@ static int exclusive_event_init(struct perf_event *event)
return -EBUSY;
}
+ event->attach_state |= PERF_ATTACH_EXCLUSIVE;
+
return 0;
}
@@ -5296,14 +5298,13 @@ static void exclusive_event_destroy(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
- if (!is_exclusive_pmu(pmu))
- return;
-
/* see comment in exclusive_event_init() */
if (event->attach_state & PERF_ATTACH_TASK)
atomic_dec(&pmu->exclusive_cnt);
else
atomic_inc(&pmu->exclusive_cnt);
+
+ event->attach_state &= ~PERF_ATTACH_EXCLUSIVE;
}
static bool exclusive_event_match(struct perf_event *e1, struct perf_event *e2)
@@ -5362,40 +5363,20 @@ static void perf_pending_task_sync(struct perf_event *event)
rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
}
-static void _free_event(struct perf_event *event)
+/* vs perf_event_alloc() error */
+static void __free_event(struct perf_event *event)
{
- irq_work_sync(&event->pending_irq);
- irq_work_sync(&event->pending_disable_irq);
- perf_pending_task_sync(event);
+ if (event->attach_state & PERF_ATTACH_CALLCHAIN)
+ put_callchain_buffers();
- unaccount_event(event);
+ kfree(event->addr_filter_ranges);
- security_perf_event_free(event);
-
- if (event->rb) {
- /*
- * Can happen when we close an event with re-directed output.
- *
- * Since we have a 0 refcount, perf_mmap_close() will skip
- * over us; possibly making our ring_buffer_put() the last.
- */
- mutex_lock(&event->mmap_mutex);
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- }
+ if (event->attach_state & PERF_ATTACH_EXCLUSIVE)
+ exclusive_event_destroy(event);
if (is_cgroup_event(event))
perf_detach_cgroup(event);
- if (!event->parent) {
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- }
-
- perf_event_free_bpf_prog(event);
- perf_addr_filters_splice(event, NULL);
- kfree(event->addr_filter_ranges);
-
if (event->destroy)
event->destroy(event);
@@ -5406,22 +5387,58 @@ static void _free_event(struct perf_event *event)
if (event->hw.target)
put_task_struct(event->hw.target);
- if (event->pmu_ctx)
+ if (event->pmu_ctx) {
+ /*
+ * put_pmu_ctx() needs an event->ctx reference, because of
+ * epc->ctx.
+ */
+ WARN_ON_ONCE(!event->ctx);
+ WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
put_pmu_ctx(event->pmu_ctx);
+ }
/*
- * perf_event_free_task() relies on put_ctx() being 'last', in particular
- * all task references must be cleaned up.
+ * perf_event_free_task() relies on put_ctx() being 'last', in
+ * particular all task references must be cleaned up.
*/
if (event->ctx)
put_ctx(event->ctx);
- exclusive_event_destroy(event);
- module_put(event->pmu->module);
+ if (event->pmu)
+ module_put(event->pmu->module);
call_rcu(&event->rcu_head, free_event_rcu);
}
+/* vs perf_event_alloc() success */
+static void _free_event(struct perf_event *event)
+{
+ irq_work_sync(&event->pending_irq);
+ irq_work_sync(&event->pending_disable_irq);
+ perf_pending_task_sync(event);
+
+ unaccount_event(event);
+
+ security_perf_event_free(event);
+
+ if (event->rb) {
+ /*
+ * Can happen when we close an event with re-directed output.
+ *
+ * Since we have a 0 refcount, perf_mmap_close() will skip
+ * over us; possibly making our ring_buffer_put() the last.
+ */
+ mutex_lock(&event->mmap_mutex);
+ ring_buffer_attach(event, NULL);
+ mutex_unlock(&event->mmap_mutex);
+ }
+
+ perf_event_free_bpf_prog(event);
+ perf_addr_filters_splice(event, NULL);
+
+ __free_event(event);
+}
+
/*
* Used to free events which have a known refcount of 1, such as in error paths
* where the event isn't exposed yet and inherited events.
@@ -12093,8 +12110,10 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
event->destroy(event);
}
- if (ret)
+ if (ret) {
+ event->pmu = NULL;
module_put(pmu->module);
+ }
return ret;
}
@@ -12422,7 +12441,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
* See perf_output_read().
*/
if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
- goto err_ns;
+ goto err;
if (!has_branch_stack(event))
event->attr.branch_sample_type = 0;
@@ -12430,7 +12449,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
pmu = perf_init_event(event);
if (IS_ERR(pmu)) {
err = PTR_ERR(pmu);
- goto err_ns;
+ goto err;
}
/*
@@ -12440,25 +12459,25 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
*/
if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
err = -EINVAL;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_output &&
(!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
event->attr.aux_pause || event->attr.aux_resume)) {
err = -EOPNOTSUPP;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_pause && event->attr.aux_resume) {
err = -EINVAL;
- goto err_pmu;
+ goto err;
}
if (event->attr.aux_start_paused) {
if (!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
err = -EOPNOTSUPP;
- goto err_pmu;
+ goto err;
}
event->hw.aux_paused = 1;
}
@@ -12466,12 +12485,12 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
if (err)
- goto err_pmu;
+ goto err;
}
err = exclusive_event_init(event);
if (err)
- goto err_pmu;
+ goto err;
if (has_addr_filter(event)) {
event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
@@ -12479,7 +12498,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
GFP_KERNEL);
if (!event->addr_filter_ranges) {
err = -ENOMEM;
- goto err_per_task;
+ goto err;
}
/*
@@ -12504,41 +12523,22 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
err = get_callchain_buffers(attr->sample_max_stack);
if (err)
- goto err_addr_filters;
+ goto err;
+ event->attach_state |= PERF_ATTACH_CALLCHAIN;
}
}
err = security_perf_event_alloc(event);
if (err)
- goto err_callchain_buffer;
+ goto err;
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
return event;
-err_callchain_buffer:
- if (!event->parent) {
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
- put_callchain_buffers();
- }
-err_addr_filters:
- kfree(event->addr_filter_ranges);
-
-err_per_task:
- exclusive_event_destroy(event);
-
-err_pmu:
- if (is_cgroup_event(event))
- perf_detach_cgroup(event);
- if (event->destroy)
- event->destroy(event);
- module_put(pmu->module);
-err_ns:
- if (event->hw.target)
- put_task_struct(event->hw.target);
- call_rcu(&event->rcu_head, free_event_rcu);
-
+err:
+ __free_event(event);
return ERR_PTR(err);
}
* Re: [PATCH 04/19] perf: Simplify perf_event_alloc() error path
2024-11-04 13:39 ` [PATCH 04/19] perf: Simplify perf_event_alloc() error path Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
@ 2025-03-06 7:57 ` Lai, Yi
2025-03-06 9:24 ` Ingo Molnar
2 siblings, 1 reply; 85+ messages in thread
From: Lai, Yi @ 2025-03-06 7:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang
On Mon, Nov 04, 2024 at 02:39:13PM +0100, Peter Zijlstra wrote:
> The error cleanup sequence in perf_event_alloc() is a subset of the
> existing _free_event() function (it must of course be).
>
> Split this out into __free_event() and simplify the error path.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> include/linux/perf_event.h | 16 +++--
> kernel/events/core.c | 134 ++++++++++++++++++++++-----------------------
> 2 files changed, 76 insertions(+), 74 deletions(-)
>
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -652,13 +652,15 @@ struct swevent_hlist {
> struct rcu_head rcu_head;
> };
>
> -#define PERF_ATTACH_CONTEXT 0x01
> -#define PERF_ATTACH_GROUP 0x02
> -#define PERF_ATTACH_TASK 0x04
> -#define PERF_ATTACH_TASK_DATA 0x08
> -#define PERF_ATTACH_ITRACE 0x10
> -#define PERF_ATTACH_SCHED_CB 0x20
> -#define PERF_ATTACH_CHILD 0x40
> +#define PERF_ATTACH_CONTEXT 0x0001
> +#define PERF_ATTACH_GROUP 0x0002
> +#define PERF_ATTACH_TASK 0x0004
> +#define PERF_ATTACH_TASK_DATA 0x0008
> +#define PERF_ATTACH_ITRACE 0x0010
> +#define PERF_ATTACH_SCHED_CB 0x0020
> +#define PERF_ATTACH_CHILD 0x0040
> +#define PERF_ATTACH_EXCLUSIVE 0x0080
> +#define PERF_ATTACH_CALLCHAIN 0x0100
>
> struct bpf_prog;
> struct perf_cgroup;
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5246,6 +5246,8 @@ static int exclusive_event_init(struct p
> return -EBUSY;
> }
>
> + event->attach_state |= PERF_ATTACH_EXCLUSIVE;
> +
> return 0;
> }
>
> @@ -5253,14 +5255,13 @@ static void exclusive_event_destroy(stru
> {
> struct pmu *pmu = event->pmu;
>
> - if (!is_exclusive_pmu(pmu))
> - return;
> -
> /* see comment in exclusive_event_init() */
> if (event->attach_state & PERF_ATTACH_TASK)
> atomic_dec(&pmu->exclusive_cnt);
> else
> atomic_inc(&pmu->exclusive_cnt);
> +
> + event->attach_state &= ~PERF_ATTACH_EXCLUSIVE;
> }
>
> static bool exclusive_event_match(struct perf_event *e1, struct perf_event *e2)
> @@ -5319,40 +5320,20 @@ static void perf_pending_task_sync(struc
> rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
> }
>
> -static void _free_event(struct perf_event *event)
> +/* vs perf_event_alloc() error */
> +static void __free_event(struct perf_event *event)
> {
> - irq_work_sync(&event->pending_irq);
> - irq_work_sync(&event->pending_disable_irq);
> - perf_pending_task_sync(event);
> -
> - unaccount_event(event);
> + if (event->attach_state & PERF_ATTACH_CALLCHAIN)
> + put_callchain_buffers();
>
> - security_perf_event_free(event);
> + kfree(event->addr_filter_ranges);
>
> - if (event->rb) {
> - /*
> - * Can happen when we close an event with re-directed output.
> - *
> - * Since we have a 0 refcount, perf_mmap_close() will skip
> - * over us; possibly making our ring_buffer_put() the last.
> - */
> - mutex_lock(&event->mmap_mutex);
> - ring_buffer_attach(event, NULL);
> - mutex_unlock(&event->mmap_mutex);
> - }
> + if (event->attach_state & PERF_ATTACH_EXCLUSIVE)
> + exclusive_event_destroy(event);
>
> if (is_cgroup_event(event))
> perf_detach_cgroup(event);
>
> - if (!event->parent) {
> - if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> - put_callchain_buffers();
> - }
> -
> - perf_event_free_bpf_prog(event);
> - perf_addr_filters_splice(event, NULL);
> - kfree(event->addr_filter_ranges);
> -
> if (event->destroy)
> event->destroy(event);
>
> @@ -5363,22 +5344,58 @@ static void _free_event(struct perf_even
> if (event->hw.target)
> put_task_struct(event->hw.target);
>
> - if (event->pmu_ctx)
> + if (event->pmu_ctx) {
> + /*
> + * put_pmu_ctx() needs an event->ctx reference, because of
> + * epc->ctx.
> + */
> + WARN_ON_ONCE(!event->ctx);
> + WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
> put_pmu_ctx(event->pmu_ctx);
> + }
>
> /*
> - * perf_event_free_task() relies on put_ctx() being 'last', in particular
> - * all task references must be cleaned up.
> + * perf_event_free_task() relies on put_ctx() being 'last', in
> + * particular all task references must be cleaned up.
> */
> if (event->ctx)
> put_ctx(event->ctx);
>
> - exclusive_event_destroy(event);
> - module_put(event->pmu->module);
> + if (event->pmu)
> + module_put(event->pmu->module);
>
> call_rcu(&event->rcu_head, free_event_rcu);
> }
>
> +/* vs perf_event_alloc() success */
> +static void _free_event(struct perf_event *event)
> +{
> + irq_work_sync(&event->pending_irq);
> + irq_work_sync(&event->pending_disable_irq);
> + perf_pending_task_sync(event);
> +
> + unaccount_event(event);
> +
> + security_perf_event_free(event);
> +
> + if (event->rb) {
> + /*
> + * Can happen when we close an event with re-directed output.
> + *
> + * Since we have a 0 refcount, perf_mmap_close() will skip
> + * over us; possibly making our ring_buffer_put() the last.
> + */
> + mutex_lock(&event->mmap_mutex);
> + ring_buffer_attach(event, NULL);
> + mutex_unlock(&event->mmap_mutex);
> + }
> +
> + perf_event_free_bpf_prog(event);
> + perf_addr_filters_splice(event, NULL);
> +
> + __free_event(event);
> +}
> +
> /*
> * Used to free events which have a known refcount of 1, such as in error paths
> * where the event isn't exposed yet and inherited events.
> @@ -11922,8 +11939,10 @@ static int perf_try_init_event(struct pm
> event->destroy(event);
> }
>
> - if (ret)
> + if (ret) {
> + event->pmu = NULL;
> module_put(pmu->module);
> + }
>
> return ret;
> }
> @@ -12251,7 +12270,7 @@ perf_event_alloc(struct perf_event_attr
> * See perf_output_read().
> */
> if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
> - goto err_ns;
> + goto err;
>
> if (!has_branch_stack(event))
> event->attr.branch_sample_type = 0;
> @@ -12259,7 +12278,7 @@ perf_event_alloc(struct perf_event_attr
> pmu = perf_init_event(event);
> if (IS_ERR(pmu)) {
> err = PTR_ERR(pmu);
> - goto err_ns;
> + goto err;
> }
>
> /*
> @@ -12269,24 +12288,24 @@ perf_event_alloc(struct perf_event_attr
> */
> if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
> err = -EINVAL;
> - goto err_pmu;
> + goto err;
> }
>
> if (event->attr.aux_output &&
> !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) {
> err = -EOPNOTSUPP;
> - goto err_pmu;
> + goto err;
> }
>
> if (cgroup_fd != -1) {
> err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
> if (err)
> - goto err_pmu;
> + goto err;
> }
>
> err = exclusive_event_init(event);
> if (err)
> - goto err_pmu;
> + goto err;
>
> if (has_addr_filter(event)) {
> event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
> @@ -12294,7 +12313,7 @@ perf_event_alloc(struct perf_event_attr
> GFP_KERNEL);
> if (!event->addr_filter_ranges) {
> err = -ENOMEM;
> - goto err_per_task;
> + goto err;
> }
>
> /*
> @@ -12319,41 +12338,22 @@ perf_event_alloc(struct perf_event_attr
> if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
> err = get_callchain_buffers(attr->sample_max_stack);
> if (err)
> - goto err_addr_filters;
> + goto err;
> + event->attach_state |= PERF_ATTACH_CALLCHAIN;
> }
> }
>
> err = security_perf_event_alloc(event);
> if (err)
> - goto err_callchain_buffer;
> + goto err;
>
> /* symmetric to unaccount_event() in _free_event() */
> account_event(event);
>
> return event;
>
> -err_callchain_buffer:
> - if (!event->parent) {
> - if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
> - put_callchain_buffers();
> - }
> -err_addr_filters:
> - kfree(event->addr_filter_ranges);
> -
> -err_per_task:
> - exclusive_event_destroy(event);
> -
> -err_pmu:
> - if (is_cgroup_event(event))
> - perf_detach_cgroup(event);
> - if (event->destroy)
> - event->destroy(event);
> - module_put(pmu->module);
> -err_ns:
> - if (event->hw.target)
> - put_task_struct(event->hw.target);
> - call_rcu(&event->rcu_head, free_event_rcu);
> -
> +err:
> + __free_event(event);
> return ERR_PTR(err);
> }
>
Hi Peter Zijlstra,
Greetings!
Using Syzkaller against linux-next (tag: next-20250303), I found two issues; bisection points to the same first bad commit for both:
"
02be310c2d24 perf/core: Simplify the perf_event_alloc() error path
"
Issue 1: WARNING in __unregister_ftrace_function
repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/repro1.c
repro binary:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/repro1
bzImage:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/bzImage_1
dmesg:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/dmesg_1.log
"
[ 25.925933] ------------[ cut here ]------------
[ 25.926631] WARNING: CPU: 1 PID: 729 at kernel/trace/ftrace.c:378 __unregister_ftrace_function+0x2dc/0x410
[ 25.927470] Modules linked in:
[ 25.927743] CPU: 1 UID: 0 PID: 729 Comm: repro Not tainted 6.14.0-rc5-next-20250303-cd3215bbcb9d #1
[ 25.928370] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 25.929147] RIP: 0010:__unregister_ftrace_function+0x2dc/0x410
[ 25.929614] Code: 5f 06 49 81 fc 00 91 ed 87 0f 84 bb 00 00 00 e8 3a 90 fa ff 4c 39 e3 0f 84 b9 00 00 00 4c 89 e3 e9 c0 fd ff ff e8 24 90 fa ff <0f> 0b 41 bc f0 ff ff ff e9 b9 fe ff ff e8 12 90 fa ff be ff ff ff
[ 25.930905] RSP: 0018:ffff888013ecfb10 EFLAGS: 00010293
[ 25.931285] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff818da8fa
[ 25.931782] RDX: ffff88801341ca80 RSI: ffffffff818dab8c RDI: 0000000000000007
[ 25.932281] RBP: ffff888013ecfb30 R08: 0000000000000000 R09: fffffbfff0fdafcc
[ 25.932775] R10: 0000000000000001 R11: 1ffffffff1485d7d R12: 0000000000000000
[ 25.933272] R13: ffff88800dd48d70 R14: ffffffff87e3fc20 R15: 0000000000000000
[ 25.933848] FS: 00007f4f0f1f8600(0000) GS:ffff8880e36a9000(0000) knlGS:0000000000000000
[ 25.934414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 25.934823] CR2: 00007f28ac627120 CR3: 0000000014c3c004 CR4: 0000000000770ef0
[ 25.935325] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 25.935825] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 25.936320] PKRU: 55555554
[ 25.936523] Call Trace:
[ 25.936708] <TASK>
[ 25.936874] ? show_regs+0x6d/0x80
[ 25.937148] ? __warn+0xf3/0x390
[ 25.937402] ? report_bug+0x25e/0x4b0
[ 25.937734] ? __unregister_ftrace_function+0x2dc/0x410
[ 25.938118] ? report_bug+0x2cb/0x4b0
[ 25.938393] ? __unregister_ftrace_function+0x2dc/0x410
[ 25.938773] ? __unregister_ftrace_function+0x2dc/0x410
[ 25.939151] ? handle_bug+0x2cd/0x510
[ 25.939428] ? __unregister_ftrace_function+0x2de/0x410
[ 25.939811] ? exc_invalid_op+0x3c/0x80
[ 25.940098] ? asm_exc_invalid_op+0x1f/0x30
[ 25.940413] ? __unregister_ftrace_function+0x4a/0x410
[ 25.940786] ? __unregister_ftrace_function+0x2dc/0x410
[ 25.941162] ? __unregister_ftrace_function+0x2dc/0x410
[ 25.941550] unregister_ftrace_function+0x52/0x400
[ 25.941937] ? __sanitizer_cov_trace_switch+0x58/0xa0
[ 25.942315] perf_ftrace_event_register+0x1af/0x260
[ 25.942679] perf_trace_destroy+0xa1/0x1d0
[ 25.942986] tp_perf_event_destroy+0x1f/0x30
[ 25.943302] ? __pfx_tp_perf_event_destroy+0x10/0x10
[ 25.943659] __free_event+0x1e2/0x8a0
[ 25.943931] ? __kasan_check_write+0x18/0x20
[ 25.944257] perf_event_alloc.part.0+0x21be/0x3710
[ 25.944619] ? perf_event_alloc.part.0+0xff9/0x3710
[ 25.944980] __do_sys_perf_event_open+0x672/0x2bc0
[ 25.945341] ? __pfx___do_sys_perf_event_open+0x10/0x10
[ 25.945765] ? seqcount_lockdep_reader_access.constprop.0+0xc0/0xd0
[ 25.946220] ? __sanitizer_cov_trace_cmp4+0x1a/0x20
[ 25.946571] ? ktime_get_coarse_real_ts64+0xb6/0x100
[ 25.946942] __x64_sys_perf_event_open+0xc7/0x150
[ 25.947285] ? syscall_trace_enter+0x14d/0x280
[ 25.947615] x64_sys_call+0x1ea2/0x2150
[ 25.947901] do_syscall_64+0x6d/0x140
[ 25.948183] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 25.948547] RIP: 0033:0x7f4f0ee3ee5d
[ 25.948819] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[ 25.950110] RSP: 002b:00007ffe7bb958f8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[ 25.950641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4f0ee3ee5d
[ 25.951143] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020002240
[ 25.951640] RBP: 00007ffe7bb95900 R08: 0000000000000000 R09: 00007ffe7bb95930
[ 25.952138] R10: 00000000ffffffff R11: 0000000000000246 R12: 00007ffe7bb95a58
[ 25.952637] R13: 0000000000401b4f R14: 0000000000403e08 R15: 00007f4f0f241000
[ 25.953154] </TASK>
[ 25.953326] irq event stamp: 851735
[ 25.953596] hardirqs last enabled at (851743): [<ffffffff81664ea5>] __up_console_sem+0x95/0xb0
[ 25.954245] hardirqs last disabled at (851752): [<ffffffff81664e8a>] __up_console_sem+0x7a/0xb0
[ 25.954851] softirqs last enabled at (851498): [<ffffffff8148c93e>] __irq_exit_rcu+0x10e/0x170
[ 25.955456] softirqs last disabled at (851493): [<ffffffff8148c93e>] __irq_exit_rcu+0x10e/0x170
[ 25.956059] ---[ end trace 0000000000000000 ]---
[ 27.413792] ------------[ cut here ]------------
[ 27.414477] WARNING: CPU: 1 PID: 730 at kernel/trace/ftrace.c:378 __unregister_ftrace_function+0x2dc/0x410
[ 27.415894] Modules linked in:
[ 27.416310] CPU: 1 UID: 0 PID: 730 Comm: repro Tainted: G W 6.14.0-rc5-next-20250303-cd3215bbcb9d #1
[ 27.417646] Tainted: [W]=WARN
[ 27.418122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 27.419521] RIP: 0010:__unregister_ftrace_function+0x2dc/0x410
[ 27.420266] Code: 5f 06 49 81 fc 00 91 ed 87 0f 84 bb 00 00 00 e8 3a 90 fa ff 4c 39 e3 0f 84 b9 00 00 00 4c 89 e3 e9 c0 fd ff ff e8 24 90 fa ff <0f> 0b 41 bc f0 ff ff ff e9 b9 fe ff ff e8 12 90 fa ff be ff ff ff
[ 27.422583] RSP: 0018:ffff888012677b10 EFLAGS: 00010293
[ 27.423255] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff818da8fa
[ 27.424141] RDX: ffff88800ed48000 RSI: ffffffff818dab8c RDI: 0000000000000007
[ 27.425024] RBP: ffff888012677b30 R08: 0000000000000000 R09: fffffbfff0fdafcc
[ 27.425829] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[ 27.426546] R13: ffff88801264dc80 R14: ffffffff87e3fc20 R15: 0000000000000000
[ 27.427258] FS: 00007f4f0f1f8600(0000) GS:ffff8880e36a9000(0000) knlGS:0000000000000000
[ 27.427966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.428472] CR2: 0000000020002240 CR3: 000000000ebf6001 CR4: 0000000000770ef0
[ 27.429092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 27.429741] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[ 27.430366] PKRU: 55555554
[ 27.430617] Call Trace:
[ 27.430846] <TASK>
[ 27.431052] ? show_regs+0x6d/0x80
[ 27.431375] ? __warn+0xf3/0x390
[ 27.431697] ? report_bug+0x25e/0x4b0
[ 27.432076] ? __unregister_ftrace_function+0x2dc/0x410
[ 27.432568] ? report_bug+0x2cb/0x4b0
[ 27.432916] ? __unregister_ftrace_function+0x2dc/0x410
[ 27.433387] ? __unregister_ftrace_function+0x2dc/0x410
[ 27.433911] ? handle_bug+0x2cd/0x510
[ 27.434252] ? __unregister_ftrace_function+0x2de/0x410
"
Issue 2: lockdep reports a possible circular locking dependency
repro code:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/repro2.c
repro binary:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/repro2
bzImage:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/bzImage_2
dmesg:
https://github.com/laifryiee/syzkaller_logs/tree/main/250306_simplify_the_perf_event_alloc_error_path/dmesg_2.log
"
[ 22.332967] ------------------------------------------------------
[ 22.332970] repro/738 is trying to acquire lock:
[ 22.332975] ffffffff87050e58 ((console_sem).lock){-...}-{2:2}, at: down_trylock+0x1c/0x80
[ 22.333021]
[ 22.333021] but task is already holding lock:
[ 22.333024] ff11000012331818 (&ctx->lock){....}-{2:2}, at: __perf_install_in_context+0xf8/0xc90
[ 22.333050]
[ 22.333050] which lock already depends on the new lock.
[ 22.333050]
[ 22.333052]
[ 22.333052] the existing dependency chain (in reverse order) is:
[ 22.333055]
[ 22.333055] -> #3 (&ctx->lock){....}-{2:2}:
[ 22.333068] _raw_spin_lock+0x38/0x50
[ 22.333080] __perf_event_task_sched_out+0x466/0x1930
[ 22.333090] __schedule+0x1403/0x3510
[ 22.333098] preempt_schedule_common+0x49/0xd0
[ 22.333106] __cond_resched+0x37/0x50
[ 22.333114] dput.part.0+0x2e/0x9e0
[ 22.333128] dput+0x29/0x40
[ 22.333136] __fput+0x535/0xb70
[ 22.333145] ____fput+0x22/0x30
[ 22.333154] task_work_run+0x19c/0x2b0
[ 22.333166] do_exit+0xb0f/0x2a30
[ 22.333176] do_group_exit+0xe4/0x2c0
[ 22.333185] __x64_sys_exit_group+0x4d/0x60
[ 22.333196] x64_sys_call+0xf81/0x2140
[ 22.333211] do_syscall_64+0x6d/0x140
[ 22.333220] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 22.333235]
[ 22.333235] -> #2 (&rq->__lock){-.-.}-{2:2}:
[ 22.333248] _raw_spin_lock_nested+0x3e/0x60
[ 22.333263] __task_rq_lock+0xe6/0x480
[ 22.333274] wake_up_new_task+0x72d/0xe70
[ 22.333284] kernel_clone+0x203/0x8c0
[ 22.333293] user_mode_thread+0xe0/0x120
[ 22.333302] rest_init+0x2e/0x2b0
[ 22.333314] start_kernel+0x42b/0x560
[ 22.333324] x86_64_start_reservations+0x1c/0x30
[ 22.333339] x86_64_start_kernel+0xa0/0xb0
[ 22.333353] common_startup_64+0x13e/0x141
[ 22.333367]
[ 22.333367] -> #1 (&p->pi_lock){-.-.}-{2:2}:
[ 22.333380] _raw_spin_lock_irqsave+0x52/0x80
[ 22.333396] try_to_wake_up+0xc6/0x1650
[ 22.333413] wake_up_process+0x19/0x20
[ 22.333423] __up.isra.0+0xec/0x130
[ 22.333433] up+0x90/0xc0
[ 22.333443] __up_console_sem+0x8b/0xb0
[ 22.333457] console_unlock+0x1db/0x200
[ 22.333472] con_font_op+0xc6f/0x1090
[ 22.333483] vt_ioctl+0x63a/0x2dc0
[ 22.333496] tty_ioctl+0x7ca/0x1790
[ 22.333508] __x64_sys_ioctl+0x1ba/0x220
[ 22.333520] x64_sys_call+0x1227/0x2140
[ 22.333533] do_syscall_64+0x6d/0x140
[ 22.333542] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 22.333556]
[ 22.333556] -> #0 ((console_sem).lock){-...}-{2:2}:
[ 22.333568] __lock_acquire+0x2ff8/0x5d60
[ 22.333578] lock_acquire+0x1bd/0x550
[ 22.333587] _raw_spin_lock_irqsave+0x52/0x80
[ 22.333603] down_trylock+0x1c/0x80
[ 22.333614] __down_trylock_console_sem+0x4f/0xe0
[ 22.333627] vprintk_emit+0x72b/0x930
[ 22.333641] vprintk_default+0x2f/0x40
[ 22.333656] vprintk+0x6e/0x100
[ 22.333664] _printk+0xc4/0x100
[ 22.333675] __warn_printk+0x131/0x2e0
[ 22.333684] arch_install_hw_breakpoint+0x157/0x400
[ 22.333700] hw_breakpoint_add+0xb0/0x140
[ 22.333716] event_sched_in+0x3eb/0x9e0
[ 22.333729] merge_sched_in+0x877/0x1470
[ 22.333744] visit_groups_merge.constprop.0.isra.0+0x8e8/0x13a0
[ 22.333761] ctx_sched_in+0x5e3/0xa20
[ 22.333776] perf_event_sched_in+0x67/0xa0
[ 22.333791] ctx_resched+0x3a3/0x830
[ 22.333806] __perf_install_in_context+0x49b/0xc90
[ 22.333823] remote_function+0x135/0x1b0
[ 22.333838] generic_exec_single+0x1e5/0x2e0
[ 22.333851] smp_call_function_single+0x196/0x470
[ 22.333864] task_function_call+0x10e/0x1b0
[ 22.333877] perf_install_in_context+0x2eb/0x5a0
[ 22.333889] __do_sys_perf_event_open+0x1915/0x2be0
[ 22.333900] __x64_sys_perf_event_open+0xc7/0x150
[ 22.333910] x64_sys_call+0x1e96/0x2140
[ 22.333924] do_syscall_64+0x6d/0x140
[ 22.333932] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 22.333946]
[ 22.333946] other info that might help us debug this:
[ 22.333946]
[ 22.333949] Chain exists of:
[ 22.333949] (console_sem).lock --> &rq->__lock --> &ctx->lock
[ 22.333949]
[ 22.333963] Possible unsafe locking scenario:
[ 22.333963]
[ 22.333965] CPU0 CPU1
[ 22.333968] ---- ----
[ 22.333971] lock(&ctx->lock);
[ 22.333976] lock(&rq->__lock);
[ 22.333983] lock(&ctx->lock);
[ 22.333989] lock((console_sem).lock);
[ 22.333995]
[ 22.333995] *** DEADLOCK ***
[ 22.333995]
[ 22.333997] 4 locks held by repro/738:
[ 22.334002] #0: ff1100000bd68ca8 (&sig->exec_update_lock){++++}-{4:4}, at: __do_sys_perf_event_open+0x83a/0x2be0
[ 22.334027] #1: ff110000123318a8 (&ctx->mutex){+.+.}-{4:4}, at: __do_sys_perf_event_open+0xd2f/0x2be0
[ 22.334050] #2: ff1100006c83d238 (&cpuctx_lock){....}-{2:2}, at: __perf_install_in_context+0xb7/0xc90
[ 22.334080] #3: ff11000012331818 (&ctx->lock){....}-{2:2}, at: __perf_install_in_context+0xf8/0xc90
[ 22.334109]
[ 22.334109] stack backtrace:
[ 22.334114] CPU: 0 UID: 0 PID: 738 Comm: repro Tainted: G W 6.14.0-rc4-02be310c2d24+ #1
[ 22.334128] Tainted: [W]=WARN
[ 22.334131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/014
[ 22.334138] Call Trace:
[ 22.334140] <TASK>
[ 22.334144] dump_stack_lvl+0xea/0x150
[ 22.334163] dump_stack+0x19/0x20
[ 22.334178] print_circular_bug+0x47f/0x750
[ 22.334199] check_noncircular+0x2f4/0x3e0
[ 22.334217] ? __pfx_check_noncircular+0x10/0x10
[ 22.334233] ? __pfx__prb_read_valid+0x10/0x10
[ 22.334251] ? lockdep_lock+0xd0/0x1d0
[ 22.334265] ? __pfx_lockdep_lock+0x10/0x10
[ 22.334283] __lock_acquire+0x2ff8/0x5d60
[ 22.334301] ? __pfx___lock_acquire+0x10/0x10
[ 22.334312] ? prb_final_commit+0x42/0x60
[ 22.334324] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.334342] ? vprintk_store+0x1c5/0xb20
[ 22.334359] lock_acquire+0x1bd/0x550
[ 22.334369] ? down_trylock+0x1c/0x80
[ 22.334383] ? __pfx_lock_acquire+0x10/0x10
[ 22.334397] ? mark_lock.part.0+0xf3/0x17b0
[ 22.334414] ? kernel_text_address+0xd3/0xe0
[ 22.334426] ? vprintk_default+0x2f/0x40
[ 22.334444] _raw_spin_lock_irqsave+0x52/0x80
[ 22.334460] ? down_trylock+0x1c/0x80
[ 22.334473] down_trylock+0x1c/0x80
[ 22.334484] ? vprintk_default+0x2f/0x40
[ 22.334500] __down_trylock_console_sem+0x4f/0xe0
[ 22.334515] vprintk_emit+0x72b/0x930
[ 22.334533] ? __pfx_vprintk_emit+0x10/0x10
[ 22.334550] ? __kasan_check_read+0x15/0x20
[ 22.334564] vprintk_default+0x2f/0x40
[ 22.334580] vprintk+0x6e/0x100
[ 22.334590] _printk+0xc4/0x100
[ 22.334602] ? __pfx__printk+0x10/0x10
[ 22.334614] ? __kasan_check_read+0x15/0x20
[ 22.334628] ? __warn_printk+0x125/0x2e0
[ 22.334638] ? __warn_printk+0x118/0x2e0
[ 22.334650] __warn_printk+0x131/0x2e0
[ 22.334661] ? __pfx___warn_printk+0x10/0x10
[ 22.334677] ? arch_install_hw_breakpoint+0x144/0x400
[ 22.334692] ? arch_install_hw_breakpoint+0x137/0x400
[ 22.334709] arch_install_hw_breakpoint+0x157/0x400
[ 22.334728] hw_breakpoint_add+0xb0/0x140
[ 22.334744] event_sched_in+0x3eb/0x9e0
[ 22.334762] merge_sched_in+0x877/0x1470
[ 22.334782] visit_groups_merge.constprop.0.isra.0+0x8e8/0x13a0
[ 22.334800] ? perf_event_set_state+0x37f/0x480
[ 22.334815] ? __pfx_visit_groups_merge.constprop.0.isra.0+0x10/0x10
[ 22.334834] ? __this_cpu_preempt_check+0x21/0x30
[ 22.334846] ? lock_is_held_type+0xef/0x150
[ 22.334859] ctx_sched_in+0x5e3/0xa20
[ 22.334875] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 22.334891] ? __pmu_ctx_sched_out+0xfa/0x720
[ 22.334908] ? __pfx_ctx_sched_in+0x10/0x10
[ 22.334930] perf_event_sched_in+0x67/0xa0
[ 22.334947] ctx_resched+0x3a3/0x830
[ 22.334967] __perf_install_in_context+0x49b/0xc90
[ 22.334984] ? __pfx_remote_function+0x10/0x10
[ 22.335001] ? __pfx___perf_install_in_context+0x10/0x10
[ 22.335020] remote_function+0x135/0x1b0
[ 22.335034] ? trace_csd_function_entry+0x6a/0x1b0
[ 22.335047] ? __pfx_remote_function+0x10/0x10
[ 22.335063] generic_exec_single+0x1e5/0x2e0
[ 22.335078] smp_call_function_single+0x196/0x470
[ 22.335092] ? __pfx_remote_function+0x10/0x10
[ 22.335108] ? __pfx_smp_call_function_single+0x10/0x10
[ 22.335123] ? __pfx_remote_function+0x10/0x10
[ 22.335138] ? debug_mutex_init+0x3c/0x80
[ 22.335152] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[ 22.335169] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 22.335188] task_function_call+0x10e/0x1b0
[ 22.335202] ? __pfx_task_function_call+0x10/0x10
[ 22.335217] ? __pfx___perf_install_in_context+0x10/0x10
[ 22.335235] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 22.335252] ? exclusive_event_installable+0x25c/0x330
[ 22.335268] ? lock_is_held_type+0xef/0x150
[ 22.335280] perf_install_in_context+0x2eb/0x5a0
[ 22.335295] ? __pfx_perf_install_in_context+0x10/0x10
[ 22.335308] ? __anon_inode_getfile+0x191/0x370
[ 22.335320] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.335337] ? __perf_event_read_size+0xc7/0xe0
[ 22.335351] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.335368] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.335387] __do_sys_perf_event_open+0x1915/0x2be0
[ 22.335403] ? __pfx___do_sys_perf_event_open+0x10/0x10
[ 22.335430] ? seqcount_lockdep_reader_access.constprop.0+0xc0/0xd0
[ 22.335448] ? __sanitizer_cov_trace_cmp4+0x1a/0x20
[ 22.335464] ? ktime_get_coarse_real_ts64+0xb6/0x100
[ 22.335485] __x64_sys_perf_event_open+0xc7/0x150
[ 22.335497] ? syscall_trace_enter+0x14f/0x280
[ 22.335512] x64_sys_call+0x1e96/0x2140
[ 22.335527] do_syscall_64+0x6d/0x140
[ 22.335537] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 22.335552] RIP: 0033:0x7f2565a3ee5d
[ 22.335560] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c88
[ 22.335570] RSP: 002b:00007ffd557ea4f8 EFLAGS: 00000297 ORIG_RAX: 000000000000012a
[ 22.335580] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2565a3ee5d
[ 22.335586] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020002ac0
[ 22.335593] RBP: 00007ffd557ea510 R08: 0000000000000000 R09: 00007ffd557ea510
[ 22.335599] R10: 00000000ffffffff R11: 0000000000000297 R12: 00007ffd557ea668
[ 22.335606] R13: 0000000000402018 R14: 0000000000404e08 R15: 00007f2565cde000
[ 22.335621] </TASK>
[ 22.436551] Can't find any breakpoint slot
[ 22.436575] WARNING: CPU: 0 PID: 738 at arch/x86/kernel/hw_breakpoint.c:113 arch_install_hw_breakpoint+0x157/0x400
[ 22.437972] Modules linked in:
[ 22.438281] CPU: 0 UID: 0 PID: 738 Comm: repro Tainted: G W 6.14.0-rc4-02be310c2d24+ #1
[ 22.439174] Tainted: [W]=WARN
[ 22.439477] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/014
[ 22.440600] RIP: 0010:arch_install_hw_breakpoint+0x157/0x400
[ 22.441154] Code: ff ff ff 89 de e8 49 8d 57 00 84 db 0f 85 c3 01 00 00 e8 4c 93 57 00 48 c7 c7 60 09 c3 85 c6 05 03 e3 937
[ 22.442901] RSP: 0018:ff1100000bd27648 EFLAGS: 00010092
[ 22.443410] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8146b573
[ 22.444149] RDX: ff11000011fdabc0 RSI: ffffffff8146b580 RDI: 0000000000000001
[ 22.444828] RBP: ff1100000bd27690 R08: 0000000000000001 R09: ffe21c000d905b21
[ 22.445515] R10: 0000000000000000 R11: 3030303030302052 R12: 00000000fffffff0
[ 22.446189] R13: ff1100000ad73758 R14: 000000000002c940 R15: dffffc0000000000
[ 22.446871] FS: 00007f2565c93740(0000) GS:ff1100006c800000(0000) knlGS:0000000000000000
[ 22.447642] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.448252] CR2: 0000000020002ac0 CR3: 000000001104e003 CR4: 0000000000771ef0
[ 22.448939] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 22.449625] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 00000000000006aa
[ 22.450298] PKRU: 55555554
[ 22.450577] Call Trace:
[ 22.450827] <TASK>
[ 22.451047] ? show_regs+0x6d/0x80
[ 22.451394] ? __warn+0xf3/0x390
[ 22.451731] ? find_bug+0x32e/0x4b0
[ 22.452137] ? arch_install_hw_breakpoint+0x157/0x400
[ 22.452643] ? report_bug+0x2cb/0x4b0
[ 22.453017] ? arch_install_hw_breakpoint+0x157/0x400
[ 22.453520] ? arch_install_hw_breakpoint+0x158/0x400
[ 22.454015] ? handle_bug+0xf1/0x190
[ 22.454376] ? exc_invalid_op+0x3c/0x80
[ 22.454766] ? asm_exc_invalid_op+0x1f/0x30
[ 22.455184] ? __warn_printk+0x173/0x2e0
[ 22.455578] ? __warn_printk+0x180/0x2e0
[ 22.456021] ? arch_install_hw_breakpoint+0x157/0x400
[ 22.456524] ? arch_install_hw_breakpoint+0x157/0x400
[ 22.457027] hw_breakpoint_add+0xb0/0x140
[ 22.457434] event_sched_in+0x3eb/0x9e0
[ 22.457820] merge_sched_in+0x877/0x1470
[ 22.458221] visit_groups_merge.constprop.0.isra.0+0x8e8/0x13a0
[ 22.458805] ? perf_event_set_state+0x37f/0x480
[ 22.459257] ? __pfx_visit_groups_merge.constprop.0.isra.0+0x10/0x10
[ 22.459876] ? __this_cpu_preempt_check+0x21/0x30
[ 22.460380] ? lock_is_held_type+0xef/0x150
[ 22.460797] ctx_sched_in+0x5e3/0xa20
[ 22.461169] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 22.461705] ? __pmu_ctx_sched_out+0xfa/0x720
[ 22.462149] ? __pfx_ctx_sched_in+0x10/0x10
[ 22.462578] perf_event_sched_in+0x67/0xa0
[ 22.462988] ctx_resched+0x3a3/0x830
[ 22.463355] __perf_install_in_context+0x49b/0xc90
[ 22.463839] ? __pfx_remote_function+0x10/0x10
[ 22.464342] ? __pfx___perf_install_in_context+0x10/0x10
[ 22.464875] remote_function+0x135/0x1b0
[ 22.465266] ? trace_csd_function_entry+0x6a/0x1b0
[ 22.465743] ? __pfx_remote_function+0x10/0x10
[ 22.466189] generic_exec_single+0x1e5/0x2e0
[ 22.466621] smp_call_function_single+0x196/0x470
[ 22.467085] ? __pfx_remote_function+0x10/0x10
[ 22.467534] ? __pfx_smp_call_function_single+0x10/0x10
[ 22.468108] ? __pfx_remote_function+0x10/0x10
[ 22.468553] ? debug_mutex_init+0x3c/0x80
[ 22.468952] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[ 22.469485] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 22.470016] task_function_call+0x10e/0x1b0
[ 22.470437] ? __pfx_task_function_call+0x10/0x10
[ 22.470900] ? __pfx___perf_install_in_context+0x10/0x10
[ 22.471425] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[ 22.472019] ? exclusive_event_installable+0x25c/0x330
[ 22.472538] ? lock_is_held_type+0xef/0x150
[ 22.472956] perf_install_in_context+0x2eb/0x5a0
[ 22.473416] ? __pfx_perf_install_in_context+0x10/0x10
[ 22.473919] ? __anon_inode_getfile+0x191/0x370
[ 22.474363] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.474903] ? __perf_event_read_size+0xc7/0xe0
[ 22.475355] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.475948] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 22.476488] __do_sys_perf_event_open+0x1915/0x2be0
[ 22.476969] ? __pfx___do_sys_perf_event_open+0x10/0x10
[ 22.477489] ? seqcount_lockdep_reader_access.constprop.0+0xc0/0xd0
[ 22.478095] ? __sanitizer_cov_trace_cmp4+0x1a/0x20
[ 22.478583] ? ktime_get_coarse_real_ts64+0xb6/0x100
[ 22.479076] __x64_sys_perf_event_open+0xc7/0x150
[ 22.479544] ? syscall_trace_enter+0x14f/0x280
[ 22.480059] x64_sys_call+0x1e96/0x2140
[ 22.480450] do_syscall_64+0x6d/0x140
[ 22.480816] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 22.481312] RIP: 0033:0x7f2565a3ee5d
[ 22.481678] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c88
[ 22.483409] RSP: 002b:00007ffd557ea4f8 EFLAGS: 00000297 ORIG_RAX: 000000000000012a
[ 22.484192] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2565a3ee5d
[ 22.484876] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020002ac0
[ 22.485560] RBP: 00007ffd557ea510 R08: 0000000000000000 R09: 00007ffd557ea510
[ 22.486243] R10: 00000000ffffffff R11: 0000000000000297 R12: 00007ffd557ea668
[ 22.486925] R13: 0000000000402018 R14: 0000000000404e08 R15: 00007f2565cde000
[ 22.487612] </TASK>
[ 22.487840] irq event stamp: 496
[ 22.488224] hardirqs last enabled at (495): [<ffffffff81f86bac>] mod_objcg_state+0x42c/0x9c0
[ 22.489037] hardirqs last disabled at (496): [<ffffffff817bbd75>] generic_exec_single+0x1d5/0x2e0
[ 22.489891] softirqs last enabled at (0): [<ffffffff81463afe>] copy_process+0x1d4e/0x6a40
[ 22.490683] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 22.491278] ---[ end trace 0000000000000000 ]---
"
Hope this could be insightful to you.
Regards,
Yi Lai
---
If you don't need the following environment to reproduce the problem, or if you
already have a reproduction environment set up, please ignore the following information.
How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
// start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
// You could change the bzImage_xxx as you want
// Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You can use the command below to log in; there is no password for root.
ssh -p 10023 root@localhost
After logging in to the VM (virtual machine) successfully, you can transfer the
reproducer binary to the VM as below, and reproduce the problem in the VM:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
To build the bzImage for the target kernel:
Please use the target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage // x should be equal to or less than the number of CPUs your PC has
Point start3.sh above at the resulting bzImage file to load the target kernel in the VM.
Tips:
If you already have qemu-system-x86_64, please ignore the info below.
To install qemu v7.1.0:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install
>
>
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 04/19] perf: Simplify perf_event_alloc() error path
2025-03-06 7:57 ` [PATCH 04/19] perf: Simplify " Lai, Yi
@ 2025-03-06 9:24 ` Ingo Molnar
2025-03-07 3:16 ` Lai, Yi
0 siblings, 1 reply; 85+ messages in thread
From: Ingo Molnar @ 2025-03-06 9:24 UTC (permalink / raw)
To: Lai, Yi
Cc: Peter Zijlstra, lucas.demarchi, linux-kernel, willy, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang
* Lai, Yi <yi1.lai@linux.intel.com> wrote:
> Hi Peter Zijlstra ,
>
> Greetings!
>
> I used Syzkaller and found that in linux-next (tag: next-20250303), there are two issues and the first bad commit for both issues is
>
> "
> 02be310c2d24 perf/core: Simplify the perf_event_alloc() error path
> "
We've had a number of fixes in this area, could you please check
whether you can reproduce this crash with the latest perf tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
Thanks!
Ingo
* Re: [PATCH 04/19] perf: Simplify perf_event_alloc() error path
2025-03-06 9:24 ` Ingo Molnar
@ 2025-03-07 3:16 ` Lai, Yi
2025-03-07 11:33 ` Ingo Molnar
0 siblings, 1 reply; 85+ messages in thread
From: Lai, Yi @ 2025-03-07 3:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, lucas.demarchi, linux-kernel, willy, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, yi1.lai
On Thu, Mar 06, 2025 at 10:24:35AM +0100, Ingo Molnar wrote:
>
> * Lai, Yi <yi1.lai@linux.intel.com> wrote:
>
> > Hi Peter Zijlstra ,
> >
> > Greetings!
> >
> > I used Syzkaller and found that in linux-next (tag: next-20250303), there are two issues and the first bad commit for both issues is
> >
> > "
> > 02be310c2d24 perf/core: Simplify the perf_event_alloc() error path
> > "
>
> We've had a number of fixes in this area, could you please check
> whether you can reproduce this crash with the latest perf tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
>
Tested the tip tree perf/core branch - HEAD 7a310c644cf571fbdb1d447a1dc39cf048634589:
Above two issues cannot be reproduced.
Regards,
Yi Lai
> Thanks!
>
> Ingo
* Re: [PATCH 04/19] perf: Simplify perf_event_alloc() error path
2025-03-07 3:16 ` Lai, Yi
@ 2025-03-07 11:33 ` Ingo Molnar
0 siblings, 0 replies; 85+ messages in thread
From: Ingo Molnar @ 2025-03-07 11:33 UTC (permalink / raw)
To: Lai, Yi
Cc: Peter Zijlstra, lucas.demarchi, linux-kernel, willy, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, yi1.lai
* Lai, Yi <yi1.lai@linux.intel.com> wrote:
> On Thu, Mar 06, 2025 at 10:24:35AM +0100, Ingo Molnar wrote:
> >
> > * Lai, Yi <yi1.lai@linux.intel.com> wrote:
> >
> > > Hi Peter Zijlstra ,
> > >
> > > Greetings!
> > >
> > > I used Syzkaller and found that in linux-next (tag: next-20250303), there are two issues and the first bad commit for both issues is
> > >
> > > "
> > > 02be310c2d24 perf/core: Simplify the perf_event_alloc() error path
> > > "
> >
> > We've had a number of fixes in this area, could you please check
> > whether you can reproduce this crash with the latest perf tree:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
> >
> Tested the tip tree perf/core branch - HEAD 7a310c644cf571fbdb1d447a1dc39cf048634589:
>
> Above two issues cannot be reproduced.
Thank you!
Ingo
* [PATCH 05/19] perf: Simplify perf_pmu_register() error path
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (3 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 04/19] perf: Simplify perf_event_alloc() error path Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
` (15 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
The error path of perf_pmu_register() is of course very similar to a
subset of perf_pmu_unregister(). Extract this common part in
perf_pmu_free() and simplify things.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 66 ++++++++++++++++++++++-----------------------------
1 file changed, 29 insertions(+), 37 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11540,11 +11540,6 @@ static int perf_event_idx_default(struct
return 0;
}
-static void free_pmu_context(struct pmu *pmu)
-{
- free_percpu(pmu->cpu_pmu_context);
-}
-
/*
* Let userspace know that this PMU supports address range filtering:
*/
@@ -11771,25 +11766,38 @@ static bool idr_cmpxchg(struct idr *idr,
return true;
}
+static void perf_pmu_free(struct pmu *pmu)
+{
+ free_percpu(pmu->pmu_disable_count);
+ if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
+ if (pmu->nr_addr_filters)
+ device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+ device_del(pmu->dev);
+ put_device(pmu->dev);
+ }
+ free_percpu(pmu->cpu_pmu_context);
+}
+
int perf_pmu_register(struct pmu *pmu, const char *name, int type)
{
int cpu, ret, max = PERF_TYPE_MAX;
+ pmu->type = -1;
+
mutex_lock(&pmus_lock);
ret = -ENOMEM;
pmu->pmu_disable_count = alloc_percpu(int);
if (!pmu->pmu_disable_count)
goto unlock;
- pmu->type = -1;
if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
ret = -EINVAL;
- goto free_pdc;
+ goto free;
}
if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
ret = -EINVAL;
- goto free_pdc;
+ goto free;
}
pmu->name = name;
@@ -11799,24 +11807,23 @@ int perf_pmu_register(struct pmu *pmu, c
ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
if (ret < 0)
- goto free_pdc;
+ goto free;
WARN_ON(type >= 0 && ret != type);
- type = ret;
- pmu->type = type;
+ pmu->type = ret;
atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
ret = pmu_dev_alloc(pmu);
if (ret)
- goto free_idr;
+ goto free;
}
ret = -ENOMEM;
pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
if (!pmu->cpu_pmu_context)
- goto free_dev;
+ goto free;
for_each_possible_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
@@ -11857,8 +11864,10 @@ int perf_pmu_register(struct pmu *pmu, c
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
- if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
- goto free_context;
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu)) {
+ ret = -EINVAL;
+ goto free;
+ }
list_add_rcu(&pmu->entry, &pmus);
ret = 0;
@@ -11867,20 +11876,10 @@ int perf_pmu_register(struct pmu *pmu, c
return ret;
-free_context:
- free_percpu(pmu->cpu_pmu_context);
-
-free_dev:
- if (pmu->dev && pmu->dev != PMU_NULL_DEV) {
- device_del(pmu->dev);
- put_device(pmu->dev);
- }
-
-free_idr:
- idr_remove(&pmu_idr, pmu->type);
-
-free_pdc:
- free_percpu(pmu->pmu_disable_count);
+free:
+ if (pmu->type >= 0)
+ idr_remove(&pmu_idr, pmu->type);
+ perf_pmu_free(pmu);
goto unlock;
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
@@ -11899,14 +11898,7 @@ void perf_pmu_unregister(struct pmu *pmu
synchronize_srcu(&pmus_srcu);
synchronize_rcu();
- free_percpu(pmu->pmu_disable_count);
- if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
- if (pmu->nr_addr_filters)
- device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
- device_del(pmu->dev);
- put_device(pmu->dev);
- }
- free_pmu_context(pmu);
+ perf_pmu_free(pmu);
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
* [tip: perf/core] perf/core: Simplify the perf_pmu_register() error path
2024-11-04 13:39 ` [PATCH 05/19] perf: Simplify perf_pmu_register() " Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: e6b17cfd528d141b69b5a1b948f6bf619c922bf4
Gitweb: https://git.kernel.org/tip/e6b17cfd528d141b69b5a1b948f6bf619c922bf4
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:14 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:54:05 +01:00
perf/core: Simplify the perf_pmu_register() error path
The error path of perf_pmu_register() is of course very similar to a
subset of perf_pmu_unregister(). Extract this common part in
perf_pmu_free() and simplify things.
[ mingo: Forward ported it ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.090915501@infradead.org
---
kernel/events/core.c | 67 +++++++++++++++++++------------------------
1 file changed, 30 insertions(+), 37 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1b8b1c8..ee5cdd6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11675,11 +11675,6 @@ static int perf_event_idx_default(struct perf_event *event)
return 0;
}
-static void free_pmu_context(struct pmu *pmu)
-{
- free_percpu(pmu->cpu_pmu_context);
-}
-
/*
* Let userspace know that this PMU supports address range filtering:
*/
@@ -11885,6 +11880,7 @@ del_dev:
free_dev:
put_device(pmu->dev);
+ pmu->dev = NULL;
goto out;
}
@@ -11906,25 +11902,38 @@ static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
return true;
}
+static void perf_pmu_free(struct pmu *pmu)
+{
+ free_percpu(pmu->pmu_disable_count);
+ if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
+ if (pmu->nr_addr_filters)
+ device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+ device_del(pmu->dev);
+ put_device(pmu->dev);
+ }
+ free_percpu(pmu->cpu_pmu_context);
+}
+
int perf_pmu_register(struct pmu *pmu, const char *name, int type)
{
int cpu, ret, max = PERF_TYPE_MAX;
+ pmu->type = -1;
+
mutex_lock(&pmus_lock);
ret = -ENOMEM;
pmu->pmu_disable_count = alloc_percpu(int);
if (!pmu->pmu_disable_count)
goto unlock;
- pmu->type = -1;
if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
ret = -EINVAL;
- goto free_pdc;
+ goto free;
}
if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
ret = -EINVAL;
- goto free_pdc;
+ goto free;
}
pmu->name = name;
@@ -11934,24 +11943,23 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
if (ret < 0)
- goto free_pdc;
+ goto free;
WARN_ON(type >= 0 && ret != type);
- type = ret;
- pmu->type = type;
+ pmu->type = ret;
atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
ret = pmu_dev_alloc(pmu);
if (ret)
- goto free_idr;
+ goto free;
}
ret = -ENOMEM;
pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
if (!pmu->cpu_pmu_context)
- goto free_dev;
+ goto free;
for_each_possible_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
@@ -11992,8 +12000,10 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
- if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
- goto free_context;
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu)) {
+ ret = -EINVAL;
+ goto free;
+ }
list_add_rcu(&pmu->entry, &pmus);
ret = 0;
@@ -12002,20 +12012,10 @@ unlock:
return ret;
-free_context:
- free_percpu(pmu->cpu_pmu_context);
-
-free_dev:
- if (pmu->dev && pmu->dev != PMU_NULL_DEV) {
- device_del(pmu->dev);
- put_device(pmu->dev);
- }
-
-free_idr:
- idr_remove(&pmu_idr, pmu->type);
-
-free_pdc:
- free_percpu(pmu->pmu_disable_count);
+free:
+ if (pmu->type >= 0)
+ idr_remove(&pmu_idr, pmu->type);
+ perf_pmu_free(pmu);
goto unlock;
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
@@ -12034,14 +12034,7 @@ void perf_pmu_unregister(struct pmu *pmu)
synchronize_srcu(&pmus_srcu);
synchronize_rcu();
- free_percpu(pmu->pmu_disable_count);
- if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
- if (pmu->nr_addr_filters)
- device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
- device_del(pmu->dev);
- put_device(pmu->dev);
- }
- free_pmu_context(pmu);
+ perf_pmu_free(pmu);
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
* [tip: perf/core] perf/core: Simplify the perf_pmu_register() error path
2024-11-04 13:39 ` [PATCH 05/19] perf: Simplify perf_pmu_register() " Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 8f4c4963d28349cbf1920ab71edea8276f6ac4c5
Gitweb: https://git.kernel.org/tip/8f4c4963d28349cbf1920ab71edea8276f6ac4c5
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:14 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:26 +01:00
perf/core: Simplify the perf_pmu_register() error path
The error path of perf_pmu_register() is of course very similar to a
subset of perf_pmu_unregister(). Extract this common part in
perf_pmu_free() and simplify things.
[ mingo: Forward ported it ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.090915501@infradead.org
---
kernel/events/core.c | 67 +++++++++++++++++++------------------------
1 file changed, 30 insertions(+), 37 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1b8b1c8..ee5cdd6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11675,11 +11675,6 @@ static int perf_event_idx_default(struct perf_event *event)
return 0;
}
-static void free_pmu_context(struct pmu *pmu)
-{
- free_percpu(pmu->cpu_pmu_context);
-}
-
/*
* Let userspace know that this PMU supports address range filtering:
*/
@@ -11885,6 +11880,7 @@ del_dev:
free_dev:
put_device(pmu->dev);
+ pmu->dev = NULL;
goto out;
}
@@ -11906,25 +11902,38 @@ static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
return true;
}
+static void perf_pmu_free(struct pmu *pmu)
+{
+ free_percpu(pmu->pmu_disable_count);
+ if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
+ if (pmu->nr_addr_filters)
+ device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
+ device_del(pmu->dev);
+ put_device(pmu->dev);
+ }
+ free_percpu(pmu->cpu_pmu_context);
+}
+
int perf_pmu_register(struct pmu *pmu, const char *name, int type)
{
int cpu, ret, max = PERF_TYPE_MAX;
+ pmu->type = -1;
+
mutex_lock(&pmus_lock);
ret = -ENOMEM;
pmu->pmu_disable_count = alloc_percpu(int);
if (!pmu->pmu_disable_count)
goto unlock;
- pmu->type = -1;
if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
ret = -EINVAL;
- goto free_pdc;
+ goto free;
}
if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
ret = -EINVAL;
- goto free_pdc;
+ goto free;
}
pmu->name = name;
@@ -11934,24 +11943,23 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
if (ret < 0)
- goto free_pdc;
+ goto free;
WARN_ON(type >= 0 && ret != type);
- type = ret;
- pmu->type = type;
+ pmu->type = ret;
atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
ret = pmu_dev_alloc(pmu);
if (ret)
- goto free_idr;
+ goto free;
}
ret = -ENOMEM;
pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
if (!pmu->cpu_pmu_context)
- goto free_dev;
+ goto free;
for_each_possible_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
@@ -11992,8 +12000,10 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
- if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
- goto free_context;
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu)) {
+ ret = -EINVAL;
+ goto free;
+ }
list_add_rcu(&pmu->entry, &pmus);
ret = 0;
@@ -12002,20 +12012,10 @@ unlock:
return ret;
-free_context:
- free_percpu(pmu->cpu_pmu_context);
-
-free_dev:
- if (pmu->dev && pmu->dev != PMU_NULL_DEV) {
- device_del(pmu->dev);
- put_device(pmu->dev);
- }
-
-free_idr:
- idr_remove(&pmu_idr, pmu->type);
-
-free_pdc:
- free_percpu(pmu->pmu_disable_count);
+free:
+ if (pmu->type >= 0)
+ idr_remove(&pmu_idr, pmu->type);
+ perf_pmu_free(pmu);
goto unlock;
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
@@ -12034,14 +12034,7 @@ void perf_pmu_unregister(struct pmu *pmu)
synchronize_srcu(&pmus_srcu);
synchronize_rcu();
- free_percpu(pmu->pmu_disable_count);
- if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
- if (pmu->nr_addr_filters)
- device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
- device_del(pmu->dev);
- put_device(pmu->dev);
- }
- free_pmu_context(pmu);
+ perf_pmu_free(pmu);
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
^ permalink raw reply related [flat|nested] 85+ messages in thread
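[Editorial note] The patch above hinges on one invariant: `perf_pmu_free()` must be safe to call on a *partially* initialized pmu, with `pmu->type = -1` acting as the sentinel that tells the single `free:` label whether the IDR id needs undoing. A minimal userspace sketch of that error-path shape (all names here are hypothetical stand-ins, not kernel APIs; compiles with any C99 compiler):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pmu: any subset of these fields
 * may have been set up by the time the error path runs. */
struct fake_pmu {
	int *disable_count;	/* stands in for pmu->pmu_disable_count */
	int  type;		/* -1 until an id has been allocated   */
	int *cpu_ctx;		/* stands in for pmu->cpu_pmu_context  */
};

static int ids_in_use;	/* stands in for pmu_idr occupancy */

static int  fake_idr_alloc(void)    { return ids_in_use++; }
static void fake_idr_remove(int id) { (void)id; ids_in_use--; }

/* One free routine, safe on any partially initialized fake_pmu:
 * free(NULL) is a no-op, so untouched fields cost nothing. */
static void fake_pmu_free(struct fake_pmu *p)
{
	free(p->disable_count);
	p->disable_count = NULL;
	free(p->cpu_ctx);
	p->cpu_ctx = NULL;
}

/* Register with a single error label, mirroring the patch above;
 * fail_step forces a failure at a given point for demonstration. */
static int fake_pmu_register(struct fake_pmu *p, int fail_step)
{
	p->type = -1;			/* sentinel: no id allocated yet */

	p->disable_count = malloc(sizeof(int));
	if (!p->disable_count || fail_step == 1)
		goto free;

	p->type = fake_idr_alloc();
	if (fail_step == 2)
		goto free;

	p->cpu_ctx = malloc(sizeof(int));
	if (!p->cpu_ctx || fail_step == 3)
		goto free;

	return 0;
free:
	if (p->type >= 0) {		/* only undo the id if we got one */
		fake_idr_remove(p->type);
		p->type = -1;
	}
	fake_pmu_free(p);
	return -1;
}

static int demo(void)
{
	struct fake_pmu p = { 0 };
	if (fake_pmu_register(&p, 2) != -1)	/* fail after idr_alloc */
		return 1;
	if (ids_in_use != 0)			/* id must be released  */
		return 2;
	if (fake_pmu_register(&p, 0) != 0)	/* clean run still works */
		return 3;
	fake_idr_remove(p.type);
	fake_pmu_free(&p);
	return 0;
}
```

The `type >= 0` check is the load-bearing detail: it replaces the old chain of `free_context`/`free_dev`/`free_idr`/`free_pdc` labels with one unwind path that inspects state instead of relying on label position.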
* [PATCH 06/19] perf: Simplify perf_pmu_register()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (4 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 05/19] perf: Simplify perf_pmu_register() " Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2024-11-20 13:06 ` Ravi Bangoria
` (3 more replies)
2024-11-04 13:39 ` [PATCH 07/19] perf: Simplify perf_init_event() Peter Zijlstra
` (14 subsequent siblings)
20 siblings, 4 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Using the previously introduced perf_pmu_free() and a new IDR helper,
simplify the perf_pmu_register error paths.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/idr.h | 17 ++++++++++++
kernel/events/core.c | 71 ++++++++++++++++++++-------------------------------
2 files changed, 46 insertions(+), 42 deletions(-)
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -15,6 +15,7 @@
#include <linux/radix-tree.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
+#include <linux/cleanup.h>
struct idr {
struct radix_tree_root idr_rt;
@@ -124,6 +125,22 @@ void *idr_get_next_ul(struct idr *, unsi
void *idr_replace(struct idr *, void *, unsigned long id);
void idr_destroy(struct idr *);
+struct __class_idr {
+ struct idr *idr;
+ int id;
+};
+
+#define idr_null ((struct __class_idr){ NULL, -1 })
+#define take_idr_id(id) __get_and_null(id, idr_null)
+
+DEFINE_CLASS(idr_alloc, struct __class_idr,
+ if (_T.id >= 0) idr_remove(_T.idr, _T.id),
+ ((struct __class_idr){
+ .idr = idr,
+ .id = idr_alloc(idr, ptr, start, end, gfp),
+ }),
+ struct idr *idr, void *ptr, int start, int end, gfp_t gfp);
+
/**
* idr_init_base() - Initialise an IDR.
* @idr: IDR handle.
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11778,52 +11778,49 @@ static void perf_pmu_free(struct pmu *pm
free_percpu(pmu->cpu_pmu_context);
}
-int perf_pmu_register(struct pmu *pmu, const char *name, int type)
+DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
+
+int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
{
- int cpu, ret, max = PERF_TYPE_MAX;
+ int cpu, max = PERF_TYPE_MAX;
- pmu->type = -1;
+ struct pmu *pmu __free(pmu_unregister) = _pmu;
+ guard(mutex)(&pmus_lock);
- mutex_lock(&pmus_lock);
- ret = -ENOMEM;
pmu->pmu_disable_count = alloc_percpu(int);
if (!pmu->pmu_disable_count)
- goto unlock;
+ return -ENOMEM;
- if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
- ret = -EINVAL;
- goto free;
- }
+ if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
+ return -EINVAL;
- if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
- ret = -EINVAL;
- goto free;
- }
+ if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE,
+ "Can not register a pmu with an invalid scope.\n"))
+ return -EINVAL;
pmu->name = name;
if (type >= 0)
max = type;
- ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
- if (ret < 0)
- goto free;
+ CLASS(idr_alloc, pmu_type)(&pmu_idr, NULL, max, 0, GFP_KERNEL);
+ if (pmu_type.id < 0)
+ return pmu_type.id;
- WARN_ON(type >= 0 && ret != type);
+ WARN_ON(type >= 0 && pmu_type.id != type);
- pmu->type = ret;
+ pmu->type = pmu_type.id;
atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
- ret = pmu_dev_alloc(pmu);
+ int ret = pmu_dev_alloc(pmu);
if (ret)
- goto free;
+ return ret;
}
- ret = -ENOMEM;
pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
if (!pmu->cpu_pmu_context)
- goto free;
+ return -ENOMEM;
for_each_possible_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
@@ -11864,32 +11861,22 @@ int perf_pmu_register(struct pmu *pmu, c
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
- if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu)) {
- ret = -EINVAL;
- goto free;
- }
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
+ return -EINVAL;
list_add_rcu(&pmu->entry, &pmus);
- ret = 0;
-unlock:
- mutex_unlock(&pmus_lock);
-
- return ret;
-
-free:
- if (pmu->type >= 0)
- idr_remove(&pmu_idr, pmu->type);
- perf_pmu_free(pmu);
- goto unlock;
+ take_idr_id(pmu_type);
+ _pmu = no_free_ptr(pmu); // let it rip
+ return 0;
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
void perf_pmu_unregister(struct pmu *pmu)
{
- mutex_lock(&pmus_lock);
- list_del_rcu(&pmu->entry);
- idr_remove(&pmu_idr, pmu->type);
- mutex_unlock(&pmus_lock);
+ scoped_guard (mutex, &pmus_lock) {
+ list_del_rcu(&pmu->entry);
+ idr_remove(&pmu_idr, pmu->type);
+ }
/*
* We dereference the pmu list under both SRCU and regular RCU, so
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 06/19] perf: Simplify perf_pmu_register()
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
@ 2024-11-20 13:06 ` Ravi Bangoria
2024-11-20 14:46 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
` (2 subsequent siblings)
3 siblings, 1 reply; 85+ messages in thread
From: Ravi Bangoria @ 2024-11-20 13:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang, Ravi Bangoria
Hi Peter,
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11778,52 +11778,49 @@ static void perf_pmu_free(struct pmu *pm
> free_percpu(pmu->cpu_pmu_context);
> }
>
> -int perf_pmu_register(struct pmu *pmu, const char *name, int type)
> +DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
> +
> +int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
> {
> - int cpu, ret, max = PERF_TYPE_MAX;
> + int cpu, max = PERF_TYPE_MAX;
>
> - pmu->type = -1;
> + struct pmu *pmu __free(pmu_unregister) = _pmu;
> + guard(mutex)(&pmus_lock);
>
> - mutex_lock(&pmus_lock);
> - ret = -ENOMEM;
> pmu->pmu_disable_count = alloc_percpu(int);
> if (!pmu->pmu_disable_count)
> - goto unlock;
> + return -ENOMEM;
>
> - if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
> - ret = -EINVAL;
> - goto free;
> - }
> + if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
> + return -EINVAL;
>
> - if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
> - ret = -EINVAL;
> - goto free;
> - }
> + if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE,
> + "Can not register a pmu with an invalid scope.\n"))
> + return -EINVAL;
>
> pmu->name = name;
>
> if (type >= 0)
> max = type;
>
> - ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
> - if (ret < 0)
> - goto free;
> + CLASS(idr_alloc, pmu_type)(&pmu_idr, NULL, max, 0, GFP_KERNEL);
> + if (pmu_type.id < 0)
> + return pmu_type.id;
>
> - WARN_ON(type >= 0 && ret != type);
> + WARN_ON(type >= 0 && pmu_type.id != type);
>
> - pmu->type = ret;
> + pmu->type = pmu_type.id;
> atomic_set(&pmu->exclusive_cnt, 0);
>
> if (pmu_bus_running && !pmu->dev) {
> - ret = pmu_dev_alloc(pmu);
> + int ret = pmu_dev_alloc(pmu);
> if (ret)
> - goto free;
> + return ret;
pmu_dev_alloc() can fail before or in device_add(). perf_pmu_free() should
not call device_del() for such cases. No?
Thanks,
Ravi
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 06/19] perf: Simplify perf_pmu_register()
2024-11-20 13:06 ` Ravi Bangoria
@ 2024-11-20 14:46 ` Peter Zijlstra
2024-11-20 15:53 ` Ravi Bangoria
0 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-20 14:46 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang
On Wed, Nov 20, 2024 at 06:36:55PM +0530, Ravi Bangoria wrote:
> Hi Peter,
>
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -11778,52 +11778,49 @@ static void perf_pmu_free(struct pmu *pm
> > free_percpu(pmu->cpu_pmu_context);
> > }
> >
> > -int perf_pmu_register(struct pmu *pmu, const char *name, int type)
> > +DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
> > +
> > +int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
> > {
> > - int cpu, ret, max = PERF_TYPE_MAX;
> > + int cpu, max = PERF_TYPE_MAX;
> >
> > - pmu->type = -1;
> > + struct pmu *pmu __free(pmu_unregister) = _pmu;
> > + guard(mutex)(&pmus_lock);
> >
> > - mutex_lock(&pmus_lock);
> > - ret = -ENOMEM;
> > pmu->pmu_disable_count = alloc_percpu(int);
> > if (!pmu->pmu_disable_count)
> > - goto unlock;
> > + return -ENOMEM;
> >
> > - if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
> > - ret = -EINVAL;
> > - goto free;
> > - }
> > + if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
> > + return -EINVAL;
> >
> > - if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
> > - ret = -EINVAL;
> > - goto free;
> > - }
> > + if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE,
> > + "Can not register a pmu with an invalid scope.\n"))
> > + return -EINVAL;
> >
> > pmu->name = name;
> >
> > if (type >= 0)
> > max = type;
> >
> > - ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
> > - if (ret < 0)
> > - goto free;
> > + CLASS(idr_alloc, pmu_type)(&pmu_idr, NULL, max, 0, GFP_KERNEL);
> > + if (pmu_type.id < 0)
> > + return pmu_type.id;
> >
> > - WARN_ON(type >= 0 && ret != type);
> > + WARN_ON(type >= 0 && pmu_type.id != type);
> >
> > - pmu->type = ret;
> > + pmu->type = pmu_type.id;
> > atomic_set(&pmu->exclusive_cnt, 0);
> >
> > if (pmu_bus_running && !pmu->dev) {
> > - ret = pmu_dev_alloc(pmu);
> > + int ret = pmu_dev_alloc(pmu);
> > if (ret)
> > - goto free;
> > + return ret;
>
> pmu_dev_alloc() can fail before or in device_add(). perf_pmu_free() should
> not call device_del() for such cases. No?
Right you are -- but is this not introduced in the previous patch?
Also, this should cure things, no?
---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11810,6 +11810,7 @@ static int pmu_dev_alloc(struct pmu *pmu
free_dev:
put_device(pmu->dev);
+ pmu->dev = NULL;
goto out;
}
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 06/19] perf: Simplify perf_pmu_register()
2024-11-20 14:46 ` Peter Zijlstra
@ 2024-11-20 15:53 ` Ravi Bangoria
0 siblings, 0 replies; 85+ messages in thread
From: Ravi Bangoria @ 2024-11-20 15:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang, Ravi Bangoria
On 20-Nov-24 8:16 PM, Peter Zijlstra wrote:
> On Wed, Nov 20, 2024 at 06:36:55PM +0530, Ravi Bangoria wrote:
>> Hi Peter,
>>
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -11778,52 +11778,49 @@ static void perf_pmu_free(struct pmu *pm
>>> free_percpu(pmu->cpu_pmu_context);
>>> }
>>>
>>> -int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>>> +DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
>>> +
>>> +int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
>>> {
>>> - int cpu, ret, max = PERF_TYPE_MAX;
>>> + int cpu, max = PERF_TYPE_MAX;
>>>
>>> - pmu->type = -1;
>>> + struct pmu *pmu __free(pmu_unregister) = _pmu;
>>> + guard(mutex)(&pmus_lock);
>>>
>>> - mutex_lock(&pmus_lock);
>>> - ret = -ENOMEM;
>>> pmu->pmu_disable_count = alloc_percpu(int);
>>> if (!pmu->pmu_disable_count)
>>> - goto unlock;
>>> + return -ENOMEM;
>>>
>>> - if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
>>> - ret = -EINVAL;
>>> - goto free;
>>> - }
>>> + if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
>>> + return -EINVAL;
>>>
>>> - if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
>>> - ret = -EINVAL;
>>> - goto free;
>>> - }
>>> + if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE,
>>> + "Can not register a pmu with an invalid scope.\n"))
>>> + return -EINVAL;
>>>
>>> pmu->name = name;
>>>
>>> if (type >= 0)
>>> max = type;
>>>
>>> - ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
>>> - if (ret < 0)
>>> - goto free;
>>> + CLASS(idr_alloc, pmu_type)(&pmu_idr, NULL, max, 0, GFP_KERNEL);
>>> + if (pmu_type.id < 0)
>>> + return pmu_type.id;
>>>
>>> - WARN_ON(type >= 0 && ret != type);
>>> + WARN_ON(type >= 0 && pmu_type.id != type);
>>>
>>> - pmu->type = ret;
>>> + pmu->type = pmu_type.id;
>>> atomic_set(&pmu->exclusive_cnt, 0);
>>>
>>> if (pmu_bus_running && !pmu->dev) {
>>> - ret = pmu_dev_alloc(pmu);
>>> + int ret = pmu_dev_alloc(pmu);
>>> if (ret)
>>> - goto free;
>>> + return ret;
>>
>> pmu_dev_alloc() can fail before or in device_add(). perf_pmu_free() should
>> not call device_del() for such cases. No?
>
> Right you are -- but is this not introduced in the previous patch?
I didn't notice that.
> Also, this should cure things, no?
>
> ---
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11810,6 +11810,7 @@ static int pmu_dev_alloc(struct pmu *pmu
>
> free_dev:
> put_device(pmu->dev);
> + pmu->dev = NULL;
> goto out;
> }
>
Yes, this should fix it.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 85+ messages in thread
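[Editorial note] The one-line `pmu->dev = NULL;` fix agreed on above encodes a simple invariant: once a release function has run on a pointer, reset the pointer so that any later shared cleanup path sees "nothing to undo" rather than double-releasing. A userspace sketch of the same shape (all names hypothetical; a reference counter substitutes for the driver-model device refcount):

```c
#include <assert.h>
#include <stdlib.h>

static int live_devices;	/* outstanding "device" references */

struct fake_dev { int dummy; };

static void fake_put_device(struct fake_dev *d)
{
	if (d) {
		live_devices--;
		free(d);
	}
}

struct owner { struct fake_dev *dev; };

/* Mirrors pmu_dev_alloc(): on failure it drops its own reference
 * AND clears owner->dev, like the `pmu->dev = NULL;` hunk above. */
static int fake_dev_alloc(struct owner *o, int fail)
{
	o->dev = calloc(1, sizeof(*o->dev));
	if (!o->dev)
		return -1;
	live_devices++;
	if (fail) {
		fake_put_device(o->dev);
		o->dev = NULL;		/* the one-line fix */
		return -1;
	}
	return 0;
}

/* Mirrors perf_pmu_free(): may legitimately run after a failed
 * alloc, so it must tolerate dev == NULL. */
static void fake_owner_free(struct owner *o)
{
	fake_put_device(o->dev);	/* no-op when dev == NULL */
	o->dev = NULL;
}

static int demo(void)
{
	struct owner o = { 0 };
	if (fake_dev_alloc(&o, 1) != -1)
		return 1;
	fake_owner_free(&o);	/* must not double-put */
	return live_devices;	/* 0 iff nothing leaked or over-put */
}
```

Without the `o->dev = NULL` line the final `fake_owner_free()` would release the reference a second time, which is exactly the `device_del()`-on-a-never-added-device hazard Ravi points out.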
* [tip: perf/core] perf/core: Simplify perf_pmu_register()
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
2024-11-20 13:06 ` Ravi Bangoria
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2025-03-25 19:47 ` [PATCH 06/19] perf: " Sidhartha Kumar
3 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 742d5df92842aa903ad4c2ac2e33ac56cb6b6f05
Gitweb: https://git.kernel.org/tip/742d5df92842aa903ad4c2ac2e33ac56cb6b6f05
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:15 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:54:05 +01:00
perf/core: Simplify perf_pmu_register()
Using the previously introduced perf_pmu_free() and a new IDR helper,
simplify the perf_pmu_register error paths.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.198937277@infradead.org
---
include/linux/idr.h | 17 ++++++++++-
kernel/events/core.c | 71 +++++++++++++++++--------------------------
2 files changed, 46 insertions(+), 42 deletions(-)
diff --git a/include/linux/idr.h b/include/linux/idr.h
index da5f5fa..cd729be 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -15,6 +15,7 @@
#include <linux/radix-tree.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
+#include <linux/cleanup.h>
struct idr {
struct radix_tree_root idr_rt;
@@ -124,6 +125,22 @@ void *idr_get_next_ul(struct idr *, unsigned long *nextid);
void *idr_replace(struct idr *, void *, unsigned long id);
void idr_destroy(struct idr *);
+struct __class_idr {
+ struct idr *idr;
+ int id;
+};
+
+#define idr_null ((struct __class_idr){ NULL, -1 })
+#define take_idr_id(id) __get_and_null(id, idr_null)
+
+DEFINE_CLASS(idr_alloc, struct __class_idr,
+ if (_T.id >= 0) idr_remove(_T.idr, _T.id),
+ ((struct __class_idr){
+ .idr = idr,
+ .id = idr_alloc(idr, ptr, start, end, gfp),
+ }),
+ struct idr *idr, void *ptr, int start, int end, gfp_t gfp);
+
/**
* idr_init_base() - Initialise an IDR.
* @idr: IDR handle.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ee5cdd6..215dad5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11914,52 +11914,49 @@ static void perf_pmu_free(struct pmu *pmu)
free_percpu(pmu->cpu_pmu_context);
}
-int perf_pmu_register(struct pmu *pmu, const char *name, int type)
+DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
+
+int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
{
- int cpu, ret, max = PERF_TYPE_MAX;
+ int cpu, max = PERF_TYPE_MAX;
- pmu->type = -1;
+ struct pmu *pmu __free(pmu_unregister) = _pmu;
+ guard(mutex)(&pmus_lock);
- mutex_lock(&pmus_lock);
- ret = -ENOMEM;
pmu->pmu_disable_count = alloc_percpu(int);
if (!pmu->pmu_disable_count)
- goto unlock;
+ return -ENOMEM;
- if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
- ret = -EINVAL;
- goto free;
- }
+ if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
+ return -EINVAL;
- if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
- ret = -EINVAL;
- goto free;
- }
+ if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE,
+ "Can not register a pmu with an invalid scope.\n"))
+ return -EINVAL;
pmu->name = name;
if (type >= 0)
max = type;
- ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
- if (ret < 0)
- goto free;
+ CLASS(idr_alloc, pmu_type)(&pmu_idr, NULL, max, 0, GFP_KERNEL);
+ if (pmu_type.id < 0)
+ return pmu_type.id;
- WARN_ON(type >= 0 && ret != type);
+ WARN_ON(type >= 0 && pmu_type.id != type);
- pmu->type = ret;
+ pmu->type = pmu_type.id;
atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
- ret = pmu_dev_alloc(pmu);
+ int ret = pmu_dev_alloc(pmu);
if (ret)
- goto free;
+ return ret;
}
- ret = -ENOMEM;
pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
if (!pmu->cpu_pmu_context)
- goto free;
+ return -ENOMEM;
for_each_possible_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
@@ -12000,32 +11997,22 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
- if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu)) {
- ret = -EINVAL;
- goto free;
- }
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
+ return -EINVAL;
list_add_rcu(&pmu->entry, &pmus);
- ret = 0;
-unlock:
- mutex_unlock(&pmus_lock);
-
- return ret;
-
-free:
- if (pmu->type >= 0)
- idr_remove(&pmu_idr, pmu->type);
- perf_pmu_free(pmu);
- goto unlock;
+ take_idr_id(pmu_type);
+ _pmu = no_free_ptr(pmu); // let it rip
+ return 0;
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
void perf_pmu_unregister(struct pmu *pmu)
{
- mutex_lock(&pmus_lock);
- list_del_rcu(&pmu->entry);
- idr_remove(&pmu_idr, pmu->type);
- mutex_unlock(&pmus_lock);
+ scoped_guard (mutex, &pmus_lock) {
+ list_del_rcu(&pmu->entry);
+ idr_remove(&pmu_idr, pmu->type);
+ }
/*
* We dereference the pmu list under both SRCU and regular RCU, so
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify perf_pmu_register()
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
2024-11-20 13:06 ` Ravi Bangoria
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2025-03-25 19:47 ` [PATCH 06/19] perf: " Sidhartha Kumar
3 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 6c8b0b835f003647e593c08331a4dd2150d5eb0e
Gitweb: https://git.kernel.org/tip/6c8b0b835f003647e593c08331a4dd2150d5eb0e
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:15 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:29 +01:00
perf/core: Simplify perf_pmu_register()
Using the previously introduced perf_pmu_free() and a new IDR helper,
simplify the perf_pmu_register error paths.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.198937277@infradead.org
---
include/linux/idr.h | 17 ++++++++++-
kernel/events/core.c | 71 +++++++++++++++++--------------------------
2 files changed, 46 insertions(+), 42 deletions(-)
diff --git a/include/linux/idr.h b/include/linux/idr.h
index da5f5fa..cd729be 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -15,6 +15,7 @@
#include <linux/radix-tree.h>
#include <linux/gfp.h>
#include <linux/percpu.h>
+#include <linux/cleanup.h>
struct idr {
struct radix_tree_root idr_rt;
@@ -124,6 +125,22 @@ void *idr_get_next_ul(struct idr *, unsigned long *nextid);
void *idr_replace(struct idr *, void *, unsigned long id);
void idr_destroy(struct idr *);
+struct __class_idr {
+ struct idr *idr;
+ int id;
+};
+
+#define idr_null ((struct __class_idr){ NULL, -1 })
+#define take_idr_id(id) __get_and_null(id, idr_null)
+
+DEFINE_CLASS(idr_alloc, struct __class_idr,
+ if (_T.id >= 0) idr_remove(_T.idr, _T.id),
+ ((struct __class_idr){
+ .idr = idr,
+ .id = idr_alloc(idr, ptr, start, end, gfp),
+ }),
+ struct idr *idr, void *ptr, int start, int end, gfp_t gfp);
+
/**
* idr_init_base() - Initialise an IDR.
* @idr: IDR handle.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ee5cdd6..215dad5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11914,52 +11914,49 @@ static void perf_pmu_free(struct pmu *pmu)
free_percpu(pmu->cpu_pmu_context);
}
-int perf_pmu_register(struct pmu *pmu, const char *name, int type)
+DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
+
+int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
{
- int cpu, ret, max = PERF_TYPE_MAX;
+ int cpu, max = PERF_TYPE_MAX;
- pmu->type = -1;
+ struct pmu *pmu __free(pmu_unregister) = _pmu;
+ guard(mutex)(&pmus_lock);
- mutex_lock(&pmus_lock);
- ret = -ENOMEM;
pmu->pmu_disable_count = alloc_percpu(int);
if (!pmu->pmu_disable_count)
- goto unlock;
+ return -ENOMEM;
- if (WARN_ONCE(!name, "Can not register anonymous pmu.\n")) {
- ret = -EINVAL;
- goto free;
- }
+ if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
+ return -EINVAL;
- if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE, "Can not register a pmu with an invalid scope.\n")) {
- ret = -EINVAL;
- goto free;
- }
+ if (WARN_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE,
+ "Can not register a pmu with an invalid scope.\n"))
+ return -EINVAL;
pmu->name = name;
if (type >= 0)
max = type;
- ret = idr_alloc(&pmu_idr, NULL, max, 0, GFP_KERNEL);
- if (ret < 0)
- goto free;
+ CLASS(idr_alloc, pmu_type)(&pmu_idr, NULL, max, 0, GFP_KERNEL);
+ if (pmu_type.id < 0)
+ return pmu_type.id;
- WARN_ON(type >= 0 && ret != type);
+ WARN_ON(type >= 0 && pmu_type.id != type);
- pmu->type = ret;
+ pmu->type = pmu_type.id;
atomic_set(&pmu->exclusive_cnt, 0);
if (pmu_bus_running && !pmu->dev) {
- ret = pmu_dev_alloc(pmu);
+ int ret = pmu_dev_alloc(pmu);
if (ret)
- goto free;
+ return ret;
}
- ret = -ENOMEM;
pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
if (!pmu->cpu_pmu_context)
- goto free;
+ return -ENOMEM;
for_each_possible_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
@@ -12000,32 +11997,22 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
- if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu)) {
- ret = -EINVAL;
- goto free;
- }
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu))
+ return -EINVAL;
list_add_rcu(&pmu->entry, &pmus);
- ret = 0;
-unlock:
- mutex_unlock(&pmus_lock);
-
- return ret;
-
-free:
- if (pmu->type >= 0)
- idr_remove(&pmu_idr, pmu->type);
- perf_pmu_free(pmu);
- goto unlock;
+ take_idr_id(pmu_type);
+ _pmu = no_free_ptr(pmu); // let it rip
+ return 0;
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
void perf_pmu_unregister(struct pmu *pmu)
{
- mutex_lock(&pmus_lock);
- list_del_rcu(&pmu->entry);
- idr_remove(&pmu_idr, pmu->type);
- mutex_unlock(&pmus_lock);
+ scoped_guard (mutex, &pmus_lock) {
+ list_del_rcu(&pmu->entry);
+ idr_remove(&pmu_idr, pmu->type);
+ }
/*
* We dereference the pmu list under both SRCU and regular RCU, so
^ permalink raw reply related [flat|nested] 85+ messages in thread
* Re: [PATCH 06/19] perf: Simplify perf_pmu_register()
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
` (2 preceding siblings ...)
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
@ 2025-03-25 19:47 ` Sidhartha Kumar
3 siblings, 0 replies; 85+ messages in thread
From: Sidhartha Kumar @ 2025-03-25 19:47 UTC (permalink / raw)
To: peterz, Lorenzo Stoakes, Liam R . Howlett
Cc: acme, adrian.hunter, alexander.shishkin, irogers, jolsa,
kan.liang, linux-kernel, lucas.demarchi, mark.rutland, mingo,
namhyung, willy
Hello,
The inclusion of #include <linux/cleanup.h> in include/linux/idr.h
breaks building the userspace radix-tree test suite with:
In file included from ../shared/linux/idr.h:1,
from radix-tree.c:18:
../shared/linux/../../../../include/linux/idr.h:18:10: fatal error:
linux/cleanup.h: No such file or directory
18 | #include <linux/cleanup.h>
| ^~~~~~~~~~~~~~~~~
compilation terminated.
Thanks,
Sid
^ permalink raw reply [flat|nested] 85+ messages in thread
* [PATCH 07/19] perf: Simplify perf_init_event()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (5 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 06/19] perf: Simplify perf_pmu_register() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 08/19] perf: Simplify perf_event_alloc() Peter Zijlstra
` (13 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 31 ++++++++++++-------------------
1 file changed, 12 insertions(+), 19 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11942,10 +11942,10 @@ static int perf_try_init_event(struct pm
static struct pmu *perf_init_event(struct perf_event *event)
{
bool extended_type = false;
- int idx, type, ret;
struct pmu *pmu;
+ int type, ret;
- idx = srcu_read_lock(&pmus_srcu);
+ guard(srcu)(&pmus_srcu);
/*
* Save original type before calling pmu->event_init() since certain
@@ -11958,7 +11958,7 @@ static struct pmu *perf_init_event(struc
pmu = event->parent->pmu;
ret = perf_try_init_event(pmu, event);
if (!ret)
- goto unlock;
+ return pmu;
}
/*
@@ -11977,13 +11977,12 @@ static struct pmu *perf_init_event(struc
}
again:
- rcu_read_lock();
- pmu = idr_find(&pmu_idr, type);
- rcu_read_unlock();
+ scoped_guard (rcu)
+ pmu = idr_find(&pmu_idr, type);
if (pmu) {
if (event->attr.type != type && type != PERF_TYPE_RAW &&
!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_HW_TYPE))
- goto fail;
+ return ERR_PTR(-ENOENT);
ret = perf_try_init_event(pmu, event);
if (ret == -ENOENT && event->attr.type != type && !extended_type) {
@@ -11992,27 +11991,21 @@ static struct pmu *perf_init_event(struc
}
if (ret)
- pmu = ERR_PTR(ret);
+ return ERR_PTR(ret);
- goto unlock;
+ return pmu;
}
list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
ret = perf_try_init_event(pmu, event);
if (!ret)
- goto unlock;
+ return pmu;
- if (ret != -ENOENT) {
- pmu = ERR_PTR(ret);
- goto unlock;
- }
+ if (ret != -ENOENT)
+ return ERR_PTR(ret);
}
-fail:
- pmu = ERR_PTR(-ENOENT);
-unlock:
- srcu_read_unlock(&pmus_srcu, idx);
- return pmu;
+ return ERR_PTR(-ENOENT);
}
static void attach_sb_event(struct perf_event *event)
^ permalink raw reply [flat|nested] 85+ messages in thread* [tip: perf/core] perf/core: Simplify perf_init_event()
2024-11-04 13:39 ` [PATCH 07/19] perf: Simplify perf_init_event() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Thomas Gleixner, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 9954ea69de5c06c5bf073d366d6768c1c4a2a1c7
Gitweb: https://git.kernel.org/tip/9954ea69de5c06c5bf073d366d6768c1c4a2a1c7
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:16 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:54:05 +01:00
perf/core: Simplify perf_init_event()
Use the <linux/cleanup.h> guard() and scoped_guard() infrastructure
to simplify the control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241104135518.302444446@infradead.org
---
kernel/events/core.c | 31 ++++++++++++-------------------
1 file changed, 12 insertions(+), 19 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 215dad5..fd35236 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12101,10 +12101,10 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
static struct pmu *perf_init_event(struct perf_event *event)
{
bool extended_type = false;
- int idx, type, ret;
struct pmu *pmu;
+ int type, ret;
- idx = srcu_read_lock(&pmus_srcu);
+ guard(srcu)(&pmus_srcu);
/*
* Save original type before calling pmu->event_init() since certain
@@ -12117,7 +12117,7 @@ static struct pmu *perf_init_event(struct perf_event *event)
pmu = event->parent->pmu;
ret = perf_try_init_event(pmu, event);
if (!ret)
- goto unlock;
+ return pmu;
}
/*
@@ -12136,13 +12136,12 @@ static struct pmu *perf_init_event(struct perf_event *event)
}
again:
- rcu_read_lock();
- pmu = idr_find(&pmu_idr, type);
- rcu_read_unlock();
+ scoped_guard (rcu)
+ pmu = idr_find(&pmu_idr, type);
if (pmu) {
if (event->attr.type != type && type != PERF_TYPE_RAW &&
!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_HW_TYPE))
- goto fail;
+ return ERR_PTR(-ENOENT);
ret = perf_try_init_event(pmu, event);
if (ret == -ENOENT && event->attr.type != type && !extended_type) {
@@ -12151,27 +12150,21 @@ again:
}
if (ret)
- pmu = ERR_PTR(ret);
+ return ERR_PTR(ret);
- goto unlock;
+ return pmu;
}
list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
ret = perf_try_init_event(pmu, event);
if (!ret)
- goto unlock;
+ return pmu;
- if (ret != -ENOENT) {
- pmu = ERR_PTR(ret);
- goto unlock;
- }
+ if (ret != -ENOENT)
+ return ERR_PTR(ret);
}
-fail:
- pmu = ERR_PTR(-ENOENT);
-unlock:
- srcu_read_unlock(&pmus_srcu, idx);
- return pmu;
+ return ERR_PTR(-ENOENT);
}
static void attach_sb_event(struct perf_event *event)
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify perf_init_event()
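[Editor's note: the guard()/scoped_guard() helpers from <linux/cleanup.h> used by the patch above are built on the compiler's scope-exit cleanup attribute. A minimal standalone sketch of that underlying pattern follows -- toy code with invented names (toy_lock, TOY_GUARD, find_value), not kernel source; it only illustrates why every return path in the rewritten perf_init_event() can drop the explicit srcu_read_unlock().]

```c
#include <assert.h>

/* Toy "lock" with acquire/release counters, standing in for an SRCU domain. */
struct toy_lock { int acquired; int released; };

static void toy_acquire(struct toy_lock *l) { l->acquired++; }
static void toy_release(struct toy_lock **lp) { (*lp)->released++; }

/*
 * Scope-based cleanup: toy_release() runs automatically when the guard
 * variable goes out of scope, on *every* return path. This is the GCC/Clang
 * cleanup-attribute mechanism that <linux/cleanup.h>'s guard() wraps up.
 */
#define TOY_GUARD(l) \
	struct toy_lock *_g __attribute__((cleanup(toy_release))) = \
		(toy_acquire(l), (l))

static int find_value(struct toy_lock *l, int key)
{
	TOY_GUARD(l);

	if (key < 0)
		return -1;	/* early return: release still runs */
	return key * 2;		/* normal return: release still runs */
}
```

With the guard in place, early returns replace goto-unlock chains, which is exactly the control-flow simplification the patch performs.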
2024-11-04 13:39 ` [PATCH 07/19] perf: Simplify perf_init_event() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria,
Thomas Gleixner, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: caf8b765d453198d4ca5305d9e207535934b6e3b
Gitweb: https://git.kernel.org/tip/caf8b765d453198d4ca5305d9e207535934b6e3b
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:16 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:32 +01:00
perf/core: Simplify perf_init_event()
Use the <linux/cleanup.h> guard() and scoped_guard() infrastructure
to simplify the control flow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20241104135518.302444446@infradead.org
---
kernel/events/core.c | 31 ++++++++++++-------------------
1 file changed, 12 insertions(+), 19 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 215dad5..fd35236 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12101,10 +12101,10 @@ static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
static struct pmu *perf_init_event(struct perf_event *event)
{
bool extended_type = false;
- int idx, type, ret;
struct pmu *pmu;
+ int type, ret;
- idx = srcu_read_lock(&pmus_srcu);
+ guard(srcu)(&pmus_srcu);
/*
* Save original type before calling pmu->event_init() since certain
@@ -12117,7 +12117,7 @@ static struct pmu *perf_init_event(struct perf_event *event)
pmu = event->parent->pmu;
ret = perf_try_init_event(pmu, event);
if (!ret)
- goto unlock;
+ return pmu;
}
/*
@@ -12136,13 +12136,12 @@ static struct pmu *perf_init_event(struct perf_event *event)
}
again:
- rcu_read_lock();
- pmu = idr_find(&pmu_idr, type);
- rcu_read_unlock();
+ scoped_guard (rcu)
+ pmu = idr_find(&pmu_idr, type);
if (pmu) {
if (event->attr.type != type && type != PERF_TYPE_RAW &&
!(pmu->capabilities & PERF_PMU_CAP_EXTENDED_HW_TYPE))
- goto fail;
+ return ERR_PTR(-ENOENT);
ret = perf_try_init_event(pmu, event);
if (ret == -ENOENT && event->attr.type != type && !extended_type) {
@@ -12151,27 +12150,21 @@ again:
}
if (ret)
- pmu = ERR_PTR(ret);
+ return ERR_PTR(ret);
- goto unlock;
+ return pmu;
}
list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
ret = perf_try_init_event(pmu, event);
if (!ret)
- goto unlock;
+ return pmu;
- if (ret != -ENOENT) {
- pmu = ERR_PTR(ret);
- goto unlock;
- }
+ if (ret != -ENOENT)
+ return ERR_PTR(ret);
}
-fail:
- pmu = ERR_PTR(-ENOENT);
-unlock:
- srcu_read_unlock(&pmus_srcu, idx);
- return pmu;
+ return ERR_PTR(-ENOENT);
}
static void attach_sb_event(struct perf_event *event)
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 08/19] perf: Simplify perf_event_alloc()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (6 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 07/19] perf: Simplify perf_init_event() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 09/19] perf: Merge pmu_disable_count into cpu_pmu_context Peter Zijlstra
` (12 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Using the previous simplifications, transition perf_event_alloc() to
the cleanup way of things -- reducing error path magic.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 47 ++++++++++++++++++-----------------------------
1 file changed, 18 insertions(+), 29 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5370,6 +5370,8 @@ static void __free_event(struct perf_eve
call_rcu(&event->rcu_head, free_event_rcu);
}
+DEFINE_FREE(__free_event, struct perf_event *, if (_T) __free_event(_T))
+
/* vs perf_event_alloc() success */
static void _free_event(struct perf_event *event)
{
@@ -12132,7 +12134,6 @@ perf_event_alloc(struct perf_event_attr
void *context, int cgroup_fd)
{
struct pmu *pmu;
- struct perf_event *event;
struct hw_perf_event *hwc;
long err = -EINVAL;
int node;
@@ -12147,8 +12148,8 @@ perf_event_alloc(struct perf_event_attr
}
node = (cpu >= 0) ? cpu_to_node(cpu) : -1;
- event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO,
- node);
+ struct perf_event *event __free(__free_event) =
+ kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO, node);
if (!event)
return ERR_PTR(-ENOMEM);
@@ -12255,51 +12256,43 @@ perf_event_alloc(struct perf_event_attr
* See perf_output_read().
*/
if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
- goto err;
+ return ERR_PTR(-EINVAL);
if (!has_branch_stack(event))
event->attr.branch_sample_type = 0;
pmu = perf_init_event(event);
- if (IS_ERR(pmu)) {
- err = PTR_ERR(pmu);
- goto err;
- }
+ if (IS_ERR(pmu))
+ return (void*)pmu;
/*
* Disallow uncore-task events. Similarly, disallow uncore-cgroup
* events (they don't make sense as the cgroup will be different
* on other CPUs in the uncore mask).
*/
- if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
- err = -EINVAL;
- goto err;
- }
+ if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1))
+ return ERR_PTR(-EINVAL);
if (event->attr.aux_output &&
- !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) {
- err = -EOPNOTSUPP;
- goto err;
- }
+ !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT))
+ return ERR_PTR(-EOPNOTSUPP);
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
if (err)
- goto err;
+ return ERR_PTR(err);
}
err = exclusive_event_init(event);
if (err)
- goto err;
+ return ERR_PTR(err);
if (has_addr_filter(event)) {
event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
sizeof(struct perf_addr_filter_range),
GFP_KERNEL);
- if (!event->addr_filter_ranges) {
- err = -ENOMEM;
- goto err;
- }
+ if (!event->addr_filter_ranges)
+ return ERR_PTR(-ENOMEM);
/*
* Clone the parent's vma offsets: they are valid until exec()
@@ -12323,23 +12316,19 @@ perf_event_alloc(struct perf_event_attr
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
err = get_callchain_buffers(attr->sample_max_stack);
if (err)
- goto err;
+ return ERR_PTR(err);
event->attach_state |= PERF_ATTACH_CALLCHAIN;
}
}
err = security_perf_event_alloc(event);
if (err)
- goto err;
+ return ERR_PTR(err);
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
- return event;
-
-err:
- __free_event(event);
- return ERR_PTR(err);
+ return_ptr(event);
}
static int perf_copy_attr(struct perf_event_attr __user *uattr,
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify perf_event_alloc()
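[Editor's note: the DEFINE_FREE(__free_event, ...) / __free(__free_event) / return_ptr() pattern in the patch above attaches a destructor to the allocation so that every error path frees it automatically, while the success path hands ownership out. A rough standalone sketch of that idiom -- toy code with invented names (auto_free, AUTO_FREE, make_name), not the kernel's cleanup.h macros:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int frees;	/* counts automatic frees, for demonstration */

static void auto_free(char **p)
{
	if (*p) {
		frees++;
		free(*p);
	}
}

/* __free()-style marker: free the buffer unless ownership was handed off. */
#define AUTO_FREE __attribute__((cleanup(auto_free)))

static char *make_name(const char *src)
{
	AUTO_FREE char *buf = malloc(16);

	if (!buf)
		return NULL;
	if (strlen(src) >= 16)
		return NULL;	/* error path: buf is freed automatically */

	strcpy(buf, src);

	/* Success: transfer ownership out, like return_ptr(event). */
	char *ret = buf;
	buf = NULL;
	return ret;
}
```

This is what lets the patch delete the err:/__free_event() tail and replace each "err = -E...; goto err;" pair with a plain return ERR_PTR(-E...).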
2024-11-04 13:39 ` [PATCH 08/19] perf: Simplify perf_event_alloc() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: ebfe83832e392f25b4fe1fedf816eebb76f1f5b4
Gitweb: https://git.kernel.org/tip/ebfe83832e392f25b4fe1fedf816eebb76f1f5b4
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:17 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 19:54:06 +01:00
perf/core: Simplify perf_event_alloc()
Using the previous simplifications, transition perf_event_alloc() to
the cleanup way of things -- reducing error path magic.
[ mingo: Ported it to recent kernels. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.410755241@infradead.org
---
kernel/events/core.c | 59 ++++++++++++++++---------------------------
1 file changed, 22 insertions(+), 37 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd35236..348a379 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5410,6 +5410,8 @@ static void __free_event(struct perf_event *event)
call_rcu(&event->rcu_head, free_event_rcu);
}
+DEFINE_FREE(__free_event, struct perf_event *, if (_T) __free_event(_T))
+
/* vs perf_event_alloc() success */
static void _free_event(struct perf_event *event)
{
@@ -12291,7 +12293,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
void *context, int cgroup_fd)
{
struct pmu *pmu;
- struct perf_event *event;
struct hw_perf_event *hwc;
long err = -EINVAL;
int node;
@@ -12306,8 +12307,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
}
node = (cpu >= 0) ? cpu_to_node(cpu) : -1;
- event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO,
- node);
+ struct perf_event *event __free(__free_event) =
+ kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO, node);
if (!event)
return ERR_PTR(-ENOMEM);
@@ -12414,65 +12415,53 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
* See perf_output_read().
*/
if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
- goto err;
+ return ERR_PTR(-EINVAL);
if (!has_branch_stack(event))
event->attr.branch_sample_type = 0;
pmu = perf_init_event(event);
- if (IS_ERR(pmu)) {
- err = PTR_ERR(pmu);
- goto err;
- }
+ if (IS_ERR(pmu))
+ return (void*)pmu;
/*
* Disallow uncore-task events. Similarly, disallow uncore-cgroup
* events (they don't make sense as the cgroup will be different
* on other CPUs in the uncore mask).
*/
- if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
- err = -EINVAL;
- goto err;
- }
+ if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1))
+ return ERR_PTR(-EINVAL);
if (event->attr.aux_output &&
(!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
- event->attr.aux_pause || event->attr.aux_resume)) {
- err = -EOPNOTSUPP;
- goto err;
- }
+ event->attr.aux_pause || event->attr.aux_resume))
+ return ERR_PTR(-EOPNOTSUPP);
- if (event->attr.aux_pause && event->attr.aux_resume) {
- err = -EINVAL;
- goto err;
- }
+ if (event->attr.aux_pause && event->attr.aux_resume)
+ return ERR_PTR(-EINVAL);
if (event->attr.aux_start_paused) {
- if (!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
- err = -EOPNOTSUPP;
- goto err;
- }
+ if (!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE))
+ return ERR_PTR(-EOPNOTSUPP);
event->hw.aux_paused = 1;
}
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
if (err)
- goto err;
+ return ERR_PTR(err);
}
err = exclusive_event_init(event);
if (err)
- goto err;
+ return ERR_PTR(err);
if (has_addr_filter(event)) {
event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
sizeof(struct perf_addr_filter_range),
GFP_KERNEL);
- if (!event->addr_filter_ranges) {
- err = -ENOMEM;
- goto err;
- }
+ if (!event->addr_filter_ranges)
+ return ERR_PTR(-ENOMEM);
/*
* Clone the parent's vma offsets: they are valid until exec()
@@ -12496,23 +12485,19 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
err = get_callchain_buffers(attr->sample_max_stack);
if (err)
- goto err;
+ return ERR_PTR(err);
event->attach_state |= PERF_ATTACH_CALLCHAIN;
}
}
err = security_perf_event_alloc(event);
if (err)
- goto err;
+ return ERR_PTR(err);
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
- return event;
-
-err:
- __free_event(event);
- return ERR_PTR(err);
+ return_ptr(event);
}
static int perf_copy_attr(struct perf_event_attr __user *uattr,
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify perf_event_alloc()
2024-11-04 13:39 ` [PATCH 08/19] perf: Simplify perf_event_alloc() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 8f2221f52eced88e74c7ae22b4b2d67dc7a96bd2
Gitweb: https://git.kernel.org/tip/8f2221f52eced88e74c7ae22b4b2d67dc7a96bd2
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:17 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:40 +01:00
perf/core: Simplify perf_event_alloc()
Using the previous simplifications, transition perf_event_alloc() to
the cleanup way of things -- reducing error path magic.
[ mingo: Ported it to recent kernels. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.410755241@infradead.org
---
kernel/events/core.c | 59 ++++++++++++++++---------------------------
1 file changed, 22 insertions(+), 37 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd35236..348a379 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5410,6 +5410,8 @@ static void __free_event(struct perf_event *event)
call_rcu(&event->rcu_head, free_event_rcu);
}
+DEFINE_FREE(__free_event, struct perf_event *, if (_T) __free_event(_T))
+
/* vs perf_event_alloc() success */
static void _free_event(struct perf_event *event)
{
@@ -12291,7 +12293,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
void *context, int cgroup_fd)
{
struct pmu *pmu;
- struct perf_event *event;
struct hw_perf_event *hwc;
long err = -EINVAL;
int node;
@@ -12306,8 +12307,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
}
node = (cpu >= 0) ? cpu_to_node(cpu) : -1;
- event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO,
- node);
+ struct perf_event *event __free(__free_event) =
+ kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO, node);
if (!event)
return ERR_PTR(-ENOMEM);
@@ -12414,65 +12415,53 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
* See perf_output_read().
*/
if (has_inherit_and_sample_read(attr) && !(attr->sample_type & PERF_SAMPLE_TID))
- goto err;
+ return ERR_PTR(-EINVAL);
if (!has_branch_stack(event))
event->attr.branch_sample_type = 0;
pmu = perf_init_event(event);
- if (IS_ERR(pmu)) {
- err = PTR_ERR(pmu);
- goto err;
- }
+ if (IS_ERR(pmu))
+ return (void*)pmu;
/*
* Disallow uncore-task events. Similarly, disallow uncore-cgroup
* events (they don't make sense as the cgroup will be different
* on other CPUs in the uncore mask).
*/
- if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1)) {
- err = -EINVAL;
- goto err;
- }
+ if (pmu->task_ctx_nr == perf_invalid_context && (task || cgroup_fd != -1))
+ return ERR_PTR(-EINVAL);
if (event->attr.aux_output &&
(!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
- event->attr.aux_pause || event->attr.aux_resume)) {
- err = -EOPNOTSUPP;
- goto err;
- }
+ event->attr.aux_pause || event->attr.aux_resume))
+ return ERR_PTR(-EOPNOTSUPP);
- if (event->attr.aux_pause && event->attr.aux_resume) {
- err = -EINVAL;
- goto err;
- }
+ if (event->attr.aux_pause && event->attr.aux_resume)
+ return ERR_PTR(-EINVAL);
if (event->attr.aux_start_paused) {
- if (!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
- err = -EOPNOTSUPP;
- goto err;
- }
+ if (!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE))
+ return ERR_PTR(-EOPNOTSUPP);
event->hw.aux_paused = 1;
}
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
if (err)
- goto err;
+ return ERR_PTR(err);
}
err = exclusive_event_init(event);
if (err)
- goto err;
+ return ERR_PTR(err);
if (has_addr_filter(event)) {
event->addr_filter_ranges = kcalloc(pmu->nr_addr_filters,
sizeof(struct perf_addr_filter_range),
GFP_KERNEL);
- if (!event->addr_filter_ranges) {
- err = -ENOMEM;
- goto err;
- }
+ if (!event->addr_filter_ranges)
+ return ERR_PTR(-ENOMEM);
/*
* Clone the parent's vma offsets: they are valid until exec()
@@ -12496,23 +12485,19 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
err = get_callchain_buffers(attr->sample_max_stack);
if (err)
- goto err;
+ return ERR_PTR(err);
event->attach_state |= PERF_ATTACH_CALLCHAIN;
}
}
err = security_perf_event_alloc(event);
if (err)
- goto err;
+ return ERR_PTR(err);
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
- return event;
-
-err:
- __free_event(event);
- return ERR_PTR(err);
+ return_ptr(event);
}
static int perf_copy_attr(struct perf_event_attr __user *uattr,
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 09/19] perf: Merge pmu_disable_count into cpu_pmu_context
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (7 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 08/19] perf: Simplify perf_event_alloc() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 10/19] perf: Add this_cpc() helper Peter Zijlstra
` (11 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Because it makes no sense to have two per-cpu allocations per pmu.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/perf_event.h | 2 +-
kernel/events/core.c | 12 ++++--------
2 files changed, 5 insertions(+), 9 deletions(-)
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -336,7 +336,6 @@ struct pmu {
*/
unsigned int scope;
- int __percpu *pmu_disable_count;
struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
@@ -1010,6 +1009,7 @@ struct perf_cpu_pmu_context {
int active_oncpu;
int exclusive;
+ int pmu_disable_count;
raw_spinlock_t hrtimer_lock;
struct hrtimer hrtimer;
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1178,21 +1178,22 @@ static int perf_mux_hrtimer_restart_ipi(
void perf_pmu_disable(struct pmu *pmu)
{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
if (!(*count)++)
pmu->pmu_disable(pmu);
}
void perf_pmu_enable(struct pmu *pmu)
{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
if (!--(*count))
pmu->pmu_enable(pmu);
}
static void perf_assert_pmu_disabled(struct pmu *pmu)
{
- WARN_ON_ONCE(*this_cpu_ptr(pmu->pmu_disable_count) == 0);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ WARN_ON_ONCE(*count == 0);
}
static void get_ctx(struct perf_event_context *ctx)
@@ -11758,7 +11759,6 @@ static bool idr_cmpxchg(struct idr *idr,
static void perf_pmu_free(struct pmu *pmu)
{
- free_percpu(pmu->pmu_disable_count);
if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
if (pmu->nr_addr_filters)
device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
@@ -11777,10 +11777,6 @@ int perf_pmu_register(struct pmu *_pmu,
struct pmu *pmu __free(pmu_unregister) = _pmu;
guard(mutex)(&pmus_lock);
- pmu->pmu_disable_count = alloc_percpu(int);
- if (!pmu->pmu_disable_count)
- return -ENOMEM;
-
if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
return -EINVAL;
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
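[Editor's note: the pmu_disable_count moved by the patch above implements a nesting refcount -- only the outermost disable/enable pair touches the hardware. A minimal standalone sketch of that scheme (toy globals standing in for the per-CPU counter and PMU hardware; not kernel source):]

```c
#include <assert.h>

static int hw_disabled;		/* stands in for the PMU hardware state */
static int pmu_disable_count;	/* a per-CPU field in the real code */

static void hw_disable(void) { hw_disabled = 1; }
static void hw_enable(void)  { hw_disabled = 0; }

/* Only the 0 -> 1 transition actually disables the hardware ... */
static void pmu_disable(void)
{
	if (!pmu_disable_count++)
		hw_disable();
}

/* ... and only the 1 -> 0 transition re-enables it. */
static void pmu_enable(void)
{
	if (!--pmu_disable_count)
		hw_enable();
}
```

Since the counter is already per-CPU state alongside the rest of perf_cpu_pmu_context, folding it into that structure removes the second per-CPU allocation without changing this behaviour.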
2024-11-04 13:39 ` [PATCH 09/19] perf: Merge pmu_disable_count into cpu_pmu_context Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 46cc0835d258934bcad1fa921c2688db98b2dabd
Gitweb: https://git.kernel.org/tip/46cc0835d258934bcad1fa921c2688db98b2dabd
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:18 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:00:11 +01:00
perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
Because it makes no sense to have two per-cpu allocations per pmu.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.518730578@infradead.org
---
include/linux/perf_event.h | 2 +-
kernel/events/core.c | 12 ++++--------
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8c0117b..5f293e6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -343,7 +343,6 @@ struct pmu {
*/
unsigned int scope;
- int __percpu *pmu_disable_count;
struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
@@ -1031,6 +1030,7 @@ struct perf_cpu_pmu_context {
int active_oncpu;
int exclusive;
+ int pmu_disable_count;
raw_spinlock_t hrtimer_lock;
struct hrtimer hrtimer;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 348a379..8321b71 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1219,21 +1219,22 @@ static int perf_mux_hrtimer_restart_ipi(void *arg)
void perf_pmu_disable(struct pmu *pmu)
{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
if (!(*count)++)
pmu->pmu_disable(pmu);
}
void perf_pmu_enable(struct pmu *pmu)
{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
if (!--(*count))
pmu->pmu_enable(pmu);
}
static void perf_assert_pmu_disabled(struct pmu *pmu)
{
- WARN_ON_ONCE(*this_cpu_ptr(pmu->pmu_disable_count) == 0);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ WARN_ON_ONCE(*count == 0);
}
static inline void perf_pmu_read(struct perf_event *event)
@@ -11906,7 +11907,6 @@ static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
static void perf_pmu_free(struct pmu *pmu)
{
- free_percpu(pmu->pmu_disable_count);
if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
if (pmu->nr_addr_filters)
device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
@@ -11925,10 +11925,6 @@ int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
struct pmu *pmu __free(pmu_unregister) = _pmu;
guard(mutex)(&pmus_lock);
- pmu->pmu_disable_count = alloc_percpu(int);
- if (!pmu->pmu_disable_count)
- return -ENOMEM;
-
if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
return -EINVAL;
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
2024-11-04 13:39 ` [PATCH 09/19] perf: Merge pmu_disable_count into cpu_pmu_context Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 4baeb0687abf5eca3f7ab8b147c27cce82ec49ea
Gitweb: https://git.kernel.org/tip/4baeb0687abf5eca3f7ab8b147c27cce82ec49ea
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:18 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:47 +01:00
perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count
Because it makes no sense to have two per-cpu allocations per pmu.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.518730578@infradead.org
---
include/linux/perf_event.h | 2 +-
kernel/events/core.c | 12 ++++--------
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8c0117b..5f293e6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -343,7 +343,6 @@ struct pmu {
*/
unsigned int scope;
- int __percpu *pmu_disable_count;
struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
@@ -1031,6 +1030,7 @@ struct perf_cpu_pmu_context {
int active_oncpu;
int exclusive;
+ int pmu_disable_count;
raw_spinlock_t hrtimer_lock;
struct hrtimer hrtimer;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 348a379..8321b71 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1219,21 +1219,22 @@ static int perf_mux_hrtimer_restart_ipi(void *arg)
void perf_pmu_disable(struct pmu *pmu)
{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
if (!(*count)++)
pmu->pmu_disable(pmu);
}
void perf_pmu_enable(struct pmu *pmu)
{
- int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
if (!--(*count))
pmu->pmu_enable(pmu);
}
static void perf_assert_pmu_disabled(struct pmu *pmu)
{
- WARN_ON_ONCE(*this_cpu_ptr(pmu->pmu_disable_count) == 0);
+ int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ WARN_ON_ONCE(*count == 0);
}
static inline void perf_pmu_read(struct perf_event *event)
@@ -11906,7 +11907,6 @@ static bool idr_cmpxchg(struct idr *idr, unsigned long id, void *old, void *new)
static void perf_pmu_free(struct pmu *pmu)
{
- free_percpu(pmu->pmu_disable_count);
if (pmu_bus_running && pmu->dev && pmu->dev != PMU_NULL_DEV) {
if (pmu->nr_addr_filters)
device_remove_file(pmu->dev, &dev_attr_nr_addr_filters);
@@ -11925,10 +11925,6 @@ int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
struct pmu *pmu __free(pmu_unregister) = _pmu;
guard(mutex)(&pmus_lock);
- pmu->pmu_disable_count = alloc_percpu(int);
- if (!pmu->pmu_disable_count)
- return -ENOMEM;
-
if (WARN_ONCE(!name, "Can not register anonymous pmu.\n"))
return -EINVAL;
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 10/19] perf: Add this_cpc() helper
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (8 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 09/19] perf: Merge pmu_disable_count into cpu_pmu_context Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 11/19] perf: Detach perf_cpu_pmu_context and pmu lifetimes Peter Zijlstra
` (10 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
As a preparation for adding yet another indirection.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1176,23 +1176,28 @@ static int perf_mux_hrtimer_restart_ipi(
return perf_mux_hrtimer_restart(arg);
}
+static __always_inline struct perf_cpu_pmu_context *this_cpc(struct pmu *pmu)
+{
+ return this_cpu_ptr(pmu->cpu_pmu_context);
+}
+
void perf_pmu_disable(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
if (!(*count)++)
pmu->pmu_disable(pmu);
}
void perf_pmu_enable(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
if (!--(*count))
pmu->pmu_enable(pmu);
}
static void perf_assert_pmu_disabled(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
WARN_ON_ONCE(*count == 0);
}
@@ -2304,7 +2309,7 @@ static void
event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
enum perf_event_state state = PERF_EVENT_STATE_INACTIVE;
// XXX cpc serialization, probably per-cpu IRQ disabled
@@ -2445,9 +2450,8 @@ __perf_remove_from_context(struct perf_e
pmu_ctx->rotate_necessary = 0;
if (ctx->task && ctx->is_active) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu_ctx->pmu);
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = NULL;
}
@@ -2585,7 +2589,7 @@ static int
event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
int ret = 0;
WARN_ON_ONCE(event->ctx != ctx);
@@ -2692,7 +2696,7 @@ group_sched_in(struct perf_event *group_
static int group_can_go_on(struct perf_event *event, int can_add_hw)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
/*
* Groups consisting entirely of software events can always go on.
@@ -3315,9 +3319,8 @@ static void __pmu_ctx_sched_out(struct p
struct pmu *pmu = pmu_ctx->pmu;
if (ctx->task && !(ctx->is_active & EVENT_ALL)) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
- cpc = this_cpu_ptr(pmu->cpu_pmu_context);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = NULL;
}
@@ -3565,7 +3568,7 @@ static void perf_ctx_sched_task_cb(struc
struct perf_cpu_pmu_context *cpc;
list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
+ cpc = this_cpc(pmu_ctx->pmu);
if (cpc->sched_cb_usage && pmu_ctx->pmu->sched_task)
pmu_ctx->pmu->sched_task(pmu_ctx, sched_in);
@@ -3674,7 +3677,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb
void perf_sched_cb_dec(struct pmu *pmu)
{
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
this_cpu_dec(perf_sched_cb_usages);
barrier();
@@ -3686,7 +3689,7 @@ void perf_sched_cb_dec(struct pmu *pmu)
void perf_sched_cb_inc(struct pmu *pmu)
{
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
if (!cpc->sched_cb_usage++)
list_add(&cpc->sched_cb_entry, this_cpu_ptr(&sched_cb_list));
@@ -3810,7 +3813,7 @@ static void __link_epc(struct perf_event
if (!pmu_ctx->ctx->task)
return;
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
+ cpc = this_cpc(pmu_ctx->pmu);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = pmu_ctx;
}
@@ -3939,10 +3942,9 @@ static int merge_sched_in(struct perf_ev
perf_cgroup_event_disable(event, ctx);
perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
} else {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(event->pmu_ctx->pmu);
event->pmu_ctx->rotate_necessary = 1;
- cpc = this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context);
perf_mux_hrtimer_restart(cpc);
group_update_userpage(event);
}
^ permalink raw reply [flat|nested] 85+ messages in thread* [tip: perf/core] perf/core: Add this_cpc() helper
2024-11-04 13:39 ` [PATCH 10/19] perf: Add this_cpc() helper Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: a57411b14ea03d0d3acdf81f2df78df6307d3ad6
Gitweb: https://git.kernel.org/tip/a57411b14ea03d0d3acdf81f2df78df6307d3ad6
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:19 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:02:24 +01:00
perf/core: Add this_cpc() helper
As a preparation for adding yet another indirection.
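The accessor pattern this commit introduces can be sketched outside the kernel as follows. This is a hypothetical, simplified illustration (names like `this_cpc_like` and `pmu_like` are invented here, and a plain pointer stands in for the `__percpu` pointer): the point is that every caller goes through one tiny helper, so a later change of representation only touches that helper.

```c
#include <assert.h>

/* Stand-in for struct perf_cpu_pmu_context. */
struct cpc {
	int disable_count;
};

/* Stand-in for struct pmu; per_cpu_slot models the __percpu pointer. */
struct pmu_like {
	struct cpc *per_cpu_slot;
};

/*
 * The helper: callers no longer open-code the dereference, mirroring
 * this_cpc() in the patch.  Changing the indirection later (as the
 * follow-up patch does by adding an extra pointer level) only needs
 * an edit here.
 */
static inline struct cpc *this_cpc_like(struct pmu_like *p)
{
	return p->per_cpu_slot;
}

/* A caller in the style of perf_pmu_disable(). */
static void disable_like(struct pmu_like *p)
{
	int *count = &this_cpc_like(p)->disable_count;

	(*count)++;
}
```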
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.650051565@infradead.org
---
kernel/events/core.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8321b71..0c7015f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1217,23 +1217,28 @@ static int perf_mux_hrtimer_restart_ipi(void *arg)
return perf_mux_hrtimer_restart(arg);
}
+static __always_inline struct perf_cpu_pmu_context *this_cpc(struct pmu *pmu)
+{
+ return this_cpu_ptr(pmu->cpu_pmu_context);
+}
+
void perf_pmu_disable(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
if (!(*count)++)
pmu->pmu_disable(pmu);
}
void perf_pmu_enable(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
if (!--(*count))
pmu->pmu_enable(pmu);
}
static void perf_assert_pmu_disabled(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
WARN_ON_ONCE(*count == 0);
}
@@ -2355,7 +2360,7 @@ static void
event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
enum perf_event_state state = PERF_EVENT_STATE_INACTIVE;
// XXX cpc serialization, probably per-cpu IRQ disabled
@@ -2496,9 +2501,8 @@ __perf_remove_from_context(struct perf_event *event,
pmu_ctx->rotate_necessary = 0;
if (ctx->task && ctx->is_active) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu_ctx->pmu);
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = NULL;
}
@@ -2636,7 +2640,7 @@ static int
event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
int ret = 0;
WARN_ON_ONCE(event->ctx != ctx);
@@ -2743,7 +2747,7 @@ error:
static int group_can_go_on(struct perf_event *event, int can_add_hw)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
/*
* Groups consisting entirely of software events can always go on.
@@ -3366,9 +3370,8 @@ static void __pmu_ctx_sched_out(struct perf_event_pmu_context *pmu_ctx,
struct pmu *pmu = pmu_ctx->pmu;
if (ctx->task && !(ctx->is_active & EVENT_ALL)) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
- cpc = this_cpu_ptr(pmu->cpu_pmu_context);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = NULL;
}
@@ -3615,7 +3618,7 @@ static void perf_ctx_sched_task_cb(struct perf_event_context *ctx, bool sched_in
struct perf_cpu_pmu_context *cpc;
list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
+ cpc = this_cpc(pmu_ctx->pmu);
if (cpc->sched_cb_usage && pmu_ctx->pmu->sched_task)
pmu_ctx->pmu->sched_task(pmu_ctx, sched_in);
@@ -3724,7 +3727,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
void perf_sched_cb_dec(struct pmu *pmu)
{
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
this_cpu_dec(perf_sched_cb_usages);
barrier();
@@ -3736,7 +3739,7 @@ void perf_sched_cb_dec(struct pmu *pmu)
void perf_sched_cb_inc(struct pmu *pmu)
{
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
if (!cpc->sched_cb_usage++)
list_add(&cpc->sched_cb_entry, this_cpu_ptr(&sched_cb_list));
@@ -3853,7 +3856,7 @@ static void __link_epc(struct perf_event_pmu_context *pmu_ctx)
if (!pmu_ctx->ctx->task)
return;
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
+ cpc = this_cpc(pmu_ctx->pmu);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = pmu_ctx;
}
@@ -3982,10 +3985,9 @@ static int merge_sched_in(struct perf_event *event, void *data)
perf_cgroup_event_disable(event, ctx);
perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
} else {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(event->pmu_ctx->pmu);
event->pmu_ctx->rotate_necessary = 1;
- cpc = this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context);
perf_mux_hrtimer_restart(cpc);
group_update_userpage(event);
}
^ permalink raw reply related [flat|nested] 85+ messages in thread* [tip: perf/core] perf/core: Add this_cpc() helper
2024-11-04 13:39 ` [PATCH 10/19] perf: Add this_cpc() helper Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: b2996f56556e389a13377158904c218da6fffa91
Gitweb: https://git.kernel.org/tip/b2996f56556e389a13377158904c218da6fffa91
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:19 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:51 +01:00
perf/core: Add this_cpc() helper
As a preparation for adding yet another indirection.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.650051565@infradead.org
---
kernel/events/core.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8321b71..0c7015f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1217,23 +1217,28 @@ static int perf_mux_hrtimer_restart_ipi(void *arg)
return perf_mux_hrtimer_restart(arg);
}
+static __always_inline struct perf_cpu_pmu_context *this_cpc(struct pmu *pmu)
+{
+ return this_cpu_ptr(pmu->cpu_pmu_context);
+}
+
void perf_pmu_disable(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
if (!(*count)++)
pmu->pmu_disable(pmu);
}
void perf_pmu_enable(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
if (!--(*count))
pmu->pmu_enable(pmu);
}
static void perf_assert_pmu_disabled(struct pmu *pmu)
{
- int *count = &this_cpu_ptr(pmu->cpu_pmu_context)->pmu_disable_count;
+ int *count = &this_cpc(pmu)->pmu_disable_count;
WARN_ON_ONCE(*count == 0);
}
@@ -2355,7 +2360,7 @@ static void
event_sched_out(struct perf_event *event, struct perf_event_context *ctx)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
enum perf_event_state state = PERF_EVENT_STATE_INACTIVE;
// XXX cpc serialization, probably per-cpu IRQ disabled
@@ -2496,9 +2501,8 @@ __perf_remove_from_context(struct perf_event *event,
pmu_ctx->rotate_necessary = 0;
if (ctx->task && ctx->is_active) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu_ctx->pmu);
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = NULL;
}
@@ -2636,7 +2640,7 @@ static int
event_sched_in(struct perf_event *event, struct perf_event_context *ctx)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
int ret = 0;
WARN_ON_ONCE(event->ctx != ctx);
@@ -2743,7 +2747,7 @@ error:
static int group_can_go_on(struct perf_event *event, int can_add_hw)
{
struct perf_event_pmu_context *epc = event->pmu_ctx;
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(epc->pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(epc->pmu);
/*
* Groups consisting entirely of software events can always go on.
@@ -3366,9 +3370,8 @@ static void __pmu_ctx_sched_out(struct perf_event_pmu_context *pmu_ctx,
struct pmu *pmu = pmu_ctx->pmu;
if (ctx->task && !(ctx->is_active & EVENT_ALL)) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
- cpc = this_cpu_ptr(pmu->cpu_pmu_context);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = NULL;
}
@@ -3615,7 +3618,7 @@ static void perf_ctx_sched_task_cb(struct perf_event_context *ctx, bool sched_in
struct perf_cpu_pmu_context *cpc;
list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
+ cpc = this_cpc(pmu_ctx->pmu);
if (cpc->sched_cb_usage && pmu_ctx->pmu->sched_task)
pmu_ctx->pmu->sched_task(pmu_ctx, sched_in);
@@ -3724,7 +3727,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
void perf_sched_cb_dec(struct pmu *pmu)
{
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
this_cpu_dec(perf_sched_cb_usages);
barrier();
@@ -3736,7 +3739,7 @@ void perf_sched_cb_dec(struct pmu *pmu)
void perf_sched_cb_inc(struct pmu *pmu)
{
- struct perf_cpu_pmu_context *cpc = this_cpu_ptr(pmu->cpu_pmu_context);
+ struct perf_cpu_pmu_context *cpc = this_cpc(pmu);
if (!cpc->sched_cb_usage++)
list_add(&cpc->sched_cb_entry, this_cpu_ptr(&sched_cb_list));
@@ -3853,7 +3856,7 @@ static void __link_epc(struct perf_event_pmu_context *pmu_ctx)
if (!pmu_ctx->ctx->task)
return;
- cpc = this_cpu_ptr(pmu_ctx->pmu->cpu_pmu_context);
+ cpc = this_cpc(pmu_ctx->pmu);
WARN_ON_ONCE(cpc->task_epc && cpc->task_epc != pmu_ctx);
cpc->task_epc = pmu_ctx;
}
@@ -3982,10 +3985,9 @@ static int merge_sched_in(struct perf_event *event, void *data)
perf_cgroup_event_disable(event, ctx);
perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
} else {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc = this_cpc(event->pmu_ctx->pmu);
event->pmu_ctx->rotate_necessary = 1;
- cpc = this_cpu_ptr(event->pmu_ctx->pmu->cpu_pmu_context);
perf_mux_hrtimer_restart(cpc);
group_update_userpage(event);
}
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 11/19] perf: Detach perf_cpu_pmu_context and pmu lifetimes
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (9 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 10/19] perf: Add this_cpc() helper Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-03 12:29 ` [tip: perf/core] perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 12/19] perf: Introduce perf_free_addr_filters() Peter Zijlstra
` (9 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
In preparation for being able to unregister a pmu with existing events,
it becomes important to detach struct perf_cpu_pmu_context lifetimes
from that of struct pmu.
Notably perf_cpu_pmu_context embeds a perf_event_pmu_context that can
stay referenced until the last event goes.
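The lifetime scheme described above can be sketched in miniature. This is a hypothetical model, not the kernel code (the names `epc_like`/`cpc_like` are invented): the embedded context starts with a refcount of 2, one reference held by the pmu and one by the first event, so the containing allocation survives until whichever side drops its reference last.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for perf_event_pmu_context / perf_cpu_pmu_context. */
struct epc_like {
	int refcount;
};

struct cpc_like {
	struct epc_like epc;	/* embedded, as in the patch */
};

static int freed;	/* observable side effect for this sketch */

static void put_epc(struct epc_like *epc)
{
	if (--epc->refcount == 0) {
		/* container_of() in the real code; epc is the first member here */
		free((struct cpc_like *)epc);
		freed = 1;
	}
}

static struct cpc_like *alloc_cpc(void)
{
	struct cpc_like *cpc = calloc(1, sizeof(*cpc));

	/* One reference for the pmu, one for the event; see the
	 * atomic_set(&epc->refcount, 2) hunk in the patch. */
	cpc->epc.refcount = 2;
	return cpc;
}
```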
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/perf_event.h | 4 +--
kernel/events/core.c | 56 +++++++++++++++++++++++++++++++++++++--------
2 files changed, 49 insertions(+), 11 deletions(-)
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -336,7 +336,7 @@ struct pmu {
*/
unsigned int scope;
- struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
+ struct perf_cpu_pmu_context __percpu **cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
int hrtimer_interval_ms;
@@ -901,7 +901,7 @@ struct perf_event_pmu_context {
struct list_head pinned_active;
struct list_head flexible_active;
- /* Used to avoid freeing per-cpu perf_event_pmu_context */
+ /* Used to identify the per-cpu perf_event_pmu_context */
unsigned int embedded : 1;
unsigned int nr_events;
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1178,7 +1178,7 @@ static int perf_mux_hrtimer_restart_ipi(
static __always_inline struct perf_cpu_pmu_context *this_cpc(struct pmu *pmu)
{
- return this_cpu_ptr(pmu->cpu_pmu_context);
+ return *this_cpu_ptr(pmu->cpu_pmu_context);
}
void perf_pmu_disable(struct pmu *pmu)
@@ -4971,11 +4971,14 @@ find_get_pmu_context(struct pmu *pmu, st
*/
struct perf_cpu_pmu_context *cpc;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
epc = &cpc->epc;
raw_spin_lock_irq(&ctx->lock);
if (!epc->ctx) {
- atomic_set(&epc->refcount, 1);
+ /*
+ * One extra reference for the pmu; see perf_pmu_free().
+ */
+ atomic_set(&epc->refcount, 2);
epc->embedded = 1;
list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
epc->ctx = ctx;
@@ -5044,6 +5047,15 @@ static void get_pmu_ctx(struct perf_even
WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount));
}
+static void free_cpc_rcu(struct rcu_head *head)
+{
+ struct perf_cpu_pmu_context *cpc =
+ container_of(head, typeof(*cpc), epc.rcu_head);
+
+ kfree(cpc->epc.task_ctx_data);
+ kfree(cpc);
+}
+
static void free_epc_rcu(struct rcu_head *head)
{
struct perf_event_pmu_context *epc = container_of(head, typeof(*epc), rcu_head);
@@ -5078,8 +5090,10 @@ static void put_pmu_ctx(struct perf_even
raw_spin_unlock_irqrestore(&ctx->lock, flags);
- if (epc->embedded)
+ if (epc->embedded) {
+ call_rcu(&epc->rcu_head, free_cpc_rcu);
return;
+ }
call_rcu(&epc->rcu_head, free_epc_rcu);
}
@@ -11595,7 +11609,7 @@ perf_event_mux_interval_ms_store(struct
cpus_read_lock();
for_each_online_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, cpu);
cpc->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
cpu_function_call(cpu, perf_mux_hrtimer_restart_ipi, cpc);
@@ -11767,7 +11781,25 @@ static void perf_pmu_free(struct pmu *pm
device_del(pmu->dev);
put_device(pmu->dev);
}
- free_percpu(pmu->cpu_pmu_context);
+
+ if (pmu->cpu_pmu_context) {
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_pmu_context *cpc;
+
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ if (!cpc)
+ continue;
+ if (cpc->epc.embedded) {
+ /* refcount managed */
+ put_pmu_ctx(&cpc->epc);
+ continue;
+ }
+ kfree(cpc);
+ }
+ free_percpu(pmu->cpu_pmu_context);
+ }
}
DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
@@ -11806,14 +11838,20 @@ int perf_pmu_register(struct pmu *_pmu,
return ret;
}
- pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
+ pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context *);
if (!pmu->cpu_pmu_context)
return -ENOMEM;
for_each_possible_cpu(cpu) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc =
+ kmalloc_node(sizeof(struct perf_cpu_pmu_context),
+ GFP_KERNEL | __GFP_ZERO,
+ cpu_to_node(cpu));
+
+ if (!cpc)
+ return -ENOMEM;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ *per_cpu_ptr(pmu->cpu_pmu_context, cpu) = cpc;
__perf_init_event_pmu_context(&cpc->epc, pmu);
__perf_mux_hrtimer_init(cpc, cpu);
}
^ permalink raw reply [flat|nested] 85+ messages in thread* [tip: perf/core] perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
2024-11-04 13:39 ` [PATCH 11/19] perf: Detach perf_cpu_pmu_context and pmu lifetimes Peter Zijlstra
@ 2025-03-03 12:29 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-03 12:29 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: f67d1ffd841f31bc4a1314bc7f0a973ba77f39a5
Gitweb: https://git.kernel.org/tip/f67d1ffd841f31bc4a1314bc7f0a973ba77f39a5
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:20 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 03 Mar 2025 13:24:12 +01:00
perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
In preparation for being able to unregister a PMU with existing events,
it becomes important to detach struct perf_cpu_pmu_context lifetimes
from that of struct pmu.
Notably struct perf_cpu_pmu_context embeds a struct perf_event_pmu_context
that can stay referenced until the last event goes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.760214287@infradead.org
---
include/linux/perf_event.h | 4 +--
kernel/events/core.c | 56 +++++++++++++++++++++++++++++++------
2 files changed, 49 insertions(+), 11 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5f293e6..76f4265 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -343,7 +343,7 @@ struct pmu {
*/
unsigned int scope;
- struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
+ struct perf_cpu_pmu_context __percpu **cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
int hrtimer_interval_ms;
@@ -922,7 +922,7 @@ struct perf_event_pmu_context {
struct list_head pinned_active;
struct list_head flexible_active;
- /* Used to avoid freeing per-cpu perf_event_pmu_context */
+ /* Used to identify the per-cpu perf_event_pmu_context */
unsigned int embedded : 1;
unsigned int nr_events;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 773875a..8b2a8c3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1219,7 +1219,7 @@ static int perf_mux_hrtimer_restart_ipi(void *arg)
static __always_inline struct perf_cpu_pmu_context *this_cpc(struct pmu *pmu)
{
- return this_cpu_ptr(pmu->cpu_pmu_context);
+ return *this_cpu_ptr(pmu->cpu_pmu_context);
}
void perf_pmu_disable(struct pmu *pmu)
@@ -5007,11 +5007,14 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
*/
struct perf_cpu_pmu_context *cpc;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
epc = &cpc->epc;
raw_spin_lock_irq(&ctx->lock);
if (!epc->ctx) {
- atomic_set(&epc->refcount, 1);
+ /*
+ * One extra reference for the pmu; see perf_pmu_free().
+ */
+ atomic_set(&epc->refcount, 2);
epc->embedded = 1;
list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
epc->ctx = ctx;
@@ -5087,6 +5090,15 @@ static void get_pmu_ctx(struct perf_event_pmu_context *epc)
WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount));
}
+static void free_cpc_rcu(struct rcu_head *head)
+{
+ struct perf_cpu_pmu_context *cpc =
+ container_of(head, typeof(*cpc), epc.rcu_head);
+
+ kfree(cpc->epc.task_ctx_data);
+ kfree(cpc);
+}
+
static void free_epc_rcu(struct rcu_head *head)
{
struct perf_event_pmu_context *epc = container_of(head, typeof(*epc), rcu_head);
@@ -5121,8 +5133,10 @@ static void put_pmu_ctx(struct perf_event_pmu_context *epc)
raw_spin_unlock_irqrestore(&ctx->lock, flags);
- if (epc->embedded)
+ if (epc->embedded) {
+ call_rcu(&epc->rcu_head, free_cpc_rcu);
return;
+ }
call_rcu(&epc->rcu_head, free_epc_rcu);
}
@@ -11752,7 +11766,7 @@ perf_event_mux_interval_ms_store(struct device *dev,
cpus_read_lock();
for_each_online_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, cpu);
cpc->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
cpu_function_call(cpu, perf_mux_hrtimer_restart_ipi, cpc);
@@ -11925,7 +11939,25 @@ static void perf_pmu_free(struct pmu *pmu)
device_del(pmu->dev);
put_device(pmu->dev);
}
- free_percpu(pmu->cpu_pmu_context);
+
+ if (pmu->cpu_pmu_context) {
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_pmu_context *cpc;
+
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ if (!cpc)
+ continue;
+ if (cpc->epc.embedded) {
+ /* refcount managed */
+ put_pmu_ctx(&cpc->epc);
+ continue;
+ }
+ kfree(cpc);
+ }
+ free_percpu(pmu->cpu_pmu_context);
+ }
}
DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
@@ -11964,14 +11996,20 @@ int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
return ret;
}
- pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
+ pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context *);
if (!pmu->cpu_pmu_context)
return -ENOMEM;
for_each_possible_cpu(cpu) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc =
+ kmalloc_node(sizeof(struct perf_cpu_pmu_context),
+ GFP_KERNEL | __GFP_ZERO,
+ cpu_to_node(cpu));
+
+ if (!cpc)
+ return -ENOMEM;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ *per_cpu_ptr(pmu->cpu_pmu_context, cpu) = cpc;
__perf_init_event_pmu_context(&cpc->epc, pmu);
__perf_mux_hrtimer_init(cpc, cpu);
}
^ permalink raw reply related [flat|nested] 85+ messages in thread* [tip: perf/core] perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
2024-11-04 13:39 ` [PATCH 11/19] perf: Detach perf_cpu_pmu_context and pmu lifetimes Peter Zijlstra
2025-03-03 12:29 ` [tip: perf/core] perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:56 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 4eabf533fb1886089ef57e0c8ec52048b1741e39
Gitweb: https://git.kernel.org/tip/4eabf533fb1886089ef57e0c8ec52048b1741e39
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:20 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:43:22 +01:00
perf/core: Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
In preparation for being able to unregister a PMU with existing events,
it becomes important to detach struct perf_cpu_pmu_context lifetimes
from that of struct pmu.
Notably struct perf_cpu_pmu_context embeds a struct perf_event_pmu_context
that can stay referenced until the last event goes.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.760214287@infradead.org
---
include/linux/perf_event.h | 4 +--
kernel/events/core.c | 56 +++++++++++++++++++++++++++++++------
2 files changed, 49 insertions(+), 11 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5f293e6..76f4265 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -343,7 +343,7 @@ struct pmu {
*/
unsigned int scope;
- struct perf_cpu_pmu_context __percpu *cpu_pmu_context;
+ struct perf_cpu_pmu_context __percpu **cpu_pmu_context;
atomic_t exclusive_cnt; /* < 0: cpu; > 0: tsk */
int task_ctx_nr;
int hrtimer_interval_ms;
@@ -922,7 +922,7 @@ struct perf_event_pmu_context {
struct list_head pinned_active;
struct list_head flexible_active;
- /* Used to avoid freeing per-cpu perf_event_pmu_context */
+ /* Used to identify the per-cpu perf_event_pmu_context */
unsigned int embedded : 1;
unsigned int nr_events;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 773875a..8b2a8c3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1219,7 +1219,7 @@ static int perf_mux_hrtimer_restart_ipi(void *arg)
static __always_inline struct perf_cpu_pmu_context *this_cpc(struct pmu *pmu)
{
- return this_cpu_ptr(pmu->cpu_pmu_context);
+ return *this_cpu_ptr(pmu->cpu_pmu_context);
}
void perf_pmu_disable(struct pmu *pmu)
@@ -5007,11 +5007,14 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
*/
struct perf_cpu_pmu_context *cpc;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, event->cpu);
epc = &cpc->epc;
raw_spin_lock_irq(&ctx->lock);
if (!epc->ctx) {
- atomic_set(&epc->refcount, 1);
+ /*
+ * One extra reference for the pmu; see perf_pmu_free().
+ */
+ atomic_set(&epc->refcount, 2);
epc->embedded = 1;
list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
epc->ctx = ctx;
@@ -5087,6 +5090,15 @@ static void get_pmu_ctx(struct perf_event_pmu_context *epc)
WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount));
}
+static void free_cpc_rcu(struct rcu_head *head)
+{
+ struct perf_cpu_pmu_context *cpc =
+ container_of(head, typeof(*cpc), epc.rcu_head);
+
+ kfree(cpc->epc.task_ctx_data);
+ kfree(cpc);
+}
+
static void free_epc_rcu(struct rcu_head *head)
{
struct perf_event_pmu_context *epc = container_of(head, typeof(*epc), rcu_head);
@@ -5121,8 +5133,10 @@ static void put_pmu_ctx(struct perf_event_pmu_context *epc)
raw_spin_unlock_irqrestore(&ctx->lock, flags);
- if (epc->embedded)
+ if (epc->embedded) {
+ call_rcu(&epc->rcu_head, free_cpc_rcu);
return;
+ }
call_rcu(&epc->rcu_head, free_epc_rcu);
}
@@ -11752,7 +11766,7 @@ perf_event_mux_interval_ms_store(struct device *dev,
cpus_read_lock();
for_each_online_cpu(cpu) {
struct perf_cpu_pmu_context *cpc;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, cpu);
cpc->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
cpu_function_call(cpu, perf_mux_hrtimer_restart_ipi, cpc);
@@ -11925,7 +11939,25 @@ static void perf_pmu_free(struct pmu *pmu)
device_del(pmu->dev);
put_device(pmu->dev);
}
- free_percpu(pmu->cpu_pmu_context);
+
+ if (pmu->cpu_pmu_context) {
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_pmu_context *cpc;
+
+ cpc = *per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ if (!cpc)
+ continue;
+ if (cpc->epc.embedded) {
+ /* refcount managed */
+ put_pmu_ctx(&cpc->epc);
+ continue;
+ }
+ kfree(cpc);
+ }
+ free_percpu(pmu->cpu_pmu_context);
+ }
}
DEFINE_FREE(pmu_unregister, struct pmu *, if (_T) perf_pmu_free(_T))
@@ -11964,14 +11996,20 @@ int perf_pmu_register(struct pmu *_pmu, const char *name, int type)
return ret;
}
- pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context);
+ pmu->cpu_pmu_context = alloc_percpu(struct perf_cpu_pmu_context *);
if (!pmu->cpu_pmu_context)
return -ENOMEM;
for_each_possible_cpu(cpu) {
- struct perf_cpu_pmu_context *cpc;
+ struct perf_cpu_pmu_context *cpc =
+ kmalloc_node(sizeof(struct perf_cpu_pmu_context),
+ GFP_KERNEL | __GFP_ZERO,
+ cpu_to_node(cpu));
+
+ if (!cpc)
+ return -ENOMEM;
- cpc = per_cpu_ptr(pmu->cpu_pmu_context, cpu);
+ *per_cpu_ptr(pmu->cpu_pmu_context, cpu) = cpc;
__perf_init_event_pmu_context(&cpc->epc, pmu);
__perf_mux_hrtimer_init(cpc, cpu);
}
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 12/19] perf: Introduce perf_free_addr_filters()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (10 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 11/19] perf: Detach perf_cpu_pmu_context and pmu lifetimes Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 13/19] perf: Robustify perf_event_free_bpf_prog() Peter Zijlstra
` (8 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Replace _free_event()'s use of perf_addr_filters_splice() with an
explicit perf_free_addr_filters() that has the explicit property that
it is able to be called a second time without ill effect.
Most notably, referencing event->pmu must be avoided when there are no
filters left (e.g. from a previous call).
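The "safe to call twice" property can be sketched as follows. This is a hypothetical model (the names `event_like`, `splice_filters`, and `free_filters` are invented here): the teardown that would touch event->pmu in the real code only runs while the filter list is non-empty, so a repeated call is a cheap no-op.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

/* Stand-in for struct perf_event with its addr_filters list. */
struct event_like {
	struct node *filters;
	int splice_calls;	/* counts how often the teardown ran */
};

/* Stands in for perf_addr_filters_splice(); in the kernel this is the
 * part that must not run again once event->pmu may be gone. */
static void splice_filters(struct event_like *e)
{
	e->splice_calls++;
	e->filters = NULL;
}

/* Mirrors perf_free_addr_filters(): bail out early when the list is
 * already empty, so a second call has no ill effect. */
static void free_filters(struct event_like *e)
{
	if (!e->filters)	/* the list_empty() check in the patch */
		return;
	splice_filters(e);
}
```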
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5307,8 +5307,7 @@ static bool exclusive_event_installable(
return true;
}
-static void perf_addr_filters_splice(struct perf_event *event,
- struct list_head *head);
+static void perf_free_addr_filters(struct perf_event *event);
static void perf_pending_task_sync(struct perf_event *event)
{
@@ -5407,7 +5406,7 @@ static void _free_event(struct perf_even
}
perf_event_free_bpf_prog(event);
- perf_addr_filters_splice(event, NULL);
+ perf_free_addr_filters(event);
__free_event(event);
}
@@ -10880,6 +10882,17 @@ static void perf_addr_filters_splice(str
free_filters_list(&list);
}
+static void perf_free_addr_filters(struct perf_event *event)
+{
+ /*
+ * Used during free paths, there is no concurrency.
+ */
+ if (list_empty(&event->addr_filters.list))
+ return;
+
+ perf_addr_filters_splice(event, NULL);
+}
+
/*
* Scan through mm's vmas and see if one of them matches the
* @filter; if so, adjust filter's address range.
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Introduce perf_free_addr_filters()
2024-11-04 13:39 ` [PATCH 12/19] perf: Introduce perf_free_addr_filters() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 8e140c656746ef14d34be68d56f1a1047991a8be
Gitweb: https://git.kernel.org/tip/8e140c656746ef14d34be68d56f1a1047991a8be
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:21 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:02:51 +01:00
perf/core: Introduce perf_free_addr_filters()
Replace _free_event()'s use of perf_addr_filters_splice() with an
explicit perf_free_addr_filters(), with the explicit property that it
can be called a second time without ill effect.
Most notably, referencing event->pmu must be avoided when there are no
filters left (e.g. from a previous call).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.868460518@infradead.org
---
kernel/events/core.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0c7015f..525c64e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5339,8 +5339,7 @@ static bool exclusive_event_installable(struct perf_event *event,
return true;
}
-static void perf_addr_filters_splice(struct perf_event *event,
- struct list_head *head);
+static void perf_free_addr_filters(struct perf_event *event);
static void perf_pending_task_sync(struct perf_event *event)
{
@@ -5439,7 +5438,7 @@ static void _free_event(struct perf_event *event)
}
perf_event_free_bpf_prog(event);
- perf_addr_filters_splice(event, NULL);
+ perf_free_addr_filters(event);
__free_event(event);
}
@@ -11004,6 +11003,17 @@ static void perf_addr_filters_splice(struct perf_event *event,
free_filters_list(&list);
}
+static void perf_free_addr_filters(struct perf_event *event)
+{
+ /*
+ * Used during free paths, there is no concurrency.
+ */
+ if (list_empty(&event->addr_filters.list))
+ return;
+
+ perf_addr_filters_splice(event, NULL);
+}
+
/*
* Scan through mm's vmas and see if one of them matches the
* @filter; if so, adjust filter's address range.
^ permalink raw reply related [flat|nested] 85+ messages in thread* [tip: perf/core] perf/core: Introduce perf_free_addr_filters()
2024-11-04 13:39 ` [PATCH 12/19] perf: Introduce perf_free_addr_filters() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: adc38b4ca1ed25ed2f1300e4d87c483bf51bfd50
Gitweb: https://git.kernel.org/tip/adc38b4ca1ed25ed2f1300e4d87c483bf51bfd50
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:21 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:55 +01:00
perf/core: Introduce perf_free_addr_filters()
Replace _free_event()'s use of perf_addr_filters_splice() with an
explicit perf_free_addr_filters(), with the explicit property that it
can be called a second time without ill effect.
Most notably, referencing event->pmu must be avoided when there are no
filters left (e.g. from a previous call).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135518.868460518@infradead.org
---
kernel/events/core.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0c7015f..525c64e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5339,8 +5339,7 @@ static bool exclusive_event_installable(struct perf_event *event,
return true;
}
-static void perf_addr_filters_splice(struct perf_event *event,
- struct list_head *head);
+static void perf_free_addr_filters(struct perf_event *event);
static void perf_pending_task_sync(struct perf_event *event)
{
@@ -5439,7 +5438,7 @@ static void _free_event(struct perf_event *event)
}
perf_event_free_bpf_prog(event);
- perf_addr_filters_splice(event, NULL);
+ perf_free_addr_filters(event);
__free_event(event);
}
@@ -11004,6 +11003,17 @@ static void perf_addr_filters_splice(struct perf_event *event,
free_filters_list(&list);
}
+static void perf_free_addr_filters(struct perf_event *event)
+{
+ /*
+ * Used during free paths, there is no concurrency.
+ */
+ if (list_empty(&event->addr_filters.list))
+ return;
+
+ perf_addr_filters_splice(event, NULL);
+}
+
/*
* Scan through mm's vmas and see if one of them matches the
* @filter; if so, adjust filter's address range.
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 13/19] perf: Robustify perf_event_free_bpf_prog()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (11 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 12/19] perf: Introduce perf_free_addr_filters() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/bpf: " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 14/19] perf: Simplify perf_mmap() control flow Peter Zijlstra
` (7 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Ensure perf_event_free_bpf_prog() is safe to call a second time;
notably without making any references to event->pmu when there is no
prog left.
XXX perf_event_detach_bpf_prog() might leave a stale event->prog
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10782,6 +10781,9 @@ int perf_event_set_bpf_prog(struct perf_
void perf_event_free_bpf_prog(struct perf_event *event)
{
+ if (!event->prog)
+ return;
+
if (!perf_event_is_tracing(event)) {
perf_event_free_bpf_handler(event);
return;
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/bpf: Robustify perf_event_free_bpf_prog()
2024-11-04 13:39 ` [PATCH 13/19] perf: Robustify perf_event_free_bpf_prog() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 26700b1359a121c42eb7517e053a86b07466b79b
Gitweb: https://git.kernel.org/tip/26700b1359a121c42eb7517e053a86b07466b79b
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:22 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:06:47 +01:00
perf/bpf: Robustify perf_event_free_bpf_prog()
Ensure perf_event_free_bpf_prog() is safe to call a second time;
notably without making any references to event->pmu when there is no
prog left.
Note: perf_event_detach_bpf_prog() might leave a stale event->prog
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.978956692@infradead.org
---
kernel/events/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 525c64e..ab4e497 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10905,6 +10905,9 @@ int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
void perf_event_free_bpf_prog(struct perf_event *event)
{
+ if (!event->prog)
+ return;
+
if (!perf_event_is_tracing(event)) {
perf_event_free_bpf_handler(event);
return;
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/bpf: Robustify perf_event_free_bpf_prog()
2024-11-04 13:39 ` [PATCH 13/19] perf: Robustify perf_event_free_bpf_prog() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/bpf: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: c5b96789575b670b1e776071bb243e0ed3d3abaa
Gitweb: https://git.kernel.org/tip/c5b96789575b670b1e776071bb243e0ed3d3abaa
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:22 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:42:59 +01:00
perf/bpf: Robustify perf_event_free_bpf_prog()
Ensure perf_event_free_bpf_prog() is safe to call a second time;
notably without making any references to event->pmu when there is no
prog left.
Note: perf_event_detach_bpf_prog() might leave a stale event->prog
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241104135518.978956692@infradead.org
---
kernel/events/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 525c64e..ab4e497 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10905,6 +10905,9 @@ int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
void perf_event_free_bpf_prog(struct perf_event *event)
{
+ if (!event->prog)
+ return;
+
if (!perf_event_is_tracing(event)) {
perf_event_free_bpf_handler(event);
return;
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 14/19] perf: Simplify perf_mmap() control flow
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (12 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 13/19] perf: Robustify perf_event_free_bpf_prog() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 15/19] perf: Fix perf_mmap() failure path Peter Zijlstra
` (6 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
if (c) {
X1;
} else {
Y;
goto l;
}
X2;
l:
into:
if (c) {
X1;
X2;
} else {
Y;
}
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 71 ++++++++++++++++++++++++---------------------------
1 file changed, 34 insertions(+), 37 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6647,6 +6647,40 @@ static int perf_mmap(struct file *file,
if (vma->vm_pgoff == 0) {
nr_pages = (vma_size / PAGE_SIZE) - 1;
+
+ /*
+ * If we have rb pages ensure they're a power-of-two number, so we
+ * can do bitmasks instead of modulo.
+ */
+ if (nr_pages != 0 && !is_power_of_2(nr_pages))
+ return -EINVAL;
+
+ if (vma_size != PAGE_SIZE * (1 + nr_pages))
+ return -EINVAL;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+again:
+ mutex_lock(&event->mmap_mutex);
+ if (event->rb) {
+ if (data_page_nr(event->rb) != nr_pages) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+ /*
+ * Raced against perf_mmap_close(); remove the
+ * event and try again.
+ */
+ ring_buffer_attach(event, NULL);
+ mutex_unlock(&event->mmap_mutex);
+ goto again;
+ }
+
+ goto unlock;
+ }
+
+ user_extra = nr_pages + 1;
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
@@ -6706,45 +6740,8 @@ static int perf_mmap(struct file *file,
atomic_set(&rb->aux_mmap_count, 1);
user_extra = nr_pages;
-
- goto accounting;
}
- /*
- * If we have rb pages ensure they're a power-of-two number, so we
- * can do bitmasks instead of modulo.
- */
- if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
-
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
-again:
- mutex_lock(&event->mmap_mutex);
- if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages) {
- ret = -EINVAL;
- goto unlock;
- }
-
- if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
- /*
- * Raced against perf_mmap_close(); remove the
- * event and try again.
- */
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- goto again;
- }
-
- goto unlock;
- }
-
- user_extra = nr_pages + 1;
-
-accounting:
user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
/*
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify the perf_mmap() control flow
2024-11-04 13:39 ` [PATCH 14/19] perf: Simplify perf_mmap() control flow Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 7503c90c0df8d0178be66d53705eacd9e843d762
Gitweb: https://git.kernel.org/tip/7503c90c0df8d0178be66d53705eacd9e843d762
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:23 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:12:53 +01:00
perf/core: Simplify the perf_mmap() control flow
Identity-transform:
if (c) {
X1;
} else {
Y;
goto l;
}
X2;
l:
into the simpler:
if (c) {
X1;
X2;
} else {
Y;
}
[ mingo: Forward ported it ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135519.095904637@infradead.org
---
kernel/events/core.c | 75 ++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 39 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab4e497..d1b04c8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6701,6 +6701,42 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (vma->vm_pgoff == 0) {
nr_pages = (vma_size / PAGE_SIZE) - 1;
+
+ /*
+ * If we have rb pages ensure they're a power-of-two number, so we
+ * can do bitmasks instead of modulo.
+ */
+ if (nr_pages != 0 && !is_power_of_2(nr_pages))
+ return -EINVAL;
+
+ if (vma_size != PAGE_SIZE * (1 + nr_pages))
+ return -EINVAL;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+again:
+ mutex_lock(&event->mmap_mutex);
+ if (event->rb) {
+ if (data_page_nr(event->rb) != nr_pages) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+ /*
+ * Raced against perf_mmap_close(); remove the
+ * event and try again.
+ */
+ ring_buffer_attach(event, NULL);
+ mutex_unlock(&event->mmap_mutex);
+ goto again;
+ }
+
+ /* We need the rb to map pages. */
+ rb = event->rb;
+ goto unlock;
+ }
+
+ user_extra = nr_pages + 1;
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
@@ -6760,47 +6796,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
atomic_set(&rb->aux_mmap_count, 1);
user_extra = nr_pages;
-
- goto accounting;
- }
-
- /*
- * If we have rb pages ensure they're a power-of-two number, so we
- * can do bitmasks instead of modulo.
- */
- if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
-
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
-again:
- mutex_lock(&event->mmap_mutex);
- if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages) {
- ret = -EINVAL;
- goto unlock;
- }
-
- if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
- /*
- * Raced against perf_mmap_close(); remove the
- * event and try again.
- */
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- goto again;
- }
-
- /* We need the rb to map pages. */
- rb = event->rb;
- goto unlock;
}
- user_extra = nr_pages + 1;
-
-accounting:
user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
/*
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Simplify the perf_mmap() control flow
2024-11-04 13:39 ` [PATCH 14/19] perf: Simplify perf_mmap() control flow Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: Simplify the " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 954878377bc81459b95937a05f01e8ebf6a05083
Gitweb: https://git.kernel.org/tip/954878377bc81459b95937a05f01e8ebf6a05083
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:23 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:43:05 +01:00
perf/core: Simplify the perf_mmap() control flow
Identity-transform:
if (c) {
X1;
} else {
Y;
goto l;
}
X2;
l:
into the simpler:
if (c) {
X1;
X2;
} else {
Y;
}
[ mingo: Forward ported it ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135519.095904637@infradead.org
---
kernel/events/core.c | 75 ++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 39 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab4e497..d1b04c8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6701,6 +6701,42 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (vma->vm_pgoff == 0) {
nr_pages = (vma_size / PAGE_SIZE) - 1;
+
+ /*
+ * If we have rb pages ensure they're a power-of-two number, so we
+ * can do bitmasks instead of modulo.
+ */
+ if (nr_pages != 0 && !is_power_of_2(nr_pages))
+ return -EINVAL;
+
+ if (vma_size != PAGE_SIZE * (1 + nr_pages))
+ return -EINVAL;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+again:
+ mutex_lock(&event->mmap_mutex);
+ if (event->rb) {
+ if (data_page_nr(event->rb) != nr_pages) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+ /*
+ * Raced against perf_mmap_close(); remove the
+ * event and try again.
+ */
+ ring_buffer_attach(event, NULL);
+ mutex_unlock(&event->mmap_mutex);
+ goto again;
+ }
+
+ /* We need the rb to map pages. */
+ rb = event->rb;
+ goto unlock;
+ }
+
+ user_extra = nr_pages + 1;
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
@@ -6760,47 +6796,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
atomic_set(&rb->aux_mmap_count, 1);
user_extra = nr_pages;
-
- goto accounting;
- }
-
- /*
- * If we have rb pages ensure they're a power-of-two number, so we
- * can do bitmasks instead of modulo.
- */
- if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
-
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
-again:
- mutex_lock(&event->mmap_mutex);
- if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages) {
- ret = -EINVAL;
- goto unlock;
- }
-
- if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
- /*
- * Raced against perf_mmap_close(); remove the
- * event and try again.
- */
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- goto again;
- }
-
- /* We need the rb to map pages. */
- rb = event->rb;
- goto unlock;
}
- user_extra = nr_pages + 1;
-
-accounting:
user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
/*
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 15/19] perf: Fix perf_mmap() failure path
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (13 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 14/19] perf: Simplify perf_mmap() control flow Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 19:17 ` Ingo Molnar
` (2 more replies)
2024-11-04 13:39 ` [PATCH 16/19] perf: Further simplify perf_mmap() Peter Zijlstra
` (5 subsequent siblings)
20 siblings, 3 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
When f_ops->mmap() returns failure, m_ops->close() is *not* called.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6830,7 +6830,7 @@ static int perf_mmap(struct file *file,
vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &perf_mmap_vmops;
- if (event->pmu->event_mapped)
+ if (!ret && event->pmu->event_mapped)
event->pmu->event_mapped(event, vma->vm_mm);
return ret;
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 15/19] perf: Fix perf_mmap() failure path
2024-11-04 13:39 ` [PATCH 15/19] perf: Fix perf_mmap() failure path Peter Zijlstra
@ 2025-03-01 19:17 ` Ingo Molnar
2025-03-03 12:38 ` Lorenzo Stoakes
2025-03-04 8:46 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
2 siblings, 1 reply; 85+ messages in thread
From: Ingo Molnar @ 2025-03-01 19:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: lucas.demarchi, linux-kernel, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
* Peter Zijlstra <peterz@infradead.org> wrote:
> When f_ops->mmap() returns failure, m_ops->close() is *not* called.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/events/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6830,7 +6830,7 @@ static int perf_mmap(struct file *file,
> vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
> vma->vm_ops = &perf_mmap_vmops;
>
> - if (event->pmu->event_mapped)
> + if (!ret && event->pmu->event_mapped)
> event->pmu->event_mapped(event, vma->vm_mm);
>
> return ret;
I'm wondering whether this fix is still relevant in context of this
recent commit:
b709eb872e19 perf: map pages in advance
Thanks,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 15/19] perf: Fix perf_mmap() failure path
2025-03-01 19:17 ` Ingo Molnar
@ 2025-03-03 12:38 ` Lorenzo Stoakes
0 siblings, 0 replies; 85+ messages in thread
From: Lorenzo Stoakes @ 2025-03-03 12:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, lucas.demarchi, linux-kernel, willy, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang
On Sat, Mar 01, 2025 at 08:17:02PM +0100, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > When f_ops->mmap() returns failure, m_ops->close() is *not* called.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/events/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6830,7 +6830,7 @@ static int perf_mmap(struct file *file,
> > vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
> > vma->vm_ops = &perf_mmap_vmops;
> >
> > - if (event->pmu->event_mapped)
> > + if (!ret && event->pmu->event_mapped)
> > event->pmu->event_mapped(event, vma->vm_mm);
> >
> > return ret;
>
> I'm wondering whether this fix is still relevant in context of this
> recent commit:
>
> b709eb872e19 perf: map pages in advance
Yeah this is, as perf_mmap() will still be triggered by the mmap code when
this is mmap'd, all we do is essentially prefault the pages at this point
also.
>
> Thanks,
>
> Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Fix perf_mmap() failure path
2024-11-04 13:39 ` [PATCH 15/19] perf: Fix perf_mmap() failure path Peter Zijlstra
2025-03-01 19:17 ` Ingo Molnar
@ 2025-03-04 8:46 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
2 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Lorenzo Stoakes, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: bfd33e88addda078a089657044945858a33c435e
Gitweb: https://git.kernel.org/tip/bfd33e88addda078a089657044945858a33c435e
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:24 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:39:04 +01:00
perf/core: Fix perf_mmap() failure path
When f_ops->mmap() returns failure, m_ops->close() is *not* called.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20241104135519.248358497@infradead.org
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8b2a8c3..b2334d2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6903,7 +6903,7 @@ aux_unlock:
if (!ret)
ret = map_range(rb, vma);
- if (event->pmu->event_mapped)
+ if (!ret && event->pmu->event_mapped)
event->pmu->event_mapped(event, vma->vm_mm);
return ret;
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Fix perf_mmap() failure path
2024-11-04 13:39 ` [PATCH 15/19] perf: Fix perf_mmap() failure path Peter Zijlstra
2025-03-01 19:17 ` Ingo Molnar
2025-03-04 8:46 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
2 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:56 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Lorenzo Stoakes,
Ravi Bangoria, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 66477c7230eb1f9b90deb8c0f4da2bac2053c329
Gitweb: https://git.kernel.org/tip/66477c7230eb1f9b90deb8c0f4da2bac2053c329
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:24 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:43:26 +01:00
perf/core: Fix perf_mmap() failure path
When f_ops->mmap() returns failure, m_ops->close() is *not* called.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135519.248358497@infradead.org
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8b2a8c3..b2334d2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6903,7 +6903,7 @@ aux_unlock:
if (!ret)
ret = map_range(rb, vma);
- if (event->pmu->event_mapped)
+ if (!ret && event->pmu->event_mapped)
event->pmu->event_mapped(event, vma->vm_mm);
return ret;
^ permalink raw reply related [flat|nested] 85+ messages in thread
* [PATCH 16/19] perf: Further simplify perf_mmap()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (14 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 15/19] perf: Fix perf_mmap() failure path Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 17/19] perf: Remove retry loop from perf_mmap() Peter Zijlstra
` (4 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6644,9 +6644,18 @@ static int perf_mmap(struct file *file,
return ret;
vma_size = vma->vm_end - vma->vm_start;
+ nr_pages = vma_size / PAGE_SIZE;
+
+ if (nr_pages > INT_MAX)
+ return -ENOMEM;
+
+ if (vma_size != PAGE_SIZE * nr_pages)
+ return -EINVAL;
+
+ user_extra = nr_pages;
if (vma->vm_pgoff == 0) {
- nr_pages = (vma_size / PAGE_SIZE) - 1;
+ nr_pages -= 1;
/*
* If we have rb pages ensure they're a power-of-two number, so we
@@ -6655,9 +6664,6 @@ static int perf_mmap(struct file *file,
if (nr_pages != 0 && !is_power_of_2(nr_pages))
return -EINVAL;
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
WARN_ON_ONCE(event->ctx->parent_ctx);
again:
mutex_lock(&event->mmap_mutex);
@@ -6679,8 +6685,6 @@ static int perf_mmap(struct file *file,
goto unlock;
}
-
- user_extra = nr_pages + 1;
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
@@ -6692,10 +6696,6 @@ static int perf_mmap(struct file *file,
if (!event->rb)
return -EINVAL;
- nr_pages = vma_size / PAGE_SIZE;
- if (nr_pages > INT_MAX)
- return -ENOMEM;
-
mutex_lock(&event->mmap_mutex);
ret = -EINVAL;
@@ -6739,7 +6739,6 @@ static int perf_mmap(struct file *file,
}
atomic_set(&rb->aux_mmap_count, 1);
- user_extra = nr_pages;
}
user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
^ permalink raw reply [flat|nested] 85+ messages in thread
* [tip: perf/core] perf/core: Further simplify perf_mmap()
2024-11-04 13:39 ` [PATCH 16/19] perf: Further simplify perf_mmap() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 8c7446add31e5db22ceb2f066a8674735c9753f1
Gitweb: https://git.kernel.org/tip/8c7446add31e5db22ceb2f066a8674735c9753f1
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:25 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:24:34 +01:00
perf/core: Further simplify perf_mmap()
Perform CSE and such.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135519.354909594@infradead.org
---
kernel/events/core.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d1b04c8..4cd3494 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6698,9 +6698,18 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
return ret;
vma_size = vma->vm_end - vma->vm_start;
+ nr_pages = vma_size / PAGE_SIZE;
+
+ if (nr_pages > INT_MAX)
+ return -ENOMEM;
+
+ if (vma_size != PAGE_SIZE * nr_pages)
+ return -EINVAL;
+
+ user_extra = nr_pages;
if (vma->vm_pgoff == 0) {
- nr_pages = (vma_size / PAGE_SIZE) - 1;
+ nr_pages -= 1;
/*
* If we have rb pages ensure they're a power-of-two number, so we
@@ -6709,9 +6718,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (nr_pages != 0 && !is_power_of_2(nr_pages))
return -EINVAL;
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
WARN_ON_ONCE(event->ctx->parent_ctx);
again:
mutex_lock(&event->mmap_mutex);
@@ -6735,8 +6741,6 @@ again:
rb = event->rb;
goto unlock;
}
-
- user_extra = nr_pages + 1;
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
@@ -6748,10 +6752,6 @@ again:
if (!event->rb)
return -EINVAL;
- nr_pages = vma_size / PAGE_SIZE;
- if (nr_pages > INT_MAX)
- return -ENOMEM;
-
mutex_lock(&event->mmap_mutex);
ret = -EINVAL;
@@ -6795,7 +6795,6 @@ again:
}
atomic_set(&rb->aux_mmap_count, 1);
- user_extra = nr_pages;
}
user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
* [tip: perf/core] perf/core: Further simplify perf_mmap()
2024-11-04 13:39 ` [PATCH 16/19] perf: Further simplify perf_mmap() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:57 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:57 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 0c8a4e4139adf09b27fb910edbc596ea2d31a5db
Gitweb: https://git.kernel.org/tip/0c8a4e4139adf09b27fb910edbc596ea2d31a5db
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:25 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:43:10 +01:00
perf/core: Further simplify perf_mmap()
Perform CSE and such.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135519.354909594@infradead.org
---
kernel/events/core.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d1b04c8..4cd3494 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6698,9 +6698,18 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
return ret;
vma_size = vma->vm_end - vma->vm_start;
+ nr_pages = vma_size / PAGE_SIZE;
+
+ if (nr_pages > INT_MAX)
+ return -ENOMEM;
+
+ if (vma_size != PAGE_SIZE * nr_pages)
+ return -EINVAL;
+
+ user_extra = nr_pages;
if (vma->vm_pgoff == 0) {
- nr_pages = (vma_size / PAGE_SIZE) - 1;
+ nr_pages -= 1;
/*
* If we have rb pages ensure they're a power-of-two number, so we
@@ -6709,9 +6718,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
if (nr_pages != 0 && !is_power_of_2(nr_pages))
return -EINVAL;
- if (vma_size != PAGE_SIZE * (1 + nr_pages))
- return -EINVAL;
-
WARN_ON_ONCE(event->ctx->parent_ctx);
again:
mutex_lock(&event->mmap_mutex);
@@ -6735,8 +6741,6 @@ again:
rb = event->rb;
goto unlock;
}
-
- user_extra = nr_pages + 1;
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
@@ -6748,10 +6752,6 @@ again:
if (!event->rb)
return -EINVAL;
- nr_pages = vma_size / PAGE_SIZE;
- if (nr_pages > INT_MAX)
- return -ENOMEM;
-
mutex_lock(&event->mmap_mutex);
ret = -EINVAL;
@@ -6795,7 +6795,6 @@ again:
}
atomic_set(&rb->aux_mmap_count, 1);
- user_extra = nr_pages;
}
user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
* [PATCH 17/19] perf: Remove retry loop from perf_mmap()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (15 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 16/19] perf: Further simplify perf_mmap() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 18/19] perf: Lift event->mmap_mutex in perf_mmap() Peter Zijlstra
` (3 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
AFAICT there is no actual benefit from the mutex drop on re-try. The
'worst' case scenario is that we instantly re-gain the mutex without
perf_mmap_close() getting it. So might as well make that the normal
case.
Reflow the code to make the ring buffer detach case naturally flow
into the no ring buffer case.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6665,26 +6665,31 @@ static int perf_mmap(struct file *file,
return -EINVAL;
WARN_ON_ONCE(event->ctx->parent_ctx);
-again:
mutex_lock(&event->mmap_mutex);
+
if (event->rb) {
if (data_page_nr(event->rb) != nr_pages) {
ret = -EINVAL;
goto unlock;
}
- if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+ if (atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
- * Raced against perf_mmap_close(); remove the
- * event and try again.
+ * Success -- managed to mmap() the same buffer
+ * multiple times.
*/
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- goto again;
+ ret = 0;
+ goto unlock;
}
- goto unlock;
+ /*
+ * Raced against perf_mmap_close()'s
+ * atomic_dec_and_mutex_lock() remove the
+ * event and continue as if !event->rb
+ */
+ ring_buffer_attach(event, NULL);
}
+
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
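[Editor's note: the race this patch simplifies can be modeled in isolation. `inc_not_zero()` below mimics `atomic_inc_not_zero()`: it fails once the count has hit zero, i.e. a concurrent perf_mmap_close() won. Instead of dropping the mutex and retrying, the reworked code detaches the stale buffer and falls through to the "no buffer" path. Hypothetical single-threaded sketch; all names are illustrative.]

```c
#include <assert.h>
#include <stddef.h>

struct ring_buffer { int mmap_count; };
struct event { struct ring_buffer *rb; };

/* atomic_inc_not_zero() stand-in: refuses to resurrect a dead count. */
static int inc_not_zero(int *count)
{
	if (*count == 0)
		return 0;
	(*count)++;
	return 1;
}

/*
 * Returns 1 if we attached to the existing buffer (mmap() of the same
 * buffer again), 0 if the caller should continue as if !event->rb --
 * no retry loop needed.
 */
static int try_reuse_buffer(struct event *ev)
{
	if (ev->rb && inc_not_zero(&ev->rb->mmap_count))
		return 1;

	ev->rb = NULL;	/* raced perf_mmap_close(): detach, fall through */
	return 0;
}
```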
* [tip: perf/core] perf/core: Remove retry loop from perf_mmap()
2024-11-04 13:39 ` [PATCH 17/19] perf: Remove retry loop from perf_mmap() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 6cbfc06a8590ab4db69f8af9431e816c859e2776
Gitweb: https://git.kernel.org/tip/6cbfc06a8590ab4db69f8af9431e816c859e2776
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:26 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:25:57 +01:00
perf/core: Remove retry loop from perf_mmap()
AFAICT there is no actual benefit from the mutex drop on re-try. The
'worst' case scenario is that we instantly re-gain the mutex without
perf_mmap_close() getting it. So might as well make that the normal
case.
Reflow the code to make the ring buffer detach case naturally flow
into the no ring buffer case.
[ mingo: Forward ported it ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135519.463607258@infradead.org
---
kernel/events/core.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4cd3494..ca4c124 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6719,28 +6719,33 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
return -EINVAL;
WARN_ON_ONCE(event->ctx->parent_ctx);
-again:
mutex_lock(&event->mmap_mutex);
+
if (event->rb) {
if (data_page_nr(event->rb) != nr_pages) {
ret = -EINVAL;
goto unlock;
}
- if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+ if (atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
- * Raced against perf_mmap_close(); remove the
- * event and try again.
+ * Success -- managed to mmap() the same buffer
+ * multiple times.
*/
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- goto again;
+ ret = 0;
+ /* We need the rb to map pages. */
+ rb = event->rb;
+ goto unlock;
}
- /* We need the rb to map pages. */
- rb = event->rb;
- goto unlock;
+ /*
+ * Raced against perf_mmap_close()'s
+ * atomic_dec_and_mutex_lock() remove the
+ * event and continue as if !event->rb
+ */
+ ring_buffer_attach(event, NULL);
}
+
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
* [tip: perf/core] perf/core: Remove retry loop from perf_mmap()
2024-11-04 13:39 ` [PATCH 17/19] perf: Remove retry loop from perf_mmap() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:56 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 8eaec7bb723c9a0addfc0457e2f28e41735607af
Gitweb: https://git.kernel.org/tip/8eaec7bb723c9a0addfc0457e2f28e41735607af
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:26 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:43:15 +01:00
perf/core: Remove retry loop from perf_mmap()
AFAICT there is no actual benefit from the mutex drop on re-try. The
'worst' case scenario is that we instantly re-gain the mutex without
perf_mmap_close() getting it. So might as well make that the normal
case.
Reflow the code to make the ring buffer detach case naturally flow
into the no ring buffer case.
[ mingo: Forward ported it ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135519.463607258@infradead.org
---
kernel/events/core.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4cd3494..ca4c124 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6719,28 +6719,33 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
return -EINVAL;
WARN_ON_ONCE(event->ctx->parent_ctx);
-again:
mutex_lock(&event->mmap_mutex);
+
if (event->rb) {
if (data_page_nr(event->rb) != nr_pages) {
ret = -EINVAL;
goto unlock;
}
- if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+ if (atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
- * Raced against perf_mmap_close(); remove the
- * event and try again.
+ * Success -- managed to mmap() the same buffer
+ * multiple times.
*/
- ring_buffer_attach(event, NULL);
- mutex_unlock(&event->mmap_mutex);
- goto again;
+ ret = 0;
+ /* We need the rb to map pages. */
+ rb = event->rb;
+ goto unlock;
}
- /* We need the rb to map pages. */
- rb = event->rb;
- goto unlock;
+ /*
+ * Raced against perf_mmap_close()'s
+ * atomic_dec_and_mutex_lock() remove the
+ * event and continue as if !event->rb
+ */
+ ring_buffer_attach(event, NULL);
}
+
} else {
/*
* AUX area mapping: if rb->aux_nr_pages != 0, it's already
* [PATCH 18/19] perf: Lift event->mmap_mutex in perf_mmap()
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (16 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 17/19] perf: Remove retry loop from perf_mmap() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
2024-11-04 13:39 ` [PATCH 19/19] perf: Make perf_pmu_unregister() useable Peter Zijlstra
` (2 subsequent siblings)
20 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
This puts 'all' of perf_mmap() under single event->mmap_mutex.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/events/core.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6626,7 +6626,7 @@ static int perf_mmap(struct file *file,
unsigned long vma_size;
unsigned long nr_pages;
long user_extra = 0, extra = 0;
- int ret = 0, flags = 0;
+ int ret, flags = 0;
/*
* Don't allow mmap() of inherited per-task counters. This would
@@ -6654,6 +6654,9 @@ static int perf_mmap(struct file *file,
user_extra = nr_pages;
+ mutex_lock(&event->mmap_mutex);
+ ret = -EINVAL;
+
if (vma->vm_pgoff == 0) {
nr_pages -= 1;
@@ -6662,16 +6665,13 @@ static int perf_mmap(struct file *file,
* can do bitmasks instead of modulo.
*/
if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
+ goto unlock;
WARN_ON_ONCE(event->ctx->parent_ctx);
- mutex_lock(&event->mmap_mutex);
if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages) {
- ret = -EINVAL;
+ if (data_page_nr(event->rb) != nr_pages)
goto unlock;
- }
if (atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
@@ -6698,12 +6698,6 @@ static int perf_mmap(struct file *file,
*/
u64 aux_offset, aux_size;
- if (!event->rb)
- return -EINVAL;
-
- mutex_lock(&event->mmap_mutex);
- ret = -EINVAL;
-
rb = event->rb;
if (!rb)
goto aux_unlock;
@@ -6813,6 +6807,8 @@ static int perf_mmap(struct file *file,
rb->aux_mmap_locked = extra;
}
+ ret = 0;
+
unlock:
if (!ret) {
atomic_long_add(user_extra, &user->locked_vm);
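[Editor's note: the shape this patch gives perf_mmap() — take the mutex once before any validation, default `ret` to an error, and make every failure exit through a single `unlock` label — is a common kernel idiom. A minimal userspace sketch, with a depth counter standing in for the mutex to show lock/unlock stay balanced; names are hypothetical.]

```c
#include <assert.h>

static int lock_depth;	/* stand-in for event->mmap_mutex */

static int do_mmap(int nr_pages)
{
	int ret;

	lock_depth++;		/* mutex_lock(): taken before any checks */
	ret = -1;		/* -EINVAL until proven otherwise */

	if (nr_pages < 0)
		goto unlock;	/* every failure path exits via unlock */
	if (nr_pages != 0 && (nr_pages & (nr_pages - 1)))
		goto unlock;	/* must be 0 or a power of two */

	ret = 0;		/* all checks passed */
unlock:
	lock_depth--;		/* mutex_unlock(): single exit point */
	return ret;
}
```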
* [tip: perf/core] perf/core: Lift event->mmap_mutex in perf_mmap()
2024-11-04 13:39 ` [PATCH 18/19] perf: Lift event->mmap_mutex in perf_mmap() Peter Zijlstra
@ 2025-03-01 20:07 ` tip-bot2 for Peter Zijlstra
2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-01 20:07 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Ingo Molnar, x86, linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 244b28f87ba48daaed81bbfe5af079e320c1e093
Gitweb: https://git.kernel.org/tip/244b28f87ba48daaed81bbfe5af079e320c1e093
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:27 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 01 Mar 2025 20:32:30 +01:00
perf/core: Lift event->mmap_mutex in perf_mmap()
This puts 'all' of perf_mmap() under single event->mmap_mutex.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241104135519.582252957@infradead.org
---
kernel/events/core.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ca4c124..773875a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6680,7 +6680,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
unsigned long vma_size;
unsigned long nr_pages;
long user_extra = 0, extra = 0;
- int ret = 0, flags = 0;
+ int ret, flags = 0;
/*
* Don't allow mmap() of inherited per-task counters. This would
@@ -6708,6 +6708,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
user_extra = nr_pages;
+ mutex_lock(&event->mmap_mutex);
+ ret = -EINVAL;
+
if (vma->vm_pgoff == 0) {
nr_pages -= 1;
@@ -6716,16 +6719,13 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
* can do bitmasks instead of modulo.
*/
if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
+ goto unlock;
WARN_ON_ONCE(event->ctx->parent_ctx);
- mutex_lock(&event->mmap_mutex);
if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages) {
- ret = -EINVAL;
+ if (data_page_nr(event->rb) != nr_pages)
goto unlock;
- }
if (atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
@@ -6754,12 +6754,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
*/
u64 aux_offset, aux_size;
- if (!event->rb)
- return -EINVAL;
-
- mutex_lock(&event->mmap_mutex);
- ret = -EINVAL;
-
rb = event->rb;
if (!rb)
goto aux_unlock;
@@ -6869,6 +6863,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
rb->aux_mmap_locked = extra;
}
+ ret = 0;
+
unlock:
if (!ret) {
atomic_long_add(user_extra, &user->locked_vm);
* [tip: perf/core] perf/core: Lift event->mmap_mutex in perf_mmap()
2024-11-04 13:39 ` [PATCH 18/19] perf: Lift event->mmap_mutex in perf_mmap() Peter Zijlstra
2025-03-01 20:07 ` [tip: perf/core] perf/core: " tip-bot2 for Peter Zijlstra
@ 2025-03-04 8:56 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2025-03-04 8:56 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Ingo Molnar, Ravi Bangoria, x86,
linux-kernel
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 0983593f32c4c94239e01e42e4a17664b64a3c63
Gitweb: https://git.kernel.org/tip/0983593f32c4c94239e01e42e4a17664b64a3c63
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 04 Nov 2024 14:39:27 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 04 Mar 2025 09:43:19 +01:00
perf/core: Lift event->mmap_mutex in perf_mmap()
This puts 'all' of perf_mmap() under single event->mmap_mutex.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20241104135519.582252957@infradead.org
---
kernel/events/core.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ca4c124..773875a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6680,7 +6680,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
unsigned long vma_size;
unsigned long nr_pages;
long user_extra = 0, extra = 0;
- int ret = 0, flags = 0;
+ int ret, flags = 0;
/*
* Don't allow mmap() of inherited per-task counters. This would
@@ -6708,6 +6708,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
user_extra = nr_pages;
+ mutex_lock(&event->mmap_mutex);
+ ret = -EINVAL;
+
if (vma->vm_pgoff == 0) {
nr_pages -= 1;
@@ -6716,16 +6719,13 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
* can do bitmasks instead of modulo.
*/
if (nr_pages != 0 && !is_power_of_2(nr_pages))
- return -EINVAL;
+ goto unlock;
WARN_ON_ONCE(event->ctx->parent_ctx);
- mutex_lock(&event->mmap_mutex);
if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages) {
- ret = -EINVAL;
+ if (data_page_nr(event->rb) != nr_pages)
goto unlock;
- }
if (atomic_inc_not_zero(&event->rb->mmap_count)) {
/*
@@ -6754,12 +6754,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
*/
u64 aux_offset, aux_size;
- if (!event->rb)
- return -EINVAL;
-
- mutex_lock(&event->mmap_mutex);
- ret = -EINVAL;
-
rb = event->rb;
if (!rb)
goto aux_unlock;
@@ -6869,6 +6863,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
rb->aux_mmap_locked = extra;
}
+ ret = 0;
+
unlock:
if (!ret) {
atomic_long_add(user_extra, &user->locked_vm);
* [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (17 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 18/19] perf: Lift event->mmap_mutex in perf_mmap() Peter Zijlstra
@ 2024-11-04 13:39 ` Peter Zijlstra
2024-11-05 15:08 ` Liang, Kan
` (2 more replies)
2024-12-16 18:02 ` [PATCH 00/19] perf: Make perf_pmu_unregister() usable Lucas De Marchi
2025-03-01 20:00 ` Ingo Molnar
20 siblings, 3 replies; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-04 13:39 UTC (permalink / raw)
To: mingo, lucas.demarchi
Cc: linux-kernel, peterz, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Previously it was only safe to call perf_pmu_unregister() if there
were no active events of that pmu around -- which was impossible to
guarantee since it races all sorts against perf_init_event().
Rework the whole thing by:
- keeping track of all events for a given pmu
- 'hiding' the pmu from perf_init_event()
- waiting for the appropriate (s)rcu grace periods such that all
prior references to the PMU will be completed
- detaching all still existing events of that pmu (see first point)
and moving them to a new REVOKED state.
- actually freeing the pmu data.
Where notably the new REVOKED state must inhibit all event actions
from reaching code that wants to use event->pmu.
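[Editor's note: the sequence described above can be modeled with a toy, single-threaded skeleton — hide the pmu from lookup, then repeatedly take a reference on a listed event, mark it REVOKED, unlink it, and drop the reference until the list is empty. The real kernel code uses SRCU grace periods and pmu->events_lock; these structures and names are purely illustrative.]

```c
#include <assert.h>
#include <stddef.h>

#define REVOKED (-4)	/* mirrors PERF_EVENT_STATE_REVOKED */

struct toy_event { int state; int refcount; struct toy_event *next; };
struct toy_pmu   { int visible; struct toy_event *events; };

/* atomic_long_inc_not_zero() analogue over the pmu's event list. */
static struct toy_event *pmu_get_event(struct toy_pmu *pmu)
{
	for (struct toy_event *e = pmu->events; e; e = e->next) {
		if (e->refcount > 0) {
			e->refcount++;
			return e;
		}
	}
	return NULL;
}

static void pmu_detach_events(struct toy_pmu *pmu)
{
	struct toy_event *e;

	pmu->visible = 0;	/* 'hide' the pmu: no new events race in */
	while ((e = pmu_get_event(pmu)) != NULL) {
		e->state = REVOKED;	/* must no longer touch the pmu */
		pmu->events = e->next;	/* unlink (single-threaded toy) */
		e->refcount--;		/* put_event() */
	}
}
```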
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/perf_event.h | 13 +-
kernel/events/core.c | 222 ++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 210 insertions(+), 25 deletions(-)
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -318,6 +318,9 @@ struct perf_output_handle;
struct pmu {
struct list_head entry;
+ spinlock_t events_lock;
+ struct list_head events;
+
struct module *module;
struct device *dev;
struct device *parent;
@@ -611,9 +614,10 @@ struct perf_addr_filter_range {
* enum perf_event_state - the states of an event:
*/
enum perf_event_state {
- PERF_EVENT_STATE_DEAD = -4,
- PERF_EVENT_STATE_EXIT = -3,
- PERF_EVENT_STATE_ERROR = -2,
+ PERF_EVENT_STATE_DEAD = -5,
+ PERF_EVENT_STATE_REVOKED = -4, /* pmu gone, must not touch */
+ PERF_EVENT_STATE_EXIT = -3, /* task died, still inherit */
+ PERF_EVENT_STATE_ERROR = -2, /* scheduling error, can enable */
PERF_EVENT_STATE_OFF = -1,
PERF_EVENT_STATE_INACTIVE = 0,
PERF_EVENT_STATE_ACTIVE = 1,
@@ -854,6 +858,7 @@ struct perf_event {
void *security;
#endif
struct list_head sb_list;
+ struct list_head pmu_list;
/*
* Certain events gets forwarded to another pmu internally by over-
@@ -1105,7 +1110,7 @@ extern void perf_aux_output_flag(struct
extern void perf_event_itrace_started(struct perf_event *event);
extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
-extern void perf_pmu_unregister(struct pmu *pmu);
+extern int perf_pmu_unregister(struct pmu *pmu);
extern void __perf_event_task_sched_in(struct task_struct *prev,
struct task_struct *task);
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2412,7 +2412,9 @@ ctx_time_update_event(struct perf_event_
#define DETACH_GROUP 0x01UL
#define DETACH_CHILD 0x02UL
-#define DETACH_DEAD 0x04UL
+#define DETACH_EXIT 0x04UL
+#define DETACH_REVOKE 0x08UL
+#define DETACH_DEAD 0x10UL
/*
* Cross CPU call to remove a performance event
@@ -2427,6 +2429,7 @@ __perf_remove_from_context(struct perf_e
void *info)
{
struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
+ enum perf_event_state state = PERF_EVENT_STATE_OFF;
unsigned long flags = (unsigned long)info;
ctx_time_update(cpuctx, ctx);
@@ -2435,16 +2438,22 @@ __perf_remove_from_context(struct perf_e
* Ensure event_sched_out() switches to OFF, at the very least
* this avoids raising perf_pending_task() at this time.
*/
- if (flags & DETACH_DEAD)
+ if (flags & DETACH_EXIT)
+ state = PERF_EVENT_STATE_EXIT;
+ if (flags & DETACH_REVOKE)
+ state = PERF_EVENT_STATE_REVOKED;
+ if (flags & DETACH_DEAD) {
event->pending_disable = 1;
+ state = PERF_EVENT_STATE_DEAD;
+ }
event_sched_out(event, ctx);
if (flags & DETACH_GROUP)
perf_group_detach(event);
if (flags & DETACH_CHILD)
perf_child_detach(event);
list_del_event(event, ctx);
- if (flags & DETACH_DEAD)
- event->state = PERF_EVENT_STATE_DEAD;
+
+ event->state = state;
if (!pmu_ctx->nr_events) {
pmu_ctx->rotate_necessary = 0;
@@ -4511,7 +4520,8 @@ static void perf_event_enable_on_exec(st
static void perf_remove_from_owner(struct perf_event *event);
static void perf_event_exit_event(struct perf_event *event,
- struct perf_event_context *ctx);
+ struct perf_event_context *ctx,
+ bool revoke);
/*
* Removes all events from the current task that have been marked
@@ -4538,7 +4548,7 @@ static void perf_event_remove_on_exec(st
modified = true;
- perf_event_exit_event(event, ctx);
+ perf_event_exit_event(event, ctx, false);
}
raw_spin_lock_irqsave(&ctx->lock, flags);
@@ -5138,6 +5148,7 @@ static bool is_sb_event(struct perf_even
attr->context_switch || attr->text_poke ||
attr->bpf_event)
return true;
+
return false;
}
@@ -5339,6 +5350,8 @@ static void perf_pending_task_sync(struc
/* vs perf_event_alloc() error */
static void __free_event(struct perf_event *event)
{
+ struct pmu *pmu = event->pmu;
+
if (event->attach_state & PERF_ATTACH_CALLCHAIN)
put_callchain_buffers();
@@ -5365,6 +5378,7 @@ static void __free_event(struct perf_eve
* put_pmu_ctx() needs an event->ctx reference, because of
* epc->ctx.
*/
+ WARN_ON_ONCE(!pmu);
WARN_ON_ONCE(!event->ctx);
WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
put_pmu_ctx(event->pmu_ctx);
@@ -5377,8 +5391,13 @@ static void __free_event(struct perf_eve
if (event->ctx)
put_ctx(event->ctx);
- if (event->pmu)
- module_put(event->pmu->module);
+ if (pmu) {
+ module_put(pmu->module);
+ scoped_guard (spinlock, &pmu->events_lock) {
+ list_del(&event->pmu_list);
+ wake_up_var(pmu);
+ }
+ }
call_rcu(&event->rcu_head, free_event_rcu);
}
@@ -5397,6 +5416,7 @@ static void _free_event(struct perf_even
security_perf_event_free(event);
if (event->rb) {
+ WARN_ON_ONCE(!event->pmu);
/*
* Can happen when we close an event with re-directed output.
*
@@ -5527,7 +5547,11 @@ int perf_event_release_kernel(struct per
* Thus this guarantees that we will in fact observe and kill _ALL_
* child events.
*/
- perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
+ if (event->state > PERF_EVENT_STATE_REVOKED) {
+ perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
+ } else {
+ event->state = PERF_EVENT_STATE_DEAD;
+ }
perf_event_ctx_unlock(event, ctx);
@@ -5838,7 +5862,7 @@ __perf_read(struct perf_event *event, ch
* error state (i.e. because it was pinned but it couldn't be
* scheduled on to the CPU at some point).
*/
- if (event->state == PERF_EVENT_STATE_ERROR)
+ if (event->state <= PERF_EVENT_STATE_ERROR)
return 0;
if (count < event->read_size)
@@ -5877,8 +5901,14 @@ static __poll_t perf_poll(struct file *f
struct perf_buffer *rb;
__poll_t events = EPOLLHUP;
+ if (event->state <= PERF_EVENT_STATE_REVOKED)
+ return EPOLLERR;
+
poll_wait(file, &event->waitq, wait);
+ if (event->state <= PERF_EVENT_STATE_REVOKED)
+ return EPOLLERR;
+
if (is_event_hup(event))
return events;
@@ -6058,6 +6088,9 @@ static long _perf_ioctl(struct perf_even
void (*func)(struct perf_event *);
u32 flags = arg;
+ if (event->state <= PERF_EVENT_STATE_REVOKED)
+ return -ENODEV;
+
switch (cmd) {
case PERF_EVENT_IOC_ENABLE:
func = _perf_event_enable;
@@ -6507,6 +6540,7 @@ static void perf_mmap_close(struct vm_ar
unsigned long size = perf_data_size(rb);
bool detach_rest = false;
+ /* FIXIES vs perf_pmu_unregister() */
if (event->pmu->event_unmapped)
event->pmu->event_unmapped(event, vma->vm_mm);
@@ -6657,6 +6691,16 @@ static int perf_mmap(struct file *file,
mutex_lock(&event->mmap_mutex);
ret = -EINVAL;
+ /*
+ * This relies on __pmu_detach_event() taking mmap_mutex after marking
+ * the event REVOKED. Either we observe the state, or __pmu_detach_event()
+ * will detach the rb created here.
+ */
+ if (event->state <= PERF_EVENT_STATE_REVOKED) {
+ ret = -ENODEV;
+ goto unlock;
+ }
+
if (vma->vm_pgoff == 0) {
nr_pages -= 1;
@@ -6840,6 +6884,9 @@ static int perf_fasync(int fd, struct fi
struct perf_event *event = filp->private_data;
int retval;
+ if (event->state <= PERF_EVENT_STATE_REVOKED)
+ return -ENODEV;
+
inode_lock(inode);
retval = fasync_helper(fd, filp, on, &event->fasync);
inode_unlock(inode);
@@ -11892,6 +11939,9 @@ int perf_pmu_register(struct pmu *_pmu,
if (!pmu->event_idx)
pmu->event_idx = perf_event_idx_default;
+ INIT_LIST_HEAD(&pmu->events);
+ spin_lock_init(&pmu->events_lock);
+
/*
* Now that the PMU is complete, make it visible to perf_try_init_event().
*/
@@ -11905,11 +11955,100 @@ int perf_pmu_register(struct pmu *_pmu,
}
EXPORT_SYMBOL_GPL(perf_pmu_register);
-void perf_pmu_unregister(struct pmu *pmu)
+static void __pmu_detach_event(struct pmu *pmu, struct perf_event *event,
+ struct perf_event_context *ctx)
+{
+ /*
+ * De-schedule the event and mark it REVOKED.
+ */
+ perf_event_exit_event(event, ctx, true);
+
+ /*
+ * All _free_event() bits that rely on event->pmu:
+ *
+ * Notably, perf_mmap() relies on the ordering here.
+ */
+ scoped_guard (mutex, &event->mmap_mutex) {
+ WARN_ON_ONCE(pmu->event_unmapped);
+ ring_buffer_attach(event, NULL);
+ }
+
+ perf_event_free_bpf_prog(event);
+ perf_free_addr_filters(event);
+
+ if (event->destroy) {
+ event->destroy(event);
+ event->destroy = NULL;
+ }
+
+ if (event->pmu_ctx) {
+ put_pmu_ctx(event->pmu_ctx);
+ event->pmu_ctx = NULL;
+ }
+
+ exclusive_event_destroy(event);
+ module_put(pmu->module);
+
+ event->pmu = NULL; /* force fault instead of UAF */
+}
+
+static void pmu_detach_event(struct pmu *pmu, struct perf_event *event)
+{
+ struct perf_event_context *ctx;
+
+ ctx = perf_event_ctx_lock(event);
+ __pmu_detach_event(pmu, event, ctx);
+ perf_event_ctx_unlock(event, ctx);
+
+ scoped_guard (spinlock, &pmu->events_lock)
+ list_del(&event->pmu_list);
+}
+
+static struct perf_event *pmu_get_event(struct pmu *pmu)
+{
+ struct perf_event *event;
+
+ guard(spinlock)(&pmu->events_lock);
+ list_for_each_entry(event, &pmu->events, pmu_list) {
+ if (atomic_long_inc_not_zero(&event->refcount))
+ return event;
+ }
+
+ return NULL;
+}
+
+static bool pmu_empty(struct pmu *pmu)
+{
+ guard(spinlock)(&pmu->events_lock);
+ return list_empty(&pmu->events);
+}
+
+static void pmu_detach_events(struct pmu *pmu)
+{
+ struct perf_event *event;
+
+ for (;;) {
+ event = pmu_get_event(pmu);
+ if (!event)
+ break;
+
+ pmu_detach_event(pmu, event);
+ put_event(event);
+ }
+
+ /*
+ * wait for pending _free_event()s
+ */
+ wait_var_event(pmu, pmu_empty(pmu));
+}
+
+int perf_pmu_unregister(struct pmu *pmu)
{
scoped_guard (mutex, &pmus_lock) {
+ if (!idr_cmpxchg(&pmu_idr, pmu->type, pmu, NULL))
+ return -EINVAL;
+
list_del_rcu(&pmu->entry);
- idr_remove(&pmu_idr, pmu->type);
}
/*
@@ -11919,7 +12058,31 @@ void perf_pmu_unregister(struct pmu *pmu
synchronize_srcu(&pmus_srcu);
synchronize_rcu();
+ if (pmu->event_unmapped && !pmu_empty(pmu)) {
+ /*
+ * Can't force remove events when pmu::event_unmapped()
+ * is used in perf_mmap_close().
+ */
+ guard(mutex)(&pmus_lock);
+ idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu);
+ list_add_rcu(&pmu->entry, &pmus);
+ return -EBUSY;
+ }
+
+ scoped_guard (mutex, &pmus_lock)
+ idr_remove(&pmu_idr, pmu->type);
+
+ /*
+ * PMU is removed from the pmus list, so no new events will
+ * be created, now take care of the existing ones.
+ */
+ pmu_detach_events(pmu);
+
+ /*
+ * PMU is unused, make it go away.
+ */
perf_pmu_free(pmu);
+ return 0;
}
EXPORT_SYMBOL_GPL(perf_pmu_unregister);
@@ -12226,6 +12389,7 @@ perf_event_alloc(struct perf_event_attr
INIT_LIST_HEAD(&event->active_entry);
INIT_LIST_HEAD(&event->addr_filters.list);
INIT_HLIST_NODE(&event->hlist_entry);
+ INIT_LIST_HEAD(&event->pmu_list);
init_waitqueue_head(&event->waitq);
@@ -12294,6 +12458,13 @@ perf_event_alloc(struct perf_event_attr
perf_event__state_init(event);
+ /*
+ * Hold SRCU critical section around perf_init_event(), until returning
+ * the fully formed event put on pmu->events_list. This ensures that
+ * perf_pmu_unregister() will see any in-progress event creation that
+ * races.
+ */
+ guard(srcu)(&pmus_srcu);
pmu = NULL;
hwc = &event->hw;
@@ -12383,6 +12554,9 @@ perf_event_alloc(struct perf_event_attr
/* symmetric to unaccount_event() in _free_event() */
account_event(event);
+ scoped_guard (spinlock, &pmu->events_lock)
+ list_add(&event->pmu_list, &pmu->events);
+
return_ptr(event);
}
@@ -12769,6 +12943,10 @@ SYSCALL_DEFINE5(perf_event_open,
if (err)
goto err_fd;
group_leader = fd_file(group)->private_data;
+ if (group_leader->state <= PERF_EVENT_STATE_REVOKED) {
+ err = -ENODEV;
+ goto err_group_fd;
+ }
if (flags & PERF_FLAG_FD_OUTPUT)
output_event = group_leader;
if (flags & PERF_FLAG_FD_NO_GROUP)
@@ -13316,10 +13494,11 @@ static void sync_child_event(struct perf
}
static void
-perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
+perf_event_exit_event(struct perf_event *event,
+ struct perf_event_context *ctx, bool revoke)
{
struct perf_event *parent_event = event->parent;
- unsigned long detach_flags = 0;
+ unsigned long detach_flags = DETACH_EXIT;
if (parent_event) {
/*
@@ -13334,16 +13513,14 @@ perf_event_exit_event(struct perf_event
* Do destroy all inherited groups, we don't care about those
* and being thorough is better.
*/
- detach_flags = DETACH_GROUP | DETACH_CHILD;
+ detach_flags |= DETACH_GROUP | DETACH_CHILD;
mutex_lock(&parent_event->child_mutex);
}
- perf_remove_from_context(event, detach_flags);
+ if (revoke)
+ detach_flags |= DETACH_GROUP | DETACH_REVOKE;
- raw_spin_lock_irq(&ctx->lock);
- if (event->state > PERF_EVENT_STATE_EXIT)
- perf_event_set_state(event, PERF_EVENT_STATE_EXIT);
- raw_spin_unlock_irq(&ctx->lock);
+ perf_remove_from_context(event, detach_flags);
/*
* Child events can be freed.
@@ -13419,7 +13596,7 @@ static void perf_event_exit_task_context
perf_event_task(child, child_ctx, 0);
list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
- perf_event_exit_event(child_event, child_ctx);
+ perf_event_exit_event(child_event, child_ctx, false);
mutex_unlock(&child_ctx->mutex);
@@ -13609,6 +13786,9 @@ inherit_event(struct perf_event *parent_
if (parent_event->parent)
parent_event = parent_event->parent;
+ if (parent_event->state <= PERF_EVENT_STATE_REVOKED)
+ return NULL;
+
child_event = perf_event_alloc(&parent_event->attr,
parent_event->cpu,
child,
^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-04 13:39 ` [PATCH 19/19] perf: Make perf_pmu_unregister() useable Peter Zijlstra
@ 2024-11-05 15:08 ` Liang, Kan
2024-11-05 15:16 ` Peter Zijlstra
2024-11-25 4:10 ` Ravi Bangoria
2025-01-03 4:29 ` Ravi Bangoria
2 siblings, 1 reply; 85+ messages in thread
From: Liang, Kan @ 2024-11-05 15:08 UTC (permalink / raw)
To: Peter Zijlstra, mingo, lucas.demarchi
Cc: linux-kernel, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter
On 2024-11-04 8:39 a.m., Peter Zijlstra wrote:
> Previously it was only safe to call perf_pmu_unregister() if there
> were no active events of that pmu around -- which was impossible to
> guarantee since it races all sorts against perf_init_event().
>
> Rework the whole thing by:
>
> - keeping track of all events for a given pmu
>
> - 'hiding' the pmu from perf_init_event()
>
> - waiting for the appropriate (s)rcu grace periods such that all
> prior references to the PMU will be completed
>
> - detaching all still existing events of that pmu (see first point)
> and moving them to a new REVOKED state.
>
> - actually freeing the pmu data.
>
> Where notably the new REVOKED state must inhibit all event actions
> from reaching code that wants to use event->pmu.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> include/linux/perf_event.h | 13 +-
> kernel/events/core.c | 222 ++++++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 210 insertions(+), 25 deletions(-)
>
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -318,6 +318,9 @@ struct perf_output_handle;
> struct pmu {
> struct list_head entry;
>
> + spinlock_t events_lock;
> + struct list_head events;
> +
> struct module *module;
> struct device *dev;
> struct device *parent;
> @@ -611,9 +614,10 @@ struct perf_addr_filter_range {
> * enum perf_event_state - the states of an event:
> */
> enum perf_event_state {
> - PERF_EVENT_STATE_DEAD = -4,
> - PERF_EVENT_STATE_EXIT = -3,
> - PERF_EVENT_STATE_ERROR = -2,
> + PERF_EVENT_STATE_DEAD = -5,
> + PERF_EVENT_STATE_REVOKED = -4, /* pmu gone, must not touch */
> + PERF_EVENT_STATE_EXIT = -3, /* task died, still inherit */
> + PERF_EVENT_STATE_ERROR = -2, /* scheduling error, can enable */
> PERF_EVENT_STATE_OFF = -1,
> PERF_EVENT_STATE_INACTIVE = 0,
> PERF_EVENT_STATE_ACTIVE = 1,
> @@ -854,6 +858,7 @@ struct perf_event {
> void *security;
> #endif
> struct list_head sb_list;
> + struct list_head pmu_list;
>
> /*
> * Certain events gets forwarded to another pmu internally by over-
> @@ -1105,7 +1110,7 @@ extern void perf_aux_output_flag(struct
> extern void perf_event_itrace_started(struct perf_event *event);
>
> extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
> -extern void perf_pmu_unregister(struct pmu *pmu);
> +extern int perf_pmu_unregister(struct pmu *pmu);
>
> extern void __perf_event_task_sched_in(struct task_struct *prev,
> struct task_struct *task);
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2412,7 +2412,9 @@ ctx_time_update_event(struct perf_event_
>
> #define DETACH_GROUP 0x01UL
> #define DETACH_CHILD 0x02UL
> -#define DETACH_DEAD 0x04UL
> +#define DETACH_EXIT 0x04UL
> +#define DETACH_REVOKE 0x08UL
> +#define DETACH_DEAD 0x10UL
>
> /*
> * Cross CPU call to remove a performance event
> @@ -2427,6 +2429,7 @@ __perf_remove_from_context(struct perf_e
> void *info)
> {
> struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
> + enum perf_event_state state = PERF_EVENT_STATE_OFF;
Set the PERF_EVENT_STATE_OFF as default seems dangerous.
If the event was in an error state, the state will be overwritten to the
PERF_EVENT_STATE_OFF later.
One example may be the perf_pmu_migrate_context(). After the migration,
it looks like all the error state will be cleared.
Thanks,
Kan
> unsigned long flags = (unsigned long)info;
>
> ctx_time_update(cpuctx, ctx);
> @@ -2435,16 +2438,22 @@ __perf_remove_from_context(struct perf_e
> * Ensure event_sched_out() switches to OFF, at the very least
> * this avoids raising perf_pending_task() at this time.
> */
> - if (flags & DETACH_DEAD)
> + if (flags & DETACH_EXIT)
> + state = PERF_EVENT_STATE_EXIT;
> + if (flags & DETACH_REVOKE)
> + state = PERF_EVENT_STATE_REVOKED;
> + if (flags & DETACH_DEAD) {
> event->pending_disable = 1;
> + state = PERF_EVENT_STATE_DEAD;
> + }
> event_sched_out(event, ctx);
> if (flags & DETACH_GROUP)
> perf_group_detach(event);
> if (flags & DETACH_CHILD)
> perf_child_detach(event);
> list_del_event(event, ctx);
> - if (flags & DETACH_DEAD)
> - event->state = PERF_EVENT_STATE_DEAD;
> +
> + event->state = state;
>
> if (!pmu_ctx->nr_events) {
> pmu_ctx->rotate_necessary = 0;
> @@ -4511,7 +4520,8 @@ static void perf_event_enable_on_exec(st
>
> static void perf_remove_from_owner(struct perf_event *event);
> static void perf_event_exit_event(struct perf_event *event,
> - struct perf_event_context *ctx);
> + struct perf_event_context *ctx,
> + bool revoke);
>
> /*
> * Removes all events from the current task that have been marked
> @@ -4538,7 +4548,7 @@ static void perf_event_remove_on_exec(st
>
> modified = true;
>
> - perf_event_exit_event(event, ctx);
> + perf_event_exit_event(event, ctx, false);
> }
>
> raw_spin_lock_irqsave(&ctx->lock, flags);
> @@ -5138,6 +5148,7 @@ static bool is_sb_event(struct perf_even
> attr->context_switch || attr->text_poke ||
> attr->bpf_event)
> return true;
> +
> return false;
> }
>
> @@ -5339,6 +5350,8 @@ static void perf_pending_task_sync(struc
> /* vs perf_event_alloc() error */
> static void __free_event(struct perf_event *event)
> {
> + struct pmu *pmu = event->pmu;
> +
> if (event->attach_state & PERF_ATTACH_CALLCHAIN)
> put_callchain_buffers();
>
> @@ -5365,6 +5378,7 @@ static void __free_event(struct perf_eve
> * put_pmu_ctx() needs an event->ctx reference, because of
> * epc->ctx.
> */
> + WARN_ON_ONCE(!pmu);
> WARN_ON_ONCE(!event->ctx);
> WARN_ON_ONCE(event->pmu_ctx->ctx != event->ctx);
> put_pmu_ctx(event->pmu_ctx);
> @@ -5377,8 +5391,13 @@ static void __free_event(struct perf_eve
> if (event->ctx)
> put_ctx(event->ctx);
>
> - if (event->pmu)
> - module_put(event->pmu->module);
> + if (pmu) {
> + module_put(pmu->module);
> + scoped_guard (spinlock, &pmu->events_lock) {
> + list_del(&event->pmu_list);
> + wake_up_var(pmu);
> + }
> + }
>
> call_rcu(&event->rcu_head, free_event_rcu);
> }
> @@ -5397,6 +5416,7 @@ static void _free_event(struct perf_even
> security_perf_event_free(event);
>
> if (event->rb) {
> + WARN_ON_ONCE(!event->pmu);
> /*
> * Can happen when we close an event with re-directed output.
> *
> @@ -5527,7 +5547,11 @@ int perf_event_release_kernel(struct per
> * Thus this guarantees that we will in fact observe and kill _ALL_
> * child events.
> */
> - perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
> + if (event->state > PERF_EVENT_STATE_REVOKED) {
> + perf_remove_from_context(event, DETACH_GROUP|DETACH_DEAD);
> + } else {
> + event->state = PERF_EVENT_STATE_DEAD;
> + }
>
> perf_event_ctx_unlock(event, ctx);
>
> @@ -5838,7 +5862,7 @@ __perf_read(struct perf_event *event, ch
> * error state (i.e. because it was pinned but it couldn't be
> * scheduled on to the CPU at some point).
> */
> - if (event->state == PERF_EVENT_STATE_ERROR)
> + if (event->state <= PERF_EVENT_STATE_ERROR)
> return 0;
>
> if (count < event->read_size)
> @@ -5877,8 +5901,14 @@ static __poll_t perf_poll(struct file *f
> struct perf_buffer *rb;
> __poll_t events = EPOLLHUP;
>
> + if (event->state <= PERF_EVENT_STATE_REVOKED)
> + return EPOLLERR;
> +
> poll_wait(file, &event->waitq, wait);
>
> + if (event->state <= PERF_EVENT_STATE_REVOKED)
> + return EPOLLERR;
> +
> if (is_event_hup(event))
> return events;
>
> @@ -6058,6 +6088,9 @@ static long _perf_ioctl(struct perf_even
> void (*func)(struct perf_event *);
> u32 flags = arg;
>
> + if (event->state <= PERF_EVENT_STATE_REVOKED)
> + return -ENODEV;
> +
> switch (cmd) {
> case PERF_EVENT_IOC_ENABLE:
> func = _perf_event_enable;
> @@ -6507,6 +6540,7 @@ static void perf_mmap_close(struct vm_ar
> unsigned long size = perf_data_size(rb);
> bool detach_rest = false;
>
> + /* FIXIES vs perf_pmu_unregister() */
> if (event->pmu->event_unmapped)
> event->pmu->event_unmapped(event, vma->vm_mm);
>
> @@ -6657,6 +6691,16 @@ static int perf_mmap(struct file *file,
> mutex_lock(&event->mmap_mutex);
> ret = -EINVAL;
>
> + /*
> + * This relies on __pmu_detach_event() taking mmap_mutex after marking
> + * the event REVOKED. Either we observe the state, or __pmu_detach_event()
> + * will detach the rb created here.
> + */
> + if (event->state <= PERF_EVENT_STATE_REVOKED) {
> + ret = -ENODEV;
> + goto unlock;
> + }
> +
> if (vma->vm_pgoff == 0) {
> nr_pages -= 1;
>
> @@ -6840,6 +6884,9 @@ static int perf_fasync(int fd, struct fi
> struct perf_event *event = filp->private_data;
> int retval;
>
> + if (event->state <= PERF_EVENT_STATE_REVOKED)
> + return -ENODEV;
> +
> inode_lock(inode);
> retval = fasync_helper(fd, filp, on, &event->fasync);
> inode_unlock(inode);
> @@ -11892,6 +11939,9 @@ int perf_pmu_register(struct pmu *_pmu,
> if (!pmu->event_idx)
> pmu->event_idx = perf_event_idx_default;
>
> + INIT_LIST_HEAD(&pmu->events);
> + spin_lock_init(&pmu->events_lock);
> +
> /*
> * Now that the PMU is complete, make it visible to perf_try_init_event().
> */
> @@ -11905,11 +11955,100 @@ int perf_pmu_register(struct pmu *_pmu,
> }
> EXPORT_SYMBOL_GPL(perf_pmu_register);
>
> -void perf_pmu_unregister(struct pmu *pmu)
> +static void __pmu_detach_event(struct pmu *pmu, struct perf_event *event,
> + struct perf_event_context *ctx)
> +{
> + /*
> + * De-schedule the event and mark it REVOKED.
> + */
> + perf_event_exit_event(event, ctx, true);
> +
> + /*
> + * All _free_event() bits that rely on event->pmu:
> + *
> + * Notably, perf_mmap() relies on the ordering here.
> + */
> + scoped_guard (mutex, &event->mmap_mutex) {
> + WARN_ON_ONCE(pmu->event_unmapped);
> + ring_buffer_attach(event, NULL);
> + }
> +
> + perf_event_free_bpf_prog(event);
> + perf_free_addr_filters(event);
> +
> + if (event->destroy) {
> + event->destroy(event);
> + event->destroy = NULL;
> + }
> +
> + if (event->pmu_ctx) {
> + put_pmu_ctx(event->pmu_ctx);
> + event->pmu_ctx = NULL;
> + }
> +
> + exclusive_event_destroy(event);
> + module_put(pmu->module);
> +
> + event->pmu = NULL; /* force fault instead of UAF */
> +}
> +
> +static void pmu_detach_event(struct pmu *pmu, struct perf_event *event)
> +{
> + struct perf_event_context *ctx;
> +
> + ctx = perf_event_ctx_lock(event);
> + __pmu_detach_event(pmu, event, ctx);
> + perf_event_ctx_unlock(event, ctx);
> +
> + scoped_guard (spinlock, &pmu->events_lock)
> + list_del(&event->pmu_list);
> +}
> +
> +static struct perf_event *pmu_get_event(struct pmu *pmu)
> +{
> + struct perf_event *event;
> +
> + guard(spinlock)(&pmu->events_lock);
> + list_for_each_entry(event, &pmu->events, pmu_list) {
> + if (atomic_long_inc_not_zero(&event->refcount))
> + return event;
> + }
> +
> + return NULL;
> +}
> +
> +static bool pmu_empty(struct pmu *pmu)
> +{
> + guard(spinlock)(&pmu->events_lock);
> + return list_empty(&pmu->events);
> +}
> +
> +static void pmu_detach_events(struct pmu *pmu)
> +{
> + struct perf_event *event;
> +
> + for (;;) {
> + event = pmu_get_event(pmu);
> + if (!event)
> + break;
> +
> + pmu_detach_event(pmu, event);
> + put_event(event);
> + }
> +
> + /*
> + * wait for pending _free_event()s
> + */
> + wait_var_event(pmu, pmu_empty(pmu));
> +}
> +
> +int perf_pmu_unregister(struct pmu *pmu)
> {
> scoped_guard (mutex, &pmus_lock) {
> + if (!idr_cmpxchg(&pmu_idr, pmu->type, pmu, NULL))
> + return -EINVAL;
> +
> list_del_rcu(&pmu->entry);
> - idr_remove(&pmu_idr, pmu->type);
> }
>
> /*
> @@ -11919,7 +12058,31 @@ void perf_pmu_unregister(struct pmu *pmu
> synchronize_srcu(&pmus_srcu);
> synchronize_rcu();
>
> + if (pmu->event_unmapped && !pmu_empty(pmu)) {
> + /*
> + * Can't force remove events when pmu::event_unmapped()
> + * is used in perf_mmap_close().
> + */
> + guard(mutex)(&pmus_lock);
> + idr_cmpxchg(&pmu_idr, pmu->type, NULL, pmu);
> + list_add_rcu(&pmu->entry, &pmus);
> + return -EBUSY;
> + }
> +
> + scoped_guard (mutex, &pmus_lock)
> + idr_remove(&pmu_idr, pmu->type);
> +
> + /*
> + * PMU is removed from the pmus list, so no new events will
> + * be created, now take care of the existing ones.
> + */
> + pmu_detach_events(pmu);
> +
> + /*
> + * PMU is unused, make it go away.
> + */
> perf_pmu_free(pmu);
> + return 0;
> }
> EXPORT_SYMBOL_GPL(perf_pmu_unregister);
>
> @@ -12226,6 +12389,7 @@ perf_event_alloc(struct perf_event_attr
> INIT_LIST_HEAD(&event->active_entry);
> INIT_LIST_HEAD(&event->addr_filters.list);
> INIT_HLIST_NODE(&event->hlist_entry);
> + INIT_LIST_HEAD(&event->pmu_list);
>
>
> init_waitqueue_head(&event->waitq);
> @@ -12294,6 +12458,13 @@ perf_event_alloc(struct perf_event_attr
>
> perf_event__state_init(event);
>
> + /*
> + * Hold SRCU critical section around perf_init_event(), until returning
> + * the fully formed event put on pmu->events_list. This ensures that
> + * perf_pmu_unregister() will see any in-progress event creation that
> + * races.
> + */
> + guard(srcu)(&pmus_srcu);
> pmu = NULL;
>
> hwc = &event->hw;
> @@ -12383,6 +12554,9 @@ perf_event_alloc(struct perf_event_attr
> /* symmetric to unaccount_event() in _free_event() */
> account_event(event);
>
> + scoped_guard (spinlock, &pmu->events_lock)
> + list_add(&event->pmu_list, &pmu->events);
> +
> return_ptr(event);
> }
>
> @@ -12769,6 +12943,10 @@ SYSCALL_DEFINE5(perf_event_open,
> if (err)
> goto err_fd;
> group_leader = fd_file(group)->private_data;
> + if (group_leader->state <= PERF_EVENT_STATE_REVOKED) {
> + err = -ENODEV;
> + goto err_group_fd;
> + }
> if (flags & PERF_FLAG_FD_OUTPUT)
> output_event = group_leader;
> if (flags & PERF_FLAG_FD_NO_GROUP)
> @@ -13316,10 +13494,11 @@ static void sync_child_event(struct perf
> }
>
> static void
> -perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
> +perf_event_exit_event(struct perf_event *event,
> + struct perf_event_context *ctx, bool revoke)
> {
> struct perf_event *parent_event = event->parent;
> - unsigned long detach_flags = 0;
> + unsigned long detach_flags = DETACH_EXIT;
>
> if (parent_event) {
> /*
> @@ -13334,16 +13513,14 @@ perf_event_exit_event(struct perf_event
> * Do destroy all inherited groups, we don't care about those
> * and being thorough is better.
> */
> - detach_flags = DETACH_GROUP | DETACH_CHILD;
> + detach_flags |= DETACH_GROUP | DETACH_CHILD;
> mutex_lock(&parent_event->child_mutex);
> }
>
> - perf_remove_from_context(event, detach_flags);
> + if (revoke)
> + detach_flags |= DETACH_GROUP | DETACH_REVOKE;
>
> - raw_spin_lock_irq(&ctx->lock);
> - if (event->state > PERF_EVENT_STATE_EXIT)
> - perf_event_set_state(event, PERF_EVENT_STATE_EXIT);
> - raw_spin_unlock_irq(&ctx->lock);
> + perf_remove_from_context(event, detach_flags);
>
> /*
> * Child events can be freed.
> @@ -13419,7 +13596,7 @@ static void perf_event_exit_task_context
> perf_event_task(child, child_ctx, 0);
>
> list_for_each_entry_safe(child_event, next, &child_ctx->event_list, event_entry)
> - perf_event_exit_event(child_event, child_ctx);
> + perf_event_exit_event(child_event, child_ctx, false);
>
> mutex_unlock(&child_ctx->mutex);
>
> @@ -13609,6 +13786,9 @@ inherit_event(struct perf_event *parent_
> if (parent_event->parent)
> parent_event = parent_event->parent;
>
> + if (parent_event->state <= PERF_EVENT_STATE_REVOKED)
> + return NULL;
> +
> child_event = perf_event_alloc(&parent_event->attr,
> parent_event->cpu,
> child,
>
>
>
^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-05 15:08 ` Liang, Kan
@ 2024-11-05 15:16 ` Peter Zijlstra
2024-11-05 15:25 ` Liang, Kan
0 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2024-11-05 15:16 UTC (permalink / raw)
To: Liang, Kan
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter
On Tue, Nov 05, 2024 at 10:08:54AM -0500, Liang, Kan wrote:
> > @@ -2427,6 +2429,7 @@ __perf_remove_from_context(struct perf_e
> > void *info)
> > {
> > struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
> > + enum perf_event_state state = PERF_EVENT_STATE_OFF;
>
> Set the PERF_EVENT_STATE_OFF as default seems dangerous.
> If the event was in an error state, the state will be overwritten to the
> PERF_EVENT_STATE_OFF later.
>
> One example may be the perf_pmu_migrate_context(). After the migration,
> it looks like all the error state will be cleared.
>
> Thanks,
> Kan
>
> > unsigned long flags = (unsigned long)info;
> >
> > ctx_time_update(cpuctx, ctx);
> > @@ -2435,16 +2438,22 @@ __perf_remove_from_context(struct perf_e
> > * Ensure event_sched_out() switches to OFF, at the very least
> > * this avoids raising perf_pending_task() at this time.
> > */
> > - if (flags & DETACH_DEAD)
> > + if (flags & DETACH_EXIT)
> > + state = PERF_EVENT_STATE_EXIT;
> > + if (flags & DETACH_REVOKE)
> > + state = PERF_EVENT_STATE_REVOKED;
> > + if (flags & DETACH_DEAD) {
> > event->pending_disable = 1;
> > + state = PERF_EVENT_STATE_DEAD;
> > + }
> > event_sched_out(event, ctx);
> > if (flags & DETACH_GROUP)
> > perf_group_detach(event);
> > if (flags & DETACH_CHILD)
> > perf_child_detach(event);
> > list_del_event(event, ctx);
> > - if (flags & DETACH_DEAD)
> > - event->state = PERF_EVENT_STATE_DEAD;
> > +
> > + event->state = state;
How about we make this:
event->state = min(event->state, state);
?
^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-05 15:16 ` Peter Zijlstra
@ 2024-11-05 15:25 ` Liang, Kan
0 siblings, 0 replies; 85+ messages in thread
From: Liang, Kan @ 2024-11-05 15:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter
On 2024-11-05 10:16 a.m., Peter Zijlstra wrote:
> On Tue, Nov 05, 2024 at 10:08:54AM -0500, Liang, Kan wrote:
>>> @@ -2427,6 +2429,7 @@ __perf_remove_from_context(struct perf_e
>>> void *info)
>>> {
>>> struct perf_event_pmu_context *pmu_ctx = event->pmu_ctx;
>>> + enum perf_event_state state = PERF_EVENT_STATE_OFF;
>>
>> Set the PERF_EVENT_STATE_OFF as default seems dangerous.
>> If the event was in an error state, the state will be overwritten to the
>> PERF_EVENT_STATE_OFF later.
>>
>> One example may be the perf_pmu_migrate_context(). After the migration,
>> it looks like all the error state will be cleared.
>>
>> Thanks,
>> Kan
>>
>>> unsigned long flags = (unsigned long)info;
>>>
>>> ctx_time_update(cpuctx, ctx);
>>> @@ -2435,16 +2438,22 @@ __perf_remove_from_context(struct perf_e
>>> * Ensure event_sched_out() switches to OFF, at the very least
>>> * this avoids raising perf_pending_task() at this time.
>>> */
>>> - if (flags & DETACH_DEAD)
>>> + if (flags & DETACH_EXIT)
>>> + state = PERF_EVENT_STATE_EXIT;
>>> + if (flags & DETACH_REVOKE)
>>> + state = PERF_EVENT_STATE_REVOKED;
>>> + if (flags & DETACH_DEAD) {
>>> event->pending_disable = 1;
>>> + state = PERF_EVENT_STATE_DEAD;
>>> + }
>>> event_sched_out(event, ctx);
>>> if (flags & DETACH_GROUP)
>>> perf_group_detach(event);
>>> if (flags & DETACH_CHILD)
>>> perf_child_detach(event);
>>> list_del_event(event, ctx);
>>> - if (flags & DETACH_DEAD)
>>> - event->state = PERF_EVENT_STATE_DEAD;
>>> +
>>> + event->state = state;
>
> How about we make this:
>
> event->state = min(event->state, state);
>
> ?
Yep, it looks good to me.
Thanks,
Kan
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-04 13:39 ` [PATCH 19/19] perf: Make perf_pmu_unregister() useable Peter Zijlstra
2024-11-05 15:08 ` Liang, Kan
@ 2024-11-25 4:10 ` Ravi Bangoria
2024-12-17 9:12 ` Peter Zijlstra
2025-01-03 4:29 ` Ravi Bangoria
2 siblings, 1 reply; 85+ messages in thread
From: Ravi Bangoria @ 2024-11-25 4:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, Bangoria, Ravikumar
Hi Peter,
> @@ -6507,6 +6540,7 @@ static void perf_mmap_close(struct vm_ar
> unsigned long size = perf_data_size(rb);
> bool detach_rest = false;
>
> + /* FIXIES vs perf_pmu_unregister() */
> if (event->pmu->event_unmapped)
> event->pmu->event_unmapped(event, vma->vm_mm);
I assume you are already aware of the race between __pmu_detach_event()
and perf_mmap_close() since you have put that comment. In any case, below
sequence of operations triggers a splat when perf_mmap_close() tries to
access event->rb, event->pmu etc. which were already freed by
__pmu_detach_event().
Sequence:
Kernel                          Userspace
------                          ---------
perf_pmu_register()
                                fd = perf_event_open()
                                p = mmap(fd)
perf_pmu_unregister()
                                munmap(p)
                                close(fd)
Splat:
BUG: kernel NULL pointer dereference, address: 0000000000000088
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 105f90067 P4D 105f90067 PUD 11a94a067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 49 UID: 0 PID: 3456 Comm: perf-event-mmap Tainted: G OE 6.12.0-vanilla-dirty #273
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.7.3 03/31/2022
RIP: 0010:perf_mmap_close+0x69/0x316
Code: [...]
RSP: 0018:ffffc90003773970 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888125c2d400 RSI: ffff88bf7c8a1900 RDI: ffff888125c2d400
RBP: ffff888103ccaf40 R08: 0000000000000000 R09: 0000000000000000
R10: 3030303030303030 R11: 3030303030303030 R12: ffff88811f58d080
R13: ffffc90003773a70 R14: ffffc90003773a28 R15: 00007fcc1df1d000
FS: 00007fcc1e72e6c0(0000) GS:ffff88bf7c880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000088 CR3: 000000010396c003 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die_body.cold+0x19/0x27
? page_fault_oops+0x15a/0x2f0
? exc_page_fault+0x7e/0x180
? asm_exc_page_fault+0x26/0x30
? perf_mmap_close+0x69/0x316
remove_vma+0x2f/0x70
vms_complete_munmap_vmas+0xdc/0x190
do_vmi_align_munmap+0x1d7/0x250
do_vmi_munmap+0xd0/0x180
__vm_munmap+0xa2/0x170
? hrtimer_start_range_ns+0x26f/0x3b0
__x64_sys_munmap+0x1b/0x30
do_syscall_64+0x82/0x160
? srso_alias_return_thunk+0x5/0xfbef5
? tty_insert_flip_string_and_push_buffer+0x8d/0xc0
? srso_alias_return_thunk+0x5/0xfbef5
? remove_wait_queue+0x24/0x60
? srso_alias_return_thunk+0x5/0xfbef5
? n_tty_write+0x36f/0x520
? srso_alias_return_thunk+0x5/0xfbef5
? __wake_up+0x44/0x60
? srso_alias_return_thunk+0x5/0xfbef5
? file_tty_write.isra.0+0x20c/0x2c0
? srso_alias_return_thunk+0x5/0xfbef5
? vfs_write+0x290/0x450
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x10/0x210
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x8e/0x160
? __rseq_handle_notify_resume+0xa6/0x4e0
? __pfx_hrtimer_wakeup+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
? syscall_exit_to_user_mode+0x1d5/0x210
? srso_alias_return_thunk+0x5/0xfbef5
? do_syscall_64+0x8e/0x160
? exc_page_fault+0x7e/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fcc1e849eab
Code: [...]
RSP: 002b:00007fcc1e72de08 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 00007fcc1e72ecdc RCX: 00007fcc1e849eab
RDX: 0000000000011000 RSI: 0000000000011000 RDI: 00007fcc1df1d000
RBP: 00007fcc1e72dec0 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000202 R12: 00007fcc1e72e6c0
R13: ffffffffffffff88 R14: 0000000000000000 R15: 00007ffd1a4ce000
</TASK>
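From the user side, the racing sequence above amounts to the following sketch. This is illustrative only: it requires a PMU that actually gets unregistered mid-run (here a hypothetical out-of-tree `test_pmu` module whose removal ends up in perf_pmu_unregister()), and the event `type` is the dynamic id that PMU registered, so none of these names are real.

```c
/* Illustrative sketch -- 'test_pmu' and TEST_PMU_TYPE are hypothetical. */
struct perf_event_attr attr = {
	.type = TEST_PMU_TYPE,	/* dynamic type from /sys/bus/event_source */
	.size = sizeof(attr),
};

int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
void *p = mmap(NULL, (1 + 16) * page_size, PROT_READ | PROT_WRITE,
	       MAP_SHARED, fd, 0);

/* elsewhere: rmmod test_pmu -> perf_pmu_unregister() revokes the event */

munmap(p, (1 + 16) * page_size);	/* perf_mmap_close() now runs against
					 * an event whose pmu/rb were freed */
close(fd);
```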
^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-25 4:10 ` Ravi Bangoria
@ 2024-12-17 9:12 ` Peter Zijlstra
2024-12-17 11:52 ` Peter Zijlstra
0 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2024-12-17 9:12 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
Oh sorry, I seem to have missed this email :/
On Mon, Nov 25, 2024 at 09:40:28AM +0530, Ravi Bangoria wrote:
> Hi Peter,
>
> > @@ -6507,6 +6540,7 @@ static void perf_mmap_close(struct vm_ar
> > unsigned long size = perf_data_size(rb);
> > bool detach_rest = false;
> >
> > + /* FIXIES vs perf_pmu_unregister() */
> > if (event->pmu->event_unmapped)
> > event->pmu->event_unmapped(event, vma->vm_mm);
>
> I assume you are already aware of the race between __pmu_detach_event()
> and perf_mmap_close() since you have put that comment.
That comment was mostly about how we can't fix up the whole
->event_unmapped() thing and have to abort pmu_unregister for it.
> In any case, below sequence of operations triggers a splat when
> perf_mmap_close() tries to access event->rb, event->pmu etc. which
> were already freed by __pmu_detach_event().
>
> Sequence:
>
> Kernel                          Userspace
> ------                          ---------
> perf_pmu_register()
>                                 fd = perf_event_open()
>                                 p = mmap(fd)
> perf_pmu_unregister()
>                                 munmap(p)
>                                 close(fd)
Right, let me go have a look. Thanks!
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-12-17 9:12 ` Peter Zijlstra
@ 2024-12-17 11:52 ` Peter Zijlstra
2024-12-19 9:33 ` Ravi Bangoria
0 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2024-12-17 11:52 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
On Tue, Dec 17, 2024 at 10:12:16AM +0100, Peter Zijlstra wrote:
>
>
> Oh sorry, I seem to have missed this email :/
>
> On Mon, Nov 25, 2024 at 09:40:28AM +0530, Ravi Bangoria wrote:
> > Hi Peter,
> >
> > > @@ -6507,6 +6540,7 @@ static void perf_mmap_close(struct vm_ar
> > > unsigned long size = perf_data_size(rb);
> > > bool detach_rest = false;
> > >
> > > + /* FIXIES vs perf_pmu_unregister() */
> > > if (event->pmu->event_unmapped)
> > > event->pmu->event_unmapped(event, vma->vm_mm);
> >
> > I assume you are already aware of the race between __pmu_detach_event()
> > and perf_mmap_close() since you have put that comment.
>
> That comment was mostly about how we can't fix up the whole
> ->event_unmapped() thing and have to abort pmu_unregister for it.
>
> > In any case, below sequence of operations triggers a splat when
> > perf_mmap_close() tries to access event->rb, event->pmu etc. which
> > were already freed by __pmu_detach_event().
> >
> > Sequence:
> >
> > Kernel                          Userspace
> > ------                          ---------
> > perf_pmu_register()
> >                                 fd = perf_event_open()
> >                                 p = mmap(fd)
> > perf_pmu_unregister()
> >                                 munmap(p)
> >                                 close(fd)
>
> Right, let me go have a look. Thanks!
Bah, that's a right mess indeed, however did I miss all that.
The easiest solution is probably to leave the RB around on detach, but
now I need to remember why I did that in the first place :/
Oh.. I think I mostly did that to serialize against perf_mmap(), which
should reject creating further maps. But we can do that without actually
detaching the RB -- we only need to acquire and release mmap_mutex.
Ah, there's that perf_event_stop() inside of ring_buffer_attach(), that
must not happen after detach, obviously. So that must be dealt with.
Hmm, also if we leave ->rb around, then we need to deal with
perf_event_set_output(), someone could try and redirect their things
into our buffer -- which isn't technically broken, but still weird.
Something like the below.
How did you test; perf-fuzzer or something?
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1742,7 +1742,7 @@ static inline bool needs_branch_stack(st
static inline bool has_aux(struct perf_event *event)
{
- return event->pmu->setup_aux;
+ return event->pmu && event->pmu->setup_aux;
}
static inline bool has_aux_action(struct perf_event *event)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5409,7 +5409,6 @@ static void _free_event(struct perf_even
security_perf_event_free(event);
if (event->rb) {
- WARN_ON_ONCE(!event->pmu);
/*
* Can happen when we close an event with re-directed output.
*
@@ -12023,7 +12022,10 @@ static void __pmu_detach_event(struct pm
*/
scoped_guard (mutex, &event->mmap_mutex) {
WARN_ON_ONCE(pmu->event_unmapped);
- ring_buffer_attach(event, NULL);
+ /*
+ * Mostly an empty lock sequence, such that perf_mmap(), which
+ * relies on mmap_mutex, is sure to observe the state change.
+ */
}
perf_event_free_bpf_prog(event);
@@ -12823,6 +12825,9 @@ perf_event_set_output(struct perf_event
goto unlock;
if (output_event) {
+ if (output_event->state <= PERF_EVENT_STATE_REVOKED)
+ goto unlock;
+
/* get the rb we want to redirect to */
rb = ring_buffer_get(output_event);
if (!rb)
^ permalink raw reply	[flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-12-17 11:52 ` Peter Zijlstra
@ 2024-12-19 9:33 ` Ravi Bangoria
2024-12-19 10:56 ` Ravi Bangoria
2025-01-03 4:24 ` Ravi Bangoria
0 siblings, 2 replies; 85+ messages in thread
From: Ravi Bangoria @ 2024-12-19 9:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, Ravi Bangoria
Hi Peter,
>>> In any case, below sequence of operations triggers a splat when
>>> perf_mmap_close() tries to access event->rb, event->pmu etc. which
>>> were already freed by __pmu_detach_event().
>>>
>>> Sequence:
>>>
>>> Kernel Userspace
>>> ------ ---------
>>> perf_pmu_register()
>>> fd = perf_event_open()
>>> p = mmap(fd)
>>> perf_pmu_unregister()
>>> munmap(p)
>>> close(fd)
>>
>> Right, let me go have a look. Thanks!
>
> Bah, that's a right mess indeed, however did I miss all that.
>
> The easiest solution is probably to leave the RB around on detach, but
> now I need to remember why I did that in the first place :/
>
> Oh.. I think I mostly did that to serialize against perf_mmap(), which
> should reject creating further maps. But we can do that without actually
> detaching the RB -- we only need to acquire and release mmap_mutex.
>
> Ah, there's that perf_event_stop() inside of ring_buffer_attach(), that
> must not happen after detach, obviously. So that must be dealt with.
>
> Hmm, also if we leave ->rb around, then we need to deal with
> perf_event_set_output(), someone could try and redirect their things
> into our buffer -- which isn't technically broken, but still weird.
>
> Something like the below.
>
> How did you test; perf-fuzzer or something?
Prepared a simple test that does pmu register(), unregister() and
"perf record" in parallel. It's quite dirty, I'll clean it up and
share it here.
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1742,7 +1742,7 @@ static inline bool needs_branch_stack(st
>
> static inline bool has_aux(struct perf_event *event)
> {
> - return event->pmu->setup_aux;
> + return event->pmu && event->pmu->setup_aux;
> }
>
> static inline bool has_aux_action(struct perf_event *event)
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5409,7 +5409,6 @@ static void _free_event(struct perf_even
> security_perf_event_free(event);
>
> if (event->rb) {
> - WARN_ON_ONCE(!event->pmu);
> /*
> * Can happen when we close an event with re-directed output.
> *
> @@ -12023,7 +12022,10 @@ static void __pmu_detach_event(struct pm
> */
> scoped_guard (mutex, &event->mmap_mutex) {
> WARN_ON_ONCE(pmu->event_unmapped);
> - ring_buffer_attach(event, NULL);
> + /*
> + * Mostly an empty lock sequence, such that perf_mmap(), which
> + * relies on mmap_mutex, is sure to observe the state change.
> + */
> }
>
> perf_event_free_bpf_prog(event);
> @@ -12823,6 +12825,9 @@ perf_event_set_output(struct perf_event
> goto unlock;
>
> if (output_event) {
> + if (output_event->state <= PERF_EVENT_STATE_REVOKED)
> + goto unlock;
> +
> /* get the rb we want to redirect to */
> rb = ring_buffer_get(output_event);
> if (!rb)
I needed this additional diff on top of your change. With this, it survives
my test. perf_mmap_close() change seems correct. Not sure about perf_mmap().
I'll inspect the code further.
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6540,7 +6540,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
bool detach_rest = false;
/* FIXIES vs perf_pmu_unregister() */
- if (event->pmu->event_unmapped)
+ if (event->pmu && event->pmu->event_unmapped)
event->pmu->event_unmapped(event, vma->vm_mm);
/*
@@ -6873,7 +6873,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &perf_mmap_vmops;
- if (!ret && event->pmu->event_mapped)
+ if (!ret && event->pmu && event->pmu->event_mapped)
event->pmu->event_mapped(event, vma->vm_mm);
return ret;
Thanks,
Ravi
^ permalink raw reply	[flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-12-19 9:33 ` Ravi Bangoria
@ 2024-12-19 10:56 ` Ravi Bangoria
2025-01-03 4:24 ` Ravi Bangoria
1 sibling, 0 replies; 85+ messages in thread
From: Ravi Bangoria @ 2024-12-19 10:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, Ravi Bangoria
[-- Attachment #1: Type: text/plain, Size: 419 bytes --]
>> How did you test; perf-fuzzer or something?
>
> Prepared a simple test that does pmu register(), unregister() and
> "perf record" in parallel. It's quite dirty, I'll clean it up and
> share it here.
Attaching the testsuite here.
It contains:
- Kernel module that allows user to register and unregister a pmu
- Shell script that runs "perf record" on the test pmu
- README that explains how to run the test
Thanks,
Ravi
[-- Attachment #2: _tinypmu-u-register.sh --]
[-- Type: text/plain, Size: 262 bytes --]
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Author: Ravi Bangoria <ravi.bangoria@amd.com>
# Must not be run directly. Run it via tinypmu-u.sh
if [ "$EUID" -ne 0 ]; then
echo "Please run as root"
exit
fi
while true; do
echo 1 > /dev/tinypmu_register
done
[-- Attachment #3: _tinypmu-u-unregister.sh --]
[-- Type: text/plain, Size: 264 bytes --]
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Author: Ravi Bangoria <ravi.bangoria@amd.com>
# Must not be run directly. Run it via tinypmu-u.sh
if [ "$EUID" -ne 0 ]; then
echo "Please run as root"
exit
fi
while true; do
echo 1 > /dev/tinypmu_unregister
done
[-- Attachment #4: Makefile --]
[-- Type: text/plain, Size: 314 bytes --]
# SPDX-License-Identifier: GPL-2.0
# Author: Ravi Bangoria <ravi.bangoria@amd.com>
obj-m := tinypmu-k.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
$(MAKE) -C $(KDIR) M=$(PWD) modules
clean:
rm -rf *.o *.mod.c *.mod *.ko *.order *.symvers \.*.o \.*.ko \.*.cmd \.tmp_versions
[-- Attachment #5: README --]
[-- Type: text/plain, Size: 445 bytes --]
A simple testsuite to stress test pmu register / unregister code:
https://lore.kernel.org/r/20241104133909.669111662@infradead.org
Build:
$ make
Clean:
$ make clean
Test perf pmu register and unregister:
term1~$ while true; do sudo bash tinypmu-u.sh; done
Test event creation / mmap / unmap / event deletion etc. along with
pmu register / unregister (run this in parallel with the above command):
term2~$ sudo bash tinypmu-u-events.sh
[-- Attachment #6: tinypmu-k.c --]
[-- Type: text/plain, Size: 4621 bytes --]
// SPDX-License-Identifier: GPL-2.0
/*
* Provide an interface /dev/tinypmu_register and /dev/tinypmu_unregister to
* stress test perf_pmu_register() and perf_pmu_unregister() functions.
*
* Author: Ravi Bangoria <ravi.bangoria@amd.com>
*/
#define pr_fmt(fmt) "tinypmu: " fmt
#include <linux/module.h>
#include <linux/perf_event.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#define NR_TINYPMUS 1
static int tinypmu_event_init(struct perf_event *event)
{
return 0;
}
static void tinypmu_del(struct perf_event *event, int flags)
{
}
static int tinypmu_add(struct perf_event *event, int flags)
{
return 0;
}
static void tinypmu_start(struct perf_event *event, int flags)
{
}
static void tinypmu_stop(struct perf_event *event, int flags)
{
}
static void tinypmu_read(struct perf_event *event)
{
}
PMU_FORMAT_ATTR(event, "config:0-20");
static struct attribute *tinypmu_events_attr[] = {
&format_attr_event.attr,
NULL,
};
static struct attribute_group tinypmu_events_group = {
.name = "format",
.attrs = tinypmu_events_attr,
};
static const struct attribute_group *tinypmu_attr_groups[] = {
&tinypmu_events_group,
NULL,
};
static struct pmu *alloc_tinypmu(void)
{
struct pmu *tinypmu = kzalloc(sizeof(struct pmu), GFP_KERNEL);
if (!tinypmu)
return NULL;
tinypmu->task_ctx_nr = perf_invalid_context;
tinypmu->event_init = tinypmu_event_init;
tinypmu->add = tinypmu_add;
tinypmu->del = tinypmu_del;
tinypmu->start = tinypmu_start;
tinypmu->stop = tinypmu_stop;
tinypmu->read = tinypmu_read;
tinypmu->attr_groups = tinypmu_attr_groups;
return tinypmu;
}
static DEFINE_MUTEX(lock);
static struct pmu *tinypmus[NR_TINYPMUS];
static char pmu_name[NR_TINYPMUS][11];
static void register_pmu(unsigned int idx)
{
struct pmu *temp;
int ret;
if (idx >= NR_TINYPMUS)
return;
temp = alloc_tinypmu();
if (!temp)
return;
mutex_lock(&lock);
if (tinypmus[idx]) {
mutex_unlock(&lock);
kfree(temp);
return;
}
ret = perf_pmu_register(temp, pmu_name[idx], -1);
if (!ret) {
tinypmus[idx] = temp;
mutex_unlock(&lock);
return;
}
mutex_unlock(&lock);
kfree(temp);
return;
}
static void unregister_pmu(unsigned int idx)
{
struct pmu *temp;
if (idx >= NR_TINYPMUS)
return;
mutex_lock(&lock);
if (!tinypmus[idx]) {
mutex_unlock(&lock);
return;
}
/*
* perf_pmu_unregister() must be called inside the critical section. If
* the section were shortened by caching tinypmus[idx] in temp, clearing
* tinypmus[idx] and unregistering after mutex_unlock(), register_pmu()
* would see tinypmus[idx] == NULL, assume no pmu named "tinypmu<idx>"
* exists, and try to register a new one before this function had
* unregistered the old pmu with the same name.
*/
perf_pmu_unregister(tinypmus[idx]);
temp = tinypmus[idx];
tinypmus[idx] = NULL;
mutex_unlock(&lock);
kfree(temp);
}
static ssize_t register_write(struct file *f, const char *data, size_t size,
loff_t *ppos)
{
unsigned int idx;
get_random_bytes(&idx, sizeof(idx));
idx %= NR_TINYPMUS;
register_pmu(idx);
return 1;
}
static ssize_t unregister_write(struct file *f, const char *data, size_t size,
loff_t *ppos)
{
unsigned int idx;
get_random_bytes(&idx, sizeof(idx));
idx %= NR_TINYPMUS;
unregister_pmu(idx);
return 1;
}
static const struct file_operations register_fops = {
.owner = THIS_MODULE,
.write = register_write,
};
static const struct file_operations unregister_fops = {
.owner = THIS_MODULE,
.write = unregister_write,
};
static struct miscdevice register_dev = {
MISC_DYNAMIC_MINOR,
"tinypmu_register",
&register_fops,
};
static struct miscdevice unregister_dev = {
MISC_DYNAMIC_MINOR,
"tinypmu_unregister",
&unregister_fops,
};
static int __init hello_init(void)
{
int ret;
int i;
ret = misc_register(&register_dev);
if (ret) {
pr_err("Failed to register register_dev\n");
return ret;
}
ret = misc_register(&unregister_dev);
if (ret) {
pr_err("Failed to register unregister_dev\n");
misc_deregister(&register_dev);
return ret;
}
for (i = 0; i < NR_TINYPMUS; i++)
sprintf(pmu_name[i], "tinypmu%d", i);
for (i = 0; i < NR_TINYPMUS; i++)
register_pmu(i);
return 0;
}
module_init(hello_init);
static void __exit hello_exit(void)
{
int i;
for (i = 0; i < NR_TINYPMUS; i++)
unregister_pmu(i);
misc_deregister(&register_dev);
misc_deregister(&unregister_dev);
}
module_exit(hello_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Ravi Bangoria");
MODULE_DESCRIPTION("PMU register/unregister stress test");
MODULE_VERSION("dev");
[-- Attachment #7: tinypmu-u.sh --]
[-- Type: text/plain, Size: 629 bytes --]
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Author: Ravi Bangoria <ravi.bangoria@amd.com>
if [ "$EUID" -ne 0 ]; then
echo "Please run as root"
exit
fi
sudo insmod tinypmu-k.ko
cleanup() {
for i in "$@"; do
echo -n "${i} "
kill ${i}
done
wait
rmmod tinypmu_k
rm -rf /dev/tinypmu_register
rm -rf /dev/tinypmu_unregister
echo ""
exit
}
bash _tinypmu-u-register.sh &
reg_pid=$!
bash _tinypmu-u-unregister.sh &
unreg_pid=$!
# register Ctrl+C cleanup if aborted in between
#trap "cleanup '${reg_pid}' '${unreg_pid}' ${event_pids[@]}" 2
echo ${reg_pid} ${unreg_pid}
sleep 10
cleanup ${reg_pid} ${unreg_pid}
[-- Attachment #8: tinypmu-u-events.sh --]
[-- Type: text/plain, Size: 262 bytes --]
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Author: Ravi Bangoria <ravi.bangoria@amd.com>
pmu_nr=${1}
if [ "$EUID" -ne 0 ]; then
echo "Please run as root"
exit
fi
while true; do
perf record -a -e tinypmu${pmu_nr}// -- sleep 1 >> /dev/null 2>&1 &
done
^ permalink raw reply	[flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-12-19 9:33 ` Ravi Bangoria
2024-12-19 10:56 ` Ravi Bangoria
@ 2025-01-03 4:24 ` Ravi Bangoria
2025-01-17 0:03 ` Peter Zijlstra
1 sibling, 1 reply; 85+ messages in thread
From: Ravi Bangoria @ 2025-01-03 4:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, Ravi Bangoria
Hi Peter,
Sorry for the delay. Was on vacation.
>>>> In any case, below sequence of operations triggers a splat when
>>>> perf_mmap_close() tries to access event->rb, event->pmu etc. which
>>>> were already freed by __pmu_detach_event().
>>>>
>>>> Sequence:
>>>>
>>>> Kernel Userspace
>>>> ------ ---------
>>>> perf_pmu_register()
>>>> fd = perf_event_open()
>>>> p = mmap(fd)
>>>> perf_pmu_unregister()
>>>> munmap(p)
>>>> close(fd)
>>>
>>> Right, let me go have a look. Thanks!
>>
>> Bah, that's a right mess indeed, however did I miss all that.
>>
>> The easiest solution is probably to leave the RB around on detach, but
>> now I need to remember why I did that in the first place :/
>>
>> Oh.. I think I mostly did that to serialize against perf_mmap(), which
>> should reject creating further maps. But we can do that without actually
>> detaching the RB -- we only need to acquire and release mmap_mutex.
>>
>> Ah, there's that perf_event_stop() inside of ring_buffer_attach(), that
>> must not happen after detach, obviously. So that must be dealt with.
>>
>> Hmm, also if we leave ->rb around, then we need to deal with
>> perf_event_set_output(), someone could try and redirect their things
>> into our buffer -- which isn't technically broken, but still weird.
>>
>> Something like the below.
>>
>> How did you test; perf-fuzzer or something?
>
> Prepared a simple test that does pmu register(), unregister() and
> "perf record" in parallel. It's quite dirty, I'll clean it up and
> share it here.
>
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1742,7 +1742,7 @@ static inline bool needs_branch_stack(st
>>
>> static inline bool has_aux(struct perf_event *event)
>> {
>> - return event->pmu->setup_aux;
>> + return event->pmu && event->pmu->setup_aux;
>> }
>>
>> static inline bool has_aux_action(struct perf_event *event)
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -5409,7 +5409,6 @@ static void _free_event(struct perf_even
>> security_perf_event_free(event);
>>
>> if (event->rb) {
>> - WARN_ON_ONCE(!event->pmu);
>> /*
>> * Can happen when we close an event with re-directed output.
>> *
>> @@ -12023,7 +12022,10 @@ static void __pmu_detach_event(struct pm
>> */
>> scoped_guard (mutex, &event->mmap_mutex) {
>> WARN_ON_ONCE(pmu->event_unmapped);
>> - ring_buffer_attach(event, NULL);
>> + /*
>> + * Mostly an empty lock sequence, such that perf_mmap(), which
>> + * relies on mmap_mutex, is sure to observe the state change.
>> + */
>> }
>>
>> perf_event_free_bpf_prog(event);
>> @@ -12823,6 +12825,9 @@ perf_event_set_output(struct perf_event
>> goto unlock;
>>
>> if (output_event) {
>> + if (output_event->state <= PERF_EVENT_STATE_REVOKED)
>> + goto unlock;
>> +
>> /* get the rb we want to redirect to */
>> rb = ring_buffer_get(output_event);
>> if (!rb)
>
> I needed this additional diff on top of your change. With this, it survives
> my test. perf_mmap_close() change seems correct. Not sure about perf_mmap().
> I'll inspect the code further.
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6540,7 +6540,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
> bool detach_rest = false;
>
> /* FIXIES vs perf_pmu_unregister() */
> - if (event->pmu->event_unmapped)
> + if (event->pmu && event->pmu->event_unmapped)
> event->pmu->event_unmapped(event, vma->vm_mm);
>
> /*
> @@ -6873,7 +6873,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
> vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
> vma->vm_ops = &perf_mmap_vmops;
>
> - if (!ret && event->pmu->event_mapped)
> + if (!ret && event->pmu && event->pmu->event_mapped)
> event->pmu->event_mapped(event, vma->vm_mm);
>
> return ret;
Both of these are incorrect. They just reduce the race window; they don't
actually solve the race. Anyway, I could spot a few other races:
1) A race between event creation and perf_pmu_unregister(). Any event
create code path (perf_event_open(), perf_event_create_kernel_counter()
and inherit_event()) allocates the event with perf_event_alloc(), which
adds it to the pmu->events list. However, the event is still immature;
for example, event->ctx is still NULL. In the meantime, perf_pmu_unregister()
finds this event and tries to detach it.
perf_event_open() perf_pmu_unregister()
event = perf_event_alloc() pmu_detach_event(event)
list_add(&event->pmu_list, &pmu->events); perf_event_ctx_lock(event)
/* perf_event_ctx_lock_nested(ctx)
* event->ctx is NULL. ctx = READ_ONCE(event->ctx); /* event->ctx is NULL */
*/ if (!refcount_inc_not_zero(&ctx->refcount)) { /* Crash */
perf_install_in_context(ctx, event);
2) A race with perf_event_release_kernel(). perf_event_release_kernel()
prepares a separate "free_list" of all children events under ctx->mutex
and event->child_mutex. However, the "free_list" uses the same
"event->child_list" for entries. OTOH, perf_pmu_unregister() ultimately
calls __perf_remove_from_context() with DETACH_CHILD, which checks if
the event being removed is a child event, and if so, it will try to
detach the child from its parent using list_del_init(&event->child_list);
i.e. two code paths doing list_del() on the same list entry.
perf_event_release_kernel() perf_pmu_unregister()
/* Move children events to free_list */ ...
list_for_each_entry_safe(child, tmp, &free_list, child_list) { perf_remove_from_context() /* with DETACH_CHILD */
... __perf_remove_from_context()
list_del(&child->child_list); perf_child_detach()
list_del_init(&event->child_list);
3) A WARN(), not a race. perf_pmu_unregister() increments event->refcount
before detaching the event. If perf_pmu_unregister() picks up a child
event, perf_event_exit_event() called through perf_pmu_unregister()
will try to free it. Since event->refcount would be 2, free_event()
will trigger a WARN().
perf_pmu_unregister()
event = pmu_get_event() /* event->refcount => 2 */
...
perf_event_exit_event()
if (parent_event) { /* true, because `event` is a child */
free_event(event);
if (WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1,
"unexpected event refcount: %ld; ptr=%p\n",
atomic_long_read(&event->refcount), event))
4) A race with perf_event_set_bpf_prog(). perf_event_set_bpf_prog() might
be in the process of setting event->prog, whereas perf_pmu_unregister(),
which internally calls perf_event_free_bpf_prog(), will clear the
event->prog pointer.
perf_pmu_unregister() perf_event_set_bpf_prog()
... perf_event_set_bpf_handler()
perf_event_free_bpf_prog() event->prog = prog;
event->prog = NULL;
I've yet to inspect other code paths, so there might be more races.
Thinking out loud, a plausible brute-force solution is to introduce an
event-specific lock, acquire it right at the beginning of all code paths,
and release it at the end. event->lock shouldn't create any contention,
since an event would mostly be going through only one code path at any
point in time.
Thanks,
Ravi
^ permalink raw reply	[flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2025-01-03 4:24 ` Ravi Bangoria
@ 2025-01-17 0:03 ` Peter Zijlstra
2025-01-17 5:20 ` Ravi Bangoria
2025-01-17 13:04 ` Peter Zijlstra
0 siblings, 2 replies; 85+ messages in thread
From: Peter Zijlstra @ 2025-01-17 0:03 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
On Fri, Jan 03, 2025 at 09:54:09AM +0530, Ravi Bangoria wrote:
> Hi Peter,
>
> Sorry for the delay. Was on vacation.
Yeah, me too :-)
> Both of these are incorrect. They just reduce the race window; they don't
> actually solve the race. Anyway, I could spot a few other races:
>
> 1) A race between event creation and perf_pmu_unregister(). Any event
> create code path (perf_event_open(), perf_event_create_kernel_counter()
> and inherit_event()) allocates the event with perf_event_alloc(), which
> adds it to the pmu->events list. However, the event is still immature;
> for example, event->ctx is still NULL. In the meantime, perf_pmu_unregister()
> finds this event and tries to detach it.
>
> perf_event_open() perf_pmu_unregister()
> event = perf_event_alloc() pmu_detach_event(event)
> list_add(&event->pmu_list, &pmu->events); perf_event_ctx_lock(event)
> /* perf_event_ctx_lock_nested(ctx)
> * event->ctx is NULL. ctx = READ_ONCE(event->ctx); /* event->ctx is NULL */
> */ if (!refcount_inc_not_zero(&ctx->refcount)) { /* Crash */
> perf_install_in_context(ctx, event);
Ah, that puts the lie to the guard(srcu) comment there, doesn't it :/
So the intent was for that SRCU section to cover the creation, so that
perf_pmu_unregister() can take out the pmu to avoid creating more
events, then srcu-sync to wait on all in-progress creation and then go
detach everything.
I suppose the simplest thing here is to grow that SRCU section.
> 2) A race with perf_event_release_kernel(). perf_event_release_kernel()
> prepares a separate "free_list" of all children events under ctx->mutex
> and event->child_mutex. However, the "free_list" uses the same
> "event->child_list" for entries. OTOH, perf_pmu_unregister() ultimately
> calls __perf_remove_from_context() with DETACH_CHILD, which checks if
> the event being removed is a child event, and if so, it will try to
> detach the child from its parent using list_del_init(&event->child_list);
> i.e. two code paths doing list_del() on the same list entry.
>
> perf_event_release_kernel() perf_pmu_unregister()
> /* Move children events to free_list */ ...
> list_for_each_entry_safe(child, tmp, &free_list, child_list) { perf_remove_from_context() /* with DETACH_CHILD */
> ... __perf_remove_from_context()
> list_del(&child->child_list); perf_child_detach()
> list_del_init(&event->child_list);
Bah, I had figured it was taken care of, because perf_event_exit_event()
has a similar race. I'll try and figure out what to do there.
> 3) A WARN(), not a race. perf_pmu_unregister() increments event->refcount
> before detaching the event. If perf_pmu_unregister() picks up a child
> event, perf_event_exit_event() called through perf_pmu_unregister()
> will try to free it. Since event->refcount would be 2, free_event()
> will trigger a WARN().
>
> perf_pmu_unregister()
> event = pmu_get_event() /* event->refcount => 2 */
> ...
> perf_event_exit_event()
> if (parent_event) { /* true, because `event` is a child */
> free_event(event);
> if (WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1,
> "unexpected event refcount: %ld; ptr=%p\n",
> atomic_long_read(&event->refcount), event))
I'll make that something like:
if (revoke)
put_event(event);
else
free_event(event);
or so.
> 4) A race with perf_event_set_bpf_prog(). perf_event_set_bpf_prog() might
> be in the process of setting event->prog, whereas perf_pmu_unregister(),
> which internally calls perf_event_free_bpf_prog(), will clear the
> event->prog pointer.
>
> perf_pmu_unregister() perf_event_set_bpf_prog()
> ... perf_event_set_bpf_handler()
> perf_event_free_bpf_prog() event->prog = prog;
> event->prog = NULL;
>
> I've yet to inspect other code paths, so there might be more races.
Weird, that should be serialized by perf_event_ctx_lock(), both
__pmu_detach_event() and _perf_ioctl() are called under that.
Thanks for going over this!
^ permalink raw reply [flat|nested] 85+ messages in thread* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2025-01-17 0:03 ` Peter Zijlstra
@ 2025-01-17 5:20 ` Ravi Bangoria
2025-01-17 8:36 ` Peter Zijlstra
2025-01-17 13:04 ` Peter Zijlstra
1 sibling, 1 reply; 85+ messages in thread
From: Ravi Bangoria @ 2025-01-17 5:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, Ravi Bangoria
Hi Peter,
>> 4) A race with perf_event_set_bpf_prog(). perf_event_set_bpf_prog() might
>> be in the process of setting event->prog, whereas perf_pmu_unregister(),
>> which internally calls perf_event_free_bpf_prog(), will clear the
>> event->prog pointer.
>>
>> perf_pmu_unregister() perf_event_set_bpf_prog()
>> ... perf_event_set_bpf_handler()
>> perf_event_free_bpf_prog() event->prog = prog;
>> event->prog = NULL;
>>
>> I've yet to inspect other code paths, so there might be more races.
>
> Weird, that should be serialized by perf_event_ctx_lock(), both
> __pmu_detach_event() and _perf_ioctl() are called under that.
There are multiple code paths leading to perf_event_set_bpf_prog(). The
one starting from _perf_ioctl() is serialized. However, this is not:
__sys_bpf()
link_create()
bpf_perf_link_attach()
perf_event_set_bpf_prog()
perf_event_set_bpf_handler()
event->prog = prog;
Thanks,
Ravi
^ permalink raw reply [flat|nested] 85+ messages in thread* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2025-01-17 5:20 ` Ravi Bangoria
@ 2025-01-17 8:36 ` Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2025-01-17 8:36 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
On Fri, Jan 17, 2025 at 10:50:26AM +0530, Ravi Bangoria wrote:
> Hi Peter,
>
> >> 4) A race with perf_event_set_bpf_prog(). perf_event_set_bpf_prog() might
> >> be in the process of setting event->prog, whereas perf_pmu_unregister(),
> >> which internally calls perf_event_free_bpf_prog(), will clear the
> >> event->prog pointer.
> >>
> >> perf_pmu_unregister() perf_event_set_bpf_prog()
> >> ... perf_event_set_bpf_handler()
> >> perf_event_free_bpf_prog() event->prog = prog;
> >> event->prog = NULL;
> >>
> >> I've yet to inspect other code paths, so there might be more races.
> >
> > Weird, that should be serialized by perf_event_ctx_lock(), both
> > __pmu_detach_event() and _perf_ioctl() are called under that.
>
> There are multiple code paths leading to perf_event_set_bpf_prog(). The
> one starting from _perf_ioctl() is serialized. However, this is not:
>
> __sys_bpf()
> link_create()
> bpf_perf_link_attach()
> perf_event_set_bpf_prog()
> perf_event_set_bpf_handler()
> event->prog = prog;
>
Urgh yeah, that's broken. Damn bpf stuff :-/
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2025-01-17 0:03 ` Peter Zijlstra
2025-01-17 5:20 ` Ravi Bangoria
@ 2025-01-17 13:04 ` Peter Zijlstra
2025-01-17 21:04 ` Peter Zijlstra
1 sibling, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2025-01-17 13:04 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
On Fri, Jan 17, 2025 at 01:03:16AM +0100, Peter Zijlstra wrote:
> > 2) A race with perf_event_release_kernel(). perf_event_release_kernel()
> > prepares a separate "free_list" of all children events under ctx->mutex
> > and event->child_mutex. However, the "free_list" uses the same
> > "event->child_list" for entries. OTOH, perf_pmu_unregister() ultimately
> > calls __perf_remove_from_context() with DETACH_CHILD, which checks if
> > the event being removed is a child event, and if so, it will try to
> > detach the child from its parent using list_del_init(&event->child_list);
> > i.e. two code paths doing list_del() on the same list entry.
> >
> > perf_event_release_kernel() perf_pmu_unregister()
> > /* Move children events to free_list */ ...
> > list_for_each_entry_safe(child, tmp, &free_list, child_list) { perf_remove_from_context() /* with DETACH_CHILD */
> > ... __perf_remove_from_context()
> > list_del(&child->child_list); perf_child_detach()
> > list_del_init(&event->child_list);
>
> Bah, I had figured it was taken care of, because perf_event_exit_event()
> has a similar race. I'll try and figure out what to do there.
So, the problem appears to be that perf_event_release_kernel() does not
use DETACH_CHILD; doing so would clear PERF_ATTACH_CHILD, at which point
the above is fully serialized by parent->child_mutex.
Then the next problem is that since pmu_detach_events() can hold an
extra ref on things, the free_event() from free_list will WARN, like
before.
Easily fixed by making that put_event(), except that messes up the whole
wait_var_event() scheme -- since __free_event() does the final
put_ctx().
This in turn can be fixed by pushing that wake_up_var() nonsense into
put_ctx() itself.
Which then gives me something like so.
But also, I think we can get rid of that free_list entirely.
Anyway, let me go break this up into individual patches and go test
this -- after lunch!
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1229,8 +1229,14 @@ static void put_ctx(struct perf_event_co
if (refcount_dec_and_test(&ctx->refcount)) {
if (ctx->parent_ctx)
put_ctx(ctx->parent_ctx);
- if (ctx->task && ctx->task != TASK_TOMBSTONE)
- put_task_struct(ctx->task);
+ if (ctx->task) {
+ if (ctx->task == TASK_TOMBSTONE) {
+ smp_mb();
+ wake_up_var(&ctx->refcount);
+ } else {
+ put_task_struct(ctx->task);
+ }
+ }
call_rcu(&ctx->rcu_head, free_ctx);
}
}
@@ -5550,8 +5556,6 @@ int perf_event_release_kernel(struct per
again:
mutex_lock(&event->child_mutex);
list_for_each_entry(child, &event->child_list, child_list) {
- void *var = NULL;
-
/*
* Cannot change, child events are not migrated, see the
* comment with perf_event_ctx_lock_nested().
@@ -5584,46 +5588,32 @@ int perf_event_release_kernel(struct per
tmp = list_first_entry_or_null(&event->child_list,
struct perf_event, child_list);
if (tmp == child) {
- perf_remove_from_context(child, DETACH_GROUP);
- list_move(&child->child_list, &free_list);
+ perf_remove_from_context(child, DETACH_GROUP | DETACH_CHILD);
+ /*
+ * Can't risk calling into free_event() here, since
+ * event->destroy() might invert with the currently
+ * held locks, see 82d94856fa22 ("perf/core: Fix lock
+ * inversion between perf,trace,cpuhp")
+ */
+ list_add(&child->child_list, &free_list);
/*
* This matches the refcount bump in inherit_event();
* this can't be the last reference.
*/
put_event(event);
- } else {
- var = &ctx->refcount;
}
mutex_unlock(&event->child_mutex);
mutex_unlock(&ctx->mutex);
put_ctx(ctx);
- if (var) {
- /*
- * If perf_event_free_task() has deleted all events from the
- * ctx while the child_mutex got released above, make sure to
- * notify about the preceding put_ctx().
- */
- smp_mb(); /* pairs with wait_var_event() */
- wake_up_var(var);
- }
goto again;
}
mutex_unlock(&event->child_mutex);
list_for_each_entry_safe(child, tmp, &free_list, child_list) {
- void *var = &child->ctx->refcount;
-
list_del(&child->child_list);
- free_event(child);
-
- /*
- * Wake any perf_event_free_task() waiting for this event to be
- * freed.
- */
- smp_mb(); /* pairs with wait_var_event() */
- wake_up_var(var);
+ put_event(child);
}
no_ctx:
^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2025-01-17 13:04 ` Peter Zijlstra
@ 2025-01-17 21:04 ` Peter Zijlstra
2025-01-20 11:15 ` Ravi Bangoria
0 siblings, 1 reply; 85+ messages in thread
From: Peter Zijlstra @ 2025-01-17 21:04 UTC (permalink / raw)
To: Ravi Bangoria
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
On Fri, Jan 17, 2025 at 02:04:23PM +0100, Peter Zijlstra wrote:
> Anyway, let me go break this up into individual patches and go test
> this -- after lunch!
OK, so aside from a few dumb mistakes, the result seems to hold up with
your tinypmu testcase. I left it running for about 30 minutes.
I pushed out the latest patches to queue/perf/pmu-unregister
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2025-01-17 21:04 ` Peter Zijlstra
@ 2025-01-20 11:15 ` Ravi Bangoria
0 siblings, 0 replies; 85+ messages in thread
From: Ravi Bangoria @ 2025-01-20 11:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo@kernel.org, lucas.demarchi@intel.com,
linux-kernel@vger.kernel.org, willy@infradead.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, Ravi Bangoria
Hi Peter,
On 18-Jan-25 2:34 AM, Peter Zijlstra wrote:
> On Fri, Jan 17, 2025 at 02:04:23PM +0100, Peter Zijlstra wrote:
>
>> Anyway, let me go break this up into individual patches and go test
>> this -- after lunch!
>
> OK, so aside from a few dumb mistakes, the result seems to hold up with
> your tinypmu testcase. I left it running for about 30 minutes.
>
> I pushed out the latest patches to queue/perf/pmu-unregister
I'll spend some time going through the changes.
I ran the fuzzer over the weekend with the latest queue/perf/pmu-unregister
and saw this kernel crash:
BUG: kernel NULL pointer dereference, address: 00000000000000d0
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 12a922067 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 8 UID: 1002 PID: 8505 Comm: perf_fuzzer Kdump: loaded Tainted: G W O 6.13.0-rc1-pmu-unregister+ #171
Tainted: [W]=WARN, [O]=OOT_MODULE
Hardware name: AMD Corporation RUBY/RUBY, BIOS RRR1009C 07/21/2023
RIP: 0010:perf_mmap_to_page+0x6/0xc0
Code: ...
RSP: 0018:ffa0000003aff910 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000008
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffa0000003aff980 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007b2b02ca8000 R14: ff1100014f9cc5c0 R15: 0000000000000009
FS: 00007b2b02e03740(0000) GS:ff11001009000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000d0 CR3: 000000014f99c001 CR4: 0000000000f71ef0
PKRU: 55555554
Call Trace:
<TASK>
? show_regs+0x6c/0x80
? __die+0x24/0x80
? page_fault_oops+0x155/0x570
? do_user_addr_fault+0x4b2/0x870
? srso_alias_return_thunk+0x5/0xfbef5
? get_page_from_freelist+0x3c7/0x1680
? exc_page_fault+0x82/0x1b0
? asm_exc_page_fault+0x27/0x30
? perf_mmap_to_page+0x6/0xc0
? perf_mmap+0x237/0x710
__mmap_region+0x6d5/0xb90
mmap_region+0x8d/0xc0
do_mmap+0x349/0x630
vm_mmap_pgoff+0xf4/0x1c0
ksys_mmap_pgoff+0x177/0x240
__x64_sys_mmap+0x33/0x70
x64_sys_call+0x24b9/0x2650
do_syscall_64+0x7e/0x170
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? sched_tick+0x119/0x320
? srso_alias_return_thunk+0x5/0xfbef5
? sysvec_irq_work+0x4f/0xc0
? srso_alias_return_thunk+0x5/0xfbef5
? rcu_report_qs_rnp+0xd1/0x140
? srso_alias_return_thunk+0x5/0xfbef5
? rcu_core+0x1c2/0x380
? srso_alias_return_thunk+0x5/0xfbef5
? rcu_core_si+0xe/0x20
? srso_alias_return_thunk+0x5/0xfbef5
? handle_softirqs+0xe7/0x330
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? irqentry_exit_to_user_mode+0x43/0x250
? srso_alias_return_thunk+0x5/0xfbef5
? irqentry_exit+0x43/0x50
? srso_alias_return_thunk+0x5/0xfbef5
? sysvec_apic_timer_interrupt+0x4f/0xc0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7b2b02b2531c
Code: ...
RSP: 002b:00007ffc3b177880 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007b2b02b2531c
RDX: 0000000000000003 RSI: 0000000000009000 RDI: 0000000000000000
RBP: 00007ffc3b177890 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000000 R14: 0000636a69c7bb60 R15: 00007b2b02e55000
</TASK>
Modules linked in: ...
CR2: 00000000000000d0
---[ end trace 0000000000000000 ]---
Thanks,
Ravi
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 19/19] perf: Make perf_pmu_unregister() useable
2024-11-04 13:39 ` [PATCH 19/19] perf: Make perf_pmu_unregister() useable Peter Zijlstra
2024-11-05 15:08 ` Liang, Kan
2024-11-25 4:10 ` Ravi Bangoria
@ 2025-01-03 4:29 ` Ravi Bangoria
2 siblings, 0 replies; 85+ messages in thread
From: Ravi Bangoria @ 2025-01-03 4:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
mingo, lucas.demarchi, Ravi Bangoria
Hi Peter,
> @@ -12294,6 +12458,13 @@ perf_event_alloc(struct perf_event_attr
>
> perf_event__state_init(event);
>
> + /*
> + * Hold SRCU critical section around perf_init_event(), until returning
> + * the fully formed event put on pmu->events_list. This ensures that
> + * perf_pmu_unregister() will see any in-progress event creation that
> + * races.
> + */
> + guard(srcu)(&pmus_srcu);
Minor nit: this can go down a bit, to just before perf_init_event()?
Thanks,
Ravi
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 00/19] perf: Make perf_pmu_unregister() usable
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (18 preceding siblings ...)
2024-11-04 13:39 ` [PATCH 19/19] perf: Make perf_pmu_unregister() useable Peter Zijlstra
@ 2024-12-16 18:02 ` Lucas De Marchi
2025-03-01 20:00 ` Ingo Molnar
20 siblings, 0 replies; 85+ messages in thread
From: Lucas De Marchi @ 2024-12-16 18:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang
Hi Peter,
On Mon, Nov 04, 2024 at 02:39:09PM +0100, Peter Zijlstra wrote:
>Hi,
>
>Lucas convinced me that perf_pmu_unregister() is a trainwreck; after
>considering a few options I was like, how hard could it be..
>
>So find here a few patches that clean things up in preparation and then a final
>patch that makes unregistering a PMU work by introducing a new event state
>(REVOKED) and ensuring that any event in such a state will never get to use
>its PMU methods ever again.
Any updates on this series? I'm trying to understand if there are any
blockers I could help with - we are aiming to add a perf PMU to the xe
driver, but I'd like to have a reliable unregister first.
thanks
Lucas De Marchi
>
>
^ permalink raw reply [flat|nested] 85+ messages in thread* Re: [PATCH 00/19] perf: Make perf_pmu_unregister() usable
2024-11-04 13:39 [PATCH 00/19] perf: Make perf_pmu_unregister() usable Peter Zijlstra
` (19 preceding siblings ...)
2024-12-16 18:02 ` [PATCH 00/19] perf: Make perf_pmu_unregister() usable Lucas De Marchi
@ 2025-03-01 20:00 ` Ingo Molnar
2025-03-03 3:25 ` Ravi Bangoria
20 siblings, 1 reply; 85+ messages in thread
From: Ingo Molnar @ 2025-03-01 20:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: lucas.demarchi, linux-kernel, willy, acme, namhyung, mark.rutland,
alexander.shishkin, jolsa, irogers, adrian.hunter, kan.liang,
Ravi Bangoria
* Peter Zijlstra <peterz@infradead.org> wrote:
> Hi,
>
> Lucas convinced me that perf_pmu_unregister() is a trainwreck; after
> considering a few options I was like, how hard could it be..
>
> So find here a few patches that clean things up in preparation and then a final
> patch that makes unregistering a PMU work by introducing a new event state
> (REVOKED) and ensuring that any event in such a state will never get to use
> its PMU methods ever again.
So it looks like this series first got lost in the usual end-of-year
fog of holidays, then it has become somewhat bitrotten due to other
perf changes interacting and creating conflicts. I cannot find these
patches in queue.git anymore, other than the somewhat stale 4+ months
old perf/pmu-unregister branch from October 2024.
Which I found a bit sad, because these cleanups to the control flow and
error handling of these key perf primitives were a substantial reduction
of our years-long technical debt in this area.
So to move things forward I dusted off most of these patches, reviewed
the logic, resolved the conflicts, folded in the fix to pmu_dev_alloc()
that Ravi found (and upgraded his 'looks OK' reply into Acked-by tags),
added/extended changelogs, did some testing due diligence and sorted
them into their appropriate -next branches:
#
# tip:[locking/core]
#
# After 10 years of this lockdep debug check hidden behind
# CONFIG_DEBUG_ATOMIC_SLEEP=y I definitely wasn't brave enough to stick
# this into an urgent branch. Sue me.
#
a1b65f3f7c6f ("lockdep/mm: Fix might_fault() lockdep check of current->mm->mmap_lock")
#
# tip:[perf/urgent]
#
# These look like obvious fixes that can be accelerated to -rc6
#
003659fec9f6 ("perf/core: Fix perf_pmu_register() vs. perf_init_event()")
2565e42539b1 ("perf/core: Fix pmus_lock vs. pmus_srcu ordering")
#
# tip:[perf/core]
#
# These are most of the remaining patches from this series, except for 15/19
# which I was unsure about and 19/19 which is still under discussion:
#
02be310c2d24 ("perf/core: Simplify the perf_event_alloc() error path")
e6b17cfd528d ("perf/core: Simplify the perf_pmu_register() error path")
742d5df92842 ("perf/core: Simplify perf_pmu_register()")
9954ea69de5c ("perf/core: Simplify perf_init_event()")
ebfe83832e39 ("perf/core: Simplify perf_event_alloc()")
46cc0835d258 ("perf/core: Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count")
a57411b14ea0 ("perf/core: Add this_cpc() helper")
8e140c656746 ("perf/core: Introduce perf_free_addr_filters()")
26700b1359a1 ("perf/bpf: Robustify perf_event_free_bpf_prog()")
7503c90c0df8 ("perf/core: Simplify the perf_mmap() control flow")
8c7446add31e ("perf/core: Further simplify perf_mmap()")
6cbfc06a8590 ("perf/core: Remove retry loop from perf_mmap()")
244b28f87ba4 ("perf/core: Lift event->mmap_mutex in perf_mmap()")
As to 'testing due diligence', that's overselling it really: it was
mostly just some quick build/boot and functionality testing combined
with perf test runs, i.e. very light testing. Caveat emptor, but of
course the end result is perfect if we disregard any new bugs.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 85+ messages in thread* Re: [PATCH 00/19] perf: Make perf_pmu_unregister() usable
2025-03-01 20:00 ` Ingo Molnar
@ 2025-03-03 3:25 ` Ravi Bangoria
2025-03-03 9:16 ` Peter Zijlstra
0 siblings, 1 reply; 85+ messages in thread
From: Ravi Bangoria @ 2025-03-03 3:25 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, lucas.demarchi, linux-kernel, willy, acme,
namhyung, mark.rutland, alexander.shishkin, jolsa, irogers,
adrian.hunter, kan.liang, Ravi Bangoria
Hi Ingo,
>> Lucas convinced me that perf_pmu_unregister() is a trainwreck; after
>> considering a few options I was like, how hard could it be..
>>
>> So find here a few patches that clean things up in preparation and then a final
>> patch that makes unregistering a PMU work by introducing a new event state
>> (REVOKED) and ensuring that any event in such a state will never get to use
>> its PMU methods ever again.
>
> So it looks like this series first got lost in the usual end-of-year
> fog of holidays, then it has become somewhat bitrotten due to other
> perf changes interacting and creating conflicts. I cannot find these
> patches in queue.git anymore, other than the somewhat stale 4+ months
> old perf/pmu-unregister branch from October 2024.
Peter posted V2:
https://lore.kernel.org/r/20250205102120.531585416@infradead.org
The same was pushed to Peter's queue repo (2025-02-03):
https://web.git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/pmu-unregister
Thanks,
Ravi
^ permalink raw reply [flat|nested] 85+ messages in thread
* Re: [PATCH 00/19] perf: Make perf_pmu_unregister() usable
2025-03-03 3:25 ` Ravi Bangoria
@ 2025-03-03 9:16 ` Peter Zijlstra
0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2025-03-03 9:16 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, lucas.demarchi, linux-kernel, willy, acme, namhyung,
mark.rutland, alexander.shishkin, jolsa, irogers, adrian.hunter,
kan.liang
On Mon, Mar 03, 2025 at 08:55:06AM +0530, Ravi Bangoria wrote:
> Hi Ingo,
>
> >> Lucas convinced me that perf_pmu_unregister() is a trainwreck; after
> >> considering a few options I was like, how hard could it be..
> >>
> >> So find here a few patches that clean things up in preparation and then a final
> >> patch that makes unregistering a PMU work by introducing a new event state
> >> (REVOKED) and ensuring that any event in such a state will never get to use
> >> its PMU methods ever again.
> >
> > So it looks like this series first got lost in the usual end-of-year
> > fog of holidays, then it has become somewhat bitrotten due to other
> > perf changes interacting and creating conflicts. I cannot find these
> > patches in queue.git anymore, other than the somewhat stale 4+ months
> > old perf/pmu-unregister branch from October 2024.
>
> Peter posted V2:
> https://lore.kernel.org/r/20250205102120.531585416@infradead.org
>
> Same was pushed in Peter's queue repo (2025-02-03):
> https://web.git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/pmu-unregister
Right, I need to get back to this... I have a bunch of changes from v2
already accumulated, but haven't gotten around to looking at Ravi's
latest feedback :/
^ permalink raw reply [flat|nested] 85+ messages in thread