* Re: [PATCH v2] perf: Rewrite core context handling [not found] <20221008062424.313-1-ravi.bangoria@amd.com> @ 2022-10-11 11:29 ` Peter Zijlstra 2022-10-11 13:19 ` Ravi Bangoria 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2022-10-11 11:29 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: > +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, > + struct perf_event_context *next_ctx) > +{ > + struct perf_event_pmu_context *prev_epc, *next_epc; > + > + if (!prev_ctx->nr_task_data) > + return; > + > + prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, > + struct perf_event_pmu_context, > + pmu_ctx_entry); > + next_epc = list_first_entry(&next_ctx->pmu_ctx_list, > + struct perf_event_pmu_context, > + pmu_ctx_entry); > + > + while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && > + &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { > + > + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); > + > + /* > + * PMU specific parts of task perf context can require > + * additional synchronization. As an example of such > + * synchronization see implementation details of Intel > + * LBR call stack data profiling; > + */ > + if (prev_epc->pmu->swap_task_ctx) > + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); > + else > + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); Did I forget to advance the iterators here? > + } > +} ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-11 11:29 ` [PATCH v2] perf: Rewrite core context handling Peter Zijlstra @ 2022-10-11 13:19 ` Ravi Bangoria 2022-10-11 14:02 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Ravi Bangoria @ 2022-10-11 13:19 UTC (permalink / raw) To: Peter Zijlstra Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria On 11-Oct-22 4:59 PM, Peter Zijlstra wrote: > On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: > >> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, >> + struct perf_event_context *next_ctx) >> +{ >> + struct perf_event_pmu_context *prev_epc, *next_epc; >> + >> + if (!prev_ctx->nr_task_data) >> + return; >> + >> + prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, >> + struct perf_event_pmu_context, >> + pmu_ctx_entry); >> + next_epc = list_first_entry(&next_ctx->pmu_ctx_list, >> + struct perf_event_pmu_context, >> + pmu_ctx_entry); >> + >> + while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && >> + &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { >> + >> + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); >> + >> + /* >> + * PMU specific parts of task perf context can require >> + * additional synchronization. As an example of such >> + * synchronization see implementation details of Intel >> + * LBR call stack data profiling; >> + */ >> + if (prev_epc->pmu->swap_task_ctx) >> + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); >> + else >> + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); > > Did I forget to advance the iterators here? Yeah. Seems so. I overlooked it too. Thanks, Ravi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-11 13:19 ` Ravi Bangoria @ 2022-10-11 14:02 ` Peter Zijlstra 2022-10-11 17:47 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2022-10-11 14:02 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote: > On 11-Oct-22 4:59 PM, Peter Zijlstra wrote: > > On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: > > > >> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, > >> + struct perf_event_context *next_ctx) > >> +{ > >> + struct perf_event_pmu_context *prev_epc, *next_epc; > >> + > >> + if (!prev_ctx->nr_task_data) > >> + return; > >> + > >> + prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, > >> + struct perf_event_pmu_context, > >> + pmu_ctx_entry); > >> + next_epc = list_first_entry(&next_ctx->pmu_ctx_list, > >> + struct perf_event_pmu_context, > >> + pmu_ctx_entry); > >> + > >> + while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && > >> + &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { > >> + > >> + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); > >> + > >> + /* > >> + * PMU specific parts of task perf context can require > >> + * additional synchronization. As an example of such > >> + * synchronization see implementation details of Intel > >> + * LBR call stack data profiling; > >> + */ > >> + if (prev_epc->pmu->swap_task_ctx) > >> + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); > >> + else > >> + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); > > > > Did I forget to advance the iterators here? > > Yeah. Seems so. I overlooked it too. OK; so I'm not slowly going crazy staring at this code ;-) Let me go add it now then. :-) But first I gotta taxi the kids around for a bit, bbl. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-11 14:02 ` Peter Zijlstra @ 2022-10-11 17:47 ` Peter Zijlstra 2022-10-12 8:39 ` Ravi Bangoria 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2022-10-11 17:47 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Tue, Oct 11, 2022 at 04:02:56PM +0200, Peter Zijlstra wrote: > On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote: > > On 11-Oct-22 4:59 PM, Peter Zijlstra wrote: > > > On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: > > > > > >> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, > > >> + struct perf_event_context *next_ctx) > > >> +{ > > >> + struct perf_event_pmu_context *prev_epc, *next_epc; > > >> + > > >> + if (!prev_ctx->nr_task_data) > > >> + return; > > >> + > > >> + prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, > > >> + struct perf_event_pmu_context, > > >> + pmu_ctx_entry); > > >> + next_epc = list_first_entry(&next_ctx->pmu_ctx_list, > > >> + struct perf_event_pmu_context, > > >> + pmu_ctx_entry); > > >> + > > >> + while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && > > >> + &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { > > >> + > > >> + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); > > >> + > > >> + /* > > >> + * PMU specific parts of task perf context can require > > >> + * additional synchronization. As an example of such > > >> + * synchronization see implementation details of Intel > > >> + * LBR call stack data profiling; > > >> + */ > > >> + if (prev_epc->pmu->swap_task_ctx) > > >> + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); > > >> + else > > >> + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); > > > > > > Did I forget to advance the iterators here? > > > > Yeah. Seems so. I overlooked it too. > > OK; so I'm not slowly going crazy staring at this code ;-) Let me go add > it now then. :-) > > But first I gotta taxi the kids around for a bit, bbl. OK, so I've been going over the perf_event_pmu_context life-time thing as well, there were a bunch of XXXs there and I'm not sure Im happy with things, but I'd also forgotten most of it. Ideally epc works like it's a regular member of ctx -- locking wise that is, but I'm not sure we can make that stick -- see the ctx->mutex issues we have with put_ctx(). As such, I'm going to have to re-audit all the epc usage to see if pure ctx->lock is sufficient. I did do make epc RCU freed, because pretty much everything is and that was easy enough to make happen -- it means we don't need to worry about that. But I'm going cross-eyes from staring at this all day, so more tomorrow. The below is what I currently have. --- --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -833,13 +833,13 @@ struct perf_event { * `--------[1:n]---------' `-[n:1]-> pmu <-[1:n]-' * * - * XXX destroy epc when empty - * refcount, !rcu + * epc lifetime is refcount based and RCU freed (similar to perf_event_context). + * epc locking is as if it were a member of perf_event_context; specifically: * - * XXX epc locking + * modification, both: ctx->mutex && ctx->lock + * reading, either: ctx->mutex || ctx->lock * - * event->pmu_ctx ctx->mutex && inactive - * ctx->pmu_ctx_list ctx->mutex && ctx->lock + * XXX except this isn't true ... see put_pmu_ctx(). * */ struct perf_event_pmu_context { @@ -857,6 +857,7 @@ struct perf_event_pmu_context { unsigned int nr_events; atomic_t refcount; /* event <-> epc */ + struct rcu_head rcu_head; void *task_ctx_data; /* pmu specific data */ /* --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event return NULL; } +#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) \ + for (event = perf_event_groups_first(groups, cpu, pmu, NULL); \ + event; event = perf_event_groups_next(event, pmu)) + /* * Iterate through the whole groups tree. */ @@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct } } +#define list_for_each_entry_double(pos1, pos2, head1, head2, member) \ + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \ + pos2 = list_first_entry(head2, typeof(*pos2), member); \ + !list_entry_is_head(pos1, head1, member) && \ + !list_entry_is_head(pos2, head2, member); \ + pos1 = list_next_entry(pos1, member), \ + pos2 = list_next_entry(pos2, member)) + static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, struct perf_event_context *next_ctx) { @@ -3374,16 +3386,9 @@ static void perf_event_swap_task_ctx_dat if (!prev_ctx->nr_task_data) return; - prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, - struct perf_event_pmu_context, - pmu_ctx_entry); - next_epc = list_first_entry(&next_ctx->pmu_ctx_list, - struct perf_event_pmu_context, - pmu_ctx_entry); - - while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && - &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { - + list_for_each_entry_double(prev_epc, next_epc, + &prev_ctx->pmu_ctx_list, &next_ctx->pmu_ctx_list, + pmu_ctx_entry) { WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); /* @@ -3706,7 +3711,6 @@ static noinline int visit_groups_merge(s perf_assert_pmu_disabled((*evt)->pmu_ctx->pmu); } - min_heapify_all(&event_heap, &perf_min_heap); while (event_heap.nr) { @@ -3845,7 +3849,6 @@ ctx_sched_in(struct perf_event_context * /* start ctx time */ __update_context_time(ctx, false); perf_cgroup_set_timestamp(cpuctx); - // XXX ctx->task =? task /* * CPU-release for the below ->is_active store, * see __load_acquire() in perf_event_time_now() @@ -4815,6 +4818,15 @@ find_get_pmu_context(struct pmu *pmu, st __perf_init_event_pmu_context(new, pmu); + /* + * XXX + * + * lockdep_assert_held(&ctx->mutex); + * + * can't because perf_event_init_task() doesn't actually hold the + * child_ctx->mutex. + */ + raw_spin_lock_irq(&ctx->lock); list_for_each_entry(epc, &ctx->pmu_ctx_list, pmu_ctx_entry) { if (epc->pmu == pmu) { @@ -4849,6 +4861,14 @@ static void get_pmu_ctx(struct perf_even WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount)); } +static void free_epc_rcu(struct rcu_head *head) +{ + struct perf_event_pmu_context *epc = container_of(head, typeof(*epc), rcu_head); + + kfree(epc->task_ctx_data); + kfree(epc); +} + static void put_pmu_ctx(struct perf_event_pmu_context *epc) { unsigned long flags; @@ -4859,7 +4879,14 @@ static void put_pmu_ctx(struct perf_even if (epc->ctx) { struct perf_event_context *ctx = epc->ctx; - // XXX ctx->mutex + /* + * XXX + * + * lockdep_assert_held(&ctx->mutex); + * + * can't because of the call-site in _free_event()/put_event() + * which isn't always called under ctx->mutex. + */ WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry)); raw_spin_lock_irqsave(&ctx->lock, flags); @@ -4874,17 +4901,15 @@ static void put_pmu_ctx(struct perf_even if (epc->embedded) return; - kfree(epc->task_ctx_data); - kfree(epc); + call_rcu(&epc->rcu_head, free_epc_rcu); } static void perf_event_free_filter(struct perf_event *event); static void free_event_rcu(struct rcu_head *head) { - struct perf_event *event; + struct perf_event *event = container_of(head, typeof(*event), rcu_head); - event = container_of(head, struct perf_event, rcu_head); if (event->ns) put_pid_ns(event->ns); perf_event_free_filter(event); @@ -12643,13 +12668,6 @@ perf_event_create_kernel_counter(struct goto err_alloc; } - pmu_ctx = find_get_pmu_context(pmu, ctx, event); - if (IS_ERR(pmu_ctx)) { - err = PTR_ERR(pmu_ctx); - goto err_ctx; - } - event->pmu_ctx = pmu_ctx; - WARN_ON_ONCE(ctx->parent_ctx); mutex_lock(&ctx->mutex); if (ctx->task == TASK_TOMBSTONE) { @@ -12657,6 +12675,13 @@ perf_event_create_kernel_counter(struct goto err_unlock; } + pmu_ctx = find_get_pmu_context(pmu, ctx, event); + if (IS_ERR(pmu_ctx)) { + err = PTR_ERR(pmu_ctx); + goto err_unlock; + } + event->pmu_ctx = pmu_ctx; + if (!task) { /* * Check if the @cpu we're creating an event for is online. @@ -12668,13 +12693,13 @@ perf_event_create_kernel_counter(struct container_of(ctx, struct perf_cpu_context, ctx); if (!cpuctx->online) { err = -ENODEV; - goto err_unlock; + goto err_pmu_ctx; } } if (!exclusive_event_installable(event, ctx)) { err = -EBUSY; - goto err_unlock; + goto err_pmu_ctx; } perf_install_in_context(ctx, event, event->cpu); @@ -12683,9 +12708,10 @@ perf_event_create_kernel_counter(struct return event; +err_pmu_ctx: + put_pmu_ctx(pmu_ctx); err_unlock: mutex_unlock(&ctx->mutex); -err_ctx: perf_unpin_context(ctx); put_ctx(ctx); err_alloc: @@ -12702,9 +12728,7 @@ static void __perf_pmu_remove(struct per { struct perf_event *event, *sibling; - for (event = perf_event_groups_first(groups, cpu, pmu, NULL); - event; event = perf_event_groups_next(event, pmu)) { - + perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) { perf_remove_from_context(event, 0); unaccount_event_cpu(event, cpu); put_pmu_ctx(event->pmu_ctx); @@ -12998,7 +13022,7 @@ void perf_event_free_task(struct task_st struct perf_event_context *ctx; struct perf_event *event, *tmp; - ctx = rcu_dereference(task->perf_event_ctxp); + ctx = rcu_access_pointer(task->perf_event_ctxp); if (!ctx) return; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-11 17:47 ` Peter Zijlstra @ 2022-10-12 8:39 ` Ravi Bangoria 2022-10-12 12:16 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Ravi Bangoria @ 2022-10-12 8:39 UTC (permalink / raw) To: Peter Zijlstra Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria On 11-Oct-22 11:17 PM, Peter Zijlstra wrote: > On Tue, Oct 11, 2022 at 04:02:56PM +0200, Peter Zijlstra wrote: >> On Tue, Oct 11, 2022 at 06:49:55PM +0530, Ravi Bangoria wrote: >>> On 11-Oct-22 4:59 PM, Peter Zijlstra wrote: >>>> On Sat, Oct 08, 2022 at 11:54:24AM +0530, Ravi Bangoria wrote: >>>> >>>>> +static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, >>>>> + struct perf_event_context *next_ctx) >>>>> +{ >>>>> + struct perf_event_pmu_context *prev_epc, *next_epc; >>>>> + >>>>> + if (!prev_ctx->nr_task_data) >>>>> + return; >>>>> + >>>>> + prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, >>>>> + struct perf_event_pmu_context, >>>>> + pmu_ctx_entry); >>>>> + next_epc = list_first_entry(&next_ctx->pmu_ctx_list, >>>>> + struct perf_event_pmu_context, >>>>> + pmu_ctx_entry); >>>>> + >>>>> + while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && >>>>> + &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { >>>>> + >>>>> + WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); >>>>> + >>>>> + /* >>>>> + * PMU specific parts of task perf context can require >>>>> + * additional synchronization. As an example of such >>>>> + * synchronization see implementation details of Intel >>>>> + * LBR call stack data profiling; >>>>> + */ >>>>> + if (prev_epc->pmu->swap_task_ctx) >>>>> + prev_epc->pmu->swap_task_ctx(prev_epc, next_epc); >>>>> + else >>>>> + swap(prev_epc->task_ctx_data, next_epc->task_ctx_data); >>>> >>>> Did I forget to advance the iterators here? >>> >>> Yeah. Seems so. I overlooked it too. >> >> OK; so I'm not slowly going crazy staring at this code ;-) Let me go add >> it now then. :-) >> >> But first I gotta taxi the kids around for a bit, bbl. > > OK, so I've been going over the perf_event_pmu_context life-time thing > as well, there were a bunch of XXXs there and I'm not sure Im happy with > things, but I'd also forgotten most of it. > > Ideally epc works like it's a regular member of ctx -- locking wise that > is, but I'm not sure we can make that stick -- see the ctx->mutex issues > we have with put_ctx(). > > As such, I'm going to have to re-audit all the epc usage to see if > pure ctx->lock is sufficient. > > I did do make epc RCU freed, because pretty much everything is and that > was easy enough to make happen -- it means we don't need to worry about > that. > > But I'm going cross-eyes from staring at this all day, so more tomorrow. > The below is what I currently have. > > --- > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -833,13 +833,13 @@ struct perf_event { > * `--------[1:n]---------' `-[n:1]-> pmu <-[1:n]-' > * > * > - * XXX destroy epc when empty > - * refcount, !rcu > + * epc lifetime is refcount based and RCU freed (similar to perf_event_context). > + * epc locking is as if it were a member of perf_event_context; specifically: > * > - * XXX epc locking > + * modification, both: ctx->mutex && ctx->lock > + * reading, either: ctx->mutex || ctx->lock > * > - * event->pmu_ctx ctx->mutex && inactive > - * ctx->pmu_ctx_list ctx->mutex && ctx->lock > + * XXX except this isn't true ... see put_pmu_ctx(). > * > */ > struct perf_event_pmu_context { > @@ -857,6 +857,7 @@ struct perf_event_pmu_context { > unsigned int nr_events; > > atomic_t refcount; /* event <-> epc */ > + struct rcu_head rcu_head; > > void *task_ctx_data; /* pmu specific data */ > /* > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event > return NULL; > } > > +#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) \ > + for (event = perf_event_groups_first(groups, cpu, pmu, NULL); \ > + event; event = perf_event_groups_next(event, pmu)) > + > /* > * Iterate through the whole groups tree. > */ > @@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct > } > } > > +#define list_for_each_entry_double(pos1, pos2, head1, head2, member) \ > + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \ > + pos2 = list_first_entry(head2, typeof(*pos2), member); \ > + !list_entry_is_head(pos1, head1, member) && \ > + !list_entry_is_head(pos2, head2, member); \ > + pos1 = list_next_entry(pos1, member), \ > + pos2 = list_next_entry(pos2, member)) > + > static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, > struct perf_event_context *next_ctx) > { > @@ -3374,16 +3386,9 @@ static void perf_event_swap_task_ctx_dat > if (!prev_ctx->nr_task_data) > return; > > - prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, > - struct perf_event_pmu_context, > - pmu_ctx_entry); > - next_epc = list_first_entry(&next_ctx->pmu_ctx_list, > - struct perf_event_pmu_context, > - pmu_ctx_entry); > - > - while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && > - &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { > - > + list_for_each_entry_double(prev_epc, next_epc, > + &prev_ctx->pmu_ctx_list, &next_ctx->pmu_ctx_list, > + pmu_ctx_entry) { There are more places which can use list_for_each_entry_double(). I'll fix those. > WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); > > /* > @@ -3706,7 +3711,6 @@ static noinline int visit_groups_merge(s > perf_assert_pmu_disabled((*evt)->pmu_ctx->pmu); > } > > - > min_heapify_all(&event_heap, &perf_min_heap); > > while (event_heap.nr) { > @@ -3845,7 +3849,6 @@ ctx_sched_in(struct perf_event_context * > /* start ctx time */ > __update_context_time(ctx, false); > perf_cgroup_set_timestamp(cpuctx); > - // XXX ctx->task =? task > /* > * CPU-release for the below ->is_active store, > * see __load_acquire() in perf_event_time_now() > @@ -4815,6 +4818,15 @@ find_get_pmu_context(struct pmu *pmu, st > > __perf_init_event_pmu_context(new, pmu); > > + /* > + * XXX > + * > + * lockdep_assert_held(&ctx->mutex); > + * > + * can't because perf_event_init_task() doesn't actually hold the > + * child_ctx->mutex. > + */ > + > raw_spin_lock_irq(&ctx->lock); > list_for_each_entry(epc, &ctx->pmu_ctx_list, pmu_ctx_entry) { > if (epc->pmu == pmu) { > @@ -4849,6 +4861,14 @@ static void get_pmu_ctx(struct perf_even > WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount)); > } > > +static void free_epc_rcu(struct rcu_head *head) > +{ > + struct perf_event_pmu_context *epc = container_of(head, typeof(*epc), rcu_head); > + > + kfree(epc->task_ctx_data); > + kfree(epc); > +} > + > static void put_pmu_ctx(struct perf_event_pmu_context *epc) > { > unsigned long flags; > @@ -4859,7 +4879,14 @@ static void put_pmu_ctx(struct perf_even > if (epc->ctx) { > struct perf_event_context *ctx = epc->ctx; > > - // XXX ctx->mutex > + /* > + * XXX > + * > + * lockdep_assert_held(&ctx->mutex); > + * > + * can't because of the call-site in _free_event()/put_event() > + * which isn't always called under ctx->mutex. > + */ Yes. I came across the same and could not figure out how to solve this. So Just kept XXX as is. > > WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry)); > raw_spin_lock_irqsave(&ctx->lock, flags); > @@ -4874,17 +4901,15 @@ static void put_pmu_ctx(struct perf_even > if (epc->embedded) > return; > > - kfree(epc->task_ctx_data); > - kfree(epc); > + call_rcu(&epc->rcu_head, free_epc_rcu); > } > > static void perf_event_free_filter(struct perf_event *event); > > static void free_event_rcu(struct rcu_head *head) > { > - struct perf_event *event; > + struct perf_event *event = container_of(head, typeof(*event), rcu_head); > > - event = container_of(head, struct perf_event, rcu_head); > if (event->ns) > put_pid_ns(event->ns); > perf_event_free_filter(event); > @@ -12643,13 +12668,6 @@ perf_event_create_kernel_counter(struct > goto err_alloc; > } > > - pmu_ctx = find_get_pmu_context(pmu, ctx, event); > - if (IS_ERR(pmu_ctx)) { > - err = PTR_ERR(pmu_ctx); > - goto err_ctx; > - } > - event->pmu_ctx = pmu_ctx; > - > WARN_ON_ONCE(ctx->parent_ctx); > mutex_lock(&ctx->mutex); > if (ctx->task == TASK_TOMBSTONE) { > @@ -12657,6 +12675,13 @@ perf_event_create_kernel_counter(struct > goto err_unlock; > } > > + pmu_ctx = find_get_pmu_context(pmu, ctx, event); > + if (IS_ERR(pmu_ctx)) { > + err = PTR_ERR(pmu_ctx); > + goto err_unlock; > + } > + event->pmu_ctx = pmu_ctx; We should call find_get_pmu_context() with ctx->mutex held and thus above perf_event_create_kernel_counter() change. Is my understanding correct? > + > if (!task) { > /* > * Check if the @cpu we're creating an event for is online. > @@ -12668,13 +12693,13 @@ perf_event_create_kernel_counter(struct > container_of(ctx, struct perf_cpu_context, ctx); > if (!cpuctx->online) { > err = -ENODEV; > - goto err_unlock; > + goto err_pmu_ctx; > } > } > > if (!exclusive_event_installable(event, ctx)) { > err = -EBUSY; > - goto err_unlock; > + goto err_pmu_ctx; > } > > perf_install_in_context(ctx, event, event->cpu); > @@ -12683,9 +12708,10 @@ perf_event_create_kernel_counter(struct > > return event; > > +err_pmu_ctx: > + put_pmu_ctx(pmu_ctx); > err_unlock: > mutex_unlock(&ctx->mutex); > -err_ctx: > perf_unpin_context(ctx); > put_ctx(ctx); > err_alloc: > @@ -12702,9 +12728,7 @@ static void __perf_pmu_remove(struct per > { > struct perf_event *event, *sibling; > > - for (event = perf_event_groups_first(groups, cpu, pmu, NULL); > - event; event = perf_event_groups_next(event, pmu)) { > - > + perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) { > perf_remove_from_context(event, 0); > unaccount_event_cpu(event, cpu); > put_pmu_ctx(event->pmu_ctx); > @@ -12998,7 +13022,7 @@ void perf_event_free_task(struct task_st > struct perf_event_context *ctx; > struct perf_event *event, *tmp; > > - ctx = rcu_dereference(task->perf_event_ctxp); > + ctx = rcu_access_pointer(task->perf_event_ctxp); We dereference ctx pointer but with mutex and lock held. And thus rcu_access_pointer() is sufficient. Is my understanding correct? Thanks, Ravi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-12 8:39 ` Ravi Bangoria @ 2022-10-12 12:16 ` Peter Zijlstra 2022-10-12 20:47 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2022-10-12 12:16 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Wed, Oct 12, 2022 at 02:09:00PM +0530, Ravi Bangoria wrote: > > @@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct > > } > > } > > > > +#define list_for_each_entry_double(pos1, pos2, head1, head2, member) \ > > + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \ > > + pos2 = list_first_entry(head2, typeof(*pos2), member); \ > > + !list_entry_is_head(pos1, head1, member) && \ > > + !list_entry_is_head(pos2, head2, member); \ > > + pos1 = list_next_entry(pos1, member), \ > > + pos2 = list_next_entry(pos2, member)) > > + > > static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, > > struct perf_event_context *next_ctx) > > { > > @@ -3374,16 +3386,9 @@ static void perf_event_swap_task_ctx_dat > > if (!prev_ctx->nr_task_data) > > return; > > > > - prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, > > - struct perf_event_pmu_context, > > - pmu_ctx_entry); > > - next_epc = list_first_entry(&next_ctx->pmu_ctx_list, > > - struct perf_event_pmu_context, > > - pmu_ctx_entry); > > - > > - while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && > > - &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { > > - > > + list_for_each_entry_double(prev_epc, next_epc, > > + &prev_ctx->pmu_ctx_list, &next_ctx->pmu_ctx_list, > > + pmu_ctx_entry) { > > There are more places which can use list_for_each_entry_double(). > I'll fix those. I've gone and renamed it: double_list_for_each_entry(), but yeah, didn't look too hard for other users. > > @@ -4859,7 +4879,14 @@ static void put_pmu_ctx(struct perf_even > > if (epc->ctx) { > > struct perf_event_context *ctx = epc->ctx; > > > > - // XXX ctx->mutex > > + /* > > + * XXX > > + * > > + * lockdep_assert_held(&ctx->mutex); > > + * > > + * can't because of the call-site in _free_event()/put_event() > > + * which isn't always called under ctx->mutex. > > + */ > > Yes. I came across the same and could not figure out how to solve > this. So Just kept XXX as is. Yeah, I can sorta fix it, but it's ugly so there we are. > > > > WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry)); > > raw_spin_lock_irqsave(&ctx->lock, flags); > > @@ -12657,6 +12675,13 @@ perf_event_create_kernel_counter(struct > > goto err_unlock; > > } > > > > + pmu_ctx = find_get_pmu_context(pmu, ctx, event); > > + if (IS_ERR(pmu_ctx)) { > > + err = PTR_ERR(pmu_ctx); > > + goto err_unlock; > > + } > > + event->pmu_ctx = pmu_ctx; > > We should call find_get_pmu_context() with ctx->mutex held and thus > above perf_event_create_kernel_counter() change. Is my understanding > correct? That's the intent yeah. But due to not always holding ctx->mutex over put_pmu_ctx() this might be moot. I'm almost through auditing epc usage and I think ctx->lock is sufficient, fingers crossed. > > + > > if (!task) { > > /* > > * Check if the @cpu we're creating an event for is online. > > @@ -12998,7 +13022,7 @@ void perf_event_free_task(struct task_st > > struct perf_event_context *ctx; > > struct perf_event *event, *tmp; > > > > - ctx = rcu_dereference(task->perf_event_ctxp); > > + ctx = rcu_access_pointer(task->perf_event_ctxp); > > We dereference ctx pointer but with mutex and lock held. And thus > rcu_access_pointer() is sufficient. Is my understanding correct? We do not in fact hold ctx->lock here IIRC; but this is a NULL test, if it is !NULL we know we have a reference on it and are good. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-12 12:16 ` Peter Zijlstra @ 2022-10-12 20:47 ` Peter Zijlstra 2022-10-13 10:07 ` Ravi Bangoria 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2022-10-12 20:47 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Wed, Oct 12, 2022 at 02:16:29PM +0200, Peter Zijlstra wrote: > That's the intent yeah. But due to not always holding ctx->mutex over > put_pmu_ctx() this might be moot. I'm almost through auditing epc usage > and I think ctx->lock is sufficient, fingers crossed. So the very last epc usage threw a spanner into the works and made things complicated. Specifically sys_perf_event_open()'s group_leader case uses event->pmu_ctx while only holding ctx->mutex. Therefore we can't fully let go of ctx->mutex locking and purely rely on ctx->lock. Now the good news is that the annoying put_pmu_ctx() without holding ctx->mutex case doesn't actually matter here. Since we hold a reference on the group_leader (per the filedesc) the event can't go away, therefore it must have a pmu_ctx, and then holding ctx->mutex ensures the pmu_ctx is stable -- iow it serializes against sys_perf_event_open()'s move_group and perf_pmu_migrate_context() changing the epc around. So we're going with the normal mutex+lock for modification rule, but allow the weird put_pmu_ctx() exception. I have the below delta. I'm hoping we can call this done -- I'm going to see if I can bribe Mark to take a look at the arm64 thing soon and then hopefully queue the whole thing once -rc1 happens. That should give us a good long soak until the next merge window. --- --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -826,21 +826,28 @@ struct perf_event { }; /* - * ,------------------------[1:n]---------------------. + * ,-----------------------[1:n]----------------------. * V V * perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event * ^ ^ | | * `--------[1:n]---------' `-[n:1]-> pmu <-[1:n]-' * * - * XXX destroy epc when empty - * refcount, !rcu - * - * XXX epc locking - * - * event->pmu_ctx ctx->mutex && inactive - * ctx->pmu_ctx_list ctx->mutex && ctx->lock - * + * struct perf_event_pmu_context lifetime is refcount based and RCU freed + * (similar to perf_event_context). Locking is as if it were a member of + * perf_event_context; specifically: + * + * modification, both: ctx->mutex && ctx->lock + * reading, either: ctx->mutex || ctx->lock + * + * There is one exception to this; namely put_pmu_ctx() isn't always called + * with ctx->mutex held; this means that as long as we can guarantee the epc + * has events the above rules hold. + * + * Specificially, sys_perf_event_open()'s group_leader case depends on + * ctx->mutex pinning the configuration. Since we hold a reference on + * group_leader (through the filedesc) it can't fo away, therefore it's + * associated pmu_ctx must exist and cannot change due to ctx->mutex. */ struct perf_event_pmu_context { struct pmu *pmu; @@ -857,6 +864,7 @@ struct perf_event_pmu_context { unsigned int nr_events; atomic_t refcount; /* event <-> epc */ + struct rcu_head rcu_head; void *task_ctx_data; /* pmu specific data */ /* @@ -906,7 +914,7 @@ struct perf_event_context { int nr_freq; int rotate_disable; - refcount_t refcount; + refcount_t refcount; /* event <-> ctx */ struct task_struct *task; /* --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1727,6 +1727,10 @@ perf_event_groups_next(struct perf_event return NULL; } +#define perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) \ + for (event = perf_event_groups_first(groups, cpu, pmu, NULL); \ + event; event = perf_event_groups_next(event, pmu)) + /* * Iterate through the whole groups tree. */ @@ -3366,6 +3370,14 @@ static void perf_event_sync_stat(struct } } +#define double_list_for_each_entry(pos1, pos2, head1, head2, member) \ + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \ + pos2 = list_first_entry(head2, typeof(*pos2), member); \ + !list_entry_is_head(pos1, head1, member) && \ + !list_entry_is_head(pos2, head2, member); \ + pos1 = list_next_entry(pos1, member), \ + pos2 = list_next_entry(pos2, member)) + static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, struct perf_event_context *next_ctx) { @@ -3374,17 +3386,12 @@ static void perf_event_swap_task_ctx_dat if (!prev_ctx->nr_task_data) return; - prev_epc = list_first_entry(&prev_ctx->pmu_ctx_list, - struct perf_event_pmu_context, - pmu_ctx_entry); - next_epc = list_first_entry(&next_ctx->pmu_ctx_list, - struct perf_event_pmu_context, - pmu_ctx_entry); - - while (&prev_epc->pmu_ctx_entry != &prev_ctx->pmu_ctx_list && - &next_epc->pmu_ctx_entry != &next_ctx->pmu_ctx_list) { + double_list_for_each_entry(prev_epc, next_epc, + &prev_ctx->pmu_ctx_list, &next_ctx->pmu_ctx_list, + pmu_ctx_entry) { - WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu); + if (WARN_ON_ONCE(prev_epc->pmu != next_epc->pmu)) + continue; /* * PMU specific parts of task perf context can require @@ -3706,7 +3713,6 @@ static noinline int visit_groups_merge(s perf_assert_pmu_disabled((*evt)->pmu_ctx->pmu); } - min_heapify_all(&event_heap, &perf_min_heap); while (event_heap.nr) { @@ -3845,7 +3851,6 @@ ctx_sched_in(struct perf_event_context * /* start ctx time */ __update_context_time(ctx, false); perf_cgroup_set_timestamp(cpuctx); - // XXX ctx->task =? task /* * CPU-release for the below ->is_active store, * see __load_acquire() in perf_event_time_now() @@ -4815,6 +4820,15 @@ find_get_pmu_context(struct pmu *pmu, st __perf_init_event_pmu_context(new, pmu); + /* + * XXX + * + * lockdep_assert_held(&ctx->mutex); + * + * can't because perf_event_init_task() doesn't actually hold the + * child_ctx->mutex. + */ + raw_spin_lock_irq(&ctx->lock); list_for_each_entry(epc, &ctx->pmu_ctx_list, pmu_ctx_entry) { if (epc->pmu == pmu) { @@ -4849,6 +4863,14 @@ static void get_pmu_ctx(struct perf_even WARN_ON_ONCE(!atomic_inc_not_zero(&epc->refcount)); } +static void free_epc_rcu(struct rcu_head *head) +{ + struct perf_event_pmu_context *epc = container_of(head, typeof(*epc), rcu_head); + + kfree(epc->task_ctx_data); + kfree(epc); +} + static void put_pmu_ctx(struct perf_event_pmu_context *epc) { unsigned long flags; @@ -4859,7 +4881,14 @@ static void put_pmu_ctx(struct perf_even if (epc->ctx) { struct perf_event_context *ctx = epc->ctx; - // XXX ctx->mutex + /* + * XXX + * + * lockdep_assert_held(&ctx->mutex); + * + * can't because of the call-site in _free_event()/put_event() + * which isn't always called under ctx->mutex. + */ WARN_ON_ONCE(list_empty(&epc->pmu_ctx_entry)); raw_spin_lock_irqsave(&ctx->lock, flags); @@ -4874,17 +4903,15 @@ static void put_pmu_ctx(struct perf_even if (epc->embedded) return; - kfree(epc->task_ctx_data); - kfree(epc); + call_rcu(&epc->rcu_head, free_epc_rcu); } static void perf_event_free_filter(struct perf_event *event); static void free_event_rcu(struct rcu_head *head) { - struct perf_event *event; + struct perf_event *event = container_of(head, typeof(*event), rcu_head); - event = container_of(head, struct perf_event, rcu_head); if (event->ns) put_pid_ns(event->ns); perf_event_free_filter(event); @@ -12436,6 +12463,9 @@ SYSCALL_DEFINE5(perf_event_open, * Allow the addition of software events to hw * groups, this is safe because software events * never fail to schedule. + * + * Note the comment that goes with struct + * pmu_event_pmu_context. */ pmu = group_leader->pmu_ctx->pmu; } else if (!is_software_event(event) && @@ -12643,13 +12673,6 @@ perf_event_create_kernel_counter(struct goto err_alloc; } - pmu_ctx = find_get_pmu_context(pmu, ctx, event); - if (IS_ERR(pmu_ctx)) { - err = PTR_ERR(pmu_ctx); - goto err_ctx; - } - event->pmu_ctx = pmu_ctx; - WARN_ON_ONCE(ctx->parent_ctx); mutex_lock(&ctx->mutex); if (ctx->task == TASK_TOMBSTONE) { @@ -12657,6 +12680,13 @@ perf_event_create_kernel_counter(struct goto err_unlock; } + pmu_ctx = find_get_pmu_context(pmu, ctx, event); + if (IS_ERR(pmu_ctx)) { + err = PTR_ERR(pmu_ctx); + goto err_unlock; + } + event->pmu_ctx = pmu_ctx; + if (!task) { /* * Check if the @cpu we're creating an event for is online. @@ -12668,13 +12698,13 @@ perf_event_create_kernel_counter(struct container_of(ctx, struct perf_cpu_context, ctx); if (!cpuctx->online) { err = -ENODEV; - goto err_unlock; + goto err_pmu_ctx; } } if (!exclusive_event_installable(event, ctx)) { err = -EBUSY; - goto err_unlock; + goto err_pmu_ctx; } perf_install_in_context(ctx, event, event->cpu); @@ -12683,9 +12713,10 @@ perf_event_create_kernel_counter(struct return event; +err_pmu_ctx: + put_pmu_ctx(pmu_ctx); err_unlock: mutex_unlock(&ctx->mutex); -err_ctx: perf_unpin_context(ctx); put_ctx(ctx); err_alloc: @@ -12702,9 +12733,7 @@ static void __perf_pmu_remove(struct per { struct perf_event *event, *sibling; - for (event = perf_event_groups_first(groups, cpu, pmu, NULL); - event; event = perf_event_groups_next(event, pmu)) { - + perf_event_groups_for_cpu_pmu(event, groups, cpu, pmu) { perf_remove_from_context(event, 0); unaccount_event_cpu(event, cpu); put_pmu_ctx(event->pmu_ctx); @@ -12998,7 +13027,7 @@ void perf_event_free_task(struct task_st struct perf_event_context *ctx; struct perf_event *event, *tmp; - ctx = rcu_dereference(task->perf_event_ctxp); + ctx = rcu_access_pointer(task->perf_event_ctxp); if (!ctx) return; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-12 20:47 ` Peter Zijlstra @ 2022-10-13 10:07 ` Ravi Bangoria 2022-10-13 10:59 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Ravi Bangoria @ 2022-10-13 10:07 UTC (permalink / raw) To: Peter Zijlstra Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria On 13-Oct-22 2:17 AM, Peter Zijlstra wrote: > On Wed, Oct 12, 2022 at 02:16:29PM +0200, Peter Zijlstra wrote: > >> That's the intent yeah. But due to not always holding ctx->mutex over >> put_pmu_ctx() this might be moot. I'm almost through auditing epc usage >> and I think ctx->lock is sufficient, fingers crossed. > > So the very last epc usage threw a spanner into the works and made > things complicated. > > Specifically sys_perf_event_open()'s group_leader case uses > event->pmu_ctx while only holding ctx->mutex. Therefore we can't fully > let go of ctx->mutex locking and purely rely on ctx->lock. > > Now the good news is that the annoying put_pmu_ctx() without holding > ctx->mutex case doesn't actually matter here. Since we hold a reference > on the group_leader (per the filedesc) the event can't go away, > therefore it must have a pmu_ctx, and then holding ctx->mutex ensures > the pmu_ctx is stable -- iow it serializes against > sys_perf_event_open()'s move_group and perf_pmu_migrate_context() > changing the epc around. > > So we're going with the normal mutex+lock for modification rule, but > allow the weird put_pmu_ctx() exception. > > I have the below delta. > > I'm hoping we can call this done -- I'm going to see if I can bribe Mark > to take a look at the arm64 thing soon and then hopefully queue the > whole thing once -rc1 happens. That should give us a good long soak > until the next merge window. Sounds good. Thanks for all the help! I've glanced through the changes and they looks fine, below are few minor points. > + * Specificially, sys_perf_event_open()'s group_leader case depends on > + * ctx->mutex pinning the configuration. Since we hold a reference on > + * group_leader (through the filedesc) it can't fo away, therefore it's typo: can't go away > - refcount_t refcount; > + refcount_t refcount; /* event <-> ctx */ Ok. We need to remove all those // XXX get/put_ctx() from code which we added to make refcount a pmu_ctx <-> ctx. > +#define double_list_for_each_entry(pos1, pos2, head1, head2, member) \ > + for (pos1 = list_first_entry(head1, typeof(*pos1), member), \ > + pos2 = list_first_entry(head2, typeof(*pos2), member); \ > + !list_entry_is_head(pos1, head1, member) && \ > + !list_entry_is_head(pos2, head2, member); \ > + pos1 = list_next_entry(pos1, member), \ > + pos2 = list_next_entry(pos2, member)) > + > static void perf_event_swap_task_ctx_data(struct perf_event_context *prev_ctx, > struct perf_event_context *next_ctx) While this is unrelated to this patch, shouldn't we also need to swap event->hw.target? A purely hypothetical scenario: Consider two processes having clone contexts (for example, two children of the same parent). While process switch between these two, the perf event context would get swapped but event->hw.target will point to other sibling's task_struct. If any one process exit just after single context swap, _free_event() will call put_task_context() on sibling process' task_struct. > @@ -12436,6 +12463,9 @@ SYSCALL_DEFINE5(perf_event_open, > * Allow the addition of software events to hw > * groups, this is safe because software events > * never fail to schedule. > + * > + * Note the comment that goes with struct > + * pmu_event_pmu_context. typo: perf_event_pmu_context The good (or bad? ;)) news is, perf test and Vince's perf_event_tests are running fine without any regression on my machine. Thanks, Ravi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-13 10:07 ` Ravi Bangoria @ 2022-10-13 10:59 ` Peter Zijlstra 2022-10-14 9:56 ` Ravi Bangoria 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2022-10-13 10:59 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote: > > - refcount_t refcount; > > + refcount_t refcount; /* event <-> ctx */ > > Ok. We need to remove all those // XXX get/put_ctx() from code > which we added to make refcount a pmu_ctx <-> ctx. Them already gone :-) I've not yet fixed up the typoes, but current version should be here: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core Thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-13 10:59 ` Peter Zijlstra @ 2022-10-14 9:56 ` Ravi Bangoria 2022-10-14 14:50 ` Peter Zijlstra 2022-10-17 9:33 ` Ravi Bangoria 0 siblings, 2 replies; 12+ messages in thread From: Ravi Bangoria @ 2022-10-14 9:56 UTC (permalink / raw) To: Peter Zijlstra Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria On 13-Oct-22 4:29 PM, Peter Zijlstra wrote: > On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote: > >>> - refcount_t refcount; >>> + refcount_t refcount; /* event <-> ctx */ >> >> Ok. We need to remove all those // XXX get/put_ctx() from code >> which we added to make refcount a pmu_ctx <-> ctx. > > Them already gone :-) I've not yet fixed up the typoes, but current > version should be here: > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core > > Thanks! I've been running perf-fuzzer on Xeon machine since yesterday and I don't see any issue. Will do the same on my AMD machine as well over the weekend. Thanks, Ravi ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-14 9:56 ` Ravi Bangoria @ 2022-10-14 14:50 ` Peter Zijlstra 2022-10-17 9:33 ` Ravi Bangoria 1 sibling, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2022-10-14 14:50 UTC (permalink / raw) To: Ravi Bangoria Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla On Fri, Oct 14, 2022 at 03:26:07PM +0530, Ravi Bangoria wrote: > On 13-Oct-22 4:29 PM, Peter Zijlstra wrote: > > On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote: > > > >>> - refcount_t refcount; > >>> + refcount_t refcount; /* event <-> ctx */ > >> > >> Ok. We need to remove all those // XXX get/put_ctx() from code > >> which we added to make refcount a pmu_ctx <-> ctx. > > > > Them already gone :-) I've not yet fixed up the typoes, but current > > version should be here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core > > > > Thanks! > > I've been running perf-fuzzer on Xeon machine since yesterday and I don't see any > issue. Will do the same on my AMD machine as well over the weekend. Awesome -- I've started fuzzing on the ADL (with the big.LITTLE PMU setup) and I've had it run on my very aged IVB-EP machine. Both so far (knock on wood) with no issues. The most modern AMD machine I have at hand is a 2 socket Interlagos, and I doubt anybody really much cares about that these days -- but I can run it for giggles. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] perf: Rewrite core context handling 2022-10-14 9:56 ` Ravi Bangoria 2022-10-14 14:50 ` Peter Zijlstra @ 2022-10-17 9:33 ` Ravi Bangoria 1 sibling, 0 replies; 12+ messages in thread From: Ravi Bangoria @ 2022-10-17 9:33 UTC (permalink / raw) To: Peter Zijlstra Cc: acme, alexander.shishkin, jolsa, namhyung, songliubraving, eranian, ak, mark.rutland, frederic, maddy, irogers, will, robh, mingo, catalin.marinas, ndesaulniers, srw, linux-arm-kernel, linux-perf-users, linuxppc-dev, linux-s390, linux-kernel, sandipan.das, ananth.narayan, kim.phillips, santosh.shukla, Ravi Bangoria On 14-Oct-22 3:26 PM, Ravi Bangoria wrote: > On 13-Oct-22 4:29 PM, Peter Zijlstra wrote: >> On Thu, Oct 13, 2022 at 03:37:23PM +0530, Ravi Bangoria wrote: >> >>>> - refcount_t refcount; >>>> + refcount_t refcount; /* event <-> ctx */ >>> >>> Ok. We need to remove all those // XXX get/put_ctx() from code >>> which we added to make refcount a pmu_ctx <-> ctx. >> >> Them already gone :-) I've not yet fixed up the typoes, but current >> version should be here: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=perf/core >> >> Thanks! > > I've been running perf-fuzzer on Xeon machine since yesterday and I don't see any > issue. Will do the same on my AMD machine as well over the weekend. Only one WARN_ON() hit. Otherwise all good. https://lore.kernel.org/lkml/8d91528b-e830-6ad0-8a92-621ce9f944ca@amd.com Thanks, Ravi ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-10-17 9:34 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20221008062424.313-1-ravi.bangoria@amd.com>
2022-10-11 11:29 ` [PATCH v2] perf: Rewrite core context handling Peter Zijlstra
2022-10-11 13:19 ` Ravi Bangoria
2022-10-11 14:02 ` Peter Zijlstra
2022-10-11 17:47 ` Peter Zijlstra
2022-10-12 8:39 ` Ravi Bangoria
2022-10-12 12:16 ` Peter Zijlstra
2022-10-12 20:47 ` Peter Zijlstra
2022-10-13 10:07 ` Ravi Bangoria
2022-10-13 10:59 ` Peter Zijlstra
2022-10-14 9:56 ` Ravi Bangoria
2022-10-14 14:50 ` Peter Zijlstra
2022-10-17 9:33 ` Ravi Bangoria
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox