public inbox for linux-kernel@vger.kernel.org
* [PATCH] perf/core: Fix warning warning due to unordred pmu_ctx_list
@ 2025-01-20 11:43 Luo Gengkun
  2025-01-20 20:49 ` Liang, Kan
  0 siblings, 1 reply; 3+ messages in thread
From: Luo Gengkun @ 2025-01-20 11:43 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, ravi.bangoria,
	linux-perf-users, linux-kernel, luogengkun

Syzkaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). The vmcore shows that the two lists contain
the same perf_event_pmu_context entries, but not in the same order.

The problem is that when inheritance is performed, it traverses the ordered
groups of events, and inserts the new perf_event_pmu_context into
child_ctx->pmu_ctx_list which is unordered. So the order of pmu_ctx_list in
the parent and child may be different.

The following testcase can trigger the above warning:

 # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
 # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out

test.c

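#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
        int count = 0;
        pid_t pid;

        /* Print the pid so that perf stat -p can attach during the sleep. */
        printf("%d running\n", getpid());
        sleep(30);
        printf("running\n");

        /* fork() makes the child inherit the parent's perf events. */
        pid = fork();
        if (pid == -1) {
                printf("fork error\n");
                return 1;
        }
        if (pid == 0) {
                while (1) {
                        count++;
                }
        } else {
                while (1) {
                        count++;
                }
        }
}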

The testcase first opens an LBR event, so it allocates task_ctx_data, and
then opens tracepoint and software events, so the parent ctx has 3
different perf_event_pmu_contexts. During inheritance, the child ctx
inserts the perf_event_pmu_context entries in a different order, and the
warning triggers.

To fix this problem, add pmu_ctx_insertion_sort to make sure the
pmu_ctx_list is ordered.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
 kernel/events/core.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 95b01a51139d..1bdff3ef0ce2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4953,6 +4953,24 @@ find_get_context(struct task_struct *task, struct perf_event *event)
 	return ERR_PTR(err);
 }
 
+/*
+ * This function ensures that ctx->pmu_ctx_list is ordered, so that no warning
+ * is triggered due to prev_epc->pmu != next_epc->pmu.
+ */
+static void pmu_ctx_insertion_sort(struct perf_event_pmu_context *new,
+				   struct perf_event_context *ctx)
+{
+	struct perf_event_pmu_context *epc;
+
+	lockdep_assert_held(&ctx->lock);
+
+	list_for_each_entry(epc, &ctx->pmu_ctx_list, pmu_ctx_entry) {
+		if (epc->pmu > new->pmu)
+			break;
+	}
+	list_add(&new->pmu_ctx_entry, epc->pmu_ctx_entry.prev);
+}
+
 static struct perf_event_pmu_context *
 find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 		     struct perf_event *event)
@@ -4974,7 +4992,7 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 		if (!epc->ctx) {
 			atomic_set(&epc->refcount, 1);
 			epc->embedded = 1;
-			list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+			pmu_ctx_insertion_sort(epc, ctx);
 			epc->ctx = ctx;
 		} else {
 			WARN_ON_ONCE(epc->ctx != ctx);
@@ -5021,7 +5039,7 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
 	printk(KERN_INFO
 		"lgk: ctx %p insert pmu ctx %p, pmu is %p!\n", ctx, epc, epc->pmu);
 
-	list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+	pmu_ctx_insertion_sort(epc, ctx);
 	epc->ctx = ctx;
 
 found_epc:
-- 
2.34.1
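
The ordering problem described in the commit message can be reproduced in
a small userspace sketch. This is not kernel code: struct epc_node and the
three helpers below are hypothetical stand-ins (the kernel uses list_head
and perf_event_pmu_context), and the pmu pointers are compared only as
addresses, as in the patch. Head insertion makes the resulting order depend
on arrival order, while sorted insertion does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf_event_pmu_context. */
struct epc_node {
	const void *pmu;	/* compared by address only */
	struct epc_node *next;
};

/* Head insertion, like the list_add() calls being replaced:
 * the final order depends on the order of arrival. */
static void insert_head(struct epc_node **head, struct epc_node *new)
{
	new->next = *head;
	*head = new;
}

/* Sorted insertion: link 'new' before the first node whose pmu
 * address is greater, so the order is independent of arrival. */
static void insert_sorted(struct epc_node **head, struct epc_node *new)
{
	struct epc_node **pos = head;

	while (*pos && (uintptr_t)(*pos)->pmu <= (uintptr_t)new->pmu)
		pos = &(*pos)->next;
	new->next = *pos;
	*pos = new;
}

/* Lockstep walk over two lists, analogous to the check that warned:
 * returns 1 only if both lists hold the same pmu at every position. */
static int lists_match(const struct epc_node *a, const struct epc_node *b)
{
	while (a && b) {
		if (a->pmu != b->pmu)
			return 0;
		a = a->next;
		b = b->next;
	}
	return !a && !b;
}
```

If a "parent" list and a "child" list receive the same three pmus in
different arrival orders, lists_match() fails with insert_head() and
succeeds with insert_sorted().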



* Re: [PATCH] perf/core: Fix warning warning due to unordred pmu_ctx_list
  2025-01-20 11:43 [PATCH] perf/core: Fix warning warning due to unordred pmu_ctx_list Luo Gengkun
@ 2025-01-20 20:49 ` Liang, Kan
  2025-01-21  1:59   ` Luo Gengkun
  0 siblings, 1 reply; 3+ messages in thread
From: Liang, Kan @ 2025-01-20 20:49 UTC (permalink / raw)
  To: Luo Gengkun, peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, ravi.bangoria, linux-perf-users,
	linux-kernel

There is a redundant "warning" in the title.

On 2025-01-20 6:43 a.m., Luo Gengkun wrote:
> Syzkaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
> perf_event_swap_task_ctx_data(). The vmcore shows that the two lists contain
> the same perf_event_pmu_context entries, but not in the same order.
> 
> The problem is that when inheritance is performed, it traverses the ordered
> groups of events, and inserts the new perf_event_pmu_context into
> child_ctx->pmu_ctx_list which is unordered. So the order of pmu_ctx_list in
> the parent and child may be different.

I think the order of pmu_ctx_list for the parent is determined by the
time at which each event/pmu is added, while the order for a child is
determined by the event order in pinned_groups and flexible_groups.

> 
> The following testcase can trigger the above warning:
> 
>  # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
>  # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
> 
> test.c
> 
> void main() {
>         int count = 0;
>         pid_t pid;
> 
>         printf("%d running\n", getpid());
>         sleep(30);
>         printf("running\n");
> 
>         pid = fork();
>         if (pid == -1) {
>                 printf("fork error\n");
>                 return;
>         }
>         if (pid == 0) {
>                 while (1) {
>                         count++;
>                 }
>         } else {
>                 while (1) {
>                         count++;
>                 }
>         }
> }
> 
> The testcase first opens an LBR event, so it allocates task_ctx_data, and
> then opens tracepoint and software events, so the parent ctx has 3
> different perf_event_pmu_contexts. During inheritance, the child ctx
> inserts the perf_event_pmu_context entries in a different order, and the
> warning triggers.
> 
> To fix this problem, add pmu_ctx_insertion_sort to make sure the
> pmu_ctx_list is ordered.
> 
> Fixes: bd2756811766 ("perf: Rewrite core context handling")
> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
> ---
>  kernel/events/core.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 95b01a51139d..1bdff3ef0ce2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4953,6 +4953,24 @@ find_get_context(struct task_struct *task, struct perf_event *event)
>  	return ERR_PTR(err);
>  }
>  
> +/*
> + * This function ensures that ctx->pmu_ctx_list is ordered, so that no warning
> + * is triggered due to prev_epc->pmu != next_epc->pmu.
> + */
> +static void pmu_ctx_insertion_sort(struct perf_event_pmu_context *new,
> +				   struct perf_event_context *ctx)
> +{
> +	struct perf_event_pmu_context *epc;
> +
> +	lockdep_assert_held(&ctx->lock);
> +
> +	list_for_each_entry(epc, &ctx->pmu_ctx_list, pmu_ctx_entry) {
> +		if (epc->pmu > new->pmu)
> +			break;
> +	}
> +	list_add(&new->pmu_ctx_entry, epc->pmu_ctx_entry.prev);
> +}
> +
>  static struct perf_event_pmu_context *
>  find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
>  		     struct perf_event *event)
> @@ -4974,7 +4992,7 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
>  		if (!epc->ctx) {
>  			atomic_set(&epc->refcount, 1);
>  			epc->embedded = 1;
> -			list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
> +			pmu_ctx_insertion_sort(epc, ctx);

The CPU event and per-task event should have a different ctx.
The warning should only be triggered for the per-task event, right?
If so, I don't think a sort is required here.

>  			epc->ctx = ctx;
>  		} else {
>  			WARN_ON_ONCE(epc->ctx != ctx);
> @@ -5021,7 +5039,7 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
>  	printk(KERN_INFO
>  		"lgk: ctx %p insert pmu ctx %p, pmu is %p!\n", ctx, epc, epc->pmu);

This looks like your debug code. Please send a clean patch.

>  
> -	list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
> +	pmu_ctx_insertion_sort(epc, ctx);

I think the pmu_ctx_list has already been traversed to find a matching pmu
right before this. The traversal in pmu_ctx_insertion_sort() can be avoided.

Thanks,
Kan
>  	epc->ctx = ctx;
>  
>  found_epc:
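
The last review comment, folding the sorted insertion into the lookup that
already walks the list, could look roughly like the userspace sketch below.
It is an illustration only, not the eventual v2 patch: struct epc_node and
find_or_insert_sorted() are hypothetical names, and the sketch assumes the
list is already kept in ascending pmu address order so the walk can stop
early.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf_event_pmu_context. */
struct epc_node {
	const void *pmu;	/* compared by address only */
	struct epc_node *next;
};

/*
 * Single pass over a list sorted by pmu address: return the existing
 * node for 'pmu' if there is one, otherwise link the caller-provided
 * 'new' node at the sorted position and return it.
 */
static struct epc_node *find_or_insert_sorted(struct epc_node **head,
					      const void *pmu,
					      struct epc_node *new)
{
	struct epc_node **pos = head;

	/* Stop at the first node that is not smaller than 'pmu'. */
	while (*pos && (uintptr_t)(*pos)->pmu < (uintptr_t)pmu)
		pos = &(*pos)->next;

	/* Match found: reuse it, 'new' is left untouched. */
	if (*pos && (*pos)->pmu == pmu)
		return *pos;

	/* No match: 'pos' already points at the insertion slot. */
	new->pmu = pmu;
	new->next = *pos;
	*pos = new;
	return new;
}
```

The same walk answers both questions (is there a match, and where would a
new entry go), so no second traversal is needed.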



* Re: [PATCH] perf/core: Fix warning warning due to unordred pmu_ctx_list
  2025-01-20 20:49 ` Liang, Kan
@ 2025-01-21  1:59   ` Luo Gengkun
  0 siblings, 0 replies; 3+ messages in thread
From: Luo Gengkun @ 2025-01-21  1:59 UTC (permalink / raw)
  To: Liang, Kan, peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, ravi.bangoria, linux-perf-users,
	linux-kernel


On 2025/1/21 4:49, Liang, Kan wrote:
> A redundant "warning" is in the title.
>
> On 2025-01-20 6:43 a.m., Luo Gengkun wrote:
>> Syzkaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
>> perf_event_swap_task_ctx_data(). The vmcore shows that the two lists contain
>> the same perf_event_pmu_context entries, but not in the same order.
>>
>> The problem is that when inheritance is performed, it traverses the ordered
>> groups of events, and inserts the new perf_event_pmu_context into
>> child_ctx->pmu_ctx_list which is unordered. So the order of pmu_ctx_list in
>> the parent and child may be different.
> I think the order of pmu_ctx_list for the parent should be impacted by
> the time when an event/pmu is added.
> While the order for a child should be impacted by the event order in the
> pinned_groups and flexible_groups.

Yes, so the order of pmu_ctx_list for the parent and child may be
different because of this. I will make it clear in the commit message.

>> The following testcase can trigger the above warning:
>>
>>   # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
>>   # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
>>
>> test.c
>>
>> void main() {
>>          int count = 0;
>>          pid_t pid;
>>
>>          printf("%d running\n", getpid());
>>          sleep(30);
>>          printf("running\n");
>>
>>          pid = fork();
>>          if (pid == -1) {
>>                  printf("fork error\n");
>>                  return;
>>          }
>>          if (pid == 0) {
>>                  while (1) {
>>                          count++;
>>                  }
>>          } else {
>>                  while (1) {
>>                          count++;
>>                  }
>>          }
>> }
>>
>> The testcase first opens an LBR event, so it allocates task_ctx_data, and
>> then opens tracepoint and software events, so the parent ctx has 3
>> different perf_event_pmu_contexts. During inheritance, the child ctx
>> inserts the perf_event_pmu_context entries in a different order, and the
>> warning triggers.
>>
>> To fix this problem, add pmu_ctx_insertion_sort to make sure the
>> pmu_ctx_list is ordered.
>>
>> Fixes: bd2756811766 ("perf: Rewrite core context handling")
>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>> ---
>>   kernel/events/core.c | 22 ++++++++++++++++++++--
>>   1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 95b01a51139d..1bdff3ef0ce2 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -4953,6 +4953,24 @@ find_get_context(struct task_struct *task, struct perf_event *event)
>>   	return ERR_PTR(err);
>>   }
>>   
>> +/*
>> + * This function ensures that ctx->pmu_ctx_list is ordered, so that no warning
>> + * is triggered due to prev_epc->pmu != next_epc->pmu.
>> + */
>> +static void pmu_ctx_insertion_sort(struct perf_event_pmu_context *new,
>> +				   struct perf_event_context *ctx)
>> +{
>> +	struct perf_event_pmu_context *epc;
>> +
>> +	lockdep_assert_held(&ctx->lock);
>> +
>> +	list_for_each_entry(epc, &ctx->pmu_ctx_list, pmu_ctx_entry) {
>> +		if (epc->pmu > new->pmu)
>> +			break;
>> +	}
>> +	list_add(&new->pmu_ctx_entry, epc->pmu_ctx_entry.prev);
>> +}
>> +
>>   static struct perf_event_pmu_context *
>>   find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
>>   		     struct perf_event *event)
>> @@ -4974,7 +4992,7 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
>>   		if (!epc->ctx) {
>>   			atomic_set(&epc->refcount, 1);
>>   			epc->embedded = 1;
>> -			list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
>> +			pmu_ctx_insertion_sort(epc, ctx);
> The CPU event and per-task event should have a different ctx.
> The warning should only be triggered for the per-task event, right?
> If so, I don't think a sort is required here.
Yes, the ctx is extracted from the task, so sorting only the task ctx
should fix this problem.
>
>>   			epc->ctx = ctx;
>>   		} else {
>>   			WARN_ON_ONCE(epc->ctx != ctx);
>> @@ -5021,7 +5039,7 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
>>   	printk(KERN_INFO
>>   		"lgk: ctx %p insert pmu ctx %p, pmu is %p!\n", ctx, epc, epc->pmu);
> Seems your debug code. Please send a clean patch.
Sorry about this.
>>   
>> -	list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
>> +	pmu_ctx_insertion_sort(epc, ctx);
> I think the pmu_ctx_list has already been traversed to find a matching pmu
> right before this. The traversal in pmu_ctx_insertion_sort() can be avoided.
>
> Thanks,
> Kan
Thanks for the review. I will send PATCH v2 later.
>>   	epc->ctx = ctx;
>>   
>>   found_epc:

