public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ravi Bangoria <ravi.bangoria@amd.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "mingo@kernel.org" <mingo@kernel.org>,
	"lucas.demarchi@intel.com" <lucas.demarchi@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"acme@kernel.org" <acme@kernel.org>,
	"namhyung@kernel.org" <namhyung@kernel.org>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"alexander.shishkin@linux.intel.com"
	<alexander.shishkin@linux.intel.com>,
	"jolsa@kernel.org" <jolsa@kernel.org>,
	"irogers@google.com" <irogers@google.com>,
	"adrian.hunter@intel.com" <adrian.hunter@intel.com>,
	"kan.liang@linux.intel.com" <kan.liang@linux.intel.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>
Subject: Re: [PATCH v2 24/24] perf: Make perf_pmu_unregister() useable
Date: Wed, 19 Feb 2025 18:53:46 +0530	[thread overview]
Message-ID: <e8dfc3df-69d6-46ba-81db-4a04484fb7c3@amd.com> (raw)
In-Reply-To: <d585f337-5ce7-4a02-b890-5f3888e59ad0@amd.com>

Hi Peter,

>>>> Apparently not, it ends up with:
>>>>
>>>>   ------------[ cut here ]------------
>>>>   WARNING: CPU: 145 PID: 5459 at kernel/events/core.c:281 event_function+0xd2/0xf0
>>>>   WARNING: CPU: 145 PID: 5459 at kernel/events/core.c:286 event_function+0xd6/0xf0
>>>
>>>>    remote_function+0x4f/0x70
>>>>    generic_exec_single+0x7f/0x160
>>>>    smp_call_function_single+0x110/0x160
>>>>    event_function_call+0x98/0x1d0
>>>>    _perf_event_disable+0x41/0x70
>>>>    perf_event_for_each_child+0x40/0x90
>>>>    _perf_ioctl+0xac/0xb00
>>>>    perf_ioctl+0x45/0x80
>>>
>>> Took me a long while trying to blame this on the 'event->parent =
>>> NULL;', but AFAICT this is a new, unrelated issue.
>>>
>>> What I think happens is this perf_ioctl(DISABLE) vs pmu_detach_events()
>>> race, where the crux is that perf_ioctl() path does not take
>>> event2->mutex which allows the following interleave:
>>
>> This one was only with perf_fuzzer, so pmu_detach_events() code path was
>> not invoked.
> 
> I think the issue is, unaccount_event() gets called for the child event
> after the child is detached. Since event->parent is NULL, unaccount_event()
> abruptly corrupts the perf_sched_work.

A different manifestation of the same underlying issue (perf_sched_work
is getting corrupted because of event->parent = NULL):

  WARNING: CPU: 0 PID: 5799 at kernel/events/core.c:286 event_function+0xd6/0xf0
  CPU: 0 UID: 1002 PID: 5799 Comm: perf_fuzzer Kdump: loaded Not tainted 6.14.0-rc1-pmu-unregister+ #263
  RIP: 0010:event_function+0xd6/0xf0
  RSP: 0018:ffa0000006a87828 EFLAGS: 00010086
  RAX: 0000000000000007 RBX: ff11001008830ee0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffa0000006a87850 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffa0000006a87918
  R13: 0000000000000000 R14: ff1100010e347c00 R15: ff110001226c9a40
  FS:  0000000000000000(0000) GS:ff11001008800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000595fed7d3e12 CR3: 000000016954c005 CR4: 0000000000f71ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? event_function+0xd6/0xf0
   ? event_function+0x3d/0xf0
   remote_function+0x4c/0x70
   generic_exec_single+0x7c/0x160
   smp_call_function_single+0x110/0x160
   event_function_call+0x98/0x1d0
   perf_remove_from_context+0x46/0xa0
   perf_event_release_kernel+0x1f3/0x210
   perf_release+0x12/0x20

With the above trace as reference, something like below is what is
happening. (Code paths are based on my understanding of traces, I've
not verified each an every bit :P)

  Assume:
  task1: event1
  task2 (child of task1): event2 (child of event1)
  task3: event3

              CPU 1                                             CPU 2                                                               CPU 3                                                       CPU 4
  
                                                  /* task3 sched in */
                                                  perf_event_task_sched_in()
                                                    if (static_branch_unlikely(&perf_sched_events)) /* true */
                                                        __perf_event_task_sched_in()
                                                            ...

  task2:
  perf_event_exit_event() /* event2 */
    event->parent = NULL /* by perf_perf_child_detach() */
    free_event()
      _free_event()
        unaccount_event()
          if (event->parent) /* false */
              return; /* won't get executed */
          perf_sched_work <== OFF  ------.
                                         |
                                         |        /* task3 sched out */
                                         |        perf_event_task_sched_out()
                                         '----->   if (static_branch_unlikely(&perf_sched_events)) /* false */
                                         |             /* __perf_event_task_sched_out() won't get called */
                                         |
                                         |                                                                            /* task3 sched in */
                                         |                                                                            perf_event_task_sched_in()
                                         '-------------------------------------------------------------------------->   if (static_branch_unlikely(&perf_sched_events)) /* false */
                                                                                                                             /* __perf_event_task_sched_in() won't get called
                                                                                                                             So cpuctx->task_ctx will remains NULL; */
  
                                                                                                                                                                                         close(event3)
                                                                                                                                                                                           perf_release()
                                                                                                                                                                                             perf_event_release_kernel()
                                                                                                                                                                                               perf_remove_from_context()
                                                                                                                                                                                                 event_function_call()
                                                                                                                                                                                                 /* IPI to CPU 3 ---> */
                                                                                                                      /* ---> IPI from CPU 4 */
                                                                                                                      event_function()
                                                                                                                        if (ctx->task) {
                                                                                                                            /* task_ctx is NULL whereas ctx is !NULL */
                                                                                                                            WARN_ON_ONCE(task_ctx != ctx);

                                                                                                                            ^^^^^^^^^^^^

IIUC, event->parent is the only way to detect whether the event is a
child or not, so it makes sense to preserve the ->parent pointer even
after the child is detached.

Thanks,
Ravi

  reply	other threads:[~2025-02-19 13:23 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-05 10:21 [PATCH v2 00/24] perf: Make perf_pmu_unregister() usable Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 01/24] lockdep: Fix might_fault() Peter Zijlstra
2025-02-06 18:19   ` David Hildenbrand
2025-02-05 10:21 ` [PATCH v2 02/24] perf: Ensure bpf_perf_link path is properly serialized Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 03/24] perf: Simplify child event tear-down Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 04/24] perf: Simplify perf_event_free_task() wait Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 05/24] perf: Simplify perf_event_release_kernel() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 06/24] perf: Fix pmus_lock vs pmus_srcu ordering Peter Zijlstra
2025-02-27 16:59   ` Lucas De Marchi
2025-02-05 10:21 ` [PATCH v2 07/24] perf: Fix perf_pmu_register() vs perf_init_event() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 08/24] perf: Cleanup perf_try_init_event() Peter Zijlstra
2025-03-05 11:29   ` [tip: perf/core] perf/core: Clean up perf_try_init_event() tip-bot2 for Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 09/24] perf: Simplify perf_event_alloc() error path Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 10/24] perf: Simplify perf_pmu_register() " Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 11/24] perf: Simplify perf_pmu_register() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 12/24] perf: Simplify perf_init_event() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 13/24] perf: Simplify perf_event_alloc() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 14/24] perf: Merge pmu_disable_count into cpu_pmu_context Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 15/24] perf: Add this_cpc() helper Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 16/24] perf: Detach perf_cpu_pmu_context and pmu lifetimes Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 17/24] perf: Introduce perf_free_addr_filters() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 18/24] perf: Robustify perf_event_free_bpf_prog() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 19/24] perf: Simplify perf_mmap() control flow Peter Zijlstra
2025-03-03  5:39   ` Ravi Bangoria
2025-03-03 11:19     ` Ingo Molnar
2025-03-03 13:36       ` Ravi Bangoria
2025-03-04  8:44         ` Ingo Molnar
2025-02-05 10:21 ` [PATCH v2 20/24] perf: Fix perf_mmap() failure path Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 21/24] perf: Further simplify perf_mmap() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 22/24] perf: Remove retry loop from perf_mmap() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 23/24] perf: Lift event->mmap_mutex in perf_mmap() Peter Zijlstra
2025-02-05 10:21 ` [PATCH v2 24/24] perf: Make perf_pmu_unregister() useable Peter Zijlstra
2025-02-10  6:39   ` Ravi Bangoria
2025-02-11 15:46     ` Peter Zijlstra
2025-02-10  6:42   ` Ravi Bangoria
2025-02-12 12:49     ` Peter Zijlstra
2025-02-13  7:52       ` Ravi Bangoria
2025-02-13 13:08         ` Peter Zijlstra
2025-02-14  3:57           ` Ravi Bangoria
2025-02-14 20:24         ` Peter Zijlstra
2025-02-17  8:24           ` Ravi Bangoria
2025-02-17 16:31             ` Ravi Bangoria
2025-02-19 13:23               ` Ravi Bangoria [this message]
2025-02-19 14:30                 ` Ravi Bangoria
2025-02-10  6:59   ` Ravi Bangoria
2025-02-13 13:07     ` Peter Zijlstra
2025-03-03  6:01 ` [PATCH v2 00/24] perf: Make perf_pmu_unregister() usable Ravi Bangoria

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e8dfc3df-69d6-46ba-81db-4a04484fb7c3@amd.com \
    --to=ravi.bangoria@amd.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucas.demarchi@intel.com \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox