From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Leo Yan <leo.yan@arm.com>
Cc: peterz@infradead.org, mingo@redhat.com, mingo@kernel.org,
acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, David Wang <00107082@163.com>
Subject: Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx
Date: Tue, 3 Jun 2025 16:05:23 +0100 [thread overview]
Message-ID: <aD8PM7qx7vCDhs8a@e129823.arm.com> (raw)
In-Reply-To: <20250603140040.GB8020@e132581.arm.com>
Hi Leo,
>
> On Mon, Jun 02, 2025 at 07:40:49PM +0100, Yeoreum Yun wrote:
> > commit a3c3c6667("perf/core: Fix child_total_time_enabled accounting bug at task exit")
> > changes the event->state update before list_del_event().
> > This change prevents calling perf_cgroup_event_disable() as a result,
> > cpuctx->cgrp can't be cleared properly and point to dangling point of cgroup.
> >
> > Because of this problem, some machin meets the below panic[0]:
> >
> > 863.881960] sysved_call_function_sing le+0x4c/0xc0
> > 863.881301] asm_sysvec_call_function_single+0x16/0x20
> > 869.881344] RIP: 0633:0x7f9alcea3367
> > 663.681373] Code: 00 66 99 b8 ff ff ff ff c3 66 ....
> > 863.881524] RSP: 002b:00007fffa526fcf8 EFLAGS: 00000246
> > 869.881567] RAX: 0000562060c962d0 RBX: 0000000000000002 RCX: 00007f9a1cff1c60
> > 863.881625] RDX: 00007f9a0c000030 RSI: 00007f9alcff1c60 RDI: 00007f9a1ca91c20
> > 863.081682] RBP: 0000000000000001 R08: 0000000000000000 R09: 00007f9a1d6217a0
> > 869.881740] R10: 00007f9alca91c10 R11: 0000000000000246 R12: 00007f9a1d70c020
> > 869.881798] R13: 00007fffa5270030 R14: 00007fffa526fd00 R15: 0000000000000000
> > 863.881860] </TASK>
> > 863.881876) Modules linked in: snd_seq_dummy (E) snd_hrtimer (E)...
> > ...
> > 863.887142] button (E)
> > 863.912127] CR2: ffffe4afcc079650
> > 863.914593] --- [ end trace 0000000000000000 1--
> > 864.042750] RIP: 0010:ctx_sched_out+0x1ce/0x210
> > 864.045214] Code: 89 c6 4c 8b b9 de 00 00 00 48 ...
> > 864.050343] RSP: 0000:ffffaa4ec0f3fe60 EFLAGS: 00010086
> > 864.052929] RAX: 0000000000000002 RBX: ffff8e8eeed2a580 RCX: ffff8e8bded9bf00
> > 864.055518] RDX: 000000c92340b051 RSI: 000000c92340b051 RDI: ffff
> > 864.058093] RBP: 0000000000000000 R08: 0000000000000002 R09: 00
> > 864.060654] R10: 0000000000000000 R11: 0000000000000000 R12: 000
> > 864.063183] R13: ffff8e8eeed2a580 R14: 0000000000000007 R15: ffffe4afcc079650
> > 864.065729] FS: 00007f9a1ca91940 (0000) GS:ffff8e8f6b1c3000(0000) knIGS:0000000000000000
> > 864.068312] CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033
> > 864.070898] CR2: ffffe4afcc079650 CR3: 00000001136d8000 CR4: 0000000000350ef0
> > 864.673523] Kernel panic - not syncing: Fatal exception in interrupt
> > 864.076410] Kernel Offset: 0xc00000 from 0xffffffff81000000 (relocation range: 0xff
> > 864.205401] --- [ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> >
> > To address this call the perf_cgroup_event_disable() properly before
> > list_del_event() in __perf_remove_from_context().
> >
> > Link: https://lore.kernel.org/all/aD2TspKH%2F7yvfYoO@e129823.arm.com/ [0]
> > Fixes: a3c3c6667("perf/core: Fix child_total_time_enabled accounting bug at task exit")
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > Tested-by: David Wang <00107082@163.com>
> > ---
> > kernel/events/core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index f34c99f8ce8f..909b9d5a65c1 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2498,6 +2498,10 @@ __perf_remove_from_context(struct perf_event *event,
> > state = PERF_EVENT_STATE_DEAD;
> > }
> > event_sched_out(event, ctx);
> > +
> > + if (event->state > PERF_EVENT_STATE_OFF)
> > + perf_cgroup_event_disable(event, ctx);
> > +
>
> As we discussed, seems to me, the issue is caused by an ambigous state
> machine transition:
>
> When a PMU event state is PERF_EVENT_STATE_EXIT, the current code does
> not transite the state to PERF_EVENT_STATE_OFF. As a result, the
> list_del_event() function skips to clean up cgroup pointer for non OFF
> states. This is different from the code prior to the commit a3c3c6667,
> which transits states EXIT -> INACTIVE -> OFF.
Just for correct this, prior to commit a3c3c6667,
Isn't the state change INACTIVE -> OFF -> EXIT?
when I saw the perf_event_exit_event(),
perf_remove_from_context(event, detacg_flags
...
perf_event_set_state(event, PERF_EVENT_STATE_EXIT),
So,
1. event_sched_out() -> INACTIVE
2. list_del_event() -> OFF
3. perf_event_set_state() -> EXIT.
Therefore, the final event state is "PREF_EVENT_STATE_EXIT"
Therefore, I think calling "perf_cgroup_event_disable()" is called
as soon as event_sched_out() since commit a3c3c6667 change the state
INCATIVE -> EXIT.
so, the middle state of OFF is gone and I think this middle state "OFF"
doesn't affect any race condition since the removed event "wakeup" is
also called after it changed to "final state" -- in this case, the
PERF_EVENT_STATE_EXIT.
>
> My suggestion is not reliable. Roughly read code, except for the
> PERF_EVENT_STATE_EXIT case, I think other error cases should also clean
> up the cgroup pointer. The reason is I don't see other places to
> clean up the cgroup pointer for these error cases:
>
> PERF_EVENT_STATE_REVOKED
> PERF_EVENT_STATE_DEAD
>
> Only in the PERF_EVENT_STATE_ERROR state, we don't need to cleanup
> cgroup as this has already been handled in merge_sched_in().
>
> So a correct condition would be:
>
> if (event->state > PERF_EVENT_STATE_OFF ||
> event->state <= PERF_EVENT_STATE_EXIT)
> perf_cgroup_event_disable(event, ctx);
>
> And we need to remove the perf_cgroup_event_disable() from
> list_del_event() to avoid duplicate code.
If we sustain perf_cgroup_event_disable()'s call in *list_del_event()*
But, Should it be? I think its enough to call as soon as after event_sched_out().
>
> Perhaps a better approach for code consolidation would be to modify
> the conditions in list_del_event() to ensure the cgroup pointer is
> cleaned up in error cases. However, I'm not confident that this is the
> correct direction, so I would wait for suggestions from the maintainers.
>
> Thanks,
> Leo
[...]
--
Sincerely,
Yeoreum Yun
prev parent reply other threads:[~2025-06-03 15:06 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-02 18:40 [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx Yeoreum Yun
2025-06-03 2:01 ` David Wang
2025-06-03 4:46 ` [PATCH " Yeoreum Yun
2025-06-03 5:44 ` David Wang
2025-06-03 6:34 ` Yeoreum Yun
2025-06-03 6:39 ` Yeoreum Yun
2025-06-03 6:47 ` David Wang
2025-06-03 6:42 ` David Wang
2025-06-03 7:16 ` Yeoreum Yun
2025-06-03 7:31 ` David Wang
2025-06-03 8:15 ` David Wang
2025-06-03 6:54 ` David Wang
2025-06-03 9:20 ` Yeoreum Yun
2025-06-03 10:08 ` David Wang
2025-06-03 13:41 ` Yeoreum Yun
2025-06-03 14:02 ` David Wang
2025-06-03 14:00 ` Leo Yan
2025-06-03 14:44 ` Peter Zijlstra
2025-06-03 15:17 ` Yeoreum Yun
2025-06-04 7:06 ` Peter Zijlstra
2025-06-04 8:03 ` Peter Zijlstra
2025-06-04 10:06 ` Yeoreum Yun
2025-06-04 12:37 ` Peter Zijlstra
2025-06-04 12:54 ` Yeoreum Yun
2025-06-04 10:18 ` Leo Yan
2025-06-04 13:58 ` Peter Zijlstra
2025-06-04 15:17 ` Leo Yan
2025-06-04 14:16 ` Peter Zijlstra
2025-06-04 15:46 ` Leo Yan
2025-06-04 15:59 ` Peter Zijlstra
2025-06-05 11:29 ` Peter Zijlstra
2025-06-05 12:33 ` Peter Zijlstra
2025-06-05 17:21 ` Leo Yan
2025-06-05 11:41 ` Peter Zijlstra
2025-06-03 15:05 ` Yeoreum Yun [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aD8PM7qx7vCDhs8a@e129823.arm.com \
--to=yeoreum.yun@arm.com \
--cc=00107082@163.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=leo.yan@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).