All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Leo Yan <leo.yan@arm.com>
Cc: peterz@infradead.org, mingo@redhat.com, mingo@kernel.org,
	acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	irogers@google.com, adrian.hunter@intel.com,
	kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, David Wang <00107082@163.com>
Subject: Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx
Date: Tue, 3 Jun 2025 16:05:23 +0100	[thread overview]
Message-ID: <aD8PM7qx7vCDhs8a@e129823.arm.com> (raw)
In-Reply-To: <20250603140040.GB8020@e132581.arm.com>

Hi Leo,
>
> On Mon, Jun 02, 2025 at 07:40:49PM +0100, Yeoreum Yun wrote:
> > commit a3c3c6667("perf/core: Fix child_total_time_enabled accounting bug at task exit")
> > changes the event->state update before list_del_event().
> > This change prevents calling perf_cgroup_event_disable() as a result,
> > cpuctx->cgrp can't be cleared properly and point to dangling point of cgroup.
> >
> > Because of this problem, some machin meets the below panic[0]:
> >
> > 863.881960] sysved_call_function_sing le+0x4c/0xc0
> > 863.881301] asm_sysvec_call_function_single+0x16/0x20
> > 869.881344] RIP: 0633:0x7f9alcea3367
> > 663.681373] Code: 00 66 99 b8 ff ff ff ff c3 66 ....
> > 863.881524] RSP: 002b:00007fffa526fcf8 EFLAGS: 00000246
> > 869.881567] RAX: 0000562060c962d0 RBX: 0000000000000002 RCX: 00007f9a1cff1c60
> > 863.881625] RDX: 00007f9a0c000030 RSI: 00007f9alcff1c60 RDI: 00007f9a1ca91c20
> > 863.081682] RBP: 0000000000000001 R08: 0000000000000000 R09: 00007f9a1d6217a0
> > 869.881740] R10: 00007f9alca91c10 R11: 0000000000000246 R12: 00007f9a1d70c020
> > 869.881798] R13: 00007fffa5270030 R14: 00007fffa526fd00 R15: 0000000000000000
> > 863.881860] </TASK>
> > 863.881876) Modules linked in: snd_seq_dummy (E) snd_hrtimer (E)...
> > ...
> > 863.887142] button (E)
> > 863.912127] CR2: ffffe4afcc079650
> > 863.914593] --- [ end trace 0000000000000000 1--
> > 864.042750] RIP: 0010:ctx_sched_out+0x1ce/0x210
> > 864.045214] Code: 89 c6 4c 8b b9 de 00 00 00 48 ...
> > 864.050343] RSP: 0000:ffffaa4ec0f3fe60 EFLAGS: 00010086
> > 864.052929] RAX: 0000000000000002 RBX: ffff8e8eeed2a580 RCX: ffff8e8bded9bf00
> > 864.055518] RDX: 000000c92340b051 RSI: 000000c92340b051 RDI: ffff
> > 864.058093] RBP: 0000000000000000 R08: 0000000000000002 R09: 00
> > 864.060654] R10: 0000000000000000 R11: 0000000000000000 R12: 000
> > 864.063183] R13: ffff8e8eeed2a580 R14: 0000000000000007 R15: ffffe4afcc079650
> > 864.065729] FS: 00007f9a1ca91940 (0000) GS:ffff8e8f6b1c3000(0000) knIGS:0000000000000000
> > 864.068312] CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033
> > 864.070898] CR2: ffffe4afcc079650 CR3: 00000001136d8000 CR4: 0000000000350ef0
> > 864.673523] Kernel panic - not syncing: Fatal exception in interrupt
> > 864.076410] Kernel Offset: 0xc00000 from 0xffffffff81000000 (relocation range: 0xff
> > 864.205401] --- [ end Kernel panic - not syncing: Fatal exception in interrupt ]---
> >
> > To address this call the perf_cgroup_event_disable() properly before
> > list_del_event() in __perf_remove_from_context().
> >
> > Link: https://lore.kernel.org/all/aD2TspKH%2F7yvfYoO@e129823.arm.com/ [0]
> > Fixes: a3c3c6667("perf/core: Fix child_total_time_enabled accounting bug at task exit")
> > Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
> > Tested-by: David Wang <00107082@163.com>
> > ---
> >  kernel/events/core.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index f34c99f8ce8f..909b9d5a65c1 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2498,6 +2498,10 @@ __perf_remove_from_context(struct perf_event *event,
> >  		state = PERF_EVENT_STATE_DEAD;
> >  	}
> >  	event_sched_out(event, ctx);
> > +
> > +	if (event->state > PERF_EVENT_STATE_OFF)
> > +		perf_cgroup_event_disable(event, ctx);
> > +
>
> As we discussed, seems to me, the issue is caused by an ambigous state
> machine transition:
>
> When a PMU event state is PERF_EVENT_STATE_EXIT, the current code does
> not transite the state to PERF_EVENT_STATE_OFF. As a result, the
> list_del_event() function skips to clean up cgroup pointer for non OFF
> states. This is different from the code prior to the commit a3c3c6667,
> which transits states EXIT -> INACTIVE -> OFF.

Just for correct this, prior to commit a3c3c6667,
Isn't the state change INACTIVE -> OFF -> EXIT?

when I saw the perf_event_exit_event(),
  perf_remove_from_context(event, detacg_flags
  ...
  perf_event_set_state(event, PERF_EVENT_STATE_EXIT),

So,

  1. event_sched_out() -> INACTIVE
  2. list_del_event() -> OFF
  3. perf_event_set_state() -> EXIT.

Therefore, the final event state is "PREF_EVENT_STATE_EXIT"

Therefore, I think calling "perf_cgroup_event_disable()" is called
as soon as event_sched_out() since commit a3c3c6667 change the state
INCATIVE -> EXIT.

so, the middle state of OFF is gone and I think this middle state "OFF"
doesn't affect any race condition since the removed event "wakeup" is
also called after it changed to "final state" -- in this case, the
PERF_EVENT_STATE_EXIT.

>
> My suggestion is not reliable. Roughly read code, except for the
> PERF_EVENT_STATE_EXIT case, I think other error cases should also clean
> up the cgroup pointer.  The reason is I don't see other places to
> clean up the cgroup pointer for these error cases:
>
>   PERF_EVENT_STATE_REVOKED
>   PERF_EVENT_STATE_DEAD
>
> Only in the PERF_EVENT_STATE_ERROR state, we don't need to cleanup
> cgroup as this has already been handled in merge_sched_in().
>
> So a correct condition would be:
>
>     if (event->state > PERF_EVENT_STATE_OFF ||
>         event->state <= PERF_EVENT_STATE_EXIT)
>         perf_cgroup_event_disable(event, ctx);
>
> And we need to remove the perf_cgroup_event_disable() from
> list_del_event() to avoid duplicate code.

If we sustain perf_cgroup_event_disable()'s call in *list_del_event()*
But, Should it be? I think its enough to call as soon as after event_sched_out().

>
> Perhaps a better approach for code consolidation would be to modify
> the conditions in list_del_event() to ensure the cgroup pointer is
> cleaned up in error cases. However, I'm not confident that this is the
> correct direction, so I would wait for suggestions from the maintainers.
>
> Thanks,
> Leo
[...]

--
Sincerely,
Yeoreum Yun

      parent reply	other threads:[~2025-06-03 15:06 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-02 18:40 [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx Yeoreum Yun
2025-06-03  2:01 ` David Wang
2025-06-03  4:46   ` [PATCH " Yeoreum Yun
2025-06-03  5:44     ` David Wang
2025-06-03  6:34       ` Yeoreum Yun
2025-06-03  6:39         ` Yeoreum Yun
2025-06-03  6:47           ` David Wang
2025-06-03  6:42         ` David Wang
2025-06-03  7:16           ` Yeoreum Yun
2025-06-03  7:31             ` David Wang
2025-06-03  8:15             ` David Wang
2025-06-03  6:54         ` David Wang
2025-06-03  9:20         ` Yeoreum Yun
2025-06-03 10:08           ` David Wang
2025-06-03 13:41             ` Yeoreum Yun
2025-06-03 14:02               ` David Wang
2025-06-03 14:00 ` Leo Yan
2025-06-03 14:44   ` Peter Zijlstra
2025-06-03 15:17     ` Yeoreum Yun
2025-06-04  7:06       ` Peter Zijlstra
2025-06-04  8:03     ` Peter Zijlstra
2025-06-04 10:06       ` Yeoreum Yun
2025-06-04 12:37         ` Peter Zijlstra
2025-06-04 12:54           ` Yeoreum Yun
2025-06-04 10:18       ` Leo Yan
2025-06-04 13:58         ` Peter Zijlstra
2025-06-04 15:17           ` Leo Yan
2025-06-11  9:29           ` [tip: perf/urgent] perf: Add comment to enum perf_event_state tip-bot2 for Peter Zijlstra
2025-06-04 14:16         ` [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx Peter Zijlstra
2025-06-04 15:46           ` Leo Yan
2025-06-04 15:59             ` Peter Zijlstra
2025-06-05 11:29             ` Peter Zijlstra
2025-06-05 12:33               ` Peter Zijlstra
2025-06-05 17:21                 ` Leo Yan
2025-06-11  9:29                 ` [tip: perf/urgent] perf: Fix cgroup state vs ERROR tip-bot2 for Peter Zijlstra
2025-06-05 11:41           ` [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx Peter Zijlstra
2025-06-03 15:05   ` Yeoreum Yun [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aD8PM7qx7vCDhs8a@e129823.arm.com \
    --to=yeoreum.yun@arm.com \
    --cc=00107082@163.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=leo.yan@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.