From: Yeoreum Yun <yeoreum.yun@arm.com>
To: peterz@infradead.org, mingo@redhat.com, mingo@kernel.org,
acme@kernel.org, namhyung@kernel.org, leo.yan@arm.com,
mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com
Cc: linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
David Wang <00107082@163.com>
Subject: Re: [PATCH 1/1] perf/core: fix dangling cgroup pointer in cpuctx
Date: Tue, 3 Jun 2025 10:20:04 +0100
Message-ID: <aD6+RGnAOyIS+tik@e129823.arm.com>
In-Reply-To: <aD6Xk2rdBjnVy6DA@e129823.arm.com>
Hi David,
> > > > >
> > > > > Also, your patch couldn't solve the problem described in
> > > > > commit a3c3c6667("perf/core: Fix child_total_time_enabled accounting bug at task exit")
> > > > > for an INACTIVE event's total_enable_time.
> > > >
> > > > I do not think so.
> > > > Correct me if I am making silly mistakes,
> > > > The patch, https://lore.kernel.org/lkml/20250603032651.3988-1-00107082@163.com/
> > > > calls perf_event_set_state() based on the DETACH_EXIT flag, which covers the INACTIVE state, right?
> > > > If DETACH_EXIT is not used for this purpose, then why should it exist in the first place?
> > > > I think it does not revert the purpose of commit a3c3c6667..... But I could be wrong.
> > > > Would you show a call path where DETACH_EXIT is not set, but the changes in commit a3c3c6667 are still needed?
> > > >
> > > > Sorry for my bad explanation without detail.
> > > > Think about a cpu-specific event that is closed by a task.
> > > > Suppose there is a child cpu event bound to cpu 0:
> > > > 1. cpu 0 -> active
> > > > 2. scheduled to cpu1 -> inactive
> > > > 3. close the cpu event from parent -> inactive close
> > > >
> > > > This can fail to count total_enable_time.
> > >
> > > Is this explaining the purpose of commit a3c3c6667?
> > > I am not arguing with it, and I am also not suggesting reverting it. (It is just that reverting it can fix the kernel panic.)
> >
> > In commit a3c3c6667, I explained a specific case, but not the one
> > above. The commit's purpose, though, is to "account total_enable_time" properly.
> >
> > > > Also, considering your patch, for the DETACH_EXIT case,
> > > > if it changes the state before list_del_event(), that wouldn't disable
> > > > the cgroup-related state. So the cpuctx->cgrp pointer could be left dangling,
> > > > as the patch describes...
> > > No, I don't think so.
> > > Changing the state before list_del_event() is the same behavior as before commit a3c3c6667, right?
> > > And no such kernel panic happened before commit a3c3c6667.
>
> Oh! I was wrong, before commit a3c3c6667, "change state" happened *after* list_del_event()
> >
> > That's why list_del_event() handled perf_cgroup_disable() before
> > commit a3c3c6667. However, because of *my mistake*, I forgot to call
> > perf_cgroup_disable() properly before changing the event state.
> > Yes, your patch can avoid the panic, since event->cgrp is switched
> > as soon as the exit happens.
>
> I cannot agree with the reasoning.
> The panic does not happen at exit; it happens at reboot/shutdown.
> (I close perf_event_open before reboot)
> >
> > However, as I said, the INACTIVE event could fail to count
> > total_enable_time.
> >
> > So, setting the event state should occur before list_del_event().
> > And since it's an event->state change on remove,
> > it shouldn't have any side effect. The state change isn't the cause of your
> > panic; the missed perf_cgroup_disable() is.
>
> Any procedure to bring out the impact of this missed perf_cgroup_disable()?
> My system seems all normal, where should I check it?
Here is a possible scenario:
1. Open a perf event with a cgroup.
2. Open a perf event as a cpu event (no cgroup).
3. The task above sets cpuctx->cgrp to the same cgroup as (1).
4. Close the events from (1).
   Here, perf_cgroup_event_disable() isn't called, so
   cpuctx->cgrp still points to the cgroup.
5. Another task destroys the cgroup.
6. Close the events from (2).
   Since this is the last event, __perf_remove_from_context()
   calls update_cgrp_time_from_cpuctx(),
   and this dereferences an invalid pointer.
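
The scenario can be sketched as a toy model in plain C. This is not kernel code: every toy_* name is hypothetical, and the model assumes that list_del_event() only performs the cgroup disable while the event state is still above OFF, which is how the skipped perf_cgroup_event_disable() is understood here. The cgroup's destruction is modelled with an "alive" flag, so nothing is actually freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event states (only the ordering matters). */
enum toy_state { TOY_DEAD = -4, TOY_OFF = -1, TOY_INACTIVE = 0 };

struct toy_cgroup { bool alive; };
struct toy_cpuctx { struct toy_cgroup *cgrp; int nr_cgroups; };
struct toy_event  { enum toy_state state; struct toy_cgroup *cgrp; };

/* First cgroup event in the context publishes its cgroup in cpuctx. */
static void toy_cgroup_event_enable(struct toy_event *e, struct toy_cpuctx *c)
{
    if (!e->cgrp)
        return;
    if (c->nr_cgroups++ == 0)
        c->cgrp = e->cgrp;
}

/* Last cgroup event leaving the context clears the cpuctx pointer. */
static void toy_cgroup_event_disable(struct toy_event *e, struct toy_cpuctx *c)
{
    if (!e->cgrp)
        return;
    assert(c->nr_cgroups > 0);
    if (--c->nr_cgroups == 0)
        c->cgrp = NULL;
}

/* Assumption: the disable only runs while the state is above OFF. */
static void toy_list_del_event(struct toy_event *e, struct toy_cpuctx *c)
{
    if (e->state > TOY_OFF) {
        toy_cgroup_event_disable(e, c);
        e->state = TOY_OFF;
    }
}

/* Returns true if cpuctx still points at the cgroup after it is destroyed. */
bool toy_dangling_after_close(bool set_state_first)
{
    struct toy_cgroup cg = { true };
    struct toy_cpuctx cpuctx = { NULL, 0 };
    struct toy_event ev = { TOY_INACTIVE, &cg };

    toy_cgroup_event_enable(&ev, &cpuctx);   /* steps 1 and 3 */
    if (set_state_first)
        ev.state = TOY_DEAD;                 /* state lowered before removal */
    toy_list_del_event(&ev, &cpuctx);        /* close the cgroup event */
    cg.alive = false;                        /* the cgroup is destroyed */

    return cpuctx.cgrp != NULL && !cpuctx.cgrp->alive;
}
```

In this model, toy_dangling_after_close(true) leaves a stale pointer because the disable is skipped, while toy_dangling_after_close(false) clears it.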
> But to fix it, isn't following change less aggressive?
> event_sched_out(event, ctx);
> - perf_event_set_state(event, min(event->state, state));
> if (flags & DETACH_GROUP)
> perf_group_detach(event);
> if (flags & DETACH_CHILD)
> perf_child_detach(event);
> list_del_event(event, ctx);
> + perf_event_set_state(event, min(event->state, state));
If perf_child_detach() is called first and perf_event_set_state() is called
afterwards, the parent has already been removed in perf_child_detach(),
so it would fail to account the total_enable_time, which includes the
child event's enable time too.
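
The ordering concern can be sketched with another toy model in C. Again, this is not kernel code and all toy_* names and timestamps are hypothetical. It assumes that the state change is what brings the child's enabled time up to date, and that the detach step is what folds the child's time into the parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_parent { uint64_t child_total_time_enabled; };

struct toy_child {
    struct toy_parent *parent;
    uint64_t total_time_enabled;
    uint64_t tstamp;   /* time of the last accounting update */
    int enabled;       /* enabled time accrues even while not on a cpu */
};

/* Account the time elapsed since the last update. */
static void toy_update_time(struct toy_child *c, uint64_t now)
{
    assert(now >= c->tstamp);
    if (c->enabled)
        c->total_time_enabled += now - c->tstamp;
    c->tstamp = now;
}

/* Assumption: changing the state first brings the time up to date. */
static void toy_set_state_off(struct toy_child *c, uint64_t now)
{
    toy_update_time(c, now);
    c->enabled = 0;
}

/* Assumption: detaching folds the child's time into the parent, once. */
static void toy_child_detach(struct toy_child *c)
{
    if (!c->parent)
        return;
    c->parent->child_total_time_enabled += c->total_time_enabled;
    c->parent = NULL;
}

/* Returns what the parent sees for a child enabled from t=0 to t=150. */
uint64_t toy_parent_total(int set_state_first)
{
    struct toy_parent p = { 0 };
    struct toy_child c = { &p, 0, 0, 1 };

    toy_update_time(&c, 100);          /* last regular update at t=100 */
    if (set_state_first) {
        toy_set_state_off(&c, 150);    /* final interval is accounted */
        toy_child_detach(&c);
    } else {
        toy_child_detach(&c);          /* parent receives stale time */
        toy_set_state_off(&c, 150);
    }
    return p.child_total_time_enabled;
}
```

With the state set first, the parent receives the full 150 units; with the detach first, it only receives the 100 units accounted before the removal.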
Thanks
--
Sincerely,
Yeoreum Yun