From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Madhavan Srinivasan <maddy@linux.ibm.com>,
Namhyung Kim <namhyung@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>
Cc: Ian Rogers <irogers@google.com>, Andi Kleen <ak@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Jiri Olsa <jolsa@redhat.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Stephane Eranian <eranian@google.com>,
Paul Mackerras <paulus@samba.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
linuxppc-dev@lists.ozlabs.org, Ingo Molnar <mingo@kernel.org>,
Gabriel Marin <gmx@google.com>
Subject: Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events
Date: Tue, 24 Nov 2020 11:04:31 -0500 [thread overview]
Message-ID: <ba4d80fa-82e5-d3fd-c772-deb12e286de3@linux.intel.com> (raw)
In-Reply-To: <9657dc9f-e1a9-eb7e-8ac2-a108416d5a10@linux.ibm.com>
On 11/24/2020 12:42 AM, Madhavan Srinivasan wrote:
>
> On 11/24/20 10:21 AM, Namhyung Kim wrote:
>> Hello,
>>
>> On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <mpe@ellerman.id.au>
>> wrote:
>>> Namhyung Kim <namhyung@kernel.org> writes:
>>>> Hi Peter and Kan,
>>>>
>>>> (Adding PPC folks)
>>>>
>>>> On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhyung@kernel.org>
>>>> wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan
>>>>> <kan.liang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
>>>>>>> On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:
>>>>>>>
>>>>>>>> - When the large PEBS was introduced (9c964efa4330), the
>>>>>>>> sched_task() should
>>>>>>>> be invoked to flush the PEBS buffer in each context switch.
>>>>>>>> However, The
>>>>>>>> perf_sched_events in account_event() is not updated accordingly.
>>>>>>>> The
>>>>>>>> perf_event_task_sched_* never be invoked for a pure per-CPU
>>>>>>>> context. Only
>>>>>>>> per-task event works.
>>>>>>>> At that time, the perf_pmu_sched_task() is outside of
>>>>>>>> perf_event_context_sched_in/out. It means that perf has to double
>>>>>>>> perf_pmu_disable() for per-task event.
>>>>>>>> - The patch 1 tries to fix broken per-CPU events. The CPU
>>>>>>>> context cannot be
>>>>>>>> retrieved from the task->perf_event_ctxp. So it has to be
>>>>>>>> tracked in the
>>>>>>>> sched_cb_list. Yes, the code is very similar to the original
>>>>>>>> codes, but it
>>>>>>>> is actually the new code for per-CPU events. The optimization
>>>>>>>> for per-task
>>>>>>>> events is still kept.
>>>>>>>> For the case, which has both a CPU context and a task
>>>>>>>> context, yes, the
>>>>>>>> __perf_pmu_sched_task() in this patch is not invoked. Because the
>>>>>>>> sched_task() only need to be invoked once in a context switch. The
>>>>>>>> sched_task() will be eventually invoked in the task context.
>>>>>>> The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB
>>>>>>> and
>>>>>>> only set that for large pebs. Are you sure the other users (Intel
>>>>>>> LBR
>>>>>>> and PowerPC BHRB) don't need it?
>>>>>> I didn't set it for LBR, because the perf_sched_events is always
>>>>>> enabled
>>>>>> for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
>>>>>> for LBR.
>>>>>>
>>>>>> if (has_branch_stack(event))
>>>>>> inc = true;
>>>>>>
>>>>>>> If they indeed do not require the pmu::sched_task() callback for CPU
>>>>>>> events, then I still think the whole perf_sched_cb_{inc,dec}()
>>>>>>> interface
>>>>>> No, LBR requires the pmu::sched_task() callback for CPU events.
>>>>>>
>>>>>> Now, The LBR registers have to be reset in sched in even for CPU
>>>>>> events.
>>>>>>
>>>>>> To fix the shorter LBR callstack issue for CPU events, we also
>>>>>> need to
>>>>>> save/restore LBRs in pmu::sched_task().
>>>>>> https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.liang@linux.intel.com/
>>>>>>
>>>>>>
>>>>>>> is confusing at best.
>>>>>>>
>>>>>>> Can't we do something like this instead?
>>>>>>>
>>>>>> I think the below patch may have two issues.
>>>>>> - PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as
>>>>>> well) now.
>>>>>> - We may disable the large PEBS later if not all PEBS events support
>>>>>> large PEBS. The PMU need a way to notify the generic code to decrease
>>>>>> the nr_sched_task.
>>>>> Any updates on this? I've reviewed and tested Kan's patches
>>>>> and they all look good.
>>>>>
>>>>> Maybe we can talk to PPC folks to confirm the BHRB case?
>>>> Can we move this forward? I saw patch 3/3 also adds
>>>> PERF_ATTACH_SCHED_CB
>>>> for PowerPC too. But it'd be nice if ppc folks can confirm the change.
>>> Sorry I've read the whole thread, but I'm still not entirely sure I
>>> understand the question.
>> Thanks for your time and sorry about not being clear enough.
>>
>> We found per-cpu events are not calling pmu::sched_task()
>> on context switches. So PERF_ATTACH_SCHED_CB was
>> added to indicate the core logic that it needs to invoke the
>> callback.
>>
>> The patch 3/3 added the flag to PPC (for BHRB) with other
>> changes (I think it should be split like in the patch 2/3) and
>> want to get ACKs from the PPC folks.
>
> Sorry for delay.
>
> I guess first it will be better to split the ppc change to a separate
> patch,
Both PPC and X86 invokes the perf_sched_cb_inc() directly. The patch
changes the parameters of the perf_sched_cb_inc(). I think we have to
update the PPC and X86 codes together. Otherwise, there will be a
compile error, if someone may only applies the change for the
perf_sched_cb_inc() but forget to applies the changes in PPC or X86
specific codes.
>
> secondly, we are missing the changes needed in the power_pmu_bhrb_disable()
>
> where perf_sched_cb_dec() needs the "state" to be included.
>
Ah, right. The below patch should fix the issue.
diff --git a/arch/powerpc/perf/core-book3s.c
b/arch/powerpc/perf/core-book3s.c
index bced502f64a1..6756d1602a67 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -391,13 +391,18 @@ static void power_pmu_bhrb_enable(struct
perf_event *event)
static void power_pmu_bhrb_disable(struct perf_event *event)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+ int state = PERF_SCHED_CB_SW_IN;
if (!ppmu->bhrb_nr)
return;
WARN_ON_ONCE(!cpuhw->bhrb_users);
cpuhw->bhrb_users--;
- perf_sched_cb_dec(event->ctx->pmu);
+
+ if (!(event->attach_state & PERF_ATTACH_TASK))
+ state |= PERF_SCHED_CB_CPU;
+
+ perf_sched_cb_dec(event->ctx->pmu, state);
if (!cpuhw->disabled && !cpuhw->bhrb_users) {
/* BHRB cannot be turned off when other
Thanks,
Kan
WARNING: multiple messages have this Message-ID (diff)
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Madhavan Srinivasan <maddy@linux.ibm.com>,
Namhyung Kim <namhyung@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>
Cc: Ian Rogers <irogers@google.com>, Andi Kleen <ak@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
linuxppc-dev@lists.ozlabs.org,
linux-kernel <linux-kernel@vger.kernel.org>,
Stephane Eranian <eranian@google.com>,
Paul Mackerras <paulus@samba.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Jiri Olsa <jolsa@redhat.com>, Ingo Molnar <mingo@kernel.org>,
Gabriel Marin <gmx@google.com>
Subject: Re: [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events
Date: Tue, 24 Nov 2020 11:04:31 -0500 [thread overview]
Message-ID: <ba4d80fa-82e5-d3fd-c772-deb12e286de3@linux.intel.com> (raw)
In-Reply-To: <9657dc9f-e1a9-eb7e-8ac2-a108416d5a10@linux.ibm.com>
On 11/24/2020 12:42 AM, Madhavan Srinivasan wrote:
>
> On 11/24/20 10:21 AM, Namhyung Kim wrote:
>> Hello,
>>
>> On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <mpe@ellerman.id.au>
>> wrote:
>>> Namhyung Kim <namhyung@kernel.org> writes:
>>>> Hi Peter and Kan,
>>>>
>>>> (Adding PPC folks)
>>>>
>>>> On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhyung@kernel.org>
>>>> wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan
>>>>> <kan.liang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
>>>>>>> On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:
>>>>>>>
>>>>>>>> - When the large PEBS was introduced (9c964efa4330), the
>>>>>>>> sched_task() should
>>>>>>>> be invoked to flush the PEBS buffer in each context switch.
>>>>>>>> However, The
>>>>>>>> perf_sched_events in account_event() is not updated accordingly.
>>>>>>>> The
>>>>>>>> perf_event_task_sched_* never be invoked for a pure per-CPU
>>>>>>>> context. Only
>>>>>>>> per-task event works.
>>>>>>>> At that time, the perf_pmu_sched_task() is outside of
>>>>>>>> perf_event_context_sched_in/out. It means that perf has to double
>>>>>>>> perf_pmu_disable() for per-task event.
>>>>>>>> - The patch 1 tries to fix broken per-CPU events. The CPU
>>>>>>>> context cannot be
>>>>>>>> retrieved from the task->perf_event_ctxp. So it has to be
>>>>>>>> tracked in the
>>>>>>>> sched_cb_list. Yes, the code is very similar to the original
>>>>>>>> codes, but it
>>>>>>>> is actually the new code for per-CPU events. The optimization
>>>>>>>> for per-task
>>>>>>>> events is still kept.
>>>>>>>> For the case, which has both a CPU context and a task
>>>>>>>> context, yes, the
>>>>>>>> __perf_pmu_sched_task() in this patch is not invoked. Because the
>>>>>>>> sched_task() only need to be invoked once in a context switch. The
>>>>>>>> sched_task() will be eventually invoked in the task context.
>>>>>>> The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB
>>>>>>> and
>>>>>>> only set that for large pebs. Are you sure the other users (Intel
>>>>>>> LBR
>>>>>>> and PowerPC BHRB) don't need it?
>>>>>> I didn't set it for LBR, because the perf_sched_events is always
>>>>>> enabled
>>>>>> for LBR. But, yes, we should explicitly set the PERF_ATTACH_SCHED_CB
>>>>>> for LBR.
>>>>>>
>>>>>> if (has_branch_stack(event))
>>>>>> inc = true;
>>>>>>
>>>>>>> If they indeed do not require the pmu::sched_task() callback for CPU
>>>>>>> events, then I still think the whole perf_sched_cb_{inc,dec}()
>>>>>>> interface
>>>>>> No, LBR requires the pmu::sched_task() callback for CPU events.
>>>>>>
>>>>>> Now, The LBR registers have to be reset in sched in even for CPU
>>>>>> events.
>>>>>>
>>>>>> To fix the shorter LBR callstack issue for CPU events, we also
>>>>>> need to
>>>>>> save/restore LBRs in pmu::sched_task().
>>>>>> https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.liang@linux.intel.com/
>>>>>>
>>>>>>
>>>>>>> is confusing at best.
>>>>>>>
>>>>>>> Can't we do something like this instead?
>>>>>>>
>>>>>> I think the below patch may have two issues.
>>>>>> - PERF_ATTACH_SCHED_CB is required for LBR (maybe PowerPC BHRB as
>>>>>> well) now.
>>>>>> - We may disable the large PEBS later if not all PEBS events support
>>>>>> large PEBS. The PMU need a way to notify the generic code to decrease
>>>>>> the nr_sched_task.
>>>>> Any updates on this? I've reviewed and tested Kan's patches
>>>>> and they all look good.
>>>>>
>>>>> Maybe we can talk to PPC folks to confirm the BHRB case?
>>>> Can we move this forward? I saw patch 3/3 also adds
>>>> PERF_ATTACH_SCHED_CB
>>>> for PowerPC too. But it'd be nice if ppc folks can confirm the change.
>>> Sorry I've read the whole thread, but I'm still not entirely sure I
>>> understand the question.
>> Thanks for your time and sorry about not being clear enough.
>>
>> We found per-cpu events are not calling pmu::sched_task()
>> on context switches. So PERF_ATTACH_SCHED_CB was
>> added to indicate the core logic that it needs to invoke the
>> callback.
>>
>> The patch 3/3 added the flag to PPC (for BHRB) with other
>> changes (I think it should be split like in the patch 2/3) and
>> want to get ACKs from the PPC folks.
>
> Sorry for delay.
>
> I guess first it will be better to split the ppc change to a separate
> patch,
Both PPC and X86 invokes the perf_sched_cb_inc() directly. The patch
changes the parameters of the perf_sched_cb_inc(). I think we have to
update the PPC and X86 codes together. Otherwise, there will be a
compile error, if someone may only applies the change for the
perf_sched_cb_inc() but forget to applies the changes in PPC or X86
specific codes.
>
> secondly, we are missing the changes needed in the power_pmu_bhrb_disable()
>
> where perf_sched_cb_dec() needs the "state" to be included.
>
Ah, right. The below patch should fix the issue.
diff --git a/arch/powerpc/perf/core-book3s.c
b/arch/powerpc/perf/core-book3s.c
index bced502f64a1..6756d1602a67 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -391,13 +391,18 @@ static void power_pmu_bhrb_enable(struct
perf_event *event)
static void power_pmu_bhrb_disable(struct perf_event *event)
{
struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+ int state = PERF_SCHED_CB_SW_IN;
if (!ppmu->bhrb_nr)
return;
WARN_ON_ONCE(!cpuhw->bhrb_users);
cpuhw->bhrb_users--;
- perf_sched_cb_dec(event->ctx->pmu);
+
+ if (!(event->attach_state & PERF_ATTACH_TASK))
+ state |= PERF_SCHED_CB_CPU;
+
+ perf_sched_cb_dec(event->ctx->pmu, state);
if (!cpuhw->disabled && !cpuhw->bhrb_users) {
/* BHRB cannot be turned off when other
Thanks,
Kan
next prev parent reply other threads:[~2020-11-24 16:07 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-06 21:29 [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events kan.liang
2020-11-06 21:29 ` [PATCH 2/3] perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS kan.liang
2020-11-06 21:29 ` [PATCH 3/3] perf: Optimize sched_task() in a context switch kan.liang
2020-11-09 9:52 ` [PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events Peter Zijlstra
2020-11-09 11:04 ` Peter Zijlstra
2020-11-09 14:49 ` Liang, Kan
2020-11-09 17:33 ` Peter Zijlstra
2020-11-09 19:52 ` Liang, Kan
2020-11-09 17:40 ` Peter Zijlstra
2020-11-11 16:25 ` Peter Zijlstra
2020-11-11 19:54 ` Liang, Kan
2020-11-17 5:01 ` Namhyung Kim
2020-11-20 11:24 ` Namhyung Kim
2020-11-20 11:24 ` Namhyung Kim
2020-11-23 11:00 ` Michael Ellerman
2020-11-23 11:00 ` Michael Ellerman
2020-11-24 4:51 ` Namhyung Kim
2020-11-24 4:51 ` Namhyung Kim
2020-11-24 5:42 ` Madhavan Srinivasan
2020-11-24 5:42 ` Madhavan Srinivasan
2020-11-24 16:04 ` Liang, Kan [this message]
2020-11-24 16:04 ` Liang, Kan
2020-11-25 8:12 ` Michael Ellerman
2020-11-25 8:12 ` Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ba4d80fa-82e5-d3fd-c772-deb12e286de3@linux.intel.com \
--to=kan.liang@linux.intel.com \
--cc=acme@kernel.org \
--cc=ak@linux.intel.com \
--cc=eranian@google.com \
--cc=gmx@google.com \
--cc=irogers@google.com \
--cc=jolsa@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=maddy@linux.ibm.com \
--cc=mingo@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=namhyung@kernel.org \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.