All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: Eric Lin <eric.lin@sifive.com>,
	mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	palmer@dabbelt.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	greentime.hu@sifive.com, vincent.chen@sifive.com
Subject: Re: [PATCH] perf/core: Add pmu stop before unthrottling to prevent WARNING
Date: Wed, 21 Jun 2023 13:58:11 +0200	[thread overview]
Message-ID: <20230621115811.GD2053369@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CABPqkBRyqsMnNbokBKepkWq1DtzfB0npXySGbKS1T3nQTwmaAw@mail.gmail.com>

On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote:
> On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote:
> >
> > CC: Stephane Eranian
> >
> > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
> > >
> > > Currently, during the perf sampling, if the perf interrupt takes too long,
> > > perf framework will lower the perf_event_max_sample_rate. This will limit
> > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
> > >
> > > Afterward, the perf framework will unthrottle the event in the timer interrupt
> > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> > > *_pmu_start() functions will check the event->hw.state to determine whether this
> > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> > > will be triggered as shown below:
> > >
> > > [ 2110.224723] ------------[ cut here ]------------
> > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> > > [ 2110.225242] Modules linked in:
> > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> > > [ 2110.225574] Hardware name: SiFive (DT)
> > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> > > [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> > > [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> > > [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> > > [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> > > [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> > > [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> > > [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> > > [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> > > [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> > > [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> > > [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
> > >
> > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
> > >
> > > Signed-off-by: Eric Lin <eric.lin@sifive.com>
> > > ---
> > >  kernel/events/core.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index db016e418931..098c875abe88 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > >
> > >                 if (hwc->interrupts == MAX_INTERRUPTS) {
> > >                         hwc->interrupts = 0;
> > > +                       event->pmu->stop(event, 0);
> 
> But how could the event have been stopped with a call to pmu->stop()
> during throttling?

Yeah, Changelog fails to explain how we got to the faulty state -- and
without that we can't judge if the proposed solution actually fixes the
problem or not.

> >
> > >                         perf_log_throttle(event, 1);
> > >                         event->pmu->start(event, 0);
> > >                 }
> > > --
> > > 2.17.1
> > >

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: Eric Lin <eric.lin@sifive.com>,
	mingo@redhat.com, acme@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	namhyung@kernel.org, irogers@google.com, adrian.hunter@intel.com,
	palmer@dabbelt.com, linux-perf-users@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	greentime.hu@sifive.com, vincent.chen@sifive.com
Subject: Re: [PATCH] perf/core: Add pmu stop before unthrottling to prevent WARNING
Date: Wed, 21 Jun 2023 13:58:11 +0200	[thread overview]
Message-ID: <20230621115811.GD2053369@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CABPqkBRyqsMnNbokBKepkWq1DtzfB0npXySGbKS1T3nQTwmaAw@mail.gmail.com>

On Tue, Jun 20, 2023 at 11:18:05PM -0700, Stephane Eranian wrote:
> On Tue, Jun 20, 2023 at 9:25 PM Eric Lin <eric.lin@sifive.com> wrote:
> >
> > CC: Stephane Eranian
> >
> > On Fri, Jun 2, 2023 at 5:49 PM Eric Lin <eric.lin@sifive.com> wrote:
> > >
> > > Currently, during the perf sampling, if the perf interrupt takes too long,
> > > perf framework will lower the perf_event_max_sample_rate. This will limit
> > > the number of samples per timer tick (max_samples_per_tick) and set hwc->interrupts
> > > to MAX_INTERRUPTS within the __perf_event_account_interrupt() function.
> > >
> > > Afterward, the perf framework will unthrottle the event in the timer interrupt
> > > handler, which triggers the driver's *_pmu_start() function. Most of the driver's
> > > *_pmu_start() functions will check the event->hw.state to determine whether this
> > > event has stopped. If the event has not stopped, a WARN_ON_ONCE() warning
> > > will be triggered as shown below:
> > >
> > > [ 2110.224723] ------------[ cut here ]------------
> > > [ 2110.224851] WARNING: CPU: 0 PID: 240 at drivers/perf/riscv_pmu.c:184 riscv_pmu_start+0x7c/0x8e
> > > [ 2110.225242] Modules linked in:
> > > [ 2110.225380] CPU: 0 PID: 240 Comm: ls Not tainted 6.4-rc4-g19d0788e9ef2 #1
> > > [ 2110.225574] Hardware name: SiFive (DT)
> > > [ 2110.225657] epc : riscv_pmu_start+0x7c/0x8e
> > > [ 2110.225834]  ra : riscv_pmu_start+0x28/0x8e
> > > [ 2110.225998] epc : ffffffff80aef864 ra : ffffffff80aef810 sp : ffff8f80004db6f0
> > > [ 2110.226135]  gp : ffffffff81c83750 tp : ffffaf80069f9bc0 t0 : ffff8f80004db6c0
> > > [ 2110.226245]  t1 : 0000000000000000 t2 : 000000000000001f s0 : ffff8f80004db720
> > > [ 2110.226367]  s1 : ffffaf8008ca1068 a0 : 0000ffffffffffff a1 : 0000000000000000
> > > [ 2110.226488]  a2 : 0000000000000001 a3 : 0000000000000870 a4 : 0000000000000000
> > > [ 2110.226605]  a5 : 0000000000000000 a6 : 0000000000000840 a7 : 0000000000000030
> > > [ 2110.226721]  s2 : 0000000000000000 s3 : ffffaf8005165800 s4 : ffffaf800424da00
> > > [ 2110.226838]  s5 : ffffffffffffffff s6 : ffffffff81cc7590 s7 : 0000000000000000
> > > [ 2110.226955]  s8 : 0000000000000006 s9 : 0000000000000001 s10: ffffaf807efbc340
> > > [ 2110.227064]  s11: ffffaf807efbbf00 t3 : ffffaf8006a16028 t4 : 00000000dbfbb796
> > > [ 2110.227180]  t5 : 0000000700000000 t6 : ffffaf8005269870
> > > [ 2110.227277] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > > [ 2110.227407] [<ffffffff80aef864>] riscv_pmu_start+0x7c/0x8e
> > > [ 2110.227622] [<ffffffff80185b56>] perf_adjust_freq_unthr_context+0x15e/0x174
> > > [ 2110.227961] [<ffffffff80188642>] perf_event_task_tick+0x88/0x9c
> > > [ 2110.228235] [<ffffffff800626a8>] scheduler_tick+0xfe/0x27c
> > > [ 2110.228463] [<ffffffff800b5640>] update_process_times+0x9a/0xba
> > > [ 2110.228690] [<ffffffff800c5bd4>] tick_sched_handle+0x32/0x66
> > > [ 2110.229007] [<ffffffff800c5e0c>] tick_sched_timer+0x64/0xb0
> > > [ 2110.229253] [<ffffffff800b5e50>] __hrtimer_run_queues+0x156/0x2f4
> > > [ 2110.229446] [<ffffffff800b6bdc>] hrtimer_interrupt+0xe2/0x1fe
> > > [ 2110.229637] [<ffffffff80acc9e8>] riscv_timer_interrupt+0x38/0x42
> > > [ 2110.229984] [<ffffffff80090a16>] handle_percpu_devid_irq+0x90/0x1d2
> > > [ 2110.230162] [<ffffffff8008a9f4>] generic_handle_domain_irq+0x28/0x36
> > >
> > > To prevent this warning, we should call the driver's *_pmu_stop() function before unthrottling
> > >
> > > Signed-off-by: Eric Lin <eric.lin@sifive.com>
> > > ---
> > >  kernel/events/core.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index db016e418931..098c875abe88 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -4128,6 +4128,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> > >
> > >                 if (hwc->interrupts == MAX_INTERRUPTS) {
> > >                         hwc->interrupts = 0;
> > > +                       event->pmu->stop(event, 0);
> 
> But how could the event have been stopped with a call to pmu->stop()
> during throttling?

Yeah, Changelog fails to explain how we got to the faulty state -- and
without that we can't judge if the proposed solution actually fixes the
problem or not.

> >
> > >                         perf_log_throttle(event, 1);
> > >                         event->pmu->start(event, 0);
> > >                 }
> > > --
> > > 2.17.1
> > >

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2023-06-21 11:58 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-02  9:48 [PATCH] perf/core: Add pmu stop before unthrottling to prevent WARNING Eric Lin
2023-06-02  9:48 ` Eric Lin
2023-06-21  4:24 ` Eric Lin
2023-06-21  4:24   ` Eric Lin
2023-06-21  6:18   ` Stephane Eranian
2023-06-21  6:18     ` Stephane Eranian
2023-06-21 11:58     ` Peter Zijlstra [this message]
2023-06-21 11:58       ` Peter Zijlstra
2023-06-27  9:03       ` Eric Lin
2023-06-27  9:03         ` Eric Lin
2023-06-27  9:08       ` Eric Lin
2023-06-27  9:08         ` Eric Lin
2023-06-27  9:38         ` Peter Zijlstra
2023-06-27  9:38           ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230621115811.GD2053369@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=eranian@google.com \
    --cc=eric.lin@sifive.com \
    --cc=greentime.hu@sifive.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=vincent.chen@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.