From: Luo Gengkun <luogengkun@huaweicloud.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, linux-perf-users@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period
Date: Thu, 5 Sep 2024 14:38:10 +0800
Message-ID: <38ceaabe-0a2c-43f2-8f04-b93215f1f94c@huaweicloud.com>
In-Reply-To: <20240902095054.GD4723@noisy.programming.kicks-ass.net>

On 2024/9/2 17:50, Peter Zijlstra wrote:
> On Sat, Aug 31, 2024 at 07:43:16AM +0000, Luo Gengkun wrote:
>> Perf events has the notion of sampling frequency which is implemented in
>> software by dynamically adjusting the counter period so that samples occur
>> at approximately the target frequency. Period adjustment is done in 2
>> places:
>> - when the counter overflows (and a sample is recorded)
>> - each timer tick, when the event is active
>> The later case is slightly flawed because it assumes that the time since
>> the last timer-tick period adjustment is 1 tick, whereas the event may not
>> have been active (e.g. for a task that is sleeping).
>>
>> Fix by using jiffies to determine the elapsed time in that case.
>>
>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>> include/linux/perf_event.h | 1 +
>> kernel/events/core.c | 12 +++++++++---
>> 2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 1a8942277dda..d29b7cf971a1 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>> * State for freq target events, see __perf_event_overflow() and
>> * perf_adjust_freq_unthr_context().
>> */
>> + u64 freq_tick_stamp;
>> u64 freq_time_stamp;
>> u64 freq_count_stamp;
>> #endif
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index a9395bbfd4aa..183291e0d070 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -55,6 +55,7 @@
>> #include <linux/pgtable.h>
>> #include <linux/buildid.h>
>> #include <linux/task_work.h>
>> +#include <linux/jiffies.h>
>>
>> #include "internal.h"
>>
>> @@ -4120,9 +4121,11 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>> {
>> struct perf_event *event;
>> struct hw_perf_event *hwc;
>> - u64 now, period = TICK_NSEC;
>> + u64 now, period, tick_stamp;
>> s64 delta;
>>
>> + tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
>> +
>> list_for_each_entry(event, event_list, active_list) {
>> if (event->state != PERF_EVENT_STATE_ACTIVE)
>> continue;
>> @@ -4148,6 +4151,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>> */
>> event->pmu->stop(event, PERF_EF_UPDATE);
>>
>> + period = tick_stamp - hwc->freq_tick_stamp;
>> + hwc->freq_tick_stamp = tick_stamp;
>> +
>> now = local64_read(&event->count);
>> delta = now - hwc->freq_count_stamp;
>> hwc->freq_count_stamp = now;
>> @@ -4157,9 +4163,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>> * reload only if value has changed
>> * we have stopped the event so tell that
>> * to perf_adjust_period() to avoid stopping it
>> - * twice.
>> + * twice. And skip if it is the first tick adjust period.
>> */
>> - if (delta > 0)
>> + if (delta > 0 && likely(period != tick_stamp))
>> perf_adjust_period(event, period, delta, false);
>>
>> event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
> This one I'm less happy with.. that condition 'period != tick_stamp'
> doesn't make sense to me. That's only false if hwc->freq_tick_stamp ==
> 0, which it will only be once after event creation. Even through the
> Changelog babbles about event scheduling.
>
> Also, that all should then be written something like:
>
> if (delta > 0 && ...) {
> perf_adjust_period(...);
> adjusted = true;
> }
>
> event->pmu->start(event, adjusted ? PERF_EF_RELOAD : 0);

Thanks for your review! That is a good point.

If freq_tick_stamp is initialized when an event is created
or enabled, the additional condition can be removed as follows:

+static bool is_freq_event(struct perf_event *event)
+{
+ return event->attr.freq && event->attr.sample_freq;
+}
+
static void
perf_event_set_state(struct perf_event *event, enum perf_event_state state)
{
@@ -665,6 +670,12 @@ perf_event_set_state(struct perf_event *event, enum perf_event_state state)
*/
if ((event->state < 0) ^ (state < 0))
perf_event_update_sibling_time(event);
+ /*
+ * Update freq_tick_stamp for freq event just enabled
+ */
+ if (is_freq_event(event) && state == PERF_EVENT_STATE_INACTIVE &&
+ event->state < PERF_EVENT_STATE_INACTIVE)
+ event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
WRITE_ONCE(event->state, state);
}
@@ -4165,7 +4176,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
* to perf_adjust_period() to avoid stopping it
* twice. And skip if it is the first tick adjust period.
*/
- if (delta > 0 && likely(period != tick_stamp))
+ if (delta > 0)
perf_adjust_period(event, period, delta, false);
event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
@@ -12061,8 +12072,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
hwc = &event->hw;
hwc->sample_period = attr->sample_period;
- if (attr->freq && attr->sample_freq)
+ if (is_freq_event(event)) {
hwc->sample_period = 1;
+ if (event->state == PERF_EVENT_STATE_INACTIVE)
+ event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+ }
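
To illustrate the idea outside the kernel, here is a minimal user-space
sketch, not the actual patch. The struct freq_state and both helper names
are hypothetical; only the freq_tick_stamp field mirrors the patch, and
timestamps are passed in as plain nanosecond values instead of coming from
jiffies64_to_nsecs(get_jiffies_64()).

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the freq field added to hw_perf_event. */
struct freq_state {
	uint64_t freq_tick_stamp;	/* ns timestamp of the last tick adjustment */
};

/* Model of the create/enable path: record the current time in the stamp. */
static void freq_state_enable(struct freq_state *fs, uint64_t now_ns)
{
	fs->freq_tick_stamp = now_ns;
}

/*
 * Model of the tick path: return the time elapsed since the previous stamp
 * and advance the stamp, as the patch does with "period" in the tick handler.
 */
static uint64_t freq_state_tick_period(struct freq_state *fs, uint64_t now_ns)
{
	uint64_t period = now_ns - fs->freq_tick_stamp;

	fs->freq_tick_stamp = now_ns;
	return period;
}
```

In this model, enabling at 1000000000 ns and ticking at 1004000000 ns yields
a period of 4000000 ns, so the first tick never sees a period equal to the
raw timestamp and the extra "period != tick_stamp" check is not needed.
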

I'm also wondering whether we need to update freq_count_stamp when the
freq event is enabled, so that both stamps refer to the same "period":

+ if (is_freq_event(event) && state == PERF_EVENT_STATE_INACTIVE &&
+ event->state < PERF_EVENT_STATE_INACTIVE) {
+ event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+ event->hw.freq_count_stamp = local64_read(&event->count);
+ }
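
The concern can be sketched with a small user-space model. This is only an
illustration under assumptions: struct freq_model, struct freq_window and
the helpers are hypothetical names, and the model assumes the event counted
some events after its last tick adjustment and before it was disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two stamps kept per freq event. */
struct freq_model {
	uint64_t tick_stamp;	/* ns timestamp of the last adjustment */
	uint64_t count_stamp;	/* event count at the last adjustment */
};

/* What one tick adjustment would feed into the period calculation. */
struct freq_window {
	uint64_t period_ns;	/* elapsed time */
	int64_t delta;		/* events counted */
};

/* Enable path: always refresh the time stamp, optionally the count stamp. */
static void freq_model_enable(struct freq_model *m, uint64_t now_ns,
			      uint64_t count, bool reset_count)
{
	m->tick_stamp = now_ns;
	if (reset_count)
		m->count_stamp = count;
}

/* Tick path: compute both differences, then advance both stamps. */
static struct freq_window freq_model_tick(struct freq_model *m,
					  uint64_t now_ns, uint64_t count)
{
	struct freq_window w = {
		.period_ns = now_ns - m->tick_stamp,
		.delta = (int64_t)(count - m->count_stamp),
	};

	m->tick_stamp = now_ns;
	m->count_stamp = count;
	return w;
}
```

Suppose the last adjustment left count_stamp at 100, the count reached 150
before the event was disabled, and the event is re-enabled and then ticks
4 ms later with the count at 170. If only the time stamp is refreshed, the
model reports delta = 70 over 4 ms; if the count stamp is refreshed as well,
it reports delta = 20 over the same 4 ms, which matches the events that
actually occurred in that window.
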

Looking forward to your reply!

Thanks.