From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Luo Gengkun <luogengkun@huaweicloud.com>,
Adrian Hunter <adrian.hunter@intel.com>,
peterz@infradead.org
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, irogers@google.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period
Date: Thu, 29 Aug 2024 10:30:08 -0400
Message-ID: <11099a9e-7006-4372-82b5-f35232a63c8c@linux.intel.com>
In-Reply-To: <eb37b77d-58ed-4b79-a942-7c249cb5050b@huaweicloud.com>
On 2024-08-29 10:19 a.m., Luo Gengkun wrote:
>
> On 2024/8/29 21:46, Liang, Kan wrote:
>>
>> On 2024-08-27 9:10 p.m., Adrian Hunter wrote:
>>> On 27/08/24 23:06, Liang, Kan wrote:
>>>>
>>>> On 2024-08-27 1:16 p.m., Adrian Hunter wrote:
>>>>> On 27/08/24 19:42, Liang, Kan wrote:
>>>>>>
>>>>>> On 2024-08-21 9:42 a.m., Luo Gengkun wrote:
>>>>>>> Perf events has the notion of sampling frequency which is implemented in
>>>>>>> software by dynamically adjusting the counter period so that samples occur
>>>>>>> at approximately the target frequency. Period adjustment is done in 2
>>>>>>> places:
>>>>>>> - when the counter overflows (and a sample is recorded)
>>>>>>> - each timer tick, when the event is active
>>>>>>> The latter case is slightly flawed because it assumes that the time since
>>>>>>> the last timer-tick period adjustment is 1 tick, whereas the event may not
>>>>>>> have been active (e.g. for a task that is sleeping).
>>>>>>>
>>>>>> Do you have a real-world example to demonstrate how bad it is if the
>>>>>> algorithm doesn't take sleep into account?
>>>>>>
>>>>>> I'm not sure if introducing such complexity in the critical path is
>>>>>> worth it.
>>>>>>
>>>>>>> Fix by using jiffies to determine the elapsed time in that case.
>>>>>>>
>>>>>>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>>>>>>> ---
>>>>>>> include/linux/perf_event.h | 1 +
>>>>>>> kernel/events/core.c | 11 ++++++++---
>>>>>>> 2 files changed, 9 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>>>>>> index 1a8942277dda..d29b7cf971a1 100644
>>>>>>> --- a/include/linux/perf_event.h
>>>>>>> +++ b/include/linux/perf_event.h
>>>>>>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>>>>>> * State for freq target events, see __perf_event_overflow() and
>>>>>>> * perf_adjust_freq_unthr_context().
>>>>>>> */
>>>>>>> + u64 freq_tick_stamp;
>>>>>>> u64 freq_time_stamp;
>>>>>>> u64 freq_count_stamp;
>>>>>>> #endif
>>>>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>>>>> index a9395bbfd4aa..86e80e3ef6ac 100644
>>>>>>> --- a/kernel/events/core.c
>>>>>>> +++ b/kernel/events/core.c
>>>>>>> @@ -55,6 +55,7 @@
>>>>>>> #include <linux/pgtable.h>
>>>>>>> #include <linux/buildid.h>
>>>>>>> #include <linux/task_work.h>
>>>>>>> +#include <linux/jiffies.h>
>>>>>>> #include "internal.h"
>>>>>>> @@ -4120,7 +4121,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>>> {
>>>>>>> struct perf_event *event;
>>>>>>> struct hw_perf_event *hwc;
>>>>>>> - u64 now, period = TICK_NSEC;
>>>>>>> + u64 now, period, tick_stamp;
>>>>>>> s64 delta;
>>>>>>> list_for_each_entry(event, event_list, active_list) {
>>>>>>> @@ -4148,6 +4149,10 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>>> */
>>>>>>> event->pmu->stop(event, PERF_EF_UPDATE);
>>>>>>> + tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
>>>>>> It seems the time only needs to be retrieved once at the beginning,
>>>>>> not for each event.
>>>>>>
>>>>>> There is a perf_clock(). It's better to use it for consistency.
>>>>> perf_clock() is much slower, and for statistical sampling it doesn't
>>>>> have to be perfect.
>>>> Because of rdtsc?
>>> Yes
>> OK. I'm not too worried about it as long as it's only invoked once in
>> each tick.
>>
>>>> If it is only used here, it should be fine. What I'm worried about is
>>>> that someone may later mix it with other timestamps in perf. Anyway,
>>>> it's not a big deal.
>>>>
>>>> My main concern is whether we really need the patch.
>>> The current code is wrong.
>>>
>>>> It seems it can only give us a better guess of the period for the sleep
>>>> test. Then we have to do all the calculation on each tick.
>>> Or any workload that sleeps periodically.
>>>
>>> Another option is to remove the period adjust on tick entirely.
>>> Although arguably the calculation at a tick is better because
>>> it probably covers a longer period.
>> Or we may remove the period adjust on overflow.
>>
>> As I understand it, the period adjust on overflow handles the case
>> where overflows happen very frequently (< 2 ticks). It is mainly
>> caused by the very low start period (1).
>> I'm working on a patch to set a larger start period, which should
>> minimize the usage of the period adjust on overflow.
> I think it's hard to choose a good initial period; it may require a lot
> of testing. Good luck.
>>
>> Anyway, based on the current code, I agree that adding a new
>> freq_tick_stamp should be required. But it doesn't need to read the time
>> for each event. I think reading the time once at the beginning should be
>> good enough for the period adjust/estimate algorithm.
>
> That's a good idea. Do you think it's appropriate to move this line here?
>
>
> Thanks,
>
> Gengkun
>
> @@ -4126,6 +4126,8 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>
> raw_spin_lock(&ctx->lock);
>
> + tick_stamp = jiffies64_to_nsecs(get_jiffies_64());

Yes, the place looks good.
I'm still not a big fan of jiffies. Anyway, I guess we can leave it to
Peter to decide.

Thanks,
Kan

> +
> list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
> if (event->state != PERF_EVENT_STATE_ACTIVE)
> continue;
> @@ -4152,7 +4154,6 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
> */
> event->pmu->stop(event, PERF_EF_UPDATE);
>
> - tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
> period = tick_stamp - hwc->freq_tick_stamp;
> hwc->freq_tick_stamp = tick_stamp;
>
>>
>> Thanks,
>> Kan
>>
>>>> Thanks,
>>>> Kan
>>>>>> Thanks,
>>>>>> Kan
>>>>>>> + period = tick_stamp - hwc->freq_tick_stamp;
>>>>>>> + hwc->freq_tick_stamp = tick_stamp;
>>>>>>> +
>>>>>>> now = local64_read(&event->count);
>>>>>>> delta = now - hwc->freq_count_stamp;
>>>>>>> hwc->freq_count_stamp = now;
>>>>>>> @@ -4157,9 +4162,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>>> * reload only if value has changed
>>>>>>> * we have stopped the event so tell that
>>>>>>> * to perf_adjust_period() to avoid stopping it
>>>>>>> - * twice.
>>>>>>> + * twice. And skip if it is the first tick adjust period.
>>>>>>> */
>>>>>>> - if (delta > 0)
>>>>>>> + if (delta > 0 && likely(period != tick_stamp))
>>>>>>> perf_adjust_period(event, period, delta, false);
>>>>>>> event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>>>>>
>>>
>
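
For readers following the thread, here is a minimal user-space sketch of the idea under discussion: take one timestamp per tick, use the elapsed time since the previous tick adjustment instead of assuming exactly one tick, and skip the first adjustment when no previous stamp exists. It is not the kernel code. `struct freq_state`, `estimate_period()`, `tick_adjust()` and the `HZ` value of 250 are assumptions made up for illustration, and the period formula is a plain integer simplification of what the kernel computes in fixed point.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define HZ           250ULL                 /* assumed tick rate for this sketch */
#define TICK_NSEC    (NSEC_PER_SEC / HZ)    /* 4,000,000 ns per tick at HZ=250 */

/* Hypothetical stand-in for the freq_* fields of struct hw_perf_event. */
struct freq_state {
	uint64_t freq_tick_stamp;   /* time of the last tick adjustment, in ns */
	uint64_t freq_count_stamp;  /* event count at the last tick adjustment */
};

/*
 * Period that would give freq_hz samples per second if `count` events
 * occurred in `nsec` nanoseconds. Simplified: no overflow protection,
 * so keep count * NSEC_PER_SEC below 2^64.
 */
static uint64_t estimate_period(uint64_t count, uint64_t nsec, uint64_t freq_hz)
{
	assert(nsec > 0 && freq_hz > 0);
	uint64_t period = count * NSEC_PER_SEC / (nsec * freq_hz);
	return period ? period : 1;
}

/*
 * One tick adjustment for one event. `tick_stamp` is read once per tick by
 * the caller and shared by all events. Returns the new period estimate, or
 * 0 when the adjustment is skipped (first tick, or no events counted).
 */
static uint64_t tick_adjust(struct freq_state *st, uint64_t tick_stamp,
			    uint64_t now_count, uint64_t freq_hz)
{
	uint64_t elapsed = tick_stamp - st->freq_tick_stamp;
	uint64_t delta = now_count - st->freq_count_stamp;
	int first = (st->freq_tick_stamp == 0); /* no previous stamp yet */

	st->freq_tick_stamp = tick_stamp;
	st->freq_count_stamp = now_count;

	if (first || delta == 0 || elapsed == 0)
		return 0;
	return estimate_period(delta, elapsed, freq_hz);
}
```

With these assumptions, an event that counts 4,000,000 events across 10 ticks (for example, one active tick after nine ticks asleep) at a 1000 Hz target gets a period of 100,000 from the elapsed-time calculation, whereas assuming a single tick would give 1,000,000.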