Re: [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period


From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Adrian Hunter <adrian.hunter@intel.com>,
	Luo Gengkun <luogengkun@huaweicloud.com>,
	peterz@infradead.org
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@kernel.org, irogers@google.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period
Date: Thu, 29 Aug 2024 09:46:59 -0400
Message-ID: <63bcac23-e650-41c8-9c9e-93e258355777@linux.intel.com>
In-Reply-To: <30884874-848a-40ef-9e02-7cdb7b1a029a@intel.com>



On 2024-08-27 9:10 p.m., Adrian Hunter wrote:
> On 27/08/24 23:06, Liang, Kan wrote:
>>
>>
>> On 2024-08-27 1:16 p.m., Adrian Hunter wrote:
>>> On 27/08/24 19:42, Liang, Kan wrote:
>>>>
>>>>
>>>> On 2024-08-21 9:42 a.m., Luo Gengkun wrote:
>>>>> Perf events has the notion of sampling frequency which is implemented in
>>>>> software by dynamically adjusting the counter period so that samples occur
>>>>> at approximately the target frequency.  Period adjustment is done in 2
>>>>> places:
>>>>>  - when the counter overflows (and a sample is recorded)
>>>>>  - each timer tick, when the event is active
>>>>> The latter case is slightly flawed because it assumes that the time since
>>>>> the last timer-tick period adjustment is 1 tick, whereas the event may not
>>>>> have been active (e.g. for a task that is sleeping).
>>>>>
>>>>
>>>> Do you have a real-world example to demonstrate how bad it is if the
>>>> algorithm doesn't take sleep into account?
>>>>
>>>> I'm not sure if introducing such complexity in the critical path is
>>>> worth it.
>>>>
>>>>> Fix by using jiffies to determine the elapsed time in that case.
>>>>>
>>>>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>>>>> ---
>>>>>  include/linux/perf_event.h |  1 +
>>>>>  kernel/events/core.c       | 11 ++++++++---
>>>>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>>>> index 1a8942277dda..d29b7cf971a1 100644
>>>>> --- a/include/linux/perf_event.h
>>>>> +++ b/include/linux/perf_event.h
>>>>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>>>>  	 * State for freq target events, see __perf_event_overflow() and
>>>>>  	 * perf_adjust_freq_unthr_context().
>>>>>  	 */
>>>>> +	u64				freq_tick_stamp;
>>>>>  	u64				freq_time_stamp;
>>>>>  	u64				freq_count_stamp;
>>>>>  #endif
>>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>>> index a9395bbfd4aa..86e80e3ef6ac 100644
>>>>> --- a/kernel/events/core.c
>>>>> +++ b/kernel/events/core.c
>>>>> @@ -55,6 +55,7 @@
>>>>>  #include <linux/pgtable.h>
>>>>>  #include <linux/buildid.h>
>>>>>  #include <linux/task_work.h>
>>>>> +#include <linux/jiffies.h>
>>>>>  
>>>>>  #include "internal.h"
>>>>>  
>>>>> @@ -4120,7 +4121,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>  {
>>>>>  	struct perf_event *event;
>>>>>  	struct hw_perf_event *hwc;
>>>>> -	u64 now, period = TICK_NSEC;
>>>>> +	u64 now, period, tick_stamp;
>>>>>  	s64 delta;
>>>>>  
>>>>>  	list_for_each_entry(event, event_list, active_list) {
>>>>> @@ -4148,6 +4149,10 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>  		 */
>>>>>  		event->pmu->stop(event, PERF_EF_UPDATE);
>>>>>  
>>>>> +		tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
>>>>
>>>> Seems it only needs to retrieve the time once at the beginning, not for
>>>> each event.
>>>>
>>>> There is a perf_clock(). It's better to use it for the consistency.
>>>
>>> perf_clock() is much slower, and for statistical sampling it doesn't
>>> have to be perfect.
>>
>> Because of rdtsc?
> 
> Yes

OK. I'm not too worried about it as long as it's only invoked once in
each tick.

> 
>>
>> If it is only used here, it should be fine. What I'm worried about is
>> that someone may later use it with other timestamps in perf. Anyway, it's
>> not a big deal.
>>
>> The main concern I have is: do we really need the patch?
> 
> The current code is wrong.
> 
>> It seems it can only give us a better guess of the period for the sleep
>> test. Then we have to do all the calculation on each tick.
> 
> Or any workload that sleeps periodically.
> 
> Another option is to remove the period adjust on tick entirely.
> Although arguably the calculation at a tick is better because
> it probably covers a longer period.

Or we may remove the period adjust on overflow.

As I understand it, the period adjust on overflow handles the case
where overflows happen very frequently (less than 2 ticks apart). That
is mainly caused by the very low start period (1).
I'm working on a patch to set a larger start period, which should
minimize the usage of the period adjust on overflow.
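
To give a sense of scale for the start period, here is a minimal
user-space sketch of the relation a frequency-targeted period has to
satisfy: samples per second equal events per second divided by the
period. It is not the kernel code. The name sketch_target_period() and
the numbers in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: the period that would have produced sample_freq
 * samples per second, given that `count` events were observed over
 * `elapsed_ns` nanoseconds:
 *
 *     period = count * 10^9 / (elapsed_ns * sample_freq)
 *
 * Assumption: inputs are small enough that neither product overflows
 * 64 bits. A real implementation has to deal with that.
 */
static uint64_t sketch_target_period(uint64_t count, uint64_t elapsed_ns,
				     uint64_t sample_freq)
{
	uint64_t divisor = elapsed_ns * sample_freq;
	uint64_t period;

	if (divisor == 0)
		return 1;	/* nothing to estimate from, fall back to 1 */

	period = count * SKETCH_NSEC_PER_SEC / divisor;

	return period ? period : 1;	/* a period of 0 is not usable */
}
```

With 4,000,000 events counted over 4 ms and a target of 4000 Hz, the
sketch returns 250000, which is a long way from a start period of 1.
The same call also shows why the elapsed time matters: if those events
were really spread over 40 ms, the result is 25000, so assuming a single
4 ms tick overestimates the period by a factor of 10.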

Anyway, based on the current code, I agree that a new freq_tick_stamp
is required. But the time doesn't need to be read for each event. I
think reading it once at the beginning should be good enough for the
period adjust/estimate algorithm.
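
Roughly what I mean, as a user-space sketch rather than a tested kernel
change. struct sketch_event and sketch_adjust_on_tick() are hypothetical
stand-ins for the hw_perf_event fields and the loop in
perf_adjust_freq_unthr_events(). The caller reads the clock once per
tick and passes it in, and each event still keeps its own stamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-event state touched by the patch. */
struct sketch_event {
	uint64_t freq_tick_stamp;	/* time of the previous tick adjustment, ns */
	uint64_t freq_count_stamp;	/* event count at the previous tick */
	uint64_t count;			/* current event count */
	uint64_t last_period_ns;	/* elapsed time used by the last adjustment */
	unsigned int adjustments;	/* number of adjustments performed */
};

/*
 * One tick over all events. tick_now_ns is read once by the caller, not
 * once per event. The elapsed time is still per event, because each
 * event remembers when it was last adjusted.
 */
static void sketch_adjust_on_tick(struct sketch_event *events, size_t n,
				  uint64_t tick_now_ns)
{
	for (size_t i = 0; i < n; i++) {
		struct sketch_event *ev = &events[i];
		uint64_t prev = ev->freq_tick_stamp;
		uint64_t elapsed = tick_now_ns - prev;
		int64_t delta = (int64_t)(ev->count - ev->freq_count_stamp);

		ev->freq_tick_stamp = tick_now_ns;
		ev->freq_count_stamp = ev->count;

		/*
		 * Skip the first tick of an event: with no previous stamp
		 * (prev == 0) the elapsed time is meaningless.
		 */
		if (delta > 0 && prev != 0) {
			/* stands in for the call to perf_adjust_period() */
			ev->last_period_ns = elapsed;
			ev->adjustments++;
		}
	}
}
```

An event that was not active for several ticks then sees the full gap
as its elapsed time on its next adjustment, without an extra clock read
per event.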

Thanks,
Kan

> 
>>
>> Thanks,
>> Kan
>>>
>>>>
>>>> Thanks,
>>>> Kan
>>>>> +		period = tick_stamp - hwc->freq_tick_stamp;
>>>>> +		hwc->freq_tick_stamp = tick_stamp;
>>>>> +
>>>>>  		now = local64_read(&event->count);
>>>>>  		delta = now - hwc->freq_count_stamp;
>>>>>  		hwc->freq_count_stamp = now;
>>>>> @@ -4157,9 +4162,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>  		 * reload only if value has changed
>>>>>  		 * we have stopped the event so tell that
>>>>>  		 * to perf_adjust_period() to avoid stopping it
>>>>> -		 * twice.
>>>>> +		 * twice. And skip if it is the first tick adjust period.
>>>>>  		 */
>>>>> -		if (delta > 0)
>>>>> +		if (delta > 0 && likely(period != tick_stamp))
>>>>>  			perf_adjust_period(event, period, delta, false);
>>>>>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>>>
>>>
> 
>

Thread overview: 12+ messages
2024-08-21 13:42 [PATCH v4 0/2] Fix perf adjust period Luo Gengkun
2024-08-21 13:42 ` [PATCH v4 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
2024-08-27 16:32   ` Liang, Kan
2024-08-21 13:42 ` [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
2024-08-22 18:23   ` Adrian Hunter
2024-08-27 16:42   ` Liang, Kan
2024-08-27 17:16     ` Adrian Hunter
2024-08-27 20:06       ` Liang, Kan
2024-08-28  1:10         ` Adrian Hunter
2024-08-29 13:46           ` Liang, Kan [this message]
2024-08-29 14:19             ` Luo Gengkun
2024-08-29 14:30               ` Liang, Kan
