linux-perf-users.vger.kernel.org archive mirror
* [PATCH RESEND v2 0/2] Fix small negative period being ignored
@ 2024-04-17 11:54 Luo Gengkun
  2024-04-17 11:54 ` [PATCH RESEND v2 1/2] perf/core: " Luo Gengkun
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Luo Gengkun @ 2024-04-17 11:54 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, adrian.hunter, linux-perf-users, linux-kernel,
	luogengkun

v1 -> v2:
1. Add Reviewed-by for perf/core: Fix small negative period being ignored
2. Add perf/core: Fix incorrect time diff in tick adjust period

Luo Gengkun (2):
  perf/core: Fix small negative period being ignored
  perf/core: Fix incorrect time diff in tick adjust period

 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 21 +++++++++++++++++----
 2 files changed, 18 insertions(+), 4 deletions(-)

-- 
2.34.1



* [PATCH RESEND v2 1/2] perf/core: Fix small negative period being ignored
  2024-04-17 11:54 [PATCH RESEND v2 0/2] Fix small negative period being ignored Luo Gengkun
@ 2024-04-17 11:54 ` Luo Gengkun
  2024-04-17 11:54 ` [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
  2024-05-06 11:28 ` [PATCH RESEND v2 0/2] Fix small negative period being ignored Luo Gengkun
  2 siblings, 0 replies; 7+ messages in thread
From: Luo Gengkun @ 2024-04-17 11:54 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, adrian.hunter, linux-perf-users, linux-kernel,
	luogengkun

In perf_adjust_period() we first calculate the new period and then use
it to calculate delta. However, the low-pass filter handles negative
deltas differently from non-negative ones: when delta is in the range
[-14,-1], delta + 7 lies in [-7,6], so (delta + 7) / 8 truncates to 0.
Small negative adjustments such as -1 and -2 are therefore ignored.
This is unacceptable when the target period is very short, because we
lose a lot of samples.
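
For illustration, here is a stand-alone user-space sketch (not part of
the patch) comparing the two filters. C division truncates toward zero,
which is what makes the old filter swallow small negative deltas:

#include <stdio.h>

/* old filter: for delta in [-14,-1], (delta + 7) / 8 truncates to 0 */
static long filter_old(long delta)
{
	return (delta + 7) / 8;
}

/* new filter: round away from zero so small negative deltas survive */
static long filter_new(long delta)
{
	if (delta >= 0)
		delta += 7;
	else
		delta -= 7;
	return delta / 8;
}

int main(void)
{
	for (long d = -14; d <= 14; d += 7)
		printf("delta=%3ld old=%2ld new=%2ld\n",
		       d, filter_old(d), filter_new(d));
	return 0;
}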

Here are some tests and analyses:
before:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

  # perf script
  ...
  a.out     396   257.956048:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.957891:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.959730:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.961545:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.963355:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.965163:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.966973:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.968785:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.970593:         23 cs:  ffffffff81f4eeec schedul>
  ...

after:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

  # perf script
  ...
  a.out     395    59.338813:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.339707:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.340682:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.341751:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.342799:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.343765:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.344651:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.345539:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.346502:         13 cs:  ffffffff81f4eeec schedul>
  ...

test.c

#include <unistd.h>

int main(void)
{
        for (int i = 0; i < 20000; i++)
                usleep(10);

        return 0;
}

  # time ./a.out
  real    0m1.583s
  user    0m0.040s
  sys     0m0.298s

The above results were collected on an x86-64 QEMU guest with KVM
enabled, using test.c as the test program. The program runs for about
1.5s, so at -F 1000 we should ideally get around 1500 samples, but the
previous algorithm produced only about 500, whereas the modified one
produces about 1400. Furthermore, the new version emits roughly 1
sample per 0.001s, while the previous one emitted 1 sample per 0.002s.
This indicates that the new algorithm is more responsive to small
negative deltas than the old one.

Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
 kernel/events/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 683dc086ef10..cad50d3439f1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4078,7 +4078,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
 	period = perf_calculate_period(event, nsec, count);
 
 	delta = (s64)(period - hwc->sample_period);
-	delta = (delta + 7) / 8; /* low pass filter */
+	if (delta >= 0)
+		delta += 7;
+	else
+		delta -= 7;
+	delta /= 8; /* low pass filter */
 
 	sample_period = hwc->sample_period + delta;
 
-- 
2.34.1



* [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-04-17 11:54 [PATCH RESEND v2 0/2] Fix small negative period being ignored Luo Gengkun
  2024-04-17 11:54 ` [PATCH RESEND v2 1/2] perf/core: " Luo Gengkun
@ 2024-04-17 11:54 ` Luo Gengkun
  2024-06-17 13:42   ` Adrian Hunter
  2024-05-06 11:28 ` [PATCH RESEND v2 0/2] Fix small negative period being ignored Luo Gengkun
  2 siblings, 1 reply; 7+ messages in thread
From: Luo Gengkun @ 2024-04-17 11:54 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, adrian.hunter, linux-perf-users, linux-kernel,
	luogengkun

Adrian found that the number of samples can sometimes be too small,
caused by an unreasonably large sampling period.

 # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
 # perf script
 ...
 test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
 test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
 ...

It seems that the timer interrupt does not arrive every TICK_NSEC when
the system is idle. For example, if the counter increases by n over
2 * TICK_NSEC but perf_adjust_period() is called with n and TICK_NSEC,
the calculated period will be larger than the expected one. What's more,
if the time between overflows is larger than 2 * TICK_NSEC, the period
cannot be tuned by __perf_event_account_interrupt() either. To fix this
problem, perf can calculate the tick interval by itself.
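
As a rough numeric sketch (stand-alone, with HZ=250 and the counts below
assumed for illustration; this is not the kernel's exact
perf_calculate_period() math), attributing two ticks' worth of events to
a single TICK_NSEC doubles the apparent event rate, which inflates the
period picked for the target frequency:

#include <stdio.h>

#define NSEC_PER_SEC	1000000000ULL
#define TICK_NSEC	(NSEC_PER_SEC / 250)	/* assumes HZ=250 */

int main(void)
{
	unsigned long long events  = 30;		/* counter delta since last adjust */
	unsigned long long elapsed = 2 * TICK_NSEC;	/* event was not adjusted last tick */
	unsigned long long freq    = 1000;		/* target samples per second */

	/* rate inferred by charging the whole delta to one tick (old behaviour) */
	unsigned long long rate_wrong = events * NSEC_PER_SEC / TICK_NSEC;
	/* rate using the real elapsed time (what this patch measures) */
	unsigned long long rate_right = events * NSEC_PER_SEC / elapsed;

	/* sample period = events per sample = rate / freq */
	printf("period assuming 1 tick: %llu\n", rate_wrong / freq);	/* prints 7 */
	printf("period from real time:  %llu\n", rate_right / freq);	/* prints 3 */
	return 0;
}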

Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index afb028c54f33..2708f1d0692c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -265,6 +265,7 @@ struct hw_perf_event {
 	 * State for freq target events, see __perf_event_overflow() and
 	 * perf_adjust_freq_unthr_context().
 	 */
+	u64				freq_tick_stamp;
 	u64				freq_time_stamp;
 	u64				freq_count_stamp;
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cad50d3439f1..0f2025d631aa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4112,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 {
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
-	u64 now, period = TICK_NSEC;
+	u64 now, period, tick_stamp;
 	s64 delta;
 
 	/*
@@ -4151,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 		 */
 		event->pmu->stop(event, PERF_EF_UPDATE);
 
+		tick_stamp = perf_clock();
+		period = tick_stamp - hwc->freq_tick_stamp;
+		hwc->freq_tick_stamp = tick_stamp;
+
 		now = local64_read(&event->count);
 		delta = now - hwc->freq_count_stamp;
 		hwc->freq_count_stamp = now;
@@ -4162,8 +4166,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 		 * to perf_adjust_period() to avoid stopping it
 		 * twice.
 		 */
-		if (delta > 0)
-			perf_adjust_period(event, period, delta, false);
+		if (delta > 0) {
+			/*
+			 * Skip the first adjustment: freq_tick_stamp starts at 0
+			 */
+			if (likely(period != tick_stamp))
+				perf_adjust_period(event, period, delta, false);
+		}
 
 		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
 	next:
-- 
2.34.1



* Re: [PATCH RESEND v2 0/2] Fix small negative period being ignored
  2024-04-17 11:54 [PATCH RESEND v2 0/2] Fix small negative period being ignored Luo Gengkun
  2024-04-17 11:54 ` [PATCH RESEND v2 1/2] perf/core: " Luo Gengkun
  2024-04-17 11:54 ` [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
@ 2024-05-06 11:28 ` Luo Gengkun
  2 siblings, 0 replies; 7+ messages in thread
From: Luo Gengkun @ 2024-05-06 11:28 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, adrian.hunter, linux-perf-users, linux-kernel

Ping.

The previous link is:
https://lore.kernel.org/all/20240116083915.2859302-1-luogengkun2@huawei.com/

On 2024/4/17 19:54, Luo Gengkun wrote:
> v1 -> v2:
> 1. Add Reviewed-by for perf/core: Fix small negative period being ignored
> 2. Add perf/core: Fix incorrect time diff in tick adjust period
>
> Luo Gengkun (2):
>    perf/core: Fix small negative period being ignored
>    perf/core: Fix incorrect time diff in tick adjust period
>
>   include/linux/perf_event.h |  1 +
>   kernel/events/core.c       | 21 +++++++++++++++++----
>   2 files changed, 18 insertions(+), 4 deletions(-)
>



* Re: [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-04-17 11:54 ` [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
@ 2024-06-17 13:42   ` Adrian Hunter
  2024-07-29 14:18     ` Luo Gengkun
  0 siblings, 1 reply; 7+ messages in thread
From: Adrian Hunter @ 2024-06-17 13:42 UTC (permalink / raw)
  To: Luo Gengkun, peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, linux-perf-users, linux-kernel

On 17/04/24 14:54, Luo Gengkun wrote:
> Adrian found that the number of samples can sometimes be too small,
> caused by an unreasonably large sampling period.
> 
>  # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
>  [ perf record: Woken up 1 times to write data ]
>  [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
>  # perf script
>  ...
>  test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
>  ...
> 
> It seems that the timer interrupt does not arrive every TICK_NSEC when

No, the period is not adjusted unless the event is active, i.e. scheduled in.
So an event in a task context where the task sleeps a lot will
likely not be adjusted every tick.

> the system is idle. For example, if the counter increases by n over
> 2 * TICK_NSEC but perf_adjust_period() is called with n and TICK_NSEC,
> the calculated period will be larger than the expected one. What's more,
> if the time between overflows is larger than 2 * TICK_NSEC, the period
> cannot be tuned by __perf_event_account_interrupt() either. To fix this
> problem, perf can calculate the tick interval by itself.

Yes, the period can get stuck being too big:

	perf_adjust_freq_unthr_events() calculates a value that is
	too big because it incorrectly assumes the count has
	accumulated only since the last tick, whereas it may have
	accumulated over a much longer interval.

	__perf_event_account_interrupt() has an unexplained limit
	(2*TICK_NSEC) on the count delta, and won't adjust the
	period if that is exceeded.
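
For reference, the check in question, paraphrased from
__perf_event_account_interrupt() in kernel/events/core.c (not an exact
quote), is roughly:

	if (event->attr.freq) {
		u64 now = perf_clock();
		s64 delta = now - hwc->freq_time_stamp;

		hwc->freq_time_stamp = now;

		/* no adjustment at all once more than two ticks have passed */
		if (delta > 0 && delta < 2*TICK_NSEC)
			perf_adjust_period(event, delta, hwc->last_period, true);
	}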

> 
> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 15 ++++++++++++---
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index afb028c54f33..2708f1d0692c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
>  	 * State for freq target events, see __perf_event_overflow() and
>  	 * perf_adjust_freq_unthr_context().
>  	 */
> +	u64				freq_tick_stamp;
>  	u64				freq_time_stamp;
>  	u64				freq_count_stamp;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cad50d3439f1..0f2025d631aa 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4112,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  {
>  	struct perf_event *event;
>  	struct hw_perf_event *hwc;
> -	u64 now, period = TICK_NSEC;
> +	u64 now, period, tick_stamp;
>  	s64 delta;
>  
>  	/*
> @@ -4151,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  		 */
>  		event->pmu->stop(event, PERF_EF_UPDATE);
>  
> +		tick_stamp = perf_clock();

Perhaps jiffies would work just as well, but be
more efficient.

> +		period = tick_stamp - hwc->freq_tick_stamp;
> +		hwc->freq_tick_stamp = tick_stamp;
> +
>  		now = local64_read(&event->count);
>  		delta = now - hwc->freq_count_stamp;
>  		hwc->freq_count_stamp = now;
> @@ -4162,8 +4166,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  		 * to perf_adjust_period() to avoid stopping it
>  		 * twice.
>  		 */
> -		if (delta > 0)
> -			perf_adjust_period(event, period, delta, false);
> +		if (delta > 0) {
> +			/*
> +			 * Skip the first adjustment: freq_tick_stamp starts at 0
> +			 */
> +			if (likely(period != tick_stamp))
> +				perf_adjust_period(event, period, delta, false);
> +		}
>  
>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>  	next:



* Re: [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-06-17 13:42   ` Adrian Hunter
@ 2024-07-29 14:18     ` Luo Gengkun
  2024-08-10  9:26       ` Luo Gengkun
  0 siblings, 1 reply; 7+ messages in thread
From: Luo Gengkun @ 2024-07-29 14:18 UTC (permalink / raw)
  To: Adrian Hunter, peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, linux-perf-users, linux-kernel


On 2024/6/17 21:42, Adrian Hunter wrote:
> On 17/04/24 14:54, Luo Gengkun wrote:
>> Adrian found that the number of samples can sometimes be too small,
>> caused by an unreasonably large sampling period.
>>
>>   # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
>>   [ perf record: Woken up 1 times to write data ]
>>   [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
>>   # perf script
>>   ...
>>   test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
>>   test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
>>   ...
>>
>> It seems that the timer interrupt does not arrive every TICK_NSEC when
> No, the period is not adjusted unless the event is active, i.e. scheduled in.
> So an event in a task context where the task sleeps a lot will
> likely not be adjusted every tick.
Yes, your explanation makes sense.
>> the system is idle. For example, if the counter increases by n over
>> 2 * TICK_NSEC but perf_adjust_period() is called with n and TICK_NSEC,
>> the calculated period will be larger than the expected one. What's more,
>> if the time between overflows is larger than 2 * TICK_NSEC, the period
>> cannot be tuned by __perf_event_account_interrupt() either. To fix this
>> problem, perf can calculate the tick interval by itself.
> Yes, the period can get stuck being too big:
>
> 	perf_adjust_freq_unthr_events() calculates a value that is
> 	too big because it incorrectly assumes the count has
> 	accumulated only since the last tick, whereas it may have
> 	accumulated over a much longer interval.
>
> 	__perf_event_account_interrupt() has an unexplained limit
> 	(2*TICK_NSEC) on the count delta, and won't adjust the
> 	period if that is exceeded.
>
>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>> ---
>>   include/linux/perf_event.h |  1 +
>>   kernel/events/core.c       | 15 ++++++++++++---
>>   2 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index afb028c54f33..2708f1d0692c 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>   	 * State for freq target events, see __perf_event_overflow() and
>>   	 * perf_adjust_freq_unthr_context().
>>   	 */
>> +	u64				freq_tick_stamp;
>>   	u64				freq_time_stamp;
>>   	u64				freq_count_stamp;
>>   #endif
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index cad50d3439f1..0f2025d631aa 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -4112,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>>   {
>>   	struct perf_event *event;
>>   	struct hw_perf_event *hwc;
>> -	u64 now, period = TICK_NSEC;
>> +	u64 now, period, tick_stamp;
>>   	s64 delta;
>>   
>>   	/*
>> @@ -4151,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>>   		 */
>>   		event->pmu->stop(event, PERF_EF_UPDATE);
>>   
>> +		tick_stamp = perf_clock();
> Perhaps jiffies would work just as well, but be
> more efficient.

I tried to use jiffies as shown below.

                tick_stamp = perf_clock();
+               printk("debug jiffies64:%llu, clock:%llu\n",
+                      jiffies64_to_nsecs(get_jiffies_64()), perf_clock());
                period = tick_stamp - hwc->freq_tick_stamp;

But the results look odd and I don't know why; they are pasted below.

[  423.646990] debug jiffies64:4295090788000000, clock:423646990256
[  423.685989] debug jiffies64:4295090827000000, clock:423685989583
[  423.719989] debug jiffies64:4295090861000000, clock:423719989830
[  423.755990] debug jiffies64:4295090897000000, clock:423755990128
[  423.766990] debug jiffies64:4295090908000000, clock:423766989901
[  423.777990] debug jiffies64:4295090918000000, clock:423777989972
[  423.787989] debug jiffies64:4295090929000000, clock:423787989835
[  423.798989] debug jiffies64:4295090940000000, clock:423798989359
[  423.833990] debug jiffies64:4295090975000000, clock:423833990057
[  423.878989] debug jiffies64:4295091020000000, clock:423878989503
[  423.898990] debug jiffies64:4295091040000000, clock:423898990201
[  423.921989] debug jiffies64:4295091063000000, clock:423921989762
[  423.967989] debug jiffies64:4295091109000000, clock:423967989325
[  424.011989] debug jiffies64:4295091153000000, clock:424011989387
[  424.099989] debug jiffies64:4295091241000000, clock:424099989730
[  424.169989] debug jiffies64:4295091311000000, clock:424169989577

perf_clock() gets the right answer, but the jiffies values confuse me.

Looking forward to your reply.

>> +		period = tick_stamp - hwc->freq_tick_stamp;
>> +		hwc->freq_tick_stamp = tick_stamp;
>> +
>>   		now = local64_read(&event->count);
>>   		delta = now - hwc->freq_count_stamp;
>>   		hwc->freq_count_stamp = now;
>> @@ -4162,8 +4166,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>>   		 * to perf_adjust_period() to avoid stopping it
>>   		 * twice.
>>   		 */
>> -		if (delta > 0)
>> -			perf_adjust_period(event, period, delta, false);
>> +		if (delta > 0) {
>> +			/*
>> +			 * Skip the first adjustment: freq_tick_stamp starts at 0
>> +			 */
>> +			if (likely(period != tick_stamp))
>> +				perf_adjust_period(event, period, delta, false);
>> +		}
>>   
>>   		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>>   	next:



* Re: [PATCH RESEND v2 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-07-29 14:18     ` Luo Gengkun
@ 2024-08-10  9:26       ` Luo Gengkun
  0 siblings, 0 replies; 7+ messages in thread
From: Luo Gengkun @ 2024-08-10  9:26 UTC (permalink / raw)
  To: Adrian Hunter, peterz
  Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
	irogers, linux-perf-users, linux-kernel


On 2024/7/29 22:18, Luo Gengkun wrote:
>
> On 2024/6/17 21:42, Adrian Hunter wrote:
>> On 17/04/24 14:54, Luo Gengkun wrote:
>>> Adrian found that the number of samples can sometimes be too small,
>>> caused by an unreasonably large sampling period.
>>>
>>>   # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
>>>   [ perf record: Woken up 1 times to write data ]
>>>   [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
>>>   # perf script
>>>   ...
>>>   test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
>>>   ...
>>>
>>> It seems that the timer interrupt does not arrive every TICK_NSEC when
>> No, the period is not adjusted unless the event is active, i.e.
>> scheduled in.
>> So an event in a task context where the task sleeps a lot will
>> likely not be adjusted every tick.
> Yes, your explanation makes sense.
>>> the system is idle. For example, if the counter increases by n over
>>> 2 * TICK_NSEC but perf_adjust_period() is called with n and TICK_NSEC,
>>> the calculated period will be larger than the expected one. What's more,
>>> if the time between overflows is larger than 2 * TICK_NSEC, the period
>>> cannot be tuned by __perf_event_account_interrupt() either. To fix this
>>> problem, perf can calculate the tick interval by itself.
>> Yes, the period can get stuck being too big:
>>
>>     perf_adjust_freq_unthr_events() calculates a value that is
>>     too big because it incorrectly assumes the count has
>>     accumulated only since the last tick, whereas it may have
>>     accumulated over a much longer interval.
>>
>>     __perf_event_account_interrupt() has an unexplained limit
>>     (2*TICK_NSEC) on the count delta, and won't adjust the
>>     period if that is exceeded.
>>
>>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>>> ---
>>>   include/linux/perf_event.h |  1 +
>>>   kernel/events/core.c       | 15 ++++++++++++---
>>>   2 files changed, 13 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index afb028c54f33..2708f1d0692c 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>>        * State for freq target events, see __perf_event_overflow() and
>>>        * perf_adjust_freq_unthr_context().
>>>        */
>>> +    u64                freq_tick_stamp;
>>>       u64                freq_time_stamp;
>>>       u64                freq_count_stamp;
>>>   #endif
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index cad50d3439f1..0f2025d631aa 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -4112,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct 
>>> perf_event_context *ctx, bool unthrottle)
>>>   {
>>>       struct perf_event *event;
>>>       struct hw_perf_event *hwc;
>>> -    u64 now, period = TICK_NSEC;
>>> +    u64 now, period, tick_stamp;
>>>       s64 delta;
>>>         /*
>>> @@ -4151,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct 
>>> perf_event_context *ctx, bool unthrottle)
>>>            */
>>>           event->pmu->stop(event, PERF_EF_UPDATE);
>>>   +        tick_stamp = perf_clock();
>> Perhaps jiffies would work just as well, but be
>> more efficient.
>
> I tried to use jiffies as shown below.
>
>                 tick_stamp = perf_clock();
> +               printk("debug jiffies64:%llu, clock:%llu\n",
> +                      jiffies64_to_nsecs(get_jiffies_64()), perf_clock());
>                 period = tick_stamp - hwc->freq_tick_stamp;
>
> But the results look odd and I don't know why; they are pasted below.
>
> [  423.646990] debug jiffies64:4295090788000000, clock:423646990256
> [  423.685989] debug jiffies64:4295090827000000, clock:423685989583
> [  423.719989] debug jiffies64:4295090861000000, clock:423719989830
> [  423.755990] debug jiffies64:4295090897000000, clock:423755990128
> [  423.766990] debug jiffies64:4295090908000000, clock:423766989901
> [  423.777990] debug jiffies64:4295090918000000, clock:423777989972
> [  423.787989] debug jiffies64:4295090929000000, clock:423787989835
> [  423.798989] debug jiffies64:4295090940000000, clock:423798989359
> [  423.833990] debug jiffies64:4295090975000000, clock:423833990057
> [  423.878989] debug jiffies64:4295091020000000, clock:423878989503
> [  423.898990] debug jiffies64:4295091040000000, clock:423898990201
> [  423.921989] debug jiffies64:4295091063000000, clock:423921989762
> [  423.967989] debug jiffies64:4295091109000000, clock:423967989325
> [  424.011989] debug jiffies64:4295091153000000, clock:424011989387
> [  424.099989] debug jiffies64:4295091241000000, clock:424099989730
> [  424.169989] debug jiffies64:4295091311000000, clock:424169989577
>
> perf_clock() gets the right answer, but the jiffies values confuse me.
>
> Looking forward to your reply.

Please ignore the previous problem. I tried to replace perf_clock() with
jiffies, and it still works well. The results are pasted below:

root@lgk:~# time ./a.out

real    0m3.459s
user    0m0.144s
sys     0m1.508s
root@lgk:~# perf record -e cs -F 1000  ./a.out

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.132 MB perf.data (3421 samples) ]
root@lgk:~# perf record -e cs -F 1000  ./a.out

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.128 MB perf.data (3336 samples) ]
root@lgk:~# perf record -e cs -F 1000  ./a.out

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.128 MB perf.data (3315 samples) ]
root@lgk:~# perf record -e cs -F 1000  ./a.out

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.125 MB perf.data (3262 samples) ]
root@lgk:~# perf record -e cs -F 1000  ./a.out

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.129 MB perf.data (3344 samples) ]

In addition, it looks like perf_clock() is used everywhere in perf, so I
replaced local_clock() with jiffies, as shown below:

+#include <linux/jiffies.h>

 #include "internal.h"

@@ -578,7 +579,7 @@ void __weak perf_event_print_debug(void)	{ }

 static inline u64 perf_clock(void)
 {
-	return local_clock();
+	return jiffies64_to_nsecs(get_jiffies_64());
 }
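
Since perf_adjust_freq_unthr_context() runs from the scheduler tick, the
interval being measured is a whole number of ticks anyway, so the tick
granularity of jiffies seems to be enough here.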


>
>>> +        period = tick_stamp - hwc->freq_tick_stamp;
>>> +        hwc->freq_tick_stamp = tick_stamp;
>>> +
>>>           now = local64_read(&event->count);
>>>           delta = now - hwc->freq_count_stamp;
>>>           hwc->freq_count_stamp = now;
>>> @@ -4162,8 +4166,13 @@ perf_adjust_freq_unthr_context(struct 
>>> perf_event_context *ctx, bool unthrottle)
>>>            * to perf_adjust_period() to avoid stopping it
>>>            * twice.
>>>            */
>>> -        if (delta > 0)
>>> -            perf_adjust_period(event, period, delta, false);
>>> +        if (delta > 0) {
>>> +            /*
>>> +             * Skip the first adjustment: freq_tick_stamp starts at 0
>>> +             */
>>> +            if (likely(period != tick_stamp))
>>> +                perf_adjust_period(event, period, delta, false);
>>> +        }
>>>             event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>>>       next:

