Re: [PATCH v5 1/2] perf/core: Fix small negative period being ignored

public inbox for stable@vger.kernel.org

* Re: [PATCH v5 1/2] perf/core: Fix small negative period being ignored
  2024-08-31  7:43 ` [PATCH v5 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
@ 2024-08-31  7:39   ` kernel test robot
  2024-09-02  9:20   ` Peter Zijlstra
  2024-09-05 15:03   ` [tip: perf/core] " tip-bot2 for Luo Gengkun
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-31  7:39 UTC (permalink / raw)
  To: Luo Gengkun; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v5 1/2] perf/core: Fix small negative period being ignored
Link: https://lore.kernel.org/stable/20240831074316.2106159-2-luogengkun%40huaweicloud.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* [PATCH v5 0/2] Fix perf adjust period algorithm
@ 2024-08-31  7:43 Luo Gengkun
  2024-08-31  7:43 ` [PATCH v5 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
  2024-08-31  7:43 ` [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
  0 siblings, 2 replies; 8+ messages in thread
From: Luo Gengkun @ 2024-08-31  7:43 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
	stable, luogengkun

---
Changes in v5:
1. Read the time once at the beginning instead of in each loop iteration
2. Add Reviewed-by tags
Link to v4: https://lore.kernel.org/all/20240821134227.577544-1-luogengkun@huaweicloud.com/

Changes in v4:
1. Rebase the patch 
2. Tidy up the commit message
3. Modify the code style
Link to v3: https://lore.kernel.org/all/20240810102406.1190402-1-luogengkun@huaweicloud.com/

Changes in v3:
1. Replace perf_clock with jiffies in perf_adjust_freq_unthr_context
Link to v2: https://lore.kernel.org/all/20240417115446.2908769-1-luogengkun@huaweicloud.com/

Changes in v2:
1. Add Reviewed-by tag for perf/core: Fix small negative period being ignored
2. Add new patch perf/core: Fix incorrect time diff in tick adjust period
Link to v1: https://lore.kernel.org/all/20240116083915.2859302-1-luogengkun2@huawei.com/
---

Luo Gengkun (2):
  perf/core: Fix small negative period being ignored
  perf/core: Fix incorrect time diff in tick adjust period

 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

-- 
2.34.1


* [PATCH v5 1/2] perf/core: Fix small negative period being ignored
  2024-08-31  7:43 [PATCH v5 0/2] Fix perf adjust period algorithm Luo Gengkun
@ 2024-08-31  7:43 ` Luo Gengkun
  2024-08-31  7:39   ` kernel test robot
                     ` (2 more replies)
  2024-08-31  7:43 ` [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
  1 sibling, 3 replies; 8+ messages in thread
From: Luo Gengkun @ 2024-08-31  7:43 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
	stable, luogengkun

In perf_adjust_period(), we first calculate the period and then use
it to calculate delta. However, when delta is less than 0, the low
pass filter behaves differently from when delta is greater than or
equal to 0. For example, when delta is in the range [-14, -1],
delta + 7 is in the range [-7, 6], and since C integer division
truncates toward zero, delta / 8 is 0. So while a positive delta in
[1, 14] yields an adjustment of 1 or 2, the matching -1 or -2 is
ignored.
This is unacceptable when the target period is very short, because
we will lose a lot of samples.

Here are some tests and analyses:
before:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

  # perf script
  ...
  a.out     396   257.956048:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.957891:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.959730:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.961545:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.963355:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.965163:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.966973:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.968785:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.970593:         23 cs:  ffffffff81f4eeec schedul>
  ...

after:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

  # perf script
  ...
  a.out     395    59.338813:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.339707:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.340682:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.341751:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.342799:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.343765:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.344651:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.345539:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.346502:         13 cs:  ffffffff81f4eeec schedul>
  ...

test.c

#include <unistd.h>

int main(void) {
        for (int i = 0; i < 20000; i++)
                usleep(10);

        return 0;
}

  # time ./a.out
  real    0m1.583s
  user    0m0.040s
  sys     0m0.298s

The above results were obtained on x86-64 QEMU with KVM enabled,
using test.c as the test program. Ideally, we should have around 1500
samples, but the previous algorithm produced only about 500, whereas
the modified algorithm now produces about 1400. Furthermore, the new
version shows 1 sample per 0.001s, while the previous one shows 1
sample per 0.002s. This indicates that the new algorithm is more
sensitive to small negative values than the old algorithm.
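A rough sanity check of these numbers, assuming the whole 1.583 s of wall-clock time is sampled at the requested 1000 Hz and using the first and last timestamps of each perf script excerpt (9 samples, 8 intervals):

```latex
\begin{aligned}
N_{\text{ideal}} &\approx 1.583\,\text{s} \times 1000\,\text{Hz} \approx 1583 \\
\tfrac{518}{1583} &\approx 0.33, \qquad \tfrac{1466}{1583} \approx 0.93 \\
\Delta t_{\text{before}} &\approx \tfrac{257.970593 - 257.956048}{8}\,\text{s} \approx 1.82\,\text{ms} \\
\Delta t_{\text{after}} &\approx \tfrac{59.346502 - 59.338813}{8}\,\text{s} \approx 0.96\,\text{ms}
\end{aligned}
```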

Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
---
 kernel/events/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c973e3c11e03..a9395bbfd4aa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4092,7 +4092,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
 	period = perf_calculate_period(event, nsec, count);
 
 	delta = (s64)(period - hwc->sample_period);
-	delta = (delta + 7) / 8; /* low pass filter */
+	if (delta >= 0)
+		delta += 7;
+	else
+		delta -= 7;
+	delta /= 8; /* low pass filter */
 
 	sample_period = hwc->sample_period + delta;
 
-- 
2.34.1


* [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-08-31  7:43 [PATCH v5 0/2] Fix perf adjust period algorithm Luo Gengkun
  2024-08-31  7:43 ` [PATCH v5 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
@ 2024-08-31  7:43 ` Luo Gengkun
  2024-09-02  9:50   ` Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Luo Gengkun @ 2024-08-31  7:43 UTC (permalink / raw)
  To: peterz
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
	stable, luogengkun

Perf events has the notion of sampling frequency which is implemented in
software by dynamically adjusting the counter period so that samples occur
at approximately the target frequency.  Period adjustment is done in 2
places:
 - when the counter overflows (and a sample is recorded)
 - each timer tick, when the event is active
The latter case is slightly flawed because it assumes that the time since
the last timer-tick period adjustment is 1 tick, whereas the event may not
have been active (e.g. for a task that is sleeping).

Fix by using jiffies to determine the elapsed time in that case.

Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1a8942277dda..d29b7cf971a1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -265,6 +265,7 @@ struct hw_perf_event {
 	 * State for freq target events, see __perf_event_overflow() and
 	 * perf_adjust_freq_unthr_context().
 	 */
+	u64				freq_tick_stamp;
 	u64				freq_time_stamp;
 	u64				freq_count_stamp;
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a9395bbfd4aa..183291e0d070 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -55,6 +55,7 @@
 #include <linux/pgtable.h>
 #include <linux/buildid.h>
 #include <linux/task_work.h>
+#include <linux/jiffies.h>
 
 #include "internal.h"
 
@@ -4120,9 +4121,11 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
 {
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
-	u64 now, period = TICK_NSEC;
+	u64 now, period, tick_stamp;
 	s64 delta;
 
+	tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+
 	list_for_each_entry(event, event_list, active_list) {
 		if (event->state != PERF_EVENT_STATE_ACTIVE)
 			continue;
@@ -4148,6 +4151,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
 		 */
 		event->pmu->stop(event, PERF_EF_UPDATE);
 
+		period = tick_stamp - hwc->freq_tick_stamp;
+		hwc->freq_tick_stamp = tick_stamp;
+
 		now = local64_read(&event->count);
 		delta = now - hwc->freq_count_stamp;
 		hwc->freq_count_stamp = now;
@@ -4157,9 +4163,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
 		 * reload only if value has changed
 		 * we have stopped the event so tell that
 		 * to perf_adjust_period() to avoid stopping it
-		 * twice.
+		 * twice. And skip if it is the first tick adjust period.
 		 */
-		if (delta > 0)
+		if (delta > 0 && likely(period != tick_stamp))
 			perf_adjust_period(event, period, delta, false);
 
 		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
-- 
2.34.1


* Re: [PATCH v5 1/2] perf/core: Fix small negative period being ignored
  2024-08-31  7:43 ` [PATCH v5 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
  2024-08-31  7:39   ` kernel test robot
@ 2024-09-02  9:20   ` Peter Zijlstra
  2024-09-05 15:03   ` [tip: perf/core] " tip-bot2 for Luo Gengkun
  2 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2024-09-02  9:20 UTC (permalink / raw)
  To: Luo Gengkun
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
	stable

On Sat, Aug 31, 2024 at 07:43:15AM +0000, Luo Gengkun wrote:

>  kernel/events/core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c973e3c11e03..a9395bbfd4aa 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4092,7 +4092,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
>  	period = perf_calculate_period(event, nsec, count);
>  
>  	delta = (s64)(period - hwc->sample_period);
> -	delta = (delta + 7) / 8; /* low pass filter */
> +	if (delta >= 0)
> +		delta += 7;
> +	else
> +		delta -= 7;
> +	delta /= 8; /* low pass filter */
>  
>  	sample_period = hwc->sample_period + delta;
>  

OK, that makes sense, Thanks!

* Re: [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-08-31  7:43 ` [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
@ 2024-09-02  9:50   ` Peter Zijlstra
  2024-09-05  6:38     ` Luo Gengkun
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2024-09-02  9:50 UTC (permalink / raw)
  To: Luo Gengkun
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
	stable

On Sat, Aug 31, 2024 at 07:43:16AM +0000, Luo Gengkun wrote:
> Perf events has the notion of sampling frequency which is implemented in
> software by dynamically adjusting the counter period so that samples occur
> at approximately the target frequency.  Period adjustment is done in 2
> places:
>  - when the counter overflows (and a sample is recorded)
>  - each timer tick, when the event is active
> The latter case is slightly flawed because it assumes that the time since
> the last timer-tick period adjustment is 1 tick, whereas the event may not
> have been active (e.g. for a task that is sleeping).
> 
> Fix by using jiffies to determine the elapsed time in that case.
> 
> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 12 +++++++++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1a8942277dda..d29b7cf971a1 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
>  	 * State for freq target events, see __perf_event_overflow() and
>  	 * perf_adjust_freq_unthr_context().
>  	 */
> +	u64				freq_tick_stamp;
>  	u64				freq_time_stamp;
>  	u64				freq_count_stamp;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index a9395bbfd4aa..183291e0d070 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -55,6 +55,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/buildid.h>
>  #include <linux/task_work.h>
> +#include <linux/jiffies.h>
>  
>  #include "internal.h"
>  
> @@ -4120,9 +4121,11 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>  {
>  	struct perf_event *event;
>  	struct hw_perf_event *hwc;
> -	u64 now, period = TICK_NSEC;
> +	u64 now, period, tick_stamp;
>  	s64 delta;
>  
> +	tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
> +
>  	list_for_each_entry(event, event_list, active_list) {
>  		if (event->state != PERF_EVENT_STATE_ACTIVE)
>  			continue;
> @@ -4148,6 +4151,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>  		 */
>  		event->pmu->stop(event, PERF_EF_UPDATE);
>  
> +		period = tick_stamp - hwc->freq_tick_stamp;
> +		hwc->freq_tick_stamp = tick_stamp;
> +
>  		now = local64_read(&event->count);
>  		delta = now - hwc->freq_count_stamp;
>  		hwc->freq_count_stamp = now;
> @@ -4157,9 +4163,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>  		 * reload only if value has changed
>  		 * we have stopped the event so tell that
>  		 * to perf_adjust_period() to avoid stopping it
> -		 * twice.
> +		 * twice. And skip if it is the first tick adjust period.
>  		 */
> -		if (delta > 0)
> +		if (delta > 0 && likely(period != tick_stamp))
>  			perf_adjust_period(event, period, delta, false);
>  
>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);

This one I'm less happy with.. that condition 'period != tick_stamp'
doesn't make sense to me. That's only false if hwc->freq_tick_stamp ==
0, which it will only be once after event creation. Even though the
Changelog babbles about event scheduling.

Also, that all should then be written something like:

	if (delta > 0 && ...) {
		perf_adjust_period(...);
		adjusted = true;
	}

	event->pmu->start(event, adjusted ? PERF_EF_RELOAD : 0);

* Re: [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period
  2024-09-02  9:50   ` Peter Zijlstra
@ 2024-09-05  6:38     ` Luo Gengkun
  0 siblings, 0 replies; 8+ messages in thread
From: Luo Gengkun @ 2024-09-05  6:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, acme, namhyung, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, kan.liang, linux-perf-users, linux-kernel,
	stable


On 2024/9/2 17:50, Peter Zijlstra wrote:
> On Sat, Aug 31, 2024 at 07:43:16AM +0000, Luo Gengkun wrote:
>> Perf events has the notion of sampling frequency which is implemented in
>> software by dynamically adjusting the counter period so that samples occur
>> at approximately the target frequency.  Period adjustment is done in 2
>> places:
>>   - when the counter overflows (and a sample is recorded)
>>   - each timer tick, when the event is active
>> The latter case is slightly flawed because it assumes that the time since
>> the last timer-tick period adjustment is 1 tick, whereas the event may not
>> have been active (e.g. for a task that is sleeping).
>>
>> Fix by using jiffies to determine the elapsed time in that case.
>>
>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
>> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>   include/linux/perf_event.h |  1 +
>>   kernel/events/core.c       | 12 +++++++++---
>>   2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 1a8942277dda..d29b7cf971a1 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>   	 * State for freq target events, see __perf_event_overflow() and
>>   	 * perf_adjust_freq_unthr_context().
>>   	 */
>> +	u64				freq_tick_stamp;
>>   	u64				freq_time_stamp;
>>   	u64				freq_count_stamp;
>>   #endif
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index a9395bbfd4aa..183291e0d070 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -55,6 +55,7 @@
>>   #include <linux/pgtable.h>
>>   #include <linux/buildid.h>
>>   #include <linux/task_work.h>
>> +#include <linux/jiffies.h>
>>   
>>   #include "internal.h"
>>   
>> @@ -4120,9 +4121,11 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>   {
>>   	struct perf_event *event;
>>   	struct hw_perf_event *hwc;
>> -	u64 now, period = TICK_NSEC;
>> +	u64 now, period, tick_stamp;
>>   	s64 delta;
>>   
>> +	tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
>> +
>>   	list_for_each_entry(event, event_list, active_list) {
>>   		if (event->state != PERF_EVENT_STATE_ACTIVE)
>>   			continue;
>> @@ -4148,6 +4151,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>   		 */
>>   		event->pmu->stop(event, PERF_EF_UPDATE);
>>   
>> +		period = tick_stamp - hwc->freq_tick_stamp;
>> +		hwc->freq_tick_stamp = tick_stamp;
>> +
>>   		now = local64_read(&event->count);
>>   		delta = now - hwc->freq_count_stamp;
>>   		hwc->freq_count_stamp = now;
>> @@ -4157,9 +4163,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>   		 * reload only if value has changed
>>   		 * we have stopped the event so tell that
>>   		 * to perf_adjust_period() to avoid stopping it
>> -		 * twice.
>> +		 * twice. And skip if it is the first tick adjust period.
>>   		 */
>> -		if (delta > 0)
>> +		if (delta > 0 && likely(period != tick_stamp))
>>   			perf_adjust_period(event, period, delta, false);
>>   
>>   		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
> This one I'm less happy with.. that condition 'period != tick_stamp'
> doesn't make sense to me. That's only false if hwc->freq_tick_stamp ==
> 0, which it will only be once after event creation. Even though the
> Changelog babbles about event scheduling.
>
> Also, that all should then be written something like:
>
> 	if (delta > 0 && ...) {
> 		perf_adjust_period(...);
> 		adjusted = true;
> 	}
>
> 	event->pmu->start(event, adjusted ? PERF_EF_RELOAD : 0);

Thanks for your review! That is a good point.

If freq_tick_stamp is initialized when an event is created or enabled, the additional condition can be removed as follows:

+static bool is_freq_event(struct perf_event *event)
+{
+       return event->attr.freq && event->attr.sample_freq;
+}
+
  static void
  perf_event_set_state(struct perf_event *event, enum perf_event_state state)
  {
@@ -665,6 +670,12 @@ perf_event_set_state(struct perf_event *event, enum perf_event_state state)
          */
         if ((event->state < 0) ^ (state < 0))
                 perf_event_update_sibling_time(event);
+       /*
+        * Update freq_tick_stamp for freq event just enabled
+        */
+       if (is_freq_event(event) && state == PERF_EVENT_STATE_INACTIVE &&
+                                   event->state < PERF_EVENT_STATE_INACTIVE)
+               event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());

         WRITE_ONCE(event->state, state);
  }
@@ -4165,7 +4176,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
                  * to perf_adjust_period() to avoid stopping it
                  * twice. And skip if it is the first tick adjust period.
                  */
-               if (delta > 0 && likely(period != tick_stamp))
+               if (delta > 0)
                         perf_adjust_period(event, period, delta, false);

                 event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
@@ -12061,8 +12072,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,

         hwc = &event->hw;
         hwc->sample_period = attr->sample_period;
-       if (attr->freq && attr->sample_freq)
+       if (is_freq_event(event)) {
                 hwc->sample_period = 1;
+               if (event->state == PERF_EVENT_STATE_INACTIVE)
+                       event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+       }


I'm also wondering whether we need to update freq_count_stamp when the freq event is enabled, so that both stamps stay on the same "period".

+       if (is_freq_event(event) && state == PERF_EVENT_STATE_INACTIVE &&
+                                   event->state < PERF_EVENT_STATE_INACTIVE) {
+               event->hw.freq_tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+               event->hw.freq_count_stamp = local64_read(&event->count);
+       }

Looking forward to your reply!

Thanks.


* [tip: perf/core] perf/core: Fix small negative period being ignored
  2024-08-31  7:43 ` [PATCH v5 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
  2024-08-31  7:39   ` kernel test robot
  2024-09-02  9:20   ` Peter Zijlstra
@ 2024-09-05 15:03   ` tip-bot2 for Luo Gengkun
  2 siblings, 0 replies; 8+ messages in thread
From: tip-bot2 for Luo Gengkun @ 2024-09-05 15:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Luo Gengkun, Peter Zijlstra (Intel), Adrian Hunter, Kan Liang,
	stable, x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     62c0b1061593d7012292f781f11145b2d46f43ab
Gitweb:        https://git.kernel.org/tip/62c0b1061593d7012292f781f11145b2d46f43ab
Author:        Luo Gengkun <luogengkun@huaweicloud.com>
AuthorDate:    Sat, 31 Aug 2024 07:43:15 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 05 Sep 2024 16:56:13 +02:00

perf/core: Fix small negative period being ignored

In perf_adjust_period(), we first calculate the period and then use
it to calculate delta. However, when delta is less than 0, the low
pass filter behaves differently from when delta is greater than or
equal to 0. For example, when delta is in the range [-14, -1],
delta + 7 is in the range [-7, 6], and since C integer division
truncates toward zero, delta / 8 is 0. So while a positive delta in
[1, 14] yields an adjustment of 1 or 2, the matching -1 or -2 is
ignored. This is unacceptable when the target period is very short,
because we will lose a lot of samples.

Here are some tests and analyses:
before:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

  # perf script
  ...
  a.out     396   257.956048:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.957891:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.959730:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.961545:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.963355:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.965163:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.966973:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.968785:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.970593:         23 cs:  ffffffff81f4eeec schedul>
  ...

after:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

  # perf script
  ...
  a.out     395    59.338813:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.339707:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.340682:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.341751:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.342799:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.343765:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.344651:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.345539:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.346502:         13 cs:  ffffffff81f4eeec schedul>
  ...

test.c

#include <unistd.h>

int main(void) {
        for (int i = 0; i < 20000; i++)
                usleep(10);

        return 0;
}

  # time ./a.out
  real    0m1.583s
  user    0m0.040s
  sys     0m0.298s

The above results were obtained on x86-64 QEMU with KVM enabled,
using test.c as the test program. Ideally, we should have around 1500
samples, but the previous algorithm produced only about 500, whereas
the modified algorithm now produces about 1400. Furthermore, the new
version shows 1 sample per 0.001s, while the previous one shows 1
sample per 0.002s. This indicates that the new algorithm is more
sensitive to small negative values than the old algorithm.

Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240831074316.2106159-2-luogengkun@huaweicloud.com
---
 kernel/events/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4acec97..67e115d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4183,7 +4183,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
 	period = perf_calculate_period(event, nsec, count);
 
 	delta = (s64)(period - hwc->sample_period);
-	delta = (delta + 7) / 8; /* low pass filter */
+	if (delta >= 0)
+		delta += 7;
+	else
+		delta -= 7;
+	delta /= 8; /* low pass filter */
 
 	sample_period = hwc->sample_period + delta;
 

end of thread

Thread overview: 8+ messages
2024-08-31  7:43 [PATCH v5 0/2] Fix perf adjust period algorithm Luo Gengkun
2024-08-31  7:43 ` [PATCH v5 1/2] perf/core: Fix small negative period being ignored Luo Gengkun
2024-08-31  7:39   ` kernel test robot
2024-09-02  9:20   ` Peter Zijlstra
2024-09-05 15:03   ` [tip: perf/core] " tip-bot2 for Luo Gengkun
2024-08-31  7:43 ` [PATCH v5 2/2] perf/core: Fix incorrect time diff in tick adjust period Luo Gengkun
2024-09-02  9:50   ` Peter Zijlstra
2024-09-05  6:38     ` Luo Gengkun
