[PATCH v3 0/2] *** Fix small negative period being ignored ***
From: Luo Gengkun @ 2024-08-10 10:24 UTC
To: peterz
Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
    irogers, adrian.hunter, linux-perf-users, linux-kernel, luogengkun

v2 -> v3:
1. Replace perf_clock with jiffies in perf_adjust_freq_unthr_context

Luo Gengkun (2):
  perf/core: Fix small negative period being ignored
  perf/core: Fix incorrected time diff in tick adjust period

 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 22 ++++++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

--
2.34.1
[PATCH v3 1/2] perf/core: Fix small negative period being ignored
From: Luo Gengkun @ 2024-08-10 10:24 UTC
To: peterz
Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
    irogers, adrian.hunter, linux-perf-users, linux-kernel, luogengkun

In perf_adjust_period we first calculate the period, and then use it to
calculate delta. However, when delta is less than 0, the result deviates
from the case where delta is greater than or equal to 0. For example,
when delta is in the range [-14,-1], delta + 7 falls in [-7,6], so
delta/8 ends up as 0. Therefore, adjustments of -1 and -2 are ignored.
This is unacceptable when the target period is very short, because we
will lose a lot of samples.

Here are some tests and analyses:

before:

 # perf record -e cs -F 1000 ./a.out
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

 # perf script
 ...
 a.out 396 257.956048: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.957891: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.959730: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.961545: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.963355: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.965163: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.966973: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.968785: 23 cs: ffffffff81f4eeec schedul>
 a.out 396 257.970593: 23 cs: ffffffff81f4eeec schedul>
 ...

after:

 # perf record -e cs -F 1000 ./a.out
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

 # perf script
 ...
 a.out 395 59.338813: 11 cs: ffffffff81f4eeec schedul>
 a.out 395 59.339707: 12 cs: ffffffff81f4eeec schedul>
 a.out 395 59.340682: 13 cs: ffffffff81f4eeec schedul>
 a.out 395 59.341751: 13 cs: ffffffff81f4eeec schedul>
 a.out 395 59.342799: 12 cs: ffffffff81f4eeec schedul>
 a.out 395 59.343765: 11 cs: ffffffff81f4eeec schedul>
 a.out 395 59.344651: 11 cs: ffffffff81f4eeec schedul>
 a.out 395 59.345539: 12 cs: ffffffff81f4eeec schedul>
 a.out 395 59.346502: 13 cs: ffffffff81f4eeec schedul>
 ...

test.c

 int main() {
         for (int i = 0; i < 20000; i++)
                 usleep(10);
         return 0;
 }

 # time ./a.out
 real    0m1.583s
 user    0m0.040s
 sys     0m0.298s

The above results were tested on an x86-64 qemu guest with KVM enabled,
using test.c as the test program. Ideally, we should have around 1500
samples, but the previous algorithm produced only about 500, whereas the
modified algorithm now produces about 1400. Furthermore, the new version
shows 1 sample per 0.001s, while the previous one shows 1 sample per
0.002s. This indicates that the new algorithm is more sensitive to small
negative values than the old one.
Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
 kernel/events/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 683dc086ef10..cad50d3439f1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4078,7 +4078,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
         period = perf_calculate_period(event, nsec, count);

         delta = (s64)(period - hwc->sample_period);
-        delta = (delta + 7) / 8; /* low pass filter */
+        if (delta >= 0)
+                delta += 7;
+        else
+                delta -= 7;
+        delta /= 8; /* low pass filter */

         sample_period = hwc->sample_period + delta;

--
2.34.1
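The effect of the low-pass filter change can be reproduced outside the
kernel. The sketch below is plain userspace C, not part of the patch; the
helper names lpf_old()/lpf_new() are made up for illustration.

/* Standalone sketch (illustrative only): compare the old and new
 * low-pass filters for small deltas around zero. */
#include <stdio.h>

static long lpf_old(long delta)
{
        return (delta + 7) / 8;         /* truncates all of [-14,-1] to 0 */
}

static long lpf_new(long delta)
{
        if (delta >= 0)
                delta += 7;
        else
                delta -= 7;
        return delta / 8;               /* symmetric: -1 -> -1, -14 -> -2 */
}

int main(void)
{
        for (long d = -14; d <= 14; d += 2)
                printf("delta=%4ld  old=%3ld  new=%3ld\n",
                       d, lpf_old(d), lpf_new(d));
        return 0;
}

With the old formula every delta in [-14,-1] collapses to 0, so the sample
period can never shrink by small amounts; the symmetric rounding keeps
those small negative adjustments.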
[PATCH v3 2/2] perf/core: Fix incorrected time diff in tick adjust period
From: Luo Gengkun @ 2024-08-10 10:24 UTC
To: peterz
Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
    irogers, adrian.hunter, linux-perf-users, linux-kernel, luogengkun

Adrian found that there is a chance that the number of samples is too
small, which is caused by an unreasonably large sampling period.

 # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
 # perf script
 ...
 test 865 265.377846: 16 cs: ffffffff832e927b schedule+0x2b
 test 865 265.378900: 15 cs: ffffffff832e927b schedule+0x2b
 test 865 265.379845: 14 cs: ffffffff832e927b schedule+0x2b
 test 865 265.380770: 14 cs: ffffffff832e927b schedule+0x2b
 test 865 265.381647: 15 cs: ffffffff832e927b schedule+0x2b
 test 865 265.382638: 16 cs: ffffffff832e927b schedule+0x2b
 test 865 265.383647: 16 cs: ffffffff832e927b schedule+0x2b
 test 865 265.384704: 15 cs: ffffffff832e927b schedule+0x2b
 test 865 265.385649: 14 cs: ffffffff832e927b schedule+0x2b
 test 865 265.386578: 152 cs: ffffffff832e927b schedule+0x2b
 test 865 265.396383: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.406183: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.415839: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.425445: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.435052: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.444708: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.454314: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.463970: 154 cs: ffffffff832e927b schedule+0x2b
 test 865 265.473577: 154 cs: ffffffff832e927b schedule+0x2b
 ...

The reason is that perf_adjust_freq_unthr_events() calculates a value
that is too big because it incorrectly assumes the count has accumulated
only since the last tick, whereas it can have been much longer. To fix
this problem, perf can calculate the tick interval by itself. For
perf_adjust_freq_unthr_events we can use jiffies to calculate the tick
interval more efficiently, as suggested by Adrian.

Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index afb028c54f33..2708f1d0692c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -265,6 +265,7 @@ struct hw_perf_event {
          * State for freq target events, see __perf_event_overflow() and
          * perf_adjust_freq_unthr_context().
          */
+        u64                             freq_tick_stamp;
         u64                             freq_time_stamp;
         u64                             freq_count_stamp;
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cad50d3439f1..309af5520f52 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -55,6 +55,7 @@
 #include <linux/pgtable.h>
 #include <linux/buildid.h>
 #include <linux/task_work.h>
+#include <linux/jiffies.h>

 #include "internal.h"

@@ -4112,7 +4113,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
 {
         struct perf_event *event;
         struct hw_perf_event *hwc;
-        u64 now, period = TICK_NSEC;
+        u64 now, period, tick_stamp;
         s64 delta;

         /*
@@ -4151,6 +4152,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
                  */
                 event->pmu->stop(event, PERF_EF_UPDATE);

+                tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+                period = tick_stamp - hwc->freq_tick_stamp;
+                hwc->freq_tick_stamp = tick_stamp;
+
                 now = local64_read(&event->count);
                 delta = now - hwc->freq_count_stamp;
                 hwc->freq_count_stamp = now;
@@ -4162,8 +4167,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
                  * to perf_adjust_period() to avoid stopping it
                  * twice.
                  */
-                if (delta > 0)
-                        perf_adjust_period(event, period, delta, false);
+                if (delta > 0) {
+                        /*
+                         * we skip first tick adjust period
+                         */
+                        if (likely(period != tick_stamp))
+                                perf_adjust_period(event, period, delta, false);
+                }

                 event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
  next:

--
2.34.1
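The size of the error fixed here can be estimated with a rough userspace
model of the target-period calculation. This is only an approximation of
the idea behind the kernel's perf_calculate_period(); the calc_period()
helper, the tick length, and all numbers are made up for illustration.

/* Rough model: period (events per sample) needed to hit 'freq' samples/sec,
 * given 'count' events observed over 'nsec' nanoseconds. */
#include <stdio.h>

#define NSEC_PER_SEC    1000000000ULL
#define TICK_NSEC       4000000ULL      /* one 4 ms tick, i.e. HZ=250 (assumed) */

static unsigned long long calc_period(unsigned long long count,
                                      unsigned long long nsec,
                                      unsigned long long freq)
{
        return count * NSEC_PER_SEC / (nsec * freq);
}

int main(void)
{
        unsigned long long count = 154;  /* events since the last adjustment */
        unsigned long long freq = 1000;  /* perf record -F 1000 */

        /* Old behaviour: elapsed time hard-coded to one tick. */
        printf("assumed 1 tick (4 ms) : period = %llu\n",
               calc_period(count, TICK_NSEC, freq));

        /* Fixed behaviour: use the real elapsed time, e.g. 100 ms (25 ticks). */
        printf("actual 100 ms elapsed : period = %llu\n",
               calc_period(count, 25 * TICK_NSEC, freq));
        return 0;
}

Under these made-up numbers the one-tick assumption yields a period of 38
while the real elapsed time yields 1, i.e. the computed period is inflated
by roughly the ratio of the real elapsed time to one tick, which is the
kind of overshoot visible in the perf script output above.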
Re: [PATCH v3 2/2] perf/core: Fix incorrected time diff in tick adjust period
From: Adrian Hunter @ 2024-08-21 11:32 UTC
To: Luo Gengkun, peterz
Cc: mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung,
    irogers, linux-perf-users, linux-kernel

On 10/08/24 13:24, Luo Gengkun wrote:
> Adrian found that there is a chance that the number of samples is too
> small, which is caused by an unreasonably large sampling period.

Subject: incorrected -> incorrect

Note, the patch now needs to be re-based.

Also maybe tidy up the commit message e.g.

  perf events has the notion of sampling frequency which is implemented
  in software by dynamically adjusting the counter period so that samples
  occur at approximately the target frequency. Period adjustment is done
  in 2 places:
    - when the counter overflows (and a sample is recorded)
    - each timer tick, when the event is active
  The latter case is slightly flawed because it assumes that the time
  since the last timer-tick period adjustment is 1 tick, whereas the
  event may not have been active (e.g. for a task that is sleeping).

  Fix by using jiffies to determine the elapsed time in that case.

>
> # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
> # perf script
> ...
> test 865 265.377846: 16 cs: ffffffff832e927b schedule+0x2b
> test 865 265.378900: 15 cs: ffffffff832e927b schedule+0x2b
> test 865 265.379845: 14 cs: ffffffff832e927b schedule+0x2b
> test 865 265.380770: 14 cs: ffffffff832e927b schedule+0x2b
> test 865 265.381647: 15 cs: ffffffff832e927b schedule+0x2b
> test 865 265.382638: 16 cs: ffffffff832e927b schedule+0x2b
> test 865 265.383647: 16 cs: ffffffff832e927b schedule+0x2b
> test 865 265.384704: 15 cs: ffffffff832e927b schedule+0x2b
> test 865 265.385649: 14 cs: ffffffff832e927b schedule+0x2b
> test 865 265.386578: 152 cs: ffffffff832e927b schedule+0x2b
> test 865 265.396383: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.406183: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.415839: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.425445: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.435052: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.444708: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.454314: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.463970: 154 cs: ffffffff832e927b schedule+0x2b
> test 865 265.473577: 154 cs: ffffffff832e927b schedule+0x2b
> ...
>
> The reason is that perf_adjust_freq_unthr_events() calculates a value
> that is too big because it incorrectly assumes the count has accumulated
> only since the last tick, whereas it can have been much longer. To fix
> this problem, perf can calculate the tick interval by itself. For
> perf_adjust_freq_unthr_events we can use jiffies to calculate the tick
> interval more efficiently, as suggested by Adrian.
>
> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 16 +++++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index afb028c54f33..2708f1d0692c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
>           * State for freq target events, see __perf_event_overflow() and
>           * perf_adjust_freq_unthr_context().
>           */
> +        u64                             freq_tick_stamp;
>          u64                             freq_time_stamp;
>          u64                             freq_count_stamp;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cad50d3439f1..309af5520f52 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -55,6 +55,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/buildid.h>
>  #include <linux/task_work.h>
> +#include <linux/jiffies.h>
>
>  #include "internal.h"
>
> @@ -4112,7 +4113,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  {
>          struct perf_event *event;
>          struct hw_perf_event *hwc;
> -        u64 now, period = TICK_NSEC;
> +        u64 now, period, tick_stamp;
>          s64 delta;
>
>          /*
> @@ -4151,6 +4152,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>                   */
>                  event->pmu->stop(event, PERF_EF_UPDATE);
>
> +                tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
> +                period = tick_stamp - hwc->freq_tick_stamp;
> +                hwc->freq_tick_stamp = tick_stamp;
> +
>                  now = local64_read(&event->count);
>                  delta = now - hwc->freq_count_stamp;
>                  hwc->freq_count_stamp = now;
> @@ -4162,8 +4167,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>                   * to perf_adjust_period() to avoid stopping it
>                   * twice.
>                   */
> -                if (delta > 0)
> -                        perf_adjust_period(event, period, delta, false);
> +                if (delta > 0) {
> +                        /*
> +                         * we skip first tick adjust period
> +                         */

Could be a single line comment.

> +                        if (likely(period != tick_stamp))

Kernel style is to combine if-statements if possible i.e.

        /* Skip if no delta or it is the first tick adjust period */
        if (delta > 0 && likely(period != tick_stamp))

> +                                perf_adjust_period(event, period, delta, false);
> +                }
>
>                  event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>   next:
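The "skip first tick adjust period" guard discussed in the review can be
illustrated with a small standalone sketch (userspace C with a made-up
timestamp value, not kernel code): on the first pass freq_tick_stamp is
still zero, so the computed interval equals the absolute timestamp rather
than a real tick interval, and the adjustment is skipped once.

/* Illustrative sketch of the first-tick skip; values are made up. */
#include <stdio.h>

int main(void)
{
        unsigned long long freq_tick_stamp = 0;           /* initial event state */
        unsigned long long tick_stamp = 123456789000ULL;  /* ~123 s since boot */
        unsigned long long period;

        period = tick_stamp - freq_tick_stamp;
        freq_tick_stamp = tick_stamp;

        /* With freq_tick_stamp still 0, "period" is the absolute timestamp,
         * not a real interval, so no adjustment should be made this time. */
        if (period != tick_stamp)
                printf("adjust period with interval %llu ns\n", period);
        else
                printf("first tick: interval unknown (%llu ns), skip\n", period);
        return 0;
}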