Date: Thu, 29 Aug 2024 22:19:45 +0800
X-Mailing-List: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period
To: "Liang, Kan" , Adrian Hunter , peterz@infradead.org
Cc: mingo@redhat.com, acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com, alexander.shishkin@linux.intel.com, jolsa@kernel.org, irogers@google.com, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20240821134227.577544-1-luogengkun@huaweicloud.com> <20240821134227.577544-3-luogengkun@huaweicloud.com> <30884874-848a-40ef-9e02-7cdb7b1a029a@intel.com> <63bcac23-e650-41c8-9c9e-93e258355777@linux.intel.com>
From: Luo Gengkun
In-Reply-To: <63bcac23-e650-41c8-9c9e-93e258355777@linux.intel.com>
On 2024/8/29 21:46, Liang, Kan wrote:
>
> On 2024-08-27 9:10 p.m., Adrian Hunter wrote:
>> On 27/08/24 23:06, Liang, Kan wrote:
>>>
>>> On 2024-08-27 1:16 p.m., Adrian Hunter wrote:
>>>> On 27/08/24 19:42, Liang, Kan wrote:
>>>>>
>>>>> On 2024-08-21 9:42 a.m., Luo Gengkun wrote:
>>>>>> Perf events has the notion of sampling frequency which is implemented in
>>>>>> software by dynamically adjusting the counter period so that samples occur
>>>>>> at approximately the target frequency. Period adjustment is done in 2
>>>>>> places:
>>>>>> - when the counter overflows (and a sample is recorded)
>>>>>> - each timer tick, when the event is active
>>>>>> The latter case is slightly flawed because it assumes that the time since
>>>>>> the last timer-tick period adjustment is 1 tick, whereas the event may not
>>>>>> have been active (e.g. for a task that is sleeping).
>>>>>>
>>>>> Do you have a real-world example to demonstrate how bad it is if the
>>>>> algorithm doesn't take sleep into account?
>>>>>
>>>>> I'm not sure if introducing such complexity in the critical path is
>>>>> worth it.
>>>>>
>>>>>> Fix by using jiffies to determine the elapsed time in that case.
>>>>>>
>>>>>> Signed-off-by: Luo Gengkun
>>>>>> ---
>>>>>>  include/linux/perf_event.h |  1 +
>>>>>>  kernel/events/core.c       | 11 ++++++++---
>>>>>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>>>>> index 1a8942277dda..d29b7cf971a1 100644
>>>>>> --- a/include/linux/perf_event.h
>>>>>> +++ b/include/linux/perf_event.h
>>>>>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>>>>>  	 * State for freq target events, see __perf_event_overflow() and
>>>>>>  	 * perf_adjust_freq_unthr_context().
>>>>>>  	 */
>>>>>> +	u64				freq_tick_stamp;
>>>>>>  	u64				freq_time_stamp;
>>>>>>  	u64				freq_count_stamp;
>>>>>>  #endif
>>>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>>>> index a9395bbfd4aa..86e80e3ef6ac 100644
>>>>>> --- a/kernel/events/core.c
>>>>>> +++ b/kernel/events/core.c
>>>>>> @@ -55,6 +55,7 @@
>>>>>>  #include
>>>>>>  #include
>>>>>>  #include
>>>>>> +#include
>>>>>>
>>>>>>  #include "internal.h"
>>>>>>
>>>>>> @@ -4120,7 +4121,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>>  {
>>>>>>  	struct perf_event *event;
>>>>>>  	struct hw_perf_event *hwc;
>>>>>> -	u64 now, period = TICK_NSEC;
>>>>>> +	u64 now, period, tick_stamp;
>>>>>>  	s64 delta;
>>>>>>
>>>>>>  	list_for_each_entry(event, event_list, active_list) {
>>>>>> @@ -4148,6 +4149,10 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>>  		 */
>>>>>>  		event->pmu->stop(event, PERF_EF_UPDATE);
>>>>>>
>>>>>> +		tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
>>>>> Seems it only needs to retrieve the time once at the beginning, not for
>>>>> each event.
>>>>>
>>>>> There is a perf_clock(). It's better to use it for the consistency.
>>>> perf_clock() is much slower, and for statistical sampling it doesn't
>>>> have to be perfect.
>>> Because of rdtsc?
>> Yes
> OK. I'm not worry about it too much as long as it's only invoked once in
> each tick.
>
>>> If it is only used here, it should be fine. What I'm worried about is
>>> that someone may use it with other timestamp in perf later. Anyway, it's
>>> not a big deal.
>>>
>>> The main concern I have is that do we really need the patch?
>> The current code is wrong.
>>
>>> It seems can only bring us a better guess of the period for the sleep
>>> test. Then we have to do all the calculate for each tick.
>> Or any workload that sleeps periodically.
>>
>> Another option is to remove the period adjust on tick entirely.
>> Although arguably the calculation at a tick is better because
>> it probably covers a longer period.
> Or we may remove the period adjust on overflow.
>
> As my understanding, the period adjust on overflow is to handle the case
> while the overflow happens very frequently (< 2 ticks). It is mainly
> caused by the very low start period (1).
> I'm working on a patch to set a larger start period, which should
> minimize the usage of the period adjust on overflow.

I think it's hard to choose a good initial period; it may require a lot of
testing. Good luck.

>
> Anyway, based on the current code, I agree that adding a new
> freq_tick_stamp should be required. But it doesn't need to read the time
> for each event. I think reading the time once at the beginning should be
> good enough for the period adjust/estimate algorithm.

That's a good idea. Do you think it's appropriate to move this line here?

Thanks,
Gengkun

@@ -4126,6 +4126,8 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
        raw_spin_lock(&ctx->lock);
+       tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+
        list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
                if (event->state != PERF_EVENT_STATE_ACTIVE)
                        continue;
@@ -4152,7 +4154,6 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
                 */
                event->pmu->stop(event, PERF_EF_UPDATE);
-               tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
                period = tick_stamp - hwc->freq_tick_stamp;
                hwc->freq_tick_stamp = tick_stamp;

>
> Thanks,
> Kan
>
>>> Thanks,
>>> Kan
>>>>> Thanks,
>>>>> Kan
>>>>>> +		period = tick_stamp - hwc->freq_tick_stamp;
>>>>>> +		hwc->freq_tick_stamp = tick_stamp;
>>>>>> +
>>>>>>  		now = local64_read(&event->count);
>>>>>>  		delta = now - hwc->freq_count_stamp;
>>>>>>  		hwc->freq_count_stamp = now;
>>>>>> @@ -4157,9 +4162,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>>  		 * reload only if value has changed
>>>>>>  		 * we have stopped the event so tell that
>>>>>>  		 * to perf_adjust_period() to avoid stopping it
>>>>>> -		 * twice.
>>>>>> +		 * twice. And skip if it is the first tick adjust period.
>>>>>>  		 */
>>>>>> -		if (delta > 0)
>>>>>> +		if (delta > 0 && likely(period != tick_stamp))
>>>>>>  			perf_adjust_period(event, period, delta, false);
>>>>>>
>>>>>>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>>>>
>>
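For reference, a rough, untested sketch of how the loop could look with the
timestamp read hoisted out of it, simply merging the v4 hunks with the snippet
above (the unthrottle/attr.freq checks between the state test and pmu->stop()
are elided, not shown here):

static void perf_adjust_freq_unthr_events(struct list_head *event_list)
{
	struct perf_event *event;
	struct hw_perf_event *hwc;
	u64 now, period, tick_stamp;
	s64 delta;

	/* One jiffies read per tick, shared by every event in the list. */
	tick_stamp = jiffies64_to_nsecs(get_jiffies_64());

	list_for_each_entry(event, event_list, active_list) {
		if (event->state != PERF_EVENT_STATE_ACTIVE)
			continue;

		/* ... unthrottle and attr.freq checks elided ... */

		hwc = &event->hw;

		event->pmu->stop(event, PERF_EF_UPDATE);

		/* Elapsed time since this event's last tick adjustment. */
		period = tick_stamp - hwc->freq_tick_stamp;
		hwc->freq_tick_stamp = tick_stamp;

		now = local64_read(&event->count);
		delta = now - hwc->freq_count_stamp;
		hwc->freq_count_stamp = now;

		/*
		 * Reload only if the value has changed, and skip the very
		 * first tick adjustment, when freq_tick_stamp was still zero.
		 */
		if (delta > 0 && likely(period != tick_stamp))
			perf_adjust_period(event, period, delta, false);

		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
	}
}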