Message-ID: <9ab1e7d7-57ee-49f9-963c-3a1b96dda684@kernel.org>
Date: Tue, 24 Feb 2026 16:41:32 +0100
X-Mailing-List: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle
To: Shrikanth Hegde, Frederic Weisbecker, LKML, Madhavan Srinivasan
Cc: "Rafael J. Wysocki", Alexander Gordeev, Anna-Maria Behnsen, Ben Segall,
 Boqun Feng, Christian Borntraeger, Dietmar Eggemann, Heiko Carstens,
 Ingo Molnar, Jan Kiszka, Joel Fernandes, Juri Lelli, Kieran Bingham,
 Mel Gorman, Michael Ellerman, Neeraj Upadhyay, Nicholas Piggin,
 "Paul E. McKenney", Peter Zijlstra, Steven Rostedt, Sven Schnelle,
 Thomas Gleixner, Uladzislau Rezki, Valentin Schneider, Vasily Gorbik,
 Vincent Guittot, Viresh Kumar, Xin Zhao, linux-pm@vger.kernel.org,
 linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
References: <20260206142245.58987-1-frederic@kernel.org>
 <20260206142245.58987-5-frederic@kernel.org>
 <9413517d-963b-4e6d-b11b-b440acd7cb5a@linux.ibm.com>
From: "Christophe Leroy (CS GROUP)"
In-Reply-To: <9413517d-963b-4e6d-b11b-b440acd7cb5a@linux.ibm.com>

Hi Hegde,

On 19/02/2026 at 19:30, Shrikanth Hegde wrote:
>
>
> On 2/6/26 7:52 PM, Frederic Weisbecker wrote:
>> Currently the tick subsystem stores the idle cputime accounting in
>> private fields, allowing cohabitation with architecture idle vtime
>> accounting. The former is fetched on online CPUs, the latter on offline
>> CPUs.
>>
>> For consolidation purpose, architecture vtime accounting will continue
>> to account the cputime but will make a break when the idle tick is
>> stopped. The dyntick cputime accounting will then be relayed by the tick
>> subsystem so that the idle cputime is still seen advancing coherently
>> even when the tick isn't there to flush the idle vtime.
>>
>> Prepare for that and introduce three new APIs which will be used in
>> subsequent patches:
>>
>> - vtime_dynticks_start() is deemed to be called when idle enters in
>>    dyntick mode. The idle cputime that elapsed so far is accumulated.
>>
>> - vtime_dynticks_stop() is deemed to be called when idle exits from
>>    dyntick mode. The vtime entry clocks are fast-forward to current time
>>    so that idle accounting restarts elapsing from now.
>>
>> - vtime_reset() is deemed to be called from dynticks idle IRQ entry to
>>    fast-forward the clock to current time so that the IRQ time is still
>>    accounted by vtime while nohz cputime is paused.
>>
>> Also accumulated vtime won't be flushed from dyntick-idle ticks to avoid
>> accounting twice the idle cputime, along with nohz accounting.
>>
>> Signed-off-by: Frederic Weisbecker
>
> Reviewed-by: Shrikanth Hegde
>
>> ---
>>   arch/powerpc/kernel/time.c | 41 ++++++++++++++++++++++++++++++++++++++
>>   include/linux/vtime.h      |  6 ++++++
>>   2 files changed, 47 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
>> index 4bbeb8644d3d..18506740f4a4 100644
>> --- a/arch/powerpc/kernel/time.c
>> +++ b/arch/powerpc/kernel/time.c
>> @@ -376,6 +376,47 @@ void vtime_task_switch(struct task_struct *prev)
>>           acct->starttime = acct0->starttime;
>>       }
>>   }
>> +
>> +#ifdef CONFIG_NO_HZ_COMMON
>> +/**
>> + * vtime_reset - Fast forward vtime entry clocks
>> + *
>> + * Called from dynticks idle IRQ entry to fast-forward the clocks to current time
>> + * so that the IRQ time is still accounted by vtime while nohz cputime is paused.
>> + */
>> +void vtime_reset(void)
>> +{
>> +    struct cpu_accounting_data *acct = get_accounting(current);
>> +
>> +    acct->starttime = mftb();
>
> I figured out why those huge values happen.
>
> This happens because mftb is from when the system is booted.
> I was doing kexec to start the new kernel and mftb wasn't getting
> reset.
>
> I thought about this. This is concern for pseries too, where LPAR's
> restart but system won't restart and mftb will continue to run instead of
> reset.
>
> I think we should be using sched_clock instead of mftb here.
> Though we need it a few more places and some cosmetic changes around it.
>
> Note: Some values being huge exists without series for few CPUs, with
> series it shows up in most of the CPUs.
>
> So I am planning send out fix below fix separately keeping your
> series as dependency.
>
> ---
>  arch/powerpc/include/asm/accounting.h |  4 ++--
>  arch/powerpc/include/asm/cputime.h    | 14 +++++++-------
>  arch/powerpc/kernel/time.c            | 22 +++++++++++-----------
>  3 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/accounting.h b/arch/powerpc/include/asm/accounting.h
> index 6d79c31700e2..50f120646e6d 100644
> --- a/arch/powerpc/include/asm/accounting.h
> +++ b/arch/powerpc/include/asm/accounting.h
> @@ -21,8 +21,8 @@ struct cpu_accounting_data {
>      unsigned long steal_time;
>      unsigned long idle_time;
>      /* Internal counters */
> -    unsigned long starttime;    /* TB value snapshot */
> -    unsigned long starttime_user;    /* TB value on exit to usermode */
> +    unsigned long starttime;    /* Time value snapshot */
> +    unsigned long starttime_user;    /* Time value on exit to usermode */
>  #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
>      unsigned long startspurr;    /* SPURR value snapshot */
>      unsigned long utime_sspurr;    /* ->user_time when ->startspurr set */
> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
> index aff858ca99c0..eb6b629b113f 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -20,9 +20,9 @@
>  #include
>  #include
>  #include
> +#include
>
>  #ifdef __KERNEL__
> -#define cputime_to_nsecs(cputime) tb_to_ns(cputime)
>
>  /*
>   * PPC64 uses PACA which is task independent for storing accounting data while
> @@ -44,20 +44,20 @@
>   */
>  static notrace inline void account_cpu_user_entry(void)
>  {
> -    unsigned long tb = mftb();
> +    unsigned long now = sched_clock();

No way! By doing that you'll kill performance for no reason. All we need
when accounting time spent in kernel or in user is the difference between
the time at entry and the time at exit, no matter what the time was at
boot.

Also, sched_clock() returns nanoseconds, which implies a conversion from
the timebase. This is pointless CPU consumption. The current
implementation converts to nanoseconds at task switch, when vtime_flush()
is called. Your change would do it at every kernel entry and kernel exit
by calling sched_clock().

Another point is that sched_clock() returns an unsigned long long, not an
unsigned long.

Finally, sched_clock() uses get_tb(), which does both mftb and mftbu. That
is pointless for calculating time deltas unless your application spends
hours without being rescheduled.

>      struct cpu_accounting_data *acct = raw_get_accounting(current);
>
> -    acct->utime += (tb - acct->starttime_user);
> -    acct->starttime = tb;
> +    acct->utime += (now - acct->starttime_user);
> +    acct->starttime = now;
>  }
>
>  static notrace inline void account_cpu_user_exit(void)
>  {
> -    unsigned long tb = mftb();
> +    unsigned long now = sched_clock();
>      struct cpu_accounting_data *acct = raw_get_accounting(current);
>
> -    acct->stime += (tb - acct->starttime);
> -    acct->starttime_user = tb;
> +    acct->stime += (now - acct->starttime);
> +    acct->starttime_user = now;
>  }
>
>  static notrace inline void account_stolen_time(void)
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 18506740f4a4..fb67cdae3bcb 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -215,7 +215,7 @@ static unsigned long vtime_delta(struct cpu_accounting_data *acct,
>
>      WARN_ON_ONCE(!irqs_disabled());
>
> -    now = mftb();
> +    now = sched_clock();
>      stime = now - acct->starttime;
>      acct->starttime = now;
>
> @@ -299,9 +299,9 @@ static void vtime_flush_scaled(struct task_struct *tsk,
>  {
>  #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
>      if (acct->utime_scaled)
> -        tsk->utimescaled += cputime_to_nsecs(acct->utime_scaled);
> +        tsk->utimescaled += acct->utime_scaled;
>      if (acct->stime_scaled)
> -        tsk->stimescaled += cputime_to_nsecs(acct->stime_scaled);
> +        tsk->stimescaled += acct->stime_scaled;
>
>      acct->utime_scaled = 0;
>      acct->utime_sspurr = 0;
> @@ -321,28 +321,28 @@ void vtime_flush(struct task_struct *tsk)
>      struct cpu_accounting_data *acct = get_accounting(tsk);
>
>      if (acct->utime)
> -        account_user_time(tsk, cputime_to_nsecs(acct->utime));
> +        account_user_time(tsk, acct->utime);
>
>      if (acct->gtime)
> -        account_guest_time(tsk, cputime_to_nsecs(acct->gtime));
> +        account_guest_time(tsk, acct->gtime);
>
>      if (IS_ENABLED(CONFIG_PPC_SPLPAR) && acct->steal_time) {
> -        account_steal_time(cputime_to_nsecs(acct->steal_time));
> +        account_steal_time(acct->steal_time);
>          acct->steal_time = 0;
>      }
>
>      if (acct->idle_time)
> -        account_idle_time(cputime_to_nsecs(acct->idle_time));
> +        account_idle_time(acct->idle_time);
>
>      if (acct->stime)
> -        account_system_index_time(tsk, cputime_to_nsecs(acct->stime),
> +        account_system_index_time(tsk, acct->stime,
>                        CPUTIME_SYSTEM);
>
>      if (acct->hardirq_time)
> -        account_system_index_time(tsk, cputime_to_nsecs(acct->hardirq_time),
> +        account_system_index_time(tsk, acct->hardirq_time,
>                        CPUTIME_IRQ);
>      if (acct->softirq_time)
> -        account_system_index_time(tsk, cputime_to_nsecs(acct->softirq_time),
> +        account_system_index_time(tsk, acct->softirq_time,
>                        CPUTIME_SOFTIRQ);
>
>      vtime_flush_scaled(tsk, acct);
> @@ -388,7 +388,7 @@ void vtime_reset(void)
>  {
>      struct cpu_accounting_data *acct = get_accounting(current);
>
> -    acct->starttime = mftb();
> +    acct->starttime = sched_clock();
>  #ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
>      acct->startspurr = read_spurr(acct->starttime);
>  #endif
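
To make the delta argument concrete, here is a minimal userspace sketch. It is
not kernel code and only illustrates the idea under stated assumptions:
read_tb() is a hypothetical stand-in for mftb(), fake_tb is a simulated
counter, TB_HZ is an invented 512 MHz timebase frequency, and ticks_to_ns(),
simulate_stime_ns() and self_check() are names made up for this example. The
entry and exit hooks only accumulate raw tick deltas; the conversion to
nanoseconds happens once, at flush time.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed timebase frequency for this sketch only: 512 MHz. */
#define TB_HZ 512000000ULL
#define NSEC_PER_SEC 1000000000ULL

/* Simulated free-running counter; a hypothetical stand-in for the timebase. */
static uint64_t fake_tb;

/* Hypothetical equivalent of mftb(): a single cheap counter read. */
static uint64_t read_tb(void)
{
    return fake_tb;
}

struct acct {
    uint64_t utime;          /* user time, in raw ticks */
    uint64_t stime;          /* kernel time, in raw ticks */
    uint64_t starttime;      /* tick snapshot at kernel entry */
    uint64_t starttime_user; /* tick snapshot at exit to user mode */
};

/* Kernel entry from user mode: hot path, no unit conversion. */
static void user_entry(struct acct *a)
{
    uint64_t tb = read_tb();

    a->utime += tb - a->starttime_user;
    a->starttime = tb;
}

/* Return to user mode: hot path, no unit conversion. */
static void user_exit(struct acct *a)
{
    uint64_t tb = read_tb();

    a->stime += tb - a->starttime;
    a->starttime_user = tb;
}

/* Cold path: convert accumulated ticks to nanoseconds once, at flush time. */
static uint64_t ticks_to_ns(uint64_t ticks)
{
    return (ticks / TB_HZ) * NSEC_PER_SEC +
           ((ticks % TB_HZ) * NSEC_PER_SEC) / TB_HZ;
}

/* Spend 512 ticks in user, then 1024 ticks in kernel, from any start value. */
static uint64_t simulate_stime_ns(uint64_t tb_start)
{
    struct acct a = { 0 };

    fake_tb = tb_start;
    a.starttime_user = read_tb();
    fake_tb += 512;
    user_entry(&a);
    fake_tb += 1024;
    user_exit(&a);
    return ticks_to_ns(a.stime);
}

/* The counter's starting value cancels out, even across wraparound. */
static void self_check(void)
{
    assert(simulate_stime_ns(0) == 2000);
    assert(simulate_stime_ns(0x123456789abcULL) == 2000);
    assert(simulate_stime_ns(UINT64_MAX - 100) == 2000);
}
```

Whatever value the counter holds when the measurement starts, unsigned
subtraction cancels it, so a counter that was not reset across kexec or an
LPAR restart does not affect the accumulated deltas in this model.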