From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751944AbcF0MuM (ORCPT ); Mon, 27 Jun 2016 08:50:12 -0400
Received: from mx1.redhat.com ([209.132.183.28]:39570 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751601AbcF0MuK (ORCPT ); Mon, 27 Jun 2016 08:50:10 -0400
Message-ID: <1467031806.22723.18.camel@redhat.com>
Subject: Re: [PATCH 1/5] sched,time: count actually elapsed irq & softirq time
From: Rik van Riel
To: Frederic Weisbecker
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org, pbonzini@redhat.com, fweisbec@redhat.com, wanpeng.li@hotmail.com, efault@gmx.de, tglx@linutronix.de, rkrcmar@redhat.com
Date: Mon, 27 Jun 2016 08:50:06 -0400
In-Reply-To: <20160627122553.GA2111@lerouge>
References: <1466648751-7958-1-git-send-email-riel@redhat.com> <1466648751-7958-2-git-send-email-riel@redhat.com> <20160627122553.GA2111@lerouge>
Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-1t2hYkeP6uQ58UkqwA6p"
Mime-Version: 1.0
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Mon, 27 Jun 2016 12:50:10 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--=-1t2hYkeP6uQ58UkqwA6p
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, 2016-06-27 at 14:25 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 22, 2016 at 10:25:47PM -0400, riel@redhat.com wrote:
> >
> > From: Rik van Riel
> >
> > Currently, if there was any irq or softirq time during 'ticks'
> > jiffies, the entire period will be accounted as irq or softirq
> > time.
> >
> > This is inaccurate if only a subset of 'ticks' jiffies was
> > actually spent handling irqs, and could conceivably mis-count
> > all of the ticks during a period as irq time, when there was
> > some irq and some softirq time.
> Good catch!
>
> Many comments following.
>
> >
> >
> > This can actually happen when irqtime_account_process_tick
> > is called from account_idle_ticks, which can pass a larger
> > number of ticks down all at once.
> >
> > Fix this by changing irqtime_account_hi_update and
> > irqtime_account_si_update to round elapsed irq and softirq
> > time to jiffies, and return the number of jiffies spent in
> > each mode, similar to how steal time is handled.
> >
> > Additionally, have irqtime_account_process_tick take into
> > account how much time was spent in each of steal, irq,
> > and softirq time.
> >
> > The latter could help improve the accuracy of timekeeping
> Maybe you meant cputime? Timekeeping is rather about jiffies and
> GTOD.
>
> >
> > when returning from idle on a NO_HZ_IDLE CPU.
> >
> > Properly accounting how much time was spent in hardirq and
> > softirq time will also allow the NO_HZ_FULL code to re-use
> > these same functions for hardirq and softirq accounting.
> >
> > Signed-off-by: Rik van Riel
> >
> >  	local_irq_save(flags);
> > -	latest_ns = this_cpu_read(cpu_hardirq_time);
> > -	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
> > -		ret = 1;
> > +	irq = this_cpu_read(cpu_hardirq_time) - cpustat[CPUTIME_IRQ];
> cpu_hardirq_time is made of nsecs whereas cpustat is of cputime_t (in
> fact even cputime64_t). So you need to convert cpu_hardirq_time before
> doing the subtraction.

Doh. Good catch!
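For illustration only (not part of the patch or of the review): a
minimal user-space sketch of the unit mismatch pointed out above. It
assumes, purely for the example, that cputime is counted in jiffies
with HZ=100. The kernel's real cputime_t representation depends on the
configuration, and every sketch_* name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: cputime is in jiffies and HZ is 100. */
#define SKETCH_HZ             100ULL
#define SKETCH_NSEC_PER_SEC   1000000000ULL
#define SKETCH_NSEC_PER_JIFFY (SKETCH_NSEC_PER_SEC / SKETCH_HZ)

typedef uint64_t sketch_cputime_t;

/* Hypothetical stand-in for nsecs_to_cputime64(); rounds down. */
static sketch_cputime_t sketch_nsecs_to_cputime(uint64_t nsecs)
{
	return nsecs / SKETCH_NSEC_PER_JIFFY;
}

/* Mixed units: subtracts a cputime value from a nanosecond counter. */
static uint64_t sketch_irq_delta_mixed(uint64_t hardirq_ns,
				       sketch_cputime_t accounted)
{
	return hardirq_ns - accounted;
}

/* Same units: convert the nanosecond counter first, then subtract. */
static sketch_cputime_t sketch_irq_delta_converted(uint64_t hardirq_ns,
						   sketch_cputime_t accounted)
{
	sketch_cputime_t latest = sketch_nsecs_to_cputime(hardirq_ns);

	/* Guard against underflow if the accounted value is ahead. */
	return latest > accounted ? latest - accounted : 0;
}
```

With 35 ms of hardirq time and 2 jiffies already accounted, the
converted form reports 1 jiffy still to account, while the mixed form
returns a number that is neither nanoseconds nor cputime.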

> > -static int irqtime_account_si_update(void)
> > +static unsigned long irqtime_account_si_update(unsigned long max_jiffies)
> >  {
> >  	u64 *cpustat = kcpustat_this_cpu->cpustat;
> > +	unsigned long si_jiffies = 0;
> >  	unsigned long flags;
> > -	u64 latest_ns;
> > -	int ret = 0;
> > +	u64 softirq;
> >
> >  	local_irq_save(flags);
> > -	latest_ns = this_cpu_read(cpu_softirq_time);
> > -	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
> > -		ret = 1;
> > +	softirq = this_cpu_read(cpu_softirq_time) - cpustat[CPUTIME_SOFTIRQ];
> > +	if (softirq > cputime_one_jiffy) {
> > +		si_jiffies = min(max_jiffies, cputime_to_jiffies(softirq));
> > +		cpustat[CPUTIME_SOFTIRQ] += jiffies_to_cputime(si_jiffies);
> > +	}
> >  	local_irq_restore(flags);
> > -	return ret;
> > +	return si_jiffies;
> So same comments apply here.
>
> [...]
> >
> >   * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
> >   * tasks (sum on group iteration) belonging to @tsk's group.
> >   */
> > @@ -344,19 +378,24 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
> >  {
> >  	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
> >  	u64 cputime = (__force u64) cputime_one_jiffy;
> > -	u64 *cpustat = kcpustat_this_cpu->cpustat;
> > +	unsigned long other;
> >
> > -	if (steal_account_process_tick(ULONG_MAX))
> > +	/*
> > +	 * When returning from idle, many ticks can get accounted at
> > +	 * once, including some ticks of steal, irq, and softirq time.
> > +	 * Subtract those ticks from the amount of time accounted to
> > +	 * idle, or potentially user or system time. Due to rounding,
> > +	 * other time can exceed ticks occasionally.
> > +	 */
> > +	other = account_other_ticks(ticks);
> > +	if (other >= ticks)
> >  		return;
> > +	ticks -= other;
> >
> >  	cputime *= ticks;
> >  	scaled *= ticks;
> So instead of dealing with ticks here, I think you should rather use
> the above cputime as both the limit and the remaining time to account
> after steal/irqs.
>
> This should avoid some intermediate conversions and improve precision
> when cputime_t == nsecs granularity.
>
> Say we account 2 ticks to idle (let's say HZ=100) and the irq time to
> account is 15 ms. 2 ticks = 20 ms, so we have 5 ms left to account to
> idle. With the jiffies granularity in this patch, we would account one
> tick to irqtime (1 tick = 10 ms, there will be 5 ms to account back
> later) and one tick to idle time, whereas if you deal with cputime_t,
> you are going to account the correct amount of idle time.
>

Ahhh, so you want irqtime_account_process_tick to work with
and account fractional ticks when calling account_system_time,
account_user_time, account_idle_time, etc?

I guess that should work fine since we already pass cputime
values in, anyway. I suppose we can do the same for get_vtime_delta,
too.

They can both work with the actual remaining time (in cputime_t),
after the other time has been subtracted.

I can rework the series to do that.

-- 
All Rights Reversed.
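
For illustration only (not part of the patch or of the review): the
2-tick example from the review, worked through in a small user-space
sketch. It assumes HZ=100 and nanosecond-granularity cputime, ignores
steal and softirq time, and simplifies the patch's rounding to a plain
integer division. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: HZ=100, so one tick is 10 ms. */
#define SKETCH_NSEC_PER_TICK 10000000ULL

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Tick granularity: irq time is rounded down to whole ticks before it
 * is subtracted, so a fractional tick of irq time ends up as idle.
 */
static uint64_t sketch_idle_ns_tick_granularity(uint64_t ticks,
						uint64_t irq_ns)
{
	uint64_t irq_ticks = sketch_min_u64(ticks,
					    irq_ns / SKETCH_NSEC_PER_TICK);

	return (ticks - irq_ticks) * SKETCH_NSEC_PER_TICK;
}

/*
 * cputime granularity: the elapsed time is both the limit and the
 * remainder, so only what is really left over goes to idle.
 */
static uint64_t sketch_idle_ns_cputime_granularity(uint64_t ticks,
						   uint64_t irq_ns)
{
	uint64_t total_ns = ticks * SKETCH_NSEC_PER_TICK;
	uint64_t other_ns = sketch_min_u64(total_ns, irq_ns);

	return total_ns - other_ns;
}
```

For 2 ticks and 15 ms of irq time, the tick-granularity version leaves
10 ms for idle (with 5 ms of irq time deferred to a later call), while
the cputime-granularity version leaves 5 ms.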
--=-1t2hYkeP6uQ58UkqwA6p
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAABCAAGBQJXcSD/AAoJEM553pKExN6D7kQH/3ZMaLLJpytVIR/+3DQT/yx5
4eKCWsCltt7NfdsTUzxATgYTj123wW52MZr7o3mxfRvuSaAv9gGF/iw7qhRH4RZQ
Bgw8kGcBZsS4xh1BiaGWl65AGsm4F3OExgwxoKNnc1vPPcvl3tfcE4C6GAM1i/99
iXfqOPbGk6/10y4rqQdF6+CRHHpc9y7m7ZvunnqO1X5tJsKLDr2hg1rMurlz6mCm
aN4L9L4D7XAI8m2wUXxM+63LxHfq8d70dn2jjaxAeAOFd7R7GO+mn4qO9nmVtySx
9oid1UZFchCXtNMcLi1HD6ntMCKHueDSqX4/+TmbCnkXjPJF0RmwDjGn6vy1+VA=
=CoXt
-----END PGP SIGNATURE-----

--=-1t2hYkeP6uQ58UkqwA6p--