From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Tony Luck <tony.luck@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Oleg Nesterov <oleg@redhat.com>,
Paul Mackerras <paulus@samba.org>,
Wu Fengguang <fengguang.wu@intel.com>,
Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 07/30] cputime: Convert kcpustat to nsecs
Date: Mon, 1 Dec 2014 18:27:38 +0100
Message-ID: <20141201182738.2b344a18@mschwide>
In-Reply-To: <alpine.DEB.2.11.1412011801250.3961@nanos>

On Mon, 1 Dec 2014 18:15:36 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 1 Dec 2014, Martin Schwidefsky wrote:
> > On Mon, 1 Dec 2014 17:10:34 +0100
> > Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > > Speaking about the degradation in s390:
> > >
> > > s390 is really a special case. And it would be a shame if we blocked a
> > > real core cleanup just for this special case, especially as it's fairly easy
> > > to keep a specific treatment for s390 so as not to impact its performance
> > > and time precision. We could simply accumulate the cputime in per-cpu values:
> > >
> > > struct s390_cputime {
> > > 	cputime_t user, sys, softirq, hardirq, steal;
> > > };
> > >
> > > DEFINE_PER_CPU(struct s390_cputime, s390_cputime);
> > >
> > > Then on irq entry/exit, just add the accumulated time to the relevant buffer
> > > and account for real (through any account_...time() functions) only on tick
> > > and task switch. There the costly operations (unit conversion and call to
> > > account_...._time() functions) are deferred to a rarer yet periodic enough
> > > event. This is what s390 does already for user/system time and kernel
> > > boundaries.
> > >
> > > This way we should even improve the situation compared to what we have
> > > upstream. It's going to be faster because calling the accounting functions
> > > can be costlier than simple per-cpu ops. And also we keep the cputime_t
> > > granularity. For archs like s390 which have a granularity higher than nsecs,
> > > we can have:
> > >
> > > u64 cputime_to_nsecs(cputime_t time, u64 *rem);
> > >
> > > And to avoid remainder losses, we can do that from the tick:
> > >
> > > delta_cputime = this_cpu_read(s390_cputime.hardirq);
> > > delta_nsec = cputime_to_nsecs(delta_cputime, &rem);
> > > account_system_time(delta_nsec, HARDIRQ_OFFSET);
> > > this_cpu_write(s390_cputime.hardirq, rem);
> > >
> > > Although I doubt that remainders below one nsec lost each tick matter that much.
> > > But if it does, it's fairly possible to handle like above.
> >
> > To make that work we would have to move some of the logic from account_system_time
> > to the architecture code. The decision if a system time delta is guest time,
> > irq time, softirq time or simply system time is currently done in
> > kernel/sched/cputime.c.
> >
> > As the conversion + the accounting is delayed to a regular tick we would have
> > to split the accounting code into decision functions that determine which
> > bucket a system time delta should go to, and introduce new functions to
> > account to the different buckets.
> >
> > Instead of a single account_system_time we would have account_guest_time,
> > account_system_time, account_system_time_irq and account_system_time_softirq.
> >
> > In principle not a bad idea, that would make the interrupt path for s390 faster
> > as we would not have to call account_system_time, only the decision function
> > which could be an inline function.
>
> Why make this s390 specific?
>
> We can decouple the accounting from the time accumulation for all
> architectures.
>
> struct cputime_record {
> u64 user, sys, softirq, hardirq, steal;
> };
>
> DEFINE_PER_CPU(struct cputime_record, cputime_record);
>
> Now let account_xxx_time() just work on that per-cpu data
> structure. That would just accumulate the deltas based on whatever
> the architecture uses as a cputime source with whatever resolution it
> provides.
>
> Then we collect the accumulated results for the various buckets on a
> regular basis and convert them to nanoseconds. This is not even
> required to be at the tick, it could be done by some async worker and
> on idle enter/exit.

And leave the decision making in kernel/sched/cputime.c. Yes, that is good.
That way only the arch code and the account_xxx_time() functions would have
to care about cputime_t, and all other common code would use nanoseconds.
With the added benefit that I do not have to change the low-level code too
much ;-)
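
To make the split concrete, here is a rough userspace sketch of the idea as I
read it, not actual kernel code: the hot path only adds arch-unit deltas to a
bucket, and a rarer flush converts to nanoseconds and carries the remainder.
Everything named here is hypothetical (record_add, record_flush,
units_to_nsecs, the B_* bucket names), a plain struct stands in for the
per-cpu variable, and the unit of 1/4096 microsecond is an assumption about
the arch clock.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one arch cputime unit is 1/4096 microsecond, so
 * 512 units are exactly 125 ns (4096 / 8 and 1000 / 8). */
#define UNITS_PER_GROUP 512ULL
#define NSECS_PER_GROUP 125ULL

/* Hypothetical bucket names mirroring the fields discussed above. */
enum bucket { B_USER, B_SYS, B_SOFTIRQ, B_HARDIRQ, B_STEAL, B_NR };

/* Stand-in for one CPU's slot of a per-cpu cputime_record. */
struct cputime_record {
	uint64_t pending[B_NR];	/* arch units accumulated, not yet converted */
	uint64_t nsecs[B_NR];	/* nanoseconds already handed to common code */
};

/* Hot path: accumulate a delta in arch units, no conversion. */
static void record_add(struct cputime_record *r, enum bucket b, uint64_t delta)
{
	assert(b < B_NR);
	r->pending[b] += delta;
}

/* Convert whole 512-unit groups to nsecs; return the rest via *rem. */
static uint64_t units_to_nsecs(uint64_t units, uint64_t *rem)
{
	*rem = units % UNITS_PER_GROUP;
	return (units / UNITS_PER_GROUP) * NSECS_PER_GROUP;
}

/* Slow path (tick, idle enter/exit, worker): flush every bucket. */
static void record_flush(struct cputime_record *r)
{
	for (int b = 0; b < B_NR; b++) {
		uint64_t rem;

		r->nsecs[b] += units_to_nsecs(r->pending[b], &rem);
		r->pending[b] = rem;	/* carried into the next flush */
	}
}
```

Converting only whole 512-unit groups keeps the arithmetic exact, at the price
of leaving up to 511 units (just under 125 ns) pending in a bucket between
flushes. Nothing is lost, it is only accounted a little later.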
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.