From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Tony Luck <tony.luck@intel.com>,
Wanpeng Li <wanpeng.li@hotmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Thomas Gleixner <tglx@linutronix.de>,
Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
Stanislaw Gruszka <sgruszka@redhat.com>
Subject: Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs
Date: Fri, 18 Nov 2016 13:08:46 +0100 [thread overview]
Message-ID: <20161118130846.7da515cc@mschwide> (raw)
In-Reply-To: <1479406123-24785-1-git-send-email-fweisbec@gmail.com>
On Thu, 17 Nov 2016 19:08:07 +0100
Frederic Weisbecker <fweisbec@gmail.com> wrote:
> I'm sorry for the patchbomb, especially as I usually complain about
> these myself but I don't see any way to split this patchset into
> standalone pieces, none of which would make any sense... All I can do
> is to isolate about 3 cleanup patches.
On first glance the patches look ok-ish, but I am not happy about the
direction this takes.
I can understand the wish to consolidate the common code to a single
format which is nano-seconds. It will have repercussions though.
First the obvious problem, it does not compile for s390:
arch/s390/kernel/vtime.c: In function 'do_account_vtime':
arch/s390/kernel/vtime.c:140:25: error: implicit declaration of function
'cputime_to_nsecs' [-Werror=implicit-function-declaration]
account_user_time(tsk, cputime_to_nsecs(user));
^~~~~~~~~~~~~~~~
arch/s390/kernel/idle.c: In function 'enabled_wait':
arch/s390/kernel/idle.c:46:20: error: implicit declaration of function
'cputime_to_nsecs' [-Werror=implicit-function-declaration]
account_idle_time(cputime_to_nsecs(idle_time));
^~~~~~~~~~~~~~~~
arch/s390/kernel/idle.c: In function 'arch_cpu_idle_time':
arch/s390/kernel/idle.c:100:9: error: implicit declaration of function
'cputime_to_nsec' [-Werror=implicit-function-declaration]
return cputime_to_nsec(idle_enter ? ((idle_exit ?: now) - idle_enter) : 0);
^~~~~~~~~~~~~~~
The error at idle.c:100 is a typo cputime_to_nsec vs cputime_to_nsecs.
The other two could probably be solved with an additional include but the
default cputime_to_nsecs is in include/linux/cputime.h is this:
#ifndef cputime_to_nsecs
# define cputime_to_nsecs(__ct) \
(cputime_to_usecs(__ct) * NSEC_PER_USEC)
#endif
which downgrades the accuracy for s390 from better than nano-seconds
to micro-seconds. Not good. For the s390 cputime format you would have
to do
static inline unsigned long long cputime_to_nsecs(const cputime_t cputime)
{
return ((__force unsigned long long) cputime * 1000) >> 12;
}
But this *example* function has an overflow problem.
> So currently, cputime_t serves the purpose, for s390 and
> powerpc (on CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y), to avoid converting
> arch clock counters to nanosecs or jiffies while accounting cputime.
The cputime_t has several purposes:
1) Allow for different units in the calculations for virtual cpu time.
There are currently three models: jiffies, nano-seconds and the native
TOD clock format for s390 which is a bit better than nano-seconds.
2) Act as a marker in the common code where a virtual cpu time is used.
This is more important than you might think, unfortunately it is very
easy to confuse a wall-clock delta with cpu time.
3) Avoid expensive operations on the fast path to convert the native cpu
time to something else. Instead move the expensive calculation to the
read-out code, e.g. fs/proc.
You patches breaks all three of these purposes. My main gripe is with 3).
> But this comes at the cost of a lot of complexity and uglification
> in the core code to deal with such an opaque type that relies on lots of
> mutators and accessors in order to deal with a random granularity time
> unit that also involve lots of workarounds and likely some performance
> penalties.
Having an opaque type with a set of helper functions is the whole point, no?
And I would not call the generic implementations for jiffies or nano-seconds
complex, these are easy enough to understand. And what are the performance
penalties you are talking about?
> So this patchset proposes to convert most of the cputime_t uses to nsecs.
> In the end it's only used by s390 and powerpc. This all comes at the
> expense of those two archs which then need to perform a cputime_to_nsec()
> conversion everytime they update the cputime to the core. Now I expect
> we can leverage this performance loss with flushing the cputime only on
> ticks so that we accumulate time as cputime_t in between and make the
> conversions more rare.
It is not just one cputime_to_nsec that we would have to add but several.
Three in do_account_vtime and one in vtime_account_irq_enter.
The do_account_vtime function is called once per jiffy and once per task
switch. HZ is usually set to 100 for s390, the conversion once per jiffy
would not be so bad, but the call on the scheduling path *will* hurt.
What is even worse is the vtime_account_irq_enter path, that is call several
times for each *interrupt*, at least two times for an interrupt without
additional processing and four times if a softirq is triggered.
Now it has been proposed to implement lazy accounting to accumulate deltas
and do the expensive conversions only infrequently. This is pretty straight-
forward for account_user_time but to do this for the account_system_time
function is more complicated. The function has to differentiate between
guest/hardirq/softirq and pure system time. We would need to keep sums for
each bucket and provide a separate function to add to each bucket. Like
account_guest_time(), account_hardirq_time(), account_softirq_time() and
account_system_time(). Then it is up to the arch code to sort out the details
and call the accounting code once per jiffy for each of the buckets.
We still have to do the whole thing on each task switch though.
But I am still not happy about the approach. What is the compelling reason
for this change except for the "but it looks ugly"?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
next prev parent reply other threads:[~2016-11-18 12:09 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-17 18:08 [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 01/36] jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 02/36] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 03/36] sched: Remove unused INIT_CPUTIME macro Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 04/36] cputime: Convert kcpustat to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 05/36] macintosh/rack-meter: Remove cputime_t internal use Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 06/36] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 07/36] cputime: Special API to return old-typed cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 08/36] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 09/36] alpha: Convert obsolete cputime_t " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 10/36] x86: Convert obsolete cputime type " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 11/36] isdn: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 12/36] binfmt: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 13/36] acct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 14/36] delaycct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 15/36] tsacct: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 16/36] signal: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 17/36] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 18/36] posix-timers: Use TICK_NSEC instead of a dynamically ad-hoc calculated version Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 19/36] posix-timers: Convert internals to use nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 20/36] itimer: Convert internal cputime_t units to nsec Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 21/36] sched: Remove temporary cputime_t accessors Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 22/36] cputime: Push time to account_user_time() in nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 23/36] cputime: Push time to account_steal_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 24/36] cputime: Push time to account_idle_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 25/36] cputime: Push time to account_system_time() " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 26/36] cputime: Complete nsec conversion of tick based accounting Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 27/36] vtime: Return nsecs instead of cputime_t to account Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 28/36] cputime: Remove jiffies based cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 29/36] ia64: Move nsecs based cputime headers to the last arch using it Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 30/36] ia64: Convert vtime to use nsec units directly Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 31/36] ia64: Remove unused cputime definitions Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 32/36] s390: Make arch_cpu_idle_time() to return nsecs Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 33/36] powerpc: Remove unused cputime definitions Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 34/36] s390: " Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 35/36] cputime: Remove unused nsec_to_cputime Frederic Weisbecker
2016-11-17 18:08 ` [PATCH 36/36] cputime: Remove asm generic headers Frederic Weisbecker
2016-11-18 12:08 ` Martin Schwidefsky [this message]
2016-11-18 14:47 ` [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs Frederic Weisbecker
2016-11-21 6:59 ` Martin Schwidefsky
2016-11-21 10:17 ` Martin Schwidefsky
2016-11-22 13:45 ` Frederic Weisbecker
2016-11-22 14:28 ` Martin Schwidefsky
2016-11-21 16:20 ` Frederic Weisbecker
2016-11-22 6:11 ` Martin Schwidefsky
2016-11-21 9:49 ` Ingo Molnar
2016-11-21 16:22 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161118130846.7da515cc@mschwide \
--to=schwidefsky@de.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=fenghua.yu@intel.com \
--cc=fweisbec@gmail.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=sgruszka@redhat.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).