From: Tomasz Buchert <Tomasz.Buchert@inria.fr>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Daniel Walker <dwalker@codeaurora.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Rationale for wall clocks
Date: Fri, 30 Jul 2010 16:58:26 +0200
Message-ID: <4C52E892.1040709@inria.fr>
In-Reply-To: <20100730132343.4c15bfdc@dhcp-lab-109.englab.brq.redhat.com>
Hi!
To begin with, there are two main things that my patches concern:
A) Limited access to POSIX CPU clocks
B) Access to wall time information of a process/thread.
By CPU time I understand "user time" + "system time".
The scenario I have (to make a long story short) is that my process
supervises a set of tasks. It "freezes" them (using the freezer cgroup)
when their (CPU time) / (wall time) ratio reaches a certain threshold.
Now, to make this decision as precise as possible, I need a good
measurement of the CPU and wall time of a task (identified by its TID). If the kernel
keeps time internally in nanoseconds (well, at least on my x86 machine),
why shouldn't I expect to have access to it?
Let's agree from the outset that procfs cannot deliver this with acceptable quality.
/proc/[pid]/stat and /proc/[pid]/task/[tid]/stat expose CPU time in clock ticks
(on my machine sysconf(_SC_CLK_TCK) = 100, so the precision is 10 ms).
The start time of a process is given as a number of ticks
after system boot, and the boot time itself is given in /proc/stat (the btime line) in ...
whole seconds since the beginning of the Unix epoch. That's not good enough.
Ad. A)
clock_gettime is a very nice interface with nanosecond precision
(again, on my x86 machine). You can ask for the CPU time of a thread
or a process, and you can even clock_nanosleep on it.
When asking for the CPU time of a single thread, however, you can only query
threads from your own thread group. I see no reason why this
couldn't be extended to all tasks of the same user (extending
it further could introduce security risks). I also think
that the root user could have access to all clocks in the system.
This kind of information can be retrieved via taskstats anyway
(for EVERY task in the system), but only with ms precision
(because of the security concerns mentioned above?).
Ad. B)
As far as I can tell, the only good way to obtain the elapsed time
of a process/thread is to use the taskstats interface. It's not THAT bad;
I agree with Stanislaw on that, and it gives you some valuable pieces of information.
The precision is 2 ms for the CPU time and 1 us for the elapsed time. In fact,
with CONFIG_TASK_DELAY_ACCT enabled you can get CPU time with nanosecond precision
(it's not compiled into my Ubuntu 9.10 kernel, but it is on a Debian machine
I have somewhere). Another exotic way to get CPU time is to enable CONFIG_SCHEDSTATS
and read the first number in /proc/[tid]/schedstat. Interestingly, this is available
by default on my Ubuntu box but not on the Debian one mentioned above :).
The most portable way would be to use taskstats (it's in both kernels... :) ).
However, I didn't like the CPU time precision it gives, nor the messy code needed to use
the netlink interface. Moreover, to get the best available precision
I would have to use POSIX clocks for CPU time (assuming change A were accepted!)
and taskstats for WALL time (whose precision would still be only 1 us). I didn't
like this idea at all.
That's why I started digging into the kernel a little. After some time I found an unused slot
in clockid_t that would perfectly fit an additional clock. What I like about this interface:
1) clean and simple
2) nanosecond precision
3) cheap, compared to taskstats
4) unified access to 2 important clocks of a process: CPU clock and WALL clock
Another nice thing is that you can clock_nanosleep on that clock. I have this kind
of scenario in mind: I control a process and, say, want to kill it after 1 s
(because it is only allowed to run for that long). This is easily and robustly
done with this interface: you just sleep on the WALL time clock of that process until
the absolute time of 1 s. Sadly, right now you can't do it both precisely and correctly.
I agree that these problems could be addressed by giving access to the start_time
field, as Stanislaw suggested. Adding new fields with the same meaning but
higher precision to taskstats would, of course, be a terrible idea.
I simply felt that adding a new clock type is a nice and consistent approach.
That's it.
Tomasz
Thread overview: 9+ messages
2010-07-30 9:57 [PATCH 0/4] Wall time clocks + change in access rules Tomasz Buchert
2010-07-30 9:57 ` [PATCH 1/4] posix-timers: Refactoring of CPUCLOCK* macros Tomasz Buchert
2010-07-30 9:57 ` [PATCH 2/4] posix-cpu-timers: Introduction of wall clocks Tomasz Buchert
2010-07-30 9:57 ` [PATCH 3/4] posix-cpu-timers: Wider access to the thread clocks Tomasz Buchert
2010-07-30 9:57 ` [PATCH 4/4] posix-cpu-timers: posix-cpu-timers.c renamed to posix-task-timers.c Tomasz Buchert
2010-07-30 11:30 ` [PATCH 2/4] posix-cpu-timers: Introduction of wall clocks Stanislaw Gruszka
2010-07-30 11:23 ` [PATCH 0/4] Wall time clocks + change in access rules Stanislaw Gruszka
2010-07-30 11:51 ` Stanislaw Gruszka
2010-07-30 14:58 ` Tomasz Buchert [this message]