public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC] time: drop do_sys_times spinlock
@ 2014-08-12 18:25 Rik van Riel
From: Rik van Riel @ 2014-08-12 18:25 UTC
  To: linux-kernel
  Cc: Peter Zijlstra, Oleg Nesterov, Hidetoshi Seto, Frank Mayhar,
	Frederic Weisbecker, Andrew Morton, Sanjay Rao, Larry Woodman

Back in 2009, Spencer Candland pointed out there is a race with
do_sys_times, where multiple threads calling do_sys_times can
sometimes get decreasing results.

https://lkml.org/lkml/2009/11/3/522

As a result of that discussion, some of the code in do_sys_times
was moved under a spinlock.

However, that does not seem to actually make the race go away on
larger systems. One obvious remaining race is that just as one thread
is about to return from do_sys_times, it can be preempted by another
thread, which also runs do_sys_times and stores a larger value in
the shared variable than the first thread got.

This race is on the kernel/userspace boundary, and not fixable
with spinlocks.
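
To illustrate the interleaving described above, here is a minimal
user-space sketch. It is not the actual test case: the shared variable
last_seen, the helper names, and the sample values 100 and 101 are all
hypothetical, and the two threads are replayed sequentially instead of
being run concurrently, so that the ordering is deterministic.

```c
/* Hypothetical stand-in for the test's shared "largest utime seen". */
static long last_seen;

/*
 * What each test thread is assumed to do once times() has returned:
 * compare its sample with the shared value, and publish the sample
 * if it is not smaller. Returns 1 if time appears to go backwards.
 */
static int check_and_publish(long sample)
{
	if (sample < last_seen)
		return 1;
	last_seen = sample;
	return 0;
}

/*
 * Replay the race: thread A samples first but is preempted before it
 * can compare; thread B samples later and publishes first.
 */
static int replay_race(void)
{
	long a_sample = 100;	/* A's result, taken first (made-up value) */
	long b_sample = 101;	/* B's result, taken while A is preempted */
	int backwards = 0;

	last_seen = 0;
	backwards += check_and_publish(b_sample);	/* B runs to completion */
	backwards += check_and_publish(a_sample);	/* A resumes, sample is stale */
	return backwards;
}
```

Calling replay_race() reports one backwards observation even though
each sample was correct at the time it was taken. No lock held inside
do_sys_times covers the window between the syscall returning and the
user-space comparison.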

Removing the spinlock from do_sys_times does not seem to result
in an increase in the number of times a decreasing utime is
observed when running the test case. In fact, on the 80 CPU test
system that I tried, I saw a small decrease, from an average of
14.8 to 6.5 instances of backwards utime when running the test case.

Back in 2009, in changeset 2b5fe6de5 Oleg Nesterov already found
that it should be safe to remove the spinlock.  I believe this is
true, because it appears that nobody changes another task's ->sighand
pointer, except at fork time and exit time, during which the task
cannot be in do_sys_times.

This is subtle enough to warrant documenting.

The increased scalability from removing the spinlock should help
things like databases and middleware that measure the resource
use of every query processed.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sanjay Rao <srao@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 kernel/sys.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 66a751e..cb81ce4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -862,11 +862,15 @@ void do_sys_times(struct tms *tms)
 {
 	cputime_t tgutime, tgstime, cutime, cstime;
 
-	spin_lock_irq(&current->sighand->siglock);
+	/*
+	 * sys_times gets away with not locking &current->sighand->siglock
+	 * because most of the time only the current process gets to change
+	 * its own sighand pointer. The exception is exit, which changes
+	 * the sighand pointer of an exiting process.
+	 */
 	thread_group_cputime_adjusted(current, &tgutime, &tgstime);
 	cutime = current->signal->cutime;
 	cstime = current->signal->cstime;
-	spin_unlock_irq(&current->sighand->siglock);
 	tms->tms_utime = cputime_to_clock_t(tgutime);
 	tms->tms_stime = cputime_to_clock_t(tgstime);
 	tms->tms_cutime = cputime_to_clock_t(cutime);
