From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>,
linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Frank Mayhar <fmayhar@google.com>,
Frederic Weisbecker <fweisbec@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Sanjay Rao <srao@redhat.com>, Larry Woodman <lwoodman@redhat.com>
Subject: Re: [PATCH RFC] time: drop do_sys_times spinlock
Date: Wed, 13 Aug 2014 08:59:50 +0200
Message-ID: <1407913190.5542.50.camel@marge.simpson.net>
In-Reply-To: <20140812191218.GA15210@redhat.com>

On Tue, 2014-08-12 at 21:12 +0200, Oleg Nesterov wrote:
> On 08/12, Rik van Riel wrote:
> >
> > Back in 2009, Spencer Candland pointed out there is a race with
> > do_sys_times, where multiple threads calling do_sys_times can
> > sometimes get decreasing results.
> >
> > https://lkml.org/lkml/2009/11/3/522
> >
> > As a result of that discussion, some of the code in do_sys_times
> > was moved under a spinlock.
> >
> > However, that does not seem to actually make the race go away on
> > larger systems. One obvious remaining race is that after one thread
> > is about to return from do_sys_times, it is preempted by another
> > thread, which also runs do_sys_times, and stores a larger value in
> > the shared variable than what the first thread got.
> >
> > This race is on the kernel/userspace boundary, and not fixable
> > with spinlocks.
>
> Not sure I understand...
>
> Afaics, the problem is that a single thread can observe the decreasing
> (say) sum_exec_runtime if it calls do_sys_times() twice without the lock.
>
> This is because it can account the exiting sub-thread twice if it races
> with __exit_signal() which increments sig->sum_sched_runtime, but this
> exiting thread can still be visible to thread_group_cputime().
>
> IOW, it is not actually about decreasing, the problem is that the lockless
> thread_group_cputime() can return the wrong result, and the next sys_times()
> can show the right value.
>
> > Back in 2009, in changeset 2b5fe6de5 Oleg Nesterov already found
> > that it should be safe to remove the spinlock.
>
> Yes, it is safe but only in a sense that for_each_thread() is fine lockless.
> So this change was reverted.

Funny that thread_group_cputime() should come up just now..

Could you take tasklist_lock ala posix_cpu_clock_get_task()?  If so,
would that improve things at all?

I was told that clock_gettime(CLOCK_PROCESS_CPUTIME_ID) has scalability
issues on BIG boxen, but perhaps less so than times()?

I'm sure the real clock_gettime() using proggy that gummed up a ~1200
core box for "a while" wasn't the testcase below, which will gum it up
for a long while, but looks to me like using CLOCK_PROCESS_CPUTIME_ID
from LOTS of threads is a "Don't do that, it'll hurt a LOT".

#include <sys/time.h>
#include <mpi.h>
#include <stdio.h>
#include <time.h>

int
main(int argc, char **argv)
{
	struct timeval tv;
	struct timespec tp;
	int rc;
	int i;

	MPI_Init(&argc, &argv);

	/* Each rank hammers the wall clock and the process CPU clock. */
	for (i = 0; i < 100000; i++) {
		rc = gettimeofday(&tv, NULL);
		if (rc < 0)
			perror("gettimeofday");

		rc = clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tp);
		if (rc < 0)
			perror("clock_gettime");
	}

	MPI_Finalize();
	return 0;
}