From: Frederic Weisbecker <fweisbec@gmail.com>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
Tony Luck <tony.luck@intel.com>,
Wanpeng Li <wanpeng.li@hotmail.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Thomas Gleixner <tglx@linutronix.de>,
Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH 08/37] cputime: Convert task/group cputime to nsecs
Date: Mon, 30 Jan 2017 16:38:08 +0100 [thread overview]
Message-ID: <20170130153807.GB16945@lerouge> (raw)
In-Reply-To: <20170130135451.GB2669@redhat.com>
On Mon, Jan 30, 2017 at 02:54:51PM +0100, Stanislaw Gruszka wrote:
> On Sat, Jan 28, 2017 at 04:28:13PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 28, 2017 at 12:57:40PM +0100, Stanislaw Gruszka wrote:
> > > On 32 bit architectures 64bit store/load is not atomic and if not
> > > protected - 64bit variables can be mangled. I do not see any protection
> > > (lock) between utime/stime store and load in the patch and seems that
> > > {u/s}time store/load can be performed at the same time. Though problem
> > > is very very improbable it still can happen at least theoretically when
> > > lower and upper 32 bits are changed at the same time i.e. process
> > > {u,s}time become near to multiple of 2**32 nsec (aprox: 4sec) and
> > > 64bit {u,s}time is stored and loaded at the same time on different
> > > cpus. As said this is very improbable situation, but eventually could
> > > be possible on long lived processes.
> >
> > "Improbable situation" doesn't appply to Linux. With millions (billion?)
> > of machines using it, a rare issue in the core turns into likely to happen
> > somewhere in the planet every second.
> >
> > So it's definetly a race we want to consider. Note it goes beyond the scope
> > of this patchset as the issue was already there before since cputime_t can already
> > map to u64 on 32 bits systems upstream. But this patchset definetly extends
> > the issue on all 32 bits configs.
> >
> > kcpustat has the same issue upstream. It's is made of u64 on all configs.
>
> I would like to add what are possible consequences if value will be
> mangled. For sum_exec_runtime, utime and stime we could get wrong values
> on cpu-clock related syscalls like clock_gettime() or clock_nanosleep()
> and cpu-clock timers like timer_create(CLOCK_PROCESS_CPUTIME_ID) can be
> triggered before or long after expected. For kcpustat this seems to be
> wrong values read by procfs and 3 drivers (cpufreq, appldata, macintosh).
Yep, all agreed.
>
> > > I considering fixing problem of sum_exec_runtime possible mangling
> > > by using prev_sum_exec_runtime:
> > >
> > > u64 read_sum_exec_runtime(struct task_struct *t)
> > > {
> > > u64 ns, prev_ns;
> > >
> > > do {
> > > prev_ns = READ_ONCE(t->se.prev_sum_exec_runtime);
> > > ns = READ_ONCE(t->se.sum_exec_runtime);
> > > } while (ns < prev_ns || ns > (prev_ns + U32_MAX));
> > >
> > > return ns;
> > > }
> > >
> > > This should work based on fact that prev_sum_exec_runtime and
> > > sum_exec_runtime are not modified and stored at the same time, so only
> > > one of those variabled can be mangled. Though I need to think about
> > > correctnes of that a bit more.
> >
> > I'm not sure that would be enough. READ_ONCE prevents from reordering by the
> > compiler but not by the CPU. You'd need memory barriers between reads and
> > writes of prev_ns and ns.
>
> It will not be enough, this _suppose_ to work based on that sum_exec_runtime
> and prev_sum_exec_runtime are not written at the same time. i.e. only
> one variable can be mangled as another one sits already in the memory.
> However "not written at the same time" is weak part of reasoning. Even
> if those variables are stored at different part of code (sum_exec_runtime
> on update_curr() and prev_sum_exec_runtime on set_next_entity()) we can
> not assume store of one variable is finished before another one starts.
Right.
>
> > WRITE ns READ prev_ns
> > smp_wmb() smp_rmb()
> > WRITE prev_ns READ ns
> > smp_wmb() smp_rmb()
> >
> > It seems to be the only way to make sure that at least one of the reads
> > (prev_ns or ns) is correct.
Well reading that again, I'm not 100% sure about the correctness guarantee.
But it might work.
>
> I think you have right, but seems on much code paths we have scenario:
>
> WRITE ns READ prev_ns
> smp_wmb() smp_rmb()
> WRITE prev_ns READ ns
>
> and we have already smp_wmb() after write of sum_exec_runtime on
> update_min_vruntime().
You still need a second barrier after the second write and read (or
before the first write and read, it's the same) to ensure that if you read
a mangled version of ns, prev_ns is ok.
Still I think u64_stats_sync is less trouble and more reliable.
Thanks.
next prev parent reply other threads:[~2017-01-30 15:39 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-22 18:19 [PATCH 00/37] cputime: Convert core use of cputime_t to nsecs v3 Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 01/37] jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 02/37] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 03/37] sched: Remove unused INIT_CPUTIME macro Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 04/37] cputime: Convert kcpustat to nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 05/37] macintosh/rack-meter: Remove cputime_t internal use Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 06/37] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 07/37] cputime: Special API to return old-typed cputime Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 08/37] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
2017-01-28 11:57 ` Stanislaw Gruszka
2017-01-28 15:28 ` Frederic Weisbecker
2017-01-30 13:54 ` Stanislaw Gruszka
2017-01-30 15:38 ` Frederic Weisbecker [this message]
2017-01-22 18:19 ` [PATCH 09/37] alpha: Convert obsolete cputime_t " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 10/37] x86: Convert obsolete cputime type " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 11/37] isdn: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 12/37] binfmt: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 13/37] acct: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 14/37] delaycct: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 15/37] tsacct: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 16/37] signal: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 17/37] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 18/37] posix-timers: Use TICK_NSEC instead of a dynamically ad-hoc calculated version Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 19/37] posix-timers: Convert internals to use nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 20/37] itimer: Convert internal cputime_t units to nsec Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 21/37] sched: Remove temporary cputime_t accessors Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 22/37] cputime: Push time to account_user_time() in nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 23/37] cputime: Push time to account_steal_time() " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 24/37] cputime: Push time to account_idle_time() " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 25/37] cputime: Push time to account_system_time() " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 26/37] cputime: Complete nsec conversion of tick based accounting Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 27/37] vtime: Return nsecs instead of cputime_t to account Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 28/37] cputime: Remove jiffies based cputime Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 29/37] ia64: Move nsecs based cputime headers to the last arch using it Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 30/37] ia64: Convert vtime to use nsec units directly Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 31/37] ia64: Remove unused cputime definitions Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 32/37] s390: Make arch_cpu_idle_time() to return nsecs Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 33/37] powerpc: Remove unused cputime definitions Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 34/37] s390: " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 35/37] cputime: Remove unused nsec_to_cputime Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 36/37] cputime: Remove asm generic headers Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 37/37] s390: Prevent from cputime leaks Frederic Weisbecker
2017-01-23 9:44 ` Martin Schwidefsky
2017-01-25 15:25 ` Frederic Weisbecker
2017-01-25 15:40 ` Martin Schwidefsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170130153807.GB16945@lerouge \
--to=fweisbec@gmail.com \
--cc=benh@kernel.crashing.org \
--cc=fenghua.yu@intel.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=schwidefsky@de.ibm.com \
--cc=sgruszka@redhat.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox