Re: [PATCH 3/3] sched,time: atomically increment stime & utime

From: Oleg Nesterov <oleg@redhat.com>
To: riel@redhat.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	umgwanakikbuti@gmail.com, fweisbec@gmail.com,
	akpm@linux-foundation.org, srao@redhat.com, lwoodman@redhat.com,
	atheurer@redhat.com
Subject: Re: [PATCH 3/3] sched,time: atomically increment stime & utime
Date: Sat, 16 Aug 2014 16:55:15 +0200
Message-ID: <20140816145515.GA17226@redhat.com>
In-Reply-To: <1408133138-22048-4-git-send-email-riel@redhat.com>

On 08/15, riel@redhat.com wrote:
>
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -605,9 +605,12 @@ static void cputime_adjust(struct task_cputime *curr,
>  	 * If the tick based count grows faster than the scheduler one,
>  	 * the result of the scaling may go backward.
>  	 * Let's enforce monotonicity.
> +	 * Atomic exchange protects against concurrent cputime_adjust.
>  	 */
> -	prev->stime = max(prev->stime, stime);
> -	prev->utime = max(prev->utime, utime);
> +	while (stime > (rtime = ACCESS_ONCE(prev->stime)))
> +		cmpxchg(&prev->stime, rtime, stime);
> +	while (utime > (rtime = ACCESS_ONCE(prev->utime)))
> +		cmpxchg(&prev->utime, rtime, utime);
>
>  out:
>  	*ut = prev->utime;

I am still not sure about this change. At least I think it needs some
discussion.
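
For reference, a minimal userspace sketch of what each of the quoted
loops does to a single field, assuming C11 atomics in place of the
kernel's ACCESS_ONCE() and cmpxchg(). The names cmpxchg_sketch() and
monotonic_update() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t cputime_t;

/*
 * Hypothetical stand-in for the kernel's cmpxchg(): store 'newval' only if
 * *p still holds 'old'. The result is ignored because the caller re-reads
 * *p on the next loop iteration anyway.
 */
static void cmpxchg_sketch(_Atomic cputime_t *p, cputime_t old,
			   cputime_t newval)
{
	atomic_compare_exchange_strong(p, &old, newval);
}

/* Same shape as the loops in the quoted hunk: *p never moves backward. */
static cputime_t monotonic_update(_Atomic cputime_t *p, cputime_t val)
{
	cputime_t cur;

	while (val > (cur = atomic_load(p)))
		cmpxchg_sketch(p, cur, val);

	return cur;	/* value observed once val <= *p */
}

/* Self-check: a smaller value never overwrites a larger one. */
static void check_monotonic(void)
{
	_Atomic cputime_t v;

	atomic_init(&v, 0);
	assert(monotonic_update(&v, 5) == 5);
	assert(monotonic_update(&v, 3) == 5);
	assert(monotonic_update(&v, 9) == 9);
}
```

Each field is monotonic on its own here; nothing in the sketch ties the
update of one field to the update of the other.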

Let me repeat, afaics this can lead to inconsistent results. Just
suppose that the caller of thread_group_cputime_adjusted() gets a long
preemption between thread_group_cputime() and cputime_adjust(), and
the numbers in signal->prev_cputime have grown significantly by the
time this task resumes. If cputime_adjust() sees both prev->stime and
prev->utime updated, everything is fine. But we can race with
cputime_adjust() on another CPU and miss, say, the change in ->utime.

IOW, to simplify, suppose that thread_group_cputime(T) fills task_cputime
with zeros. Then the caller X is preempted.

Another task does thread_group_cputime(T) and this time task_cputime is
{ .utime = A_LOT_U, .stime = A_LOT_S }. This task calls cputime_adjust()
and sets prev->stime = A_LOT_S, but has not yet updated prev->utime.

X resumes, calls cputime_adjust(), and returns
{ .utime = 0, .stime = A_LOT_S }.
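
The interleaving above can be replayed deterministically in userspace.
This is only a sketch under stated assumptions: C11 atomics stand in for
ACCESS_ONCE()/cmpxchg(), the scaling step of cputime_adjust() is left
out, and struct prev_cputime_sketch, raise_to() and replay_race() are
hypothetical names, not kernel code. "Y" is the other task from the
scenario.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t cputime_t;

/* Hypothetical stand-in for signal->prev_cputime. */
struct prev_cputime_sketch {
	_Atomic cputime_t utime;
	_Atomic cputime_t stime;
};

struct sample {
	cputime_t utime, stime;
};

/* Userspace analogue of one cmpxchg loop: raise *p to at least val. */
static void raise_to(_Atomic cputime_t *p, cputime_t val)
{
	cputime_t old = atomic_load(p);

	/* On failure, 'old' is refreshed with the current value of *p. */
	while (val > old && !atomic_compare_exchange_weak(p, &old, val))
		;
}

/* Replays the race step by step on one thread; returns what X reports. */
static struct sample replay_race(cputime_t a_lot_u, cputime_t a_lot_s)
{
	struct prev_cputime_sketch prev;
	struct sample x;
	/* X: thread_group_cputime(T) saw zeros, then X was preempted. */
	cputime_t x_utime = 0, x_stime = 0;

	atomic_init(&prev.utime, 0);
	atomic_init(&prev.stime, 0);

	/* Y: first loop of cputime_adjust(), only ->stime is updated. */
	raise_to(&prev.stime, a_lot_s);

	/* X resumes and runs all of cputime_adjust() with stale zeros. */
	raise_to(&prev.stime, x_stime);
	raise_to(&prev.utime, x_utime);
	x.utime = atomic_load(&prev.utime);
	x.stime = atomic_load(&prev.stime);

	/* Y: second loop, too late for X. */
	raise_to(&prev.utime, a_lot_u);

	return x;
}

/* Self-check: X observes the mismatched pair { 0, A_LOT_S }. */
static void check_replay(void)
{
	struct sample x = replay_race(1000, 2000);

	assert(x.utime == 0 && x.stime == 2000);
}
```

Both fields stay monotonic throughout, yet the pair X returns mixes a
stale utime with a fresh stime.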

If you think that we do not care, probably I won't argue. But at least
this should be documented/discussed. And if we can tolerate this, then we
can probably simply remove the scale_stime recalculation and change it to
just do

	static void cputime_adjust(struct task_cputime *curr,
				   struct cputime *prev,
				   cputime_t *ut, cputime_t *st)
	{
		/* Take the raw values; no scale_stime() recalculation. */
		cputime_t rtime, stime = curr->stime, utime = curr->utime;

		/*
		 * Let's enforce monotonicity.
		 * Atomic exchange protects against concurrent cputime_adjust.
		 */
		while (stime > (rtime = ACCESS_ONCE(prev->stime)))
			cmpxchg(&prev->stime, rtime, stime);
		while (utime > (rtime = ACCESS_ONCE(prev->utime)))
			cmpxchg(&prev->utime, rtime, utime);

		*ut = prev->utime;
		*st = prev->stime;
	}

Oleg.

Thread overview: 19+ messages
2014-08-15 20:05 [PATCH 0/3] lockless sys_times and posix_cpu_clock_get riel
2014-08-15 20:05 ` [PATCH 1/3] exit: always reap resource stats in __exit_signal riel
2014-09-08  6:39   ` [tip:sched/core] exit: Always reap resource stats in __exit_signal() tip-bot for Rik van Riel
2014-08-15 20:05 ` [PATCH 2/3] time,signal: protect resource use statistics with seqlock riel
2014-08-16 14:11   ` Oleg Nesterov
2014-08-16 15:07     ` Rik van Riel
2014-08-16 17:40     ` [PATCH v2 " Rik van Riel
2014-08-16 17:50       ` Oleg Nesterov
2014-08-18  4:44         ` Mike Galbraith
2014-08-18 14:03           ` Rik van Riel
2014-08-19 14:26             ` Mike Galbraith
2014-09-08  6:39       ` [tip:sched/core] time, signal: Protect " tip-bot for Rik van Riel
2014-08-15 20:05 ` [PATCH 3/3] sched,time: atomically increment stime & utime riel
2014-08-16 14:55   ` Oleg Nesterov [this message]
2014-08-16 14:56     ` Oleg Nesterov
2014-09-08  6:40   ` [tip:sched/core] sched, time: Atomically " tip-bot for Rik van Riel
2014-08-19 21:21 ` [PATCH 0/3] lockless sys_times and posix_cpu_clock_get Andrew Theurer
2014-09-03 18:38   ` Rik van Riel
2014-09-04  7:48     ` Peter Zijlstra