From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756122AbbCBVQV (ORCPT );
	Mon, 2 Mar 2015 16:16:21 -0500
Received: from g9t5009.houston.hp.com ([15.240.92.67]:39285 "EHLO
	g9t5009.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753543AbbCBVQT (ORCPT );
	Mon, 2 Mar 2015 16:16:19 -0500
X-Greylist: delayed 9241 seconds by postgrey-1.27 at vger.kernel.org;
	Mon, 02 Mar 2015 16:16:19 EST
Message-ID: <1425330975.5304.49.camel@j-VirtualBox>
Subject: Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer
 to improve scalability
From: Jason Low
To: Oleg Nesterov
Cc: Peter Zijlstra, Ingo Molnar, Linus Torvalds, "Paul E. McKenney",
	Andrew Morton, Mike Galbraith, Frederic Weisbecker, Rik van Riel,
	Steven Rostedt, Scott Norton, Aswin Chandramouleeswaran,
	linux-kernel@vger.kernel.org, Jason Low
Date: Mon, 02 Mar 2015 13:16:15 -0800
In-Reply-To: <20150302194356.GB27914@redhat.com>
References: <1425321731.5304.14.camel@j-VirtualBox>
	 <20150302194033.GA27914@redhat.com>
	 <20150302194356.GB27914@redhat.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-03-02 at 20:43 +0100, Oleg Nesterov wrote:
> On 03/02, Oleg Nesterov wrote:
> >
> > Well, I forgot everything about this code, but let me ask anyway ;)
> >
> > On 03/02, Jason Low wrote:
> > >
> > > -static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
> > > +static inline void __update_gt_cputime(atomic64_t *cputime, u64 sum_cputime)
> > >  {
> > > -	if (b->utime > a->utime)
> > > -		a->utime = b->utime;
> > > -
> > > -	if (b->stime > a->stime)
> > > -		a->stime = b->stime;
> > > +	u64 curr_cputime;
> > > +	/*
> > > +	 * Set cputime to sum_cputime if sum_cputime > cputime. Use cmpxchg
> > > +	 * to avoid race conditions with concurrent updates to cputime.
> > > +	 */
> > > +retry:
> > > +	curr_cputime = atomic64_read(cputime);
> > > +	if (sum_cputime > curr_cputime) {
> > > +		if (atomic64_cmpxchg(cputime, curr_cputime, sum_cputime) != curr_cputime)
> > > +			goto retry;
> > > +	}
> > > +}
> > >
> > > -	if (b->sum_exec_runtime > a->sum_exec_runtime)
> > > -		a->sum_exec_runtime = b->sum_exec_runtime;
> > > +static void update_gt_cputime(struct thread_group_cputimer *cputimer, struct task_cputime *sum)
> > > +{
> > > +	__update_gt_cputime(&cputimer->utime, sum->utime);
> > > +	__update_gt_cputime(&cputimer->stime, sum->stime);
> > > +	__update_gt_cputime(&cputimer->sum_exec_runtime, sum->sum_exec_runtime);
> > >  }
> >
> > And this is called if !cputimer_running().
> >
> > So who else can update these atomic64_t's ? The caller is called under ->siglock.
> > IOW, do we really need to cmpxchg/retry ?
> >
> > Just curious, I am sure I missed something.
>
> Ah, sorry, I seem to understand.
>
> We still can race with account_group_*time() even if ->running == 0. Because
> (say) account_group_exec_runtime() can race with 1 -> 0 -> 1 transition.
>
> Or is there another reason?

Hi Oleg,

Yes, that 1 -> 0 -> 1 transition was the race that I had in mind, so I
added the extra atomic logic in update_gt_cputime() just to be safe.

In the original code, we set cputimer->running first, so the timer is
already running while we call update_gt_cputime(). In this patch, we
swapped the two calls so that running is set only after
update_gt_cputime() completes, which means that would no longer be an
issue.