From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756122AbbCBVQV (ORCPT );
	Mon, 2 Mar 2015 16:16:21 -0500
Received: from g9t5009.houston.hp.com ([15.240.92.67]:39285 "EHLO
	g9t5009.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753543AbbCBVQT (ORCPT );
	Mon, 2 Mar 2015 16:16:19 -0500
X-Greylist: delayed 9241 seconds by postgrey-1.27 at vger.kernel.org;
	Mon, 02 Mar 2015 16:16:19 EST
Message-ID: <1425330975.5304.49.camel@j-VirtualBox>
Subject: Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer
 to improve scalability
From: Jason Low
To: Oleg Nesterov
Cc: Peter Zijlstra, Ingo Molnar, Linus Torvalds, "Paul E. McKenney",
	Andrew Morton, Mike Galbraith, Frederic Weisbecker, Rik van Riel,
	Steven Rostedt, Scott Norton, Aswin Chandramouleeswaran,
	linux-kernel@vger.kernel.org, Jason Low
Date: Mon, 02 Mar 2015 13:16:15 -0800
In-Reply-To: <20150302194356.GB27914@redhat.com>
References: <1425321731.5304.14.camel@j-VirtualBox>
	 <20150302194033.GA27914@redhat.com>
	 <20150302194356.GB27914@redhat.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-03-02 at 20:43 +0100, Oleg Nesterov wrote:
> On 03/02, Oleg Nesterov wrote:
> >
> > Well, I forgot everything about this code, but let me ask anyway ;)
> >
> > On 03/02, Jason Low wrote:
> > >
> > > -static void update_gt_cputime(struct task_cputime *a, struct task_cputime *b)
> > > +static inline void __update_gt_cputime(atomic64_t *cputime, u64 sum_cputime)
> > >  {
> > > -	if (b->utime > a->utime)
> > > -		a->utime = b->utime;
> > > -
> > > -	if (b->stime > a->stime)
> > > -		a->stime = b->stime;
> > > +	u64 curr_cputime;
> > > +	/*
> > > +	 * Set cputime to sum_cputime if sum_cputime > cputime. Use cmpxchg
> > > +	 * to avoid race conditions with concurrent updates to cputime.
> > > +	 */
> > > +retry:
> > > +	curr_cputime = atomic64_read(cputime);
> > > +	if (sum_cputime > curr_cputime) {
> > > +		if (atomic64_cmpxchg(cputime, curr_cputime, sum_cputime) != curr_cputime)
> > > +			goto retry;
> > > +	}
> > > +}
> > >
> > > -	if (b->sum_exec_runtime > a->sum_exec_runtime)
> > > -		a->sum_exec_runtime = b->sum_exec_runtime;
> > > +static void update_gt_cputime(struct thread_group_cputimer *cputimer, struct task_cputime *sum)
> > > +{
> > > +	__update_gt_cputime(&cputimer->utime, sum->utime);
> > > +	__update_gt_cputime(&cputimer->stime, sum->stime);
> > > +	__update_gt_cputime(&cputimer->sum_exec_runtime, sum->sum_exec_runtime);
> > >  }
> >
> > And this is called if !cputimer_running().
> >
> > So who else can update these atomic64_t's ? The caller is called under ->siglock.
> > IOW, do we really need to cmpxchg/retry ?
> >
> > Just curious, I am sure I missed something.
>
> Ah, sorry, I seem to understand.
>
> We still can race with account_group_*time() even if ->running == 0. Because
> (say) account_group_exec_runtime() can race with 1 -> 0 -> 1 transition.
>
> Or is there another reason?

Hi Oleg,

Yes, that 1 -> 0 -> 1 transition was the race that I had in mind, so I
added the extra atomic logic in update_gt_cputime() just to be safe.

In the original code, we set cputimer->running first, so the timer is
already running while we call update_gt_cputime(). In this patch, we
swapped the two calls so that running is set only after
update_gt_cputime() completes, which means that would no longer be an
issue.