From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755221AbbCBVtz (ORCPT ); Mon, 2 Mar 2015 16:49:55 -0500 Received: from g4t3427.houston.hp.com ([15.201.208.55]:36575 "EHLO g4t3427.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754484AbbCBVtr (ORCPT ); Mon, 2 Mar 2015 16:49:47 -0500 Message-ID: <1425332984.5304.66.camel@j-VirtualBox> Subject: Re: [PATCH v2] sched, timer: Use atomics for thread_group_cputimer to improve scalability From: Jason Low To: Linus Torvalds Cc: Peter Zijlstra , Ingo Molnar , "Paul E. McKenney" , Andrew Morton , Oleg Nesterov , Mike Galbraith , Frederic Weisbecker , Rik van Riel , Steven Rostedt , Scott Norton , Aswin Chandramouleeswaran , Linux Kernel Mailing List , jason.low2@hp.com Date: Mon, 02 Mar 2015 13:49:44 -0800 In-Reply-To: References: <1425321731.5304.14.camel@j-VirtualBox> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.2.3-0ubuntu6 Content-Transfer-Encoding: 7bit Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2015-03-02 at 11:03 -0800, Linus Torvalds wrote: > On Mon, Mar 2, 2015 at 10:42 AM, Jason Low wrote: > > > > This patch converts the timers to 64 bit atomic variables and use > > atomic add to update them without a lock. With this patch, the percent > > of total time spent updating thread group cputimer timers was reduced > > from 30% down to less than 1%. > > NAK. > > Not because I think this is wrong, but because somebody needs to look > at the effects on 32-bit architectures too. > > In particular, check out lib/atomic64.c - which uses a hashed array of > 16-bit spinlocks to do 64-bit atomics. That may or may well work ok in > practice, but it does mean that now sample_group_cputimer() and > update_gt_cputime() will take that (it ends up generally being the > same) spinlock three times for the three atomic64_read()'s. Okay, I will run some tests to see how this change affects the performance of itimers on 32 bit systems. While the update_gt_cputime() shouldn't be an issue for performance since it doesn't get called often, the sample_group_cputimer() needing to take locks 3 times for each atomic64_read is something that could impact performance, so we should take a look at that. Thanks, Jason