From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753308AbaGJKJF (ORCPT );
	Thu, 10 Jul 2014 06:09:05 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:58071 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752525AbaGJKJD (ORCPT );
	Thu, 10 Jul 2014 06:09:03 -0400
Date: Thu, 10 Jul 2014 12:08:59 +0200
From: Peter Zijlstra 
To: bsegall@google.com
Cc: Yuyang Du , mingo@redhat.com, linux-kernel@vger.kernel.org,
	rafael.j.wysocki@intel.com, arjan.van.de.ven@intel.com,
	len.brown@intel.com, alan.cox@intel.com, mark.gross@intel.com,
	pjt@google.com, fengguang.wu@intel.com
Subject: Re: [PATCH 2/2] sched: Rewrite per entity runnable load average tracking
Message-ID: <20140710100859.GW3935@laptop>
References: <1404268256-3019-1-git-send-email-yuyang.du@intel.com>
 <1404268256-3019-2-git-send-email-yuyang.du@intel.com>
 <20140707104646.GK6758@twins.programming.kicks-ass.net>
 <20140708000840.GB25653@intel.com>
 <20140709010753.GD25653@intel.com>
 <20140709184543.GI9918@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 09, 2014 at 12:07:08PM -0700, bsegall@google.com wrote:
> Peter Zijlstra writes:
>
> > On Wed, Jul 09, 2014 at 09:07:53AM +0800, Yuyang Du wrote:
> >> That is chalenging... Can someone (Peter) grant us a lock of the
> >> remote rq? :)
> >
> > Nope :-).. we got rid of that lock for a good reason.
> >
> > Also, this is one area where I feel performance really trumps
> > correctness, we can fudge the blocked load a little. So the
> > sched_clock_cpu() difference is a strict upper bound on the
> > rq_clock_task() difference (and under 'normal' circumstances
> > shouldn't be much off).
> Well, unless IRQ_TIME_ACCOUNTING or such is on, in which case you lose.
> Or am I misunderstanding the suggestion?

If it's on, it's still an upper bound, and typically the difference is
not too large, I think. Since clock_task is the regular clock minus
some local amount, the difference between two regular clock reads is
always a strict upper bound on clock_task differences.

> Actually the simplest thing
> would probably be to grab last_update_time (which on 32-bit could be
> done with the _copy hack) and use that. Then I think the accuracy is
> only worse than current in that you can lose runnable load as well as
> blocked load, and that it isn't as easily corrected - currently if the
> blocked tasks wake up they'll add the correct numbers to
> runnable_load_avg, even if blocked_load_avg is screwed up and hit zero.
> This code would have to wait until it stabilized again.

The problem with that is that last_update_time is measured in
clock_task, and you cannot transfer these values between CPUs;
clock_task can drift unbounded between CPUs.