Date: Fri, 11 Jul 2014 04:08:32 +0800
From: Yuyang Du
To: bsegall@google.com
Cc: Peter Zijlstra, mingo@redhat.com, linux-kernel@vger.kernel.org, rafael.j.wysocki@intel.com, arjan.van.de.ven@intel.com, len.brown@intel.com, alan.cox@intel.com, mark.gross@intel.com, pjt@google.com, fengguang.wu@intel.com
Subject: Re: [PATCH 2/2] sched: Rewrite per entity runnable load average tracking
Message-ID: <20140710200831.GB12984@intel.com>
References: <1404268256-3019-1-git-send-email-yuyang.du@intel.com> <1404268256-3019-2-git-send-email-yuyang.du@intel.com> <20140707104646.GK6758@twins.programming.kicks-ass.net> <20140708000840.GB25653@intel.com> <20140709010753.GD25653@intel.com> <20140709184543.GI9918@twins.programming.kicks-ass.net> <20140709233049.GA12024@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 10, 2014 at 10:06:27AM -0700, bsegall@google.com wrote:
> So, sched_clock(_cpu) can be arbitrarily far off of cfs_rq_clock_task, so you
> can't really do that. Ideally, yes, you would account for any time since
> the last update and account that time as !runnable. However, I don't
> think there is any good way to do that, and the current code doesn't.

Yeah. We only catch the migrating task up to its cfs_rq and subtract it; there is no catching up to the "current" time.

> > I made another mistake.
> > We should not only track task entity load; the group entity
> > (as an entity) is also needed. Otherwise, task_h_load can't be done correctly...
> > Sorry for the mess-up. But this won't make much change in the code.
>
> This will increase it to 2x __update_load_avg per cgroup per
> enqueue/dequeue. What does this (and this patch in general) do to
> context switch cost at cgroup depth 1/2/3?

We can update the cfs_rq load_avg and let the cfs_rq's own se take a ride
in that update. The two should be exactly synchronized anyway (the group
se's load is only useful for the task_h_load calculation, and the group
cfs_rq's load is useful for task_h_load and the update_cfs_share
calculation). And technically it looks easy:

To update the cfs_rq, the update weight is cfs_rq->load.weight.

To update its se, the update weight is cfs_rq->tg->se[cpu]->load.weight * on_rq.

So it will not increase the cost to 2x, but maybe to 1.05x :)

Thanks,
Yuyang
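To illustrate the idea (not the actual kernel code), here is a toy
user-space C sketch of the combined update: one geometric-decay pass that
updates the cfs_rq average and lets its group se "take a ride" in the same
pass, each with its own weight. The fixed-point decay factor 1002/1024 is
an assumption for this sketch (it approximates the kernel's y, where
y^32 = 1/2); the struct and function names are made up here, not the
kernel's.

```c
#include <assert.h>
#include <stdio.h>

/* Toy decayed load accumulator: sum = sum * y + weight per period. */
struct load_avg {
	unsigned long sum;
};

/* One decay-and-accrue step for a given weight over one period.
 * y is approximated in fixed point as 1002/1024 (~0.9785), close to
 * the kernel's y with y^32 = 1/2 -- an assumption for this sketch. */
static void update_load(struct load_avg *la, unsigned long weight,
			int runnable)
{
	la->sum = la->sum * 1002 / 1024;	/* decay old contribution */
	if (runnable)
		la->sum += weight;		/* accrue this period */
}

/* Combined update: decay the cfs_rq average and let its group se take
 * a ride in the same pass. The cfs_rq side uses its own load.weight;
 * the se side uses the group se's weight gated by on_rq, mirroring
 * the two weights described in the mail. */
static void update_cfs_rq_and_se(struct load_avg *cfs_rq_avg,
				 unsigned long cfs_rq_weight,
				 struct load_avg *se_avg,
				 unsigned long se_weight, int se_on_rq)
{
	update_load(cfs_rq_avg, cfs_rq_weight, 1);
	update_load(se_avg, se_weight, se_on_rq);
}
```

Since both averages are decayed in one pass over the same period
boundaries, they stay synchronized by construction, which is why the
incremental cost is far below a second full __update_load_avg call.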