From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754189Ab3LQOJT (ORCPT );
	Tue, 17 Dec 2013 09:09:19 -0500
Received: from service87.mimecast.com ([91.220.42.44]:46108 "EHLO
	service87.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754001Ab3LQOJR convert rfc822-to-8bit (ORCPT );
	Tue, 17 Dec 2013 09:09:17 -0500
Message-ID: <52B05B09.5090807@arm.com>
Date: Tue, 17 Dec 2013 14:09:13 +0000
From: Chris Redpath
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101
	Thunderbird/24.2.0
MIME-Version: 1.0
To: Peter Zijlstra
CC: "pjt@google.com", "mingo@redhat.com", "alex.shi@linaro.org",
	Morten Rasmussen, Dietmar Eggemann, "linux-kernel@vger.kernel.org",
	"bsegall@google.com", Vincent Guittot, Frederic Weisbecker
Subject: Re: [PATCH 2/2] sched: update runqueue clock before migrations away
References: <1386593950-26475-1-git-send-email-chris.redpath@arm.com>
	<1386593950-26475-3-git-send-email-chris.redpath@arm.com>
	<20131210114825.GF12849@twins.programming.kicks-ass.net>
	<52A71605.5090509@arm.com>
	<20131210151428.GH12849@twins.programming.kicks-ass.net>
	<52A7397F.4000806@arm.com>
	<20131212182414.GF2480@laptop.programming.kicks-ass.net>
In-Reply-To: <20131212182414.GF2480@laptop.programming.kicks-ass.net>
X-OriginalArrivalTime: 17 Dec 2013 14:09:13.0947 (UTC)
	FILETIME=[90ABCAB0:01CEFB31]
X-MC-Unique: 113121714091402601
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/12/13 18:24, Peter Zijlstra wrote:
> Would pre_schedule_idle() -> rq_last_tick_reset() -> rq->last_sched_tick
> be useful?
>
> I suppose we could easily lift that to NO_HZ_COMMON.
>

Many thanks for the tip, Peter. I have tried this out and it provides enough information to correct the problem.
The new version doesn't update the rq; it just carries the extra unaccounted time (estimated from the jiffies) over to be processed during enqueue.

However, before I send a new patch set I have a question about the existing behaviour. Ben, you may already know the answer to this?

During a wake migration we call __synchronize_entity_decay() in migrate_task_rq_fair(), which will decay avg.runnable_avg_sum. We also record the number of periods we decayed for as a negative number in avg.decay_count. We then enqueue the task on its target runqueue, and again we decay the load by the number of periods it has been off-rq:

	if (unlikely(se->avg.decay_count <= 0)) {
		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
		if (se->avg.decay_count) {
			se->avg.last_runnable_update -=
					(-se->avg.decay_count) << 20;
>>>			update_entity_load_avg(se, 0);

Am I misunderstanding how this is supposed to work, or have we always been double-accounting sleep time for wake migrations?