Message-ID: <52A73121.30309@linaro.org>
Date: Tue, 10 Dec 2013 23:20:01 +0800
From: Alex Shi
To: Daniel Lezcano, Linux Kernel Mailing List, Mike Galbraith
Subject: Re: [question] sched: idle_avg and migration latency
In-Reply-To: <52A6FB5C.7010706@linaro.org>

CC'ing MikeG, who wrote this part. :)

I'll try to explain what I know. Apologies if my understanding is incorrect.

On 12/10/2013 07:30 PM, Daniel Lezcano wrote:
>
> Hi All,
>
> I am trying to understand how is computed the idle_avg and how it is
> used regarding the migration latency.
>
> 1. What is the sysctl_sched_migration_cost value ? It is initialized to
> 500000UL. Is it an arbitrarily chosen value ? Could it change depending
> on the hardware performances ?

The current sysctl_sched_migration_cost is 0.5 ms and is used to limit
overscheduling. I guess it is somewhat arbitrary, but it can be changed
through /proc/sys/kernel/sched_migration_cost_ns. So if you find a more
suitable value for a particular scenario, I guess PeterZ would be happy
to change it. :)

>
>
> 2. The idle_balance function checks:
>
> if (this_rq->avg_idle < sysctl_sched_migration_cost)
>         return 0;
>
> IIUC, it is not worth to migrate a task to this cpu as we expect to run
> another task before we can pull a task to the current cpu, right ?

No, that check is there to prevent every idle_balance from causing a task
migration when idle balancing happens too often and too quickly, i.e. more
frequently than the task migration limit allows.

>
> Then if there is no task to balance we will enter idle, thus we
> initialize the idle_stamp to the current clock.

If we pulled a task, we restart the frequency calculation by setting
idle_stamp = 0; the same happens when a new task is added to this rq,
which allows more idle_balance.

>
> When another task is woken up with the ttwu_do_wakeup, the duration of
> the idle time is computed in there:
>
> if (rq->idle_stamp) {
>         u64 delta = rq_clock(rq) - rq->idle_stamp;
>         u64 max = 2*sysctl_sched_migration_cost;
>
>         if (delta > max)
>                 rq->avg_idle = max;
>         else
>                 update_avg(&rq->avg_idle, delta);
>         rq->idle_stamp = 0;
> }
>
> Why is the 'delta' leveraged by 'max' ?
>
>
> 3. And finally the function update_avg does:
>
> s64 diff = sample - *avg;
> *avg += diff >> 3;
>
> Why is diff >> 3 used instead of the number of values ?

It is a kind of decay, but I have no idea why the value is '3'. I guess
MikeG has a reason for it.

>
> Thanks in advance for any answers
>
>   -- Daniel
>

-- 
Thanks
Alex
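
For reference, the arithmetic quoted in points 2 and 3 can be tried outside
the kernel. The fragment below is only a userspace sketch, not the kernel
code: MIGRATION_COST_NS, record_idle_period() and idle_balance_allowed()
are hypothetical names made up for illustration, and the constant simply
reuses the 500000UL default mentioned in the thread. What it shows is that
"diff >> 3" moves the average one eighth of the way toward each new sample
(an exponentially weighted moving average), and that an idle period longer
than max sets the average straight to max instead of being averaged in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the 500000UL default quoted in the thread (0.5 ms in ns). */
#define MIGRATION_COST_NS 500000ULL

/*
 * Same arithmetic as the quoted update_avg(): move *avg one eighth of the
 * way toward the sample. Like the kernel, this relies on an arithmetic
 * right shift for negative values, which ISO C leaves
 * implementation-defined.
 */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff;

	assert(avg != NULL);
	diff = (int64_t)(sample - *avg);
	*avg += diff >> 3;
}

/* Hypothetical helper mirroring the quoted ttwu_do_wakeup() fragment. */
static void record_idle_period(uint64_t *avg_idle, uint64_t delta_ns)
{
	uint64_t max = 2 * MIGRATION_COST_NS;

	if (delta_ns > max)
		*avg_idle = max;	/* long idle: jump straight to max */
	else
		update_avg(avg_idle, delta_ns);
}

/* Hypothetical wrapper around the check quoted from idle_balance(). */
static int idle_balance_allowed(uint64_t avg_idle)
{
	return avg_idle >= MIGRATION_COST_NS;
}
```

Starting from an average of 1000000 ns, a single 200000 ns idle period gives
diff = -800000, so the average only drops by 100000 to 900000 ns; it takes
several short idle periods in a row before the average falls below the
threshold and idle_balance_allowed() starts returning 0.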