From: Frederic Weisbecker
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	"Paul E . McKenney", Mike Galbraith, Rik van Riel
Subject: [RFC PATCH 4/4] sched: Upload nohz full CPU load on task enqueue/dequeue
Date: Wed, 13 Jan 2016 17:01:31 +0100
Message-Id: <1452700891-21807-5-git-send-email-fweisbec@gmail.com>
X-Mailer: git-send-email 2.6.4
In-Reply-To: <1452700891-21807-1-git-send-email-fweisbec@gmail.com>
References: <1452700891-21807-1-git-send-email-fweisbec@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

The full nohz CPU load is currently accounted on tick restart only.
But there are a few issues with this model:

_ On tick restart, if cpu_load[0] doesn't contain the tickless load
  that actually just ran, we are going to account a wrong value. And
  this is very likely, given that cpu_load[0] doesn't have an
  opportunity to be updated between tick stop and tick restart.

_ If the runqueue had updates that didn't trigger a tick restart, we
  are going to miss those CPU load changes.

A solution to fix this is to update the CPU load every time we enqueue
or dequeue a task in the fair runqueue and at least a jiffy has elapsed
since the last update.

Cc: Byungchul Park
Cc: Mike Galbraith
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Luiz Capitulino
Cc: Paul E . McKenney
Cc: Rik van Riel
Cc: Thomas Gleixner
Signed-off-by: Frederic Weisbecker
---
 kernel/sched/fair.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e0cb6e..763dc3b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4433,6 +4433,34 @@ void update_cpu_load_active(struct rq *this_rq)
 #endif /* CONFIG_SMP */
 
 /*
+ * In NO_HZ full mode, we need to account the CPU load without relying
+ * on the tick. We do it instead on task enqueue/dequeue time as those
+ * are the main points where CPU load changes.
+ */
+static inline void update_cpu_load_nohz_full(struct rq *rq)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long curr_jiffies;
+	unsigned long load;
+
+	if (!tick_nohz_full_cpu(cpu_of(rq)))
+		return;
+
+	curr_jiffies = READ_ONCE(jiffies);
+	load = weighted_cpuload(cpu_of(rq));
+	if (curr_jiffies == rq->last_load_update_tick) {
+		/*
+		 * At least record the current load so that we flush
+		 * it correctly on the next update.
+		 */
+		rq->cpu_load[0] = load;
+	} else {
+		__update_cpu_load_nohz(rq, curr_jiffies, load, 1);
+	}
+#endif
+}
+
+/*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
  * then put the task into the rbtree:
@@ -4477,6 +4505,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		add_nr_running(rq, 1);
 
 	hrtick_update(rq);
+	update_cpu_load_nohz_full(rq);
 }
 
 static void set_next_buddy(struct sched_entity *se);
@@ -4537,6 +4566,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		sub_nr_running(rq, 1);
 
 	hrtick_update(rq);
+	update_cpu_load_nohz_full(rq);
 }
 
 #ifdef CONFIG_SMP
-- 
2.6.4
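
Illustration (not part of the patch): the branch in
update_cpu_load_nohz_full() can be tried out with a small user-space
model. Everything below is a hypothetical sketch. struct model_rq, the
model_*() helpers and the "accounted" field are invented for the
example, and the kernel's decayed cpu_load[] averaging is replaced by a
plain load-times-jiffies sum. It only aims to show which load value
gets charged to which interval: within the same jiffy the new load is
merely recorded, and once the jiffy has changed, the previously
recorded load is flushed over the elapsed jiffies.

```c
#include <assert.h>

/* Hypothetical stand-in for the runqueue fields the patch touches. */
struct model_rq {
	unsigned long last_load_update_tick;	/* jiffy of the last flush */
	unsigned long cpu_load0;		/* models rq->cpu_load[0] */
	unsigned long accounted;		/* assumed: sum of load * jiffies */
	unsigned long weight;			/* current runqueue weight */
};

/*
 * Models the branch in update_cpu_load_nohz_full(). 'now' stands in
 * for READ_ONCE(jiffies), rq->weight for weighted_cpuload().
 */
static void model_update_load(struct model_rq *rq, unsigned long now)
{
	unsigned long load = rq->weight;

	if (now == rq->last_load_update_tick) {
		/* Same jiffy: only record the load for the next flush. */
		rq->cpu_load0 = load;
		return;
	}

	/*
	 * Simplification: charge the previously recorded load to every
	 * elapsed jiffy instead of decaying cpu_load[] as the kernel does.
	 */
	rq->accounted += rq->cpu_load0 * (now - rq->last_load_update_tick);
	rq->cpu_load0 = load;
	rq->last_load_update_tick = now;
}

/* Add a task of weight 'w' at jiffy 'now', then update the load. */
static void model_enqueue(struct model_rq *rq, unsigned long w,
			  unsigned long now)
{
	rq->weight += w;
	model_update_load(rq, now);
}

/* Remove a task of weight 'w' at jiffy 'now', then update the load. */
static void model_dequeue(struct model_rq *rq, unsigned long w,
			  unsigned long now)
{
	assert(rq->weight >= w);	/* guard against underflow */
	rq->weight -= w;
	model_update_load(rq, now);
}
```

In this model, two enqueues in the same jiffy only overwrite cpu_load0,
and a dequeue ten jiffies later charges the combined weight to those
ten jiffies before recording the new, lower load.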