Date: Fri, 2 Oct 2015 17:59:06 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: byungchul.park@lge.com
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, fweisbec@gmail.com,
        tglx@linutronix.de
Subject: Re: [PATCH v3 2/2] sched: consider missed ticks when updating global
 cpu load
Message-ID: <20151002155906.GD3816@twins.programming.kicks-ass.net>
References: <1443771974-27077-1-git-send-email-byungchul.park@lge.com>
 <1443771974-27077-3-git-send-email-byungchul.park@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1443771974-27077-3-git-send-email-byungchul.park@lge.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 02, 2015 at 04:46:14PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park <byungchul.park@lge.com>
> 
> In hrtimer_interrupt(), the first tick_program_event() can fail
> because the next timer may already have expired due to the following
> (see the comment in hrtimer_interrupt()):
> 
> - tracing
> - long lasting callbacks

If anything keeps interrupts disabled for longer than 1 tick, you'd
better go fix that.

> - being scheduled away when running in a VM

Not sure how much I should care about that, and this patch is completely
wrong for that anyhow.

And this case in hrtimer_interrupt() is basically a failure case; if you
hit that, you've got bigger problems. The solution is to rework things
so you don't get there.

> If the first tick_program_event() fails, the second
> tick_program_event() sets the expiry time more than one tick later.
> The next tick can then happen more than one tick after the previous
> one, even though the tick was not stopped by, e.g., NOHZ.
> 
> When the next tick occurs, update_process_times() -> scheduler_tick()
> -> update_cpu_load_active() runs, assuming the distance between the
> last tick and the current tick is 1 tick! That is wrong in this case,
> so update_cpu_load_active() should account for this abnormal case.

Everything in update_process_times() assumes 1 tick; just fixing up
one function inside that callchain is wrong -- I've already told you
that.