Date: Fri, 2 Oct 2015 17:59:06 +0200
From: Peter Zijlstra
To: byungchul.park@lge.com
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, fweisbec@gmail.com, tglx@linutronix.de
Subject: Re: [PATCH v3 2/2] sched: consider missed ticks when updating global cpu load
Message-ID: <20151002155906.GD3816@twins.programming.kicks-ass.net>
References: <1443771974-27077-1-git-send-email-byungchul.park@lge.com> <1443771974-27077-3-git-send-email-byungchul.park@lge.com>
In-Reply-To: <1443771974-27077-3-git-send-email-byungchul.park@lge.com>

On Fri, Oct 02, 2015 at 04:46:14PM +0900, byungchul.park@lge.com wrote:
> From: Byungchul Park
>
> in hrtimer_interrupt(), the first tick_program_event() can be failed
> because the next timer could be already expired due to,
> (see the comment in hrtimer_interrupt())
>
> - tracing
> - long lasting callbacks

If anything keeps interrupts disabled for longer than 1 tick, you'd
better go fix that.

> - being scheduled away when running in a VM

Not sure how much I should care about that, and this patch is
completely wrong for that anyhow.

And this case in hrtimer_interrupt() is basically a fail case; if you
hit that, you've got bigger problems. The solution is to rework things
so you don't get there.

> in the case that the first tick_program_event() is failed, the second
> tick_program_event() set the expired time to more than one tick later.
> then next tick can happen after more than one tick, even though tick is
> not stopped by e.g. NOHZ.
>
> when the next tick occurs, update_process_times() -> scheduler_tick()
> -> update_cpu_load_active() is performed, assuming the distance between
> last tick and current tick is 1 tick! it's wrong in this case. thus,
> this abnormal case should be considered in update_cpu_load_active().

Everything in update_process_times() assumes 1 tick, so just fixing up
one function inside that callchain is wrong. I've already told you that.
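For readers following the argument, the load-average arithmetic the patch is concerned with can be modeled in a few lines. This is a simplified user-space sketch, not kernel code: the function name model_update_cpu_load(), the NR_LOAD_IDX constant, and the choice to treat missed ticks as zero-load ticks are assumptions for illustration (similar in spirit to the kernel's decay_load_missed() used for NOHZ idle). It also omits the kernel's round-up when load is increasing. It covers only the cpu_load[] part; the other code reached from update_process_times() would still assume a single tick.

```c
#include <assert.h>

/* Number of averaged load indexes in this model (assumed value). */
#define NR_LOAD_IDX 5

/*
 * Hypothetical, simplified model of a per-tick cpu_load[] update.
 * cpu_load[0] holds the instantaneous load; cpu_load[i] for i >= 1 is an
 * exponential average giving weight 1/2^i to the newest sample.
 * 'ticks' is the number of ticks since the previous update: 1 on a normal
 * tick, more when ticks were missed. Missed ticks are assumed to carry
 * zero load, so for them the old average is only decayed.
 */
static void model_update_cpu_load(unsigned long cpu_load[NR_LOAD_IDX],
                                  unsigned long this_load,
                                  unsigned long ticks)
{
    assert(ticks >= 1);
    cpu_load[0] = this_load;

    for (int i = 1; i < NR_LOAD_IDX; i++) {
        unsigned long scale = 1UL << i;
        unsigned long old_load = cpu_load[i];

        /* Decay for the ticks - 1 missed ticks (zero-load assumption). */
        for (unsigned long t = 1; t < ticks; t++)
            old_load = old_load * (scale - 1) / scale;

        /* Blend in the sample for the tick being processed now. */
        cpu_load[i] = (old_load * (scale - 1) + this_load) / scale;
    }
}
```

For example, starting from cpu_load[1] = 1024 with zero load, a normal single-tick update gives 512, while one update covering a three-tick gap gives 128, the same as three consecutive single-tick updates. Calling the model with ticks = 1 after such a gap (the behavior the patch description objects to) would instead leave cpu_load[1] at 512.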