Date: Mon, 10 Oct 2016 14:02:21 +0200
From: Peter Zijlstra
To: Wanpeng Li
Cc: linux-kernel@vger.kernel.org, Wanpeng Li, Ingo Molnar, Mike Galbraith, Thomas Gleixner
Subject: Re: [PATCH] sched/core: Fix kick offline cpu to do nohz idle load balance
Message-ID: <20161010120221.GP3568@worktop.programming.kicks-ass.net>
References: <1476072600-3619-1-git-send-email-wanpeng.li@hotmail.com>

On Mon, Oct 10, 2016 at 04:34:48PM +0800, Wanpeng Li wrote:
> > If there is a need to kick the idle load balancer, an ILB will be selected
> > to perform nohz idle load balance. However, if the selected ILB is in the
> > process of going offline, smp_sched_reschedule(), which generates a sched
> > IPI, will splat as above.
> >
> > CPU0                              CPU1
> >
> > find_new_ilb()
> >                                   set_rq_offline()
> > smp_sched_reschedule() Oops
> >                                   nohz_balance_exit_idle()
> >
> > This patch fixes it by exiting nohz idle balance before setting the CPU
> > offline.
>
> CPU0                                CPU1
>
> find_new_ilb()
>                                     nohz_balance_exit_idle()
>                                     set_rq_offline()
> smp_sched_reschedule()
>
> It seems that the patch still can't avoid this race, so any proposal
> is greatly appreciated. :)

Not sure how this can happen: scheduler_tick() -> trigger_load_balance()
-> nohz_balancer_kick() is called with IRQs disabled, which also implies
an RCU-sched read-side section. And hotplug explicitly includes a
rcu_sync_sched().
It might be that find_new_ilb() is 'broken' in that it considers !active
CPUs. That's not immediately obvious.