From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756071AbbIBVuZ (ORCPT ); Wed, 2 Sep 2015 17:50:25 -0400
Received: from mail-wi0-f172.google.com ([209.85.212.172]:38066 "EHLO
	mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752794AbbIBVuY (ORCPT ); Wed, 2 Sep 2015 17:50:24 -0400
Date: Wed, 2 Sep 2015 23:50:22 +0200
From: Frederic Weisbecker
To: Tejun Heo
Cc: "Paul E. McKenney" , linux-kernel@vger.kernel.org, Peter Zijlstra
Subject: Re: Warning in irq_work_queue_on()
Message-ID: <20150902215020.GA21505@lerouge>
References: <20150825001611.GA1751@linux.vnet.ibm.com> <20150902194405.GM22326@mtj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150902194405.GM22326@mtj.duckdns.org>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 02, 2015 at 03:44:05PM -0400, Tejun Heo wrote:
> (cc'ing peterz)
>
> Ooh, this is from irq_work which doesn't have much to do with
> workqueue. Peter?
>
> On Mon, Aug 24, 2015 at 05:16:11PM -0700, Paul E. McKenney wrote:
> > Hello, Tejun,
> >
> > As discussed last week, I am getting an occasional warning out of
> > irq_work_queue_on() WARN_ON_ONCE(cpu_is_offline(cpu)). The repeat-by
> > seems to be a week or so of rcutorture runs on 16-CPU KVM instances
> > on x86. So please see below on the off-chance that this is of use.
> > I have also attached a .config file.
> >
> > Thoughts?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > [ 875.702254] ------------[ cut here ]------------
> > [ 875.703111] WARNING: CPU: 0 PID: 768 at /home/paulmck/public_git/bisect-linux-rcu/kernel/irq_work.c:69 irq_work_queue_on+0xd4/0x110()
> > [ 875.703227] Modules linked in:
> > [ 875.703227] CPU: 0 PID: 768 Comm: rcu_torture_rea Tainted: G W 4.1.0-rc4+ #1
> > [ 875.703227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 875.703227] ffffffff81baadd8 ffff88001dc5fce8 ffffffff81895418 00000000000000aa
> > [ 875.703227] 0000000000000000 ffff88001dc5fd28 ffffffff810517d5 0000000000015bc0
> > [ 875.703227] 0000000000000004 0000000000000004 ffff88001fc8f980 ffff88001fc8d500
> > [ 875.703227] Call Trace:
> > [ 875.703227] [] dump_stack+0x45/0x57
> > [ 875.703227] [] warn_slowpath_common+0x85/0xc0
> > [ 875.703227] [] warn_slowpath_null+0x15/0x20
> > [ 875.703227] [] irq_work_queue_on+0xd4/0x110
> > [ 875.703227] [] tick_nohz_full_kick_cpu+0x44/0x50

It happens in nohz full, but I'm not sure nohz full is the culprit.

The problem here is that wake_up_nohz_cpu() selects a CPU that is
offline. But this shouldn't happen. Either it selects a CPU that is in
the domain tree, and I suspect offline CPUs aren't supposed to be there,
or it selects the current CPU. And if the current CPU is offline, it
shouldn't be running a kthread...

> > [ 875.703227] [] wake_up_nohz_cpu+0xb4/0x100
> > [ 875.703227] [] internal_add_timer+0x86/0xa0
> > [ 875.703227] [] mod_timer+0xf1/0x1e0
> > [ 875.703227] [] rcu_torture_reader+0x2a4/0x2e0
> > [ 875.703227] [] ? rcu_torture_reader+0x2e0/0x2e0
> > [ 875.703227] [] ? rcutorture_trace_dump.part.10+0x20/0x20
> > [ 875.703227] [] kthread+0xcd/0xf0
> > [ 875.703227] [] ? kthread_create_on_node+0x180/0x180
> > [ 875.703227] [] ret_from_fork+0x42/0x70
> > [ 875.703227] [] ? kthread_create_on_node+0x180/0x180
> > [ 875.703227] ---[ end trace 74175128740d0113 ]---
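
The reasoning in the reply above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel code:
the arrays cpu_online[] and cpu_idle[] and the helpers pick_timer_target()
and kick_would_warn() are hypothetical names invented for illustration, and
the cpu_is_offline() here is a toy stand-in for the kernel helper of the same
name. The model assumes the timer target is either the current CPU or a CPU
taken from a domain list, and that the warning condition is simply "the
chosen target is offline".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 16 /* same CPU count as the KVM guests in the report */

/* Hypothetical stand-ins for kernel state, for illustration only. */
static bool cpu_online[NR_CPUS];
static bool cpu_idle[NR_CPUS];

/* Toy version of the kernel's cpu_is_offline() check. */
static bool cpu_is_offline(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return !cpu_online[cpu];
}

/*
 * Hypothetical target selection: keep the timer on the current CPU if it
 * is busy, otherwise take the first non-idle CPU from the domain list,
 * otherwise fall back to the current CPU.
 */
static int pick_timer_target(int this_cpu, const int *domain, size_t n)
{
    assert(this_cpu >= 0 && this_cpu < NR_CPUS);
    if (!cpu_idle[this_cpu])
        return this_cpu;
    for (size_t i = 0; i < n; i++) {
        assert(domain[i] >= 0 && domain[i] < NR_CPUS);
        if (!cpu_idle[domain[i]])
            return domain[i];
    }
    return this_cpu;
}

/* Models only the condition of WARN_ON_ONCE(cpu_is_offline(cpu)). */
static bool kick_would_warn(int target)
{
    return cpu_is_offline(target);
}
```

In this model the warning condition can only become true if the domain list
still names an offline CPU, or if the current CPU is itself offline, which
are exactly the two cases the reply considers unexpected.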