From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932187AbcAVXvz (ORCPT ); Fri, 22 Jan 2016 18:51:55 -0500 Received: from e33.co.us.ibm.com ([32.97.110.151]:34767 "EHLO e33.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755232AbcAVXvk (ORCPT ); Fri, 22 Jan 2016 18:51:40 -0500 X-IBM-Helo: d03dlp02.boulder.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com X-IBM-RcptTo: linux-kernel@vger.kernel.org Date: Fri, 22 Jan 2016 15:51:43 -0800 From: "Paul E. McKenney" To: Andi Kleen Cc: Dave Jones , Linux Kernel , x86@kernel.org, tglx@linutronix.de, rostedt@goodmis.org Subject: Re: suspicious RCU usage in msr tracing. Message-ID: <20160122235143.GO3818@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20160122003427.GA29240@codemonkey.org.uk> <20160122223329.GC25832@tassilo.jf.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160122223329.GC25832@tassilo.jf.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16012223-0009-0000-0000-000011ACEB3D Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 22, 2016 at 02:33:29PM -0800, Andi Kleen wrote: > On Thu, Jan 21, 2016 at 07:34:27PM -0500, Dave Jones wrote: > > Just hit this on Linus' current tree. > > > > [ INFO: suspicious RCU usage. ] > > 4.4.0-think+ #1 Tainted: G W > > ------------------------------- > > ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage! > > > > other info that might help us debug this: > > > > > > RCU used illegally from idle CPU! > > rcu_scheduler_active = 1, debug_locks = 0 > > RCU used illegally from extended quiescent state! > > no locks held by swapper/3/0. > > > > stack backtrace: > > CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 4.4.0-think+ #1 > > ffffffff8ef82ac0 c4dd1c3486ada576 ffff880468e07f08 ffffffff8e566ae1 > > ffff880464905340 ffff880468e07f38 ffffffff8e135bf8 ffffffff8f665b00 > > ffff880464918000 0000000000000000 ffff880464920000 ffff880468e07f70 > > Call Trace: > > [] dump_stack+0x4e/0x7d > > [] lockdep_rcu_suspicious+0xf8/0x110 > > [] do_trace_write_msr+0x136/0x140 > > [] native_apic_msr_eoi_write+0x23/0x30 > > [] smp_call_function_single_interrupt+0x36/0x50 > > [] smp_call_function_interrupt+0xe/0x10 > > [] call_function_interrupt+0x90/0xa0 > > [] ? __asan_store4+0x80/0x80 > > [] ? poll_idle+0x67/0xc0 > > [] cpuidle_enter_state+0x174/0x430 > > [] cpuidle_enter+0x17/0x20 > > [] cpu_startup_entry+0x4c5/0x5a0 > > [] ? default_idle_call+0x60/0x60 > > [] ? clockevents_config_and_register+0x64/0x70 > > [] start_secondary+0x269/0x300 > > [] ? set_cpu_sibling_map+0x970/0x970 > > ------------[ cut here ]------------ > > > > 44 DEFINE_EVENT(msr_trace_class, write_msr, > > 45 TP_PROTO(unsigned msr, u64 val, int failed), > > 46 TP_ARGS(msr, val, failed) > > 47 ); > > > > Andi, could this be caused by 7f47d8cc039f8746e0038fe05f1ddcb15a2e27f0 ? > > Yes, it's likely. The trace points use RCU, and with X2APIC on there is a > MSR write with trace point at the end up of interrupts to ack the APIC. > > This is what the IPI uses: > > static inline void entering_ack_irq(void) > { > ack_APIC_irq(); > entering_irq(); > } > > I assume the problem is that RCU thinks we're in idle until > the entering_irq(), but the MSR write with the new trace point > is already in the ack_APIC_irq() Just confirming, yes, RCU thinks we are in idle until entering_irq() gets to the call to rcu_irq_enter(). Thanx, Paul > Thomas, can the two call be reordered? I assume it's safe. > > -Andi > > -- > ak@linux.intel.com -- Speaking for myself only >