From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755067AbcBPLDR (ORCPT ); Tue, 16 Feb 2016 06:03:17 -0500 Received: from e34.co.us.ibm.com ([32.97.110.152]:56482 "EHLO e34.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754932AbcBPLDP (ORCPT ); Tue, 16 Feb 2016 06:03:15 -0500 X-IBM-Helo: d03dlp02.boulder.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com X-IBM-RcptTo: linux-kernel@vger.kernel.org Date: Tue, 16 Feb 2016 03:03:18 -0800 From: "Paul E. McKenney" To: Denis Kirjanov Cc: Steven Rostedt , linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] tracing/mm: don't trace kfree on offline CPUs Message-ID: <20160216110318.GQ6719@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <1455387773-4245-1-git-send-email-kda@linux-powerpc.org> <20160214114537.79a6dbb3@grimm.local.home> <20160215123256.6b4cc115@gandalf.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16021611-0017-0000-0000-0000122B4CE3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Feb 16, 2016 at 08:52:24AM +0300, Denis Kirjanov wrote: > On 2/15/16, Steven Rostedt wrote: > > On Sun, 14 Feb 2016 23:19:42 +0300 > > Denis Kirjanov wrote: > > > > > >> >> TP_PROTO(unsigned long call_site, const void *ptr), > >> >> > >> >> - TP_ARGS(call_site, ptr) > >> >> + TP_ARGS(call_site, ptr), > >> >> + > >> >> + /* > >> >> + * This trace can be potentially called from an offlined cpu. > >> >> + * Since trace points use RCU and RCU should not be used from > >> >> + * offline cpus, filter such calls out. > >> >> + * While this trace can be called from a preemptable section, > >> >> + * it has no impact on the condition since tasks can migrate > >> >> + * only from online cpus to other online cpus. Thus its safe > >> >> + * to use raw_smp_processor_id. > >> >> + */ > >> >> + TP_CONDITION(cpu_online(raw_smp_processor_id())) > >> > > >> > This is starting to become a common occurrence. Perhaps it is best to > >> > just hardcode this into the tracepoint code itself? > >> > >> Can you take it as a fix for now. I'll post the follow-up patch then > >> for rcu and offline cpus > >> issue. > > > > Actually, I prefer not to add more checks for cpu_online(), if its > > going to become the default. > > > > Does this work for you instead? > > Hm, somehow it doesn't work: > > [ 84.692846] =============================== > [ 84.692847] [ INFO: suspicious RCU usage. ] > [ 84.692849] 4.5.0-rc3-00765-g78b70d4-dirty #36 Tainted: G S > [ 84.692850] ------------------------------- > [ 84.692851] include/trace/events/rcu.h:457 suspicious > rcu_dereference_check() usage! > [ 84.692852] > other info that might help us debug this: > > [ 84.692854] > RCU used illegally from offline CPU! The problem is that the CPU really is offline as far as RCU is concerned. The usual idle-loop workarounds do not apply here. > rcu_scheduler_active = 1, debug_locks = 1 > [ 84.692855] no locks held by swapper/11/0. > [ 84.692856] > stack backtrace: > [ 84.692858] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G S > 4.5.0-rc3-00765-g78b70d4-dirty #36 > [ 84.692860] Call Trace: > [ 84.692868] [c0000005b76d37f0] [c0000000008c57ac] > .dump_stack+0xe0/0x14c (unreliable) > [ 84.692872] [c0000005b76d3880] [c00000000010d0c8] > .lockdep_rcu_suspicious+0x108/0x170 > [ 84.692876] [c0000005b76d3910] [c00000000013cab4] > .__call_rcu.constprop.60+0x244/0x680 > [ 84.692880] [c0000005b76d39e0] [c0000000002c38ec] .put_object+0x5c/0x80 > [ 84.692884] [c0000005b76d3a60] [c00000000029e748] > .kmem_cache_free+0x318/0x4c0 > [ 84.692888] [c0000005b76d3b00] [c0000000000936b4] .__mmdrop+0x54/0x150 Ah, this is the problem. We are calling __mmdrop() on a CPU that RCU believes to be offline. Can we pull the call to __mmdrop() forward? (Trying to recall what x86 did about a very similar problem...) Thanx, Paul > [ 84.692891] [c0000005b76d3b90] [c0000000000e46e0] .idle_task_exit+0x120/0x130 > [ 84.692895] [c0000005b76d3c20] [c000000000075a14] > .pseries_mach_cpu_die+0x64/0x310 > [ 84.692899] [c0000005b76d3cd0] [c00000000004401c] .cpu_die+0x3c/0x60 > [ 84.692902] [c0000005b76d3d40] [c000000000018998] > .arch_cpu_idle_dead+0x28/0x40 > [ 84.692906] [c0000005b76d3db0] [c000000000102714] > .cpu_startup_entry+0x554/0x580 > [ 84.692909] [c0000005b76d3ed0] [c000000000043d68] > .start_secondary+0x348/0x390 > [ 84.692913] [c0000005b76d3f90] [c000000000008a6c] > start_secondary_prolog+0x10/0x14 > > [ 84.692918] =============================== > [ 84.692919] [ INFO: suspicious RCU usage. ] > [ 84.692920] 4.5.0-rc3-00765-g78b70d4-dirty #36 Tainted: G S > [ 84.692921] ------------------------------- > [ 84.692922] include/trace/events/kmem.h:141 suspicious > rcu_dereference_check() usage! > [ 84.692923] > other info that might help us debug this: > > [ 84.692924] > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > [ 84.692925] no locks held by swapper/11/0. > [ 84.692926] > stack backtrace: > [ 84.692928] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G S > 4.5.0-rc3-00765-g78b70d4-dirty #36 > [ 84.692929] Call Trace: > [ 84.692933] [c0000005b76d38c0] [c0000000008c57ac] > .dump_stack+0xe0/0x14c (unreliable) > [ 84.692936] [c0000005b76d3950] [c00000000010d0c8] > .lockdep_rcu_suspicious+0x108/0x170 > [ 84.692939] [c0000005b76d39e0] [c00000000029f1c0] .kfree+0x380/0x470 > [ 84.692942] [c0000005b76d3a80] [c000000000056194] .destroy_context+0x44/0x100 > [ 84.692946] [c0000005b76d3b00] [c0000000000936c0] .__mmdrop+0x60/0x150 > [ 84.692948] [c0000005b76d3b90] [c0000000000e46e0] .idle_task_exit+0x120/0x130 > [ 84.692951] [c0000005b76d3c20] [c000000000075a14] > .pseries_mach_cpu_die+0x64/0x310 > [ 84.692954] [c0000005b76d3cd0] [c00000000004401c] .cpu_die+0x3c/0x60 > [ 84.692957] [c0000005b76d3d40] [c000000000018998] > .arch_cpu_idle_dead+0x28/0x40 > [ 84.692960] [c0000005b76d3db0] [c000000000102714] > .cpu_startup_entry+0x554/0x580 > [ 84.692963] [c0000005b76d3ed0] [c000000000043d68] > .start_secondary+0x348/0x390 > [ 84.692966] [c0000005b76d3f90] [c000000000008a6c] > start_secondary_prolog+0x10/0x14 > > > > > > > -- Steve > > > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > > index acd522a91539..acfdbf353a0b 100644 > > --- a/include/linux/tracepoint.h > > +++ b/include/linux/tracepoint.h > > @@ -14,8 +14,10 @@ > > * See the file COPYING for more details. > > */ > > > > +#include > > #include > > #include > > +#include > > #include > > #include > > > > @@ -132,6 +134,9 @@ extern void syscall_unregfunc(void); > > void *it_func; \ > > void *__data; \ > > \ > > + if (!cpu_online(raw_smp_processor_id())) \ > > + return; \ > > + \ > > if (!(cond)) \ > > return; \ > > prercu; \ > > >