From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753954Ab3FDMJ6 (ORCPT ); Tue, 4 Jun 2013 08:09:58 -0400 Received: from mail-ea0-f182.google.com ([209.85.215.182]:64307 "EHLO mail-ea0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753150Ab3FDMJz (ORCPT ); Tue, 4 Jun 2013 08:09:55 -0400 Date: Tue, 4 Jun 2013 14:09:51 +0200 From: Frederic Weisbecker To: Steven Rostedt Cc: Peter Zijlstra , LKML , "Paul E. McKenney" , Dave Jones , Ingo Molnar Subject: Re: [PATCH v2][RFC] tracing/context-tracking: Add preempt_schedule_context() for tracing Message-ID: <20130604120949.GD14973@somewhere> References: <1369943981.26799.43.camel@gandalf.local.home> <20130531134319.GE5932@somewhere> <1370013528.26799.49.camel@gandalf.local.home> <20130531160101.GB5910@twins.programming.kicks-ass.net> <1370050218.26799.87.camel@gandalf.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1370050218.26799.87.camel@gandalf.local.home> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 31, 2013 at 09:30:18PM -0400, Steven Rostedt wrote: > Dave Jones hit the following bug report: > > =============================== > [ INFO: suspicious RCU usage. ] > 3.10.0-rc2+ #1 Not tainted > ------------------------------- > include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! > other info that might help us debug this: > RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 > RCU used illegally from extended quiescent state! > 2 locks held by cc1/63645: > #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xed/0x9b0 > #1: (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x5/0x1f0 > > CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369] > Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 > 0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28 > ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500 > 0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645 > Call Trace: > [] dump_stack+0x19/0x1b > [] lockdep_rcu_suspicious+0xfd/0x130 > [] cpuacct_charge+0x185/0x1f0 > [] ? cpuacct_charge+0x5/0x1f0 > [] update_curr+0xec/0x240 > [] put_prev_task_fair+0x228/0x480 > [] __schedule+0x161/0x9b0 > [] preempt_schedule+0x51/0x80 > [] ? __cond_resched_softirq+0x60/0x60 > [] ? retint_careful+0x12/0x2e > [] ftrace_ops_control_func+0x1dc/0x210 > [] ftrace_call+0x5/0x2f > [] ? retint_careful+0xb/0x2e > [] ? schedule_user+0x5/0x70 > [] ? schedule_user+0x5/0x70 > [] ? retint_careful+0x12/0x2e > ------------[ cut here ]------------ > > What happened was that the function tracer traced the schedule_user() code > that tells RCU that the system is coming back from userspace, and to > add the CPU back to the RCU monitoring. > > Because the function tracer does a preempt_disable/enable_notrace() calls > the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set, > then preempt_schedule() is called. But this is called before the user_exit() > function can inform the kernel that the CPU is no longer in user mode and > needs to be accounted for by RCU. > > The fix is to create a new preempt_schedule_context() that checks if > the kernel is still in user mode and if so to switch it to kernel mode > before calling schedule. It also switches back to user mode coming back > from schedule in need be. > > The only user of this currently is the preempt_enable_notrace(), which is > only used by the tracing subsystem. > > Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home > > Signed-off-by: Steven Rostedt > --- > include/linux/preempt.h | 18 +++++++++++++++++- > kernel/context_tracking.c | 38 ++++++++++++++++++++++++++++++++++++++ > 2 files changed, 55 insertions(+), 1 deletions(-) > > diff --git a/include/linux/preempt.h b/include/linux/preempt.h > index 87a03c7..f5d4723 100644 > --- a/include/linux/preempt.h > +++ b/include/linux/preempt.h > @@ -33,9 +33,25 @@ do { \ > preempt_schedule(); \ > } while (0) > > +#ifdef CONFIG_CONTEXT_TRACKING > + > +void preempt_schedule_context(void); > + > +#define preempt_check_resched_context() \ > +do { \ > + if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \ > + preempt_schedule_context(); \ > +} while (0) > +#else > + > +#define preempt_check_resched_context() preempt_check_resched() > + > +#endif /* CONFIG_CONTEXT_TRACKING */ > + > #else /* !CONFIG_PREEMPT */ > > #define preempt_check_resched() do { } while (0) > +#define preempt_check_resched_context() do { } while (0) > > #endif /* CONFIG_PREEMPT */ > > @@ -88,7 +104,7 @@ do { \ > do { \ > preempt_enable_no_resched_notrace(); \ > barrier(); \ > - preempt_check_resched(); \ > + preempt_check_resched_context(); \ > } while (0) > > #else /* !CONFIG_PREEMPT_COUNT */ > diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c > index 65349f0..15c9f2e 100644 > --- a/kernel/context_tracking.c > +++ b/kernel/context_tracking.c > @@ -71,6 +71,44 @@ void user_enter(void) > local_irq_restore(flags); > } > > +/** > + * preempt_schedule_context - preempt_schedule called by tracing > + * > + * The tracing infrastructure uses preempt_enable_notrace to prevent > + * recursion and tracing preempt enabling caused by the tracing > + * infrastructure itself. But as tracing can happen in areas coming > + * from userspace or just about to enter userspace, a preempt enable > + * can occur before user_exit() is called. This will cause the scheduler > + * to be called when the system is still in usermode. > + * > + * To prevent this, the preempt_enable_notrace will use this function > + * instead of preempt_schedule() to exit user context if needed before > + * calling the scheduler. > + */ > +void __sched notrace preempt_schedule_context(void) > +{ > + struct thread_info *ti = current_thread_info(); > + enum ctx_state prev_ctx; > + > + if (likely(ti->preempt_count || irqs_disabled())) > + return; or: if (!preemptible()) return; > + > + /* > + * Need to disable preemption in case user_exit() is traced > + * and the tracer calls preempt_enable_notrace() causing > + * an infinite recursion. > + */ > + preempt_disable_notrace(); > + prev_ctx = exception_enter(); > + preempt_enable_no_resched_notrace(); > + > + preempt_schedule(); > + > + preempt_disable_notrace(); > + exception_exit(prev_ctx); > + preempt_enable_notrace(); > +} > +EXPORT_SYMBOL(preempt_schedule_context); That's quite a tricky change but I can't think of anything better. Acked-by: Frederic Weisbecker > /** > * user_exit - Inform the context tracking that the CPU is > -- > 1.7.3.4 > > >