From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Paul E. McKenney"
Subject: Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule be Tasks RCU quiescent state
Date: Fri, 29 Sep 2017 09:36:56 -0700
Message-ID: <20170929163656.GZ3521@linux.vnet.ibm.com>
References: <20170419165805.GB10874@linux.vnet.ibm.com>
 <1492621117-13939-40-git-send-email-paulmck@linux.vnet.ibm.com>
 <20170928123055.GI3521@linux.vnet.ibm.com>
 <20170928153813.7cernglt2d7umhpe@sasha-lappy>
 <20170928160514.GM3521@linux.vnet.ibm.com>
 <20170929093010.w56nawdoz23mkzio@tardis>
Reply-To: paulmck@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Boqun Feng,
 "Levin, Alexander (Sasha Levin)",
 Sasha Levin,
 "linux-kernel@vger.kernel.org List",
 Ingo Molnar,
 "jiangshanlai@gmail.com",
 "dipankar@in.ibm.com",
 Andrew Morton,
 Mathieu Desnoyers,
 Josh Triplett,
 Thomas Gleixner,
 Peter Zijlstra,
 "dhowells@redhat.com",
 Eric Dumazet,
 =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker,
 Oleg Nesterov,
 "bobby.prani@gmail.com"
Return-path:
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:40218 "EHLO
 mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751975AbdI2QyT (ORCPT ); Fri, 29 Sep 2017 12:54:19 -0400
Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by
 mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id
 v8TGrwuL045960 for ; Fri, 29 Sep 2017 12:54:18 -0400
Received: from e14.ny.us.ibm.com (e14.ny.us.ibm.com [129.33.205.204]) by
 mx0a-001b2d01.pphosted.com with ESMTP id 2d9pka5rud-1 (version=TLSv1.2
 cipher=AES256-SHA bits=256 verify=NOT) for ; Fri, 29 Sep 2017 12:54:18 -0400
Received: from localhost by e14.ny.us.ibm.com with IBM ESMTP SMTP Gateway:
 Authorized Use Only! Violators will be prosecuted for from ;
 Fri, 29 Sep 2017 12:54:17 -0400
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Fri, Sep 29, 2017 at 12:01:24PM +0200, Paolo Bonzini wrote:
> On 29/09/2017 11:30, Boqun Feng wrote:
> > On Thu, Sep 28, 2017 at 04:05:14PM +0000, Paul E. McKenney wrote:
> > [...]
> >>> __schedule+0x201/0x2240 kernel/sched/core.c:3292
> >>> schedule+0x113/0x460 kernel/sched/core.c:3421
> >>> kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
> >>
> >> It is kvm_async_pf_task_wait() that calls schedule(), but it carefully
> >> sets state to make that legal. Except...
> >>
> >>> do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
> >>> async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> >>> RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
> >>> RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
> >>> RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
> >>> RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
> >>> RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
> >>> R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
> >>> R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
> >>> vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
> >>
> >> We took a page fault in vsnprintf() while doing link_path_walk(),
> >> which looks to be within an RCU read-side critical section.
> >>
> >> Maybe the page fault confused lockdep?
> >>
> >> Sigh. It is going to be a real pain if all printk()s need to be
> >> outside of RCU read-side critical sections due to the possibility of
> >> page faults...
> >>
> >
> > Does this mean whenever we get a page fault in a RCU read-side critical
> > section, we may hit this?
> >
> > Could we simply avoid to schedule() in kvm_async_pf_task_wait() if the
> > fault process is in a RCU read-side critical section as follow?
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index aa60a08b65b1..291ea13b23d2 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -140,7 +140,7 @@ void kvm_async_pf_task_wait(u32 token)
> >
> >  	n.token = token;
> >  	n.cpu = smp_processor_id();
> > -	n.halted = is_idle_task(current) || preempt_count() > 1;
> > +	n.halted = is_idle_task(current) || preempt_count() > 1 || rcu_preempt_depth();
> >  	init_swait_queue_head(&n.wq);
> >  	hlist_add_head(&n.link, &b->list);
> >  	raw_spin_unlock(&b->lock);

This works for PREEMPT=y kernels, but can silently break RCU read-side
critical sections on PREEMPT=n kernels.

> > (Add KVM folks and list Cced)
>
> Yes, that would work. Mind to send it as a proper patch?

Just out of curiosity, why is printk() being passed something that can
page fault?

							Thanx, Paul
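
To illustrate the PREEMPT=y versus PREEMPT=n point above, here is a
minimal userspace sketch, not kernel code. It assumes that on kernels
without preemptible RCU, rcu_preempt_depth() is a stub that returns 0,
and that rcu_read_lock() leaves no per-task record when preempt counting
is not configured. All model_* names are hypothetical and exist only for
this illustration; the is_idle_task() term of the check is left out.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one task; not a kernel structure. */
struct model_task {
	int rcu_nesting;    /* maintained only when RCU is preemptible */
	int preempt_count;  /* assumed untouched by readers on PREEMPT=n */
	bool in_rcu_reader; /* ground truth that the check cannot see */
};

/* Enter a read-side critical section under the given RCU flavor. */
static void model_rcu_read_lock(struct model_task *t, bool preempt_rcu)
{
	t->in_rcu_reader = true;
	if (preempt_rcu)
		t->rcu_nesting++; /* preemptible RCU tracks nesting per task */
	/* Assumption: non-preemptible RCU records nothing here. */
}

/* Assumption: the non-preemptible flavor always reports a depth of 0. */
static int model_rcu_preempt_depth(const struct model_task *t,
				   bool preempt_rcu)
{
	return preempt_rcu ? t->rcu_nesting : 0;
}

/* Shape of the proposed n.halted test, minus the idle-task term. */
static bool model_halted(const struct model_task *t, bool preempt_rcu)
{
	return t->preempt_count > 1 ||
	       model_rcu_preempt_depth(t, preempt_rcu) != 0;
}

/*
 * Returns true when a task inside a reader would not be marked halted,
 * which means it would go on to call schedule() within the reader.
 */
static bool model_would_schedule_in_reader(bool preempt_rcu)
{
	struct model_task t = {0};

	model_rcu_read_lock(&t, preempt_rcu);
	return t.in_rcu_reader && !model_halted(&t, preempt_rcu);
}
```

In this model, model_would_schedule_in_reader(true) returns false because
the nesting count makes the check fire, while
model_would_schedule_in_reader(false) returns true because nothing the
check looks at has changed. That is the silent breakage on PREEMPT=n,
under the assumptions stated above.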