Date: Mon, 2 Oct 2017 08:34:22 -0700
From: "Paul E. McKenney"
To: Boqun Feng
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Peter Zijlstra, Wanpeng Li, Radim Krčmář, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org
Subject: Re: [PATCH v2] kvm/x86: Handle async PF in RCU read-side critical sections
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170929110148.3467-1-boqun.feng@gmail.com> <20171001013140.21325-1-boqun.feng@gmail.com> <6c0ee091-4dec-745c-7ffa-add189f249fb@redhat.com> <20171002144300.qotoywl4nos5peuy@tardis>
In-Reply-To: <20171002144300.qotoywl4nos5peuy@tardis>
Message-Id: <20171002153422.GR3521@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 02, 2017 at 10:43:00PM +0800, Boqun Feng wrote:
> On Mon, Oct 02, 2017 at 01:41:03PM +0000, Paolo Bonzini wrote:
> [...]
> > >
> > > Wanpeng, the callsite of kvm_async_pf_task_wait() in
> > > kvm_handle_page_fault() is for nested scenario, right? I take it we
> > > should handle it as if the fault happens when l1 guest is running in
> > > kernel mode, so @user should be 0, right?
> >
> > In that case we can schedule, actually. The guest will let another
> > process run.
> >
> > In fact we could schedule safely most of the time in the
> > !user_mode(regs) case, it's just that with PREEMPT=n there's no
> > knowledge of whether we can do so. This explains why we have never seen
> > the bug before.
>
> Thanks, looks like I confused myself a little bit here. You are right.
> So in PREEMPT=n kernel, we only couldn't schedule when the async PF
> *interrupts* the *kernel*, while in the kvm_handle_page_fault(), we
> actually didn't interrupt the kernel, so it's fine.

Actually, we should be able to do a little bit better than that.
If PREEMPT=n but PREEMPT_COUNT=y, then preempt_count() will know
about RCU read-side critical sections via preempt_disable().

So maybe something like this?

	n.halted = is_idle_task(current) ||
		   preempt_count() > 1 ||
		   (!IS_ENABLED(CONFIG_PREEMPT) && !IS_ENABLED(CONFIG_PREEMPT_COUNT) && !user) ||
		   rcu_preempt_depth();

							Thanx, Paul

> > I had already applied v1, can you rebase and resend please?  Thanks,
> >
>
> Sure, I'm going to rename that parameter to "interrupt_kernel"(probably
> a bad name"), indicating whether the async PF interrupts the kernel.
>
> But it's a little bit late today, will do that tomorrow.
>
> Regards,
> Boqun
>
> > Paolo
> >
> > >  arch/x86/include/asm/kvm_para.h | 4 ++--
> > >  arch/x86/kernel/kvm.c           | 9 ++++++---
> > >  arch/x86/kvm/mmu.c              | 2 +-
> > >  3 files changed, 9 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > > index bc62e7cbf1b1..0a5ae6bb128b 100644
> > > --- a/arch/x86/include/asm/kvm_para.h
> > > +++ b/arch/x86/include/asm/kvm_para.h
> > > @@ -88,7 +88,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> > >  bool kvm_para_available(void);
> > >  unsigned int kvm_arch_para_features(void);
> > >  void __init kvm_guest_init(void);
> > > -void kvm_async_pf_task_wait(u32 token);
> > > +void kvm_async_pf_task_wait(u32 token, int user);
> > >  void kvm_async_pf_task_wake(u32 token);
> > >  u32 kvm_read_and_reset_pf_reason(void);
> > >  extern void kvm_disable_steal_time(void);
> > > @@ -103,7 +103,7 @@ static inline void kvm_spinlock_init(void)
> > >  
> > >  #else /* CONFIG_KVM_GUEST */
> > >  #define kvm_guest_init() do {} while (0)
> > > -#define kvm_async_pf_task_wait(T) do {} while(0)
> > > +#define kvm_async_pf_task_wait(T, U) do {} while(0)
> > >  #define kvm_async_pf_task_wake(T) do {} while(0)
> > >  
> > >  static inline bool kvm_para_available(void)
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index aa60a08b65b1..916f519e54c9 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -117,7 +117,7 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
> > >  	return NULL;
> > >  }
> > >  
> > > -void kvm_async_pf_task_wait(u32 token)
> > > +void kvm_async_pf_task_wait(u32 token, int user)
> > >  {
> > >  	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
> > >  	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
> > > @@ -140,7 +140,10 @@ void kvm_async_pf_task_wait(u32 token)
> > >  
> > >  	n.token = token;
> > >  	n.cpu = smp_processor_id();
> > > -	n.halted = is_idle_task(current) || preempt_count() > 1;
> > > +	n.halted = is_idle_task(current) ||
> > > +		   preempt_count() > 1 ||
> > > +		   (!IS_ENABLED(CONFIG_PREEMPT) && !user) ||
> > > +		   rcu_preempt_depth();
> > >  	init_swait_queue_head(&n.wq);
> > >  	hlist_add_head(&n.link, &b->list);
> > >  	raw_spin_unlock(&b->lock);
> > > @@ -268,7 +271,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > >  		/* page is swapped out by the host. */
> > >  		prev_state = exception_enter();
> > > -		kvm_async_pf_task_wait((u32)read_cr2());
> > > +		kvm_async_pf_task_wait((u32)read_cr2(), user_mode(regs));
> > >  		exception_exit(prev_state);
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_READY:
> > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > index eca30c1eb1d9..106d4a029a8a 100644
> > > --- a/arch/x86/kvm/mmu.c
> > > +++ b/arch/x86/kvm/mmu.c
> > > @@ -3837,7 +3837,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > >  		vcpu->arch.apf.host_apf_reason = 0;
> > >  		local_irq_disable();
> > > -		kvm_async_pf_task_wait(fault_address);
> > > +		kvm_async_pf_task_wait(fault_address, 0);
> > >  		local_irq_enable();
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_READY:
> > >
> >
>