* Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule be Tasks RCU quiescent state
2017-09-29 10:01 ` Paolo Bonzini
@ 2017-09-29 10:25 ` Boqun Feng
2017-09-29 10:34 ` Peter Zijlstra
2017-09-29 16:36 ` Paul E. McKenney
2 siblings, 0 replies; 7+ messages in thread
From: Boqun Feng @ 2017-09-29 10:25 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Paul E. McKenney, Levin, Alexander (Sasha Levin), Sasha Levin,
linux-kernel@vger.kernel.org List, Ingo Molnar,
jiangshanlai@gmail.com, dipankar@in.ibm.com, Andrew Morton,
Mathieu Desnoyers, Josh Triplett, Thomas Gleixner, Peter Zijlstra,
dhowells@redhat.com, Eric Dumazet, Frédéric Weisbecker,
Oleg Nesterov
[-- Attachment #1: Type: text/plain, Size: 2657 bytes --]
On Fri, Sep 29, 2017 at 10:01:24AM +0000, Paolo Bonzini wrote:
> On 29/09/2017 11:30, Boqun Feng wrote:
> > On Thu, Sep 28, 2017 at 04:05:14PM +0000, Paul E. McKenney wrote:
> > [...]
> >>> __schedule+0x201/0x2240 kernel/sched/core.c:3292
> >>> schedule+0x113/0x460 kernel/sched/core.c:3421
> >>> kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
> >>
> >> It is kvm_async_pf_task_wait() that calls schedule(), but it carefully
> >> sets state to make that legal. Except...
> >>
> >>> do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
> >>> async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> >>> RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
> >>> RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
> >>> RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
> >>> RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
> >>> RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
> >>> R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
> >>> R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
> >>> vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
> >>
> >> We took a page fault in vsnprintf() while doing link_path_walk(),
> >> which looks to be within an RCU read-side critical section.
> >>
> >> Maybe the page fault confused lockdep?
> >>
> >> Sigh. It is going to be a real pain if all printk()s need to be
> >> outside of RCU read-side critical sections due to the possibility of
> >> page faults...
> >>
> >
> > Does this mean whenever we get a page fault in a RCU read-side critical
> > section, we may hit this?
> >
> > Could we simply avoid to schedule() in kvm_async_pf_task_wait() if the
> > fault process is in a RCU read-side critical section as follow?
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index aa60a08b65b1..291ea13b23d2 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -140,7 +140,7 @@ void kvm_async_pf_task_wait(u32 token)
> >
> > n.token = token;
> > n.cpu = smp_processor_id();
> > - n.halted = is_idle_task(current) || preempt_count() > 1;
> > + n.halted = is_idle_task(current) || preempt_count() > 1 || rcu_preempt_depth();
> > init_swait_queue_head(&n.wq);
> > hlist_add_head(&n.link, &b->list);
> > raw_spin_unlock(&b->lock);
> >
> > (Add KVM folks and list Cced)
>
> Yes, that would work. Mind to send it as a proper patch?
>
I'd love to ;-) In a minute.
Regards,
Boqun
> Thanks,
>
> Paolo
>
> > Regards,
> > Boqun
> >
> >> Thanx, Paul
> > [...]
> >
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule be Tasks RCU quiescent state
2017-09-29 10:01 ` Paolo Bonzini
2017-09-29 10:25 ` Boqun Feng
@ 2017-09-29 10:34 ` Peter Zijlstra
2017-09-29 11:44 ` Paolo Bonzini
2017-09-29 16:36 ` Paul E. McKenney
2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-09-29 10:34 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Boqun Feng, Paul E. McKenney, Levin, Alexander (Sasha Levin),
Sasha Levin, linux-kernel@vger.kernel.org List, Ingo Molnar,
jiangshanlai@gmail.com, dipankar@in.ibm.com, Andrew Morton,
Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
dhowells@redhat.com, Eric Dumazet, Fr??d??ric Weisbecker,
Oleg Nesterov, bobby.prani@gmail.com
On Fri, Sep 29, 2017 at 12:01:24PM +0200, Paolo Bonzini wrote:
> > Does this mean whenever we get a page fault in a RCU read-side critical
> > section, we may hit this?
> >
> > Could we simply avoid to schedule() in kvm_async_pf_task_wait() if the
> > fault process is in a RCU read-side critical section as follow?
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index aa60a08b65b1..291ea13b23d2 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -140,7 +140,7 @@ void kvm_async_pf_task_wait(u32 token)
> >
> > n.token = token;
> > n.cpu = smp_processor_id();
> > - n.halted = is_idle_task(current) || preempt_count() > 1;
> > + n.halted = is_idle_task(current) || preempt_count() > 1 || rcu_preempt_depth();
> > init_swait_queue_head(&n.wq);
> > hlist_add_head(&n.link, &b->list);
> > raw_spin_unlock(&b->lock);
> >
> > (Add KVM folks and list Cced)
>
> Yes, that would work. Mind to send it as a proper patch?
I'm confused, why would we do an ASYNC PF at all here? Thing is, a
printk() shouldn't trigger a major fault _ever_. At worst it triggers
something like a vmalloc minor fault. And I'm thinking we should not do
the whole ASYNC machinery for minor faults.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule be Tasks RCU quiescent state
2017-09-29 10:34 ` Peter Zijlstra
@ 2017-09-29 11:44 ` Paolo Bonzini
2017-09-29 16:38 ` Paul E. McKenney
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2017-09-29 11:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Boqun Feng, Paul E. McKenney, Levin, Alexander (Sasha Levin),
Sasha Levin, linux-kernel@vger.kernel.org List, Ingo Molnar,
jiangshanlai@gmail.com, dipankar@in.ibm.com, Andrew Morton,
Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
dhowells@redhat.com, Eric Dumazet, Fr??d??ric Weisbecker,
Oleg Nesterov, bobby.prani@gmail.com
On 29/09/2017 12:34, Peter Zijlstra wrote:
> On Fri, Sep 29, 2017 at 12:01:24PM +0200, Paolo Bonzini wrote:
>>> Does this mean whenever we get a page fault in a RCU read-side critical
>>> section, we may hit this?
>>>
>>> Could we simply avoid to schedule() in kvm_async_pf_task_wait() if the
>>> fault process is in a RCU read-side critical section as follow?
>>>
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index aa60a08b65b1..291ea13b23d2 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -140,7 +140,7 @@ void kvm_async_pf_task_wait(u32 token)
>>>
>>> n.token = token;
>>> n.cpu = smp_processor_id();
>>> - n.halted = is_idle_task(current) || preempt_count() > 1;
>>> + n.halted = is_idle_task(current) || preempt_count() > 1 || rcu_preempt_depth();
>>> init_swait_queue_head(&n.wq);
>>> hlist_add_head(&n.link, &b->list);
>>> raw_spin_unlock(&b->lock);
>>>
>>> (Add KVM folks and list Cced)
>>
>> Yes, that would work. Mind to send it as a proper patch?
>
> I'm confused, why would we do an ASYNC PF at all here? Thing is, a
> printk() shouldn't trigger a major fault _ever_. At worst it triggers
> something like a vmalloc minor fault. And I'm thinking we should not do
> the whole ASYNC machinery for minor faults.
Async page faults are page faults _on the host_ side, and you cannot
control what the host pages out. Of course the hypervisor filters out
some cases itself (e.g. IF=0) but in general you could get one at any time.
Paolo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule be Tasks RCU quiescent state
2017-09-29 11:44 ` Paolo Bonzini
@ 2017-09-29 16:38 ` Paul E. McKenney
0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2017-09-29 16:38 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Peter Zijlstra, Boqun Feng, Levin, Alexander (Sasha Levin),
Sasha Levin, linux-kernel@vger.kernel.org List, Ingo Molnar,
jiangshanlai@gmail.com, dipankar@in.ibm.com, Andrew Morton,
Mathieu Desnoyers, Josh Triplett, Thomas Gleixner,
dhowells@redhat.com, Eric Dumazet, Fr??d??ric Weisbecker,
Oleg Nesterov, bobby.prani@gmail.com, Radim
On Fri, Sep 29, 2017 at 01:44:56PM +0200, Paolo Bonzini wrote:
> On 29/09/2017 12:34, Peter Zijlstra wrote:
> > On Fri, Sep 29, 2017 at 12:01:24PM +0200, Paolo Bonzini wrote:
> >>> Does this mean whenever we get a page fault in a RCU read-side critical
> >>> section, we may hit this?
> >>>
> >>> Could we simply avoid to schedule() in kvm_async_pf_task_wait() if the
> >>> fault process is in a RCU read-side critical section as follow?
> >>>
> >>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> >>> index aa60a08b65b1..291ea13b23d2 100644
> >>> --- a/arch/x86/kernel/kvm.c
> >>> +++ b/arch/x86/kernel/kvm.c
> >>> @@ -140,7 +140,7 @@ void kvm_async_pf_task_wait(u32 token)
> >>>
> >>> n.token = token;
> >>> n.cpu = smp_processor_id();
> >>> - n.halted = is_idle_task(current) || preempt_count() > 1;
> >>> + n.halted = is_idle_task(current) || preempt_count() > 1 || rcu_preempt_depth();
> >>> init_swait_queue_head(&n.wq);
> >>> hlist_add_head(&n.link, &b->list);
> >>> raw_spin_unlock(&b->lock);
> >>>
> >>> (Add KVM folks and list Cced)
> >>
> >> Yes, that would work. Mind to send it as a proper patch?
> >
> > I'm confused, why would we do an ASYNC PF at all here? Thing is, a
> > printk() shouldn't trigger a major fault _ever_. At worst it triggers
> > something like a vmalloc minor fault. And I'm thinking we should not do
> > the whole ASYNC machinery for minor faults.
>
> Async page faults are page faults _on the host_ side, and you cannot
> control what the host pages out. Of course the hypervisor filters out
> some cases itself (e.g. IF=0) but in general you could get one at any time.
Just to make sure I am understanding this... You take a page fault on
the host, and this causes a schedule() on the guest? Or did I lose the
thread here?
Thanx, Paul
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3 tip/core/rcu 40/40] rcu: Make non-preemptive schedule be Tasks RCU quiescent state
2017-09-29 10:01 ` Paolo Bonzini
2017-09-29 10:25 ` Boqun Feng
2017-09-29 10:34 ` Peter Zijlstra
@ 2017-09-29 16:36 ` Paul E. McKenney
2 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2017-09-29 16:36 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Boqun Feng, Levin, Alexander (Sasha Levin), Sasha Levin,
linux-kernel@vger.kernel.org List, Ingo Molnar,
jiangshanlai@gmail.com, dipankar@in.ibm.com, Andrew Morton,
Mathieu Desnoyers, Josh Triplett, Thomas Gleixner, Peter Zijlstra,
dhowells@redhat.com, Eric Dumazet, Frédéric Weisbecker,
Oleg Nesterov, bobby.prani@gmail.com
On Fri, Sep 29, 2017 at 12:01:24PM +0200, Paolo Bonzini wrote:
> On 29/09/2017 11:30, Boqun Feng wrote:
> > On Thu, Sep 28, 2017 at 04:05:14PM +0000, Paul E. McKenney wrote:
> > [...]
> >>> __schedule+0x201/0x2240 kernel/sched/core.c:3292
> >>> schedule+0x113/0x460 kernel/sched/core.c:3421
> >>> kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
> >>
> >> It is kvm_async_pf_task_wait() that calls schedule(), but it carefully
> >> sets state to make that legal. Except...
> >>
> >>> do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
> >>> async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
> >>> RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
> >>> RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
> >>> RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
> >>> RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
> >>> RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
> >>> R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
> >>> R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
> >>> vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
> >>
> >> We took a page fault in vsnprintf() while doing link_path_walk(),
> >> which looks to be within an RCU read-side critical section.
> >>
> >> Maybe the page fault confused lockdep?
> >>
> >> Sigh. It is going to be a real pain if all printk()s need to be
> >> outside of RCU read-side critical sections due to the possibility of
> >> page faults...
> >>
> >
> > Does this mean whenever we get a page fault in a RCU read-side critical
> > section, we may hit this?
> >
> > Could we simply avoid to schedule() in kvm_async_pf_task_wait() if the
> > fault process is in a RCU read-side critical section as follow?
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index aa60a08b65b1..291ea13b23d2 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -140,7 +140,7 @@ void kvm_async_pf_task_wait(u32 token)
> >
> > n.token = token;
> > n.cpu = smp_processor_id();
> > - n.halted = is_idle_task(current) || preempt_count() > 1;
> > + n.halted = is_idle_task(current) || preempt_count() > 1 || rcu_preempt_depth();
> > init_swait_queue_head(&n.wq);
> > hlist_add_head(&n.link, &b->list);
> > raw_spin_unlock(&b->lock);
This works for PREEMPT=y kernels, but can silently break RCU read-side
critical sections on PREEMPT=n kernels.
> > (Add KVM folks and list Cced)
>
> Yes, that would work. Mind to send it as a proper patch?
Just out of curiosity, why is printk() being passed something that can
page fault?
Thanx, Paul
^ permalink raw reply [flat|nested] 7+ messages in thread