* Async page fault delivered while irq are disabled? @ 2019-12-19 15:28 Frederic Weisbecker 2019-12-19 15:57 ` Sean Christopherson 0 siblings, 1 reply; 8+ messages in thread From: Frederic Weisbecker @ 2019-12-19 15:28 UTC (permalink / raw) To: Paolo Bonzini, Radim Krčmář Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm Hi, While checking the x86 async page fault code, I can't find anything that prevents KVM_PV_REASON_PAGE_READY to be injected while the guest has interrupts disabled. If that page fault happens to trap in an interrupt disabled section, there may be a deadlock due to the call to wake_up_process() which locks the rq->lock (among others). Given how long that code is there, I guess such an issue would have been reported for a while already. But I just would like to be sure we are checking that. Can someone enlighten me? Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-19 15:28 Async page fault delivered while irq are disabled? Frederic Weisbecker @ 2019-12-19 15:57 ` Sean Christopherson 2019-12-19 16:15 ` Frederic Weisbecker 0 siblings, 1 reply; 8+ messages in thread From: Sean Christopherson @ 2019-12-19 15:57 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On Thu, Dec 19, 2019 at 04:28:15PM +0100, Frederic Weisbecker wrote: > Hi, > > While checking the x86 async page fault code, I can't > find anything that prevents KVM_PV_REASON_PAGE_READY to be injected > while the guest has interrupts disabled. If that page fault happens > to trap in an interrupt disabled section, there may be a deadlock due to the > call to wake_up_process() which locks the rq->lock (among others). > > Given how long that code is there, I guess such an issue would > have been reported for a while already. But I just would like to > be sure we are checking that. > > Can someone enlighten me? The check is triggered from the caller of kvm_async_page_present(). kvm_check_async_pf_completion() | |-> kvm_arch_can_inject_async_page_present() | |-> kvm_can_do_async_pf() | |-> kvm_x86_ops->interrupt_allowed() ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-19 15:57 ` Sean Christopherson @ 2019-12-19 16:15 ` Frederic Weisbecker 2019-12-19 19:00 ` Sean Christopherson 0 siblings, 1 reply; 8+ messages in thread From: Frederic Weisbecker @ 2019-12-19 16:15 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On Thu, Dec 19, 2019 at 07:57:46AM -0800, Sean Christopherson wrote: > On Thu, Dec 19, 2019 at 04:28:15PM +0100, Frederic Weisbecker wrote: > > Hi, > > > > While checking the x86 async page fault code, I can't > > find anything that prevents KVM_PV_REASON_PAGE_READY to be injected > > while the guest has interrupts disabled. If that page fault happens > > to trap in an interrupt disabled section, there may be a deadlock due to the > > call to wake_up_process() which locks the rq->lock (among others). > > > > Given how long that code is there, I guess such an issue would > > have been reported for a while already. But I just would like to > > be sure we are checking that. > > > > Can someone enlighten me? > > The check is triggered from the caller of kvm_async_page_present(). > > kvm_check_async_pf_completion() > | > |-> kvm_arch_can_inject_async_page_present() > | > |-> kvm_can_do_async_pf() > | > |-> kvm_x86_ops->interrupt_allowed() Ah thanks, I missed that one. And what about kvm_async_page_present_sync()? I don't see a similar check there. And one last silly question, what about that line in kvm_arch_can_inject_async_page_present: if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) return true; That looks weird, also it shortcuts the irqs_allowed() check. Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-19 16:15 ` Frederic Weisbecker @ 2019-12-19 19:00 ` Sean Christopherson 2019-12-20 9:34 ` Paolo Bonzini 0 siblings, 1 reply; 8+ messages in thread From: Sean Christopherson @ 2019-12-19 19:00 UTC (permalink / raw) To: Frederic Weisbecker Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On Thu, Dec 19, 2019 at 05:15:25PM +0100, Frederic Weisbecker wrote: > On Thu, Dec 19, 2019 at 07:57:46AM -0800, Sean Christopherson wrote: > > On Thu, Dec 19, 2019 at 04:28:15PM +0100, Frederic Weisbecker wrote: > > > Hi, > > > > > > While checking the x86 async page fault code, I can't > > > find anything that prevents KVM_PV_REASON_PAGE_READY to be injected > > > while the guest has interrupts disabled. If that page fault happens > > > to trap in an interrupt disabled section, there may be a deadlock due to the > > > call to wake_up_process() which locks the rq->lock (among others). > > > > > > Given how long that code is there, I guess such an issue would > > > have been reported for a while already. But I just would like to > > > be sure we are checking that. > > > > > > Can someone enlighten me? > > > > The check is triggered from the caller of kvm_async_page_present(). > > > > kvm_check_async_pf_completion() > > | > > |-> kvm_arch_can_inject_async_page_present() > > | > > |-> kvm_can_do_async_pf() > > | > > |-> kvm_x86_ops->interrupt_allowed() > > Ah thanks, I missed that one. And what about > kvm_async_page_present_sync()? I don't see a similar check > there. CONFIG_KVM_ASYNC_PF_SYNC is selected only by s390, it can't be turned on for x86. > And one last silly question, what about that line in > kvm_arch_can_inject_async_page_present: > > if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) > return true; > > That looks weird, also it shortcuts the irqs_allowed() check. I wondered about that code as well :-). Definitely odd, but it would require the guest to disable async #PF after an async #PF is queued. Best guess is the idea is that it's the guest's problem if it disables async #PF on the fly. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-19 19:00 ` Sean Christopherson @ 2019-12-20 9:34 ` Paolo Bonzini 2019-12-23 2:17 ` Frederic Weisbecker 0 siblings, 1 reply; 8+ messages in thread From: Paolo Bonzini @ 2019-12-20 9:34 UTC (permalink / raw) To: Sean Christopherson, Frederic Weisbecker Cc: Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On 19/12/19 20:00, Sean Christopherson wrote: >> And one last silly question, what about that line in >> kvm_arch_can_inject_async_page_present: >> >> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) >> return true; >> >> That looks weird, also it shortcuts the irqs_allowed() check. > > I wondered about that code as well :-). Definitely odd, but it would > require the guest to disable async #PF after an async #PF is queued. Best > guess is the idea is that it's the guest's problem if it disables async #PF > on the fly. > When the guest disables async #PF all outstanding page faults are cancelled by kvm_clear_async_pf_completion_queue. However, in case they complete while in cancel_work_sync. you need to inject them even if interrupts are disabled. Paolo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-20 9:34 ` Paolo Bonzini @ 2019-12-23 2:17 ` Frederic Weisbecker 2019-12-23 8:38 ` Paolo Bonzini 0 siblings, 1 reply; 8+ messages in thread From: Frederic Weisbecker @ 2019-12-23 2:17 UTC (permalink / raw) To: Paolo Bonzini Cc: Sean Christopherson, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On Fri, Dec 20, 2019 at 10:34:20AM +0100, Paolo Bonzini wrote: > On 19/12/19 20:00, Sean Christopherson wrote: > >> And one last silly question, what about that line in > >> kvm_arch_can_inject_async_page_present: > >> > >> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) > >> return true; > >> > >> That looks weird, also it shortcuts the irqs_allowed() check. > > > > I wondered about that code as well :-). Definitely odd, but it would > > require the guest to disable async #PF after an async #PF is queued. Best > > guess is the idea is that it's the guest's problem if it disables async #PF > > on the fly. > > > > When the guest disables async #PF all outstanding page faults are > cancelled by kvm_clear_async_pf_completion_queue. However, in case they > complete while in cancel_work_sync. you need to inject them even if > interrupts are disabled. Hmm, shouldn't the guest wait for the whole pending waitqueue in kvm_async_pf_task_wait() to be serviced and woken up before actually allowing to disable async #PF ? Because you can't really afford to inject those #PF while IRQs are disabled, that's a big rq deadlock risk. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-23 2:17 ` Frederic Weisbecker @ 2019-12-23 8:38 ` Paolo Bonzini 2019-12-26 17:28 ` Frederic Weisbecker 0 siblings, 1 reply; 8+ messages in thread From: Paolo Bonzini @ 2019-12-23 8:38 UTC (permalink / raw) To: Frederic Weisbecker Cc: Sean Christopherson, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On 23/12/19 03:17, Frederic Weisbecker wrote: > On Fri, Dec 20, 2019 at 10:34:20AM +0100, Paolo Bonzini wrote: >> On 19/12/19 20:00, Sean Christopherson wrote: >>>> And one last silly question, what about that line in >>>> kvm_arch_can_inject_async_page_present: >>>> >>>> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) >>>> return true; >>>> >>>> That looks weird, also it shortcuts the irqs_allowed() check. >>> >>> I wondered about that code as well :-). Definitely odd, but it would >>> require the guest to disable async #PF after an async #PF is queued. Best >>> guess is the idea is that it's the guest's problem if it disables async #PF >>> on the fly. >>> >> >> When the guest disables async #PF all outstanding page faults are >> cancelled by kvm_clear_async_pf_completion_queue. However, in case they >> complete while in cancel_work_sync. you need to inject them even if >> interrupts are disabled. > > Hmm, shouldn't the guest wait for the whole pending waitqueue in kvm_async_pf_task_wait() > to be serviced and woken up before actually allowing to disable async #PF ? > Because you can't really afford to inject those #PF while IRQs are disabled, > that's a big rq deadlock risk. That's just how Linux works, and Linux doesn't ever disable async page faults with disabled IRQ (reboot_notifier_list is a blocking notifier). Paolo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled? 2019-12-23 8:38 ` Paolo Bonzini @ 2019-12-26 17:28 ` Frederic Weisbecker 0 siblings, 0 replies; 8+ messages in thread From: Frederic Weisbecker @ 2019-12-26 17:28 UTC (permalink / raw) To: Paolo Bonzini Cc: Sean Christopherson, Radim Krčmář, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm On Mon, Dec 23, 2019 at 09:38:18AM +0100, Paolo Bonzini wrote: > On 23/12/19 03:17, Frederic Weisbecker wrote: > > On Fri, Dec 20, 2019 at 10:34:20AM +0100, Paolo Bonzini wrote: > >> On 19/12/19 20:00, Sean Christopherson wrote: > >>>> And one last silly question, what about that line in > >>>> kvm_arch_can_inject_async_page_present: > >>>> > >>>> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED)) > >>>> return true; > >>>> > >>>> That looks weird, also it shortcuts the irqs_allowed() check. > >>> > >>> I wondered about that code as well :-). Definitely odd, but it would > >>> require the guest to disable async #PF after an async #PF is queued. Best > >>> guess is the idea is that it's the guest's problem if it disables async #PF > >>> on the fly. > >>> > >> > >> When the guest disables async #PF all outstanding page faults are > >> cancelled by kvm_clear_async_pf_completion_queue. However, in case they > >> complete while in cancel_work_sync. you need to inject them even if > >> interrupts are disabled. > > > > Hmm, shouldn't the guest wait for the whole pending waitqueue in kvm_async_pf_task_wait() > > to be serviced and woken up before actually allowing to disable async #PF ? > > Because you can't really afford to inject those #PF while IRQs are disabled, > > that's a big rq deadlock risk. > > That's just how Linux works, and Linux doesn't ever disable async page > faults with disabled IRQ (reboot_notifier_list is a blocking notifier). So when I talk about IRQs enabled requirement, this is to prevent the page fault from interrupting code that may hold a lock. Now in those case I think we are good, as kvm_pv_guest_cpu_reboot() is called from a generic IPI (rq and others shouldn't be held at that time) and kvm_guest_cpu_offline() is called from a thread with interrupts disabled. Anyway those semantics and expectations are very obscure. Probably those async page faults should be considered as IRQs from lockdep POV. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-12-26 17:28 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-12-19 15:28 Async page fault delivered while irq are disabled? Frederic Weisbecker 2019-12-19 15:57 ` Sean Christopherson 2019-12-19 16:15 ` Frederic Weisbecker 2019-12-19 19:00 ` Sean Christopherson 2019-12-20 9:34 ` Paolo Bonzini 2019-12-23 2:17 ` Frederic Weisbecker 2019-12-23 8:38 ` Paolo Bonzini 2019-12-26 17:28 ` Frederic Weisbecker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox