qemu-devel.nongnu.org archive mirror
* [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
@ 2014-02-18  2:38 Zhanghaoyu (A)
  2014-02-18 10:38 ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Zhanghaoyu (A) @ 2014-02-18  2:38 UTC (permalink / raw)
  To: qemu-devel@nongnu.org, KVM, Paolo Bonzini, Gleb Natapov,
	Michael S. Tsirkin, Avi Kivity, Eric Blake
  Cc: chenliang (T), Huangweidong (C), Zhanghaoyu (A), Luonengjun,
	Gonglei (Arei), Gaowei (UVP)

Hi, all

The VM gets stuck for a while (about 6s for a VM with 20GB memory) when a pass-through PCI card is attached to a non-pass-through VM for the first time.
The reason is that the host builds the whole VT-d GPA->HPA DMAR page-table, which takes a long time, and during this time the qemu_global_mutex
lock is held by the main-thread. If a vcpu thread returns from its ioctl, it blocks waiting for the main-thread to release the qemu_global_mutex lock,
so the VM gets stuck.
The race between the qemu-main-thread and the vcpu-thread is shown below,

              QEMU-main-thread                                vcpu-thread           
                     |                                             |
      qemu_mutex_lock_iothread                     qemu_mutex_lock(&qemu_global_mutex)
                     |                                             |
        +----loop--->+                               +----loop---->+               
        |            |                               |             |
        |  qemu_mutex_unlock_iothread                | qemu_mutex_unlock_iothread 
        |            |                               |             |              
        |           poll                             |    kvm_vcpu_ioctl(KVM_RUN) 
        |            |                               |             |              
        | qemu_mutex_lock_iothread                   |             |
        |            |                               |             | 
 --------------------------------------------------------------------------------------
        |            |                               |  qemu_mutex_lock_iothread
        |   kvm_device_pci_assign                    |             |              
        |            |                               |   blocked, waiting for main-thread to release the qemu lock
        |      about 6 sec for 20GB memory           |             |              
        |            |                               |             |                     
        +------------+                               +-------------+              


Any advice?

Thanks,
Zhang Haoyu
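
The stall described in this message can be reproduced in miniature. The following is a minimal Python sketch, not QEMU code: `big_lock` stands in for qemu_global_mutex, the sleep stands in for the long kvm_device_pci_assign call, and `simulate_stall` is a hypothetical name chosen only for illustration.

```python
import threading
import time


def simulate_stall(hold_seconds):
    """Return how long a 'vcpu' thread waits for a lock held by the 'main' thread."""
    big_lock = threading.Lock()      # stands in for qemu_global_mutex (assumption)
    lock_taken = threading.Event()   # lets the vcpu thread start only once the lock is held
    waited = []

    def main_thread():
        with big_lock:
            lock_taken.set()
            time.sleep(hold_seconds)  # stands in for the slow device-assignment call

    def vcpu_thread():
        lock_taken.wait()             # the "ioctl" returns while main still holds the lock
        start = time.monotonic()
        with big_lock:                # blocks until main releases the lock
            waited.append(time.monotonic() - start)

    threads = [threading.Thread(target=main_thread),
               threading.Thread(target=vcpu_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waited[0]


if __name__ == "__main__":
    print(f"vcpu thread stalled for {simulate_stall(0.2):.3f} s")
```

The vcpu thread's wait tracks the hold time of the main thread almost exactly, which is the behaviour reported here: the guest stalls for as long as the assignment call runs under the lock.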


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18  2:38 [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time Zhanghaoyu (A)
@ 2014-02-18 10:38 ` Michael S. Tsirkin
  2014-02-18 10:42   ` Paolo Bonzini
  2014-02-18 10:59   ` Zhanghaoyu (A)
  0 siblings, 2 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2014-02-18 10:38 UTC (permalink / raw)
  To: Zhanghaoyu (A)
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), KVM, Gleb Natapov,
	Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei), Paolo Bonzini,
	Gaowei (UVP)

On Tue, Feb 18, 2014 at 02:38:40AM +0000, Zhanghaoyu (A) wrote:
> Hi, all
> 
> The VM gets stuck for a while (about 6s for a VM with 20GB memory) when a pass-through PCI card is attached to a non-pass-through VM for the first time.
> The reason is that the host builds the whole VT-d GPA->HPA DMAR page-table, which takes a long time, and during this time the qemu_global_mutex
> lock is held by the main-thread. If a vcpu thread returns from its ioctl, it blocks waiting for the main-thread to release the qemu_global_mutex lock,
> so the VM gets stuck.
> The race between qemu-main-thread and vcpu-thread is shown as below,
> 
>               QEMU-main-thread                                vcpu-thread           
>                      |                                             |
>       qemu_mutex_lock_iothread                     qemu_mutex_lock(&qemu_global_mutex)
>                      |                                             |
>         +----loop--->+                               +----loop---->+               
>         |            |                               |             |
>         |  qemu_mutex_unlock_iothread                | qemu_mutex_unlock_iothread 
>         |            |                               |             |              
>         |           poll                             |    kvm_vcpu_ioctl(KVM_RUN) 
>         |            |                               |             |              
>         | qemu_mutex_lock_iothread                   |             |
>         |            |                               |             | 
>  --------------------------------------------------------------------------------------
>         |            |                               |  qemu_mutex_lock_iothread
>         |   kvm_device_pci_assign                    |             |              
>         |            |                               |   blocked, waiting for main-thread to release the qemu lock
>         |      about 6 sec for 20GB memory           |             |              
>         |            |                               |             |                     
>         +------------+                               +-------------+              
> 
> 
> Any advice?
> 
> Thanks,
> Zhang Haoyu

What if you detach and re-attach?
Is it fast then?
If yes this means the issue is COW breaking that occurs
with get_user_pages, not translation as such.
Try hugepages with prealloc - does it help?




-- 
MST


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 10:38 ` Michael S. Tsirkin
@ 2014-02-18 10:42   ` Paolo Bonzini
  2014-02-18 10:51     ` Michael S. Tsirkin
  2014-02-18 11:17     ` Zhanghaoyu (A)
  2014-02-18 10:59   ` Zhanghaoyu (A)
  1 sibling, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-02-18 10:42 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhanghaoyu (A)
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), KVM, Gleb Natapov,
	Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei), Gaowei (UVP)

On 18/02/2014 11:38, Michael S. Tsirkin wrote:
> What if you detach and re-attach?
> Is it fast then?
> If yes this means the issue is COW breaking that occurs
> with get_user_pages, not translation as such.
> Try hugepages with prealloc - does it help?

I agree it's either COW breaking or (similarly) locking pages that the 
guest hasn't touched yet.

You can use prealloc or "-rt mlock=on" to avoid this problem.

Paolo
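
For readers unfamiliar with why untouched pages matter here: an anonymous mapping gets no physical page until it is first written, so pinning guest memory for DMA has to allocate and zero every page the guest has not used yet. The sketch below illustrates the pre-touching idea behind preallocation in plain Python. It is a conceptual illustration only, `pretouch` is a hypothetical helper, and it is an assumption that QEMU's own prealloc path resembles this loop.

```python
import mmap


def pretouch(size_bytes, page_size=mmap.PAGESIZE):
    """Map anonymous memory and write one byte per page so each page is faulted in."""
    region = mmap.mmap(-1, size_bytes)   # anonymous, zero-filled mapping
    touched = 0
    for offset in range(0, size_bytes, page_size):
        region[offset] = 0               # first write makes the kernel allocate the page
        touched += 1
    return region, touched


if __name__ == "__main__":
    region, pages = pretouch(64 * mmap.PAGESIZE)
    print(f"touched {pages} pages")
    region.close()
```

Paying this cost at startup moves the page allocation and zeroing out of the hotplug path, which is the effect the suggestions above are aiming for.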


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 10:42   ` Paolo Bonzini
@ 2014-02-18 10:51     ` Michael S. Tsirkin
  2014-02-18 11:05       ` Paolo Bonzini
  2014-02-18 11:17     ` Zhanghaoyu (A)
  1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2014-02-18 10:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), Gleb Natapov, KVM,
	Zhanghaoyu (A), Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei),
	Gaowei (UVP)

On Tue, Feb 18, 2014 at 11:42:19AM +0100, Paolo Bonzini wrote:
> On 18/02/2014 11:38, Michael S. Tsirkin wrote:
> >What if you detach and re-attach?
> >Is it fast then?
> >If yes this means the issue is COW breaking that occurs
> >with get_user_pages, not translation as such.
> >Try hugepages with prealloc - does it help?
> 
> I agree it's either COW breaking or (similarly) locking pages that
> the guest hasn't touched yet.
> 
> You can use prealloc or "-rt mlock=on" to avoid this problem.
> 
> Paolo

Or the new shared flag - IIRC shared VMAs don't do COW either.


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 10:38 ` Michael S. Tsirkin
  2014-02-18 10:42   ` Paolo Bonzini
@ 2014-02-18 10:59   ` Zhanghaoyu (A)
  1 sibling, 0 replies; 9+ messages in thread
From: Zhanghaoyu (A) @ 2014-02-18 10:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), KVM, Gleb Natapov,
	Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei), Paolo Bonzini,
	Gaowei (UVP)

>> Hi, all
>> 
>> The VM gets stuck for a while (about 6s for a VM with 20GB memory) when a pass-through PCI card is attached to a non-pass-through VM for the first time.
>> The reason is that the host builds the whole VT-d GPA->HPA DMAR
>> page-table, which takes a long time, and during this time the
>> qemu_global_mutex lock is held by the main-thread. If a vcpu thread returns from its ioctl, it blocks waiting for the main-thread to release the qemu_global_mutex lock, so the VM gets stuck.
>> The race between qemu-main-thread and vcpu-thread is shown as below,
>> 
>>               QEMU-main-thread                                vcpu-thread           
>>                      |                                             |
>>       qemu_mutex_lock_iothread                     qemu_mutex_lock(&qemu_global_mutex)
>>                      |                                             |
>>         +----loop--->+                               +----loop---->+               
>>         |            |                               |             |
>>         |  qemu_mutex_unlock_iothread                | qemu_mutex_unlock_iothread 
>>         |            |                               |             |              
>>         |           poll                             |    kvm_vcpu_ioctl(KVM_RUN) 
>>         |            |                               |             |              
>>         | qemu_mutex_lock_iothread                   |             |
>>         |            |                               |             | 
>>  --------------------------------------------------------------------------------------
>>         |            |                               |  qemu_mutex_lock_iothread
>>         |   kvm_device_pci_assign                    |             |              
>>         |            |                               |   blocked, waiting for main-thread to release the qemu lock
>>         |      about 6 sec for 20GB memory           |             |              
>>         |            |                               |             |                     
>>         +------------+                               +-------------+              
>> 
>> 
>> Any advice?
>> 
>> Thanks,
>> Zhang Haoyu
>
>What if you detach and re-attach?
>Is it fast then?
Yes, because the VT-d GPA->HPA DMAR page-table has already been built, so there is no need to rebuild it.

>If yes this means the issue is COW breaking that occurs with get_user_pages, not translation as such.
>Try hugepages with prealloc - does it help?
Yes, it helps a bit, but it does not resolve the problem completely; the stall still happens.

Thanks,
Zhang Haoyu


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 10:51     ` Michael S. Tsirkin
@ 2014-02-18 11:05       ` Paolo Bonzini
  2014-02-18 11:16         ` Michael S. Tsirkin
  2014-02-24 11:38         ` Zhanghaoyu (A)
  0 siblings, 2 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-02-18 11:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), Gleb Natapov, KVM,
	Zhanghaoyu (A), Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei),
	Gaowei (UVP)

On 18/02/2014 11:51, Michael S. Tsirkin wrote:
>> > I agree it's either COW breaking or (similarly) locking pages that
>> > the guest hasn't touched yet.
>> >
>> > You can use prealloc or "-rt mlock=on" to avoid this problem.
>> >
>> > Paolo
> Or the new shared flag - IIRC shared VMAs don't do COW either.

Only if the problem isn't locking and zeroing of untouched pages (also, 
it is not upstream is it?).

Can you make a profile with perf?

Paolo


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 11:05       ` Paolo Bonzini
@ 2014-02-18 11:16         ` Michael S. Tsirkin
  2014-02-24 11:38         ` Zhanghaoyu (A)
  1 sibling, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2014-02-18 11:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), Gleb Natapov, KVM,
	Zhanghaoyu (A), Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei),
	Gaowei (UVP)

On Tue, Feb 18, 2014 at 12:05:19PM +0100, Paolo Bonzini wrote:
> On 18/02/2014 11:51, Michael S. Tsirkin wrote:
> >>> I agree it's either COW breaking or (similarly) locking pages that
> >>> the guest hasn't touched yet.
> >>>
> >>> You can use prealloc or "-rt mlock=on" to avoid this problem.
> >>>
> >>> Paolo
> >Or the new shared flag - IIRC shared VMAs don't do COW either.
> 
> Only if the problem isn't locking and zeroing of untouched pages
> (also, it is not upstream is it?).

No but it's a small patch - part of vhost user patchset.

> Can you make a profile with perf?
> 
> Paolo


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 10:42   ` Paolo Bonzini
  2014-02-18 10:51     ` Michael S. Tsirkin
@ 2014-02-18 11:17     ` Zhanghaoyu (A)
  1 sibling, 0 replies; 9+ messages in thread
From: Zhanghaoyu (A) @ 2014-02-18 11:17 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), KVM, Gleb Natapov,
	Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei), Gaowei (UVP)

>> What if you detach and re-attach?
>> Is it fast then?
>> If yes this means the issue is COW breaking that occurs with 
>> get_user_pages, not translation as such.
>> Try hugepages with prealloc - does it help?
>
>I agree it's either COW breaking or (similarly) locking pages that the guest hasn't touched yet.
>
>You can use prealloc or "-rt mlock=on" to avoid this problem.
>
It gets better with "-rt mlock=on", but that still does not resolve the problem completely.
VT-d and EPT do not share the GPA->HPA page-table, so the VT-d GPA->HPA DMAR page-table still has to be built.
The "-rt mlock=on" option guarantees that all of the VM's memory has been touched before the pass-through
device is attached, so the build is faster, but it still takes some time.

Thanks,
Zhang Haoyu

>Paolo
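
To get a feel for how much mapping work the DMAR page-table build involves, here is a rough back-of-the-envelope sketch. It assumes "20GB" means 20 GiB, and that each page-table page holds 512 eight-byte entries (a 4 KiB table page). Whether the host maps guest memory at 4 KiB or 2 MiB granularity is not stated in this thread, so both cases are shown. `mapping_work` is a hypothetical helper.

```python
GIB = 1024 ** 3
ENTRIES_PER_TABLE = 512   # assumption: 4 KiB table page / 8-byte entry


def mapping_work(mem_bytes, page_bytes):
    """Return (leaf entries, leaf table pages) needed to map mem_bytes."""
    leaf_entries = mem_bytes // page_bytes
    leaf_tables = -(-leaf_entries // ENTRIES_PER_TABLE)   # ceiling division
    return leaf_entries, leaf_tables


if __name__ == "__main__":
    for label, page in (("4 KiB", 4 * 1024), ("2 MiB", 2 * 1024 * 1024)):
        entries, tables = mapping_work(20 * GIB, page)
        print(f"{label} pages: {entries} leaf entries in {tables} table pages")
```

With 4 KiB mappings this is over five million entries to pin and install one by one, so a stall measured in seconds is plausible even after every page has already been touched.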


* Re: [Qemu-devel] hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
  2014-02-18 11:05       ` Paolo Bonzini
  2014-02-18 11:16         ` Michael S. Tsirkin
@ 2014-02-24 11:38         ` Zhanghaoyu (A)
  1 sibling, 0 replies; 9+ messages in thread
From: Zhanghaoyu (A) @ 2014-02-24 11:38 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: chenliang (T), Avi Kivity, Huangweidong (C), KVM, Gleb Natapov,
	Luonengjun, qemu-devel@nongnu.org, Gonglei (Arei), Gaowei (UVP)

>>> > I agree it's either COW breaking or (similarly) locking pages that 
>>> > the guest hasn't touched yet.
>>> >
>>> > You can use prealloc or "-rt mlock=on" to avoid this problem.
>>> >
>>> > Paolo
>> Or the new shared flag - IIRC shared VMAs don't do COW either.
>
>Only if the problem isn't locking and zeroing of untouched pages (also, it is not upstream is it?).
>
>Can you make a profile with perf?
>
"-rt mlock=on" option is not set, perf top -p <qemu pid> result:
21699 root      20   0 24.2g  24g 5312 S      0 33.8   0:24.39 qemu-system-x8
   PerfTop:      95 irqs/sec  kernel:17.9% us: 1.1% guest kernel:47.4% guest us:32.6% exact:  0.0% [1000Hz cycles],  (target_pid: 15950)
----------------------------------------------------------------------------

             samples  pcnt function                              DSO
             _______ _____ _____________________________________ ___________

             2984.00 77.8% clear_page_c                          [kernel]
              135.00  3.5% gup_huge_pmd                          [kernel]
              134.00  3.5% pfn_to_dma_pte                        [kernel]
               83.00  2.2% __domain_mapping                      [kernel]
               63.00  1.6% update_memslots                       [kvm]
               59.00  1.5% prep_new_page                         [kernel]
               50.00  1.3% get_user_pages_fast                   [kernel]
               45.00  1.2% up_read                               [kernel]
               42.00  1.1% down_read                             [kernel]
               38.00  1.0% gup_pud_range                         [kernel]
               34.00  0.9% kvm_clear_async_pf_completion_queue   [kvm]
               18.00  0.5% intel_iommu_map                       [kernel]
               16.00  0.4% _cond_resched                         [kernel]
               16.00  0.4% gfn_to_hva                            [kvm]
               15.00  0.4% kvm_set_apic_base                     [kvm]
               15.00  0.4% load_vmcs12_host_state                [kvm_intel]
               14.00  0.4% clear_huge_page                       [kernel]
                7.00  0.2% intel_iommu_iova_to_phys              [kernel]
                6.00  0.2% is_error_pfn                          [kvm]
                6.00  0.2% iommu_map                             [kernel]
                6.00  0.2% native_write_msr_safe                 [kernel]
                5.00  0.1% find_vma                              [kernel]

"-rt mlock=on" option is set, perf top -p <qemu pid> result:
   PerfTop:     326 irqs/sec  kernel:17.5% us: 2.8% guest kernel:37.4% guest us:42.3% exact:  0.0% [1000Hz cycles],  (target_pid: 25845)
----------------------------------------------------------------------------

             samples  pcnt function                              DSO
             _______ _____ _____________________________________ ___________

              182.00 17.5% pfn_to_dma_pte                        [kernel]
              178.00 17.1% gup_huge_pmd                          [kernel]
               91.00  8.8% __domain_mapping                      [kernel]
               71.00  6.8% update_memslots                       [kvm]
               65.00  6.3% gup_pud_range                         [kernel]
               62.00  6.0% get_user_pages_fast                   [kernel]
               52.00  5.0% kvm_clear_async_pf_completion_queue   [kvm]
               50.00  4.8% down_read                             [kernel]
               37.00  3.6% up_read                               [kernel]
               26.00  2.5% intel_iommu_map                       [kernel]
               20.00  1.9% native_write_msr_safe                 [kernel]
               16.00  1.5% gfn_to_hva                            [kvm]
               14.00  1.3% load_vmcs12_host_state                [kvm_intel]
                8.00  0.8% find_busiest_group                    [kernel]
                8.00  0.8% _raw_spin_lock                        [kernel]
                8.00  0.8% hrtimer_interrupt                     [kernel]
                8.00  0.8% intel_iommu_iova_to_phys              [kernel]
                7.00  0.7% iommu_map                             [kernel]
                6.00  0.6% kvm_mmu_pte_write                     [kvm]
                6.00  0.6% is_error_pfn                          [kvm]
                5.00  0.5% kvm_set_apic_base                     [kvm]
                5.00  0.5% clear_page_c                          [kernel]
                5.00  0.5% iommu_iova_to_phys                    [kernel]

Without the "-rt mlock=on" option, many new pages have to be allocated and cleared during iommu_map, and the clear operation is expensive.
But whether or not the "-rt mlock=on" option is set, the GPA->HPA DMAR page-table MUST be built, and this operation is also expensive: about 1-2 sec for 25GB memory.

Thanks,
Zhang Haoyu
>Paolo
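
The two profiles in this message can be compared mechanically. The sketch below parses rows in the "samples pcnt function DSO" layout shown above and sums the share of page zeroing versus a hand-picked set of IOMMU mapping functions. The rows are copied from the profiles in this message; the choice of which functions count as "IOMMU mapping" is an assumption made for illustration, and `parse_perf_top` and `summarize` are hypothetical helpers.

```python
import re

# One sample row: "<samples> <pcnt>% <function> [<dso>]"
ROW = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)%\s+(\S+)\s+\[(\S+)\]\s*$")

# Assumption: these functions are treated as "IOMMU mapping" work.
IOMMU_FUNCS = ("pfn_to_dma_pte", "__domain_mapping", "intel_iommu_map", "iommu_map")


def parse_perf_top(text):
    """Return {function: percent} for every sample row found in text."""
    shares = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            shares[m.group(3)] = float(m.group(2))
    return shares


def summarize(text):
    """Return (page-zeroing percent, IOMMU-mapping percent)."""
    shares = parse_perf_top(text)
    zeroing = shares.get("clear_page_c", 0.0)
    iommu = round(sum(shares.get(f, 0.0) for f in IOMMU_FUNCS), 1)
    return zeroing, iommu


without_mlock = """
 2984.00 77.8% clear_page_c     [kernel]
  134.00  3.5% pfn_to_dma_pte   [kernel]
   83.00  2.2% __domain_mapping [kernel]
   18.00  0.5% intel_iommu_map  [kernel]
    6.00  0.2% iommu_map        [kernel]
"""

with_mlock = """
  182.00 17.5% pfn_to_dma_pte   [kernel]
   91.00  8.8% __domain_mapping [kernel]
   26.00  2.5% intel_iommu_map  [kernel]
    7.00  0.7% iommu_map        [kernel]
    5.00  0.5% clear_page_c     [kernel]
"""

if __name__ == "__main__":
    print("without mlock:", summarize(without_mlock))
    print("with mlock:   ", summarize(with_mlock))
```

On these rows, clear_page_c falls from 77.8% to 0.5% of samples once mlock is on, while the selected IOMMU functions rise from 6.4% to 29.5%. The percentages are relative to different sample totals, so they show where the time goes in each run rather than absolute durations.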

