* Re: Windows VM slow boot
       [not found] ` <20120906092039.GA19234@alpha.arachsys.com>
@ 2012-09-12 10:56 ` Richard Davies
  2012-09-12 12:25   ` Mel Gorman
  0 siblings, 1 reply; 15+ messages in thread

From: Richard Davies @ 2012-09-12 10:56 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm, linux-mm

[ adding linux-mm - previously at http://marc.info/?t=134511509400003 ]

Hi Rik,

Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
retest with these.

The typical symptom now appears to be that the Windows VMs boot reasonably
fast, but then there is high CPU use and load for many minutes afterwards -
the high CPU use is both for the qemu-kvm processes themselves and also for
% sys.

I attach a perf report which seems to show that the high CPU use is in the
memory manager.

Cheers,

Richard.

# ========
# captured on: Wed Sep 12 10:25:43 2012
# os release : 3.6.0-rc5-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 870K of event 'cycles'
# Event count (approx.): 432968175910
#
# Overhead  Command          Shared Object         Symbol
# ........  ...............  ....................  ..............................................
#
    89.14%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
            |
            --- _raw_spin_lock_irqsave
               |
               |--95.47%-- isolate_migratepages_range
               |           compact_zone
               |           compact_zone_order
               |           try_to_compact_pages
               |           __alloc_pages_direct_compact
               |           __alloc_pages_nodemask
               |           alloc_pages_vma
               |           do_huge_pmd_anonymous_page
               |           handle_mm_fault
               |           __get_user_pages
               |           get_user_page_nowait
               |           hva_to_pfn.isra.17
               |           __gfn_to_pfn
               |           gfn_to_pfn_async
               |           try_async_pf
               |           tdp_page_fault
               |           kvm_mmu_page_fault
               |           pf_interception
               |           handle_exit
               |           kvm_arch_vcpu_ioctl_run
               |           kvm_vcpu_ioctl
               |           do_vfs_ioctl
               |           sys_ioctl
               |           system_call_fastpath
               |           ioctl
               |           |
               |           |--55.64%-- 0x10100000002
               |           |
               |            --44.36%-- 0x10100000006
               |
               |--4.53%-- compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.17
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--55.36%-- 0x10100000002
               |          |
               |           --44.64%-- 0x10100000006
                --0.00%-- [...]

     4.92%  qemu-kvm  [kernel.kallsyms]  [k] migrate_pages
            |
            --- migrate_pages
               |
               |--99.74%-- compact_zone
               |           compact_zone_order
               |           try_to_compact_pages
               |           __alloc_pages_direct_compact
               |           __alloc_pages_nodemask
               |           alloc_pages_vma
               |           do_huge_pmd_anonymous_page
               |           handle_mm_fault
               |           __get_user_pages
               |           get_user_page_nowait
               |           hva_to_pfn.isra.17
               |           __gfn_to_pfn
               |           gfn_to_pfn_async
               |           try_async_pf
               |           tdp_page_fault
               |           kvm_mmu_page_fault
               |           pf_interception
               |           handle_exit
               |           kvm_arch_vcpu_ioctl_run
               |           kvm_vcpu_ioctl
               |           do_vfs_ioctl
               |           sys_ioctl
               |           system_call_fastpath
               |           ioctl
               |           |
               |           |--55.80%-- 0x10100000002
               |           |
               |            --44.20%-- 0x10100000006
                --0.26%-- [...]
1.59% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.69%-- memcmp_pages | | | |--78.86%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.14%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.31%-- [...] 0.85% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many native_flush_tlb_others | |--99.81%-- flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.19%-- [...] 0.38% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.80%-- cpu_idle | | | |--90.53%-- start_secondary | | | --9.47%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.20%-- [...] 0.38% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore | --- _raw_spin_unlock_irqrestore | |--94.31%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.74%-- 0x10100000006 | | | --40.26%-- 0x10100000002 | |--3.41%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--53.57%-- 0x10100000006 | | | --46.43%-- 
0x10100000002 | |--0.82%-- ntp_tick_length | do_timer | tick_do_update_jiffies64 | tick_sched_timer | __run_hrtimer.isra.28 | hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000002 | |--0.76%-- __page_cache_release.part.11 | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.70%-- [...] 
0.26% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range | --- isolate_migratepages_range | |--95.44%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--52.46%-- 0x10100000002 | | | --47.54%-- 0x10100000006 | --4.56%-- compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--53.84%-- 0x10100000006 | --46.16%-- 0x10100000002 0.21% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--53.46%-- 0x10100000002 | --46.54%-- 0x10100000006 0.14% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state | --- mod_zone_page_state | |--70.21%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | 
__gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.97%-- 0x10100000002 | | | --44.03%-- 0x10100000006 | |--29.71%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--61.19%-- 0x10100000002 | | | --38.81%-- 0x10100000006 --0.08%-- [...] 0.13% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func | --- flush_tlb_func | |--99.47%-- generic_smp_call_function_interrupt | smp_call_function_interrupt | call_function_interrupt | | | |--91.76%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--76.39%-- 0x10100000006 | | | | | --23.61%-- 0x10100000002 | | | |--7.61%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | 
try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--70.59%-- 0x10100000006 | | | | | --29.41%-- 0x10100000002 | --0.63%-- [...] | --0.53%-- smp_call_function_interrupt call_function_interrupt | |--83.32%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--79.99%-- 0x10100000006 | | | --20.01%-- 0x10100000002 | --16.68%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000002 0.09% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.75%-- __free_pages_ok | | | |--99.84%-- free_compound_page | | __put_compound_page | | put_compound_page | | release_pages | | free_pages_and_swap_cache | | tlb_flush_mmu | | tlb_finish_mmu | | exit_mmap | | mmput | | exit_mm | | do_exit | | do_group_exit | | get_signal_to_deliver | | do_signal | | do_notify_resume | | int_signal | --0.16%-- [...] --0.25%-- [...] 
0.08% :2585 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.47%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --0.53%-- free_hot_cold_page __free_pages | |--50.65%-- zap_huge_pmd | unmap_single_vma | unmap_vmas | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal | --49.35%-- __vunmap vfree kvm_free_physmem_slot kvm_free_physmem kvm_put_kvm kvm_vcpu_release __fput ____fput task_work_run do_exit do_group_exit get_signal_to_deliver do_signal do_notify_resume int_signal 0.07% :2561 [kernel.kallsyms] [k] free_pages_prepare | --- free_pages_prepare | |--99.55%-- __free_pages_ok | free_compound_page | __put_compound_page | put_compound_page | release_pages | free_pages_and_swap_cache | tlb_flush_mmu | tlb_finish_mmu | exit_mmap | mmput | exit_mm | do_exit | do_group_exit | get_signal_to_deliver | do_signal | do_notify_resume | int_signal --0.45%-- [...] 
0.07% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok | --- __zone_watermark_ok | |--56.52%-- zone_watermark_ok | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--59.67%-- 0x10100000002 | | | --40.33%-- 0x10100000006 | --43.48%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--58.50%-- 0x10100000002 | --41.50%-- 0x10100000006 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string | --- copy_user_generic_string | |--99.82%-- generic_file_buffered_write | __generic_file_aio_write | generic_file_aio_write | ext4_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | write | run_builtin | main | __libc_start_main --0.18%-- [...] 
0.05% qemu-kvm [kernel.kallsyms] [k] compact_checklock_irqsave | --- compact_checklock_irqsave | |--82.09%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--54.69%-- 0x10100000002 | | | --45.31%-- 0x10100000006 | --17.91%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--59.49%-- 0x10100000002 | --40.51%-- 0x10100000006 0.04% qemu-kvm [kernel.kallsyms] [k] call_function_interrupt | --- call_function_interrupt | |--91.95%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--72.81%-- 0x10100000006 | | | --27.19%-- 0x10100000002 | |--7.50%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | 
do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.56%-- 0x10100000006 | | | --44.44%-- 0x10100000002 --0.56%-- [...] 0.04% ksmd [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys | --- default_send_IPI_mask_sequence_phys | |--99.44%-- physflat_send_IPI_mask | native_send_call_func_ipi | smp_call_function_many | native_flush_tlb_others | flush_tlb_page | ptep_clear_flush | try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper | --0.56%-- native_send_call_func_ipi smp_call_function_many native_flush_tlb_others flush_tlb_page ptep_clear_flush try_to_merge_with_ksm_page ksm_scan_thread kthread kernel_thread_helper 0.03% qemu-kvm [kernel.kallsyms] [k] generic_smp_call_function_interrupt | --- generic_smp_call_function_interrupt | |--96.97%-- smp_call_function_interrupt | call_function_interrupt | | | |--97.39%-- compact_checklock_irqsave | | isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--78.65%-- 0x10100000006 | | | | | --21.35%-- 0x10100000002 | | | |--2.43%-- compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | 
__get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--57.14%-- 0x10100000002 | | | | | --42.86%-- 0x10100000006 | --0.19%-- [...] | --3.03%-- call_function_interrupt | |--77.79%-- compact_checklock_irqsave | isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--71.42%-- 0x10100000006 | | | --28.58%-- 0x10100000002 | --22.21%-- compact_zone compact_zone_order try_to_compact_pages __alloc_pages_direct_compact __alloc_pages_nodemask alloc_pages_vma do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
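[ Editorial note: every call chain in the report above enters compaction
through do_huge_pmd_anonymous_page, i.e. transparent hugepage (THP)
allocation while faulting in guest memory. As background, the THP policy
that drives this work can be inspected through sysfs on kernels of this
era; the paths below are standard, but the output is host-specific, so
this is only a diagnostic sketch, not part of the original report. ]

```shell
# Which THP allocation modes are enabled (brackets mark the active one)
cat /sys/kernel/mm/transparent_hugepage/enabled

# Whether page faults may stall in direct compaction to build huge pages
cat /sys/kernel/mm/transparent_hugepage/defrag

# THP fault activity since boot (successful allocations vs. fallbacks)
grep thp_fault /proc/vmstat
```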
* Re: Windows VM slow boot
  2012-09-12 10:56 ` Windows VM slow boot Richard Davies
@ 2012-09-12 12:25 ` Mel Gorman
  2012-09-12 16:46   ` Richard Davies
  0 siblings, 1 reply; 15+ messages in thread

From: Mel Gorman @ 2012-09-12 12:25 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Wed, Sep 12, 2012 at 11:56:59AM +0100, Richard Davies wrote:
> [ adding linux-mm - previously at http://marc.info/?t=134511509400003 ]
>
> Hi Rik,
>

I'm not Rik but hi anyway.

> Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
> retest with these.
>

Ok. 3.6.0-rc5 contains [c67fe375: mm: compaction: Abort async compaction
if locks are contended or taking too long], which should have mitigated
some of the lock contention problem, but not all of it, as we'll see later.

> The typical symptom now appears to be that the Windows VMs boot reasonably
> fast,

I see that this is an old-ish bug but I did not read the full history.
Is it now booting faster than 3.5.0 was? I'm asking because I'm
interested to see if commit c67fe375 helped your particular case.

> but then there is high CPU use and load for many minutes afterwards -
> the high CPU use is both for the qemu-kvm processes themselves and also
> for % sys.
>

Ok, I cannot comment on the userspace portion of things, but the kernel
portion still indicates that a high percentage of time is spent on what
appears to be lock contention.

> I attach a perf report which seems to show that the high CPU use is in
> the memory manager.
>

A follow-on from commit c67fe375 was the following patch (author cc'd),
which addresses lock contention in isolate_migratepages_range, where your
perf report indicates that we're spending 95% of the time. Would you be
willing to test it please?
---8<---
From: Shaohua Li <shli@kernel.org>
Subject: mm: compaction: check lock contention first before taking lock

isolate_migratepages_range will take zone->lru_lock first and check if the
lock is contended; if yes, it will release the lock. This isn't efficient.
If the lock is truly contended, a lock/unlock pair will increase the lock
contention. We'd better check whether the lock is contended first.
compact_trylock_irqsave perfectly meets the requirement.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/compaction.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
--- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
+++ a/mm/compaction.c
@@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
 
 	/* Time to isolate some pages for migration */
 	cond_resched();
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	locked = true;
+	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+	if (!locked)
+		return 0;
 
 	for (; low_pfn < end_pfn; low_pfn++) {
 		struct page *page;
* Re: Windows VM slow boot
  2012-09-12 12:25 ` Mel Gorman
@ 2012-09-12 16:46 ` Richard Davies
  2012-09-13  9:50   ` Mel Gorman
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread

From: Richard Davies @ 2012-09-12 16:46 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

Hi Mel - thanks for replying to my underhand bcc!

Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.

Yes, I think 3.6.0-rc5 is already better than 3.5.x, but it can still be
improved, as discussed.

> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range, where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li <shli@kernel.org>
> Subject: mm: compaction: check lock contention first before taking lock
>
> isolate_migratepages_range will take zone->lru_lock first and check if the
> lock is contended; if yes, it will release the lock. This isn't efficient.
> If the lock is truly contended, a lock/unlock pair will increase the lock
> contention. We'd better check whether the lock is contended first.
> compact_trylock_irqsave perfectly meets the requirement.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/compaction.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
> --- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
>
>  	/* Time to isolate some pages for migration */
>  	cond_resched();
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	locked = true;
> +	locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> +	if (!locked)
> +		return 0;
>
>  	for (; low_pfn < end_pfn; low_pfn++) {
>  		struct page *page;

I have applied and tested again - perf results below.

isolate_migratepages_range is indeed much reduced. There is now a lot of
time in isolate_freepages_block, and still quite a lot of lock contention,
although in a different place.

# ========
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead  Command          Shared Object         Symbol
# ........  ...............  ....................  ..............................................
#
    43.95%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
            |
            --- isolate_freepages_block
               |
               |--99.99%-- compaction_alloc
               |           migrate_pages
               |           compact_zone
               |           compact_zone_order
               |           try_to_compact_pages
               |           __alloc_pages_direct_compact
               |           __alloc_pages_nodemask
               |           alloc_pages_vma
               |           do_huge_pmd_anonymous_page
               |           handle_mm_fault
               |           __get_user_pages
               |           get_user_page_nowait
               |           hva_to_pfn.isra.17
               |           __gfn_to_pfn
               |           gfn_to_pfn_async
               |           try_async_pf
               |           tdp_page_fault
               |           kvm_mmu_page_fault
               |           pf_interception
               |           handle_exit
               |           kvm_arch_vcpu_ioctl_run
               |           kvm_vcpu_ioctl
               |           do_vfs_ioctl
               |           sys_ioctl
               |           system_call_fastpath
               |           ioctl
               |           |
               |           |--95.17%-- 0x10100000006
               |           |
               |            --4.83%-- 0x10100000002
                --0.01%-- [...]

    15.98%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
            |
            --- _raw_spin_lock_irqsave
               |
               |--97.18%-- compact_checklock_irqsave
               |           |
               |           |--98.61%-- compaction_alloc
               |           |           migrate_pages
               |           |           compact_zone
               |           |           compact_zone_order
               |           |           try_to_compact_pages
               |           |           __alloc_pages_direct_compact
               |           |           __alloc_pages_nodemask
               |           |           alloc_pages_vma
               |           |           do_huge_pmd_anonymous_page
               |           |           handle_mm_fault
               |           |           __get_user_pages
               |           |           get_user_page_nowait
               |           |           hva_to_pfn.isra.17
               |           |           __gfn_to_pfn
               |           |           gfn_to_pfn_async
               |           |           try_async_pf
               |           |           tdp_page_fault
               |           |           kvm_mmu_page_fault
               |           |           pf_interception
               |           |           handle_exit
               |           |           kvm_arch_vcpu_ioctl_run
               |           |           kvm_vcpu_ioctl
               |           |           do_vfs_ioctl
               |           |           sys_ioctl
               |           |           system_call_fastpath
               |           |           ioctl
               |           |           |
               |           |           |--94.94%-- 0x10100000006
               |           |           |
               |           |            --5.06%-- 0x10100000002
               |           |
               |            --1.39%-- isolate_migratepages_range
               |                      compact_zone
               |                      compact_zone_order
               |                      try_to_compact_pages
               |                      __alloc_pages_direct_compact
               |                      __alloc_pages_nodemask
               |                      alloc_pages_vma
               |                      do_huge_pmd_anonymous_page
               |                      handle_mm_fault
               |                      __get_user_pages
               |                      get_user_page_nowait
               |                      hva_to_pfn.isra.17
               |                      __gfn_to_pfn
               |                      gfn_to_pfn_async
               |                      try_async_pf
               |                      tdp_page_fault
               |                      kvm_mmu_page_fault
               |                      pf_interception
               |                      handle_exit
               |                      kvm_arch_vcpu_ioctl_run
               |                      kvm_vcpu_ioctl
               |                      do_vfs_ioctl
               |                      sys_ioctl
               |                      system_call_fastpath
               |                      ioctl
               |                      |
               |                      |--95.04%-- 0x10100000006
               |                      |
               |                       --4.96%--
0x10100000002 | |--1.94%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.19%-- 0x10100000006 | | | --4.81%-- 0x10100000002 --0.88%-- [...] 5.73% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.79%-- memcmp_pages | | | |--81.64%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --18.36%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.21%-- [...] 5.52% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.51%-- cpu_idle | | | |--86.19%-- start_secondary | | | --13.81%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.49%-- [...] 2.90% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.70%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.09%-- 0x10100000006 | | | --3.91%-- 0x10100000002 --0.30%-- [...] 
1.86% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.15%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.03%-- 0x10100000006 | | | --3.97%-- 0x10100000002 | --0.85%-- __alloc_pages_nodemask | |--78.22%-- alloc_pages_vma | handle_pte_fault | | | |--99.76%-- handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--91.60%-- 0x10100000006 | | | | | --8.40%-- 0x10100000002 | --0.24%-- [...] | --21.78%-- alloc_pages_current pte_alloc_one | |--97.40%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--93.12%-- 0x10100000006 | | | --6.88%-- 0x10100000002 | --2.60%-- __pte_alloc do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl 0x10100000006 1.83% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group | --- get_pageblock_flags_group | |--51.38%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | 
__alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.32%-- 0x10100000006 | | | --4.68%-- 0x10100000002 | |--43.05%-- suitable_migration_target | compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--95.52%-- 0x10100000006 | | | --4.48%-- 0x10100000002 | |--3.62%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.78%-- 0x10100000006 | | | --3.22%-- 0x10100000002 | |--1.20%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | 
handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.33%-- 0x10100000006 | | | --3.67%-- 0x10100000002 | |--0.61%-- free_hot_cold_page | | | |--77.99%-- free_hot_cold_page_list | | | | | |--95.93%-- release_pages | | | pagevec_lru_move_fn | | | __pagevec_lru_add | | | | | | | |--98.44%-- __lru_cache_add | | | | lru_cache_add_lru | | | | putback_lru_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--96.77%-- 0x10100000006 | | | | | | | | | --3.23%-- 0x10100000002 | | | | | | | --1.56%-- lru_add_drain_cpu | | | lru_add_drain | | | migrate_prep_local | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --4.07%-- shrink_page_list | | shrink_inactive_list | | shrink_lruvec | | try_to_free_pages | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | 
handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | |--19.40%-- __free_pages | | | | | |--85.71%-- release_freepages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--90.47%-- 0x10100000006 | | | | | | | --9.53%-- 0x10100000002 | | | | | |--10.21%-- do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --4.08%-- __free_slab | | discard_slab | | __slab_free | | kmem_cache_free | | free_buffer_head | | try_to_free_buffers | | jbd2_journal_try_to_free_buffers | | bdev_try_to_free_page | | blkdev_releasepage | | try_to_release_page | | move_to_new_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | 
get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --2.61%-- __put_single_page | put_page | | | |--91.27%-- putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --8.73%-- skb_free_head.part.34 | skb_release_data | __kfree_skb | tcp_recvmsg | inet_recvmsg | sock_recvmsg | sys_recvfrom | system_call_fastpath | recv | 0x0 --0.14%-- [...] 1.54% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.52%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--94.70%-- 0x10100000006 | | | --5.30%-- 0x10100000002 --0.48%-- [...] 1.30% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.45%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--96.06%-- 0x10100000006 | | | --3.94%-- 0x10100000002 | --0.55%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--97.59%-- 0x10100000006 | --2.41%-- 0x10100000002 1.00% qemu-kvm qemu-kvm [.] 
0x0000000000254bc2 | |--1.63%-- 0x4eec20 | | | |--47.60%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--26.98%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.63%-- 0x4eec6e | | | |--52.41%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--38.99%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --8.60%-- 0x309c280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.44%-- 0x5b4cb4 | 0x0 | | | --100.00%-- 0x822ee8fff96873e9 | |--1.32%-- 0x503457 | 0x0 | |--1.30%-- 0x65a186 | 0x0 | |--1.22%-- 0x541422 | 0x0 | |--1.08%-- 0x568f04 | | | |--93.81%-- 0x0 | | | |--6.01%-- 0x10100000006 | --0.19%-- [...] | |--1.06%-- 0x56a08e | | | |--55.97%-- 0x2fa1410 | | 0x0 | | | |--24.12%-- 0x2179410 | | 0x0 | | | --19.92%-- 0x15ba410 | 0x0 | |--1.05%-- 0x4eeeac | | | |--66.23%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--19.06%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --14.71%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.01%-- 0x6578d7 | | | --100.00%-- 0x0 | |--0.96%-- 0x52fb44 | | | |--91.88%-- 0x0 | | | --8.12%-- 0x10100000006 | |--0.95%-- 0x65a102 | |--0.94%-- 0x541aac | 0x0 | |--0.93%-- 0x525261 | 0x0 | | | --100.00%-- 0x822ee8fff96873e9 | |--0.89%-- 0x540e24 | |--0.88%-- 0x477a32 | 0x0 | |--0.87%-- 0x4eee03 | | | |--47.23%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.15%-- 0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --20.62%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.84%-- 0x530421 | | | --100.00%-- 0x0 | |--0.83%-- 0x4eeb52 | |--0.82%-- 0x40a6a9 | |--0.79%-- 0x672601 | 0x1 | |--0.78%-- 0x564e00 | | | --100.00%-- 0x0 | |--0.78%-- 0x568e38 | | | |--95.83%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--2.15%-- 0x10100000006 | | | --2.02%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.74%-- 0x56e704 | | | |--47.84%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--38.61%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--10.72%-- 
0x2274280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --2.83%-- 0x10100000006 | |--0.73%-- 0x5308c3 | |--0.72%-- 0x654b22 | 0x0 | |--0.71%-- 0x530094 | |--0.71%-- 0x564e04 | | | |--87.21%-- 0x0 | | | |--12.59%-- 0x46b47b | | 0xdffebc0000a88169 | --0.20%-- [...] | |--0.71%-- 0x568e5f | | | |--98.58%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --1.42%-- 0x16b5280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.70%-- 0x4ef092 | |--0.70%-- 0x52fac2 | | | |--99.12%-- 0x0 | | | --0.88%-- 0x10100000006 | |--0.68%-- 0x541ac1 | |--0.66%-- 0x4eec22 | | | |--44.90%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--30.11%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --25.00%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.65%-- 0x5afab4 | | | |--48.10%-- 0x2179410 | | 0x0 | | | |--41.94%-- 0x15ba410 | | 0x0 | | | |--5.05%-- 0x0 | | | | | |--39.43%-- 0x3099550 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | |--35.76%-- 0x23c0e90 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | --24.81%-- 0x16b2130 | | 0x5699c0 | | 0x24448948004b4154 | | | |--4.00%-- 0x2fa1410 | | 0x0 | | | --0.92%-- 0x6 | |--0.63%-- 0x65a3f6 | 0x1 | |--0.63%-- 0x659d12 | 0x0 | |--0.62%-- 0x530764 | 0x0 | |--0.62%-- 0x46e803 | 0x46b47b | | | |--72.15%-- 0xdffebc0000a88169 | | | |--16.88%-- 0xdffebec000a08169 | | | --10.97%-- 0xdffeb1d000a88169 | |--0.61%-- 0x4eeba0 | | | |--45.41%-- 0x309c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--36.19%-- 0x16b5280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --18.40%-- 0x2274280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.60%-- 0x659d61 | |--0.60%-- 0x4ff496 | |--0.59%-- 0x5030db | |--0.58%-- 0x477822 | -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Windows VM slow boot 2012-09-12 16:46 ` Richard Davies @ 2012-09-13 9:50 ` Mel Gorman 2012-09-13 19:47 ` [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages" Rik van Riel 2012-09-13 19:48 ` [PATCH 2/2] make the compaction "skip ahead" logic robust Rik van Riel 2 siblings, 0 replies; 15+ messages in thread From: Mel Gorman @ 2012-09-13 9:50 UTC (permalink / raw) To: Richard Davies Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm On Wed, Sep 12, 2012 at 05:46:15PM +0100, Richard Davies wrote: > Hi Mel - thanks for replying to my underhand bcc! > > Mel Gorman wrote: > > I see that this is an old-ish bug but I did not read the full history. > > Is it now booting faster than 3.5.0 was? I'm asking because I'm > > interested to see if commit c67fe375 helped your particular case. > > Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be > improved, as discussed. > What are the boot times for each kernel? > <PATCH SNIPPED> > > I have applied and tested again - perf results below. > > isolate_migratepages_range is indeed much reduced. > > There is now a lot of time in isolate_freepages_block and still quite a lot > of lock contention, although in a different place. > This on top please. ---8<--- From: Shaohua Li <shli@fusionio.com> compaction: abort compaction loop if lock is contended or run too long isolate_migratepages_range() might isolate no pages, for example, when zone->lru_lock is contended and compaction is async. In this case, we should abort compaction; otherwise, compact_zone will run a useless loop and make zone->lru_lock even more contended. V2: only abort the compaction if lock is contended or run too long Rearranged the code by Andrea Arcangeli.
[minchan@kernel.org: Putback pages isolated for migration if aborting] [akpm@linux-foundation.org: Fixup one contended usage site] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/compaction.c | 17 ++++++++++++----- mm/internal.h | 2 +- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 7fcd3a5..a8de20d 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags, /* async aborts if taking too long or contended */ if (!cc->sync) { - if (cc->contended) - *cc->contended = true; + cc->contended = true; return false; } @@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, /* Perform the isolation */ low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); - if (!low_pfn) + if (!low_pfn || cc->contended) return ISOLATE_ABORT; cc->migrate_pfn = low_pfn; @@ -787,6 +786,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc) switch (isolate_migratepages(zone, cc)) { case ISOLATE_ABORT: ret = COMPACT_PARTIAL; + putback_lru_pages(&cc->migratepages); + cc->nr_migratepages = 0; goto out; case ISOLATE_NONE: continue; @@ -831,6 +832,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, bool *contended) { + unsigned long ret; struct compact_control cc = { .nr_freepages = 0, .nr_migratepages = 0, @@ -838,12 +840,17 @@ static unsigned long compact_zone_order(struct zone *zone, .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .contended = contended, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - return compact_zone(zone, &cc); + ret = compact_zone(zone, &cc); + + VM_BUG_ON(!list_empty(&cc.freepages)); + VM_BUG_ON(!list_empty(&cc.migratepages)); + + *contended = cc.contended; + return 
ret; } int sysctl_extfrag_threshold = 500; diff --git a/mm/internal.h b/mm/internal.h index b8c91b3..4bd7c0e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -130,7 +130,7 @@ struct compact_control { int order; /* order a direct compactor needs */ int migratetype; /* MOVABLE, RECLAIMABLE etc */ struct zone *zone; - bool *contended; /* True if a lock was contended */ + bool contended; /* True if a lock was contended */ }; unsigned long -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages" 2012-09-12 16:46 ` Richard Davies 2012-09-13 9:50 ` Mel Gorman @ 2012-09-13 19:47 ` Rik van Riel 2012-09-13 19:48 ` [PATCH 2/2] make the compaction "skip ahead" logic robust Rik van Riel 2 siblings, 0 replies; 15+ messages in thread From: Rik van Riel @ 2012-09-13 19:47 UTC (permalink / raw) To: Richard Davies Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm On Wed, 12 Sep 2012 17:46:15 +0100 Richard Davies <richard@arachsys.com> wrote: > Mel Gorman wrote: > > I see that this is an old-ish bug but I did not read the full history. > > Is it now booting faster than 3.5.0 was? I'm asking because I'm > > interested to see if commit c67fe375 helped your particular case. > > Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be > improved, as discussed. Re-reading Mel's commit de74f1cc3b1e9730d9b58580cd11361d30cd182d, I believe it re-introduces the quadratic behaviour that the code was suffering from before, by not moving zone->compact_cached_free_pfn down when no more free pfns are found in a page block. This mail reverts that changeset, the next introduces what I hope to be the proper fix. Richard, would you be willing to give these patches a try, since your system seems to reproduce this bug easily? ---8<--- Revert "mm: have order > 0 compaction start near a pageblock with free pages" This reverts commit de74f1cc3b1e9730d9b58580cd11361d30cd182d. Mel found a real issue with my "skip ahead" logic in the compaction code, but unfortunately his approach appears to have re-introduced quadratic behaviour in that the value of zone->compact_cached_free_pfn is never advanced until the compaction run wraps around the start of the zone. This merely moved the starting point for the quadratic behaviour further into the zone, but the behaviour has still been observed. It looks like another fix is required. 
Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Richard Davies <richard@daviesmail.org> diff --git a/mm/compaction.c b/mm/compaction.c index 7fcd3a5..771775d 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -431,20 +431,6 @@ static bool suitable_migration_target(struct page *page) } /* - * Returns the start pfn of the last page block in a zone. This is the starting - * point for full compaction of a zone. Compaction searches for free pages from - * the end of each zone, while isolate_freepages_block scans forward inside each - * page block. - */ -static unsigned long start_free_pfn(struct zone *zone) -{ - unsigned long free_pfn; - free_pfn = zone->zone_start_pfn + zone->spanned_pages; - free_pfn &= ~(pageblock_nr_pages-1); - return free_pfn; -} - -/* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. */ @@ -483,6 +469,17 @@ static void isolate_freepages(struct zone *zone, pfn -= pageblock_nr_pages) { unsigned long isolated; + /* + * Skip ahead if another thread is compacting in the area + * simultaneously. If we wrapped around, we can only skip + * ahead if zone->compact_cached_free_pfn also wrapped to + * above our starting point. + */ + if (cc->order > 0 && (!cc->wrapped || + zone->compact_cached_free_pfn > + cc->start_free_pfn)) + pfn = min(pfn, zone->compact_cached_free_pfn); + if (!pfn_valid(pfn)) continue; @@ -533,15 +530,7 @@ static void isolate_freepages(struct zone *zone, */ if (isolated) { high_pfn = max(high_pfn, pfn); - - /* - * If the free scanner has wrapped, update - * compact_cached_free_pfn to point to the highest - * pageblock with free pages. 
This reduces excessive - * scanning of full pageblocks near the end of the - * zone - */ - if (cc->order > 0 && cc->wrapped) + if (cc->order > 0) zone->compact_cached_free_pfn = high_pfn; } } @@ -551,11 +540,6 @@ static void isolate_freepages(struct zone *zone, cc->free_pfn = high_pfn; cc->nr_freepages = nr_freepages; - - /* If compact_cached_free_pfn is reset then set it now */ - if (cc->order > 0 && !cc->wrapped && - zone->compact_cached_free_pfn == start_free_pfn(zone)) - zone->compact_cached_free_pfn = high_pfn; } /* @@ -642,6 +626,20 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, return ISOLATE_SUCCESS; } +/* + * Returns the start pfn of the last page block in a zone. This is the starting + * point for full compaction of a zone. Compaction searches for free pages from + * the end of each zone, while isolate_freepages_block scans forward inside each + * page block. + */ +static unsigned long start_free_pfn(struct zone *zone) +{ + unsigned long free_pfn; + free_pfn = zone->zone_start_pfn + zone->spanned_pages; + free_pfn &= ~(pageblock_nr_pages-1); + return free_pfn; +} + static int compact_finished(struct zone *zone, struct compact_control *cc) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] make the compaction "skip ahead" logic robust 2012-09-12 16:46 ` Richard Davies 2012-09-13 9:50 ` Mel Gorman 2012-09-13 19:47 ` [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages" Rik van Riel @ 2012-09-13 19:48 ` Rik van Riel 2012-09-13 19:54 ` [PATCH -v2 " Rik van Riel 2 siblings, 1 reply; 15+ messages in thread From: Rik van Riel @ 2012-09-13 19:48 UTC (permalink / raw) To: Richard Davies Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Make the "skip ahead" logic in compaction resistant to compaction wrapping around to the end of the zone. This can lead to less efficient compaction when one thread has wrapped around to the end of the zone, and another simultaneous compactor has not done so yet. However, it should ensure that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Richard Davies <richard@daviesmail.org> diff --git a/mm/compaction.c b/mm/compaction.c index 771775d..0656759 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page) } /* + * We scan the zone in a circular fashion, starting at + * zone->compact_cached_free_pfn. Be careful not to skip if + * one compacting thread has just wrapped back to the end of the + * zone, but another thread has not. + */ +static bool compaction_may_skip(struct zone *zone, + struct compact_control *cc) +{ + if (!cc->wrapped && zone->compact_free_pfn < cc->start_pfn) + return true; + + if (cc->wrapped && zone_compact_free_pfn > cc->start_pfn) + return true; + + return false; +} + +/* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. */ @@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone, /* * Skip ahead if another thread is compacting in the area - * simultaneously. 
If we wrapped around, we can only skip - * ahead if zone->compact_cached_free_pfn also wrapped to - * above our starting point. + * simultaneously, and has finished with this page block. */ - if (cc->order > 0 && (!cc->wrapped || - zone->compact_cached_free_pfn > - cc->start_free_pfn)) + if (cc->order > 0 && compaction_may_skip(zone, cc)) pfn = min(pfn, zone->compact_cached_free_pfn); if (!pfn_valid(pfn)) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-13 19:48 ` [PATCH 2/2] make the compaction "skip ahead" logic robust Rik van Riel @ 2012-09-13 19:54 ` Rik van Riel 2012-09-15 15:55 ` Richard Davies 0 siblings, 1 reply; 15+ messages in thread From: Rik van Riel @ 2012-09-13 19:54 UTC (permalink / raw) To: Richard Davies Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Argh. And of course I send out the version from _before_ the compile test, instead of the one after! I am not used to caffeine any more and have had way too much tea... ---8<--- Make the "skip ahead" logic in compaction resistant to compaction wrapping around to the end of the zone. This can lead to less efficient compaction when one thread has wrapped around to the end of the zone, and another simultaneous compactor has not done so yet. However, it should ensure that we do not suffer quadratic behaviour any more. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Richard Davies <richard@daviesmail.org> diff --git a/mm/compaction.c b/mm/compaction.c index 771775d..0656759 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page) } /* + * We scan the zone in a circular fashion, starting at + * zone->compact_cached_free_pfn. Be careful not to skip if + * one compacting thread has just wrapped back to the end of the + * zone, but another thread has not. + */ +static bool compaction_may_skip(struct zone *zone, + struct compact_control *cc) +{ + if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn) + return true; + + if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn) + return true; + + return false; +} + +/* * Based on information in the current compact_control, find blocks * suitable for isolating free pages from and then isolate them. 
*/ @@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone, /* * Skip ahead if another thread is compacting in the area - * simultaneously. If we wrapped around, we can only skip - * ahead if zone->compact_cached_free_pfn also wrapped to - * above our starting point. + * simultaneously, and has finished with this page block. */ - if (cc->order > 0 && (!cc->wrapped || - zone->compact_cached_free_pfn > - cc->start_free_pfn)) + if (cc->order > 0 && compaction_may_skip(zone, cc)) pfn = min(pfn, zone->compact_cached_free_pfn); if (!pfn_valid(pfn)) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-13 19:54 ` [PATCH -v2 " Rik van Riel @ 2012-09-15 15:55 ` Richard Davies 2012-09-16 19:12 ` Richard Davies ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Richard Davies @ 2012-09-15 15:55 UTC (permalink / raw) To: Rik van Riel Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Hi Rik, Mel and Shaohua, Thank you for your latest patches. I attach my latest perf report for a slow boot with all of these applied. Mel asked for timings of the slow boots. It's very hard to give anything useful here! A normal boot would be a minute or so, and many are like that, but the slowest that I have seen (on 3.5.x) was several hours. Basically, I just test many times until I get one which is noticeably slower than normal and then run perf record on that one. The latest perf report for a slow boot is below. For the fast boots, most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow one there is a lot of lock contention above that. Thanks, Richard. # ======== # captured on: Sat Sep 15 15:40:54 2012 # os release : 3.6.0-rc5-elastic+ # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 3M of event 'cycles' # Event count (approx.): 1457256240581 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. 
# 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--95.07%-- compact_checklock_irqsave | | | |--70.03%-- isolate_migratepages_range | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--92.76%-- 0x10100000006 | | | | | --7.24%-- 0x10100000002 | | | --29.97%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.69%-- 0x10100000006 | | | --9.31%-- 0x10100000002 | |--4.53%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--92.22%-- 0x10100000006 | | | --7.78%-- 0x10100000002 --0.40%-- [...] 
13.14% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.38%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--51.86%-- 0x10100000006 | | | |--48.14%-- 0x10100000002 | --0.01%-- [...] | --0.62%-- __alloc_pages_nodemask | |--76.27%-- alloc_pages_vma | handle_pte_fault | | | |--99.57%-- handle_mm_fault | | | | | |--99.65%-- __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--91.77%-- 0x10100000006 | | | | | | | --8.23%-- 0x10100000002 | | --0.35%-- [...] | --0.43%-- [...] 
| --23.73%-- alloc_pages_current | |--99.20%-- pte_alloc_one | | | |--98.68%-- do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--58.61%-- 0x10100000002 | | | | | --41.39%-- 0x10100000006 | | | --1.32%-- __pte_alloc | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | 0x10100000006 | |--0.69%-- __vmalloc_node_range | __vmalloc_node | vzalloc | __kvm_set_memory_region | kvm_set_memory_region | kvm_vm_ioctl_set_memory_region | kvm_vm_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl --0.12%-- [...] 6.31% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.98%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.13%-- 0x10100000006 | | | --8.87%-- 0x10100000002 --0.02%-- [...] 
1.68% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.65%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--88.78%-- 0x10100000006 | | | --11.22%-- 0x10100000002 --0.35%-- [...] 1.24% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.78%-- memcmp_pages | | | |--77.17%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --22.83%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.22%-- [...] 1.09% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.44%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--82.15%-- 0x10100000006 | | | |--17.85%-- 0x10100000002 | --0.00%-- [...] | --0.56%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--75.21%-- 0x10100000006 | --24.79%-- 0x10100000002 1.09% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.74%-- cpu_idle | | | |--76.31%-- start_secondary | | | --23.69%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.26%-- [...] 1.08% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.97%-- native_flush_tlb_others | | | |--99.78%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.22%-- [...] --0.03%-- [...] 0.77% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.36%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.08%-- 0x10100000006 | | | |--9.92%-- 0x10100000002 | --0.00%-- [...] 
| --0.64%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--87.37%-- 0x10100000006 | --12.63%-- 0x10100000002 0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone | |--99.98%-- compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--91.29%-- 0x10100000006 | | | --8.71%-- 0x10100000002 --0.02%-- [...] 0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--39.71%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.52%-- 0x10100000006 | | | --9.48%-- 0x10100000002 | |--15.63%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--90.96%-- 0x10100000006 | | | --9.04%-- 0x10100000002 | |--6.55%-- tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--78.78%-- 0x10100000006 | | | --21.22%-- 0x10100000002 | |--4.87%-- free_pcppages_bulk | | | |--51.10%-- free_hot_cold_page | | | | | |--83.60%-- free_hot_cold_page_list | | | | | | | |--62.17%-- release_pages | | | | pagevec_lru_move_fn | | | | __pagevec_lru_add | | | | | | | | | |--99.22%-- __lru_cache_add | | | | | lru_cache_add_lru | | | | | putback_lru_page | | | | | | | | | | | |--99.61%-- migrate_pages | | | | | | compact_zone | | | | | | 
compact_zone_order | | | | | | try_to_compact_pages | | | | | | __alloc_pages_direct_compact | | | | | | __alloc_pages_nodemask | | | | | | alloc_pages_vma | | | | | | do_huge_pmd_anonymous_page | | | | | | handle_mm_fault | | | | | | __get_user_pages | | | | | | get_user_page_nowait | | | | | | hva_to_pfn.isra.17 | | | | | | __gfn_to_pfn | | | | | | gfn_to_pfn_async | | | | | | try_async_pf | | | | | | tdp_page_fault | | | | | | kvm_mmu_page_fault | | | | | | pf_interception | | | | | | handle_exit | | | | | | kvm_arch_vcpu_ioctl_run | | | | | | kvm_vcpu_ioctl | | | | | | do_vfs_ioctl | | | | | | sys_ioctl | | | | | | system_call_fastpath | | | | | | ioctl | | | | | | | | | | | | | |--88.98%-- 0x10100000006 | | | | | | | | | | | | | --11.02%-- 0x10100000002 | | | | | --0.39%-- [...] | | | | | | | | | --0.78%-- lru_add_drain_cpu | | | | lru_add_drain | | | | migrate_prep_local | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | --37.83%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | 
kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--86.38%-- 0x10100000006 | | | | | | | --13.62%-- 0x10100000002 | | | | | |--12.96%-- __free_pages | | | | | | | |--98.43%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--90.49%-- 0x10100000006 | | | | | | | | | --9.51%-- 0x10100000002 | | | | | | | --1.57%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | bdev_try_to_free_page | | | blkdev_releasepage | | | try_to_release_page | | | move_to_new_page | | | migrate_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --3.44%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order 
| | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--88.25%-- 0x10100000006 | | | | | --11.75%-- 0x10100000002 | | | --48.90%-- drain_pages | | | |--88.65%-- drain_local_pages | | | | | |--96.33%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--23.46%-- __remove_mapping | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.81%-- 0x10100000006 | | | | | | | | | --6.19%-- 0x10100000002 | | | | | | | |--19.93%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--93.65%-- 0x10100000006 | | | | | | | | | --6.35%-- 0x10100000002 | | | | | | | |--14.19%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | 
get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--89.88%-- 0x10100000006 | | | | | | | | | --10.12%-- 0x10100000002 | | | | | | | |--8.57%-- isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.14%-- 0x10100000006 | | | | | | | | | --7.86%-- 0x10100000002 | | | | | | | |--5.05%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--92.53%-- 0x10100000006 | | | | | | | | | --7.47%-- 0x10100000002 | | | | | | | |--4.49%-- shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | 
hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--94.61%-- 0x10100000006 | | | | | | | | | --5.39%-- 0x10100000002 | | | | | | | |--2.80%-- free_hot_cold_page_list | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--91.24%-- 0x10100000006 | | | | | | | | | --8.76%-- 0x10100000002 | | | | | | | |--1.96%-- buffer_migrate_page | | | | move_to_new_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--63.14%-- 0x10100000006 | | | | | | | | | --36.86%-- 0x10100000002 | | | | | | | |--1.62%-- try_to_free_buffers | | | | 
jbd2_journal_try_to_free_buffers | | | | ext4_releasepage | | | | try_to_release_page | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.49%-- compact_checklock_irqsave | | | | isolate_migratepages_range | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.46%-- __mutex_lock_slowpath | | | | mutex_lock | | | | page_lock_anon_vma | | | | page_referenced | | | | shrink_active_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | 
| | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | 0x10100000006 | | | | | | | |--1.41%-- native_flush_tlb_others | | | | flush_tlb_page | | | | | | | | | |--67.10%-- ptep_clear_flush | | | | | try_to_unmap_one | | | | | try_to_unmap_anon | | | | | try_to_unmap | | | | | migrate_pages | | | | | compact_zone | | | | | compact_zone_order | | | | | try_to_compact_pages | | | | | __alloc_pages_direct_compact | | | | | __alloc_pages_nodemask -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-15 15:55 ` Richard Davies @ 2012-09-16 19:12 ` Richard Davies 2012-09-17 12:26 ` Mel Gorman 2012-09-17 13:50 ` Rik van Riel 2 siblings, 0 replies; 15+ messages in thread From: Richard Davies @ 2012-09-16 19:12 UTC (permalink / raw) To: Rik van Riel Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Richard Davies wrote: > Thank you for your latest patches. I attach my latest perf report for a slow > boot with all of these applied. For the avoidance of doubt, here is the combined diff versus 3.6.0-rc5 which I tested: diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 38b42e7..090405d 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1383,10 +1383,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, qgroup_dirty(fs_info, srcgroup); } - if (!inherit) { - ret = -EINVAL; + if (!inherit) goto unlock; - } i_qgroups = (u64 *)(inherit + 1); for (i = 0; i < inherit->num_qgroups; ++i) { diff --git a/mm/compaction.c b/mm/compaction.c index 7fcd3a5..92bae88 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags, /* async aborts if taking too long or contended */ if (!cc->sync) { - if (cc->contended) - *cc->contended = true; + cc->contended = true; return false; } @@ -296,8 +295,9 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, /* Time to isolate some pages for migration */ cond_resched(); - spin_lock_irqsave(&zone->lru_lock, flags); - locked = true; + locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc); + if (!locked) + return 0; for (; low_pfn < end_pfn; low_pfn++) { struct page *page; @@ -431,17 +431,21 @@ static bool suitable_migration_target(struct page *page) } /* - * Returns the start pfn of the last page block in a zone. This is the starting - * point for full compaction of a zone. 
Compaction searches for free pages from - * the end of each zone, while isolate_freepages_block scans forward inside each - * page block. + * We scan the zone in a circular fashion, starting at + * zone->compact_cached_free_pfn. Be careful not to skip if + * one compacting thread has just wrapped back to the end of the + * zone, but another thread has not. */ -static unsigned long start_free_pfn(struct zone *zone) +static bool compaction_may_skip(struct zone *zone, + struct compact_control *cc) { - unsigned long free_pfn; - free_pfn = zone->zone_start_pfn + zone->spanned_pages; - free_pfn &= ~(pageblock_nr_pages-1); - return free_pfn; + if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn) + return true; + + if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn) + return true; + + return false; } /* @@ -483,6 +487,13 @@ static void isolate_freepages(struct zone *zone, pfn -= pageblock_nr_pages) { unsigned long isolated; + /* + * Skip ahead if another thread is compacting in the area + * simultaneously, and has finished with this page block. + */ + if (cc->order > 0 && compaction_may_skip(zone, cc)) + pfn = min(pfn, zone->compact_cached_free_pfn); + if (!pfn_valid(pfn)) continue; @@ -533,15 +544,7 @@ static void isolate_freepages(struct zone *zone, */ if (isolated) { high_pfn = max(high_pfn, pfn); - - /* - * If the free scanner has wrapped, update - * compact_cached_free_pfn to point to the highest - * pageblock with free pages. 
This reduces excessive - * scanning of full pageblocks near the end of the - * zone - */ - if (cc->order > 0 && cc->wrapped) + if (cc->order > 0) zone->compact_cached_free_pfn = high_pfn; } } @@ -551,11 +554,6 @@ static void isolate_freepages(struct zone *zone, cc->free_pfn = high_pfn; cc->nr_freepages = nr_freepages; - - /* If compact_cached_free_pfn is reset then set it now */ - if (cc->order > 0 && !cc->wrapped && - zone->compact_cached_free_pfn == start_free_pfn(zone)) - zone->compact_cached_free_pfn = high_pfn; } /* @@ -634,7 +632,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, /* Perform the isolation */ low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn); - if (!low_pfn) + if (!low_pfn || cc->contended) return ISOLATE_ABORT; cc->migrate_pfn = low_pfn; @@ -642,6 +640,20 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone, return ISOLATE_SUCCESS; } +/* + * Returns the start pfn of the last page block in a zone. This is the starting + * point for full compaction of a zone. Compaction searches for free pages from + * the end of each zone, while isolate_freepages_block scans forward inside each + * page block. 
+ */ +static unsigned long start_free_pfn(struct zone *zone) +{ + unsigned long free_pfn; + free_pfn = zone->zone_start_pfn + zone->spanned_pages; + free_pfn &= ~(pageblock_nr_pages-1); + return free_pfn; +} + static int compact_finished(struct zone *zone, struct compact_control *cc) { @@ -787,6 +799,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc) switch (isolate_migratepages(zone, cc)) { case ISOLATE_ABORT: ret = COMPACT_PARTIAL; + putback_lru_pages(&cc->migratepages); + cc->nr_migratepages = 0; goto out; case ISOLATE_NONE: continue; @@ -831,6 +845,7 @@ static unsigned long compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, bool sync, bool *contended) { + unsigned long ret; struct compact_control cc = { .nr_freepages = 0, .nr_migratepages = 0, @@ -838,12 +853,17 @@ static unsigned long compact_zone_order(struct zone *zone, .migratetype = allocflags_to_migratetype(gfp_mask), .zone = zone, .sync = sync, - .contended = contended, }; INIT_LIST_HEAD(&cc.freepages); INIT_LIST_HEAD(&cc.migratepages); - return compact_zone(zone, &cc); + ret = compact_zone(zone, &cc); + + VM_BUG_ON(!list_empty(&cc.freepages)); + VM_BUG_ON(!list_empty(&cc.migratepages)); + + *contended = cc.contended; + return ret; } int sysctl_extfrag_threshold = 500; diff --git a/mm/internal.h b/mm/internal.h index b8c91b3..4bd7c0e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -130,7 +130,7 @@ struct compact_control { int order; /* order a direct compactor needs */ int migratetype; /* MOVABLE, RECLAIMABLE etc */ struct zone *zone; - bool *contended; /* True if a lock was contended */ + bool contended; /* True if a lock was contended */ }; unsigned long -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-15 15:55 ` Richard Davies 2012-09-16 19:12 ` Richard Davies @ 2012-09-17 12:26 ` Mel Gorman 2012-09-18 8:14 ` Richard Davies 2012-09-17 13:50 ` Rik van Riel 2 siblings, 1 reply; 15+ messages in thread From: Mel Gorman @ 2012-09-17 12:26 UTC (permalink / raw) To: Richard Davies Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm On Sat, Sep 15, 2012 at 04:55:24PM +0100, Richard Davies wrote: > Hi Rik, Mel and Shaohua, > > Thank you for your latest patches. I attach my latest perf report for a slow > boot with all of these applied. > Thanks for testing. > Mel asked for timings of the slow boots. It's very hard to give anything > useful here! A normal boot would be a minute or so, and many are like that, > but the slowest that I have seen (on 3.5.x) was several hours. Basically, I > just test many times until I get one which is noticeably slower than normal > and then run perf record on that one. > Ok. > The latest perf report for a slow boot is below. For the fast boots, most of > the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow > one there is a lot of lock contention above that. > > <SNIP> > 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave > | > --- _raw_spin_lock_irqsave > | > |--95.07%-- compact_checklock_irqsave > | | > | |--70.03%-- isolate_migratepages_range > <SNIP> > | --29.97%-- compaction_alloc > | > |--4.53%-- isolate_migratepages_range > <SNIP> This is going in the right direction, but the usage due to contention is still obviously stupidly high. Compaction features throughout the profile but staying focused on the lock contention for the moment. Can you try the following patch? So far I'm not having much luck reproducing this locally. 
---8<--- mm: compaction: Only release lru_lock every SWAP_CLUSTER_MAX pages if necessary Commit b2eef8c0 (mm: compaction: minimise the time IRQs are disabled while isolating pages for migration) releases the lru_lock every SWAP_CLUSTER_MAX pages that are scanned as it was found at the time that compaction could contend badly with page reclaim. This can lead to a situation where compaction contends heavily with itself as it releases and reacquires the LRU lock. This patch makes two changes to how the migrate scanner acquires the LRU lock. First, it only releases the LRU lock every SWAP_CLUSTER_MAX pages if the lock is contended. This reduces the number of times it unnecessarily disables and reenables IRQs. The second is that it defers acquiring the LRU lock for as long as possible. In cases where transparent hugepages are encountered the LRU lock will not be acquired at all. Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/compaction.c | 65 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 21 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 39342ee..1874f23 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -50,6 +50,11 @@ static inline bool migrate_async_suitable(int migratetype) return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE; } +static inline bool should_release_lock(spinlock_t *lock) +{ + return need_resched() || spin_is_contended(lock); +} + /* * Compaction requires the taking of some coarse locks that are potentially * very heavily contended. 
Check if the process needs to be scheduled or @@ -62,7 +67,7 @@ static inline bool migrate_async_suitable(int migratetype) static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags, bool locked, struct compact_control *cc) { - if (need_resched() || spin_is_contended(lock)) { + if (should_release_lock(lock)) { if (locked) { spin_unlock_irqrestore(lock, *flags); locked = false; @@ -275,7 +280,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, isolate_mode_t mode = 0; struct lruvec *lruvec; unsigned long flags; - bool locked; + bool locked = false; /* * Ensure that there are not too many pages isolated from the LRU @@ -295,24 +300,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, /* Time to isolate some pages for migration */ cond_resched(); - locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc); - if (!locked) - return 0; for (; low_pfn < end_pfn; low_pfn++) { struct page *page; /* give a chance to irqs before checking need_resched() */ - if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) { - spin_unlock_irqrestore(&zone->lru_lock, flags); - locked = false; + if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) { + if (should_release_lock(&zone->lru_lock)) { + spin_unlock_irqrestore(&zone->lru_lock, flags); + locked = false; + } } - /* Check if it is ok to still hold the lock */ - locked = compact_checklock_irqsave(&zone->lru_lock, &flags, - locked, cc); - if (!locked) - break; - /* * migrate_pfn does not necessarily start aligned to a * pageblock. 
Ensure that pfn_valid is called when moving @@ -352,21 +350,38 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, pageblock_nr = low_pfn >> pageblock_order; if (!cc->sync && last_pageblock_nr != pageblock_nr && !migrate_async_suitable(get_pageblock_migratetype(page))) { - low_pfn += pageblock_nr_pages; - low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; - last_pageblock_nr = pageblock_nr; - continue; + goto next_pageblock; } + /* Check may be lockless but that's ok as we recheck later */ if (!PageLRU(page)) continue; /* - * PageLRU is set, and lru_lock excludes isolation, - * splitting and collapsing (collapsing has already - * happened if PageLRU is set). + * PageLRU is set. lru_lock normally excludes isolation + * splitting and collapsing (collapsing has already happened + * if PageLRU is set) but the lock is not necessarily taken + * here and it is wasteful to take it just to check transhuge. + * Check transhuge without lock and skip if it's either a + * transhuge or hugetlbfs page. */ if (PageTransHuge(page)) { + if (!locked) + goto next_pageblock; + low_pfn += (1 << compound_order(page)) - 1; + continue; + } + + /* Check if it is ok to still hold the lock */ + locked = compact_checklock_irqsave(&zone->lru_lock, &flags, + locked, cc); + if (!locked) + break; + + /* Recheck PageLRU and PageTransHuge under lock */ + if (!PageLRU(page)) + continue; + if (PageTransHuge(page)) { low_pfn += (1 << compound_order(page)) - 1; continue; } @@ -393,6 +408,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, ++low_pfn; break; } + + continue; + +next_pageblock: + low_pfn += pageblock_nr_pages; + low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1; + last_pageblock_nr = pageblock_nr; + } acct_isolated(zone, locked, cc); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust 2012-09-17 12:26 ` Mel Gorman @ 2012-09-18 8:14 ` Richard Davies 2012-09-18 11:21 ` Mel Gorman 0 siblings, 1 reply; 15+ messages in thread From: Richard Davies @ 2012-09-18 8:14 UTC (permalink / raw) To: Mel Gorman Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm Hi Mel, Thanks for your latest patch; I attach a perf report below with this on top of all previous patches. There is still lock contention, though in a different place. Regarding Rik's question: > > Mel asked for timings of the slow boots. It's very hard to give anything > > useful here! A normal boot would be a minute or so, and many are like > > that, but the slowest that I have seen (on 3.5.x) was several hours. > > Basically, I just test many times until I get one which is noticeably > > slower than normal and then run perf record on that one. > > > > The latest perf report for a slow boot is below. For the fast boots, > > most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but > > for this slow one there is a lot of lock contention above that. > > How often do you run into slow boots, vs. fast ones? It is about 1/3rd slow boots, some of which are slower than others. I do about ten and send you the trace of the worst. Experimentally, copying large files (the VM image files) immediately before booting the VM seems to make a slow boot more likely. Thanks, Richard. 
# ======== # captured on: Mon Sep 17 20:09:33 2012 # os release : 3.6.0-rc5-elastic+ # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 4M of event 'cycles' # Event count (approx.): 1616311320818 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. # 59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--99.30%-- compact_checklock_irqsave | | | |--99.98%-- compaction_alloc | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--84.28%-- 0x10100000006 | | | | | --15.72%-- 0x10100000002 | --0.02%-- [...] 
| |--0.65%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.37%-- 0x10100000006 | | | --16.63%-- 0x10100000002 --0.05%-- [...] 12.27% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--82.90%-- 0x10100000006 | | | --17.10%-- 0x10100000002 --0.01%-- [...] 
7.90% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.19%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--64.93%-- 0x10100000006 | | | --35.07%-- 0x10100000002 | --0.81%-- __alloc_pages_nodemask | |--84.23%-- alloc_pages_vma | handle_pte_fault | | | |--99.62%-- handle_mm_fault | | | | | |--99.74%-- __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--76.24%-- 0x10100000006 | | | | | | | --23.76%-- 0x10100000002 | | --0.26%-- [...] | --0.38%-- [...] 
| --15.77%-- alloc_pages_current pte_alloc_one | |--97.49%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--57.31%-- 0x10100000006 | | | --42.69%-- 0x10100000002 | --2.51%-- __pte_alloc do_huge_pmd_anonymous_page handle_mm_fault __get_user_pages get_user_page_nowait hva_to_pfn.isra.17 __gfn_to_pfn gfn_to_pfn_async try_async_pf tdp_page_fault kvm_mmu_page_fault pf_interception handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--61.90%-- 0x10100000006 | --38.10%-- 0x10100000002 2.66% ksmd [kernel.kallsyms] [k] smp_call_function_many | --- smp_call_function_many | |--99.99%-- native_flush_tlb_others | | | |--99.79%-- flush_tlb_page | | ptep_clear_flush | | try_to_merge_with_ksm_page | | ksm_scan_thread | | kthread | | kernel_thread_helper | --0.21%-- [...] --0.01%-- [...] 1.62% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.58%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.42%-- 0x10100000006 | | | --22.58%-- 0x10100000002 --0.42%-- [...] 1.17% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.65%-- memcmp_pages | | | |--78.67%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --21.33%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.35%-- [...] 
1.16% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.47%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--74.69%-- 0x10100000006 | | | --25.31%-- 0x10100000002 | --0.53%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--72.19%-- 0x10100000006 | --27.81%-- 0x10100000002 1.09% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.73%-- cpu_idle | | | |--84.39%-- start_secondary | | | --15.61%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.27%-- [...] 0.85% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.40%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.92%-- 0x10100000006 | | | --23.08%-- 0x10100000002 | --0.60%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--75.02%-- 0x10100000006 | --24.98%-- 0x10100000002 0.60% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock | --- __srcu_read_lock | |--92.87%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.37%-- 0x10100000006 | | | --23.63%-- 0x10100000002 | |--6.18%-- kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--74.92%-- 0x10100000006 | | | --25.08%-- 0x10100000002 --0.95%-- [...] 
0.60% qemu-kvm [kernel.kallsyms] [k] __rcu_read_unlock | --- __rcu_read_unlock | |--79.70%-- get_pid_task | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--75.95%-- 0x10100000006 | | | --24.05%-- 0x10100000002 | |--11.44%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--75.32%-- 0x10100000006 | | | --24.68%-- 0x10100000002 | |--3.51%-- kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--76.56%-- 0x10100000006 | | | --23.44%-- 0x10100000002 | |--1.88%-- do_select | core_sys_select | sys_select | system_call_fastpath | __select | 0x0 | |--1.30%-- fget_light | | | |--71.87%-- do_select | | core_sys_select | | sys_select | | system_call_fastpath | | __select | | 0x0 | | | |--15.50%-- sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--50.94%-- 0x10100000002 | | | | | |--17.13%-- 0x2740310 | | | 0x0 | | | | | |--13.07%-- 0x225c310 | | | 0x0 | | | | | |--9.95%-- 0x2792310 | | | 0x0 | | | | | |--3.64%-- 0x75ed8548202c4b83 | | | | | |--1.87%-- 0x8800000 | | | 0x26433c0 | | | | | |--1.79%-- 0x10100000006 | | | | | |--0.95%-- 0x19800000 | | | 0x26953c0 | | | | | --0.67%-- 0x24bc8b4400000098 | | | |--7.32%-- sys_read | | system_call_fastpath | | read | | | | | --100.00%-- pthread_mutex_lock@plt | | | |--4.03%-- sys_write | | system_call_fastpath | | write | | | | | --100.00%-- 0x0 | | | |--0.69%-- sys_pread64 | | system_call_fastpath | | pread64 | | 0x269d260 | | 0x80 | | 0x480050b9e1058b48 | --0.59%-- [...] --2.18%-- [...] 
0.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock | --- _raw_spin_lock | |--50.00%-- yield_to | kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.93%-- 0x10100000006 | | | --22.07%-- 0x10100000002 | |--11.97%-- free_pcppages_bulk | | | |--67.09%-- free_hot_cold_page | | | | | |--87.14%-- free_hot_cold_page_list | | | | | | | |--62.82%-- shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--77.85%-- 0x10100000006 | | | | | | | | | --22.15%-- 0x10100000002 | | | | | | | --37.18%-- release_pages | | | pagevec_lru_move_fn | | | __pagevec_lru_add | | | | | | | |--99.76%-- __lru_cache_add | | | | lru_cache_add_lru | | | | putback_lru_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--80.37%-- 
0x10100000006 | | | | | | | | | --19.63%-- 0x10100000002 | | | --0.24%-- [...] | | | | | |--10.98%-- __free_pages | | | | | | | |--98.77%-- release_freepages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--80.81%-- 0x10100000006 | | | | | | | | | --19.19%-- 0x10100000002 | | | | | | | --1.23%-- __free_slab | | | discard_slab | | | unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | ext4_releasepage | | | try_to_release_page | | | shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--57.92%-- 0x10100000006 | | | | | | | --42.08%-- 0x10100000002 | | | | | --1.88%-- __put_single_page | | put_page | | putback_lru_page | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | 
do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | | | |--62.44%-- 0x10100000006 | | | | | --37.56%-- 0x10100000002 | | | --32.91%-- drain_pages | | | |--75.89%-- drain_local_pages | | | | | |--89.98%-- generic_smp_call_function_interrupt | | | smp_call_function_interrupt | | | call_function_interrupt | | | | | | | |--44.57%-- compaction_alloc | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--79.27%-- 0x10100000006 | | | | | | | | | --20.73%-- 0x10100000002 | | | | | | | |--16.92%-- kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--86.24%-- 0x10100000006 | | | | | | | | | --13.76%-- 0x10100000002 | | | | | | | |--5.39%-- do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | 
sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--75.62%-- 0x10100000006 | | | | | | | | | --24.38%-- 0x10100000002 | | | | | | | |--3.26%-- buffer_migrate_page | | | | move_to_new_page | | | | migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--85.62%-- 0x10100000006 | | | | | | | | | --14.38%-- 0x10100000002 | | | | | | | |--3.21%-- __remove_mapping | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--78.75%-- 0x10100000006 | | | | | | | | | --21.25%-- 0x10100000002 | | | | | | | |--3.01%-- free_hot_cold_page_list | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | 
| | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--84.48%-- 0x10100000006 | | | | | | | | | --15.52%-- 0x10100000002 | | | | | | | |--2.25%-- try_to_free_buffers | | | | jbd2_journal_try_to_free_buffers | | | | ext4_releasepage | | | | try_to_release_page | | | | shrink_page_list | | | | shrink_inactive_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--58.91%-- 0x10100000006 | | | | | | | | | --41.09%-- 0x10100000002 | | | | | | | |--2.07%-- compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--67.59%-- 0x10100000006 | | | | | | | | | --32.41%-- 0x10100000002 | | | | | | | |--1.80%-- native_flush_tlb_others | | | | | | | | | |--75.08%-- 
flush_tlb_page | | | | | | | | | | | |--82.69%-- ptep_clear_flush_young | | | | | | page_referenced_one | | | | | | page_referenced | | | | | | shrink_active_list | | | | | | shrink_lruvec | | | | | | try_to_free_pages | | | | | | __alloc_pages_nodemask | | | | | | alloc_pages_vma | | | | | | do_huge_pmd_anonymous_page | | | | | | handle_mm_fault | | | | | | __get_user_pages | | | | | | get_user_page_nowait | | | | | | hva_to_pfn.isra.17 | | | | | | __gfn_to_pfn | | | | | | gfn_to_pfn_async | | | | | | try_async_pf | | | | | | tdp_page_fault | | | | | | kvm_mmu_page_fault | | | | | | pf_interception | | | | | | handle_exit | | | | | | kvm_arch_vcpu_ioctl_run | | | | | | kvm_vcpu_ioctl | | | | | | do_vfs_ioctl | | | | | | sys_ioctl | | | | | | system_call_fastpath | | | | | | ioctl | | | | | | | | | | | | | |--78.99%-- 0x10100000006 | | | | | | | | | | | | | --21.01%-- 0x10100000002 | | | | | | | | | | | --17.31%-- ptep_clear_flush | | | | | try_to_unmap_one | | | | | try_to_unmap_anon | | | | | try_to_unmap | | | | | migrate_pages | | | | | compact_zone | | | | | compact_zone_order | | | | | try_to_compact_pages | | | | | __alloc_pages_direct_compact | | | | | __alloc_pages_nodemask | | | | | alloc_pages_vma | | | | | do_huge_pmd_anonymous_page | | | | | handle_mm_fault | | | | | __get_user_pages | | | | | get_user_page_nowait | | | | | hva_to_pfn.isra.17 | | | | | __gfn_to_pfn | | | | | gfn_to_pfn_async | | | | | try_async_pf | | | | | tdp_page_fault | | | | | kvm_mmu_page_fault | | | | | pf_interception | | | | | handle_exit | | | | | kvm_arch_vcpu_ioctl_run | | | | | kvm_vcpu_ioctl | | | | | do_vfs_ioctl | | | | | sys_ioctl | | | | | system_call_fastpath | | | | | ioctl | | | | | 0x10100000006 | | | | | | | | | --24.92%-- flush_tlb_mm_range | | | | pmdp_clear_flush_young | | | | page_referenced_one | | | | page_referenced | | | | shrink_active_list | | | | shrink_lruvec | | | | try_to_free_pages | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | 
do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
  2012-09-18  8:14             ` Richard Davies
@ 2012-09-18 11:21               ` Mel Gorman
  2012-09-18 17:58                 ` Richard Davies
  0 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2012-09-18 11:21 UTC (permalink / raw)
  To: Richard Davies
  Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Tue, Sep 18, 2012 at 09:14:55AM +0100, Richard Davies wrote:
> Hi Mel,
>
> Thanks for your latest patch, I attach a perf report below with this on top
> of all previous patches. There is still lock contention, though in a
> different place.
>
>  59.97%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>          |
>          --- _raw_spin_lock_irqsave
>             |
>             |--99.30%-- compact_checklock_irqsave
>             |          |
>             |          |--99.98%-- compaction_alloc

Ok, this just means the focus has moved to the zone->lock instead of the
zone->lru_lock. This was expected to some extent. This is an additional
patch that defers acquisition of the zone->lock for as long as possible.

Incidentally, I checked the efficiency of compaction - i.e. how many pages
scanned versus how many pages isolated - and the efficiency completely
sucks. It must be addressed but addressing the lock contention should
happen first.

---8<---
mm: compaction: Acquire the zone->lock as late as possible

The zone lock is required when isolating pages to allocate and for checking
PageBuddy. It is a coarse-grained lock, but the current implementation
acquires it when examining each pageblock, before it is known whether there
are any free pages to isolate. This patch defers acquiring the zone lock
for as long as possible. In the event there are no free pages in the
pageblock, the lock will not be acquired at all.
Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/compaction.c | 80 ++++++++++++++++++++++++++++++++----------------------- 1 file changed, 47 insertions(+), 33 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index a5d698f..57ff9ef 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -89,19 +89,14 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags, return true; } -static inline bool compact_trylock_irqsave(spinlock_t *lock, - unsigned long *flags, struct compact_control *cc) -{ - return compact_checklock_irqsave(lock, flags, false, cc); -} - /* * Isolate free pages onto a private freelist. Caller must hold zone->lock. * If @strict is true, will abort returning 0 on any invalid PFNs or non-free * pages inside of the pageblock (even though it may still end up isolating * some pages). */ -static unsigned long isolate_freepages_block(unsigned long start_pfn, +static unsigned long isolate_freepages_block(struct compact_control *cc, + unsigned long start_pfn, unsigned long end_pfn, struct list_head *freelist, bool strict) @@ -109,6 +104,8 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn, int nr_scanned = 0, total_isolated = 0; unsigned long blockpfn = start_pfn; struct page *cursor; + unsigned long flags; + bool locked = false; cursor = pfn_to_page(blockpfn); @@ -117,18 +114,29 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn, int isolated, i; struct page *page = cursor; - if (!pfn_valid_within(blockpfn)) { - if (strict) - return 0; - continue; - } + if (!pfn_valid_within(blockpfn)) + goto strict_check; nr_scanned++; - if (!PageBuddy(page)) { - if (strict) - return 0; - continue; - } + if (!PageBuddy(page)) + goto strict_check; + + /* + * The zone lock must be held to isolate freepages. This + * unfortunately this is a very coarse lock and can be + * heavily contended if there are parallel allocations + * or parallel compactions. 
For async compaction do not + * spin on the lock and we acquire the lock as late as + * possible. + */ + locked = compact_checklock_irqsave(&cc->zone->lock, &flags, + locked, cc); + if (!locked) + break; + + /* Recheck this is a buddy page under lock */ + if (!PageBuddy(page)) + goto strict_check; /* Found a free page, break it into order-0 pages */ isolated = split_free_page(page); @@ -145,10 +153,24 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn, blockpfn += isolated - 1; cursor += isolated - 1; } + + continue; + +strict_check: + /* Abort isolation if the caller requested strict isolation */ + if (strict) { + total_isolated = 0; + goto out; + } } trace_mm_compaction_isolate_freepages(start_pfn, nr_scanned, total_isolated); + +out: + if (locked) + spin_unlock_irqrestore(&cc->zone->lock, flags); + return total_isolated; } @@ -168,13 +190,18 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn, unsigned long isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) { - unsigned long isolated, pfn, block_end_pfn, flags; + unsigned long isolated, pfn, block_end_pfn; struct zone *zone = NULL; LIST_HEAD(freelist); + struct compact_control cc; if (pfn_valid(start_pfn)) zone = page_zone(pfn_to_page(start_pfn)); + /* cc needed for isolate_freepages_block to acquire zone->lock */ + cc.zone = zone; + cc.sync = true; + for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) { if (!pfn_valid(pfn) || zone != page_zone(pfn_to_page(pfn))) break; @@ -186,10 +213,8 @@ isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn) block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages); block_end_pfn = min(block_end_pfn, end_pfn); - spin_lock_irqsave(&zone->lock, flags); - isolated = isolate_freepages_block(pfn, block_end_pfn, + isolated = isolate_freepages_block(&cc, pfn, block_end_pfn, &freelist, true); - spin_unlock_irqrestore(&zone->lock, flags); /* * In strict mode, isolate_freepages_block() returns 0 if @@ -480,7 
+505,6 @@ static void isolate_freepages(struct zone *zone, { struct page *page; unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn; - unsigned long flags; int nr_freepages = cc->nr_freepages; struct list_head *freelist = &cc->freepages; @@ -536,22 +560,12 @@ static void isolate_freepages(struct zone *zone, */ isolated = 0; - /* - * The zone lock must be held to isolate freepages. This - * unfortunately this is a very coarse lock and can be - * heavily contended if there are parallel allocations - * or parallel compactions. For async compaction do not - * spin on the lock - */ - if (!compact_trylock_irqsave(&zone->lock, &flags, cc)) - break; if (suitable_migration_target(page)) { end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn); - isolated = isolate_freepages_block(pfn, end_pfn, + isolated = isolate_freepages_block(cc, pfn, end_pfn, freelist, false); nr_freepages += isolated; } - spin_unlock_irqrestore(&zone->lock, flags); /* * Record the highest PFN we isolated pages from. When next -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
  2012-09-18 11:21               ` Mel Gorman
@ 2012-09-18 17:58                 ` Richard Davies
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Davies @ 2012-09-18 17:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

Mel Gorman wrote:
> Ok, this just means the focus has moved to the zone->lock instead of the
> zone->lru_lock. This was expected to some extent. This is an additional
> patch that defers acquisition of the zone->lock for as long as possible.

And I believe you have now beaten the lock contention - congratulations!

> Incidentally, I checked the efficiency of compaction - i.e. how many
> pages scanned versus how many pages isolated and the efficiency
> completely sucks. It must be addressed but addressing the lock
> contention should happen first.

Yes, compaction is now definitely top.

Interestingly, some boots still seem "slow" and some "fast", even without
any lock contention issues. Here are traces from a few different runs, and
I attach the detailed report for the first of these, which was one of the
slow ones.
# grep -F '[k]' report.1 | head -8
 55.86%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
 14.98%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
  2.18%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
  1.67%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
  1.66%  qemu-kvm  [kernel.kallsyms]  [k] compact_zone
  1.56%  ksmd      [kernel.kallsyms]  [k] memcmp
  1.48%  swapper   [kernel.kallsyms]  [k] default_idle
  1.33%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
#
# grep -F '[k]' report.2 | head -8
 38.28%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
  7.58%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
  7.03%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
  4.72%  qemu-kvm  [kernel.kallsyms]  [k] isolate_migratepages_range
  4.31%  qemu-kvm  [kernel.kallsyms]  [k] copy_page_c
  4.15%  qemu-kvm  [kernel.kallsyms]  [k] compact_zone
  2.68%  qemu-kvm  [kernel.kallsyms]  [k] __zone_watermark_ok
  2.65%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
#
# grep -F '[k]' report.3 | head -8
 75.18%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
  1.82%  swapper   [kernel.kallsyms]  [k] default_idle
  1.29%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
  1.27%  qemu-kvm  [kernel.kallsyms]  [k] get_page_from_freelist
  1.20%  ksmd      [kernel.kallsyms]  [k] memcmp
  0.83%  qemu-kvm  [kernel.kallsyms]  [k] free_pages_prepare
  0.78%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
  0.59%  qemu-kvm  [kernel.kallsyms]  [k] prep_compound_page
#
# grep -F '[k]' report.4 | head -8
 41.02%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
 32.20%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
  1.76%  qemu-kvm  [kernel.kallsyms]  [k] yield_to
  1.37%  swapper   [kernel.kallsyms]  [k] default_idle
  1.35%  ksmd      [kernel.kallsyms]  [k] memcmp
  1.27%  qemu-kvm  [kernel.kallsyms]  [k] svm_vcpu_run
  1.23%  qemu-kvm  [kernel.kallsyms]  [k] get_pageblock_flags_group
  0.88%  qemu-kvm  [kernel.kallsyms]  [k] kvm_vcpu_on_spin
#
# grep -F '[k]' report.5 | head -8
 61.18%  qemu-kvm  [kernel.kallsyms]  [k] isolate_freepages_block
 14.55%  qemu-kvm  [kernel.kallsyms]  [k] clear_page_c
  1.75%
qemu-kvm [kernel.kallsyms] [k] yield_to 1.31% ksmd [kernel.kallsyms] [k] memcmp 1.21% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run 1.20% swapper [kernel.kallsyms] [k] default_idle 1.14% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group 0.94% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin Here is the detailed report for the first of these: # ======== # captured on: Tue Sep 18 17:03:40 2012 # os release : 3.6.0-rc5-elastic+ # perf version : 3.5.2 # arch : x86_64 # nrcpus online : 16 # nrcpus avail : 16 # cpudesc : AMD Opteron(tm) Processor 6128 # cpuid : AuthenticAMD,16,9,1 # total memory : 131973280 kB # cmdline : /home/root/bin/perf record -g -a # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 } # HEADER_CPU_TOPOLOGY info available, use -I to display # HEADER_NUMA_TOPOLOGY info available, use -I to display # ======== # # Samples: 3M of event 'cycles' # Event count (approx.): 1184064513533 # # Overhead Command Shared Object Symbol # ........ ............... .................... .............................................. # 55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block | --- isolate_freepages_block | |--99.99%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--88.73%-- 0x10100000006 | | | --11.27%-- 0x10100000002 --0.01%-- [...] 
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c | --- clear_page_c | |--99.84%-- do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--55.15%-- 0x10100000006 | | | --44.85%-- 0x10100000002 --0.16%-- [...] 2.18% qemu-kvm [kernel.kallsyms] [k] yield_to | --- yield_to | |--99.62%-- kvm_vcpu_yield_to | kvm_vcpu_on_spin | pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.34%-- 0x10100000006 | | | --16.66%-- 0x10100000002 --0.38%-- [...] 1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group | --- get_pageblock_flags_group | |--57.67%-- isolate_migratepages_range | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--86.10%-- 0x10100000006 | | | --13.90%-- 0x10100000002 | |--38.10%-- suitable_migration_target | compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | 
system_call_fastpath | ioctl | | | |--88.50%-- 0x10100000006 | | | --11.50%-- 0x10100000002 | |--2.23%-- compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--85.85%-- 0x10100000006 | | | --14.15%-- 0x10100000002 | |--0.88%-- compaction_alloc | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--87.75%-- 0x10100000006 | | | --12.25%-- 0x10100000002 | |--0.75%-- free_hot_cold_page | | | |--74.93%-- free_hot_cold_page_list | | | | | |--53.13%-- shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--82.85%-- 0x10100000006 | | | | | | | --17.15%-- 0x10100000002 | | | | | --46.87%-- release_pages | | pagevec_lru_move_fn | | __pagevec_lru_add | 
| | | | |--98.13%-- __lru_cache_add | | | lru_cache_add_lru | | | putback_lru_page | | | | | | | |--99.02%-- migrate_pages | | | | compact_zone | | | | compact_zone_order | | | | try_to_compact_pages | | | | __alloc_pages_direct_compact | | | | __alloc_pages_nodemask | | | | alloc_pages_vma | | | | do_huge_pmd_anonymous_page | | | | handle_mm_fault | | | | __get_user_pages | | | | get_user_page_nowait | | | | hva_to_pfn.isra.17 | | | | __gfn_to_pfn | | | | gfn_to_pfn_async | | | | try_async_pf | | | | tdp_page_fault | | | | kvm_mmu_page_fault | | | | pf_interception | | | | handle_exit | | | | kvm_arch_vcpu_ioctl_run | | | | kvm_vcpu_ioctl | | | | do_vfs_ioctl | | | | sys_ioctl | | | | system_call_fastpath | | | | ioctl | | | | | | | | | |--88.56%-- 0x10100000006 | | | | | | | | | --11.44%-- 0x10100000002 | | | | | | | --0.98%-- putback_lru_pages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000002 | | | | | --1.87%-- lru_add_drain_cpu | | lru_add_drain | | | | | |--51.26%-- shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | 
| system_call_fastpath | | | ioctl | | | 0x10100000002 | | | | | --48.74%-- migrate_prep_local | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | |--23.04%-- __free_pages | | | | | |--59.57%-- release_freepages | | | compact_zone | | | compact_zone_order | | | try_to_compact_pages | | | __alloc_pages_direct_compact | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--89.08%-- 0x10100000006 | | | | | | | --10.92%-- 0x10100000002 | | | | | |--30.57%-- do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | | | | | |--60.91%-- 0x10100000006 | | | | | | | --39.09%-- 0x10100000002 | | | | | --9.86%-- __free_slab | | discard_slab | | | | | |--55.43%-- unfreeze_partials | | | put_cpu_partial | | | __slab_free | | | kmem_cache_free | | | 
free_buffer_head | | | try_to_free_buffers | | | jbd2_journal_try_to_free_buffers | | | ext4_releasepage | | | try_to_release_page | | | shrink_page_list | | | shrink_inactive_list | | | shrink_lruvec | | | try_to_free_pages | | | __alloc_pages_nodemask | | | alloc_pages_vma | | | do_huge_pmd_anonymous_page | | | handle_mm_fault | | | __get_user_pages | | | get_user_page_nowait | | | hva_to_pfn.isra.17 | | | __gfn_to_pfn | | | gfn_to_pfn_async | | | try_async_pf | | | tdp_page_fault | | | kvm_mmu_page_fault | | | pf_interception | | | handle_exit | | | kvm_arch_vcpu_ioctl_run | | | kvm_vcpu_ioctl | | | do_vfs_ioctl | | | sys_ioctl | | | system_call_fastpath | | | ioctl | | | 0x10100000006 | | | | | --44.57%-- __slab_free | | kmem_cache_free | | free_buffer_head | | try_to_free_buffers | | jbd2_journal_try_to_free_buffers | | ext4_releasepage | | try_to_release_page | | shrink_page_list | | shrink_inactive_list | | shrink_lruvec | | try_to_free_pages | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | 0x10100000006 | | | --2.02%-- __put_single_page | put_page | putback_lru_page | migrate_pages | compact_zone | compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.36%-- 0x10100000006 | | | --16.64%-- 
0x10100000002 --0.37%-- [...] 1.66% qemu-kvm [kernel.kallsyms] [k] compact_zone | --- compact_zone | |--99.99%-- compact_zone_order | try_to_compact_pages | __alloc_pages_direct_compact | __alloc_pages_nodemask | alloc_pages_vma | do_huge_pmd_anonymous_page | handle_mm_fault | __get_user_pages | get_user_page_nowait | hva_to_pfn.isra.17 | __gfn_to_pfn | gfn_to_pfn_async | try_async_pf | tdp_page_fault | kvm_mmu_page_fault | pf_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--85.25%-- 0x10100000006 | | | --14.75%-- 0x10100000002 --0.01%-- [...] 1.56% ksmd [kernel.kallsyms] [k] memcmp | --- memcmp | |--99.67%-- memcmp_pages | | | |--77.39%-- ksm_scan_thread | | kthread | | kernel_thread_helper | | | --22.61%-- try_to_merge_with_ksm_page | ksm_scan_thread | kthread | kernel_thread_helper --0.33%-- [...] 1.48% swapper [kernel.kallsyms] [k] default_idle | --- default_idle | |--99.55%-- cpu_idle | | | |--92.95%-- start_secondary | | | --7.05%-- rest_init | start_kernel | x86_64_start_reservations | x86_64_start_kernel --0.45%-- [...] 1.33% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run | --- svm_vcpu_run | |--99.34%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--77.65%-- 0x10100000006 | | | --22.35%-- 0x10100000002 | --0.66%-- kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--73.97%-- 0x10100000006 | --26.03%-- 0x10100000002 1.08% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin | --- kvm_vcpu_on_spin | |--99.27%-- pause_interception | handle_exit | kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--83.21%-- 0x10100000006 | | | --16.79%-- 0x10100000002 | --0.73%-- handle_exit kvm_arch_vcpu_ioctl_run kvm_vcpu_ioctl do_vfs_ioctl sys_ioctl system_call_fastpath ioctl | |--80.89%-- 0x10100000006 | --19.11%-- 0x10100000002 0.79% qemu-kvm qemu-kvm [.] 
0x00000000000ae282 | |--1.27%-- 0x4eec6e | | | |--38.48%-- 0x1491280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.35%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --29.16%-- 0x200c280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.24%-- 0x503457 | 0x0 | |--1.02%-- 0x4eec20 | | | |--46.48%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--28.52%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --24.99%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--1.00%-- 0x4eec2a | | | |--77.52%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--12.67%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --9.80%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.99%-- 0x4ef092 | |--0.94%-- 0x568f04 | | | |--89.85%-- 0x0 | | | |--7.89%-- 0x10100000002 | | | --2.26%-- 0x10100000006 | |--0.93%-- 0x5afab4 | | | |--40.39%-- 0x309a410 | | 0x0 | | | |--31.80%-- 0x1f11410 | | 0x0 | | | |--20.88%-- 0x1396410 | | 0x0 | | | |--4.58%-- 0x0 | | | | | |--52.36%-- 0x148ea00 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | |--31.49%-- 0x2009a00 | | | 0x5699c0 | | | 0x24448948004b4154 | | | | | --16.15%-- 0x3192a00 | | 0x5699c0 | | 0x24448948004b4154 | | | |--1.31%-- 0x1000 | | | --1.03%-- 0x6 | |--0.92%-- 0x4eeba0 | | | |--35.54%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.33%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --32.12%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.91%-- 0x652b11 | |--0.83%-- 0x65a102 | |--0.82%-- 0x40a6a9 | |--0.81%-- 0x530421 | | | |--94.43%-- 0x0 | | | --5.57%-- 0x46b47b | | | |--51.32%-- 0xdffec96000a08169 | | | --48.68%-- 0xdffec90000a08169 | |--0.80%-- 0x569fc4 | | | |--41.34%-- 0x1396410 | | 0x0 | | | |--29.46%-- 0x1f11410 | | 0x0 | | | --29.21%-- 0x309a410 | 0x0 | |--0.73%-- 0x541422 | 0x0 | |--0.70%-- 0x56b990 | | | |--72.77%-- 0x100000008 | | | |--26.00%-- 0xfed00000 | | | | | --100.00%-- 0x0 | | | |--0.73%-- 0x100000004 | --0.50%-- [...] 
| |--0.69%-- 0x525261 | 0x0 | 0x822ee8fff96873e9 | |--0.69%-- 0x6578d7 | | | --100.00%-- 0x0 | |--0.67%-- 0x52fb44 | | | |--75.44%-- 0x0 | | | |--17.16%-- 0x10100000002 | | | --7.41%-- 0x10100000006 | |--0.66%-- 0x568e29 | | | |--50.87%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--33.04%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--13.60%-- 0x1491280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--1.40%-- 0x1000 | | | |--0.65%-- 0x3000 | --0.43%-- [...] | |--0.65%-- 0x5b4cb4 | 0x0 | 0x822ee8fff96873e9 | |--0.62%-- 0x55b9ba | | | |--50.14%-- 0x0 | | | --49.86%-- 0x2000000 | |--0.61%-- 0x4ff496 | |--0.60%-- 0x672601 | 0x1 | |--0.58%-- 0x4eec06 | | | |--75.93%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--15.91%-- 0x3195280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --8.15%-- 0x1491280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.58%-- 0x477a32 | 0x0 | |--0.56%-- 0x477b27 | 0x0 | |--0.56%-- 0x540e24 | |--0.56%-- 0x40a4f4 | |--0.55%-- 0x659d12 | 0x0 | |--0.55%-- 0x4eec22 | | | |--44.24%-- 0x200c280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | |--32.08%-- 0x1491280 | | 0x0 | | 0xa0 | | 0x696368752d62 | | | --23.68%-- 0x3195280 | 0x0 | 0xa0 | 0x696368752d62 | |--0.53%-- 0x564394 | | | |--69.75%-- 0x0 | | | |--23.87%-- 0x10100000002 | | | --6.38%-- 0x10100000006 | |--0.52%-- 0x4eeb52 | |--0.51%-- 0x530094 | |--0.50%-- 0x477a9e | 0x0 --74.90%-- [...] 
0.77% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock | --- __srcu_read_lock | |--91.98%-- kvm_arch_vcpu_ioctl_run | kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--81.72%-- 0x10100000006 | | | --18.28%-- 0x10100000002 | |--5.81%-- kvm_vcpu_ioctl | do_vfs_ioctl | sys_ioctl | system_call_fastpath | ioctl | | | |--78.63%-- 0x10100000006 | | | --21.37%-- 0x10100000002 | |--1.06%-- fsnotify | vfs_write | | | |--98.29%-- sys_write | | system_call_fastpath | | write | | | | | --100.00%-- 0x0 | | | --1.71%-- sys_pwrite64 | system_call_fastpath | pwrite64 | | | |--55.68%-- 0x1f12260 | | 0x80 | | 0x480050b9e1058b48 | | | --44.32%-- 0x309b260 | 0x80 | 0x480050b9e1058b48 | |--0.91%-- kvm_mmu_notifier_invalidate_page | __mmu_notifier_invalidate_page | try_to_unmap_one | | | |--98.79%-- try_to_unmap_anon | | try_to_unmap | | migrate_pages | | compact_zone | | compact_zone_order | | try_to_compact_pages | | __alloc_pages_direct_compact | | __alloc_pages_nodemask | | alloc_pages_vma | | do_huge_pmd_anonymous_page | | handle_mm_fault | | __get_user_pages | | get_user_page_nowait | | hva_to_pfn.isra.17 | | __gfn_to_pfn | | gfn_to_pfn_async | | try_async_pf | | tdp_page_fault | | kvm_mmu_page_fault | | pf_interception | | handle_exit | | kvm_arch_vcpu_ioctl_run | | kvm_vcpu_ioctl | | do_vfs_ioctl | | sys_ioctl | | system_call_fastpath | | ioctl | | | -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
  2012-09-15 15:55 ` Richard Davies
  2012-09-16 19:12 ` Richard Davies
  2012-09-17 12:26 ` Mel Gorman
@ 2012-09-17 13:50 ` Rik van Riel
  2012-09-17 14:07 ` Mel Gorman
  2 siblings, 1 reply; 15+ messages in thread

From: Rik van Riel @ 2012-09-17 13:50 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On 09/15/2012 11:55 AM, Richard Davies wrote:
> Hi Rik, Mel and Shaohua,
>
> Thank you for your latest patches. I attach my latest perf report for a slow
> boot with all of these applied.
>
> Mel asked for timings of the slow boots. It's very hard to give anything
> useful here! A normal boot would be a minute or so, and many are like that,
> but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
> just test many times until I get one which is noticeably slower than normal
> and then run perf record on that one.
>
> The latest perf report for a slow boot is below. For the fast boots, most of
> the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow
> one there is a lot of lock contention above that.

How often do you run into slow boots, vs. fast ones?

> # Overhead  Command        Shared Object        Symbol
> # ........  .............  ...................  ..............................
> #
>     58.49%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>             |
>             --- _raw_spin_lock_irqsave
>                |
>                |--95.07%-- compact_checklock_irqsave
>                |          |
>                |          |--70.03%-- isolate_migratepages_range
>                |          |           compact_zone
>                |          |           compact_zone_order
>                |          |           try_to_compact_pages
>                |          |           __alloc_pages_direct_compact
>                |          |           __alloc_pages_nodemask

Looks like it moved from isolate_freepages_block in your last
trace, to isolate_migratepages_range?

Mel, I wonder if we have any quadratic complexity problems
in this part of the code, too?
The isolate_freepages_block CPU use can be fixed by simply restarting
where the last invocation left off, instead of always starting at the
end of the zone. Could we need something similar for
isolate_migratepages_range?

After all, Richard has a 128GB system, and runs 108GB worth of KVM
guests on it...

-- 
All rights reversed
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
  2012-09-17 13:50 ` Rik van Riel
@ 2012-09-17 14:07 ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread

From: Mel Gorman @ 2012-09-17 14:07 UTC (permalink / raw)
To: Rik van Riel
Cc: Richard Davies, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm

On Mon, Sep 17, 2012 at 09:50:08AM -0400, Rik van Riel wrote:
> On 09/15/2012 11:55 AM, Richard Davies wrote:
> > Hi Rik, Mel and Shaohua,
> >
> > Thank you for your latest patches. I attach my latest perf report for a slow
> > boot with all of these applied.
> >
> > Mel asked for timings of the slow boots. It's very hard to give anything
> > useful here! A normal boot would be a minute or so, and many are like that,
> > but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
> > just test many times until I get one which is noticeably slower than normal
> > and then run perf record on that one.
> >
> > The latest perf report for a slow boot is below. For the fast boots, most of
> > the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow
> > one there is a lot of lock contention above that.
>
> How often do you run into slow boots, vs. fast ones?
>
> > # Overhead  Command        Shared Object        Symbol
> > # ........  .............  ...................  ..............................
> > #
> >     58.49%  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
> >             |
> >             --- _raw_spin_lock_irqsave
> >                |
> >                |--95.07%-- compact_checklock_irqsave
> >                |          |
> >                |          |--70.03%-- isolate_migratepages_range
> >                |          |           compact_zone
> >                |          |           compact_zone_order
> >                |          |           try_to_compact_pages
> >                |          |           __alloc_pages_direct_compact
> >                |          |           __alloc_pages_nodemask
>
> Looks like it moved from isolate_freepages_block in your last
> trace, to isolate_migratepages_range?
>
> Mel, I wonder if we have any quadratic complexity problems
> in this part of the code, too?
>

Possibly, but right now I'm focusing on the contention, even though I
recognise that reducing the amount of scanning implicitly reduces the
amount of contention.

I'm running a test at the moment with an additional patch to record the
pageblock being scanned by either the free or migrate page scanner. This
should be enough to calculate both the scanning efficiency and how many
useless blocks are scanned, to determine whether your "skip" patches are
behaving as expected, and from there decide if the migrate scanner needs
similar logic.

-- 
Mel Gorman
SUSE Labs
end of thread, other threads:[~2012-09-18 17:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
[not found] <20120821152107.GA16363@alpha.arachsys.com>
[not found] ` <5034A18B.5040408@redhat.com>
[not found] ` <20120822124032.GA12647@alpha.arachsys.com>
[not found] ` <5034D437.8070106@redhat.com>
[not found] ` <20120822144150.GA1400@alpha.arachsys.com>
[not found] ` <5034F8F4.3080301@redhat.com>
[not found] ` <20120825174550.GA8619@alpha.arachsys.com>
[not found] ` <50391564.30401@redhat.com>
[not found] ` <20120826105803.GA377@alpha.arachsys.com>
[not found] ` <20120906092039.GA19234@alpha.arachsys.com>
2012-09-12 10:56 ` Windows VM slow boot Richard Davies
2012-09-12 12:25 ` Mel Gorman
2012-09-12 16:46 ` Richard Davies
2012-09-13  9:50 ` Mel Gorman
2012-09-13 19:47 ` [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages" Rik van Riel
2012-09-13 19:48 ` [PATCH 2/2] make the compaction "skip ahead" logic robust Rik van Riel
2012-09-13 19:54 ` [PATCH -v2 " Rik van Riel
2012-09-15 15:55 ` Richard Davies
2012-09-16 19:12 ` Richard Davies
2012-09-17 12:26 ` Mel Gorman
2012-09-18  8:14 ` Richard Davies
2012-09-18 11:21 ` Mel Gorman
2012-09-18 17:58 ` Richard Davies
2012-09-17 13:50 ` Rik van Riel
2012-09-17 14:07 ` Mel Gorman