From mboxrd@z Thu Jan 1 00:00:00 1970 From: Igor Mammedov Subject: Re: x86: kvmclock: abstract save/restore sched_clock_state Date: Thu, 09 Feb 2012 16:13:29 +0100 Message-ID: <4F33E299.3090500@redhat.com> References: <20120207210542.GC20618@amt.cnet> <20120209122742.GA25505@amit.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , kvm@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, johnstul@us.ibm.com, riel@redhat.com, avi@redhat.com To: Amit Shah Return-path: Received: from mx1.redhat.com ([209.132.183.28]:50568 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751215Ab2BIPNu (ORCPT ); Thu, 9 Feb 2012 10:13:50 -0500 In-Reply-To: <20120209122742.GA25505@amit.redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 02/09/2012 01:27 PM, Amit Shah wrote: > On (Tue) 07 Feb 2012 [19:05:42], Marcelo Tosatti wrote: >> >> Upon resume from hibernation, CPU 0's hvclock area contains the old >> values for system_time and tsc_timestamp. It is necessary for the >> hypervisor to update these values with uptodate ones before the CPU uses >> them. >> >> Abstract TSC's save/restore sched_clock_state functions and use >> restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update. >> >> Fixes suspend-to-disk with kvmclock. > > There are stalls after resume, see trace below. > > ./x86_64-softmmu/qemu-system-x86_64 -kernel ~/src/linux/arch/x86/boot/bzImage -append 'root=/dev/vda1 console=tty0 console=ttyS0 no_console_suspend' -drive file=/guests/f14-suspend.qcow2,if=none,id=dr0 -device virtio-blk-pci,drive=dr0 -net nic,model=virtio -net user -serial stdio -enable-kvm -m 512 -cpu host -smp 4 > > > > Disabling non-boot CPUs ... > CPU 1 is now offline > CPU 2 is now offline > CPU 3 is now offline > lockdep: fixing up alternatives. > kvm-clock: cpu 0, msr 0:1f1d19c1, primary cpu clock, resume > PM: Restoring platform NVS memory > Enabling non-boot CPUs ... > lockdep: fixing up alternatives. > Booting Node 0 Processor 1 APIC 0x1 > Calibrating delay loop (skipped) already calibrated this CPU > INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=314192 jiffies) > Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 > Call Trace: > [] __rcu_pending+0x268/0x3c5 > [] rcu_check_callbacks+0xab/0x108 > [] update_process_times+0x3f/0x75 > [] tick_sched_timer+0x6d/0x8c > [] __run_hrtimer+0xc9/0x15c > [] ? tick_nohz_handler+0xd5/0xd5 > [] hrtimer_interrupt+0xea/0x1b4 > [] smp_apic_timer_interrupt+0x76/0x89 > [] apic_timer_interrupt+0x73/0x80 > [] ? arch_local_irq_restore+0x6/0xd > [] native_cpu_up+0x108/0x127 > [] _cpu_up+0x92/0xfc > [] enable_nonboot_cpus+0x4d/0xb2 > [] hibernation_snapshot+0x1f3/0x2a0 > [] ? cleanup_srcu_struct+0x52/0x58 > [] hibernate+0x97/0x196 > [] state_store+0x5c/0x106 > [] kobj_attr_store+0x17/0x19 > [] sysfs_write_file+0x10e/0x14a > [] vfs_write+0xab/0xd2 > [] ? fget_light+0x3a/0xa1 > [] sys_write+0x4d/0x74 > [] system_call_fastpath+0x16/0x1b > INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=314193 jiffies) > Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 > Call Trace: > [] __rcu_pending+0x268/0x3c5 > [] rcu_check_callbacks+0xe9/0x108 > [] update_process_times+0x3f/0x75 > [] tick_sched_timer+0x6d/0x8c > [] __run_hrtimer+0xc9/0x15c > [] ? tick_nohz_handler+0xd5/0xd5 > [] hrtimer_interrupt+0xea/0x1b4 > [] smp_apic_timer_interrupt+0x76/0x89 > [] apic_timer_interrupt+0x73/0x80 > [] ? arch_local_irq_restore+0x6/0xd > [] native_cpu_up+0x108/0x127 > [] _cpu_up+0x92/0xfc > [] enable_nonboot_cpus+0x4d/0xb2 > [] hibernation_snapshot+0x1f3/0x2a0 > [] ? cleanup_srcu_struct+0x52/0x58 > [] hibernate+0x97/0x196 > [] state_store+0x5c/0x106 > [] kobj_attr_store+0x17/0x19 > [] sysfs_write_file+0x10e/0x14a > [] vfs_write+0xab/0xd2 > [] ? fget_light+0x3a/0xa1 > [] sys_write+0x4d/0x74 > [] system_call_fastpath+0x16/0x1b > INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=1713579840232 jiffies) > Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 > Call Trace: > [] __rcu_pending+0x268/0x3c5 > [] rcu_check_callbacks+0xab/0x108 > [] update_process_times+0x3f/0x75 > [] tick_sched_timer+0x6d/0x8c > [] __run_hrtimer+0xc9/0x15c > [] ? tick_nohz_handler+0xd5/0xd5 > [] hrtimer_interrupt+0xea/0x1b4 > [] smp_apic_timer_interrupt+0x76/0x89 > [] apic_timer_interrupt+0x73/0x80 > [] ? arch_local_irq_restore+0x6/0xd > [] native_cpu_up+0x108/0x127 > [] _cpu_up+0x92/0xfc > [] enable_nonboot_cpus+0x4d/0xb2 > [] hibernation_snapshot+0x1f3/0x2a0 > [] ? cleanup_srcu_struct+0x52/0x58 > [] hibernate+0x97/0x196 > [] state_store+0x5c/0x106 > [] kobj_attr_store+0x17/0x19 > [] sysfs_write_file+0x10e/0x14a > [] vfs_write+0xab/0xd2 > [] ? fget_light+0x3a/0xa1 > [] sys_write+0x4d/0x74 > [] system_call_fastpath+0x16/0x1b > INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=1713579840233 jiffies) > Pid: 662, comm: bash Not tainted 3.3.0-rc2+ #293 > Call Trace: > [] __rcu_pending+0x268/0x3c5 > [] rcu_check_callbacks+0xe9/0x108 > [] update_process_times+0x3f/0x75 > [] tick_sched_timer+0x6d/0x8c > [] __run_hrtimer+0xc9/0x15c > [] ? tick_nohz_handler+0xd5/0xd5 > [] hrtimer_interrupt+0xea/0x1b4 > [] smp_apic_timer_interrupt+0x76/0x89 > [] apic_timer_interrupt+0x73/0x80 > [] ? arch_local_irq_restore+0x6/0xd > [] native_cpu_up+0x108/0x127 > [] _cpu_up+0x92/0xfc > [] enable_nonboot_cpus+0x4d/0xb2 > [] hibernation_snapshot+0x1f3/0x2a0 > [] ? cleanup_srcu_struct+0x52/0x58 > [] hibernate+0x97/0x196 > [] state_store+0x5c/0x106 > [] kobj_attr_store+0x17/0x19 > [] sysfs_write_file+0x10e/0x14a > [] vfs_write+0xab/0xd2 > [] ? fget_light+0x3a/0xa1 > [] sys_write+0x4d/0x74 > [] system_call_fastpath+0x16/0x1b > > > Amit Stalls are probably caused by uninitialized percpu hv_clock, with following patch I don't see stalls. Although I might be just lucky. http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487 However there is/are a warning/s on suspend path and with following patch: diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 66c74f4..c8e7587 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -117,6 +117,7 @@ static void native_smp_send_reschedule(int cpu) { if (unlikely(cpu_is_offline(cpu))) { + printk ("curr_cpu: %d, for cpu: %d\n", smp_processor_id(), cpu); WARN_ON(1); return; } [ 3.101919] Unregister pv shared memory for cpu 1 [ 3.207321] CPU 1 is now offline [ 3.210177] curr_cpu: 0, for cpu: 1 [ 3.211172] ------------[ cut here ]------------ [ 3.211172] WARNING: at arch/x86/kernel/smp.c:121 native_smp_send_reschedule+0x40/0x71() [ 3.211172] Hardware name: Bochs [ 3.211172] Modules linked in: floppy virtio_blk [ 3.211172] Pid: 264, comm: sh Not tainted 3.3.0-rc1+ #42 [ 3.211172] Call Trace: [ 3.211172] [] warn_slowpath_common+0x83/0x9b [ 3.211172] [] warn_slowpath_null+0x1a/0x1c [ 3.211172] [] native_smp_send_reschedule+0x40/0x71 [ 3.211172] [] trigger_load_balance+0x28e/0x2ba [ 3.211172] [] scheduler_tick+0xe9/0xf2 [ 3.211172] [] update_process_times+0x65/0x76 [ 3.211172] [] tick_sched_timer+0x75/0x9f [ 3.211172] [] __run_hrtimer+0xb1/0x155 [ 3.211172] [] ? tick_nohz_handler+0xdd/0xdd [ 3.211172] [] hrtimer_interrupt+0xe0/0x19e [ 3.211172] [] smp_apic_timer_interrupt+0x77/0x8a [ 3.211172] [] apic_timer_interrupt+0x73/0x80 [ 3.211172] [] ? cpuid4_cache_sysfs_exit+0x94/0xad [ 3.211172] [] ? cmpxchg_double_slab+0x6/0xef [ 3.211172] [] __slab_free+0xe6/0x1b7 [ 3.211172] [] kmem_cache_free+0x72/0xac [ 3.211172] [] ? cpuid4_cache_sysfs_exit+0x94/0xad [ 3.211172] [] release_sysfs_dirent+0x9b/0xb9 [ 3.211172] [] __sysfs_put+0x18/0x1a [ 3.211172] [] sysfs_addrm_finish+0x8a/0xa2 [ 3.211172] [] sysfs_hash_and_remove+0x5c/0x71 [ 3.211172] [] sysfs_remove_file+0x34/0x3b [ 3.211172] [] device_remove_file+0x17/0x19 [ 3.211172] [] mce_cpu_callback+0xa3/0x186 [ 3.211172] [] notifier_call_chain+0x37/0x63 [ 3.211172] [] __raw_notifier_call_chain+0xe/0x10 [ 3.211172] [] __cpu_notify+0x20/0x32 [ 3.211172] [] cpu_notify_nofail+0x13/0x1b [ 3.211172] [] _cpu_down+0x13a/0x21f [ 3.211172] [] disable_nonboot_cpus+0x66/0x11e [ 3.211172] [] hibernation_restore+0x78/0x106 [ 3.211172] [] software_resume+0x1e0/0x249 [ 3.211172] [] resume_store+0x8b/0x97 [ 3.211172] [] kobj_attr_store+0x17/0x19 [ 3.211172] [] sysfs_write_file+0x101/0x13d [ 3.211172] [] vfs_write+0xac/0xf3 [ 3.211172] [] ? trace_hardirqs_on_caller+0xf/0x24 [ 3.211172] [] sys_write+0x4a/0x6e [ 3.211172] [] system_call_fastpath+0x16/0x1b [ 3.211172] ---[ end trace 5aa7abdf47c70330 ]--- [ 3.276748] Unregister pv shared memory for cpu 2 [ 3.280771] CPU 2 is now offline [ 3.285070] Unregister pv shared memory for cpu 3 [ 3.288206] CPU 3 is now offline [ 31.290110] kvm-clock: cpu 0, msr 0:1fc13341, primary cpu clock, resume [ 31.290110] Enabling non-boot CPUs ... [ 31.291319] Booting Node 0 Processor 1 APIC 0x1 [ 31.293720] smpboot cpu 1: start_ip = 96000 [ 31.172141] Calibrating delay loop (skipped) already calibrated this CPUcurr_cpu: 0, for cpu: 2 [ 31.330037] ------------[ cut here ]------------ [ 31.330094] WARNING: at arch/x86/kernel/smp.c:121 native_smp_send_reschedule+0x40/0x71() [ 31.330094] Hardware name: Bochs [ 31.330094] Modules linked in: lockd fcoe libfcoe 8021q libfc scsi_transport_fc scsi_tgt garp stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables e1000 pcspkr uinput i2c_piix4 i2c_core ppdev parport_pc parport microcode sunrpc floppy virtio_blk [last unloaded: scsi_wait_scan] [ 31.330094] Pid: 1016, comm: bash Tainted: G W 3.3.0-rc1+ #42 [ 31.330094] Call Trace: [ 31.330094] [] warn_slowpath_common+0x83/0x9b [ 31.330094] [] warn_slowpath_null+0x1a/0x1c [ 31.330094] [] native_smp_send_reschedule+0x40/0x71 [ 31.330094] [] trigger_load_balance+0x28e/0x2ba [ 31.330094] [] scheduler_tick+0xe9/0xf2 [ 31.330094] [] update_process_times+0x65/0x76 [ 31.330094] [] tick_sched_timer+0x75/0x9f [ 31.330094] [] __run_hrtimer+0xb1/0x155 [ 31.330094] [] ? tick_nohz_handler+0xdd/0xdd [ 31.330094] [] hrtimer_interrupt+0xe0/0x19e [ 31.330094] [] smp_apic_timer_interrupt+0x77/0x8a [ 31.330094] [] apic_timer_interrupt+0x73/0x80 [ 31.330094] [] ? retint_restore_args+0x13/0x13 [ 31.330094] [] ? arch_local_irq_restore+0x6/0xd [ 31.330094] [] _raw_spin_unlock_irqrestore+0x3d/0x3f [ 31.330094] [] workqueue_cpu_callback+0x203/0x214 [ 31.330094] [] notifier_call_chain+0x37/0x63 [ 31.330094] [] ? arch_local_irq_restore+0x6/0xd [ 31.330094] [] __raw_notifier_call_chain+0xe/0x10 [ 31.330094] [] __cpu_notify+0x20/0x32 [ 31.330094] [] _cpu_up+0xcd/0xfc [ 31.330094] [] enable_nonboot_cpus+0x57/0xc4 [ 31.330094] [] hibernation_snapshot+0x1f3/0x2a0 [ 31.330094] [] hibernate+0x97/0x198 [ 31.330094] [] state_store+0x5c/0x106 [ 31.330094] [] kobj_attr_store+0x17/0x19 [ 31.330094] [] sysfs_write_file+0x101/0x13d [ 31.330094] [] vfs_write+0xac/0xf3 [ 31.330094] [] ? trace_hardirqs_on_caller+0xf/0x24 [ 31.330094] [] sys_write+0x4a/0x6e [ 31.330094] [] system_call_fastpath+0x16/0x1b [ 31.330094] ---[ end trace bbd3f0319d3e2c44 ]--- [ 31.402829] KVM setup async PF for cpu 1 [ 31.402829] kvm-stealtime: cpu 1, msr 1fc8df00 [ 31.656224] NMI watchdog disabled (cpu1): hardware events not enabled [ 31.659240] CPU1 is up [ 31.660402] Booting Node 0 Processor 2 APIC 0x2 [ 31.661522] smpboot cpu 2: start_ip = 96000 [ 31.278595] Calibrating delay loop (skipped) already calibrated this CPU [ 31.697981] NMI watchdog disabled (cpu2): hardware events not enabled [ 31.698047] KVM setup async PF for cpu 2 [ 31.698051] kvm-stealtime: cpu 2, msr 1fd0df00 [ 31.706326] CPU2 is up however I think they are not related to subj patch.