From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: resume from S3 sleep not working in Dom0 - Xen4.2.1
Date: Tue, 5 Feb 2013 13:32:37 -0500
Message-ID: <20130205183236.GB5652@konrad-lan.dumpdata.com>
References: <510F91CA02000078000BB6AA@nat28.tlf.novell.com> <510F885B.2040603@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <510F885B.2040603@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Tomasz Wroblewski
Cc: Milan opath, Ben Guthro, Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On Mon, Feb 04, 2013 at 11:07:23AM +0100, Tomasz Wroblewski wrote:
>
> >>fix-suspend-scheduler-v2
> >>fix-suspend-scheduler-revert-affinity-part
> >>s3-timerirq
> >>
> >>All of these fixes have been proposed to the xen-devel list, but have
> >>not yet been accepted, for one reason, or another.
> >And I don't think comments on them have seen follow-ups.
> >
> >Jan
> >
> I guess it's worth bringing this up again;
>
> s3-timerirq: this was an empirical hack which for some reason is needed
> on stable 4.2 we use, but not on latest unstable, didn't really
> investigate further since it appeared fixed later on anyway..
>
> fix-suspend-scheduler/revert-affinity: the big objection here was
> the part which reverts one of the hunks in Keir's commit. I tried
> for quite a few days to find a working fix which does not do this
> revert using posted suggestions, but was not successful:
>
> - there was a crash in xen scheduler, which was fixable using your
> suggestion of masking softirqs during s3 (ugly)
> - there was also a crash in xen acpi cpufreq driver, which was
> similarly fixable using a bandaid s3 condition (ugly)
> - unfortunately this turned out to not be all, xen did not crash
> anymore at this point but dom0 kernel did around the time it enables
> cpus, in multiple places: at this point I didn't have a good
> explanation for it, my opinion of the aggravating hunk was rather low,
> so I uttered a hearty curse and stuck a revert into private
> patchqueue.
>
> The dom0 kernel crashes were as follows:
>
> 1)
>
> [ 60.657751] Enabling non-boot CPUs ...
> [ 60.657958] installing Xen timer for CPU 1
> [ 60.657987] cpu 1 spinlock event irq 279
> [ 60.658101] Disabled fast string operations
> [ 60.658466] CPU1 is up
> [ 60.658736] installing Xen timer for CPU 2
> [ 60.658784] cpu 2 spinlock event irq 285
> [ 60.659764] Disabled fast string operations
> [ 60.661811] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000018
> [ 60.661817] IP: []
> build_sched_domains+0x770/0(XEN) *** Serial input -> Xen (type
> 'CTRL-a' three times to switch input to DOM0)
>
>
>
>
> 2)
> .332997] installing Xen timer for CPU 2emory
> [ 36.333061] cpu 2 spinlock event irq 285
> [ 36.333343] Disabled fast string operations
> [ 36.334939] CPU2 is up
> [ 36.335213] installing Xen timer for CPU 3
> [ 36.335244] cpu 3 spinlock event irq 291
> [ 36.335561] Disabled fast string operations
> [ 36.337461] CPU3 is up
> [ 36.339513] ACPI: Waking up from system sleep state S3
> [ 36.350193] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000004
> [ 36.350211] IP: [] find_busiest_group+0x38a/0xbb0
> [ 36.350236] PGD 2f19067 PUD 2ec7067 PMD 0
> [ 36.350252] Oops: 0000 [#1] SMP
> [ 36.350263] CPU 1
> [
36.350267] Modules linked in: xt_mac ipt_MASQUERADE
> ebtable_filter ebtables iscsi_scst(O) xt_tcpudp scst_vdisk(O)
> xt_state crc32c xt_multiport libcrc32c iptable_filter iptable_nat
> nf_nat nf_conntrack_ipv4 nf_conntrack scst_cdrom(O) nf_defrag_ipv4
> ip_tables scst(O) x_tables bridge stp llc nls_cp437 isofs zram(C)
> snd_hda_codec_hdmi snd_hda_codec_conexant microcode arc4 psmouse
> serio_raw i915 drm_kms_helper drm iwlwifi(O) mac80211(O) cfg80211(O)
> thinkpad_acpi nvram snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
> snd_timer snd soundcore snd_page_alloc i2c_algo_bit intel_agp video
> intel_gtt tpm_tis tpm tpm_bios sdhci_pci sdhci ehci_hcd e1000e
> [ 36.350437]
> [ 36.350445] Pid: 2730, comm: bash Tainted: G C O
> 3.2.23-orc #19 LENOVO 42404EU/42404EU
> [ 36.350463] RIP: e030:[] []
> find_busiest_group+0x38a/0xbb0
> [ 36.350481] RSP: e02b:ffff880002b71228 EFLAGS: 00010046
> [ 36.350490] RAX: 0000000000000040 RBX: 0000000000000000 RCX:
> 0000000000000000
> [ 36.350500] RDX: 0000000000000000 RSI: 0000000000000040 RDI:
> 0000000000000000
> [ 36.350510] RBP: ffff880002b713b8 R08: ffff880026109f00 R09:
> 0000000000000000
> [ 36.350519] R10: 0000000000000000 R11: 0000000000000001 R12:
> 0000000000000000
> [ 36.350529] R13: ffff880026109f80 R14: ffffffffffffffff R15:
> ffff880026109f98
> [ 36.350547] FS: 00007fc41e295700(0000) GS:ffff88002dc40000(0000)
> knlGS:0000000000000000
> [ 36.350558] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 36.350566] CR2: 0000000000000004 CR3: 0000000026329000 CR4:
> 0000000000002660
> [ 36.350577] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 36.350587] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 36.350598] Process bash (pid: 2730, threadinfo ffff880002b70000,
> task ffff880027a7db40)
> [ 36.350608] Stack:
> [ 36.350613] 00ffffff00000002 0000000300000001 ffff880002b71498
> ffff880002b71534
> [ 36.350630] 00ffffff00000002 0000000100000001 ffff8800262cf000
> 0000000000000008
> [ 36.350646] ffffffff00000000 0000000000000000 0000000000000000
> ffff88002dc4e2c8
> [ 36.350662] Call Trace:
> [ 36.350677] [] load_balance+0xb8/0x840
> [ 36.350690] [] ? sched_clock+0x9/0x10
> [ 36.350706] [] ? sched_clock_cpu+0xbd/0x110
> [ 36.350718] [] ? update_shares+0xcc/0x100
> [ 36.350735] [] __schedule+0x875/0x8d0
> [ 36.350749] [] ? try_to_del_timer_sync+0x92/0x130
> [ 36.350762] [] schedule+0x3f/0x60
> [ 36.350773] [] schedule_timeout+0x16d/0x320
> [ 36.350786] [] ? usleep_range+0x50/0x50
> [ 36.350800] [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
> [ 36.350817] []
> acpi_ec_transaction_unlocked+0x134/0x1d8
> [ 36.350830] [] ? add_wait_queue+0x60/0x60
> [ 36.350842] [] acpi_ec_transaction+0x196/0x239
> [ 36.350856] [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
> [ 36.350869] [] acpi_ec_write+0x40/0x42
> [ 36.350881] [] acpi_ec_space_handler+0x9e/0xfc
> [ 36.350894] [] ? acpi_ec_burst_disable+0x3d/0x3d
> [ 36.350909] []
> acpi_ev_address_space_dispatch+0x179/0x1c8
> [ 36.350924] [] acpi_ex_access_region+0x23e/0x24b
> [ 36.350936] [] ? __sysctl_head_next+0x11c/0x130
> [ 36.350951] [] acpi_ex_field_datum_io+0xf9/0x17a
> [ 36.350965] []
> acpi_ex_write_with_update_rule+0xb5/0xc1
> [ 36.350989] [] acpi_ex_insert_into_field+0x1ef/0x211
> [ 36.351003] [] ?
> acpi_ut_allocate_object_desc_dbg+0x45/0x7f
> [ 36.351018] [] acpi_ex_write_data_to_field+0x194/0x1c2
> [ 36.351031] [] ?
> acpi_ds_init_object_from_op+0x137/0x231
> [ 36.351044] [] acpi_ex_store_object_to_node+0xa3/0xe2
> [ 36.351056] [] acpi_ex_store+0xc3/0x256
> [ 36.351066] [] acpi_ex_opcode_1A_1T_1R+0x353/0x4a5
> [ 36.351078] [] acpi_ds_exec_end_op+0xf7/0x3e7
> [ 36.351092] [] acpi_ps_parse_loop+0x7bd/0x94e
> [ 36.351105] [] acpi_ps_parse_aml+0x96/0x275
> [ 36.351119] [] acpi_ps_execute_method+0x1ce/0x276
> [ 36.351131] [] acpi_ns_evaluate+0xdf/0x1aa
> [ 36.351144] [] acpi_evaluate_object+0xfb/0x1f4
> [ 36.351156] [] acpi_device_sleep_wake+0x95/0xc7
> [ 36.351168] []
> acpi_disable_wakeup_device_power+0x6e/0xc9
> [ 36.351182] [] acpi_disable_wakeup_devices+0x7b/0x95
> [ 36.351194] [] acpi_pm_finish+0x39/0x55
> [ 36.351208] [] suspend_devices_and_enter+0x104/0x310
> [ 36.351222] [] enter_state+0x167/0x190
> [ 36.351234] [] state_store+0xb7/0x130
> [ 36.351246] [] kobj_attr_store+0xf/0x30
> [ 36.351260] [] sysfs_write_file+0xef/0x170
> [ 36.351274] [] vfs_write+0xb3/0x180
> [ 36.351286] [] sys_write+0x4a/0x90
> [ 36.351300] [] system_call_fastpath+0x16/0x1b
> [ 36.351308] Code: ff 48 8b bd a0 fe ff ff 44 88 85 78 fe ff ff e8
> 5d fb ff ff 44 0f b6 85 78 fe ff ff 0f 1f 44 00 00 49 8b 7d 10 4c 8b
> 4d 98 31 d2 <8b> 4f 04 4c 89 c8 48 c1 e0 0a 48 f7 f1 48 8b 4d a0 48
> 85 c9 48
> [ 36.351435] RIP [] find_busiest_group+0x38a/0xbb0
> [ 36.351450] RSP
> [ 36.351456] CR2: 0000000000000004
> [ 36.351465] ---[ end trace 5ad2b14b3a9050ae ]---
> [ 36.352362] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000010
> [ 36.352379] IP: [] rb_next+0x1/0x50
> [ 36.352394] PGD 0
> [ 36.352402] Oops: 0000 [#2] SMP
> [ 36.352411] CPU 1
> [ 36.352416] Modules linked in: xt_mac ipt_MASQUERADE
> ebtable_filter ebtables iscsi_scst(O) xt_tcpudp scst_vdisk(O)
> xt_state crc32c xt_multiport libcrc32c iptable_filter iptable_nat
> nf_nat nf_conntrack_ipv4 nf_conntrack scst_cdrom(O) nf_defrag_ipv4
> ip_tables scst(O) x_tables bridge stp llc nls_cp437 isofs zram(C)
> snd_hda_codec_hdmi
snd_hda_codec_conexant microcode arc4 psmouse
> serio_raw i915 drm_kms_helper drm iwlwifi(O) mac80211(O) cfg80211(O)
> thinkpad_acpi nvram snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
> snd_timer snd soundcore snd_page_alloc i2c_algo_bit intel_agp video
> intel_gtt tpm_tis tpm tpm_bios sdhci_pci sdhci ehci_hcd e1000e
> [ 36.352573]
> [ 36.352580] Pid: 2730, comm: bash Tainted: G D C O
> 3.2.23-orc #19 LENOVO 42404EU/42404EU
> [ 36.352596] RIP: e030:[] [
>
>
>
> 3)
>
> [ 47.833362] Resuming Xen processor info
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> (XEN) microcode: collect_cpu_info : sig=0x206a6, pf=0x10, rev=0x28
> [ 47.886297] Enabling non-boot CPUs ...
> [ 47.890082] installing Xen timer for CPU 1
> [ 47.894257] cpu 1 spinlock event irq 48
> [ 47.899013] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [ 47.906740] IP: [] __cpuidle_register_device+0x2b/0x100
> [ 47.913578] PGD 34a4067 PUD 3ac3067 PMD 0
> [ 47.917825] Oops: 0000 [#1] SMP
> [ 47.921108] Modules linked in: ipt_MASQUERADE ebtable_filter ebtables iscsi_scst(O) xt_tcpudp xt_state xt_multiport iptable_filter scst_vdisk(O) iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack scst_cdrom(O) ip_tables scst(O) x_tables nls_cp437 isofs bridge stp llc zram(C) zsmalloc(C) hid_generic usbhid hid coretemp crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode psmouse serio_raw arc4 iwldvm mac80211 i915 drm_kms_helper drm iwlwifi intel_agp i2c_algo_bit cfg80211 intel_gtt video ahci libahci e1000e [last unloaded: tpm_bios]
> [ 47.974636] CPU 0
> [ 47.976456] Pid: 2468, comm: pm-suspend Tainted: G C O 3.8.0-orc #19 Intel Corporation SandyBridge Platform/Emerald Lake
> [ 47.988310] RIP: e030:[] [] __cpuidle_register_device+0x2b/0x100
> [ 47.997605] RSP: e02b:ffff880025685c98 EFLAGS: 00010286
> [ 48.002970] RAX: 0000000000000000 RBX: ffff88002de40000 RCX: 0000000000000000
> [ 48.010154] RDX: ffff880025685fd8 RSI: 0000000000000007 RDI: ffff88002de40000
> [ 48.017336] RBP: ffff880025685cb8 R08: 0000000000021120 R09: 0000000000000000
> [ 48.024520] R10: 0000000000000030 R11: 0000000000000000 R12: ffff88002de40000
> [ 48.031742] R13: 00000000ffffffde R14: 00000000ffffffea R15: 0000000000000000
> [ 48.038927] FS: 00007fb599d0e700(0000) GS:ffff88002de00000(0000) knlGS:0000000000000000
> [ 48.047060] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 48.052859] CR2: 0000000000000008 CR3: 000000000345b000 CR4: 0000000000002660
> [ 48.060043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 48.067223] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
> [ 48.074450] Process pm-suspend (pid: 2468, threadinfo ffff880025684000, task ffff880003558000)
> [ 48.083102] Stack:
> [ 48.085179] ffff88002de40000 ffff88002de40000 00000000ffffffde ffffffff81a6b480
> [ 48.092622] ffff880025685cd8 ffffffff81491cc1 0000000000000001 ffff88002de40000
> [ 48.100064] ffff880025685cf8 ffffffff813046df 0000000000000001 0000000000000001
> [ 48.107517] Call Trace:
> [ 48.110029] [] cpuidle_register_device+0x31/0x80
> [ 48.116348] [] intel_idle_cpu_init+0xbf/0x120
> [ 48.122423] [] cpu_hotplug_notify+0x70/0x80
> [ 48.128310] [] notifier_call_chain+0x4d/0x70
> [ 48.134281] [] __raw_notifier_call_chain+0xe/0x10
> [ 48.140686] [] __cpu_notify+0x20/0x40
> [ 48.146050] [] _cpu_up+0xf1/0x138
> [ 48.151070] [] enable_nonboot_cpus+0x99/0xd0
> [ 48.157090] [] suspend_devices_and_enter+0x25d/0x330
> [ 48.163752] [] pm_suspend+0x18f/0x1f0
> [ 48.169117] [] state_store+0x8a/0x100
> [ 48.174483] [] kobj_attr_store+0xf/0x30
> [ 48.180022] [] sysfs_write_file+0xef/0x170
> [ 48.185943] [] vfs_write+0xb3/0x180
> [ 48.191056] [] sys_write+0x52/0xa0
> [ 48.196160] [] ? do_page_fault+0xe/0x10
> [ 48.201700] [] system_call_fastpath+0x16/0x1b
> [ 48.207758] Code: 66 66 66 66 90 55 48 89 e5 48 83 ec 20 48 89 5d e0 4c 89 6d f0 48 89 fb 4c 89 75 f8 4c 89 65 e8 41 be ea ff ff ff e8 75 0a 00 00<48> 8b 78 08 49 89 c5 e8 19 80 c1 ff 84 c0 74 53 8b 43 04 49 c7
> [ 48.226658] RIP [] __cpuidle_register_device+0x2b/0x100

Hm, that is suspect. There should not be any cpuidle_register call here?
Perhaps you are .. ah yes, you are hitting a bug whose fix should be in the
stable tree. Here are the git commits:
b88a634a903d9670aa5f2f785aa890628ce0dece and
6f8c2e7933679f54b6478945dc72e59ef9a3d5e0

> [ 48.233582] RSP
> [ 48.237131] CR2: 0000000000000008
>
> [ 48.240521] ---[ end trace 535ebe28cd06b143 ]---
>
>
>
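
For comparing the crashes quoted above side by side, here is a minimal sketch that pulls the fault address and the faulting function out of quoted oops text. It is not part of any Xen or kernel tooling: the names `unquote`, `summarize_oopses` and `SAMPLE` are made up for illustration, and the regexes assume the x86 "BUG: unable to handle kernel NULL pointer dereference at ..." / "IP: [...] func+0xoff/0xsize" wording seen in these logs.

```python
import re

# Assumed oops wording, taken from the logs in this thread. The bracketed
# address after "IP:" may be empty (as in this archive) or hold "<ffff...>".
_BUG_RE = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference\s+at\s+([0-9a-f]+)"
)
_IP_RE = re.compile(r"IP:\s*\[[^\]]*\]\s*(\w+)\+0x([0-9a-f]+)")


def unquote(text):
    """Strip leading '>' quote markers and rejoin mail-wrapped lines."""
    parts = []
    for line in text.splitlines():
        parts.append(re.sub(r"^\s*(>\s?)+", "", line).strip())
    return " ".join(p for p in parts if p)


def summarize_oopses(text):
    """Return a list of (fault_address, function, offset) tuples.

    function and offset are None when no "IP:" line follows the BUG line
    before the next BUG line starts.
    """
    flat = unquote(text)
    bugs = list(_BUG_RE.finditer(flat))
    results = []
    for i, bug in enumerate(bugs):
        # Only look for the IP line between this BUG line and the next one.
        end = bugs[i + 1].start() if i + 1 < len(bugs) else len(flat)
        ip = _IP_RE.search(flat, bug.end(), end)
        if ip:
            func, offset = ip.group(1), int(ip.group(2), 16)
        else:
            func, offset = None, None
        results.append((int(bug.group(1), 16), func, offset))
    return results


# Lines copied from crashes 1) and 2) in the quoted mail.
SAMPLE = """\
> [ 60.661811] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000018
> [ 60.661817] IP: []
> build_sched_domains+0x770/0(XEN) *** Serial input -> Xen (type
> [ 36.350193] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000004
> [ 36.350211] IP: [] find_busiest_group+0x38a/0xbb0
"""
```

Calling `summarize_oopses(SAMPLE)` gives one tuple per oops. The fault addresses in this thread (0x18, 0x4, 0x10, 0x8) are all small, which is what one would expect from reading a structure member through a NULL pointer.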