* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-01 14:46 ` Jiri Slaby
@ 2010-06-01 15:26 ` Don Zickus
2010-06-01 15:26 ` Don Zickus
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-06-01 15:26 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> On 06/01/2010 03:50 PM, Don Zickus wrote:
> > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> >> Hi,
> >>
> >> with -next I get the following errors while trying to hibernate in
> >> qemu-kvm after the image is stored on disk:
> >
> > Is this the host that is hibernating or the guest?
>
> Guest.
>
> > KVM guests don't emulate the performance counters, so the nmi piece
> > shouldn't be functioning and the soft lockup piece just sits on top of an
> > hrtimer, so off the top of my head it is hard to imagine it intefering
> > with a sata driver.
> >
> > I'll need your whole boot up log to see how the lockup detector
> > initialized itself.
>
> Here it goes (till the moment it dies on the ata errors):
I think I was able to reproduce over here. I guess I didn't test the
suspend/resume path very well and the cpu transitions states aren't what I
expected. It looks as though I might have screwed up the hrtimer stuff
and caused the box to hang when cpu1 when boots up in single cpu mode.
I'll see if I can get something out to test with later today.
Cheers,
Don
>
> Linux version 2.6.34-mm1_64+ (ku@bellona) (gcc version 4.5.0 20100420
> [gcc-4_5-branch revision 158562] (SUSE Linux) ) #1209 SMP Thu May 27
> 22:57:37 CEST 2010
> Command line: root=/dev/sda1 resume=/dev/sdb debug no_console_suspend
> console=ttyS0,115200 console=tty0 printk.time=1 2
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
> BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000003e7fb000 (usable)
> BIOS-e820: 000000003e7fb000 - 000000003e800000 (reserved)
> BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
> NX (Execute Disable) protection: active
> DMI 2.4 present.
> e820 update range: 0000000000000000 - 0000000000001000 (usable) ==>
> (reserved)
> e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
> No AGP bridge found
> last_pfn = 0x3e7fb max_arch_pfn = 0x400000000
> MTRR default type: write-back
> MTRR fixed ranges enabled:
> 00000-9FFFF write-back
> A0000-BFFFF uncachable
> C0000-FFFFF write-protect
> MTRR variable ranges enabled:
> 0 base 00E0000000 mask FFE0000000 uncachable
> 1 disabled
> 2 disabled
> 3 disabled
> 4 disabled
> 5 disabled
> 6 disabled
> 7 disabled
> PAT not supported by CPU.
> initial memory mapped : 0 - 20000000
> init_memory_mapping: 0000000000000000-000000003e7fb000
> 0000000000 - 003e600000 page 2M
> 003e600000 - 003e7fb000 page 4k
> kernel direct mapping tables up to 3e7fb000 @ 8000-b000
> RAMDISK: 3df2f000 - 3e7f0000
> ACPI: RSDP 00000000000f84c0 00014 (v00 BOCHS )
> ACPI: RSDT 000000003e7fde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC
> 00000001)
> ACPI: FACP 000000003e7ffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC
> 00000001)
> ACPI: DSDT 000000003e7fdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL
> 20090123)
> ACPI: FACS 000000003e7ffe00 00040
> ACPI: SSDT 000000003e7fdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC
> 00000001)
> ACPI: APIC 000000003e7fde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC
> 00000001)
> ACPI: HPET 000000003e7fde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC
> 00000001)
> ACPI: Local APIC address 0xfee00000
> (8/32 early reservations) ==> bootmem [0000000000 - 003e7fb000]
> #0 [0001000000 - 0001989354] TEXT DATA BSS ==> [0001000000 -
> 0001989354]
> #1 [003df2f000 - 003e7f0000] RAMDISK ==> [003df2f000 -
> 003e7f0000]
> #2 [000009f000 - 0000100000] BIOS reserved ==> [000009f000 -
> 0000100000]
> #3 [000198a000 - 000198a049] BRK ==> [000198a000 -
> 000198a049]
> #4 [0000001000 - 0000003000] TRAMPOLINE ==> [0000001000 -
> 0000003000]
> #5 [0000003000 - 0000007000] ACPI WAKEUP ==> [0000003000 -
> 0000007000]
> #6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 -
> 0000009000]
> #7 [0000009000 - 0000011000] BOOTMAP ==> [0000009000 -
> 0000011000]
> [ffffea0000000000-ffffea0000dfffff] PMD ->
> [ffff880002200000-ffff880002ffffff] on node 0
> sizeof(struct page) = 56
> Zone PFN ranges:
> DMA 0x00000001 -> 0x00001000
> DMA32 0x00001000 -> 0x00100000
> Normal empty
> Movable zone start PFN for each node
> early_node_map[2] active PFN ranges
> 0: 0x00000001 -> 0x0000009f
> 0: 0x00000100 -> 0x0003e7fb
> On node 0 totalpages: 255897
> DMA zone: 56 pages used for memmap
> DMA zone: 112 pages reserved
> DMA zone: 3830 pages, LIFO batch:0
> DMA32 zone: 3444 pages used for memmap
> DMA32 zone: 248455 pages, LIFO batch:31
> ACPI: PM-Timer IO Port: 0xb008
> ACPI: Local APIC address 0xfee00000
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> ACPI: IRQ0 used by override.
> ACPI: IRQ2 used by override.
> ACPI: IRQ5 used by override.
> ACPI: IRQ9 used by override.
> ACPI: IRQ10 used by override.
> ACPI: IRQ11 used by override.
> Using ACPI (MADT) for SMP configuration information
> ACPI: HPET id: 0x8086a201 base: 0xfed00000
> SMP: Allowing 2 CPUs, 0 hotplug CPUs
> nr_irqs_gsi: 40
> PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
> PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
> PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
> Allocating PCI resources starting at 3e800000 (gap: 3e800000:c17bc000)
> setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:2 nr_node_ids:1
> PERCPU: Embedded 26 pages/cpu @ffff880001a00000 s76352 r8192 d21952 u1048576
> pcpu-alloc: s76352 r8192 d21952 u1048576 alloc=1*2097152
> pcpu-alloc: [0] 0 1
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 252285
> Kernel command line: root=/dev/sda1 resume=/dev/sdb debug
> no_console_suspend console=ttyS0,115200 console=tty0 printk.time=1 2
> PID hash table entries: 4096 (order: 3, 32768 bytes)
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
> Checking aperture...
> No AGP bridge found
> Memory: 988644k/1023980k available (5634k kernel code, 392k absent,
> 34944k reserved, 3013k data, 424k init)
> SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> Hierarchical RCU implementation.
> RCU-based detection of stalled CPUs is disabled.
> Verbose stalled-CPUs detection is disabled.
> NR_IRQS:4352 nr_irqs:512
> Console: colour VGA+ 80x25
> console [tty0] enabled
> console [ttyS0] enabled
> allocated 10485760 bytes of page_cgroup
> please try 'cgroup_disable=memory' option if you don't want memory cgroups
> hpet clockevent registered
> Fast TSC calibration using PIT
> Detected 2990.792 MHz processor.
> Calibrating delay loop (skipped), value calculated using timer
> frequency.. 5981.58 BogoMIPS (lpj=11963168)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 256
> Initializing cgroup subsys debug
> Initializing cgroup subsys ns
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys memory
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> mce: CPU supports 10 MCE banks
> Performance Events: unsupported p6 CPU model 2 no PMU driver, software
> events only.
> ACPI: Core revision 20100428
> Setting APIC routing to flat
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03
> NMI watchdog failed to create perf event on cpu0: ffffffffffffffed
> Booting Node 0, Processors #1 Ok.
> NMI watchdog failed to create perf event on cpu1: ffffffffffffffed
> Brought up 2 CPUs
> Total of 2 processors activated (11949.21 BogoMIPS).
> devtmpfs: initialized
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using configuration type 1 for base access
> mtrr: your CPUs had inconsistent variable MTRR settings
> mtrr: your CPUs had inconsistent MTRRdefType settings
> mtrr: probably your BIOS does not setup all CPUs.
> mtrr: corrected configuration.
> bio: create slab <bio-0> at 0
> ACPI: EC: Look up EC in DSDT
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOAPIC for interrupt routing
> ACPI: No dock devices found.
> PCI: Ignoring host bridge windows from ACPI; if necessary, use
> "pci=use_crs" and report a bug
> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
> pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> (ignored)
> pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff]
> (ignored)
> pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
> pci 0000:00:01.2: reg 20: [io 0xc020-0xc03f]
> pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
> pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
> pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
> pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
> pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
> pci 0000:00:03.0: reg 10: [mem 0xf2020000-0xf203ffff]
> pci 0000:00:03.0: reg 14: [io 0xc040-0xc07f]
> pci 0000:00:03.0: reg 30: [mem 0xf2040000-0xf205ffff pref]
> pci_bus 0000:00: on NUMA node 0
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> vgaarb: loaded
> SCSI subsystem initialized
> libata version 3.00 loaded.
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> Advanced Linux Sound Architecture Driver Version 1.0.23.
> PCI: Using ACPI for IRQ routing
> PCI: pci_cache_line_size set to 64 bytes
> reserve RAM buffer: 000000000009f000 - 000000000009ffff
> reserve RAM buffer: 000000003e7fb000 - 000000003fffffff
> cfg80211: Calling CRDA to update world regulatory domain
> HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 comparators, 64-bit 100.000000 MHz counter
> Switching to clocksource tsc
> Slow work thread pool: Starting up
> Slow work thread pool: Ready
> FS-Cache: Loaded
> CacheFiles: Loaded
> pnp: PnP ACPI init
> ACPI: bus type pnp registered
> pnp: PnP ACPI: found 8 devices
> ACPI: ACPI bus type pnp unregistered
> pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
> pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
> TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
> TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> UDP hash table entries: 512 (order: 2, 16384 bytes)
> UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
> NET: Registered protocol family 1
> RPC: Registered udp transport module.
> RPC: Registered tcp transport module.
> RPC: Registered tcp NFSv4.1 backchannel transport module.
> pci 0000:00:00.0: Limiting direct PCI/PCI transfers
> pci 0000:00:01.0: PIIX3: Enabling Passive Release
> pci 0000:00:01.0: Activating ISA DMA hang workarounds
> pci 0000:00:02.0: Boot video device
> PCI: CLS 0 bytes, default 64
> Unpacking initramfs...
> Freeing initrd memory: 8964k freed
> kvm: no hardware support
> fuse init (API version 7.13)
> msgmni has been set to 1948
> alg: No test for stdrng (krng)
> io scheduler noop registered
> io scheduler cfq registered (default)
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> ACPI: Power Button [PWRF]
> Linux agpgart interface v0.103
> Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin
> is 60 seconds).
> Hangcheck: Using getrawmonotonic().
> [drm] Initialized drm 1.1.0 20060810
> [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
> Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> parport_pc 00:05: reported by Plug and Play ACPI
> parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
> FDC 0 is a S82078B
> loop: module loaded
> ata_piix 0000:00:01.1: version 2.13
> ata_piix 0000:00:01.1: setting latency timer to 64
> scsi0 : ata_piix
> scsi1 : ata_piix
> ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
> e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k6-NAPI
> e1000: Copyright (c) 1999-2006 Intel Corporation.
> ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
> e1000 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) ->
> IRQ 11
> e1000 0000:00:03.0: setting latency timer to 64
> ata1.00: ATA-7: QEMU HARDDISK, 0.12.3, max UDMA/100
> ata1.00: 20971520 sectors, multi 16: LBA48
> ata1.01: ATA-7: QEMU HARDDISK, 0.12.3, max UDMA/100
> ata1.01: 1433602 sectors, multi 16: LBA48
> ata2.01: NODEV after polling detection
> ata2.00: ATAPI: QEMU DVD-ROM, 0.12.3, max UDMA/100
> ata2.00: configured for MWDMA2
> ata1.00: configured for MWDMA2
> ata1.01: configured for MWDMA2
> scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK 0.12 PQ: 0 ANSI: 5
> scsi 0:0:1:0: Direct-Access ATA QEMU HARDDISK 0.12 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
> scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 0.12 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 0:0:1:0: [sdb] 1433602 512-byte logical blocks: (734 MB/700 MiB)
> sda:sr0: scsi3-mmc drive: 4x/4x xa/form2 tray
> Uniform CD-ROM driver Revision: 3.20
> sda1
> sd 0:0:1:0: [sdb] Write Protect is off
> sr 1:0:0:0: Attached scsi CD-ROM sr0
> sd 0:0:0:0: [sda] Attached SCSI disk
> sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
> sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> sdb: unknown partition table
> sd 0:0:1:0: [sdb] Attached SCSI disk
> e1000 0000:00:03.0: eth0: (PCI:33MHz:32-bit)
> e1000 0000:00:03.0: eth0: 52:54:00:12:34:56
> e1000 0000:00:03.0: eth0: Intel(R) PRO/1000 Network Connection
> e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k4
> e1000e: Copyright (c) 1999 - 2009 Intel Corporation.
> tun: Universal TUN/TAP device driver, 1.6
> tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> console [netcon0] enabled
> netconsole: network logging started
> ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> uhci_hcd: USB Universal Host Controller Interface driver
> ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
> uhci_hcd 0000:00:01.2: PCI INT D -> Link[LNKD] -> GSI 10 (level, high)
> -> IRQ 10
> uhci_hcd 0000:00:01.2: setting latency timer to 64
> uhci_hcd 0000:00:01.2: UHCI Host Controller
> uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
> uhci_hcd 0000:00:01.2: irq 10, io base 0x0000c020
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 2 ports detected
> PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> serio: i8042 KBD port at 0x60,0x64 irq 1
> serio: i8042 AUX port at 0x60,0x64 irq 12
> mice: PS/2 mouse device common for all mice
> input: AT Translated Set 2 keyboard as
> /devices/platform/i8042/serio0/input/input1
> rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
> rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
> i2c /dev entries driver
> coretemp: CPU (model=0x2) has no thermal sensor.
> coretemp: CPU (model=0x2) has no thermal sensor.
> md: raid0 personality registered for level 0
> md: raid1 personality registered for level 1
> device-mapper: ioctl: 4.17.0-ioctl (2010-03-05) initialised:
> dm-devel@redhat.com
> cpuidle: using governor ladder
> cpuidle: using governor menu
> Registered led device: input::numl
> Registered led device: input::capsl
> Registered led device: input::scrolll
> Registered led device: input1::numl
> Registered led device: input1::capsl
> Registered led device: input1::scrolll
> usbcore: registered new interface driver usbhid
> usbhid: USB HID core driver
> ALSA device list:
> No soundcards found.
> TCP cubic registered
> NET: Registered protocol family 10
> NET: Registered protocol family 17
> registered taskstats version 1
> rtc_cmos 00:01: setting system clock to 2010-06-01 14:42:57 UTC (1275403377)
> Freeing unused kernel memory: 424k freed
> Write protecting the kernel read-only data: 8192k
> Freeing unused kernel memory: 488k freed
> Freeing unused kernel memory: 328k freed
> udev: starting version 154
> PM: Starting manual resume from disk
> EXT4-fs (sda1): (no)acl options not supported
> EXT4-fs (sda1): mounted filesystem with ordered data mode
> EXT4-fs (sda1): (no)acl options not supported
> udev: starting version 154
> Adding 716796k swap on /dev/sdb. Priority:-1 extents:1 across:716796k
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> PM: Preallocating image memory... done (allocated 28778 pages)
> PM: Allocated 115112 kbytes in 0.03 seconds (3837.06 MB/s)
> e1000 0000:00:03.0: PCI INT A disabled
> PM: freeze of devices complete after 54.045 msecs
> PM: late freeze of devices complete after 0.614 msecs
> ACPI: Preparing to enter system sleep state S4
> PM: Saving platform NVS memory
> Disabling non-boot CPUs ...
> Broke affinity for irq 14
> CPU 1 is now offline
> SMP alternatives: switching to UP code
> PM: Creating hibernation image:
> PM: Need to copy 28341 pages
> PM: Hibernation image created (28341 pages copied)
> Enabling non-boot CPUs ...
> SMP alternatives: switching to SMP code
> Booting Node 0 Processor 1 APIC 0x1
> NMI watchdog failed to create perf event on cpu1: ffffffffffffffed
> CPU1 is up
> ACPI: Waking up from system sleep state S4
> PM: early thaw of devices complete after 0.025 msecs
> ata_piix 0000:00:01.1: restoring config space at offset 0x1 (was
> 0x2800003, writing 0x2800007)
> e1000 0000:00:03.0: restoring config space at offset 0x1 (was 0x100003,
> writing 0x100007)
> e1000 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) ->
> IRQ 11
> e1000 0000:00:03.0: setting latency timer to 64
> ata_piix 0000:00:01.1: setting latency timer to 64
> sd 0:0:0:0: [sda] Starting disk
> sd 0:0:1:0: [sdb] Starting disk
> e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ata1.00: configured for MWDMA2
> ata1.01: configured for MWDMA2
> PM: thaw of devices complete after 166.297 msecs
> PM: Saving image data pages (28397 pages) ... done
> PM: Wrote 113588 kbytes in 2.95 seconds (38.50 MB/s)
> ata2.01: qc timeout (cmd 0xec)
> ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
> sd 0:0:1:0: [sdb] Stopping disk
> sd 0:0:0:0: [sda] Stopping disk
> uhci_hcd 0000:00:01.2: PCI INT D disabled
> e1000 0000:00:03.0: PCI INT A disabled
> ata2.01: qc timeout (cmd 0xec)
> ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
>
>
> --
> js
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-01 14:46 ` Jiri Slaby
2010-06-01 15:26 ` Don Zickus
@ 2010-06-01 15:26 ` Don Zickus
2010-06-02 18:44 ` Don Zickus
2010-06-02 18:44 ` Don Zickus
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-06-01 15:26 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> On 06/01/2010 03:50 PM, Don Zickus wrote:
> > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> >> Hi,
> >>
> >> with -next I get the following errors while trying to hibernate in
> >> qemu-kvm after the image is stored on disk:
> >
> > Is this the host that is hibernating or the guest?
>
> Guest.
>
> > KVM guests don't emulate the performance counters, so the nmi piece
> > shouldn't be functioning and the soft lockup piece just sits on top of an
> > hrtimer, so off the top of my head it is hard to imagine it intefering
> > with a sata driver.
> >
> > I'll need your whole boot up log to see how the lockup detector
> > initialized itself.
>
> Here it goes (till the moment it dies on the ata errors):
I think I was able to reproduce over here. I guess I didn't test the
suspend/resume path very well and the cpu transitions states aren't what I
expected. It looks as though I might have screwed up the hrtimer stuff
and caused the box to hang when cpu1 when boots up in single cpu mode.
I'll see if I can get something out to test with later today.
Cheers,
Don
>
> Linux version 2.6.34-mm1_64+ (ku@bellona) (gcc version 4.5.0 20100420
> [gcc-4_5-branch revision 158562] (SUSE Linux) ) #1209 SMP Thu May 27
> 22:57:37 CEST 2010
> Command line: root=/dev/sda1 resume=/dev/sdb debug no_console_suspend
> console=ttyS0,115200 console=tty0 printk.time=1 2
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
> BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000003e7fb000 (usable)
> BIOS-e820: 000000003e7fb000 - 000000003e800000 (reserved)
> BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
> NX (Execute Disable) protection: active
> DMI 2.4 present.
> e820 update range: 0000000000000000 - 0000000000001000 (usable) ==>
> (reserved)
> e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
> No AGP bridge found
> last_pfn = 0x3e7fb max_arch_pfn = 0x400000000
> MTRR default type: write-back
> MTRR fixed ranges enabled:
> 00000-9FFFF write-back
> A0000-BFFFF uncachable
> C0000-FFFFF write-protect
> MTRR variable ranges enabled:
> 0 base 00E0000000 mask FFE0000000 uncachable
> 1 disabled
> 2 disabled
> 3 disabled
> 4 disabled
> 5 disabled
> 6 disabled
> 7 disabled
> PAT not supported by CPU.
> initial memory mapped : 0 - 20000000
> init_memory_mapping: 0000000000000000-000000003e7fb000
> 0000000000 - 003e600000 page 2M
> 003e600000 - 003e7fb000 page 4k
> kernel direct mapping tables up to 3e7fb000 @ 8000-b000
> RAMDISK: 3df2f000 - 3e7f0000
> ACPI: RSDP 00000000000f84c0 00014 (v00 BOCHS )
> ACPI: RSDT 000000003e7fde10 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC
> 00000001)
> ACPI: FACP 000000003e7ffe40 00074 (v01 BOCHS BXPCFACP 00000001 BXPC
> 00000001)
> ACPI: DSDT 000000003e7fdfd0 01E22 (v01 BXPC BXDSDT 00000001 INTL
> 20090123)
> ACPI: FACS 000000003e7ffe00 00040
> ACPI: SSDT 000000003e7fdf80 00044 (v01 BOCHS BXPCSSDT 00000001 BXPC
> 00000001)
> ACPI: APIC 000000003e7fde90 0007A (v01 BOCHS BXPCAPIC 00000001 BXPC
> 00000001)
> ACPI: HPET 000000003e7fde50 00038 (v01 BOCHS BXPCHPET 00000001 BXPC
> 00000001)
> ACPI: Local APIC address 0xfee00000
> (8/32 early reservations) ==> bootmem [0000000000 - 003e7fb000]
> #0 [0001000000 - 0001989354] TEXT DATA BSS ==> [0001000000 -
> 0001989354]
> #1 [003df2f000 - 003e7f0000] RAMDISK ==> [003df2f000 -
> 003e7f0000]
> #2 [000009f000 - 0000100000] BIOS reserved ==> [000009f000 -
> 0000100000]
> #3 [000198a000 - 000198a049] BRK ==> [000198a000 -
> 000198a049]
> #4 [0000001000 - 0000003000] TRAMPOLINE ==> [0000001000 -
> 0000003000]
> #5 [0000003000 - 0000007000] ACPI WAKEUP ==> [0000003000 -
> 0000007000]
> #6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 -
> 0000009000]
> #7 [0000009000 - 0000011000] BOOTMAP ==> [0000009000 -
> 0000011000]
> [ffffea0000000000-ffffea0000dfffff] PMD ->
> [ffff880002200000-ffff880002ffffff] on node 0
> sizeof(struct page) = 56
> Zone PFN ranges:
> DMA 0x00000001 -> 0x00001000
> DMA32 0x00001000 -> 0x00100000
> Normal empty
> Movable zone start PFN for each node
> early_node_map[2] active PFN ranges
> 0: 0x00000001 -> 0x0000009f
> 0: 0x00000100 -> 0x0003e7fb
> On node 0 totalpages: 255897
> DMA zone: 56 pages used for memmap
> DMA zone: 112 pages reserved
> DMA zone: 3830 pages, LIFO batch:0
> DMA32 zone: 3444 pages used for memmap
> DMA32 zone: 248455 pages, LIFO batch:31
> ACPI: PM-Timer IO Port: 0xb008
> ACPI: Local APIC address 0xfee00000
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> ACPI: IRQ0 used by override.
> ACPI: IRQ2 used by override.
> ACPI: IRQ5 used by override.
> ACPI: IRQ9 used by override.
> ACPI: IRQ10 used by override.
> ACPI: IRQ11 used by override.
> Using ACPI (MADT) for SMP configuration information
> ACPI: HPET id: 0x8086a201 base: 0xfed00000
> SMP: Allowing 2 CPUs, 0 hotplug CPUs
> nr_irqs_gsi: 40
> PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
> PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
> PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
> Allocating PCI resources starting at 3e800000 (gap: 3e800000:c17bc000)
> setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:2 nr_node_ids:1
> PERCPU: Embedded 26 pages/cpu @ffff880001a00000 s76352 r8192 d21952 u1048576
> pcpu-alloc: s76352 r8192 d21952 u1048576 alloc=1*2097152
> pcpu-alloc: [0] 0 1
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 252285
> Kernel command line: root=/dev/sda1 resume=/dev/sdb debug
> no_console_suspend console=ttyS0,115200 console=tty0 printk.time=1 2
> PID hash table entries: 4096 (order: 3, 32768 bytes)
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
> Checking aperture...
> No AGP bridge found
> Memory: 988644k/1023980k available (5634k kernel code, 392k absent,
> 34944k reserved, 3013k data, 424k init)
> SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> Hierarchical RCU implementation.
> RCU-based detection of stalled CPUs is disabled.
> Verbose stalled-CPUs detection is disabled.
> NR_IRQS:4352 nr_irqs:512
> Console: colour VGA+ 80x25
> console [tty0] enabled
> console [ttyS0] enabled
> allocated 10485760 bytes of page_cgroup
> please try 'cgroup_disable=memory' option if you don't want memory cgroups
> hpet clockevent registered
> Fast TSC calibration using PIT
> Detected 2990.792 MHz processor.
> Calibrating delay loop (skipped), value calculated using timer
> frequency.. 5981.58 BogoMIPS (lpj=11963168)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 256
> Initializing cgroup subsys debug
> Initializing cgroup subsys ns
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys memory
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> mce: CPU supports 10 MCE banks
> Performance Events: unsupported p6 CPU model 2 no PMU driver, software
> events only.
> ACPI: Core revision 20100428
> Setting APIC routing to flat
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> CPU0: Intel QEMU Virtual CPU version 0.12.3 stepping 03
> NMI watchdog failed to create perf event on cpu0: ffffffffffffffed
> Booting Node 0, Processors #1 Ok.
> NMI watchdog failed to create perf event on cpu1: ffffffffffffffed
> Brought up 2 CPUs
> Total of 2 processors activated (11949.21 BogoMIPS).
> devtmpfs: initialized
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using configuration type 1 for base access
> mtrr: your CPUs had inconsistent variable MTRR settings
> mtrr: your CPUs had inconsistent MTRRdefType settings
> mtrr: probably your BIOS does not setup all CPUs.
> mtrr: corrected configuration.
> bio: create slab <bio-0> at 0
> ACPI: EC: Look up EC in DSDT
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOAPIC for interrupt routing
> ACPI: No dock devices found.
> PCI: Ignoring host bridge windows from ACPI; if necessary, use
> "pci=use_crs" and report a bug
> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
> pci_root PNP0A03:00: host bridge window [io 0x0d00-0xffff] (ignored)
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> (ignored)
> pci_root PNP0A03:00: host bridge window [mem 0xe0000000-0xfebfffff]
> (ignored)
> pci 0000:00:01.1: reg 20: [io 0xc000-0xc00f]
> pci 0000:00:01.2: reg 20: [io 0xc020-0xc03f]
> pci 0000:00:01.3: quirk: [io 0xb000-0xb03f] claimed by PIIX4 ACPI
> pci 0000:00:01.3: quirk: [io 0xb100-0xb10f] claimed by PIIX4 SMB
> pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
> pci 0000:00:02.0: reg 14: [mem 0xf2000000-0xf2000fff]
> pci 0000:00:02.0: reg 30: [mem 0xf2010000-0xf201ffff pref]
> pci 0000:00:03.0: reg 10: [mem 0xf2020000-0xf203ffff]
> pci 0000:00:03.0: reg 14: [io 0xc040-0xc07f]
> pci 0000:00:03.0: reg 30: [mem 0xf2040000-0xf205ffff pref]
> pci_bus 0000:00: on NUMA node 0
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
> vgaarb: loaded
> SCSI subsystem initialized
> libata version 3.00 loaded.
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> Advanced Linux Sound Architecture Driver Version 1.0.23.
> PCI: Using ACPI for IRQ routing
> PCI: pci_cache_line_size set to 64 bytes
> reserve RAM buffer: 000000000009f000 - 000000000009ffff
> reserve RAM buffer: 000000003e7fb000 - 000000003fffffff
> cfg80211: Calling CRDA to update world regulatory domain
> HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> hpet0: 3 comparators, 64-bit 100.000000 MHz counter
> Switching to clocksource tsc
> Slow work thread pool: Starting up
> Slow work thread pool: Ready
> FS-Cache: Loaded
> CacheFiles: Loaded
> pnp: PnP ACPI init
> ACPI: bus type pnp registered
> pnp: PnP ACPI: found 8 devices
> ACPI: ACPI bus type pnp unregistered
> pci_bus 0000:00: resource 0 [io 0x0000-0xffff]
> pci_bus 0000:00: resource 1 [mem 0x00000000-0xffffffffffffffff]
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
> TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
> TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> UDP hash table entries: 512 (order: 2, 16384 bytes)
> UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
> NET: Registered protocol family 1
> RPC: Registered udp transport module.
> RPC: Registered tcp transport module.
> RPC: Registered tcp NFSv4.1 backchannel transport module.
> pci 0000:00:00.0: Limiting direct PCI/PCI transfers
> pci 0000:00:01.0: PIIX3: Enabling Passive Release
> pci 0000:00:01.0: Activating ISA DMA hang workarounds
> pci 0000:00:02.0: Boot video device
> PCI: CLS 0 bytes, default 64
> Unpacking initramfs...
> Freeing initrd memory: 8964k freed
> kvm: no hardware support
> fuse init (API version 7.13)
> msgmni has been set to 1948
> alg: No test for stdrng (krng)
> io scheduler noop registered
> io scheduler cfq registered (default)
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> ACPI: Power Button [PWRF]
> Linux agpgart interface v0.103
> Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin
> is 60 seconds).
> Hangcheck: Using getrawmonotonic().
> [drm] Initialized drm 1.1.0 20060810
> [drm:i915_init] *ERROR* drm/i915 can't work without intel_agp module!
> Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> parport_pc 00:05: reported by Plug and Play ACPI
> parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
> FDC 0 is a S82078B
> loop: module loaded
> ata_piix 0000:00:01.1: version 2.13
> ata_piix 0000:00:01.1: setting latency timer to 64
> scsi0 : ata_piix
> scsi1 : ata_piix
> ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
> ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
> e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k6-NAPI
> e1000: Copyright (c) 1999-2006 Intel Corporation.
> ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
> e1000 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) ->
> IRQ 11
> e1000 0000:00:03.0: setting latency timer to 64
> ata1.00: ATA-7: QEMU HARDDISK, 0.12.3, max UDMA/100
> ata1.00: 20971520 sectors, multi 16: LBA48
> ata1.01: ATA-7: QEMU HARDDISK, 0.12.3, max UDMA/100
> ata1.01: 1433602 sectors, multi 16: LBA48
> ata2.01: NODEV after polling detection
> ata2.00: ATAPI: QEMU DVD-ROM, 0.12.3, max UDMA/100
> ata2.00: configured for MWDMA2
> ata1.00: configured for MWDMA2
> ata1.01: configured for MWDMA2
> scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK 0.12 PQ: 0 ANSI: 5
> scsi 0:0:1:0: Direct-Access ATA QEMU HARDDISK 0.12 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
> scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 0.12 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 0:0:1:0: [sdb] 1433602 512-byte logical blocks: (734 MB/700 MiB)
> sda:sr0: scsi3-mmc drive: 4x/4x xa/form2 tray
> Uniform CD-ROM driver Revision: 3.20
> sda1
> sd 0:0:1:0: [sdb] Write Protect is off
> sr 1:0:0:0: Attached scsi CD-ROM sr0
> sd 0:0:0:0: [sda] Attached SCSI disk
> sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
> sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> sdb: unknown partition table
> sd 0:0:1:0: [sdb] Attached SCSI disk
> e1000 0000:00:03.0: eth0: (PCI:33MHz:32-bit)
> e1000 0000:00:03.0: eth0: 52:54:00:12:34:56
> e1000 0000:00:03.0: eth0: Intel(R) PRO/1000 Network Connection
> e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k4
> e1000e: Copyright (c) 1999 - 2009 Intel Corporation.
> tun: Universal TUN/TAP device driver, 1.6
> tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> console [netcon0] enabled
> netconsole: network logging started
> ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> uhci_hcd: USB Universal Host Controller Interface driver
> ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
> uhci_hcd 0000:00:01.2: PCI INT D -> Link[LNKD] -> GSI 10 (level, high)
> -> IRQ 10
> uhci_hcd 0000:00:01.2: setting latency timer to 64
> uhci_hcd 0000:00:01.2: UHCI Host Controller
> uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
> uhci_hcd 0000:00:01.2: irq 10, io base 0x0000c020
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 2 ports detected
> PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
> serio: i8042 KBD port at 0x60,0x64 irq 1
> serio: i8042 AUX port at 0x60,0x64 irq 12
> mice: PS/2 mouse device common for all mice
> input: AT Translated Set 2 keyboard as
> /devices/platform/i8042/serio0/input/input1
> rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
> rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
> i2c /dev entries driver
> coretemp: CPU (model=0x2) has no thermal sensor.
> coretemp: CPU (model=0x2) has no thermal sensor.
> md: raid0 personality registered for level 0
> md: raid1 personality registered for level 1
> device-mapper: ioctl: 4.17.0-ioctl (2010-03-05) initialised:
> dm-devel@redhat.com
> cpuidle: using governor ladder
> cpuidle: using governor menu
> Registered led device: input::numl
> Registered led device: input::capsl
> Registered led device: input::scrolll
> Registered led device: input1::numl
> Registered led device: input1::capsl
> Registered led device: input1::scrolll
> usbcore: registered new interface driver usbhid
> usbhid: USB HID core driver
> ALSA device list:
> No soundcards found.
> TCP cubic registered
> NET: Registered protocol family 10
> NET: Registered protocol family 17
> registered taskstats version 1
> rtc_cmos 00:01: setting system clock to 2010-06-01 14:42:57 UTC (1275403377)
> Freeing unused kernel memory: 424k freed
> Write protecting the kernel read-only data: 8192k
> Freeing unused kernel memory: 488k freed
> Freeing unused kernel memory: 328k freed
> udev: starting version 154
> PM: Starting manual resume from disk
> EXT4-fs (sda1): (no)acl options not supported
> EXT4-fs (sda1): mounted filesystem with ordered data mode
> EXT4-fs (sda1): (no)acl options not supported
> udev: starting version 154
> Adding 716796k swap on /dev/sdb. Priority:-1 extents:1 across:716796k
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.01 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
> PM: Preallocating image memory... done (allocated 28778 pages)
> PM: Allocated 115112 kbytes in 0.03 seconds (3837.06 MB/s)
> e1000 0000:00:03.0: PCI INT A disabled
> PM: freeze of devices complete after 54.045 msecs
> PM: late freeze of devices complete after 0.614 msecs
> ACPI: Preparing to enter system sleep state S4
> PM: Saving platform NVS memory
> Disabling non-boot CPUs ...
> Broke affinity for irq 14
> CPU 1 is now offline
> SMP alternatives: switching to UP code
> PM: Creating hibernation image:
> PM: Need to copy 28341 pages
> PM: Hibernation image created (28341 pages copied)
> Enabling non-boot CPUs ...
> SMP alternatives: switching to SMP code
> Booting Node 0 Processor 1 APIC 0x1
> NMI watchdog failed to create perf event on cpu1: ffffffffffffffed
> CPU1 is up
> ACPI: Waking up from system sleep state S4
> PM: early thaw of devices complete after 0.025 msecs
> ata_piix 0000:00:01.1: restoring config space at offset 0x1 (was
> 0x2800003, writing 0x2800007)
> e1000 0000:00:03.0: restoring config space at offset 0x1 (was 0x100003,
> writing 0x100007)
> e1000 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) ->
> IRQ 11
> e1000 0000:00:03.0: setting latency timer to 64
> ata_piix 0000:00:01.1: setting latency timer to 64
> sd 0:0:0:0: [sda] Starting disk
> sd 0:0:1:0: [sdb] Starting disk
> e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> ata1.00: configured for MWDMA2
> ata1.01: configured for MWDMA2
> PM: thaw of devices complete after 166.297 msecs
> PM: Saving image data pages (28397 pages) ... done
> PM: Wrote 113588 kbytes in 2.95 seconds (38.50 MB/s)
> ata2.01: qc timeout (cmd 0xec)
> ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
> sd 0:0:1:0: [sdb] Stopping disk
> sd 0:0:0:0: [sda] Stopping disk
> uhci_hcd 0000:00:01.2: PCI INT D disabled
> e1000 0000:00:03.0: PCI INT A disabled
> ata2.01: qc timeout (cmd 0xec)
> ata2.01: failed to IDENTIFY (I/O error, err_mask=0x4)
>
>
> --
> js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-01 14:46 ` Jiri Slaby
2010-06-01 15:26 ` Don Zickus
2010-06-01 15:26 ` Don Zickus
@ 2010-06-02 18:44 ` Don Zickus
2010-06-02 18:44 ` Don Zickus
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-06-02 18:44 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> On 06/01/2010 03:50 PM, Don Zickus wrote:
> > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> >> Hi,
> >>
> >> with -next I get the following errors while trying to hibernate in
> >> qemu-kvm after the image is stored on disk:
> >
> > Is this the host that is hibernating or the guest?
>
> Guest.
>
> > KVM guests don't emulate the performance counters, so the nmi piece
> > shouldn't be functioning and the soft lockup piece just sits on top of an
> > hrtimer, so off the top of my head it is hard to imagine it intefering
> > with a sata driver.
> >
> > I'll need your whole boot up log to see how the lockup detector
> > initialized itself.
Ok, so I found out what is causing the problem, not entirely sure why or
what the right fix is, but this patch should do the trick.
This is probably one of those fixing the symptoms but not the problem patch,
but I don't know enough about suspend/resume to understand what the real
problem is.
---->SNIP<---------------------
[lockup detector] don't return NOTIFY_BAD when cpu goes online for suspend
KVM guests do not support performance counter emulation, so if the nmi
watchdog piece is compiled in, it will always fail during boot. The
failure returns NOTIFY_BAD when the cpu goes online in the cpu notifier
callback. Returning NOTIFY_BAD causes hibernation to do really bad
things, so avoid doing that.
The cpu failure shouldn't be a critical failure anyway, so returning
NOTIFY_BAD was probably overstating things.
Signed-off-by: Don Zickus <dzickus@redhat.com>
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6b7fad8..fda9770 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- if (watchdog_enable(hotcpu))
- return NOTIFY_BAD;
+ watchdog_enable(hotcpu)
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-01 14:46 ` Jiri Slaby
` (2 preceding siblings ...)
2010-06-02 18:44 ` Don Zickus
@ 2010-06-02 18:44 ` Don Zickus
2010-06-02 19:13 ` Frederic Weisbecker
` (3 more replies)
3 siblings, 4 replies; 36+ messages in thread
From: Don Zickus @ 2010-06-02 18:44 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> On 06/01/2010 03:50 PM, Don Zickus wrote:
> > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> >> Hi,
> >>
> >> with -next I get the following errors while trying to hibernate in
> >> qemu-kvm after the image is stored on disk:
> >
> > Is this the host that is hibernating or the guest?
>
> Guest.
>
> > KVM guests don't emulate the performance counters, so the nmi piece
> > shouldn't be functioning and the soft lockup piece just sits on top of an
> > hrtimer, so off the top of my head it is hard to imagine it intefering
> > with a sata driver.
> >
> > I'll need your whole boot up log to see how the lockup detector
> > initialized itself.
Ok, so I found out what is causing the problem, not entirely sure why or
what the right fix is, but this patch should do the trick.
This is probably one of those fixing the symptoms but not the problem patch,
but I don't know enough about suspend/resume to understand what the real
problem is.
---->SNIP<---------------------
[lockup detector] don't return NOTIFY_BAD when cpu goes online for suspend
KVM guests do not support performance counter emulation, so if the nmi
watchdog piece is compiled in, it will always fail during boot. The
failure returns NOTIFY_BAD when the cpu goes online in the cpu notifier
callback. Returning NOTIFY_BAD causes hibernation to do really bad
things, so avoid doing that.
The cpu failure shouldn't be a critical failure anyway, so returning
NOTIFY_BAD was probably overstating things.
Signed-off-by: Don Zickus <dzickus@redhat.com>
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 6b7fad8..fda9770 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- if (watchdog_enable(hotcpu))
- return NOTIFY_BAD;
+ watchdog_enable(hotcpu)
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 18:44 ` Don Zickus
@ 2010-06-02 19:13 ` Frederic Weisbecker
2010-06-02 19:13 ` Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2010-06-02 19:13 UTC (permalink / raw)
To: Don Zickus; +Cc: linux-ide, Linux-pm mailing list, Jiri Slaby, LKML
On Wed, Jun 02, 2010 at 02:44:59PM -0400, Don Zickus wrote:
> On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> > On 06/01/2010 03:50 PM, Don Zickus wrote:
> > > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> > >> Hi,
> > >>
> > >> with -next I get the following errors while trying to hibernate in
> > >> qemu-kvm after the image is stored on disk:
> > >
> > > Is this the host that is hibernating or the guest?
> >
> > Guest.
> >
> > > KVM guests don't emulate the performance counters, so the nmi piece
> > > shouldn't be functioning and the soft lockup piece just sits on top of an
> > > hrtimer, so off the top of my head it is hard to imagine it intefering
> > > with a sata driver.
> > >
> > > I'll need your whole boot up log to see how the lockup detector
> > > initialized itself.
>
> Ok, so I found out what is causing the problem, not entirely sure why or
> what the right fix is, but this patch should do the trick.
>
> This is probably one of those fixing the symptoms but not the problem patch,
> but I don't know enough about suspend/resume to understand what the real
> problem is.
So the problem is that we stop the cpu hotplug notifying, I guess this prevents
some ata callbacks to execute in the cpu hotplug notifier and then provoke this
crash.
The patch looks ok, but I think you should at least print a message in such
case of watchdog failure.
Thanks.
>
> ---->SNIP<---------------------
> [lockup detector] don't return NOTIFY_BAD when cpu goes online for suspend
>
> KVM guests do not support performance counter emulation, so if the nmi
> watchdog piece is compiled in, it will always fail during boot. The
> failure returns NOTIFY_BAD when the cpu goes online in the cpu notifier
> callback. Returning NOTIFY_BAD causes hibernation to do really bad
> things, so avoid doing that.
>
> The cpu failure shouldn't be a critical failure anyway, so returning
> NOTIFY_BAD was probably overstating things.
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
>
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 6b7fad8..fda9770 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> break;
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - if (watchdog_enable(hotcpu))
> - return NOTIFY_BAD;
> + watchdog_enable(hotcpu)
> break;
> #ifdef CONFIG_HOTPLUG_CPU
> case CPU_UP_CANCELED:
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 18:44 ` Don Zickus
2010-06-02 19:13 ` Frederic Weisbecker
@ 2010-06-02 19:13 ` Frederic Weisbecker
2010-06-02 19:43 ` Frederic Weisbecker
` (3 more replies)
2010-06-06 6:22 ` Jiri Slaby
2010-06-06 6:22 ` Jiri Slaby
3 siblings, 4 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2010-06-02 19:13 UTC (permalink / raw)
To: Don Zickus; +Cc: Jiri Slaby, LKML, Linux-pm mailing list, linux-ide
On Wed, Jun 02, 2010 at 02:44:59PM -0400, Don Zickus wrote:
> On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> > On 06/01/2010 03:50 PM, Don Zickus wrote:
> > > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> > >> Hi,
> > >>
> > >> with -next I get the following errors while trying to hibernate in
> > >> qemu-kvm after the image is stored on disk:
> > >
> > > Is this the host that is hibernating or the guest?
> >
> > Guest.
> >
> > > KVM guests don't emulate the performance counters, so the nmi piece
> > > shouldn't be functioning and the soft lockup piece just sits on top of an
> > > hrtimer, so off the top of my head it is hard to imagine it intefering
> > > with a sata driver.
> > >
> > > I'll need your whole boot up log to see how the lockup detector
> > > initialized itself.
>
> Ok, so I found out what is causing the problem, not entirely sure why or
> what the right fix is, but this patch should do the trick.
>
> This is probably one of those fixing the symptoms but not the problem patch,
> but I don't know enough about suspend/resume to understand what the real
> problem is.
So the problem is that we stop the cpu hotplug notifying, I guess this prevents
some ata callbacks to execute in the cpu hotplug notifier and then provoke this
crash.
The patch looks ok, but I think you should at least print a message in such
case of watchdog failure.
Thanks.
>
> ---->SNIP<---------------------
> [lockup detector] don't return NOTIFY_BAD when cpu goes online for suspend
>
> KVM guests do not support performance counter emulation, so if the nmi
> watchdog piece is compiled in, it will always fail during boot. The
> failure returns NOTIFY_BAD when the cpu goes online in the cpu notifier
> callback. Returning NOTIFY_BAD causes hibernation to do really bad
> things, so avoid doing that.
>
> The cpu failure shouldn't be a critical failure anyway, so returning
> NOTIFY_BAD was probably overstating things.
>
> Signed-off-by: Don Zickus <dzickus@redhat.com>
>
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 6b7fad8..fda9770 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> break;
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - if (watchdog_enable(hotcpu))
> - return NOTIFY_BAD;
> + watchdog_enable(hotcpu)
> break;
> #ifdef CONFIG_HOTPLUG_CPU
> case CPU_UP_CANCELED:
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 19:13 ` Frederic Weisbecker
@ 2010-06-02 19:43 ` Frederic Weisbecker
2010-06-02 19:43 ` Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2010-06-02 19:43 UTC (permalink / raw)
To: Don Zickus, Jeff Garzik
Cc: linux-ide, Linux-pm mailing list, Jiri Slaby, LKML
On Wed, Jun 02, 2010 at 09:13:40PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 02, 2010 at 02:44:59PM -0400, Don Zickus wrote:
> > On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> > > On 06/01/2010 03:50 PM, Don Zickus wrote:
> > > > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> > > >> Hi,
> > > >>
> > > >> with -next I get the following errors while trying to hibernate in
> > > >> qemu-kvm after the image is stored on disk:
> > > >
> > > > Is this the host that is hibernating or the guest?
> > >
> > > Guest.
> > >
> > > > KVM guests don't emulate the performance counters, so the nmi piece
> > > > shouldn't be functioning and the soft lockup piece just sits on top of an
> > > > hrtimer, so off the top of my head it is hard to imagine it intefering
> > > > with a sata driver.
> > > >
> > > > I'll need your whole boot up log to see how the lockup detector
> > > > initialized itself.
> >
> > Ok, so I found out what is causing the problem, not entirely sure why or
> > what the right fix is, but this patch should do the trick.
> >
> > This is probably one of those fixing the symptoms but not the problem patch,
> > but I don't know enough about suspend/resume to understand what the real
> > problem is.
>
>
> So the problem is that we stop the cpu hotplug notifying, I guess this prevents
> some ata callbacks to execute in the cpu hotplug notifier and then provoke this
> crash.
(Adding more people in Cc)
But I'm eventually surprised about this: stopping the cpu hotplug callbacks
prevents some ATA resume callbacks to execute.
Does that mean some ata resume path are done from a cpu hotplug notifier?
That looks weird.
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 19:13 ` Frederic Weisbecker
2010-06-02 19:43 ` Frederic Weisbecker
@ 2010-06-02 19:43 ` Frederic Weisbecker
2010-06-03 8:17 ` Tejun Heo
2010-06-03 8:17 ` Tejun Heo
2010-06-02 21:03 ` Don Zickus
2010-06-02 21:03 ` Don Zickus
3 siblings, 2 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2010-06-02 19:43 UTC (permalink / raw)
To: Don Zickus, Jeff Garzik
Cc: Jiri Slaby, LKML, Linux-pm mailing list, linux-ide
On Wed, Jun 02, 2010 at 09:13:40PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 02, 2010 at 02:44:59PM -0400, Don Zickus wrote:
> > On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> > > On 06/01/2010 03:50 PM, Don Zickus wrote:
> > > > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> > > >> Hi,
> > > >>
> > > >> with -next I get the following errors while trying to hibernate in
> > > >> qemu-kvm after the image is stored on disk:
> > > >
> > > > Is this the host that is hibernating or the guest?
> > >
> > > Guest.
> > >
> > > > KVM guests don't emulate the performance counters, so the nmi piece
> > > > shouldn't be functioning and the soft lockup piece just sits on top of an
> > > > hrtimer, so off the top of my head it is hard to imagine it intefering
> > > > with a sata driver.
> > > >
> > > > I'll need your whole boot up log to see how the lockup detector
> > > > initialized itself.
> >
> > Ok, so I found out what is causing the problem, not entirely sure why or
> > what the right fix is, but this patch should do the trick.
> >
> > This is probably one of those fixing the symptoms but not the problem patch,
> > but I don't know enough about suspend/resume to understand what the real
> > problem is.
>
>
> So the problem is that we stop the cpu hotplug notifying, I guess this prevents
> some ata callbacks to execute in the cpu hotplug notifier and then provoke this
> crash.
(Adding more people in Cc)
But I'm eventually surprised about this: stopping the cpu hotplug callbacks
prevents some ATA resume callbacks to execute.
Does that mean some ata resume path are done from a cpu hotplug notifier?
That looks weird.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 19:43 ` Frederic Weisbecker
@ 2010-06-03 8:17 ` Tejun Heo
2010-06-03 8:17 ` Tejun Heo
1 sibling, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2010-06-03 8:17 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Don Zickus, Jeff Garzik, Jiri Slaby, LKML, Linux-pm mailing list,
linux-ide
Hello,
On 06/02/2010 09:43 PM, Frederic Weisbecker wrote:
> But I'm eventually surprised about this: stopping the cpu hotplug callbacks
> prevents some ATA resume callbacks to execute.
>
> Does that mean some ata resume path are done from a cpu hotplug notifier?
> That looks weird.
libata itself doesn't use any cpu hotplug notifier. It just uses
suspend and resume callbacks. It looks like somehow IRQ delivery is
screwed up?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 19:43 ` Frederic Weisbecker
2010-06-03 8:17 ` Tejun Heo
@ 2010-06-03 8:17 ` Tejun Heo
1 sibling, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2010-06-03 8:17 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Don Zickus, Jiri Slaby, LKML, linux-ide, Linux-pm mailing list,
Jeff Garzik
Hello,
On 06/02/2010 09:43 PM, Frederic Weisbecker wrote:
> But I'm eventually surprised about this: stopping the cpu hotplug callbacks
> prevents some ATA resume callbacks to execute.
>
> Does that mean some ata resume path are done from a cpu hotplug notifier?
> That looks weird.
libata itself doesn't use any cpu hotplug notifier. It just uses
suspend and resume callbacks. It looks like somehow IRQ delivery is
screwed up?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 19:13 ` Frederic Weisbecker
2010-06-02 19:43 ` Frederic Weisbecker
2010-06-02 19:43 ` Frederic Weisbecker
@ 2010-06-02 21:03 ` Don Zickus
2010-06-02 21:03 ` Don Zickus
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-06-02 21:03 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: linux-ide, Linux-pm mailing list, Jiri Slaby, LKML
On Wed, Jun 02, 2010 at 09:13:40PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 02, 2010 at 02:44:59PM -0400, Don Zickus wrote:
> > On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> > > On 06/01/2010 03:50 PM, Don Zickus wrote:
> > > > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> > > >> Hi,
> > > >>
> > > >> with -next I get the following errors while trying to hibernate in
> > > >> qemu-kvm after the image is stored on disk:
> > > >
> > > > Is this the host that is hibernating or the guest?
> > >
> > > Guest.
> > >
> > > > KVM guests don't emulate the performance counters, so the nmi piece
> > > > shouldn't be functioning and the soft lockup piece just sits on top of an
> > > > hrtimer, so off the top of my head it is hard to imagine it intefering
> > > > with a sata driver.
> > > >
> > > > I'll need your whole boot up log to see how the lockup detector
> > > > initialized itself.
> >
> > Ok, so I found out what is causing the problem, not entirely sure why or
> > what the right fix is, but this patch should do the trick.
> >
> > This is probably one of those fixing the symptoms but not the problem patch,
> > but I don't know enough about suspend/resume to understand what the real
> > problem is.
>
>
> So the problem is that we stop the cpu hotplug notifying, I guess this prevents
> some ata callbacks to execute in the cpu hotplug notifier and then provoke this
> crash.
>
> The patch looks ok, but I think you should at least print a message in such
> case of watchdog failure.
Well this is already printed and in Jiri's dmesg output
NMI watchdog failed to create perf event on cpu1: ffffffffffffffed
I could change it to make it more obvious?
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 19:13 ` Frederic Weisbecker
` (2 preceding siblings ...)
2010-06-02 21:03 ` Don Zickus
@ 2010-06-02 21:03 ` Don Zickus
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-06-02 21:03 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: Jiri Slaby, LKML, Linux-pm mailing list, linux-ide
On Wed, Jun 02, 2010 at 09:13:40PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 02, 2010 at 02:44:59PM -0400, Don Zickus wrote:
> > On Tue, Jun 01, 2010 at 04:46:28PM +0200, Jiri Slaby wrote:
> > > On 06/01/2010 03:50 PM, Don Zickus wrote:
> > > > On Mon, May 31, 2010 at 04:22:00PM +0200, Jiri Slaby wrote:
> > > >> Hi,
> > > >>
> > > >> with -next I get the following errors while trying to hibernate in
> > > >> qemu-kvm after the image is stored on disk:
> > > >
> > > > Is this the host that is hibernating or the guest?
> > >
> > > Guest.
> > >
> > > > KVM guests don't emulate the performance counters, so the nmi piece
> > > > shouldn't be functioning and the soft lockup piece just sits on top of an
> > > > hrtimer, so off the top of my head it is hard to imagine it intefering
> > > > with a sata driver.
> > > >
> > > > I'll need your whole boot up log to see how the lockup detector
> > > > initialized itself.
> >
> > Ok, so I found out what is causing the problem, not entirely sure why or
> > what the right fix is, but this patch should do the trick.
> >
> > This is probably one of those fixing the symptoms but not the problem patch,
> > but I don't know enough about suspend/resume to understand what the real
> > problem is.
>
>
> So the problem is that we stop the cpu hotplug notifying, I guess this prevents
> some ata callbacks to execute in the cpu hotplug notifier and then provoke this
> crash.
>
> The patch looks ok, but I think you should at least print a message in such
> case of watchdog failure.
Well this is already printed and in Jiri's dmesg output
NMI watchdog failed to create perf event on cpu1: ffffffffffffffed
I could change it to make it more obvious?
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 18:44 ` Don Zickus
2010-06-02 19:13 ` Frederic Weisbecker
2010-06-02 19:13 ` Frederic Weisbecker
@ 2010-06-06 6:22 ` Jiri Slaby
2010-06-06 6:22 ` Jiri Slaby
3 siblings, 0 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-06-06 6:22 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On 06/02/2010 08:44 PM, Don Zickus wrote:
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> break;
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - if (watchdog_enable(hotcpu))
> - return NOTIFY_BAD;
> + watchdog_enable(hotcpu)
This fixes the problem indeed.
thanks,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-02 18:44 ` Don Zickus
` (2 preceding siblings ...)
2010-06-06 6:22 ` Jiri Slaby
@ 2010-06-06 6:22 ` Jiri Slaby
2010-09-03 8:35 ` Jiri Slaby
2010-09-03 8:35 ` Jiri Slaby
3 siblings, 2 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-06-06 6:22 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On 06/02/2010 08:44 PM, Don Zickus wrote:
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> break;
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - if (watchdog_enable(hotcpu))
> - return NOTIFY_BAD;
> + watchdog_enable(hotcpu)
This fixes the problem indeed.
thanks,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-06 6:22 ` Jiri Slaby
@ 2010-09-03 8:35 ` Jiri Slaby
2010-09-03 13:56 ` Don Zickus
2010-09-03 13:56 ` Don Zickus
2010-09-03 8:35 ` Jiri Slaby
1 sibling, 2 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-09-03 8:35 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> On 06/02/2010 08:44 PM, Don Zickus wrote:
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>> break;
>> case CPU_ONLINE:
>> case CPU_ONLINE_FROZEN:
>> - if (watchdog_enable(hotcpu))
>> - return NOTIFY_BAD;
>> + watchdog_enable(hotcpu)
>
> This fixes the problem indeed.
Hi, was this fixed somehow? I still use this hunk and don't know if it
is needed:
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -524,7 +524,7 @@ cpu_callback(struct notifier_block *nfb, unsigned
long action, void *hcpu)
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- err = watchdog_enable(hotcpu);
+ watchdog_enable(hotcpu);
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
thanks,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-03 8:35 ` Jiri Slaby
@ 2010-09-03 13:56 ` Don Zickus
2010-09-03 13:56 ` Don Zickus
1 sibling, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-03 13:56 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> > On 06/02/2010 08:44 PM, Don Zickus wrote:
> >> --- a/kernel/watchdog.c
> >> +++ b/kernel/watchdog.c
> >> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >> break;
> >> case CPU_ONLINE:
> >> case CPU_ONLINE_FROZEN:
> >> - if (watchdog_enable(hotcpu))
> >> - return NOTIFY_BAD;
> >> + watchdog_enable(hotcpu)
> >
> > This fixes the problem indeed.
>
> Hi, was this fixed somehow? I still use this hunk and don't know if it
> is needed:
I have to go re-review the problem again. I might have forgot to bring it
in. Thanks for reminding me!
Cheers,
Don
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -524,7 +524,7 @@ cpu_callback(struct notifier_block *nfb, unsigned
> long action, void *hcpu)
> break;
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - err = watchdog_enable(hotcpu);
> + watchdog_enable(hotcpu);
> break;
> #ifdef CONFIG_HOTPLUG_CPU
> case CPU_UP_CANCELED:
>
> thanks,
> --
> js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-03 8:35 ` Jiri Slaby
2010-09-03 13:56 ` Don Zickus
@ 2010-09-03 13:56 ` Don Zickus
2010-09-24 15:10 ` Jiri Slaby
2010-09-24 15:10 ` Jiri Slaby
1 sibling, 2 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-03 13:56 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> > On 06/02/2010 08:44 PM, Don Zickus wrote:
> >> --- a/kernel/watchdog.c
> >> +++ b/kernel/watchdog.c
> >> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >> break;
> >> case CPU_ONLINE:
> >> case CPU_ONLINE_FROZEN:
> >> - if (watchdog_enable(hotcpu))
> >> - return NOTIFY_BAD;
> >> + watchdog_enable(hotcpu)
> >
> > This fixes the problem indeed.
>
> Hi, was this fixed somehow? I still use this hunk and don't know if it
> is needed:
I have to go re-review the problem again. I might have forgot to bring it
in. Thanks for reminding me!
Cheers,
Don
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -524,7 +524,7 @@ cpu_callback(struct notifier_block *nfb, unsigned
> long action, void *hcpu)
> break;
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - err = watchdog_enable(hotcpu);
> + watchdog_enable(hotcpu);
> break;
> #ifdef CONFIG_HOTPLUG_CPU
> case CPU_UP_CANCELED:
>
> thanks,
> --
> js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-03 13:56 ` Don Zickus
@ 2010-09-24 15:10 ` Jiri Slaby
2010-09-24 15:10 ` Jiri Slaby
1 sibling, 0 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-09-24 15:10 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On 09/03/2010 04:56 PM, Don Zickus wrote:
> On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
>> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
>>> On 06/02/2010 08:44 PM, Don Zickus wrote:
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>> break;
>>>> case CPU_ONLINE:
>>>> case CPU_ONLINE_FROZEN:
>>>> - if (watchdog_enable(hotcpu))
>>>> - return NOTIFY_BAD;
>>>> + watchdog_enable(hotcpu)
>>>
>>> This fixes the problem indeed.
>>
>> Hi, was this fixed somehow? I still use this hunk and don't know if it
>> is needed:
>
> I have to go re-review the problem again. I might have forgot to bring it
> in. Thanks for reminding me!
Reminder #2 :). Any updates?
thanks,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-03 13:56 ` Don Zickus
2010-09-24 15:10 ` Jiri Slaby
@ 2010-09-24 15:10 ` Jiri Slaby
2010-09-24 18:24 ` Don Zickus
` (3 more replies)
1 sibling, 4 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-09-24 15:10 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On 09/03/2010 04:56 PM, Don Zickus wrote:
> On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
>> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
>>> On 06/02/2010 08:44 PM, Don Zickus wrote:
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>> break;
>>>> case CPU_ONLINE:
>>>> case CPU_ONLINE_FROZEN:
>>>> - if (watchdog_enable(hotcpu))
>>>> - return NOTIFY_BAD;
>>>> + watchdog_enable(hotcpu)
>>>
>>> This fixes the problem indeed.
>>
>> Hi, was this fixed somehow? I still use this hunk and don't know if it
>> is needed:
>
> I have to go re-review the problem again. I might have forgot to bring it
> in. Thanks for reminding me!
Reminder #2 :). Any updates?
thanks,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-24 15:10 ` Jiri Slaby
@ 2010-09-24 18:24 ` Don Zickus
2010-09-24 18:24 ` Don Zickus
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-24 18:24 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On Fri, Sep 24, 2010 at 06:10:02PM +0300, Jiri Slaby wrote:
> On 09/03/2010 04:56 PM, Don Zickus wrote:
> > On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
> >> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> >>> On 06/02/2010 08:44 PM, Don Zickus wrote:
> >>>> --- a/kernel/watchdog.c
> >>>> +++ b/kernel/watchdog.c
> >>>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >>>> break;
> >>>> case CPU_ONLINE:
> >>>> case CPU_ONLINE_FROZEN:
> >>>> - if (watchdog_enable(hotcpu))
> >>>> - return NOTIFY_BAD;
> >>>> + watchdog_enable(hotcpu)
> >>>
> >>> This fixes the problem indeed.
> >>
> >> Hi, was this fixed somehow? I still use this hunk and don't know if it
> >> is needed:
> >
> > I have to go re-review the problem again. I might have forgot to bring it
> > in. Thanks for reminding me!
>
> Reminder #2 :). Any updates?
No. I suck. I was sidetracked on other stuff. Let me put that together
today.
Thanks again for reminding me.
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-24 15:10 ` Jiri Slaby
2010-09-24 18:24 ` Don Zickus
@ 2010-09-24 18:24 ` Don Zickus
2010-09-24 19:19 ` Don Zickus
2010-09-24 19:19 ` Don Zickus
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-24 18:24 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On Fri, Sep 24, 2010 at 06:10:02PM +0300, Jiri Slaby wrote:
> On 09/03/2010 04:56 PM, Don Zickus wrote:
> > On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
> >> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> >>> On 06/02/2010 08:44 PM, Don Zickus wrote:
> >>>> --- a/kernel/watchdog.c
> >>>> +++ b/kernel/watchdog.c
> >>>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >>>> break;
> >>>> case CPU_ONLINE:
> >>>> case CPU_ONLINE_FROZEN:
> >>>> - if (watchdog_enable(hotcpu))
> >>>> - return NOTIFY_BAD;
> >>>> + watchdog_enable(hotcpu)
> >>>
> >>> This fixes the problem indeed.
> >>
> >> Hi, was this fixed somehow? I still use this hunk and don't know if it
> >> is needed:
> >
> > I have to go re-review the problem again. I might have forgot to bring it
> > in. Thanks for reminding me!
>
> Reminder #2 :). Any updates?
No. I suck. I was sidetracked on other stuff. Let me put that together
today.
Thanks again for reminding me.
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-24 15:10 ` Jiri Slaby
2010-09-24 18:24 ` Don Zickus
2010-09-24 18:24 ` Don Zickus
@ 2010-09-24 19:19 ` Don Zickus
2010-09-25 19:30 ` Jiri Slaby
2010-09-25 19:30 ` Jiri Slaby
2010-09-24 19:19 ` Don Zickus
3 siblings, 2 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-24 19:19 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On Fri, Sep 24, 2010 at 06:10:02PM +0300, Jiri Slaby wrote:
> On 09/03/2010 04:56 PM, Don Zickus wrote:
> > On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
> >> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> >>> On 06/02/2010 08:44 PM, Don Zickus wrote:
> >>>> --- a/kernel/watchdog.c
> >>>> +++ b/kernel/watchdog.c
> >>>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >>>> break;
> >>>> case CPU_ONLINE:
> >>>> case CPU_ONLINE_FROZEN:
> >>>> - if (watchdog_enable(hotcpu))
> >>>> - return NOTIFY_BAD;
> >>>> + watchdog_enable(hotcpu)
> >>>
> >>> This fixes the problem indeed.
> >>
> >> Hi, was this fixed somehow? I still use this hunk and don't know if it
> >> is needed:
> >
> > I have to go re-review the problem again. I might have forgot to bring it
> > in. Thanks for reminding me!
>
> Reminder #2 :). Any updates?
Actually there seems to have been some clean up in the cpu hotplug error
handling code. Could you remove that snippet of code and see if the
problem still happens?
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-24 19:19 ` Don Zickus
@ 2010-09-25 19:30 ` Jiri Slaby
2010-09-25 19:30 ` Jiri Slaby
1 sibling, 0 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-09-25 19:30 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On 09/24/2010 10:19 PM, Don Zickus wrote:
> Actually there seems to have been some clean up in the cpu hotplug error
> handling code. Could you remove that snippet of code and see if the
> problem still happens?
The thing is I don't remember what problems exactly I had. From the
description I wrote, I cannot reproduce it anymore :).
thanks for looking into that,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-24 19:19 ` Don Zickus
2010-09-25 19:30 ` Jiri Slaby
@ 2010-09-25 19:30 ` Jiri Slaby
2010-09-27 14:21 ` Don Zickus
2010-09-27 14:21 ` Don Zickus
1 sibling, 2 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-09-25 19:30 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On 09/24/2010 10:19 PM, Don Zickus wrote:
> Actually there seems to have been some clean up in the cpu hotplug error
> handling code. Could you remove that snippet of code and see if the
> problem still happens?
The thing is I don't remember what problems exactly I had. From the
description I wrote, I cannot reproduce it anymore :).
thanks for looking into that,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-25 19:30 ` Jiri Slaby
@ 2010-09-27 14:21 ` Don Zickus
2010-09-27 14:21 ` Don Zickus
1 sibling, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-27 14:21 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On Sat, Sep 25, 2010 at 10:30:35PM +0300, Jiri Slaby wrote:
> On 09/24/2010 10:19 PM, Don Zickus wrote:
> > Actually there seems to have been some clean up in the cpu hotplug error
> > handling code. Could you remove that snippet of code and see if the
> > problem still happens?
>
> The thing is I don't remember what problems exactly I had. From the
> description I wrote, I cannot reproduce it anymore :).
>
> thanks for looking into that,
ok. :-) Let me know if it pops up again.
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-25 19:30 ` Jiri Slaby
2010-09-27 14:21 ` Don Zickus
@ 2010-09-27 14:21 ` Don Zickus
1 sibling, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-27 14:21 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, LKML, Linux-pm mailing list, linux-ide
On Sat, Sep 25, 2010 at 10:30:35PM +0300, Jiri Slaby wrote:
> On 09/24/2010 10:19 PM, Don Zickus wrote:
> > Actually there seems to have been some clean up in the cpu hotplug error
> > handling code. Could you remove that snippet of code and see if the
> > problem still happens?
>
> The thing is I don't remember what problems exactly I had. From the
> description I wrote, I cannot reproduce it anymore :).
>
> thanks for looking into that,
ok. :-) Let me know if it pops up again.
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-09-24 15:10 ` Jiri Slaby
` (2 preceding siblings ...)
2010-09-24 19:19 ` Don Zickus
@ 2010-09-24 19:19 ` Don Zickus
3 siblings, 0 replies; 36+ messages in thread
From: Don Zickus @ 2010-09-24 19:19 UTC (permalink / raw)
To: Jiri Slaby; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On Fri, Sep 24, 2010 at 06:10:02PM +0300, Jiri Slaby wrote:
> On 09/03/2010 04:56 PM, Don Zickus wrote:
> > On Fri, Sep 03, 2010 at 10:35:04AM +0200, Jiri Slaby wrote:
> >> On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> >>> On 06/02/2010 08:44 PM, Don Zickus wrote:
> >>>> --- a/kernel/watchdog.c
> >>>> +++ b/kernel/watchdog.c
> >>>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> >>>> break;
> >>>> case CPU_ONLINE:
> >>>> case CPU_ONLINE_FROZEN:
> >>>> - if (watchdog_enable(hotcpu))
> >>>> - return NOTIFY_BAD;
> >>>> + watchdog_enable(hotcpu)
> >>>
> >>> This fixes the problem indeed.
> >>
> >> Hi, was this fixed somehow? I still use this hunk and don't know if it
> >> is needed:
> >
> > I have to go re-review the problem again. I might have forgot to bring it
> > in. Thanks for reminding me!
>
> Reminder #2 :). Any updates?
Actually there seems to have been some clean up in the cpu hotplug error
handling code. Could you remove that snippet of code and see if the
problem still happens?
Cheers,
Don
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: hibernation hangs with ATA errors (lockup_detector bug)
2010-06-06 6:22 ` Jiri Slaby
2010-09-03 8:35 ` Jiri Slaby
@ 2010-09-03 8:35 ` Jiri Slaby
1 sibling, 0 replies; 36+ messages in thread
From: Jiri Slaby @ 2010-09-03 8:35 UTC (permalink / raw)
To: Don Zickus; +Cc: Frederic Weisbecker, Linux-pm mailing list, LKML, linux-ide
On 06/06/2010 08:22 AM, Jiri Slaby wrote:
> On 06/02/2010 08:44 PM, Don Zickus wrote:
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -550,8 +550,7 @@ cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>> break;
>> case CPU_ONLINE:
>> case CPU_ONLINE_FROZEN:
>> - if (watchdog_enable(hotcpu))
>> - return NOTIFY_BAD;
>> + watchdog_enable(hotcpu)
>
> This fixes the problem indeed.
Hi, was this fixed somehow? I still use this hunk and don't know if it
is needed:
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -524,7 +524,7 @@ cpu_callback(struct notifier_block *nfb, unsigned
long action, void *hcpu)
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- err = watchdog_enable(hotcpu);
+ watchdog_enable(hotcpu);
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
thanks,
--
js
^ permalink raw reply [flat|nested] 36+ messages in thread