* swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Andy Isaacson @ 2005-03-23 18:49 UTC
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1181 bytes --]
I was previously running 2.6.11-rc3 and swsusp was working quite nicely:
echo shutdown > /sys/power/disk
echo disk > /sys/power/state
Now I've upgraded to 2.6.12-rc1, 423b66b6oJOGN68OhmSrBFxxLOtIEA, and it
no longer works reliably. Almost every time I do the above it blocks in
device_resume() (I haven't had time to track it deeper than that).
Here's the output (hand copied):
[ 51.782593] [nosave pfn 0x356]<7>[nosave pfn 0x357]swsusp:critical section/: done (122772 pages copied)
[ 54.305996] PM: writing image.
[ 54.306032] /usr/src/linux-2.6-cvs/kernel/power/swsusp.c:863
[ 54.316885] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
_
(Obviously, I added some printks to track where it's blocking.)
Dmesg is attached; hardware is a Vaio r505te.
Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
suspends successfully, maybe one time out of 10. And thinking back, I
*sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
out of 20 suspends.
Another interesting tidbit - I had more success when I tried it without
the intel_agp module loaded; I haven't seen a lockup yet. (But why
can't I rmmod intel_agp?)
-andy
[-- Attachment #2: dmesg --]
[-- Type: text/plain, Size: 9616 bytes --]
[4294667.296000] Linux version 2.6.12-rc1 (adi@sart) (gcc version 3.3.4 (Debian 1:3.3.4-9ubuntu5)) #7 Tue Mar 22 21:30:45 PST 2005
[4294667.296000] BIOS-provided physical RAM map:
[4294667.296000] BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[4294667.296000] BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
[4294667.296000] BIOS-e820: 00000000000c0000 - 00000000000cc000 (reserved)
[4294667.296000] BIOS-e820: 00000000000d8000 - 0000000000100000 (reserved)
[4294667.296000] BIOS-e820: 0000000000100000 - 0000000013cf0000 (usable)
[4294667.296000] BIOS-e820: 0000000013cf0000 - 0000000013cfc000 (ACPI data)
[4294667.296000] BIOS-e820: 0000000013cfc000 - 0000000013d00000 (ACPI NVS)
[4294667.296000] BIOS-e820: 0000000013d00000 - 0000000013e80000 (usable)
[4294667.296000] BIOS-e820: 0000000013e80000 - 0000000014000000 (reserved)
[4294667.296000] BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
[4294667.296000] BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[4294667.296000] 318MB LOWMEM available.
[4294667.296000] On node 0 totalpages: 81536
[4294667.296000] DMA zone: 4096 pages, LIFO batch:1
[4294667.296000] Normal zone: 77440 pages, LIFO batch:16
[4294667.296000] HighMem zone: 0 pages, LIFO batch:1
[4294667.296000] DMI present.
[4294667.296000] ACPI: RSDP (v000 PTLTD ) @ 0x000f7120
[4294667.296000] ACPI: RSDT (v001 SONY U1 0x20010312 PTL 0x00000000) @ 0x13cf74cb
[4294667.296000] ACPI: FADT (v001 SONY U1 0x20010312 PTL 0x01000000) @ 0x13cfbf64
[4294667.296000] ACPI: BOOT (v001 SONY U1 0x20010312 PTL 0x00000001) @ 0x13cfbfd8
[4294667.296000] ACPI: DSDT (v001 SONY U1 0x20010312 PTL 0x0100000b) @ 0x00000000
[4294667.296000] Allocating PCI resources starting at 14000000 (gap: 14000000:eb800000)
[4294667.296000] Built 1 zonelists
[4294667.296000] Kernel command line: root=/dev/hda2 ro
[4294667.296000] Local APIC disabled by BIOS -- you can enable it with "lapic"
[4294667.296000] mapped APIC to ffffd000 (0127f000)
[4294667.296000] Initializing CPU#0
[4294667.296000] PID hash table entries: 2048 (order: 11, 32768 bytes)
[ 0.000000] Detected 596.225 MHz processor.
[ 15.409585] Using tsc for high-res timesource
[ 15.411373] Console: colour VGA+ 80x25
[ 15.413032] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 15.414428] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 15.451391] Memory: 319764k/326144k available (1648k kernel code, 5748k reserved, 769k data, 156k init, 0k highmem)
[ 15.451466] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[ 15.451721] Calibrating delay loop... 1175.55 BogoMIPS (lpj=587776)
[ 15.472329] Security Framework v1.0.0 initialized
[ 15.472403] Mount-cache hash table entries: 512
[ 15.472701] CPU: After generic identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000
[ 15.472723] CPU: After vendor identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000
[ 15.472748] CPU: L1 I cache: 16K, L1 D cache: 16K
[ 15.472790] CPU: L2 cache: 256K
[ 15.472823] CPU: After all inits, caps: 0383f9ff 00000000 00000000 00000040 00000000 00000000 00000000
[ 15.472840] CPU: Intel Pentium III (Coppermine) stepping 06
[ 15.472895] Enabling fast FPU save and restore... done.
[ 15.472937] Enabling unmasked SIMD FPU exception support... done.
[ 15.472986] Checking 'hlt' instruction... OK.
[ 15.516484] ACPI: setting ELCR to 0200 (from 0628)
[ 15.552288] NET: Registered protocol family 16
[ 15.553529] PCI: PCI BIOS revision 2.10 entry at 0xfd9b0, last bus=1
[ 15.553588] PCI: Using configuration type 1
[ 15.553626] mtrr: v2.0 (20020519)
[ 15.554600] ACPI: Subsystem revision 20050211
[ 15.593476] ACPI: Interpreter enabled
[ 15.593531] ACPI: Using PIC for interrupt routing
[ 15.597099] ACPI: PCI Root Bridge [PCI0] (00:00)
[ 15.597142] PCI: Probing PCI hardware (bus 00)
[ 15.598428] PCI: Transparent bridge - 0000:00:1e.0
[ 15.599531] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 15.600509] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB_._PRT]
[ 15.604310] ACPI: Embedded Controller [EC0] (gpe 28)
[ 15.647350] ACPI: PCI Interrupt Link [LNKA] (IRQs 9) *0
[ 15.648458] ACPI: PCI Interrupt Link [LNKB] (IRQs 9) *0
[ 15.649416] ACPI: PCI Interrupt Link [LNKC] (IRQs 9) *0, disabled.
[ 15.650495] ACPI: PCI Interrupt Link [LNKD] (IRQs *9)
[ 15.651565] ACPI: PCI Interrupt Link [LNKE] (IRQs *9)
[ 15.652638] ACPI: PCI Interrupt Link [LNKH] (IRQs 9) *0
[ 15.653325] PCI: Using ACPI for IRQ routing
[ 15.653369] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 15.659519] Simple Boot Flag at 0x36 set to 0x1
[ 15.660688] VFS: Disk quotas dquot_6.5.1
[ 15.660777] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[ 15.660920] devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)
[ 15.660970] devfs: boot_options: 0x0
[ 15.661069] Initializing Cryptographic API
[ 15.662644] ACPI: AC Adapter [ACAD] (off-line)
[ 15.667594] ACPI: Battery Slot [BAT1] (battery present)
[ 15.667664] ACPI: Lid Switch [LID]
[ 15.667730] ACPI: Power Button (CM) [PWRB]
[ 15.700520] ACPI: Video Device [GCH0] (multi-head: yes rom: yes post: no)
[ 15.701009] ACPI: CPU0 (power states: C1[C1] C2[C2])
[ 15.701064] ACPI: Processor [CPU0] (supports 8 throttling states)
[ 15.707046] ACPI: Thermal Zone [ATF0] (34 C)
[ 15.713858] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 15.714069] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 15.714109] io scheduler noop registered
[ 15.714175] io scheduler anticipatory registered
[ 15.714224] io scheduler deadline registered
[ 15.714298] io scheduler cfq registered
[ 15.715472] RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
[ 15.715660] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 15.715708] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 15.715974] ICH2M: IDE controller at PCI slot 0000:00:1f.1
[ 15.716039] ICH2M: chipset revision 3
[ 15.716073] ICH2M: not 100% native mode: will probe irqs later
[ 15.716128] ide0: BM-DMA at 0x1800-0x1807, BIOS settings: hda:DMA, hdb:pio
[ 15.716218] Probing IDE interface ide0...
[ 15.738423] hda: TOSHIBA MK4026GAX, ATA DISK drive
[ 15.751123] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 15.751512] Probing IDE interface ide1...
[ 15.762838] Probing IDE interface ide2...
[ 15.783945] Probing IDE interface ide3...
[ 15.795219] Probing IDE interface ide4...
[ 15.816326] Probing IDE interface ide5...
[ 15.837538] hda: max request size: 128KiB
[ 15.848237] hda: 78140160 sectors (40007 MB), CHS=65535/16/63, UDMA(100)
[ 15.848356] hda: cache flushes supported
[ 15.848565] /dev/ide/host0/bus0/target0/lun0: p1 p2 p3
[ 15.855880] mice: PS/2 mouse device common for all mice
[ 15.855997] NET: Registered protocol family 2
[ 15.857032] IP: routing cache hash table of 2048 buckets, 16Kbytes
[ 15.857508] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[ 15.858088] TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
[ 15.858415] TCP: Hash tables configured (established 16384 bind 16384)
[ 15.858737] NET: Registered protocol family 8
[ 15.858786] NET: Registered protocol family 20
[ 15.858924] PM: Checking swsusp image.
[ 15.859289] swsusp: Resume From Partition /dev/hda3
[ 15.877786] input: AT Translated Set 2 keyboard on isa0060/serio0
[ 15.885595] <3>swsusp: Suspend partition has wrong signature?
[ 15.885690] swsusp: Error -22 check for resume file
[ 15.885697] PM: Resume from disk failed.
[ 15.885743] ACPI wakeup devices:
[ 15.885793] PWRB LAN CRD0 EC0 COMA USB1 USB2 MODE
[ 15.885869] ACPI: (supports S0 S3 S4 S5)
[ 15.888624] kjournald starting. Commit interval 5 seconds
[ 15.888688] EXT3-fs: mounted filesystem with ordered data mode.
[ 15.888807] VFS: Mounted root (ext3 filesystem) readonly.
[ 15.889516] Freeing unused kernel memory: 156k freed
[ 16.141642] NET: Registered protocol family 1
[ 19.215308] Adding 987988k swap on /dev/hda3. Priority:-1 extents:1
[ 19.290971] EXT3 FS on hda2, internal journal
[ 22.577701] SCSI subsystem initialized
[ 23.111022] Enabling hardware tapping
[ 23.144426] input: PS/2 Mouse on isa0060/serio1
[ 23.450887] input: AlpsPS/2 ALPS GlidePoint on isa0060/serio1
[ 23.542008] ieee1394: Initialized config rom entry `ip1394'
[ 24.513902] sbp2: $Rev: 1219 $ Ben Collins <bcollins@debian.org>
[ 25.341803] kjournald starting. Commit interval 5 seconds
[ 25.341959] EXT3 FS on hda1, internal journal
[ 25.342007] EXT3-fs: mounted filesystem with ordered data mode.
[ 28.246274] Linux Kernel Card Services
[ 28.246343] options: [pci] [cardbus] [pm]
[ 28.436545] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 9
[ 28.436608] PCI: setting IRQ 9 as level-triggered
[ 28.436647] ACPI: PCI interrupt 0000:01:02.0[A] -> GSI 9 (level, low) -> IRQ 9
[ 28.436724] Yenta: CardBus bridge found at 0000:01:02.0 [104d:80e0]
[ 28.556726] Yenta: ISA IRQ mask 0x0cb8, PCI irq 9
[ 28.556764] Socket status: 30000006
[ 30.159387] NET: Registered protocol family 10
[ 30.159768] Disabled Privacy Extensions on device c032ea20(lo)
[ 30.160046] IPv6 over IPv4 tunneling driver
[ 34.058359] Linux agpgart interface v0.101 (c) Dave Jones
[ 34.148385] agpgart: Detected an Intel i815 Chipset.
[ 34.204153] agpgart: AGP aperture is 64M @ 0xf8000000

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Stefan Seyfried @ 2005-03-24 14:27 UTC
To: Andy Isaacson; +Cc: kernel list

Andy Isaacson wrote:
> Dmesg is attached; hardware is a Vaio r505te.
>
> Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
> suspends successfully, maybe one time out of 10. And thinking back, I
> *sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
> out of 20 suspends.

Does it hang hard or is sysrq still working?
If sysrq is still working, please try with "i8042.noaux" (this will kill
your touchpad, which is what i intend :-)

Best regards,

    Stefan

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Andy Isaacson @ 2005-03-24 18:10 UTC
To: Stefan Seyfried; +Cc: kernel list

On Thu, Mar 24, 2005 at 03:27:15PM +0100, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > Dmesg is attached; hardware is a Vaio r505te.
> >
> > Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
> > suspends successfully, maybe one time out of 10. And thinking back, I
> > *sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
> > out of 20 suspends.
>
> Does it hang hard or is sysrq still working?

Sysrq still prints stuff, so IRQs aren't locked. But most of the sysrq
commands don't work... S and U don't seem to do anything (not too
surprising I suppose) but B does reboot.

> If sysrq is still working, please try with "i8042.noaux" (this will kill
> your touchpad, which is what i intend :-)

So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action. Then I
suspended, and it worked fine. After restart, I suspended again - also
fine.

So I think that fixed it. But no touchpad is a bit annoying. :)

-andy

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Dmitry Torokhov @ 2005-03-24 19:18 UTC
To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
>
> So I added i8042.noaux to my kernel command line, rebooted, insmodded
> intel_agp, started X, and verified no touchpad action. Then I
> suspended, and it worked fine. After restart, I suspended again - also
> fine.
>
> So I think that fixed it. But no touchpad is a bit annoying. :)
>

Try adding i8042.nomux instead of i8042.noaux, it should keep your
touchpad in working condition. Please let me know if it still works.

--
Dmitry

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Andy Isaacson @ 2005-03-24 20:20 UTC
To: dtor_core; +Cc: Stefan Seyfried, kernel list

On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > intel_agp, started X, and verified no touchpad action. Then I
> > suspended, and it worked fine. After restart, I suspended again - also
> > fine.
> >
> > So I think that fixed it. But no touchpad is a bit annoying. :)
>
> Try adding i8042.nomux instead of i8042.noaux, it should keep your
> touchpad in working condition. Please let me know if it still works.

With nomux the touchpad works again, but suspend blocks in the same
place as without nomux.

(How can I verify that "nomux" was accepted? It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)

-andy

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Dmitry Torokhov @ 2005-03-24 21:10 UTC
To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > > intel_agp, started X, and verified no touchpad action. Then I
> > > suspended, and it worked fine. After restart, I suspended again - also
> > > fine.
> > >
> > > So I think that fixed it. But no touchpad is a bit annoying. :)
> >
> > Try adding i8042.nomux instead of i8042.noaux, it should keep your
> > touchpad in working condition. Please let me know if it still works.
>
> With nomux the touchpad works again, but suspend blocks in the same
> place as without nomux.
>
> (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> command line" but there's no other mention of it in dmesg.)
>
> -andy
>

If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
have MUX mode active.

--
Dmitry
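
The rule of thumb in the message above (more than 3 serio ports means MUX mode is active) can be scripted. The following is only a minimal sketch: the function name `mux_mode_active` is hypothetical, and it assumes the ports appear as entries named `serio0`, `serio1`, ... under the directory Dmitry mentions.

```python
import os


def mux_mode_active(serio_dir="/sys/bus/serio/devices"):
    """Return (likely_mux, ports) using the "more than 3 ports" heuristic.

    The threshold comes from the message above; the helper itself is
    illustrative and not part of any kernel tooling.
    """
    ports = sorted(name for name in os.listdir(serio_dir)
                   if name.startswith("serio"))
    return len(ports) > 3, ports


if __name__ == "__main__":
    # Only meaningful on a machine that exposes the serio bus in sysfs.
    if os.path.isdir("/sys/bus/serio/devices"):
        active, ports = mux_mode_active()
        print("ports:", " ".join(ports))
        print("MUX mode likely active" if active else "MUX mode likely not active")
```

On a machine that shows just two ports, the heuristic reports that MUX mode is not active.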

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Andy Isaacson @ 2005-03-24 23:54 UTC
To: dtor_core; +Cc: Stefan Seyfried, kernel list

On Thu, Mar 24, 2005 at 04:10:39PM -0500, Dmitry Torokhov wrote:
> If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
> have MUX mode active.

Just serio0 and serio1.

On Thu, Mar 24, 2005 at 04:14:52PM -0500, Dmitry Torokhov wrote:
> On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> > command line" but there's no other mention of it in dmesg.)
>
> Ignore my babbling, I just noticed in your dmesg that your KBC does
> not support MUX mode to begin with.

OK, anything else I should try?

Why does it only fail when I have *both* intel_agp and i8042 aux?

In the SysRq-T trace I see one interesting process: most things are
in D state in refrigerator(), but sh shows the following traceback:

    wait_for_completion
    call_usermodehelper
    kobject_hotplug
    kobject_del
    class_device_del
    class_device_unregister
    mousedev_disconnect
    input_unregister_device
    alps_disconnect
    psmouse_disconnect
    serio_driver_remove
    device_release_driver
    serio_release_driver
    serio_resume
    resume_device
    dpm_resume
    device_resume
    swsusp_write
    pm_suspend_disk
    enter_state
    state_store
    subsys_attr_store
    flush_write_buffer
    sysfs_write_file
    ...

That seems odd to me...

Also, khelper has the following trace:

    io_schedule
    sync_buffer
    __wait_on_bit
    out_of_line_wait_on_bit
    ext3_find_entry
    ext3_lookup
    real_lookup
    do_lookup
    __link_path_walk
    link_path_walk
    path_lookup
    open_exec
    do_execve
    ...

-andy
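
The pattern that stands out in the sh traceback above (a task waiting on a usermode helper while still inside `device_resume`) can be checked mechanically. This is a rough sketch only; the function `blocked_on_hotplug_during_resume` is a hypothetical helper, and it assumes traces are given innermost frame first, as in the message, with the elided `...` frames left out.

```python
def blocked_on_hotplug_during_resume(trace):
    """Return True if the task waits on call_usermodehelper inside device_resume.

    `trace` is a list of function names, innermost frame first.
    """
    try:
        helper = trace.index("call_usermodehelper")
        resume = trace.index("device_resume")
    except ValueError:
        return False
    # The frame just inside call_usermodehelper must be the blocking wait.
    waiting = helper > 0 and trace[helper - 1] == "wait_for_completion"
    return waiting and helper < resume


# Frames copied from the two traces in the message above.
SH_TRACE = """
wait_for_completion call_usermodehelper kobject_hotplug kobject_del
class_device_del class_device_unregister mousedev_disconnect
input_unregister_device alps_disconnect psmouse_disconnect
serio_driver_remove device_release_driver serio_release_driver
serio_resume resume_device dpm_resume device_resume swsusp_write
pm_suspend_disk enter_state state_store subsys_attr_store
flush_write_buffer sysfs_write_file
""".split()

KHELPER_TRACE = """
io_schedule sync_buffer __wait_on_bit out_of_line_wait_on_bit
ext3_find_entry ext3_lookup real_lookup do_lookup __link_path_walk
link_path_walk path_lookup open_exec do_execve
""".split()

if __name__ == "__main__":
    print("sh:", blocked_on_hotplug_during_resume(SH_TRACE))
    print("khelper:", blocked_on_hotplug_during_resume(KHELPER_TRACE))
```

The check flags the sh task and not khelper, which is itself stuck in disk I/O while trying to exec the helper.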

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Stefan Seyfried @ 2005-03-25 9:22 UTC
To: Andy Isaacson; +Cc: dtor_core, kernel list, Pavel Machek, Vojtech Pavlik

Andy Isaacson wrote:

> OK, anything else I should try?

not really, i just wait for Vojtech and Pavel :-)

> Why does it only fail when I have *both* intel_agp and i8042 aux?

later...

> In the SysRq-T trace I see one interesting process: most things are
> in D state in refrigerator(), but sh shows the following traceback:
>
> wait_for_completion
> call_usermodehelper
> kobject_hotplug
> kobject_del
> class_device_del
> class_device_unregister
> mousedev_disconnect
> input_unregister_device
> alps_disconnect
> psmouse_disconnect
> serio_driver_remove
> device_release_driver
> serio_release_driver

i think the following happens (but i am in no case an expert for this):
- alps driver suspends
- alps driver unregisters the device
- udev is called via call_usermodehelper (which fails since userspace
  is stopped)
- now somebody wants to wait for udev which does not work right.

Why only with the ALPS driver and intel_agp? I think this is an
accident. For me, it happens only with init=/bin/bash and _no_ other
drivers loaded (only IDE drivers and psmouse built-in). As soon as i
load any other drivers (i have only tried ehci_hcd and 8139too, to be
honest) it works fine again. This leads me to believe it is a race
condition since the extra driver that has to be suspended may give the
ALPS driver the extra time needed to finish the race. For you, it may be
the other way round.

This is mostly guesswork, i am no kernel expert at all.

--
seife
Never trust a computer you can't lift.

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Pavel Machek @ 2005-03-25 10:13 UTC
To: Stefan Seyfried; +Cc: Andy Isaacson, dtor_core, kernel list, Vojtech Pavlik

Hi!

> > OK, anything else I should try?
>
> not really, i just wait for Vojtech and Pavel :-)

Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.

> > In the SysRq-T trace I see one interesting process: most things are
> > in D state in refrigerator(), but sh shows the following traceback:
> >
> > wait_for_completion
> > call_usermodehelper
> > kobject_hotplug
> > kobject_del
> > class_device_del
> > class_device_unregister
> > mousedev_disconnect
> > input_unregister_device
> > alps_disconnect
> > psmouse_disconnect
> > serio_driver_remove
> > device_release_driver
> > serio_release_driver

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Dmitry Torokhov @ 2005-03-25 14:19 UTC
To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi,

On Fri, 25 Mar 2005 11:13:44 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > > OK, anything else I should try?
> >
> > not really, i just wait for Vojtech and Pavel :-)
>
> Try commenting out "call_usermodehelper". If that helps, Stefan's
> theory is confirmed, and this waits for Vojtech to fix it.
>

This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop
the issue will never be completely resolved from within the input
system.

Pavel, is it possible for swsusp to disable hotplug (probably just do
hotplug_path[0] = 0) before resuming in suspend phase?

A bit of a tangent - you need to resume system so you can write the
image, right? I wonder if we could add a flag to struct device that
would mark device as "on_resume_path". The flag would be set when you
select resume partition and propagated to the root of the system. Then
when resuming after making the image you could skip all devices that
are not on resume path.

--
Dmitry
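
The "on_resume_path" idea above can be illustrated with a toy device tree. This is a userspace sketch of the data structure only, not kernel code: the `Device` class, the helper names, and the device topology are assumptions made for illustration (the message gives only the flag name and the propagate-to-root rule).

```python
class Device:
    """Toy stand-in for struct device: a name, a parent link, and the flag."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.on_resume_path = False


def mark_resume_path(dev):
    """Set the flag on the resume device and propagate it up to the root."""
    while dev is not None:
        dev.on_resume_path = True
        dev = dev.parent


def devices_to_resume(devices):
    """After the image is made, resume only devices on the resume path."""
    return [d.name for d in devices if d.on_resume_path]


# Hypothetical topology: the swap partition hangs off the IDE disk,
# the touchpad port hangs off the keyboard controller.
root = Device("pci0000:00")
ide0 = Device("ide0", parent=root)
hda = Device("hda", parent=ide0)
hda3 = Device("hda3", parent=hda)
i8042 = Device("i8042", parent=root)
serio1 = Device("serio1", parent=i8042)
all_devices = [root, ide0, hda, hda3, i8042, serio1]

mark_resume_path(hda3)  # done once, when the resume partition is selected

if __name__ == "__main__":
    print(devices_to_resume(all_devices))
```

In this model the touchpad port `serio1` is never resumed before the image is written, so its failure to resume could not stall the suspend.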

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Pavel Machek @ 2005-03-25 14:24 UTC
To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > > OK, anything else I should try?
> > >
> > > not really, i just wait for Vojtech and Pavel :-)
> >
> > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > theory is confirmed, and this waits for Vojtech to fix it.
> >
>
> This is more of a general swsusp problem I believe - the second phase
> when it blindly resumes entire system. Resume of a device can fail
> (any reason whatsoever) and it will attempt to clean up after itself,
> but userspace is dead and hotplug never completes. While I am
> interested to know why ALPS does not want to resume on Andy's laptop
> the issue will never be completely resolved from within the input
> system.

When device fails to resume, what should I do? I think I could

	if (error)
		panic("Device resume failed\n");

, but... that does not look like what you want.

> Pavel, is it possible for swsusp to disable hotplug (probably just do
> hotplug_path[0] = 0) before resuming in suspend phase?

It feels like a hack, but yes, I probably could do that. (Do you have
patch to try?)

> A bit of a tangent - you need to resume system so you can write the
> image, right? I wonder if we could add a flag to struct device that
> would mark device as "on_resume_path". The flag would be set when you
> select resume partition and propagated to the root of the system. Then
> when resuming after making the image you could skip all devices that
> are not on resume path.

I'm not going to do that, see FAQ in swsusp.txt.

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Dmitry Torokhov @ 2005-03-25 14:52 UTC
To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

On Fri, 25 Mar 2005 15:24:15 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > > > > OK, anything else I should try?
> > > >
> > > > not really, i just wait for Vojtech and Pavel :-)
> > >
> > > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > > theory is confirmed, and this waits for Vojtech to fix it.
> > >
> >
> > This is more of a general swsusp problem I believe - the second phase
> > when it blindly resumes entire system. Resume of a device can fail
> > (any reason whatsoever) and it will attempt to clean up after itself,
> > but userspace is dead and hotplug never completes. While I am
> > interested to know why ALPS does not want to resume on Andy's laptop
> > the issue will never be completely resolved from within the input
> > system.
>
> When device fails to resume, what should I do? I think I could
>
> 	if (error)
> 		panic("Device resume failed\n");
>
> , but... that does not look like what you want.

Oh, always panic-happy Pavel ;). It really depends on what kind of
device has failed to resume. If the device is really needed for writing
image then panic is the only recourse, but if it is some other device
you are resuming just ignore it, who cares...

Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be that
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in resume path. It can all be
contained in driver core I believe (sorry but no patch, for now at
least).

>
> > Pavel, is it possible for swsusp to disable hotplug (probably just do
> > hotplug_path[0] = 0) before resuming in suspend phase?
>
> It feels like a hack, but yes, I probably could do that. (Do you have
> patch to try?)
>

Not really, I won't be able to write any code till next week I think.

--
Dmitry

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
From: Pavel Machek @ 2005-03-25 15:42 UTC
To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > This is more of a general swsusp problem I believe - the second phase
> > > when it blindly resumes entire system. Resume of a device can fail
> > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > but userspace is dead and hotplug never completes. While I am
> > > interested to know why ALPS does not want to resume on Andy's laptop
> > > the issue will never be completely resolved from within the input
> > > system.
> >
> > When device fails to resume, what should I do? I think I could
> >
> > 	if (error)
> > 		panic("Device resume failed\n");
> >
> > , but... that does not look like what you want.
>
> Oh, always panic-happy Pavel ;). It really depends on what kind of
> device has failed to resume. If the device is really needed for writing
> image then panic is the only recourse, but if it is some other device
> you are resuming just ignore it, who cares...

You are right, for resume-during-suspend, we may as well risk it. We
have consistent state, and if we happen to write it on disk,
everything is okay.

For resume-during-resume, I don't really know how we can handle
that. Running with some devices non-working seems dangerous to me.

> Btw, I don't think that doing selective resume (as opposed to selective
> suspend and Nigel's partial device trees) would be that
> complicated. You'd always resume sysdevs and then, when iterating over
> "normal" devices, just skip ones not in resume path. It can all be
> contained in driver core I believe (sorry but no patch, for now at
> least).

:-) I think we can simply make device freeze/unfreeze fast enough.
[We do not need to do full suspend/resume; freeze is enough].

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-25 15:42 ` Pavel Machek @ 2005-03-25 16:04 ` Dmitry Torokhov 2005-03-28 23:00 ` Pavel Machek 2005-03-29 23:19 ` Rafael J. Wysocki 0 siblings, 2 replies; 52+ messages in thread From: Dmitry Torokhov @ 2005-03-25 16:04 UTC (permalink / raw) To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik On Fri, 25 Mar 2005 16:42:37 +0100, Pavel Machek <pavel@suse.cz> wrote: > Hi! > > > > > This is more of a general swsusp problem I believe - the second phase > > > > when it blindly resumes entire system. Resume of a device can fail > > > > (any reason whatsoever) and it will attempt to clean up after itself, > > > > but userspace is dead and hotplug never completes. While I am > > > > interested to know why ALPS does not want to resume on ANdy's laptop > > > > the issue will never be completely resolved from within the input > > > > system. > > > > > > When device fails to resume, what should I do? I think I could > > > > > > if (error) > > > panic("Device resume failed\n"); > > > > > > , but... that does not look like what you want. > > > > Oh, always panic-happy Pavel ;). It really depends on what kind of > > device has faled to resume. If the device is really needed for writing > > image then panic is the only recourse, but if it some other device you > > resuming just ignore it, who cares... > > You are right, for resume-during-suspend, we may as well risk it. We > have consistent state, and if we happen to write it on disk, > everything is okay. > > For resume-during-resume, I don't really know how we can handle > that. Running with some devices non-working seems dangerous to me. > I think it again varies, and the driver would have to decide what to do if it can not resume hardware. Take for example USB - i believe USB guys are shooting at being able to disconnect device while the box is suspended and have it removed from the system when resuming. 
Probably every driver that has even the slightest notion of
hot-pluggability should just properly clean up after itself and not
signal an error to the core.
> > Btw, I don't think that doing selective resume (as opposed to selective
> > suspend and Nigel's partial device trees) would be so much
> > complicated. You'd always resume sysdevs and then, when iterating over
> > "normal" devices, just skip ones not in resume path. It can all be
> > contained in driver core I believe (sorry but no patch, for now at
> > least).
>
> :-) I think we can simply make device freeze/unfreeze fast enough.
> [We do not need to do full suspend/resume; freeze is enough].
It is not suspend/freeze here that gets us but resume, and with resume
the driver (at least for now) does not have any idea if it is
"unfreeze" or "full-resume". I mean I could have serio just ignore
"unfreeze" requests (as I doubt anyone would ever try to suspend over
PS/2 port ;) ) but I think it should be really handled by the core.
-- 
Dmitry
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-25 16:04 ` Dmitry Torokhov @ 2005-03-28 23:00 ` Pavel Machek 2005-03-29 23:19 ` Rafael J. Wysocki 1 sibling, 0 replies; 52+ messages in thread From: Pavel Machek @ 2005-03-28 23:00 UTC (permalink / raw) To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik Hi! > > > Btw, I dont think that doing selective resume (as opposed to selective > > > suspend and Nigel's partial device trees) would be so much > > > complicated. You'd always resume sysdevs and then, when iterating over > > > "normal" devices, just skip ones not in resume path. It can all be > > > contained in driver core I believe (sorry but no patch, for now at > > > least). > > > > :-) I think we can simply make device freeze/unfreeze fast enough. > > [We do not need to do full suspend/resume; freeze is enough]. > > It is not suspend/freeze here that gets us but resume and with resume > the driver (at least for now) does not have any idea if it is > "unfreeze" or "full-resume". I mean I could have serio just ignore > "unfreeze" requests (as I doubt anyone would ever try to suspend over > PS/2 port ;) ) but I think it should be really handled by the core. Please just always do full-resume... for now. Patches that enable you to detect "unfreeze" are not in, yet. If something fails, just printk with big enough severity and continue, as you don't have method of signaling error, anyway. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-25 16:04 ` Dmitry Torokhov 2005-03-28 23:00 ` Pavel Machek @ 2005-03-29 23:19 ` Rafael J. Wysocki 1 sibling, 0 replies; 52+ messages in thread From: Rafael J. Wysocki @ 2005-03-29 23:19 UTC (permalink / raw) To: dtor_core Cc: Pavel Machek, Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik Hi, On Friday, 25 of March 2005 17:04, Dmitry Torokhov wrote: > On Fri, 25 Mar 2005 16:42:37 +0100, Pavel Machek <pavel@suse.cz> wrote: > > Hi! > > > > > > > This is more of a general swsusp problem I believe - the second phase > > > > > when it blindly resumes entire system. Resume of a device can fail > > > > > (any reason whatsoever) and it will attempt to clean up after itself, > > > > > but userspace is dead and hotplug never completes. While I am > > > > > interested to know why ALPS does not want to resume on ANdy's laptop > > > > > the issue will never be completely resolved from within the input > > > > > system. > > > > > > > > When device fails to resume, what should I do? I think I could > > > > > > > > if (error) > > > > panic("Device resume failed\n"); > > > > > > > > , but... that does not look like what you want. > > > > > > Oh, always panic-happy Pavel ;). It really depends on what kind of > > > device has faled to resume. If the device is really needed for writing > > > image then panic is the only recourse, but if it some other device you > > > resuming just ignore it, who cares... > > > > You are right, for resume-during-suspend, we may as well risk it. We > > have consistent state, and if we happen to write it on disk, > > everything is okay. > > > > For resume-during-resume, I don't really know how we can handle > > that. Running with some devices non-working seems dangerous to me. > > > > I think it again varies, and the driver would have to decide what to > do if it can not resume hardware. 
Well, I don't think that the driver would be able to state that its failure is "serious enough", for example, to panic(). This is only known to the higher-level code that calls the driver's _resume() routine. IMO the driver should not make any assumptions of its importance (eg a SCSI driver that panic()s, because it's unable to resume a disk which does not even contain a mounted partition is not a good idea ;-)). > Take for example USB - i believe USB > guys are shooting at being able to disconnect device while the box is > suspended and have it removed from the system when resuming. In > Probably every driver that has even a slighest notion of > hot-pluggability should just properly clean up after itself and not > signal error to the core. Unless, for instance, one of its devices contains the root filesystem. > > > Btw, I dont think that doing selective resume (as opposed to selective > > > suspend and Nigel's partial device trees) would be so much > > > complicated. You'd always resume sysdevs and then, when iterating over > > > "normal" devices, just skip ones not in resume path. It can all be > > > contained in driver core I believe (sorry but no patch, for now at > > > least). > > > > :-) I think we can simply make device freeze/unfreeze fast enough. > > [We do not need to do full suspend/resume; freeze is enough]. If the driver is compiled as a module, its devices may be uninitialized when its _resume() routine is called (eg in the resume-during-resume). Hence, IMHO, we can forget the "unfreeze" thing until we can differentiate the resume-during-suspend from the resume-during-resume etc. ... Greets, Rafael -- - Would you tell me, please, which way I ought to go from here? - That depends a good deal on where you want to get to. -- Lewis Carroll "Alice's Adventures in Wonderland" ^ permalink raw reply [flat|nested] 52+ messages in thread
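The point above, that only the code calling the driver's _resume() routine can judge how serious a failure is, might be sketched roughly as follows. This is a userspace illustration only, not kernel code: struct sketch_dev, its needed_for_image and holds_rootfs flags, and classify_resume_failure() are hypothetical names invented for this note, and the three-way verdict is an assumption about how such a policy could look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device facts known to the core, not to the driver. */
struct sketch_dev {
    const char *name;
    bool needed_for_image;  /* device the suspend image is written to */
    bool holds_rootfs;      /* device carrying the root filesystem */
};

enum resume_verdict {
    RESUME_OK,        /* the driver reported success */
    RESUME_CONTINUE,  /* failure, but safe to log and carry on */
    RESUME_FATAL      /* failure on a device we cannot run without */
};

/*
 * The driver only reports an error code; the caller decides what the
 * failure means for the system as a whole.
 */
enum resume_verdict classify_resume_failure(const struct sketch_dev *dev,
                                            int driver_error)
{
    assert(dev != NULL);
    if (driver_error == 0)
        return RESUME_OK;
    if (dev->needed_for_image || dev->holds_rootfs)
        return RESUME_FATAL;
    return RESUME_CONTINUE;
}
```

With something like this, a touchpad that fails to resume would only be logged, while a failure on the disk holding the image or the root filesystem would be treated as fatal by the caller.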
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-25 14:52 ` Dmitry Torokhov 2005-03-25 15:42 ` Pavel Machek @ 2005-03-29 21:49 ` Rafael J. Wysocki 1 sibling, 0 replies; 52+ messages in thread From: Rafael J. Wysocki @ 2005-03-29 21:49 UTC (permalink / raw) To: dtor_core Cc: Pavel Machek, Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik Hi, On Friday, 25 of March 2005 15:52, Dmitry Torokhov wrote: > On Fri, 25 Mar 2005 15:24:15 +0100, Pavel Machek <pavel@suse.cz> wrote: > > Hi! > > > > > > > > OK, anything else I should try? > > > > > > > > > > not really, i just wait for Vojtech and Pavel :-) > > > > > > > > Try commenting out "call_usermodehelper". If that helps, Stefan's > > > > theory is confirmed, and this waits for Vojtech to fix it. > > > > > > > > > > This is more of a general swsusp problem I believe - the second phase > > > when it blindly resumes entire system. Resume of a device can fail > > > (any reason whatsoever) and it will attempt to clean up after itself, > > > but userspace is dead and hotplug never completes. While I am > > > interested to know why ALPS does not want to resume on ANdy's laptop > > > the issue will never be completely resolved from within the input > > > system. > > > > When device fails to resume, what should I do? I think I could > > > > if (error) > > panic("Device resume failed\n"); > > > > , but... that does not look like what you want. > > Oh, always panic-happy Pavel ;). It really depends on what kind of > device has faled to resume. If the device is really needed for writing > image then panic is the only recourse, but if it some other device you > resuming just ignore it, who cares... Moreover, if we panic() here, we potentially lose data. IMO we should not do this for a device that is not needed for saving the image and/or contains the root filesystem. 
> Btw, I dont think that doing selective resume (as opposed to selective > suspend and Nigel's partial device trees) would be so much > complicated. You'd always resume sysdevs and then, when iterating over > "normal" devices, just skip ones not in resume path. It can all be > contained in driver core I believe (sorry but no patch, for now at > least). In fact, the only devices that we really need to resume-during-suspend are those necessary for saving the image. Greets, Rafael -- - Would you tell me, please, which way I ought to go from here? - That depends a good deal on where you want to get to. -- Lewis Carroll "Alice's Adventures in Wonderland" ^ permalink raw reply [flat|nested] 52+ messages in thread
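The selective-resume idea discussed here (always resume sysdevs, then skip "normal" devices that are not in the resume path) could look roughly like the walk below. It is a userspace sketch under assumptions: struct resume_node, its flags, and selective_resume() are hypothetical, and setting the resumed flag merely stands in for calling a driver's resume routine. No patch for this existed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device entry; the flags are assumptions for illustration. */
struct resume_node {
    const char *name;
    bool is_sysdev;       /* sysdevs are always resumed */
    bool in_resume_path;  /* needed to reach the device holding the image */
    bool resumed;         /* set by the walk below */
};

/*
 * Resume sysdevs and devices on the resume path; leave the rest alone.
 * Returns the number of devices that were resumed.
 */
size_t selective_resume(struct resume_node *devs, size_t n)
{
    size_t count = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].is_sysdev || devs[i].in_resume_path) {
            devs[i].resumed = true;  /* stands in for the driver's resume */
            count++;
        }
    }
    return count;
}
```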
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-25 10:13 ` Pavel Machek 2005-03-25 14:19 ` Dmitry Torokhov @ 2005-03-25 18:36 ` Andy Isaacson 1 sibling, 0 replies; 52+ messages in thread From: Andy Isaacson @ 2005-03-25 18:36 UTC (permalink / raw) To: Pavel Machek; +Cc: Stefan Seyfried, dtor_core, kernel list, Vojtech Pavlik On Fri, Mar 25, 2005 at 11:13:44AM +0100, Pavel Machek wrote: > Hi! > > > > OK, anything else I should try? > > > > not really, i just wait for Vojtech and Pavel :-) > > Try commenting out "call_usermodehelper". If that helps, Stefan's > theory is confirmed, and this waits for Vojtech to fix it. > > > > wait_for_completion > > > call_usermodehelper > > > kobject_hotplug > > > kobject_del Without the call_usermodehelper in kobject_hotplug, the first suspend seems to work OK (which I think confirms the theory). But after resume, the second suspend hangs in the same place. It's calling call_usermodehelper from input_call_hotplug... time to comment out another one and recompile. I also tried -mm1 and it hangs in the same place. -andy ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-25 9:22 ` Stefan Seyfried
2005-03-25 10:13 ` Pavel Machek
@ 2005-03-29 16:18 ` Dmitry Torokhov
2005-03-29 18:18 ` Pavel Machek
1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 16:18 UTC (permalink / raw)
To: Stefan Seyfried; +Cc: Andy Isaacson, kernel list, Pavel Machek, Vojtech Pavlik
On Fri, 25 Mar 2005 10:22:28 +0100, Stefan Seyfried <seife@suse.de> wrote:
> Andy Isaacson wrote:
>
> > In the SysRq-T trace I see one interesting process: most things are
> > in D state in refrigerator(), but sh shows the following traceback:
> >
> > wait_for_completion
> > call_usermodehelper
> > kobject_hotplug
> > kobject_del
> > class_device_del
> > class_device_unregister
> > mousedev_disconnect
> > input_unregister_device
> > alps_disconnect
> > psmouse_disconnect
> > serio_driver_remove
> > device_release_driver
> > serio_release_driver
>
> i think the following happens (but i am in no case an expert for this):
> - alps driver suspends
> - alps driver unregisters the device
> - udev is called via call_usermodehelper (which fails since userspace
> is stopped)
> - now somebody wants to wait for udev which does not work right.
The thing is that kobject_uevent calls call_usermodehelper with
wait=0. That means that it only waits for the execve("/sbin/hotplug")
call to complete; it does not wait for the entire process to complete.
If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
-- 
Dmitry
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 16:18 ` Dmitry Torokhov @ 2005-03-29 18:18 ` Pavel Machek 2005-03-29 19:11 ` Dmitry Torokhov 0 siblings, 1 reply; 52+ messages in thread From: Pavel Machek @ 2005-03-29 18:18 UTC (permalink / raw) To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik Hi! > > > In the SysRq-T trace I see one interesting process: most things are > > > in D state in refrigerator(), but sh shows the following traceback: > > > > > > wait_for_completion > > > call_usermodehelper > > > kobject_hotplug > > > kobject_del > > > class_device_del > > > class_device_unregister > > > mousedev_disconnect > > > input_unregister_device > > > alps_disconnect > > > psmouse_disconnect > > > serio_driver_remove > > > device_release_driver > > > serio_release_driver > > > > i think the following happens (but i am in no case an expert for this): > > - alps driver suspends > > - alps driver unregisters the device > > - udev is called via call_usermodehelper (which fails since userspace > > is stopped) > > - now somebody wants to wait for udev which does not work right. > > The thing is that kobject_uevent calls call_usermodehelper with > wait=0. That means that it conly waits for execve("/sbin/hotplug") > call to complete, it does not wait for the entire process ti complete. > > If you look at Andy's second trace you will see that we are waiting > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you > know why IO does not complete? khelper is a kernel thread so it is > marked with > PF_NOFREEZE. Could it be that we managed to freeze kblockd? Uf, no idea about kblockd freezing -- we certainly should not. *But*, if we are doing execve while system is frozen, something is very wrong. We should not be doing execve in the first place. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! 
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 18:18 ` Pavel Machek @ 2005-03-29 19:11 ` Dmitry Torokhov 2005-03-29 19:23 ` Pavel Machek 0 siblings, 1 reply; 52+ messages in thread From: Dmitry Torokhov @ 2005-03-29 19:11 UTC (permalink / raw) To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik On Tue, 29 Mar 2005 20:18:31 +0200, Pavel Machek <pavel@suse.cz> wrote: > Hi! > > > If you look at Andy's second trace you will see that we are waiting > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you > > know why IO does not complete? khelper is a kernel thread so it is > > marked with > > PF_NOFREEZE. Could it be that we managed to freeze kblockd? > > Uf, no idea about kblockd freezing -- we certainly should not. > > *But*, if we are doing execve while system is frozen, something is > very wrong. We should not be doing execve in the first place. Well, there lies a problem - some devices have to do execve because they need firmware to operate. Also, again, some buses with hot-pluggable devices will attempt to clean up unsuccessful resume and this will cause hotplug events. The point is you either resume system or you don't. We probably need a separate "unfreeze" callback, although this is kind of messy. -- Dmitry ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 19:11 ` Dmitry Torokhov @ 2005-03-29 19:23 ` Pavel Machek 2005-03-29 20:05 ` Dmitry Torokhov 0 siblings, 1 reply; 52+ messages in thread From: Pavel Machek @ 2005-03-29 19:23 UTC (permalink / raw) To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik Hi! > > > If you look at Andy's second trace you will see that we are waiting > > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you > > > know why IO does not complete? khelper is a kernel thread so it is > > > marked with > > > PF_NOFREEZE. Could it be that we managed to freeze kblockd? > > > > Uf, no idea about kblockd freezing -- we certainly should not. > > > > *But*, if we are doing execve while system is frozen, something is > > very wrong. We should not be doing execve in the first place. > > Well, there lies a problem - some devices have to do execve because > they need firmware to operate. Also, again, some buses with > hot-pluggable devices will attempt to clean up unsuccessful resume and > this will cause hotplug events. The point is you either resume system > or you don't. We probably need a separate "unfreeze" callback, > although this is kind of messy. There's a better solution for firmware: You should load your firmware prior to suspend and store it in RAM. Anything else just plain does not work. (Because your wireless firmware might be on NFS mounted over that wireless card). Hotplug... I guess udev just needs to hold that callbacks before system is fully up... it has to do something similar on regular boot, no? Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 52+ messages in thread
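The firmware suggestion above (load the firmware before suspend, keep it in RAM, and use only that copy at resume) can be illustrated with the small userspace sketch below. It is not the kernel firmware API: struct fw_cache and the fw_cache_* helpers are hypothetical names, and malloc/free stand in for whatever allocator a real driver would use. The struct is assumed to be zero-initialized before first use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-RAM firmware copy; must start out as { NULL, 0 }. */
struct fw_cache {
    unsigned char *data;
    size_t len;
};

/*
 * Called before suspend, while the filesystem holding the firmware is
 * still reachable. Returns 0 on success, -1 if memory is unavailable.
 */
int fw_cache_store(struct fw_cache *c, const unsigned char *blob, size_t len)
{
    unsigned char *copy;

    assert(c != NULL && blob != NULL);
    copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, blob, len);
    free(c->data);          /* drop any older copy */
    c->data = copy;
    c->len = len;
    return 0;
}

/* Called at resume: returns the cached copy without any disk or network I/O. */
const unsigned char *fw_cache_get(const struct fw_cache *c, size_t *len)
{
    assert(c != NULL && len != NULL);
    *len = c->len;
    return c->data;
}

/* Release the copy once the device is fully up again. */
void fw_cache_drop(struct fw_cache *c)
{
    assert(c != NULL);
    free(c->data);
    c->data = NULL;
    c->len = 0;
}
```

The cost is that the blob stays resident across the suspend, which is the trade-off the message accepts in exchange for not needing execve or filesystem access during resume.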
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 19:23 ` Pavel Machek
@ 2005-03-29 20:05 ` Dmitry Torokhov
2005-03-29 20:52 ` Pavel Machek
0 siblings, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 20:05 UTC (permalink / raw)
To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik
On Tue, 29 Mar 2005 21:23:39 +0200, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > > > If you look at Andy's second trace you will see that we are waiting
> > > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > > > know why IO does not complete? khelper is a kernel thread so it is
> > > > marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
> > >
> > > Uf, no idea about kblockd freezing -- we certainly should not.
> > >
> > > *But*, if we are doing execve while system is frozen, something is
> > > very wrong. We should not be doing execve in the first place.
> >
> > Well, there lies a problem - some devices have to do execve because
> > they need firmware to operate. Also, again, some buses with
> > hot-pluggable devices will attempt to clean up unsuccessful resume and
> > this will cause hotplug events. The point is you either resume system
> > or you don't. We probably need a separate "unfreeze" callback,
> > although this is kind of messy.
>
> There's a better solution for firmware: You should load your firmware
> prior to suspend and store it in RAM. Anything else just plain does
> not work. (Because your wireless firmware might be on NFS mounted over
> that wireless card).
>
> Hotplug... I guess udev just needs to hold that callbacks before
> system is fully up... it has to do something similar on regular boot,
> no?
Well, I did not really look into udev but hotplug (which can interact
with udev) does not keep anything. If it fails it's ok - that's why
there are coldplug scripts that "recover" lost events. But here we
block trying to start hotplug - we are not getting an error - and this
is bad. Unfortunately I am not familiar enough with how block devices
work to say why it hangs.
Should we pull Jens into the discussion?
-- 
Dmitry
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 20:05 ` Dmitry Torokhov
@ 2005-03-29 20:52 ` Pavel Machek
2005-03-29 21:07 ` Dmitry Torokhov
2005-03-29 21:23 ` [linux-pm] " Patrick Mochel
0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 20:52 UTC (permalink / raw)
To: dtor_core
Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik, Linux-pm mailing list
Hi!
> > > Well, there lies a problem - some devices have to do execve because
> > > they need firmware to operate. Also, again, some buses with
> > > hot-pluggable devices will attempt to clean up unsuccessful resume and
> > > this will cause hotplug events. The point is you either resume system
> > > or you don't. We probably need a separate "unfreeze" callback,
> > > although this is kind of messy.
> >
> > There's a better solution for firmware: You should load your firmware
> > prior to suspend and store it in RAM. Anything else just plain does
> > not work. (Because your wireless firmware might be on NFS mounted over
> > that wireless card).
> >
> > Hotplug... I guess udev just needs to hold that callbacks before
> > system is fully up... it has to do something similar on regular boot,
> > no?
>
> Well, I did not really look into udev but hotplug (which can interact
> with udev) does not keep anything. If it fails it's ok - that's why
> there are coldplug scripts that "recover" lost events. But here we
> block trying to start hotplug - we are not getting an error - and this is
> bad. Unfortunately I am not familiar with block devices working to say
> why it hangs.
>
> Should we pull Jens into the discussion?
I don't really want us to try execve during resume... Could we simply
artificially fail that execve with something if (in_suspend()) return
-EINVAL; [except that in_suspend() just is not there, but there were
some proposals to add it].
Or just avoid calling hotplug at all in resume case? And then do
coldplug-like scan when userspace is ready...
But we perhaps should cc linux-pm list.
Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 20:52 ` Pavel Machek
@ 2005-03-29 21:07 ` Dmitry Torokhov
2005-03-29 21:12 ` Pavel Machek
2005-03-29 21:23 ` [linux-pm] " Patrick Mochel
1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 21:07 UTC (permalink / raw)
To: Pavel Machek
Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik, Linux-pm mailing list
On Tue, 29 Mar 2005 22:52:25 +0200, Pavel Machek <pavel@suse.cz> wrote:
> I don't really want us to try execve during resume... Could we simply
> artificially fail that execve with something if (in_suspend()) return
> -EINVAL; [except that in_suspend() just is not there, but there were
> some proposals to add it].
>
> Or just avoid calling hotplug at all in resume case? And then do
> coldplug-like scan when userspace is ready...
>
I am leaning towards calling disable_usermodehelper (not written yet)
after swsusp completes snapshotting memory. We really don't care about
hotplug events in this case and this will allow keeping "normal"
resume in drivers as is. What do you think?
-- 
Dmitry
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 21:07 ` Dmitry Torokhov
@ 2005-03-29 21:12 ` Pavel Machek
2005-03-29 21:33 ` Dmitry Torokhov
2005-03-29 23:05 ` Rafael J. Wysocki
0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 21:12 UTC (permalink / raw)
To: dtor_core
Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik, Linux-pm mailing list
Hi!
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
> >
>
> I am leaning towards calling disable_usermodehelper (not written yet)
> after swsusp completes snapshotting memory. We really don't care about
> hotplug events in this case and this will allow keeping "normal"
> resume in drivers as is. What do you think?
That would certainly do the trick.
[Or perhaps in_suspend() is a slightly nicer solution? People wanted it
for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]
Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 21:12 ` Pavel Machek
@ 2005-03-29 21:33 ` Dmitry Torokhov
2005-03-29 21:44 ` Pavel Machek
2005-03-29 23:05 ` Rafael J. Wysocki
1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 21:33 UTC (permalink / raw)
To: Pavel Machek
Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik, Linux-pm mailing list
On Tue, 29 Mar 2005 23:12:39 +0200, Pavel Machek <pavel@suse.cz> wrote:
> >
> > I am leaning towards calling disable_usermodehelper (not written yet)
> > after swsusp completes snapshotting memory. We really don't care about
> > hotplug events in this case and this will allow keeping "normal"
> > resume in drivers as is. What do you think?
>
> That would certainly do the trick.
>
> [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]
>
We might want to have both... Hmm... in_suspend - is it only for swsusp
(in_swsusp) or for suspend-to-ram as well? For suspend to ram we might
need slightly different rules, I don't know. A separate call will
allow more fine-grained control and will explicitly tell the reader what
is happening.
I do not have a strong preference though.
-- 
Dmitry
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 21:33 ` Dmitry Torokhov @ 2005-03-29 21:44 ` Pavel Machek 2005-03-29 22:31 ` [linux-pm] " Nigel Cunningham 0 siblings, 1 reply; 52+ messages in thread From: Pavel Machek @ 2005-03-29 21:44 UTC (permalink / raw) To: dtor_core Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik, Linux-pm mailing list On Út 29-03-05 16:33:04, Dmitry Torokhov wrote: > On Tue, 29 Mar 2005 23:12:39 +0200, Pavel Machek <pavel@suse.cz> wrote: > > > > > > I am leaning towards calling disable_usermodehelper (not writtent yet) > > > after swsusp completes snapshotting memory. We really don't care about > > > hotplug events in this case and this will allow keeping "normal" > > > resume in drivers as is. What do you think? > > > > That would certianly do the trick. > > > > [Or perhaps in_suspend() is slightly nicer solution? People wanted it > > for other stuff (sanity checking, like BUG_ON(in_suspend())), too....] > > > > We might want having both... Hmm... in_suspend - is it only for swsusp > (in_swsusp) or for suspend-to-ram as well? For suspend to ram we might > need slightly different rules, I don't know. A separate call will > allow more fine-grained control and will explicitely tell reader what > is happening. We currently freeze processes for suspend-to-ram, too. I guess that disable_usermodehelper is probably better and that in_suspend() should only be used for sanity checks... go with disable_usermodehelper and sorry for the noise. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 52+ messages in thread
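The disable_usermodehelper idea agreed on above had not been written at this point in the thread, so the following is only a guess at its shape, as a userspace sketch: a gate that makes helper invocations fail immediately instead of blocking on an execve that cannot complete. All sketch_* names and the helpers_started counter are hypothetical, returning -EBUSY is an assumption (the thread only mentions -EINVAL for the separate in_suspend() idea), and a real kernel version would need proper synchronization around the flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static bool helpers_disabled;  /* hypothetical global gate */
static int helpers_started;    /* counts helpers that would have been exec'd */

/* Would be called once the memory snapshot has been taken. */
void sketch_disable_usermodehelper(void)
{
    helpers_disabled = true;
}

/* Would be called once userspace is running again. */
void sketch_enable_usermodehelper(void)
{
    helpers_disabled = false;
}

/*
 * Stand-in for call_usermodehelper(): while the gate is closed, return
 * an error right away rather than waiting for an execve that cannot
 * finish because userspace is frozen.
 */
int sketch_call_usermodehelper(const char *path)
{
    assert(path != NULL);
    if (helpers_disabled)
        return -EBUSY;
    helpers_started++;  /* a real implementation would exec path here */
    return 0;
}
```

Callers such as the hotplug path would then see an error for events raised between snapshot and thaw, which fits the observation in the thread that lost hotplug events can be recovered later by a coldplug-style scan.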
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 21:44 ` Pavel Machek @ 2005-03-29 22:31 ` Nigel Cunningham 2005-03-29 22:35 ` Pavel Machek 0 siblings, 1 reply; 52+ messages in thread From: Nigel Cunningham @ 2005-03-29 22:31 UTC (permalink / raw) To: Pavel Machek Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson Hi. On Wed, 2005-03-30 at 07:44, Pavel Machek wrote: > We currently freeze processes for suspend-to-ram, too. I guess that > disable_usermodehelper is probably better and that in_suspend() should > only be used for sanity checks... go with disable_usermodehelper and > sorry for the noise. Here's another possibility: Freeze the workqueue that call_usermodehelper uses (remember that code I didn't push hard enough to Andrew?), and let invocations of call_usermodehelper block in TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on kernel processes in that state. Of course if you won't want the freeze processes for str, but do want to freeze call_usermodehelper, I guess you'd still need the in_suspend() macro. Regards, Nigel -- Nigel Cunningham Software Engineer, Canberra, Australia http://www.cyclades.com Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574 Maintainer of Suspend2 Kernel Patches http://suspend2.net ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 22:31 ` [linux-pm] " Nigel Cunningham @ 2005-03-29 22:35 ` Pavel Machek 2005-03-29 23:46 ` Nigel Cunningham 2005-03-31 7:26 ` Dmitry Torokhov 0 siblings, 2 replies; 52+ messages in thread From: Pavel Machek @ 2005-03-29 22:35 UTC (permalink / raw) To: Nigel Cunningham Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson Hi! > > We currently freeze processes for suspend-to-ram, too. I guess that > > disable_usermodehelper is probably better and that in_suspend() should > > only be used for sanity checks... go with disable_usermodehelper and > > sorry for the noise. > > Here's another possibility: Freeze the workqueue that > call_usermodehelper uses (remember that code I didn't push hard enough > to Andrew?), and let invocations of call_usermodehelper block in > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on There may be many devices in the system, and you are going to need quite a lot of RAM for all that... That's why they do not queue it during boot, IIRC. Disabling usermode helper seems right. Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-29 22:35 ` Pavel Machek @ 2005-03-29 23:46 ` Nigel Cunningham 2005-03-31 7:26 ` Dmitry Torokhov 1 sibling, 0 replies; 52+ messages in thread From: Nigel Cunningham @ 2005-03-29 23:46 UTC (permalink / raw) To: Pavel Machek Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson Hi. On Wed, 2005-03-30 at 08:35, Pavel Machek wrote: > Hi! > > > > We currently freeze processes for suspend-to-ram, too. I guess that > > > disable_usermodehelper is probably better and that in_suspend() should > > > only be used for sanity checks... go with disable_usermodehelper and > > > sorry for the noise. > > > > Here's another possibility: Freeze the workqueue that > > call_usermodehelper uses (remember that code I didn't push hard enough > > to Andrew?), and let invocations of call_usermodehelper block in > > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on > > There may be many devices in the system, and you are going to need > quite a lot of RAM for all that... That's why they do not queue it > during boot, IIRC. Disabling usermode helper seems right. Many devices is true, but very few of them invoke usermode helpers. 
[desktop build-2.6.12-rc1]# find -name *.[ch] | xargs grep usermodehelper
./drivers/s390/crypto/z90main.c: call_usermodehelper(argv[0], argv, envp, 0);
./drivers/net/hamradio/baycom_epp.c: return call_usermodehelper(eppconfig_path, argv, envp, 1);
./drivers/acpi/thermal.c: call_usermodehelper(argv[0], argv, envp, 0);
./drivers/acpi/thermal.mod.c: { 0x436006da, "call_usermodehelper" },
./drivers/input/input.c: value = call_usermodehelper(argv [0], argv, envp, 0);
./drivers/pnp/pnpbios/core.c: value = call_usermodehelper (argv [0], argv, envp, 0);
./drivers/macintosh/therm_pm72.c: return call_usermodehelper(critical_overtemp_path, argv, envp, 0);
./arch/i386/mach-voyager/voyager_thread.c: if ((ret = call_usermodehelper(argv[0], argv, envp, 1)) != 0) {
./include/linux/kmod.h:extern int call_usermodehelper(char *path, char *argv[], char *envp[], int wait);
./include/linux/kmod.h:extern void usermodehelper_init(void);
./kernel/power/main.c: return call_usermodehelper(argv[0], argv, envp, 1);
./kernel/power/suspend_userui.c: retval = call_usermodehelper(userui_program, argv, envp, 0);
./kernel/kmod.c: call_usermodehelper wait flag, and remove exec_usermodehelper.
./kernel/kmod.c: ret = call_usermodehelper(modprobe_path, argv, envp, 1);
./kernel/kmod.c:static int ____call_usermodehelper(void *data)
./kernel/kmod.c: pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD);
./kernel/kmod.c:static void __call_usermodehelper(void *data)
./kernel/kmod.c: pid = kernel_thread(____call_usermodehelper, sub_info,
./kernel/kmod.c: * call_usermodehelper - start a usermode application
./kernel/kmod.c:int call_usermodehelper(char *path, char **argv, char **envp, int wait)
./kernel/kmod.c: DECLARE_WORK(work, __call_usermodehelper, &sub_info);
./kernel/kmod.c:EXPORT_SYMBOL(call_usermodehelper);
./kernel/kmod.c:void __init usermodehelper_init(void)
./kernel/cpuset.c: * Note final arg to call_usermodehelper() is 0 - that means
./kernel/cpuset.c: return call_usermodehelper(argv[0], argv, envp, 0);
./security/keys/request_key.c: return call_usermodehelper(argv[0], argv, envp, 1);
./lib/kobject_uevent.c: retval = call_usermodehelper (argv[0], argv, envp, 0);
./lib/kobject_uevent.c: pr_debug ("%s - call_usermodehelper returned %d\n",
./init/main.c: usermodehelper_init();
Of course there will be indirect invocations (via kobjects, for
example), but I still think the number is not that great. I'm already
using the method I suggested in unreleased Suspend2 code, and the only
invocation I catch is at resume time, for kseriod.
Regards,
Nigel
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574
Maintainer of Suspend2 Kernel Patches http://suspend2.net
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 22:35 ` Pavel Machek
2005-03-29 23:46 ` Nigel Cunningham
@ 2005-03-31 7:26 ` Dmitry Torokhov
2005-03-31 8:39 ` Pavel Machek
2005-03-31 16:02 ` Patrick Mochel
1 sibling, 2 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-31 7:26 UTC (permalink / raw)
To: Pavel Machek
Cc: Nigel Cunningham, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson

On Tuesday 29 March 2005 17:35, Pavel Machek wrote:
> Hi!
>
> > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > disable_usermodehelper is probably better and that in_suspend() should
> > > only be used for sanity checks... go with disable_usermodehelper and
> > > sorry for the noise.
> >
> > Here's another possibility: Freeze the workqueue that
> > call_usermodehelper uses (remember that code I didn't push hard enough
> > to Andrew?), and let invocations of call_usermodehelper block in
> > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
>
> There may be many devices in the system, and you are going to need
> quite a lot of RAM for all that... That's why they do not queue it
> during boot, IIRC. Disabling usermode helper seems right.

Ok, what do you think about this one?

===================================================================

swsusp: disable usermodehelper after generating memory snapshot and
before resuming devices, so when device fails to resume we
won't try to call hotplug - userspace stopped anyway.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

include/linux/kmod.h | 3 +++
kernel/kmod.c | 14 +++++++++++++-
kernel/power/disk.c | 2 ++
kernel/power/swsusp.c | 1 -
4 files changed, 18 insertions(+), 2 deletions(-)

Index: dtor/kernel/power/disk.c
===================================================================
--- dtor.orig/kernel/power/disk.c
+++ dtor/kernel/power/disk.c
@@ -205,6 +205,8 @@ int pm_suspend_disk(void)
 
 	if (in_suspend) {
 		pr_debug("PM: writing image.\n");
+		usermodehelper_disable();
+		device_resume();
 		error = swsusp_write();
 		if (!error)
 			power_down(pm_disk_mode);
Index: dtor/kernel/power/swsusp.c
===================================================================
--- dtor.orig/kernel/power/swsusp.c
+++ dtor/kernel/power/swsusp.c
@@ -853,7 +853,6 @@ static int suspend_prepare_image(void)
 int swsusp_write(void)
 {
 	int error;
-	device_resume();
 	lock_swapdevices();
 	error = write_suspend_image();
 	/* This will unlock ignored swap devices since writing is finished */
Index: dtor/kernel/kmod.c
===================================================================
--- dtor.orig/kernel/kmod.c
+++ dtor/kernel/kmod.c
@@ -124,6 +124,8 @@ struct subprocess_info {
 	int retval;
 };
 
+static int usermodehelper_disabled;
+
 /*
  * This is the task which runs the usermode application
  */
@@ -240,7 +242,7 @@ int call_usermodehelper(char *path, char
 	if (!khelper_wq)
 		return -EBUSY;
 
-	if (path[0] == '\0')
+	if (usermodehelper_disabled || path[0] == '\0')
 		return 0;
 
 	queue_work(khelper_wq, &work);
@@ -249,6 +251,16 @@ int call_usermodehelper(char *path, char
 }
 EXPORT_SYMBOL(call_usermodehelper);
 
+void usermodehelper_enable(void)
+{
+	usermodehelper_disabled = 0;
+}
+
+void usermodehelper_disable(void)
+{
+	usermodehelper_disabled = 1;
+}
+
 void __init usermodehelper_init(void)
 {
 	khelper_wq = create_singlethread_workqueue("khelper");
Index: dtor/include/linux/kmod.h
===================================================================
--- dtor.orig/include/linux/kmod.h
+++ dtor/include/linux/kmod.h
@@ -34,7 +34,10 @@ static inline int request_module(const c
 #endif
 
 #define try_then_request_module(x, mod...) ((x) ?: (request_module(mod), (x)))
+
 extern int call_usermodehelper(char *path, char *argv[], char *envp[], int wait);
 extern void usermodehelper_init(void);
+extern void usermodehelper_enable(void);
+extern void usermodehelper_disable(void);
 
 #endif /* __LINUX_KMOD_H__ */

^ permalink raw reply [flat|nested] 52+ messages in thread
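For readers following along, the gating behaviour the patch above adds to call_usermodehelper() can be sketched as a small userspace model. This is only an illustration under stated assumptions: model_call_usermodehelper(), model_helpers_queued() and the helpers_queued counter are hypothetical stand-ins (the counter replaces queue_work() on khelper_wq), and nothing here is kernel code.

```c
#include <stdio.h>

/* Mirrors the flag the patch adds to kernel/kmod.c. */
static int usermodehelper_disabled;

/* Hypothetical stand-in for queue_work(khelper_wq, &work):
 * counts how many helper invocations would have been queued. */
static int helpers_queued;

void usermodehelper_enable(void)
{
	usermodehelper_disabled = 0;
}

void usermodehelper_disable(void)
{
	usermodehelper_disabled = 1;
}

/* Model of the patched check: when the helper is disabled (or the path
 * is empty) the call returns 0 without queuing anything, so the event
 * is silently dropped rather than blocking on userspace. */
int model_call_usermodehelper(const char *path)
{
	if (usermodehelper_disabled || path[0] == '\0')
		return 0;

	helpers_queued++;
	return 0;
}

/* Accessor so callers do not touch the counter directly. */
int model_helpers_queued(void)
{
	return helpers_queued;
}
```

Note that in this model, as in the patch, a dropped event is indistinguishable from a successful one to the caller; that is the "dropping messages on the floor" behaviour debated later in the thread.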
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-31 7:26 ` Dmitry Torokhov @ 2005-03-31 8:39 ` Pavel Machek 2005-03-31 15:02 ` Dmitry Torokhov 2005-03-31 16:02 ` Patrick Mochel 1 sibling, 1 reply; 52+ messages in thread From: Pavel Machek @ 2005-03-31 8:39 UTC (permalink / raw) To: Dmitry Torokhov Cc: Nigel Cunningham, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson Hi! > > > > We currently freeze processes for suspend-to-ram, too. I guess that > > > > disable_usermodehelper is probably better and that in_suspend() should > > > > only be used for sanity checks... go with disable_usermodehelper and > > > > sorry for the noise. > > > > > > Here's another possibility: Freeze the workqueue that > > > call_usermodehelper uses (remember that code I didn't push hard enough > > > to Andrew?), and let invocations of call_usermodehelper block in > > > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on > > > > There may be many devices in the system, and you are going to need > > quite a lot of RAM for all that... That's why they do not queue it > > during boot, IIRC. Disabling usermode helper seems right. > > Ok, what do you think about this one? > > =================================================================== > > swsusp: disable usermodehelper after generating memory snapshot and > before resuming devices, so when device fails to resume we > won't try to call hotplug - userspace stopped anyway. 
> > Signed-off-by: Dmitry Torokhov <dtor@mail.ru> > > > include/linux/kmod.h | 3 +++ > kernel/kmod.c | 14 +++++++++++++- > kernel/power/disk.c | 2 ++ > kernel/power/swsusp.c | 1 - > 4 files changed, 18 insertions(+), 2 deletions(-) > > Index: dtor/kernel/power/disk.c > =================================================================== > --- dtor.orig/kernel/power/disk.c > +++ dtor/kernel/power/disk.c > @@ -205,6 +205,8 @@ int pm_suspend_disk(void) > > if (in_suspend) { > pr_debug("PM: writing image.\n"); > + usermodehelper_disable(); > + device_resume(); > error = swsusp_write(); > if (!error) > power_down(pm_disk_mode); > Index: dtor/kernel/power/swsusp.c > =================================================================== > --- dtor.orig/kernel/power/swsusp.c > +++ dtor/kernel/power/swsusp.c > @@ -853,7 +853,6 @@ static int suspend_prepare_image(void) > int swsusp_write(void) > { > int error; > - device_resume(); > lock_swapdevices(); > error = write_suspend_image(); > /* This will unlock ignored swap devices since writing is Looks good, except... why move code around? Could you just call usermodehelper_disable from swsusp_write? Pavel -- People were complaining that M$ turns users into beta-testers... ...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl! ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-31 8:39 ` Pavel Machek
@ 2005-03-31 15:02 ` Dmitry Torokhov
0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-31 15:02 UTC (permalink / raw)
To: Pavel Machek
Cc: Nigel Cunningham, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson

On Thu, 31 Mar 2005 10:39:10 +0200, Pavel Machek <pavel@suse.cz> wrote:
> > int swsusp_write(void)
> > {
> > int error;
> > - device_resume();
> > lock_swapdevices();
> > error = write_suspend_image();
> > /* This will unlock ignored swap devices since writing is
>
> Looks good, except... why move code around? Could you just call
> usermodehelper_disable from swsusp_write?

That's because I don't think that swsusp_write is a proper place for
it ;) It looks like a lean and mean function that just writes the
image; manipulating usermodehelper state _and_ system (device) state
there is wrong. Let it do one thing, and don't overload it with actions
that I think belong to the upper level. Do you agree?

I think I need to stick in a usermodehelper_enable call in case
swsusp_write fails though.

--
Dmitry

^ permalink raw reply [flat|nested] 52+ messages in thread
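The follow-up mentioned at the end of the message above (re-enabling the helper when the image write fails) might look roughly like the sketch below. It is a userspace model, not the eventual kernel change: model_write_image_phase(), model_device_resume(), model_swsusp_write() and model_helper_is_disabled() are hypothetical names, and the should_fail parameter exists only to exercise the error path.

```c
#include <errno.h>

/* Stand-in for the usermodehelper_disabled flag from the patch. */
static int helper_disabled;

static void helper_disable(void) { helper_disabled = 1; }
static void helper_enable(void)  { helper_disabled = 0; }

/* Hypothetical stand-in for device_resume(); does nothing here. */
static void model_device_resume(void)
{
}

/* Hypothetical stand-in for swsusp_write(); fails with -EIO on request. */
static int model_swsusp_write(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Model of the "writing image" branch of pm_suspend_disk():
 * disable the helper, resume devices, write the image, and undo the
 * disable if the write failed, since the system keeps running then. */
int model_write_image_phase(int should_fail)
{
	int error;

	helper_disable();
	model_device_resume();
	error = model_swsusp_write(should_fail);
	if (error)
		helper_enable();	/* suspend aborted: hotplug must work again */
	/* on success the real code would call power_down() here */
	return error;
}

int model_helper_is_disabled(void)
{
	return helper_disabled;
}
```

Keeping the enable/disable pair in the caller rather than in the write routine matches the layering argument made in the message: the write function only writes, and the upper level owns the helper state.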
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-03-31 7:26 ` Dmitry Torokhov 2005-03-31 8:39 ` Pavel Machek @ 2005-03-31 16:02 ` Patrick Mochel 2005-03-31 16:32 ` Dmitry Torokhov 1 sibling, 1 reply; 52+ messages in thread From: Patrick Mochel @ 2005-03-31 16:02 UTC (permalink / raw) To: Dmitry Torokhov Cc: Pavel Machek, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Nigel Cunningham, Stefan Seyfried, Linux Kernel Mailing List On Thu, 31 Mar 2005, Dmitry Torokhov wrote: > Ok, what do you think about this one? > > =================================================================== > > swsusp: disable usermodehelper after generating memory snapshot and > before resuming devices, so when device fails to resume we > won't try to call hotplug - userspace stopped anyway. Hm, shouldn't we disable it before we start to freeze processes? We don't want any more processes trying to start up after we've taken care of them.. Thanks, Pat ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-31 16:02 ` Patrick Mochel
@ 2005-03-31 16:32 ` Dmitry Torokhov
2005-03-31 22:16 ` Nigel Cunningham
2005-03-31 22:18 ` Pavel Machek
0 siblings, 2 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-31 16:32 UTC (permalink / raw)
To: Patrick Mochel
Cc: Pavel Machek, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Nigel Cunningham, Stefan Seyfried, Linux Kernel Mailing List

On Thu, 31 Mar 2005 08:02:44 -0800 (PST), Patrick Mochel <mochel@digitalimplant.org> wrote:
>
> On Thu, 31 Mar 2005, Dmitry Torokhov wrote:
>
> > Ok, what do you think about this one?
> >
> > ===================================================================
> >
> > swsusp: disable usermodehelper after generating memory snapshot and
> > before resuming devices, so when device fails to resume we
> > won't try to call hotplug - userspace stopped anyway.
>
> Hm, shouldn't we disable it before we start to freeze processes? We don't
> want any more processes trying to start up after we've taken care of
> them..
>

Can't a device be removed (for any reason) _while_ we are freezing
processes? I think the freezing code will properly deal with it... What
about suspend semantics - if suspend fails do we say the device should
be operational or the system should attempt to re-initialize? I.e. we
are not doing suspend after all - can we still drop messages on the
floor? After all, we still have the ability to run coldplug after a
failed suspend.

I frankly am not sure at what point to disable usermode helper. Or
maybe we need to have a list of pending events and suspend khelper_wq
while suspending.

--
Dmitry

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-31 16:32 ` Dmitry Torokhov
@ 2005-03-31 22:16 ` Nigel Cunningham
2005-03-31 22:18 ` Pavel Machek
1 sibling, 0 replies; 52+ messages in thread
From: Nigel Cunningham @ 2005-03-31 22:16 UTC (permalink / raw)
To: dtor_core
Cc: Patrick Mochel, Pavel Machek, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Stefan Seyfried, Linux Kernel Mailing List

Hi.

On Fri, 2005-04-01 at 02:32, Dmitry Torokhov wrote:
> On Thu, 31 Mar 2005 08:02:44 -0800 (PST), Patrick Mochel
> <mochel@digitalimplant.org> wrote:
> >
> > On Thu, 31 Mar 2005, Dmitry Torokhov wrote:
> >
> > > Ok, what do you think about this one?
> > >
> > > ===================================================================
> > >
> > > swsusp: disable usermodehelper after generating memory snapshot and
> > > before resuming devices, so when device fails to resume we
> > > won't try to call hotplug - userspace stopped anyway.
> >
> > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > want any more processes trying to start up after we've taken care of
> > them..
> >
>
> Can't a device be removed (for any reason) _while_ we are freezing
> processes? I think freezing code will properly deal with it... What
> about suspend semantics - if suspend fails do we say the device should
> be operational or the system should attempt to re-initialize? I.e. we
> are not doing suspend after all - can we still drop messages on the
> floor? After all, we still have ability to run coldplug after failed
> suspend.
>
> I frankly am not sure at what point to disable usermode helper. Or
> maybe we need to have a list of pending events and suspend khelper_wq
> while suspending.

FWIW, my solution is purely freezer based. I freeze khelper and, in the
freezer code, ignore kernel threads in state uninterruptible (which is
where kseriod, e.g., will be while it waits for the usermode helper
process, which also gets frozen).

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-31 16:32 ` Dmitry Torokhov
2005-03-31 22:16 ` Nigel Cunningham
@ 2005-03-31 22:18 ` Pavel Machek
2005-03-31 22:28 ` Nigel Cunningham
1 sibling, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-31 22:18 UTC (permalink / raw)
To: dtor_core
Cc: Patrick Mochel, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Nigel Cunningham, Stefan Seyfried, Linux Kernel Mailing List

Hi!

> > > Ok, what do you think about this one?
> > >
> > > ===================================================================
> > >
> > > swsusp: disable usermodehelper after generating memory snapshot and
> > > before resuming devices, so when device fails to resume we
> > > won't try to call hotplug - userspace stopped anyway.
> >
> > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > want any more processes trying to start up after we've taken care of
> > them..
> >
>
> Can't a device be removed (for any reason) _while_ we are freezing
> processes? I think freezing code will properly deal with it... What
> about suspend semantics - if suspend fails do we say the device should
> be operational or the system should attempt to re-initialize? I.e. we
> are not doing suspend after all - can we still drop messages on the
> floor? After all, we still have ability to run coldplug after failed
> suspend.

I believe we should freeze hotplug before processes. Dropping messages
on the floor should not be a problem; we should just call coldplug
after a failed suspend.

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-31 22:18 ` Pavel Machek
@ 2005-03-31 22:28 ` Nigel Cunningham
2005-04-01 8:49 ` Rafael J. Wysocki
0 siblings, 1 reply; 52+ messages in thread
From: Nigel Cunningham @ 2005-03-31 22:28 UTC (permalink / raw)
To: Pavel Machek
Cc: dtor_core, Patrick Mochel, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Stefan Seyfried, Linux Kernel Mailing List

Hi.

On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
> Hi!
>
> > > > Ok, what do you think about this one?
> > > >
> > > > ===================================================================
> > > >
> > > > swsusp: disable usermodehelper after generating memory snapshot and
> > > > before resuming devices, so when device fails to resume we
> > > > won't try to call hotplug - userspace stopped anyway.
> > >
> > > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > > want any more processes trying to start up after we've taken care of
> > > them..
> > >
> >
> > Can't a device be removed (for any reason) _while_ we are freezing
> > processes? I think freezing code will properly deal with it... What
> > about suspend semantics - if suspend fails do we say the device should
> > be operational or the system should attempt to re-initialize? I.e. we
> > are not doing suspend after all - can we still drop messages on the
> > floor? After all, we still have ability to run coldplug after failed
> > suspend.
>
> I believe we should freeze hotplug before processes. Dropping messages
> on the floor should not be a problem, we should just call coldplug
> after failed suspend.

How will you know which devices to call coldplug for, post resume? (Or
does it figure that out itself somehow?)

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-31 22:28 ` Nigel Cunningham
@ 2005-04-01 8:49 ` Rafael J. Wysocki
2005-04-01 10:33 ` Stefan Seyfried
0 siblings, 1 reply; 52+ messages in thread
From: Rafael J. Wysocki @ 2005-04-01 8:49 UTC (permalink / raw)
To: ncunningham
Cc: Pavel Machek, dtor_core, Patrick Mochel, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Stefan Seyfried, Linux Kernel Mailing List

Hi,

On Friday, 1 of April 2005 00:28, Nigel Cunningham wrote:
> Hi.
>
> On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
> > Hi!
> >
> > > > > Ok, what do you think about this one?
> > > > >
> > > > > ===================================================================
> > > > >
> > > > > swsusp: disable usermodehelper after generating memory snapshot and
> > > > > before resuming devices, so when device fails to resume we
> > > > > won't try to call hotplug - userspace stopped anyway.
> > > >
> > > > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > > > want any more processes trying to start up after we've taken care of
> > > > them..
> > > >
> > >
> > > Can't a device be removed (for any reason) _while_ we are freezing
> > > processes? I think freezing code will properly deal with it... What
> > > about suspend semantics - if suspend fails do we say the device should
> > > be operational or the system should attempt to re-initialize? I.e. we
> > > are not doing suspend after all - can we still drop messages on the
> > > floor? After all, we still have ability to run coldplug after failed
> > > suspend.
> >
> > I believe we should freeze hotplug before processes.

I agree. IMO user space should not be considered as available once we have
started freezing processes, so hotplug should be disabled before. By the same
token, it should only be enabled after the processes have been restarted
during resume (or after suspend has failed).

BTW, it seems to me that the forking of new processes could be disabled
before we start to freeze the existing ones.

> > Dropping messages on the floor should not be a problem, we should just
> > call coldplug after failed suspend.
>
> How will you know which devices to call coldplug for, post resume? (Or
> does it figure that out itself somehow?)

I think the drivers that need the hotplug to resume should defer their resume
routines until usermodehelper is enabled (it seems to me that we can use a
completion to handle this).

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

^ permalink raw reply [flat|nested] 52+ messages in thread
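The deferral idea in the message above (drivers that need hotplug postpone their resume work until the usermode helper is enabled again) can be illustrated with a single-threaded userspace sketch. This deliberately swaps the suggested kernel completion for a plain pending-callback queue, so it shows only the ordering, not the blocking behaviour. All names here (resume_or_defer, helper_disable_model, helper_enable_model, count_resume, MAX_DEFERRED) are hypothetical.

```c
#include <stddef.h>

#define MAX_DEFERRED 8

typedef void (*resume_fn)(void *dev);

struct deferred_resume {
	resume_fn fn;
	void *dev;
};

/* Resume callbacks parked while the helper is disabled. */
static struct deferred_resume pending[MAX_DEFERRED];
static size_t n_pending;
static int helper_enabled = 1;

/* Run the resume routine right away if the helper is available,
 * otherwise park it. Returns 0 on success, -1 if the queue is full. */
int resume_or_defer(resume_fn fn, void *dev)
{
	if (helper_enabled) {
		fn(dev);
		return 0;
	}
	if (n_pending == MAX_DEFERRED)
		return -1;
	pending[n_pending].fn = fn;
	pending[n_pending].dev = dev;
	n_pending++;
	return 0;
}

void helper_disable_model(void)
{
	helper_enabled = 0;
}

/* Re-enable the helper and run every parked routine in FIFO order. */
void helper_enable_model(void)
{
	size_t i;

	helper_enabled = 1;
	for (i = 0; i < n_pending; i++)
		pending[i].fn(pending[i].dev);
	n_pending = 0;
}

/* Example callback: counts completed resumes through an int pointer. */
void count_resume(void *dev)
{
	(*(int *)dev)++;
}
```

A real implementation would have to bound or allocate the queue, which is the memory concern raised earlier in the thread about queuing events for many devices.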
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault? 2005-04-01 8:49 ` Rafael J. Wysocki @ 2005-04-01 10:33 ` Stefan Seyfried 0 siblings, 0 replies; 52+ messages in thread From: Stefan Seyfried @ 2005-04-01 10:33 UTC (permalink / raw) To: Rafael J. Wysocki Cc: ncunningham, Pavel Machek, dtor_core, Patrick Mochel, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Linux Kernel Mailing List Rafael J. Wysocki wrote: > Hi, >> On Fri, 2005-04-01 at 08:18, Pavel Machek wrote: >> > I believe we should freeze hotplug before processes. > > I agree. IMO user space should not be considered as available once we have > started freezing processes, so hotplug should be disabled before. By the same > token, it should only be enabled after the processes have been restarted > during resume (or after suspend has failed). it has probably to be enabled before the processes are restarted - they may rightfully assume that hotplug is working. -- seife Never trust a computer you can't lift. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 21:12 ` Pavel Machek
2005-03-29 21:33 ` Dmitry Torokhov
@ 2005-03-29 23:05 ` Rafael J. Wysocki
1 sibling, 0 replies; 52+ messages in thread
From: Rafael J. Wysocki @ 2005-03-29 23:05 UTC (permalink / raw)
To: Pavel Machek
Cc: dtor_core, Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik, Linux-pm mailing list

Hi,

On Tuesday, 29 of March 2005 23:12, Pavel Machek wrote:
> Hi!
>
> > > I don't really want us to try execve during resume... Could we simply
> > > artificially fail that execve with something if (in_suspend()) return
> > > -EINVAL; [except that in_suspend() just is not there, but there were
> > > some proposals to add it].
> > >
> > > Or just avoid calling hotplug at all in resume case? And then do
> > > coldplug-like scan when userspace is ready...
> > >
> >
> > I am leaning towards calling disable_usermodehelper (not written yet)
> > after swsusp completes snapshotting memory. We really don't care about
> > hotplug events in this case and this will allow keeping "normal"
> > resume in drivers as is. What do you think?
>
> That would certainly do the trick.
>
> [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]

IMHO, they are not mutually exclusive. However, by using disable_usermodehelper
we would get rid of the reason (ie hotplug events) instead of just curing the
symptoms (ie execve() during suspend).

Greets,
Rafael

--
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 20:52 ` Pavel Machek
2005-03-29 21:07 ` Dmitry Torokhov
@ 2005-03-29 21:23 ` Patrick Mochel
2005-03-29 21:38 ` Dmitry Torokhov
2005-03-30 9:52 ` Greg KH
1 sibling, 2 replies; 52+ messages in thread
From: Patrick Mochel @ 2005-03-29 21:23 UTC (permalink / raw)
To: Pavel Machek
Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, kernel list, Andy Isaacson

On Tue, 29 Mar 2005, Pavel Machek wrote:

> I don't really want us to try execve during resume... Could we simply
> artificially fail that execve with something if (in_suspend()) return
> -EINVAL; [except that in_suspend() just is not there, but there were
> some proposals to add it].
>
> Or just avoid calling hotplug at all in resume case? And then do
> coldplug-like scan when userspace is ready...

I thought that cold-plugging only worked for devices, not all objects.

Can we just queue up hotplug events? That way we wouldn't lose any across
the transition, and could be used to send resume events to userspace for
various devices that need help..

Pat

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 21:23 ` [linux-pm] " Patrick Mochel
@ 2005-03-29 21:38 ` Dmitry Torokhov
2005-03-30 9:52 ` Greg KH
1 sibling, 0 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 21:38 UTC (permalink / raw)
To: Patrick Mochel
Cc: Pavel Machek, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried, kernel list, Andy Isaacson

On Tue, 29 Mar 2005 13:23:35 -0800 (PST), Patrick Mochel <mochel@digitalimplant.org> wrote:
>
> On Tue, 29 Mar 2005, Pavel Machek wrote:
>
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
>
> I thought that cold-plugging only worked for devices, not all objects.
>

It really depends on the script - nothing stops it from traversing the
entire /sys tree, and if an object is not exported in the tree I'd say
userspace should not care about such an object anyway.

> Can we just queue up hotplug events? That way we wouldn't lose any across
> the transition, and could be used to send resume events to userspace for
> various devices that need help..
>

The point is that at this point any changes to the system state will
be discarded - we already did the image and are about to write it.
When we resume for real all those events will be regenerated once
again.

--
Dmitry

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 21:23 ` [linux-pm] " Patrick Mochel
2005-03-29 21:38 ` Dmitry Torokhov
@ 2005-03-30 9:52 ` Greg KH
1 sibling, 0 replies; 52+ messages in thread
From: Greg KH @ 2005-03-30 9:52 UTC (permalink / raw)
To: Patrick Mochel
Cc: Pavel Machek, Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list, Stefan Seyfried, kernel list

On Tue, Mar 29, 2005 at 01:23:35PM -0800, Patrick Mochel wrote:
>
> On Tue, 29 Mar 2005, Pavel Machek wrote:
>
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
>
> I thought that cold-plugging only worked for devices, not all objects.

We can walk the whole sysfs tree and create "cold" hotplug events.
udevstart does that for devices that udev cares about (as an example.)

> Can we just queue up hotplug events? That way we wouldn't lose any across
> the transition, and could be used to send resume events to userspace for
> various devices that need help..

Ick, I really hate this idea, but there is a patch in the SuSE kernel to
do this at boot time. Hopefully the author of that patch resubmits it
again and maybe it will make it eventually into mainline...

thanks,

greg k-h

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-24 23:54 ` Andy Isaacson
2005-03-25 9:22 ` Stefan Seyfried
@ 2005-03-25 14:58 ` Dmitry Torokhov
2005-03-30 7:26 ` Andy Isaacson
1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-25 14:58 UTC (permalink / raw)
To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 15:54:39 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> On Thu, Mar 24, 2005 at 04:10:39PM -0500, Dmitry Torokhov wrote:
> > If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
> > have MUX mode active.
>
> Just serio0 and serio1.
>
> On Thu, Mar 24, 2005 at 04:14:52PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > > (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> > > command line" but there's no other mention of it in dmesg.)
> >
> > Ignore my babbling, I just noticed in your dmesg that your KBC does
> > not support MUX mode to begin with.
>
> OK, anything else I should try?
>
> Why does it only fail when I have *both* intel_agp and i8042 aux?
>
> In the SysRq-T trace I see one interesting process: most things are
> in D state in refrigerator(), but sh shows the following traceback:
>
> wait_for_completion
> call_usermodehelper
> kobject_hotplug
> kobject_del
> class_device_del
> class_device_unregister
> mousedev_disconnect
> input_unregister_device
> alps_disconnect
> psmouse_disconnect
> serio_driver_remove
> device_release_driver
> serio_release_driver
> serio_resume

I wonder why ALPS reconnect failed. You don't have a serial console
set up, do you? If not then maybe you could make a huge framebuffer to
capture as much info as you can... I hope you have a digital camera ;)

Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
suspend. I am interested in the data coming in and out of i8042.

--
Dmitry

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-25 14:58 ` Dmitry Torokhov
@ 2005-03-30 7:26 ` Andy Isaacson
0 siblings, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-30 7:26 UTC (permalink / raw)
To: dtor_core; +Cc: Stefan Seyfried, kernel list

On Fri, Mar 25, 2005 at 09:58:40AM -0500, Dmitry Torokhov wrote:
> I wonder why ALPS reconnect failed. You don't have a serial console
> set up, do you? If not then maybe you could make a huge framebuffer to
> capture as much info as you can... I hope you have a digital camera ;)

No serial ports brought out on this laptop, and I've not tried
framebuffer...

> Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
> suspend. I am interested in the data coming in and out of i8042.

Transcribed by hand, the last few bytes are

< fa ACK
> d4 e9 GETINFO
< fa 20 00 64
> d4 ff RESET_BAT
< fa aa 00 RET_BAT

(Because I used O= the __FILE__ is very long so each dbg() takes two
lines of my 80x25 console...)

Dunno if that's helpful, sorry...

-andy

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-24 20:20 ` Andy Isaacson
2005-03-24 21:10 ` Dmitry Torokhov
@ 2005-03-24 21:14 ` Dmitry Torokhov
1 sibling, 0 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-24 21:14 UTC (permalink / raw)
To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > > intel_agp, started X, and verified no touchpad action. Then I
> > > suspended, and it worked fine. After restart, I suspended again - also
> > > fine.
> > >
> > > So I think that fixed it. But no touchpad is a bit annoying. :)
> >
> > Try adding i8042.nomux instead of i8042.noaux, it should keep your
> > touchpad in working condition. Please let me know if it still works.
>
> With nomux the touchpad works again, but suspend blocks in the same
> place as without nomux.
>
> (How can I verify that "nomux" was accepted? It shows up on the "Kernel
> command line" but there's no other mention of it in dmesg.)
>

Ignore my babbling, I just noticed in your dmesg that your KBC does
not support MUX mode to begin with.

--
Dmitry

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-24 18:10 ` Andy Isaacson
2005-03-24 19:18 ` Dmitry Torokhov
@ 2005-03-24 20:38 ` Stefan Seyfried
2005-03-29 18:42 ` Dmitry Torokhov
1 sibling, 1 reply; 52+ messages in thread
From: Stefan Seyfried @ 2005-03-24 20:38 UTC (permalink / raw)
To: Andy Isaacson; +Cc: kernel list

Andy Isaacson wrote:
> On Thu, Mar 24, 2005 at 03:27:15PM +0100, Stefan Seyfried wrote:

> Sysrq still prints stuff, so IRQs aren't locked. But most of the sysrq
> commands don't work... S and U don't seem to do anything (not too
> suprising I suppose) but B does reboot.

sysrq-t will probably show a stuck kseriod. Unfortunately it only
happens on one machine for me (toshiba P10-550 IIRC, P4HT but with
non-smp kernel) which has no serial port for console.

>> If sysrq is still working, please try with "i8042.noaux" (this will kill
>> your touchpad, which is what i intend :-)
>
> So I added i8042.noaux to my kernel command line, rebooted, insmodded
> intel_agp, started X, and verified no touchpad action. Then I
> suspended, and it worked fine. After restart, I suspended again - also
> fine.
>
> So I think that fixed it. But no touchpad is a bit annoying. :)

Yes, it was not thought as a fix but just for verification, since i have
seen something similar.
We have a SUSE bug for this, i believe Vojtech and Pavel will take care
of this one. Thanks for confirming, i almost started to believe i was
seeing ghosts :-)
--
seife
Never trust a computer you can't lift.

^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-24 20:38 ` Stefan Seyfried
@ 2005-03-29 18:42 ` Dmitry Torokhov
2005-03-30 7:24 ` Andy Isaacson
0 siblings, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 18:42 UTC (permalink / raw)
To: Stefan Seyfried; +Cc: Andy Isaacson, kernel list

On Thursday 24 March 2005 15:38, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > intel_agp, started X, and verified no touchpad action. Then I
> > suspended, and it worked fine. After restart, I suspended again - also
> > fine.
> >
> > So I think that fixed it. But no touchpad is a bit annoying. :)
>
> Yes, it was not thought as a fix but just for verification, since i have
> seen something similar.
> We have a SUSE bug for this, i believe Vojtech and Pavel will take care
> of this one. Thanks for confirming, i almost started to believe i was
> seeing ghosts :-)

Could you please try the patch below - it should fix the issues you are
seeing although there may be other devices (really any hot-pluggable
device) that will show the same behaviour. In the long run swsusp should
not attempt resuming devices when the system can not handle the process
properly.

--
Dmitry

===================================================================

Input: serio - do not attempt to immediately disconnect port if
resume failed, let kseriod take care of it. Otherwise we may
attempt to unregister associated input devices which will
generate hotplug events which are not handled well during
swsusp.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

serio.c | 1 -
1 files changed, 1 deletion(-)

Index: dtor/drivers/input/serio/serio.c
===================================================================
--- dtor.orig/drivers/input/serio/serio.c
+++ dtor/drivers/input/serio/serio.c
@@ -779,7 +779,6 @@ static int serio_resume(struct device *d
 	struct serio *serio = to_serio_port(dev);
 
 	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
-		serio_disconnect_port(serio);
 		/*
 		 * Driver re-probing can take a while, so better let kseriod
 		 * deal with it.

^ permalink raw reply [flat|nested] 52+ messages in thread
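
The idea in the patch description above, leaving teardown to kseriod instead of doing it inside the resume callback, can be sketched as a small user-space model. This is not kernel code: every name below (model_port, model_queue, model_resume, model_worker_run) is hypothetical, and the worker is assumed to run only after resume has finished and hotplug events can be handled again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_port {
    bool registered;     /* input device still visible to the system */
    bool reconnect_ok;   /* what the driver's fast reconnect would report */
    int  hotplug_events; /* unregister events generated so far */
};

#define MODEL_QUEUE_MAX 8

struct model_queue {
    struct model_port *pending[MODEL_QUEUE_MAX];
    size_t count;
};

/* Queue a rescan for the worker; returns false if the queue is full. */
static bool model_queue_rescan(struct model_queue *q, struct model_port *p)
{
    if (q->count >= MODEL_QUEUE_MAX)
        return false;
    q->pending[q->count++] = p;
    return true;
}

/*
 * Resume path: if the fast reconnect fails, defer the work. The port is
 * deliberately NOT unregistered here, so no hotplug event is generated
 * while the system is still resuming. Returns true if work was deferred.
 */
static bool model_resume(struct model_queue *q, struct model_port *p)
{
    assert(q != NULL && p != NULL);

    if (p->reconnect_ok)
        return false;
    return model_queue_rescan(q, p);
}

/* Worker, run after resume completes: disconnect and re-probe each port. */
static void model_worker_run(struct model_queue *q)
{
    for (size_t i = 0; i < q->count; i++) {
        struct model_port *p = q->pending[i];

        p->registered = false; /* disconnect generates a hotplug event */
        p->hotplug_events++;
        p->registered = true;  /* re-probe always succeeds in this model */
    }
    q->count = 0;
}
```

In this model a failed reconnect during model_resume leaves the port registered with zero hotplug events, and the single unregister event appears only once model_worker_run is called, which mirrors the ordering the patch aims for.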
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
2005-03-29 18:42 ` Dmitry Torokhov
@ 2005-03-30 7:24 ` Andy Isaacson
0 siblings, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-30 7:24 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: Stefan Seyfried, kernel list

On Tue, Mar 29, 2005 at 01:42:26PM -0500, Dmitry Torokhov wrote:
> Could you please try the patch below - it should fix the issues you are
[snip]
> --- dtor.orig/drivers/input/serio/serio.c
> +++ dtor/drivers/input/serio/serio.c
> if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
> - serio_disconnect_port(serio);
> /*
> * Driver re-probing can take a while, so better let kseriod

Yep, that fixes it. I applied your patch to 2.6.12-rc1-mm1 and
suspended and resumed 5 times in a row without any difficulty.

Thanks!
-andy

^ permalink raw reply [flat|nested] 52+ messages in thread
[parent not found: <20050525171825.51a06908.akpm@osdl.org>]
* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
[not found] ` <20050525171825.51a06908.akpm@osdl.org>
@ 2005-05-27 17:44 ` Andy Isaacson
0 siblings, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-05-27 17:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: Pavel Machek, Dave Jones, linux-kernel

On Wed, May 25, 2005 at 05:18:25PM -0700, Andrew Morton wrote:
> Andy Isaacson <adi@hexapodia.org> wrote:
> > I was previously running 2.6.11-rc3 and swsusp was working quite nicely:
> > echo shutdown > /sys/power/disk
> > echo disk > /sys/power/state
> >
> > Now I've upgraded to 2.6.12-rc1, 423b66b6oJOGN68OhmSrBFxxLOtIEA, and it
> > no longer works reliably. Almost every time I do the above it blocks in
> > device_resume() (I haven't had time to track it deeper than that).
>
> Andy, can you please retest 2.6.12-rc5 and if these problems remain,
> generate new reports at bugme.osdl.org?

After two quick tests, it appears to be fixed in 2.6.12-rc5.

Thanks for the follow-up.

-andy

^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2005-05-27 17:45 UTC | newest]
Thread overview: 52+ messages
2005-03-23 18:49 swsusp 'disk' fails in bk-current - intel_agp at fault? Andy Isaacson
2005-03-24 14:27 ` Stefan Seyfried
2005-03-24 18:10 ` Andy Isaacson
2005-03-24 19:18 ` Dmitry Torokhov
2005-03-24 20:20 ` Andy Isaacson
2005-03-24 21:10 ` Dmitry Torokhov
2005-03-24 23:54 ` Andy Isaacson
2005-03-25 9:22 ` Stefan Seyfried
2005-03-25 10:13 ` Pavel Machek
2005-03-25 14:19 ` Dmitry Torokhov
2005-03-25 14:24 ` Pavel Machek
2005-03-25 14:52 ` Dmitry Torokhov
2005-03-25 15:42 ` Pavel Machek
2005-03-25 16:04 ` Dmitry Torokhov
2005-03-28 23:00 ` Pavel Machek
2005-03-29 23:19 ` Rafael J. Wysocki
2005-03-29 21:49 ` Rafael J. Wysocki
2005-03-25 18:36 ` Andy Isaacson
2005-03-29 16:18 ` Dmitry Torokhov
2005-03-29 18:18 ` Pavel Machek
2005-03-29 19:11 ` Dmitry Torokhov
2005-03-29 19:23 ` Pavel Machek
2005-03-29 20:05 ` Dmitry Torokhov
2005-03-29 20:52 ` Pavel Machek
2005-03-29 21:07 ` Dmitry Torokhov
2005-03-29 21:12 ` Pavel Machek
2005-03-29 21:33 ` Dmitry Torokhov
2005-03-29 21:44 ` Pavel Machek
2005-03-29 22:31 ` [linux-pm] " Nigel Cunningham
2005-03-29 22:35 ` Pavel Machek
2005-03-29 23:46 ` Nigel Cunningham
2005-03-31 7:26 ` Dmitry Torokhov
2005-03-31 8:39 ` Pavel Machek
2005-03-31 15:02 ` Dmitry Torokhov
2005-03-31 16:02 ` Patrick Mochel
2005-03-31 16:32 ` Dmitry Torokhov
2005-03-31 22:16 ` Nigel Cunningham
2005-03-31 22:18 ` Pavel Machek
2005-03-31 22:28 ` Nigel Cunningham
2005-04-01 8:49 ` Rafael J. Wysocki
2005-04-01 10:33 ` Stefan Seyfried
2005-03-29 23:05 ` Rafael J. Wysocki
2005-03-29 21:23 ` [linux-pm] " Patrick Mochel
2005-03-29 21:38 ` Dmitry Torokhov
2005-03-30 9:52 ` Greg KH
2005-03-25 14:58 ` Dmitry Torokhov
2005-03-30 7:26 ` Andy Isaacson
2005-03-24 21:14 ` Dmitry Torokhov
2005-03-24 20:38 ` Stefan Seyfried
2005-03-29 18:42 ` Dmitry Torokhov
2005-03-30 7:24 ` Andy Isaacson
[not found] ` <20050525171825.51a06908.akpm@osdl.org>
2005-05-27 17:44 ` Andy Isaacson