swsusp 'disk' fails in bk-current

public inbox for linux-kernel@vger.kernel.org

* swsusp 'disk' fails in bk-current - intel_agp at fault?
@ 2005-03-23 18:49 Andy Isaacson
  2005-03-24 14:27 ` Stefan Seyfried
       [not found] ` <20050525171825.51a06908.akpm@osdl.org>
  0 siblings, 2 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-23 18:49 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1181 bytes --]

I was previously running 2.6.11-rc3 and swsusp was working quite nicely:
echo shutdown > /sys/power/disk
echo disk > /sys/power/state

Now I've upgraded to 2.6.12-rc1, 423b66b6oJOGN68OhmSrBFxxLOtIEA, and it
no longer works reliably.  Almost every time I do the above it blocks in
device_resume() (I haven't had time to track it deeper than that).
Here's the output (hand copied):

[    51.782593] [nosave pfn 0x356]<7>[nosave pfn 0x357]swsusp:critical section/: done (122772 pages copied)
[    54.305996] PM: writing image.
[    54.306032] /usr/src/linux-2.6-cvs/kernel/power/swsusp.c:863
[    54.316885] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
_

(Obviously, I added some printks to track where it's blocking.)

Dmesg is attached; hardware is a Vaio r505te.

Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
suspends successfully, maybe one time out of 10.  And thinking back, I
*sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
out of 20 suspends.

Another interesting tidbit - I had more success when I tried it without
the intel_agp module loaded; I haven't seen a lockup yet.  (But why
can't I rmmod intel_agp?)

-andy

[-- Attachment #2: dmesg --]
[-- Type: text/plain, Size: 9616 bytes --]

[4294667.296000] Linux version 2.6.12-rc1 (adi@sart) (gcc version 3.3.4 (Debian 1:3.3.4-9ubuntu5)) #7 Tue Mar 22 21:30:45 PST 2005
[4294667.296000] BIOS-provided physical RAM map:
[4294667.296000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[4294667.296000]  BIOS-e820: 000000000009e800 - 00000000000a0000 (reserved)
[4294667.296000]  BIOS-e820: 00000000000c0000 - 00000000000cc000 (reserved)
[4294667.296000]  BIOS-e820: 00000000000d8000 - 0000000000100000 (reserved)
[4294667.296000]  BIOS-e820: 0000000000100000 - 0000000013cf0000 (usable)
[4294667.296000]  BIOS-e820: 0000000013cf0000 - 0000000013cfc000 (ACPI data)
[4294667.296000]  BIOS-e820: 0000000013cfc000 - 0000000013d00000 (ACPI NVS)
[4294667.296000]  BIOS-e820: 0000000013d00000 - 0000000013e80000 (usable)
[4294667.296000]  BIOS-e820: 0000000013e80000 - 0000000014000000 (reserved)
[4294667.296000]  BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
[4294667.296000]  BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
[4294667.296000] 318MB LOWMEM available.
[4294667.296000] On node 0 totalpages: 81536
[4294667.296000]   DMA zone: 4096 pages, LIFO batch:1
[4294667.296000]   Normal zone: 77440 pages, LIFO batch:16
[4294667.296000]   HighMem zone: 0 pages, LIFO batch:1
[4294667.296000] DMI present.
[4294667.296000] ACPI: RSDP (v000 PTLTD                                 ) @ 0x000f7120
[4294667.296000] ACPI: RSDT (v001 SONY   U1       0x20010312 PTL  0x00000000) @ 0x13cf74cb
[4294667.296000] ACPI: FADT (v001   SONY       U1 0x20010312 PTL  0x01000000) @ 0x13cfbf64
[4294667.296000] ACPI: BOOT (v001   SONY       U1 0x20010312 PTL  0x00000001) @ 0x13cfbfd8
[4294667.296000] ACPI: DSDT (v001   SONY       U1 0x20010312 PTL  0x0100000b) @ 0x00000000
[4294667.296000] Allocating PCI resources starting at 14000000 (gap: 14000000:eb800000)
[4294667.296000] Built 1 zonelists
[4294667.296000] Kernel command line: root=/dev/hda2 ro
[4294667.296000] Local APIC disabled by BIOS -- you can enable it with "lapic"
[4294667.296000] mapped APIC to ffffd000 (0127f000)
[4294667.296000] Initializing CPU#0
[4294667.296000] PID hash table entries: 2048 (order: 11, 32768 bytes)
[    0.000000] Detected 596.225 MHz processor.
[   15.409585] Using tsc for high-res timesource
[   15.411373] Console: colour VGA+ 80x25
[   15.413032] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[   15.414428] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[   15.451391] Memory: 319764k/326144k available (1648k kernel code, 5748k reserved, 769k data, 156k init, 0k highmem)
[   15.451466] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[   15.451721] Calibrating delay loop... 1175.55 BogoMIPS (lpj=587776)
[   15.472329] Security Framework v1.0.0 initialized
[   15.472403] Mount-cache hash table entries: 512
[   15.472701] CPU: After generic identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000
[   15.472723] CPU: After vendor identify, caps: 0383f9ff 00000000 00000000 00000000 00000000 00000000 00000000
[   15.472748] CPU: L1 I cache: 16K, L1 D cache: 16K
[   15.472790] CPU: L2 cache: 256K
[   15.472823] CPU: After all inits, caps: 0383f9ff 00000000 00000000 00000040 00000000 00000000 00000000
[   15.472840] CPU: Intel Pentium III (Coppermine) stepping 06
[   15.472895] Enabling fast FPU save and restore... done.
[   15.472937] Enabling unmasked SIMD FPU exception support... done.
[   15.472986] Checking 'hlt' instruction... OK.
[   15.516484] ACPI: setting ELCR to 0200 (from 0628)
[   15.552288] NET: Registered protocol family 16
[   15.553529] PCI: PCI BIOS revision 2.10 entry at 0xfd9b0, last bus=1
[   15.553588] PCI: Using configuration type 1
[   15.553626] mtrr: v2.0 (20020519)
[   15.554600] ACPI: Subsystem revision 20050211
[   15.593476] ACPI: Interpreter enabled
[   15.593531] ACPI: Using PIC for interrupt routing
[   15.597099] ACPI: PCI Root Bridge [PCI0] (00:00)
[   15.597142] PCI: Probing PCI hardware (bus 00)
[   15.598428] PCI: Transparent bridge - 0000:00:1e.0
[   15.599531] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[   15.600509] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB_._PRT]
[   15.604310] ACPI: Embedded Controller [EC0] (gpe 28)
[   15.647350] ACPI: PCI Interrupt Link [LNKA] (IRQs 9) *0
[   15.648458] ACPI: PCI Interrupt Link [LNKB] (IRQs 9) *0
[   15.649416] ACPI: PCI Interrupt Link [LNKC] (IRQs 9) *0, disabled.
[   15.650495] ACPI: PCI Interrupt Link [LNKD] (IRQs *9)
[   15.651565] ACPI: PCI Interrupt Link [LNKE] (IRQs *9)
[   15.652638] ACPI: PCI Interrupt Link [LNKH] (IRQs 9) *0
[   15.653325] PCI: Using ACPI for IRQ routing
[   15.653369] PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
[   15.659519] Simple Boot Flag at 0x36 set to 0x1
[   15.660688] VFS: Disk quotas dquot_6.5.1
[   15.660777] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[   15.660920] devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)
[   15.660970] devfs: boot_options: 0x0
[   15.661069] Initializing Cryptographic API
[   15.662644] ACPI: AC Adapter [ACAD] (off-line)
[   15.667594] ACPI: Battery Slot [BAT1] (battery present)
[   15.667664] ACPI: Lid Switch [LID]
[   15.667730] ACPI: Power Button (CM) [PWRB]
[   15.700520] ACPI: Video Device [GCH0] (multi-head: yes  rom: yes  post: no)
[   15.701009] ACPI: CPU0 (power states: C1[C1] C2[C2])
[   15.701064] ACPI: Processor [CPU0] (supports 8 throttling states)
[   15.707046] ACPI: Thermal Zone [ATF0] (34 C)
[   15.713858] serio: i8042 AUX port at 0x60,0x64 irq 12
[   15.714069] serio: i8042 KBD port at 0x60,0x64 irq 1
[   15.714109] io scheduler noop registered
[   15.714175] io scheduler anticipatory registered
[   15.714224] io scheduler deadline registered
[   15.714298] io scheduler cfq registered
[   15.715472] RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
[   15.715660] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[   15.715708] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[   15.715974] ICH2M: IDE controller at PCI slot 0000:00:1f.1
[   15.716039] ICH2M: chipset revision 3
[   15.716073] ICH2M: not 100% native mode: will probe irqs later
[   15.716128]     ide0: BM-DMA at 0x1800-0x1807, BIOS settings: hda:DMA, hdb:pio
[   15.716218] Probing IDE interface ide0...
[   15.738423] hda: TOSHIBA MK4026GAX, ATA DISK drive
[   15.751123] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[   15.751512] Probing IDE interface ide1...
[   15.762838] Probing IDE interface ide2...
[   15.783945] Probing IDE interface ide3...
[   15.795219] Probing IDE interface ide4...
[   15.816326] Probing IDE interface ide5...
[   15.837538] hda: max request size: 128KiB
[   15.848237] hda: 78140160 sectors (40007 MB), CHS=65535/16/63, UDMA(100)
[   15.848356] hda: cache flushes supported
[   15.848565]  /dev/ide/host0/bus0/target0/lun0: p1 p2 p3
[   15.855880] mice: PS/2 mouse device common for all mice
[   15.855997] NET: Registered protocol family 2
[   15.857032] IP: routing cache hash table of 2048 buckets, 16Kbytes
[   15.857508] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[   15.858088] TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
[   15.858415] TCP: Hash tables configured (established 16384 bind 16384)
[   15.858737] NET: Registered protocol family 8
[   15.858786] NET: Registered protocol family 20
[   15.858924] PM: Checking swsusp image.
[   15.859289] swsusp: Resume From Partition /dev/hda3
[   15.877786] input: AT Translated Set 2 keyboard on isa0060/serio0
[   15.885595] <3>swsusp: Suspend partition has wrong signature?
[   15.885690] swsusp: Error -22 check for resume file
[   15.885697] PM: Resume from disk failed.
[   15.885743] ACPI wakeup devices: 
[   15.885793] PWRB  LAN CRD0  EC0 COMA USB1 USB2 MODE 
[   15.885869] ACPI: (supports S0 S3 S4 S5)
[   15.888624] kjournald starting.  Commit interval 5 seconds
[   15.888688] EXT3-fs: mounted filesystem with ordered data mode.
[   15.888807] VFS: Mounted root (ext3 filesystem) readonly.
[   15.889516] Freeing unused kernel memory: 156k freed
[   16.141642] NET: Registered protocol family 1
[   19.215308] Adding 987988k swap on /dev/hda3.  Priority:-1 extents:1
[   19.290971] EXT3 FS on hda2, internal journal
[   22.577701] SCSI subsystem initialized
[   23.111022]   Enabling hardware tapping
[   23.144426] input: PS/2 Mouse on isa0060/serio1
[   23.450887] input: AlpsPS/2 ALPS GlidePoint on isa0060/serio1
[   23.542008] ieee1394: Initialized config rom entry `ip1394'
[   24.513902] sbp2: $Rev: 1219 $ Ben Collins <bcollins@debian.org>
[   25.341803] kjournald starting.  Commit interval 5 seconds
[   25.341959] EXT3 FS on hda1, internal journal
[   25.342007] EXT3-fs: mounted filesystem with ordered data mode.
[   28.246274] Linux Kernel Card Services
[   28.246343]   options:  [pci] [cardbus] [pm]
[   28.436545] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 9
[   28.436608] PCI: setting IRQ 9 as level-triggered
[   28.436647] ACPI: PCI interrupt 0000:01:02.0[A] -> GSI 9 (level, low) -> IRQ 9
[   28.436724] Yenta: CardBus bridge found at 0000:01:02.0 [104d:80e0]
[   28.556726] Yenta: ISA IRQ mask 0x0cb8, PCI irq 9
[   28.556764] Socket status: 30000006
[   30.159387] NET: Registered protocol family 10
[   30.159768] Disabled Privacy Extensions on device c032ea20(lo)
[   30.160046] IPv6 over IPv4 tunneling driver
[   34.058359] Linux agpgart interface v0.101 (c) Dave Jones
[   34.148385] agpgart: Detected an Intel i815 Chipset.
[   34.204153] agpgart: AGP aperture is 64M @ 0xf8000000


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-23 18:49 swsusp 'disk' fails in bk-current - intel_agp at fault? Andy Isaacson
@ 2005-03-24 14:27 ` Stefan Seyfried
  2005-03-24 18:10   ` Andy Isaacson
       [not found] ` <20050525171825.51a06908.akpm@osdl.org>
  1 sibling, 1 reply; 52+ messages in thread
From: Stefan Seyfried @ 2005-03-24 14:27 UTC
  To: Andy Isaacson; +Cc: kernel list

Andy Isaacson wrote:

> Dmesg is attached; hardware is a Vaio r505te.
> 
> Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
> suspends successfully, maybe one time out of 10.  And thinking back, I
> *sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
> out of 20 suspends.

Does it hang hard or is sysrq still working?
If sysrq is still working, please try with "i8042.noaux" (this will kill
your touchpad, which is what i intend :-)

Best regards,

    Stefan





* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 14:27 ` Stefan Seyfried
@ 2005-03-24 18:10   ` Andy Isaacson
  2005-03-24 19:18     ` Dmitry Torokhov
  2005-03-24 20:38     ` Stefan Seyfried
  0 siblings, 2 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-24 18:10 UTC
  To: Stefan Seyfried; +Cc: kernel list

On Thu, Mar 24, 2005 at 03:27:15PM +0100, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > Dmesg is attached; hardware is a Vaio r505te.
> > 
> > Unfortunately, the deadlock (?) is nondeterministic; it *sometimes*
> > suspends successfully, maybe one time out of 10.  And thinking back, I
> > *sometimes* saw failures to suspend with 2.6.11-rc3, maybe one failure
> > out of 20 suspends.
> 
> Does it hang hard or is sysrq still working?

Sysrq still prints stuff, so IRQs aren't locked.  But most of the sysrq
commands don't work... S and U don't seem to do anything (not too
surprising I suppose) but B does reboot.

> If sysrq is still working, please try with "i8042.noaux" (this will kill
> your touchpad, which is what i intend :-)

So I added i8042.noaux to my kernel command line, rebooted, insmodded
intel_agp, started X, and verified no touchpad action.  Then I
suspended, and it worked fine.  After restart, I suspended again - also
fine.

So I think that fixed it.  But no touchpad is a bit annoying. :)

-andy


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 18:10   ` Andy Isaacson
@ 2005-03-24 19:18     ` Dmitry Torokhov
  2005-03-24 20:20       ` Andy Isaacson
  2005-03-24 20:38     ` Stefan Seyfried
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-24 19:18 UTC
  To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> 
> So I added i8042.noaux to my kernel command line, rebooted, insmodded
> intel_agp, started X, and verified no touchpad action.  Then I
> suspended, and it worked fine.  After restart, I suspended again - also
> fine.
> 
> So I think that fixed it.  But no touchpad is a bit annoying. :)
> 

Try adding i8042.nomux instead of i8042.noaux, it should keep your
touchpad in working condition. Please let me know if it still works.

-- 
Dmitry


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 19:18     ` Dmitry Torokhov
@ 2005-03-24 20:20       ` Andy Isaacson
  2005-03-24 21:10         ` Dmitry Torokhov
  2005-03-24 21:14         ` Dmitry Torokhov
  0 siblings, 2 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-24 20:20 UTC
  To: dtor_core; +Cc: Stefan Seyfried, kernel list

On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > intel_agp, started X, and verified no touchpad action.  Then I
> > suspended, and it worked fine.  After restart, I suspended again - also
> > fine.
> > 
> > So I think that fixed it.  But no touchpad is a bit annoying. :)
> 
> Try adding i8042.nomux instead of i8042.noaux, it should keep your
> touchpad in working condition. Please let me know if it still works.

With nomux the touchpad works again, but suspend blocks in the same
place as without nomux.

(How can I verify that "nomux" was accepted?  It shows up on the "Kernel
command line" but there's no other mention of it in dmesg.)

-andy


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 20:20       ` Andy Isaacson
@ 2005-03-24 21:10         ` Dmitry Torokhov
  2005-03-24 23:54           ` Andy Isaacson
  2005-03-24 21:14         ` Dmitry Torokhov
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-24 21:10 UTC
  To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > > intel_agp, started X, and verified no touchpad action.  Then I
> > > suspended, and it worked fine.  After restart, I suspended again - also
> > > fine.
> > >
> > > So I think that fixed it.  But no touchpad is a bit annoying. :)
> >
> > Try adding i8042.nomux instead of i8042.noaux, it should keep your
> > touchpad in working condition. Please let me know if it still works.
> 
> With nomux the touchpad works again, but suspend blocks in the same
> place as without nomux.
> 
> (How can I verify that "nomux" was accepted?  It shows up on the "Kernel
> command line" but there's no other mention of it in dmesg.)
> 
> -andy
> 

If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
have MUX mode active.
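
The same check can be scripted. The following is a minimal userspace sketch, not kernel code: the function names `count_serio_ports` and `mux_mode_likely` are made up for illustration, and the "more than 3 ports" threshold is simply the rule of thumb stated above.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count directory entries whose names start with "serio" (e.g. serio0).
 * Returns -1 if the directory cannot be opened.
 * Hypothetical helper, named for illustration only. */
int count_serio_ports(const char *sysfs_dir)
{
    DIR *dir = opendir(sysfs_dir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "serio", 5) == 0)
            count++;
    }
    closedir(dir);
    return count;
}

/* Rule of thumb from the message above: more than 3 ports suggests
 * that MUX mode is active. */
bool mux_mode_likely(int port_count)
{
    return port_count > 3;
}
```

Calling `mux_mode_likely(count_serio_ports("/sys/bus/serio/devices"))` would apply the heuristic to a running system; a negative count means the directory could not be read and should be treated as "unknown".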

-- 
Dmitry


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 21:10         ` Dmitry Torokhov
@ 2005-03-24 23:54           ` Andy Isaacson
  2005-03-25  9:22             ` Stefan Seyfried
  2005-03-25 14:58             ` Dmitry Torokhov
  0 siblings, 2 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-24 23:54 UTC
  To: dtor_core; +Cc: Stefan Seyfried, kernel list

On Thu, Mar 24, 2005 at 04:10:39PM -0500, Dmitry Torokhov wrote:
> If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
> have MUX mode active.

Just serio0 and serio1.

On Thu, Mar 24, 2005 at 04:14:52PM -0500, Dmitry Torokhov wrote:
> On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > (How can I verify that "nomux" was accepted?  It shows up on the "Kernel
> > command line" but there's no other mention of it in dmesg.)
> 
> Ignore my babbling, I just noticed in your dmesg that your KBC does
> not support MUX mode to begin with.

OK, anything else I should try?

Why does it only fail when I have *both* intel_agp and i8042 aux?

In the SysRq-T trace I see one interesting process: most things are
in D state in refrigerator(), but sh shows the following traceback:

wait_for_completion
call_usermodehelper
kobject_hotplug
kobject_del
class_device_del
class_device_unregister
mousedev_disconnect
input_unregister_device
alps_disconnect
psmouse_disconnect
serio_driver_remove
device_release_driver
serio_release_driver
serio_resume
resume_device
dpm_resume
device_resume
swsusp_write
pm_suspend_disk
enter_state
state_store
subsys_attr_store
flush_write_buffer
sysfs_write_file
...

That seems odd to me...

Also, khelper has the following trace:
io_schedule
sync_buffer
__wait_on_bit
out_of_line_wait_on_bit
ext3_find_entry
ext3_lookup
real_lookup
do_lookup
__link_path_walk
link_path_walk
path_lookup
open_exec
do_execve
...

-andy


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 23:54           ` Andy Isaacson
@ 2005-03-25  9:22             ` Stefan Seyfried
  2005-03-25 10:13               ` Pavel Machek
  2005-03-29 16:18               ` Dmitry Torokhov
  2005-03-25 14:58             ` Dmitry Torokhov
  1 sibling, 2 replies; 52+ messages in thread
From: Stefan Seyfried @ 2005-03-25  9:22 UTC
  To: Andy Isaacson; +Cc: dtor_core, kernel list, Pavel Machek, Vojtech Pavlik

Andy Isaacson wrote:

> OK, anything else I should try?

not really, i just wait for Vojtech and Pavel :-)

> Why does it only fail when I have *both* intel_agp and i8042 aux?

later...

> In the SysRq-T trace I see one interesting process: most things are
> in D state in refrigerator(), but sh shows the following traceback:
> 
> wait_for_completion
> call_usermodehelper
> kobject_hotplug
> kobject_del
> class_device_del
> class_device_unregister
> mousedev_disconnect
> input_unregister_device
> alps_disconnect
> psmouse_disconnect
> serio_driver_remove
> device_release_driver
> serio_release_driver

i think the following happens (but i am by no means an expert on this):
 - alps driver suspends
 - alps driver unregisters the device
 - udev is called via call_usermodehelper (which fails since userspace
   is stopped)
 - now somebody wants to wait for udev which does not work right.
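
The hypothesis above can be written down as a toy model. This is only a sketch of the suspected sequence, not kernel code: `struct pm_model`, `helper_signals_completion` and `model_input_unregister` are invented names, and the single `helper_can_run` flag stands in for "userspace is stopped" as guessed above.

```c
#include <stdbool.h>

/* Possible results of unregistering the input device in this model. */
enum unregister_outcome {
    UNREGISTER_DONE,   /* the hotplug helper ran and the caller continued */
    UNREGISTER_HANGS   /* the caller waits for a helper that never runs */
};

/* Hypothetical system state: can a usermode helper make progress? */
struct pm_model {
    bool helper_can_run;
};

/* Stand-in for the wait seen in the trace (wait_for_completion inside
 * call_usermodehelper): it only returns if the helper can run. */
static bool helper_signals_completion(const struct pm_model *pm)
{
    return pm->helper_can_run;
}

/* Models the disconnect path: unregistering the device emits a hotplug
 * event, and the caller then waits for the helper. */
enum unregister_outcome model_input_unregister(const struct pm_model *pm)
{
    return helper_signals_completion(pm) ? UNREGISTER_DONE
                                         : UNREGISTER_HANGS;
}
```

In this model the hang depends only on whether the helper can run at the moment the device is unregistered, which is why a timing change elsewhere could hide or expose it.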

Why only with the ALPS driver and intel_agp?
I think this is an accident. For me, it happens only with init=/bin/bash
and _no_ other drivers loaded (only IDE drivers and psmouse built-in).
As soon as i load any other drivers (i have only tried ehci_hcd and
8139too, to be honest) it works fine again. This leads me to believe it
is a race condition since the extra driver that has to be suspended may
give the ALPS driver the extra time needed to finish the race. For you,
it may be the other way round.

This is mostly guesswork, i am no kernel expert at all.
-- 
seife
                                 Never trust a computer you can't lift.


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25  9:22             ` Stefan Seyfried
@ 2005-03-25 10:13               ` Pavel Machek
  2005-03-25 14:19                 ` Dmitry Torokhov
  2005-03-25 18:36                 ` Andy Isaacson
  2005-03-29 16:18               ` Dmitry Torokhov
  1 sibling, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-25 10:13 UTC
  To: Stefan Seyfried; +Cc: Andy Isaacson, dtor_core, kernel list, Vojtech Pavlik

Hi!

> > OK, anything else I should try?
> 
> not really, i just wait for Vojtech and Pavel :-)

Try commenting out "call_usermodehelper". If that helps, Stefan's
theory is confirmed, and this waits for Vojtech to fix it.


> > In the SysRq-T trace I see one interesting process: most things are
> > in D state in refrigerator(), but sh shows the following traceback:
> > 
> > wait_for_completion
> > call_usermodehelper
> > kobject_hotplug
> > kobject_del
> > class_device_del
> > class_device_unregister
> > mousedev_disconnect
> > input_unregister_device
> > alps_disconnect
> > psmouse_disconnect
> > serio_driver_remove
> > device_release_driver
> > serio_release_driver

								Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 10:13               ` Pavel Machek
@ 2005-03-25 14:19                 ` Dmitry Torokhov
  2005-03-25 14:24                   ` Pavel Machek
  2005-03-25 18:36                 ` Andy Isaacson
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-25 14:19 UTC
  To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi,

On Fri, 25 Mar 2005 11:13:44 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
> 
> > > OK, anything else I should try?
> >
> > not really, i just wait for Vojtech and Pavel :-)
> 
> Try commenting out "call_usermodehelper". If that helps, Stefan's
> theory is confirmed, and this waits for Vojtech to fix it.
> 

This is more of a general swsusp problem I believe - the second phase
when it blindly resumes entire system. Resume of a device can fail
(any reason whatsoever) and it will attempt to clean up after itself,
but userspace is dead and hotplug never completes. While I am
interested to know why ALPS does not want to resume on Andy's laptop
the issue will never be completely resolved from within the input
system.

Pavel, is it possible for swsusp to disable hotplug (probably just do
hotplug_path[0] = 0) before resuming in suspend phase?
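
A rough userspace sketch of that idea follows. It assumes, as the suggestion does, that an empty helper path means no helper gets spawned. `struct hotplug_model` and the `model_hotplug_*` functions are hypothetical stand-ins, not existing kernel interfaces, and the buffer length is arbitrary.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PATH_LEN 256

/* Hypothetical stand-in for the kernel's hotplug helper path buffer. */
struct hotplug_model {
    char path[MODEL_PATH_LEN];   /* current helper, empty means disabled */
    char saved[MODEL_PATH_LEN];  /* copy kept while the helper is disabled */
};

/* Set the initial helper path; strncpy zero-fills the rest of the buffer. */
void model_hotplug_init(struct hotplug_model *h, const char *helper)
{
    strncpy(h->path, helper, MODEL_PATH_LEN - 1);
    h->path[MODEL_PATH_LEN - 1] = '\0';
    h->saved[0] = '\0';
}

/* Remember the current helper and blank the path (hotplug_path[0] = 0). */
void model_hotplug_disable(struct hotplug_model *h)
{
    memcpy(h->saved, h->path, MODEL_PATH_LEN);
    h->path[0] = '\0';
}

/* Put the remembered helper back once the image has been written. */
void model_hotplug_restore(struct hotplug_model *h)
{
    memcpy(h->path, h->saved, MODEL_PATH_LEN);
}

/* Assumption of this sketch: an empty path means no helper is run. */
int model_hotplug_would_run(const struct hotplug_model *h)
{
    return h->path[0] != '\0';
}
```

The disable call would sit before the devices are resumed to write the image and the restore call after it, so events raised in between would not try to start a helper.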

A bit of a tangent - you need to resume the system so you can write the
image, right? I wonder if we could add a flag to struct device that
would mark a device as "on_resume_path". The flag would be set when you
select the resume partition and propagated to the root of the system. Then
when resuming after making the image you could skip all devices that are
not on the resume path.
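
To make the proposal concrete, here is a small sketch of how such a flag might propagate and be used. It is a model only: `struct model_device` is a simplified stand-in for struct device, and `mark_resume_path` and `resume_for_image_write` are invented names, not driver-core functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct device, for illustration only. */
struct model_device {
    const char *name;
    struct model_device *parent;  /* NULL for the root of the tree */
    bool on_resume_path;          /* needed to reach the resume partition */
};

/* Mark the resume device and every ancestor up to the root. */
void mark_resume_path(struct model_device *dev)
{
    for (; dev != NULL; dev = dev->parent)
        dev->on_resume_path = true;
}

/* Resume only the flagged devices, in the order given.
 * Returns how many devices were resumed; `resume` may be NULL. */
size_t resume_for_image_write(struct model_device *const devs[], size_t n,
                              void (*resume)(struct model_device *))
{
    size_t resumed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!devs[i]->on_resume_path)
            continue;             /* e.g. the touchpad stays untouched */
        if (resume != NULL)
            resume(devs[i]);
        resumed++;
    }
    return resumed;
}
```

With a tree such as root -> ide0 -> hda and root -> serio1, marking hda flags hda, ide0 and the root, so the serio port is skipped when the image is written.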

-- 
Dmitry


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 14:19                 ` Dmitry Torokhov
@ 2005-03-25 14:24                   ` Pavel Machek
  2005-03-25 14:52                     ` Dmitry Torokhov
  0 siblings, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-25 14:24 UTC
  To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > > OK, anything else I should try?
> > >
> > > not really, i just wait for Vojtech and Pavel :-)
> > 
> > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > theory is confirmed, and this waits for Vojtech to fix it.
> > 
> 
> This is more of a general swsusp problem I believe - the second phase
> when it blindly resumes entire system. Resume of a device can fail
> (any reason whatsoever) and it will attempt to clean up after itself,
> but userspace is dead and hotplug never completes. While I am
> interested to know why ALPS does not want to resume on Andy's laptop
> the issue will never be completely resolved from within the input
> system.

When device fails to resume, what should I do? I think I could

	if (error)
		panic("Device resume failed\n");

, but... that does not look like what you want.

> Pavel, is it possible for swsusp to disable hotplug (probably just do
> hotplug_path[0] = 0) before resuming in suspend phase?

It feels like a hack, but yes, I probably could do that. (Do you have
patch to try?)

> A bit of a tangent - you need to resume the system so you can write the
> image, right? I wonder if we could add a flag to struct device that
> would mark a device as "on_resume_path". The flag would be set when you
> select the resume partition and propagated to the root of the system. Then
> when resuming after making the image you could skip all devices that are
> not on the resume path.

I'm not going to do that, see FAQ in swsusp.txt.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 14:24                   ` Pavel Machek
@ 2005-03-25 14:52                     ` Dmitry Torokhov
  2005-03-25 15:42                       ` Pavel Machek
  2005-03-29 21:49                       ` Rafael J. Wysocki
  0 siblings, 2 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-25 14:52 UTC
  To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

On Fri, 25 Mar 2005 15:24:15 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
> 
> > > > > OK, anything else I should try?
> > > >
> > > > not really, i just wait for Vojtech and Pavel :-)
> > >
> > > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > > theory is confirmed, and this waits for Vojtech to fix it.
> > >
> >
> > This is more of a general swsusp problem I believe - the second phase
> > when it blindly resumes entire system. Resume of a device can fail
> > (any reason whatsoever) and it will attempt to clean up after itself,
> > but userspace is dead and hotplug never completes. While I am
> > interested to know why ALPS does not want to resume on Andy's laptop
> > the issue will never be completely resolved from within the input
> > system.
> 
> When device fails to resume, what should I do? I think I could
> 
>        if (error)
>                panic("Device resume failed\n");
> 
> , but... that does not look like what you want.

Oh, always panic-happy Pavel ;). It really depends on what kind of
device has failed to resume. If the device is really needed for writing
the image then panic is the only recourse, but if it is some other device
you are resuming, just ignore it, who cares...
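
Spelled out as code, that policy might look like the sketch below. The enum and the function `on_resume_failure` are hypothetical, and how a device would be known to be "needed for writing the image" is left open here.

```c
#include <stdbool.h>

/* What to do after a device resume attempt, in this illustrative model. */
enum resume_failure_action {
    RESUME_FAILURE_IGNORE,  /* carry on and write the image */
    RESUME_FAILURE_PANIC    /* the image cannot be written without it */
};

/* error: 0 on success, nonzero if the device failed to resume.
 * needed_for_image: true if the device is required to write the image. */
enum resume_failure_action on_resume_failure(int error, bool needed_for_image)
{
    if (error == 0)
        return RESUME_FAILURE_IGNORE;   /* nothing failed */
    return needed_for_image ? RESUME_FAILURE_PANIC
                            : RESUME_FAILURE_IGNORE;
}
```

Under this rule a touchpad that fails to resume would be ignored, while a failure of the disk holding the suspend partition would still be fatal.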

Btw, I don't think that doing selective resume (as opposed to selective
suspend and Nigel's partial device trees) would be that much more
complicated. You'd always resume sysdevs and then, when iterating over
"normal" devices, just skip ones not in the resume path. It can all be
contained in the driver core I believe (sorry but no patch, for now at
least).

> 
> > Pavel, is it possible for swsusp to disable hotplug (probably just do
> > hotplug_path[0] = 0) before resuming in suspend phase?
> 
> It feels like a hack, but yes, I probably could do that. (Do you have
> patch to try?)
>

Not really, I won't be able to write any code till next week I think.

-- 
Dmitry


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 14:52                     ` Dmitry Torokhov
@ 2005-03-25 15:42                       ` Pavel Machek
  2005-03-25 16:04                         ` Dmitry Torokhov
  2005-03-29 21:49                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-25 15:42 UTC
  To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > This is more of a general swsusp problem I believe - the second phase
> > > when it blindly resumes entire system. Resume of a device can fail
> > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > but userspace is dead and hotplug never completes. While I am
> > > interested to know why ALPS does not want to resume on Andy's laptop
> > > the issue will never be completely resolved from within the input
> > > system.
> > 
> > When device fails to resume, what should I do? I think I could
> > 
> >        if (error)
> >                panic("Device resume failed\n");
> > 
> > , but... that does not look like what you want.
> 
> Oh, always panic-happy Pavel ;). It really depends on what kind of
> device has failed to resume. If the device is really needed for writing
> the image then panic is the only recourse, but if it is some other device
> you are resuming, just ignore it, who cares...

You are right, for resume-during-suspend, we may as well risk it. We
have consistent state, and if we happen to write it on disk,
everything is okay.

For resume-during-resume, I don't really know how we can handle
that. Running with some devices non-working seems dangerous to me.

> Btw, I don't think that doing selective resume (as opposed to selective
> suspend and Nigel's partial device trees) would be that much more
> complicated. You'd always resume sysdevs and then, when iterating over
> "normal" devices, just skip ones not in the resume path. It can all be
> contained in the driver core I believe (sorry but no patch, for now at
> least).

:-) I think we can simply make device freeze/unfreeze fast enough.
[We do not need to do full suspend/resume; freeze is enough].

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!


* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 15:42                       ` Pavel Machek
@ 2005-03-25 16:04                         ` Dmitry Torokhov
  2005-03-28 23:00                           ` Pavel Machek
  2005-03-29 23:19                           ` Rafael J. Wysocki
  0 siblings, 2 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-25 16:04 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

On Fri, 25 Mar 2005 16:42:37 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
> 
> > > > This is more of a general swsusp problem I believe - the second phase
> > > > when it blindly resumes entire system. Resume of a device can fail
> > > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > > but userspace is dead and hotplug never completes. While I am
> > > > > interested to know why ALPS does not want to resume on Andy's laptop
> > > > the issue will never be completely resolved from within the input
> > > > system.
> > >
> > > When device fails to resume, what should I do? I think I could
> > >
> > >        if (error)
> > >                panic("Device resume failed\n");
> > >
> > > , but... that does not look like what you want.
> >
> > Oh, always panic-happy Pavel ;). It really depends on what kind of
> > device has failed to resume. If the device is really needed for writing
> > the image then panic is the only recourse, but if it is some other device
> > you are resuming, just ignore it, who cares...
> 
> You are right, for resume-during-suspend, we may as well risk it. We
> have consistent state, and if we happen to write it on disk,
> everything is okay.
> 
> For resume-during-resume, I don't really know how we can handle
> that. Running with some devices non-working seems dangerous to me.
>

I think it again varies, and the driver would have to decide what to
do if it can not resume hardware. Take for example USB - I believe the
USB guys are shooting at being able to disconnect a device while the box
is suspended and have it removed from the system when resuming.
Probably every driver that has even the slightest notion of
hot-pluggability should just properly clean up after itself and not
signal an error to the core.
 
> > Btw, I dont think that doing selective resume (as opposed to selective
> > suspend and Nigel's partial device trees) would be so much
> > complicated. You'd always resume sysdevs and then, when iterating over
> > "normal" devices, just skip ones not in resume path. It can all be
> > contained in driver core I believe (sorry but no patch, for now at
> > least).
> 
> :-) I think we can simply make device freeze/unfreeze fast enough.
> [We do not need to do full suspend/resume; freeze is enough].

It is not suspend/freeze here that gets us but resume and with resume
the driver (at least for now) does not have any idea if it is
"unfreeze" or "full-resume". I mean I could have serio just ignore
"unfreeze" requests (as I doubt anyone would ever try to suspend over
PS/2 port ;) ) but I think it should be really handled by the core.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 16:04                         ` Dmitry Torokhov
@ 2005-03-28 23:00                           ` Pavel Machek
  2005-03-29 23:19                           ` Rafael J. Wysocki
  1 sibling, 0 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-28 23:00 UTC (permalink / raw)
  To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > Btw, I dont think that doing selective resume (as opposed to selective
> > > suspend and Nigel's partial device trees) would be so much
> > > complicated. You'd always resume sysdevs and then, when iterating over
> > > "normal" devices, just skip ones not in resume path. It can all be
> > > contained in driver core I believe (sorry but no patch, for now at
> > > least).
> > 
> > :-) I think we can simply make device freeze/unfreeze fast enough.
> > [We do not need to do full suspend/resume; freeze is enough].
> 
> It is not suspend/freeze here that gets us but resume and with resume
> the driver (at least for now) does not have any idea if it is
> "unfreeze" or "full-resume". I mean I could have serio just ignore
> "unfreeze" requests (as I doubt anyone would ever try to suspend over
> PS/2 port ;) ) but I think it should be really handled by the core.

Please just always do full-resume... for now. Patches that enable you
to detect "unfreeze" are not in yet. If something fails, just printk
with big enough severity and continue, as you don't have a method of
signaling an error anyway.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 16:04                         ` Dmitry Torokhov
  2005-03-28 23:00                           ` Pavel Machek
@ 2005-03-29 23:19                           ` Rafael J. Wysocki
  1 sibling, 0 replies; 52+ messages in thread
From: Rafael J. Wysocki @ 2005-03-29 23:19 UTC (permalink / raw)
  To: dtor_core
  Cc: Pavel Machek, Stefan Seyfried, Andy Isaacson, kernel list,
	Vojtech Pavlik

Hi,

On Friday, 25 of March 2005 17:04, Dmitry Torokhov wrote:
> On Fri, 25 Mar 2005 16:42:37 +0100, Pavel Machek <pavel@suse.cz> wrote:
> > Hi!
> > 
> > > > > This is more of a general swsusp problem I believe - the second phase
> > > > > when it blindly resumes entire system. Resume of a device can fail
> > > > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > > > but userspace is dead and hotplug never completes. While I am
> > > > > > interested to know why ALPS does not want to resume on Andy's laptop
> > > > > the issue will never be completely resolved from within the input
> > > > > system.
> > > >
> > > > When device fails to resume, what should I do? I think I could
> > > >
> > > >        if (error)
> > > >                panic("Device resume failed\n");
> > > >
> > > > , but... that does not look like what you want.
> > >
> > > > Oh, always panic-happy Pavel ;). It really depends on what kind of
> > > > device has failed to resume. If the device is really needed for writing
> > > > the image then panic is the only recourse, but if it is some other device
> > > > you are resuming, just ignore it, who cares...
> > 
> > You are right, for resume-during-suspend, we may as well risk it. We
> > have consistent state, and if we happen to write it on disk,
> > everything is okay.
> > 
> > For resume-during-resume, I don't really know how we can handle
> > that. Running with some devices non-working seems dangerous to me.
> >
> 
> I think it again varies, and the driver would have to decide what to
> do if it can not resume hardware.

Well, I don't think that the driver would be able to state that its failure
is "serious enough", for example, to panic().  This is only known to the
higher-level code that calls the driver's _resume() routine.  IMO the driver
should not make any assumptions of its importance (eg a SCSI driver
that panic()s, because it's unable to resume a disk which does not
even contain a mounted partition is not a good idea ;-)).
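
To make that split concrete, here is a rough user-space model in C (not
kernel code; every name in it is made up for illustration). The driver
callback only reports success or failure, and the calling loop, which
is assumed to know which devices matter for the image, decides how
serious a failure is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are kernel API. */
enum resume_verdict { RESUME_OK, RESUME_DEGRADED, RESUME_FATAL };

struct model_dev {
	const char *name;
	bool needed_for_image;                 /* e.g. the disk holding the image */
	int (*resume)(struct model_dev *dev);  /* 0 on success, negative on error */
};

/* Two stand-in driver callbacks: they only report what happened. */
int model_resume_ok(struct model_dev *dev)   { (void)dev; return 0; }
int model_resume_fail(struct model_dev *dev) { (void)dev; return -1; }

/* The caller, not the driver, classifies each failure. */
enum resume_verdict model_resume_all(struct model_dev *devs, size_t n)
{
	enum resume_verdict verdict = RESUME_OK;

	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].resume(&devs[i]) == 0)
			continue;
		if (devs[i].needed_for_image)
			return RESUME_FATAL;       /* cannot go on without it */
		verdict = RESUME_DEGRADED;         /* note it and carry on */
	}
	return verdict;
}
```

In this model a touchpad that fails to resume only degrades the result,
while a failing image disk is reported as fatal; what the caller then
does with a fatal verdict is left open.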

> Take for example USB - i believe USB 
> guys are shooting at being able to disconnect device while the box is
> suspended and have it removed from the system when resuming.
> Probably every driver that has even the slightest notion of
> hot-pluggability should just properly clean up after itself and not
> signal error to the core.

Unless, for instance, one of its devices contains the root filesystem.
  
> > > Btw, I dont think that doing selective resume (as opposed to selective
> > > suspend and Nigel's partial device trees) would be so much
> > > complicated. You'd always resume sysdevs and then, when iterating over
> > > "normal" devices, just skip ones not in resume path. It can all be
> > > contained in driver core I believe (sorry but no patch, for now at
> > > least).
> > 
> > :-) I think we can simply make device freeze/unfreeze fast enough.
> > [We do not need to do full suspend/resume; freeze is enough].

If the driver is compiled as a module, its devices may be uninitialized
when its _resume() routine is called (eg in the resume-during-resume).
Hence, IMHO, we can forget the "unfreeze" thing until we can differentiate
the resume-during-suspend from the resume-during-resume etc. ...

Greets,
Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 14:52                     ` Dmitry Torokhov
  2005-03-25 15:42                       ` Pavel Machek
@ 2005-03-29 21:49                       ` Rafael J. Wysocki
  1 sibling, 0 replies; 52+ messages in thread
From: Rafael J. Wysocki @ 2005-03-29 21:49 UTC (permalink / raw)
  To: dtor_core
  Cc: Pavel Machek, Stefan Seyfried, Andy Isaacson, kernel list,
	Vojtech Pavlik

Hi,

On Friday, 25 of March 2005 15:52, Dmitry Torokhov wrote:
> On Fri, 25 Mar 2005 15:24:15 +0100, Pavel Machek <pavel@suse.cz> wrote:
> > Hi!
> > 
> > > > > > OK, anything else I should try?
> > > > >
> > > > > not really, i just wait for Vojtech and Pavel :-)
> > > >
> > > > Try commenting out "call_usermodehelper". If that helps, Stefan's
> > > > theory is confirmed, and this waits for Vojtech to fix it.
> > > >
> > >
> > > This is more of a general swsusp problem I believe - the second phase
> > > when it blindly resumes entire system. Resume of a device can fail
> > > (any reason whatsoever) and it will attempt to clean up after itself,
> > > but userspace is dead and hotplug never completes. While I am
> > > > interested to know why ALPS does not want to resume on Andy's laptop
> > > the issue will never be completely resolved from within the input
> > > system.
> > 
> > When device fails to resume, what should I do? I think I could
> > 
> >        if (error)
> >                panic("Device resume failed\n");
> > 
> > , but... that does not look like what you want.
> 
> Oh, always panic-happy Pavel ;). It really depends on what kind of
> device has failed to resume. If the device is really needed for writing
> the image then panic is the only recourse, but if it is some other device
> you are resuming, just ignore it, who cares...

Moreover, if we panic() here, we potentially lose data.  IMO we should not
do this unless the device is needed for saving the image or contains the
root filesystem.

> Btw, I dont think that doing selective resume (as opposed to selective
> suspend and Nigel's partial device trees) would be so much
> complicated. You'd always resume sysdevs and then, when iterating over
> "normal" devices, just skip ones not in resume path. It can all be
> contained in driver core I believe (sorry but no patch, for now at
> least).

In fact, the only devices that we really need to resume-during-suspend are
those necessary for saving the image.
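
As a rough illustration of the selective resume described above, here
is a small user-space model in C. It is only a sketch: the structure,
the in_resume_path flag and the function name are assumptions made for
the example, not anything that exists in the driver core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device entry; not a driver-core structure. */
struct sel_dev {
	const char *name;
	bool is_sysdev;       /* system devices are always resumed */
	bool in_resume_path;  /* needed to reach the image storage */
	bool resumed;
};

/*
 * Resume sysdevs first, then only the "normal" devices that are on
 * the resume path.  Returns how many devices were resumed.
 */
size_t model_selective_resume(struct sel_dev *devs, size_t n)
{
	size_t count = 0;

	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].is_sysdev) {
			devs[i].resumed = true;
			count++;
		}
	}
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].is_sysdev && devs[i].in_resume_path) {
			devs[i].resumed = true;
			count++;
		}
	}
	return count;
}
```

With one sysdev, one disk on the resume path and one mouse that is not,
the model resumes two devices and never touches the mouse.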

Greets,
Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 10:13               ` Pavel Machek
  2005-03-25 14:19                 ` Dmitry Torokhov
@ 2005-03-25 18:36                 ` Andy Isaacson
  1 sibling, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-25 18:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Stefan Seyfried, dtor_core, kernel list, Vojtech Pavlik

On Fri, Mar 25, 2005 at 11:13:44AM +0100, Pavel Machek wrote:
> Hi!
> 
> > > OK, anything else I should try?
> > 
> > not really, i just wait for Vojtech and Pavel :-)
> 
> Try commenting out "call_usermodehelper". If that helps, Stefan's
> theory is confirmed, and this waits for Vojtech to fix it.
> 
> > > wait_for_completion
> > > call_usermodehelper
> > > kobject_hotplug
> > > kobject_del

Without the call_usermodehelper in kobject_hotplug, the first suspend
seems to work OK (which I think confirms the theory).  But after resume,
the second suspend hangs in the same place.  It's calling
call_usermodehelper from input_call_hotplug... time to comment out
another one and recompile.

I also tried -mm1 and it hangs in the same place.

-andy

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25  9:22             ` Stefan Seyfried
  2005-03-25 10:13               ` Pavel Machek
@ 2005-03-29 16:18               ` Dmitry Torokhov
  2005-03-29 18:18                 ` Pavel Machek
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 16:18 UTC (permalink / raw)
  To: Stefan Seyfried; +Cc: Andy Isaacson, kernel list, Pavel Machek, Vojtech Pavlik

On Fri, 25 Mar 2005 10:22:28 +0100, Stefan Seyfried <seife@suse.de> wrote:
> Andy Isaacson wrote:
> 
> > In the SysRq-T trace I see one interesting process: most things are
> > in D state in refrigerator(), but sh shows the following traceback:
> >
> > wait_for_completion
> > call_usermodehelper
> > kobject_hotplug
> > kobject_del
> > class_device_del
> > class_device_unregister
> > mousedev_disconnect
> > input_unregister_device
> > alps_disconnect
> > psmouse_disconnect
> > serio_driver_remove
> > device_release_driver
> > serio_release_driver
> 
> i think the following happens (but i am in no case an expert for this):
> - alps driver suspends
> - alps driver unregisters the device
> - udev is called via call_usermodehelper (which fails since userspace
>   is stopped)
> - now somebody wants to wait for udev which does not work right.

The thing is that kobject_uevent calls call_usermodehelper with
wait=0. That means that it only waits for the execve("/sbin/hotplug")
call to complete, it does not wait for the entire process to complete.

If you look at Andy's second trace you will see that we are waiting
for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
know why IO does not complete? khelper is a kernel thread so it is
marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 16:18               ` Dmitry Torokhov
@ 2005-03-29 18:18                 ` Pavel Machek
  2005-03-29 19:11                   ` Dmitry Torokhov
  0 siblings, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 18:18 UTC (permalink / raw)
  To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > In the SysRq-T trace I see one interesting process: most things are
> > > in D state in refrigerator(), but sh shows the following traceback:
> > >
> > > wait_for_completion
> > > call_usermodehelper
> > > kobject_hotplug
> > > kobject_del
> > > class_device_del
> > > class_device_unregister
> > > mousedev_disconnect
> > > input_unregister_device
> > > alps_disconnect
> > > psmouse_disconnect
> > > serio_driver_remove
> > > device_release_driver
> > > serio_release_driver
> > 
> > i think the following happens (but i am in no case an expert for this):
> > - alps driver suspends
> > - alps driver unregisters the device
> > - udev is called via call_usermodehelper (which fails since userspace
> >   is stopped)
> > - now somebody wants to wait for udev which does not work right.
> 
> The thing is that kobject_uevent calls call_usermodehelper with
> wait=0. That means that it only waits for the execve("/sbin/hotplug")
> call to complete, it does not wait for the entire process to complete.
> 
> If you look at Andy's second trace you will see that we are waiting
> for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> know why IO does not complete? khelper is a kernel thread so it is
> marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?

Uf, no idea about kblockd freezing -- we certainly should not.

*But*, if we are doing execve while system is frozen, something is
very wrong. We should not be doing execve in the first place.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 18:18                 ` Pavel Machek
@ 2005-03-29 19:11                   ` Dmitry Torokhov
  2005-03-29 19:23                     ` Pavel Machek
  0 siblings, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 19:11 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

On Tue, 29 Mar 2005 20:18:31 +0200, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
> 
> > If you look at Andy's second trace you will see that we are waiting
> > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > know why IO does not complete? khelper is a kernel thread so it is
> > marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
> 
> Uf, no idea about kblockd freezing -- we certainly should not.
> 
> *But*, if we are doing execve while system is frozen, something is
> very wrong. We should not be doing execve in the first place.

Well, there lies a problem - some devices have to do execve because
they need firmware to operate. Also, again, some buses with
hot-pluggable devices will attempt to clean up unsuccessful resume and
this will cause hotplug events. The point is you either resume system
or you don't. We probably need a separate "unfreeze" callback,
although this is kind of messy.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 19:11                   ` Dmitry Torokhov
@ 2005-03-29 19:23                     ` Pavel Machek
  2005-03-29 20:05                       ` Dmitry Torokhov
  0 siblings, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 19:23 UTC (permalink / raw)
  To: dtor_core; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

Hi!

> > > If you look at Andy's second trace you will see that we are waiting
> > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > > know why IO does not complete? khelper is a kernel thread so it is
> > > marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
> > 
> > Uf, no idea about kblockd freezing -- we certainly should not.
> > 
> > *But*, if we are doing execve while system is frozen, something is
> > very wrong. We should not be doing execve in the first place.
> 
> Well, there lies a problem - some devices have to do execve because
> they need firmware to operate. Also, again, some buses with
> hot-pluggable devices will attempt to clean up unsuccessful resume and
> this will cause hotplug events. The point is you either resume system
> or you don't. We probably need a separate "unfreeze" callback,
> although this is kind of messy.

There's a better solution for firmware: You should load your firmware
prior to suspend and store it in RAM. Anything else just plain does
not work. (Because your wireless firmware might be on NFS mounted over
that wireless card).
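
Roughly what I mean, as a user-space model in C (just a sketch; the
structure and function names are invented for the example and are not
the firmware loader API): the image is copied into RAM while storage
and userspace still work, and the resume side only ever reads that
copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device state; not a real kernel structure. */
struct fw_dev {
	unsigned char *cached_fw;  /* copy kept in RAM across suspend */
	size_t cached_len;
};

/* Called before suspend, while the firmware file is still reachable. */
int fw_cache_before_suspend(struct fw_dev *dev,
			    const unsigned char *image, size_t len)
{
	unsigned char *copy;

	assert(dev != NULL && image != NULL && len > 0);

	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, image, len);

	free(dev->cached_fw);      /* drop any older copy */
	dev->cached_fw = copy;
	dev->cached_len = len;
	return 0;
}

/* Called on resume: no file access, only the RAM copy is used. */
int fw_upload_on_resume(const struct fw_dev *dev,
			unsigned char *device_mem, size_t capacity)
{
	assert(dev != NULL && device_mem != NULL);

	if (dev->cached_fw == NULL || dev->cached_len > capacity)
		return -1;         /* nothing cached, or it does not fit */
	memcpy(device_mem, dev->cached_fw, dev->cached_len);
	return 0;
}
```

The resume path fails cleanly when nothing was cached, instead of
trying to reach a filesystem that may not be available yet.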

Hotplug... I guess udev just needs to hold those callbacks until the
system is fully up... it has to do something similar on regular boot,
no?
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 19:23                     ` Pavel Machek
@ 2005-03-29 20:05                       ` Dmitry Torokhov
  2005-03-29 20:52                         ` Pavel Machek
  0 siblings, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 20:05 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik

On Tue, 29 Mar 2005 21:23:39 +0200, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
> 
> > > > If you look at Andy's second trace you will see that we are waiting
> > > > for the disk I/O to get /sbin/hotplug from the disk. Pavel, do you
> > > > know why IO does not complete? khelper is a kernel thread so it is
> > > > marked with PF_NOFREEZE. Could it be that we managed to freeze kblockd?
> > >
> > > Uf, no idea about kblockd freezing -- we certainly should not.
> > >
> > > *But*, if we are doing execve while system is frozen, something is
> > > very wrong. We should not be doing execve in the first place.
> >
> > Well, there lies a problem - some devices have to do execve because
> > they need firmware to operate. Also, again, some buses with
> > hot-pluggable devices will attempt to clean up unsuccessful resume and
> > this will cause hotplug events. The point is you either resume system
> > or you don't. We probably need a separate "unfreeze" callback,
> > although this is kind of messy.
> 
> There's a better solution for firmware: You should load your firmware
> prior to suspend and store it in RAM. Anything else just plain does
> not work. (Because your wireless firmware might be on NFS mounted over
> that wireless card).
> 
> Hotplug... I guess udev just needs to hold that callbacks before
> system is fully up... it has to do something similar on regular boot,
> no?

Well, I did not really look into udev but hotplug (which can interact
with udev) does not keep anything. If it fails it's ok - that's why
there are coldplug scripts that "recover" lost events. But here we
block trying to start hotplug - we are not getting an error - and this
is bad. Unfortunately I am not familiar enough with how block devices
work to say why it hangs.

Should we pull Jens into the discussion?
 
-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 20:05                       ` Dmitry Torokhov
@ 2005-03-29 20:52                         ` Pavel Machek
  2005-03-29 21:07                           ` Dmitry Torokhov
  2005-03-29 21:23                           ` [linux-pm] " Patrick Mochel
  0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 20:52 UTC (permalink / raw)
  To: dtor_core
  Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik,
	Linux-pm mailing list

Hi!

> > > Well, there lies a problem - some devices have to do execve because
> > > they need firmware to operate. Also, again, some buses with
> > > hot-pluggable devices will attempt to clean up unsuccessful resume and
> > > this will cause hotplug events. The point is you either resume system
> > > or you don't. We probably need a separate "unfreeze" callback,
> > > although this is kind of messy.
> > 
> > There's a better solution for firmware: You should load your firmware
> > prior to suspend and store it in RAM. Anything else just plain does
> > not work. (Because your wireless firmware might be on NFS mounted over
> > that wireless card).
> > 
> > Hotplug... I guess udev just needs to hold that callbacks before
> > system is fully up... it has to do something similar on regular boot,
> > no?
> 
> Well, I did not really look into udev but hotplug (which can interact
> with udev) does not keep anything. If it fails it's ok - that's why
> there are coldplug scripts that "recover" lost events. But here we
> block trying to start hotplug - we are not getting an error - and this
> is bad. Unfortunately I am not familiar enough with how block devices
> work to say why it hangs.
> 
> Should we pull Jens into the discussion?

I don't really want us to try execve during resume... Could we simply
artificially fail that execve with something like if (in_suspend())
return -EINVAL; [except that in_suspend() just is not there, but there
were some proposals to add it].

Or just avoid calling hotplug at all in resume case? And then do
coldplug-like scan when userspace is ready...
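
A rough user-space model of that second idea, in C (a sketch only; the
flag, the structure and both functions are hypothetical, nothing like
this exists yet): events raised during resume are dropped without
running anything, and a later scan replays whatever userspace missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record: real state vs. what userspace was told. */
struct plug_dev {
	const char *name;
	bool present;    /* current state of the device */
	bool announced;  /* state last reported to userspace */
};

static bool model_in_resume;  /* stands in for an in_suspend()-style check */

/* Returns true if the event was delivered to userspace right away. */
bool model_hotplug_event(struct plug_dev *dev)
{
	assert(dev != NULL);

	if (model_in_resume)
		return false;              /* dropped; the scan recovers it */
	dev->announced = dev->present;
	return true;
}

/* Run once userspace is back: replay every state change it missed. */
size_t model_coldplug_scan(struct plug_dev *devs, size_t n)
{
	size_t replayed = 0;

	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].announced != devs[i].present) {
			devs[i].announced = devs[i].present;
			replayed++;
		}
	}
	return replayed;
}
```

Nothing blocks while model_in_resume is set, and the price is one walk
over the device list afterwards.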

But we perhaps should cc linux-pm list.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 20:52                         ` Pavel Machek
@ 2005-03-29 21:07                           ` Dmitry Torokhov
  2005-03-29 21:12                             ` Pavel Machek
  2005-03-29 21:23                           ` [linux-pm] " Patrick Mochel
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 21:07 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik,
	Linux-pm mailing list

On Tue, 29 Mar 2005 22:52:25 +0200, Pavel Machek <pavel@suse.cz> wrote:
> I don't really want us to try execve during resume... Could we simply
> artificially fail that execve with something if (in_suspend()) return
> -EINVAL; [except that in_suspend() just is not there, but there were
> some proposals to add it].
> 
> Or just avoid calling hotplug at all in resume case? And then do
> coldplug-like scan when userspace is ready...
> 

I am leaning towards calling disable_usermodehelper (not written yet)
after swsusp completes snapshotting memory. We really don't care about
hotplug events in this case and this will allow keeping "normal"
resume in drivers as is. What do you think?
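
Something along these lines, shown here as a user-space model in C
since the real thing is not written (all names are placeholders, and
returning -EBUSY is just my assumption about what a caller should
see): once helpers are disabled, a call returns an error immediately
instead of blocking on an exec that cannot complete.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate; placeholder names, not existing kernel functions. */
static bool model_helper_disabled;
static int model_helpers_started;

void model_disable_usermodehelper(void) { model_helper_disabled = true; }
void model_enable_usermodehelper(void)  { model_helper_disabled = false; }

/* Fails fast while disabled; otherwise pretends to start the helper. */
int model_call_usermodehelper(const char *path)
{
	assert(path != NULL);

	if (model_helper_disabled)
		return -EBUSY;             /* caller gets an error, no blocking */
	model_helpers_started++;           /* stands in for the real exec */
	return 0;
}
```

Drivers keep their normal resume path; the hotplug call they make
simply fails until helpers are enabled again.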

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:07                           ` Dmitry Torokhov
@ 2005-03-29 21:12                             ` Pavel Machek
  2005-03-29 21:33                               ` Dmitry Torokhov
  2005-03-29 23:05                               ` Rafael J. Wysocki
  0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 21:12 UTC (permalink / raw)
  To: dtor_core
  Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik,
	Linux-pm mailing list

Hi!

> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> > 
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
> > 
> 
> I am leaning towards calling disable_usermodehelper (not written yet)
> after swsusp completes snapshotting memory. We really don't care about
> hotplug events in this case and this will allow keeping "normal"
> resume in drivers as is. What do you think?

That would certainly do the trick.

[Or perhaps in_suspend() is slightly nicer solution? People wanted it
for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]

									Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:12                             ` Pavel Machek
@ 2005-03-29 21:33                               ` Dmitry Torokhov
  2005-03-29 21:44                                 ` Pavel Machek
  2005-03-29 23:05                               ` Rafael J. Wysocki
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 21:33 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik,
	Linux-pm mailing list

On Tue, 29 Mar 2005 23:12:39 +0200, Pavel Machek <pavel@suse.cz> wrote:
> >
> > I am leaning towards calling disable_usermodehelper (not written yet)
> > after swsusp completes snapshotting memory. We really don't care about
> > hotplug events in this case and this will allow keeping "normal"
> > resume in drivers as is. What do you think?
> 
> That would certainly do the trick.
> 
> [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]
> 

We might want to have both... Hmm... in_suspend - is it only for swsusp
(in_swsusp) or for suspend-to-ram as well? For suspend to ram we might
need slightly different rules, I don't know. A separate call will
allow more fine-grained control and will explicitly tell the reader
what is happening.

I do not have a strong preference though.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:33                               ` Dmitry Torokhov
@ 2005-03-29 21:44                                 ` Pavel Machek
  2005-03-29 22:31                                   ` [linux-pm] " Nigel Cunningham
  0 siblings, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 21:44 UTC (permalink / raw)
  To: dtor_core
  Cc: Stefan Seyfried, Andy Isaacson, kernel list, Vojtech Pavlik,
	Linux-pm mailing list

On Tue 29-03-05 16:33:04, Dmitry Torokhov wrote:
> On Tue, 29 Mar 2005 23:12:39 +0200, Pavel Machek <pavel@suse.cz> wrote:
> > >
> > > I am leaning towards calling disable_usermodehelper (not written yet)
> > > after swsusp completes snapshotting memory. We really don't care about
> > > hotplug events in this case and this will allow keeping "normal"
> > > resume in drivers as is. What do you think?
> > 
> > That would certainly do the trick.
> > 
> > [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> > for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]
> > 
> 
> We might want to have both... Hmm... in_suspend - is it only for swsusp
> (in_swsusp) or for suspend-to-ram as well? For suspend to ram we might
> need slightly different rules, I don't know. A separate call will
> allow more fine-grained control and will explicitly tell the reader
> what is happening.

We currently freeze processes for suspend-to-ram, too. I guess that
disable_usermodehelper is probably better and that in_suspend() should
only be used for sanity checks... go with disable_usermodehelper and
sorry for the noise.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:44                                 ` Pavel Machek
@ 2005-03-29 22:31                                   ` Nigel Cunningham
  2005-03-29 22:35                                     ` Pavel Machek
  0 siblings, 1 reply; 52+ messages in thread
From: Nigel Cunningham @ 2005-03-29 22:31 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried,
	Linux Kernel Mailing List, Andy Isaacson

Hi.

On Wed, 2005-03-30 at 07:44, Pavel Machek wrote:
> We currently freeze processes for suspend-to-ram, too. I guess that
> disable_usermodehelper is probably better and that in_suspend() should
> only be used for sanity checks... go with disable_usermodehelper and
> sorry for the noise.

Here's another possibility: Freeze the workqueue that
call_usermodehelper uses (remember that code I didn't push hard enough
to Andrew?), and let invocations of call_usermodehelper block in
TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on kernel
processes in that state. Of course if you don't want to freeze
processes for STR, but do want to freeze call_usermodehelper, I guess
you'd still need the in_suspend() macro.

Regards,

Nigel
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028;  Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 22:31                                   ` [linux-pm] " Nigel Cunningham
@ 2005-03-29 22:35                                     ` Pavel Machek
  2005-03-29 23:46                                       ` Nigel Cunningham
  2005-03-31  7:26                                       ` Dmitry Torokhov
  0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2005-03-29 22:35 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried,
	Linux Kernel Mailing List, Andy Isaacson

Hi!

> > We currently freeze processes for suspend-to-ram, too. I guess that
> > disable_usermodehelper is probably better and that in_suspend() should
> > only be used for sanity checks... go with disable_usermodehelper and
> > sorry for the noise.
> 
> Here's another possibility: Freeze the workqueue that
> call_usermodehelper uses (remember that code I didn't push hard enough
> to Andrew?), and let invocations of call_usermodehelper block in
> TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on

There may be many devices in the system, and you are going to need
quite a lot of RAM for all that... That's why they do not queue it
during boot, IIRC. Disabling usermode helper seems right.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 22:35                                     ` Pavel Machek
@ 2005-03-29 23:46                                       ` Nigel Cunningham
  2005-03-31  7:26                                       ` Dmitry Torokhov
  1 sibling, 0 replies; 52+ messages in thread
From: Nigel Cunningham @ 2005-03-29 23:46 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried,
	Linux Kernel Mailing List, Andy Isaacson

Hi.

On Wed, 2005-03-30 at 08:35, Pavel Machek wrote:
> Hi!
> 
> > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > disable_usermodehelper is probably better and that in_suspend() should
> > > only be used for sanity checks... go with disable_usermodehelper and
> > > sorry for the noise.
> > 
> > Here's another possibility: Freeze the workqueue that
> > call_usermodehelper uses (remember that code I didn't push hard enough
> > to Andrew?), and let invocations of call_usermodehelper block in
> > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
> 
> There may be many devices in the system, and you are going to need
> quite a lot of RAM for all that... That's why they do not queue it
> during boot, IIRC. Disabling usermode helper seems right.

Many devices is true, but very few of them invoke usermode helpers.

[desktop build-2.6.12-rc1]# find -name *.[ch] | xargs grep usermodehelper
./drivers/s390/crypto/z90main.c:        call_usermodehelper(argv[0], argv, envp, 0);
./drivers/net/hamradio/baycom_epp.c:    return call_usermodehelper(eppconfig_path, argv, envp, 1);
./drivers/acpi/thermal.c:       call_usermodehelper(argv[0], argv, envp, 0);
./drivers/acpi/thermal.mod.c:   { 0x436006da, "call_usermodehelper" },
./drivers/input/input.c:        value = call_usermodehelper(argv [0], argv, envp, 0);
./drivers/pnp/pnpbios/core.c:   value = call_usermodehelper (argv [0], argv, envp, 0);
./drivers/macintosh/therm_pm72.c:       return call_usermodehelper(critical_overtemp_path, argv, envp, 0);
./arch/i386/mach-voyager/voyager_thread.c:      if ((ret = call_usermodehelper(argv[0], argv, envp, 1)) != 0) {
./include/linux/kmod.h:extern int call_usermodehelper(char *path, char *argv[], char *envp[], int wait);
./include/linux/kmod.h:extern void usermodehelper_init(void);
./kernel/power/main.c:  return call_usermodehelper(argv[0], argv, envp, 1);
./kernel/power/suspend_userui.c:        retval = call_usermodehelper(userui_program, argv, envp, 0);
./kernel/kmod.c:        call_usermodehelper wait flag, and remove exec_usermodehelper.
./kernel/kmod.c:        ret = call_usermodehelper(modprobe_path, argv, envp, 1);
./kernel/kmod.c:static int ____call_usermodehelper(void *data)
./kernel/kmod.c:        pid = kernel_thread(____call_usermodehelper, sub_info, SIGCHLD);
./kernel/kmod.c:static void __call_usermodehelper(void *data)
./kernel/kmod.c:                pid = kernel_thread(____call_usermodehelper, sub_info,
./kernel/kmod.c: * call_usermodehelper - start a usermode application
./kernel/kmod.c:int call_usermodehelper(char *path, char **argv, char **envp, int wait)
./kernel/kmod.c:        DECLARE_WORK(work, __call_usermodehelper, &sub_info);
./kernel/kmod.c:EXPORT_SYMBOL(call_usermodehelper);
./kernel/kmod.c:void __init usermodehelper_init(void)
./kernel/cpuset.c: * Note final arg to call_usermodehelper() is 0 - that means
./kernel/cpuset.c:      return call_usermodehelper(argv[0], argv, envp, 0);
./security/keys/request_key.c:  return call_usermodehelper(argv[0], argv, envp, 1);
./lib/kobject_uevent.c: retval = call_usermodehelper (argv[0], argv, envp, 0);
./lib/kobject_uevent.c:         pr_debug ("%s - call_usermodehelper returned %d\n",
./init/main.c:  usermodehelper_init();

Of course there will be indirect invocations (via kobjects, for
example), but I still think the number is not that great. I'm already
using the method I suggested in unreleased Suspend2 code, and the only
invocation I catch is at resume time, for kseriod.

Regards,

Nigel
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028;  Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 22:35                                     ` Pavel Machek
  2005-03-29 23:46                                       ` Nigel Cunningham
@ 2005-03-31  7:26                                       ` Dmitry Torokhov
  2005-03-31  8:39                                         ` Pavel Machek
  2005-03-31 16:02                                         ` Patrick Mochel
  1 sibling, 2 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-31  7:26 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Nigel Cunningham, Linux-pm mailing list, Vojtech Pavlik,
	Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson

On Tuesday 29 March 2005 17:35, Pavel Machek wrote:
> Hi!
> 
> > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > disable_usermodehelper is probably better and that in_suspend() should
> > > only be used for sanity checks... go with disable_usermodehelper and
> > > sorry for the noise.
> > 
> > Here's another possibility: Freeze the workqueue that
> > call_usermodehelper uses (remember that code I didn't push hard enough
> > to Andrew?), and let invocations of call_usermodehelper block in
> > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
> 
> There may be many devices in the system, and you are going to need
> quite a lot of RAM for all that... That's why they do not queue it
> during boot, IIRC. Disabling usermode helper seems right.

Ok, what do you think about this one?

===================================================================

swsusp: disable usermodehelper after generating memory snapshot and
        before resuming devices, so when device fails to resume we
        won't try to call hotplug - userspace stopped anyway.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>


 include/linux/kmod.h  |    3 +++
 kernel/kmod.c         |   14 +++++++++++++-
 kernel/power/disk.c   |    2 ++
 kernel/power/swsusp.c |    1 -
 4 files changed, 18 insertions(+), 2 deletions(-)

Index: dtor/kernel/power/disk.c
===================================================================
--- dtor.orig/kernel/power/disk.c
+++ dtor/kernel/power/disk.c
@@ -205,6 +205,8 @@ int pm_suspend_disk(void)
 
 	if (in_suspend) {
 		pr_debug("PM: writing image.\n");
+		usermodehelper_disable();
+		device_resume();
 		error = swsusp_write();
 		if (!error)
 			power_down(pm_disk_mode);
Index: dtor/kernel/power/swsusp.c
===================================================================
--- dtor.orig/kernel/power/swsusp.c
+++ dtor/kernel/power/swsusp.c
@@ -853,7 +853,6 @@ static int suspend_prepare_image(void)
 int swsusp_write(void)
 {
 	int error;
-	device_resume();
 	lock_swapdevices();
 	error = write_suspend_image();
 	/* This will unlock ignored swap devices since writing is finished */
Index: dtor/kernel/kmod.c
===================================================================
--- dtor.orig/kernel/kmod.c
+++ dtor/kernel/kmod.c
@@ -124,6 +124,8 @@ struct subprocess_info {
 	int retval;
 };
 
+static int usermodehelper_disabled;
+
 /*
  * This is the task which runs the usermode application
  */
@@ -240,7 +242,7 @@ int call_usermodehelper(char *path, char
 	if (!khelper_wq)
 		return -EBUSY;
 
-	if (path[0] == '\0')
+	if (usermodehelper_disabled || path[0] == '\0')
 		return 0;
 
 	queue_work(khelper_wq, &work);
@@ -249,6 +251,16 @@ int call_usermodehelper(char *path, char
 }
 EXPORT_SYMBOL(call_usermodehelper);
 
+void usermodehelper_enable(void)
+{
+	usermodehelper_disabled = 0;
+}
+
+void usermodehelper_disable(void)
+{
+	usermodehelper_disabled = 1;
+}
+
 void __init usermodehelper_init(void)
 {
 	khelper_wq = create_singlethread_workqueue("khelper");
Index: dtor/include/linux/kmod.h
===================================================================
--- dtor.orig/include/linux/kmod.h
+++ dtor/include/linux/kmod.h
@@ -34,7 +34,10 @@ static inline int request_module(const c
 #endif
 
 #define try_then_request_module(x, mod...) ((x) ?: (request_module(mod), (x)))
+
 extern int call_usermodehelper(char *path, char *argv[], char *envp[], int wait);
 extern void usermodehelper_init(void);
+extern void usermodehelper_enable(void);
+extern void usermodehelper_disable(void);
 
 #endif /* __LINUX_KMOD_H__ */

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31  7:26                                       ` Dmitry Torokhov
@ 2005-03-31  8:39                                         ` Pavel Machek
  2005-03-31 15:02                                           ` Dmitry Torokhov
  2005-03-31 16:02                                         ` Patrick Mochel
  1 sibling, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-31  8:39 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Nigel Cunningham, Linux-pm mailing list, Vojtech Pavlik,
	Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson

Hi!

> > > > We currently freeze processes for suspend-to-ram, too. I guess that
> > > > disable_usermodehelper is probably better and that in_suspend() should
> > > > only be used for sanity checks... go with disable_usermodehelper and
> > > > sorry for the noise.
> > > 
> > > Here's another possibility: Freeze the workqueue that
> > > call_usermodehelper uses (remember that code I didn't push hard enough
> > > to Andrew?), and let invocations of call_usermodehelper block in
> > > TASK_UNINTERRUPTIBLE. In refrigerating processes, don't choke on
> > 
> > There may be many devices in the system, and you are going to need
> > quite a lot of RAM for all that... That's why they do not queue it
> > during boot, IIRC. Disabling usermode helper seems right.
> 
> Ok, what do you think about this one?
> 
> ===================================================================
> 
> swsusp: disable usermodehelper after generating memory snapshot and
>         before resuming devices, so when device fails to resume we
>         won't try to call hotplug - userspace stopped anyway.
> 
> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
> 
> 
>  include/linux/kmod.h  |    3 +++
>  kernel/kmod.c         |   14 +++++++++++++-
>  kernel/power/disk.c   |    2 ++
>  kernel/power/swsusp.c |    1 -
>  4 files changed, 18 insertions(+), 2 deletions(-)
> 
> Index: dtor/kernel/power/disk.c
> ===================================================================
> --- dtor.orig/kernel/power/disk.c
> +++ dtor/kernel/power/disk.c
> @@ -205,6 +205,8 @@ int pm_suspend_disk(void)
>  
>  	if (in_suspend) {
>  		pr_debug("PM: writing image.\n");
> +		usermodehelper_disable();
> +		device_resume();
>  		error = swsusp_write();
>  		if (!error)
>  			power_down(pm_disk_mode);
> Index: dtor/kernel/power/swsusp.c
> ===================================================================
> --- dtor.orig/kernel/power/swsusp.c
> +++ dtor/kernel/power/swsusp.c
> @@ -853,7 +853,6 @@ static int suspend_prepare_image(void)
>  int swsusp_write(void)
>  {
>  	int error;
> -	device_resume();
>  	lock_swapdevices();
>  	error = write_suspend_image();
>  	/* This will unlock ignored swap devices since writing is

Looks good, except... why move code around? Could you just call
usermodehelper_disable from swsusp_write?
							Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31  8:39                                         ` Pavel Machek
@ 2005-03-31 15:02                                           ` Dmitry Torokhov
  0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-31 15:02 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Nigel Cunningham, Linux-pm mailing list, Vojtech Pavlik,
	Stefan Seyfried, Linux Kernel Mailing List, Andy Isaacson

On Thu, 31 Mar 2005 10:39:10 +0200, Pavel Machek <pavel@suse.cz> wrote:
> >  int swsusp_write(void)
> >  {
> >       int error;
> > -     device_resume();
> >       lock_swapdevices();
> >       error = write_suspend_image();
> >       /* This will unlock ignored swap devices since writing is
> 
> Looks good, except... why move code around? Could you just call
> usermodehelper_disable from swsusp_write?

That's because I don't think that swsusp_write is the proper place for
it ;) It looks like a lean and mean function that just writes the image;
making it manipulate usermodehelper state _and_ system (device) state
seems wrong. Let it do one thing, and don't overload it with actions that
I think belong to the upper level. Do you agree?

I think I need to stick in a usermodehelper_enable call in case
swsusp_write fails, though.
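
As an illustration of the error path mentioned above, here is a minimal
user-space sketch, not kernel code. The flag and the
usermodehelper_disable()/usermodehelper_enable() names come from the patch;
everything prefixed with model_ and the helpers_run counter are hypothetical
stand-ins, and the write_error parameter is an assumed knob for forcing a
failed image write.

```c
#include <assert.h>

/* User-space model only; none of this is the real kernel implementation. */
static int usermodehelper_disabled;	/* same flag name as in the patch */
static int helpers_run;			/* hypothetical counter for the demo */

static void usermodehelper_disable(void) { usermodehelper_disabled = 1; }
static void usermodehelper_enable(void)  { usermodehelper_disabled = 0; }

/* Stand-in for call_usermodehelper(): quietly returns 0 while disabled. */
static int model_call_usermodehelper(const char *path)
{
	if (usermodehelper_disabled || path[0] == '\0')
		return 0;
	helpers_run++;
	return 0;
}

/* Stand-in for device_resume(): pretend one device emits a hotplug event. */
static void model_device_resume(void)
{
	model_call_usermodehelper("/sbin/hotplug");
}

/* Stand-in for swsusp_write(); write_error is an assumed test knob. */
static int model_swsusp_write(int write_error)
{
	/* Sanity check: no helper may be started while the image is written. */
	assert(usermodehelper_disabled);
	return write_error;
}

/* Models the in_suspend branch of pm_suspend_disk() plus the error path. */
static int model_pm_suspend_disk(int write_error)
{
	int error;

	usermodehelper_disable();
	model_device_resume();
	error = model_swsusp_write(write_error);
	if (error)
		usermodehelper_enable();	/* suspend failed: helpers work again */
	return error;
}
```

Calling model_pm_suspend_disk() with a nonzero write_error leaves the flag
cleared, so later helper calls run again; with 0 the flag stays set, which
corresponds to the real code path that powers the machine down.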

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31  7:26                                       ` Dmitry Torokhov
  2005-03-31  8:39                                         ` Pavel Machek
@ 2005-03-31 16:02                                         ` Patrick Mochel
  2005-03-31 16:32                                           ` Dmitry Torokhov
  1 sibling, 1 reply; 52+ messages in thread
From: Patrick Mochel @ 2005-03-31 16:02 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Pavel Machek, Vojtech Pavlik, Andy Isaacson,
	Linux-pm mailing list, Nigel Cunningham, Stefan Seyfried,
	Linux Kernel Mailing List


On Thu, 31 Mar 2005, Dmitry Torokhov wrote:

> Ok, what do you think about this one?
>
> ===================================================================
>
> swsusp: disable usermodehelper after generating memory snapshot and
>         before resuming devices, so when device fails to resume we
>         won't try to call hotplug - userspace stopped anyway.

Hm, shouldn't we disable it before we start to freeze processes? We don't
want any more processes trying to start up after we've taken care of
them..

Thanks,


	Pat


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31 16:02                                         ` Patrick Mochel
@ 2005-03-31 16:32                                           ` Dmitry Torokhov
  2005-03-31 22:16                                             ` Nigel Cunningham
  2005-03-31 22:18                                             ` Pavel Machek
  0 siblings, 2 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-31 16:32 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Vojtech Pavlik, Andy Isaacson,
	Linux-pm mailing list, Nigel Cunningham, Stefan Seyfried,
	Linux Kernel Mailing List

On Thu, 31 Mar 2005 08:02:44 -0800 (PST), Patrick Mochel
<mochel@digitalimplant.org> wrote:
> 
> On Thu, 31 Mar 2005, Dmitry Torokhov wrote:
> 
> > Ok, what do you think about this one?
> >
> > ===================================================================
> >
> > swsusp: disable usermodehelper after generating memory snapshot and
> >         before resuming devices, so when device fails to resume we
> >         won't try to call hotplug - userspace stopped anyway.
> 
> Hm, shouldn't we disable it before we start to freeze processes? We don't
> want any more processes trying to start up after we've taken care of
> them..
> 

Can't a device be removed (for any reason) _while_ we are freezing
processes? I think the freezing code will properly deal with it... What
about suspend semantics - if suspend fails do we say the device should
be operational or the system should attempt to re-initialize? I.e. we
are not doing suspend after all - can we still drop messages on the
floor? After all, we still have ability to run coldplug after failed
suspend.

I frankly am not sure at what point to disable usermode helper. Or
maybe we need to have a list of pending events and suspend khelper_wq
while suspending.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31 16:32                                           ` Dmitry Torokhov
@ 2005-03-31 22:16                                             ` Nigel Cunningham
  2005-03-31 22:18                                             ` Pavel Machek
  1 sibling, 0 replies; 52+ messages in thread
From: Nigel Cunningham @ 2005-03-31 22:16 UTC (permalink / raw)
  To: dtor_core
  Cc: Patrick Mochel, Pavel Machek, Vojtech Pavlik, Andy Isaacson,
	Linux-pm mailing list, Stefan Seyfried, Linux Kernel Mailing List

Hi.

On Fri, 2005-04-01 at 02:32, Dmitry Torokhov wrote:
> On Thu, 31 Mar 2005 08:02:44 -0800 (PST), Patrick Mochel
> <mochel@digitalimplant.org> wrote:
> > 
> > On Thu, 31 Mar 2005, Dmitry Torokhov wrote:
> > 
> > > Ok, what do you think about this one?
> > >
> > > ===================================================================
> > >
> > > swsusp: disable usermodehelper after generating memory snapshot and
> > >         before resuming devices, so when device fails to resume we
> > >         won't try to call hotplug - userspace stopped anyway.
> > 
> > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > want any more processes trying to start up after we've taken care of
> > them..
> > 
> 
> Can't a device be removed (for any reason) _while_ we are freezing
> processes? I think the freezing code will properly deal with it... What
> about suspend semantics - if suspend fails do we say the device should
> be operational or the system should attempt to re-initialize? I.e. we
> are not doing suspend after all - can we still drop messages on the
> floor? After all, we still have ability to run coldplug after failed
> suspend.
> 
> I frankly am not sure at what point to disable usermode helper. Or
> maybe we need to have a list of pending events and suspend khelper_wq
> while suspending.

FWIW, my solution is purely freezer based. I freeze khelper and, in the
freezer code, ignore kernel threads in the uninterruptible state (which
is where kseriod, e.g., will be while it waits for the usermode helper
process, which also gets frozen).
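
The freezer rule described above could be sketched roughly as follows. This
is a user-space model under stated assumptions, not the Suspend2 code: the
struct task layout, the enum values and both function names are hypothetical
and only illustrate the idea of not waiting for kernel threads that are
blocked uninterruptibly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task states for this model. */
enum task_state { RUNNING, INTERRUPTIBLE, UNINTERRUPTIBLE };

struct task {
	const char *name;
	int is_kernel_thread;
	enum task_state state;
	int frozen;
};

/* Returns 1 if the freezer still has to wait for this task, 0 otherwise. */
static int freezer_must_wait(const struct task *t)
{
	assert(t != NULL);
	if (t->frozen)
		return 0;
	/*
	 * The rule under discussion: a kernel thread sleeping
	 * uninterruptibly (e.g. waiting on a frozen helper) is skipped.
	 */
	if (t->is_kernel_thread && t->state == UNINTERRUPTIBLE)
		return 0;
	return 1;
}

/* Counts the tasks that still block completion of the freeze. */
static size_t tasks_still_to_freeze(const struct task *tasks, size_t n)
{
	size_t i, pending = 0;

	for (i = 0; i < n; i++)
		pending += (size_t)freezer_must_wait(&tasks[i]);
	return pending;
}
```

In this model a kseriod-like thread that is stuck waiting for a frozen
helper does not keep the freezer from finishing, while an unfrozen user
process still does.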

Regards,

Nigel
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028;  Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31 16:32                                           ` Dmitry Torokhov
  2005-03-31 22:16                                             ` Nigel Cunningham
@ 2005-03-31 22:18                                             ` Pavel Machek
  2005-03-31 22:28                                               ` Nigel Cunningham
  1 sibling, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2005-03-31 22:18 UTC (permalink / raw)
  To: dtor_core
  Cc: Patrick Mochel, Vojtech Pavlik, Andy Isaacson,
	Linux-pm mailing list, Nigel Cunningham, Stefan Seyfried,
	Linux Kernel Mailing List

Hi!

> > > Ok, what do you think about this one?
> > >
> > > ===================================================================
> > >
> > > swsusp: disable usermodehelper after generating memory snapshot and
> > >         before resuming devices, so when device fails to resume we
> > >         won't try to call hotplug - userspace stopped anyway.
> > 
> > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > want any more processes trying to start up after we've taken care of
> > them..
> > 
> 
> Can't a device be removed (for any reason) _while_ we are freezing
> processes? I think the freezing code will properly deal with it... What
> about suspend semantics - if suspend fails do we say the device should
> be operational or the system should attempt to re-initialize? I.e. we
> are not doing suspend after all - can we still drop messages on the
> floor? After all, we still have ability to run coldplug after failed
> suspend.

I believe we should freeze hotplug before processes. Dropping messages
on the floor should not be a problem, we should just call coldplug
after failed suspend.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31 22:18                                             ` Pavel Machek
@ 2005-03-31 22:28                                               ` Nigel Cunningham
  2005-04-01  8:49                                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 52+ messages in thread
From: Nigel Cunningham @ 2005-03-31 22:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dtor_core, Patrick Mochel, Vojtech Pavlik, Andy Isaacson,
	Linux-pm mailing list, Stefan Seyfried, Linux Kernel Mailing List

Hi.

On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
> Hi!
> 
> > > > Ok, what do you think about this one?
> > > >
> > > > ===================================================================
> > > >
> > > > swsusp: disable usermodehelper after generating memory snapshot and
> > > >         before resuming devices, so when device fails to resume we
> > > >         won't try to call hotplug - userspace stopped anyway.
> > > 
> > > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > > want any more processes trying to start up after we've taken care of
> > > them..
> > > 
> > 
> > Can't a device be removed (for any reason) _while_ we are freezing
> > processes? I think the freezing code will properly deal with it... What
> > about suspend semantics - if suspend fails do we say the device should
> > be operational or the system should attempt to re-initialize? I.e. we
> > are not doing suspend after all - can we still drop messages on the
> > floor? After all, we still have ability to run coldplug after failed
> > suspend.
> 
> I believe we should freeze hotplug before processes. Dropping messages
> on the floor should not be a problem, we should just call coldplug
> after failed suspend.

How will you know which devices to call coldplug for, post resume? (Or
does it figure that out itself somehow?)

Regards,

Nigel
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028;  Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-31 22:28                                               ` Nigel Cunningham
@ 2005-04-01  8:49                                                 ` Rafael J. Wysocki
  2005-04-01 10:33                                                   ` Stefan Seyfried
  0 siblings, 1 reply; 52+ messages in thread
From: Rafael J. Wysocki @ 2005-04-01  8:49 UTC (permalink / raw)
  To: ncunningham
  Cc: Pavel Machek, dtor_core, Patrick Mochel, Vojtech Pavlik,
	Andy Isaacson, Linux-pm mailing list, Stefan Seyfried,
	Linux Kernel Mailing List

Hi,

On Friday, 1 of April 2005 00:28, Nigel Cunningham wrote:
> Hi.
> 
> On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
> > Hi!
> > 
> > > > > Ok, what do you think about this one?
> > > > >
> > > > > ===================================================================
> > > > >
> > > > > swsusp: disable usermodehelper after generating memory snapshot and
> > > > >         before resuming devices, so when device fails to resume we
> > > > >         won't try to call hotplug - userspace stopped anyway.
> > > > 
> > > > Hm, shouldn't we disable it before we start to freeze processes? We don't
> > > > want any more processes trying to start up after we've taken care of
> > > > them..
> > > > 
> > > 
> > > Can't a device be removed (for any reason) _while_ we are freezing
> > > processes? I think the freezing code will properly deal with it... What
> > > about suspend semantics - if suspend fails do we say the device should
> > > be operational or the system should attempt to re-initialize? I.e. we
> > > are not doing suspend after all - can we still drop messages on the
> > > floor? After all, we still have ability to run coldplug after failed
> > > suspend.
> > 
> > I believe we should freeze hotplug before processes.

I agree.  IMO user space should not be considered as available once we have
started freezing processes, so hotplug should be disabled before.  By the same
token, it should only be enabled after the processes have been restarted
during resume (or after suspend has failed).

BTW, it seems to me that the forking of new processes could be disabled
before we start to freeze the existing ones.

> > Dropping messages on the floor should not be a problem, we should just
> > call coldplug after failed suspend.
> 
> How will you know which devices to call coldplug for, post resume? (Or
> does it figure that out itself somehow?)

I think the drivers that need the hotplug to resume should defer their resume
routines until usermodehelper is enabled (it seems to me that we can use
a completion to handle this).
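
A rough sketch of the deferral idea, with one deliberate substitution: a
kernel completion would block the waiting thread, whereas this
single-threaded user-space model parks the resume routine in a small table
and runs it once the helper is enabled again. The names
resume_when_helper_ready(), count_resume() and MAX_DEFERRED are hypothetical,
and usermodehelper_enable() here is a model, not the function from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 8	/* hypothetical bound for this sketch */

typedef void (*resume_fn)(void *dev);

struct deferred_resume {
	resume_fn fn;
	void *dev;
};

static int usermodehelper_disabled = 1;	/* model starts mid-suspend */
static struct deferred_resume deferred[MAX_DEFERRED];
static size_t n_deferred;

/*
 * Run a resume routine that needs hotplug. Returns 1 if it ran
 * immediately, 0 if it was parked, -1 if the table is full.
 */
static int resume_when_helper_ready(resume_fn fn, void *dev)
{
	assert(fn != NULL);
	if (!usermodehelper_disabled) {
		fn(dev);
		return 1;
	}
	if (n_deferred == MAX_DEFERRED)
		return -1;
	deferred[n_deferred].fn = fn;
	deferred[n_deferred].dev = dev;
	n_deferred++;
	return 0;
}

/* Re-enable the helper, then run parked routines in arrival order. */
static void usermodehelper_enable(void)
{
	size_t i;

	usermodehelper_disabled = 0;
	for (i = 0; i < n_deferred; i++)
		deferred[i].fn(deferred[i].dev);
	n_deferred = 0;
}

/* Example resume routine: counts how often it was invoked. */
static void count_resume(void *dev)
{
	int *count = dev;

	(*count)++;
}
```

A driver-like caller would pass its resume routine to
resume_when_helper_ready(); the routine then runs either at once or when
usermodehelper_enable() is called after the processes are available again.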

Greets,
Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-04-01  8:49                                                 ` Rafael J. Wysocki
@ 2005-04-01 10:33                                                   ` Stefan Seyfried
  0 siblings, 0 replies; 52+ messages in thread
From: Stefan Seyfried @ 2005-04-01 10:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ncunningham, Pavel Machek, dtor_core, Patrick Mochel,
	Vojtech Pavlik, Andy Isaacson, Linux-pm mailing list,
	Linux Kernel Mailing List

Rafael J. Wysocki wrote:
> Hi,
>> On Fri, 2005-04-01 at 08:18, Pavel Machek wrote:
>> > I believe we should freeze hotplug before processes.
> 
> I agree.  IMO user space should not be considered as available once we have
> started freezing processes, so hotplug should be disabled before.  By the same
> token, it should only be enabled after the processes have been restarted
> during resume (or after suspend has failed).

It probably has to be enabled before the processes are restarted - they
may rightfully assume that hotplug is working.
-- 
seife
                                 Never trust a computer you can't lift.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:12                             ` Pavel Machek
  2005-03-29 21:33                               ` Dmitry Torokhov
@ 2005-03-29 23:05                               ` Rafael J. Wysocki
  1 sibling, 0 replies; 52+ messages in thread
From: Rafael J. Wysocki @ 2005-03-29 23:05 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dtor_core, Stefan Seyfried, Andy Isaacson, kernel list,
	Vojtech Pavlik, Linux-pm mailing list

Hi,

On Tuesday, 29 of March 2005 23:12, Pavel Machek wrote:
> Hi!
> 
> > > I don't really want us to try execve during resume... Could we simply
> > > artificially fail that execve with something if (in_suspend()) return
> > > -EINVAL; [except that in_suspend() just is not there, but there were
> > > some proposals to add it].
> > > 
> > > Or just avoid calling hotplug at all in resume case? And then do
> > > coldplug-like scan when userspace is ready...
> > > 
> > 
> > I am leaning towards calling disable_usermodehelper (not written yet)
> > after swsusp completes snapshotting memory. We really don't care about
> > hotplug events in this case and this will allow keeping "normal"
> > resume in drivers as is. What do you think?
> 
> That would certainly do the trick.
> 
> [Or perhaps in_suspend() is slightly nicer solution? People wanted it
> for other stuff (sanity checking, like BUG_ON(in_suspend())), too....]

IMHO, they are not mutually exclusive. However, by using
disable_usermodehelper we would get rid of the cause (i.e. hotplug events)
instead of just curing the symptoms (i.e. execve() during suspend).

Greets,
Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 20:52                         ` Pavel Machek
  2005-03-29 21:07                           ` Dmitry Torokhov
@ 2005-03-29 21:23                           ` Patrick Mochel
  2005-03-29 21:38                             ` Dmitry Torokhov
  2005-03-30  9:52                             ` Greg KH
  1 sibling, 2 replies; 52+ messages in thread
From: Patrick Mochel @ 2005-03-29 21:23 UTC (permalink / raw)
  To: Pavel Machek
  Cc: dtor_core, Linux-pm mailing list, Vojtech Pavlik, Stefan Seyfried,
	kernel list, Andy Isaacson


On Tue, 29 Mar 2005, Pavel Machek wrote:

> I don't really want us to try execve during resume... Could we simply
> artificially fail that execve with something if (in_suspend()) return
> -EINVAL; [except that in_suspend() just is not there, but there were
> some proposals to add it].
>
> Or just avoid calling hotplug at all in resume case? And then do
> coldplug-like scan when userspace is ready...

I thought that cold-plugging only worked for devices, not all objects.

Can we just queue up hotplug events? That way we wouldn't lose any across
the transition, and could be used to send resume events to userspace for
various devices that need help..
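
Roughly what I have in mind, as a userspace sketch only. The struct,
the fixed queue size and the function names are all made up for
illustration, and a full queue simply drops the event here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EVQ_CAP 8	/* arbitrary capacity for the sketch */

struct evq {
	const char *ev[EVQ_CAP];	/* ring buffer of pending events */
	size_t head, count;
	bool suspended;			/* true while userspace is frozen */
	size_t delivered;		/* events handed to userspace */
};

/* Deliver at once when running, queue while suspended.
 * Returns false only if the queue is full and the event is lost. */
bool evq_emit(struct evq *q, const char *event)
{
	assert(q != NULL && event != NULL);
	if (!q->suspended) {
		q->delivered++;
		return true;
	}
	if (q->count == EVQ_CAP)
		return false;
	q->ev[(q->head + q->count) % EVQ_CAP] = event;
	q->count++;
	return true;
}

/* Leave the suspended state and replay queued events in order.
 * Returns the number of events replayed. */
size_t evq_resume(struct evq *q)
{
	size_t replayed = 0;

	assert(q != NULL);
	q->suspended = false;
	while (q->count > 0) {
		q->head = (q->head + 1) % EVQ_CAP;
		q->count--;
		q->delivered++;
		replayed++;
	}
	return replayed;
}
```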


	Pat


* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:23                           ` [linux-pm] " Patrick Mochel
@ 2005-03-29 21:38                             ` Dmitry Torokhov
  2005-03-30  9:52                             ` Greg KH
  1 sibling, 0 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 21:38 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Linux-pm mailing list, Vojtech Pavlik,
	Stefan Seyfried, kernel list, Andy Isaacson

On Tue, 29 Mar 2005 13:23:35 -0800 (PST), Patrick Mochel
<mochel@digitalimplant.org> wrote:
> 
> On Tue, 29 Mar 2005, Pavel Machek wrote:
> 
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
> 
> I thought that cold-plugging only worked for devices, not all objects.
> 

It really depends on the script - nothing stops it from traversing the
entire /sys tree, and if an object is not exported in the tree I'd say
userspace should not care about such an object anyway.

> Can we just queue up hotplug events? That way we wouldn't lose any across
> the transition, and could be used to send resume events to userspace for
> various devices that need help..
>

The point is that at this stage any changes to the system state will
be discarded - we have already made the image and are about to write
it. When we resume for real all those events will be regenerated.
 
-- 
Dmitry

* Re: [linux-pm] Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 21:23                           ` [linux-pm] " Patrick Mochel
  2005-03-29 21:38                             ` Dmitry Torokhov
@ 2005-03-30  9:52                             ` Greg KH
  1 sibling, 0 replies; 52+ messages in thread
From: Greg KH @ 2005-03-30  9:52 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Vojtech Pavlik, Andy Isaacson,
	Linux-pm mailing list, Stefan Seyfried, kernel list

On Tue, Mar 29, 2005 at 01:23:35PM -0800, Patrick Mochel wrote:
> 
> On Tue, 29 Mar 2005, Pavel Machek wrote:
> 
> > I don't really want us to try execve during resume... Could we simply
> > artificially fail that execve with something if (in_suspend()) return
> > -EINVAL; [except that in_suspend() just is not there, but there were
> > some proposals to add it].
> >
> > Or just avoid calling hotplug at all in resume case? And then do
> > coldplug-like scan when userspace is ready...
> 
> I thought that cold-plugging only worked for devices, not all objects.

We can walk the whole sysfs tree and create "cold" hotplug events.
udevstart does that for devices that udev cares about (as an example.)

> Can we just queue up hotplug events? That way we wouldn't lose any across
> the transition, and could be used to send resume events to userspace for
> various devices that need help..

Ick, I really hate this idea, but there is a patch in the SuSE kernel to
do this at boot time.  Hopefully the author of that patch resubmits it
and maybe it will eventually make it into mainline...

thanks,

greg k-h

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 23:54           ` Andy Isaacson
  2005-03-25  9:22             ` Stefan Seyfried
@ 2005-03-25 14:58             ` Dmitry Torokhov
  2005-03-30  7:26               ` Andy Isaacson
  1 sibling, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-25 14:58 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 15:54:39 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> On Thu, Mar 24, 2005 at 04:10:39PM -0500, Dmitry Torokhov wrote:
> > If you do "ls /sys/bus/serio/devices" and see more than 3 ports you
> > have MUX mode active.
> 
> Just serio0 and serio1.
> 
> On Thu, Mar 24, 2005 at 04:14:52PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > > (How can I verify that "nomux" was accepted?  It shows up on the "Kernel
> > > command line" but there's no other mention of it in dmesg.)
> >
> > Ignore my babbling, I just noticed in your dmesg that your KBC does
> > not support MUX mode to begin with.
> 
> OK, anything else I should try?
> 
> Why does it only fail when I have *both* intel_agp and i8042 aux?
> 
> In the SysRq-T trace I see one interesting process: most things are
> in D state in refrigerator(), but sh shows the following traceback:
> 
> wait_for_completion
> call_usermodehelper
> kobject_hotplug
> kobject_del
> class_device_del
> class_device_unregister
> mousedev_disconnect
> input_unregister_device
> alps_disconnect
> psmouse_disconnect
> serio_driver_remove
> device_release_driver
> serio_release_driver
> serio_resume

I wonder why ALPS reconnect failed. You don't have a serial console
set up, do you? If not then maybe you could make a huge framebuffer to
capture as much info as you can... I hope you have a digital camera ;)

Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
suspend. I am interested in the data coming in and out of i8042.

-- 
Dmitry

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-25 14:58             ` Dmitry Torokhov
@ 2005-03-30  7:26               ` Andy Isaacson
  0 siblings, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-30  7:26 UTC (permalink / raw)
  To: dtor_core; +Cc: Stefan Seyfried, kernel list

On Fri, Mar 25, 2005 at 09:58:40AM -0500, Dmitry Torokhov wrote:
> I wonder why ALPS reconnect failed. You don't have a serial console
> set up, do you? If not then maybe you could make a huge framebuffer to
> capture as much info as you can... I hope you have a digital camera ;)

No serial ports brought out on this laptop, and I've not tried
framebuffer...

> Then do "echo 1 > /sys/module/i8042/parameters/debug" and try to
> suspend. I am interested in the data coming in and out of i8042.

Transcribed by hand, the last few bytes are
< fa           ACK
> d4 e9        GETINFO
< fa 20 00 64  
> d4 ff        RESET_BAT
< fa aa 00     RET_BAT

(Because I used O= the __FILE__ is very long so each dbg() takes two lines
of my 80x25 console...)
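
For anyone else reading such a trace, here is a small sketch of how I
labelled the bytes. The ACK, GETINFO, RESET_BAT and RET_BAT labels are
the ones shown above; treating d4 as the controller command that routes
the next byte to the AUX port is my understanding of the i8042, and the
enum and function names are invented for this example:

```c
#include <assert.h>
#include <stdbool.h>

enum ps2_kind {
	PS2_DATA,	/* anything not recognised below */
	PS2_ACK,	/* fa from the device */
	PS2_RET_BAT,	/* aa from the device: self-test passed */
	PS2_AUX_PREFIX,	/* d4 to the controller: next byte goes to AUX */
	PS2_GETINFO,	/* e9 to the device */
	PS2_RESET_BAT	/* ff to the device */
};

/* to_device is true for bytes on '>' lines, false for '<' lines. */
enum ps2_kind ps2_classify(unsigned char byte, bool to_device)
{
	if (to_device) {
		switch (byte) {
		case 0xd4: return PS2_AUX_PREFIX;
		case 0xe9: return PS2_GETINFO;
		case 0xff: return PS2_RESET_BAT;
		default:   return PS2_DATA;
		}
	}
	switch (byte) {
	case 0xfa: return PS2_ACK;
	case 0xaa: return PS2_RET_BAT;
	default:   return PS2_DATA;
	}
}

/* Printable label for a classified byte. */
const char *ps2_kind_name(enum ps2_kind kind)
{
	static const char *const names[] = {
		"data", "ACK", "RET_BAT", "AUX prefix", "GETINFO", "RESET_BAT"
	};

	assert((unsigned)kind < sizeof(names) / sizeof(names[0]));
	return names[kind];
}
```

Read that way, the last exchange is a reset sent to the touchpad,
answered with ACK and a passed self-test.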

Dunno if that's helpful, sorry...

-andy

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 20:20       ` Andy Isaacson
  2005-03-24 21:10         ` Dmitry Torokhov
@ 2005-03-24 21:14         ` Dmitry Torokhov
  1 sibling, 0 replies; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-24 21:14 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Stefan Seyfried, kernel list

On Thu, 24 Mar 2005 12:20:40 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> On Thu, Mar 24, 2005 at 02:18:40PM -0500, Dmitry Torokhov wrote:
> > On Thu, 24 Mar 2005 10:10:59 -0800, Andy Isaacson <adi@hexapodia.org> wrote:
> > > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > > intel_agp, started X, and verified no touchpad action.  Then I
> > > suspended, and it worked fine.  After restart, I suspended again - also
> > > fine.
> > >
> > > So I think that fixed it.  But no touchpad is a bit annoying. :)
> >
> > Try adding i8042.nomux instead of i8042.noaux, it should keep your
> > touchpad in working condition. Please let me know if it still works.
> 
> With nomux the touchpad works again, but suspend blocks in the same
> place as without nomux.
> 
> (How can I verify that "nomux" was accepted?  It shows up on the "Kernel
> command line" but there's no other mention of it in dmesg.)
> 

Ignore my babbling, I just noticed in your dmesg that your KBC does
not support MUX mode to begin with.

-- 
Dmitry

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 18:10   ` Andy Isaacson
  2005-03-24 19:18     ` Dmitry Torokhov
@ 2005-03-24 20:38     ` Stefan Seyfried
  2005-03-29 18:42       ` Dmitry Torokhov
  1 sibling, 1 reply; 52+ messages in thread
From: Stefan Seyfried @ 2005-03-24 20:38 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: kernel list

Andy Isaacson wrote:
> On Thu, Mar 24, 2005 at 03:27:15PM +0100, Stefan Seyfried wrote:

> Sysrq still prints stuff, so IRQs aren't locked.  But most of the sysrq
> commands don't work... S and U don't seem to do anything (not too
> surprising I suppose) but B does reboot.

sysrq-t will probably show a stuck kseriod. Unfortunately it only
happens on one machine for me (toshiba P10-550 IIRC, P4HT but with
non-smp kernel) which has no serial port for console.

>> If sysrq is still working, please try with "i8042.noaux" (this will kill
>> your touchpad, which is what i intend :-)
> 
> So I added i8042.noaux to my kernel command line, rebooted, insmodded
> intel_agp, started X, and verified no touchpad action.  Then I
> suspended, and it worked fine.  After restart, I suspended again - also
> fine.
> 
> So I think that fixed it.  But no touchpad is a bit annoying. :)

Yes, it was not meant as a fix but just for verification, since i have
seen something similar.
We have a SUSE bug for this, i believe Vojtech and Pavel will take care
of this one. Thanks for confirming, i almost started to believe i was
seeing ghosts :-)
-- 
seife
                                 Never trust a computer you can't lift.

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-24 20:38     ` Stefan Seyfried
@ 2005-03-29 18:42       ` Dmitry Torokhov
  2005-03-30  7:24         ` Andy Isaacson
  0 siblings, 1 reply; 52+ messages in thread
From: Dmitry Torokhov @ 2005-03-29 18:42 UTC (permalink / raw)
  To: Stefan Seyfried; +Cc: Andy Isaacson, kernel list

On Thursday 24 March 2005 15:38, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > So I added i8042.noaux to my kernel command line, rebooted, insmodded
> > intel_agp, started X, and verified no touchpad action.  Then I
> > suspended, and it worked fine.  After restart, I suspended again - also
> > fine.
> > 
> > So I think that fixed it.  But no touchpad is a bit annoying. :)
> 
> Yes, it was not thought as a fix but just for verification, since i have
> seen something similar.
> We have a SUSE bug for this, i believe Vojtech and Pavel will take care
> of this one. Thanks for confirming, i almost started to believe i was
> seeing ghosts :-)

Could you please try the patch below - it should fix the issues you are
seeing although there may be other devices (really any hot-pluggable
device) that will show the same behaviour. In the long run swsusp should
not attempt resuming devices when the system can not handle the process
properly. 

-- 
Dmitry

===================================================================

Input: serio - do not attempt to immediately disconnect port if
       resume failed, let kseriod take care of it. Otherwise we
       may attempt to unregister associated input devices which
       will generate hotplug events which are not handled well
       during swsusp.

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>


 serio.c |    1 -
 1 files changed, 1 deletion(-)

Index: dtor/drivers/input/serio/serio.c
===================================================================
--- dtor.orig/drivers/input/serio/serio.c
+++ dtor/drivers/input/serio/serio.c
@@ -779,7 +779,6 @@ static int serio_resume(struct device *d
 	struct serio *serio = to_serio_port(dev);
 
 	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
-		serio_disconnect_port(serio);
 		/*
 		 * Driver re-probing can take a while, so better let kseriod
 		 * deal with it.
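
The effect of the one-line removal can be modelled in userspace as
follows. This is only a sketch under stated assumptions: struct
model_port and both functions are invented for illustration, and
"rescan_queued" stands in for whatever kseriod does later from its own
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a serio port; not the kernel structure. */
struct model_port {
	bool has_driver;	/* a driver is bound to the port */
	bool reconnect_ok;	/* driver's reconnect would succeed */
	bool connected;		/* input devices still registered */
	bool rescan_queued;	/* work deferred to the worker thread */
	int hotplug_events;	/* events raised during resume */
};

/* Tearing down the port unregisters its input devices, and each
 * unregistration raises a hotplug event. */
void model_disconnect_port(struct model_port *p)
{
	assert(p != NULL);
	if (p->connected) {
		p->connected = false;
		p->hotplug_events++;
	}
}

/* patched == false models the old behaviour (disconnect at once),
 * patched == true models the behaviour after the patch. */
void model_resume(struct model_port *p, bool patched)
{
	assert(p != NULL);
	if (!p->has_driver || !p->reconnect_ok) {
		if (!patched)
			model_disconnect_port(p);
		p->rescan_queued = true;
	}
}
```

In the patched case a failed reconnect raises no hotplug event from
the resume path itself, so nothing waits on a usermode helper while
userspace is frozen.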

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
  2005-03-29 18:42       ` Dmitry Torokhov
@ 2005-03-30  7:24         ` Andy Isaacson
  0 siblings, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-03-30  7:24 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Stefan Seyfried, kernel list

On Tue, Mar 29, 2005 at 01:42:26PM -0500, Dmitry Torokhov wrote:
> Could you please try the patch below - it should fix the issues you are
[snip]
> --- dtor.orig/drivers/input/serio/serio.c
> +++ dtor/drivers/input/serio/serio.c
>  	if (!serio->drv || !serio->drv->reconnect || serio->drv->reconnect(serio)) {
> -		serio_disconnect_port(serio);
>  		/*
>  		 * Driver re-probing can take a while, so better let kseriod

Yep, that fixes it.  I applied your patch to 2.6.12-rc1-mm1 and
suspended and resumed 5 times in a row without any difficulty.  Thanks!

-andy

[parent not found: <20050525171825.51a06908.akpm@osdl.org>]

* Re: swsusp 'disk' fails in bk-current - intel_agp at fault?
       [not found] ` <20050525171825.51a06908.akpm@osdl.org>
@ 2005-05-27 17:44   ` Andy Isaacson
  0 siblings, 0 replies; 52+ messages in thread
From: Andy Isaacson @ 2005-05-27 17:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Pavel Machek, Dave Jones, linux-kernel

On Wed, May 25, 2005 at 05:18:25PM -0700, Andrew Morton wrote:
> Andy Isaacson <adi@hexapodia.org> wrote:
> > I was previously running 2.6.11-rc3 and swsusp was working quite nicely:
> > echo shutdown > /sys/power/disk
> > echo disk > /sys/power/state
> > 
> > Now I've upgraded to 2.6.12-rc1, 423b66b6oJOGN68OhmSrBFxxLOtIEA, and it
> > no longer works reliably.  Almost every time I do the above it blocks in
> > device_resume() (I haven't had time to track it deeper than that).
> 
> Andy, can you please retest 2.6.12-rc5 and if these problems remain,
> generate new reports at bugme.osdl.org?

After two quick tests, it appears to be fixed in 2.6.12-rc5.  Thanks for
the follow-up.

-andy

end of thread, other threads:[~2005-05-27 17:45 UTC | newest]

Thread overview: 52+ messages
2005-03-23 18:49 swsusp 'disk' fails in bk-current - intel_agp at fault? Andy Isaacson
2005-03-24 14:27 ` Stefan Seyfried
2005-03-24 18:10   ` Andy Isaacson
2005-03-24 19:18     ` Dmitry Torokhov
2005-03-24 20:20       ` Andy Isaacson
2005-03-24 21:10         ` Dmitry Torokhov
2005-03-24 23:54           ` Andy Isaacson
2005-03-25  9:22             ` Stefan Seyfried
2005-03-25 10:13               ` Pavel Machek
2005-03-25 14:19                 ` Dmitry Torokhov
2005-03-25 14:24                   ` Pavel Machek
2005-03-25 14:52                     ` Dmitry Torokhov
2005-03-25 15:42                       ` Pavel Machek
2005-03-25 16:04                         ` Dmitry Torokhov
2005-03-28 23:00                           ` Pavel Machek
2005-03-29 23:19                           ` Rafael J. Wysocki
2005-03-29 21:49                       ` Rafael J. Wysocki
2005-03-25 18:36                 ` Andy Isaacson
2005-03-29 16:18               ` Dmitry Torokhov
2005-03-29 18:18                 ` Pavel Machek
2005-03-29 19:11                   ` Dmitry Torokhov
2005-03-29 19:23                     ` Pavel Machek
2005-03-29 20:05                       ` Dmitry Torokhov
2005-03-29 20:52                         ` Pavel Machek
2005-03-29 21:07                           ` Dmitry Torokhov
2005-03-29 21:12                             ` Pavel Machek
2005-03-29 21:33                               ` Dmitry Torokhov
2005-03-29 21:44                                 ` Pavel Machek
2005-03-29 22:31                                   ` [linux-pm] " Nigel Cunningham
2005-03-29 22:35                                     ` Pavel Machek
2005-03-29 23:46                                       ` Nigel Cunningham
2005-03-31  7:26                                       ` Dmitry Torokhov
2005-03-31  8:39                                         ` Pavel Machek
2005-03-31 15:02                                           ` Dmitry Torokhov
2005-03-31 16:02                                         ` Patrick Mochel
2005-03-31 16:32                                           ` Dmitry Torokhov
2005-03-31 22:16                                             ` Nigel Cunningham
2005-03-31 22:18                                             ` Pavel Machek
2005-03-31 22:28                                               ` Nigel Cunningham
2005-04-01  8:49                                                 ` Rafael J. Wysocki
2005-04-01 10:33                                                   ` Stefan Seyfried
2005-03-29 23:05                               ` Rafael J. Wysocki
2005-03-29 21:23                           ` [linux-pm] " Patrick Mochel
2005-03-29 21:38                             ` Dmitry Torokhov
2005-03-30  9:52                             ` Greg KH
2005-03-25 14:58             ` Dmitry Torokhov
2005-03-30  7:26               ` Andy Isaacson
2005-03-24 21:14         ` Dmitry Torokhov
2005-03-24 20:38     ` Stefan Seyfried
2005-03-29 18:42       ` Dmitry Torokhov
2005-03-30  7:24         ` Andy Isaacson
     [not found] ` <20050525171825.51a06908.akpm@osdl.org>
2005-05-27 17:44   ` Andy Isaacson