* [RFC] ACPI vs device ordering on resume
@ 2006-11-14 23:30 Stephen Hemminger
2006-11-14 23:59 ` Linus Torvalds
2006-11-15 7:03 ` [linux-pm] " Len Brown
0 siblings, 2 replies; 15+ messages in thread
From: Stephen Hemminger @ 2006-11-14 23:30 UTC (permalink / raw)
To: len.brown, Greg KH, Pavel Machek
Cc: Andrew Morton, linux-acpi, Linus Torvalds, linux-pm
If I do a suspend-to-ram then resume on a Sony Vaio laptop with sky2 driver,
the first interrupt gets misrouted to the original shared IRQ, rather than
to the MSI irq expected.
During the pci_restore process, the MSI information and the PCI command register
are restored properly. But later during resume, inside the ACPI evaluation of
the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared.
My guess is that the BIOS ends up doing some resetting of devices.
I may be able to workaround the problem for this one device, but it brings up
a more general issue about what the ordering should be during resume. If ACPI
evaluation (which I assume talks to the BIOS), might change device state, it
seems that ACPI code should execute before resuming devices not after. But changing
the order here seems drastic.
An alternate solution would be to have two pm_ops, one for early_resume
and another for late, and split the ACPI work.
--- 2.6.19-rc5.orig/kernel/power/main.c 2006-11-14 14:24:37.000000000 -0800
+++ 2.6.19-rc5/kernel/power/main.c 2006-11-14 14:25:23.000000000 -0800
@@ -132,12 +132,12 @@
static void suspend_finish(suspend_state_t state)
{
+ if (pm_ops && pm_ops->finish)
+ pm_ops->finish(state);
device_resume();
resume_console();
thaw_processes();
enable_nonboot_cpus();
- if (pm_ops && pm_ops->finish)
- pm_ops->finish(state);
pm_restore_console();
}
--
Stephen Hemminger <shemminger@osdl.org>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [RFC] ACPI vs device ordering on resume 2006-11-14 23:30 [RFC] ACPI vs device ordering on resume Stephen Hemminger @ 2006-11-14 23:59 ` Linus Torvalds 2006-11-15 7:03 ` [linux-pm] " Len Brown 1 sibling, 0 replies; 15+ messages in thread From: Linus Torvalds @ 2006-11-14 23:59 UTC (permalink / raw) To: Stephen Hemminger Cc: len.brown, Greg KH, Pavel Machek, linux-acpi, linux-pm, Andrew Morton On Tue, 14 Nov 2006, Stephen Hemminger wrote: > > During the pci_restore process, the MSI information and the PCI command register > are restored properly. But later during resume, inside the ACPI evaluation of > the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared. > My guess is that the BIOS ends up doing some resetting of devices. Hmm.. I think calling "pm_ops->finish()" early is the right thing to do, so your patch looks fine. I'd hate to apply it at this stage in the 2.6.19 development, though. Comments? Looking at the code (at least for ACPI), pm_ops->finish() function does things that we'd generally want done early. I'm a bit nervous about the "nesting", though - we call the "->prepare" function _before_ we do the device suspend stuff, so if "->finish" undoes something that was done by "->prepare", then we will restore the devices with the state they were in _after_ the "->prepare". So from a logical _nesting_ perspective, the "->finish()" routine should happen after devices have been restored. And anything that ACPI does to undo "->enter" should have been done early by ACPI itself as it was exiting that "->enter" routine - that would make the things nest properly. Gaah. Since there is no way to know what the HELL those ACPI routines will actually end up doing, there is no way to know what is the right answer. Does anybody know what Windows does? I think Stephen's patch is likely good, but I really don't see myself applying it to my tree right now. It should either go into -mm, or early into the post-2.6.19 tree (or, most likely, both). Anybody? Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-11-14 23:30 [RFC] ACPI vs device ordering on resume Stephen Hemminger 2006-11-14 23:59 ` Linus Torvalds @ 2006-11-15 7:03 ` Len Brown 2006-11-15 9:32 ` Rafael J. Wysocki ` (2 more replies) 1 sibling, 3 replies; 15+ messages in thread From: Len Brown @ 2006-11-15 7:03 UTC (permalink / raw) To: linux-pm Cc: Stephen Hemminger, Greg KH, Pavel Machek, Andrew Morton, linux-acpi, Linus Torvalds, linux-pm On Tuesday 14 November 2006 18:30, Stephen Hemminger wrote: > If I do a suspend-to-ram then resume on a Sony Vaio laptop with sky2 driver, > the first interrupt gets misrouted to the original shared IRQ, rather than > to the MSI irq expected. > > During the pci_restore process, the MSI information and the PCI command register > are restored properly. But later during resume, inside the ACPI evaluation of > the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared. > My guess is that the BIOS ends up doing some resetting of devices. > > I may be able to workaround the problem for this one device, but it brings up > a more general issue about what the ordering should be during resume. If ACPI > evaluation (which I assume talks to the BIOS), might change device state, it > seems that ACPI code should execute before resuming devices not after. But changing > the order here seems drastic. > > An alternate solution would be to have two pm_ops, one for early_resume > and another for late, and split the ACPI work. > > --- 2.6.19-rc5.orig/kernel/power/main.c 2006-11-14 14:24:37.000000000 -0800 > +++ 2.6.19-rc5/kernel/power/main.c 2006-11-14 14:25:23.000000000 -0800 > @@ -132,12 +132,12 @@ > > static void suspend_finish(suspend_state_t state) > { > + if (pm_ops && pm_ops->finish) > + pm_ops->finish(state); > device_resume(); > resume_console(); > thaw_processes(); > enable_nonboot_cpus(); > - if (pm_ops && pm_ops->finish) > - pm_ops->finish(state); > pm_restore_console(); > } Yes, I agree that _WAK needs to come before device_resume(). Need to let any BIOS nasties happen and get over with before we restore device drivers. This is consistent with the wording in ACPI 3.0b (section 7.4) that says 11. _WAK is run 12. OSPM notifies all native device drivefrs of the return from the sleep state transition However, commit 1a38416cea8ac801ae8f261074721f35317613dc says that _WAK must follow INIT -- ie finish() must come after enable_nonboot_cpus(), and this patch as it stands would violate that. So it looks like we need this sequence: enable_nonboot_cpus() /* INIT */ finish() /* _WAK */ device_resume() -Len ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-11-15 7:03 ` [linux-pm] " Len Brown @ 2006-11-15 9:32 ` Rafael J. Wysocki 2006-11-15 16:47 ` Linus Torvalds 2006-12-01 1:48 ` Stephen Hemminger 2 siblings, 0 replies; 15+ messages in thread From: Rafael J. Wysocki @ 2006-11-15 9:32 UTC (permalink / raw) To: Len Brown Cc: linux-pm, Stephen Hemminger, Greg KH, Pavel Machek, Andrew Morton, linux-acpi, Linus Torvalds, linux-pm On Wednesday, 15 November 2006 08:03, Len Brown wrote: > On Tuesday 14 November 2006 18:30, Stephen Hemminger wrote: > > If I do a suspend-to-ram then resume on a Sony Vaio laptop with sky2 driver, > > the first interrupt gets misrouted to the original shared IRQ, rather than > > to the MSI irq expected. > > > > During the pci_restore process, the MSI information and the PCI command register > > are restored properly. But later during resume, inside the ACPI evaluation of > > the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared. > > My guess is that the BIOS ends up doing some resetting of devices. > > > > I may be able to workaround the problem for this one device, but it brings up > > a more general issue about what the ordering should be during resume. If ACPI > > evaluation (which I assume talks to the BIOS), might change device state, it > > seems that ACPI code should execute before resuming devices not after. But changing > > the order here seems drastic. > > > > An alternate solution would be to have two pm_ops, one for early_resume > > and another for late, and split the ACPI work. > > > > --- 2.6.19-rc5.orig/kernel/power/main.c 2006-11-14 14:24:37.000000000 -0800 > > +++ 2.6.19-rc5/kernel/power/main.c 2006-11-14 14:25:23.000000000 -0800 > > @@ -132,12 +132,12 @@ > > > > static void suspend_finish(suspend_state_t state) > > { > > + if (pm_ops && pm_ops->finish) > > + pm_ops->finish(state); > > device_resume(); > > resume_console(); > > thaw_processes(); > > enable_nonboot_cpus(); > > - if (pm_ops && pm_ops->finish) > > - pm_ops->finish(state); > > pm_restore_console(); > > } > > Yes, I agree that _WAK needs to come before device_resume(). > Need to let any BIOS nasties happen and get over with before we restore device drivers. > This is consistent with the wording in ACPI 3.0b (section 7.4) that says > 11. _WAK is run > 12. OSPM notifies all native device drivefrs of the return from the sleep state transition > > However, commit 1a38416cea8ac801ae8f261074721f35317613dc says that > _WAK must follow INIT -- ie finish() must come after enable_nonboot_cpus(), > and this patch as it stands would violate that. > > So it looks like we need this sequence: > > enable_nonboot_cpus() /* INIT */ > finish() /* _WAK */ > device_resume() Which is a problem, because thaw_processes() is not SMP-safe. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-11-15 7:03 ` [linux-pm] " Len Brown 2006-11-15 9:32 ` Rafael J. Wysocki @ 2006-11-15 16:47 ` Linus Torvalds 2006-12-01 9:33 ` Pavel Machek 2006-12-01 1:48 ` Stephen Hemminger 2 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2006-11-15 16:47 UTC (permalink / raw) To: Len Brown Cc: linux-pm, Stephen Hemminger, Greg KH, Pavel Machek, Andrew Morton, linux-acpi, linux-pm On Wed, 15 Nov 2006, Len Brown wrote: > > So it looks like we need this sequence: > > enable_nonboot_cpus() /* INIT */ > finish() /* _WAK */ > device_resume() Can somebody remind me about this immediately after 2.6.19? No way am I going to make that kind of a major ordering change right now, especially as the thing isn't apparently even a real regression (just more fallout from enabling MSI). It would be good if people who are affected (and people in general that are interested in power management) would test out the patch. In particular, this is an even bigger change than the one suggested by Stephen. It means, for example, that we will have device resume (and process thawing) being called when we're in SMP mode - and that may or may not have issues. So this is a really scary patch to me, no way in hell will I apply it right now. But if people try it out, it would be good.. Btw, this is a clear example of where it might be good to start actively using the "early_resume" thing, and do PCI bus resume there. We might want to do early_resume _before_ calling the bios (I'm not at all convinced that the firmware will do the right thing without the PCI buses set up, for example), and _before_ thawing processes. Then, we might let individual devices do their "device resume" in the normal late resume phase. Greg already carries the patch around for that PCI bus early resume thing, I think. We need to test these things. Linus ---- diff --git a/kernel/power/main.c b/kernel/power/main.c index 873228c..2989609 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -132,12 +132,12 @@ int suspend_enter(suspend_state_t state) static void suspend_finish(suspend_state_t state) { - device_resume(); - resume_console(); - thaw_processes(); enable_nonboot_cpus(); if (pm_ops && pm_ops->finish) pm_ops->finish(state); + device_resume(); + resume_console(); + thaw_processes(); pm_restore_console(); } ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-11-15 16:47 ` Linus Torvalds @ 2006-12-01 9:33 ` Pavel Machek 2006-12-01 10:33 ` Rafael J. Wysocki 2006-12-01 16:12 ` Linus Torvalds 0 siblings, 2 replies; 15+ messages in thread From: Pavel Machek @ 2006-12-01 9:33 UTC (permalink / raw) To: Linus Torvalds Cc: Len Brown, linux-pm, Stephen Hemminger, Greg KH, Andrew Morton, linux-acpi, linux-pm, Rafael J. Wysocki Hi! > > So it looks like we need this sequence: > > > > enable_nonboot_cpus() /* INIT */ > > finish() /* _WAK */ > > device_resume() > > Can somebody remind me about this immediately after 2.6.19? Remind. But note that freezer is not yet SMP safe... Rafael is working on that. Pavel > No way am I going to make that kind of a major ordering change right now, > especially as the thing isn't apparently even a real regression (just more > fallout from enabling MSI). > > It would be good if people who are affected (and people in general that > are interested in power management) would test out the patch. > > In particular, this is an even bigger change than the one suggested by > Stephen. It means, for example, that we will have device resume (and > process thawing) being called when we're in SMP mode - and that may or may > not have issues. So this is a really scary patch to me, no way in hell > will I apply it right now. > > But if people try it out, it would be good.. > > Btw, this is a clear example of where it might be good to start actively > using the "early_resume" thing, and do PCI bus resume there. We might want > to do early_resume _before_ calling the bios (I'm not at all convinced > that the firmware will do the right thing without the PCI buses set up, > for example), and _before_ thawing processes. Then, we might let > individual devices do their "device resume" in the normal late resume > phase. > > Greg already carries the patch around for that PCI bus early resume thing, > I think. > > We need to test these things. > > Linus > > ---- > diff --git a/kernel/power/main.c b/kernel/power/main.c > index 873228c..2989609 100644 > --- a/kernel/power/main.c > +++ b/kernel/power/main.c > @@ -132,12 +132,12 @@ int suspend_enter(suspend_state_t state) > > static void suspend_finish(suspend_state_t state) > { > - device_resume(); > - resume_console(); > - thaw_processes(); > enable_nonboot_cpus(); > if (pm_ops && pm_ops->finish) > pm_ops->finish(state); > + device_resume(); > + resume_console(); > + thaw_processes(); > pm_restore_console(); > } > -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 9:33 ` Pavel Machek @ 2006-12-01 10:33 ` Rafael J. Wysocki 2006-12-01 10:57 ` Pavel Machek 2006-12-01 16:12 ` Linus Torvalds 1 sibling, 1 reply; 15+ messages in thread From: Rafael J. Wysocki @ 2006-12-01 10:33 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Len Brown, linux-pm, Stephen Hemminger, Greg KH, Andrew Morton, linux-acpi, linux-pm Hi, On Friday, 1 December 2006 10:33, Pavel Machek wrote: > Hi! > > > > So it looks like we need this sequence: > > > > > > enable_nonboot_cpus() /* INIT */ > > > finish() /* _WAK */ > > > device_resume() > > > > Can somebody remind me about this immediately after 2.6.19? > > Remind. But note that freezer is not yet SMP safe... Rafael is working > on that. Yup. BTW, have you looked at the last version of the patch for the handling of stopped tasks (appended just in case, full discussion at: http://lists.osdl.org/pipermail/linux-pm/2006-November/004214.html)? Rafael Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> --- kernel/power/process.c | 36 ++++++++++++++++++++++++++++++------ kernel/signal.c | 2 +- 2 files changed, 31 insertions(+), 7 deletions(-) Index: linux-2.6.19-rc6-mm2/kernel/power/process.c =================================================================== --- linux-2.6.19-rc6-mm2.orig/kernel/power/process.c +++ linux-2.6.19-rc6-mm2/kernel/power/process.c @@ -28,8 +28,7 @@ static inline int freezeable(struct task if ((p == current) || (p->flags & PF_NOFREEZE) || (p->exit_state == EXIT_ZOMBIE) || - (p->exit_state == EXIT_DEAD) || - (p->state == TASK_STOPPED)) + (p->exit_state == EXIT_DEAD)) return 0; return 1; } @@ -81,6 +80,11 @@ static void cancel_freezing(struct task_ } } +static inline int stopped_and_freezing(struct task_struct *p) +{ + return p->state == TASK_STOPPED && freezing(p); +} + static inline int is_user_space(struct task_struct *p) { return p->mm && !(p->flags & PF_BORROWED_MM); @@ -103,9 +107,11 @@ static unsigned int try_to_freeze_tasks( if (frozen(p)) continue; - if (p->state == TASK_TRACED && - (frozen(p->parent) || - p->parent->state == TASK_STOPPED)) { + if (stopped_and_freezing(p)) + continue; + + if (p->state == TASK_TRACED && (frozen(p->parent) || + stopped_and_freezing(p->parent))) { cancel_freezing(p); continue; } @@ -149,7 +155,8 @@ static unsigned int try_to_freeze_tasks( if (is_user_space(p) == !freeze_user_space) continue; - if (freezeable(p) && !frozen(p)) + if (freezeable(p) && !frozen(p) && + p->state != TASK_STOPPED && p->state != TASK_TRACED) printk(KERN_ERR " %s\n", p->comm); cancel_freezing(p); @@ -185,6 +192,18 @@ int freeze_processes(void) return 0; } +static void release_stopped_tasks(void) +{ + struct task_struct *g, *p; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + if (stopped_and_freezing(p)) + cancel_freezing(p); + } while_each_thread(g, p); + read_unlock(&tasklist_lock); +} + static void thaw_tasks(int thaw_user_space) { struct task_struct *g, *p; @@ -197,6 +216,10 @@ static void thaw_tasks(int thaw_user_spa if (is_user_space(p) == !thaw_user_space) continue; + if (!frozen(p) && + (p->state == TASK_STOPPED || p->state == TASK_TRACED)) + continue; + if (!thaw_process(p)) printk(KERN_WARNING " Strange, %s not stopped\n", p->comm ); @@ -207,6 +230,7 @@ static void thaw_tasks(int thaw_user_spa void thaw_processes(void) { printk("Restarting tasks ... "); + release_stopped_tasks(); thaw_tasks(FREEZER_KERNEL_THREADS); thaw_tasks(FREEZER_USER_SPACE); schedule(); Index: linux-2.6.19-rc6-mm2/kernel/signal.c =================================================================== --- linux-2.6.19-rc6-mm2.orig/kernel/signal.c +++ linux-2.6.19-rc6-mm2/kernel/signal.c @@ -1937,9 +1937,9 @@ int get_signal_to_deliver(siginfo_t *inf sigset_t *mask = ¤t->blocked; int signr = 0; +relock: try_to_freeze(); -relock: spin_lock_irq(¤t->sighand->siglock); for (;;) { struct k_sigaction *ka; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 10:33 ` Rafael J. Wysocki @ 2006-12-01 10:57 ` Pavel Machek 2006-12-01 11:31 ` Rafael J. Wysocki 0 siblings, 1 reply; 15+ messages in thread From: Pavel Machek @ 2006-12-01 10:57 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Len Brown, linux-pm, Stephen Hemminger, Greg KH, Andrew Morton, linux-acpi, linux-pm Hi! > > > > So it looks like we need this sequence: > > > > > > > > enable_nonboot_cpus() /* INIT */ > > > > finish() /* _WAK */ > > > > device_resume() > > > > > > Can somebody remind me about this immediately after 2.6.19? > > > > Remind. But note that freezer is not yet SMP safe... Rafael is working > > on that. > > Yup. > > BTW, have you looked at the last version of the patch for the handling of > stopped tasks (appended just in case, full discussion at: > http://lists.osdl.org/pipermail/linux-pm/2006-November/004214.html)? Well, I took a look.. and decided I'd like to find the place in kernel where I can add try_to_freeze() and fix the TASK_STOPPED processes. I hope such place exists. Pavel > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> > --- > kernel/power/process.c | 36 ++++++++++++++++++++++++++++++------ > kernel/signal.c | 2 +- > 2 files changed, 31 insertions(+), 7 deletions(-) > > Index: linux-2.6.19-rc6-mm2/kernel/power/process.c > =================================================================== > --- linux-2.6.19-rc6-mm2.orig/kernel/power/process.c > +++ linux-2.6.19-rc6-mm2/kernel/power/process.c > @@ -28,8 +28,7 @@ static inline int freezeable(struct task > if ((p == current) || > (p->flags & PF_NOFREEZE) || > (p->exit_state == EXIT_ZOMBIE) || > - (p->exit_state == EXIT_DEAD) || > - (p->state == TASK_STOPPED)) > + (p->exit_state == EXIT_DEAD)) > return 0; > return 1; > } > @@ -81,6 +80,11 @@ static void cancel_freezing(struct task_ > } > } > > +static inline int stopped_and_freezing(struct task_struct *p) > +{ > + return p->state == TASK_STOPPED && freezing(p); > +} > + > static inline int is_user_space(struct task_struct *p) > { > return p->mm && !(p->flags & PF_BORROWED_MM); > @@ -103,9 +107,11 @@ static unsigned int try_to_freeze_tasks( > if (frozen(p)) > continue; > > - if (p->state == TASK_TRACED && > - (frozen(p->parent) || > - p->parent->state == TASK_STOPPED)) { > + if (stopped_and_freezing(p)) > + continue; > + > + if (p->state == TASK_TRACED && (frozen(p->parent) || > + stopped_and_freezing(p->parent))) { > cancel_freezing(p); > continue; > } > @@ -149,7 +155,8 @@ static unsigned int try_to_freeze_tasks( > if (is_user_space(p) == !freeze_user_space) > continue; > > - if (freezeable(p) && !frozen(p)) > + if (freezeable(p) && !frozen(p) && > + p->state != TASK_STOPPED && p->state != TASK_TRACED) > printk(KERN_ERR " %s\n", p->comm); > > cancel_freezing(p); > @@ -185,6 +192,18 @@ int freeze_processes(void) > return 0; > } > > +static void release_stopped_tasks(void) > +{ > + struct task_struct *g, *p; > + > + read_lock(&tasklist_lock); > + do_each_thread(g, p) { > + if (stopped_and_freezing(p)) > + cancel_freezing(p); > + } while_each_thread(g, p); > + read_unlock(&tasklist_lock); > +} > + > static void thaw_tasks(int thaw_user_space) > { > struct task_struct *g, *p; > @@ -197,6 +216,10 @@ static void thaw_tasks(int thaw_user_spa > if (is_user_space(p) == !thaw_user_space) > continue; > > + if (!frozen(p) && > + (p->state == TASK_STOPPED || p->state == TASK_TRACED)) > + continue; > + > if (!thaw_process(p)) > printk(KERN_WARNING " Strange, %s not stopped\n", > p->comm ); > @@ -207,6 +230,7 @@ static void thaw_tasks(int thaw_user_spa > void thaw_processes(void) > { > printk("Restarting tasks ... "); > + release_stopped_tasks(); > thaw_tasks(FREEZER_KERNEL_THREADS); > thaw_tasks(FREEZER_USER_SPACE); > schedule(); > Index: linux-2.6.19-rc6-mm2/kernel/signal.c > =================================================================== > --- linux-2.6.19-rc6-mm2.orig/kernel/signal.c > +++ linux-2.6.19-rc6-mm2/kernel/signal.c > @@ -1937,9 +1937,9 @@ int get_signal_to_deliver(siginfo_t *inf > sigset_t *mask = ¤t->blocked; > int signr = 0; > > +relock: > try_to_freeze(); > > -relock: > spin_lock_irq(¤t->sighand->siglock); > for (;;) { > struct k_sigaction *ka; -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 10:57 ` Pavel Machek @ 2006-12-01 11:31 ` Rafael J. Wysocki 0 siblings, 0 replies; 15+ messages in thread From: Rafael J. Wysocki @ 2006-12-01 11:31 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Len Brown, linux-pm, Stephen Hemminger, Greg KH, Andrew Morton, linux-acpi, linux-pm Hi, On Friday, 1 December 2006 11:57, Pavel Machek wrote: > Hi! > > > > > > So it looks like we need this sequence: > > > > > > > > > > enable_nonboot_cpus() /* INIT */ > > > > > finish() /* _WAK */ > > > > > device_resume() > > > > > > > > Can somebody remind me about this immediately after 2.6.19? > > > > > > Remind. But note that freezer is not yet SMP safe... Rafael is working > > > on that. > > > > Yup. > > > > BTW, have you looked at the last version of the patch for the handling of > > stopped tasks (appended just in case, full discussion at: > > http://lists.osdl.org/pipermail/linux-pm/2006-November/004214.html)? > > Well, I took a look.. and decided I'd like to find the place in kernel > where I can add try_to_freeze() and fix the TASK_STOPPED processes. I > hope such place exists. Well, try_to_freeze() will only work for a process that has PF_FREEZE set, no? So, if you want try_to_freeze() to work for a stopped process, you have to set PF_FREEZE for this process, but this means that stopped processes cannot be treated as unfreezeable ... > > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> > > --- > > kernel/power/process.c | 36 ++++++++++++++++++++++++++++++------ > > kernel/signal.c | 2 +- > > 2 files changed, 31 insertions(+), 7 deletions(-) > > > > Index: linux-2.6.19-rc6-mm2/kernel/power/process.c > > =================================================================== > > --- linux-2.6.19-rc6-mm2.orig/kernel/power/process.c > > +++ linux-2.6.19-rc6-mm2/kernel/power/process.c > > @@ -28,8 +28,7 @@ static inline int freezeable(struct task > > if ((p == current) || > > (p->flags & PF_NOFREEZE) || > > (p->exit_state == EXIT_ZOMBIE) || > > - (p->exit_state == EXIT_DEAD) || > > - (p->state == TASK_STOPPED)) > > + (p->exit_state == EXIT_DEAD)) ... so this change is _necessary_ anyway. Now if we don't treat stopped processes as unfreezeable, it doesn't make sense to set PF_FREEZE for them more than once. This it's reasonable to check if we have already set PF_FREEZE for given stopped task and skip it if so ... > > return 0; > > return 1; > > } > > @@ -81,6 +80,11 @@ static void cancel_freezing(struct task_ > > } > > } > > > > +static inline int stopped_and_freezing(struct task_struct *p) > > +{ > > + return p->state == TASK_STOPPED && freezing(p); > > +} > > + > > static inline int is_user_space(struct task_struct *p) > > { > > return p->mm && !(p->flags & PF_BORROWED_MM); > > @@ -103,9 +107,11 @@ static unsigned int try_to_freeze_tasks( > > if (frozen(p)) > > continue; > > > > - if (p->state == TASK_TRACED && > > - (frozen(p->parent) || > > - p->parent->state == TASK_STOPPED)) { > > + if (stopped_and_freezing(p)) > > + continue; ... which is done here. Now the condition for the freezing of traced tasks should be modified to account for the fact that we no longer regard stopped tasks as unfreezeable ... > > + > > + if (p->state == TASK_TRACED && (frozen(p->parent) || > > + stopped_and_freezing(p->parent))) { ... so we need this change too. > > cancel_freezing(p); > > continue; > > } > > @@ -149,7 +155,8 @@ static unsigned int try_to_freeze_tasks( > > if (is_user_space(p) == !freeze_user_space) > > continue; > > > > - if (freezeable(p) && !frozen(p)) > > + if (freezeable(p) && !frozen(p) && > > + p->state != TASK_STOPPED && p->state != TASK_TRACED) > > printk(KERN_ERR " %s\n", p->comm); Now if the freezing of tasks fails, we don't want to confuse the user by reporting that some stopped or traced tasks weren't frozen, so it's reasonable to do the above, but it's really optional. > > > > cancel_freezing(p); > > @@ -185,6 +192,18 @@ int freeze_processes(void) > > return 0; > > } > > > > +static void release_stopped_tasks(void) > > +{ > > + struct task_struct *g, *p; > > + > > + read_lock(&tasklist_lock); > > + do_each_thread(g, p) { > > + if (stopped_and_freezing(p)) > > + cancel_freezing(p); > > + } while_each_thread(g, p); > > + read_unlock(&tasklist_lock); > > +} > > + > > static void thaw_tasks(int thaw_user_space) > > { > > struct task_struct *g, *p; > > @@ -197,6 +216,10 @@ static void thaw_tasks(int thaw_user_spa > > if (is_user_space(p) == !thaw_user_space) > > continue; > > > > + if (!frozen(p) && > > + (p->state == TASK_STOPPED || p->state == TASK_TRACED)) > > + continue; > > + Again, we shouldn't be confusing the user by reporting that some stopped or traced tasks didn't stop, because that's likely a normal thing ... > > if (!thaw_process(p)) > > printk(KERN_WARNING " Strange, %s not stopped\n", > > p->comm ); > > @@ -207,6 +230,7 @@ static void thaw_tasks(int thaw_user_spa > > void thaw_processes(void) > > { > > printk("Restarting tasks ... "); > > + release_stopped_tasks(); ... but we need this in case there are some stopped tasks for which we set PF_FREEZE during the suspend and they haven't entered the refrigerator (eg. because they haven't received the continuation signal in the meantime). This also is a consequence of the fact that we don't regard the stopped tasks as unfreezeable. > > thaw_tasks(FREEZER_KERNEL_THREADS); > > thaw_tasks(FREEZER_USER_SPACE); > > schedule(); > > Index: linux-2.6.19-rc6-mm2/kernel/signal.c > > =================================================================== > > --- linux-2.6.19-rc6-mm2.orig/kernel/signal.c > > +++ linux-2.6.19-rc6-mm2/kernel/signal.c > > @@ -1937,9 +1937,9 @@ int get_signal_to_deliver(siginfo_t *inf > > sigset_t *mask = ¤t->blocked; > > int signr = 0; > > > > +relock: > > try_to_freeze(); > > > > -relock: > > spin_lock_irq(¤t->sighand->siglock); > > for (;;) { > > struct k_sigaction *ka; And this makes stopped tasks that have just received the continuation signal to immediately check if they should go to the refrigerator. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 9:33 ` Pavel Machek 2006-12-01 10:33 ` Rafael J. Wysocki @ 2006-12-01 16:12 ` Linus Torvalds 2006-12-01 17:45 ` Alexey Starikovskiy 1 sibling, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2006-12-01 16:12 UTC (permalink / raw) To: Pavel Machek Cc: Len Brown, linux-pm, Stephen Hemminger, Greg KH, Andrew Morton, linux-acpi, linux-pm, Rafael J. Wysocki On Fri, 1 Dec 2006, Pavel Machek wrote: > > > So it looks like we need this sequence: > > > > > > enable_nonboot_cpus() /* INIT */ > > > finish() /* _WAK */ > > > device_resume() > > > > Can somebody remind me about this immediately after 2.6.19? > > Remind. But note that freezer is not yet SMP safe... Rafael is working > on that. Thanks. On the other hand, I really wonder (and suspect) whether the problem isn't really the freezer or even the kernel resume ordering, but simply an ACPI internal resume ordering thing. Doesn't ACPI have per-device "WAK" calls anyway? Shouldn't we just call those _individually_ as we walk the device tree (perhaps in the "early_resume" stage) rather than calling them all in one chunk? Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 16:12 ` Linus Torvalds @ 2006-12-01 17:45 ` Alexey Starikovskiy 2006-12-01 18:40 ` Stephen Hemminger 0 siblings, 1 reply; 15+ messages in thread From: Alexey Starikovskiy @ 2006-12-01 17:45 UTC (permalink / raw) To: Linus Torvalds Cc: Pavel Machek, Andrew Morton, linux-acpi, linux-pm, linux-pm, Stephen Hemminger Linus Torvalds wrote: > On Fri, 1 Dec 2006, Pavel Machek wrote: > > >>>> So it looks like we need this sequence: >>>> >>>> enable_nonboot_cpus() /* INIT */ >>>> finish() /* _WAK */ >>>> device_resume() >>>> >>> Can somebody remind me about this immediately after 2.6.19? >>> >> Remind. But note that freezer is not yet SMP safe... Rafael is working >> on that. >> > > Thanks. > > On the other hand, I really wonder (and suspect) whether the problem isn't > really the freezer or even the kernel resume ordering, but simply an ACPI > internal resume ordering thing. > > Doesn't ACPI have per-device "WAK" calls anyway? Shouldn't we just call > those _individually_ as we walk the device tree (perhaps in the > "early_resume" stage) rather than calling them all in one chunk? > > Linus > _WAK method is system-wide. Individual objects do not have their own resume methods. One way of reordering internal ACPI resume is done in patch series to 7122, I mentioned that earlier. It's possible to resume ACPI devices after execution of _WAK in pm->finish. Alex. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 17:45 ` Alexey Starikovskiy @ 2006-12-01 18:40 ` Stephen Hemminger 2006-12-01 18:42 ` Alexey Starikovskiy 0 siblings, 1 reply; 15+ messages in thread From: Stephen Hemminger @ 2006-12-01 18:40 UTC (permalink / raw) To: Alexey Starikovskiy Cc: Linus Torvalds, Pavel Machek, Andrew Morton, linux-acpi, linux-pm, linux-pm On Fri, 01 Dec 2006 20:45:51 +0300 Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> wrote: > Linus Torvalds wrote: > > On Fri, 1 Dec 2006, Pavel Machek wrote: > > > > > >>>> So it looks like we need this sequence: > >>>> > >>>> enable_nonboot_cpus() /* INIT */ > >>>> finish() /* _WAK */ > >>>> device_resume() > >>>> > >>> Can somebody remind me about this immediately after 2.6.19? > >>> > >> Remind. But note that freezer is not yet SMP safe... Rafael is working > >> on that. > >> > > > > Thanks. > > > > On the other hand, I really wonder (and suspect) whether the problem isn't > > really the freezer or even the kernel resume ordering, but simply an ACPI > > internal resume ordering thing. > > > > Doesn't ACPI have per-device "WAK" calls anyway? Shouldn't we just call > > those _individually_ as we walk the device tree (perhaps in the > > "early_resume" stage) rather than calling them all in one chunk? > > > > Linus > > > _WAK method is system-wide. Individual objects do not have their own > resume methods. > One way of reordering internal ACPI resume is done in patch series to > 7122, I mentioned that earlier. > It's possible to resume ACPI devices after execution of _WAK in pm->finish. Does it solve the original problem where finish() was getting run after device_resume(), and finish() was corrupting PCI register settings like MSI? -- Stephen Hemminger <shemminger@osdl.org> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 18:40 ` Stephen Hemminger @ 2006-12-01 18:42 ` Alexey Starikovskiy 0 siblings, 0 replies; 15+ messages in thread From: Alexey Starikovskiy @ 2006-12-01 18:42 UTC (permalink / raw) To: Stephen Hemminger Cc: Linus Torvalds, Pavel Machek, Andrew Morton, linux-acpi, linux-pm Stephen Hemminger wrote: > On Fri, 01 Dec 2006 20:45:51 +0300 > Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com> wrote: > > >> Linus Torvalds wrote: >> >>> On Fri, 1 Dec 2006, Pavel Machek wrote: >>> >>> >>> >>>>>> So it looks like we need this sequence: >>>>>> >>>>>> enable_nonboot_cpus() /* INIT */ >>>>>> finish() /* _WAK */ >>>>>> device_resume() >>>>>> >>>>>> >>>>> Can somebody remind me about this immediately after 2.6.19? >>>>> >>>>> >>>> Remind. But note that freezer is not yet SMP safe... Rafael is working >>>> on that. >>>> >>>> >>> Thanks. >>> >>> On the other hand, I really wonder (and suspect) whether the problem isn't >>> really the freezer or even the kernel resume ordering, but simply an ACPI >>> internal resume ordering thing. >>> >>> Doesn't ACPI have per-device "WAK" calls anyway? Shouldn't we just call >>> those _individually_ as we walk the device tree (perhaps in the >>> "early_resume" stage) rather than calling them all in one chunk? >>> >>> Linus >>> >>> >> _WAK method is system-wide. Individual objects do not have their own >> resume methods. >> One way of reordering internal ACPI resume is done in patch series to >> 7122, I mentioned that earlier. >> It's possible to resume ACPI devices after execution of _WAK in pm->finish. >> > > Does it solve the original problem where finish() was getting run after > device_resume(), and finish() was corrupting PCI register settings like > MSI? > > No, only the case with ACPI devices... all other devices are untouched. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-11-15 7:03 ` [linux-pm] " Len Brown 2006-11-15 9:32 ` Rafael J. Wysocki 2006-11-15 16:47 ` Linus Torvalds @ 2006-12-01 1:48 ` Stephen Hemminger 2006-12-01 10:25 ` Rafael J. Wysocki 2 siblings, 1 reply; 15+ messages in thread From: Stephen Hemminger @ 2006-12-01 1:48 UTC (permalink / raw) To: Len Brown Cc: len.brown, linux-pm, Greg KH, Pavel Machek, Andrew Morton, linux-acpi, Linus Torvalds, linux-pm On Wed, 15 Nov 2006 02:03:30 -0500 Len Brown <len.brown@intel.com> wrote: > On Tuesday 14 November 2006 18:30, Stephen Hemminger wrote: > > If I do a suspend-to-ram then resume on a Sony Vaio laptop with sky2 driver, > > the first interrupt gets misrouted to the original shared IRQ, rather than > > to the MSI irq expected. > > > > During the pci_restore process, the MSI information and the PCI command register > > are restored properly. But later during resume, inside the ACPI evaluation of > > the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared. > > My guess is that the BIOS ends up doing some resetting of devices. > > > > I may be able to workaround the problem for this one device, but it brings up > > a more general issue about what the ordering should be during resume. If ACPI > > evaluation (which I assume talks to the BIOS), might change device state, it > > seems that ACPI code should execute before resuming devices not after. But changing > > the order here seems drastic. > > > > An alternate solution would be to have two pm_ops, one for early_resume > > and another for late, and split the ACPI work. > > > > --- 2.6.19-rc5.orig/kernel/power/main.c 2006-11-14 14:24:37.000000000 -0800 > > +++ 2.6.19-rc5/kernel/power/main.c 2006-11-14 14:25:23.000000000 -0800 > > @@ -132,12 +132,12 @@ > > > > static void suspend_finish(suspend_state_t state) > > { > > + if (pm_ops && pm_ops->finish) > > + pm_ops->finish(state); > > device_resume(); > > resume_console(); > > thaw_processes(); > > enable_nonboot_cpus(); > > - if (pm_ops && pm_ops->finish) > > - pm_ops->finish(state); > > pm_restore_console(); > > } > > Yes, I agree that _WAK needs to come before device_resume(). > Need to let any BIOS nasties happen and get over with before we restore device drivers. > This is consistent with the wording in ACPI 3.0b (section 7.4) that says > 11. _WAK is run > 12. OSPM notifies all native device drivefrs of the return from the sleep state transition > > However, commit 1a38416cea8ac801ae8f261074721f35317613dc says that > _WAK must follow INIT -- ie finish() must come after enable_nonboot_cpus(), > and this patch as it stands would violate that. > > So it looks like we need this sequence: > > enable_nonboot_cpus() /* INIT */ > finish() /* _WAK */ > device_resume() > Do you want to do this, or shall I? send off a patch. I can test on about 5 machines first. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-pm] [RFC] ACPI vs device ordering on resume 2006-12-01 1:48 ` Stephen Hemminger @ 2006-12-01 10:25 ` Rafael J. Wysocki 0 siblings, 0 replies; 15+ messages in thread From: Rafael J. Wysocki @ 2006-12-01 10:25 UTC (permalink / raw) To: Stephen Hemminger Cc: Len Brown, len.brown, linux-pm, Greg KH, Pavel Machek, Andrew Morton, linux-acpi, Linus Torvalds, linux-pm On Friday, 1 December 2006 02:48, Stephen Hemminger wrote: > On Wed, 15 Nov 2006 02:03:30 -0500 > Len Brown <len.brown@intel.com> wrote: > > > On Tuesday 14 November 2006 18:30, Stephen Hemminger wrote: > > > If I do a suspend-to-ram then resume on a Sony Vaio laptop with sky2 driver, > > > the first interrupt gets misrouted to the original shared IRQ, rather than > > > to the MSI irq expected. > > > > > > During the pci_restore process, the MSI information and the PCI command register > > > are restored properly. But later during resume, inside the ACPI evaluation of > > > the WAK method, the PCI_COMMAND INTX_DISABLE (0x400) flag is being cleared. > > > My guess is that the BIOS ends up doing some resetting of devices. > > > > > > I may be able to workaround the problem for this one device, but it brings up > > > a more general issue about what the ordering should be during resume. If ACPI > > > evaluation (which I assume talks to the BIOS), might change device state, it > > > seems that ACPI code should execute before resuming devices not after. But changing > > > the order here seems drastic. > > > > > > An alternate solution would be to have two pm_ops, one for early_resume > > > and another for late, and split the ACPI work. > > > > > > --- 2.6.19-rc5.orig/kernel/power/main.c 2006-11-14 14:24:37.000000000 -0800 > > > +++ 2.6.19-rc5/kernel/power/main.c 2006-11-14 14:25:23.000000000 -0800 > > > @@ -132,12 +132,12 @@ > > > > > > static void suspend_finish(suspend_state_t state) > > > { > > > + if (pm_ops && pm_ops->finish) > > > + pm_ops->finish(state); > > > device_resume(); > > > resume_console(); > > > thaw_processes(); > > > enable_nonboot_cpus(); > > > - if (pm_ops && pm_ops->finish) > > > - pm_ops->finish(state); > > > pm_restore_console(); > > > } > > > > Yes, I agree that _WAK needs to come before device_resume(). > > Need to let any BIOS nasties happen and get over with before we restore device drivers. > > This is consistent with the wording in ACPI 3.0b (section 7.4) that says > > 11. _WAK is run > > 12. OSPM notifies all native device drivefrs of the return from the sleep state transition > > > > However, commit 1a38416cea8ac801ae8f261074721f35317613dc says that > > _WAK must follow INIT -- ie finish() must come after enable_nonboot_cpus(), > > and this patch as it stands would violate that. > > > > So it looks like we need this sequence: > > > > enable_nonboot_cpus() /* INIT */ > > finish() /* _WAK */ > > device_resume() > > > > Do you want to do this, or shall I? send off a patch. > I can test on about 5 machines first. Could we please wait with that a bit until we have the freezer fixed? Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2006-12-01 18:42 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-11-14 23:30 [RFC] ACPI vs device ordering on resume Stephen Hemminger 2006-11-14 23:59 ` Linus Torvalds 2006-11-15 7:03 ` [linux-pm] " Len Brown 2006-11-15 9:32 ` Rafael J. Wysocki 2006-11-15 16:47 ` Linus Torvalds 2006-12-01 9:33 ` Pavel Machek 2006-12-01 10:33 ` Rafael J. Wysocki 2006-12-01 10:57 ` Pavel Machek 2006-12-01 11:31 ` Rafael J. Wysocki 2006-12-01 16:12 ` Linus Torvalds 2006-12-01 17:45 ` Alexey Starikovskiy 2006-12-01 18:40 ` Stephen Hemminger 2006-12-01 18:42 ` Alexey Starikovskiy 2006-12-01 1:48 ` Stephen Hemminger 2006-12-01 10:25 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox