* peculiar suspend/resume bug. @ 2006-08-15 22:10 Dave Jones 2006-08-16 0:19 ` Nigel Cunningham 2006-08-16 22:06 ` Pavel Machek 0 siblings, 2 replies; 12+ messages in thread From: Dave Jones @ 2006-08-15 22:10 UTC (permalink / raw) To: Linux Kernel Here's a fun one. - Get a dual core cpufreq aware laptop (Like say, a core-duo) - Add a cpufreq monitor to gnome-panel. Configure it to watch the 2nd core. - Suspend. - Resume. Watch the cpufreq monitor die horribly. I believe this is because we take down the 2nd core at suspend time with cpu hotplug, and for some reason we're scheduling userspace before we bring that second core back up. Anyone have any clues why this is happening? Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-15 22:10 peculiar suspend/resume bug Dave Jones @ 2006-08-16 0:19 ` Nigel Cunningham 2006-08-16 0:37 ` Dave Jones 2006-08-16 22:06 ` Pavel Machek 1 sibling, 1 reply; 12+ messages in thread From: Nigel Cunningham @ 2006-08-16 0:19 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel Hi Dave. On Tue, 2006-08-15 at 18:10 -0400, Dave Jones wrote: > Here's a fun one. > - Get a dual core cpufreq aware laptop (Like say, a core-duo) > - Add a cpufreq monitor to gnome-panel. Configure it > to watch the 2nd core. > - Suspend. > - Resume. > > Watch the cpufreq monitor die horribly. > > I believe this is because we take down the 2nd core at suspend > time with cpu hotplug, and for some reason we're scheduling > userspace before we bring that second core back up. > > Anyone have any clues why this is happening? If you hotunplug and replug the cpu using the sysfs interface, rather than suspending and resuming, does the same thing happen? Regards, Nigel ^ permalink raw reply [flat|nested] 12+ messages in thread
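For readers who want to try the test Nigel suggests, here is a minimal sketch of offlining and re-onlining a CPU through sysfs. The `online` control file under `/sys/devices/system/cpu/cpuN/` is the standard Linux interface; the function names and the `root` parameter are hypothetical, added only so the sketch can be pointed at a scratch directory rather than a live system. Writing to the real file needs root privileges.

```python
from pathlib import Path

# Real sysfs location on Linux. It is a parameter below so the sketch can
# be exercised against a scratch directory instead of a live system.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> None:
    """Write 1 or 0 to cpuN/online (requires root on a real system)."""
    (root / f"cpu{cpu}" / "online").write_text("1\n" if online else "0\n")


def is_cpu_online(cpu: int, root: Path = SYSFS_CPU_ROOT) -> bool:
    """Read cpuN/online back.

    A missing control file is treated as "online", on the assumption that
    a CPU which cannot be unplugged has no such file.
    """
    try:
        return (root / f"cpu{cpu}" / "online").read_text().strip() == "1"
    except FileNotFoundError:
        return True


def cycle_cpu(cpu: int, root: Path = SYSFS_CPU_ROOT) -> None:
    """Take the CPU down, then bring it back up, without suspending."""
    set_cpu_online(cpu, False, root)
    set_cpu_online(cpu, True, root)
```

Calling `cycle_cpu(1)` as root while the applet is watching the second core should show whether the crash depends on suspend at all or only on the hotplug event.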
* Re: peculiar suspend/resume bug. 2006-08-16 0:19 ` Nigel Cunningham @ 2006-08-16 0:37 ` Dave Jones 2006-08-16 1:05 ` Nigel Cunningham 2006-08-16 2:41 ` Matthew Garrett 0 siblings, 2 replies; 12+ messages in thread From: Dave Jones @ 2006-08-16 0:37 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Linux Kernel On Wed, Aug 16, 2006 at 10:19:59AM +1000, Nigel Cunningham wrote: > Hi Dave. > > On Tue, 2006-08-15 at 18:10 -0400, Dave Jones wrote: > > Here's a fun one. > > - Get a dual core cpufreq aware laptop (Like say, a core-duo) > > - Add a cpufreq monitor to gnome-panel. Configure it > > to watch the 2nd core. > > - Suspend. > > - Resume. > > > > Watch the cpufreq monitor die horribly. > > > > I believe this is because we take down the 2nd core at suspend > > time with cpu hotplug, and for some reason we're scheduling > > userspace before we bring that second core back up. > > > > Anyone have any clues why this is happening? > > If you hotunplug and replug the cpu using the sysfs interface, rather > than suspending and resuming, does the same thing happen? cpufreq-applet crashes as soon as the cpu goes offline. Now, the applet should be written to deal with this scenario more gracefully, but I'm questioning whether or not userspace should *see* the unplug/replug that suspend does at all. IMO, we shouldn't schedule userspace until the system is in the exact state it was before we suspended. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 12+ messages in thread
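As an illustration of what "deal with this scenario more gracefully" could mean on the userspace side, the following is a rough sketch of a frequency reader that tolerates the CPU going away. It is not the cpufreq-applet's code. It assumes the monitor polls the standard `cpufreq/scaling_cur_freq` sysfs file and that this file disappears or becomes unreadable while the CPU is offline; the function names and the `root` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Real sysfs location on Linux; overridable so the sketch can be tested
# against a scratch directory.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_cur_freq_khz(cpu: int, root: Path = SYSFS_CPU_ROOT) -> Optional[int]:
    """Return the current frequency in kHz, or None if it cannot be read."""
    path = root / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # OSError: the file is missing or unreadable, e.g. the CPU is offline.
        # ValueError: the file did not contain a plain number.
        return None


def format_reading(cpu: int, root: Path = SYSFS_CPU_ROOT) -> str:
    """Render one line for the panel instead of crashing on a missing CPU."""
    khz = read_cur_freq_khz(cpu, root)
    if khz is None:
        return f"cpu{cpu}: unavailable (offline?)"
    return f"cpu{cpu}: {khz / 1000:.0f} MHz"
```

The point of returning `None` rather than raising is that a poller can simply keep polling: once the core comes back after resume, the next read succeeds and the display recovers on its own.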
* Re: peculiar suspend/resume bug. 2006-08-16 0:37 ` Dave Jones @ 2006-08-16 1:05 ` Nigel Cunningham 2006-08-16 2:41 ` Matthew Garrett 1 sibling, 0 replies; 12+ messages in thread From: Nigel Cunningham @ 2006-08-16 1:05 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel Hi Dave. On Tue, 2006-08-15 at 20:37 -0400, Dave Jones wrote: > On Wed, Aug 16, 2006 at 10:19:59AM +1000, Nigel Cunningham wrote: > > Hi Dave. > > > > On Tue, 2006-08-15 at 18:10 -0400, Dave Jones wrote: > > > Here's a fun one. > > > - Get a dual core cpufreq aware laptop (Like say, a core-duo) > > > - Add a cpufreq monitor to gnome-panel. Configure it > > > to watch the 2nd core. > > > - Suspend. > > > - Resume. > > > > > > Watch the cpufreq monitor die horribly. > > > > > > I believe this is because we take down the 2nd core at suspend > > > time with cpu hotplug, and for some reason we're scheduling > > > userspace before we bring that second core back up. > > > > > > Anyone have any clues why this is happening? > > > > If you hotunplug and replug the cpu using the sysfs interface, rather > > than suspending and resuming, does the same thing happen? > > cpufreq-applet crashes as soon as the cpu goes offline. > Now, the applet should be written to deal with this scenario more > gracefully, but I'm questioning whether or not userspace should > *see* the unplug/replug that suspend does at all. > > IMO, when we shouldn't schedule userspace until the system is > in the exact state it was before we suspended. At the moment, the cpu hotplugging/unplugging is done outside of freezing processes because once we've frozen processes we can't (afaik) move ones that are tied to the cpu being unplugged to another processor, and won't also be able to kill kernel threads that are tied to the processor(s) being taken down. 
Personally, I wouldn't mind seeing this addressed, as I see a few other benefits to being able to hot[un]plug later, besides simplifying life for the cpufreq-applet (although it shouldn't crash if a cpu is offlined anyway). Regards, Nigel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-16 0:37 ` Dave Jones 2006-08-16 1:05 ` Nigel Cunningham @ 2006-08-16 2:41 ` Matthew Garrett 2006-08-16 3:53 ` Dave Jones 2006-08-17 1:44 ` Nigel Cunningham 1 sibling, 2 replies; 12+ messages in thread From: Matthew Garrett @ 2006-08-16 2:41 UTC (permalink / raw) To: Dave Jones, Nigel Cunningham, Linux Kernel On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > cpufreq-applet crashes as soon as the cpu goes offline. > Now, the applet should be written to deal with this scenario more > gracefully, but I'm questioning whether or not userspace should > *see* the unplug/replug that suspend does at all. As Nigel mentioned, cpu unplug happens just before processes are frozen, so I guess there's a chance for it to be scheduled. On the other hand, it's not unreasonable for CPUs to be unplugged during runtime anyway - perhaps userspace should be able to deal with that? -- Matthew Garrett | mjg59@srcf.ucam.org ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-16 2:41 ` Matthew Garrett @ 2006-08-16 3:53 ` Dave Jones 2006-08-16 8:54 ` Rafael J. Wysocki 2006-08-17 1:44 ` Nigel Cunningham 1 sibling, 1 reply; 12+ messages in thread From: Dave Jones @ 2006-08-16 3:53 UTC (permalink / raw) To: Matthew Garrett; +Cc: Nigel Cunningham, Linux Kernel On Wed, Aug 16, 2006 at 03:41:40AM +0100, Matthew Garrett wrote: > On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > > > cpufreq-applet crashes as soon as the cpu goes offline. > > Now, the applet should be written to deal with this scenario more > > gracefully, but I'm questioning whether or not userspace should > > *see* the unplug/replug that suspend does at all. > > As Nigel mentioned, cpu unplug happens just before processes are frozen, > so I guess there's a chance for it to be scheduled. On the other hand, > it's not unreasonable for CPUs to be unplugged during runtime anyway - > perhaps userspace should be able to deal with that? Sure, I'm not debating that point. It's a bug in the applet that needs fixing, but it also seems that we could be saving a whole lot of pain by hiding this from userspace at suspend/resume time. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-16 3:53 ` Dave Jones @ 2006-08-16 8:54 ` Rafael J. Wysocki 0 siblings, 0 replies; 12+ messages in thread From: Rafael J. Wysocki @ 2006-08-16 8:54 UTC (permalink / raw) To: Dave Jones; +Cc: Matthew Garrett, Nigel Cunningham, Linux Kernel Hi, On Wednesday 16 August 2006 05:53, Dave Jones wrote: > On Wed, Aug 16, 2006 at 03:41:40AM +0100, Matthew Garrett wrote: > > On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > > > > > cpufreq-applet crashes as soon as the cpu goes offline. > > > Now, the applet should be written to deal with this scenario more > > > gracefully, but I'm questioning whether or not userspace should > > > *see* the unplug/replug that suspend does at all. > > > > As Nigel mentioned, cpu unplug happens just before processes are frozen, > > so I guess there's a chance for it to be scheduled. On the other hand, > > it's not unreasonable for CPUs to be unplugged during runtime anyway - > > perhaps userspace should be able to deal with that? > > Sure, I'm not debating that point. It's a bug in the applet that needs fixing, > but it also seems that we could be saving a whole lot of pain by > hiding this from userspace at suspend/resume time. Yes, that's the plan, but for now the freezer is not SMP-friendly, so to speak, and we have some work to do to make it possible. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-16 2:41 ` Matthew Garrett 2006-08-16 3:53 ` Dave Jones @ 2006-08-17 1:44 ` Nigel Cunningham 2006-08-17 5:44 ` Rafael J. Wysocki 1 sibling, 1 reply; 12+ messages in thread From: Nigel Cunningham @ 2006-08-17 1:44 UTC (permalink / raw) To: Matthew Garrett; +Cc: Dave Jones, Linux Kernel Hi. On Wed, 2006-08-16 at 03:41 +0100, Matthew Garrett wrote: > On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > > > cpufreq-applet crashes as soon as the cpu goes offline. > > Now, the applet should be written to deal with this scenario more > > gracefully, but I'm questioning whether or not userspace should > > *see* the unplug/replug that suspend does at all. > > As Nigel mentioned, cpu unplug happens just before processes are frozen, > so I guess there's a chance for it to be scheduled. On the other hand, > it's not unreasonable for CPUs to be unplugged during runtime anyway - > perhaps userspace should be able to deal with that? Agreed. I've spent a little more time thinking about this, and want to put a few thoughts forward for discussion/ignoring/flame bait/whatever. I see two main issues at the moment with freezing before hotplugging. The first is that we have cpu specific kernel threads that we're going to want to kill, and the second is that we have userspace threads that we want to migrate to another cpu. Have I missed anything? The first issue could be helped by splitting the freezing of userspace processes from kernel space. The kernel threads could thus die without us having to worry about userspace seeing what's going on. I haven't looked at vanilla in a while; this might already be in. Alternatively, if it's viable, per-cpu kernel threads could perhaps be made NO_FREEZE. The second issue is migrating userspace threads. I'm no scheduling expert, so I'll just speculate :>. 
I wondered if it's possible to make the migration happen lazily; in such a way that if, when we come to thaw userspace, the cpu has been hotplugged again, the migration never happens. Does that sound possible? Regards, Nigel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-17 1:44 ` Nigel Cunningham @ 2006-08-17 5:44 ` Rafael J. Wysocki 2006-08-17 5:55 ` Nigel Cunningham 0 siblings, 1 reply; 12+ messages in thread From: Rafael J. Wysocki @ 2006-08-17 5:44 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Matthew Garrett, Dave Jones, Linux Kernel On Thursday 17 August 2006 03:44, Nigel Cunningham wrote: > Hi. > > On Wed, 2006-08-16 at 03:41 +0100, Matthew Garrett wrote: > > On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > > > > > cpufreq-applet crashes as soon as the cpu goes offline. > > > Now, the applet should be written to deal with this scenario more > > > gracefully, but I'm questioning whether or not userspace should > > > *see* the unplug/replug that suspend does at all. > > > > As Nigel mentioned, cpu unplug happens just before processes are frozen, > > so I guess there's a chance for it to be scheduled. On the other hand, > > it's not unreasonable for CPUs to be unplugged during runtime anyway - > > perhaps userspace should be able to deal with that? > > Agreed. > > I've spent a little more time thinking about this, and want to put a few > thoughts forward for discussion/ignoring/flame bait/whatever. > > I see two main issues at the moment with freezing before hotplugging. > The first is that we have cpu specific kernel threads that we're going > to want to kill, and the second is that we have userspace threads that > we want to migrate to another cpu. Have I missed anything? I have bad memories from the time we were not using the CPU-hotplug and tried to freeze tasks with all CPUs on-line. There were some very subtle race conditions appearing between the freezer and the running tasks which were a nightmare to figure out. I'm not sure that they will appear now, but something tells me so. :-) > The first issue could be helped by splitting the freezing of userspace > processes from kernel space. 
The kernel threads could thus die without > us having to worry about userspace seeing what's going on. I haven't > looked at vanilla in a while; this might already be in. Yes, it is. > Alternatively, if it's viable, per-cpu kernel threads could perhaps be made > NO_FREEZE. > > The second issue is migrating userspace threads. I'm no scheduling > expert, so I'll just speculate :>. I wondered if it's possible to make > the migration happen lazily; in such a way that if, when we come to thaw > userspace, the cpu has been hotplugged again, the migration never > happens. Does that sound possible? The CPU hotplug makes the tasks migrate automatically, but that's not a problem, as I see it. The problem is that some tasks may have specific CPU affinities set, and these should not change across suspend/resume. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-17 5:44 ` Rafael J. Wysocki @ 2006-08-17 5:55 ` Nigel Cunningham 2006-08-17 6:30 ` Rafael J. Wysocki 0 siblings, 1 reply; 12+ messages in thread From: Nigel Cunningham @ 2006-08-17 5:55 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Matthew Garrett, Dave Jones, Linux Kernel Hi. Thanks for the reply. On Thu, 2006-08-17 at 07:44 +0200, Rafael J. Wysocki wrote: > On Thursday 17 August 2006 03:44, Nigel Cunningham wrote: > > Hi. > > > > On Wed, 2006-08-16 at 03:41 +0100, Matthew Garrett wrote: > > > On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > > > > > > > cpufreq-applet crashes as soon as the cpu goes offline. > > > > Now, the applet should be written to deal with this scenario more > > > > gracefully, but I'm questioning whether or not userspace should > > > > *see* the unplug/replug that suspend does at all. > > > > > > As Nigel mentioned, cpu unplug happens just before processes are frozen, > > > so I guess there's a chance for it to be scheduled. On the other hand, > > > it's not unreasonable for CPUs to be unplugged during runtime anyway - > > > perhaps userspace should be able to deal with that? > > > > Agreed. > > > > I've spent a little more time thinking about this, and want to put a few > > thoughts forward for discussion/ignoring/flame bait/whatever. > > > > I see two main issues at the moment with freezing before hotplugging. > > The first is that we have cpu specific kernel threads that we're going > > to want to kill, and the second is that we have userspace threads that > > we want to migrate to another cpu. Have I missed anything? > > I have bad memories from the time we were not using the CPU-hotplug and > tried to freeze tasks with all CPUs on-line. There were some very subtle > race conditions appearing between the freezer and the running tasks > which were a nightmare to figure out. I'm not sure that they will appear > now, but something tells me so. 
:-) I think you'll find that the separate freezing of kernel space will help. We had SMP support in Suspend2 long before cpu hotplugging was added, and it was stable and reliable. I'm reasonably certain that the switch to splitting freezing was pre-cpu hotplugging. > > The first issue could be helped by splitting the freezing of userspace > > processes from kernel space. The kernel threads could thus die without > > us having to worry about userspace seeing what's going on. I haven't > > looked at vanilla in a while; this might already be in. > > Yes, it is. Great. Sorry for my slowness. I just keep too many things on the go at once. > > Alternatively, if it's viable, per-cpu kernel threads could perhaps be made > > NO_FREEZE. > > > > The second issue is migrating userspace threads. I'm no scheduling > > expert, so I'll just speculate :>. I wondered if it's possible to make > > the migration happen lazily; in such a way that if, when we come to thaw > > userspace, the cpu has been hotplugged again, the migration never > > happens. Does that sound possible? > > The CPU hotplug makes the tasks migrate automatically, but that's not > a problem, as I see it. The problem is some tasks may have specific CPU > affinities set and these should not change accross suspend/resume. Mmm. My concern was that cpu hotplug might somehow deadlock if the process it was trying to migrate was frozen. You don't think that's a possibility? With affinities, would saving and restoring be a possibility? Regards, Nigel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: peculiar suspend/resume bug. 2006-08-17 5:55 ` Nigel Cunningham @ 2006-08-17 6:30 ` Rafael J. Wysocki 0 siblings, 0 replies; 12+ messages in thread From: Rafael J. Wysocki @ 2006-08-17 6:30 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Matthew Garrett, Dave Jones, Linux Kernel Hi, On Thursday 17 August 2006 07:55, Nigel Cunningham wrote: > Hi. > > Thanks for the reply. > > On Thu, 2006-08-17 at 07:44 +0200, Rafael J. Wysocki wrote: > > On Thursday 17 August 2006 03:44, Nigel Cunningham wrote: > > > Hi. > > > > > > On Wed, 2006-08-16 at 03:41 +0100, Matthew Garrett wrote: > > > > On Tue, Aug 15, 2006 at 08:37:28PM -0400, Dave Jones wrote: > > > > > > > > > cpufreq-applet crashes as soon as the cpu goes offline. > > > > > Now, the applet should be written to deal with this scenario more > > > > > gracefully, but I'm questioning whether or not userspace should > > > > > *see* the unplug/replug that suspend does at all. > > > > > > > > As Nigel mentioned, cpu unplug happens just before processes are frozen, > > > > so I guess there's a chance for it to be scheduled. On the other hand, > > > > it's not unreasonable for CPUs to be unplugged during runtime anyway - > > > > perhaps userspace should be able to deal with that? > > > > > > Agreed. > > > > > > I've spent a little more time thinking about this, and want to put a few > > > thoughts forward for discussion/ignoring/flame bait/whatever. > > > > > > I see two main issues at the moment with freezing before hotplugging. > > > The first is that we have cpu specific kernel threads that we're going > > > to want to kill, and the second is that we have userspace threads that > > > we want to migrate to another cpu. Have I missed anything? > > > > I have bad memories from the time we were not using the CPU-hotplug and > > tried to freeze tasks with all CPUs on-line. There were some very subtle > > race conditions appearing between the freezer and the running tasks > > which were a nightmare to figure out. 
I'm not sure that they will appear > > now, but something tells me so. :-) > > I think you'll find that the separate freezing of kernel space will > help. That certainly is possible, but will need some testing. > We had SMP support in Suspend2 long before cpu hotplugging was > added, and it was stable and reliable. I'm reasonably certain that the > switch to splitting freezing was pre-cpu hotplugging. > > > > The first issue could be helped by splitting the freezing of userspace > > > processes from kernel space. The kernel threads could thus die without > > > us having to worry about userspace seeing what's going on. I haven't > > > looked at vanilla in a while; this might already be in. > > > > Yes, it is. > > Great. Sorry for my slowness. I just keep too many things on the go at > once. > > > > Alternatively, if it's viable, per-cpu kernel threads could perhaps be made > > > NO_FREEZE. > > > > > > The second issue is migrating userspace threads. I'm no scheduling > > > expert, so I'll just speculate :>. I wondered if it's possible to make > > > the migration happen lazily; in such a way that if, when we come to thaw > > > userspace, the cpu has been hotplugged again, the migration never > > > happens. Does that sound possible? > > > > The CPU hotplug makes the tasks migrate automatically, but that's not > > a problem, as I see it. The problem is some tasks may have specific CPU > > affinities set and these should not change accross suspend/resume. > > Mmm. My concern was that cpu hotplug might somehow deadlock if the > process it was trying to migrate was frozen. You don't think that's a > possibility? No, I don't. Of course it'll have to be tested anyway. :-) > With affinities, would saving and restoring be a possibility? I haven't thought about it yet. Perhaps, but it will need to be done with care. Greetings, Rafael -- You never change things by fighting the existing reality. R. Buckminster Fuller ^ permalink raw reply [flat|nested] 12+ messages in thread
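The "save and restore affinities" idea floated above can be modelled in userspace. The sketch below is only an analogy for what the kernel would have to do, not a proposed implementation: it records each task's CPU mask before the CPUs go away and reapplies it afterwards. `os.sched_getaffinity` and `os.sched_setaffinity` are real Python calls on Linux; the function names, the `get`/`set_affinity` hooks and the "incomplete" return value are assumptions made for illustration.

```python
import os
from typing import Callable, Dict, Iterable, List, Optional, Set


def save_affinities(
    pids: Iterable[int],
    get: Optional[Callable[[int], Iterable[int]]] = None,
) -> Dict[int, Set[int]]:
    """Record the CPU mask of each pid before CPUs are taken offline."""
    get = get or os.sched_getaffinity
    saved: Dict[int, Set[int]] = {}
    for pid in pids:
        try:
            saved[pid] = set(get(pid))
        except OSError:
            # The task exited or we may not inspect it; nothing to save.
            continue
    return saved


def restore_affinities(
    saved: Dict[int, Set[int]],
    online_cpus: Iterable[int],
    set_affinity: Optional[Callable[[int, Set[int]], None]] = None,
) -> List[int]:
    """Reapply saved masks; return pids whose mask was not fully restored."""
    set_affinity = set_affinity or os.sched_setaffinity
    online = set(online_cpus)
    incomplete: List[int] = []
    for pid, mask in saved.items():
        usable = mask & online          # only CPUs that actually came back
        ok = usable == mask
        if usable:
            try:
                set_affinity(pid, usable)
            except OSError:
                ok = False
        if not ok:
            incomplete.append(pid)      # the "needs care" cases
    return incomplete
```

The list of incomplete pids is where the care comes in: a task pinned solely to a CPU that fails to come back has no valid mask to restore, and something has to decide where it runs.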
* Re: peculiar suspend/resume bug. 2006-08-15 22:10 peculiar suspend/resume bug Dave Jones 2006-08-16 0:19 ` Nigel Cunningham @ 2006-08-16 22:06 ` Pavel Machek 1 sibling, 0 replies; 12+ messages in thread From: Pavel Machek @ 2006-08-16 22:06 UTC (permalink / raw) To: Dave Jones, Linux Kernel Hi! > Here's a fun one. > - Get a dual core cpufreq aware laptop (Like say, a core-duo) > - Add a cpufreq monitor to gnome-panel. Configure it > to watch the 2nd core. > - Suspend. > - Resume. > > Watch the cpufreq monitor die horribly. > > I believe this is because we take down the 2nd core at suspend > time with cpu hotplug, and for some reason we're scheduling > userspace before we bring that second core back up. > > Anyone have any clues why this is happening? It's by design; we do unplug first. Okay, maybe it is more of a design bug :-). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 12+ messages in thread