* S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
       [not found] ` <20050723003544.GC1988-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
@ 2005-07-30  0:50   ` Sanjoy Mahajan
       [not found]     ` <E1DyfYO-0006oI-00-KmINTRm7+bkRAIupTkoUWTYRy0cijUJx@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Sanjoy Mahajan @ 2005-07-30  0:50 UTC
  To: Pavel Machek
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

>> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
>> when the system wakes up from swsusp.  It also happens when waking up
>> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
>> Many people have said mysql also does not suspend well.  Is their use of
>> a named pipe or socket causing the problem?

> No idea, strace?

The upshot of stracing is in the Debian BTS <bugs.debian.org>
#319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
crash and found the problem:

> Apparently strace causes sigwait to return EINTR, which is
> inconsistent with the documentation I could find on sigwait.

Which is true.  The sigwait man entry (Debian 'etch') says:

  The !sigwait! function never returns an error.

His patch (available in the BTS and included below) fixed the problem
of strace or S3 sleep crashing pdnsd.

Shouldn't sleeping and suspension be invisible to user-space processes
such as pdnsd?  Drivers and other kernel code need rewriting so that
devices and buses are not abandoned in a weird state, but going to
sleep should just pull the rug out from under the entire user space.
Then no user-space process would need rewriting to survive a
sleep/wake, as long as the deep-freeze were cold enough.  Or is there a
subtlety with threads that I'm missing?

With APM, maybe such transparency was more possible since going to bed
was arranged by the firmware rather than by the OS, and the firmware
would pull out the rug from under the entire user and kernel space
(after maybe a bit of kernel prep).

-Sanjoy

--- src/main.c~	2005-07-08 20:13:14.000000000 +0200
+++ src/main.c	2005-07-29 16:16:12.000000000 +0200
@@ -659,11 +659,20 @@
 		pthread_sigmask(SIG_BLOCK,&sigs_msk,NULL);
 		waiting=1;
 #endif
-		sigwait(&sigs_msk,&sig);
-		DEBUG_MSG("Signal %i caught.\n",sig);
+		{
+			int err;
+			while ((err=sigwait(&sigs_msk,&sig))) {
+				if(err!=EINTR) {
+					log_warn("sigwait failed: %s",strerror(err));
+					sig=0;
+					break;
+				}
+			}
+		}
+		if(sig) DEBUG_MSG("Signal %i caught.\n",sig);
 		write_disk_cache();
 		destroy_cache();
-		log_warn("Caught signal %i. Exiting.",sig);
+		if(sig) log_warn("Caught signal %i. Exiting.",sig);
 		if (sig==SIGSEGV || sig==SIGILL || sig==SIGBUS)
 			crash_msg("This is a fatal signal probably triggered by a bug.");
 		if (ping_isocket!=-1)

-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click
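Editorial sketch: the retry idea in the pdnsd patch above can be isolated into a small helper. This is only an illustration under stated assumptions, not pdnsd code. The names sigwait_retry and demo_wait_for_usr1 are hypothetical, and sigprocmask() stands in for pthread_sigmask() on the assumption of a single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper (not part of pdnsd): wait for one of the signals in
 * `set`, calling sigwait() again whenever it reports EINTR.
 * Returns 0 and stores the signal number in *sig on success, or a positive
 * error number for any failure other than EINTR. */
int sigwait_retry(const sigset_t *set, int *sig)
{
    int err;

    assert(set != NULL && sig != NULL);
    do {
        /* sigwait() returns the error number directly; it does not use errno. */
        err = sigwait(set, sig);
    } while (err == EINTR);
    return err;
}

/* Hypothetical demo: block SIGUSR1, queue it for ourselves, then collect it.
 * Returns the caught signal number, or -1 if any step fails. */
int demo_wait_for_usr1(void)
{
    sigset_t set;
    int sig = 0;

    if (sigemptyset(&set) != 0 || sigaddset(&set, SIGUSR1) != 0)
        return -1;
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)        /* stays pending because it is blocked */
        return -1;
    if (sigwait_retry(&set, &sig) != 0)
        return -1;
    return sig;
}
```

Unlike the patch, the helper leaves the handling of non-EINTR errors to the caller, which keeps the retry policy separate from logging and shutdown.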
[parent not found: <E1DyfYO-0006oI-00-KmINTRm7+bkRAIupTkoUWTYRy0cijUJx@public.gmane.org>]
* Re: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
       [not found]     ` <E1DyfYO-0006oI-00-KmINTRm7+bkRAIupTkoUWTYRy0cijUJx@public.gmane.org>
@ 2005-07-30 10:30       ` Pavel Machek
       [not found]         ` <20050730103034.GC1942-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Pavel Machek @ 2005-07-30 10:30 UTC
  To: Sanjoy Mahajan
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi!

> >> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> >> when the system wakes up from swsusp.  It also happens when waking up
> >> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> >> Many people have said mysql also does not suspend well.  Is their use of
> >> a named pipe or socket causing the problem?
>
> > No idea, strace?
>
> The upshot of stracing is in the Debian BTS <bugs.debian.org>
> #319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
> crash and found the problem:
>
> > Apparently strace causes sigwait to return EINTR, which is
> > inconsistent with the documentation I could find on sigwait.
>
> Which is true.  The sigwait man entry (Debian 'etch') says:
>
>   The !sigwait! function never returns an error.
>
> His patch (available in the BTS and included below) fixed the problem
> of strace or S3 sleep crashing pdnsd.

If you think it is a linux bug, can you produce a small test case doing
just the sigwait, and post it on l-k with a big title "sigwait() breaks
when straced, and on suspend"?

That way it is going to get some attention, and you'll get either the
documentation or the kernel fixed.

								Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.
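Editorial sketch: a test case "doing just the sigwait", as requested above, could start from something like the following probe. It is a sketch under assumptions, not the test case that was posted to the list. The name sigwait_probe and its parameters are hypothetical, and sigprocmask() assumes a single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical probe (not from the thread): block `signo`, optionally queue
 * it for ourselves, then call sigwait() exactly once with no retry.
 * Returns sigwait()'s raw return value (0 on success, otherwise an error
 * number such as EINTR), or -1 if the setup itself fails.
 * On success the caught signal number is stored in *caught. */
int sigwait_probe(int signo, int self_signal, int *caught)
{
    sigset_t set;

    assert(caught != NULL);
    *caught = 0;
    if (sigemptyset(&set) != 0 || sigaddset(&set, signo) != 0)
        return -1;
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (self_signal && raise(signo) != 0)   /* blocked, so it stays pending */
        return -1;
    return sigwait(&set, caught);
}
```

To look for the reported behaviour, one would call the probe with self_signal set to 0 from a small main(), print strerror() of any nonzero result, and then attach strace to the waiting process or suspend the machine. Whether a nonzero result appears depends on the kernel and libc in use.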
[parent not found: <20050730103034.GC1942-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>]
* Re: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
       [not found]         ` <20050730103034.GC1942-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
@ 2005-08-01  6:51           ` Shaohua Li
       [not found]             ` <1122879094.3285.2.camel-ECwVeV2eNyQD0+JXs3kMbRL4W9x8LtSr@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2005-08-01  6:51 UTC
  To: Pavel Machek
  Cc: Sanjoy Mahajan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, 2005-07-30 at 18:30 +0800, Pavel Machek wrote:
> Hi!
>
> > >> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> > >> when the system wakes up from swsusp.  It also happens when waking up
> > >> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> > >> Many people have said mysql also does not suspend well.  Is their use of
> > >> a named pipe or socket causing the problem?
> >
> > > No idea, strace?
> >
> > The upshot of stracing is in the Debian BTS <bugs.debian.org>
> > #319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
> > crash and found the problem:
> >
> > > Apparently strace causes sigwait to return EINTR, which is
> > > inconsistent with the documentation I could find on sigwait.
> >
> > Which is true.  The sigwait man entry (Debian 'etch') says:
> >
> >   The !sigwait! function never returns an error.
> >
> > His patch (available in the BTS and included below) fixed the problem
> > of strace or S3 sleep crashing pdnsd.
>
> If you think it is a linux bug, can you produce small test case doing
> just the sigwait, and post it on l-k with big title "sigwait() breaks
> when straced, and on suspend"?
>
> That way it is going to get some attention, and you'll get either
> documentation or kernel fixed.

Looks like a linux bug to me. The refrigerator fake signal woke the
task up without a restart for the sigwait case. How about the patch
below:

Thanks,
Shaohua

---

 linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletion(-)

diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
--- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
+++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
@@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
 	struct timespec ts;
 	siginfo_t info;
 	long timeout = 0;
+	int recover = 0;
 
 	/* XXX: Don't preclude handling different sized sigset_t's.  */
 	if (sigsetsize != sizeof(sigset_t))
@@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
 			 * be awakened when they arrive.  */
 			current->real_blocked = current->blocked;
 			sigandsets(&current->blocked, &current->blocked, &these);
+do_recover:
 			recalc_sigpending();
 			spin_unlock_irq(&current->sighand->siglock);
 
 			current->state = TASK_INTERRUPTIBLE;
 			timeout = schedule_timeout(timeout);
 
-			try_to_freeze();
+			if (try_to_freeze())
+				recover = 1;
 			spin_lock_irq(&current->sighand->siglock);
 			sig = dequeue_signal(current, &these, &info);
+			if (!sig && recover) {
+				if (timeout == 0)
+					timeout = MAX_SCHEDULE_TIMEOUT;
+				recover = 0;
+				goto do_recover;
+			}
 			current->blocked = current->real_blocked;
 			siginitset(&current->real_blocked, 0);
 			recalc_sigpending();
_
[parent not found: <1122879094.3285.2.camel-ECwVeV2eNyQD0+JXs3kMbRL4W9x8LtSr@public.gmane.org>]
* Re: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
       [not found]             ` <1122879094.3285.2.camel-ECwVeV2eNyQD0+JXs3kMbRL4W9x8LtSr@public.gmane.org>
@ 2005-08-01  7:09               ` Pavel Machek
  0 siblings, 0 replies; 5+ messages in thread
From: Pavel Machek @ 2005-08-01  7:09 UTC
  To: Shaohua Li
  Cc: Sanjoy Mahajan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi!

> > If you think it is a linux bug, can you produce small test case doing
> > just the sigwait, and post it on l-k with big title "sigwait() breaks
> > when straced, and on suspend"?
> >
> > That way it is going to get some attention, and you'll get either
> > documentation or kernel fixed.
> Looks like a linux bug to me. The refrigerator fake signal woke the
> task up without a restart for the sigwait case. How about the patch
> below:

Is there a chance to fix the strace case, too? sigwait() is broken in
more than one way, it seems...

								Pavel

>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletion(-)
>
> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>  	struct timespec ts;
>  	siginfo_t info;
>  	long timeout = 0;
> +	int recover = 0;
>
>  	/* XXX: Don't preclude handling different sized sigset_t's.  */
>  	if (sigsetsize != sizeof(sigset_t))
> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>  			 * be awakened when they arrive.  */
>  			current->real_blocked = current->blocked;
>  			sigandsets(&current->blocked, &current->blocked, &these);
> +do_recover:
>  			recalc_sigpending();
>  			spin_unlock_irq(&current->sighand->siglock);
>
>  			current->state = TASK_INTERRUPTIBLE;
>  			timeout = schedule_timeout(timeout);
>
> -			try_to_freeze();
> +			if (try_to_freeze())
> +				recover = 1;

Can't you just goto do_recover here?

>  			spin_lock_irq(&current->sighand->siglock);
>  			sig = dequeue_signal(current, &these, &info);
> +			if (!sig && recover) {
> +				if (timeout == 0)
> +					timeout = MAX_SCHEDULE_TIMEOUT;
> +				recover = 0;
> +				goto do_recover;
> +			}
>  			current->blocked = current->real_blocked;
>  			siginitset(&current->real_blocked, 0);
>  			recalc_sigpending();
> _
>

-- 
if you have sharp zaurus hardware you don't need... you know my address
* RE: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
@ 2005-08-01 12:36 Li, Shaohua
  0 siblings, 0 replies; 5+ messages in thread
From: Li, Shaohua @ 2005-08-01 12:36 UTC
  To: Pavel Machek
  Cc: Sanjoy Mahajan, linux-kernel-u79uwXL29TY76Z2rM5mHXA, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi,

>> > If you think it is a linux bug, can you produce small test case doing
>> > just the sigwait, and post it on l-k with big title "sigwait() breaks
>> > when straced, and on suspend"?
>> >
>> > That way it is going to get some attention, and you'll get either
>> > documentation or kernel fixed.
>> Looks like a linux bug to me. The refrigerator fake signal woke the
>> task up without a restart for the sigwait case. How about the patch
>> below:
>
>Is there a chance to fix the strace case, too? sigwait() is broken in
>more than one way, it seems...

Not sure about it. strace shows sigwait using sigtimedwait, and the
sigtimedwait documentation doesn't say it can't return an error.

>>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
>>  1 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
>> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
>> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
>> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  	struct timespec ts;
>>  	siginfo_t info;
>>  	long timeout = 0;
>> +	int recover = 0;
>>
>>  	/* XXX: Don't preclude handling different sized sigset_t's.  */
>>  	if (sigsetsize != sizeof(sigset_t))
>> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  			 * be awakened when they arrive.  */
>>  			current->real_blocked = current->blocked;
>>  			sigandsets(&current->blocked, &current->blocked, &these);
>> +do_recover:
>>  			recalc_sigpending();
>>  			spin_unlock_irq(&current->sighand->siglock);
>>
>>  			current->state = TASK_INTERRUPTIBLE;
>>  			timeout = schedule_timeout(timeout);
>>
>> -			try_to_freeze();
>> +			if (try_to_freeze())
>> +				recover = 1;
>
>Can't you just goto do_recover here?

Not sure again.

Thanks,
Shaohua
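Editorial sketch: the point that sigtimedwait, unlike the sigwait man entry quoted earlier in the thread, is documented as able to fail can be seen without strace or suspend. With a zero timeout and nothing pending it fails with EAGAIN. The helper name poll_signal is hypothetical, and the code assumes a single-threaded process on a system that provides sigtimedwait(), such as Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe (not from the thread): check whether `signo` is
 * pending, using sigtimedwait() with a zero timeout so it never sleeps.
 * Returns the signal number if one was pending and has been consumed,
 * or the negated errno value (for example -EAGAIN) on failure. */
int poll_signal(int signo)
{
    sigset_t set;
    siginfo_t info;
    struct timespec zero = { 0, 0 };
    int ret;

    if (sigemptyset(&set) != 0 || sigaddset(&set, signo) != 0)
        return -errno;
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -errno;

    /* Unlike sigwait(), sigtimedwait() reports failure as -1 plus errno. */
    ret = sigtimedwait(&set, &info, &zero);
    return ret == -1 ? -errno : ret;
}
```

This only shows the documented error path. It does not reproduce the EINTR return under strace that the thread is about.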
end of thread, other threads:[~2005-08-01 12:36 UTC | newest]
Thread overview: 5+ messages
[not found] <20050723003544.GC1988@elf.ucw.cz>
[not found] ` <20050723003544.GC1988-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
2005-07-30 0:50 ` S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X)) Sanjoy Mahajan
[not found] ` <E1DyfYO-0006oI-00-KmINTRm7+bkRAIupTkoUWTYRy0cijUJx@public.gmane.org>
2005-07-30 10:30 ` Pavel Machek
[not found] ` <20050730103034.GC1942-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
2005-08-01 6:51 ` Shaohua Li
[not found] ` <1122879094.3285.2.camel-ECwVeV2eNyQD0+JXs3kMbRL4W9x8LtSr@public.gmane.org>
2005-08-01 7:09 ` Pavel Machek
2005-08-01 12:36 Li, Shaohua