public inbox for linux-acpi@vger.kernel.org
* S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
From: Sanjoy Mahajan @ 2005-07-30  0:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

>> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
>> when the system wakes up from swsusp.  It also happens when waking up
>> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
>> Many people have said mysql also does not suspend well.  Is their use of
>> a named pipe or socket causing the problem?

> No idea, strace?

The upshot of stracing is in the Debian BTS <bugs.debian.org>
#319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
crash and found the problem:

> Apparently strace causes sigwait to return EINTR, which is
> inconsistent with the documentation I could find on sigwait.

Which is true.  The sigwait man entry (Debian 'etch') says:
       The sigwait function never returns an error.

His patch (available in the BTS and included below) fixed the problem
of strace or S3 sleep crashing pdnsd.

Shouldn't sleeping and suspension be invisible to user-space processes
such as pdnsd?  Drivers and other kernel code need rewriting so that
devices and buses are not abandoned in a weird state, but going to
sleep should just pull the rug out from under the entire user space.
Then no user space process would need rewriting to survive a
sleep/wake, as long as the deep-freeze were cold enough.  Or is there
a subtlety with threads that I'm missing?

With APM, maybe such transparency was easier to achieve, since going to bed
was arranged by the firmware rather than by the OS, and the firmware
would pull out the rug from under the entire user and kernel space
(after maybe a bit of kernel prep).

-Sanjoy

--- src/main.c~	2005-07-08 20:13:14.000000000 +0200
+++ src/main.c	2005-07-29 16:16:12.000000000 +0200
@@ -659,11 +659,20 @@
 	pthread_sigmask(SIG_BLOCK,&sigs_msk,NULL);
 	waiting=1;
 #endif
-	sigwait(&sigs_msk,&sig);
-	DEBUG_MSG("Signal %i caught.\n",sig);
+	{
+		int err;
+		while ((err=sigwait(&sigs_msk,&sig))) {
+			if(err!=EINTR) {
+				log_warn("sigwait failed: %s",strerror(err));
+				sig=0;
+				break;
+			}
+		}
+	}
+	if(sig) DEBUG_MSG("Signal %i caught.\n",sig);
 	write_disk_cache();
 	destroy_cache();
-	log_warn("Caught signal %i. Exiting.",sig);
+	if(sig) log_warn("Caught signal %i. Exiting.",sig);
 	if (sig==SIGSEGV || sig==SIGILL || sig==SIGBUS)
 		crash_msg("This is a fatal signal probably triggered by a bug.");
 	if (ping_isocket!=-1)



-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click


* Re: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
From: Pavel Machek @ 2005-07-30 10:30 UTC (permalink / raw)
  To: Sanjoy Mahajan
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi!

> >> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> >> when the system wakes up from swsusp.  It also happens when waking up
> >> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> >> Many people have said mysql also does not suspend well.  Is their use of
> >> a named pipe or socket causing the problem?
> 
> > No idea, strace?
> 
> The upshot of stracing is in the Debian BTS <bugs.debian.org>
> #319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
> crash and found the problem:
> 
> > Apparently strace causes sigwait to return EINTR, which is
> > inconsistent with the documentation I could find on sigwait.
> 
> Which is true.  The sigwait man entry (Debian 'etch') says:
>        The sigwait function never returns an error.
> 
> His patch (available in the BTS and included below) fixed the problem
> of strace or S3 sleep crashing pdnsd.

If you think it is a Linux bug, can you produce a small test case doing
just the sigwait, and post it on l-k with big title "sigwait() breaks
when straced, and on suspend"?

That way it is going to get some attention, and you'll get either the
documentation or the kernel fixed.
								Pavel


-- 
teflon -- maybe it is a trademark, but it should not be.




* Re: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
From: Shaohua Li @ 2005-08-01  6:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sanjoy Mahajan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, 2005-07-30 at 18:30 +0800, Pavel Machek wrote:
> Hi!
> 
> > >> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> > >> when the system wakes up from swsusp.  It also happens when waking up
> > >> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> > >> Many people have said mysql also does not suspend well.  Is their use of
> > >> a named pipe or socket causing the problem?
> > 
> > > No idea, strace?
> > 
> > The upshot of stracing is in the Debian BTS <bugs.debian.org>
> > #319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
> > crash and found the problem:
> > 
> > > Apparently strace causes sigwait to return EINTR, which is
> > > inconsistent with the documentation I could find on sigwait.
> > 
> > Which is true.  The sigwait man entry (Debian 'etch') says:
> >        The sigwait function never returns an error.
> > 
> > His patch (available in the BTS and included below) fixed the problem
> > of strace or S3 sleep crashing pdnsd.
> 
> If you think it is a Linux bug, can you produce a small test case doing
> just the sigwait, and post it on l-k with big title "sigwait() breaks
> when straced, and on suspend"?
> 
> That way it is going to get some attention, and you'll get either the
> documentation or the kernel fixed.
Looks like a Linux bug to me. The refrigerator's fake signal wakes the
task up, and sigwait is not restarted afterwards. How about the patch below:


Thanks,
Shaohua
---

 linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletion(-)

diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
--- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
+++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
@@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
 	struct timespec ts;
 	siginfo_t info;
 	long timeout = 0;
+	int recover = 0;
 
 	/* XXX: Don't preclude handling different sized sigset_t's.  */
 	if (sigsetsize != sizeof(sigset_t))
@@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
 			 * be awakened when they arrive.  */
 			current->real_blocked = current->blocked;
 			sigandsets(&current->blocked, &current->blocked, &these);
+do_recover:
 			recalc_sigpending();
 			spin_unlock_irq(&current->sighand->siglock);
 
 			current->state = TASK_INTERRUPTIBLE;
 			timeout = schedule_timeout(timeout);
 
-			try_to_freeze();
+			if (try_to_freeze())
+				recover = 1;
 			spin_lock_irq(&current->sighand->siglock);
 			sig = dequeue_signal(current, &these, &info);
+			if (!sig && recover) {
+				if (timeout == 0)
+					timeout = MAX_SCHEDULE_TIMEOUT;
+				recover = 0;
+				goto do_recover;
+			}
 			current->blocked = current->real_blocked;
 			siginitset(&current->real_blocked, 0);
 			recalc_sigpending();
_






* Re: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
From: Pavel Machek @ 2005-08-01  7:09 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Sanjoy Mahajan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi!

> > If you think it is a Linux bug, can you produce a small test case doing
> > just the sigwait, and post it on l-k with big title "sigwait() breaks
> > when straced, and on suspend"?
> > 
> > That way it is going to get some attention, and you'll get either the
> > documentation or the kernel fixed.
> Looks like a Linux bug to me. The refrigerator's fake signal wakes the
> task up, and sigwait is not restarted afterwards. How about the patch
> below:

Is there a chance to fix the strace case, too? sigwait() seems to be
broken in more than one way...
								Pavel


>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>  	struct timespec ts;
>  	siginfo_t info;
>  	long timeout = 0;
> +	int recover = 0;
>  
>  	/* XXX: Don't preclude handling different sized sigset_t's.  */
>  	if (sigsetsize != sizeof(sigset_t))
> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>  			 * be awakened when they arrive.  */
>  			current->real_blocked = current->blocked;
>  			sigandsets(&current->blocked, &current->blocked, &these);
> +do_recover:
>  			recalc_sigpending();
>  			spin_unlock_irq(&current->sighand->siglock);
>  
>  			current->state = TASK_INTERRUPTIBLE;
>  			timeout = schedule_timeout(timeout);
>  
> -			try_to_freeze();
> +			if (try_to_freeze())
> +				recover = 1;

Can't you just goto do_recover here?

>  			spin_lock_irq(&current->sighand->siglock);
>  			sig = dequeue_signal(current, &these, &info);
> +			if (!sig && recover) {
> +				if (timeout == 0)
> +					timeout = MAX_SCHEDULE_TIMEOUT;
> +				recover = 0;
> +				goto do_recover;
> +			}
>  			current->blocked = current->real_blocked;
>  			siginitset(&current->real_blocked, 0);
>  			recalc_sigpending();
> _
> 

-- 
if you have sharp zaurus hardware you don't need... you know my address




* RE: S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))
From: Li, Shaohua @ 2005-08-01 12:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sanjoy Mahajan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi,
>> > If you think it is a Linux bug, can you produce a small test case doing
>> > just the sigwait, and post it on l-k with big title "sigwait() breaks
>> > when straced, and on suspend"?
>> >
>> > That way it is going to get some attention, and you'll get either the
>> > documentation or the kernel fixed.
>> Looks like a Linux bug to me. The refrigerator's fake signal wakes the
>> task up, and sigwait is not restarted afterwards. How about the patch
>> below:
>
>Is there a chance to fix the strace case, too? sigwait() seems to be
>broken in more than one way...
Not sure about it. strace shows that sigwait is implemented with
sigtimedwait, whose documentation doesn't say it can't return an error.

>>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
>>  1 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
>> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
>> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
>> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  	struct timespec ts;
>>  	siginfo_t info;
>>  	long timeout = 0;
>> +	int recover = 0;
>>
>>  	/* XXX: Don't preclude handling different sized sigset_t's.  */
>>  	if (sigsetsize != sizeof(sigset_t))
>> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  			 * be awakened when they arrive.  */
>>  			current->real_blocked = current->blocked;
>>  			sigandsets(&current->blocked, &current->blocked, &these);
>> +do_recover:
>>  			recalc_sigpending();
>>  			spin_unlock_irq(&current->sighand->siglock);
>>
>>  			current->state = TASK_INTERRUPTIBLE;
>>  			timeout = schedule_timeout(timeout);
>>
>> -			try_to_freeze();
>> +			if (try_to_freeze())
>> +				recover = 1;
>
>Can't you just goto do_recover here?
Not sure again.

Thanks,
Shaohua



