public inbox for linux-kernel@vger.kernel.org
* Total system lockup with Alt-SysRQ-L
@ 2001-12-23 17:58 Russell King
  2001-12-24  2:34 ` Alan Cox
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King @ 2001-12-23 17:58 UTC (permalink / raw)
  To: linux-kernel

Ok, alt-sysrq-l is a pretty major thing to do, as it has the effect of
killing everything, including init.

When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
exit_notify.  I don't remember the details I'm afraid.

Back in 2.3, I had a go at fixing this, but Linus rejected the patch, saying
that it was doing the wrong thing.  To this day, the kernel still suffers
from this, and I've not had the inclination to spend any more time on it.

So, I'm just letting people know that alt-sysrq-l is rather fatal,
especially if you want to do the following sequence to avoid a fsck:

	alt-sysrq-l
	alt-sysrq-s
	alt-sysrq-u
	alt-sysrq-b

IMHO either alt-sysrq-l should be removed, or someone who knows the logic
behind the linking of tasks together needs to fix exit_notify so it doesn't
enter an infinite loop when init exits.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: Total system lockup with Alt-SysRQ-L
  2001-12-23 17:58 Total system lockup with Alt-SysRQ-L Russell King
@ 2001-12-24  2:34 ` Alan Cox
  2001-12-24  8:37   ` Russell King
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2001-12-24  2:34 UTC (permalink / raw)
  To: Russell King; +Cc: linux-kernel

> When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
> exit_notify.  I don't remember the details I'm afraid.

pid1 ends up trying to kill pid1 and it goes deeply down the toilet from
that point onwards. The Unix traditional world reboots when pid 1 dies.


* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24  2:34 ` Alan Cox
@ 2001-12-24  8:37   ` Russell King
  2001-12-24 11:48     ` Denis Oliver Kropp
                       ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Russell King @ 2001-12-24  8:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Mon, Dec 24, 2001 at 02:34:20AM +0000, Alan Cox wrote:
> > When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
> > exit_notify.  I don't remember the details I'm afraid.
> 
> pid1 ends up trying to kill pid1 and it goes deeply down the toilet from
> that point onwards. The Unix traditional world reboots when pid 1 dies.

The problem was definitely in the exit_notify code, where it manipulated
the task links indefinitely.  (I think it was cptr never becomes null,
so the loop never terminates).
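
The failure mode described above can be illustrated with a simplified
user-space model. This is a sketch under stated assumptions, not the kernel's
exit_notify code: it assumes orphaned children are handed to a reaper task
(normally init), and the names struct task, adopt() and reparent_children(),
as well as the iteration limit, are hypothetical. When the dying task is
itself the reaper, every child taken off its list is pushed straight back on,
so the list head never becomes NULL.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's task links. */
struct task {
    int pid;
    struct task *parent;   /* current parent */
    struct task *children; /* head of this task's child list */
    struct task *sibling;  /* next child of the same parent */
};

/* Push child onto the front of parent's child list. */
void adopt(struct task *parent, struct task *child)
{
    child->parent = parent;
    child->sibling = parent->children;
    parent->children = child;
}

/*
 * Hand every child of 'dying' to 'reaper'.
 * Returns the number of children moved, or -1 if 'limit' iterations were
 * reached without the child list becoming empty (the model's stand-in for
 * "the loop never terminates").
 */
int reparent_children(struct task *dying, struct task *reaper, int limit)
{
    int steps = 0;

    while (dying->children != NULL) {
        struct task *p;

        if (steps == limit)
            return -1;
        p = dying->children;
        dying->children = p->sibling; /* unlink from the dying task */
        adopt(reaper, p);             /* if reaper == dying, p is relinked here */
        steps++;
    }
    return steps;
}
```

In this model, reparenting the children of an ordinary task to init finishes
after one pass per child, while calling reparent_children(&init, &init, ...)
only stops because of the artificial limit.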

However, if we're saying that "pid1 must not die" then maybe we should get
rid of the 'killall' sysrq option since it serves no useful purpose, and
add a suitable panic in the do_exit path?

I'll generate a patch for that if there's interest.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24  8:37   ` Russell King
@ 2001-12-24 11:48     ` Denis Oliver Kropp
  2001-12-24 12:26     ` Russell King
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Denis Oliver Kropp @ 2001-12-24 11:48 UTC (permalink / raw)
  To: Russell King; +Cc: Alan Cox, linux-kernel

Quoting Russell King (rmk@arm.linux.org.uk):
> On Mon, Dec 24, 2001 at 02:34:20AM +0000, Alan Cox wrote:
> > > When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
> > > exit_notify.  I don't remember the details I'm afraid.
> > 
> > pid1 ends up trying to kill pid1 and it goes deeply down the toilet from
> > that point onwards. The Unix traditional world reboots when pid 1 dies.
> 
> The problem was definitely in the exit_notify code, where it manipulated
> the task links indefinitely.  (I think it was cptr never becomes null,
> so the loop never terminates).
> 
> However, if we're saying that "pid1 must not die" then maybe we should get
> rid of the 'killall' sysrq option since it serves no useful purpose, and
> add a suitable panic in the do_exit path?

Another annoying thing that sometimes happens is that I accidentally
press 'L' or 'E' instead of 'K' or 'R', the SysRq keys I use most.

An additional modifier for the harmful actions would be useful, e.g. Shift.
So pressing Alt-SysRQ-E would do nothing until Shift is pressed, too.
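
A minimal sketch of how such a modifier gate might look, written as plain
user-space C rather than against the real keyboard or sysrq code. Everything
here is an assumption for illustration: the MOD_ALT and MOD_SHIFT flags, the
function names, and the choice of 'e' and 'l' as the destructive keys (taken
only from the two keys mentioned above).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical modifier flags; not the kernel's keyboard state bits. */
#define MOD_ALT   0x1u
#define MOD_SHIFT 0x2u

/* Keys treated as destructive in this sketch: tErm ('e') and killalL ('l'). */
bool sysrq_key_is_destructive(char key)
{
    /* strchr() would match the terminating NUL, so reject it explicitly. */
    return key != '\0' && strchr("el", key) != NULL;
}

/* Decide whether a SysRq key press should be acted on. */
bool sysrq_key_allowed(char key, unsigned int mods)
{
    if (!(mods & MOD_ALT))
        return false;                    /* SysRq always needs Alt */
    if (sysrq_key_is_destructive(key))
        return (mods & MOD_SHIFT) != 0;  /* destructive keys also need Shift */
    return true;
}
```

With this gate, sysrq_key_allowed('k', MOD_ALT) is accepted, while
sysrq_key_allowed('e', MOD_ALT) is ignored until MOD_SHIFT is also set.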

-- 
Best regards,
  Denis Oliver Kropp

.------------------------------------------.
| DirectFB - Hardware accelerated graphics |
| http://www.directfb.org/                 |
"------------------------------------------"

           convergence integrated media GmbH


* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24  8:37   ` Russell King
  2001-12-24 11:48     ` Denis Oliver Kropp
@ 2001-12-24 12:26     ` Russell King
  2001-12-25 11:33       ` Pavel Machek
  2001-12-24 14:27     ` M. Edward (Ed) Borasky
  2001-12-28  1:00     ` David Woodhouse
  3 siblings, 1 reply; 10+ messages in thread
From: Russell King @ 2001-12-24 12:26 UTC (permalink / raw)
  To: linux-kernel

On Mon, Dec 24, 2001 at 08:37:52AM +0000, Russell King wrote:
> The problem was definitely in the exit_notify code, where it manipulated
> the task links indefinitely.  (I think it was cptr never becomes null,
> so the loop never terminates).
> 
> However, if we're saying that "pid1 must not die" then maybe we should get
> rid of the 'killall' sysrq option since it serves no useful purpose, and
> add a suitable panic in the do_exit path?

Ok, can someone explain *why* it is desirable to attempt to kill pid1
given that doing so will completely lockup the machine?  (should we
rename it to "Lockup" instead of "killalL"? 8)

We do have some tests in the do_exit() path to panic if/when init dies,
which rely on the init PID being '1'.  Unfortunately, these don't trigger
because of the following bogosity in drivers/char/sysrq.c:

                        if (p->pid == 1 && even_init)
                                /* Ugly hack to kill init */
                                p->pid = 0x8000;

So, I propose we get rid of this "ugly hack", and the alt-sysrq-l
option altogether - it would appear to serve no useful purpose.
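
To make the interaction concrete, here is a small user-space model of the
check and the hack. It is a sketch, not kernel code: struct mini_task,
old_should_signal(), new_should_signal() and model_do_exit() are hypothetical
names, and model_do_exit() only stands in for the "panic if pid is 1" test
mentioned above. The old filter rewrites init's pid to 0x8000 before the
signal is sent, so by the time the exit path looks at the pid it no longer
equals 1.

```c
#include <stdbool.h>

#define HACKED_PID 0x8000 /* value the old code assigns to init's pid */

enum exit_result { RESULT_EXITED, RESULT_PANIC };

/* Hypothetical stand-in for the two task fields the filter looks at. */
struct mini_task {
    int pid;
    bool has_mm; /* false for swapper and kernel threads */
};

/* Model of the old send_sig_all() filter, including the pid rewrite. */
bool old_should_signal(struct mini_task *p, bool even_init)
{
    if (!p->has_mm)
        return false;
    if (p->pid == 1 && even_init)
        p->pid = HACKED_PID; /* the "ugly hack" */
    return p->pid != 1;
}

/* Model of the filter after the proposed patch: init is never signalled. */
bool new_should_signal(const struct mini_task *p)
{
    return p->has_mm && p->pid != 1;
}

/* Model of the exit-path test that relies on init having pid 1. */
enum exit_result model_do_exit(const struct mini_task *p)
{
    return p->pid == 1 ? RESULT_PANIC : RESULT_EXITED;
}
```

In this model, once old_should_signal() has been called on init with
even_init set, model_do_exit() returns RESULT_EXITED instead of RESULT_PANIC,
which mirrors why the panic test never triggers.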

Here is a patch that does just this.  It should apply to 2.4.17 and 2.5.1
kernels fine (generated on 2.5.1).

--- orig/drivers/char/sysrq.c	Wed Dec 12 11:37:40 2001
+++ linux/drivers/char/sysrq.c	Mon Dec 24 12:19:58 2001
@@ -284,24 +284,20 @@
 
 /* signal sysrq helper function
  * Sends a signal to all user processes */
-static void send_sig_all(int sig, int even_init)
+static void send_sig_all(int sig)
 {
 	struct task_struct *p;
 
 	for_each_task(p) {
-		if (p->mm) { /* Not swapper nor kernel thread */
-			if (p->pid == 1 && even_init)
-				/* Ugly hack to kill init */
-				p->pid = 0x8000;
-			if (p->pid != 1)
-				force_sig(sig, p);
-		}
+		if (p->mm && p->pid != 1)
+			/* Not swapper, init nor kernel thread */
+			force_sig(sig, p);
 	}
 }
 
 static void sysrq_handle_term(int key, struct pt_regs *pt_regs,
 		struct kbd_struct *kbd, struct tty_struct *tty) {
-	send_sig_all(SIGTERM, 0);
+	send_sig_all(SIGTERM);
 	console_loglevel = 8;
 }
 static struct sysrq_key_op sysrq_term_op = {
@@ -312,7 +308,7 @@
 
 static void sysrq_handle_kill(int key, struct pt_regs *pt_regs,
 		struct kbd_struct *kbd, struct tty_struct *tty) {
-	send_sig_all(SIGKILL, 0);
+	send_sig_all(SIGKILL);
 	console_loglevel = 8;
 }
 static struct sysrq_key_op sysrq_kill_op = {
@@ -321,17 +317,6 @@
 	action_msg:	"Kill All Tasks",
 };
 
-static void sysrq_handle_killall(int key, struct pt_regs *pt_regs,
-		struct kbd_struct *kbd, struct tty_struct *tty) {
-	send_sig_all(SIGKILL, 1);
-	console_loglevel = 8;
-}
-static struct sysrq_key_op sysrq_killall_op = {
-	handler:	sysrq_handle_killall,
-	help_msg:	"killalL",
-	action_msg:	"Kill All Tasks (even init)",
-};
-
 /* END SIGNAL SYSRQ HANDLERS BLOCK */
 
 
@@ -366,7 +351,7 @@
 #else
 /* k */	NULL,
 #endif
-/* l */	&sysrq_killall_op,
+/* l */	NULL,
 /* m */	&sysrq_showmem_op,
 /* n */	NULL,
 /* o */	NULL, /* This will often be registered

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24  8:37   ` Russell King
  2001-12-24 11:48     ` Denis Oliver Kropp
  2001-12-24 12:26     ` Russell King
@ 2001-12-24 14:27     ` M. Edward (Ed) Borasky
  2001-12-24 17:07       ` Alan Cox
  2001-12-28  1:00     ` David Woodhouse
  3 siblings, 1 reply; 10+ messages in thread
From: M. Edward (Ed) Borasky @ 2001-12-24 14:27 UTC (permalink / raw)
  To: Russell King; +Cc: Alan Cox, linux-kernel

On Mon, 24 Dec 2001, Russell King wrote:

> The problem was definitely in the exit_notify code, where it
> manipulated the task links indefinitely.  (I think it was cptr never
> becomes null, so the loop never terminates).
>
> However, if we're saying that "pid1 must not die" then maybe we should
> get rid of the 'killall' sysrq option since it serves no useful
> purpose, and add a suitable panic in the do_exit path?
>
> I'll generate a patch for that if there's interest.

What would be even better, and I think there may already be such an
option, would be a one-button "sync up all the disks, forbid any more
writes, save as much state as possible (registers, memory) to a swap
partition, set a flag for crash dump processing and reboot" capability.

-- 
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net

If God had meant carrots to be eaten cooked, He would have given rabbits
fire.



* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24 14:27     ` M. Edward (Ed) Borasky
@ 2001-12-24 17:07       ` Alan Cox
  2001-12-25 11:35         ` Pavel Machek
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2001-12-24 17:07 UTC (permalink / raw)
  To: "M. Edward (Ed) Borasky"; +Cc: Russell King, Alan Cox, linux-kernel

> option, would be a one-button "sync up all the disks, forbid any more
> writes, save as much state as possible (registers, memory) to a swap
> partition, set a flag for crash dump processing and reboot" capability.

Very hard to do - you can't trust the I/O system's state, so the dump code
has to verify it hasn't been corrupted, reconfigure the drive it wishes to
write to, write the data out using its own non-interrupt-driven code, and
then halt the box.

There are folks with patches that do a lot of that (lkcd)


* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24 12:26     ` Russell King
@ 2001-12-25 11:33       ` Pavel Machek
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Machek @ 2001-12-25 11:33 UTC (permalink / raw)
  To: Russell King; +Cc: linux-kernel


Hi!

> We do have some tests in the do_exit() path to panic if/when init dies,
> which rely on the init PID being '1'.  Unfortunately, these don't trigger
> because of the following bogosity in drivers/char/sysrq.c:
> 
>                         if (p->pid == 1 && even_init)
>                                 /* Ugly hack to kill init */
>                                 p->pid = 0x8000;
> 
> So, I propose we get rid of this "ugly hack", and the alt-sysrq-l
> option altogether - it would appear to serve no useful purpose.

Ask mj if it was ever useful... But I guess it was not. Kill it.


-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24 17:07       ` Alan Cox
@ 2001-12-25 11:35         ` Pavel Machek
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Machek @ 2001-12-25 11:35 UTC (permalink / raw)
  To: Alan Cox; +Cc: "M. Edward (Ed) Borasky", Russell King, linux-kernel

Hi!

> > option, would be a one-button "sync up all the disks, forbid any more
> > writes, save as much state as possible (registers, memory) to a swap
> > partition, set a flag for crash dump processing and reboot" capability.
> 
> Very hard to do - you can't trust the I/O system's state, so the dump code

Actually... swsusp should be usable for most of this... But swsusp will
not work in a bad state, and I guess that's a showstopper.
 
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: Total system lockup with Alt-SysRQ-L
  2001-12-24  8:37   ` Russell King
                       ` (2 preceding siblings ...)
  2001-12-24 14:27     ` M. Edward (Ed) Borasky
@ 2001-12-28  1:00     ` David Woodhouse
  3 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2001-12-28  1:00 UTC (permalink / raw)
  To: Russell King; +Cc: linux-kernel


rmk@arm.linux.org.uk said:
>  Ok, can someone explain *why* it is desirable to attempt to kill pid1
> given that doing so will completely lockup the machine?  (should we
> rename it to "Lockup" instead of "killalL"? 8) 

It's not. I believe SysRq-L was implemented while Linux would still exhibit 
sane behaviour upon pid1 dying, and was never removed when the current 
brokenness was introduced.

--
dwmw2


