[RFC] Change signal used to exit scsi error handlers

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC] Change signal used to exit scsi error handlers
@ 2003-01-01 21:05 Willem Riede
  2003-01-08 22:53 ` Willem Riede
  0 siblings, 1 reply; 5+ messages in thread
From: Willem Riede @ 2003-01-01 21:05 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 4078 bytes --]

I earlier reported, that the error handler for ide-scsi exits prematurely if modprobed
from rc.sysinit. I put in some debug prints to apprehend the culprit responsible for
sending the SIGHUP signal that causes the exit.

This is what my log captured:

Jan  1 12:20:13 fallguy kernel: Process 223 [modprobe] starting scsi error handler
Jan  1 12:20:13 fallguy kernel: Wake up parent of scsi_eh_2, pid 224
Jan  1 12:20:13 fallguy kernel: Signals pending for scsi_eh_2: 00000000 00000000
Jan  1 12:20:13 fallguy kernel: Error handler scsi_eh_2 sleeping
Jan  1 12:20:13 fallguy kernel: scsi2 : SCSI host adapter emulation for IDE ATAPI devices
[detected devices skipped]
Jan  1 12:20:14 fallguy kernel: Signal 15 sent from 181 [rc.sysinit] to 182 [getkey]
Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 22 [init]
Jan  1 12:20:14 fallguy kernel: Signal 18 sent from 22 [init] to 22 [init]
Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 22 [init]
Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 24 [initlog]
Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 78 [khubd]
Jan  1 12:20:14 fallguy kernel: Signal 1 sent from 22 [init] to 224 [scsi_eh_2]
Jan  1 12:20:14 fallguy kernel: Signals pending for scsi_eh_2: 00000001 00000000
Jan  1 12:20:14 fallguy kernel: Error handler scsi_eh_2 exiting

Here is a snapshot of some processes made during rc.sysinit:

  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
100     0     1     0  15   0  1332  420 schedu S    ?          0:05 init
...
040     0    22     1  16   0  1332  388 wait4  S    tty1       0:00 init
000     0    23    22  15   0  4116 1316 wait4  S    tty1       0:00 /bin/bash /
040     0    24    23  16   0  2160 1364 schedu S    tty1       0:00 /sbin/initl
...

Init must have forked to exec bash to exec rc.sysinit which then gets re-executed
through initlog. When rc.sysinit ends, the last thing it does is send that TERM
signal from sub-process 181 to getkey (process 182) -- the 'Signal 15 ...' line 
above.

As the forked init (process 22) exits, it sends a flurry of signals to all surviving
processes created from it. That looks like standard "if I am to die I need to take
all my offspring down with me" behavior -- do you agree?

Since we want error handlers to survive, IMHO that means that the choice of signal
for error handler exit is unfortunate. The source of scsi_error suggests SIGPWR
might be a worthy alternative. I think that is true. From inspecting init source,
it is not capable of sending SIGPWR. SIGPWR should never be sent by dying processes
(its sole use should be from a power daemon _to_ init to shut the system down when
the juice is running out).

So I suggest the following changes to hosts.c and scsi_error.c:

--- drivers/scsi/hosts.c	Tue Dec 24 09:59:30 2002
+++ /home/wriede/develop/hosts.c	Wed Jan  1 15:09:05 2003
@@ -337,7 +337,7 @@
 	if (shost->ehandler) {
 		DECLARE_MUTEX_LOCKED(sem);
 		shost->eh_notify = &sem;
-		send_sig(SIGHUP, shost->ehandler, 1);
+		send_sig(SIGPWR, shost->ehandler, 1);
 		down(&sem);
 		shost->eh_notify = NULL;
 	}

--- drivers/scsi/scsi_error.c	Tue Dec 24 09:59:30 2002
+++ /home/wriede/develop/scsi_error.c	Wed Jan  1 15:21:46 2003
@@ -52,8 +52,12 @@
  * go to single-user mode.  For that matter, init also sends SIGKILL,
  * so we mustn't enable that one either.  We use SIGHUP instead.  Other
  * options would be SIGPWR, I suppose.
+ *
+ * Changed behavior 1/1/2003 - it turns out, that SIGHUP can get sent
+ * to error handlers from a process responsible for their creation.
+ * To sidestep that issue, we now use SIGPWR as suggested above.
  */
-#define SHUTDOWN_SIGS	(sigmask(SIGHUP))
+#define SHUTDOWN_SIGS	(sigmask(SIGPWR))
 
 #ifdef DEBUG
 #define SENSE_TIMEOUT SCSI_TIMEOUT

Seperatly, I'd like to suggest improving the debug printout associated with the
error handler process.

Full diffs against 2.5.53 attached. If accepted, they need to go in 2.4.x too,
as I have confirmed, that the same problem exists there.

Comments, please. Willem Riede.

[-- Attachment #2: hosts.patch --]
[-- Type: text/plain, Size: 340 bytes --]

--- drivers/scsi/hosts.c	Tue Dec 24 09:59:30 2002
+++ /home/wriede/develop/hosts.c	Wed Jan  1 15:09:05 2003
@@ -337,7 +337,7 @@
 	if (shost->ehandler) {
 		DECLARE_MUTEX_LOCKED(sem);
 		shost->eh_notify = &sem;
-		send_sig(SIGHUP, shost->ehandler, 1);
+		send_sig(SIGPWR, shost->ehandler, 1);
 		down(&sem);
 		shost->eh_notify = NULL;
 	}

[-- Attachment #3: scsi_error.patch --]
[-- Type: text/plain, Size: 1777 bytes --]

--- drivers/scsi/scsi_error.c	Tue Dec 24 09:59:30 2002
+++ /home/wriede/develop/scsi_error.c	Wed Jan  1 15:21:46 2003
@@ -52,8 +52,12 @@
  * go to single-user mode.  For that matter, init also sends SIGKILL,
  * so we mustn't enable that one either.  We use SIGHUP instead.  Other
  * options would be SIGPWR, I suppose.
+ *
+ * Changed behavior 1/1/2003 - it turns out, that SIGHUP can get sent
+ * to error handlers from a process responsible for their creation.
+ * To sidestep that issue, we now use SIGPWR as suggested above.
  */
-#define SHUTDOWN_SIGS	(sigmask(SIGHUP))
+#define SHUTDOWN_SIGS	(sigmask(SIGPWR))
 
 #ifdef DEBUG
 #define SENSE_TIMEOUT SCSI_TIMEOUT
@@ -1619,7 +1623,7 @@
 	/*
 	 * Wake up the thread that created us.
 	 */
-	SCSI_LOG_ERROR_RECOVERY(3, printk("Wake up parent \n"));
+	SCSI_LOG_ERROR_RECOVERY(3, printk("Wake up parent of scsi_eh_%d\n",shost->host_no));
 
 	up(shost->eh_notify);
 
@@ -1629,7 +1633,7 @@
 		 * away and die.  This typically happens if the user is
 		 * trying to unload a module.
 		 */
-		SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler sleeping\n"));
+		SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d sleeping\n",shost->host_no));
 
 		/*
 		 * Note - we always use down_interruptible with the semaphore
@@ -1644,7 +1648,7 @@
 		if (signal_pending(current))
 			break;
 
-		SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler waking up\n"));
+		SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d waking up\n",shost->host_no));
 
 		shost->eh_active = 1;
 
@@ -1672,7 +1676,7 @@
 
 	}
 
-	SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler exiting\n"));
+	SCSI_LOG_ERROR_RECOVERY(1, printk("Error handler scsi_eh_%d exiting\n",shost->host_no));
 
 	/*
 	 * Make sure that nobody tries to wake us up again.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Change signal used to exit scsi error handlers
  2003-01-01 21:05 [RFC] Change signal used to exit scsi error handlers Willem Riede
@ 2003-01-08 22:53 ` Willem Riede
  2003-01-08 23:36   ` Mike Anderson
  2003-01-09  0:50   ` Alan Cox
  0 siblings, 2 replies; 5+ messages in thread
From: Willem Riede @ 2003-01-08 22:53 UTC (permalink / raw)
  To: linux-scsi; +Cc: andmike

On 2003.01.01 16:05 Willem Riede wrote:
> I earlier reported, that the error handler for ide-scsi exits prematurely if modprobed
> from rc.sysinit. I put in some debug prints to apprehend the culprit responsible for
> sending the SIGHUP signal that causes the exit.
> 
[snip]
> 
> Since we want error handlers to survive, IMHO that means that the choice of signal
> for error handler exit is unfortunate. The source of scsi_error suggests SIGPWR
> might be a worthy alternative. I think that is true. From inspecting init source,
> it is not capable of sending SIGPWR. SIGPWR should never be sent by dying processes
> (its sole use should be from a power daemon _to_ init to shut the system down when
> the juice is running out).
> 
So nobody has any comments? But who decides whether to make this change?
>From the source it appears that the last person to touch scsi_error.c and hosts.c
is Mike Anderson. Does that make you the defacto maintainer, Mike?

Thanks, Willem Riede.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Change signal used to exit scsi error handlers
  2003-01-08 22:53 ` Willem Riede
@ 2003-01-08 23:36   ` Mike Anderson
  2003-01-09  0:48     ` Alan Cox
  2003-01-09  0:50   ` Alan Cox
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Anderson @ 2003-01-08 23:36 UTC (permalink / raw)
  To: Willem Riede; +Cc: linux-scsi

Willem Riede [wrlk@riede.org] wrote:
> On 2003.01.01 16:05 Willem Riede wrote:
> > I earlier reported, that the error handler for ide-scsi exits prematurely if modprobed
> > from rc.sysinit. I put in some debug prints to apprehend the culprit responsible for
> > sending the SIGHUP signal that causes the exit.
> > 
> [snip]
> > 
> > Since we want error handlers to survive, IMHO that means that the choice of signal
> > for error handler exit is unfortunate. The source of scsi_error suggests SIGPWR
> > might be a worthy alternative. I think that is true. From inspecting init source,
> > it is not capable of sending SIGPWR. SIGPWR should never be sent by dying processes
> > (its sole use should be from a power daemon _to_ init to shut the system down when
> > the juice is running out).
> > 
> So nobody has any comments? But who decides whether to make this change?
> >From the source it appears that the last person to touch scsi_error.c and hosts.c
> is Mike Anderson. Does that make you the defacto maintainer, Mike?

Sorry about no reply I am just back from a very long time out of the
office and I am just catching up.

I do not know if last updates give me maintainership, but I will give
my $.02.

The change looks reasonable to switch to another signal to avoid the
problem. It is unclear why the comment mentioned only SIGPWR as the only
alternative. It would think that SIGUSR1, SIGUSR2, etc. would also work
maybe someone else on this list or linux-kernel would know why.

The logging change looks good.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Change signal used to exit scsi error handlers
  2003-01-08 23:36   ` Mike Anderson
@ 2003-01-09  0:48     ` Alan Cox
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Cox @ 2003-01-09  0:48 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Willem Riede, linux-scsi

On Wed, 2003-01-08 at 23:36, Mike Anderson wrote:
> The change looks reasonable to switch to another signal to avoid the
> problem. It is unclear why the comment mentioned only SIGPWR as the only
> alternative. It would think that SIGUSR1, SIGUSR2, etc. would also work
> maybe someone else on this list or linux-kernel would know why.

Pete Zaitcev figured it out - it also breaks other stuff. Seems to be a
bug caused by NPTL - looks like Daemonize() needs updating and has not
been.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Change signal used to exit scsi error handlers
  2003-01-08 22:53 ` Willem Riede
  2003-01-08 23:36   ` Mike Anderson
@ 2003-01-09  0:50   ` Alan Cox
  1 sibling, 0 replies; 5+ messages in thread
From: Alan Cox @ 2003-01-09  0:50 UTC (permalink / raw)
  To: wrlk; +Cc: linux-scsi, andmike

On Wed, 2003-01-08 at 22:53, Willem Riede wrote:
> So nobody has any comments? But who decides whether to make this change?
> >From the source it appears that the last person to touch scsi_error.c and hosts.c
> is Mike Anderson. Does that make you the defacto maintainer, Mike?

People were occupied with important things - like finding out *why* it broke


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-01-09  0:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-01-01 21:05 [RFC] Change signal used to exit scsi error handlers Willem Riede
2003-01-08 22:53 ` Willem Riede
2003-01-08 23:36   ` Mike Anderson
2003-01-09  0:48     ` Alan Cox
2003-01-09  0:50   ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox