Re: 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recoverythread

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ishikawa <ishikawa@yk.rim.or.jp>
Cc: Borsenkow Andrej <Andrej.Borsenkow@mow.siemens.ru>,
	Andries.Brouwer@cwi.nl,
	linux-kernel list <linux-kernel@vger.kernel.org>,
	linux-scsi@vger.kernel.org,
	Juan Quintela <quintela@mandrakesoft.com>
Subject: Re: 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recoverythread
Date: Sun, 30 Jun 2002 03:10:22 +0900	[thread overview]
Message-ID: <3D1DF80E.EF62AADE@yk.rim.or.jp> (raw)
In-Reply-To: 1023638656.3261.9.camel@localhost.localdomain


I noticed the following patch posted by Borsenkow Andrej.

Is this being picked up by the kernel maintainers?
(How can I check if such patch is being picked up?
Is there an up-to-date bleeding edge patched source tree?)

As someone who has been bitten a few times with
similar problems (including "reset storm") 
related to SCSI error handler deficiencies,
I would like to see the patch go in.
[Sorry, I am so scared these days to connect
an old Seagate drive that is known to cause trouble
due to a media error or two
to my current PC. 
When it killed linux 2.3.1xx, solaris
7 kept on handling the situation gracefully 
to my dismay/surprise. From what I understand,
the situation has not improved very much practically, 
and this post is one of the leading improvements 
to error handler.

Surely I hope the 2.5 kernel series would
have revampled SCSI subsystem with better readability
so that more people including me can understand the
code and contribute. I tried back when I noticed the
problem (and much earlier with other SCSI problems)
but the twisted code kept me from understanding the
scsi subsystem completely.

 From: Borsenkow Andrej <Andrej.Borsenkow@mow.siemens.ru>
> > 
> >     It looks like a media went corrupted so one expects some reasonable
> >     error message. What happens is I get endless loop in SCSI error recovery
> >     thread. The only way to clear up situation is to reboot
> > 
> > That is the normal situation. I have never seen the SCSI error handler
> > do anything else but bring down the system. Commenting it out
> > really increases the stability of Linux (on my hardware).
> > 
> 
> MAINTAINERS:
> 
> SCSI SUBSYSTEM
> L:      linux-scsi@vger.kernel.org
> S:      Unmaintained
> 
> Oh, well ...
> 
> 
> > Hmm. It is clear what happens.
> > You are in scsi_error.c:scsi_send_eh_cmnd() and send a command,
> > the return status is NEEDS_RETRY, and the same command is sent again. Etc.
> > 
> > Now retrying is evil, and trying more than three times almost always
> > stupid, so if you add in this function
> > 
> >       int retry_count = 0;
> > 
> > and
> > 
> >     retry:
> >       if (++retry_count > 3) {
> >               SCpnt->eh_state = FAILED;
> >               return;
> >       }
> > 
> > that would surely be an improvement locally.
> > Globally one wonders: there is a retry count already - why isn't it
> > honoured by everybody involved? (But it isnt - it is only used in
> > scsi_decide_disposition().)
> >
> 
> Which actually looks like an oversight ... and comments in unjam_host
> loop even indicate this must be used.
> 
> Anyway, returning FAILED does not look right here. As long as I
> understand what happens, FAILED means hard error (like failing to
> communicate with device) while here we have more or less "soft" error.
> So I decided to copy relevant bits from scsi_decide_disposition into
> scsi_eh_completed_normally which resulted in pretty nice overall error
> handling:
> 
> Jun  9 19:30:45 localhost kernel: Activating command for device 4 (1)
> Jun  9 19:30:45 localhost kernel: Doing sd request, dev = 0x804, block = 944448
> Jun  9 19:30:45 localhost kernel: sda : real dev = /dev/0, block = 944480
> Jun  9 19:30:45 localhost kernel: sda : reading 8/8 512 byte blocks.
> Jun  9 19:30:45 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13964)
> Jun  9 19:30:45 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cfe5a058, buffer = ca4d4000, 
> Jun  9 19:30:45 localhost kernel: bufflen = 4096, done = d2813b84)
> Jun  9 19:30:45 localhost kernel: queuecommand : routine at d2a38292
> Jun  9 19:30:45 localhost kernel: leaving scsi_dispatch_cmnd()
> Jun  9 19:30:47 localhost kernel: Clearing timer for command cfe5a000 1
> Jun  9 19:30:47 localhost kernel: Command failed cfe5a000 2 active=1 busy=1 failed=0
> Jun  9 19:30:47 localhost kernel: bh08:04: old sense key None
> Jun  9 19:30:47 localhost kernel: Non-extended sense class 0 code 0x0
> Jun  9 19:30:47 localhost kernel: Waking error handler thread (-1)
> Jun  9 19:30:47 localhost kernel: Error handler waking up
> Jun  9 19:30:47 localhost kernel: scsi_unjam_host: Checking to see if we need to request sense
> Jun  9 19:30:47 localhost kernel: scsi_unjam_host: Requesting sense for 4
> Jun  9 19:30:47 localhost kernel: Adding timer for command cfe5a000 at 1000 (d2a13af8)
> Jun  9 19:30:47 localhost kernel: In eh_done cfe5a000 result:0
> Jun  9 19:30:47 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:47 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> Jun  9 19:30:47 localhost kernel: Sense requested for cfe5a000 - result 2
> Jun  9 19:30:47 localhost kernel: Info fld=0xe6960, Current bh08:04: sense key Medium Error
> Jun  9 19:30:47 localhost kernel: Additional sense indicates Address mark not found for data field
> Jun  9 19:30:47 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:49 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:49 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:49 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  9 19:30:49 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:51 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:51 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:51 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  9 19:30:51 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:53 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:53 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:53 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  9 19:30:53 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:55 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:55 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:55 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> Jun  9 19:30:55 localhost kernel: Total of 0+0 commands on 0 devices require eh work
> Jun  9 19:30:55 localhost kernel: Adding timer for command cfe5a000 at 100 (d2a13af8)
> Jun  9 19:30:55 localhost kernel: Clearing timer for command cfe5a000 1
> Jun  9 19:30:55 localhost kernel: scsi_error.c: Waking up host to restart
> Jun  9 19:30:55 localhost kernel: Error handler sleeping
> Jun  9 19:30:55 localhost kernel: Command finished 1 0 0x2
> Jun  9 19:30:55 localhost kernel: Notifying upper driver of completion for device 4 8000002
> Jun  9 19:30:55 localhost kernel: sda : rw_intr(0, 8000002 [f0 3])
> Jun  9 19:30:55 localhost kernel: scsi0: ERROR on channel 0, id 4, lun 0, CDB: Read (10) 00 00 0e 69 60 00 00 08 00 
> Jun  9 19:30:55 localhost kernel: Info fld=0xe6960, Current sd08:04: sense key Medium Error
> Jun  9 19:30:55 localhost kernel: Additional sense indicates Address mark not found for data field
> Jun  9 19:30:55 localhost kernel: I/O error: dev 08:04, sector 944448
> Jun  9 19:30:55 localhost kernel: Deactivating command for device 4 (active=0, failed=0)
> Jun  9 19:30:55 localhost kernel: Open returning 1
> Jun  9 19:30:55 localhost kernel: Trying ioctl with scsi command 30
> Jun  9 19:30:55 localhost kernel: scsi_do_req (host = 0, channel = 0 target = 4, buffer =00000000, bufflen = 0, done = d2a0f088, timeout = 1000, retries = 5)
> Jun  9 19:30:55 localhost kernel: command : 1e  00  00  00  00  00  
> Jun  9 19:30:55 localhost kernel: Activating command for device 4 (1)
> Jun  9 19:30:55 localhost kernel: Leaving scsi_init_cmd_from_req()
> Jun  9 19:30:55 localhost kernel: Adding timer for command cfe5a000 at 1000 (d2a13964)
> Jun  9 19:30:55 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cfe5a058, buffer = 00000000, 
> Jun  9 19:30:55 localhost kernel: bufflen = 0, done = d2a0f088)
> Jun  9 19:30:55 localhost kernel: queuecommand : routine at d2a38292
> Jun  9 19:30:55 localhost kernel: leaving scsi_dispatch_cmnd()
> Jun  9 19:30:55 localhost kernel: Leaving scsi_do_req()
> Jun  9 19:30:55 localhost kernel: Clearing timer for command cfe5a000 1
> Jun  9 19:30:55 localhost kernel: Command finished 1 0 0x0
> Jun  9 19:30:55 localhost kernel: Notifying upper driver of completion for device 4 0
> Jun  9 19:30:55 localhost kernel: Deactivating command for device 4 (active=0, failed=0)
> Jun  9 19:30:55 localhost kernel: Ioctl returned  0x0
> Jun  9 19:30:55 localhost kernel: IOCTL Releasing command
> 
>  
> > Where did we get this NEEDS_RETRY in the first place?
> > Well, scsi_check_sense() will return NEEDS_RETRY for MEDIUM_ERROR.
> > (Bad: the hardware also did retry many times already; it is usually very
> > counterproductive to keep on hitting the same bad spot on the media.)
> >
> 
> I do not dare to touch it. Whoever wrote the code must have some reasons
> behind. Still, the obvious patch the shuts at least this code path
> follows. 
> 
> -andrej
> 
> > There are also other code paths that will loop, but this one
> > looks most likely here.
> > 
> > Andries
> > 
> 
> --- linux-2.4.18-18mdk/drivers/scsi/scsi_error.c.scsi-eh-timeout        Thu May 30 16:22:37 2002
> +++ linux-2.4.18-18mdk/drivers/scsi/scsi_error.c        Sun Jun  9 19:18:11 2002
> @@ -1100,6 +1100,8 @@
>   */
>  STATIC int scsi_eh_completed_normally(Scsi_Cmnd * SCpnt)
>  {
> +       int rtn;
> +
>         /*
>          * First check the host byte, to see if there is anything in there
>          * that would indicate what we need to do.
> @@ -1113,14 +1115,18 @@
>                          * otherwise we just flag it as success.
>                          */
>                         SCpnt->flags &= ~IS_RESETTING;
> -                       return NEEDS_RETRY;
> +                       goto maybe_retry;
>                 }
>                 /*
>                  * Rats.  We are already in the error handler, so we now get to try
>                  * and figure out what to do next.  If the sense is valid, we have
>                  * a pretty good idea of what to do.  If not, we mark it as failed.
>                  */
> -               return scsi_check_sense(SCpnt);
> +               rtn = scsi_check_sense(SCpnt);
> +               if (rtn == NEEDS_RETRY) {
> +                       goto maybe_retry;
> +               }
> +               return rtn;
>         }
>         if (host_byte(SCpnt->result) != DID_OK) {
>                 return FAILED;
> @@ -1139,7 +1145,11 @@
>         case COMMAND_TERMINATED:
>                 return SUCCESS;
>         case CHECK_CONDITION:
> -               return scsi_check_sense(SCpnt);
> +               rtn = scsi_check_sense(SCpnt);
> +               if (rtn == NEEDS_RETRY) {
> +                       goto maybe_retry;
> +               }
> +               return rtn;
>         case CONDITION_GOOD:
>         case INTERMEDIATE_GOOD:
>         case INTERMEDIATE_C_GOOD:
> @@ -1154,6 +1164,17 @@
>                 return FAILED;
>         }
>         return FAILED;
> +
> +      maybe_retry:
> +
> +       if ((++SCpnt->retries) < SCpnt->allowed) {
> +               return NEEDS_RETRY;
> +       } else {
> +                /*
> +                 * No more retries - report this one back to upper level.
> +                 */
> +               return SUCCESS;
> +       }
>  }
>  
>  /*
> -

WARNING: multiple messages have this Message-ID (diff)

From: Ishikawa <ishikawa@yk.rim.or.jp>
To: unlisted-recipients:; (no To-header on input)
Cc: Borsenkow Andrej <Andrej.Borsenkow@mow.siemens.ru>,
	Andries.Brouwer@cwi.nl,
	linux-kernel list <linux-kernel@vger.kernel.org>,
	linux-scsi@vger.kernel.org,
	Juan Quintela <quintela@mandrakesoft.com>
Subject: Re: 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recoverythread
Date: Sun, 30 Jun 2002 03:10:22 +0900	[thread overview]
Message-ID: <3D1DF80E.EF62AADE@yk.rim.or.jp> (raw)
In-Reply-To: 1023638656.3261.9.camel@localhost.localdomain


I noticed the following patch posted by Borsenkow Andrej.

Is this being picked up by the kernel maintainers?
(How can I check if such patch is being picked up?
Is there an up-to-date bleeding edge patched source tree?)

As someone who has been bitten a few times with
similar problems (including "reset storm") 
related to SCSI error handler deficiencies,
I would like to see the patch go in.
[Sorry, I am so scared these days to connect
an old Seagate drive that is known to cause trouble
due to a media error or two
to my current PC. 
When it killed linux 2.3.1xx, solaris
7 kept on handling the situation gracefully 
to my dismay/surprise. From what I understand,
the situation has not improved very much practically, 
and this post is one of the leading improvements 
to error handler.

Surely I hope the 2.5 kernel series would
have revampled SCSI subsystem with better readability
so that more people including me can understand the
code and contribute. I tried back when I noticed the
problem (and much earlier with other SCSI problems)
but the twisted code kept me from understanding the
scsi subsystem completely.

 From: Borsenkow Andrej <Andrej.Borsenkow@mow.siemens.ru>
> > 
> >     It looks like a media went corrupted so one expects some reasonable
> >     error message. What happens is I get endless loop in SCSI error recovery
> >     thread. The only way to clear up situation is to reboot
> > 
> > That is the normal situation. I have never seen the SCSI error handler
> > do anything else but bring down the system. Commenting it out
> > really increases the stability of Linux (on my hardware).
> > 
> 
> MAINTAINERS:
> 
> SCSI SUBSYSTEM
> L:      linux-scsi@vger.kernel.org
> S:      Unmaintained
> 
> Oh, well ...
> 
> 
> > Hmm. It is clear what happens.
> > You are in scsi_error.c:scsi_send_eh_cmnd() and send a command,
> > the return status is NEEDS_RETRY, and the same command is sent again. Etc.
> > 
> > Now retrying is evil, and trying more than three times almost always
> > stupid, so if you add in this function
> > 
> >       int retry_count = 0;
> > 
> > and
> > 
> >     retry:
> >       if (++retry_count > 3) {
> >               SCpnt->eh_state = FAILED;
> >               return;
> >       }
> > 
> > that would surely be an improvement locally.
> > Globally one wonders: there is a retry count already - why isn't it
> > honoured by everybody involved? (But it isnt - it is only used in
> > scsi_decide_disposition().)
> >
> 
> Which actually looks like an oversight ... and comments in unjam_host
> loop even indicate this must be used.
> 
> Anyway, returning FAILED does not look right here. As long as I
> understand what happens, FAILED means hard error (like failing to
> communicate with device) while here we have more or less "soft" error.
> So I decided to copy relevant bits from scsi_decide_disposition into
> scsi_eh_completed_normally which resulted in pretty nice overall error
> handling:
> 
> Jun  9 19:30:45 localhost kernel: Activating command for device 4 (1)
> Jun  9 19:30:45 localhost kernel: Doing sd request, dev = 0x804, block = 944448
> Jun  9 19:30:45 localhost kernel: sda : real dev = /dev/0, block = 944480
> Jun  9 19:30:45 localhost kernel: sda : reading 8/8 512 byte blocks.
> Jun  9 19:30:45 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13964)
> Jun  9 19:30:45 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cfe5a058, buffer = ca4d4000, 
> Jun  9 19:30:45 localhost kernel: bufflen = 4096, done = d2813b84)
> Jun  9 19:30:45 localhost kernel: queuecommand : routine at d2a38292
> Jun  9 19:30:45 localhost kernel: leaving scsi_dispatch_cmnd()
> Jun  9 19:30:47 localhost kernel: Clearing timer for command cfe5a000 1
> Jun  9 19:30:47 localhost kernel: Command failed cfe5a000 2 active=1 busy=1 failed=0
> Jun  9 19:30:47 localhost kernel: bh08:04: old sense key None
> Jun  9 19:30:47 localhost kernel: Non-extended sense class 0 code 0x0
> Jun  9 19:30:47 localhost kernel: Waking error handler thread (-1)
> Jun  9 19:30:47 localhost kernel: Error handler waking up
> Jun  9 19:30:47 localhost kernel: scsi_unjam_host: Checking to see if we need to request sense
> Jun  9 19:30:47 localhost kernel: scsi_unjam_host: Requesting sense for 4
> Jun  9 19:30:47 localhost kernel: Adding timer for command cfe5a000 at 1000 (d2a13af8)
> Jun  9 19:30:47 localhost kernel: In eh_done cfe5a000 result:0
> Jun  9 19:30:47 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:47 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> Jun  9 19:30:47 localhost kernel: Sense requested for cfe5a000 - result 2
> Jun  9 19:30:47 localhost kernel: Info fld=0xe6960, Current bh08:04: sense key Medium Error
> Jun  9 19:30:47 localhost kernel: Additional sense indicates Address mark not found for data field
> Jun  9 19:30:47 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:49 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:49 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:49 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  9 19:30:49 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:51 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:51 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:51 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  9 19:30:51 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:53 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:53 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:53 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  9 19:30:53 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
> Jun  9 19:30:55 localhost kernel: In eh_done cfe5a000 result:2
> Jun  9 19:30:55 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
> Jun  9 19:30:55 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> Jun  9 19:30:55 localhost kernel: Total of 0+0 commands on 0 devices require eh work
> Jun  9 19:30:55 localhost kernel: Adding timer for command cfe5a000 at 100 (d2a13af8)
> Jun  9 19:30:55 localhost kernel: Clearing timer for command cfe5a000 1
> Jun  9 19:30:55 localhost kernel: scsi_error.c: Waking up host to restart
> Jun  9 19:30:55 localhost kernel: Error handler sleeping
> Jun  9 19:30:55 localhost kernel: Command finished 1 0 0x2
> Jun  9 19:30:55 localhost kernel: Notifying upper driver of completion for device 4 8000002
> Jun  9 19:30:55 localhost kernel: sda : rw_intr(0, 8000002 [f0 3])
> Jun  9 19:30:55 localhost kernel: scsi0: ERROR on channel 0, id 4, lun 0, CDB: Read (10) 00 00 0e 69 60 00 00 08 00 
> Jun  9 19:30:55 localhost kernel: Info fld=0xe6960, Current sd08:04: sense key Medium Error
> Jun  9 19:30:55 localhost kernel: Additional sense indicates Address mark not found for data field
> Jun  9 19:30:55 localhost kernel: I/O error: dev 08:04, sector 944448
> Jun  9 19:30:55 localhost kernel: Deactivating command for device 4 (active=0, failed=0)
> Jun  9 19:30:55 localhost kernel: Open returning 1
> Jun  9 19:30:55 localhost kernel: Trying ioctl with scsi command 30
> Jun  9 19:30:55 localhost kernel: scsi_do_req (host = 0, channel = 0 target = 4, buffer =00000000, bufflen = 0, done = d2a0f088, timeout = 1000, retries = 5)
> Jun  9 19:30:55 localhost kernel: command : 1e  00  00  00  00  00  
> Jun  9 19:30:55 localhost kernel: Activating command for device 4 (1)
> Jun  9 19:30:55 localhost kernel: Leaving scsi_init_cmd_from_req()
> Jun  9 19:30:55 localhost kernel: Adding timer for command cfe5a000 at 1000 (d2a13964)
> Jun  9 19:30:55 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cfe5a058, buffer = 00000000, 
> Jun  9 19:30:55 localhost kernel: bufflen = 0, done = d2a0f088)
> Jun  9 19:30:55 localhost kernel: queuecommand : routine at d2a38292
> Jun  9 19:30:55 localhost kernel: leaving scsi_dispatch_cmnd()
> Jun  9 19:30:55 localhost kernel: Leaving scsi_do_req()
> Jun  9 19:30:55 localhost kernel: Clearing timer for command cfe5a000 1
> Jun  9 19:30:55 localhost kernel: Command finished 1 0 0x0
> Jun  9 19:30:55 localhost kernel: Notifying upper driver of completion for device 4 0
> Jun  9 19:30:55 localhost kernel: Deactivating command for device 4 (active=0, failed=0)
> Jun  9 19:30:55 localhost kernel: Ioctl returned  0x0
> Jun  9 19:30:55 localhost kernel: IOCTL Releasing command
> 
>  
> > Where did we get this NEEDS_RETRY in the first place?
> > Well, scsi_check_sense() will return NEEDS_RETRY for MEDIUM_ERROR.
> > (Bad: the hardware also did retry many times already; it is usually very
> > counterproductive to keep on hitting the same bad spot on the media.)
> >
> 
> I do not dare to touch it. Whoever wrote the code must have some reasons
> behind. Still, the obvious patch the shuts at least this code path
> follows. 
> 
> -andrej
> 
> > There are also other code paths that will loop, but this one
> > looks most likely here.
> > 
> > Andries
> > 
> 
> --- linux-2.4.18-18mdk/drivers/scsi/scsi_error.c.scsi-eh-timeout        Thu May 30 16:22:37 2002
> +++ linux-2.4.18-18mdk/drivers/scsi/scsi_error.c        Sun Jun  9 19:18:11 2002
> @@ -1100,6 +1100,8 @@
>   */
>  STATIC int scsi_eh_completed_normally(Scsi_Cmnd * SCpnt)
>  {
> +       int rtn;
> +
>         /*
>          * First check the host byte, to see if there is anything in there
>          * that would indicate what we need to do.
> @@ -1113,14 +1115,18 @@
>                          * otherwise we just flag it as success.
>                          */
>                         SCpnt->flags &= ~IS_RESETTING;
> -                       return NEEDS_RETRY;
> +                       goto maybe_retry;
>                 }
>                 /*
>                  * Rats.  We are already in the error handler, so we now get to try
>                  * and figure out what to do next.  If the sense is valid, we have
>                  * a pretty good idea of what to do.  If not, we mark it as failed.
>                  */
> -               return scsi_check_sense(SCpnt);
> +               rtn = scsi_check_sense(SCpnt);
> +               if (rtn == NEEDS_RETRY) {
> +                       goto maybe_retry;
> +               }
> +               return rtn;
>         }
>         if (host_byte(SCpnt->result) != DID_OK) {
>                 return FAILED;
> @@ -1139,7 +1145,11 @@
>         case COMMAND_TERMINATED:
>                 return SUCCESS;
>         case CHECK_CONDITION:
> -               return scsi_check_sense(SCpnt);
> +               rtn = scsi_check_sense(SCpnt);
> +               if (rtn == NEEDS_RETRY) {
> +                       goto maybe_retry;
> +               }
> +               return rtn;
>         case CONDITION_GOOD:
>         case INTERMEDIATE_GOOD:
>         case INTERMEDIATE_C_GOOD:
> @@ -1154,6 +1164,17 @@
>                 return FAILED;
>         }
>         return FAILED;
> +
> +      maybe_retry:
> +
> +       if ((++SCpnt->retries) < SCpnt->allowed) {
> +               return NEEDS_RETRY;
> +       } else {
> +                /*
> +                 * No more retries - report this one back to upper level.
> +                 */
> +               return SUCCESS;
> +       }
>  }
>  
>  /*
> -

next prev parent reply	other threads:[~2002-06-29 18:10 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-06-08 22:50 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread Andries.Brouwer
2002-06-09 16:04 ` Borsenkow Andrej
2002-06-09 16:04 ` Borsenkow Andrej
2002-06-29 18:10   ` Ishikawa [this message]
2002-06-29 18:10     ` 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recoverythread Ishikawa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D1DF80E.EF62AADE@yk.rim.or.jp \
    --to=ishikawa@yk.rim.or.jp \
    --cc=Andrej.Borsenkow@mow.siemens.ru \
    --cc=Andries.Brouwer@cwi.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=quintela@mandrakesoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.