public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread
@ 2002-06-08 20:55 Borsenkow Andrej
  2002-06-08 20:58 ` Borsenkow Andrej
  0 siblings, 1 reply; 4+ messages in thread
From: Borsenkow Andrej @ 2002-06-08 20:55 UTC (permalink / raw)
  To: linux-kernel list


It looks like a media went corrupted so one expects some reasonable
error message. What happens is I get endless loop in SCSI error recovery
thread. The only way to clear up situation is to reboot and even then it
hangs on umount so I have to do SysRq-B. Please, what additional
information can be provided; I hope I can keep this media intact for
some time for testing :-)

the kernel is 2.4.18-18mdk based on 2.4.1-pre9.

I am not on lkml so please Cc me. Thank you.

-andrej


{pts/0}% cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: iomega   Model: jaz 2GB          Rev: E.17
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: PHILIPS  Model: CDD3610 CD-R/RW  Rev: 3.09
  Type:   CD-ROM                           ANSI SCSI revision: 02
{pts/0}% cat /proc/scsi/ppa/0
Version : 2.07 (for Linux 2.4.x)
Parport : parport0
Mode    : EPP 32 bit
{pts/0}% uname -a
Linux localhost.localdomain 2.4.18-18mdkcustom #5 Sat Jun 8 23:25:23 MSD
2002 i686 unknown

SCSI log:

Jun  8 20:49:59 localhost kernel: Activating command for device 4 (1)
Jun  8 20:49:59 localhost kernel: Doing sd request, dev = 0x804, block = 944344
Jun  8 20:49:59 localhost kernel: sda : real dev = /dev/0, block = 944376
Jun  8 20:49:59 localhost kernel: sda : reading 248/248 512 byte blocks.
Jun  8 20:49:59 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12892)
Jun  8 20:49:59 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cd8b1e58, buffer = c009e000, 
Jun  8 20:49:59 localhost kernel: bufflen = 126976, done = d2813b7a)
Jun  8 20:49:59 localhost kernel: queuecommand : routine at d2a2f1e0
Jun  8 20:49:59 localhost kernel: leaving scsi_dispatch_cmnd()
Jun  8 20:50:00 localhost kernel: Clearing timer for command cd8b1e00 1
Jun  8 20:50:00 localhost kernel: Command failed cd8b1e00 2 active=1 busy=1 failed=0
Jun  8 20:50:00 localhost kernel: bh08:04: old sense key None
Jun  8 20:50:00 localhost kernel: Non-extended sense class 0 code 0x0
Jun  8 20:50:00 localhost kernel: Waking error handler thread (-1)
Jun  8 20:50:00 localhost kernel: Error handler waking up
Jun  8 20:50:00 localhost kernel: scsi_unjam_host: Checking to see if we need to request sense
Jun  8 20:50:00 localhost kernel: scsi_unjam_host: Requesting sense for 4
Jun  8 20:50:00 localhost kernel: Adding timer for command cd8b1e00 at 1000 (d2a12a28)
Jun  8 20:50:00 localhost kernel: In eh_done cd8b1e00 result:0
Jun  8 20:50:00 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:00 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
Jun  8 20:50:00 localhost kernel: Sense requested for cd8b1e00 - result 2
Jun  8 20:50:00 localhost kernel: Info fld=0xe6960, Current bh08:04: sense key Medium Error
Jun  8 20:50:00 localhost kernel: Additional sense indicates Address mark not found for data field
Jun  8 20:50:00 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:03 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:03 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:03 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:03 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:06 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:06 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:06 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:06 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:09 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:09 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:09 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:09 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:12 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:12 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:12 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:12 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:15 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:15 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:15 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:15 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:17 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:17 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:17 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:17 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:20 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:20 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:20 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:20 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:23 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:23 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:23 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  8 20:50:23 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
Jun  8 20:50:26 localhost kernel: In eh_done cd8b1e00 result:2
Jun  8 20:50:26 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
Jun  8 20:50:26 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread
  2002-06-08 20:55 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread Borsenkow Andrej
@ 2002-06-08 20:58 ` Borsenkow Andrej
  0 siblings, 0 replies; 4+ messages in thread
From: Borsenkow Andrej @ 2002-06-08 20:58 UTC (permalink / raw)
  To: linux-kernel list

Oh, and when I tried to simply switch off Jaz during this loop I got
panic:

kernel panic: scsi_unjam_host
Miscount of number of failed commands

Not sure if it helps very much ...

-andrej


В Вск, 09.06.2002, в 00:55, Andrej Borsenkow написал:
> 
> It looks like a media went corrupted so one expects some reasonable
> error message. What happens is I get endless loop in SCSI error recovery
> thread. The only way to clear up situation is to reboot and even then it
> hangs on umount so I have to do SysRq-B. Please, what additional
> information can be provided; I hope I can keep this media intact for
> some time for testing :-)
> 
> the kernel is 2.4.18-18mdk based on 2.4.1-pre9.
> 
> I am not on lkml so please Cc me. Thank you.
> 
> -andrej
> 
> 
> {pts/0}% cat /proc/scsi/scsi
> Attached devices:
> Host: scsi0 Channel: 00 Id: 04 Lun: 00
>   Vendor: iomega   Model: jaz 2GB          Rev: E.17
>   Type:   Direct-Access                    ANSI SCSI revision: 02
> Host: scsi1 Channel: 00 Id: 03 Lun: 00
>   Vendor: PHILIPS  Model: CDD3610 CD-R/RW  Rev: 3.09
>   Type:   CD-ROM                           ANSI SCSI revision: 02
> {pts/0}% cat /proc/scsi/ppa/0
> Version : 2.07 (for Linux 2.4.x)
> Parport : parport0
> Mode    : EPP 32 bit
> {pts/0}% uname -a
> Linux localhost.localdomain 2.4.18-18mdkcustom #5 Sat Jun 8 23:25:23 MSD
> 2002 i686 unknown
> 
> SCSI log:
> 
> Jun  8 20:49:59 localhost kernel: Activating command for device 4 (1)
> Jun  8 20:49:59 localhost kernel: Doing sd request, dev = 0x804, block = 944344
> Jun  8 20:49:59 localhost kernel: sda : real dev = /dev/0, block = 944376
> Jun  8 20:49:59 localhost kernel: sda : reading 248/248 512 byte blocks.
> Jun  8 20:49:59 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12892)
> Jun  8 20:49:59 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cd8b1e58, buffer = c009e000, 
> Jun  8 20:49:59 localhost kernel: bufflen = 126976, done = d2813b7a)
> Jun  8 20:49:59 localhost kernel: queuecommand : routine at d2a2f1e0
> Jun  8 20:49:59 localhost kernel: leaving scsi_dispatch_cmnd()
> Jun  8 20:50:00 localhost kernel: Clearing timer for command cd8b1e00 1
> Jun  8 20:50:00 localhost kernel: Command failed cd8b1e00 2 active=1 busy=1 failed=0
> Jun  8 20:50:00 localhost kernel: bh08:04: old sense key None
> Jun  8 20:50:00 localhost kernel: Non-extended sense class 0 code 0x0
> Jun  8 20:50:00 localhost kernel: Waking error handler thread (-1)
> Jun  8 20:50:00 localhost kernel: Error handler waking up
> Jun  8 20:50:00 localhost kernel: scsi_unjam_host: Checking to see if we need to request sense
> Jun  8 20:50:00 localhost kernel: scsi_unjam_host: Requesting sense for 4
> Jun  8 20:50:00 localhost kernel: Adding timer for command cd8b1e00 at 1000 (d2a12a28)
> Jun  8 20:50:00 localhost kernel: In eh_done cd8b1e00 result:0
> Jun  8 20:50:00 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:00 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
> Jun  8 20:50:00 localhost kernel: Sense requested for cd8b1e00 - result 2
> Jun  8 20:50:00 localhost kernel: Info fld=0xe6960, Current bh08:04: sense key Medium Error
> Jun  8 20:50:00 localhost kernel: Additional sense indicates Address mark not found for data field
> Jun  8 20:50:00 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:03 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:03 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:03 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:03 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:06 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:06 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:06 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:06 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:09 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:09 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:09 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:09 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:12 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:12 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:12 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:12 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:15 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:15 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:15 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:15 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:17 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:17 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:17 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:17 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:20 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:20 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:20 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:20 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:23 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:23 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:23 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
> Jun  8 20:50:23 localhost kernel: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
> Jun  8 20:50:26 localhost kernel: In eh_done cd8b1e00 result:2
> Jun  8 20:50:26 localhost kernel: send_eh_cmnd: cd8b1e00 eh_state:2002
> Jun  8 20:50:26 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread
@ 2002-06-08 22:50 Andries.Brouwer
  2002-06-09 16:04 ` Borsenkow Andrej
  0 siblings, 1 reply; 4+ messages in thread
From: Andries.Brouwer @ 2002-06-08 22:50 UTC (permalink / raw)
  To: Andrej.Borsenkow, linux-kernel; +Cc: linux-scsi

    From: Borsenkow Andrej <Andrej.Borsenkow@mow.siemens.ru>

    It looks like a media went corrupted so one expects some reasonable
    error message. What happens is I get endless loop in SCSI error recovery
    thread. The only way to clear up situation is to reboot

That is the normal situation. I have never seen the SCSI error handler
do anything else but bring down the system. Commenting it out
really increases the stability of Linux (on my hardware).

    the kernel is 2.4.18-18mdk based on 2.4.1-pre9.

    Jun  8 20:50:03 localhost: In eh_done cd8b1e00 result:2
    Jun  8 20:50:03 localhost: send_eh_cmnd: cd8b1e00 eh_state:2002
    Jun  8 20:50:03 localhost: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
    Jun  8 20:50:03 localhost: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
    Jun  8 20:50:06 localhost: In eh_done cd8b1e00 result:2
    Jun  8 20:50:06 localhost: send_eh_cmnd: cd8b1e00 eh_state:2002
    Jun  8 20:50:06 localhost: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
    Jun  8 20:50:06 localhost: Adding timer for command cd8b1e00 at 6000 (d2a12a28)
    Jun  8 20:50:09 localhost: In eh_done cd8b1e00 result:2
    Jun  8 20:50:09 localhost: send_eh_cmnd: cd8b1e00 eh_state:2002
    Jun  8 20:50:09 localhost: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
    Jun  8 20:50:09 localhost: Adding timer for command cd8b1e00 at 6000 (d2a12a28)

Hmm. It is clear what happens.
You are in scsi_error.c:scsi_send_eh_cmnd() and send a command,
the return status is NEEDS_RETRY, and the same command is sent again. Etc.

Now retrying is evil, and trying more than three times almost always
stupid, so if you add in this function

	int retry_count = 0;

and

    retry:
	if (++retry_count > 3) {
		SCpnt->eh_state = FAILED;
		return;
	}

that would surely be an improvement locally.
Globally one wonders: there is a retry count already - why isn't it
honoured by everybody involved? (But it isnt - it is only used in
scsi_decide_disposition().)

Where did we get this NEEDS_RETRY in the first place?
Well, scsi_check_sense() will return NEEDS_RETRY for MEDIUM_ERROR.
(Bad: the hardware also did retry many times already; it is usually very
counterproductive to keep on hitting the same bad spot on the media.)

There are also other code paths that will loop, but this one
looks most likely here.

Andries

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread
  2002-06-08 22:50 Andries.Brouwer
@ 2002-06-09 16:04 ` Borsenkow Andrej
  0 siblings, 0 replies; 4+ messages in thread
From: Borsenkow Andrej @ 2002-06-09 16:04 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: linux-kernel list, linux-scsi, Juan Quintela

В Вск, 09.06.2002, в 02:50, Andries.Brouwer@cwi.nl написал:
>     From: Borsenkow Andrej <Andrej.Borsenkow@mow.siemens.ru>
> 
>     It looks like a media went corrupted so one expects some reasonable
>     error message. What happens is I get endless loop in SCSI error recovery
>     thread. The only way to clear up situation is to reboot
> 
> That is the normal situation. I have never seen the SCSI error handler
> do anything else but bring down the system. Commenting it out
> really increases the stability of Linux (on my hardware).
> 

MAINTAINERS:

SCSI SUBSYSTEM
L:      linux-scsi@vger.kernel.org
S:      Unmaintained

Oh, well ...


> Hmm. It is clear what happens.
> You are in scsi_error.c:scsi_send_eh_cmnd() and send a command,
> the return status is NEEDS_RETRY, and the same command is sent again. Etc.
> 
> Now retrying is evil, and trying more than three times almost always
> stupid, so if you add in this function
> 
> 	int retry_count = 0;
> 
> and
> 
>     retry:
> 	if (++retry_count > 3) {
> 		SCpnt->eh_state = FAILED;
> 		return;
> 	}
> 
> that would surely be an improvement locally.
> Globally one wonders: there is a retry count already - why isn't it
> honoured by everybody involved? (But it isnt - it is only used in
> scsi_decide_disposition().)
>

Which actually looks like an oversight ... and comments in unjam_host
loop even indicate this must be used.

Anyway, returning FAILED does not look right here. As long as I
understand what happens, FAILED means hard error (like failing to
communicate with device) while here we have more or less "soft" error.
So I decided to copy relevant bits from scsi_decide_disposition into
scsi_eh_completed_normally which resulted in pretty nice overall error
handling:

Jun  9 19:30:45 localhost kernel: Activating command for device 4 (1)
Jun  9 19:30:45 localhost kernel: Doing sd request, dev = 0x804, block = 944448
Jun  9 19:30:45 localhost kernel: sda : real dev = /dev/0, block = 944480
Jun  9 19:30:45 localhost kernel: sda : reading 8/8 512 byte blocks.
Jun  9 19:30:45 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13964)
Jun  9 19:30:45 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cfe5a058, buffer = ca4d4000, 
Jun  9 19:30:45 localhost kernel: bufflen = 4096, done = d2813b84)
Jun  9 19:30:45 localhost kernel: queuecommand : routine at d2a38292
Jun  9 19:30:45 localhost kernel: leaving scsi_dispatch_cmnd()
Jun  9 19:30:47 localhost kernel: Clearing timer for command cfe5a000 1
Jun  9 19:30:47 localhost kernel: Command failed cfe5a000 2 active=1 busy=1 failed=0
Jun  9 19:30:47 localhost kernel: bh08:04: old sense key None
Jun  9 19:30:47 localhost kernel: Non-extended sense class 0 code 0x0
Jun  9 19:30:47 localhost kernel: Waking error handler thread (-1)
Jun  9 19:30:47 localhost kernel: Error handler waking up
Jun  9 19:30:47 localhost kernel: scsi_unjam_host: Checking to see if we need to request sense
Jun  9 19:30:47 localhost kernel: scsi_unjam_host: Requesting sense for 4
Jun  9 19:30:47 localhost kernel: Adding timer for command cfe5a000 at 1000 (d2a13af8)
Jun  9 19:30:47 localhost kernel: In eh_done cfe5a000 result:0
Jun  9 19:30:47 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
Jun  9 19:30:47 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
Jun  9 19:30:47 localhost kernel: Sense requested for cfe5a000 - result 2
Jun  9 19:30:47 localhost kernel: Info fld=0xe6960, Current bh08:04: sense key Medium Error
Jun  9 19:30:47 localhost kernel: Additional sense indicates Address mark not found for data field
Jun  9 19:30:47 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
Jun  9 19:30:49 localhost kernel: In eh_done cfe5a000 result:2
Jun  9 19:30:49 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
Jun  9 19:30:49 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  9 19:30:49 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
Jun  9 19:30:51 localhost kernel: In eh_done cfe5a000 result:2
Jun  9 19:30:51 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
Jun  9 19:30:51 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  9 19:30:51 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
Jun  9 19:30:53 localhost kernel: In eh_done cfe5a000 result:2
Jun  9 19:30:53 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
Jun  9 19:30:53 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2001
Jun  9 19:30:53 localhost kernel: Adding timer for command cfe5a000 at 6000 (d2a13af8)
Jun  9 19:30:55 localhost kernel: In eh_done cfe5a000 result:2
Jun  9 19:30:55 localhost kernel: send_eh_cmnd: cfe5a000 eh_state:2002
Jun  9 19:30:55 localhost kernel: scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
Jun  9 19:30:55 localhost kernel: Total of 0+0 commands on 0 devices require eh work
Jun  9 19:30:55 localhost kernel: Adding timer for command cfe5a000 at 100 (d2a13af8)
Jun  9 19:30:55 localhost kernel: Clearing timer for command cfe5a000 1
Jun  9 19:30:55 localhost kernel: scsi_error.c: Waking up host to restart
Jun  9 19:30:55 localhost kernel: Error handler sleeping
Jun  9 19:30:55 localhost kernel: Command finished 1 0 0x2
Jun  9 19:30:55 localhost kernel: Notifying upper driver of completion for device 4 8000002
Jun  9 19:30:55 localhost kernel: sda : rw_intr(0, 8000002 [f0 3])
Jun  9 19:30:55 localhost kernel: scsi0: ERROR on channel 0, id 4, lun 0, CDB: Read (10) 00 00 0e 69 60 00 00 08 00 
Jun  9 19:30:55 localhost kernel: Info fld=0xe6960, Current sd08:04: sense key Medium Error
Jun  9 19:30:55 localhost kernel: Additional sense indicates Address mark not found for data field
Jun  9 19:30:55 localhost kernel: I/O error: dev 08:04, sector 944448
Jun  9 19:30:55 localhost kernel: Deactivating command for device 4 (active=0, failed=0)
Jun  9 19:30:55 localhost kernel: Open returning 1
Jun  9 19:30:55 localhost kernel: Trying ioctl with scsi command 30
Jun  9 19:30:55 localhost kernel: scsi_do_req (host = 0, channel = 0 target = 4, buffer =00000000, bufflen = 0, done = d2a0f088, timeout = 1000, retries = 5)
Jun  9 19:30:55 localhost kernel: command : 1e  00  00  00  00  00  
Jun  9 19:30:55 localhost kernel: Activating command for device 4 (1)
Jun  9 19:30:55 localhost kernel: Leaving scsi_init_cmd_from_req()
Jun  9 19:30:55 localhost kernel: Adding timer for command cfe5a000 at 1000 (d2a13964)
Jun  9 19:30:55 localhost kernel: scsi_dispatch_cmnd (host = 0, channel = 0, target = 4, command = cfe5a058, buffer = 00000000, 
Jun  9 19:30:55 localhost kernel: bufflen = 0, done = d2a0f088)
Jun  9 19:30:55 localhost kernel: queuecommand : routine at d2a38292
Jun  9 19:30:55 localhost kernel: leaving scsi_dispatch_cmnd()
Jun  9 19:30:55 localhost kernel: Leaving scsi_do_req()
Jun  9 19:30:55 localhost kernel: Clearing timer for command cfe5a000 1
Jun  9 19:30:55 localhost kernel: Command finished 1 0 0x0
Jun  9 19:30:55 localhost kernel: Notifying upper driver of completion for device 4 0
Jun  9 19:30:55 localhost kernel: Deactivating command for device 4 (active=0, failed=0)
Jun  9 19:30:55 localhost kernel: Ioctl returned  0x0
Jun  9 19:30:55 localhost kernel: IOCTL Releasing command

 
> Where did we get this NEEDS_RETRY in the first place?
> Well, scsi_check_sense() will return NEEDS_RETRY for MEDIUM_ERROR.
> (Bad: the hardware also did retry many times already; it is usually very
> counterproductive to keep on hitting the same bad spot on the media.)
>

I do not dare to touch it. Whoever wrote the code must have some reasons
behind. Still, the obvious patch the shuts at least this code path
follows. 

-andrej

> There are also other code paths that will loop, but this one
> looks most likely here.
> 
> Andries
> 

--- linux-2.4.18-18mdk/drivers/scsi/scsi_error.c.scsi-eh-timeout	Thu May 30 16:22:37 2002
+++ linux-2.4.18-18mdk/drivers/scsi/scsi_error.c	Sun Jun  9 19:18:11 2002
@@ -1100,6 +1100,8 @@
  */
 STATIC int scsi_eh_completed_normally(Scsi_Cmnd * SCpnt)
 {
+	int rtn;
+
 	/*
 	 * First check the host byte, to see if there is anything in there
 	 * that would indicate what we need to do.
@@ -1113,14 +1115,18 @@
 			 * otherwise we just flag it as success.
 			 */
 			SCpnt->flags &= ~IS_RESETTING;
-			return NEEDS_RETRY;
+			goto maybe_retry;
 		}
 		/*
 		 * Rats.  We are already in the error handler, so we now get to try
 		 * and figure out what to do next.  If the sense is valid, we have
 		 * a pretty good idea of what to do.  If not, we mark it as failed.
 		 */
-		return scsi_check_sense(SCpnt);
+		rtn = scsi_check_sense(SCpnt);
+		if (rtn == NEEDS_RETRY) {
+			goto maybe_retry;
+		}
+		return rtn;
 	}
 	if (host_byte(SCpnt->result) != DID_OK) {
 		return FAILED;
@@ -1139,7 +1145,11 @@
 	case COMMAND_TERMINATED:
 		return SUCCESS;
 	case CHECK_CONDITION:
-		return scsi_check_sense(SCpnt);
+		rtn = scsi_check_sense(SCpnt);
+		if (rtn == NEEDS_RETRY) {
+			goto maybe_retry;
+		}
+		return rtn;
 	case CONDITION_GOOD:
 	case INTERMEDIATE_GOOD:
 	case INTERMEDIATE_C_GOOD:
@@ -1154,6 +1164,17 @@
 		return FAILED;
 	}
 	return FAILED;
+
+      maybe_retry:
+
+	if ((++SCpnt->retries) < SCpnt->allowed) {
+		return NEEDS_RETRY;
+	} else {
+                /*
+                 * No more retries - report this one back to upper level.
+                 */
+		return SUCCESS;
+	}
 }
 
 /*

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2002-06-09 16:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-08 20:55 2.4.18-pre9, Iomega Jaz, PPA - endless loop in SCSI recovery thread Borsenkow Andrej
2002-06-08 20:58 ` Borsenkow Andrej
  -- strict thread matches above, loose matches on Subject: below --
2002-06-08 22:50 Andries.Brouwer
2002-06-09 16:04 ` Borsenkow Andrej

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox