All of lore.kernel.org
 help / color / mirror / Atom feed
* SCSI regression in 2.6.27 and Nokia phones
@ 2008-11-18 20:54 Alan Stern
  2008-11-18 22:12 ` James Smart
  0 siblings, 1 reply; 3+ messages in thread
From: Alan Stern @ 2008-11-18 20:54 UTC (permalink / raw)
  To: James Bottomley, Andrew Morton
  Cc: USB Storage list, SCSI development list, stable

Everybody:

Nokia's phones are a real nuisance.  They've got lots of different 
models, with more coming out all the time, and every single one of them 
has a USB firmware bug causing it to report that it has one more sector 
of mass-storage than it really does.

On other operatings systems this isn't terribly noticeable.  But with 
Linux, things like the EFI GUID partitioning code and udev's vol_id 
program look specifically at the last sector to find partition and 
label information.

As it happens, these phones return no data and NO SENSE when asked for
the nonexistent "last" sector.  Up through 2.6.26 this apparently
worked okay.  There would be "Unable to read sector" errors in the
system log, but at least the owner could use the stupid thing.

However in 2.6.27 something changed, and now the SCSI midlayer goes
into an endless retry loop trying to read the invalid sector.  A bunch
of people have noticed and they are unhappy.

For the models we already know about, there are blacklist entries in
usb-storage indicating that the number of sectors is really one less
than what the device claims.  But this is a losing game: Nokia keeps
coming out with more models, and there are plenty of existing ones not
in the blacklist.  See the later comments in Bugzilla #11768 for 
examples.

How should we handle this?  I have already submitted to James a pair of
patches for 2.6.29 which will (taken together) fix the problem.  (They
were submitted two weeks ago, and so far there has not been any
feedback either positive or negative.)  For 2.6.27 and 2.6.28, the
simpler patch below will also break the endless loop.  But this one
hasn't been reviewed and it conflicts with the pair already submitted
-- and I firmly believe those two are the right way to go.

Since patches aren't supposed to go into a stable tree until they are 
in the mainline kernel, the best course forward isn't clear.  What 
should we do?

Alan Stern


------------------------------------------------------------------


Decrement the request's retry count when it is retried without making
any forward progress.  When the count drops below 0, fail the request.  
This prevents infinite retry loops of the sort seen with Nokia phones
when asked to read their nonexistent "last" sector.

Signed-Off-By: Alan Stern <stern@rowland.harvard.edu>

---

Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct 
 	struct request *req = cmd->request;
 	unsigned long flags;
 
+	if (--req->retries < 0) {
+		blk_end_request(req, -EIO, blk_rq_bytes(req));
+		scsi_next_command(cmd);
+		return;
+	}
 	scsi_unprep_request(req);
 	spin_lock_irqsave(q->queue_lock, flags);
 	blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
 				 * leftovers in the front of the
 				 * queue, and goose the queue again.
 				 */
+				if (bytes > 0)		/* Made progress */
+					++req->retries;
 				scsi_requeue_command(q, cmd);
 				cmd = NULL;
 			}


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: SCSI regression in 2.6.27 and Nokia phones
  2008-11-18 20:54 SCSI regression in 2.6.27 and Nokia phones Alan Stern
@ 2008-11-18 22:12 ` James Smart
  2008-11-18 22:37   ` Alan Stern
  0 siblings, 1 reply; 3+ messages in thread
From: James Smart @ 2008-11-18 22:12 UTC (permalink / raw)
  To: Alan Stern
  Cc: James Bottomley, Andrew Morton, USB Storage list,
	SCSI development list, stable@kernel.org

Alan,

It was probably this change:
http://marc.info/?l=linux-scsi&m=122118359130474&w=2

However, I would *not* back it out. It's a silent data corruption if you 
do. Past behavior reset good_bytes to indicate all data was transfered 
and ignored any residual - which meant if no data was transfered, we 
misinform the callee and the callee gets stuck with whatever data is in 
the buffer. Current behavior, the patch, uses residual, and likely falls 
into an i/o retry path.

-- james s


Alan Stern wrote:
> Everybody:
> 
> Nokia's phones are a real nuisance.  They've got lots of different
> models, with more coming out all the time, and every single one of them
> has a USB firmware bug causing it to report that it has one more sector
> of mass-storage than it really does.
> 
> On other operatings systems this isn't terribly noticeable.  But with
> Linux, things like the EFI GUID partitioning code and udev's vol_id
> program look specifically at the last sector to find partition and
> label information.
> 
> As it happens, these phones return no data and NO SENSE when asked for
> the nonexistent "last" sector.  Up through 2.6.26 this apparently
> worked okay.  There would be "Unable to read sector" errors in the
> system log, but at least the owner could use the stupid thing.
> 
> However in 2.6.27 something changed, and now the SCSI midlayer goes
> into an endless retry loop trying to read the invalid sector.  A bunch
> of people have noticed and they are unhappy.
> 
> For the models we already know about, there are blacklist entries in
> usb-storage indicating that the number of sectors is really one less
> than what the device claims.  But this is a losing game: Nokia keeps
> coming out with more models, and there are plenty of existing ones not
> in the blacklist.  See the later comments in Bugzilla #11768 for
> examples.
> 
> How should we handle this?  I have already submitted to James a pair of
> patches for 2.6.29 which will (taken together) fix the problem.  (They
> were submitted two weeks ago, and so far there has not been any
> feedback either positive or negative.)  For 2.6.27 and 2.6.28, the
> simpler patch below will also break the endless loop.  But this one
> hasn't been reviewed and it conflicts with the pair already submitted
> -- and I firmly believe those two are the right way to go.
> 
> Since patches aren't supposed to go into a stable tree until they are
> in the mainline kernel, the best course forward isn't clear.  What
> should we do?
> 
> Alan Stern
> 
> 
> ------------------------------------------------------------------
> 
> 
> Decrement the request's retry count when it is retried without making
> any forward progress.  When the count drops below 0, fail the request.
> This prevents infinite retry loops of the sort seen with Nokia phones
> when asked to read their nonexistent "last" sector.
> 
> Signed-Off-By: Alan Stern <stern@rowland.harvard.edu>
> 
> ---
> 
> Index: 2.6.27.4/drivers/scsi/scsi_lib.c
> ===================================================================
> --- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
> +++ 2.6.27.4/drivers/scsi/scsi_lib.c
> @@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
>         struct request *req = cmd->request;
>         unsigned long flags;
> 
> +       if (--req->retries < 0) {
> +               blk_end_request(req, -EIO, blk_rq_bytes(req));
> +               scsi_next_command(cmd);
> +               return;
> +       }
>         scsi_unprep_request(req);
>         spin_lock_irqsave(q->queue_lock, flags);
>         blk_requeue_request(q, req);
> @@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
>                                  * leftovers in the front of the
>                                  * queue, and goose the queue again.
>                                  */
> +                               if (bytes > 0)          /* Made progress */
> +                                       ++req->retries;
>                                 scsi_requeue_command(q, cmd);
>                                 cmd = NULL;
>                         }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: SCSI regression in 2.6.27 and Nokia phones
  2008-11-18 22:12 ` James Smart
@ 2008-11-18 22:37   ` Alan Stern
  0 siblings, 0 replies; 3+ messages in thread
From: Alan Stern @ 2008-11-18 22:37 UTC (permalink / raw)
  To: James Smart
  Cc: James Bottomley, Andrew Morton, USB Storage list,
	SCSI development list, stable@kernel.org

On Tue, 18 Nov 2008, James Smart wrote:

> Alan,
> 
> It was probably this change:
> http://marc.info/?l=linux-scsi&m=122118359130474&w=2
> 
> However, I would *not* back it out. It's a silent data corruption if you 
> do. Past behavior reset good_bytes to indicate all data was transfered 
> and ignored any residual - which meant if no data was transfered, we 
> misinform the callee and the callee gets stuck with whatever data is in 
> the buffer. Current behavior, the patch, uses residual, and likely falls 
> into an i/o retry path.

I agree, that was probably the cause, and I also agree that it should
not be removed.  That's part of what makes this so difficult; we don't
want to just remove the original cause, we want to fix things.

Alan Stern


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-11-18 22:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-18 20:54 SCSI regression in 2.6.27 and Nokia phones Alan Stern
2008-11-18 22:12 ` James Smart
2008-11-18 22:37   ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.