public inbox for linux-scsi@vger.kernel.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: James Bottomley <James.Bottomley@Hansenpartnership.com>
Cc: Hannes Reinecke <hare@suse.de>, Christoph Hellwig <hch@lst.de>,
	James Bottomley <jejb@linux.vnet.ibm.com>,
	Jens Axboe <axboe@kernel.dk>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Dexuan Cui <decui@microsoft.com>, Long Li <longli@microsoft.com>,
	Josh Poulson <jopoulso@microsoft.com>,
	v-adsuho@microsoft.com, linux-scsi@vger.kernel.org,
	Haiyang Zhang <haiyangz@microsoft.com>
Subject: Re: SCSI regression in 4.11
Date: Fri, 3 Mar 2017 14:29:51 -0800
Message-ID: <20170303142951.4d86a420@xeon-e3>
In-Reply-To: <1b325703-b823-4304-9d9d-86071811e000@email.android.com>

On Thu, 02 Mar 2017 11:18:23 -0800
James Bottomley <James.Bottomley@Hansenpartnership.com> wrote:

> On March 2, 2017 11:05:05 AM PST, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >On Thu, 02 Mar 2017 10:36:17 -0800
> >James Bottomley <James.Bottomley@Hansenpartnership.com> wrote:
> >  
> >> On March 2, 2017 10:23:24 AM PST, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >> >On Thu, 2 Mar 2017 14:25:14 +0100
> >> >Hannes Reinecke <hare@suse.de> wrote:
> >> >    
> >> >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:    
> >> >> > On Thu, 2 Mar 2017 01:56:15 +0100
> >> >> > Christoph Hellwig <hch@lst.de> wrote:
> >> >> >      
> >> >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:
> >> >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:
> >> >> >>>>> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >> >> >>>>
> >> >> >>>> It appears that is already in the code I am testing in linux-next...
> >> >> >>>
> >> >> >>> It's in -next now, but it wasn't at the time you reported the bug.
> >> >> >>>
> >> >> >>> And it would sort of explain the bug if the INQUIRY data is correct
> >> >> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >> >> >>> ignores the result based on sense data.
> >> >> >>>
> >> >> >>> Can you check what happens with the horrible hack below:
> >> >> >>
> >> >> >> Strike that - we're checking result later, so this can't be the case.
> >> >> >>
> >> >> >> Now the other interesting thing is the memset in __scsi_execute,
> >> >> >> which looks very suspicious.  Try the following please:
> >> >> >>
> >> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> >> >> index 3e32dc954c3c..22f4fb550561 100644
> >> >> >> --- a/drivers/scsi/scsi_lib.c
> >> >> >> +++ b/drivers/scsi/scsi_lib.c
> >> >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
> >> >> >>  	 * and prevent security leaks by zeroing out the excess data.
> >> >> >>  	 */
> >> >> >>  	if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> >> >> -		memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> >> >> >> +//		memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> >> >> >> +		printk_ratelimited("%s: got resid %d\n", __func__, rq->resid_len);
> >> >> >>
> >> >> >>  	if (resid)
> >> >> >>  		*resid = rq->resid_len;      
> >> >> >
> >> >> >
> >> >> > Still fails but does print resid on some of the later INQUIRY
> >> >> > commands (not the initial one).
> >> >> >
> >> >> Can you test what happens if you blank out the storvsc_drv workaround:
> >> >> 
> >> >> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> >> >> index 585e54f..c36f42d 100644
> >> >> --- a/drivers/scsi/storvsc_drv.c
> >> >> +++ b/drivers/scsi/storvsc_drv.c
> >> >> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct storvsc_device *stor_device,
> >> >>           * We do this so we can distinguish truly fatal failues
> >> >>           * (srb status == 0x4) and off-line the device in that case.
> >> >>           */
> >> >> -
> >> >> +#if 0
> >> >>          if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> >> >>             (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> >> >>                  vstor_packet->vm_srb.scsi_status = 0;
> >> >>                  vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
> >> >>          }
> >> >> -
> >> >> +#endif
> >> >> 
> >> >>          /* Copy over the status...etc */
> >> >>          stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> >> >> 
> >> >> It might happen that we fail to interpret the 'Device not present'
> >> >> status correctly (which will happen for non-connected DVDs), causing
> >> >> the SCSI stack to make incorrect decisions later on.
> >> >> 
> >> >> Cheers,
> >> >> 
> >> >> Hannes    
> >> >
> >> >There are several oddities about the host SCSI interface that I see:
> >> > 1. The host bus seems to report up to 6 devices even though only 2 are
> >> >     present (Disk and CDROM).
> >> > 2. The CDROM emulation doesn't report the same status as a real device.
> >> > 3. The host emulation of SCSI doesn't support all the page codes, which
> >> >     is why there is the hack.
> >> >
> >> >But as James said, these don't appear to be related to the failure,
> >> >because the code worked before and only in the post 4.11 merge is there
> >> >a problem.
> >> 
> >> Your wait for the hang trace is the most suggestive.  It says we're
> >> waiting for a partition read to the spurious device.  Previously this
> >> would have failed or timed out, so this seems to be the root cause.
> >> 
> >> James
> >> 
> >>   
> >
> >Where is the number of valid LUNs determined during the scan process?
> 
> Depends.  If you can do a REPORT LUNS scan then that's definitive.  You
> seem to be probing (scsi_probe_and_add_lun) and you make us think
> there's something there by responding wrongly to the initial INQUIRY.

Testing a fix now. There appear to be three problems here:
1. storvsc_on_io_completion masks all error responses from INQUIRY.
2. Error handling in storvsc does not report an invalid LUN correctly.
3. The block layer has new problems when a device is in a bad state (not present and timing out).

The first two have been there for 4 years but did not cause problems.
Something changed that makes the kernel chew up lots of resources and eventually die when
it hits a disconnected device that is not detected properly.
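
To make the interaction between problems 1 and 2 concrete, here is a
small userspace model in C. It is only a sketch under assumptions, not
driver code: struct model_srb, the MODEL_SRB_* values, and every function
name are hypothetical, and it assumes the host flags an absent LUN with a
distinct status, which this thread has not confirmed. Only the two opcode
values are standard SCSI. It contrasts a completion path that masks every
INQUIRY/MODE_SENSE error with one that lets an invalid-LUN status through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI opcodes (first byte of the CDB). */
#define OP_INQUIRY    0x12
#define OP_MODE_SENSE 0x1a

/* Hypothetical status codes for this model only. */
#define MODEL_SRB_SUCCESS     0x01
#define MODEL_SRB_INVALID_LUN 0x20

/* Hypothetical, much reduced stand-in for a completed request. */
struct model_srb {
	uint8_t opcode;      /* cdb[0] */
	uint8_t scsi_status; /* 0 means GOOD */
	uint8_t srb_status;
};

/* Models the quoted workaround: INQUIRY and MODE_SENSE always succeed. */
static void complete_masked(struct model_srb *srb)
{
	assert(srb != NULL);
	if (srb->opcode == OP_INQUIRY || srb->opcode == OP_MODE_SENSE) {
		srb->scsi_status = 0;
		srb->srb_status = MODEL_SRB_SUCCESS;
	}
}

/* Assumed alternative: leave an invalid-LUN answer untouched. */
static void complete_checked(struct model_srb *srb)
{
	assert(srb != NULL);
	if (srb->srb_status == MODEL_SRB_INVALID_LUN)
		return;
	complete_masked(srb);
}

/* In this model a probe believes a LUN exists if INQUIRY succeeded. */
static bool lun_looks_present(const struct model_srb *srb)
{
	return srb->scsi_status == 0 &&
	       srb->srb_status == MODEL_SRB_SUCCESS;
}

/* Probe LUNs 0..nluns-1 and count how many the scan would add. */
static int count_detected(const uint8_t *host_status, int nluns,
			  void (*complete)(struct model_srb *))
{
	int found = 0;

	for (int lun = 0; lun < nluns; lun++) {
		struct model_srb srb = {
			.opcode = OP_INQUIRY,
			/* 2 is CHECK CONDITION */
			.scsi_status =
				host_status[lun] == MODEL_SRB_SUCCESS ? 0 : 2,
			.srb_status = host_status[lun],
		};

		complete(&srb);
		if (lun_looks_present(&srb))
			found++;
	}
	return found;
}
```

Fed six probe results where only the first two are MODEL_SRB_SUCCESS,
count_detected() with complete_masked counts all six as present, while
complete_checked counts two. That matches the "6 devices reported, 2
present" symptom described earlier in the thread, but only within the
assumptions of this model.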

