From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [RFC] hv_storvsc: error handling.
Date: Mon, 6 Mar 2017 08:36:19 -0800
Message-ID: <20170306083619.6789f9ba@xeon-e3>
References: <1488301573.3046.9.camel@linux.vnet.ibm.com>
	<20170228105741.6253bb8a@xeon-e3>
	<1488325732.11610.9.camel@linux.vnet.ibm.com>
	<20170228172532.280811ed@xeon-e3>
	<1488349258.20321.11.camel@linux.vnet.ibm.com>
	<20170228224845.1da358ee@xeon-e3>
	<20170301155057.GA13167@lst.de>
	<20170301075412.2e5f1e98@xeon-e3>
	<20170302000135.GA22886@lst.de>
	<20170302005615.GA23687@lst.de>
	<20170301174058.383da142@xeon-e3>
	<20170302102324.47dbe3ad@xeon-e3>
	<895c4f2e-7faa-41e1-b5de-eedb4ae0f882@email.android.com>
	<20170302110505.6ad2eb61@xeon-e3>
	<1b325703-b823-4304-9d9d-86071811e000@email.android.com>
	<20170303165011.53a38794@xeon-e3>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Received: from mail-pf0-f172.google.com ([209.85.192.172]:33131 "EHLO
	mail-pf0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753939AbdCFQpq (ORCPT ); Mon, 6 Mar 2017 11:45:46 -0500
Received: by mail-pf0-f172.google.com with SMTP id w189so62526517pfb.0
	for ; Mon, 06 Mar 2017 08:45:46 -0800 (PST)
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: KY Srinivasan
Cc: James Bottomley, Hannes Reinecke, Christoph Hellwig, James Bottomley,
	Jens Axboe, Linus Torvalds, "Martin K. Petersen", Dexuan Cui, Long Li,
	Josh Poulson, "Adrian Suhov (Cloudbase Solutions SRL)",
	"linux-scsi@vger.kernel.org", Haiyang Zhang

On Sat, 4 Mar 2017 21:03:41 +0000
KY Srinivasan wrote:

> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Friday, March 3, 2017 4:50 PM
> > To: James Bottomley
> > Cc: Hannes Reinecke; Christoph Hellwig; James Bottomley; Jens Axboe;
> > Linus Torvalds; Martin K. Petersen; KY Srinivasan; Dexuan Cui; Long Li;
> > Josh Poulson; Adrian Suhov (Cloudbase Solutions SRL);
> > linux-scsi@vger.kernel.org; Haiyang Zhang
> > Subject: [RFC] hv_storvsc: error handling.
> >
> > Needs more testing but this does fix the observed problem.
> >
> > From: Stephen Hemminger
> >
> > Subject: [PATCH] hv_storvsc: fix error handling
> >
> > The Hyper-V storvsc SCSI driver was hiding all errors in INQUIRY and
> > MODE_SENSE commands. This caused the scan process to incorrectly think
> > devices were present and online. Also invalid LUN errors were not
> > being handled correctly.
> >
> > This fixes problems booting a GEN2 VM on Hyper-V. It effectively
> > reverts commit 4ed51a21c0f69 ("Staging: hv: storvsc: Fixup
> > srb and scsi status for INQUIRY and MODE_SENSE")
> >
> > Signed-off-by: Stephen Hemminger
> > ---
> >  drivers/scsi/storvsc_drv.c | 48 ++++------------------------------------------
> >  1 file changed, 4 insertions(+), 44 deletions(-)
> >
> > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > index 638e5f427c90..8cc241fc54b8 100644
> > --- a/drivers/scsi/storvsc_drv.c
> > +++ b/drivers/scsi/storvsc_drv.c
> > @@ -543,28 +543,6 @@ static void storvsc_host_scan(struct work_struct
> > *work)
> >  	kfree(wrk);
> >  }
> >
> > -static void storvsc_remove_lun(struct work_struct *work)
> > -{
> > -	struct storvsc_scan_work *wrk;
> > -	struct scsi_device *sdev;
> > -
> > -	wrk = container_of(work, struct storvsc_scan_work, work);
> > -	if (!scsi_host_get(wrk->host))
> > -		goto done;
> > -
> > -	sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
> > -
> > -	if (sdev) {
> > -		scsi_remove_device(sdev);
> > -		scsi_device_put(sdev);
> > -	}
> > -	scsi_host_put(wrk->host);
> > -
> > -done:
> > -	kfree(wrk);
> > -}
> > -
> > -
> >  /*
> >   * We can get incoming messages from the host that are not in response to
> >   * messages that we have sent out. An example of this would be messages
> > @@ -955,8 +933,7 @@ static void storvsc_handle_error(struct
> > vmscsi_request *vm_srb,
> >  		}
> >  		break;
> >  	case SRB_STATUS_INVALID_LUN:
> > -		do_work = true;
> > -		process_err_fn = storvsc_remove_lun;
> > +		set_host_byte(scmnd, DID_NO_CONNECT);
> >  		break;
> >  	case SRB_STATUS_ABORTED:
> >  		if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID
> > &&
> > @@ -1050,32 +1027,15 @@ static void storvsc_on_io_completion(struct
> > storvsc_device *stor_device,
> >
> >  	stor_pkt = &request->vstor_packet;
> >
> > -	/*
> > -	 * The current SCSI handling on the host side does
> > -	 * not correctly handle:
> > -	 * INQUIRY command with page code parameter set to 0x80
> > -	 * MODE_SENSE command with cmd[2] == 0x1c
> > -	 *
> > -	 * Setup srb and scsi status so this won't be fatal.
> > -	 * We do this so we can distinguish truly fatal failues
> > -	 * (srb status == 0x4) and off-line the device in that case.
> > -	 */
> > -
> > -	if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> > -	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> > -		vstor_packet->vm_srb.scsi_status = 0;
> > -		vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
> > -	}
> > -
> > -
> >  	/* Copy over the status...etc */
> >  	stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> >  	stor_pkt->vm_srb.srb_status = vstor_packet->vm_srb.srb_status;
> >  	stor_pkt->vm_srb.sense_info_length =
> >  	vstor_packet->vm_srb.sense_info_length;
> >
> > -	if (vstor_packet->vm_srb.scsi_status != 0 ||
> > -	    vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
> > +	if (stor_pkt->vm_srb.cdb[0] != INQUIRY &&
> > +	    (vstor_packet->vm_srb.scsi_status != 0 ||
> > +	     vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS))
> >  		storvsc_log(device, STORVSC_LOGGING_WARN,
> >  			"cmd 0x%x scsi status 0x%x srb status 0x%x\n",
> >  			stor_pkt->vm_srb.cdb[0],
> > --
>
> This patch gets rid of the ability to "hot remove" LUNs. I don't think that can be part of any
> solution. The INQUIRY hack I put in a long time ago was to deal with host bugs on prior versions of
> Windows server. WS2016 should not be triggering this code. Stephen, could you please test this patch -
> a quick hack:
>
> From b97f24f224a71a6e745c42e5640045a553eb407c Mon Sep 17 00:00:00 2001
> From: K. Y. Srinivasan
> Date: Sat, 4 Mar 2017 14:00:46 -0700
> Subject: [PATCH 1/1] scsi: storvsc: Fix a bug in LUN removal code
> Reply-To: kys@microsoft.com
>
> Signed-off-by: K. Y. Srinivasan
> ---
>  drivers/scsi/storvsc_drv.c | 13 +++++++++++++
>  1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 05526b7..27eb682 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -885,6 +885,7 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
>  	struct storvsc_scan_work *wrk;
>  	void (*process_err_fn)(struct work_struct *work);
>  	bool do_work = false;
> +	struct scsi_device *sdev;
>
>  	switch (SRB_STATUS(vm_srb->srb_status)) {
>  	case SRB_STATUS_ERROR:
> @@ -911,6 +912,18 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
>  		}
>  		break;
>  	case SRB_STATUS_INVALID_LUN:
> +		if (!scsi_host_get(host)) {
> +			set_host_byte(scmnd, DID_NO_CONNECT);
> +			break;
> +		}
> +
> +		sdev = scsi_device_lookup(wrk->host, 0, wrk->tgt_id, wrk->lun);
> +
> +		if (!sdev) {
> +			set_host_byte(scmnd, DID_NO_CONNECT);
> +			break;
> +		}
> +
>  		do_work = true;
>  		process_err_fn = storvsc_remove_lun;
>  		break;

I will try it, but it can't work for two reasons. First, the INVALID_LUN
error is masked off on INQUIRY in the current code. Second, the scsi_device
is already instantiated as part of the scan probe process before it gets here.

The best solution so far is:
 - remove old INQUIRY/SENSE error masking
 + add new workaround for INQUIRY of device id on LUN 0
   which appears to be the reason for old masking
 + return errors on missing LUN
 + provide better transport services for hot remove
   (rather than detecting by failed I/O).
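
To make the first three points concrete, here is a rough stand-alone sketch of the status mapping they imply. It is not driver code and has not been tried against storvsc: every `sketch_`/`SKETCH_` name is made up for illustration, the status enums are stand-ins rather than the kernel's values, and the choice of VPD page 0x83 (device identification) as the request needing the workaround is an assumption. The hot-remove transport point is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in status codes for this sketch; not the kernel's numeric values. */
enum sketch_srb_status {
	SKETCH_SRB_SUCCESS,
	SKETCH_SRB_ERROR,
	SKETCH_SRB_INVALID_LUN,
};

enum sketch_host_byte {
	SKETCH_DID_OK,
	SKETCH_DID_NO_CONNECT,
	SKETCH_DID_ERROR,
};

#define SKETCH_INQUIRY		0x12	/* SCSI INQUIRY opcode */
#define SKETCH_QUIRK_PAGE	0x83	/* assumed: device identification VPD page */

/* Hypothetical, trimmed-down view of a completed command. */
struct sketch_cmd {
	uint8_t opcode;		/* cdb[0] */
	bool evpd;		/* INQUIRY asked for a VPD page */
	uint8_t page_code;	/* requested VPD page */
	unsigned int lun;
};

/* True only for the one request the narrow workaround covers. */
static bool sketch_is_quirk_inquiry(const struct sketch_cmd *cmd)
{
	return cmd->opcode == SKETCH_INQUIRY && cmd->evpd &&
	       cmd->page_code == SKETCH_QUIRK_PAGE && cmd->lun == 0;
}

/* Map a completion status to the result reported to the SCSI midlayer. */
static enum sketch_host_byte
sketch_map_status(const struct sketch_cmd *cmd, enum sketch_srb_status srb)
{
	if (srb == SKETCH_SRB_SUCCESS)
		return SKETCH_DID_OK;
	/* A missing LUN is always reported, never masked. */
	if (srb == SKETCH_SRB_INVALID_LUN)
		return SKETCH_DID_NO_CONNECT;
	/* Narrow workaround instead of masking every INQUIRY/MODE_SENSE. */
	if (sketch_is_quirk_inquiry(cmd))
		return SKETCH_DID_OK;
	return SKETCH_DID_ERROR;
}
```

The ordering is the point: the invalid-LUN check comes before the workaround, so even the tolerated INQUIRY cannot hide a missing LUN from the scan.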