Adding direct email addresses of few people to avoid any filters. Hannes/Martin/James/Tomas/Christoph, Can you please comment on this? Thanks, Sumit >-----Original Message----- >From: Sumit Saxena [mailto:sumit.saxena@broadcom.com] >Sent: Tuesday, December 13, 2016 6:50 PM >To: 'linux-scsi' >Subject: SCSI: usage of DID_REQUEUE vs DID_RESET for returning SCSI >commands to be retried > >Hi all, > >I have query regarding usage of host_byte DID_REQUEUE vs DID_RESET returned >by LLD to SCSI mid layer. > >Let me give some background here. >I am using megaraid_sas controller. megaraid_sas driver returns all outstanding >SCSI commands back to SCSI layer with DID_RESET host_byte before resetting >controller. >The intent is- all these commands returned with DID_RESET before controller >reset should be retried by SCSI. > >In few distros, I have observed that if SYNCHRONIZE_CACHE command(should be >applicable for all non Read write commands) is outstanding before resetting >controller and driver returns those commands back with DID_RESET then >SYNCHRONIZE_CACHE command not retried but failed to upper layer but other >READ/WRITE commands were not failed but retried. I was running filesystem IOs >and SYNHRONIZE_CACHE returned with error cause filesystem to go in READ only >mode. > >Later I checked and realized below commit was missing in that distro kernel and >seems to cause the problem- > >a621bac scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands > >But distro kernel has below commit- > >89fb4cd scsi: handle flush errors properly > >Issue does not hit on the kernels which don't have both commits but hits when >commit- "89fb4cd scsi: handle flush errors properly " is there but commit- >"a621bac scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands" is >missing. > >This issue is observed with mpt3sas driver as well and should be applicable to all >LLDs returning non read write commands with DID_RESET. > >Returning DID_REQUEUE instead of DID_RESET from driver solves the problem >irrespective of whether these above mentioned commits are there or not in >kernel. >So I am thinking to use DID_REQUEUE instead of DID_RESET in megaraid_sas >driver for all SCSI commands(not only limited to SYNCHRONIZE_CACHE or non >read write commands) before resetting controller. As I mentioned intent is those >outstanding commands returned by driver before doing controller reset should be >retried and as soon as reset is complete, driver will be processing those >commands. There is no sense key associated with SCSI commands returned. > >I browsed SCSI code and get to know DID_REQUEUE causes command to be >retried by calling- scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY). >This seems to be good enough for our requirement of commands to be re-tried >by SCSI layer. > >Please provide feedback if anyone forsee any issue with using DID_REQUEUE for >this use case. >I will be doing some testing with DID_REQUEUE in place of DID_RESET in >megaraid_sas driver. > >I have attached here a proposed patch for megaraid_sas driver. >If there are no concerns, I will send this patch to SCSI mailing list. > >Thanks, >Sumit