From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brian King
Subject: Re: [PATCH 1/1] scsi: Add EH Start Unit retry
Date: Mon, 02 Apr 2007 13:57:56 -0500
Message-ID: <46115234.20308@us.ibm.com>
References: <11751999513070-patch-mail.ibm.com> <460C375F.1000000@torque.net>
Reply-To: brking@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from e32.co.us.ibm.com ([32.97.110.150]:38214 "EHLO e32.co.us.ibm.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S965693AbXDBS6B (ORCPT );
	Mon, 2 Apr 2007 14:58:01 -0400
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106])
	by e32.co.us.ibm.com (8.12.11.20060308/8.13.8) with ESMTP id l32Itq4t001470
	for ; Mon, 2 Apr 2007 14:55:52 -0400
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168])
	by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.3) with ESMTP id l32Iw05o198200
	for ; Mon, 2 Apr 2007 12:58:00 -0600
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1])
	by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l32IvxYO006813
	for ; Mon, 2 Apr 2007 12:58:00 -0600
In-Reply-To: <460C375F.1000000@torque.net>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: dougg@torque.net
Cc: James.Bottomley@steeleye.com, linux-scsi@vger.kernel.org, thlin@linux.vnet.ibm.com

Douglas Gilbert wrote:
> Brian King wrote:
>> Currently, the scsi error handler will issue a START_UNIT
>> command if the drive indicates it needs its motor started
>> and the allow_restart flag is set in the scsi_device. If,
>> after the scsi error handler invokes a host adapter reset
>> due to error recovery, a device is in a unit attention
>> state AND also needs a START_UNIT, that device will be placed
>> offline. The disk array devices on an ipr RAID adapter
>> will do exactly this when in a dual initiator configuration.
>> This patch adds a single retry to the EH initiated
>> START_UNIT.
>
> I have no objection to this patch. Just seems a pity
> that SCSI devices go to the trouble of sending unit
> attentions while OSes just throw them away.

I agree. The reason the ipr adapter firmware added this UA in this
configuration was to support SCSI 1 reservations and communicate to
the host that any reservation previously held to the disk array is
now lost since the adapter was reset.

> Perhaps the scsi_device sysfs directory could have entries
> like:
>   last_ua_asc
>   last_ua_ascq
>   last_ua_timestamp
> where code could place the asc/ascq codes and a timestamp
> then continue doing a retry.
> Could we get a log entry, hotplug event?

If we did have a way to communicate UAs to userspace like this, it
seems like it would allow usage of SCSI 1 reservations in this config
and make it easier for an out of band tool to manage these
reservations. I wonder if it would be cleaner if UAs could simply be
sent up as netlink events / uevents, so they could contain all the
information needed in one packet, rather than having to read sysfs
attributes to figure out what happened.

> Logical units may queue unit attentions (sam4r10.pdf
> section 5.8.7) so it is possible that one retry may
> not be enough. With my suggestion above, only the last
> one would persist for a reasonable time.

Yep. I've already run into that with dual ported SAS devices. While
one retry is sufficient for the ipr disk array devices I am trying to
fix this for, I have no objection to increasing it. Maybe it's just a
case of increasing it later if it ends up being an issue.

Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
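
For illustration only, here is a small userspace sketch of the retry
question discussed in this thread. It is not the patch and does not use
any kernel API: the device is a toy model with a counter of queued unit
attentions, and every name in it (sim_device, sim_start_unit,
sim_eh_start_unit, max_retries) is hypothetical. It assumes each
START_UNIT attempt that hits a pending UA consumes exactly one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one simulated START_UNIT attempt. */
enum stu_result { STU_OK, STU_UNIT_ATTENTION };

/* Toy device model (an assumption, not struct scsi_device). */
struct sim_device {
	int pending_ua;	/* unit attentions still queued on the device */
	bool online;	/* false once the simulated EH gives up */
};

/* One simulated START_UNIT: a queued UA is reported (and cleared) first. */
enum stu_result sim_start_unit(struct sim_device *dev)
{
	if (dev->pending_ua > 0) {
		dev->pending_ua--;
		return STU_UNIT_ATTENTION;
	}
	return STU_OK;
}

/*
 * Issue START_UNIT once, then up to max_retries more times if a unit
 * attention is reported. The device is marked offline if every attempt
 * fails. Returns the number of attempts made.
 */
int sim_eh_start_unit(struct sim_device *dev, int max_retries)
{
	int attempts = 0;

	assert(dev != NULL && max_retries >= 0);

	for (int i = 0; i <= max_retries; i++) {
		attempts++;
		if (sim_start_unit(dev) == STU_OK) {
			dev->online = true;
			return attempts;
		}
	}

	dev->online = false;
	return attempts;
}
```

In this model, a device with one queued UA goes offline with
max_retries = 0 and comes back with max_retries = 1, which is the ipr
case above. A device with two queued UAs still goes offline with a
single retry, which is the situation Douglas describes for logical
units that queue unit attentions.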