From: Bart Van Assche
Subject: Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable
Date: Fri, 15 Mar 2013 14:41:23 +0100
Message-ID: <51432503.5000703@acm.org>
In-Reply-To: <514321FD.7090507@redhat.com>
References: <1358262138-13378-1-git-send-email-hare@suse.de>
 <51421272.2000706@linux.vnet.ibm.com> <51430C4A.7090308@suse.de>
 <5143130F.4090702@acm.org> <514315F7.4010101@redhat.com>
 <5143183B.1070300@acm.org> <514321FD.7090507@redhat.com>
To: "Bryn M. Reeves"
Cc: Hannes Reinecke, Steffen Maier, linux-scsi@vger.kernel.org,
 Chad Dupuis, Andrew Vasquez, James Smart, James Bottomley,
 Mike Christie

On 03/15/13 14:28, Bryn M. Reeves wrote:
> On 03/15/2013 12:46 PM, Bart Van Assche wrote:
>> The SCSI EH keeps trying until all outstanding request have been
>> finished. Does lpfc_host_reset_handler() invoke scsi_done() for
>
> I don't think so (ends up calling lpfc_sli_cancel_iocbs() via
> lpfc_hba_down_post() after shutting down the mailbox) but I've not seen
> the EH escalate all the way to host reset in most of my testing -
> usually some time after reaching the bus reset remaining IOs timeout and
> the error bubbles up to device-mapper (all the cases I'm looking at are
> devices managed by a dm-multipath target).
>
> The problem is that getting to this stage can take a very long time -
> much longer than most cluster's node eviction timer for e.g. which is
> the source of much of the complaint about this behaviour.
>
>> outstanding requests ? If not, how about modifying
>> lpfc_host_reset_handler() such that it finishes all outstanding requests
>> if the remote port is not reachable ?
>
> I'm not sure how safe that is in this situation - James mentioned in the
> I_T nexus reset thread concerns about frames that could be delayed etc.
> in the fabric if the host unilaterally abandons IOs (not sure of the
> details for lpfc at this level).

How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
those SCSI commands in the host reset handler that exceeded a certain
processing time ?

Bart.
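
To make the jiffies_at_alloc suggestion a bit more concrete, here is a rough userspace sketch of the selection logic only. It is not lpfc code and not a tested kernel patch. struct fake_cmnd, tick_after() and finish_overdue() are hypothetical names made up for illustration; only the jiffies_at_alloc field name comes from struct scsi_cmnd, and tick_after() mirrors the wraparound-safe comparison that the kernel's time_after() macro performs. How a driver would actually complete the selected commands, and whether that is safe with respect to delayed frames in the fabric, is left open here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cmnd; only the field
 * discussed above is modelled. */
struct fake_cmnd {
    unsigned long jiffies_at_alloc; /* tick count at allocation time */
    bool finished;                  /* true once the command was completed */
};

/* Wraparound-safe "a is later than b", same idea as time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Mark as finished every still-outstanding command that has been
 * around for more than max_age ticks at time 'now'. Younger commands
 * are left alone. Returns the number of commands finished.
 */
static size_t finish_overdue(struct fake_cmnd *cmds, size_t n,
                             unsigned long now, unsigned long max_age)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (cmds[i].finished)
            continue;
        if (tick_after(now, cmds[i].jiffies_at_alloc + max_age)) {
            cmds[i].finished = true; /* a driver would complete the command here */
            done++;
        }
    }
    return done;
}
```

In a real host reset handler the max_age threshold would presumably have to be chosen relative to the command timeout and the cluster's node eviction timer mentioned above; the sketch takes it as a plain parameter.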