From: Bart Van Assche
Subject: Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
Date: Mon, 11 May 2015 11:58:59 +0200
Message-ID: <55507D63.6010007@sandisk.com>
To: Christoph Hellwig
Cc: Sagi Grimberg, Doug Ledford, James Bottomley, Sagi Grimberg, Sebastian Parschauer, Jens Axboe, linux-scsi@vger.kernel.org, Hannes Reinecke

On 05/11/15 11:31, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
>> There are multiple events that can cause the SRP initiator driver to
>> initiate a reconnect:
>> 1. The SCSI core invoking eh_host_reset_handler().
>> 2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
>> transmit timeout or a transport layer disconnect reported by the
>> IB/CM.
>
> Right, I missed the srp_reconnect_work case. But even with that I
> think what I wrote above still stands. srp_reconnect_work in that
> case would just directly trigger the abort all commands and
> reconnect operation.
>
> The main point I was trying to make is that instead of having a sequence
> of:
>
> 1) block new queuecommand instances
> 2) flush out pending queuecommand instances
> 3) do part of the disconnect
> 4) fail all in-flight commands
> 5) reconnect
>
> we should aim for:
>
> 1) block new queuecommand instances
> 2) fail all in-flight commands
> 3) disconnect and reconnect
>
> to avoid the need to keep track of pending queuecommand instances,
> and instead re-use the existing infrastructure to fail all in-flight
> commands, which we have the infrastructure for, and which we need
> to do anyway.

Hello Christoph,

Your proposal makes complete sense to me, but unfortunately I do not have
the time to implement it right now. Would it be acceptable if I instead
rework scsi_wait_for_queuecommand() such that blk-mq uses per-CPU counters
instead of one counter per hctx?

Thanks,

Bart.
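To make the per-CPU counter question above concrete, here is a minimal userspace sketch of the idea, assuming C11 atomics and a fixed, hypothetical NR_CPUS. The names qc_enter(), qc_exit(), qc_inflight() and qc_block_and_wait() are hypothetical and are not blk-mq or SCSI core APIs; the kernel would use its own per-CPU primitives instead. A caller increments the counter of the CPU it starts on and decrements the counter of the CPU it finishes on, so an individual counter may go negative after migration; only the sum over all CPUs is meaningful. The caller increments before checking the block flag, so the waiter either observes the increment or the caller observes the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count, for illustration only */

/* Hypothetical per-CPU in-flight counters for queuecommand calls. */
static atomic_long inflight[NR_CPUS];
/* Set once new queuecommand calls must be refused. */
static atomic_bool blocked;

/* Entry hook on CPU 'cpu'; returns false if new calls are blocked. */
static bool qc_enter(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    /* Increment first (seq_cst), then check the flag, so that the
     * waiter in qc_block_and_wait() cannot miss this caller. */
    atomic_fetch_add(&inflight[cpu], 1);
    if (atomic_load(&blocked)) {
        atomic_fetch_sub(&inflight[cpu], 1);
        return false;
    }
    return true;
}

/* Exit hook; may run on a different CPU than the matching qc_enter(). */
static void qc_exit(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_sub(&inflight[cpu], 1);
}

/* Only the sum is meaningful: a single counter may be negative. */
static long qc_inflight(void)
{
    long sum = 0;
    for (int i = 0; i < NR_CPUS; i++)
        sum += atomic_load(&inflight[i]);
    return sum;
}

/* Block new calls, then wait for the ones already running to finish. */
static void qc_block_and_wait(void)
{
    atomic_store(&blocked, true);
    while (qc_inflight() != 0)
        ; /* busy-wait; a real implementation would sleep instead */
}
```

The usual argument for per-CPU counters over one shared counter is that the hot path (every queuecommand call) then only touches a counter that is local to the submitting CPU, while the cost of summing all counters is moved to the rarely executed wait path.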