From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hannes Reinecke
Subject: Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling
Date: Tue, 01 Apr 2014 08:14:09 +0200
Message-ID: <533A5931.4040708@suse.de>
References: <53397DAE.9010801@suse.de> <1396278224.3152.26.camel@dabdike.int.hansenpartnership.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <1396278224.3152.26.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
Sender: linux-usb-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: James Bottomley
Cc: Alan Stern, SCSI development list, USB list
List-Id: linux-scsi@vger.kernel.org

On 03/31/2014 05:03 PM, James Bottomley wrote:
> [let's split the thread]
> On Mon, 2014-03-31 at 16:37 +0200, Hannes Reinecke wrote:
>> On 03/31/2014 03:33 PM, Alan Stern wrote:
>>> On Mon, 31 Mar 2014, Hannes Reinecke wrote:
>>>> On 03/28/2014 08:29 PM, Alan Stern wrote:
>>>>> On Fri, 28 Mar 2014, James Bottomley wrote:
>>>>> Maybe scmd_eh_abort_handler() should check the flag before doing
>>>>> anything.  Is there any sort of synchronization to prevent the same
>>>>> incarnation of a command from being aborted twice (or by two different
>>>>> threads at the same time)?  If there is, it isn't obvious.
>>>>>
>>>> See above. scsi_times_out() will only ever be called once.
>>>> What can happen, though, is that _theoretically_ the LLDD might
>>>> decide to call ->done() on a timed out command when
>>>> scsi_eh_abort_handler() is still pending.
>>>
>>> That's okay.  We can expect the LLDD to have sufficient locking to
>>> handle that sort of thing without confusion (usb-storage does, for
>>> example).
>>>
>>>>> (Also, what's going on at the start of scsi_abort_command()?  Contrary
>>>>> to what one might expect, the first part of the function _cancels_ a
>>>>> scheduled abort.  And it does so without clearing the
>>>>> SCSI_EH_ABORT_SCHEDULED flag.)
>>>>>
>>>> The original idea was this:
>>>>
>>>> SCSI_EH_ABORT_SCHEDULED is sticky _per command_.
>>>> Point is, any command abort is only ever sent for a timed-out
>>>> command. And the main problem for a timed-out command is that we
>>>> simply _do not_ know what happened for that command. So _if_ a
>>>> command timed out, _and_ we've sent an abort, _and_ the command
>>>> times out _again_ we'll be running into an endless loop between
>>>> timeout and aborting, and never returning the command at all.
>>>>
>>>> So to prevent this we should set a marker on that command telling it
>>>> to _not_ try to abort the command again.
>>>
>>> I disagree.  We _have_ to abort the command again -- how else can we
>>> stop a running command?  To prevent the loop you described, we should
>>> avoid _retrying_ the command after it is aborted the second time.
>>>
>> The actual question is whether it's worth aborting the same command
>> a second time.
>> In principle any reset (like LUN reset etc) should clear the
>> command, too.
>> And the EH abort functionality is geared around this.
>> If, for some reason, the transport layer / device driver
>> requires a command abort to be sent then sure, we need
>> to accommodate that.
>
> We already discussed this (that was my first response too).  USB needs
> all outstanding commands aborted before proceeding to reset.  I'm
> starting to think the actual way to fix this is to reset the abort
> scheduled only if we send something else, so this might be a better fix.
>
> It doesn't matter if we finish a command with abort scheduled because
> the command then gets freed and the flags cleaned.
>
> We can take our time with this because the other two patches, which I
> can send separately, fix the current deadlock (we no longer send an
> unaborted request sense before the reset), and it can go via rc fixes.
>
Yes, agreed.
The USB case seems to be a bit more tricky, and at least I need some
more time to fully understand the details and implications of this.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org     +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html