From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Reed
Subject: Re: [PATCH] OOPS due to clearing eh_action prior to aborting eh command
Date: Wed, 07 Dec 2005 16:16:04 -0600
Message-ID: <43975F24.4040904@sgi.com>
References: <43975A8C.2030208@sgi.com> <1133993204.3303.46.camel@mulgrave>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
Received: from omx3-ext.sgi.com ([192.48.171.20]:62143 "EHLO omx3.sgi.com")
	by vger.kernel.org with ESMTP id S1030390AbVLGWQH (ORCPT );
	Wed, 7 Dec 2005 17:16:07 -0500
In-Reply-To: <1133993204.3303.46.camel@mulgrave>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: linux-scsi@vger.kernel.org

James Bottomley wrote:
> On Wed, 2005-12-07 at 15:56 -0600, Michael Reed wrote:
>> During my testing of fc transport attributes for the mpt fusion
>> driver, I came upon this OOPS. (Actually, I've come upon it
>> too many times. :( )
>>
>> Attached is a patch which addresses the issue. Please give it a look.
>
> Isn't a better patch simply to copy the eh_action and check it for null
> before completing it? That will close the done after timeout race.

FWIW, the situation I'm reporting isn't done after timeout, it's
the scsi error handler calling the LLDD abort routine, which actually
aborts the command and completes it. The eh_action is cleared before
the abort, violating what appears to be the accepted protocol of having
the LLDD complete aborted commands.

Am I missing something in your comment? (It wouldn't surprise me!)

Thanks,
Mike