From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: RE: [2.4.21] Spurious ABORTs
Date: Tue, 27 Sep 2005 15:10:45 -0500
Message-ID: <1127851845.4823.23.camel@mulgrave>
References: <0E3FA95632D6D047BA649F95DAB60E57060CD1E1@exa-atlanta>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: from stat9.steeleye.com ([209.192.50.41]:36253 "EHLO
	hancock.sc.steeleye.com") by vger.kernel.org with ESMTP
	id S1750969AbVI0UKu (ORCPT ); Tue, 27 Sep 2005 16:10:50 -0400
In-Reply-To: <0E3FA95632D6D047BA649F95DAB60E57060CD1E1@exa-atlanta>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: "Bagalkote, Sreenivas"
Cc: "'linux-scsi@vger.kernel.org'" , 'Christoph Hellwig' ,
	"'hch@lst.de'" , "Kolli, Neela Syam"

On Tue, 2005-09-27 at 15:48 -0400, Bagalkote, Sreenivas wrote:
> >1. you do clustering, so a reset request could be from a
> >reservation breaking protocol
>
> I don't have clustering setup. So this is definitely not the reason.

You might not, but others do.  If you return success to a reset request
without doing anything then the device will stay reserved by the other
system. i.e. you'll break clustering setups.

> >2. The fact that the eh activated indicates something went
> >wrong.  If you take no corrective action and the test unit
> >ready that follows the reset fails or times out then the
> >device will be taken offline.
> >
>
> Heavy IOs are going on in the FW while it is rebuilding RAID arrays.
> We expect some of the commands to timeout. But the key is recover
> gracefully. I see that FW is completing _all_ the commands albeit
> after timing out. When the reset handler is called after all the
> commands are out of the door, I simply return success. Can this
> potentially cause any issues?

As long as the Test Unit Ready that follows the reset succeeds, then no,
this will work and it shouldn't cause any issues other than the
clustering one.

James