From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: RE: [2.4.21] Spurious ABORTs
Date: Tue, 27 Sep 2005 12:18:21 -0500
Message-ID: <1127841501.4814.53.camel@mulgrave>
References: <0E3FA95632D6D047BA649F95DAB60E57060CD1DF@exa-atlanta>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: from stat9.steeleye.com ([209.192.50.41]:412 "EHLO
	hancock.sc.steeleye.com") by vger.kernel.org with ESMTP
	id S964943AbVI0RS0 (ORCPT ); Tue, 27 Sep 2005 13:18:26 -0400
In-Reply-To: <0E3FA95632D6D047BA649F95DAB60E57060CD1DF@exa-atlanta>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: "Bagalkote, Sreenivas"
Cc: "'linux-scsi@vger.kernel.org'", 'Christoph Hellwig',
	"'hch@lst.de'", "Kolli, Neela Syam"

On Tue, 2005-09-27 at 13:10 -0400, Bagalkote, Sreenivas wrote:
> What do you mean by "actually do a reset"? I see that firmware doesn't
> have any pending commands. So I simply return success from reset routine.
> Do you see any problem in this? After a hundred or so such cycles, the
> system is frozen. I should also tell you that if I introduce abort handler
> and return success for all the completed commands, I don't see the OS hang.

Well, yes, for two reasons:

1. You do clustering, so a reset request could be from a reservation
breaking protocol
2. The fact that the eh activated indicates something went wrong. If
you take no corrective action and the test unit ready that follows the
reset fails or times out then the device will be taken offline.

James