From: James Bottomley
Subject: Re: aic94xx driver woes continued
Date: Thu, 20 Mar 2008 14:01:54 -0500
Message-ID: <1206039714.3038.40.camel@localhost.localdomain>
References: <47E2B044.70705@ipax.at>
In-Reply-To: <47E2B044.70705@ipax.at>
List-Id: linux-scsi@vger.kernel.org
To: "Raoul Bhatia [IPAX]"
Cc: linux-scsi@vger.kernel.org

On Thu, 2008-03-20 at 19:43 +0100, Raoul Bhatia [IPAX] wrote:
> hi there,
>
> we find ourself in the same situation as posted on this list before [1]
>
> first of all, the hardware details:
> > System:
> > Tyan Transport GT24-B3992
> > Motherboard: Tyan B3992
> > Dual Opteron 2218 (Dual-Core)
> > 8GB RAM
> > SAS Controller:
> > product: AIC-9410W SAS (Razor ASIC RAID)
> > vendor: Adaptec
> > > controler-bios: BIOS present (1,1), 1820
> > controler-sequencer: Firmware version 1.1 (V30)
> > Harddisks:
> > 4x Seagate Cheetah 15K.5 ST373455SS
> > There is a Software Raid10 on top of those 4 disks.
> > vanilla kernel 2.6.25-rc5
> > Debian GNU/Linux 4.0, AMD64
> >
> coming to the problem description itself:
>
> the server is booted, the raid is working as intended
> > md4 : active raid10 sdb9[1] sda9[0] sdd9[3] sdc9[2]
> > 100181120 blocks 64K chunks 2 near-copies [4/4] [UUUU]
>
> now we mount /dev/md4 to /home, cd there and run an io intensive task
> such as stress, tiobench (or even raid-reinit is enough)
> > stress --hdd 20 --hdd-bytes 2gb --hdd-noclean
>
> soon we see:
> > aic94xx: escb_tasklet_complete: REQ_TASK_ABORT, reason=0x6
> > sas: command 0xffff81023fb2ca80, task 0xffff81023ea7ab40, timed out:
> EH_NOT_HANDLED
> > ...
> > sas: Enter sas_scsi_recover_host
> > sas: trying to find task 0xffff81023ea7ab40
> > sas: sas_scsi_find_task: aborting task 0xffff81023ea7ab40
> > ...
> > sas: --- Exit sas_scsi_recover_host
>
> please se the attached logfile.

This is all normal. Seagate drives are known for throwing protocol
errors under stress at certain firmware revisions. That's what
REQ_TASK_ABORT, reason=0x6 is. Your logs indicate that the recovery
occurred correctly (as in all tasks were eventually retried), so they
don't show an actual problem.

> sometimes even a disk is kicked out of the raid configuration.

This would be abnormal. If you have a log of this, could you post it?
I assume it was because of I/O errors?

James
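
To check a longer log the same way, something like the following might help. It is only a rough sketch: it assumes the libsas messages look exactly like the ones quoted above, and the function name unrecovered_tasks is made up for illustration. It only confirms that the error handler went looking for each timed-out task, not that the retried command eventually succeeded.

```python
import re

# Patterns are assumptions based solely on the log lines quoted in this mail.
TIMED_OUT = re.compile(
    r"sas: command 0x[0-9a-fA-F]+, task (0x[0-9a-fA-F]+), timed out"
)
ABORTING = re.compile(
    r"sas: sas_scsi_find_task: aborting task (0x[0-9a-fA-F]+)"
)


def unrecovered_tasks(log_lines):
    """Return task addresses that timed out but were never picked up
    by sas_scsi_find_task in the given log lines (hypothetical helper)."""
    pending = set()
    for line in log_lines:
        match = TIMED_OUT.search(line)
        if match:
            pending.add(match.group(1))
            continue
        match = ABORTING.search(line)
        if match:
            # The error handler found this task, so stop tracking it.
            pending.discard(match.group(1))
    return sorted(pending)


if __name__ == "__main__":
    sample = [
        "aic94xx: escb_tasklet_complete: REQ_TASK_ABORT, reason=0x6",
        "sas: command 0xffff81023fb2ca80, task 0xffff81023ea7ab40, "
        "timed out: EH_NOT_HANDLED",
        "sas: Enter sas_scsi_recover_host",
        "sas: trying to find task 0xffff81023ea7ab40",
        "sas: sas_scsi_find_task: aborting task 0xffff81023ea7ab40",
        "sas: --- Exit sas_scsi_recover_host",
    ]
    print(unrecovered_tasks(sample))
```

An empty list means every timed-out task in the log was at least handed to the abort path. Anything left over would be worth posting along with the surrounding lines.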