From mboxrd@z Thu Jan 1 00:00:00 1970
From: Francois Payette
Subject: SATA150TX4 atat1:command timeout
Date: Mon, 14 Feb 2005 16:41:22 -0500
Message-ID: <42111B02.4010805@netmosphere.net>
Reply-To: francoisp@netmosphere.net
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Received: from 65.18.135.ptr ([65.18.135.81]:35241 "EHLO isecurit.com") by vger.kernel.org with ESMTP id S261540AbVBNVkr (ORCPT ); Mon, 14 Feb 2005 16:40:47 -0500
Received: from eliza.isecurit.net (modemcable026.47-81-70.mc.videotron.ca [70.81.47.26]) (authenticated) by isecurit.com (8.11.6/8.11.6) with ESMTP id j1ELUIe24134 for ; Mon, 14 Feb 2005 16:30:18 -0500
Received: from [192.168.0.7] (dice.isecurit.net [192.168.0.7]) by eliza.isecurit.net (Postfix) with ESMTP id 2CCF0AF79B for ; Sun, 13 Feb 2005 12:59:29 -0500 (EST)
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: linux-ide@vger.kernel.org

Hi,

We reported a strange bug earlier at bugzilla.kernel.org (#4106): in our setup with a 20318 (the SATA150 TX4, not the FastTrak one) we systematically get "ata1: command timeout" after copying between 200 and 600GB of data through the controller.

Our setup has 4 Maxtor 6Y200M0 drives, 2 of them in raid 0, and the other 2 in an LV group over a raid 0 md array. When copying from one array to the other repeatedly, the machine freezes about once in every 2 copies.

We changed the drive order, but we still got the "ata1: command timeout" message. We swapped the order of the cables, and still got "ata1: command timeout".

We also got a few kernel panics involving spin locks, but since finding this forum we added the line

    writel(mask, mmio_base + PDC_INT_SEQMASK);

to pdc_interrupt, and those panics went away.

We run kernel 2.6.10-753 (fc3) with all the relevant patches to the sata code, the last of which is the one Bartlomiej Zolnierkiewicz posted on 06/02/2005.
http://marc.theaimsgroup.com/?l=linux-ide&m=110769875419863&w=2

After commenting out the lines

    /* reduce TBG clock to 133 Mhz. */
    /*tmp = readl(mmio + PDC_TBG_MODE); */
    tmp &= ~0x30000; /* clear bit 17, 16*/
    tmp |= 0x10000; /* set bit 17:16 = 0:1 */
    /*writel(tmp, mmio + PDC_TBG_MODE); */

in pdc_host_init (a total shot in the dark), the setup seems more stable: we have now gone through 3 cycles of the stress test (600GB of copying) and have not seen the crash.

Earlier we tried the same stress test with ATA_DEBUG and ATA_VERBOSE_DEBUG defined, and the error did not occur (maybe because everything was slowed down by all the output?).

Later we tried commenting out the lines that set BMR burst (PDC_FLASH_CTL) and slew rate (PDC_SLEW_CTL) in pdc_host_init. That slowed the setup to half its original speed, but in that case the problem did not show up.

We printed the command that timed out: it is 0x35 (ATA_CMD_WRITE_EXT), and the protocol is 4 (ATA_PROT_DMA).

Any ideas?

TIA,
Francois Payette
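
P.S. For reference, the two mask lines left uncommented in the TBG snippet only touch bits 17:16 of whatever value tmp holds. Below is a minimal sketch of that arithmetic on a plain integer, with no MMIO access. The names tbg_apply, tbg_field, tbg_selfcheck and the two macros are made up for illustration and do not exist in sata_promise.c; only the constants 0x30000 and 0x10000 come from the driver code quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the quoted pdc_host_init lines. */
#define TBG_FIELD_MASK 0x30000u /* bits 17 and 16 */
#define TBG_FIELD_01   0x10000u /* bit 17 = 0, bit 16 = 1 */

/*
 * Hypothetical helper (not in the driver): the same clear-then-set
 * arithmetic as "tmp &= ~0x30000; tmp |= 0x10000;", applied to a
 * plain value instead of a register read.
 */
static uint32_t tbg_apply(uint32_t reg)
{
    reg &= ~TBG_FIELD_MASK; /* clear bits 17:16 */
    reg |= TBG_FIELD_01;    /* set bits 17:16 to 0:1 */
    return reg;
}

/* Hypothetical helper: extract the two-bit field at bits 17:16. */
static uint32_t tbg_field(uint32_t reg)
{
    return (reg & TBG_FIELD_MASK) >> 16;
}

/* Small self-check: the field always ends up as 1, other bits survive. */
static void tbg_selfcheck(void)
{
    assert(tbg_field(tbg_apply(0x00000000u)) == 1u);
    assert(tbg_field(tbg_apply(0x00030000u)) == 1u);
    assert((tbg_apply(0xFFFFFFFFu) & ~TBG_FIELD_MASK) == 0xFFFCFFFFu);
}
```

With the readl() and writel() lines commented out as described in this message, the result of this arithmetic is never written back to PDC_TBG_MODE, so the register keeps whatever value it had before pdc_host_init ran.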