From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: sata_sil24 AMD64 crash/lockup
Date: Thu, 06 Oct 2005 16:04:01 +0900
Message-ID: <4344CC61.8070609@gmail.com>
References: <20051004221450.18205.qmail@science.horizon.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from wproxy.gmail.com ([64.233.184.195]:50093 "EHLO wproxy.gmail.com")
    by vger.kernel.org with ESMTP id S1750744AbVJFHEJ (ORCPT );
    Thu, 6 Oct 2005 03:04:09 -0400
Received: by wproxy.gmail.com with SMTP id i2so170340wra for ;
    Thu, 06 Oct 2005 00:04:09 -0700 (PDT)
In-Reply-To: <20051004221450.18205.qmail@science.horizon.com>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: linux@horizon.com
Cc: linux-ide@vger.kernel.org

Hello, there.

linux@horizon.com wrote:
> I'm trying to bring up a new AMD64 (uniprocessor) storage server with
> 3x Sil3132 PCIe SATA controllers running 6x Seagate 7200.8 drives.
>
> Kernel 2.6.13.2 + 2.6.13-rc7-libata1.patch.bz2 + PPSkit-light
>
> I'm having some problems with intermittent (every few days) crashes
> which lock up the drives. The machine is not yet in service, so
> activity is pretty light, but I've been running zcav on a 6-way RAID-0
> partition to keep it busy. (350 MB/sec sustained is fun.)
>
> I have twice seen an assert fail that I didn't manage to write down.
> (The first time, I thought it was something I had done, and the second,
> someone rebooted it before I got a chance.)
>
> Both times, the keyboard was still operating and I could scroll back.
> This last time, it was locked up hard and I could only get what was
> on the screen.
> Omitting the leading ffffffff from the kernel addresses,
> and modulo any transcription errors, what I saw was:
>
> 802c1d50 do_unblank_screen+272
> 8012117e do_page_fault+1838
> 80133a9c call_console_drivers+76
> 801348c9 vprintk+601
> 80147739 autoremove_wake_function+9
> 8012ff23 wake_up_common+67
> 8010f22d error_exit+0
> 80340b3a ata_gen_fixed_sense+138
> 8034128a ata_scsi_qc_complete+106
> 8033ceaa ata_qc_complete+362
> 80342c15 sil24_interrupt+325
>

Oh.. this is because of the missing tf_read callback, which is called
from ata_gen_fixed_sense(). I'll try to figure out what to do about it
a bit later. I gotta leave for English lessons now. I'm already a bit
late.

However, your kernel hitting that path means that some drives are
actually generating errors. Let's see about that after we get the
tf_read thing fixed.

Damn. I'm really late. :-)

> I know there are more recent kernels and sata_sil24 code, but that's
> the most current pair I could figure out how to fit together.
>
> Trying the instructions at http://kernel.org/git/ gives me:
>
> $ cg-clone http://www.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> defaulting to local storage area
> 17:54:10 URL:http://www.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git/refs/heads/master [41/41] -> "refs/heads/origin" [1]
> progress: 2 objects, 926 bytes
> error: File ca442d313d86dc67e0a2e5d584b465bd382cbf5c (http://www.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git/objects/ca/442d313d86dc67e0a2e5d584b465bd382cbf5c) corrupt
>
> Cannot obtain needed blob ca442d313d86dc67e0a2e5d584b465bd382cbf5c
> while processing commit 0000000000000000000000000000000000000000.
> cg-pull: objects pull failed
> cg-clone: pull failed
>
>
> In the meantime, I'll keep working to reproduce the problem.
>
> Thanks for any hints!

--
tejun
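
For illustration only: the failure mode described above, where a shared
helper calls a driver callback that the driver never filled in, can be
sketched in plain userspace C. This is not the libata code. Every name
below (demo_tf, demo_port_ops, demo_tf_read, demo_gen_sense) is
hypothetical, and the guard shown is just one possible way to handle an
unset callback, not necessarily the fix that was eventually chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a taskfile: only status and error bytes. */
struct demo_tf {
    unsigned char status;
    unsigned char error;
};

/* Hypothetical per-driver operations table; tf_read may be left unset. */
struct demo_port_ops {
    void (*tf_read)(struct demo_tf *tf);
};

/* Example callback for a driver that does provide tf_read. */
static void demo_tf_read(struct demo_tf *tf)
{
    assert(tf != NULL);
    tf->status = 0x51; /* example values only */
    tf->error = 0x04;
}

/*
 * Shared helper that needs the taskfile contents. Calling ops->tf_read
 * unconditionally would jump through a NULL pointer for a driver that
 * left the callback unset, so this sketch checks first.
 * Returns 0 on success, -1 if no tf_read callback is available.
 */
static int demo_gen_sense(const struct demo_port_ops *ops, struct demo_tf *tf)
{
    if (ops == NULL || ops->tf_read == NULL)
        return -1;
    ops->tf_read(tf);
    return 0;
}
```

A caller with an ops table whose tf_read is NULL gets -1 back from
demo_gen_sense() instead of faulting, which is the difference between
a reported error and the kind of page fault seen in the trace.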