From: Trevor Cordes
Subject: ide dma bug?
Date: Sun, 4 Sep 2005 15:18:44 -0500
Message-ID: <20050904201844.GA12464@pog.tecnopolis.ca>
To: linux-ide@vger.kernel.org

A couple of times every couple of days I'm getting:

Sep 1 07:09:53 piles kernel: hdq: dma_intr: bad DMA status (dma_stat=75)
Sep 1 07:09:53 piles kernel: hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
Sep 1 07:09:53 piles kernel: ide: failed opcode was: unknown
Sep 1 07:10:26 piles kernel: hdr: dma_intr: bad DMA status (dma_stat=75)
Sep 1 07:10:26 piles kernel: hdr: dma_intr: status=0x50 { DriveReady SeekComplete }
Sep 1 07:10:26 piles kernel: ide: failed opcode was: unknown
Sep 3 19:41:36 piles kernel: hdt: dma_intr: bad DMA status (dma_stat=75)
Sep 3 19:41:36 piles kernel: hdt: dma_intr: status=0x50 { DriveReady SeekComplete }
Sep 3 19:41:36 piles kernel: ide: failed opcode was: unknown

This is my franken-computer file server: 12 IDE PATA drives running a
combo of RAID6 and RAID0 (and a little RAID1). They're running off the
onboard Intel 845D IDE, the onboard Promise PDC20276 (used only as a
simple IDE controller), one Promise Ultra100 TX2 (PDC20268), and two
CMD680-based cards. Eight of the drives are 250GB (three different
brands -- on purpose!) and four are 160GB.

The eight 250s are all IDE masters. The four 160s share two IDE channels
(master+slave); they are hdq, hdr, hds, and hdt. Those are the ONLY
drives that show these intermittent DMA errors, so the problem obviously
has something to do with running some drives as slaves.

I was reluctant to put in another IDE card (having so many already
scares me), but I have one on order right now. I'm hoping that putting
them all on master will alleviate the problem.

Note that besides the log messages I see no other symptoms: the RAID
arrays stay up and everything seems to run fine. However, a friend said
the following to me, and so I'm writing this email:

"Not good... means you've got DMA problems. Possibly related to having
too many scatter-gather operations pending simultaneously, all being
triggered off one or two IRQs... ??? I'd post to either the ATA or
kernel or RAID mailing lists and ask for opinions, since that's a
potentially BIG problem in terms of CPU usage."

"What you're telling me about those disks tells me that there's a
slight bug in the driver code - basically a straggler interrupt handler
looks for DMA results, sees bogosity, checks the drive for errors and
prints a message about bad-DMA-but-drives-are-fine."

My kernel is 2.6.12-1.1372_FC3.

My hdparm settings are all identical:

# hdparm /dev/hds

/dev/hds:
 multcount    = 16 (on)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
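
For reference, something like the loop below is how the settings on the
four slave drives can be compared and the intermittent errors pulled out
of the log (assuming syslog lands in /var/log/messages, as it does on
this box):

  for d in /dev/hdq /dev/hdr /dev/hds /dev/hdt; do
      echo "== $d =="
      hdparm "$d"        # same settings dump as shown above
  done
  # log path is an assumption; adjust to wherever klogd writes
  grep dma_intr /var/log/messages | tail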