public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Bryan Andersen <bryan@bogonomicon.net>
To: Robbert Kouprie <robbert@radium.jvb.tudelft.nl>
Cc: "'Stephan von Krawczynski'" <skraw@ithnet.com>,
	"'Benjamin Herrenschmidt'" <benh@kernel.crashing.org>,
	alan@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: 2.4.21-pre4: PDC ide driver problems with shared interrupts
Date: Wed, 05 Feb 2003 13:45:39 -0600	[thread overview]
Message-ID: <3E4169E3.4050003@bogonomicon.net> (raw)
In-Reply-To: 009f01c2cd35$d9015860$020da8c0@nitemare

This looks similar to what I complained about in "PDC292XX DMA
loss in 2.4.21-pre3-ac4".  I can reproduce my problem but it only 
happens randomly under heavy load.  I have to load down accesses
to both disks (each on their own channel as master) on the
controller.  The same task running on only one of the disks sees
no problems.

    http://www.ussg.iu.edu/hypermail/linux/kernel/0301.3/0239.html

The only differnce now is I have seen it on both disks on the
PDC29269.  I have one disk on each channel.  I've ruled out problems 
with the hardware.  I'm now using different cables and a differnt 
PDC29269 controller.

This is from earlier today.

Feb  5 10:57:11 blip kernel: hdg: dma_intr: status=0xd0 { Busy }
Feb  5 10:57:11 blip kernel:
Feb  5 10:57:11 blip kernel: hdg: DMA disabled
Feb  5 10:57:11 blip kernel: PDC202XX: Secondary channel reset.
Feb  5 10:57:11 blip kernel: ide3: reset: success

This one is recent after the reboot after the one above.

Feb  5 13:22:42 blip kernel: hde: dma_intr: status=0xd0 { Busy }
Feb  5 13:22:42 blip kernel:
Feb  5 13:22:42 blip kernel: hde: DMA disabled
Feb  5 13:22:42 blip kernel: PDC202XX: Primary channel reset.
Feb  5 13:22:43 blip kernel: ide2: reset: success

It happened after about 12 minutes run time on the script below.

#!/bin/bash
mount1=/data5 # {mount point of disk one}
mount2=/data6 # {mount point of disk two}
mount3=/usr   # Further load to use up DMA channel time

(while(true)
do
    find /$mount1 -type f -print | xargs wc
done > /dev/null) &
(while(true)
do
    find /$mount2 -type f -print | xargs wc
done > /dev/null) &
(while(true)
do
    find /$mount3 -type f -print | xargs wc
done > /dev/null) &
wait

exit 0


I haven't looked at the code yet, but it comes to mind what
happens when one has run out of DMA channels or interrupts
to service requests?  My system has both the PDC29269 channels
on a shared interrupt with the a couple of other devices.  My
motherboard is a ASUS A7N8X with the nForce2 chipset.

- Bryan

Robbert Kouprie wrote:
 > Benjamin Herrenschmidt wrote:
 >
 >
 >>>Feb 4 01:02:22 admin kernel: hde: dma_intr: status=0xd0 { Busy }
 >>>Feb 4 01:02:22 admin kernel:
 >>>Feb 4 01:02:22 admin kernel: PDC202XX: Primary channel reset.
 >>>Feb 4 01:02:22 admin kernel: ide2: reset: success
 >>>Feb 4 01:02:23 admin kernel: hde: status error: status=0x58 { DriveReady
 >>
 > SeekComplete DataRequest }
 >
 >>>Feb 4 01:02:23 admin kernel:
 >>>Feb 4 01:02:23 admin kernel: hde: drive not ready for command
 >>>Feb 4 01:02:23 admin kernel: hde: status error: status=0x50 { DriveReady
 >>
 > SeekComplete }
 >
 >>>Feb 4 01:02:23 admin kernel:
 >>>Feb 4 01:02:23 admin kernel: hde: no DRQ after issuing WRITE
 >>>Feb 4 01:02:23 admin kernel: hde: status timeout: status=0xd0 { Busy }
 >>>Feb 4 01:02:23 admin kernel:
 >>
 >>Hi Alan !
 >>
 >>I'm trying to get some sense out of the above report as it seem to match
 >>a problem a user reported me as well. It's interesting because it's
 >>apparently running UP, so it's not the SMP race found by Ross. It's
 >>definitely a problem with shared interrupt though.
 >
 >
 > Hi all,
 >
 > I experience a similar problem too on UP (and UP kernel, no preempt, no
 > taskfile, no acpi). Having replaced almost all hardware involved, I'm 99%
 > sure it isn't a hardware problem. I reported to Alan & Andre privately at
 > the time, but unfortunately there was no way to reproduce it, so nothing
 > came out of it. Also there have been a few threads where it looks like
 > people report the same problem.
 >
 > Like:
 > http://www.uwsg.indiana.edu/hypermail/linux/kernel/0210.3/0416.html (2nd
 > paragraph)
 > http://www.uwsg.indiana.edu/hypermail/linux/kernel/0301.1/0547.html
 > (replacing the cable actually _didn't_ fix it in the end)
 >
 > System is a PIII-866 on an Asus board, WD1800JB 180Gb on pri master
 > (UDMA100), WD1200BB 120Gb sec master (UDMA100) on an Promise PDC20269 PCI
 > card, kernel 2.4.20-rc1-ac4 (yes it's an older one, but testing is 
hard, as
 > it sometimes takes a month for the problem to show up again).
 >
 > In a shared interrupt setup (i.e. 2 disks on each channel of the 
PDC), the
 > machine will just crash in unpredictable ways, without leaving 
traces. With
 > only the 180Gb drive on the PDC, there's no interrupt sharing, and the
 > system is able to stay alive when the error happens, it's just the 
disk that
 > drops dead in that case. Box has to be powered off (resetting doesn't 
help)
 > for the drive to get alive again. The logs say:
 >
 > Nov  1 08:24:14 Bambi kernel: hde: dma_timer_expiry: dma status == 0x21
 > Nov  1 08:24:24 Bambi kernel: PDC202XX: Primary channel reset.
 > Nov  1 08:24:24 Bambi kernel: hde: timeout waiting for DMA
 > Nov  1 08:24:24 Bambi kernel: hde: (__ide_dma_test_irq) called while not
 > waiting
 > Nov  1 08:24:24 Bambi kernel: blk: queue c02d5e58, I/O limit 4095Mb (mask
 > 0xffffffff)
 > Nov  1 08:24:34 Bambi kernel: hde: irq timeout: status=0xd0 { Busy }
 > Nov  1 08:24:34 Bambi kernel:
 > Nov  1 08:24:34 Bambi kernel: hde: DMA disabled
 > Nov  1 08:24:34 Bambi kernel: PDC202XX: Primary channel reset.
 > Nov  1 08:25:04 Bambi kernel: ide2: reset timed-out, status=0xd0
 > Nov  1 08:25:04 Bambi kernel: hde: status timeout: status=0xd0 { Busy }
 > Nov  1 08:25:04 Bambi kernel:
 > Nov  1 08:25:04 Bambi kernel: PDC202XX: Primary channel reset.
 > Nov  1 08:25:34 Bambi kernel: ide2: reset timed-out, status=0xd0
 >
 > A few times I tried to reset the drive with hdparm using "hdparm -w
 > /dev/hde". This didn't have success and produced the following log 
messages:
 >
 > Nov  1 12:02:57 Bambi kernel: hde: DMA disabled
 > Nov  1 12:02:57 Bambi kernel: PDC202XX: Primary channel reset.
 > Nov  1 12:02:57 Bambi kernel: hde: ide_timer_expiry: hwgroup->busy 
was 0 ??
 > Nov  1 12:03:27 Bambi kernel: ide2: reset timed-out, status=0xd0
 >
 > Does anyone have any theory of how to reproduce this (maybe with the new
 > theory of Benjamin)? I would be glad to test new kernels, but some 
testcase
 > is required to be able to supply more results (within a reasonable 
amount of
 > time that is ;)).
 >
 > Below all specs in the shared irq setup, 2 drives on the 20269, one 
on each
 > channel. (In the non-shared setup, I moved the WD 120Gb to an extra
 > PDC20262, no interrupts were shared.)
 >
 > Regards,
 > - Robbert Kouprie
 >



  reply	other threads:[~2003-02-05 19:36 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-02-05 16:44 2.4.21-pre4: PDC ide driver problems with shared interrupts Robbert Kouprie
2003-02-05 19:45 ` Bryan Andersen [this message]
     [not found] <3F1C54A8.5020404@snarkhunter.com>
2003-07-22 14:44 ` Edward King
2003-07-22 18:07   ` John V. Martinez
2003-07-22 21:02     ` Edward King
     [not found] <20030202153009$2e0d@gated-at.bofh.it>
     [not found] ` <20030205181006$107c@gated-at.bofh.it>
     [not found]   ` <20030205181006$7bb8@gated-at.bofh.it>
     [not found]     ` <20030205181006$455c@gated-at.bofh.it>
     [not found]       ` <20030205181006$5dba@gated-at.bofh.it>
     [not found]         ` <20030205181006$3358@gated-at.bofh.it>
     [not found]           ` <200302061451.h16Epl0Z001134@pc.skynet.be>
2003-02-23 14:33             ` Stephan von Krawczynski
2003-02-23 15:04               ` Arjan van de Ven
2003-02-23 17:29                 ` Stephan von Krawczynski
     [not found] <7b263321.0302140626.2ddb7980@posting.google.com>
2003-02-14 14:41 ` Edward King
2003-02-14 15:41   ` Benjamin Herrenschmidt
  -- strict thread matches above, loose matches on Subject: below --
2003-02-02 15:18 2.4.21-pre4: tg3 " Stephan von Krawczynski
2003-02-02 16:49 ` Jeff Garzik
2003-02-02 17:52   ` Stephan von Krawczynski
2003-02-02 18:28     ` Jeff Garzik
2003-02-05  9:48       ` 2.4.21-pre4: PDC ide " Stephan von Krawczynski
2003-02-05 11:16         ` Benjamin Herrenschmidt
2003-02-05 11:39           ` Stephan von Krawczynski
2003-02-05 12:21             ` Alan Cox
2003-02-05 12:22             ` Benjamin Herrenschmidt
2003-02-05 12:50               ` Alan Cox
2003-02-05 13:19               ` Stephan von Krawczynski
2003-02-05 12:24           ` Alan Cox
2003-02-05 16:56           ` Ross Biro
2003-02-05 17:12             ` Benjamin Herrenschmidt
2003-02-05 17:19               ` Ross Biro
2003-02-05 17:34                 ` Benjamin Herrenschmidt
2003-02-05 17:38                   ` Stephan von Krawczynski
     [not found]                     ` <1044467091.685.155.camel@zion.wanadoo.fr>
2003-02-05 17:58                       ` Stephan von Krawczynski
2003-02-05 20:00                     ` Bryan Andersen
2003-02-05 19:10                 ` Alan Cox
2003-02-06 12:20               ` Stephan von Krawczynski
2003-02-06 23:04                 ` Benjamin Herrenschmidt
2003-02-07  9:10                   ` Stephan von Krawczynski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3E4169E3.4050003@bogonomicon.net \
    --to=bryan@bogonomicon.net \
    --cc=alan@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robbert@radium.jvb.tudelft.nl \
    --cc=skraw@ithnet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox