From: Andreas Steinmetz <ast@domdv.de>
To: linux-kernel@vger.kernel.org
Subject: ioapic+shared interrupt+concurrent access=ide lockup on 2.4 and 2.6
Date: Wed, 04 Feb 2004 21:14:18 +0100
Message-ID: <4021529A.506@domdv.de>

Summary:
--------

The IDE subsystem locks up on a dual Opteron system when the IO-APIC is
enabled and there are concurrent disk accesses on a PCI IDE controller
card. The lockup occurs with different PCI IDE controller cards, and
there are boot parameters that work around it. The lockup is easy to
reproduce when a software RAID5 array is used.

Details are given below. If additional information is required, please
contact me. I can currently run any tests needed to fix this problem.
Although the problem exists in both 2.4 and 2.6, I would prefer to
investigate it on 2.6.2, because there the system does not lock up
completely, unlike on 2.4.

Symptoms:
---------

Running "dd if=/dev/md1 of=/dev/null bs=65536" on 2.6.2 (64 bit)
produces the following after a few seconds:

hde: dma_timer_expiry: dma status == 0x24
hdg: dma_timer_expiry: dma status == 0x24
hde: DMA interrupt recovery
hde: lost interrupt
hdg: DMA interrupt recovery
hdg: lost interrupt
hde: dma_timer_expiry: dma status == 0x24
hdg: dma_timer_expiry: dma status == 0x24
hde: DMA interrupt recovery
hde: lost interrupt
hdg: DMA interrupt recovery
hdg: lost interrupt
...

Any further access to hde or hdg then always produces the same messages
as shown above.

When dd-ing concurrently from all disks (/dev/hdc, /dev/hde, /dev/hdg)
from userspace on 2.6.2, the MIS count increases and the lockup
eventually happens, though it takes considerably longer than with the
raid5 array.
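
The concurrent read test can be sketched in Python as below. This is
only an illustration of the access pattern, not the exact commands used
for this report; the function names read_all and read_concurrently are
made up for the example, and the block size simply mirrors the bs=65536
of the dd command above.

```python
import threading


def read_all(path, block_size=65536, limit=None):
    """Read `path` sequentially and discard the data (like dd of=/dev/null).

    `limit` optionally caps the number of bytes read. Returns bytes read.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while limit is None or total < limit:
            want = block_size if limit is None else min(block_size, limit - total)
            data = f.read(want)
            if not data:  # end of device/file
                break
            total += len(data)
    return total


def read_concurrently(paths, block_size=65536, limit=None):
    """Start one reader thread per path and wait for all of them.

    Returns a dict mapping each path to the number of bytes read from it.
    """
    results = {}

    def worker(path):
        results[path] = read_all(path, block_size, limit)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Run as root, read_concurrently(["/dev/hdc", "/dev/hde", "/dev/hdg"])
would read all three disks at the same time, which is the situation
described above.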

On 2.4 (either 32 bit or 64 bit), the same dd command on /dev/md1 takes
a few minutes to result in:

hde: dma_timer_expiry: dma status == 0x24

The 2.4 systems then lock up completely (the NMI watchdog triggers, no
SysRq on 64 bit).

In all cases the MIS count in /proc/interrupts increases steadily before
the lockup. It always equals the ide2/ide3 interrupt count on cpu1. Note
that no IRQ balancing is active, so there should be no interrupts on
cpu1.
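
A small Python sketch for watching this relationship is shown below. It
assumes the usual x86 /proc/interrupts layout (a header row of CPU
names, one row per IRQ with per-CPU counts followed by the controller
type and device names, and a single-valued "MIS:" row); the function
names are hypothetical.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text.

    Returns (cpus, rows), where rows maps a row label such as "19" or
    "MIS" to a tuple (list of per-CPU counts, description string).
    """
    lines = text.splitlines()
    cpus = lines[0].split()
    rows = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        # Leading numeric fields are counters, at most one per CPU.
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        rows[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return cpus, rows


def mis_vs_cpu1(text, device="ide2"):
    """Return (MIS count, interrupt count on the second CPU for `device`)."""
    _, rows = parse_interrupts(text)
    mis = rows["MIS"][0][0]
    for counts, desc in rows.values():
        # Device names are comma separated, e.g. "ide2, ide3".
        if device in desc.replace(",", " ").split() and len(counts) > 1:
            return mis, counts[1]
    raise KeyError(device)
```

Calling mis_vs_cpu1(open("/proc/interrupts").read()) periodically during
the dd run should show both numbers growing together if the behaviour
described above is present.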

During the tests causing the lockup the system is otherwise idle.

Workarounds:
------------

In the cases listed below the lockup does not happen and the MIS count
in /proc/interrupts stays at zero.

1. Boot with "noapic"
2. Boot with "ide2=serialize" (tested on 2.6.2 only)
3. Boot with "ide3=serialize" (tested on 2.6.2 only)

System:
-------

Tyan S2885 dual Opteron 246

IDE config:
-----------

ide0/hda DVD-RW
ide1/hdc WD2500JB
ide2/hde WD2500JB
ide3/hdg WD2500JB

ide2/ide3 are connected to either an HPT302 or a CMD680 card (same
behaviour with both cards), so ide2 and ide3 share the same interrupt.

Raid5 config:
-------------

raiddev /dev/md1
	raid-level		5
	nr-raid-disks		3
	nr-spare-disks		0
	persistent-superblock	1
	chunk-size		32
	device			/dev/hdc6
	raid-disk		0
	device			/dev/hde6
	raid-disk		1
	device			/dev/hdg6
	raid-disk		2
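
One possible reason the array triggers the problem so quickly is that
each 64 KiB read spans two 32 KiB chunks on different disks. The Python
sketch below maps read offsets to member disks. It assumes the
left-symmetric parity layout, which the raidtab above does not state,
and that chunk-size is given in KiB; the function names are made up for
the illustration.

```python
CHUNK = 32 * 1024                  # chunk-size 32 (KiB) from the raidtab
DISKS = ["hdc6", "hde6", "hdg6"]   # raid-disk 0, 1, 2


def data_disk(chunk_number, raid_disks=3):
    """Index of the member disk holding a data chunk (left-symmetric RAID5)."""
    data_disks = raid_disks - 1
    stripe = chunk_number // data_disks
    dd_idx = chunk_number % data_disks
    pd_idx = data_disks - stripe % raid_disks   # parity disk of this stripe
    return (pd_idx + 1 + dd_idx) % raid_disks


def disks_for_read(offset, length=65536, chunk=CHUNK, disks=DISKS):
    """Member disks touched by a read of `length` bytes at array `offset`."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return [disks[data_disk(c, len(disks))] for c in range(first, last + 1)]
```

Under these assumptions every 64 KiB read touches two disks, and every
third one (for example disks_for_read(131072)) touches hde6 and hdg6
together, the two disks that share an interrupt.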

-- 
Andreas Steinmetz

