Hi Tejun,

We have been injecting errors on some drives to validate that the new error handling kicks out bad drives.

Using 2.6.18-rc3 on a box with 4 drives (3 good and one with an artificially created ECC error in the 4-way MD RAID1 partition), the error handling worked through the various transitions, but did not give up on the bad drive decisively enough to let the boot continue using the other 3.

I plan to look at the state of the drive with an analyzer tomorrow to make sure that the drive is not holding the bus or something, and to try your latest "new init" git tree code.

What it looks like is a soft hang. Maybe the box is stuck in ata_port_wait_eh(), which never seems to time out on a bus that does not recover?

Attached is the console log,

ric