* hot-swap problems with Adaptec AIC-7902 (aic79xx)
From: Andrea Carpani @ 2005-08-24 11:29 UTC
To: linux-scsi
Hi everybody,
here are some issues I'm having with my system dealing with
hot-swapping.
The box is a Tyan GX28 (B2881) B2881G28U4H with 4 Hot-swap U320 SCSI
bays. SCSI controller is Adaptec AIC-7902 dual channel Ultra320 SCSI.
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: FUJITSU Model: MAP3735NC Rev: 0108
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: FUJITSU Model: MAP3735NC Rev: 0108
Type: Direct-Access ANSI SCSI revision: 03
Linux kernel 2.6.12.3 (no patches).
I have 2 drives single partition set up as a single md0 software
mirrored raid device (xfs filesystem). I set /dev/sdb1 as faulty and
remove it from the array.
I then want to hot-swap the drive with another one.
echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi
removes it and
cat /proc/scsi/scsi
shows this. If I physically swap the drive (with a different Maxtor one)
and issue
echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi
nothing happens (syslog:
Aug 24 12:53:48 localhost kernel: scsi0: ILLEGAL_PHASE 0x80
Aug 24 12:53:48 localhost kernel: (scsi0:A:1:0): Abort Message Sent)
and the new drive appears in /proc/scsi/scsi only after a second "echo"
command (I assume this is a power-up delay).
At this point I'm not yet adding the drive to the mirror. The problem is
that if I repeat the last steps more than once (remove-single-device,
swap the drives again, add-single-device) I get the following error on
the console and everything freezes
I/O error in filesystem ("md0") meta-data dev md0 block 0x44308c4
("xlog_iodone") error 5 buf count 1024
Filesystem "md0": Log I/O error detected.
Shutting down filesystem: md0
Please umount the filesystem and rectify the problem(s).
Which is quite strange as I'm only scsi-dealing with the sdb device and
the filesystem at this point should only be on sda.
Here are some questions:
Is it possible that the scsi level operations disturb the other drive?
Which is the correct way to hot-swap scsi disks? Am I doing something
wrong?
More often than not (but not as easily reproducible) the removal and
detection of a new drive fails and the box hangs (no console messages):
could it be a driver/board problem?
Are there well tested scsi adapters/drivers that I should use?
Which scsi debug info should I turn on to help understand the problem?
Thanks,
Andrea.
--
Andrea Carpani <andrea.carpani@criticalpath.net>
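
The remove/add sequence described in the message above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the names SCSI_PROC, scsi_cmd, device_present, add_with_retry, RETRIES and DELAY are hypothetical and not from the original post, and the retry loop merely automates the "second echo" the poster needed after the assumed power-up delay. It does not address the ILLEGAL_PHASE message or the freeze.

```shell
#!/bin/sh
# Sketch only: wraps the legacy /proc/scsi/scsi commands quoted in the message.
# All function and variable names here are hypothetical, chosen for illustration.
SCSI_PROC="${SCSI_PROC:-/proc/scsi/scsi}"

# Build the command string.
# Usage: scsi_cmd <add|remove> <host> <channel> <id> <lun>
scsi_cmd() {
    printf 'scsi %s-single-device %s %s %s %s' "$1" "$2" "$3" "$4" "$5"
}

# Return 0 if <host> <channel> <id> <lun> is listed in $SCSI_PROC,
# e.g. as "Host: scsi0 Channel: 00 Id: 01 Lun: 00".
device_present() {
    pattern=$(printf 'Host: scsi%d Channel: %02d Id: %02d Lun: %02d' \
        "$1" "$2" "$3" "$4")
    grep -qF "$pattern" "$SCSI_PROC"
}

# Ask the kernel to remove a device.
remove_device() {
    scsi_cmd remove "$@" > "$SCSI_PROC"
}

# Ask the kernel to add a device, repeating the request a few times
# in case the new drive is still spinning up (an assumption from the post).
add_with_retry() {
    tries="${RETRIES:-3}"
    while [ "$tries" -gt 0 ]; do
        scsi_cmd add "$@" > "$SCSI_PROC"
        sleep "${DELAY:-5}"
        if device_present "$@"; then
            return 0
        fi
        tries=$((tries - 1))
    done
    return 1
}
```

With the functions loaded, the steps in the message would correspond to `remove_device 0 0 1 0`, physically swapping the drive, and then `add_with_retry 0 0 1 0`, run as root. Keeping the target file in a variable makes it possible to try the helpers against an ordinary file before touching a live system.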
* Re: hot-swap problems with Adaptec AIC-7902 (aic79xx)
From: bernd @ 2005-08-24 14:42 UTC
To: linux-scsi
>Hi everybody,
>here are some issues I'm having with my system dealing with
>hot-swapping.
>
>The box is a Tyan GX28 (B2881) B2881G28U4H with 4 Hot-swap U320 SCSI
>bays. SCSI controller is Adaptec AIC-7902 dual channel Ultra320 SCSI.
>
># cat /proc/scsi/scsi
>Attached devices:
>Host: scsi0 Channel: 00 Id: 00 Lun: 00
> Vendor: FUJITSU Model: MAP3735NC Rev: 0108
> Type: Direct-Access ANSI SCSI revision: 03
>Host: scsi0 Channel: 00 Id: 01 Lun: 00
> Vendor: FUJITSU Model: MAP3735NC Rev: 0108
> Type: Direct-Access ANSI SCSI revision: 03
>
>Linux kernel 2.6.12.3 (no patches).
>
>I have 2 drives single partition set up as a single md0 software
>mirrored raid device (xfs filesystem). I set /dev/sdb1 as faulty and
>remove it from the array.
>
>I then want to hot-swap the drive with another one.
>
>echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi
>
>removes it and
>
>cat /proc/scsi/scsi
>
>shows this. If I physically swap the drive (with a different Maxtor one)
>and issue
>
>echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi
>
>nothing happens (syslog:
>Aug 24 12:53:48 localhost kernel: scsi0: ILLEGAL_PHASE 0x80
>Aug 24 12:53:48 localhost kernel: (scsi0:A:1:0): Abort Message Sent)
>and the new drive appears in /proc/scsi/scsi only after a second "echo"
>command (I assume this is a power-up delay).
>
>At this point I'm not yet adding the drive to the mirror. The problem is
>that if I repeat the last steps more than once (remove-single-device,
>swap the drives again, add-single-device) I get the following error on
>the console and everything freezes
>
>I/O error in filesystem ("md0") meta-data dev md0 block 0x44308c4
>("xlog_iodone") error 5 buf count 1024
>Filesystem "md0": Log I/O error detected.
>Shutting down filesystem: md0
>Please umount the filesystem and rectify the problem(s).
>
>Which is quite strange as I'm only scsi-dealing with the sdb device and
>the filesystem at this point should only be on sda.
>
>Here are some questions:
>Is it possible that the scsi level operations disturb the other drive?
>Which is the correct way to hot-swap scsi disks? Am I doing something
>wrong?
>More often than not (but not as easily reproducible) the removal and
>detection of a new drive fails and the box hangs (no console messages):
>could it be a driver/board problem?
>Are there well tested scsi adapters/drivers that I should use?
>Which scsi debug info should I turn on to help understand the problem?
>
>
>Thanks,
>Andrea.
>
>--
>Andrea Carpani <andrea.carpani@criticalpath.net>
-------------------------------------------------------------------------
Hi Andrea,
we have the same problem with the AIC7902. We posted about it some months
ago to the SCSI and RAID lists but got no reply. We have RAID1 arrays
with 10 drives on two controllers, so 5 drives form one half of the
RAID1 arrays and 5 the other. When we hot-replace a drive, _all_ arrays
go into degraded mode because _all_ drives on the controller where
the disk is replaced are declared not ready during the spin-up of
the replaced drive. Strange! This means we cannot hot-replace a drive
on Linux, something we have been able to do for 20 years on HP-UX.
After many, many tries, our impression is that the problem came up
with kernel 2.6.x, but we are not sure. With 2.4, no customer ever
reported a problem with hot-swap. We use SUSE 9.2 / 9.3 and SLES 9.0.
Sorry that this reply does not give you a solution, but we wish we had
one, too.
Greetings, Bernd Rieke