* aic79xx driver - hotswap error
From: Martin Zuziak @ 2006-08-28 12:15 UTC
To: linux-scsi
Hello all
Hot-swapping doesn't seem to work with the aic79xx driver in kernel
2.6.17.9. Removing or adding a disk from/to a running system makes I/O
to any disk on the bus fail.
The machine is an IBM x346 server with an x86_64 CPU and an aic7902 SCSI
controller.
A copy of the system log is here:
http://www.math.ku.dk/~zuziak/tmp/aic79xx_error_2.6.17.9.log
It shows the result of removing the third disk: the first disk (the only
one mounted) becomes inaccessible.
Kernel 2.6.15.7 seems to work but I have had no luck with newer kernels.
Has anyone seen hot-swapping work with the aic79xx driver in recent
kernels?
Sincerely,
Martin Zuziak <zuziak@math.ku.dk>
* Re: aic79xx driver - hotswap error
From: Doug Ledford @ 2006-08-28 15:45 UTC
To: Martin Zuziak; +Cc: linux-scsi
On Mon, 2006-08-28 at 14:15 +0200, Martin Zuziak wrote:
> Hello all
>
> Hot-swapping doesn't seem to work with the aic79xx driver in kernel
> 2.6.17.9. Removing or adding a disk from/to a running system makes i/o
> to any disk on the bus fail.
>
> The machine is an IBM x346 server with a x86_64 cpu and a aic7902 scsi
> controller.
>
> A copy of the system log is here:
> http://www.math.ku.dk/~zuziak/tmp/aic79xx_error_2.6.17.9.log
>
> It shows the result of removing the third disk: the first disk (the only
> one mounted) becomes inaccessible.
>
> Kernel 2.6.15.7 seems to work but I have had no luck with newer kernels.
>
> Has anyone seen hot-swapping work with the aic79xx driver in recent
> kernels?
Are you sure your system is hot swap safe? The whole log mess begins
with "someone reset channel A" which means the card detected a bus reset
but it didn't initiate the reset. That's either going to be because
your system shouldn't be hot swap plugged and it triggered a spike on
the reset pin, or because your hot swap drive setup resets the bus on
unplug intentionally. Knowing which of the two it is would help.
So, the driver managed to get into the ahd_pause_and_flushwork()
function, probably while trying to queue the abort SCB, and while there
it detected an infinite loop and printed out the "Infinite interrupt
loop, INTSTAT = 8" message. The INTSTAT value of 0x08 maps to SCSIINT,
so next you would look at the SCSIINT1 and SCSIINT2 registers to see
just *what* is causing the loop. There you see SSTAT1[0x20]:(SCSIRSTI).
This tells us the driver is *still* getting a SCSI Reset In interrupt
from the card, more than a minute after you pulled the drive. So, the
reason your SCSI bus hung is because everything on the bus is being
subjected to an infinite bus reset condition. The cause of this
happening is likely either A) your bus isn't hot swap safe and you hot
swapped anyway, and in the process you disconnected the termination
power source or termination itself or just plain flaked other devices on
the bus out or B) something in your hot swap enclosure is broken and
throws an infinite bus reset when the drive is removed. Either way,
this is not behavior I would expect the aic79xx driver to cause;
I suspect the driver is innocent here and the hardware is to blame.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: aic79xx driver - hotswap error
From: Hannes Reinecke @ 2006-08-29 9:26 UTC
To: Doug Ledford; +Cc: Martin Zuziak, linux-scsi
Doug Ledford wrote:
> Are you sure your system is hot swap safe? The whole log mess begins
> with "someone reset channel A" which means the card detected a bus reset
> but it didn't initiate the reset. That's either going to be because
> your system shouldn't be hot swap plugged and it triggered a spike on
> the reset pin, or because your hot swap drive setup resets the bus on
> unplug intentionally. Knowing that would help.
>
> So, the driver managed to get into the ahd_pause_and_flushwork()
> function, probably while trying to queue the abort SCB, and while there
> it detected an infinite loop and printed out the "Infinite interrupt
> loop, INTSTAT = 8" message. The INTSTAT value of 0x08 maps to SCSIINT,
> so next you would look at the SCSIINT1 and SCSIINT2 registers to see
> just *what* is causing the loop. There you see SSTAT1[0x20]:(SCSIRSTI).
> This tells us the driver is *still* getting a SCSI Reset In interrupt
> from the card, even over 1 minute after you pulled the drive. So, the
> reason your SCSI bus hung is because everything on the bus is being
> subjected to an infinite bus reset condition. The cause of this
> happening is likely either A) your bus isn't hot swap safe and you hot
> swapped anyway, and in the process you disconnected the termination
> power source or termination itself or just plain flaked other devices on
> the bus out or B) something in your hot swap enclosure is broken and
> throws an infinite bus reset when the drive is removed. Either way,
> this is not what I would call expected behavior from the aic79xx driver,
> I suspect that it is innocent here and that the hardware is to blame.
>
Thanks Doug.
I couldn't have phrased it better.
Hotswap does work, provided you don't do anything untoward,
which translates to "might work, but don't blame me if it doesn't".
And I would never ever claim that it is a supported feature.
Cheers,
Hannes
--
Dr. Hannes Reinecke hare@suse.de
SuSE Linux Products GmbH S390 & zSeries
Maxfeldstraße 5 +49 911 74053 688
90409 Nürnberg http://www.suse.de
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: aic79xx driver - hotswap error
From: Martin Zuziak @ 2006-08-30 13:23 UTC
To: Hannes Reinecke; +Cc: Doug Ledford, linux-scsi
On Tue, Aug 29, 2006 at 11:26:44AM +0200, Hannes Reinecke wrote:
> Hotswap does work, provided you don't do anything untoward.
> Which translates as "might work but don't blame me if it doesn't".
> And I would never ever claim that it is a supported feature.
Thank you both for your replies.
First of all, the hardware (both server and disks) does support hot
swap. And I have tried on three different machines (same model) to rule
out a hardware fault.
I have never had any problems with hot swap before. And indeed hot swap
works on these machines with the 2.6.15.7 kernel. But something
changed in 2.6.16, so it no longer works.
However, if hot swap isn't supported, then that's just the way it is.
Sincerely,
Martin Zuziak <zuziak@math.ku.dk>
* Re: aic79xx driver - hotswap error
From: Hannes Reinecke @ 2006-08-30 13:41 UTC
To: Martin Zuziak; +Cc: Doug Ledford, linux-scsi
Martin Zuziak wrote:
> Thank you both for your replies.
>
> First off all the hardware (both server and disks) does support hot
> swap. And I have tried on three different machines (same model) to rule
> out a hardware fault.
>
> I have never had any problems with hot swap before. And indeed hot swap
> on these machines work with the 2.6.15.7 kernel. But something has
> changed since 2.6.16 so it no longer works.
>
Hmm. But you should see something in the logs for 2.6.15, too.
Ideally some aic79xx stack dump. Can you dig it out?
If an aic79xx state dump is available for 2.6.15, we might be able to
figure out the difference and fix the driver if possible.
But saying that 'hotswap is supported' simply doesn't cut it.
The spec actually allows you to claim 'hotswap is supported' even if you
have to power-cycle the entire cabinet,
or the HBA.
Hmm. I wonder whether this is related to the infamous bus polling
mechanism the original driver used ...
Anyway, can you get the logs?
Cheers,
Hannes
* Re: aic79xx driver - hotswap error
From: Martin Zuziak @ 2006-08-31 11:16 UTC
To: Hannes Reinecke; +Cc: Doug Ledford, linux-scsi
> Hmm. But you should see something in the logs for 2.6.15, too.
> Ideally some aic79xx stack dump. Can you dig it out?
> If a aic79xx state dump is available for 2.6.15 we might be able to
> figure out the difference and fix the driver if possible.
No stack dump is logged and I don't know how to force one.
After echoing "scsi remove-single-device 1 0 2 0" to /proc/scsi/scsi
nothing is logged.
After the disk is physically removed from its bay this is logged:
Aug 31 12:45:58 nyimf kernel: scsi1: Someone reset channel A
And when it's re-inserted:
Aug 31 12:46:24 nyimf kernel: scsi1: Someone reset channel A
After echoing "scsi add-single-device 1 0 2 0" to /proc/scsi/scsi the
disk is detected and ready to use:
Aug 31 12:46:45 nyimf kernel: Vendor: IBM-ESXS Model: VPR036C3-ETS10FN Rev: S3C0
Aug 31 12:46:45 nyimf kernel: Type: Direct-Access ANSI SCSI revision: 04
Aug 31 12:46:45 nyimf kernel: target1:0:2: asynchronous.
Aug 31 12:46:45 nyimf kernel: scsi1:A:2:0: Tagged Queuing enabled. Depth 32
Aug 31 12:46:45 nyimf kernel: target1:0:2: Beginning Domain Validation
Aug 31 12:46:45 nyimf kernel: target1:0:2: wide asynchronous.
Aug 31 12:46:45 nyimf kernel: target1:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU RDSTRM RTI WRFLOW PCOMP (6.25 ns, offset 127)
Aug 31 12:46:45 nyimf kernel: target1:0:2: Domain Validation skipping write tests
Aug 31 12:46:45 nyimf kernel: target1:0:2: Ending Domain Validation
Aug 31 12:46:49 nyimf udevd-event[20583]: wait_for_sysfs: waiting for '/sys/devices/pci0000:00/0000:00:06.0/0000:07:00.0/0000:08:07.1/host1/target1:0:2/1:0:2:0/bus' failed
Aug 31 12:46:52 nyimf udevd-event[20583]: wait_for_sysfs: waiting for '/sys/devices/pci0000:00/0000:00:06.0/0000:07:00.0/0000:08:07.1/host1/target1:0:2/1:0:2:0/ioerr_cnt' failed
Aug 31 12:46:54 nyimf kernel: sdc: Spinning up disk............ready
Aug 31 12:46:54 nyimf kernel: SCSI device sdc: 71096640 512-byte hdwr sectors (36401 MB)
Aug 31 12:46:54 nyimf kernel: SCSI device sdc: drive cache: write through
Aug 31 12:46:54 nyimf kernel: SCSI device sdc: 71096640 512-byte hdwr sectors (36401 MB)
Aug 31 12:46:54 nyimf scsi_id[20586]: scsi_id: unable to access parent device of '/block/sdc'
Aug 31 12:46:54 nyimf kernel: SCSI device sdc: drive cache: write through
Aug 31 12:46:54 nyimf scsi_id[20587]: scsi_id: unable to access parent device of '/block/sdc'
Aug 31 12:46:54 nyimf kernel: sdc:
Aug 31 12:46:54 nyimf kernel: sd 1:0:2:0: Attached scsi disk sdc
Aug 31 12:46:54 nyimf kernel: sd 1:0:2:0: Attached scsi generic sg2 type 0
> But saying that 'hotswap is supported' simply doesn't cut it.
> The spec actually allows you to claim 'hotswap is supported' when you
> have to power-cycle the entire cabinet.
> Or the HBA.
I don't think IBM designed the server that way. But I must admit to
knowing very little about SCSI. It just seemed suspicious that hot
swapping stopped working with 2.6.16 when the changelog showed a lot of
changes to the aic79xx driver.
Sincerely,
Martin Zuziak <zuziak@math.ku.dk>