* LSI SAS changes SCSI address and by-path on hot-swap
From: Asdo @ 2010-03-04 16:55 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org
Hello all,
we need to buy new controllers for the new storage systems we are building.
LSI SAS HBAs are very attractive for our purposes but I identified a
problem with our existing mainboard-integrated LSI SAS 1068E. The
problem is that it is apparently not possible to use the
/dev/disk/by-path feature of Linux with it. At least not with the kernel
2.6.24 we are using (excuse me if it has already been fixed in the latest
kernels: the server is in production now and it's not easy for us to check).
We need the /dev/disk/by-path feature because we commonly hot-swap
drives and we need to know for sure which HDD slot corresponds to a
certain Linux block device. With other controllers like the 3ware 9650SE
there is no such problem, but that's a SATA controller... I don't
know if the problem is by design with SAS controllers.
Actually the problem is even more complicated because the new
storage systems we plan to assemble will have SAS expanders in the
middle.
Look, here is a hot-swap as seen in dmesg:
Feb 22 14:27:30 myserver kernel: [655437.601971] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
Feb 22 14:27:35 myserver kernel: [655442.781061] mptsas: ioc0: removing sata device, channel 0, id 0, phy 0
Feb 22 14:27:35 myserver kernel: [655442.781453] sd 5:0:10:0: [sdu] Synchronizing SCSI cache
Feb 22 14:27:35 myserver kernel: [655442.781495] sd 5:0:10:0: [sdu] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Feb 22 14:28:22 myserver kernel: [655489.237562] mptsas: ioc0: attaching sata device, channel 0, id 0, phy 0
Feb 22 14:28:22 myserver kernel: [655489.241959] scsi 5:0:11:0: Direct-Access ATA WDC WD10EADS-00P 0A01 PQ: 0 ANSI: 5
Feb 22 14:28:22 myserver kernel: [655489.242506] sd 5:0:11:0: [sdu] 1953525168 512-byte hardware sectors (1000205 MB)
Feb 22 14:28:22 myserver kernel: [655489.248104] sd 5:0:11:0: [sdu] Write Protect is off
Feb 22 14:28:22 myserver kernel: [655489.251847] sd 5:0:11:0: [sdu] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 22 14:28:22 myserver kernel: [655489.252161] sd 5:0:11:0: [sdu] 1953525168 512-byte hardware sectors (1000205 MB)
Feb 22 14:28:22 myserver kernel: [655489.257758] sd 5:0:11:0: [sdu] Write Protect is off
Feb 22 14:28:22 myserver kernel: [655489.261518] sd 5:0:11:0: [sdu] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 22 14:28:22 myserver kernel: [655489.261525] sdu: unknown partition table
Feb 22 14:28:22 myserver kernel: [655489.287152] sd 5:0:11:0: [sdu] Attached SCSI disk
Feb 22 14:28:22 myserver kernel: [655489.287204] sd 5:0:11:0: Attached scsi generic sg21 type 0
You see, when I remove the disk it takes away device sd 5:0:10:0 and
when I reinsert a new drive it becomes device sd 5:0:11:0.
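
The address change described above can be spotted mechanically. What follows is only a minimal sketch, assuming each kernel message has been joined onto one line and that the sd driver prints "sd H:C:T:L: [sdX]" as in the log above; the function name and the regular expression are illustrative, not part of any existing tool.

```python
import re

# Matches the "sd H:C:T:L: [sdX]" part of an sd driver log line.
SD_LINE = re.compile(r"\bsd (\d+):(\d+):(\d+):(\d+): \[(sd[a-z]+)\]")

def scsi_addresses(log_lines):
    """Return {disk name: ordered list of distinct (host, channel, target, lun)}."""
    seen = {}
    for line in log_lines:
        m = SD_LINE.search(line)
        if not m:
            continue
        addr = tuple(int(g) for g in m.groups()[:4])
        history = seen.setdefault(m.group(5), [])
        # Record an address only when it differs from the previous one.
        if not history or history[-1] != addr:
            history.append(addr)
    return seen

# Three lines taken from the hot-swap log above (joined onto one line each).
sample = [
    "Feb 22 14:27:35 myserver kernel: [655442.781453] sd 5:0:10:0: [sdu] Synchronizing SCSI cache",
    "Feb 22 14:28:22 myserver kernel: [655489.242506] sd 5:0:11:0: [sdu] 1953525168 512-byte hardware sectors (1000205 MB)",
    "Feb 22 14:28:22 myserver kernel: [655489.287152] sd 5:0:11:0: [sdu] Attached SCSI disk",
]
print(scsi_addresses(sample))  # → {'sdu': [(5, 0, 10, 0), (5, 0, 11, 0)]}
```

A disk whose history holds more than one address kept its sdX name while its SCSI target number moved, which is exactly the situation shown in the log.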
The /dev/disk/by-path entry for the disk also changes, from:
/dev/disk/by-path/pci-0000:0b:00.0-sas-0x500e08101003c820:1:0-0x1221000000000000:0
to:
/dev/disk/by-path/pci-0000:0b:00.0-sas-0x500e08101003c824:1:4-0x1221000000000000:0
(note: I'm not 100% sure that these two entries come from the same
hot-swap as the dmesg above)
In rare cases I noticed that after a hot swap the entry in
/dev/disk/by-path for the device is not even recreated.
I also cannot trust drive letters because they can change across reboots,
and they also change if I remove drive A, remove drive B, insert drive
B, insert drive A... the letters would be swapped. So they are not
reliable enough for our use.
So is this a real bug that is maybe fixed in newer kernels, or is it by
design?
How can people reliably use hot-swap hardware in this situation...? Are
there other ways to determine the physical connections from within linux
(possibly through SAS expanders also), which I am not aware of?
Thank you
Asdo

* Re: LSI SAS changes SCSI address and by-path on hot-swap
From: James Bottomley @ 2010-03-05 6:22 UTC
To: Asdo; +Cc: linux-scsi@vger.kernel.org

On Thu, 2010-03-04 at 17:55 +0100, Asdo wrote:
> [...]
> How can people reliably use hot-swap hardware in this situation...? Are
> there other ways to determine the physical connections from within linux
> (possibly through SAS expanders also), which I am not aware of?

So what I think I hear in the foregoing is that you actually want to
identify a device by slot number in the chassis? For that,
/dev/disk/by-path will never work; you need to be using enclosure
services.

However, since you mention you'll be using SAS and expanders, there is a
way to get to the slot numbers without using enclosure services: the phy
numbers of the expander (and HBA) ports usually correspond one for one
with the slots.

So for sda, if, in my system, you look at /sys/block/sda/device, it's a
symbolic link to

/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:04.0/host3/port-3:0/end_device-3:0/target3:0:0/3:0:0:0

The thing you want is the port-3:0. If you look in sysfs at this:

ls /sys/class/sas_port/port-3\:0/device

Mine contains

phy-3:4

showing this disk is actually connected to phy 4 of the output device
(as the HBA counts).

For expanders it's a little more complex: you'll see multiple ports in
the path, but it's the phy of the last one you want.

James
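
The lookup James describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tested tool: the function names are made up for the example, and the patterns assume that port and phy entries behind an expander simply carry extra ":N" components (e.g. "port-3:0:1"), which the thread does not confirm.

```python
import os
import re

# Assumed naming: "port-H:N" on the HBA, with extra ":N" parts behind expanders.
PORT_RE = re.compile(r"^port-\d+:\d+(?::\d+)*$")
# The phy number is taken to be the last numeric component of "phy-...".
PHY_RE = re.compile(r"^phy-\d+(?::\d+)*:(\d+)$")

def last_sas_port(device_path):
    """Return the last port-* component of a resolved sysfs device path."""
    ports = [part for part in device_path.split("/") if PORT_RE.match(part)]
    return ports[-1] if ports else None

def phy_numbers(entries):
    """Extract phy numbers from the names found in a port's device directory."""
    return sorted(int(m.group(1)) for m in map(PHY_RE.match, entries) if m)

def phys_for_disk(disk, sysfs="/sys"):
    """Look up the phy number(s) behind a block device such as 'sda'."""
    path = os.path.realpath(os.path.join(sysfs, "block", disk, "device"))
    port = last_sas_port(path)
    if port is None:
        return []  # not a SAS-attached disk, or an unexpected sysfs layout
    port_dir = os.path.join(sysfs, "class", "sas_port", port, "device")
    return phy_numbers(os.listdir(port_dir))

# The path and directory entry quoted in the message above.
example = ("/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:04.0/"
           "host3/port-3:0/end_device-3:0/target3:0:0/3:0:0:0")
print(last_sas_port(example))    # → port-3:0
print(phy_numbers(["phy-3:4"]))  # → [4]
```

Taking the last port in the path follows the advice for expanders: the port nearest the disk is the one whose phy corresponds to the slot.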

* Re: LSI SAS changes SCSI address and by-path on hot-swap
From: Asdo @ 2010-03-05 11:12 UTC
To: James Bottomley; +Cc: linux-scsi@vger.kernel.org

James Bottomley wrote:
> [CUT]
> The thing you want is the port-3:0. If you look in sysfs at this:
>
> ls /sys/class/sas_port/port-3\:0/device
>
> Mine contains phy-3:4 showing this disk is actually connected to phy 4
> of the output device (as the HBA counts).

WONDERFUL!!
Thanks for replying.
I cannot check right now on my system, maybe next week, but I trust this
will work...

Thanks again
A.

* RE: LSI SAS changes SCSI address and by-path on hot-swap
From: Moore, Michael @ 2010-03-05 16:57 UTC
To: James Bottomley, Asdo; +Cc: linux-scsi@vger.kernel.org

I did this with an LSI-1068E HBA and 2 x 4 drive hot swap SATA bays. I was able to create udev rules to map the drive slots to consistent /dev entries.

However, the bigger problem I had was that if I had a drive inserted and mounted (say in slot A) and then I added or swapped another drive on the same port (4 SAS channels per port on the external HBAs) it would cause some sort of reset on the bus that would end up unmounting the drive in slot A even though I never did anything to the drive in slot A. Now, this was with SATA drives connected directly to the 1068, which should work, but since I needed this to work, I had to revert to the older setup that used Silicon Image 3124 eSATA cards.

I can try to dig up the udev rules I used if this would be helpful.

- Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: LSI SAS changes SCSI address and by-path on hot-swap
From: Asdo @ 2010-03-05 23:05 UTC
To: Moore, Michael; +Cc: James Bottomley, linux-scsi@vger.kernel.org

Moore, Michael wrote:
> [...] However, the bigger problem I had was that if I had a drive
> inserted and mounted (say in slot A) and then I added or swapped
> another drive on the same port (4 SAS channels per port on the
> external HBAs) it would cause some sort of reset on the bus that would
> end up unmounting the drive in slot A even though I never did anything
> to the drive in slot A. [...]
>
> I can try to dig up the udev rules I used if this would be helpful.
>
> - Mike

Michael, thanks for replying.

The problem you describe would really be a showstopper for us, so I'd
really like to understand it well!...

I tried to reread your post multiple times but I don't fully understand,
excuse my ignorance... this is because I am not really familiar with
SAS/SCSI terms.

You have an LSI-1068E attached directly to hot swap SATA bays, you don't
have expanders in the middle, right?
Then you swapped one of the 8 drives and another one got disconnected
because of that?

This doesn't seem to happen on my setup. I have a mainboard-integrated
LSI-1068E and kernel 2.6.24. I only have 4 drives connected and they all
belong to "port-5:x" (x is different for each drive). I didn't try to
swap all of them, but I definitely tried swapping one, and no other
drive was disconnected because of this. The other drives were part of an
md-raid; it would have been disastrous if it had happened.

What kernel version do you have?

The following sentence is not clear to me:
"then I added or swapped another drive on the same port ( 4 SAS channels
per port on the external HBAs)"
What are the "external HBAs"? Is that an expander?

Also... are you sure the problem was not maybe due to the udev rule?
Sorry, I don't know udev; I don't know if it even has the power to
unmount a drive...

Thank you
A.

* RE: LSI SAS changes SCSI address and by-path on hot-swap
From: Moore, Michael @ 2010-03-09 16:50 UTC
To: Asdo; +Cc: James Bottomley, linux-scsi@vger.kernel.org

Sorry for top posting, but Outlook just screws it all up.

The cards I've used are an LSI Logic SAS 3800X (8 port External PCI-X card w/ 2 x SFF-8470 SAS connectors) and an LSI SAS 3801E (8 Port External PCI-e card with 2 x SFF-8088 SAS connectors). Each connector has 4 SAS links.
The SAS protocol is downwardly compatible with SATA, so you can run SATA drives right on a SAS cable.

So, in my setup, I basically have 1 drive per SAS link. No expanders, or anything fancy. The issue I mentioned happens to the 4 drives on the same connector. When the driver is detecting the new drive, it looks like it redetects all of the drives on the connector (or it at least reports one new drive and the other existing drives). If you were in a directory from one of the mounted drives, you get IO errors as it appears that the drive was removed, and then remounted, but in a way that was not clean.

This has happened with default CentOS 5 kernels (2.6.18-*.el5), 2.6.26 vanilla, 2.6.30 vanilla, Fedora latest.
The issue appeared no matter what.

The udev rules used the ENV{ID_PATH} option to tie the sysfs value that indicates which PCI ID + SAS phy on the SAS HBA a drive uses to the device detected by the kernel, and then create a symlink from the /dev/sd<X> entry to /dev/slot<Y>, where Y is the label on the slot of the hot swap bays (a-h). Here is an example of the rule:

KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n"

I did this because the device ID number that the kernel reports increments every time a drive is swapped. So, even though you are using the same SAS channel, you do not have consistent drive numbering. So I had to go down to the SAS phy to get something consistent. The SiI-3124/libata setup had consistent device IDs (the ID was tied to the SATA channel, and I used the device ID to do the mapping). Perhaps udev is the reason for the issues, but I tend to think it is the way the SAS/SCSI subsystem works, as I have never seen the SATA/libata subsystem have this "rescan/remount" behavior.
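
To make the effect of Michael's rule concrete, here is a small Python approximation of how such rules map a device to a slot symlink. It is a sketch only: Python's fnmatch stands in for udev's glob matching, the second rule and the sample ID_PATH value ending in "-lun0" are hypothetical (the thread shows only the first rule and no full ID_PATH), and the helper names are invented for the example.

```python
import fnmatch
import re

# (KERNEL glob, ID_PATH glob, symlink prefix). The first row is the rule from
# the message above; the second is a hypothetical sibling for the next slot.
RULES = [
    ("sd*", "pci-0000:04:00.0-sas-phy0:1*", "slota"),
    ("sd*", "pci-0000:04:00.0-sas-phy1:1*", "slotb"),  # assumed, not from the thread
]

def kernel_number(kernel_name):
    """Mimic udev's %n: the trailing digits of the kernel name, or ''."""
    return re.search(r"(\d*)$", kernel_name).group(1)

def slot_symlinks(kernel_name, id_path, rules=RULES):
    """Return the symlink names the rules would add for one device."""
    return [
        prefix + kernel_number(kernel_name)
        for kernel_glob, path_glob, prefix in rules
        if fnmatch.fnmatchcase(kernel_name, kernel_glob)
        and fnmatch.fnmatchcase(id_path, path_glob)
    ]

# Hypothetical ID_PATH value, used only to exercise the matching.
id_path = "pci-0000:04:00.0-sas-phy0:1-lun0"
print(slot_symlinks("sdc", id_path))   # → ['slota']
print(slot_symlinks("sdc1", id_path))  # → ['slota1']
```

Because the match key is the phy rather than the SCSI target number, the symlink name stays the same when the kernel assigns a new target number or a different sdX letter after a swap.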

* Re: LSI SAS changes SCSI address and by-path on hot-swap
From: Boaz Harrosh @ 2010-03-10 10:59 UTC
To: Moore, Michael; +Cc: Asdo, James Bottomley, linux-scsi@vger.kernel.org

On 03/09/2010 06:50 PM, Moore, Michael wrote:
> Sorry for top posting, but Outlook just screws it all up.
>

Outlook is the root of all computer evil and should be avoided like the
plague. Exchange servers work just as well with Thunderbird or any other
non-evil email client through the IMAP protocol. The mail is easy to
set up; the address book needs some low-level LDAP definitions but once
set works like a charm. For office scheduling you can use Outlook just
for that, alongside Thunderbird (it's IMAP) for all other mail needs.

Outlook is evil
Boaz

* quotes in reply messages (was Re: LSI SAS changes SCSI address and by-path on hot-swap)
From: Stefan Richter @ 2010-03-10 13:49 UTC
To: Boaz Harrosh; +Cc: Moore, Michael, Asdo, James Bottomley, linux-scsi@vger.kernel.org

Boaz Harrosh wrote:
> On 03/09/2010 06:50 PM, Moore, Michael wrote:
>> Sorry for top posting, but Outlook just screws it all up.
>>
>
> Outlook is the root of all computer evil and should be avoided like the
> plague. Exchange servers work just as well with Thunderbird or any other
> non-evil email client through the IMAP protocol.
[...]

There are also macros for Outlook which properly preformat quoted text
when you hit reply. Search the web for outlook quotefix.
--
Stefan Richter
-=====-==-=- --== -=-=-
http://arcgraph.de/sr/

* Re: LSI SAS changes SCSI address and by-path on hot-swap
From: Asdo @ 2010-03-12 15:25 UTC
To: Moore, Michael; +Cc: James Bottomley, linux-scsi@vger.kernel.org

Moore, Michael wrote:
> [...]
> So, in my setup, I basically have 1 drive per SAS link. No expanders, or anything fancy. The issue I mentioned happens to the 4 drives on the same connector. When the driver is detecting the new drive, it looks like it redetects all of the drives on the connector (or it at least reports one new drive and the other existing drives). If you were in a directory from one of the mounted drives, you get IO errors as it appears that the drive was removed, and then remounted, but in a way that was not clean.
> [...]

This looks like a horrible bug for people having software RAID on the
disks (or maybe even hardware RAID)

I seem not to have this bug on ubuntu kernel 2.6.24, I mean my situation
was similar with the mainboard-integrated LSISAS 1068E and it didn't
happen to me, but that doesn't mean much...

Also, LSI controllers are very much used by linuxers.
Have you tried reporting it here to get it fixed?
Or reporting it to the LSI tech support? They are pretty responsive even
if their web interface is a bit strange.

I'm thinking about buying a few LSI HBA controllers for linux
software RAID use, probably external ones like the one you have. Maybe
attached to expanders. I'll keep my fingers crossed!
* Re: LSI SAS changes SCSI address and by-path on hot-swap 2010-03-12 15:25 ` LSI SAS changes SCSI address and by-path on hot-swap Asdo @ 2010-03-12 15:32 ` James Bottomley 2010-03-12 15:50 ` Asdo 0 siblings, 1 reply; 11+ messages in thread From: James Bottomley @ 2010-03-12 15:32 UTC (permalink / raw) To: Asdo; +Cc: Moore, Michael, linux-scsi@vger.kernel.org On Fri, 2010-03-12 at 16:25 +0100, Asdo wrote: > Moore, Michael wrote: > > Sorry for top posting, but Outlook just screws it all up. > > > > The cards I've used are a LSI Logic SAS 3800X (8 port External PCI-X card w/ 2 x SFF-8470 SAS connectors) and LSI SAS 3801E ( 8 Port External PCI-e card with 2 x SFF-8088 SAS connectors). Each connector has 4 SAS links. > > The SAS protocol is downwardly compatible with SATA, so you can run SATA drives right on a SAS cable. > > > > So, in my setup, I basically have 1 drive per SAS link. No expanders, or anything fancy. The issues I mentioned happens to the 4 drives on the same connector. When the driver is detecting the new drive, it looks like it redetects all of the drives on the connector (or it at least reports one new drive and the other existing drives). If you were in a directory from one of the mounted drives, you get IO Errors as it appears that the drive was removed, and then remounted, but in a way that was not clean. > > > > This has happened with Default CentOS 5 kernels (2.6.18-*.el5), 2.6.26 vanilla, 2.6.30 vanilla, Fedora latest. > > The issue appeared no matter what. > > > > The udev rules used the ENV{ID_PATH} option to tie to the sysfs value that indicated which PCI ID + SAS phy on the SAS HBA used by the drives to the device detected by the kernel, and then create a symlink from the /dev/sd<X> entry to /dev/slot<Y>, where Y is the label on the slot of the hot swap bays (a-h). 
Here is an example of the rule: > > > > KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n" > > > > I did this because the device ID number that the kernel reports increments every time a drive is swapped. So, even though you are using the same SAS channel, you do not have a consistent drive numbering. So I had to go down to the SAS phy to get something consistent. The SiI-3124/libata setup had consistent device ID's (the ID was tied to the SATA channel, and I used the device ID to do the mapping. Perhaps udev is the reason for the issues, but I tend to think it is the way the SAS/SCSI subsystem works as I have never seen the SATA/libata subsystem have this "rescan/remount" behavior. > > > > This looks like a horrible bug for people having software RAID on the > disks (or maybe even hardware RAID) Not really, most people want to identify the disk permanently, not the slot, so that's what /dev/disk/by-id and /dev/disk/by-uuid is for. > I seem not to have this bug on ubuntu kernel 2.6.24, I mean my situation > was similar with the mainboard-integrated LSISAS 1068E and it didn't > happen to me, but that doesn't mean much... > > Also, LSI controllers are very much used by linuxers. > Have you tried reporting it here and try to get it fixed? > Or reporting it to the LSI tech support? They are pretty responsive even > if their web interface is a bit strange. > > I'm thinking about buying a few of LSI HBA controllers for linux > software RAID use, probably external ones like the one you have. Maybe > attached to expanders. I'll keep my fingers crossed! It's not just LSI that does this ... every SAS board will tend to increment target numbers on add and remove because that's the way the transport class does it. In linux, you have to expect the /dev/sdX name to be volatile and mount by id or uuid instead. To mount by slot, you can use the phy workaround for SAS/SATA, but you should really be using an enclosure management service. 

James
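
The advice above, to treat /dev/sdX as volatile and refer to disks
through persistent names, can be illustrated with a small sketch. The
Python below is not from the thread. It assumes a udev-populated
directory such as /dev/disk/by-id (by-uuid or by-path work the same way)
and only reports which kernel device each persistent symlink currently
resolves to. The function name resolve_disk_links is made up for
illustration.

```python
import os


def resolve_disk_links(directory="/dev/disk/by-id"):
    """Map each symlink name in `directory` to the device node it points at.

    Returns an empty dict if the directory does not exist (for example on
    a system without udev), so the sketch is safe to run anywhere.
    """
    if not os.path.isdir(directory):
        return {}
    mapping = {}
    for name in sorted(os.listdir(directory)):
        link = os.path.join(directory, name)
        # Only symlinks are of interest; udev creates them pointing at
        # the volatile kernel name (../../sdX).
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, device in resolve_disk_links().items():
        print(f"{name} -> {device}")
```

Running it before and after a hot-swap should show the persistent name
staying the same while the kernel name on the right-hand side may
change, which is the behaviour described in this thread.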

* Re: LSI SAS changes SCSI address and by-path on hot-swap
2010-03-12 15:32 ` James Bottomley
@ 2010-03-12 15:50 ` Asdo
0 siblings, 0 replies; 11+ messages in thread
From: Asdo @ 2010-03-12 15:50 UTC (permalink / raw)
To: James Bottomley; +Cc: Moore, Michael, linux-scsi@vger.kernel.org

James Bottomley wrote:
> On Fri, 2010-03-12 at 16:25 +0100, Asdo wrote:
>
>> Moore, Michael wrote:
>>
>>> Sorry for top posting, but Outlook just screws it all up.
>>>
>>> The cards I've used are a LSI Logic SAS 3800X (8 port External PCI-X
>>> card w/ 2 x SFF-8470 SAS connectors) and LSI SAS 3801E ( 8 Port
>>> External PCI-e card with 2 x SFF-8088 SAS connectors). Each connector
>>> has 4 SAS links.
>>> The SAS protocol is downwardly compatible with SATA, so you can run
>>> SATA drives right on a SAS cable.
>>>
>>> So, in my setup, I basically have 1 drive per SAS link. No expanders,
>>> or anything fancy. The issues I mentioned happens to the 4 drives on
>>> the same connector. When the driver is detecting the new drive, it
>>> looks like it redetects all of the drives on the connector (or it at
>>> least reports one new drive and the other existing drives). If you
>>> were in a directory from one of the mounted drives, you get IO Errors
>>> as it appears that the drive was removed, and then remounted, but in
>>> a way that was not clean.
>>>
>>> This has happened with Default CentOS 5 kernels (2.6.18-*.el5),
>>> 2.6.26 vanilla, 2.6.30 vanilla, Fedora latest.
>>> The issue appeared no matter what.
>>>
>>> The udev rules used the ENV{ID_PATH} option to tie to the sysfs value
>>> that indicated which PCI ID + SAS phy on the SAS HBA used by the
>>> drives to the device detected by the kernel, and then create a
>>> symlink from the /dev/sd<X> entry to /dev/slot<Y>, where Y is the
>>> label on the slot of the hot swap bays (a-h).
>>> Here is an example of the rule:
>>>
>>> KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n"
>>>
>>> I did this because the device ID number that the kernel reports
>>> increments every time a drive is swapped. So, even though you are
>>> using the same SAS channel, you do not have a consistent drive
>>> numbering. So I had to go down to the SAS phy to get something
>>> consistent. The SiI-3124/libata setup had consistent device ID's (the
>>> ID was tied to the SATA channel, and I used the device ID to do the
>>> mapping. Perhaps udev is the reason for the issues, but I tend to
>>> think it is the way the SAS/SCSI subsystem works as I have never seen
>>> the SATA/libata subsystem have this "rescan/remount" behavior.
>>>
>>>
>> This looks like a horrible bug for people having software RAID on the
>> disks (or maybe even hardware RAID)
>>
>
> Not really, most people want to identify the disk permanently, not the
> slot, so that's what /dev/disk/by-id and /dev/disk/by-uuid is for.
>
>

No James, I am *not* referring to the topic of my original post now
(for that one I understood what to do, thank you). I am now referring to
the bug reported by Michael.

Reread this part by Michael:

> So, in my setup, I basically have 1 drive per SAS link. No expanders,
> or anything fancy. The issues I mentioned happens to the 4 drives on
> the same connector. When the driver is detecting the new drive, it
> looks like it redetects all of the drives on the connector (or it at
> least reports one new drive and the other existing drives). If you
> were in a directory from one of the mounted drives, you get IO Errors
> as it appears that the drive was removed, and then remounted, but in
> a way that was not clean.
>

and his previous post on this same thread.

If the drives are part of an MD raid, they are going to be kicked by MD
if they give errors when one of their siblings is hot-swapped. If
multiple drives are kicked simultaneously (as seems to happen for
Michael), the array will go down and you might not even be able to
bring it up again with --force (depending on various factors, e.g. how
many drives were on the same controller vs. how many were on other
controllers). If you are able to bring the array up again, it will
probably be in a degraded state. Data loss is also very likely.

end of thread, other threads:[~2010-03-12 15:52 UTC | newest]

Thread overview: 11+ messages
2010-03-04 16:55 LSI SAS changes SCSI address and by-path on hot-swap Asdo
2010-03-05 6:22 ` James Bottomley
2010-03-05 11:12 ` Asdo
2010-03-05 16:57 ` Moore, Michael
2010-03-05 23:05 ` Asdo
2010-03-09 16:50 ` Moore, Michael
2010-03-10 10:59 ` Boaz Harrosh
2010-03-10 13:49 ` quotes in reply messages (was Re: LSI SAS changes SCSI address and by-path on hot-swap) Stefan Richter
2010-03-12 15:25 ` LSI SAS changes SCSI address and by-path on hot-swap Asdo
2010-03-12 15:32 ` James Bottomley
2010-03-12 15:50 ` Asdo