LSI SAS changes SCSI address and by-path on hot-swap

public inbox for linux-scsi@vger.kernel.org

* LSI SAS changes SCSI address and by-path on hot-swap
@ 2010-03-04 16:55 Asdo
  2010-03-05  6:22 ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Asdo @ 2010-03-04 16:55 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org

Hello all,

we need to buy new controllers for new storages we are building.

LSI SAS HBAs are very attractive for our purposes, but I identified a 
problem with our existing mainboard-integrated LSI SAS 1068E. The 
problem is that it is apparently not possible to use the 
/dev/disk/by-path feature of Linux with it. At least not with the kernel 
2.6.24 we are using (excuse me if it has already been fixed in the latest 
kernels: the server is in production now and it's not easy for us to check).

We need the /dev/disk/by-path feature because we commonly do hot-swaps 
with drives and we need to know for sure which HDD slot corresponds to a 
certain linux block device. With other controllers like 3ware 9650SE 
there is no such problem, ok but that's a SATA controller... I don't 
know if the problem is by design with SAS controllers.

Actually the problem is even more complicated because for the new 
storages we have planned to assemble there would be SAS expanders in the 
middle.

Look, here is a hot-swap as seen in dmesg:

        Feb 22 14:27:30 myserver kernel: [655437.601971] mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00)
        Feb 22 14:27:35 myserver kernel: [655442.781061] mptsas: ioc0: removing sata device, channel 0, id 0, phy 0
        Feb 22 14:27:35 myserver kernel: [655442.781453] sd 5:0:10:0: [sdu] Synchronizing SCSI cache
        Feb 22 14:27:35 myserver kernel: [655442.781495] sd 5:0:10:0: [sdu] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
        Feb 22 14:28:22 myserver kernel: [655489.237562] mptsas: ioc0: attaching sata device, channel 0, id 0, phy 0
        Feb 22 14:28:22 myserver kernel: [655489.241959] scsi 5:0:11:0: Direct-Access     ATA      WDC WD10EADS-00P 0A01 PQ: 0 ANSI: 5
        Feb 22 14:28:22 myserver kernel: [655489.242506] sd 5:0:11:0: [sdu] 1953525168 512-byte hardware sectors (1000205 MB)
        Feb 22 14:28:22 myserver kernel: [655489.248104] sd 5:0:11:0: [sdu] Write Protect is off
        Feb 22 14:28:22 myserver kernel: [655489.251847] sd 5:0:11:0: [sdu] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
        Feb 22 14:28:22 myserver kernel: [655489.252161] sd 5:0:11:0: [sdu] 1953525168 512-byte hardware sectors (1000205 MB)
        Feb 22 14:28:22 myserver kernel: [655489.257758] sd 5:0:11:0: [sdu] Write Protect is off
        Feb 22 14:28:22 myserver kernel: [655489.261518] sd 5:0:11:0: [sdu] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
        Feb 22 14:28:22 myserver kernel: [655489.261525]  sdu: unknown partition table
        Feb 22 14:28:22 myserver kernel: [655489.287152] sd 5:0:11:0: [sdu] Attached SCSI disk
        Feb 22 14:28:22 myserver kernel: [655489.287204] sd 5:0:11:0: Attached scsi generic sg21 type 0

You see, when I remove the disk it takes away device sd 5:0:10:0 and 
when I reinsert a new drive it becomes device sd 5:0:11:0.
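
To make this kind of address change easy to spot in a longer log, the SCSI address (host:channel:target:lun) can be pulled out of the "sd"/"scsi" message prefixes. This is only a minimal sketch: scsi_addresses is a hypothetical helper name, and the pattern is an assumption that covers just the two prefixes seen in the log above.

```python
import re

# Matches the "sd 5:0:10:0:" / "scsi 5:0:11:0:" prefix of kernel SCSI messages.
# Assumption: only these two prefixes matter for this log.
_HCTL_RE = re.compile(r"\b(?:sd|scsi) (\d+):(\d+):(\d+):(\d+):")


def scsi_addresses(log_text):
    """Return distinct (host, channel, target, lun) tuples in order of first appearance."""
    seen = []
    for match in _HCTL_RE.finditer(log_text):
        addr = tuple(int(group) for group in match.groups())
        if addr not in seen:
            seen.append(addr)
    return seen
```

Run over the log above, this should report two addresses for the same sdu name, differing only in the target number.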

The /dev/disk/by-path entry for the disk also changes, from:

/dev/disk/by-path/pci-0000:0b:00.0-sas-0x500e08101003c820:1:0-0x1221000000000000:0 

to:

/dev/disk/by-path/pci-0000:0b:00.0-sas-0x500e08101003c824:1:4-0x1221000000000000:0
(note: I'm not 100% sure that these two entries come from the same 
hot-swap as the dmesg above)

In rare cases I noticed that after a hot swap the entry in 
/dev/disk/by-path for the device is not even recreated.

I also cannot trust drive letters because they can change across reboot, 
and they also change if I remove drive A, remove drive B, insert drive 
B, insert drive A... the letters would be swapped. So it's not reliable 
enough for our use.

So is this a real bug that is maybe fixed in newer kernels, or is it by 
design?

How can people reliably use hot-swap hardware in this situation...? Are 
there other ways to determine the physical connections from within linux 
(possibly through SAS expanders also), which I am not aware of?

Thank you
Asdo


* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-04 16:55 LSI SAS changes SCSI address and by-path on hot-swap Asdo
@ 2010-03-05  6:22 ` James Bottomley
  2010-03-05 11:12   ` Asdo
  2010-03-05 16:57   ` Moore, Michael
  0 siblings, 2 replies; 11+ messages in thread
From: James Bottomley @ 2010-03-05  6:22 UTC (permalink / raw)
  To: Asdo; +Cc: linux-scsi@vger.kernel.org

So what I think I hear in the foregoing is that you actually want to
identify a device by slot number in the chassis?

For that, /dev/disk/by-path will never work; you need to be using
enclosure services.

However, since you mention you'll be using SAS and expanders, there is a
way to get to the slot numbers without using enclosure services:  The
phy numbers of the expander (and HBA) ports usually correspond one for
one with the slot.

So for sda, if, in my system, you look at /sys/block/sda/device, it's a
symbolic link to

/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:04.0/host3/port-3:0/end_device-3:0/target3:0:0/3:0:0:0

The thing you want is the port-3:0.  If you look in sysfs at this:

ls /sys/class/sas_port/port-3\:0/device

Mine contains phy-3:4, showing this disk is actually connected to phy 4
of the output device (as the HBA counts).

For expanders it's a little more complex, you'll see multiple ports in
the path, but it's the phy of the last one you want.
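
The lookup described above can be sketched in Python. This is a minimal sketch, not a tested tool: the function names are made up for illustration, it assumes the sysfs layout shown in this message (a port-H:N directory in the resolved device path and phy-H:N entries under /sys/class/sas_port/<port>/device), and the handling of longer expander-style names such as port-3:0:5 is an assumption about how those paths look.

```python
import os
import re

# Matches SAS port path components such as "port-3:0" (HBA port) or,
# by assumption, "port-3:0:5" (a port on an expander).
_PORT_RE = re.compile(r"^port-\d+(?::\d+)+$")


def last_sas_port(device_path):
    """Return the last port-* component of a resolved sysfs device path."""
    ports = [part for part in device_path.split("/") if _PORT_RE.match(part)]
    return ports[-1] if ports else None


def phy_numbers(entries):
    """Extract phy numbers from directory entries such as 'phy-3:4'."""
    numbers = []
    for name in entries:
        match = re.match(r"^phy-\d+(?::\d+)*:(\d+)$", name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def phys_for_disk(disk, sysfs="/sys"):
    """Look up the phy number(s) behind a block device such as 'sda'."""
    device_path = os.path.realpath(os.path.join(sysfs, "block", disk, "device"))
    port = last_sas_port(device_path)
    if port is None:
        return []  # not a SAS-attached disk, or an unexpected sysfs layout
    port_dir = os.path.join(sysfs, "class", "sas_port", port, "device")
    return phy_numbers(os.listdir(port_dir))
```

On the system described above, phys_for_disk("sda") would be expected to find port-3:0 and report phy 4; whether that phy number matches the slot label still has to be checked against the chassis.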

James




* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-05  6:22 ` James Bottomley
@ 2010-03-05 11:12   ` Asdo
  2010-03-05 16:57   ` Moore, Michael
  1 sibling, 0 replies; 11+ messages in thread
From: Asdo @ 2010-03-05 11:12 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi@vger.kernel.org

James Bottomley wrote:
> [CUT]
> The thing you want is the port-3.0.  If you look in sysfs at this:
>
> ls /sys/class/sas_port/port-3\:0/device
>
> Mine contains phy-3:4  Showing this disk is actually connected to phy 4
> of the output device (as the HBA counts).
>
>   
WONDERFUL!!
Thanks for replying.
I cannot check right now on my system, maybe next week, but I trust this 
will work...
Thanks again
A.


* RE: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-05  6:22 ` James Bottomley
  2010-03-05 11:12   ` Asdo
@ 2010-03-05 16:57   ` Moore, Michael
  2010-03-05 23:05     ` Asdo
  1 sibling, 1 reply; 11+ messages in thread
From: Moore, Michael @ 2010-03-05 16:57 UTC (permalink / raw)
  To: James Bottomley, Asdo; +Cc: linux-scsi@vger.kernel.org

I did this with an LSI-1068E HBA and 2 x 4-drive hot-swap SATA bays.  I was able to create udev rules to map the drive slots to consistent /dev entries.  However, the bigger problem I had was that if I had a drive inserted and mounted (say in slot A) and then I added or swapped another drive on the same port (4 SAS channels per port on the external HBAs), it would cause some sort of reset on the bus that would end up unmounting the drive in slot A even though I never did anything to the drive in slot A.  Now, this was with SATA drives connected directly to the 1068, which should work, but since I needed this to work, I had to revert to the older setup that used Silicon Image 3124 eSATA cards.

I can try to dig up the udev rules I used if this would be helpful.

- Mike
 

* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-05 16:57   ` Moore, Michael
@ 2010-03-05 23:05     ` Asdo
  2010-03-09 16:50       ` Moore, Michael
  0 siblings, 1 reply; 11+ messages in thread
From: Asdo @ 2010-03-05 23:05 UTC (permalink / raw)
  To: Moore, Michael; +Cc: James Bottomley, linux-scsi@vger.kernel.org

Moore, Michael wrote:
> I did this with a LSI-1068E HBA and 2 x 4 drive hot swap SATA bays.  I was able to create udev rules to map the drive slots to consistent /dev entries.  However, the bigger problem I had was that if I had a drive inserted and mounted (say in slot A) and then I added or swapped another drive on the same port ( 4 SAS channels per port on the external HBAs) it would cause some sort of reset on the bus that would end up unmounting the drive    
> in slot A even though I never did anything to the drive in slot A.  Now, this was with SATA drives connected directly to the 1068 which should work, but since I needed this to work, I had to revert to the older setup that used Silicon Image 3124 eSATA cards.
>
> I can try to dig up the udev rules I used if this would be helpful.
>
> - Mike
>   

Michael, thanks for replying.
The problem you describe would really be a showstopper for us, so I'd 
really like to understand it well!...
I tried to reread your post multiple times but I don't fully understand, 
excuse my ignorance... this is because I am not really familiar with 
SAS/SCSI terms.

You have an LSI-1068E attached directly to hot swap SATA bays, you don't 
have expanders in the middle, right?
Then you swapped one of the 8 drives and another one got disconnected 
because of that?

This doesn't seem to happen on my setup. I have a mainboard-integrated 
LSI-1068E and kernel 2.6.24. I only have 4 drives connected and they all 
belong to "port-5:x" (x is different for each drive). I didn't try to 
swap all of them, but I definitely tried swapping one, and no other 
drive was disconnected because of this. The other drives were part of an 
md-raid, so it would have been disastrous if it had happened.
What kernel version do you have?

The following sentence is not clear to me:
"then I added or swapped another drive on the same port ( 4 SAS channels 
per port on the external HBAs)"
What are the "external HBAs"? Are they expanders?

Also... are you sure the problem was not maybe due to the udev rule? 
Sorry I don't know udev, I don't know if it even has the power to 
unmount a drive...

Thank you
A.


* RE: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-05 23:05     ` Asdo
@ 2010-03-09 16:50       ` Moore, Michael
  2010-03-10 10:59         ` Boaz Harrosh
  2010-03-12 15:25         ` LSI SAS changes SCSI address and by-path on hot-swap Asdo
  0 siblings, 2 replies; 11+ messages in thread
From: Moore, Michael @ 2010-03-09 16:50 UTC (permalink / raw)
  To: Asdo; +Cc: James Bottomley, linux-scsi@vger.kernel.org

Sorry for top posting, but Outlook just screws it all up.

The cards I've used are an LSI Logic SAS 3800X (8-port external PCI-X card w/ 2 x SFF-8470 SAS connectors) and an LSI SAS 3801E (8-port external PCI-e card with 2 x SFF-8088 SAS connectors).  Each connector has 4 SAS links.
The SAS protocol is downwardly compatible with SATA, so you can run SATA drives right on a SAS cable.

So, in my setup, I basically have 1 drive per SAS link.  No expanders, or anything fancy.  The issue I mentioned happens to the 4 drives on the same connector.  When the driver is detecting the new drive, it looks like it redetects all of the drives on the connector (or it at least reports one new drive and the other existing drives).  If you were in a directory on one of the mounted drives, you get I/O errors, as it appears that the drive was removed and then remounted, but in a way that was not clean.

This has happened with default CentOS 5 kernels (2.6.18-*.el5), 2.6.26 vanilla, 2.6.30 vanilla, and the latest Fedora.
The issue appeared no matter what.

The udev rules used the ENV{ID_PATH} key, which is derived from sysfs and indicates which PCI ID + SAS phy on the HBA a drive is attached to, to match the device detected by the kernel and then create a /dev/slot<Y> symlink to the /dev/sd<X> entry, where Y is the label on the slot of the hot swap bays (a-h).   Here is an example of the rule:

KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n"

I did this because the device ID number that the kernel reports increments every time a drive is swapped.  So, even though you are using the same SAS channel, you do not have consistent drive numbering.  So I had to go down to the SAS phy to get something consistent.  The SiI-3124/libata setup had consistent device IDs (the ID was tied to the SATA channel), and I used the device ID to do the mapping.  Perhaps udev is the reason for the issues, but I tend to think it is the way the SAS/SCSI subsystem works, as I have never seen the SATA/libata subsystem have this "rescan/remount" behavior.
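
For all eight bays, a set of rules like the one above could be generated rather than typed by hand. A minimal sketch in Python, under loud assumptions: slot_rules is a hypothetical helper, it assumes that slot a sits on phy 0, slot b on phy 1 and so on, and that the ID_PATH value for every phy follows the same "sas-phyN:1*" pattern as the single rule shown. Both should be checked against the real ID_PATH values (for example in udev's environment for each disk) before use.

```python
import string


def slot_rules(pci_id, phy_count, prefix="slot"):
    """Build one udev rule line per HBA phy, labelling slots a, b, c, ...

    Assumption: phy N maps to the Nth slot letter and every phy uses the
    same ID_PATH pattern as the example rule in the message above.
    """
    rules = []
    for phy, label in zip(range(phy_count), string.ascii_lowercase):
        rules.append(
            'KERNEL=="sd*", ENV{ID_PATH}=="pci-%s-sas-phy%d:1*", SYMLINK+="%s%s%%n"'
            % (pci_id, phy, prefix, label)
        )
    return rules


if __name__ == "__main__":
    # Print rules for an 8-phy HBA at the PCI address used in the example rule.
    print("\n".join(slot_rules("0000:04:00.0", 8)))
```

The first generated line is identical to the rule quoted above; the remaining seven are extrapolations from it.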


* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-09 16:50       ` Moore, Michael
@ 2010-03-10 10:59         ` Boaz Harrosh
  2010-03-10 13:49           ` quotes in reply messages (was Re: LSI SAS changes SCSI address and by-path on hot-swap) Stefan Richter
  2010-03-12 15:25         ` LSI SAS changes SCSI address and by-path on hot-swap Asdo
  1 sibling, 1 reply; 11+ messages in thread
From: Boaz Harrosh @ 2010-03-10 10:59 UTC (permalink / raw)
  To: Moore, Michael; +Cc: Asdo, James Bottomley, linux-scsi@vger.kernel.org

On 03/09/2010 06:50 PM, Moore, Michael wrote:
> Sorry for top posting, but Outlook just screws it all up.
> 

Outlook is the root of all computer evil and should be avoided like the
plague. Exchange servers work just as well with Thunderbird or any other
non-evil email client through the IMAP protocol. The mail is easy to set up;
the address book needs some low-level LDAP definitions but, once set, works
like a charm. For office scheduling you can use Outlook just for that,
alongside Thunderbird (it's IMAP) for all other mail needs.

Outlook is evil
Boaz


* quotes in reply messages (was Re: LSI SAS changes SCSI address and by-path on hot-swap)
  2010-03-10 10:59         ` Boaz Harrosh
@ 2010-03-10 13:49           ` Stefan Richter
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Richter @ 2010-03-10 13:49 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Moore, Michael, Asdo, James Bottomley, linux-scsi@vger.kernel.org

Boaz Harrosh wrote:
> On 03/09/2010 06:50 PM, Moore, Michael wrote:
>> Sorry for top posting, but Outlook just screws it all up.
>> 
> 
> Outlook is the root of all computers evil and should be avoided as the
> plague. Exchange servers work just as well with ThunderBird or any other
> none evil email client through the IMAP protocol.
[...]

There are also macros for Outlook which properly preformat quoted text
when you hit reply.  Search the web for outlook quotefix.
-- 
Stefan Richter
-=====-==-=- --== -=-=-
http://arcgraph.de/sr/


* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-09 16:50       ` Moore, Michael
  2010-03-10 10:59         ` Boaz Harrosh
@ 2010-03-12 15:25         ` Asdo
  2010-03-12 15:32           ` James Bottomley
  1 sibling, 1 reply; 11+ messages in thread
From: Asdo @ 2010-03-12 15:25 UTC (permalink / raw)
  To: Moore, Michael; +Cc: James Bottomley, linux-scsi@vger.kernel.org

This looks like a horrible bug for people having software RAID on the 
disks (or maybe even hardware RAID)

I seem not to have this bug on ubuntu kernel 2.6.24, I mean my situation 
was similar with the mainboard-integrated LSISAS 1068E and it didn't 
happen to me, but that doesn't mean much...

Also, LSI controllers are very widely used by Linux users.
Have you tried reporting it here to get it fixed?
Or reporting it to LSI tech support? They are pretty responsive even 
if their web interface is a bit strange.

I'm thinking about buying a few LSI HBA controllers for Linux 
software RAID use, probably external ones like the ones you have. Maybe 
attached to expanders. I'll keep my fingers crossed!



* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-12 15:25         ` LSI SAS changes SCSI address and by-path on hot-swap Asdo
@ 2010-03-12 15:32           ` James Bottomley
  2010-03-12 15:50             ` Asdo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2010-03-12 15:32 UTC (permalink / raw)
  To: Asdo; +Cc: Moore, Michael, linux-scsi@vger.kernel.org

On Fri, 2010-03-12 at 16:25 +0100, Asdo wrote:
> Moore, Michael wrote:
> > Sorry for top posting, but Outlook just screws it all up.
> >
> > The cards I've used are a LSI Logic SAS 3800X (8 port External PCI-X card w/ 2 x SFF-8470 SAS connectors) and LSI SAS 3801E ( 8 Port External PCI-e card with 2 x SFF-8088 SAS connectors).  Each connector has 4 SAS links.
> > The SAS protocol is downwardly compatible with SATA, so you can run SATA drives right on a SAS cable.
> >
> > So, in my setup, I basically have 1 drive per SAS link.  No expanders, or anything fancy.  The issues I mentioned happens to the 4 drives on the same connector.  When the driver is detecting the new drive, it looks like it redetects all of the drives on the connector (or it at least reports one new drive and the other existing drives).  If you were in a directory from one of the mounted drives, you get IO Errors as it appears that the drive was removed, and then remounted, but in a way that was not clean.  
> >
> > This has happened with Default CentOS 5 kernels (2.6.18-*.el5), 2.6.26 vanilla, 2.6.30 vanilla, Fedora latest.
> > The issue appeared no matter what.
> >
> > The udev rules used the ENV{ID_PATH} option to tie to the sysfs value that indicated which PCI ID + SAS phy on the SAS HBA used by the drives to the device detected by the kernel, and then create a symlink from the /dev/sd<X> entry to /dev/slot<Y>, where Y is the label on the slot of the hot swap bays (a-h).   Here is an example of the rule:
> >
> > KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n"
> >
> > I did this because the device ID number that the kernel reports increments every time a drive is swapped.  So, even though you are using the same SAS channel, you do not get consistent drive numbering, and I had to go down to the SAS phy to get something consistent.  The SiI-3124/libata setup had consistent device IDs (the ID was tied to the SATA channel, and I used the device ID to do the mapping).  Perhaps udev is the reason for the issues, but I tend to think it is the way the SAS/SCSI subsystem works, as I have never seen the SATA/libata subsystem show this "rescan/remount" behavior.
> >   
> 
> This looks like a horrible bug for people having software RAID on the 
> disks (or maybe even hardware RAID)

Not really, most people want to identify the disk permanently, not the
slot, so that's what /dev/disk/by-id and /dev/disk/by-uuid are for.

> I do not seem to have this bug on Ubuntu kernel 2.6.24: my situation 
> was similar, with the mainboard-integrated LSI SAS 1068E, and it didn't 
> happen to me, but that doesn't mean much...
> 
> Also, LSI controllers are widely used on Linux.
> Have you tried reporting it here to get it fixed?
> Or reporting it to LSI tech support? They are pretty responsive, even 
> if their web interface is a bit strange.
> 
> I'm thinking about buying a few LSI HBA controllers for Linux 
> software RAID use, probably external ones like the one you have. Maybe 
> attached to expanders. I'll keep my fingers crossed!

It's not just LSI that does this ... every SAS board will tend to
increment target numbers on add and remove because that's the way the
transport class does it.  In Linux, you have to expect the /dev/sdX name
to be volatile and mount by id or uuid instead.  To mount by slot, you
can use the phy workaround for SAS/SATA, but you should really be using
an enclosure management service.
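
As a rough illustration of the by-id approach, here is a minimal Python
sketch (the language and the helper name resolve_persistent_names are
assumptions for illustration, nothing in this thread uses them).  It only
follows the symlinks that udev places under /dev/disk/by-id and reports
which device node each one points at right now, so the volatile /dev/sdX
name is looked up rather than remembered.

```python
import os


def resolve_persistent_names(link_dir="/dev/disk/by-id"):
    """Map each symlink in link_dir to the device node it currently targets.

    Hypothetical helper: it assumes udev populates link_dir with symlinks,
    as it does for /dev/disk/by-id and /dev/disk/by-uuid.
    """
    mapping = {}
    if not os.path.isdir(link_dir):
        # Directory is absent (e.g. no udev); report nothing rather than fail.
        return mapping
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            # realpath follows the link to the current /dev/sdX node.
            mapping[name] = os.path.realpath(path)
    return mapping


if __name__ == "__main__":
    for name, node in resolve_persistent_names().items():
        print(f"{name} -> {node}")
```

Passing "/dev/disk/by-path" instead would list the slot-oriented names, to
the extent that they are stable on a given HBA.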

James




* Re: LSI SAS changes SCSI address and by-path on hot-swap
  2010-03-12 15:32           ` James Bottomley
@ 2010-03-12 15:50             ` Asdo
  0 siblings, 0 replies; 11+ messages in thread
From: Asdo @ 2010-03-12 15:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Moore, Michael, linux-scsi@vger.kernel.org

James Bottomley wrote:
> On Fri, 2010-03-12 at 16:25 +0100, Asdo wrote:
>   
>> Moore, Michael wrote:
>>     
>>> Sorry for top posting, but Outlook just screws it all up.
>>>
>>> The cards I've used are a LSI Logic SAS 3800X (8 port External PCI-X card w/ 2 x SFF-8470 SAS connectors) and LSI SAS 3801E ( 8 Port External PCI-e card with 2 x SFF-8088 SAS connectors).  Each connector has 4 SAS links.
>>> The SAS protocol is downwardly compatible with SATA, so you can run SATA drives right on a SAS cable.
>>>
>>> So, in my setup, I basically have 1 drive per SAS link.  No expanders or anything fancy.  The issue I mentioned happens to the 4 drives on the same connector.  When the driver detects the new drive, it looks like it redetects all of the drives on the connector (or at least it reports one new drive plus the other existing drives).  If you were in a directory on one of the mounted drives, you get I/O errors, as it appears that the drive was removed and then remounted, but not cleanly.
>>>
>>> This has happened with Default CentOS 5 kernels (2.6.18-*.el5), 2.6.26 vanilla, 2.6.30 vanilla, Fedora latest.
>>> The issue appeared no matter what.
>>>
>>> The udev rules used the ENV{ID_PATH} option to tie to the sysfs value that indicated which PCI ID + SAS phy on the SAS HBA used by the drives to the device detected by the kernel, and then create a symlink from the /dev/sd<X> entry to /dev/slot<Y>, where Y is the label on the slot of the hot swap bays (a-h).   Here is an example of the rule:
>>>
>>> KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n"
>>>
>>> I did this because the device ID number that the kernel reports increments every time a drive is swapped.  So, even though you are using the same SAS channel, you do not get consistent drive numbering, and I had to go down to the SAS phy to get something consistent.  The SiI-3124/libata setup had consistent device IDs (the ID was tied to the SATA channel, and I used the device ID to do the mapping).  Perhaps udev is the reason for the issues, but I tend to think it is the way the SAS/SCSI subsystem works, as I have never seen the SATA/libata subsystem show this "rescan/remount" behavior.
>>>   
>>>       
>> This looks like a horrible bug for people having software RAID on the 
>> disks (or maybe even hardware RAID)
>>     
>
> Not really, most people want to identify the disk permanently, not the
> slot, so that's what /dev/disk/by-id and /dev/disk/by-uuid are for.
>
>   
No, James, I am *not* referring to the topic of my original post now (for 
that one I understand what to do, thank you); I am now referring to the 
bug reported by Michael.

Reread this part by Michael:
> So, in my setup, I basically have 1 drive per SAS link.  No expanders or anything fancy.  The issue I mentioned happens to the 4 drives on the same connector.  When the driver detects the new drive, it looks like it redetects all of the drives on the connector (or at least it reports one new drive plus the other existing drives).  If you were in a directory on one of the mounted drives, you get I/O errors, as it appears that the drive was removed and then remounted, but not cleanly.
>   
and his previous post in this same thread.

If the drives are part of an MD RAID, they are going to be kicked by MD 
if they return errors when one of their sibling drives is hot-swapped. If 
multiple drives are kicked simultaneously (as seems to happen for 
Michael), the array will go down and you might not even be able to bring 
it up again with --force (depending on various factors, e.g. how many 
drives were on the same controller vs. how many were on other 
controllers). If you are able to bring the array up again, it will 
probably be in a degraded state. Data loss is also very likely.
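
To make the arithmetic behind this concrete, here is a rough Python sketch
(the language and the helper name array_survives are assumptions for
illustration; this is not an mdadm or kernel interface).  It uses only the
nominal redundancy of each RAID level, and it leaves out raid10 because
there the outcome depends on which mirror pairs are hit.

```python
def array_survives(level, members, kicked):
    """Return True if an array nominally stays up after `kicked` members fail.

    Hypothetical helper using the nominal redundancy of each level:
    raid0 tolerates no failures, raid1 all but one member,
    raid5 one failure, raid6 two failures.
    """
    tolerance = {
        "raid0": 0,
        "raid1": members - 1,
        "raid5": 1,
        "raid6": 2,
    }
    if level not in tolerance:
        raise ValueError(f"level not modelled in this sketch: {level}")
    return kicked <= tolerance[level]


if __name__ == "__main__":
    # Four drives on one connector, all kicked by a single hot-swap.
    for level in ("raid5", "raid6"):
        print(level, array_survives(level, members=8, kicked=4))
```

With four drives sharing a connector, a single hot-swap that kicks all four
exceeds the nominal tolerance of both raid5 and raid6, which is why the
array would go down.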




Thread overview: 11+ messages
2010-03-04 16:55 LSI SAS changes SCSI address and by-path on hot-swap Asdo
2010-03-05  6:22 ` James Bottomley
2010-03-05 11:12   ` Asdo
2010-03-05 16:57   ` Moore, Michael
2010-03-05 23:05     ` Asdo
2010-03-09 16:50       ` Moore, Michael
2010-03-10 10:59         ` Boaz Harrosh
2010-03-10 13:49           ` quotes in reply messages (was Re: LSI SAS changes SCSI address and by-path on hot-swap) Stefan Richter
2010-03-12 15:25         ` LSI SAS changes SCSI address and by-path on hot-swap Asdo
2010-03-12 15:32           ` James Bottomley
2010-03-12 15:50             ` Asdo
