* SAS overview
From: Douglas Gilbert @ 2004-03-16 11:46 UTC
To: linux-scsi
Serial Attached SCSI (SAS) is projected to become available
later this year. Many of the usual suspects that supply the
parallel SCSI world have recently been making SAS product
announcements. See http://www.scsita.org .
Also on that site is an instructive 10 part tutorial by
Rob Elliot on various aspects of SAS. SAS borrows a lot
from SATA (and Fibre Channel) and is designed for
internal storage (as is SATA and it uses the same plugs
and cables) and "near" external storage (up to around
10 metres away). Even though the Ultra 640 standard
(SPI-5) is ratified it seems that vendors are going to
bypass it and go to SAS.
A SCSI domain is a set of SCSI initiators and targets (at
least one of each) interconnected by a service delivery
subsystem. In the SCSI Parallel Interface (SPI) the service
delivery subsystem is passive: a multi-drop cable. In SAS
the service delivery subsystem is either a cable (e.g. a
SATA cable) or one or more expanders. Over 16000 SAS
initiators and targets can be interconnected by a set of
expanders in a single SAS domain. To simplify routing,
multiple paths between SAS initiators and targets within
a single SAS domain are not allowed. This leads to some
interesting topologies (see question 1 of Rob Elliot's
tutorial after you understand wide links).
The unit of serial communication is a "phy". The current
generation of SATA phys run at 1.5 Gb/sec while the SAS
phys run at 3.0 Gb/sec. [A byte is encoded in 10 bits in
the serial stream so for maximum megabyte/sec throughputs,
just divide by 10.] The SAS standard shows expanders with
optional SAS-SATA bridges and SATA phys. The standard
defines the STP protocol for tunnelling the ATA/ATAPI
command set through the SAS infrastructure. Evidently
there are autosensing phys that can have either a SAS or
SATA disk connected to them. It seems as though the first
generation of SAS HBAs will also have this capability.
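A minimal sketch of the divide-by-10 rule above, in Python. The
function name max_megabytes_per_sec is made up for illustration; the
only inputs taken from the text are the 10-bits-per-byte encoding and
the 1.5 and 3.0 Gb/sec line rates. It gives an upper bound only and
ignores any protocol overhead.

```python
# Sketch: peak payload rate of a serial link that encodes each data
# byte in 10 bits on the wire.

def max_megabytes_per_sec(line_rate_gbps: float) -> float:
    """Return the peak payload rate in MB/s for a line rate in Gb/s."""
    bits_per_sec = line_rate_gbps * 1e9
    bytes_per_sec = bits_per_sec / 10   # 10 line bits per data byte
    return bytes_per_sec / 1e6

for name, rate in (("SATA", 1.5), ("SAS", 3.0)):
    print(f"{name}: {rate} Gb/s -> {max_megabytes_per_sec(rate):.0f} MB/s")
```

So a 1.5 Gb/sec SATA phy tops out at about 150 MB/s and a 3.0 Gb/sec
SAS phy at about 300 MB/s.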
SAS initiators, targets and expanders are identified by SAS
addresses, each of which is a (worldwide unique?) 64 bit number
(naa==5). From what I read, typical SAS elements will have
the following:
   SAS HBA:    4 phys, 1 SAS address
   SAS disk:   2 phys each with a separate SAS address
               (dual ported)
   Expander:   3 or (usually) more phys, 1 SAS address **
   SATA disk:  1 (SATA) phy, SAS address??
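To make the "naa==5" remark concrete, here is a small Python sketch
that pulls apart a 64 bit address. It assumes the usual NAA 5 (IEEE
Registered) layout: a 4 bit NAA field, a 24 bit IEEE company id and a
36 bit vendor-specific id. The function names and the example address
are invented for illustration.

```python
# Sketch: decode a 64-bit NAA 5 style address.
# Assumed layout: [4-bit NAA][24-bit IEEE company id][36-bit vendor id]

def naa_field(sas_address: int) -> int:
    """Return the 4-bit NAA field (top nibble) of a 64-bit address."""
    if not 0 <= sas_address < (1 << 64):
        raise ValueError("SAS address must fit in 64 bits")
    return sas_address >> 60

def split_naa5(sas_address: int) -> tuple:
    """Split an NAA 5 address into (company id, vendor-specific id)."""
    if naa_field(sas_address) != 5:
        raise ValueError("not an NAA 5 address")
    company_id = (sas_address >> 36) & 0xFFFFFF     # 24 bits
    vendor_specific = sas_address & 0xFFFFFFFFF     # 36 bits
    return company_id, vendor_specific

example = 0x5123456789ABCDEF    # made-up address, not a real device
company, vendor = split_naa5(example)
print(f"naa={naa_field(example)} company={company:06x} vendor={vendor:09x}")
```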
Now things start to get interesting. SAS can aggregate
individual phy interconnects to form "wide" links. So
if two (single phy) cables are run from a HBA (occupying
two of its phys) to phys on the same expander then that
is a wide link. Seen from Linux driving that HBA that is
two PCI devices (each HBA phy) but only one SCSI initiator
port. Up to 256 phy interconnects can be ganged to make
a very wide SAS link (a parallel bus you might say).
On the other hand if those two HBA phys were cabled to the
two phys on a SAS disk then that would _not_ be a wide link
since the two phys at the SAS disk end do _not_ have the same
SAS address. Further it is two SAS domains since the two
narrow links are separate service delivery subsystems.
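The wide versus narrow distinction above comes down to whether the
phys share the same SAS address at both ends of the link. A rough
Python sketch under that assumption follows; group_into_ports and the
addresses are hypothetical and not taken from any driver.

```python
# Sketch: phys that share both the local and the attached SAS address
# are grouped into one (possibly wide) port.
from collections import defaultdict

def group_into_ports(phys):
    """phys: iterable of (phy_id, local_sas_addr, attached_sas_addr)."""
    ports = defaultdict(list)
    for phy_id, local_addr, attached_addr in phys:
        ports[(local_addr, attached_addr)].append(phy_id)
    return dict(ports)

# Made-up addresses for illustration only.
HBA, EXPANDER, DISK_PORT_A, DISK_PORT_B = 0x10, 0x20, 0x30, 0x31

# Two HBA phys cabled to the same expander: one wide port.
to_expander = group_into_ports([(0, HBA, EXPANDER), (1, HBA, EXPANDER)])

# Two HBA phys cabled to the two ports of a dual ported disk:
# two narrow ports, since the attached addresses differ.
to_disk = group_into_ports([(0, HBA, DISK_PORT_A), (1, HBA, DISK_PORT_B)])

print(len(to_expander), "port(s) to the expander")
print(len(to_disk), "port(s) to the disk")
```

The sketch only groups phys into ports; it does not model the further
point that the two narrow links to the disk form two SAS domains.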
Hopefully any errors I have made above will be corrected by
those who have practical experience with SAS. The SAS
architectural overview tutorial (url given above) has
diagrams to illustrate.
Linux implementation thoughts:
 - SAS addresses are 64 bit and correspond to the Linux
   SCSI id (i.e. third item of <host,channel,id,lun> tuple)
   which is currently 32 bit
 - wide links break the one to one relationship between
   PCI devices (i.e. a phy) and a SAS initiator.
 - SAS introduces a SAS Management Protocol (SMP) for
   discovering and configuring SAS expanders and attached
   devices. SAS expanders are not SCSI devices yet they
   need to be addressable from user space for
   discovery and configuration to be offloaded from the
   LLD to user space programs. Perhaps a new role for the
   scsi generic driver (with a guard of 'M' rather than
   'S')??
Notes:
 ** The SAS standard permits an expander to have multiple
    SAS addresses: the mandatory one for SMP routing
    purposes and optional ones for embedded devices such
    as bridges and enclosure services devices.
 - some SAS disks will have a 2.5" form factor
 - as I write, www.scsita.org is down; hopefully it
   is a temporary outage. The latest draft before the
   SAS standard is at www.t10.org
Doug Gilbert

* Re: SAS overview
From: Matthew Wilcox @ 2004-03-16 13:12 UTC
To: Douglas Gilbert; +Cc: linux-scsi

On Tue, Mar 16, 2004 at 09:46:31PM +1000, Douglas Gilbert wrote:
> The unit of serial communication is a "phy". The current
> generation of SATA phys run at 1.5 Gb/sec while the SAS
> phys run at 3.0 Gb/sec. [A byte is encoded in 10 bits in
> the serial stream so for maximum megabyte/sec throughputs,
> just divide by 10.]

This is going to be interesting with PCI Express. PCI Express links
also encode a byte in 10 bits (and so does InfiniBand, iirc). PCI-E 1.0
runs its links at 2.5Gb/s, so you're going to need strictly more PCI-E
links than SAS phys in order to saturate the SAS bus. Rumour has it
that the first generation of boxes will have x4 sockets for
non-graphics slots, which will be fine for cards with up to 3 phys
(3 * 3 = 9; 4 * 2.5 = 10) but will be the bottleneck for cards with
4 phys.

> Now things start to get interesting. SAS can aggregate
> individual phy interconnects to form "wide" links. So
> if two (single phy) cables are run from a HBA (occupying
> two of its phys) to phys on the same expander then that
> is a wide link. Seen from Linux driving that HBA that is
> two PCI devices (each HBA phy) but only one SCSI initiator
> port. Up to 256 phy interconnects can be ganged to make
> a very wide SAS link (a parallel bus you might say).

It's different though (and I suspect you know this, but let's clarify for
the audience). A parallel link would send each bit down a different phy.
PCI-E (and probably SAS) send each byte down a different phy.

--
"Next the statesmen will invent cheap lies, putting the blame upon the
nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and
refuse to examine any refutations of them; and thus he will by and by
convince himself that the war is just, and will thank God for the
better sleep he enjoys after this process of grotesque self-deception."
-- Mark Twain
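
The x4 slot arithmetic in the message above (3 * 3 = 9 against
4 * 2.5 = 10) can be checked with a short Python sketch. Both sides
encode a byte in 10 bits, so the raw line rates are compared directly.
The constant and function names are made up for illustration, and the
comparison ignores all protocol overhead.

```python
# Sketch: does a PCI-E 1.0 slot limit a SAS card with a given phy count?
PCIE1_LANE_GBPS = 2.5   # PCI-E 1.0 line rate per lane
SAS_PHY_GBPS = 3.0      # first generation SAS line rate per phy

def pcie_is_bottleneck(pcie_lanes: int, sas_phys: int) -> bool:
    """True if the slot's total line rate is below that of the phys."""
    return pcie_lanes * PCIE1_LANE_GBPS < sas_phys * SAS_PHY_GBPS

for phys in range(1, 5):
    verdict = "PCI-E limited" if pcie_is_bottleneck(4, phys) else "ok"
    print(f"x4 slot, {phys} SAS phys: {4 * PCIE1_LANE_GBPS} vs "
          f"{phys * SAS_PHY_GBPS} Gb/s -> {verdict}")
```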

* Re: SAS overview
From: Scott M. Ferris @ 2004-03-16 17:47 UTC
To: Matthew Wilcox; +Cc: Douglas Gilbert, linux-scsi

Matthew Wilcox wrote:
> On Tue, Mar 16, 2004 at 09:46:31PM +1000, Douglas Gilbert wrote:
>
> > Now things start to get interesting. SAS can aggregate
> > individual phy interconnects to form "wide" links. So
> > if two (single phy) cables are run from a HBA (occupying
> > two of its phys) to phys on the same expander then that
> > is a wide link. Seen from Linux driving that HBA that is
> > two PCI devices (each HBA phy) but only one SCSI initiator
> > port. Up to 256 phy interconnects can be ganged to make
> > a very wide SAS link (a parallel bus you might say).
>
> It's different though (and I suspect you know this, but let's clarify for
> the audience). A parallel link would send each bit down a different phy.
> PCI-E (and probably SAS) send each byte down a different phy.

The assumption that each phy will be a different PCI device is likely
to be false, at least for initiators. Under normal circumstances,
there's simply no reason for a host driver to care which initiator phy
gets used, and it's more efficient for the HBA to choose the phy when
it wants to open a connection, and expose a firmware interface that
deals with either logical (SAM-2) SCSI initiator ports (narrow or wide
depending on cabling), or end devices that the HBA has discovered in
the SAS domain. You can probably get phy information from the HBA for
configuration or error reporting purposes, but host drivers probably
won't queue commands to specific phys.

SAS is connection-oriented, and each connection is between one
initiator phy and one target phy. The wide ports don't aggregate data
at the byte level. A wide port lets you have multiple connections
open at the same time, to different devices, or to the same device if
that device has enough phys. Once a connection is established, SAS is
fairly similar to packetized U320.

The main effect of having wide ports on the HBAs is that it allows the
HBA to have 4 connections open at once, and it allows a target to open
a connection to any of the HBA's 4 initiator phys when reconnecting to
the initiator to provide data-in or status. It's like having 4 phone
lines with the same hunt-group phone number for incoming calls.

--
Scott M. Ferris, sferris@acm.org
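
The hunt-group analogy in the message above can be modelled in a few
lines of Python: a connection takes whichever phy of the wide port is
idle, and a request only has to wait when every phy is busy. The
WidePort class and its methods are hypothetical, a toy model rather
than how any HBA firmware is known to work.

```python
# Sketch: a wide port as a "hunt group" of phys.
# Each connection occupies exactly one phy; any idle phy will do.

class WidePort:
    def __init__(self, phy_ids):
        self.free = list(phy_ids)   # idle phys
        self.busy = {}              # phy id -> label of the other end

    def open_connection(self, other_end):
        """Use any idle phy; return its id, or None if all are busy."""
        if not self.free:
            return None
        phy = self.free.pop(0)
        self.busy[phy] = other_end
        return phy

    def close_connection(self, phy):
        """Release a phy so a later connection can use it."""
        del self.busy[phy]
        self.free.append(phy)

port = WidePort(range(4))
opened = [port.open_connection(f"disk{i}") for i in range(5)]
print(opened)                    # the fifth request finds no idle phy
port.close_connection(opened[0])
print(port.open_connection("disk4"))
```

With 4 phys, four connections can be open at once and the fifth has to
wait until one of them closes.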

* Re: SAS overview
From: Douglas Gilbert @ 2004-03-28 3:01 UTC
To: Scott M. Ferris; +Cc: Matthew Wilcox, linux-scsi

Scott M. Ferris wrote:
> Matthew Wilcox wrote:
>
>>On Tue, Mar 16, 2004 at 09:46:31PM +1000, Douglas Gilbert wrote:
>>
>>>Now things start to get interesting. SAS can aggregate
>>>individual phy interconnects to form "wide" links. So
>>>if two (single phy) cables are run from a HBA (occupying
>>>two of its phys) to phys on the same expander then that
>>>is a wide link. Seen from Linux driving that HBA that is
>>>two PCI devices (each HBA phy) but only one SCSI initiator
>>>port. Up to 256 phy interconnects can be ganged to make
>>>a very wide SAS link (a parallel bus you might say).
>>
>>It's different though (and I suspect you know this, but let's clarify for
>>the audience). A parallel link would send each bit down a different phy.
>>PCI-E (and probably SAS) send each byte down a different phy.
>
> The assumption that each phy will be a different PCI device is likely
> to be false, at least for initiators.

Scott,
Whether two phys on a SAS HBA are two separate SCSI initiators
or one initiator on a wide link depends on what those phys are
attached to. That seems pretty dynamic so it seems to me the HBA
driver needs to look at phys individually. So if the phys are
not separate PCI devices then they at least need to be individually
addressable (and the SAS address isn't going to help since it's
the same for all HBA phys). At the moment for SPI PCI HBAs sysfs
has a one to one link between a PCI device and a SCSI host.
Such sysfs links might look a little different in the presence of
SAS wide links.

> Under normal circumstances,
> there's simply no reason for a host driver to care which initiator phy
> gets used, and it's more efficient for the HBA to choose the phy when
> it wants to open a connection, and expose a firmware interface that
> deals with either logical (SAM-2) SCSI initiator ports (narrow or wide
> depending on cabling), or end devices that the HBA has discovered in
> the SAS domain. You can probably get phy information from the HBA for
> configuration or error reporting purposes, but host drivers probably
> won't queue commands to specific phys.

Knowing the physical components (cables) that make up a wide link
seems important IMO. It's possible to incorrectly "wire" a SAS
domain so it might be useful to point out which cable(s) (by
identifying the phys at either end) is inappropriate. Also the
"unit" of routing in a SAS expander is the phy.

> SAS is connection-oriented, and each connection is between one
> initiator phy and one target phy. The wide ports don't aggregate data
> at the byte level. A wide port lets you have multiple connections
> open at the same time, to different devices, or to the same device if
> that device has enough phys. Once a connection is established, SAS is
> fairly similar to packetized U320.

SAS has a protocol stack and it is connection-oriented towards the
top of that stack. IMO it's important to know what is going on at
the lower levels as well.

> The main effect of having wide ports on the HBAs is that it allows the
> HBA to have 4 connections open at once, and it allows a target to open
> a connection to any of the HBA's 4 initiator phys when reconnecting to
> the initiator to provide data-in or status. It's like having 4 phone
> lines with the same hunt-group phone number for incoming calls.

Yes, I didn't make that point. The hunt-group analogy is good
(as long as folks understand what that is).

Doug Gilbert

* Re: SAS overview
From: Jeff Garzik @ 2004-03-28 3:19 UTC
To: dougg; +Cc: Scott M. Ferris, Matthew Wilcox, linux-scsi

Douglas Gilbert wrote:
> Scott M. Ferris wrote:
>
>> Matthew Wilcox wrote:
>>
>>> On Tue, Mar 16, 2004 at 09:46:31PM +1000, Douglas Gilbert wrote:
>>>
>>>> Now things start to get interesting. SAS can aggregate
>>>> individual phy interconnects to form "wide" links. So
>>>> if two (single phy) cables are run from a HBA (occupying
>>>> two of its phys) to phys on the same expander then that
>>>> is a wide link. Seen from Linux driving that HBA that is
>>>> two PCI devices (each HBA phy) but only one SCSI initiator
>>>> port. Up to 256 phy interconnects can be ganged to make
>>>> a very wide SAS link (a parallel bus you might say).
>>>
>>> It's different though (and I suspect you know this, but let's clarify for
>>> the audience). A parallel link would send each bit down a different phy.
>>> PCI-E (and probably SAS) send each byte down a different phy.
>>
>> The assumption that each phy will be a different PCI device is likely
>> to be false, at least for initiators.
>
> Scott,
> Whether two phys on a SAS HBA are two separate SCSI initiators
> or one initiator on a wide link depends on what those phys are
> attached to. That seems pretty dynamic so it seems to me the HBA
> driver needs to look at phys individually. So if the phys are
> not separate PCI devices then they at least need to be individually
> addressable (and the SAS address isn't going to help since it's
> the same for all HBA phys). At the moment for SPI PCI HBAs sysfs
> has a one to one link between a PCI device and a SCSI host.
> Such sysfs links might look a little different in the presence of
> SAS wide links.

If it's anything like SATA, you have one phy per point-to-point link.
Thus, 8 SATA phys on a single 8-port SATA board. Each is individually
controlled and reset-able, and can throw transport error conditions
independent of the other phys.

	Jeff

* Re: SAS overview
From: Andre Hedrick @ 2004-03-28 4:32 UTC
To: Jeff Garzik; +Cc: dougg, Scott M. Ferris, Matthew Wilcox, linux-scsi

Jeff,

Until you enter port-multiplier mode, and if there is a proposal which
looks anything like loopback FC with multipath the game becomes less
like single ended of SATA 1.0.

Just my nickel, and it is wooden too.

Andre Hedrick
LAD Storage Consulting Group

On Sat, 27 Mar 2004, Jeff Garzik wrote:

> Douglas Gilbert wrote:
> > Scott M. Ferris wrote:
> >
> >> Matthew Wilcox wrote:
> >>
> >>> On Tue, Mar 16, 2004 at 09:46:31PM +1000, Douglas Gilbert wrote:
> >>>
> >>>> Now things start to get interesting. SAS can aggregate
> >>>> individual phy interconnects to form "wide" links. So
> >>>> if two (single phy) cables are run from a HBA (occupying
> >>>> two of its phys) to phys on the same expander then that
> >>>> is a wide link. Seen from Linux driving that HBA that is
> >>>> two PCI devices (each HBA phy) but only one SCSI initiator
> >>>> port. Up to 256 phy interconnects can be ganged to make
> >>>> a very wide SAS link (a parallel bus you might say).
> >>>
> >>> It's different though (and I suspect you know this, but let's clarify for
> >>> the audience). A parallel link would send each bit down a different phy.
> >>> PCI-E (and probably SAS) send each byte down a different phy.
> >>
> >> The assumption that each phy will be a different PCI device is likely
> >> to be false, at least for initiators.
> >
> > Scott,
> > Whether two phys on a SAS HBA are two separate SCSI initiators
> > or one initiator on a wide link depends on what those phys are
> > attached to. That seems pretty dynamic so it seems to me the HBA
> > driver needs to look at phys individually. So if the phys are
> > not separate PCI devices then they at least need to be individually
> > addressable (and the SAS address isn't going to help since it's
> > the same for all HBA phys). At the moment for SPI PCI HBAs sysfs
> > has a one to one link between a PCI device and a SCSI host.
> > Such sysfs links might look a little different in the presence of
> > SAS wide links.
>
> If it's anything like SATA, you have one phy per point-to-point link.
> Thus, 8 sata phys on a single 8-port SATA board. Each is individually
> controlled and reset-able, and can throw transport error conditions
> independent of the other phys.
>
> 	Jeff

* Re: SAS overview
From: Jeff Garzik @ 2004-03-28 5:33 UTC
To: Andre Hedrick; +Cc: dougg, Scott M. Ferris, Matthew Wilcox, linux-scsi

Andre Hedrick wrote:
> Until you enter port-multiplier mode, and if there is a proposal which

Yeah, good point... ;)

* Re: SAS overview
From: Scott M. Ferris @ 2004-03-28 22:06 UTC
To: dougg; +Cc: Scott M. Ferris, Matthew Wilcox, linux-scsi

Douglas Gilbert wrote:
>
> Whether two phys on a SAS HBA are two separate SCSI initiators or
> one initiator on a wide link depends on what those phys are attached
> to.

Right. To handle wide ports, the HBA firmware/hardware is going to
need access to information flowing over all of the phys in the wide
port, which is why I expect that all of the phys that could be part of
a wide port will be connected to the same PCI function, rather than
one PCI function per phy.

> That seems pretty dynamic so it seems to me the HBA driver needs
> to look at phys individually.

For some things, such as phy link resets and hard resets, phy link
speed configuration, etc, the HBA driver will need to address
individual phys, so there will have to be some way of doing that,
probably by phy number to mimic the way expanders work.

For issuing commands, I'd expect that the HBA firmware interface won't
normally specify particular phys, precisely because it should be
dynamic in order to take advantage of wide ports. To minimize the
command latency, you want the HBA to issue the next command on any
available phy of a wide port. You don't want a command to wait for a
connection on phy 0 to finish when phys 1-3 are available immediately.
That kind of low-level routing really needs to be in the HBA firmware
or hardware in order to be dynamic enough. Any information the HBA
driver has on phy utilization may be out of date by the time a
particular command makes it to the head of the queue in the HBA
firmware.

> So if the phys are not separate PCI devices then they at least need
> to be individually addressable (and the SAS address isn't going to
> help since it's the same for all HBA phys).

I think we agree that for some functions the phys will have to be
individually addressable. SAS addresses are only useful to specify
the other end of a connection, independent of phy path.

> At the moment for SPI PCI HBAs sysfs has a one to one link between a
> PCI device and a SCSI host. Such sysfs links might look a little
> different in the presence of SAS wide links.

That might not have to change, depending on how the HBA interfaces are
designed. A SAS HBA can have a variable number of initiator ports for
each PCI function, depending on whether the ports are cabled wide or
narrow. That could be modeled as one channel for each initiator port
if the HBA exposes the ports, or just one channel if the HBA interface
doesn't expose the ports.

Which ports to collect into the same host structure will depend on the
granularity a host_reset() can operate at (since a reset/reboot of the
HBA firmware may clobber more than one port). That's usually pretty
apparent once the HBA documentation is available.

Another factor is whether there are any global limitations on the
total number of commands the HBA firmware can accept. There may need
to be limits on the number of commands per initiator port as well as
per HBA, which may or may not fit into the current Linux concept of a
channel, though it wouldn't be hard to check port limits in the
low-level driver's queuecommand() if that doesn't get enforced at
higher levels of the Linux SCSI stack.

> Knowing the physical components (cables) that make up a wide link
> seems important IMO. It's possible to incorrectly "wire" a SAS
> domain so it might be useful to point out which cable(s) (by
> identifying the phys at either end) is inappropriate. Also the
> "unit" of routing in a SAS expander is the phy.

Knowledge of the phy topology would certainly be useful for
troubleshooting. I'd expect a SAS HBA to have some way of reporting
some or all of the information it discovered when it scanned the SAS
domain, but that's a quality of implementation issue.

> SAS has a protocol stack and it is connection-oriented towards the
> top of that stack. IMO it's important to know what is going on at
> the lower levels as well.

Sometimes yes, but usually no. Or at least I'm not used to getting
much information about the lower level details from the SCSI HBAs I've
worked with. I've usually needed an analyzer to tell me what was
really going wrong at the lower levels.

For troubleshooting any information you can get from the HBA is nice.
It's a question of how much low-level info the HBA vendor is willing
to provide. Some information may be unavailable, either because the
ASICs don't expose it to the HBA firmware, or because the firmware
doesn't expose it to the HBA driver.

For the fast path of queueing and completing commands, I'd expect SAS
HBAs to work similar to SPI HBAs, where all the HBA driver does is
queue commands with data buffers, and wait for an interrupt to
indicate that one or more commands have completed. For performance,
the HBA needs to manage the entire lifetime of the command, and DMA
data back and forth without needing intervention by the HBA driver.

Perhaps I'm just misunderstanding what kinds of control and diagnostic
information you're hoping to get from the lower levels.

--
Scott M. Ferris, sferris@acm.org
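
The per-port and per-HBA command limits that the message above
suggests checking in the low-level driver's queuecommand() might look
roughly like the Python model below. This is not the Linux SCSI API:
the HbaQueueLimits class, its method names and the limit values are
all invented for illustration, and a real driver would do the
equivalent bookkeeping in C.

```python
# Sketch: refuse a command when either the HBA-wide limit or the
# per-initiator-port limit of outstanding commands has been reached.

class HbaQueueLimits:
    def __init__(self, max_per_hba: int, max_per_port: int):
        self.max_per_hba = max_per_hba
        self.max_per_port = max_per_port
        self.outstanding = {}   # port id -> commands in flight

    def try_queue(self, port: int) -> bool:
        """Return True if the command was accepted, False if busy."""
        if sum(self.outstanding.values()) >= self.max_per_hba:
            return False        # HBA-wide limit reached
        if self.outstanding.get(port, 0) >= self.max_per_port:
            return False        # this initiator port is full
        self.outstanding[port] = self.outstanding.get(port, 0) + 1
        return True

    def complete(self, port: int) -> None:
        """Account for a command that has finished on a port."""
        self.outstanding[port] -= 1

limits = HbaQueueLimits(max_per_hba=3, max_per_port=2)
results = [limits.try_queue(0), limits.try_queue(0), limits.try_queue(0),
           limits.try_queue(1), limits.try_queue(1)]
print(results)
```

Here the third command on port 0 is refused by the per-port limit and
the second command on port 1 by the HBA-wide limit; returning False
stands in for telling the upper layers to retry later.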

* Re: SAS overview
From: Jeff Garzik @ 2004-03-28 22:38 UTC
To: Scott M. Ferris; +Cc: dougg, Matthew Wilcox, linux-scsi

Scott M. Ferris wrote:
> For the fast path of queueing and completing commands, I'd expect SAS
> HBAs to work similar to SPI HBAs, where all the HBA driver does is
> queue commands with data buffers, and wait for an interrupt to
> indicate that one or more commands have completed. For performance,
> the HBA needs to manage the entire lifetime of the command, and DMA
> data back and forth without needing intervention by the HBA driver.

Newer SATA controllers do this for you, but still give you low level
access to the individual frames being sent on the SATA wire...

	Jeff