* mpt2sas,mpt3sas: SATA affiliations
From: Douglas Gilbert @ 2014-11-12 19:59 UTC
To: SCSI development list

From a correspondent and from my own testing, I have seen way
too many of these messages in the log:
log_info(0x31160000): originator(PL), code(0x16), sub_code(0x0000)

That message comes from either the mpt2sas or the mpt3sas driver and
may point to a problem in how those drivers interact with the SCSI
error handler (EH). In one case the messages never stop and a reboot
is required; in my testing (with sg_readcap) they stopped once the
command timeout (60 seconds) expired.

How they occur needs a bit of explaining. ATA disks are designed
to have only one initiator (host). So if you build a SAS fabric
with at least two initiators, an expander and one SATA disk,
there is a potential problem, which SAS expanders address
with "affiliations". An affiliation is a mechanism that lets the
expander remember the SAS address of the initiator (host)
that first "grabbed" the SATA disk, and reject any other
initiator that tries to access that SATA disk.

That rejection, in the SAS link layer for the STP protocol,
is an OPEN_REJECT (STP RESOURCES BUSY) response. That is *not*
a retryable error (so the use of "busy" is unfortunate).
FreeBSD handles this correctly; Linux in some cases retries,
which results in chaos plus bloated logs.

There are mechanisms for the owner of the affiliation to clear
it so that another initiator can claim it. However, affiliations are
designed to thwart brute-force attempts by non-owners. At best,
a non-owner should get one log message along the lines of:
cannot access SATA disk xxxx since another machine/HBA is
affiliated with it

Linux properly handles SATA affiliations when it comes across
them during normal device discovery. It is the "surprise"
disappearance of an affiliation that causes instability. That
surprise happens when a utility like smp_phy_control tells
the expander to clear the affiliation and a rescan on the
other machine then claims it.

Doug Gilbert
* Re: mpt2sas,mpt3sas: SATA affiliations
From: Christoph Hellwig @ 2014-11-13 14:26 UTC
To: Douglas Gilbert
Cc: SCSI development list, Nagalakshmi Nandigama,
Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan

[adding the mpt maintainers, as this seems to be a driver issue and
not a midlayer issue]

On Wed, Nov 12, 2014 at 02:59:11PM -0500, Douglas Gilbert wrote:
> [...]