linux-scsi.vger.kernel.org archive mirror
* Regression in v4.6-rc due to SCSI multipath change
@ 2016-05-04  6:51 Paul Mackerras
  2016-05-04  8:59 ` Hannes Reinecke
  0 siblings, 1 reply; 2+ messages in thread
From: Paul Mackerras @ 2016-05-04  6:51 UTC
  To: Hannes Reinecke; +Cc: linux-scsi, linux-kernel

Current upstream kernels fail to boot on my POWER8 server with
multipath SCSI disks and IPR host bus adapters.  What happens is that
the system finds each disk twice (as normal) and then prints messages
like this:

[    2.827761] sd 1:2:4:0: alua: supports implicit TPGS
[    2.827875] sd 1:2:4:0: alua: No device descriptors found
[    2.827923] sd 1:2:4:0: alua: Attach failed (-22)
[    2.827979] device-mapper: table: 253:0: multipath: error attaching hardware handler
[    2.828048] device-mapper: ioctl: error adding target to table

Eventually dracut times out (this is with Fedora 23) and enters
emergency mode.

I bisected the problem down to commit 0047220c6c36 ("scsi_dh_alua: use
unique device id", 2016-02-19).  It seems that this commit adds the
restriction that we can only do multipath with disks that have stuff
in their VPD page 83 that scsi_vpd_lun_id() can parse.  The disks on
my server apparently don't.

I instrumented scsi_vpd_lun_id() to find out what was going on.  The
disks on this machine have a vendor-specific designator and a T10
vendor ID based designator, but no designators of types 2, 3 or 8.
An example from one disk is:

02 01 00 20 49 42 4d 20 20 20 20 20 49 50 52 2d 30 20 20 20 35 45 43
34 41 42 30 30 30 30 30 30 30 30 32 30

02 00 00 14 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
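
For readers following along, the two dumps above can be decoded with a
minimal Python sketch. It assumes each dump is one designation
descriptor from VPD page 0x83 with the usual 4-byte header (code set,
association/type, reserved, length). The helper name parse_designators
is hypothetical and is not kernel code.

```python
def parse_designators(data: bytes):
    """Split a run of VPD page 0x83 designation descriptors into fields."""
    out = []
    pos = 0
    while pos + 4 <= len(data):
        length = data[pos + 3]                    # designator length
        body = data[pos + 4 : pos + 4 + length]
        if len(body) < length:
            break                                 # truncated descriptor
        out.append({
            "code_set": data[pos] & 0x0F,         # 2 means ASCII
            "association": (data[pos + 1] >> 4) & 0x03,
            "type": data[pos + 1] & 0x0F,         # 0 vendor, 1 T10 vendor ID
            "designator": body,
        })
        pos += 4 + length
    return out


# The two descriptors quoted in the report, concatenated.
raw = bytes.fromhex(
    "02 01 00 20 49 42 4d 20 20 20 20 20 49 50 52 2d 30 20 20 20 35 45 43 "
    "34 41 42 30 30 30 30 30 30 30 30 32 30 "
    "02 00 00 14 30 20 20 20 20 20 20 20 "
    "20 20 20 20 20 20 20 20 20 20 20 20"
)
descs = parse_designators(raw)
for d in descs:
    print(d["type"], d["association"], d["designator"].decode("ascii"))
```

The first descriptor comes out as type 1 (T10 vendor ID based) with
vendor "IBM" followed by "IPR-0" and "5EC4AB0000000020"; the second is
type 0 (vendor specific).  Neither is of type 2, 3 or 8.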

I have a patch that extends scsi_vpd_lun_id() to be able to use the
T10 vendor ID based designator, which fixes the problem on my system.
I'll post the patch shortly.

However, was it really intentional that multipath now can't be used
with disks like these, when it worked just fine previously?

Paul.


* Re: Regression in v4.6-rc due to SCSI multipath change
  2016-05-04  6:51 Regression in v4.6-rc due to SCSI multipath change Paul Mackerras
@ 2016-05-04  8:59 ` Hannes Reinecke
  0 siblings, 0 replies; 2+ messages in thread
From: Hannes Reinecke @ 2016-05-04  8:59 UTC
  To: Paul Mackerras; +Cc: linux-scsi, linux-kernel

On 05/04/2016 08:51 AM, Paul Mackerras wrote:
> Current upstream kernels fail to boot on my POWER8 server with
> multipath SCSI disks and IPR host bus adapters.  What happens is that
> the system finds each disk twice (as normal) and then prints messages
> like this:
> 
> [    2.827761] sd 1:2:4:0: alua: supports implicit TPGS
> [    2.827875] sd 1:2:4:0: alua: No device descriptors found
> [    2.827923] sd 1:2:4:0: alua: Attach failed (-22)
> [    2.827979] device-mapper: table: 253:0: multipath: error attaching hardware handler
> [    2.828048] device-mapper: ioctl: error adding target to table
> 
> Eventually dracut times out (this is with Fedora 23) and enters
> emergency mode.
> 
> I bisected the problem down to commit 0047220c6c36 ("scsi_dh_alua: use
> unique device id", 2016-02-19).  It seems that this commit adds the
> restriction that we can only do multipath with disks that have stuff
> in their VPD page 83 that scsi_vpd_lun_id() can parse.  The disks on
> my server apparently don't.
> 
> I instrumented scsi_vpd_lun_id() to find out what was going on.  The
> disks on this machine have a vendor-specific designator and a T10
> vendor ID based designator, but no designators of types 2, 3 or 8.
> An example from one disk is:
> 
> 02 01 00 20 49 42 4d 20 20 20 20 20 49 50 52 2d 30 20 20 20 35 45 43
> 34 41 42 30 30 30 30 30 30 30 30 32 30
> 
> 02 00 00 14 30 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
> 
> I have a patch that extends scsi_vpd_lun_id() to be able to use the
> T10 vendor ID based designator, which fixes the problem on my system.
> I'll post the patch shortly.
> 
Please do. I'm about to draft a patch myself, but if you already
have one ...

> However, was it really intentional that multipath now can't be used
> with disks like these, when it worked just fine previously?
> 
Well. The thing is, ALUA can't really work if no VPD descriptors are
found, and so the check itself is correct.

However, we really need to parse all possible VPD descriptors, for
sure, so that is indeed a bug.
I'm preparing a patch for decoding all possible VPD descriptors, too.
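
To illustrate the difference between the two behaviours, here is a
rough Python sketch, not the kernel code and not either patch.  It
assumes designators have already been split into (association, type,
bytes) tuples; pick_lun_id and the preference order are hypothetical
and chosen only for the example.

```python
# Designator type codes (low nibble of byte 1 of each descriptor).
VENDOR_SPECIFIC, T10_VENDOR_ID, EUI_64, NAA, SCSI_NAME_STRING = 0, 1, 2, 3, 8

# Hypothetical preference order, for illustration only.
PREFERENCE = [NAA, EUI_64, SCSI_NAME_STRING, T10_VENDOR_ID]


def pick_lun_id(designators, preference=PREFERENCE):
    """Return (type, bytes) of the most preferred LUN designator, or None."""
    best = None
    for assoc, dtype, body in designators:
        # Only designators associated with the logical unit (0) qualify.
        if assoc != 0 or dtype not in preference:
            continue
        rank = preference.index(dtype)
        if best is None or rank < best[0]:
            best = (rank, dtype, body)
    return None if best is None else (best[1], best[2])


# The disk from the report: one T10 vendor ID and one vendor-specific entry.
disk = [
    (0, T10_VENDOR_ID, b"IBM     IPR-0   5EC4AB0000000020"),
    (0, VENDOR_SPECIFIC, b"0" + b" " * 19),
]

# Accepting only types 2, 3 and 8 finds nothing for this disk.
strict = pick_lun_id(disk, preference=[NAA, EUI_64, SCSI_NAME_STRING])
# Also accepting the T10 vendor ID designator yields an identifier.
relaxed = pick_lun_id(disk)
print(strict, relaxed)
```

With the strict list the function returns None, which corresponds to
the "No device descriptors found" case; with the T10 vendor ID type
added it returns that designator.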

Let's see who's first :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
