* [PATCH 0/3] Another round of SES fixes
@ 2017-12-21 11:22 Hannes Reinecke
2017-12-21 11:22 ` [PATCH 1/3] ses: make initial allocation size configurable Hannes Reinecke
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Hannes Reinecke @ 2017-12-21 11:22 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke
Hi all,
I recently played around with a new NetApp E-series, and found some
issue with our SES implementation.
As usual, comments and reviews are welcome.
Hannes Reinecke (3):
ses: make initial allocation size configurable
ses: skip error messages for invalid LUNs
ses: Retry Power-on-reset check condition
drivers/scsi/ses.c | 47 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 34 insertions(+), 13 deletions(-)
--
1.8.5.6
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/3] ses: make initial allocation size configurable 2017-12-21 11:22 [PATCH 0/3] Another round of SES fixes Hannes Reinecke @ 2017-12-21 11:22 ` Hannes Reinecke 2017-12-22 17:14 ` James Bottomley 2017-12-21 11:22 ` [PATCH 2/3] ses: skip error messages for invalid LUNs Hannes Reinecke 2017-12-21 11:22 ` [PATCH 3/3] ses: Retry Power-on-reset check condition Hannes Reinecke 2 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2017-12-21 11:22 UTC (permalink / raw) To: Martin K. Petersen Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke, Hannes Reinecke Some storage arrays have an incorrect SES implementation which will always return the allocation length of the CDB instead of the true length of the requested page. With this patch one can modify the initial allocation size to get the full output of those arrays. Signed-off-by: Hannes Reinecke <hare@suse.com> --- drivers/scsi/ses.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index 11826c5..e8ffee1 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -51,6 +51,13 @@ struct ses_component { u64 addr; }; +#define INIT_ALLOC_SIZE 32 + +static unsigned long ses_alloc_size = INIT_ALLOC_SIZE; + +module_param_named(alloc_size, ses_alloc_size, ulong, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(alloc_size, "initial allocation length"); + static bool ses_page2_supported(struct enclosure_device *edev) { struct ses_device *ses_dev = edev->scratch; @@ -508,8 +515,6 @@ static int ses_enclosure_find_by_addr(struct enclosure_device *edev, return 0; } -#define INIT_ALLOC_SIZE 32 - static void ses_enclosure_data_process(struct enclosure_device *edev, struct scsi_device *sdev, int create) @@ -519,7 +524,7 @@ static void ses_enclosure_data_process(struct enclosure_device *edev, int i, j, page7_len, len, components; struct ses_device *ses_dev = edev->scratch; int types = ses_dev->page1_num_types; - unsigned char *hdr_buf = kzalloc(INIT_ALLOC_SIZE, GFP_KERNEL); + unsigned char *hdr_buf = kzalloc(ses_alloc_size, GFP_KERNEL); if (!hdr_buf) goto simple_populate; @@ -528,7 +533,7 @@ static void ses_enclosure_data_process(struct enclosure_device *edev, if (ses_dev->page10) ses_recv_diag(sdev, 10, ses_dev->page10, ses_dev->page10_len); /* Page 7 for the descriptors is optional */ - result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE); + result = ses_recv_diag(sdev, 7, hdr_buf, ses_alloc_size); if (result) goto simple_populate; @@ -663,12 +668,12 @@ static int ses_intf_add(struct device *cdev, sdev_printk(KERN_NOTICE, sdev, "Embedded Enclosure Device\n"); ses_dev = kzalloc(sizeof(*ses_dev), GFP_KERNEL); - hdr_buf = kzalloc(INIT_ALLOC_SIZE, GFP_KERNEL); + hdr_buf = kzalloc(ses_alloc_size, GFP_KERNEL); if (!hdr_buf || !ses_dev) goto err_init_free; page = 1; - result = ses_recv_diag(sdev, page, hdr_buf, INIT_ALLOC_SIZE); + result = ses_recv_diag(sdev, page, hdr_buf, ses_alloc_size); if (result) goto recv_failed; @@ -708,7 +713,7 @@ static int ses_intf_add(struct device *cdev, buf = NULL; page = 2; - result = ses_recv_diag(sdev, page, hdr_buf, INIT_ALLOC_SIZE); + result = ses_recv_diag(sdev, page, hdr_buf, ses_alloc_size); if (result) goto page2_not_supported; @@ -728,7 +733,7 @@ static int ses_intf_add(struct device *cdev, /* The additional information page --- allows us * to match up the devices */ page = 10; - result = ses_recv_diag(sdev, page, hdr_buf, INIT_ALLOC_SIZE); + result = ses_recv_diag(sdev, page, hdr_buf, ses_alloc_size); if (!result) { len = (hdr_buf[2] << 8) + hdr_buf[3] + 4; -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] ses: make initial allocation size configurable 2017-12-21 11:22 ` [PATCH 1/3] ses: make initial allocation size configurable Hannes Reinecke @ 2017-12-22 17:14 ` James Bottomley 2017-12-23 14:29 ` Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: James Bottomley @ 2017-12-22 17:14 UTC (permalink / raw) To: Hannes Reinecke, Martin K. Petersen Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke On Thu, 2017-12-21 at 12:22 +0100, Hannes Reinecke wrote: > Some storage arrays have an incorrect SES implementation which will > always return the allocation length of the CDB instead of the true > length of the requested page. That's a pretty gross standards violation, is this common? When the buffer is > than the data to return, does it get set then? Because if not, we're going to be processing bogus data and no module parameter can fix this because the returned length depends on the number of elements in the enclosure making this parameter really unsafe unless you get it exactly right. > With this patch one can modify the initial allocation size to > get the full output of those arrays. This really doesn't look like the right way to do it. Shouldn't we rather have a blacklist for these devices and simply do a page for each allocation for them (assuming they return the correct length when over subscribed). James ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] ses: make initial allocation size configurable 2017-12-22 17:14 ` James Bottomley @ 2017-12-23 14:29 ` Hannes Reinecke 0 siblings, 0 replies; 10+ messages in thread From: Hannes Reinecke @ 2017-12-23 14:29 UTC (permalink / raw) To: James Bottomley, Martin K. Petersen; +Cc: Christoph Hellwig, linux-scsi On 12/22/2017 06:14 PM, James Bottomley wrote: > On Thu, 2017-12-21 at 12:22 +0100, Hannes Reinecke wrote: >> Some storage arrays have an incorrect SES implementation which will >> always return the allocation length of the CDB instead of the true >> length of the requested page. > > That's a pretty gross standards violation, is this common? When the > buffer is > than the data to return, does it get set then? Because if > not, we're going to be processing bogus data and no module parameter > can fix this because the returned length depends on the number of > elements in the enclosure making this parameter really unsafe unless > you get it exactly right. > It's actually the first time I've observed that; the vendor is already alerted. But yes, if the buffer is larger than the page the correct page size is set, so we won't be suffering from the above issue. >> With this patch one can modify the initial allocation size to >> get the full output of those arrays. > > This really doesn't look like the right way to do it. Shouldn't we > rather have a blacklist for these devices and simply do a page for each > allocation for them (assuming they return the correct length when over > subscribed). > I really doubt it's worth it. It's just a single array line with a particular firmware revision from what I've seen. The one problem we're continually running into is that the blacklist flags are essentially used up, so I'd be rather careful with adding some not-that-common scenarios to it. I would love to have the blacklist size increased, though; I do have a few other issues which would rather benefit from being handled by a blacklist flag ... Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] ses: skip error messages for invalid LUNs 2017-12-21 11:22 [PATCH 0/3] Another round of SES fixes Hannes Reinecke 2017-12-21 11:22 ` [PATCH 1/3] ses: make initial allocation size configurable Hannes Reinecke @ 2017-12-21 11:22 ` Hannes Reinecke 2017-12-22 17:39 ` James Bottomley 2017-12-21 11:22 ` [PATCH 3/3] ses: Retry Power-on-reset check condition Hannes Reinecke 2 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2017-12-21 11:22 UTC (permalink / raw) To: Martin K. Petersen Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke, Hannes Reinecke Some storage array set the 'Embedded enclosure' bit even though no LUN is present, causing the first RECEIVE DIAGNOSTIC call to be returned with sense code 'LOGICAL UNIT NOT SUPPORTED'. This patch skips the annoying 'Failed to get diagnostic page 0x1' messages for those cases. Signed-off-by: Hannes Reinecke <hare@suse.com> --- drivers/scsi/ses.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index e8ffee1..c1f96b0 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -110,12 +110,17 @@ static int ses_recv_diag(struct scsi_device *sdev, int page_code, 0 }; unsigned char recv_page_code; + struct scsi_sense_hdr sshdr; ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, - NULL, SES_TIMEOUT, SES_RETRIES, NULL); - if (unlikely(ret)) + &sshdr, SES_TIMEOUT, SES_RETRIES, NULL); + if (unlikely(ret)) { + if (status_byte(ret) == CHECK_CONDITION && + sshdr.asc == 0x25 && sshdr.ascq == 0x00) { + ret = -ENODEV; + } return ret; - + } recv_page_code = ((unsigned char *)buf)[0]; if (likely(recv_page_code == page_code)) @@ -674,9 +679,14 @@ static int ses_intf_add(struct device *cdev, page = 1; result = ses_recv_diag(sdev, page, hdr_buf, ses_alloc_size); - if (result) + if (result) { + if (result == -ENODEV && + sdev->inq_periph_qual != 0) { + kfree(hdr_buf); + return result; + } goto recv_failed; - + } len = (hdr_buf[2] << 8) + hdr_buf[3] + 4; buf = kzalloc(len, GFP_KERNEL); if (!buf) -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ses: skip error messages for invalid LUNs 2017-12-21 11:22 ` [PATCH 2/3] ses: skip error messages for invalid LUNs Hannes Reinecke @ 2017-12-22 17:39 ` James Bottomley 2017-12-23 14:37 ` Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: James Bottomley @ 2017-12-22 17:39 UTC (permalink / raw) To: Hannes Reinecke, Martin K. Petersen Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke On Thu, 2017-12-21 at 12:22 +0100, Hannes Reinecke wrote: > Some storage array set the 'Embedded enclosure' bit even though > no LUN is present, causing the first RECEIVE DIAGNOSTIC call to > be returned with sense code 'LOGICAL UNIT NOT SUPPORTED'. > This patch skips the annoying 'Failed to get diagnostic page 0x1' > messages for those cases. What disagnostic pages does this thing support? Can you do a receive diagnostic on page 0 to find out? I suspect a lot of embedded enclosure services are simple and support page 7 only. If it really refuses all diagnostic page requests (which would be a gross standards violation), then it should probably be blacklisted by inquiry string as unusable. James ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ses: skip error messages for invalid LUNs 2017-12-22 17:39 ` James Bottomley @ 2017-12-23 14:37 ` Hannes Reinecke 0 siblings, 0 replies; 10+ messages in thread From: Hannes Reinecke @ 2017-12-23 14:37 UTC (permalink / raw) To: James Bottomley, Martin K. Petersen Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke On 12/22/2017 06:39 PM, James Bottomley wrote: > On Thu, 2017-12-21 at 12:22 +0100, Hannes Reinecke wrote: >> Some storage array set the 'Embedded enclosure' bit even though >> no LUN is present, causing the first RECEIVE DIAGNOSTIC call to >> be returned with sense code 'LOGICAL UNIT NOT SUPPORTED'. >> This patch skips the annoying 'Failed to get diagnostic page 0x1' >> messages for those cases. > > What disagnostic pages does this thing support? Can you do a receive > diagnostic on page 0 to find out? I suspect a lot of embedded > enclosure services are simple and support page 7 only. If it really > refuses all diagnostic page requests (which would be a gross standards > violation), then it should probably be blacklisted by inquiry string as > unusable. > If a LUN is connected it'll respond to the usual set (0, 1, 2, und 6). Sending a RECEIVE DIAGNOSTIC to page 0 will probably yield the same result (ie being returned with that sense code), but at least we won't try to access the other pages. But anyway, yes, the implementation is dodgy. I'm in contact with the vendor and try to straighten things out. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/3] ses: Retry Power-on-reset check condition 2017-12-21 11:22 [PATCH 0/3] Another round of SES fixes Hannes Reinecke 2017-12-21 11:22 ` [PATCH 1/3] ses: make initial allocation size configurable Hannes Reinecke 2017-12-21 11:22 ` [PATCH 2/3] ses: skip error messages for invalid LUNs Hannes Reinecke @ 2017-12-21 11:22 ` Hannes Reinecke 2017-12-21 15:29 ` James Bottomley 2 siblings, 1 reply; 10+ messages in thread From: Hannes Reinecke @ 2017-12-21 11:22 UTC (permalink / raw) To: Martin K. Petersen Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke, Hannes Reinecke During startup any SCSI request might encounter a 'Power-on/reset' sense code, which just can be retried. In the case of ses it needs to be retried, otherwise ses will errorneously detect this as a failure and not attach the driver. Signed-off-by: Hannes Reinecke <hare@suse.com> --- drivers/scsi/ses.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index c1f96b0..9c8b3db 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -110,14 +110,20 @@ static int ses_recv_diag(struct scsi_device *sdev, int page_code, 0 }; unsigned char recv_page_code; + int retries = SES_RETRIES; struct scsi_sense_hdr sshdr; +retry: ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, bufflen, - &sshdr, SES_TIMEOUT, SES_RETRIES, NULL); + &sshdr, SES_TIMEOUT, retries, NULL); if (unlikely(ret)) { - if (status_byte(ret) == CHECK_CONDITION && - sshdr.asc == 0x25 && sshdr.ascq == 0x00) { - ret = -ENODEV; + if (status_byte(ret) == CHECK_CONDITION) { + if (sshdr.asc == 0x29) { + retries--; + goto retry; + } + if (sshdr.asc == 0x25 && sshdr.ascq == 0x00) + ret = -ENODEV; } return ret; } -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] ses: Retry Power-on-reset check condition 2017-12-21 11:22 ` [PATCH 3/3] ses: Retry Power-on-reset check condition Hannes Reinecke @ 2017-12-21 15:29 ` James Bottomley 2017-12-21 15:39 ` Hannes Reinecke 0 siblings, 1 reply; 10+ messages in thread From: James Bottomley @ 2017-12-21 15:29 UTC (permalink / raw) To: Hannes Reinecke, Martin K. Petersen Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke On Thu, 2017-12-21 at 12:22 +0100, Hannes Reinecke wrote: > During startup any SCSI request might encounter a 'Power-on/reset' > sense code, which just can be retried. > In the case of ses it needs to be retried, otherwise ses will > errorneously detect this as a failure and not attach the driver. > > Signed-off-by: Hannes Reinecke <hare@suse.com> > --- > drivers/scsi/ses.c | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c > index c1f96b0..9c8b3db 100644 > --- a/drivers/scsi/ses.c > +++ b/drivers/scsi/ses.c > @@ -110,14 +110,20 @@ static int ses_recv_diag(struct scsi_device > *sdev, int page_code, > 0 > }; > unsigned char recv_page_code; > + int retries = SES_RETRIES; > struct scsi_sense_hdr sshdr; > > +retry: > ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, > bufflen, > - &sshdr, SES_TIMEOUT, SES_RETRIES, > NULL); > + &sshdr, SES_TIMEOUT, retries, NULL); > if (unlikely(ret)) { > - if (status_byte(ret) == CHECK_CONDITION && > - sshdr.asc == 0x25 && sshdr.ascq == 0x00) { > - ret = -ENODEV; > + if (status_byte(ret) == CHECK_CONDITION) { > + if (sshdr.asc == 0x29) { > + retries--; Nothing ever checks this, meaning the loop potentially never terminates. We already have two templates for how to do this in sd.c ... on the other hand I can't see any use of scsi_execute/scsi_execute_req that actually want to see the ASC 29 conditions, so it might be better to integrate it into scsi_execute(). James ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] ses: Retry Power-on-reset check condition 2017-12-21 15:29 ` James Bottomley @ 2017-12-21 15:39 ` Hannes Reinecke 0 siblings, 0 replies; 10+ messages in thread From: Hannes Reinecke @ 2017-12-21 15:39 UTC (permalink / raw) To: James Bottomley, Martin K. Petersen Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke On 12/21/2017 04:29 PM, James Bottomley wrote: > On Thu, 2017-12-21 at 12:22 +0100, Hannes Reinecke wrote: >> During startup any SCSI request might encounter a 'Power-on/reset' >> sense code, which just can be retried. >> In the case of ses it needs to be retried, otherwise ses will >> errorneously detect this as a failure and not attach the driver. >> >> Signed-off-by: Hannes Reinecke <hare@suse.com> >> --- >> drivers/scsi/ses.c | 14 ++++++++++---- >> 1 file changed, 10 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c >> index c1f96b0..9c8b3db 100644 >> --- a/drivers/scsi/ses.c >> +++ b/drivers/scsi/ses.c >> @@ -110,14 +110,20 @@ static int ses_recv_diag(struct scsi_device >> *sdev, int page_code, >> 0 >> }; >> unsigned char recv_page_code; >> + int retries = SES_RETRIES; >> struct scsi_sense_hdr sshdr; >> >> +retry: >> ret = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, >> bufflen, >> - &sshdr, SES_TIMEOUT, SES_RETRIES, >> NULL); >> + &sshdr, SES_TIMEOUT, retries, NULL); >> if (unlikely(ret)) { >> - if (status_byte(ret) == CHECK_CONDITION && >> - sshdr.asc == 0x25 && sshdr.ascq == 0x00) { >> - ret = -ENODEV; >> + if (status_byte(ret) == CHECK_CONDITION) { >> + if (sshdr.asc == 0x29) { >> + retries--; > > Nothing ever checks this, meaning the loop potentially never > terminates. > > We already have two templates for how to do this in sd.c ... on the > other hand I can't see any use of scsi_execute/scsi_execute_req that > actually want to see the ASC 29 conditions, so it might be better to > integrate it into scsi_execute(). > I was wondering about the same, as the POR conditions are pretty pointless (ATM; my friends from NetApp tend to disagree here :-) So ok, I'll be moving it into scsi_execute_req(). Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-12-23 14:37 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-12-21 11:22 [PATCH 0/3] Another round of SES fixes Hannes Reinecke 2017-12-21 11:22 ` [PATCH 1/3] ses: make initial allocation size configurable Hannes Reinecke 2017-12-22 17:14 ` James Bottomley 2017-12-23 14:29 ` Hannes Reinecke 2017-12-21 11:22 ` [PATCH 2/3] ses: skip error messages for invalid LUNs Hannes Reinecke 2017-12-22 17:39 ` James Bottomley 2017-12-23 14:37 ` Hannes Reinecke 2017-12-21 11:22 ` [PATCH 3/3] ses: Retry Power-on-reset check condition Hannes Reinecke 2017-12-21 15:29 ` James Bottomley 2017-12-21 15:39 ` Hannes Reinecke
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox