* [PATCH 0/4] isci, libsas fixes for 3.4-rc2
@ 2014-02-06 20:22 Dan Williams
2014-02-06 20:23 ` [PATCH 1/4] isci: fix reset timeout handling Dan Williams
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Dan Williams @ 2014-02-06 20:22 UTC (permalink / raw)
To: linux-scsi; +Cc: artur.paszkiewicz, dave.jiang, lukasz.dorau, JBottomley
Hi James,
Here are some collected fixes. All but patch 2 are tagged for -stable.
Patch 1 and 4 have been on the list since before the 3.14 merge window,
patch 2 and 3 are new.
Please apply, thank you.
[PATCH 1/4] isci: fix reset timeout handling
[PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages
[PATCH 3/4] isci: fix needless ata reset escalations
[PATCH 4/4] isci: correct erroneous for_each_isci_host macro
drivers/scsi/isci/host.h | 5 ++---
drivers/scsi/isci/port_config.c | 7 -------
drivers/scsi/isci/request.c | 8 ++------
drivers/scsi/isci/task.c | 2 +-
drivers/scsi/libsas/sas_scsi_host.c | 2 +-
include/scsi/scsi_device.h | 12 ++++++++++++
6 files changed, 18 insertions(+), 18 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/4] isci: fix reset timeout handling
2014-02-06 20:22 [PATCH 0/4] isci, libsas fixes for 3.4-rc2 Dan Williams
@ 2014-02-06 20:23 ` Dan Williams
2014-02-06 20:23 ` [PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages Dan Williams
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2014-02-06 20:23 UTC (permalink / raw)
To: linux-scsi
Cc: dave.jiang, Maciej Patelczyk, stable, David Milburn,
artur.paszkiewicz, lukasz.dorau, JBottomley, Xun Ni
Remove an erroneous BUG_ON() in the case of a hard reset timeout. The
reset timeout handler puts the port into the "awaiting link-up" state.
The timeout causes the device to be disconnected and we need to be in
the awaiting link-up state to re-connect the port. The BUG_ON() made
the incorrect assumption that resets never timeout and we always
complete the reset in the "resetting" state.
Testing this patch also uncovered that libata continues to attempt to
reset the port long after the driver has torn down the context. Once
the driver has committed to abandoning the link it must indicate to
libata that recovery ends by returning -ENODEV from
->lldd_I_T_nexus_reset().
Cc: <stable@vger.kernel.org>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Acked-by: Lukasz Dorau <lukasz.dorau@intel.com>
Reported-by: David Milburn <dmilburn@redhat.com>
Reported-by: Xun Ni <xun.ni@intel.com>
Tested-by: Xun Ni <xun.ni@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/scsi/isci/port_config.c | 7 -------
drivers/scsi/isci/task.c | 2 +-
2 files changed, 1 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/isci/port_config.c b/drivers/scsi/isci/port_config.c
index 85c77f6b802b..ac879745ef80 100644
--- a/drivers/scsi/isci/port_config.c
+++ b/drivers/scsi/isci/port_config.c
@@ -615,13 +615,6 @@ static void sci_apc_agent_link_up(struct isci_host *ihost,
SCIC_SDS_APC_WAIT_LINK_UP_NOTIFICATION);
} else {
/* the phy is already the part of the port */
- u32 port_state = iport->sm.current_state_id;
-
- /* if the PORT'S state is resetting then the link up is from
- * port hard reset in this case, we need to tell the port
- * that link up is recieved
- */
- BUG_ON(port_state != SCI_PORT_RESETTING);
port_agent->phy_ready_mask |= 1 << phy_index;
sci_port_link_up(iport, iphy);
}
diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
index 0d30ca849e8f..5d6fda72d659 100644
--- a/drivers/scsi/isci/task.c
+++ b/drivers/scsi/isci/task.c
@@ -801,7 +801,7 @@ int isci_task_I_T_nexus_reset(struct domain_device *dev)
/* XXX: need to cleanup any ireqs targeting this
* domain_device
*/
- ret = TMF_RESP_FUNC_COMPLETE;
+ ret = -ENODEV;
goto out;
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages
2014-02-06 20:22 [PATCH 0/4] isci, libsas fixes for 3.4-rc2 Dan Williams
2014-02-06 20:23 ` [PATCH 1/4] isci: fix reset timeout handling Dan Williams
@ 2014-02-06 20:23 ` Dan Williams
2014-02-06 20:23 ` [PATCH 3/4] isci: fix needless ata reset escalations Dan Williams
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2014-02-06 20:23 UTC (permalink / raw)
To: linux-scsi
Cc: dave.jiang, lukasz.dorau, Jack Wang, Sangeetha Gnanasekaran,
artur.paszkiewicz, JBottomley, Nikith Ganigarakoppal,
Anand Kumar Santhanam, Xun Ni, Nelson Cheng
libsas sometimes short circuits timeouts to force commands into error
recovery. It is misleading to log that the command timed-out in
sas_scsi_timed_out() when in fact it was just queued for error handling.
It's also redundant in the case of a true timeout as libata eh will
detect and report timeouts via it's AC_ERR_TIMEOUT facility.
Given that some environments consider "timeout" errors to be indicative
of impending device failure demote the sas_scsi_timed_out() timeout
message to be disabled by default. This parallels ata_scsi_timed_out().
Cc: Jack Wang <jack_wang@usish.com>
Cc: Anand Kumar Santhanam <AnandKumar.Santhanam@pmcs.com>
Cc: Sangeetha Gnanasekaran <Sangeetha.Gnanasekaran@pmcs.com>
Cc: Nikith Ganigarakoppal <Nikith.Ganigarakoppal@pmcs.com>
Reported-by: Xun Ni <xun.ni@intel.com>
Tested-by: Nelson Cheng <nelson.cheng@intel.com>
Acked-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/scsi/libsas/sas_scsi_host.c | 2 +-
include/scsi/scsi_device.h | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index da3aee17faa5..25d0f127424d 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -862,7 +862,7 @@ out:
enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
{
- scmd_printk(KERN_DEBUG, cmd, "command %p timed out\n", cmd);
+ scmd_dbg(cmd, "command %p timed out\n", cmd);
return BLK_EH_NOT_HANDLED;
}
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d65fbec2533d..067ac9f1607c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -235,12 +235,24 @@ struct scsi_dh_data {
#define sdev_printk(prefix, sdev, fmt, a...) \
dev_printk(prefix, &(sdev)->sdev_gendev, fmt, ##a)
+#define sdev_dbg(sdev, fmt, a...) \
+ dev_dbg(&(sdev)->sdev_gendev, fmt, ##a)
+
#define scmd_printk(prefix, scmd, fmt, a...) \
(scmd)->request->rq_disk ? \
sdev_printk(prefix, (scmd)->device, "[%s] " fmt, \
(scmd)->request->rq_disk->disk_name, ##a) : \
sdev_printk(prefix, (scmd)->device, fmt, ##a)
+#define scmd_dbg(scmd, fmt, a...) \
+ do { \
+ if ((scmd)->request->rq_disk) \
+ sdev_dbg((scmd)->device, "[%s] " fmt, \
+ (scmd)->request->rq_disk->disk_name, ##a);\
+ else \
+ sdev_dbg((scmd)->device, fmt, ##a); \
+ } while (0)
+
enum scsi_target_state {
STARGET_CREATED = 1,
STARGET_RUNNING,
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/4] isci: fix needless ata reset escalations
2014-02-06 20:22 [PATCH 0/4] isci, libsas fixes for 3.4-rc2 Dan Williams
2014-02-06 20:23 ` [PATCH 1/4] isci: fix reset timeout handling Dan Williams
2014-02-06 20:23 ` [PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages Dan Williams
@ 2014-02-06 20:23 ` Dan Williams
2014-02-06 20:23 ` [PATCH 4/4] isci: correct erroneous for_each_isci_host macro Dan Williams
2014-02-23 20:12 ` [PATCH 0/4] isci, libsas fixes for 3.4-rc2 James Bottomley
4 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2014-02-06 20:23 UTC (permalink / raw)
To: linux-scsi
Cc: dave.jiang, lukasz.dorau, stable, artur.paszkiewicz, JBottomley,
Xun Ni, Nelson Cheng
isci is needlessly tying libata's hands by returning
SAM_STAT_CHECK_CONDITION to some ata errors. Instead, prefer
SAS_PROTO_RESPONSE to let libata (via sas_ata_task_done()) disposition
the device-to-host fis.
For example isci is triggering an HSM Violation where AHCI is showing a
simple media error for the same bus condition:
isci:
ata7.00: failed command: READ VERIFY SECTOR(S)
ata7.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 0
res 01/04:00:00:00:00/00:00:00:00:00/e0 Emask 0x3 (HSM violation)
ahci:
ata6.00: failed command: READ VERIFY SECTOR(S)
ata6.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 0
res 51/40:01:00:00:00/00:00:00:00:00/e0 Emask 0x9 (media error)
Note that the isci response matches this from sas_ata_task_done():
/* We saw a SAS error. Send a vague error. */
[..]
dev->sata_dev.fis[3] = 0x04; /* status err */
dev->sata_dev.fis[2] = ATA_ERR;
The end effect is that isci is needlessly triggering hard resets when
they are not necessary.
Cc: <stable@vger.kernel.org>
Reported-by: Xun Ni <xun.ni@intel.com>
Tested-by: Nelson Cheng <nelson.cheng@intel.com>
Acked-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/scsi/isci/request.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/isci/request.c b/drivers/scsi/isci/request.c
index 99d2930b18c8..56e38096f0c4 100644
--- a/drivers/scsi/isci/request.c
+++ b/drivers/scsi/isci/request.c
@@ -2723,13 +2723,9 @@ static void isci_process_stp_response(struct sas_task *task, struct dev_to_host_
memcpy(resp->ending_fis, fis, sizeof(*fis));
ts->buf_valid_size = sizeof(*resp);
- /* If the device fault bit is set in the status register, then
- * set the sense data and return.
- */
- if (fis->status & ATA_DF)
+ /* If an error is flagged let libata decode the fis */
+ if (ac_err_mask(fis->status))
ts->stat = SAS_PROTO_RESPONSE;
- else if (fis->status & ATA_ERR)
- ts->stat = SAM_STAT_CHECK_CONDITION;
else
ts->stat = SAM_STAT_GOOD;
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/4] isci: correct erroneous for_each_isci_host macro
2014-02-06 20:22 [PATCH 0/4] isci, libsas fixes for 3.4-rc2 Dan Williams
` (2 preceding siblings ...)
2014-02-06 20:23 ` [PATCH 3/4] isci: fix needless ata reset escalations Dan Williams
@ 2014-02-06 20:23 ` Dan Williams
2014-02-23 20:12 ` [PATCH 0/4] isci, libsas fixes for 3.4-rc2 James Bottomley
4 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2014-02-06 20:23 UTC (permalink / raw)
To: linux-scsi
Cc: dave.jiang, Maciej Patelczyk, JBottomley, artur.paszkiewicz,
lukasz.dorau, stable
From: Lukasz Dorau <lukasz.dorau@intel.com>
In the first place, the loop 'for' in the macro 'for_each_isci_host'
(drivers/scsi/isci/host.h:314) is incorrect, because it accesses
the 3rd element of 2 element array. After the 2nd iteration it executes
the instruction:
ihost = to_pci_info(pdev)->hosts[2]
(while the size of the 'hosts' array equals 2) and reads an
out of range element.
In the second place, this loop is incorrectly optimized by GCC v4.8
(see http://marc.info/?l=linux-kernel&m=138998871911336&w=2).
As a result, on platforms with two SCU controllers,
the loop is executed more times than it can be (for i=0,1 and 2).
It causes kernel panic during entering the S3 state
and the following oops after 'rmmod isci':
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0
Oops: 0000 [#1] SMP
RIP: 0010:[<ffffffff8131360b>] [<ffffffff8131360b>] __list_add+0x1b/0xc0
Call Trace:
[<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0
[<ffffffff81661c3f>] mutex_lock+0x1f/0x30
[<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas]
[<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas]
[<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci]
[<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci]
[<ffffffff813391cb>] pci_device_remove+0x3b/0xb0
[<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0
[<ffffffff813fc8f8>] driver_detach+0xa8/0xb0
[<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120
[<ffffffff813fcf2c>] driver_unregister+0x2c/0x50
[<ffffffff813381f3>] pci_unregister_driver+0x23/0x80
[<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci]
[<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0
[<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0
[<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b
The loop has been corrected.
This patch fixes kernel panic during entering the S3 state
and the above oops.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Lukasz Dorau <lukasz.dorau@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/scsi/isci/host.h | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/isci/host.h b/drivers/scsi/isci/host.h
index 4911310a38f5..22a9bb1abae1 100644
--- a/drivers/scsi/isci/host.h
+++ b/drivers/scsi/isci/host.h
@@ -311,9 +311,8 @@ static inline struct Scsi_Host *to_shost(struct isci_host *ihost)
}
#define for_each_isci_host(id, ihost, pdev) \
- for (id = 0, ihost = to_pci_info(pdev)->hosts[id]; \
- id < ARRAY_SIZE(to_pci_info(pdev)->hosts) && ihost; \
- ihost = to_pci_info(pdev)->hosts[++id])
+ for (id = 0; id < SCI_MAX_CONTROLLERS && \
+ (ihost = to_pci_info(pdev)->hosts[id]); id++)
static inline void wait_for_start(struct isci_host *ihost)
{
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4] isci, libsas fixes for 3.4-rc2
2014-02-06 20:22 [PATCH 0/4] isci, libsas fixes for 3.4-rc2 Dan Williams
` (3 preceding siblings ...)
2014-02-06 20:23 ` [PATCH 4/4] isci: correct erroneous for_each_isci_host macro Dan Williams
@ 2014-02-23 20:12 ` James Bottomley
2014-02-24 20:03 ` Dan Williams
4 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2014-02-23 20:12 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-scsi, artur.paszkiewicz, dave.jiang, lukasz.dorau
On Thu, 2014-02-06 at 12:22 -0800, Dan Williams wrote:
> Hi James,
>
> Here are some collected fixes. All but patch 2 are tagged for -stable.
>
> Patch 1 and 4 have been on the list since before the 3.14 merge window,
> patch 2 and 3 are new.
>
> Please apply, thank you.
>
> [PATCH 1/4] isci: fix reset timeout handling
> [PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages
> [PATCH 3/4] isci: fix needless ata reset escalations
> [PATCH 4/4] isci: correct erroneous for_each_isci_host macro
2,3 aren't really bug fixes; they're more enhancements (they make the
behaviour better but nothing breaks without them), so I'll queue 1,4 for
fixes and 2-3 for misc.
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/4] isci, libsas fixes for 3.4-rc2
2014-02-23 20:12 ` [PATCH 0/4] isci, libsas fixes for 3.4-rc2 James Bottomley
@ 2014-02-24 20:03 ` Dan Williams
0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2014-02-24 20:03 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi, artur.paszkiewicz, dave.jiang, Lukasz Dorau
On Sun, Feb 23, 2014 at 12:12 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Thu, 2014-02-06 at 12:22 -0800, Dan Williams wrote:
>> Hi James,
>>
>> Here are some collected fixes. All but patch 2 are tagged for -stable.
>>
>> Patch 1 and 4 have been on the list since before the 3.14 merge window,
>> patch 2 and 3 are new.
>>
>> Please apply, thank you.
>>
>> [PATCH 1/4] isci: fix reset timeout handling
>> [PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages
>> [PATCH 3/4] isci: fix needless ata reset escalations
>> [PATCH 4/4] isci: correct erroneous for_each_isci_host macro
>
> 2,3 aren't really bug fixes; they're more enhancements (they make the
> behaviour better but nothing breaks without them), so I'll queue 1,4 for
> fixes and 2-3 for misc.
Ok.
Also Joe's patch while you're in the area:
http://marc.info/?l=linux-scsi&m=138609455422179&w=2
I acked it, but it shows some additional work needed for registration
error paths in scsi_transport_sas.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-02-24 20:03 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-06 20:22 [PATCH 0/4] isci, libsas fixes for 3.4-rc2 Dan Williams
2014-02-06 20:23 ` [PATCH 1/4] isci: fix reset timeout handling Dan Williams
2014-02-06 20:23 ` [PATCH 2/4] scsi, libsas: introduce scmd_dbg() to quiet false positive "timeout" messages Dan Williams
2014-02-06 20:23 ` [PATCH 3/4] isci: fix needless ata reset escalations Dan Williams
2014-02-06 20:23 ` [PATCH 4/4] isci: correct erroneous for_each_isci_host macro Dan Williams
2014-02-23 20:12 ` [PATCH 0/4] isci, libsas fixes for 3.4-rc2 James Bottomley
2014-02-24 20:03 ` Dan Williams
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox