[PATCH 00/10] hpsa: September 2013 driver fixes

linux-scsi.vger.kernel.org archive mirror

* [PATCH 00/10] hpsa: September 2013 driver fixes
@ 2013-09-23 18:33 Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 01/10] hpsa: do not attempt to flush the cache on locked up controllers Stephen M. Cameron
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

The following series contains some fixes for hpsa.

---

Stephen M. Cameron (10):
      hpsa: do not attempt to flush the cache on locked up controllers
      hpsa: add 5 second delay after doorbell reset
      hpsa: do not discard scsi status on aborted commands
      hpsa: remove unneeded include of seq_file.h
      hpsa: fix memory leak in CCISS_BIG_PASSTHRU ioctl
      hpsa: add MSA 2040 to list of external target devices
      hpsa: hide logical drives with format in progress from linux
      hpsa: bring logical drives online when format completes
      hpsa: cap CCISS_PASSTHRU at 20 concurrent commands.
      hpsa: prevent stalled i/o


 drivers/scsi/hpsa.c |  308 +++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/scsi/hpsa.h |   20 +++
 2 files changed, 305 insertions(+), 23 deletions(-)

-- 
-- steve

* [PATCH 01/10] hpsa: do not attempt to flush the cache on locked up controllers
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
@ 2013-09-23 18:33 ` Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 02/10] hpsa: add 5 second delay after doorbell reset Stephen M. Cameron
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Flushing the cache on a locked up controller cannot work, and
trying to do so just hangs the system on shutdown.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 891c86b..d6ddc7f 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -4942,6 +4942,15 @@ static void hpsa_flush_cache(struct ctlr_info *h)
 {
 	char *flush_buf;
 	struct CommandList *c;
+	unsigned long flags;
+
+	/* Don't bother trying to flush the cache if locked up */
+	spin_lock_irqsave(&h->lock, flags);
+	if (unlikely(h->lockup_detected)) {
+		spin_unlock_irqrestore(&h->lock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&h->lock, flags);
 
 	flush_buf = kzalloc(4, GFP_KERNEL);
 	if (!flush_buf)


* [PATCH 02/10] hpsa: add 5 second delay after doorbell reset
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 01/10] hpsa: do not attempt to flush the cache on locked up controllers Stephen M. Cameron
@ 2013-09-23 18:33 ` Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 03/10] hpsa: do not discard scsi status on aborted commands Stephen M. Cameron
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

The hardware guys tell us that after initiating a software
reset via the doorbell register we need to wait 5 seconds before
attempting to talk to the board *at all*.  This means that we
cannot watch the board to verify it transitions from "ready" to
"not ready" then back to "ready", since this transition will
most likely happen during those 5 seconds (though we can still
verify the reset happens by watching the "driver version" field
get cleared).

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   32 +++++++++++++++++++++++---------
 1 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index d6ddc7f..b63af55 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3785,6 +3785,13 @@ static int hpsa_controller_hard_reset(struct pci_dev *pdev,
 		 */
 		dev_info(&pdev->dev, "using doorbell to reset controller\n");
 		writel(use_doorbell, vaddr + SA5_DOORBELL);
+
+		/* PMC hardware guys tell us we need a 5 second delay after
+		 * doorbell reset and before any attempt to talk to the board
+		 * at all to ensure that this actually works and doesn't fall
+		 * over in some weird corner cases.
+		 */
+		msleep(5000);
 	} else { /* Try to do it the PCI power state way */
 
 		/* Quoting from the Open CISS Specification: "The Power
@@ -3981,15 +3988,22 @@ static int hpsa_kdump_hard_reset_controller(struct pci_dev *pdev)
 	   need a little pause here */
 	msleep(HPSA_POST_RESET_PAUSE_MSECS);
 
-	/* Wait for board to become not ready, then ready. */
-	dev_info(&pdev->dev, "Waiting for board to reset.\n");
-	rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_NOT_READY);
-	if (rc) {
-		dev_warn(&pdev->dev,
-			"failed waiting for board to reset."
-			" Will try soft reset.\n");
-		rc = -ENOTSUPP; /* Not expected, but try soft reset later */
-		goto unmap_cfgtable;
+	if (!use_doorbell) {
+		/* Wait for board to become not ready, then ready.
+		 * (if we used the doorbell, then we already waited 5 secs
+		 * so the "not ready" state is already gone by so we
+		 * won't catch it.)
+		 */
+		dev_info(&pdev->dev, "Waiting for board to reset.\n");
+		rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_NOT_READY);
+		if (rc) {
+			dev_warn(&pdev->dev,
+				"failed waiting for board to reset."
+				" Will try soft reset.\n");
+			/* Not expected, but try soft reset later */
+			rc = -ENOTSUPP;
+			goto unmap_cfgtable;
+		}
 	}
 	rc = hpsa_wait_for_board_state(pdev, vaddr, BOARD_READY);
 	if (rc) {
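
As a rough illustration of the rule described above, the following
userspace sketch models which board states are waited for after each
reset method.  The names plan_reset, struct reset_plan and
DOORBELL_SETTLE_MSECS are hypothetical and do not appear in hpsa.c; the
sketch also ignores the HPSA_POST_RESET_PAUSE_MSECS pause, which applies
on both paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch only; not taken from hpsa.c. */
enum wait_step {
	WAIT_NOT_READY = 1 << 0,	/* watch for "ready" -> "not ready" */
	WAIT_READY     = 1 << 1		/* watch for the board to come back */
};

#define DOORBELL_SETTLE_MSECS 5000	/* the 5 second delay from the patch */

struct reset_plan {
	unsigned int settle_msecs;	/* delay before touching the board */
	unsigned int waits;		/* bitmask of enum wait_step */
};

static struct reset_plan plan_reset(bool use_doorbell)
{
	/* The final wait for "ready" happens on both paths. */
	struct reset_plan p = { 0, WAIT_READY };

	if (use_doorbell)
		/* "not ready" passes unseen during the delay, so skip it. */
		p.settle_msecs = DOORBELL_SETTLE_MSECS;
	else
		p.waits |= WAIT_NOT_READY;
	return p;
}
```

With the doorbell method only the final "ready" state is checked; the
PCI power state method still checks both transitions.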


* [PATCH 03/10] hpsa: do not discard scsi status on aborted commands
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 01/10] hpsa: do not attempt to flush the cache on locked up controllers Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 02/10] hpsa: add 5 second delay after doorbell reset Stephen M. Cameron
@ 2013-09-23 18:33 ` Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 04/10] hpsa: remove unneeded include of seq_file.h Stephen M. Cameron
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

We inadvertently discarded the scsi status for aborted commands.
For some commands (e.g. reads from tape drives) these can't be retried,
and if we discarded the scsi status, the scsi mid layer couldn't notice
anything was wrong and the error was not reported.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: stable@vger.kernel.org
---
 drivers/scsi/hpsa.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index b63af55..3e45090 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1289,7 +1289,7 @@ static void complete_scsi_command(struct CommandList *cp)
 					"has check condition: aborted command: "
 					"ASC: 0x%x, ASCQ: 0x%x\n",
 					cp, asc, ascq);
-				cmd->result = DID_SOFT_ERROR << 16;
+				cmd->result |= DID_SOFT_ERROR << 16;
 				break;
 			}
 			/* Must be some other type of check condition */
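
The effect of the one-character change can be shown with a small
userspace sketch.  It assumes the result word layout used by the SCSI
mid layer at the time (SCSI status in bits 0-7, host byte in bits
16-23) and that the status byte has already been stored in the result
before this point is reached.  The helper names are hypothetical; the
two constants carry the values from the kernel's SCSI headers.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the kernel's SCSI headers. */
#define DID_SOFT_ERROR			0x0b
#define SAM_STAT_CHECK_CONDITION	0x02

/* Old behaviour: plain assignment wipes the status byte. */
static uint32_t result_overwrite(uint32_t result)
{
	result = (uint32_t)DID_SOFT_ERROR << 16;
	return result;
}

/* Patched behaviour: OR-ing in the host byte keeps the status byte. */
static uint32_t result_merge(uint32_t result)
{
	result |= (uint32_t)DID_SOFT_ERROR << 16;
	return result;
}

/* Hypothetical accessors for the two fields of interest. */
static uint8_t status_byte(uint32_t result)
{
	return result & 0xff;
}

static uint8_t host_byte(uint32_t result)
{
	return (result >> 16) & 0xff;
}
```

Starting from a result that holds only CHECK CONDITION, the overwrite
variant reports a status of 0 while the merge variant still reports
CHECK CONDITION alongside DID_SOFT_ERROR.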


* [PATCH 04/10] hpsa: remove unneeded include of seq_file.h
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (2 preceding siblings ...)
  2013-09-23 18:33 ` [PATCH 03/10] hpsa: do not discard scsi status on aborted commands Stephen M. Cameron
@ 2013-09-23 18:33 ` Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 05/10] hpsa: fix memory leak in CCISS_BIG_PASSTHRU ioctl Stephen M. Cameron
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Scott Teel <scott.teel@hp.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 3e45090..411aef2 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -29,7 +29,6 @@
 #include <linux/delay.h>
 #include <linux/fs.h>
 #include <linux/timer.h>
-#include <linux/seq_file.h>
 #include <linux/init.h>
 #include <linux/spinlock.h>
 #include <linux/compat.h>


* [PATCH 05/10] hpsa: fix memory leak in CCISS_BIG_PASSTHRU ioctl
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (3 preceding siblings ...)
  2013-09-23 18:33 ` [PATCH 04/10] hpsa: remove unneeded include of seq_file.h Stephen M. Cameron
@ 2013-09-23 18:33 ` Stephen M. Cameron
  2013-09-23 18:33 ` [PATCH 06/10] hpsa: add MSA 2040 to list of external target devices Stephen M. Cameron
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

We were leaking a command buffer if a DMA mapping error was
encountered in the CCISS_BIG_PASSTHRU ioctl.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 411aef2..f2ef778 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3170,7 +3170,7 @@ static int hpsa_big_passthru_ioctl(struct ctlr_info *h, void __user *argp)
 				hpsa_pci_unmap(h->pdev, c, i,
 					PCI_DMA_BIDIRECTIONAL);
 				status = -ENOMEM;
-				goto cleanup1;
+				goto cleanup0;
 			}
 			c->SG[i].Addr.lower = temp64.val32.lower;
 			c->SG[i].Addr.upper = temp64.val32.upper;
@@ -3186,24 +3186,23 @@ static int hpsa_big_passthru_ioctl(struct ctlr_info *h, void __user *argp)
 	/* Copy the error information out */
 	memcpy(&ioc->error_info, c->err_info, sizeof(ioc->error_info));
 	if (copy_to_user(argp, ioc, sizeof(*ioc))) {
-		cmd_special_free(h, c);
 		status = -EFAULT;
-		goto cleanup1;
+		goto cleanup0;
 	}
 	if (ioc->Request.Type.Direction == XFER_READ && ioc->buf_size > 0) {
 		/* Copy the data out of the buffer we created */
 		BYTE __user *ptr = ioc->buf;
 		for (i = 0; i < sg_used; i++) {
 			if (copy_to_user(ptr, buff[i], buff_size[i])) {
-				cmd_special_free(h, c);
 				status = -EFAULT;
-				goto cleanup1;
+				goto cleanup0;
 			}
 			ptr += buff_size[i];
 		}
 	}
-	cmd_special_free(h, c);
 	status = 0;
+cleanup0:
+	cmd_special_free(h, c);
 cleanup1:
 	if (buff) {
 		for (i = 0; i < sg_used; i++)
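
The cleanup0/cleanup1 ladder can be sketched in userspace as follows.
This is only a model of the pattern, not the driver code: the tracked
allocator, the counters and passthru_sketch are hypothetical, and the
two flags stand in for a failed command allocation and a failed DMA
mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counters so the sketch can observe leaks. */
static int live_cmds;
static int live_bufs;

static void *tracked_alloc(int *live)
{
	void *p = malloc(16);

	if (p)
		(*live)++;
	return p;
}

static void tracked_free(void *p, int *live)
{
	if (p) {
		free(p);
		(*live)--;
	}
}

static int passthru_sketch(int fail_cmd_alloc, int fail_dma_map)
{
	int status;
	void *buf = NULL, *cmd = NULL;

	buf = tracked_alloc(&live_bufs);
	if (!buf) {
		status = -ENOMEM;
		goto cleanup1;
	}
	cmd = fail_cmd_alloc ? NULL : tracked_alloc(&live_cmds);
	if (!cmd) {
		status = -ENOMEM;
		goto cleanup1;	/* no command yet, skip cleanup0 */
	}
	if (fail_dma_map) {
		status = -ENOMEM;
		goto cleanup0;	/* command exists and must be freed */
	}
	status = 0;
cleanup0:
	tracked_free(cmd, &live_cmds);
cleanup1:
	tracked_free(buf, &live_bufs);
	return status;
}
```

Every exit taken after the command is allocated passes through
cleanup0, so the command is freed exactly once and the success path
needs no separate free call.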


* [PATCH 06/10] hpsa: add MSA 2040 to list of external target devices
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (4 preceding siblings ...)
  2013-09-23 18:33 ` [PATCH 05/10] hpsa: fix memory leak in CCISS_BIG_PASSTHRU ioctl Stephen M. Cameron
@ 2013-09-23 18:33 ` Stephen M. Cameron
  2013-09-23 18:34 ` [PATCH 07/10] hpsa: hide logical drives with format in progress from linux Stephen M. Cameron
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:33 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Signed-off-by: Scott Teel <scott.teel@hp.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index f2ef778..b7f405f 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1782,6 +1782,7 @@ static unsigned char *ext_target_model[] = {
 	"MSA2312",
 	"MSA2324",
 	"P2000 G3 SAS",
+	"MSA 2040 SAS",
 	NULL,
 };
 


* [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (5 preceding siblings ...)
  2013-09-23 18:33 ` [PATCH 06/10] hpsa: add MSA 2040 to list of external target devices Stephen M. Cameron
@ 2013-09-23 18:34 ` Stephen M. Cameron
  2013-09-27 13:22   ` Tomas Henzl
  2013-09-23 18:34 ` [PATCH 08/10] hpsa: bring logical drives online when format completes Stephen M. Cameron
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:34 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

SCSI mid layer doesn't seem to handle logical drives undergoing format
very well.  scsi_add_device on such devices seems to result in hitting
those devices with a TUR at a rate of 3Hz for a while, transitioning
to hitting them with a READ(10) at a much higher rate indefinitely,
and at boot time, this prevents the system from coming up.  If we
do not expose such devices to the kernel, it isn't bothered by them.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/scsi/hpsa.h |    1 +
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index b7f405f..38e3af4 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
 	for (i = 0; i < nsds; i++) {
 		if (!sd[i]) /* if already added above. */
 			continue;
+
+		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
+		 * as the SCSI mid-layer does not handle such devices well.
+		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
+		 * at 160Hz, and prevents the system from coming up.
+		 */
+		if (sd[i]->format_in_progress) {
+			dev_info(&h->pdev->dev,
+				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
+				h->scsi_host->host_no,
+				sd[i]->bus, sd[i]->target, sd[i]->lun);
+			continue;
+		}
+
 		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
 					h->ndevices, &entry);
 		if (device_change == DEVICE_NOT_FOUND) {
@@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
 	device->lun = lun;
 }
 
+static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
+		unsigned char scsi3addr[])
+{
+	struct CommandList *c;
+	unsigned char *sense, sense_key, asc, ascq;
+#define ASC_LUN_NOT_READY 0x04
+#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
+	unsigned char rc;
+
+	c = cmd_special_alloc(h);
+	if (!c)
+		return 0;
+	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
+	hpsa_scsi_do_simple_cmd_core(h, c);
+	sense = c->err_info->SenseInfo;
+	sense_key = sense[2];
+	asc = sense[12];
+	ascq = sense[13];
+	rc = (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
+		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
+		sense_key == NOT_READY &&
+		asc == ASC_LUN_NOT_READY &&
+		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS);
+	/* Free the command on both paths so it is not leaked. */
+	cmd_special_free(h, c);
+	return rc;
+}
+
 static int hpsa_update_device_info(struct ctlr_info *h,
 	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
 	unsigned char *is_OBDR_device)
@@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
 		sizeof(this_device->device_id));
 
 	if (this_device->devtype == TYPE_DISK &&
-		is_logical_dev_addr_mode(scsi3addr))
+		is_logical_dev_addr_mode(scsi3addr)) {
 		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
-	else
+		this_device->format_in_progress =
+			hpsa_format_in_progress(h, scsi3addr);
+	} else {
 		this_device->raid_level = RAID_UNKNOWN;
+		this_device->format_in_progress = 0;
+	}
 
 	if (is_OBDR_device) {
 		/* See if this is a One-Button-Disaster-Recovery device
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index bc85e72..4fd0d45 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
 	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
 	unsigned char model[16];        /* bytes 16-31 of inquiry data */
 	unsigned char raid_level;	/* from inquiry page 0xC1 */
+	unsigned char format_in_progress;
 };
 
 struct reply_pool {
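
The sense data test used to detect a format in progress can be sketched
in userspace like this.  It assumes fixed-format sense data (sense key
in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13).  The
function name is hypothetical, and the length check, response code
check and sense key mask are additions made for the sketch that the
patch itself does not perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_READY				0x02	/* SCSI sense key */
#define ASC_LUN_NOT_READY			0x04
#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS	0x04

/*
 * Return true if fixed-format sense data reports
 * NOT READY, FORMAT IN PROGRESS (02h / 04h / 04h).
 */
static bool sense_is_format_in_progress(const uint8_t *sense, size_t len)
{
	uint8_t response_code;

	if (len < 14)		/* need bytes 0..13 */
		return false;
	response_code = sense[0] & 0x7f;
	if (response_code != 0x70 && response_code != 0x71)
		return false;	/* not fixed-format sense data */
	return (sense[2] & 0x0f) == NOT_READY &&
	       sense[12] == ASC_LUN_NOT_READY &&
	       sense[13] == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS;
}
```

A buffer carrying 02h/04h/04h matches, while a different ASCQ under the
same ASC (for example 01h, becoming ready) does not.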


* [PATCH 08/10] hpsa: bring logical drives online when format completes
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (6 preceding siblings ...)
  2013-09-23 18:34 ` [PATCH 07/10] hpsa: hide logical drives with format in progress from linux Stephen M. Cameron
@ 2013-09-23 18:34 ` Stephen M. Cameron
  2013-09-23 18:34 ` [PATCH 09/10] hpsa: cap CCISS_PASSTHRU at 20 concurrent commands Stephen M. Cameron
  2013-09-23 18:34 ` [PATCH 10/10] hpsa: prevent stalled i/o Stephen M. Cameron
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:34 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Now that the driver is hiding logical drives which are
undergoing low level format (e.g. drive erase or rapid parity
initialization) from the OS, it should bring those drives online
when the operation completes.  We poll with test unit ready
every 120 seconds (OFFLINE_DEVICE_POLL_INTERVAL) to determine when
the drives become ready.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |  128 +++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/hpsa.h |   13 +++++
 2 files changed, 140 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 38e3af4..198288d 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -223,6 +223,8 @@ static int hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id);
 static int hpsa_wait_for_board_state(struct pci_dev *pdev, void __iomem *vaddr,
 				     int wait_for_ready);
 static inline void finish_cmd(struct CommandList *c);
+static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
+		unsigned char scsi3addr[]);
 #define BOARD_NOT_READY 0
 #define BOARD_READY 1
 
@@ -946,6 +948,112 @@ static int hpsa_scsi_find_entry(struct hpsa_scsi_dev_t *needle,
 	return DEVICE_NOT_FOUND;
 }
 
+#define OFFLINE_DEVICE_POLL_INTERVAL (120 * HZ)
+static int hpsa_offline_device_thread(void *v)
+{
+	struct ctlr_info *h = v;
+	unsigned long flags;
+	struct offline_device_entry *d;
+	unsigned char need_rescan = 0;
+	struct list_head *this, *tmp;
+
+	while (1) {
+		schedule_timeout_interruptible(OFFLINE_DEVICE_POLL_INTERVAL);
+		if (kthread_should_stop())
+			break;
+
+		/* Check if any of the offline devices have become ready */
+		spin_lock_irqsave(&h->offline_device_lock, flags);
+		list_for_each_safe(this, tmp, &h->offline_device_list) {
+			d = list_entry(this, struct offline_device_entry,
+					offline_list);
+			spin_unlock_irqrestore(&h->offline_device_lock, flags);
+			if (!hpsa_format_in_progress(h, d->scsi3addr)) {
+				need_rescan = 1;
+				goto do_rescan;
+			}
+			spin_lock_irqsave(&h->offline_device_lock, flags);
+		}
+		spin_unlock_irqrestore(&h->offline_device_lock, flags);
+	}
+
+do_rescan:
+
+	/* Remove all entries from the list and rescan and exit this thread.
+	 * If there are still offline devices, the rescan will make a new list
+	 * and create a new offline device monitor thread.
+	 */
+	spin_lock_irqsave(&h->offline_device_lock, flags);
+	list_for_each_safe(this, tmp, &h->offline_device_list) {
+		d = list_entry(this, struct offline_device_entry, offline_list);
+		list_del_init(this);
+		kfree(d);
+	}
+	h->offline_device_monitor = NULL;
+	h->offline_device_thread_state = OFFLINE_DEVICE_THREAD_STOPPED;
+	spin_unlock_irqrestore(&h->offline_device_lock, flags);
+	if (need_rescan)
+		hpsa_scan_start(h->scsi_host);
+	return 0;
+}
+
+static void hpsa_monitor_offline_device(struct ctlr_info *h,
+					unsigned char scsi3addr[])
+{
+	struct offline_device_entry *device;
+	unsigned long flags;
+
+	/* Check to see if device is already on the list */
+	spin_lock_irqsave(&h->offline_device_lock, flags);
+	list_for_each_entry(device, &h->offline_device_list, offline_list) {
+		if (memcmp(device->scsi3addr, scsi3addr,
+				sizeof(device->scsi3addr)) == 0) {
+			spin_unlock_irqrestore(&h->offline_device_lock, flags);
+			return;
+		}
+	}
+	spin_unlock_irqrestore(&h->offline_device_lock, flags);
+
+	/* Device is not on the list, add it. */
+	device = kmalloc(sizeof(*device), GFP_KERNEL);
+	if (!device) {
+		dev_warn(&h->pdev->dev, "out of memory in %s\n", __func__);
+		return;
+	}
+	memcpy(device->scsi3addr, scsi3addr, sizeof(device->scsi3addr));
+	spin_lock_irqsave(&h->offline_device_lock, flags);
+	list_add_tail(&device->offline_list, &h->offline_device_list);
+	if (h->offline_device_thread_state == OFFLINE_DEVICE_THREAD_STOPPED) {
+		h->offline_device_thread_state = OFFLINE_DEVICE_THREAD_RUNNING;
+		spin_unlock_irqrestore(&h->offline_device_lock, flags);
+		h->offline_device_monitor =
+			kthread_run(hpsa_offline_device_thread, h, HPSA "-odm");
+		spin_lock_irqsave(&h->offline_device_lock, flags);
+	}
+	if (!h->offline_device_monitor) {
+		dev_warn(&h->pdev->dev, "failed to start offline device monitor thread.\n");
+		h->offline_device_thread_state = OFFLINE_DEVICE_THREAD_STOPPED;
+	}
+	spin_unlock_irqrestore(&h->offline_device_lock, flags);
+}
+
+static void stop_offline_device_monitor(struct ctlr_info *h)
+{
+	unsigned long flags;
+	int stop_thread;
+
+	spin_lock_irqsave(&h->offline_device_lock, flags);
+	stop_thread = (h->offline_device_thread_state ==
+				OFFLINE_DEVICE_THREAD_RUNNING);
+	if (stop_thread)
+		/* STOPPING state prevents new thread from starting. */
+		h->offline_device_thread_state =
+				OFFLINE_DEVICE_THREAD_STOPPING;
+	spin_unlock_irqrestore(&h->offline_device_lock, flags);
+	if (stop_thread)
+		kthread_stop(h->offline_device_monitor);
+}
+
 static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
 	struct hpsa_scsi_dev_t *sd[], int nsds)
 {
@@ -1018,7 +1126,10 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
 		 */
 		if (sd[i]->format_in_progress) {
 			dev_info(&h->pdev->dev,
-				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
+				"c%db%dt%dl%d: Logical drive parity initialization, erase or format in progress\n",
+				h->scsi_host->host_no,
+				sd[i]->bus, sd[i]->target, sd[i]->lun);
+			dev_info(&h->pdev->dev, "c%db%dt%dl%d: temporarily offline\n",
 				h->scsi_host->host_no,
 				sd[i]->bus, sd[i]->target, sd[i]->lun);
 			continue;
@@ -1042,6 +1153,17 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
 	}
 	spin_unlock_irqrestore(&h->devlock, flags);
 
+	/* Monitor devices which are NOT READY, FORMAT IN PROGRESS to be
+	 * brought online later. This must be done without holding h->devlock,
+	 * so don't touch h->dev[]
+	 */
+	for (i = 0; i < nsds; i++) {
+		if (!sd[i]) /* if already added above. */
+			continue;
+		if (sd[i]->format_in_progress)
+			hpsa_monitor_offline_device(h, sd[i]->scsi3addr);
+	}
+
 	/* Don't notify scsi mid layer of any changes the first time through
 	 * (or if there are no changes) scsi_scan_host will do it later the
 	 * first time through.
@@ -4879,8 +5001,10 @@ reinit_after_soft_reset:
 	h->intr_mode = hpsa_simple_mode ? SIMPLE_MODE_INT : PERF_MODE_INT;
 	INIT_LIST_HEAD(&h->cmpQ);
 	INIT_LIST_HEAD(&h->reqQ);
+	INIT_LIST_HEAD(&h->offline_device_list);
 	spin_lock_init(&h->lock);
 	spin_lock_init(&h->scan_lock);
+	spin_lock_init(&h->offline_device_lock);
 	rc = hpsa_pci_init(h);
 	if (rc != 0)
 		goto clean1;
@@ -4888,6 +5012,7 @@ reinit_after_soft_reset:
 	sprintf(h->devname, HPSA "%d", number_of_controllers);
 	h->ctlr = number_of_controllers;
 	number_of_controllers++;
+	h->offline_device_thread_state = OFFLINE_DEVICE_THREAD_STOPPED;
 
 	/* configure PCI DMA stuff */
 	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
@@ -5066,6 +5191,7 @@ static void hpsa_remove_one(struct pci_dev *pdev)
 	}
 	h = pci_get_drvdata(pdev);
 	stop_controller_lockup_detector(h);
+	stop_offline_device_monitor(h);
 	hpsa_unregister_scsi(h);	/* unhook from SCSI subsystem */
 	hpsa_shutdown(pdev);
 	iounmap(h->vaddr);
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 4fd0d45..4953fe3 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -155,7 +155,20 @@ struct ctlr_info {
 #define HPSATMF_LOG_QRY_TASK    (1 << 23)
 #define HPSATMF_LOG_QRY_TSET    (1 << 24)
 #define HPSATMF_LOG_QRY_ASYNC   (1 << 25)
+	spinlock_t offline_device_lock;
+	struct list_head offline_device_list;
+	struct task_struct *offline_device_monitor;
+	unsigned char offline_device_thread_state;
+#define OFFLINE_DEVICE_THREAD_STOPPED 0
+#define OFFLINE_DEVICE_THREAD_STOPPING 1
+#define OFFLINE_DEVICE_THREAD_RUNNING 2
 };
+
+struct offline_device_entry {
+	unsigned char scsi3addr[8];
+	struct list_head offline_list;
+};
+
 #define HPSA_ABORT_MSG 0
 #define HPSA_DEVICE_RESET_MSG 1
 #define HPSA_RESET_TYPE_CONTROLLER 0x00
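
Two pieces of bookkeeping in this patch can be modelled in userspace:
adding an address to the offline list only once, and the thread state
transitions that keep a new monitor from starting while the old one is
being stopped.  The sketch below is single-threaded and leaves out the
spinlock and the kthread entirely.  The fixed-size array, MAX_OFFLINE
and all function names are hypothetical; only the three state values
mirror the OFFLINE_DEVICE_THREAD_* defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OFFLINE 16		/* hypothetical bound for the sketch */

struct offline_set {
	unsigned char addr[MAX_OFFLINE][8];	/* 8-byte scsi3addr values */
	int count;
};

/* Returns true if addr was newly added, false if present or set is full. */
static bool offline_set_add(struct offline_set *s, const unsigned char addr[8])
{
	int i;

	for (i = 0; i < s->count; i++)
		if (memcmp(s->addr[i], addr, 8) == 0)
			return false;
	if (s->count == MAX_OFFLINE)
		return false;
	memcpy(s->addr[s->count++], addr, 8);
	return true;
}

enum monitor_state {
	MONITOR_STOPPED  = 0,
	MONITOR_STOPPING = 1,
	MONITOR_RUNNING  = 2
};

/* A monitor may only be started from the STOPPED state. */
static bool monitor_try_start(enum monitor_state *st)
{
	if (*st != MONITOR_STOPPED)
		return false;
	*st = MONITOR_RUNNING;
	return true;
}

/* Returns true if a running monitor was found and now needs stopping. */
static bool monitor_request_stop(enum monitor_state *st)
{
	if (*st != MONITOR_RUNNING)
		return false;
	*st = MONITOR_STOPPING;	/* blocks new starts until fully stopped */
	return true;
}
```

The STOPPING state is what prevents a rescan from starting a second
monitor while the first is being torn down during driver removal.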


* [PATCH 09/10] hpsa: cap CCISS_PASSTHRU at 20 concurrent commands.
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (7 preceding siblings ...)
  2013-09-23 18:34 ` [PATCH 08/10] hpsa: bring logical drives online when format completes Stephen M. Cameron
@ 2013-09-23 18:34 ` Stephen M. Cameron
  2013-09-23 18:34 ` [PATCH 10/10] hpsa: prevent stalled i/o Stephen M. Cameron
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:34 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

Cap CCISS_BIG_PASSTHRU as well.  If an attempt is made
to exceed this, ioctl() will return -1 with errno == EAGAIN.

This is to prevent a userland program from exhausting all of
pci_alloc_consistent memory.  I've only seen this problem when
running a special test program designed to provoke it.  20
concurrent commands via the passthru ioctls (not counting SG_IO)
should be more than enough.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 drivers/scsi/hpsa.h |    5 +++++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 198288d..1f6809b 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3390,6 +3390,36 @@ static void check_ioctl_unit_attention(struct ctlr_info *h,
 			c->err_info->ScsiStatus != SAM_STAT_CHECK_CONDITION)
 		(void) check_for_unit_attention(h, c);
 }
+
+static int increment_passthru_count(struct ctlr_info *h)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&h->passthru_count_lock, flags);
+	if (h->passthru_count >= HPSA_MAX_CONCURRENT_PASSTHRUS) {
+		spin_unlock_irqrestore(&h->passthru_count_lock, flags);
+		return -1;
+	}
+	h->passthru_count++;
+	spin_unlock_irqrestore(&h->passthru_count_lock, flags);
+	return 0;
+}
+
+static void decrement_passthru_count(struct ctlr_info *h)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&h->passthru_count_lock, flags);
+	if (h->passthru_count <= 0) {
+		spin_unlock_irqrestore(&h->passthru_count_lock, flags);
+		/* not expecting to get here. */
+		dev_warn(&h->pdev->dev, "Bug detected, passthru_count seems to be incorrect.\n");
+		return;
+	}
+	h->passthru_count--;
+	spin_unlock_irqrestore(&h->passthru_count_lock, flags);
+}
+
 /*
  * ioctl
  */
@@ -3397,6 +3427,7 @@ static int hpsa_ioctl(struct scsi_device *dev, int cmd, void *arg)
 {
 	struct ctlr_info *h;
 	void __user *argp = (void __user *)arg;
+	int rc;
 
 	h = sdev_to_hba(dev);
 
@@ -3411,9 +3442,17 @@ static int hpsa_ioctl(struct scsi_device *dev, int cmd, void *arg)
 	case CCISS_GETDRIVVER:
 		return hpsa_getdrivver_ioctl(h, argp);
 	case CCISS_PASSTHRU:
-		return hpsa_passthru_ioctl(h, argp);
+		if (increment_passthru_count(h))
+			return -EAGAIN;
+		rc = hpsa_passthru_ioctl(h, argp);
+		decrement_passthru_count(h);
+		return rc;
 	case CCISS_BIG_PASSTHRU:
-		return hpsa_big_passthru_ioctl(h, argp);
+		if (increment_passthru_count(h))
+			return -EAGAIN;
+		rc = hpsa_big_passthru_ioctl(h, argp);
+		decrement_passthru_count(h);
+		return rc;
 	default:
 		return -ENOTTY;
 	}
@@ -5005,6 +5044,7 @@ reinit_after_soft_reset:
 	spin_lock_init(&h->lock);
 	spin_lock_init(&h->scan_lock);
 	spin_lock_init(&h->offline_device_lock);
+	spin_lock_init(&h->passthru_count_lock);
 	rc = hpsa_pci_init(h);
 	if (rc != 0)
 		goto clean1;
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 4953fe3..839c533 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -115,6 +115,11 @@ struct ctlr_info {
 	struct TransTable_struct *transtable;
 	unsigned long transMethod;
 
+	/* cap concurrent passthrus at some reasonable maximum */
+#define HPSA_MAX_CONCURRENT_PASSTHRUS (20)
+	spinlock_t passthru_count_lock; /* protects passthru_count */
+	int passthru_count;
+
 	/*
 	 * Performant mode completion buffers
 	 */
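
The cap itself is a bounded counter with a try-enter and a leave
operation.  The userspace sketch below shows the same idea, but note
that it swaps the technique: the patch protects a plain int with a
spinlock, whereas this sketch uses C11 atomics so that it needs no
locking primitives.  The names passthru_enter, passthru_exit and
MAX_CONCURRENT_PASSTHRUS are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define MAX_CONCURRENT_PASSTHRUS 20	/* same limit as the patch */

static atomic_int passthru_count;	/* zero-initialized */

/* Try to take a slot; returns 0 on success, -EAGAIN when at the cap. */
static int passthru_enter(void)
{
	int cur = atomic_load(&passthru_count);

	do {
		if (cur >= MAX_CONCURRENT_PASSTHRUS)
			return -EAGAIN;
		/* On failure, cur is reloaded and the cap is rechecked. */
	} while (!atomic_compare_exchange_weak(&passthru_count, &cur, cur + 1));
	return 0;
}

/* Give the slot back; must pair with a successful passthru_enter(). */
static void passthru_exit(void)
{
	int prev = atomic_fetch_sub(&passthru_count, 1);

	assert(prev > 0);	/* an unbalanced exit is a bug */
}
```

A caller would wrap each passthru request in passthru_enter() and
passthru_exit(), returning the -EAGAIN to userland when the cap is hit.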


* [PATCH 10/10] hpsa: prevent stalled i/o
  2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
                   ` (8 preceding siblings ...)
  2013-09-23 18:34 ` [PATCH 09/10] hpsa: cap CCISS_PASSTHRU at 20 concurrent commands Stephen M. Cameron
@ 2013-09-23 18:34 ` Stephen M. Cameron
  9 siblings, 0 replies; 21+ messages in thread
From: Stephen M. Cameron @ 2013-09-23 18:34 UTC (permalink / raw)
  To: james.bottomley; +Cc: stephenmcameron, mikem, thenzl, linux-scsi, scott.teel

From: Stephen M. Cameron <scameron@beardog.cce.hp.com>

If a fifo full condition is encountered, i/o requests will stack
up in the h->reqQ queue.  The only thing which empties this queue
is start_io, which only gets called when new i/o requests come in.
If none are forthcoming, i/o in h->reqQ will be stalled.

To fix this, record each fifo full condition when it is
encountered.  When a command completes, the interrupt handler
checks whether a fifo full condition was recently encountered
and, if so, calls start_io to prevent i/o's in h->reqQ from
getting stuck.

I've only ever seen this problem occur when running specialized
test programs that pound on the CCISS_PASSTHRU ioctl.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   32 ++++++++++++++++++++++++++++++--
 drivers/scsi/hpsa.h |    1 +
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 1f6809b..682f431 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3651,9 +3651,11 @@ static void start_io(struct ctlr_info *h)
 		c = list_entry(h->reqQ.next, struct CommandList, list);
 		/* can't do anything if fifo is full */
 		if ((h->access.fifo_full(h))) {
+			h->fifo_recently_full = 1;
 			dev_warn(&h->pdev->dev, "fifo full\n");
 			break;
 		}
+		h->fifo_recently_full = 0;
 
 		/* Get the first entry from the Request Q */
 		removeQ(c);
@@ -3707,15 +3709,41 @@ static inline int bad_tag(struct ctlr_info *h, u32 tag_index,
 static inline void finish_cmd(struct CommandList *c)
 {
 	unsigned long flags;
+	int io_may_be_stalled = 0;
+	struct ctlr_info *h = c->h;
 
-	spin_lock_irqsave(&c->h->lock, flags);
+	spin_lock_irqsave(&h->lock, flags);
 	removeQ(c);
-	spin_unlock_irqrestore(&c->h->lock, flags);
+
+	/*
+	 * Check for possibly stalled i/o.
+	 *
+	 * If a fifo_full condition is encountered, requests will back up
+	 * in h->reqQ.  This queue is only emptied out by start_io which is
+	 * only called when a new i/o request comes in.  If no i/o's are
+	 * forthcoming, the i/o's in h->reqQ can get stuck.  So we call
+	 * start_io from here if we detect such a danger.
+	 *
+	 * Normally, we shouldn't hit this case, but pounding on the
+	 * CCISS_PASSTHRU ioctl can provoke it.  Only call start_io if
+	 * commands_outstanding is low.  We want to avoid calling
+	 * start_io from in here as much as possible, and esp. don't
+	 * want to get in a cycle where we call start_io every time
+	 * through here.
+	 */
+	if (unlikely(h->fifo_recently_full) &&
+		h->commands_outstanding < 5)
+		io_may_be_stalled = 1;
+
+	spin_unlock_irqrestore(&h->lock, flags);
+
 	dial_up_lockup_detection_on_fw_flash_complete(c->h, c);
 	if (likely(c->cmd_type == CMD_SCSI))
 		complete_scsi_command(c);
 	else if (c->cmd_type == CMD_IOCTL_PEND)
 		complete(c->waiting);
+	if (unlikely(io_may_be_stalled))
+		start_io(h);
 }
 
 static inline u32 hpsa_tag_contains_index(u32 tag)
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 839c533..bea2365 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -137,6 +137,7 @@ struct ctlr_info {
 	atomic_t firmware_flash_in_progress;
 	u32 lockup_detected;
 	struct list_head lockup_list;
+	u32 fifo_recently_full;
 	/* Address of h->q[x] is passed to intr handler to know which queue */
 	u8 q[MAX_REPLY_QUEUES];
 	u32 TMFSupportFlags; /* cache what task mgmt funcs are supported. */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-23 18:34 ` [PATCH 07/10] hpsa: hide logical drives with format in progress from linux Stephen M. Cameron
@ 2013-09-27 13:22   ` Tomas Henzl
  2013-09-27 13:34     ` scameron
  2013-09-27 19:11     ` scameron
  0 siblings, 2 replies; 21+ messages in thread
From: Tomas Henzl @ 2013-09-27 13:22 UTC (permalink / raw)
  To: Stephen M. Cameron
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel

On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>
> SCSI mid layer doesn't seem to handle logical drives undergoing format
> very well.  scsi_add_device on such devices seems to result in hitting
> those devices with a TUR at a rate of 3Hz for awhile, transitioning
> to hitting them with a READ(10) at a much higher rate indefinitely,
> and at boot time, this prevents the system from coming up.  If we
> do not expose such devices to the kernel, it isn't bothered by them.

Is the result of this patch that the drive is no longer visible to the user,
who then can't follow the formatting progress?
I think a better option is to fix the kernel to handle formatting devices better
or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting device.

Also maybe a cmd_special_free is missing - see below

Cheers, Tomas
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
---
 drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/scsi/hpsa.h |    1 +
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index b7f405f..38e3af4 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
 	for (i = 0; i < nsds; i++) {
 		if (!sd[i]) /* if already added above. */
 			continue;
+
+		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
+		 * as the SCSI mid-layer does not handle such devices well.
+		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
+		 * at 160Hz, and prevents the system from coming up.
+		 */
+		if (sd[i]->format_in_progress) {
+			dev_info(&h->pdev->dev,
+				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
+				h->scsi_host->host_no,
+				sd[i]->bus, sd[i]->target, sd[i]->lun);
+			continue;
+		}
+
 		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
 					h->ndevices, &entry);
 		if (device_change == DEVICE_NOT_FOUND) {
@@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
 	device->lun = lun;
 }
 
+static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
+		unsigned char scsi3addr[])
+{
+	struct CommandList *c;
+	unsigned char *sense, sense_key, asc, ascq;
+#define ASC_LUN_NOT_READY 0x04
+#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
+
+
+	c = cmd_special_alloc(h);
+	if (!c)
+		return 0;
+	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
+	hpsa_scsi_do_simple_cmd_core(h, c);
+	sense = c->err_info->SenseInfo;
+	sense_key = sense[2];
+	asc = sense[12];
+	ascq = sense[13];
+	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
+		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
+		sense_key == NOT_READY &&
+		asc == ASC_LUN_NOT_READY &&
+		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
+		return 1;
return^ without cmd_special_free

+	cmd_special_free(h, c);
+	return 0;
+}
+
 static int hpsa_update_device_info(struct ctlr_info *h,
 	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
 	unsigned char *is_OBDR_device)
@@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
 		sizeof(this_device->device_id));
 
 	if (this_device->devtype == TYPE_DISK &&
-		is_logical_dev_addr_mode(scsi3addr))
+		is_logical_dev_addr_mode(scsi3addr)) {
 		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
-	else
+		this_device->format_in_progress =
+			hpsa_format_in_progress(h, scsi3addr);
+	} else {
 		this_device->raid_level = RAID_UNKNOWN;
+		this_device->format_in_progress = 0;
+	}
 
 	if (is_OBDR_device) {
 		/* See if this is a One-Button-Disaster-Recovery device
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index bc85e72..4fd0d45 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
 	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
 	unsigned char model[16];        /* bytes 16-31 of inquiry data */
 	unsigned char raid_level;	/* from inquiry page 0xC1 */
+	unsigned char format_in_progress;
 };
 
 struct reply_pool {

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 13:22   ` Tomas Henzl
@ 2013-09-27 13:34     ` scameron
  2013-09-27 14:01       ` Tomas Henzl
  2013-10-10 16:25       ` scameron
  2013-09-27 19:11     ` scameron
  1 sibling, 2 replies; 21+ messages in thread
From: scameron @ 2013-09-27 13:34 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel,
	scameron

On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >
> > SCSI mid layer doesn't seem to handle logical drives undergoing format
> > very well.  scsi_add_device on such devices seems to result in hitting
> > those devices with a TUR at a rate of 3Hz for awhile, transitioning
> > to hitting them with a READ(10) at a much higher rate indefinitely,
> > and at boot time, this prevents the system from coming up.  If we
> > do not expose such devices to the kernel, it isn't bothered by them.
> 
> Is the result of this patch that the drive is no longer visible to the user,
> who then can't follow the formatting progress?

Yes (subsequent patch monitors the progress and brings the drive
online when it's ready).

> I think a better option is to fix the kernel to handle formatting devices better

Yeah, you're probably right. (This is what comes of writing code for all
the distros then forward porting to kernel.org code.  Grumble-grumble-management
grumble-grumble real-world problems.)

> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting
> device.

I don't think hpsa driver had any problem with the TURs or READs though,
they would be returned to the mid layer just fine (TUR returned sense data
indicating not ready, format in progress, I forget what the reads
returned, whatever the firmware filled in for the sense data, which
was reasonable), but the mid-layer was relentless and just never
really proceeded, iirc.

Since we were trying to make this work on existing OSes where fixing the
SCSI mid layer wasn't an option, we came up with this.

> 
> Also maybe a cmd_special_free is missing - see below

D'oh.  Ok, now that's just embarrassing.  Thanks.
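
A minimal sketch of one way to close that leak, written as a self-contained userspace model rather than as the actual follow-up patch: compute the result into a local and free the command on a single exit path. The model_* names and the struct are hypothetical stand-ins for cmd_special_alloc(), cmd_special_free() and struct CommandList. Masking the sense key with 0x0f assumes fixed-format sense data and is not something the posted patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_SENSE_LEN 32
#define MODEL_CHECK_CONDITION 0x02	/* SCSI status CHECK CONDITION */
#define MODEL_NOT_READY 0x02		/* sense key NOT READY */
#define ASC_LUN_NOT_READY 0x04
#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04

/* Hypothetical stand-in for struct CommandList and its err_info. */
struct model_cmd {
	unsigned char scsi_status;
	unsigned char sense[MODEL_SENSE_LEN];
};

static int model_live_cmds;	/* allocations not yet freed */

static struct model_cmd *model_cmd_alloc(void)
{
	struct model_cmd *c = calloc(1, sizeof(*c));

	if (c)
		model_live_cmds++;
	return c;
}

static void model_cmd_free(struct model_cmd *c)
{
	assert(model_live_cmds > 0);
	free(c);
	model_live_cmds--;
}

/* Return 1 if status and sense report NOT READY, FORMAT IN PROGRESS. */
static unsigned char model_format_in_progress(unsigned char status,
		const unsigned char *sense)
{
	struct model_cmd *c;
	unsigned char rc = 0;

	c = model_cmd_alloc();
	if (!c)
		return 0;
	/* Pretend a TEST UNIT READY completed with this status and sense. */
	c->scsi_status = status;
	memcpy(c->sense, sense, MODEL_SENSE_LEN);

	if (c->scsi_status == MODEL_CHECK_CONDITION &&
		(c->sense[2] & 0x0f) == MODEL_NOT_READY &&
		c->sense[12] == ASC_LUN_NOT_READY &&
		c->sense[13] == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
		rc = 1;

	model_cmd_free(c);	/* reached on both outcomes */
	return rc;
}
```

The allocation counter exists only so the model can show that nothing is left allocated after either outcome.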

-- steve

> 
> Cheers, Tomas
> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> ---
>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/scsi/hpsa.h |    1 +
>  2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index b7f405f..38e3af4 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
>  	for (i = 0; i < nsds; i++) {
>  		if (!sd[i]) /* if already added above. */
>  			continue;
> +
> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
> +		 * as the SCSI mid-layer does not handle such devices well.
> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
> +		 * at 160Hz, and prevents the system from coming up.
> +		 */
> +		if (sd[i]->format_in_progress) {
> +			dev_info(&h->pdev->dev,
> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
> +				h->scsi_host->host_no,
> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
> +			continue;
> +		}
> +
>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
>  					h->ndevices, &entry);
>  		if (device_change == DEVICE_NOT_FOUND) {
> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
>  	device->lun = lun;
>  }
>  
> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
> +		unsigned char scsi3addr[])
> +{
> +	struct CommandList *c;
> +	unsigned char *sense, sense_key, asc, ascq;
> +#define ASC_LUN_NOT_READY 0x04
> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
> +
> +
> +	c = cmd_special_alloc(h);
> +	if (!c)
> +		return 0;
> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
> +	hpsa_scsi_do_simple_cmd_core(h, c);
> +	sense = c->err_info->SenseInfo;
> +	sense_key = sense[2];
> +	asc = sense[12];
> +	ascq = sense[13];
> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
> +		sense_key == NOT_READY &&
> +		asc == ASC_LUN_NOT_READY &&
> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
> +		return 1;
> return^ without cmd_special_free
> 
> +	cmd_special_free(h, c);
> +	return 0;
> +}
> +
>  static int hpsa_update_device_info(struct ctlr_info *h,
>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
>  	unsigned char *is_OBDR_device)
> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
>  		sizeof(this_device->device_id));
>  
>  	if (this_device->devtype == TYPE_DISK &&
> -		is_logical_dev_addr_mode(scsi3addr))
> +		is_logical_dev_addr_mode(scsi3addr)) {
>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
> -	else
> +		this_device->format_in_progress =
> +			hpsa_format_in_progress(h, scsi3addr);
> +	} else {
>  		this_device->raid_level = RAID_UNKNOWN;
> +		this_device->format_in_progress = 0;
> +	}
>  
>  	if (is_OBDR_device) {
>  		/* See if this is a One-Button-Disaster-Recovery device
> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> index bc85e72..4fd0d45 100644
> --- a/drivers/scsi/hpsa.h
> +++ b/drivers/scsi/hpsa.h
> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
> +	unsigned char format_in_progress;
>  };
>  
>  struct reply_pool {
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 13:34     ` scameron
@ 2013-09-27 14:01       ` Tomas Henzl
  2013-09-27 14:41         ` scameron
  2013-10-10 16:25       ` scameron
  1 sibling, 1 reply; 21+ messages in thread
From: Tomas Henzl @ 2013-09-27 14:01 UTC (permalink / raw)
  To: scameron; +Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel

On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
> On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
>> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
>>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>>>
>>> SCSI mid layer doesn't seem to handle logical drives undergoing format
>>> very well.  scsi_add_device on such devices seems to result in hitting
>>> those devices with a TUR at a rate of 3Hz for awhile, transitioning
>>> to hitting them with a READ(10) at a much higher rate indefinitely,
>>> and at boot time, this prevents the system from coming up.  If we
>>> do not expose such devices to the kernel, it isn't bothered by them.
>> Is the result of this patch that the drive is no longer visible to the user,
>> who then can't follow the formatting progress?
> Yes (subsequent patch monitors the progress and brings the drive
> online when it's ready).
>
>> I think a better option is to fix the kernel to handle formatting devices better
> Yeah, you're probably right. (This is what comes of writing code for all
> the distros then forward porting to kernel.org code.  Grumble-grumble-management
> grumble-grumble real-world problems.)
>
>> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting
>> device.
> I don't think hpsa driver had any problem with the TURs or READs though,
> they would be returned to the mid layer just fine (TUR returned sense data
> indicating not ready, format in progress, I forget what the reads
> returned, whatever the firmware filled in for the sense data, which
> was reasonable), but the mid-layer was relentless and just never
> really proceeded, iirc.
>
> Since we were trying to make this work on existing OSes where fixing the
> SCSI mid layer wasn't an option, we came up with this.

I'm actually glad that you care about existing OSes :)

Do you know whether the midlayer has similar problems with other drivers?

Tomas

>
>> Also maybe a cmd_special_free is missing - see below
> D'oh.  Ok, now that's just embarrassing.  Thanks.
>
> -- steve
>
>> Cheers, Tomas
>> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>> ---
>>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>>  drivers/scsi/hpsa.h |    1 +
>>  2 files changed, 49 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
>> index b7f405f..38e3af4 100644
>> --- a/drivers/scsi/hpsa.c
>> +++ b/drivers/scsi/hpsa.c
>> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
>>  	for (i = 0; i < nsds; i++) {
>>  		if (!sd[i]) /* if already added above. */
>>  			continue;
>> +
>> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
>> +		 * as the SCSI mid-layer does not handle such devices well.
>> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
>> +		 * at 160Hz, and prevents the system from coming up.
>> +		 */
>> +		if (sd[i]->format_in_progress) {
>> +			dev_info(&h->pdev->dev,
>> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
>> +				h->scsi_host->host_no,
>> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
>> +			continue;
>> +		}
>> +
>>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
>>  					h->ndevices, &entry);
>>  		if (device_change == DEVICE_NOT_FOUND) {
>> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
>>  	device->lun = lun;
>>  }
>>  
>> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
>> +		unsigned char scsi3addr[])
>> +{
>> +	struct CommandList *c;
>> +	unsigned char *sense, sense_key, asc, ascq;
>> +#define ASC_LUN_NOT_READY 0x04
>> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
>> +
>> +
>> +	c = cmd_special_alloc(h);
>> +	if (!c)
>> +		return 0;
>> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
>> +	hpsa_scsi_do_simple_cmd_core(h, c);
>> +	sense = c->err_info->SenseInfo;
>> +	sense_key = sense[2];
>> +	asc = sense[12];
>> +	ascq = sense[13];
>> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
>> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
>> +		sense_key == NOT_READY &&
>> +		asc == ASC_LUN_NOT_READY &&
>> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
>> +		return 1;
>> return^ without cmd_special_free
>>
>> +	cmd_special_free(h, c);
>> +	return 0;
>> +}
>> +
>>  static int hpsa_update_device_info(struct ctlr_info *h,
>>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
>>  	unsigned char *is_OBDR_device)
>> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
>>  		sizeof(this_device->device_id));
>>  
>>  	if (this_device->devtype == TYPE_DISK &&
>> -		is_logical_dev_addr_mode(scsi3addr))
>> +		is_logical_dev_addr_mode(scsi3addr)) {
>>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
>> -	else
>> +		this_device->format_in_progress =
>> +			hpsa_format_in_progress(h, scsi3addr);
>> +	} else {
>>  		this_device->raid_level = RAID_UNKNOWN;
>> +		this_device->format_in_progress = 0;
>> +	}
>>  
>>  	if (is_OBDR_device) {
>>  		/* See if this is a One-Button-Disaster-Recovery device
>> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
>> index bc85e72..4fd0d45 100644
>> --- a/drivers/scsi/hpsa.h
>> +++ b/drivers/scsi/hpsa.h
>> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
>>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
>>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
>>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
>> +	unsigned char format_in_progress;
>>  };
>>  
>>  struct reply_pool {
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 14:01       ` Tomas Henzl
@ 2013-09-27 14:41         ` scameron
  2013-09-27 14:58           ` Tomas Henzl
  2013-09-27 16:54           ` Douglas Gilbert
  0 siblings, 2 replies; 21+ messages in thread
From: scameron @ 2013-09-27 14:41 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel,
	scameron

On Fri, Sep 27, 2013 at 04:01:30PM +0200, Tomas Henzl wrote:
> On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
> > On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> >> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> >>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >>>
> >>> SCSI mid layer doesn't seem to handle logical drives undergoing format
> >>> very well.  scsi_add_device on such devices seems to result in hitting
> >>> those devices with a TUR at a rate of 3Hz for awhile, transitioning
> >>> to hitting them with a READ(10) at a much higher rate indefinitely,
> >>> and at boot time, this prevents the system from coming up.  If we
> >>> do not expose such devices to the kernel, it isn't bothered by them.
> >> Is the result of this patch that the drive is no longer visible to the user,
> >> who then can't follow the formatting progress?
> > Yes (subsequent patch monitors the progress and brings the drive
> > online when it's ready).
> >
> >> I think a better option is to fix the kernel to handle formatting devices better
> > Yeah, you're probably right. (This is what comes of writing code for all
> > the distros then forward porting to kernel.org code.  Grumble-grumble-management
> > grumble-grumble real-world problems.)
> >
> >> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting
> >> device.
> > I don't think hpsa driver had any problem with the TURs or READs though,
> > they would be returned to the mid layer just fine (TUR returned sense data
> > indicating not ready, format in progress, I forget what the reads
> > returned, whatever the firmware filled in for the sense data, which
> > was reasonable), but the mid-layer was relentless and just never
> > really proceeded, iirc.
> >
> > Since we were trying to make this work on existing OSes where fixing the
> > SCSI mid layer wasn't an option, we came up with this.
> 
> I'm actually glad that you care about existing OSes :)

And the pain of porting would be much the same regardless of
whether the port is forward or backward, I suppose.

> 
> Do you know whether the midlayer has similar problems with other drivers?

No, not sure.   One thing that's a bit unusual about hpsa is it uses
the scan_start and scan_finished members of scsi_host_template, so hpsa
does its own scanning, rather than let the midlayer do the scanning which
is due to Smart Array's weirdness around the vicinity of SCSI_REPORT_LUNS.

I suspect that a lld driver calling scsi_add_device() on something which
is NOT READY/FORMAT IN PROGRESS is what provokes the trouble.  Most drivers
do not call scsi_add_device() directly at all, so it's quite possible most
drivers do not experience such a problem. A few do call scsi_add_device()
directly, like ipr or pmcraid, so these might conceivably have a similar
problem.  

We ran into this problem with what we call "Rapid Parity Initialization", which
is what you get when the RAID controller leaves the logical volume in a NOT
READY/FORMAT IN PROGRESS state and devotes itself entirely to initializing
parity data and when that's done, then the volume starts acting normally.  

Initializing the parity data can take quite a long time (hours), but not as
long as initializing it on the fly under load, which, with very large,
relatively slow drives can take nigh on forever, hence the "rapid" parity
initialization moniker.  So, if those other RAID controllers don't have a
similar feature that produces a relatively long lived NOT READY/FORMAT IN
PROGRESS state, they may not bump into the problem.

It has been awhile since I've tried letting the driver call scsi_add_device()
on a device which is undergoing Rapid Parity Initialization, so I need to try
that with current code and see how it behaves.  I haven't thought about how to
fix it within the SCSI mid layer (presuming it still doesn't behave well)
since previously we only concerned ourselves with avoiding provoking the
undesirable behavior.

-- steve

> 
> Tomas
> 
> >
> >> Also maybe a cmd_special_free is missing - see below
> > D'oh.  Ok, now that's just embarrassing.  Thanks.
> >
> > -- steve
> >
> >> Cheers, Tomas
> >> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >> ---
> >>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
> >>  drivers/scsi/hpsa.h |    1 +
> >>  2 files changed, 49 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> >> index b7f405f..38e3af4 100644
> >> --- a/drivers/scsi/hpsa.c
> >> +++ b/drivers/scsi/hpsa.c
> >> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
> >>  	for (i = 0; i < nsds; i++) {
> >>  		if (!sd[i]) /* if already added above. */
> >>  			continue;
> >> +
> >> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
> >> +		 * as the SCSI mid-layer does not handle such devices well.
> >> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
> >> +		 * at 160Hz, and prevents the system from coming up.
> >> +		 */
> >> +		if (sd[i]->format_in_progress) {
> >> +			dev_info(&h->pdev->dev,
> >> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
> >> +				h->scsi_host->host_no,
> >> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
> >> +			continue;
> >> +		}
> >> +
> >>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
> >>  					h->ndevices, &entry);
> >>  		if (device_change == DEVICE_NOT_FOUND) {
> >> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
> >>  	device->lun = lun;
> >>  }
> >>  
> >> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
> >> +		unsigned char scsi3addr[])
> >> +{
> >> +	struct CommandList *c;
> >> +	unsigned char *sense, sense_key, asc, ascq;
> >> +#define ASC_LUN_NOT_READY 0x04
> >> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
> >> +
> >> +
> >> +	c = cmd_special_alloc(h);
> >> +	if (!c)
> >> +		return 0;
> >> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
> >> +	hpsa_scsi_do_simple_cmd_core(h, c);
> >> +	sense = c->err_info->SenseInfo;
> >> +	sense_key = sense[2];
> >> +	asc = sense[12];
> >> +	ascq = sense[13];
> >> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
> >> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
> >> +		sense_key == NOT_READY &&
> >> +		asc == ASC_LUN_NOT_READY &&
> >> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
> >> +		return 1;
> >> return^ without cmd_special_free
> >>
> >> +	cmd_special_free(h, c);
> >> +	return 0;
> >> +}
> >> +
> >>  static int hpsa_update_device_info(struct ctlr_info *h,
> >>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
> >>  	unsigned char *is_OBDR_device)
> >> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
> >>  		sizeof(this_device->device_id));
> >>  
> >>  	if (this_device->devtype == TYPE_DISK &&
> >> -		is_logical_dev_addr_mode(scsi3addr))
> >> +		is_logical_dev_addr_mode(scsi3addr)) {
> >>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
> >> -	else
> >> +		this_device->format_in_progress =
> >> +			hpsa_format_in_progress(h, scsi3addr);
> >> +	} else {
> >>  		this_device->raid_level = RAID_UNKNOWN;
> >> +		this_device->format_in_progress = 0;
> >> +	}
> >>  
> >>  	if (is_OBDR_device) {
> >>  		/* See if this is a One-Button-Disaster-Recovery device
> >> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> >> index bc85e72..4fd0d45 100644
> >> --- a/drivers/scsi/hpsa.h
> >> +++ b/drivers/scsi/hpsa.h
> >> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
> >>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
> >>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
> >>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
> >> +	unsigned char format_in_progress;
> >>  };
> >>  
> >>  struct reply_pool {
> >>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 14:41         ` scameron
@ 2013-09-27 14:58           ` Tomas Henzl
  2013-09-30 21:18             ` scameron
  2013-09-27 16:54           ` Douglas Gilbert
  1 sibling, 1 reply; 21+ messages in thread
From: Tomas Henzl @ 2013-09-27 14:58 UTC (permalink / raw)
  To: scameron; +Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel

On 09/27/2013 04:41 PM, scameron@beardog.cce.hp.com wrote:
> On Fri, Sep 27, 2013 at 04:01:30PM +0200, Tomas Henzl wrote:
>> On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
>>> On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
>>>> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
>>>>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>>>>>
>>>>> SCSI mid layer doesn't seem to handle logical drives undergoing format
>>>>> very well.  scsi_add_device on such devices seems to result in hitting
>>>>> those devices with a TUR at a rate of 3Hz for awhile, transitioning
>>>>> to hitting them with a READ(10) at a much higher rate indefinitely,
>>>>> and at boot time, this prevents the system from coming up.  If we
>>>>> do not expose such devices to the kernel, it isn't bothered by them.
>>>> Is the result of this patch that the drive is no longer visible to the user,
>>>> who then can't follow the formatting progress?
>>> Yes (subsequent patch monitors the progress and brings the drive
>>> online when it's ready).
>>>
>>>> I think a better option is to fix the kernel to handle formatting devices better
>>> Yeah, you're probably right. (This is what comes of writing code for all
>>> the distros then forward porting to kernel.org code.  Grumble-grumble-management
>>> grumble-grumble real-world problems.)
>>>
>>>> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting
>>>> device.
>>> I don't think hpsa driver had any problem with the TURs or READs though,
>>> they would be returned to the mid layer just fine (TUR returned sense data
>>> indicating not ready, format in progress, I forget what the reads
>>> returned, whatever the firmware filled in for the sense data, which
>>> was reasonable), but the mid-layer was relentless and just never
>>> really proceeded, iirc.
>>>
>>> Since we were trying to make this work on existing OSes where fixing the
>>> SCSI mid layer wasn't an option, we came up with this.
>> I'm actually glad that you care about existing OSes :)
> And the pain of porting would be much the same regardless of
> whether the port is forward or backward, I suppose.
>
>> Do you know whether the midlayer has similar problems with other drivers?
> No, not sure.   One thing that's a bit unusual about hpsa is it uses
> the scan_start and scan_finished members of scsi_host_template, so hpsa
> does its own scanning, rather than let the midlayer do the scanning which
> is due to Smart Array's weirdness around the vicinity of SCSI_REPORT_LUNS.
>
> I suspect that a lld driver calling scsi_add_device() on something which
> is NOT READY/FORMAT IN PROGRESS is what provokes the trouble.  Most drivers
> do not call scsi_add_device() directly at all, so it's quite possible most
> drivers do not experience such a problem. A few do call scsi_add_device()
> directly, like ipr or pmcraid, so these might conceivably have a similar
> problem.  
>
> We ran into this problem with what we call "Rapid Parity Initialization", which
> is what you get when the RAID controller leaves the logical volume in a NOT
> READY/FORMAT IN PROGRESS state and devotes itself entirely to initializing
> parity data and when that's done, then the volume starts acting normally.  
>
> Initializing the parity data can take quite a long time (hours), but not as
> long as initializing it on the fly under load, which, with very large,
> relatively slow drives can take nigh on forever, hence the "rapid" parity
> initialization moniker.  So, if those other RAID controllers don't have a
> similar feature that produces a relatively long lived NOT READY/FORMAT IN
> PROGRESS state, they may not bump into the problem.
>
> It has been awhile since I've tried letting the driver call scsi_add_device()
> on a device which is undergoing Rapid Parity Initialization, so I need to try
> that with current code and see how it behaves.  I haven't thought about how to
> fix it within the SCSI mid layer (presuming it still doesn't behave well)
> since previously we only concerned ourselves with avoiding provoking the
> undesirable behavior.
>
> -- steve

Thanks for the explanation. I hope I can look into this later. Sometime later. When my
real-world problems go away...

>
>> Tomas
>>
>>>> Also maybe a cmd_special_free is missing - see below
>>> D'oh.  Ok, now that's just embarrassing.  Thanks.
>>>
>>> -- steve
>>>
>>>> Cheers, Tomas
>>>> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>>>> ---
>>>>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>>>>  drivers/scsi/hpsa.h |    1 +
>>>>  2 files changed, 49 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
>>>> index b7f405f..38e3af4 100644
>>>> --- a/drivers/scsi/hpsa.c
>>>> +++ b/drivers/scsi/hpsa.c
>>>> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
>>>>  	for (i = 0; i < nsds; i++) {
>>>>  		if (!sd[i]) /* if already added above. */
>>>>  			continue;
>>>> +
>>>> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
>>>> +		 * as the SCSI mid-layer does not handle such devices well.
>>>> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
>>>> +		 * at 160Hz, and prevents the system from coming up.
>>>> +		 */
>>>> +		if (sd[i]->format_in_progress) {
>>>> +			dev_info(&h->pdev->dev,
>>>> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
>>>> +				h->scsi_host->host_no,
>>>> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
>>>> +			continue;
>>>> +		}
>>>> +
>>>>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
>>>>  					h->ndevices, &entry);
>>>>  		if (device_change == DEVICE_NOT_FOUND) {
>>>> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
>>>>  	device->lun = lun;
>>>>  }
>>>>  
>>>> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
>>>> +		unsigned char scsi3addr[])
>>>> +{
>>>> +	struct CommandList *c;
>>>> +	unsigned char *sense, sense_key, asc, ascq;
>>>> +#define ASC_LUN_NOT_READY 0x04
>>>> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
>>>> +
>>>> +
>>>> +	c = cmd_special_alloc(h);
>>>> +	if (!c)
>>>> +		return 0;
>>>> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
>>>> +	hpsa_scsi_do_simple_cmd_core(h, c);
>>>> +	sense = c->err_info->SenseInfo;
>>>> +	sense_key = sense[2];
>>>> +	asc = sense[12];
>>>> +	ascq = sense[13];
>>>> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
>>>> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
>>>> +		sense_key == NOT_READY &&
>>>> +		asc == ASC_LUN_NOT_READY &&
>>>> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
>>>> +		return 1;
>>>> return^ without cmd_special_free
>>>>
>>>> +	cmd_special_free(h, c);
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int hpsa_update_device_info(struct ctlr_info *h,
>>>>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
>>>>  	unsigned char *is_OBDR_device)
>>>> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
>>>>  		sizeof(this_device->device_id));
>>>>  
>>>>  	if (this_device->devtype == TYPE_DISK &&
>>>> -		is_logical_dev_addr_mode(scsi3addr))
>>>> +		is_logical_dev_addr_mode(scsi3addr)) {
>>>>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
>>>> -	else
>>>> +		this_device->format_in_progress =
>>>> +			hpsa_format_in_progress(h, scsi3addr);
>>>> +	} else {
>>>>  		this_device->raid_level = RAID_UNKNOWN;
>>>> +		this_device->format_in_progress = 0;
>>>> +	}
>>>>  
>>>>  	if (is_OBDR_device) {
>>>>  		/* See if this is a One-Button-Disaster-Recovery device
>>>> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
>>>> index bc85e72..4fd0d45 100644
>>>> --- a/drivers/scsi/hpsa.h
>>>> +++ b/drivers/scsi/hpsa.h
>>>> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
>>>>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
>>>>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
>>>>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
>>>> +	unsigned char format_in_progress;
>>>>  };
>>>>  
>>>>  struct reply_pool {
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
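
The missing cmd_special_free() noted in the review above can be avoided by
computing the result first and releasing the command on a single exit path.
What follows is only a user-space sketch of that pattern, not the hpsa code:
struct cmd, cmd_alloc(), cmd_free(), issue_tur() and outstanding_cmds are
hypothetical stand-ins for the driver's command buffer, its
cmd_special_alloc()/cmd_special_free() pair and the TEST UNIT READY round
trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's command buffer (not hpsa code). */
struct cmd {
	unsigned char sense_key, asc, ascq;
};

/* Counts commands that were allocated but not yet freed. */
static int outstanding_cmds;

static struct cmd *cmd_alloc(void)
{
	struct cmd *c = calloc(1, sizeof(*c));

	if (c)
		outstanding_cmds++;
	return c;
}

static void cmd_free(struct cmd *c)
{
	assert(c != NULL);
	free(c);
	outstanding_cmds--;
}

/* Simulated TEST UNIT READY: fills in NOT READY, 0x04/0x04 on request. */
static void issue_tur(struct cmd *c, bool formatting)
{
	if (formatting) {
		c->sense_key = 0x02;	/* NOT READY */
		c->asc = 0x04;		/* logical unit not ready */
		c->ascq = 0x04;		/* format in progress */
	}
}

/* Single exit path: the command is freed whatever the result is. */
static bool format_in_progress(bool formatting)
{
	struct cmd *c = cmd_alloc();
	bool rc;

	if (!c)
		return false;
	issue_tur(c, formatting);
	rc = c->sense_key == 0x02 && c->asc == 0x04 && c->ascq == 0x04;
	cmd_free(c);
	return rc;
}
```

Calling format_in_progress() for both outcomes and then checking that
outstanding_cmds is back to zero shows that neither branch leaks a command.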


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 14:41         ` scameron
  2013-09-27 14:58           ` Tomas Henzl
@ 2013-09-27 16:54           ` Douglas Gilbert
  2013-09-27 17:41             ` scameron
  1 sibling, 1 reply; 21+ messages in thread
From: Douglas Gilbert @ 2013-09-27 16:54 UTC (permalink / raw)
  To: scameron, Tomas Henzl
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel

On 13-09-27 10:41 AM, scameron@beardog.cce.hp.com wrote:
> On Fri, Sep 27, 2013 at 04:01:30PM +0200, Tomas Henzl wrote:
>> On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
>>> On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
>>>> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
>>>>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
>>>>>
>>>>> SCSI mid layer doesn't seem to handle logical drives undergoing format
>>>>> very well.  scsi_add_device on such devices seems to result in hitting
>>>>> those devices with a TUR at a rate of 3Hz for awhile, transitioning
>>>>> to hitting them with a READ(10) at a much higher rate indefinitely,
>>>>> and at boot time, this prevents the system from coming up.  If we
>>>>> do not expose such devices to the kernel, it isn't bothered by them.
>>>> Is the result of this patch that the drive is no more visible for the user
>>>> and he can't follow the formatting progress?
>>> Yes (subsequent patch monitors the progress and brings the drive
>>> online when it's ready).
>>>
>>>> I think a better option is to fix the kernel to handle formatting devices better
>>> Yeah, you're probably right. (This is what comes of writing code for all
>>> the distros then forward porting to kernel.org code.  Grumble-grumble-management
>>> grumble-grumble real-world problems.)
>>>
>>>> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting
>>>> device.
>>> I don't think hpsa driver had any problem with the TURs or READs though,
>>> they would be returned to the mid layer just fine (TUR returned sense data
>>> indicating not ready, format in progress, I forget what the reads
>>> returned, whatever the firmware filled in for the sense data, which
>>> was reasonable), but the mid-layer was relentless and just never
>>> really proceeded, iirc.
>>>
>>> Since we were trying to make this work on existing OSes where fixing the
>>> SCSI mid layer wasn't an option, we came up with this.
>>
>> I'm actually glad that you care about existing OSes :)
>
> And the pain of porting would be much the same regardless of
> whether the port is forward or backward, I suppose.
>
>>
>> Do you know whether the midlayer has similar problems with other drivers?
>
> No, not sure.   One thing that's a bit unusual about hpsa is it uses
> the scan_start and scan_finished members of scsi_host_template, so hpsa
> does its own scanning, rather than let the midlayer do the scanning which
> is due to Smart Array's weirdness around the vicinity of SCSI_REPORT_LUNS.
>
> I suspect that a lld driver calling scsi_add_device() on something which
> is NOT READY/FORMAT IN PROGRESS is what provokes the trouble.  Most drivers
> do not call scsi_add_device() directly at all, so it's quite possible most
> drivers do not experience such a problem. A few do call scsi_add_device()
> directly, like ipr or pmcraid, so these might conceivably have a similar
> problem.
>
> We ran into this problem with what we call "Rapid Parity Initialization", which
> is what you get when the RAID controller leaves the logical volume in a NOT
> READY/FORMAT IN PROGRESS state and devotes itself entirely to initializing
> parity data and when that's done, then the volume starts acting normally.
>
> Initializing the parity data can take quite a long time (hours), but not as
> long as initializing it on the fly under load, which, with very large,
> relatively slow drives can take nigh on forever, hence the "rapid" parity
> initialization moniker.  So, if those other RAID controllers don't have a
> similar feature that produces a relatively long lived NOT READY/FORMAT IN
> PROGRESS state, they may not bump into the problem.

     {0x04,0x04,"Logical unit not ready, format in progress"},
     {0x04,0x05,"Logical unit not ready, rebuild in progress"},
     {0x04,0x06,"Logical unit not ready, recalculation in progress"},
     {0x04,0x07,"Logical unit not ready, operation in progress"},
...
     {0x04,0x1b,"Logical unit not ready, sanitize in progress"},

Wouldn't perhaps 0x4,0x5 be more accurate? If someone managed to
send a FORMAT UNIT or SANITIZE to a physical drive behind your LV,
that would be a completely different issue.

Doug Gilbert
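
For reference, the additional sense code pairs listed above can be told
apart with a small decoder.  This is a minimal sketch, assuming fixed-format
sense data (response code 0x70 or 0x71), where the sense key is the low
nibble of byte 2 and ASC/ASCQ are bytes 12 and 13.  The names
decode_fixed_sense(), lun_format_in_progress() and struct sense_triple are
made up for illustration and are not part of hpsa or the SCSI mid layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_NOT_READY		0x02
#define ASC_LUN_NOT_READY		0x04
#define ASCQ_FORMAT_IN_PROGRESS		0x04
#define ASCQ_REBUILD_IN_PROGRESS	0x05

/* Hypothetical container for the three values of interest. */
struct sense_triple {
	uint8_t key, asc, ascq;
};

/* Returns false if the buffer is too short or not fixed-format sense. */
static bool decode_fixed_sense(const uint8_t *sense, size_t len,
			       struct sense_triple *out)
{
	uint8_t response_code;

	assert(sense != NULL && out != NULL);
	if (len < 14)
		return false;
	response_code = sense[0] & 0x7f;
	if (response_code != 0x70 && response_code != 0x71)
		return false;
	out->key = sense[2] & 0x0f;	/* upper bits of byte 2 are flags */
	out->asc = sense[12];
	out->ascq = sense[13];
	return true;
}

/* True only for NOT READY with ASC/ASCQ 0x04/0x04 (format in progress). */
static bool lun_format_in_progress(const uint8_t *sense, size_t len)
{
	struct sense_triple t;

	if (!decode_fixed_sense(sense, len, &t))
		return false;
	return t.key == SENSE_KEY_NOT_READY &&
	       t.asc == ASC_LUN_NOT_READY &&
	       t.ascq == ASCQ_FORMAT_IN_PROGRESS;
}
```

With this check a buffer carrying 0x04/0x05 (rebuild in progress) is not
reported as a format in progress, so the two cases stay distinct if the
firmware ever starts returning the rebuild code.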



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 16:54           ` Douglas Gilbert
@ 2013-09-27 17:41             ` scameron
  0 siblings, 0 replies; 21+ messages in thread
From: scameron @ 2013-09-27 17:41 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: Tomas Henzl, james.bottomley, stephenmcameron, mikem, linux-scsi,
	scott.teel, scameron

On Fri, Sep 27, 2013 at 12:54:39PM -0400, Douglas Gilbert wrote:
> On 13-09-27 10:41 AM, scameron@beardog.cce.hp.com wrote:
> >On Fri, Sep 27, 2013 at 04:01:30PM +0200, Tomas Henzl wrote:
> >>On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
> >>>On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> >>>>On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> >>>>>From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >>>>>
> >>>>>SCSI mid layer doesn't seem to handle logical drives undergoing format
> >>>>>very well.  scsi_add_device on such devices seems to result in hitting
> >>>>>those devices with a TUR at a rate of 3Hz for awhile, transitioning
> >>>>>to hitting them with a READ(10) at a much higher rate indefinitely,
> >>>>>and at boot time, this prevents the system from coming up.  If we
> >>>>>do not expose such devices to the kernel, it isn't bothered by them.
> >>>>Is the result of this patch that the drive is no more visible for the 
> >>>>user
> >>>>and he can't follow the formatting progress?
> >>>Yes (subsequent patch monitors the progress and brings the drive
> >>>online when it's ready).
> >>>
> >>>>I think a better option is to fix the kernel to handle formatting 
> >>>>devices better
> >>>Yeah, you're probably right. (This is what comes of writing code for all
> >>>the distros then forward porting to kernel.org code.  
> >>>Grumble-grumble-management
> >>>grumble-grumble real-world problems.)
> >>>
> >>>>or harden the hpsa so it can cope with TURs or reads (ignore) from a 
> >>>>formatting
> >>>>device.
> >>>I don't think hpsa driver had any problem with the TURs or READs though,
> >>>they would be returned to the mid layer just fine (TUR returned sense 
> >>>data
> >>>indicating not ready, format in progress, I forget what the reads
> >>>returned, whatever the firmware filled in for the sense data, which
> >>>was reasonable), but the mid-layer was relentless and just never
> >>>really proceeded, iirc.
> >>>
> >>>Since we were trying to make this work on existing OSes where fixing the
> >>>SCSI mid layer wasn't an option, we came up with this.
> >>
> >>I'm actually glad that you care about existing OSes :)
> >
> >And the pain of porting would be much the same regardless of
> >whether the port is forward or backward, I suppose.
> >
> >>
> >>Do you know whether the midlayer has similar problems with other drivers?
> >
> >No, not sure.   One thing that's a bit unusual about hpsa is it uses
> >the scan_start and scan_finished members of scsi_host_template, so hpsa
> >does its own scanning, rather than let the midlayer do the scanning which
> >is due to Smart Array's weirdness around the vicinity of SCSI_REPORT_LUNS.
> >
> >I suspect that a lld driver calling scsi_add_device() on something which
> >is NOT READY/FORMAT IN PROGRESS is what provokes the trouble.  Most drivers
> >do not call scsi_add_device() directly at all, so it's quite possible most
> >drivers do not experience such a problem. A few do call scsi_add_device()
> >directly, like ipr or pmcraid, so these might conceivably have a similar
> >problem.
> >
> >We ran into this problem with what we call "Rapid Parity Initialization", 
> >which
> >is what you get when the RAID controller leaves the logical volume in a NOT
> >READY/FORMAT IN PROGRESS state and devotes itself entirely to initializing
> >parity data and when that's done, then the volume starts acting normally.
> >
> >Initializing the parity data can take quite a long time (hours), but not as
> >long as initializing it on the fly under load, which, with very large,
> >relatively slow drives can take nigh on forever, hence the "rapid" parity
> >initialization moniker.  So, if those other RAID controllers don't have a
> >similar feature that produces a relatively long lived NOT READY/FORMAT IN
> >PROGRESS state, they may not bump into the problem.
> 
>     {0x04,0x04,"Logical unit not ready, format in progress"},
>     {0x04,0x05,"Logical unit not ready, rebuild in progress"},
>     {0x04,0x06,"Logical unit not ready, recalculation in progress"},
>     {0x04,0x07,"Logical unit not ready, operation in progress"},
> ...
>     {0x04,0x1b,"Logical unit not ready, sanitize in progress"},
> 
> Wouldn't perhaps 0x4,0x5 be more accurate?  If someone managed to
> send a FORMAT UNIT or SANITIZE to a physical drive behind your LV,
> that would be a completely different issue.

Perhaps, but 0x04/0x04 is what the firmware returns in this instance.

-- steve


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 13:22   ` Tomas Henzl
  2013-09-27 13:34     ` scameron
@ 2013-09-27 19:11     ` scameron
  1 sibling, 0 replies; 21+ messages in thread
From: scameron @ 2013-09-27 19:11 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel,
	scameron

On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >
> > SCSI mid layer doesn't seem to handle logical drives undergoing format
> > very well.  scsi_add_device on such devices seems to result in hitting
> > those devices with a TUR at a rate of 3Hz for awhile, transitioning
> > to hitting them with a READ(10) at a much higher rate indefinitely,
> > and at boot time, this prevents the system from coming up.  If we
> > do not expose such devices to the kernel, it isn't bothered by them.
> 
> Is the result of this patch that the drive is no more visible for the user
> and he can't follow the formatting progress? 
> I think a better option is to fix the kernel to handle formatting devices better
> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting device.

So here is the behavior I see with linux-3.12-rc2 when I create a logical
drive with rapid parity initialization enabled and then reboot
before the drive finishes.  Note that scsi 0:0:0:1 is
the device that's in this state.  Interspersed are some notes from
me, prefixed "smc> "

Summary: First you see sd (I think) printing dots very slowly.
Then you see udev get angry.  Then a couple of stack traces, one
from modprobe and one from dmraid, and the system doesn't
boot up.  20-something minutes have elapsed at this point. It
may eventually boot when the RPI finally finishes, but at this
point, I don't care, because 20 minutes is too long to be holding
things up.


HP HPSA Driver (v 3.4.0-1)                                                      
hpsa 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control             
hpsa 0000:02:00.0: MSIX                                                         
hpsa 0000:02:00.0: hpsa0: <0x323b> at IRQ 64 using DAC                          
scsi0 : hpsa                                                                    
hpsa 0000:02:00.0: RAID              device c0b3t0l0 added.                     
hpsa 0000:02:00.0: Direct-Access     device c0b0t0l0 added.                     
hpsa 0000:02:00.0: Direct-Access     device c0b0t0l1 added.                     
hpsa 0000:02:00.0: Direct-Access     device c0b0t0l2 added.                     
usb 1-1.3: new low-speed USB device number 3 using ehci-pci                     
scsi 0:3:0:0: RAID              HP       P420i            5.19 PQ: 0 ANSI: 5    
scsi 0:0:0:0: Direct-Access     HP       LOGICAL VOLUME   5.19 PQ: 0 ANSI: 5    
scsi 0:0:0:1: Direct-Access     HP       LOGICAL VOLUME   5.19 PQ: 0 ANSI: 5    
scsi 0:0:0:2: Direct-Access     HP       LOGICAL VOLUME   5.19 PQ: 0 ANSI: 5    
ata_piix 0000:00:1f.2: MAP [                                                    
 P0 P2 P1 P3 ]                                                                  
usb 1-1.3: New USB device found, idVendor=0624, idProduct=0341                  
usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0             
usb 1-1.3: Product: HP 336047-B21                                               
usb 1-1.3: Manufacturer: Avocent                                                
input: Avocent HP 336047-B21 as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.31
hid-generic 0003:0624:0341.0001: input,hidraw0: USB HID v1.10 Keyboard [Avocent0
scsi1 : ata_piix                                                                
scsi2 : ata_piix                                                                
ata1: SATA max UDMA/133 cmd 0x4000 ctl 0x4008 bmdma 0x4020 irq 17               
ata2: SATA max UDMA/133 cmd 0x4010 ctl 0x4018 bmdma 0x4028 irq 17               
input: Avocent HP 336047-B21 as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.32
hid-generic 0003:0624:0341.0002: input,hidraw1: USB HID v1.10 Mouse [Avocent HP1
sd 0:0:0:0: [sda] 2344160432 512-byte logical blocks: (1.20 TB/1.09 TiB)        
sd 0:0:0:1: [sdb] Spinning up disk...                                           
usb 2-1.3: new high-speed USB device number 3 using ehci-pci                    
sd 0:0:0:2: [sdc] 390651840 512-byte logical blocks: (200 GB/186 GiB)           
sd 0:0:0:0: [sda] Write Protect is off                                          
sd 0:0:0:2: [sdc] Write Protect is off                                          
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DA
sd 0:0:0:2: [sdc] Write cache: disabled, read cache: enabled, doesn't support DA
 sdc: unknown partition table                                                   
sd 0:0:0:2: [sdc] Attached SCSI disk                                            
 sda: sda1 sda2 sda3                                                            
sd 0:0:0:0: [sda] Attached SCSI disk                                            
usb 2-1.3: New USB device found, idVendor=0424, idProduct=2660                  
usb 2-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0             
hub 2-1.3:1.0: USB hub found                                                    
hub 2-1.3:1.0: 2 ports detected                                                 
Switched to clocksource tsc                                                     
.ata2.01: failed to resume link (SControl 0)                                    
ata2.00: SATA link down (SStatus 0 SControl 300)                                
ata2.01: SATA link down (SStatus 4 SControl 0)                                  
ata1.01: failed to resume link (SControl 0)                                     
ata1.00: SATA link down (SStatus 0 SControl 300)                                
ata1.01: SATA link down (SStatus 4 SControl 0)                                  
................................................................................
sd 0:0:0:1: [sdb] 1757614684 512-byte logical blocks: (899 GB/838 GiB)          
sd 0:0:0:1: [sdb] 4096-byte physical blocks                                     
sd 0:0:0:1: [sdb] Write Protect is off                                          
sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DA
sd 0:0:0:1: [sdb] Spinning up disk...                                           
............................................................................... 


smc> there is a loooooong pause while it prints those dots above.
smc> below, udev starts getting angry...

 
udevadm settle - timeout of 180 seconds reached, the event queue contains:      
  /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:35/PNP0A06:00/PNP0501:00)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:3:0/0:3:0:0 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:3:0/0:3:0:0/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:3:0/0:3:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/b)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1 (2)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1/ev)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2 (2)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2/mo)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2/ev)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
udevd[130]: worker [175] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [175] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
udevd[130]: worker [176] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [176] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
udevd[130]: worker [178] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [178] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
udevd[130]: worker [179] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [179] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)         
dracut: Mounted root filesystem /dev/sda2                                       
.SELinux:  Disabled at runtime.                                                 
type=1404 audit(1380289585.871:2): selinux=0 auid=4294967295 ses=4294967295     
dracut:                                                                         
dracut: Switching root                                                          
                Welcome to Red Hatreadahead: starting                           
 Enterprise Linux Server                                                        
.Starting udev: udev: starting version 147                                      
WARNING! power/level is deprecated; use power/control instead                   
.G.pps_core: LinuxPPS API ver. 1 registered                                     
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@>
PTP clock support registered                                                    
tg3.c:v3.133 (Jul 29, 2013)                                                     
tg3 0000:03:00.0 eth0: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA0
tg3 0000:03:00.0 eth0: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]                    
tg3 0000:03:00.1 eth1: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA1
tg3 0000:03:00.1 eth1: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]                    
tg3 0000:03:00.2 eth2: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA2
tg3 0000:03:00.2 eth2: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.2 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.2 eth2: dma_rwctrl[00000001] dma_mask[64-bit]                    
tg3 0000:03:00.3 eth3: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA3
tg3 0000:03:00.3 eth3: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.3 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.3 eth3: dma_rwctrl[00000001] dma_mask[64-bit]                    
dca service started, version 1.12.1                                             
ioatdma: Intel(R) QuickData Technology Driver 4.00                              
ioatdma 0000:00:04.0: can't derive routing for PCI INT A                        
ioatdma 0000:00:04.0: PCI INT A: no GSI - using ISA IRQ 5                       
ioatdma 0000:00:04.1: can't derive routing for PCI INT B                        
ioatdma 0000:00:04.1: PCI INT B: no GSI - using ISA IRQ 7                       
ioatdma 0000:00:04.2: can't derive routing for PCI INT C                        
ioatdma 0000:00:04.2: PCI INT C: no GSI - using ISA IRQ 10                      
ioatdma 0000:00:04.3: can't derive routing for PCI INT D                        
ioatdma 0000:00:04.3: PCI INT D: no GSI - using ISA IRQ 10                      
ioatdma 0000:00:04.4: can't derive routing for PCI INT A                        
ioatdma 0000:00:04.4: PCI INT A: no GSI - using ISA IRQ 5                       
ioatdma 0000:00:04.5: can't derive routing for PCI INT B                        
ioatdma 0000:00:04.5: PCI INT B: no GSI - using ISA IRQ 7                       
ioatdma 0000:00:04.6: can't derive routing for PCI INT C                        
ioatdma 0000:00:04.6: PCI INT C: no GSI - using ISA IRQ 10                      
ioatdma 0000:00:04.7: can't derive routing for PCI INT D                        
ioatdma 0000:00:04.7: PCI INT D: no GSI - using ISA IRQ 10                      
.hpwdt 0000:01:00.0: HP Watchdog Timer Driver: NMI decoding initialized, allow )
hpwdt 0000:01:00.0: HP Watchdog Timer Driver: 1.3.2, timer margin: 30 seconds (.
                                                                                
ACPI Warning: 0x0000000000000928-0x000000000000092f SystemIO conflicts with Reg)
ACPI: If an ACPI driver is available for this device, you should use it insteadr
lpc_ich: Resource conflict(s) found affecting gpio_ich                          
EDAC MC: Ver: 3.0.0                                                             
EDAC sbridge: Seeking for: dev 0e.0 PCI ID 8086:3ca0                            
EDAC sbridge: Seeking for: dev 0e.0 PCI ID 8086:3ca0                            
EDAC sbridge: Seeking for: dev 0f.0 PCI ID 8086:3ca8                            
EDAC sbridge: Seeking for: dev 0f.0 PCI ID 8086:3ca8                            
EDAC sbridge: Seeking for: dev 0f.1 PCI ID 8086:3c71                            
EDAC sbridge: Seeking for: dev 0f.1 PCI ID 8086:3c71                            
EDAC sbridge: Seeking for: dev 0f.2 PCI ID 8086:3caa                            
EDAC sbridge: Seeking for: dev 0f.2 PCI ID 8086:3caa                            
EDAC sbridge: Seeking for: dev 0f.3 PCI ID 8086:3cab                            
EDAC sbridge: Seeking for: dev 0f.3 PCI ID 8086:3cab                            
EDAC sbridge: Seeking for: dev 0f.4 PCI ID 8086:3cac                            
EDAC sbridge: Seeking for: dev 0f.4 PCI ID 8086:3cac                            
EDAC sbridge: Seeking for: dev 0f.5 PCI ID 8086:3cad                            
EDAC sbridge: Seeking for: dev 0f.5 PCI ID 8086:3cad                            
EDAC sbridge: Seeking for: dev 11.0 PCI ID 8086:3cb8                            
EDAC sbridge: Seeking for: dev 11.0 PCI ID 8086:3cb8                            
EDAC sbridge: Seeking for: dev 0c.6 PCI ID 8086:3cf4                            
EDAC sbridge: Seeking for: dev 0c.6 PCI ID 8086:3cf4                            
EDAC sbridge: Seeking for: dev 0c.7 PCI ID 8086:3cf6                            
EDAC sbridge: Seeking for: dev 0c.7 PCI ID 8086:3cf6                            
EDAC sbridge: Seeking for: dev 0d.6 PCI ID 8086:3cf5                            
EDAC sbridge: Seeking for: dev 0d.6 PCI ID 8086:3cf5                            
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 000
EDAC sbridge: Driver loaded.                                                    
scsi 0:3:0:0: Attached scsi generic sg0 type 12                                 
sd 0:0:0:0: Attached scsi generic sg1 type 0                                    
sd 0:0:0:1: Attached scsi generic sg2 type 0                                    
sd 0:0:0:2: Attached scsi generic sg3 type 0                                    
input: PC Speaker as /devices/platform/pcspkr/input/input3                      
microcode: CPU0 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU1 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU2 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU3 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU4 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU5 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU6 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU7 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU8 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU9 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU10 sig=0x206d7, pf=0x1, revision=0x70d                            
microcode: CPU11 sig=0x206d7, pf=0x1, revision=0x70d                            
microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter a
ipmi message handler version 39.2                                               
IPMI System Interface driver.                                                   
ipmi_si: probing via ACPI                                                       
ipmi_si 00:02: [io  0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0                    
ipmi_si: Adding ACPI-specified kcs state machine                                
ipmi_si: probing via SMBIOS                                                     
ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0                             
ipmi_si: Adding SMBIOS-specified kcs state machine duplicate interface          
ipmi_si: probing via SPMI                                                       
ipmi_si: SPMI: io 0xca2 regsize 2 spacing 2 irq 0                               
ipmi_si: Adding SPMI-specified kcs state machine duplicate interface            
ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave ad0
ipmi_si 00:02: Found new BMC (man_id: 0x00000b, prod_id: 0x2000, dev_id: 0x13)  
ipmi_si 00:02: IPMI kcs interface initialized                                   
iTCO_vendor_support: vendor-support=0                                           
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.10                                 
iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS      
[  O.K  ]                                                                       
tun: Universal TUN/TAP device driver, 1.6                                       
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>                          
Setting hostname localhost.localdomain:  [  OK  ]                               
device-mapper: uevent: version 1.0.3                                            
device-mapper: ioctl: 4.26.0-ioctl (2013-08-15) initialised: dm-devel@redhat.com
...............not responding...                                                
INFO: task modprobe:487 blocked for more than 120 seconds.                      
      Not tainted 3.12.0-rc2+ #1                                                
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.       
modprobe        D 0000000000000000     0   487      1 0x00000000                
 ffff880c0bc6bdc8 0000000000000046 ffffffff8107af7d ffff880c0bc6a000            
 ffff880c0bc6bfd8 ffff880c0bc6a000 ffff880c0bc6a010 ffff880c0bc6a000            
 ffff880c0bc6bfd8 ffff880c0bc6a000 ffff880c09a16440 ffff880c0ee6a540            
Call Trace:                                                                     
 [<ffffffff8107af7d>] ? lowest_in_progress+0x4d/0x60                            
 [<ffffffff81592109>] schedule+0x29/0x70                                        
 [<ffffffff8107b005>] async_synchronize_cookie_domain+0x75/0x120                
 [<ffffffff81073c20>] ? wake_up_bit+0x40/0x40                                   
 [<ffffffff8107b0e8>] async_synchronize_full_domain+0x18/0x20                   
 [<ffffffff8107b100>] async_synchronize_full+0x10/0x20                          
 [<ffffffff810c7c65>] do_init_module+0x135/0x1b0                                
 [<ffffffff810c9932>] load_module+0x502/0x620                                   
 [<ffffffff810c7170>] ? __unlink_module+0x30/0x30                               
 [<ffffffff810c6760>] ? module_sect_show+0x30/0x30                              
 [<ffffffff810c9bd6>] SyS_init_module+0x96/0xc0                                 
 [<ffffffff8159d1d2>] system_call_fastpath+0x16/0x1b                            
no locks held by modprobe/487.     
INFO: task dmraid:6718 blocked for more than 120 seconds.                       
      Not tainted 3.12.0-rc2+ #1                                                
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.       
dmraid          D 0000000000000000     0  6718    553 0x00000000                
 ffff8800b9a51ae8 0000000000000046 ffff880c0a42d200 ffff8800b9a50000            
 ffff8800b9a51fd8 ffff8800b9a50000 ffff8800b9a50010 ffff8800b9a50000            
 ffff8800b9a51fd8 ffff8800b9a50000 ffff880c0a42c940 ffffffff81a104c0            
Call Trace:                                                                     
 [<ffffffff81592109>] schedule+0x29/0x70                                        
 [<ffffffff81592467>] schedule_preempt_disabled+0x27/0x40                       
 [<ffffffff8158f84a>] mutex_lock_nested+0x13a/0x340                             
 [<ffffffff811cc21e>] ? __blkdev_get+0x6e/0x490                                 
 [<ffffffff811cc21e>] __blkdev_get+0x6e/0x490                                   
 [<ffffffff811cb6a9>] ? bd_acquire+0x99/0xf0                                    
 [<ffffffff811cc69c>] blkdev_get+0x5c/0x210                                     
 [<ffffffff8159446b>] ? _raw_spin_unlock+0x2b/0x50                              
 [<ffffffff811cc850>] ? blkdev_get+0x210/0x210                                  
 [<ffffffff811cc8b2>] blkdev_open+0x62/0x80                                     
 [<ffffffff8118d46e>] do_dentry_open+0x24e/0x2e0                                
 [<ffffffff8118d615>] finish_open+0x35/0x50                                     
 [<ffffffff811a0ab6>] do_last+0x436/0x7e0                                       
 [<ffffffff811a0f24>] path_openat+0xc4/0x490                                    
 [<ffffffff811a142a>] do_filp_open+0x4a/0xa0                                    
 [<ffffffff811ae2c1>] ? __alloc_fd+0xb1/0x160                                   
 [<ffffffff8115f01f>] ? vm_munmap+0x5f/0x80                                     
 [<ffffffff8118e91a>] do_sys_open+0x11a/0x230                                   
 [<ffffffff81078223>] ? up_write+0x23/0x40                                      
 [<ffffffff81296909>] ? lockdep_sys_exit_thunk+0x35/0x67                        
 [<ffffffff8118ea6e>] SyS_open+0x1e/0x20                                        
 [<ffffffff8159d1d2>] system_call_fastpath+0x16/0x1b                            
1 lock held by dmraid/6718:                                                     
 #0:  (&bdev->bd_mutex){......}, at: [<ffffffff811cc21e>] __blkdev_get+0x6e/0x40

smc> and it's been 20-something minutes at this point, and the system is
still not up, and I still cannot log in.

If anyone wants to try it themselves, make a RAID5 volume on a smart array
with rapid parity init enabled and then reboot.

Userland is RHEL6u3, I think (it might be RHEL6u4; I don't think it makes
a difference).

-- steve


> 
> Also maybe a cmd_special_free is missing - see below
> 
> Cheers, Tomas
> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> ---
>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/scsi/hpsa.h |    1 +
>  2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index b7f405f..38e3af4 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
>  	for (i = 0; i < nsds; i++) {
>  		if (!sd[i]) /* if already added above. */
>  			continue;
> +
> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
> +		 * as the SCSI mid-layer does not handle such devices well.
> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
> +		 * at 160Hz, and prevents the system from coming up.
> +		 */
> +		if (sd[i]->format_in_progress) {
> +			dev_info(&h->pdev->dev,
> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
> +				h->scsi_host->host_no,
> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
> +			continue;
> +		}
> +
>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
>  					h->ndevices, &entry);
>  		if (device_change == DEVICE_NOT_FOUND) {
> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
>  	device->lun = lun;
>  }
>  
> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
> +		unsigned char scsi3addr[])
> +{
> +	struct CommandList *c;
> +	unsigned char *sense, sense_key, asc, ascq;
> +#define ASC_LUN_NOT_READY 0x04
> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
> +
> +
> +	c = cmd_special_alloc(h);
> +	if (!c)
> +		return 0;
> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
> +	hpsa_scsi_do_simple_cmd_core(h, c);
> +	sense = c->err_info->SenseInfo;
> +	sense_key = sense[2];
> +	asc = sense[12];
> +	ascq = sense[13];
> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
> +		sense_key == NOT_READY &&
> +		asc == ASC_LUN_NOT_READY &&
> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
> +		return 1;
> return^ without cmd_special_free
> 
> +	cmd_special_free(h, c);
> +	return 0;
> +}
> +
>  static int hpsa_update_device_info(struct ctlr_info *h,
>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
>  	unsigned char *is_OBDR_device)
> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
>  		sizeof(this_device->device_id));
>  
>  	if (this_device->devtype == TYPE_DISK &&
> -		is_logical_dev_addr_mode(scsi3addr))
> +		is_logical_dev_addr_mode(scsi3addr)) {
>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
> -	else
> +		this_device->format_in_progress =
> +			hpsa_format_in_progress(h, scsi3addr);
> +	} else {
>  		this_device->raid_level = RAID_UNKNOWN;
> +		this_device->format_in_progress = 0;
> +	}
>  
>  	if (is_OBDR_device) {
>  		/* See if this is a One-Button-Disaster-Recovery device
> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> index bc85e72..4fd0d45 100644
> --- a/drivers/scsi/hpsa.h
> +++ b/drivers/scsi/hpsa.h
> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
> +	unsigned char format_in_progress;
>  };
>  
>  struct reply_pool {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 14:58           ` Tomas Henzl
@ 2013-09-30 21:18             ` scameron
  0 siblings, 0 replies; 21+ messages in thread
From: scameron @ 2013-09-30 21:18 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel,
	scameron

On Fri, Sep 27, 2013 at 04:58:41PM +0200, Tomas Henzl wrote:
> On 09/27/2013 04:41 PM, scameron@beardog.cce.hp.com wrote:
> > On Fri, Sep 27, 2013 at 04:01:30PM +0200, Tomas Henzl wrote:
> >> On 09/27/2013 03:34 PM, scameron@beardog.cce.hp.com wrote:
> >>> On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> >>>> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> >>>>> From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >>>>>
> >>>>> SCSI mid layer doesn't seem to handle logical drives undergoing format
> >>>>> very well.  scsi_add_device on such devices seems to result in hitting
> >>>>> those devices with a TUR at a rate of 3Hz for awhile, transitioning
> >>>>> to hitting them with a READ(10) at a much higher rate indefinitely,
> >>>>> and at boot time, this prevents the system from coming up.  If we
> >>>>> do not expose such devices to the kernel, it isn't bothered by them.
> >>>> Is the result of this patch that the drive is no more visible for the user
> >>>> and he can't follow the formatting progress? 
> >>> Yes (subsequent patch monitors the progress and brings the drive
> >>> online when it's ready).
> >>>
> >>>> I think a better option is to fix the kernel to handle formatting devices better
> >>> Yeah, you're probably right. (This is what comes of writing code for all
> >>> the distros then forward porting to kernel.org code.  Grumble-grumble-management
> >>> grumble-grumble real-world problems.)
> >>>
> >>>> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting
> >>>> device.
> >>> I don't think hpsa driver had any problem with the TURs or READs though,
> >>> they would be returned to the mid layer just fine (TUR returned sense data
> >>> indicating not ready, format in progress, I forget what the reads
> >>> returned, whatever the firmware filled in for the sense data, which
> >>> was reasonable), but the mid-layer was relentless and just never
> >>> really proceeded, iirc.
> >>>
> >>> Since we were trying to make this work on existing OSes where fixing the
> >>> SCSI mid layer wasn't an option, we came up with this.
> >> I'm actually glad that you care about existing OSes :)
> > And the pain of porting would be much the same regardless of
> > whether the port is forward or backward, I suppose.
> >
> >> Do you know whether the midlayer has similar problems with other drivers?
> > No, not sure.   One thing that's a bit unusual about hpsa is it uses
> > the scan_start and scan_finished members of scsi_host_template, so hpsa
> > does its own scanning, rather than let the midlayer do the scanning which
> > is due to Smart Array's weirdness around the vicinity of SCSI_REPORT_LUNS.
> >
> > I suspect that a lld driver calling scsi_add_device() on something which
> > is NOT READY/FORMAT IN PROGRESS is what provokes the trouble.  Most drivers
> > do not call scsi_add_device() directly at all, so it's quite possible most
> > drivers do not experience such a problem. A few do call scsi_add_device()
> > directly, like ipr or pmcraid, so these might conceivably have a similar
> > problem.  
> >
> > We ran into this problem with what we call "Rapid Parity Initialization", which
> > is what you get when the RAID controller leaves the logical volume in a NOT
> > READY/FORMAT IN PROGRESS state and devotes itself entirely to initializing
> > parity data and when that's done, then the volume starts acting normally.  
> >
> > Initializing the parity data can take quite a long time (hours), but not as
> > long as initializing it on the fly under load, which, with very large,
> > relatively slow drives can take nigh on forever, hence the "rapid" parity
> > initialization moniker.  So, if those other RAID controllers don't have a
> > similar feature that produces a relatively long lived NOT READY/FORMAT IN
> > PROGRESS state, they may not bump into the problem.
> >
> > It has been awhile since I've tried letting the driver call scsi_add_device()
> > on a device which is undergoing Rapid Parity Initialization, so I need to try
> > that with current code and see how it behaves.  I haven't thought about how to
> > fix it within the SCSI mid layer (presuming it still doesn't behave well)
> > since previously we only concerned ourselves with avoiding provoking the
> > undesirable behavior.
> >
> > -- steve
> 
> Thanks for the explanation. I hope I can look into this later. Sometimes later. When my
> real-world problems go away...

I have taken a stab at it... still working on it a bit, but it worked at
least one time for me.  Before I go too far off into the weeds, let me
describe what I have done.

In the sd driver, sd_revalidate_disk() calls sd_spinup_disk().
I modified sd_spinup_disk() to detect the NOT READY/FORMAT IN PROGRESS
status and return a status indicating that this state has been detected:

	defer_revalidation = sd_spinup_disk(sdkp);
	...

then, similar to the media not present condition, skip reading the
capacity and the other disk parameters:

        if (sdkp->media_present && !defer_revalidation) {
                sd_read_capacity(sdkp, buffer);

                if (sd_try_extended_inquiry(sdp)) {
                        sd_read_block_provisioning(sdkp);
                        sd_read_block_limits(sdkp);
                        sd_read_block_characteristics(sdkp);
                }

                sd_read_write_protect_flag(sdkp, buffer);
                sd_read_cache_type(sdkp, buffer);
                sd_read_app_tag_own(sdkp, buffer);
                sd_read_write_same(sdkp, buffer);
        }

(skipping that stuff may not be necessary for FORMAT IN PROGRESS, not sure.)

At the end of sd_revalidate_disk():

	if (defer_revalidation)
                sd_schedule_revalidation(sdkp);


sd_schedule_revalidation() is a new function I added which:

1. Adds the disk to a list of disks to be monitored for completion of
   format in progress state.

2. Starts a thread, if there isn't one already running, to monitor
   this list.

The thread which monitors the list of disks does a TUR for each disk
and if they are no longer NOT READY/FORMAT IN PROGRESS, it calls
sd_revalidate_disk() on them and removes them from the list.  When
the list is empty, the thread exits.
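
The list handling described above can be modelled in plain userspace C.
This is only a sketch under stated assumptions, not the code from the
actual sd patch: struct model_disk and every model_* function are
hypothetical names invented for illustration, the TUR is simulated by a
countdown, and there is no locking, sleeping or reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a disk that reports NOT READY/FORMAT IN
 * PROGRESS for a while.  polls_left simulates the remaining format time. */
struct model_disk {
	const char *name;
	int polls_left;		/* number of polls that will still fail */
	int revalidated;	/* set once "sd_revalidate_disk()" has run */
	struct model_disk *next;
};

/* Stands in for TEST UNIT READY: nonzero while the format is running. */
static int model_format_in_progress(struct model_disk *d)
{
	if (d->polls_left > 0) {
		d->polls_left--;
		return 1;
	}
	return 0;
}

/* Stands in for sd_schedule_revalidation(): put a disk on the list. */
static void model_schedule_revalidation(struct model_disk **list,
					struct model_disk *d)
{
	assert(list != NULL && d != NULL);
	d->next = *list;
	*list = d;
}

/* One pass of the monitor: revalidate and unlink every disk that has
 * become ready.  Returns the number of disks still on the list. */
static int model_poll_once(struct model_disk **list)
{
	struct model_disk **link = list;
	int remaining = 0;

	while (*link) {
		struct model_disk *d = *link;

		if (model_format_in_progress(d)) {
			remaining++;
			link = &d->next;	/* keep it, move on */
		} else {
			d->revalidated = 1;	/* "sd_revalidate_disk()" */
			*link = d->next;	/* unlink from the list */
			d->next = NULL;
		}
	}
	return remaining;
}

/* The monitor loop: poll until the list drains, then return ("exit").
 * Returns the number of passes it took. */
static int model_monitor(struct model_disk **list)
{
	int passes = 0;

	while (*list) {
		model_poll_once(list);
		passes++;	/* a real thread would sleep between passes */
	}
	return passes;
}
```

A caller would put each not-ready disk on the list with
model_schedule_revalidation() and then run model_monitor() once.  The
point of the model is only the shape of the loop: ready disks leave the
list, and an empty list ends the monitor.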

If all this seems reasonable, more or less, I will proceed with trying
to get this patch into a finished state.  If I'm off in the weeds 
barking up the wrong tree, and this sort of functionality is misguided,
not wanted, and unwelcome, let me know that and I'll drop it.

Here's some debugging output from my modified sd driver in which
you can see a device go from NOT READY/FORMAT IN PROGRESS to ready
and get revalidated: 

sd_deferred_revalidation_thread: awakened                                       
sd: monitored 2 offline devices.                                                
sd_deferred_revalidation_thread: sleeping                                       
sd_deferred_revalidation_thread: awakened                                       
sd: monitored 2 offline devices.                                                
sd_deferred_revalidation_thread: sleeping                                       
sd_deferred_revalidation_thread: awakened                                       
sd: monitored 2 offline devices.                                                
sd_deferred_revalidation_thread: sleeping                                       
sd_deferred_revalidation_thread: awakened                                       
sd: offline device came online!                                                 
sd: offline device came online!                                                 
sd: monitored 2 offline devices.                                                
sd 0:0:0:1: [sdb] Revalidating disk.                                            
sd 0:0:0:1: [sdb] 1757614684 512-byte logical blocks: (899 GB/838 GiB)          
sd 0:0:0:1: [sdb] 4096-byte physical blocks                                     
sdb: detected capacity change from 0 to 899898718208                            
sd 0:0:0:1: [sdb] Revalidating disk.                                            
sd revalidation thread exiting, list empty

(I think the count of 2 and the device "... came online!" message
are due to my patch being slightly buggy, but this is just kind of
proof of concept at this point.)

-- steve

> 
> >
> >> Tomas
> >>
> >>>> Also maybe a cmd_special_free is missing - see below
> >>> D'oh.  Ok, now that's just embarrassing.  Thanks.
> >>>
> >>> -- steve
> >>>
> >>>> Cheers, Tomas
> >>>> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >>>> ---
> >>>>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
> >>>>  drivers/scsi/hpsa.h |    1 +
> >>>>  2 files changed, 49 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> >>>> index b7f405f..38e3af4 100644
> >>>> --- a/drivers/scsi/hpsa.c
> >>>> +++ b/drivers/scsi/hpsa.c
> >>>> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
> >>>>  	for (i = 0; i < nsds; i++) {
> >>>>  		if (!sd[i]) /* if already added above. */
> >>>>  			continue;
> >>>> +
> >>>> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
> >>>> +		 * as the SCSI mid-layer does not handle such devices well.
> >>>> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
> >>>> +		 * at 160Hz, and prevents the system from coming up.
> >>>> +		 */
> >>>> +		if (sd[i]->format_in_progress) {
> >>>> +			dev_info(&h->pdev->dev,
> >>>> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
> >>>> +				h->scsi_host->host_no,
> >>>> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
> >>>> +			continue;
> >>>> +		}
> >>>> +
> >>>>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
> >>>>  					h->ndevices, &entry);
> >>>>  		if (device_change == DEVICE_NOT_FOUND) {
> >>>> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
> >>>>  	device->lun = lun;
> >>>>  }
> >>>>  
> >>>> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
> >>>> +		unsigned char scsi3addr[])
> >>>> +{
> >>>> +	struct CommandList *c;
> >>>> +	unsigned char *sense, sense_key, asc, ascq;
> >>>> +#define ASC_LUN_NOT_READY 0x04
> >>>> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
> >>>> +
> >>>> +
> >>>> +	c = cmd_special_alloc(h);
> >>>> +	if (!c)
> >>>> +		return 0;
> >>>> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
> >>>> +	hpsa_scsi_do_simple_cmd_core(h, c);
> >>>> +	sense = c->err_info->SenseInfo;
> >>>> +	sense_key = sense[2];
> >>>> +	asc = sense[12];
> >>>> +	ascq = sense[13];
> >>>> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
> >>>> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
> >>>> +		sense_key == NOT_READY &&
> >>>> +		asc == ASC_LUN_NOT_READY &&
> >>>> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
> >>>> +		return 1;
> >>>> return^ without cmd_special_free
> >>>>
> >>>> +	cmd_special_free(h, c);
> >>>> +	return 0;
> >>>> +}
> >>>> +
> >>>>  static int hpsa_update_device_info(struct ctlr_info *h,
> >>>>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
> >>>>  	unsigned char *is_OBDR_device)
> >>>> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
> >>>>  		sizeof(this_device->device_id));
> >>>>  
> >>>>  	if (this_device->devtype == TYPE_DISK &&
> >>>> -		is_logical_dev_addr_mode(scsi3addr))
> >>>> +		is_logical_dev_addr_mode(scsi3addr)) {
> >>>>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
> >>>> -	else
> >>>> +		this_device->format_in_progress =
> >>>> +			hpsa_format_in_progress(h, scsi3addr);
> >>>> +	} else {
> >>>>  		this_device->raid_level = RAID_UNKNOWN;
> >>>> +		this_device->format_in_progress = 0;
> >>>> +	}
> >>>>  
> >>>>  	if (is_OBDR_device) {
> >>>>  		/* See if this is a One-Button-Disaster-Recovery device
> >>>> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> >>>> index bc85e72..4fd0d45 100644
> >>>> --- a/drivers/scsi/hpsa.h
> >>>> +++ b/drivers/scsi/hpsa.h
> >>>> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
> >>>>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
> >>>>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
> >>>>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
> >>>> +	unsigned char format_in_progress;
> >>>>  };
> >>>>  
> >>>>  struct reply_pool {
> >>>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
  2013-09-27 13:34     ` scameron
  2013-09-27 14:01       ` Tomas Henzl
@ 2013-10-10 16:25       ` scameron
  1 sibling, 0 replies; 21+ messages in thread
From: scameron @ 2013-10-10 16:25 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: james.bottomley, stephenmcameron, mikem, linux-scsi, scott.teel,
	scameron

On Fri, Sep 27, 2013 at 08:34:51AM -0500, scameron@beardog.cce.hp.com wrote:
> On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> > On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> > > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> > >
> > > SCSI mid layer doesn't seem to handle logical drives undergoing format
> > > very well.  scsi_add_device on such devices seems to result in hitting
> > > those devices with a TUR at a rate of 3Hz for awhile, transitioning
> > > to hitting them with a READ(10) at a much higher rate indefinitely,
> > > and at boot time, this prevents the system from coming up.  If we
> > > do not expose such devices to the kernel, it isn't bothered by them.
> > 
> > Is the result of this patch that the drive is no more visible for the user
> > and he can't follow the formatting progress? 
> 
> Yes (subsequent patch monitors the progress and brings the drive
> online when it's ready).
> 
> > I think a better option is to fix the kernel to handle formatting devices better
> 
> Yeah, you're probably right. (This is what comes of writing code for all
> the distros then forward porting to kernel.org code.  Grumble-grumble-management
> grumble-grumble real-world problems.)

[...]

So I took a stab at modifying sd (below) which I am now thinking may not
be the right approach.  Empirically, it seems to work, but there are a 
couple of things not quite right.

The gist of the patch is, when sd_spinup_disk() encounters a scsi disk which
is NOT READY/FORMAT IN PROGRESS, it adds it to a list, and starts up a
thread to monitor the list (if one isn't already running).  The thread
periodically polls the disks until they become ready, and revalidates them
and removes them from the list when they become ready.  When the list is
empty the thread exits.  There is a lot of complicated locking and
get_device/put_device reference counting to make sure various rugs don't
get ripped out from under the thread.

One problematic area is on rmmod of sd.  It's using kthread_stop() to stop
the thread, but what if the thread's already gone?  I didn't figure out
how to get around that, and came to the conclusion that having the thread
sometimes decide when to exit on its own, and having it sometimes be told
to exit on rmmod of sd probably can't work (although that is exactly what
I would like to do.)  So, hmm, not satisfactory.

But then I started thinking that maybe I'm approaching it all wrong.  Maybe
this should be done (mostly) from userspace.  Maybe some small change to sd
just to keep it from waiting around too long for disks that are in 
NOT READY/FORMAT IN PROGRESS state, and some change to udevd(?) or udev
rules... or something udev-ey? so that it can notice when a device that is
in this state has been added to the system and launch a userspace daemon to
poll the device with TURs via SG_IO and when it becomes ready, trigger the
revalidation from userspace.  That would probably be better than what I've
done below.
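
As a possible starting point for the userspace side, here is a minimal
sketch of polling a device with TEST UNIT READY through the SG_IO ioctl
and checking the returned sense data for NOT READY/FORMAT IN PROGRESS
(sense key 0x02, ASC 0x04, ASCQ 0x04).  The function names, the 32-byte
sense buffer and the 10 second timeout are assumptions made for
illustration; nothing here comes from an existing daemon or udev helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

#define SENSE_KEY_NOT_READY	0x02
#define ASC_LUN_NOT_READY	0x04
#define ASCQ_FORMAT_IN_PROGRESS	0x04

/* Decode sense data in fixed (0x70/0x71) or descriptor (0x72/0x73)
 * format.  Returns 1 for NOT READY/FORMAT IN PROGRESS, 0 otherwise.
 * Deferred errors are treated like current ones for simplicity. */
static int sense_is_format_in_progress(const unsigned char *sense, int len)
{
	unsigned char key, asc, ascq;

	assert(sense != NULL);
	if (len < 1)
		return 0;
	switch (sense[0] & 0x7f) {
	case 0x70:
	case 0x71:			/* fixed format */
		if (len < 14)
			return 0;
		key = sense[2] & 0x0f;
		asc = sense[12];
		ascq = sense[13];
		break;
	case 0x72:
	case 0x73:			/* descriptor format */
		if (len < 4)
			return 0;
		key = sense[1] & 0x0f;
		asc = sense[2];
		ascq = sense[3];
		break;
	default:
		return 0;
	}
	return key == SENSE_KEY_NOT_READY && asc == ASC_LUN_NOT_READY &&
	       ascq == ASCQ_FORMAT_IN_PROGRESS;
}

/* Send TEST UNIT READY through SG_IO on an already-open device fd.
 * Returns 1 if a format is in progress, 0 if not, -1 if the ioctl
 * itself failed. */
static int tur_format_in_progress(int fd)
{
	unsigned char cdb[6] = { 0x00, 0, 0, 0, 0, 0 };	/* TEST UNIT READY */
	unsigned char sense[32];
	sg_io_hdr_t io;

	memset(&io, 0, sizeof(io));
	memset(sense, 0, sizeof(sense));
	io.interface_id = 'S';
	io.dxfer_direction = SG_DXFER_NONE;	/* TUR moves no data */
	io.cmd_len = sizeof(cdb);
	io.cmdp = cdb;
	io.mx_sb_len = sizeof(sense);
	io.sbp = sense;
	io.timeout = 10000;		/* milliseconds; arbitrary choice */

	if (ioctl(fd, SG_IO, &io) < 0)
		return -1;
	if (io.sb_len_wr == 0)
		return 0;		/* no sense data returned */
	return sense_is_format_in_progress(sense, io.sb_len_wr);
}
```

A daemon along these lines would open the sg or block device node, call
tur_format_in_progress() periodically, and trigger the revalidation once
it returns 0, for example by writing to the device's sysfs rescan
attribute.  How udev would notice such a device and launch the poller is
left open here, as it is in the text above.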

But, I'm not really sure where to start to get something like that rolling,
presuming doing something like that would be a better approach.

-- steve


commit 6b4265426c8aff03e4c28372409a3f1c2c760c28
Author: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Date:   Wed Oct 9 19:03:46 2013 -0500

    scsi: improve probing of disks that are temporarily offline
    
    The current handling of disks which are NOT READY/FORMAT
    IN PROGRESS is not ideal.  Instead of waiting for a period of time
    for the disk to become ready and then giving up in sd_spinup_disk(),
    we can keep a list of disks which are expected to eventually become
    ready, and create a thread which periodically polls those disks and
    revalidates them when they become ready.  When the list of
    not-ready disks becomes empty, the thread will exit.  This allows
    other things to proceed, for example, allowing the system to boot
    without the usual delay, which is mostly futile since the odds
    are good that a format in progress won't finish during the time
    that sd_spinup_disk() would have waited.
    

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e62d17d..3158780 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -51,6 +51,7 @@
 #include <linux/async.h>
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
+#include <linux/kthread.h>
 #include <asm/uaccess.h>
 #include <asm/unaligned.h>
 
@@ -126,6 +127,19 @@ static DEFINE_MUTEX(sd_ref_mutex);
 static struct kmem_cache *sd_cdb_cache;
 static mempool_t *sd_cdb_pool;
 
+static DEFINE_SPINLOCK(sd_revalidation_lock);
+struct sd_deferred_disk_list_entry {
+	struct scsi_disk *sdkp;
+	struct list_head deferred_list;
+};
+static struct list_head sd_deferred_disk_list =
+		LIST_HEAD_INIT(sd_deferred_disk_list);
+#define REVALIDATION_THREAD_STOPPED 0
+#define REVALIDATION_THREAD_RUNNING 1
+#define REVALIDATION_THREAD_STOPPING 2
+static int sd_revalidation_thread_state;
+static struct task_struct *sd_revalidation_thread;
+
 static const char *sd_cache_types[] = {
 	"write through", "none", "write back",
 	"write back, no read (daft)"
@@ -1733,8 +1747,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 
 /*
  * spinup disk - called only in sd_revalidate_disk()
+ * Returns 1 if disk is not ready and revalidation
+ * should be deferred due to e.g., format in progress,
+ * 0 otherwise.
  */
-static void
+static int
 sd_spinup_disk(struct scsi_disk *sdkp)
 {
 	unsigned char cmd[10];
@@ -1743,6 +1760,7 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 	unsigned int the_result;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
+	int defer_revalidation = 0;
 
 	spintime = 0;
 
@@ -1766,7 +1784,7 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 			 * with any more polling.
 			 */
 			if (media_not_present(sdkp, &sshdr))
-				return;
+				return 0;
 
 			if (the_result)
 				sense_valid = scsi_sense_valid(&sshdr);
@@ -1799,6 +1817,13 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 				break;	/* standby */
 			if (sshdr.asc == 4 && sshdr.ascq == 0xc)
 				break;	/* unavailable */
+			if (sshdr.asc == 4 && sshdr.ascq == 4) {
+				/* Format in progress.  This may take a long
+				 * time.  Defer revalidation until later.
+				 */
+				defer_revalidation = 1;
+				break;
+			}
 			/*
 			 * Issue command to spin up drive when not ready
 			 */
@@ -1853,6 +1878,7 @@ sd_spinup_disk(struct scsi_disk *sdkp)
 		else
 			printk("not responding...\n");
 	}
+	return defer_revalidation;
 }
 
 
@@ -2668,6 +2694,144 @@ static int sd_try_extended_inquiry(struct scsi_device *sdp)
 	return 0;
 }
 
+static int sd_device_temporarily_offline(struct scsi_disk *sdkp)
+{
+	unsigned char cmd[10];
+	unsigned int the_result;
+	struct scsi_sense_hdr sshdr;
+
+	cmd[0] = TEST_UNIT_READY;
+	memset((void *) &cmd[1], 0, 9);
+
+	the_result = scsi_execute_req(sdkp->device, cmd,
+				      DMA_NONE, NULL, 0,
+				      &sshdr, SD_TIMEOUT,
+				      SD_MAX_RETRIES, NULL);
+	if (media_not_present(sdkp, &sshdr))
+		return 0;
+	return (the_result && scsi_sense_valid(&sshdr) &&
+		sshdr.sense_key == NOT_READY &&
+		sshdr.asc == 4 && sshdr.ascq == 4);
+}
+
+#define SD_OFFLINE_DEVICE_POLL_TIME (100 * HZ)
+static int sd_deferred_revalidation_thread(void *v)
+{
+	struct sd_deferred_disk_list_entry *d;
+	unsigned long flags;
+	struct list_head revalidate_list, *this, *tmp;
+	int temp_offline;
+
+	INIT_LIST_HEAD(&revalidate_list);
+	spin_lock_irqsave(&sd_revalidation_lock, flags);
+	while (!list_empty(&sd_deferred_disk_list) && !kthread_should_stop()) {
+		/* Check if any of the offline devices have become ready */
+		list_for_each_safe(this, tmp, &sd_deferred_disk_list) {
+			d = list_entry(this, struct sd_deferred_disk_list_entry,
+					deferred_list);
+			get_device(&d->sdkp->dev);
+			spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+			temp_offline = sd_device_temporarily_offline(d->sdkp);
+			spin_lock_irqsave(&sd_revalidation_lock, flags);
+			if (!temp_offline) {
+				list_del(this);
+				list_add(&d->deferred_list, &revalidate_list);
+			} else {
+				put_device(&d->sdkp->dev);
+			}
+		}
+		spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+		list_for_each_safe(this, tmp, &revalidate_list) {
+			d = list_entry(this, struct sd_deferred_disk_list_entry,
+					deferred_list);
+			list_del(this);
+			revalidate_disk(d->sdkp->disk);
+			put_device(&d->sdkp->dev);
+			kfree(d);
+		}
+		schedule_timeout_uninterruptible(SD_OFFLINE_DEVICE_POLL_TIME);
+		spin_lock_irqsave(&sd_revalidation_lock, flags);
+	}
+	sd_revalidation_thread_state = REVALIDATION_THREAD_STOPPED;
+	spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+	return 0;
+}
+
+static void sd_stop_deferred_revalidation_thread(void)
+{
+	unsigned long flags;
+	int stop_thread;
+
+	spin_lock_irqsave(&sd_revalidation_lock, flags);
+	stop_thread = (sd_revalidation_thread_state ==
+				REVALIDATION_THREAD_RUNNING);
+	if (stop_thread)
+		/* STOPPING state prevents new thread from starting. */
+		sd_revalidation_thread_state = REVALIDATION_THREAD_STOPPING;
+	spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+	if (stop_thread)
+		kthread_stop(sd_revalidation_thread);
+}
+
+static void sd_remove_deferred_revalidation(struct scsi_disk *sdkp)
+{
+	struct list_head *this, *tmp;
+	struct sd_deferred_disk_list_entry *d = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sd_revalidation_lock, flags);
+	list_for_each_safe(this, tmp, &sd_deferred_disk_list) {
+		d = list_entry(this, struct sd_deferred_disk_list_entry,
+					deferred_list);
+		if (d->sdkp == sdkp) {
+			list_del(this);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+	kfree(d);
+}
+
+static void sd_schedule_revalidation(struct scsi_disk *sdkp)
+{
+	struct sd_deferred_disk_list_entry *d, *entry;
+	unsigned long flags;
+
+	d = kzalloc(sizeof(*d), GFP_KERNEL);
+	if (!d) {
+		sd_printk(KERN_WARNING, sdkp,
+			"sd_schedule_revalidation: Memory allocation failure.\n");
+		return;
+	}
+	d->sdkp = sdkp;
+	spin_lock_irqsave(&sd_revalidation_lock, flags);
+
+	/* Only add the device if it is not already in the list */
+	list_for_each_entry(entry, &sd_deferred_disk_list, deferred_list) {
+		if (entry->sdkp == sdkp) {
+			spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+			kfree(d);
+			return;
+		}
+	}
+
+	list_add(&d->deferred_list, &sd_deferred_disk_list);
+	if (sd_revalidation_thread_state == REVALIDATION_THREAD_STOPPED) {
+		sd_revalidation_thread_state = REVALIDATION_THREAD_RUNNING;
+		spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+		sd_revalidation_thread =
+			kthread_run(sd_deferred_revalidation_thread, NULL,
+					"sd-deferred-revalidation");
+		spin_lock_irqsave(&sd_revalidation_lock, flags);
+	}
+	if (!sd_revalidation_thread) {
+		sd_revalidation_thread_state = REVALIDATION_THREAD_STOPPED;
+		sd_printk(KERN_WARNING, sdkp,
+			"sd_schedule_revalidation: Failed to start deferred revalidation thread\n");
+	}
+	spin_unlock_irqrestore(&sd_revalidation_lock, flags);
+}
+
 /**
  *	sd_revalidate_disk - called the first time a new disk is seen,
  *	performs disk spin up, read_capacity, etc.
@@ -2679,6 +2843,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	struct scsi_device *sdp = sdkp->device;
 	unsigned char *buffer;
 	unsigned flush = 0;
+	int defer_revalidation;
 
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
 				      "sd_revalidate_disk\n"));
@@ -2697,13 +2862,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
-	sd_spinup_disk(sdkp);
+	defer_revalidation = sd_spinup_disk(sdkp);
 
 	/*
 	 * Without media there is no reason to ask; moreover, some devices
 	 * react badly if we do.
 	 */
-	if (sdkp->media_present) {
+	if (sdkp->media_present && !defer_revalidation) {
 		sd_read_capacity(sdkp, buffer);
 
 		if (sd_try_extended_inquiry(sdp)) {
@@ -2718,7 +2883,8 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_write_same(sdkp, buffer);
 	}
 
-	sdkp->first_scan = 0;
+	if (!defer_revalidation)
+		sdkp->first_scan = 0;
 
 	/*
 	 * We now have all cache related info, determine how we deal
@@ -2736,6 +2902,9 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	sd_config_write_same(sdkp);
 	kfree(buffer);
 
+	if (defer_revalidation)
+		sd_schedule_revalidation(sdkp);
+
  out:
 	return 0;
 }
@@ -2990,6 +3159,7 @@ static int sd_remove(struct device *dev)
 
 	sdkp = dev_get_drvdata(dev);
 	devt = disk_devt(sdkp->disk);
+	sd_remove_deferred_revalidation(sdkp);
 	scsi_autopm_get_device(sdkp->device);
 
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
@@ -3204,6 +3374,7 @@ static void __exit exit_sd(void)
 
 	SCSI_LOG_HLQUEUE(3, printk("exit_sd: exiting sd driver\n"));
 
+	sd_stop_deferred_revalidation_thread();
 	scsi_unregister_driver(&sd_template.gendrv);
 	mempool_destroy(sd_cdb_pool);
 	kmem_cache_destroy(sd_cdb_cache);
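
For illustration only, the sense check that sd_spinup_disk() and
sd_device_temporarily_offline() rely on above can be sketched in plain
userspace C. This is a minimal sketch, not kernel code: struct
sense_triplet, the enum names, and format_in_progress() are hypothetical
stand-ins for struct scsi_sense_hdr and the kernel's own constants. The
only values taken from the patch are sense key NOT READY (02h) with
ASC/ASCQ 04h/04h, which is the "format in progress" condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; the numeric values match the checks in the patch. */
enum { SENSE_KEY_NOT_READY = 0x02 };
enum { ASC_LUN_NOT_READY = 0x04, ASCQ_FORMAT_IN_PROGRESS = 0x04 };

/* Hypothetical, simplified stand-in for struct scsi_sense_hdr. */
struct sense_triplet {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/*
 * Returns true when the device reported NOT READY with ASC/ASCQ 04h/04h
 * (format in progress), i.e. when revalidation should be deferred.
 * cmd_failed and sense_valid play the roles of the_result and
 * scsi_sense_valid() in the patch.
 */
static bool format_in_progress(bool cmd_failed, bool sense_valid,
			       const struct sense_triplet *s)
{
	return cmd_failed && sense_valid &&
	       s->sense_key == SENSE_KEY_NOT_READY &&
	       s->asc == ASC_LUN_NOT_READY &&
	       s->ascq == ASCQ_FORMAT_IN_PROGRESS;
}
```

With this helper, a triplet of 02h/04h/04h is treated as "defer", while
02h/04h/0Ch (the "unavailable" case that sd_spinup_disk() already breaks
out on) is not.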



end of thread, other threads: [~2013-10-10 16:26 UTC]

Thread overview: 21+ messages
2013-09-23 18:33 [PATCH 00/10] hpsa: September 2013 driver fixes Stephen M. Cameron
2013-09-23 18:33 ` [PATCH 01/10] hpsa: do not attempt to flush the cache on locked up controllers Stephen M. Cameron
2013-09-23 18:33 ` [PATCH 02/10] hpsa: add 5 second delay after doorbell reset Stephen M. Cameron
2013-09-23 18:33 ` [PATCH 03/10] hpsa: do not discard scsi status on aborted commands Stephen M. Cameron
2013-09-23 18:33 ` [PATCH 04/10] hpsa: remove unneeded include of seq_file.h Stephen M. Cameron
2013-09-23 18:33 ` [PATCH 05/10] hpsa: fix memory leak in CCISS_BIG_PASSTHRU ioctl Stephen M. Cameron
2013-09-23 18:33 ` [PATCH 06/10] hpsa: add MSA 2040 to list of external target devices Stephen M. Cameron
2013-09-23 18:34 ` [PATCH 07/10] hpsa: hide logical drives with format in progress from linux Stephen M. Cameron
2013-09-27 13:22   ` Tomas Henzl
2013-09-27 13:34     ` scameron
2013-09-27 14:01       ` Tomas Henzl
2013-09-27 14:41         ` scameron
2013-09-27 14:58           ` Tomas Henzl
2013-09-30 21:18             ` scameron
2013-09-27 16:54           ` Douglas Gilbert
2013-09-27 17:41             ` scameron
2013-10-10 16:25       ` scameron
2013-09-27 19:11     ` scameron
2013-09-23 18:34 ` [PATCH 08/10] hpsa: bring logical drives online when format completes Stephen M. Cameron
2013-09-23 18:34 ` [PATCH 09/10] hpsa: cap CCISS_PASSTHRU at 20 concurrent commands Stephen M. Cameron
2013-09-23 18:34 ` [PATCH 10/10] hpsa: prevent stalled i/o Stephen M. Cameron
