linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes
@ 2012-03-22  6:31 Dan Williams
  2012-03-22  6:31 ` [libsas PATCH v12 01/11] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Changes since v11: http://marc.info/?l=linux-scsi&m=133170574921314&w=2

This is primarily a collection of fixes to the libsas functionality that
has gone upstream in the initial scsi push for 3.3, but also includes a
revised suspend/resume implementation that was initially posted before
the merge window opened.

1/ PATCH 1: Maciej noticed that the original version of "libsas: cleanup
   spurious calls to scsi_schedule_eh" went too far when it deleted
   sas_ata_task_abort.

2/ PATCH 3: Hot plug testing incurs 1 minute timeouts, across 3 reset
   attempts, for ATA unplugs so provide an option to cut that down to
   one reset with a 10 second timeout.

3/ PATCH 4-5: Address long standing kernel panics when hotplugs occur
   during the initial scan.

4/ PATCH 6-9: Address corner cases and regressions in device discovery

5/ PATCH 10-11: Revised suspend implementation with a cleaner fix for
   the sd-probing vs PM-resume deadlock.

[libsas PATCH v12 01/11] libsas: cleanup spurious calls to scsi_schedule_eh
[libsas PATCH v12 02/11] libsas: trim sas_task of slow path infrastructure
[libsas PATCH v12 03/11] libata: reset once
[libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
[libsas PATCH v12 05/11] scsi_transport_sas: fix delete vs scan race
[libsas PATCH v12 06/11] libsas: unify domain_device sas_rphy lifetimes
[libsas PATCH v12 07/11] libsas: fix false positive 'device attached' conditions
[libsas PATCH v12 08/11] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
[libsas PATCH v12 09/11] libsas, libata: fix start of life for a sas ata_port
[libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain
[libsas PATCH v12 11/11] libsas: suspend / resume support

Note, I rebased the isci branch to move all scsi, libsas and libata
patches to this 'libsas' branch.

The following changes since commit cd8df932d894f3128c884e3ae1b2b484540513db:

  [SCSI] qla4xxx: Update driver version to 5.02.00-k15 (2012-02-29 17:03:03 -0600)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci.git libsas-eh-reworks-v12

Dan Carpenter (1):
      mvsas: remove unused variable in mvs_task_exec()

Dan Williams (21):
      libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
      libata, libsas: introduce sched_eh and end_eh port ops
      libsas: enforce eh strategy handlers only in eh context
      libsas: add sas_eh_abort_handler
      libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
      isci: use sas eh strategy handlers
      libsas: trim sas_task of slow path infrastructure
      libsas: fix sas_get_port_device regression
      libata: reset once
      sysfs: handle 'parent deleted before child added'
      scsi_transport_sas: fix delete vs scan race
      libsas: unify domain_device sas_rphy lifetimes
      libsas: fix false positive 'device attached' conditions
      libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
      libata: make ata_print_id atomic
      libsas: continue revalidation
      libsas, libata: fix start of life for a sas ata_port
      libata: export ata_port suspend/resume infrastructure for sas
      libsas: drop sata port multiplier infrastructure
      scsi, sd: limit the scope of the async probe domain
      libsas: suspend / resume support

Jeff Skirvin (1):
      libsas: sas_rediscover_dev did not look at the SMP exec status.

Maciej Trela (1):
      libsas: cleanup spurious calls to scsi_schedule_eh

Thomas Jackson (1):
      libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys

 Documentation/kernel-parameters.txt |    3 +
 drivers/ata/libata-core.c           |   67 ++++++++++---
 drivers/ata/libata-eh.c             |   59 +++++++++--
 drivers/ata/libata-scsi.c           |   35 ++++---
 drivers/ata/libata.h                |    2 +-
 drivers/scsi/ipr.c                  |    6 +-
 drivers/scsi/isci/init.c            |    3 +
 drivers/scsi/libsas/sas_ata.c       |  158 +++++++++++++++++++++++-----
 drivers/scsi/libsas/sas_discover.c  |  153 +++++++++++++++++++--------
 drivers/scsi/libsas/sas_dump.c      |    1 +
 drivers/scsi/libsas/sas_event.c     |   40 ++++----
 drivers/scsi/libsas/sas_expander.c  |   91 ++++++++++++-----
 drivers/scsi/libsas/sas_init.c      |  140 ++++++++++++++++++++++---
 drivers/scsi/libsas/sas_internal.h  |    7 +-
 drivers/scsi/libsas/sas_phy.c       |   42 +++++---
 drivers/scsi/libsas/sas_port.c      |   67 ++++++++++--
 drivers/scsi/libsas/sas_scsi_host.c |  195 ++++++++++++++++++++++++++++++----
 drivers/scsi/mvsas/mv_sas.c         |   21 ++--
 drivers/scsi/pm8001/pm8001_sas.c    |   37 ++++---
 drivers/scsi/scsi.c                 |    6 +
 drivers/scsi/scsi_lib.c             |   10 ++-
 drivers/scsi/scsi_pm.c              |    2 +-
 drivers/scsi/scsi_transport_sas.c   |    6 +-
 drivers/scsi/sd.c                   |    4 +-
 fs/sysfs/dir.c                      |    3 +
 include/linux/libata.h              |   19 +++-
 include/scsi/libsas.h               |   90 +++++++++++++---
 include/scsi/sas_ata.h              |   19 +++-
 include/scsi/scsi_device.h          |    4 +
 lib/kobject.c                       |    7 +-
 30 files changed, 1012 insertions(+), 285 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 01/11] libsas: cleanup spurious calls to scsi_schedule_eh
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
@ 2012-03-22  6:31 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 02/11] libsas: trim sas_task of slow path infrastructure Dan Williams
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: Maciej Trela, linux-ide, Artur Wojcik, Jacek Danecki

From: Maciej Trela <maciej.trela@intel.com>

eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands.  This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
[fixed spurious delete of sas_ata_task_abort]
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c       |    1 -
 drivers/scsi/libsas/sas_scsi_host.c |    1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index bc0cecc..bc83704 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -587,7 +587,6 @@ void sas_ata_task_abort(struct sas_task *task)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_abort_request(qc->scsicmd->request);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_schedule_eh(qc->scsicmd->device->host);
 		return;
 	}
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f0b9b7b..17339e5 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1003,7 +1003,6 @@ void sas_task_abort(struct sas_task *task)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_abort_request(sc->request);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_schedule_eh(sc->device->host);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 02/11] libsas: trim sas_task of slow path infrastructure
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
  2012-03-22  6:31 ` [libsas PATCH v12 01/11] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 03/11] libata: reset once Dan Williams
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Christoph Hellwig, Jack Wang

The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c  |   20 ++++++++++----------
 drivers/scsi/libsas/sas_init.c      |   23 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_scsi_host.c |    8 ++++++--
 drivers/scsi/mvsas/mv_sas.c         |   20 ++++++++++----------
 drivers/scsi/pm8001/pm8001_sas.c    |   34 +++++++++++++++++-----------------
 include/scsi/libsas.h               |   14 +++++++++-----
 6 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 05acd9e..0ab3796 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -51,14 +51,14 @@ static void smp_task_timedout(unsigned long _task)
 		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
 	spin_unlock_irqrestore(&task->task_state_lock, flags);
 
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void smp_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 /* Give it some long enough timeout. In seconds. */
@@ -79,7 +79,7 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 			break;
 		}
 
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task) {
 			res = -ENOMEM;
 			break;
@@ -91,20 +91,20 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 
 		task->task_done = smp_task_done;
 
-		task->timer.data = (unsigned long) task;
-		task->timer.function = smp_task_timedout;
-		task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long) task;
+		task->slow_task->timer.function = smp_task_timedout;
+		task->slow_task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = i->dft->lldd_execute_task(task, 1, GFP_KERNEL);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			SAS_DPRINTK("executing SMP task failed:%d\n", res);
 			break;
 		}
 
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = -ECOMM;
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
 			SAS_DPRINTK("smp task timed out or aborted\n");
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 1bbab3d..014297c 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -48,18 +48,37 @@ struct sas_task *sas_alloc_task(gfp_t flags)
 		INIT_LIST_HEAD(&task->list);
 		spin_lock_init(&task->task_state_lock);
 		task->task_state_flags = SAS_TASK_STATE_PENDING;
-		init_timer(&task->timer);
-		init_completion(&task->completion);
 	}
 
 	return task;
 }
 EXPORT_SYMBOL_GPL(sas_alloc_task);
 
+struct sas_task *sas_alloc_slow_task(gfp_t flags)
+{
+	struct sas_task *task = sas_alloc_task(flags);
+	struct sas_task_slow *slow = kmalloc(sizeof(*slow), flags);
+
+	if (!task || !slow) {
+		if (task)
+			kmem_cache_free(sas_task_cache, task);
+		kfree(slow);
+		return NULL;
+	}
+
+	task->slow_task = slow;
+	init_timer(&slow->timer);
+	init_completion(&slow->completion);
+
+	return task;
+}
+EXPORT_SYMBOL_GPL(sas_alloc_slow_task);
+
 void sas_free_task(struct sas_task *task)
 {
 	if (task) {
 		BUG_ON(!list_empty(&task->list));
+		kfree(task->slow_task);
 		kmem_cache_free(sas_task_cache, task);
 	}
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 6764148..6e795a1 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1134,9 +1134,13 @@ void sas_task_abort(struct sas_task *task)
 
 	/* Escape for libsas internal commands */
 	if (!sc) {
-		if (!del_timer(&task->timer))
+		struct sas_task_slow *slow = task->slow_task;
+
+		if (!slow)
+			return;
+		if (!del_timer(&slow->timer))
 			return;
-		task->timer.function(task->timer.data);
+		slow->timer.function(slow->timer.data);
 		return;
 	}
 
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index b68a653..d0462b8 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1365,9 +1365,9 @@ void mvs_dev_gone(struct domain_device *dev)
 
 static void mvs_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void mvs_tmf_timedout(unsigned long data)
@@ -1375,7 +1375,7 @@ static void mvs_tmf_timedout(unsigned long data)
 	struct sas_task *task = (struct sas_task *)data;
 
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 #define MVS_TASK_TIMEOUT 20
@@ -1386,7 +1386,7 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
 	struct sas_task *task = NULL;
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
@@ -1396,20 +1396,20 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
 		memcpy(&task->ssp_task, parameter, para_len);
 		task->task_done = mvs_task_done;
 
-		task->timer.data = (unsigned long) task;
-		task->timer.function = mvs_tmf_timedout;
-		task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long) task;
+		task->slow_task->timer.function = mvs_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = mvs_task_exec(task, 1, GFP_KERNEL, NULL, 1, tmf);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			mv_printk("executing internel task failed:%d\n", res);
 			goto ex_err;
 		}
 
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index fdbba57..b961112 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -650,9 +650,9 @@ int pm8001_dev_found(struct domain_device *dev)
 
 static void pm8001_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void pm8001_tmf_timedout(unsigned long data)
@@ -660,7 +660,7 @@ static void pm8001_tmf_timedout(unsigned long data)
 	struct sas_task *task = (struct sas_task *)data;
 
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 #define PM8001_TASK_TIMEOUT 20
@@ -683,7 +683,7 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 	struct pm8001_hba_info *pm8001_ha = pm8001_find_ha_by_dev(dev);
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
@@ -691,21 +691,21 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 		task->task_proto = dev->tproto;
 		memcpy(&task->ssp_task, parameter, para_len);
 		task->task_done = pm8001_task_done;
-		task->timer.data = (unsigned long)task;
-		task->timer.function = pm8001_tmf_timedout;
-		task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long)task;
+		task->slow_task->timer.function = pm8001_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = pm8001_task_exec(task, 1, GFP_KERNEL, 1, tmf);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			PM8001_FAIL_DBG(pm8001_ha,
 				pm8001_printk("Executing internal task "
 				"failed\n"));
 			goto ex_err;
 		}
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = -TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
@@ -765,17 +765,17 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
 	struct sas_task *task = NULL;
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
 		task->dev = dev;
 		task->task_proto = dev->tproto;
 		task->task_done = pm8001_task_done;
-		task->timer.data = (unsigned long)task;
-		task->timer.function = pm8001_tmf_timedout;
-		task->timer.expires = jiffies + PM8001_TASK_TIMEOUT * HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long)task;
+		task->slow_task->timer.function = pm8001_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT * HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = pm8001_tag_alloc(pm8001_ha, &ccb_tag);
 		if (res)
@@ -789,13 +789,13 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
 			pm8001_dev, flag, task_tag, ccb_tag);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			PM8001_FAIL_DBG(pm8001_ha,
 				pm8001_printk("Executing internal task "
 				"failed\n"));
 			goto ex_err;
 		}
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 45f9534..f66e83b 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -612,10 +612,6 @@ struct sas_task {
 
 	enum   sas_protocol      task_proto;
 
-	/* Used by the discovery code. */
-	struct timer_list     timer;
-	struct completion     completion;
-
 	union {
 		struct sas_ata_task ata_task;
 		struct sas_smp_task smp_task;
@@ -632,8 +628,15 @@ struct sas_task {
 
 	void   *lldd_task;	  /* for use by LLDDs */
 	void   *uldd_task;
+	struct sas_task_slow *slow_task;
+};
 
-	struct work_struct abort_work;
+struct sas_task_slow {
+	/* standard/extra infrastructure for slow path commands (SMP and
+	 * internal lldd commands
+	 */
+	struct timer_list     timer;
+	struct completion     completion;
 };
 
 #define SAS_TASK_STATE_PENDING      1
@@ -643,6 +646,7 @@ struct sas_task {
 #define SAS_TASK_AT_INITIATOR       16
 
 extern struct sas_task *sas_alloc_task(gfp_t flags);
+extern struct sas_task *sas_alloc_slow_task(gfp_t flags);
 extern void sas_free_task(struct sas_task *task);
 
 struct sas_domain_function_template {


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 03/11] libata: reset once
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
  2012-03-22  6:31 ` [libsas PATCH v12 01/11] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 02/11] libsas: trim sas_task of slow path infrastructure Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added' Dan Williams
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up.  In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/kernel-parameters.txt |    3 +++
 drivers/ata/libata-core.c           |    1 +
 drivers/ata/libata-eh.c             |    2 ++
 include/linux/libata.h              |    1 +
 4 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 033d4e6..0154d16 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1306,6 +1306,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			* nohrst, nosrst, norst: suppress hard, soft
                           and both resets.
 
+			* rstonce: only attempt one reset during
+			  hot-unplug link recovery
+
 			* dump_id: dump IDENTIFY data.
 
 			If there are multiple matching configurations changing
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index ea8444e..0491504 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6381,6 +6381,7 @@ static int __init ata_parse_force_one(char **cur,
 		{ "nohrst",	.lflags		= ATA_LFLAG_NO_HRST },
 		{ "nosrst",	.lflags		= ATA_LFLAG_NO_SRST },
 		{ "norst",	.lflags		= ATA_LFLAG_NO_HRST | ATA_LFLAG_NO_SRST },
+		{ "rstonce",	.lflags		= ATA_LFLAG_RST_ONCE },
 	};
 	char *start = *cur, *p = *cur;
 	char *id, *val, *endp;
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 4f12f63..b02748b 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2605,6 +2605,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	 */
 	while (ata_eh_reset_timeouts[max_tries] != ULONG_MAX)
 		max_tries++;
+	if (link->flags & ATA_LFLAG_RST_ONCE)
+		max_tries = 1;
 	if (link->flags & ATA_LFLAG_NO_HRST)
 		hardreset = NULL;
 	if (link->flags & ATA_LFLAG_NO_SRST)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 83ff256..ad2ea33 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -182,6 +182,7 @@ enum {
 	ATA_LFLAG_DISABLED	= (1 << 6), /* link is disabled */
 	ATA_LFLAG_SW_ACTIVITY	= (1 << 7), /* keep activity stats */
 	ATA_LFLAG_NO_LPM	= (1 << 8), /* disable LPM on this link */
+	ATA_LFLAG_RST_ONCE	= (1 << 9), /* limit recovery to one reset */
 
 	/* struct ata_port flags */
 	ATA_FLAG_SLAVE_POSS	= (1 << 0), /* host supports slave dev */


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (2 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 03/11] libata: reset once Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22 14:39   ` Greg Kroah-Hartman
  2012-03-22 14:47   ` James Bottomley
  2012-03-22  6:32 ` [libsas PATCH v12 05/11] scsi_transport_sas: fix delete vs scan race Dan Williams
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Greg Kroah-Hartman

In scsi at least two cases of the parent device being deleted before the
child is added have been observed.

1/ scsi is performing async scans and the device is removed prior to the
   async can thread running.

2/ libsas discovery event running after the parent port has been torn
   down.

Result in crash signatures like:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
 Stack:
  00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
  ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
  ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a

These scenarios need to be cleaned up, but in the meantime the system
need not crash if this ordering occurs.  Instead report:

 kobject_add_internal failed for target8:0:16 (error: -2 parent: end_device-8:0:24)
 Pid: 2942, comm: scsi_scan_8 Not tainted 3.3.0-rc7-isci+ #2
 Call Trace:
  [<ffffffff8125e551>] kobject_add_internal+0x1c1/0x1f3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e659>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e723>] kobject_add+0x64/0x66
  [<ffffffff8131124b>] device_add+0x12d/0x63a
  [<ffffffff8125e0ef>] ? kobject_put+0x4c/0x50
  [<ffffffff8132f370>] scsi_sysfs_add_sdev+0x4e/0x28a
  [<ffffffff8132dce3>] do_scan_async+0x9c/0x145

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/sysfs/dir.c |    3 +++
 lib/kobject.c  |    7 ++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 7fdf6a7..86521ee 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
 	else
 		parent_sd = &sysfs_root;
 
+	if (!parent_sd)
+		return -ENOENT;
+
 	if (sysfs_ns_type(parent_sd))
 		ns = kobj->ktype->namespace(kobj);
 	type = sysfs_read_ns_type(kobj);
diff --git a/lib/kobject.c b/lib/kobject.c
index c33d7a1..e5f86c0 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -192,13 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
 
 		/* be noisy on error issues */
 		if (error == -EEXIST)
-			printk(KERN_ERR "%s failed for %s with "
+			pr_err("%s failed for %s with "
 			       "-EEXIST, don't try to register things with "
 			       "the same name in the same directory.\n",
 			       __func__, kobject_name(kobj));
 		else
-			printk(KERN_ERR "%s failed for %s (%d)\n",
-			       __func__, kobject_name(kobj), error);
+			pr_err("%s failed for %s (error: %d parent: %s)\n",
+			       __func__, kobject_name(kobj), error,
+			       parent ? kobject_name(parent) : "'none'");
 		dump_stack();
 	} else
 		kobj->state_in_sysfs = 1;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 05/11] scsi_transport_sas: fix delete vs scan race
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (3 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added' Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 06/11] libsas: unify domain_device sas_rphy lifetimes Dan Williams
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a
  [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
  [<ffffffff8107de15>] ? module_refcount+0x89/0xa0
  [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
  [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145

...teach sas_rphy_remove to wait for async scanning to quiesce before
removing the end_device.  It seems this is a more general problem [1],
but this patch only addresses sas transport.

[1]: 23edb6e [SCSI] mpt2sas: Do not set sas_device->starget to NULL from
the slave_destroy callback when all the LUNS have been deleted

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi_transport_sas.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index f7565fc..47abb90 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -33,8 +33,9 @@
 #include <linux/bsg.h>
 
 #include <scsi/scsi.h>
-#include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
+#include <scsi/scsi_scan.h>
+#include <scsi/scsi_device.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_sas.h>
 
@@ -1667,6 +1668,9 @@ sas_rphy_remove(struct sas_rphy *rphy)
 {
 	struct device *dev = &rphy->dev;
 
+	/* prevent device_del() while child device_add() may be in-flight */
+	scsi_complete_async_scans();
+
 	switch (rphy->identify.device_type) {
 	case SAS_END_DEVICE:
 		scsi_remove_target(dev);


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 06/11] libsas: unify domain_device sas_rphy lifetimes
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (4 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 05/11] scsi_transport_sas: fix delete vs scan race Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 07/11] libsas: fix false positive 'device attached' conditions Dan Williams
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tom Jackson, linux-ide

Since the domain_device can out live the scsi_target we need the rphy to
follow suit otherwise we run into issues like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
  IP: [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
  PGD 0
  Oops: 0000 [#1] SMP
  CPU 1
  Modules linked in: ses enclosure isci libsas scsi_transport_sas fuse sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf microcode pcspkr igb joydev iTCO_wdt ioatdma iTCO_vendor_support i2c_i801 i2c_core dca wmi hed ipv6 pata_acpi ata_generic [last unloaded: scsi_wait_scan]

  Pid: 129, comm: kworker/u:3 Not tainted 3.3.0-rc5-isci+ #1 Intel Corporation SandyBridge Platform/To be filled by O.E.M.
  RIP: 0010:[<ffffffffa011561b>] [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
  RSP: 0018:ffff88042232dd70 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff8804283165b8 RCX: ffff88042232dda0
  RDX: ffff88042232dd78 RSI: ffff8804283165b8 RDI: ffffffffa01188d7
  RBP: ffff88042232ddd0 R08: ffff880388454000 R09: ffff8803edfde1f8
  R10: ffff8803edfde1f8 R11: ffff8803edfde1f8 R12: ffff880428316750
  R13: ffff880388454000 R14: ffff8803f88b31d0 R15: ffff8803f8b21d50
  FS: 0000000000000000(0000) GS:ffff88042ee20000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000050 CR3: 0000000001a05000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:3 (pid: 129, threadinfo ffff88042232c000, task ffff88042230c920)
  Stack:
  0000000000000000 ffff880400000018 ffff88042232dde0 ffff88042232dda0
  ffffffffa01188c4 ffff88042ee93af0 ffff88042232ddb0 ffffffff8100e047
  ffff88042232de10 ffff880420e5a2c8 ffff8803f8b21d50 ffff8803edfde1f8
  Call Trace:
  [<ffffffff8100e047>] ? load_TLS+0xb/0xf
  [<ffffffffa01156ad>] async_sas_ata_eh+0x66/0x95 [libsas]
  [<ffffffff810655e1>] async_run_entry_fn+0x9e/0x131

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |   11 ++++++-----
 drivers/scsi/libsas/sas_expander.c |    6 ++++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 8ab1de6..57c5100 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -152,6 +152,7 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	sas_device_set_phy(dev, port->port);
 
 	dev->rphy = rphy;
+	get_device(&dev->rphy->dev);
 
 	if (dev_is_sata(dev) || dev->dev_type == SAS_END_DEV)
 		list_add_tail(&dev->disco_list_node, &port->disco_list);
@@ -256,6 +257,9 @@ void sas_free_device(struct kref *kref)
 {
 	struct domain_device *dev = container_of(kref, typeof(*dev), kref);
 
+	put_device(&dev->rphy->dev);
+	dev->rphy = NULL;
+
 	if (dev->parent)
 		sas_put_device(dev->parent);
 
@@ -314,7 +318,6 @@ static void sas_destruct_devices(struct work_struct *work)
 
 		sas_remove_children(&dev->rphy->dev);
 		sas_rphy_delete(dev->rphy);
-		dev->rphy = NULL;
 		sas_unregister_common_dev(port, dev);
 	}
 }
@@ -326,11 +329,11 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
 		/* this rphy never saw sas_rphy_add */
 		list_del_init(&dev->disco_list_node);
 		sas_rphy_free(dev->rphy);
-		dev->rphy = NULL;
 		sas_unregister_common_dev(port, dev);
+		return;
 	}
 
-	if (dev->rphy && !test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
+	if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
 		sas_rphy_unlink(dev->rphy);
 		list_move_tail(&dev->disco_list_node, &port->destroy_list);
 		sas_discover_event(dev->port, DISCE_DESTRUCT);
@@ -430,8 +433,6 @@ static void sas_discover_domain(struct work_struct *work)
 
 	if (error) {
 		sas_rphy_free(dev->rphy);
-		dev->rphy = NULL;
-
 		list_del_init(&dev->disco_list_node);
 		spin_lock_irq(&port->dev_list_lock);
 		list_del_init(&dev->dev_list_node);
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index e9423f3..5874c7be 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -783,6 +783,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 		sas_init_dev(child);
 
 		child->rphy = rphy;
+		get_device(&rphy->dev);
 
 		list_add_tail(&child->disco_list_node, &parent->port->disco_list);
 
@@ -806,6 +807,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 		sas_init_dev(child);
 
 		child->rphy = rphy;
+		get_device(&rphy->dev);
 		sas_fill_in_rphy(child, rphy);
 
 		list_add_tail(&child->disco_list_node, &parent->port->disco_list);
@@ -830,8 +832,6 @@ static struct domain_device *sas_ex_discover_end_dev(
 
  out_list_del:
 	sas_rphy_free(child->rphy);
-	child->rphy = NULL;
-
 	list_del(&child->disco_list_node);
 	spin_lock_irq(&parent->port->dev_list_lock);
 	list_del(&child->dev_list_node);
@@ -911,6 +911,7 @@ static struct domain_device *sas_ex_discover_expander(
 	}
 	port = parent->port;
 	child->rphy = rphy;
+	get_device(&rphy->dev);
 	edev = rphy_to_expander_device(rphy);
 	child->dev_type = phy->attached_dev_type;
 	kref_get(&parent->kref);
@@ -934,6 +935,7 @@ static struct domain_device *sas_ex_discover_expander(
 
 	res = sas_discover_expander(child);
 	if (res) {
+		sas_rphy_delete(rphy);
 		spin_lock_irq(&parent->port->dev_list_lock);
 		list_del(&child->dev_list_node);
 		spin_unlock_irq(&parent->port->dev_list_lock);


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 07/11] libsas: fix false positive 'device attached' conditions
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (5 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 06/11] libsas: unify domain_device sas_rphy lifetimes Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 08/11] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready Dan Williams
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Andrzej Jakowski

Normalize phy->attached_sas_addr to return a zero-address in the case
when device-type == NO_DEVICE or the linkrate is invalid to handle
expanders that put non-zero sas addresses in the discovery response:

 sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device)
 sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device)
 sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device)
 sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device)

Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 5874c7be..48d7ce9 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -240,7 +240,14 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 	phy->attached_sata_ps   = dr->attached_sata_ps;
 	phy->attached_iproto = dr->iproto << 1;
 	phy->attached_tproto = dr->tproto << 1;
-	memcpy(phy->attached_sas_addr, dr->attached_sas_addr, SAS_ADDR_SIZE);
+	/* help some expanders that fail to zero sas_address in the 'no
+	 * device' case
+	 */
+	if (phy->attached_dev_type == NO_DEVICE ||
+	    phy->linkrate < SAS_LINK_RATE_1_5_GBPS)
+		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
+	else
+		memcpy(phy->attached_sas_addr, dr->attached_sas_addr, SAS_ADDR_SIZE);
 	phy->attached_phy_id = dr->attached_phy_id;
 	phy->phy_change_count = dr->change_count;
 	phy->routing_attr = dr->routing_attr;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 08/11] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (6 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 07/11] libsas: fix false positive 'device attached' conditions Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 09/11] libsas, libata: fix start of life for a sas ata_port Dan Williams
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-ide, Jacek Danecki, Maciej Patelczyk, Bartek Nowakowski,
	Jack Wang

The check_ready implementation in the expander-attached ata device case
polls on sas_ex_phy_discover().  The effect is that the ex_phy fields
(critically ->attached_sas_addr) can change.  When ata_eh ends and
libsas comes along to revalidate the domain
sas_unregister_devs_sas_addr() can fail to lookup devices to remove, or
fail to re-add an ata device that ata_eh marked as disabled.  So change
the code to skip the sas_address and change count updates when ata_eh is
active.

Cc: Jack Wang <jack_wang@usish.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 48d7ce9..b5fbfaa 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -202,6 +202,7 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 	u8 sas_addr[SAS_ADDR_SIZE];
 	struct smp_resp *resp = rsp;
 	struct discover_resp *dr = &resp->disc;
+	struct sas_ha_struct *ha = dev->port->ha;
 	struct expander_device *ex = &dev->ex_dev;
 	struct ex_phy *phy = &ex->ex_phy[phy_id];
 	struct sas_rphy *rphy = dev->rphy;
@@ -209,6 +210,8 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 	char *type;
 
 	if (new_phy) {
+		if (WARN_ON_ONCE(test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)))
+			return;
 		phy->phy = sas_phy_alloc(&rphy->dev, phy_id);
 
 		/* FIXME: error_handling */
@@ -233,6 +236,8 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 	memcpy(sas_addr, phy->attached_sas_addr, SAS_ADDR_SIZE);
 
 	phy->attached_dev_type = to_dev_type(dr);
+	if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state))
+		goto out;
 	phy->phy_id = phy_id;
 	phy->linkrate = dr->linkrate;
 	phy->attached_sata_host = dr->attached_sata_host;
@@ -273,6 +278,7 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 			return;
 		}
 
+ out:
 	switch (phy->attached_dev_type) {
 	case SATA_PENDING:
 		type = "stp pending";
@@ -311,7 +317,15 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp)
 	else
 		return;
 
-	SAS_DPRINTK("ex %016llx phy%02d:%c:%X attached: %016llx (%s)\n",
+	/* if the attached device type changed and ata_eh is active,
+	 * make sure we run revalidation when eh completes (see:
+	 * sas_enable_revalidation)
+	 */
+	if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state))
+		set_bit(DISCE_REVALIDATE_DOMAIN, &dev->port->disc.pending);
+
+	SAS_DPRINTK("%sex %016llx phy%02d:%c:%X attached: %016llx (%s)\n",
+		    test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state) ? "ata: " : "",
 		    SAS_ADDR(dev->sas_addr), phy->phy_id,
 		    sas_route_char(dev, phy), phy->linkrate,
 		    SAS_ADDR(phy->attached_sas_addr), type);


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 09/11] libsas, libata: fix start of life for a sas ata_port
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (7 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 08/11] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain Dan Williams
  2012-03-22  6:32 ` [libsas PATCH v12 11/11] libsas: suspend / resume support Dan Williams
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Michal Kosciowski

This changes the ordering of initialization and probing events from:
  1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate ata_port and schedule port probe in DISCE_PROBE
...to:
  1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  3/ schedule port probe in DISCE_PROBE

This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
destrory ata devices before they have been fully initialized:

  BUG: unable to handle kernel paging request at 0000000000003b10
  IP: [<ffffffffa0053d7e>] sas_ata_end_eh+0x12/0x5e [libsas]
  ...
  [<ffffffffa004d1af>] sas_unregister_common_dev+0x78/0xc9 [libsas]
  [<ffffffffa004d4d4>] sas_unregister_dev+0x4f/0xad [libsas]
  [<ffffffffa004d5b1>] sas_unregister_domain_devices+0x7f/0xbf [libsas]
  [<ffffffffa004c487>] sas_deform_port+0x61/0x1b8 [libsas]
  [<ffffffffa004bed0>] sas_phye_loss_of_signal+0x29/0x2b [libsas]

...and kills the awkward "sata domain_device briefly existing in the
domain without an ata_port" state.

Reported-by: Michal Kosciowski <michal.kosciowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libata-scsi.c          |   35 ++++++++++++++++++++---------------
 drivers/scsi/ipr.c                 |    6 +++++-
 drivers/scsi/libsas/sas_ata.c      |   33 ++++++++++-----------------------
 drivers/scsi/libsas/sas_discover.c |   13 ++++++++++---
 drivers/scsi/libsas/sas_expander.c |    8 +++++---
 include/linux/libata.h             |    3 ++-
 include/scsi/sas_ata.h             |    4 ++--
 7 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 93dabdc..cde63f2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3838,18 +3838,25 @@ void ata_sas_port_stop(struct ata_port *ap)
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_stop);
 
-int ata_sas_async_port_init(struct ata_port *ap)
+/**
+ * ata_sas_async_probe - simply schedule probing and return
+ * @ap: Port to probe
+ *
+ * For batch scheduling of probe for sas attached ata devices, assumes
+ * the port has already been through ata_sas_port_init()
+ */
+void ata_sas_async_probe(struct ata_port *ap)
 {
-	int rc = ap->ops->port_start(ap);
-
-	if (!rc) {
-		ap->print_id = atomic_inc_return(&ata_print_id);
-		__ata_port_probe(ap);
-	}
+	__ata_port_probe(ap);
+}
+EXPORT_SYMBOL_GPL(ata_sas_async_probe);
 
-	return rc;
+int ata_sas_sync_probe(struct ata_port *ap)
+{
+	return ata_port_probe(ap);
 }
-EXPORT_SYMBOL_GPL(ata_sas_async_port_init);
+EXPORT_SYMBOL_GPL(ata_sas_sync_probe);
+
 
 /**
  *	ata_sas_port_init - Initialize a SATA device
@@ -3866,12 +3873,10 @@ int ata_sas_port_init(struct ata_port *ap)
 {
 	int rc = ap->ops->port_start(ap);
 
-	if (!rc) {
-		ap->print_id = atomic_inc_return(&ata_print_id);
-		rc = ata_port_probe(ap);
-	}
-
-	return rc;
+	if (rc)
+		return rc;
+	ap->print_id = atomic_inc_return(&ata_print_id);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_init);
 
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index cdfe5a1..58e7376 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -4546,8 +4546,12 @@ static int ipr_ata_slave_alloc(struct scsi_device *sdev)
 	ENTER;
 	if (sdev->sdev_target)
 		sata_port = sdev->sdev_target->hostdata;
-	if (sata_port)
+	if (sata_port) {
 		rc = ata_sas_port_init(sata_port->ap);
+		if (rc == 0)
+			rc = ata_sas_sync_probe(ap);
+	}
+
 	if (rc)
 		ipr_slave_destroy(sdev);
 
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 545b78d..6a6c80f 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -573,11 +573,12 @@ static struct ata_port_info sata_port_info = {
 	.port_ops = &sas_sata_ops
 };
 
-int sas_ata_init_host_and_port(struct domain_device *found_dev)
+int sas_ata_init(struct domain_device *found_dev)
 {
 	struct sas_ha_struct *ha = found_dev->port->ha;
 	struct Scsi_Host *shost = ha->core.shost;
 	struct ata_port *ap;
+	int rc;
 
 	ata_host_init(&found_dev->sata_dev.ata_host,
 		      ha->dev,
@@ -594,8 +595,11 @@ int sas_ata_init_host_and_port(struct domain_device *found_dev)
 	ap->private_data = found_dev;
 	ap->cbl = ATA_CBL_SATA;
 	ap->scsi_host = shost;
-	/* publish initialized ata port */
-	smp_wmb();
+	rc = ata_sas_port_init(ap);
+	if (rc) {
+		ata_sas_port_destroy(ap);
+		return rc;
+	}
 	found_dev->sata_dev.ap = ap;
 
 	return 0;
@@ -674,18 +678,13 @@ static void sas_get_ata_command_set(struct domain_device *dev)
 void sas_probe_sata(struct asd_sas_port *port)
 {
 	struct domain_device *dev, *n;
-	int err;
 
 	mutex_lock(&port->ha->disco_mutex);
-	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node) {
+	list_for_each_entry(dev, &port->disco_list, disco_list_node) {
 		if (!dev_is_sata(dev))
 			continue;
 
-		err = sas_ata_init_host_and_port(dev);
-		if (err)
-			sas_fail_probe(dev, __func__, err);
-		else
-			ata_sas_async_port_init(dev->sata_dev.ap);
+		ata_sas_async_probe(dev->sata_dev.ap);
 	}
 	mutex_unlock(&port->ha->disco_mutex);
 
@@ -740,18 +739,6 @@ static void async_sas_ata_eh(void *data, async_cookie_t cookie)
 	sas_put_device(dev);
 }
 
-static bool sas_ata_dev_eh_valid(struct domain_device *dev)
-{
-	struct ata_port *ap;
-
-	if (!dev_is_sata(dev))
-		return false;
-	ap = dev->sata_dev.ap;
-	/* consume fully initialized ata ports */
-	smp_rmb();
-	return !!ap;
-}
-
 void sas_ata_strategy_handler(struct Scsi_Host *shost)
 {
 	struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(shost);
@@ -775,7 +762,7 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost)
 
 		spin_lock(&port->dev_list_lock);
 		list_for_each_entry(dev, &port->dev_list, dev_list_node) {
-			if (!sas_ata_dev_eh_valid(dev))
+			if (!dev_is_sata(dev))
 				continue;
 
 			/* hold a reference over eh since we may be
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 57c5100..b031d23 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -73,6 +73,7 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	struct asd_sas_phy *phy;
 	struct sas_rphy *rphy;
 	struct domain_device *dev;
+	int rc = -ENODEV;
 
 	dev = sas_alloc_device();
 	if (!dev)
@@ -111,9 +112,16 @@ static int sas_get_port_device(struct asd_sas_port *port)
 
 	sas_init_dev(dev);
 
+	dev->port = port;
 	switch (dev->dev_type) {
-	case SAS_END_DEV:
 	case SATA_DEV:
+		rc = sas_ata_init(dev);
+		if (rc) {
+			rphy = NULL;
+			break;
+		}
+		/* fall through */
+	case SAS_END_DEV:
 		rphy = sas_end_device_alloc(port->port);
 		break;
 	case EDGE_DEV:
@@ -132,7 +140,7 @@ static int sas_get_port_device(struct asd_sas_port *port)
 
 	if (!rphy) {
 		sas_put_device(dev);
-		return -ENODEV;
+		return rc;
 	}
 
 	rphy->identify.phy_identifier = phy->phy->identify.phy_identifier;
@@ -140,7 +148,6 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	sas_fill_in_rphy(dev, rphy);
 	sas_hash_addr(dev->hashed_sas_addr, dev->sas_addr);
 	port->port_dev = dev;
-	dev->port = port;
 	dev->linkrate = port->linkrate;
 	dev->min_linkrate = port->linkrate;
 	dev->max_linkrate = port->linkrate;
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 62fd06b..a4b15c1 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -797,12 +797,14 @@ static struct domain_device *sas_ex_discover_end_dev(
 		if (res)
 			goto out_free;
 
+		sas_init_dev(child);
+		res = sas_ata_init(child);
+		if (res)
+			goto out_free;
 		rphy = sas_end_device_alloc(phy->port);
-		if (unlikely(!rphy))
+		if (!rphy)
 			goto out_free;
 
-		sas_init_dev(child);
-
 		child->rphy = rphy;
 		get_device(&rphy->dev);
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index ad2ea33..0a3c99a 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -999,7 +999,8 @@ extern int ata_sas_scsi_ioctl(struct ata_port *ap, struct scsi_device *dev,
 extern void ata_sas_port_destroy(struct ata_port *);
 extern struct ata_port *ata_sas_port_alloc(struct ata_host *,
 					   struct ata_port_info *, struct Scsi_Host *);
-extern int ata_sas_async_port_init(struct ata_port *);
+extern void ata_sas_async_probe(struct ata_port *ap);
+extern int ata_sas_sync_probe(struct ata_port *ap);
 extern int ata_sas_port_init(struct ata_port *);
 extern int ata_sas_port_start(struct ata_port *ap);
 extern void ata_sas_port_stop(struct ata_port *ap);
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index cae55b6..2dfbdaa 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -37,7 +37,7 @@ static inline int dev_is_sata(struct domain_device *dev)
 }
 
 int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy);
-int sas_ata_init_host_and_port(struct domain_device *found_dev);
+int sas_ata_init(struct domain_device *dev);
 void sas_ata_task_abort(struct sas_task *task);
 void sas_ata_strategy_handler(struct Scsi_Host *shost);
 void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
@@ -53,7 +53,7 @@ static inline int dev_is_sata(struct domain_device *dev)
 {
 	return 0;
 }
-static inline int sas_ata_init_host_and_port(struct domain_device *found_dev)
+static inline int sas_ata_init(struct domain_device *dev)
 {
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (8 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 09/11] libsas, libata: fix start of life for a sas ata_port Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  2012-03-22 14:20   ` Alan Stern
  2012-03-22  6:32 ` [libsas PATCH v12 11/11] libsas: suspend / resume support Dan Williams
  10 siblings, 1 reply; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Alan Stern

sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:

[  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
[  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[  494.611685] Call Trace:
[  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
[  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
[  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
[  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
[  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
[  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
[  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
[  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
[  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()

[  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
[  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[  853.886633] Call Trace:
[  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
[  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
[  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
[  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
[  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.

Cc: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi.c        |    6 ++++++
 drivers/scsi/scsi_lib.c    |   10 +++++++---
 drivers/scsi/scsi_pm.c     |    2 +-
 drivers/scsi/sd.c          |    4 ++--
 include/scsi/scsi_device.h |    4 ++++
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 07322ec..61c82a3 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -90,6 +90,12 @@ unsigned int scsi_logging_level;
 EXPORT_SYMBOL(scsi_logging_level);
 #endif
 
+#if IS_ENABLED(CONFIG_PM) || IS_ENABLED(CONFIG_BLK_DEV_SD)
+/* sd and scsi_pm need to coordinate flushing async actions */
+LIST_HEAD(scsi_sd_probe_domain);
+EXPORT_SYMBOL(scsi_sd_probe_domain);
+#endif
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b4833de..992ff63 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2348,10 +2348,14 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  *
  *	Must be called with user context, may sleep.
  */
-void
-scsi_device_resume(struct scsi_device *sdev)
+void scsi_device_resume(struct scsi_device *sdev)
 {
-	if(scsi_device_set_state(sdev, SDEV_RUNNING))
+	/* check if the device state was mutated prior to resume, and if
+	 * so assume the state is being managed elsewhere (for example
+	 * device deleted during suspend)
+	 */
+	if (sdev->sdev_state != SDEV_QUIESCE ||
+	    scsi_device_set_state(sdev, SDEV_RUNNING))
 		return;
 	scsi_run_queue(sdev->request_queue);
 }
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index c467064..f661a41 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -97,7 +97,7 @@ static int scsi_bus_prepare(struct device *dev)
 {
 	if (scsi_is_sdev_device(dev)) {
 		/* sd probing uses async_schedule.  Wait until it finishes. */
-		async_synchronize_full();
+		async_synchronize_full_domain(&scsi_sd_probe_domain);
 
 	} else if (scsi_is_host_device(dev)) {
 		/* Wait until async scanning is finished */
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd17cf8..5e3910e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2717,7 +2717,7 @@ static int sd_probe(struct device *dev)
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule(sd_probe_async, sdkp);
+	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
 
 	return 0;
 
@@ -2751,7 +2751,7 @@ static int sd_remove(struct device *dev)
 	sdkp = dev_get_drvdata(dev);
 	scsi_autopm_get_device(sdkp->device);
 
-	async_synchronize_full();
+	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);
 	device_del(&sdkp->dev);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 77273f2..86992a6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -393,6 +393,10 @@ static inline int scsi_autopm_get_device(struct scsi_device *d) { return 0; }
 static inline void scsi_autopm_put_device(struct scsi_device *d) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+#if IS_ENABLED(CONFIG_PM) || IS_ENABLED(CONFIG_BLK_DEV_SD)
+extern struct list_head scsi_sd_probe_domain;
+#endif
+
 static inline int __must_check scsi_device_reprobe(struct scsi_device *sdev)
 {
 	return device_reprobe(&sdev->sdev_gendev);


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [libsas PATCH v12 11/11] libsas: suspend / resume support
  2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
                   ` (9 preceding siblings ...)
  2012-03-22  6:32 ` [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain Dan Williams
@ 2012-03-22  6:32 ` Dan Williams
  10 siblings, 0 replies; 24+ messages in thread
From: Dan Williams @ 2012-03-22  6:32 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Alan Stern, Maciej Patelczyk, Jacek Danecki

libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".

sas_suspend_ha - disable event processing allowing the lldd to take down
                 links without concern for causing hotplug events.
                 Regardless of whether the lldd actually posts link down
                 messages libsas notifies the lldd that all
                 domain_devices are gone.

sas_prep_resume_ha - on the way back up before the lldd starts link
                     training clean out any spurious events that were
                     generated on the way down, and re-enable event
                     processing

sas_resume_ha - after the lldd has started and decided that all phys
		have posted link-up events this routine is called to let
		libsas start it's own timeout of any phys that did not
		resume.  After the timeout an lldd can cancel the
                phy teardown by posting a link-up event.

Storage for ex_change_count (u16) and phy_change_count (u8) are changed
to int so they can be set to -1 to indicate 'invalidated'.

Cc: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c      |   86 ++++++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_discover.c |   69 ++++++++++++++++++++++++----
 drivers/scsi/libsas/sas_dump.c     |    1 
 drivers/scsi/libsas/sas_event.c    |    4 +-
 drivers/scsi/libsas/sas_init.c     |   90 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_internal.h |    1 
 drivers/scsi/libsas/sas_phy.c      |   21 ++++++++
 drivers/scsi/libsas/sas_port.c     |   52 ++++++++++++++++++++-
 include/scsi/libsas.h              |   20 ++++++--
 include/scsi/sas_ata.h             |   10 ++++
 10 files changed, 335 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 6a6c80f..607a35b 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -700,6 +700,92 @@ void sas_probe_sata(struct asd_sas_port *port)
 		if (ata_dev_disabled(sas_to_ata_dev(dev)))
 			sas_fail_probe(dev, __func__, -ENODEV);
 	}
+
+}
+
+static bool sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
+{
+	struct domain_device *dev, *n;
+	bool retry = false;
+
+	list_for_each_entry_safe(dev, n, &port->dev_list, dev_list_node) {
+		int rc;
+
+		if (!dev_is_sata(dev))
+			continue;
+
+		sas_ata_wait_eh(dev);
+		rc = dev->sata_dev.pm_result;
+		if (rc == -EAGAIN)
+			retry = true;
+		else if (rc) {
+			/* since we don't have a
+			 * ->port_{suspend|resume} routine in our
+			 *  ata_port ops, and no entanglements with
+			 *  acpi, suspend should just be mechanical trip
+			 *  through eh, catch cases where these
+			 *  assumptions are invalidated
+			 */
+			WARN_ONCE(1, "failed %s %s error: %d\n", func,
+				 dev_name(&dev->rphy->dev), rc);
+		}
+
+		/* if libata failed to power manage the device, tear it down */
+		if (ata_dev_disabled(sas_to_ata_dev(dev)))
+			sas_fail_probe(dev, func, -ENODEV);
+	}
+
+	return retry;
+}
+
+void sas_suspend_sata(struct asd_sas_port *port)
+{
+	struct domain_device *dev;
+
+ retry:
+	mutex_lock(&port->ha->disco_mutex);
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		struct sata_device *sata;
+
+		if (!dev_is_sata(dev))
+			continue;
+
+		sata = &dev->sata_dev;
+		if (sata->ap->pm_mesg.event == PM_EVENT_SUSPEND)
+			continue;
+
+		sata->pm_result = -EIO;
+		ata_sas_port_async_suspend(sata->ap, &sata->pm_result);
+	}
+	mutex_unlock(&port->ha->disco_mutex);
+
+	if (sas_ata_flush_pm_eh(port, __func__))
+		goto retry;
+}
+
+void sas_resume_sata(struct asd_sas_port *port)
+{
+	struct domain_device *dev;
+
+ retry:
+	mutex_lock(&port->ha->disco_mutex);
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		struct sata_device *sata;
+
+		if (!dev_is_sata(dev))
+			continue;
+
+		sata = &dev->sata_dev;
+		if (sata->ap->pm_mesg.event == PM_EVENT_ON)
+			continue;
+
+		sata->pm_result = -EIO;
+		ata_sas_port_async_resume(sata->ap, &sata->pm_result);
+	}
+	mutex_unlock(&port->ha->disco_mutex);
+
+	if (sas_ata_flush_pm_eh(port, __func__))
+		goto retry;
 }
 
 /**
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 3e9dc1a..a0c3003 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -24,6 +24,7 @@
 
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
+#include <linux/async.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_eh.h>
 #include "sas_internal.h"
@@ -180,16 +181,18 @@ int sas_notify_lldd_dev_found(struct domain_device *dev)
 	struct Scsi_Host *shost = sas_ha->core.shost;
 	struct sas_internal *i = to_sas_internal(shost->transportt);
 
-	if (i->dft->lldd_dev_found) {
-		res = i->dft->lldd_dev_found(dev);
-		if (res) {
-			printk("sas: driver on pcidev %s cannot handle "
-			       "device %llx, error:%d\n",
-			       dev_name(sas_ha->dev),
-			       SAS_ADDR(dev->sas_addr), res);
-		}
-		kref_get(&dev->kref);
+	if (!i->dft->lldd_dev_found)
+		return 0;
+
+	res = i->dft->lldd_dev_found(dev);
+	if (res) {
+		printk("sas: driver on pcidev %s cannot handle "
+		       "device %llx, error:%d\n",
+		       dev_name(sas_ha->dev),
+		       SAS_ADDR(dev->sas_addr), res);
 	}
+	set_bit(SAS_DEV_FOUND, &dev->state);
+	kref_get(&dev->kref);
 	return res;
 }
 
@@ -200,7 +203,10 @@ void sas_notify_lldd_dev_gone(struct domain_device *dev)
 	struct Scsi_Host *shost = sas_ha->core.shost;
 	struct sas_internal *i = to_sas_internal(shost->transportt);
 
-	if (i->dft->lldd_dev_gone) {
+	if (!i->dft->lldd_dev_gone)
+		return;
+
+	if (test_and_clear_bit(SAS_DEV_FOUND, &dev->state)) {
 		i->dft->lldd_dev_gone(dev);
 		sas_put_device(dev);
 	}
@@ -234,6 +240,47 @@ static void sas_probe_devices(struct work_struct *work)
 	}
 }
 
+static void sas_suspend_devices(struct work_struct *work)
+{
+	struct asd_sas_phy *phy;
+	struct domain_device *dev;
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
+	struct asd_sas_port *port = ev->port;
+	struct Scsi_Host *shost = port->ha->core.shost;
+	struct sas_internal *si = to_sas_internal(shost->transportt);
+
+	clear_bit(DISCE_SUSPEND, &port->disc.pending);
+
+	sas_suspend_sata(port);
+
+	/* lldd is free to forget the domain_device across the
+	 * suspension, we force the issue here to keep the reference
+	 * counts aligned
+	 */
+	list_for_each_entry(dev, &port->dev_list, dev_list_node)
+		sas_notify_lldd_dev_gone(dev);
+
+	/* we are suspending, so we know events are disabled and
+	 * phy_list is not being mutated
+	 */
+	list_for_each_entry(phy, &port->phy_list, port_phy_el) {
+		if (si->dft->lldd_port_formed)
+			si->dft->lldd_port_deformed(phy);
+		phy->suspended = 1;
+		port->suspended = 1;
+	}
+}
+
+static void sas_resume_devices(struct work_struct *work)
+{
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
+	struct asd_sas_port *port = ev->port;
+
+	clear_bit(DISCE_RESUME, &port->disc.pending);
+
+	sas_resume_sata(port);
+}
+
 /**
  * sas_discover_end_dev -- discover an end device (SSP, etc)
  * @end: pointer to domain device of interest
@@ -530,6 +577,8 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 		[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 		[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 		[DISCE_PROBE] = sas_probe_devices,
+		[DISCE_SUSPEND] = sas_suspend_devices,
+		[DISCE_RESUME] = sas_resume_devices,
 		[DISCE_DESTRUCT] = sas_destruct_devices,
 	};
 
diff --git a/drivers/scsi/libsas/sas_dump.c b/drivers/scsi/libsas/sas_dump.c
index fc46093..cd6f99c 100644
--- a/drivers/scsi/libsas/sas_dump.c
+++ b/drivers/scsi/libsas/sas_dump.c
@@ -41,6 +41,7 @@ static const char *sas_phye_str[] = {
 	[1] = "PHYE_OOB_DONE",
 	[2] = "PHYE_OOB_ERROR",
 	[3] = "PHYE_SPINUP_HOLD",
+	[4] = "PHYE_RESUME_TIMEOUT",
 };
 
 void sas_dprint_porte(int phyid, enum port_event pe)
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 789c4d8..aadbd53 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -134,7 +134,7 @@ static void notify_port_event(struct asd_sas_phy *phy, enum port_event event)
 			&phy->port_events[event].work, ha);
 }
 
-static void notify_phy_event(struct asd_sas_phy *phy, enum phy_event event)
+void sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event)
 {
 	struct sas_ha_struct *ha = phy->ha;
 
@@ -159,7 +159,7 @@ int sas_init_events(struct sas_ha_struct *sas_ha)
 
 	sas_ha->notify_ha_event = notify_ha_event;
 	sas_ha->notify_port_event = notify_port_event;
-	sas_ha->notify_phy_event = notify_phy_event;
+	sas_ha->notify_phy_event = sas_notify_phy_event;
 
 	return 0;
 }
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 014297c..dbc8a79 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -178,7 +178,7 @@ Undo_phys:
 	return error;
 }
 
-int sas_unregister_ha(struct sas_ha_struct *sas_ha)
+static void sas_disable_events(struct sas_ha_struct *sas_ha)
 {
 	/* Set the state to unregistered to avoid further unchained
 	 * events to be queued, and flush any in-progress drainers
@@ -189,7 +189,11 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	spin_unlock_irq(&sas_ha->lock);
 	__sas_drain_work(sas_ha);
 	mutex_unlock(&sas_ha->drain_mutex);
+}
 
+int sas_unregister_ha(struct sas_ha_struct *sas_ha)
+{
+	sas_disable_events(sas_ha);
 	sas_unregister_ports(sas_ha);
 
 	/* flush unregistration work */
@@ -381,6 +385,90 @@ int sas_set_phy_speed(struct sas_phy *phy,
 	return ret;
 }
 
+void sas_prep_resume_ha(struct sas_ha_struct *ha)
+{
+	int i;
+
+	set_bit(SAS_HA_REGISTERED, &ha->state);
+
+	/* clear out any stale link events/data from the suspension path */
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_phy *phy = ha->sas_phy[i];
+
+		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
+		phy->port_events_pending = 0;
+		phy->phy_events_pending = 0;
+		phy->frame_rcvd_size = 0;
+	}
+}
+EXPORT_SYMBOL(sas_prep_resume_ha);
+
+static int phys_suspended(struct sas_ha_struct *ha)
+{
+	int i, rc = 0;
+
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_phy *phy = ha->sas_phy[i];
+
+		if (phy->suspended)
+			rc++;
+	}
+
+	return rc;
+}
+
+void sas_resume_ha(struct sas_ha_struct *ha)
+{
+	const unsigned long tmo = msecs_to_jiffies(25000);
+	int i;
+
+	/* deform ports on phys that did not resume
+	 * at this point we may be racing the phy coming back (as posted
+	 * by the lldd).  So we post the event and once we are in the
+	 * libsas context check that the phy remains suspended before
+	 * tearing it down.
+	 */
+	i = phys_suspended(ha);
+	if (i)
+		dev_info(ha->dev, "waiting up to 25 seconds for %d phy%s to resume\n",
+			 i, i > 1 ? "s" : "");
+	wait_event_timeout(ha->eh_wait_q, phys_suspended(ha) == 0, tmo);
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_phy *phy = ha->sas_phy[i];
+
+		if (phy->suspended) {
+			dev_warn(&phy->phy->dev, "resume timeout\n");
+			sas_notify_phy_event(phy, PHYE_RESUME_TIMEOUT);
+		}
+	}
+
+	/* all phys are back up or timed out, turn on i/o so we can
+	 * flush out disks that did not return
+	 */
+	scsi_unblock_requests(ha->core.shost);
+	sas_drain_work(ha);
+}
+EXPORT_SYMBOL(sas_resume_ha);
+
+void sas_suspend_ha(struct sas_ha_struct *ha)
+{
+	int i;
+
+	sas_disable_events(ha);
+	scsi_block_requests(ha->core.shost);
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_port *port = ha->sas_port[i];
+
+		sas_discover_event(port, DISCE_SUSPEND);
+	}
+
+	/* flush suspend events while unregistered */
+	mutex_lock(&ha->drain_mutex);
+	__sas_drain_work(ha);
+	mutex_unlock(&ha->drain_mutex);
+}
+EXPORT_SYMBOL(sas_suspend_ha);
+
 static void sas_phy_release(struct sas_phy *phy)
 {
 	kfree(phy->hostdata);
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 507e4cf..1de6796 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -89,6 +89,7 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id,
 			enum phy_func phy_func, struct sas_phy_linkrates *);
 int sas_smp_get_phy_events(struct sas_phy *phy);
 
+void sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event);
 void sas_device_set_phy(struct domain_device *dev, struct sas_port *port);
 struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
 struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index 521422e..cdee446 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -94,6 +94,25 @@ static void sas_phye_spinup_hold(struct work_struct *work)
 	i->dft->lldd_control_phy(phy, PHY_FUNC_RELEASE_SPINUP_HOLD, NULL);
 }
 
+static void sas_phye_resume_timeout(struct work_struct *work)
+{
+	struct asd_sas_event *ev = to_asd_sas_event(work);
+	struct asd_sas_phy *phy = ev->phy;
+
+	clear_bit(PHYE_RESUME_TIMEOUT, &phy->phy_events_pending);
+
+	/* phew, lldd got the phy back in the nick of time */
+	if (!phy->suspended) {
+		dev_info(&phy->phy->dev, "resume timeout cancelled\n");
+		return;
+	}
+
+	phy->error = 0;
+	phy->suspended = 0;
+	sas_deform_port(phy, 1);
+}
+
+
 /* ---------- Phy class registration ---------- */
 
 int sas_register_phys(struct sas_ha_struct *sas_ha)
@@ -105,6 +124,8 @@ int sas_register_phys(struct sas_ha_struct *sas_ha)
 		[PHYE_OOB_DONE] = sas_phye_oob_done,
 		[PHYE_OOB_ERROR] = sas_phye_oob_error,
 		[PHYE_SPINUP_HOLD] = sas_phye_spinup_hold,
+		[PHYE_RESUME_TIMEOUT] = sas_phye_resume_timeout,
+
 	};
 
 	static const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index 1cf7d75..1ac0c8b 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -39,6 +39,49 @@ static bool phy_is_wideport_member(struct asd_sas_port *port, struct asd_sas_phy
 	return true;
 }
 
+static void sas_resume_port(struct asd_sas_phy *phy)
+{
+	struct domain_device *dev;
+	struct asd_sas_port *port = phy->port;
+	struct sas_ha_struct *sas_ha = phy->ha;
+	struct sas_internal *si = to_sas_internal(sas_ha->core.shost->transportt);
+
+	if (si->dft->lldd_port_formed)
+		si->dft->lldd_port_formed(phy);
+
+	if (port->suspended)
+		port->suspended = 0;
+	else {
+		/* we only need to handle "link returned" actions once */
+		return;
+	}
+
+	/* if the port came back:
+	 * 1/ presume every device came back
+	 * 2/ force the next revalidation to check all expander phys
+	 */
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		int i, rc;
+
+		rc = sas_notify_lldd_dev_found(dev);
+		if (rc) {
+			sas_unregister_dev(port, dev);
+			continue;
+		}
+
+		if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV) {
+			dev->ex_dev.ex_change_count = -1;
+			for (i = 0; i < dev->ex_dev.num_phys; i++) {
+				struct ex_phy *phy = &dev->ex_dev.ex_phy[i];
+
+				phy->phy_change_count = -1;
+			}
+		}
+	}
+
+	sas_discover_event(port, DISCE_RESUME);
+}
+
 /**
  * sas_form_port -- add this phy to a port
  * @phy: the phy of interest
@@ -58,7 +101,14 @@ static void sas_form_port(struct asd_sas_phy *phy)
 	if (port) {
 		if (!phy_is_wideport_member(port, phy))
 			sas_deform_port(phy, 0);
-		else {
+		else if (phy->suspended) {
+			phy->suspended = 0;
+			sas_resume_port(phy);
+
+			/* phy came back, try to cancel the timeout */
+			wake_up(&sas_ha->eh_wait_q);
+			return;
+		} else {
 			SAS_DPRINTK("%s: phy%d belongs to port%d already(%d)!\n",
 				    __func__, phy->id, phy->port->id,
 				    phy->port->num_phys);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 798011f..7598b39 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -79,7 +79,8 @@ enum phy_event {
 	PHYE_OOB_DONE         = 1,
 	PHYE_OOB_ERROR        = 2,
 	PHYE_SPINUP_HOLD      = 3, /* hot plug SATA, no COMWAKE sent */
-	PHY_NUM_EVENTS        = 4,
+	PHYE_RESUME_TIMEOUT   = 4,
+	PHY_NUM_EVENTS        = 5,
 };
 
 enum discover_event {
@@ -87,8 +88,10 @@ enum discover_event {
 	DISCE_REVALIDATE_DOMAIN = 1,
 	DISCE_PORT_GONE         = 2,
 	DISCE_PROBE		= 3,
-	DISCE_DESTRUCT		= 4,
-	DISC_NUM_EVENTS		= 5,
+	DISCE_SUSPEND		= 4,
+	DISCE_RESUME		= 5,
+	DISCE_DESTRUCT		= 6,
+	DISC_NUM_EVENTS		= 7,
 };
 
 /* ---------- Expander Devices ---------- */
@@ -128,7 +131,7 @@ struct ex_phy {
 	u8   attached_sas_addr[SAS_ADDR_SIZE];
 	u8   attached_phy_id;
 
-	u8   phy_change_count;
+	int phy_change_count;
 	enum routing_attribute routing_attr;
 	u8   virtual:1;
 
@@ -141,7 +144,7 @@ struct ex_phy {
 struct expander_device {
 	struct list_head children;
 
-	u16    ex_change_count;
+	int    ex_change_count;
 	u16    max_route_indexes;
 	u8     num_phys;
 
@@ -167,6 +170,7 @@ struct sata_device {
         enum   ata_command_set command_set;
         struct smp_resp        rps_resp; /* report_phy_sata_resp */
         u8     port_no;        /* port number, if this is a PM (Port) */
+	int    pm_result;
 
 	struct ata_port *ap;
 	struct ata_host ata_host;
@@ -180,6 +184,7 @@ struct ssp_device {
 
 enum {
 	SAS_DEV_GONE,
+	SAS_DEV_FOUND, /* device notified to lldd */
 	SAS_DEV_DESTROY,
 	SAS_DEV_EH_PENDING,
 	SAS_DEV_LU_RESET,
@@ -271,6 +276,7 @@ struct asd_sas_port {
 	enum   sas_linkrate linkrate;
 
 	struct sas_work work;
+	int suspended;
 
 /* public: */
 	int id;
@@ -319,6 +325,7 @@ struct asd_sas_phy {
 	unsigned long phy_events_pending;
 
 	int error;
+	int suspended;
 
 	struct sas_phy *phy;
 
@@ -685,6 +692,9 @@ struct sas_domain_function_template {
 
 extern int sas_register_ha(struct sas_ha_struct *);
 extern int sas_unregister_ha(struct sas_ha_struct *);
+extern void sas_prep_resume_ha(struct sas_ha_struct *sas_ha);
+extern void sas_resume_ha(struct sas_ha_struct *sas_ha);
+extern void sas_suspend_ha(struct sas_ha_struct *sas_ha);
 
 int sas_set_phy_speed(struct sas_phy *phy,
 		      struct sas_phy_linkrates *rates);
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 2dfbdaa..ff71a56 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -45,6 +45,8 @@ void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 void sas_ata_schedule_reset(struct domain_device *dev);
 void sas_ata_wait_eh(struct domain_device *dev);
 void sas_probe_sata(struct asd_sas_port *port);
+void sas_suspend_sata(struct asd_sas_port *port);
+void sas_resume_sata(struct asd_sas_port *port);
 void sas_ata_end_eh(struct ata_port *ap);
 #else
 
@@ -82,6 +84,14 @@ static inline void sas_probe_sata(struct asd_sas_port *port)
 {
 }
 
+static inline void sas_suspend_sata(struct asd_sas_port *port)
+{
+}
+
+static inline void sas_resume_sata(struct asd_sas_port *port)
+{
+}
+
 static inline int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy)
 {
 	return 0;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain
  2012-03-22  6:32 ` [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain Dan Williams
@ 2012-03-22 14:20   ` Alan Stern
  2012-03-22 19:09     ` Williams, Dan J
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-03-22 14:20 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, linux-ide

On Wed, 21 Mar 2012, Dan Williams wrote:

> sd injects and synchronizes probe work on the global kernel-wide domain.
> This runs into conflict with PM that wants to perform resume actions in
> async context:

...

> Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
> be flushed without regard for the state of PM, and allow for the resume
> path to handle devices that have transitioned from SDEV_QUIESCE to
> SDEV_DEL prior to resume.

> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -393,6 +393,10 @@ static inline int scsi_autopm_get_device(struct scsi_device *d) { return 0; }
>  static inline void scsi_autopm_put_device(struct scsi_device *d) {}
>  #endif /* CONFIG_PM_RUNTIME */
>  
> +#if IS_ENABLED(CONFIG_PM) || IS_ENABLED(CONFIG_BLK_DEV_SD)
> +extern struct list_head scsi_sd_probe_domain;
> +#endif
> +

This doesn't need to be provided to drivers outside the SCSI core, so 
the declaration doesn't need to be in include/scsi.  I'd move it to 
drivers/scsi/scsi_priv.h.

Apart from that,

Acked-by: Alan Stern <stern@rowland.harvard.edu>

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22  6:32 ` [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added' Dan Williams
@ 2012-03-22 14:39   ` Greg Kroah-Hartman
  2012-03-22 16:27     ` Williams, Dan J
  2012-03-23 18:43     ` Dan Williams
  2012-03-22 14:47   ` James Bottomley
  1 sibling, 2 replies; 24+ messages in thread
From: Greg Kroah-Hartman @ 2012-03-22 14:39 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, linux-ide

On Wed, Mar 21, 2012 at 11:32:14PM -0700, Dan Williams wrote:
> In scsi at least two cases of the parent device being deleted before the
> child is added have been observed.
> 
> 1/ scsi is performing async scans and the device is removed prior to the
>    async can thread running.
> 
> 2/ libsas discovery event running after the parent port has been torn
>    down.
> 
> Result in crash signatures like:
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
>  IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
>  ...
>  Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
>  Stack:
>   00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
>   ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
>   ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
>  Call Trace:
>   [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
>   [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
>   [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
>   [<ffffffff8125e70b>] kobject_add+0x64/0x66
>   [<ffffffff8131122b>] device_add+0x12d/0x63a
> 
> These scenarios need to be cleaned up, but in the meantime the system
> need not crash if this ordering occurs.  Instead report:
> 
>  kobject_add_internal failed for target8:0:16 (error: -2 parent: end_device-8:0:24)
>  Pid: 2942, comm: scsi_scan_8 Not tainted 3.3.0-rc7-isci+ #2
>  Call Trace:
>   [<ffffffff8125e551>] kobject_add_internal+0x1c1/0x1f3
>   [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
>   [<ffffffff8125e659>] kobject_add_varg+0x41/0x50
>   [<ffffffff8125e723>] kobject_add+0x64/0x66
>   [<ffffffff8131124b>] device_add+0x12d/0x63a
>   [<ffffffff8125e0ef>] ? kobject_put+0x4c/0x50
>   [<ffffffff8132f370>] scsi_sysfs_add_sdev+0x4e/0x28a
>   [<ffffffff8132dce3>] do_scan_async+0x9c/0x145
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/sysfs/dir.c |    3 +++
>  lib/kobject.c  |    7 ++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 7fdf6a7..86521ee 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
>  	else
>  		parent_sd = &sysfs_root;
>  
> +	if (!parent_sd)
> +		return -ENOENT;
> +
>  	if (sysfs_ns_type(parent_sd))
>  		ns = kobj->ktype->namespace(kobj);
>  	type = sysfs_read_ns_type(kobj);

So what happens if this is true?  Does this patch fix the oops?  What
kernels should this be applied to where this problem has been seen?

> diff --git a/lib/kobject.c b/lib/kobject.c
> index c33d7a1..e5f86c0 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -192,13 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
>  
>  		/* be noisy on error issues */
>  		if (error == -EEXIST)
> -			printk(KERN_ERR "%s failed for %s with "
> +			pr_err("%s failed for %s with "
>  			       "-EEXIST, don't try to register things with "
>  			       "the same name in the same directory.\n",
>  			       __func__, kobject_name(kobj));
>  		else
> -			printk(KERN_ERR "%s failed for %s (%d)\n",
> -			       __func__, kobject_name(kobj), error);
> +			pr_err("%s failed for %s (error: %d parent: %s)\n",
> +			       __func__, kobject_name(kobj), error,
> +			       parent ? kobject_name(parent) : "'none'");
>  		dump_stack();
>  	} else
>  		kobj->state_in_sysfs = 1;

These changes have nothing to do with the above fix, so why include them
here?

And note, I hate pr_err(), what's wrong with printk() in this instance?

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22  6:32 ` [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added' Dan Williams
  2012-03-22 14:39   ` Greg Kroah-Hartman
@ 2012-03-22 14:47   ` James Bottomley
  2012-03-22 16:34     ` Williams, Dan J
  1 sibling, 1 reply; 24+ messages in thread
From: James Bottomley @ 2012-03-22 14:47 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, linux-ide, Greg Kroah-Hartman

On Wed, 2012-03-21 at 23:32 -0700, Dan Williams wrote:
> In scsi at least two cases of the parent device being deleted before the
> child is added have been observed.
> 
> 1/ scsi is performing async scans and the device is removed prior to the
>    async can thread running.

This doesn't sound right.  We do an explicit get on the sdkp (and the
sdkp holds the sdp) before we schedule an async scan.  That's only
removed after the async scan has completed, so it should be impossible
for the parent to vanish.

> 2/ libsas discovery event running after the parent port has been torn
>    down.

This sounds possible, but should be fixable by referencing and checking
in libsas rather than having to alter sysfs, shouldn't it?

In general, I think any case where the parent is freed before the child
of that parent created is our refcounting cockup rather than a generic
problem in sysfs.

James



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22 14:39   ` Greg Kroah-Hartman
@ 2012-03-22 16:27     ` Williams, Dan J
  2012-03-22 22:51       ` Stefan Richter
  2012-03-23 18:43     ` Dan Williams
  1 sibling, 1 reply; 24+ messages in thread
From: Williams, Dan J @ 2012-03-22 16:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-scsi, linux-ide

On Thu, Mar 22, 2012 at 7:39 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Wed, Mar 21, 2012 at 11:32:14PM -0700, Dan Williams wrote:
[..]
>> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
>> index 7fdf6a7..86521ee 100644
>> --- a/fs/sysfs/dir.c
>> +++ b/fs/sysfs/dir.c
>> @@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
>>       else
>>               parent_sd = &sysfs_root;
>>
>> +     if (!parent_sd)
>> +             return -ENOENT;
>> +
>>       if (sysfs_ns_type(parent_sd))
>>               ns = kobj->ktype->namespace(kobj);
>>       type = sysfs_read_ns_type(kobj);
>
> So what happens if this is true?  Does this patch fix the oops?

This patch downgrades the oops by turning it into a device_add()
failure, but the patches that *fix* this warning are here [1] and here
[2].

> What kernels should this be applied to where this problem has been seen?

I assume this has been a latent problem ever since scsi async scanning
was added (2.6.20-rc2), but it's a rare corner case to unplug devices
during the initial scan.

>> diff --git a/lib/kobject.c b/lib/kobject.c
>> index c33d7a1..e5f86c0 100644
>> --- a/lib/kobject.c
>> +++ b/lib/kobject.c
>> @@ -192,13 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
>>
>>               /* be noisy on error issues */
>>               if (error == -EEXIST)
>> -                     printk(KERN_ERR "%s failed for %s with "
>> +                     pr_err("%s failed for %s with "
>>                              "-EEXIST, don't try to register things with "
>>                              "the same name in the same directory.\n",
>>                              __func__, kobject_name(kobj));
>>               else
>> -                     printk(KERN_ERR "%s failed for %s (%d)\n",
>> -                            __func__, kobject_name(kobj), error);
>> +                     pr_err("%s failed for %s (error: %d parent: %s)\n",
>> +                            __func__, kobject_name(kobj), error,
>> +                            parent ? kobject_name(parent) : "'none'");
>>               dump_stack();
>>       } else
>>               kobj->state_in_sysfs = 1;
>
> These changes have nothing to do with the above fix, so why include them
> here?

It wasn't until I realized which 'parent' and which 'child' were
interacting that I was able to identify the real fixes.  Since it was
helpful for the scsi/sas case, I decided to leave the more informative
warning for the next person that gets to debug a similar failure.

> And note, I hate pr_err(), what's wrong with printk() in this instance?

This is a bit circuitous, but extending the warning to include the
'parent' and 'child' pushed up against 80 columns and since this
routine has a pr_debug() a few lines up I thought a pr_ conversion was
acceptable.  The pr_err() conversion of the EEXIST case just came
along for the ride to keep the print style consistent (at least in
this routine).

--
Dan

[1]: http://marc.info/?l=linux-scsi&m=133239707903443&w=2
[2]: http://marc.info/?l=linux-scsi&m=133239709603452&w=2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22 14:47   ` James Bottomley
@ 2012-03-22 16:34     ` Williams, Dan J
  0 siblings, 0 replies; 24+ messages in thread
From: Williams, Dan J @ 2012-03-22 16:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-ide, Greg Kroah-Hartman

On Thu, Mar 22, 2012 at 7:47 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2012-03-21 at 23:32 -0700, Dan Williams wrote:
>> In scsi at least two cases of the parent device being deleted before the
>> child is added have been observed.
>>
>> 1/ scsi is performing async scans and the device is removed prior to the
>>    async can thread running.
>
> This doesn't sound right.  We do an explicit get on the sdkp (and the
> sdkp holds the sdp) before we schedule an async scan.  That's only
> removed after the async scan has completed, so it should be impossible
> for the parent to vanish.

Right, the parent does not get freed because we have a reference, but
it still gets device_del()'d which leads to:

  device_del()->kobject_del()->sysfs_remove_dir()

  kobj->sd = NULL;

...and then sysfs_create_dir() (without this fix) goes ahead and
de-references parent_sd via sysfs_ns_type():

  return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;

--
Dan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain
  2012-03-22 14:20   ` Alan Stern
@ 2012-03-22 19:09     ` Williams, Dan J
  0 siblings, 0 replies; 24+ messages in thread
From: Williams, Dan J @ 2012-03-22 19:09 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-scsi, linux-ide

On Thu, Mar 22, 2012 at 7:20 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 21 Mar 2012, Dan Williams wrote:
>
>> sd injects and synchronizes probe work on the global kernel-wide domain.
>> This runs into conflict with PM that wants to perform resume actions in
>> async context:
>
> ...
>
>> Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
>> be flushed without regard for the state of PM, and allow for the resume
>> path to handle devices that have transitioned from SDEV_QUIESCE to
>> SDEV_DEL prior to resume.
>
>> --- a/include/scsi/scsi_device.h
>> +++ b/include/scsi/scsi_device.h
>> @@ -393,6 +393,10 @@ static inline int scsi_autopm_get_device(struct scsi_device *d) { return 0; }
>>  static inline void scsi_autopm_put_device(struct scsi_device *d) {}
>>  #endif /* CONFIG_PM_RUNTIME */
>>
>> +#if IS_ENABLED(CONFIG_PM) || IS_ENABLED(CONFIG_BLK_DEV_SD)
>> +extern struct list_head scsi_sd_probe_domain;
>> +#endif
>> +
>
> This doesn't need to be provided to drivers outside the SCSI core, so
> the declaration doesn't need to be in include/scsi.  I'd move it to
> drivers/scsi/scsi_priv.h.
>
> Apart from that,
>
> Acked-by: Alan Stern <stern@rowland.harvard.edu>

Thanks, will re-spin with the scsi_priv.h change.

--
Dan

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22 16:27     ` Williams, Dan J
@ 2012-03-22 22:51       ` Stefan Richter
  2012-03-22 23:11         ` Williams, Dan J
  2012-03-22 23:26         ` Stefan Richter
  0 siblings, 2 replies; 24+ messages in thread
From: Stefan Richter @ 2012-03-22 22:51 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: Greg Kroah-Hartman, linux-scsi, linux-ide

On Mar 22 Williams, Dan J wrote:
> On Thu, Mar 22, 2012 at 7:39 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Wed, Mar 21, 2012 at 11:32:14PM -0700, Dan Williams wrote:
> [..]
> >> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> >> index 7fdf6a7..86521ee 100644
> >> --- a/fs/sysfs/dir.c
> >> +++ b/fs/sysfs/dir.c
> >> @@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
> >>       else
> >>               parent_sd = &sysfs_root;
> >>
> >> +     if (!parent_sd)
> >> +             return -ENOENT;
> >> +
> >>       if (sysfs_ns_type(parent_sd))
> >>               ns = kobj->ktype->namespace(kobj);
> >>       type = sysfs_read_ns_type(kobj);
> >
> > So what happens if this is true?  Does this patch fix the oops?
> 
> This patch downgrades the oops by turning it into a device_add()
> failure, but the patches that *fix* this warning are here [1] and here
> [2].
[...]
> [1]: http://marc.info/?l=linux-scsi&m=133239707903443&w=2
> [2]: http://marc.info/?l=linux-scsi&m=133239709603452&w=2

Isn't this something which is to be accomplished by counting references to
the parent device?
-- 
Stefan Richter
-=====-===-- --== =-==-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22 22:51       ` Stefan Richter
@ 2012-03-22 23:11         ` Williams, Dan J
  2012-03-22 23:26         ` Stefan Richter
  1 sibling, 0 replies; 24+ messages in thread
From: Williams, Dan J @ 2012-03-22 23:11 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Greg Kroah-Hartman, linux-scsi, linux-ide

On Thu, Mar 22, 2012 at 3:51 PM, Stefan Richter
<stefanr@s5r6.in-berlin.de> wrote:
> Isn't this something which is to be accomplished by counting references to
> the parent device?

As I mentioned to James [1] the reference count keeps the parent from
being kfree()'d but not device_del()'d where kobj->sd gets sets to
NULL.

--
Dan

[1]: http://marc.info/?l=linux-scsi&m=133243414518064&w=2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22 22:51       ` Stefan Richter
  2012-03-22 23:11         ` Williams, Dan J
@ 2012-03-22 23:26         ` Stefan Richter
  1 sibling, 0 replies; 24+ messages in thread
From: Stefan Richter @ 2012-03-22 23:26 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: Greg Kroah-Hartman, linux-scsi, linux-ide

On Mar 22 Stefan Richter wrote:
> On Mar 22 Williams, Dan J wrote:
> > the patches that *fix* this warning are here [1] and here
> > [2].
> [...]
> > [1]: http://marc.info/?l=linux-scsi&m=133239707903443&w=2
> > [2]: http://marc.info/?l=linux-scsi&m=133239709603452&w=2
> 
> Isn't this something which is to be accomplished by counting references to
> the parent device?

Oh, OK, it is about ensuring device_add(parent) -- device_add(child) --
device_del(child) -- device_del(parent) nesting order.  So not just
counting of plain references but a wait-for-completion type issue.
-- 
Stefan Richter
-=====-===-- --== =-===
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-22 14:39   ` Greg Kroah-Hartman
  2012-03-22 16:27     ` Williams, Dan J
@ 2012-03-23 18:43     ` Dan Williams
  2012-03-23 20:54       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 24+ messages in thread
From: Dan Williams @ 2012-03-23 18:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-scsi, linux-ide

On Thu, 2012-03-22 at 07:39 -0700, Greg Kroah-Hartman wrote:
> And note, I hate pr_err(), what's wrong with printk() in this instance?

So how about an end run around the conversion to pr_err() and just make
these proper WARN()s since we end up calling dump_stack in both cases?

Something like:

diff --git a/lib/kobject.c b/lib/kobject.c
index e5f86c0..1bd0893 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -192,15 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
 
 		/* be noisy on error issues */
 		if (error == -EEXIST)
-			pr_err("%s failed for %s with "
-			       "-EEXIST, don't try to register things with "
-			       "the same name in the same directory.\n",
-			       __func__, kobject_name(kobj));
+			WARN(1, "%s failed for %s with "
+			     "-EEXIST, don't try to register things with "
+			     "the same name in the same directory.\n",
+			     __func__, kobject_name(kobj));
 		else
-			pr_err("%s failed for %s (error: %d parent: %s)\n",
-			       __func__, kobject_name(kobj), error,
-			       parent ? kobject_name(parent) : "'none'");
-		dump_stack();
+			WARN(1, "%s failed for %s (error: %d parent: %s)\n",
+			     __func__, kobject_name(kobj), error,
+			     parent ? kobject_name(parent) : "'none'");
 	} else
 		kobj->state_in_sysfs = 1;
 



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-23 18:43     ` Dan Williams
@ 2012-03-23 20:54       ` Greg Kroah-Hartman
  2012-03-23 21:15         ` Williams, Dan J
  0 siblings, 1 reply; 24+ messages in thread
From: Greg Kroah-Hartman @ 2012-03-23 20:54 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, linux-ide

On Fri, Mar 23, 2012 at 11:43:28AM -0700, Dan Williams wrote:
> On Thu, 2012-03-22 at 07:39 -0700, Greg Kroah-Hartman wrote:
> > And note, I hate pr_err(), what's wrong with printk() in this instance?
> 
> So how about an end run around the conversion to pr_err() and just make
> these proper WARN()s since we end up calling dump_stack in both cases?
> 
> Something like:
> 
> diff --git a/lib/kobject.c b/lib/kobject.c
> index e5f86c0..1bd0893 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -192,15 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
>  
>  		/* be noisy on error issues */
>  		if (error == -EEXIST)
> -			pr_err("%s failed for %s with "
> -			       "-EEXIST, don't try to register things with "
> -			       "the same name in the same directory.\n",
> -			       __func__, kobject_name(kobj));
> +			WARN(1, "%s failed for %s with "
> +			     "-EEXIST, don't try to register things with "
> +			     "the same name in the same directory.\n",
> +			     __func__, kobject_name(kobj));
>  		else
> -			pr_err("%s failed for %s (error: %d parent: %s)\n",
> -			       __func__, kobject_name(kobj), error,
> -			       parent ? kobject_name(parent) : "'none'");
> -		dump_stack();
> +			WARN(1, "%s failed for %s (error: %d parent: %s)\n",
> +			     __func__, kobject_name(kobj), error,
> +			     parent ? kobject_name(parent) : "'none'");

I'm missing why this isn't the exact same thing that is currently
happening...

confused,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added'
  2012-03-23 20:54       ` Greg Kroah-Hartman
@ 2012-03-23 21:15         ` Williams, Dan J
  0 siblings, 0 replies; 24+ messages in thread
From: Williams, Dan J @ 2012-03-23 21:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-scsi, linux-ide

On Fri, Mar 23, 2012 at 1:54 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Fri, Mar 23, 2012 at 11:43:28AM -0700, Dan Williams wrote:
>> On Thu, 2012-03-22 at 07:39 -0700, Greg Kroah-Hartman wrote:
>> > And note, I hate pr_err(), what's wrong with printk() in this instance?
>>
>> So how about an end run around the conversion to pr_err() and just make
>> these proper WARN()s since we end up calling dump_stack in both cases?
>>
>> Something like:
>>
>> diff --git a/lib/kobject.c b/lib/kobject.c
>> index e5f86c0..1bd0893 100644
>> --- a/lib/kobject.c
>> +++ b/lib/kobject.c
>> @@ -192,15 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
>>
>>               /* be noisy on error issues */
>>               if (error == -EEXIST)
>> -                     pr_err("%s failed for %s with "
>> -                            "-EEXIST, don't try to register things with "
>> -                            "the same name in the same directory.\n",
>> -                            __func__, kobject_name(kobj));
>> +                     WARN(1, "%s failed for %s with "
>> +                          "-EEXIST, don't try to register things with "
>> +                          "the same name in the same directory.\n",
>> +                          __func__, kobject_name(kobj));
>>               else
>> -                     pr_err("%s failed for %s (error: %d parent: %s)\n",
>> -                            __func__, kobject_name(kobj), error,
>> -                            parent ? kobject_name(parent) : "'none'");
>> -             dump_stack();
>> +                     WARN(1, "%s failed for %s (error: %d parent: %s)\n",
>> +                          __func__, kobject_name(kobj), error,
>> +                          parent ? kobject_name(parent) : "'none'");
>
> I'm missing why this isn't the exact same thing that is currently
> happening...
>
> confused,

WARN does wrap the printk and backtrace in "------------[ cut here
]------------\n" and "---[ end trace %016llx ]---\n" so automated
tools can pick it up, maybe that's useful in a core routine like this?

It's still modifying the else case to add the faulting child and
parent kobject names, and does so without setting a pr_ conversion
precedent in sysfs.

--
Dan

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-03-23 21:15 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-22  6:31 [libsas PATCH v12 00/11] libsas eh, discovery, suspend, and fixes Dan Williams
2012-03-22  6:31 ` [libsas PATCH v12 01/11] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 02/11] libsas: trim sas_task of slow path infrastructure Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 03/11] libata: reset once Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 04/11] sysfs: handle 'parent deleted before child added' Dan Williams
2012-03-22 14:39   ` Greg Kroah-Hartman
2012-03-22 16:27     ` Williams, Dan J
2012-03-22 22:51       ` Stefan Richter
2012-03-22 23:11         ` Williams, Dan J
2012-03-22 23:26         ` Stefan Richter
2012-03-23 18:43     ` Dan Williams
2012-03-23 20:54       ` Greg Kroah-Hartman
2012-03-23 21:15         ` Williams, Dan J
2012-03-22 14:47   ` James Bottomley
2012-03-22 16:34     ` Williams, Dan J
2012-03-22  6:32 ` [libsas PATCH v12 05/11] scsi_transport_sas: fix delete vs scan race Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 06/11] libsas: unify domain_device sas_rphy lifetimes Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 07/11] libsas: fix false positive 'device attached' conditions Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 08/11] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 09/11] libsas, libata: fix start of life for a sas ata_port Dan Williams
2012-03-22  6:32 ` [libsas PATCH v12 10/11] scsi, sd: limit the scope of the async probe domain Dan Williams
2012-03-22 14:20   ` Alan Stern
2012-03-22 19:09     ` Williams, Dan J
2012-03-22  6:32 ` [libsas PATCH v12 11/11] libsas: suspend / resume support Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).