[PATCH 0/3] cciss rmmod/scan-thread fixes

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] cciss rmmod/scan-thread fixes
@ 2009-07-14 22:02 Andrew Patterson
  2009-07-14 22:02 ` [PATCH 1/3] cciss: remove logical drive sysfs entries during driver cleanup Andrew Patterson
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Andrew Patterson @ 2009-07-14 22:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, linux-scsi, mike.miller, jens.axboe

The following series fixes several problems causing hangs while
rmmoding the cciss driver. In the process of fixing these hangs, I
also reworked the the logical drive scanning kernel thread to use one
thread per driver rather than one thread per controller. I also added
a sysfs attribute to kick off a controller scan.

 .../ABI/testing/sysfs-bus-pci-devices-cciss        |    7 +
 drivers/block/cciss.c                              |  198 ++++++++++++++++++--
 drivers/block/cciss.h                              |    7 +
 3 files changed, 188 insertions(+), 24 deletions(-)

Andrew Patterson (3):
      cciss: remove logical drive sysfs entries during driver cleanup.
      cciss: use only one scan thread
      cciss: kick off logical drive topology rescan through sysfs



-- 
Andrew Patterson

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] cciss: remove logical drive sysfs entries during driver cleanup.
  2009-07-14 22:02 [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Patterson
@ 2009-07-14 22:02 ` Andrew Patterson
  2009-07-14 22:02 ` [PATCH 2/3] cciss: use only one scan thread Andrew Patterson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Andrew Patterson @ 2009-07-14 22:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, linux-scsi, mike.miller, jens.axboe

cciss: remove logical drive sysfs entries during driver cleanup.

Sysfs entries for logical drives need to be removed when a drive is
deleted during driver cleanup.

Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
---

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index a52cc7f..970c896 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1977,7 +1977,6 @@ static int rebuild_lun_table(ctlr_info_t *h, int first_time)
 			h->drv[i].busy_configuring = 1;
 			spin_unlock_irqrestore(CCISS_LOCK(h->ctlr), flags);
 			return_code = deregister_disk(h, i, 1);
-			cciss_destroy_ld_sysfs_entry(&h->drv[i]);
 			h->drv[i].busy_configuring = 0;
 		}
 	}
@@ -2118,6 +2117,7 @@ static int deregister_disk(ctlr_info_t *h, int drv_index,
 				 * indicate that this element of the drive
 				 * array is free.
 				 */
+	cciss_destroy_ld_sysfs_entry(drv);
 
 	if (clear_all) {
 		/* check to see if it was the last disk */
@@ -4141,6 +4141,9 @@ static void __devexit cciss_remove_one(struct pci_dev *pdev)
 			if (q)
 				blk_cleanup_queue(q);
 		}
+		if (hba[i]->drv[j].raid_level != -1)
+			cciss_destroy_ld_sysfs_entry(&hba[i]->drv[j]);
+
 	}
 
 #ifdef CONFIG_CISS_SCSI_TAPE


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] cciss: use only one scan thread
  2009-07-14 22:02 [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Patterson
  2009-07-14 22:02 ` [PATCH 1/3] cciss: remove logical drive sysfs entries during driver cleanup Andrew Patterson
@ 2009-07-14 22:02 ` Andrew Patterson
  2009-07-15 22:06   ` Andrew Morton
  2009-07-14 22:02 ` [PATCH 3/3] cciss: kick off logical drive topology rescan through sysfs Andrew Patterson
  2009-07-15 22:08 ` [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Morton
  3 siblings, 1 reply; 9+ messages in thread
From: Andrew Patterson @ 2009-07-14 22:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, linux-scsi, mike.miller, jens.axboe

cciss: use only one scan thread

Replace the use of one scan kthread per controller with one per driver.
Use a queue to hold a list of controllers that need to be rescanned with
routines to add and remove controllers from the queue.
Fix locking and completion handling to prevent a hang during rmmod.

Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
---

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 970c896..c6bba77 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -39,6 +39,7 @@
 #include <linux/hdreg.h>
 #include <linux/spinlock.h>
 #include <linux/compat.h>
+#include <linux/mutex.h>
 #include <asm/uaccess.h>
 #include <asm/io.h>
 
@@ -155,6 +156,10 @@ static struct board_type products[] = {
 
 static ctlr_info_t *hba[MAX_CTLR];
 
+static struct task_struct *cciss_scan_thread;
+static struct mutex scan_mutex;
+static struct list_head scan_q;
+
 static void do_cciss_request(struct request_queue *q);
 static irqreturn_t do_cciss_intr(int irq, void *dev_id);
 static int cciss_open(struct block_device *bdev, fmode_t mode);
@@ -189,6 +194,8 @@ static int sendcmd_withirq_core(ctlr_info_t *h, CommandList_struct *c,
 static int process_sendcmd_error(ctlr_info_t *h, CommandList_struct *c);
 
 static void fail_all_cmds(unsigned long ctlr);
+static int add_to_scan_list(struct ctlr_info *h);
+static void remove_from_scan_list(struct ctlr_info *h);
 static int scan_thread(void *data);
 static int check_for_unit_attention(ctlr_info_t *h, CommandList_struct *c);
 
@@ -3232,20 +3239,121 @@ static irqreturn_t do_cciss_intr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
-static int scan_thread(void *data)
+/**
+ * add_to_scan_list() - add controller to rescan queue
+ * @h:		      Pointer to the controller.
+ *
+ * Adds the controller to the rescan queue if not already on the queue.
+ *
+ * returns 1 if added to the queue, 0 if skipped (could be on the
+ * queue already, or the controller could be initializing or shutting
+ * down).
+ **/
+static int add_to_scan_list(struct ctlr_info *h)
 {
-	ctlr_info_t *h = data;
-	int rc;
-	DECLARE_COMPLETION_ONSTACK(wait);
-	h->rescan_wait = &wait;
+	struct ctlr_info *test_h;
+	int found = 0;
+	int ret = 0;
+
+	if (h->busy_initializing)
+		return 0;
+
+	if (!mutex_trylock(&h->busy_shutting_down))
+		return 0;
 
-	for (;;) {
-		rc = wait_for_completion_interruptible(&wait);
-		if (kthread_should_stop())
+	mutex_lock(&scan_mutex);
+	list_for_each_entry(test_h, &scan_q, scan_list) {
+		if (test_h == h) {
+			found = 1;
 			break;
-		if (!rc)
+		}
+	}
+	if (!found && !h->busy_scanning) {
+		INIT_COMPLETION(h->scan_wait);
+		list_add_tail(&h->scan_list, &scan_q);
+		ret = 1;
+	}
+	mutex_unlock(&scan_mutex);
+	mutex_unlock(&h->busy_shutting_down);
+
+	return ret;
+}
+
+/**
+ * remove_from_scan_list() - remove controller from rescan queue
+ * @h:			   Pointer to the controller.
+ *
+ * Removes the controller from the rescan queue if present. Blocks if
+ * the controller is currently conducting a rescan.
+ **/
+static void remove_from_scan_list(struct ctlr_info *h)
+{
+	struct ctlr_info *test_h, *tmp_h;
+	int scanning = 0;
+
+	mutex_lock(&scan_mutex);
+	list_for_each_entry_safe(test_h, tmp_h, &scan_q, scan_list) {
+		if (test_h == h) {
+			list_del(&h->scan_list);
+			complete_all(&h->scan_wait);
+			mutex_unlock(&scan_mutex);
+			return;
+		}
+	}
+	if (&h->busy_scanning)
+		scanning = 0;
+	mutex_unlock(&scan_mutex);
+
+	if (scanning)
+		wait_for_completion(&h->scan_wait);
+}
+
+/**
+ * scan_thread() - kernel thread used to rescan controllers
+ * @data:	 Ignored.
+ *
+ * A kernel thread used scan for drive topology changes on
+ * controllers. The thread processes only one controller at a time
+ * using a queue.  Controllers are added to the queue using
+ * add_to_scan_list() and removed from the queue either after done
+ * processing or using remove_from_scan_list().
+ *
+ * returns 0.
+ **/
+static int scan_thread(void *data)
+{
+	struct ctlr_info *h;
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	while (!kthread_should_stop()) {
+		mutex_lock(&scan_mutex);
+		if (list_empty(&scan_q)) {
+			h = NULL;
+		} else {
+			h = list_entry(scan_q.next,
+				       struct ctlr_info,
+				       scan_list);
+			list_del(&h->scan_list);
+			h->busy_scanning = 1;
+		}
+		mutex_unlock(&scan_mutex);
+
+		if (h) {
+			__set_current_state(TASK_RUNNING);
 			rebuild_lun_table(h, 0);
+			complete_all(&h->scan_wait);
+			mutex_lock(&scan_mutex);
+			h->busy_scanning = 0;
+			mutex_unlock(&scan_mutex);
+			set_current_state(TASK_INTERRUPTIBLE);
+		} else {
+			schedule();
+			set_current_state(TASK_INTERRUPTIBLE);
+			continue;
+		}
 	}
+
+	__set_current_state(TASK_RUNNING);
 	return 0;
 }
 
@@ -3268,8 +3376,8 @@ static int check_for_unit_attention(ctlr_info_t *h, CommandList_struct *c)
 	case REPORT_LUNS_CHANGED:
 		printk(KERN_WARNING "cciss%d: report LUN data "
 			"changed\n", h->ctlr);
-		if (h->rescan_wait)
-			complete(h->rescan_wait);
+		add_to_scan_list(h);
+		wake_up_process(cciss_scan_thread);
 		return 1;
 	break;
 	case POWER_OR_RESET:
@@ -3918,6 +4026,7 @@ static int __devinit cciss_init_one(struct pci_dev *pdev,
 	hba[i]->busy_initializing = 1;
 	INIT_HLIST_HEAD(&hba[i]->cmpQ);
 	INIT_HLIST_HEAD(&hba[i]->reqQ);
+	mutex_init(&hba[i]->busy_shutting_down);
 
 	if (cciss_pci_init(hba[i], pdev) != 0)
 		goto clean0;
@@ -3926,6 +4035,8 @@ static int __devinit cciss_init_one(struct pci_dev *pdev,
 	hba[i]->ctlr = i;
 	hba[i]->pdev = pdev;
 
+	init_completion(&hba[i]->scan_wait);
+
 	if (cciss_create_hba_sysfs_entry(hba[i]))
 		goto clean0;
 
@@ -4037,10 +4148,6 @@ static int __devinit cciss_init_one(struct pci_dev *pdev,
 	hba[i]->busy_initializing = 0;
 
 	rebuild_lun_table(hba[i], 1);
-	hba[i]->cciss_scan_thread = kthread_run(scan_thread, hba[i],
-				"cciss_scan%02d", i);
-	if (IS_ERR(hba[i]->cciss_scan_thread))
-		return PTR_ERR(hba[i]->cciss_scan_thread);
 
 	return 1;
 
@@ -4125,8 +4232,9 @@ static void __devexit cciss_remove_one(struct pci_dev *pdev)
 		return;
 	}
 
-	kthread_stop(hba[i]->cciss_scan_thread);
+	mutex_lock(&hba[i]->busy_shutting_down);
 
+	remove_from_scan_list(hba[i]);
 	remove_proc_entry(hba[i]->devname, proc_cciss);
 	unregister_blkdev(hba[i]->major, hba[i]->devname);
 
@@ -4173,6 +4281,7 @@ static void __devexit cciss_remove_one(struct pci_dev *pdev)
 	pci_release_regions(pdev);
 	pci_set_drvdata(pdev, NULL);
 	cciss_destroy_hba_sysfs_entry(hba[i]);
+	mutex_unlock(&hba[i]->busy_shutting_down);
 	free_hba(i);
 }
 
@@ -4205,15 +4314,27 @@ static int __init cciss_init(void)
 	if (err)
 		return err;
 
+	/* Start the scan thread */
+	mutex_init(&scan_mutex);
+	INIT_LIST_HEAD(&scan_q);
+	cciss_scan_thread = kthread_run(scan_thread, NULL, "cciss_scan");
+	if (IS_ERR(cciss_scan_thread)) {
+		err = PTR_ERR(cciss_scan_thread);
+		goto err_bus_unregister;
+	}
+
 	/* Register for our PCI devices */
 	err = pci_register_driver(&cciss_pci_driver);
 	if (err)
-		goto err_bus_register;
+		goto err_thread_stop;
 
 	return 0;
 
-err_bus_register:
+err_thread_stop:
+	kthread_stop(cciss_scan_thread);
+err_bus_unregister:
 	bus_unregister(&cciss_bus_type);
+
 	return err;
 }
 
@@ -4230,6 +4351,7 @@ static void __exit cciss_cleanup(void)
 			cciss_remove_one(hba[i]->pdev);
 		}
 	}
+	kthread_stop(cciss_scan_thread);
 	remove_proc_entry("driver/cciss", NULL);
 	bus_unregister(&cciss_bus_type);
 }
diff --git a/drivers/block/cciss.h b/drivers/block/cciss.h
index 06a5db2..859e0e0 100644
--- a/drivers/block/cciss.h
+++ b/drivers/block/cciss.h
@@ -2,6 +2,7 @@
 #define CCISS_H
 
 #include <linux/genhd.h>
+#include <linux/mutex.h>
 
 #include "cciss_cmd.h"
 
@@ -108,6 +109,8 @@ struct ctlr_info
 	int			nr_frees; 
 	int			busy_configuring;
 	int			busy_initializing;
+	int			busy_scanning;
+	struct mutex   		busy_shutting_down;
 
 	/* This element holds the zero based queue number of the last
 	 * queue to be started.  It is used for fairness.
@@ -122,8 +125,8 @@ struct ctlr_info
 	/* and saved for later processing */
 #endif
 	unsigned char alive;
-	struct completion *rescan_wait;
-	struct task_struct *cciss_scan_thread;
+	struct list_head scan_list;
+	struct completion scan_wait;
 	struct device dev;
 };
 


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] cciss: kick off logical drive topology rescan through sysfs
  2009-07-14 22:02 [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Patterson
  2009-07-14 22:02 ` [PATCH 1/3] cciss: remove logical drive sysfs entries during driver cleanup Andrew Patterson
  2009-07-14 22:02 ` [PATCH 2/3] cciss: use only one scan thread Andrew Patterson
@ 2009-07-14 22:02 ` Andrew Patterson
  2009-07-15 22:08 ` [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Morton
  3 siblings, 0 replies; 9+ messages in thread
From: Andrew Patterson @ 2009-07-14 22:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, linux-scsi, mike.miller, jens.axboe

cciss: kick off logical drive topology rescan through sysfs

Added /sys/bus/pci/devices/<dev>/ccissX/rescan sysfs entry used
to kick off a rescan that discovers logical drive topology changes.

Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
---

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
index 0a92a7c..4050606 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
@@ -31,3 +31,10 @@ Date:		March 2009
 Kernel Version: 2.6.30
 Contact:	iss_storagedev@hp.com
 Description:	A symbolic link to /sys/block/cciss!cXdY
+
+Where:		/sys/bus/pci/devices/<dev>/ccissX/rescan
+Date:		July 2009
+Kernel Version:	2.6.31
+Contact:	iss_storagedev@hp.com
+Description:	Kicks of a rescan of the controller to discovery logical
+		drive topology changes.
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index c6bba77..a409e26 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -461,9 +461,19 @@ static void __devinit cciss_procinit(int i)
 #define to_hba(n) container_of(n, struct ctlr_info, dev)
 #define to_drv(n) container_of(n, drive_info_struct, dev)
 
-static struct device_type cciss_host_type = {
-	.name		= "cciss_host",
-};
+static ssize_t host_store_rescan(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	struct ctlr_info *h = to_hba(dev);
+
+	add_to_scan_list(h);
+	wake_up_process(cciss_scan_thread);
+	wait_for_completion_interruptible(&h->scan_wait);
+
+	return count;
+}
+DEVICE_ATTR(rescan, S_IWUSR, NULL, host_store_rescan);
 
 static ssize_t dev_show_unique_id(struct device *dev,
 				 struct device_attribute *attr,
@@ -567,6 +577,25 @@ static ssize_t dev_show_rev(struct device *dev,
 }
 DEVICE_ATTR(rev, S_IRUGO, dev_show_rev, NULL);
 
+static struct attribute *cciss_host_attrs[] = {
+	&dev_attr_rescan.attr,
+	NULL
+};
+
+static struct attribute_group cciss_host_attr_group = {
+	.attrs = cciss_host_attrs,
+};
+
+static struct attribute_group *cciss_host_attr_groups[] = {
+	&cciss_host_attr_group,
+	NULL
+};
+
+static struct device_type cciss_host_type = {
+	.name		= "cciss_host",
+	.groups		= cciss_host_attr_groups,
+};
+
 static struct attribute *cciss_dev_attrs[] = {
 	&dev_attr_unique_id.attr,
 	&dev_attr_model.attr,


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] cciss: use only one scan thread
  2009-07-14 22:02 ` [PATCH 2/3] cciss: use only one scan thread Andrew Patterson
@ 2009-07-15 22:06   ` Andrew Morton
  2009-07-16  4:15     ` Andrew Patterson
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2009-07-15 22:06 UTC (permalink / raw)
  To: Andrew Patterson; +Cc: linux-kernel, linux-scsi, mike.miller, jens.axboe

On Tue, 14 Jul 2009 16:02:50 -0600
Andrew Patterson <andrew.patterson@hp.com> wrote:

> cciss: use only one scan thread
> 
> Replace the use of one scan kthread per controller with one per driver.
> Use a queue to hold a list of controllers that need to be rescanned with
> routines to add and remove controllers from the queue.
> Fix locking and completion handling to prevent a hang during rmmod.
> 
>
> ...
>
> --- a/drivers/block/cciss.c
> +++ b/drivers/block/cciss.c
> @@ -39,6 +39,7 @@
>  #include <linux/hdreg.h>
>  #include <linux/spinlock.h>
>  #include <linux/compat.h>
> +#include <linux/mutex.h>
>  #include <asm/uaccess.h>
>  #include <asm/io.h>
>  
> @@ -155,6 +156,10 @@ static struct board_type products[] = {
>  
>  static ctlr_info_t *hba[MAX_CTLR];
>  
> +static struct task_struct *cciss_scan_thread;
> +static struct mutex scan_mutex;
> +static struct list_head scan_q;

Both of the above can be initialised at compile-time.  This is a bit
neater and prevents reviewers from having to run around checking that
they got initialised early enough.

>  static void do_cciss_request(struct request_queue *q);
>  static irqreturn_t do_cciss_intr(int irq, void *dev_id);
>  static int cciss_open(struct block_device *bdev, fmode_t mode);
> @@ -189,6 +194,8 @@ static int sendcmd_withirq_core(ctlr_info_t *h, CommandList_struct *c,
>  static int process_sendcmd_error(ctlr_info_t *h, CommandList_struct *c);
>  
>  static void fail_all_cmds(unsigned long ctlr);
> +static int add_to_scan_list(struct ctlr_info *h);
> +static void remove_from_scan_list(struct ctlr_info *h);

These forward declarations are unneeded.

>  static int scan_thread(void *data);
>  static int check_for_unit_attention(ctlr_info_t *h, CommandList_struct *c);
>  
> @@ -3232,20 +3239,121 @@ static irqreturn_t do_cciss_intr(int irq, void *dev_id)
>  	return IRQ_HANDLED;
>  }
>  
> -static int scan_thread(void *data)
> +/**
> + * add_to_scan_list() - add controller to rescan queue
> + * @h:		      Pointer to the controller.
> + *
> + * Adds the controller to the rescan queue if not already on the queue.
> + *
> + * returns 1 if added to the queue, 0 if skipped (could be on the
> + * queue already, or the controller could be initializing or shutting
> + * down).
> + **/
> +static int add_to_scan_list(struct ctlr_info *h)
>  {
> -	ctlr_info_t *h = data;
> -	int rc;
> -	DECLARE_COMPLETION_ONSTACK(wait);
> -	h->rescan_wait = &wait;
> +	struct ctlr_info *test_h;
> +	int found = 0;
> +	int ret = 0;
> +
> +	if (h->busy_initializing)
> +		return 0;

This looks pretty random and racy.

For example, what stops busy_initializing from becoming true one
picosecond after the test?

> +	if (!mutex_trylock(&h->busy_shutting_down))
> +		return 0;
>  
> -	for (;;) {
> -		rc = wait_for_completion_interruptible(&wait);
> -		if (kthread_should_stop())
> +	mutex_lock(&scan_mutex);
> +	list_for_each_entry(test_h, &scan_q, scan_list) {
> +		if (test_h == h) {
> +			found = 1;
>  			break;
> -		if (!rc)
> +		}
> +	}
> +	if (!found && !h->busy_scanning) {
> +		INIT_COMPLETION(h->scan_wait);
> +		list_add_tail(&h->scan_list, &scan_q);
> +		ret = 1;
> +	}
> +	mutex_unlock(&scan_mutex);
> +	mutex_unlock(&h->busy_shutting_down);
> +
> +	return ret;
> +}

We already did init_completion(h->scan_wait) at startup time?  Doing
the INIT_COMPLETION() here looks unusual and is hopefully unnecessary.

> +/**
> + * scan_thread() - kernel thread used to rescan controllers
> + * @data:	 Ignored.
> + *
> + * A kernel thread used scan for drive topology changes on
> + * controllers. The thread processes only one controller at a time
> + * using a queue.  Controllers are added to the queue using
> + * add_to_scan_list() and removed from the queue either after done
> + * processing or using remove_from_scan_list().
> + *
> + * returns 0.
> + **/
> +static int scan_thread(void *data)
> +{
> +	struct ctlr_info *h;
> +
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	while (!kthread_should_stop()) {
> +		mutex_lock(&scan_mutex);
> +		if (list_empty(&scan_q)) {
> +			h = NULL;
> +		} else {
> +			h = list_entry(scan_q.next,
> +				       struct ctlr_info,
> +				       scan_list);
> +			list_del(&h->scan_list);
> +			h->busy_scanning = 1;
> +		}
> +		mutex_unlock(&scan_mutex);
> +
> +		if (h) {
> +			__set_current_state(TASK_RUNNING);
>  			rebuild_lun_table(h, 0);
> +			complete_all(&h->scan_wait);
> +			mutex_lock(&scan_mutex);
> +			h->busy_scanning = 0;
> +			mutex_unlock(&scan_mutex);
> +			set_current_state(TASK_INTERRUPTIBLE);
> +		} else {
> +			schedule();
> +			set_current_state(TASK_INTERRUPTIBLE);
> +			continue;
> +		}
>  	}
> +
> +	__set_current_state(TASK_RUNNING);
>  	return 0;
>  }

This code is somewhat wrong.  Look:

	set_current_state(TASK_INTERRUPTIBLE);
	...
		mutex_lock(&scan_mutex);

now, we don't know what state the task is in.  If the mutex_lock()
slept the we're in state TASK_RUNNING.  If the mutex_lock() didn't
sleep then we're in state <mumble>.  Where
<mumble>==TASK_INTERRUPTIBLE, but that's an implementation fluke.

So I'd say that some rethinking/restructuring is needed here.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] cciss rmmod/scan-thread fixes
  2009-07-14 22:02 [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Patterson
                   ` (2 preceding siblings ...)
  2009-07-14 22:02 ` [PATCH 3/3] cciss: kick off logical drive topology rescan through sysfs Andrew Patterson
@ 2009-07-15 22:08 ` Andrew Morton
  2009-07-16  5:45   ` Andrew Patterson
  3 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2009-07-15 22:08 UTC (permalink / raw)
  To: Andrew Patterson; +Cc: linux-kernel, linux-scsi, mike.miller, jens.axboe

On Tue, 14 Jul 2009 16:02:40 -0600
Andrew Patterson <andrew.patterson@hp.com> wrote:

> The following series fixes several problems causing hangs while
> rmmoding the cciss driver. In the process of fixing these hangs, I
> also reworked the the logical drive scanning kernel thread to use one
> thread per driver rather than one thread per controller. I also added
> a sysfs attribute to kick off a controller scan.
> 
>  .../ABI/testing/sysfs-bus-pci-devices-cciss        |    7 +
>  drivers/block/cciss.c                              |  198 ++++++++++++++++++--
>  drivers/block/cciss.h                              |    7 +
>  3 files changed, 188 insertions(+), 24 deletions(-)
> 
> Andrew Patterson (3):
>       cciss: remove logical drive sysfs entries during driver cleanup.
>       cciss: use only one scan thread
>       cciss: kick off logical drive topology rescan through sysfs
> 

So I assume that these are intended to fix the problem which
cciss-fix-the-termination-of-the-scanning-thread.patch didn't fix?

Once we get this all sorted out, which kernel version(s) are we
targetting and why?

Thanks.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] cciss: use only one scan thread
  2009-07-15 22:06   ` Andrew Morton
@ 2009-07-16  4:15     ` Andrew Patterson
  2009-07-16  4:33       ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Patterson @ 2009-07-16  4:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-scsi, mike.miller, jens.axboe

On Wed, 2009-07-15 at 15:06 -0700, Andrew Morton wrote:
> On Tue, 14 Jul 2009 16:02:50 -0600
> Andrew Patterson <andrew.patterson@hp.com> wrote:
> 
> > cciss: use only one scan thread
> > 
> > Replace the use of one scan kthread per controller with one per driver.
> > Use a queue to hold a list of controllers that need to be rescanned with
> > routines to add and remove controllers from the queue.
> > Fix locking and completion handling to prevent a hang during rmmod.
> > 
> >
> > ...
> >
> > --- a/drivers/block/cciss.c
> > +++ b/drivers/block/cciss.c
> > @@ -39,6 +39,7 @@
> >  #include <linux/hdreg.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/compat.h>
> > +#include <linux/mutex.h>
> >  #include <asm/uaccess.h>
> >  #include <asm/io.h>
> >  
> > @@ -155,6 +156,10 @@ static struct board_type products[] = {
> >  
> >  static ctlr_info_t *hba[MAX_CTLR];
> >  
> > +static struct task_struct *cciss_scan_thread;
> > +static struct mutex scan_mutex;
> > +static struct list_head scan_q;
> 
> Both of the above can be initialised at compile-time.  This is a bit
> neater and prevents reviewers from having to run around checking that
> they got initialised early enough.

Fixed.

> 
> >  static void do_cciss_request(struct request_queue *q);
> >  static irqreturn_t do_cciss_intr(int irq, void *dev_id);
> >  static int cciss_open(struct block_device *bdev, fmode_t mode);
> > @@ -189,6 +194,8 @@ static int sendcmd_withirq_core(ctlr_info_t *h, CommandList_struct *c,
> >  static int process_sendcmd_error(ctlr_info_t *h, CommandList_struct *c);
> >  
> >  static void fail_all_cmds(unsigned long ctlr);
> > +static int add_to_scan_list(struct ctlr_info *h);
> > +static void remove_from_scan_list(struct ctlr_info *h);
> 
> These forward declarations are unneeded.
> 

Fixed.

> >  static int scan_thread(void *data);
> >  static int check_for_unit_attention(ctlr_info_t *h, CommandList_struct *c);
> >  
> > @@ -3232,20 +3239,121 @@ static irqreturn_t do_cciss_intr(int irq, void *dev_id)
> >  	return IRQ_HANDLED;
> >  }
> >  
> > -static int scan_thread(void *data)
> > +/**
> > + * add_to_scan_list() - add controller to rescan queue
> > + * @h:		      Pointer to the controller.
> > + *
> > + * Adds the controller to the rescan queue if not already on the queue.
> > + *
> > + * returns 1 if added to the queue, 0 if skipped (could be on the
> > + * queue already, or the controller could be initializing or shutting
> > + * down).
> > + **/
> > +static int add_to_scan_list(struct ctlr_info *h)
> >  {
> > -	ctlr_info_t *h = data;
> > -	int rc;
> > -	DECLARE_COMPLETION_ONSTACK(wait);
> > -	h->rescan_wait = &wait;
> > +	struct ctlr_info *test_h;
> > +	int found = 0;
> > +	int ret = 0;
> > +
> > +	if (h->busy_initializing)
> > +		return 0;
> 
> This looks pretty random and racy.

Fixed. Converted busy_initializer to a mutex and will use:

if (mutex_is_locked(&h->busy_initializing))
		return 0;

> 
> For example, what stops busy_initializing from becoming true one
> picosecond after the test?
> 
> > +	if (!mutex_trylock(&h->busy_shutting_down))
> > +		return 0;
> >  
> > -	for (;;) {
> > -		rc = wait_for_completion_interruptible(&wait);
> > -		if (kthread_should_stop())
> > +	mutex_lock(&scan_mutex);
> > +	list_for_each_entry(test_h, &scan_q, scan_list) {
> > +		if (test_h == h) {
> > +			found = 1;
> >  			break;
> > -		if (!rc)
> > +		}
> > +	}
> > +	if (!found && !h->busy_scanning) {
> > +		INIT_COMPLETION(h->scan_wait);
> > +		list_add_tail(&h->scan_list, &scan_q);
> > +		ret = 1;
> > +	}
> > +	mutex_unlock(&scan_mutex);
> > +	mutex_unlock(&h->busy_shutting_down);
> > +
> > +	return ret;
> > +}
> 
> We already did init_completion(h->scan_wait) at startup time?  Doing
> the INIT_COMPLETION() here looks unusual and is hopefully unnecessary.
> 

I am following the text in the "Linux Device Drivers 3rd Edition section
5.4:

"A completion is normally a one-shot device; it is used once then
discarded. It is possible, however, to reuse completion structures if
proper care is taken. If complete_all is not used, a completion
structure can be reused without any problems as long as there is no
ambiguity about what event is being signalled. If you use complete_all,
however, you must reinitialize the completion structure before reusing
it. The macro:

INIT_COMPLETION(struct completion c);

can be used to quickly perform this reinitialization."

It there are better way to do this? INIT_COMPLETION() currently just
sets done = 0.

> > +/**
> > + * scan_thread() - kernel thread used to rescan controllers
> > + * @data:	 Ignored.
> > + *
> > + * A kernel thread used scan for drive topology changes on
> > + * controllers. The thread processes only one controller at a time
> > + * using a queue.  Controllers are added to the queue using
> > + * add_to_scan_list() and removed from the queue either after done
> > + * processing or using remove_from_scan_list().
> > + *
> > + * returns 0.
> > + **/
> > +static int scan_thread(void *data)
> > +{
> > +	struct ctlr_info *h;
> > +
> > +	set_current_state(TASK_INTERRUPTIBLE);
> > +	while (!kthread_should_stop()) {
> > +		mutex_lock(&scan_mutex);
> > +		if (list_empty(&scan_q)) {
> > +			h = NULL;
> > +		} else {
> > +			h = list_entry(scan_q.next,
> > +				       struct ctlr_info,
> > +				       scan_list);
> > +			list_del(&h->scan_list);
> > +			h->busy_scanning = 1;
> > +		}
> > +		mutex_unlock(&scan_mutex);
> > +
> > +		if (h) {
> > +			__set_current_state(TASK_RUNNING);
> >  			rebuild_lun_table(h, 0);
> > +			complete_all(&h->scan_wait);
> > +			mutex_lock(&scan_mutex);
> > +			h->busy_scanning = 0;
> > +			mutex_unlock(&scan_mutex);
> > +			set_current_state(TASK_INTERRUPTIBLE);
> > +		} else {
> > +			schedule();
> > +			set_current_state(TASK_INTERRUPTIBLE);
> > +			continue;
> > +		}
> >  	}
> > +
> > +	__set_current_state(TASK_RUNNING);
> >  	return 0;
> >  }
> 
> This code is somewhat wrong.  Look:
> 
> 	set_current_state(TASK_INTERRUPTIBLE);
> 	...
> 		mutex_lock(&scan_mutex);
> 
> now, we don't know what state the task is in.  If the mutex_lock()
> slept the we're in state TASK_RUNNING.  If the mutex_lock() didn't
> sleep then we're in state <mumble>.  Where
> <mumble>==TASK_INTERRUPTIBLE, but that's an implementation fluke.
> 
> So I'd say that some rethinking/restructuring is needed here.

Will look into this.  I have had a hard time removing this possible race
(even the mutex_lock() does not remove it completely, but at most you
get multiple completions) without using some sort of nested locks.

> 
> 
-- 
Andrew Patterson
Hewlett-Packard


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] cciss: use only one scan thread
  2009-07-16  4:15     ` Andrew Patterson
@ 2009-07-16  4:33       ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2009-07-16  4:33 UTC (permalink / raw)
  To: Andrew Patterson; +Cc: linux-kernel, linux-scsi, mike.miller, jens.axboe

On Thu, 16 Jul 2009 04:15:59 +0000 Andrew Patterson <andrew.patterson@hp.com> wrote:

> > > +}
> > 
> > We already did init_completion(h->scan_wait) at startup time?  Doing
> > the INIT_COMPLETION() here looks unusual and is hopefully unnecessary.
> > 
> 
> I am following the text in the "Linux Device Drivers 3rd Edition section
> 5.4:
> 
> "A completion is normally a one-shot device; it is used once then
> discarded. It is possible, however, to reuse completion structures if
> proper care is taken. If complete_all is not used, a completion
> structure can be reused without any problems as long as there is no
> ambiguity about what event is being signalled. If you use complete_all,
> however, you must reinitialize the completion structure before reusing
> it. The macro:
> 
> INIT_COMPLETION(struct completion c);
> 
> can be used to quickly perform this reinitialization."
> 
> It there are better way to do this? INIT_COMPLETION() currently just
> sets done = 0.

OK, I'd assumed that a full wait_for_completion()/complete() operation
would leave the completion in ready-to-use-again state.

But wait_for_completion_interruptible() can indeed leave
completion.done in a non-zero state if it was interrupted by a signal.
hm.  Perhaps so the caller can run wait_for_completion*() again.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] cciss rmmod/scan-thread fixes
  2009-07-15 22:08 ` [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Morton
@ 2009-07-16  5:45   ` Andrew Patterson
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Patterson @ 2009-07-16  5:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-scsi, mike.miller, jens.axboe

On Wed, 2009-07-15 at 15:08 -0700, Andrew Morton wrote:
> On Tue, 14 Jul 2009 16:02:40 -0600
> Andrew Patterson <andrew.patterson@hp.com> wrote:
> 
> > The following series fixes several problems causing hangs while
> > rmmoding the cciss driver. In the process of fixing these hangs, I
> > also reworked the the logical drive scanning kernel thread to use one
> > thread per driver rather than one thread per controller. I also added
> > a sysfs attribute to kick off a controller scan.
> > 
> >  .../ABI/testing/sysfs-bus-pci-devices-cciss        |    7 +
> >  drivers/block/cciss.c                              |  198 ++++++++++++++++++--
> >  drivers/block/cciss.h                              |    7 +
> >  3 files changed, 188 insertions(+), 24 deletions(-)
> > 
> > Andrew Patterson (3):
> >       cciss: remove logical drive sysfs entries during driver cleanup.
> >       cciss: use only one scan thread
> >       cciss: kick off logical drive topology rescan through sysfs
> > 
> 
> So I assume that these are intended to fix the problem which
> cciss-fix-the-termination-of-the-scanning-thread.patch didn't fix?

Correct. As well as a lot of the race conditions during cleanup.

> 
> Once we get this all sorted out, which kernel version(s) are we
> targetting and why?

The current problems this patchset tries to solve is that rmmod of the
module hangs along with logical hot remove.

I was hoping to get these in as soon as possible. It looks like the
sysfs additions and the scan_thread patches initially went into 2.6.30
(if my limited git-fu is telling me correctly), so I can't call this a
regression.  I guess 2.6.32 then.

> 
> Thanks.
> 
-- 
Andrew Patterson
Hewlett-Packard


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-07-16  5:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-14 22:02 [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Patterson
2009-07-14 22:02 ` [PATCH 1/3] cciss: remove logical drive sysfs entries during driver cleanup Andrew Patterson
2009-07-14 22:02 ` [PATCH 2/3] cciss: use only one scan thread Andrew Patterson
2009-07-15 22:06   ` Andrew Morton
2009-07-16  4:15     ` Andrew Patterson
2009-07-16  4:33       ` Andrew Morton
2009-07-14 22:02 ` [PATCH 3/3] cciss: kick off logical drive topology rescan through sysfs Andrew Patterson
2009-07-15 22:08 ` [PATCH 0/3] cciss rmmod/scan-thread fixes Andrew Morton
2009-07-16  5:45   ` Andrew Patterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox