[PATCH] firewire: core: use non-reentrant workqueue where necessary

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] firewire: core: use non-reentrant workqueue where necessary
@ 2010-10-10 14:55 Stefan Richter
  2010-10-10 14:57 ` [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown Stefan Richter
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Stefan Richter @ 2010-10-10 14:55 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Tejun Heo

firewire-core manages the following types of work items:

fw_card.br_work:
  - resets the bus on a card and possibly sends a PHY packet before that
  - does not sleep for long or not at all
  - is scheduled via fw_schedule_bus_reset() by
      - firewire-ohci's pci_probe method
      - firewire-ohci's set_config_rom method, called by kernelspace
        protocol drivers and userspace drivers which add/remove
	Configuration ROM descriptors
      - userspace drivers which use the bus reset ioctl
      - itself if the last reset happened less than 2 seconds ago
  - should not be executed in parallel by multiple CPUs but was not
    strictly protected against that

fw_card.bm_work:
  - performs bus management duties
  - usually does not (but may in corner cases) sleep for long
  - is scheduled via fw_schedule_bm_work() by
      - firewire-ohci's self-ID-complete IRQ handler tasklet
      - firewire-core's fw_device.work instances whenever the root node
        device was (successfully or unsuccessfully) discovered,
	refreshed, or rediscovered
      - itself in case of resource allocation failures or in order to
        obey the 125ms bus manager arbitration interval
  - must not be executed in parallel by multiple CPUs but was not
    strictly protected against that

fw_device.work:
  - performs node probe, update, shutdown, revival, removal; including
    kernel driver probe, update, shutdown and bus reset notification to
    userspace drivers
  - usually sleeps moderately long, in corner cases very long
  - is scheduled by
      - firewire-ohci's self-ID-complete IRQ handler tasklet via the
        core's fw_node_event
      - firewire-ohci's pci_remove method via core's fw_destroy_nodes/
        fw_node_event
      - itself during retries, e.g. while a node is powering up
  - must not be executed in parallel by multiple CPUs but was not
    strictly protected against that

iso_resource.work:
  - accesses registers at the Isochronous Resource Manager node
  - usually does not (but may in corner cases) sleep for long
  - is scheduled via schedule_iso_resource() by
      - the owning userspace driver at addition and removal of the
        resource
      - firewire-core's fw_device.work instances after bus reset
      - itself in case of resource allocation if necessary to obey the
        1000ms reallocation period after bus reset
  - must not be executed in parallel by multiple CPUs but was not
    strictly protected against that

Therefore queue all of these types of work items on system_nrt_wq
instead of system_wq.  The former guarantees non-reentrance across all
CPUs, the latter only on the CPU which schedules a work item.

As a bonus, other subsystems which flush system_wq won't be held up if
the firewire subsystem spends a lot of time in an extraordinarily long
fw_device.work.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/firewire/core-card.c   |    6 +++---
 drivers/firewire/core-cdev.c   |    2 +-
 drivers/firewire/core-device.c |   28 +++++++++++++++++-----------
 3 files changed, 21 insertions(+), 15 deletions(-)

Index: b/drivers/firewire/core-card.c
===================================================================
--- a/drivers/firewire/core-card.c
+++ b/drivers/firewire/core-card.c
@@ -219,8 +219,8 @@ void fw_schedule_bus_reset(struct fw_car
 
 	/* Use an arbitrary short delay to combine multiple reset requests. */
 	fw_card_get(card);
-	if (!schedule_delayed_work(&card->br_work,
-				   delayed ? DIV_ROUND_UP(HZ, 100) : 0))
+	if (!queue_delayed_work(system_nrt_wq, &card->br_work,
+				delayed ? DIV_ROUND_UP(HZ, 100) : 0))
 		fw_card_put(card);
 }
 EXPORT_SYMBOL(fw_schedule_bus_reset);
@@ -232,7 +232,7 @@ static void br_work(struct work_struct *
 	/* Delay for 2s after last reset per IEEE 1394 clause 8.2.1. */
 	if (card->reset_jiffies != 0 &&
 	    time_is_after_jiffies(card->reset_jiffies + 2 * HZ)) {
-		if (!schedule_delayed_work(&card->br_work, 2 * HZ))
+		if (!queue_delayed_work(system_nrt_wq, &card->br_work, 2 * HZ))
 			fw_card_put(card);
 		return;
 	}
Index: b/drivers/firewire/core-cdev.c
===================================================================
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -149,7 +149,7 @@ static void release_iso_resource(struct 
 static void schedule_iso_resource(struct iso_resource *r, unsigned long delay)
 {
 	client_get(r->client);
-	if (!schedule_delayed_work(&r->work, delay))
+	if (!queue_delayed_work(system_nrt_wq, &r->work, delay))
 		client_put(r->client);
 }
 
Index: b/drivers/firewire/core-device.c
===================================================================
--- a/drivers/firewire/core-device.c
+++ b/drivers/firewire/core-device.c
@@ -725,6 +725,12 @@ struct fw_device *fw_device_get_by_devt(
 	return device;
 }
 
+static void fw_schedule_device_work(struct fw_device *device,
+				    unsigned long delay)
+{
+	queue_delayed_work(system_nrt_wq, &device->work, delay);
+}
+
 /*
  * These defines control the retry behavior for reading the config
  * rom.  It shouldn't be necessary to tweak these; if the device
@@ -749,7 +755,7 @@ static void fw_device_shutdown(struct wo
 
 	if (time_is_after_jiffies(device->card->reset_jiffies + SHUTDOWN_DELAY)
 	    && !list_empty(&device->card->link)) {
-		schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+		fw_schedule_device_work(device, SHUTDOWN_DELAY);
 		return;
 	}
 
@@ -861,7 +867,7 @@ static int lookup_existing_device(struct
 		fw_notify("rediscovered device %s\n", dev_name(dev));
 
 		PREPARE_DELAYED_WORK(&old->work, fw_device_update);
-		schedule_delayed_work(&old->work, 0);
+		fw_schedule_device_work(old, 0);
 
 		if (current_node == card->root_node)
 			fw_schedule_bm_work(card, 0);
@@ -952,7 +958,7 @@ static void fw_device_init(struct work_s
 		if (device->config_rom_retries < MAX_RETRIES &&
 		    atomic_read(&device->state) == FW_DEVICE_INITIALIZING) {
 			device->config_rom_retries++;
-			schedule_delayed_work(&device->work, RETRY_DELAY);
+			fw_schedule_device_work(device, RETRY_DELAY);
 		} else {
 			fw_notify("giving up on config rom for node id %x\n",
 				  device->node_id);
@@ -1017,7 +1023,7 @@ static void fw_device_init(struct work_s
 			   FW_DEVICE_INITIALIZING,
 			   FW_DEVICE_RUNNING) == FW_DEVICE_GONE) {
 		PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
-		schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+		fw_schedule_device_work(device, SHUTDOWN_DELAY);
 	} else {
 		if (device->config_rom_retries)
 			fw_notify("created device %s: GUID %08x%08x, S%d00, "
@@ -1096,7 +1102,7 @@ static void fw_device_refresh(struct wor
 		if (device->config_rom_retries < MAX_RETRIES / 2 &&
 		    atomic_read(&device->state) == FW_DEVICE_INITIALIZING) {
 			device->config_rom_retries++;
-			schedule_delayed_work(&device->work, RETRY_DELAY / 2);
+			fw_schedule_device_work(device, RETRY_DELAY / 2);
 
 			return;
 		}
@@ -1129,7 +1135,7 @@ static void fw_device_refresh(struct wor
 		if (device->config_rom_retries < MAX_RETRIES &&
 		    atomic_read(&device->state) == FW_DEVICE_INITIALIZING) {
 			device->config_rom_retries++;
-			schedule_delayed_work(&device->work, RETRY_DELAY);
+			fw_schedule_device_work(device, RETRY_DELAY);
 
 			return;
 		}
@@ -1156,7 +1162,7 @@ static void fw_device_refresh(struct wor
  gone:
 	atomic_set(&device->state, FW_DEVICE_GONE);
 	PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
-	schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+	fw_schedule_device_work(device, SHUTDOWN_DELAY);
  out:
 	if (node_id == card->root_node->node_id)
 		fw_schedule_bm_work(card, 0);
@@ -1209,7 +1215,7 @@ void fw_node_event(struct fw_card *card,
 		 * first config rom scan half a second after bus reset.
 		 */
 		INIT_DELAYED_WORK(&device->work, fw_device_init);
-		schedule_delayed_work(&device->work, INITIAL_DELAY);
+		fw_schedule_device_work(device, INITIAL_DELAY);
 		break;
 
 	case FW_NODE_INITIATED_RESET:
@@ -1224,7 +1230,7 @@ void fw_node_event(struct fw_card *card,
 			    FW_DEVICE_RUNNING,
 			    FW_DEVICE_INITIALIZING) == FW_DEVICE_RUNNING) {
 			PREPARE_DELAYED_WORK(&device->work, fw_device_refresh);
-			schedule_delayed_work(&device->work,
+			fw_schedule_device_work(device,
 				device->is_local ? 0 : INITIAL_DELAY);
 		}
 		break;
@@ -1239,7 +1245,7 @@ void fw_node_event(struct fw_card *card,
 		device->generation = card->generation;
 		if (atomic_read(&device->state) == FW_DEVICE_RUNNING) {
 			PREPARE_DELAYED_WORK(&device->work, fw_device_update);
-			schedule_delayed_work(&device->work, 0);
+			fw_schedule_device_work(device, 0);
 		}
 		break;
 
@@ -1264,7 +1270,7 @@ void fw_node_event(struct fw_card *card,
 		if (atomic_xchg(&device->state,
 				FW_DEVICE_GONE) == FW_DEVICE_RUNNING) {
 			PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
-			schedule_delayed_work(&device->work,
+			fw_schedule_device_work(device,
 				list_empty(&card->link) ? 0 : SHUTDOWN_DELAY);
 		}
 		break;

-- 
Stefan Richter
-=====-==-=- =-=- -=-=-
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-10 14:55 [PATCH] firewire: core: use non-reentrant workqueue where necessary Stefan Richter
@ 2010-10-10 14:57 ` Stefan Richter
  2010-10-11 13:43   ` Tejun Heo
  2010-10-12 21:39   ` [PATCH unfinished update] " Stefan Richter
  2010-10-11 13:29 ` [PATCH] firewire: core: use non-reentrant workqueue where necessary Tejun Heo
  2010-10-12 21:29 ` [PATCH update] firewire: core: use non-reentrant workqueue with rescuer Stefan Richter
  2 siblings, 2 replies; 17+ messages in thread
From: Stefan Richter @ 2010-10-10 14:57 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Tejun Heo

firewire-sbp2 used to perform management tasks (and SCSI Inquiry, sd
probe etc.) in one single-threaded workqueue for all targets.  Some of
these jobs may take very long, e.g. when the target needs to spin up a
disk to perform the task.

We should have had per-target workqueues for that but didn't because
that would have created respectively many kernel threads which almost
always sit idle.  Now that we have concurrency managed workqueues,
correct that lack of parallelism:

Simply swap out the driver workqueue by system_nrt_wq.  It provides all
the parallelism we will ever need, accepts long-running work, and
provides the here necessary guarantee of non-reentrance across all CPUs.
(The work items are scheduled from firewire-core's fw_device work items
which themselves may change CPUs.)

This simple approach requires though that targets with multiple logical
units accept multiple outstanding management requests.  The SBP-2 spec
implicitly requires this capability, but who knows how real firmwares
react.

I only have access to two dual-LU targets:  A OXUF924DSB based one from
MacPower and a INIC-2430 based one from IOI.  They both work fine with
this change, as do several single-LU targets of course.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/firewire/sbp2.c |    9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Index: b/drivers/firewire/sbp2.c
===================================================================
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -835,8 +835,6 @@ static void sbp2_target_put(struct sbp2_
 	kref_put(&tgt->kref, sbp2_release_target);
 }

-static struct workqueue_struct *sbp2_wq;
-
 /*
  * Always get the target's kref when scheduling work on one its units.
  * Each workqueue job is responsible to call sbp2_target_put() upon return.
@@ -844,7 +842,7 @@ static struct workqueue_struct *sbp2_wq;
 static void sbp2_queue_work(struct sbp2_logical_unit *lu, unsigned long delay)
 {
 	sbp2_target_get(lu->tgt);
-	if (!queue_delayed_work(sbp2_wq, &lu->work, delay))
+	if (!queue_delayed_work(system_nrt_wq, &lu->work, delay))
 		sbp2_target_put(lu->tgt);
 }

@@ -1656,17 +1654,12 @@ MODULE_ALIAS("sbp2");

 static int __init sbp2_init(void)
 {
-	sbp2_wq = create_singlethread_workqueue(KBUILD_MODNAME);
-	if (!sbp2_wq)
-		return -ENOMEM;
-
 	return driver_register(&sbp2_driver.driver);
 }

 static void __exit sbp2_cleanup(void)
 {
 	driver_unregister(&sbp2_driver.driver);
-	destroy_workqueue(sbp2_wq);
 }

 module_init(sbp2_init);

-- 
Stefan Richter
-=====-==-=- =-=- -=-=-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: core: use non-reentrant workqueue where necessary
  2010-10-10 14:55 [PATCH] firewire: core: use non-reentrant workqueue where necessary Stefan Richter
  2010-10-10 14:57 ` [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown Stefan Richter
@ 2010-10-11 13:29 ` Tejun Heo
  2010-10-11 19:05   ` Stefan Richter
  2010-10-12 21:29 ` [PATCH update] firewire: core: use non-reentrant workqueue with rescuer Stefan Richter
  2 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2010-10-11 13:29 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello, Stefan.

On 10/10/2010 04:55 PM, Stefan Richter wrote:
...
> Therefore queue all of these types of work items on system_nrt_wq
> instead of system_wq.  The former guarantees non-reentrance across all
> CPUs, the latter only on the CPU which schedules a work item.
> 
> As a bonus, other subsystems which flush system_wq won't be held up if
> the firewire subsystem spends a lot of time in an extraordinarily long
> fw_device.work.

Awesome, thanks a lot for doing this.  BTW, I'm in the process of
hunting down all flush_scheduled_work() users and plan to remove
workqueue flushing capability from all system workqueues.

> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

For the workqueue part,

Acked-by: Tejun Heo <tj@kernel.org>

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-10 14:57 ` [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown Stefan Richter
@ 2010-10-11 13:43   ` Tejun Heo
  2010-10-11 21:27     ` Stefan Richter
  2010-10-12 21:39   ` [PATCH unfinished update] " Stefan Richter
  1 sibling, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2010-10-11 13:43 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello,

On 10/10/2010 04:57 PM, Stefan Richter wrote:
...
> Simply swap out the driver workqueue by system_nrt_wq.  It provides all
> the parallelism we will ever need, accepts long-running work, and
> provides the here necessary guarantee of non-reentrance across all CPUs.
               ^^^^
               this probably should go away?

> (The work items are scheduled from firewire-core's fw_device work items
> which themselves may change CPUs.)
> 
> This simple approach requires though that targets with multiple logical
> units accept multiple outstanding management requests.  The SBP-2 spec
> implicitly requires this capability, but who knows how real firmwares
> react.
> 
> I only have access to two dual-LU targets:  A OXUF924DSB based one from
> MacPower and a INIC-2430 based one from IOI.  They both work fine with
> this change, as do several single-LU targets of course.
> 
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> ---
>  drivers/firewire/sbp2.c |    9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> Index: b/drivers/firewire/sbp2.c
> ===================================================================
> --- a/drivers/firewire/sbp2.c
> +++ b/drivers/firewire/sbp2.c
> @@ -835,8 +835,6 @@ static void sbp2_target_put(struct sbp2_
>  	kref_put(&tgt->kref, sbp2_release_target);
>  }
>  
> -static struct workqueue_struct *sbp2_wq;
> -
>  /*
>   * Always get the target's kref when scheduling work on one its units.
>   * Each workqueue job is responsible to call sbp2_target_put() upon return.
> @@ -844,7 +842,7 @@ static struct workqueue_struct *sbp2_wq;
>  static void sbp2_queue_work(struct sbp2_logical_unit *lu, unsigned long delay)
>  {
>  	sbp2_target_get(lu->tgt);
> -	if (!queue_delayed_work(sbp2_wq, &lu->work, delay))
> +	if (!queue_delayed_work(system_nrt_wq, &lu->work, delay))
>  		sbp2_target_put(lu->tgt);
>  }
>  
> @@ -1656,17 +1654,12 @@ MODULE_ALIAS("sbp2");
>  
>  static int __init sbp2_init(void)
>  {
> -	sbp2_wq = create_singlethread_workqueue(KBUILD_MODNAME);
> -	if (!sbp2_wq)
> -		return -ENOMEM;
> -
>  	return driver_register(&sbp2_driver.driver);
>  }
>  
>  static void __exit sbp2_cleanup(void)
>  {
>  	driver_unregister(&sbp2_driver.driver);
> -	destroy_workqueue(sbp2_wq);

Hmmm... from glancing the code, there doesn't seem to anything which
can guarantee sbp2_release_target/reconnect() are finished before
sbp2_cleanup() returns, so the code section might go away with code
still running.  It seems like the right thing to do here would be
using alloc_workqueue(KBUILD_MODNAME, WQ_NON_REENTRANT, 0).  Am I
missing something?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: core: use non-reentrant workqueue where necessary
  2010-10-11 13:29 ` [PATCH] firewire: core: use non-reentrant workqueue where necessary Tejun Heo
@ 2010-10-11 19:05   ` Stefan Richter
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2010-10-11 19:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux1394-devel, linux-kernel

Tejun Heo wrote:
> For the workqueue part,
> 
> Acked-by: Tejun Heo <tj@kernel.org>

Thanks for looking at it.  (And for pointing out the reentrance issue with the
original system workqueue earlier during the cmwq postings.)
-- 
Stefan Richter
-=====-==-=- =-=- -=-==
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-11 13:43   ` Tejun Heo
@ 2010-10-11 21:27     ` Stefan Richter
  2010-10-11 21:39       ` Stefan Richter
  2010-10-12 13:50       ` Tejun Heo
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Richter @ 2010-10-11 21:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux1394-devel, linux-kernel

Tejun Heo wrote:
> On 10/10/2010 04:57 PM, Stefan Richter wrote:
> ...
>> Simply swap out the driver workqueue by system_nrt_wq.  It provides all
>> the parallelism we will ever need, accepts long-running work, and
>> provides the here necessary guarantee of non-reentrance across all CPUs.
>                ^^^^
>                this probably should go away?
> 
>> (The work items are scheduled from firewire-core's fw_device work items
>> which themselves may change CPUs.)

Yep.

[...]
>> --- a/drivers/firewire/sbp2.c
>> +++ b/drivers/firewire/sbp2.c
>> @@ -835,8 +835,6 @@ static void sbp2_target_put(struct sbp2_
>>  	kref_put(&tgt->kref, sbp2_release_target);
>>  }
>>  
>> -static struct workqueue_struct *sbp2_wq;
>> -
>>  /*
>>   * Always get the target's kref when scheduling work on one its units.
>>   * Each workqueue job is responsible to call sbp2_target_put() upon return.
>> @@ -844,7 +842,7 @@ static struct workqueue_struct *sbp2_wq;
>>  static void sbp2_queue_work(struct sbp2_logical_unit *lu, unsigned long delay)
>>  {
>>  	sbp2_target_get(lu->tgt);
>> -	if (!queue_delayed_work(sbp2_wq, &lu->work, delay))
>> +	if (!queue_delayed_work(system_nrt_wq, &lu->work, delay))
>>  		sbp2_target_put(lu->tgt);
>>  }
>>  
>> @@ -1656,17 +1654,12 @@ MODULE_ALIAS("sbp2");
>>  
>>  static int __init sbp2_init(void)
>>  {
>> -	sbp2_wq = create_singlethread_workqueue(KBUILD_MODNAME);
>> -	if (!sbp2_wq)
>> -		return -ENOMEM;
>> -
>>  	return driver_register(&sbp2_driver.driver);
>>  }
>>  
>>  static void __exit sbp2_cleanup(void)
>>  {
>>  	driver_unregister(&sbp2_driver.driver);
>> -	destroy_workqueue(sbp2_wq);
> 
> Hmmm... from glancing the code, there doesn't seem to anything which
> can guarantee sbp2_release_target/reconnect() are finished before
> sbp2_cleanup() returns, so the code section might go away with code
> still running.  It seems like the right thing to do here would be
> using alloc_workqueue(KBUILD_MODNAME, WQ_NON_REENTRANT, 0).  Am I
> missing something?

There are indeed situations where the last module reference was already
put down before the work is run for the last time.  Thanks for the hint.

What is preferable, an own workqueue instance whose destroy_workqueue()
lets sbp2_cleanup wait for unfinished work, or module ref-counting like
below?

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

[PATCH] firewire: sbp2: hold a reference for workqueue jobs

which can be run after sbp2_release.  While SCSI core usually still
holds a reference to the firewire-sbp2 module at that point, it might
not do so anymore in some special cases where the scsi_device was
dropped earlier.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/firewire/sbp2.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: b/drivers/firewire/sbp2.c
===================================================================
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -821,8 +821,9 @@ static void sbp2_release_target(struct k
 	fw_notify("released %s, target %d:0:0\n", tgt->bus_id, shost->host_no);
 
 	fw_unit_put(tgt->unit);
-	scsi_host_put(shost);
 	fw_device_put(device);
+	scsi_host_put(shost);
+	module_put(THIS_MODULE);
 }
 
 static void sbp2_target_get(struct sbp2_target *tgt)
@@ -1139,13 +1140,17 @@ static int sbp2_probe(struct device *dev
 	struct Scsi_Host *shost;
 	u32 model, firmware_revision;
 
+	/* Take a reference for late workqueue jobs. */
+	if (!try_module_get(THIS_MODULE))
+		return -ECANCELED;
+
 	if (dma_get_max_seg_size(device->card->device) > SBP2_MAX_SEG_SIZE)
 		BUG_ON(dma_set_max_seg_size(device->card->device,
 					    SBP2_MAX_SEG_SIZE));
 
 	shost = scsi_host_alloc(&scsi_driver_template, sizeof(*tgt));
 	if (shost == NULL)
-		return -ENOMEM;
+		goto fail;
 
 	tgt = (struct sbp2_target *)shost->hostdata;
 	dev_set_drvdata(&unit->device, tgt);
@@ -1200,6 +1205,8 @@ static int sbp2_probe(struct device *dev
 
  fail_shost_put:
 	scsi_host_put(shost);
+ fail:
+	module_put(THIS_MODULE);
 	return -ENOMEM;
 }
 


-- 
Stefan Richter
-=====-==-=- =-=- -=-==
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-11 21:27     ` Stefan Richter
@ 2010-10-11 21:39       ` Stefan Richter
  2010-10-12 13:55         ` Tejun Heo
  2010-10-12 13:50       ` Tejun Heo
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-10-11 21:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux1394-devel, linux-kernel

Stefan Richter wrote:
> There are indeed situations where the last module reference was already
> put down before the work is run for the last time.  Thanks for the hint.
> 
> What is preferable, an own workqueue instance whose destroy_workqueue()
> lets sbp2_cleanup wait for unfinished work, or module ref-counting like
> below?

Anoher thing:  Somebody might have swap space on a FireWire disk.  Yet
the system workqueues that firewire-core and (perhaps) firewire-sbp2 are
using are created without WQ_RESCUER.

Does this --- fringe use case as it might be --- call for private
workqueues in firewire-core (for fw_device.work) and firewire-sbp2 (for
sbp2_logical_unit.work)?

Besides the less interesting cases of device discovery and shutdown,
both of these works are also involved in SBP reconnect which needs to be
performed at each FireWire bus reset (which can happen anytime for a
variety of reasons).
-- 
Stefan Richter
-=====-==-=- =-=- -=-==
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-11 21:27     ` Stefan Richter
  2010-10-11 21:39       ` Stefan Richter
@ 2010-10-12 13:50       ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello,

On 10/11/2010 11:27 PM, Stefan Richter wrote:
>> Hmmm... from glancing the code, there doesn't seem to anything which
>> can guarantee sbp2_release_target/reconnect() are finished before
>> sbp2_cleanup() returns, so the code section might go away with code
>> still running.  It seems like the right thing to do here would be
>> using alloc_workqueue(KBUILD_MODNAME, WQ_NON_REENTRANT, 0).  Am I
>> missing something?
> 
> There are indeed situations where the last module reference was already
> put down before the work is run for the last time.  Thanks for the hint.
> 
> What is preferable, an own workqueue instance whose destroy_workqueue()
> lets sbp2_cleanup wait for unfinished work, or module ref-counting like
> below?

The best would be flushing the specific works but that isn't possible
when works are used to do the final put.  I would go for using a
separate workqueue.  With cmwq, each workqueue is much cheaper than
before anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-11 21:39       ` Stefan Richter
@ 2010-10-12 13:55         ` Tejun Heo
  2010-10-12 16:15           ` Stefan Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2010-10-12 13:55 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello,

On 10/11/2010 11:39 PM, Stefan Richter wrote:
> Stefan Richter wrote:
>> There are indeed situations where the last module reference was already
>> put down before the work is run for the last time.  Thanks for the hint.
>>
>> What is preferable, an own workqueue instance whose destroy_workqueue()
>> lets sbp2_cleanup wait for unfinished work, or module ref-counting like
>> below?
> 
> Anoher thing:  Somebody might have swap space on a FireWire disk.

It doesn't have to be swap.  Having a rw filesystem mounted is enough
to be in the memory reclaim path.

> Yet the system workqueues that firewire-core and (perhaps)
> firewire-sbp2 are using are created without WQ_RESCUER.
> 
> Does this --- fringe use case as it might be --- call for private
> workqueues in firewire-core (for fw_device.work) and firewire-sbp2 (for
> sbp2_logical_unit.work)?
> 
> Besides the less interesting cases of device discovery and shutdown,
> both of these works are also involved in SBP reconnect which needs to be
> performed at each FireWire bus reset (which can happen anytime for a
> variety of reasons).

Yes, it would definitely be better to put them in a workqueue with
WQ_RESCUER (or the new WQ_MEM_RECLAIM).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-12 13:55         ` Tejun Heo
@ 2010-10-12 16:15           ` Stefan Richter
  2010-10-12 16:46             ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-10-12 16:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux1394-devel, linux-kernel

Tejun Hejo wrote:
> It doesn't have to be swap.  Having a rw filesystem mounted is enough
> to be in the memory reclaim path.

Oh, alright.

...
> Yes, it would definitely be better to put them in a workqueue with
> WQ_RESCUER (or the new WQ_MEM_RECLAIM).

A pro pos WQ_MEM_RECLAIM, is wq.git#for-next stable history which I can pull
into a topic branch at linux1394-2.6.git?  (Or I wait with this stuff until
after .37-rc1.)
-- 
Stefan Richter
-=====-==-=- =-=- -==--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-12 16:15           ` Stefan Richter
@ 2010-10-12 16:46             ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2010-10-12 16:46 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello,

On 10/12/2010 06:15 PM, Stefan Richter wrote:
> Tejun Hejo wrote:
>> It doesn't have to be swap.  Having a rw filesystem mounted is enough
>> to be in the memory reclaim path.
> 
> Oh, alright.
> 
> ...
>> Yes, it would definitely be better to put them in a workqueue with
>> WQ_RESCUER (or the new WQ_MEM_RECLAIM).
> 
> A pro pos WQ_MEM_RECLAIM, is wq.git#for-next stable history which I can pull
> into a topic branch at linux1394-2.6.git?  (Or I wait with this stuff until
> after .37-rc1.)

For now using WQ_RESCUER should do.  We can update it to
WQ_MEM_RECLAIM after the trees are merged (both flags have the same
meaning now so nothing would break).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH update] firewire: core: use non-reentrant workqueue with rescuer
  2010-10-10 14:55 [PATCH] firewire: core: use non-reentrant workqueue where necessary Stefan Richter
  2010-10-10 14:57 ` [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown Stefan Richter
  2010-10-11 13:29 ` [PATCH] firewire: core: use non-reentrant workqueue where necessary Tejun Heo
@ 2010-10-12 21:29 ` Stefan Richter
  2010-10-13  9:47   ` Tejun Heo
  2 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-10-12 21:29 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Tejun Heo

firewire-core manages the following types of work items:

fw_card.br_work:
  - resets the bus on a card and possibly sends a PHY packet before that
  - does not sleep for long or not at all
  - is scheduled via fw_schedule_bus_reset() by
      - firewire-ohci's pci_probe method
      - firewire-ohci's set_config_rom method, called by kernelspace
        protocol drivers and userspace drivers which add/remove
	Configuration ROM descriptors
      - userspace drivers which use the bus reset ioctl
      - itself if the last reset happened less than 2 seconds ago

fw_card.bm_work:
  - performs bus management duties
  - usually does not (but may in corner cases) sleep for long
  - is scheduled via fw_schedule_bm_work() by
      - firewire-ohci's self-ID-complete IRQ handler tasklet
      - firewire-core's fw_device.work instances whenever the root node
        device was (successfully or unsuccessfully) discovered,
	refreshed, or rediscovered
      - itself in case of resource allocation failures or in order to
        obey the 125ms bus manager arbitration interval

fw_device.work:
  - performs node probe, update, shutdown, revival, removal; including
    kernel driver probe, update, shutdown and bus reset notification to
    userspace drivers
  - usually sleeps moderately long, in corner cases very long
  - is scheduled by
      - firewire-ohci's self-ID-complete IRQ handler tasklet via the
        core's fw_node_event
      - firewire-ohci's pci_remove method via core's fw_destroy_nodes/
        fw_node_event
      - itself during retries, e.g. while a node is powering up

iso_resource.work:
  - accesses registers at the Isochronous Resource Manager node
  - usually does not (but may in corner cases) sleep for long
  - is scheduled via schedule_iso_resource() by
      - the owning userspace driver at addition and removal of the
        resource
      - firewire-core's fw_device.work instances after bus reset
      - itself in case of resource allocation if necessary to obey the
        1000ms reallocation period after bus reset

fw_card.br_work instances should not, and instances of the others must
not, be executed in parallel by multiple CPUs -- but were not protected
against that.  Hence allocated a non-reentrant workqueue for them.

fw_device.work may be used in the memory reclaim path in case of SBP-2
device updates.  Hence we need a workqueue with rescuer and cannot use
system_nrt_wq.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/firewire/core-card.c        |    6 +++---
 drivers/firewire/core-cdev.c        |    2 +-
 drivers/firewire/core-device.c      |   30 ++++++++++++++++++----------
 drivers/firewire/core-transaction.c |   12 ++++++++++-
 drivers/firewire/core.h             |    2 ++
 5 files changed, 36 insertions(+), 16 deletions(-)

Index: b/drivers/firewire/core-card.c
===================================================================
--- a/drivers/firewire/core-card.c
+++ b/drivers/firewire/core-card.c
@@ -219,8 +219,8 @@ void fw_schedule_bus_reset(struct fw_car
 
 	/* Use an arbitrary short delay to combine multiple reset requests. */
 	fw_card_get(card);
-	if (!schedule_delayed_work(&card->br_work,
-				   delayed ? DIV_ROUND_UP(HZ, 100) : 0))
+	if (!queue_delayed_work(fw_wq, &card->br_work,
+				delayed ? DIV_ROUND_UP(HZ, 100) : 0))
 		fw_card_put(card);
 }
 EXPORT_SYMBOL(fw_schedule_bus_reset);
@@ -232,7 +232,7 @@ static void br_work(struct work_struct *
 	/* Delay for 2s after last reset per IEEE 1394 clause 8.2.1. */
 	if (card->reset_jiffies != 0 &&
 	    time_is_after_jiffies(card->reset_jiffies + 2 * HZ)) {
-		if (!schedule_delayed_work(&card->br_work, 2 * HZ))
+		if (!queue_delayed_work(fw_wq, &card->br_work, 2 * HZ))
 			fw_card_put(card);
 		return;
 	}
Index: b/drivers/firewire/core-cdev.c
===================================================================
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -149,7 +149,7 @@ static void release_iso_resource(struct 
 static void schedule_iso_resource(struct iso_resource *r, unsigned long delay)
 {
 	client_get(r->client);
-	if (!schedule_delayed_work(&r->work, delay))
+	if (!queue_delayed_work(fw_wq, &r->work, delay))
 		client_put(r->client);
 }
 
Index: b/drivers/firewire/core-device.c
===================================================================
--- a/drivers/firewire/core-device.c
+++ b/drivers/firewire/core-device.c
@@ -725,6 +725,14 @@ struct fw_device *fw_device_get_by_devt(
 	return device;
 }
 
+struct workqueue_struct *fw_wq;
+
+static void fw_schedule_device_work(struct fw_device *device,
+				    unsigned long delay)
+{
+	queue_delayed_work(fw_wq, &device->work, delay);
+}
+
 /*
  * These defines control the retry behavior for reading the config
  * rom.  It shouldn't be necessary to tweak these; if the device
@@ -749,7 +757,7 @@ static void fw_device_shutdown(struct wo
 
 	if (time_is_after_jiffies(device->card->reset_jiffies + SHUTDOWN_DELAY)
 	    && !list_empty(&device->card->link)) {
-		schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+		fw_schedule_device_work(device, SHUTDOWN_DELAY);
 		return;
 	}
 
@@ -861,7 +869,7 @@ static int lookup_existing_device(struct
 		fw_notify("rediscovered device %s\n", dev_name(dev));
 
 		PREPARE_DELAYED_WORK(&old->work, fw_device_update);
-		schedule_delayed_work(&old->work, 0);
+		fw_schedule_device_work(old, 0);
 
 		if (current_node == card->root_node)
 			fw_schedule_bm_work(card, 0);
@@ -952,7 +960,7 @@ static void fw_device_init(struct work_s
 		if (device->config_rom_retries < MAX_RETRIES &&
 		    atomic_read(&device->state) == FW_DEVICE_INITIALIZING) {
 			device->config_rom_retries++;
-			schedule_delayed_work(&device->work, RETRY_DELAY);
+			fw_schedule_device_work(device, RETRY_DELAY);
 		} else {
 			fw_notify("giving up on config rom for node id %x\n",
 				  device->node_id);
@@ -1017,7 +1025,7 @@ static void fw_device_init(struct work_s
 			   FW_DEVICE_INITIALIZING,
 			   FW_DEVICE_RUNNING) == FW_DEVICE_GONE) {
 		PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
-		schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+		fw_schedule_device_work(device, SHUTDOWN_DELAY);
 	} else {
 		if (device->config_rom_retries)
 			fw_notify("created device %s: GUID %08x%08x, S%d00, "
@@ -1096,7 +1104,7 @@ static void fw_device_refresh(struct wor
 		if (device->config_rom_retries < MAX_RETRIES / 2 &&
 		    atomic_read(&device->state) == FW_DEVICE_INITIALIZING) {
 			device->config_rom_retries++;
-			schedule_delayed_work(&device->work, RETRY_DELAY / 2);
+			fw_schedule_device_work(device, RETRY_DELAY / 2);
 
 			return;
 		}
@@ -1129,7 +1137,7 @@ static void fw_device_refresh(struct wor
 		if (device->config_rom_retries < MAX_RETRIES &&
 		    atomic_read(&device->state) == FW_DEVICE_INITIALIZING) {
 			device->config_rom_retries++;
-			schedule_delayed_work(&device->work, RETRY_DELAY);
+			fw_schedule_device_work(device, RETRY_DELAY);
 
 			return;
 		}
@@ -1156,7 +1164,7 @@ static void fw_device_refresh(struct wor
  gone:
 	atomic_set(&device->state, FW_DEVICE_GONE);
 	PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
-	schedule_delayed_work(&device->work, SHUTDOWN_DELAY);
+	fw_schedule_device_work(device, SHUTDOWN_DELAY);
  out:
 	if (node_id == card->root_node->node_id)
 		fw_schedule_bm_work(card, 0);
@@ -1209,7 +1217,7 @@ void fw_node_event(struct fw_card *card,
 		 * first config rom scan half a second after bus reset.
 		 */
 		INIT_DELAYED_WORK(&device->work, fw_device_init);
-		schedule_delayed_work(&device->work, INITIAL_DELAY);
+		fw_schedule_device_work(device, INITIAL_DELAY);
 		break;
 
 	case FW_NODE_INITIATED_RESET:
@@ -1224,7 +1232,7 @@ void fw_node_event(struct fw_card *card,
 			    FW_DEVICE_RUNNING,
 			    FW_DEVICE_INITIALIZING) == FW_DEVICE_RUNNING) {
 			PREPARE_DELAYED_WORK(&device->work, fw_device_refresh);
-			schedule_delayed_work(&device->work,
+			fw_schedule_device_work(device,
 				device->is_local ? 0 : INITIAL_DELAY);
 		}
 		break;
@@ -1239,7 +1247,7 @@ void fw_node_event(struct fw_card *card,
 		device->generation = card->generation;
 		if (atomic_read(&device->state) == FW_DEVICE_RUNNING) {
 			PREPARE_DELAYED_WORK(&device->work, fw_device_update);
-			schedule_delayed_work(&device->work, 0);
+			fw_schedule_device_work(device, 0);
 		}
 		break;
 
@@ -1264,7 +1272,7 @@ void fw_node_event(struct fw_card *card,
 		if (atomic_xchg(&device->state,
 				FW_DEVICE_GONE) == FW_DEVICE_RUNNING) {
 			PREPARE_DELAYED_WORK(&device->work, fw_device_shutdown);
-			schedule_delayed_work(&device->work,
+			fw_schedule_device_work(device,
 				list_empty(&card->link) ? 0 : SHUTDOWN_DELAY);
 		}
 		break;
Index: b/drivers/firewire/core-transaction.c
===================================================================
--- a/drivers/firewire/core-transaction.c
+++ b/drivers/firewire/core-transaction.c
@@ -36,6 +36,7 @@
 #include <linux/string.h>
 #include <linux/timer.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include <asm/byteorder.h>
 
@@ -1192,13 +1193,21 @@ static int __init fw_core_init(void)
 {
 	int ret;
 
+	fw_wq = alloc_workqueue(KBUILD_MODNAME,
+				WQ_NON_REENTRANT | WQ_RESCUER, 0);
+	if (!fw_wq)
+		return -ENOMEM;
+
 	ret = bus_register(&fw_bus_type);
-	if (ret < 0)
+	if (ret < 0) {
+		destroy_workqueue(fw_wq);
 		return ret;
+	}
 
 	fw_cdev_major = register_chrdev(0, "firewire", &fw_device_ops);
 	if (fw_cdev_major < 0) {
 		bus_unregister(&fw_bus_type);
+		destroy_workqueue(fw_wq);
 		return fw_cdev_major;
 	}
 
@@ -1214,6 +1223,7 @@ static void __exit fw_core_cleanup(void)
 {
 	unregister_chrdev(fw_cdev_major, "firewire");
 	bus_unregister(&fw_bus_type);
+	destroy_workqueue(fw_wq);
 	idr_destroy(&fw_device_idr);
 }
 
Index: b/drivers/firewire/core.h
===================================================================
--- a/drivers/firewire/core.h
+++ b/drivers/firewire/core.h
@@ -138,6 +138,8 @@ void fw_cdev_handle_phy_packet(struct fw
 extern struct rw_semaphore fw_device_rwsem;
 extern struct idr fw_device_idr;
 extern int fw_cdev_major;
+struct workqueue_struct;
+extern struct workqueue_struct *fw_wq;
 
 struct fw_device *fw_device_get_by_devt(dev_t devt);
 int fw_device_set_broadcast_channel(struct device *dev, void *gen);

-- 
Stefan Richter
-=====-==-=- =-=- -==--
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH unfinished update] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-10 14:57 ` [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown Stefan Richter
  2010-10-11 13:43   ` Tejun Heo
@ 2010-10-12 21:39   ` Stefan Richter
  2010-10-12 22:25     ` Stefan Richter
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-10-12 21:39 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Tejun Heo

So I changed the patch to the simple one below, but suddenly my
INIC-2430 device has trouble with inquiry in parallel to its two LUNs:

Oct 12 23:23:50 stein kernel: scsi22 : SBP-2 IEEE-1394
Oct 12 23:23:50 stein kernel: firewire_sbp2: Workarounds for fw7.0: 0x2 (firmware_revision 0x000241, model_id 0x000000)
Oct 12 23:23:50 stein kernel: scsi23 : SBP-2 IEEE-1394
Oct 12 23:23:50 stein kernel: firewire_sbp2: Workarounds for fw7.1: 0x2 (firmware_revision 0x000241, model_id 0x000000)
Oct 12 23:23:50 stein kernel: firewire_core: created device fw7: GUID 00027a0e440020c2, S800
Oct 12 23:23:50 stein kernel: firewire_sbp2: fw7.0: logged in to LUN 0000 (0 retries)
Oct 12 23:23:50 stein kernel: firewire_sbp2: fw7.1: logged in to LUN 0001 (0 retries)
Oct 12 23:23:52 stein kernel: scsi 23:0:0:1: CD-ROM            TEAC     DV-516GC         XT11 PQ: 0 ANSI: 0
Oct 12 23:23:52 stein kernel: sr1: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
Oct 12 23:23:52 stein kernel: sr 23:0:0:1: Attached scsi CD-ROM sr1
Oct 12 23:23:52 stein kernel: sr 23:0:0:1: Attached scsi generic sg6 type 5
Oct 12 23:23:52 stein kernel: firewire_sbp2: fw7.0: orb reply timed out, rcode=0x11

When I then force a bus reset by hardware or software, the device comes
to its senses:

Oct 12 23:24:38 stein kernel: firewire_sbp2: fw7.0: logged in to LUN 0000 (0 retries)
Oct 12 23:24:38 stein kernel: firewire_sbp2: fw7.1: reconnected to LUN 0001 (0 retries)
Oct 12 23:24:38 stein kernel: scsi 22:0:0:0: CD-ROM            TEAC     DV-516GC         XT11 PQ: 0 ANSI: 0
Oct 12 23:24:38 stein kernel: sr2: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
Oct 12 23:24:38 stein kernel: sr 22:0:0:0: Attached scsi CD-ROM sr2
Oct 12 23:24:38 stein kernel: sr 22:0:0:0: Attached scsi generic sg7 type 5

The initial Inquiry(?) timeout happens not always but quite often.  It
did not happen when I tested the earlier version of the patch in which I
used system_nrt_wq.

Maybe I need ordered per-target workqueues.  Not Signed-off-by me yet.

Tejun, can a workqueue job call destroy_workqueue on its own queue?
---
 drivers/firewire/sbp2.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: b/drivers/firewire/sbp2.c
===================================================================
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1656,7 +1656,8 @@ MODULE_ALIAS("sbp2");
 
 static int __init sbp2_init(void)
 {
-	sbp2_wq = create_singlethread_workqueue(KBUILD_MODNAME);
+	sbp2_wq = alloc_workqueue(KBUILD_MODNAME,
+				  WQ_NON_REENTRANT | WQ_RESCUER, 0);
 	if (!sbp2_wq)
 		return -ENOMEM;
 

-- 
Stefan Richter
-=====-==-=- =-=- -==--
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH unfinished update] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-12 21:39   ` [PATCH unfinished update] " Stefan Richter
@ 2010-10-12 22:25     ` Stefan Richter
  2010-10-12 23:09       ` Stefan Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-10-12 22:25 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Tejun Heo

> So I changed the patch to the simple one below, but suddenly my
> INIC-2430 device has trouble with inquiry in parallel to its two LUNs:

The following modification fixes this mostly, i.e. always when the first
Inquiry attempt to one LUN fails but succeeds on the other.  But I had
one occasion just now at which both failed; then that reschedule does
simply lead into the same situation until retry limit.

> Maybe I need ordered per-target workqueues.

No, I actually need per-node ordering.  This device, like my other one
and apparently most ones on the market actually keep their SCSI logical
units in separate FireWire units.  Currently firewire-sbp2 is not aware
that fw_unit instances reside on the same node.

This is going to be tougher than I hoped.  Maybe I need a driver-global
Inquiry mutex.  Or move the __scsi_add_device() out of sbp2_login() into
another work which is scheduled to a "single-threaded" workqueue.

--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -982,6 +982,8 @@ static void sbp2_login(struct work_struc
 	 * lu->work already.  Reset the work from reconnect to login.
 	 */
 	PREPARE_DELAYED_WORK(&lu->work, sbp2_login);
+	if (lu->retries++ < 5)
+		sbp2_queue_work(lu, DIV_ROUND_UP(HZ, 5));
  out:
 	sbp2_target_put(tgt);
 }
@@ -1656,7 +1658,8 @@ MODULE_ALIAS("sbp2");

 static int __init sbp2_init(void)
 {
-	sbp2_wq = create_singlethread_workqueue(KBUILD_MODNAME);
+	sbp2_wq = alloc_workqueue(KBUILD_MODNAME,
+				  WQ_NON_REENTRANT | WQ_RESCUER, 0);
 	if (!sbp2_wq)
 		return -ENOMEM;

-- 
Stefan Richter
-=====-==-=- =-=- -==-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH unfinished update] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-12 22:25     ` Stefan Richter
@ 2010-10-12 23:09       ` Stefan Richter
  2010-10-13  9:45         ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2010-10-12 23:09 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, Tejun Heo

Stefan Richter wrote:
>> Maybe I need ordered per-target workqueues.
> 
> No, I actually need per-node ordering.  This device, like my other one
> and apparently most ones on the market actually keep their SCSI logical
> units in separate FireWire units.  Currently firewire-sbp2 is not aware
> that fw_unit instances reside on the same node.
> 
> This is going to be tougher than I hoped.  Maybe I need a driver-global
> Inquiry mutex.  Or move the __scsi_add_device() out of sbp2_login() into
> another work which is scheduled to a "single-threaded" workqueue.

More extensive change but probably with simpler end result:  Let sbp2_probe
and sbp2_update perform login/scsi_add_device/reconnect directly.  sbp2_probe
and sbp2_update are serialized per node, but (since 2.6.36 with cmwq)
parallelized across nodes.
-- 
Stefan Richter
-=====-==-=- =-=- -==-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH unfinished update] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown
  2010-10-12 23:09       ` Stefan Richter
@ 2010-10-13  9:45         ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2010-10-13  9:45 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello,

On 10/13/2010 01:09 AM, Stefan Richter wrote:
> More extensive change but probably with simpler end result: Let
> sbp2_probe and sbp2_update perform login/scsi_add_device/reconnect
> directly.  sbp2_probe and sbp2_update are serialized per node, but
> (since 2.6.36 with cmwq) parallelized across nodes.

I don't know much about firewire details but just in case it helps.
libata does the ATA part of probing in parallel using the async
mechanism (which is a wrapper around workqueue now) and executes the
SCSI probing in a serialized context.  If SCSI parallel probing is
causing the problem, maybe firewire can do similar thing?  The
difference between using async and workqueue directly is only in how
things can be ordered.  There are cases where parallelizing probe is
easier with async but one way or the other it shouldn't matter too
much.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH update] firewire: core: use non-reentrant workqueue with rescuer
  2010-10-12 21:29 ` [PATCH update] firewire: core: use non-reentrant workqueue with rescuer Stefan Richter
@ 2010-10-13  9:47   ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2010-10-13  9:47 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, linux-kernel

Hello,

On 10/12/2010 11:29 PM, Stefan Richter wrote:
...
> fw_card.br_work instances should not, and instances of the others must
> not, be executed in parallel by multiple CPUs -- but were not protected
> against that.  Hence allocated a non-reentrant workqueue for them.
> 
> fw_device.work may be used in the memory reclaim path in case of SBP-2
> device updates.  Hence we need a workqueue with rescuer and cannot use
> system_nrt_wq.
> 
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

Looks good to me.

Reviewed-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-10-13  9:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-10 14:55 [PATCH] firewire: core: use non-reentrant workqueue where necessary Stefan Richter
2010-10-10 14:57 ` [PATCH] firewire: sbp2: parallelize login/inquiry, reconnect, and shutdown Stefan Richter
2010-10-11 13:43   ` Tejun Heo
2010-10-11 21:27     ` Stefan Richter
2010-10-11 21:39       ` Stefan Richter
2010-10-12 13:55         ` Tejun Heo
2010-10-12 16:15           ` Stefan Richter
2010-10-12 16:46             ` Tejun Heo
2010-10-12 13:50       ` Tejun Heo
2010-10-12 21:39   ` [PATCH unfinished update] " Stefan Richter
2010-10-12 22:25     ` Stefan Richter
2010-10-12 23:09       ` Stefan Richter
2010-10-13  9:45         ` Tejun Heo
2010-10-11 13:29 ` [PATCH] firewire: core: use non-reentrant workqueue where necessary Tejun Heo
2010-10-11 19:05   ` Stefan Richter
2010-10-12 21:29 ` [PATCH update] firewire: core: use non-reentrant workqueue with rescuer Stefan Richter
2010-10-13  9:47   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox