[PATCH] serialize bus scanning

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] serialize bus scanning
@ 2003-08-25 12:24 Christoph Hellwig
  2003-08-27 20:29 ` Anton Blanchard
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2003-08-25 12:24 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

Synchronize all scanning activity, this fixes long-standing races
vs /proc/scsi/scsi and sysfs addition and deletion of devices.

Note that this does not serialize removing, the lists will get
their own locking soon.


diff -Nru a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
--- a/drivers/scsi/scsi_scan.c	Mon Aug 25 13:37:29 2003
+++ b/drivers/scsi/scsi_scan.c	Mon Aug 25 13:37:29 2003
@@ -30,6 +30,7 @@
 #include <linux/moduleparam.h>
 #include <linux/init.h>
 #include <linux/blkdev.h>
+#include <asm/semaphore.h>
 
 #include "scsi.h"
 #include "hosts.h"
@@ -94,6 +95,13 @@
 		 " between 1 and 16384)");
 #endif
 
+/*
+ * This mutex serializes all scsi scanning activity from kernel- and
+ * userspace.  It could easily be made per-host but I'd like to avoid
+ * the overhead for now.
+ */
+static DECLARE_MUTEX(scsi_scan_mutex);
+
 /**
  * scsi_unlock_floptical - unlock device via a special MODE SENSE command
  * @sreq:	used to send the command
@@ -1067,9 +1075,12 @@
 	struct scsi_device *sdev;
 	int res;
 
+	down(&scsi_scan_mutex);
 	res = scsi_probe_and_add_lun(shost, channel, id, lun, NULL, &sdev, 1);
 	if (res != SCSI_SCAN_LUN_PRESENT)
 		sdev = ERR_PTR(-ENODEV);
+	up(&scsi_scan_mutex);
+
 	return sdev;
 }
 
@@ -1191,11 +1202,14 @@
 	    ((lun != SCAN_WILD_CARD) && (lun > shost->max_lun)))
 		return -EINVAL;
 
+	down(&scsi_scan_mutex);
 	if (channel == SCAN_WILD_CARD) 
 		for (channel = 0; channel <= shost->max_channel; channel++)
 			scsi_scan_channel(shost, channel, id, lun, rescan);
 	else
 		scsi_scan_channel(shost, channel, id, lun, rescan);
+	up(&scsi_scan_mutex);
+
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-08-25 12:24 [PATCH] serialize bus scanning Christoph Hellwig
@ 2003-08-27 20:29 ` Anton Blanchard
  2003-08-27 20:35   ` Christoph Hellwig
  2003-08-27 22:32   ` Mike Anderson
  0 siblings, 2 replies; 9+ messages in thread
From: Anton Blanchard @ 2003-08-27 20:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James Bottomley, linux-scsi

Hi,

> Synchronize all scanning activity, this fixes long-standing races
> vs /proc/scsi/scsi and sysfs addition and deletion of devices.
> 
> Note that this does not serialize removing, the lists will get
> their own locking soon.

Ive started playing with parallel probe in the lab (on some large ppc64
boxes). I know we have a lot of work to do before this will all work,
but would it be possible to make this semaphore per host?

Even with a per host semaphore, it annoys me how long it takes to spin
up 7 disks especially if a few of them are dead. Longer term are we
planning to overlap spinup of disks on the same host?

(We have a property in our device tree that says how quickly you can
spin disks up on the same host, eg 1sec apart. We should also be able to
do this in parallel on all hosts. We could easily read this property in
a small initramfs program and then issue probe commands via /sys)

Anton

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-08-27 20:29 ` Anton Blanchard
@ 2003-08-27 20:35   ` Christoph Hellwig
  2003-09-10  7:57     ` Anton Blanchard
  2003-08-27 22:32   ` Mike Anderson
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2003-08-27 20:35 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: James Bottomley, linux-scsi

On Thu, Aug 28, 2003 at 06:29:08AM +1000, Anton Blanchard wrote:
> 
> Hi,
> 
> > Synchronize all scanning activity, this fixes long-standing races
> > vs /proc/scsi/scsi and sysfs addition and deletion of devices.
> > 
> > Note that this does not serialize removing, the lists will get
> > their own locking soon.
> 
> Ive started playing with parallel probe in the lab (on some large ppc64
> boxes). I know we have a lot of work to do before this will all work,
> but would it be possible to make this semaphore per host?

you can just add a per-host mutex and lock it in the same places as
indicated by the comment in the patch.  

> Even with a per host semaphore, it annoys me how long it takes to spin
> up 7 disks especially if a few of them are dead. Longer term are we
> planning to overlap spinup of disks on the same host?

I have no plans to do that but if someone comes up with a proper
patch that would be nice.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-08-27 20:29 ` Anton Blanchard
  2003-08-27 20:35   ` Christoph Hellwig
@ 2003-08-27 22:32   ` Mike Anderson
  2003-08-27 22:59     ` Anton Blanchard
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Anderson @ 2003-08-27 22:32 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Christoph Hellwig, James Bottomley, linux-scsi

Anton Blanchard [anton@samba.org] wrote:
> Ive started playing with parallel probe in the lab (on some large ppc64
> boxes). I know we have a lot of work to do before this will all work,
> but would it be possible to make this semaphore per host?
> 
> Even with a per host semaphore, it annoys me how long it takes to spin
> up 7 disks especially if a few of them are dead. Longer term are we
> planning to overlap spinup of disks on the same host?
> 
> (We have a property in our device tree that says how quickly you can
> spin disks up on the same host, eg 1sec apart. We should also be able to
> do this in parallel on all hosts. We could easily read this property in
> a small initramfs program and then issue probe commands via /sys)
> 

Since I have not seen your tree I am speaking with lack of knowledge, but
it seems odd that your device tree would contain spinup knowledge per
host. The power issue you are trying to avoid is per disk enclosure
/ cabinet. A per host value seems like it could still lead to a power
issue given a large enough configuration. 

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-08-27 22:32   ` Mike Anderson
@ 2003-08-27 22:59     ` Anton Blanchard
  0 siblings, 0 replies; 9+ messages in thread
From: Anton Blanchard @ 2003-08-27 22:59 UTC (permalink / raw)
  To: Christoph Hellwig, James Bottomley, linux-scsi

> Since I have not seen your tree I am speaking with lack of knowledge, but
> it seems odd that your device tree would contain spinup knowledge per
> host. The power issue you are trying to avoid is per disk enclosure
> / cabinet. A per host value seems like it could still lead to a power
> issue given a large enough configuration.

Yeah you are right, its currently a global value. The firmware knows
power domains so it could get finer grained but it would be at the host
bridge level.

Anyway all this intelligence to parse this stuff should be in an initramfs 
program talking to sysfs. I was quite pleased with the sysfs rescan
functionality recently, we had someone kick the power on a scsi drawer
and were able to get linux to rescan and spin up the disks again
without rebooting.

Anton

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-08-27 20:35   ` Christoph Hellwig
@ 2003-09-10  7:57     ` Anton Blanchard
  2003-09-11 20:34       ` Patrick Mansfield
  0 siblings, 1 reply; 9+ messages in thread
From: Anton Blanchard @ 2003-09-10  7:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James Bottomley, linux-scsi

 
> you can just add a per-host mutex and lock it in the same places as
> indicated by the comment in the patch.  

How does this look? This restores the ability to do parallel scsi scan
via sysfs.

Anton

--

Change scsi_scan_mutex from global to per host, which is needed for
parallel SCSI probe.


 gr16_work-anton/drivers/scsi/hosts.c     |    2 ++
 gr16_work-anton/drivers/scsi/scsi_scan.c |   15 ++++-----------
 gr16_work-anton/include/scsi/scsi_host.h |    6 ++++++
 3 files changed, 12 insertions(+), 11 deletions(-)

diff -puN drivers/scsi/scsi_scan.c~per_host_scan_mutex drivers/scsi/scsi_scan.c
--- gr16_work/drivers/scsi/scsi_scan.c~per_host_scan_mutex	2003-09-07 00:01:57.000000000 -0500
+++ gr16_work-anton/drivers/scsi/scsi_scan.c	2003-09-07 00:01:57.000000000 -0500
@@ -95,13 +95,6 @@ MODULE_PARM_DESC(max_report_luns,
 		 " between 1 and 16384)");
 #endif
 
-/*
- * This mutex serializes all scsi scanning activity from kernel- and
- * userspace.  It could easily be made per-host but I'd like to avoid
- * the overhead for now.
- */
-static DECLARE_MUTEX(scsi_scan_mutex);
-
 /**
  * scsi_unlock_floptical - unlock device via a special MODE SENSE command
  * @sreq:	used to send the command
@@ -1075,11 +1068,11 @@ struct scsi_device *scsi_add_device(stru
 	struct scsi_device *sdev;
 	int res;
 
-	down(&scsi_scan_mutex);
+	down(&shost->scan_mutex);
 	res = scsi_probe_and_add_lun(shost, channel, id, lun, NULL, &sdev, 1);
 	if (res != SCSI_SCAN_LUN_PRESENT)
 		sdev = ERR_PTR(-ENODEV);
-	up(&scsi_scan_mutex);
+	up(&shost->scan_mutex);
 
 	return sdev;
 }
@@ -1202,13 +1195,13 @@ int scsi_scan_host_selected(struct Scsi_
 	    ((lun != SCAN_WILD_CARD) && (lun > shost->max_lun)))
 		return -EINVAL;
 
-	down(&scsi_scan_mutex);
+	down(&shost->scan_mutex);
 	if (channel == SCAN_WILD_CARD) 
 		for (channel = 0; channel <= shost->max_channel; channel++)
 			scsi_scan_channel(shost, channel, id, lun, rescan);
 	else
 		scsi_scan_channel(shost, channel, id, lun, rescan);
-	up(&scsi_scan_mutex);
+	up(&shost->scan_mutex);
 
 	return 0;
 }
diff -puN include/scsi/scsi_host.h~per_host_scan_mutex include/scsi/scsi_host.h
--- gr16_work/include/scsi/scsi_host.h~per_host_scan_mutex	2003-09-07 00:01:57.000000000 -0500
+++ gr16_work-anton/include/scsi/scsi_host.h	2003-09-10 02:53:34.000000000 -0500
@@ -480,6 +480,12 @@ struct Scsi_Host {
 	struct list_head sht_legacy_list;
 
 	/*
+	 * This mutex serializes all scsi scanning activity from kernel- and
+	 * userspace.
+	 */
+	struct semaphore scan_mutex;
+
+	/*
 	 * We should ensure that this is aligned, both for better performance
 	 * and also because some compilers (m68k) don't automatically force
 	 * alignment to a long boundary.
diff -puN drivers/scsi/hosts.c~per_host_scan_mutex drivers/scsi/hosts.c
--- gr16_work/drivers/scsi/hosts.c~per_host_scan_mutex	2003-09-07 00:01:57.000000000 -0500
+++ gr16_work-anton/drivers/scsi/hosts.c	2003-09-07 00:01:57.000000000 -0500
@@ -214,6 +214,8 @@ struct Scsi_Host *scsi_host_alloc(struct
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
 
+	init_MUTEX(&shost->scan_mutex);
+
 	shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */
 	shost->dma_channel = 0xff;
 

_

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-09-10  7:57     ` Anton Blanchard
@ 2003-09-11 20:34       ` Patrick Mansfield
  2003-09-15  7:13         ` Anton Blanchard
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Mansfield @ 2003-09-11 20:34 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Christoph Hellwig, James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2379 bytes --]

On Wed, Sep 10, 2003 at 05:57:42PM +1000, Anton Blanchard wrote:
> 
> How does this look? This restores the ability to do parallel scsi scan
> via sysfs.
> 

The patch looks fine to me, I hope James will apply it.

I think we might still have to allow only a scan or delete (per host), and
not both in parallel - it looks like we could have some races otherwise,
I'm not sure though (mainly where the driver model points to the parent
device). Maybe a get of the host device during scanning is enough.

I tested it out, and ran some timings pre and post-patch, mainly out of
curiousity.

I have:

[root@elm3b79 root]# ls -1d /sysfs/bus/scsi/devices/[01]* | wc
      6       6     192
[root@elm3b79 root]# ls -1d /sysfs/bus/scsi/devices/[2-5]* | wc
    288     288    9428

Really 72 separate LUNs attached to qlogic 2310 (FCP) host adapters (hosts
2-5), each with four paths. 40 disk drives (10 on a loop per enclosure),
plus 32 LUNs connected to a disk array. All attached to a 1Gb switch.

I ran a simple script (attached) to scan via sysfs in parallel. Of course
without the patch scanning is serialized.

System is a 8 cpu (two node) NUMAQ.

--------------------------------------------
Before the patch:

Rescanning the four qlogic adapters without finding any new devices,
seemed to vary up to about 34 seconds:

[root@elm3b79 root]# time ./scanall.sh > xx

real    0m31.947s
user    0m0.007s
sys     0m0.113s

Rescan in parallel re-discovering all devices:

[root@elm3b79 root]# time ./scanall.sh  > xx

real    0m34.189s
user    0m0.009s
sys     0m0.615s

--------------------------------------------
With the patch:

Generally went from 34 seconds down to 9.

Rescanning the four qlogic adapters without finding any new devices:

[root@elm3b79 root]# time ./scanall.sh > xx

real    0m8.057s
user    0m0.007s
sys     0m0.120s

Rescan in parallel re-discovering all devices:

[root@elm3b79 root]# time ./scanall.sh > xx

real    0m9.608s
user    0m0.008s
sys     0m0.783s

-----------------------------------------------------------

Also: time to sequentially remove all devices (on host 2-5, patch does not
matter but I find this interesting):

time ./delall.sh > xx

real    0m7.276s
user    0m0.823s
sys     0m3.106s

And time to remove in parallel

[root@elm3b79 root]# time ./delall.sh > xx 
real    0m4.586s
user    0m1.336s
sys     0m7.204s

-- Patrick Mansfield

[-- Attachment #2: scanall.sh --]
[-- Type: application/x-sh, Size: 238 bytes --]

[-- Attachment #3: delall.sh --]
[-- Type: application/x-sh, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-09-11 20:34       ` Patrick Mansfield
@ 2003-09-15  7:13         ` Anton Blanchard
  2003-09-15  9:10           ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Anton Blanchard @ 2003-09-15  7:13 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Christoph Hellwig, James Bottomley, linux-scsi


Hi Pat,

> The patch looks fine to me, I hope James will apply it.

Good to see someone else does parallel scan :)

> I think we might still have to allow only a scan or delete (per host), and
> not both in parallel - it looks like we could have some races otherwise,
> I'm not sure though (mainly where the driver model points to the parent
> device). Maybe a get of the host device during scanning is enough.

Speaking of races... Ive been tripping one when parallel probing.
I hit a poisoned list pointer in scsi_run_host_queues, which is being
called out of the error handler thread. (Ive got a few scsi buses in this
machine full of bad disks)

I havent had a chance to look into it but Im guessing we are racing with
someone calling scsi_free_sdev. Im not sure how we should be protecting
->siblings but it looks like we have to do something.

Anton

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] serialize bus scanning
  2003-09-15  7:13         ` Anton Blanchard
@ 2003-09-15  9:10           ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2003-09-15  9:10 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Patrick Mansfield, Christoph Hellwig, James Bottomley, linux-scsi

On Mon, Sep 15, 2003 at 05:13:05PM +1000, Anton Blanchard wrote:
> 
> Hi Pat,
> 
> > The patch looks fine to me, I hope James will apply it.
> 
> Good to see someone else does parallel scan :)

Looks fine to me aswell.  Not that I have a machine with more than
one real HBA :)

> Speaking of races... Ive been tripping one when parallel probing.
> I hit a poisoned list pointer in scsi_run_host_queues, which is being
> called out of the error handler thread. (Ive got a few scsi buses in this
> machine full of bad disks)
> 
> I havent had a chance to look into it but Im guessing we are racing with
> someone calling scsi_free_sdev. Im not sure how we should be protecting
> ->siblings but it looks like we have to do something.

->siblings needs protection.  I've been working for this for a long time
and should have a final patchkit soon.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-09-15  9:10 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-08-25 12:24 [PATCH] serialize bus scanning Christoph Hellwig
2003-08-27 20:29 ` Anton Blanchard
2003-08-27 20:35   ` Christoph Hellwig
2003-09-10  7:57     ` Anton Blanchard
2003-09-11 20:34       ` Patrick Mansfield
2003-09-15  7:13         ` Anton Blanchard
2003-09-15  9:10           ` Christoph Hellwig
2003-08-27 22:32   ` Mike Anderson
2003-08-27 22:59     ` Anton Blanchard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox