* [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Josef Bacik @ 2007-04-19 13:25 UTC
To: linux-scsi; +Cc: James.Bottomley, linux-kernel

Hello,

Resending this to a wider audience (thanks Andrew).

I'm having a problem on the newest version of Linus's git tree with my qla2xxx
card. This is on a UP box; the problem doesn't happen on my similarly
configured SMP box. When I unload and then try to load the qla2xxx driver again
I get this message:

kobject_add failed for 3:0:0:0 with -EEXIST, don't try to register things with
the same name in the same directory.
 [<c0405ea6>] show_trace_log_lvl+0x1a/0x2f
 [<c0406456>] show_trace+0x12/0x14
 [<c04064d1>] dump_stack+0x16/0x18
 [<c04e6e86>] kobject_shadow_add+0xcd/0x1df
 [<c04e6fa2>] kobject_add+0xa/0xc
 [<c0557ae1>] device_add+0xab/0x62e
 [<d0873a0f>] scsi_sysfs_add_sdev+0x2d/0x1eb [scsi_mod]
 [<d0871db8>] scsi_probe_and_add_lun+0x974/0xaa5 [scsi_mod]
 [<d087240a>] __scsi_scan_target+0xc0/0x5f1 [scsi_mod]
 [<d0872ec5>] scsi_scan_target+0x97/0xa6 [scsi_mod]
 [<d08b1c34>] fc_scsi_scan_rport+0x5a/0x76 [scsi_transport_fc]
 [<c0435f33>] run_workqueue+0x89/0x14e
 [<c0436949>] worker_thread+0xf8/0x124
 [<c043911b>] kthread+0xb3/0xdc
 [<c0405b4f>] kernel_thread_helper+0x7/0x10
 =======================

I traced this down to the async scanning doing a kobject_add for that object;
the backtrace below shows the path we took to add it.

 [<c0405ea6>] show_trace_log_lvl+0x1a/0x2f
 [<c0406456>] show_trace+0x12/0x14
 [<c04064d1>] dump_stack+0x16/0x18
 [<c04e6e86>] kobject_shadow_add+0xcd/0x1df
 [<c04e6fa2>] kobject_add+0xa/0xc
 [<c055a45c>] class_device_add+0x9e/0x3ad
 [<d0873a3c>] scsi_sysfs_add_sdev+0x5a/0x1eb [scsi_mod]
 [<d0872cd4>] do_scan_async+0x62/0xf8 [scsi_mod]
 [<c043911b>] kthread+0xb3/0xdc
 [<c0405b4f>] kernel_thread_helper+0x7/0x10
 =======================

Looking through everything I came to the conclusion that we don't really need
the scsi_sysfs_add_devices in scsi_finish_async_scan, which gets run every time
we do a do_scan_async. In doing the scanning, if we come upon anything we will
already be registering the device with sysfs, so the scsi_sysfs_add_devices step
is kind of useless. I tested this and it worked fine on my UP box (where the
problem was happening) and my SMP box (where the problem wasn't happening). Now
I'm not entirely sure if this is correct, but I'm attaching the patch that I
used to fix it for me. Please point out if I've done something wrong or if there
is a different way this needs to be fixed. Thank you,

Josef

PS. I'm not on linux-scsi so please CC me, thanks for your time.

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 0949145..2c8527b 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1661,15 +1661,6 @@ int scsi_scan_host_selected(struct Scsi_
 	return 0;
 }
 
-static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	shost_for_each_device(sdev, shost) {
-		if (scsi_sysfs_add_sdev(sdev) != 0)
-			scsi_destroy_sdev(sdev);
-	}
-}
-
 /**
  * scsi_prep_async_scan - prepare for an async scan
  * @shost: the host which will be scanned
@@ -1741,8 +1732,6 @@ static void scsi_finish_async_scan(struc
 
 	wait_for_completion(&data->prev_finished);
 
-	scsi_sysfs_add_devices(shost);
-
 	spin_lock(&async_scan_lock);
 	shost->async_scan = 0;
 	list_del(&data->list);

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: James Bottomley @ 2007-04-19 14:02 UTC
To: Josef Bacik; +Cc: linux-scsi, linux-kernel

On Thu, 2007-04-19 at 09:25 -0400, Josef Bacik wrote:
> Looking through everything I came to the conclusion that we don't really need
> the scsi_sysfs_add_devices in scsi_finish_async_scan, which gets run everytime
> we do a do_scan_async. In doing the scanning, if we come upon anything we will
> already be registering the device with sysfs so the scsi_sysfs_add_devices step
> is kind of useless.

Unfortunately, it isn't. The registration step while scanning is at the
end of scsi_add_lun():

	if (!async && scsi_sysfs_add_sdev(sdev) != 0)
		return SCSI_SCAN_NO_RESPONSE;

	return SCSI_SCAN_LUN_PRESENT;

The !async should mean that the addition *only* occurs for the non async
scan case ... if you remove the post async scan add, we'll lose devices.

> I tested this and it worked fine on my UP box (where the
> problem was happening) and my SMP box (where the problem wasn't happening). Now
> I'm not entirely sure if this is correct, but I'm attaching the patch that I
> used to fix it for me, please point out if I've done something wrong or if there
> is a different way this needs to be fixed. Thank you,

Could you add some debugging first to see if we're actually adding the
device twice (and also, if we are, what the value of the async is).

James
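
The gating James describes can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not kernel code: `model_host`, `model_dev`, `model_add_lun`, `model_finish_async_scan` and `model_count_registered` are hypothetical names, and "registered" stands in for whatever scsi_sysfs_add_sdev() does. It shows why devices found during an async scan stay unregistered until a post-scan step runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8

/* Hypothetical stand-in for a scanned device; not struct scsi_device. */
struct model_dev {
	bool found;      /* discovered by the scan */
	bool registered; /* made visible (sysfs registration in the real code) */
};

/* Hypothetical stand-in for a host and the devices found on it. */
struct model_host {
	struct model_dev devs[MODEL_MAX_DEVS];
	size_t ndevs;
};

/* Same shape as the quoted check: register immediately only when !async. */
void model_add_lun(struct model_host *h, bool async)
{
	assert(h != NULL && h->ndevs < MODEL_MAX_DEVS);
	struct model_dev *d = &h->devs[h->ndevs++];
	d->found = true;
	d->registered = !async;
}

/* Stand-in for the post-scan add: register every device still pending. */
void model_finish_async_scan(struct model_host *h)
{
	assert(h != NULL);
	for (size_t i = 0; i < h->ndevs; i++)
		if (h->devs[i].found)
			h->devs[i].registered = true;
}

/* Count devices that have been made visible so far. */
size_t model_count_registered(const struct model_host *h)
{
	size_t n = 0;

	assert(h != NULL);
	for (size_t i = 0; i < h->ndevs; i++)
		if (h->devs[i].registered)
			n++;
	return n;
}
```

In this model, two calls to `model_add_lun(&h, true)` leave the registered count at zero until `model_finish_async_scan(&h)` runs, which is the "we'll lose devices" case if that final step is simply removed.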

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Josef Bacik @ 2007-04-19 15:06 UTC
To: James Bottomley; +Cc: Josef Bacik, linux-scsi, linux-kernel

On Thu, Apr 19, 2007 at 10:02:36AM -0400, James Bottomley wrote:
> On Thu, 2007-04-19 at 09:25 -0400, Josef Bacik wrote:
> > Looking through everything I came to the conclusion that we don't really need
> > the scsi_sysfs_add_devices in scsi_finish_async_scan, which gets run everytime
> > we do a do_scan_async. In doing the scanning, if we come upon anything we will
> > already be registering the device with sysfs so the scsi_sysfs_add_devices step
> > is kind of useless.
>
> Unfortunately, it isn't. The registration step while scanning is at the
> end of scsi_add_lun():
>
> 	if (!async && scsi_sysfs_add_sdev(sdev) != 0)
> 		return SCSI_SCAN_NO_RESPONSE;
>
> 	return SCSI_SCAN_LUN_PRESENT;
>
> The !async should mean that the addition *only* occurs for the non async
> scan case ... if you remove the post async scan add, we'll lose devices.
>
> > I tested this and it worked fine on my UP box (where the
> > problem was happening) and my SMP box (where the problem wasn't happening). Now
> > I'm not entirely sure if this is correct, but I'm attaching the patch that I
> > used to fix it for me, please point out if I've done something wrong or if there
> > is a different way this needs to be fixed. Thank you,
>
> Could you add some debugging first to see if we're actually adding the
> device twice (and also, if we are, what the value of the async is).
>

Sorry, I should have put that in the original post. I added debugging to
kobject_add to check whether we were adding something twice; that's how I
figured out who was doing it:

kobject rport-3:0-0: registering. parent: host3, set: devices
kobject rport-3:0-0: registering. parent: fc_remote_ports, set: class_obj
kobject target3:0:0: registering. parent: rport-3:0-0, set: devices
kobject rport-3:0-1: registering. parent: host3, set: devices
kobject rport-3:0-1: registering. parent: fc_remote_ports, set: class_obj
kobject target3:0:0: registering. parent: fc_transport, set: class_obj
kobject rport-3:0-2: registering. parent: host3, set: devices
kobject rport-3:0-2: registering. parent: fc_remote_ports, set: class_obj
kobject rport-3:0-3: registering. parent: host3, set: devices
kobject rport-3:0-3: registering. parent: fc_remote_ports, set: class_obj
kobject rport-3:0-4: registering. parent: host3, set: devices
kobject rport-3:0-4: registering. parent: fc_remote_ports, set: class_obj
kobject rport-3:0-5: registering. parent: host3, set: devices
kobject rport-3:0-5: registering. parent: fc_remote_ports, set: class_obj
kobject rport-3:0-6: registering. parent: host3, set: devices
kobject rport-3:0-6: registering. parent: fc_remote_ports, set: class_obj
kobject rport-3:0-7: registering. parent: host3, set: devices
kobject rport-3:0-7: registering. parent: fc_remote_ports, set: class_obj
>> kobject 3:0:0:0: registering. parent: target3:0:0, set: devices
kobject 3:0:0:0: registering. parent: scsi_device, set: class_obj
scsi 3:0:0:0: Direct-Access IBM 1742-900 0520 PQ: 0 ANSI: 3
>> kobject 3:0:0:0: registering. parent: target3:0:0, set: devices
kobject_add failed for 3:0:0:0 with -EEXIST, don't try to register things with
the same name in the same directory.

Async is set in the first case and not set in the second. Thank you,

Josef

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Andrew Morton @ 2007-04-21 7:23 UTC
To: Josef Bacik; +Cc: James Bottomley, linux-scsi, linux-kernel

On Thu, 19 Apr 2007 11:06:56 -0400 Josef Bacik <jwhiter@redhat.com> wrote:

> On Thu, Apr 19, 2007 at 10:02:36AM -0400, James Bottomley wrote:
> > On Thu, 2007-04-19 at 09:25 -0400, Josef Bacik wrote:
> > > Looking through everything I came to the conclusion that we don't really need
> > > the scsi_sysfs_add_devices in scsi_finish_async_scan, which gets run everytime
> > > we do a do_scan_async. In doing the scanning, if we come upon anything we will
> > > already be registering the device with sysfs so the scsi_sysfs_add_devices step
> > > is kind of useless.
> >
> > Unfortunately, it isn't. The registration step while scanning is at the
> > end of scsi_add_lun():
> >
> > 	if (!async && scsi_sysfs_add_sdev(sdev) != 0)
> > 		return SCSI_SCAN_NO_RESPONSE;
> >
> > 	return SCSI_SCAN_LUN_PRESENT;
> >
> > The !async should mean that the addition *only* occurs for the non async
> > scan case ... if you remove the post async scan add, we'll lose devices.
> >
> > > I tested this and it worked fine on my UP box (where the
> > > problem was happening) and my SMP box (where the problem wasn't happening). Now
> > > I'm not entirely sure if this is correct, but I'm attaching the patch that I
> > > used to fix it for me, please point out if I've done something wrong or if there
> > > is a different way this needs to be fixed. Thank you,
> >
> > Could you add some debugging first to see if we're actually adding the
> > device twice (and also, if we are, what the value of the async is).
> >
>
> Sorry I should have put that in the original post, I added debugging to
> kobject_add to check to see if we were adding something twice, thats how I
> figured out who was doing it
>
> kobject rport-3:0-0: registering. parent: host3, set: devices
> kobject rport-3:0-0: registering. parent: fc_remote_ports, set: class_obj
> kobject target3:0:0: registering. parent: rport-3:0-0, set: devices
> kobject rport-3:0-1: registering. parent: host3, set: devices
> kobject rport-3:0-1: registering. parent: fc_remote_ports, set: class_obj
> kobject target3:0:0: registering. parent: fc_transport, set: class_obj
> kobject rport-3:0-2: registering. parent: host3, set: devices
> kobject rport-3:0-2: registering. parent: fc_remote_ports, set: class_obj
> kobject rport-3:0-3: registering. parent: host3, set: devices
> kobject rport-3:0-3: registering. parent: fc_remote_ports, set: class_obj
> kobject rport-3:0-4: registering. parent: host3, set: devices
> kobject rport-3:0-4: registering. parent: fc_remote_ports, set: class_obj
> kobject rport-3:0-5: registering. parent: host3, set: devices
> kobject rport-3:0-5: registering. parent: fc_remote_ports, set: class_obj
> kobject rport-3:0-6: registering. parent: host3, set: devices
> kobject rport-3:0-6: registering. parent: fc_remote_ports, set: class_obj
> kobject rport-3:0-7: registering. parent: host3, set: devices
> kobject rport-3:0-7: registering. parent: fc_remote_ports, set: class_obj
> >> kobject 3:0:0:0: registering. parent: target3:0:0, set: devices
> kobject 3:0:0:0: registering. parent: scsi_device, set: class_obj
> scsi 3:0:0:0: Direct-Access IBM 1742-900 0520 PQ: 0 ANSI: 3
> >> kobject 3:0:0:0: registering. parent: target3:0:0, set: devices
> kobject_add failed for 3:0:0:0 with -EEXIST, don't try to register things with
> the same name in the same directory.
>
> Async in the first case is set and in the second case it isn't set. Thank you,
>

So.... do we now know what is causing this failure?

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Josef Bacik @ 2007-04-21 13:59 UTC
To: Andrew Morton; +Cc: Josef Bacik, James Bottomley, linux-scsi, linux-kernel

On Sat, Apr 21, 2007 at 12:23:45AM -0700, Andrew Morton wrote:
> On Thu, 19 Apr 2007 11:06:56 -0400 Josef Bacik <jwhiter@redhat.com> wrote:
>
> > On Thu, Apr 19, 2007 at 10:02:36AM -0400, James Bottomley wrote:
> > > On Thu, 2007-04-19 at 09:25 -0400, Josef Bacik wrote:
> > > > Looking through everything I came to the conclusion that we don't really need
> > > > the scsi_sysfs_add_devices in scsi_finish_async_scan, which gets run everytime
> > > > we do a do_scan_async. In doing the scanning, if we come upon anything we will
> > > > already be registering the device with sysfs so the scsi_sysfs_add_devices step
> > > > is kind of useless.
> > >
> > > Unfortunately, it isn't. The registration step while scanning is at the
> > > end of scsi_add_lun():
> > >
> > > 	if (!async && scsi_sysfs_add_sdev(sdev) != 0)
> > > 		return SCSI_SCAN_NO_RESPONSE;
> > >
> > > 	return SCSI_SCAN_LUN_PRESENT;
> > >
> > > The !async should mean that the addition *only* occurs for the non async
> > > scan case ... if you remove the post async scan add, we'll lose devices.
> > >
> > > > I tested this and it worked fine on my UP box (where the
> > > > problem was happening) and my SMP box (where the problem wasn't happening). Now
> > > > I'm not entirely sure if this is correct, but I'm attaching the patch that I
> > > > used to fix it for me, please point out if I've done something wrong or if there
> > > > is a different way this needs to be fixed. Thank you,
> > >
> > > Could you add some debugging first to see if we're actually adding the
> > > device twice (and also, if we are, what the value of the async is).
> > >
> >
> > Sorry I should have put that in the original post, I added debugging to
> > kobject_add to check to see if we were adding something twice, thats how I
> > figured out who was doing it
> >
> > kobject rport-3:0-0: registering. parent: host3, set: devices
> > kobject rport-3:0-0: registering. parent: fc_remote_ports, set: class_obj
> > kobject target3:0:0: registering. parent: rport-3:0-0, set: devices
> > kobject rport-3:0-1: registering. parent: host3, set: devices
> > kobject rport-3:0-1: registering. parent: fc_remote_ports, set: class_obj
> > kobject target3:0:0: registering. parent: fc_transport, set: class_obj
> > kobject rport-3:0-2: registering. parent: host3, set: devices
> > kobject rport-3:0-2: registering. parent: fc_remote_ports, set: class_obj
> > kobject rport-3:0-3: registering. parent: host3, set: devices
> > kobject rport-3:0-3: registering. parent: fc_remote_ports, set: class_obj
> > kobject rport-3:0-4: registering. parent: host3, set: devices
> > kobject rport-3:0-4: registering. parent: fc_remote_ports, set: class_obj
> > kobject rport-3:0-5: registering. parent: host3, set: devices
> > kobject rport-3:0-5: registering. parent: fc_remote_ports, set: class_obj
> > kobject rport-3:0-6: registering. parent: host3, set: devices
> > kobject rport-3:0-6: registering. parent: fc_remote_ports, set: class_obj
> > kobject rport-3:0-7: registering. parent: host3, set: devices
> > kobject rport-3:0-7: registering. parent: fc_remote_ports, set: class_obj
> > >> kobject 3:0:0:0: registering. parent: target3:0:0, set: devices
> > kobject 3:0:0:0: registering. parent: scsi_device, set: class_obj
> > scsi 3:0:0:0: Direct-Access IBM 1742-900 0520 PQ: 0 ANSI: 3
> > >> kobject 3:0:0:0: registering. parent: target3:0:0, set: devices
> > kobject_add failed for 3:0:0:0 with -EEXIST, don't try to register things with
> > the same name in the same directory.
> >
> > Async in the first case is set and in the second case it isn't set. Thank you,
> >
>
> So.... do we now know what is causing this failure?

Yes. The qla init code kicks off the async scanning, then continues to
initialize and registers itself with the scsi fc code, which sets up a
workqueue to register the fc port. The problem is that the async scanning and
the fc port registration run back to back on my UP computer: the scsi async
code does the sysfs registration, and right after that the scsi fc code goes
through and does its own scsi scan without async set, which does the sysfs
registration as well. kobject then complains about adding the same thing twice.

Taking out the async check in scsi_add_lun would probably work, but this really
isn't my area of expertise and I have a feeling that's kind of defeating the
purpose of the async scsi scanning.

Josef
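
The sequence Josef describes (registration at the end of the async scan, followed by a second registration from a non-async scan of the same LUN) can be reproduced in a small userspace model. This is a sketch under stated assumptions, not the kobject implementation: `model_dir`, `model_dir_add` and `model_async_then_sync` are hypothetical names, and the directory is reduced to a fixed-size array of unique names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_NAMES 16
#define MODEL_NAME_LEN  32

/* Hypothetical stand-in for one sysfs directory: a set of unique names. */
struct model_dir {
	char names[MODEL_MAX_NAMES][MODEL_NAME_LEN];
	size_t count;
};

/*
 * Add a name to the directory.
 * Returns 0 on success, -EEXIST if the name is already present,
 * -ENAMETOOLONG or -ENOSPC if the model's fixed limits are exceeded.
 */
int model_dir_add(struct model_dir *dir, const char *name)
{
	assert(dir != NULL && name != NULL);
	if (strlen(name) >= MODEL_NAME_LEN)
		return -ENAMETOOLONG;
	for (size_t i = 0; i < dir->count; i++)
		if (strcmp(dir->names[i], name) == 0)
			return -EEXIST;
	if (dir->count == MODEL_MAX_NAMES)
		return -ENOSPC;
	strcpy(dir->names[dir->count++], name);
	return 0;
}

/* Two paths registering the same LUN, in the order described above. */
int model_async_then_sync(struct model_dir *dir, const char *lun)
{
	int ret = model_dir_add(dir, lun); /* end-of-async-scan registration */

	if (ret != 0)
		return ret;
	return model_dir_add(dir, lun);    /* later non-async rport scan */
}
```

Calling `model_async_then_sync()` on an empty directory with the name "3:0:0:0" returns -EEXIST from the second add while leaving exactly one entry in place, which matches the shape of the failure in the debug log.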

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Josef Bacik @ 2007-04-23 18:13 UTC
To: Josef Bacik; +Cc: Andrew Morton, James Bottomley, linux-scsi, linux-kernel

On Sat, Apr 21, 2007 at 09:59:56AM -0400, Josef Bacik wrote:
> On Sat, Apr 21, 2007 at 12:23:45AM -0700, Andrew Morton wrote:
> > On Thu, 19 Apr 2007 11:06:56 -0400 Josef Bacik <jwhiter@redhat.com> wrote:
> >
> > > On Thu, Apr 19, 2007 at 10:02:36AM -0400, James Bottomley wrote:
> > > > On Thu, 2007-04-19 at 09:25 -0400, Josef Bacik wrote:
> > > > > Looking through everything I came to the conclusion that we don't really need
> > > > > the scsi_sysfs_add_devices in scsi_finish_async_scan, which gets run everytime
> > > > > we do a do_scan_async. In doing the scanning, if we come upon anything we will
> > > > > already be registering the device with sysfs so the scsi_sysfs_add_devices step
> > > > > is kind of useless.
> > > >
> > > > Unfortunately, it isn't. The registration step while scanning is at the
> > > > end of scsi_add_lun():
> > > >
> > > > 	if (!async && scsi_sysfs_add_sdev(sdev) != 0)
> > > > 		return SCSI_SCAN_NO_RESPONSE;
> > > >
> > > > 	return SCSI_SCAN_LUN_PRESENT;
> > > >
> > > > The !async should mean that the addition *only* occurs for the non async
> > > > scan case ... if you remove the post async scan add, we'll lose devices.
> > > >

OK, I have a new patch that I've built and tested on both my UP and SMP
machines, and it appears to work fine. I took the async check out of
scsi_add_lun; I don't really see the point in waiting to do the sysfs
registration (if there's a reason, I haven't been able to find it in the
original submission of this functionality). Please let me know if this is
incorrect. Thank you,

Josef

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 0949145..8af1e16 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -712,7 +712,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
  *     SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized
  **/
 static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
-		int *bflags, int async)
+		int *bflags)
 {
 	/*
 	 * XXX do not save the inquiry, since it can change underneath us,
@@ -912,7 +912,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	 * register it and tell the rest of the kernel
 	 * about it.
 	 */
-	if (!async && scsi_sysfs_add_sdev(sdev) != 0)
+	if (scsi_sysfs_add_sdev(sdev) != 0)
 		return SCSI_SCAN_NO_RESPONSE;
 
 	return SCSI_SCAN_LUN_PRESENT;
@@ -1081,7 +1081,7 @@ static int scsi_probe_and_add_lun(struct scsi_target *starget,
 		goto out_free_result;
 	}
 
-	res = scsi_add_lun(sdev, result, &bflags, shost->async_scan);
+	res = scsi_add_lun(sdev, result, &bflags);
 	if (res == SCSI_SCAN_LUN_PRESENT) {
 		if (bflags & BLIST_KEY) {
 			sdev->lockable = 0;
@@ -1661,15 +1661,6 @@ int scsi_scan_host_selected(struct Scsi_Host *shost, unsigned int channel,
 	return 0;
 }
 
-static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	shost_for_each_device(sdev, shost) {
-		if (scsi_sysfs_add_sdev(sdev) != 0)
-			scsi_destroy_sdev(sdev);
-	}
-}
-
 /**
  * scsi_prep_async_scan - prepare for an async scan
  * @shost: the host which will be scanned
@@ -1741,8 +1732,6 @@ static void scsi_finish_async_scan(struct async_scan_data *data)
 
 	wait_for_completion(&data->prev_finished);
 
-	scsi_sysfs_add_devices(shost);
-
 	spin_lock(&async_scan_lock);
 	shost->async_scan = 0;
 	list_del(&data->list);

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: James Bottomley @ 2007-04-23 18:26 UTC
To: Josef Bacik; +Cc: Andrew Morton, linux-scsi, linux-kernel, Matthew Wilcox

On Mon, 2007-04-23 at 14:13 -0400, Josef Bacik wrote:
> Ok I have a new patch that I've built and tested on both my UP and SMP machine
> and it appears to work fine. I took the async check out of scsi_add_lun, I
> don't really see the point in waiting to do the sysfs registration stuff (if
> theres a reason I haven't been able to find it in the original submission of
> this functionality). Please let me know if this is incorrect. Thank you,

Yes, it's incorrect ... if you do this, the devices will come up in a
random order for multiple SCSI cards. One of the original design goals
was not to require udev, so the final ordering should be the same as for
the sync case.

I think the root cause of the problem is somewhere in the fc transport
rport addition code.

James
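
The ordering concern can be illustrated with a toy model. This is a sketch, not the kernel's mechanism: `model_register_immediately` and `model_register_in_host_order` are hypothetical names, hosts are identified by small integers, and it is assumed for illustration that the sync-case order is ascending host number. The first function exposes hosts in whatever order their scans happen to finish; the second defers registration and emits them in host order regardless of finish order.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_HOSTS 8

/*
 * finished[] lists host numbers in the order their scans happened to
 * complete. Registering at discovery time exposes them in that order.
 */
void model_register_immediately(const int *finished, size_t n, int *out)
{
	assert(finished != NULL && out != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = finished[i];
}

/*
 * Deferred registration: note which hosts finished, then register them
 * by ascending host number, so the result does not depend on timing.
 * Returns the number of hosts written to out[].
 */
size_t model_register_in_host_order(const int *finished, size_t n, int *out)
{
	int seen[MODEL_MAX_HOSTS] = {0};
	size_t k = 0;

	assert(finished != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		assert(finished[i] >= 0 && finished[i] < MODEL_MAX_HOSTS);
		seen[finished[i]] = 1;
	}
	for (int host = 0; host < MODEL_MAX_HOSTS; host++)
		if (seen[host])
			out[k++] = host;
	return k;
}
```

With scans finishing in the order 2, 0, 1, the immediate variant yields 2, 0, 1 while the deferred variant yields 0, 1, 2. The `wait_for_completion(&data->prev_finished)` call visible in the patch context suggests the real code serializes the final step per host in some way, but the details are outside what this thread shows.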

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: James Smart @ 2007-05-03 20:00 UTC
To: James Bottomley
Cc: Josef Bacik, Andrew Morton, linux-scsi, linux-kernel, Matthew Wilcox

I doubt it's in the fc transport - it's doing what it always did, which has
nothing to do with coherency of the sdev's.

We're seeing like problems, and it looks like it's related to the scan_mutex
being held when some of the entry points are being called via the recent
async scan code (which also still has a bunch of issues around rmmod).
We should be sending some patches shortly.

-- james s

James Bottomley wrote:
> On Mon, 2007-04-23 at 14:13 -0400, Josef Bacik wrote:
>> Ok I have a new patch that I've built and tested on both my UP and SMP machine
>> and it appears to work fine. I took the async check out of scsi_add_lun, I
>> don't really see the point in waiting to do the sysfs registration stuff (if
>> theres a reason I haven't been able to find it in the original submission of
>> this functionality). Please let me know if this is incorrect. Thank you,
>
> Yes, it's incorrect ... if you do this, the devices will come up in a
> random order for multiple SCSI cards. One of the original design goals
> was not to require udev, so the final ordering should be the same as for
> the sync case.
>
> I think the root cause of the problem is somewhere in the fc transport
> rport addition code.
>
> James

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Jurij Smakov @ 2007-08-11 15:04 UTC
To: James Smart
Cc: James Bottomley, Josef Bacik, Andrew Morton, linux-scsi, linux-kernel, Matthew Wilcox

[Please keep me on CC, as I'm not on LKML.]

On Thu, May 03, 2007 at 04:00:57PM -0400, James Smart wrote:
> I doubt it's in the fc transport - it's doing what it always did, which has
> nothing to do with coherency of the sdev's.
>
> We're seeing like problems, and it looks like it's related to the scan_mutex
> being held when some of the entry points are being called via the recent
> async scan code (which also still has a bunch of issues around rmmod).
> We should be sending some patches shortly.

Hi James,

I recently got a Sun Blade 1000 box with a QLA2200 controller, and
I'm bumping into the exact same problem with 2.6.22:

scsi 0:0:0:0: Attached scsi generic sg1 type -1
scsi 0:0:0:0: Direct-Access HITACHI DKR1C-J072FC D7V5 PQ: 0 ANSI: 3
kobject_add failed for 0:0:0:0 with -EEXIST, don't try to register things with
the same name in the same directory.
Call Trace:
 [000000001000ac78] scsi_sysfs_add_sdev+0x2c/0x228 [scsi_mod]
 [0000000010008a68] scsi_probe_and_add_lun+0x97c/0xab8 [scsi_mod]
 [00000000100090d8] __scsi_scan_target+0x90/0x660 [scsi_mod]
 [0000000010009ce8] scsi_scan_target+0x94/0xa4 [scsi_mod]
 [00000000100668bc] fc_scsi_scan_rport+0x68/0x8c [scsi_transport_fc]
 [000000000046de88] run_workqueue+0xac/0x138
 [000000000046e414] worker_thread+0xc4/0xd4
 [0000000000471f24] kthread+0x4c/0x78
 [00000000004277f8] kernel_thread+0x38/0x48
 [0000000000471d84] kthreadd+0xbc/0x160
error 1

After that the device fails to initialize. On rare occasions the error
does not trigger, and then the machine boots fine. The complete boot log can
be found at

http://www.wooyd.org/misc/dmesg-blade1000-2.6.22.log

I'm willing to test any patches you might have, as well as provide any
additional debugging information.

Best regards,
-- 
Jurij Smakov                                           jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC

* Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)
From: Matthew Wilcox @ 2007-08-13 0:26 UTC
To: Jurij Smakov
Cc: James Smart, James Bottomley, Josef Bacik, Andrew Morton, linux-scsi, linux-kernel

On Sat, Aug 11, 2007 at 04:04:54PM +0100, Jurij Smakov wrote:
> [Please keep me on CC, as I'm not on LKML.]
> I've recently got a Sun Blade 1000 box with a QLA2200 controller, and
> I'm bumping into exact same problem with 2.6.22:

Please try http://marc.info/?l=linux-scsi&m=118289275414202
which fixes a number of problems with the async scanning code.

-- 
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."