From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dan Williams Subject: [PATCH v2] scsi: fix hot unplug vs async scan race Date: Mon, 11 Jun 2012 15:59:52 -0700 Message-ID: <20120611222559.7602.55539.stgit@dwillia2-linux.jf.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Return-path: Received: from mga02.intel.com ([134.134.136.20]:46443 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750948Ab2FKWnZ (ORCPT ); Mon, 11 Jun 2012 18:43:25 -0400 Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: linux-scsi@vger.kernel.org Cc: Mike Christie , Kashyap Desai , Matthew Wilcox , Dariusz Majchrzak , JBottomley@parallels.com, Robert Love , Nagalakshmi Nandigama The following crash results from cases where the end_device has been removed before scsi_sysfs_add_sdev has had a chance to run. BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 IP: [] sysfs_create_dir+0x32/0xb6 ... Call Trace: [] kobject_add_internal+0x120/0x1e3 [] ? trace_hardirqs_on+0xd/0xf [] kobject_add_varg+0x41/0x50 [] kobject_add+0x64/0x66 [] device_add+0x12d/0x63a [] ? _raw_spin_unlock_irqrestore+0x47/0x56 [] ? module_refcount+0x89/0xa0 [] scsi_sysfs_add_sdev+0x4e/0x28a [] do_scan_async+0x9c/0x145 ...teach scsi_sysfs_add_devices() to check for deleted devices() before trying to add them, and teach scsi_remove_target() how to remove targets that have not been added via device_add(). Cc: Mike Christie Cc: Robert Love Cc: Nagalakshmi Nandigama Cc: Kashyap Desai Cc: Matthew Wilcox Reported-by: Dariusz Majchrzak Signed-off-by: Dan Williams --- Changes since v1: http://marc.info/?l=linux-scsi&m=133793175125892&w=2 Mike was right to point out that the initial version (having the transport remember the starget) was dangerous. So v2 takes the approach of looking up the starget via scsi structures and not the driver-core (device_for_each_child() is unreliable prior to device_add()). drivers/scsi/scsi_scan.c | 3 +++ drivers/scsi/scsi_sysfs.c | 41 ++++++++++++++++++++++++++--------------- 2 files changed, 29 insertions(+), 15 deletions(-) diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 03ebb37..56a9379 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -1702,6 +1702,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost) { struct scsi_device *sdev; shost_for_each_device(sdev, shost) { + /* target removed before the device could be added */ + if (sdev->sdev_state == SDEV_DEL) + continue; if (!scsi_host_scan_allowed(shost) || scsi_sysfs_add_sdev(sdev) != 0) __scsi_remove_device(sdev); diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 04c2a27..f888aad 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1000,7 +1000,6 @@ static void __scsi_remove_target(struct scsi_target *starget) struct scsi_device *sdev; spin_lock_irqsave(shost->host_lock, flags); - starget->reap_ref++; restart: list_for_each_entry(sdev, &shost->__devices, siblings) { if (sdev->channel != starget->channel || @@ -1014,14 +1013,6 @@ static void __scsi_remove_target(struct scsi_target *starget) goto restart; } spin_unlock_irqrestore(shost->host_lock, flags); - scsi_target_reap(starget); -} - -static int __remove_child (struct device * dev, void * data) -{ - if (scsi_is_target_device(dev)) - __scsi_remove_target(to_scsi_target(dev)); - return 0; } /** @@ -1034,14 +1025,34 @@ static int __remove_child (struct device * dev, void * data) */ void scsi_remove_target(struct device *dev) { - if (scsi_is_target_device(dev)) { - __scsi_remove_target(to_scsi_target(dev)); - return; + struct Scsi_Host *shost = dev_to_shost(dev->parent); + struct scsi_target *starget, *found; + unsigned long flags; + + restart: + found = NULL; + spin_lock_irqsave(shost->host_lock, flags); + list_for_each_entry(starget, &shost->__targets, siblings) { + if (starget->state == STARGET_DEL) + continue; + if (starget->dev.parent == dev || &starget->dev == dev) { + found = starget; + found->reap_ref++; + break; + } } + spin_unlock_irqrestore(shost->host_lock, flags); - get_device(dev); - device_for_each_child(dev, NULL, __remove_child); - put_device(dev); + if (found) { + __scsi_remove_target(found); + scsi_target_reap(found); + /* in the case where @dev has multiple starget children, + * continue removing. + * + * FIXME: does such a case exist? + */ + goto restart; + } } EXPORT_SYMBOL(scsi_remove_target);