From: Dan Williams <dan.j.williams@intel.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org,
Dariusz Majchrzak <dariusz.majchrzak@intel.com>
Subject: Re: [PATCH 12/12] scsi_transport_sas: fix delete vs scan race
Date: Sun, 20 May 2012 12:20:06 -0700
Message-ID: <CAA9_cmeL5h_5xESis06pyT-7bt+K2eQrN5SR6_b25qLBSDVXvA@mail.gmail.com>
In-Reply-To: <CAA9_cmcCQtyRBEt-c8EP6wuSukqZn0Mswxi3nDG7R-88L57BjA@mail.gmail.com>
On Sat, May 5, 2012 at 2:52 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sun, Apr 22, 2012 at 10:15 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> Async scan here means any scan in a different thread, right ... it just
>> has to be asynchronous relative to us? So that includes the manually
>> initiated ones and hotplug ones, doesn't it?
>
> [ resend since I notice this never hit the lists ]
>
> Hmm, well no I don't think so. This literally means the initial async
> scan, and the failure window is between when we skip the call to
> scsi_sysfs_add_sdev() (in scsi_add_lun() under the scan_mutex) and
> finally call scsi_sysfs_add_sdev() again via scsi_finish_async_scan().
> I don't see how that fixes it because when we fail, the sequence goes:
>
> mutex_lock(scan_mutex)
> starget->parent = end_device;
> scsi_add_lun()
> mutex_unlock(scan_mutex)
>
> device_del(end_device)
>
> mutex_lock(scan_mutex)
> device_add(starget)
> <crash>
>
> As far as I can see taking the scan_mutex in sas_rphy_remove() does
> not change this failure window. Unless I missed something?
>
> I am going to re-submit this patch as is with the proposed libsas batch for 3.5.
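A minimal single-threaded replay of the failing order quoted above, sketched in userspace C. struct dev, dev_add(), dev_del() and replay() are hypothetical stand-ins, not the driver-core API, and a refused add (-1) stands in for the crash. It illustrates why taking scan_mutex in the removal path does not help: the window spans two separate scan_mutex critical sections, so a locked removal can still land between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver-core device (not struct device). */
struct dev {
	const char *name;
	bool registered;
	struct dev *parent;
};

static pthread_mutex_t scan_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for device_add(): refuses a child whose parent has already
 * been deleted. In the kernel this case crashed; here it returns -1.
 */
static int dev_add(struct dev *d)
{
	assert(d != NULL);
	if (d->parent != NULL && !d->parent->registered)
		return -1;
	d->registered = true;
	return 0;
}

/* Stand-in for device_del(). */
static void dev_del(struct dev *d)
{
	d->registered = false;
}

/*
 * Replays the failing order in one thread. lock_in_remove models the
 * proposal of also taking scan_mutex in the removal path.
 */
static int replay(bool lock_in_remove)
{
	struct dev end_device = { "end_device", true, NULL };
	struct dev starget = { "starget", false, NULL };
	int rc;

	/* Scan, first critical section: parent set, sysfs add deferred. */
	pthread_mutex_lock(&scan_mutex);
	starget.parent = &end_device;
	pthread_mutex_unlock(&scan_mutex);

	/* Removal lands between the two critical sections. */
	if (lock_in_remove)
		pthread_mutex_lock(&scan_mutex);
	dev_del(&end_device);
	if (lock_in_remove)
		pthread_mutex_unlock(&scan_mutex);

	/* Scan, second critical section: the deferred add. */
	pthread_mutex_lock(&scan_mutex);
	rc = dev_add(&starget);
	pthread_mutex_unlock(&scan_mutex);

	return rc;
}
```

Both replay(false) and replay(true) return -1: the extra lock only stops the removal from overlapping a critical section, not from running between the two.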

It turns out this patch can cause a deadlock when two hosts are scanning
and the "previous" host (according to the async scan queue) experiences
a device removal event. I think the following should be all we need:
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 01b0374..8906557 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1714,6 +1714,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
 	shost_for_each_device(sdev, shost) {
+		/* target removed before the device could be added */
+		if (sdev->sdev_state == SDEV_DEL)
+			continue;
 		if (!scsi_host_scan_allowed(shost) ||
 		    scsi_sysfs_add_sdev(sdev) != 0)
 			__scsi_remove_device(sdev);

...since starget removal marks the sdevs as deleted (SDEV_DEL) under
scan_mutex, scsi_sysfs_add_devices() can simply skip them. I'll post
this patch after Darek has a chance to try it out.
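A userspace sketch of the check the diff adds, assuming (as described above) that target removal and the end-of-scan add both serialize on scan_mutex. enum sdev_state, struct sdev, remove_target() and add_devices() are hypothetical reductions of the SCSI midlayer objects, not kernel API; in_sysfs stands in for a successful scsi_sysfs_add_sdev().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced mirror of the scsi_device states. */
enum sdev_state { SDEV_CREATED, SDEV_RUNNING, SDEV_DEL };

/* Hypothetical stand-in for struct scsi_device. */
struct sdev {
	enum sdev_state state;
	bool in_sysfs; /* models a successful scsi_sysfs_add_sdev() */
};

static pthread_mutex_t scan_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Target removal: mark every child deleted while holding scan_mutex. */
static void remove_target(struct sdev *devs, size_t n)
{
	pthread_mutex_lock(&scan_mutex);
	for (size_t i = 0; i < n; i++)
		devs[i].state = SDEV_DEL;
	pthread_mutex_unlock(&scan_mutex);
}

/* End of async scan: publish deferred devices, skipping deleted ones. */
static size_t add_devices(struct sdev *devs, size_t n)
{
	size_t added = 0;

	pthread_mutex_lock(&scan_mutex);
	for (size_t i = 0; i < n; i++) {
		/* target removed before the device could be added */
		if (devs[i].state == SDEV_DEL)
			continue;
		/* a deferred device must not already be visible */
		assert(!devs[i].in_sysfs);
		devs[i].in_sysfs = true;
		devs[i].state = SDEV_RUNNING;
		added++;
	}
	pthread_mutex_unlock(&scan_mutex);
	return added;
}
```

Devices whose target was removed before the add are left in SDEV_DEL, so add_devices() never publishes them; devices that were not touched by a removal are added as before.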
--
Dan