* [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
@ 2010-12-15 8:31 Menny_Hamburger
2010-12-15 16:03 ` Moger, Babu
2010-12-15 16:09 ` Mike Snitzer
0 siblings, 2 replies; 12+ messages in thread
From: Menny_Hamburger @ 2010-12-15 8:31 UTC (permalink / raw)
To: dm-devel
[-- Attachment #1.1: Type: text/plain, Size: 346 bytes --]
The problem:
When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
Attached is the multipath layer part of the patch.
Menny Hamburger
Engineer
Dell | IDC
office +972 97698789, fax +972 97698889
Dell IDC. 4 Hacharoshet St, Raanana 43657, Israel
[-- Attachment #1.2: Type: text/html, Size: 3409 bytes --]
[-- Attachment #2: md-handle-device-deletion.patch --]
[-- Type: application/octet-stream, Size: 687 bytes --]
From: Menny Hamburger <Menny_Hamburger@Dell.com>
Subject: [md] dm-multipath: Handle hardware handler deletion in pg_init_done
Bugzilla: 645343
When the SCSI device H/W handler is deleted or offlined, pg_init_done should fail the path.
Failing all the paths associated with the H/W handler will eventually terminate all I/O requests queued on the multipath device
diff -r -U 2 a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
--- a/drivers/md/dm-mpath.c 2010-12-15 10:08:13.243991000 +0200
+++ b/drivers/md/dm-mpath.c 2010-12-15 10:08:13.329736000 +0200
@@ -1190,4 +1190,5 @@
case SCSI_DH_OK:
break;
+ case SCSI_DH_DEV_OFFLINED:
case SCSI_DH_NOSYS:
if (!m->hw_handler_name) {
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath 2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger @ 2010-12-15 16:03 ` Moger, Babu 2010-12-15 16:09 ` Mike Snitzer 1 sibling, 0 replies; 12+ messages in thread From: Moger, Babu @ 2010-12-15 16:03 UTC (permalink / raw) To: device-mapper development Patches look good.. Reviewed-by: Babu Moger <babu.moger@lsi.com> ________________________________________ From: dm-devel-bounces@redhat.com [mailto:dm-devel-bounces@redhat.com] On Behalf Of Menny_Hamburger@dell.com Sent: Wednesday, December 15, 2010 2:31 AM To: dm-devel@redhat.com Subject: [dm-devel] [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath The problem: When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever. Attached is the multipath layer part of the patch. Menny Hamburger Engineer Dell | IDC office +972 97698789, fax +972 97698889 Dell IDC. 4 Hacharoshet St, Raanana 43657, Israel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath 2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger 2010-12-15 16:03 ` Moger, Babu @ 2010-12-15 16:09 ` Mike Snitzer 2010-12-16 7:32 ` Menny_Hamburger 1 sibling, 1 reply; 12+ messages in thread From: Mike Snitzer @ 2010-12-15 16:09 UTC (permalink / raw) To: Menny_Hamburger; +Cc: dm-devel On Wed, Dec 15 2010 at 3:31am -0500, Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote: > The problem: > When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever. > > Attached is the multipath layer part of the patch. Please do not use attachments when submitting patches. Please inline the patch in the body of the mail. Also please trim extraneous info when you reply to mails and please do not top-post. Your patch submissions were not well received by the dm-devel patchwork (which automatically collects submitted dm-devel patches so they don't get lost in the shuffle). Also, please use unique and descriptive subjects for the patches in a multipatch series. As for the proposed mpath change: Babu already pointed out that you don't need this mpath change as the default case already performs fail_path(). So all you need is that scsi_dh patch to get upstream. But to do that you'll need to: 1) create a patch against the upstream scsi-misc tree (not RHEL5.5): git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git 2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel free to cc dm-devel too). Once this change is in the upstream scsi tree it'll propagate to RHEL releases. Thanks, Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath 2010-12-15 16:09 ` Mike Snitzer @ 2010-12-16 7:32 ` Menny_Hamburger 2010-12-16 14:02 ` Mike Snitzer 0 siblings, 1 reply; 12+ messages in thread From: Menny_Hamburger @ 2010-12-16 7:32 UTC (permalink / raw) To: snitzer; +Cc: dm-devel, Itay_Dar, Rob_Thomas Hi Mike, I am sorry about the submission problems, there's a first time for everything... I know this sounds superstitious but I did run quite thorough tests on this issue and I would really prefer that the proposed patch would go through the same code path as the one I have tested. My suggestion is to modify the scsi_dh code as follows: if (!scsi_dh || !get_device(&sdev->sdev_gendev) || sdev->sdev_state == SDEV_CANCEL || sdev->sdev_state == SDEV_DEL) err = SCSI_DH_NOSYS; if (sdev->sdev_state == SDEV_OFFLINE) err = SCSI_DH_DEV_OFFLINED; if (err) { if (fn) fn(fn_data, err); return err; } There is logic and benefits here: 1) SDEV_CANCEL/SDEV_DEV are just a stage before the DH is deleted so we can respond with the same SCSI_DH_NOSYS. I think it's best here to handle these together, since they are in fact the same thing. In addition, when we delete a device, I don't think it goes through an offline state. 2) SDEV_OFFLINE is easily reverted by echoing to the state of the device - it's not like detaching. Returning SCSI_DH_DEV_OFFLINED will just call the default case and perform fail_path - no need to worry about any handler issues here. 3) It will let me sleep better at night... Best Regards, Menny -----Original Message----- From: Mike Snitzer [mailto:snitzer@redhat.com] Sent: 15 December, 2010 18:10 To: Hamburger, Menny Cc: dm-devel@redhat.com Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath On Wed, Dec 15 2010 at 3:31am -0500, Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote: > The problem: > When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever. > > Attached is the multipath layer part of the patch. Please do not use attachments when submitting patches. Please inline the patch in the body of the mail. Also please trim extraneous info when you reply to mails and please do not top-post. Your patch submissions were not well received by the dm-devel patchwork (which automatically collects submitted dm-devel patches so they don't get lost in the shuffle). Also, please use unique and descriptive subjects for the patches in a multipatch series. As for the proposed mpath change: Babu already pointed out that you don't need this mpath change as the default case already performs fail_path(). So all you need is that scsi_dh patch to get upstream. But to do that you'll need to: 1) create a patch against the upstream scsi-misc tree (not RHEL5.5): git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git 2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel free to cc dm-devel too). Once this change is in the upstream scsi tree it'll propagate to RHEL releases. Thanks, Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath 2010-12-16 7:32 ` Menny_Hamburger @ 2010-12-16 14:02 ` Mike Snitzer 2010-12-16 14:21 ` Menny_Hamburger 0 siblings, 1 reply; 12+ messages in thread From: Mike Snitzer @ 2010-12-16 14:02 UTC (permalink / raw) To: Menny_Hamburger; +Cc: dm-devel, Itay_Dar, Rob_Thomas On Thu, Dec 16 2010 at 2:32am -0500, Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote: > Hi Mike, > > I am sorry about the submission problems, there's a first time for everything... > > I know this sounds superstitious but I did run quite thorough tests on > this issue and I would really prefer that the proposed patch would go > through the same code path as the one I have tested. > > My suggestion is to modify the scsi_dh code as follows: > > if (!scsi_dh || !get_device(&sdev->sdev_gendev) || > sdev->sdev_state == SDEV_CANCEL || > sdev->sdev_state == SDEV_DEL) > err = SCSI_DH_NOSYS; > > if (sdev->sdev_state == SDEV_OFFLINE) > err = SCSI_DH_DEV_OFFLINED; > > if (err) { > if (fn) > fn(fn_data, err); > > return err; > } > > There is logic and benefits here: > 1) SDEV_CANCEL/SDEV_DEV are just a stage before the DH is deleted so we can respond with the same SCSI_DH_NOSYS. > I think it's best here to handle these together, since they are in fact the same thing. > In addition, when we delete a device, I don't think it goes through an offline state. > 2) SDEV_OFFLINE is easily reverted by echoing to the state of the device - it's not like detaching. > Returning SCSI_DH_DEV_OFFLINED will just call the default case and perform fail_path - no need to worry about any handler issues here. > 3) It will let me sleep better at night... I think you misunderstood me. We had a bugzilla exchange yesterday (and as you stated above): the dm-mpath change isn't required so it is only the scsi_dh patch that needs to go upstream. I didn't take issue with your scsi_dh patch. What I shared (below) was that dm-devel isn't the appropriate place to email a patch that is exclussively for scsi_dh. (This is just the tedious bit of "how does this change go upstream?"). I'll take care of rebasing your scsi_dh patch to the scsi-misc git tree I referenced and then send the patch to linux-scsi on your behalf. Regards, Mike > -----Original Message----- > From: Mike Snitzer [mailto:snitzer@redhat.com] > Sent: 15 December, 2010 18:10 > To: Hamburger, Menny > Cc: dm-devel@redhat.com > Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath > > On Wed, Dec 15 2010 at 3:31am -0500, > Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote: > > > The problem: > > When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever. > > > > Attached is the multipath layer part of the patch. > > Please do not use attachments when submitting patches. Please inline > the patch in the body of the mail. Also please trim extraneous info > when you reply to mails and please do not top-post. > > Your patch submissions were not well received by the dm-devel patchwork > (which automatically collects submitted dm-devel patches so they don't > get lost in the shuffle). > > Also, please use unique and descriptive subjects for the patches in a > multipatch series. > > As for the proposed mpath change: Babu already pointed out that you > don't need this mpath change as the default case already performs > fail_path(). > > So all you need is that scsi_dh patch to get upstream. But to do that > you'll need to: > 1) create a patch against the upstream scsi-misc tree (not RHEL5.5): > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git > > 2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel > free to cc dm-devel too). > > Once this change is in the upstream scsi tree it'll propagate to RHEL > releases. > > Thanks, > Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath 2010-12-16 14:02 ` Mike Snitzer @ 2010-12-16 14:21 ` Menny_Hamburger 2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer 0 siblings, 1 reply; 12+ messages in thread From: Menny_Hamburger @ 2010-12-16 14:21 UTC (permalink / raw) To: snitzer; +Cc: dm-devel, Itay_Dar, Rob_Thomas Mike, What I meant is that removing the dm-mpath change from the patch makes the operation path go into an area that I did not test (the default case). I know it's only a !hw_handler_name and a printk (as I said before - this might sound superstitious), however I did my tests with the dm-mpath part, and I prefer to send a 100% tested patch then to add something I did not test. My suggestion was to supply a scsi_dh only patch that goes through the same operation path as the one I tested - via the SCSI_DH_NOSYS case in pg_init_done. In addition, the separation of offline from cancel/del also seems the correct way to do this. Hope this makes sense, Menny -----Original Message----- From: Mike Snitzer [mailto:snitzer@redhat.com] Sent: 16 December, 2010 16:02 To: Hamburger, Menny Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath On Thu, Dec 16 2010 at 2:32am -0500, Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote: > Hi Mike, > > I am sorry about the submission problems, there's a first time for everything... > > I know this sounds superstitious but I did run quite thorough tests on > this issue and I would really prefer that the proposed patch would go > through the same code path as the one I have tested. > > My suggestion is to modify the scsi_dh code as follows: > > if (!scsi_dh || !get_device(&sdev->sdev_gendev) || > sdev->sdev_state == SDEV_CANCEL || > sdev->sdev_state == SDEV_DEL) > err = SCSI_DH_NOSYS; > > if (sdev->sdev_state == SDEV_OFFLINE) > err = SCSI_DH_DEV_OFFLINED; > > if (err) { > if (fn) > fn(fn_data, err); > > return err; > } > > There is logic and benefits here: > 1) SDEV_CANCEL/SDEV_DEV are just a stage before the DH is deleted so we can respond with the same SCSI_DH_NOSYS. > I think it's best here to handle these together, since they are in fact the same thing. > In addition, when we delete a device, I don't think it goes through an offline state. > 2) SDEV_OFFLINE is easily reverted by echoing to the state of the device - it's not like detaching. > Returning SCSI_DH_DEV_OFFLINED will just call the default case and perform fail_path - no need to worry about any handler issues here. > 3) It will let me sleep better at night... I think you misunderstood me. We had a bugzilla exchange yesterday (and as you stated above): the dm-mpath change isn't required so it is only the scsi_dh patch that needs to go upstream. I didn't take issue with your scsi_dh patch. What I shared (below) was that dm-devel isn't the appropriate place to email a patch that is exclussively for scsi_dh. (This is just the tedious bit of "how does this change go upstream?"). I'll take care of rebasing your scsi_dh patch to the scsi-misc git tree I referenced and then send the patch to linux-scsi on your behalf. Regards, Mike > -----Original Message----- > From: Mike Snitzer [mailto:snitzer@redhat.com] > Sent: 15 December, 2010 18:10 > To: Hamburger, Menny > Cc: dm-devel@redhat.com > Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath > > On Wed, Dec 15 2010 at 3:31am -0500, > Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote: > > > The problem: > > When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever. > > > > Attached is the multipath layer part of the patch. > > Please do not use attachments when submitting patches. Please inline > the patch in the body of the mail. Also please trim extraneous info > when you reply to mails and please do not top-post. > > Your patch submissions were not well received by the dm-devel patchwork > (which automatically collects submitted dm-devel patches so they don't > get lost in the shuffle). > > Also, please use unique and descriptive subjects for the patches in a > multipatch series. > > As for the proposed mpath change: Babu already pointed out that you > don't need this mpath change as the default case already performs > fail_path(). > > So all you need is that scsi_dh patch to get upstream. But to do that > you'll need to: > 1) create a patch against the upstream scsi-misc tree (not RHEL5.5): > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git > > 2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel > free to cc dm-devel too). > > Once this change is in the upstream scsi tree it'll propagate to RHEL > releases. > > Thanks, > Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion 2010-12-16 14:21 ` Menny_Hamburger @ 2010-12-16 15:29 ` Mike Snitzer 2010-12-16 15:52 ` Menny_Hamburger 0 siblings, 1 reply; 12+ messages in thread From: Mike Snitzer @ 2010-12-16 15:29 UTC (permalink / raw) To: Menny_Hamburger; +Cc: dm-devel, Itay_Dar, Rob_Thomas On Thu, Dec 16 2010 at 9:21am -0500, Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote: > Mike, > > What I meant is that removing the dm-mpath change from the patch makes > the operation path go into an area that I did not test (the default > case). > I know it's only a !hw_handler_name and a printk (as I said before - > this might sound superstitious), however I did my tests with the > dm-mpath part, and I prefer to send a 100% tested patch then to add > something I did not test. > My suggestion was to supply a scsi_dh only patch that goes through the > same operation path as the one I tested - via the SCSI_DH_NOSYS case > in pg_init_done. In addition, the separation of offline from > cancel/del also seems the correct way to do this. I see, my misunderstanding. - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's pg_init_done isn't updated to explicitly handle it it'll just fail_path() via the default case. Are you OK with that? - In your initial patch you said: "When running an upstream kernel, the above scenario may not occur because the request queue is aborted when the multipath fails the path." I'm missing _why_ fail_path's call to blk_abort_queue() would obviate the need for this patch. blk_abort_queue() is only called from fail_path(). But I thought the problem is fail_path() isn't called in this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED? -- due to the missing call to activate_complete callback (pg_init_done) (again, I'm very interested in this because we'll be reverting DM mpath's call to blk_abort_queue in the near future). But this would be the revised scsi_dh patch (I'm not sending to linux-scsi until I have an answer for the above concerns): From: Menny Hamburger <Menny_Hamburger@Dell.com> When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete callback is not called and the error is not propagated to DM mpath. When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will have their I/O hang forever. - Set SCSI_DH_NOSYS error when the handler is in the process of being deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state). - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state. - Call the activate_complete callback function directly from scsi_dh_activate if an error has been set (when either the scsi_dh internal data has already been deleted or is in the process of being deleted). The patch was tested in an ISCSI environment, RDAC H/W handler and multipath. In the following reproduction process, dd will I/O hang forever and the only way to release it will be to reboot the machine: 1) Perform I/O on a multipath device: dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 & 2) Delete all slave SCSI devices contained in the mpath device: I) In an ISCSI environment, the easiest way to do this is by stopping ISCSI: /etc/init.d/iscsi stop II) Another way to delete the devices is by applying the following bash scriptlet: dm_devs=$(ls /sys/block/ | grep dm- | xargs) for dm_dev in $dm_devs; do devices=$(ls /sys/block/$dm_dev/slaves) for device in $devices; do echo 1 > /sys/block/$device/device/delete done done NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't required. However, DM mpath's call to blk_abort_queue has proven to be unsafe due to a race (between blk_abort_queue and scsi_request_fn) that can lead to list corruption. Therefore we cannot rely on blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and will only be restored once the race with scsi_request_fn is fixed). Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c index 6fae3d2..b0c56f6 100644 --- a/drivers/scsi/device_handler/scsi_dh.c +++ b/drivers/scsi/device_handler/scsi_dh.c @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data) sdev = q->queuedata; if (sdev && sdev->scsi_dh_data) scsi_dh = sdev->scsi_dh_data->scsi_dh; - if (!scsi_dh || !get_device(&sdev->sdev_gendev)) + if (!scsi_dh || !get_device(&sdev->sdev_gendev) || + sdev->sdev_state == SDEV_CANCEL || + sdev->sdev_state == SDEV_DEL) err = SCSI_DH_NOSYS; + if (sdev->sdev_state == SDEV_OFFLINE) + err = SCSI_DH_DEV_OFFLINED; spin_unlock_irqrestore(q->queue_lock, flags); - if (err) + if (err) { + if (fn) + fn(data, err); return err; + } if (scsi_dh->activate) err = scsi_dh->activate(sdev, fn, data); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion 2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer @ 2010-12-16 15:52 ` Menny_Hamburger 2010-12-16 16:29 ` Mike Snitzer 0 siblings, 1 reply; 12+ messages in thread From: Menny_Hamburger @ 2010-12-16 15:52 UTC (permalink / raw) To: snitzer; +Cc: dm-devel, Itay_Dar, Rob_Thomas 1) When I tested this problem, deleting the device set it temporarily to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should be solved with the function returning SCSI_DH_NOSYS. Failing to the default on offline is OK with me since there is no issue with !mw_handler_name because the device still exists. 2) Aborting the request queue does not obviate the need for this patch, it's just that in a system where abort queue exists, the I/O hang will not occur. Another way to say this is that the abort queue functionality (if it worked OK - which it doesn't) hides the problem from the user experience by forcing all the queues to be flushed. My preference was always to make sure a problem (in this case the I/O hang) does not occur in the first place rather than take actions when it does. Hope this answers your questions. Menny -----Original Message----- From: Mike Snitzer [mailto:snitzer@redhat.com] Sent: 16 December, 2010 17:30 To: Hamburger, Menny Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion On Thu, Dec 16 2010 at 9:21am -0500, Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote: > Mike, > > What I meant is that removing the dm-mpath change from the patch makes > the operation path go into an area that I did not test (the default > case). > I know it's only a !hw_handler_name and a printk (as I said before - > this might sound superstitious), however I did my tests with the > dm-mpath part, and I prefer to send a 100% tested patch then to add > something I did not test. > My suggestion was to supply a scsi_dh only patch that goes through the > same operation path as the one I tested - via the SCSI_DH_NOSYS case > in pg_init_done. In addition, the separation of offline from > cancel/del also seems the correct way to do this. I see, my misunderstanding. - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's pg_init_done isn't updated to explicitly handle it it'll just fail_path() via the default case. Are you OK with that? - In your initial patch you said: "When running an upstream kernel, the above scenario may not occur because the request queue is aborted when the multipath fails the path." I'm missing _why_ fail_path's call to blk_abort_queue() would obviate the need for this patch. blk_abort_queue() is only called from fail_path(). But I thought the problem is fail_path() isn't called in this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED? -- due to the missing call to activate_complete callback (pg_init_done) (again, I'm very interested in this because we'll be reverting DM mpath's call to blk_abort_queue in the near future). But this would be the revised scsi_dh patch (I'm not sending to linux-scsi until I have an answer for the above concerns): From: Menny Hamburger <Menny_Hamburger@Dell.com> When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete callback is not called and the error is not propagated to DM mpath. When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will have their I/O hang forever. - Set SCSI_DH_NOSYS error when the handler is in the process of being deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state). - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state. - Call the activate_complete callback function directly from scsi_dh_activate if an error has been set (when either the scsi_dh internal data has already been deleted or is in the process of being deleted). The patch was tested in an ISCSI environment, RDAC H/W handler and multipath. In the following reproduction process, dd will I/O hang forever and the only way to release it will be to reboot the machine: 1) Perform I/O on a multipath device: dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 & 2) Delete all slave SCSI devices contained in the mpath device: I) In an ISCSI environment, the easiest way to do this is by stopping ISCSI: /etc/init.d/iscsi stop II) Another way to delete the devices is by applying the following bash scriptlet: dm_devs=$(ls /sys/block/ | grep dm- | xargs) for dm_dev in $dm_devs; do devices=$(ls /sys/block/$dm_dev/slaves) for device in $devices; do echo 1 > /sys/block/$device/device/delete done done NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't required. However, DM mpath's call to blk_abort_queue has proven to be unsafe due to a race (between blk_abort_queue and scsi_request_fn) that can lead to list corruption. Therefore we cannot rely on blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and will only be restored once the race with scsi_request_fn is fixed). Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c index 6fae3d2..b0c56f6 100644 --- a/drivers/scsi/device_handler/scsi_dh.c +++ b/drivers/scsi/device_handler/scsi_dh.c @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data) sdev = q->queuedata; if (sdev && sdev->scsi_dh_data) scsi_dh = sdev->scsi_dh_data->scsi_dh; - if (!scsi_dh || !get_device(&sdev->sdev_gendev)) + if (!scsi_dh || !get_device(&sdev->sdev_gendev) || + sdev->sdev_state == SDEV_CANCEL || + sdev->sdev_state == SDEV_DEL) err = SCSI_DH_NOSYS; + if (sdev->sdev_state == SDEV_OFFLINE) + err = SCSI_DH_DEV_OFFLINED; spin_unlock_irqrestore(q->queue_lock, flags); - if (err) + if (err) { + if (fn) + fn(data, err); return err; + } if (scsi_dh->activate) err = scsi_dh->activate(sdev, fn, data); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion 2010-12-16 15:52 ` Menny_Hamburger @ 2010-12-16 16:29 ` Mike Snitzer 2010-12-16 16:42 ` Menny_Hamburger 0 siblings, 1 reply; 12+ messages in thread From: Mike Snitzer @ 2010-12-16 16:29 UTC (permalink / raw) To: Menny_Hamburger; +Cc: Itay_Dar, device-mapper development, Rob_Thomas On Thu, Dec 16 2010 at 10:52am -0500, Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote: > 1) When I tested this problem, deleting the device set it temporarily > to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should > be solved with the function returning SCSI_DH_NOSYS. > Failing to the default on offline is OK with me since there is no > issue with !mw_handler_name because the device still exists. OK. > 2) Aborting the request queue does not obviate the need for this > patch, it's just that in a system where abort queue exists, the I/O > hang will not occur. > Another way to say this is that the abort queue functionality (if > it worked OK - which it doesn't) hides the problem from the user > experience by forcing all the queues to be flushed. But my point was the blk_abort_queue() wouldn't ever be called for this case would it? Because fail_path() isn't called for this SDEV_DEL and SDEV_CANCEL corner case (hence the need for your patch). So the relation to blk_abort_queue() is still muddled for me. > My preference > was always to make sure a problem (in this case the I/O hang) does > not occur in the first place rather than take actions when it does. Sure, I agree, and given blk_abort_queue() is prone to crash -- albeit a fairly rare race -- we need a solution that doesn't rely on blk_abort_qeue. But I'm just missing how blk_abort_queue() was helping this case (can't see how it would ever be called). Mike > -----Original Message----- > From: Mike Snitzer [mailto:snitzer@redhat.com] > Sent: 16 December, 2010 17:30 > To: Hamburger, Menny > Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay > Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion > > On Thu, Dec 16 2010 at 9:21am -0500, > Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote: > > > Mike, > > > > What I meant is that removing the dm-mpath change from the patch makes > > the operation path go into an area that I did not test (the default > > case). > > I know it's only a !hw_handler_name and a printk (as I said before - > > this might sound superstitious), however I did my tests with the > > dm-mpath part, and I prefer to send a 100% tested patch then to add > > something I did not test. > > My suggestion was to supply a scsi_dh only patch that goes through the > > same operation path as the one I tested - via the SCSI_DH_NOSYS case > > in pg_init_done. In addition, the separation of offline from > > cancel/del also seems the correct way to do this. > > I see, my misunderstanding. > > - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's > pg_init_done isn't updated to explicitly handle it it'll just > fail_path() via the default case. Are you OK with that? > > - In your initial patch you said: "When running an upstream kernel, > the above scenario may not occur because the request queue is aborted > when the multipath fails the path." > > I'm missing _why_ fail_path's call to blk_abort_queue() would obviate > the need for this patch. blk_abort_queue() is only called from > fail_path(). But I thought the problem is fail_path() isn't called in > this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED? > -- due to the missing call to activate_complete callback (pg_init_done) > (again, I'm very interested in this because we'll be reverting DM > mpath's call to blk_abort_queue in the near future). > > But this would be the revised scsi_dh patch (I'm not sending to > linux-scsi until I have an answer for the above concerns): > > > From: Menny Hamburger <Menny_Hamburger@Dell.com> > > When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete > callback is not called and the error is not propagated to DM mpath. > > When a SCSI device attached to a device handler is deleted, userland > processes currently performing I/O on the device will have their I/O > hang forever. > > - Set SCSI_DH_NOSYS error when the handler is in the process of being > deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state). > > - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state. > > - Call the activate_complete callback function directly from > scsi_dh_activate if an error has been set (when either the scsi_dh > internal data has already been deleted or is in the process of being > deleted). > > The patch was tested in an ISCSI environment, RDAC H/W handler and > multipath. > In the following reproduction process, dd will I/O hang forever and the > only way to release it will be to reboot the machine: > 1) Perform I/O on a multipath device: > dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 & > 2) Delete all slave SCSI devices contained in the mpath device: > I) In an ISCSI environment, the easiest way to do this is by > stopping ISCSI: > /etc/init.d/iscsi stop > II) Another way to delete the devices is by applying the following > bash scriptlet: > dm_devs=$(ls /sys/block/ | grep dm- | xargs) > for dm_dev in $dm_devs; do > devices=$(ls /sys/block/$dm_dev/slaves) > for device in $devices; do > echo 1 > /sys/block/$device/device/delete > done > done > > NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't > required. However, DM mpath's call to blk_abort_queue has proven to be > unsafe due to a race (between blk_abort_queue and scsi_request_fn) that > can lead to list corruption. Therefore we cannot rely on > blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and > will only be restored once the race with scsi_request_fn is fixed). > > Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com> > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > --- > drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c > index 6fae3d2..b0c56f6 100644 > --- a/drivers/scsi/device_handler/scsi_dh.c > +++ b/drivers/scsi/device_handler/scsi_dh.c > @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data) > sdev = q->queuedata; > if (sdev && sdev->scsi_dh_data) > scsi_dh = sdev->scsi_dh_data->scsi_dh; > - if (!scsi_dh || !get_device(&sdev->sdev_gendev)) > + if (!scsi_dh || !get_device(&sdev->sdev_gendev) || > + sdev->sdev_state == SDEV_CANCEL || > + sdev->sdev_state == SDEV_DEL) > err = SCSI_DH_NOSYS; > + if (sdev->sdev_state == SDEV_OFFLINE) > + err = SCSI_DH_DEV_OFFLINED; > spin_unlock_irqrestore(q->queue_lock, flags); > > - if (err) > + if (err) { > + if (fn) > + fn(data, err); > return err; > + } > > if (scsi_dh->activate) > err = scsi_dh->activate(sdev, fn, data); > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion 2010-12-16 16:29 ` Mike Snitzer @ 2010-12-16 16:42 ` Menny_Hamburger 2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer 0 siblings, 1 reply; 12+ messages in thread From: Menny_Hamburger @ 2010-12-16 16:42 UTC (permalink / raw) To: snitzer; +Cc: Itay_Dar, dm-devel, Rob_Thomas Mike, Don't forget you have multipathd running in userland that tests each path from user space: When multipathd detects a path failure it tells the driver to call fail_path on that path - on a system with blk abort this should be enough (although again - you are making the problem go away instead of preventing it) since all processing doing I/O will be stopped on I/O error. On a system without blk_abort, doing fail_path is not enough - it just fails the path but does not flush the request queue. pg_init_done does other things in addition to failing the path - specifically it schedules process_queued_ios that flushes the request queue. Menny -----Original Message----- From: Mike Snitzer [mailto:snitzer@redhat.com] Sent: 16 December, 2010 18:30 To: Hamburger, Menny Cc: Dar, Itay; Thomas, Rob; device-mapper development Subject: Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion On Thu, Dec 16 2010 at 10:52am -0500, Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote: > 1) When I tested this problem, deleting the device set it temporarily > to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should > be solved with the function returning SCSI_DH_NOSYS. > Failing to the default on offline is OK with me since there is no > issue with !mw_handler_name because the device still exists. OK. > 2) Aborting the request queue does not obviate the need for this > patch, it's just that in a system where abort queue exists, the I/O > hang will not occur. > Another way to say this is that the abort queue functionality (if > it worked OK - which it doesn't) hides the problem from the user > experience by forcing all the queues to be flushed. But my point was the blk_abort_queue() wouldn't ever be called for this case would it? Because fail_path() isn't called for this SDEV_DEL and SDEV_CANCEL corner case (hence the need for your patch). So the relation to blk_abort_queue() is still muddled for me. > My preference > was always to make sure a problem (in this case the I/O hang) does > not occur in the first place rather than take actions when it does. Sure, I agree, and given blk_abort_queue() is prone to crash -- albeit a fairly rare race -- we need a solution that doesn't rely on blk_abort_qeue. But I'm just missing how blk_abort_queue() was helping this case (can't see how it would ever be called). Mike > -----Original Message----- > From: Mike Snitzer [mailto:snitzer@redhat.com] > Sent: 16 December, 2010 17:30 > To: Hamburger, Menny > Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay > Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion > > On Thu, Dec 16 2010 at 9:21am -0500, > Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote: > > > Mike, > > > > What I meant is that removing the dm-mpath change from the patch makes > > the operation path go into an area that I did not test (the default > > case). > > I know it's only a !hw_handler_name and a printk (as I said before - > > this might sound superstitious), however I did my tests with the > > dm-mpath part, and I prefer to send a 100% tested patch then to add > > something I did not test. > > My suggestion was to supply a scsi_dh only patch that goes through the > > same operation path as the one I tested - via the SCSI_DH_NOSYS case > > in pg_init_done. In addition, the separation of offline from > > cancel/del also seems the correct way to do this. > > I see, my misunderstanding. > > - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's > pg_init_done isn't updated to explicitly handle it it'll just > fail_path() via the default case. Are you OK with that? > > - In your initial patch you said: "When running an upstream kernel, > the above scenario may not occur because the request queue is aborted > when the multipath fails the path." > > I'm missing _why_ fail_path's call to blk_abort_queue() would obviate > the need for this patch. blk_abort_queue() is only called from > fail_path(). But I thought the problem is fail_path() isn't called in > this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED? > -- due to the missing call to activate_complete callback (pg_init_done) > (again, I'm very interested in this because we'll be reverting DM > mpath's call to blk_abort_queue in the near future). > > But this would be the revised scsi_dh patch (I'm not sending to > linux-scsi until I have an answer for the above concerns): > > > From: Menny Hamburger <Menny_Hamburger@Dell.com> > > When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete > callback is not called and the error is not propagated to DM mpath. > > When a SCSI device attached to a device handler is deleted, userland > processes currently performing I/O on the device will have their I/O > hang forever. > > - Set SCSI_DH_NOSYS error when the handler is in the process of being > deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state). > > - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state. > > - Call the activate_complete callback function directly from > scsi_dh_activate if an error has been set (when either the scsi_dh > internal data has already been deleted or is in the process of being > deleted). > > The patch was tested in an ISCSI environment, RDAC H/W handler and > multipath. > In the following reproduction process, dd will I/O hang forever and the > only way to release it will be to reboot the machine: > 1) Perform I/O on a multipath device: > dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 & > 2) Delete all slave SCSI devices contained in the mpath device: > I) In an ISCSI environment, the easiest way to do this is by > stopping ISCSI: > /etc/init.d/iscsi stop > II) Another way to delete the devices is by applying the following > bash scriptlet: > dm_devs=$(ls /sys/block/ | grep dm- | xargs) > for dm_dev in $dm_devs; do > devices=$(ls /sys/block/$dm_dev/slaves) > for device in $devices; do > echo 1 > /sys/block/$device/device/delete > done > done > > NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't > required. However, DM mpath's call to blk_abort_queue has proven to be > unsafe due to a race (between blk_abort_queue and scsi_request_fn) that > can lead to list corruption. Therefore we cannot rely on > blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and > will only be restored once the race with scsi_request_fn is fixed). > > Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com> > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > --- > drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c > index 6fae3d2..b0c56f6 100644 > --- a/drivers/scsi/device_handler/scsi_dh.c > +++ b/drivers/scsi/device_handler/scsi_dh.c > @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data) > sdev = q->queuedata; > if (sdev && sdev->scsi_dh_data) > scsi_dh = sdev->scsi_dh_data->scsi_dh; > - if (!scsi_dh || !get_device(&sdev->sdev_gendev)) > + if (!scsi_dh || !get_device(&sdev->sdev_gendev) || > + sdev->sdev_state == SDEV_CANCEL || > + sdev->sdev_state == SDEV_DEL) > err = SCSI_DH_NOSYS; > + if (sdev->sdev_state == SDEV_OFFLINE) > + err = SCSI_DH_DEV_OFFLINED; > spin_unlock_irqrestore(q->queue_lock, flags); > > - if (err) > + if (err) { > + if (fn) > + fn(data, err); > return err; > + } > > if (scsi_dh->activate) > err = scsi_dh->activate(sdev, fn, data); > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3][SCSI] scsi_dh: propagate SCSI device deletion 2010-12-16 16:42 ` Menny_Hamburger @ 2010-12-16 19:57 ` Mike Snitzer 2010-12-17 15:46 ` Moger, Babu 0 siblings, 1 reply; 12+ messages in thread From: Mike Snitzer @ 2010-12-16 19:57 UTC (permalink / raw) To: linux-scsi; +Cc: Menny_Hamburger, Itay_Dar, Rob_Thomas, dm-devel From: Menny Hamburger <Menny_Hamburger@Dell.com> Currently, when scsi_dh_activate() returns with an error (e.g. SCSI_DH_NOSYS) the activate_complete callback is not called and the error is not propagated to DM mpath. When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will have their I/O hang forever. - Set SCSI_DH_NOSYS error when the handler is in the process of being deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state). - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state. - Call the activate_complete callback function directly from scsi_dh_activate if an error has been set (when either the scsi_dh internal data has already been deleted or is in the process of being deleted). The patch was tested in an iSCSI environment, RDAC H/W handler and multipath. In the following reproduction process, dd will I/O hang forever and the only way to release it will be to reboot the machine: 1) Perform I/O on a multipath device: dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 & 2) Delete all slave SCSI devices contained in the mpath device: I) In an iSCSI environment, the easiest way to do this is by stopping iSCSI: /etc/init.d/iscsi stop II) Another way to delete the devices is by applying the following bash scriptlet: dm_devs=$(ls /sys/block/ | grep dm- | xargs) for dm_dev in $dm_devs; do devices=$(ls /sys/block/$dm_dev/slaves) for device in $devices; do echo 1 > /sys/block/$device/device/delete done done NOTE: when DM mpath's fail_path uses blk_abort_queue this scsi_dh change isn't strictly required. However, DM mpath's call to blk_abort_queue will soon be reverted because it has proven to be unsafe due to a race (between blk_abort_queue and scsi_request_fn) that can lead to list corruption. Therefore we cannot rely on blk_abort_queue via fail_path, but even if we could this scsi_dh change is still preferrable. Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) v1 and v2 were posted/discussed on dm-devel v3: just tweaked the patch header a bit diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c index 6fae3d2..b0c56f6 100644 --- a/drivers/scsi/device_handler/scsi_dh.c +++ b/drivers/scsi/device_handler/scsi_dh.c @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data) sdev = q->queuedata; if (sdev && sdev->scsi_dh_data) scsi_dh = sdev->scsi_dh_data->scsi_dh; - if (!scsi_dh || !get_device(&sdev->sdev_gendev)) + if (!scsi_dh || !get_device(&sdev->sdev_gendev) || + sdev->sdev_state == SDEV_CANCEL || + sdev->sdev_state == SDEV_DEL) err = SCSI_DH_NOSYS; + if (sdev->sdev_state == SDEV_OFFLINE) + err = SCSI_DH_DEV_OFFLINED; spin_unlock_irqrestore(q->queue_lock, flags); - if (err) + if (err) { + if (fn) + fn(data, err); return err; + } if (scsi_dh->activate) err = scsi_dh->activate(sdev, fn, data); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* RE: [PATCH v3][SCSI] scsi_dh: propagate SCSI device deletion 2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer @ 2010-12-17 15:46 ` Moger, Babu 0 siblings, 0 replies; 12+ messages in thread From: Moger, Babu @ 2010-12-17 15:46 UTC (permalink / raw) To: Mike Snitzer, linux-scsi@vger.kernel.org Cc: Menny_Hamburger@dell.com, Itay_Dar@dell.com, Rob_Thomas@dell.com, dm-devel@redhat.com Patches look good.. Reviewed-by: Babu Moger <babu.moger@lsi.com> > -----Original Message----- > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of Mike Snitzer > Sent: Thursday, December 16, 2010 1:57 PM > To: linux-scsi@vger.kernel.org > Cc: Menny_Hamburger@dell.com; Itay_Dar@dell.com; Rob_Thomas@dell.com; dm- > devel@redhat.com > Subject: [PATCH v3][SCSI] scsi_dh: propagate SCSI device deletion > > From: Menny Hamburger <Menny_Hamburger@Dell.com> > > Currently, when scsi_dh_activate() returns with an error > (e.g. SCSI_DH_NOSYS) the activate_complete callback is not called and > the error is not propagated to DM mpath. > > When a SCSI device attached to a device handler is deleted, userland > processes currently performing I/O on the device will have their I/O > hang forever. > > - Set SCSI_DH_NOSYS error when the handler is in the process of being > deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state). > > - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state. > > - Call the activate_complete callback function directly from > scsi_dh_activate if an error has been set (when either the scsi_dh > internal data has already been deleted or is in the process of being > deleted). > > The patch was tested in an iSCSI environment, RDAC H/W handler and > multipath. In the following reproduction process, dd will I/O hang > forever and the only way to release it will be to reboot the machine: > 1) Perform I/O on a multipath device: > dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 & > 2) Delete all slave SCSI devices contained in the mpath device: > I) In an iSCSI environment, the easiest way to do this is by > stopping iSCSI: > /etc/init.d/iscsi stop > II) Another way to delete the devices is by applying the following > bash scriptlet: > dm_devs=$(ls /sys/block/ | grep dm- | xargs) > for dm_dev in $dm_devs; do > devices=$(ls /sys/block/$dm_dev/slaves) > for device in $devices; do > echo 1 > /sys/block/$device/device/delete > done > done > > NOTE: when DM mpath's fail_path uses blk_abort_queue this scsi_dh change > isn't strictly required. However, DM mpath's call to blk_abort_queue > will soon be reverted because it has proven to be unsafe due to a race > (between blk_abort_queue and scsi_request_fn) that can lead to list > corruption. Therefore we cannot rely on blk_abort_queue via fail_path, > but even if we could this scsi_dh change is still preferrable. > > Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com> > Signed-off-by: Mike Snitzer <snitzer@redhat.com> > --- > drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > v1 and v2 were posted/discussed on dm-devel > v3: just tweaked the patch header a bit > > diff --git a/drivers/scsi/device_handler/scsi_dh.c > b/drivers/scsi/device_handler/scsi_dh.c > index 6fae3d2..b0c56f6 100644 > --- a/drivers/scsi/device_handler/scsi_dh.c > +++ b/drivers/scsi/device_handler/scsi_dh.c > @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, > activate_complete fn, void *data) > sdev = q->queuedata; > if (sdev && sdev->scsi_dh_data) > scsi_dh = sdev->scsi_dh_data->scsi_dh; > - if (!scsi_dh || !get_device(&sdev->sdev_gendev)) > + if (!scsi_dh || !get_device(&sdev->sdev_gendev) || > + sdev->sdev_state == SDEV_CANCEL || > + sdev->sdev_state == SDEV_DEL) > err = SCSI_DH_NOSYS; > + if (sdev->sdev_state == SDEV_OFFLINE) > + err = SCSI_DH_DEV_OFFLINED; > spin_unlock_irqrestore(q->queue_lock, flags); > > - if (err) > + if (err) { > + if (fn) > + fn(data, err); > return err; > + } > > if (scsi_dh->activate) > err = scsi_dh->activate(sdev, fn, data); > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-12-17 15:46 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger 2010-12-15 16:03 ` Moger, Babu 2010-12-15 16:09 ` Mike Snitzer 2010-12-16 7:32 ` Menny_Hamburger 2010-12-16 14:02 ` Mike Snitzer 2010-12-16 14:21 ` Menny_Hamburger 2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer 2010-12-16 15:52 ` Menny_Hamburger 2010-12-16 16:29 ` Mike Snitzer 2010-12-16 16:42 ` Menny_Hamburger 2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer 2010-12-17 15:46 ` Moger, Babu
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.