* [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
@ 2010-12-15 8:31 Menny_Hamburger
2010-12-15 16:03 ` Moger, Babu
2010-12-15 16:09 ` Mike Snitzer
0 siblings, 2 replies; 12+ messages in thread
From: Menny_Hamburger @ 2010-12-15 8:31 UTC (permalink / raw)
To: dm-devel
[-- Attachment #1.1: Type: text/plain, Size: 346 bytes --]
The problem:
When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
Attached is the multipath layer part of the patch.
Menny Hamburger
Engineer
Dell | IDC
office +972 97698789, fax +972 97698889
Dell IDC. 4 Hacharoshet St, Raanana 43657, Israel
[-- Attachment #1.2: Type: text/html, Size: 3409 bytes --]
[-- Attachment #2: md-handle-device-deletion.patch --]
[-- Type: application/octet-stream, Size: 687 bytes --]
From: Menny Hamburger <Menny_Hamburger@Dell.com>
Subject: [md] dm-multipath: Handle hardware handler deletion in pg_init_done
Bugzilla: 645343
When the SCSI device H/W handler is deleted or offlined, pg_init_done should fail the path.
Failing all the paths associated with the H/W handler will eventually terminate all I/O requests queued on the multipath device
diff -r -U 2 a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
--- a/drivers/md/dm-mpath.c 2010-12-15 10:08:13.243991000 +0200
+++ b/drivers/md/dm-mpath.c 2010-12-15 10:08:13.329736000 +0200
@@ -1190,4 +1190,5 @@
case SCSI_DH_OK:
break;
+ case SCSI_DH_DEV_OFFLINED:
case SCSI_DH_NOSYS:
if (!m->hw_handler_name) {
[-- Attachment #3: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger
@ 2010-12-15 16:03 ` Moger, Babu
2010-12-15 16:09 ` Mike Snitzer
1 sibling, 0 replies; 12+ messages in thread
From: Moger, Babu @ 2010-12-15 16:03 UTC (permalink / raw)
To: device-mapper development
Patches look good..
Reviewed-by: Babu Moger <babu.moger@lsi.com>
________________________________________
From: dm-devel-bounces@redhat.com [mailto:dm-devel-bounces@redhat.com] On Behalf Of Menny_Hamburger@dell.com
Sent: Wednesday, December 15, 2010 2:31 AM
To: dm-devel@redhat.com
Subject: [dm-devel] [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
The problem:
When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
Attached is the multipath layer part of the patch.
Menny Hamburger
Engineer
Dell | IDC
office +972 97698789, fax +972 97698889
Dell IDC. 4 Hacharoshet St, Raanana 43657, Israel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger
2010-12-15 16:03 ` Moger, Babu
@ 2010-12-15 16:09 ` Mike Snitzer
2010-12-16 7:32 ` Menny_Hamburger
1 sibling, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2010-12-15 16:09 UTC (permalink / raw)
To: Menny_Hamburger; +Cc: dm-devel
On Wed, Dec 15 2010 at 3:31am -0500,
Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
> The problem:
> When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
>
> Attached is the multipath layer part of the patch.
Please do not use attachments when submitting patches. Please inline
the patch in the body of the mail. Also please trim extraneous info
when you reply to mails and please do not top-post.
Your patch submissions were not well received by the dm-devel patchwork
(which automatically collects submitted dm-devel patches so they don't
get lost in the shuffle).
Also, please use unique and descriptive subjects for the patches in a
multipatch series.
As for the proposed mpath change: Babu already pointed out that you
don't need this mpath change as the default case already performs
fail_path().
So all you need is that scsi_dh patch to get upstream. But to do that
you'll need to:
1) create a patch against the upstream scsi-misc tree (not RHEL5.5):
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel
free to cc dm-devel too).
Once this change is in the upstream scsi tree it'll propagate to RHEL
releases.
Thanks,
Mike
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
2010-12-15 16:09 ` Mike Snitzer
@ 2010-12-16 7:32 ` Menny_Hamburger
2010-12-16 14:02 ` Mike Snitzer
0 siblings, 1 reply; 12+ messages in thread
From: Menny_Hamburger @ 2010-12-16 7:32 UTC (permalink / raw)
To: snitzer; +Cc: dm-devel, Itay_Dar, Rob_Thomas
Hi Mike,
I am sorry about the submission problems, there's a first time for everything...
I know this sounds superstitious but I did run quite thorough tests on this issue and I would really prefer that the proposed patch
would go through the same code path as the one I have tested.
My suggestion is to modify the scsi_dh code as follows:
if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
sdev->sdev_state == SDEV_CANCEL ||
sdev->sdev_state == SDEV_DEL)
err = SCSI_DH_NOSYS;
if (sdev->sdev_state == SDEV_OFFLINE)
err = SCSI_DH_DEV_OFFLINED;
if (err) {
if (fn)
fn(fn_data, err);
return err;
}
There is logic and benefits here:
1) SDEV_CANCEL/SDEV_DEV are just a stage before the DH is deleted so we can respond with the same SCSI_DH_NOSYS.
I think it's best here to handle these together, since they are in fact the same thing.
In addition, when we delete a device, I don't think it goes through an offline state.
2) SDEV_OFFLINE is easily reverted by echoing to the state of the device - it's not like detaching.
Returning SCSI_DH_DEV_OFFLINED will just call the default case and perform fail_path - no need to worry about any handler issues here.
3) It will let me sleep better at night...
Best Regards,
Menny
-----Original Message-----
From: Mike Snitzer [mailto:snitzer@redhat.com]
Sent: 15 December, 2010 18:10
To: Hamburger, Menny
Cc: dm-devel@redhat.com
Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
On Wed, Dec 15 2010 at 3:31am -0500,
Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
> The problem:
> When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
>
> Attached is the multipath layer part of the patch.
Please do not use attachments when submitting patches. Please inline
the patch in the body of the mail. Also please trim extraneous info
when you reply to mails and please do not top-post.
Your patch submissions were not well received by the dm-devel patchwork
(which automatically collects submitted dm-devel patches so they don't
get lost in the shuffle).
Also, please use unique and descriptive subjects for the patches in a
multipatch series.
As for the proposed mpath change: Babu already pointed out that you
don't need this mpath change as the default case already performs
fail_path().
So all you need is that scsi_dh patch to get upstream. But to do that
you'll need to:
1) create a patch against the upstream scsi-misc tree (not RHEL5.5):
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel
free to cc dm-devel too).
Once this change is in the upstream scsi tree it'll propagate to RHEL
releases.
Thanks,
Mike
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
2010-12-16 7:32 ` Menny_Hamburger
@ 2010-12-16 14:02 ` Mike Snitzer
2010-12-16 14:21 ` Menny_Hamburger
0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2010-12-16 14:02 UTC (permalink / raw)
To: Menny_Hamburger; +Cc: dm-devel, Itay_Dar, Rob_Thomas
On Thu, Dec 16 2010 at 2:32am -0500,
Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
> Hi Mike,
>
> I am sorry about the submission problems, there's a first time for everything...
>
> I know this sounds superstitious but I did run quite thorough tests on
> this issue and I would really prefer that the proposed patch would go
> through the same code path as the one I have tested.
>
> My suggestion is to modify the scsi_dh code as follows:
>
> if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
> sdev->sdev_state == SDEV_CANCEL ||
> sdev->sdev_state == SDEV_DEL)
> err = SCSI_DH_NOSYS;
>
> if (sdev->sdev_state == SDEV_OFFLINE)
> err = SCSI_DH_DEV_OFFLINED;
>
> if (err) {
> if (fn)
> fn(fn_data, err);
>
> return err;
> }
>
> There is logic and benefits here:
> 1) SDEV_CANCEL/SDEV_DEV are just a stage before the DH is deleted so we can respond with the same SCSI_DH_NOSYS.
> I think it's best here to handle these together, since they are in fact the same thing.
> In addition, when we delete a device, I don't think it goes through an offline state.
> 2) SDEV_OFFLINE is easily reverted by echoing to the state of the device - it's not like detaching.
> Returning SCSI_DH_DEV_OFFLINED will just call the default case and perform fail_path - no need to worry about any handler issues here.
> 3) It will let me sleep better at night...
I think you misunderstood me.
We had a bugzilla exchange yesterday (and as you stated above): the
dm-mpath change isn't required so it is only the scsi_dh patch that
needs to go upstream.
I didn't take issue with your scsi_dh patch. What I shared (below) was
that dm-devel isn't the appropriate place to email a patch that is
exclussively for scsi_dh. (This is just the tedious bit of "how does
this change go upstream?").
I'll take care of rebasing your scsi_dh patch to the scsi-misc git tree
I referenced and then send the patch to linux-scsi on your behalf.
Regards,
Mike
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: 15 December, 2010 18:10
> To: Hamburger, Menny
> Cc: dm-devel@redhat.com
> Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
>
> On Wed, Dec 15 2010 at 3:31am -0500,
> Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
>
> > The problem:
> > When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
> >
> > Attached is the multipath layer part of the patch.
>
> Please do not use attachments when submitting patches. Please inline
> the patch in the body of the mail. Also please trim extraneous info
> when you reply to mails and please do not top-post.
>
> Your patch submissions were not well received by the dm-devel patchwork
> (which automatically collects submitted dm-devel patches so they don't
> get lost in the shuffle).
>
> Also, please use unique and descriptive subjects for the patches in a
> multipatch series.
>
> As for the proposed mpath change: Babu already pointed out that you
> don't need this mpath change as the default case already performs
> fail_path().
>
> So all you need is that scsi_dh patch to get upstream. But to do that
> you'll need to:
> 1) create a patch against the upstream scsi-misc tree (not RHEL5.5):
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
>
> 2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel
> free to cc dm-devel too).
>
> Once this change is in the upstream scsi tree it'll propagate to RHEL
> releases.
>
> Thanks,
> Mike
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
2010-12-16 14:02 ` Mike Snitzer
@ 2010-12-16 14:21 ` Menny_Hamburger
2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer
0 siblings, 1 reply; 12+ messages in thread
From: Menny_Hamburger @ 2010-12-16 14:21 UTC (permalink / raw)
To: snitzer; +Cc: dm-devel, Itay_Dar, Rob_Thomas
Mike,
What I meant is that removing the dm-mpath change from the patch makes the operation path go into an area that I did not test (the default case).
I know it's only a !hw_handler_name and a printk (as I said before - this might sound superstitious), however I did my tests with the dm-mpath part,
and I prefer to send a 100% tested patch then to add something I did not test.
My suggestion was to supply a scsi_dh only patch that goes through the same operation path as the one I tested - via the SCSI_DH_NOSYS case in pg_init_done. In addition, the separation of offline from cancel/del also seems the correct way to do this.
Hope this makes sense,
Menny
-----Original Message-----
From: Mike Snitzer [mailto:snitzer@redhat.com]
Sent: 16 December, 2010 16:02
To: Hamburger, Menny
Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay
Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
On Thu, Dec 16 2010 at 2:32am -0500,
Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
> Hi Mike,
>
> I am sorry about the submission problems, there's a first time for everything...
>
> I know this sounds superstitious but I did run quite thorough tests on
> this issue and I would really prefer that the proposed patch would go
> through the same code path as the one I have tested.
>
> My suggestion is to modify the scsi_dh code as follows:
>
> if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
> sdev->sdev_state == SDEV_CANCEL ||
> sdev->sdev_state == SDEV_DEL)
> err = SCSI_DH_NOSYS;
>
> if (sdev->sdev_state == SDEV_OFFLINE)
> err = SCSI_DH_DEV_OFFLINED;
>
> if (err) {
> if (fn)
> fn(fn_data, err);
>
> return err;
> }
>
> There is logic and benefits here:
> 1) SDEV_CANCEL/SDEV_DEV are just a stage before the DH is deleted so we can respond with the same SCSI_DH_NOSYS.
> I think it's best here to handle these together, since they are in fact the same thing.
> In addition, when we delete a device, I don't think it goes through an offline state.
> 2) SDEV_OFFLINE is easily reverted by echoing to the state of the device - it's not like detaching.
> Returning SCSI_DH_DEV_OFFLINED will just call the default case and perform fail_path - no need to worry about any handler issues here.
> 3) It will let me sleep better at night...
I think you misunderstood me.
We had a bugzilla exchange yesterday (and as you stated above): the
dm-mpath change isn't required so it is only the scsi_dh patch that
needs to go upstream.
I didn't take issue with your scsi_dh patch. What I shared (below) was
that dm-devel isn't the appropriate place to email a patch that is
exclussively for scsi_dh. (This is just the tedious bit of "how does
this change go upstream?").
I'll take care of rebasing your scsi_dh patch to the scsi-misc git tree
I referenced and then send the patch to linux-scsi on your behalf.
Regards,
Mike
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: 15 December, 2010 18:10
> To: Hamburger, Menny
> Cc: dm-devel@redhat.com
> Subject: Re: [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath
>
> On Wed, Dec 15 2010 at 3:31am -0500,
> Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
>
> > The problem:
> > When a SCSI device attached to a device handler is deleted, userland processes currently performing I/O on the device will I/O hang forever.
> >
> > Attached is the multipath layer part of the patch.
>
> Please do not use attachments when submitting patches. Please inline
> the patch in the body of the mail. Also please trim extraneous info
> when you reply to mails and please do not top-post.
>
> Your patch submissions were not well received by the dm-devel patchwork
> (which automatically collects submitted dm-devel patches so they don't
> get lost in the shuffle).
>
> Also, please use unique and descriptive subjects for the patches in a
> multipatch series.
>
> As for the proposed mpath change: Babu already pointed out that you
> don't need this mpath change as the default case already performs
> fail_path().
>
> So all you need is that scsi_dh patch to get upstream. But to do that
> you'll need to:
> 1) create a patch against the upstream scsi-misc tree (not RHEL5.5):
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
>
> 2) send the patch to the linux-scsi@vger.kernel.org mailing list (feel
> free to cc dm-devel too).
>
> Once this change is in the upstream scsi tree it'll propagate to RHEL
> releases.
>
> Thanks,
> Mike
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
2010-12-16 14:21 ` Menny_Hamburger
@ 2010-12-16 15:29 ` Mike Snitzer
2010-12-16 15:52 ` Menny_Hamburger
0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2010-12-16 15:29 UTC (permalink / raw)
To: Menny_Hamburger; +Cc: dm-devel, Itay_Dar, Rob_Thomas
On Thu, Dec 16 2010 at 9:21am -0500,
Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
> Mike,
>
> What I meant is that removing the dm-mpath change from the patch makes
> the operation path go into an area that I did not test (the default
> case).
> I know it's only a !hw_handler_name and a printk (as I said before -
> this might sound superstitious), however I did my tests with the
> dm-mpath part, and I prefer to send a 100% tested patch then to add
> something I did not test.
> My suggestion was to supply a scsi_dh only patch that goes through the
> same operation path as the one I tested - via the SCSI_DH_NOSYS case
> in pg_init_done. In addition, the separation of offline from
> cancel/del also seems the correct way to do this.
I see, my misunderstanding.
- But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
pg_init_done isn't updated to explicitly handle it it'll just
fail_path() via the default case. Are you OK with that?
- In your initial patch you said: "When running an upstream kernel,
the above scenario may not occur because the request queue is aborted
when the multipath fails the path."
I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
the need for this patch. blk_abort_queue() is only called from
fail_path(). But I thought the problem is fail_path() isn't called in
this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
-- due to the missing call to activate_complete callback (pg_init_done)
(again, I'm very interested in this because we'll be reverting DM
mpath's call to blk_abort_queue in the near future).
But this would be the revised scsi_dh patch (I'm not sending to
linux-scsi until I have an answer for the above concerns):
From: Menny Hamburger <Menny_Hamburger@Dell.com>
When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
callback is not called and the error is not propagated to DM mpath.
When a SCSI device attached to a device handler is deleted, userland
processes currently performing I/O on the device will have their I/O
hang forever.
- Set SCSI_DH_NOSYS error when the handler is in the process of being
deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
- Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
- Call the activate_complete callback function directly from
scsi_dh_activate if an error has been set (when either the scsi_dh
internal data has already been deleted or is in the process of being
deleted).
The patch was tested in an ISCSI environment, RDAC H/W handler and
multipath.
In the following reproduction process, dd will I/O hang forever and the
only way to release it will be to reboot the machine:
1) Perform I/O on a multipath device:
dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
2) Delete all slave SCSI devices contained in the mpath device:
I) In an ISCSI environment, the easiest way to do this is by
stopping ISCSI:
/etc/init.d/iscsi stop
II) Another way to delete the devices is by applying the following
bash scriptlet:
dm_devs=$(ls /sys/block/ | grep dm- | xargs)
for dm_dev in $dm_devs; do
devices=$(ls /sys/block/$dm_dev/slaves)
for device in $devices; do
echo 1 > /sys/block/$device/device/delete
done
done
NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
required. However, DM mpath's call to blk_abort_queue has proven to be
unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
can lead to list corruption. Therefore we cannot rely on
blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
will only be restored once the race with scsi_request_fn is fixed).
Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 6fae3d2..b0c56f6 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
sdev = q->queuedata;
if (sdev && sdev->scsi_dh_data)
scsi_dh = sdev->scsi_dh_data->scsi_dh;
- if (!scsi_dh || !get_device(&sdev->sdev_gendev))
+ if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+ sdev->sdev_state == SDEV_CANCEL ||
+ sdev->sdev_state == SDEV_DEL)
err = SCSI_DH_NOSYS;
+ if (sdev->sdev_state == SDEV_OFFLINE)
+ err = SCSI_DH_DEV_OFFLINED;
spin_unlock_irqrestore(q->queue_lock, flags);
- if (err)
+ if (err) {
+ if (fn)
+ fn(data, err);
return err;
+ }
if (scsi_dh->activate)
err = scsi_dh->activate(sdev, fn, data);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer
@ 2010-12-16 15:52 ` Menny_Hamburger
2010-12-16 16:29 ` Mike Snitzer
0 siblings, 1 reply; 12+ messages in thread
From: Menny_Hamburger @ 2010-12-16 15:52 UTC (permalink / raw)
To: snitzer; +Cc: dm-devel, Itay_Dar, Rob_Thomas
1) When I tested this problem, deleting the device set it temporarily to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should be solved
with the function returning SCSI_DH_NOSYS.
Failing to the default on offline is OK with me since there is no issue with !mw_handler_name because the device still exists.
2) Aborting the request queue does not obviate the need for this patch, it's just that in a system where abort queue exists, the I/O hang will not occur.
Another way to say this is that the abort queue functionality (if it worked OK - which it doesn't) hides the problem from the user experience
by forcing all the queues to be flushed. My preference was always to make sure a problem (in this case the I/O hang) does not occur in the first place
rather than take actions when it does.
Hope this answers your questions.
Menny
-----Original Message-----
From: Mike Snitzer [mailto:snitzer@redhat.com]
Sent: 16 December, 2010 17:30
To: Hamburger, Menny
Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay
Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
On Thu, Dec 16 2010 at 9:21am -0500,
Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
> Mike,
>
> What I meant is that removing the dm-mpath change from the patch makes
> the operation path go into an area that I did not test (the default
> case).
> I know it's only a !hw_handler_name and a printk (as I said before -
> this might sound superstitious), however I did my tests with the
> dm-mpath part, and I prefer to send a 100% tested patch then to add
> something I did not test.
> My suggestion was to supply a scsi_dh only patch that goes through the
> same operation path as the one I tested - via the SCSI_DH_NOSYS case
> in pg_init_done. In addition, the separation of offline from
> cancel/del also seems the correct way to do this.
I see, my misunderstanding.
- But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
pg_init_done isn't updated to explicitly handle it it'll just
fail_path() via the default case. Are you OK with that?
- In your initial patch you said: "When running an upstream kernel,
the above scenario may not occur because the request queue is aborted
when the multipath fails the path."
I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
the need for this patch. blk_abort_queue() is only called from
fail_path(). But I thought the problem is fail_path() isn't called in
this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
-- due to the missing call to activate_complete callback (pg_init_done)
(again, I'm very interested in this because we'll be reverting DM
mpath's call to blk_abort_queue in the near future).
But this would be the revised scsi_dh patch (I'm not sending to
linux-scsi until I have an answer for the above concerns):
From: Menny Hamburger <Menny_Hamburger@Dell.com>
When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
callback is not called and the error is not propagated to DM mpath.
When a SCSI device attached to a device handler is deleted, userland
processes currently performing I/O on the device will have their I/O
hang forever.
- Set SCSI_DH_NOSYS error when the handler is in the process of being
deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
- Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
- Call the activate_complete callback function directly from
scsi_dh_activate if an error has been set (when either the scsi_dh
internal data has already been deleted or is in the process of being
deleted).
The patch was tested in an ISCSI environment, RDAC H/W handler and
multipath.
In the following reproduction process, dd will I/O hang forever and the
only way to release it will be to reboot the machine:
1) Perform I/O on a multipath device:
dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
2) Delete all slave SCSI devices contained in the mpath device:
I) In an ISCSI environment, the easiest way to do this is by
stopping ISCSI:
/etc/init.d/iscsi stop
II) Another way to delete the devices is by applying the following
bash scriptlet:
dm_devs=$(ls /sys/block/ | grep dm- | xargs)
for dm_dev in $dm_devs; do
devices=$(ls /sys/block/$dm_dev/slaves)
for device in $devices; do
echo 1 > /sys/block/$device/device/delete
done
done
NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
required. However, DM mpath's call to blk_abort_queue has proven to be
unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
can lead to list corruption. Therefore we cannot rely on
blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
will only be restored once the race with scsi_request_fn is fixed).
Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 6fae3d2..b0c56f6 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
sdev = q->queuedata;
if (sdev && sdev->scsi_dh_data)
scsi_dh = sdev->scsi_dh_data->scsi_dh;
- if (!scsi_dh || !get_device(&sdev->sdev_gendev))
+ if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+ sdev->sdev_state == SDEV_CANCEL ||
+ sdev->sdev_state == SDEV_DEL)
err = SCSI_DH_NOSYS;
+ if (sdev->sdev_state == SDEV_OFFLINE)
+ err = SCSI_DH_DEV_OFFLINED;
spin_unlock_irqrestore(q->queue_lock, flags);
- if (err)
+ if (err) {
+ if (fn)
+ fn(data, err);
return err;
+ }
if (scsi_dh->activate)
err = scsi_dh->activate(sdev, fn, data);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
2010-12-16 15:52 ` Menny_Hamburger
@ 2010-12-16 16:29 ` Mike Snitzer
2010-12-16 16:42 ` Menny_Hamburger
0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2010-12-16 16:29 UTC (permalink / raw)
To: Menny_Hamburger; +Cc: Itay_Dar, device-mapper development, Rob_Thomas
On Thu, Dec 16 2010 at 10:52am -0500,
Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
> 1) When I tested this problem, deleting the device set it temporarily
> to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should
> be solved with the function returning SCSI_DH_NOSYS.
> Failing to the default on offline is OK with me since there is no
> issue with !mw_handler_name because the device still exists.
OK.
> 2) Aborting the request queue does not obviate the need for this
> patch, it's just that in a system where abort queue exists, the I/O
> hang will not occur.
> Another way to say this is that the abort queue functionality (if
> it worked OK - which it doesn't) hides the problem from the user
> experience by forcing all the queues to be flushed.
But my point was the blk_abort_queue() wouldn't ever be called for this
case would it? Because fail_path() isn't called for this SDEV_DEL and
SDEV_CANCEL corner case (hence the need for your patch).
So the relation to blk_abort_queue() is still muddled for me.
> My preference
> was always to make sure a problem (in this case the I/O hang) does
> not occur in the first place rather than take actions when it does.
Sure, I agree, and given blk_abort_queue() is prone to crash -- albeit
a fairly rare race -- we need a solution that doesn't rely on
blk_abort_qeue. But I'm just missing how blk_abort_queue() was helping
this case (can't see how it would ever be called).
Mike
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: 16 December, 2010 17:30
> To: Hamburger, Menny
> Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay
> Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
>
> On Thu, Dec 16 2010 at 9:21am -0500,
> Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
>
> > Mike,
> >
> > What I meant is that removing the dm-mpath change from the patch makes
> > the operation path go into an area that I did not test (the default
> > case).
> > I know it's only a !hw_handler_name and a printk (as I said before -
> > this might sound superstitious), however I did my tests with the
> > dm-mpath part, and I prefer to send a 100% tested patch then to add
> > something I did not test.
> > My suggestion was to supply a scsi_dh only patch that goes through the
> > same operation path as the one I tested - via the SCSI_DH_NOSYS case
> > in pg_init_done. In addition, the separation of offline from
> > cancel/del also seems the correct way to do this.
>
> I see, my misunderstanding.
>
> - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
> pg_init_done isn't updated to explicitly handle it it'll just
> fail_path() via the default case. Are you OK with that?
>
> - In your initial patch you said: "When running an upstream kernel,
> the above scenario may not occur because the request queue is aborted
> when the multipath fails the path."
>
> I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
> the need for this patch. blk_abort_queue() is only called from
> fail_path(). But I thought the problem is fail_path() isn't called in
> this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
> -- due to the missing call to activate_complete callback (pg_init_done)
> (again, I'm very interested in this because we'll be reverting DM
> mpath's call to blk_abort_queue in the near future).
>
> But this would be the revised scsi_dh patch (I'm not sending to
> linux-scsi until I have an answer for the above concerns):
>
>
> From: Menny Hamburger <Menny_Hamburger@Dell.com>
>
> When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
> callback is not called and the error is not propagated to DM mpath.
>
> When a SCSI device attached to a device handler is deleted, userland
> processes currently performing I/O on the device will have their I/O
> hang forever.
>
> - Set SCSI_DH_NOSYS error when the handler is in the process of being
> deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
>
> - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
>
> - Call the activate_complete callback function directly from
> scsi_dh_activate if an error has been set (when either the scsi_dh
> internal data has already been deleted or is in the process of being
> deleted).
>
> The patch was tested in an ISCSI environment, RDAC H/W handler and
> multipath.
> In the following reproduction process, dd will I/O hang forever and the
> only way to release it will be to reboot the machine:
> 1) Perform I/O on a multipath device:
> dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
> 2) Delete all slave SCSI devices contained in the mpath device:
> I) In an ISCSI environment, the easiest way to do this is by
> stopping ISCSI:
> /etc/init.d/iscsi stop
> II) Another way to delete the devices is by applying the following
> bash scriptlet:
> dm_devs=$(ls /sys/block/ | grep dm- | xargs)
> for dm_dev in $dm_devs; do
> devices=$(ls /sys/block/$dm_dev/slaves)
> for device in $devices; do
> echo 1 > /sys/block/$device/device/delete
> done
> done
>
> NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
> required. However, DM mpath's call to blk_abort_queue has proven to be
> unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
> can lead to list corruption. Therefore we cannot rely on
> blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
> will only be restored once the race with scsi_request_fn is fixed).
>
> Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
> index 6fae3d2..b0c56f6 100644
> --- a/drivers/scsi/device_handler/scsi_dh.c
> +++ b/drivers/scsi/device_handler/scsi_dh.c
> @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
> sdev = q->queuedata;
> if (sdev && sdev->scsi_dh_data)
> scsi_dh = sdev->scsi_dh_data->scsi_dh;
> - if (!scsi_dh || !get_device(&sdev->sdev_gendev))
> + if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
> + sdev->sdev_state == SDEV_CANCEL ||
> + sdev->sdev_state == SDEV_DEL)
> err = SCSI_DH_NOSYS;
> + if (sdev->sdev_state == SDEV_OFFLINE)
> + err = SCSI_DH_DEV_OFFLINED;
> spin_unlock_irqrestore(q->queue_lock, flags);
>
> - if (err)
> + if (err) {
> + if (fn)
> + fn(data, err);
> return err;
> + }
>
> if (scsi_dh->activate)
> err = scsi_dh->activate(sdev, fn, data);
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
2010-12-16 16:29 ` Mike Snitzer
@ 2010-12-16 16:42 ` Menny_Hamburger
2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer
0 siblings, 1 reply; 12+ messages in thread
From: Menny_Hamburger @ 2010-12-16 16:42 UTC (permalink / raw)
To: snitzer; +Cc: Itay_Dar, dm-devel, Rob_Thomas
Mike,
Don't forget you have multipathd running in userland that tests each path from user space:
When multipathd detects a path failure it tells the driver to call fail_path on that path - on a system with blk abort this should be enough
(although again - you are making the problem go away instead of preventing it) since all processing doing I/O will be stopped on I/O error.
On a system without blk_abort, doing fail_path is not enough - it just fails the path but does not flush the request queue.
pg_init_done does other things in addition to failing the path - specifically it schedules process_queued_ios that flushes the request queue.
Menny
-----Original Message-----
From: Mike Snitzer [mailto:snitzer@redhat.com]
Sent: 16 December, 2010 18:30
To: Hamburger, Menny
Cc: Dar, Itay; Thomas, Rob; device-mapper development
Subject: Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
On Thu, Dec 16 2010 at 10:52am -0500,
Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
> 1) When I tested this problem, deleting the device set it temporarily
> to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should
> be solved with the function returning SCSI_DH_NOSYS.
> Failing to the default on offline is OK with me since there is no
> issue with !mw_handler_name because the device still exists.
OK.
> 2) Aborting the request queue does not obviate the need for this
> patch, it's just that in a system where abort queue exists, the I/O
> hang will not occur.
> Another way to say this is that the abort queue functionality (if
> it worked OK - which it doesn't) hides the problem from the user
> experience by forcing all the queues to be flushed.
But my point was the blk_abort_queue() wouldn't ever be called for this
case would it? Because fail_path() isn't called for this SDEV_DEL and
SDEV_CANCEL corner case (hence the need for your patch).
So the relation to blk_abort_queue() is still muddled for me.
> My preference
> was always to make sure a problem (in this case the I/O hang) does
> not occur in the first place rather than take actions when it does.
Sure, I agree, and given blk_abort_queue() is prone to crash -- albeit
a fairly rare race -- we need a solution that doesn't rely on
blk_abort_qeue. But I'm just missing how blk_abort_queue() was helping
this case (can't see how it would ever be called).
Mike
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: 16 December, 2010 17:30
> To: Hamburger, Menny
> Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay
> Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
>
> On Thu, Dec 16 2010 at 9:21am -0500,
> Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
>
> > Mike,
> >
> > What I meant is that removing the dm-mpath change from the patch makes
> > the operation path go into an area that I did not test (the default
> > case).
> > I know it's only a !hw_handler_name and a printk (as I said before -
> > this might sound superstitious), however I did my tests with the
> > dm-mpath part, and I prefer to send a 100% tested patch then to add
> > something I did not test.
> > My suggestion was to supply a scsi_dh only patch that goes through the
> > same operation path as the one I tested - via the SCSI_DH_NOSYS case
> > in pg_init_done. In addition, the separation of offline from
> > cancel/del also seems the correct way to do this.
>
> I see, my misunderstanding.
>
> - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
> pg_init_done isn't updated to explicitly handle it it'll just
> fail_path() via the default case. Are you OK with that?
>
> - In your initial patch you said: "When running an upstream kernel,
> the above scenario may not occur because the request queue is aborted
> when the multipath fails the path."
>
> I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
> the need for this patch. blk_abort_queue() is only called from
> fail_path(). But I thought the problem is fail_path() isn't called in
> this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
> -- due to the missing call to activate_complete callback (pg_init_done)
> (again, I'm very interested in this because we'll be reverting DM
> mpath's call to blk_abort_queue in the near future).
>
> But this would be the revised scsi_dh patch (I'm not sending to
> linux-scsi until I have an answer for the above concerns):
>
>
> From: Menny Hamburger <Menny_Hamburger@Dell.com>
>
> When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
> callback is not called and the error is not propagated to DM mpath.
>
> When a SCSI device attached to a device handler is deleted, userland
> processes currently performing I/O on the device will have their I/O
> hang forever.
>
> - Set SCSI_DH_NOSYS error when the handler is in the process of being
> deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
>
> - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
>
> - Call the activate_complete callback function directly from
> scsi_dh_activate if an error has been set (when either the scsi_dh
> internal data has already been deleted or is in the process of being
> deleted).
>
> The patch was tested in an ISCSI environment, RDAC H/W handler and
> multipath.
> In the following reproduction process, dd will I/O hang forever and the
> only way to release it will be to reboot the machine:
> 1) Perform I/O on a multipath device:
> dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
> 2) Delete all slave SCSI devices contained in the mpath device:
> I) In an ISCSI environment, the easiest way to do this is by
> stopping ISCSI:
> /etc/init.d/iscsi stop
> II) Another way to delete the devices is by applying the following
> bash scriptlet:
> dm_devs=$(ls /sys/block/ | grep dm- | xargs)
> for dm_dev in $dm_devs; do
> devices=$(ls /sys/block/$dm_dev/slaves)
> for device in $devices; do
> echo 1 > /sys/block/$device/device/delete
> done
> done
>
> NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
> required. However, DM mpath's call to blk_abort_queue has proven to be
> unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
> can lead to list corruption. Therefore we cannot rely on
> blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
> will only be restored once the race with scsi_request_fn is fixed).
>
> Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
> index 6fae3d2..b0c56f6 100644
> --- a/drivers/scsi/device_handler/scsi_dh.c
> +++ b/drivers/scsi/device_handler/scsi_dh.c
> @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
> sdev = q->queuedata;
> if (sdev && sdev->scsi_dh_data)
> scsi_dh = sdev->scsi_dh_data->scsi_dh;
> - if (!scsi_dh || !get_device(&sdev->sdev_gendev))
> + if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
> + sdev->sdev_state == SDEV_CANCEL ||
> + sdev->sdev_state == SDEV_DEL)
> err = SCSI_DH_NOSYS;
> + if (sdev->sdev_state == SDEV_OFFLINE)
> + err = SCSI_DH_DEV_OFFLINED;
> spin_unlock_irqrestore(q->queue_lock, flags);
>
> - if (err)
> + if (err) {
> + if (fn)
> + fn(data, err);
> return err;
> + }
>
> if (scsi_dh->activate)
> err = scsi_dh->activate(sdev, fn, data);
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3][SCSI] scsi_dh: propagate SCSI device deletion
2010-12-16 16:42 ` Menny_Hamburger
@ 2010-12-16 19:57 ` Mike Snitzer
2010-12-17 15:46 ` Moger, Babu
0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2010-12-16 19:57 UTC (permalink / raw)
To: linux-scsi; +Cc: Menny_Hamburger, Itay_Dar, Rob_Thomas, dm-devel
From: Menny Hamburger <Menny_Hamburger@Dell.com>
Currently, when scsi_dh_activate() returns with an error
(e.g. SCSI_DH_NOSYS) the activate_complete callback is not called and
the error is not propagated to DM mpath.
When a SCSI device attached to a device handler is deleted, userland
processes currently performing I/O on the device will have their I/O
hang forever.
- Set SCSI_DH_NOSYS error when the handler is in the process of being
deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
- Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
- Call the activate_complete callback function directly from
scsi_dh_activate if an error has been set (when either the scsi_dh
internal data has already been deleted or is in the process of being
deleted).
The patch was tested in an iSCSI environment, RDAC H/W handler and
multipath. In the following reproduction process, dd will I/O hang
forever and the only way to release it will be to reboot the machine:
1) Perform I/O on a multipath device:
dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
2) Delete all slave SCSI devices contained in the mpath device:
I) In an iSCSI environment, the easiest way to do this is by
stopping iSCSI:
/etc/init.d/iscsi stop
II) Another way to delete the devices is by applying the following
bash scriptlet:
dm_devs=$(ls /sys/block/ | grep dm- | xargs)
for dm_dev in $dm_devs; do
devices=$(ls /sys/block/$dm_dev/slaves)
for device in $devices; do
echo 1 > /sys/block/$device/device/delete
done
done
NOTE: when DM mpath's fail_path uses blk_abort_queue this scsi_dh change
isn't strictly required. However, DM mpath's call to blk_abort_queue
will soon be reverted because it has proven to be unsafe due to a race
(between blk_abort_queue and scsi_request_fn) that can lead to list
corruption. Therefore we cannot rely on blk_abort_queue via fail_path,
but even if we could this scsi_dh change is still preferrable.
Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
v1 and v2 were posted/discussed on dm-devel
v3: just tweaked the patch header a bit
diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 6fae3d2..b0c56f6 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
sdev = q->queuedata;
if (sdev && sdev->scsi_dh_data)
scsi_dh = sdev->scsi_dh_data->scsi_dh;
- if (!scsi_dh || !get_device(&sdev->sdev_gendev))
+ if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+ sdev->sdev_state == SDEV_CANCEL ||
+ sdev->sdev_state == SDEV_DEL)
err = SCSI_DH_NOSYS;
+ if (sdev->sdev_state == SDEV_OFFLINE)
+ err = SCSI_DH_DEV_OFFLINED;
spin_unlock_irqrestore(q->queue_lock, flags);
- if (err)
+ if (err) {
+ if (fn)
+ fn(data, err);
return err;
+ }
if (scsi_dh->activate)
err = scsi_dh->activate(sdev, fn, data);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* RE: [PATCH v3][SCSI] scsi_dh: propagate SCSI device deletion
2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer
@ 2010-12-17 15:46 ` Moger, Babu
0 siblings, 0 replies; 12+ messages in thread
From: Moger, Babu @ 2010-12-17 15:46 UTC (permalink / raw)
To: Mike Snitzer, linux-scsi@vger.kernel.org
Cc: Menny_Hamburger@dell.com, Itay_Dar@dell.com, Rob_Thomas@dell.com,
dm-devel@redhat.com
Patches look good..
Reviewed-by: Babu Moger <babu.moger@lsi.com>
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Mike Snitzer
> Sent: Thursday, December 16, 2010 1:57 PM
> To: linux-scsi@vger.kernel.org
> Cc: Menny_Hamburger@dell.com; Itay_Dar@dell.com; Rob_Thomas@dell.com; dm-
> devel@redhat.com
> Subject: [PATCH v3][SCSI] scsi_dh: propagate SCSI device deletion
>
> From: Menny Hamburger <Menny_Hamburger@Dell.com>
>
> Currently, when scsi_dh_activate() returns with an error
> (e.g. SCSI_DH_NOSYS) the activate_complete callback is not called and
> the error is not propagated to DM mpath.
>
> When a SCSI device attached to a device handler is deleted, userland
> processes currently performing I/O on the device will have their I/O
> hang forever.
>
> - Set SCSI_DH_NOSYS error when the handler is in the process of being
> deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
>
> - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
>
> - Call the activate_complete callback function directly from
> scsi_dh_activate if an error has been set (when either the scsi_dh
> internal data has already been deleted or is in the process of being
> deleted).
>
> The patch was tested in an iSCSI environment, RDAC H/W handler and
> multipath. In the following reproduction process, dd will I/O hang
> forever and the only way to release it will be to reboot the machine:
> 1) Perform I/O on a multipath device:
> dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
> 2) Delete all slave SCSI devices contained in the mpath device:
> I) In an iSCSI environment, the easiest way to do this is by
> stopping iSCSI:
> /etc/init.d/iscsi stop
> II) Another way to delete the devices is by applying the following
> bash scriptlet:
> dm_devs=$(ls /sys/block/ | grep dm- | xargs)
> for dm_dev in $dm_devs; do
> devices=$(ls /sys/block/$dm_dev/slaves)
> for device in $devices; do
> echo 1 > /sys/block/$device/device/delete
> done
> done
>
> NOTE: when DM mpath's fail_path uses blk_abort_queue this scsi_dh change
> isn't strictly required. However, DM mpath's call to blk_abort_queue
> will soon be reverted because it has proven to be unsafe due to a race
> (between blk_abort_queue and scsi_request_fn) that can lead to list
> corruption. Therefore we cannot rely on blk_abort_queue via fail_path,
> but even if we could this scsi_dh change is still preferrable.
>
> Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> v1 and v2 were posted/discussed on dm-devel
> v3: just tweaked the patch header a bit
>
> diff --git a/drivers/scsi/device_handler/scsi_dh.c
> b/drivers/scsi/device_handler/scsi_dh.c
> index 6fae3d2..b0c56f6 100644
> --- a/drivers/scsi/device_handler/scsi_dh.c
> +++ b/drivers/scsi/device_handler/scsi_dh.c
> @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q,
> activate_complete fn, void *data)
> sdev = q->queuedata;
> if (sdev && sdev->scsi_dh_data)
> scsi_dh = sdev->scsi_dh_data->scsi_dh;
> - if (!scsi_dh || !get_device(&sdev->sdev_gendev))
> + if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
> + sdev->sdev_state == SDEV_CANCEL ||
> + sdev->sdev_state == SDEV_DEL)
> err = SCSI_DH_NOSYS;
> + if (sdev->sdev_state == SDEV_OFFLINE)
> + err = SCSI_DH_DEV_OFFLINED;
> spin_unlock_irqrestore(q->queue_lock, flags);
>
> - if (err)
> + if (err) {
> + if (fn)
> + fn(data, err);
> return err;
> + }
>
> if (scsi_dh->activate)
> err = scsi_dh->activate(sdev, fn, data);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2010-12-17 15:46 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger
2010-12-15 16:03 ` Moger, Babu
2010-12-15 16:09 ` Mike Snitzer
2010-12-16 7:32 ` Menny_Hamburger
2010-12-16 14:02 ` Mike Snitzer
2010-12-16 14:21 ` Menny_Hamburger
2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer
2010-12-16 15:52 ` Menny_Hamburger
2010-12-16 16:29 ` Mike Snitzer
2010-12-16 16:42 ` Menny_Hamburger
2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer
2010-12-17 15:46 ` Moger, Babu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.