From: Mike Snitzer <snitzer@redhat.com>
To: Menny_Hamburger@dell.com
Cc: Itay_Dar@dell.com,
device-mapper development <dm-devel@redhat.com>,
Rob_Thomas@dell.com
Subject: Re: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
Date: Thu, 16 Dec 2010 11:29:51 -0500 [thread overview]
Message-ID: <20101216162951.GE23507@redhat.com> (raw)
In-Reply-To: <D8C50530D6022F40A817A35C40CC06A7064AFFD030@DUBX7MCDUB01.EMEA.DELL.COM>
On Thu, Dec 16 2010 at 10:52am -0500,
Menny_Hamburger@dell.com <Menny_Hamburger@dell.com> wrote:
> 1) When I tested this problem, deleting the device set it temporarily
> to SDEV_DEL (sometimes SDEV_CANCEL) state. So the main issue should
> be solved with the function returning SCSI_DH_NOSYS.
> Failing to the default on offline is OK with me since there is no
> issue with !mw_handler_name because the device still exists.
OK.
> 2) Aborting the request queue does not obviate the need for this
> patch, it's just that in a system where abort queue exists, the I/O
> hang will not occur.
> Another way to say this is that the abort queue functionality (if
> it worked OK - which it doesn't) hides the problem from the user
> experience by forcing all the queues to be flushed.
But my point was the blk_abort_queue() wouldn't ever be called for this
case would it? Because fail_path() isn't called for this SDEV_DEL and
SDEV_CANCEL corner case (hence the need for your patch).
So the relation to blk_abort_queue() is still muddled for me.
> My preference
> was always to make sure a problem (in this case the I/O hang) does
> not occur in the first place rather than take actions when it does.
Sure, I agree, and given blk_abort_queue() is prone to crash -- albeit
a fairly rare race -- we need a solution that doesn't rely on
blk_abort_qeue. But I'm just missing how blk_abort_queue() was helping
this case (can't see how it would ever be called).
Mike
> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: 16 December, 2010 17:30
> To: Hamburger, Menny
> Cc: dm-devel@redhat.com; Thomas, Rob; Dar, Itay
> Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
>
> On Thu, Dec 16 2010 at 9:21am -0500,
> Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:
>
> > Mike,
> >
> > What I meant is that removing the dm-mpath change from the patch makes
> > the operation path go into an area that I did not test (the default
> > case).
> > I know it's only a !hw_handler_name and a printk (as I said before -
> > this might sound superstitious), however I did my tests with the
> > dm-mpath part, and I prefer to send a 100% tested patch then to add
> > something I did not test.
> > My suggestion was to supply a scsi_dh only patch that goes through the
> > same operation path as the one I tested - via the SCSI_DH_NOSYS case
> > in pg_init_done. In addition, the separation of offline from
> > cancel/del also seems the correct way to do this.
>
> I see, my misunderstanding.
>
> - But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
> pg_init_done isn't updated to explicitly handle it it'll just
> fail_path() via the default case. Are you OK with that?
>
> - In your initial patch you said: "When running an upstream kernel,
> the above scenario may not occur because the request queue is aborted
> when the multipath fails the path."
>
> I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
> the need for this patch. blk_abort_queue() is only called from
> fail_path(). But I thought the problem is fail_path() isn't called in
> this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
> -- due to the missing call to activate_complete callback (pg_init_done)
> (again, I'm very interested in this because we'll be reverting DM
> mpath's call to blk_abort_queue in the near future).
>
> But this would be the revised scsi_dh patch (I'm not sending to
> linux-scsi until I have an answer for the above concerns):
>
>
> From: Menny Hamburger <Menny_Hamburger@Dell.com>
>
> When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
> callback is not called and the error is not propagated to DM mpath.
>
> When a SCSI device attached to a device handler is deleted, userland
> processes currently performing I/O on the device will have their I/O
> hang forever.
>
> - Set SCSI_DH_NOSYS error when the handler is in the process of being
> deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
>
> - Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
>
> - Call the activate_complete callback function directly from
> scsi_dh_activate if an error has been set (when either the scsi_dh
> internal data has already been deleted or is in the process of being
> deleted).
>
> The patch was tested in an ISCSI environment, RDAC H/W handler and
> multipath.
> In the following reproduction process, dd will I/O hang forever and the
> only way to release it will be to reboot the machine:
> 1) Perform I/O on a multipath device:
> dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
> 2) Delete all slave SCSI devices contained in the mpath device:
> I) In an ISCSI environment, the easiest way to do this is by
> stopping ISCSI:
> /etc/init.d/iscsi stop
> II) Another way to delete the devices is by applying the following
> bash scriptlet:
> dm_devs=$(ls /sys/block/ | grep dm- | xargs)
> for dm_dev in $dm_devs; do
> devices=$(ls /sys/block/$dm_dev/slaves)
> for device in $devices; do
> echo 1 > /sys/block/$device/device/delete
> done
> done
>
> NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
> required. However, DM mpath's call to blk_abort_queue has proven to be
> unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
> can lead to list corruption. Therefore we cannot rely on
> blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
> will only be restored once the race with scsi_request_fn is fixed).
>
> Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
> index 6fae3d2..b0c56f6 100644
> --- a/drivers/scsi/device_handler/scsi_dh.c
> +++ b/drivers/scsi/device_handler/scsi_dh.c
> @@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
> sdev = q->queuedata;
> if (sdev && sdev->scsi_dh_data)
> scsi_dh = sdev->scsi_dh_data->scsi_dh;
> - if (!scsi_dh || !get_device(&sdev->sdev_gendev))
> + if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
> + sdev->sdev_state == SDEV_CANCEL ||
> + sdev->sdev_state == SDEV_DEL)
> err = SCSI_DH_NOSYS;
> + if (sdev->sdev_state == SDEV_OFFLINE)
> + err = SCSI_DH_DEV_OFFLINED;
> spin_unlock_irqrestore(q->queue_lock, flags);
>
> - if (err)
> + if (err) {
> + if (fn)
> + fn(data, err);
> return err;
> + }
>
> if (scsi_dh->activate)
> err = scsi_dh->activate(sdev, fn, data);
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
next prev parent reply other threads:[~2010-12-16 16:29 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-15 8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger
2010-12-15 16:03 ` Moger, Babu
2010-12-15 16:09 ` Mike Snitzer
2010-12-16 7:32 ` Menny_Hamburger
2010-12-16 14:02 ` Mike Snitzer
2010-12-16 14:21 ` Menny_Hamburger
2010-12-16 15:29 ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Mike Snitzer
2010-12-16 15:52 ` Menny_Hamburger
2010-12-16 16:29 ` Mike Snitzer [this message]
2010-12-16 16:42 ` Menny_Hamburger
2010-12-16 19:57 ` [PATCH v3][SCSI] " Mike Snitzer
2010-12-17 15:46 ` Moger, Babu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101216162951.GE23507@redhat.com \
--to=snitzer@redhat.com \
--cc=Itay_Dar@dell.com \
--cc=Menny_Hamburger@dell.com \
--cc=Rob_Thomas@dell.com \
--cc=dm-devel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.