All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Menny_Hamburger@Dell.com
Cc: dm-devel@redhat.com, Itay_Dar@Dell.com, Rob_Thomas@Dell.com
Subject: [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
Date: Thu, 16 Dec 2010 10:29:31 -0500	[thread overview]
Message-ID: <20101216152930.GB23507@redhat.com> (raw)
In-Reply-To: <D8C50530D6022F40A817A35C40CC06A7064AFFCF09@DUBX7MCDUB01.EMEA.DELL.COM>

On Thu, Dec 16 2010 at  9:21am -0500,
Menny_Hamburger@Dell.com <Menny_Hamburger@Dell.com> wrote:

> Mike,
> 
> What I meant is that removing the dm-mpath change from the patch makes
> the operation path go into an area that I did not test (the default
> case). 
> I know it's only a !hw_handler_name and a printk (as I said before -
> this might sound superstitious), however I did my tests with the
> dm-mpath part, and I prefer to send a 100% tested patch then to add
> something I did not test.
> My suggestion was to supply a scsi_dh only patch that goes through the
> same operation path as the one I tested - via the SCSI_DH_NOSYS case
> in pg_init_done. In addition, the separation of offline from
> cancel/del also seems the correct way to do this.

I see, my misunderstanding.

- But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
  pg_init_done isn't updated to explicitly handle it it'll just
  fail_path() via the default case.  Are you OK with that?

- In your initial patch you said: "When running an upstream kernel,
  the above scenario may not occur because the request queue is aborted
  when the multipath fails the path."

  I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
  the need for this patch.  blk_abort_queue() is only called from
  fail_path().  But I thought the problem is fail_path() isn't called in
  this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
  -- due to the missing call to activate_complete callback (pg_init_done)
  (again, I'm very interested in this because we'll be reverting DM
   mpath's call to blk_abort_queue in the near future). 

But this would be the revised scsi_dh patch (I'm not sending to
linux-scsi until I have an answer for the above concerns):


From: Menny Hamburger <Menny_Hamburger@Dell.com>

When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
callback is not called and the error is not propagated to DM mpath.

When a SCSI device attached to a device handler is deleted, userland
processes currently performing I/O on the device will have their I/O
hang forever.  

- Set SCSI_DH_NOSYS error when the handler is in the process of being
  deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).

- Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.

- Call the activate_complete callback function directly from
  scsi_dh_activate if an error has been set (when either the scsi_dh
  internal data has already been deleted or is in the process of being
  deleted).

The patch was tested in an ISCSI environment, RDAC H/W handler and
multipath.
In the following reproduction process, dd will I/O hang forever and the
only way to release it will be to reboot the machine:
1) Perform I/O on a multipath device:
    dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
2) Delete all slave SCSI devices contained in the mpath device:
   I)  In an ISCSI environment, the easiest way to do this is by
   stopping ISCSI:
       /etc/init.d/iscsi stop
   II) Another way to delete the devices is by applying the following
   bash scriptlet:
       dm_devs=$(ls /sys/block/ | grep dm- | xargs)
       for dm_dev in $dm_devs; do
         devices=$(ls /sys/block/$dm_dev/slaves)
         for device in $devices; do
            echo 1 > /sys/block/$device/device/delete
         done
       done

NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
required.  However, DM mpath's call to blk_abort_queue has proven to be
unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
can lead to list corruption.  Therefore we cannot rely on
blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
will only be restored once the race with scsi_request_fn is fixed).

Signed-off-by: Menny Hamburger <Menny_Hamburger@Dell.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/scsi/device_handler/scsi_dh.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 6fae3d2..b0c56f6 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
 	sdev = q->queuedata;
 	if (sdev && sdev->scsi_dh_data)
 		scsi_dh = sdev->scsi_dh_data->scsi_dh;
-	if (!scsi_dh || !get_device(&sdev->sdev_gendev))
+	if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+	    sdev->sdev_state == SDEV_CANCEL ||
+	    sdev->sdev_state == SDEV_DEL)
 		err = SCSI_DH_NOSYS;
+	if (sdev->sdev_state == SDEV_OFFLINE)
+		err = SCSI_DH_DEV_OFFLINED;
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	if (err)
+	if (err) {
+		if (fn)
+			fn(data, err);
 		return err;
+	}
 
 	if (scsi_dh->activate)
 		err = scsi_dh->activate(sdev, fn, data);

  reply	other threads:[~2010-12-16 15:29 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-15  8:31 [Patch 2 of 2]: scsi-dh + dm-mpath: propagate SCSI device deletion to multipath Menny_Hamburger
2010-12-15 16:03 ` Moger, Babu
2010-12-15 16:09 ` Mike Snitzer
2010-12-16  7:32   ` Menny_Hamburger
2010-12-16 14:02     ` Mike Snitzer
2010-12-16 14:21       ` Menny_Hamburger
2010-12-16 15:29         ` Mike Snitzer [this message]
2010-12-16 15:52           ` [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion Menny_Hamburger
2010-12-16 16:29             ` Mike Snitzer
2010-12-16 16:42               ` Menny_Hamburger
2010-12-16 19:57                 ` [PATCH v3][SCSI] " Mike Snitzer
2010-12-17 15:46                   ` Moger, Babu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101216152930.GB23507@redhat.com \
    --to=snitzer@redhat.com \
    --cc=Itay_Dar@Dell.com \
    --cc=Menny_Hamburger@Dell.com \
    --cc=Rob_Thomas@Dell.com \
    --cc=dm-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.