From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Bart Van Assche <Bart.VanAssche@sandisk.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"maxg@mellanox.com" <maxg@mellanox.com>,
"israelr@mellanox.com" <israelr@mellanox.com>,
"hare@suse.de" <hare@suse.de>
Subject: Re: [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded
Date: Fri, 17 Mar 2017 05:54:28 -0700 [thread overview]
Message-ID: <1489755268.2373.11.camel@HansenPartnership.com> (raw)
In-Reply-To: <1489706379.2574.24.camel@sandisk.com>
On Thu, 2017-03-16 at 23:19 +0000, Bart Van Assche wrote:
> On Thu, 2017-03-16 at 15:53 -0700, James Bottomley wrote:
> > On Thu, 2017-03-16 at 13:56 -0700, Bart Van Assche wrote:
> > > scsi_target_unblock() must unblock SCSI devices even if this
> > > function
> > > is called after unloading of the LLD that created these devices
> > > has
> > > started. This is necessary to prevent that __scsi_remove_device()
> > > hangs on the SYNCHRONIZE CACHE command issued by the sd driver
> > > during
> > > shutdown.
> >
> > Your special get function misses the try_module_get(). But this is
> > all
> > really a bit ugly. Since the only problem is the SYNC CACHE
> > triggered by device_del, isn't a better solution a new state:
> > SDEV_CANCEL_BLOCK. This will make the device visible to
> > scsi_get_device() and we can take it back from CANCEL_BLOCKED
> > ->CANCEL when the queue is unblocked. I suspect we could also
> > simply throw away the sync cache command when the device is blocked
> > (the cache should destage naturally in the time it takes for the
> > device to be unblocked).
>
> Hello James,
>
> The purpose of this patch series is to make sure that unblock also
> occurs after module unloading has started. My understanding of
> try_module_get() is that it fails once module unloading has started.
> In other words, it is on purpose that try_module_get() is not
> called. From the kernel module
> code:
>
> bool try_module_get(struct module *module)
> {
> bool ret = true;
>
> if (module) {
> preempt_disable();
> /* Note: here, we can fail to get a reference */
> if (likely(module_is_live(module) &&
> atomic_inc_not_zero(&module->refcnt) != 0))
> trace_module_get(module, _RET_IP_);
> else
> ret = false;
> preempt_enable();
> }
> return ret;
> }
>
> static inline int module_is_live(struct module *mod)
> {
> return mod->state != MODULE_STATE_GOING;
> }
So it's better to use the module without a reference in place and take
the risk that it may exit and release its code area while we're calling
it?
> Regarding introducing a new device state: this is something I would
> like to avoid. Any code that manipulates the SCSI device state is
> unnecessarily hard to modify because multiple types of state
> information have been mixed up in a single state variable (blocked /
> not blocked; created / running / cancel / offline). Additionally, the
> SCSI device state is visible to user space.
> Adding a new SCSI device state could break existing user space
> applications.
I'm not sure that's a real concern for a new cancel state, but it's
addressable by not exposing the state to user space (it can just show
up as blocked)
> There is another problem with the introduction of the
> SDEV_CANCEL_BLOCKED state: we do not want open() calls to succeed for
> devices that are in the SDEV_DEL, SDEV_CANCEL nor for devices that
> are in the SDEV_CANCEL_BLOCKED state. scsi_disk_get() in the sd
> driver relies on scsi_device_get() to check the SCSI device state. If
> scsi_device_get() would succeed for devices in the
> SDEV_CANCEL_BLOCKED state then an explicit check for that state would
> have to be added in several users of scsi_device_get().
That really doesn't matter: getting a reference via open is a race.
Currently if you do it just before SDEV_CANCEL you end up in the same
position: a properly refcounted open device that can't send any
commands, so this doesn't add any new problems.
> In other words, I think adding the SDEV_CANCEL_BLOCKED state would
> result in a much more complex and also harder to test patch.
Fine, here's what I thing it would look like; it seems a lot shorter
and simpler to me, but if you want to pursue your approach fixing the
race with module exit is a requirement.
James
---
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba22866..b952a6a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1248,6 +1248,13 @@ scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
"rejecting I/O to dead device\n");
ret = BLKPREP_KILL;
break;
+ case SDEV_CANCEL_BLOCK:
+ /*
+ * silently kill the I/O: the only allowed thing
+ * is a ULD remove one (like SYNC CACHE)
+ */
+ ret = BLKPREP_KILL;
+ break;
case SDEV_BLOCK:
case SDEV_CREATED_BLOCK:
ret = BLKPREP_DEFER;
@@ -2604,6 +2611,15 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
}
break;
+ case SDEV_CANCEL_BLOCK:
+ switch (oldstate) {
+ case SDEV_CANCEL:
+ break;
+ default:
+ goto illegal;
+ }
+ break;
+
case SDEV_CANCEL:
switch (oldstate) {
case SDEV_CREATED:
@@ -2612,6 +2628,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
case SDEV_OFFLINE:
case SDEV_TRANSPORT_OFFLINE:
case SDEV_BLOCK:
+ case SDEV_CANCEL_BLOCK:
break;
default:
goto illegal;
@@ -3017,6 +3034,8 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
sdev->sdev_state = new_state;
else
sdev->sdev_state = SDEV_CREATED;
+ } else if (sdev->sdev_state == SDEV_CANCEL_BLOCK) {
+ sdev->sdev_state = SDEV_CANCEL;
} else if (sdev->sdev_state != SDEV_CANCEL &&
sdev->sdev_state != SDEV_OFFLINE)
return -EINVAL;
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..fd1ba1d 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -39,6 +39,7 @@ static const struct {
{ SDEV_TRANSPORT_OFFLINE, "transport-offline" },
{ SDEV_BLOCK, "blocked" },
{ SDEV_CREATED_BLOCK, "created-blocked" },
+ { SDEV_CANCEL_BLOCK, "blocked" },
};
const char *scsi_device_state_name(enum scsi_device_state state)
@@ -972,7 +973,8 @@ sdev_store_dh_state(struct device *dev, struct device_attribute *attr,
int err = -EINVAL;
if (sdev->sdev_state == SDEV_CANCEL ||
- sdev->sdev_state == SDEV_DEL)
+ sdev->sdev_state == SDEV_DEL ||
+ sdev->sdev_state == SDEV_CANCEL_BLOCK)
return -ENODEV;
if (!sdev->handler) {
@@ -1282,7 +1284,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
return;
if (sdev->is_visible) {
- if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
+ if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0
+ && scsi_device_set_state(sdev, SDEV_CANCEL_BLOCK) != 0)
return;
bsg_unregister_queue(sdev->request_queue);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 6f22b39..a78a17a 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -48,6 +48,7 @@ enum scsi_device_state {
* should be issued to the scsi
* lld. */
SDEV_CREATED_BLOCK, /* same as above but for created devices */
+ SDEV_CANCEL_BLOCK, /* same as above but for cancelling devices */
};
enum scsi_scan_mode {
next prev parent reply other threads:[~2017-03-17 12:54 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-16 20:56 [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded Bart Van Assche
2017-03-16 20:56 ` [PATCH 1/3] __scsi_iterate_devices(): Make the get and put functions arguments Bart Van Assche
2017-03-16 20:56 ` [PATCH 2/3] Introduce starget_for_all_devices() and shost_for_all_devices() Bart Van Assche
2017-03-18 17:14 ` kbuild test robot
2017-03-16 20:56 ` [PATCH 3/3] Ensure that scsi_target_unblock() examines all devices Bart Van Assche
2017-03-18 20:22 ` kbuild test robot
2017-03-16 22:53 ` [PATCH 0/3] Unblock SCSI devices even if the LLD is being unloaded James Bottomley
2017-03-16 23:19 ` Bart Van Assche
2017-03-17 12:54 ` James Bottomley [this message]
2017-03-17 16:40 ` Bart Van Assche
2017-03-18 12:44 ` James Bottomley
2017-03-18 20:49 ` Bart Van Assche
2017-04-10 17:46 ` Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1489755268.2373.11.camel@HansenPartnership.com \
--to=james.bottomley@hansenpartnership.com \
--cc=Bart.VanAssche@sandisk.com \
--cc=hare@suse.de \
--cc=israelr@mellanox.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=maxg@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox