* RE: requesting advice: oops when doing sg_dd IO and device is removed
@ 2005-10-20 16:48 Dailey, Nate
2005-10-24 13:04 ` Douglas Gilbert
0 siblings, 1 reply; 6+ messages in thread
From: Dailey, Nate @ 2005-10-20 16:48 UTC (permalink / raw)
To: dougg; +Cc: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 9240 bytes --]
Doug, I'm attaching a new patch (again, against 2.6.9)... the primary
difference from my previous patch is that this one now schedules a work
item from sg_cmd_done. This work item drops the reference on the device,
so this gets rid of the problems of calling scsi_device_put from
sg_cmd_done or of sleeping in sg_release. Does this look okay?
Nate
> -----Original Message-----
> From: Douglas Gilbert [mailto:dougg@torque.net]
> Sent: Tuesday, October 18, 2005 12:55 AM
> To: Dailey, Nate
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: requesting advice: oops when doing sg_dd IO and
> device is removed
>
>
> Nate,
> I always suspected that if one really tried hard enough
> then unexpectedly closing a file descriptor (e.g. via
> control-C on an app) at the same time as there are
> outstanding commands could cause problems. Obviously
> you have succeeded demonstrating that with your experiment.
>
> When I fought with this problem in the lk 2.4 series,
> someone skilled at finding flaws in locking logic
> suggested there was no solution (at least the way
> I was doing it) ...
>
> Waiting a close() call (i.e. sg_release() ), potentially
> indefinitely, was very unpopular with application clients.
> The sg driver did that in the old days, and I was abused
> from several quarters (one name springs to mind). So I
> tried to design sg_release() so it wouldn't ever hang
> the application client.
>
> On reflection, I think to allow sg_release() to go
> through quickly, another kernel thread is needed that
> is passed ownership of unfinished commands. What do
> you think?
>
>
> Doug Gilbert
>
>
> Dailey, Nate wrote:
> > Doug (and anyone else who might be interested), I'm looking for some
> > advice as to how best to fix a problem I hit in sg.
> >
> > I was doing the following:
> >
> > - start up several processes reading from an sg device with sg_dd
> > - every other iteration, kill the processes
> > - remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
> > - add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)
> >
> > This was on a 2.6.9 kernel, with a patch:
> >
> > http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2
> >
> > That I've submitted to the LSML... patch was intended to
> fix a different
> > oops when doing the above test.
> >
> > Here's what I saw, after the device was removed:
> >
> > Debug: sleeping function called from invalid context at
> > include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
> > [<c011fcd1>] __might_sleep+0x7d/0x88
> > [<c021cb5e>] device_del+0x1e/0x90
> > [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
> > [<c021c8e9>] device_release+0x11/0x40
> > [<c01bf03f>] kobject_cleanup+0x40/0x60
> > [<c01bf05f>] kobject_release+0x0/0x8
> > [<c01bf305>] kref_put+0x42/0x45
> > [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
> > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> > [<c0126424>] __do_softirq+0x4c/0xb1
> > [<c010812f>] do_softirq+0x4f/0x56
> > =======================
> > [<c0107a44>] do_IRQ+0x1a2/0x1ae
> > [<c02d1a8c>] common_interrupt+0x18/0x20
> > [<c0104018>] default_idle+0x0/0x2c
> > [<c0104041>] default_idle+0x29/0x2c
> > [<c010409d>] cpu_idle+0x26/0x3b
> > [<c0390786>] start_kernel+0x199/0x19d
> >
> > (It looks like a command comes back, we drop the last
> reference on the
> > device, the device is freed... because we're in softirq,
> and device_del
> > may eventually sleep (in this kernel version, anyway), we
> get this debug
> > message. I don't think this should normally happen when a
> command comes
> > back, for reasons I'll discuss below)
> >
> > Continuing on...
> >
> > Badness in kref_get at lib/kref.c:32
> > [<c01bf2bb>] kref_get+0x24/0x2c
> > [<c01beffb>] kobject_get+0xf/0x13
> > [<c021cb32>] get_device+0xe/0x14
> > [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
> > [<c022213e>] blk_run_queue+0x20/0x2f
> > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> > [<c0126424>] __do_softirq+0x4c/0xb1
> > [<c010812f>] do_softirq+0x4f/0x56
> > =======================
> > [<c0107a44>] do_IRQ+0x1a2/0x1ae
> > [<c02d1a8c>] common_interrupt+0x18/0x20
> > [<c0104018>] default_idle+0x0/0x2c
> > [<c0104041>] default_idle+0x29/0x2c
> > [<c010409d>] cpu_idle+0x26/0x3b
> > [<c0390786>] start_kernel+0x199/0x19d
> >
> > (So, there were more commands on the device's queue...
> presumably for
> > sg, since I wasn't doing any other IO to this disk. We're
> trying to get
> > a reference on the device, but the ref count has already
> gone to 0 so
> > this "Badness" message results.)
> >
> > Continuing on...
> >
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000008
> > printing eip:
> > c0229678
> > *pde = 00004001
> > Oops: 0000 [#1]
> > SMP
> > Modules linked in: sg(U) parport_pc lp parport autofs4 nfs
> lockd sunrpc
> > md5
> > ip)
> > CPU: 0
> > EIP: 0060:[<c0229678>] Tainted: PF VLI
> > EFLAGS: 00010046 (2.6.9-20.ELsmp)
> > EIP is at cfq_next_request+0x7/0x35
> > eax: f7b6f5e0 ebx: 00000000 ecx: c03249bc edx: f243a594
> > esi: f7b6f5e0 edi: f7f5b000 ebp: f7b6f5e0 esp: c03c7f80
> > ds: 007b es: 007b ss: 0068
> > Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
> > Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400
> f7f5b000 f883ef51
> > f7b6f5e0
> > 00000246 f243a400 00000000 c022213e f7d62800
> f3872800 f883a0a8
> > f7f5b000
> > f883ab15 00002002 f3872800 c03c7fd4 f883aa3a
> c03c7fd4 c03c7fd4
> > 00000003
> > Call Trace:
> > [<c02209dc>] elv_next_request+0xbe/0xce
> > [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
> > [<c022213e>] blk_run_queue+0x20/0x2f
> > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> > [<c0126424>] __do_softirq+0x4c/0xb1
> > [<c010812f>] do_softirq+0x4f/0x56
> > =======================
> > [<c0107a44>] do_IRQ+0x1a2/0x1ae
> > [<c02d1a8c>] common_interrupt+0x18/0x20
> > [<c0104018>] default_idle+0x0/0x2c
> > [<c0104041>] default_idle+0x29/0x2c
> > [<c010409d>] cpu_idle+0x26/0x3b
> > [<c0390786>] start_kernel+0x199/0x19d
> > Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31
> d2 85 ff 0f
> > 95 c2
> >
> > (It was inevitable that something bad would happen when
> trying to run
> > the queue on a device which had been freed)
> >
> >
> > My theory here is that sg should be holding a reference on the SCSI
> > device until all IO comes back. With or without the patch I
> mentioned
> > above, this is not the case. Although sg_release does take
> out an extra
> > reference if there's any IO still outstanding, if sg_remove
> is called,
> > this reference is dropped. So after sg_remove, there could
> still be IO
> > to the device outstanding, but no reference on the device (from sg).
> > This means the device could get freed even though there's
> still some sg
> > IO on the queue.
> >
> > Does this analysis sound reasonable? I'm a bit puzzled by
> the fact that
> > sd (for example) doesn't appear to have this problem (maybe
> sd_release
> > just never gets called until all IO comes back?).
> >
> >
> > So, how to fix this?
> >
> > The first thing I tried was adding a kref to the Sg_device struct,
> > patterned after the reference counting in sd, sr, st. In
> sg_release, I
> > would drop a reference only if there was no IO outstanding. If there
> > _was_ IO outstanding, I would drop the reference in
> sg_cmd_done when the
> > last IO came back. In theory, this would ensure that sg
> kept a reference
> > on the device until all IO came back.
> >
> > First problem: I was protecting the kref with a mutex, as
> sd, sr, etc.
> > do. Trying to down the mutex from sg_cmd_done didn't work
> (as this is
> > done from softirq). Okay, so I took out the mutex, just to see how
> > things would work. Still there was a problem... doing the final
> > scsi_device_put in sg_cmd_done won't work, because
> > scsi_device_dev_release calls device_del, which may sleep (in this
> > kernel version, anyway... I think this has changed in later
> kernels). So
> > it seems trying to drop a reference when the last command
> comes back is
> > out of the question.
> >
> > The next thing I tried was simply sleeping in sg_release until all
> > outstanding IO comes back (changing sg_remove_sfp to return
> a count of
> > outstanding IOs), and only then dropping the device reference (patch
> > below). This seems to be working fine. But I'm wondering if this is
> > really a good solution? I'm not sure I can think of
> anything else... any
> > ideas?
> >
> > Thanks!
> >
> > Nate Dailey
> > Stratus Technologies
[-- Attachment #2: sg.c.ND102005.patch --]
[-- Type: application/octet-stream, Size: 10101 bytes --]
--- sg.c.orig 2005-10-13 16:50:00.000000000 -0400
+++ sg.c 2005-10-20 12:08:13.282669966 -0400
@@ -49,6 +49,7 @@ static int sg_version_num = 30531; /* 2
#include <linux/seq_file.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
+#include <linux/kref.h>
#include "scsi.h"
#include <scsi/scsi_host.h>
@@ -173,6 +174,7 @@ typedef struct sg_device { /* holds the
char sgdebug; /* 0->off, 1->sense, 9->dump dev, 10-> all devs */
struct gendisk *disk;
struct cdev * cdev; /* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+ struct kref kref;
} Sg_device;
static int sg_fasync(int fd, struct file *filp, int mode);
@@ -218,11 +220,89 @@ static Sg_device **sg_dev_arr = NULL;
static int sg_dev_max;
static int sg_nr_dev;
+/* This semaphore is used to mediate the 0->1 reference get in the
+ * face of object destruction (i.e. we can't allow a get on an
+ * object after last put) */
+static DECLARE_MUTEX(sg_ref_sem);
+
+static void scsi_generic_release(struct kref *kref);
+#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
+
#define SZ_SG_HEADER sizeof(struct sg_header)
#define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
#define SZ_SG_IOVEC sizeof(sg_iovec_t)
#define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
+static Sg_device *scsi_generic_get(int dev)
+{
+ Sg_device *sdp = NULL;
+
+ down(&sg_ref_sem);
+
+ sdp = sg_get_dev (dev);
+ if (!sdp) goto out;
+
+ if (sdp->detached) {
+ sdp = NULL;
+ goto out;
+ }
+
+ kref_get(&sdp->kref);
+
+ if (!sdp->device)
+ goto out_put;
+
+ if (scsi_device_get(sdp->device))
+ goto out_put;
+
+ goto out;
+
+out_put:
+ kref_put(&sdp->kref, scsi_generic_release);
+ sdp = NULL;
+out:
+ up(&sg_ref_sem);
+ return sdp;
+}
+
+static void scsi_generic_put(Sg_device *sdp)
+{
+ struct scsi_device *sdev = sdp->device;
+
+ down(&sg_ref_sem);
+ kref_put(&sdp->kref, scsi_generic_release);
+ scsi_device_put(sdev);
+ up(&sg_ref_sem);
+}
+
+struct scsi_generic_put_work_item {
+ Sg_device *sdp;
+ struct work_struct work;
+};
+
+void scsi_generic_put_work_handler(void *data)
+{
+ Sg_device *sdp = ((struct scsi_generic_put_work_item*)data)->sdp;
+
+ scsi_generic_put(sdp);
+ kfree(data);
+}
+
+static int scsi_generic_put_schedule_work(Sg_device *sdp)
+{
+ struct scsi_generic_put_work_item *data;
+
+ if ((data = kmalloc(sizeof(*data), GFP_ATOMIC)) == NULL)
+ return -ENOMEM;
+
+ memset(data, 0, sizeof(*data));
+ INIT_WORK(&data->work, scsi_generic_put_work_handler, data);
+ data->sdp = sdp;
+
+ schedule_work(&data->work);
+ return 0;
+}
+
static int
sg_open(struct inode *inode, struct file *filp)
{
@@ -235,17 +315,8 @@ sg_open(struct inode *inode, struct file
nonseekable_open(inode, filp);
SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
- sdp = sg_get_dev(dev);
- if ((!sdp) || (!sdp->device))
+ if (!(sdp = scsi_generic_get(dev)))
return -ENXIO;
- if (sdp->detached)
- return -ENODEV;
-
- /* This driver's module count bumped by fops_get in <linux/fs.h> */
- /* Prevent the device driver from vanishing while we sleep */
- retval = scsi_device_get(sdp->device);
- if (retval)
- return retval;
if (!((flags & O_NONBLOCK) ||
scsi_block_when_processing_errors(sdp->device))) {
@@ -302,7 +373,7 @@ sg_open(struct inode *inode, struct file
return 0;
error_out:
- scsi_device_put(sdp->device);
+ scsi_generic_put(sdp);
return retval;
}
@@ -312,18 +383,21 @@ sg_release(struct inode *inode, struct f
{
Sg_device *sdp;
Sg_fd *sfp;
+ int dirty;
if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
return -ENXIO;
SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
sg_fasync(-1, filp, 0); /* remove filp from async notification list */
- if (0 == sg_remove_sfp(sdp, sfp)) { /* Returns 1 when sdp gone */
- if (!sdp->detached) {
- scsi_device_put(sdp->device);
- }
- sdp->exclude = 0;
- wake_up_interruptible(&sdp->o_excl_wait);
- }
+ dirty = sg_remove_sfp(sdp, sfp);
+ sdp->exclude = 0;
+ wake_up_interruptible(&sdp->o_excl_wait);
+
+ /* Only drop the reference if sg_remove_sfp returned 0
+ (otherwise, there's IO outstanding & we must keep the reference). */
+ if (!dirty)
+ scsi_generic_put(sdp);
+
return 0;
}
@@ -1175,11 +1249,14 @@ static int
sg_mmap(struct file *filp, struct vm_area_struct *vma)
{
Sg_fd *sfp;
- unsigned long req_sz = vma->vm_end - vma->vm_start;
+ unsigned long req_sz;
Sg_scatter_hold *rsv_schp;
if ((!filp) || (!vma) || (!(sfp = (Sg_fd *) filp->private_data)))
return -ENXIO;
+
+ req_sz = vma->vm_end - vma->vm_start;
+
SCSI_LOG_TIMEOUT(3, printk("sg_mmap starting, vm_start=%p, len=%d\n",
(void *) vma->vm_start, (int) req_sz));
if (vma->vm_pgoff)
@@ -1238,7 +1315,7 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
sfp = srp->parentfp;
if (sfp)
sdp = sfp->parentdp;
- if ((NULL == sdp) || sdp->detached) {
+ if (NULL == sdp) {
printk(KERN_INFO "sg_cmd_done: device detached\n");
scsi_release_request(SRpnt);
return;
@@ -1298,8 +1375,16 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
srp = NULL;
if (NULL == sfp->headrp) {
SCSI_LOG_TIMEOUT(1, printk("sg...bh: already closed, final cleanup\n"));
- if (0 == sg_remove_sfp(sdp, sfp)) { /* device still present */
- scsi_device_put(sdp->device);
+
+ /* Drop a reference if sg_remove_sfp returns 0
+ (no IO outstanding). */
+ if (0 == sg_remove_sfp(sdp, sfp)) {
+ if (scsi_generic_put_schedule_work(sdp))
+ {
+ /* Error scheduling the work...
+ just drop the reference. */
+ scsi_generic_put(sdp);
+ }
}
sfp = NULL;
}
@@ -1441,6 +1526,7 @@ sg_add(struct class_device *cl_dev)
}
k = error;
sdp = sg_dev_arr[k];
+ kref_init(&sdp->kref);
devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
@@ -1493,45 +1579,33 @@ sg_remove(struct class_device *cl_dev)
unsigned long iflags;
Sg_fd *sfp;
Sg_fd *tsfp;
- Sg_request *srp;
- Sg_request *tsrp;
- int k, delay;
+ int k;
if (NULL == sg_dev_arr)
return;
- delay = 0;
write_lock_irqsave(&sg_dev_arr_lock, iflags);
for (k = 0; k < sg_dev_max; k++) {
- sdp = sg_dev_arr[k];
- if ((NULL == sdp) || (sdp->device != scsidp))
+ if ((NULL == sg_dev_arr[k]) ||
+ (sg_dev_arr[k]->device != scsidp))
continue; /* dirty but lowers nesting */
+ sdp = sg_dev_arr[k];
+ sdp->detached = 1;
+
if (sdp->headfp) {
- sdp->detached = 1;
for (sfp = sdp->headfp; sfp; sfp = tsfp) {
tsfp = sfp->nextfp;
- for (srp = sfp->headrp; srp; srp = tsrp) {
- tsrp = srp->nextrp;
- if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
- sg_finish_rem_req(srp);
- }
- if (sfp->closed) {
- scsi_device_put(sdp->device);
- __sg_remove_sfp(sdp, sfp);
- } else {
- delay = 1;
+ if (!(sfp->closed)) {
wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp, SIGPOLL,
POLL_HUP);
}
}
SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d, dirty\n", k));
- if (NULL == sdp->headfp) {
- sg_dev_arr[k] = NULL;
- }
} else { /* nothing active, simple case */
SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d\n", k));
- sg_dev_arr[k] = NULL;
}
+
+ sg_dev_arr[k] = NULL;
sg_nr_dev--;
break;
}
@@ -1543,14 +1617,32 @@ sg_remove(struct class_device *cl_dev)
cdev_del(sdp->cdev);
sdp->cdev = NULL;
devfs_remove("%s/generic", scsidp->devfs_name);
- put_disk(sdp->disk);
- sdp->disk = NULL;
- if (NULL == sdp->headfp)
- kfree((char *) sdp);
+
+ down(&sg_ref_sem);
+ kref_put(&sdp->kref, scsi_generic_release);
+ up(&sg_ref_sem);
}
+}
+
+/**
+ * scsi_generic_release - Called to free the Sg_device structure
+ * @kref: pointer to embedded kref
+ *
+ * sg_ref_sem must be held entering this routine. Because it is
+ * called on last put, you should always use the scsi_generic_get()
+ * and scsi_generic_put() helpers which manipulate the semaphore
+ * directly and never do a direct kref_put().
+ **/
+static void scsi_generic_release(struct kref *kref)
+{
+ Sg_device *sdp = to_scsi_generic(kref);
+ struct gendisk *disk = sdp->disk;
+
+ disk->private_data = NULL;
+ put_disk(disk);
- if (delay)
- msleep(10); /* dirty detach so delay device destruction */
+ kfree((char *) sdp);
+ return;
}
/* Set 'perm' (4th argument) to 0 to disable module_param's definition
@@ -1602,6 +1694,9 @@ err_out:
static void __exit
exit_sg(void)
{
+ /* sg_cmd_done can schedule work... wait for it to complete. */
+ flush_scheduled_work();
+
#ifdef CONFIG_SCSI_PROC_FS
sg_proc_cleanup();
#endif /* CONFIG_SCSI_PROC_FS */
@@ -2484,14 +2579,13 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
sg_page_free((char *) sfp, sizeof (Sg_fd));
}
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
+/* Returns a count of IOs still in progress. */
static int
sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
{
Sg_request *srp;
Sg_request *tsrp;
int dirty = 0;
- int res = 0;
for (srp = sfp->headrp; srp; srp = tsrp) {
tsrp = srp->nextrp;
@@ -2505,30 +2599,16 @@ sg_remove_sfp(Sg_device * sdp, Sg_fd * s
write_lock_irqsave(&sg_dev_arr_lock, iflags);
__sg_remove_sfp(sdp, sfp);
- if (sdp->detached && (NULL == sdp->headfp)) {
- int k, maxd;
-
- maxd = sg_dev_max;
- for (k = 0; k < maxd; ++k) {
- if (sdp == sg_dev_arr[k])
- break;
- }
- if (k < maxd)
- sg_dev_arr[k] = NULL;
- kfree((char *) sdp);
- res = 1;
- }
write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
} else {
- /* MOD_INC's to inhibit unloading sg and associated adapter driver */
- /* only bump the access_count if we actually succeeded in
- * throwing another counter on the host module */
- scsi_device_get(sdp->device); /* XXX: retval ignored? */
+ /* The non-zero dirty count will be returned. This will tell
+ the caller not to drop the reference on this device. */
+
sfp->closed = 1; /* flag dirty state on this fd */
SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
dirty));
}
- return res;
+ return dirty;
}
static int
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: requesting advice: oops when doing sg_dd IO and device is removed
2005-10-20 16:48 requesting advice: oops when doing sg_dd IO and device is removed Dailey, Nate
@ 2005-10-24 13:04 ` Douglas Gilbert
0 siblings, 0 replies; 6+ messages in thread
From: Douglas Gilbert @ 2005-10-24 13:04 UTC (permalink / raw)
To: Dailey, Nate; +Cc: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 780 bytes --]
Dailey, Nate wrote:
> Doug, I'm attaching a new patch (again, against 2.6.9)... the primary
> difference from my previous patch is that this one now schedules a work
> item from sg_cmd_done. This work item drops the reference on the device,
> so this gets rid of the problems of calling scsi_device_put from
> sg_cmd_done or of sleeping in sg_release. Does this look okay?
Nate,
I haven't had a close look at the logic but I have
been using this patch for several days. So far I have
been testing other things with sg and have detected
no regression from this patch.
Since your patch was against 2.6.9, it needed a few
changes to apply to lk 2.6.14-rc5 . The patch against
2.6.14-rc5 is attached.
Hopefully in the next few days I can focus on
testing this patch.
Doug Gilbert
[-- Attachment #2: sg_2614rc5nd2.diff --]
[-- Type: text/x-patch, Size: 9145 bytes --]
--- linux/drivers/scsi/sg.c 2005-10-05 12:09:27.000000000 +1000
+++ linux/drivers/scsi/sg.c2614rc5nd2 2005-10-24 12:11:32.000000000 +1000
@@ -49,6 +49,7 @@
#include <linux/seq_file.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
+#include <linux/kref.h>
#include "scsi.h"
#include <scsi/scsi_dbg.h>
@@ -174,6 +175,7 @@
char sgdebug; /* 0->off, 1->sense, 9->dump dev, 10-> all devs */
struct gendisk *disk;
struct cdev * cdev; /* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+ struct kref kref;
} Sg_device;
static int sg_fasync(int fd, struct file *filp, int mode);
@@ -219,11 +221,89 @@
static int sg_dev_max;
static int sg_nr_dev;
+/* This semaphore is used to mediate the 0->1 reference get in the
+ * face of object destruction (i.e. we can't allow a get on an
+ * object after last put) */
+static DECLARE_MUTEX(sg_ref_sem);
+
+static void scsi_generic_release(struct kref *kref);
+#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
+
#define SZ_SG_HEADER sizeof(struct sg_header)
#define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
#define SZ_SG_IOVEC sizeof(sg_iovec_t)
#define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
+static Sg_device *scsi_generic_get(int dev)
+{
+ Sg_device *sdp = NULL;
+
+ down(&sg_ref_sem);
+
+ sdp = sg_get_dev (dev);
+ if (!sdp) goto out;
+
+ if (sdp->detached) {
+ sdp = NULL;
+ goto out;
+ }
+
+ kref_get(&sdp->kref);
+
+ if (!sdp->device)
+ goto out_put;
+
+ if (scsi_device_get(sdp->device))
+ goto out_put;
+
+ goto out;
+
+out_put:
+ kref_put(&sdp->kref, scsi_generic_release);
+ sdp = NULL;
+out:
+ up(&sg_ref_sem);
+ return sdp;
+}
+
+static void scsi_generic_put(Sg_device *sdp)
+{
+ struct scsi_device *sdev = sdp->device;
+
+ down(&sg_ref_sem);
+ kref_put(&sdp->kref, scsi_generic_release);
+ scsi_device_put(sdev);
+ up(&sg_ref_sem);
+}
+
+struct scsi_generic_put_work_item {
+ Sg_device *sdp;
+ struct work_struct work;
+};
+
+void scsi_generic_put_work_handler(void *data)
+{
+ Sg_device *sdp = ((struct scsi_generic_put_work_item*)data)->sdp;
+
+ scsi_generic_put(sdp);
+ kfree(data);
+}
+
+static int scsi_generic_put_schedule_work(Sg_device *sdp)
+{
+ struct scsi_generic_put_work_item *data;
+
+ if ((data = kmalloc(sizeof(*data), GFP_ATOMIC)) == NULL)
+ return -ENOMEM;
+
+ memset(data, 0, sizeof(*data));
+ INIT_WORK(&data->work, scsi_generic_put_work_handler, data);
+ data->sdp = sdp;
+
+ schedule_work(&data->work);
+ return 0;
+}
+
static int
sg_open(struct inode *inode, struct file *filp)
{
@@ -236,17 +316,8 @@
nonseekable_open(inode, filp);
SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
- sdp = sg_get_dev(dev);
- if ((!sdp) || (!sdp->device))
+ if (!(sdp = scsi_generic_get(dev)))
return -ENXIO;
- if (sdp->detached)
- return -ENODEV;
-
- /* This driver's module count bumped by fops_get in <linux/fs.h> */
- /* Prevent the device driver from vanishing while we sleep */
- retval = scsi_device_get(sdp->device);
- if (retval)
- return retval;
if (!((flags & O_NONBLOCK) ||
scsi_block_when_processing_errors(sdp->device))) {
@@ -303,7 +374,7 @@
return 0;
error_out:
- scsi_device_put(sdp->device);
+ scsi_generic_put(sdp);
return retval;
}
@@ -313,18 +384,21 @@
{
Sg_device *sdp;
Sg_fd *sfp;
+ int dirty;
if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
return -ENXIO;
SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
sg_fasync(-1, filp, 0); /* remove filp from async notification list */
- if (0 == sg_remove_sfp(sdp, sfp)) { /* Returns 1 when sdp gone */
- if (!sdp->detached) {
- scsi_device_put(sdp->device);
- }
- sdp->exclude = 0;
- wake_up_interruptible(&sdp->o_excl_wait);
- }
+ dirty = sg_remove_sfp(sdp, sfp);
+ sdp->exclude = 0;
+ wake_up_interruptible(&sdp->o_excl_wait);
+
+ /* Only drop the reference if sg_remove_sfp returned 0
+ (otherwise, there's IO outstanding & we must keep the reference). */
+ if (!dirty)
+ scsi_generic_put(sdp);
+
return 0;
}
@@ -1328,7 +1402,7 @@
sfp = srp->parentfp;
if (sfp)
sdp = sfp->parentdp;
- if ((NULL == sdp) || sdp->detached) {
+ if (NULL == sdp) {
printk(KERN_INFO "sg_cmd_done: device detached\n");
scsi_release_request(SRpnt);
return;
@@ -1391,8 +1465,16 @@
srp = NULL;
if (NULL == sfp->headrp) {
SCSI_LOG_TIMEOUT(1, printk("sg...bh: already closed, final cleanup\n"));
- if (0 == sg_remove_sfp(sdp, sfp)) { /* device still present */
- scsi_device_put(sdp->device);
+
+ /* Drop a reference if sg_remove_sfp returns 0
+ (no IO outstanding). */
+ if (0 == sg_remove_sfp(sdp, sfp)) {
+ if (scsi_generic_put_schedule_work(sdp))
+ {
+ /* Error scheduling the work...
+ just drop the reference. */
+ scsi_generic_put(sdp);
+ }
}
sfp = NULL;
}
@@ -1537,6 +1619,7 @@
}
k = error;
sdp = sg_dev_arr[k];
+ kref_init(&sdp->kref);
devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
@@ -1589,45 +1672,33 @@
unsigned long iflags;
Sg_fd *sfp;
Sg_fd *tsfp;
- Sg_request *srp;
- Sg_request *tsrp;
- int k, delay;
+ int k;
if (NULL == sg_dev_arr)
return;
- delay = 0;
write_lock_irqsave(&sg_dev_arr_lock, iflags);
for (k = 0; k < sg_dev_max; k++) {
- sdp = sg_dev_arr[k];
- if ((NULL == sdp) || (sdp->device != scsidp))
+ if ((NULL == sg_dev_arr[k]) ||
+ (sg_dev_arr[k]->device != scsidp))
continue; /* dirty but lowers nesting */
+ sdp = sg_dev_arr[k];
+ sdp->detached = 1;
+
if (sdp->headfp) {
- sdp->detached = 1;
for (sfp = sdp->headfp; sfp; sfp = tsfp) {
tsfp = sfp->nextfp;
- for (srp = sfp->headrp; srp; srp = tsrp) {
- tsrp = srp->nextrp;
- if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
- sg_finish_rem_req(srp);
- }
- if (sfp->closed) {
- scsi_device_put(sdp->device);
- __sg_remove_sfp(sdp, sfp);
- } else {
- delay = 1;
+ if (!(sfp->closed)) {
wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp, SIGPOLL,
POLL_HUP);
}
}
SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d, dirty\n", k));
- if (NULL == sdp->headfp) {
- sg_dev_arr[k] = NULL;
- }
} else { /* nothing active, simple case */
SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d\n", k));
- sg_dev_arr[k] = NULL;
}
+
+ sg_dev_arr[k] = NULL;
sg_nr_dev--;
break;
}
@@ -1639,14 +1710,32 @@
cdev_del(sdp->cdev);
sdp->cdev = NULL;
devfs_remove("%s/generic", scsidp->devfs_name);
- put_disk(sdp->disk);
- sdp->disk = NULL;
- if (NULL == sdp->headfp)
- kfree((char *) sdp);
+
+ down(&sg_ref_sem);
+ kref_put(&sdp->kref, scsi_generic_release);
+ up(&sg_ref_sem);
}
+}
- if (delay)
- msleep(10); /* dirty detach so delay device destruction */
+/**
+ * scsi_generic_release - Called to free the Sg_device structure
+ * @kref: pointer to embedded kref
+ *
+ * sg_ref_sem must be held entering this routine. Because it is
+ * called on last put, you should always use the scsi_generic_get()
+ * and scsi_generic_put() helpers which manipulate the semaphore
+ * directly and never do a direct kref_put().
+ **/
+static void scsi_generic_release(struct kref *kref)
+{
+ Sg_device *sdp = to_scsi_generic(kref);
+ struct gendisk *disk = sdp->disk;
+
+ disk->private_data = NULL;
+ put_disk(disk);
+
+ kfree((char *) sdp);
+ return;
}
/* Set 'perm' (4th argument) to 0 to disable module_param's definition
@@ -1698,6 +1787,9 @@
static void __exit
exit_sg(void)
{
+ /* sg_cmd_done can schedule work... wait for it to complete. */
+ flush_scheduled_work();
+
#ifdef CONFIG_SCSI_PROC_FS
sg_proc_cleanup();
#endif /* CONFIG_SCSI_PROC_FS */
@@ -2578,14 +2670,13 @@
sg_page_free((char *) sfp, sizeof (Sg_fd));
}
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
+/* Returns a count of IOs still in progress. */
static int
sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
{
Sg_request *srp;
Sg_request *tsrp;
int dirty = 0;
- int res = 0;
for (srp = sfp->headrp; srp; srp = tsrp) {
tsrp = srp->nextrp;
@@ -2599,30 +2690,16 @@
write_lock_irqsave(&sg_dev_arr_lock, iflags);
__sg_remove_sfp(sdp, sfp);
- if (sdp->detached && (NULL == sdp->headfp)) {
- int k, maxd;
-
- maxd = sg_dev_max;
- for (k = 0; k < maxd; ++k) {
- if (sdp == sg_dev_arr[k])
- break;
- }
- if (k < maxd)
- sg_dev_arr[k] = NULL;
- kfree((char *) sdp);
- res = 1;
- }
write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
} else {
- /* MOD_INC's to inhibit unloading sg and associated adapter driver */
- /* only bump the access_count if we actually succeeded in
- * throwing another counter on the host module */
- scsi_device_get(sdp->device); /* XXX: retval ignored? */
+ /* The non-zero dirty count will be returned. This will tell
+ the caller not to drop the reference on this device. */
+
sfp->closed = 1; /* flag dirty state on this fd */
SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
dirty));
}
- return res;
+ return dirty;
}
static int
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: requesting advice: oops when doing sg_dd IO and device is removed
@ 2005-10-18 21:00 Dailey, Nate
0 siblings, 0 replies; 6+ messages in thread
From: Dailey, Nate @ 2005-10-18 21:00 UTC (permalink / raw)
To: dougg; +Cc: linux-scsi
It seems like adding a kernel thread to handle this would work... if we
don't want to wait in sg_release or sg_remove, and we can't do the final
scsi_device_put from sg_cmd_done. I don't know if it's really worth it
though, for something that's probably never going to happen anyway.
I think that in later kernels than the 2.6.9 kernel I'm on,
scsi_device_dev_release can't sleep (not 100% certain though). So maybe
the best solution would be to just change sg_remove so that it doesn't
do the scsi_device_put if there's outstanding IO, and let that happen
from sg_cmd_done. I don't know if there's anything that could go wrong
with that approach...
Nate
> -----Original Message-----
> From: Douglas Gilbert [mailto:dougg@torque.net]
> Sent: Tuesday, October 18, 2005 12:55 AM
> To: Dailey, Nate
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: requesting advice: oops when doing sg_dd IO and
> device is removed
>
>
> Nate,
> I always suspected that if one really tried hard enough
> then unexpectedly closing a file descriptor (e.g. via
> control-C on an app) at the same time as there are
> outstanding commands could cause problems. Obviously
> you have succeeded demonstrating that with your experiment.
>
> When I fought with this problem in the lk 2.4 series,
> someone skilled at finding flaws in locking logic
> suggested there was no solution (at least the way
> I was doing it) ...
>
> Waiting a close() call (i.e. sg_release() ), potentially
> indefinitely, was very unpopular with application clients.
> The sg driver did that in the old days, and I was abused
> from several quarters (one name springs to mind). So I
> tried to design sg_release() so it wouldn't ever hang
> the application client.
>
> On reflection, I think to allow sg_release() to go
> through quickly, another kernel thread is needed that
> is passed ownership of unfinished commands. What do
> you think?
>
>
> Doug Gilbert
>
>
> Dailey, Nate wrote:
> > Doug (and anyone else who might be interested), I'm looking for some
> > advice as to how best to fix a problem I hit in sg.
> >
> > I was doing the following:
> >
> > - start up several processes reading from an sg device with sg_dd
> > - every other iteration, kill the processes
> > - remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
> > - add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)
> >
> > This was on a 2.6.9 kernel, with a patch:
> >
> > http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2
> >
> > That I've submitted to the LSML... patch was intended to
> fix a different
> > oops when doing the above test.
> >
> > Here's what I saw, after the device was removed:
> >
> > Debug: sleeping function called from invalid context at
> > include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
> > [<c011fcd1>] __might_sleep+0x7d/0x88
> > [<c021cb5e>] device_del+0x1e/0x90
> > [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
> > [<c021c8e9>] device_release+0x11/0x40
> > [<c01bf03f>] kobject_cleanup+0x40/0x60
> > [<c01bf05f>] kobject_release+0x0/0x8
> > [<c01bf305>] kref_put+0x42/0x45
> > [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
> > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> > [<c0126424>] __do_softirq+0x4c/0xb1
> > [<c010812f>] do_softirq+0x4f/0x56
> > =======================
> > [<c0107a44>] do_IRQ+0x1a2/0x1ae
> > [<c02d1a8c>] common_interrupt+0x18/0x20
> > [<c0104018>] default_idle+0x0/0x2c
> > [<c0104041>] default_idle+0x29/0x2c
> > [<c010409d>] cpu_idle+0x26/0x3b
> > [<c0390786>] start_kernel+0x199/0x19d
> >
> > (It looks like a command comes back, we drop the last
> reference on the
> > device, the device is freed... because we're in softirq,
> and device_del
> > may eventually sleep (in this kernel version, anyway), we
> get this debug
> > message. I don't think this should normally happen when a
> command comes
> > back, for reasons I'll discuss below)
> >
> > Continuing on...
> >
> > Badness in kref_get at lib/kref.c:32
> > [<c01bf2bb>] kref_get+0x24/0x2c
> > [<c01beffb>] kobject_get+0xf/0x13
> > [<c021cb32>] get_device+0xe/0x14
> > [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
> > [<c022213e>] blk_run_queue+0x20/0x2f
> > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> > [<c0126424>] __do_softirq+0x4c/0xb1
> > [<c010812f>] do_softirq+0x4f/0x56
> > =======================
> > [<c0107a44>] do_IRQ+0x1a2/0x1ae
> > [<c02d1a8c>] common_interrupt+0x18/0x20
> > [<c0104018>] default_idle+0x0/0x2c
> > [<c0104041>] default_idle+0x29/0x2c
> > [<c010409d>] cpu_idle+0x26/0x3b
> > [<c0390786>] start_kernel+0x199/0x19d
> >
> > (So, there were more commands on the device's queue...
> presumably for
> > sg, since I wasn't doing any other IO to this disk. We're
> trying to get
> > a reference on the device, but the ref count has already
> gone to 0 so
> > this "Badness" message results.)
> >
> > Continuing on...
> >
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000008
> > printing eip:
> > c0229678
> > *pde = 00004001
> > Oops: 0000 [#1]
> > SMP
> > Modules linked in: sg(U) parport_pc lp parport autofs4 nfs
> lockd sunrpc
> > md5
> > ip)
> > CPU: 0
> > EIP: 0060:[<c0229678>] Tainted: PF VLI
> > EFLAGS: 00010046 (2.6.9-20.ELsmp)
> > EIP is at cfq_next_request+0x7/0x35
> > eax: f7b6f5e0 ebx: 00000000 ecx: c03249bc edx: f243a594
> > esi: f7b6f5e0 edi: f7f5b000 ebp: f7b6f5e0 esp: c03c7f80
> > ds: 007b es: 007b ss: 0068
> > Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
> > Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400
> f7f5b000 f883ef51
> > f7b6f5e0
> > 00000246 f243a400 00000000 c022213e f7d62800
> f3872800 f883a0a8
> > f7f5b000
> > f883ab15 00002002 f3872800 c03c7fd4 f883aa3a
> c03c7fd4 c03c7fd4
> > 00000003
> > Call Trace:
> > [<c02209dc>] elv_next_request+0xbe/0xce
> > [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
> > [<c022213e>] blk_run_queue+0x20/0x2f
> > [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> > [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> > [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> > [<c0126424>] __do_softirq+0x4c/0xb1
> > [<c010812f>] do_softirq+0x4f/0x56
> > =======================
> > [<c0107a44>] do_IRQ+0x1a2/0x1ae
> > [<c02d1a8c>] common_interrupt+0x18/0x20
> > [<c0104018>] default_idle+0x0/0x2c
> > [<c0104041>] default_idle+0x29/0x2c
> > [<c010409d>] cpu_idle+0x26/0x3b
> > [<c0390786>] start_kernel+0x199/0x19d
> > Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31
> d2 85 ff 0f
> > 95 c2
> >
> > (It was inevitable that something bad would happen when
> trying to run
> > the queue on a device which had been freed)
> >
> >
> > My theory here is that sg should be holding a reference on the SCSI
> > device until all IO comes back. With or without the patch I
> mentioned
> > above, this is not the case. Although sg_release does take
> out an extra
> > reference if there's any IO still outstanding, if sg_remove
> is called,
> > this reference is dropped. So after sg_remove, there could
> still be IO
> > to the device outstanding, but no reference on the device (from sg).
> > This means the device could get freed even though there's
> still some sg
> > IO on the queue.
> >
> > Does this analysis sound reasonable? I'm a bit puzzled by
> the fact that
> > sd (for example) doesn't appear to have this problem (maybe
> sd_release
> > just never gets called until all IO comes back?).
> >
> >
> > So, how to fix this?
> >
> > The first thing I tried was adding a kref to the Sg_device struct,
> > patterned after the reference counting in sd, sr, st. In
> sg_release, I
> > would drop a reference only if there was no IO outstanding. If there
> > _was_ IO outstanding, I would drop the reference in
> sg_cmd_done when the
> > last IO came back. In theory, this would ensure that sg
> kept a reference
> > on the device until all IO came back.
> >
> > First problem: I was protecting the kref with a mutex, as
> sd, sr, etc.
> > do. Trying to down the mutex from sg_cmd_done didn't work
> (as this is
> > done from softirq). Okay, so I took out the mutex, just to see how
> > things would work. Still there was a problem... doing the final
> > scsi_device_put in sg_cmd_done won't work, because
> > scsi_device_dev_release calls device_del, which may sleep (in this
> > kernel version, anyway... I think this has changed in later
> kernels). So
> > it seems trying to drop a reference when the last command
> comes back is
> > out of the question.
> >
> > The next thing I tried was simply sleeping in sg_release until all
> > outstanding IO comes back (changing sg_remove_sfp to return
> a count of
> > outstanding IOs), and only then dropping the device reference (patch
> > below). This seems to be working fine. But I'm wondering if this is
> > really a good solution? I'm not sure I can think of
> anything else... any
> > ideas?
> >
> > Thanks!
> >
> > Nate Dailey
> > Stratus Technologies
> >
> >
> > Here's the "wait in sg_release" patch (which I think is
> still useful,
> > even if it's not strictly required). This is against a
> 2.6.9 kernel, but
> > I can resubmit a patch against a later kernel if desired...
> >
> > --- sg.c.orig 2005-10-13 16:50:00.806798387 -0400
> > +++ sg.c 2005-10-13 16:07:06.365733548 -0400
> > @@ -49,6 +49,7 @@ static int sg_version_num = 30531; /* 2
> > #include <linux/seq_file.h>
> > #include <linux/blkdev.h>
> > #include <linux/delay.h>
> > +#include <linux/kref.h>
> >
> > #include "scsi.h"
> > #include <scsi/scsi_host.h>
> > @@ -173,6 +174,7 @@ typedef struct sg_device { /* holds the
> > char sgdebug; /* 0->off, 1->sense,
> 9->dump dev, 10->
> > all devs */
> > struct gendisk *disk;
> > struct cdev * cdev; /* char_dev [sysfs:
> > /sys/cdev/major/sg<n>] */
> > + struct kref kref;
> > } Sg_device;
> >
> > static int sg_fasync(int fd, struct file *filp, int mode);
> > @@ -218,11 +220,61 @@ static Sg_device **sg_dev_arr = NULL;
> > static int sg_dev_max;
> > static int sg_nr_dev;
> >
> > +/* This semaphore is used to mediate the 0->1 reference get in the
> > + * face of object destruction (i.e. we can't allow a get on an
> > + * object after last put) */
> > +static DECLARE_MUTEX(sg_ref_sem);
> > +
> > +static void scsi_generic_release(struct kref *kref);
> > +#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
> > +
> > #define SZ_SG_HEADER sizeof(struct sg_header)
> > #define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
> > #define SZ_SG_IOVEC sizeof(sg_iovec_t)
> > #define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
> >
> > +static Sg_device *scsi_generic_get(int dev)
> > +{
> > + Sg_device *sdp = NULL;
> > +
> > + down(&sg_ref_sem);
> > +
> > + sdp = sg_get_dev (dev);
> > + if (!sdp) goto out;
> > +
> > + if (sdp->detached) {
> > + sdp = NULL;
> > + goto out;
> > + }
> > +
> > + kref_get(&sdp->kref);
> > +
> > + if (!sdp->device)
> > + goto out_put;
> > +
> > + if (scsi_device_get(sdp->device))
> > + goto out_put;
> > +
> > + goto out;
> > +
> > +out_put:
> > + kref_put(&sdp->kref, scsi_generic_release);
> > + sdp = NULL;
> > +out:
> > + up(&sg_ref_sem);
> > + return sdp;
> > +}
> > +
> > +static void scsi_generic_put(Sg_device *sdp)
> > +{
> > + struct scsi_device *sdev = sdp->device;
> > +
> > + down(&sg_ref_sem);
> > + kref_put(&sdp->kref, scsi_generic_release);
> > + scsi_device_put(sdev);
> > + up(&sg_ref_sem);
> > +}
> > +
> > static int
> > sg_open(struct inode *inode, struct file *filp)
> > {
> > @@ -235,17 +287,8 @@ sg_open(struct inode *inode, struct file
> >
> > nonseekable_open(inode, filp);
> > SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d,
> flags=0x%x\n", dev,
> > flags));
> > - sdp = sg_get_dev(dev);
> > - if ((!sdp) || (!sdp->device))
> > + if (!(sdp = scsi_generic_get(dev)))
> > return -ENXIO;
> > - if (sdp->detached)
> > - return -ENODEV;
> > -
> > - /* This driver's module count bumped by fops_get in
> <linux/fs.h>
> > */
> > - /* Prevent the device driver from vanishing while
> we sleep */
> > - retval = scsi_device_get(sdp->device);
> > - if (retval)
> > - return retval;
> >
> > if (!((flags & O_NONBLOCK) ||
> > scsi_block_when_processing_errors(sdp->device))) {
> > @@ -302,7 +345,7 @@ sg_open(struct inode *inode, struct file
> > return 0;
> >
> > error_out:
> > - scsi_device_put(sdp->device);
> > + scsi_generic_put(sdp);
> > return retval;
> > }
> >
> > @@ -317,13 +360,14 @@ sg_release(struct inode *inode, struct f
> > return -ENXIO;
> > SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n",
> > sdp->disk->disk_name));
> > sg_fasync(-1, filp, 0); /* remove filp from async
> notification
> > list */
> > - if (0 == sg_remove_sfp(sdp, sfp)) { /* Returns
> 1 when sdp
> > gone */
> > - if (!sdp->detached) {
> > - scsi_device_put(sdp->device);
> > - }
> > - sdp->exclude = 0;
> > - wake_up_interruptible(&sdp->o_excl_wait);
> > - }
> > +
> > + /* Don't release the device until all IO comes back. */
> > + while (sg_remove_sfp(sdp, sfp)){
> > + msleep(10);
> > + }
> > + sdp->exclude = 0;
> > + wake_up_interruptible(&sdp->o_excl_wait);
> > + scsi_generic_put(sdp);
> > return 0;
> > }
> >
> > @@ -1238,7 +1285,7 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
> > sfp = srp->parentfp;
> > if (sfp)
> > sdp = sfp->parentdp;
> > - if ((NULL == sdp) || sdp->detached) {
> > + if (NULL == sdp) {
> > printk(KERN_INFO "sg_cmd_done: device detached\n");
> > scsi_release_request(SRpnt);
> > return;
> > @@ -1292,18 +1339,8 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
> > scsi_release_request(SRpnt);
> > SRpnt = NULL;
> > - if (sfp->closed) { /* whoops this fd already released,
> > cleanup */
> > - SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done:
> already closed,
> > freeing ...\n"));
> > - sg_finish_rem_req(srp);
> > - srp = NULL;
> > - if (NULL == sfp->headrp) {
> > - SCSI_LOG_TIMEOUT(1, printk("sg...bh: already
> > closed, final cleanup\n"));
> > - if (0 == sg_remove_sfp(sdp, sfp)) { /*
> > device still present */
> > - scsi_device_put(sdp->device);
> > - }
> > - sfp = NULL;
> > - }
> > - } else if (srp && srp->orphan) {
> > +
> > + if (srp && srp->orphan) {
> > if (sfp->keep_orphan)
> > srp->sg_io_owned = 0;
> > else {
> > @@ -1441,6 +1478,7 @@ sg_add(struct class_device *cl_dev)
> > }
> > k = error;
> > sdp = sg_dev_arr[k];
> > + kref_init(&sdp->kref);
> >
> > devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
> > S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
> > @@ -1493,45 +1531,33 @@ sg_remove(struct class_device *cl_dev)
> > unsigned long iflags;
> > Sg_fd *sfp;
> > Sg_fd *tsfp;
> > - Sg_request *srp;
> > - Sg_request *tsrp;
> > - int k, delay;
> > + int k;
> >
> > if (NULL == sg_dev_arr)
> > return;
> > - delay = 0;
> > write_lock_irqsave(&sg_dev_arr_lock, iflags);
> > for (k = 0; k < sg_dev_max; k++) {
> > - sdp = sg_dev_arr[k];
> > - if ((NULL == sdp) || (sdp->device != scsidp))
> > + if ((NULL == sg_dev_arr[k]) ||
> > + (sg_dev_arr[k]->device != scsidp))
> > continue; /* dirty but lowers
> nesting */
> > + sdp = sg_dev_arr[k];
> > + sdp->detached = 1;
> > +
> > if (sdp->headfp) {
> > - sdp->detached = 1;
> > for (sfp = sdp->headfp; sfp; sfp = tsfp) {
> > tsfp = sfp->nextfp;
> > - for (srp = sfp->headrp;
> srp; srp = tsrp)
> > {
> > - tsrp = srp->nextrp;
> > - if (sfp->closed || (0 ==
> > sg_srp_done(srp, sfp)))
> > -
> sg_finish_rem_req(srp);
> > - }
> > - if (sfp->closed) {
> > -
> scsi_device_put(sdp->device);
> > - __sg_remove_sfp(sdp, sfp);
> > - } else {
> > - delay = 1;
> > + if (!(sfp->closed)) {
> >
> > wake_up_interruptible(&sfp->read_wait);
> > kill_fasync(&sfp->async_qp,
> > SIGPOLL,
> > POLL_HUP);
> > }
> > }
> > SCSI_LOG_TIMEOUT(3,
> printk("sg_detach: dev=%d,
> > dirty\n", k));
> > - if (NULL == sdp->headfp) {
> > - sg_dev_arr[k] = NULL;
> > - }
> > } else { /* nothing active, simple case */
> > SCSI_LOG_TIMEOUT(3, printk("sg_detach:
> > dev=%d\n", k));
> > - sg_dev_arr[k] = NULL;
> > }
> > +
> > + sg_dev_arr[k] = NULL;
> > sg_nr_dev--;
> > break;
> > }
> > @@ -1543,14 +1569,32 @@ sg_remove(struct class_device *cl_dev)
> > cdev_del(sdp->cdev);
> > sdp->cdev = NULL;
> > devfs_remove("%s/generic", scsidp->devfs_name);
> > - put_disk(sdp->disk);
> > - sdp->disk = NULL;
> > - if (NULL == sdp->headfp)
> > - kfree((char *) sdp);
> > +
> > + down(&sg_ref_sem);
> > + kref_put(&sdp->kref, scsi_generic_release);
> > + up(&sg_ref_sem);
> > }
> > +}
> > +
> > +/**
> > + * scsi_generic_release - Called to free the
> Sg_device structure
> > + * @kref: pointer to embedded kref
> > + *
> > + * sg_ref_sem must be held entering this routine.
> Because it is
> > + * called on last put, you should always use the
> > scsi_generic_get()
> > + * and scsi_generic_put() helpers which manipulate
> the semaphore
> > + * directly and never do a direct kref_put().
> > + **/
> > +static void scsi_generic_release(struct kref *kref)
> > +{
> > + Sg_device *sdp = to_scsi_generic(kref);
> > + struct gendisk *disk = sdp->disk;
> > +
> > + disk->private_data = NULL;
> > + put_disk(disk);
> >
> > - if (delay)
> > - msleep(10); /* dirty detach so delay device
> > destruction */
> > + kfree((char *) sdp);
> > + return;
> > }
> >
> > /* Set 'perm' (4th argument) to 0 to disable
> module_param's definition
> > @@ -2484,14 +2528,13 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
> > sg_page_free((char *) sfp, sizeof (Sg_fd));
> > }
> >
> > -/* Returns 0 in normal case, 1 when detached and sdp
> object removed */
> > +/* Returns a count of IOs still in progress. */
> > static int
> > sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
> > {
> > Sg_request *srp;
> > Sg_request *tsrp;
> > int dirty = 0;
> > - int res = 0;
> >
> > for (srp = sfp->headrp; srp; srp = tsrp) {
> > tsrp = srp->nextrp;
> > @@ -2505,30 +2548,16 @@ sg_remove_sfp(Sg_device * sdp, Sg_fd * s
> >
> > write_lock_irqsave(&sg_dev_arr_lock, iflags);
> > __sg_remove_sfp(sdp, sfp);
> > - if (sdp->detached && (NULL == sdp->headfp)) {
> > - int k, maxd;
> > -
> > - maxd = sg_dev_max;
> > - for (k = 0; k < maxd; ++k) {
> > - if (sdp == sg_dev_arr[k])
> > - break;
> > - }
> > - if (k < maxd)
> > - sg_dev_arr[k] = NULL;
> > - kfree((char *) sdp);
> > - res = 1;
> > - }
> > write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
> > } else {
> > - /* MOD_INC's to inhibit unloading sg and associated
> > adapter driver */
> > - /* only bump the access_count if we
> actually succeeded
> > in
> > - * throwing another counter on the host module */
> > - scsi_device_get(sdp->device); /* XXX:
> retval ignored?
> > */
> > + /* The non-zero dirty count will be
> returned. This will
> > tell
> > + the caller not to drop the reference on
> this device.
> > */
> > +
> > sfp->closed = 1; /* flag dirty state
> on this fd
> > */
> > SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp:
> worrisome, %d
> > writes pending\n",
> > dirty));
> > }
> > - return res;
> > + return dirty;
> > }
> >
> > static int
> > -
> > To unsubscribe from this list: send the line "unsubscribe
> linux-scsi" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread* requesting advice: oops when doing sg_dd IO and device is removed
@ 2005-10-17 13:33 Dailey, Nate
2005-10-18 4:54 ` Douglas Gilbert
0 siblings, 1 reply; 6+ messages in thread
From: Dailey, Nate @ 2005-10-17 13:33 UTC (permalink / raw)
To: dgilbert; +Cc: linux-scsi
Doug (and anyone else who might be interested), I'm looking for some
advice as to how best to fix a problem I hit in sg.
I was doing the following:
- start up several processes reading from an sg device with sg_dd
- every other iteration, kill the processes
- remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
- add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)
This was on a 2.6.9 kernel, with a patch:
http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2
That I've submitted to the LSML... patch was intended to fix a different
oops when doing the above test.
Here's what I saw, after the device was removed:
Debug: sleeping function called from invalid context at
include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
[<c011fcd1>] __might_sleep+0x7d/0x88
[<c021cb5e>] device_del+0x1e/0x90
[<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
[<c021c8e9>] device_release+0x11/0x40
[<c01bf03f>] kobject_cleanup+0x40/0x60
[<c01bf05f>] kobject_release+0x0/0x8
[<c01bf305>] kref_put+0x42/0x45
[<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
[<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
[<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
[<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
[<c0126424>] __do_softirq+0x4c/0xb1
[<c010812f>] do_softirq+0x4f/0x56
=======================
[<c0107a44>] do_IRQ+0x1a2/0x1ae
[<c02d1a8c>] common_interrupt+0x18/0x20
[<c0104018>] default_idle+0x0/0x2c
[<c0104041>] default_idle+0x29/0x2c
[<c010409d>] cpu_idle+0x26/0x3b
[<c0390786>] start_kernel+0x199/0x19d
(It looks like a command comes back, we drop the last reference on the
device, the device is freed... because we're in softirq, and device_del
may eventually sleep (in this kernel version, anyway), we get this debug
message. I don't think this should normally happen when a command comes
back, for reasons I'll discuss below)
Continuing on...
Badness in kref_get at lib/kref.c:32
[<c01bf2bb>] kref_get+0x24/0x2c
[<c01beffb>] kobject_get+0xf/0x13
[<c021cb32>] get_device+0xe/0x14
[<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
[<c022213e>] blk_run_queue+0x20/0x2f
[<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
[<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
[<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
[<c0126424>] __do_softirq+0x4c/0xb1
[<c010812f>] do_softirq+0x4f/0x56
=======================
[<c0107a44>] do_IRQ+0x1a2/0x1ae
[<c02d1a8c>] common_interrupt+0x18/0x20
[<c0104018>] default_idle+0x0/0x2c
[<c0104041>] default_idle+0x29/0x2c
[<c010409d>] cpu_idle+0x26/0x3b
[<c0390786>] start_kernel+0x199/0x19d
(So, there were more commands on the device's queue... presumably for
sg, since I wasn't doing any other IO to this disk. We're trying to get
a reference on the device, but the ref count has already gone to 0 so
this "Badness" message results.)
Continuing on...
Unable to handle kernel NULL pointer dereference at virtual address
00000008
printing eip:
c0229678
*pde = 00004001
Oops: 0000 [#1]
SMP
Modules linked in: sg(U) parport_pc lp parport autofs4 nfs lockd sunrpc
md5
ip)
CPU: 0
EIP: 0060:[<c0229678>] Tainted: PF VLI
EFLAGS: 00010046 (2.6.9-20.ELsmp)
EIP is at cfq_next_request+0x7/0x35
eax: f7b6f5e0 ebx: 00000000 ecx: c03249bc edx: f243a594
esi: f7b6f5e0 edi: f7f5b000 ebp: f7b6f5e0 esp: c03c7f80
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400 f7f5b000 f883ef51
f7b6f5e0
00000246 f243a400 00000000 c022213e f7d62800 f3872800 f883a0a8
f7f5b000
f883ab15 00002002 f3872800 c03c7fd4 f883aa3a c03c7fd4 c03c7fd4
00000003
Call Trace:
[<c02209dc>] elv_next_request+0xbe/0xce
[<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
[<c022213e>] blk_run_queue+0x20/0x2f
[<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
[<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
[<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
[<c0126424>] __do_softirq+0x4c/0xb1
[<c010812f>] do_softirq+0x4f/0x56
=======================
[<c0107a44>] do_IRQ+0x1a2/0x1ae
[<c02d1a8c>] common_interrupt+0x18/0x20
[<c0104018>] default_idle+0x0/0x2c
[<c0104041>] default_idle+0x29/0x2c
[<c010409d>] cpu_idle+0x26/0x3b
[<c0390786>] start_kernel+0x199/0x19d
Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31 d2 85 ff 0f
95 c2
(It was inevitable that something bad would happen when trying to run
the queue on a device which had been freed)
My theory here is that sg should be holding a reference on the SCSI
device until all IO comes back. With or without the patch I mentioned
above, this is not the case. Although sg_release does take out an extra
reference if there's any IO still outstanding, if sg_remove is called,
this reference is dropped. So after sg_remove, there could still be IO
to the device outstanding, but no reference on the device (from sg).
This means the device could get freed even though there's still some sg
IO on the queue.
Does this analysis sound reasonable? I'm a bit puzzled by the fact that
sd (for example) doesn't appear to have this problem (maybe sd_release
just never gets called until all IO comes back?).
So, how to fix this?
The first thing I tried was adding a kref to the Sg_device struct,
patterned after the reference counting in sd, sr, st. In sg_release, I
would drop a reference only if there was no IO outstanding. If there
_was_ IO outstanding, I would drop the reference in sg_cmd_done when the
last IO came back. In theory, this would ensure that sg kept a reference
on the device until all IO came back.
First problem: I was protecting the kref with a mutex, as sd, sr, etc.
do. Trying to down the mutex from sg_cmd_done didn't work (as this is
done from softirq). Okay, so I took out the mutex, just to see how
things would work. Still there was a problem... doing the final
scsi_device_put in sg_cmd_done won't work, because
scsi_device_dev_release calls device_del, which may sleep (in this
kernel version, anyway... I think this has changed in later kernels). So
it seems trying to drop a reference when the last command comes back is
out of the question.
The next thing I tried was simply sleeping in sg_release until all
outstanding IO comes back (changing sg_remove_sfp to return a count of
outstanding IOs), and only then dropping the device reference (patch
below). This seems to be working fine. But I'm wondering if this is
really a good solution? I'm not sure I can think of anything else... any
ideas?
Thanks!
Nate Dailey
Stratus Technologies
Here's the "wait in sg_release" patch (which I think is still useful,
even if it's not strictly required). This is against a 2.6.9 kernel, but
I can resubmit a patch against a later kernel if desired...
--- sg.c.orig 2005-10-13 16:50:00.806798387 -0400
+++ sg.c 2005-10-13 16:07:06.365733548 -0400
@@ -49,6 +49,7 @@ static int sg_version_num = 30531; /* 2
#include <linux/seq_file.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
+#include <linux/kref.h>
#include "scsi.h"
#include <scsi/scsi_host.h>
@@ -173,6 +174,7 @@ typedef struct sg_device { /* holds the
char sgdebug; /* 0->off, 1->sense, 9->dump dev, 10->
all devs */
struct gendisk *disk;
struct cdev * cdev; /* char_dev [sysfs:
/sys/cdev/major/sg<n>] */
+ struct kref kref;
} Sg_device;
static int sg_fasync(int fd, struct file *filp, int mode);
@@ -218,11 +220,61 @@ static Sg_device **sg_dev_arr = NULL;
static int sg_dev_max;
static int sg_nr_dev;
+/* This semaphore is used to mediate the 0->1 reference get in the
+ * face of object destruction (i.e. we can't allow a get on an
+ * object after last put) */
+static DECLARE_MUTEX(sg_ref_sem);
+
+static void scsi_generic_release(struct kref *kref);
+#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
+
#define SZ_SG_HEADER sizeof(struct sg_header)
#define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
#define SZ_SG_IOVEC sizeof(sg_iovec_t)
#define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
+static Sg_device *scsi_generic_get(int dev)
+{
+ Sg_device *sdp = NULL;
+
+ down(&sg_ref_sem);
+
+ sdp = sg_get_dev (dev);
+ if (!sdp) goto out;
+
+ if (sdp->detached) {
+ sdp = NULL;
+ goto out;
+ }
+
+ kref_get(&sdp->kref);
+
+ if (!sdp->device)
+ goto out_put;
+
+ if (scsi_device_get(sdp->device))
+ goto out_put;
+
+ goto out;
+
+out_put:
+ kref_put(&sdp->kref, scsi_generic_release);
+ sdp = NULL;
+out:
+ up(&sg_ref_sem);
+ return sdp;
+}
+
+static void scsi_generic_put(Sg_device *sdp)
+{
+ struct scsi_device *sdev = sdp->device;
+
+ down(&sg_ref_sem);
+ kref_put(&sdp->kref, scsi_generic_release);
+ scsi_device_put(sdev);
+ up(&sg_ref_sem);
+}
+
static int
sg_open(struct inode *inode, struct file *filp)
{
@@ -235,17 +287,8 @@ sg_open(struct inode *inode, struct file
nonseekable_open(inode, filp);
SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev,
flags));
- sdp = sg_get_dev(dev);
- if ((!sdp) || (!sdp->device))
+ if (!(sdp = scsi_generic_get(dev)))
return -ENXIO;
- if (sdp->detached)
- return -ENODEV;
-
- /* This driver's module count bumped by fops_get in <linux/fs.h>
*/
- /* Prevent the device driver from vanishing while we sleep */
- retval = scsi_device_get(sdp->device);
- if (retval)
- return retval;
if (!((flags & O_NONBLOCK) ||
scsi_block_when_processing_errors(sdp->device))) {
@@ -302,7 +345,7 @@ sg_open(struct inode *inode, struct file
return 0;
error_out:
- scsi_device_put(sdp->device);
+ scsi_generic_put(sdp);
return retval;
}
@@ -317,13 +360,14 @@ sg_release(struct inode *inode, struct f
return -ENXIO;
SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n",
sdp->disk->disk_name));
sg_fasync(-1, filp, 0); /* remove filp from async notification
list */
- if (0 == sg_remove_sfp(sdp, sfp)) { /* Returns 1 when sdp
gone */
- if (!sdp->detached) {
- scsi_device_put(sdp->device);
- }
- sdp->exclude = 0;
- wake_up_interruptible(&sdp->o_excl_wait);
- }
+
+ /* Don't release the device until all IO comes back. */
+ while (sg_remove_sfp(sdp, sfp)){
+ msleep(10);
+ }
+ sdp->exclude = 0;
+ wake_up_interruptible(&sdp->o_excl_wait);
+ scsi_generic_put(sdp);
return 0;
}
@@ -1238,7 +1285,7 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
sfp = srp->parentfp;
if (sfp)
sdp = sfp->parentdp;
- if ((NULL == sdp) || sdp->detached) {
+ if (NULL == sdp) {
printk(KERN_INFO "sg_cmd_done: device detached\n");
scsi_release_request(SRpnt);
return;
@@ -1292,18 +1339,8 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
scsi_release_request(SRpnt);
SRpnt = NULL;
- if (sfp->closed) { /* whoops this fd already released,
cleanup */
- SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed,
freeing ...\n"));
- sg_finish_rem_req(srp);
- srp = NULL;
- if (NULL == sfp->headrp) {
- SCSI_LOG_TIMEOUT(1, printk("sg...bh: already
closed, final cleanup\n"));
- if (0 == sg_remove_sfp(sdp, sfp)) { /*
device still present */
- scsi_device_put(sdp->device);
- }
- sfp = NULL;
- }
- } else if (srp && srp->orphan) {
+
+ if (srp && srp->orphan) {
if (sfp->keep_orphan)
srp->sg_io_owned = 0;
else {
@@ -1441,6 +1478,7 @@ sg_add(struct class_device *cl_dev)
}
k = error;
sdp = sg_dev_arr[k];
+ kref_init(&sdp->kref);
devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
@@ -1493,45 +1531,33 @@ sg_remove(struct class_device *cl_dev)
unsigned long iflags;
Sg_fd *sfp;
Sg_fd *tsfp;
- Sg_request *srp;
- Sg_request *tsrp;
- int k, delay;
+ int k;
if (NULL == sg_dev_arr)
return;
- delay = 0;
write_lock_irqsave(&sg_dev_arr_lock, iflags);
for (k = 0; k < sg_dev_max; k++) {
- sdp = sg_dev_arr[k];
- if ((NULL == sdp) || (sdp->device != scsidp))
+ if ((NULL == sg_dev_arr[k]) ||
+ (sg_dev_arr[k]->device != scsidp))
continue; /* dirty but lowers nesting */
+ sdp = sg_dev_arr[k];
+ sdp->detached = 1;
+
if (sdp->headfp) {
- sdp->detached = 1;
for (sfp = sdp->headfp; sfp; sfp = tsfp) {
tsfp = sfp->nextfp;
- for (srp = sfp->headrp; srp; srp = tsrp)
{
- tsrp = srp->nextrp;
- if (sfp->closed || (0 ==
sg_srp_done(srp, sfp)))
- sg_finish_rem_req(srp);
- }
- if (sfp->closed) {
- scsi_device_put(sdp->device);
- __sg_remove_sfp(sdp, sfp);
- } else {
- delay = 1;
+ if (!(sfp->closed)) {
wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp,
SIGPOLL,
POLL_HUP);
}
}
SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d,
dirty\n", k));
- if (NULL == sdp->headfp) {
- sg_dev_arr[k] = NULL;
- }
} else { /* nothing active, simple case */
SCSI_LOG_TIMEOUT(3, printk("sg_detach:
dev=%d\n", k));
- sg_dev_arr[k] = NULL;
}
+
+ sg_dev_arr[k] = NULL;
sg_nr_dev--;
break;
}
@@ -1543,14 +1569,32 @@ sg_remove(struct class_device *cl_dev)
cdev_del(sdp->cdev);
sdp->cdev = NULL;
devfs_remove("%s/generic", scsidp->devfs_name);
- put_disk(sdp->disk);
- sdp->disk = NULL;
- if (NULL == sdp->headfp)
- kfree((char *) sdp);
+
+ down(&sg_ref_sem);
+ kref_put(&sdp->kref, scsi_generic_release);
+ up(&sg_ref_sem);
}
+}
+
+/**
+ * scsi_generic_release - Called to free the Sg_device structure
+ * @kref: pointer to embedded kref
+ *
+ * sg_ref_sem must be held entering this routine. Because it is
+ * called on last put, you should always use the
scsi_generic_get()
+ * and scsi_generic_put() helpers which manipulate the semaphore
+ * directly and never do a direct kref_put().
+ **/
+static void scsi_generic_release(struct kref *kref)
+{
+ Sg_device *sdp = to_scsi_generic(kref);
+ struct gendisk *disk = sdp->disk;
+
+ disk->private_data = NULL;
+ put_disk(disk);
- if (delay)
- msleep(10); /* dirty detach so delay device
destruction */
+ kfree((char *) sdp);
+ return;
}
/* Set 'perm' (4th argument) to 0 to disable module_param's definition
@@ -2484,14 +2528,13 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
sg_page_free((char *) sfp, sizeof (Sg_fd));
}
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
+/* Returns a count of IOs still in progress. */
static int
sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
{
Sg_request *srp;
Sg_request *tsrp;
int dirty = 0;
- int res = 0;
for (srp = sfp->headrp; srp; srp = tsrp) {
tsrp = srp->nextrp;
@@ -2505,30 +2548,16 @@ sg_remove_sfp(Sg_device * sdp, Sg_fd * s
write_lock_irqsave(&sg_dev_arr_lock, iflags);
__sg_remove_sfp(sdp, sfp);
- if (sdp->detached && (NULL == sdp->headfp)) {
- int k, maxd;
-
- maxd = sg_dev_max;
- for (k = 0; k < maxd; ++k) {
- if (sdp == sg_dev_arr[k])
- break;
- }
- if (k < maxd)
- sg_dev_arr[k] = NULL;
- kfree((char *) sdp);
- res = 1;
- }
write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
} else {
- /* MOD_INC's to inhibit unloading sg and associated
adapter driver */
- /* only bump the access_count if we actually succeeded
in
- * throwing another counter on the host module */
- scsi_device_get(sdp->device); /* XXX: retval ignored?
*/
+ /* The non-zero dirty count will be returned. This will
tell
+ the caller not to drop the reference on this device.
*/
+
sfp->closed = 1; /* flag dirty state on this fd
*/
SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d
writes pending\n",
dirty));
}
- return res;
+ return dirty;
}
static int
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: requesting advice: oops when doing sg_dd IO and device is removed
2005-10-17 13:33 Dailey, Nate
@ 2005-10-18 4:54 ` Douglas Gilbert
2005-10-18 22:02 ` Tony Battersby
0 siblings, 1 reply; 6+ messages in thread
From: Douglas Gilbert @ 2005-10-18 4:54 UTC (permalink / raw)
To: Dailey, Nate; +Cc: linux-scsi
Nate,
I always suspected that if one really tried hard enough
then unexpectedly closing a file descriptor (e.g. via
control-C on an app) at the same time as there are
outstanding commands could cause problems. Obviously
you have succeeded demonstrating that with your experiment.
When I fought with this problem in the lk 2.4 series,
someone skilled at finding flaws in locking logic
suggested there was no solution (at least the way
I was doing it) ...
Waiting a close() call (i.e. sg_release() ), potentially
indefinitely, was very unpopular with application clients.
The sg driver did that in the old days, and I was abused
from several quarters (one name springs to mind). So I
tried to design sg_release() so it wouldn't ever hang
the application client.
On reflection, I think to allow sg_release() to go
through quickly, another kernel thread is needed that
is passed ownership of unfinished commands. What do
you think?
Doug Gilbert
Dailey, Nate wrote:
> Doug (and anyone else who might be interested), I'm looking for some
> advice as to how best to fix a problem I hit in sg.
>
> I was doing the following:
>
> - start up several processes reading from an sg device with sg_dd
> - every other iteration, kill the processes
> - remove the device (echo 1 > /sys/bus/scsi/devices/1:0:1:0/delete)
> - add the device (echo "0 1 0" > /sys/class/scsi_host/host1/scan)
>
> This was on a 2.6.9 kernel, with a patch:
>
> http://marc.theaimsgroup.com/?l=linux-scsi&m=112749860222533&w=2
>
> That I've submitted to the LSML... patch was intended to fix a different
> oops when doing the above test.
>
> Here's what I saw, after the device was removed:
>
> Debug: sleeping function called from invalid context at
> include/linux/rwsem.h:6in_atomic():1[expected: 0], irqs_disabled():0
> [<c011fcd1>] __might_sleep+0x7d/0x88
> [<c021cb5e>] device_del+0x1e/0x90
> [<f8840ccc>] scsi_device_dev_release+0xdf/0x13a [scsi_mod]
> [<c021c8e9>] device_release+0x11/0x40
> [<c01bf03f>] kobject_cleanup+0x40/0x60
> [<c01bf05f>] kobject_release+0x0/0x8
> [<c01bf305>] kref_put+0x42/0x45
> [<f883e61b>] scsi_next_command+0xc/0x14 [scsi_mod]
> [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> [<c0126424>] __do_softirq+0x4c/0xb1
> [<c010812f>] do_softirq+0x4f/0x56
> =======================
> [<c0107a44>] do_IRQ+0x1a2/0x1ae
> [<c02d1a8c>] common_interrupt+0x18/0x20
> [<c0104018>] default_idle+0x0/0x2c
> [<c0104041>] default_idle+0x29/0x2c
> [<c010409d>] cpu_idle+0x26/0x3b
> [<c0390786>] start_kernel+0x199/0x19d
>
> (It looks like a command comes back, we drop the last reference on the
> device, the device is freed... because we're in softirq, and device_del
> may eventually sleep (in this kernel version, anyway), we get this debug
> message. I don't think this should normally happen when a command comes
> back, for reasons I'll discuss below)
>
> Continuing on...
>
> Badness in kref_get at lib/kref.c:32
> [<c01bf2bb>] kref_get+0x24/0x2c
> [<c01beffb>] kobject_get+0xf/0x13
> [<c021cb32>] get_device+0xe/0x14
> [<f883ef34>] scsi_request_fn+0x19/0x319 [scsi_mod]
> [<c022213e>] blk_run_queue+0x20/0x2f
> [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> [<c0126424>] __do_softirq+0x4c/0xb1
> [<c010812f>] do_softirq+0x4f/0x56
> =======================
> [<c0107a44>] do_IRQ+0x1a2/0x1ae
> [<c02d1a8c>] common_interrupt+0x18/0x20
> [<c0104018>] default_idle+0x0/0x2c
> [<c0104041>] default_idle+0x29/0x2c
> [<c010409d>] cpu_idle+0x26/0x3b
> [<c0390786>] start_kernel+0x199/0x19d
>
> (So, there were more commands on the device's queue... presumably for
> sg, since I wasn't doing any other IO to this disk. We're trying to get
> a reference on the device, but the ref count has already gone to 0 so
> this "Badness" message results.)
>
> Continuing on...
>
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000008
> printing eip:
> c0229678
> *pde = 00004001
> Oops: 0000 [#1]
> SMP
> Modules linked in: sg(U) parport_pc lp parport autofs4 nfs lockd sunrpc
> md5
> ip)
> CPU: 0
> EIP: 0060:[<c0229678>] Tainted: PF VLI
> EFLAGS: 00010046 (2.6.9-20.ELsmp)
> EIP is at cfq_next_request+0x7/0x35
> eax: f7b6f5e0 ebx: 00000000 ecx: c03249bc edx: f243a594
> esi: f7b6f5e0 edi: f7f5b000 ebp: f7b6f5e0 esp: c03c7f80
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c03c7000 task=c031ea80)
> Stack: f7b6f5e0 f7b6f5e0 c02209dc f7b6f5e0 f243a400 f7f5b000 f883ef51
> f7b6f5e0
> 00000246 f243a400 00000000 c022213e f7d62800 f3872800 f883a0a8
> f7f5b000
> f883ab15 00002002 f3872800 c03c7fd4 f883aa3a c03c7fd4 c03c7fd4
> 00000003
> Call Trace:
> [<c02209dc>] elv_next_request+0xbe/0xce
> [<f883ef51>] scsi_request_fn+0x36/0x319 [scsi_mod]
> [<c022213e>] blk_run_queue+0x20/0x2f
> [<f883a0a8>] scsi_release_request+0x8/0x10 [scsi_mod]
> [<f883ab15>] scsi_finish_command+0xad/0xb1 [scsi_mod]
> [<f883aa3a>] scsi_softirq+0xb6/0xbe [scsi_mod]
> [<c0126424>] __do_softirq+0x4c/0xb1
> [<c010812f>] do_softirq+0x4f/0x56
> =======================
> [<c0107a44>] do_IRQ+0x1a2/0x1ae
> [<c02d1a8c>] common_interrupt+0x18/0x20
> [<c0104018>] default_idle+0x0/0x2c
> [<c0104041>] default_idle+0x29/0x2c
> [<c010409d>] cpu_idle+0x26/0x3b
> [<c0390786>] start_kernel+0x199/0x19d
> Code: 01 00 00 00 89 e8 eb b6 8b 04 24 3b 43 24 0f 92 c0 31 d2 85 ff 0f
> 95 c2
>
> (It was inevitable that something bad would happen when trying to run
> the queue on a device which had been freed)
>
>
> My theory here is that sg should be holding a reference on the SCSI
> device until all IO comes back. With or without the patch I mentioned
> above, this is not the case. Although sg_release does take out an extra
> reference if there's any IO still outstanding, if sg_remove is called,
> this reference is dropped. So after sg_remove, there could still be IO
> to the device outstanding, but no reference on the device (from sg).
> This means the device could get freed even though there's still some sg
> IO on the queue.
>
> Does this analysis sound reasonable? I'm a bit puzzled by the fact that
> sd (for example) doesn't appear to have this problem (maybe sd_release
> just never gets called until all IO comes back?).
>
>
> So, how to fix this?
>
> The first thing I tried was adding a kref to the Sg_device struct,
> patterned after the reference counting in sd, sr, st. In sg_release, I
> would drop a reference only if there was no IO outstanding. If there
> _was_ IO outstanding, I would drop the reference in sg_cmd_done when the
> last IO came back. In theory, this would ensure that sg kept a reference
> on the device until all IO came back.
>
> First problem: I was protecting the kref with a mutex, as sd, sr, etc.
> do. Trying to down the mutex from sg_cmd_done didn't work (as this is
> done from softirq). Okay, so I took out the mutex, just to see how
> things would work. Still there was a problem... doing the final
> scsi_device_put in sg_cmd_done won't work, because
> scsi_device_dev_release calls device_del, which may sleep (in this
> kernel version, anyway... I think this has changed in later kernels). So
> it seems trying to drop a reference when the last command comes back is
> out of the question.
>
> The next thing I tried was simply sleeping in sg_release until all
> outstanding IO comes back (changing sg_remove_sfp to return a count of
> outstanding IOs), and only then dropping the device reference (patch
> below). This seems to be working fine. But I'm wondering if this is
> really a good solution? I'm not sure I can think of anything else... any
> ideas?
>
> Thanks!
>
> Nate Dailey
> Stratus Technologies
>
>
> Here's the "wait in sg_release" patch (which I think is still useful,
> even if it's not strictly required). This is against a 2.6.9 kernel, but
> I can resubmit a patch against a later kernel if desired...
>
> --- sg.c.orig 2005-10-13 16:50:00.806798387 -0400
> +++ sg.c 2005-10-13 16:07:06.365733548 -0400
> @@ -49,6 +49,7 @@ static int sg_version_num = 30531; /* 2
> #include <linux/seq_file.h>
> #include <linux/blkdev.h>
> #include <linux/delay.h>
> +#include <linux/kref.h>
>
> #include "scsi.h"
> #include <scsi/scsi_host.h>
> @@ -173,6 +174,7 @@ typedef struct sg_device { /* holds the
> char sgdebug; /* 0->off, 1->sense, 9->dump dev, 10->
> all devs */
> struct gendisk *disk;
> struct cdev * cdev; /* char_dev [sysfs:
> /sys/cdev/major/sg<n>] */
> + struct kref kref;
> } Sg_device;
>
> static int sg_fasync(int fd, struct file *filp, int mode);
> @@ -218,11 +220,61 @@ static Sg_device **sg_dev_arr = NULL;
> static int sg_dev_max;
> static int sg_nr_dev;
>
> +/* This semaphore is used to mediate the 0->1 reference get in the
> + * face of object destruction (i.e. we can't allow a get on an
> + * object after last put) */
> +static DECLARE_MUTEX(sg_ref_sem);
> +
> +static void scsi_generic_release(struct kref *kref);
> +#define to_scsi_generic(obj) container_of(obj,Sg_device,kref)
> +
> #define SZ_SG_HEADER sizeof(struct sg_header)
> #define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
> #define SZ_SG_IOVEC sizeof(sg_iovec_t)
> #define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
>
> +static Sg_device *scsi_generic_get(int dev)
> +{
> + Sg_device *sdp = NULL;
> +
> + down(&sg_ref_sem);
> +
> + sdp = sg_get_dev (dev);
> + if (!sdp) goto out;
> +
> + if (sdp->detached) {
> + sdp = NULL;
> + goto out;
> + }
> +
> + kref_get(&sdp->kref);
> +
> + if (!sdp->device)
> + goto out_put;
> +
> + if (scsi_device_get(sdp->device))
> + goto out_put;
> +
> + goto out;
> +
> +out_put:
> + kref_put(&sdp->kref, scsi_generic_release);
> + sdp = NULL;
> +out:
> + up(&sg_ref_sem);
> + return sdp;
> +}
> +
> +static void scsi_generic_put(Sg_device *sdp)
> +{
> + struct scsi_device *sdev = sdp->device;
> +
> + down(&sg_ref_sem);
> + kref_put(&sdp->kref, scsi_generic_release);
> + scsi_device_put(sdev);
> + up(&sg_ref_sem);
> +}
> +
> static int
> sg_open(struct inode *inode, struct file *filp)
> {
> @@ -235,17 +287,8 @@ sg_open(struct inode *inode, struct file
>
> nonseekable_open(inode, filp);
> SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev,
> flags));
> - sdp = sg_get_dev(dev);
> - if ((!sdp) || (!sdp->device))
> + if (!(sdp = scsi_generic_get(dev)))
> return -ENXIO;
> - if (sdp->detached)
> - return -ENODEV;
> -
> - /* This driver's module count bumped by fops_get in <linux/fs.h>
> */
> - /* Prevent the device driver from vanishing while we sleep */
> - retval = scsi_device_get(sdp->device);
> - if (retval)
> - return retval;
>
> if (!((flags & O_NONBLOCK) ||
> scsi_block_when_processing_errors(sdp->device))) {
> @@ -302,7 +345,7 @@ sg_open(struct inode *inode, struct file
> return 0;
>
> error_out:
> - scsi_device_put(sdp->device);
> + scsi_generic_put(sdp);
> return retval;
> }
>
> @@ -317,13 +360,14 @@ sg_release(struct inode *inode, struct f
> return -ENXIO;
> SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n",
> sdp->disk->disk_name));
> sg_fasync(-1, filp, 0); /* remove filp from async notification
> list */
> - if (0 == sg_remove_sfp(sdp, sfp)) { /* Returns 1 when sdp
> gone */
> - if (!sdp->detached) {
> - scsi_device_put(sdp->device);
> - }
> - sdp->exclude = 0;
> - wake_up_interruptible(&sdp->o_excl_wait);
> - }
> +
> + /* Don't release the device until all IO comes back. */
> + while (sg_remove_sfp(sdp, sfp)){
> + msleep(10);
> + }
> + sdp->exclude = 0;
> + wake_up_interruptible(&sdp->o_excl_wait);
> + scsi_generic_put(sdp);
> return 0;
> }
>
> @@ -1238,7 +1285,7 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
> sfp = srp->parentfp;
> if (sfp)
> sdp = sfp->parentdp;
> - if ((NULL == sdp) || sdp->detached) {
> + if (NULL == sdp) {
> printk(KERN_INFO "sg_cmd_done: device detached\n");
> scsi_release_request(SRpnt);
> return;
> @@ -1292,18 +1339,8 @@ sg_cmd_done(Scsi_Cmnd * SCpnt)
> scsi_release_request(SRpnt);
> SRpnt = NULL;
> - if (sfp->closed) { /* whoops this fd already released,
> cleanup */
> - SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed,
> freeing ...\n"));
> - sg_finish_rem_req(srp);
> - srp = NULL;
> - if (NULL == sfp->headrp) {
> - SCSI_LOG_TIMEOUT(1, printk("sg...bh: already
> closed, final cleanup\n"));
> - if (0 == sg_remove_sfp(sdp, sfp)) { /*
> device still present */
> - scsi_device_put(sdp->device);
> - }
> - sfp = NULL;
> - }
> - } else if (srp && srp->orphan) {
> +
> + if (srp && srp->orphan) {
> if (sfp->keep_orphan)
> srp->sg_io_owned = 0;
> else {
> @@ -1441,6 +1478,7 @@ sg_add(struct class_device *cl_dev)
> }
> k = error;
> sdp = sg_dev_arr[k];
> + kref_init(&sdp->kref);
>
> devfs_mk_cdev(MKDEV(SCSI_GENERIC_MAJOR, k),
> S_IFCHR | S_IRUSR | S_IWUSR | S_IRGRP,
> @@ -1493,45 +1531,33 @@ sg_remove(struct class_device *cl_dev)
> unsigned long iflags;
> Sg_fd *sfp;
> Sg_fd *tsfp;
> - Sg_request *srp;
> - Sg_request *tsrp;
> - int k, delay;
> + int k;
>
> if (NULL == sg_dev_arr)
> return;
> - delay = 0;
> write_lock_irqsave(&sg_dev_arr_lock, iflags);
> for (k = 0; k < sg_dev_max; k++) {
> - sdp = sg_dev_arr[k];
> - if ((NULL == sdp) || (sdp->device != scsidp))
> + if ((NULL == sg_dev_arr[k]) ||
> + (sg_dev_arr[k]->device != scsidp))
> continue; /* dirty but lowers nesting */
> + sdp = sg_dev_arr[k];
> + sdp->detached = 1;
> +
> if (sdp->headfp) {
> - sdp->detached = 1;
> for (sfp = sdp->headfp; sfp; sfp = tsfp) {
> tsfp = sfp->nextfp;
> - for (srp = sfp->headrp; srp; srp = tsrp)
> {
> - tsrp = srp->nextrp;
> - if (sfp->closed || (0 ==
> sg_srp_done(srp, sfp)))
> - sg_finish_rem_req(srp);
> - }
> - if (sfp->closed) {
> - scsi_device_put(sdp->device);
> - __sg_remove_sfp(sdp, sfp);
> - } else {
> - delay = 1;
> + if (!(sfp->closed)) {
>
> wake_up_interruptible(&sfp->read_wait);
> kill_fasync(&sfp->async_qp,
> SIGPOLL,
> POLL_HUP);
> }
> }
> SCSI_LOG_TIMEOUT(3, printk("sg_detach: dev=%d,
> dirty\n", k));
> - if (NULL == sdp->headfp) {
> - sg_dev_arr[k] = NULL;
> - }
> } else { /* nothing active, simple case */
> SCSI_LOG_TIMEOUT(3, printk("sg_detach:
> dev=%d\n", k));
> - sg_dev_arr[k] = NULL;
> }
> +
> + sg_dev_arr[k] = NULL;
> sg_nr_dev--;
> break;
> }
> @@ -1543,14 +1569,32 @@ sg_remove(struct class_device *cl_dev)
> cdev_del(sdp->cdev);
> sdp->cdev = NULL;
> devfs_remove("%s/generic", scsidp->devfs_name);
> - put_disk(sdp->disk);
> - sdp->disk = NULL;
> - if (NULL == sdp->headfp)
> - kfree((char *) sdp);
> +
> + down(&sg_ref_sem);
> + kref_put(&sdp->kref, scsi_generic_release);
> + up(&sg_ref_sem);
> }
> +}
> +
> +/**
> + * scsi_generic_release - Called to free the Sg_device structure
> + * @kref: pointer to embedded kref
> + *
> + * sg_ref_sem must be held entering this routine. Because it is
> + * called on last put, you should always use the
> scsi_generic_get()
> + * and scsi_generic_put() helpers which manipulate the semaphore
> + * directly and never do a direct kref_put().
> + **/
> +static void scsi_generic_release(struct kref *kref)
> +{
> + Sg_device *sdp = to_scsi_generic(kref);
> + struct gendisk *disk = sdp->disk;
> +
> + disk->private_data = NULL;
> + put_disk(disk);
>
> - if (delay)
> - msleep(10); /* dirty detach so delay device
> destruction */
> + kfree((char *) sdp);
> + return;
> }
>
> /* Set 'perm' (4th argument) to 0 to disable module_param's definition
> @@ -2484,14 +2528,13 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
> sg_page_free((char *) sfp, sizeof (Sg_fd));
> }
>
> -/* Returns 0 in normal case, 1 when detached and sdp object removed */
> +/* Returns a count of IOs still in progress. */
> static int
> sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
> {
> Sg_request *srp;
> Sg_request *tsrp;
> int dirty = 0;
> - int res = 0;
>
> for (srp = sfp->headrp; srp; srp = tsrp) {
> tsrp = srp->nextrp;
> @@ -2505,30 +2548,16 @@ sg_remove_sfp(Sg_device * sdp, Sg_fd * s
>
> write_lock_irqsave(&sg_dev_arr_lock, iflags);
> __sg_remove_sfp(sdp, sfp);
> - if (sdp->detached && (NULL == sdp->headfp)) {
> - int k, maxd;
> -
> - maxd = sg_dev_max;
> - for (k = 0; k < maxd; ++k) {
> - if (sdp == sg_dev_arr[k])
> - break;
> - }
> - if (k < maxd)
> - sg_dev_arr[k] = NULL;
> - kfree((char *) sdp);
> - res = 1;
> - }
> write_unlock_irqrestore(&sg_dev_arr_lock, iflags);
> } else {
> - /* MOD_INC's to inhibit unloading sg and associated
> adapter driver */
> - /* only bump the access_count if we actually succeeded
> in
> - * throwing another counter on the host module */
> - scsi_device_get(sdp->device); /* XXX: retval ignored?
> */
> + /* The non-zero dirty count will be returned. This will
> tell
> + the caller not to drop the reference on this device.
> */
> +
> sfp->closed = 1; /* flag dirty state on this fd
> */
> SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d
> writes pending\n",
> dirty));
> }
> - return res;
> + return dirty;
> }
>
> static int
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 6+ messages in thread* RE: requesting advice: oops when doing sg_dd IO and device is removed
2005-10-18 4:54 ` Douglas Gilbert
@ 2005-10-18 22:02 ` Tony Battersby
0 siblings, 0 replies; 6+ messages in thread
From: Tony Battersby @ 2005-10-18 22:02 UTC (permalink / raw)
To: dougg, 'Dailey, Nate'; +Cc: linux-scsi
> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org
> [mailto:linux-scsi-owner@vger.kernel.org]On Behalf Of Douglas Gilbert
> Sent: Tuesday, October 18, 2005 12:55 AM
> To: Dailey, Nate
> Cc: linux-scsi@vger.kernel.org
> Subject: Re: requesting advice: oops when doing sg_dd IO and device is
> removed
>
>
> Nate,
> I always suspected that if one really tried hard enough
> then unexpectedly closing a file descriptor (e.g. via
> control-C on an app) at the same time as there are
> outstanding commands could cause problems. Obviously
> you have succeeded demonstrating that with your experiment.
>
> When I fought with this problem in the lk 2.4 series,
> someone skilled at finding flaws in locking logic
> suggested there was no solution (at least the way
> I was doing it) ...
>
> Waiting a close() call (i.e. sg_release() ), potentially
> indefinitely, was very unpopular with application clients.
> The sg driver did that in the old days, and I was abused
> from several quarters (one name springs to mind). So I
> tried to design sg_release() so it wouldn't ever hang
> the application client.
>
> On reflection, I think to allow sg_release() to go
> through quickly, another kernel thread is needed that
> is passed ownership of unfinished commands. What do
> you think?
>
>
> Doug Gilbert
Doug,
Here are some relevant links regarding related problems in the 2.4
kernels. I am still using 2.4, so I don't know if 2.6 has this problem
or not.
http://bugme.osdl.org/show_bug.cgi?id=3685
http://marc.theaimsgroup.com/?t=108005442700006&r=1&w=2
Anthony J. Battersby
Cybernetics
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-10-24 13:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-20 16:48 requesting advice: oops when doing sg_dd IO and device is removed Dailey, Nate
2005-10-24 13:04 ` Douglas Gilbert
-- strict thread matches above, loose matches on Subject: below --
2005-10-18 21:00 Dailey, Nate
2005-10-17 13:33 Dailey, Nate
2005-10-18 4:54 ` Douglas Gilbert
2005-10-18 22:02 ` Tony Battersby
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).