[PATCH 0/2] sg: fix races during device removal (v2)

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] sg: fix races during device removal (v2)
@ 2009-01-05 19:07 Tony Battersby
  2009-01-08 23:21 ` Douglas Gilbert
  2009-01-10 17:26 ` FUJITA Tomonori
  0 siblings, 2 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-05 19:07 UTC (permalink / raw)
  To: Douglas Gilbert, FUJITA Tomonori, James.Bottomley
  Cc: Christoph Hellwig, linux-scsi

This is version 2 of my patch, this time using a kref instead of int.
There are also a lot of other changes since the first version since I
found even more bugs both through testing and source code analysis.
I have split the patch up into two parts: the first patch fixes
races between open, close, device removal, and command completion.
The second patch fixes some races I spotted in ioctl(SG_IO).

Below are a list of test cases fixed by the patch.

----------

test #1

open /dev/sgX
send a command that takes a long time (e.g. any tape drive seek
command)
before command completes, echo 1 >
/sys/class/scsi_generic/sgX/device/delete

without patch:
oops

with patch:
test program gets ENODEV immediately
keventd sleeps until the cmd is complete
this is suboptimal since it starves other users of keventd while
waiting for the command to complete, but it is better than an oops

----------

test #2

open /dev/sgX
send a command that takes a long time (e.g. any tape drive seek
command) without waiting for it to complete
close fd
before command completes, echo 1 >
/sys/class/scsi_generic/sgX/device/delete

without patch:
oops when the command does complete (sg_rq_end_io() bad pointer deref)

with patch:
keventd sleeps until the cmd is complete
this is suboptimal since it starves other users of keventd while
waiting for the command to complete, but it is better than an oops

----------

test #3

open /dev/sgX
send a command that takes a long time (e.g. any tape drive seek
command) without waiting for it to complete
close fd
rmmod sg

without patch:
rmmod succeeds without waiting for the command to complete
oops when the command does complete (sg_rq_end_io() callback no
longer exists)

with patch:
sg module usage count does not drop to 0 until the command completes,
so cannot rmmod

----------

test #4

open /dev/sgX
loop: send commands, check results
unplug SAS cable
mptsas automatically removes the device and fails active commands
the test program detects the failed commands and closes its fds at
the same time
this results in the following sequence:
sg_remove() enter
sg_release() enter
sg_release() exit
sg_remove() exit

without patch:
oops

with patch:
ok

----------

test #5

open /dev/sgX
loop: send commands, check results
unplug SAS cable
mptsas automatically removes the device and fails active commands
the test program detects the failed commands, but sleeps for a second
before closing its fds
this results in the following sequence:
sg_remove() enter
sg_remove() exit
sg_release() enter
sg_release() exit

without patch:
oops

with patch:
ok

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-05 19:07 [PATCH 0/2] sg: fix races during device removal (v2) Tony Battersby
@ 2009-01-08 23:21 ` Douglas Gilbert
  2009-01-10 17:26 ` FUJITA Tomonori
  1 sibling, 0 replies; 41+ messages in thread
From: Douglas Gilbert @ 2009-01-08 23:21 UTC (permalink / raw)
  To: Tony Battersby
  Cc: FUJITA Tomonori, James.Bottomley, Christoph Hellwig, linux-scsi

Tony Battersby wrote:
> This is version 2 of my patch, this time using a kref instead of int.
> There are also a lot of other changes since the first version since I
> found even more bugs both through testing and source code analysis.
> I have split the patch up into two parts: the first patch fixes
> races between open, close, device removal, and command completion.
> The second patch fixes some races I spotted in ioctl(SG_IO).

Okay, lets go with v2.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-05 19:07 [PATCH 0/2] sg: fix races during device removal (v2) Tony Battersby
  2009-01-08 23:21 ` Douglas Gilbert
@ 2009-01-10 17:26 ` FUJITA Tomonori
  2009-01-12 21:09   ` Tony Battersby
  2009-01-14 20:31   ` Tony Battersby
  1 sibling, 2 replies; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-10 17:26 UTC (permalink / raw)
  To: tonyb; +Cc: dgilbert, fujita.tomonori, James.Bottomley, hch, linux-scsi

On Mon, 05 Jan 2009 14:07:19 -0500
Tony Battersby <tonyb@cybernetics.com> wrote:

> This is version 2 of my patch, this time using a kref instead of int.
> There are also a lot of other changes since the first version since I
> found even more bugs both through testing and source code analysis.
> I have split the patch up into two parts: the first patch fixes
> races between open, close, device removal, and command completion.
> The second patch fixes some races I spotted in ioctl(SG_IO).
> 
> Below are a list of test cases fixed by the patch.

Thanks for the great work,


> ----------
> 
> test #1
> 
> open /dev/sgX
> send a command that takes a long time (e.g. any tape drive seek
> command)
> before command completes, echo 1 >
> /sys/class/scsi_generic/sgX/device/delete
> 
> without patch:
> oops
> 
> with patch:
> test program gets ENODEV immediately
> keventd sleeps until the cmd is complete
> this is suboptimal since it starves other users of keventd while
> waiting for the command to complete, but it is better than an oops

As you said, we can do better. It's not correct to make
class_interface's remove_dev hook, sg_remove wait for the completion
of all the commands.


> test #2
> 
> open /dev/sgX
> send a command that takes a long time (e.g. any tape drive seek
> command) without waiting for it to complete
> close fd
> before command completes, echo 1 >
> /sys/class/scsi_generic/sgX/device/delete
> 
> without patch:
> oops when the command does complete (sg_rq_end_io() bad pointer deref)
> 
> with patch:
> keventd sleeps until the cmd is complete
> this is suboptimal since it starves other users of keventd while
> waiting for the command to complete, but it is better than an oops

Ditto.


The main problem is about the lifetime of sg_dev and sg_fd. I think
that we can fix it in a simpler and better way.

You use kref for sg_dev. We can use kref for sg_fd too. I think that
that's all we need to do.

A sg_request gets the reference of sg_fd. It makes sure that sg_fd
doesn't go away if any outstanding sg_requests exists.

sg_fd gets the reference of sg_dev (with scsi_device) and makes sure
that the sg module doesn't go away.

Then sg_remove doesn't wait for anything. sg_fd gets the reference of
sg_dev so sg_dev doesn't go away. sg_rq_end_io puts the reference of
sg_fd then sg_fd is freed and sg_dev is freed safely.

Here's a patch to do the above. It's even simpler than the current sg
code. I confirmed that sg can pass the first, second and third tests
with this patch.

I think that 4th and 5th tests fail because of the lifetime of sg_dev
and sg_fd. Can you please try these tests with this patch?

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Sat, 10 Jan 2009 23:25:24 +0900
Subject: [PATCH] sg: fix the lifetime management of sg_dev and sg_fd

To fix the lifetime of sg_dev and sg_fd, this patch adds kref for
sg_dev and sg_fd.

A sg_request gets the reference of sg_fd. It makes sure that sg_fd
doesn't go away if any outstanding sg_requests exists.

sg_fd gets the reference of sg_dev (with scsi_device) and makes sure
that the sg module doesn't go away.


Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 drivers/scsi/sg.c |  198 +++++++++++++++++++++++++----------------------------
 1 files changed, 93 insertions(+), 105 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5103855..11b4a56 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -158,6 +158,8 @@ typedef struct sg_fd {		/* holds the state of a file descriptor */
 	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
+	struct kref f_ref;
+	struct execute_work ew;
 } Sg_fd;
 
 typedef struct sg_device { /* holds the state of each scsi generic device */
@@ -171,6 +173,7 @@ typedef struct sg_device { /* holds the state of each scsi generic device */
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
 	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	struct kref d_ref;
 } Sg_device;
 
 static int sg_fasync(int fd, struct file *filp, int mode);
@@ -194,13 +197,13 @@ static void sg_build_reserve(Sg_fd * sfp, int req_size);
 static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
-static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
-static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
+static void sg_remove_sfp(struct kref *);
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static int sg_res_in_use(Sg_fd * sfp);
 static Sg_device *sg_get_dev(int dev);
+static void sg_put_dev(struct sg_device *sdp);
 #ifdef CONFIG_SCSI_PROC_FS
 static int sg_last_dev(void);
 #endif
@@ -238,21 +241,19 @@ sg_open(struct inode *inode, struct file *filp)
 	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
 	sdp = sg_get_dev(dev);
 	if ((!sdp) || (!sdp->device)) {
-		unlock_kernel();
-		return -ENXIO;
+		retval = -ENXIO;
+		goto sg_put;
 	}
 	if (sdp->detached) {
-		unlock_kernel();
-		return -ENODEV;
+		retval = -ENODEV;
+		goto sg_put;
 	}
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
-	if (retval) {
-		unlock_kernel();
-		return retval;
-	}
+	if (retval)
+		goto sg_put;
 
 	if (!((flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
@@ -308,11 +309,12 @@ sg_open(struct inode *inode, struct file *filp)
 		retval = -ENOMEM;
 		goto error_out;
 	}
-	unlock_kernel();
-	return 0;
-
-      error_out:
-	scsi_device_put(sdp->device);
+	retval = 0;
+error_out:
+	if (retval)
+		scsi_device_put(sdp->device);
+sg_put:
+	sg_put_dev(sdp);
 	unlock_kernel();
 	return retval;
 }
@@ -327,13 +329,11 @@ sg_release(struct inode *inode, struct file *filp)
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
-	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
-		if (!sdp->detached) {
-			scsi_device_put(sdp->device);
-		}
-		sdp->exclude = 0;
-		wake_up_interruptible(&sdp->o_excl_wait);
-	}
+
+	sdp->exclude = 0;
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 	return 0;
 }
 
@@ -1259,12 +1259,10 @@ static void sg_rq_end_io(struct request *rq, int uptodate)
 		return;
 	}
 	sfp = srp->parentfp;
-	if (sfp)
-		sdp = sfp->parentdp;
-	if ((NULL == sdp) || sdp->detached) {
+	sdp = sfp->parentdp;
+
+	if (sdp->detached)
 		printk(KERN_INFO "sg_cmd_done: device detached\n");
-		return;
-	}
 
 	sense = rq->sense;
 	result = rq->errors;
@@ -1303,18 +1301,7 @@ static void sg_rq_end_io(struct request *rq, int uptodate)
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
-	if (sfp->closed) {	/* whoops this fd already released, cleanup */
-		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
-		sg_finish_rem_req(srp);
-		srp = NULL;
-		if (NULL == sfp->headrp) {
-			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
-			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
-				scsi_device_put(sdp->device);
-			}
-			sfp = NULL;
-		}
-	} else if (srp && srp->orphan) {
+	if (srp->orphan) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
 		else {
@@ -1322,7 +1309,8 @@ static void sg_rq_end_io(struct request *rq, int uptodate)
 			srp = NULL;
 		}
 	}
-	if (sfp && srp) {
+
+	if (srp) {
 		/* Now wake up any sg_read() that is waiting for this packet. */
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		write_lock_irqsave(&sfp->rq_list_lock, iflags);
@@ -1330,6 +1318,8 @@ static void sg_rq_end_io(struct request *rq, int uptodate)
 		wake_up_interruptible(&sfp->read_wait);
 		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	}
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
 static struct file_operations sg_fops = {
@@ -1391,6 +1381,7 @@ static Sg_device *sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	init_waitqueue_head(&sdp->o_excl_wait);
 	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
 	sdp->index = k;
+	kref_init(&sdp->d_ref);
 
 	error = 0;
  out:
@@ -1496,41 +1487,18 @@ sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 	unsigned long iflags;
 	Sg_fd *sfp;
 	Sg_fd *tsfp;
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int delay;
 
 	if (!sdp)
 		return;
 
-	delay = 0;
+	sdp->detached = 1;
+
 	write_lock_irqsave(&sg_index_lock, iflags);
-	if (sdp->headfp) {
-		sdp->detached = 1;
-		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
-			tsfp = sfp->nextfp;
-			for (srp = sfp->headrp; srp; srp = tsrp) {
-				tsrp = srp->nextrp;
-				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
-					sg_finish_rem_req(srp);
-			}
-			if (sfp->closed) {
-				scsi_device_put(sdp->device);
-				__sg_remove_sfp(sdp, sfp);
-			} else {
-				delay = 1;
-				wake_up_interruptible(&sfp->read_wait);
-				kill_fasync(&sfp->async_qp, SIGPOLL,
-					    POLL_HUP);
-			}
-		}
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
-		if (NULL == sdp->headfp) {
-			idr_remove(&sg_index_idr, sdp->index);
-		}
-	} else {	/* nothing active, simple case */
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
-		idr_remove(&sg_index_idr, sdp->index);
+	for (sfp = sdp->headfp; sfp; sfp = tsfp) {
+		tsfp = sfp->nextfp;
+
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 
@@ -1540,11 +1508,8 @@ sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 	sdp->cdev = NULL;
 	put_disk(sdp->disk);
 	sdp->disk = NULL;
-	if (NULL == sdp->headfp)
-		kfree(sdp);
 
-	if (delay)
-		msleep(10);	/* dirty detach so delay device destruction */
+	sg_put_dev(sdp);
 }
 
 module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
@@ -1993,6 +1958,7 @@ sg_add_request(Sg_fd * sfp)
 	if (resp) {
 		resp->nextrp = NULL;
 		resp->header.duration = jiffies_to_msecs(jiffies);
+		kref_get(&sfp->f_ref);
 	}
 	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	return resp;
@@ -2060,6 +2026,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	init_waitqueue_head(&sfp->read_wait);
 	rwlock_init(&sfp->rq_list_lock);
 
+	kref_init(&sfp->f_ref);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
@@ -2087,6 +2054,9 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sg_build_reserve(sfp, bufflen);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
 			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
+
+	kref_get(&sdp->d_ref);
+	__module_get(THIS_MODULE);
 	return sfp;
 }
 
@@ -2114,48 +2084,38 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
 			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
 		sg_remove_scat(&sfp->reserve);
 	}
-	sfp->parentdp = NULL;
 	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
 	kfree(sfp);
 }
 
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
-static int
-sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp_usercontext(struct work_struct *work)
 {
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_device *sdp = sfp->parentdp;
+	unsigned long iflags;
 	Sg_request *srp;
 	Sg_request *tsrp;
-	int dirty = 0;
-	int res = 0;
 
 	for (srp = sfp->headrp; srp; srp = tsrp) {
 		tsrp = srp->nextrp;
-		if (sg_srp_done(srp, sfp))
-			sg_finish_rem_req(srp);
-		else
-			++dirty;
+		sg_finish_rem_req(srp);
 	}
-	if (0 == dirty) {
-		unsigned long iflags;
 
-		write_lock_irqsave(&sg_index_lock, iflags);
-		__sg_remove_sfp(sdp, sfp);
-		if (sdp->detached && (NULL == sdp->headfp)) {
-			idr_remove(&sg_index_idr, sdp->index);
-			kfree(sdp);
-			res = 1;
-		}
-		write_unlock_irqrestore(&sg_index_lock, iflags);
-	} else {
-		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		/* only bump the access_count if we actually succeeded in
-		 * throwing another counter on the host module */
-		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
-		sfp->closed = 1;	/* flag dirty state on this fd */
-		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
-				  dirty));
-	}
-	return res;
+	write_lock_irqsave(&sg_index_lock, iflags);
+	__sg_remove_sfp(sdp, sfp);
+	write_unlock_irqrestore(&sg_index_lock, iflags);
+
+	scsi_device_put(sdp->device);
+
+	sg_put_dev(sdp);
+
+	module_put(THIS_MODULE);
+}
+
+static void sg_remove_sfp(struct kref *kref)
+{
+	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
+	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
 }
 
 static int
@@ -2197,6 +2157,28 @@ sg_last_dev(void)
 }
 #endif
 
+static void sg_device_release(struct kref *kref)
+{
+	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+	unsigned long flags;
+
+	SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
+
+	write_lock_irqsave(&sg_index_lock, flags);
+	idr_remove(&sg_index_idr, sdp->index);
+	write_unlock_irqrestore(&sg_index_lock, flags);
+
+	kfree(sdp);
+}
+
+static void sg_put_dev(struct sg_device *sdp)
+{
+	if (!sdp)
+		return;
+
+	kref_put(&sdp->d_ref, sg_device_release);
+}
+
 static Sg_device *
 sg_get_dev(int dev)
 {
@@ -2205,6 +2187,8 @@ sg_get_dev(int dev)
 
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = idr_find(&sg_index_idr, dev);
+	if (sdp)
+		kref_get(&sdp->d_ref);
 	read_unlock_irqrestore(&sg_index_lock, iflags);
 
 	return sdp;
@@ -2478,6 +2462,7 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
+	sg_put_dev(sdp);
 	return 0;
 }
 
@@ -2498,6 +2483,7 @@ static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
 		seq_printf(s, "<no active device>\n");
+	sg_put_dev(sdp);
 	return 0;
 }
 
@@ -2582,7 +2568,7 @@ static int sg_proc_seq_show_debug(struct seq_file *s, void *v)
 		if (NULL == scsidp) {
 			seq_printf(s, "device %d detached ??\n", 
 				   (int)it->index);
-			return 0;
+			goto out;
 		}
 
 		if (sg_get_nth_sfp(sdp, 0)) {
@@ -2602,6 +2588,8 @@ static int sg_proc_seq_show_debug(struct seq_file *s, void *v)
 		}
 		sg_proc_debug_helper(s, sdp);
 	}
+out:
+	sg_put_dev(sdp);
 	return 0;
 }
 
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-10 17:26 ` FUJITA Tomonori
@ 2009-01-12 21:09   ` Tony Battersby
  2009-01-13 16:24     ` FUJITA Tomonori
  2009-01-14 20:31   ` Tony Battersby
  1 sibling, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-12 21:09 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: dgilbert, James.Bottomley, hch, linux-scsi

FUJITA Tomonori wrote:
>
>> test #1
>>
>> open /dev/sgX
>> send a command that takes a long time (e.g. any tape drive seek
>> command)
>> before command completes, echo 1 >
>> /sys/class/scsi_generic/sgX/device/delete
>>
>> without patch:
>> oops
>>
>> with patch:
>> test program gets ENODEV immediately
>> keventd sleeps until the cmd is complete
>> this is suboptimal since it starves other users of keventd while
>> waiting for the command to complete, but it is better than an oops
>>     
>
> As you said, we can do better. It's not correct to make
> class_interface's remove_dev hook, sg_remove wait for the completion
> of all the commands.
>
> The main problem is about the lifetime of sg_dev and sg_fd. I think
> that we can fix it in a simpler and better way.
>
> You use kref for sg_dev. We can use kref for sg_fd too. I think that
> that's all we need to do.
>
>   
Thanks for the review.  Your patch behaves very similarly to my patch
with the wait_event() removed from sg_remove().  With my patch, it was
not strictly necessary for sg_remove() to wait until all commands had
gone through sg_rq_end_io() - the refcounting, locking, and other checks
in my patch would have taken care of freeing sdp, sfp, and related
resources at the right time, even if sg_remove() didn't wait.  The
reason I added the wait is because without it keventd would still sleep
(outside of sg_remove()) waiting for the command to complete anyway,
except that when the command did complete, the kernel would spew some
scary-looking error messages:

mptscsih: ioc1: ERROR - Received a mf that was already freed
mptscsih: ioc1: ERROR - req_idx=beaf req_idx_MR=beaf mf=ce504b00
mr=00000000 sc=00000000

So I figured it would be better to wait in sg_remove() and avoid the
error above, instead of waiting somewhere else for the same amount of
time and getting an error.  With your patch, I get the above error
again, and keventd blocks for the same amount of time, so initially it
did not look like an improvement.  However, since this issue has
surfaced again, I went ahead and analyzed why keventd is waiting outside
of sg_remove(), and discovered that it is a bug in mptscsih (*).  When I
retested with sym53c8xx or open-iscsi, keventd did not block waiting for
the command to complete.  I will send a patch for mptscsih later.

So we can accomplish the same effect with my patch just by removing the
wait_event() in sg_remove() and the other code associated with it.  Now
the question is - do we still want to use krefs for the sg_fd?  I had
considered it when coming up with my patch, but I decided not to add the
overhead of two extra atomic ops per command.  Atomic ops for opening
and closing fds are OK, but I didn't want to slow down the command
processing critical paths.  I haven't measured the overhead, so maybe it
isn't too bad, but it is something to consider.

Also, I spent a lot of time analyzing locks and potential races with my
patch, and your patch reverts some of the fixes that I incorporated in
my patch.  Just to point out one example that I spotted quickly:
sg_rq_end_io() needs to check srp->orphan and set srp->done = 1 while
holding sfp->rq_list_lock to avoid races with sg_ioctl(SG_IO) (see my
[PATCH 2/2] sg: fix races with ioctl(SG_IO)).  But if you really want to
go with a kref on sg_fd and get/put on every command, let me know and I
will check and test everything again in detail.  If not, I will resubmit
my patch with the wait_event and associated code removed.

As a side note, scsi_remove_device() is called from keventd when using
the /sys interface to delete a device, but may be called from a
different process when using a different interface to delete a device. 
So under different circumstances the sleep may block rmmod, iscsi_wq,
etc., instead of keventd.

(*) Here is why mptscsih causes keventd to sleep:

scsi_remove_device() -> __scsi_remove_device() -> mptscsih_slave_destroy()
First, mptscsih_search_running_cmds() calls sc->scsi_done()
(sg_rq_end_io()) on all outstanding commands.
Then, mptscsih_synchronize_cache() is called.  It has the following check:

if (vdevice->vtarget->type != TYPE_DISK || vdevice->vtarget->deleted ||
    !vdevice->configured_lun)
    return;

However, this check is buggy.  My test is using a tape drive, so the
type != TYPE_DISK check should make the function return without doing
anything.  However, vtarget->type appears not to be set.  Also, you
might hope that the vtarget->deleted check might make the function
return, but it does not.  So it ends up sending a synchronize cache disk
command to a tape drive being deleted, and waits for it to complete. 
Since the tape drive does not do command queueing, this forces it to
wait for the tape drive seek command previously sent by my test program
to finish.  But when the seek command does finish, the message frame was
already freed by mptscsih_search_running_cmds(), so it gives the error
message above.

Tony

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-12 21:09   ` Tony Battersby
@ 2009-01-13 16:24     ` FUJITA Tomonori
  0 siblings, 0 replies; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-13 16:24 UTC (permalink / raw)
  To: tonyb; +Cc: fujita.tomonori, dgilbert, James.Bottomley, hch, linux-scsi

On Mon, 12 Jan 2009 16:09:25 -0500
Tony Battersby <tonyb@cybernetics.com> wrote:

> So we can accomplish the same effect with my patch just by removing the
> wait_event() in sg_remove() and the other code associated with it.  Now
> the question is - do we still want to use krefs for the sg_fd?  I had
> considered it when coming up with my patch, but I decided not to add the
> overhead of two extra atomic ops per command.  Atomic ops for opening
> and closing fds are OK, but I didn't want to slow down the command
> processing critical paths.  I haven't measured the overhead, so maybe it
> isn't too bad, but it is something to consider.

kref_get() adds nothing. We already have lots of atomic_inc per scsi
command. Compared with the rest of the overhead of scsi command
processing, kref_get is simply nothing.


> Also, I spent a lot of time analyzing locks and potential races with my
> patch, and your patch reverts some of the fixes that I incorporated in
> my patch.  Just to point out one example that I spotted quickly:
> sg_rq_end_io() needs to check srp->orphan and set srp->done = 1 while
> holding sfp->rq_list_lock to avoid races with sg_ioctl(SG_IO) (see my
> [PATCH 2/2] sg: fix races with ioctl(SG_IO)).

Yeah, I know that sg has other races but let's fix the most critical
one first.

BTW, I think that we can do better about srp->done too.


>  But if you really want to
> go with a kref on sg_fd and get/put on every command, let me know and I
> will check and test everything again in detail.  If not, I will resubmit
> my patch with the wait_event and associated code removed.

Using kref for lifetime management is pretty common in Linux
kernel. Other developers can easily understand the code. For example,
scsi mid-layer calls get_device() on every command, which is similar
to what we need to solve about sg_fd and commands. So I like this way
but if you can fix the lifetime of sg_dev and sg_fd in a simpler way
than my patch (less lines), then we need to consider about it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-10 17:26 ` FUJITA Tomonori
  2009-01-12 21:09   ` Tony Battersby
@ 2009-01-14 20:31   ` Tony Battersby
  2009-01-14 21:39     ` Greg KH
  2009-01-19  6:57     ` FUJITA Tomonori
  1 sibling, 2 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-14 20:31 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: dgilbert, James.Bottomley, hch, linux-scsi, Greg Kroah-Hartman

[CC: Greg Kroah-Hartman <greg@kroah.com>]

Fujita, your patch results in simpler code than my version 2 patch,
but it still leaves some subtle races and other problems.  The crux
of the problem is that kref_put() needs do be done while holding a
lock if there is still a way for some other CPU to find a reference
to the object in between the time the refcount drops to 0 and the
time the destructor is called.

For example, look at sg_get_dev() and sg_put_dev().  In my patch,
I locked sg_index_lock in sg_put_dev() and called kref_put() while
holding the lock.  In your patch, you moved the lock from sg_put_dev()
to the destructor function.  This introduces a subtle race with
sg_get_dev():

CPU 1:
sg_put_dev()
  kref_put(): refcount 1 -> 0

CPU 2:
sg_get_dev()
  lock sg_index_lock
  idr_find()
  kref_get(): refcount 0 -> 1: bug
  unlock sg_index_lock

CPU 1:
sg_device_release()
  lock sg_index_lock
  idr_remove
  unlock sg_index_lock
  kfree(sdp)

CPU 2:
access sdp which has already been freed
sg_put_dev(): double-free bug

Besides holding the lock during kref_put(), I also considered two
other simple ways to avoid this race:
1) Do idr_remove() from sg_remove().
2) Return NULL in sg_get_dev() if sdp->detached.

However, both of these options would have changed the behavior of
the /proc/scsi/sg/* functions that show information for devices that
are in the process of being detached.  I wanted to fix bugs without
changing other behavior, so I chose to call kref_put() under lock in
my previous patches.

A similar problem exists with sg_remove_sfp() in your patch.
The refcount for the sg_fd can drop to 0 while a different CPU can
still find a reference to it by scanning sg_device::headfp.  The good
news is that most places that scan sg_fd's using sdp->headfp never
drop sg_index_lock while still holding a reference, so for most cases
it is safe for the destructor to work the way you have it.  The one
exception is sg_get_nth_sfp(), which needs to kref_get() the sg_fd it
returns (my version 2 patch doesn't fix this problem either since I
just noticed it).  But then the kref_get() would have a race similar
to the one I described above between sg_get_dev() and sg_put_dev().

There are several solutions to this problem:
1) Call kref_put(&sfp->f_ref, sg_remove_sfp) while holding
sg_index_lock, and have the destructor unlink sfp from sdp->headfp
before dropping the lock.  This is ugly since the destructor may want
to drop sg_index_lock (e.g. to call scsi_device_put()), but the saved
irqflags variable is in a different function.
2) Unlink sfp from sdp->headfp in sg_release().  The downside to this
is that the /proc/scsi/sg/* functions no longer show fds that have
been closed but still have commands pending.
3) Set sfp->closed in sg_release() and have sg_get_nth_sfp() return
NULL if sfp->closed.  Same downside as #2.
4) Add a variation of kref_get() that uses atomic_inc_not_zero().

When I was designing my previous patches, I anticipated running into
these complications with kref, which is one reason I avoided it.
Other people have experienced similar problems with the kref interface
(search "kref_get_not_zero" or "cgroups: fix probable race with
put_css_set[_taskexit] and find_css_set").  If you still want to use
kref, I think that we are either going to have to change the behavior
of /proc/scsi/sg/*, or else use atomic_inc_not_zero().  Since I still
prefer not to change valid user-visible behavior in a patch that fixes
so many possible oopses, I will go with atomic_inc_not_zero().  And,
since I am going that route, I will use it for sg_get_dev() also,
so that sg_put_dev() doesn't need to acquire sg_index_lock.

For the purposes of this patch, I added the local function
sg_kref_get_not_zero() to sg.c.  It would be better to add
kref_get_not_zero() to kref.c, but I see in the mailing list archives
that there has been some resistance to that idea because it complicates
the kref interface.  However, since it has come up more than once,
perhaps it would be better to go ahead and make it an official part
of the interface.  If anyone wants to support that idea, I will break
it out into a separate patch.

Your patch also has a problem with management of f_ref.  You do
kref_get() in sg_add_request() and kref_put() in sg_rq_end_io()().
However, an error can occur in between sg_add_request() and sending
the command to the midlevel, so sg_rq_end_io() may never be called.
It is better to do kref_get() just before sending the command to the
midlevel; that way the refcount counts active commands waiting for
the completion callback rather than prepared requests.

Your patch doesn't fix the problem of various functions calling
SCSI_LOG_TIMEOUT with sdp->disk->disk_name after sg_remove() sets
sdp->disk to NULL.  This is why I moved put_disk() from sg_remove()
to sg_device_release().

Your patch doesn't fix the problem of forgetting to do
wake_up_interruptible(&sdp->o_excl_wait) after removing sfp from
sdp->headfp.  This is important if sg_release() is called with
commands still active; the last command that finishes should wake up
any O_EXCL waiters.

So below is version 3 of my patch combining my ideas and your ideas.
For testing with mpt, I commented out mptscsih_synchronize_cache()
in mptscsih.c as a workaround for the bug I mentioned in a previous
message.  This patch still fixes all the test cases previously listed,
but with the additional improvements that it is simpler and it no
longer blocks keventd waiting for commands to complete.

My previously-posted "[PATCH 2/2] sg: fix races with ioctl(SG_IO)"
still needs to be applied after this patch.

=
From: Tony Battersby <tonyb@cybernetics.com>
Date: Wed, 14 Jan 2009 15:30:00 -0500
Subject: [PATCH] sg: fix races during device removal (v3)

sg has the following problems related to device removal:

* opening a sg fd races with removing a device
* closing a sg fd races with removing a device
* /proc/scsi/sg/* access races with removing a device
* command completion races with removing a device
* command completion races with closing a sg fd
* can rmmod sg with active commands

These problems can cause kernel oopses, memory-use-after-free, or
double-free errors.  This patch fixes these problems by using krefs
to manage the lifetime of sg_device and sg_fd.

Each command submitted to the midlevel holds a reference to sg_fd
until the completion callback.  This ensures that sg_fd doesn't go
away if the fd is closed with commands still outstanding.

sg_fd gets the reference of sg_device (with scsi_device) and also
makes sure that the sg module doesn't go away.

/proc functions get and put references to sg_fd and sg_device as
needed.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---
 sg.c |  342 ++++++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 184 insertions(+), 158 deletions(-)
--- linux-2.6.29-rc1-git4/drivers/scsi/sg.c.orig	2009-01-14 14:16:14.000000000 -0500
+++ linux-2.6.29-rc1-git4/drivers/scsi/sg.c	2009-01-14 14:25:45.000000000 -0500
@@ -158,6 +158,8 @@ typedef struct sg_fd {		/* holds the sta
 	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
+	struct kref f_ref;
+	struct execute_work ew;
 } Sg_fd;
 
 typedef struct sg_device { /* holds the state of each scsi generic device */
@@ -171,6 +173,7 @@ typedef struct sg_device { /* holds the 
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
 	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	struct kref d_ref;
 } Sg_device;
 
 static int sg_fasync(int fd, struct file *filp, int mode);
@@ -194,13 +197,13 @@ static void sg_build_reserve(Sg_fd * sfp
 static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
-static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
-static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
+static void sg_remove_sfp(struct kref *);
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static int sg_res_in_use(Sg_fd * sfp);
 static Sg_device *sg_get_dev(int dev);
+static void sg_put_dev(struct sg_device *sdp);
 #ifdef CONFIG_SCSI_PROC_FS
 static int sg_last_dev(void);
 #endif
@@ -210,6 +213,37 @@ static int sg_last_dev(void);
 #define SZ_SG_IOVEC sizeof(sg_iovec_t)
 #define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
 
+/**
+ * sg_kref_get_not_zero - increment refcount for object if not already 0.
+ * @kref: object.
+ *
+ * If the object is not being destroyed, then increment the refcount and return
+ * true.  Otherwise, return false without doing anything.
+ *
+ * Depending on how you manage the referenced object, there may be ways of
+ * finding the object in between the time kref_put() decrements the refcount to
+ * 0 and the time the destructor finishes.  For example, if the object can be
+ * found in a lookup table, and the destructor removes the object from that
+ * table, then one CPU may still be able to find the object in that table
+ * for a brief period while a different CPU is running the destructor.  To
+ * prevent problems, either call kref_put() while holding a lock that prevents
+ * another CPU from finding the object, or else use this function instead of
+ * kref_get() when using the lookup table to find the object.
+ *
+ * This function should be called while holding whatever lock is used to
+ * protect the data structure used to find the object.  If this function
+ * returns false, then do not access the object after dropping the lock, and do
+ * not call kref_put().
+ */
+static bool __must_check sg_kref_get_not_zero(struct kref *kref)
+{
+	if (likely(atomic_inc_not_zero(&kref->refcount))) {
+		smp_mb__after_atomic_inc();
+		return true;
+	}
+	return false;
+}
+
 static int sg_allow_access(struct file *filp, unsigned char *cmd)
 {
 	struct sg_fd *sfp = (struct sg_fd *)filp->private_data;
@@ -238,21 +272,19 @@ sg_open(struct inode *inode, struct file
 	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
 	sdp = sg_get_dev(dev);
 	if ((!sdp) || (!sdp->device)) {
-		unlock_kernel();
-		return -ENXIO;
+		retval = -ENXIO;
+		goto sg_put;
 	}
 	if (sdp->detached) {
-		unlock_kernel();
-		return -ENODEV;
+		retval = -ENODEV;
+		goto sg_put;
 	}
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
-	if (retval) {
-		unlock_kernel();
-		return retval;
-	}
+	if (retval)
+		goto sg_put;
 
 	if (!((flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
@@ -303,16 +335,19 @@ sg_open(struct inode *inode, struct file
 	if ((sfp = sg_add_sfp(sdp, dev)))
 		filp->private_data = sfp;
 	else {
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			sdp->exclude = 0;	/* undo if error */
+			wake_up_interruptible(&sdp->o_excl_wait);
+		}
 		retval = -ENOMEM;
 		goto error_out;
 	}
-	unlock_kernel();
-	return 0;
-
-      error_out:
-	scsi_device_put(sdp->device);
+	retval = 0;
+error_out:
+	if (retval)
+		scsi_device_put(sdp->device);
+sg_put:
+	sg_put_dev(sdp);
 	unlock_kernel();
 	return retval;
 }
@@ -327,13 +362,13 @@ sg_release(struct inode *inode, struct f
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
-	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
-		if (!sdp->detached) {
-			scsi_device_put(sdp->device);
-		}
-		sdp->exclude = 0;
-		wake_up_interruptible(&sdp->o_excl_wait);
-	}
+
+	sfp->closed = 1;
+
+	sdp->exclude = 0;
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 	return 0;
 }
 
@@ -755,6 +790,7 @@ sg_common_write(Sg_fd * sfp, Sg_request 
 	hp->duration = jiffies_to_msecs(jiffies);
 
 	srp->rq->timeout = timeout;
+	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
 			      srp->rq, 1, sg_rq_end_io);
 	return 0;
@@ -1247,25 +1283,28 @@ sg_mmap(struct file *filp, struct vm_are
 static void sg_rq_end_io(struct request *rq, int uptodate)
 {
 	struct sg_request *srp = rq->end_io_data;
-	Sg_device *sdp = NULL;
+	Sg_device *sdp;
 	Sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
 	char *sense;
-	int result, resid;
+	int result, resid, done = 1;
 
-	if (NULL == srp) {
-		printk(KERN_ERR "sg_cmd_done: NULL request\n");
+	if (unlikely(NULL == srp)) { /* shouldn't happen */
+		printk(KERN_ERR "sg_rq_end_io: NULL request\n");
 		return;
 	}
+
 	sfp = srp->parentfp;
-	if (sfp)
-		sdp = sfp->parentdp;
-	if ((NULL == sdp) || sdp->detached) {
-		printk(KERN_INFO "sg_cmd_done: device detached\n");
+	if (unlikely(NULL == sfp)) { /* shouldn't happen */
+		printk(KERN_ERR "sg_rq_end_io: NULL sg_fd\n");
 		return;
 	}
 
+	sdp = sfp->parentdp;
+	if (unlikely(sdp->detached))
+		printk(KERN_INFO "sg_rq_end_io: device detached\n");
+
 	sense = rq->sense;
 	result = rq->errors;
 	resid = rq->data_len;
@@ -1303,33 +1342,25 @@ static void sg_rq_end_io(struct request 
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
-	if (sfp->closed) {	/* whoops this fd already released, cleanup */
-		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
-		sg_finish_rem_req(srp);
-		srp = NULL;
-		if (NULL == sfp->headrp) {
-			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
-			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
-				scsi_device_put(sdp->device);
-			}
-			sfp = NULL;
-		}
-	} else if (srp && srp->orphan) {
+	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
-		else {
-			sg_finish_rem_req(srp);
-			srp = NULL;
-		}
+		else
+			done = 0;
 	}
-	if (sfp && srp) {
-		/* Now wake up any sg_read() that is waiting for this packet. */
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		write_lock_irqsave(&sfp->rq_list_lock, iflags);
-		srp->done = 1;
+	srp->done = done;
+	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+
+	if (likely(done)) {
+		/* Now wake up any sg_read() that is waiting for this
+		   packet. */
 		wake_up_interruptible(&sfp->read_wait);
-		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	}
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+	} else
+		sg_finish_rem_req(srp); /* call with srp->done == 0 */
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
 static struct file_operations sg_fops = {
@@ -1391,6 +1422,7 @@ static Sg_device *sg_alloc(struct gendis
 	init_waitqueue_head(&sdp->o_excl_wait);
 	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
 	sdp->index = k;
+	kref_init(&sdp->d_ref);
 
 	error = 0;
  out:
@@ -1488,63 +1520,33 @@ out:
 	return error;
 }
 
-static void
-sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
+static void sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	Sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
 	Sg_fd *sfp;
-	Sg_fd *tsfp;
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int delay;
 
-	if (!sdp)
+	if (!sdp || sdp->detached)
 		return;
 
-	delay = 0;
-	write_lock_irqsave(&sg_index_lock, iflags);
-	if (sdp->headfp) {
-		sdp->detached = 1;
-		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
-			tsfp = sfp->nextfp;
-			for (srp = sfp->headrp; srp; srp = tsrp) {
-				tsrp = srp->nextrp;
-				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
-					sg_finish_rem_req(srp);
-			}
-			if (sfp->closed) {
-				scsi_device_put(sdp->device);
-				__sg_remove_sfp(sdp, sfp);
-			} else {
-				delay = 1;
-				wake_up_interruptible(&sfp->read_wait);
-				kill_fasync(&sfp->async_qp, SIGPOLL,
-					    POLL_HUP);
-			}
-		}
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
-		if (NULL == sdp->headfp) {
-			idr_remove(&sg_index_idr, sdp->index);
-		}
-	} else {	/* nothing active, simple case */
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
-		idr_remove(&sg_index_idr, sdp->index);
+	SCSI_LOG_TIMEOUT(3, printk("sg_remove: %s\n", sdp->disk->disk_name));
+
+	sdp->detached = 1;
+
+	read_lock_irqsave(&sg_index_lock, iflags);
+	for (sfp = sdp->headfp; sfp; sfp = sfp->nextfp) {
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
-	write_unlock_irqrestore(&sg_index_lock, iflags);
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 
 	sysfs_remove_link(&scsidp->sdev_gendev.kobj, "generic");
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
-	put_disk(sdp->disk);
-	sdp->disk = NULL;
-	if (NULL == sdp->headfp)
-		kfree(sdp);
 
-	if (delay)
-		msleep(10);	/* dirty detach so delay device destruction */
+	sg_put_dev(sdp); /* put initial reference from kref_init. */
 }
 
 module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
@@ -2033,18 +2035,21 @@ sg_remove_request(Sg_fd * sfp, Sg_reques
 }
 
 #ifdef CONFIG_SCSI_PROC_FS
-static Sg_fd *
-sg_get_nth_sfp(Sg_device * sdp, int nth)
+static Sg_fd *sg_get_next_sfp(Sg_device *sdp, Sg_fd *prev_sfp)
 {
-	Sg_fd *resp;
+	Sg_fd *sfp;
 	unsigned long iflags;
-	int k;
 
 	read_lock_irqsave(&sg_index_lock, iflags);
-	for (k = 0, resp = sdp->headfp; resp && (k < nth);
-	     ++k, resp = resp->nextfp) ;
+	sfp = (prev_sfp == NULL) ? sdp->headfp : prev_sfp->nextfp;
+	for ( ; sfp != NULL; sfp = sfp->nextfp) {
+		if (sg_kref_get_not_zero(&sfp->f_ref))
+			break;
+	}
 	read_unlock_irqrestore(&sg_index_lock, iflags);
-	return resp;
+	if (prev_sfp != NULL)
+		kref_put(&prev_sfp->f_ref, sg_remove_sfp);
+	return sfp;
 }
 #endif
 
@@ -2062,6 +2067,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	init_waitqueue_head(&sfp->read_wait);
 	rwlock_init(&sfp->rq_list_lock);
 
+	kref_init(&sfp->f_ref);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
@@ -2089,15 +2095,49 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sg_build_reserve(sfp, bufflen);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
 			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
+
+	kref_get(&sdp->d_ref);
+	__module_get(THIS_MODULE);
 	return sfp;
 }
 
-static void
-__sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp_usercontext(struct work_struct *work)
 {
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_device *sdp = sfp->parentdp;
+
+	/* Cleanup any responses which were never read(). */
+	while (sfp->headrp)
+		sg_finish_rem_req(sfp->headrp);
+
+	if (sfp->reserve.bufflen > 0) {
+		SCSI_LOG_TIMEOUT(6,
+			printk("sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
+				(int) sfp->reserve.bufflen,
+				(int) sfp->reserve.k_use_sg));
+		sg_remove_scat(&sfp->reserve);
+	}
+
+	SCSI_LOG_TIMEOUT(6,
+		printk("sg_remove_sfp: %s, sfp=0x%p\n",
+			sdp->disk->disk_name,
+			sfp));
+	kfree(sfp);
+
+	scsi_device_put(sdp->device);
+	sg_put_dev(sdp);
+	module_put(THIS_MODULE);
+}
+
+static void sg_remove_sfp(struct kref *kref)
+{
+	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
+	struct sg_device *sdp = sfp->parentdp;
 	Sg_fd *fp;
 	Sg_fd *prev_fp;
+	unsigned long iflags;
 
+	write_lock_irqsave(&sg_index_lock, iflags);
 	prev_fp = sdp->headfp;
 	if (sfp == prev_fp)
 		sdp->headfp = prev_fp->nextfp;
@@ -2110,54 +2150,10 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
 			prev_fp = fp;
 		}
 	}
-	if (sfp->reserve.bufflen > 0) {
-		SCSI_LOG_TIMEOUT(6, 
-			printk("__sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
-			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
-		sg_remove_scat(&sfp->reserve);
-	}
-	sfp->parentdp = NULL;
-	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
-	kfree(sfp);
-}
-
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
-static int
-sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
-{
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int dirty = 0;
-	int res = 0;
-
-	for (srp = sfp->headrp; srp; srp = tsrp) {
-		tsrp = srp->nextrp;
-		if (sg_srp_done(srp, sfp))
-			sg_finish_rem_req(srp);
-		else
-			++dirty;
-	}
-	if (0 == dirty) {
-		unsigned long iflags;
+	write_unlock_irqrestore(&sg_index_lock, iflags);
+	wake_up_interruptible(&sdp->o_excl_wait);
 
-		write_lock_irqsave(&sg_index_lock, iflags);
-		__sg_remove_sfp(sdp, sfp);
-		if (sdp->detached && (NULL == sdp->headfp)) {
-			idr_remove(&sg_index_idr, sdp->index);
-			kfree(sdp);
-			res = 1;
-		}
-		write_unlock_irqrestore(&sg_index_lock, iflags);
-	} else {
-		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		/* only bump the access_count if we actually succeeded in
-		 * throwing another counter on the host module */
-		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
-		sfp->closed = 1;	/* flag dirty state on this fd */
-		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
-				  dirty));
-	}
-	return res;
+	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
 }
 
 static int
@@ -2199,6 +2195,28 @@ sg_last_dev(void)
 }
 #endif
 
+static void sg_device_destroy(struct kref *kref)
+{
+	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+	unsigned long flags;
+
+	SCSI_LOG_TIMEOUT(3,
+		printk("sg_device_destroy: %s\n",
+			sdp->disk->disk_name));
+
+	write_lock_irqsave(&sg_index_lock, flags);
+	idr_remove(&sg_index_idr, sdp->index);
+	write_unlock_irqrestore(&sg_index_lock, flags);
+	put_disk(sdp->disk);
+	kfree(sdp);
+}
+
+static void sg_put_dev(struct sg_device *sdp)
+{
+	if (sdp)
+		kref_put(&sdp->d_ref, sg_device_destroy);
+}
+
 static Sg_device *
 sg_get_dev(int dev)
 {
@@ -2207,6 +2225,9 @@ sg_get_dev(int dev)
 
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = idr_find(&sg_index_idr, dev);
+	if (sdp)
+		if (!sg_kref_get_not_zero(&sdp->d_ref))
+			sdp = NULL;
 	read_unlock_irqrestore(&sg_index_lock, iflags);
 
 	return sdp;
@@ -2480,6 +2501,7 @@ static int sg_proc_seq_show_dev(struct s
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
+	sg_put_dev(sdp);
 	return 0;
 }
 
@@ -2500,19 +2522,19 @@ static int sg_proc_seq_show_devstrs(stru
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
 		seq_printf(s, "<no active device>\n");
+	sg_put_dev(sdp);
 	return 0;
 }
 
-static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
+static void sg_proc_debug_helper(struct seq_file *s, Sg_device *sdp, Sg_fd *fp)
 {
 	int k, m, new_interface, blen, usg;
 	Sg_request *srp;
-	Sg_fd *fp;
 	const sg_io_hdr_t *hp;
 	const char * cp;
 	unsigned int ms;
 
-	for (k = 0; (fp = sg_get_nth_sfp(sdp, k)); ++k) {
+	for (k = 0; fp != NULL; ++k, fp = sg_get_next_sfp(sdp, fp)) {
 		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
 			   "(res)sgat=%d low_dma=%d\n", k + 1,
 			   jiffies_to_msecs(fp->timeout),
@@ -2580,14 +2602,16 @@ static int sg_proc_seq_show_debug(struct
 	sdp = it ? sg_get_dev(it->index) : NULL;
 	if (sdp) {
 		struct scsi_device *scsidp = sdp->device;
+		Sg_fd *fp;
 
 		if (NULL == scsidp) {
 			seq_printf(s, "device %d detached ??\n", 
 				   (int)it->index);
-			return 0;
+			goto out;
 		}
 
-		if (sg_get_nth_sfp(sdp, 0)) {
+		fp = sg_get_next_sfp(sdp, NULL);
+		if (fp) {
 			seq_printf(s, " >>> device=%s ",
 				sdp->disk->disk_name);
 			if (sdp->detached)
@@ -2601,9 +2625,11 @@ static int sg_proc_seq_show_debug(struct
 				     scsidp->host->hostt->emulated);
 			seq_printf(s, " sg_tablesize=%d excl=%d\n",
 				   sdp->sg_tablesize, sdp->exclude);
+			sg_proc_debug_helper(s, sdp, fp);
 		}
-		sg_proc_debug_helper(s, sdp);
 	}
+out:
+	sg_put_dev(sdp);
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 20:31   ` Tony Battersby
@ 2009-01-14 21:39     ` Greg KH
  2009-01-14 21:59       ` Tony Battersby
  2009-01-14 22:33       ` Stefan Richter
  2009-01-19  6:57     ` FUJITA Tomonori
  1 sibling, 2 replies; 41+ messages in thread
From: Greg KH @ 2009-01-14 21:39 UTC (permalink / raw)
  To: Tony Battersby
  Cc: FUJITA Tomonori, dgilbert, James.Bottomley, hch, linux-scsi

On Wed, Jan 14, 2009 at 03:31:15PM -0500, Tony Battersby wrote:
> 4) Add a variation of kref_get() that uses atomic_inc_not_zero().

Ick.

> When I was designing my previous patches, I anticipated running into
> these complications with kref, which is one reason I avoided it.

Heh.

> Other people have experienced similar problems with the kref interface
> (search "kref_get_not_zero" or "cgroups: fix probable race with
> put_css_set[_taskexit] and find_css_set").  If you still want to use
> kref, I think that we are either going to have to change the behavior
> of /proc/scsi/sg/*, or else use atomic_inc_not_zero().  Since I still
> prefer not to change valid user-visible behavior in a patch that fixes
> so many possible oopses, I will go with atomic_inc_not_zero().  And,
> since I am going that route, I will use it for sg_get_dev() also,
> so that sg_put_dev() doesn't need to acquire sg_index_lock.
> 
> For the purposes of this patch, I added the local function
> sg_kref_get_not_zero() to sg.c.  It would be better to add
> kref_get_not_zero() to kref.c, but I see in the mailing list archives
> that there has been some resistance to that idea because it complicates
> the kref interface.  However, since it has come up more than once,
> perhaps it would be better to go ahead and make it an official part
> of the interface.  If anyone wants to support that idea, I will break
> it out into a separate patch.

I'd be interested in seeing how you propose such a change so that it
works properly for people, because as you have noted, others have had
this same problem.

I strongly object to the use of sg_kref_get_not_zero(), as you are just
providing a private version of this function, which you shouldn't be
doing.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 21:39     ` Greg KH
@ 2009-01-14 21:59       ` Tony Battersby
  2009-01-14 22:33       ` Stefan Richter
  1 sibling, 0 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-14 21:59 UTC (permalink / raw)
  To: Greg KH; +Cc: FUJITA Tomonori, dgilbert, James.Bottomley, hch, linux-scsi

Greg KH wrote:
> I'd be interested in seeing how you propose such a change so that it
> works properly for people, because as you have noted, others have had
> this same problem.
>
> I strongly object to the use of sg_kref_get_not_zero(), as you are just
> providing a private version of this function, which you shouldn't be
> doing.
>
>   
OK, then I'll just send kref_get_not_zero() as a separate patch to
kref.c.  If there are no objections to that patch, then I'll update my
sg patch to use the new officially-blessed function.  In the meantime,
the patch I already posted should work the same way as a split patch, so
if anyone has any other objections, then let me know.

Tony

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 21:39     ` Greg KH
  2009-01-14 21:59       ` Tony Battersby
@ 2009-01-14 22:33       ` Stefan Richter
  2009-01-14 22:53         ` Tony Battersby
  1 sibling, 1 reply; 41+ messages in thread
From: Stefan Richter @ 2009-01-14 22:33 UTC (permalink / raw)
  To: Greg KH, Tony Battersby
  Cc: FUJITA Tomonori, dgilbert, James.Bottomley, hch, linux-scsi

Greg KH wrote:
> On Wed, Jan 14, 2009 at 03:31:15PM -0500, Tony Battersby wrote:
>> 4) Add a variation of kref_get() that uses atomic_inc_not_zero().
> 
> Ick.

I missed the most part of this thread.  But in the grandparent post,
Tony wrote:

>> Fujita, your patch results in simpler code than my version 2 patch,
>> but it still leaves some subtle races and other problems.  The crux
>> of the problem is that kref_put() needs do be done while holding a
>> lock if there is still a way for some other CPU to find a reference
>> to the object in between the time the refcount drops to 0 and the
>> time the destructor is called.

This sounds bogus to me.  If "some other CPU can find a reference to the
object" after the reference count dropped to zero, then the problem is
IMO clear and simple:

Some site did not increase the refcount when it should.
-- 
Stefan Richter
-=====-==--= ---= -===-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 22:33       ` Stefan Richter
@ 2009-01-14 22:53         ` Tony Battersby
  2009-01-14 23:47           ` Stefan Richter
  0 siblings, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-14 22:53 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan Richter wrote:
> Greg KH wrote:
>   
>> On Wed, Jan 14, 2009 at 03:31:15PM -0500, Tony Battersby wrote:
>>     
>>> 4) Add a variation of kref_get() that uses atomic_inc_not_zero().
>>>       
>> Ick.
>>     
>
> I missed the most part of this thread.  But in the grandparent post,
> Tony wrote:
>
>   
>>> Fujita, your patch results in simpler code than my version 2 patch,
>>> but it still leaves some subtle races and other problems.  The crux
>>> of the problem is that kref_put() needs do be done while holding a
>>> lock if there is still a way for some other CPU to find a reference
>>> to the object in between the time the refcount drops to 0 and the
>>> time the destructor is called.
>>>       
>
> This sounds bogus to me.  If "some other CPU can find a reference to the
> object" after the reference count dropped to zero, then the problem is
> IMO clear and simple:
>
> Some site did not increase the refcount when it should.
>   
The original code makes it possible to find the object and get
information about it right up until the point that the destructor is
called.  However, adding a reference just for this purpose would prevent
the object from ever being freed.  Removing the ability to get
information about closed fds or deleted devices that still have
outstanding commands changes the user-visible behavior, which I didn't
want to do in a patch that is just trying to fix a bunch of oopses. 
However, if the solution that I have proposed isn't acceptable, then
that is exactly what will have to happen.  I can live with that if that
is the consensus, but I was trying to avoid it.

Tony

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 22:53         ` Tony Battersby
@ 2009-01-14 23:47           ` Stefan Richter
  2009-01-15 14:47             ` Tony Battersby
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Richter @ 2009-01-14 23:47 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> Stefan Richter wrote:
>> If "some other CPU can find a reference to the
>> object" after the reference count dropped to zero, then the problem is
>> IMO clear and simple:
>>
>> Some site did not increase the refcount when it should.
>>   
> The original code makes it possible to find the object and get
> information about it right up until the point that the destructor is
> called.  However, adding a reference just for this purpose would prevent
> the object from ever being freed.

No, why?

There is a list or idr with pointers to your objects?  As long as the
list contains a pointer to object A, this list needs to own one
reference count to A.  Right after A was deregistered from the list, A's
reference count is decremented on behalf of the list.

If some site looks objects up in that list and uses the objects, it
increases the refcount of such an object while it accesses the list.
When done with the object, it decrements the object's refcount on this
site's behalf.

Eventually, some site will have been the last one to put away a pointer
to the object.  Then, and only then, the kref goes down to zero and the
destructor is executed.

> Removing the ability to get
> information about closed fds or deleted devices that still have
> outstanding commands changes the user-visible behavior,

I'm not saying you are to remove such an ability; I'm just saying that
as long as any site is ably to get to an object, the refcount of the
object can't be zero.  Bring it down to zero _after_ you made the object
invisible to others.
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 23:47           ` Stefan Richter
@ 2009-01-15 14:47             ` Tony Battersby
  2009-01-15 16:22               ` Stefan Richter
  0 siblings, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-15 14:47 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan Richter wrote:
> Tony Battersby wrote:
>   
>> Stefan Richter wrote:
>>     
>>> If "some other CPU can find a reference to the
>>> object" after the reference count dropped to zero, then the problem is
>>> IMO clear and simple:
>>>
>>> Some site did not increase the refcount when it should.
>>>   
>>>       
>> The original code makes it possible to find the object and get
>> information about it right up until the point that the destructor is
>> called.  However, adding a reference just for this purpose would prevent
>> the object from ever being freed.
>>     
>
> No, why?
>
> There is a list or idr with pointers to your objects?  As long as the
> list contains a pointer to object A, this list needs to own one
> reference count to A.  Right after A was deregistered from the list, A's
> reference count is decremented on behalf of the list.
>
> If some site looks objects up in that list and uses the objects, it
> increases the refcount of such an object while it accesses the list.
> When done with the object, it decrements the object's refcount on this
> site's behalf.
>
> Eventually, some site will have been the last one to put away a pointer
> to the object.  Then, and only then, the kref goes down to zero and the
> destructor is executed.
>
>   
>> Removing the ability to get
>> information about closed fds or deleted devices that still have
>> outstanding commands changes the user-visible behavior,
>>     
>
> I'm not saying you are to remove such an ability; I'm just saying that
> as long as any site is ably to get to an object, the refcount of the
> object can't be zero.  Bring it down to zero _after_ you made the object
> invisible to others.
>   

Here is an example the way my patch does it:

open fd: kref_init(): refcount = 1
send command: kref_get(): refcount 1 -> 2
close fd: kref_put(): refcount 2 -> 1
/proc access begin: idr_find(); kref_get_not_zero(): refcount 1 -> 2
/proc access end: kref_put(): refcount 2 -> 1
command completes: kref_put(): refcount 1 -> 0: idr_remove()

Here is the alternative:

open fd: kref_init(): refcount = 1
send command: kref_get(): refcount 1 -> 2
close fd: idr_remove(); kref_put(): refcount 2 -> 1
/proc access begin: idr_find(): *** NOT FOUND ***
command completes: kref_put(): refcount 1 -> 0

The only obvious places to remove the idr are either from the destructor
(as done by my patch) or from close (which is what you are probably
thinking).  If you do it from close, /proc no longer gives information
about fds that have been closed but still have commands pending.  I
don't see an obvious way to remove the idr from the command completion
path before kref_put(), since there can be multiple commands
outstanding, and we want to leave the idr in place until the last
command completes.

Most users of kref would do the idr_remove() from close(), since the
object doesn't stay around long enough afterward for it to matter. 
However, some sg commands can take minutes to complete (or sometimes
never complete unless the user explicitly takes action to abort them). 
While there are outstanding commands, the sg module cannot be unloaded,
and re-opening the sg device and trying to send more commands may hang
if the old commands need to be aborted first.  It is nice to have some
way of telling the user why rmmod sg fails and why newly-sent commands
don't complete.

Tony

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 14:47             ` Tony Battersby
@ 2009-01-15 16:22               ` Stefan Richter
  2009-01-15 16:44                 ` Stefan Richter
                                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 16:22 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> Here is an example the way my patch does it:
> 
> open fd: kref_init(): refcount = 1
> send command: kref_get(): refcount 1 -> 2
> close fd: kref_put(): refcount 2 -> 1
> /proc access begin: idr_find(); kref_get_not_zero(): refcount 1 -> 2
> /proc access end: kref_put(): refcount 2 -> 1
> command completes: kref_put(): refcount 1 -> 0: idr_remove()
> 
> Here is the alternative:
> 
> open fd: kref_init(): refcount = 1
> send command: kref_get(): refcount 1 -> 2
> close fd: idr_remove(); kref_put(): refcount 2 -> 1
> /proc access begin: idr_find(): *** NOT FOUND ***
> command completes: kref_put(): refcount 1 -> 0
> 
> The only obvious places to remove the idr are either from the destructor
> (as done by my patch) or from close (which is what you are probably
> thinking).

I'm not thinking of a particular solution; I'm not familiar with sg.

> If you do it from close, /proc no longer gives information
> about fds that have been closed but still have commands pending.  I
> don't see an obvious way to remove the idr from the command completion
> path before kref_put(), since there can be multiple commands
> outstanding, and we want to leave the idr in place until the last
> command completes.
[...]

I believe your kref_get_not_zero() invention is because you want to
count two unrelated numbers in the same counter.  This won't work, I'm
afraid.

One number is
  - the number of unfinished transactions (request->response)
of an sg device.  (Is that struct sg_device, or struct sg_fd?)

The other number is
  - the number of contexts which peek and poke at the device
representation's memory.  (struct sg_device, or struct sg_fd?)

These two numbers are of course unrelated.  Hence I recommend you count
them in separate counters.

The second counter can be ideally implemented by means of the
<linux/kref.h> API.

The first counter can perhaps be a kref too, or it can be an atomic_t,
or a plain integer which is accessed inside lock-protected regions.
Which of them works best depends on what happens in detail.  Perhaps
kref is a good fit for this as well.

Now, what to do with these counters.

It's entirely obvious how you count the first number:  Inserted a new
request?  ->  t++.  Got a response or timeout or whatever kind of
transaction completion?  ->  t--.

It's also entirely obvious how you count the second number:  The device
representation is created and put into the idr:  ->  ref = 1.  The
reason is of course because there is now one reference to that instance
owned by the idr.  When a /proc user looks into the idr and "borrows"
the instance:  ->  ref++, and when that user is done:  ->  ref--.  The
reason for doing so is that the /proc user did not actually borrow the
reference, it in fact made a copy of the reference.

But that's not all.  A transaction representation obviously refers to
the device instance as well.  So a request gets a ref++ too, and a
response does a ref--.

Well, that's just like what you described above, except that there is
this other independent counter t.

And we need one more thing to finally get all pieces together:  A state
flag which says "device still open? yes/no".  This flag is necessary to
decide when to take the device representation out of the idr:  If the
device is closed _and_ there are no outstanding transactions anymore,
you unregister the device representation from the idr.

You can do a little trick here and combine this flag with the counter t:
Simply say t = 1 for "this device is open and has no outstanding
transactions yet".  Then increment and decrement for requests and
responses.  Also do t-- once on close().  Then t == 0 means "closed and
all transactions completed".  And any t != 0 means of course "there are
t - 1 transactions pending".  You could also shift it all by one and say
"t == -1 means device closed and 0 transactions pending" but the former
way allows you to use the kref API.  (However, if you ever need to know
whether the device is open or closed, independently whether there are
transactions pending or not, then you can't fold this "open?" flag into
the transaction counter t.)

So we already discussed what happens when t drops to 0:  The device
representation is deregistered from the idr.  Of course at this moment
you also do a ref-- because the reference that was owned by the idr has
been deleted now.

And when ref eventually becomes 0, you clean up and kfree() the device
representation, i.e. call its destructor.

(Furthermore, the device representation surely refers to objects of the
SCSI core.  Hence the constructor and destructor of the sg device
representation engage in refcounting of those SCSI core objects.  And I
presume they refcount the sg module, so that sg cannot be unloaded as
long as some context may call sg's functions.)

PS:
In case you need to wait somewhere for a refcount dropping to 0, the
<linux/completion.h> API may be a very good fit.
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 16:22               ` Stefan Richter
@ 2009-01-15 16:44                 ` Stefan Richter
  2009-01-15 18:17                 ` Tony Battersby
  2009-01-16  8:09                 ` Stefan Richter
  2 siblings, 0 replies; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 16:44 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

I wrote:
> You can do a little trick here and combine this flag with the counter t:
> Simply say t = 1 for "this device is open and has no outstanding
> transactions yet".  Then increment and decrement for requests and
> responses.  Also do t-- once on close().  Then t == 0 means "closed and
> all transactions completed".  And any t != 0 means of course "there are
> t - 1 transactions pending".  You could also shift it all by one and say
> "t == -1 means device closed and 0 transactions pending" but the former
> way allows you to use the kref API.  (However, if you ever need to know
> whether the device is open or closed, independently whether there are
> transactions pending or not, then you can't fold this "open?" flag into
> the transaction counter t.)

Also, of course you can't use the kref API for this counter t if you
want to read the number of pending transactions somewhere.  Use an
atomic_t then, or an int whose accessors are serialized by a lock.
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 16:22               ` Stefan Richter
  2009-01-15 16:44                 ` Stefan Richter
@ 2009-01-15 18:17                 ` Tony Battersby
  2009-01-15 18:47                   ` Stefan Richter
  2009-01-16  8:09                 ` Stefan Richter
  2 siblings, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-15 18:17 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan Richter wrote:
> I believe your kref_get_not_zero() invention is because you want to
> count two unrelated numbers in the same counter.  This won't work, I'm
> afraid.
>
>   
It does work actually, just in a way that people don't seem to like very
much.  And it is not really my invention; other people have run into the
same problem and come up with the same solution.  Sometimes people drop
kref and go directly to using atomic ops just so that they can use
atomic_inc_not_zero() without "violating" the kref API.

> One number is
>   - the number of unfinished transactions (request->response)
> of an sg device.  (Is that struct sg_device, or struct sg_fd?)
>
> The other number is
>   - the number of contexts which peek and poke at the device
> representation's memory.  (struct sg_device, or struct sg_fd?)
>
> These two numbers are of course unrelated.  Hence I recommend you count
> them in separate counters.
>
>   
I might be able to get this to work, but is it really better than
kref_get_not_zero()?  And since there are actually two different types
of objects being refcounted with the same issue (sg_fd and sg_device),
applying the same "trick" to both might require (ugh) four counters
total.  Does anyone else favor this approach?  I expect that if I put
the effort into a patch that does this, someone else will just tell me
to do it some other way.  But if someone else seconds your proposal, I
will give it a try.  Please, cast your votes now.

Rant:
1) I posted a patch that fixes a bunch of oopses.
2) I was told to use a kref because it makes things "simpler".
3) I posted a second patch using one kref, being very careful to avoid
the subtle pitfalls of the kref API.
4) Someone else modified my patch to use two krefs to make things "even
simpler", but it was subtly broken because krefs aren't an ideal fit for
this problem.
5) I fixed the patch that used two krefs by extending the kref API.
6) I get flamed all to hell.
7) I am told to use three krefs now, using one kref to refcount another
kref.  However, it may be necessary to use as many as four.
8) I give up being a programmer, move to Wisconsin, and become a dairy
farmer.

I am of course just joking around and not actually offended, but this
whole thing is starting to seem a bit silly.  How much more complicated
does it have to get just to make things simpler?  NIH syndrome, anyone?

--Tony "moo cow" Battersby

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 18:17                 ` Tony Battersby
@ 2009-01-15 18:47                   ` Stefan Richter
  2009-01-15 19:14                     ` Stefan Richter
  2009-01-15 19:20                     ` Tony Battersby
  0 siblings, 2 replies; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 18:47 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> Stefan Richter wrote:
>> I believe your kref_get_not_zero() invention is because you want to
>> count two unrelated numbers in the same counter.  This won't work, I'm
>> afraid.
>>  
> It does work actually, just in a way that people don't seem to like very
> much.

No, it doesn't work.  You can track how many transactions are pending,
and you can track how many sites look at memory X, but you can't track
both issues in the same counter.

If you only count pending transactions, you know when to deregister the
device from the idr.  But you don't know when it's OK to free the
device's memory.

If you cont only references to the memory, you know when it is OK to
free it but you don't know when to deregister from the idr.

> And it is not really my invention; other people have run into the
> same problem and come up with the same solution.

Perhaps these people did not analyze rigorously what it really is what
they want to count?

...
> Please, cast your votes now.

I for one am not interested in what you guys do with sg.

However, the proposal to change the kref API in a way which IMO
undermines the whole principle of reference counting motivated me to
comment in this thread, to try to understand what you want to solve, and
to propose something.

What is this kref_get_not_zero()?  Do you want to encode object state
into the reference counter?  That's not the purpose of a reference
count; use separate object attribute for that.  The reference count says
how many sites look at the object.  Nothing more, nothing less.

> 7) I am told to use three krefs now, using one kref to refcount another
> kref.  However, it may be necessary to use as many as four.
...
> How much more complicated does it have to get just to make things simpler?

As I said:  Why count independent numbers with a single counter?
Count number A with counter a.  Count number B with counter b.  This is
easy, works 100%, and is obvious to everybody.  How much simpler than
that can it get?
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 18:47                   ` Stefan Richter
@ 2009-01-15 19:14                     ` Stefan Richter
  2009-01-15 19:20                     ` Tony Battersby
  1 sibling, 0 replies; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 19:14 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

I wrote:
> If you only count pending transactions, you know when to deregister the
> device from the idr.  But you don't know when it's OK to free the
> device's memory.
> 
> If you cont only references to the memory, you know when it is OK to
> free it but you don't know when to deregister from the idr.

PS:
However, if you never copy the idr's reference, then you don't need that
other ref-counter.  To avoid such copies, every site which looks up the
device in the idr would have to perform all its accesses to the device
while inside an sg_index_lock protected section.

Then only the transactions have copies of the pointer to the device
representation in memory.  This of course means that the transaction
counter is equal to the reference counter (plus the one reference of the
idr which you don't need to count because you remove that reference when
there is also no transaction anymore).
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 18:47                   ` Stefan Richter
  2009-01-15 19:14                     ` Stefan Richter
@ 2009-01-15 19:20                     ` Tony Battersby
  2009-01-15 20:43                       ` Stefan Richter
  1 sibling, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-15 19:20 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan Richter wrote:
> Tony Battersby wrote:
>   
>> Stefan Richter wrote:
>>     
>>> I believe your kref_get_not_zero() invention is because you want to
>>> count two unrelated numbers in the same counter.  This won't work, I'm
>>> afraid.
>>>  
>>>       
>> It does work actually, just in a way that people don't seem to like very
>> much.
>>     
>
> No, it doesn't work.  You can track how many transactions are pending,
> and you can track how many sites look at memory X, but you can't track
> both issues in the same counter.
>
> If you only count pending transactions, you know when to deregister the
> device from the idr.  But you don't know when it's OK to free the
> device's memory.
>
> If you cont only references to the memory, you know when it is OK to
> free it but you don't know when to deregister from the idr.
>
>   

It does work because kref_get_not_zero() must be called while holding a
lock that prevents the destructor from freeing the memory (this
requirement is in the comments that I put above the function).  If
kref_get_not_zero() returns false, then the caller forgets that it found
the object, drops the lock, and lets the destructor continue.  It is
safe from all races that I can see if used properly.  In my opinion, the
only legitimate objection that one could have is if the API is too hard
to understand and use correctly, thereby risking misuse or confusion by
people who don't understand it.

Let me give some examples:

* Example 1 *

CPU 1:
kref_put(): refcount 1 -> 0

CPU 2:
lock data structure
find object
kref_get_not_zero() returns false
forget object
unlock data structure

CPU 1:
lock data structure
remove object
unlock data structure
free object

* Example 2 *

CPU 1:
kref_put(): refcount 1 -> 0
lock data structure
remove object
unlock data structure

CPU 2:
lock data structure
object not found
unlock data structure

CPU 1:
free object

Can you point out an actual problem where this will fail to do the right
thing?

Tony


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 19:20                     ` Tony Battersby
@ 2009-01-15 20:43                       ` Stefan Richter
  2009-01-15 21:43                         ` Tony Battersby
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 20:43 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> It does work because kref_get_not_zero() must be called while holding a
> lock that prevents the destructor from freeing the memory
[...]
> lock data structure
> find object
> kref_get_not_zero() returns false
> forget object
> unlock data structure

I have three comments to that.  1.:

What are locks for?

They serialize accesses to data, in order to protect the data from
invalid modifications due to concurrent accesses.

The sg_index_lock for example protects the data structures sg_index_idr
and sdp->headfp from invalid modifications.  (This can be seen quickly
from the comment at the definition of sg_index_lock or from looking at
the occurrences of sg_index_lock in the code.)

What are reference counters for?

Well, the name says it.  I'd better ask:  What is reference counting
for?  Of course to prevent premature destruction of objects; IOW to
implement lifetime rules.

Locks don't implement lifetime rules.  They only implement serialization
of accesses.

Now, data structures like sg_index_idr have a possibly confusing
relevance to lifetimes of other objects:  sg_index_idr contains
references to those objects.  Whoever has read access to sg_index_idr
has the ability to copy references.  Due to that, sg_index_lock happens
to influence /when/ such copies can be made.  If you want you can
implement some decisions whether to allow such copies inside the lock.

But note:  The lock's influence is incomplete!  Once a copy of a
reference has it made out of the serialized region, it can be again
copied at will.

Hence, lock's don't implement lifetime rules.  They only implement
serialization of data accesses.

In your case, you want to extend sg_index_lock's purposes to also
serialize access to a state variable of the sg device objects.  Nothing
wrong with that, as long as you and, later on, readers of your code are
aware what is going on.

But I suspect you aren't 100% aware what you are doing:  You are
speaking of "a lock that prevents the destructor from freeing the
memory".  That's not precise.  If anything, the *state variable*
prevents the destructor from doing its thing.  The lock only serializes
accesses to the state variable.

2.:

Your kref_get_not_zero() tries to use private data of struct kref as a
state variable of the object.  That's wrong because kref is a reference
counter, not a bitfield to memorize whatever object-specific states that
could exist.

The kref is the number of existing references.  It's not saying anything
about an object's state.  Its saying something about the state of users
of the object:  There are this many users who aren't done yet.

It can't be a good thing to confuse these independent matters.

3.:

There can be only one correct variant of kref_get...():

void kref_get(struct kref *kref)
{
	WARN_ON(!atomic_read(&kref->refcount));
	atomic_inc(&kref->refcount);
	smp_mb__after_atomic_inc();
}

WARN_ON says:  Here is a kernel bug!  And the above WARN_ON is straight
to the point.

Why is it a bug to call kref_get when the reference count is already 0?
Simple:  If you can kref_get an object, then it's because you got hold
of a reference which you are now trying to copy.  One reference is more
than zero references!  If your code works correctly, then there are no
references lying around anywhere anymore at the moment when the
reference count becomes 0.

A kref_get_not_zero() however would only be a valid thing to do if we
say "well, according to our count there are no references anymore, yet
we have a couple references left somewhere, people still use them
(notably to call kref_get_not_zero() on them) --- we don't care, because
our reference count was only a meaningless approximation in the first
place.  Err, wait, why do we do this refcounting thing anyway?"

So, there can't be any legal usage of a kref_get_not_zero().  If
references are counted correctly, kref_get_not_zero() will never ever
come across refcount == 0.
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 20:43                       ` Stefan Richter
@ 2009-01-15 21:43                         ` Tony Battersby
  2009-01-15 21:58                           ` Stefan Richter
  2009-01-16  0:53                           ` Stefan Richter
  0 siblings, 2 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-15 21:43 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan Richter wrote:
> What are locks for?
>
> They serialize accesses to data, in order to protect the data from
> invalid modifications due to concurrent accesses.
>   
...
> What are reference counters for?
>
> Well, the name says it.  I'd better ask:  What is reference counting
> for?  Of course to prevent premature destruction of objects; IOW to
> implement lifetime rules.
>
> Locks don't implement lifetime rules.  They only implement serialization
> of accesses.
>
>   
...
> In your case, you want to extend sg_index_lock's purposes to also
> serialize access to a state variable of the sg device objects.  Nothing
> wrong with that, as long as you and, later on, readers of your code are
> aware what is going on.
>
>   

OK, so your argument now seems to be that the code just might in
practice work correctly, but in theory it is not philosophically right,
semantically pure, or politically correct because it uses fundamental
programming constructs in odd ways that are not the "right" ways taught
in CS courses that everyone understands and loves.  That is a valid
viewpoint I suppose, but I tend to be much more pragmatic.  BTW, you
might want to do a quick grep of the kernel source for all users of
atomic_inc_not_zero() and make sure they all follow the established CS
curricula:

net/netfilter/nf_conntrack_expect.c:

struct nf_conntrack_expect *
nf_ct_expect_find_get(struct net *net, const struct nf_conntrack_tuple *tuple)
{
	struct nf_conntrack_expect *i;

	rcu_read_lock();
	i = __nf_ct_expect_find(net, tuple);
	if (i && !atomic_inc_not_zero(&i->use))
		i = NULL;
	rcu_read_unlock();

	return i;
}

void nf_ct_expect_put(struct nf_conntrack_expect *exp)
{
	if (atomic_dec_and_test(&exp->use))
		call_rcu(&exp->rcu, nf_ct_expect_free_rcu);
}


fs/nfs/nfs4state.c:

static struct nfs4_state *
__nfs4_find_state_byowner(struct inode *inode, struct nfs4_state_owner *owner)
{
	struct nfs_inode *nfsi = NFS_I(inode);
	struct nfs4_state *state;

	list_for_each_entry(state, &nfsi->open_states, inode_states) {
		if (state->owner != owner)
			continue;
		if (atomic_inc_not_zero(&state->count))
			return state;
	}
	return NULL;
}

struct nfs4_state *
nfs4_get_open_state(struct inode *inode, struct nfs4_state_owner *owner)
{
	struct nfs4_state *state, *new;
	struct nfs_inode *nfsi = NFS_I(inode);

	spin_lock(&inode->i_lock);
	state = __nfs4_find_state_byowner(inode, owner);
	spin_unlock(&inode->i_lock);
	...
}

void nfs4_put_open_state(struct nfs4_state *state)
{
	struct inode *inode = state->inode;
	struct nfs4_state_owner *owner = state->owner;

	if (!atomic_dec_and_lock(&state->count, &owner->so_lock))
		return;
	spin_lock(&inode->i_lock);
	list_del(&state->inode_states);
	list_del(&state->open_states);
	spin_unlock(&inode->i_lock);
	spin_unlock(&owner->so_lock);
	iput(inode);
	nfs4_free_open_state(state);
	nfs4_put_state_owner(owner);
}


... etc.  But I'm sure all those silly network and RCU developers didn't
understand what they were doing.

I would still like for some other people to weigh in with their opinions
before I spend more effort on another patch or else push forward with
kref_get_not_zero().

Tony

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 21:43                         ` Tony Battersby
@ 2009-01-15 21:58                           ` Stefan Richter
  2009-01-15 22:23                             ` Tony Battersby
  2009-01-16  0:53                           ` Stefan Richter
  1 sibling, 1 reply; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 21:58 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> OK, so your argument now seems to be that the code just might in
> practice work correctly, but in theory it is not philosophically right,
> semantically pure, or politically correct

Nonsense.

I wrote that there is a bug if you have a reference while the reference
count is zero.
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 21:58                           ` Stefan Richter
@ 2009-01-15 22:23                             ` Tony Battersby
  2009-01-15 23:24                               ` Stefan Richter
  0 siblings, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-15 22:23 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan Richter wrote:
> Tony Battersby wrote:
>   
>> OK, so your argument now seems to be that the code just might in
>> practice work correctly, but in theory it is not philosophically right,
>> semantically pure, or politically correct
>>     
>
> Nonsense.
>
> I wrote that there is a bug if you have a reference while the reference
> count is zero.
>   
Being pragmatic, I call that a _philosophical_ objection rather than a
bug because the code does actually work in practice as far as I can
tell.  But you can call it a bug if you like.  If you want me to give in
and call it a bug, then you will have to come up with an actual case
where the code fails to do the right thing - memory use after free,
double free, memory leak, oops, etc.

Tony


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 22:23                             ` Tony Battersby
@ 2009-01-15 23:24                               ` Stefan Richter
  2009-01-16 14:16                                 ` Tony Battersby
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Richter @ 2009-01-15 23:24 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> Stefan Richter wrote:
>> I wrote that there is a bug if you have a reference while the reference
>> count is zero.
>>   
> Being pragmatic, I call that a _philosophical_ objection rather than a
> bug because the code does actually work in practice as far as I can
> tell.  But you can call it a bug if you like.  If you want me to give in
> and call it a bug, then you will have to come up with an actual case
> where the code fails to do the right thing - memory use after free,
> double free, memory leak, oops, etc.

I understood that you want to fix bugs which are caused by lack of
reference counting.

You fix it by /reference count estimation/.
(Yet you could also fix it by /reference counting/.)

Actual reference counting would have the benefit that it works not only
right after your fix, but it will continue to work when people proceed
to modify sg, provided they follow the sound, proven, straight-forward,
familiar rules of reference counting:
  - r++ when a copy of a reference is made.
  - r-- when a reference is given up.
Sure, it will also continue to work if those who modify sg follow the
ad-hoc rules of your special reference count estimation, which are:
  - ...
  - ...
  - ...

I recommend to stick to the semantics of "refcount == 0 means there is
no reference anymore" because it's simple and robust.  This is
pragmatism.  I recommend it over "refcount == 0 means that there are,
hmm, still some references somewhere, but believe me, you can ignore
those" because that's not very pragmatic in code which is supposed to be
maintainable and to not crash people's machines.

In the subsystems which I maintain, I made the experience that reference
counting is easier to get right and keep right than some locking tricks
around reference stores or whatever other lifetime assumptions.

Anyway.  Fix sg in whatever way you prefer; you are doing the work, your
opinion counts.  Mine doesn't, as I don't contribute the code.

(I spoke up nevertheless because the idea came up in this thread to
change lib/kref in a way which is only useful to buggy refcounting.
This worried me..)
-- 
Stefan Richter
-=====-==--= ---= -====
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 23:24                               ` Stefan Richter
@ 2009-01-16 14:16                                 ` Tony Battersby
  0 siblings, 0 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-16 14:16 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Stefan,

Thank you for all of your comments, and for you concern for the quality
of the code.  In the end, we all want the same thing, and we are all
trying to reach the same goal.  You have given me an alternate way of
looking at the problem.  I am still not sure which way this one will go,
but at the very least your objections will help me write better comments
and documentation about the controversial approach I am taking.

Tony

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 21:43                         ` Tony Battersby
  2009-01-15 21:58                           ` Stefan Richter
@ 2009-01-16  0:53                           ` Stefan Richter
  1 sibling, 0 replies; 41+ messages in thread
From: Stefan Richter @ 2009-01-16  0:53 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

Tony Battersby wrote:
> net/netfilter/nf_conntrack_expect.c:
> 
> struct nf_conntrack_expect *
> nf_ct_expect_find_get(struct net *net, const struct nf_conntrack_tuple *tuple)
> {
> 	struct nf_conntrack_expect *i;
> 
> 	rcu_read_lock();
> 	i = __nf_ct_expect_find(net, tuple);
> 	if (i && !atomic_inc_not_zero(&i->use))
> 		i = NULL;
> 	rcu_read_unlock();
> 
> 	return i;
> }
> 
> void nf_ct_expect_put(struct nf_conntrack_expect *exp)
> {
> 	if (atomic_dec_and_test(&exp->use))
> 		call_rcu(&exp->rcu, nf_ct_expect_free_rcu);
> }
> 
> 
> fs/nfs/nfs4state.c:
> 
> static struct nfs4_state *
> __nfs4_find_state_byowner(struct inode *inode, struct nfs4_state_owner *owner)
> {
> 	struct nfs_inode *nfsi = NFS_I(inode);
> 	struct nfs4_state *state;
> 
> 	list_for_each_entry(state, &nfsi->open_states, inode_states) {
> 		if (state->owner != owner)
> 			continue;
> 		if (atomic_inc_not_zero(&state->count))
> 			return state;
> 	}
> 	return NULL;
> }
> 
> struct nfs4_state *
> nfs4_get_open_state(struct inode *inode, struct nfs4_state_owner *owner)
> {
> 	struct nfs4_state *state, *new;
> 	struct nfs_inode *nfsi = NFS_I(inode);
> 
> 	spin_lock(&inode->i_lock);
> 	state = __nfs4_find_state_byowner(inode, owner);
> 	spin_unlock(&inode->i_lock);
> 	...
> }
> 
> void nfs4_put_open_state(struct nfs4_state *state)
> {
> 	struct inode *inode = state->inode;
> 	struct nfs4_state_owner *owner = state->owner;
> 
> 	if (!atomic_dec_and_lock(&state->count, &owner->so_lock))
> 		return;
> 	spin_lock(&inode->i_lock);
> 	list_del(&state->inode_states);
> 	list_del(&state->open_states);
> 	spin_unlock(&inode->i_lock);
> 	spin_unlock(&owner->so_lock);
> 	iput(inode);
> 	nfs4_free_open_state(state);
> 	nfs4_put_state_owner(owner);
> }
> 
> 
> ... etc.  But I'm sure all those silly network and RCU developers didn't
> understand what they were doing.

Is it obvious from these excerpts what the lifetime rules of the objects
are?  Not really immediately.

Do we know merely from looking at the code why its developers chose to
do it this way?  No.

Is there a necessity to do similarly complicated things in sg?  No,
definitely not.
-- 
Stefan Richter
-=====-==--= ---= =----
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-15 16:22               ` Stefan Richter
  2009-01-15 16:44                 ` Stefan Richter
  2009-01-15 18:17                 ` Tony Battersby
@ 2009-01-16  8:09                 ` Stefan Richter
  2 siblings, 0 replies; 41+ messages in thread
From: Stefan Richter @ 2009-01-16  8:09 UTC (permalink / raw)
  To: Tony Battersby
  Cc: Greg KH, FUJITA Tomonori, dgilbert, James.Bottomley, hch,
	linux-scsi

I wrote:
> One number is
>   - the number of unfinished transactions (request->response)
> of an sg device. [...] The other number is
>   - the number of contexts which peek and poke at the device
> representation's memory. [...]
> And we need one more thing to finally get all pieces together:  A state
> flag which says "device still open? yes/no".  This flag is necessary to
> decide when to take the device representation out of the idr:

You probably don't need an explicit counter of pending transactions.
This number is implicit in the number of list members in struct
sg_fd.headrp.

So you only need the reference counter and the open-or-closed flag
(which would be different from struct sg_fd.closed).
-- 
Stefan Richter
-=====-==--= ---= =----
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-14 20:31   ` Tony Battersby
  2009-01-14 21:39     ` Greg KH
@ 2009-01-19  6:57     ` FUJITA Tomonori
  2009-01-19 15:02       ` Tony Battersby
                         ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-19  6:57 UTC (permalink / raw)
  To: tonyb; +Cc: fujita.tomonori, dgilbert, James.Bottomley, hch, linux-scsi, greg

Sorry for the delay,

On Wed, 14 Jan 2009 15:31:15 -0500
Tony Battersby <tonyb@cybernetics.com> wrote:

> [CC: Greg Kroah-Hartman <greg@kroah.com>]
> 
> Fujita, your patch results in simpler code than my version 2 patch,
> but it still leaves some subtle races and other problems.  The crux
> of the problem is that kref_put() needs do be done while holding a
> lock if there is still a way for some other CPU to find a reference
> to the object in between the time the refcount drops to 0 and the
> time the destructor is called.
> 
> For example, look at sg_get_dev() and sg_put_dev().  In my patch,
> I locked sg_index_lock in sg_put_dev() and called kref_put() while
> holding the lock.  In your patch, you moved the lock from sg_put_dev()
> to the destructor function.  This introduces a subtle race with
> sg_get_dev():
> 
> CPU 1:
> sg_put_dev()
>   kref_put(): refcount 1 -> 0
> 
> CPU 2:
> sg_get_dev()
>   lock sg_index_lock
>   idr_find()
>   kref_get(): refcount 0 -> 1: bug
>   unlock sg_index_lock
> 
> CPU 1:
> sg_device_release()
>   lock sg_index_lock
>   idr_remove
>   unlock sg_index_lock
>   kfree(sdp)
> 
> CPU 2:
> access sdp which has already been freed
> sg_put_dev(): double-free bug
> 
> Besides holding the lock during kref_put(), I also considered two
> other simple ways to avoid this race:
> 1) Do idr_remove() from sg_remove().
> 2) Return NULL in sg_get_dev() if sdp->detached.
> 
> However, both of these options would have changed the behavior of
> the /proc/scsi/sg/* functions that show information for devices that
> are in the process of being detached.  I wanted to fix bugs without
> changing other behavior, so I chose to call kref_put() under lock in
> my previous patches.

How about doing 2) and accessing to /proc/scsi/sg/* with
sg_index_lock (don't use sg_get_device for it).

What /proc/scsi/sg/* doing is abnormal from the perspective of the ref
counting (accessing to something that is going away).

>From the perspective of the ref counting, the best way is calling
idr_remove in from sg_remove but as you said, it's not nice to change
the behavior.

If we don't use sg_get_device for /proc/scsi/sg/*, then we use
sg_get_device for only sg_open So we can do 2) without changing the
behavior.


> A similar problem exists with sg_remove_sfp() in your patch.
> The refcount for the sg_fd can drop to 0 while a different CPU can
> still find a reference to it by scanning sg_device::headfp.  The good
> news is that most places that scan sg_fd's using sdp->headfp never
> drop sg_index_lock while still holding a reference, so for most cases
> it is safe for the destructor to work the way you have it.  The one
> exception is sg_get_nth_sfp(), which needs to kref_get() the sg_fd it
> returns (my version 2 patch doesn't fix this problem either since I
> just noticed it).  But then the kref_get() would have a race similar
> to the one I described above between sg_get_dev() and sg_put_dev().

If we go with the above way, we can solve this? sg_get_nth_sfp is used
for only /proc/scsi/sg/*.


> There are several solutions to this problem:
> 1) Call kref_put(&sfp->f_ref, sg_remove_sfp) while holding
> sg_index_lock, and have the destructor unlink sfp from sdp->headfp
> before dropping the lock.  This is ugly since the destructor may want
> to drop sg_index_lock (e.g. to call scsi_device_put()), but the saved
> irqflags variable is in a different function.
> 2) Unlink sfp from sdp->headfp in sg_release().  The downside to this
> is that the /proc/scsi/sg/* functions no longer show fds that have
> been closed but still have commands pending.
> 3) Set sfp->closed in sg_release() and have sg_get_nth_sfp() return
> NULL if sfp->closed.  Same downside as #2.
> 4) Add a variation of kref_get() that uses atomic_inc_not_zero().
> 
> When I was designing my previous patches, I anticipated running into
> these complications with kref, which is one reason I avoided it.
> Other people have experienced similar problems with the kref interface
> (search "kref_get_not_zero" or "cgroups: fix probable race with
> put_css_set[_taskexit] and find_css_set").  If you still want to use
> kref, I think that we are either going to have to change the behavior
> of /proc/scsi/sg/*, or else use atomic_inc_not_zero().  Since I still
> prefer not to change valid user-visible behavior in a patch that fixes
> so many possible oopses, I will go with atomic_inc_not_zero().  And,
> since I am going that route, I will use it for sg_get_dev() also,
> so that sg_put_dev() doesn't need to acquire sg_index_lock.
> 
> For the purposes of this patch, I added the local function
> sg_kref_get_not_zero() to sg.c.  It would be better to add
> kref_get_not_zero() to kref.c, but I see in the mailing list archives
> that there has been some resistance to that idea because it complicates
> the kref interface.  However, since it has come up more than once,
> perhaps it would be better to go ahead and make it an official part
> of the interface.  If anyone wants to support that idea, I will break
> it out into a separate patch.

I've not read the whole discussion, but using the common mechanism in
your original way is not good. It confuses other developers and other
developers don't like it.


> Your patch also has a problem with management of f_ref.  You do
> kref_get() in sg_add_request() and kref_put() in sg_rq_end_io()().
> However, an error can occur in between sg_add_request() and sending
> the command to the midlevel, so sg_rq_end_io() may never be called.
> It is better to do kref_get() just before sending the command to the
> midlevel; that way the refcount counts active commands waiting for
> the completion callback rather than prepared requests.

Oops, yeah, you are right.


> Your patch doesn't fix the problem of various functions calling
> SCSI_LOG_TIMEOUT with sdp->disk->disk_name after sg_remove() sets
> sdp->disk to NULL.  This is why I moved put_disk() from sg_remove()
> to sg_device_release().

As I wrote in the previous mail, I don't try to fix all the race in
this patch. But yeah, we need to clean up sdp->disk in
sg_device_release().


> Your patch doesn't fix the problem of forgetting to do
> wake_up_interruptible(&sdp->o_excl_wait) after removing sfp from
> sdp->headfp.  This is important if sg_release() is called with
> commands still active; the last command that finishes should wake up
> any O_EXCL waiters.

I think we can fix this even with my patch.


Thanks,

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/2] sg: fix races during device removal (v2)
  2009-01-19  6:57     ` FUJITA Tomonori
@ 2009-01-19 15:02       ` Tony Battersby
  2009-01-19 23:03       ` [PATCH 1/2] sg: fix races during device removal (v4) Tony Battersby
  2009-01-19 23:06       ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) Tony Battersby
  2 siblings, 0 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-19 15:02 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: dgilbert, James.Bottomley, hch, linux-scsi, greg

FUJITA Tomonori wrote:
>> Besides holding the lock during kref_put(), I also considered two
>> other simple ways to avoid this race:
>> 1) Do idr_remove() from sg_remove().
>> 2) Return NULL in sg_get_dev() if sdp->detached.
>>
>> However, both of these options would have changed the behavior of
>> the /proc/scsi/sg/* functions that show information for devices that
>> are in the process of being detached.  I wanted to fix bugs without
>> changing other behavior, so I chose to call kref_put() under lock in
>> my previous patches.
>>     
>
> How about doing 2) and accessing to /proc/scsi/sg/* with
> sg_index_lock (don't use sg_get_device for it).
>
>   
That's an excellent idea.  If it works, I can forget this whole
atomic_inc_not_zero() business and the pain that goes with it.
> What /proc/scsi/sg/* doing is abnormal from the perspective of the ref
> counting (accessing to something that is going away).
>
> >From the perspective of the ref counting, the best way is calling
> idr_remove in from sg_remove but as you said, it's not nice to change
> the behavior.
>
>   
Exactly.
> If we don't use sg_get_device for /proc/scsi/sg/*, then we use
> sg_get_device for only sg_open So we can do 2) without changing the
> behavior.
>
>   
I will try to implement it this way today.  Thanks for your input.

Tony


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/2] sg: fix races during device removal (v4)
  2009-01-19  6:57     ` FUJITA Tomonori
  2009-01-19 15:02       ` Tony Battersby
@ 2009-01-19 23:03       ` Tony Battersby
  2009-01-20  1:06         ` FUJITA Tomonori
  2009-01-19 23:06       ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) Tony Battersby
  2 siblings, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-19 23:03 UTC (permalink / raw)
  To: James.Bottomley; +Cc: FUJITA Tomonori, dgilbert, hch, linux-scsi

sg has the following problems related to device removal:

* opening a sg fd races with removing a device
* closing a sg fd races with removing a device
* /proc/scsi/sg/* access races with removing a device
* command completion races with removing a device
* command completion races with closing a sg fd
* can rmmod sg with active commands

These problems can cause kernel oopses, memory-use-after-free, or
double-free errors.  This patch fixes these problems by using krefs
to manage the lifetime of sg_device and sg_fd.

Each command submitted to the midlevel holds a reference to sg_fd
until the completion callback.  This ensures that sg_fd doesn't go
away if the fd is closed with commands still outstanding.

sg_fd gets the reference of sg_device (with scsi_device) and also
makes sure that the sg module doesn't go away.

/proc/scsi/sg/* functions don't play nicely with krefs because they
give information about sg_fds which have been closed but not yet
freed due to still having outstanding commands and sg_devices which
have been removed but not yet freed due to still being referenced
by one or more sg_fds.  To deal with this safely without removing
functionality, /proc functions now access sg_device and sg_fd while
holding a lock instead of using kref_get()/kref_put().

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

This version implements Fujita's suggestion to use locking instead
of kref_get() for the /proc functions.

This version does not use any controversial extensions to the kref API.

To avoid changing the behavior of the /proc functions, it is still
possible to find a sg_fd or a sg_device with a refcount of 0,
since the last remaining references are removed from the kref_put()
release functions.  However, no one should be doing a kref_get()
under these circumstances, so it shouldn't cause a problem.  I added
some comments warning about this non-standard behavior to reduce the
risk of tripping up someone in the future.

The original sg_open() would return -ENXIO if idr_find() returned
NULL but -ENODEV if sdp->detached.  It would be inconvenient to
keep this behavior without inlining sg_get_dev() into sg_open().
Since sg_open() was the only remaining caller anyway, I went ahead
and inlined it and removed sg_get_dev(), thereby enabling sg_open()
to return the same error codes as before.

This version still passes all the tests listed previously.  However,
mptscsih.c still needs some fixes; I will submit a patch for that
later.

 sg.c |  386 ++++++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 186 insertions(+), 200 deletions(-)

--- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-19 16:19:01.000000000 -0500
+++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-19 16:26:21.000000000 -0500
@@ -101,6 +101,7 @@ static int scatter_elem_sz_prev = SG_SCA
 #define SG_SECTOR_MSK (SG_SECTOR_SZ - 1)
 
 static int sg_add(struct device *, struct class_interface *);
+static void sg_device_destroy(struct kref *kref);
 static void sg_remove(struct device *, struct class_interface *);
 
 static DEFINE_IDR(sg_index_idr);
@@ -158,6 +159,12 @@ typedef struct sg_fd {		/* holds the sta
 	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
+	/* CAUTION!  sfp is unlinked from sdp->headfp AFTER the refcount drops
+	   to 0.  When scanning sdp->headfp, it is not safe to call
+	   kref_get(&sfp->f_ref) if sfp->closed because the refcount may
+	   already be 0. */
+	struct kref f_ref;
+	struct execute_work ew;
 } Sg_fd;
 
 typedef struct sg_device { /* holds the state of each scsi generic device */
@@ -171,6 +178,22 @@ typedef struct sg_device { /* holds the 
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
 	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	/* CAUTION!  idr_remove() is called AFTER the refcount drops to 0.
+	   When using idr_find(), it is not safe to call kref_get(&sdp->d_ref)
+	   if sdp->detached because the refcount may already be 0.
+
+	   In order to get a reference safely using idr_find():
+	   1) lock sg_index_lock (either read or write lock)
+	   2) call idr_find()
+	   3) if sdp->detached, pretend that idr_find() returned NULL, and do
+	      not call kref_get()
+	   4) if !sdp->detached, then kref_get(&sdp->d_ref)
+	   5) unlock sg_index_lock
+
+	   Alternately, you can use idr_find() to access the device even if
+	   sdp->detached if you do not drop sg_index_lock, in which case you
+	   should NOT call kref_get(&sdp->d_ref). */
+	struct kref d_ref;
 } Sg_device;
 
 static int sg_fasync(int fd, struct file *filp, int mode);
@@ -194,13 +217,11 @@ static void sg_build_reserve(Sg_fd * sfp
 static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
-static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
-static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
+static void sg_remove_sfp(struct kref *);
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static int sg_res_in_use(Sg_fd * sfp);
-static Sg_device *sg_get_dev(int dev);
 #ifdef CONFIG_SCSI_PROC_FS
 static int sg_last_dev(void);
 #endif
@@ -236,23 +257,30 @@ sg_open(struct inode *inode, struct file
 	lock_kernel();
 	nonseekable_open(inode, filp);
 	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
-	sdp = sg_get_dev(dev);
-	if ((!sdp) || (!sdp->device)) {
-		unlock_kernel();
-		return -ENXIO;
-	}
-	if (sdp->detached) {
-		unlock_kernel();
-		return -ENODEV;
+
+	retval = -ENXIO;
+	read_lock_irq(&sg_index_lock);
+	sdp = idr_find(&sg_index_idr, dev);
+	if (sdp) {
+		/* See comments about using idr_find() and d_ref.  If
+		   sdp->detached, then the refcount may already be 0, in which
+		   case it would be a bug to do kref_get(). */
+		if (!sdp->detached)
+			kref_get(&sdp->d_ref);
+		else {
+			sdp = NULL;
+			retval = -ENODEV;
+		}
 	}
+	read_unlock_irq(&sg_index_lock);
+	if ((!sdp) || (!sdp->device))
+		goto sg_put;
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
-	if (retval) {
-		unlock_kernel();
-		return retval;
-	}
+	if (retval)
+		goto sg_put;
 
 	if (!((flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
@@ -303,16 +331,20 @@ sg_open(struct inode *inode, struct file
 	if ((sfp = sg_add_sfp(sdp, dev)))
 		filp->private_data = sfp;
 	else {
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			sdp->exclude = 0;	/* undo if error */
+			wake_up_interruptible(&sdp->o_excl_wait);
+		}
 		retval = -ENOMEM;
 		goto error_out;
 	}
-	unlock_kernel();
-	return 0;
-
-      error_out:
-	scsi_device_put(sdp->device);
+	retval = 0;
+error_out:
+	if (retval)
+		scsi_device_put(sdp->device);
+sg_put:
+	if (sdp)
+		kref_put(&sdp->d_ref, sg_device_destroy);
 	unlock_kernel();
 	return retval;
 }
@@ -327,13 +359,14 @@ sg_release(struct inode *inode, struct f
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
-	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
-		if (!sdp->detached) {
-			scsi_device_put(sdp->device);
-		}
-		sdp->exclude = 0;
-		wake_up_interruptible(&sdp->o_excl_wait);
-	}
+
+	sfp->closed = 1;
+
+	sdp->exclude = 0;
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
+
 	return 0;
 }
 
@@ -755,6 +788,7 @@ sg_common_write(Sg_fd * sfp, Sg_request 
 	hp->duration = jiffies_to_msecs(jiffies);
 
 	srp->rq->timeout = timeout;
+	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
 			      srp->rq, 1, sg_rq_end_io);
 	return 0;
@@ -1247,25 +1281,28 @@ sg_mmap(struct file *filp, struct vm_are
 static void sg_rq_end_io(struct request *rq, int uptodate)
 {
 	struct sg_request *srp = rq->end_io_data;
-	Sg_device *sdp = NULL;
+	Sg_device *sdp;
 	Sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
 	char *sense;
-	int result, resid;
+	int result, resid, done = 1;
 
-	if (NULL == srp) {
-		printk(KERN_ERR "sg_cmd_done: NULL request\n");
+	if (unlikely(NULL == srp)) { /* shouldn't happen */
+		printk(KERN_ERR "sg_rq_end_io: NULL request\n");
 		return;
 	}
+
 	sfp = srp->parentfp;
-	if (sfp)
-		sdp = sfp->parentdp;
-	if ((NULL == sdp) || sdp->detached) {
-		printk(KERN_INFO "sg_cmd_done: device detached\n");
+	if (unlikely(NULL == sfp)) { /* shouldn't happen */
+		printk(KERN_ERR "sg_rq_end_io: NULL sg_fd\n");
 		return;
 	}
 
+	sdp = sfp->parentdp;
+	if (unlikely(sdp->detached))
+		printk(KERN_INFO "sg_rq_end_io: device detached\n");
+
 	sense = rq->sense;
 	result = rq->errors;
 	resid = rq->data_len;
@@ -1303,33 +1340,25 @@ static void sg_rq_end_io(struct request 
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
-	if (sfp->closed) {	/* whoops this fd already released, cleanup */
-		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
-		sg_finish_rem_req(srp);
-		srp = NULL;
-		if (NULL == sfp->headrp) {
-			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
-			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
-				scsi_device_put(sdp->device);
-			}
-			sfp = NULL;
-		}
-	} else if (srp && srp->orphan) {
+	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
-		else {
-			sg_finish_rem_req(srp);
-			srp = NULL;
-		}
+		else
+			done = 0;
 	}
-	if (sfp && srp) {
-		/* Now wake up any sg_read() that is waiting for this packet. */
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		write_lock_irqsave(&sfp->rq_list_lock, iflags);
-		srp->done = 1;
+	srp->done = done;
+	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+
+	if (likely(done)) {
+		/* Now wake up any sg_read() that is waiting for this
+		   packet. */
 		wake_up_interruptible(&sfp->read_wait);
-		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	}
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+	} else
+		sg_finish_rem_req(srp); /* call with srp->done == 0 */
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
 static struct file_operations sg_fops = {
@@ -1391,6 +1420,7 @@ static Sg_device *sg_alloc(struct gendis
 	init_waitqueue_head(&sdp->o_excl_wait);
 	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
 	sdp->index = k;
+	kref_init(&sdp->d_ref);
 
 	error = 0;
  out:
@@ -1488,49 +1518,45 @@ out:
 	return error;
 }
 
-static void
-sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
+static void sg_device_destroy(struct kref *kref)
+{
+	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+	unsigned long flags;
+
+	/* CAUTION!  Note that the device can still be found via idr_find()
+	   even though the refcount is 0.  Therefore, do idr_remove() BEFORE
+	   any other cleanup. */
+
+	write_lock_irqsave(&sg_index_lock, flags);
+	idr_remove(&sg_index_idr, sdp->index);
+	write_unlock_irqrestore(&sg_index_lock, flags);
+
+	SCSI_LOG_TIMEOUT(3,
+		printk("sg_device_destroy: %s\n",
+			sdp->disk->disk_name));
+
+	put_disk(sdp->disk);
+	kfree(sdp);
+}
+
+static void sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	Sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
 	Sg_fd *sfp;
-	Sg_fd *tsfp;
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int delay;
 
-	if (!sdp)
+	if (!sdp || sdp->detached)
 		return;
 
-	delay = 0;
+	SCSI_LOG_TIMEOUT(3, printk("sg_remove: %s\n", sdp->disk->disk_name));
+
+	/* Need a write lock to set sdp->detached. */
 	write_lock_irqsave(&sg_index_lock, iflags);
-	if (sdp->headfp) {
-		sdp->detached = 1;
-		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
-			tsfp = sfp->nextfp;
-			for (srp = sfp->headrp; srp; srp = tsrp) {
-				tsrp = srp->nextrp;
-				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
-					sg_finish_rem_req(srp);
-			}
-			if (sfp->closed) {
-				scsi_device_put(sdp->device);
-				__sg_remove_sfp(sdp, sfp);
-			} else {
-				delay = 1;
-				wake_up_interruptible(&sfp->read_wait);
-				kill_fasync(&sfp->async_qp, SIGPOLL,
-					    POLL_HUP);
-			}
-		}
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
-		if (NULL == sdp->headfp) {
-			idr_remove(&sg_index_idr, sdp->index);
-		}
-	} else {	/* nothing active, simple case */
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
-		idr_remove(&sg_index_idr, sdp->index);
+	sdp->detached = 1;
+	for (sfp = sdp->headfp; sfp; sfp = sfp->nextfp) {
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 
@@ -1538,13 +1564,8 @@ sg_remove(struct device *cl_dev, struct 
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
-	put_disk(sdp->disk);
-	sdp->disk = NULL;
-	if (NULL == sdp->headfp)
-		kfree(sdp);
 
-	if (delay)
-		msleep(10);	/* dirty detach so delay device destruction */
+	kref_put(&sdp->d_ref, sg_device_destroy);
 }
 
 module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
@@ -1941,22 +1962,6 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
 	return resp;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_request *
-sg_get_nth_request(Sg_fd * sfp, int nth)
-{
-	Sg_request *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	for (k = 0, resp = sfp->headrp; resp && (k < nth);
-	     ++k, resp = resp->nextrp) ;
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return resp;
-}
-#endif
-
 /* always adds to end of list */
 static Sg_request *
 sg_add_request(Sg_fd * sfp)
@@ -2032,22 +2037,6 @@ sg_remove_request(Sg_fd * sfp, Sg_reques
 	return res;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_fd *
-sg_get_nth_sfp(Sg_device * sdp, int nth)
-{
-	Sg_fd *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sg_index_lock, iflags);
-	for (k = 0, resp = sdp->headfp; resp && (k < nth);
-	     ++k, resp = resp->nextfp) ;
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-	return resp;
-}
-#endif
-
 static Sg_fd *
 sg_add_sfp(Sg_device * sdp, int dev)
 {
@@ -2062,6 +2051,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	init_waitqueue_head(&sfp->read_wait);
 	rwlock_init(&sfp->rq_list_lock);
 
+	kref_init(&sfp->f_ref);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
@@ -2089,15 +2079,53 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sg_build_reserve(sfp, bufflen);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
 			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
+
+	kref_get(&sdp->d_ref);
+	__module_get(THIS_MODULE);
 	return sfp;
 }
 
-static void
-__sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp_usercontext(struct work_struct *work)
+{
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_device *sdp = sfp->parentdp;
+
+	/* Cleanup any responses which were never read(). */
+	while (sfp->headrp)
+		sg_finish_rem_req(sfp->headrp);
+
+	if (sfp->reserve.bufflen > 0) {
+		SCSI_LOG_TIMEOUT(6,
+			printk("sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
+				(int) sfp->reserve.bufflen,
+				(int) sfp->reserve.k_use_sg));
+		sg_remove_scat(&sfp->reserve);
+	}
+
+	SCSI_LOG_TIMEOUT(6,
+		printk("sg_remove_sfp: %s, sfp=0x%p\n",
+			sdp->disk->disk_name,
+			sfp));
+	kfree(sfp);
+
+	scsi_device_put(sdp->device);
+	kref_put(&sdp->d_ref, sg_device_destroy);
+	module_put(THIS_MODULE);
+}
+
+static void sg_remove_sfp(struct kref *kref)
 {
+	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
+	struct sg_device *sdp = sfp->parentdp;
 	Sg_fd *fp;
 	Sg_fd *prev_fp;
+	unsigned long iflags;
+
+	/* CAUTION!  Note that sfp can still be found by walking sdp->headfp
+	   even though the refcount is now 0.  Therefore, unlink sfp from
+	   sdp->headfp BEFORE doing any other cleanup. */
 
+	write_lock_irqsave(&sg_index_lock, iflags);
 	prev_fp = sdp->headfp;
 	if (sfp == prev_fp)
 		sdp->headfp = prev_fp->nextfp;
@@ -2110,54 +2138,10 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
 			prev_fp = fp;
 		}
 	}
-	if (sfp->reserve.bufflen > 0) {
-		SCSI_LOG_TIMEOUT(6, 
-			printk("__sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
-			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
-		sg_remove_scat(&sfp->reserve);
-	}
-	sfp->parentdp = NULL;
-	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
-	kfree(sfp);
-}
-
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
-static int
-sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
-{
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int dirty = 0;
-	int res = 0;
-
-	for (srp = sfp->headrp; srp; srp = tsrp) {
-		tsrp = srp->nextrp;
-		if (sg_srp_done(srp, sfp))
-			sg_finish_rem_req(srp);
-		else
-			++dirty;
-	}
-	if (0 == dirty) {
-		unsigned long iflags;
+	write_unlock_irqrestore(&sg_index_lock, iflags);
+	wake_up_interruptible(&sdp->o_excl_wait);
 
-		write_lock_irqsave(&sg_index_lock, iflags);
-		__sg_remove_sfp(sdp, sfp);
-		if (sdp->detached && (NULL == sdp->headfp)) {
-			idr_remove(&sg_index_idr, sdp->index);
-			kfree(sdp);
-			res = 1;
-		}
-		write_unlock_irqrestore(&sg_index_lock, iflags);
-	} else {
-		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		/* only bump the access_count if we actually succeeded in
-		 * throwing another counter on the host module */
-		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
-		sfp->closed = 1;	/* flag dirty state on this fd */
-		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
-				  dirty));
-	}
-	return res;
+	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
 }
 
 static int
@@ -2199,19 +2183,6 @@ sg_last_dev(void)
 }
 #endif
 
-static Sg_device *
-sg_get_dev(int dev)
-{
-	Sg_device *sdp;
-	unsigned long iflags;
-
-	read_lock_irqsave(&sg_index_lock, iflags);
-	sdp = idr_find(&sg_index_idr, dev);
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-
-	return sdp;
-}
-
 #ifdef CONFIG_SCSI_PROC_FS
 
 static struct proc_dir_entry *sg_proc_sgp = NULL;
@@ -2468,8 +2439,10 @@ static int sg_proc_seq_show_dev(struct s
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? idr_find(&sg_index_idr, it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
 			      scsidp->host->host_no, scsidp->channel,
@@ -2480,6 +2453,7 @@ static int sg_proc_seq_show_dev(struct s
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
@@ -2493,13 +2467,16 @@ static int sg_proc_seq_show_devstrs(stru
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? idr_find(&sg_index_idr, it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
 		seq_printf(s, "<no active device>\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
@@ -2512,7 +2489,8 @@ static void sg_proc_debug_helper(struct 
 	const char * cp;
 	unsigned int ms;
 
-	for (k = 0; (fp = sg_get_nth_sfp(sdp, k)); ++k) {
+	for (k = 0, fp = sdp->headfp; fp != NULL; ++k, fp = fp->nextfp) {
+		read_lock(&fp->rq_list_lock); /* irqs already disabled */
 		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
 			   "(res)sgat=%d low_dma=%d\n", k + 1,
 			   jiffies_to_msecs(fp->timeout),
@@ -2522,7 +2500,9 @@ static void sg_proc_debug_helper(struct 
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=%d\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
 			   (int) fp->keep_orphan, (int) fp->closed);
-		for (m = 0; (srp = sg_get_nth_request(fp, m)); ++m) {
+		for (m = 0, srp = fp->headrp;
+				srp != NULL;
+				++m, srp = srp->nextrp) {
 			hp = &srp->header;
 			new_interface = (hp->interface_id == '\0') ? 0 : 1;
 			if (srp->res_used) {
@@ -2559,6 +2539,7 @@ static void sg_proc_debug_helper(struct 
 		}
 		if (0 == m)
 			seq_printf(s, "     No requests active\n");
+		read_unlock(&fp->rq_list_lock);
 	}
 }
 
@@ -2571,23 +2552,26 @@ static int sg_proc_seq_show_debug(struct
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
+	unsigned long iflags;
 
 	if (it && (0 == it->index)) {
 		seq_printf(s, "max_active_device=%d(origin 1)\n",
 			   (int)it->max);
 		seq_printf(s, " def_reserved_size=%d\n", sg_big_buff);
 	}
-	sdp = it ? sg_get_dev(it->index) : NULL;
+
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? idr_find(&sg_index_idr, it->index) : NULL;
 	if (sdp) {
 		struct scsi_device *scsidp = sdp->device;
 
 		if (NULL == scsidp) {
 			seq_printf(s, "device %d detached ??\n", 
 				   (int)it->index);
-			return 0;
+			goto out;
 		}
 
-		if (sg_get_nth_sfp(sdp, 0)) {
+		if (sdp->headfp) {
 			seq_printf(s, " >>> device=%s ",
 				sdp->disk->disk_name);
 			if (sdp->detached)
@@ -2604,6 +2588,8 @@ static int sg_proc_seq_show_debug(struct
 		}
 		sg_proc_debug_helper(s, sdp);
 	}
+out:
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v4)
  2009-01-19 23:03       ` [PATCH 1/2] sg: fix races during device removal (v4) Tony Battersby
@ 2009-01-20  1:06         ` FUJITA Tomonori
  2009-01-20 21:58           ` [PATCH 1/2] sg: fix races during device removal (v5) Tony Battersby
  2009-01-20 22:00           ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2) Tony Battersby
  0 siblings, 2 replies; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-20  1:06 UTC (permalink / raw)
  To: tonyb; +Cc: James.Bottomley, fujita.tomonori, dgilbert, hch, linux-scsi

On Mon, 19 Jan 2009 18:03:11 -0500
Tony Battersby <tonyb@cybernetics.com> wrote:

> sg has the following problems related to device removal:
> 
> * opening a sg fd races with removing a device
> * closing a sg fd races with removing a device
> * /proc/scsi/sg/* access races with removing a device
> * command completion races with removing a device
> * command completion races with closing a sg fd
> * can rmmod sg with active commands
> 
> These problems can cause kernel oopses, memory-use-after-free, or
> double-free errors.  This patch fixes these problems by using krefs
> to manage the lifetime of sg_device and sg_fd.
> 
> Each command submitted to the midlevel holds a reference to sg_fd
> until the completion callback.  This ensures that sg_fd doesn't go
> away if the fd is closed with commands still outstanding.
> 
> sg_fd gets the reference of sg_device (with scsi_device) and also
> makes sure that the sg module doesn't go away.
> 
> /proc/scsi/sg/* functions don't play nicely with krefs because they
> give information about sg_fds which have been closed but not yet
> freed due to still having outstanding commands and sg_devices which
> have been removed but not yet freed due to still being referenced
> by one or more sg_fds.  To deal with this safely without removing
> functionality, /proc functions now access sg_device and sg_fd while
> holding a lock instead of using kref_get()/kref_put().
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> ---
> 
> This version implements Fujita's suggestion to use locking instead
> of kref_get() for the /proc functions.
> 
> This version does not use any controversial extensions to the kref API.
> 
> To avoid changing the behavior of the /proc functions, it is still
> possible to find a sg_fd or a sg_device with a refcount of 0,
> since the last remaining references are removed from the kref_put()
> release functions.  However, no one should be doing a kref_get()
> under these circumstances, so it shouldn't cause a problem.  I added
> some comments warning about this non-standard behavior to reduce the
> risk of tripping up someone in the future.
> 
> The original sg_open() would return -ENXIO if idr_find() returned
> NULL but -ENODEV if sdp->detached.  It would be inconvenient to
> keep this behavior without inlining sg_get_dev() into sg_open().
> Since sg_open() was the only remaining caller anyway, I went ahead
> and inlined it and removed sg_get_dev(), thereby enabling sg_open()
> to return the same error codes as before.

I think that it would be nice to use sg_get_dev and
sg_put_dev. How about something like this:

/* should be called with sg_index_lock held */
struct sg_device *sg_lookup_dev(int index)
{
	return idr_find(&sg_index_idr, dev);
}

struct sg_device *sg_get_dev(int index)
{
	struct sg_device *sdp;
	unsigned long flags;

 	read_lock_irqsave(&sg_index_lock, flags);
	sdp = sg_lookup_dev(index);
	if (!sdp || !sdp->device)
		sdp = ERR_PTR(-ENXIO);
	else if (sdp->detached)
		sdp = ERR_PTR(-ENODEV);
	else
		kref_get(&sdp->d_ref);
 	write_unlock_irqrestore(&sg_index_lock, flags);

	return sdp;
}

static void sg_put_dev(struct sg_device *sdp)
{
	kref_put(&sdp->d_ref, sg_device_release);
}

The /proc functions use sg_lookup_dev.


> This version still passes all the tests listed previously.  However,
> mptscsih.c still needs some fixes; I will submit a patch for that
> later.
> 
>  sg.c |  386 ++++++++++++++++++++++++++++++++-----------------------------------
>  1 file changed, 186 insertions(+), 200 deletions(-)
> 
> --- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-19 16:19:01.000000000 -0500
> +++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-19 16:26:21.000000000 -0500
> @@ -101,6 +101,7 @@ static int scatter_elem_sz_prev = SG_SCA
>  #define SG_SECTOR_MSK (SG_SECTOR_SZ - 1)
>  
>  static int sg_add(struct device *, struct class_interface *);
> +static void sg_device_destroy(struct kref *kref);
>  static void sg_remove(struct device *, struct class_interface *);
>  
>  static DEFINE_IDR(sg_index_idr);
> @@ -158,6 +159,12 @@ typedef struct sg_fd {		/* holds the sta
>  	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
>  	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
>  	char mmap_called;	/* 0 -> mmap() never called on this fd */
> +	/* CAUTION!  sfp is unlinked from sdp->headfp AFTER the refcount drops
> +	   to 0.  When scanning sdp->headfp, it is not safe to call
> +	   kref_get(&sfp->f_ref) if sfp->closed because the refcount may
> +	   already be 0. */
> +	struct kref f_ref;
> +	struct execute_work ew;

Documentation/CodingStyle says:

The preferred style for long (multi-line) comments is:

	/*
	 * This is the preferred style for multi-line
	 * comments in the Linux kernel source code.
	 * Please use it consistently.
	 *
	 * Description:  A column of asterisks on the left side,
	 * with beginning and ending almost-blank lines.
	 */

But I think that we are fine without the comment.


>  } Sg_fd;
>  
>  typedef struct sg_device { /* holds the state of each scsi generic device */
> @@ -171,6 +178,22 @@ typedef struct sg_device { /* holds the 
>  	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
>  	struct gendisk *disk;
>  	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
> +	/* CAUTION!  idr_remove() is called AFTER the refcount drops to 0.
> +	   When using idr_find(), it is not safe to call kref_get(&sdp->d_ref)
> +	   if sdp->detached because the refcount may already be 0.
> +
> +	   In order to get a reference safely using idr_find():
> +	   1) lock sg_index_lock (either read or write lock)
> +	   2) call idr_find()
> +	   3) if sdp->detached, pretend that idr_find() returned NULL, and do
> +	      not call kref_get()
> +	   4) if !sdp->detached, then kref_get(&sdp->d_ref)
> +	   5) unlock sg_index_lock
> +
> +	   Alternately, you can use idr_find() to access the device even if
> +	   sdp->detached if you do not drop sg_index_lock, in which case you
> +	   should NOT call kref_get(&sdp->d_ref). */
> +	struct kref d_ref;

Ditto.


>  } Sg_device;
>  
>  static int sg_fasync(int fd, struct file *filp, int mode);
> @@ -194,13 +217,11 @@ static void sg_build_reserve(Sg_fd * sfp
>  static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
>  static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
>  static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
> -static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
> -static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
> +static void sg_remove_sfp(struct kref *);
>  static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
>  static Sg_request *sg_add_request(Sg_fd * sfp);
>  static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
>  static int sg_res_in_use(Sg_fd * sfp);
> -static Sg_device *sg_get_dev(int dev);
>  #ifdef CONFIG_SCSI_PROC_FS
>  static int sg_last_dev(void);
>  #endif
> @@ -236,23 +257,30 @@ sg_open(struct inode *inode, struct file
>  	lock_kernel();
>  	nonseekable_open(inode, filp);
>  	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
> -	sdp = sg_get_dev(dev);
> -	if ((!sdp) || (!sdp->device)) {
> -		unlock_kernel();
> -		return -ENXIO;
> -	}
> -	if (sdp->detached) {
> -		unlock_kernel();
> -		return -ENODEV;
> +
> +	retval = -ENXIO;
> +	read_lock_irq(&sg_index_lock);
> +	sdp = idr_find(&sg_index_idr, dev);
> +	if (sdp) {
> +		/* See comments about using idr_find() and d_ref.  If
> +		   sdp->detached, then the refcount may already be 0, in which
> +		   case it would be a bug to do kref_get(). */
> +		if (!sdp->detached)
> +			kref_get(&sdp->d_ref);
> +		else {
> +			sdp = NULL;
> +			retval = -ENODEV;
> +		}
>  	}
> +	read_unlock_irq(&sg_index_lock);
> +	if ((!sdp) || (!sdp->device))
> +		goto sg_put;

With the above sg_get_dev, we can do:

	sdp = sg_get_dev(dev);
	if (IS_ERR(sdp)) {
		unlock_kernel();
		return PTR_ERR(sdp);
	}
	


>  	/* This driver's module count bumped by fops_get in <linux/fs.h> */
>  	/* Prevent the device driver from vanishing while we sleep */
>  	retval = scsi_device_get(sdp->device);
> -	if (retval) {
> -		unlock_kernel();
> -		return retval;
> -	}
> +	if (retval)
> +		goto sg_put;
>  
>  	if (!((flags & O_NONBLOCK) ||
>  	      scsi_block_when_processing_errors(sdp->device))) {
> @@ -303,16 +331,20 @@ sg_open(struct inode *inode, struct file
>  	if ((sfp = sg_add_sfp(sdp, dev)))
>  		filp->private_data = sfp;
>  	else {
> -		if (flags & O_EXCL)
> +		if (flags & O_EXCL) {
>  			sdp->exclude = 0;	/* undo if error */
> +			wake_up_interruptible(&sdp->o_excl_wait);
> +		}
>  		retval = -ENOMEM;
>  		goto error_out;
>  	}
> -	unlock_kernel();
> -	return 0;
> -
> -      error_out:
> -	scsi_device_put(sdp->device);
> +	retval = 0;
> +error_out:
> +	if (retval)
> +		scsi_device_put(sdp->device);
> +sg_put:
> +	if (sdp)
> +		kref_put(&sdp->d_ref, sg_device_destroy);
>  	unlock_kernel();
>  	return retval;
>  }
> @@ -327,13 +359,14 @@ sg_release(struct inode *inode, struct f
>  	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
>  		return -ENXIO;
>  	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
> -	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
> -		if (!sdp->detached) {
> -			scsi_device_put(sdp->device);
> -		}
> -		sdp->exclude = 0;
> -		wake_up_interruptible(&sdp->o_excl_wait);
> -	}
> +
> +	sfp->closed = 1;
> +
> +	sdp->exclude = 0;
> +	wake_up_interruptible(&sdp->o_excl_wait);
> +
> +	kref_put(&sfp->f_ref, sg_remove_sfp);
> +
>  	return 0;
>  }
>  
> @@ -755,6 +788,7 @@ sg_common_write(Sg_fd * sfp, Sg_request 
>  	hp->duration = jiffies_to_msecs(jiffies);
>  
>  	srp->rq->timeout = timeout;
> +	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
>  	blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
>  			      srp->rq, 1, sg_rq_end_io);
>  	return 0;
> @@ -1247,25 +1281,28 @@ sg_mmap(struct file *filp, struct vm_are
>  static void sg_rq_end_io(struct request *rq, int uptodate)
>  {
>  	struct sg_request *srp = rq->end_io_data;
> -	Sg_device *sdp = NULL;
> +	Sg_device *sdp;
>  	Sg_fd *sfp;
>  	unsigned long iflags;
>  	unsigned int ms;
>  	char *sense;
> -	int result, resid;
> +	int result, resid, done = 1;
>  
> -	if (NULL == srp) {
> -		printk(KERN_ERR "sg_cmd_done: NULL request\n");
> +	if (unlikely(NULL == srp)) { /* shouldn't happen */
> +		printk(KERN_ERR "sg_rq_end_io: NULL request\n");
>  		return;
>  	}

I think that we can simply remove the above code. As you wrote, if it
happens, we do something completely wrong.


> +
>  	sfp = srp->parentfp;
> -	if (sfp)
> -		sdp = sfp->parentdp;
> -	if ((NULL == sdp) || sdp->detached) {
> -		printk(KERN_INFO "sg_cmd_done: device detached\n");
> +	if (unlikely(NULL == sfp)) { /* shouldn't happen */
> +		printk(KERN_ERR "sg_rq_end_io: NULL sg_fd\n");
>  		return;
>  	}

Ditto.


> +	sdp = sfp->parentdp;
> +	if (unlikely(sdp->detached))
> +		printk(KERN_INFO "sg_rq_end_io: device detached\n");
> +
>  	sense = rq->sense;
>  	result = rq->errors;
>  	resid = rq->data_len;
> @@ -1303,33 +1340,25 @@ static void sg_rq_end_io(struct request 
>  	}
>  	/* Rely on write phase to clean out srp status values, so no "else" */
>  
> -	if (sfp->closed) {	/* whoops this fd already released, cleanup */
> -		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
> -		sg_finish_rem_req(srp);
> -		srp = NULL;
> -		if (NULL == sfp->headrp) {
> -			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
> -			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
> -				scsi_device_put(sdp->device);
> -			}
> -			sfp = NULL;
> -		}
> -	} else if (srp && srp->orphan) {
> +	write_lock_irqsave(&sfp->rq_list_lock, iflags);
> +	if (unlikely(srp->orphan)) {
>  		if (sfp->keep_orphan)
>  			srp->sg_io_owned = 0;
> -		else {
> -			sg_finish_rem_req(srp);
> -			srp = NULL;
> -		}
> +		else
> +			done = 0;
>  	}
> -	if (sfp && srp) {
> -		/* Now wake up any sg_read() that is waiting for this packet. */
> -		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
> -		write_lock_irqsave(&sfp->rq_list_lock, iflags);
> -		srp->done = 1;
> +	srp->done = done;
> +	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> +
> +	if (likely(done)) {
> +		/* Now wake up any sg_read() that is waiting for this
> +		   packet. */
>  		wake_up_interruptible(&sfp->read_wait);
> -		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> -	}
> +		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
> +	} else
> +		sg_finish_rem_req(srp); /* call with srp->done == 0 */
> +
> +	kref_put(&sfp->f_ref, sg_remove_sfp);
>  }
>  
>  static struct file_operations sg_fops = {
> @@ -1391,6 +1420,7 @@ static Sg_device *sg_alloc(struct gendis
>  	init_waitqueue_head(&sdp->o_excl_wait);
>  	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
>  	sdp->index = k;
> +	kref_init(&sdp->d_ref);
>  
>  	error = 0;
>   out:
> @@ -1488,49 +1518,45 @@ out:
>  	return error;
>  }
>  
> -static void
> -sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
> +static void sg_device_destroy(struct kref *kref)
> +{
> +	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
> +	unsigned long flags;
> +
> +	/* CAUTION!  Note that the device can still be found via idr_find()
> +	   even though the refcount is 0.  Therefore, do idr_remove() BEFORE
> +	   any other cleanup. */
> +
> +	write_lock_irqsave(&sg_index_lock, flags);
> +	idr_remove(&sg_index_idr, sdp->index);
> +	write_unlock_irqrestore(&sg_index_lock, flags);
> +
> +	SCSI_LOG_TIMEOUT(3,
> +		printk("sg_device_destroy: %s\n",
> +			sdp->disk->disk_name));
> +
> +	put_disk(sdp->disk);
> +	kfree(sdp);
> +}
> +
> +static void sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
>  {
>  	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
>  	Sg_device *sdp = dev_get_drvdata(cl_dev);
>  	unsigned long iflags;
>  	Sg_fd *sfp;
> -	Sg_fd *tsfp;
> -	Sg_request *srp;
> -	Sg_request *tsrp;
> -	int delay;
>  
> -	if (!sdp)
> +	if (!sdp || sdp->detached)
>  		return;
>  
> -	delay = 0;
> +	SCSI_LOG_TIMEOUT(3, printk("sg_remove: %s\n", sdp->disk->disk_name));
> +
> +	/* Need a write lock to set sdp->detached. */
>  	write_lock_irqsave(&sg_index_lock, iflags);
> -	if (sdp->headfp) {
> -		sdp->detached = 1;
> -		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
> -			tsfp = sfp->nextfp;
> -			for (srp = sfp->headrp; srp; srp = tsrp) {
> -				tsrp = srp->nextrp;
> -				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
> -					sg_finish_rem_req(srp);
> -			}
> -			if (sfp->closed) {
> -				scsi_device_put(sdp->device);
> -				__sg_remove_sfp(sdp, sfp);
> -			} else {
> -				delay = 1;
> -				wake_up_interruptible(&sfp->read_wait);
> -				kill_fasync(&sfp->async_qp, SIGPOLL,
> -					    POLL_HUP);
> -			}
> -		}
> -		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
> -		if (NULL == sdp->headfp) {
> -			idr_remove(&sg_index_idr, sdp->index);
> -		}
> -	} else {	/* nothing active, simple case */
> -		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
> -		idr_remove(&sg_index_idr, sdp->index);
> +	sdp->detached = 1;
> +	for (sfp = sdp->headfp; sfp; sfp = sfp->nextfp) {
> +		wake_up_interruptible(&sfp->read_wait);
> +		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
>  	}
>  	write_unlock_irqrestore(&sg_index_lock, iflags);
>  
> @@ -1538,13 +1564,8 @@ sg_remove(struct device *cl_dev, struct 
>  	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
>  	cdev_del(sdp->cdev);
>  	sdp->cdev = NULL;
> -	put_disk(sdp->disk);
> -	sdp->disk = NULL;
> -	if (NULL == sdp->headfp)
> -		kfree(sdp);
>  
> -	if (delay)
> -		msleep(10);	/* dirty detach so delay device destruction */
> +	kref_put(&sdp->d_ref, sg_device_destroy);
>  }
>  
>  module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
> @@ -1941,22 +1962,6 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
>  	return resp;
>  }
>  
> -#ifdef CONFIG_SCSI_PROC_FS
> -static Sg_request *
> -sg_get_nth_request(Sg_fd * sfp, int nth)
> -{
> -	Sg_request *resp;
> -	unsigned long iflags;
> -	int k;
> -
> -	read_lock_irqsave(&sfp->rq_list_lock, iflags);
> -	for (k = 0, resp = sfp->headrp; resp && (k < nth);
> -	     ++k, resp = resp->nextrp) ;
> -	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> -	return resp;
> -}
> -#endif
> -
>  /* always adds to end of list */
>  static Sg_request *
>  sg_add_request(Sg_fd * sfp)
> @@ -2032,22 +2037,6 @@ sg_remove_request(Sg_fd * sfp, Sg_reques
>  	return res;
>  }
>  
> -#ifdef CONFIG_SCSI_PROC_FS
> -static Sg_fd *
> -sg_get_nth_sfp(Sg_device * sdp, int nth)
> -{
> -	Sg_fd *resp;
> -	unsigned long iflags;
> -	int k;
> -
> -	read_lock_irqsave(&sg_index_lock, iflags);
> -	for (k = 0, resp = sdp->headfp; resp && (k < nth);
> -	     ++k, resp = resp->nextfp) ;
> -	read_unlock_irqrestore(&sg_index_lock, iflags);
> -	return resp;
> -}
> -#endif
> -
>  static Sg_fd *
>  sg_add_sfp(Sg_device * sdp, int dev)
>  {
> @@ -2062,6 +2051,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
>  	init_waitqueue_head(&sfp->read_wait);
>  	rwlock_init(&sfp->rq_list_lock);
>  
> +	kref_init(&sfp->f_ref);
>  	sfp->timeout = SG_DEFAULT_TIMEOUT;
>  	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
>  	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
> @@ -2089,15 +2079,53 @@ sg_add_sfp(Sg_device * sdp, int dev)
>  	sg_build_reserve(sfp, bufflen);
>  	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
>  			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
> +
> +	kref_get(&sdp->d_ref);
> +	__module_get(THIS_MODULE);
>  	return sfp;
>  }
>  
> -static void
> -__sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
> +static void sg_remove_sfp_usercontext(struct work_struct *work)
> +{
> +	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
> +	struct sg_device *sdp = sfp->parentdp;
> +
> +	/* Cleanup any responses which were never read(). */
> +	while (sfp->headrp)
> +		sg_finish_rem_req(sfp->headrp);
> +
> +	if (sfp->reserve.bufflen > 0) {
> +		SCSI_LOG_TIMEOUT(6,
> +			printk("sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
> +				(int) sfp->reserve.bufflen,
> +				(int) sfp->reserve.k_use_sg));
> +		sg_remove_scat(&sfp->reserve);
> +	}
> +
> +	SCSI_LOG_TIMEOUT(6,
> +		printk("sg_remove_sfp: %s, sfp=0x%p\n",
> +			sdp->disk->disk_name,
> +			sfp));
> +	kfree(sfp);
> +
> +	scsi_device_put(sdp->device);
> +	kref_put(&sdp->d_ref, sg_device_destroy);
> +	module_put(THIS_MODULE);
> +}
> +
> +static void sg_remove_sfp(struct kref *kref)
>  {
> +	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
> +	struct sg_device *sdp = sfp->parentdp;
>  	Sg_fd *fp;
>  	Sg_fd *prev_fp;
> +	unsigned long iflags;
> +
> +	/* CAUTION!  Note that sfp can still be found by walking sdp->headfp
> +	   even though the refcount is now 0.  Therefore, unlink sfp from
> +	   sdp->headfp BEFORE doing any other cleanup. */
>  
> +	write_lock_irqsave(&sg_index_lock, iflags);
>  	prev_fp = sdp->headfp;
>  	if (sfp == prev_fp)
>  		sdp->headfp = prev_fp->nextfp;
> @@ -2110,54 +2138,10 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
>  			prev_fp = fp;
>  		}
>  	}
> -	if (sfp->reserve.bufflen > 0) {
> -		SCSI_LOG_TIMEOUT(6, 
> -			printk("__sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
> -			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
> -		sg_remove_scat(&sfp->reserve);
> -	}
> -	sfp->parentdp = NULL;
> -	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
> -	kfree(sfp);
> -}
> -
> -/* Returns 0 in normal case, 1 when detached and sdp object removed */
> -static int
> -sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
> -{
> -	Sg_request *srp;
> -	Sg_request *tsrp;
> -	int dirty = 0;
> -	int res = 0;
> -
> -	for (srp = sfp->headrp; srp; srp = tsrp) {
> -		tsrp = srp->nextrp;
> -		if (sg_srp_done(srp, sfp))
> -			sg_finish_rem_req(srp);
> -		else
> -			++dirty;
> -	}
> -	if (0 == dirty) {
> -		unsigned long iflags;
> +	write_unlock_irqrestore(&sg_index_lock, iflags);
> +	wake_up_interruptible(&sdp->o_excl_wait);
>  
> -		write_lock_irqsave(&sg_index_lock, iflags);
> -		__sg_remove_sfp(sdp, sfp);
> -		if (sdp->detached && (NULL == sdp->headfp)) {
> -			idr_remove(&sg_index_idr, sdp->index);
> -			kfree(sdp);
> -			res = 1;
> -		}
> -		write_unlock_irqrestore(&sg_index_lock, iflags);
> -	} else {
> -		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
> -		/* only bump the access_count if we actually succeeded in
> -		 * throwing another counter on the host module */
> -		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
> -		sfp->closed = 1;	/* flag dirty state on this fd */
> -		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
> -				  dirty));
> -	}
> -	return res;
> +	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
>  }
>  
>  static int
> @@ -2199,19 +2183,6 @@ sg_last_dev(void)
>  }
>  #endif
>  
> -static Sg_device *
> -sg_get_dev(int dev)
> -{
> -	Sg_device *sdp;
> -	unsigned long iflags;
> -
> -	read_lock_irqsave(&sg_index_lock, iflags);
> -	sdp = idr_find(&sg_index_idr, dev);
> -	read_unlock_irqrestore(&sg_index_lock, iflags);
> -
> -	return sdp;
> -}
> -
>  #ifdef CONFIG_SCSI_PROC_FS
>  
>  static struct proc_dir_entry *sg_proc_sgp = NULL;
> @@ -2468,8 +2439,10 @@ static int sg_proc_seq_show_dev(struct s
>  	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
>  	Sg_device *sdp;
>  	struct scsi_device *scsidp;
> +	unsigned long iflags;
>  
> -	sdp = it ? sg_get_dev(it->index) : NULL;
> +	read_lock_irqsave(&sg_index_lock, iflags);
> +	sdp = it ? idr_find(&sg_index_idr, it->index) : NULL;
>  	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
>  		seq_printf(s, "%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
>  			      scsidp->host->host_no, scsidp->channel,
> @@ -2480,6 +2453,7 @@ static int sg_proc_seq_show_dev(struct s
>  			      (int) scsi_device_online(scsidp));
>  	else
>  		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
> +	read_unlock_irqrestore(&sg_index_lock, iflags);
>  	return 0;
>  }
>  
> @@ -2493,13 +2467,16 @@ static int sg_proc_seq_show_devstrs(stru
>  	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
>  	Sg_device *sdp;
>  	struct scsi_device *scsidp;
> +	unsigned long iflags;
>  
> -	sdp = it ? sg_get_dev(it->index) : NULL;
> +	read_lock_irqsave(&sg_index_lock, iflags);
> +	sdp = it ? idr_find(&sg_index_idr, it->index) : NULL;
>  	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
>  		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
>  			   scsidp->vendor, scsidp->model, scsidp->rev);
>  	else
>  		seq_printf(s, "<no active device>\n");
> +	read_unlock_irqrestore(&sg_index_lock, iflags);
>  	return 0;
>  }
>  
> @@ -2512,7 +2489,8 @@ static void sg_proc_debug_helper(struct 
>  	const char * cp;
>  	unsigned int ms;
>  
> -	for (k = 0; (fp = sg_get_nth_sfp(sdp, k)); ++k) {
> +	for (k = 0, fp = sdp->headfp; fp != NULL; ++k, fp = fp->nextfp) {
> +		read_lock(&fp->rq_list_lock); /* irqs already disabled */
>  		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
>  			   "(res)sgat=%d low_dma=%d\n", k + 1,
>  			   jiffies_to_msecs(fp->timeout),
> @@ -2522,7 +2500,9 @@ static void sg_proc_debug_helper(struct 
>  		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=%d\n",
>  			   (int) fp->cmd_q, (int) fp->force_packid,
>  			   (int) fp->keep_orphan, (int) fp->closed);
> -		for (m = 0; (srp = sg_get_nth_request(fp, m)); ++m) {
> +		for (m = 0, srp = fp->headrp;
> +				srp != NULL;
> +				++m, srp = srp->nextrp) {
>  			hp = &srp->header;
>  			new_interface = (hp->interface_id == '\0') ? 0 : 1;
>  			if (srp->res_used) {
> @@ -2559,6 +2539,7 @@ static void sg_proc_debug_helper(struct 
>  		}
>  		if (0 == m)
>  			seq_printf(s, "     No requests active\n");
> +		read_unlock(&fp->rq_list_lock);
>  	}
>  }
>  
> @@ -2571,23 +2552,26 @@ static int sg_proc_seq_show_debug(struct
>  {
>  	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
>  	Sg_device *sdp;
> +	unsigned long iflags;
>  
>  	if (it && (0 == it->index)) {
>  		seq_printf(s, "max_active_device=%d(origin 1)\n",
>  			   (int)it->max);
>  		seq_printf(s, " def_reserved_size=%d\n", sg_big_buff);
>  	}
> -	sdp = it ? sg_get_dev(it->index) : NULL;
> +
> +	read_lock_irqsave(&sg_index_lock, iflags);
> +	sdp = it ? idr_find(&sg_index_idr, it->index) : NULL;
>  	if (sdp) {
>  		struct scsi_device *scsidp = sdp->device;
>  
>  		if (NULL == scsidp) {
>  			seq_printf(s, "device %d detached ??\n", 
>  				   (int)it->index);
> -			return 0;
> +			goto out;
>  		}
>  
> -		if (sg_get_nth_sfp(sdp, 0)) {
> +		if (sdp->headfp) {
>  			seq_printf(s, " >>> device=%s ",
>  				sdp->disk->disk_name);
>  			if (sdp->detached)
> @@ -2604,6 +2588,8 @@ static int sg_proc_seq_show_debug(struct
>  		}
>  		sg_proc_debug_helper(s, sdp);
>  	}
> +out:
> +	read_unlock_irqrestore(&sg_index_lock, iflags);
>  	return 0;
>  }
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/2] sg: fix races during device removal (v5)
  2009-01-20  1:06         ` FUJITA Tomonori
@ 2009-01-20 21:58           ` Tony Battersby
  2009-01-21 18:25             ` Stefan Richter
  2009-01-21 19:45             ` [PATCH 1/2] sg: fix races during device removal (v6) Tony Battersby
  2009-01-20 22:00           ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2) Tony Battersby
  1 sibling, 2 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-20 21:58 UTC (permalink / raw)
  To: James.Bottomley; +Cc: FUJITA Tomonori, dgilbert, hch, linux-scsi

sg has the following problems related to device removal:

* opening a sg fd races with removing a device
* closing a sg fd races with removing a device
* /proc/scsi/sg/* access races with removing a device
* command completion races with removing a device
* command completion races with closing a sg fd
* can rmmod sg with active commands

These problems can cause kernel oopses, memory-use-after-free, or
double-free errors.  This patch fixes these problems by using krefs
to manage the lifetime of sg_device and sg_fd.

Each command submitted to the midlevel holds a reference to sg_fd
until the completion callback.  This ensures that sg_fd doesn't go
away if the fd is closed with commands still outstanding.

sg_fd gets the reference of sg_device (with scsi_device) and also
makes sure that the sg module doesn't go away.

/proc/scsi/sg/* functions don't play nicely with krefs because they
give information about sg_fds which have been closed but not yet
freed due to still having outstanding commands and sg_devices which
have been removed but not yet freed due to still being referenced
by one or more sg_fds.  To deal with this safely without removing
functionality, /proc functions now access sg_device and sg_fd while
holding a lock instead of using kref_get()/kref_put().

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

This version addresses Fujita's comments about version 4.  However,
regarding the unnecessary checks in sg_rq_end_io(), I think that it is
best to have some checks there to help catch bugs.  I think BUG_ON is
the best way to go for this instead of printk/return because bugs
will be noticed and fixed instead of possibly going unnoticed.
BUG_ON is also less distracting than printk/return when reading the
source, and it also has a clearer intent.  To that end, I removed the
check for srp == NULL, which probably wasn't useful.  I changed the
check for sfp == NULL to BUG_ON(); this would happen if a bug caused
sg_finish_rem_req() or sg_remove_request() to be called before
sg_rq_end_io() or if a buggy LLDD called sc->scsi_done() twice for the
same command.  I also added BUG_ON(srp->done != 0) to catch similar
problems.

This version also changes locking in sg_alloc() to make it impossible
for idr_find() called under lock to return sdp with sdp->device == NULL
or some other not-quite-fully-initialized state, so it is no longer
necessary to check for that condition after idr_find().  I also fixed
sg_alloc() forgetting to call idr_remove() if k >= SG_MAX_DEVS.

 sg.c |  420 ++++++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 201 insertions(+), 219 deletions(-)

--- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-20 16:24:59.000000000 -0500
+++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-20 16:32:28.000000000 -0500
@@ -101,6 +101,7 @@ static int scatter_elem_sz_prev = SG_SCA
 #define SG_SECTOR_MSK (SG_SECTOR_SZ - 1)
 
 static int sg_add(struct device *, struct class_interface *);
+static void sg_device_destroy(struct kref *kref);
 static void sg_remove(struct device *, struct class_interface *);
 
 static DEFINE_IDR(sg_index_idr);
@@ -158,6 +159,8 @@ typedef struct sg_fd {		/* holds the sta
 	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
+	struct kref f_ref;
+	struct execute_work ew;
 } Sg_fd;
 
 typedef struct sg_device { /* holds the state of each scsi generic device */
@@ -171,6 +174,7 @@ typedef struct sg_device { /* holds the 
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
 	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	struct kref d_ref;
 } Sg_device;
 
 static int sg_fasync(int fd, struct file *filp, int mode);
@@ -194,13 +198,14 @@ static void sg_build_reserve(Sg_fd * sfp
 static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
-static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
-static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
+static void sg_remove_sfp(struct kref *);
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static int sg_res_in_use(Sg_fd * sfp);
+static Sg_device *sg_lookup_dev(int dev);
 static Sg_device *sg_get_dev(int dev);
+static void sg_put_dev(Sg_device *sdp);
 #ifdef CONFIG_SCSI_PROC_FS
 static int sg_last_dev(void);
 #endif
@@ -237,22 +242,17 @@ sg_open(struct inode *inode, struct file
 	nonseekable_open(inode, filp);
 	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
 	sdp = sg_get_dev(dev);
-	if ((!sdp) || (!sdp->device)) {
-		unlock_kernel();
-		return -ENXIO;
-	}
-	if (sdp->detached) {
-		unlock_kernel();
-		return -ENODEV;
+	if (IS_ERR(sdp)) {
+		retval = PTR_ERR(sdp);
+		sdp = NULL;
+		goto sg_put;
 	}
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
-	if (retval) {
-		unlock_kernel();
-		return retval;
-	}
+	if (retval)
+		goto sg_put;
 
 	if (!((flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
@@ -303,16 +303,20 @@ sg_open(struct inode *inode, struct file
 	if ((sfp = sg_add_sfp(sdp, dev)))
 		filp->private_data = sfp;
 	else {
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			sdp->exclude = 0;	/* undo if error */
+			wake_up_interruptible(&sdp->o_excl_wait);
+		}
 		retval = -ENOMEM;
 		goto error_out;
 	}
-	unlock_kernel();
-	return 0;
-
-      error_out:
-	scsi_device_put(sdp->device);
+	retval = 0;
+error_out:
+	if (retval)
+		scsi_device_put(sdp->device);
+sg_put:
+	if (sdp)
+		sg_put_dev(sdp);
 	unlock_kernel();
 	return retval;
 }
@@ -327,13 +331,13 @@ sg_release(struct inode *inode, struct f
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
-	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
-		if (!sdp->detached) {
-			scsi_device_put(sdp->device);
-		}
-		sdp->exclude = 0;
-		wake_up_interruptible(&sdp->o_excl_wait);
-	}
+
+	sfp->closed = 1;
+
+	sdp->exclude = 0;
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 	return 0;
 }
 
@@ -755,6 +759,7 @@ sg_common_write(Sg_fd * sfp, Sg_request 
 	hp->duration = jiffies_to_msecs(jiffies);
 
 	srp->rq->timeout = timeout;
+	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
 			      srp->rq, 1, sg_rq_end_io);
 	return 0;
@@ -1247,24 +1252,21 @@ sg_mmap(struct file *filp, struct vm_are
 static void sg_rq_end_io(struct request *rq, int uptodate)
 {
 	struct sg_request *srp = rq->end_io_data;
-	Sg_device *sdp = NULL;
+	Sg_device *sdp;
 	Sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
 	char *sense;
-	int result, resid;
+	int result, resid, done = 1;
+
+	BUG_ON(srp->done != 0);
 
-	if (NULL == srp) {
-		printk(KERN_ERR "sg_cmd_done: NULL request\n");
-		return;
-	}
 	sfp = srp->parentfp;
-	if (sfp)
-		sdp = sfp->parentdp;
-	if ((NULL == sdp) || sdp->detached) {
-		printk(KERN_INFO "sg_cmd_done: device detached\n");
-		return;
-	}
+	BUG_ON(sfp == NULL);
+
+	sdp = sfp->parentdp;
+	if (unlikely(sdp->detached))
+		printk(KERN_INFO "sg_rq_end_io: device detached\n");
 
 	sense = rq->sense;
 	result = rq->errors;
@@ -1303,33 +1305,26 @@ static void sg_rq_end_io(struct request 
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
-	if (sfp->closed) {	/* whoops this fd already released, cleanup */
-		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
-		sg_finish_rem_req(srp);
-		srp = NULL;
-		if (NULL == sfp->headrp) {
-			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
-			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
-				scsi_device_put(sdp->device);
-			}
-			sfp = NULL;
-		}
-	} else if (srp && srp->orphan) {
+	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
-		else {
-			sg_finish_rem_req(srp);
-			srp = NULL;
-		}
+		else
+			done = 0;
 	}
-	if (sfp && srp) {
-		/* Now wake up any sg_read() that is waiting for this packet. */
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		write_lock_irqsave(&sfp->rq_list_lock, iflags);
-		srp->done = 1;
+	srp->done = done;
+	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+
+	if (likely(done)) {
+		/* Now wake up any sg_read() that is waiting for this
+		 * packet.
+		 */
 		wake_up_interruptible(&sfp->read_wait);
-		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	}
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+	} else
+		sg_finish_rem_req(srp); /* call with srp->done == 0 */
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
 static struct file_operations sg_fops = {
@@ -1364,17 +1359,18 @@ static Sg_device *sg_alloc(struct gendis
 		printk(KERN_WARNING "kmalloc Sg_device failure\n");
 		return ERR_PTR(-ENOMEM);
 	}
-	error = -ENOMEM;
+
 	if (!idr_pre_get(&sg_index_idr, GFP_KERNEL)) {
 		printk(KERN_WARNING "idr expansion Sg_device failure\n");
+		error = -ENOMEM;
 		goto out;
 	}
 
 	write_lock_irqsave(&sg_index_lock, iflags);
-	error = idr_get_new(&sg_index_idr, sdp, &k);
-	write_unlock_irqrestore(&sg_index_lock, iflags);
 
+	error = idr_get_new(&sg_index_idr, sdp, &k);
 	if (error) {
+		write_unlock_irqrestore(&sg_index_lock, iflags);
 		printk(KERN_WARNING "idr allocation Sg_device failure: %d\n",
 		       error);
 		goto out;
@@ -1391,6 +1387,9 @@ static Sg_device *sg_alloc(struct gendis
 	init_waitqueue_head(&sdp->o_excl_wait);
 	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
 	sdp->index = k;
+	kref_init(&sdp->d_ref);
+
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 
 	error = 0;
  out:
@@ -1401,6 +1400,8 @@ static Sg_device *sg_alloc(struct gendis
 	return sdp;
 
  overflow:
+	idr_remove(&sg_index_idr, k);
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 	sdev_printk(KERN_WARNING, scsidp,
 		    "Unable to attach sg device type=%d, minor "
 		    "number exceeds %d\n", scsidp->type, SG_MAX_DEVS - 1);
@@ -1488,49 +1489,46 @@ out:
 	return error;
 }
 
-static void
-sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
+static void sg_device_destroy(struct kref *kref)
+{
+	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+	unsigned long flags;
+
+	/* CAUTION!  Note that the device can still be found via idr_find()
+	 * even though the refcount is 0.  Therefore, do idr_remove() BEFORE
+	 * any other cleanup.
+	 */
+
+	write_lock_irqsave(&sg_index_lock, flags);
+	idr_remove(&sg_index_idr, sdp->index);
+	write_unlock_irqrestore(&sg_index_lock, flags);
+
+	SCSI_LOG_TIMEOUT(3,
+		printk("sg_device_destroy: %s\n",
+			sdp->disk->disk_name));
+
+	put_disk(sdp->disk);
+	kfree(sdp);
+}
+
+static void sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	Sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
 	Sg_fd *sfp;
-	Sg_fd *tsfp;
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int delay;
 
-	if (!sdp)
+	if (!sdp || sdp->detached)
 		return;
 
-	delay = 0;
+	SCSI_LOG_TIMEOUT(3, printk("sg_remove: %s\n", sdp->disk->disk_name));
+
+	/* Need a write lock to set sdp->detached. */
 	write_lock_irqsave(&sg_index_lock, iflags);
-	if (sdp->headfp) {
-		sdp->detached = 1;
-		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
-			tsfp = sfp->nextfp;
-			for (srp = sfp->headrp; srp; srp = tsrp) {
-				tsrp = srp->nextrp;
-				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
-					sg_finish_rem_req(srp);
-			}
-			if (sfp->closed) {
-				scsi_device_put(sdp->device);
-				__sg_remove_sfp(sdp, sfp);
-			} else {
-				delay = 1;
-				wake_up_interruptible(&sfp->read_wait);
-				kill_fasync(&sfp->async_qp, SIGPOLL,
-					    POLL_HUP);
-			}
-		}
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
-		if (NULL == sdp->headfp) {
-			idr_remove(&sg_index_idr, sdp->index);
-		}
-	} else {	/* nothing active, simple case */
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
-		idr_remove(&sg_index_idr, sdp->index);
+	sdp->detached = 1;
+	for (sfp = sdp->headfp; sfp; sfp = sfp->nextfp) {
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 
@@ -1538,13 +1536,8 @@ sg_remove(struct device *cl_dev, struct 
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
-	put_disk(sdp->disk);
-	sdp->disk = NULL;
-	if (NULL == sdp->headfp)
-		kfree(sdp);
 
-	if (delay)
-		msleep(10);	/* dirty detach so delay device destruction */
+	sg_put_dev(sdp);
 }
 
 module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
@@ -1941,22 +1934,6 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
 	return resp;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_request *
-sg_get_nth_request(Sg_fd * sfp, int nth)
-{
-	Sg_request *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	for (k = 0, resp = sfp->headrp; resp && (k < nth);
-	     ++k, resp = resp->nextrp) ;
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return resp;
-}
-#endif
-
 /* always adds to end of list */
 static Sg_request *
 sg_add_request(Sg_fd * sfp)
@@ -2032,22 +2009,6 @@ sg_remove_request(Sg_fd * sfp, Sg_reques
 	return res;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_fd *
-sg_get_nth_sfp(Sg_device * sdp, int nth)
-{
-	Sg_fd *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sg_index_lock, iflags);
-	for (k = 0, resp = sdp->headfp; resp && (k < nth);
-	     ++k, resp = resp->nextfp) ;
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-	return resp;
-}
-#endif
-
 static Sg_fd *
 sg_add_sfp(Sg_device * sdp, int dev)
 {
@@ -2062,6 +2023,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	init_waitqueue_head(&sfp->read_wait);
 	rwlock_init(&sfp->rq_list_lock);
 
+	kref_init(&sfp->f_ref);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
@@ -2089,15 +2051,54 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sg_build_reserve(sfp, bufflen);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
 			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
+
+	kref_get(&sdp->d_ref);
+	__module_get(THIS_MODULE);
 	return sfp;
 }
 
-static void
-__sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp_usercontext(struct work_struct *work)
+{
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_device *sdp = sfp->parentdp;
+
+	/* Cleanup any responses which were never read(). */
+	while (sfp->headrp)
+		sg_finish_rem_req(sfp->headrp);
+
+	if (sfp->reserve.bufflen > 0) {
+		SCSI_LOG_TIMEOUT(6,
+			printk("sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
+				(int) sfp->reserve.bufflen,
+				(int) sfp->reserve.k_use_sg));
+		sg_remove_scat(&sfp->reserve);
+	}
+
+	SCSI_LOG_TIMEOUT(6,
+		printk("sg_remove_sfp: %s, sfp=0x%p\n",
+			sdp->disk->disk_name,
+			sfp));
+	kfree(sfp);
+
+	scsi_device_put(sdp->device);
+	sg_put_dev(sdp);
+	module_put(THIS_MODULE);
+}
+
+static void sg_remove_sfp(struct kref *kref)
 {
+	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
+	struct sg_device *sdp = sfp->parentdp;
 	Sg_fd *fp;
 	Sg_fd *prev_fp;
+	unsigned long iflags;
+
+	/* CAUTION!  Note that sfp can still be found by walking sdp->headfp
+	 * even though the refcount is now 0.  Therefore, unlink sfp from
+	 * sdp->headfp BEFORE doing any other cleanup.
+	 */
 
+	write_lock_irqsave(&sg_index_lock, iflags);
 	prev_fp = sdp->headfp;
 	if (sfp == prev_fp)
 		sdp->headfp = prev_fp->nextfp;
@@ -2110,54 +2111,10 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
 			prev_fp = fp;
 		}
 	}
-	if (sfp->reserve.bufflen > 0) {
-		SCSI_LOG_TIMEOUT(6, 
-			printk("__sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
-			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
-		sg_remove_scat(&sfp->reserve);
-	}
-	sfp->parentdp = NULL;
-	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
-	kfree(sfp);
-}
-
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
-static int
-sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
-{
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int dirty = 0;
-	int res = 0;
-
-	for (srp = sfp->headrp; srp; srp = tsrp) {
-		tsrp = srp->nextrp;
-		if (sg_srp_done(srp, sfp))
-			sg_finish_rem_req(srp);
-		else
-			++dirty;
-	}
-	if (0 == dirty) {
-		unsigned long iflags;
+	write_unlock_irqrestore(&sg_index_lock, iflags);
+	wake_up_interruptible(&sdp->o_excl_wait);
 
-		write_lock_irqsave(&sg_index_lock, iflags);
-		__sg_remove_sfp(sdp, sfp);
-		if (sdp->detached && (NULL == sdp->headfp)) {
-			idr_remove(&sg_index_idr, sdp->index);
-			kfree(sdp);
-			res = 1;
-		}
-		write_unlock_irqrestore(&sg_index_lock, iflags);
-	} else {
-		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		/* only bump the access_count if we actually succeeded in
-		 * throwing another counter on the host module */
-		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
-		sfp->closed = 1;	/* flag dirty state on this fd */
-		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
-				  dirty));
-	}
-	return res;
+	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
 }
 
 static int
@@ -2199,19 +2156,38 @@ sg_last_dev(void)
 }
 #endif
 
-static Sg_device *
-sg_get_dev(int dev)
+/* must be called with sg_index_lock held */
+static Sg_device *sg_lookup_dev(int dev)
 {
-	Sg_device *sdp;
-	unsigned long iflags;
+	return idr_find(&sg_index_idr, dev);
+}
 
-	read_lock_irqsave(&sg_index_lock, iflags);
-	sdp = idr_find(&sg_index_idr, dev);
-	read_unlock_irqrestore(&sg_index_lock, iflags);
+static Sg_device *sg_get_dev(int dev)
+{
+	struct sg_device *sdp;
+	unsigned long flags;
+
+	read_lock_irqsave(&sg_index_lock, flags);
+	sdp = sg_lookup_dev(dev);
+	if (!sdp)
+		sdp = ERR_PTR(-ENXIO);
+	else if (sdp->detached) {
+		/* If sdp->detached, then the refcount may already be 0, in
+		 * which case it would be a bug to do kref_get().
+		 */
+		sdp = ERR_PTR(-ENODEV);
+	} else
+		kref_get(&sdp->d_ref);
+	read_unlock_irqrestore(&sg_index_lock, flags);
 
 	return sdp;
 }
 
+static void sg_put_dev(struct sg_device *sdp)
+{
+	kref_put(&sdp->d_ref, sg_device_destroy);
+}
+
 #ifdef CONFIG_SCSI_PROC_FS
 
 static struct proc_dir_entry *sg_proc_sgp = NULL;
@@ -2468,8 +2444,10 @@ static int sg_proc_seq_show_dev(struct s
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
 			      scsidp->host->host_no, scsidp->channel,
@@ -2480,6 +2458,7 @@ static int sg_proc_seq_show_dev(struct s
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
@@ -2493,16 +2472,20 @@ static int sg_proc_seq_show_devstrs(stru
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
 		seq_printf(s, "<no active device>\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
+/* must be called while holding sg_index_lock */
 static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 {
 	int k, m, new_interface, blen, usg;
@@ -2512,7 +2495,8 @@ static void sg_proc_debug_helper(struct 
 	const char * cp;
 	unsigned int ms;
 
-	for (k = 0; (fp = sg_get_nth_sfp(sdp, k)); ++k) {
+	for (k = 0, fp = sdp->headfp; fp != NULL; ++k, fp = fp->nextfp) {
+		read_lock(&fp->rq_list_lock); /* irqs already disabled */
 		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
 			   "(res)sgat=%d low_dma=%d\n", k + 1,
 			   jiffies_to_msecs(fp->timeout),
@@ -2522,7 +2506,9 @@ static void sg_proc_debug_helper(struct 
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=%d\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
 			   (int) fp->keep_orphan, (int) fp->closed);
-		for (m = 0; (srp = sg_get_nth_request(fp, m)); ++m) {
+		for (m = 0, srp = fp->headrp;
+				srp != NULL;
+				++m, srp = srp->nextrp) {
 			hp = &srp->header;
 			new_interface = (hp->interface_id == '\0') ? 0 : 1;
 			if (srp->res_used) {
@@ -2559,6 +2545,7 @@ static void sg_proc_debug_helper(struct 
 		}
 		if (0 == m)
 			seq_printf(s, "     No requests active\n");
+		read_unlock(&fp->rq_list_lock);
 	}
 }
 
@@ -2571,39 +2558,34 @@ static int sg_proc_seq_show_debug(struct
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
+	unsigned long iflags;
 
 	if (it && (0 == it->index)) {
 		seq_printf(s, "max_active_device=%d(origin 1)\n",
 			   (int)it->max);
 		seq_printf(s, " def_reserved_size=%d\n", sg_big_buff);
 	}
-	sdp = it ? sg_get_dev(it->index) : NULL;
-	if (sdp) {
-		struct scsi_device *scsidp = sdp->device;
 
-		if (NULL == scsidp) {
-			seq_printf(s, "device %d detached ??\n", 
-				   (int)it->index);
-			return 0;
-		}
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
+	if (sdp && sdp->headfp) {
+		struct scsi_device *scsidp = sdp->device;
 
-		if (sg_get_nth_sfp(sdp, 0)) {
-			seq_printf(s, " >>> device=%s ",
-				sdp->disk->disk_name);
-			if (sdp->detached)
-				seq_printf(s, "detached pending close ");
-			else
-				seq_printf
-				    (s, "scsi%d chan=%d id=%d lun=%d   em=%d",
-				     scsidp->host->host_no,
-				     scsidp->channel, scsidp->id,
-				     scsidp->lun,
-				     scsidp->host->hostt->emulated);
-			seq_printf(s, " sg_tablesize=%d excl=%d\n",
-				   sdp->sg_tablesize, sdp->exclude);
-		}
+		seq_printf(s, " >>> device=%s ", sdp->disk->disk_name);
+		if (sdp->detached)
+			seq_printf(s, "detached pending close ");
+		else
+			seq_printf
+			    (s, "scsi%d chan=%d id=%d lun=%d   em=%d",
+			     scsidp->host->host_no,
+			     scsidp->channel, scsidp->id,
+			     scsidp->lun,
+			     scsidp->host->hostt->emulated);
+		seq_printf(s, " sg_tablesize=%d excl=%d\n",
+			   sdp->sg_tablesize, sdp->exclude);
 		sg_proc_debug_helper(s, sdp);
 	}
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v5)
  2009-01-20 21:58           ` [PATCH 1/2] sg: fix races during device removal (v5) Tony Battersby
@ 2009-01-21 18:25             ` Stefan Richter
  2009-01-21 19:23               ` Tony Battersby
  2009-01-21 19:45             ` [PATCH 1/2] sg: fix races during device removal (v6) Tony Battersby
  1 sibling, 1 reply; 41+ messages in thread
From: Stefan Richter @ 2009-01-21 18:25 UTC (permalink / raw)
  To: Tony Battersby
  Cc: James.Bottomley, FUJITA Tomonori, dgilbert, hch, linux-scsi

Tony Battersby wrote:
> I think BUG_ON is
> the best way to go for this instead of printk/return because bugs
> will be noticed and fixed instead of possibly going unnoticed.
...
>  static void sg_rq_end_io(struct request *rq, int uptodate)
>  {
...
> +	int result, resid, done = 1;
> +
> +	BUG_ON(srp->done != 0);

AFAIU this is typically called in atomic context.  If so, WARN_ON is
preferred.

...
> +	BUG_ON(sfp == NULL);
> +
> +	sdp = sfp->parentdp;

This would bring up a NULL pointer dereference dump anyway.
-- 
Stefan Richter
-=====-==--= ---= =-=-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v5)
  2009-01-21 18:25             ` Stefan Richter
@ 2009-01-21 19:23               ` Tony Battersby
  0 siblings, 0 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-21 19:23 UTC (permalink / raw)
  To: Stefan Richter
  Cc: James.Bottomley, FUJITA Tomonori, dgilbert, hch, linux-scsi

Stefan Richter wrote:
> Tony Battersby wrote:
>   
>> I think BUG_ON is
>> the best way to go for this instead of printk/return because bugs
>> will be noticed and fixed instead of possibly going unnoticed.
>>     
> ...
>   
>>  static void sg_rq_end_io(struct request *rq, int uptodate)
>>  {
>>     
> ...
>   
>> +	int result, resid, done = 1;
>> +
>> +	BUG_ON(srp->done != 0);
>>     
>
> AFAIU this is typically called in atomic context.  If so, WARN_ON is
> preferred.
>
>   

Good point.  I think the following would be the best way then:

	if (WARN_ON(srp->done != 0))
		return;


> ...
>   
>> +	BUG_ON(sfp == NULL);
>> +
>> +	sdp = sfp->parentdp;
>>     
>
> This would bring up a NULL pointer dereference dump anyway.
>   

Yes, and that would have the same problem as BUG_ON by killing the
machine.  So I think WARN_ON()/return works best for this too.

Tony


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/2] sg: fix races during device removal (v6)
  2009-01-20 21:58           ` [PATCH 1/2] sg: fix races during device removal (v5) Tony Battersby
  2009-01-21 18:25             ` Stefan Richter
@ 2009-01-21 19:45             ` Tony Battersby
  2009-01-25 12:46               ` FUJITA Tomonori
  2009-01-26 13:57               ` Douglas Gilbert
  1 sibling, 2 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-21 19:45 UTC (permalink / raw)
  To: James.Bottomley; +Cc: FUJITA Tomonori, dgilbert, hch, linux-scsi

sg has the following problems related to device removal:

* opening a sg fd races with removing a device
* closing a sg fd races with removing a device
* /proc/scsi/sg/* access races with removing a device
* command completion races with removing a device
* command completion races with closing a sg fd
* can rmmod sg with active commands

These problems can cause kernel oopses, memory-use-after-free, or
double-free errors.  This patch fixes these problems by using krefs
to manage the lifetime of sg_device and sg_fd.

Each command submitted to the midlevel holds a reference to sg_fd
until the completion callback.  This ensures that sg_fd doesn't go
away if the fd is closed with commands still outstanding.

sg_fd gets the reference of sg_device (with scsi_device) and also
makes sure that the sg module doesn't go away.

/proc/scsi/sg/* functions don't play nicely with krefs because they
give information about sg_fds which have been closed but not yet
freed due to still having outstanding commands and sg_devices which
have been removed but not yet freed due to still being referenced
by one or more sg_fds.  To deal with this safely without removing
functionality, /proc functions now access sg_device and sg_fd while
holding a lock instead of using kref_get()/kref_put().

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

This version changes BUG_ON() to WARN_ON()/return as suggested by
Stefan Richter.

The second patch "[PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)"
is still the same as before, so I am not resending it.

 sg.c |  418 ++++++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 201 insertions(+), 217 deletions(-)

--- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-21 14:34:05.000000000 -0500
+++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-21 14:36:00.000000000 -0500
@@ -101,6 +101,7 @@ static int scatter_elem_sz_prev = SG_SCA
 #define SG_SECTOR_MSK (SG_SECTOR_SZ - 1)
 
 static int sg_add(struct device *, struct class_interface *);
+static void sg_device_destroy(struct kref *kref);
 static void sg_remove(struct device *, struct class_interface *);
 
 static DEFINE_IDR(sg_index_idr);
@@ -158,6 +159,8 @@ typedef struct sg_fd {		/* holds the sta
 	char next_cmd_len;	/* 0 -> automatic (def), >0 -> use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
+	struct kref f_ref;
+	struct execute_work ew;
 } Sg_fd;
 
 typedef struct sg_device { /* holds the state of each scsi generic device */
@@ -171,6 +174,7 @@ typedef struct sg_device { /* holds the 
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
 	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	struct kref d_ref;
 } Sg_device;
 
 static int sg_fasync(int fd, struct file *filp, int mode);
@@ -194,13 +198,14 @@ static void sg_build_reserve(Sg_fd * sfp
 static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp, int dev);
-static int sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
-static void __sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp);
+static void sg_remove_sfp(struct kref *);
 static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static int sg_res_in_use(Sg_fd * sfp);
+static Sg_device *sg_lookup_dev(int dev);
 static Sg_device *sg_get_dev(int dev);
+static void sg_put_dev(Sg_device *sdp);
 #ifdef CONFIG_SCSI_PROC_FS
 static int sg_last_dev(void);
 #endif
@@ -237,22 +242,17 @@ sg_open(struct inode *inode, struct file
 	nonseekable_open(inode, filp);
 	SCSI_LOG_TIMEOUT(3, printk("sg_open: dev=%d, flags=0x%x\n", dev, flags));
 	sdp = sg_get_dev(dev);
-	if ((!sdp) || (!sdp->device)) {
-		unlock_kernel();
-		return -ENXIO;
-	}
-	if (sdp->detached) {
-		unlock_kernel();
-		return -ENODEV;
+	if (IS_ERR(sdp)) {
+		retval = PTR_ERR(sdp);
+		sdp = NULL;
+		goto sg_put;
 	}
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
-	if (retval) {
-		unlock_kernel();
-		return retval;
-	}
+	if (retval)
+		goto sg_put;
 
 	if (!((flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
@@ -303,16 +303,20 @@ sg_open(struct inode *inode, struct file
 	if ((sfp = sg_add_sfp(sdp, dev)))
 		filp->private_data = sfp;
 	else {
-		if (flags & O_EXCL)
+		if (flags & O_EXCL) {
 			sdp->exclude = 0;	/* undo if error */
+			wake_up_interruptible(&sdp->o_excl_wait);
+		}
 		retval = -ENOMEM;
 		goto error_out;
 	}
-	unlock_kernel();
-	return 0;
-
-      error_out:
-	scsi_device_put(sdp->device);
+	retval = 0;
+error_out:
+	if (retval)
+		scsi_device_put(sdp->device);
+sg_put:
+	if (sdp)
+		sg_put_dev(sdp);
 	unlock_kernel();
 	return retval;
 }
@@ -327,13 +331,13 @@ sg_release(struct inode *inode, struct f
 	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, printk("sg_release: %s\n", sdp->disk->disk_name));
-	if (0 == sg_remove_sfp(sdp, sfp)) {	/* Returns 1 when sdp gone */
-		if (!sdp->detached) {
-			scsi_device_put(sdp->device);
-		}
-		sdp->exclude = 0;
-		wake_up_interruptible(&sdp->o_excl_wait);
-	}
+
+	sfp->closed = 1;
+
+	sdp->exclude = 0;
+	wake_up_interruptible(&sdp->o_excl_wait);
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 	return 0;
 }
 
@@ -755,6 +759,7 @@ sg_common_write(Sg_fd * sfp, Sg_request 
 	hp->duration = jiffies_to_msecs(jiffies);
 
 	srp->rq->timeout = timeout;
+	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->device->request_queue, sdp->disk,
 			      srp->rq, 1, sg_rq_end_io);
 	return 0;
@@ -1247,24 +1252,23 @@ sg_mmap(struct file *filp, struct vm_are
 static void sg_rq_end_io(struct request *rq, int uptodate)
 {
 	struct sg_request *srp = rq->end_io_data;
-	Sg_device *sdp = NULL;
+	Sg_device *sdp;
 	Sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
 	char *sense;
-	int result, resid;
+	int result, resid, done = 1;
 
-	if (NULL == srp) {
-		printk(KERN_ERR "sg_cmd_done: NULL request\n");
+	if (WARN_ON(srp->done != 0))
 		return;
-	}
+
 	sfp = srp->parentfp;
-	if (sfp)
-		sdp = sfp->parentdp;
-	if ((NULL == sdp) || sdp->detached) {
-		printk(KERN_INFO "sg_cmd_done: device detached\n");
+	if (WARN_ON(sfp == NULL))
 		return;
-	}
+
+	sdp = sfp->parentdp;
+	if (unlikely(sdp->detached))
+		printk(KERN_INFO "sg_rq_end_io: device detached\n");
 
 	sense = rq->sense;
 	result = rq->errors;
@@ -1303,33 +1307,26 @@ static void sg_rq_end_io(struct request 
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
-	if (sfp->closed) {	/* whoops this fd already released, cleanup */
-		SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, freeing ...\n"));
-		sg_finish_rem_req(srp);
-		srp = NULL;
-		if (NULL == sfp->headrp) {
-			SCSI_LOG_TIMEOUT(1, printk("sg_cmd_done: already closed, final cleanup\n"));
-			if (0 == sg_remove_sfp(sdp, sfp)) {	/* device still present */
-				scsi_device_put(sdp->device);
-			}
-			sfp = NULL;
-		}
-	} else if (srp && srp->orphan) {
+	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
-		else {
-			sg_finish_rem_req(srp);
-			srp = NULL;
-		}
+		else
+			done = 0;
 	}
-	if (sfp && srp) {
-		/* Now wake up any sg_read() that is waiting for this packet. */
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		write_lock_irqsave(&sfp->rq_list_lock, iflags);
-		srp->done = 1;
+	srp->done = done;
+	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+
+	if (likely(done)) {
+		/* Now wake up any sg_read() that is waiting for this
+		 * packet.
+		 */
 		wake_up_interruptible(&sfp->read_wait);
-		write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	}
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+	} else
+		sg_finish_rem_req(srp); /* call with srp->done == 0 */
+
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
 static struct file_operations sg_fops = {
@@ -1364,17 +1361,18 @@ static Sg_device *sg_alloc(struct gendis
 		printk(KERN_WARNING "kmalloc Sg_device failure\n");
 		return ERR_PTR(-ENOMEM);
 	}
-	error = -ENOMEM;
+
 	if (!idr_pre_get(&sg_index_idr, GFP_KERNEL)) {
 		printk(KERN_WARNING "idr expansion Sg_device failure\n");
+		error = -ENOMEM;
 		goto out;
 	}
 
 	write_lock_irqsave(&sg_index_lock, iflags);
-	error = idr_get_new(&sg_index_idr, sdp, &k);
-	write_unlock_irqrestore(&sg_index_lock, iflags);
 
+	error = idr_get_new(&sg_index_idr, sdp, &k);
 	if (error) {
+		write_unlock_irqrestore(&sg_index_lock, iflags);
 		printk(KERN_WARNING "idr allocation Sg_device failure: %d\n",
 		       error);
 		goto out;
@@ -1391,6 +1389,9 @@ static Sg_device *sg_alloc(struct gendis
 	init_waitqueue_head(&sdp->o_excl_wait);
 	sdp->sg_tablesize = min(q->max_hw_segments, q->max_phys_segments);
 	sdp->index = k;
+	kref_init(&sdp->d_ref);
+
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 
 	error = 0;
  out:
@@ -1401,6 +1402,8 @@ static Sg_device *sg_alloc(struct gendis
 	return sdp;
 
  overflow:
+	idr_remove(&sg_index_idr, k);
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 	sdev_printk(KERN_WARNING, scsidp,
 		    "Unable to attach sg device type=%d, minor "
 		    "number exceeds %d\n", scsidp->type, SG_MAX_DEVS - 1);
@@ -1488,49 +1491,46 @@ out:
 	return error;
 }
 
-static void
-sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
+static void sg_device_destroy(struct kref *kref)
+{
+	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
+	unsigned long flags;
+
+	/* CAUTION!  Note that the device can still be found via idr_find()
+	 * even though the refcount is 0.  Therefore, do idr_remove() BEFORE
+	 * any other cleanup.
+	 */
+
+	write_lock_irqsave(&sg_index_lock, flags);
+	idr_remove(&sg_index_idr, sdp->index);
+	write_unlock_irqrestore(&sg_index_lock, flags);
+
+	SCSI_LOG_TIMEOUT(3,
+		printk("sg_device_destroy: %s\n",
+			sdp->disk->disk_name));
+
+	put_disk(sdp->disk);
+	kfree(sdp);
+}
+
+static void sg_remove(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	Sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
 	Sg_fd *sfp;
-	Sg_fd *tsfp;
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int delay;
 
-	if (!sdp)
+	if (!sdp || sdp->detached)
 		return;
 
-	delay = 0;
+	SCSI_LOG_TIMEOUT(3, printk("sg_remove: %s\n", sdp->disk->disk_name));
+
+	/* Need a write lock to set sdp->detached. */
 	write_lock_irqsave(&sg_index_lock, iflags);
-	if (sdp->headfp) {
-		sdp->detached = 1;
-		for (sfp = sdp->headfp; sfp; sfp = tsfp) {
-			tsfp = sfp->nextfp;
-			for (srp = sfp->headrp; srp; srp = tsrp) {
-				tsrp = srp->nextrp;
-				if (sfp->closed || (0 == sg_srp_done(srp, sfp)))
-					sg_finish_rem_req(srp);
-			}
-			if (sfp->closed) {
-				scsi_device_put(sdp->device);
-				__sg_remove_sfp(sdp, sfp);
-			} else {
-				delay = 1;
-				wake_up_interruptible(&sfp->read_wait);
-				kill_fasync(&sfp->async_qp, SIGPOLL,
-					    POLL_HUP);
-			}
-		}
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d, dirty\n", sdp->index));
-		if (NULL == sdp->headfp) {
-			idr_remove(&sg_index_idr, sdp->index);
-		}
-	} else {	/* nothing active, simple case */
-		SCSI_LOG_TIMEOUT(3, printk("sg_remove: dev=%d\n", sdp->index));
-		idr_remove(&sg_index_idr, sdp->index);
+	sdp->detached = 1;
+	for (sfp = sdp->headfp; sfp; sfp = sfp->nextfp) {
+		wake_up_interruptible(&sfp->read_wait);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 
@@ -1538,13 +1538,8 @@ sg_remove(struct device *cl_dev, struct 
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
-	put_disk(sdp->disk);
-	sdp->disk = NULL;
-	if (NULL == sdp->headfp)
-		kfree(sdp);
 
-	if (delay)
-		msleep(10);	/* dirty detach so delay device destruction */
+	sg_put_dev(sdp);
 }
 
 module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
@@ -1941,22 +1936,6 @@ sg_get_rq_mark(Sg_fd * sfp, int pack_id)
 	return resp;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_request *
-sg_get_nth_request(Sg_fd * sfp, int nth)
-{
-	Sg_request *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	for (k = 0, resp = sfp->headrp; resp && (k < nth);
-	     ++k, resp = resp->nextrp) ;
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return resp;
-}
-#endif
-
 /* always adds to end of list */
 static Sg_request *
 sg_add_request(Sg_fd * sfp)
@@ -2032,22 +2011,6 @@ sg_remove_request(Sg_fd * sfp, Sg_reques
 	return res;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
-static Sg_fd *
-sg_get_nth_sfp(Sg_device * sdp, int nth)
-{
-	Sg_fd *resp;
-	unsigned long iflags;
-	int k;
-
-	read_lock_irqsave(&sg_index_lock, iflags);
-	for (k = 0, resp = sdp->headfp; resp && (k < nth);
-	     ++k, resp = resp->nextfp) ;
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-	return resp;
-}
-#endif
-
 static Sg_fd *
 sg_add_sfp(Sg_device * sdp, int dev)
 {
@@ -2062,6 +2025,7 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	init_waitqueue_head(&sfp->read_wait);
 	rwlock_init(&sfp->rq_list_lock);
 
+	kref_init(&sfp->f_ref);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
@@ -2089,15 +2053,54 @@ sg_add_sfp(Sg_device * sdp, int dev)
 	sg_build_reserve(sfp, bufflen);
 	SCSI_LOG_TIMEOUT(3, printk("sg_add_sfp:   bufflen=%d, k_use_sg=%d\n",
 			   sfp->reserve.bufflen, sfp->reserve.k_use_sg));
+
+	kref_get(&sdp->d_ref);
+	__module_get(THIS_MODULE);
 	return sfp;
 }
 
-static void
-__sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
+static void sg_remove_sfp_usercontext(struct work_struct *work)
+{
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_device *sdp = sfp->parentdp;
+
+	/* Cleanup any responses which were never read(). */
+	while (sfp->headrp)
+		sg_finish_rem_req(sfp->headrp);
+
+	if (sfp->reserve.bufflen > 0) {
+		SCSI_LOG_TIMEOUT(6,
+			printk("sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
+				(int) sfp->reserve.bufflen,
+				(int) sfp->reserve.k_use_sg));
+		sg_remove_scat(&sfp->reserve);
+	}
+
+	SCSI_LOG_TIMEOUT(6,
+		printk("sg_remove_sfp: %s, sfp=0x%p\n",
+			sdp->disk->disk_name,
+			sfp));
+	kfree(sfp);
+
+	scsi_device_put(sdp->device);
+	sg_put_dev(sdp);
+	module_put(THIS_MODULE);
+}
+
+static void sg_remove_sfp(struct kref *kref)
 {
+	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
+	struct sg_device *sdp = sfp->parentdp;
 	Sg_fd *fp;
 	Sg_fd *prev_fp;
+	unsigned long iflags;
+
+	/* CAUTION!  Note that sfp can still be found by walking sdp->headfp
+	 * even though the refcount is now 0.  Therefore, unlink sfp from
+	 * sdp->headfp BEFORE doing any other cleanup.
+	 */
 
+	write_lock_irqsave(&sg_index_lock, iflags);
 	prev_fp = sdp->headfp;
 	if (sfp == prev_fp)
 		sdp->headfp = prev_fp->nextfp;
@@ -2110,54 +2113,10 @@ __sg_remove_sfp(Sg_device * sdp, Sg_fd *
 			prev_fp = fp;
 		}
 	}
-	if (sfp->reserve.bufflen > 0) {
-		SCSI_LOG_TIMEOUT(6, 
-			printk("__sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
-			(int) sfp->reserve.bufflen, (int) sfp->reserve.k_use_sg));
-		sg_remove_scat(&sfp->reserve);
-	}
-	sfp->parentdp = NULL;
-	SCSI_LOG_TIMEOUT(6, printk("__sg_remove_sfp:    sfp=0x%p\n", sfp));
-	kfree(sfp);
-}
-
-/* Returns 0 in normal case, 1 when detached and sdp object removed */
-static int
-sg_remove_sfp(Sg_device * sdp, Sg_fd * sfp)
-{
-	Sg_request *srp;
-	Sg_request *tsrp;
-	int dirty = 0;
-	int res = 0;
-
-	for (srp = sfp->headrp; srp; srp = tsrp) {
-		tsrp = srp->nextrp;
-		if (sg_srp_done(srp, sfp))
-			sg_finish_rem_req(srp);
-		else
-			++dirty;
-	}
-	if (0 == dirty) {
-		unsigned long iflags;
+	write_unlock_irqrestore(&sg_index_lock, iflags);
+	wake_up_interruptible(&sdp->o_excl_wait);
 
-		write_lock_irqsave(&sg_index_lock, iflags);
-		__sg_remove_sfp(sdp, sfp);
-		if (sdp->detached && (NULL == sdp->headfp)) {
-			idr_remove(&sg_index_idr, sdp->index);
-			kfree(sdp);
-			res = 1;
-		}
-		write_unlock_irqrestore(&sg_index_lock, iflags);
-	} else {
-		/* MOD_INC's to inhibit unloading sg and associated adapter driver */
-		/* only bump the access_count if we actually succeeded in
-		 * throwing another counter on the host module */
-		scsi_device_get(sdp->device);	/* XXX: retval ignored? */	
-		sfp->closed = 1;	/* flag dirty state on this fd */
-		SCSI_LOG_TIMEOUT(1, printk("sg_remove_sfp: worrisome, %d writes pending\n",
-				  dirty));
-	}
-	return res;
+	execute_in_process_context(sg_remove_sfp_usercontext, &sfp->ew);
 }
 
 static int
@@ -2199,19 +2158,38 @@ sg_last_dev(void)
 }
 #endif
 
-static Sg_device *
-sg_get_dev(int dev)
+/* must be called with sg_index_lock held */
+static Sg_device *sg_lookup_dev(int dev)
 {
-	Sg_device *sdp;
-	unsigned long iflags;
+	return idr_find(&sg_index_idr, dev);
+}
 
-	read_lock_irqsave(&sg_index_lock, iflags);
-	sdp = idr_find(&sg_index_idr, dev);
-	read_unlock_irqrestore(&sg_index_lock, iflags);
+static Sg_device *sg_get_dev(int dev)
+{
+	struct sg_device *sdp;
+	unsigned long flags;
+
+	read_lock_irqsave(&sg_index_lock, flags);
+	sdp = sg_lookup_dev(dev);
+	if (!sdp)
+		sdp = ERR_PTR(-ENXIO);
+	else if (sdp->detached) {
+		/* If sdp->detached, then the refcount may already be 0, in
+		 * which case it would be a bug to do kref_get().
+		 */
+		sdp = ERR_PTR(-ENODEV);
+	} else
+		kref_get(&sdp->d_ref);
+	read_unlock_irqrestore(&sg_index_lock, flags);
 
 	return sdp;
 }
 
+static void sg_put_dev(struct sg_device *sdp)
+{
+	kref_put(&sdp->d_ref, sg_device_destroy);
+}
+
 #ifdef CONFIG_SCSI_PROC_FS
 
 static struct proc_dir_entry *sg_proc_sgp = NULL;
@@ -2468,8 +2446,10 @@ static int sg_proc_seq_show_dev(struct s
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n",
 			      scsidp->host->host_no, scsidp->channel,
@@ -2480,6 +2460,7 @@ static int sg_proc_seq_show_dev(struct s
 			      (int) scsi_device_online(scsidp));
 	else
 		seq_printf(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
@@ -2493,16 +2474,20 @@ static int sg_proc_seq_show_devstrs(stru
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
 	struct scsi_device *scsidp;
+	unsigned long iflags;
 
-	sdp = it ? sg_get_dev(it->index) : NULL;
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (sdp && (scsidp = sdp->device) && (!sdp->detached))
 		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
 		seq_printf(s, "<no active device>\n");
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 
+/* must be called while holding sg_index_lock */
 static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 {
 	int k, m, new_interface, blen, usg;
@@ -2512,7 +2497,8 @@ static void sg_proc_debug_helper(struct 
 	const char * cp;
 	unsigned int ms;
 
-	for (k = 0; (fp = sg_get_nth_sfp(sdp, k)); ++k) {
+	for (k = 0, fp = sdp->headfp; fp != NULL; ++k, fp = fp->nextfp) {
+		read_lock(&fp->rq_list_lock); /* irqs already disabled */
 		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
 			   "(res)sgat=%d low_dma=%d\n", k + 1,
 			   jiffies_to_msecs(fp->timeout),
@@ -2522,7 +2508,9 @@ static void sg_proc_debug_helper(struct 
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=%d\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
 			   (int) fp->keep_orphan, (int) fp->closed);
-		for (m = 0; (srp = sg_get_nth_request(fp, m)); ++m) {
+		for (m = 0, srp = fp->headrp;
+				srp != NULL;
+				++m, srp = srp->nextrp) {
 			hp = &srp->header;
 			new_interface = (hp->interface_id == '\0') ? 0 : 1;
 			if (srp->res_used) {
@@ -2559,6 +2547,7 @@ static void sg_proc_debug_helper(struct 
 		}
 		if (0 == m)
 			seq_printf(s, "     No requests active\n");
+		read_unlock(&fp->rq_list_lock);
 	}
 }
 
@@ -2571,39 +2560,34 @@ static int sg_proc_seq_show_debug(struct
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
 	Sg_device *sdp;
+	unsigned long iflags;
 
 	if (it && (0 == it->index)) {
 		seq_printf(s, "max_active_device=%d(origin 1)\n",
 			   (int)it->max);
 		seq_printf(s, " def_reserved_size=%d\n", sg_big_buff);
 	}
-	sdp = it ? sg_get_dev(it->index) : NULL;
-	if (sdp) {
-		struct scsi_device *scsidp = sdp->device;
 
-		if (NULL == scsidp) {
-			seq_printf(s, "device %d detached ??\n", 
-				   (int)it->index);
-			return 0;
-		}
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = it ? sg_lookup_dev(it->index) : NULL;
+	if (sdp && sdp->headfp) {
+		struct scsi_device *scsidp = sdp->device;
 
-		if (sg_get_nth_sfp(sdp, 0)) {
-			seq_printf(s, " >>> device=%s ",
-				sdp->disk->disk_name);
-			if (sdp->detached)
-				seq_printf(s, "detached pending close ");
-			else
-				seq_printf
-				    (s, "scsi%d chan=%d id=%d lun=%d   em=%d",
-				     scsidp->host->host_no,
-				     scsidp->channel, scsidp->id,
-				     scsidp->lun,
-				     scsidp->host->hostt->emulated);
-			seq_printf(s, " sg_tablesize=%d excl=%d\n",
-				   sdp->sg_tablesize, sdp->exclude);
-		}
+		seq_printf(s, " >>> device=%s ", sdp->disk->disk_name);
+		if (sdp->detached)
+			seq_printf(s, "detached pending close ");
+		else
+			seq_printf
+			    (s, "scsi%d chan=%d id=%d lun=%d   em=%d",
+			     scsidp->host->host_no,
+			     scsidp->channel, scsidp->id,
+			     scsidp->lun,
+			     scsidp->host->hostt->emulated);
+		seq_printf(s, " sg_tablesize=%d excl=%d\n",
+			   sdp->sg_tablesize, sdp->exclude);
 		sg_proc_debug_helper(s, sdp);
 	}
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v6)
  2009-01-21 19:45             ` [PATCH 1/2] sg: fix races during device removal (v6) Tony Battersby
@ 2009-01-25 12:46               ` FUJITA Tomonori
  2009-01-26 13:57               ` Douglas Gilbert
  1 sibling, 0 replies; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-25 12:46 UTC (permalink / raw)
  To: tonyb; +Cc: James.Bottomley, fujita.tomonori, dgilbert, hch, linux-scsi

On Wed, 21 Jan 2009 14:45:50 -0500
Tony Battersby <tonyb@cybernetics.com> wrote:

> sg has the following problems related to device removal:
> 
> * opening a sg fd races with removing a device
> * closing a sg fd races with removing a device
> * /proc/scsi/sg/* access races with removing a device
> * command completion races with removing a device
> * command completion races with closing a sg fd
> * can rmmod sg with active commands
> 
> These problems can cause kernel oopses, memory-use-after-free, or
> double-free errors.  This patch fixes these problems by using krefs
> to manage the lifetime of sg_device and sg_fd.
> 
> Each command submitted to the midlevel holds a reference to sg_fd
> until the completion callback.  This ensures that sg_fd doesn't go
> away if the fd is closed with commands still outstanding.
> 
> sg_fd gets the reference of sg_device (with scsi_device) and also
> makes sure that the sg module doesn't go away.
> 
> /proc/scsi/sg/* functions don't play nicely with krefs because they
> give information about sg_fds which have been closed but not yet
> freed due to still having outstanding commands and sg_devices which
> have been removed but not yet freed due to still being referenced
> by one or more sg_fds.  To deal with this safely without removing
> functionality, /proc functions now access sg_device and sg_fd while
> holding a lock instead of using kref_get()/kref_put().
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> ---
> 
> This version changes BUG_ON() to WARN_ON()/return as suggested by
> Stefan Richter.
> 
> The second patch "[PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)"
> is still the same as before, so I am not resending it.
> 
>  sg.c |  418 ++++++++++++++++++++++++++++++++-----------------------------------
>  1 file changed, 201 insertions(+), 217 deletions(-)

Looks good to me,

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v6)
  2009-01-21 19:45             ` [PATCH 1/2] sg: fix races during device removal (v6) Tony Battersby
  2009-01-25 12:46               ` FUJITA Tomonori
@ 2009-01-26 13:57               ` Douglas Gilbert
  2009-01-28  1:51                 ` FUJITA Tomonori
  1 sibling, 1 reply; 41+ messages in thread
From: Douglas Gilbert @ 2009-01-26 13:57 UTC (permalink / raw)
  To: Tony Battersby; +Cc: James.Bottomley, FUJITA Tomonori, hch, linux-scsi

Tony Battersby wrote:
> sg has the following problems related to device removal:
> 
> * opening a sg fd races with removing a device
> * closing a sg fd races with removing a device
> * /proc/scsi/sg/* access races with removing a device
> * command completion races with removing a device
> * command completion races with closing a sg fd
> * can rmmod sg with active commands
> 
> These problems can cause kernel oopses, memory-use-after-free, or
> double-free errors.  This patch fixes these problems by using krefs
> to manage the lifetime of sg_device and sg_fd.
> 
> Each command submitted to the midlevel holds a reference to sg_fd
> until the completion callback.  This ensures that sg_fd doesn't go
> away if the fd is closed with commands still outstanding.
> 
> sg_fd gets the reference of sg_device (with scsi_device) and also
> makes sure that the sg module doesn't go away.
> 
> /proc/scsi/sg/* functions don't play nicely with krefs because they
> give information about sg_fds which have been closed but not yet
> freed due to still having outstanding commands and sg_devices which
> have been removed but not yet freed due to still being referenced
> by one or more sg_fds.  To deal with this safely without removing
> functionality, /proc functions now access sg_device and sg_fd while
> holding a lock instead of using kref_get()/kref_put().
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> ---
> 
> This version changes BUG_ON() to WARN_ON()/return as suggested by
> Stefan Richter.
> 
> The second patch "[PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)"
> is still the same as before, so I am not resending it.
> 
>  sg.c |  418 ++++++++++++++++++++++++++++++++-----------------------------------
>  1 file changed, 201 insertions(+), 217 deletions(-)
> 
> --- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-21 14:34:05.000000000 -0500
> +++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-21 14:36:00.000000000 -0500

Tony,
We seem to have consensus on this version (v6 20090121).

Thanks for you work.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v6)
  2009-01-26 13:57               ` Douglas Gilbert
@ 2009-01-28  1:51                 ` FUJITA Tomonori
  2009-01-28 15:06                   ` James Bottomley
  0 siblings, 1 reply; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-28  1:51 UTC (permalink / raw)
  To: dgilbert; +Cc: tonyb, James.Bottomley, fujita.tomonori, hch, linux-scsi

On Mon, 26 Jan 2009 08:57:20 -0500
Douglas Gilbert <dgilbert@interlog.com> wrote:

> Tony Battersby wrote:
> > sg has the following problems related to device removal:
> > 
> > * opening a sg fd races with removing a device
> > * closing a sg fd races with removing a device
> > * /proc/scsi/sg/* access races with removing a device
> > * command completion races with removing a device
> > * command completion races with closing a sg fd
> > * can rmmod sg with active commands
> > 
> > These problems can cause kernel oopses, memory-use-after-free, or
> > double-free errors.  This patch fixes these problems by using krefs
> > to manage the lifetime of sg_device and sg_fd.
> > 
> > Each command submitted to the midlevel holds a reference to sg_fd
> > until the completion callback.  This ensures that sg_fd doesn't go
> > away if the fd is closed with commands still outstanding.
> > 
> > sg_fd gets the reference of sg_device (with scsi_device) and also
> > makes sure that the sg module doesn't go away.
> > 
> > /proc/scsi/sg/* functions don't play nicely with krefs because they
> > give information about sg_fds which have been closed but not yet
> > freed due to still having outstanding commands and sg_devices which
> > have been removed but not yet freed due to still being referenced
> > by one or more sg_fds.  To deal with this safely without removing
> > functionality, /proc functions now access sg_device and sg_fd while
> > holding a lock instead of using kref_get()/kref_put().
> > 
> > Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> > ---
> > 
> > This version changes BUG_ON() to WARN_ON()/return as suggested by
> > Stefan Richter.
> > 
> > The second patch "[PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)"
> > is still the same as before, so I am not resending it.
> > 
> >  sg.c |  418 ++++++++++++++++++++++++++++++++-----------------------------------
> >  1 file changed, 201 insertions(+), 217 deletions(-)
> > 
> > --- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-21 14:34:05.000000000 -0500
> > +++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-21 14:36:00.000000000 -0500
> 
> Tony,
> We seem to have consensus on this version (v6 20090121).
> 
> Thanks for you work.
> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>

Can we also get your ACK on:

[PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)

http://marc.info/?l=linux-scsi&m=123248892909435&w=2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/2] sg: fix races during device removal (v6)
  2009-01-28  1:51                 ` FUJITA Tomonori
@ 2009-01-28 15:06                   ` James Bottomley
  0 siblings, 0 replies; 41+ messages in thread
From: James Bottomley @ 2009-01-28 15:06 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: dgilbert, tonyb, hch, linux-scsi

On Wed, 2009-01-28 at 10:51 +0900, FUJITA Tomonori wrote:
> On Mon, 26 Jan 2009 08:57:20 -0500
> Douglas Gilbert <dgilbert@interlog.com> wrote:
> 
> > Tony Battersby wrote:
> > > sg has the following problems related to device removal:
> > > 
> > > * opening a sg fd races with removing a device
> > > * closing a sg fd races with removing a device
> > > * /proc/scsi/sg/* access races with removing a device
> > > * command completion races with removing a device
> > > * command completion races with closing a sg fd
> > > * can rmmod sg with active commands
> > > 
> > > These problems can cause kernel oopses, memory-use-after-free, or
> > > double-free errors.  This patch fixes these problems by using krefs
> > > to manage the lifetime of sg_device and sg_fd.
> > > 
> > > Each command submitted to the midlevel holds a reference to sg_fd
> > > until the completion callback.  This ensures that sg_fd doesn't go
> > > away if the fd is closed with commands still outstanding.
> > > 
> > > sg_fd gets the reference of sg_device (with scsi_device) and also
> > > makes sure that the sg module doesn't go away.
> > > 
> > > /proc/scsi/sg/* functions don't play nicely with krefs because they
> > > give information about sg_fds which have been closed but not yet
> > > freed due to still having outstanding commands and sg_devices which
> > > have been removed but not yet freed due to still being referenced
> > > by one or more sg_fds.  To deal with this safely without removing
> > > functionality, /proc functions now access sg_device and sg_fd while
> > > holding a lock instead of using kref_get()/kref_put().
> > > 
> > > Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> > > ---
> > > 
> > > This version changes BUG_ON() to WARN_ON()/return as suggested by
> > > Stefan Richter.
> > > 
> > > The second patch "[PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)"
> > > is still the same as before, so I am not resending it.
> > > 
> > >  sg.c |  418 ++++++++++++++++++++++++++++++++-----------------------------------
> > >  1 file changed, 201 insertions(+), 217 deletions(-)
> > > 
> > > --- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-21 14:34:05.000000000 -0500
> > > +++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-21 14:36:00.000000000 -0500
> > 
> > Tony,
> > We seem to have consensus on this version (v6 20090121).
> > 
> > Thanks for you work.
> > 
> > Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> 
> Can we also get your ACK on:
> 
> [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)
> 
> http://marc.info/?l=linux-scsi&m=123248892909435&w=2

Actually, using maintainer's prerogative, I was taking this as ack to
both.

James



^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)
  2009-01-20  1:06         ` FUJITA Tomonori
  2009-01-20 21:58           ` [PATCH 1/2] sg: fix races during device removal (v5) Tony Battersby
@ 2009-01-20 22:00           ` Tony Battersby
  2009-01-25 12:46             ` FUJITA Tomonori
  1 sibling, 1 reply; 41+ messages in thread
From: Tony Battersby @ 2009-01-20 22:00 UTC (permalink / raw)
  To: James.Bottomley; +Cc: FUJITA Tomonori, dgilbert, hch, linux-scsi

sg_io_owned needs to be set before the command is sent to the midlevel;
otherwise, a quickly-completing command may cause a different CPU
to see "srp->done == 1 && !srp->sg_io_owned", which would lead to
incorrect behavior.

Check srp->done and set srp->orphan while holding rq_list_lock to
prevent races with sg_rq_end_io().

There is no need to check sfp->closed from read/write/ioctl/poll/etc.
since the kernel guarantees that this won't happen.

The usefulness of sg_srp_done() was questionable before; now it is
definitely not needed.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

Compared to the previous version of this patch, this version removes
sg_srp_done() and the unnecessary check for sfp->closed.

 sg.c |   39 ++++++++++++++-------------------------
 1 file changed, 14 insertions(+), 25 deletions(-)

--- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-20 16:32:28.000000000 -0500
+++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-20 16:34:37.000000000 -0500
@@ -189,7 +189,7 @@ static ssize_t sg_new_read(Sg_fd * sfp, 
 			   Sg_request * srp);
 static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
 			const char __user *buf, size_t count, int blocking,
-			int read_only, Sg_request **o_srp);
+			int read_only, int sg_io_owned, Sg_request **o_srp);
 static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
 			   unsigned char *cmnd, int timeout, int blocking);
 static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer);
@@ -561,7 +561,8 @@ sg_write(struct file *filp, const char _
 		return -EFAULT;
 	blocking = !(filp->f_flags & O_NONBLOCK);
 	if (old_hdr.reply_len < 0)
-		return sg_new_write(sfp, filp, buf, count, blocking, 0, NULL);
+		return sg_new_write(sfp, filp, buf, count,
+				    blocking, 0, 0, NULL);
 	if (count < (SZ_SG_HEADER + 6))
 		return -EIO;	/* The minimum scsi command length is 6 bytes. */
 
@@ -642,7 +643,7 @@ sg_write(struct file *filp, const char _
 
 static ssize_t
 sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
-		 size_t count, int blocking, int read_only,
+		 size_t count, int blocking, int read_only, int sg_io_owned,
 		 Sg_request **o_srp)
 {
 	int k;
@@ -662,6 +663,7 @@ sg_new_write(Sg_fd *sfp, struct file *fi
 		SCSI_LOG_TIMEOUT(1, printk("sg_new_write: queue full\n"));
 		return -EDOM;
 	}
+	srp->sg_io_owned = sg_io_owned;
 	hp = &srp->header;
 	if (__copy_from_user(hp, buf, SZ_SG_IO_HDR)) {
 		sg_remove_request(sfp, srp);
@@ -766,18 +768,6 @@ sg_common_write(Sg_fd * sfp, Sg_request 
 }
 
 static int
-sg_srp_done(Sg_request *srp, Sg_fd *sfp)
-{
-	unsigned long iflags;
-	int done;
-
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	done = srp->done;
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return done;
-}
-
-static int
 sg_ioctl(struct inode *inode, struct file *filp,
 	 unsigned int cmd_in, unsigned long arg)
 {
@@ -809,27 +799,26 @@ sg_ioctl(struct inode *inode, struct fil
 				return -EFAULT;
 			result =
 			    sg_new_write(sfp, filp, p, SZ_SG_IO_HDR,
-					 blocking, read_only, &srp);
+					 blocking, read_only, 1, &srp);
 			if (result < 0)
 				return result;
-			srp->sg_io_owned = 1;
 			while (1) {
 				result = 0;	/* following macro to beat race condition */
 				__wait_event_interruptible(sfp->read_wait,
-					(sdp->detached || sfp->closed || sg_srp_done(srp, sfp)),
-							   result);
+					(srp->done || sdp->detached),
+					result);
 				if (sdp->detached)
 					return -ENODEV;
-				if (sfp->closed)
-					return 0;	/* request packet dropped already */
-				if (0 == result)
+				write_lock_irq(&sfp->rq_list_lock);
+				if (srp->done) {
+					srp->done = 2;
+					write_unlock_irq(&sfp->rq_list_lock);
 					break;
+				}
 				srp->orphan = 1;
+				write_unlock_irq(&sfp->rq_list_lock);
 				return result;	/* -ERESTARTSYS because signal hit process */
 			}
-			write_lock_irqsave(&sfp->rq_list_lock, iflags);
-			srp->done = 2;
-			write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 			result = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
 			return (result < 0) ? result : 0;
 		}



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2)
  2009-01-20 22:00           ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2) Tony Battersby
@ 2009-01-25 12:46             ` FUJITA Tomonori
  0 siblings, 0 replies; 41+ messages in thread
From: FUJITA Tomonori @ 2009-01-25 12:46 UTC (permalink / raw)
  To: tonyb; +Cc: James.Bottomley, fujita.tomonori, dgilbert, hch, linux-scsi

On Tue, 20 Jan 2009 17:00:09 -0500
Tony Battersby <tonyb@cybernetics.com> wrote:

> sg_io_owned needs to be set before the command is sent to the midlevel;
> otherwise, a quickly-completing command may cause a different CPU
> to see "srp->done == 1 && !srp->sg_io_owned", which would lead to
> incorrect behavior.
> 
> Check srp->done and set srp->orphan while holding rq_list_lock to
> prevent races with sg_rq_end_io().
> 
> There is no need to check sfp->closed from read/write/ioctl/poll/etc.
> since the kernel guarantees that this won't happen.
> 
> The usefulness of sg_srp_done() was questionable before; now it is
> definitely not needed.
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> ---
> 
> Compared to the previous version of this patch, this version removes
> sg_srp_done() and the unnecessary check for sfp->closed.
> 
>  sg.c |   39 ++++++++++++++-------------------------

It looks ok to me. Better (simpler) than the first version.

I guess that we could simplify 'done' and 'orphan' stuff a bit more
but we can do later on.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/2] sg: fix races with ioctl(SG_IO)
  2009-01-19  6:57     ` FUJITA Tomonori
  2009-01-19 15:02       ` Tony Battersby
  2009-01-19 23:03       ` [PATCH 1/2] sg: fix races during device removal (v4) Tony Battersby
@ 2009-01-19 23:06       ` Tony Battersby
  2 siblings, 0 replies; 41+ messages in thread
From: Tony Battersby @ 2009-01-19 23:06 UTC (permalink / raw)
  To: James.Bottomley; +Cc: FUJITA Tomonori, dgilbert, hch, linux-scsi

sg_io_owned needs to be set before the command is sent to the midlevel;
otherwise, a quickly-completing command may cause a different CPU
to see "srp->done == 1 && !srp->sg_io_owned", which would lead to
incorrect behavior.

Check srp->done and set srp->orphan while holding rq_list_lock to
prevent races with sg_rq_end_io().

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

Same as before, just rediffed against v4 of the previous patch.

 sg.c |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

--- linux-2.6.29-rc2/drivers/scsi/sg.c.orig	2009-01-19 16:26:21.000000000 -0500
+++ linux-2.6.29-rc2/drivers/scsi/sg.c	2009-01-19 17:53:28.000000000 -0500
@@ -208,7 +208,7 @@ static ssize_t sg_new_read(Sg_fd * sfp, 
 			   Sg_request * srp);
 static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
 			const char __user *buf, size_t count, int blocking,
-			int read_only, Sg_request **o_srp);
+			int read_only, int sg_io_owned, Sg_request **o_srp);
 static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
 			   unsigned char *cmnd, int timeout, int blocking);
 static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer);
@@ -590,7 +590,8 @@ sg_write(struct file *filp, const char _
 		return -EFAULT;
 	blocking = !(filp->f_flags & O_NONBLOCK);
 	if (old_hdr.reply_len < 0)
-		return sg_new_write(sfp, filp, buf, count, blocking, 0, NULL);
+		return sg_new_write(sfp, filp, buf, count,
+				    blocking, 0, 0, NULL);
 	if (count < (SZ_SG_HEADER + 6))
 		return -EIO;	/* The minimum scsi command length is 6 bytes. */
 
@@ -671,7 +672,7 @@ sg_write(struct file *filp, const char _
 
 static ssize_t
 sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
-		 size_t count, int blocking, int read_only,
+		 size_t count, int blocking, int read_only, int sg_io_owned,
 		 Sg_request **o_srp)
 {
 	int k;
@@ -691,6 +692,7 @@ sg_new_write(Sg_fd *sfp, struct file *fi
 		SCSI_LOG_TIMEOUT(1, printk("sg_new_write: queue full\n"));
 		return -EDOM;
 	}
+	srp->sg_io_owned = sg_io_owned;
 	hp = &srp->header;
 	if (__copy_from_user(hp, buf, SZ_SG_IO_HDR)) {
 		sg_remove_request(sfp, srp);
@@ -838,10 +840,9 @@ sg_ioctl(struct inode *inode, struct fil
 				return -EFAULT;
 			result =
 			    sg_new_write(sfp, filp, p, SZ_SG_IO_HDR,
-					 blocking, read_only, &srp);
+					 blocking, read_only, 1, &srp);
 			if (result < 0)
 				return result;
-			srp->sg_io_owned = 1;
 			while (1) {
 				result = 0;	/* following macro to beat race condition */
 				__wait_event_interruptible(sfp->read_wait,
@@ -851,14 +852,16 @@ sg_ioctl(struct inode *inode, struct fil
 					return -ENODEV;
 				if (sfp->closed)
 					return 0;	/* request packet dropped already */
-				if (0 == result)
+				write_lock_irq(&sfp->rq_list_lock);
+				if (srp->done) {
+					srp->done = 2;
+					write_unlock_irq(&sfp->rq_list_lock);
 					break;
+				}
 				srp->orphan = 1;
+				write_unlock_irq(&sfp->rq_list_lock);
 				return result;	/* -ERESTARTSYS because signal hit process */
 			}
-			write_lock_irqsave(&sfp->rq_list_lock, iflags);
-			srp->done = 2;
-			write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 			result = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
 			return (result < 0) ? result : 0;
 		}



^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2009-01-28 15:06 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-05 19:07 [PATCH 0/2] sg: fix races during device removal (v2) Tony Battersby
2009-01-08 23:21 ` Douglas Gilbert
2009-01-10 17:26 ` FUJITA Tomonori
2009-01-12 21:09   ` Tony Battersby
2009-01-13 16:24     ` FUJITA Tomonori
2009-01-14 20:31   ` Tony Battersby
2009-01-14 21:39     ` Greg KH
2009-01-14 21:59       ` Tony Battersby
2009-01-14 22:33       ` Stefan Richter
2009-01-14 22:53         ` Tony Battersby
2009-01-14 23:47           ` Stefan Richter
2009-01-15 14:47             ` Tony Battersby
2009-01-15 16:22               ` Stefan Richter
2009-01-15 16:44                 ` Stefan Richter
2009-01-15 18:17                 ` Tony Battersby
2009-01-15 18:47                   ` Stefan Richter
2009-01-15 19:14                     ` Stefan Richter
2009-01-15 19:20                     ` Tony Battersby
2009-01-15 20:43                       ` Stefan Richter
2009-01-15 21:43                         ` Tony Battersby
2009-01-15 21:58                           ` Stefan Richter
2009-01-15 22:23                             ` Tony Battersby
2009-01-15 23:24                               ` Stefan Richter
2009-01-16 14:16                                 ` Tony Battersby
2009-01-16  0:53                           ` Stefan Richter
2009-01-16  8:09                 ` Stefan Richter
2009-01-19  6:57     ` FUJITA Tomonori
2009-01-19 15:02       ` Tony Battersby
2009-01-19 23:03       ` [PATCH 1/2] sg: fix races during device removal (v4) Tony Battersby
2009-01-20  1:06         ` FUJITA Tomonori
2009-01-20 21:58           ` [PATCH 1/2] sg: fix races during device removal (v5) Tony Battersby
2009-01-21 18:25             ` Stefan Richter
2009-01-21 19:23               ` Tony Battersby
2009-01-21 19:45             ` [PATCH 1/2] sg: fix races during device removal (v6) Tony Battersby
2009-01-25 12:46               ` FUJITA Tomonori
2009-01-26 13:57               ` Douglas Gilbert
2009-01-28  1:51                 ` FUJITA Tomonori
2009-01-28 15:06                   ` James Bottomley
2009-01-20 22:00           ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) (v2) Tony Battersby
2009-01-25 12:46             ` FUJITA Tomonori
2009-01-19 23:06       ` [PATCH 2/2] sg: fix races with ioctl(SG_IO) Tony Battersby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox