From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk, linux-kernel@vger.kernel.org, vgoyal@redhat.com
Cc: ctalbott@google.com, ni@google.com, Tejun Heo <tj@kernel.org>,
	stable@kernel.org
Subject: [PATCH 01/10] block: make gendisk hold a reference to its queue
Date: Tue, 18 Oct 2011 21:26:15 -0700
Message-ID: <1318998384-22525-2-git-send-email-tj@kernel.org>
In-Reply-To: <1318998384-22525-1-git-send-email-tj@kernel.org>

The following command sequence triggers an oops.

# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
 RIP: 0010:[<ffffffff810d0879>]  [<ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
 Call Trace:
  [<ffffffff810d2845>] lock_acquire+0x95/0x140
  [<ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
  [<ffffffff811573bc>] bdi_lock_two+0x5c/0x70
  [<ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
  [<ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
  [<ffffffff811c4010>] __blkdev_put+0x160/0x1d0
  [<ffffffff811c40df>] blkdev_put+0x5f/0x190
  [<ffffffff8118f18d>] kill_block_super+0x4d/0x80
  [<ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
  [<ffffffff8119003a>] deactivate_super+0x4a/0x70
  [<ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
  [<ffffffff811acf2e>] sys_umount+0x7e/0x3a0
  [<ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b

This is because the bdev holds on to the disk but the disk doesn't pin
the associated queue.  If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi, and bdev_inode_switch_bdi() ends up
dereferencing the already freed bdi.

Even without this bug, a disk that doesn't hold a reference to its
associated queue is very unusual and error-prone.

Fix it by making add_disk() take an extra reference to its queue and
put it in disk_release(), and by ensuring that the disk and its fops
owner are put, in that order, only after all accesses to the disk and
queue are complete.
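
For illustration only, the lifetime rule described above can be modeled
in userspace roughly as follows.  This is a minimal sketch, not kernel
code: struct toy_queue, struct toy_disk, struct toy_module and every
toy_* helper are hypothetical stand-ins, and plain int counters are
assumed in place of kobject/module refcounts.  It shows the disk taking
its own queue reference when added, dropping it on release, and the
owner being cached before the disk is put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_module { int refcount; };

struct toy_queue {
	int refcount;
	bool *freed;		/* set to true when the queue is freed */
};

struct toy_disk {
	int refcount;
	struct toy_queue *queue;
	struct toy_module *owner;
};

/* Allocate a queue holding one base reference (owned by the driver). */
static struct toy_queue *toy_queue_alloc(bool *freed)
{
	struct toy_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	q->refcount = 1;
	q->freed = freed;
	*freed = false;
	return q;
}

static void toy_queue_get(struct toy_queue *q)
{
	assert(q->refcount > 0);
	q->refcount++;
}

static void toy_queue_put(struct toy_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0) {
		*q->freed = true;
		free(q);
	}
}

/*
 * Create a disk with a single opener.  The disk takes its own queue
 * reference, mirroring the extra ref taken in add_disk().
 */
static struct toy_disk *toy_disk_add(struct toy_queue *q,
				     struct toy_module *owner)
{
	struct toy_disk *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->queue = q;
	d->owner = owner;
	toy_queue_get(q);
	owner->refcount++;
	return d;
}

/* Drop a disk ref; the last put releases the disk's queue ref too. */
static void toy_disk_put(struct toy_disk *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		if (d->queue)
			toy_queue_put(d->queue);
		free(d);
	}
}

/* Put the disk first, then its owner, using a cached owner pointer. */
static void toy_release_opener(struct toy_disk *d)
{
	struct toy_module *owner = d->owner;	/* d may be freed below */

	toy_disk_put(d);
	owner->refcount--;
}
```

In this model the driver can drop its base queue reference while the
disk is still open, and the queue stays allocated until the last disk
reference goes away.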

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
---
 block/genhd.c  |    8 ++++++++
 fs/block_dev.c |   13 ++++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 3a3bccb..02e9fca 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -611,6 +611,12 @@ void add_disk(struct gendisk *disk)
 	register_disk(disk);
 	blk_register_queue(disk);
 
+	/*
+	 * Take an extra ref on queue which will be put on disk_release()
+	 * so that it sticks around as long as @disk is there.
+	 */
+	WARN_ON_ONCE(blk_get_queue(disk->queue));
+
 	retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
 				   "bdi");
 	WARN_ON(retval);
@@ -1095,6 +1101,8 @@ static void disk_release(struct device *dev)
 	disk_replace_part_tbl(disk, NULL);
 	free_part_stats(&disk->part0);
 	free_part_info(&disk->part0);
+	if (disk->queue)
+		blk_put_queue(disk->queue);
 	kfree(disk);
 }
 struct class block_class = {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0bed0d4..53bb05e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1085,6 +1085,7 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part);
 static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 {
 	struct gendisk *disk;
+	struct module *owner;
 	int ret;
 	int partno;
 	int perm = 0;
@@ -1110,6 +1111,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	disk = get_gendisk(bdev->bd_dev, &partno);
 	if (!disk)
 		goto out;
+	owner = disk->fops->owner;
 
 	disk_block_events(disk);
 	mutex_lock_nested(&bdev->bd_mutex, for_part);
@@ -1137,8 +1139,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 					bdev->bd_disk = NULL;
 					mutex_unlock(&bdev->bd_mutex);
 					disk_unblock_events(disk);
-					module_put(disk->fops->owner);
 					put_disk(disk);
+					module_put(owner);
 					goto restart;
 				}
 			}
@@ -1194,8 +1196,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 				goto out_unlock_bdev;
 		}
 		/* only one opener holds refs to the module and disk */
-		module_put(disk->fops->owner);
 		put_disk(disk);
+		module_put(owner);
 	}
 	bdev->bd_openers++;
 	if (for_part)
@@ -1215,8 +1217,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
  out_unlock_bdev:
 	mutex_unlock(&bdev->bd_mutex);
 	disk_unblock_events(disk);
-	module_put(disk->fops->owner);
 	put_disk(disk);
+	module_put(owner);
  out:
 	bdput(bdev);
 
@@ -1437,8 +1439,6 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 	if (!bdev->bd_openers) {
 		struct module *owner = disk->fops->owner;
 
-		put_disk(disk);
-		module_put(owner);
 		disk_put_part(bdev->bd_part);
 		bdev->bd_part = NULL;
 		bdev->bd_disk = NULL;
@@ -1447,6 +1447,9 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 		if (bdev != bdev->bd_contains)
 			victim = bdev->bd_contains;
 		bdev->bd_contains = NULL;
+
+		put_disk(disk);
+		module_put(owner);
 	}
 	mutex_unlock(&bdev->bd_mutex);
 	bdput(bdev);
-- 
1.7.3.1