linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3
@ 2008-08-25 10:47 Tejun Heo
  2008-08-25 10:47 ` [PATCH 01/09] block: misc updates Tejun Heo
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel

Hello, all.

This is the third take of block-extended-devt patchset and contains
the following 10 patches.

0001-block-misc-updates.patch
0002-block-make-variable-and-argument-names-more-consist.patch
0003-block-don-t-depend-on-consecutive-minor-space.patch
0004-block-fix-disk-part-dereferencing-race.patch
0005-block-fix-diskstats-access.patch
0006-block-implement-extended-dev-numbers.patch
0007-block-adjust-formatting-for-large-minors-and-add-ex.patch
0008-sd-ide-disk-apply-extended-minors-to-sd-and-ide.patch
0009-block-implement-CONFIG_DEBUG_BLOCK_EXT_DEVT.patch

0001-0002 are trivial preparation patches.  0003 implements accessors
and applies them to insulate consecutive device number assumption and
help fixing partition dereferencing problem.  0004-0005 fixes
partition dereferencing bug and simplifies stats handling.  0006-0009
implement and apply extended devt.

Changes from the last take[L] are...

* Adapted to blk-for-2.6.28. 
* 0005 and 0006 of the last take are collapsed into 0005.
* Reverse iteration bug fixed.
* blk_alloc_devt() fixed to honor first_minor when allocating from
  consecutive minor range.

For general description of the patchset, please read description of
the first take[F].

This patchset is against...

  blk-for-2.6.28 (9abd7c437c02e7448fb1d2d3cfc0b9c1ab77cf2d)
+ [1] klist-dont-iterate-over-deleted-entries
+ [2] use-klist-for-class-device-list-and-implement-iterator
+ [3] block-misc-fixes-and-improvements patchset

and available in the following git tree.

  http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=block-extended-devt
  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-extended-devt

and the combined diffstat follows.

 block/blk-core.c                    |   57 ++-
 block/blk-merge.c                   |   14 
 block/genhd.c                       |  534 ++++++++++++++++++++++++++++++------
 block/ioctl.c                       |   39 +-
 drivers/block/aoe/aoecmd.c          |   15 -
 drivers/block/pktcdvd.c             |    2 
 drivers/block/ps3disk.c             |    2 
 drivers/char/random.c               |    6 
 drivers/ide/ide-disk.c              |   17 -
 drivers/md/dm-ioctl.c               |    4 
 drivers/md/dm-stripe.c              |    4 
 drivers/md/dm.c                     |   33 +-
 drivers/md/linear.c                 |    7 
 drivers/md/multipath.c              |    7 
 drivers/md/raid0.c                  |    7 
 drivers/md/raid1.c                  |    8 
 drivers/md/raid10.c                 |    7 
 drivers/md/raid5.c                  |    8 
 drivers/memstick/core/mspro_block.c |    2 
 drivers/mmc/card/block.c            |    2 
 drivers/s390/block/dasd_proc.c      |    3 
 drivers/s390/block/dcssblk.c        |    4 
 drivers/scsi/sd.c                   |   15 -
 drivers/scsi/sr.c                   |    2 
 fs/block_dev.c                      |   23 -
 fs/partitions/check.c               |  118 +++++--
 include/linux/fs.h                  |    1 
 include/linux/genhd.h               |  237 ++++++++-------
 include/linux/major.h               |    2 
 lib/Kconfig.debug                   |   16 +
 30 files changed, 861 insertions(+), 335 deletions(-)

Thanks.

--
tejun

[L] http://kerneltrap.org/mailarchive/linux-kernel/2008/7/14/2453024
[F] http://thread.gmane.org/gmane.linux.kernel/701825
[1] http://article.gmane.org/gmane.linux.kernel/725706
[2] http://article.gmane.org/gmane.linux.kernel/725708
[3] http://thread.gmane.org/gmane.linux.kernel/725724

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 01/09] block: misc updates
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 02/09] block: make variable and argument names more consistent Tejun Heo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

This patch makes the following misc updates in preparation for
disk->part dereference fix and extended block devt support.

* implment part_to_disk()

* fix comment about gendisk->part indexing

* rename get_part() to disk_map_sector()

* don't use n which is always zero while printing disk information in
  diskstats_show()

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-core.c           |    7 ++++---
 block/blk-merge.c          |    4 ++--
 block/genhd.c              |    4 ++--
 drivers/block/aoe/aoecmd.c |    2 +-
 include/linux/genhd.h      |   13 ++++++++++---
 5 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d785466..430b206 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -60,7 +60,7 @@ static void drive_stat_acct(struct request *rq, int new_io)
 	if (!blk_fs_request(rq) || !rq->rq_disk)
 		return;
 
-	part = get_part(rq->rq_disk, rq->sector);
+	part = disk_map_sector(rq->rq_disk, rq->sector);
 	if (!new_io)
 		__all_stat_inc(rq->rq_disk, part, merges[rw], rq->sector);
 	else {
@@ -1555,7 +1555,8 @@ static int __end_that_request_first(struct request *req, int error,
 	}
 
 	if (blk_fs_request(req) && req->rq_disk) {
-		struct hd_struct *part = get_part(req->rq_disk, req->sector);
+		struct hd_struct *part =
+			disk_map_sector(req->rq_disk, req->sector);
 		const int rw = rq_data_dir(req);
 
 		all_stat_add(req->rq_disk, part, sectors[rw],
@@ -1743,7 +1744,7 @@ static void end_that_request_last(struct request *req, int error)
 	if (disk && blk_fs_request(req) && req != &req->q->bar_rq) {
 		unsigned long duration = jiffies - req->start_time;
 		const int rw = rq_data_dir(req);
-		struct hd_struct *part = get_part(disk, req->sector);
+		struct hd_struct *part = disk_map_sector(disk, req->sector);
 
 		__all_stat_inc(disk, part, ios[rw], req->sector);
 		__all_stat_add(disk, part, ticks[rw], duration, req->sector);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index d81d914..9b17da6 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -387,8 +387,8 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 	elv_merge_requests(q, req, next);
 
 	if (req->rq_disk) {
-		struct hd_struct *part
-			= get_part(req->rq_disk, req->sector);
+		struct hd_struct *part =
+			disk_map_sector(req->rq_disk, req->sector);
 		disk_round_stats(req->rq_disk);
 		req->rq_disk->in_flight--;
 		if (part) {
diff --git a/block/genhd.c b/block/genhd.c
index e9d60fd..fe147bb 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -565,7 +565,7 @@ static int diskstats_show(struct seq_file *s, void *v)
 {
 	struct gendisk *gp = v;
 	char buf[BDEVNAME_SIZE];
-	int n = 0;
+	int n;
 
 	/*
 	if (&gp->dev.kobj.entry == block_class.devices.next)
@@ -579,7 +579,7 @@ static int diskstats_show(struct seq_file *s, void *v)
 	disk_round_stats(gp);
 	preempt_enable();
 	seq_printf(s, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
-		gp->major, n + gp->first_minor, disk_name(gp, n, buf),
+		gp->major, gp->first_minor, disk_name(gp, 0, buf),
 		disk_stat_read(gp, ios[0]), disk_stat_read(gp, merges[0]),
 		(unsigned long long)disk_stat_read(gp, sectors[0]),
 		jiffies_to_msecs(disk_stat_read(gp, ticks[0])),
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 2f17462..885d140 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -757,7 +757,7 @@ diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector
 	const int rw = bio_data_dir(bio);
 	struct hd_struct *part;
 
-	part = get_part(disk, sector);
+	part = disk_map_sector(disk, sector);
 	all_stat_inc(disk, part, ios[rw], sector);
 	all_stat_add(disk, part, ticks[rw], duration, sector);
 	all_stat_add(disk, part, sectors[rw], n_sect, sector);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 118216f..f802961 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -125,7 +125,7 @@ struct gendisk {
 	int minors;                     /* maximum number of minors, =1 for
                                          * disks that can't be partitioned. */
 	char disk_name[32];		/* name of major driver */
-	struct hd_struct **part;	/* [indexed by minor] */
+	struct hd_struct **part;	/* [indexed by minor - 1] */
 	struct block_device_operations *fops;
 	struct request_queue *queue;
 	struct blk_scsi_cmd_filter cmd_filter;
@@ -155,14 +155,21 @@ struct gendisk {
 #endif
 };
 
+static inline struct gendisk *part_to_disk(struct hd_struct *part)
+{
+	if (likely(part))
+		return dev_to_disk((part)->dev.parent);
+	return NULL;
+}
+
 /* 
  * Macros to operate on percpu disk statistics:
  *
  * The __ variants should only be called in critical sections. The full
  * variants disable/enable preemption.
  */
-static inline struct hd_struct *get_part(struct gendisk *gendiskp,
-					 sector_t sector)
+static inline struct hd_struct *disk_map_sector(struct gendisk *gendiskp,
+						sector_t sector)
 {
 	struct hd_struct *part;
 	int i;
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 02/09] block: make variable and argument names more consistent
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
  2008-08-25 10:47 ` [PATCH 01/09] block: misc updates Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 03/09] block: don't depend on consecutive minor space Tejun Heo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

In hd_struct, @partno is used to denote partition number and a number
of other places use @part to denote hd_struct.  Functions use @part
and @index instead.  This causes confusion and makes it difficult to
use consistent variable names for hd_struct.  Always use @partno if a
variable represents partition number.

Also, print out functions use @f or @part for seq_file argument.  Use
@seqf uniformly instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/genhd.c         |   36 ++++++++++++++++++------------------
 block/ioctl.c         |   15 ++++++++-------
 fs/block_dev.c        |    8 ++++----
 fs/partitions/check.c |   33 +++++++++++++++++----------------
 include/linux/genhd.h |   12 ++++++------
 5 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index fe147bb..3a33847 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -43,14 +43,14 @@ static inline int major_to_index(int major)
 }
 
 #ifdef CONFIG_PROC_FS
-void blkdev_show(struct seq_file *f, off_t offset)
+void blkdev_show(struct seq_file *seqf, off_t offset)
 {
 	struct blk_major_name *dp;
 
 	if (offset < BLKDEV_MAJOR_HASH_SIZE) {
 		mutex_lock(&block_class_lock);
 		for (dp = major_names[offset]; dp; dp = dp->next)
-			seq_printf(f, "%3d %s\n", dp->major, dp->name);
+			seq_printf(seqf, "%3d %s\n", dp->major, dp->name);
 		mutex_unlock(&block_class_lock);
 	}
 }
@@ -157,7 +157,7 @@ void blk_unregister_region(dev_t devt, unsigned long range)
 
 EXPORT_SYMBOL(blk_unregister_region);
 
-static struct kobject *exact_match(dev_t devt, int *part, void *data)
+static struct kobject *exact_match(dev_t devt, int *partno, void *data)
 {
 	struct gendisk *p = data;
 
@@ -219,9 +219,9 @@ void unlink_gendisk(struct gendisk *disk)
  * This function gets the structure containing partitioning
  * information for the given device @devt.
  */
-struct gendisk *get_gendisk(dev_t devt, int *part)
+struct gendisk *get_gendisk(dev_t devt, int *partno)
 {
-	struct kobject *kobj = kobj_lookup(bdev_map, devt, part);
+	struct kobject *kobj = kobj_lookup(bdev_map, devt, partno);
 	struct device *dev = kobj_to_dev(kobj);
 
 	return  kobj ? dev_to_disk(dev) : NULL;
@@ -337,7 +337,7 @@ static void *show_partition_start(struct seq_file *seqf, loff_t *pos)
 	return p;
 }
 
-static int show_partition(struct seq_file *part, void *v)
+static int show_partition(struct seq_file *seqf, void *v)
 {
 	struct gendisk *sgp = v;
 	int n;
@@ -351,7 +351,7 @@ static int show_partition(struct seq_file *part, void *v)
 		return 0;
 
 	/* show the full disk and all non-0 size partitions of it */
-	seq_printf(part, "%4d  %4d %10llu %s\n",
+	seq_printf(seqf, "%4d  %4d %10llu %s\n",
 		sgp->major, sgp->first_minor,
 		(unsigned long long)get_capacity(sgp) >> 1,
 		disk_name(sgp, 0, buf));
@@ -360,7 +360,7 @@ static int show_partition(struct seq_file *part, void *v)
 			continue;
 		if (sgp->part[n]->nr_sects == 0)
 			continue;
-		seq_printf(part, "%4d  %4d %10llu %s\n",
+		seq_printf(seqf, "%4d  %4d %10llu %s\n",
 			sgp->major, n + 1 + sgp->first_minor,
 			(unsigned long long)sgp->part[n]->nr_sects >> 1 ,
 			disk_name(sgp, n + 1, buf));
@@ -378,7 +378,7 @@ const struct seq_operations partitions_op = {
 #endif
 
 
-static struct kobject *base_probe(dev_t devt, int *part, void *data)
+static struct kobject *base_probe(dev_t devt, int *partno, void *data)
 {
 	if (request_module("block-major-%d-%d", MAJOR(devt), MINOR(devt)) > 0)
 		/* Make old-style 2.4 aliases work */
@@ -561,7 +561,7 @@ static struct device_type disk_type = {
  * The output looks suspiciously like /proc/partitions with a bunch of
  * extra fields.
  */
-static int diskstats_show(struct seq_file *s, void *v)
+static int diskstats_show(struct seq_file *seqf, void *v)
 {
 	struct gendisk *gp = v;
 	char buf[BDEVNAME_SIZE];
@@ -569,7 +569,7 @@ static int diskstats_show(struct seq_file *s, void *v)
 
 	/*
 	if (&gp->dev.kobj.entry == block_class.devices.next)
-		seq_puts(s,	"major minor name"
+		seq_puts(seqf,	"major minor name"
 				"     rio rmerge rsect ruse wio wmerge "
 				"wsect wuse running use aveq"
 				"\n\n");
@@ -578,7 +578,7 @@ static int diskstats_show(struct seq_file *s, void *v)
 	preempt_disable();
 	disk_round_stats(gp);
 	preempt_enable();
-	seq_printf(s, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
+	seq_printf(seqf, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
 		gp->major, gp->first_minor, disk_name(gp, 0, buf),
 		disk_stat_read(gp, ios[0]), disk_stat_read(gp, merges[0]),
 		(unsigned long long)disk_stat_read(gp, sectors[0]),
@@ -600,7 +600,7 @@ static int diskstats_show(struct seq_file *s, void *v)
 		preempt_disable();
 		part_round_stats(hd);
 		preempt_enable();
-		seq_printf(s, "%4d %4d %s %lu %lu %llu "
+		seq_printf(seqf, "%4d %4d %s %lu %lu %llu "
 			   "%u %lu %lu %llu %u %u %u %u\n",
 			   gp->major, n + gp->first_minor + 1,
 			   disk_name(gp, n + 1, buf),
@@ -652,7 +652,7 @@ void genhd_media_change_notify(struct gendisk *disk)
 EXPORT_SYMBOL_GPL(genhd_media_change_notify);
 #endif  /*  0  */
 
-dev_t blk_lookup_devt(const char *name, int part)
+dev_t blk_lookup_devt(const char *name, int partno)
 {
 	dev_t devt = MKDEV(0, 0);
 	struct class_dev_iter iter;
@@ -662,9 +662,9 @@ dev_t blk_lookup_devt(const char *name, int part)
 	while ((dev = class_dev_iter_next(&iter))) {
 		struct gendisk *disk = dev_to_disk(dev);
 
-		if (!strcmp(dev->bus_id, name) && part < disk->minors) {
+		if (!strcmp(dev->bus_id, name) && partno < disk->minors) {
 			devt = MKDEV(MAJOR(dev->devt),
-				     MINOR(dev->devt) + part);
+				     MINOR(dev->devt) + partno);
 			break;
 		}
 	}
@@ -774,10 +774,10 @@ int bdev_read_only(struct block_device *bdev)
 
 EXPORT_SYMBOL(bdev_read_only);
 
-int invalidate_partition(struct gendisk *disk, int index)
+int invalidate_partition(struct gendisk *disk, int partno)
 {
 	int res = 0;
-	struct block_device *bdev = bdget_disk(disk, index);
+	struct block_device *bdev = bdget_disk(disk, partno);
 	if (bdev) {
 		fsync_bdev(bdev);
 		res = __invalidate_device(bdev);
diff --git a/block/ioctl.c b/block/ioctl.c
index eb046ae..d77f5e2 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -15,7 +15,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 	struct blkpg_ioctl_arg a;
 	struct blkpg_partition p;
 	long long start, length;
-	int part;
+	int partno;
 	int i;
 	int err;
 
@@ -28,8 +28,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 	disk = bdev->bd_disk;
 	if (bdev != bdev->bd_contains)
 		return -EINVAL;
-	part = p.pno;
-	if (part <= 0 || part >= disk->minors)
+	partno = p.pno;
+	if (partno <= 0 || partno >= disk->minors)
 		return -EINVAL;
 	switch (a.op) {
 		case BLKPG_ADD_PARTITION:
@@ -59,13 +59,14 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 				}
 			}
 			/* all seems OK */
-			err = add_partition(disk, part, start, length, ADDPART_FLAG_NONE);
+			err = add_partition(disk, partno, start, length,
+					    ADDPART_FLAG_NONE);
 			mutex_unlock(&bdev->bd_mutex);
 			return err;
 		case BLKPG_DEL_PARTITION:
-			if (!disk->part[part-1])
+			if (!disk->part[partno - 1])
 				return -ENXIO;
-			bdevp = bdget_disk(disk, part);
+			bdevp = bdget_disk(disk, partno);
 			if (!bdevp)
 				return -ENOMEM;
 			mutex_lock(&bdevp->bd_mutex);
@@ -79,7 +80,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			invalidate_bdev(bdevp);
 
 			mutex_lock_nested(&bdev->bd_mutex, 1);
-			delete_partition(disk, part);
+			delete_partition(disk, partno);
 			mutex_unlock(&bdev->bd_mutex);
 			mutex_unlock(&bdevp->bd_mutex);
 			bdput(bdevp);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aff5421..de0776c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -930,7 +930,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 	struct module *owner = NULL;
 	struct gendisk *disk;
 	int ret;
-	int part;
+	int partno;
 	int perm = 0;
 
 	if (file->f_mode & FMODE_READ)
@@ -949,7 +949,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 	ret = -ENXIO;
 	file->f_mapping = bdev->bd_inode->i_mapping;
 	lock_kernel();
-	disk = get_gendisk(bdev->bd_dev, &part);
+	disk = get_gendisk(bdev->bd_dev, &partno);
 	if (!disk) {
 		unlock_kernel();
 		bdput(bdev);
@@ -961,7 +961,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
 		bdev->bd_contains = bdev;
-		if (!part) {
+		if (!partno) {
 			struct backing_dev_info *bdi;
 			if (disk->fops->open) {
 				ret = disk->fops->open(bdev->bd_inode, file);
@@ -989,7 +989,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 			if (ret)
 				goto out_first;
 			bdev->bd_contains = whole;
-			p = disk->part[part - 1];
+			p = disk->part[partno - 1];
 			bdev->bd_inode->i_data.backing_dev_info =
 			   whole->bd_inode->i_data.backing_dev_info;
 			if (!(disk->flags & GENHD_FL_UP) || !p || !p->nr_sects) {
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 43dbfab..db56f5f 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -120,22 +120,22 @@ static int (*check_part[])(struct parsed_partitions *, struct block_device *) =
  * a pointer to that same buffer (for convenience).
  */
 
-char *disk_name(struct gendisk *hd, int part, char *buf)
+char *disk_name(struct gendisk *hd, int partno, char *buf)
 {
-	if (!part)
+	if (!partno)
 		snprintf(buf, BDEVNAME_SIZE, "%s", hd->disk_name);
 	else if (isdigit(hd->disk_name[strlen(hd->disk_name)-1]))
-		snprintf(buf, BDEVNAME_SIZE, "%sp%d", hd->disk_name, part);
+		snprintf(buf, BDEVNAME_SIZE, "%sp%d", hd->disk_name, partno);
 	else
-		snprintf(buf, BDEVNAME_SIZE, "%s%d", hd->disk_name, part);
+		snprintf(buf, BDEVNAME_SIZE, "%s%d", hd->disk_name, partno);
 
 	return buf;
 }
 
 const char *bdevname(struct block_device *bdev, char *buf)
 {
-	int part = MINOR(bdev->bd_dev) - bdev->bd_disk->first_minor;
-	return disk_name(bdev->bd_disk, part, buf);
+	int partno = MINOR(bdev->bd_dev) - bdev->bd_disk->first_minor;
+	return disk_name(bdev->bd_disk, partno, buf);
 }
 
 EXPORT_SYMBOL(bdevname);
@@ -310,13 +310,13 @@ static inline void disk_sysfs_add_subdirs(struct gendisk *disk)
 	kobject_put(k);
 }
 
-void delete_partition(struct gendisk *disk, int part)
+void delete_partition(struct gendisk *disk, int partno)
 {
-	struct hd_struct *p = disk->part[part-1];
+	struct hd_struct *p = disk->part[partno - 1];
 
 	if (!p)
 		return;
-	disk->part[part-1] = NULL;
+	disk->part[partno - 1] = NULL;
 	p->start_sect = 0;
 	p->nr_sects = 0;
 	part_stat_set_all(p, 0);
@@ -333,12 +333,13 @@ static ssize_t whole_disk_show(struct device *dev,
 static DEVICE_ATTR(whole_disk, S_IRUSR | S_IRGRP | S_IROTH,
 		   whole_disk_show, NULL);
 
-int add_partition(struct gendisk *disk, int part, sector_t start, sector_t len, int flags)
+int add_partition(struct gendisk *disk, int partno,
+		  sector_t start, sector_t len, int flags)
 {
 	struct hd_struct *p;
 	int err;
 
-	if (disk->part[part - 1])
+	if (disk->part[partno - 1])
 		return -EBUSY;
 
 	p = kzalloc(sizeof(*p), GFP_KERNEL);
@@ -351,18 +352,18 @@ int add_partition(struct gendisk *disk, int part, sector_t start, sector_t len,
 	}
 	p->start_sect = start;
 	p->nr_sects = len;
-	p->partno = part;
+	p->partno = partno;
 	p->policy = disk->policy;
 
 	if (isdigit(disk->dev.bus_id[strlen(disk->dev.bus_id)-1]))
 		snprintf(p->dev.bus_id, BUS_ID_SIZE,
-		"%sp%d", disk->dev.bus_id, part);
+		"%sp%d", disk->dev.bus_id, partno);
 	else
 		snprintf(p->dev.bus_id, BUS_ID_SIZE,
-			 "%s%d", disk->dev.bus_id, part);
+			 "%s%d", disk->dev.bus_id, partno);
 
 	device_initialize(&p->dev);
-	p->dev.devt = MKDEV(disk->major, disk->first_minor + part);
+	p->dev.devt = MKDEV(disk->major, disk->first_minor + partno);
 	p->dev.class = &block_class;
 	p->dev.type = &part_type;
 	p->dev.parent = &disk->dev;
@@ -386,7 +387,7 @@ int add_partition(struct gendisk *disk, int part, sector_t start, sector_t len,
 	}
 
 	/* everything is up and running, commence */
-	disk->part[part - 1] = p;
+	disk->part[partno - 1] = p;
 
 	/* suppress uevent if the disk supresses it */
 	if (!disk->dev.uevent_suppress)
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index f802961..6bfff7a 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -375,7 +375,7 @@ extern int get_blkdev_list(char *, int);
 extern void add_disk(struct gendisk *disk);
 extern void del_gendisk(struct gendisk *gp);
 extern void unlink_gendisk(struct gendisk *gp);
-extern struct gendisk *get_gendisk(dev_t dev, int *part);
+extern struct gendisk *get_gendisk(dev_t dev, int *partno);
 
 extern void set_device_ro(struct block_device *bdev, int flag);
 extern void set_disk_ro(struct gendisk *disk, int flag);
@@ -544,8 +544,8 @@ struct unixware_disklabel {
 #define ADDPART_FLAG_RAID	1
 #define ADDPART_FLAG_WHOLEDISK	2
 
-extern dev_t blk_lookup_devt(const char *name, int part);
-extern char *disk_name (struct gendisk *hd, int part, char *buf);
+extern dev_t blk_lookup_devt(const char *name, int partno);
+extern char *disk_name (struct gendisk *hd, int partno, char *buf);
 
 extern int rescan_partitions(struct gendisk *disk, struct block_device *bdev);
 extern int __must_check add_partition(struct gendisk *, int, sector_t, sector_t, int);
@@ -563,16 +563,16 @@ extern void blk_register_region(dev_t devt, unsigned long range,
 			void *data);
 extern void blk_unregister_region(dev_t devt, unsigned long range);
 
-static inline struct block_device *bdget_disk(struct gendisk *disk, int index)
+static inline struct block_device *bdget_disk(struct gendisk *disk, int partno)
 {
-	return bdget(MKDEV(disk->major, disk->first_minor) + index);
+	return bdget(MKDEV(disk->major, disk->first_minor) + partno);
 }
 
 #else /* CONFIG_BLOCK */
 
 static inline void printk_all_partitions(void) { }
 
-static inline dev_t blk_lookup_devt(const char *name, int part)
+static inline dev_t blk_lookup_devt(const char *name, int partno)
 {
 	dev_t devt = MKDEV(0, 0);
 	return devt;
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 03/09] block: don't depend on consecutive minor space
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
  2008-08-25 10:47 ` [PATCH 01/09] block: misc updates Tejun Heo
  2008-08-25 10:47 ` [PATCH 02/09] block: make variable and argument names more consistent Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 04/09] block: fix disk->part[] dereferencing race Tejun Heo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

* Implement disk_devt() and part_devt() and use them to directly
  access devt instead of computing it from ->major and ->first_minor.

  Note that all references to ->major and ->first_minor outside of
  block layer is used to determine devt of the disk (the part0) and as
  ->major and ->first_minor will continue to represent devt for the
  disk, converting these users aren't strictly necessary.  However,
  convert them for consistency.

* Implement disk_max_parts() to avoid directly deferencing
  genhd->minors.

* Update bdget_disk() such that it doesn't assume consecutive minor
  space.

* Move devt computation from register_disk() to add_disk() and make it
  the only one (all other usages use the initially determined value).

These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/genhd.c                       |  107 +++++++++++++++++++++++++----------
 block/ioctl.c                       |    6 +-
 drivers/block/pktcdvd.c             |    2 +-
 drivers/block/ps3disk.c             |    2 +-
 drivers/char/random.c               |    6 +-
 drivers/md/dm-ioctl.c               |    4 +-
 drivers/md/dm-stripe.c              |    4 +-
 drivers/md/dm.c                     |    7 +-
 drivers/memstick/core/mspro_block.c |    2 +-
 drivers/mmc/card/block.c            |    2 +-
 drivers/s390/block/dasd_proc.c      |    3 +-
 drivers/s390/block/dcssblk.c        |    4 +-
 drivers/scsi/sr.c                   |    2 +-
 fs/block_dev.c                      |    2 +-
 fs/partitions/check.c               |   19 ++++---
 include/linux/genhd.h               |   27 +++++++--
 16 files changed, 132 insertions(+), 67 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 3a33847..d26b4b7 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -186,14 +186,15 @@ void add_disk(struct gendisk *disk)
 	int retval;
 
 	disk->flags |= GENHD_FL_UP;
-	blk_register_region(MKDEV(disk->major, disk->first_minor),
-			    disk->minors, NULL, exact_match, exact_lock, disk);
+	disk->dev.devt = MKDEV(disk->major, disk->first_minor);
+	blk_register_region(disk_devt(disk), disk->minors, NULL,
+			    exact_match, exact_lock, disk);
 	register_disk(disk);
 	blk_register_queue(disk);
 	blk_register_filter(disk);
 
 	bdi = &disk->queue->backing_dev_info;
-	bdi_register_dev(bdi, MKDEV(disk->major, disk->first_minor));
+	bdi_register_dev(bdi, disk_devt(disk));
 	retval = sysfs_create_link(&disk->dev.kobj, &bdi->dev->kobj, "bdi");
 	WARN_ON(retval);
 }
@@ -207,8 +208,7 @@ void unlink_gendisk(struct gendisk *disk)
 	sysfs_remove_link(&disk->dev.kobj, "bdi");
 	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
-	blk_unregister_region(MKDEV(disk->major, disk->first_minor),
-			      disk->minors);
+	blk_unregister_region(disk_devt(disk), disk->minors);
 }
 
 /**
@@ -227,6 +227,38 @@ struct gendisk *get_gendisk(dev_t devt, int *partno)
 	return  kobj ? dev_to_disk(dev) : NULL;
 }
 
+/**
+ * bdget_disk - do bdget() by gendisk and partition number
+ * @disk: gendisk of interest
+ * @partno: partition number
+ *
+ * Find partition @partno from @disk, do bdget() on it.
+ *
+ * CONTEXT:
+ * Don't care.
+ *
+ * RETURNS:
+ * Resulting block_device on success, NULL on failure.
+ */
+extern struct block_device *bdget_disk(struct gendisk *disk, int partno)
+{
+	dev_t devt = MKDEV(0, 0);
+
+	if (partno == 0)
+		devt = disk_devt(disk);
+	else {
+		struct hd_struct *part = disk->part[partno - 1];
+
+		if (part && part->nr_sects)
+			devt = part_devt(part);
+	}
+
+	if (likely(devt != MKDEV(0, 0)))
+		return bdget(devt);
+	return NULL;
+}
+EXPORT_SYMBOL(bdget_disk);
+
 /*
  * print a full list of all partitions - intended for places where the root
  * filesystem can't be mounted and thus to give the victim some idea of what
@@ -257,7 +289,7 @@ void __init printk_all_partitions(void)
 		 * option takes.
 		 */
 		printk("%02x%02x %10llu %s",
-		       disk->major, disk->first_minor,
+		       MAJOR(disk_devt(disk)), MINOR(disk_devt(disk)),
 		       (unsigned long long)get_capacity(disk) >> 1,
 		       disk_name(disk, 0, buf));
 		if (disk->driverfs_dev != NULL &&
@@ -268,15 +300,15 @@ void __init printk_all_partitions(void)
 			printk(" (driver?)\n");
 
 		/* now show the partitions */
-		for (n = 0; n < disk->minors - 1; ++n) {
-			if (disk->part[n] == NULL)
-				continue;
-			if (disk->part[n]->nr_sects == 0)
+		for (n = 0; n < disk_max_parts(disk); ++n) {
+			struct hd_struct *part = disk->part[n];
+
+			if (!part || !part->nr_sects)
 				continue;
 			printk("  %02x%02x %10llu %s\n",
-			       disk->major, n + 1 + disk->first_minor,
-			       (unsigned long long)disk->part[n]->nr_sects >> 1,
-			       disk_name(disk, n + 1, buf));
+			       MAJOR(part_devt(part)), MINOR(part_devt(part)),
+			       (unsigned long long)part->nr_sects >> 1,
+			       disk_name(disk, part->partno, buf));
 		}
 	}
 	class_dev_iter_exit(&iter);
@@ -344,26 +376,27 @@ static int show_partition(struct seq_file *seqf, void *v)
 	char buf[BDEVNAME_SIZE];
 
 	/* Don't show non-partitionable removeable devices or empty devices */
-	if (!get_capacity(sgp) ||
-			(sgp->minors == 1 && (sgp->flags & GENHD_FL_REMOVABLE)))
+	if (!get_capacity(sgp) || (!disk_max_parts(sgp) &&
+				   (sgp->flags & GENHD_FL_REMOVABLE)))
 		return 0;
 	if (sgp->flags & GENHD_FL_SUPPRESS_PARTITION_INFO)
 		return 0;
 
 	/* show the full disk and all non-0 size partitions of it */
 	seq_printf(seqf, "%4d  %4d %10llu %s\n",
-		sgp->major, sgp->first_minor,
+		MAJOR(disk_devt(sgp)), MINOR(disk_devt(sgp)),
 		(unsigned long long)get_capacity(sgp) >> 1,
 		disk_name(sgp, 0, buf));
-	for (n = 0; n < sgp->minors - 1; n++) {
-		if (!sgp->part[n])
+	for (n = 0; n < disk_max_parts(sgp); n++) {
+		struct hd_struct *part = sgp->part[n];
+		if (!part)
 			continue;
-		if (sgp->part[n]->nr_sects == 0)
+		if (part->nr_sects == 0)
 			continue;
 		seq_printf(seqf, "%4d  %4d %10llu %s\n",
-			sgp->major, n + 1 + sgp->first_minor,
-			(unsigned long long)sgp->part[n]->nr_sects >> 1 ,
-			disk_name(sgp, n + 1, buf));
+			   MAJOR(part_devt(part)), MINOR(part_devt(part)),
+			   (unsigned long long)part->nr_sects >> 1,
+			   disk_name(sgp, part->partno, buf));
 	}
 
 	return 0;
@@ -579,7 +612,8 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	disk_round_stats(gp);
 	preempt_enable();
 	seq_printf(seqf, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
-		gp->major, gp->first_minor, disk_name(gp, 0, buf),
+		MAJOR(disk_devt(gp)), MINOR(disk_devt(gp)),
+		disk_name(gp, 0, buf),
 		disk_stat_read(gp, ios[0]), disk_stat_read(gp, merges[0]),
 		(unsigned long long)disk_stat_read(gp, sectors[0]),
 		jiffies_to_msecs(disk_stat_read(gp, ticks[0])),
@@ -591,7 +625,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		jiffies_to_msecs(disk_stat_read(gp, time_in_queue)));
 
 	/* now show all non-0 size partitions of it */
-	for (n = 0; n < gp->minors - 1; n++) {
+	for (n = 0; n < disk_max_parts(gp); n++) {
 		struct hd_struct *hd = gp->part[n];
 
 		if (!hd || !hd->nr_sects)
@@ -602,8 +636,8 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		preempt_enable();
 		seq_printf(seqf, "%4d %4d %s %lu %lu %llu "
 			   "%u %lu %lu %llu %u %u %u %u\n",
-			   gp->major, n + gp->first_minor + 1,
-			   disk_name(gp, n + 1, buf),
+			   MAJOR(part_devt(hd)), MINOR(part_devt(hd)),
+			   disk_name(gp, hd->partno, buf),
 			   part_stat_read(hd, ios[0]),
 			   part_stat_read(hd, merges[0]),
 			   (unsigned long long)part_stat_read(hd, sectors[0]),
@@ -662,11 +696,22 @@ dev_t blk_lookup_devt(const char *name, int partno)
 	while ((dev = class_dev_iter_next(&iter))) {
 		struct gendisk *disk = dev_to_disk(dev);
 
-		if (!strcmp(dev->bus_id, name) && partno < disk->minors) {
-			devt = MKDEV(MAJOR(dev->devt),
-				     MINOR(dev->devt) + partno);
-			break;
+		if (strcmp(dev->bus_id, name))
+			continue;
+		if (partno < 0 || partno > disk_max_parts(disk))
+			continue;
+
+		if (partno == 0)
+			devt = disk_devt(disk);
+		else {
+			struct hd_struct *part = disk->part[partno - 1];
+
+			if (!part || !part->nr_sects)
+				continue;
+
+			devt = part_devt(part);
 		}
+		break;
 	}
 	class_dev_iter_exit(&iter);
 	return devt;
@@ -756,7 +801,7 @@ void set_disk_ro(struct gendisk *disk, int flag)
 {
 	int i;
 	disk->policy = flag;
-	for (i = 0; i < disk->minors - 1; i++)
+	for (i = 0; i < disk_max_parts(disk); i++)
 		if (disk->part[i]) disk->part[i]->policy = flag;
 }
 
diff --git a/block/ioctl.c b/block/ioctl.c
index d77f5e2..403f7d7 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -29,7 +29,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 	if (bdev != bdev->bd_contains)
 		return -EINVAL;
 	partno = p.pno;
-	if (partno <= 0 || partno >= disk->minors)
+	if (partno <= 0 || partno > disk_max_parts(disk))
 		return -EINVAL;
 	switch (a.op) {
 		case BLKPG_ADD_PARTITION:
@@ -47,7 +47,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			mutex_lock(&bdev->bd_mutex);
 
 			/* overlap? */
-			for (i = 0; i < disk->minors - 1; i++) {
+			for (i = 0; i < disk_max_parts(disk); i++) {
 				struct hd_struct *s = disk->part[i];
 
 				if (!s)
@@ -96,7 +96,7 @@ static int blkdev_reread_part(struct block_device *bdev)
 	struct gendisk *disk = bdev->bd_disk;
 	int res;
 
-	if (disk->minors == 1 || bdev != bdev->bd_contains)
+	if (!disk_max_parts(disk) || bdev != bdev->bd_contains)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 158eed4..19c739d 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2919,7 +2919,7 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
 	if (!disk->queue)
 		goto out_mem2;
 
-	pd->pkt_dev = MKDEV(disk->major, disk->first_minor);
+	pd->pkt_dev = MKDEV(pktdev_major, idx);
 	ret = pkt_new_dev(pd, dev);
 	if (ret)
 		goto out_new_dev;
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 4b0d6c7..936466f 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -541,7 +541,7 @@ static int ps3disk_remove(struct ps3_system_bus_device *_dev)
 	struct ps3disk_private *priv = dev->sbd.core.driver_data;
 
 	mutex_lock(&ps3disk_mask_mutex);
-	__clear_bit(priv->gendisk->first_minor / PS3DISK_MINORS,
+	__clear_bit(MINOR(disk_devt(priv->gendisk)) / PS3DISK_MINORS,
 		    &ps3disk_mask);
 	mutex_unlock(&ps3disk_mask_mutex);
 	del_gendisk(priv->gendisk);
diff --git a/drivers/char/random.c b/drivers/char/random.c
index e0d0e37..6c6282c 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -660,10 +660,10 @@ void add_disk_randomness(struct gendisk *disk)
 	if (!disk || !disk->random)
 		return;
 	/* first major is 1, so we get >= 0x200 here */
-	DEBUG_ENT("disk event %d:%d\n", disk->major, disk->first_minor);
+	DEBUG_ENT("disk event %d:%d\n",
+		  MAJOR(disk_devt(disk)), MINOR(disk_devt(disk)));
 
-	add_timer_randomness(disk->random,
-			     0x100 + MKDEV(disk->major, disk->first_minor));
+	add_timer_randomness(disk->random, 0x100 + disk_devt(disk));
 }
 #endif
 
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index b262c00..c3de311 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -426,7 +426,7 @@ static int list_devices(struct dm_ioctl *param, size_t param_size)
 				old_nl->next = (uint32_t) ((void *) nl -
 							   (void *) old_nl);
 			disk = dm_disk(hc->md);
-			nl->dev = huge_encode_dev(MKDEV(disk->major, disk->first_minor));
+			nl->dev = huge_encode_dev(disk_devt(disk));
 			nl->next = 0;
 			strcpy(nl->name, hc->name);
 
@@ -539,7 +539,7 @@ static int __dev_status(struct mapped_device *md, struct dm_ioctl *param)
 	if (dm_suspended(md))
 		param->flags |= DM_SUSPEND_FLAG;
 
-	param->dev = huge_encode_dev(MKDEV(disk->major, disk->first_minor));
+	param->dev = huge_encode_dev(disk_devt(disk));
 
 	/*
 	 * Yes, this will be out of date by the time it gets back
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 4de90ab..b745d8a 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -284,8 +284,8 @@ static int stripe_end_io(struct dm_target *ti, struct bio *bio,
 
 	memset(major_minor, 0, sizeof(major_minor));
 	sprintf(major_minor, "%d:%d",
-		bio->bi_bdev->bd_disk->major,
-		bio->bi_bdev->bd_disk->first_minor);
+		MAJOR(disk_devt(bio->bi_bdev->bd_disk)),
+		MINOR(disk_devt(bio->bi_bdev->bd_disk)));
 
 	/*
 	 * Test to see which stripe drive triggered the event
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index bca448e..b4ddb96 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1142,7 +1142,7 @@ static void unlock_fs(struct mapped_device *md);
 
 static void free_dev(struct mapped_device *md)
 {
-	int minor = md->disk->first_minor;
+	int minor = MINOR(disk_devt(md->disk));
 
 	if (md->suspended_bdev) {
 		unlock_fs(md);
@@ -1263,7 +1263,7 @@ static struct mapped_device *dm_find_md(dev_t dev)
 
 	md = idr_find(&_minor_idr, minor);
 	if (md && (md == MINOR_ALLOCED ||
-		   (dm_disk(md)->first_minor != minor) ||
+		   (MINOR(disk_devt(dm_disk(md))) != minor) ||
 		   test_bit(DMF_FREEING, &md->flags))) {
 		md = NULL;
 		goto out;
@@ -1314,7 +1314,8 @@ void dm_put(struct mapped_device *md)
 
 	if (atomic_dec_and_lock(&md->holders, &_minor_lock)) {
 		map = dm_get_table(md);
-		idr_replace(&_minor_idr, MINOR_ALLOCED, dm_disk(md)->first_minor);
+		idr_replace(&_minor_idr, MINOR_ALLOCED,
+			    MINOR(disk_devt(dm_disk(md))));
 		set_bit(DMF_FREEING, &md->flags);
 		spin_unlock(&_minor_lock);
 		if (!dm_suspended(md)) {
diff --git a/drivers/memstick/core/mspro_block.c b/drivers/memstick/core/mspro_block.c
index 44b1817..95bbd3b 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -195,7 +195,7 @@ static int mspro_block_bd_open(struct inode *inode, struct file *filp)
 static int mspro_block_disk_release(struct gendisk *disk)
 {
 	struct mspro_block_data *msb = disk->private_data;
-	int disk_id = disk->first_minor >> MEMSTICK_PART_SHIFT;
+	int disk_id = MINOR(disk_devt(disk)) >> MEMSTICK_PART_SHIFT;
 
 	mutex_lock(&mspro_block_disk_lock);
 
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 86dbb36..1cb6114 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -83,7 +83,7 @@ static void mmc_blk_put(struct mmc_blk_data *md)
 	mutex_lock(&open_lock);
 	md->usage--;
 	if (md->usage == 0) {
-		int devidx = md->disk->first_minor >> MMC_SHIFT;
+		int devidx = MINOR(disk_devt(md->disk)) >> MMC_SHIFT;
 		__clear_bit(devidx, dev_use);
 
 		put_disk(md->disk);
diff --git a/drivers/s390/block/dasd_proc.c b/drivers/s390/block/dasd_proc.c
index 03c0e40..e3b5c4d 100644
--- a/drivers/s390/block/dasd_proc.c
+++ b/drivers/s390/block/dasd_proc.c
@@ -76,7 +76,8 @@ dasd_devices_show(struct seq_file *m, void *v)
 	/* Print kdev. */
 	if (block->gdp)
 		seq_printf(m, " at (%3d:%6d)",
-			   block->gdp->major, block->gdp->first_minor);
+			   MAJOR(disk_devt(block->gdp)),
+			   MINOR(disk_devt(block->gdp)));
 	else
 		seq_printf(m, "  at (???:??????)");
 	/* Print device name. */
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 01fcdd9..805cd01 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -114,7 +114,7 @@ dcssblk_assign_free_minor(struct dcssblk_dev_info *dev_info)
 		found = 0;
 		// test if minor available
 		list_for_each_entry(entry, &dcssblk_devices, lh)
-			if (minor == entry->gd->first_minor)
+			if (minor == MINOR(disk_devt(entry->gd)))
 				found++;
 		if (!found) break; // got unused minor
 	}
@@ -392,7 +392,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 		goto unload_seg;
 	}
 	sprintf(dev_info->gd->disk_name, "dcssblk%d",
-		dev_info->gd->first_minor);
+		MINOR(disk_devt(dev_info->gd)));
 	list_add_tail(&dev_info->lh, &dcssblk_devices);
 
 	if (!try_module_get(THIS_MODULE)) {
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 27f5bfd..8dbe379 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -878,7 +878,7 @@ static void sr_kref_release(struct kref *kref)
 	struct gendisk *disk = cd->disk;
 
 	spin_lock(&sr_index_lock);
-	clear_bit(disk->first_minor, sr_index_bits);
+	clear_bit(MINOR(disk_devt(disk)), sr_index_bits);
 	spin_unlock(&sr_index_lock);
 
 	unregister_cdrom(&cd->cdi);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index de0776c..72e0a28 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -892,7 +892,7 @@ int check_disk_change(struct block_device *bdev)
 
 	if (bdops->revalidate_disk)
 		bdops->revalidate_disk(bdev->bd_disk);
-	if (bdev->bd_disk->minors > 1)
+	if (disk_max_parts(bdev->bd_disk))
 		bdev->bd_invalidated = 1;
 	return 1;
 }
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index db56f5f..0f33955 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -134,7 +134,11 @@ char *disk_name(struct gendisk *hd, int partno, char *buf)
 
 const char *bdevname(struct block_device *bdev, char *buf)
 {
-	int partno = MINOR(bdev->bd_dev) - bdev->bd_disk->first_minor;
+	int partno = 0;
+
+	if (bdev->bd_part)
+		partno = bdev->bd_part->partno;
+
 	return disk_name(bdev->bd_disk, partno, buf);
 }
 
@@ -169,7 +173,7 @@ check_partition(struct gendisk *hd, struct block_device *bdev)
 	if (isdigit(state->name[strlen(state->name)-1]))
 		sprintf(state->name, "p");
 
-	state->limit = hd->minors;
+	state->limit = disk_max_parts(hd) + 1;
 	i = res = err = 0;
 	while (!res && check_part[i]) {
 		memset(&state->parts, 0, sizeof(state->parts));
@@ -416,7 +420,6 @@ void register_disk(struct gendisk *disk)
 	int err;
 
 	disk->dev.parent = disk->driverfs_dev;
-	disk->dev.devt = MKDEV(disk->major, disk->first_minor);
 
 	strlcpy(disk->dev.bus_id, disk->disk_name, BUS_ID_SIZE);
 	/* ewww... some of these buggers have / in the name... */
@@ -440,7 +443,7 @@ void register_disk(struct gendisk *disk)
 	disk_sysfs_add_subdirs(disk);
 
 	/* No minors to use for partitions */
-	if (disk->minors == 1)
+	if (!disk_max_parts(disk))
 		goto exit;
 
 	/* No such device (e.g., media were just removed) */
@@ -463,8 +466,8 @@ exit:
 	kobject_uevent(&disk->dev.kobj, KOBJ_ADD);
 
 	/* announce possible partitions */
-	for (i = 1; i < disk->minors; i++) {
-		p = disk->part[i-1];
+	for (i = 0; i < disk_max_parts(disk); i++) {
+		p = disk->part[i];
 		if (!p || !p->nr_sects)
 			continue;
 		kobject_uevent(&p->dev.kobj, KOBJ_ADD);
@@ -482,7 +485,7 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 	if (res)
 		return res;
 	bdev->bd_invalidated = 0;
-	for (p = 1; p < disk->minors; p++)
+	for (p = 1; p <= disk_max_parts(disk); p++)
 		delete_partition(disk, p);
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
@@ -545,7 +548,7 @@ void del_gendisk(struct gendisk *disk)
 	int p;
 
 	/* invalidate stuff */
-	for (p = disk->minors - 1; p > 0; p--) {
+	for (p = disk_max_parts(disk); p > 0; p--) {
 		invalidate_partition(disk, p);
 		delete_partition(disk, p);
 	}
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6bfff7a..6cb34ba 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -120,10 +120,14 @@ struct blk_scsi_cmd_filter {
 };
 
 struct gendisk {
+	/* major, first_minor and minors are input parameters only,
+	 * don't use directly.  Use disk_devt() and disk_max_parts().
+	 */
 	int major;			/* major number of driver */
 	int first_minor;
 	int minors;                     /* maximum number of minors, =1 for
                                          * disks that can't be partitioned. */
+
 	char disk_name[32];		/* name of major driver */
 	struct hd_struct **part;	/* [indexed by minor - 1] */
 	struct block_device_operations *fops;
@@ -162,6 +166,21 @@ static inline struct gendisk *part_to_disk(struct hd_struct *part)
 	return NULL;
 }
 
+static inline int disk_max_parts(struct gendisk *disk)
+{
+	return disk->minors - 1;
+}
+
+static inline dev_t disk_devt(struct gendisk *disk)
+{
+	return disk->dev.devt;
+}
+
+static inline dev_t part_devt(struct hd_struct *part)
+{
+	return part->dev.devt;
+}
+
 /* 
  * Macros to operate on percpu disk statistics:
  *
@@ -173,7 +192,7 @@ static inline struct hd_struct *disk_map_sector(struct gendisk *gendiskp,
 {
 	struct hd_struct *part;
 	int i;
-	for (i = 0; i < gendiskp->minors - 1; i++) {
+	for (i = 0; i < disk_max_parts(gendiskp); i++) {
 		part = gendiskp->part[i];
 		if (part && part->start_sect <= sector
 		    && sector < part->start_sect + part->nr_sects)
@@ -376,6 +395,7 @@ extern void add_disk(struct gendisk *disk);
 extern void del_gendisk(struct gendisk *gp);
 extern void unlink_gendisk(struct gendisk *gp);
 extern struct gendisk *get_gendisk(dev_t dev, int *partno);
+extern struct block_device *bdget_disk(struct gendisk *disk, int partno);
 
 extern void set_device_ro(struct block_device *bdev, int flag);
 extern void set_disk_ro(struct gendisk *disk, int flag);
@@ -563,11 +583,6 @@ extern void blk_register_region(dev_t devt, unsigned long range,
 			void *data);
 extern void blk_unregister_region(dev_t devt, unsigned long range);
 
-static inline struct block_device *bdget_disk(struct gendisk *disk, int partno)
-{
-	return bdget(MKDEV(disk->major, disk->first_minor) + partno);
-}
-
 #else /* CONFIG_BLOCK */
 
 static inline void printk_all_partitions(void) { }
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 04/09] block: fix disk->part[] dereferencing race
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (2 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 03/09] block: don't depend on consecutive minor space Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 05/09] block: fix diskstats access Tejun Heo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

disk->part[] is protected by its matching bdev's lock.  However,
non-critical accesses like collecting stats and printing out sysfs and
proc information used to be performed without any locking.  As
partitions can come and go dynamically, partitions can go away
underneath those non-critical accesses.  As some of those accesses are
writes, this theoretically can lead to silent corruption.

This patch fixes the race by using RCU for the partition array and dev
reference counter to hold partitions.

* Rename disk->part[] to disk->__part[] to make sure no one outside
  genhd layer proper accesses it directly.

* Use RCU for disk->__part[] dereferencing.

* Implement disk_{get|put}_part() which can be used to get and put
  partitions from gendisk respectively.

* Iterators are implemented to help iterate through all partitions
  safely.

* Functions which require RCU readlock are marked with _rcu suffix.

* Use disk_put_part() in __blkdev_put() instead of directly putting
  the contained kobject.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-core.c           |   20 +++-
 block/blk-merge.c          |    9 ++-
 block/genhd.c              |  218 +++++++++++++++++++++++++++++++++++++------
 block/ioctl.c              |   26 +++--
 drivers/block/aoe/aoecmd.c |    6 +-
 fs/block_dev.c             |   15 ++--
 fs/partitions/check.c      |   70 +++++++++-----
 include/linux/genhd.h      |   53 ++++++++---
 8 files changed, 323 insertions(+), 94 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 430b206..a8cfa5e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -60,7 +60,9 @@ static void drive_stat_acct(struct request *rq, int new_io)
 	if (!blk_fs_request(rq) || !rq->rq_disk)
 		return;
 
-	part = disk_map_sector(rq->rq_disk, rq->sector);
+	rcu_read_lock();
+
+	part = disk_map_sector_rcu(rq->rq_disk, rq->sector);
 	if (!new_io)
 		__all_stat_inc(rq->rq_disk, part, merges[rw], rq->sector);
 	else {
@@ -71,6 +73,8 @@ static void drive_stat_acct(struct request *rq, int new_io)
 			part->in_flight++;
 		}
 	}
+
+	rcu_read_unlock();
 }
 
 void blk_queue_congestion_threshold(struct request_queue *q)
@@ -1555,12 +1559,14 @@ static int __end_that_request_first(struct request *req, int error,
 	}
 
 	if (blk_fs_request(req) && req->rq_disk) {
-		struct hd_struct *part =
-			disk_map_sector(req->rq_disk, req->sector);
 		const int rw = rq_data_dir(req);
+		struct hd_struct *part;
 
+		rcu_read_lock();
+		part = disk_map_sector_rcu(req->rq_disk, req->sector);
 		all_stat_add(req->rq_disk, part, sectors[rw],
 				nr_bytes >> 9, req->sector);
+		rcu_read_unlock();
 	}
 
 	total_bytes = bio_nbytes = 0;
@@ -1744,7 +1750,11 @@ static void end_that_request_last(struct request *req, int error)
 	if (disk && blk_fs_request(req) && req != &req->q->bar_rq) {
 		unsigned long duration = jiffies - req->start_time;
 		const int rw = rq_data_dir(req);
-		struct hd_struct *part = disk_map_sector(disk, req->sector);
+		struct hd_struct *part;
+
+		rcu_read_lock();
+
+		part = disk_map_sector_rcu(disk, req->sector);
 
 		__all_stat_inc(disk, part, ios[rw], req->sector);
 		__all_stat_add(disk, part, ticks[rw], duration, req->sector);
@@ -1754,6 +1764,8 @@ static void end_that_request_last(struct request *req, int error)
 			part_round_stats(part);
 			part->in_flight--;
 		}
+
+		rcu_read_unlock();
 	}
 
 	if (req->end_io)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9b17da6..eb2a3ca 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -387,14 +387,19 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 	elv_merge_requests(q, req, next);
 
 	if (req->rq_disk) {
-		struct hd_struct *part =
-			disk_map_sector(req->rq_disk, req->sector);
+		struct hd_struct *part;
+
+		rcu_read_lock();
+
+		part = disk_map_sector_rcu(req->rq_disk, req->sector);
 		disk_round_stats(req->rq_disk);
 		req->rq_disk->in_flight--;
 		if (part) {
 			part_round_stats(part);
 			part->in_flight--;
 		}
+
+		rcu_read_unlock();
 	}
 
 	req->ioprio = ioprio_best(req->ioprio, next->ioprio);
diff --git a/block/genhd.c b/block/genhd.c
index d26b4b7..5e234b1 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -26,6 +26,158 @@ struct kobject *block_depr;
 
 static struct device_type disk_type;
 
+/**
+ * disk_get_part - get partition
+ * @disk: disk to look partition from
+ * @partno: partition number
+ *
+ * Look for partition @partno from @disk.  If found, increment
+ * reference count and return it.
+ *
+ * CONTEXT:
+ * Don't care.
+ *
+ * RETURNS:
+ * Pointer to the found partition on success, NULL if not found.
+ */
+struct hd_struct *disk_get_part(struct gendisk *disk, int partno)
+{
+	struct hd_struct *part;
+
+	if (unlikely(partno < 1 || partno > disk_max_parts(disk)))
+		return NULL;
+	rcu_read_lock();
+	part = rcu_dereference(disk->__part[partno - 1]);
+	if (part)
+		get_device(&part->dev);
+	rcu_read_unlock();
+
+	return part;
+}
+EXPORT_SYMBOL_GPL(disk_get_part);
+
+/**
+ * disk_part_iter_init - initialize partition iterator
+ * @piter: iterator to initialize
+ * @disk: disk to iterate over
+ * @flags: DISK_PITER_* flags
+ *
+ * Initialize @piter so that it iterates over partitions of @disk.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+void disk_part_iter_init(struct disk_part_iter *piter, struct gendisk *disk,
+			  unsigned int flags)
+{
+	piter->disk = disk;
+	piter->part = NULL;
+
+	if (flags & DISK_PITER_REVERSE)
+		piter->idx = disk_max_parts(piter->disk) - 1;
+	else
+		piter->idx = 0;
+
+	piter->flags = flags;
+}
+EXPORT_SYMBOL_GPL(disk_part_iter_init);
+
+/**
+ * disk_part_iter_next - proceed iterator to the next partition and return it
+ * @piter: iterator of interest
+ *
+ * Proceed @piter to the next partition and return it.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter)
+{
+	int inc, end;
+
+	/* put the last partition */
+	disk_put_part(piter->part);
+	piter->part = NULL;
+
+	rcu_read_lock();
+
+	/* determine iteration parameters */
+	if (piter->flags & DISK_PITER_REVERSE) {
+		inc = -1;
+		end = -1;
+	} else {
+		inc = 1;
+		end = disk_max_parts(piter->disk);
+	}
+
+	/* iterate to the next partition */
+	for (; piter->idx != end; piter->idx += inc) {
+		struct hd_struct *part;
+
+		part = rcu_dereference(piter->disk->__part[piter->idx]);
+		if (!part)
+			continue;
+		if (!(piter->flags & DISK_PITER_INCL_EMPTY) && !part->nr_sects)
+			continue;
+
+		get_device(&part->dev);
+		piter->part = part;
+		piter->idx += inc;
+		break;
+	}
+
+	rcu_read_unlock();
+
+	return piter->part;
+}
+EXPORT_SYMBOL_GPL(disk_part_iter_next);
+
+/**
+ * disk_part_iter_exit - finish up partition iteration
+ * @piter: iter of interest
+ *
+ * Called when iteration is over.  Cleans up @piter.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+void disk_part_iter_exit(struct disk_part_iter *piter)
+{
+	disk_put_part(piter->part);
+	piter->part = NULL;
+}
+EXPORT_SYMBOL_GPL(disk_part_iter_exit);
+
+/**
+ * disk_map_sector_rcu - map sector to partition
+ * @disk: gendisk of interest
+ * @sector: sector to map
+ *
+ * Find out which partition @sector maps to on @disk.  This is
+ * primarily used for stats accounting.
+ *
+ * CONTEXT:
+ * RCU read locked.  The returned partition pointer is valid only
+ * while preemption is disabled.
+ *
+ * RETURNS:
+ * Found partition on success, NULL if there's no matching partition.
+ */
+struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
+{
+	int i;
+
+	for (i = 0; i < disk_max_parts(disk); i++) {
+		struct hd_struct *part = rcu_dereference(disk->__part[i]);
+
+		if (part && part->start_sect <= sector &&
+		    sector < part->start_sect + part->nr_sects)
+			return part;
+	}
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(disk_map_sector_rcu);
+
 /*
  * Can be deleted altogether. Later.
  *
@@ -247,10 +399,12 @@ extern struct block_device *bdget_disk(struct gendisk *disk, int partno)
 	if (partno == 0)
 		devt = disk_devt(disk);
 	else {
-		struct hd_struct *part = disk->part[partno - 1];
+		struct hd_struct *part;
 
+		part = disk_get_part(disk, partno);
 		if (part && part->nr_sects)
 			devt = part_devt(part);
+		disk_put_part(part);
 	}
 
 	if (likely(devt != MKDEV(0, 0)))
@@ -272,8 +426,9 @@ void __init printk_all_partitions(void)
 	class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
 	while ((dev = class_dev_iter_next(&iter))) {
 		struct gendisk *disk = dev_to_disk(dev);
+		struct disk_part_iter piter;
+		struct hd_struct *part;
 		char buf[BDEVNAME_SIZE];
-		int n;
 
 		/*
 		 * Don't show empty devices or things that have been
@@ -300,16 +455,13 @@ void __init printk_all_partitions(void)
 			printk(" (driver?)\n");
 
 		/* now show the partitions */
-		for (n = 0; n < disk_max_parts(disk); ++n) {
-			struct hd_struct *part = disk->part[n];
-
-			if (!part || !part->nr_sects)
-				continue;
+		disk_part_iter_init(&piter, disk, 0);
+		while ((part = disk_part_iter_next(&piter)))
 			printk("  %02x%02x %10llu %s\n",
 			       MAJOR(part_devt(part)), MINOR(part_devt(part)),
 			       (unsigned long long)part->nr_sects >> 1,
 			       disk_name(disk, part->partno, buf));
-		}
+		disk_part_iter_exit(&piter);
 	}
 	class_dev_iter_exit(&iter);
 }
@@ -372,7 +524,8 @@ static void *show_partition_start(struct seq_file *seqf, loff_t *pos)
 static int show_partition(struct seq_file *seqf, void *v)
 {
 	struct gendisk *sgp = v;
-	int n;
+	struct disk_part_iter piter;
+	struct hd_struct *part;
 	char buf[BDEVNAME_SIZE];
 
 	/* Don't show non-partitionable removeable devices or empty devices */
@@ -387,17 +540,14 @@ static int show_partition(struct seq_file *seqf, void *v)
 		MAJOR(disk_devt(sgp)), MINOR(disk_devt(sgp)),
 		(unsigned long long)get_capacity(sgp) >> 1,
 		disk_name(sgp, 0, buf));
-	for (n = 0; n < disk_max_parts(sgp); n++) {
-		struct hd_struct *part = sgp->part[n];
-		if (!part)
-			continue;
-		if (part->nr_sects == 0)
-			continue;
+
+	disk_part_iter_init(&piter, sgp, 0);
+	while ((part = disk_part_iter_next(&piter)))
 		seq_printf(seqf, "%4d  %4d %10llu %s\n",
 			   MAJOR(part_devt(part)), MINOR(part_devt(part)),
 			   (unsigned long long)part->nr_sects >> 1,
 			   disk_name(sgp, part->partno, buf));
-	}
+	disk_part_iter_exit(&piter);
 
 	return 0;
 }
@@ -572,7 +722,7 @@ static void disk_release(struct device *dev)
 	struct gendisk *disk = dev_to_disk(dev);
 
 	kfree(disk->random);
-	kfree(disk->part);
+	kfree(disk->__part);
 	free_disk_stats(disk);
 	kfree(disk);
 }
@@ -597,8 +747,9 @@ static struct device_type disk_type = {
 static int diskstats_show(struct seq_file *seqf, void *v)
 {
 	struct gendisk *gp = v;
+	struct disk_part_iter piter;
+	struct hd_struct *hd;
 	char buf[BDEVNAME_SIZE];
-	int n;
 
 	/*
 	if (&gp->dev.kobj.entry == block_class.devices.next)
@@ -625,12 +776,8 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		jiffies_to_msecs(disk_stat_read(gp, time_in_queue)));
 
 	/* now show all non-0 size partitions of it */
-	for (n = 0; n < disk_max_parts(gp); n++) {
-		struct hd_struct *hd = gp->part[n];
-
-		if (!hd || !hd->nr_sects)
-			continue;
-
+	disk_part_iter_init(&piter, gp, 0);
+	while ((hd = disk_part_iter_next(&piter))) {
 		preempt_disable();
 		part_round_stats(hd);
 		preempt_enable();
@@ -651,6 +798,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 			   jiffies_to_msecs(part_stat_read(hd, time_in_queue))
 			);
 	}
+	disk_part_iter_exit(&piter);
  
 	return 0;
 }
@@ -704,12 +852,16 @@ dev_t blk_lookup_devt(const char *name, int partno)
 		if (partno == 0)
 			devt = disk_devt(disk);
 		else {
-			struct hd_struct *part = disk->part[partno - 1];
+			struct hd_struct *part;
 
-			if (!part || !part->nr_sects)
+			part = disk_get_part(disk, partno);
+			if (!part || !part->nr_sects) {
+				disk_put_part(part);
 				continue;
+			}
 
 			devt = part_devt(part);
+			disk_put_part(part);
 		}
 		break;
 	}
@@ -736,9 +888,9 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
 		}
 		if (minors > 1) {
 			int size = (minors - 1) * sizeof(struct hd_struct *);
-			disk->part = kmalloc_node(size,
+			disk->__part = kmalloc_node(size,
 				GFP_KERNEL | __GFP_ZERO, node_id);
-			if (!disk->part) {
+			if (!disk->__part) {
 				free_disk_stats(disk);
 				kfree(disk);
 				return NULL;
@@ -799,10 +951,14 @@ EXPORT_SYMBOL(set_device_ro);
 
 void set_disk_ro(struct gendisk *disk, int flag)
 {
-	int i;
+	struct disk_part_iter piter;
+	struct hd_struct *part;
+
 	disk->policy = flag;
-	for (i = 0; i < disk_max_parts(disk); i++)
-		if (disk->part[i]) disk->part[i]->policy = flag;
+	disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY);
+	while ((part = disk_part_iter_next(&piter)))
+		part->policy = flag;
+	disk_part_iter_exit(&piter);
 }
 
 EXPORT_SYMBOL(set_disk_ro);
diff --git a/block/ioctl.c b/block/ioctl.c
index 403f7d7..a5f672a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -12,11 +12,12 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 {
 	struct block_device *bdevp;
 	struct gendisk *disk;
+	struct hd_struct *part;
 	struct blkpg_ioctl_arg a;
 	struct blkpg_partition p;
+	struct disk_part_iter piter;
 	long long start, length;
 	int partno;
-	int i;
 	int err;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -47,28 +48,33 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			mutex_lock(&bdev->bd_mutex);
 
 			/* overlap? */
-			for (i = 0; i < disk_max_parts(disk); i++) {
-				struct hd_struct *s = disk->part[i];
-
-				if (!s)
-					continue;
-				if (!(start+length <= s->start_sect ||
-				      start >= s->start_sect + s->nr_sects)) {
+			disk_part_iter_init(&piter, disk,
+					    DISK_PITER_INCL_EMPTY);
+			while ((part = disk_part_iter_next(&piter))) {
+				if (!(start + length <= part->start_sect ||
+				      start >= part->start_sect + part->nr_sects)) {
+					disk_part_iter_exit(&piter);
 					mutex_unlock(&bdev->bd_mutex);
 					return -EBUSY;
 				}
 			}
+			disk_part_iter_exit(&piter);
+
 			/* all seems OK */
 			err = add_partition(disk, partno, start, length,
 					    ADDPART_FLAG_NONE);
 			mutex_unlock(&bdev->bd_mutex);
 			return err;
 		case BLKPG_DEL_PARTITION:
-			if (!disk->part[partno - 1])
+			part = disk_get_part(disk, partno);
+			if (!part)
 				return -ENXIO;
-			bdevp = bdget_disk(disk, partno);
+
+			bdevp = bdget(part_devt(part));
+			disk_put_part(part);
 			if (!bdevp)
 				return -ENOMEM;
+
 			mutex_lock(&bdevp->bd_mutex);
 			if (bdevp->bd_openers) {
 				mutex_unlock(&bdevp->bd_mutex);
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 885d140..84c03d6 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -757,11 +757,15 @@ diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector
 	const int rw = bio_data_dir(bio);
 	struct hd_struct *part;
 
-	part = disk_map_sector(disk, sector);
+	rcu_read_lock();
+
+	part = disk_map_sector_rcu(disk, sector);
 	all_stat_inc(disk, part, ios[rw], sector);
 	all_stat_add(disk, part, ticks[rw], duration, sector);
 	all_stat_add(disk, part, sectors[rw], n_sect, sector);
 	all_stat_add(disk, part, io_ticks, duration, sector);
+
+	rcu_read_unlock();
 }
 
 void
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 72e0a28..2f2873b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -929,6 +929,7 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 {
 	struct module *owner = NULL;
 	struct gendisk *disk;
+	struct hd_struct *part = NULL;
 	int ret;
 	int partno;
 	int perm = 0;
@@ -978,7 +979,6 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 			if (bdev->bd_invalidated)
 				rescan_partitions(disk, bdev);
 		} else {
-			struct hd_struct *p;
 			struct block_device *whole;
 			whole = bdget_disk(disk, 0);
 			ret = -ENOMEM;
@@ -989,16 +989,16 @@ static int do_open(struct block_device *bdev, struct file *file, int for_part)
 			if (ret)
 				goto out_first;
 			bdev->bd_contains = whole;
-			p = disk->part[partno - 1];
+			part = disk_get_part(disk, partno);
 			bdev->bd_inode->i_data.backing_dev_info =
 			   whole->bd_inode->i_data.backing_dev_info;
-			if (!(disk->flags & GENHD_FL_UP) || !p || !p->nr_sects) {
+			if (!(disk->flags & GENHD_FL_UP) ||
+			    !part || !part->nr_sects) {
 				ret = -ENXIO;
 				goto out_first;
 			}
-			kobject_get(&p->dev.kobj);
-			bdev->bd_part = p;
-			bd_set_size(bdev, (loff_t) p->nr_sects << 9);
+			bdev->bd_part = part;
+			bd_set_size(bdev, (loff_t)part->nr_sects << 9);
 		}
 	} else {
 		put_disk(disk);
@@ -1027,6 +1027,7 @@ out_first:
 		__blkdev_put(bdev->bd_contains, 1);
 	bdev->bd_contains = NULL;
 	put_disk(disk);
+	disk_put_part(part);
 	module_put(owner);
 out:
 	mutex_unlock(&bdev->bd_mutex);
@@ -1119,7 +1120,7 @@ static int __blkdev_put(struct block_device *bdev, int for_part)
 		module_put(owner);
 
 		if (bdev->bd_contains != bdev) {
-			kobject_put(&bdev->bd_part->dev.kobj);
+			disk_put_part(bdev->bd_part);
 			bdev->bd_part = NULL;
 		}
 		bdev->bd_disk = NULL;
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 0f33955..aa2f5f7 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -314,19 +314,29 @@ static inline void disk_sysfs_add_subdirs(struct gendisk *disk)
 	kobject_put(k);
 }
 
+static void delete_partition_rcu_cb(struct rcu_head *head)
+{
+	struct hd_struct *part = container_of(head, struct hd_struct, rcu_head);
+
+	part->start_sect = 0;
+	part->nr_sects = 0;
+	part_stat_set_all(part, 0);
+	put_device(&part->dev);
+}
+
 void delete_partition(struct gendisk *disk, int partno)
 {
-	struct hd_struct *p = disk->part[partno - 1];
+	struct hd_struct *part;
 
-	if (!p)
+	part = disk->__part[partno-1];
+	if (!part)
 		return;
-	disk->part[partno - 1] = NULL;
-	p->start_sect = 0;
-	p->nr_sects = 0;
-	part_stat_set_all(p, 0);
-	kobject_put(p->holder_dir);
-	device_del(&p->dev);
-	put_device(&p->dev);
+
+	rcu_assign_pointer(disk->__part[partno-1], NULL);
+	kobject_put(part->holder_dir);
+	device_del(&part->dev);
+
+	call_rcu(&part->rcu_head, delete_partition_rcu_cb);
 }
 
 static ssize_t whole_disk_show(struct device *dev,
@@ -343,7 +353,7 @@ int add_partition(struct gendisk *disk, int partno,
 	struct hd_struct *p;
 	int err;
 
-	if (disk->part[partno - 1])
+	if (disk->__part[partno - 1])
 		return -EBUSY;
 
 	p = kzalloc(sizeof(*p), GFP_KERNEL);
@@ -391,7 +401,8 @@ int add_partition(struct gendisk *disk, int partno,
 	}
 
 	/* everything is up and running, commence */
-	disk->part[partno - 1] = p;
+	INIT_RCU_HEAD(&p->rcu_head);
+	rcu_assign_pointer(disk->__part[partno - 1], p);
 
 	/* suppress uevent if the disk supresses it */
 	if (!disk->dev.uevent_suppress)
@@ -414,9 +425,9 @@ out_put:
 void register_disk(struct gendisk *disk)
 {
 	struct block_device *bdev;
+	struct disk_part_iter piter;
+	struct hd_struct *part;
 	char *s;
-	int i;
-	struct hd_struct *p;
 	int err;
 
 	disk->dev.parent = disk->driverfs_dev;
@@ -466,16 +477,16 @@ exit:
 	kobject_uevent(&disk->dev.kobj, KOBJ_ADD);
 
 	/* announce possible partitions */
-	for (i = 0; i < disk_max_parts(disk); i++) {
-		p = disk->part[i];
-		if (!p || !p->nr_sects)
-			continue;
-		kobject_uevent(&p->dev.kobj, KOBJ_ADD);
-	}
+	disk_part_iter_init(&piter, disk, 0);
+	while ((part = disk_part_iter_next(&piter)))
+		kobject_uevent(&part->dev.kobj, KOBJ_ADD);
+	disk_part_iter_exit(&piter);
 }
 
 int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 {
+	struct disk_part_iter piter;
+	struct hd_struct *part;
 	struct parsed_partitions *state;
 	int p, res;
 
@@ -485,8 +496,12 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
 	if (res)
 		return res;
 	bdev->bd_invalidated = 0;
-	for (p = 1; p <= disk_max_parts(disk); p++)
-		delete_partition(disk, p);
+
+	disk_part_iter_init(&piter, disk, DISK_PITER_INCL_EMPTY);
+	while ((part = disk_part_iter_next(&piter)))
+		delete_partition(disk, part->partno);
+	disk_part_iter_exit(&piter);
+
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
 	if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
@@ -545,13 +560,18 @@ EXPORT_SYMBOL(read_dev_sector);
 
 void del_gendisk(struct gendisk *disk)
 {
-	int p;
+	struct disk_part_iter piter;
+	struct hd_struct *part;
 
 	/* invalidate stuff */
-	for (p = disk_max_parts(disk); p > 0; p--) {
-		invalidate_partition(disk, p);
-		delete_partition(disk, p);
+	disk_part_iter_init(&piter, disk,
+			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
+	while ((part = disk_part_iter_next(&piter))) {
+		invalidate_partition(disk, part->partno);
+		delete_partition(disk, part->partno);
 	}
+	disk_part_iter_exit(&piter);
+
 	invalidate_partition(disk, 0);
 	disk->capacity = 0;
 	disk->flags &= ~GENHD_FL_UP;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6cb34ba..aeaf59c 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -11,6 +11,7 @@
 
 #include <linux/types.h>
 #include <linux/kdev_t.h>
+#include <linux/rcupdate.h>
 
 #ifdef CONFIG_BLOCK
 
@@ -100,6 +101,7 @@ struct hd_struct {
 #else
 	struct disk_stats dkstats;
 #endif
+	struct rcu_head rcu_head;
 };
 
 #define GENHD_FL_REMOVABLE			1
@@ -129,7 +131,14 @@ struct gendisk {
                                          * disks that can't be partitioned. */
 
 	char disk_name[32];		/* name of major driver */
-	struct hd_struct **part;	/* [indexed by minor - 1] */
+
+	/* Array of pointers to partitions indexed by partno - 1.
+	 * Protected with matching bdev lock but stat and other
+	 * non-critical accesses use RCU.  Always access through
+	 * helpers.
+	 */
+	struct hd_struct **__part;
+
 	struct block_device_operations *fops;
 	struct request_queue *queue;
 	struct blk_scsi_cmd_filter cmd_filter;
@@ -181,25 +190,41 @@ static inline dev_t part_devt(struct hd_struct *part)
 	return part->dev.devt;
 }
 
+extern struct hd_struct *disk_get_part(struct gendisk *disk, int partno);
+
+static inline void disk_put_part(struct hd_struct *part)
+{
+	if (likely(part))
+		put_device(&part->dev);
+}
+
+/*
+ * Smarter partition iterator without context limits.
+ */
+#define DISK_PITER_REVERSE	(1 << 0) /* iterate in the reverse direction */
+#define DISK_PITER_INCL_EMPTY	(1 << 1) /* include 0-sized parts */
+
+struct disk_part_iter {
+	struct gendisk		*disk;
+	struct hd_struct	*part;
+	int			idx;
+	unsigned int		flags;
+};
+
+extern void disk_part_iter_init(struct disk_part_iter *piter,
+				 struct gendisk *disk, unsigned int flags);
+extern struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter);
+extern void disk_part_iter_exit(struct disk_part_iter *piter);
+
+extern struct hd_struct *disk_map_sector_rcu(struct gendisk *disk,
+					     sector_t sector);
+
 /* 
  * Macros to operate on percpu disk statistics:
  *
  * The __ variants should only be called in critical sections. The full
  * variants disable/enable preemption.
  */
-static inline struct hd_struct *disk_map_sector(struct gendisk *gendiskp,
-						sector_t sector)
-{
-	struct hd_struct *part;
-	int i;
-	for (i = 0; i < disk_max_parts(gendiskp); i++) {
-		part = gendiskp->part[i];
-		if (part && part->start_sect <= sector
-		    && sector < part->start_sect + part->nr_sects)
-			return part;
-	}
-	return NULL;
-}
 
 #ifdef	CONFIG_SMP
 #define __disk_stat_add(gendiskp, field, addnd) 	\
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 05/09] block: fix diskstats access
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (3 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 04/09] block: fix disk->part[] dereferencing race Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:58   ` Peter Zijlstra
  2008-08-25 10:47 ` [PATCH 06/09] block: implement extended dev numbers Tejun Heo
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo,
	Peter Zijlstra

There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters.  It's unclear
whether the underbarred ones assume that preemtion is disabled on
entry as some callers don't do that.

This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access).  diskstats access
should always be enclosed between the two functions.  As such, there's
no need for the versions which disables preemption.  They're removed
and double underbars ones are renamed to drop the underbars.  As an
extra argument is added, there's no danger of using the old version
unconverted.

disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now has @cpu
argument to help RT.

This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others.  Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 block/blk-core.c           |   52 +++++++++--------
 block/blk-merge.c          |   11 ++--
 block/genhd.c              |   20 ++++---
 drivers/block/aoe/aoecmd.c |   15 +++--
 drivers/md/dm.c            |   26 +++++----
 drivers/md/linear.c        |    7 ++-
 drivers/md/multipath.c     |    7 ++-
 drivers/md/raid0.c         |    7 ++-
 drivers/md/raid1.c         |    8 ++-
 drivers/md/raid10.c        |    7 ++-
 drivers/md/raid5.c         |    8 ++-
 fs/partitions/check.c      |    7 +-
 include/linux/genhd.h      |  139 ++++++++++++++++++--------------------------
 13 files changed, 158 insertions(+), 156 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a8cfa5e..3de5610 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -56,25 +56,26 @@ static void drive_stat_acct(struct request *rq, int new_io)
 {
 	struct hd_struct *part;
 	int rw = rq_data_dir(rq);
+	int cpu;
 
 	if (!blk_fs_request(rq) || !rq->rq_disk)
 		return;
 
-	rcu_read_lock();
-
+	cpu = disk_stat_lock();
 	part = disk_map_sector_rcu(rq->rq_disk, rq->sector);
+
 	if (!new_io)
-		__all_stat_inc(rq->rq_disk, part, merges[rw], rq->sector);
+		all_stat_inc(cpu, rq->rq_disk, part, merges[rw], rq->sector);
 	else {
-		disk_round_stats(rq->rq_disk);
+		disk_round_stats(cpu, rq->rq_disk);
 		rq->rq_disk->in_flight++;
 		if (part) {
-			part_round_stats(part);
+			part_round_stats(cpu, part);
 			part->in_flight++;
 		}
 	}
 
-	rcu_read_unlock();
+	disk_stat_unlock();
 }
 
 void blk_queue_congestion_threshold(struct request_queue *q)
@@ -995,7 +996,7 @@ static inline void add_request(struct request_queue *q, struct request *req)
  * /proc/diskstats.  This accounts immediately for all queue usage up to
  * the current jiffies and restarts the counters again.
  */
-void disk_round_stats(struct gendisk *disk)
+void disk_round_stats(int cpu, struct gendisk *disk)
 {
 	unsigned long now = jiffies;
 
@@ -1003,15 +1004,15 @@ void disk_round_stats(struct gendisk *disk)
 		return;
 
 	if (disk->in_flight) {
-		__disk_stat_add(disk, time_in_queue,
-				disk->in_flight * (now - disk->stamp));
-		__disk_stat_add(disk, io_ticks, (now - disk->stamp));
+		disk_stat_add(cpu, disk, time_in_queue,
+			      disk->in_flight * (now - disk->stamp));
+		disk_stat_add(cpu, disk, io_ticks, (now - disk->stamp));
 	}
 	disk->stamp = now;
 }
 EXPORT_SYMBOL_GPL(disk_round_stats);
 
-void part_round_stats(struct hd_struct *part)
+void part_round_stats(int cpu, struct hd_struct *part)
 {
 	unsigned long now = jiffies;
 
@@ -1019,9 +1020,9 @@ void part_round_stats(struct hd_struct *part)
 		return;
 
 	if (part->in_flight) {
-		__part_stat_add(part, time_in_queue,
-				part->in_flight * (now - part->stamp));
-		__part_stat_add(part, io_ticks, (now - part->stamp));
+		part_stat_add(cpu, part, time_in_queue,
+			      part->in_flight * (now - part->stamp));
+		part_stat_add(cpu, part, io_ticks, (now - part->stamp));
 	}
 	part->stamp = now;
 }
@@ -1561,12 +1562,13 @@ static int __end_that_request_first(struct request *req, int error,
 	if (blk_fs_request(req) && req->rq_disk) {
 		const int rw = rq_data_dir(req);
 		struct hd_struct *part;
+		int cpu;
 
-		rcu_read_lock();
+		cpu = disk_stat_lock();
 		part = disk_map_sector_rcu(req->rq_disk, req->sector);
-		all_stat_add(req->rq_disk, part, sectors[rw],
-				nr_bytes >> 9, req->sector);
-		rcu_read_unlock();
+		all_stat_add(cpu, req->rq_disk, part, sectors[rw],
+			     nr_bytes >> 9, req->sector);
+		disk_stat_unlock();
 	}
 
 	total_bytes = bio_nbytes = 0;
@@ -1751,21 +1753,21 @@ static void end_that_request_last(struct request *req, int error)
 		unsigned long duration = jiffies - req->start_time;
 		const int rw = rq_data_dir(req);
 		struct hd_struct *part;
+		int cpu;
 
-		rcu_read_lock();
-
+		cpu = disk_stat_lock();
 		part = disk_map_sector_rcu(disk, req->sector);
 
-		__all_stat_inc(disk, part, ios[rw], req->sector);
-		__all_stat_add(disk, part, ticks[rw], duration, req->sector);
-		disk_round_stats(disk);
+		all_stat_inc(cpu, disk, part, ios[rw], req->sector);
+		all_stat_add(cpu, disk, part, ticks[rw], duration, req->sector);
+		disk_round_stats(cpu, disk);
 		disk->in_flight--;
 		if (part) {
-			part_round_stats(part);
+			part_round_stats(cpu, part);
 			part->in_flight--;
 		}
 
-		rcu_read_unlock();
+		disk_stat_unlock();
 	}
 
 	if (req->end_io)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index eb2a3ca..d926a24 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -388,18 +388,19 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 
 	if (req->rq_disk) {
 		struct hd_struct *part;
+		int cpu;
 
-		rcu_read_lock();
-
+		cpu = disk_stat_lock();
 		part = disk_map_sector_rcu(req->rq_disk, req->sector);
-		disk_round_stats(req->rq_disk);
+
+		disk_round_stats(cpu, req->rq_disk);
 		req->rq_disk->in_flight--;
 		if (part) {
-			part_round_stats(part);
+			part_round_stats(cpu, part);
 			part->in_flight--;
 		}
 
-		rcu_read_unlock();
+		disk_stat_unlock();
 	}
 
 	req->ioprio = ioprio_best(req->ioprio, next->ioprio);
diff --git a/block/genhd.c b/block/genhd.c
index 5e234b1..7dbf2cc 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -634,10 +634,11 @@ static ssize_t disk_stat_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
 	struct gendisk *disk = dev_to_disk(dev);
+	int cpu;
 
-	preempt_disable();
-	disk_round_stats(disk);
-	preempt_enable();
+	cpu = disk_stat_lock();
+	disk_round_stats(cpu, disk);
+	disk_stat_unlock();
 	return sprintf(buf,
 		"%8lu %8lu %8llu %8u "
 		"%8lu %8lu %8llu %8u "
@@ -750,6 +751,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	struct disk_part_iter piter;
 	struct hd_struct *hd;
 	char buf[BDEVNAME_SIZE];
+	int cpu;
 
 	/*
 	if (&gp->dev.kobj.entry == block_class.devices.next)
@@ -759,9 +761,9 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 				"\n\n");
 	*/
  
-	preempt_disable();
-	disk_round_stats(gp);
-	preempt_enable();
+	cpu = disk_stat_lock();
+	disk_round_stats(cpu, gp);
+	disk_stat_unlock();
 	seq_printf(seqf, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
 		MAJOR(disk_devt(gp)), MINOR(disk_devt(gp)),
 		disk_name(gp, 0, buf),
@@ -778,9 +780,9 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	/* now show all non-0 size partitions of it */
 	disk_part_iter_init(&piter, gp, 0);
 	while ((hd = disk_part_iter_next(&piter))) {
-		preempt_disable();
-		part_round_stats(hd);
-		preempt_enable();
+		cpu = disk_stat_lock();
+		part_round_stats(cpu, hd);
+		disk_stat_unlock();
 		seq_printf(seqf, "%4d %4d %s %lu %lu %llu "
 			   "%u %lu %lu %llu %u %u %u %u\n",
 			   MAJOR(part_devt(hd)), MINOR(part_devt(hd)),
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 84c03d6..17eed8c 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -756,16 +756,17 @@ diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector
 	unsigned long n_sect = bio->bi_size >> 9;
 	const int rw = bio_data_dir(bio);
 	struct hd_struct *part;
+	int cpu;
 
-	rcu_read_lock();
-
+	cpu = disk_stat_lock();
 	part = disk_map_sector_rcu(disk, sector);
-	all_stat_inc(disk, part, ios[rw], sector);
-	all_stat_add(disk, part, ticks[rw], duration, sector);
-	all_stat_add(disk, part, sectors[rw], n_sect, sector);
-	all_stat_add(disk, part, io_ticks, duration, sector);
 
-	rcu_read_unlock();
+	all_stat_inc(cpu, disk, part, ios[rw], sector);
+	all_stat_add(cpu, disk, part, ticks[rw], duration, sector);
+	all_stat_add(cpu, disk, part, sectors[rw], n_sect, sector);
+	all_stat_add(cpu, disk, part, io_ticks, duration, sector);
+
+	disk_stat_unlock();
 }
 
 void
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b4ddb96..d087435 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -377,12 +377,13 @@ static void free_tio(struct mapped_device *md, struct dm_target_io *tio)
 static void start_io_acct(struct dm_io *io)
 {
 	struct mapped_device *md = io->md;
+	int cpu;
 
 	io->start_time = jiffies;
 
-	preempt_disable();
-	disk_round_stats(dm_disk(md));
-	preempt_enable();
+	cpu = disk_stat_lock();
+	disk_round_stats(cpu, dm_disk(md));
+	disk_stat_unlock();
 	dm_disk(md)->in_flight = atomic_inc_return(&md->pending);
 }
 
@@ -391,15 +392,15 @@ static int end_io_acct(struct dm_io *io)
 	struct mapped_device *md = io->md;
 	struct bio *bio = io->bio;
 	unsigned long duration = jiffies - io->start_time;
-	int pending;
+	int pending, cpu;
 	int rw = bio_data_dir(bio);
 
-	preempt_disable();
-	disk_round_stats(dm_disk(md));
-	preempt_enable();
-	dm_disk(md)->in_flight = pending = atomic_dec_return(&md->pending);
+	cpu = disk_stat_lock();
+	disk_round_stats(cpu, dm_disk(md));
+	disk_stat_add(cpu, dm_disk(md), ticks[rw], duration);
+	disk_stat_unlock();
 
-	disk_stat_add(dm_disk(md), ticks[rw], duration);
+	dm_disk(md)->in_flight = pending = atomic_dec_return(&md->pending);
 
 	return !pending;
 }
@@ -881,6 +882,7 @@ static int dm_request(struct request_queue *q, struct bio *bio)
 	int r = -EIO;
 	int rw = bio_data_dir(bio);
 	struct mapped_device *md = q->queuedata;
+	int cpu;
 
 	/*
 	 * There is no use in forwarding any barrier request since we can't
@@ -893,8 +895,10 @@ static int dm_request(struct request_queue *q, struct bio *bio)
 
 	down_read(&md->io_lock);
 
-	disk_stat_inc(dm_disk(md), ios[rw]);
-	disk_stat_add(dm_disk(md), sectors[rw], bio_sectors(bio));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, dm_disk(md), ios[rw]);
+	disk_stat_add(cpu, dm_disk(md), sectors[rw], bio_sectors(bio));
+	disk_stat_unlock();
 
 	/*
 	 * If we're suspended we have to queue
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index b1eebf8..00cbc8e 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -318,14 +318,17 @@ static int linear_make_request (struct request_queue *q, struct bio *bio)
 	mddev_t *mddev = q->queuedata;
 	dev_info_t *tmp_dev;
 	sector_t block;
+	int cpu;
 
 	if (unlikely(bio_barrier(bio))) {
 		bio_endio(bio, -EOPNOTSUPP);
 		return 0;
 	}
 
-	disk_stat_inc(mddev->gendisk, ios[rw]);
-	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
+	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
+	disk_stat_unlock();
 
 	tmp_dev = which_dev(mddev, bio->bi_sector);
 	block = bio->bi_sector >> 1;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index c4779cc..182f5a9 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -147,6 +147,7 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
 	struct multipath_bh * mp_bh;
 	struct multipath_info *multipath;
 	const int rw = bio_data_dir(bio);
+	int cpu;
 
 	if (unlikely(bio_barrier(bio))) {
 		bio_endio(bio, -EOPNOTSUPP);
@@ -158,8 +159,10 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
 	mp_bh->master_bio = bio;
 	mp_bh->mddev = mddev;
 
-	disk_stat_inc(mddev->gendisk, ios[rw]);
-	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
+	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
+	disk_stat_unlock();
 
 	mp_bh->path = multipath_map(conf);
 	if (mp_bh->path < 0) {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 1836106..e26030f 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -399,14 +399,17 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
 	sector_t chunk;
 	sector_t block, rsect;
 	const int rw = bio_data_dir(bio);
+	int cpu;
 
 	if (unlikely(bio_barrier(bio))) {
 		bio_endio(bio, -EOPNOTSUPP);
 		return 0;
 	}
 
-	disk_stat_inc(mddev->gendisk, ios[rw]);
-	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
+	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
+	disk_stat_unlock();
 
 	chunk_size = mddev->chunk_size >> 10;
 	chunk_sects = mddev->chunk_size >> 9;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b82030..babb130 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -779,7 +779,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 	struct page **behind_pages = NULL;
 	const int rw = bio_data_dir(bio);
 	const int do_sync = bio_sync(bio);
-	int do_barriers;
+	int cpu, do_barriers;
 	mdk_rdev_t *blocked_rdev;
 
 	/*
@@ -804,8 +804,10 @@ static int make_request(struct request_queue *q, struct bio * bio)
 
 	bitmap = mddev->bitmap;
 
-	disk_stat_inc(mddev->gendisk, ios[rw]);
-	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
+	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
+	disk_stat_unlock();
 
 	/*
 	 * make_request() can abort the operation when READA is being
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d3b9aa0..5ec80da 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -789,6 +789,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 	mirror_info_t *mirror;
 	r10bio_t *r10_bio;
 	struct bio *read_bio;
+	int cpu;
 	int i;
 	int chunk_sects = conf->chunk_mask + 1;
 	const int rw = bio_data_dir(bio);
@@ -843,8 +844,10 @@ static int make_request(struct request_queue *q, struct bio * bio)
 	 */
 	wait_barrier(conf);
 
-	disk_stat_inc(mddev->gendisk, ios[rw]);
-	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
+	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
+	disk_stat_unlock();
 
 	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 37e5465..5899f21 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3387,7 +3387,7 @@ static int make_request(struct request_queue *q, struct bio * bi)
 	sector_t logical_sector, last_sector;
 	struct stripe_head *sh;
 	const int rw = bio_data_dir(bi);
-	int remaining;
+	int cpu, remaining;
 
 	if (unlikely(bio_barrier(bi))) {
 		bio_endio(bi, -EOPNOTSUPP);
@@ -3396,8 +3396,10 @@ static int make_request(struct request_queue *q, struct bio * bi)
 
 	md_write_start(mddev, bi);
 
-	disk_stat_inc(mddev->gendisk, ios[rw]);
-	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bi));
+	cpu = disk_stat_lock();
+	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
+	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bi));
+	disk_stat_unlock();
 
 	if (rw == READ &&
 	     mddev->reshape_position == MaxSector &&
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index aa2f5f7..36e0641 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -219,10 +219,11 @@ static ssize_t part_stat_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
 	struct hd_struct *p = dev_to_part(dev);
+	int cpu;
 
-	preempt_disable();
-	part_round_stats(p);
-	preempt_enable();
+	cpu = disk_stat_lock();
+	part_round_stats(cpu, p);
+	disk_stat_unlock();
 	return sprintf(buf,
 		"%8lu %8lu %8llu %8u "
 		"%8lu %8lu %8llu %8u "
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index aeaf59c..09420c3 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -219,16 +219,24 @@ extern void disk_part_iter_exit(struct disk_part_iter *piter);
 extern struct hd_struct *disk_map_sector_rcu(struct gendisk *disk,
 					     sector_t sector);
 
-/* 
+/*
  * Macros to operate on percpu disk statistics:
  *
- * The __ variants should only be called in critical sections. The full
- * variants disable/enable preemption.
+ * {disk|part|all}_stat_{add|sub|inc|dec}() modify the stat counters
+ * and should be called between disk_stat_lock() and
+ * disk_stat_unlock().
+ *
+ * part_stat_read() can be called at any time.
+ *
+ * part_stat_{add|set_all}() and {init|free}_part_stats are for
+ * internal use only.
  */
-
 #ifdef	CONFIG_SMP
-#define __disk_stat_add(gendiskp, field, addnd) 	\
-	(per_cpu_ptr(gendiskp->dkstats, smp_processor_id())->field += addnd)
+#define disk_stat_lock()	({ rcu_read_lock(); get_cpu(); })
+#define disk_stat_unlock()	do { put_cpu(); rcu_read_unlock(); } while (0)
+
+#define disk_stat_add(cpu, gendiskp, field, addnd)			\
+	(per_cpu_ptr(gendiskp->dkstats, cpu)->field += addnd)
 
 #define disk_stat_read(gendiskp, field)					\
 ({									\
@@ -239,7 +247,8 @@ extern struct hd_struct *disk_map_sector_rcu(struct gendisk *disk,
 	res;								\
 })
 
-static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)	{
+static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)
+{
 	int i;
 
 	for_each_possible_cpu(i)
@@ -247,14 +256,14 @@ static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)	{
 				sizeof(struct disk_stats));
 }		
 
-#define __part_stat_add(part, field, addnd)				\
-	(per_cpu_ptr(part->dkstats, smp_processor_id())->field += addnd)
+#define part_stat_add(cpu, part, field, addnd)				\
+	(per_cpu_ptr(part->dkstats, cpu)->field += addnd)
 
-#define __all_stat_add(gendiskp, part, field, addnd, sector)	\
-({								\
-	if (part)						\
-		__part_stat_add(part, field, addnd);		\
-	__disk_stat_add(gendiskp, field, addnd);		\
+#define all_stat_add(cpu, gendiskp, part, field, addnd, sector)		\
+({									\
+	if (part)							\
+		part_stat_add(cpu, part, field, addnd);			\
+	disk_stat_add(cpu, gendiskp, field, addnd);			\
 })
 
 #define part_stat_read(part, field)					\
@@ -274,10 +283,13 @@ static inline void part_stat_set_all(struct hd_struct *part, int value)
 		memset(per_cpu_ptr(part->dkstats, i), value,
 				sizeof(struct disk_stats));
 }
-				
+
 #else /* !CONFIG_SMP */
-#define __disk_stat_add(gendiskp, field, addnd) \
-				(gendiskp->dkstats.field += addnd)
+#define disk_stat_lock()	({ rcu_read_lock(); 0; })
+#define disk_stat_unlock()	rcu_read_unlock()
+
+#define disk_stat_add(cpu, gendiskp, field, addnd)			\
+	(gendiskp->dkstats.field += addnd)
 #define disk_stat_read(gendiskp, field)	(gendiskp->dkstats.field)
 
 static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)
@@ -285,14 +297,14 @@ static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)
 	memset(&gendiskp->dkstats, value, sizeof (struct disk_stats));
 }
 
-#define __part_stat_add(part, field, addnd) \
+#define part_stat_add(cpu, part, field, addnd)				\
 	(part->dkstats.field += addnd)
 
-#define __all_stat_add(gendiskp, part, field, addnd, sector)	\
-({								\
-	if (part)						\
-		part->dkstats.field += addnd;			\
-	__disk_stat_add(gendiskp, field, addnd);		\
+#define all_stat_add(cpu, gendiskp, part, field, addnd, sector)		\
+({									\
+	if (part)							\
+		part_stat_add(cpu, part, field, addnd);			\
+	disk_stat_add(cpu, gendiskp, field, addnd);			\
 })
 
 #define part_stat_read(part, field)	(part->dkstats.field)
@@ -304,63 +316,26 @@ static inline void part_stat_set_all(struct hd_struct *part, int value)
 
 #endif /* CONFIG_SMP */
 
-#define disk_stat_add(gendiskp, field, addnd)			\
-	do {							\
-		preempt_disable();				\
-		__disk_stat_add(gendiskp, field, addnd);	\
-		preempt_enable();				\
-	} while (0)
-
-#define __disk_stat_dec(gendiskp, field) __disk_stat_add(gendiskp, field, -1)
-#define disk_stat_dec(gendiskp, field) disk_stat_add(gendiskp, field, -1)
-
-#define __disk_stat_inc(gendiskp, field) __disk_stat_add(gendiskp, field, 1)
-#define disk_stat_inc(gendiskp, field) disk_stat_add(gendiskp, field, 1)
-
-#define __disk_stat_sub(gendiskp, field, subnd) \
-		__disk_stat_add(gendiskp, field, -subnd)
-#define disk_stat_sub(gendiskp, field, subnd) \
-		disk_stat_add(gendiskp, field, -subnd)
-
-#define part_stat_add(gendiskp, field, addnd)		\
-	do {						\
-		preempt_disable();			\
-		__part_stat_add(gendiskp, field, addnd);\
-		preempt_enable();			\
-	} while (0)
-
-#define __part_stat_dec(gendiskp, field) __part_stat_add(gendiskp, field, -1)
-#define part_stat_dec(gendiskp, field) part_stat_add(gendiskp, field, -1)
-
-#define __part_stat_inc(gendiskp, field) __part_stat_add(gendiskp, field, 1)
-#define part_stat_inc(gendiskp, field) part_stat_add(gendiskp, field, 1)
-
-#define __part_stat_sub(gendiskp, field, subnd) \
-		__part_stat_add(gendiskp, field, -subnd)
-#define part_stat_sub(gendiskp, field, subnd) \
-		part_stat_add(gendiskp, field, -subnd)
-
-#define all_stat_add(gendiskp, part, field, addnd, sector)	\
-	do {							\
-		preempt_disable();				\
-		__all_stat_add(gendiskp, part, field, addnd, sector);	\
-		preempt_enable();				\
-	} while (0)
-
-#define __all_stat_dec(gendiskp, field, sector) \
-		__all_stat_add(gendiskp, field, -1, sector)
-#define all_stat_dec(gendiskp, field, sector) \
-		all_stat_add(gendiskp, field, -1, sector)
-
-#define __all_stat_inc(gendiskp, part, field, sector) \
-		__all_stat_add(gendiskp, part, field, 1, sector)
-#define all_stat_inc(gendiskp, part, field, sector) \
-		all_stat_add(gendiskp, part, field, 1, sector)
-
-#define __all_stat_sub(gendiskp, part, field, subnd, sector) \
-		__all_stat_add(gendiskp, part, field, -subnd, sector)
-#define all_stat_sub(gendiskp, part, field, subnd, sector) \
-		all_stat_add(gendiskp, part, field, -subnd, sector)
+#define disk_stat_dec(cpu, gendiskp, field)				\
+	disk_stat_add(cpu, gendiskp, field, -1)
+#define disk_stat_inc(cpu, gendiskp, field)				\
+	disk_stat_add(cpu, gendiskp, field, 1)
+#define disk_stat_sub(cpu, gendiskp, field, subnd)			\
+	disk_stat_add(cpu, gendiskp, field, -subnd)
+
+#define part_stat_dec(cpu, gendiskp, field)				\
+	part_stat_add(cpu, gendiskp, field, -1)
+#define part_stat_inc(cpu, gendiskp, field)				\
+	part_stat_add(cpu, gendiskp, field, 1)
+#define part_stat_sub(cpu, gendiskp, field, subnd)			\
+	part_stat_add(cpu, gendiskp, field, -subnd)
+
+#define all_stat_dec(cpu, gendiskp, field, sector)			\
+	all_stat_add(cpu, gendiskp, field, -1, sector)
+#define all_stat_inc(cpu, gendiskp, part, field, sector)		\
+	all_stat_add(cpu, gendiskp, part, field, 1, sector)
+#define all_stat_sub(cpu, gendiskp, part, field, subnd, sector)		\
+	all_stat_add(cpu, gendiskp, part, field, -subnd, sector)
 
 /* Inlines to alloc and free disk stats in struct gendisk */
 #ifdef  CONFIG_SMP
@@ -411,8 +386,8 @@ static inline void free_part_stats(struct hd_struct *part)
 #endif	/* CONFIG_SMP */
 
 /* drivers/block/ll_rw_blk.c */
-extern void disk_round_stats(struct gendisk *disk);
-extern void part_round_stats(struct hd_struct *part);
+extern void disk_round_stats(int cpu, struct gendisk *disk);
+extern void part_round_stats(int cpu, struct hd_struct *part);
 
 /* drivers/block/genhd.c */
 extern int get_blkdev_list(char *, int);
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 06/09] block: implement extended dev numbers
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (4 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 05/09] block: fix diskstats access Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 07/09] block: adjust formatting for large minors and add ext_range sysfs attr Tejun Heo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

Implement extended device numbers.  A block driver can tell block
layer that it wants to use extended device numbers.  After the usual
minor space is used up, block layer automatically allocates devt's
from EXT_BLOCK_MAJOR.

Currently only one major number is allocated for this but as the
allocation is strictly on-demand, ~1mil minor space under it should
suffice unless the system actually has more than ~1mil partitions and
if that ever happens adding more majors to the extended devt area is
easy.

Due to internal implementation issues, the first partition can't be
allocated on the extended area.  In other words, genhd->minors should
at least be 1.  This limitation will be lifted by later changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/genhd.c         |  120 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/partitions/check.c |    9 +++-
 include/linux/genhd.h |   13 ++++-
 include/linux/major.h |    2 +
 4 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 7dbf2cc..e730dc0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -16,6 +16,7 @@
 #include <linux/kobj_map.h>
 #include <linux/buffer_head.h>
 #include <linux/mutex.h>
+#include <linux/idr.h>
 
 #include "blk.h"
 
@@ -24,6 +25,15 @@ static DEFINE_MUTEX(block_class_lock);
 struct kobject *block_depr;
 #endif
 
+/* for extended dynamic devt allocation, currently only one major is used */
+#define MAX_EXT_DEVT		(1 << MINORBITS)
+
+/* For extended devt allocation.  ext_devt_mutex prevents look up
+ * results from going away underneath its user.
+ */
+static DEFINE_MUTEX(ext_devt_mutex);
+static DEFINE_IDR(ext_devt_idr);
+
 static struct device_type disk_type;
 
 /**
@@ -288,6 +298,74 @@ EXPORT_SYMBOL(unregister_blkdev);
 
 static struct kobj_map *bdev_map;
 
+/**
+ * blk_alloc_devt - allocate a dev_t for a partition
+ * @part: partition to allocate dev_t for
+ * @gfp_mask: memory allocation flag
+ * @devt: out parameter for resulting dev_t
+ *
+ * Allocate a dev_t for block device.
+ *
+ * RETURNS:
+ * 0 on success, allocated dev_t is returned in *@devt.  -errno on
+ * failure.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
+{
+	struct gendisk *disk = part_to_disk(part);
+	int idx, rc;
+
+	/* in consecutive minor range? */
+	if (part->partno < disk->minors) {
+		*devt = MKDEV(disk->major, disk->first_minor + part->partno);
+		return 0;
+	}
+
+	/* allocate ext devt */
+	do {
+		if (!idr_pre_get(&ext_devt_idr, GFP_KERNEL))
+			return -ENOMEM;
+		rc = idr_get_new(&ext_devt_idr, part, &idx);
+	} while (rc == -EAGAIN);
+
+	if (rc)
+		return rc;
+
+	if (idx > MAX_EXT_DEVT) {
+		idr_remove(&ext_devt_idr, idx);
+		return -EBUSY;
+	}
+
+	*devt = MKDEV(BLOCK_EXT_MAJOR, idx);
+	return 0;
+}
+
+/**
+ * blk_free_devt - free a dev_t
+ * @devt: dev_t to free
+ *
+ * Free @devt which was allocated using blk_alloc_devt().
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void blk_free_devt(dev_t devt)
+{
+	might_sleep();
+
+	if (devt == MKDEV(0, 0))
+		return;
+
+	if (MAJOR(devt) == BLOCK_EXT_MAJOR) {
+		mutex_lock(&ext_devt_mutex);
+		idr_remove(&ext_devt_idr, MINOR(devt));
+		mutex_unlock(&ext_devt_mutex);
+	}
+}
+
 /*
  * Register device numbers dev..(dev+range-1)
  * range must be nonzero
@@ -373,10 +451,27 @@ void unlink_gendisk(struct gendisk *disk)
  */
 struct gendisk *get_gendisk(dev_t devt, int *partno)
 {
-	struct kobject *kobj = kobj_lookup(bdev_map, devt, partno);
-	struct device *dev = kobj_to_dev(kobj);
+	struct gendisk *disk = NULL;
+
+	if (MAJOR(devt) != BLOCK_EXT_MAJOR) {
+		struct kobject *kobj;
+
+		kobj = kobj_lookup(bdev_map, devt, partno);
+		if (kobj)
+			disk = dev_to_disk(kobj_to_dev(kobj));
+	} else {
+		struct hd_struct *part;
+
+		mutex_lock(&ext_devt_mutex);
+		part = idr_find(&ext_devt_idr, MINOR(devt));
+		if (part && get_disk(part_to_disk(part))) {
+			*partno = part->partno;
+			disk = part_to_disk(part);
+		}
+		mutex_unlock(&ext_devt_mutex);
+	}
 
-	return  kobj ? dev_to_disk(dev) : NULL;
+	return disk;
 }
 
 /**
@@ -879,17 +974,29 @@ struct gendisk *alloc_disk(int minors)
 
 struct gendisk *alloc_disk_node(int minors, int node_id)
 {
+	return alloc_disk_ext_node(minors, 0, node_id);
+}
+
+struct gendisk *alloc_disk_ext(int minors, int ext_minors)
+{
+	return alloc_disk_ext_node(minors, ext_minors, -1);
+}
+
+struct gendisk *alloc_disk_ext_node(int minors, int ext_minors, int node_id)
+{
 	struct gendisk *disk;
 
 	disk = kmalloc_node(sizeof(struct gendisk),
 				GFP_KERNEL | __GFP_ZERO, node_id);
 	if (disk) {
+		int tot_minors = minors + ext_minors;
+
 		if (!init_disk_stats(disk)) {
 			kfree(disk);
 			return NULL;
 		}
-		if (minors > 1) {
-			int size = (minors - 1) * sizeof(struct hd_struct *);
+		if (tot_minors > 1) {
+			int size = (tot_minors - 1) * sizeof(struct hd_struct *);
 			disk->__part = kmalloc_node(size,
 				GFP_KERNEL | __GFP_ZERO, node_id);
 			if (!disk->__part) {
@@ -899,6 +1006,7 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
 			}
 		}
 		disk->minors = minors;
+		disk->ext_minors = ext_minors;
 		rand_initialize_disk(disk);
 		disk->dev.class = &block_class;
 		disk->dev.type = &disk_type;
@@ -911,6 +1019,8 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
 
 EXPORT_SYMBOL(alloc_disk);
 EXPORT_SYMBOL(alloc_disk_node);
+EXPORT_SYMBOL(alloc_disk_ext);
+EXPORT_SYMBOL(alloc_disk_ext_node);
 
 struct kobject *get_disk(struct gendisk *disk)
 {
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 36e0641..0ecc93f 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -333,6 +333,7 @@ void delete_partition(struct gendisk *disk, int partno)
 	if (!part)
 		return;
 
+	blk_free_devt(part_devt(part));
 	rcu_assign_pointer(disk->__part[partno-1], NULL);
 	kobject_put(part->holder_dir);
 	device_del(&part->dev);
@@ -352,6 +353,7 @@ int add_partition(struct gendisk *disk, int partno,
 		  sector_t start, sector_t len, int flags)
 {
 	struct hd_struct *p;
+	dev_t devt = MKDEV(0, 0);
 	int err;
 
 	if (disk->__part[partno - 1])
@@ -378,11 +380,15 @@ int add_partition(struct gendisk *disk, int partno,
 			 "%s%d", disk->dev.bus_id, partno);
 
 	device_initialize(&p->dev);
-	p->dev.devt = MKDEV(disk->major, disk->first_minor + partno);
 	p->dev.class = &block_class;
 	p->dev.type = &part_type;
 	p->dev.parent = &disk->dev;
 
+	err = blk_alloc_devt(p, &devt);
+	if (err)
+		goto out_put;
+	p->dev.devt = devt;
+
 	/* delay uevent until 'holders' subdir is created */
 	p->dev.uevent_suppress = 1;
 	err = device_add(&p->dev);
@@ -419,6 +425,7 @@ out_del:
 	device_del(&p->dev);
 out_put:
 	put_device(&p->dev);
+	blk_free_devt(devt);
 	return err;
 }
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 09420c3..c9a7315 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -122,13 +122,15 @@ struct blk_scsi_cmd_filter {
 };
 
 struct gendisk {
-	/* major, first_minor and minors are input parameters only,
-	 * don't use directly.  Use disk_devt() and disk_max_parts().
+	/* major, first_minor, minors and ext_minors are input
+	 * parameters only, don't use directly.  Use disk_devt() and
+	 * disk_max_parts().
 	 */
 	int major;			/* major number of driver */
 	int first_minor;
 	int minors;                     /* maximum number of minors, =1 for
                                          * disks that can't be partitioned. */
+	int ext_minors;			/* number of extended dynamic minors */
 
 	char disk_name[32];		/* name of major driver */
 
@@ -177,7 +179,7 @@ static inline struct gendisk *part_to_disk(struct hd_struct *part)
 
 static inline int disk_max_parts(struct gendisk *disk)
 {
-	return disk->minors - 1;
+	return disk->minors + disk->ext_minors - 1;
 }
 
 static inline dev_t disk_devt(struct gendisk *disk)
@@ -564,6 +566,8 @@ struct unixware_disklabel {
 #define ADDPART_FLAG_RAID	1
 #define ADDPART_FLAG_WHOLEDISK	2
 
+extern int blk_alloc_devt(struct hd_struct *part, dev_t *devt);
+extern void blk_free_devt(dev_t devt);
 extern dev_t blk_lookup_devt(const char *name, int partno);
 extern char *disk_name (struct gendisk *hd, int partno, char *buf);
 
@@ -574,6 +578,9 @@ extern void printk_all_partitions(void);
 
 extern struct gendisk *alloc_disk_node(int minors, int node_id);
 extern struct gendisk *alloc_disk(int minors);
+extern struct gendisk *alloc_disk_ext_node(int minors, int ext_minrs,
+					   int node_id);
+extern struct gendisk *alloc_disk_ext(int minors, int ext_minors);
 extern struct kobject *get_disk(struct gendisk *disk);
 extern void put_disk(struct gendisk *disk);
 extern void blk_register_region(dev_t devt, unsigned long range,
diff --git a/include/linux/major.h b/include/linux/major.h
index 53d5faf..8824945 100644
--- a/include/linux/major.h
+++ b/include/linux/major.h
@@ -170,4 +170,6 @@
 
 #define VIOTAPE_MAJOR		230
 
+#define BLOCK_EXT_MAJOR		259
+
 #endif
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 07/09] block: adjust formatting for large minors and add ext_range sysfs attr
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (5 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 06/09] block: implement extended dev numbers Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 08/09] sd/ide-disk: apply extended minors to sd and ide Tejun Heo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

With extended minors and the soon-to-follow debug feature, large minor
numbers for block devices will be common.  This patch does the
followings to make printouts pretty.

* Adapt print formats such that large minors don't break the
  formatting.

* For extended MAJ:MIN, %02x%02x for MAJ:MIN used in
  printk_all_partitions() doesn't cut it anymore.  Update it such that
  %03x:%05x is used if either MAJ or MIN doesn't fit in %02x.

* Implement ext_range sysfs attribute which shows total minors the
  device can use including both conventional minor space and the
  extended one.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/genhd.c      |   47 +++++++++++++++++++++++++++++++++++------------
 include/linux/fs.h |    1 +
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index e730dc0..93ebc6c 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -366,6 +366,18 @@ void blk_free_devt(dev_t devt)
 	}
 }
 
+static char *bdevt_str(dev_t devt, char *buf)
+{
+	if (MAJOR(devt) <= 0xff && MINOR(devt) <= 0xff) {
+		char tbuf[BDEVT_SIZE];
+		snprintf(tbuf, BDEVT_SIZE, "%02x%02x", MAJOR(devt), MINOR(devt));
+		snprintf(buf, BDEVT_SIZE, "%-9s", tbuf);
+	} else
+		snprintf(buf, BDEVT_SIZE, "%03x:%05x", MAJOR(devt), MINOR(devt));
+
+	return buf;
+}
+
 /*
  * Register device numbers dev..(dev+range-1)
  * range must be nonzero
@@ -523,7 +535,8 @@ void __init printk_all_partitions(void)
 		struct gendisk *disk = dev_to_disk(dev);
 		struct disk_part_iter piter;
 		struct hd_struct *part;
-		char buf[BDEVNAME_SIZE];
+		char name_buf[BDEVNAME_SIZE];
+		char devt_buf[BDEVT_SIZE];
 
 		/*
 		 * Don't show empty devices or things that have been
@@ -538,10 +551,10 @@ void __init printk_all_partitions(void)
 		 * numbers in hex - the same format as the root=
 		 * option takes.
 		 */
-		printk("%02x%02x %10llu %s",
-		       MAJOR(disk_devt(disk)), MINOR(disk_devt(disk)),
+		printk("%s %10llu %s",
+		       bdevt_str(disk_devt(disk), devt_buf),
 		       (unsigned long long)get_capacity(disk) >> 1,
-		       disk_name(disk, 0, buf));
+		       disk_name(disk, 0, name_buf));
 		if (disk->driverfs_dev != NULL &&
 		    disk->driverfs_dev->driver != NULL)
 			printk(" driver: %s\n",
@@ -552,10 +565,10 @@ void __init printk_all_partitions(void)
 		/* now show the partitions */
 		disk_part_iter_init(&piter, disk, 0);
 		while ((part = disk_part_iter_next(&piter)))
-			printk("  %02x%02x %10llu %s\n",
-			       MAJOR(part_devt(part)), MINOR(part_devt(part)),
+			printk("  %s %10llu %s\n",
+			       bdevt_str(part_devt(part), devt_buf),
 			       (unsigned long long)part->nr_sects >> 1,
-			       disk_name(disk, part->partno, buf));
+			       disk_name(disk, part->partno, name_buf));
 		disk_part_iter_exit(&piter);
 	}
 	class_dev_iter_exit(&iter);
@@ -612,7 +625,7 @@ static void *show_partition_start(struct seq_file *seqf, loff_t *pos)
 
 	p = disk_seqf_start(seqf, pos);
 	if (!IS_ERR(p) && p)
-		seq_puts(part, "major minor  #blocks  name\n\n");
+		seq_puts(seqf, "major   minor   #blocks  name\n\n");
 	return p;
 }
 
@@ -631,14 +644,14 @@ static int show_partition(struct seq_file *seqf, void *v)
 		return 0;
 
 	/* show the full disk and all non-0 size partitions of it */
-	seq_printf(seqf, "%4d  %4d %10llu %s\n",
+	seq_printf(seqf, "%4d  %7d %10llu %s\n",
 		MAJOR(disk_devt(sgp)), MINOR(disk_devt(sgp)),
 		(unsigned long long)get_capacity(sgp) >> 1,
 		disk_name(sgp, 0, buf));
 
 	disk_part_iter_init(&piter, sgp, 0);
 	while ((part = disk_part_iter_next(&piter)))
-		seq_printf(seqf, "%4d  %4d %10llu %s\n",
+		seq_printf(seqf, "%4d  %7d %10llu %s\n",
 			   MAJOR(part_devt(part)), MINOR(part_devt(part)),
 			   (unsigned long long)part->nr_sects >> 1,
 			   disk_name(sgp, part->partno, buf));
@@ -692,6 +705,14 @@ static ssize_t disk_range_show(struct device *dev,
 	return sprintf(buf, "%d\n", disk->minors);
 }
 
+static ssize_t disk_ext_range_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	return sprintf(buf, "%d\n", disk_max_parts(disk) + 1);
+}
+
 static ssize_t disk_removable_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -781,6 +802,7 @@ static ssize_t disk_fail_store(struct device *dev,
 #endif
 
 static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
+static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL);
 static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
 static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL);
 static DEVICE_ATTR(size, S_IRUGO, disk_size_show, NULL);
@@ -793,6 +815,7 @@ static struct device_attribute dev_attr_fail =
 
 static struct attribute *disk_attrs[] = {
 	&dev_attr_range.attr,
+	&dev_attr_ext_range.attr,
 	&dev_attr_removable.attr,
 	&dev_attr_ro.attr,
 	&dev_attr_size.attr,
@@ -859,7 +882,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 	cpu = disk_stat_lock();
 	disk_round_stats(cpu, gp);
 	disk_stat_unlock();
-	seq_printf(seqf, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
+	seq_printf(seqf, "%4d %7d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
 		MAJOR(disk_devt(gp)), MINOR(disk_devt(gp)),
 		disk_name(gp, 0, buf),
 		disk_stat_read(gp, ios[0]), disk_stat_read(gp, merges[0]),
@@ -878,7 +901,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
 		cpu = disk_stat_lock();
 		part_round_stats(cpu, hd);
 		disk_stat_unlock();
-		seq_printf(seqf, "%4d %4d %s %lu %lu %llu "
+		seq_printf(seqf, "%4d %7d %s %lu %lu %llu "
 			   "%u %lu %lu %llu %u %u %u %u\n",
 			   MAJOR(part_devt(hd)), MINOR(part_devt(hd)),
 			   disk_name(gp, hd->partno, buf),
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 860689f..02a9fb5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1685,6 +1685,7 @@ extern void chrdev_show(struct seq_file *,off_t);
 
 /* fs/block_dev.c */
 #define BDEVNAME_SIZE	32	/* Largest string for a blockdev identifier */
+#define BDEVT_SIZE	10	/* Largest string for MAJ:MIN for blkdev */
 
 #ifdef CONFIG_BLOCK
 #define BLKDEV_MAJOR_HASH_SIZE	255
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 08/09] sd/ide-disk: apply extended minors to sd and ide
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (6 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 07/09] block: adjust formatting for large minors and add ext_range sysfs attr Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 10:47 ` [PATCH 09/09] block: implement CONFIG_DEBUG_BLOCK_EXT_DEVT Tejun Heo
  2008-08-25 18:33 ` [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Jens Axboe
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

Update sd and ide-disk such that they can take advantage of extended
minors.

ide-disk already has 64 minors per device and currently doesn't use
extended minors although after this patch it can be turned on by
simply tweaking constants.

sd only had 16 minors per device causing problems on certain peculiar
configurations.  This patch lifts the restriction and enables it to
use upto 64 minors.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 drivers/ide/ide-disk.c |   11 ++++++++---
 drivers/scsi/sd.c      |    9 +++++++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 68b9cf0..ce24caf 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -41,6 +41,10 @@
 #include <asm/io.h>
 #include <asm/div64.h>
 
+#define IDE_DISK_PARTS		(1 << PARTN_BITS)
+#define IDE_DISK_MINORS		IDE_DISK_PARTS
+#define IDE_DISK_EXT_MINORS	(IDE_DISK_PARTS - IDE_DISK_MINORS)
+
 struct ide_disk_obj {
 	ide_drive_t	*drive;
 	ide_driver_t	*driver;
@@ -1165,8 +1169,8 @@ static int ide_disk_probe(ide_drive_t *drive)
 	if (!idkp)
 		goto failed;
 
-	g = alloc_disk_node(1 << PARTN_BITS,
-			hwif_to_node(drive->hwif));
+	g = alloc_disk_ext_node(IDE_DISK_MINORS, IDE_DISK_EXT_MINORS,
+				hwif_to_node(drive->hwif));
 	if (!g)
 		goto out_free_idkp;
 
@@ -1192,7 +1196,8 @@ static int ide_disk_probe(ide_drive_t *drive)
 	} else
 		drive->attach = 1;
 
-	g->minors = 1 << PARTN_BITS;
+	g->minors = IDE_DISK_MINORS;
+	g->ext_minors = IDE_DISK_EXT_MINORS;
 	g->driverfs_dev = &drive->gendev;
 	g->flags = drive->removable ? GENHD_FL_REMOVABLE : 0;
 	set_capacity(g, idedisk_capacity(drive));
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e5e7d78..d1bb0e1 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -86,6 +86,10 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 
+#define SD_PARTS	64
+#define SD_MINORS	16
+#define SD_EXT_MINORS	(SD_PARTS - SD_MINORS)
+
 static int  sd_revalidate_disk(struct gendisk *);
 static int  sd_probe(struct device *);
 static int  sd_remove(struct device *);
@@ -1801,7 +1805,7 @@ static int sd_probe(struct device *dev)
 	if (!sdkp)
 		goto out;
 
-	gd = alloc_disk(16);
+	gd = alloc_disk_ext(SD_MINORS, SD_EXT_MINORS);
 	if (!gd)
 		goto out_free;
 
@@ -1845,7 +1849,8 @@ static int sd_probe(struct device *dev)
 
 	gd->major = sd_major((index & 0xf0) >> 4);
 	gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
-	gd->minors = 16;
+	gd->minors = SD_MINORS;
+	gd->ext_minors = SD_EXT_MINORS;
 	gd->fops = &sd_fops;
 
 	if (index < 26) {
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 09/09] block: implement CONFIG_DEBUG_BLOCK_EXT_DEVT
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (7 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 08/09] sd/ide-disk: apply extended minors to sd and ide Tejun Heo
@ 2008-08-25 10:47 ` Tejun Heo
  2008-08-25 18:33 ` [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Jens Axboe
  9 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2008-08-25 10:47 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel, Tejun Heo

Extended devt introduces non-contiguos device numbers.  This patch
implements a debug option which forces most devt allocations to be
from the extended area and spreads them out.  This is enabled by
default if DEBUG_KERNEL is set and achieves...

1. Detects code paths in kernel or userland which expect predetermined
   consecutive device numbers.

2. When something goes wrong, avoid corruption as adding to the minor
   of earlier partition won't lead to the wrong but valid device.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/genhd.c          |   38 +++++++++++++++++++++++++++++++++++---
 drivers/ide/ide-disk.c |    6 ++++++
 drivers/scsi/sd.c      |    6 ++++++
 lib/Kconfig.debug      |   16 ++++++++++++++++
 4 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 93ebc6c..dbfae29 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -299,6 +299,38 @@ EXPORT_SYMBOL(unregister_blkdev);
 static struct kobj_map *bdev_map;
 
 /**
+ * blk_mangle_minor - scatter minor numbers apart
+ * @minor: minor number to mangle
+ *
+ * Scatter consecutively allocated @minor number apart if MANGLE_DEVT
+ * is enabled.  Mangling twice gives the original value.
+ *
+ * RETURNS:
+ * Mangled value.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+static int blk_mangle_minor(int minor)
+{
+#ifdef CONFIG_DEBUG_BLOCK_EXT_DEVT
+	int i;
+
+	for (i = 0; i < MINORBITS / 2; i++) {
+		int low = minor & (1 << i);
+		int high = minor & (1 << (MINORBITS - 1 - i));
+		int distance = MINORBITS - 1 - 2 * i;
+
+		minor ^= low | high;	/* clear both bits */
+		low <<= distance;	/* swap the positions */
+		high >>= distance;
+		minor |= low | high;	/* and set */
+	}
+#endif
+	return minor;
+}
+
+/**
  * blk_alloc_devt - allocate a dev_t for a partition
  * @part: partition to allocate dev_t for
  * @gfp_mask: memory allocation flag
@@ -339,7 +371,7 @@ int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
 		return -EBUSY;
 	}
 
-	*devt = MKDEV(BLOCK_EXT_MAJOR, idx);
+	*devt = MKDEV(BLOCK_EXT_MAJOR, blk_mangle_minor(idx));
 	return 0;
 }
 
@@ -361,7 +393,7 @@ void blk_free_devt(dev_t devt)
 
 	if (MAJOR(devt) == BLOCK_EXT_MAJOR) {
 		mutex_lock(&ext_devt_mutex);
-		idr_remove(&ext_devt_idr, MINOR(devt));
+		idr_remove(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
 		mutex_unlock(&ext_devt_mutex);
 	}
 }
@@ -475,7 +507,7 @@ struct gendisk *get_gendisk(dev_t devt, int *partno)
 		struct hd_struct *part;
 
 		mutex_lock(&ext_devt_mutex);
-		part = idr_find(&ext_devt_idr, MINOR(devt));
+		part = idr_find(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
 		if (part && get_disk(part_to_disk(part))) {
 			*partno = part->partno;
 			disk = part_to_disk(part);
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index ce24caf..b4cc1ce 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -42,7 +42,13 @@
 #include <asm/div64.h>
 
 #define IDE_DISK_PARTS		(1 << PARTN_BITS)
+
+#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define IDE_DISK_MINORS		IDE_DISK_PARTS
+#else
+#define IDE_DISK_MINORS		1
+#endif
+
 #define IDE_DISK_EXT_MINORS	(IDE_DISK_PARTS - IDE_DISK_MINORS)
 
 struct ide_disk_obj {
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d1bb0e1..280d231 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -87,7 +87,13 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 
 #define SD_PARTS	64
+
+#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
+#else
+#define SD_MINORS	1
+#endif
+
 #define SD_EXT_MINORS	(SD_PARTS - SD_MINORS)
 
 static int  sd_revalidate_disk(struct gendisk *);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8b5a7d3..1bc3c07 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -624,6 +624,22 @@ config BACKTRACE_SELF_TEST
 
 	  Say N if you are unsure.
 
+config DEBUG_BLOCK_EXT_DEVT
+        bool "Force extended block device numbers and spread them"
+	depends on DEBUG_KERNEL
+	depends on BLOCK
+	default y
+	help
+	  Conventionally, block device numbers are allocated from
+	  predetermined contiguous area.  However, extended block area
+	  may introduce non-contiguous block device numbers.  This
+	  option forces most block device numbers to be allocated from
+	  the extended space and spreads them to discover kernel or
+	  userland code paths which assume predetermined contiguous
+	  device number allocation.
+
+	  Say N if you are unsure.
+
 config LKDTM
 	tristate "Linux Kernel Dump Test Tool Module"
 	depends on DEBUG_KERNEL
-- 
1.5.4.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 05/09] block: fix diskstats access
  2008-08-25 10:47 ` [PATCH 05/09] block: fix diskstats access Tejun Heo
@ 2008-08-25 10:58   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2008-08-25 10:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, James.Bottomley, bzolnier, bharrosh, greg.freemyer,
	linux-scsi, brking, liml, viro, linux-ide, neilb, linux-kernel

On Mon, 2008-08-25 at 19:47 +0900, Tejun Heo wrote:
> There are two variants of stat functions - ones prefixed with double
> underbars which don't care about preemption and ones without which
> disable preemption before manipulating per-cpu counters.  It's unclear
> whether the underbarred ones assume that preemtion is disabled on
> entry as some callers don't do that.
> 
> This patch unifies diskstats access by implementing disk_stat_lock()
> and disk_stat_unlock() which take care of both RCU (for partition
> access) and preemption (for per-cpu counter access).  diskstats access
> should always be enclosed between the two functions.  As such, there's
> no need for the versions which disables preemption.  They're removed
> and double underbars ones are renamed to drop the underbars.  As an
> extra argument is added, there's no danger of using the old version
> unconverted.
> 
> disk_stat_lock() uses get_cpu() and returns the cpu index and all
> diskstat functions which access per-cpu counters now has @cpu
> argument to help RT.
> 
> This change adds RCU or preemption operations at some places but also
> collapses several preemption ops into one at others.  Overall, the
> performance difference should be negligible as all involved ops are
> very lightweight per-cpu ones.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

Most appreciated,

Acked-by: Peter Zijlstra <peterz@infradead.org>

> ---
>  block/blk-core.c           |   52 +++++++++--------
>  block/blk-merge.c          |   11 ++--
>  block/genhd.c              |   20 ++++---
>  drivers/block/aoe/aoecmd.c |   15 +++--
>  drivers/md/dm.c            |   26 +++++----
>  drivers/md/linear.c        |    7 ++-
>  drivers/md/multipath.c     |    7 ++-
>  drivers/md/raid0.c         |    7 ++-
>  drivers/md/raid1.c         |    8 ++-
>  drivers/md/raid10.c        |    7 ++-
>  drivers/md/raid5.c         |    8 ++-
>  fs/partitions/check.c      |    7 +-
>  include/linux/genhd.h      |  139 ++++++++++++++++++--------------------------
>  13 files changed, 158 insertions(+), 156 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a8cfa5e..3de5610 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -56,25 +56,26 @@ static void drive_stat_acct(struct request *rq, int new_io)
>  {
>  	struct hd_struct *part;
>  	int rw = rq_data_dir(rq);
> +	int cpu;
>  
>  	if (!blk_fs_request(rq) || !rq->rq_disk)
>  		return;
>  
> -	rcu_read_lock();
> -
> +	cpu = disk_stat_lock();
>  	part = disk_map_sector_rcu(rq->rq_disk, rq->sector);
> +
>  	if (!new_io)
> -		__all_stat_inc(rq->rq_disk, part, merges[rw], rq->sector);
> +		all_stat_inc(cpu, rq->rq_disk, part, merges[rw], rq->sector);
>  	else {
> -		disk_round_stats(rq->rq_disk);
> +		disk_round_stats(cpu, rq->rq_disk);
>  		rq->rq_disk->in_flight++;
>  		if (part) {
> -			part_round_stats(part);
> +			part_round_stats(cpu, part);
>  			part->in_flight++;
>  		}
>  	}
>  
> -	rcu_read_unlock();
> +	disk_stat_unlock();
>  }
>  
>  void blk_queue_congestion_threshold(struct request_queue *q)
> @@ -995,7 +996,7 @@ static inline void add_request(struct request_queue *q, struct request *req)
>   * /proc/diskstats.  This accounts immediately for all queue usage up to
>   * the current jiffies and restarts the counters again.
>   */
> -void disk_round_stats(struct gendisk *disk)
> +void disk_round_stats(int cpu, struct gendisk *disk)
>  {
>  	unsigned long now = jiffies;
>  
> @@ -1003,15 +1004,15 @@ void disk_round_stats(struct gendisk *disk)
>  		return;
>  
>  	if (disk->in_flight) {
> -		__disk_stat_add(disk, time_in_queue,
> -				disk->in_flight * (now - disk->stamp));
> -		__disk_stat_add(disk, io_ticks, (now - disk->stamp));
> +		disk_stat_add(cpu, disk, time_in_queue,
> +			      disk->in_flight * (now - disk->stamp));
> +		disk_stat_add(cpu, disk, io_ticks, (now - disk->stamp));
>  	}
>  	disk->stamp = now;
>  }
>  EXPORT_SYMBOL_GPL(disk_round_stats);
>  
> -void part_round_stats(struct hd_struct *part)
> +void part_round_stats(int cpu, struct hd_struct *part)
>  {
>  	unsigned long now = jiffies;
>  
> @@ -1019,9 +1020,9 @@ void part_round_stats(struct hd_struct *part)
>  		return;
>  
>  	if (part->in_flight) {
> -		__part_stat_add(part, time_in_queue,
> -				part->in_flight * (now - part->stamp));
> -		__part_stat_add(part, io_ticks, (now - part->stamp));
> +		part_stat_add(cpu, part, time_in_queue,
> +			      part->in_flight * (now - part->stamp));
> +		part_stat_add(cpu, part, io_ticks, (now - part->stamp));
>  	}
>  	part->stamp = now;
>  }
> @@ -1561,12 +1562,13 @@ static int __end_that_request_first(struct request *req, int error,
>  	if (blk_fs_request(req) && req->rq_disk) {
>  		const int rw = rq_data_dir(req);
>  		struct hd_struct *part;
> +		int cpu;
>  
> -		rcu_read_lock();
> +		cpu = disk_stat_lock();
>  		part = disk_map_sector_rcu(req->rq_disk, req->sector);
> -		all_stat_add(req->rq_disk, part, sectors[rw],
> -				nr_bytes >> 9, req->sector);
> -		rcu_read_unlock();
> +		all_stat_add(cpu, req->rq_disk, part, sectors[rw],
> +			     nr_bytes >> 9, req->sector);
> +		disk_stat_unlock();
>  	}
>  
>  	total_bytes = bio_nbytes = 0;
> @@ -1751,21 +1753,21 @@ static void end_that_request_last(struct request *req, int error)
>  		unsigned long duration = jiffies - req->start_time;
>  		const int rw = rq_data_dir(req);
>  		struct hd_struct *part;
> +		int cpu;
>  
> -		rcu_read_lock();
> -
> +		cpu = disk_stat_lock();
>  		part = disk_map_sector_rcu(disk, req->sector);
>  
> -		__all_stat_inc(disk, part, ios[rw], req->sector);
> -		__all_stat_add(disk, part, ticks[rw], duration, req->sector);
> -		disk_round_stats(disk);
> +		all_stat_inc(cpu, disk, part, ios[rw], req->sector);
> +		all_stat_add(cpu, disk, part, ticks[rw], duration, req->sector);
> +		disk_round_stats(cpu, disk);
>  		disk->in_flight--;
>  		if (part) {
> -			part_round_stats(part);
> +			part_round_stats(cpu, part);
>  			part->in_flight--;
>  		}
>  
> -		rcu_read_unlock();
> +		disk_stat_unlock();
>  	}
>  
>  	if (req->end_io)
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index eb2a3ca..d926a24 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -388,18 +388,19 @@ static int attempt_merge(struct request_queue *q, struct request *req,
>  
>  	if (req->rq_disk) {
>  		struct hd_struct *part;
> +		int cpu;
>  
> -		rcu_read_lock();
> -
> +		cpu = disk_stat_lock();
>  		part = disk_map_sector_rcu(req->rq_disk, req->sector);
> -		disk_round_stats(req->rq_disk);
> +
> +		disk_round_stats(cpu, req->rq_disk);
>  		req->rq_disk->in_flight--;
>  		if (part) {
> -			part_round_stats(part);
> +			part_round_stats(cpu, part);
>  			part->in_flight--;
>  		}
>  
> -		rcu_read_unlock();
> +		disk_stat_unlock();
>  	}
>  
>  	req->ioprio = ioprio_best(req->ioprio, next->ioprio);
> diff --git a/block/genhd.c b/block/genhd.c
> index 5e234b1..7dbf2cc 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -634,10 +634,11 @@ static ssize_t disk_stat_show(struct device *dev,
>  			      struct device_attribute *attr, char *buf)
>  {
>  	struct gendisk *disk = dev_to_disk(dev);
> +	int cpu;
>  
> -	preempt_disable();
> -	disk_round_stats(disk);
> -	preempt_enable();
> +	cpu = disk_stat_lock();
> +	disk_round_stats(cpu, disk);
> +	disk_stat_unlock();
>  	return sprintf(buf,
>  		"%8lu %8lu %8llu %8u "
>  		"%8lu %8lu %8llu %8u "
> @@ -750,6 +751,7 @@ static int diskstats_show(struct seq_file *seqf, void *v)
>  	struct disk_part_iter piter;
>  	struct hd_struct *hd;
>  	char buf[BDEVNAME_SIZE];
> +	int cpu;
>  
>  	/*
>  	if (&gp->dev.kobj.entry == block_class.devices.next)
> @@ -759,9 +761,9 @@ static int diskstats_show(struct seq_file *seqf, void *v)
>  				"\n\n");
>  	*/
>   
> -	preempt_disable();
> -	disk_round_stats(gp);
> -	preempt_enable();
> +	cpu = disk_stat_lock();
> +	disk_round_stats(cpu, gp);
> +	disk_stat_unlock();
>  	seq_printf(seqf, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
>  		MAJOR(disk_devt(gp)), MINOR(disk_devt(gp)),
>  		disk_name(gp, 0, buf),
> @@ -778,9 +780,9 @@ static int diskstats_show(struct seq_file *seqf, void *v)
>  	/* now show all non-0 size partitions of it */
>  	disk_part_iter_init(&piter, gp, 0);
>  	while ((hd = disk_part_iter_next(&piter))) {
> -		preempt_disable();
> -		part_round_stats(hd);
> -		preempt_enable();
> +		cpu = disk_stat_lock();
> +		part_round_stats(cpu, hd);
> +		disk_stat_unlock();
>  		seq_printf(seqf, "%4d %4d %s %lu %lu %llu "
>  			   "%u %lu %lu %llu %u %u %u %u\n",
>  			   MAJOR(part_devt(hd)), MINOR(part_devt(hd)),
> diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
> index 84c03d6..17eed8c 100644
> --- a/drivers/block/aoe/aoecmd.c
> +++ b/drivers/block/aoe/aoecmd.c
> @@ -756,16 +756,17 @@ diskstats(struct gendisk *disk, struct bio *bio, ulong duration, sector_t sector
>  	unsigned long n_sect = bio->bi_size >> 9;
>  	const int rw = bio_data_dir(bio);
>  	struct hd_struct *part;
> +	int cpu;
>  
> -	rcu_read_lock();
> -
> +	cpu = disk_stat_lock();
>  	part = disk_map_sector_rcu(disk, sector);
> -	all_stat_inc(disk, part, ios[rw], sector);
> -	all_stat_add(disk, part, ticks[rw], duration, sector);
> -	all_stat_add(disk, part, sectors[rw], n_sect, sector);
> -	all_stat_add(disk, part, io_ticks, duration, sector);
>  
> -	rcu_read_unlock();
> +	all_stat_inc(cpu, disk, part, ios[rw], sector);
> +	all_stat_add(cpu, disk, part, ticks[rw], duration, sector);
> +	all_stat_add(cpu, disk, part, sectors[rw], n_sect, sector);
> +	all_stat_add(cpu, disk, part, io_ticks, duration, sector);
> +
> +	disk_stat_unlock();
>  }
>  
>  void
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index b4ddb96..d087435 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -377,12 +377,13 @@ static void free_tio(struct mapped_device *md, struct dm_target_io *tio)
>  static void start_io_acct(struct dm_io *io)
>  {
>  	struct mapped_device *md = io->md;
> +	int cpu;
>  
>  	io->start_time = jiffies;
>  
> -	preempt_disable();
> -	disk_round_stats(dm_disk(md));
> -	preempt_enable();
> +	cpu = disk_stat_lock();
> +	disk_round_stats(cpu, dm_disk(md));
> +	disk_stat_unlock();
>  	dm_disk(md)->in_flight = atomic_inc_return(&md->pending);
>  }
>  
> @@ -391,15 +392,15 @@ static int end_io_acct(struct dm_io *io)
>  	struct mapped_device *md = io->md;
>  	struct bio *bio = io->bio;
>  	unsigned long duration = jiffies - io->start_time;
> -	int pending;
> +	int pending, cpu;
>  	int rw = bio_data_dir(bio);
>  
> -	preempt_disable();
> -	disk_round_stats(dm_disk(md));
> -	preempt_enable();
> -	dm_disk(md)->in_flight = pending = atomic_dec_return(&md->pending);
> +	cpu = disk_stat_lock();
> +	disk_round_stats(cpu, dm_disk(md));
> +	disk_stat_add(cpu, dm_disk(md), ticks[rw], duration);
> +	disk_stat_unlock();
>  
> -	disk_stat_add(dm_disk(md), ticks[rw], duration);
> +	dm_disk(md)->in_flight = pending = atomic_dec_return(&md->pending);
>  
>  	return !pending;
>  }
> @@ -881,6 +882,7 @@ static int dm_request(struct request_queue *q, struct bio *bio)
>  	int r = -EIO;
>  	int rw = bio_data_dir(bio);
>  	struct mapped_device *md = q->queuedata;
> +	int cpu;
>  
>  	/*
>  	 * There is no use in forwarding any barrier request since we can't
> @@ -893,8 +895,10 @@ static int dm_request(struct request_queue *q, struct bio *bio)
>  
>  	down_read(&md->io_lock);
>  
> -	disk_stat_inc(dm_disk(md), ios[rw]);
> -	disk_stat_add(dm_disk(md), sectors[rw], bio_sectors(bio));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, dm_disk(md), ios[rw]);
> +	disk_stat_add(cpu, dm_disk(md), sectors[rw], bio_sectors(bio));
> +	disk_stat_unlock();
>  
>  	/*
>  	 * If we're suspended we have to queue
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index b1eebf8..00cbc8e 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -318,14 +318,17 @@ static int linear_make_request (struct request_queue *q, struct bio *bio)
>  	mddev_t *mddev = q->queuedata;
>  	dev_info_t *tmp_dev;
>  	sector_t block;
> +	int cpu;
>  
>  	if (unlikely(bio_barrier(bio))) {
>  		bio_endio(bio, -EOPNOTSUPP);
>  		return 0;
>  	}
>  
> -	disk_stat_inc(mddev->gendisk, ios[rw]);
> -	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
> +	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	disk_stat_unlock();
>  
>  	tmp_dev = which_dev(mddev, bio->bi_sector);
>  	block = bio->bi_sector >> 1;
> diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
> index c4779cc..182f5a9 100644
> --- a/drivers/md/multipath.c
> +++ b/drivers/md/multipath.c
> @@ -147,6 +147,7 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
>  	struct multipath_bh * mp_bh;
>  	struct multipath_info *multipath;
>  	const int rw = bio_data_dir(bio);
> +	int cpu;
>  
>  	if (unlikely(bio_barrier(bio))) {
>  		bio_endio(bio, -EOPNOTSUPP);
> @@ -158,8 +159,10 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
>  	mp_bh->master_bio = bio;
>  	mp_bh->mddev = mddev;
>  
> -	disk_stat_inc(mddev->gendisk, ios[rw]);
> -	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
> +	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	disk_stat_unlock();
>  
>  	mp_bh->path = multipath_map(conf);
>  	if (mp_bh->path < 0) {
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 1836106..e26030f 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -399,14 +399,17 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  	sector_t chunk;
>  	sector_t block, rsect;
>  	const int rw = bio_data_dir(bio);
> +	int cpu;
>  
>  	if (unlikely(bio_barrier(bio))) {
>  		bio_endio(bio, -EOPNOTSUPP);
>  		return 0;
>  	}
>  
> -	disk_stat_inc(mddev->gendisk, ios[rw]);
> -	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
> +	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	disk_stat_unlock();
>  
>  	chunk_size = mddev->chunk_size >> 10;
>  	chunk_sects = mddev->chunk_size >> 9;
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 0b82030..babb130 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -779,7 +779,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
>  	struct page **behind_pages = NULL;
>  	const int rw = bio_data_dir(bio);
>  	const int do_sync = bio_sync(bio);
> -	int do_barriers;
> +	int cpu, do_barriers;
>  	mdk_rdev_t *blocked_rdev;
>  
>  	/*
> @@ -804,8 +804,10 @@ static int make_request(struct request_queue *q, struct bio * bio)
>  
>  	bitmap = mddev->bitmap;
>  
> -	disk_stat_inc(mddev->gendisk, ios[rw]);
> -	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
> +	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	disk_stat_unlock();
>  
>  	/*
>  	 * make_request() can abort the operation when READA is being
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index d3b9aa0..5ec80da 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -789,6 +789,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
>  	mirror_info_t *mirror;
>  	r10bio_t *r10_bio;
>  	struct bio *read_bio;
> +	int cpu;
>  	int i;
>  	int chunk_sects = conf->chunk_mask + 1;
>  	const int rw = bio_data_dir(bio);
> @@ -843,8 +844,10 @@ static int make_request(struct request_queue *q, struct bio * bio)
>  	 */
>  	wait_barrier(conf);
>  
> -	disk_stat_inc(mddev->gendisk, ios[rw]);
> -	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
> +	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bio));
> +	disk_stat_unlock();
>  
>  	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
>  
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 37e5465..5899f21 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3387,7 +3387,7 @@ static int make_request(struct request_queue *q, struct bio * bi)
>  	sector_t logical_sector, last_sector;
>  	struct stripe_head *sh;
>  	const int rw = bio_data_dir(bi);
> -	int remaining;
> +	int cpu, remaining;
>  
>  	if (unlikely(bio_barrier(bi))) {
>  		bio_endio(bi, -EOPNOTSUPP);
> @@ -3396,8 +3396,10 @@ static int make_request(struct request_queue *q, struct bio * bi)
>  
>  	md_write_start(mddev, bi);
>  
> -	disk_stat_inc(mddev->gendisk, ios[rw]);
> -	disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bi));
> +	cpu = disk_stat_lock();
> +	disk_stat_inc(cpu, mddev->gendisk, ios[rw]);
> +	disk_stat_add(cpu, mddev->gendisk, sectors[rw], bio_sectors(bi));
> +	disk_stat_unlock();
>  
>  	if (rw == READ &&
>  	     mddev->reshape_position == MaxSector &&
> diff --git a/fs/partitions/check.c b/fs/partitions/check.c
> index aa2f5f7..36e0641 100644
> --- a/fs/partitions/check.c
> +++ b/fs/partitions/check.c
> @@ -219,10 +219,11 @@ static ssize_t part_stat_show(struct device *dev,
>  			      struct device_attribute *attr, char *buf)
>  {
>  	struct hd_struct *p = dev_to_part(dev);
> +	int cpu;
>  
> -	preempt_disable();
> -	part_round_stats(p);
> -	preempt_enable();
> +	cpu = disk_stat_lock();
> +	part_round_stats(cpu, p);
> +	disk_stat_unlock();
>  	return sprintf(buf,
>  		"%8lu %8lu %8llu %8u "
>  		"%8lu %8lu %8llu %8u "
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index aeaf59c..09420c3 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -219,16 +219,24 @@ extern void disk_part_iter_exit(struct disk_part_iter *piter);
>  extern struct hd_struct *disk_map_sector_rcu(struct gendisk *disk,
>  					     sector_t sector);
>  
> -/* 
> +/*
>   * Macros to operate on percpu disk statistics:
>   *
> - * The __ variants should only be called in critical sections. The full
> - * variants disable/enable preemption.
> + * {disk|part|all}_stat_{add|sub|inc|dec}() modify the stat counters
> + * and should be called between disk_stat_lock() and
> + * disk_stat_unlock().
> + *
> + * part_stat_read() can be called at any time.
> + *
> + * part_stat_{add|set_all}() and {init|free}_part_stats are for
> + * internal use only.
>   */
> -
>  #ifdef	CONFIG_SMP
> -#define __disk_stat_add(gendiskp, field, addnd) 	\
> -	(per_cpu_ptr(gendiskp->dkstats, smp_processor_id())->field += addnd)
> +#define disk_stat_lock()	({ rcu_read_lock(); get_cpu(); })
> +#define disk_stat_unlock()	do { put_cpu(); rcu_read_unlock(); } while (0)
> +
> +#define disk_stat_add(cpu, gendiskp, field, addnd)			\
> +	(per_cpu_ptr(gendiskp->dkstats, cpu)->field += addnd)
>  
>  #define disk_stat_read(gendiskp, field)					\
>  ({									\
> @@ -239,7 +247,8 @@ extern struct hd_struct *disk_map_sector_rcu(struct gendisk *disk,
>  	res;								\
>  })
>  
> -static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)	{
> +static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)
> +{
>  	int i;
>  
>  	for_each_possible_cpu(i)
> @@ -247,14 +256,14 @@ static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)	{
>  				sizeof(struct disk_stats));
>  }		
>  
> -#define __part_stat_add(part, field, addnd)				\
> -	(per_cpu_ptr(part->dkstats, smp_processor_id())->field += addnd)
> +#define part_stat_add(cpu, part, field, addnd)				\
> +	(per_cpu_ptr(part->dkstats, cpu)->field += addnd)
>  
> -#define __all_stat_add(gendiskp, part, field, addnd, sector)	\
> -({								\
> -	if (part)						\
> -		__part_stat_add(part, field, addnd);		\
> -	__disk_stat_add(gendiskp, field, addnd);		\
> +#define all_stat_add(cpu, gendiskp, part, field, addnd, sector)		\
> +({									\
> +	if (part)							\
> +		part_stat_add(cpu, part, field, addnd);			\
> +	disk_stat_add(cpu, gendiskp, field, addnd);			\
>  })
>  
>  #define part_stat_read(part, field)					\
> @@ -274,10 +283,13 @@ static inline void part_stat_set_all(struct hd_struct *part, int value)
>  		memset(per_cpu_ptr(part->dkstats, i), value,
>  				sizeof(struct disk_stats));
>  }
> -				
> +
>  #else /* !CONFIG_SMP */
> -#define __disk_stat_add(gendiskp, field, addnd) \
> -				(gendiskp->dkstats.field += addnd)
> +#define disk_stat_lock()	({ rcu_read_lock(); 0; })
> +#define disk_stat_unlock()	rcu_read_unlock()
> +
> +#define disk_stat_add(cpu, gendiskp, field, addnd)			\
> +	(gendiskp->dkstats.field += addnd)
>  #define disk_stat_read(gendiskp, field)	(gendiskp->dkstats.field)
>  
>  static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)
> @@ -285,14 +297,14 @@ static inline void disk_stat_set_all(struct gendisk *gendiskp, int value)
>  	memset(&gendiskp->dkstats, value, sizeof (struct disk_stats));
>  }
>  
> -#define __part_stat_add(part, field, addnd) \
> +#define part_stat_add(cpu, part, field, addnd)				\
>  	(part->dkstats.field += addnd)
>  
> -#define __all_stat_add(gendiskp, part, field, addnd, sector)	\
> -({								\
> -	if (part)						\
> -		part->dkstats.field += addnd;			\
> -	__disk_stat_add(gendiskp, field, addnd);		\
> +#define all_stat_add(cpu, gendiskp, part, field, addnd, sector)		\
> +({									\
> +	if (part)							\
> +		part_stat_add(cpu, part, field, addnd);			\
> +	disk_stat_add(cpu, gendiskp, field, addnd);			\
>  })
>  
>  #define part_stat_read(part, field)	(part->dkstats.field)
> @@ -304,63 +316,26 @@ static inline void part_stat_set_all(struct hd_struct *part, int value)
>  
>  #endif /* CONFIG_SMP */
>  
> -#define disk_stat_add(gendiskp, field, addnd)			\
> -	do {							\
> -		preempt_disable();				\
> -		__disk_stat_add(gendiskp, field, addnd);	\
> -		preempt_enable();				\
> -	} while (0)
> -
> -#define __disk_stat_dec(gendiskp, field) __disk_stat_add(gendiskp, field, -1)
> -#define disk_stat_dec(gendiskp, field) disk_stat_add(gendiskp, field, -1)
> -
> -#define __disk_stat_inc(gendiskp, field) __disk_stat_add(gendiskp, field, 1)
> -#define disk_stat_inc(gendiskp, field) disk_stat_add(gendiskp, field, 1)
> -
> -#define __disk_stat_sub(gendiskp, field, subnd) \
> -		__disk_stat_add(gendiskp, field, -subnd)
> -#define disk_stat_sub(gendiskp, field, subnd) \
> -		disk_stat_add(gendiskp, field, -subnd)
> -
> -#define part_stat_add(gendiskp, field, addnd)		\
> -	do {						\
> -		preempt_disable();			\
> -		__part_stat_add(gendiskp, field, addnd);\
> -		preempt_enable();			\
> -	} while (0)
> -
> -#define __part_stat_dec(gendiskp, field) __part_stat_add(gendiskp, field, -1)
> -#define part_stat_dec(gendiskp, field) part_stat_add(gendiskp, field, -1)
> -
> -#define __part_stat_inc(gendiskp, field) __part_stat_add(gendiskp, field, 1)
> -#define part_stat_inc(gendiskp, field) part_stat_add(gendiskp, field, 1)
> -
> -#define __part_stat_sub(gendiskp, field, subnd) \
> -		__part_stat_add(gendiskp, field, -subnd)
> -#define part_stat_sub(gendiskp, field, subnd) \
> -		part_stat_add(gendiskp, field, -subnd)
> -
> -#define all_stat_add(gendiskp, part, field, addnd, sector)	\
> -	do {							\
> -		preempt_disable();				\
> -		__all_stat_add(gendiskp, part, field, addnd, sector);	\
> -		preempt_enable();				\
> -	} while (0)
> -
> -#define __all_stat_dec(gendiskp, field, sector) \
> -		__all_stat_add(gendiskp, field, -1, sector)
> -#define all_stat_dec(gendiskp, field, sector) \
> -		all_stat_add(gendiskp, field, -1, sector)
> -
> -#define __all_stat_inc(gendiskp, part, field, sector) \
> -		__all_stat_add(gendiskp, part, field, 1, sector)
> -#define all_stat_inc(gendiskp, part, field, sector) \
> -		all_stat_add(gendiskp, part, field, 1, sector)
> -
> -#define __all_stat_sub(gendiskp, part, field, subnd, sector) \
> -		__all_stat_add(gendiskp, part, field, -subnd, sector)
> -#define all_stat_sub(gendiskp, part, field, subnd, sector) \
> -		all_stat_add(gendiskp, part, field, -subnd, sector)
> +#define disk_stat_dec(cpu, gendiskp, field)				\
> +	disk_stat_add(cpu, gendiskp, field, -1)
> +#define disk_stat_inc(cpu, gendiskp, field)				\
> +	disk_stat_add(cpu, gendiskp, field, 1)
> +#define disk_stat_sub(cpu, gendiskp, field, subnd)			\
> +	disk_stat_add(cpu, gendiskp, field, -subnd)
> +
> +#define part_stat_dec(cpu, gendiskp, field)				\
> +	part_stat_add(cpu, gendiskp, field, -1)
> +#define part_stat_inc(cpu, gendiskp, field)				\
> +	part_stat_add(cpu, gendiskp, field, 1)
> +#define part_stat_sub(cpu, gendiskp, field, subnd)			\
> +	part_stat_add(cpu, gendiskp, field, -subnd)
> +
> +#define all_stat_dec(cpu, gendiskp, field, sector)			\
> +	all_stat_add(cpu, gendiskp, field, -1, sector)
> +#define all_stat_inc(cpu, gendiskp, part, field, sector)		\
> +	all_stat_add(cpu, gendiskp, part, field, 1, sector)
> +#define all_stat_sub(cpu, gendiskp, part, field, subnd, sector)		\
> +	all_stat_add(cpu, gendiskp, part, field, -subnd, sector)
>  
>  /* Inlines to alloc and free disk stats in struct gendisk */
>  #ifdef  CONFIG_SMP
> @@ -411,8 +386,8 @@ static inline void free_part_stats(struct hd_struct *part)
>  #endif	/* CONFIG_SMP */
>  
>  /* drivers/block/ll_rw_blk.c */
> -extern void disk_round_stats(struct gendisk *disk);
> -extern void part_round_stats(struct hd_struct *part);
> +extern void disk_round_stats(int cpu, struct gendisk *disk);
> +extern void part_round_stats(int cpu, struct hd_struct *part);
>  
>  /* drivers/block/genhd.c */
>  extern int get_blkdev_list(char *, int);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3
  2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
                   ` (8 preceding siblings ...)
  2008-08-25 10:47 ` [PATCH 09/09] block: implement CONFIG_DEBUG_BLOCK_EXT_DEVT Tejun Heo
@ 2008-08-25 18:33 ` Jens Axboe
  9 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2008-08-25 18:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: James.Bottomley, bzolnier, bharrosh, greg.freemyer, linux-scsi,
	brking, liml, viro, linux-ide, neilb, linux-kernel

On Mon, Aug 25 2008, Tejun Heo wrote:
> Hello, all.
> 
> This is the third take of block-extended-devt patchset and contains
> the following 10 patches.

Applied 1-9, thanks Tejun!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-08-25 18:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-25 10:47 [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Tejun Heo
2008-08-25 10:47 ` [PATCH 01/09] block: misc updates Tejun Heo
2008-08-25 10:47 ` [PATCH 02/09] block: make variable and argument names more consistent Tejun Heo
2008-08-25 10:47 ` [PATCH 03/09] block: don't depend on consecutive minor space Tejun Heo
2008-08-25 10:47 ` [PATCH 04/09] block: fix disk->part[] dereferencing race Tejun Heo
2008-08-25 10:47 ` [PATCH 05/09] block: fix diskstats access Tejun Heo
2008-08-25 10:58   ` Peter Zijlstra
2008-08-25 10:47 ` [PATCH 06/09] block: implement extended dev numbers Tejun Heo
2008-08-25 10:47 ` [PATCH 07/09] block: adjust formatting for large minors and add ext_range sysfs attr Tejun Heo
2008-08-25 10:47 ` [PATCH 08/09] sd/ide-disk: apply extended minors to sd and ide Tejun Heo
2008-08-25 10:47 ` [PATCH 09/09] block: implement CONFIG_DEBUG_BLOCK_EXT_DEVT Tejun Heo
2008-08-25 18:33 ` [PATCHSET 2/3 blk-for-2.6.28] block: implement extended devt, take #3 Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).