linux-ide.vger.kernel.org archive mirror
* [PATCH 0 of 9] I/O topology patch kit
@ 2009-04-24  5:32 Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 1 of 9] block: Expose stacked device queues in sysfs Martin K. Petersen
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide


Second take of the I/O topology patches.

Changes:

 - PATCH 2 of 8: New naming as suggested by Jens. Document sysfs ABI.
 - PATCH [346] of 8: Updated to use the new blk_queue_io_foo() calls.
 - PATCH 8: Wilcox & Petersen libata READ CAPACITY(16) convergence.
 - PATCH 9: libata rotation rate heuristics

Wrt. patch 9: Flame away :)

-- 
Martin K. Petersen	Oracle Linux Engineering



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1 of 9] block: Expose stacked device queues in sysfs
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 2 of 9] block: Export I/O topology for block devices and partitions Martin K. Petersen
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

Currently, stacking devices do not have a queue directory in sysfs.
However, many I/O characteristics, such as sector size and maximum
request size, are queue properties.

This patch enables the queue directory for MD/DM devices.  The elevator
code has been modified to deal with queues that do not have an I/O
scheduler.
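From userspace, the visible effect is that a stacked queue now has a queue/scheduler attribute that reads "none". Below is a minimal sketch of a userspace helper that extracts the active scheduler from that attribute's text. It handles both the bracketed list printed for queues with an elevator and the bare "none" case. The function name active_scheduler() is hypothetical and not part of this patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not part of this patch): copy the
 * active I/O scheduler name out of text read from
 * /sys/block/<disk>/queue/scheduler.  Queues with an elevator list the
 * available schedulers with the active one in brackets; a stacked
 * MD/DM queue without an elevator reads "none".
 * Returns 0 on success, -1 if the text cannot be parsed or does not fit.
 */
static int active_scheduler(const char *attr, char *out, size_t outlen)
{
	const char *start = strchr(attr, '[');
	const char *end;
	size_t len;

	assert(out != NULL && outlen > 0);

	if (start) {
		/* The active scheduler is the bracketed entry */
		start++;
		end = strchr(start, ']');
		if (!end)
			return -1;
	} else {
		/* No brackets: a single word such as "none" */
		start = attr;
		end = start + strcspn(start, " \n");
	}

	len = (size_t)(end - start);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Writes to the attribute on such a queue are accepted but ignored, because elv_iosched_store() returns count early when q->elevator is NULL.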

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 15 insertions(+), 4 deletions(-)
block/blk-sysfs.c |    6 +++---
block/elevator.c  |   13 ++++++++++++-



diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -393,9 +393,6 @@ int blk_register_queue(struct gendisk *d
 	if (WARN_ON(!q))
 		return -ENXIO;
 
-	if (!q->request_fn)
-		return 0;
-
 	ret = kobject_add(&q->kobj, kobject_get(&disk_to_dev(disk)->kobj),
 			  "%s", "queue");
 	if (ret < 0)
@@ -403,6 +400,9 @@ int blk_register_queue(struct gendisk *d
 
 	kobject_uevent(&q->kobj, KOBJ_ADD);
 
+	if (!q->request_fn)
+		return 0;
+
 	ret = elv_register_queue(q);
 	if (ret) {
 		kobject_uevent(&q->kobj, KOBJ_REMOVE);
diff --git a/block/elevator.c b/block/elevator.c
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -592,6 +592,9 @@ void elv_drain_elevator(struct request_q
  */
 void elv_quiesce_start(struct request_queue *q)
 {
+	if (!q->elevator)
+		return;
+
 	queue_flag_set(QUEUE_FLAG_ELVSWITCH, q);
 
 	/*
@@ -1179,6 +1182,9 @@ ssize_t elv_iosched_store(struct request
 	char elevator_name[ELV_NAME_MAX];
 	struct elevator_type *e;
 
+	if (!q->elevator)
+		return count;
+
 	strlcpy(elevator_name, name, sizeof(elevator_name));
 	strstrip(elevator_name);
 
@@ -1202,10 +1208,15 @@ ssize_t elv_iosched_store(struct request
 ssize_t elv_iosched_show(struct request_queue *q, char *name)
 {
 	struct elevator_queue *e = q->elevator;
-	struct elevator_type *elv = e->elevator_type;
+	struct elevator_type *elv;
 	struct elevator_type *__e;
 	int len = 0;
 
+	if (!q->elevator)
+		return sprintf(name, "none\n");
+
+	elv = e->elevator_type;
+
 	spin_lock(&elv_list_lock);
 	list_for_each_entry(__e, &elv_list, list) {
 		if (!strcmp(elv->elevator_name, __e->elevator_name))




* [PATCH 2 of 9] block: Export I/O topology for block devices and partitions
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 1 of 9] block: Expose stacked device queues in sysfs Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24 12:14   ` Kay Sievers
  2009-04-24  5:32 ` [PATCH 3 of 9] MD: Use new topology calls to indicate alignment and I/O sizes Martin K. Petersen
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

To support devices with physical block sizes bigger than 512 bytes, we
need to ensure proper alignment.  This patch adds support for exposing
I/O topology characteristics as devices are stacked.

  hardsect_size remains unchanged.  It is the smallest atomic unit the
  device can address (i.e. the logical block size).

  io_granularity indicates the smallest I/O the device can access
  without incurring a read-modify-write penalty.  The granularity is set
  by low-level drivers; from then on it is purely internal to the
  stacking logic.

  The io_min parameter is the smallest preferred I/O size reported by
  the device.  In many cases this is the same as the granularity.
  However, io_min can be scaled up when stacking (e.g. when a RAID5
  chunk size is larger than the physical sector size).  io_min is
  exposed in sysfs as minimum_io_size.

  The io_opt characteristic indicates the optimal I/O size reported by
  the device.  This is usually the stripe width for arrays.  The value
  is exposed in sysfs as optimal_io_size.

  The io_alignment parameter indicates the number of bytes by which the
  start of the device/partition is offset from the device's natural
  alignment (the granularity boundary).  Partitioning tools and MD/DM
  tools can use this to align filesystems to the proper boundaries.
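The alignment arithmetic behind io_alignment and the per-partition alignment attribute can be modelled in a few lines of plain C. This is a userspace sketch modelled on queue_sector_alignment() from this patch, not kernel code. The function name and parameters are illustrative. Like the patch, it assumes 512-byte sectors and a power-of-two boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model of queue_sector_alignment() (names are not kernel
 * interfaces):
 *
 *   start_sect   - partition start in 512-byte sectors
 *   io_alignment - bytes the device start is offset from its natural
 *                  alignment (what blk_queue_io_alignment() stores)
 *   boundary     - power-of-two boundary in bytes to check against
 *                  (the patch masks with io_min)
 *
 * Returns how many bytes the partition start lies past the nearest
 * boundary; 0 means the partition is naturally aligned.
 */
static uint64_t sector_alignment(uint64_t start_sect, uint64_t io_alignment,
				 uint64_t boundary)
{
	/* The mask trick below is only valid for powers of two */
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

	/*
	 * Unsigned wrap-around is harmless here: 2^64 is a multiple of
	 * any power-of-two boundary, so the masked value is still the
	 * correct remainder.
	 */
	return ((start_sect << 9) - io_alignment) & (boundary - 1);
}
```

For example, take a hypothetical drive with 4096-byte physical blocks whose lowest aligned LBA is 7, so that io_alignment = 7 * 512 = 3584. A partition at the legacy DOS offset of sector 63 yields 0 (aligned). A partition starting at sector 2048 on the same drive is 512 bytes off.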

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
7 files changed, 245 insertions(+), 4 deletions(-)
Documentation/ABI/testing/sysfs-block |   41 ++++++++++
block/blk-settings.c                  |  135 ++++++++++++++++++++++++++++++++-
block/blk-sysfs.c                     |   22 +++++
block/genhd.c                         |   10 ++
fs/partitions/check.c                 |   10 ++
include/linux/blkdev.h                |   30 +++++++
include/linux/genhd.h                 |    1 



diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -60,3 +60,44 @@ Description:
 		Indicates whether the block layer should automatically
 		generate checksums for write requests bound for
 		devices that support receiving integrity metadata.
+
+What:		/sys/block/<disk>/alignment
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Storage devices may report a physical block size that is
+		bigger than the logical block size (for instance a drive
+		with 4KB physical sectors exposing 512-byte logical
+		blocks to the operating system).  This parameter
+		indicates how many bytes the beginning of the device is
+		offset from the disk's natural alignment.
+
+What:		/sys/block/<disk>/<partition>/alignment
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Storage devices may report a physical block size that is
+		bigger than the logical block size (for instance a drive
+		with 4KB physical sectors exposing 512-byte logical
+		blocks to the operating system).  This parameter
+		indicates how many bytes the beginning of the partition
+		is offset from the disk's natural alignment.
+
+What:		/sys/block/<disk>/queue/minimum_io_size
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Storage devices may report a preferred minimum I/O size,
+		which is the smallest request the device can perform
+		without incurring a read-modify-write penalty.  For disk
+		drives this is often the physical block size.  For RAID
+		arrays it is often the stripe chunk size.
+
+What:		/sys/block/<disk>/queue/optimal_io_size
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Storage devices may report an optimal I/O size, which is
+		the device's preferred unit of receiving I/O.  This is
+		rarely reported for disk drives.  For RAID devices it is
+		usually the stripe width or the internal block size.
diff --git a/block/blk-settings.c b/block/blk-settings.c
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -292,22 +292,87 @@ EXPORT_SYMBOL(blk_queue_max_segment_size
  *
  * Description:
  *   This should typically be set to the lowest possible sector size
- *   that the hardware can operate on (possible without reverting to
- *   even internal read-modify-write operations). Usually the default
- *   of 512 covers most hardware.
+ *   (logical block size) that the hardware can operate on.  Usually the
+ *   default of 512 covers most hardware.
  **/
 void blk_queue_hardsect_size(struct request_queue *q, unsigned short size)
 {
-	q->hardsect_size = size;
+	q->hardsect_size = q->io_granularity = size;
 }
 EXPORT_SYMBOL(blk_queue_hardsect_size);
 
+/**
+ * blk_queue_io_granularity - set I/O granularity for the queue
+ * @q:  the request queue for the device
+ * @size:  the I/O granularity, in bytes
+ *
+ * Description:
+ *   This should typically be set to the lowest possible sector size
+ *   that the hardware can operate on without reverting to
+ *   read-modify-write operations.
+ **/
+void blk_queue_io_granularity(struct request_queue *q, unsigned short size)
+{
+	q->io_granularity = size;
+}
+EXPORT_SYMBOL(blk_queue_io_granularity);
+
+/**
+ * blk_queue_io_alignment - set physical block alignment for the queue
+ * @q:  the request queue for the device
+ * @alignment:  alignment offset in bytes
+ *
+ * Description:
+ *   Some devices are naturally misaligned to compensate for things like
+ *   the legacy DOS partition table 63-sector offset.  Low-level drivers
+ *   should call this function for devices whose first sector is not
+ *   naturally aligned.
+ */
+void blk_queue_io_alignment(struct request_queue *q, unsigned int alignment)
+{
+	q->io_alignment = alignment & (q->io_granularity - 1);
+	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
+}
+EXPORT_SYMBOL(blk_queue_io_alignment);
+
 /*
  * Returns the minimum that is _not_ zero, unless both are zero.
  */
 #define min_not_zero(l, r) (l == 0) ? r : ((r == 0) ? l : min(l, r))
 
 /**
+ * blk_queue_io_min - set minimum request size for the queue
+ * @q:  the request queue for the device
+ * @min:  smallest I/O size in bytes
+ *
+ * Description:
+ *   Some devices have an internal block size bigger than the reported
+ *   hardware sector size.  This function can be used to signal the
+ *   smallest I/O the device can perform without incurring a performance
+ *   penalty.
+ */
+void blk_queue_io_min(struct request_queue *q, unsigned int min)
+{
+	q->io_min = min;
+}
+EXPORT_SYMBOL(blk_queue_io_min);
+
+/**
+ * blk_queue_io_opt - set optimal request size for the queue
+ * @q:  the request queue for the device
+ * @opt:  optimal request size in bytes
+ *
+ * Description:
+ *   Drivers can call this function to set the preferred I/O request
+ *   size for devices that report such a value.
+ */
+void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
+{
+	q->io_opt = opt;
+}
+EXPORT_SYMBOL(blk_queue_io_opt);
+
+/**
  * blk_queue_stack_limits - inherit underlying queue limits for stacked drivers
  * @t:	the stacking driver (top)
  * @b:  the underlying device (bottom)
@@ -335,6 +400,68 @@ void blk_queue_stack_limits(struct reque
 EXPORT_SYMBOL(blk_queue_stack_limits);
 
 /**
+ * blk_queue_stack_topology - adjust queue limits for stacked drivers
+ * @t:	the stacking driver (top)
+ * @bdev:  the underlying block device (bottom)
+ * @offset:  offset to beginning of data within component device
+ **/
+void blk_queue_stack_topology(struct request_queue *t, struct block_device *bdev,
+			      sector_t offset)
+{
+	struct request_queue *b = bdev_get_queue(bdev);
+	int misaligned;
+
+	/* zero is "infinity" */
+	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
+	t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
+	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, b->seg_boundary_mask);
+
+	t->max_phys_segments = min_not_zero(t->max_phys_segments, b->max_phys_segments);
+	t->max_hw_segments = min_not_zero(t->max_hw_segments, b->max_hw_segments);
+	t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size);
+	t->hardsect_size = max(t->hardsect_size, b->hardsect_size);
+	t->io_min = max(t->io_min, b->io_min);
+	t->io_granularity = max(t->io_granularity, b->io_granularity);
+
+	misaligned = 0;
+	offset += get_start_sect(bdev) << 9;
+
+	/* Bottom device offset aligned? */
+	if (offset && (offset & (b->io_granularity - 1)) != b->io_alignment) {
+		misaligned = 1;
+		goto out;
+	}
+
+	/* If top has no alignment, inherit from bottom */
+	if (!t->io_alignment)
+		t->io_alignment = b->io_alignment & (b->io_granularity - 1);
+
+	/* Top alignment on logical block boundary? */
+	if (t->io_alignment & (t->hardsect_size - 1)) {
+		misaligned = 1;
+		goto out;
+	}
+
+out:
+	if (!t->queue_lock)
+		WARN_ON_ONCE(1);
+	else if (misaligned || !test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
+		unsigned long flags;
+
+		spin_lock_irqsave(t->queue_lock, flags);
+
+		if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags))
+			queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
+
+		if (misaligned)
+			queue_flag_set(QUEUE_FLAG_MISALIGNED, t);
+
+		spin_unlock_irqrestore(t->queue_lock, flags);
+	}
+}
+EXPORT_SYMBOL(blk_queue_stack_topology);
+
+/**
  * blk_queue_dma_pad - set pad mask
  * @q:     the request queue for the device
  * @mask:  pad mask
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -105,6 +105,16 @@ static ssize_t queue_hw_sector_size_show
 	return queue_var_show(q->hardsect_size, page);
 }
 
+static ssize_t queue_io_min_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->io_min, page);
+}
+
+static ssize_t queue_io_opt_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->io_opt, page);
+}
+
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -256,6 +266,16 @@ static struct queue_sysfs_entry queue_hw
 	.show = queue_hw_sector_size_show,
 };
 
+static struct queue_sysfs_entry queue_io_min_entry = {
+	.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
+	.show = queue_io_min_show,
+};
+
+static struct queue_sysfs_entry queue_io_opt_entry = {
+	.attr = {.name = "optimal_io_size", .mode = S_IRUGO },
+	.show = queue_io_opt_show,
+};
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
 	.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_nonrot_show,
@@ -287,6 +307,8 @@ static struct attribute *default_attrs[]
 	&queue_max_sectors_entry.attr,
 	&queue_iosched_entry.attr,
 	&queue_hw_sector_size_entry.attr,
+	&queue_io_min_entry.attr,
+	&queue_io_opt_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
diff --git a/block/genhd.c b/block/genhd.c
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -848,11 +848,20 @@ static ssize_t disk_capability_show(stru
 	return sprintf(buf, "%x\n", disk->flags);
 }
 
+static ssize_t disk_alignment_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	return sprintf(buf, "%d\n", queue_io_alignment(disk->queue));
+}
+
 static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
 static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL);
 static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
 static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL);
 static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
+static DEVICE_ATTR(alignment, S_IRUGO, disk_alignment_show, NULL);
 static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL);
 static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
@@ -871,6 +880,7 @@ static struct attribute *disk_attrs[] = 
 	&dev_attr_removable.attr,
 	&dev_attr_ro.attr,
 	&dev_attr_size.attr,
+	&dev_attr_alignment.attr,
 	&dev_attr_capability.attr,
 	&dev_attr_stat.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -219,6 +219,13 @@ ssize_t part_size_show(struct device *de
 	return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects);
 }
 
+ssize_t part_alignment_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct hd_struct *p = dev_to_part(dev);
+	return sprintf(buf, "%llu\n", (unsigned long long)p->alignment);
+}
+
 ssize_t part_stat_show(struct device *dev,
 		       struct device_attribute *attr, char *buf)
 {
@@ -272,6 +279,7 @@ ssize_t part_fail_store(struct device *d
 static DEVICE_ATTR(partition, S_IRUGO, part_partition_show, NULL);
 static DEVICE_ATTR(start, S_IRUGO, part_start_show, NULL);
 static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
+static DEVICE_ATTR(alignment, S_IRUGO, part_alignment_show, NULL);
 static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 static struct device_attribute dev_attr_fail =
@@ -282,6 +290,7 @@ static struct attribute *part_attrs[] = 
 	&dev_attr_partition.attr,
 	&dev_attr_start.attr,
 	&dev_attr_size.attr,
+	&dev_attr_alignment.attr,
 	&dev_attr_stat.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
@@ -383,6 +392,7 @@ struct hd_struct *add_partition(struct g
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
+	p->alignment = queue_sector_alignment(disk->queue, start);
 	p->nr_sects = len;
 	p->partno = partno;
 	p->policy = get_disk_ro(disk);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -402,6 +402,10 @@ struct request_queue
 	unsigned short		max_hw_segments;
 	unsigned short		hardsect_size;
 	unsigned int		max_segment_size;
+	unsigned int		io_alignment;
+	unsigned int		io_granularity;
+	unsigned int		io_min;
+	unsigned int		io_opt;
 
 	unsigned long		seg_boundary_mask;
 	void			*dma_drain_buffer;
@@ -461,6 +465,7 @@ struct request_queue
 #define QUEUE_FLAG_NONROT      14	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT     15	/* do IO stats */
+#define QUEUE_FLAG_MISALIGNED  16	/* bdev not aligned to disk */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_CLUSTER) |		\
@@ -877,7 +882,15 @@ extern void blk_queue_max_phys_segments(
 extern void blk_queue_max_hw_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
 extern void blk_queue_hardsect_size(struct request_queue *, unsigned short);
+extern void blk_queue_io_granularity(struct request_queue *, unsigned short);
+extern void blk_queue_io_alignment(struct request_queue *q,
+				unsigned int alignment);
+extern void blk_queue_io_min(struct request_queue *q, unsigned int min);
+extern void blk_queue_io_opt(struct request_queue *q, unsigned int opt);
 extern void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b);
+extern void blk_queue_stack_topology(struct request_queue *t,
+				     struct block_device *bdev,
+				     sector_t offset);
 extern void blk_queue_dma_pad(struct request_queue *, unsigned int);
 extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
 extern int blk_queue_dma_drain(struct request_queue *q,
@@ -978,6 +991,23 @@ static inline int bdev_hardsect_size(str
 	return queue_hardsect_size(bdev_get_queue(bdev));
 }
 
+static inline int queue_io_alignment(struct request_queue *q)
+{
+	if (q && test_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags))
+		return -1;
+
+	if (q && q->io_alignment)
+		return q->io_alignment;
+
+	return 0;
+}
+
+static inline int queue_sector_alignment(struct request_queue *q,
+					 sector_t sector)
+{
+	return ((sector << 9) - q->io_alignment) & (q->io_min - 1);
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -90,6 +90,7 @@ struct disk_stats {
 struct hd_struct {
 	sector_t start_sect;
 	sector_t nr_sects;
+	sector_t alignment;
 	struct device __dev;
 	struct kobject *holder_dir;
 	int policy, partno;




* [PATCH 3 of 9] MD: Use new topology calls to indicate alignment and I/O sizes
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 1 of 9] block: Expose stacked device queues in sysfs Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 2 of 9] block: Export I/O topology for block devices and partitions Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 4 of 9] sd: Physical block size and alignment support Martin K. Petersen
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

Switch MD over to the new blk_queue_stack_topology() function, which
checks for alignment and adjusts preferred I/O sizes when stacking.

Also warn if an MD device contains misaligned components.
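The I/O size hints that the RAID0, RAID10 and RAID5/6 hunks set can be summarised in a small function. This is an illustrative model, not MD code. struct io_hints and md_io_hints() are hypothetical names, and chunk_size is in bytes. RAID10 is modelled with the two copies that the raid10.c hunk implicitly assumes when it halves chunk_size * raid_disks. Levels not listed here are left at zero, including RAID4, which is also handled by raid5.c. In the patch, such levels only get the io_min that blk_queue_stack_topology() inherits from the component devices. That function does not stack io_opt.

```c
#include <assert.h>

/* Hypothetical summary of the per-level hints; 0 means "not reported" */
struct io_hints {
	unsigned int io_min;	/* bytes */
	unsigned int io_opt;	/* bytes */
};

static struct io_hints md_io_hints(int level, unsigned int chunk_size,
				   int raid_disks)
{
	struct io_hints h = { 0, 0 };

	assert(raid_disks > 0);

	switch (level) {
	case 0:		/* striping: full stripe spans every disk */
		h.io_min = chunk_size;
		h.io_opt = chunk_size * raid_disks;
		break;
	case 10:	/* assumes two copies of each chunk */
		h.io_min = chunk_size;
		h.io_opt = (chunk_size * raid_disks) >> 1;
		break;
	case 5:		/* one parity chunk per stripe */
		h.io_min = chunk_size;
		h.io_opt = chunk_size * (raid_disks - 1);
		break;
	case 6:		/* two parity chunks per stripe */
		h.io_min = chunk_size;
		h.io_opt = chunk_size * (raid_disks - 2);
		break;
	default:	/* not modelled in this sketch */
		break;
	}
	return h;
}
```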

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
6 files changed, 73 insertions(+), 15 deletions(-)
drivers/md/linear.c    |   10 ++++++++--
drivers/md/multipath.c |    5 ++---
drivers/md/raid0.c     |   14 ++++++++++++--
drivers/md/raid1.c     |   19 +++++++++++++++----
drivers/md/raid10.c    |   23 +++++++++++++++++++----
drivers/md/raid5.c     |   17 +++++++++++++++++



diff --git a/drivers/md/linear.c b/drivers/md/linear.c
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -139,8 +139,8 @@ static linear_conf_t *linear_conf(mddev_
 
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -154,6 +154,12 @@ static linear_conf_t *linear_conf(mddev_
 
 		cnt++;
 	}
+
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	if (cnt != raid_disks) {
 		printk("linear: not enough drives present. Aborting!\n");
 		goto out;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -294,7 +294,7 @@ static int multipath_add_disk(mddev_t *m
 	for (path = first; path <= last; path++)
 		if ((p=conf->multipaths+path)->rdev == NULL) {
 			q = rdev->bdev->bd_disk->queue;
-			blk_queue_stack_limits(mddev->queue, q);
+			blk_queue_stack_topology(mddev->queue, rdev->bdev, 0);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
@@ -461,8 +461,7 @@ static int multipath_run (mddev_t *mddev
 		disk = conf->multipaths + disk_idx;
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev, 0);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -120,6 +120,10 @@ static int create_strip_zones (mddev_t *
 	zone = &conf->strip_zone[0];
 	cnt = 0;
 	smallest = NULL;
+
+	blk_queue_io_min(mddev->queue, mddev->chunk_size);
+	blk_queue_io_opt(mddev->queue, mddev->chunk_size * mddev->raid_disks);
+
 	zone->dev = conf->devlist;
 	list_for_each_entry(rdev1, &mddev->disks, same_set) {
 		int j = rdev1->raid_disk;
@@ -136,8 +140,8 @@ static int create_strip_zones (mddev_t *
 		}
 		zone->dev[j] = rdev1;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev1->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev1->bdev,
+					 rdev1->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -151,6 +155,12 @@ static int create_strip_zones (mddev_t *
 			smallest = rdev1;
 		cnt++;
 	}
+
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	if (cnt != mddev->raid_disks) {
 		printk(KERN_ERR "raid0: too few disks (%d of %d) - "
 			"aborting!\n", cnt, mddev->raid_disks);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1123,8 +1123,8 @@ static int raid1_add_disk(mddev_t *mddev
 	for (mirror = first; mirror <= last; mirror++)
 		if ( !(p=conf->mirrors+mirror)->rdev) {
 
-			blk_queue_stack_limits(mddev->queue,
-					       rdev->bdev->bd_disk->queue);
+			blk_queue_stack_topology(mddev->queue, rdev->bdev,
+						 rdev->data_offset);
 			/* as we don't honour merge_bvec_fn, we must never risk
 			 * violating it, so limit ->max_sector to one PAGE, as
 			 * a one page request is never in violation.
@@ -1145,6 +1145,11 @@ static int raid1_add_disk(mddev_t *mddev
 			break;
 		}
 
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	print_conf(conf);
 	return err;
 }
@@ -1989,8 +1994,8 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -2001,6 +2006,12 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
+
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	conf->raid_disks = mddev->raid_disks;
 	conf->mddev = mddev;
 	INIT_LIST_HEAD(&conf->retry_list);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1151,8 +1151,8 @@ static int raid10_add_disk(mddev_t *mdde
 	for ( ; mirror <= last ; mirror++)
 		if ( !(p=conf->mirrors+mirror)->rdev) {
 
-			blk_queue_stack_limits(mddev->queue,
-					       rdev->bdev->bd_disk->queue);
+			blk_queue_stack_topology(mddev->queue, rdev->bdev,
+						 rdev->data_offset);
 			/* as we don't honour merge_bvec_fn, we must never risk
 			 * violating it, so limit ->max_sector to one PAGE, as
 			 * a one page request is never in violation.
@@ -1170,6 +1170,11 @@ static int raid10_add_disk(mddev_t *mdde
 			break;
 		}
 
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	print_conf(conf);
 	return err;
 }
@@ -2129,6 +2134,10 @@ static int run(mddev_t *mddev)
 	spin_lock_init(&conf->device_lock);
 	mddev->queue->queue_lock = &conf->device_lock;
 
+	blk_queue_io_min(mddev->queue, mddev->chunk_size);
+	blk_queue_io_opt(mddev->queue,
+			 (mddev->chunk_size * mddev->raid_disks) >> 1);
+
 	list_for_each_entry(rdev, &mddev->disks, same_set) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -2138,8 +2147,8 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -2150,6 +2159,12 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
+
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4601,6 +4601,23 @@ static int run(mddev_t *mddev)
 	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
 
 	blk_queue_merge_bvec(mddev->queue, raid5_mergeable_bvec);
+	blk_queue_io_min(mddev->queue, mddev->chunk_size);
+
+	if (mddev->level == 6)
+		blk_queue_io_opt(mddev->queue,
+				 mddev->chunk_size * (mddev->raid_disks - 2));
+	else
+		blk_queue_io_opt(mddev->queue,
+				 mddev->chunk_size * (mddev->raid_disks - 1));
+
+	list_for_each_entry(rdev, &mddev->disks, same_set)
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
+
+	if (queue_io_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
 
 	return 0;
 abort:




* [PATCH 4 of 9] sd: Physical block size and alignment support
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
                   ` (2 preceding siblings ...)
  2009-04-24  5:32 ` [PATCH 3 of 9] MD: Use new topology calls to indicate alignment and I/O sizes Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 5 of 9] sd: Detect non-rotational devices Martin K. Petersen
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

Extract the physical block size and lowest aligned LBA from the READ
CAPACITY(16) response and adjust the queue parameters accordingly.

Report physical block size and alignment when applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 22 insertions(+), 2 deletions(-)
drivers/scsi/sd.c |   23 +++++++++++++++++++++--
drivers/scsi/sd.h |    1 +



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1306,6 +1306,7 @@ static int read_capacity_16(struct scsi_
 	int sense_valid = 0;
 	int the_result;
 	int retries = 3;
+	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
 
@@ -1346,6 +1347,16 @@ static int read_capacity_16(struct scsi_
 
 	sector_size =	(buffer[8] << 24) | (buffer[9] << 16) |
 			(buffer[10] << 8) | buffer[11];
+
+	/* Logical blocks per physical block exponent */
+	sdkp->hw_sector_size = (1 << (buffer[13] & 0xf)) * sector_size;
+
+	/* Lowest aligned logical block */
+	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
+	blk_queue_io_alignment(sdp->request_queue, alignment);
+	if (alignment && sdkp->first_scan)
+		sd_printk(KERN_NOTICE, sdkp, "%u-byte alignment\n", alignment);
+
 	lba =  (((u64)buffer[0] << 56) | ((u64)buffer[1] << 48) |
 		((u64)buffer[2] << 40) | ((u64)buffer[3] << 32) |
 		((u64)buffer[4] << 24) | ((u64)buffer[5] << 16) |
@@ -1402,6 +1413,7 @@ static int read_capacity_10(struct scsi_
 
 	sector_size =	(buffer[4] << 24) | (buffer[5] << 16) |
 			(buffer[6] << 8) | buffer[7];
+	sdkp->hw_sector_size = sector_size;
 	lba =	(buffer[0] << 24) | (buffer[1] << 16) |
 		(buffer[2] << 8) | buffer[3];
 
@@ -1526,11 +1538,17 @@ got_data:
 		string_get_size(sz, STRING_UNITS_10, cap_str_10,
 				sizeof(cap_str_10));
 
-		if (sdkp->first_scan || old_capacity != sdkp->capacity)
+		if (sdkp->first_scan || old_capacity != sdkp->capacity) {
 			sd_printk(KERN_NOTICE, sdkp,
-				  "%llu %d-byte hardware sectors: (%s/%s)\n",
+				  "%llu %d-byte logical blocks: (%s/%s)\n",
 				  (unsigned long long)sdkp->capacity,
 				  sector_size, cap_str_10, cap_str_2);
+
+			if (sdkp->hw_sector_size != sector_size)
+				sd_printk(KERN_NOTICE, sdkp,
+					  "%u-byte physical blocks\n",
+					  sdkp->hw_sector_size);
+		}
 	}
 
 	/* Rescale capacity to 512-byte units */
@@ -1543,6 +1561,7 @@ got_data:
 	else if (sector_size == 256)
 		sdkp->capacity >>= 1;
 
+	blk_queue_io_granularity(sdp->request_queue, sdkp->hw_sector_size);
 	sdkp->device->sector_size = sector_size;
 }
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -45,6 +45,7 @@ struct scsi_disk {
 	unsigned int	openers;	/* protected by BKL for now, yuck */
 	sector_t	capacity;	/* size in 512-byte sectors */
 	u32		index;
+	unsigned short	hw_sector_size;
 	u8		media_present;
 	u8		write_prot;
 	u8		protection_type;/* Data Integrity Field */



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 5 of 9] sd: Detect non-rotational devices
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
                   ` (3 preceding siblings ...)
  2009-04-24  5:32 ` [PATCH 4 of 9] sd: Physical block size and alignment support Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 6 of 9] sd: Block limits VPD support Martin K. Petersen
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

Detect non-rotational devices and set the queue flag accordingly.
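Roughly, the check reads the big-endian medium rotation rate from bytes 4-5 of the Block Device Characteristics VPD page (B1h) and treats a value of exactly 1 as non-rotating. The standalone sketch below shows only that parsing step; the helper names are hypothetical, and the page-code and length checks are extra sanity checks that the patch itself does not perform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Medium rotation rate: bytes 4-5 (big-endian) of the B1h page. */
static uint16_t vpd_b1_rotation_rate(const uint8_t *page, size_t len)
{
	if (page == NULL || len < 6 || page[1] != 0xb1)
		return 0;	/* 0 = rate not reported */
	return (uint16_t)((page[4] << 8) | page[5]);
}

/* A rotation rate of exactly 1 means a non-rotating medium (e.g. SSD). */
static bool vpd_b1_is_nonrot(const uint8_t *page, size_t len)
{
	return vpd_b1_rotation_rate(page, len) == 1;
}
```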

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 27 insertions(+)
drivers/scsi/sd.c |   27 +++++++++++++++++++++++++++



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -50,6 +50,7 @@
 #include <linux/string_helpers.h>
 #include <linux/async.h>
 #include <asm/uaccess.h>
+#include <asm/unaligned.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -1800,6 +1801,29 @@ void sd_read_app_tag_own(struct scsi_dis
 }
 
 /**
+ * sd_read_block_characteristics - Query block dev. characteristics
+ * @disk: disk to query
+ */
+static void sd_read_block_characteristics(struct scsi_disk *sdkp)
+{
+	char *buffer;
+	u16 rot;
+
+	/* Block Device Characteristics VPD */
+	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
+
+	if (buffer == NULL)
+		return;
+
+	rot = get_unaligned_be16(&buffer[4]);
+
+	if (rot == 1)
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
+
+	kfree(buffer);
+}
+
+/**
  *	sd_revalidate_disk - called the first time a new disk is seen,
  *	performs disk spin up, read_capacity, etc.
  *	@disk: struct gendisk we care about
@@ -1836,6 +1860,7 @@ static int sd_revalidate_disk(struct gen
 	 */
 	if (sdkp->media_present) {
 		sd_read_capacity(sdkp, buffer);
+		sd_read_block_characteristics(sdkp);
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
@@ -1976,6 +2001,8 @@ static void sd_probe_async(void *data, a
 	add_disk(gd);
 	sd_dif_config_host(sdkp);
 
+	sd_revalidate_disk(gd);
+
 	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
 		  sdp->removable ? "removable " : "");
 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 6 of 9] sd: Block limits VPD support
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
                   ` (4 preceding siblings ...)
  2009-04-24  5:32 ` [PATCH 5 of 9] sd: Detect non-rotational devices Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment Martin K. Petersen
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

Query the block limits VPD page and adjust queue minimum and optimal I/O
sizes.
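Below is a standalone sketch of the conversion, assuming the same byte offsets the sd.c hunk uses: the optimal transfer length granularity (bytes 6-7) becomes the minimum I/O size and the optimal transfer length (bytes 12-15) becomes the optimal I/O size, both scaled by the logical block size. `struct io_hints` and `vpd_b0_io_hints()` are hypothetical names. The sketch multiplies in 64 bits so a large optimal transfer length cannot wrap.

```c
#include <stdint.h>

struct io_hints {
	uint64_t io_min;	/* bytes */
	uint64_t io_opt;	/* bytes */
};

/* page: Block Limits VPD page (B0h) including its 4-byte header, >= 16 bytes */
static struct io_hints vpd_b0_io_hints(const uint8_t *page,
				       uint32_t logical_block_size)
{
	struct io_hints h;
	/* Bytes 6-7: optimal transfer length granularity (logical blocks) */
	uint32_t gran = ((uint32_t)page[6] << 8) | page[7];
	/* Bytes 12-15: optimal transfer length (logical blocks) */
	uint32_t opt = ((uint32_t)page[12] << 24) | ((uint32_t)page[13] << 16) |
		       ((uint32_t)page[14] << 8) | page[15];

	h.io_min = (uint64_t)gran * logical_block_size;
	h.io_opt = (uint64_t)opt * logical_block_size;
	return h;
}
```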

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 24 insertions(+)
drivers/scsi/sd.c |   24 ++++++++++++++++++++++++



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1801,6 +1801,29 @@ void sd_read_app_tag_own(struct scsi_dis
 }
 
 /**
+ * sd_read_block_limits - Query disk device for preferred I/O sizes.
+ * @disk: disk to query
+ */
+static void sd_read_block_limits(struct scsi_disk *sdkp)
+{
+	unsigned int sector_sz = sdkp->device->sector_size;
+	char *buffer;
+
+	/* Block Limits VPD */
+	buffer = scsi_get_vpd_page(sdkp->device, 0xb0);
+
+	if (buffer == NULL)
+		return;
+
+	blk_queue_io_min(sdkp->disk->queue,
+			 get_unaligned_be16(&buffer[6]) * sector_sz);
+	blk_queue_io_opt(sdkp->disk->queue,
+			 get_unaligned_be32(&buffer[12]) * sector_sz);
+
+	kfree(buffer);
+}
+
+/**
  * sd_read_block_characteristics - Query block dev. characteristics
  * @disk: disk to query
  */
@@ -1860,6 +1883,7 @@ static int sd_revalidate_disk(struct gen
 	 */
 	if (sdkp->media_present) {
 		sd_read_capacity(sdkp, buffer);
+		sd_read_block_limits(sdkp);
 		sd_read_block_characteristics(sdkp);
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
                   ` (5 preceding siblings ...)
  2009-04-24  5:32 ` [PATCH 6 of 9] sd: Block limits VPD support Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 8 of 9] libata: Report disk alignment and physical block size Martin K. Petersen
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

This patch adds support for setting the physical block exponent and
lowest aligned LBA in the READ CAPACITY(16) response.

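The Block Limits VPD page (B0h) is adjusted accordingly: its optimal
transfer length granularity is set to the number of logical blocks per
physical block.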
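The sketch below shows the same encoding in isolation: the exponent goes into the low nibble of byte 13 and the lowest aligned LBA into the low 6 bits of byte 14 plus byte 15. It folds in the range checks that scsi_debug_init() performs (exponent at most 15, lowest aligned LBA at most 0x3fff). The function name is hypothetical, and it takes unsigned arguments, unlike the module parameters, which are `int`.

```c
#include <stdint.h>

/* Returns 0 on success, -1 if a value does not fit its field. */
static int encode_rc16_topology(uint8_t *arr, unsigned int physblk_exp,
				unsigned int lowest_aligned)
{
	if (physblk_exp > 15 || lowest_aligned > 0x3fff)
		return -1;

	arr[13] = physblk_exp & 0xf;			/* bits 3:0 */
	arr[14] = (lowest_aligned >> 8) & 0x3f;		/* bits 13:8 */
	arr[15] = lowest_aligned & 0xff;		/* bits 7:0 */
	return 0;
}
```

With the patch applied, loading scsi_debug with `physblk_exp=3 lowest_aligned=7` on the default 512-byte logical block size should make sd report 4096-byte physical blocks and a 3584-byte alignment.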

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 29 insertions(+), 1 deletion(-)
drivers/scsi/scsi_debug.c |   30 +++++++++++++++++++++++++++++-



diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -101,6 +101,8 @@ static const char * scsi_debug_version_d
 #define DEF_DIF 0
 #define DEF_GUARD 0
 #define DEF_ATO 1
+#define DEF_PHYSBLK_EXP 0
+#define DEF_LOWEST_ALIGNED 0
 
 /* bit mask values for scsi_debug_opts */
 #define SCSI_DEBUG_OPT_NOISE   1
@@ -156,6 +158,8 @@ static int scsi_debug_dix = DEF_DIX;
 static int scsi_debug_dif = DEF_DIF;
 static int scsi_debug_guard = DEF_GUARD;
 static int scsi_debug_ato = DEF_ATO;
+static int scsi_debug_physblk_exp = DEF_PHYSBLK_EXP;
+static int scsi_debug_lowest_aligned = DEF_LOWEST_ALIGNED;
 
 static int scsi_debug_cmnd_count = 0;
 
@@ -657,7 +661,12 @@ static unsigned char vpdb0_data[] = {
 
 static int inquiry_evpd_b0(unsigned char * arr)
 {
+	unsigned int gran;
+
 	memcpy(arr, vpdb0_data, sizeof(vpdb0_data));
+	gran = 1 << scsi_debug_physblk_exp;
+	arr[2] = (gran >> 8) & 0xff;
+	arr[3] = gran & 0xff;
 	if (sdebug_store_sectors > 0x400) {
 		arr[4] = (sdebug_store_sectors >> 24) & 0xff;
 		arr[5] = (sdebug_store_sectors >> 16) & 0xff;
@@ -945,6 +954,9 @@ static int resp_readcap16(struct scsi_cm
 	arr[9] = (scsi_debug_sector_size >> 16) & 0xff;
 	arr[10] = (scsi_debug_sector_size >> 8) & 0xff;
 	arr[11] = scsi_debug_sector_size & 0xff;
+	arr[13] = scsi_debug_physblk_exp & 0xf;
+	arr[14] = (scsi_debug_lowest_aligned >> 8) & 0x3f;
+	arr[15] = scsi_debug_lowest_aligned & 0xff;
 
 	if (scsi_debug_dif) {
 		arr[12] = (scsi_debug_dif - 1) << 1; /* P_TYPE */
@@ -2380,6 +2392,8 @@ module_param_named(dix, scsi_debug_dix, 
 module_param_named(dif, scsi_debug_dif, int, S_IRUGO);
 module_param_named(guard, scsi_debug_guard, int, S_IRUGO);
 module_param_named(ato, scsi_debug_ato, int, S_IRUGO);
+module_param_named(physblk_exp, scsi_debug_physblk_exp, int, S_IRUGO);
+module_param_named(lowest_aligned, scsi_debug_lowest_aligned, int, S_IRUGO);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -2401,7 +2415,9 @@ MODULE_PARM_DESC(ptype, "SCSI peripheral
 MODULE_PARM_DESC(scsi_level, "SCSI level to simulate(def=5[SPC-3])");
 MODULE_PARM_DESC(virtual_gb, "virtual gigabyte size (def=0 -> use dev_size_mb)");
 MODULE_PARM_DESC(vpd_use_hostno, "0 -> dev ids ignore hostno (def=1 -> unique dev ids)");
-MODULE_PARM_DESC(sector_size, "hardware sector size in bytes (def=512)");
+MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)");
+MODULE_PARM_DESC(physblk_exp, "physical block exponent (def=0)");
+MODULE_PARM_DESC(lowest_aligned, "lowest aligned lba (def=0)");
 MODULE_PARM_DESC(dix, "data integrity extensions mask (def=0)");
 MODULE_PARM_DESC(dif, "data integrity field type: 0-3 (def=0)");
 MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
@@ -2874,6 +2890,18 @@ static int __init scsi_debug_init(void)
 		return -EINVAL;
 	}
 
+	if (scsi_debug_physblk_exp > 15) {
+		printk(KERN_ERR "scsi_debug_init: invalid physblk_exp %u\n",
+		       scsi_debug_physblk_exp);
+		return -EINVAL;
+	}
+
+	if (scsi_debug_lowest_aligned > 0x3fff) {
+		printk(KERN_ERR "scsi_debug_init: lowest_aligned too big: %u\n",
+		       scsi_debug_lowest_aligned);
+		return -EINVAL;
+	}
+
 	if (scsi_debug_dev_size_mb < 1)
 		scsi_debug_dev_size_mb = 1;  /* force minimum 1 MB ramdisk */
 	sz = (unsigned long)scsi_debug_dev_size_mb * 1048576;



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 8 of 9] libata: Report disk alignment and physical block size
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
                   ` (6 preceding siblings ...)
  2009-04-24  5:32 ` [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
  2009-04-24  5:32 ` [PATCH 9 of 9] libata: Media rotation rate and form factor heuristics Martin K. Petersen
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

For disks with 4KB physical sectors, report the physical block size and
alignment when filling out the READ CAPACITY(16) response.

This patch is based upon code from Matthew Wilcox' 4KB ATA tree.  I
fixed the bug I reported a while back caused by ATA and SCSI using
different approaches to describing the alignment.
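The difference is easiest to see in isolation. ATA IDENTIFY word 209 gives the offset of logical sector 0 within its physical sector, while READ CAPACITY(16) wants the first LBA that starts on a physical boundary. With 8 logical sectors per physical sector and an ATA offset of 1, that LBA is 8 - 1 = 7. The sketch below mirrors the logic in the hunk that follows; `ata_topology()` is a hypothetical name, and, like the patch, it does not guard against an offset larger than the physical sector.

```c
#include <stdint.h>

static void ata_topology(uint16_t word_106, uint16_t word_209,
			 uint8_t *log_per_phys, uint16_t *lowest_aligned)
{
	*log_per_phys = 0;
	*lowest_aligned = 0;

	/* Word 106 is valid only if bits 15:14 are 01b */
	if ((word_106 & 0xc000) != 0x4000)
		return;

	/* Bit 13: multiple logical sectors per physical sector (log2 in 3:0) */
	if (word_106 & (1 << 13))
		*log_per_phys = word_106 & 0xf;

	/* Word 209 valid: bits 13:0 = offset of LBA 0 in its physical sector */
	if ((word_209 & 0xc000) == 0x4000) {
		uint16_t first = word_209 & 0x3fff;

		/* Convert ATA offset to SCSI "lowest aligned LBA" */
		if (first > 0)
			*lowest_aligned = (uint16_t)((1u << *log_per_phys) - first);
	}
}
```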

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 22 insertions(+), 1 deletion(-)
drivers/ata/libata-scsi.c |   23 ++++++++++++++++++++++-



diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2376,7 +2376,23 @@ saving_not_supp:
  */
 static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 {
-	u64 last_lba = args->dev->n_sectors - 1; /* LBA of the last block */
+	struct ata_device *dev = args->dev;
+	u64 last_lba = dev->n_sectors - 1; /* LBA of the last block */
+	u8 log_per_phys = 0;
+	u16 lowest_aligned = 0;
+	u16 word_106 = dev->id[106];
+	u16 word_209 = dev->id[209];
+
+	if ((word_106 & 0xc000) == 0x4000) {
+		/* Number and offset of logical sectors per physical sector */
+		if (word_106 & (1 << 13))
+			log_per_phys = word_106 & 0xf;
+		if ((word_209 & 0xc000) == 0x4000) {
+			u16 first = dev->id[209] & 0x3fff;
+			if (first > 0)
+				lowest_aligned = (1 << log_per_phys) - first;
+		}
+	}
 
 	VPRINTK("ENTER\n");
 
@@ -2407,6 +2423,11 @@ static unsigned int ata_scsiop_read_cap(
 		/* sector size */
 		rbuf[10] = ATA_SECT_SIZE >> 8;
 		rbuf[11] = ATA_SECT_SIZE & 0xff;
+
+		rbuf[12] = 0;
+		rbuf[13] = log_per_phys;
+		rbuf[14] = (lowest_aligned >> 8) & 0x3f;
+		rbuf[15] = lowest_aligned;
 	}
 
 	return 0;



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 9 of 9] libata: Media rotation rate and form factor heuristics
  2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
                   ` (7 preceding siblings ...)
  2009-04-24  5:32 ` [PATCH 8 of 9] libata: Report disk alignment and physical block size Martin K. Petersen
@ 2009-04-24  5:32 ` Martin K. Petersen
       [not found] ` <7ec2d82b188a9e9d4c56.1240551148@sermon.lab.mkp.net>
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  5:32 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide

This patch delegates setting the queue NONROT flag to SCSI.

It also provides new heuristics for parsing both the form factor and
media rotation rate ATA IDENTIFY words.

The reported ATA version must be 7 or greater and the device must return
values defined as valid in the standard.  Only then are the
characteristics reported to SCSI via the VPD B1 page.

This seems like a reasonable compromise to me, considering that we have
been shipping several kernel releases that key off the rotation rate bit
without any version checking whatsoever, with no complaints so far.
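Here is a userspace restatement of the two heuristics added to ata.h below, for readers who want to try values by hand. Word 168 (form factor) is accepted only for codes 1-5 in its low nibble. Word 217 (rotation rate) is accepted only as 0x0001 (non-rotating) or 0x0401 and above, because 0x0002-0x0400 are treated as invalid. The `major_version` parameter is a hypothetical stand-in for `ata_id_major_version(id)`.

```c
#include <stdint.h>

/* major_version stands in for ata_id_major_version(id) */
static int form_factor(int major_version, uint16_t word_168)
{
	if (major_version < 7 || word_168 == 0 || word_168 == 0xffff)
		return 0;

	word_168 &= 0xf;
	return word_168 > 5 ? 0 : word_168;	/* only codes 1-5 are defined */
}

static int rotation_rate(int major_version, uint16_t word_217)
{
	if (major_version < 7 || word_217 == 0 || word_217 == 0xffff)
		return 0;

	/* 0x0001 = non-rotating; 0x0002-0x0400 are not valid rates */
	if (word_217 > 1 && word_217 < 0x401)
		return 0;

	return word_217;
}
```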

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 34 insertions(+), 9 deletions(-)
drivers/ata/libata-scsi.c |   15 ++++++---------
include/linux/ata.h       |   28 ++++++++++++++++++++++++++++



diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1119,10 +1119,6 @@ static int ata_scsi_dev_config(struct sc
 
 		blk_queue_dma_drain(q, atapi_drain_needed, buf, ATAPI_MAX_DRAIN);
 	} else {
-		if (ata_id_is_ssd(dev->id))
-			queue_flag_set_unlocked(QUEUE_FLAG_NONROT,
-						sdev->request_queue);
-
 		/* ATA devices must be sector aligned */
 		blk_queue_update_dma_alignment(sdev->request_queue,
 					       ATA_SECT_SIZE - 1);
@@ -2142,13 +2138,14 @@ static unsigned int ata_scsiop_inq_89(st
 
 static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
 {
+	int form_factor = ata_id_form_factor(args->id);
+	int media_rotation_rate = ata_id_rotation_rate(args->id);
+
 	rbuf[1] = 0xb1;
 	rbuf[3] = 0x3c;
-	if (ata_id_major_version(args->id) > 7) {
-		rbuf[4] = args->id[217] >> 8;
-		rbuf[5] = args->id[217];
-		rbuf[7] = args->id[168] & 0xf;
-	}
+	rbuf[4] = media_rotation_rate >> 8;
+	rbuf[5] = media_rotation_rate;
+	rbuf[7] = form_factor;
 
 	return 0;
 }
diff --git a/include/linux/ata.h b/include/linux/ata.h
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -730,6 +730,34 @@ static inline int ata_id_has_unload(cons
 	return 0;
 }
 
+static inline int ata_id_form_factor(const u16 *id)
+{
+	u16 val = id[168];
+
+	if (ata_id_major_version(id) < 7 || val == 0 || val == 0xffff)
+		return 0;
+
+	val &= 0xf;
+
+	if (val > 5)
+		return 0;
+
+	return val;
+}
+
+static inline int ata_id_rotation_rate(const u16 *id)
+{
+	u16 val = id[217];
+
+	if (ata_id_major_version(id) < 7 || val == 0 || val == 0xffff)
+		return 0;
+
+	if (val > 1 && val < 0x401)
+		return 0;
+
+	return val;
+}
+
 static inline int ata_id_has_trim(const u16 *id)
 {
 	if (ata_id_major_version(id) >= 7 &&



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment
       [not found] ` <7ec2d82b188a9e9d4c56.1240551148@sermon.lab.mkp.net>
@ 2009-04-24  6:10   ` Douglas Gilbert
  2009-04-24  6:14     ` Martin K. Petersen
  0 siblings, 1 reply; 21+ messages in thread
From: Douglas Gilbert @ 2009-04-24  6:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, jens.axboe,
	matthew, linux-ide, linux-scsi

Martin K. Petersen wrote:
> This patch adds support for setting the physical block exponent and
> lowest aligned LBA in the READ CAPACITY(16) response.
> 
> The B0 VPD page is adjusted accordingly.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>

Sorry, not much of a flame.

BTW Does scsi_debug identify itself (in VPD b1h) as a SSD?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment
  2009-04-24  6:10   ` [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment Douglas Gilbert
@ 2009-04-24  6:14     ` Martin K. Petersen
  0 siblings, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24  6:14 UTC (permalink / raw)
  To: dgilbert
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, jens.axboe, matthew, linux-ide, linux-scsi

>>>>> "Doug" == Douglas Gilbert <dgilbert@interlog.com> writes:

Doug> BTW Does scsi_debug identify itself (in VPD b1h) as a SSD?

Yep!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and partitions
  2009-04-24  5:32 ` [PATCH 2 of 9] block: Export I/O topology for block devices and partitions Martin K. Petersen
@ 2009-04-24 12:14   ` Kay Sievers
  2009-04-24 12:54     ` Jeff Garzik
  2009-04-24 14:53     ` Martin K. Petersen
  0 siblings, 2 replies; 21+ messages in thread
From: Kay Sievers @ 2009-04-24 12:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, matthew, linux-ide, linux-scsi

On Fri, Apr 24, 2009 at 07:32, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
> +What:          /sys/block/<disk>/alignment
> +What:          /sys/block/<disk>/<partition>/alignment
> +What:          /sys/block/<disk>/queue/minimum_io_size
> +What:          /sys/block/<disk>/queue/optimal_io_size

Wouldn't it be good to include "sector", like the queue files do? The
alignment of a partition could mean many things.
  /sys/block/<disk>/sector_alignment
  /sys/block/<disk>/<partition>/sector_alignment

And prefixing the io values might be easier to read when they show up
in a group?
  /sys/block/<disk>/queue/io_minimum_size
  /sys/block/<disk>/queue/io_optimal_size
  /sys/block/<disk>/queue/io_...

Thanks,
Kay

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 9 of 9] libata: Media rotation rate and form factor heuristics
       [not found] ` <2fc5b2aa370a8ad47db1.1240551150@sermon.lab.mkp.net>
@ 2009-04-24 12:30   ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2009-04-24 12:30 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide, linux-scsi

On Fri, Apr 24, 2009 at 01:32:30AM -0400, Martin K. Petersen wrote:
> This patch delegates setting the queue NONROT flag to SCSI.
> 
> It also provides new heuristics for parsing both the form factor and
> media rotation rate ATA IDENTIFY words.
> 
> The reported ATA version must be 7 or greater and the device must return
> values defined as valid in the standard.  Only then are the
> characteristics reported to SCSI via the VPD B1 page.
> 
> This seems like a reasonable compromise to me considering that we have
> been shipping several kernel releases that key off the rotation rate bit
> without any version checking whatsoever.  With no complaints so far.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8 of 9] libata: Report disk alignment and physical block size
       [not found] ` <47f4f448a804a2d24f10.1240551149@sermon.lab.mkp.net>
@ 2009-04-24 12:32   ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2009-04-24 12:32 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide, linux-scsi

On Fri, Apr 24, 2009 at 01:32:29AM -0400, Martin K. Petersen wrote:
> For disks with 4KB sectors, report the correct block size and alignment
> when filling out the READ CAPACITY(16) response.
> 
> This patch is based upon code from Matthew Wilcox' 4KB ATA tree.  I
> fixed the bug I reported a while back caused by ATA and SCSI using
> different approaches to describing the alignment.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and partitions
  2009-04-24 12:14   ` Kay Sievers
@ 2009-04-24 12:54     ` Jeff Garzik
  2009-04-24 14:37       ` Carl Henrik Lunde
  2009-04-24 15:00       ` Martin K. Petersen
  2009-04-24 14:53     ` Martin K. Petersen
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff Garzik @ 2009-04-24 12:54 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Martin K. Petersen, rwheeler, snitzer, neilb, James.Bottomley,
	dgilbert, jens.axboe, matthew, linux-ide, linux-scsi, LKML

Kay Sievers wrote:
> On Fri, Apr 24, 2009 at 07:32, Martin K. Petersen
> <martin.petersen@oracle.com> wrote:
>> +What:          /sys/block/<disk>/alignment
>> +What:          /sys/block/<disk>/<partition>/alignment
>> +What:          /sys/block/<disk>/queue/minimum_io_size
>> +What:          /sys/block/<disk>/queue/optimal_io_size
> 
> Wouldn't it be good to include "sector", like the queue files do? The
> alignment of a partition could mean many things.
>   /sys/block/<disk>/sector_alignment
>   /sys/block/<disk>/<partition>/sector_alignment
> 
> And prefixing the io values might be easier to read when they show up
> in a group?
>   /sys/block/<disk>/queue/io_minimum_size
>   /sys/block/<disk>/queue/io_optimal_size
>   /sys/block/<disk>/queue/io_...

Why do we need all this syscall overhead just to read individual data items?

Isn't it dumb to require 30 userland syscalls simply to input a 
10-member data structure?

netlink looks more and more attractive for anything non-trivial.

	Jeff




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and partitions
  2009-04-24 12:54     ` Jeff Garzik
@ 2009-04-24 14:37       ` Carl Henrik Lunde
  2009-04-24 14:47         ` Matthew Wilcox
  2009-04-24 15:16         ` Martin K. Petersen
  2009-04-24 15:00       ` Martin K. Petersen
  1 sibling, 2 replies; 21+ messages in thread
From: Carl Henrik Lunde @ 2009-04-24 14:37 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jeff Garzik, Kay Sievers, rwheeler, snitzer, neilb,
	James.Bottomley, dgilbert, jens.axboe, matthew, linux-ide,
	linux-scsi, LKML

> On Fri, Apr 24, 2009 at 07:32, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> +What:          /sys/block/<disk>/alignment
> +What:          /sys/block/<disk>/<partition>/alignment
> +What:          /sys/block/<disk>/queue/minimum_io_size
> +What:          /sys/block/<disk>/queue/optimal_io_size

Would it also be possible and useful to include the number of
spindles/channels, i.e., how many requests the device can handle
concurrently?  CFQ could for example serve two time slices
concurrently if you have sequential reads and the device reports two
spindles.

[sorry for replying in the middle of the thread, but I didn't get the
original email]
-- 
mvh
Carl Henrik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and partitions
  2009-04-24 14:37       ` Carl Henrik Lunde
@ 2009-04-24 14:47         ` Matthew Wilcox
  2009-04-24 15:16         ` Martin K. Petersen
  1 sibling, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2009-04-24 14:47 UTC (permalink / raw)
  To: Carl Henrik Lunde
  Cc: Martin K. Petersen, Jeff Garzik, Kay Sievers, rwheeler, snitzer,
	neilb, James.Bottomley, dgilbert, jens.axboe, linux-ide,
	linux-scsi, LKML

On Fri, Apr 24, 2009 at 04:37:17PM +0200, Carl Henrik Lunde wrote:
> > On Fri, Apr 24, 2009 at 07:32, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> > +What:          /sys/block/<disk>/alignment
> > +What:          /sys/block/<disk>/<partition>/alignment
> > +What:          /sys/block/<disk>/queue/minimum_io_size
> > +What:          /sys/block/<disk>/queue/optimal_io_size
> 
> Would it also be possible and useful to include the number of
> spindles/channels, i.e., how many requests the device can handle
> concurrently?  CFQ could for example serve two time slices
> concurrently if you have sequential reads and the device reports two
> spindles.

This is what we call "creeping featurism" (or other names not as nice).
You'll then want to know which data are provided by which spindle.
Then you'll want to know how fast each spindle is.  Then you'll find that
not all storage gives you that information (try asking an EMC Symmetrix
how many spindles it has and where data is mapped ...)

Let's just get something merged which gives us an improvement.
Then you're free to experiment with adding the spindles count yourself,
and if you can show a real advantage to it, come back and we can argue
over it with data.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and  partitions
  2009-04-24 12:14   ` Kay Sievers
  2009-04-24 12:54     ` Jeff Garzik
@ 2009-04-24 14:53     ` Martin K. Petersen
  1 sibling, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24 14:53 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, jens.axboe, matthew, linux-ide,
	linux-scsi

>>>>> "Kay" == Kay Sievers <kay.sievers@vrfy.org> writes:

Kay> Wouldn't it be good to include "sector", like the queue files do?
Kay> The alignment of a partition could mean many things.
Kay>   /sys/block/<disk>/sector_alignment
Kay>   /sys/block/<disk>/<partition>/sector_alignment

Well, the whole point of this exercise is to get rid of the
about-to-become-incorrect notion of a "sector" :)

What I'd like to see is these values being picked up by libdisk and used
to replace the current hacks that extract MD stripe size, etc.

So the values exported are:

   alignment		- Use this to add padding.

   minimum_io_size	- Don't submit I/Os smaller than this.  May be
                          bigger than both physical and logical block
                          sizes at the bottom of the stack.

   optimal_io_size	- Use this request size for best performance.
   			  Think full stripe write.

Those values apply to all block device types and are adjusted when
stacking.
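As an illustration of how a partitioning tool might consume these values, here is a sketch that rounds a proposed start offset up to the next naturally aligned boundary. It assumes that `alignment` is the byte offset of the first naturally aligned block, as the sd patch computes it for a whole disk (lowest aligned LBA times logical block size), and that aligned offsets then repeat every `minimum_io_size` bytes. Both are assumptions for this sketch. `aligned_start()` is a hypothetical helper, not part of any library.

```c
#include <stdint.h>

/*
 * Round 'start' (bytes) up to the next offset o such that
 * o >= alignment and (o - alignment) is a multiple of io_min.
 */
static uint64_t aligned_start(uint64_t start, uint64_t alignment,
			      uint64_t io_min)
{
	uint64_t k;

	if (io_min == 0)
		return start;		/* no hint available: leave as is */
	if (start <= alignment)
		return alignment;

	k = (start - alignment + io_min - 1) / io_min;
	return alignment + k * io_min;
}
```

For example, a drive with 4096-byte physical blocks and a 3584-byte alignment (lowest aligned LBA 7) is already aligned at the traditional sector 63 start: `aligned_start(32256, 3584, 4096)` returns 32256.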


Kay> And prefixing the io values might be easier to read when they show
Kay> up in a group?
Kay>   /sys/block/<disk>/queue/io_minimum_size
Kay>   /sys/block/<disk>/queue/io_optimal_size
Kay>   /sys/block/<disk>/queue/io_...

I don't really have immediate plans to add more.

I'm not married to any particular naming scheme, but I felt that
minimum_io_size was more readable than io_minimum_size, which is why I
kept the user-visible names as they are.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and partitions
  2009-04-24 12:54     ` Jeff Garzik
  2009-04-24 14:37       ` Carl Henrik Lunde
@ 2009-04-24 15:00       ` Martin K. Petersen
  1 sibling, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24 15:00 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Kay Sievers, Martin K. Petersen, rwheeler, snitzer, neilb,
	James.Bottomley, dgilbert, jens.axboe, matthew, linux-ide,
	linux-scsi, LKML

>>>>> "Jeff" == Jeff Garzik <jeff@garzik.org> writes:

Jeff> Why do we need all this syscall overhead just to read individual
Jeff> data items?

Jeff> Isn't it dumb to require 30 userland syscalls simply to input a
Jeff> 10-member data structure?

Jeff> netlink looks more and more attractive for anything non-trivial.

I think these three knobs are very trivial :)

I agree that traversing sysfs can be sucky.  But for the mkfs-es of the
world I expect most of this to be handled by libdisk.

I also really wanted something that could easily be scripted for
installers to poke at.

If these values were in any kind of hot path I'd be inclined to agree
with the need for a different interface.  But realistically these are
only ever going to be accessed when creating a filesystem, partition or
MD/DM device.

So I opted to keep things simple.  Doesn't mean we can't add another
interface if there's a real need...
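To give a sense of the cost involved, the sketch below reads one value per attribute with a single open/read/close. The attribute names are the ones proposed in this version of the patch set and may differ on the kernel you actually run. `read_sysfs_ull()` and `read_queue_attr()` are hypothetical helpers. The per-disk `alignment` file sits directly under `/sys/block/<disk>/`, so you would pass its full path to `read_sysfs_ull()`.

```c
#include <stdio.h>

/*
 * Read a single unsigned decimal value from a sysfs attribute.
 * Returns -1 if the file is missing or cannot be parsed.
 */
static long long read_sysfs_ull(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned long long val;
	int ok;

	if (f == NULL)
		return -1;
	ok = fscanf(f, "%llu", &val) == 1;
	fclose(f);
	return ok ? (long long)val : -1;
}

/* e.g. read_queue_attr("sda", "minimum_io_size") */
static long long read_queue_attr(const char *disk, const char *attr)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/block/%s/queue/%s",
			 disk, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit */
	return read_sysfs_ull(path);
}
```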

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2 of 9] block: Export I/O topology for block devices and  partitions
  2009-04-24 14:37       ` Carl Henrik Lunde
  2009-04-24 14:47         ` Matthew Wilcox
@ 2009-04-24 15:16         ` Martin K. Petersen
  1 sibling, 0 replies; 21+ messages in thread
From: Martin K. Petersen @ 2009-04-24 15:16 UTC (permalink / raw)
  To: Carl Henrik Lunde
  Cc: Martin K. Petersen, Jeff Garzik, Kay Sievers, rwheeler, snitzer,
	neilb, James.Bottomley, dgilbert, jens.axboe, matthew, linux-ide,
	linux-scsi, LKML

>>>>> "Carl" == Carl Henrik Lunde <chlunde@ping.uio.no> writes:

Carl> Would it also be possible and useful to include the number of
Carl> spindles/channels, i.e., how many requests the device can handle
Carl> concurrently?  CFQ could for example serve two time slices
Carl> concurrently if you have sequential reads and the device reports
Carl> two spindles.

We don't really have a way to get that information at this point.

The values exported in my patch set are what the storage vendors in T10
could agree on.  I simply applied them to DM and MD devices as well.

We're talking to SSD vendors about having their devices export some
characteristics that would allow us to schedule I/O more intelligently.
Defining what those values might be is a work in progress.
There have been murmurs about T13 adopting the T10 knobs, much like what
happened with form factor and media rotation rate.

For more traditional storage we don't really have a good way to
distinguish between a 10GB LUN on a million-dollar array and a single
disk drive.  Rotation rate can help, and we already use and export that.
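The sketch below classifies block devices as rotational or not, as a rough illustration of how the exported value can be consumed. It assumes the exported form is the boolean `queue/rotational` sysfs attribute (1 for rotating media, 0 for non-rotational), not the raw rotation rate, and that devices appear under `/sys/block`. `is_rotational` and `classify_devices` are hypothetical helpers. As the paragraph above notes, this flag cannot tell a LUN on a large array from a single disk. It only separates spinning media from devices such as SSDs.

```python
import os


def is_rotational(dev, sysfs_root="/sys/block"):
    """Return True/False from queue/rotational, or None if it is not exported."""
    path = os.path.join(sysfs_root, dev, "queue", "rotational")
    try:
        with open(path) as f:
            # Assumption: "1" means rotating media, "0" means non-rotational.
            return f.read().strip() == "1"
    except FileNotFoundError:
        return None


def classify_devices(sysfs_root="/sys/block"):
    """Map each device name under sysfs_root to a coarse media type."""
    result = {}
    for dev in sorted(os.listdir(sysfs_root)):
        rot = is_rotational(dev, sysfs_root)
        if rot is None:
            result[dev] = "unknown"
        else:
            result[dev] = "rotational" if rot else "non-rotational"
    return result
```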

-- 
Martin K. Petersen	Oracle Linux Engineering

Thread overview: 21+ messages
2009-04-24  5:32 [PATCH 0 of 9] I/O topology patch kit Martin K. Petersen
2009-04-24  5:32 ` [PATCH 1 of 9] block: Expose stacked device queues in sysfs Martin K. Petersen
2009-04-24  5:32 ` [PATCH 2 of 9] block: Export I/O topology for block devices and partitions Martin K. Petersen
2009-04-24 12:14   ` Kay Sievers
2009-04-24 12:54     ` Jeff Garzik
2009-04-24 14:37       ` Carl Henrik Lunde
2009-04-24 14:47         ` Matthew Wilcox
2009-04-24 15:16         ` Martin K. Petersen
2009-04-24 15:00       ` Martin K. Petersen
2009-04-24 14:53     ` Martin K. Petersen
2009-04-24  5:32 ` [PATCH 3 of 9] MD: Use new topology calls to indicate alignment and I/O sizes Martin K. Petersen
2009-04-24  5:32 ` [PATCH 4 of 9] sd: Physical block size and alignment support Martin K. Petersen
2009-04-24  5:32 ` [PATCH 5 of 9] sd: Detect non-rotational devices Martin K. Petersen
2009-04-24  5:32 ` [PATCH 6 of 9] sd: Block limits VPD support Martin K. Petersen
2009-04-24  5:32 ` [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment Martin K. Petersen
2009-04-24  5:32 ` [PATCH 8 of 9] libata: Report disk alignment and physical block size Martin K. Petersen
2009-04-24  5:32 ` [PATCH 9 of 9] libata: Media rotation rate and form factor heuristics Martin K. Petersen
     [not found] ` <7ec2d82b188a9e9d4c56.1240551148@sermon.lab.mkp.net>
2009-04-24  6:10   ` [PATCH 7 of 9] scsi_debug: Add support for physical block exponent and alignment Douglas Gilbert
2009-04-24  6:14     ` Martin K. Petersen
     [not found] ` <2fc5b2aa370a8ad47db1.1240551150@sermon.lab.mkp.net>
2009-04-24 12:30   ` [PATCH 9 of 9] libata: Media rotation rate and form factor heuristics Matthew Wilcox
     [not found] ` <47f4f448a804a2d24f10.1240551149@sermon.lab.mkp.net>
2009-04-24 12:32   ` [PATCH 8 of 9] libata: Report disk alignment and physical block size Matthew Wilcox
