public inbox for linux-kernel@vger.kernel.org
* [patch v4 0/3] aio: implement request batching
@ 2009-10-02 22:54 Jeff Moyer
  2009-10-02 22:56 ` [patch v4 1/2] block: get rid of the WRITE_ODIRECT flag Jeff Moyer
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Jeff Moyer @ 2009-10-02 22:54 UTC (permalink / raw)
  To: zach.brown; +Cc: linux-aio, Linux Kernel Mailing, Andrew Morton

Hi,

In this, the 4th iteration of the patch set, I've addressed the concerns
of Jens and Zach as follows:

- In the O_DIRECT write path, we get rid of WRITE_ODIRECT in favor of
  WRITE_SYNC_PLUG
- request batching is only done for AIO operations that could
  potentially benefit from it
- Initialization of the hash table is done using the gcc array
  initialization syntax

Further, I got rid of a compiler warning that I had introduced in the
last patch set.  I'll continue to test this under a variety of loads,
but would certainly appreciate it if others could give it a spin.

Patch overview / reason for existence:

Some workloads issue batches of small I/Os, and the performance is poor
due to the call to blk_run_address_space for every single iocb.  Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains.

For example, running against a simple SATA disk driving a queue depth of
128, we can improve the performance of O_DIRECT writes twofold for 4k
I/Os, with similarly impressive gains for other I/O sizes:
  http://people.redhat.com/jmoyer/dbras/vanilla-vs-v4/metallica-single-sata/noop/io-depth-128-write-bw.png

For read workloads on somewhat faster storage, we see similar benefits
for batches of smaller I/Os as well:
  http://people.redhat.com/jmoyer/dbras/vanilla-vs-v4/thor-4-disk-lvm-stripe/noop/io-depth-16-read-bw.png
That's 230+MB/s for vanilla 4k reads in batches of 16, vs. 410MB/s for
the patched kernel.  I haven't witnessed such dramatic numbers when
multiple threads are competing for the disk, however.
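For reference, the batched-submission pattern these numbers come from can
be sketched from userspace.  This is my own illustration, not part of the
patch set: it submits 16 4k reads as a single iocb array through the raw
io_submit(2) syscall (no libaio needed).  The buffered open is a
simplification so the sketch runs anywhere; the benchmarks above use
O_DIRECT, which is where the unplug batching actually matters.

```c
/*
 * Sketch of the workload the patch targets: submit a batch of 16 4k
 * reads with one io_submit(2), then reap them.  Raw syscall wrappers
 * are used so no libaio is required.  NOTE: in the measured workloads
 * the fd is opened with O_DIRECT so reads reach the block layer
 * asynchronously; plain buffered reads are used here only to keep
 * this sketch runnable on any filesystem.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define BATCH	16
#define BLKSZ	4096

static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
	return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **iocbpp)
{
	return syscall(SYS_io_submit, ctx, n, iocbpp);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
			     struct io_event *events, struct timespec *t)
{
	return syscall(SYS_io_getevents, ctx, min_nr, nr, events, t);
}
static long sys_io_destroy(aio_context_t ctx)
{
	return syscall(SYS_io_destroy, ctx);
}

/* Submit BATCH 4k reads as one iocb array; return completions reaped. */
static long submit_batch(int fd)
{
	static char bufs[BATCH][BLKSZ];
	struct iocb cbs[BATCH], *cbps[BATCH];
	struct io_event events[BATCH];
	aio_context_t ctx = 0;
	long nr, done;
	int i;

	if (sys_io_setup(BATCH, &ctx) < 0)
		return -1;

	memset(cbs, 0, sizeof(cbs));
	for (i = 0; i < BATCH; i++) {
		cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
		cbs[i].aio_fildes = fd;
		cbs[i].aio_buf = (unsigned long)bufs[i];
		cbs[i].aio_nbytes = BLKSZ;
		cbs[i].aio_offset = (long long)i * BLKSZ;
		cbps[i] = &cbs[i];
	}

	/*
	 * One io_submit() carries the whole array: before the patch,
	 * every iocb triggered blk_run_address_space(); with it, the
	 * unplug happens once per batch, in aio_batch_free().
	 */
	nr = sys_io_submit(ctx, BATCH, cbps);
	done = nr > 0 ? sys_io_getevents(ctx, nr, BATCH, events, NULL) : -1;
	sys_io_destroy(ctx);
	return done;
}
```

A caller opens the target file (with O_DIRECT and aligned buffers in the
real workload) and checks that submit_batch() returns BATCH; it is exactly
this PREAD/PWRITE batching case that patch 2/2 optimizes.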

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [patch v4 1/2] block: get rid of the WRITE_ODIRECT flag
  2009-10-02 22:54 [patch v4 0/3] aio: implement request batching Jeff Moyer
@ 2009-10-02 22:56 ` Jeff Moyer
  2009-10-02 22:57 ` [patch v4 2/2] aio: implement request batching Jeff Moyer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2009-10-02 22:56 UTC (permalink / raw)
  To: zach.brown; +Cc: linux-aio, Linux Kernel Mailing, Andrew Morton, Jens Axboe

Hi,

The WRITE_ODIRECT flag is only used in one place, and that code path
happens to also call blk_run_address_space.  The introduction of this
flag, then, could result in the device being unplugged twice for every
I/O.

Further, with the batching changes in the next patch, we don't want an
O_DIRECT write to imply a queue unplug.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 8b10b87..c86d35f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1124,7 +1124,7 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	int acquire_i_mutex = 0;
 
 	if (rw & WRITE)
-		rw = WRITE_ODIRECT;
+		rw = WRITE_SYNC_PLUG;
 
 	if (bdev)
 		bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2adaa25..2aac751 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -129,7 +129,6 @@ struct inodes_stat_t {
  * WRITE_SYNC		Like WRITE_SYNC_PLUG, but also unplugs the device
  *			immediately after submission. The write equivalent
  *			of READ_SYNC.
- * WRITE_ODIRECT	Special case write for O_DIRECT only.
  * SWRITE_SYNC
  * SWRITE_SYNC_PLUG	Like WRITE_SYNC/WRITE_SYNC_PLUG, but locks the buffer.
  *			See SWRITE.
@@ -151,7 +150,6 @@ struct inodes_stat_t {
 #define READ_META	(READ | (1 << BIO_RW_META))
 #define WRITE_SYNC_PLUG	(WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
 #define WRITE_SYNC	(WRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
-#define WRITE_ODIRECT	(WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
 #define SWRITE_SYNC_PLUG	\
 			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
 #define SWRITE_SYNC	(SWRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))


* [patch v4 2/2] aio: implement request batching
  2009-10-02 22:54 [patch v4 0/3] aio: implement request batching Jeff Moyer
  2009-10-02 22:56 ` [patch v4 1/2] block: get rid of the WRITE_ODIRECT flag Jeff Moyer
@ 2009-10-02 22:57 ` Jeff Moyer
  2009-10-02 22:58 ` [patch v4 0/3] " Jeff Moyer
  2009-10-06 17:45 ` [patch v4 0/3] aio: implement request batching [more performance numbers] Jeff Moyer
  3 siblings, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2009-10-02 22:57 UTC (permalink / raw)
  To: zach.brown; +Cc: linux-aio, Linux Kernel Mailing, Andrew Morton

Hi,

Some workloads issue batches of small I/Os, and the performance is poor
due to the call to blk_run_address_space for every single iocb.  Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/fs/aio.c b/fs/aio.c
index 02a2c93..cf0bef4 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -32,6 +32,9 @@
 #include <linux/workqueue.h>
 #include <linux/security.h>
 #include <linux/eventfd.h>
+#include <linux/blkdev.h>
+#include <linux/mempool.h>
+#include <linux/hash.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -60,6 +63,14 @@ static DECLARE_WORK(fput_work, aio_fput_routine);
 static DEFINE_SPINLOCK(fput_lock);
 static LIST_HEAD(fput_head);
 
+#define AIO_BATCH_HASH_BITS	3 /* allocated on-stack, so don't go crazy */
+#define AIO_BATCH_HASH_SIZE	(1 << AIO_BATCH_HASH_BITS)
+struct aio_batch_entry {
+	struct hlist_node list;
+	struct address_space *mapping;
+};
+mempool_t *abe_pool;
+
 static void aio_kick_handler(struct work_struct *);
 static void aio_queue_work(struct kioctx *);
 
@@ -73,6 +84,8 @@ static int __init aio_setup(void)
 	kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
 
 	aio_wq = create_workqueue("aio");
+	abe_pool = mempool_create_kmalloc_pool(1, sizeof(struct aio_batch_entry));
+	BUG_ON(!abe_pool);
 
 	pr_debug("aio_setup: sizeof(struct page) = %d\n", (int)sizeof(struct page));
 
@@ -1531,8 +1544,44 @@ static int aio_wake_function(wait_queue_t *wait, unsigned mode,
 	return 1;
 }
 
+static void aio_batch_add(struct address_space *mapping,
+			  struct hlist_head *batch_hash)
+{
+	struct aio_batch_entry *abe;
+	struct hlist_node *pos;
+	unsigned bucket;
+
+	bucket = hash_ptr(mapping, AIO_BATCH_HASH_BITS);
+	hlist_for_each_entry(abe, pos, &batch_hash[bucket], list) {
+		if (abe->mapping == mapping)
+			return;
+	}
+
+	abe = mempool_alloc(abe_pool, GFP_KERNEL);
+	BUG_ON(!igrab(mapping->host));
+	abe->mapping = mapping;
+	hlist_add_head(&abe->list, &batch_hash[bucket]);
+	return;
+}
+
+static void aio_batch_free(struct hlist_head *batch_hash)
+{
+	struct aio_batch_entry *abe;
+	struct hlist_node *pos, *n;
+	int i;
+
+	for (i = 0; i < AIO_BATCH_HASH_SIZE; i++) {
+		hlist_for_each_entry_safe(abe, pos, n, &batch_hash[i], list) {
+			blk_run_address_space(abe->mapping);
+			iput(abe->mapping->host);
+			hlist_del(&abe->list);
+			mempool_free(abe, abe_pool);
+		}
+	}
+}
+
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 struct iocb *iocb)
+			 struct iocb *iocb, struct hlist_head *batch_hash)
 {
 	struct kiocb *req;
 	struct file *file;
@@ -1608,6 +1657,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			;
 	}
 	spin_unlock_irq(&ctx->ctx_lock);
+	if (req->ki_opcode == IOCB_CMD_PREAD ||
+	    req->ki_opcode == IOCB_CMD_PREADV ||
+	    req->ki_opcode == IOCB_CMD_PWRITE ||
+	    req->ki_opcode == IOCB_CMD_PWRITEV)
+		aio_batch_add(file->f_mapping, batch_hash);
+
 	aio_put_req(req);	/* drop extra ref to req */
 	return 0;
 
@@ -1635,6 +1690,7 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 	struct kioctx *ctx;
 	long ret = 0;
 	int i;
+	struct hlist_head batch_hash[AIO_BATCH_HASH_SIZE] = { { 0, }, };
 
 	if (unlikely(nr < 0))
 		return -EINVAL;
@@ -1666,10 +1722,11 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, &tmp);
+		ret = io_submit_one(ctx, user_iocb, &tmp, batch_hash);
 		if (ret)
 			break;
 	}
+	aio_batch_free(batch_hash);
 
 	put_ioctx(ctx);
 	return i ? i : ret;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index c86d35f..3af761c 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1028,9 +1028,6 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
 	if (dio->bio)
 		dio_bio_submit(dio);
 
-	/* All IO is now issued, send it on its way */
-	blk_run_address_space(inode->i_mapping);
-
 	/*
 	 * It is possible that, we return short IO due to end of file.
 	 * In that case, we need to release all the pages we got hold on.
@@ -1057,8 +1054,11 @@ direct_io_worker(int rw, struct kiocb *iocb, struct inode *inode,
 	    ((rw & READ) || (dio->result == dio->size)))
 		ret = -EIOCBQUEUED;
 
-	if (ret != -EIOCBQUEUED)
+	if (ret != -EIOCBQUEUED) {
+		/* All IO is now issued, send it on its way */
+		blk_run_address_space(inode->i_mapping);
 		dio_await_completion(dio);
+	}
 
 	/*
 	 * Sync will always be dropping the final ref and completing the


* Re: [patch v4 0/3] aio: implement request batching
  2009-10-02 22:54 [patch v4 0/3] aio: implement request batching Jeff Moyer
  2009-10-02 22:56 ` [patch v4 1/2] block: get rid of the WRITE_ODIRECT flag Jeff Moyer
  2009-10-02 22:57 ` [patch v4 2/2] aio: implement request batching Jeff Moyer
@ 2009-10-02 22:58 ` Jeff Moyer
  2009-10-06 17:45 ` [patch v4 0/3] aio: implement request batching [more performance numbers] Jeff Moyer
  3 siblings, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2009-10-02 22:58 UTC (permalink / raw)
  To: zach.brown; +Cc: linux-aio, Linux Kernel Mailing, Andrew Morton

Yeah, there are only 2 patches.  Sorry about that broken subject.  I had
a crossed wire somewhere.  ;-)

-Jeff


* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-02 22:54 [patch v4 0/3] aio: implement request batching Jeff Moyer
                   ` (2 preceding siblings ...)
  2009-10-02 22:58 ` [patch v4 0/3] " Jeff Moyer
@ 2009-10-06 17:45 ` Jeff Moyer
  2009-10-06 18:06   ` Jens Axboe
  3 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2009-10-06 17:45 UTC (permalink / raw)
  To: zach.brown; +Cc: linux-aio, Linux Kernel Mailing, Andrew Morton

Here's a mail I got from Nathan Roberts.

Cheers,
Jeff

---

Similar test as before. I had to re-upload the files so comparing
against last time isn't really apples-apples.

Disk is a cciss logical drive consisting of 12 SATA drives in a RAID6
configuration with 128K stripes.

Test case 1 is to read 1 million random 40K files (no file is read
more than once), 16 4K iocbs at a time, 100 threads.

Test case 2 is the same except 100,000 128K files are read.

Unit of measure is "files read per second".


40K
-------------------------------------------------
Kernel                              NOOP
------                              ----
2.6.30.5                            682
2.6.30.5 (w/o drop_caches)          718
2.6.30.5+patch_v4                   900
2.6.30.5+patch_v4 (w/o drop caches) 965


128K
-------------------------------------------------
Kernel                              NOOP
------                              ----
2.6.30.5                            242
2.6.30.5 (w/o drop_caches)          350
2.6.30.5+patch_v4                   292
2.6.30.5+patch_v4 (w/o drop caches) 420


Hope it helps.
Nathan



* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-06 17:45 ` [patch v4 0/3] aio: implement request batching [more performance numbers] Jeff Moyer
@ 2009-10-06 18:06   ` Jens Axboe
  2009-10-06 18:18     ` Jeff Moyer
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2009-10-06 18:06 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: zach.brown, linux-aio, Linux Kernel Mailing, Andrew Morton

On Tue, Oct 06 2009, Jeff Moyer wrote:
> Here's a mail I got from Nathan Roberts.
> 
> Cheers,
> Jeff
> 
> ---
> 
> Similar test as before. I had to re-upload the files so comparing
> against last time isn't really apples-apples.
> 
> Disk is a cciss logical drive consisting of 12 SATA drives in a RAID6
> configuration with 128K stripes.
> 
> Test case 1 is to read 1 million random 40K files (no file is read
> more than once), 16 4K iocbs at a time, 100 threads.
> 
> Test case 2 is the same except 100,000 128K files are read.
> 
> Unit of measure is "files read per second".
> 
> 
> 40K
> -------------------------------------------------
> Kernel                              NOOP
> ------                              ----
> 2.6.30.5                            682
> 2.6.30.5 (w/o drop_caches)          718
> 2.6.30.5+patch_v4                   900
> 2.6.30.5+patch_v4 (w/o drop caches) 965
> 
> 
> 128K
> -------------------------------------------------
> Kernel                              NOOP
> ------                              ----
> 2.6.30.5                            242
> 2.6.30.5 (w/o drop_caches)          350
> 2.6.30.5+patch_v4                   292
> 2.6.30.5+patch_v4 (w/o drop caches) 420

Nice numbers! The patch looks good to me from a quick look, if you want
I can throw it into the testing mix tomorrow and see what kind of
improvements I see here. With performance increase of that magnitude, we
should get it in sooner rather than later.

-- 
Jens Axboe



* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-06 18:06   ` Jens Axboe
@ 2009-10-06 18:18     ` Jeff Moyer
  2009-10-07 10:52       ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2009-10-06 18:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: zach.brown, linux-aio, Linux Kernel Mailing, Andrew Morton

Jens Axboe <jens.axboe@oracle.com> writes:

> On Tue, Oct 06 2009, Jeff Moyer wrote:
>> Here's a mail I got from Nathan Roberts.
>> 
>> Cheers,
>> Jeff
>> 
>> ---
>> 
>> Similar test as before. I had to re-upload the files so comparing
>> against last time isn't really apples-apples.
>> 
>> Disk is a cciss logical drive consisting of 12 SATA drives in a RAID6
>> configuration with 128K stripes.
>> 
>> Test case 1 is to read 1 million random 40K files (no file is read
>> more than once), 16 4K iocbs at a time, 100 threads.
>> 
>> Test case 2 is the same except 100,000 128K files are read.
>> 
>> Unit of measure is "files read per second".
>> 
>> 
>> 40K
>> -------------------------------------------------
>> Kernel                              NOOP
>> ------                              ----
>> 2.6.30.5                            682
>> 2.6.30.5 (w/o drop_caches)          718
>> 2.6.30.5+patch_v4                   900
>> 2.6.30.5+patch_v4 (w/o drop caches) 965
>> 
>> 
>> 128K
>> -------------------------------------------------
>> Kernel                              NOOP
>> ------                              ----
>> 2.6.30.5                            242
>> 2.6.30.5 (w/o drop_caches)          350
>> 2.6.30.5+patch_v4                   292
>> 2.6.30.5+patch_v4 (w/o drop caches) 420
>
> Nice numbers! The patch looks good to me from a quick look, if you want
> I can throw it into the testing mix tomorrow and see what kind of
> improvements I see here. With performance increase of that magnitude, we
> should get it in sooner rather than later.

I'd love it if you could run some benchmarks, thank you!

Cheers,
Jeff


* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-06 18:18     ` Jeff Moyer
@ 2009-10-07 10:52       ` Jens Axboe
  2009-10-07 12:09         ` Jeff Moyer
  2009-10-27 16:16         ` Jeff Moyer
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2009-10-07 10:52 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: zach.brown, linux-aio, Linux Kernel Mailing, Andrew Morton

On Tue, Oct 06 2009, Jeff Moyer wrote:
> >> 40K
> >> -------------------------------------------------
> >> Kernel                              NOOP
> >> ------                              ----
> >> 2.6.30.5                            682
> >> 2.6.30.5 (w/o drop_caches)          718
> >> 2.6.30.5+patch_v4                   900
> >> 2.6.30.5+patch_v4 (w/o drop caches) 965
> >> 
> >> 
> >> 128K
> >> -------------------------------------------------
> >> Kernel                              NOOP
> >> ------                              ----
> >> 2.6.30.5                            242
> >> 2.6.30.5 (w/o drop_caches)          350
> >> 2.6.30.5+patch_v4                   292
> >> 2.6.30.5+patch_v4 (w/o drop caches) 420
> >
> > Nice numbers! The patch looks good to me from a quick look, if you want
> > I can throw it into the testing mix tomorrow and see what kind of
> > improvements I see here. With performance increase of that magnitude, we
> > should get it in sooner rather than later.
> 
> I'd love it if you could run some benchmarks, thank you!

So here's a pretty basic test. It does random reads from a bunch of
devices, I tested both 4kb and 64kb block sizes. Queue depth used is 32
for both cases, but note that this test uses a thread per device (so the
queue depth is 32 per device). Results are averaged over 3 runs.
slat/clat are the submission and completion latencies, they are in
microseconds here.

4kb random reads

kernel               sys     IOPS       slat    clat
----------------------------------------------------
2.6.32-rc3+patch    25.8%   192500      7.9     2606
2.6.32-rc3          27.4%   191300      8.4     2612



64kb random reads

kernel               sys     IOPS       slat    clat
----------------------------------------------------
2.6.32-rc3+patch     2.5%    24590      9.7     9681
2.6.32-rc3           2.5%    24580      9.4     9691

So pretty close, nothing earth shattering here. What the results above
do not show is that the 4kb test runs very stable with your patch.
Mainline fluctuates somewhat in the bandwidth, most likely due to the
varying depth.
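Jens doesn't show the job file, but the slat/clat terminology matches
fio's output; a job file along these lines would approximate the
described run (the device names, runtime, and number of devices below
are placeholders, not his actual configuration):

```ini
; random reads, one fio job per device, iodepth 32 each;
; change bs to 64k for the second data set
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=32
time_based
runtime=60

[sdb]
filename=/dev/sdb

[sdc]
filename=/dev/sdc
```

Run with `fio <jobfile>`; the slat/clat columns in fio's output
correspond to the submission and completion latencies in the tables
above.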

-- 
Jens Axboe



* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-07 10:52       ` Jens Axboe
@ 2009-10-07 12:09         ` Jeff Moyer
  2009-10-27 16:16         ` Jeff Moyer
  1 sibling, 0 replies; 11+ messages in thread
From: Jeff Moyer @ 2009-10-07 12:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: zach.brown, linux-aio, Linux Kernel Mailing, Andrew Morton

Jens Axboe <jens.axboe@oracle.com> writes:

> So here's a pretty basic test. It does random reads from a bunch of
> devices, I tested both 4kb and 64kb block sizes. Queue depth used is 32
> for both cases, but note that this test uses a thread per device (so the
> queue depth is 32 per device). Results are averaged over 3 runs.
> slat/clat are the submission and completion latencies, they are in
> microseconds here.
>
> 4kb random reads
>
> kernel               sys     IOPS       slat    clat
> ----------------------------------------------------
> 2.6.32-rc3+patch    25.8%   192500      7.9     2606
> 2.6.32-rc3          27.4%   191300      8.4     2612
>
>
>
> 64kb random reads
>
> kernel               sys     IOPS       slat    clat
> ----------------------------------------------------
> 2.6.32-rc3+patch     2.5%    24590      9.7     9681
> 2.6.32-rc3           2.5%    24580      9.4     9691
>
> So pretty close, nothing earth shattering here. What the results above
> do not show is that the 4kb test runs very stable with your patch.
> Mainline fluctuates somewhat in the bandwidth, most likely due to the
> varying depth.

OK, good news.  Thanks for testing!

Cheers,
Jeff


* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-07 10:52       ` Jens Axboe
  2009-10-07 12:09         ` Jeff Moyer
@ 2009-10-27 16:16         ` Jeff Moyer
  2009-10-28  8:28           ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Moyer @ 2009-10-27 16:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: zach.brown, linux-aio, Linux Kernel Mailing, Andrew Morton

Jens Axboe <jens.axboe@oracle.com> writes:

>> > Nice numbers! The patch looks good to me from a quick look, if you want
>> > I can throw it into the testing mix tomorrow and see what kind of
>> > improvements I see here. With performance increase of that magnitude, we
>> > should get it in sooner rather than later.
>> 
> So here's a pretty basic test. It does random reads from a bunch of
> devices, I tested both 4kb and 64kb block sizes. Queue depth used is 32
> for both cases, but note that this test uses a thread per device (so the
> queue depth is 32 per device). Results are averaged over 3 runs.
> slat/clat are the submission and completion latencies, they are in
> microseconds here.
[...]
> So pretty close, nothing earth shattering here. What the results above
> do not show is that the 4kb test runs very stable with your patch.
> Mainline fluctuates somewhat in the bandwidth, most likely due to the
> varying depth.

OK, I don't think this patch set will see further testing without
pulling it into someone's tree.  Jens, would you mind putting this into
your for-2.6.33 queue?  Let me know if you need me to resend the
patches.

Cheers,
Jeff


* Re: [patch v4 0/3] aio: implement request batching [more performance numbers]
  2009-10-27 16:16         ` Jeff Moyer
@ 2009-10-28  8:28           ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2009-10-28  8:28 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: zach.brown, linux-aio, Linux Kernel Mailing, Andrew Morton

On Tue, Oct 27 2009, Jeff Moyer wrote:
> Jens Axboe <jens.axboe@oracle.com> writes:
> 
> >> > Nice numbers! The patch looks good to me from a quick look, if you want
> >> > I can throw it into the testing mix tomorrow and see what kind of
> >> > improvements I see here. With performance increase of that magnitude, we
> >> > should get it in sooner rather than later.
> >> 
> > So here's a pretty basic test. It does random reads from a bunch of
> > devices, I tested both 4kb and 64kb block sizes. Queue depth used is 32
> > for both cases, but note that this test uses a thread per device (so the
> > queue depth is 32 per device). Results are averaged over 3 runs.
> > slat/clat are the submission and completion latencies, they are in
> > microseconds here.
> [...]
> > So pretty close, nothing earth shattering here. What the results above
> > do not show is that the 4kb test runs very stable with your patch.
> > Mainline fluctuates somewhat in the bandwidth, most likely due to the
> > varying depth.
> 
> OK, I don't think this patch set will see further testing without
> pulling it into someone's tree.  Jens, would you mind putting this into
> your for-2.6.33 queue?  Let me know if you need me to resend the
> patches.

Sure, I'll pull it in. I still have the patches, will ping you if it
doesn't apply cleanly.

-- 
Jens Axboe



