linux-fsdevel.vger.kernel.org archive mirror
* [RFC][PATCH] Make io_submit non-blocking
@ 2012-07-24 11:41 Ankit Jain
  2012-07-24 12:34 ` Rajat Sharma
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ankit Jain @ 2012-07-24 11:41 UTC (permalink / raw)
  To: Al Viro, bcrl; +Cc: linux-fsdevel, linux-aio, linux-kernel, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 9349 bytes --]


Currently, io_submit tries to execute the io requests on the
same thread, which could block for various reasons (e.g.
allocation of disk blocks). So, essentially, io_submit ends
up being a blocking call.
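
To see what that means in practice, here is a minimal userspace sketch
(illustrative only, not part of the patch) that times a single
io_submit of one O_DIRECT write with libaio; the fio "slat" numbers
below measure essentially this. Note it (over)writes the first 4k of
the given file:

    /* cc -o submit-lat submit-lat.c -laio -lrt; illustrative only */
    #define _GNU_SOURCE
    #include <libaio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        io_context_t ioctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        struct timespec t0, t1;
        void *buf;
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_RDWR | O_DIRECT)) < 0 ||
            io_setup(1, &ioctx) < 0 || posix_memalign(&buf, 4096, 4096))
            return 1;

        io_prep_pwrite(&cb, fd, buf, 4096, 0);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        io_submit(ioctx, 1, cbs);  /* may block on block allocation etc. */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("submit latency: %ld usec\n",
               (t1.tv_sec - t0.tv_sec) * 1000000 +
               (t1.tv_nsec - t0.tv_nsec) / 1000);

        io_getevents(ioctx, 1, 1, &ev, NULL);
        io_destroy(ioctx);
        return 0;
    }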

With this patch, io_submit prepares all the kiocbs, adds (kicks)
them to ctx->run_list in one go, and then schedules the workqueue.
The actual operations are not executed in io_submit's process
context, so it can return very quickly.

This run_list is processed either on a workqueue or in response to
an io_getevents call. This utilizes the existing retry infrastructure.
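
Condensed, the new submit-side queueing in do_io_submit() (from the
patch below) looks like:

    spin_lock_irq(&ctx->ctx_lock);
    for (i = 0; i < req_arr_cnt; i++) {
        struct kiocb *req = req_arr[i];
        /* mark kicked; queue on ctx->run_list unless already kicked */
        if (likely(!kiocbTryKick(req)))
            __queue_kicked_iocb(req);
        nr_submitted++;
    }
    if (likely(nr_submitted > 0))
        aio_queue_work(ctx);  /* aio_kick_handler will drain run_list */
    spin_unlock_irq(&ctx->ctx_lock);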

It uses override_creds/revert_creds to use the submitting process'
credentials when processing the iocb request from the workqueue. This
is required for proper support of quota and reserved block access.
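
Condensed from the patch below, the cred lifecycle is the usual
override/revert pattern:

    /* io_submit_one(): pin the submitter's creds with the request */
    req->submitter_cred = get_current_cred();

    /* aio_run_iocb(), possibly on a workqueue thread */
    old_cred = override_creds(iocb->submitter_cred);
    ret = retry(iocb);
    revert_creds(old_cred);

    /* __aio_put_req(): drop the reference at teardown */
    put_cred(req->submitter_cred);
    req->submitter_cred = NULL;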

Block plugging is currently done in io_submit, since most of the IO
was submitted from there. This patch moves it to aio_kick_handler
and aio_run_all_iocbs, where the IO now actually gets submitted.
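
For reference, the plugging pattern itself: bios submitted between
start and finish are held in a per-task list and pushed down to the
request queue in one batch (the plug is also flushed implicitly if
the task blocks). submit_batch() here is a hypothetical stand-in for
the __aio_run_iocbs() loops:

    struct blk_plug plug;

    blk_start_plug(&plug);
    submit_batch();         /* bios are only queued, per task */
    blk_finish_plug(&plug); /* one flush down to the request queue */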

All the tests were run with ext4.

I tested the patch with fio
 (fio rand-rw-disk.fio --max-jobs=2 --latency-log
 --bandwidth-log)

**Unpatched**
read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57

write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35

**Patched**
read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46 

write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27 

From above, it can be seen that submit latencies improve a lot with the
patch. The full fio results for the "old" (unpatched) and "new" (patched)
cases are attached, for both a ramdisk (*rd*) and a disk, along with the
corresponding fio job files.

Some variations I tried:

1. I tried submitting one iocb at a time (lock/unlock ctx->lock), which
had good performance on a regular disk, but when I tested with a
ramdisk (to simulate a very fast disk), performance was extremely bad.
Submitting all the iocbs from an io_submit in one go restored the
performance (latencies+bandwidth).

2. I earlier tried using queue_delayed_work with a 0 timeout; that
worsened the submit latencies a bit but improved bandwidth.

3. Also, I tried not using aio_queue_work from the io_submit call,
instead depending on an already scheduled one, or on the iocbs being
run when io_getevents gets called. This seemed to give improved
performance. But does this constitute a change of API semantics?

Signed-off-by: Ankit Jain <jankit@suse.de>

--
diff --git a/fs/aio.c b/fs/aio.c
index 71f613c..79801096b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -563,6 +563,11 @@ static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
 	req->ki_cancel = NULL;
 	req->ki_retry = NULL;
 
+	if (likely(req->submitter_cred)) {
+		put_cred(req->submitter_cred);
+		req->submitter_cred = NULL;
+	}
+
 	fput(req->ki_filp);
 	req->ki_filp = NULL;
 	really_put_req(ctx, req);
@@ -659,6 +664,7 @@ static ssize_t aio_run_iocb(struct kiocb *iocb)
 	struct kioctx	*ctx = iocb->ki_ctx;
 	ssize_t (*retry)(struct kiocb *);
 	ssize_t ret;
+	const struct cred *old_cred = NULL;
 
 	if (!(retry = iocb->ki_retry)) {
 		printk("aio_run_iocb: iocb->ki_retry = NULL\n");
@@ -703,12 +709,19 @@ static ssize_t aio_run_iocb(struct kiocb *iocb)
 		goto out;
 	}
 
+	if (iocb->submitter_cred)
+		/* setup creds */
+		old_cred = override_creds(iocb->submitter_cred);
+
 	/*
 	 * Now we are all set to call the retry method in async
 	 * context.
 	 */
 	ret = retry(iocb);
 
+	if (old_cred)
+		revert_creds(old_cred);
+
 	if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
 		/*
 		 * There's no easy way to restart the syscall since other AIO's
@@ -804,10 +817,14 @@ static void aio_queue_work(struct kioctx * ctx)
  */
 static inline void aio_run_all_iocbs(struct kioctx *ctx)
 {
+	struct blk_plug plug;
+
+	blk_start_plug(&plug);
 	spin_lock_irq(&ctx->ctx_lock);
 	while (__aio_run_iocbs(ctx))
 		;
 	spin_unlock_irq(&ctx->ctx_lock);
+	blk_finish_plug(&plug);
 }
 
 /*
@@ -825,13 +842,16 @@ static void aio_kick_handler(struct work_struct *work)
 	mm_segment_t oldfs = get_fs();
 	struct mm_struct *mm;
 	int requeue;
+	struct blk_plug plug;
 
 	set_fs(USER_DS);
 	use_mm(ctx->mm);
+	blk_start_plug(&plug);
 	spin_lock_irq(&ctx->ctx_lock);
 	requeue =__aio_run_iocbs(ctx);
 	mm = ctx->mm;
 	spin_unlock_irq(&ctx->ctx_lock);
+	blk_finish_plug(&plug);
  	unuse_mm(mm);
 	set_fs(oldfs);
 	/*
@@ -1506,12 +1526,14 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, struct kiocb_batch *batch,
-			 bool compat)
+			 bool compat, struct kiocb **req_entry)
 {
 	struct kiocb *req;
 	struct file *file;
 	ssize_t ret;
 
+	*req_entry = NULL;
+
 	/* enforce forwards compatibility on users */
 	if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2)) {
 		pr_debug("EINVAL: io_submit: reserve field set\n");
@@ -1537,6 +1559,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		fput(file);
 		return -EAGAIN;
 	}
+
 	req->ki_filp = file;
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		/*
@@ -1567,38 +1590,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	req->ki_left = req->ki_nbytes = iocb->aio_nbytes;
 	req->ki_opcode = iocb->aio_lio_opcode;
 
+	req->submitter_cred = get_current_cred();
+
 	ret = aio_setup_iocb(req, compat);
 
 	if (ret)
 		goto out_put_req;
 
-	spin_lock_irq(&ctx->ctx_lock);
-	/*
-	 * We could have raced with io_destroy() and are currently holding a
-	 * reference to ctx which should be destroyed. We cannot submit IO
-	 * since ctx gets freed as soon as io_submit() puts its reference.  The
-	 * check here is reliable: io_destroy() sets ctx->dead before waiting
-	 * for outstanding IO and the barrier between these two is realized by
-	 * unlock of mm->ioctx_lock and lock of ctx->ctx_lock.  Analogously we
-	 * increment ctx->reqs_active before checking for ctx->dead and the
-	 * barrier is realized by unlock and lock of ctx->ctx_lock. Thus if we
-	 * don't see ctx->dead set here, io_destroy() waits for our IO to
-	 * finish.
-	 */
-	if (ctx->dead) {
-		spin_unlock_irq(&ctx->ctx_lock);
-		ret = -EINVAL;
-		goto out_put_req;
-	}
-	aio_run_iocb(req);
-	if (!list_empty(&ctx->run_list)) {
-		/* drain the run list */
-		while (__aio_run_iocbs(ctx))
-			;
-	}
-	spin_unlock_irq(&ctx->ctx_lock);
-
 	aio_put_req(req);	/* drop extra ref to req */
+
+	*req_entry = req;
 	return 0;
 
 out_put_req:
@@ -1613,8 +1614,10 @@ long do_io_submit(aio_context_t ctx_id, long nr,
 	struct kioctx *ctx;
 	long ret = 0;
 	int i = 0;
-	struct blk_plug plug;
 	struct kiocb_batch batch;
+	struct kiocb **req_arr = NULL;
+	int nr_submitted = 0;
+	int req_arr_cnt = 0;
 
 	if (unlikely(nr < 0))
 		return -EINVAL;
@@ -1632,8 +1635,8 @@ long do_io_submit(aio_context_t ctx_id, long nr,
 	}
 
 	kiocb_batch_init(&batch, nr);
-
-	blk_start_plug(&plug);
+	req_arr = kmalloc(sizeof(struct kiocb *) * nr, GFP_KERNEL);
+	memset(req_arr, 0, sizeof(struct kiocb *) * nr);
 
 	/*
 	 * AKPM: should this return a partial result if some of the IOs were
@@ -1653,15 +1656,51 @@ long do_io_submit(aio_context_t ctx_id, long nr,
 			break;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, &tmp, &batch, compat);
+		ret = io_submit_one(ctx, user_iocb, &tmp, &batch, compat,
+				&req_arr[i]);
 		if (ret)
 			break;
+		req_arr_cnt++;
 	}
-	blk_finish_plug(&plug);
 
+	spin_lock_irq(&ctx->ctx_lock);
+	/*
+	 * We could have raced with io_destroy() and are currently holding a
+	 * reference to ctx which should be destroyed. We cannot submit IO
+	 * since ctx gets freed as soon as io_submit() puts its reference.  The
+	 * check here is reliable: io_destroy() sets ctx->dead before waiting
+	 * for outstanding IO and the barrier between these two is realized by
+	 * unlock of mm->ioctx_lock and lock of ctx->ctx_lock.  Analogously we
+	 * increment ctx->reqs_active before checking for ctx->dead and the
+	 * barrier is realized by unlock and lock of ctx->ctx_lock. Thus if we
+	 * don't see ctx->dead set here, io_destroy() waits for our IO to
+	 * finish.
+	 */
+	if (ctx->dead) {
+		spin_unlock_irq(&ctx->ctx_lock);
+		for (i = 0; i < req_arr_cnt; i++)
+			/* drop i/o ref to the req */
+			__aio_put_req(ctx, req_arr[i]);
+
+		ret = -EINVAL;
+		goto out;
+	}
+
+	for (i = 0; i < req_arr_cnt; i++) {
+		struct kiocb *req = req_arr[i];
+		if (likely(!kiocbTryKick(req)))
+			__queue_kicked_iocb(req);
+		nr_submitted++;
+	}
+	if (likely(nr_submitted > 0))
+		aio_queue_work(ctx);
+	spin_unlock_irq(&ctx->ctx_lock);
+
+out:
 	kiocb_batch_free(ctx, &batch);
+	kfree(req_arr);
 	put_ioctx(ctx);
-	return i ? i : ret;
+	return nr_submitted ? nr_submitted : ret;
 }
 
 /* sys_io_submit:
diff --git a/include/linux/aio.h b/include/linux/aio.h
index b1a520e..bcd6a5e 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -124,6 +124,8 @@ struct kiocb {
 	 * this is the underlying eventfd context to deliver events to.
 	 */
 	struct eventfd_ctx	*ki_eventfd;
+
+	const struct cred	*submitter_cred;
 };
 
 #define is_sync_kiocb(iocb)	((iocb)->ki_key == KIOCB_SYNC_KEY)

-- 
Ankit Jain
SUSE Labs

[-- Attachment #2: ext4-disk-new.log --]
[-- Type: text/x-log, Size: 2539 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2021: Tue Jul 24 15:37:25 2012
  read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
    slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46
    clat (msec): min=29 , max=619 , avg=134.15, stdev=53.04
     lat (msec): min=29 , max=619 , avg=134.15, stdev=53.04
    clat percentiles (msec):
     |  1.00th=[   63],  5.00th=[   78], 10.00th=[   87], 20.00th=[   98],
     | 30.00th=[  106], 40.00th=[  114], 50.00th=[  123], 60.00th=[  130],
     | 70.00th=[  143], 80.00th=[  163], 90.00th=[  198], 95.00th=[  231],
     | 99.00th=[  302], 99.50th=[  363], 99.90th=[  619], 99.95th=[  619],
     | 99.99th=[  619]
    bw (KB/s)  : min=  153, max=  761, per=100.00%, avg=497.88, stdev=128.35
  write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
    slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27
    clat (usec): min=97 , max=619552 , avg=126590.22, stdev=50338.80
     lat (usec): min=108 , max=619553 , avg=126592.26, stdev=50338.83
    clat percentiles (msec):
     |  1.00th=[   59],  5.00th=[   72], 10.00th=[   80], 20.00th=[   91],
     | 30.00th=[  100], 40.00th=[  108], 50.00th=[  116], 60.00th=[  125],
     | 70.00th=[  137], 80.00th=[  155], 90.00th=[  188], 95.00th=[  225],
     | 99.00th=[  293], 99.50th=[  334], 99.90th=[  578], 99.95th=[  619],
     | 99.99th=[  619]
    bw (KB/s)  : min=  127, max=  842, per=100.00%, avg=493.55, stdev=140.80
    lat (usec) : 100=0.01%, 250=0.01%
    lat (msec) : 50=0.25%, 100=26.26%, 250=70.61%, 500=2.69%, 750=0.19%
  cpu          : usr=0.07%, sys=1.25%, ctx=27318, majf=0, minf=24
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=25716/w=25484/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=102864KB, aggrb=493KB/s, minb=493KB/s, maxb=493KB/s, mint=208627msec, maxt=208627msec
  WRITE: io=101936KB, aggrb=488KB/s, minb=488KB/s, maxb=488KB/s, mint=208627msec, maxt=208627msec

Disk stats (read/write):
  sda: ios=25681/22647, merge=0/70, ticks=204770/98179832, in_queue=99369560, util=98.97%
fio rand-rw-disk-2.fio --output=/home/radical/src/play/ios-test/new-logs/ext4-disk-2-b73147d.log --max-jobs=2 --latency-log --bandwidth-log
b73147d remove unused ki_colln

[-- Attachment #3: ext4-disk-old.org --]
[-- Type: text/plain, Size: 2432 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 200MB)

random_rw: (groupid=0, jobs=1): err= 0: pid=2011: Tue Jul 24 01:00:02 2012
  read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
    slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
    clat (usec): min=294 , max=214745 , avg=102349.31, stdev=21957.26
     lat (msec): min=9 , max=225 , avg=108.92, stdev=22.21
    clat percentiles (msec):
     |  1.00th=[   54],  5.00th=[   68], 10.00th=[   76], 20.00th=[   84],
     | 30.00th=[   91], 40.00th=[   96], 50.00th=[  102], 60.00th=[  108],
     | 70.00th=[  114], 80.00th=[  121], 90.00th=[  131], 95.00th=[  141],
     | 99.00th=[  157], 99.50th=[  161], 99.90th=[  182], 99.95th=[  198],
     | 99.99th=[  210]
    bw (KB/s)  : min=  474, max=  817, per=99.90%, avg=603.38, stdev=47.87
  write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
    slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
    clat (usec): min=42 , max=221533 , avg=102260.25, stdev=21825.38
     lat (usec): min=85 , max=221542 , avg=102285.38, stdev=21825.25
    clat percentiles (msec):
     |  1.00th=[   53],  5.00th=[   69], 10.00th=[   76], 20.00th=[   85],
     | 30.00th=[   91], 40.00th=[   96], 50.00th=[  102], 60.00th=[  108],
     | 70.00th=[  114], 80.00th=[  121], 90.00th=[  131], 95.00th=[  139],
     | 99.00th=[  157], 99.50th=[  163], 99.90th=[  184], 99.95th=[  206],
     | 99.99th=[  215]
    bw (KB/s)  : min=  318, max=  936, per=99.86%, avg=606.14, stdev=100.63
    lat (usec) : 50=0.01%, 250=0.01%, 500=0.01%
    lat (msec) : 10=0.01%, 20=0.02%, 50=0.60%, 100=46.62%, 250=52.75%
  cpu          : usr=0.41%, sys=1.58%, ctx=27474, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=25530/w=25670/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=102120KB, aggrb=604KB/s, minb=604KB/s, maxb=604KB/s, mint=169006msec, maxt=169006msec
  WRITE: io=102680KB, aggrb=607KB/s, minb=607KB/s, maxb=607KB/s, mint=169006msec, maxt=169006msec

Disk stats (read/write):
  sda: ios=25533/4, merge=23/18, ticks=164781/112, in_queue=164823, util=97.51%

[-- Attachment #4: ext4-rd-new.log --]
[-- Type: text/x-log, Size: 2520 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 1700MB)

random_rw: (groupid=0, jobs=1): err= 0: pid=2002: Tue Jul 24 15:31:47 2012
  read : io=870504KB, bw=558373KB/s, iops=139593 , runt=  1559msec
    slat (usec): min=0 , max=32 , avg= 0.38, stdev= 0.52
    clat (usec): min=62 , max=597 , avg=114.05, stdev=18.96
     lat (usec): min=63 , max=599 , avg=114.49, stdev=19.03
    clat percentiles (usec):
     |  1.00th=[  103],  5.00th=[  105], 10.00th=[  106], 20.00th=[  107],
     | 30.00th=[  108], 40.00th=[  108], 50.00th=[  109], 60.00th=[  110],
     | 70.00th=[  115], 80.00th=[  123], 90.00th=[  126], 95.00th=[  129],
     | 99.00th=[  145], 99.50th=[  151], 99.90th=[  438], 99.95th=[  462],
     | 99.99th=[  580]
    bw (KB/s)  : min=550016, max=572568, per=100.00%, avg=560584.00, stdev=11342.49
  write: io=870296KB, bw=558240KB/s, iops=139559 , runt=  1559msec
    slat (usec): min=0 , max=65 , avg= 0.42, stdev= 0.53
    clat (usec): min=62 , max=595 , avg=113.45, stdev=18.91
     lat (usec): min=63 , max=597 , avg=113.93, stdev=18.99
    clat percentiles (usec):
     |  1.00th=[  103],  5.00th=[  104], 10.00th=[  105], 20.00th=[  106],
     | 30.00th=[  107], 40.00th=[  108], 50.00th=[  108], 60.00th=[  110],
     | 70.00th=[  115], 80.00th=[  122], 90.00th=[  125], 95.00th=[  129],
     | 99.00th=[  145], 99.50th=[  149], 99.90th=[  438], 99.95th=[  462],
     | 99.99th=[  564]
    bw (KB/s)  : min=547456, max=575144, per=100.00%, avg=559728.00, stdev=14109.21
    lat (usec) : 100=0.02%, 250=99.72%, 500=0.24%, 750=0.03%
  cpu          : usr=18.73%, sys=80.76%, ctx=175, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=217626/w=217574/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=870504KB, aggrb=558373KB/s, minb=558373KB/s, maxb=558373KB/s, mint=1559msec, maxt=1559msec
  WRITE: io=870296KB, aggrb=558239KB/s, minb=558239KB/s, maxb=558239KB/s, mint=1559msec, maxt=1559msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
fio rand-rw-rd.fio --output=/home/radical/src/play/ios-test/new-logs/ext4-rd-b73147d.log --max-jobs=2 --latency-log --bandwidth-log
b73147d remove unused ki_colln

[-- Attachment #5: ext4-rd-old.org --]
[-- Type: text/plain, Size: 2392 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 1700MB)

random_rw: (groupid=0, jobs=1): err= 0: pid=1999: Tue Jul 24 00:55:56 2012
  read : io=869872KB, bw=489517KB/s, iops=122379 , runt=  1777msec
    slat (usec): min=2 , max=78 , avg= 3.46, stdev= 0.87
    clat (usec): min=16 , max=604 , avg=126.83, stdev=16.35
     lat (usec): min=19 , max=617 , avg=130.40, stdev=16.74
    clat percentiles (usec):
     |  1.00th=[  119],  5.00th=[  121], 10.00th=[  122], 20.00th=[  123],
     | 30.00th=[  124], 40.00th=[  124], 50.00th=[  125], 60.00th=[  126],
     | 70.00th=[  127], 80.00th=[  129], 90.00th=[  133], 95.00th=[  135],
     | 99.00th=[  149], 99.50th=[  155], 99.90th=[  490], 99.95th=[  524],
     | 99.99th=[  548]
    bw (KB/s)  : min=487648, max=493224, per=100.00%, avg=490160.00, stdev=2828.69
  write: io=870928KB, bw=490111KB/s, iops=122527 , runt=  1777msec
    slat (usec): min=2 , max=171 , avg= 2.78, stdev= 0.90
    clat (usec): min=23 , max=606 , avg=126.77, stdev=16.22
     lat (usec): min=26 , max=616 , avg=129.65, stdev=16.57
    clat percentiles (usec):
     |  1.00th=[  118],  5.00th=[  120], 10.00th=[  121], 20.00th=[  123],
     | 30.00th=[  124], 40.00th=[  124], 50.00th=[  125], 60.00th=[  126],
     | 70.00th=[  127], 80.00th=[  129], 90.00th=[  133], 95.00th=[  135],
     | 99.00th=[  149], 99.50th=[  155], 99.90th=[  490], 99.95th=[  516],
     | 99.99th=[  548]
    bw (KB/s)  : min=484072, max=496464, per=100.00%, avg=490920.00, stdev=6298.07
    lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=99.80%, 500=0.11%
    lat (usec) : 750=0.08%
  cpu          : usr=24.21%, sys=75.34%, ctx=185, majf=0, minf=22
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=217468/w=217732/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=869872KB, aggrb=489517KB/s, minb=489517KB/s, maxb=489517KB/s, mint=1777msec, maxt=1777msec
  WRITE: io=870928KB, aggrb=490111KB/s, minb=490111KB/s, maxb=490111KB/s, mint=1777msec, maxt=1777msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

[-- Attachment #6: rand-rw-disk.io --]
[-- Type: text/plain, Size: 78 bytes --]

[random_rw]
rw=randrw
size=200m
directory=/misc/rd
ioengine=libaio
iodepth=32

[-- Attachment #7: rand-rw-rd.fio --]
[-- Type: text/plain, Size: 78 bytes --]

[random_rw]
rw=randrw
size=1700m
directory=/mnt/rd
ioengine=libaio
iodepth=32

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 11:41 [RFC][PATCH] Make io_submit non-blocking Ankit Jain
@ 2012-07-24 12:34 ` Rajat Sharma
  2012-07-24 20:27   ` Theodore Ts'o
  2012-07-24 22:31 ` Dave Chinner
  2012-07-24 22:37 ` Zach Brown
  2 siblings, 1 reply; 11+ messages in thread
From: Rajat Sharma @ 2012-07-24 12:34 UTC (permalink / raw)
  To: Ankit Jain
  Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara

Hi Ankit,

On Tue, Jul 24, 2012 at 5:11 PM, Ankit Jain <jankit@suse.de> wrote:
>
>
> Currently, io_submit tries to execute the io requests on the
> same thread, which could block for various reasons (e.g.
> allocation of disk blocks). So, essentially, io_submit ends
> up being a blocking call.
>

Ideally the filesystem should take care of it, e.g. by deferring such
time-consuming allocations and returning -EIOCBQUEUED immediately. But
have you seen such cases?

> With this patch, io_submit prepares all the kiocbs, adds (kicks)
> them to ctx->run_list in one go, and then schedules the workqueue.
> The actual operations are not executed in io_submit's process
> context, so it can return very quickly.
>

With lots of application threads firing continuous IOs, workqueue
threads might become a bottleneck and you might eventually have to
develop priority scheduling. This workqueue was originally designed
for IO retries, which is an error path; now the consumers of the
workqueue might easily increase by 100x.

> This run_list is processed either on a workqueue or in response to
> an io_getevents call. This utilizes the existing retry infrastructure.
>
> It uses override_creds/revert_creds to use the submitting process'
> credentials when processing the iocb request from the workqueue. This
> is required for proper support of quota and reserved block access.
>
> Block plugging is currently done in io_submit, since most of the IO
> was submitted from there. This patch moves it to aio_kick_handler
> and aio_run_all_iocbs, where the IO now actually gets submitted.
>
> All the tests were run with ext4.
>
> I tested the patch with fio
>  (fio rand-rw-disk.fio --max-jobs=2 --latency-log
>  --bandwidth-log)
>
> **Unpatched**
> read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
> slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
>
> write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
> slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
>
> **Patched**
> read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
> slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46
>
> write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
> slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27
>
> From above, it can be seen that submit latencies improve a lot with the
> patch. The full fio results for the "old" (unpatched) and "new" (patched)
> cases are attached, for both a ramdisk (*rd*) and a disk, along with the
> corresponding fio job files.
>
> Some variations I tried:
>
> 1. I tried submitting one iocb at a time (lock/unlock ctx->lock), which
> had good performance on a regular disk, but when I tested with a
> ramdisk (to simulate a very fast disk), performance was extremely bad.
> Submitting all the iocbs from an io_submit in one go restored the
> performance (latencies+bandwidth).
>
> 2. I earlier tried using queue_delayed_work with a 0 timeout; that
> worsened the submit latencies a bit but improved bandwidth.
>
> 3. Also, I tried not using aio_queue_work from the io_submit call,
> instead depending on an already scheduled one, or on the iocbs being
> run when io_getevents gets called. This seemed to give improved
> performance. But does this constitute a change of API semantics?
>

I once observed latency issues with aio_queue_work with a small
number of threads when I was trying to resubmit IOs on a ramdisk, as
this function introduces a mandatory delay if nobody is waiting on
the iocb. The latencies were high, but with a large number of threads
the effect was not prominent.

[snip patch]

--
Rajat Sharma


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 12:34 ` Rajat Sharma
@ 2012-07-24 20:27   ` Theodore Ts'o
  0 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2012-07-24 20:27 UTC (permalink / raw)
  To: Rajat Sharma
  Cc: Ankit Jain, Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel,
	Jan Kara

On Tue, Jul 24, 2012 at 06:04:23PM +0530, Rajat Sharma wrote:
> >
> > Currently, io_submit tries to execute the io requests on the
> > same thread, which could block for various reasons (e.g.
> > allocation of disk blocks). So, essentially, io_submit ends
> > up being a blocking call.
> 
> Ideally the filesystem should take care of it, e.g. by deferring such
> time-consuming allocations and returning -EIOCBQUEUED immediately. But
> have you seen such cases?

Oh, it happens all the time if you are using AIO.  If the file system
needs to read or write any metadata block, AIO can become distinctly
non-"A".  The workaround that I've chosen is to create a way to cache
the information needed for the bmap() operation, triggered via an
ioctl() issued at open time, so that this is not an issue, but that
only works if the file is pre-allocated, and there is no need to do
any block allocations.

It's all very well and good to say, "the file system should handle
it", but that just pushes the problem onto the file system.  And since
you need to potentially issue block I/O requests, which you can't do
from an interrupt context (i.e., a block I/O completion handler), you
really do need to create a workqueue in order to make things work.

If you do it in the fs/direct_io.c layer, at least that way you can
solve the problem once for all file systems....

> With lots of application threads firing continuous IOs, workqueue
> threads might become a bottleneck and you might eventually have to
> develop priority scheduling. This workqueue was originally designed
> for IO retries, which is an error path; now the consumers of the
> workqueue might easily increase by 100x.

Yes, you definitely need to throttle how many AIOs can be allowed to be
outstanding, either globally or on a per-superblock/process/user/cgroup
basis, and return EAGAIN if there are too many outstanding requests.
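
Roughly something like this at submit time (a sketch only;
aio_max_inflight is a made-up knob, and ctx->reqs_active is the
existing per-ioctx counter, checked under ctx->ctx_lock):

    if (ctx->reqs_active > aio_max_inflight)
        return -EAGAIN;   /* let the caller resubmit later */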

Speaking of cgroups, one of the other challenges with running the AIO
out of a workqueue is trying to respect cgroup restrictions.  In
particular, the io-throttle cgroup (which is needed to provide
Proportional I/O support), but also the memory cgroup.

All of these complications are why I decided to simply go with the "pin
metadata" approach, since I didn't need to worry (at least initially)
about the allocating write case.  (These patches to ext4 haven't yet
been published upstream, mainly because they need a lot of cleanup
work and I haven't had time to do that cleanup; my intention is to get
the "big extents" patchset upstream, though.)

						- Ted


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 11:41 [RFC][PATCH] Make io_submit non-blocking Ankit Jain
  2012-07-24 12:34 ` Rajat Sharma
@ 2012-07-24 22:31 ` Dave Chinner
  2012-07-24 22:50   ` Christoph Hellwig
  2012-07-25 20:12   ` Ankit Jain
  2012-07-24 22:37 ` Zach Brown
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2012-07-24 22:31 UTC (permalink / raw)
  To: Ankit Jain
  Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara

On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
> 
> Currently, io_submit tries to execute the io requests on the
> same thread, which could block for various reasons (e.g.
> allocation of disk blocks). So, essentially, io_submit ends
> up being a blocking call.
> 
> With this patch, io_submit prepares all the kiocbs, adds (kicks)
> them to ctx->run_list in one go, and then schedules the workqueue.
> The actual operations are not executed in io_submit's process
> context, so it can return very quickly.
> 
> This run_list is processed either on a workqueue or in response to
> an io_getevents call. This utilizes the existing retry infrastructure.
> 
> It uses override_creds/revert_creds to use the submitting process'
> credentials when processing the iocb request from the workqueue. This
> is required for proper support of quota and reserved block access.
> 
> Block plugging is currently done in io_submit, since most of the IO
> was submitted from there. This patch moves it to aio_kick_handler
> and aio_run_all_iocbs, where the IO now actually gets submitted.
> 
> All the tests were run with ext4.
> 
> I tested the patch with fio
>  (fio rand-rw-disk.fio --max-jobs=2 --latency-log
>  --bandwidth-log)
> 
> **Unpatched**
> read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
> slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57

Hmmm, I had to check the numbers twice - that's only 600KB/s.

Perhaps you need to test on something more than a single piece of
spinning rust. Optimising AIO for SSD rates (say 100k 4k write IOPS)
is probably more relevant to the majority of AIO users....

> write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
> slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
> 
> **Patched**
> read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
> slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46 
> 
> write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
> slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27 

So you made ext4 20% slower at random 4k writes with worst case
latencies only improving by about 30%. That, I think, is a
non-starter....

Also, you added a memory allocation in the io submit code. Worst
case latency will still be effectively undefined - what happens to
latencies if you generate memory pressure while the test is running?

FWIW, if you are going to change generic code, you need to present
results for other filesystems as well (xfs, btrfs are typical), as
they may not have the same problems as ext4 or react the same way to
your change. The result might simply be "it is 20% slower"....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 11:41 [RFC][PATCH] Make io_submit non-blocking Ankit Jain
  2012-07-24 12:34 ` Rajat Sharma
  2012-07-24 22:31 ` Dave Chinner
@ 2012-07-24 22:37 ` Zach Brown
  2012-07-25 20:17   ` Ankit Jain
  2 siblings, 1 reply; 11+ messages in thread
From: Zach Brown @ 2012-07-24 22:37 UTC (permalink / raw)
  To: Ankit Jain
  Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara

On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
> 
> Currently, io_submit tries to execute the io requests on the
> same thread, which could block for various reasons (e.g.
> allocation of disk blocks). So, essentially, io_submit ends
> up being a blocking call.

Yup, sadly that's how it's built.  A blocking submission phase that
returns once the completion no longer needs the submitter's context.  It
happens to mostly work for O_DIRECT block IO most of the time.

> With this patch, io_submit prepares all the kiocbs, adds (kicks)
> them to ctx->run_list in one go, and then schedules the workqueue.
> The actual operations are not executed in io_submit's process
> context, so it can return very quickly.

Strong nack; this isn't safe without having done the work to ensure that
all the task_struct references under the f_op->aio_*() paths won't be
horribly confused to find a kernel thread instead of the process that
called io_submit().

The one-off handling of the submitter's cred is an indication that
there might be other cases to worry about :).

> 3. Also, I tried not using aio_queue_work from the io_submit call,
> instead depending on an already scheduled one, or on the iocbs being
> run when io_getevents gets called. This seemed to give improved
> performance. But does this constitute a change of API semantics?

You can't rely on io_getevents() being called for forward progress.  It's
perfectly reasonable for a task to wait for io completion by polling an
eventfd that aio_complete() notifies, for instance.
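
Roughly like this, with libaio (a sketch, not from the patch; it reads
the first 4k of the given file):

    /* cc -o efd-aio efd-aio.c -laio */
    #define _GNU_SOURCE
    #include <libaio.h>
    #include <sys/eventfd.h>
    #include <poll.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        io_context_t ioctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        struct pollfd pfd;
        uint64_t n;
        void *buf;
        int fd, efd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY | O_DIRECT)) < 0 ||
            (efd = eventfd(0, 0)) < 0 || io_setup(1, &ioctx) < 0 ||
            posix_memalign(&buf, 4096, 4096))
            return 1;

        io_prep_pread(&cb, fd, buf, 4096, 0);
        io_set_eventfd(&cb, efd);        /* sets IOCB_FLAG_RESFD */
        io_submit(ioctx, 1, cbs);

        pfd.fd = efd;
        pfd.events = POLLIN;
        poll(&pfd, 1, -1);               /* aio_complete() signals efd */
        read(efd, &n, sizeof(n));        /* n completions since last read */
        io_getevents(ioctx, 0, 1, &ev, NULL);  /* min_nr=0: just reap */
        return 0;
    }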

- z


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 22:31 ` Dave Chinner
@ 2012-07-24 22:50   ` Christoph Hellwig
  2012-07-24 23:08     ` Zach Brown
  2012-07-26 19:52     ` Ankit Jain
  2012-07-25 20:12   ` Ankit Jain
  1 sibling, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2012-07-24 22:50 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Ankit Jain, Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel,
	Jan Kara

On Wed, Jul 25, 2012 at 08:31:10AM +1000, Dave Chinner wrote:
> FWIW, if you are going to change generic code, you need to present
> results for other filesystems as well (xfs, btrfs are typical), as
> they may not have the same problems as ext4 or react the same way to
> your change. The result might simply be "it is 20% slower"....

And most importantly block devices, as they are one of the biggest
use cases of AIO.  With an almost no-op get_blocks callback I can't
see how this change would provide any gain there.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 22:50   ` Christoph Hellwig
@ 2012-07-24 23:08     ` Zach Brown
  2012-07-26 19:52     ` Ankit Jain
  1 sibling, 0 replies; 11+ messages in thread
From: Zach Brown @ 2012-07-24 23:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Ankit Jain, Al Viro, bcrl, linux-fsdevel, linux-aio,
	linux-kernel, Jan Kara

> And most importantly block devices, as they are one of the biggest
> use cases of AIO.  With an almost no-op get_blocks callback I can't
> see how this change would provide any gain there.

Historically we'd often see submission stuck waiting for requests.
Tasks often try to submit way more aio than the block layer is happy to
have in flight.

Dunno if that's still a problem these days.

- z


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 22:31 ` Dave Chinner
  2012-07-24 22:50   ` Christoph Hellwig
@ 2012-07-25 20:12   ` Ankit Jain
  1 sibling, 0 replies; 11+ messages in thread
From: Ankit Jain @ 2012-07-25 20:12 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 4446 bytes --]

On 07/25/2012 04:01 AM, Dave Chinner wrote:
> On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
[snip]
>> **Unpatched**
>> read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
>> slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
> 
> Hmmm, I had to check the numbers twice - that's only 600KB/s.
> 
> Perhaps you need to test on something more than a single piece of
> spinning rust. Optimising AIO for SSD rates (say 100k 4k write IOPS)
> is probably more relevant to the majority of AIO users....

I tested with a ramdisk to "simulate" a fast disk and had attached the
results. I'll try to get hold of an SSD and then test with that also.
Meanwhile, I ran the tests again with ext3/ext4/xfs/btrfs; I'm not sure
what I screwed up in that previous test, but the numbers look proper
(as I was getting in my earlier testing) now:

For disk, I tested on a separate partition formatted with the fs, and
then ran fio on it, with 1 job. Here "Old" is 3.5-rc7 (918227b).

------ disk -------
====== ext3 ======
                                       submit latencies(usec)
      	B/w       iops   runtime     min  max     avg   std dev

ext3-read :
Old:  453352 B/s  110   231050msec    3  283048  170.28 5183.28
New:  451298 B/s  110   232555msec    0     444    8.18    7.95
ext3-write:
Old:  454309 B/s  110   231050msec    2  304614  232.72 6549.82
New:  450488 B/s  109   232555msec    0     233    7.94    7.23

====== ext4 ======
ext4-read :
Old:  459824 B/s  112   228635msec    2  260051  121.40 3569.78
New:  422700 B/s  103   247097msec    0     165    8.18    7.87
ext4-write:
Old:  457424 B/s  111   228635msec    3  312958  166.75 4616.58
New:  426015 B/s  104   247097msec    0     169    8.00    8.08

====== xfs ======
xfs-read :
Old:  467330 B/s  114   224516msec    3     272   46.45   25.35
New:  417049 B/s  101   252262msec    0     165    7.84    7.87
xfs-write:
Old:  466746 B/s  113   224516msec    3     265   52.52   28.13
New:  414289 B/s  101   252262msec    0     143    7.58    7.66

====== btrfs ======
btrfs-read :
Old:  1027.1KB/s  256    99918msec    5   84457   62.15  527.24
New:  1054.5KB/s  263    97542msec    0     121    9.72    7.05
btrfs-write:
Old:  1021.8KB/s  255    99918msec    10 139473   84.96  899.99
New:  1045.2KB/s  261    97542msec    0     248    9.55    7.02

These are the figures with a ramdisk:

------ ramdisk -------
====== ext3 ======
                                         submit latencies (usec)
        B/w       iops       runtime    min  max   avg  std dev

ext3-read :
Old:  430312KB/s  107577     2026msec    1  7072   3.85 15.17
New:  491251KB/s  122812     1772msec    0    22   0.39  0.52
ext3-write:
Old:  428918KB/s  107229     2026msec    2    61   3.46  0.85
New:  491142KB/s  122785     1772msec    0    62   0.43  0.55

====== ext4 ======
ext4-read :
Old:  466132KB/s  116532     1869msec    2   133   3.66  1.04
New:  542337KB/s  135584     1607msec    0    67   0.40  0.54
ext4-write:
Old:  465276KB/s  116318     1869msec    2   127   2.96  0.94
New:  540923KB/s  135230     1607msec    0    73   0.43  0.55

====== xfs ======
xfs-read :
Old:  485556KB/s  121389     1794msec    2   160   3.58  1.22
New:  581477KB/s  145369     1495msec    0    19   0.39  0.51
xfs-write:
Old:  484789KB/s  121197     1794msec    1    87   2.68  0.99
New:  582938KB/s  145734     1495msec    0    56   0.43  0.55

====== btrfs ======
I had trouble with btrfs on a ramdisk though: it complained about space
during preallocation. This was with a 4GB ramdisk and fio set to write a
1700MB file, so these numbers are from that partial run. Btrfs ran fine
on a regular disk though.

btrfs-read :
Old:  107519KB/s  26882     2579msec    13  1492  17.03  9.23
New:  109878KB/s  27469     4665msec    0     29   0.45  0.55
btrfs-write:
Old:  108047KB/s  27020     2579msec    1  64963  17.21 823.88
New:  109413KB/s  27357     4665msec    0     32   0.48   0.56

Also, I dropped caches ("echo 3 > /proc/sys/vm/drop_caches") and synced
before running each test. All the fio log files are attached.

Any suggestions on how I might test this better, other than the SSD
suggestion, of course?

[snip]
> Also, you added a memory allocation in the io submit code. Worst
> case latency will still be effectively undefined - what happens to
> latencies if you generate memory pressure while the test is running?

I'll try to fix this.
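
One option might be a small on-stack array with a fallback allocation
only for large submissions, similar in spirit to UIO_FASTIOV in the
readv/writev path (just a sketch; AIO_SUBMIT_BATCH is a made-up
constant):

    struct kiocb *req_stack[AIO_SUBMIT_BATCH], **req_arr = req_stack;

    if (nr > AIO_SUBMIT_BATCH) {
        req_arr = kzalloc(sizeof(struct kiocb *) * nr, GFP_KERNEL);
        if (!req_arr)
            return -EAGAIN;
    }
    /* ... */
    if (req_arr != req_stack)
        kfree(req_arr);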

-- 
Ankit Jain
SUSE Labs

[-- Attachment #2: fio-logs.tgz --]
[-- Type: application/x-compressed-tar, Size: 11198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 22:37 ` Zach Brown
@ 2012-07-25 20:17   ` Ankit Jain
  0 siblings, 0 replies; 11+ messages in thread
From: Ankit Jain @ 2012-07-25 20:17 UTC (permalink / raw)
  To: Zach Brown
  Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara

On 07/25/2012 04:07 AM, Zach Brown wrote:
> On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
[snip]
>> With this patch, io_submit prepares all the kiocbs, adds (kicks)
>> them to ctx->run_list in one go, and then schedules the workqueue.
>> The actual operations are not executed in io_submit's process
>> context, so it can return very quickly.
> 
> Strong nack; this isn't safe without having done the work to ensure that
> all the task_struct references under the f_op->aio_*() paths won't be
> horribly confused to find a kernel thread instead of the process that
> called io_submit().
> 
> The one-off handling of the submitter's cred is an indication that
> there might be other cases to worry about :).

Makes sense, I will try to look into this.

>> 3. Also, I tried not using aio_queue_work from the io_submit call,
>> instead depending on an already scheduled one, or on the iocbs being
>> run when io_getevents gets called. This seemed to give improved
>> performance. But does this constitute a change of API semantics?
> 
> You can't rely on io_getevents() being called for forward progress.  It's
> perfectly reasonable for a task to wait for io completion by polling an
> eventfd that aio_complete() notifies, for instance.

Ah okay, didn't realize that.

Thanks,
-- 
Ankit Jain
SUSE Labs



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-24 22:50   ` Christoph Hellwig
  2012-07-24 23:08     ` Zach Brown
@ 2012-07-26 19:52     ` Ankit Jain
  2012-07-26 21:43       ` Zach Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Ankit Jain @ 2012-07-26 19:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Al Viro, bcrl, linux-fsdevel, linux-aio,
	linux-kernel, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 1921 bytes --]

On 07/25/2012 04:20 AM, Christoph Hellwig wrote:
> On Wed, Jul 25, 2012 at 08:31:10AM +1000, Dave Chinner wrote:
>> FWIW, if you are going to change generic code, you need to present
>> results for other filesystems as well (xfs, btrfs are typical), as
>> they may not have the same problems as ext4 or react the same way to
>> your change. The result might simply be "it is 20% slower"....
> 
> And most importantly block devices, as they are one of the biggest
> use cases of AIO.  With an almost no-op get_blocks callback I can't
> see how this change would provide any gain there.

I tried running fio against a block device, a disk partition and a
ramdisk, though with a single job. For disks, bandwidth stays nearly
the same while submit latencies get better, and for the ramdisk,
bandwidth also improves. I should probably be doing better tests; any
suggestions on what or how I can test? For block devices, if the patch
at least doesn't make things worse, would that be good enough?

------ disk -------
                                      submit latencies(usec)
       	B/w       iops   runtime     min  max   avg  std dev
Read :
Old:  417335 B/s  101   252668msec     4  231  40.03  21.66
New:  419099 B/s  102   251282msec     0  169   8.20   6.95

Write:
Old:  412667 B/s  100   252668msec     3  272  47.65  24.58
New:  415481 B/s  101   251282msec     0  134   7.95   7.11

------ ramdisk -------
                                      submit latencies(usec)
       	B/w       iops      runtime   min  max   avg  std dev
Read:
Old:  708235KB/s  177058   1227msec     1   51   1.61  0.72
New:  822157KB/s  205539   1059msec     0   14   0.38  0.52

Write:
Old:  710510KB/s  177627   1227msec     2   46   2.33  0.81
New:  821658KB/s  205414   1059msec     0   24   0.40  0.53

Full fio results are attached, and I dropped cache before running
the tests.

-- 
Ankit Jain
SUSE Labs

[-- Attachment #2: raw-disk-new.log --]
[-- Type: text/x-log, Size: 2601 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2109: Thu Jul 26 17:14:55 2012
  read : io=102844KB, bw=419099 B/s, iops=102 , runt=251282msec
    slat (usec): min=0 , max=169 , avg= 8.20, stdev= 6.95
    clat (usec): min=335 , max=3356.7K, avg=255054.47, stdev=158234.29
     lat (usec): min=342 , max=3356.7K, avg=255063.32, stdev=158234.33
    clat percentiles (msec):
     |  1.00th=[    8],  5.00th=[   50], 10.00th=[   84], 20.00th=[  130],
     | 30.00th=[  169], 40.00th=[  204], 50.00th=[  237], 60.00th=[  269],
     | 70.00th=[  306], 80.00th=[  351], 90.00th=[  437], 95.00th=[  529],
     | 99.00th=[  791], 99.50th=[  914], 99.90th=[ 1237], 99.95th=[ 1483],
     | 99.99th=[ 2073]
    bw (KB/s)  : min=  111, max=  646, per=100.00%, avg=410.90, stdev=84.69
  write: io=101956KB, bw=415481 B/s, iops=101 , runt=251282msec
    slat (usec): min=0 , max=134 , avg= 7.95, stdev= 7.11
    clat (usec): min=189 , max=928209 , avg=58138.79, stdev=76776.72
     lat (usec): min=194 , max=928221 , avg=58147.37, stdev=76776.86
    clat percentiles (usec):
     |  1.00th=[  498],  5.00th=[  828], 10.00th=[ 1624], 20.00th=[ 4960],
     | 30.00th=[12352], 40.00th=[22144], 50.00th=[33536], 60.00th=[46848],
     | 70.00th=[63232], 80.00th=[90624], 90.00th=[148480], 95.00th=[203776],
     | 99.00th=[370688], 99.50th=[460800], 99.90th=[643072], 99.95th=[716800],
     | 99.99th=[831488]
    bw (KB/s)  : min=   31, max=  864, per=100.00%, avg=408.11, stdev=111.34
    lat (usec) : 250=0.02%, 500=0.54%, 750=1.27%, 1000=1.51%
    lat (msec) : 2=2.39%, 4=3.60%, 10=4.63%, 20=5.96%, 50=13.51%
    lat (msec) : 100=14.18%, 250=27.95%, 500=21.04%, 750=2.78%, 1000=0.46%
    lat (msec) : 2000=0.15%, >=2000=0.01%
  cpu          : usr=0.51%, sys=1.52%, ctx=52135, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=25711/w=25489/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=102844KB, aggrb=409KB/s, minb=409KB/s, maxb=409KB/s, mint=251282msec, maxt=251282msec
  WRITE: io=101956KB, aggrb=405KB/s, minb=405KB/s, maxb=405KB/s, mint=251282msec, maxt=251282msec
fio rand-rw-disk-2-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/ad6d29a/raw-disk-2-raw-ad6d29a.log --max-jobs=2 --latency-log --bandwidth-log
ad6d29a sent upstream

[-- Attachment #3: raw-disk-old.log --]
[-- Type: text/x-log, Size: 2663 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2117: Thu Jul 26 17:53:41 2012
  read : io=102976KB, bw=417335 B/s, iops=101 , runt=252668msec
    slat (usec): min=4 , max=231 , avg=40.03, stdev=21.66
    clat (usec): min=236 , max=4075.6K, avg=254175.39, stdev=158853.64
     lat (usec): min=339 , max=4075.7K, avg=254216.22, stdev=158853.33
    clat percentiles (msec):
     |  1.00th=[    7],  5.00th=[   51], 10.00th=[   85], 20.00th=[  131],
     | 30.00th=[  169], 40.00th=[  204], 50.00th=[  237], 60.00th=[  269],
     | 70.00th=[  306], 80.00th=[  351], 90.00th=[  433], 95.00th=[  529],
     | 99.00th=[  766], 99.50th=[  906], 99.90th=[ 1270], 99.95th=[ 1500],
     | 99.99th=[ 3261]
    bw (KB/s)  : min=   83, max=  624, per=100.00%, avg=409.49, stdev=91.68
  write: io=101824KB, bw=412667 B/s, iops=100 , runt=252668msec
    slat (usec): min=3 , max=272 , avg=47.65, stdev=24.58
    clat (usec): min=139 , max=1248.8K, avg=60442.70, stdev=82817.74
     lat (usec): min=198 , max=1248.9K, avg=60491.15, stdev=82817.11
    clat percentiles (usec):
     |  1.00th=[  438],  5.00th=[  812], 10.00th=[ 1704], 20.00th=[ 5280],
     | 30.00th=[13376], 40.00th=[23168], 50.00th=[34560], 60.00th=[47872],
     | 70.00th=[66048], 80.00th=[91648], 90.00th=[150528], 95.00th=[209920],
     | 99.00th=[403456], 99.50th=[505856], 99.90th=[798720], 99.95th=[897024],
     | 99.99th=[1073152]
    bw (KB/s)  : min=    7, max=  808, per=100.00%, avg=405.72, stdev=121.51
    lat (usec) : 250=0.03%, 500=0.79%, 750=1.35%, 1000=1.21%
    lat (msec) : 2=2.29%, 4=3.59%, 10=4.36%, 20=5.90%, 50=13.45%
    lat (msec) : 100=14.37%, 250=28.08%, 500=21.27%, 750=2.73%, 1000=0.42%
    lat (msec) : 2000=0.17%, >=2000=0.01%
  cpu          : usr=0.54%, sys=1.44%, ctx=52211, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=25744/w=25456/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=102976KB, aggrb=407KB/s, minb=407KB/s, maxb=407KB/s, mint=252668msec, maxt=252668msec
  WRITE: io=101824KB, aggrb=402KB/s, minb=402KB/s, maxb=402KB/s, mint=252668msec, maxt=252668msec
fio rand-rw-disk-2-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/918227b/raw-disk-2-raw-918227b.log --max-jobs=2 --latency-log --bandwidth-log
918227b Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6

[-- Attachment #4: raw-rd-new.log --]
[-- Type: text/x-log, Size: 2477 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2179: Thu Jul 26 17:14:57 2012
  read : io=870664KB, bw=822157KB/s, iops=205539 , runt=  1059msec
    slat (usec): min=0 , max=14 , avg= 0.38, stdev= 0.52
    clat (usec): min=44 , max=294 , avg=76.67, stdev=12.77
     lat (usec): min=45 , max=296 , avg=77.10, stdev=12.84
    clat percentiles (usec):
     |  1.00th=[   70],  5.00th=[   71], 10.00th=[   72], 20.00th=[   73],
     | 30.00th=[   74], 40.00th=[   74], 50.00th=[   75], 60.00th=[   76],
     | 70.00th=[   76], 80.00th=[   77], 90.00th=[   79], 95.00th=[   86],
     | 99.00th=[   97], 99.50th=[  107], 99.90th=[  255], 99.95th=[  266],
     | 99.99th=[  286]
    bw (KB/s)  : min=819368, max=826656, per=100.00%, avg=823012.00, stdev=5153.39
  write: io=870136KB, bw=821658KB/s, iops=205414 , runt=  1059msec
    slat (usec): min=0 , max=24 , avg= 0.40, stdev= 0.53
    clat (usec): min=42 , max=292 , avg=77.34, stdev=12.81
     lat (usec): min=43 , max=293 , avg=77.79, stdev=12.89
    clat percentiles (usec):
     |  1.00th=[   70],  5.00th=[   72], 10.00th=[   73], 20.00th=[   74],
     | 30.00th=[   74], 40.00th=[   75], 50.00th=[   76], 60.00th=[   76],
     | 70.00th=[   77], 80.00th=[   78], 90.00th=[   80], 95.00th=[   87],
     | 99.00th=[   98], 99.50th=[  107], 99.90th=[  262], 99.95th=[  270],
     | 99.99th=[  286]
    bw (KB/s)  : min=819368, max=825328, per=100.00%, avg=822348.00, stdev=4214.36
    lat (usec) : 50=0.01%, 100=99.35%, 250=0.41%, 500=0.24%
  cpu          : usr=24.76%, sys=74.76%, ctx=114, majf=0, minf=24
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=217666/w=217534/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=870664KB, aggrb=822156KB/s, minb=822156KB/s, maxb=822156KB/s, mint=1059msec, maxt=1059msec
  WRITE: io=870136KB, aggrb=821658KB/s, minb=821658KB/s, maxb=821658KB/s, mint=1059msec, maxt=1059msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
fio rand-rw-rd-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/ad6d29a/raw-rd-raw-ad6d29a.log --max-jobs=2 --latency-log --bandwidth-log
ad6d29a sent upstream

[-- Attachment #5: raw-rd-old.log --]
[-- Type: text/x-log, Size: 2548 bytes --]

random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2192: Thu Jul 26 17:53:52 2012
  read : io=869004KB, bw=708235KB/s, iops=177058 , runt=  1227msec
    slat (usec): min=1 , max=51 , avg= 1.61, stdev= 0.72
    clat (usec): min=17 , max=425 , avg=87.51, stdev=17.51
     lat (usec): min=19 , max=432 , avg=89.20, stdev=17.83
    clat percentiles (usec):
     |  1.00th=[   79],  5.00th=[   81], 10.00th=[   82], 20.00th=[   83],
     | 30.00th=[   84], 40.00th=[   85], 50.00th=[   86], 60.00th=[   86],
     | 70.00th=[   87], 80.00th=[   89], 90.00th=[   91], 95.00th=[   98],
     | 99.00th=[  111], 99.50th=[  118], 99.90th=[  374], 99.95th=[  390],
     | 99.99th=[  406]
    bw (KB/s)  : min=707912, max=711328, per=100.00%, avg=709620.00, stdev=2415.48
  write: io=871796KB, bw=710510KB/s, iops=177627 , runt=  1227msec
    slat (usec): min=2 , max=46 , avg= 2.33, stdev= 0.81
    clat (usec): min=14 , max=425 , avg=87.62, stdev=17.61
     lat (usec): min=16 , max=435 , avg=90.05, stdev=18.07
    clat percentiles (usec):
     |  1.00th=[   79],  5.00th=[   81], 10.00th=[   82], 20.00th=[   83],
     | 30.00th=[   84], 40.00th=[   85], 50.00th=[   86], 60.00th=[   87],
     | 70.00th=[   87], 80.00th=[   89], 90.00th=[   91], 95.00th=[   98],
     | 99.00th=[  111], 99.50th=[  118], 99.90th=[  378], 99.95th=[  390],
     | 99.99th=[  406]
    bw (KB/s)  : min=709360, max=717872, per=100.00%, avg=713616.00, stdev=6018.89
    lat (usec) : 20=0.01%, 50=0.01%, 100=95.59%, 250=4.04%, 500=0.36%
  cpu          : usr=31.57%, sys=67.94%, ctx=125, majf=0, minf=24
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=217251/w=217949/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=869004KB, aggrb=708234KB/s, minb=708234KB/s, maxb=708234KB/s, mint=1227msec, maxt=1227msec
  WRITE: io=871796KB, aggrb=710510KB/s, minb=710510KB/s, maxb=710510KB/s, mint=1227msec, maxt=1227msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
fio rand-rw-rd-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/918227b/raw-rd-raw-918227b.log --max-jobs=2 --latency-log --bandwidth-log
918227b Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC][PATCH] Make io_submit non-blocking
  2012-07-26 19:52     ` Ankit Jain
@ 2012-07-26 21:43       ` Zach Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Zach Brown @ 2012-07-26 21:43 UTC (permalink / raw)
  To: Ankit Jain
  Cc: Christoph Hellwig, Dave Chinner, Al Viro, bcrl, linux-fsdevel,
	linux-aio, linux-kernel, Jan Kara

On Fri, Jul 27, 2012 at 01:22:10AM +0530, Ankit Jain wrote:

> I should probably be doing better tests, any suggestions on what or
> how I can test?

Well, is the test actually *doing* anything with these IOs?

Calling io_submit() and then immediately waiting for completion is the
best case for offloading work to threads inside io_submit().  It's
likely that the kernel thread will then get a chance to run and submit
the IO right away, so you won't have lost much time after io_submit()
queued the work.
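
By way of illustration, that pattern boils down to something like this
minimal libaio snippet (not code from the thread; the file name and
sizes are made up, error checking is dropped, and it builds with -laio):

    #define _GNU_SOURCE              /* for O_DIRECT */
    #include <libaio.h>
    #include <fcntl.h>
    #include <stdlib.h>

    int main(void)
    {
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf;
        int fd = open("testfile", O_RDONLY | O_DIRECT);

        posix_memalign(&buf, 4096, 4096);     /* O_DIRECT wants alignment */
        io_setup(32, &ctx);                   /* queue depth as in the fio runs */
        io_prep_pread(&cb, fd, buf, 4096, 0); /* one 4K read at offset 0 */

        io_submit(ctx, 1, cbs);               /* returns once the iocb is queued */
        io_getevents(ctx, 1, 1, &ev, NULL);   /* ...and blocks here right away,
                                                 handing the CPU straight to the
                                                 kernel submission thread */
        io_destroy(ctx);
        return 0;
    }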

So try saturating the cpus while the tests are running.  Give the kernel
aio submission threads some competition for run time on the cpus.

Maybe with the cpuio bits of fio?  I haven't used that myself, but the
description of it in fio's README/HOWTO files uses all the right
words :).
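
For what it's worth, that would amount to adding a section like the
following to the job file (a sketch only; cpuload is the cpuio knob the
HOWTO describes, and the values here are illustrative):

    [cpu-burn]
    ioengine=cpuio   ; burns CPU cycles instead of doing IO
    cpuload=85       ; try to eat ~85% of a CPU
    numjobs=4        ; roughly one burner per core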

- z

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-07-26 21:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-24 11:41 [RFC][PATCH] Make io_submit non-blocking Ankit Jain
2012-07-24 12:34 ` Rajat Sharma
2012-07-24 20:27   ` Theodore Ts'o
2012-07-24 22:31 ` Dave Chinner
2012-07-24 22:50   ` Christoph Hellwig
2012-07-24 23:08     ` Zach Brown
2012-07-26 19:52     ` Ankit Jain
2012-07-26 21:43       ` Zach Brown
2012-07-25 20:12   ` Ankit Jain
2012-07-24 22:37 ` Zach Brown
2012-07-25 20:17   ` Ankit Jain

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).