* [RFC][PATCH] Make io_submit non-blocking
From: Ankit Jain @ 2012-07-24 11:41 UTC
To: Al Viro, bcrl; +Cc: linux-fsdevel, linux-aio, linux-kernel, Jan Kara
[-- Attachment #1: Type: text/plain, Size: 9349 bytes --]
Currently, io_submit tries to execute the io requests on the
same thread, which could block for various reasons (eg.
allocation of disk blocks). So, essentially, io_submit ends
up being a blocking call.
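To make the cost concrete, here is a minimal libaio sketch (not part
of the patch; "testfile" is a placeholder and error handling is
abbreviated) that times the io_submit() call itself:

/* Build: gcc -D_GNU_SOURCE submit-lat.c -laio */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	struct timespec t0, t1;
	void *buf;
	int fd = open("testfile", O_RDWR | O_DIRECT);

	if (fd < 0 || io_setup(32, &ctx) ||
	    posix_memalign(&buf, 4096, 4096))
		return 1;

	io_prep_pwrite(&cb, fd, buf, 4096, 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	io_submit(ctx, 1, cbs);	/* this call is what can block */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("submit latency: %ld usec\n",
	       (t1.tv_sec - t0.tv_sec) * 1000000 +
	       (t1.tv_nsec - t0.tv_nsec) / 1000);

	io_getevents(ctx, 1, 1, &ev, NULL);	/* reap the completion */
	io_destroy(ctx);
	return 0;
}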
With this patch, io_submit prepares all the kiocbs and then
adds (kicks) them to ctx->run_list in one go and then
schedules the workqueue. The actual operations are not executed
in io_submit's process context, so it can return very quickly.
This run_list is processed either on a workqueue or in response to
an io_getevents call. This utilizes the existing retry infrastructure.
It uses override_creds/revert_creds to use the submitting process'
credentials when processing the iocb request from the workqueue. This
is required for proper support of quota and reserved block access.
Currently, block plugging is done in io_submit, since most of the
IO was submitted from there. This patch moves the plugging to
aio_kick_handler and aio_run_all_iocbs, where the IO now gets submitted.
All the tests were run with ext4.
I tested the patch with fio
(fio rand-rw-disk.fio --max-jobs=2 --latency-log
--bandwidth-log)
**Unpatched**
read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
**Patched**
read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46
write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27
From above, it can be seen that submit latencies improve a lot with the
patch. The full fio results for the "old" (unpatched) and "new" (patched)
cases are attached, with results for both ramdisk (*rd*) and disk,
along with the corresponding fio job files.
Some variations I tried:
1. I tried submitting one iocb at a time (lock/unlock ctx->lock), and
that had good performance on a regular disk, but when I tested with a
ramdisk (to simulate a very fast disk), performance was extremely bad.
Submitting all the iocbs from an io_submit in one go restored the
performance (latencies+bandwidth).
2. I was earlier trying to use queue_delayed_work with a 0 timeout, which
worsened the submit latencies a bit but improved bandwidth.
3. Also, I tried not calling aio_queue_work from io_submit, instead
depending on an already scheduled work item or on the iocbs being run when
io_getevents gets called. This seemed to give improved performance. But
does this constitute a change of API semantics?
Signed-off-by: Ankit Jain <jankit@suse.de>
--
diff --git a/fs/aio.c b/fs/aio.c
index 71f613c..79801096b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -563,6 +563,11 @@ static int __aio_put_req(struct kioctx *ctx, struct kiocb *req)
req->ki_cancel = NULL;
req->ki_retry = NULL;
+ if (likely(req->submitter_cred)) {
+ put_cred(req->submitter_cred);
+ req->submitter_cred = NULL;
+ }
+
fput(req->ki_filp);
req->ki_filp = NULL;
really_put_req(ctx, req);
@@ -659,6 +664,7 @@ static ssize_t aio_run_iocb(struct kiocb *iocb)
struct kioctx *ctx = iocb->ki_ctx;
ssize_t (*retry)(struct kiocb *);
ssize_t ret;
+ const struct cred *old_cred = NULL;
if (!(retry = iocb->ki_retry)) {
printk("aio_run_iocb: iocb->ki_retry = NULL\n");
@@ -703,12 +709,19 @@ static ssize_t aio_run_iocb(struct kiocb *iocb)
goto out;
}
+ if (iocb->submitter_cred)
+ /* setup creds */
+ old_cred = override_creds(iocb->submitter_cred);
+
/*
* Now we are all set to call the retry method in async
* context.
*/
ret = retry(iocb);
+ if (old_cred)
+ revert_creds(old_cred);
+
if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
/*
* There's no easy way to restart the syscall since other AIO's
@@ -804,10 +817,14 @@ static void aio_queue_work(struct kioctx * ctx)
*/
static inline void aio_run_all_iocbs(struct kioctx *ctx)
{
+ struct blk_plug plug;
+
+ blk_start_plug(&plug);
spin_lock_irq(&ctx->ctx_lock);
while (__aio_run_iocbs(ctx))
;
spin_unlock_irq(&ctx->ctx_lock);
+ blk_finish_plug(&plug);
}
/*
@@ -825,13 +842,16 @@ static void aio_kick_handler(struct work_struct *work)
mm_segment_t oldfs = get_fs();
struct mm_struct *mm;
int requeue;
+ struct blk_plug plug;
set_fs(USER_DS);
use_mm(ctx->mm);
+ blk_start_plug(&plug);
spin_lock_irq(&ctx->ctx_lock);
requeue =__aio_run_iocbs(ctx);
mm = ctx->mm;
spin_unlock_irq(&ctx->ctx_lock);
+ blk_finish_plug(&plug);
unuse_mm(mm);
set_fs(oldfs);
/*
@@ -1506,12 +1526,14 @@ static ssize_t aio_setup_iocb(struct kiocb *kiocb, bool compat)
static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
struct iocb *iocb, struct kiocb_batch *batch,
- bool compat)
+ bool compat, struct kiocb **req_entry)
{
struct kiocb *req;
struct file *file;
ssize_t ret;
+ *req_entry = NULL;
+
/* enforce forwards compatibility on users */
if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2)) {
pr_debug("EINVAL: io_submit: reserve field set\n");
@@ -1537,6 +1559,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
fput(file);
return -EAGAIN;
}
+
req->ki_filp = file;
if (iocb->aio_flags & IOCB_FLAG_RESFD) {
/*
@@ -1567,38 +1590,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
req->ki_left = req->ki_nbytes = iocb->aio_nbytes;
req->ki_opcode = iocb->aio_lio_opcode;
+ req->submitter_cred = get_current_cred();
+
ret = aio_setup_iocb(req, compat);
if (ret)
goto out_put_req;
- spin_lock_irq(&ctx->ctx_lock);
- /*
- * We could have raced with io_destroy() and are currently holding a
- * reference to ctx which should be destroyed. We cannot submit IO
- * since ctx gets freed as soon as io_submit() puts its reference. The
- * check here is reliable: io_destroy() sets ctx->dead before waiting
- * for outstanding IO and the barrier between these two is realized by
- * unlock of mm->ioctx_lock and lock of ctx->ctx_lock. Analogously we
- * increment ctx->reqs_active before checking for ctx->dead and the
- * barrier is realized by unlock and lock of ctx->ctx_lock. Thus if we
- * don't see ctx->dead set here, io_destroy() waits for our IO to
- * finish.
- */
- if (ctx->dead) {
- spin_unlock_irq(&ctx->ctx_lock);
- ret = -EINVAL;
- goto out_put_req;
- }
- aio_run_iocb(req);
- if (!list_empty(&ctx->run_list)) {
- /* drain the run list */
- while (__aio_run_iocbs(ctx))
- ;
- }
- spin_unlock_irq(&ctx->ctx_lock);
-
aio_put_req(req); /* drop extra ref to req */
+
+ *req_entry = req;
return 0;
out_put_req:
@@ -1613,8 +1614,10 @@ long do_io_submit(aio_context_t ctx_id, long nr,
struct kioctx *ctx;
long ret = 0;
int i = 0;
- struct blk_plug plug;
struct kiocb_batch batch;
+ struct kiocb **req_arr = NULL;
+ int nr_submitted = 0;
+ int req_arr_cnt = 0;
if (unlikely(nr < 0))
return -EINVAL;
@@ -1632,8 +1635,8 @@ long do_io_submit(aio_context_t ctx_id, long nr,
}
kiocb_batch_init(&batch, nr);
-
- blk_start_plug(&plug);
+ req_arr = kmalloc(sizeof(struct kiocb *) * nr, GFP_KERNEL);
+ memset(req_arr, 0, sizeof(struct kiocb *) * nr);
/*
* AKPM: should this return a partial result if some of the IOs were
@@ -1653,15 +1656,51 @@ long do_io_submit(aio_context_t ctx_id, long nr,
break;
}
- ret = io_submit_one(ctx, user_iocb, &tmp, &batch, compat);
+ ret = io_submit_one(ctx, user_iocb, &tmp, &batch, compat,
+ &req_arr[i]);
if (ret)
break;
+ req_arr_cnt++;
}
- blk_finish_plug(&plug);
+ spin_lock_irq(&ctx->ctx_lock);
+ /*
+ * We could have raced with io_destroy() and are currently holding a
+ * reference to ctx which should be destroyed. We cannot submit IO
+ * since ctx gets freed as soon as io_submit() puts its reference. The
+ * check here is reliable: io_destroy() sets ctx->dead before waiting
+ * for outstanding IO and the barrier between these two is realized by
+ * unlock of mm->ioctx_lock and lock of ctx->ctx_lock. Analogously we
+ * increment ctx->reqs_active before checking for ctx->dead and the
+ * barrier is realized by unlock and lock of ctx->ctx_lock. Thus if we
+ * don't see ctx->dead set here, io_destroy() waits for our IO to
+ * finish.
+ */
+ if (ctx->dead) {
+ spin_unlock_irq(&ctx->ctx_lock);
+ for (i = 0; i < req_arr_cnt; i++)
+ /* drop i/o ref to the req */
+ __aio_put_req(ctx, req_arr[i]);
+
+ ret = -EINVAL;
+ goto out;
+ }
+
+ for (i = 0; i < req_arr_cnt; i++) {
+ struct kiocb *req = req_arr[i];
+ if (likely(!kiocbTryKick(req)))
+ __queue_kicked_iocb(req);
+ nr_submitted++;
+ }
+ if (likely(nr_submitted > 0))
+ aio_queue_work(ctx);
+ spin_unlock_irq(&ctx->ctx_lock);
+
+out:
kiocb_batch_free(ctx, &batch);
+ kfree(req_arr);
put_ioctx(ctx);
- return i ? i : ret;
+ return nr_submitted ? nr_submitted : ret;
}
/* sys_io_submit:
diff --git a/include/linux/aio.h b/include/linux/aio.h
index b1a520e..bcd6a5e 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -124,6 +124,8 @@ struct kiocb {
* this is the underlying eventfd context to deliver events to.
*/
struct eventfd_ctx *ki_eventfd;
+
+ const struct cred *submitter_cred;
};
#define is_sync_kiocb(iocb) ((iocb)->ki_key == KIOCB_SYNC_KEY)
--
Ankit Jain
SUSE Labs
[-- Attachment #2: ext4-disk-new.log --]
[-- Type: text/x-log, Size: 2539 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: (groupid=0, jobs=1): err= 0: pid=2021: Tue Jul 24 15:37:25 2012
read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46
clat (msec): min=29 , max=619 , avg=134.15, stdev=53.04
lat (msec): min=29 , max=619 , avg=134.15, stdev=53.04
clat percentiles (msec):
| 1.00th=[ 63], 5.00th=[ 78], 10.00th=[ 87], 20.00th=[ 98],
| 30.00th=[ 106], 40.00th=[ 114], 50.00th=[ 123], 60.00th=[ 130],
| 70.00th=[ 143], 80.00th=[ 163], 90.00th=[ 198], 95.00th=[ 231],
| 99.00th=[ 302], 99.50th=[ 363], 99.90th=[ 619], 99.95th=[ 619],
| 99.99th=[ 619]
bw (KB/s) : min= 153, max= 761, per=100.00%, avg=497.88, stdev=128.35
write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27
clat (usec): min=97 , max=619552 , avg=126590.22, stdev=50338.80
lat (usec): min=108 , max=619553 , avg=126592.26, stdev=50338.83
clat percentiles (msec):
| 1.00th=[ 59], 5.00th=[ 72], 10.00th=[ 80], 20.00th=[ 91],
| 30.00th=[ 100], 40.00th=[ 108], 50.00th=[ 116], 60.00th=[ 125],
| 70.00th=[ 137], 80.00th=[ 155], 90.00th=[ 188], 95.00th=[ 225],
| 99.00th=[ 293], 99.50th=[ 334], 99.90th=[ 578], 99.95th=[ 619],
| 99.99th=[ 619]
bw (KB/s) : min= 127, max= 842, per=100.00%, avg=493.55, stdev=140.80
lat (usec) : 100=0.01%, 250=0.01%
lat (msec) : 50=0.25%, 100=26.26%, 250=70.61%, 500=2.69%, 750=0.19%
cpu : usr=0.07%, sys=1.25%, ctx=27318, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=25716/w=25484/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=102864KB, aggrb=493KB/s, minb=493KB/s, maxb=493KB/s, mint=208627msec, maxt=208627msec
WRITE: io=101936KB, aggrb=488KB/s, minb=488KB/s, maxb=488KB/s, mint=208627msec, maxt=208627msec
Disk stats (read/write):
sda: ios=25681/22647, merge=0/70, ticks=204770/98179832, in_queue=99369560, util=98.97%
fio rand-rw-disk-2.fio --output=/home/radical/src/play/ios-test/new-logs/ext4-disk-2-b73147d.log --max-jobs=2 --latency-log --bandwidth-log
b73147d remove unused ki_colln
[-- Attachment #3: ext4-disk-old.org --]
[-- Type: text/plain, Size: 2432 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 200MB)
random_rw: (groupid=0, jobs=1): err= 0: pid=2011: Tue Jul 24 01:00:02 2012
read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
clat (usec): min=294 , max=214745 , avg=102349.31, stdev=21957.26
lat (msec): min=9 , max=225 , avg=108.92, stdev=22.21
clat percentiles (msec):
| 1.00th=[ 54], 5.00th=[ 68], 10.00th=[ 76], 20.00th=[ 84],
| 30.00th=[ 91], 40.00th=[ 96], 50.00th=[ 102], 60.00th=[ 108],
| 70.00th=[ 114], 80.00th=[ 121], 90.00th=[ 131], 95.00th=[ 141],
| 99.00th=[ 157], 99.50th=[ 161], 99.90th=[ 182], 99.95th=[ 198],
| 99.99th=[ 210]
bw (KB/s) : min= 474, max= 817, per=99.90%, avg=603.38, stdev=47.87
write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
clat (usec): min=42 , max=221533 , avg=102260.25, stdev=21825.38
lat (usec): min=85 , max=221542 , avg=102285.38, stdev=21825.25
clat percentiles (msec):
| 1.00th=[ 53], 5.00th=[ 69], 10.00th=[ 76], 20.00th=[ 85],
| 30.00th=[ 91], 40.00th=[ 96], 50.00th=[ 102], 60.00th=[ 108],
| 70.00th=[ 114], 80.00th=[ 121], 90.00th=[ 131], 95.00th=[ 139],
| 99.00th=[ 157], 99.50th=[ 163], 99.90th=[ 184], 99.95th=[ 206],
| 99.99th=[ 215]
bw (KB/s) : min= 318, max= 936, per=99.86%, avg=606.14, stdev=100.63
lat (usec) : 50=0.01%, 250=0.01%, 500=0.01%
lat (msec) : 10=0.01%, 20=0.02%, 50=0.60%, 100=46.62%, 250=52.75%
cpu : usr=0.41%, sys=1.58%, ctx=27474, majf=0, minf=22
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=25530/w=25670/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=102120KB, aggrb=604KB/s, minb=604KB/s, maxb=604KB/s, mint=169006msec, maxt=169006msec
WRITE: io=102680KB, aggrb=607KB/s, minb=607KB/s, maxb=607KB/s, mint=169006msec, maxt=169006msec
Disk stats (read/write):
sda: ios=25533/4, merge=23/18, ticks=164781/112, in_queue=164823, util=97.51%
[-- Attachment #4: ext4-rd-new.log --]
[-- Type: text/x-log, Size: 2520 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 1700MB)
random_rw: (groupid=0, jobs=1): err= 0: pid=2002: Tue Jul 24 15:31:47 2012
read : io=870504KB, bw=558373KB/s, iops=139593 , runt= 1559msec
slat (usec): min=0 , max=32 , avg= 0.38, stdev= 0.52
clat (usec): min=62 , max=597 , avg=114.05, stdev=18.96
lat (usec): min=63 , max=599 , avg=114.49, stdev=19.03
clat percentiles (usec):
| 1.00th=[ 103], 5.00th=[ 105], 10.00th=[ 106], 20.00th=[ 107],
| 30.00th=[ 108], 40.00th=[ 108], 50.00th=[ 109], 60.00th=[ 110],
| 70.00th=[ 115], 80.00th=[ 123], 90.00th=[ 126], 95.00th=[ 129],
| 99.00th=[ 145], 99.50th=[ 151], 99.90th=[ 438], 99.95th=[ 462],
| 99.99th=[ 580]
bw (KB/s) : min=550016, max=572568, per=100.00%, avg=560584.00, stdev=11342.49
write: io=870296KB, bw=558240KB/s, iops=139559 , runt= 1559msec
slat (usec): min=0 , max=65 , avg= 0.42, stdev= 0.53
clat (usec): min=62 , max=595 , avg=113.45, stdev=18.91
lat (usec): min=63 , max=597 , avg=113.93, stdev=18.99
clat percentiles (usec):
| 1.00th=[ 103], 5.00th=[ 104], 10.00th=[ 105], 20.00th=[ 106],
| 30.00th=[ 107], 40.00th=[ 108], 50.00th=[ 108], 60.00th=[ 110],
| 70.00th=[ 115], 80.00th=[ 122], 90.00th=[ 125], 95.00th=[ 129],
| 99.00th=[ 145], 99.50th=[ 149], 99.90th=[ 438], 99.95th=[ 462],
| 99.99th=[ 564]
bw (KB/s) : min=547456, max=575144, per=100.00%, avg=559728.00, stdev=14109.21
lat (usec) : 100=0.02%, 250=99.72%, 500=0.24%, 750=0.03%
cpu : usr=18.73%, sys=80.76%, ctx=175, majf=0, minf=22
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=217626/w=217574/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=870504KB, aggrb=558373KB/s, minb=558373KB/s, maxb=558373KB/s, mint=1559msec, maxt=1559msec
WRITE: io=870296KB, aggrb=558239KB/s, minb=558239KB/s, maxb=558239KB/s, mint=1559msec, maxt=1559msec
Disk stats (read/write):
ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
fio rand-rw-rd.fio --output=/home/radical/src/play/ios-test/new-logs/ext4-rd-b73147d.log --max-jobs=2 --latency-log --bandwidth-log
b73147d remove unused ki_colln
[-- Attachment #5: ext4-rd-old.org --]
[-- Type: text/plain, Size: 2392 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: Laying out IO file(s) (1 file(s) / 1700MB)
random_rw: (groupid=0, jobs=1): err= 0: pid=1999: Tue Jul 24 00:55:56 2012
read : io=869872KB, bw=489517KB/s, iops=122379 , runt= 1777msec
slat (usec): min=2 , max=78 , avg= 3.46, stdev= 0.87
clat (usec): min=16 , max=604 , avg=126.83, stdev=16.35
lat (usec): min=19 , max=617 , avg=130.40, stdev=16.74
clat percentiles (usec):
| 1.00th=[ 119], 5.00th=[ 121], 10.00th=[ 122], 20.00th=[ 123],
| 30.00th=[ 124], 40.00th=[ 124], 50.00th=[ 125], 60.00th=[ 126],
| 70.00th=[ 127], 80.00th=[ 129], 90.00th=[ 133], 95.00th=[ 135],
| 99.00th=[ 149], 99.50th=[ 155], 99.90th=[ 490], 99.95th=[ 524],
| 99.99th=[ 548]
bw (KB/s) : min=487648, max=493224, per=100.00%, avg=490160.00, stdev=2828.69
write: io=870928KB, bw=490111KB/s, iops=122527 , runt= 1777msec
slat (usec): min=2 , max=171 , avg= 2.78, stdev= 0.90
clat (usec): min=23 , max=606 , avg=126.77, stdev=16.22
lat (usec): min=26 , max=616 , avg=129.65, stdev=16.57
clat percentiles (usec):
| 1.00th=[ 118], 5.00th=[ 120], 10.00th=[ 121], 20.00th=[ 123],
| 30.00th=[ 124], 40.00th=[ 124], 50.00th=[ 125], 60.00th=[ 126],
| 70.00th=[ 127], 80.00th=[ 129], 90.00th=[ 133], 95.00th=[ 135],
| 99.00th=[ 149], 99.50th=[ 155], 99.90th=[ 490], 99.95th=[ 516],
| 99.99th=[ 548]
bw (KB/s) : min=484072, max=496464, per=100.00%, avg=490920.00, stdev=6298.07
lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=99.80%, 500=0.11%
lat (usec) : 750=0.08%
cpu : usr=24.21%, sys=75.34%, ctx=185, majf=0, minf=22
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=217468/w=217732/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=869872KB, aggrb=489517KB/s, minb=489517KB/s, maxb=489517KB/s, mint=1777msec, maxt=1777msec
WRITE: io=870928KB, aggrb=490111KB/s, minb=490111KB/s, maxb=490111KB/s, mint=1777msec, maxt=1777msec
Disk stats (read/write):
ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
[-- Attachment #6: rand-rw-disk.io --]
[-- Type: text/plain, Size: 78 bytes --]
[random_rw]
rw=randrw
size=200m
directory=/misc/rd
ioengine=libaio
iodepth=32
[-- Attachment #7: rand-rw-rd.fio --]
[-- Type: text/plain, Size: 78 bytes --]
[random_rw]
rw=randrw
size=1700m
directory=/mnt/rd
ioengine=libaio
iodepth=32
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Rajat Sharma @ 2012-07-24 12:34 UTC
To: Ankit Jain
Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara
Hi Ankit,
On Tue, Jul 24, 2012 at 5:11 PM, Ankit Jain <jankit@suse.de> wrote:
>
>
> Currently, io_submit tries to execute the io requests on the
> same thread, which could block for various reasons (eg.
> allocation of disk blocks). So, essentially, io_submit ends
> up being a blocking call.
>
Ideally the filesystem should take care of it, e.g. by deferring such
time-consuming allocations and returning -EIOCBQUEUED immediately. But
have you seen such cases?
> With this patch, io_submit prepares all the kiocbs and then
> adds (kicks) them to ctx->run_list in one go and then
> schedules the workqueue. The actual operations are not executed
> in io_submit's process context, so it can return very quickly.
>
With lots of application threads firing continuous IOs, workqueue
threads might become a bottleneck and you might eventually have to
develop priority scheduling. This workqueue was originally designed
for IO retries, which is an error path; now the number of workqueue
consumers might easily increase by 100x.
> This run_list is processed either on a workqueue or in response to
> an io_getevents call. This utilizes the existing retry infrastructure.
>
> It uses override_creds/revert_creds to use the submitting process'
> credentials when processing the iocb request from the workqueue. This
> is required for proper support of quota and reserved block access.
>
> Currently, block plugging is done in io_submit, since most of the
> IO was submitted from there. This patch moves the plugging to
> aio_kick_handler and aio_run_all_iocbs, where the IO now gets submitted.
>
> All the tests were run with ext4.
>
> I tested the patch with fio
> (fio rand-rw-disk.fio --max-jobs=2 --latency-log
> --bandwidth-log)
>
> **Unpatched**
> read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
> slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
>
> write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
> slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
>
> **Patched**
> read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
> slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46
>
> write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
> slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27
>
> From above, it can be seen that submit latencies improve a lot with the
> patch. The full fio results for the "old" (unpatched) and "new" (patched)
> cases are attached, with results for both ramdisk (*rd*) and disk,
> along with the corresponding fio job files.
>
> Some variations I tried:
>
> 1. I tried submitting one iocb at a time (lock/unlock ctx->lock), and
> that had good performance on a regular disk, but when I tested with a
> ramdisk (to simulate a very fast disk), performance was extremely bad.
> Submitting all the iocbs from an io_submit in one go restored the
> performance (latencies+bandwidth).
>
> 2. I was earlier trying to use queue_delayed_work with a 0 timeout, which
> worsened the submit latencies a bit but improved bandwidth.
>
> 3. Also, I tried not calling aio_queue_work from io_submit, instead
> depending on an already scheduled work item or on the iocbs being run when
> io_getevents gets called. This seemed to give improved performance. But
> does this constitute a change of API semantics?
>
I once observed latency issues with aio_queue_work with a small
number of threads when I was trying to resubmit IOs on a ramdisk, as
this function introduces a mandatory delay if nobody is waiting on
the iocb. The latencies were high, but with a large number of threads
the effect was not prominent.
[snip patch]
--
Rajat Sharma
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Theodore Ts'o @ 2012-07-24 20:27 UTC
To: Rajat Sharma
Cc: Ankit Jain, Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel,
Jan Kara
On Tue, Jul 24, 2012 at 06:04:23PM +0530, Rajat Sharma wrote:
> >
> > Currently, io_submit tries to execute the io requests on the
> > same thread, which could block for various reasons (eg.
> > allocation of disk blocks). So, essentially, io_submit ends
> > up being a blocking call.
>
> Ideally the filesystem should take care of it, e.g. by deferring such
> time-consuming allocations and returning -EIOCBQUEUED immediately. But
> have you seen such cases?
Oh, it happens all the time if you are using AIO. If the file system
needs to read or write any metadata block, AIO can become distinctly
non-"A". The workaround that I've chosen is to create a way to cache
the information needed for the bmap() operation, triggered via an
ioctl() issued at open time, so that this is not an issue, but that
only works if the file is pre-allocated, and there is no need to do
any block allocations.
It's all very well and good to say, "the file system should handle
it", but that just pushes the problem onto the file system. And since
you need to potentially issue block I/O requests, which you can't do
from an interrupt context (i.e., a block I/O completion handler), you
really do need to create a workqueue in order to make things work.
If you do it in the fs/direct_io.c layer, at least that way you can
solve the problem once for all file systems....
> With lots of application threads firing continuous IOs, workqueue
> threads might become a bottleneck and you might eventually have to
> develop priority scheduling. This workqueue was originally designed
> for IO retries, which is an error path; now the number of workqueue
> consumers might easily increase by 100x.
Yes, you definitely need to throttle how many AIOs can be allowed to
be outstanding, either globally, or on a
per-superblock/process/user/cgroup basis, and return EAGAIN if there
are too many outstanding requests.
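For illustration only, a sketch of what such a throttle could look
like (MAX_ASYNC_PENDING, an atomic_t ctx->async_pending and the helper
names are all invented here, not part of the patch):

#define MAX_ASYNC_PENDING	1024

/* Reserve a slot before kicking an iocb to the workqueue; fail with
 * -EAGAIN instead of letting the async backlog grow without bound. */
static int aio_throttle_get(struct kioctx *ctx)
{
	if (atomic_inc_return(&ctx->async_pending) > MAX_ASYNC_PENDING) {
		atomic_dec(&ctx->async_pending);
		return -EAGAIN;
	}
	return 0;
}

/* Release the slot when the iocb completes. */
static void aio_throttle_put(struct kioctx *ctx)
{
	atomic_dec(&ctx->async_pending);
}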
Speaking of cgroups, one of the other challenges with running the AIO
out of a workqueue is trying to respect cgroup restrictions. In
particular, the io-throttle cgroup (which is needed to provide
Proportional I/O support), but also the memory cgroup.
All of these complications are why I decided to simply go with the "pin
metadata" approach, since I didn't need to worry (at least initially)
about the allocating write case. (These patches to ext4 haven't yet
been published upstream, mainly because they need a lot of cleanup
work and I haven't had time to do that cleanup; my intention is to get
the "big extents" patchset upstream, though.)
- Ted
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Dave Chinner @ 2012-07-24 22:31 UTC
To: Ankit Jain
Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara
On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
>
> Currently, io_submit tries to execute the io requests on the
> same thread, which could block for various reasons (eg.
> allocation of disk blocks). So, essentially, io_submit ends
> up being a blocking call.
>
> With this patch, io_submit prepares all the kiocbs and then
> adds (kicks) them to ctx->run_list in one go and then
> schedules the workqueue. The actual operations are not executed
> in io_submit's process context, so it can return very quickly.
>
> This run_list is processed either on a workqueue or in response to
> an io_getevents call. This utilizes the existing retry infrastructure.
>
> It uses override_creds/revert_creds to use the submitting process'
> credentials when processing the iocb request from the workqueue. This
> is required for proper support of quota and reserved block access.
>
> Currently, block plugging is done in io_submit, since most of the
> IO was submitted from there. This patch moves the plugging to
> aio_kick_handler and aio_run_all_iocbs, where the IO now gets submitted.
>
> All the tests were run with ext4.
>
> I tested the patch with fio
> (fio rand-rw-disk.fio --max-jobs=2 --latency-log
> --bandwidth-log)
>
> **Unpatched**
> read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
> slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
Hmmm, I had to check the numbers twice - that's only 600KB/s.
Perhaps you need to test on something more than a single piece of
spinning rust. Optimising AIO for SSD rates (say 100k 4k write IOPS)
is probably more relevant to the majority of AIO users....
> write: io=102680KB, bw=622133 B/s, iops=151 , runt=169006msec
> slat (usec): min=2 , max=196 , avg=24.66, stdev=20.35
>
> **Patched**
> read : io=102864KB, bw=504885 B/s, iops=123 , runt=208627msec
> slat (usec): min=0 , max=120 , avg= 1.65, stdev= 3.46
>
> write: io=101936KB, bw=500330 B/s, iops=122 , runt=208627msec
> slat (usec): min=0 , max=131 , avg= 1.85, stdev= 3.27
So you made ext4 20% slower at random 4k writes with worst case
latencies only improving by about 30%. That, I think, is a
non-starter....
Also, you added a memory allocation in the io submit code. Worst
case latency will still be effectively undefined - what happens to
latencies if you generate memory pressure while the test is running?
FWIW, if you are going to change generic code, you need to present
results for other filesystems as well (xfs, btrfs are typical), as
they may not have the same problems as ext4 or react the same way to
your change. The result might simply be "it is 20% slower"....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Zach Brown @ 2012-07-24 22:37 UTC
To: Ankit Jain
Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara
On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
>
> Currently, io_submit tries to execute the io requests on the
> same thread, which could block for various reasons (eg.
> allocation of disk blocks). So, essentially, io_submit ends
> up being a blocking call.
Yup, sadly that's how it's built. A blocking submission phase that
returns once the completion doesn't need the submitter's context. It
happens to mostly work for O_DIRECT block IO most of the time.
> With this patch, io_submit prepares all the kiocbs and then
> adds (kicks) them to ctx->run_list in one go and then
> schedules the workqueue. The actual operations are not executed
> in io_submit's process context, so it can return very quickly.
Strong nack; this isn't safe without having done the work to ensure that
all the task_struct references under the f_op->aio_*() paths won't be
horribly confused to find a kernel thread instead of the process that
called io_submit().
The one-off handling of the submitter's cred is an indication that
there might be other cases to worry about :).
> 3. Also, I tried not calling aio_queue_work from io_submit, instead
> depending on an already scheduled work item or on the iocbs being run when
> io_getevents gets called. This seemed to give improved performance. But
> does this constitute a change of API semantics?
You can't rely on io_getevents() being called for forward progress. It's
perfectly reasonable for a task to wait for io completion by polling an
eventfd that aio_complete() notifies, for instance.
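To spell that out, here is a sketch of such a waiter using the real
libaio/eventfd interfaces (the function and its arguments are invented
for illustration; error handling omitted):

#include <libaio.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

void wait_via_eventfd(io_context_t ctx, int fd, void *buf)
{
	int efd = eventfd(0, 0);
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;

	io_prep_pread(&cb, fd, buf, 4096, 0);
	io_set_eventfd(&cb, efd);	/* sets IOCB_FLAG_RESFD */
	io_submit(ctx, 1, cbs);

	/* sleep until aio_complete() signals the eventfd ... */
	poll(&pfd, 1, -1);
	read(efd, &count, sizeof(count));

	/* ... so io_getevents() may never be called until here */
	io_getevents(ctx, 1, 1, &ev, NULL);
}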
- z
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Christoph Hellwig @ 2012-07-24 22:50 UTC
To: Dave Chinner
Cc: Ankit Jain, Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel,
Jan Kara
On Wed, Jul 25, 2012 at 08:31:10AM +1000, Dave Chinner wrote:
> FWIW, if you are going to change generic code, you need to present
> results for other filesystems as well (xfs, btrfs are typical), as
> they may not have the same problems as ext4 or react the same way to
> your change. The result might simply be "it is 20% slower"....
And most importantly block devices, as they are one of the biggest
use cases of AIO. With an almost no-op get_blocks callback I can't
see how this change would provide any gain there.
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Zach Brown @ 2012-07-24 23:08 UTC
To: Christoph Hellwig
Cc: Dave Chinner, Ankit Jain, Al Viro, bcrl, linux-fsdevel, linux-aio,
linux-kernel, Jan Kara
> And most importantly block devices, as they are one of the biggest
> use cases of AIO. With an almost no-op get_blocks callback I can't
> see how this change would provide any gain there.
Historically we'd often see submission stuck waiting for requests.
Tasks often try to submit way more aio than the block layer is happy to
have in flight.
Dunno if that's still a problem these days.
- z
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Ankit Jain @ 2012-07-25 20:12 UTC
To: Dave Chinner
Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara
[-- Attachment #1: Type: text/plain, Size: 4446 bytes --]
On 07/25/2012 04:01 AM, Dave Chinner wrote:
> On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
[snip]
>> **Unpatched**
>> read : io=102120KB, bw=618740 B/s, iops=151 , runt=169006msec
>> slat (usec): min=275 , max=87560 , avg=6571.88, stdev=2799.57
>
> Hmmm, I had to check the numbers twice - that's only 600KB/s.
>
> Perhaps you need to test on something more than a single piece of
> spinning rust. Optimising AIO for SSD rates (say 100k 4k write IOPS)
> is probably more relevant to the majority of AIO users....
I tested with a ramdisk to "simulate" a fast disk and had attached the
results. I'll try to get hold of an SSD and then test with that also.
Meanwhile, I ran the tests again with ext3/ext4/xfs/btrfs. I'm not sure
what I had screwed up in that previous test, but the numbers look
proper now (as I was getting in my earlier testing):
For disk, I tested on a separate partition formatted with the fs, and
then ran fio on it with 1 job. Here "Old" is 3.5-rc7 (918227b).
------ disk -------
====== ext3 ======
submit latencies(usec)
B/w iops runtime min max avg std dev
ext3-read :
Old: 453352 B/s 110 231050msec 3 283048 170.28 5183.28
New: 451298 B/s 110 232555msec 0 444 8.18 7.95
ext3-write:
Old: 454309 B/s 110 231050msec 2 304614 232.72 6549.82
New: 450488 B/s 109 232555msec 0 233 7.94 7.23
====== ext4 ======
ext4-read :
Old: 459824 B/s 112 228635msec 2 260051 121.40 3569.78
New: 422700 B/s 103 247097msec 0 165 8.18 7.87
ext4-write:
Old: 457424 B/s 111 228635msec 3 312958 166.75 4616.58
New: 426015 B/s 104 247097msec 0 169 8.00 8.08
====== xfs ======
xfs-read :
Old: 467330 B/s 114 224516msec 3 272 46.45 25.35
New: 417049 B/s 101 252262msec 0 165 7.84 7.87
xfs-write:
Old: 466746 B/s 113 224516msec 3 265 52.52 28.13
New: 414289 B/s 101 252262msec 0 143 7.58 7.66
====== btrfs ======
btrfs-read :
Old: 1027.1KB/s 256 99918msec 5 84457 62.15 527.24
New: 1054.5KB/s 263 97542msec 0 121 9.72 7.05
btrfs-write:
Old: 1021.8KB/s 255 99918msec 10 139473 84.96 899.99
New: 1045.2KB/s 261 97542msec 0 248 9.55 7.02
These are the figures with a ramdisk:
------ ramdisk -------
====== ext3 ======
submit latencies (usec)
B/w iops runtime min max avg std dev
ext3-read :
Old: 430312KB/s 107577 2026msec 1 7072 3.85 15.17
New: 491251KB/s 122812 1772msec 0 22 0.39 0.52
ext3-write:
Old: 428918KB/s 107229 2026msec 2 61 3.46 0.85
New: 491142KB/s 122785 1772msec 0 62 0.43 0.55
====== ext4 ======
ext4-read :
Old: 466132KB/s 116532 1869msec 2 133 3.66 1.04
New: 542337KB/s 135584 1607msec 0 67 0.40 0.54
ext4-write:
Old: 465276KB/s 116318 1869msec 2 127 2.96 0.94
New: 540923KB/s 135230 1607msec 0 73 0.43 0.55
====== xfs ======
xfs-read :
Old: 485556KB/s 121389 1794msec 2 160 3.58 1.22
New: 581477KB/s 145369 1495msec 0 19 0.39 0.51
xfs-write:
Old: 484789KB/s 121197 1794msec 1 87 2.68 0.99
New: 582938KB/s 145734 1495msec 0 56 0.43 0.55
====== btrfs ======
I had trouble with btrfs on a ramdisk though: it complained about space
during preallocation. This was with a 4gig ramdisk and fio set to write
a 1700mb file, so these numbers are from that partial run. Btrfs ran fine
on a regular disk though.
btrfs-read :
Old: 107519KB/s 26882 2579msec 13 1492 17.03 9.23
New: 109878KB/s 27469 4665msec 0 29 0.45 0.55
btrfs-write:
Old: 108047KB/s 27020 2579msec 1 64963 17.21 823.88
New: 109413KB/s 27357 4665msec 0 32 0.48 0.56
Also, I dropped caches ("echo 3 > /proc/sys/vm/drop_caches") and sync'ed
before running each test. All the fio log files are attached.
Any suggestions on how I might test this better, other than the SSD
suggestion of course.
[snip]
> Also, you added a memory allocation in the io submit code. Worst
> case latency will still be effectively undefined - what happens to
> latencies if you generate memory pressure while the test is running?
I'll try to fix this.
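(One common pattern for that, sketched here with invented names and an
arbitrary threshold, is an on-stack array for small submissions with a
heap fallback:)

#define SUBMIT_STACK_REQS	16

/* Sketch only: small batches use the caller's on-stack array, so the
 * common case does no allocation in the submit path. */
static struct kiocb **get_req_arr(struct kiocb **stack_reqs, long nr)
{
	if (nr <= SUBMIT_STACK_REQS) {
		memset(stack_reqs, 0,
		       SUBMIT_STACK_REQS * sizeof(struct kiocb *));
		return stack_reqs;
	}
	return kcalloc(nr, sizeof(struct kiocb *), GFP_KERNEL);
}

static void put_req_arr(struct kiocb **req_arr, struct kiocb **stack_reqs)
{
	if (req_arr != stack_reqs)
		kfree(req_arr);
}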
--
Ankit Jain
SUSE Labs
[-- Attachment #2: fio-logs.tgz --]
[-- Type: application/x-compressed-tar, Size: 11198 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Ankit Jain @ 2012-07-25 20:17 UTC
To: Zach Brown
Cc: Al Viro, bcrl, linux-fsdevel, linux-aio, linux-kernel, Jan Kara
On 07/25/2012 04:07 AM, Zach Brown wrote:
> On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote:
[snip]
>> With this patch, io_submit prepares all the kiocbs and then
>> adds (kicks) them to ctx->run_list in one go and then
>> schedules the workqueue. The actual operations are not executed
>> in io_submit's process context, so it can return very quickly.
>
> Strong nack; this isn't safe without having done the work to ensure that
> all the task_struct references under the f_op->aio_*() paths won't be
> horribly confused to find a kernel thread instead of the process that
> called io_submit().
>
> The one-off handling of the submitter's cred is an indication that
> there might be other cases to worry about :).
Makes sense, I will try to look into this.
>> 3. Also, I tried not calling aio_queue_work from io_submit, instead
>> depending on an already scheduled work item or on the iocbs being run when
>> io_getevents gets called. This seemed to give improved performance. But
>> does this constitute a change of API semantics?
>
> You can't rely on io_getevents() being called for forward progress. It's
> perfectly reasonable for a task to wait for io completion by polling an
> eventfd that aio_complete() notifies, for instance.
Ah okay, didn't realize that.
Thanks,
--
Ankit Jain
SUSE Labs
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Ankit Jain @ 2012-07-26 19:52 UTC
To: Christoph Hellwig
Cc: Dave Chinner, Al Viro, bcrl, linux-fsdevel, linux-aio,
linux-kernel, Jan Kara
[-- Attachment #1: Type: text/plain, Size: 1921 bytes --]
On 07/25/2012 04:20 AM, Christoph Hellwig wrote:
> On Wed, Jul 25, 2012 at 08:31:10AM +1000, Dave Chinner wrote:
>> FWIW, if you are going to change generic code, you need to present
>> results for other filesystems as well (xfs, btrfs are typical), as
>> they may not have the same problems as ext4 or react the same way to
>> your change. The result might simply be "it is 20% slower"....
>
> And most importantly block devices, as they are one of the biggest
> use cases of AIO. With an almost no-op get_blocks callback I can't
> see how this change would provide any gain there.
I tried running fio against a block device, a disk partition and a
ramdisk. I ran this with a single job though. For disks, bandwidth
seems to stay nearly the same while submit latencies get better.
And for the ramdisk, bandwidth also sees improvement. I should probably
be doing better tests; any suggestions on what or how I can test?
For block devices, if the patch doesn't make things worse, at least,
then that should be good enough?
------ disk -------
submit latencies(usec)
B/w iops runtime min max avg std dev
Read :
Old: 417335 B/s 101 252668msec 4 231 40.03 21.66
New: 419099 B/s 102 251282msec 0 169 8.20 6.95
Write:
Old: 412667 B/s 100 252668msec 3 272 47.65 24.58
New: 415481 B/s 101 251282msec 0 134 7.95 7.11
------ ramdisk -------
submit latencies(usec)
B/w iops runtime min max avg std dev
Read:
Old: 708235KB/s 177058 1227msec 1 51 1.61 0.72
New: 822157KB/s 205539 1059msec 0 14 0.38 0.52
Write:
Old: 710510KB/s 177627 1227msec 2 46 2.33 0.81
New: 821658KB/s 205414 1059msec 0 24 0.40 0.53
Full fio results are attached, and I dropped caches before running
the tests.
--
Ankit Jain
SUSE Labs
[-- Attachment #2: raw-disk-new.log --]
[-- Type: text/x-log, Size: 2601 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: (groupid=0, jobs=1): err= 0: pid=2109: Thu Jul 26 17:14:55 2012
read : io=102844KB, bw=419099 B/s, iops=102 , runt=251282msec
slat (usec): min=0 , max=169 , avg= 8.20, stdev= 6.95
clat (usec): min=335 , max=3356.7K, avg=255054.47, stdev=158234.29
lat (usec): min=342 , max=3356.7K, avg=255063.32, stdev=158234.33
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 50], 10.00th=[ 84], 20.00th=[ 130],
| 30.00th=[ 169], 40.00th=[ 204], 50.00th=[ 237], 60.00th=[ 269],
| 70.00th=[ 306], 80.00th=[ 351], 90.00th=[ 437], 95.00th=[ 529],
| 99.00th=[ 791], 99.50th=[ 914], 99.90th=[ 1237], 99.95th=[ 1483],
| 99.99th=[ 2073]
bw (KB/s) : min= 111, max= 646, per=100.00%, avg=410.90, stdev=84.69
write: io=101956KB, bw=415481 B/s, iops=101 , runt=251282msec
slat (usec): min=0 , max=134 , avg= 7.95, stdev= 7.11
clat (usec): min=189 , max=928209 , avg=58138.79, stdev=76776.72
lat (usec): min=194 , max=928221 , avg=58147.37, stdev=76776.86
clat percentiles (usec):
| 1.00th=[ 498], 5.00th=[ 828], 10.00th=[ 1624], 20.00th=[ 4960],
| 30.00th=[12352], 40.00th=[22144], 50.00th=[33536], 60.00th=[46848],
| 70.00th=[63232], 80.00th=[90624], 90.00th=[148480], 95.00th=[203776],
| 99.00th=[370688], 99.50th=[460800], 99.90th=[643072], 99.95th=[716800],
| 99.99th=[831488]
bw (KB/s) : min= 31, max= 864, per=100.00%, avg=408.11, stdev=111.34
lat (usec) : 250=0.02%, 500=0.54%, 750=1.27%, 1000=1.51%
lat (msec) : 2=2.39%, 4=3.60%, 10=4.63%, 20=5.96%, 50=13.51%
lat (msec) : 100=14.18%, 250=27.95%, 500=21.04%, 750=2.78%, 1000=0.46%
lat (msec) : 2000=0.15%, >=2000=0.01%
cpu : usr=0.51%, sys=1.52%, ctx=52135, majf=0, minf=23
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=25711/w=25489/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=102844KB, aggrb=409KB/s, minb=409KB/s, maxb=409KB/s, mint=251282msec, maxt=251282msec
WRITE: io=101956KB, aggrb=405KB/s, minb=405KB/s, maxb=405KB/s, mint=251282msec, maxt=251282msec
fio rand-rw-disk-2-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/ad6d29a/raw-disk-2-raw-ad6d29a.log --max-jobs=2 --latency-log --bandwidth-log
ad6d29a sent upstream
[-- Attachment #3: raw-disk-old.log --]
[-- Type: text/x-log, Size: 2663 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: (groupid=0, jobs=1): err= 0: pid=2117: Thu Jul 26 17:53:41 2012
read : io=102976KB, bw=417335 B/s, iops=101 , runt=252668msec
slat (usec): min=4 , max=231 , avg=40.03, stdev=21.66
clat (usec): min=236 , max=4075.6K, avg=254175.39, stdev=158853.64
lat (usec): min=339 , max=4075.7K, avg=254216.22, stdev=158853.33
clat percentiles (msec):
| 1.00th=[ 7], 5.00th=[ 51], 10.00th=[ 85], 20.00th=[ 131],
| 30.00th=[ 169], 40.00th=[ 204], 50.00th=[ 237], 60.00th=[ 269],
| 70.00th=[ 306], 80.00th=[ 351], 90.00th=[ 433], 95.00th=[ 529],
| 99.00th=[ 766], 99.50th=[ 906], 99.90th=[ 1270], 99.95th=[ 1500],
| 99.99th=[ 3261]
bw (KB/s) : min= 83, max= 624, per=100.00%, avg=409.49, stdev=91.68
write: io=101824KB, bw=412667 B/s, iops=100 , runt=252668msec
slat (usec): min=3 , max=272 , avg=47.65, stdev=24.58
clat (usec): min=139 , max=1248.8K, avg=60442.70, stdev=82817.74
lat (usec): min=198 , max=1248.9K, avg=60491.15, stdev=82817.11
clat percentiles (usec):
| 1.00th=[ 438], 5.00th=[ 812], 10.00th=[ 1704], 20.00th=[ 5280],
| 30.00th=[13376], 40.00th=[23168], 50.00th=[34560], 60.00th=[47872],
| 70.00th=[66048], 80.00th=[91648], 90.00th=[150528], 95.00th=[209920],
| 99.00th=[403456], 99.50th=[505856], 99.90th=[798720], 99.95th=[897024],
| 99.99th=[1073152]
bw (KB/s) : min= 7, max= 808, per=100.00%, avg=405.72, stdev=121.51
lat (usec) : 250=0.03%, 500=0.79%, 750=1.35%, 1000=1.21%
lat (msec) : 2=2.29%, 4=3.59%, 10=4.36%, 20=5.90%, 50=13.45%
lat (msec) : 100=14.37%, 250=28.08%, 500=21.27%, 750=2.73%, 1000=0.42%
lat (msec) : 2000=0.17%, >=2000=0.01%
cpu : usr=0.54%, sys=1.44%, ctx=52211, majf=0, minf=23
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=25744/w=25456/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=102976KB, aggrb=407KB/s, minb=407KB/s, maxb=407KB/s, mint=252668msec, maxt=252668msec
WRITE: io=101824KB, aggrb=402KB/s, minb=402KB/s, maxb=402KB/s, mint=252668msec, maxt=252668msec
fio rand-rw-disk-2-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/918227b/raw-disk-2-raw-918227b.log --max-jobs=2 --latency-log --bandwidth-log
918227b Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6
[-- Attachment #4: raw-rd-new.log --]
[-- Type: text/x-log, Size: 2477 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: (groupid=0, jobs=1): err= 0: pid=2179: Thu Jul 26 17:14:57 2012
read : io=870664KB, bw=822157KB/s, iops=205539 , runt= 1059msec
slat (usec): min=0 , max=14 , avg= 0.38, stdev= 0.52
clat (usec): min=44 , max=294 , avg=76.67, stdev=12.77
lat (usec): min=45 , max=296 , avg=77.10, stdev=12.84
clat percentiles (usec):
| 1.00th=[ 70], 5.00th=[ 71], 10.00th=[ 72], 20.00th=[ 73],
| 30.00th=[ 74], 40.00th=[ 74], 50.00th=[ 75], 60.00th=[ 76],
| 70.00th=[ 76], 80.00th=[ 77], 90.00th=[ 79], 95.00th=[ 86],
| 99.00th=[ 97], 99.50th=[ 107], 99.90th=[ 255], 99.95th=[ 266],
| 99.99th=[ 286]
bw (KB/s) : min=819368, max=826656, per=100.00%, avg=823012.00, stdev=5153.39
write: io=870136KB, bw=821658KB/s, iops=205414 , runt= 1059msec
slat (usec): min=0 , max=24 , avg= 0.40, stdev= 0.53
clat (usec): min=42 , max=292 , avg=77.34, stdev=12.81
lat (usec): min=43 , max=293 , avg=77.79, stdev=12.89
clat percentiles (usec):
| 1.00th=[ 70], 5.00th=[ 72], 10.00th=[ 73], 20.00th=[ 74],
| 30.00th=[ 74], 40.00th=[ 75], 50.00th=[ 76], 60.00th=[ 76],
| 70.00th=[ 77], 80.00th=[ 78], 90.00th=[ 80], 95.00th=[ 87],
| 99.00th=[ 98], 99.50th=[ 107], 99.90th=[ 262], 99.95th=[ 270],
| 99.99th=[ 286]
bw (KB/s) : min=819368, max=825328, per=100.00%, avg=822348.00, stdev=4214.36
lat (usec) : 50=0.01%, 100=99.35%, 250=0.41%, 500=0.24%
cpu : usr=24.76%, sys=74.76%, ctx=114, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=217666/w=217534/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=870664KB, aggrb=822156KB/s, minb=822156KB/s, maxb=822156KB/s, mint=1059msec, maxt=1059msec
WRITE: io=870136KB, aggrb=821658KB/s, minb=821658KB/s, maxb=821658KB/s, mint=1059msec, maxt=1059msec
Disk stats (read/write):
ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
fio rand-rw-rd-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/ad6d29a/raw-rd-raw-ad6d29a.log --max-jobs=2 --latency-log --bandwidth-log
ad6d29a sent upstream
[-- Attachment #5: raw-rd-old.log --]
[-- Type: text/x-log, Size: 2548 bytes --]
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process
random_rw: (groupid=0, jobs=1): err= 0: pid=2192: Thu Jul 26 17:53:52 2012
read : io=869004KB, bw=708235KB/s, iops=177058 , runt= 1227msec
slat (usec): min=1 , max=51 , avg= 1.61, stdev= 0.72
clat (usec): min=17 , max=425 , avg=87.51, stdev=17.51
lat (usec): min=19 , max=432 , avg=89.20, stdev=17.83
clat percentiles (usec):
| 1.00th=[ 79], 5.00th=[ 81], 10.00th=[ 82], 20.00th=[ 83],
| 30.00th=[ 84], 40.00th=[ 85], 50.00th=[ 86], 60.00th=[ 86],
| 70.00th=[ 87], 80.00th=[ 89], 90.00th=[ 91], 95.00th=[ 98],
| 99.00th=[ 111], 99.50th=[ 118], 99.90th=[ 374], 99.95th=[ 390],
| 99.99th=[ 406]
bw (KB/s) : min=707912, max=711328, per=100.00%, avg=709620.00, stdev=2415.48
write: io=871796KB, bw=710510KB/s, iops=177627 , runt= 1227msec
slat (usec): min=2 , max=46 , avg= 2.33, stdev= 0.81
clat (usec): min=14 , max=425 , avg=87.62, stdev=17.61
lat (usec): min=16 , max=435 , avg=90.05, stdev=18.07
clat percentiles (usec):
| 1.00th=[ 79], 5.00th=[ 81], 10.00th=[ 82], 20.00th=[ 83],
| 30.00th=[ 84], 40.00th=[ 85], 50.00th=[ 86], 60.00th=[ 87],
| 70.00th=[ 87], 80.00th=[ 89], 90.00th=[ 91], 95.00th=[ 98],
| 99.00th=[ 111], 99.50th=[ 118], 99.90th=[ 378], 99.95th=[ 390],
| 99.99th=[ 406]
bw (KB/s) : min=709360, max=717872, per=100.00%, avg=713616.00, stdev=6018.89
lat (usec) : 20=0.01%, 50=0.01%, 100=95.59%, 250=4.04%, 500=0.36%
cpu : usr=31.57%, sys=67.94%, ctx=125, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=217251/w=217949/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=869004KB, aggrb=708234KB/s, minb=708234KB/s, maxb=708234KB/s, mint=1227msec, maxt=1227msec
WRITE: io=871796KB, aggrb=710510KB/s, minb=710510KB/s, maxb=710510KB/s, mint=1227msec, maxt=1227msec
Disk stats (read/write):
ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
fio rand-rw-rd-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/918227b/raw-rd-raw-918227b.log --max-jobs=2 --latency-log --bandwidth-log
918227b Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6
* Re: [RFC][PATCH] Make io_submit non-blocking
From: Zach Brown @ 2012-07-26 21:43 UTC
To: Ankit Jain
Cc: Christoph Hellwig, Dave Chinner, Al Viro, bcrl, linux-fsdevel,
linux-aio, linux-kernel, Jan Kara
On Fri, Jul 27, 2012 at 01:22:10AM +0530, Ankit Jain wrote:
> I should probably be doing better tests; any suggestions on what or
> how I can test?
Well, is the test actually *doing* anything with these IOs?
Calling io_submit() and then immediately waiting for completion is the
best case for offloading work to threads inside io_submit(). It's
likely that the kernel thread will then get a chance to run and submit
the IO and you won't have lost much time since the io_submit() queued
the work.
So try saturating the cpus while the tests are running. Give the kernel
aio submission threads some competition for run time on the cpus.
Maybe with the cpuio bits of fio? I haven't used that myself, but the
description of it in its README/HOWTO files uses all the right
words :).
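An untested sketch of a combined job file along those lines
(ioengine=cpuio and cpuload are documented fio options; the load and
job counts below are arbitrary guesses):

; CPU burners competing with the AIO job for run time
[cpu-burn]
ioengine=cpuio
cpuload=85
numjobs=4

[random_rw]
rw=randrw
size=200m
directory=/misc/rd
ioengine=libaio
iodepth=32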
- z