From: Jens Axboe <axboe@kernel.dk>
To: Dan Ehrenberg <dehrenberg@google.com>
Cc: fio@vger.kernel.org
Subject: Re: [PATCH] Adding userspace_libaio_reap option
Date: Tue, 30 Aug 2011 07:45:17 -0600
Message-ID: <4E5CE96D.3000905@kernel.dk>
In-Reply-To: <1314664153-21134-1-git-send-email-dehrenberg@google.com>

On 2011-08-29 18:29, Dan Ehrenberg wrote:
> When a single thread is reading from a libaio io_context_t object
> in a non-blocking polling manner (that is, with the minimum number
> of events to return being 0), then it is possible to safely read
> events directly from user-space, taking advantage of the fact that
> the io_context_t object is a pointer to memory with a certain layout.
> This patch adds an option, userspace_libaio_reap, which allows
> reading events in this manner when the libaio engine is used.
>
> You can observe its effect by setting iodepth_batch_complete=0
> and seeing the change in distribution of system/user time based on
> whether this new flag is set. If userspace_libaio_reap=1, then
> busy polling takes place in userspace, and there is a larger amount of
> usr CPU. If userspace_libaio_reap=0 (the default), then there is a
> larger amount of sys CPU from the polling in the kernel.
>
> Polling from a queue in this manner is several times faster. In my
> testing, it took less than an eighth as much time to execute a
> polling operation in user-space than with the io_getevents syscall.

Good stuff! The libaio side looks good, but I think we should add
engine-specific options under the specific engine. With all the
commands/options that fio has, it quickly becomes a bit unwieldy. So the
idea would be to have:

    ioengine=libaio:userspace_reap

I'll look into that.

One question on the code:

> +static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
> +			     struct io_event *events)
> +{
> +	long i = 0;
> +	unsigned head;
> +	struct aio_ring *ring = (struct aio_ring*)aio_ctx;
> +
> +	while (i < max) {
> +		head = ring->head;
> +
> +		if (head == ring->tail) {
> +			/* There are no more completions */
> +			break;
> +		} else {
> +			/* There is another completion to reap */
> +			events[i] = ring->events[head];
> +			ring->head = (head + 1) % ring->nr;
> +			i++;
> +		}
> +	}

Don't we need a read barrier here before reading the head/tail?
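
To make the ordering concern concrete, here is a minimal sketch of a single-producer, single-consumer ring reaped from userspace. It is not the patch's code and not the kernel's real aio_ring layout: struct sketch_ring, struct sketch_event, sketch_post() and sketch_reap() are hypothetical stand-ins, and C11 acquire/release atomics are used here in place of whatever read barrier the real code would need on a given platform. The point is only that the load of tail should be ordered before the read of the event slot, so the reaper cannot see a new tail and then read a stale event.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_RING_SLOTS 8

/* Hypothetical stand-in for struct io_event. */
struct sketch_event {
	long res;
};

/* Hypothetical stand-in for the ring; NOT the kernel's aio_ring layout. */
struct sketch_ring {
	_Atomic unsigned head;	/* advanced by the consumer (the reaper) */
	_Atomic unsigned tail;	/* advanced by the producer */
	unsigned nr;		/* number of slots; one is always left empty */
	struct sketch_event events[SKETCH_RING_SLOTS];
};

static void sketch_ring_init(struct sketch_ring *ring)
{
	atomic_init(&ring->head, 0);
	atomic_init(&ring->tail, 0);
	ring->nr = SKETCH_RING_SLOTS;
}

/* Producer side: returns 1 if the event was queued, 0 if the ring is full. */
static int sketch_post(struct sketch_ring *ring, long res)
{
	unsigned tail = atomic_load_explicit(&ring->tail, memory_order_relaxed);
	unsigned next = (tail + 1) % ring->nr;

	if (next == atomic_load_explicit(&ring->head, memory_order_acquire))
		return 0;

	ring->events[tail].res = res;
	/* Release: the event write above is visible before the new tail. */
	atomic_store_explicit(&ring->tail, next, memory_order_release);
	return 1;
}

/* Consumer side: copies up to max completed events, never blocks. */
static int sketch_reap(struct sketch_ring *ring, unsigned max,
		       struct sketch_event *events)
{
	unsigned i = 0;
	unsigned head = atomic_load_explicit(&ring->head, memory_order_relaxed);

	assert(ring->nr > 0);

	while (i < max) {
		/* Acquire: plays the role of the read barrier asked about. */
		unsigned tail = atomic_load_explicit(&ring->tail,
						     memory_order_acquire);

		if (head == tail)
			break;	/* no more completions */

		events[i++] = ring->events[head];
		head = (head + 1) % ring->nr;
		/* Release: the slot is only handed back after it was copied. */
		atomic_store_explicit(&ring->head, head, memory_order_release);
	}
	return (int)i;
}
```

With a ring initialised by sketch_ring_init(), posting three events and then calling sketch_reap() with max set to 2 returns the first two in order, and a second call returns the remaining one. Whether the real libaio ring needs an explicit barrier, and which one, depends on the architecture and on how the kernel publishes tail, so this is only an illustration of the question.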
--
Jens Axboe