From: Jens Axboe <axboe@kernel.dk>
To: Daniel Ehrenberg <dehrenberg@google.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: Re: [PATCH] Adding userspace_libaio_reap option
Date: Tue, 30 Aug 2011 11:51:37 -0600
Message-ID: <4E5D2329.7070909@kernel.dk>
In-Reply-To: <CAAK6Zt23Aq86=5Cbe+Bzpy=8QoAmequf-9Keo7wE45hHWwxW7g@mail.gmail.com>

On 2011-08-30 11:47, Daniel Ehrenberg wrote:
> On Tuesday, August 30, 2011, Jens Axboe <axboe@kernel.dk> wrote:
>> On 2011-08-29 18:29, Dan Ehrenberg wrote:
>>> When a single thread is reading from a libaio io_context_t object
>>> in a non-blocking polling manner (that is, with the minimum number
>>> of events to return being 0), then it is possible to safely read
>>> events directly from user-space, taking advantage of the fact that
>>> the io_context_t object is a pointer to memory with a certain layout.
>>> This patch adds an option, userspace_libaio_reap, which allows
>>> reading events in this manner when the libaio engine is used.
>>>
>>> You can observe its effect by setting iodepth_batch_complete=0
>>> and seeing the change in distribution of system/user time based on
>>> whether this new flag is set. If userspace_libaio_reap=1, then
>>> busy polling takes place in userspace, and there is a larger amount of
>>> usr CPU. If userspace_libaio_reap=0 (the default), then there is a
>>> larger amount of sys CPU from the polling in the kernel.
>>>
>>> Polling from a queue in this manner is several times faster. In my
>>> testing, it took less than an eighth as much time to execute a
>>> polling operation in user-space than with the io_getevents syscall.
>>
>> Good stuff! The libaio side looks good, but I think we should add engine
>> specific options under the specific engine. With all the
>> commands/options that fio has, it quickly becomes a bit unwieldy. So,
>> idea would be to have:
>>
>> ioengine=libaio:userspace_reap
>
> Good idea. I was looking around for engine-specific options but didn't
> see any examples. I like this convention.

Optimally, we should be able to nest options under the options. But a
quicker hack should suffice; it can always be extended if need be.

>>
>> I'll look into that.
>>
>> One question on the code:
>>
>>> +static int user_io_getevents(io_context_t aio_ctx, unsigned int max,
>>> +			     struct io_event *events)
>>> +{
>>> +	long i = 0;
>>> +	unsigned head;
>>> +	struct aio_ring *ring = (struct aio_ring*)aio_ctx;
>>> +
>>> +	while (i < max) {
>>> +		head = ring->head;
>>> +
>>> +		if (head == ring->tail) {
>>> +			/* There are no more completions */
>>> +			break;
>>> +		} else {
>>> +			/* There is another completion to reap */
>>> +			events[i] = ring->events[head];
>>> +			ring->head = (head + 1) % ring->nr;
>>> +			i++;
>>> +		}
>>> +	}
>>
>> Don't we need a read barrier here before reading the head/tail?
>>
> Of course; how did I forget that?
>
> I can make a fine barrier to run on my x64 machines, but it would be
> much better to not introduce an architectural dependency. Is there any
> kind of free library for this? Google has one (used in V8) but it's
> C++ and probably isn't on enough architectures. And of course the
> Linux kernel has one, but it would be a small project to extract it
> for use in user-space--or has someone done this work?

Fio already includes read and write barriers: read_barrier() and
write_barrier().

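For illustration only, here is a minimal sketch of where a read barrier
could sit in a reap loop like the one quoted above. It is not the patch
and not fio code. The names mock_ring, mock_event, sketch_read_barrier()
and sketch_reap() are hypothetical stand-ins so that the example
compiles without libaio headers, and the barrier is a GCC/Clang acquire
fence assumed to play the role of fio's read_barrier(). The placement
shown (after reading the tail, before reading the event slot) is one
plausible choice, not a statement of what the final code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct io_event and struct aio_ring. */
struct mock_event {
	unsigned long data;
	long res;
};

struct mock_ring {
	unsigned nr;	/* number of slots in events[] */
	unsigned head;	/* next slot to consume, advanced by the reader */
	unsigned tail;	/* next slot to fill, advanced by the producer */
	struct mock_event events[8];
};

/* Assumption: an acquire fence standing in for fio's read_barrier(). */
static inline void sketch_read_barrier(void)
{
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
}

/* Copy up to max completed events out of the ring without a syscall. */
static int sketch_reap(struct mock_ring *ring, unsigned int max,
		       struct mock_event *events)
{
	unsigned int i = 0;

	assert(ring->nr > 0);	/* guards the modulo below */

	while (i < max) {
		unsigned head = ring->head;
		/* Force a fresh load of the tail on every iteration. */
		unsigned tail = *(volatile unsigned *)&ring->tail;

		if (head == tail)
			break;	/* no more completions */

		/* Order the tail load before the load of the event slot. */
		sketch_read_barrier();

		events[i++] = ring->events[head];
		ring->head = (head + 1) % ring->nr;
	}

	return (int)i;
}
```

A caller would pass a ring, the size of its output array, and the array
itself, and would get back the number of events copied, which is 0 when
head equals tail.
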
FWIW, I agree with Jeff that this would be best handled in the libaio
library code. But if we can make it work reliably with the generic
kernel code (and I think we should), then I want to carry it in fio. For
patches that aren't even merged yet, the road to a setup that already
has this included by default is very long.
--
Jens Axboe