From: "Maxim V. Patlasov" <mpatlasov@parallels.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "miklos@szeredi.hu" <miklos@szeredi.hu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"fuse-devel@lists.sourceforge.net"
<fuse-devel@lists.sourceforge.net>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
<linux-aio@kvack.org>, <bcrl@kvack.org>
Subject: Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
Date: Thu, 13 Dec 2012 16:52:37 +0400 [thread overview]
Message-ID: <50C9CF95.7090704@parallels.com> (raw)
Hi Brian,
>> Existing fuse implementation always processes direct IO
>> synchronously: it
>> submits next request to userspace fuse only when previous is
>> completed. This
>> is suboptimal because: 1) libaio DIO works in blocking way; 2)
>> userspace fuse
>> can't achieve parallelism processing several requests simultaneously
>> (e.g.
>> in case of distributed network storage); 3) userspace fuse can't merge
>> requests before passing it to actual storage.
>>
>> The idea of the patch-set is to submit fuse requests in non-blocking way
>> (where it's possible) and either return -EIOCBQUEUED or wait for their
>> completion synchronously. The patch-set to be applied on top of
>> for-next of
>> Miklos' git repo.
>>
>> To estimate performance improvement I used slightly modified fusexmp
>> over
>> tmpfs (clearing O_DIRECT bit from fi->flags in xmp_open). For
>> synchronous
>> operations I used 'dd' like this:
>>
>> dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
>> dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct
>> conv=notrunc
>>
>> For AIO I used 'aio-stress' like this:
>>
>> aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
>> aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file
>>
>> The throughput on some commodity (rather feeble) server was (in MB/sec):
>>
>> original / patched
>>
>> dd reads: ~322 / ~382
>> dd writes: ~277 / ~288
>>
>> aio reads: ~380 / ~459
>> aio writes: ~319 / ~353
>>
> Thanks for posting this, first of all. I was reading through the code
> and found some of the logic a bit hard to follow, particularly due to
> controlling the behavior of underlying read/write functions based on the
> 'async' kiocb. For example: conditionally updating inode size in a
> couple different places; submitting sync requests async and waiting on
> them (causing the need to duplicate the update size call); and the
> existence of a kiocb called 'async' (when a kiocb can be either sync or
> async).
>
> Anyways, I couldn't quite put my finger on how to potentially clean that
> up without messing around with the code, so I've inlined the diff that I
> came up with that (IMO) cleans things up a bit. This should apply on top
> of your set and makes the following tweaks:
>
> - Always pass a kiocb down through __fuse_direct_write()/read() (instead
> of struct file).
> - Trigger the "async" behavior in fuse_direct_io() and
> fuse_send_read/write() based on the sync/async nature of the kiocb. This
> means that sync requests and async requests are sent as such based on
> the nature of the kiocb (as opposed to whether a kiocb exists).
> - Update the various other callers of __fuse_direct_write()/read and
> fuse_direct_io() to create and pass a sync kiocb where necessary.
> - Use the same approach in fuse_direct_IO() to turn an extending, async
> write into a sync write.
>
> The tradeoff with this approach is that we slightly uglify the callers
> that have to now create a sync kiocb. That said, I think some of those
> codepaths (i.e., fuse_file_aio_write()->fuse_perform_write()->...) could
> now just pass the kiocb from the vfs straight down. Thoughts?
>
Thanks a lot for review and further efforts. Your clean-up patch looks
fine, but it constrains the scope of patch-set too much: ordinary
synchronous dio-s (generated by read(2) and write(2)) cease to be
optimized. I think such a loss doesn't justify the clean-up. To be sure
that we are on the same page, I'll try to explain this optimization in
more details:
Imagine direct read or write of 1MB size. Having
FUSE_MAX_PAGES_PER_REQ==32, this can be fulfilled by 8 fuse requests.
Generic linux-kernel (e.g. generic_file_aio_read) passes synchronous
iocb to fuse_direct_IO(). From fuse perspective, it only means that the
caller expects to get control back when that 1MB request is completed.
It's up to fuse how to process that 1MB request.
Existing fuse implementation (w/o my patch-set) processes it like this:
- allocate, fill and send 1st fuse request, wait for its completion
(ACK-ing by userspace);
- allocate, fill and send 2nd fuse request, wait for its completion
(ACK-ing by userspace);
- ...
- allocate, fill and send 8th fuse request, wait for its completion
(ACK-ing by userspace);
With my patch-set applied, it becomes like this:
- allocate, fill and send 1st fuse request;
- allocate, fill and send 2nd fuse request;
- ...
- allocate, fill and send 8th fuse request;
- wait till all of them are completed (ACK-ed by userspace)
This is significant optimization for non-trivial fuse userspace
implementations because: 1) fuse requests might be processed
concurrently; 2) fuse requests might be merged before processing.
Now, let's proceed to technical details... Your idea to use iocb
everywhere instead of file and rely on its type (sync vs. async) is
cool. I like it! Unfortunately, to make it usable in both libaio and
sync dio cases we'll need to put our hands on generic linux-kernel code
as well. Let me explain why:
Currently, iocb has only one binary attribute - it's either sync or
async. aio_complete() derives its ancestry (libaio vs. read(2)/write(2))
from its sync/async type:
> if (is_sync_kiocb(iocb)) {
> BUG_ON(iocb->ki_users != 1);
> iocb->ki_user_data = res;
> iocb->ki_users = 0;
> wake_up_process(iocb->ki_obj.tsk);
> return 1;
> }
In the other words, if it's sync, it's enough to wake up someone in
kernel (who called wait_on_sync_kiocb()) and return. Otherwise,
aio_complete() performs some steps to wake up user (who called
io_getevents()). What we need for fuse is another binary attribute: iocb
was generated by libaio vs. read(2)/write(2). Then we could use
aio_complete() and wait_on_sync_kiocb() for any iocb which was not
generated by libaio. E.g. to process sync dio in async way, we'd
allocate iocb in fuse_direct_IO on stack (as you suggested), but
initialize it as async iocb and pass it to __fuse_direct_read/write,
then call wait_on_sync_kiocb(), not to return -EIOCBQUEUED.
The problem is that FUSE tends to be the only user of this 'feature'.
I'm considering one-line change of aio_complete():
- if (is_sync_kiocb(iocb)) {
+ if (!iocb->ki_ctx) {
Though, not sure whether Miklos, Al Viro, Benjamin and others will
accept it... Let's try if you're OK about this approach. At least this
will cover all fuse dio use-cases (including libaio extending file) and
makes fuse code significantly cleaner.
Thanks,
Maxim
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
next reply other threads:[~2012-12-13 12:52 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-13 12:52 Maxim V. Patlasov [this message]
2012-12-13 19:43 ` [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously Zach Brown
2012-12-14 6:44 ` Maxim V. Patlasov
2012-12-13 19:51 ` Brian Foster
2012-12-14 12:40 ` Maxim V. Patlasov
2012-12-14 14:21 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50C9CF95.7090704@parallels.com \
--to=mpatlasov@parallels.com \
--cc=bcrl@kvack.org \
--cc=bfoster@redhat.com \
--cc=fuse-devel@lists.sourceforge.net \
--cc=linux-aio@kvack.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.