* Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
@ 2012-12-13 12:52 Maxim V. Patlasov
2012-12-13 19:43 ` Zach Brown
2012-12-13 19:51 ` Brian Foster
0 siblings, 2 replies; 6+ messages in thread
From: Maxim V. Patlasov @ 2012-12-13 12:52 UTC (permalink / raw)
To: Brian Foster
Cc: miklos@szeredi.hu, Alexander Viro,
fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-aio, bcrl
Hi Brian,
>> Existing fuse implementation always processes direct IO
>> synchronously: it
>> submits next request to userspace fuse only when previous is
>> completed. This
>> is suboptimal because: 1) libaio DIO works in blocking way; 2)
>> userspace fuse
>> can't achieve parallelism processing several requests simultaneously
>> (e.g.
>> in case of distributed network storage); 3) userspace fuse can't merge
>> requests before passing it to actual storage.
>>
>> The idea of the patch-set is to submit fuse requests in non-blocking way
>> (where it's possible) and either return -EIOCBQUEUED or wait for their
>> completion synchronously. The patch-set to be applied on top of
>> for-next of
>> Miklos' git repo.
>>
>> To estimate performance improvement I used slightly modified fusexmp
>> over
>> tmpfs (clearing O_DIRECT bit from fi->flags in xmp_open). For
>> synchronous
>> operations I used 'dd' like this:
>>
>> dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
>> dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct
>> conv=notrunc
>>
>> For AIO I used 'aio-stress' like this:
>>
>> aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
>> aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file
>>
>> The throughput on some commodity (rather feeble) server was (in MB/sec):
>>
>> original / patched
>>
>> dd reads: ~322 / ~382
>> dd writes: ~277 / ~288
>>
>> aio reads: ~380 / ~459
>> aio writes: ~319 / ~353
>>
> Thanks for posting this, first of all. I was reading through the code
> and found some of the logic a bit hard to follow, particularly due to
> controlling the behavior of underlying read/write functions based on the
> 'async' kiocb. For example: conditionally updating inode size in a
> couple different places; submitting sync requests async and waiting on
> them (causing the need to duplicate the update size call); and the
> existence of a kiocb called 'async' (when a kiocb can be either sync or
> async).
>
> Anyways, I couldn't quite put my finger on how to potentially clean that
> up without messing around with the code, so I've inlined the diff that I
> came up with that (IMO) cleans things up a bit. This should apply on top
> of your set and makes the following tweaks:
>
> - Always pass a kiocb down through __fuse_direct_write()/read() (instead
> of struct file).
> - Trigger the "async" behavior in fuse_direct_io() and
> fuse_send_read/write() based on the sync/async nature of the kiocb. This
> means that sync requests and async requests are sent as such based on
> the nature of the kiocb (as opposed to whether a kiocb exists).
> - Update the various other callers of __fuse_direct_write()/read and
> fuse_direct_io() to create and pass a sync kiocb where necessary.
> - Use the same approach in fuse_direct_IO() to turn an extending, async
> write into a sync write.
>
> The tradeoff with this approach is that we slightly uglify the callers
> that have to now create a sync kiocb. That said, I think some of those
> codepaths (i.e., fuse_file_aio_write()->fuse_perform_write()->...) could
> now just pass the kiocb from the vfs straight down. Thoughts?
>
Thanks a lot for review and further efforts. Your clean-up patch looks
fine, but it constrains the scope of patch-set too much: ordinary
synchronous dio-s (generated by read(2) and write(2)) cease to be
optimized. I think such a loss doesn't justify the clean-up. To be sure
that we are on the same page, I'll try to explain this optimization in
more details:
Imagine direct read or write of 1MB size. Having
FUSE_MAX_PAGES_PER_REQ==32, this can be fulfilled by 8 fuse requests.
Generic linux-kernel (e.g. generic_file_aio_read) passes synchronous
iocb to fuse_direct_IO(). From fuse perspective, it only means that the
caller expects to get control back when that 1MB request is completed.
It's up to fuse how to process that 1MB request.
Existing fuse implementation (w/o my patch-set) processes it like this:
- allocate, fill and send 1st fuse request, wait for its completion
(ACK-ing by userspace);
- allocate, fill and send 2nd fuse request, wait for its completion
(ACK-ing by userspace);
- ...
- allocate, fill and send 8th fuse request, wait for its completion
(ACK-ing by userspace);
With my patch-set applied, it becomes like this:
- allocate, fill and send 1st fuse request;
- allocate, fill and send 2nd fuse request;
- ...
- allocate, fill and send 8th fuse request;
- wait till all of them are completed (ACK-ed by userspace)
This is significant optimization for non-trivial fuse userspace
implementations because: 1) fuse requests might be processed
concurrently; 2) fuse requests might be merged before processing.
Now, let's proceed to technical details... Your idea to use iocb
everywhere instead of file and rely on its type (sync vs. async) is
cool. I like it! Unfortunately, to make it usable in both libaio and
sync dio cases we'll need to put our hands on generic linux-kernel code
as well. Let me explain why:
Currently, iocb has only one binary attribute - it's either sync or
async. aio_complete() derives its ancestry (libaio vs. read(2)/write(2))
from its sync/async type:
> if (is_sync_kiocb(iocb)) {
> BUG_ON(iocb->ki_users != 1);
> iocb->ki_user_data = res;
> iocb->ki_users = 0;
> wake_up_process(iocb->ki_obj.tsk);
> return 1;
> }
In the other words, if it's sync, it's enough to wake up someone in
kernel (who called wait_on_sync_kiocb()) and return. Otherwise,
aio_complete() performs some steps to wake up user (who called
io_getevents()). What we need for fuse is another binary attribute: iocb
was generated by libaio vs. read(2)/write(2). Then we could use
aio_complete() and wait_on_sync_kiocb() for any iocb which was not
generated by libaio. E.g. to process sync dio in async way, we'd
allocate iocb in fuse_direct_IO on stack (as you suggested), but
initialize it as async iocb and pass it to __fuse_direct_read/write,
then call wait_on_sync_kiocb(), not to return -EIOCBQUEUED.
The problem is that FUSE tends to be the only user of this 'feature'.
I'm considering one-line change of aio_complete():
- if (is_sync_kiocb(iocb)) {
+ if (!iocb->ki_ctx) {
Though, not sure whether Miklos, Al Viro, Benjamin and others will
accept it... Let's try if you're OK about this approach. At least this
will cover all fuse dio use-cases (including libaio extending file) and
makes fuse code significantly cleaner.
Thanks,
Maxim
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
2012-12-13 12:52 [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
@ 2012-12-13 19:43 ` Zach Brown
2012-12-14 6:44 ` Maxim V. Patlasov
2012-12-13 19:51 ` Brian Foster
1 sibling, 1 reply; 6+ messages in thread
From: Zach Brown @ 2012-12-13 19:43 UTC (permalink / raw)
To: Maxim V. Patlasov
Cc: Brian Foster, miklos@szeredi.hu, Alexander Viro,
fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-aio, bcrl
(I'm jumping in mid-thread, I don't *really* know the context.)
> In the other words, if it's sync, it's enough to wake up someone in
> kernel (who called wait_on_sync_kiocb()) and return. Otherwise,
> aio_complete() performs some steps to wake up user (who called
> io_getevents()). What we need for fuse is another binary attribute:
> iocb was generated by libaio vs. read(2)/write(2). Then we could use
> aio_complete() and wait_on_sync_kiocb() for any iocb which was not
> generated by libaio. E.g. to process sync dio in async way, we'd
> allocate iocb in fuse_direct_IO on stack (as you suggested), but
> initialize it as async iocb and pass it to __fuse_direct_read/write,
> then call wait_on_sync_kiocb(), not to return -EIOCBQUEUED.
It sounds like you might be interested in the aio in-kernel interface
that is being worked on in a patch series that gets loop issuing aio
instead of doing sync io.
http://thread.gmane.org/gmane.linux.file-systems/69326/focus=1398723
- z
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
2012-12-13 12:52 [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
2012-12-13 19:43 ` Zach Brown
@ 2012-12-13 19:51 ` Brian Foster
2012-12-14 12:40 ` Maxim V. Patlasov
1 sibling, 1 reply; 6+ messages in thread
From: Brian Foster @ 2012-12-13 19:51 UTC (permalink / raw)
To: Maxim V. Patlasov
Cc: miklos@szeredi.hu, Alexander Viro,
fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-aio, bcrl
On 12/13/2012 07:52 AM, Maxim V. Patlasov wrote:
> Hi Brian,
>
...
>
...
>>
>
> Thanks a lot for review and further efforts. Your clean-up patch looks
> fine, but it constrains the scope of patch-set too much: ordinary
> synchronous dio-s (generated by read(2) and write(2)) cease to be
> optimized. I think such a loss doesn't justify the clean-up. To be sure
> that we are on the same page, I'll try to explain this optimization in
> more details:
>
...
>
> This is significant optimization for non-trivial fuse userspace
> implementations because: 1) fuse requests might be processed
> concurrently; 2) fuse requests might be merged before processing.
>
I missed that, thanks for the explanation. It makes sense and I agree,
the clean up alone isn't worth that tradeoff. Basically, my assumption
that we could trigger fuse request handling based on the type of iocb is
invalid...
> Now, let's proceed to technical details... Your idea to use iocb
> everywhere instead of file and rely on its type (sync vs. async) is
> cool. I like it! Unfortunately, to make it usable in both libaio and
> sync dio cases we'll need to put our hands on generic linux-kernel code
> as well. Let me explain why:
>
Before getting into that, I made some adjustments to the original patch
to use your new fuse_io_priv structure in place of an iocb, primarily to
decouple the request submission mechanism from the type of the user
request. The highlighted differences are:
- Updated fuse_io_priv with an async field and file pointer to preserve
the current style of interface (i.e., use this instead of iocb).
- Trigger the type of request submission based on the async field.
- Reintroduce the wait in fuse_direct_IO(), but I also pulled up the
fuse_write_update_size() call out of __fuse_direct_write() to make the
separate paths more consistent.
This could probably use more cleanups (i.e., it morphs your fuse_io_priv
into more of a generic I/O descriptor), but illustrates the idea.
> Currently, iocb has only one binary attribute - it's either sync or
> async. aio_complete() derives its ancestry (libaio vs. read(2)/write(2))
> from its sync/async type:
>
>> if (is_sync_kiocb(iocb)) {
>> BUG_ON(iocb->ki_users != 1);
>> iocb->ki_user_data = res;
>> iocb->ki_users = 0;
>> wake_up_process(iocb->ki_obj.tsk);
>> return 1;
>> }
>
> In the other words, if it's sync, it's enough to wake up someone in
> kernel (who called wait_on_sync_kiocb()) and return. Otherwise,
> aio_complete() performs some steps to wake up user (who called
> io_getevents()). What we need for fuse is another binary attribute: iocb
> was generated by libaio vs. read(2)/write(2). Then we could use
> aio_complete() and wait_on_sync_kiocb() for any iocb which was not
> generated by libaio. E.g. to process sync dio in async way, we'd
> allocate iocb in fuse_direct_IO on stack (as you suggested), but
> initialize it as async iocb and pass it to __fuse_direct_read/write,
> then call wait_on_sync_kiocb(), not to return -EIOCBQUEUED.
>
Thanks again for the explanation. I actually ran into this when I
started playing around with the set (initially preserving an extending
async submission and attempting to wait on it, iirc), then I undid that
because the optimization wasn't clear to me. On the plus side, I didn't
quite follow the need for the error recovery truncate but that's clear
now as well. ;)
> The problem is that FUSE tends to be the only user of this 'feature'.
> I'm considering one-line change of aio_complete():
>
> - if (is_sync_kiocb(iocb)) {
> + if (!iocb->ki_ctx) {
>
> Though, not sure whether Miklos, Al Viro, Benjamin and others will
> accept it... Let's try if you're OK about this approach. At least this
> will cover all fuse dio use-cases (including libaio extending file) and
> makes fuse code significantly cleaner.
>
Interesting idea. It seems like it could work to me but I'm not very
familiar with that code and it might not be worth cluttering that
interface for this particular case. I brought this up briefly with Jeff
Moyer and he pointed me at this set (I just saw zab's mail pointing this
out as well :):
https://lkml.org/lkml/2012/10/22/321
... which looks like it could help. Perhaps we could use that new type
of kiocb with a custom callback to wake a blocking thread. That said, I
wonder if we could do that anyways; i.e., define our own
wait_on_fuse_async() type function that that waits on the number of
in-flight requests in the fuse_priv_io. I guess we'd have to complicate
the refcounting with that type of approach, though.
Brian
> Thanks,
> Maxim
---
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index beb99e9..0cfc4f9 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -92,8 +92,11 @@ static ssize_t cuse_read(struct file *file, char
__user *buf, size_t count,
{
loff_t pos = 0;
struct iovec iov = { .iov_base = buf, .iov_len = count };
+ struct fuse_io_priv io = {0,};
- return fuse_direct_io(file, &iov, 1, count, &pos, 0, NULL);
+ io.file = file;
+
+ return fuse_direct_io(&io, &iov, 1, count, &pos, 0);
}
static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -101,12 +104,15 @@ static ssize_t cuse_write(struct file *file, const
char __user *buf,
{
loff_t pos = 0;
struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
+ struct fuse_io_priv io = {0,};
+
+ io.file = file;
/*
* No locking or generic_write_checks(), the server is
* responsible for locking and sanity checks.
*/
- return fuse_direct_io(file, &iov, 1, count, &pos, 1, NULL);
+ return fuse_direct_io(&io, &iov, 1, count, &pos, 1);
}
static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d2094e1..29b94e9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -580,10 +580,8 @@ static void fuse_aio_complete_req(struct fuse_conn
*fc, struct fuse_req *req)
}
static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req
*req,
- size_t num_bytes, struct kiocb *iocb)
+ size_t num_bytes, struct fuse_io_priv *io)
{
- struct fuse_io_priv *io = iocb->private;
-
spin_lock(&io->lock);
io->size += num_bytes;
io->reqs++;
@@ -597,10 +595,10 @@ static size_t fuse_async_req_send(struct fuse_conn
*fc, struct fuse_req *req,
return num_bytes;
}
-static size_t fuse_send_read(struct fuse_req *req, struct file *file,
- loff_t pos, size_t count, fl_owner_t owner,
- struct kiocb *async)
+static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io,
+ loff_t pos, size_t count, fl_owner_t owner)
{
+ struct file *file = io->file;
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fc;
@@ -612,8 +610,8 @@ static size_t fuse_send_read(struct fuse_req *req,
struct file *file,
inarg->lock_owner = fuse_lock_owner_id(fc, owner);
}
- if (async)
- return fuse_async_req_send(fc, req, count, async);
+ if (io->async)
+ return fuse_async_req_send(fc, req, count, io);
fuse_request_send(fc, req);
return req->out.args[0].size;
@@ -635,6 +633,7 @@ static void fuse_read_update_size(struct inode
*inode, loff_t size,
static int fuse_readpage(struct file *file, struct page *page)
{
+ struct fuse_io_priv io = {0,};
struct inode *inode = page->mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_req *req;
@@ -648,6 +647,8 @@ static int fuse_readpage(struct file *file, struct
page *page)
if (is_bad_inode(inode))
goto out;
+ io.file = file;
+
/*
* Page writeback can extend beyond the lifetime of the
* page-cache page, so make sure we read a properly synced
@@ -667,7 +668,7 @@ static int fuse_readpage(struct file *file, struct
page *page)
req->num_pages = 1;
req->pages[0] = page;
req->page_descs[0].length = count;
- num_read = fuse_send_read(req, file, pos, count, NULL, NULL);
+ num_read = fuse_send_read(req, &io, pos, count, NULL);
err = req->out.h.error;
fuse_put_request(fc, req);
@@ -869,10 +870,10 @@ static void fuse_write_fill(struct fuse_req *req,
struct fuse_file *ff,
req->out.args[0].value = outarg;
}
-static size_t fuse_send_write(struct fuse_req *req, struct file *file,
- loff_t pos, size_t count, fl_owner_t owner,
- struct kiocb *async)
+static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv
*io,
+ loff_t pos, size_t count, fl_owner_t owner)
{
+ struct file *file = io->file;
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fc;
struct fuse_write_in *inarg = &req->misc.write.in;
@@ -884,8 +885,8 @@ static size_t fuse_send_write(struct fuse_req *req,
struct file *file,
inarg->lock_owner = fuse_lock_owner_id(fc, owner);
}
- if (async)
- return fuse_async_req_send(fc, req, count, async);
+ if (io->async)
+ return fuse_async_req_send(fc, req, count, io);
fuse_request_send(fc, req);
return req->misc.write.out.size;
@@ -910,11 +911,14 @@ static size_t fuse_send_write_pages(struct
fuse_req *req, struct file *file,
size_t res;
unsigned offset;
unsigned i;
+ struct fuse_io_priv io = {0,};
+
+ io.file = file;
for (i = 0; i < req->num_pages; i++)
fuse_wait_on_page_writeback(inode, req->pages[i]->index);
- res = fuse_send_write(req, file, pos, count, NULL, NULL);
+ res = fuse_send_write(req, &io, pos, count, NULL);
offset = req->page_descs[0].offset;
count = res;
@@ -1252,10 +1256,11 @@ static inline int fuse_iter_npages(const struct
iov_iter *ii_p)
return min(npages, FUSE_MAX_PAGES_PER_REQ);
}
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
unsigned long nr_segs, size_t count, loff_t *ppos,
- int write, struct kiocb *async)
+ int write)
{
+ struct file *file = io->file;
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fc;
size_t nmax = write ? fc->max_write : fc->max_read;
@@ -1276,7 +1281,7 @@ ssize_t fuse_direct_io(struct file *file, const
struct iovec *iov,
size_t nbytes = min(count, nmax);
int err = fuse_get_user_pages(req, &ii, &nbytes, write);
if (err) {
- if (async)
+ if (io->async)
fuse_put_request(fc, req);
res = err;
@@ -1284,13 +1289,11 @@ ssize_t fuse_direct_io(struct file *file, const
struct iovec *iov,
}
if (write)
- nres = fuse_send_write(req, file, pos, nbytes, owner,
- async);
+ nres = fuse_send_write(req, io, pos, nbytes, owner);
else
- nres = fuse_send_read(req, file, pos, nbytes, owner,
- async);
+ nres = fuse_send_read(req, io, pos, nbytes, owner);
- if (!async)
+ if (!io->async)
fuse_release_user_pages(req, !write);
if (req->out.h.error) {
if (!res)
@@ -1306,14 +1309,14 @@ ssize_t fuse_direct_io(struct file *file, const
struct iovec *iov,
if (nres != nbytes)
break;
if (count) {
- if (!async)
+ if (!io->async)
fuse_put_request(fc, req);
req = fuse_get_req(fc, fuse_iter_npages(&ii));
if (IS_ERR(req))
break;
}
}
- if (!IS_ERR(req) && !async)
+ if (!IS_ERR(req) && !io->async)
fuse_put_request(fc, req);
if (res > 0)
*ppos = pos;
@@ -1322,17 +1325,18 @@ ssize_t fuse_direct_io(struct file *file, const
struct iovec *iov,
}
EXPORT_SYMBOL_GPL(fuse_direct_io);
-static ssize_t __fuse_direct_read(struct file *file, const struct iovec
*iov,
+static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct
iovec *iov,
unsigned long nr_segs, loff_t *ppos,
- struct kiocb *async, size_t count)
+ size_t count)
{
ssize_t res;
+ struct file *file = io->file;
struct inode *inode = file->f_path.dentry->d_inode;
if (is_bad_inode(inode))
return -EIO;
- res = fuse_direct_io(file, iov, nr_segs, count, ppos, 0, async);
+ res = fuse_direct_io(io, iov, nr_segs, count, ppos, 0);
fuse_invalidate_attr(inode);
@@ -1342,25 +1346,23 @@ static ssize_t __fuse_direct_read(struct file
*file, const struct iovec *iov,
static ssize_t fuse_direct_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
+ struct fuse_io_priv io = {0,};
struct iovec iov = { .iov_base = buf, .iov_len = count };
- return __fuse_direct_read(file, &iov, 1, ppos, NULL, count);
+ io.file = file;
+ return __fuse_direct_read(&io, &iov, 1, ppos, count);
}
-static ssize_t __fuse_direct_write(struct file *file, const struct
iovec *iov,
- unsigned long nr_segs, loff_t *ppos,
- struct kiocb *async)
+static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const
struct iovec *iov,
+ unsigned long nr_segs, loff_t *ppos)
{
+ struct file *file = io->file;
struct inode *inode = file->f_path.dentry->d_inode;
size_t count = iov_length(iov, nr_segs);
ssize_t res;
res = generic_write_checks(file, ppos, &count, 0);
- if (!res) {
- res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1,
- async);
- if (!async && res > 0)
- fuse_write_update_size(inode, *ppos);
- }
+ if (!res)
+ res = fuse_direct_io(io, iov, nr_segs, count, ppos, 1);
fuse_invalidate_attr(inode);
@@ -1373,13 +1375,18 @@ static ssize_t fuse_direct_write(struct file
*file, const char __user *buf,
struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
struct inode *inode = file->f_path.dentry->d_inode;
ssize_t res;
+ struct fuse_io_priv io = {0,};
if (is_bad_inode(inode))
return -EIO;
- /* Don't allow parallel writes to the same file */
+ io.file = file;
+
+ /* don't allow parallel writes to the same file */
mutex_lock(&inode->i_mutex);
- res = __fuse_direct_write(file, &iov, 1, ppos, NULL);
+ res = __fuse_direct_write(&io, &iov, 1, ppos);
+ if (res > 0)
+ fuse_write_update_size(inode, *ppos);
mutex_unlock(&inode->i_mutex);
return res;
@@ -2397,7 +2404,7 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const
struct iovec *iov,
struct inode *inode;
loff_t i_size;
size_t count = iov_length(iov, nr_segs);
- struct kiocb *async_cb = NULL;
+ struct fuse_io_priv *io;
file = iocb->ki_filp;
pos = offset;
@@ -2411,48 +2418,56 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const
struct iovec *iov,
count = i_size - offset;
}
- /* cannot write beyond eof asynchronously */
- if (is_sync_kiocb(iocb) || (offset + count <= i_size)) {
- struct fuse_io_priv *io;
+ io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+ if (!io)
+ return -ENOMEM;
+ spin_lock_init(&io->lock);
+ io->reqs = 1;
+ io->bytes = -1;
+ io->size = 0;
+ io->offset = offset;
+ io->write = (rw == WRITE);
+ io->err = 0;
+ io->file = file;
+ /*
+ * By default, we want to optimize all I/Os with async request submission
+ * to the client filesystem.
+ */
+ io->async = 1;
+ io->iocb = iocb;
- io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
- if (!io)
- return -ENOMEM;
+ iocb->private = io;
- spin_lock_init(&io->lock);
- io->reqs = 1;
- io->bytes = -1;
- io->size = 0;
- io->offset = offset;
- io->write = (rw == WRITE);
- io->err = 0;
- io->iocb = iocb;
- iocb->private = io;
-
- async_cb = iocb;
- }
+ /*
+ * We cannot asynchronously extend the size of a file. We have no method
+ * to wait on real async I/O requests, so we must submit this request
+ * synchronously.
+ */
+ if (!is_sync_kiocb(iocb) && (offset + count > i_size))
+ io->async = 0;
if (rw == WRITE)
- ret = __fuse_direct_write(file, iov, nr_segs, &pos, async_cb);
+ ret = __fuse_direct_write(io, iov, nr_segs, &pos);
else
- ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb,
- count);
-
- if (async_cb) {
- fuse_aio_complete(async_cb->private, ret == count ? 0 : -EIO,
- -1);
+ ret = __fuse_direct_read(io, iov, nr_segs, &pos, count);
+ if (io->async) {
+ fuse_aio_complete(io, ret == count ? 0 : -EIO, -1);
+
+ /* we have a non-extending, async request, so return */
if (!is_sync_kiocb(iocb))
return -EIOCBQUEUED;
ret = wait_on_sync_kiocb(iocb);
+ } else {
+ kfree(io);
+ }
- if (rw == WRITE) {
- if (ret > 0)
- fuse_write_update_size(inode, pos);
- else if (ret < 0 && offset + count > i_size)
- fuse_do_truncate(file);
- }
+ if (rw == WRITE) {
+ if (ret > 0)
+ fuse_write_update_size(inode, pos);
+ else if (ret < 0 && offset + count > i_size)
+ fuse_do_truncate(file);
}
return ret;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 173c959..91b5192 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -221,6 +221,7 @@ enum fuse_req_state {
/** The request IO state (for asynchronous processing) */
struct fuse_io_priv {
+ int async;
spinlock_t lock;
unsigned reqs;
ssize_t bytes;
@@ -229,6 +230,7 @@ struct fuse_io_priv {
bool write;
int err;
struct kiocb *iocb;
+ struct file *file;
};
/**
@@ -826,9 +828,9 @@ int fuse_reverse_inval_entry(struct super_block *sb,
u64 parent_nodeid,
int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
bool isdir);
-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
unsigned long nr_segs, size_t count, loff_t *ppos,
- int write, struct kiocb *async);
+ int write);
long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
unsigned int flags);
long fuse_ioctl_common(struct file *file, unsigned int cmd,
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
2012-12-13 19:43 ` Zach Brown
@ 2012-12-14 6:44 ` Maxim V. Patlasov
0 siblings, 0 replies; 6+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 6:44 UTC (permalink / raw)
To: Zach Brown
Cc: Brian Foster, miklos@szeredi.hu, Alexander Viro,
fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-aio, bcrl
12/13/2012 11:43 PM, Zach Brown пишет:
> (I'm jumping in mid-thread, I don't *really* know the context.)
>
>> In the other words, if it's sync, it's enough to wake up someone in
>> kernel (who called wait_on_sync_kiocb()) and return. Otherwise,
>> aio_complete() performs some steps to wake up user (who called
>> io_getevents()). What we need for fuse is another binary attribute:
>> iocb was generated by libaio vs. read(2)/write(2). Then we could use
>> aio_complete() and wait_on_sync_kiocb() for any iocb which was not
>> generated by libaio. E.g. to process sync dio in async way, we'd
>> allocate iocb in fuse_direct_IO on stack (as you suggested), but
>> initialize it as async iocb and pass it to __fuse_direct_read/write,
>> then call wait_on_sync_kiocb(), not to return -EIOCBQUEUED.
> It sounds like you might be interested in the aio in-kernel interface
> that is being worked on in a patch series that gets loop issuing aio
> instead of doing sync io.
>
> http://thread.gmane.org/gmane.linux.file-systems/69326/focus=1398723
Thanks, Zach! I've already considered using this patch, but thank you
anyway.
Maxim
>
> - z
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
2012-12-13 19:51 ` Brian Foster
@ 2012-12-14 12:40 ` Maxim V. Patlasov
2012-12-14 14:21 ` Brian Foster
0 siblings, 1 reply; 6+ messages in thread
From: Maxim V. Patlasov @ 2012-12-14 12:40 UTC (permalink / raw)
To: Brian Foster
Cc: miklos@szeredi.hu, Alexander Viro,
fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-aio, bcrl
Hi Brian,
12/13/2012 11:51 PM, Brian Foster пишет:
> [cut]
> Before getting into that, I made some adjustments to the original patch
> to use your new fuse_io_priv structure in place of an iocb, primarily to
> decouple the request submission mechanism from the type of the user
> request. The highlighted differences are:
>
> - Updated fuse_io_priv with an async field and file pointer to preserve
> the current style of interface (i.e., use this instead of iocb).
> - Trigger the type of request submission based on the async field.
> - Reintroduce the wait in fuse_direct_IO(), but I also pulled up the
> fuse_write_update_size() call out of __fuse_direct_write() to make the
> separate paths more consistent.
Sounds reasonable.
> This could probably use more cleanups (i.e., it morphs your fuse_io_priv
> into more of a generic I/O descriptor), but illustrates the idea.
The patch you provided looks fine. Do you have any particular (further)
cleanups in mind? I'm going to resend initial patch-set with your last
patch applied. If you provide more ideas about improving it, I'll be
glad to consider and integrate them as well.
> [cut]
>> The problem is that FUSE tends to be the only user of this 'feature'.
>> I'm considering one-line change of aio_complete():
>>
>> - if (is_sync_kiocb(iocb)) {
>> + if (!iocb->ki_ctx) {
>>
>> Though, not sure whether Miklos, Al Viro, Benjamin and others will
>> accept it... Let's try if you're OK about this approach. At least this
>> will cover all fuse dio use-cases (including libaio extending file) and
>> makes fuse code significantly cleaner.
>>
> Interesting idea. It seems like it could work to me but I'm not very
> familiar with that code and it might not be worth cluttering that
> interface for this particular case.
Yes, exactly. Moreover, after looking at aio.h closer:
> struct kioctx *ki_ctx; /* may be NULL for sync ops */
I realized that that would be rather hack: sync ops doesn't use ki_ctx
for now, but this can change in the future.
> I brought this up briefly with Jeff
> Moyer and he pointed me at this set (I just saw zab's mail pointing this
> out as well :):
>
> https://lkml.org/lkml/2012/10/22/321
>
> ... which looks like it could help. Perhaps we could use that new type
> of kiocb with a custom callback to wake a blocking thread. That said, I
> wonder if we could do that anyways; i.e., define our own
> wait_on_fuse_async() type function that that waits on the number of
> in-flight requests in the fuse_priv_io. I guess we'd have to complicate
> the refcounting with that type of approach, though.
I've considered using that new API (kernel aio) but refrained from it
deliberately. The reasons were similar to ones you listed above: why
should we invent another synchronization mechanism if
wait_on_sync_kiocb() does exactly what we need? Now, after more
thinking, I tend to think that using that API is doable, but the way how
we could do it is too clumsy. Please let me know if you need more details.
Thanks,
Maxim
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously
2012-12-14 12:40 ` Maxim V. Patlasov
@ 2012-12-14 14:21 ` Brian Foster
0 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2012-12-14 14:21 UTC (permalink / raw)
To: Maxim V. Patlasov
Cc: miklos@szeredi.hu, Alexander Viro,
fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-aio, bcrl
On 12/14/2012 07:40 AM, Maxim V. Patlasov wrote:
> Hi Brian,
>
> 12/13/2012 11:51 PM, Brian Foster пишет:
>> [cut]
>> Before getting into that, I made some adjustments to the original patch
>> to use your new fuse_io_priv structure in place of an iocb, primarily to
>> decouple the request submission mechanism from the type of the user
>> request. The highlighted differences are:
>>
>> - Updated fuse_io_priv with an async field and file pointer to preserve
>> the current style of interface (i.e., use this instead of iocb).
>> - Trigger the type of request submission based on the async field.
>> - Reintroduce the wait in fuse_direct_IO(), but I also pulled up the
>> fuse_write_update_size() call out of __fuse_direct_write() to make the
>> separate paths more consistent.
>
> Sounds reasonable.
>
>> This could probably use more cleanups (i.e., it morphs your fuse_io_priv
>> into more of a generic I/O descriptor), but illustrates the idea.
>
> The patch you provided looks fine. Do you have any particular (further)
> cleanups in mind? I'm going to resend initial patch-set with your last
> patch applied. If you provide more ideas about improving it, I'll be
> glad to consider and integrate them as well.
>
Great! I was considering some minor things such as renaming the
structure, turning the async field into a generic flags field with an
async bit, perhaps an initialization helper for the io_priv would be
useful, etc.
I'm not totally sold on any of that and I still need to take a more
detailed look at the overall set as well as repeat some of your perf.
tests on gluster and real hardware, so perhaps it is best to see v2
shake out first and I'll follow up from there...
>> [cut]
>>> The problem is that FUSE tends to be the only user of this 'feature'.
>>> I'm considering one-line change of aio_complete():
>>>
>>> - if (is_sync_kiocb(iocb)) {
>>> + if (!iocb->ki_ctx) {
>>>
>>> Though, not sure whether Miklos, Al Viro, Benjamin and others will
>>> accept it... Let's try if you're OK about this approach. At least this
>>> will cover all fuse dio use-cases (including libaio extending file) and
>>> makes fuse code significantly cleaner.
>>>
>> Interesting idea. It seems like it could work to me but I'm not very
>> familiar with that code and it might not be worth cluttering that
>> interface for this particular case.
>
> Yes, exactly. Moreover, after looking at aio.h closer:
>
>> struct kioctx *ki_ctx; /* may be NULL for sync ops */
>
> I realized that that would be rather hack: sync ops doesn't use ki_ctx
> for now, but this can change in the future.
>
Right, it seems like that wouldn't break !sync aio, but could set a
landmine for future sync aio uses (not to mention confusing the
wait_on_sync_kiocb() semantics).
>> I brought this up briefly with Jeff
>> Moyer and he pointed me at this set (I just saw zab's mail pointing this
>> out as well :):
>>
>> https://lkml.org/lkml/2012/10/22/321
>>
>> ... which looks like it could help. Perhaps we could use that new type
>> of kiocb with a custom callback to wake a blocking thread. That said, I
>> wonder if we could do that anyways; i.e., define our own
>> wait_on_fuse_async() type function that that waits on the number of
>> in-flight requests in the fuse_priv_io. I guess we'd have to complicate
>> the refcounting with that type of approach, though.
>
> I've considered using that new API (kernel aio) but refrained from it
> deliberately. The reasons were similar to ones you listed above: why
> should we invent another synchronization mechanism if
> wait_on_sync_kiocb() does exactly what we need? Now, after more
> thinking, I tend to think that using that API is doable, but the way how
> we could do it is too clumsy. Please let me know if you need more details.
>
That sounds reasonable to me. I was just trying to consider alternate
ideas. It does seem unnecessary to jump through hoops when we can
accomplish the same behavior with a cleaner interface. Thanks!
Brian
> Thanks,
> Maxim
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-12-14 14:21 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-13 12:52 [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously Maxim V. Patlasov
2012-12-13 19:43 ` Zach Brown
2012-12-14 6:44 ` Maxim V. Patlasov
2012-12-13 19:51 ` Brian Foster
2012-12-14 12:40 ` Maxim V. Patlasov
2012-12-14 14:21 ` Brian Foster
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).