* [PATCH 0/6] fuse: process direct IO asynchronously
@ 2012-12-10 7:41 Maxim V. Patlasov
From: Maxim V. Patlasov @ 2012-12-10 7:41 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel
Hi,
The existing fuse implementation always processes direct IO synchronously: it
submits the next request to userspace fuse only when the previous one has
completed. This is suboptimal because: 1) libaio DIO works in a blocking way;
2) userspace fuse can't achieve parallelism by processing several requests
simultaneously (e.g. in the case of distributed network storage); 3) userspace
fuse can't merge requests before passing them to the actual storage.

The idea of the patch-set is to submit fuse requests in a non-blocking way
(where possible) and either return -EIOCBQUEUED or wait for their
completion synchronously. The patch-set is to be applied on top of for-next of
Miklos' git repo.

To estimate the performance improvement I used a slightly modified fusexmp over
tmpfs (clearing the O_DIRECT bit from fi->flags in xmp_open). For synchronous
operations I used 'dd' like this:
dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc
For AIO I used 'aio-stress' like this:
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file
The throughput on some commodity (rather feeble) server was (in MB/sec):
              original / patched
  dd reads:       ~322 / ~382
  dd writes:      ~277 / ~288
  aio reads:      ~380 / ~459
  aio writes:     ~319 / ~353
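
To put the figures above in relative terms, the gain is (patched - original) / original. The helper below is only a sketch of that arithmetic; gain_percent() is a hypothetical name and is not part of the patch-set.

```c
#include <assert.h>

/* Relative throughput gain in percent; both arguments are in MB/sec. */
static double gain_percent(double original, double patched)
{
	assert(original > 0.0);	/* a zero baseline has no meaningful gain */
	return (patched - original) / original * 100.0;
}
```

Applied to the numbers above, this gives roughly 19% for dd reads, 4% for dd writes, 21% for aio reads and 11% for aio writes.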
Thanks,
Maxim
---
Maxim V. Patlasov (6):
fuse: move fuse_release_user_pages() up
fuse: add support of async IO
fuse: make fuse_direct_io() aware about AIO
fuse: enable asynchronous processing direct IO
fuse: truncate file if async dio failed
fuse: optimize short direct reads
fs/fuse/cuse.c | 4 -
fs/fuse/file.c | 276 ++++++++++++++++++++++++++++++++++++++++++++++++------
fs/fuse/fuse_i.h | 17 +++
3 files changed, 262 insertions(+), 35 deletions(-)
* [PATCH 1/6] fuse: move fuse_release_user_pages() up
From: Maxim V. Patlasov @ 2012-12-10 7:41 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

fuse_release_user_pages() will be indirectly used by fuse_send_read/write
in future patches.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 19b50e7..6685cb0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -491,6 +491,18 @@ void fuse_read_fill(struct fuse_req *req, struct file *file, loff_t pos,
 	req->out.args[0].size = count;
 }

+static void fuse_release_user_pages(struct fuse_req *req, int write)
+{
+	unsigned i;
+
+	for (i = 0; i < req->num_pages; i++) {
+		struct page *page = req->pages[i];
+		if (write)
+			set_page_dirty_lock(page);
+		put_page(page);
+	}
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
@@ -1035,18 +1047,6 @@ out:
 	return written ? written : err;
 }

-static void fuse_release_user_pages(struct fuse_req *req, int write)
-{
-	unsigned i;
-
-	for (i = 0; i < req->num_pages; i++) {
-		struct page *page = req->pages[i];
-		if (write)
-			set_page_dirty_lock(page);
-		put_page(page);
-	}
-}
-
 static inline void fuse_page_descs_length_init(struct fuse_req *req,
 			unsigned index, unsigned nr_pages)
 {
* [PATCH 2/6] fuse: add support of async IO
From: Maxim V. Patlasov @ 2012-12-10 7:41 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch implements a framework to process an IO request asynchronously. The
idea is to associate several fuse requests with a single kiocb by means of
fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
dio' for direct-io.c.

The framework is supposed to be used like this:
 - someone (who wants to process an IO asynchronously) allocates fuse_io_priv,
   initializes and saves it in kiocb->private.
 - as soon as fuse request is filled, it can be submitted (in non-blocking way)
   by fuse_async_req_send()
 - when all submitted requests are ACKed by userspace, io->reqs drops to zero
   triggering aio_complete()

In case of IO initiated by libaio, aio_complete() will finish processing the
same way as in case of dio_complete() calling aio_complete(). But the
framework may be also used for internal FUSE use when initial IO request
was synchronous (from user perspective), but it's beneficial to process it
asynchronously. Then the caller should wait on kiocb explicitly and
aio_complete() will wake the caller up.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c   |   94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h |   15 +++++++++
 2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6685cb0..634f54a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -503,6 +503,100 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
 	}
 }

+/**
+ * In case of short read, the caller sets 'pos' to the position of
+ * actual end of fuse request in IO request. Otherwise, if bytes_requested
+ * == bytes_transferred or rw == WRITE, the caller sets 'pos' to -1.
+ *
+ * An example:
+ * User requested DIO read of 64K. It was splitted into two 32K fuse requests,
+ * both submitted asynchronously. The first of them was ACKed by userspace as
+ * fully completed (req->out.args[0].size == 32K) resulting in pos == -1. The
+ * second request was ACKed as short, e.g. only 1K was read, resulting in
+ * pos == 33K.
+ *
+ * Thus, when all fuse requests are completed, the minimal non-negative 'pos'
+ * will be equal to the length of the longest contiguous fragment of
+ * transferred data starting from the beginning of IO request.
+ */
+static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
+{
+	int left;
+
+	spin_lock(&io->lock);
+	if (err)
+		io->err = io->err ? : err;
+	else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes))
+		io->bytes = pos;
+
+	left = --io->reqs;
+	spin_unlock(&io->lock);
+
+	if (!left) {
+		long res;
+
+		if (io->err)
+			res = io->err;
+		else if (io->bytes >= 0 && io->write)
+			res = -EIO;
+		else {
+			res = io->bytes < 0 ? io->size : io->bytes;
+
+			if (!is_sync_kiocb(io->iocb)) {
+				struct path *path = &io->iocb->ki_filp->f_path;
+				struct inode *inode = path->dentry->d_inode;
+				struct fuse_conn *fc = get_fuse_conn(inode);
+				struct fuse_inode *fi = get_fuse_inode(inode);
+
+				spin_lock(&fc->lock);
+				fi->attr_version = ++fc->attr_version;
+				spin_unlock(&fc->lock);
+			}
+		}
+
+		aio_complete(io->iocb, res, 0);
+		kfree(io);
+	}
+}
+
+static void fuse_aio_complete_req(struct fuse_conn *fc, struct fuse_req *req)
+{
+	struct fuse_io_priv *io = req->io;
+	ssize_t pos = -1;
+
+	fuse_release_user_pages(req, !io->write);
+
+	if (io->write) {
+		if (req->misc.write.in.size != req->misc.write.out.size)
+			pos = req->misc.write.in.offset - io->offset +
+				req->misc.write.out.size;
+	} else {
+		if (req->misc.read.in.size != req->out.args[0].size)
+			pos = req->misc.read.in.offset - io->offset +
+				req->out.args[0].size;
+	}
+
+	fuse_aio_complete(io, req->out.h.error, pos);
+}
+
+static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
+		size_t num_bytes, struct kiocb *iocb)
+{
+	struct fuse_io_priv *io = iocb->private;
+
+	spin_lock(&io->lock);
+	io->size += num_bytes;
+	io->reqs++;
+	spin_unlock(&io->lock);
+
+	req->io = io;
+	req->end = fuse_aio_complete_req;
+
+	fuse_request_send_background(fc, req);
+
+	return num_bytes;
+}
+
 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e4f70ea..618d48a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -219,6 +219,18 @@ enum fuse_req_state {
 	FUSE_REQ_FINISHED
 };

+/** The request IO state (for asynchronous processing) */
+struct fuse_io_priv {
+	spinlock_t lock;
+	unsigned reqs;
+	ssize_t bytes;
+	size_t size;
+	__u64 offset;
+	bool write;
+	int err;
+	struct kiocb *iocb;
+};
+
 /**
  * A request to the client
  */
@@ -323,6 +335,9 @@ struct fuse_req {
 	/** Inode used in the request or NULL */
 	struct inode *inode;

+	/** AIO control block */
+	struct fuse_io_priv *io;
+
 	/** Link on fi->writepages */
 	struct list_head writepages_entry;

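
The completion accounting from patch 2/6 can be modelled in userspace. The sketch below is not kernel code: struct io_model and the io_model_*() helpers are hypothetical names, locking and the kiocb are left out, and only the counting logic of fuse_aio_complete() and fuse_async_req_send() is mirrored. It assumes, as patch 4/6 later does, that the submitter holds one extra reference (reqs starts at 1).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Userspace model of the accounting fields of struct fuse_io_priv. */
struct io_model {
	unsigned reqs;	/* outstanding requests plus the submitter's reference */
	ssize_t bytes;	/* minimal non-negative 'pos' seen so far, or -1 */
	size_t size;	/* total number of bytes submitted */
	bool write;
	int err;	/* first error reported by any request */
};

static void io_model_init(struct io_model *io, bool write)
{
	io->reqs = 1;	/* reference held by the submitter */
	io->bytes = -1;
	io->size = 0;
	io->write = write;
	io->err = 0;
}

/* Counterpart of fuse_async_req_send(): account for one more request. */
static void io_model_submit(struct io_model *io, size_t num_bytes)
{
	io->size += num_bytes;
	io->reqs++;
}

/*
 * Counterpart of fuse_aio_complete(). Returns true once the last reference
 * is dropped; *res then holds the value the patch passes to aio_complete().
 */
static bool io_model_complete(struct io_model *io, int err, ssize_t pos,
			      long *res)
{
	assert(io->reqs > 0);

	if (err) {
		if (!io->err)
			io->err = err;	/* keep only the first error */
	} else if (pos >= 0 && (io->bytes < 0 || pos < io->bytes)) {
		io->bytes = pos;	/* track the shortest contiguous run */
	}

	if (--io->reqs)
		return false;

	if (io->err)
		*res = io->err;
	else if (io->bytes >= 0 && io->write)
		*res = -EIO;		/* short writes are reported as errors */
	else
		*res = io->bytes < 0 ? (long)io->size : (long)io->bytes;
	return true;
}
```

Replaying the 64K example from the comment in the patch: after io_model_init() for a read, two io_model_submit() calls of 32K each, one completion with pos == -1, one with pos == 33K, and the submitter dropping its own reference with pos == -1, the final result is 33K, whatever the order of the three completions.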
* [PATCH 3/6] fuse: make fuse_direct_io() aware about AIO
From: Maxim V. Patlasov @ 2012-12-10 7:41 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch implements passing "struct kiocb *async" down the stack up to
fuse_send_read/write where it is used to submit request asynchronously.
async==NULL designates synchronous processing.

Non-trivial part of the patch is changes in fuse_direct_io(): resources
like fuse requests and user pages cannot be released immediately in async
case.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/cuse.c   |    4 ++--
 fs/fuse/file.c   |   58 ++++++++++++++++++++++++++++++++++++------------------
 fs/fuse/fuse_i.h |    2 +-
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 65ce10a..beb99e9 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -93,7 +93,7 @@ static ssize_t cuse_read(struct file *file, char __user *buf, size_t count,
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };

-	return fuse_direct_io(file, &iov, 1, count, &pos, 0);
+	return fuse_direct_io(file, &iov, 1, count, &pos, 0, NULL);
 }

 static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -106,7 +106,7 @@ static ssize_t cuse_write(struct file *file, const char __user *buf,
 	 * No locking or generic_write_checks(), the server is
 	 * responsible for locking and sanity checks.
 	 */
-	return fuse_direct_io(file, &iov, 1, count, &pos, 1);
+	return fuse_direct_io(file, &iov, 1, count, &pos, 1, NULL);
 }

 static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 634f54a..c585158 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -598,7 +598,8 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
 }

 static size_t fuse_send_read(struct fuse_req *req, struct file *file,
-			     loff_t pos, size_t count, fl_owner_t owner)
+			     loff_t pos, size_t count, fl_owner_t owner,
+			     struct kiocb *async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
@@ -610,6 +611,10 @@ static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 		inarg->read_flags |= FUSE_READ_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (async)
+		return fuse_async_req_send(fc, req, count, async);
+
 	fuse_request_send(fc, req);
 	return req->out.args[0].size;
 }
@@ -662,7 +667,7 @@ static int fuse_readpage(struct file *file, struct page *page)
 	req->num_pages = 1;
 	req->pages[0] = page;
 	req->page_descs[0].length = count;
-	num_read = fuse_send_read(req, file, pos, count, NULL);
+	num_read = fuse_send_read(req, file, pos, count, NULL, NULL);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);

@@ -865,7 +870,8 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff,
 }

 static size_t fuse_send_write(struct fuse_req *req, struct file *file,
-			      loff_t pos, size_t count, fl_owner_t owner)
+			      loff_t pos, size_t count, fl_owner_t owner,
+			      struct kiocb *async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
@@ -877,6 +883,10 @@ static size_t fuse_send_write(struct fuse_req *req, struct file *file,
 		inarg->write_flags |= FUSE_WRITE_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (async)
+		return fuse_async_req_send(fc, req, count, async);
+
 	fuse_request_send(fc, req);
 	return req->misc.write.out.size;
 }
@@ -904,7 +914,7 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
 	for (i = 0; i < req->num_pages; i++)
 		fuse_wait_on_page_writeback(inode, req->pages[i]->index);

-	res = fuse_send_write(req, file, pos, count, NULL);
+	res = fuse_send_write(req, file, pos, count, NULL, NULL);

 	offset = req->page_descs[0].offset;
 	count = res;
@@ -1244,7 +1254,7 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)

 ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
-		       int write)
+		       int write, struct kiocb *async)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
@@ -1266,16 +1276,22 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		size_t nbytes = min(count, nmax);
 		int err = fuse_get_user_pages(req, &ii, &nbytes, write);
 		if (err) {
+			if (async)
+				fuse_put_request(fc, req);
+
 			res = err;
 			break;
 		}

 		if (write)
-			nres = fuse_send_write(req, file, pos, nbytes, owner);
+			nres = fuse_send_write(req, file, pos, nbytes, owner,
+					       async);
 		else
-			nres = fuse_send_read(req, file, pos, nbytes, owner);
+			nres = fuse_send_read(req, file, pos, nbytes, owner,
+					      async);

-		fuse_release_user_pages(req, !write);
+		if (!async)
+			fuse_release_user_pages(req, !write);
 		if (req->out.h.error) {
 			if (!res)
 				res = req->out.h.error;
@@ -1290,13 +1306,14 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		if (nres != nbytes)
 			break;
 		if (count) {
-			fuse_put_request(fc, req);
+			if (!async)
+				fuse_put_request(fc, req);
 			req = fuse_get_req(fc, fuse_iter_npages(&ii));
 			if (IS_ERR(req))
 				break;
 		}
 	}
-	if (!IS_ERR(req))
+	if (!IS_ERR(req) && !async)
 		fuse_put_request(fc, req);
 	if (res > 0)
 		*ppos = pos;
@@ -1306,7 +1323,8 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 EXPORT_SYMBOL_GPL(fuse_direct_io);

 static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
-				  unsigned long nr_segs, loff_t *ppos)
+				  unsigned long nr_segs, loff_t *ppos,
+				  struct kiocb *async)
 {
 	ssize_t res;
 	struct inode *inode = file->f_path.dentry->d_inode;
@@ -1315,7 +1333,7 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 		return -EIO;

 	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
-			     ppos, 0);
+			     ppos, 0, async);

 	fuse_invalidate_attr(inode);

@@ -1326,11 +1344,12 @@ static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos);
+	return __fuse_direct_read(file, &iov, 1, ppos, NULL);
 }

 static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
-				   unsigned long nr_segs, loff_t *ppos)
+				   unsigned long nr_segs, loff_t *ppos,
+				   struct kiocb *async)
 {
 	struct inode *inode = file->f_path.dentry->d_inode;
 	size_t count = iov_length(iov, nr_segs);
@@ -1338,8 +1357,9 @@ static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,

 	res = generic_write_checks(file, ppos, &count, 0);
 	if (!res) {
-		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1);
-		if (res > 0)
+		res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1,
+				     async);
+		if (!async && res > 0)
 			fuse_write_update_size(inode, *ppos);
 	}

@@ -1360,7 +1380,7 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,

 	/* Don't allow parallel writes to the same file */
 	mutex_lock(&inode->i_mutex);
-	res = __fuse_direct_write(file, &iov, 1, ppos);
+	res = __fuse_direct_write(file, &iov, 1, ppos, NULL);
 	mutex_unlock(&inode->i_mutex);

 	return res;
@@ -2333,9 +2353,9 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	pos = offset;

 	if (rw == WRITE)
-		ret = __fuse_direct_write(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_write(file, iov, nr_segs, &pos, NULL);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos);
+		ret = __fuse_direct_read(file, iov, nr_segs, &pos, NULL);

 	return ret;
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 618d48a..173c959 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -828,7 +828,7 @@ int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir);
 ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
-		       int write);
+		       int write, struct kiocb *async);
 long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg,
 		   unsigned int flags);
 long fuse_ioctl_common(struct file *file, unsigned int cmd,
* [PATCH 4/6] fuse: enable asynchronous processing direct IO
From: Maxim V. Patlasov @ 2012-12-10 7:42 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

In case of synchronous DIO request (i.e. read(2) or write(2) for a file
opened with O_DIRECT), the patch submits fuse requests asynchronously, but
waits for their completions before returning from fuse_direct_IO().

In case of asynchronous DIO request (i.e. libaio io_submit() for a file opened
with O_DIRECT), the patch submits fuse requests asynchronously and returns
-EIOCBQUEUED immediately.

The only special case is async DIO extending file. Here the patch falls back
to old behaviour because we can't return -EIOCBQUEUED and update i_size later,
without i_mutex held.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   44 ++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index c585158..ef6d3de 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2348,14 +2348,54 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	ssize_t ret = 0;
 	struct file *file = NULL;
 	loff_t pos = 0;
+	struct inode *inode;
+	loff_t i_size;
+	size_t count = iov_length(iov, nr_segs);
+	struct kiocb *async_cb = NULL;

 	file = iocb->ki_filp;
 	pos = offset;
+	inode = file->f_mapping->host;
+	i_size = i_size_read(inode);
+
+	/* cannot write beyond eof asynchronously */
+	if (is_sync_kiocb(iocb) || (offset + count <= i_size) || rw != WRITE) {
+		struct fuse_io_priv *io;
+
+		io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
+		if (!io)
+			return -ENOMEM;
+
+		spin_lock_init(&io->lock);
+		io->reqs = 1;
+		io->bytes = -1;
+		io->size = 0;
+		io->offset = offset;
+		io->write = (rw == WRITE);
+		io->err = 0;
+		io->iocb = iocb;
+		iocb->private = io;
+
+		async_cb = iocb;
+	}

 	if (rw == WRITE)
-		ret = __fuse_direct_write(file, iov, nr_segs, &pos, NULL);
+		ret = __fuse_direct_write(file, iov, nr_segs, &pos, async_cb);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos, NULL);
+		ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb);
+
+	if (async_cb) {
+		fuse_aio_complete(async_cb->private, ret == count ? 0 : -EIO,
+				  -1);
+
+		if (!is_sync_kiocb(iocb))
+			return -EIOCBQUEUED;
+
+		ret = wait_on_sync_kiocb(iocb);
+
+		if (rw == WRITE && ret > 0)
+			fuse_write_update_size(inode, pos);
+	}

 	return ret;
 }
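
The choice made in patch 4/6 between asynchronous submission and the old behaviour comes down to a single predicate. A minimal userspace sketch, assuming plain 64-bit integers in place of loff_t/size_t; use_async_submission() is a hypothetical name and not a function from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the "cannot write beyond eof asynchronously" condition in
 * fuse_direct_IO() as of patch 4/6. Only an AIO write that extends the
 * file falls back to the old synchronous behaviour.
 */
static bool use_async_submission(bool sync_kiocb, bool is_write,
				 long long offset, long long count,
				 long long i_size)
{
	assert(offset >= 0 && count >= 0 && i_size >= 0);
	return sync_kiocb || offset + count <= i_size || !is_write;
}
```

A synchronous write(2) that extends the file still takes the asynchronous submission path, because the caller waits for completion and can update i_size afterwards; an io_submit() write crossing EOF does not.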
* [PATCH 5/6] fuse: truncate file if async dio failed
From: Maxim V. Patlasov @ 2012-12-10 7:42 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch improves error handling in fuse_direct_IO(): if we successfully
submitted several fuse requests on behalf of synchronous direct write
extending file and some of them failed, let's try to do our best to clean-up.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ef6d3de..3e0fdb7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2341,6 +2341,53 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 	return 0;
 }

+static void fuse_do_truncate(struct file *file)
+{
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file->f_mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_req *req;
+	struct fuse_setattr_in inarg;
+	struct fuse_attr_out outarg;
+	int err;
+
+	req = fuse_get_req_nopages(fc);
+	if (IS_ERR(req)) {
+		printk(KERN_WARNING "failed to allocate req for truncate "
+		       "(%ld)\n", PTR_ERR(req));
+		return;
+	}
+
+	memset(&inarg, 0, sizeof(inarg));
+	memset(&outarg, 0, sizeof(outarg));
+
+	inarg.valid |= FATTR_SIZE;
+	inarg.size = i_size_read(inode);
+
+	inarg.valid |= FATTR_FH;
+	inarg.fh = ff->fh;
+
+	req->in.h.opcode = FUSE_SETATTR;
+	req->in.h.nodeid = get_node_id(inode);
+	req->in.numargs = 1;
+	req->in.args[0].size = sizeof(inarg);
+	req->in.args[0].value = &inarg;
+	req->out.numargs = 1;
+	if (fc->minor < 9)
+		req->out.args[0].size = FUSE_COMPAT_ATTR_OUT_SIZE;
+	else
+		req->out.args[0].size = sizeof(outarg);
+	req->out.args[0].value = &outarg;
+
+	fuse_request_send(fc, req);
+	err = req->out.h.error;
+	fuse_put_request(fc, req);
+
+	if (err)
+		printk(KERN_WARNING "failed to truncate to %lld with error "
+		       "%d\n", i_size_read(inode), err);
+}
+
 static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 			loff_t offset, unsigned long nr_segs)
@@ -2393,8 +2440,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,

 		ret = wait_on_sync_kiocb(iocb);

-		if (rw == WRITE && ret > 0)
-			fuse_write_update_size(inode, pos);
+		if (rw == WRITE) {
+			if (ret > 0)
+				fuse_write_update_size(inode, pos);
+			else if (ret < 0 && offset + count > i_size)
+				fuse_do_truncate(file);
+		}
 	}

 	return ret;
* [PATCH 6/6] fuse: optimize short direct reads
From: Maxim V. Patlasov @ 2012-12-10 7:42 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

If user requested direct read beyond EOF, we can skip sending fuse requests
for positions beyond EOF because userspace would ACK them with zero bytes read
anyway. We can trust to i_size in fuse_direct_IO for such cases because it's
called from fuse_file_aio_read() and the latter updates fuse attributes
including i_size.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3e0fdb7..d2094e1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1324,7 +1324,7 @@ EXPORT_SYMBOL_GPL(fuse_direct_io);

 static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 				  unsigned long nr_segs, loff_t *ppos,
-				  struct kiocb *async)
+				  struct kiocb *async, size_t count)
 {
 	ssize_t res;
 	struct inode *inode = file->f_path.dentry->d_inode;
@@ -1332,8 +1332,7 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 	if (is_bad_inode(inode))
 		return -EIO;

-	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
-			     ppos, 0, async);
+	res = fuse_direct_io(file, iov, nr_segs, count, ppos, 0, async);

 	fuse_invalidate_attr(inode);

@@ -1344,7 +1343,7 @@ static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
-	return __fuse_direct_read(file, &iov, 1, ppos, NULL);
+	return __fuse_direct_read(file, &iov, 1, ppos, NULL, count);
 }

 static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov,
@@ -2405,8 +2404,15 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	inode = file->f_mapping->host;
 	i_size = i_size_read(inode);

+	/* optimization for short read */
+	if (rw != WRITE && offset + count > i_size) {
+		if (offset >= i_size)
+			return 0;
+		count = i_size - offset;
+	}
+
 	/* cannot write beyond eof asynchronously */
-	if (is_sync_kiocb(iocb) || (offset + count <= i_size) || rw != WRITE) {
+	if (is_sync_kiocb(iocb) || (offset + count <= i_size)) {
 		struct fuse_io_priv *io;

 		io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
@@ -2429,7 +2435,8 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 	if (rw == WRITE)
 		ret = __fuse_direct_write(file, iov, nr_segs, &pos, async_cb);
 	else
-		ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb);
+		ret = __fuse_direct_read(file, iov, nr_segs, &pos, async_cb,
+					 count);

 	if (async_cb) {
 		fuse_aio_complete(async_cb->private, ret == count ? 0 : -EIO,
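
The short-read optimization of patch 6/6 amounts to clamping the requested byte count against i_size before any fuse request is built. A userspace sketch of that clamp, assuming plain 64-bit integers in place of loff_t/size_t; clamp_read_count() is a hypothetical name, since the patch open-codes this logic in fuse_direct_IO():

```c
#include <assert.h>

/*
 * Number of bytes worth requesting from userspace for a direct read of
 * 'count' bytes at 'offset' from a file of size 'i_size'. Zero means the
 * read starts at or beyond EOF and no fuse request needs to be sent.
 */
static long long clamp_read_count(long long offset, long long count,
				  long long i_size)
{
	assert(offset >= 0 && count >= 0 && i_size >= 0);

	if (offset >= i_size)
		return 0;		/* nothing to read at or past EOF */
	if (offset + count > i_size)
		return i_size - offset;	/* trim the part beyond EOF */
	return count;
}
```

For a 64K read at offset 0 of a 33K file, only 33K is requested; a read starting beyond EOF returns 0 without contacting userspace.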
* [PATCH v2 0/6] fuse: process direct IO asynchronously
@ 2012-12-14 15:20 Maxim V. Patlasov
From: Maxim V. Patlasov @ 2012-12-14 15:20 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel
Hi,
The existing fuse implementation always processes direct IO synchronously: it
submits the next request to userspace fuse only when the previous one has
completed. This is suboptimal because: 1) libaio DIO works in a blocking way;
2) userspace fuse can't achieve parallelism by processing several requests
simultaneously (e.g. in the case of distributed network storage); 3) userspace
fuse can't merge requests before passing them to the actual storage.

The idea of the patch-set is to submit fuse requests in a non-blocking way
(where possible) and either return -EIOCBQUEUED or wait for their
completion synchronously. The patch-set is to be applied on top of for-next of
Miklos' git repo.

To estimate the performance improvement I used a slightly modified fusexmp over
tmpfs (clearing the O_DIRECT bit from fi->flags in xmp_open). For synchronous
operations I used 'dd' like this:
dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc
For AIO I used 'aio-stress' like this:
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file
The throughput on some commodity (rather feeble) server was (in MB/sec):
              original / patched
  dd reads:       ~322 / ~382
  dd writes:      ~277 / ~288
  aio reads:      ~380 / ~459
  aio writes:     ~319 / ~353
Changed in v2 - cleanups suggested by Brian:
- Updated fuse_io_priv with an async field and file pointer to preserve
the current style of interface (i.e., use this instead of iocb).
- Trigger the type of request submission based on the async field.
- Pulled up the fuse_write_update_size() call out of __fuse_direct_write()
to make the separate paths more consistent.
Thanks,
Maxim
---
Maxim V. Patlasov (6):
fuse: move fuse_release_user_pages() up
fuse: add support of async IO
fuse: make fuse_direct_io() aware about AIO
fuse: enable asynchronous processing direct IO
fuse: truncate file if async dio failed
fuse: optimize short direct reads
fs/fuse/cuse.c | 6 +
fs/fuse/file.c | 290 +++++++++++++++++++++++++++++++++++++++++++++++-------
fs/fuse/fuse_i.h | 19 +++-
3 files changed, 276 insertions(+), 39 deletions(-)
* [PATCH 3/6] fuse: make fuse_direct_io() aware about AIO
From: Maxim V. Patlasov @ 2012-12-14 15:20 UTC (permalink / raw)
To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel

The patch implements passing "struct fuse_io_priv *io" down the stack up to
fuse_send_read/write where it is used to submit request asynchronously.
io->async==0 designates synchronous processing.

Non-trivial part of the patch is changes in fuse_direct_io(): resources
like fuse requests and user pages cannot be released immediately in async
case.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/cuse.c   |    6 +++--
 fs/fuse/file.c   |   69 +++++++++++++++++++++++++++++++++++++++---------------
 fs/fuse/fuse_i.h |    2 +-
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 65ce10a..d890901 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -92,8 +92,9 @@ static ssize_t cuse_read(struct file *file, char __user *buf, size_t count,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = buf, .iov_len = count };
+	struct fuse_io_priv io = { .async = 0, .file = file };

-	return fuse_direct_io(file, &iov, 1, count, &pos, 0);
+	return fuse_direct_io(&io, &iov, 1, count, &pos, 0);
 }

 static ssize_t cuse_write(struct file *file, const char __user *buf,
@@ -101,12 +102,13 @@ static ssize_t cuse_write(struct file *file, const char __user *buf,
 {
 	loff_t pos = 0;
 	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count };
+	struct fuse_io_priv io = { .async = 0, .file = file };

 	/*
 	 * No locking or generic_write_checks(), the server is
 	 * responsible for locking and sanity checks.
 	 */
-	return fuse_direct_io(file, &iov, 1, count, &pos, 1);
+	return fuse_direct_io(&io, &iov, 1, count, &pos, 1);
 }

 static int cuse_open(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8dd931f..6c2ca8a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -595,9 +595,10 @@ static size_t fuse_async_req_send(struct fuse_conn *fc, struct fuse_req *req,
 	return num_bytes;
 }

-static size_t fuse_send_read(struct fuse_req *req, struct file *file,
+static size_t fuse_send_read(struct fuse_req *req, struct fuse_io_priv *io,
 			     loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;

@@ -608,6 +609,10 @@ static size_t fuse_send_read(struct fuse_req *req, struct file *file,
 		inarg->read_flags |= FUSE_READ_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (io->async)
+		return fuse_async_req_send(fc, req, count, io);
+
 	fuse_request_send(fc, req);
 	return req->out.args[0].size;
 }
@@ -628,6 +633,7 @@ static void fuse_read_update_size(struct inode *inode, loff_t size,

 static int fuse_readpage(struct file *file, struct page *page)
 {
+	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct inode *inode = page->mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_req *req;
@@ -660,7 +666,7 @@ static int fuse_readpage(struct file *file, struct page *page)
 	req->num_pages = 1;
 	req->pages[0] = page;
 	req->page_descs[0].length = count;
-	num_read = fuse_send_read(req, file, pos, count, NULL);
+	num_read = fuse_send_read(req, &io, pos, count, NULL);
 	err = req->out.h.error;
 	fuse_put_request(fc, req);

@@ -862,9 +868,10 @@ static void fuse_write_fill(struct fuse_req *req, struct fuse_file *ff,
 	req->out.args[0].value = outarg;
 }

-static size_t fuse_send_write(struct fuse_req *req, struct file *file,
+static size_t fuse_send_write(struct fuse_req *req, struct fuse_io_priv *io,
 			      loff_t pos, size_t count, fl_owner_t owner)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	struct fuse_write_in *inarg = &req->misc.write.in;
@@ -875,6 +882,10 @@ static size_t fuse_send_write(struct fuse_req *req, struct file *file,
 		inarg->write_flags |= FUSE_WRITE_LOCKOWNER;
 		inarg->lock_owner = fuse_lock_owner_id(fc, owner);
 	}
+
+	if (io->async)
+		return fuse_async_req_send(fc, req, count, io);
+
 	fuse_request_send(fc, req);
 	return req->misc.write.out.size;
 }
@@ -898,11 +909,12 @@ static size_t fuse_send_write_pages(struct fuse_req *req, struct file *file,
 	size_t res;
 	unsigned offset;
 	unsigned i;
+	struct fuse_io_priv io = { .async = 0, .file = file };

 	for (i = 0; i < req->num_pages; i++)
 		fuse_wait_on_page_writeback(inode, req->pages[i]->index);

-	res = fuse_send_write(req, file, pos, count, NULL);
+	res = fuse_send_write(req, &io, pos, count, NULL);

 	offset = req->page_descs[0].offset;
 	count = res;
@@ -1240,10 +1252,11 @@ static inline int fuse_iter_npages(const struct iov_iter *ii_p)
 	return min(npages, FUSE_MAX_PAGES_PER_REQ);
 }

-ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
+ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov,
 		       unsigned long nr_segs, size_t count, loff_t *ppos,
 		       int write)
 {
+	struct file *file = io->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	size_t nmax = write ? fc->max_write : fc->max_read;
@@ -1264,16 +1277,20 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		size_t nbytes = min(count, nmax);
 		int err = fuse_get_user_pages(req, &ii, &nbytes, write);
 		if (err) {
+			if (io->async)
+				fuse_put_request(fc, req);
+
 			res = err;
 			break;
 		}

 		if (write)
-			nres = fuse_send_write(req, file, pos, nbytes, owner);
+			nres = fuse_send_write(req, io, pos, nbytes, owner);
 		else
-			nres = fuse_send_read(req, file, pos, nbytes, owner);
+			nres = fuse_send_read(req, io, pos, nbytes, owner);

-		fuse_release_user_pages(req, !write);
+		if (!io->async)
+			fuse_release_user_pages(req, !write);
 		if (req->out.h.error) {
 			if (!res)
 				res = req->out.h.error;
@@ -1288,13 +1305,14 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 		if (nres != nbytes)
 			break;
 		if (count) {
-			fuse_put_request(fc, req);
+			if (!io->async)
+				fuse_put_request(fc, req);
 			req = fuse_get_req(fc, fuse_iter_npages(&ii));
 			if (IS_ERR(req))
 				break;
 		}
 	}
-	if (!IS_ERR(req))
+	if (!IS_ERR(req) && !io->async)
 		fuse_put_request(fc, req);
 	if (res > 0)
 		*ppos = pos;
@@ -1303,16 +1321,17 @@ ssize_t fuse_direct_io(struct file *file, const struct iovec *iov,
 }
 EXPORT_SYMBOL_GPL(fuse_direct_io);

-static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
+static ssize_t __fuse_direct_read(struct fuse_io_priv *io, const struct iovec *iov,
 				  unsigned long nr_segs, loff_t *ppos)
 {
 	ssize_t res;
+	struct file *file = io->file;
 	struct inode *inode = file->f_path.dentry->d_inode;

 	if (is_bad_inode(inode))
 		return -EIO;

-	res = fuse_direct_io(file, iov, nr_segs, iov_length(iov, nr_segs),
+	res = fuse_direct_io(io, iov, nr_segs, iov_length(iov, nr_segs),
 			     ppos, 0);

 	fuse_invalidate_attr(inode);
@@ -1323,21 +1342,23 @@ static ssize_t __fuse_direct_read(struct file *file, const struct iovec *iov,
 static ssize_t fuse_direct_read(struct file *file, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
+	struct fuse_io_priv io = { .async = 0, .file = file };
 	struct iovec iov = {
.iov_base = buf, .iov_len = count }; - return __fuse_direct_read(file, &iov, 1, ppos); + return __fuse_direct_read(&io, &iov, 1, ppos); } -static ssize_t __fuse_direct_write(struct file *file, const struct iovec *iov, +static ssize_t __fuse_direct_write(struct fuse_io_priv *io, const struct iovec *iov, unsigned long nr_segs, loff_t *ppos) { + struct file *file = io->file; struct inode *inode = file->f_path.dentry->d_inode; size_t count = iov_length(iov, nr_segs); ssize_t res; res = generic_write_checks(file, ppos, &count, 0); if (!res) { - res = fuse_direct_io(file, iov, nr_segs, count, ppos, 1); - if (res > 0) + res = fuse_direct_io(io, iov, nr_segs, count, ppos, 1); + if (!io->async && res > 0) fuse_write_update_size(inode, *ppos); } @@ -1352,13 +1373,14 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf, struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = count }; struct inode *inode = file->f_path.dentry->d_inode; ssize_t res; + struct fuse_io_priv io = { .async = 0, .file = file }; if (is_bad_inode(inode)) return -EIO; /* Don't allow parallel writes to the same file */ mutex_lock(&inode->i_mutex); - res = __fuse_direct_write(file, &iov, 1, ppos); + res = __fuse_direct_write(&io, &iov, 1, ppos); mutex_unlock(&inode->i_mutex); return res; @@ -2326,14 +2348,23 @@ fuse_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, ssize_t ret = 0; struct file *file = NULL; loff_t pos = 0; + struct fuse_io_priv *io; file = iocb->ki_filp; pos = offset; + io = kzalloc(sizeof(struct fuse_io_priv), GFP_KERNEL); + if (!io) + return -ENOMEM; + + io->file = file; + if (rw == WRITE) - ret = __fuse_direct_write(file, iov, nr_segs, &pos); + ret = __fuse_direct_write(io, iov, nr_segs, &pos); else - ret = __fuse_direct_read(file, iov, nr_segs, &pos); + ret = __fuse_direct_read(io, iov, nr_segs, &pos); + + kfree(io); return ret; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e0a5b65..91b5192 100644 --- a/fs/fuse/fuse_i.h +++ 
b/fs/fuse/fuse_i.h @@ -828,7 +828,7 @@ int fuse_reverse_inval_entry(struct super_block *sb, u64 parent_nodeid, int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file, bool isdir); -ssize_t fuse_direct_io(struct file *file, const struct iovec *iov, +ssize_t fuse_direct_io(struct fuse_io_priv *io, const struct iovec *iov, unsigned long nr_segs, size_t count, loff_t *ppos, int write); long fuse_do_ioctl(struct file *file, unsigned int cmd, unsigned long arg, ^ permalink raw reply related [flat|nested] 8+ messages in thread