* [PATCH v2 1/4] aio: add aio_kernel_() interface [not found] <1421163888-21452-1-git-send-email-ming.lei@canonical.com> @ 2015-01-13 15:44 ` Ming Lei 2015-01-25 13:31 ` Christoph Hellwig 2015-01-13 15:44 ` [PATCH v2 2/4] fd/direct-io: introduce should_dirty for kernel aio Ming Lei 1 sibling, 1 reply; 9+ messages in thread From: Ming Lei @ 2015-01-13 15:44 UTC (permalink / raw) To: linux-kernel, Dave Kleikamp Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, linux-fsdevel, open list:AIO, Ming Lei From: Dave Kleikamp <dave.kleikamp@oracle.com> This adds an interface that lets kernel callers submit aio iocbs without going through the user space syscalls. This lets kernel callers avoid the management limits and overhead of the context. It will also let us integrate aio operations with other kernel apis that the user space interface doesn't have access to. This patch is based on Dave's posts in below links: https://lkml.org/lkml/2013/10/16/365 https://groups.google.com/forum/#!topic/linux.kernel/l7mogGJZoKQ Most of the patch is from Dave's directly, this version tries to not couple kernel aio with linux-aio implementation. Follows potential users of these APIs: - Loop block driver for avoiding double cache, and improving throughput - All kinds of kernel target(SCSI, USB, ...) which need to access file efficiently, and has the double cache problem too - socket users may benifit from the APIs too Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Zach Brown <zab@zabbo.net> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-aio@kvack.org (open list:AIO) Signed-off-by: Ming Lei <ming.lei@canonical.com> --- fs/aio.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/aio.h | 17 +++++++- 2 files changed, 137 insertions(+), 1 deletion(-) diff --git a/fs/aio.c b/fs/aio.c index 1b7893e..d044387 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1044,6 +1044,9 @@ void aio_complete(struct kiocb *iocb, long res, long res2) iocb->ki_ctx = ERR_PTR(-EXDEV); wake_up_process(iocb->ki_obj.tsk); return; + } else if (is_kernel_kiocb(iocb)) { + iocb->ki_obj.complete(iocb->ki_user_data, res); + return; } if (iocb->ki_list.next) { @@ -1503,6 +1506,124 @@ rw_common: return 0; } +/* + * This allocates an iocb that will be used to submit and track completion of + * an IO that is issued from kernel space. + * + * The caller is expected to call the appropriate aio_kernel_init_() functions + * and then call aio_kernel_submit(). From that point forward progress is + * guaranteed by the file system aio method. Eventually the caller's + * completion callback will be called. + * + * These iocbs are special. They don't have a context, we don't limit the + * number pending, and they can't be canceled. + */ +struct kiocb *aio_kernel_alloc(gfp_t gfp) +{ + return kzalloc(sizeof(struct kiocb), gfp); +} +EXPORT_SYMBOL_GPL(aio_kernel_alloc); + +void aio_kernel_free(struct kiocb *iocb) +{ + kfree(iocb); +} +EXPORT_SYMBOL_GPL(aio_kernel_free); + +/* + * ptr and count can be a buff and bytes or an iov and segs. + */ +void aio_kernel_init_rw(struct kiocb *iocb, struct file *filp, + size_t nr, loff_t off, + void (*complete)(u64 user_data, long res), + u64 user_data) +{ + iocb->ki_filp = filp; + iocb->ki_nbytes = nr; + iocb->ki_pos = off; + iocb->ki_ctx = KERNEL_AIO_CTX; + + iocb->ki_obj.complete = complete; + iocb->ki_user_data = user_data; +} +EXPORT_SYMBOL_GPL(aio_kernel_init_rw); + +static ssize_t aio_read_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + struct file *file = iocb->ki_filp; + ssize_t ret; + + if (unlikely(!(file->f_mode & FMODE_READ))) + return -EBADF; + + ret = security_file_permission(file, MAY_READ); + if (unlikely(ret)) + return ret; + + if (!file->f_op->read_iter) + return -EINVAL; + + return file->f_op->read_iter(iocb, iter); +} + +static ssize_t aio_write_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + struct file *file = iocb->ki_filp; + ssize_t ret; + + if (unlikely(!(file->f_mode & FMODE_WRITE))) + return -EBADF; + + ret = security_file_permission(file, MAY_WRITE); + if (unlikely(ret)) + return ret; + + if (!file->f_op->write_iter) + return -EINVAL; + + file_start_write(file); + ret = file->f_op->write_iter(iocb, iter); + file_end_write(file); + + return ret; +} + +/* + * The iocb is our responsibility once this is called. The caller must not + * reference it. + * + * Callers must be prepared for their iocb completion callback to be called the + * moment they enter this function. The completion callback may be called from + * any context. + * + * Returns: 0: the iocb completion callback will be called with the op result + * negative errno: the operation was not submitted and the iocb was freed + */ +int aio_kernel_submit(struct kiocb *iocb, bool is_write, + struct iov_iter *iter) +{ + int ret = -EINVAL; + + if (WARN_ON(!is_kernel_kiocb(iocb) || !iocb->ki_obj.complete + || !iocb->ki_filp || !(iter->type & ITER_BVEC))) + return ret; + + if (is_write) + ret = aio_write_iter(iocb, iter); + else + ret = aio_read_iter(iocb, iter); + + /* + * use same policy with userspace aio, req may have been + * completed already, so release it by aio completion. + */ + if (ret != -EIOCBQUEUED) + iocb->ki_obj.complete(iocb->ki_user_data, ret); + + return 0; +} +EXPORT_SYMBOL_GPL(aio_kernel_submit); + static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, struct iocb *iocb, bool compat) { diff --git a/include/linux/aio.h b/include/linux/aio.h index d9c92da..7b4764a 100644 --- a/include/linux/aio.h +++ b/include/linux/aio.h @@ -27,17 +27,21 @@ struct kiocb; */ #define KIOCB_CANCELLED ((void *) (~0ULL)) +#define KERNEL_AIO_CTX ((void *) -1) + typedef int (kiocb_cancel_fn)(struct kiocb *); struct kiocb { struct file *ki_filp; - struct kioctx *ki_ctx; /* NULL for sync ops */ + struct kioctx *ki_ctx; /* NULL for sync ops, + * -1 for kernel caller */ kiocb_cancel_fn *ki_cancel; void *private; union { void __user *user; struct task_struct *tsk; + void (*complete)(u64 user_data, long res); } ki_obj; __u64 ki_user_data; /* user's data for completion */ @@ -59,6 +63,11 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb) return kiocb->ki_ctx == NULL; } +static inline bool is_kernel_kiocb(struct kiocb *kiocb) +{ + return kiocb->ki_ctx == KERNEL_AIO_CTX; +} + static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp) { *kiocb = (struct kiocb) { @@ -77,6 +86,12 @@ extern void exit_aio(struct mm_struct *mm); extern long do_io_submit(aio_context_t ctx_id, long nr, struct iocb __user *__user *iocbpp, bool compat); void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel); +struct kiocb *aio_kernel_alloc(gfp_t gfp); +void aio_kernel_free(struct kiocb *iocb); +void aio_kernel_init_rw(struct kiocb *iocb, struct file *filp, size_t nr, + loff_t off, void (*complete)(u64 user_data, long res), + u64 user_data); +int aio_kernel_submit(struct kiocb *iocb, bool is_write, struct iov_iter *iter); #else static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; } static inline void aio_complete(struct kiocb *iocb, long res, long res2) { } -- 1.7.9.5 -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/4] aio: add aio_kernel_() interface 2015-01-13 15:44 ` [PATCH v2 1/4] aio: add aio_kernel_() interface Ming Lei @ 2015-01-25 13:31 ` Christoph Hellwig 2015-01-26 16:18 ` Ming Lei 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2015-01-25 13:31 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, linux-fsdevel, open list:AIO > +struct kiocb *aio_kernel_alloc(gfp_t gfp) > +{ > + return kzalloc(sizeof(struct kiocb), gfp); > +} > +EXPORT_SYMBOL_GPL(aio_kernel_alloc); > + > +void aio_kernel_free(struct kiocb *iocb) > +{ > + kfree(iocb); > +} > +EXPORT_SYMBOL_GPL(aio_kernel_free); Both functions don't actually seem to be used in this patch set. > +void aio_kernel_init_rw(struct kiocb *iocb, struct file *filp, > + size_t nr, loff_t off, > + void (*complete)(u64 user_data, long res), > + u64 user_data) > +int aio_kernel_submit(struct kiocb *iocb, bool is_write, > + struct iov_iter *iter) Why do we keep these two separate? Especially having the iov passed n the second, and the count in the first seems rather confusing as we shouldn't even need both for a high level API. Also the private data should really be a void pointer for the kernel, or simply be left away as we can assume the iocb is embedded into a caller data structure and container_of can be used to find that structure. Also it might make sense to just offer aio_kernel_read/write intefaces instead of the common submit wrapper, as that's much closer to other kernel APIs, e.g. int aio_kernel_read(struct kiocb *iocb, struct file *file, struct iov_iter *iter, loff_t off, void (*complete)(struct kiocb *iocb, long res)); int aio_kernel_write(struct kiocb *iocb, struct file *file, struct iov_iter *iter, loff_t off, void (*complete)(struct kiocb *iocb, long res)); > + if (WARN_ON(!is_kernel_kiocb(iocb) || !iocb->ki_obj.complete > + || !iocb->ki_filp || !(iter->type & ITER_BVEC))) Why do you want to limit what the iov_iter can contain? iovec based ones seem very useful, and athough I can come up with a use case for vectors pointing to userspace address I can't see anything that speaks against allowing them either. call this from drivers deadling ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/4] aio: add aio_kernel_() interface 2015-01-25 13:31 ` Christoph Hellwig @ 2015-01-26 16:18 ` Ming Lei 2015-01-26 17:00 ` Christoph Hellwig 2015-01-27 17:59 ` Christoph Hellwig 0 siblings, 2 replies; 9+ messages in thread From: Ming Lei @ 2015-01-26 16:18 UTC (permalink / raw) To: Christoph Hellwig Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Linux FS Devel, open list:AIO On Sun, Jan 25, 2015 at 9:31 PM, Christoph Hellwig <hch@infradead.org> wrote: >> +struct kiocb *aio_kernel_alloc(gfp_t gfp) >> +{ >> + return kzalloc(sizeof(struct kiocb), gfp); >> +} >> +EXPORT_SYMBOL_GPL(aio_kernel_alloc); >> + >> +void aio_kernel_free(struct kiocb *iocb) >> +{ >> + kfree(iocb); >> +} >> +EXPORT_SYMBOL_GPL(aio_kernel_free); > > Both functions don't actually seem to be used in this patch set. My fault, and it is just v2 which stops using them. >> +void aio_kernel_init_rw(struct kiocb *iocb, struct file *filp, >> + size_t nr, loff_t off, >> + void (*complete)(u64 user_data, long res), >> + u64 user_data) > >> +int aio_kernel_submit(struct kiocb *iocb, bool is_write, >> + struct iov_iter *iter) > > Why do we keep these two separate? Especially having the iov passed No special meaning, just follow previous patches, :-) But one benefit is that we can separate the one-shot initialization from submit, at least filep/complete/ki_ctx can be set during initialization. > n the second, and the count in the first seems rather confusing as > we shouldn't even need both for a high level API. Also the private > data should really be a void pointer for the kernel, or simply be > left away as we can assume the iocb is embedded into a caller > data structure and container_of can be used to find that structure. Either one is OK. > Also it might make sense to just offer aio_kernel_read/write intefaces > instead of the common submit wrapper, as that's much closer to other > kernel APIs, e.g. > > int aio_kernel_read(struct kiocb *iocb, struct file *file, > struct iov_iter *iter, loff_t off, > void (*complete)(struct kiocb *iocb, long res)); > int aio_kernel_write(struct kiocb *iocb, struct file *file, > struct iov_iter *iter, loff_t off, > void (*complete)(struct kiocb *iocb, long res)); It is like style of sync APIs, looks submit/complete is common for async APIs, like io_submit(). >> + if (WARN_ON(!is_kernel_kiocb(iocb) || !iocb->ki_obj.complete >> + || !iocb->ki_filp || !(iter->type & ITER_BVEC))) > > Why do you want to limit what the iov_iter can contain? iovec based > ones seem very useful, and athough I can come up with a use case > for vectors pointing to userspace address I can't see anything that > speaks against allowing them either. > call this from drivers deadling Yes, we should allow KVEC at least. Thanks, Ming Lei -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/4] aio: add aio_kernel_() interface 2015-01-26 16:18 ` Ming Lei @ 2015-01-26 17:00 ` Christoph Hellwig 2015-01-27 13:57 ` Ming Lei 2015-01-27 17:59 ` Christoph Hellwig 1 sibling, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2015-01-26 17:00 UTC (permalink / raw) To: Ming Lei Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Linux FS Devel, open list:AIO On Tue, Jan 27, 2015 at 12:18:23AM +0800, Ming Lei wrote: > > Also it might make sense to just offer aio_kernel_read/write intefaces > > instead of the common submit wrapper, as that's much closer to other > > kernel APIs, e.g. > > > > int aio_kernel_read(struct kiocb *iocb, struct file *file, > > struct iov_iter *iter, loff_t off, > > void (*complete)(struct kiocb *iocb, long res)); > > int aio_kernel_write(struct kiocb *iocb, struct file *file, > > struct iov_iter *iter, loff_t off, > > void (*complete)(struct kiocb *iocb, long res)); > > It is like style of sync APIs, looks submit/complete is common > for async APIs, like io_submit(). io_submit is a fairly horrible API. While Posix AIO isn't too much better it does have separate APIs for read/write. Given that there isn't much code shared I'd keep an API that feels familar. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/4] aio: add aio_kernel_() interface 2015-01-26 17:00 ` Christoph Hellwig @ 2015-01-27 13:57 ` Ming Lei 0 siblings, 0 replies; 9+ messages in thread From: Ming Lei @ 2015-01-27 13:57 UTC (permalink / raw) To: Christoph Hellwig Cc: Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Linux FS Devel, open list:AIO On Tue, Jan 27, 2015 at 1:00 AM, Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Jan 27, 2015 at 12:18:23AM +0800, Ming Lei wrote: >> > Also it might make sense to just offer aio_kernel_read/write intefaces >> > instead of the common submit wrapper, as that's much closer to other >> > kernel APIs, e.g. >> > >> > int aio_kernel_read(struct kiocb *iocb, struct file *file, >> > struct iov_iter *iter, loff_t off, >> > void (*complete)(struct kiocb *iocb, long res)); >> > int aio_kernel_write(struct kiocb *iocb, struct file *file, >> > struct iov_iter *iter, loff_t off, >> > void (*complete)(struct kiocb *iocb, long res)); >> >> It is like style of sync APIs, looks submit/complete is common >> for async APIs, like io_submit(). > > io_submit is a fairly horrible API. While Posix AIO isn't too much > better it does have separate APIs for read/write. Given that there > isn't much code shared I'd keep an API that feels familar. But from performance view, the two APIs of aio_kernel_read() and aio_kernel_write() aren't good because both need 5 parameters which can't be held in registers and they are often called from single thread context, not like synchronized read() and write(). Also aio_kernel_init_rw() can be moved to header file as inline to avoid too many parameters. Thanks, Ming Lei ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/4] aio: add aio_kernel_() interface 2015-01-26 16:18 ` Ming Lei 2015-01-26 17:00 ` Christoph Hellwig @ 2015-01-27 17:59 ` Christoph Hellwig 1 sibling, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2015-01-27 17:59 UTC (permalink / raw) To: Ming Lei Cc: Christoph Hellwig, Linux Kernel Mailing List, Dave Kleikamp, Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Linux FS Devel, open list:AIO On Tue, Jan 27, 2015 at 12:18:23AM +0800, Ming Lei wrote: > > Why do we keep these two separate? Especially having the iov passed > > No special meaning, just follow previous patches, :-) > > But one benefit is that we can separate the one-shot > initialization from submit, at least filep/complete/ki_ctx can be > set during initialization. FYI, I've posted another approach at async kernel reads/writes in the "[RFC] split struct kiocb" series. I thought I had you on Cc, but I messed it up. This keeps the separate init function, but it still feels a bit pointless to me. -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 2/4] fd/direct-io: introduce should_dirty for kernel aio [not found] <1421163888-21452-1-git-send-email-ming.lei@canonical.com> 2015-01-13 15:44 ` [PATCH v2 1/4] aio: add aio_kernel_() interface Ming Lei @ 2015-01-13 15:44 ` Ming Lei 2015-01-25 13:34 ` Christoph Hellwig 1 sibling, 1 reply; 9+ messages in thread From: Ming Lei @ 2015-01-13 15:44 UTC (permalink / raw) To: linux-kernel, Dave Kleikamp Cc: Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Ming Lei, Omar Sandoval, linux-fsdevel For pages from kernel AIO, it is required to dirty them before direct IO. The idea is borrowd from Dave previous post. Also set ->should_dirty only for ITER_IOVEC pages based on recent discussion in following link: http://marc.info/?t=141904556800003&r=1&w=2 Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Zach Brown <zab@zabbo.net> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@canonical.com> --- fs/direct-io.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index e181b6b..4dd6d14 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -121,6 +121,7 @@ struct dio { int page_errors; /* errno from get_user_pages() */ int is_async; /* is IO async ? */ bool defer_completion; /* defer AIO completion to workqueue? */ + bool should_dirty; /* if page should be dirty */ int io_error; /* IO error in completion path */ unsigned long refcount; /* direct_io_worker() and bios */ struct bio *bio_list; /* singly linked via bi_private */ @@ -392,7 +393,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio) dio->refcount++; spin_unlock_irqrestore(&dio->bio_lock, flags); - if (dio->is_async && dio->rw == READ) + if (dio->is_async && dio->rw == READ && dio->should_dirty) bio_set_pages_dirty(bio); if (sdio->submit_io) @@ -463,13 +464,14 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio) if (!uptodate) dio->io_error = -EIO; - if (dio->is_async && dio->rw == READ) { + if (dio->is_async && dio->rw == READ && dio->should_dirty) { bio_check_pages_dirty(bio); /* transfers ownership */ } else { bio_for_each_segment_all(bvec, bio, i) { struct page *page = bvec->bv_page; - if (dio->rw == READ && !PageCompound(page)) + if (dio->rw == READ && !PageCompound(page) && + dio->should_dirty) set_page_dirty_lock(page); page_cache_release(page); } @@ -1217,6 +1219,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, spin_lock_init(&dio->bio_lock); dio->refcount = 1; + dio->should_dirty = iter_is_iovec(iter); sdio.iter = iter; sdio.final_block_in_request = -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/4] fd/direct-io: introduce should_dirty for kernel aio 2015-01-13 15:44 ` [PATCH v2 2/4] fd/direct-io: introduce should_dirty for kernel aio Ming Lei @ 2015-01-25 13:34 ` Christoph Hellwig 2015-01-27 16:05 ` Ming Lei 0 siblings, 1 reply; 9+ messages in thread From: Christoph Hellwig @ 2015-01-25 13:34 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown, Christoph Hellwig, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Omar Sandoval, linux-fsdevel On Tue, Jan 13, 2015 at 11:44:46PM +0800, Ming Lei wrote: > > - if (dio->is_async && dio->rw == READ) > + if (dio->is_async && dio->rw == READ && dio->should_dirty) > bio_set_pages_dirty(bio); > > if (sdio->submit_io) > @@ -463,13 +464,14 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio) > if (!uptodate) > dio->io_error = -EIO; > > - if (dio->is_async && dio->rw == READ) { > + if (dio->is_async && dio->rw == READ && dio->should_dirty) { I'd rather have a ->should_dirrty flag that means we need to call bio_check_pages_dirty, And set that either if we have a kernel iovec/bvec or dio->is_async && dio->rw == READ. But why would we even set this if writing from an iovec? > bio_check_pages_dirty(bio); /* transfers ownership */ > } else { > bio_for_each_segment_all(bvec, bio, i) { > struct page *page = bvec->bv_page; > > - if (dio->rw == READ && !PageCompound(page)) > + if (dio->rw == READ && !PageCompound(page) && > + dio->should_dirty) > set_page_dirty_lock(page); And this unk could also use some explanation. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 2/4] fd/direct-io: introduce should_dirty for kernel aio 2015-01-25 13:34 ` Christoph Hellwig @ 2015-01-27 16:05 ` Ming Lei 0 siblings, 0 replies; 9+ messages in thread From: Ming Lei @ 2015-01-27 16:05 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-kernel, Dave Kleikamp, Jens Axboe, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Benjamin LaHaise, Omar Sandoval, linux-fsdevel On 1/25/15, Christoph Hellwig <hch@infradead.org> wrote: > On Tue, Jan 13, 2015 at 11:44:46PM +0800, Ming Lei wrote: >> >> - if (dio->is_async && dio->rw == READ) >> + if (dio->is_async && dio->rw == READ && dio->should_dirty) >> bio_set_pages_dirty(bio); >> >> if (sdio->submit_io) >> @@ -463,13 +464,14 @@ static int dio_bio_complete(struct dio *dio, struct >> bio *bio) >> if (!uptodate) >> dio->io_error = -EIO; >> >> - if (dio->is_async && dio->rw == READ) { >> + if (dio->is_async && dio->rw == READ && dio->should_dirty) { > > I'd rather have a ->should_dirrty flag that means we need to call > bio_check_pages_dirty, And set that either if we have a kernel > iovec/bvec or dio->is_async && dio->rw == READ. It may not work for loop because the page may not become dirty until the read dio is completed, so it just means the caller totally handles the page dirty if I/O is from kernel. > > But why would we even set this if writing from an iovec? Or just rename it as ->from_user? >> bio_check_pages_dirty(bio); /* transfers ownership */ >> } else { >> bio_for_each_segment_all(bvec, bio, i) { >> struct page *page = bvec->bv_page; >> >> - if (dio->rw == READ && !PageCompound(page)) >> + if (dio->rw == READ && !PageCompound(page) && >> + dio->should_dirty) >> set_page_dirty_lock(page); > > And this unk could also use some explanation. This is same, just don't handle page dirty for kernel dio pages. Thanks, Ming Lei ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-01-27 17:59 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <1421163888-21452-1-git-send-email-ming.lei@canonical.com> 2015-01-13 15:44 ` [PATCH v2 1/4] aio: add aio_kernel_() interface Ming Lei 2015-01-25 13:31 ` Christoph Hellwig 2015-01-26 16:18 ` Ming Lei 2015-01-26 17:00 ` Christoph Hellwig 2015-01-27 13:57 ` Ming Lei 2015-01-27 17:59 ` Christoph Hellwig 2015-01-13 15:44 ` [PATCH v2 2/4] fd/direct-io: introduce should_dirty for kernel aio Ming Lei 2015-01-25 13:34 ` Christoph Hellwig 2015-01-27 16:05 ` Ming Lei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).