* [GIT PULL] direct IO support for loop driver @ 2013-11-18 19:03 Dave Kleikamp 2013-11-18 19:07 ` Dave Kleikamp 2013-11-20 21:19 ` Linus Torvalds 0 siblings, 2 replies; 12+ messages in thread From: Dave Kleikamp @ 2013-11-18 19:03 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe Linus, Please pull the following patches. They add the infrastructure for kernel-initiated direct-io, change the loop driver to perform direct-io, and change swap-over-nfs to use the new infrastructure. These patches have been under development for a long time and have been in linux-next since August. I'd have sent the request sooner, but I was on vacation last week. There are merge conflicts with several files, so I will follow up with a merge patch. The following changes since commit 1e52db6908ca5aa8e035caeaf96e8de2567fd84d: Merge tag 'vfio-v3.12-rc5' of git://github.com/awilliam/linux-vfio (2013-10-14 18:11:28 -0700) are available in the git repository at: git://github.com/kleikamp/linux-shaggy.git tags/aio-direct-for-3.13 for you to fetch changes up to bb6f7be483180a9674f3d00b8e97e9b29b39d55d: tmpfs: add support for read_iter and write_iter (2013-10-18 10:56:54 -0500) ---------------------------------------------------------------- Allow the loop driver to perform direct IO to the underlying file system ---------------------------------------------------------------- Asias He (1): block_dev: add support for read_iter, write_iter Dave Kleikamp (22): iov_iter: iov_iter_copy_from_user() should use non-atomic copy iov_iter: add __iovec_copy_to_user() fuse: convert fuse to use iov_iter_copy_[to|from]_user iov_iter: ii_iovec_copy_to_user should pre-fault user pages dio: Convert direct_IO to use iov_iter dio: add bio_vec support to __blockdev_direct_IO() aio: add aio_kernel_() interface aio: add aio support for iov_iter arguments fs: create file_readable() and file_writable() functions fs: use read_iter and write_iter rather than aio_read and aio_write fs: add read_iter and write_iter to several file systems ocfs2: add support for read_iter and write_iter ext4: add support for read_iter and write_iter nfs: add support for read_iter, write_iter nfs: simplify swap btrfs: add support for read_iter and write_iter xfs: add support for read_iter and write_iter gfs2: Convert aio_read/write ops to read/write_iter udf: convert file ops from aio_read/write to read/write_iter afs: add support for read_iter and write_iter ecrpytfs: Convert aio_read/write ops to read/write_iter ubifs: convert file ops from aio_read/write to read/write_iter Hugh Dickins (1): tmpfs: add support for read_iter and write_iter Zach Brown (9): iov_iter: move into its own file iov_iter: add copy_to_user support iov_iter: hide iovec details behind ops function pointers iov_iter: add bvec support iov_iter: add a shorten call iov_iter: let callers extract iovecs and bio_vecs fs: pull iov_iter use higher up the stack bio: add bvec_length(), like iov_length() loop: use aio to perform io on the underlying file Documentation/filesystems/Locking | 6 +- Documentation/filesystems/vfs.txt | 12 +- drivers/block/loop.c | 158 +++++++++---- drivers/char/raw.c | 4 +- drivers/mtd/nand/nandsim.c | 4 +- drivers/usb/gadget/storage_common.c | 4 +- fs/9p/vfs_addr.c | 12 +- fs/9p/vfs_file.c | 8 +- fs/Makefile | 2 +- fs/adfs/file.c | 4 +- fs/affs/file.c | 4 +- fs/afs/file.c | 4 +- fs/afs/internal.h | 3 +- fs/afs/write.c | 9 +- fs/aio.c | 136 ++++++++++- fs/bad_inode.c | 14 ++ fs/bfs/file.c | 4 +- fs/block_dev.c | 27 ++- fs/btrfs/file.c | 42 ++-- fs/btrfs/inode.c | 63 +++--- fs/ceph/addr.c | 3 +- fs/cifs/file.c | 4 +- fs/direct-io.c | 223 +++++++++++++------ fs/ecryptfs/file.c | 15 +- fs/exofs/file.c | 4 +- fs/ext2/file.c | 4 +- fs/ext2/inode.c | 8 +- fs/ext3/file.c | 4 +- fs/ext3/inode.c | 15 +- fs/ext4/ext4.h | 3 +- fs/ext4/file.c | 34 +-- fs/ext4/indirect.c | 16 +- fs/ext4/inode.c | 23 +- fs/f2fs/data.c | 4 +- fs/f2fs/file.c | 4 +- fs/fat/file.c | 4 +- fs/fat/inode.c | 10 +- fs/fuse/cuse.c | 10 +- fs/fuse/file.c | 90 ++++---- fs/fuse/fuse_i.h | 5 +- fs/gfs2/aops.c | 7 +- fs/gfs2/file.c | 21 +- fs/hfs/inode.c | 11 +- fs/hfsplus/inode.c | 10 +- fs/hostfs/hostfs_kern.c | 4 +- fs/hpfs/file.c | 4 +- fs/internal.h | 4 + fs/iov-iter.c | 411 ++++++++++++++++++++++++++++++++++ fs/jffs2/file.c | 8 +- fs/jfs/file.c | 4 +- fs/jfs/inode.c | 7 +- fs/logfs/file.c | 4 +- fs/minix/file.c | 4 +- fs/nfs/direct.c | 301 ++++++++++++++++--------- fs/nfs/file.c | 33 ++- fs/nfs/internal.h | 4 +- fs/nfs/nfs4file.c | 4 +- fs/nilfs2/file.c | 4 +- fs/nilfs2/inode.c | 8 +- fs/ocfs2/aops.c | 8 +- fs/ocfs2/aops.h | 2 +- fs/ocfs2/file.c | 55 ++--- fs/ocfs2/ocfs2_trace.h | 6 +- fs/omfs/file.c | 4 +- fs/ramfs/file-mmu.c | 4 +- fs/ramfs/file-nommu.c | 4 +- fs/read_write.c | 78 +++++-- fs/reiserfs/file.c | 4 +- fs/reiserfs/inode.c | 7 +- fs/romfs/mmap-nommu.c | 2 +- fs/sysv/file.c | 4 +- fs/ubifs/file.c | 12 +- fs/udf/file.c | 13 +- fs/udf/inode.c | 10 +- fs/ufs/file.c | 4 +- fs/xfs/xfs_aops.c | 13 +- fs/xfs/xfs_file.c | 51 ++--- include/linux/aio.h | 25 ++- include/linux/bio.h | 8 + include/linux/blk_types.h | 2 - include/linux/fs.h | 165 ++++++++++++-- include/linux/nfs_fs.h | 13 +- include/uapi/linux/loop.h | 1 + mm/filemap.c | 433 ++++++++++++++---------------------- mm/page_io.c | 15 +- mm/shmem.c | 61 ++--- 86 files changed, 1857 insertions(+), 1003 deletions(-) create mode 100644 fs/iov-iter.c -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp @ 2013-11-18 19:07 ` Dave Kleikamp 2013-11-20 21:19 ` Linus Torvalds 1 sibling, 0 replies; 12+ messages in thread From: Dave Kleikamp @ 2013-11-18 19:07 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe Linus, Here is a merge patch for resolving the conflicts in my git tree. Of course I could rebase, but I think you prefer I didn't do that. Thanks, Shaggy Conflicts: drivers/mtd/nand/nandsim.c fs/btrfs/inode.c fs/cifs/file.c fs/nfs/direct.c fs/nfs/file.c fs/read_write.c include/linux/blk_types.h mm/filemap.c diff --cc fs/btrfs/inode.c index da8d2f6,6feae86..1b83942 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@@ -7234,10 -7233,8 +7247,10 @@@ static ssize_t btrfs_direct_IO(int rw, * call btrfs_wait_ordered_range to make absolutely sure that any * outstanding dirty pages are on disk. */ - count = iov_length(iov, nr_segs); + count = iov_iter_count(iter); - btrfs_wait_ordered_range(inode, offset, count); + ret = btrfs_wait_ordered_range(inode, offset, count); + if (ret) + return ret; if (rw & WRITE) { /* diff --cc fs/cifs/file.c index 5a5a872,cf6aedc..931158b --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@@ -3663,27 -3663,6 +3663,26 @@@ void cifs_oplock_break(struct work_stru } } +/* + * The presence of cifs_direct_io() in the address space ops vector + * allowes open() O_DIRECT flags which would have failed otherwise. + * + * In the non-cached mode (mount with cache=none), we shunt off direct read and write requests + * so this method should never be called. + * + * Direct IO is not yet supported in the cached mode. + */ +static ssize_t - cifs_direct_io(int rw, struct kiocb *iocb, const struct iovec *iov, - loff_t pos, unsigned long nr_segs) ++cifs_direct_io(int rw, struct kiocb *iocb, struct iov_iter *iter, loff_t pos) +{ + /* + * FIXME + * Eventually need to support direct IO for non forcedirectio mounts + */ + return -EINVAL; +} + + const struct address_space_operations cifs_addr_ops = { .readpage = cifs_readpage, .readpages = cifs_readpages, diff --cc fs/nfs/direct.c index d71d66c,239c2fe..87a6475 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@@ -117,26 -118,18 +118,17 @@@ static inline int put_dreq(struct nfs_d * @nr_segs: size of iovec array * * The presence of this routine in the address space ops vector means - * the NFS client supports direct I/O. However, for most direct IO, we - * shunt off direct read and write requests before the VFS gets them, - * so this method is only ever called for swap. + * the NFS client supports direct I/O. However, we shunt off direct + * read and write requests before the VFS gets them, so this method + * should never be called. */ - ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, loff_t pos, unsigned long nr_segs) + ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter, + loff_t pos) { - #ifndef CONFIG_NFS_SWAP - dprintk("NFS: nfs_direct_IO (%s) off/no(%Ld/%lu) EINVAL\n", - iocb->ki_filp->f_path.dentry->d_name.name, - (long long) pos, iter->nr_segs); + dprintk("NFS: nfs_direct_IO (%pD) off/no(%Ld/%lu) EINVAL\n", - iocb->ki_filp, (long long) pos, nr_segs); ++ iocb->ki_filp, (long long) pos, iter->nr_segs); return -EINVAL; - #else - VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE); - - if (rw == READ || rw == KERNEL_READ) - return nfs_file_direct_read(iocb, iov, nr_segs, pos, - rw == READ ? true : false); - return nfs_file_direct_write(iocb, iov, nr_segs, pos, - rw == WRITE ? true : false); - #endif /* CONFIG_NFS_SWAP */ } static void nfs_direct_release_pages(struct page **pages, unsigned int npages) @@@ -905,11 -1010,13 +1009,11 @@@ ssize_t nfs_file_direct_read(struct kio struct address_space *mapping = file->f_mapping; size_t count; - count = iov_length(iov, nr_segs); + count = iov_iter_count(iter); nfs_add_stats(mapping->host, NFSIOS_DIRECTREADBYTES, count); - dfprintk(FILE, "NFS: direct read(%s/%s, %zd@%Ld)\n", - file->f_path.dentry->d_parent->d_name.name, - file->f_path.dentry->d_name.name, - count, (long long) pos); + dfprintk(FILE, "NFS: direct read(%pD2, %zd@%Ld)\n", + file, count, (long long) pos); retval = 0; if (!count) @@@ -959,11 -1065,13 +1062,11 @@@ ssize_t nfs_file_direct_write(struct ki struct address_space *mapping = file->f_mapping; size_t count; - count = iov_length(iov, nr_segs); + count = iov_iter_count(iter); nfs_add_stats(mapping->host, NFSIOS_DIRECTWRITTENBYTES, count); - dfprintk(FILE, "NFS: direct write(%s/%s, %zd@%Ld)\n", - file->f_path.dentry->d_parent->d_name.name, - file->f_path.dentry->d_name.name, - count, (long long) pos); + dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", + file, count, (long long) pos); retval = generic_write_checks(file, &pos, &count, 0); if (retval) diff --cc fs/nfs/file.c index e2fcacf,19ac4fd..e022fe9 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@@ -165,18 -174,18 +165,17 @@@ nfs_file_flush(struct file *file, fl_ow EXPORT_SYMBOL_GPL(nfs_file_flush); ssize_t - nfs_file_read(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t pos) + nfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter, loff_t pos) { - struct dentry * dentry = iocb->ki_filp->f_path.dentry; - struct inode * inode = dentry->d_inode; + struct inode *inode = file_inode(iocb->ki_filp); ssize_t result; if (iocb->ki_filp->f_flags & O_DIRECT) - return nfs_file_direct_read(iocb, iov, nr_segs, pos, true); + return nfs_file_direct_read(iocb, iter, pos); - dprintk("NFS: read(%pD2, %lu@%lu)\n", - dprintk("NFS: read_iter(%s/%s, %lu@%lu)\n", - dentry->d_parent->d_name.name, dentry->d_name.name, ++ dprintk("NFS: read_iter(%pD2, %lu@%lu)\n", + iocb->ki_filp, - (unsigned long) iov_length(iov, nr_segs), (unsigned long) pos); + (unsigned long) iov_iter_count(iter), (unsigned long) pos); result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping); if (!result) { @@@ -634,24 -655,25 +633,24 @@@ static int nfs_need_sync_write(struct f return 0; } - ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov, - unsigned long nr_segs, loff_t pos) + ssize_t nfs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter, + loff_t pos) { - struct dentry * dentry = iocb->ki_filp->f_path.dentry; - struct inode * inode = dentry->d_inode; + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); unsigned long written = 0; ssize_t result; - size_t count = iov_length(iov, nr_segs); + size_t count = iov_iter_count(iter); - result = nfs_key_timeout_notify(iocb->ki_filp, inode); + result = nfs_key_timeout_notify(file, inode); if (result) return result; - if (iocb->ki_filp->f_flags & O_DIRECT) + if (file->f_flags & O_DIRECT) - return nfs_file_direct_write(iocb, iov, nr_segs, pos, true); + return nfs_file_direct_write(iocb, iter, pos); - dprintk("NFS: write(%pD2, %lu@%Ld)\n", - dprintk("NFS: write_iter(%s/%s, %lu@%lld)\n", - dentry->d_parent->d_name.name, dentry->d_name.name, - (unsigned long) count, (long long) pos); ++ dprintk("NFS: write_iter(%pD2, %lu@%Ld)\n", + file, (unsigned long) count, (long long) pos); result = -EBUSY; if (IS_SWAPFILE(inode)) diff --cc include/linux/blk_types.h index 238ef0e,1bea25f..2c1c8c9 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@@ -176,9 -176,7 +176,8 @@@ enum rq_flag_bits __REQ_FLUSH_SEQ, /* request for flush sequence */ __REQ_IO_STAT, /* account I/O stat */ __REQ_MIXED_MERGE, /* merge of different types, fail separately */ - __REQ_KERNEL, /* direct IO to kernel pages */ __REQ_PM, /* runtime pm request */ + __REQ_END, /* last of chain of requests */ __REQ_NR_BITS, /* stops here */ }; @@@ -207,29 -205,27 +206,28 @@@ #define REQ_NOMERGE_FLAGS \ (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA) -#define REQ_RAHEAD (1 << __REQ_RAHEAD) -#define REQ_THROTTLED (1 << __REQ_THROTTLED) - -#define REQ_SORTED (1 << __REQ_SORTED) -#define REQ_SOFTBARRIER (1 << __REQ_SOFTBARRIER) -#define REQ_FUA (1 << __REQ_FUA) -#define REQ_NOMERGE (1 << __REQ_NOMERGE) -#define REQ_STARTED (1 << __REQ_STARTED) -#define REQ_DONTPREP (1 << __REQ_DONTPREP) -#define REQ_QUEUED (1 << __REQ_QUEUED) -#define REQ_ELVPRIV (1 << __REQ_ELVPRIV) -#define REQ_FAILED (1 << __REQ_FAILED) -#define REQ_QUIET (1 << __REQ_QUIET) -#define REQ_PREEMPT (1 << __REQ_PREEMPT) -#define REQ_ALLOCED (1 << __REQ_ALLOCED) -#define REQ_COPY_USER (1 << __REQ_COPY_USER) -#define REQ_FLUSH (1 << __REQ_FLUSH) -#define REQ_FLUSH_SEQ (1 << __REQ_FLUSH_SEQ) -#define REQ_IO_STAT (1 << __REQ_IO_STAT) -#define REQ_MIXED_MERGE (1 << __REQ_MIXED_MERGE) -#define REQ_SECURE (1 << __REQ_SECURE) -#define REQ_PM (1 << __REQ_PM) +#define REQ_RAHEAD (1ULL << __REQ_RAHEAD) +#define REQ_THROTTLED (1ULL << __REQ_THROTTLED) + +#define REQ_SORTED (1ULL << __REQ_SORTED) +#define REQ_SOFTBARRIER (1ULL << __REQ_SOFTBARRIER) +#define REQ_FUA (1ULL << __REQ_FUA) +#define REQ_NOMERGE (1ULL << __REQ_NOMERGE) +#define REQ_STARTED (1ULL << __REQ_STARTED) +#define REQ_DONTPREP (1ULL << __REQ_DONTPREP) +#define REQ_QUEUED (1ULL << __REQ_QUEUED) +#define REQ_ELVPRIV (1ULL << __REQ_ELVPRIV) +#define REQ_FAILED (1ULL << __REQ_FAILED) +#define REQ_QUIET (1ULL << __REQ_QUIET) +#define REQ_PREEMPT (1ULL << __REQ_PREEMPT) +#define REQ_ALLOCED (1ULL << __REQ_ALLOCED) +#define REQ_COPY_USER (1ULL << __REQ_COPY_USER) +#define REQ_FLUSH (1ULL << __REQ_FLUSH) +#define REQ_FLUSH_SEQ (1ULL << __REQ_FLUSH_SEQ) +#define REQ_IO_STAT (1ULL << __REQ_IO_STAT) +#define REQ_MIXED_MERGE (1ULL << __REQ_MIXED_MERGE) +#define REQ_SECURE (1ULL << __REQ_SECURE) - #define REQ_KERNEL (1ULL << __REQ_KERNEL) +#define REQ_PM (1ULL << __REQ_PM) +#define REQ_END (1ULL << __REQ_END) #endif /* __LINUX_BLK_TYPES_H */ diff --cc mm/filemap.c index b7749a9,9b0b852..fc78cf2 --- a/mm/filemap.c +++ b/mm/filemap.c @@@ -1199,14 -1200,13 +1199,14 @@@ page_ok * Ok, we have the page, and it's up-to-date, so * now we can copy it to user space... * - * The file_read_actor routine returns how many bytes were - * The actor routine returns how many bytes were actually used.. ++ * The file_read_iter_actor routine returns how many bytes were + * actually used.. * NOTE! This may not be the same as how much of a user buffer * we filled up (we may be padding etc), so we can only update * "pos" here (the actor routine has to update the user buffer * pointers and the remaining count). */ - ret = file_read_actor(desc, page, offset, nr); - ret = actor(desc, page, offset, nr); ++ ret = file_read_iter_actor(desc, page, offset, nr); offset += ret; index += offset >> PAGE_CACHE_SHIFT; offset &= ~PAGE_CACHE_MASK; @@@ -1455,39 -1426,15 +1426,15 @@@ generic_file_read_iter(struct kiocb *io } } - count = retval; - for (seg = 0; seg < nr_segs; seg++) { - read_descriptor_t desc; - loff_t offset = 0; - - /* - * If we did a short DIO read we need to skip the section of the - * iov that we've already read data into. - */ - if (count) { - if (count > iov[seg].iov_len) { - count -= iov[seg].iov_len; - continue; - } - offset = count; - count = 0; - } - - desc.written = 0; - desc.arg.buf = iov[seg].iov_base + offset; - desc.count = iov[seg].iov_len - offset; - if (desc.count == 0) - continue; - desc.error = 0; - do_generic_file_read(filp, ppos, &desc); - retval += desc.written; - if (desc.error) { - retval = retval ?: desc.error; - break; - } - if (desc.count > 0) - break; - } + desc.written = 0; + desc.arg.data = iter; + desc.count = count; + desc.error = 0; - do_generic_file_read(filp, ppos, &desc, file_read_iter_actor); ++ do_generic_file_read(filp, ppos, &desc); + if (desc.written) + retval = desc.written; + else + retval = desc.error; out: return retval; } -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp 2013-11-18 19:07 ` Dave Kleikamp @ 2013-11-20 21:19 ` Linus Torvalds 2013-11-20 21:38 ` Linus Torvalds 2013-11-21 9:58 ` Christoph Hellwig 1 sibling, 2 replies; 12+ messages in thread From: Linus Torvalds @ 2013-11-20 21:19 UTC (permalink / raw) To: Dave Kleikamp Cc: Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe On Mon, Nov 18, 2013 at 11:03 AM, Dave Kleikamp <dave.kleikamp@oracle.com> wrote: > Linus, > > Please pull the following patches. They add the infrastructure for > kernel-initiated direct-io, change the loop driver to perform direct-io, > and change swap-over-nfs to use the new infrastructure. Quite frankly, I got maybe ten patches into this series, at which point I just threw my hands up and said: "This is too ugly to live". The naming in fs/iov-iter.c is disgusting. :ii_iov_xyz? WTF? Random "flag" value for marking things atomic? F*ck me, that's ugly. A separate phase for checking addresses instead of just doing it in the loop that loops over iovec's? Why? It sure as hell isn't because it's more efficient, and it doubly sure as hell isn't because it's prettier. At that point, I just couldn't take it any more. I really don't see the point of all this crap. All this for the loop driver? If so, it had better at least be prettier than it is. Linus -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-20 21:19 ` Linus Torvalds @ 2013-11-20 21:38 ` Linus Torvalds 2013-11-20 21:50 ` Kent Overstreet 2013-11-20 22:46 ` Dave Kleikamp 2013-11-21 9:58 ` Christoph Hellwig 1 sibling, 2 replies; 12+ messages in thread From: Linus Torvalds @ 2013-11-20 21:38 UTC (permalink / raw) To: Dave Kleikamp Cc: Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > At that point, I just couldn't take it any more. Just to clarify, I think it might be fixable. But it does need fixing, because I really feel dirty from reading it. And I may not care all that deeply about what random drivers or low-level filesystems do, but I *do* care about generic code in mm/ and fs/, so making those iovec functions uglier makes me go all "Hulk angry! Hulk smash" on the code. The whole "separate out checking from user copy" needs to go away. There's no excuse for it. The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap needs to go away. You can do it either by just duplicating the function, or by having it use a indirect function for the copy (and that indirect function acts like copy_from/to_user() and checks the address range - and you can obviously then also have it be a "copy from kernel" thing too if you want/need it). And no, you don't then make it do *both* the conditional *and* the function pointer like you did in that discusting commit that mixes the two with the struct iov_iter_ops). The "__" versions that don't check the user address range needs to die entirely. The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even make sense (one of the "i"s is for "iov". That "unsigned long data" that contains an iovec *? WTF? How did that ever start making sense? IOW, there are many many details that just make me absolutely detest this series. Enough that there's no way in hell I feel comfortable pulling it. But they are likely fixable. Linus -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-20 21:38 ` Linus Torvalds @ 2013-11-20 21:50 ` Kent Overstreet 2013-11-20 22:46 ` Dave Kleikamp 1 sibling, 0 replies; 12+ messages in thread From: Kent Overstreet @ 2013-11-20 21:50 UTC (permalink / raw) To: Linus Torvalds Cc: Dave Kleikamp, Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Jens Axboe On Wed, Nov 20, 2013 at 01:38:19PM -0800, Linus Torvalds wrote: > On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > At that point, I just couldn't take it any more. > > Just to clarify, I think it might be fixable. But it does need fixing, > because I really feel dirty from reading it. And I may not care all > that deeply about what random drivers or low-level filesystems do, but > I *do* care about generic code in mm/ and fs/, so making those iovec > functions uglier makes me go all "Hulk angry! Hulk smash" on the code. > > The whole "separate out checking from user copy" needs to go away. > There's no excuse for it. > > The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap > needs to go away. You can do it either by just duplicating the > function, or by having it use a indirect function for the copy (and > that indirect function acts like copy_from/to_user() and checks the > address range - and you can obviously then also have it be a "copy > from kernel" thing too if you want/need it). And no, you don't then > make it do *both* the conditional *and* the function pointer like you > did in that discusting commit that mixes the two with the struct > iov_iter_ops). > > The "__" versions that don't check the user address range needs to die entirely. > > The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even > make sense (one of the "i"s is for "iov". > > That "unsigned long data" that contains an iovec *? WTF? How did that > ever start making sense? > > IOW, there are many many details that just make me absolutely detest > this series. Enough that there's no way in hell I feel comfortable > pulling it. But they are likely fixable. To be honest, I'm skeptical that this approach (adding methods and a bunch of complexity to iov_iter for backing them with biovecs) is necessary for fixing the loop driver. I've been working on an alternate approach that in the process cleans a bunch of stuff up - the idea is basically, 1) refactor and massage a bunch of stuff in the block layer so bios can be created and submitting without caring about the constraints of the underlying device, and then 2) rewrite the dio code to start out by pinning pages directly into bios, then query the filesystem to map those bios wherever - splitting as needed. What this gets us is a nice clean split where (with some more handwaving; Zach had some ideas about how to handle some annoying details) we can just submit bios to some sort of new DIO method - and then the loop driver could just use that directly. IMO this would be vastly cleaner; we'd have one data structure - the iov_iter - for memory that isn't pinned, and another data structure - struct bio - for iterating over pinned pages. Most of the aforementioned block layer massaging I've been doing was creating a real iterator for bios (that doesn't modify the biovecs). I've also done the DIO rewrite - awhile ago - and it's definitely not ready to go upstream (various prereqs still aren't in), but it at least shows that the approach is viable and my rewrite cuts fs/direct-io.c in _half_ while significantly improving performance: http://evilpiepirate.org/git/linux-bcache.git/commit/?h=block_stuff&id=caf18e8ec531daea29f61c9aa486443026a343c7 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-20 21:38 ` Linus Torvalds 2013-11-20 21:50 ` Kent Overstreet @ 2013-11-20 22:46 ` Dave Kleikamp 2013-11-21 4:24 ` Stephen Rothwell 1 sibling, 1 reply; 12+ messages in thread From: Dave Kleikamp @ 2013-11-20 22:46 UTC (permalink / raw) To: Linus Torvalds Cc: Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe On 11/20/2013 03:38 PM, Linus Torvalds wrote: > On Wed, Nov 20, 2013 at 1:19 PM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> At that point, I just couldn't take it any more. > > Just to clarify, I think it might be fixable. But it does need fixing, > because I really feel dirty from reading it. And I may not care all > that deeply about what random drivers or low-level filesystems do, but > I *do* care about generic code in mm/ and fs/, so making those iovec > functions uglier makes me go all "Hulk angry! Hulk smash" on the code. Okay, fair enough. This patchset started out as a bit of a hack, but it ended up working well and I had multiple other developers wanting the function in mainline. I didn't get too much critical feedback, but I should have been more critical myself. > The whole "separate out checking from user copy" needs to go away. > There's no excuse for it. > > The whole "if (atomic) do_atomic_ops() else do_regular_ops()" crap > needs to go away. You can do it either by just duplicating the > function, or by having it use a indirect function for the copy (and > that indirect function acts like copy_from/to_user() and checks the > address range - and you can obviously then also have it be a "copy > from kernel" thing too if you want/need it). And no, you don't then > make it do *both* the conditional *and* the function pointer like you > did in that discusting commit that mixes the two with the struct > iov_iter_ops). I'll fix those. > The "__" versions that don't check the user address range needs to die entirely. > > The whole crazy "ii_iov_xyz" naming needs to go away. It doesn't even > make sense (one of the "i"s is for "iov". okay > That "unsigned long data" that contains an iovec *? WTF? How did that > ever start making sense? In a later patch, it could also be a biovec. It'd probably be better to make it a union at that point, or at least add a comment. > IOW, there are many many details that just make me absolutely detest > this series. Enough that there's no way in hell I feel comfortable > pulling it. But they are likely fixable. Yeah, I'll work on something cleaner. The vital piece is some container that can either contain an iovec or or an array of kernel pages to be passed to the filesystems. The iov_iter structure was a reasonable choice, but the implementation is a bit sloppy. Kent wants to pass the bio structure itself, while this patchset used the biovec array. As long as I'm reworking the patchset, I'll see if I can accommodate his requirements. Thanks, Dave -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-20 22:46 ` Dave Kleikamp @ 2013-11-21 4:24 ` Stephen Rothwell 0 siblings, 0 replies; 12+ messages in thread From: Stephen Rothwell @ 2013-11-21 4:24 UTC (permalink / raw) To: Dave Kleikamp Cc: Linus Torvalds, Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe [-- Attachment #1: Type: text/plain, Size: 813 bytes --] Hi all, On Wed, 20 Nov 2013 16:46:38 -0600 Dave Kleikamp <dave.kleikamp@oracle.com> wrote: > > Yeah, I'll work on something cleaner. The vital piece is some container > that can either contain an iovec or or an array of kernel pages to be > passed to the filesystems. The iov_iter structure was a reasonable > choice, but the implementation is a bit sloppy. Kent wants to pass the > bio structure itself, while this patchset used the biovec array. As long > as I'm reworking the patchset, I'll see if I can accommodate his > requirements. So given this rework, I think I should drop the aio-direct tree from linux-next for now. We can add it back (hopefully with a better name) when the rewrite is in a reasonable state. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-20 21:19 ` Linus Torvalds 2013-11-20 21:38 ` Linus Torvalds @ 2013-11-21 9:58 ` Christoph Hellwig 2013-11-21 10:06 ` Kent Overstreet 1 sibling, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2013-11-21 9:58 UTC (permalink / raw) To: Linus Torvalds Cc: Dave Kleikamp, Christoph Hellwig, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Kent Overstreet, Jens Axboe On Wed, Nov 20, 2013 at 01:19:33PM -0800, Linus Torvalds wrote: > I really don't see the point of all this crap. All this for the loop > driver? If so, it had better at least be prettier than it is. It's for anyone trying to to direct I/O from kernel space. I have patches that speed up ecryptfs and allow sane operation for the file backed scsi target based on this that have been collecting dust for a while. -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-21 9:58 ` Christoph Hellwig @ 2013-11-21 10:06 ` Kent Overstreet 2013-11-21 10:11 ` Christoph Hellwig 2013-11-21 17:34 ` Christoph Hellwig 0 siblings, 2 replies; 12+ messages in thread From: Kent Overstreet @ 2013-11-21 10:06 UTC (permalink / raw) To: Christoph Hellwig Cc: Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Jens Axboe On Thu, Nov 21, 2013 at 01:58:37AM -0800, Christoph Hellwig wrote: > On Wed, Nov 20, 2013 at 01:19:33PM -0800, Linus Torvalds wrote: > > I really don't see the point of all this crap. All this for the loop > > driver? If so, it had better at least be prettier than it is. > > It's for anyone trying to to direct I/O from kernel space. I have > patches that speed up ecryptfs and allow sane operation for the file > backed scsi target based on this that have been collecting dust for a > while. Can you post those patches somewhere? If that's the end goal of this patch series, we should be able to look at them too. -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-21 10:06 ` Kent Overstreet @ 2013-11-21 10:11 ` Christoph Hellwig 2013-11-21 10:13 ` Kent Overstreet 2013-11-21 17:34 ` Christoph Hellwig 1 sibling, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2013-11-21 10:11 UTC (permalink / raw) To: Kent Overstreet Cc: Christoph Hellwig, Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Jens Axboe On Thu, Nov 21, 2013 at 02:06:09AM -0800, Kent Overstreet wrote: > Can you post those patches somewhere? If that's the end goal of this > patch series, we should be able to look at them too. I've should have the target side back working soon as I started on it again about a week ago. -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-21 10:11 ` Christoph Hellwig @ 2013-11-21 10:13 ` Kent Overstreet 0 siblings, 0 replies; 12+ messages in thread From: Kent Overstreet @ 2013-11-21 10:13 UTC (permalink / raw) To: Christoph Hellwig Cc: Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Jens Axboe On Thu, Nov 21, 2013 at 02:11:38AM -0800, Christoph Hellwig wrote: > On Thu, Nov 21, 2013 at 02:06:09AM -0800, Kent Overstreet wrote: > > Can you post those patches somewhere? If that's the end goal of this > > patch series, we should be able to look at them too. > > I've should have the target side back working soon as I started on it > again about a week ago. Broken code is fine, I just want to see the overall approach. -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [GIT PULL] direct IO support for loop driver 2013-11-21 10:06 ` Kent Overstreet 2013-11-21 10:11 ` Christoph Hellwig @ 2013-11-21 17:34 ` Christoph Hellwig 1 sibling, 0 replies; 12+ messages in thread From: Christoph Hellwig @ 2013-11-21 17:34 UTC (permalink / raw) To: Kent Overstreet Cc: Linus Torvalds, Dave Kleikamp, LKML, linux-fsdevel@vger.kernel.org, Maxim V. Patlasov, linux-aio, Jens Axboe find the target one below. I haven't found the ecryptfs work, but it is a lot more involved as it only wants to use direct I/O, not aio. --- a/drivers/target/target_core_file.c +++ b/drivers/target/target_core_file.c @@ -31,6 +31,7 @@ #include <linux/spinlock.h> #include <linux/module.h> #include <linux/falloc.h> +#include <linux/blk_types.h> #include <scsi/scsi.h> #include <scsi/scsi_host.h> #include <asm/unaligned.h> @@ -120,6 +121,8 @@ static int fd_configure_device(struct se_device *dev) * of pure timestamp updates. */ flags = O_RDWR | O_CREAT | O_LARGEFILE | O_DSYNC; + if (1) + flags |= O_DIRECT; /* * Optionally allow fd_buffered_io=1 to be enabled for people @@ -546,6 +549,61 @@ fd_execute_unmap(struct se_cmd *cmd) return sbc_execute_unmap(cmd, fd_do_unmap, file); } +static void fd_rw_aio_complete(u64 data, long res) +{ + struct se_cmd *cmd = (struct se_cmd *)(uintptr_t)data; + + kfree(cmd->priv); + + if (res < 0) + target_complete_cmd(cmd, SAM_STAT_CHECK_CONDITION); + else + target_complete_cmd(cmd, SAM_STAT_GOOD); +} + +static sense_reason_t +fd_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, + enum dma_data_direction data_direction) +{ + struct se_device *se_dev = cmd->se_dev; + struct fd_dev *dev = FD_DEV(se_dev); + struct file *file = dev->fd_file; + struct scatterlist *sg; + struct kiocb *iocb; + unsigned int op; + struct iov_iter iter; + struct bio_vec *bvec; + loff_t pos = (cmd->t_task_lba * se_dev->dev_attrib.block_size); + int i; + + iocb = aio_kernel_alloc(GFP_NOIO); + if (!iocb) + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; + + if (data_direction == DMA_TO_DEVICE) + op = IOCB_CMD_WRITE_ITER; + else + op = IOCB_CMD_READ_ITER; + + bvec = cmd->priv = kcalloc(sgl_nents, sizeof(struct bio_vec), GFP_NOIO); + if (!bvec) { + aio_kernel_free(iocb); + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; + } + + for_each_sg(sgl, sg, sgl_nents, i) { + bvec[i].bv_page = sg_page(sg); + bvec[i].bv_len = sg->length; + bvec[i].bv_offset = sg->offset; + } + + iov_iter_init_bvec(&iter, bvec, sgl_nents, bvec_length(bvec, sgl_nents), 0); + aio_kernel_init_rw(iocb, file, iov_iter_count(&iter), pos); + aio_kernel_init_callback(iocb, fd_rw_aio_complete, (u64)(uintptr_t)bvec); + + return aio_kernel_submit(iocb, op, &iter); +} + static sense_reason_t fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, enum dma_data_direction data_direction) @@ -553,6 +611,9 @@ fd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, struct se_device *dev = cmd->se_dev; int ret = 0; + if (1) + return fd_rw_aio(cmd, sgl, sgl_nents, data_direction); + /* * Call vectorized fileio functions to map struct scatterlist * physical memory addresses to struct iovec virtual memory. -- To unsubscribe, send a message with 'unsubscribe linux-aio' in the body to majordomo@kvack.org. For more info on Linux AIO, see: http://www.kvack.org/aio/ Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a> ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-11-21 17:34 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-11-18 19:03 [GIT PULL] direct IO support for loop driver Dave Kleikamp 2013-11-18 19:07 ` Dave Kleikamp 2013-11-20 21:19 ` Linus Torvalds 2013-11-20 21:38 ` Linus Torvalds 2013-11-20 21:50 ` Kent Overstreet 2013-11-20 22:46 ` Dave Kleikamp 2013-11-21 4:24 ` Stephen Rothwell 2013-11-21 9:58 ` Christoph Hellwig 2013-11-21 10:06 ` Kent Overstreet 2013-11-21 10:11 ` Christoph Hellwig 2013-11-21 10:13 ` Kent Overstreet 2013-11-21 17:34 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).