* [PATCH v5 0/7] vfs: Non-blocking buffered fs read (page cache only)
@ 2014-11-05 21:14 Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 1/7] vfs: Prepare for adding a new preadv/pwritev with user flags Milosz Tanski
` (9 more replies)
0 siblings, 10 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, linux-fsdevel, linux-aio, Mel Gorman,
Volker Lendecke, Tejun Heo, Jeff Moyer, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-arch
This patchset introduces the ability to perform a non-blocking read from
regular files in buffered IO mode. This works only for those filesystems
that have data in the page cache.
It does this by introducing new syscalls, preadv2/pwritev2. These
new syscalls behave like the network sendmsg, recvmsg syscalls that accept an
extra flag argument (RWF_NONBLOCK).
It's a very common pattern today (samba, libuv, etc.) to use a large threadpool to
perform buffered IO operations. They submit the work from another thread
that performs network IO and epoll, or from other threads that perform CPU work. This
leads to increased latency for processing, esp. in the case of data that's
already cached in the page cache.
With the new interface the applications will now be able to fetch the data in
their network / cpu bound thread(s) and only defer to a threadpool if it's not
there. In our own application (VLDB) we've observed a decrease in latency for
"fast" requests by avoiding unnecessary queuing and having to swap out current
tasks in IO bound work threads.
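A minimal sketch of the calling pattern described above, not code from this series. It assumes a miss is reported as -EAGAIN, and it models the page cache with a hypothetical struct fake_cache so the sketch is self-contained; try_cached_read() stands in for preadv2(..., RWF_NONBLOCK), and submit_read(), enum read_path and the queued counter are likewise illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model of the page cache for one file: bytes [0, len) are
 * resident, everything else is treated as "would block". */
struct fake_cache {
	const char *data;
	size_t len;
};

/* Hypothetical stand-in for preadv2(..., RWF_NONBLOCK). Returns the number
 * of bytes copied, or -EAGAIN (assumed miss error) when nothing is cached. */
static ssize_t try_cached_read(const struct fake_cache *c, void *buf,
			       size_t n, size_t off)
{
	if (c->data == NULL || off >= c->len)
		return -EAGAIN;
	if (n > c->len - off)
		n = c->len - off;
	memcpy(buf, c->data + off, n);
	return (ssize_t)n;
}

enum read_path { READ_DONE_INLINE, READ_QUEUED };

/* Try the fast path in the calling (network / CPU) thread; only a miss is
 * handed to the thread pool, modeled here by bumping *queued. */
static enum read_path submit_read(const struct fake_cache *c, void *buf,
				  size_t n, size_t off, ssize_t *result,
				  int *queued)
{
	ssize_t ret;

	assert(result != NULL && queued != NULL);
	ret = try_cached_read(c, buf, n, off);
	if (ret >= 0) {
		*result = ret;
		return READ_DONE_INLINE;
	}
	(*queued)++;	/* a real caller would enqueue a blocking read here */
	return READ_QUEUED;
}
```

In a real caller the queued branch would hand the request to the existing threadpool, so the slow path stays exactly as it is today and only cached reads skip the queue.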
Version 5 highlights:
- XFS support for RWF_NONBLOCK, from Christoph.
- RWF_DSYNC flag and support for pwritev2, from Christoph.
- Implemented compat syscalls, per Jeff.
- Missing nfs, ceph changes from older patchset.
Version 4 highlights:
- Updated for 3.18-rc1.
- Performance data from our application.
- First stab at man page with Jeff's help. Patch is in-reply to.
RFC Version 3 highlights:
- Down to 2 syscalls from 4; can use fp or argument position.
- RWF_NONBLOCK value flag is not the same as O_NONBLOCK, per Jeff.
RFC Version 2 highlights:
- Put the flags argument into kiocb (less noise), per Al Viro
- O_DIRECT checking early in the process, per Jeff Moyer
- Resolved duplicate (c&p) code in syscall code, per Jeff
- Included perf data in thread cover letter, per Jeff
- Created a new flag (not O_NONBLOCK) for readv2, per Jeff
Some perf data generated using fio, comparing the posix aio engine to a version
of the posix AIO engine that attempts to perform "fast" reads before
submitting the operations to the queue. This workflow is on an ext4 partition on
raid0 (test / build-rig), simulating our database access pattern workload using
16kb read accesses. Our database uses a home-spun posix aio like queue (samba
does the same thing).
f1: ~73% rand read over mostly cached data (zipf med-size dataset)
f2: ~18% rand read over mostly un-cached data (uniform large-dataset)
f3: ~9% seq-read over large dataset
before:
f1:
bw (KB /s): min= 11, max= 9088, per=0.56%, avg=969.54, stdev=827.99
lat (msec) : 50=0.01%, 100=1.06%, 250=5.88%, 500=4.08%, 750=12.48%
lat (msec) : 1000=17.27%, 2000=49.86%, >=2000=9.42%
f2:
bw (KB /s): min= 2, max= 1882, per=0.16%, avg=273.28, stdev=220.26
lat (msec) : 250=5.65%, 500=3.31%, 750=15.64%, 1000=24.59%, 2000=46.56%
lat (msec) : >=2000=4.33%
f3:
bw (KB /s): min= 0, max=265568, per=99.95%, avg=174575.10,
stdev=34526.89
lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.27%, 50=10.82%
lat (usec) : 100=50.34%, 250=5.05%, 500=7.12%, 750=6.60%, 1000=4.55%
lat (msec) : 2=8.73%, 4=3.49%, 10=1.83%, 20=0.89%, 50=0.22%
lat (msec) : 100=0.05%, 250=0.02%, 500=0.01%
total:
READ: io=102365MB, aggrb=174669KB/s, minb=240KB/s, maxb=173599KB/s,
mint=600001msec, maxt=600113msec
after (with fast read using preadv2 before submit):
f1:
bw (KB /s): min= 3, max=14897, per=1.28%, avg=2276.69, stdev=2930.39
lat (usec) : 2=70.63%, 4=0.01%
lat (msec) : 250=0.20%, 500=2.26%, 750=1.18%, 2000=0.22%, >=2000=25.53%
f2:
bw (KB /s): min= 2, max= 2362, per=0.14%, avg=249.83, stdev=222.00
lat (msec) : 250=6.35%, 500=1.78%, 750=9.29%, 1000=20.49%, 2000=52.18%
lat (msec) : >=2000=9.99%
f3:
bw (KB /s): min= 1, max=245448, per=100.00%, avg=177366.50,
stdev=35995.60
lat (usec) : 2=64.04%, 4=0.01%, 10=0.01%, 20=0.06%, 50=0.43%
lat (usec) : 100=0.20%, 250=1.27%, 500=2.93%, 750=3.93%, 1000=7.35%
lat (msec) : 2=14.27%, 4=2.88%, 10=1.54%, 20=0.81%, 50=0.22%
lat (msec) : 100=0.05%, 250=0.02%
total:
READ: io=103941MB, aggrb=177339KB/s, minb=213KB/s, maxb=176375KB/s,
mint=600020msec, maxt=600178msec
Interpreting the results, you can see total bandwidth stays the same but overall
request latency is decreased in the f1 (random, mostly cached) and f3 (sequential)
workloads. There is a slight bump in latency for f2 since it's random data that's
unlikely to be cached but we're always trying "fast read".
In our application we have started keeping track of "fast read" hits/misses,
and for files / requests that have a low hit ratio we don't do "fast reads",
mostly getting rid of extra latency in the uncached cases. In our real world
workload we were able to reduce average response time by 20 to 30% (depends
on amount of IO done by request).
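The hit/miss bookkeeping could look roughly like the sketch below. This is not our application's code: struct fast_read_stats, both helper functions and the two thresholds (64 samples, 25% hit rate) are hypothetical values picked only to illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knobs; real values would come from measurement. */
#define FAST_READ_MIN_SAMPLES 64u	/* always try until this many attempts */
#define FAST_READ_MIN_HIT_PCT 25u	/* below this hit rate, skip fast reads */

/* Per-file (or per-request-class) counters of "fast read" outcomes. */
struct fast_read_stats {
	uint64_t hits;
	uint64_t misses;
};

/* Record the outcome of one "fast read" attempt. */
static void fast_read_record(struct fast_read_stats *s, bool hit)
{
	assert(s != NULL);
	if (hit)
		s->hits++;
	else
		s->misses++;
}

/* Decide whether the next request should attempt a "fast read" at all. */
static bool fast_read_worthwhile(const struct fast_read_stats *s)
{
	uint64_t total;

	assert(s != NULL);
	total = s->hits + s->misses;
	if (total < FAST_READ_MIN_SAMPLES)
		return true;	/* not enough history yet, keep trying */
	return s->hits * 100 >= total * FAST_READ_MIN_HIT_PCT;
}
```

A real implementation would probably also age the counters so that a file whose data later becomes cached gets another chance.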
I've performed other benchmarks and I have not observed any perf regressions in
any of the normal (old) code paths.
I have co-developed these changes with Christoph Hellwig.
Christoph Hellwig (3):
xfs: add RWF_NONBLOCK support
fs: pass iocb to generic_write_sync
fs: add a flag for per-operation O_DSYNC semantics
Milosz Tanski (4):
vfs: Prepare for adding a new preadv/pwritev with user flags.
vfs: Define new syscalls preadv2,pwritev2
x86: wire up preadv2 and pwritev2
vfs: RWF_NONBLOCK flag for preadv2
arch/x86/syscalls/syscall_32.tbl | 2 +
arch/x86/syscalls/syscall_64.tbl | 2 +
drivers/target/target_core_file.c | 6 +-
fs/block_dev.c | 8 +-
fs/btrfs/file.c | 7 +-
fs/ceph/file.c | 6 +-
fs/cifs/file.c | 14 +--
fs/direct-io.c | 8 +-
fs/ext4/file.c | 8 +-
fs/fuse/file.c | 2 +
fs/gfs2/file.c | 9 +-
fs/nfs/file.c | 15 ++-
fs/nfsd/vfs.c | 4 +-
fs/ntfs/file.c | 8 +-
fs/ocfs2/file.c | 12 +-
fs/pipe.c | 3 +-
fs/read_write.c | 239 +++++++++++++++++++++++++++++---------
fs/splice.c | 2 +-
fs/udf/file.c | 11 +-
fs/xfs/xfs_file.c | 36 ++++--
include/linux/aio.h | 2 +
include/linux/compat.h | 6 +
include/linux/fs.h | 16 ++-
include/linux/syscalls.h | 6 +
include/uapi/asm-generic/unistd.h | 6 +-
mm/filemap.c | 55 +++++++--
mm/shmem.c | 4 +
27 files changed, 346 insertions(+), 151 deletions(-)
--
1.9.1
* [PATCH v5 1/7] vfs: Prepare for adding a new preadv/pwritev with user flags.
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blocking buffered fs read (page cache only) Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2 Milosz Tanski
` (8 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, linux-fsdevel, linux-aio, Mel Gorman,
Volker Lendecke, Tejun Heo, Jeff Moyer, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-arch, linux-scsi,
linux-nfs
Plumb the flags argument through the vfs code so it can be passed down to
the __generic_file_(read/write)_iter functions that do the actual work.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
---
drivers/target/target_core_file.c | 6 +++---
fs/nfsd/vfs.c | 4 ++--
fs/read_write.c | 27 +++++++++++++++------------
fs/splice.c | 2 +-
include/linux/aio.h | 2 ++
include/linux/fs.h | 4 ++--
6 files changed, 25 insertions(+), 20 deletions(-)
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index 7d6cdda..58d9a6d 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -350,9 +350,9 @@ static int fd_do_rw(struct se_cmd *cmd, struct scatterlist *sgl,
set_fs(get_ds());
if (is_write)
- ret = vfs_writev(fd, &iov[0], sgl_nents, &pos);
+ ret = vfs_writev(fd, &iov[0], sgl_nents, &pos, 0);
else
- ret = vfs_readv(fd, &iov[0], sgl_nents, &pos);
+ ret = vfs_readv(fd, &iov[0], sgl_nents, &pos, 0);
set_fs(old_fs);
@@ -528,7 +528,7 @@ fd_execute_write_same(struct se_cmd *cmd)
old_fs = get_fs();
set_fs(get_ds());
- rc = vfs_writev(f, &iov[0], iov_num, &pos);
+ rc = vfs_writev(f, &iov[0], iov_num, &pos, 0);
set_fs(old_fs);
vfree(iov);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 989129e..ef01c78 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -872,7 +872,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
oldfs = get_fs();
set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
set_fs(oldfs);
return nfsd_finish_read(file, count, host_err);
}
@@ -960,7 +960,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
- host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+ host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
set_fs(oldfs);
if (host_err < 0)
goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index 7d9318c..94b2d34 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -653,7 +653,8 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
EXPORT_SYMBOL(iov_shorten);
static ssize_t do_iter_readv_writev(struct file *filp, int rw, const struct iovec *iov,
- unsigned long nr_segs, size_t len, loff_t *ppos, iter_fn_t fn)
+ unsigned long nr_segs, size_t len, loff_t *ppos, iter_fn_t fn,
+ int flags)
{
struct kiocb kiocb;
struct iov_iter iter;
@@ -662,6 +663,7 @@ static ssize_t do_iter_readv_writev(struct file *filp, int rw, const struct iove
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = *ppos;
kiocb.ki_nbytes = len;
+ kiocb.ki_rwflags = flags;
iov_iter_init(&iter, rw, iov, nr_segs, len);
ret = fn(&kiocb, &iter);
@@ -800,7 +802,8 @@ out:
static ssize_t do_readv_writev(int type, struct file *file,
const struct iovec __user * uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
size_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -834,7 +837,7 @@ static ssize_t do_readv_writev(int type, struct file *file,
if (iter_fn)
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
- pos, iter_fn);
+ pos, iter_fn, flags);
else if (fnv)
ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
pos, fnv);
@@ -857,27 +860,27 @@ out:
}
ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
- return do_readv_writev(READ, file, vec, vlen, pos);
+ return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_readv);
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
- return do_readv_writev(WRITE, file, vec, vlen, pos);
+ return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_writev);
@@ -890,7 +893,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -910,7 +913,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -942,7 +945,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -966,7 +969,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -1014,7 +1017,7 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
if (iter_fn)
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
- pos, iter_fn);
+ pos, iter_fn, 0);
else if (fnv)
ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
pos, fnv);
diff --git a/fs/splice.c b/fs/splice.c
index f5cb9ba..9591b9f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -576,7 +576,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
old_fs = get_fs();
set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+ res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
set_fs(old_fs);
return res;
diff --git a/include/linux/aio.h b/include/linux/aio.h
index d9c92da..9c1d499 100644
--- a/include/linux/aio.h
+++ b/include/linux/aio.h
@@ -52,6 +52,8 @@ struct kiocb {
* this is the underlying eventfd context to deliver events to.
*/
struct eventfd_ctx *ki_eventfd;
+
+ int ki_rwflags;
};
static inline bool is_sync_kiocb(struct kiocb *kiocb)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a957d43..9ed5711 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1538,9 +1538,9 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb);
--
1.9.1
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: aart@kvack.org
* [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blocking buffered fs read (page cache only) Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 1/7] vfs: Prepare for adding a new preadv/pwritev with user flags Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-06 23:25 ` Jeff Moyer
2014-11-05 21:14 ` [PATCH v5 3/7] x86: wire up preadv2 and pwritev2 Milosz Tanski
` (7 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, linux-fsdevel, linux-aio, Mel Gorman,
Volker Lendecke, Tejun Heo, Jeff Moyer, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-arch, linux-mm
New syscalls that take a flags argument. This change does not add any specific
flags.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
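For readers of the syscall entry points in this patch, here is a userspace sketch of the offset convention they rely on: the 64-bit position is passed as two unsigned long halves, and a reassembled value of -1 selects the current file position (the readv/writev-style path) instead of an explicit offset. split_pos(), join_pos() and uses_current_file_pos() are hypothetical helper names; only the shift-twice arithmetic mirrors pos_from_hilo() in fs/read_write.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half the width of a long; shifting twice by this amount avoids an
 * undefined full-width shift when long is already 64 bits. */
#define HALF_LONG_BITS ((unsigned)(sizeof(unsigned long) * CHAR_BIT / 2))

/* Split a 64-bit position into the (low, high) pair a caller would pass. */
static void split_pos(int64_t pos, unsigned long *low, unsigned long *high)
{
	uint64_t u = (uint64_t)pos;

	assert(low != NULL && high != NULL);
	*low = (unsigned long)u;
	*high = (unsigned long)((u >> HALF_LONG_BITS) >> HALF_LONG_BITS);
}

/* Reassemble the position the same way pos_from_hilo() does. */
static int64_t join_pos(unsigned long high, unsigned long low)
{
	uint64_t u = (((uint64_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;

	return (int64_t)u;
}

/* True when the pair encodes -1, i.e. "use and update the file position". */
static bool uses_current_file_pos(unsigned long high, unsigned long low)
{
	return join_pos(high, low) == -1;
}
```

On a 64-bit build the high half is always zero after the split, which matches the double shift in the kernel helper discarding it.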
fs/read_write.c | 176 ++++++++++++++++++++++++++++++--------
include/linux/compat.h | 6 ++
include/linux/syscalls.h | 6 ++
include/uapi/asm-generic/unistd.h | 6 +-
mm/filemap.c | 5 +-
5 files changed, 158 insertions(+), 41 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 94b2d34..907735c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -866,6 +866,8 @@ ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
+ if (flags & ~0)
+ return -EINVAL;
return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
@@ -879,21 +881,23 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
+ if (flags & ~0)
+ return -EINVAL;
return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_writev);
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+ ret = vfs_readv(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -905,15 +909,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+ ret = vfs_writev(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -931,10 +935,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
}
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, loff_t pos, int flags)
{
- loff_t pos = pos_from_hilo(pos_h, pos_l);
struct fd f;
ssize_t ret = -EBADF;
@@ -945,7 +948,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+ ret = vfs_readv(f.file, vec, vlen, &pos, flags);
fdput(f);
}
@@ -955,10 +958,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, loff_t pos, int flags)
{
- loff_t pos = pos_from_hilo(pos_h, pos_l);
struct fd f;
ssize_t ret = -EBADF;
@@ -969,7 +971,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+ ret = vfs_writev(f.file, vec, vlen, &pos, flags);
fdput(f);
}
@@ -979,11 +981,63 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen)
+{
+ return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen)
+{
+ return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+ int, flags)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ if (pos == -1)
+ return do_readv(fd, vec, vlen, flags);
+
+ return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+ int, flags)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ if (pos == -1)
+ return do_writev(fd, vec, vlen, flags);
+
+ return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
#ifdef CONFIG_COMPAT
static ssize_t compat_do_readv_writev(int type, struct file *file,
const struct compat_iovec __user *uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos, int flags)
{
compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -1017,7 +1071,7 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
if (iter_fn)
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
- pos, iter_fn, 0);
+ pos, iter_fn, flags);
else if (fnv)
ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
pos, fnv);
@@ -1041,7 +1095,7 @@ out:
static size_t compat_readv(struct file *file,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
ssize_t ret = -EBADF;
@@ -1052,7 +1106,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
out:
if (ret > 0)
@@ -1061,9 +1115,9 @@ out:
return ret;
}
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
- const struct compat_iovec __user *,vec,
- compat_ulong_t, vlen)
+static size_t __compat_sys_readv(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret;
@@ -1072,28 +1126,34 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
if (!f.file)
return -EBADF;
pos = f.file->f_pos;
- ret = compat_readv(f.file, vec, vlen, &pos);
+ ret = compat_readv(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
f.file->f_pos = pos;
fdput_pos(f);
return ret;
+
+}
+
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen)
+{
+ return __compat_sys_readv(fd, vec, vlen, 0);
}
static long __compat_sys_preadv64(unsigned long fd,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos)
+ unsigned long vlen, loff_t pos, int flags)
{
struct fd f;
ssize_t ret;
- if (pos < 0)
- return -EINVAL;
f = fdget(fd);
if (!f.file)
return -EBADF;
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = compat_readv(f.file, vec, vlen, &pos);
+ ret = compat_readv(f.file, vec, vlen, &pos, flags);
fdput(f);
return ret;
}
@@ -1103,7 +1163,10 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
const struct compat_iovec __user *,vec,
unsigned long, vlen, loff_t, pos)
{
- return __compat_sys_preadv64(fd, vec, vlen, pos);
+ if (pos < 0)
+ return -EINVAL;
+
+ return __compat_sys_preadv64(fd, vec, vlen, pos, 0);
}
#endif
@@ -1113,12 +1176,28 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
{
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
- return __compat_sys_preadv64(fd, vec, vlen, pos);
+ if (pos < 0)
+ return -EINVAL;
+
+ return __compat_sys_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+ int, flags)
+{
+ loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+ if (pos == -1)
+ return __compat_sys_readv(fd, vec, vlen, flags);
+
+ return __compat_sys_preadv64(fd, vec, vlen, pos, flags);
}
static size_t compat_writev(struct file *file,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
ssize_t ret = -EBADF;
@@ -1129,7 +1208,7 @@ static size_t compat_writev(struct file *file,
if (!(file->f_mode & FMODE_CAN_WRITE))
goto out;
- ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, flags);
out:
if (ret > 0)
@@ -1138,9 +1217,9 @@ out:
return ret;
}
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
- const struct compat_iovec __user *, vec,
- compat_ulong_t, vlen)
+static size_t __compat_sys_writev(compat_ulong_t fd,
+ const struct compat_iovec __user* vec,
+ compat_ulong_t vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret;
@@ -1149,28 +1228,36 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
if (!f.file)
return -EBADF;
pos = f.file->f_pos;
- ret = compat_writev(f.file, vec, vlen, &pos);
+ ret = compat_writev(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
f.file->f_pos = pos;
fdput_pos(f);
return ret;
}
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+ const struct compat_iovec __user *, vec,
+ compat_ulong_t, vlen)
+{
+ return __compat_sys_writev(fd, vec, vlen, 0);
+}
+
static long __compat_sys_pwritev64(unsigned long fd,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos)
+ unsigned long vlen, loff_t pos, int flags)
{
struct fd f;
ssize_t ret;
if (pos < 0)
return -EINVAL;
+
f = fdget(fd);
if (!f.file)
return -EBADF;
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = compat_writev(f.file, vec, vlen, &pos);
+ ret = compat_writev(f.file, vec, vlen, &pos, flags);
fdput(f);
return ret;
}
@@ -1180,7 +1267,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
const struct compat_iovec __user *,vec,
unsigned long, vlen, loff_t, pos)
{
- return __compat_sys_pwritev64(fd, vec, vlen, pos);
+ return __compat_sys_pwritev64(fd, vec, vlen, pos, 0);
}
#endif
@@ -1190,8 +1277,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
{
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
- return __compat_sys_pwritev64(fd, vec, vlen, pos);
+ return __compat_sys_pwritev64(fd, vec, vlen, pos, 0);
}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+ loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+ if (pos == -1)
+ return __compat_sys_writev(fd, vec, vlen, flags);
+
+ return __compat_sys_pwritev64(fd, vec, vlen, pos, flags);
+}
+
#endif
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e649426..63a94e2 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
#ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index bda9b81..cedc22e 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -571,8 +571,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
size_t count, loff_t pos);
asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+ int flags);
asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+ int flags);
asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
asmlinkage long sys_chdir(const char __user *filename);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 22749c1..9406018 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -213,6 +213,10 @@ __SC_COMP(__NR_pwrite64, sys_pwrite64, compat_sys_pwrite64)
__SC_COMP(__NR_preadv, sys_preadv, compat_sys_preadv)
#define __NR_pwritev 70
__SC_COMP(__NR_pwritev, sys_pwritev, compat_sys_pwritev)
+#define __NR_preadv2 281
+__SC_COMP(__NR_preadv2, sys_preadv2, compat_sys_preadv2)
+#define __NR_pwritev2 282
+__SC_COMP(__NR_pwritev2, sys_pwritev2, compat_sys_pwritev2)
/* fs/sendfile.c */
#define __NR3264_sendfile 71
@@ -709,7 +713,7 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
__SYSCALL(__NR_bpf, sys_bpf)
#undef __NR_syscalls
-#define __NR_syscalls 281
+#define __NR_syscalls 283
/*
* All syscalls below here should go away really,
diff --git a/mm/filemap.c b/mm/filemap.c
index 14b4642..530c263 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1457,6 +1457,7 @@ static void shrink_readahead_size_eio(struct file *filp,
* @ppos: current file position
* @iter: data destination
* @written: already copied
+ * @flags: optional flags
*
* This is a generic file read routine, and uses the
* mapping->a_ops->readpage() function for the actual low-level stuff.
@@ -1465,7 +1466,7 @@ static void shrink_readahead_size_eio(struct file *filp,
* of the logic when it comes to error handling etc.
*/
static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
- struct iov_iter *iter, ssize_t written)
+ struct iov_iter *iter, ssize_t written, int flags)
{
struct address_space *mapping = filp->f_mapping;
struct inode *inode = mapping->host;
@@ -1735,7 +1736,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
}
}
- retval = do_generic_file_read(file, ppos, iter, retval);
+ retval = do_generic_file_read(file, ppos, iter, retval, iocb->ki_rwflags);
out:
return retval;
}
--
1.9.1
* [PATCH v5 3/7] x86: wire up preadv2 and pwritev2
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blocking buffered fs read (page cache only) Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 1/7] vfs: Prepare for adding a new preadv/pwritev with user flags Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2 Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 4/7] vfs: RWF_NONBLOCK flag for preadv2 Milosz Tanski
` (6 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, linux-fsdevel, linux-aio, Mel Gorman,
Volker Lendecke, Tejun Heo, Jeff Moyer, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-arch, x86,
linux-x86_64
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/syscalls/syscall_32.tbl | 2 ++
arch/x86/syscalls/syscall_64.tbl | 2 ++
2 files changed, 4 insertions(+)
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d..d592d87 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,5 @@
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
+358 i386 preadv2 sys_preadv2
+359 i386 pwritev2 sys_pwritev2
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b..7be2447 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,8 @@
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
+322 64 preadv2 sys_preadv2
+323 64 pwritev2 sys_pwritev2
#
# x32-specific system call numbers start at 512 to avoid cache impact
--
1.9.1
* [PATCH v5 4/7] vfs: RWF_NONBLOCK flag for preadv2
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
` (2 preceding siblings ...)
2014-11-05 21:14 ` [PATCH v5 3/7] x86: wire up preadv2 and pwritev2 Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-10 16:07 ` Sage Weil
2014-11-05 21:14 ` [PATCH v5 5/7] xfs: add RWF_NONBLOCK support Milosz Tanski
` (5 subsequent siblings)
9 siblings, 1 reply; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, linux-fsdevel, linux-aio, Mel Gorman,
Volker Lendecke, Tejun Heo, Jeff Moyer, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-arch, ceph-devel,
linux-cifs, samba-technical, linux-nfs, linux-xfs, ocfs2-devel,
linux-mm
generic_file_read_iter() supports a new flag RWF_NONBLOCK which says that we
only want to read the data if it's already in the page cache.
Additionally, a few filesystems have to bail out early when RWF_NONBLOCK is
set because the operation would block. Christoph Hellwig contributed this
code.
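
As a rough userspace illustration of the calling pattern this flag is meant
to enable (not part of the patch): try the page-cache-only read first, and
hand the request to a slow path only when it reports EAGAIN. The names
read_flags_fn, read_fast_or_defer, SKETCH_RWF_NONBLOCK and the two stubs are
hypothetical; the stubs stand in for a real preadv2() wrapper so the sketch
does not depend on a patched kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Flag value as defined in this patch; the SKETCH_ prefix marks it as local. */
#define SKETCH_RWF_NONBLOCK 0x00000001

/* Hypothetical "positional read with flags" primitive, e.g. a preadv2() wrapper. */
typedef ssize_t (*read_flags_fn)(int fd, void *buf, size_t len, off_t off,
				 int flags);

enum read_outcome { READ_DONE, READ_DEFERRED, READ_FAILED };

/*
 * Attempt a non-blocking read. EAGAIN means the data is not in the page
 * cache (or the filesystem bailed early), so the caller should defer the
 * request to a thread that is allowed to block.
 */
static enum read_outcome
read_fast_or_defer(read_flags_fn rd, int fd, void *buf, size_t len, off_t off,
		   ssize_t *nread)
{
	ssize_t n = rd(fd, buf, len, off, SKETCH_RWF_NONBLOCK);

	if (n >= 0) {
		*nread = n;
		return READ_DONE;
	}
	return errno == EAGAIN ? READ_DEFERRED : READ_FAILED;
}

/* Stub: behaves as if the whole range were already cached. */
static ssize_t stub_cached(int fd, void *buf, size_t len, off_t off, int flags)
{
	(void)fd; (void)off; (void)flags;
	memset(buf, 0xfe, len);
	return (ssize_t)len;
}

/* Stub: behaves as if nothing were cached, so a non-blocking read fails. */
static ssize_t stub_uncached(int fd, void *buf, size_t len, off_t off, int flags)
{
	(void)fd; (void)buf; (void)len; (void)off;
	if (flags & SKETCH_RWF_NONBLOCK) {
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

A caller on the network thread would act on READ_DONE immediately and queue
the request to its thread pool on READ_DEFERRED.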
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
---
fs/ceph/file.c | 2 ++
fs/cifs/file.c | 6 ++++++
fs/nfs/file.c | 5 ++++-
fs/ocfs2/file.c | 6 ++++++
fs/pipe.c | 3 ++-
fs/read_write.c | 38 +++++++++++++++++++++++++-------------
fs/xfs/xfs_file.c | 4 ++++
include/linux/fs.h | 3 +++
mm/filemap.c | 18 ++++++++++++++++++
mm/shmem.c | 4 ++++
10 files changed, 74 insertions(+), 15 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index d7e0da8..b798b5c 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -822,6 +822,8 @@ again:
if ((got & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0 ||
(iocb->ki_filp->f_flags & O_DIRECT) ||
(fi->flags & CEPH_F_SYNC)) {
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
dout("aio_sync_read %p %llx.%llx %llu~%u got cap refs on %s\n",
inode, ceph_vinop(inode), iocb->ki_pos, (unsigned)len,
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3e4d00a..c485afa 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3005,6 +3005,9 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
struct cifs_readdata *rdata, *tmp;
struct list_head rdata_list;
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
+
len = iov_iter_count(to);
if (!len)
return 0;
@@ -3123,6 +3126,9 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0))
return generic_file_read_iter(iocb, to);
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
+
/*
* We need to hold the sem to be sure nobody modifies lock list
* with a brlock that prevents reading.
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 2ab6f00..aa9046f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -171,8 +171,11 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
struct inode *inode = file_inode(iocb->ki_filp);
ssize_t result;
- if (iocb->ki_filp->f_flags & O_DIRECT)
+ if (iocb->ki_filp->f_flags & O_DIRECT) {
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
return nfs_file_direct_read(iocb, to, iocb->ki_pos);
+ }
dprintk("NFS: read(%pD2, %zu@%lu)\n",
iocb->ki_filp,
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 324dc93..bb66ca4 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2472,6 +2472,12 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
filp->f_path.dentry->d_name.name,
to->nr_segs); /* GRRRRR */
+ /*
+ * No non-blocking reads for ocfs2 for now. Might be doable with
+ * non-blocking cluster lock helpers.
+ */
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
if (!inode) {
ret = -EINVAL;
diff --git a/fs/pipe.c b/fs/pipe.c
index 21981e5..212bf68 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -302,7 +302,8 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
*/
if (ret)
break;
- if (filp->f_flags & O_NONBLOCK) {
+ if ((filp->f_flags & O_NONBLOCK) ||
+ (iocb->ki_rwflags & RWF_NONBLOCK)) {
ret = -EAGAIN;
break;
}
diff --git a/fs/read_write.c b/fs/read_write.c
index 907735c..cba7d4c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -835,14 +835,19 @@ static ssize_t do_readv_writev(int type, struct file *file,
file_start_write(file);
}
- if (iter_fn)
+ if (iter_fn) {
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
pos, iter_fn, flags);
- else if (fnv)
- ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
- pos, fnv);
- else
- ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
+ } else {
+ if (type == READ && (flags & RWF_NONBLOCK))
+ return -EAGAIN;
+
+ if (fnv)
+ ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
+ pos, fnv);
+ else
+ ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
+ }
if (type != READ)
file_end_write(file);
@@ -866,8 +871,10 @@ ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
- if (flags & ~0)
+ if (flags & ~RWF_NONBLOCK)
return -EINVAL;
+ if ((file->f_flags & O_DIRECT) && (flags & RWF_NONBLOCK))
+ return -EAGAIN;
return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
@@ -1069,14 +1076,19 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
file_start_write(file);
}
- if (iter_fn)
+ if (iter_fn) {
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
pos, iter_fn, flags);
- else if (fnv)
- ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
- pos, fnv);
- else
- ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
+ } else {
+ if (type == READ && (flags & RWF_NONBLOCK))
+ return -EAGAIN;
+
+ if (fnv)
+ ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
+ pos, fnv);
+ else
+ ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
+ }
if (type != READ)
file_end_write(file);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index eb596b4..b1f6334 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -246,6 +246,10 @@ xfs_file_read_iter(
XFS_STATS_INC(xs_read_calls);
+ /* XXX: need a non-blocking iolock helper, shouldn't be too hard */
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
+
if (unlikely(file->f_flags & O_DIRECT))
ioflags |= XFS_IO_ISDIRECT;
if (file->f_mode & FMODE_NOCMTIME)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ed5711..eaebd99 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1459,6 +1459,9 @@ struct block_device_operations;
#define HAVE_COMPAT_IOCTL 1
#define HAVE_UNLOCKED_IOCTL 1
+/* These flags are used for the readv/writev syscalls with flags. */
+#define RWF_NONBLOCK 0x00000001
+
struct iov_iter;
struct file_operations {
diff --git a/mm/filemap.c b/mm/filemap.c
index 530c263..09d3af3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1494,6 +1494,8 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
find_page:
page = find_get_page(mapping, index);
if (!page) {
+ if (flags & RWF_NONBLOCK)
+ goto would_block;
page_cache_sync_readahead(mapping,
ra, filp,
index, last_index - index);
@@ -1585,6 +1587,11 @@ page_ok:
continue;
page_not_up_to_date:
+ if (flags & RWF_NONBLOCK) {
+ page_cache_release(page);
+ goto would_block;
+ }
+
/* Get exclusive access to the page ... */
error = lock_page_killable(page);
if (unlikely(error))
@@ -1604,6 +1611,12 @@ page_not_up_to_date_locked:
goto page_ok;
}
+ if (flags & RWF_NONBLOCK) {
+ unlock_page(page);
+ page_cache_release(page);
+ goto would_block;
+ }
+
readpage:
/*
* A previous I/O error may have been due to temporary
@@ -1674,6 +1687,8 @@ no_cached_page:
goto readpage;
}
+would_block:
+ error = -EAGAIN;
out:
ra->prev_pos = prev_index;
ra->prev_pos <<= PAGE_CACHE_SHIFT;
@@ -1707,6 +1722,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
size_t count = iov_iter_count(iter);
loff_t size;
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
+
if (!count)
goto out; /* skip atime */
size = i_size_read(inode);
diff --git a/mm/shmem.c b/mm/shmem.c
index cd6fc75..5c30f04 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1531,6 +1531,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
ssize_t retval = 0;
loff_t *ppos = &iocb->ki_pos;
+ /* XXX: should be easily supportable */
+ if (iocb->ki_rwflags & RWF_NONBLOCK)
+ return -EAGAIN;
+
/*
* Might this read be for a stacking filesystem? Then when reading
* holes of a sparse file, we actually need to allocate those pages,
--
1.9.1
* [PATCH v5 5/7] xfs: add RWF_NONBLOCK support
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
` (3 preceding siblings ...)
2014-11-05 21:14 ` [PATCH v5 4/7] vfs: RWF_NONBLOCK flag for preadv2 Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 6/7] fs: pass iocb to generic_write_sync Milosz Tanski
` (4 subsequent siblings)
9 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, Christoph Hellwig, linux-fsdevel, linux-aio,
Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, linux-xfs
From: Christoph Hellwig <hch@lst.de>
Add support for non-blocking reads. The guts are handled by the generic
code, the only addition is a non-blocking variant of xfs_rw_ilock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_file.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index b1f6334..0655915 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -61,6 +61,25 @@ xfs_rw_ilock(
xfs_ilock(ip, type);
}
+static inline bool
+xfs_rw_ilock_nowait(
+ struct xfs_inode *ip,
+ int type)
+{
+ if (type & XFS_IOLOCK_EXCL) {
+ if (!mutex_trylock(&VFS_I(ip)->i_mutex))
+ return false;
+ if (!xfs_ilock_nowait(ip, type)) {
+ mutex_unlock(&VFS_I(ip)->i_mutex);
+ return false;
+ }
+ } else {
+ if (!xfs_ilock_nowait(ip, type))
+ return false;
+ }
+ return true;
+}
+
static inline void
xfs_rw_iunlock(
struct xfs_inode *ip,
@@ -246,10 +265,6 @@ xfs_file_read_iter(
XFS_STATS_INC(xs_read_calls);
- /* XXX: need a non-blocking iolock helper, shouldn't be too hard */
- if (iocb->ki_rwflags & RWF_NONBLOCK)
- return -EAGAIN;
-
if (unlikely(file->f_flags & O_DIRECT))
ioflags |= XFS_IO_ISDIRECT;
if (file->f_mode & FMODE_NOCMTIME)
@@ -287,7 +302,14 @@ xfs_file_read_iter(
* This allows the normal direct IO case of no page cache pages to
* proceeed concurrently without serialisation.
*/
- xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
+ if (iocb->ki_rwflags & RWF_NONBLOCK) {
+ if (ioflags & XFS_IO_ISDIRECT)
+ return -EAGAIN;
+ if (!xfs_rw_ilock_nowait(ip, XFS_IOLOCK_SHARED))
+ return -EAGAIN;
+ } else {
+ xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
+ }
if ((ioflags & XFS_IO_ISDIRECT) && inode->i_mapping->nrpages) {
xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
--
1.9.1
* [PATCH v5 6/7] fs: pass iocb to generic_write_sync
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
` (4 preceding siblings ...)
2014-11-05 21:14 ` [PATCH v5 5/7] xfs: add RWF_NONBLOCK support Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-06 10:18 ` [Cluster-devel] " Steven Whitehouse
` (2 more replies)
2014-11-05 21:14 ` [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics Milosz Tanski
` (3 subsequent siblings)
9 siblings, 3 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, Christoph Hellwig, linux-fsdevel, linux-aio,
Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, linux-btrfs, linux-cifs, linux-ext4, linux-ntfs-dev,
linux-xfs, cluster-devel
From: Christoph Hellwig <hch@lst.de>
Clean up the generic_write_sync by just passing an iocb and a bytes
written / negative errno argument. In addition to simplifying the
callers this also prepares for passing a per-operation O_DSYNC
flag. Two callers didn't quite fit that scheme:
- dio_complete didn't bother to update ki_pos as we don't need it
on an iocb that is about to be freed, so we had to add it. Additionally
it also synced out written data in the error case, which has been
changed to operate like the other callers.
- gfs2 also used generic_write_sync to implement a crude version
of fallocate. It has been switched to use an open coded variant
instead.
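
The new calling convention (pass in the byte count or negative errno, get
back the value to return) can be modelled in userspace like this. It is a
sketch only: model_iocb, model_write_sync, sync_range_fn and the stubs are
hypothetical stand-ins, and want_sync collapses the O_DSYNC / IS_SYNC checks
into a single field.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for the state the helper reads from the iocb. */
struct model_iocb {
	off_t pos;		/* position after the write, like ki_pos */
	int   want_sync;	/* non-zero when sync semantics apply */
};

/* Hypothetical range-sync callback; returns 0 or a negative errno. */
typedef int (*sync_range_fn)(off_t start, off_t end);

/*
 * Errors and zero-length writes pass straight through. A successful write
 * is synced over [pos - count, pos - 1]; a sync failure replaces the count.
 */
static ssize_t model_write_sync(const struct model_iocb *iocb, ssize_t count,
				sync_range_fn sync_range)
{
	if (count > 0 && iocb->want_sync) {
		int err = sync_range(iocb->pos - count, iocb->pos - 1);

		if (err < 0)
			return err;
	}
	return count;
}

/* Stubs used to exercise the model: one records the range, one fails. */
static off_t last_start = -1, last_end = -1;

static int stub_sync_ok(off_t start, off_t end)
{
	last_start = start;
	last_end = end;
	return 0;
}

static int stub_sync_fail(off_t start, off_t end)
{
	(void)start; (void)end;
	return -EIO;
}
```

With this shape a caller reduces to a single line such as
"ret = model_write_sync(&iocb, ret, sync_fn);", which is the simplification
the hunks below apply to each filesystem.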
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/block_dev.c | 8 +-------
fs/btrfs/file.c | 7 ++-----
fs/cifs/file.c | 8 +-------
fs/direct-io.c | 8 ++------
fs/ext4/file.c | 8 +-------
fs/gfs2/file.c | 9 +++++++--
fs/ntfs/file.c | 8 ++------
fs/udf/file.c | 11 ++---------
fs/xfs/xfs_file.c | 8 +-------
include/linux/fs.h | 8 +-------
mm/filemap.c | 30 ++++++++++++++++++++----------
11 files changed, 40 insertions(+), 73 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index cc9d411..c529b1c 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1568,18 +1568,12 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
*/
ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
- struct file *file = iocb->ki_filp;
struct blk_plug plug;
ssize_t ret;
blk_start_plug(&plug);
ret = __generic_file_write_iter(iocb, from);
- if (ret > 0) {
- ssize_t err;
- err = generic_write_sync(file, iocb->ki_pos - ret, ret);
- if (err < 0)
- ret = err;
- }
+ ret = generic_write_sync(iocb, ret);
blk_finish_plug(&plug);
return ret;
}
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a18ceab..4f4a6f7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1820,11 +1820,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
*/
BTRFS_I(inode)->last_trans = root->fs_info->generation + 1;
BTRFS_I(inode)->last_sub_trans = root->log_transid;
- if (num_written > 0) {
- err = generic_write_sync(file, pos, num_written);
- if (err < 0)
- num_written = err;
- }
+
+ num_written = generic_write_sync(iocb, num_written);
if (sync)
atomic_dec(&BTRFS_I(inode)->sync_writers);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index c485afa..32359de 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2706,13 +2706,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
rc = __generic_file_write_iter(iocb, from);
mutex_unlock(&inode->i_mutex);
- if (rc > 0) {
- ssize_t err;
-
- err = generic_write_sync(file, iocb->ki_pos - rc, rc);
- if (err < 0)
- rc = err;
- }
+ rc = generic_write_sync(iocb, rc);
} else {
mutex_unlock(&inode->i_mutex);
}
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..b72ac83 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -257,12 +257,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
inode_dio_done(dio->inode);
if (is_async) {
if (dio->rw & WRITE) {
- int err;
-
- err = generic_write_sync(dio->iocb->ki_filp, offset,
- transferred);
- if (err < 0 && ret > 0)
- ret = err;
+ dio->iocb->ki_pos = offset + transferred;
+ ret = generic_write_sync(dio->iocb, ret);
}
aio_complete(dio->iocb, ret, 0);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index aca7b24..79b000c 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -175,13 +175,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
ret = __generic_file_write_iter(iocb, from);
mutex_unlock(&inode->i_mutex);
- if (ret > 0) {
- ssize_t err;
-
- err = generic_write_sync(file, iocb->ki_pos - ret, ret);
- if (err < 0)
- ret = err;
- }
+ ret = generic_write_sync(iocb, ret);
if (o_direct)
blk_finish_plug(&plug);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 80dd44d..3fafeca 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -895,8 +895,13 @@ retry:
gfs2_quota_unlock(ip);
}
- if (error == 0)
- error = generic_write_sync(file, pos, count);
+ if (error)
+ goto out_unlock;
+
+ if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) {
+ error = vfs_fsync_range(file, pos, pos + count - 1,
+ (file->f_flags & __O_SYNC) ? 0 : 1);
+ }
goto out_unlock;
out_trans_fail:
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 643faa4..4f3d664 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -2127,12 +2127,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
mutex_lock(&inode->i_mutex);
ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
mutex_unlock(&inode->i_mutex);
- if (ret > 0) {
- int err = generic_write_sync(file, iocb->ki_pos - ret, ret);
- if (err < 0)
- ret = err;
- }
- return ret;
+
+ return generic_write_sync(iocb, ret);
}
/**
diff --git a/fs/udf/file.c b/fs/udf/file.c
index bb15771..1cdabd0 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -155,16 +155,9 @@ static ssize_t udf_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
retval = __generic_file_write_iter(iocb, from);
mutex_unlock(&inode->i_mutex);
- if (retval > 0) {
- ssize_t err;
-
+ if (retval > 0)
mark_inode_dirty(inode);
- err = generic_write_sync(file, iocb->ki_pos - retval, retval);
- if (err < 0)
- retval = err;
- }
-
- return retval;
+ return generic_write_sync(iocb, retval);
}
long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0655915..a8cab66 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -792,14 +792,8 @@ xfs_file_write_iter(
ret = xfs_file_buffered_aio_write(iocb, from);
if (ret > 0) {
- ssize_t err;
-
XFS_STATS_ADD(xs_write_bytes, ret);
-
- /* Handle various SYNC-type writes */
- err = generic_write_sync(file, iocb->ki_pos - ret, ret);
- if (err < 0)
- ret = err;
+ ret = generic_write_sync(iocb, ret);
}
return ret;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eaebd99..7d0e116 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2242,13 +2242,7 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
int datasync);
extern int vfs_fsync(struct file *file, int datasync);
-static inline int generic_write_sync(struct file *file, loff_t pos, loff_t count)
-{
- if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
- return 0;
- return vfs_fsync_range(file, pos, pos + count - 1,
- (file->f_flags & __O_SYNC) ? 0 : 1);
-}
+extern int generic_write_sync(struct kiocb *iocb, loff_t count);
extern void emergency_sync(void);
extern void emergency_remount(void);
#ifdef CONFIG_BLOCK
diff --git a/mm/filemap.c b/mm/filemap.c
index 09d3af3..6107058 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2664,6 +2664,24 @@ out:
}
EXPORT_SYMBOL(__generic_file_write_iter);
+int generic_write_sync(struct kiocb *iocb, loff_t count)
+{
+ struct file *file = iocb->ki_filp;
+
+ if (count > 0 &&
+ ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
+ bool fdatasync = !(file->f_flags & __O_SYNC);
+ ssize_t ret = 0;
+
+ ret = vfs_fsync_range(file, iocb->ki_pos - count,
+ iocb->ki_pos - 1, fdatasync);
+ if (ret < 0)
+ return ret;
+ }
+ return count;
+}
+EXPORT_SYMBOL(generic_write_sync);
+
/**
* generic_file_write_iter - write data to a file
* @iocb: IO state structure
@@ -2675,22 +2693,14 @@ EXPORT_SYMBOL(__generic_file_write_iter);
*/
ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
ssize_t ret;
mutex_lock(&inode->i_mutex);
ret = __generic_file_write_iter(iocb, from);
mutex_unlock(&inode->i_mutex);
- if (ret > 0) {
- ssize_t err;
-
- err = generic_write_sync(file, iocb->ki_pos - ret, ret);
- if (err < 0)
- ret = err;
- }
- return ret;
+ return generic_write_sync(iocb, ret);
}
EXPORT_SYMBOL(generic_file_write_iter);
--
1.9.1
* [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
` (5 preceding siblings ...)
2014-11-05 21:14 ` [PATCH v5 6/7] fs: pass iocb to generic_write_sync Milosz Tanski
@ 2014-11-05 21:14 ` Milosz Tanski
2014-11-06 23:46 ` Jeff Moyer
2014-11-10 16:07 ` [PATCH v5 7/7] fs: " Sage Weil
[not found] ` <cover.1415220890.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
` (2 subsequent siblings)
9 siblings, 2 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-05 21:14 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, Christoph Hellwig, linux-fsdevel, linux-aio,
Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, ceph-devel, fuse-devel, linux-nfs, ocfs2-devel,
linux-mm
From: Christoph Hellwig <hch@lst.de>
With the new read/write with flags syscalls we can support a flag
to enable O_DSYNC semantics on a per-operation basis. This is
useful to implement protocols like SMB, NFS or SCSI that have such
per-operation flags.
Example program below:
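cat > pwritev2.c << EOF
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
/* Fallbacks assumed from this series (x86-64 syscall table, fs.h). */
#ifndef __NR_pwritev2
#define __NR_pwritev2 323
#endif
#ifndef RWF_DSYNC
#define RWF_DSYNC 0x00000002
#endif
/* Split a file offset into the low/high pair the syscall expects. */
#define LO_HI_LONG(val) \
	(off_t) val, \
	(off_t) ((((uint64_t) (val)) >> (sizeof (long) * 4)) >> (sizeof (long) * 4))
static ssize_t
pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset, int flags)
{
	return syscall(__NR_pwritev2, fd, iov, iovcnt, LO_HI_LONG(offset),
		       flags);
}
int main(int argc, char **argv)
{
	int fd;
	char buf[1024];
	struct iovec iov = { .iov_base = buf, .iov_len = 1024 };
	ssize_t ret;
	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0666);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0xfe, sizeof(buf));
	/* Write one buffer with per-operation O_DSYNC semantics. */
	ret = pwritev2(fd, &iov, 1, 0, RWF_DSYNC);
	if (ret < 0)
		perror("pwritev2");
	else
		printf("ret = %zd\n", ret);
	return 0;
}
EOF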
Signed-off-by: Christoph Hellwig <hch@lst.de>
[milosz@adfin.com: added flag check to compat_do_readv_writev()]
Signed-off-by: Milosz Tanski <milosz@adfin.com>
---
fs/ceph/file.c | 4 +++-
fs/fuse/file.c | 2 ++
fs/nfs/file.c | 10 ++++++----
fs/ocfs2/file.c | 6 ++++--
fs/read_write.c | 20 +++++++++++++++-----
include/linux/fs.h | 3 ++-
mm/filemap.c | 4 +++-
7 files changed, 35 insertions(+), 14 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index b798b5c..2d4e15a 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -983,7 +983,9 @@ retry_snap:
ceph_put_cap_refs(ci, got);
if (written >= 0 &&
- ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host) ||
+ ((file->f_flags & O_SYNC) ||
+ IS_SYNC(file->f_mapping->host) ||
+ (iocb->ki_rwflags & RWF_DSYNC) ||
ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
err = vfs_fsync_range(file, pos, pos + written - 1, 1);
if (err < 0)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index caa8d95..bb4fb23 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1248,6 +1248,8 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
written += written_buffered;
iocb->ki_pos = pos + written_buffered;
} else {
+ if (iocb->ki_rwflags & RWF_DSYNC)
+ return -EINVAL;
written = fuse_perform_write(file, mapping, from, pos);
if (written >= 0)
iocb->ki_pos = pos + written;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index aa9046f..c59b0b7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -652,13 +652,15 @@ static const struct vm_operations_struct nfs_file_vm_ops = {
.remap_pages = generic_file_remap_pages,
};
-static int nfs_need_sync_write(struct file *filp, struct inode *inode)
+static int nfs_need_sync_write(struct kiocb *iocb, struct inode *inode)
{
struct nfs_open_context *ctx;
- if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
+ if (IS_SYNC(inode) ||
+ (iocb->ki_filp->f_flags & O_DSYNC) ||
+ (iocb->ki_rwflags & RWF_DSYNC))
return 1;
- ctx = nfs_file_open_context(filp);
+ ctx = nfs_file_open_context(iocb->ki_filp);
if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags) ||
nfs_ctx_key_to_expire(ctx))
return 1;
@@ -705,7 +707,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
written = result;
/* Return error values for O_DSYNC and IS_SYNC() */
- if (result >= 0 && nfs_need_sync_write(file, inode)) {
+ if (result >= 0 && nfs_need_sync_write(iocb, inode)) {
int err = vfs_fsync(file, 0);
if (err < 0)
result = err;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index bb66ca4..8f9a86b 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2374,8 +2374,10 @@ out_dio:
/* buffered aio wouldn't have proper lock coverage today */
BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
- if (((file->f_flags & O_DSYNC) && !direct_io) || IS_SYNC(inode) ||
- ((file->f_flags & O_DIRECT) && !direct_io)) {
+ if (((file->f_flags & O_DSYNC) && !direct_io) ||
+ IS_SYNC(inode) ||
+ ((file->f_flags & O_DIRECT) && !direct_io) ||
+ (iocb->ki_rwflags & RWF_DSYNC)) {
ret = filemap_fdatawrite_range(file->f_mapping, *ppos,
*ppos + count - 1);
if (ret < 0)
diff --git a/fs/read_write.c b/fs/read_write.c
index cba7d4c..3443265 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -839,8 +839,13 @@ static ssize_t do_readv_writev(int type, struct file *file,
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
pos, iter_fn, flags);
} else {
- if (type == READ && (flags & RWF_NONBLOCK))
- return -EAGAIN;
+ if (type == READ) {
+ if (flags & RWF_NONBLOCK)
+ return -EAGAIN;
+ } else {
+ if (flags & RWF_DSYNC)
+ return -EINVAL;
+ }
if (fnv)
ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
@@ -888,7 +893,7 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
- if (flags & ~0)
+ if (flags & ~RWF_DSYNC)
return -EINVAL;
return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
@@ -1080,8 +1085,13 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
pos, iter_fn, flags);
} else {
- if (type == READ && (flags & RWF_NONBLOCK))
- return -EAGAIN;
+ if (type == READ) {
+ if (flags & RWF_NONBLOCK)
+ return -EAGAIN;
+ } else {
+ if (flags & RWF_DSYNC)
+ return -EINVAL;
+ }
if (fnv)
ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7d0e116..7786b88 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1460,7 +1460,8 @@ struct block_device_operations;
#define HAVE_UNLOCKED_IOCTL 1
/* These flags are used for the readv/writev syscalls with flags. */
-#define RWF_NONBLOCK 0x00000001
+#define RWF_NONBLOCK 0x00000001
+#define RWF_DSYNC 0x00000002
struct iov_iter;
diff --git a/mm/filemap.c b/mm/filemap.c
index 6107058..4fbef99 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2669,7 +2669,9 @@ int generic_write_sync(struct kiocb *iocb, loff_t count)
struct file *file = iocb->ki_filp;
if (count > 0 &&
- ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
+ ((file->f_flags & O_DSYNC) ||
+ (iocb->ki_rwflags & RWF_DSYNC) ||
+ IS_SYNC(file->f_mapping->host))) {
bool fdatasync = !(file->f_flags & __O_SYNC);
ssize_t ret = 0;
--
1.9.1
* Re: [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only)
[not found] ` <cover.1415220890.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
@ 2014-11-06 7:56 ` Christoph Hellwig
2014-11-06 15:46 ` Milosz Tanski
0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2014-11-06 7:56 UTC (permalink / raw)
To: Milosz Tanski
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-aio-Bw31MaZKKs3YtjvyW6yDsg, Mel Gorman, Volker Lendecke,
Tejun Heo, Jeff Moyer, Theodore Ts'o, Al Viro,
linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk,
linux-arch-u79uwXL29TY76Z2rM5mHXA
This series looks good. Do you also have a man page snippet to document
the new syscalls?
* Re: [Cluster-devel] [PATCH v5 6/7] fs: pass iocb to generic_write_sync
2014-11-05 21:14 ` [PATCH v5 6/7] fs: pass iocb to generic_write_sync Milosz Tanski
@ 2014-11-06 10:18 ` Steven Whitehouse
2014-11-06 10:52 ` [Linux-NTFS-Dev] " Anton Altaparmakov
2014-11-06 12:04 ` Jan Kara
2 siblings, 0 replies; 27+ messages in thread
From: Steven Whitehouse @ 2014-11-06 10:18 UTC (permalink / raw)
To: Milosz Tanski, linux-kernel
Cc: linux-arch, linux-aio, Volker Lendecke, Theodore Ts'o,
linux-xfs, linux-cifs, linux-ntfs-dev, linux-api, Tejun Heo,
Jeff Moyer, cluster-devel, Mel Gorman, linux-fsdevel,
Michael Kerrisk, linux-ext4, Christoph Hellwig, linux-btrfs,
Al Viro, Andrew Price
Hi,
On 05/11/14 21:14, Milosz Tanski wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> Clean up the generic_write_sync by just passing an iocb and a bytes
> written / negative errno argument. In addition to simplifying the
> callers this also prepares for passing a per-operation O_DSYNC
> flag. Two callers didn't quite fit that scheme:
>
> - dio_complete didn't both to update ki_pos as we don't need it
> on a iocb that is about to be freed, so we had to add it. Additionally
> it also synced out written data in the error case, which has been
> changed to operate like the other callers.
> - gfs2 also used generic_write_sync to implement a crude version
> of fallocate. It has been switched to use an open coded variant
> instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
GFS2 bits:
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
I know that Andy Price has some work in this area too, so in due course
we'll have to be careful not to create a merge conflict here. Copying in
Andy so he can see the changes,
Steve.
> ---
> fs/block_dev.c | 8 +-------
> fs/btrfs/file.c | 7 ++-----
> fs/cifs/file.c | 8 +-------
> fs/direct-io.c | 8 ++------
> fs/ext4/file.c | 8 +-------
> fs/gfs2/file.c | 9 +++++++--
> fs/ntfs/file.c | 8 ++------
> fs/udf/file.c | 11 ++---------
> fs/xfs/xfs_file.c | 8 +-------
> include/linux/fs.h | 8 +-------
> mm/filemap.c | 30 ++++++++++++++++++++----------
> 11 files changed, 40 insertions(+), 73 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index cc9d411..c529b1c 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1568,18 +1568,12 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> */
> ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> - struct file *file = iocb->ki_filp;
> struct blk_plug plug;
> ssize_t ret;
>
> blk_start_plug(&plug);
> ret = __generic_file_write_iter(iocb, from);
> - if (ret > 0) {
> - ssize_t err;
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> + ret = generic_write_sync(iocb, ret);
> blk_finish_plug(&plug);
> return ret;
> }
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a18ceab..4f4a6f7 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1820,11 +1820,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> */
> BTRFS_I(inode)->last_trans = root->fs_info->generation + 1;
> BTRFS_I(inode)->last_sub_trans = root->log_transid;
> - if (num_written > 0) {
> - err = generic_write_sync(file, pos, num_written);
> - if (err < 0)
> - num_written = err;
> - }
> +
> + num_written = generic_write_sync(iocb, num_written);
>
> if (sync)
> atomic_dec(&BTRFS_I(inode)->sync_writers);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index c485afa..32359de 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2706,13 +2706,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
> rc = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (rc > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - rc, rc);
> - if (err < 0)
> - rc = err;
> - }
> + rc = generic_write_sync(iocb, rc);
> } else {
> mutex_unlock(&inode->i_mutex);
> }
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index e181b6b..b72ac83 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -257,12 +257,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
> inode_dio_done(dio->inode);
> if (is_async) {
> if (dio->rw & WRITE) {
> - int err;
> -
> - err = generic_write_sync(dio->iocb->ki_filp, offset,
> - transferred);
> - if (err < 0 && ret > 0)
> - ret = err;
> + dio->iocb->ki_pos = offset + transferred;
> + ret = generic_write_sync(dio->iocb, ret);
> }
>
> aio_complete(dio->iocb, ret, 0);
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index aca7b24..79b000c 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -175,13 +175,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ret = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (ret > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> + ret = generic_write_sync(iocb, ret);
> if (o_direct)
> blk_finish_plug(&plug);
>
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 80dd44d..3fafeca 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -895,8 +895,13 @@ retry:
> gfs2_quota_unlock(ip);
> }
>
> - if (error == 0)
> - error = generic_write_sync(file, pos, count);
> + if (error)
> + goto out_unlock;
> +
> + if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) {
> + error = vfs_fsync_range(file, pos, pos + count - 1,
> + (file->f_flags & __O_SYNC) ? 0 : 1);
> + }
> goto out_unlock;
>
> out_trans_fail:
> diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
> index 643faa4..4f3d664 100644
> --- a/fs/ntfs/file.c
> +++ b/fs/ntfs/file.c
> @@ -2127,12 +2127,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> mutex_lock(&inode->i_mutex);
> ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
> mutex_unlock(&inode->i_mutex);
> - if (ret > 0) {
> - int err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> - return ret;
> +
> + return generic_write_sync(iocb, ret);
> }
>
> /**
> diff --git a/fs/udf/file.c b/fs/udf/file.c
> index bb15771..1cdabd0 100644
> --- a/fs/udf/file.c
> +++ b/fs/udf/file.c
> @@ -155,16 +155,9 @@ static ssize_t udf_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> retval = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (retval > 0) {
> - ssize_t err;
> -
> + if (retval > 0)
> mark_inode_dirty(inode);
> - err = generic_write_sync(file, iocb->ki_pos - retval, retval);
> - if (err < 0)
> - retval = err;
> - }
> -
> - return retval;
> + return generic_write_sync(iocb, retval);
> }
>
> long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0655915..a8cab66 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -792,14 +792,8 @@ xfs_file_write_iter(
> ret = xfs_file_buffered_aio_write(iocb, from);
>
> if (ret > 0) {
> - ssize_t err;
> -
> XFS_STATS_ADD(xs_write_bytes, ret);
> -
> - /* Handle various SYNC-type writes */
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> + ret = generic_write_sync(iocb, ret);
> }
> return ret;
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index eaebd99..7d0e116 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2242,13 +2242,7 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
> extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
> int datasync);
> extern int vfs_fsync(struct file *file, int datasync);
> -static inline int generic_write_sync(struct file *file, loff_t pos, loff_t count)
> -{
> - if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
> - return 0;
> - return vfs_fsync_range(file, pos, pos + count - 1,
> - (file->f_flags & __O_SYNC) ? 0 : 1);
> -}
> +extern int generic_write_sync(struct kiocb *iocb, loff_t count);
> extern void emergency_sync(void);
> extern void emergency_remount(void);
> #ifdef CONFIG_BLOCK
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 09d3af3..6107058 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2664,6 +2664,24 @@ out:
> }
> EXPORT_SYMBOL(__generic_file_write_iter);
>
> +int generic_write_sync(struct kiocb *iocb, loff_t count)
> +{
> + struct file *file = iocb->ki_filp;
> +
> + if (count > 0 &&
> + ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
> + bool fdatasync = !(file->f_flags & __O_SYNC);
> + ssize_t ret = 0;
> +
> + ret = vfs_fsync_range(file, iocb->ki_pos - count,
> + iocb->ki_pos - 1, fdatasync);
> + if (ret < 0)
> + return ret;
> + }
> + return count;
> +}
> +EXPORT_SYMBOL(generic_write_sync);
> +
> /**
> * generic_file_write_iter - write data to a file
> * @iocb: IO state structure
> @@ -2675,22 +2693,14 @@ EXPORT_SYMBOL(__generic_file_write_iter);
> */
> ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> - struct file *file = iocb->ki_filp;
> - struct inode *inode = file->f_mapping->host;
> + struct inode *inode = iocb->ki_filp->f_mapping->host;
> ssize_t ret;
>
> mutex_lock(&inode->i_mutex);
> ret = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (ret > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> - return ret;
> + return generic_write_sync(iocb, ret);
> }
> EXPORT_SYMBOL(generic_file_write_iter);
>
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Linux-NTFS-Dev] [PATCH v5 6/7] fs: pass iocb to generic_write_sync
2014-11-05 21:14 ` [PATCH v5 6/7] fs: pass iocb to generic_write_sync Milosz Tanski
2014-11-06 10:18 ` [Cluster-devel] " Steven Whitehouse
@ 2014-11-06 10:52 ` Anton Altaparmakov
2014-11-06 16:14 ` Milosz Tanski
2014-11-06 12:04 ` Jan Kara
2 siblings, 1 reply; 27+ messages in thread
From: Anton Altaparmakov @ 2014-11-06 10:52 UTC (permalink / raw)
To: Milosz Tanski
Cc: Linux Kernel Mailing List, linux-arch, linux-aio, Volker Lendecke,
Theodore Ts'o, linux-xfs, linux-cifs, linux-ntfs-dev,
linux-api, Christoph Hellwig, Tejun Heo, Jeff Moyer,
cluster-devel, Mel Gorman, linux-fsdevel, Michael Kerrisk,
linux-ext4, Christoph Hellwig, linux-btrfs, Al Viro
Hi,
> On 5 Nov 2014, at 23:14, Milosz Tanski <milosz@adfin.com> wrote:
>
> From: Christoph Hellwig <hch@lst.de>
>
> Clean up the generic_write_sync by just passing an iocb and a bytes
> written / negative errno argument. In addition to simplifying the
> callers this also prepares for passing a per-operation O_DSYNC
> flag. Two callers didn't quite fit that scheme:
>
> - dio_complete didn't bother to update ki_pos as we don't need it
> on an iocb that is about to be freed, so we had to add it. Additionally
> it also synced out written data in the error case, which has been
> changed to operate like the other callers.
> - gfs2 also used generic_write_sync to implement a crude version
> of fallocate. It has been switched to use an open coded variant
> instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/block_dev.c | 8 +-------
> fs/btrfs/file.c | 7 ++-----
> fs/cifs/file.c | 8 +-------
> fs/direct-io.c | 8 ++------
> fs/ext4/file.c | 8 +-------
> fs/gfs2/file.c | 9 +++++++--
> fs/ntfs/file.c | 8 ++------
> fs/udf/file.c | 11 ++---------
> fs/xfs/xfs_file.c | 8 +-------
> include/linux/fs.h | 8 +-------
> mm/filemap.c | 30 ++++++++++++++++++++----------
> 11 files changed, 40 insertions(+), 73 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index cc9d411..c529b1c 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1568,18 +1568,12 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> */
> ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> - struct file *file = iocb->ki_filp;
> struct blk_plug plug;
> ssize_t ret;
>
> blk_start_plug(&plug);
> ret = __generic_file_write_iter(iocb, from);
> - if (ret > 0) {
> - ssize_t err;
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> + ret = generic_write_sync(iocb, ret);
> blk_finish_plug(&plug);
> return ret;
> }
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a18ceab..4f4a6f7 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1820,11 +1820,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> */
> BTRFS_I(inode)->last_trans = root->fs_info->generation + 1;
> BTRFS_I(inode)->last_sub_trans = root->log_transid;
> - if (num_written > 0) {
> - err = generic_write_sync(file, pos, num_written);
> - if (err < 0)
> - num_written = err;
> - }
> +
> + num_written = generic_write_sync(iocb, num_written);
>
> if (sync)
> atomic_dec(&BTRFS_I(inode)->sync_writers);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index c485afa..32359de 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2706,13 +2706,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
> rc = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (rc > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - rc, rc);
> - if (err < 0)
> - rc = err;
> - }
> + rc = generic_write_sync(iocb, rc);
> } else {
> mutex_unlock(&inode->i_mutex);
> }
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index e181b6b..b72ac83 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -257,12 +257,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
> inode_dio_done(dio->inode);
> if (is_async) {
> if (dio->rw & WRITE) {
> - int err;
> -
> - err = generic_write_sync(dio->iocb->ki_filp, offset,
> - transferred);
> - if (err < 0 && ret > 0)
> - ret = err;
> + dio->iocb->ki_pos = offset + transferred;
> + ret = generic_write_sync(dio->iocb, ret);
> }
>
> aio_complete(dio->iocb, ret, 0);
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index aca7b24..79b000c 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -175,13 +175,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ret = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (ret > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> + ret = generic_write_sync(iocb, ret);
> if (o_direct)
> blk_finish_plug(&plug);
>
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 80dd44d..3fafeca 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -895,8 +895,13 @@ retry:
> gfs2_quota_unlock(ip);
> }
>
> - if (error == 0)
> - error = generic_write_sync(file, pos, count);
> + if (error)
> + goto out_unlock;
> +
> + if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) {
> + error = vfs_fsync_range(file, pos, pos + count - 1,
> + (file->f_flags & __O_SYNC) ? 0 : 1);
> + }
> goto out_unlock;
>
> out_trans_fail:
> diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
> index 643faa4..4f3d664 100644
> --- a/fs/ntfs/file.c
> +++ b/fs/ntfs/file.c
> @@ -2127,12 +2127,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> mutex_lock(&inode->i_mutex);
> ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
> mutex_unlock(&inode->i_mutex);
> - if (ret > 0) {
> - int err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> - return ret;
> +
> + return generic_write_sync(iocb, ret);
> }
>
> /**
> diff --git a/fs/udf/file.c b/fs/udf/file.c
> index bb15771..1cdabd0 100644
> --- a/fs/udf/file.c
> +++ b/fs/udf/file.c
> @@ -155,16 +155,9 @@ static ssize_t udf_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> retval = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (retval > 0) {
> - ssize_t err;
> -
> + if (retval > 0)
> mark_inode_dirty(inode);
> - err = generic_write_sync(file, iocb->ki_pos - retval, retval);
> - if (err < 0)
> - retval = err;
> - }
> -
> - return retval;
> + return generic_write_sync(iocb, retval);
> }
>
> long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0655915..a8cab66 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -792,14 +792,8 @@ xfs_file_write_iter(
> ret = xfs_file_buffered_aio_write(iocb, from);
>
> if (ret > 0) {
> - ssize_t err;
> -
> XFS_STATS_ADD(xs_write_bytes, ret);
> -
> - /* Handle various SYNC-type writes */
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> + ret = generic_write_sync(iocb, ret);
> }
> return ret;
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index eaebd99..7d0e116 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2242,13 +2242,7 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
> extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
> int datasync);
> extern int vfs_fsync(struct file *file, int datasync);
> -static inline int generic_write_sync(struct file *file, loff_t pos, loff_t count)
> -{
> - if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
> - return 0;
> - return vfs_fsync_range(file, pos, pos + count - 1,
> - (file->f_flags & __O_SYNC) ? 0 : 1);
> -}
> +extern int generic_write_sync(struct kiocb *iocb, loff_t count);
> extern void emergency_sync(void);
> extern void emergency_remount(void);
> #ifdef CONFIG_BLOCK
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 09d3af3..6107058 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2664,6 +2664,24 @@ out:
> }
> EXPORT_SYMBOL(__generic_file_write_iter);
>
> +int generic_write_sync(struct kiocb *iocb, loff_t count)
> +{
> + struct file *file = iocb->ki_filp;
> +
> + if (count > 0 &&
> + ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
> + bool fdatasync = !(file->f_flags & __O_SYNC);
> + ssize_t ret = 0;
That "= 0" is pointless. "ret" is overwritten unconditionally on the following line...
Other than that the NTFS bits are:
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Best regards,
Anton
> +
> + ret = vfs_fsync_range(file, iocb->ki_pos - count,
> + iocb->ki_pos - 1, fdatasync);
> + if (ret < 0)
> + return ret;
> + }
> + return count;
> +}
> +EXPORT_SYMBOL(generic_write_sync);
> +
> /**
> * generic_file_write_iter - write data to a file
> * @iocb: IO state structure
> @@ -2675,22 +2693,14 @@ EXPORT_SYMBOL(__generic_file_write_iter);
> */
> ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> - struct file *file = iocb->ki_filp;
> - struct inode *inode = file->f_mapping->host;
> + struct inode *inode = iocb->ki_filp->f_mapping->host;
> ssize_t ret;
>
> mutex_lock(&inode->i_mutex);
> ret = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (ret > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> - return ret;
> + return generic_write_sync(iocb, ret);
> }
> EXPORT_SYMBOL(generic_file_write_iter);
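To make the new calling convention concrete, here is a user-space model of the helper with the redundant initializer folded into the declaration, as suggested above. This is a sketch under stated assumptions: struct model_iocb, model_fsync_range and model_write_sync are hypothetical stand-ins, and the wants_sync field collapses the O_DSYNC / IS_SYNC test into one boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct kiocb; only what the sketch needs. */
struct model_iocb {
    off_t ki_pos;        /* file position after the write completed */
    bool  wants_sync;    /* models (f_flags & O_DSYNC) || IS_SYNC(inode) */
    int   sync_error;    /* 0, or the negative errno the fake sync returns */
    off_t synced_start;  /* range recorded by the fake sync */
    off_t synced_end;
};

/* Fake range sync: records the inclusive range and returns the preset result. */
int model_fsync_range(struct model_iocb *iocb, off_t start, off_t end)
{
    iocb->synced_start = start;
    iocb->synced_end = end;
    return iocb->sync_error;
}

/*
 * Takes "bytes written or negative errno" and returns the same value,
 * unless a required sync fails, in which case the sync error wins.
 */
ssize_t model_write_sync(struct model_iocb *iocb, ssize_t count)
{
    assert(iocb != NULL);
    if (count > 0 && iocb->wants_sync) {
        /* No "= 0" needed: the value is assigned right here. */
        int ret = model_fsync_range(iocb, iocb->ki_pos - count,
                                    iocb->ki_pos - 1);
        if (ret < 0)
            return ret;
    }
    return count;
}
```

Because errors and zero-length writes pass straight through, callers can write "ret = helper(iocb, ret)" without the "if (ret > 0)" wrapper that the patch removes.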
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 6/7] fs: pass iocb to generic_write_sync
2014-11-05 21:14 ` [PATCH v5 6/7] fs: pass iocb to generic_write_sync Milosz Tanski
2014-11-06 10:18 ` [Cluster-devel] " Steven Whitehouse
2014-11-06 10:52 ` [Linux-NTFS-Dev] " Anton Altaparmakov
@ 2014-11-06 12:04 ` Jan Kara
2 siblings, 0 replies; 27+ messages in thread
From: Jan Kara @ 2014-11-06 12:04 UTC (permalink / raw)
To: Milosz Tanski
Cc: linux-kernel, Christoph Hellwig, Christoph Hellwig, linux-fsdevel,
linux-aio, Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, linux-btrfs, linux-cifs, linux-ext4, linux-ntfs-dev,
linux-xfs, cluster-devel
On Wed 05-11-14 16:14:52, Milosz Tanski wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> Clean up the generic_write_sync by just passing an iocb and a bytes
> written / negative errno argument. In addition to simplifying the
> callers this also prepares for passing a per-operation O_DSYNC
> flag. Two callers didn't quite fit that scheme:
>
> - dio_complete didn't bother to update ki_pos as we don't need it
> on an iocb that is about to be freed, so we had to add it. Additionally
> it also synced out written data in the error case, which has been
> changed to operate like the other callers.
> - gfs2 also used generic_write_sync to implement a crude version
> of fallocate. It has been switched to use an open coded variant
> instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
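The behavioural change in dio_complete described in the commit message can be sketched in user space as follows. This is only a model: struct dio_outcome, dio_finish_old and dio_finish_new are hypothetical names, and the sketch assumes the file requires a sync (for example, it was opened with O_DSYNC), so the only question is when the sync is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical record of what a completion path did. */
struct dio_outcome {
    ssize_t ret;     /* value reported to the submitter */
    bool    synced;  /* whether a range sync was attempted */
};

/*
 * Old behaviour: the sync ran regardless of ret, but a sync error only
 * replaced ret when the write itself had succeeded.
 */
struct dio_outcome dio_finish_old(ssize_t ret, int sync_err)
{
    struct dio_outcome out = { .ret = ret, .synced = true };

    assert(sync_err <= 0);
    if (sync_err < 0 && ret > 0)
        out.ret = sync_err;
    return out;
}

/*
 * New behaviour: like the other callers, the sync is skipped entirely
 * when ret is an error or zero.
 */
struct dio_outcome dio_finish_new(ssize_t ret, int sync_err)
{
    struct dio_outcome out = { .ret = ret, .synced = false };

    assert(sync_err <= 0);
    if (ret > 0) {
        out.synced = true;
        if (sync_err < 0)
            out.ret = sync_err;
    }
    return out;
}
```

In this model the value returned to the submitter is the same in both versions; what differs is that the error path no longer triggers a sync.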
> ---
> fs/block_dev.c | 8 +-------
> fs/btrfs/file.c | 7 ++-----
> fs/cifs/file.c | 8 +-------
> fs/direct-io.c | 8 ++------
> fs/ext4/file.c | 8 +-------
> fs/gfs2/file.c | 9 +++++++--
> fs/ntfs/file.c | 8 ++------
> fs/udf/file.c | 11 ++---------
> fs/xfs/xfs_file.c | 8 +-------
> include/linux/fs.h | 8 +-------
> mm/filemap.c | 30 ++++++++++++++++++++----------
> 11 files changed, 40 insertions(+), 73 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index cc9d411..c529b1c 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1568,18 +1568,12 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
> */
> ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> - struct file *file = iocb->ki_filp;
> struct blk_plug plug;
> ssize_t ret;
>
> blk_start_plug(&plug);
> ret = __generic_file_write_iter(iocb, from);
> - if (ret > 0) {
> - ssize_t err;
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> + ret = generic_write_sync(iocb, ret);
> blk_finish_plug(&plug);
> return ret;
> }
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index a18ceab..4f4a6f7 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1820,11 +1820,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> */
> BTRFS_I(inode)->last_trans = root->fs_info->generation + 1;
> BTRFS_I(inode)->last_sub_trans = root->log_transid;
> - if (num_written > 0) {
> - err = generic_write_sync(file, pos, num_written);
> - if (err < 0)
> - num_written = err;
> - }
> +
> + num_written = generic_write_sync(iocb, num_written);
>
> if (sync)
> atomic_dec(&BTRFS_I(inode)->sync_writers);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index c485afa..32359de 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -2706,13 +2706,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
> rc = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (rc > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - rc, rc);
> - if (err < 0)
> - rc = err;
> - }
> + rc = generic_write_sync(iocb, rc);
> } else {
> mutex_unlock(&inode->i_mutex);
> }
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index e181b6b..b72ac83 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -257,12 +257,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
> inode_dio_done(dio->inode);
> if (is_async) {
> if (dio->rw & WRITE) {
> - int err;
> -
> - err = generic_write_sync(dio->iocb->ki_filp, offset,
> - transferred);
> - if (err < 0 && ret > 0)
> - ret = err;
> + dio->iocb->ki_pos = offset + transferred;
> + ret = generic_write_sync(dio->iocb, ret);
> }
>
> aio_complete(dio->iocb, ret, 0);
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index aca7b24..79b000c 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -175,13 +175,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> ret = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (ret > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> + ret = generic_write_sync(iocb, ret);
> if (o_direct)
> blk_finish_plug(&plug);
>
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 80dd44d..3fafeca 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -895,8 +895,13 @@ retry:
> gfs2_quota_unlock(ip);
> }
>
> - if (error == 0)
> - error = generic_write_sync(file, pos, count);
> + if (error)
> + goto out_unlock;
> +
> + if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) {
> + error = vfs_fsync_range(file, pos, pos + count - 1,
> + (file->f_flags & __O_SYNC) ? 0 : 1);
> + }
> goto out_unlock;
>
> out_trans_fail:
> diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
> index 643faa4..4f3d664 100644
> --- a/fs/ntfs/file.c
> +++ b/fs/ntfs/file.c
> @@ -2127,12 +2127,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> mutex_lock(&inode->i_mutex);
> ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
> mutex_unlock(&inode->i_mutex);
> - if (ret > 0) {
> - int err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> - return ret;
> +
> + return generic_write_sync(iocb, ret);
> }
>
> /**
> diff --git a/fs/udf/file.c b/fs/udf/file.c
> index bb15771..1cdabd0 100644
> --- a/fs/udf/file.c
> +++ b/fs/udf/file.c
> @@ -155,16 +155,9 @@ static ssize_t udf_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> retval = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (retval > 0) {
> - ssize_t err;
> -
> + if (retval > 0)
> mark_inode_dirty(inode);
> - err = generic_write_sync(file, iocb->ki_pos - retval, retval);
> - if (err < 0)
> - retval = err;
> - }
> -
> - return retval;
> + return generic_write_sync(iocb, retval);
> }
>
> long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0655915..a8cab66 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -792,14 +792,8 @@ xfs_file_write_iter(
> ret = xfs_file_buffered_aio_write(iocb, from);
>
> if (ret > 0) {
> - ssize_t err;
> -
> XFS_STATS_ADD(xs_write_bytes, ret);
> -
> - /* Handle various SYNC-type writes */
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> + ret = generic_write_sync(iocb, ret);
> }
> return ret;
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index eaebd99..7d0e116 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2242,13 +2242,7 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
> extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
> int datasync);
> extern int vfs_fsync(struct file *file, int datasync);
> -static inline int generic_write_sync(struct file *file, loff_t pos, loff_t count)
> -{
> - if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
> - return 0;
> - return vfs_fsync_range(file, pos, pos + count - 1,
> - (file->f_flags & __O_SYNC) ? 0 : 1);
> -}
> +extern int generic_write_sync(struct kiocb *iocb, loff_t count);
> extern void emergency_sync(void);
> extern void emergency_remount(void);
> #ifdef CONFIG_BLOCK
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 09d3af3..6107058 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2664,6 +2664,24 @@ out:
> }
> EXPORT_SYMBOL(__generic_file_write_iter);
>
> +int generic_write_sync(struct kiocb *iocb, loff_t count)
> +{
> + struct file *file = iocb->ki_filp;
> +
> + if (count > 0 &&
> + ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
> + bool fdatasync = !(file->f_flags & __O_SYNC);
> + ssize_t ret = 0;
> +
> + ret = vfs_fsync_range(file, iocb->ki_pos - count,
> + iocb->ki_pos - 1, fdatasync);
> + if (ret < 0)
> + return ret;
> + }
> + return count;
> +}
> +EXPORT_SYMBOL(generic_write_sync);
> +
> /**
> * generic_file_write_iter - write data to a file
> * @iocb: IO state structure
> @@ -2675,22 +2693,14 @@ EXPORT_SYMBOL(__generic_file_write_iter);
> */
> ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> - struct file *file = iocb->ki_filp;
> - struct inode *inode = file->f_mapping->host;
> + struct inode *inode = iocb->ki_filp->f_mapping->host;
> ssize_t ret;
>
> mutex_lock(&inode->i_mutex);
> ret = __generic_file_write_iter(iocb, from);
> mutex_unlock(&inode->i_mutex);
>
> - if (ret > 0) {
> - ssize_t err;
> -
> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
> - if (err < 0)
> - ret = err;
> - }
> - return ret;
> + return generic_write_sync(iocb, ret);
> }
> EXPORT_SYMBOL(generic_file_write_iter);
>
> --
> 1.9.1
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 1/2] Add preadv2/pwritev2 documentation.
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
` (7 preceding siblings ...)
[not found] ` <cover.1415220890.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
@ 2014-11-06 15:44 ` Milosz Tanski
[not found] ` <d2cbc4795f774b521e13ac448d07a1156c6aa04d.1415288353.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
2014-11-06 16:16 ` [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
9 siblings, 1 reply; 27+ messages in thread
From: Milosz Tanski @ 2014-11-06 15:44 UTC (permalink / raw)
To: linux-kernel
Cc: Christoph Hellwig, linux-fsdevel, linux-aio, Mel Gorman,
Volker Lendecke, Tejun Heo, Jeff Moyer, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-man
New syscalls that are variations on preadv/pwritev but accept an extra
flags argument.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Suggested-by: Jeff Moyer <jmoyer@redhat.com>
Fixes: Jeff Moyer <jmoyer@redhat.com>
---
man2/readv.2 | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 61 insertions(+), 10 deletions(-)
diff --git a/man2/readv.2 b/man2/readv.2
index 8748efa..31b3870 100644
--- a/man2/readv.2
+++ b/man2/readv.2
@@ -45,6 +45,12 @@ readv, writev, preadv, pwritev \- read or write data into multiple buffers
.sp
.BI "ssize_t pwritev(int " fd ", const struct iovec *" iov ", int " iovcnt ,
.BI " off_t " offset );
+.sp
+.BI "ssize_t preadv2(int " fd ", const struct iovec *" iov ", int " iovcnt ,
+.BI " off_t " offset ", int " flags );
+.sp
+.BI "ssize_t pwritev2(int " fd ", const struct iovec *" iov ", int " iovcnt ,
+.BI " off_t " offset ", int " flags );
.fi
.sp
.in -4n
@@ -162,9 +168,9 @@ The
system call combines the functionality of
.BR writev ()
and
-.BR pwrite (2).
+.BR pwrite (2) "."
It performs the same task as
-.BR writev (),
+.BR writev () ","
but adds a fourth argument,
.IR offset ,
which specifies the file offset at which the output operation
@@ -174,15 +180,41 @@ The file offset is not changed by these system calls.
The file referred to by
.I fd
must be capable of seeking.
+.SS preadv2() and pwritev2()
+
+This pair of system calls has similar functionality to the
+.BR preadv ()
+and
+.BR pwritev ()
+calls, but adds a fifth argument, \fIflags\fP, which modifies the behavior on a per call basis.
+
+Like the
+.BR preadv ()
+and
+.BR pwritev ()
+calls, they accept an \fIoffset\fP argument. Unlike those calls, if the \fIoffset\fP argument is set to -1 then the current file offset is used and updated.
+
+The \fIflags\fP argument to
+.BR preadv2 ()
+and
+.BR pwritev2 ()
+contains a bitwise OR of one or more of the following flags:
+.TP
+.BR RWF_NONBLOCK " (only " preadv2() " since Linux 3.19)"
+Performs a non-blocking operation for regular files (not sockets) opened in buffered mode (not
+.BR O_DIRECT ")."
+
.SH RETURN VALUE
On success,
-.BR readv ()
-and
+.BR readv () ","
.BR preadv ()
-return the number of bytes read;
-.BR writev ()
and
+.BR preadv2 ()
+return the number of bytes read;
+.BR writev () ","
.BR pwritev ()
+and
+.BR pwritev2 ()
return the number of bytes written.
On error, \-1 is returned, and \fIerrno\fP is set appropriately.
.SH ERRORS
@@ -191,12 +223,22 @@ The errors are as given for
and
.BR write (2).
Furthermore,
-.BR preadv ()
-and
+.BR preadv () ","
+.BR preadv2 () ","
.BR pwritev ()
+and
+.BR pwritev2 ()
can also fail for the same reasons as
.BR lseek (2).
-Additionally, the following error is defined:
+Additionally, the following errors are defined:
+.TP
+.B EAGAIN
+The operation would block. This is possible if the file descriptor \fIfd\fP refers to a socket and has been marked nonblocking
+.RB ( O_NONBLOCK ),
+or the operation is a
+.BR preadv2
+and the \fIflags\fP argument is set to
+.BR RWF_NONBLOCK.
.TP
.B EINVAL
The sum of the
@@ -205,12 +247,17 @@ values overflows an
.I ssize_t
value.
Or, the vector count \fIiovcnt\fP is less than zero or greater than the
-permitted maximum.
+permitted maximum. Or, an unknown flag is specified in \fIflags\fP.
.SH VERSIONS
.BR preadv ()
and
.BR pwritev ()
first appeared in Linux 2.6.30; library support was added in glibc 2.10.
+.sp
+.BR preadv2 ()
+and
+.BR pwritev2 ()
+first appeared in Linux 3.19 (if we're lucky);
.SH CONFORMING TO
.BR readv (),
.BR writev ():
@@ -223,6 +270,10 @@ first appeared in Linux 2.6.30; library support was added in glibc 2.10.
.BR preadv (),
.BR pwritev ():
nonstandard, but present also on the modern BSDs.
+.sp
+.BR preadv2 (),
+.BR pwritev2 ():
+nonstandard, Linux extension.
.SH NOTES
.SS C library/kernel ABI differences
POSIX.1-2001 allows an implementation to place a limit on
--
1.9.1
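The intended usage pattern for RWF_NONBLOCK (try the read in the calling thread, hand off to a thread pool only on EAGAIN) might look like the sketch below. Everything here is an assumption for illustration: the syscall is passed in as a function pointer because no libc wrapper is presumed to exist, SKETCH_RWF_NONBLOCK is a placeholder value, and the two stub functions merely simulate a page-cache miss and hit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Placeholder flag value; the real constant comes from the kernel headers. */
#define SKETCH_RWF_NONBLOCK 0x1

/* Signature of preadv2() as documented in the patch above. */
typedef ssize_t (*preadv2_fn)(int fd, const struct iovec *iov, int iovcnt,
                              off_t offset, int flags);

/*
 * Fast path: attempt a page-cache-only read. If the call reports EAGAIN,
 * set *deferred so the caller can queue the request to a worker thread.
 */
ssize_t read_cached_or_defer(preadv2_fn do_preadv2, int fd,
                             const struct iovec *iov, int iovcnt,
                             off_t offset, bool *deferred)
{
    assert(do_preadv2 != NULL && deferred != NULL);
    *deferred = false;

    ssize_t n = do_preadv2(fd, iov, iovcnt, offset, SKETCH_RWF_NONBLOCK);
    if (n < 0 && errno == EAGAIN)
        *deferred = true;
    return n;
}

/* Stub simulating data that is not in the page cache. */
ssize_t stub_cache_miss(int fd, const struct iovec *iov, int iovcnt,
                        off_t offset, int flags)
{
    (void)fd; (void)iov; (void)iovcnt; (void)offset; (void)flags;
    errno = EAGAIN;
    return -1;
}

/* Stub simulating a fully cached read of the first buffer. */
ssize_t stub_cache_hit(int fd, const struct iovec *iov, int iovcnt,
                       off_t offset, int flags)
{
    (void)fd; (void)offset; (void)flags;
    return iovcnt > 0 ? (ssize_t)iov[0].iov_len : 0;
}
```

Other errors are returned unchanged with *deferred left false, so the caller can distinguish "not cached, retry elsewhere" from a genuine failure.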
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v2 2/2] RWF_ODSYNC flag for pwritev2
[not found] ` <d2cbc4795f774b521e13ac448d07a1156c6aa04d.1415288353.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
@ 2014-11-06 15:44 ` Milosz Tanski
0 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-06 15:44 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Christoph Hellwig, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-aio-Bw31MaZKKs3YtjvyW6yDsg, Mel Gorman, Volker Lendecke,
Tejun Heo, Jeff Moyer, Theodore Ts'o, Al Viro,
linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk,
linux-man-u79uwXL29TY76Z2rM5mHXA
Document the RWF_DSYNC flag for pwritev2, as implemented by Christoph Hellwig.
Signed-off-by: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
---
man2/readv.2 | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/man2/readv.2 b/man2/readv.2
index 31b3870..ff1405c 100644
--- a/man2/readv.2
+++ b/man2/readv.2
@@ -203,6 +203,15 @@ contains a bitwise OR of one or more of the following flags:
.BR RWF_NONBLOCK " (only " preadv2() " since Linux 3.19)"
Performs a non-blocking operation for regular files (not sockets) opened in buffered mode (not
.BR O_DIRECT ")."
+.TP
+.BR RWF_DSYNC " (only " pwritev2() " since Linux 3.19)"
+Write operation will complete according to the requirements of synchronized I/O
+.I data
+integrity completion. This has the same effect on the operation as if the file handle was created by
+.BR open (2)
+with the
+.BR O_DSYNC
+flag.
.SH RETURN VALUE
On success,
@@ -333,6 +342,7 @@ nwritten = writev(STDOUT_FILENO, iov, 2);
.fi
.in
.SH SEE ALSO
+.BR open (2),
.BR pread (2),
.BR read (2),
.BR write (2)
--
1.9.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
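The RWF_DSYNC text above describes the per-call flag as having the same effect as opening the file with O_DSYNC. A minimal sketch of that existing per-descriptor equivalent, using only open(2) and pwritev(2), is shown here for comparison. The helper name pwritev_with_o_dsync is made up for illustration, and nothing in it uses the proposed pwritev2.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pwritev() on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Open path with O_DSYNC so every write on the descriptor completes with
 * synchronized I/O data integrity, then perform one vectored write. */
ssize_t pwritev_with_o_dsync(const char *path, const struct iovec *iov,
                             int iovcnt, off_t offset)
{
    assert(path != NULL && iov != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = pwritev(fd, iov, iovcnt, offset);

    /* Preserve the errno of pwritev() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return n;
}
```

The difference the patch is after is granularity: with O_DSYNC the cost applies to every write on the descriptor, whereas a per-call flag would let one descriptor mix synchronized and ordinary writes.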
* Re: [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only)
2014-11-06 7:56 ` [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Christoph Hellwig
@ 2014-11-06 15:46 ` Milosz Tanski
0 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-06 15:46 UTC (permalink / raw)
To: Christoph Hellwig
Cc: LKML, linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, Linux API, Michael Kerrisk,
linux-arch
On Thu, Nov 6, 2014 at 2:56 AM, Christoph Hellwig <hch@infradead.org> wrote:
> This series looks good, do you also have a man page sniplet to document
> the new syscalls?
>
I just sent out the two patches for the man pages; I ran out of time
yesterday to update them for RWF_DSYNC.
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Linux-NTFS-Dev] [PATCH v5 6/7] fs: pass iocb to generic_write_sync
2014-11-06 10:52 ` [Linux-NTFS-Dev] " Anton Altaparmakov
@ 2014-11-06 16:14 ` Milosz Tanski
0 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-06 16:14 UTC (permalink / raw)
To: Anton Altaparmakov
Cc: Linux Kernel Mailing List, linux-arch, linux-aio@kvack.org,
Volker Lendecke, Theodore Ts'o, linux-xfs, linux-cifs,
linux-ntfs-dev, Linux API, Christoph Hellwig, Tejun Heo,
Jeff Moyer, cluster-devel, Mel Gorman,
linux-fsdevel@vger.kernel.org, Michael Kerrisk, linux-ext4,
Christoph Hellwig, linux-btrfs, Al Viro
On Thu, Nov 6, 2014 at 5:52 AM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> Hi,
>
>> On 5 Nov 2014, at 23:14, Milosz Tanski <milosz@adfin.com> wrote:
>>
>> From: Christoph Hellwig <hch@lst.de>
>>
>> Clean up the generic_write_sync by just passing an iocb and a bytes
>> written / negative errno argument. In addition to simplifying the
>> callers this also prepares for passing a per-operation O_DSYNC
>> flag. Two callers didn't quite fit that scheme:
>>
>> - dio_complete didn't bother to update ki_pos as we don't need it
>> on an iocb that is about to be freed, so we had to add it. Additionally
>> it also synced out written data in the error case, which has been
>> changed to operate like the other callers.
>> - gfs2 also used generic_write_sync to implement a crude version
>> of fallocate. It has been switched to use an open coded variant
>> instead.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>> fs/block_dev.c | 8 +-------
>> fs/btrfs/file.c | 7 ++-----
>> fs/cifs/file.c | 8 +-------
>> fs/direct-io.c | 8 ++------
>> fs/ext4/file.c | 8 +-------
>> fs/gfs2/file.c | 9 +++++++--
>> fs/ntfs/file.c | 8 ++------
>> fs/udf/file.c | 11 ++---------
>> fs/xfs/xfs_file.c | 8 +-------
>> include/linux/fs.h | 8 +-------
>> mm/filemap.c | 30 ++++++++++++++++++++----------
>> 11 files changed, 40 insertions(+), 73 deletions(-)
>>
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index cc9d411..c529b1c 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -1568,18 +1568,12 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
>> */
>> ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> {
>> - struct file *file = iocb->ki_filp;
>> struct blk_plug plug;
>> ssize_t ret;
>>
>> blk_start_plug(&plug);
>> ret = __generic_file_write_iter(iocb, from);
>> - if (ret > 0) {
>> - ssize_t err;
>> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
>> - if (err < 0)
>> - ret = err;
>> - }
>> + ret = generic_write_sync(iocb, ret);
>> blk_finish_plug(&plug);
>> return ret;
>> }
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index a18ceab..4f4a6f7 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1820,11 +1820,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>> */
>> BTRFS_I(inode)->last_trans = root->fs_info->generation + 1;
>> BTRFS_I(inode)->last_sub_trans = root->log_transid;
>> - if (num_written > 0) {
>> - err = generic_write_sync(file, pos, num_written);
>> - if (err < 0)
>> - num_written = err;
>> - }
>> +
>> + num_written = generic_write_sync(iocb, num_written);
>>
>> if (sync)
>> atomic_dec(&BTRFS_I(inode)->sync_writers);
>> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
>> index c485afa..32359de 100644
>> --- a/fs/cifs/file.c
>> +++ b/fs/cifs/file.c
>> @@ -2706,13 +2706,7 @@ cifs_writev(struct kiocb *iocb, struct iov_iter *from)
>> rc = __generic_file_write_iter(iocb, from);
>> mutex_unlock(&inode->i_mutex);
>>
>> - if (rc > 0) {
>> - ssize_t err;
>> -
>> - err = generic_write_sync(file, iocb->ki_pos - rc, rc);
>> - if (err < 0)
>> - rc = err;
>> - }
>> + rc = generic_write_sync(iocb, rc);
>> } else {
>> mutex_unlock(&inode->i_mutex);
>> }
>> diff --git a/fs/direct-io.c b/fs/direct-io.c
>> index e181b6b..b72ac83 100644
>> --- a/fs/direct-io.c
>> +++ b/fs/direct-io.c
>> @@ -257,12 +257,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
>> inode_dio_done(dio->inode);
>> if (is_async) {
>> if (dio->rw & WRITE) {
>> - int err;
>> -
>> - err = generic_write_sync(dio->iocb->ki_filp, offset,
>> - transferred);
>> - if (err < 0 && ret > 0)
>> - ret = err;
>> + dio->iocb->ki_pos = offset + transferred;
>> + ret = generic_write_sync(dio->iocb, ret);
>> }
>>
>> aio_complete(dio->iocb, ret, 0);
>> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> index aca7b24..79b000c 100644
>> --- a/fs/ext4/file.c
>> +++ b/fs/ext4/file.c
>> @@ -175,13 +175,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> ret = __generic_file_write_iter(iocb, from);
>> mutex_unlock(&inode->i_mutex);
>>
>> - if (ret > 0) {
>> - ssize_t err;
>> -
>> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
>> - if (err < 0)
>> - ret = err;
>> - }
>> + ret = generic_write_sync(iocb, ret);
>> if (o_direct)
>> blk_finish_plug(&plug);
>>
>> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
>> index 80dd44d..3fafeca 100644
>> --- a/fs/gfs2/file.c
>> +++ b/fs/gfs2/file.c
>> @@ -895,8 +895,13 @@ retry:
>> gfs2_quota_unlock(ip);
>> }
>>
>> - if (error == 0)
>> - error = generic_write_sync(file, pos, count);
>> + if (error)
>> + goto out_unlock;
>> +
>> + if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) {
>> + error = vfs_fsync_range(file, pos, pos + count - 1,
>> + (file->f_flags & __O_SYNC) ? 0 : 1);
>> + }
>> goto out_unlock;
>>
>> out_trans_fail:
>> diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
>> index 643faa4..4f3d664 100644
>> --- a/fs/ntfs/file.c
>> +++ b/fs/ntfs/file.c
>> @@ -2127,12 +2127,8 @@ static ssize_t ntfs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
>> mutex_lock(&inode->i_mutex);
>> ret = ntfs_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);
>> mutex_unlock(&inode->i_mutex);
>> - if (ret > 0) {
>> - int err = generic_write_sync(file, iocb->ki_pos - ret, ret);
>> - if (err < 0)
>> - ret = err;
>> - }
>> - return ret;
>> +
>> + return generic_write_sync(iocb, ret);
>> }
>>
>> /**
>> diff --git a/fs/udf/file.c b/fs/udf/file.c
>> index bb15771..1cdabd0 100644
>> --- a/fs/udf/file.c
>> +++ b/fs/udf/file.c
>> @@ -155,16 +155,9 @@ static ssize_t udf_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> retval = __generic_file_write_iter(iocb, from);
>> mutex_unlock(&inode->i_mutex);
>>
>> - if (retval > 0) {
>> - ssize_t err;
>> -
>> + if (retval > 0)
>> mark_inode_dirty(inode);
>> - err = generic_write_sync(file, iocb->ki_pos - retval, retval);
>> - if (err < 0)
>> - retval = err;
>> - }
>> -
>> - return retval;
>> + return generic_write_sync(iocb, retval);
>> }
>>
>> long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index 0655915..a8cab66 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -792,14 +792,8 @@ xfs_file_write_iter(
>> ret = xfs_file_buffered_aio_write(iocb, from);
>>
>> if (ret > 0) {
>> - ssize_t err;
>> -
>> XFS_STATS_ADD(xs_write_bytes, ret);
>> -
>> - /* Handle various SYNC-type writes */
>> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
>> - if (err < 0)
>> - ret = err;
>> + ret = generic_write_sync(iocb, ret);
>> }
>> return ret;
>> }
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index eaebd99..7d0e116 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -2242,13 +2242,7 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
>> extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
>> int datasync);
>> extern int vfs_fsync(struct file *file, int datasync);
>> -static inline int generic_write_sync(struct file *file, loff_t pos, loff_t count)
>> -{
>> - if (!(file->f_flags & O_DSYNC) && !IS_SYNC(file->f_mapping->host))
>> - return 0;
>> - return vfs_fsync_range(file, pos, pos + count - 1,
>> - (file->f_flags & __O_SYNC) ? 0 : 1);
>> -}
>> +extern int generic_write_sync(struct kiocb *iocb, loff_t count);
>> extern void emergency_sync(void);
>> extern void emergency_remount(void);
>> #ifdef CONFIG_BLOCK
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 09d3af3..6107058 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2664,6 +2664,24 @@ out:
>> }
>> EXPORT_SYMBOL(__generic_file_write_iter);
>>
>> +int generic_write_sync(struct kiocb *iocb, loff_t count)
>> +{
>> + struct file *file = iocb->ki_filp;
>> +
>> + if (count > 0 &&
>> + ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
>> + bool fdatasync = !(file->f_flags & __O_SYNC);
>> + ssize_t ret = 0;
>
> That "= 0" is pointless. "ret" is overwritten unconditionally on the following line...
I have fixed this change; it will be in the next patch series / pull
request. The branch for the pull is at:
https://bitbucket.org/adfin/linux-fs.git read_call_6
>
> Other than that the NTFS bits are:
>
> Acked-by: Anton Altaparmakov <anton@tuxera.com>
>
> Best regards,
>
> Anton
>
>> +
>> + ret = vfs_fsync_range(file, iocb->ki_pos - count,
>> + iocb->ki_pos - 1, fdatasync);
>> + if (ret < 0)
>> + return ret;
>> + }
>> + return count;
>> +}
>> +EXPORT_SYMBOL(generic_write_sync);
>> +
>> /**
>> * generic_file_write_iter - write data to a file
>> * @iocb: IO state structure
>> @@ -2675,22 +2693,14 @@ EXPORT_SYMBOL(__generic_file_write_iter);
>> */
>> ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> {
>> - struct file *file = iocb->ki_filp;
>> - struct inode *inode = file->f_mapping->host;
>> + struct inode *inode = iocb->ki_filp->f_mapping->host;
>> ssize_t ret;
>>
>> mutex_lock(&inode->i_mutex);
>> ret = __generic_file_write_iter(iocb, from);
>> mutex_unlock(&inode->i_mutex);
>>
>> - if (ret > 0) {
>> - ssize_t err;
>> -
>> - err = generic_write_sync(file, iocb->ki_pos - ret, ret);
>> - if (err < 0)
>> - ret = err;
>> - }
>> - return ret;
>> + return generic_write_sync(iocb, ret);
>> }
>> EXPORT_SYMBOL(generic_file_write_iter);
>
> --
> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
> University of Cambridge Information Services, Roger Needham Building
> 7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK
>
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
^ permalink raw reply [flat|nested] 27+ messages in thread
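The calling convention of the reworked generic_write_sync() quoted above (pass in the bytes written or a negative errno, get back the same value unless the sync fails) can be modelled in userspace. This is a sketch under stated assumptions: the model_* types and the recording stub are invented stand-ins, not kernel definitions, and the boolean fields only approximate the O_DSYNC, __O_SYNC and IS_SYNC() tests in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for struct file plus a recorder for the sync call. */
struct model_file {
    bool o_dsync;        /* models file->f_flags & O_DSYNC */
    bool o_sync;         /* models file->f_flags & __O_SYNC */
    bool inode_sync;     /* models IS_SYNC(file->f_mapping->host) */
    int fsync_error;     /* value the stubbed fsync should return */
    int fsync_calls;
    long long last_start, last_end;
    int last_datasync;
};

/* Illustrative stand-in for struct kiocb. */
struct model_kiocb {
    struct model_file *filp;
    long long ki_pos;    /* position after the write, as in the patch */
};

/* Stub for vfs_fsync_range(): records its arguments. */
static int model_fsync_range(struct model_file *f, long long start,
                             long long end, int datasync)
{
    f->fsync_calls++;
    f->last_start = start;
    f->last_end = end;
    f->last_datasync = datasync;
    return f->fsync_error;
}

/* Same control flow as the new helper: errors and zero-length writes pass
 * through untouched; a successful write is synced over the byte range
 * [ki_pos - count, ki_pos - 1] when the file or inode requires it. */
long long model_write_sync(struct model_kiocb *iocb, long long count)
{
    struct model_file *file = iocb->filp;

    if (count > 0 && (file->o_dsync || file->inode_sync)) {
        bool fdatasync = !file->o_sync;
        int ret = model_fsync_range(file, iocb->ki_pos - count,
                                    iocb->ki_pos - 1, fdatasync);
        if (ret < 0)
            return ret;
    }
    return count;
}
```

Folding the "count > 0" test into the helper is what lets each caller shrink to a single "ret = generic_write_sync(iocb, ret);" line.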
* Re: [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only)
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
` (8 preceding siblings ...)
2014-11-06 15:44 ` [PATCH v2 1/2] Add preadv2/pwritev2 documentation Milosz Tanski
@ 2014-11-06 16:16 ` Milosz Tanski
9 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-06 16:16 UTC (permalink / raw)
To: LKML
Cc: Christoph Hellwig, linux-fsdevel@vger.kernel.org,
linux-aio@kvack.org, Mel Gorman, Volker Lendecke, Tejun Heo,
Jeff Moyer, Theodore Ts'o, Al Viro, Linux API,
Michael Kerrisk, linux-arch
The pull request for these changes is at:
https://bitbucket.org/adfin/linux-fs.git read_call_6
So far I've updated it with Acked-by tags from various maintainers
(and a small stylistic fix).
On Wed, Nov 5, 2014 at 4:14 PM, Milosz Tanski <milosz@adfin.com> wrote:
> This patchset introduces the ability to perform a non-blocking read from
> regular files in buffered IO mode. This works only for data that is already
> in the page cache.
>
> It does this by introducing the new syscalls preadv2/pwritev2. These
> new syscalls behave like the network sendmsg, recvmsg syscalls that accept an
> extra flag argument (RWF_NONBLOCK).
>
> It's a very common pattern today (samba, libuv, etc.) to use a large threadpool
> to perform buffered IO operations. The work is submitted from another thread
> that performs network IO and epoll, or from other threads that perform CPU
> work. This leads to increased latency for processing, esp. in the case of data
> that's already cached in the page cache.
>
> With the new interface, applications will be able to fetch the data in
> their network / CPU-bound thread(s) and only defer to a threadpool if it's not
> there. In our own application (VLDB) we've observed a decrease in latency for
> "fast" requests by avoiding unnecessary queuing and having to swap out current
> tasks in IO-bound work threads.
>
> Version 5 highlights:
> - XFS support for RWF_NONBLOCK, from Christoph.
> - RWF_DSYNC flag and support for pwritev2, from Christoph.
> - Implemented compat syscalls, per Jeff.
> - Missing nfs, ceph changes from older patchset.
>
> Version 4 highlights:
> - Updated for 3.18-rc1.
> - Performance data from our application.
> - First stab at man page with Jeff's help. Patch is in-reply to.
>
> RFC Version 3 highlights:
> - Down to 2 syscalls from 4; can use fp or argument position.
> - RWF_NONBLOCK flag value is not the same as O_NONBLOCK, per Jeff.
>
> RFC Version 2 highlights:
> - Put the flags argument into kiocb (less noise), per Al Viro
> - O_DIRECT checking early in the process, per Jeff Moyer
> - Resolved duplicate (c&p) code in syscall code, per Jeff
> - Included perf data in thread cover letter, per Jeff
> - Created a new flag (not O_NONBLOCK) for readv2, per Jeff
>
>
> Some perf data generated using fio, comparing the posix aio engine to a version
> of the posix AIO engine that attempts to perform "fast" reads before
> submitting the operations to the queue. This workload runs on an ext4 partition
> on raid0 (test / build-rig), simulating our database access pattern using
> 16KB read accesses. Our database uses a home-spun posix aio like queue (samba
> does the same thing.)
>
> f1: ~73% rand read over mostly cached data (zipf med-size dataset)
> f2: ~18% rand read over mostly un-cached data (uniform large-dataset)
> f3: ~9% seq-read over large dataset
>
> before:
>
> f1:
> bw (KB /s): min= 11, max= 9088, per=0.56%, avg=969.54, stdev=827.99
> lat (msec) : 50=0.01%, 100=1.06%, 250=5.88%, 500=4.08%, 750=12.48%
> lat (msec) : 1000=17.27%, 2000=49.86%, >=2000=9.42%
> f2:
> bw (KB /s): min= 2, max= 1882, per=0.16%, avg=273.28, stdev=220.26
> lat (msec) : 250=5.65%, 500=3.31%, 750=15.64%, 1000=24.59%, 2000=46.56%
> lat (msec) : >=2000=4.33%
> f3:
> bw (KB /s): min= 0, max=265568, per=99.95%, avg=174575.10, stdev=34526.89
> lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.27%, 50=10.82%
> lat (usec) : 100=50.34%, 250=5.05%, 500=7.12%, 750=6.60%, 1000=4.55%
> lat (msec) : 2=8.73%, 4=3.49%, 10=1.83%, 20=0.89%, 50=0.22%
> lat (msec) : 100=0.05%, 250=0.02%, 500=0.01%
> total:
> READ: io=102365MB, aggrb=174669KB/s, minb=240KB/s, maxb=173599KB/s, mint=600001msec, maxt=600113msec
>
> after (with fast read using preadv2 before submit):
>
> f1:
> bw (KB /s): min= 3, max=14897, per=1.28%, avg=2276.69, stdev=2930.39
> lat (usec) : 2=70.63%, 4=0.01%
> lat (msec) : 250=0.20%, 500=2.26%, 750=1.18%, 2000=0.22%, >=2000=25.53%
> f2:
> bw (KB /s): min= 2, max= 2362, per=0.14%, avg=249.83, stdev=222.00
> lat (msec) : 250=6.35%, 500=1.78%, 750=9.29%, 1000=20.49%, 2000=52.18%
> lat (msec) : >=2000=9.99%
> f3:
> bw (KB /s): min= 1, max=245448, per=100.00%, avg=177366.50, stdev=35995.60
> lat (usec) : 2=64.04%, 4=0.01%, 10=0.01%, 20=0.06%, 50=0.43%
> lat (usec) : 100=0.20%, 250=1.27%, 500=2.93%, 750=3.93%, 1000=7.35%
> lat (msec) : 2=14.27%, 4=2.88%, 10=1.54%, 20=0.81%, 50=0.22%
> lat (msec) : 100=0.05%, 250=0.02%
> total:
> READ: io=103941MB, aggrb=177339KB/s, minb=213KB/s, maxb=176375KB/s, mint=600020msec, maxt=600178msec
>
> Interpreting the results, you can see total bandwidth stays the same but overall
> request latency is decreased in the f1 (random, mostly cached) and f3
> (sequential) workloads. There is a slight bump in latency for f2, since it's
> random data that's unlikely to be cached but we're always trying "fast read".
>
> In our application we have started keeping track of "fast read" hits/misses,
> and for files / requests that have a low hit ratio we don't do "fast reads",
> mostly getting rid of extra latency in the uncached cases. In our real world
> workload we were able to reduce average response time by 20 to 30% (depends
> on amount of IO done by request).
>
> I've performed other benchmarks and I have not observed any perf regressions in
> any of the normal (old) code paths.
>
> I have co-developed these changes with Christoph Hellwig.
>
> Christoph Hellwig (3):
> xfs: add RWF_NONBLOCK support
> fs: pass iocb to generic_write_sync
> fs: add a flag for per-operation O_DSYNC semantics
>
> Milosz Tanski (4):
> vfs: Prepare for adding a new preadv/pwritev with user flags.
> vfs: Define new syscalls preadv2,pwritev2
> x86: wire up preadv2 and pwritev2
> vfs: RWF_NONBLOCK flag for preadv2
>
> arch/x86/syscalls/syscall_32.tbl | 2 +
> arch/x86/syscalls/syscall_64.tbl | 2 +
> drivers/target/target_core_file.c | 6 +-
> fs/block_dev.c | 8 +-
> fs/btrfs/file.c | 7 +-
> fs/ceph/file.c | 6 +-
> fs/cifs/file.c | 14 +--
> fs/direct-io.c | 8 +-
> fs/ext4/file.c | 8 +-
> fs/fuse/file.c | 2 +
> fs/gfs2/file.c | 9 +-
> fs/nfs/file.c | 15 ++-
> fs/nfsd/vfs.c | 4 +-
> fs/ntfs/file.c | 8 +-
> fs/ocfs2/file.c | 12 +-
> fs/pipe.c | 3 +-
> fs/read_write.c | 239 +++++++++++++++++++++++++++++---------
> fs/splice.c | 2 +-
> fs/udf/file.c | 11 +-
> fs/xfs/xfs_file.c | 36 ++++--
> include/linux/aio.h | 2 +
> include/linux/compat.h | 6 +
> include/linux/fs.h | 16 ++-
> include/linux/syscalls.h | 6 +
> include/uapi/asm-generic/unistd.h | 6 +-
> mm/filemap.c | 55 +++++++--
> mm/shmem.c | 4 +
> 27 files changed, 346 insertions(+), 151 deletions(-)
>
> --
> 1.9.1
>
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2
2014-11-05 21:14 ` [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2 Milosz Tanski
@ 2014-11-06 23:25 ` Jeff Moyer
2014-11-07 16:28 ` Milosz Tanski
0 siblings, 1 reply; 27+ messages in thread
From: Jeff Moyer @ 2014-11-06 23:25 UTC (permalink / raw)
To: Milosz Tanski
Cc: linux-kernel, Christoph Hellwig, linux-fsdevel, linux-aio,
Mel Gorman, Volker Lendecke, Tejun Heo, Theodore Ts'o,
Al Viro, linux-api, Michael Kerrisk, linux-arch, linux-mm
Milosz Tanski <milosz@adfin.com> writes:
> New syscalls that take a flag argument. This change does not add any specific
> flags.
>
> Signed-off-by: Milosz Tanski <milosz@adfin.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/read_write.c | 176 ++++++++++++++++++++++++++++++--------
> include/linux/compat.h | 6 ++
> include/linux/syscalls.h | 6 ++
> include/uapi/asm-generic/unistd.h | 6 +-
> mm/filemap.c | 5 +-
> 5 files changed, 158 insertions(+), 41 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 94b2d34..907735c 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -866,6 +866,8 @@ ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
> return -EBADF;
> if (!(file->f_mode & FMODE_CAN_READ))
> return -EINVAL;
> + if (flags & ~0)
> + return -EINVAL;
>
> return do_readv_writev(READ, file, vec, vlen, pos, flags);
> }
> @@ -879,21 +881,23 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
> return -EBADF;
> if (!(file->f_mode & FMODE_CAN_WRITE))
> return -EINVAL;
> + if (flags & ~0)
> + return -EINVAL;
>
> return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
> }
Hi, Milosz,
You've checked for invalid flags for the normal system calls, but not
for the compat variants. Can you add that in, please?
Thanks!
Jeff
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 27+ messages in thread
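Jeff's request amounts to routing the native and compat entry points through the same flag check. A small userspace sketch of that idea follows. RWF_SUPPORTED_MASK, rwf_validate and the two model_* entry points are hypothetical names; the mask is 0 because patch 2/7 defines no flags, which makes the test equivalent to the "flags & ~0" check in the quoted hunk.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical name. No flags exist at this point in the series, so every
 * set bit is unknown and must be rejected. */
#define RWF_SUPPORTED_MASK 0

/* Reject any flag bit outside the supported mask. */
int rwf_validate(int flags)
{
    if (flags & ~RWF_SUPPORTED_MASK)
        return -EINVAL;
    return 0;
}

/* Stand-in for the native syscall entry point. */
long model_preadv2(int flags)
{
    int err = rwf_validate(flags);
    if (err)
        return err;
    return 0;   /* the real read would happen here */
}

/* Stand-in for the compat entry point: same check, same result. */
long model_compat_preadv2(int flags)
{
    int err = rwf_validate(flags);
    if (err)
        return err;
    return 0;   /* the real read would happen here */
}
```

Rejecting unknown bits from day one is what keeps the flags argument extensible: a later kernel can assign meaning to a bit without silently changing behaviour for old callers that passed garbage.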
* Re: [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics
2014-11-05 21:14 ` [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics Milosz Tanski
@ 2014-11-06 23:46 ` Jeff Moyer
[not found] ` <x49r3xf28qn.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
2014-11-10 16:07 ` [PATCH v5 7/7] fs: " Sage Weil
1 sibling, 1 reply; 27+ messages in thread
From: Jeff Moyer @ 2014-11-06 23:46 UTC (permalink / raw)
To: Milosz Tanski
Cc: linux-kernel, Christoph Hellwig, Christoph Hellwig, linux-fsdevel,
linux-aio, Mel Gorman, Volker Lendecke, Tejun Heo,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, ceph-devel, fuse-devel, linux-nfs, ocfs2-devel,
linux-mm
Milosz Tanski <milosz@adfin.com> writes:
> - if (type == READ && (flags & RWF_NONBLOCK))
> - return -EAGAIN;
> + if (type == READ) {
> + if (flags & RWF_NONBLOCK)
> + return -EAGAIN;
> + } else {
> + if (flags & RWF_DSYNC)
> + return -EINVAL;
> + }
Minor nit, but I'd rather read something that looks like this:
if (type == READ && (flags & RWF_NONBLOCK))
return -EAGAIN;
else if (type == WRITE && (flags & RWF_DSYNC))
return -EINVAL;
I won't lose sleep over it, though.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v5 7/7] add a flag for per-operation O_DSYNC semantics
[not found] ` <x49r3xf28qn.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
@ 2014-11-07 4:22 ` Anton Altaparmakov
2014-11-07 5:52 ` [fuse-devel] " Anand Avati
0 siblings, 1 reply; 27+ messages in thread
From: Anton Altaparmakov @ 2014-11-07 4:22 UTC (permalink / raw)
To: Jeff Moyer
Cc: Milosz Tanski, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
Christoph Hellwig, Christoph Hellwig,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-aio-Bw31MaZKKs3YtjvyW6yDsg, Mel Gorman, Volker Lendecke,
Tejun Heo, Theodore Ts'o, Al Viro,
linux-api-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
ceph-devel-u79uwXL29TY76Z2rM5mHXA,
fuse-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
linux-nfs-u79uwXL29TY76Z2rM5mHXA,
ocfs2-devel-N0ozoZBvEnrZJqsBc5GL+g,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg
Hi Jeff,
> On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org> writes:
>
>> - if (type == READ && (flags & RWF_NONBLOCK))
>> - return -EAGAIN;
>> + if (type == READ) {
>> + if (flags & RWF_NONBLOCK)
>> + return -EAGAIN;
>> + } else {
>> + if (flags & RWF_DSYNC)
>> + return -EINVAL;
>> + }
>
> Minor nit, but I'd rather read something that looks like this:
>
> if (type == READ && (flags & RWF_NONBLOCK))
> return -EAGAIN;
> else if (type == WRITE && (flags & RWF_DSYNC))
> return -EINVAL;
But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
Best regards,
Anton
> I won't lose sleep over it, though.
>
> Reviewed-by: Jeff Moyer <jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [fuse-devel] [PATCH v5 7/7] add a flag for per-operation O_DSYNC semantics
2014-11-07 4:22 ` [PATCH v5 7/7] " Anton Altaparmakov
@ 2014-11-07 5:52 ` Anand Avati
2014-11-07 6:43 ` Anton Altaparmakov
0 siblings, 1 reply; 27+ messages in thread
From: Anand Avati @ 2014-11-07 5:52 UTC (permalink / raw)
To: Anton Altaparmakov
Cc: Jeff Moyer, linux-arch, linux-aio, linux-nfs, Volker Lendecke,
Theodore Ts'o, linux-mm, fuse-devel@lists.sourceforge.net,
linux-api, Linux Kernel Mailing List, Al Viro, Christoph Hellwig,
Tejun Heo, Milosz Tanski, linux-fsdevel, Michael Kerrisk,
ceph-devel, Christoph Hellwig, ocfs2-devel, Mel Gorman
On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Minor nit, but I'd rather read something that looks like this:
> >
> > if (type == READ && (flags & RWF_NONBLOCK))
> > return -EAGAIN;
> > else if (type == WRITE && (flags & RWF_DSYNC))
> > return -EINVAL;
>
> But your version is less logically efficient for the case where "type ==
> READ" is true and "flags & RWF_NONBLOCK" is false because your version then
> has to do the "if (type == WRITE" check before discovering it does not need
> to take that branch either, whilst the original version does not have to do
> such a test at all.
>
Seriously? Just focus on the code readability/maintainability which makes
the code most easily understood/obvious to a new pair of eyes, and leave
such micro-optimizations to the compiler..
Thanks
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [fuse-devel] [PATCH v5 7/7] add a flag for per-operation O_DSYNC semantics
2014-11-07 5:52 ` [fuse-devel] " Anand Avati
@ 2014-11-07 6:43 ` Anton Altaparmakov
2014-11-07 14:21 ` Roger Willcocks
0 siblings, 1 reply; 27+ messages in thread
From: Anton Altaparmakov @ 2014-11-07 6:43 UTC (permalink / raw)
To: Anand Avati
Cc: Jeff Moyer, linux-arch, linux-aio, linux-nfs, Volker Lendecke,
Theodore Ts'o, linux-mm, fuse-devel@lists.sourceforge.net,
linux-api, Linux Kernel Mailing List, Al Viro, Christoph Hellwig,
Tejun Heo, Milosz Tanski, linux-fsdevel, Michael Kerrisk,
ceph-devel, Christoph Hellwig, ocfs2-devel, Mel Gorman
Hi,
> On 7 Nov 2014, at 07:52, Anand Avati <avati@gluster.org> wrote:
> On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Minor nit, but I'd rather read something that looks like this:
> >
> > if (type == READ && (flags & RWF_NONBLOCK))
> > return -EAGAIN;
> > else if (type == WRITE && (flags & RWF_DSYNC))
> > return -EINVAL;
>
> But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
>
> Seriously?
Of course seriously.
> Just focus on the code readability/maintainability which makes the code most easily understood/obvious to a new pair of eyes, and leave such micro-optimizations to the compiler..
The original version is more readable (IMO) and this is not a micro-optimization. It is people like you who are responsible for the fact that we need faster and faster computers to cope with the inefficient/poor code being written more and more...
And I really wouldn't hedge my bets on gcc optimizing something like that. The amount of crap assembly produced from gcc that I have seen over the years suggests that it is quite likely it will make a hash of it instead...
Best regards,
Anton
> Thanks
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
University of Cambridge Information Services, Roger Needham Building
7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [fuse-devel] [PATCH v5 7/7] add a flag for per-operation O_DSYNC semantics
2014-11-07 6:43 ` Anton Altaparmakov
@ 2014-11-07 14:21 ` Roger Willcocks
2014-11-07 19:58 ` Milosz Tanski
0 siblings, 1 reply; 27+ messages in thread
From: Roger Willcocks @ 2014-11-07 14:21 UTC (permalink / raw)
To: Anton Altaparmakov
Cc: Anand Avati, linux-arch, linux-aio, linux-nfs, Volker Lendecke,
Theodore Ts'o, Mel Gorman, fuse-devel@lists.sourceforge.net,
linux-api, Linux Kernel Mailing List, Michael Kerrisk,
Christoph Hellwig, linux-mm, Jeff Moyer, Al Viro, Tejun Heo,
linux-fsdevel, ceph-devel, Christoph Hellwig, ocfs2-devel,
Milosz Tanski
On Fri, 2014-11-07 at 08:43 +0200, Anton Altaparmakov wrote:
> Hi,
>
> > On 7 Nov 2014, at 07:52, Anand Avati <avati@gluster.org> wrote:
> > On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
> > > Minor nit, but I'd rather read something that looks like this:
> > >
> > > if (type == READ && (flags & RWF_NONBLOCK))
> > > return -EAGAIN;
> > > else if (type == WRITE && (flags & RWF_DSYNC))
> > > return -EINVAL;
> >
> > But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
> >
> > Seriously?
>
> Of course seriously.
>
> > Just focus on the code readability/maintainability which makes the code most easily understood/obvious to a new pair of eyes, and leave such micro-optimizations to the compiler..
>
> The original version is more readable (IMO) and this is not a micro-optimization. It is people like you who are responsible for the fact that we need faster and faster computers to cope with the inefficient/poor code being written more and more...
>
Your original version needs me to know that type can only be either READ
or WRITE (and not, for instance, READONLY or READWRITE or some other
random special case) and it rings alarm bells when I first see it. If
you want to keep the micro optimization, you need an assertion to
acknowledge the potential bug and a comment to make the code obvious:
+	assert(type == READ || type == WRITE);
+	if (type == READ) {
+		if (flags & RWF_NONBLOCK)
+			return -EAGAIN;
+	} else { /* WRITE */
+		if (flags & RWF_DSYNC)
+			return -EINVAL;
+	}
but since what's really happening here is two separate and independent
error checks, Jeff's version is still better, even if it does take an
extra couple of nanoseconds.
Actually I'd probably write:
	if (type == READ && (flags & RWF_NONBLOCK))
		return -EAGAIN;
	if (type == WRITE && (flags & RWF_DSYNC))
		return -EINVAL;
(no 'else' since the code will never be reached if the first test is
true).
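
The difference between the two forms can be checked mechanically. The following is only a minimal user-space sketch, not kernel code: the helper names (check_nested, check_flat, forms_agree) are hypothetical, READ/WRITE are modelled as 0/1, the flag values are the ones defined in this series, and a return value of 0 is assumed to mean "no error, carry on".

```c
#include <assert.h>
#include <errno.h>

/* Modelled after the series: READ is 0 and WRITE is 1 in the kernel. */
enum { READ = 0, WRITE = 1 };
#define RWF_NONBLOCK 0x00000001
#define RWF_DSYNC    0x00000002

/* Nested form, as in the posted patch (hypothetical wrapper function). */
static int check_nested(int type, int flags)
{
    if (type == READ) {
        if (flags & RWF_NONBLOCK)
            return -EAGAIN;
    } else { /* assumed to be WRITE */
        if (flags & RWF_DSYNC)
            return -EINVAL;
    }
    return 0;
}

/* Two independent checks, as suggested in the review. */
static int check_flat(int type, int flags)
{
    if (type == READ && (flags & RWF_NONBLOCK))
        return -EAGAIN;
    if (type == WRITE && (flags & RWF_DSYNC))
        return -EINVAL;
    return 0;
}

/* Returns 1 when both forms agree for every valid type/flag combination. */
static int forms_agree(void)
{
    for (int type = READ; type <= WRITE; type++)
        for (int flags = 0; flags <= (RWF_NONBLOCK | RWF_DSYNC); flags++)
            if (check_nested(type, flags) != check_flat(type, flags))
                return 0;
    return 1;
}
```

For type restricted to READ and WRITE the two forms return the same value. They only diverge for an unexpected type value, where the nested form silently treats it as a write; that is the case the assertion above is meant to catch.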
--
Roger Willcocks <roger@filmlight.ltd.uk>
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
* Re: [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2
2014-11-06 23:25 ` Jeff Moyer
@ 2014-11-07 16:28 ` Milosz Tanski
0 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-07 16:28 UTC (permalink / raw)
To: Jeff Moyer
Cc: LKML, Christoph Hellwig, linux-fsdevel@vger.kernel.org,
linux-aio@kvack.org, Mel Gorman, Volker Lendecke, Tejun Heo,
Theodore Ts'o, Al Viro, Linux API, Michael Kerrisk,
linux-arch, linux-mm
On Thu, Nov 6, 2014 at 6:25 PM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Milosz Tanski <milosz@adfin.com> writes:
>
> >> New syscalls that take a flag argument. This change does not add any specific
> >> flags.
>>
>> Signed-off-by: Milosz Tanski <milosz@adfin.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>> fs/read_write.c | 176 ++++++++++++++++++++++++++++++--------
>> include/linux/compat.h | 6 ++
>> include/linux/syscalls.h | 6 ++
>> include/uapi/asm-generic/unistd.h | 6 +-
>> mm/filemap.c | 5 +-
>> 5 files changed, 158 insertions(+), 41 deletions(-)
>>
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 94b2d34..907735c 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -866,6 +866,8 @@ ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
>> return -EBADF;
>> if (!(file->f_mode & FMODE_CAN_READ))
>> return -EINVAL;
>> + if (flags & ~0)
>> + return -EINVAL;
>>
>> return do_readv_writev(READ, file, vec, vlen, pos, flags);
>> }
>> @@ -879,21 +881,23 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
>> return -EBADF;
>> if (!(file->f_mode & FMODE_CAN_WRITE))
>> return -EINVAL;
>> + if (flags & ~0)
>> + return -EINVAL;
>>
>> return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
>> }
>
> Hi, Milosz,
>
> You've checked for invalid flags for the normal system calls, but not
> for the compat variants. Can you add that in, please?
>
> Thanks!
> Jeff
That's a good catch, Jeff. I'll fix this, and the fix will be in the next
version of the patch series.
- M
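
One way to keep the native and compat paths from drifting apart is to route both through a single mask check. This is a rough sketch under stated assumptions, not the code from the series: rwf_validate and the two *_flags_ok wrappers are hypothetical names, and the flag values are the ones the later patches in this series define.

```c
#include <assert.h>
#include <errno.h>

#define RWF_NONBLOCK 0x00000001   /* value from patch 4/7 */
#define RWF_DSYNC    0x00000002   /* value from patch 7/7 */

/* Hypothetical helper: reject any bit outside the allowed mask. */
static int rwf_validate(int flags, int allowed)
{
    return (flags & ~allowed) ? -EINVAL : 0;
}

/* Hypothetical native and compat read entry points sharing one check. */
static int native_readv_flags_ok(int flags)
{
    return rwf_validate(flags, RWF_NONBLOCK);
}

static int compat_readv_flags_ok(int flags)
{
    return rwf_validate(flags, RWF_NONBLOCK);
}
```

With an allowed mask of 0 the test reduces to the `flags & ~0` check quoted above, which rejects every non-zero flag; that fits patch 2/7, which adds the flags argument but defines no flags yet.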
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [fuse-devel] [PATCH v5 7/7] add a flag for per-operation O_DSYNC semantics
2014-11-07 14:21 ` Roger Willcocks
@ 2014-11-07 19:58 ` Milosz Tanski
0 siblings, 0 replies; 27+ messages in thread
From: Milosz Tanski @ 2014-11-07 19:58 UTC (permalink / raw)
To: Roger Willcocks
Cc: Anton Altaparmakov, Anand Avati, linux-arch, linux-aio@kvack.org,
linux-nfs, Volker Lendecke, Theodore Ts'o, Mel Gorman,
fuse-devel@lists.sourceforge.net, Linux API,
Linux Kernel Mailing List, Michael Kerrisk, Christoph Hellwig,
linux-mm, Jeff Moyer, Al Viro, Tejun Heo,
linux-fsdevel@vger.kernel.org, ceph-devel, Christoph Hellwig,
ocfs2-devel
On Fri, Nov 7, 2014 at 9:21 AM, Roger Willcocks <roger@filmlight.ltd.uk> wrote:
>
> On Fri, 2014-11-07 at 08:43 +0200, Anton Altaparmakov wrote:
>> Hi,
>>
>> > On 7 Nov 2014, at 07:52, Anand Avati <avati@gluster.org> wrote:
>> > On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>> > > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
>> > > Minor nit, but I'd rather read something that looks like this:
>> > >
>> > > if (type == READ && (flags & RWF_NONBLOCK))
>> > > return -EAGAIN;
>> > > else if (type == WRITE && (flags & RWF_DSYNC))
>> > > return -EINVAL;
>> >
>> > But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
>> >
>> > Seriously?
>>
>> Of course seriously.
>>
>> > Just focus on the code readability/maintainability which makes the code most easily understood/obvious to a new pair of eyes, and leave such micro-optimizations to the compiler..
>>
>> The original version is more readable (IMO) and this is not a micro-optimization. It is people like you who are responsible for the fact that we need faster and faster computers to cope with the inefficient/poor code being written more and more...
>>
>
> Your original version needs me to know that type can only be either READ
> or WRITE (and not, for instance, READONLY or READWRITE or some other
> random special case) and it rings alarm bells when I first see it. If
> you want to keep the micro optimization, you need an assertion to
> acknowledge the potential bug and a comment to make the code obvious:
>
> + assert(type == READ || type == WRITE);
> + if (type == READ) {
> + if (flags & RWF_NONBLOCK)
> + return -EAGAIN;
> + } else { /* WRITE */
> + if (flags & RWF_DSYNC)
> + return -EINVAL;
> + }
>
> but since what's really happening here is two separate and independent
> error checks, Jeff's version is still better, even if it does take an
> extra couple of nanoseconds.
>
> Actually I'd probably write:
>
> if (type == READ && (flags & RWF_NONBLOCK))
> return -EAGAIN;
>
> if (type == WRITE && (flags & RWF_DSYNC))
> return -EINVAL;
>
> (no 'else' since the code will never be reached if the first test is
> true).
>
>
> --
> Roger Willcocks <roger@filmlight.ltd.uk>
>
This is what I changed it to (and will be sending that out for the
next version).
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016
p: 646-253-9055
e: milosz@adfin.com
* Re: [PATCH v5 4/7] vfs: RWF_NONBLOCK flag for preadv2
2014-11-05 21:14 ` [PATCH v5 4/7] vfs: RWF_NONBLOCK flag for preadv2 Milosz Tanski
@ 2014-11-10 16:07 ` Sage Weil
0 siblings, 0 replies; 27+ messages in thread
From: Sage Weil @ 2014-11-10 16:07 UTC (permalink / raw)
To: Milosz Tanski
Cc: linux-kernel, Christoph Hellwig, linux-fsdevel, linux-aio,
Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, ceph-devel, linux-cifs, samba-technical, linux-nfs,
linux-xfs, ocfs2-devel, linux-mm
On Wed, 5 Nov 2014, Milosz Tanski wrote:
> generic_file_read_iter() supports a new flag RWF_NONBLOCK which says that we
> only want to read the data if it's already in the page cache.
>
> Additionally, a few filesystems have to bail out early when RWF_NONBLOCK
> is set, because the operation would block. Christoph Hellwig contributed
> this code.
>
> Signed-off-by: Milosz Tanski <milosz@adfin.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Ceph bits
Acked-by: Sage Weil <sage@redhat.com>
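
From user space, the intended calling pattern for the read flag described above is "try the page cache first, fall back to a blocking read otherwise". The sketch below is an illustration under assumptions, not part of this series: it uses RWF_NOWAIT, which to my knowledge is the cache-only read flag that mainline kernels expose through preadv2 and which is not the RWF_NONBLOCK value proposed here. read_cached_or_block and demo_roundtrip are hypothetical names, and in a real application the fallback branch is where the request would be handed to a thread pool.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() and mkstemp() declarations */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008   /* mainline uapi value; assumption, not from this series */
#endif

/* Try a cache-only read first; on any failure of the fast path (EAGAIN when
 * the data is not cached, or an unsupported syscall/flag) fall back to a
 * blocking pread(). *fell_back is set to 1 when the slow path was taken. */
static ssize_t read_cached_or_block(int fd, void *buf, size_t len, off_t off,
                                    int *fell_back)
{
    ssize_t n = -1;
#ifdef SYS_preadv2
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    /* The raw syscall takes the offset as low and high halves. */
    long lo = (long)off;
    long hi = (long)((((uint64_t)off) >> (sizeof(long) * 4)) >> (sizeof(long) * 4));

    n = syscall(SYS_preadv2, fd, &iov, 1, lo, hi, RWF_NOWAIT);
#else
    errno = ENOSYS;
#endif
    *fell_back = 0;
    if (n >= 0)
        return n;

    *fell_back = 1;             /* a thread pool hand-off would go here */
    return pread(fd, buf, len, off);
}

/* Writes five bytes to a temporary file and reads them back.
 * Returns 0 on success, -1 on failure. */
static int demo_roundtrip(void)
{
    char path[] = "/tmp/rwf-demo-XXXXXX";
    char out[5] = { 0 };
    int fell_back = 0;
    ssize_t n;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    n = read_cached_or_block(fd, out, sizeof(out), 0, &fell_back);
    close(fd);
    return (n == 5 && memcmp(out, "hello", 5) == 0) ? 0 : -1;
}
```

Falling back on every error keeps the sketch short: a genuine error such as EBADF is simply reported again by the blocking pread().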
> ---
> fs/ceph/file.c | 2 ++
> fs/cifs/file.c | 6 ++++++
> fs/nfs/file.c | 5 ++++-
> fs/ocfs2/file.c | 6 ++++++
> fs/pipe.c | 3 ++-
> fs/read_write.c | 38 +++++++++++++++++++++++++-------------
> fs/xfs/xfs_file.c | 4 ++++
> include/linux/fs.h | 3 +++
> mm/filemap.c | 18 ++++++++++++++++++
> mm/shmem.c | 4 ++++
> 10 files changed, 74 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index d7e0da8..b798b5c 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -822,6 +822,8 @@ again:
> if ((got & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)) == 0 ||
> (iocb->ki_filp->f_flags & O_DIRECT) ||
> (fi->flags & CEPH_F_SYNC)) {
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
>
> dout("aio_sync_read %p %llx.%llx %llu~%u got cap refs on %s\n",
> inode, ceph_vinop(inode), iocb->ki_pos, (unsigned)len,
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 3e4d00a..c485afa 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3005,6 +3005,9 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to)
> struct cifs_readdata *rdata, *tmp;
> struct list_head rdata_list;
>
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
> +
> len = iov_iter_count(to);
> if (!len)
> return 0;
> @@ -3123,6 +3126,9 @@ cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to)
> ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0))
> return generic_file_read_iter(iocb, to);
>
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
> +
> /*
> * We need to hold the sem to be sure nobody modifies lock list
> * with a brlock that prevents reading.
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 2ab6f00..aa9046f 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -171,8 +171,11 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
> struct inode *inode = file_inode(iocb->ki_filp);
> ssize_t result;
>
> - if (iocb->ki_filp->f_flags & O_DIRECT)
> + if (iocb->ki_filp->f_flags & O_DIRECT) {
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
> return nfs_file_direct_read(iocb, to, iocb->ki_pos);
> + }
>
> dprintk("NFS: read(%pD2, %zu@%lu)\n",
> iocb->ki_filp,
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 324dc93..bb66ca4 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2472,6 +2472,12 @@ static ssize_t ocfs2_file_read_iter(struct kiocb *iocb,
> filp->f_path.dentry->d_name.name,
> to->nr_segs); /* GRRRRR */
>
> + /*
> + * No non-blocking reads for ocfs2 for now. Might be doable with
> + * non-blocking cluster lock helpers.
> + */
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
>
> if (!inode) {
> ret = -EINVAL;
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 21981e5..212bf68 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -302,7 +302,8 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> */
> if (ret)
> break;
> - if (filp->f_flags & O_NONBLOCK) {
> + if ((filp->f_flags & O_NONBLOCK) ||
> + (iocb->ki_rwflags & RWF_NONBLOCK)) {
> ret = -EAGAIN;
> break;
> }
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 907735c..cba7d4c 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -835,14 +835,19 @@ static ssize_t do_readv_writev(int type, struct file *file,
> file_start_write(file);
> }
>
> - if (iter_fn)
> + if (iter_fn) {
> ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
> pos, iter_fn, flags);
> - else if (fnv)
> - ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> - pos, fnv);
> - else
> - ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
> + } else {
> + if (type == READ && (flags & RWF_NONBLOCK))
> + return -EAGAIN;
> +
> + if (fnv)
> + ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> + pos, fnv);
> + else
> + ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
> + }
>
> if (type != READ)
> file_end_write(file);
> @@ -866,8 +871,10 @@ ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
> return -EBADF;
> if (!(file->f_mode & FMODE_CAN_READ))
> return -EINVAL;
> - if (flags & ~0)
> + if (flags & ~RWF_NONBLOCK)
> return -EINVAL;
> + if ((file->f_flags & O_DIRECT) && (flags & RWF_NONBLOCK))
> + return -EAGAIN;
>
> return do_readv_writev(READ, file, vec, vlen, pos, flags);
> }
> @@ -1069,14 +1076,19 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
> file_start_write(file);
> }
>
> - if (iter_fn)
> + if (iter_fn) {
> ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
> pos, iter_fn, flags);
> - else if (fnv)
> - ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> - pos, fnv);
> - else
> - ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
> + } else {
> + if (type == READ && (flags & RWF_NONBLOCK))
> + return -EAGAIN;
> +
> + if (fnv)
> + ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> + pos, fnv);
> + else
> + ret = do_loop_readv_writev(file, iov, nr_segs, pos, fn);
> + }
>
> if (type != READ)
> file_end_write(file);
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index eb596b4..b1f6334 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -246,6 +246,10 @@ xfs_file_read_iter(
>
> XFS_STATS_INC(xs_read_calls);
>
> + /* XXX: need a non-blocking iolock helper, shouldn't be too hard */
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
> +
> if (unlikely(file->f_flags & O_DIRECT))
> ioflags |= XFS_IO_ISDIRECT;
> if (file->f_mode & FMODE_NOCMTIME)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ed5711..eaebd99 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1459,6 +1459,9 @@ struct block_device_operations;
> #define HAVE_COMPAT_IOCTL 1
> #define HAVE_UNLOCKED_IOCTL 1
>
> +/* These flags are used for the readv/writev syscalls with flags. */
> +#define RWF_NONBLOCK 0x00000001
> +
> struct iov_iter;
>
> struct file_operations {
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 530c263..09d3af3 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1494,6 +1494,8 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
> find_page:
> page = find_get_page(mapping, index);
> if (!page) {
> + if (flags & RWF_NONBLOCK)
> + goto would_block;
> page_cache_sync_readahead(mapping,
> ra, filp,
> index, last_index - index);
> @@ -1585,6 +1587,11 @@ page_ok:
> continue;
>
> page_not_up_to_date:
> + if (flags & RWF_NONBLOCK) {
> + page_cache_release(page);
> + goto would_block;
> + }
> +
> /* Get exclusive access to the page ... */
> error = lock_page_killable(page);
> if (unlikely(error))
> @@ -1604,6 +1611,12 @@ page_not_up_to_date_locked:
> goto page_ok;
> }
>
> + if (flags & RWF_NONBLOCK) {
> + unlock_page(page);
> + page_cache_release(page);
> + goto would_block;
> + }
> +
> readpage:
> /*
> * A previous I/O error may have been due to temporary
> @@ -1674,6 +1687,8 @@ no_cached_page:
> goto readpage;
> }
>
> +would_block:
> + error = -EAGAIN;
> out:
> ra->prev_pos = prev_index;
> ra->prev_pos <<= PAGE_CACHE_SHIFT;
> @@ -1707,6 +1722,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> size_t count = iov_iter_count(iter);
> loff_t size;
>
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
> +
> if (!count)
> goto out; /* skip atime */
> size = i_size_read(inode);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index cd6fc75..5c30f04 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1531,6 +1531,10 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> ssize_t retval = 0;
> loff_t *ppos = &iocb->ki_pos;
>
> + /* XXX: should be easily supportable */
> + if (iocb->ki_rwflags & RWF_NONBLOCK)
> + return -EAGAIN;
> +
> /*
> * Might this read be for a stacking filesystem? Then when reading
> * holes of a sparse file, we actually need to allocate those pages,
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
* Re: [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics
2014-11-05 21:14 ` [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics Milosz Tanski
2014-11-06 23:46 ` Jeff Moyer
@ 2014-11-10 16:07 ` Sage Weil
1 sibling, 0 replies; 27+ messages in thread
From: Sage Weil @ 2014-11-10 16:07 UTC (permalink / raw)
To: Milosz Tanski
Cc: linux-kernel, Christoph Hellwig, Christoph Hellwig, linux-fsdevel,
linux-aio, Mel Gorman, Volker Lendecke, Tejun Heo, Jeff Moyer,
Theodore Ts'o, Al Viro, linux-api, Michael Kerrisk,
linux-arch, ceph-devel, fuse-devel, linux-nfs, ocfs2-devel,
linux-mm
On Wed, 5 Nov 2014, Milosz Tanski wrote:
> From: Christoph Hellwig <hch@lst.de>
>
> With the new read/write with flags syscalls we can support a flag
> to enable O_DSYNC semantics on a per-operation basis. This is
> useful to implement protocols like SMB, NFS or SCSI that have such
> per-operation flags.
>
> Example program below:
>
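> cat > pwritev2.c << EOF
> #include <fcntl.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/syscall.h>
> #include <sys/uio.h>
> #include <unistd.h>
>
> /* __NR_pwritev2 must be provided by the headers of a kernel with this series applied. */
> #ifndef RWF_DSYNC
> #define RWF_DSYNC 0x00000002 /* value from include/linux/fs.h in this patch */
> #endif
>
> /* Split a 64-bit offset into the low and high halves the syscall expects. */
> #define LO_HI_LONG(val) \
> 	(off_t) val, \
> 	(off_t) ((((uint64_t) (val)) >> (sizeof (long) * 4)) >> (sizeof (long) * 4))
>
> static ssize_t
> pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset, int flags)
> {
> 	return syscall(__NR_pwritev2, fd, iov, iovcnt, LO_HI_LONG(offset),
> 		       flags);
> }
>
> int main(int argc, char **argv)
> {
> 	char buf[1024];
> 	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
> 	int fd, ret;
>
> 	if (argc < 2) {
> 		fprintf(stderr, "usage: %s <file>\n", argv[0]);
> 		return 1;
> 	}
>
> 	fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0666);
> 	if (fd < 0) {
> 		perror("open");
> 		return 1;
> 	}
>
> 	memset(buf, 0xfe, sizeof(buf));
>
> 	/* Write 1 KiB at offset 0 with per-operation O_DSYNC semantics. */
> 	ret = pwritev2(fd, &iov, 1, 0, RWF_DSYNC);
> 	if (ret < 0)
> 		perror("pwritev2");
> 	else
> 		printf("ret = %d\n", ret);
>
> 	return 0;
> }
> EOF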
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [milosz@adfin.com: added flag check to compat_do_readv_writev()]
> Signed-off-by: Milosz Tanski <milosz@adfin.com>
Ceph bits
Acked-by: Sage Weil <sage@redhat.com>
> ---
> fs/ceph/file.c | 4 +++-
> fs/fuse/file.c | 2 ++
> fs/nfs/file.c | 10 ++++++----
> fs/ocfs2/file.c | 6 ++++--
> fs/read_write.c | 20 +++++++++++++++-----
> include/linux/fs.h | 3 ++-
> mm/filemap.c | 4 +++-
> 7 files changed, 35 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index b798b5c..2d4e15a 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -983,7 +983,9 @@ retry_snap:
> ceph_put_cap_refs(ci, got);
>
> if (written >= 0 &&
> - ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host) ||
> + ((file->f_flags & O_SYNC) ||
> + IS_SYNC(file->f_mapping->host) ||
> + (iocb->ki_rwflags & RWF_DSYNC) ||
> ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
> err = vfs_fsync_range(file, pos, pos + written - 1, 1);
> if (err < 0)
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index caa8d95..bb4fb23 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1248,6 +1248,8 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> written += written_buffered;
> iocb->ki_pos = pos + written_buffered;
> } else {
> + if (iocb->ki_rwflags & RWF_DSYNC)
> + return -EINVAL;
> written = fuse_perform_write(file, mapping, from, pos);
> if (written >= 0)
> iocb->ki_pos = pos + written;
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index aa9046f..c59b0b7 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -652,13 +652,15 @@ static const struct vm_operations_struct nfs_file_vm_ops = {
> .remap_pages = generic_file_remap_pages,
> };
>
> -static int nfs_need_sync_write(struct file *filp, struct inode *inode)
> +static int nfs_need_sync_write(struct kiocb *iocb, struct inode *inode)
> {
> struct nfs_open_context *ctx;
>
> - if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
> + if (IS_SYNC(inode) ||
> + (iocb->ki_filp->f_flags & O_DSYNC) ||
> + (iocb->ki_rwflags & RWF_DSYNC))
> return 1;
> - ctx = nfs_file_open_context(filp);
> + ctx = nfs_file_open_context(iocb->ki_filp);
> if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags) ||
> nfs_ctx_key_to_expire(ctx))
> return 1;
> @@ -705,7 +707,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
> written = result;
>
> /* Return error values for O_DSYNC and IS_SYNC() */
> - if (result >= 0 && nfs_need_sync_write(file, inode)) {
> + if (result >= 0 && nfs_need_sync_write(iocb, inode)) {
> int err = vfs_fsync(file, 0);
> if (err < 0)
> result = err;
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index bb66ca4..8f9a86b 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2374,8 +2374,10 @@ out_dio:
> /* buffered aio wouldn't have proper lock coverage today */
> BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
>
> - if (((file->f_flags & O_DSYNC) && !direct_io) || IS_SYNC(inode) ||
> - ((file->f_flags & O_DIRECT) && !direct_io)) {
> + if (((file->f_flags & O_DSYNC) && !direct_io) ||
> + IS_SYNC(inode) ||
> + ((file->f_flags & O_DIRECT) && !direct_io) ||
> + (iocb->ki_rwflags & RWF_DSYNC)) {
> ret = filemap_fdatawrite_range(file->f_mapping, *ppos,
> *ppos + count - 1);
> if (ret < 0)
> diff --git a/fs/read_write.c b/fs/read_write.c
> index cba7d4c..3443265 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -839,8 +839,13 @@ static ssize_t do_readv_writev(int type, struct file *file,
> ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
> pos, iter_fn, flags);
> } else {
> - if (type == READ && (flags & RWF_NONBLOCK))
> - return -EAGAIN;
> + if (type == READ) {
> + if (flags & RWF_NONBLOCK)
> + return -EAGAIN;
> + } else {
> + if (flags & RWF_DSYNC)
> + return -EINVAL;
> + }
>
> if (fnv)
> ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> @@ -888,7 +893,7 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
> return -EBADF;
> if (!(file->f_mode & FMODE_CAN_WRITE))
> return -EINVAL;
> - if (flags & ~0)
> + if (flags & ~RWF_DSYNC)
> return -EINVAL;
>
> return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
> @@ -1080,8 +1085,13 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
> ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
> pos, iter_fn, flags);
> } else {
> - if (type == READ && (flags & RWF_NONBLOCK))
> - return -EAGAIN;
> + if (type == READ) {
> + if (flags & RWF_NONBLOCK)
> + return -EAGAIN;
> + } else {
> + if (flags & RWF_DSYNC)
> + return -EINVAL;
> + }
>
> if (fnv)
> ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7d0e116..7786b88 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1460,7 +1460,8 @@ struct block_device_operations;
> #define HAVE_UNLOCKED_IOCTL 1
>
> /* These flags are used for the readv/writev syscalls with flags. */
> -#define RWF_NONBLOCK 0x00000001
> +#define RWF_NONBLOCK 0x00000001
> +#define RWF_DSYNC 0x00000002
>
> struct iov_iter;
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6107058..4fbef99 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2669,7 +2669,9 @@ int generic_write_sync(struct kiocb *iocb, loff_t count)
> struct file *file = iocb->ki_filp;
>
> if (count > 0 &&
> - ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
> + ((file->f_flags & O_DSYNC) ||
> + (iocb->ki_rwflags & RWF_DSYNC) ||
> + IS_SYNC(file->f_mapping->host))) {
> bool fdatasync = !(file->f_flags & __O_SYNC);
> ssize_t ret = 0;
>
> --
> 1.9.1
>
Thread overview: 27+ messages
2014-11-05 21:14 [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 1/7] vfs: Prepare for adding a new preadv/pwritev with user flags Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 2/7] vfs: Define new syscalls preadv2,pwritev2 Milosz Tanski
2014-11-06 23:25 ` Jeff Moyer
2014-11-07 16:28 ` Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 3/7] x86: wire up preadv2 and pwritev2 Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 4/7] vfs: RWF_NONBLOCK flag for preadv2 Milosz Tanski
2014-11-10 16:07 ` Sage Weil
2014-11-05 21:14 ` [PATCH v5 5/7] xfs: add RWF_NONBLOCK support Milosz Tanski
2014-11-05 21:14 ` [PATCH v5 6/7] fs: pass iocb to generic_write_sync Milosz Tanski
2014-11-06 10:18 ` [Cluster-devel] " Steven Whitehouse
2014-11-06 10:52 ` [Linux-NTFS-Dev] " Anton Altaparmakov
2014-11-06 16:14 ` Milosz Tanski
2014-11-06 12:04 ` Jan Kara
2014-11-05 21:14 ` [PATCH v5 7/7] fs: add a flag for per-operation O_DSYNC semantics Milosz Tanski
2014-11-06 23:46 ` Jeff Moyer
[not found] ` <x49r3xf28qn.fsf-RRHT56Q3PSP4kTEheFKJxxDDeQx5vsVwAInAS/Ez/D0@public.gmane.org>
2014-11-07 4:22 ` [PATCH v5 7/7] " Anton Altaparmakov
2014-11-07 5:52 ` [fuse-devel] " Anand Avati
2014-11-07 6:43 ` Anton Altaparmakov
2014-11-07 14:21 ` Roger Willcocks
2014-11-07 19:58 ` Milosz Tanski
2014-11-10 16:07 ` [PATCH v5 7/7] fs: " Sage Weil
[not found] ` <cover.1415220890.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
2014-11-06 7:56 ` [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Christoph Hellwig
2014-11-06 15:46 ` Milosz Tanski
2014-11-06 15:44 ` [PATCH v2 1/2] Add preadv2/pwritev2 documentation Milosz Tanski
[not found] ` <d2cbc4795f774b521e13ac448d07a1156c6aa04d.1415288353.git.milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
2014-11-06 15:44 ` [PATCH v2 2/2] RWF_ODSYNC flag for pwritev2 Milosz Tanski
2014-11-06 16:16 ` [PATCH v5 0/7] vfs: Non-blockling buffered fs read (page cache only) Milosz Tanski