* [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now)
@ 2023-05-31 7:50 Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
` (11 more replies)
0 siblings, 12 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
[Sorry for the previous attempt that stopped at patch 8]
Hi all,
this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O. This is a spinoff from the no-bufferhead kernel
project, for which we'll want to an use iomap based buffered
write path in the block layer.
Changes since v2:
- stick to the existing behavior of returning a short write
if the buffer fallback write or sync fails
- bring back "fuse: use direct_write_fallback" which accidentally
got lost in v2
Changes since v1:
- remove current->backing_dev_info entirely
- fix the pos/end calculation in direct_write_fallback
- rename kiocb_invalidate_post_write to
kiocb_invalidate_post_direct_write
- typo fixes
diffstat:
block/fops.c | 18 +-----
fs/btrfs/file.c | 6 --
fs/ceph/file.c | 6 --
fs/direct-io.c | 10 ---
fs/ext4/file.c | 11 +---
fs/f2fs/file.c | 3 -
fs/fuse/file.c | 4 -
fs/gfs2/file.c | 6 --
fs/iomap/buffered-io.c | 9 ++-
fs/iomap/direct-io.c | 88 ++++++++++++---------------------
fs/nfs/file.c | 6 --
fs/ntfs/file.c | 2
fs/ntfs3/file.c | 3 -
fs/xfs/xfs_file.c | 6 --
fs/zonefs/file.c | 4 -
include/linux/fs.h | 5 -
include/linux/pagemap.h | 4 +
include/linux/sched.h | 3 -
mm/filemap.c | 126 ++++++++++++++++++++++++++----------------------
19 files changed, 125 insertions(+), 195 deletions(-)
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 9:04 ` Damien Le Moal
` (3 more replies)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete Christoph Hellwig
` (10 subsequent siblings)
11 siblings, 4 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions"). Remove the field and all assignments to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/btrfs/file.c | 6 +-----
fs/ceph/file.c | 4 ----
fs/ext4/file.c | 2 --
fs/f2fs/file.c | 2 --
fs/fuse/file.c | 4 ----
fs/gfs2/file.c | 2 --
fs/nfs/file.c | 5 +----
fs/ntfs/file.c | 2 --
fs/ntfs3/file.c | 3 ---
fs/xfs/xfs_file.c | 4 ----
include/linux/sched.h | 3 ---
mm/filemap.c | 3 ---
12 files changed, 2 insertions(+), 38 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0e4..ecd43ab66fa6c7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1145,7 +1145,6 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
return -EAGAIN;
- current->backing_dev_info = inode_to_bdi(inode);
ret = file_remove_privs(file);
if (ret)
return ret;
@@ -1165,10 +1164,8 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
- if (ret) {
- current->backing_dev_info = NULL;
+ if (ret)
return ret;
- }
}
return 0;
@@ -1689,7 +1686,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
if (sync)
atomic_dec(&inode->sync_writers);
- current->backing_dev_info = NULL;
return num_written;
}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..c8ef72f723badd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1791,9 +1791,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
else
ceph_start_io_write(inode);
- /* We can write back this queue in page reclaim */
- current->backing_dev_info = inode_to_bdi(inode);
-
if (iocb->ki_flags & IOCB_APPEND) {
err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
if (err < 0)
@@ -1940,7 +1937,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
ceph_end_io_write(inode);
out_unlocked:
ceph_free_cap_flush(prealloc_cf);
- current->backing_dev_info = NULL;
return written ? written : err;
}
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..bc430270c23c19 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,9 +285,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (ret <= 0)
goto out;
- current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
- current->backing_dev_info = NULL;
out:
inode_unlock(inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..4f423d367a44b9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4517,9 +4517,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
- current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
- current->backing_dev_info = NULL;
if (ret > 0) {
iocb->ki_pos += ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..97d435874b14aa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1362,9 +1362,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
writethrough:
inode_lock(inode);
- /* We can write back this queue in page reclaim */
- current->backing_dev_info = inode_to_bdi(inode);
-
err = generic_write_checks(iocb, from);
if (err <= 0)
goto out;
@@ -1409,7 +1406,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
iocb->ki_pos += written;
}
out:
- current->backing_dev_info = NULL;
inode_unlock(inode);
if (written > 0)
written = generic_write_sync(iocb, written);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..904a0d6ac1a1a9 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1041,11 +1041,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
goto out_unlock;
}
- current->backing_dev_info = inode_to_bdi(inode);
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
- current->backing_dev_info = NULL;
if (ret > 0) {
iocb->ki_pos += ret;
written += ret;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index f0edf5a36237d1..665ce3fc62eaf4 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -648,11 +648,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
since = filemap_sample_wb_err(file->f_mapping);
nfs_start_io_write(inode);
result = generic_write_checks(iocb, from);
- if (result > 0) {
- current->backing_dev_info = inode_to_bdi(inode);
+ if (result > 0)
result = generic_perform_write(iocb, from);
- current->backing_dev_info = NULL;
- }
nfs_end_io_write(inode);
if (result <= 0)
goto out;
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index c481b14e4fd989..e296f804a9c442 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1911,11 +1911,9 @@ static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
inode_lock(vi);
/* We can write back this queue in page reclaim. */
- current->backing_dev_info = inode_to_bdi(vi);
err = ntfs_prepare_file_for_write(iocb, from);
if (iov_iter_count(from) && !err)
written = ntfs_perform_write(file, from, iocb->ki_pos);
- current->backing_dev_info = NULL;
inode_unlock(vi);
iocb->ki_pos += written;
if (likely(written > 0))
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 9a3d55c367d920..86d16a2c8339ca 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -820,7 +820,6 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
if (!pages)
return -ENOMEM;
- current->backing_dev_info = inode_to_bdi(inode);
err = file_remove_privs(file);
if (err)
goto out;
@@ -993,8 +992,6 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
out:
kfree(pages);
- current->backing_dev_info = NULL;
-
if (err < 0)
return err;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index aede746541f8ae..431c3fd0e2b598 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -717,9 +717,6 @@ xfs_file_buffered_write(
if (ret)
goto out;
- /* We can write back this queue in page reclaim */
- current->backing_dev_info = inode_to_bdi(inode);
-
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
@@ -753,7 +750,6 @@ xfs_file_buffered_write(
goto write_retry;
}
- current->backing_dev_info = NULL;
out:
if (iolock)
xfs_iunlock(ip, iolock);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f4d..54780571fe9a07 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -41,7 +41,6 @@
/* task_struct member predeclarations (sorted alphabetically): */
struct audit_context;
-struct backing_dev_info;
struct bio_list;
struct blk_plug;
struct bpf_local_storage;
@@ -1186,8 +1185,6 @@ struct task_struct {
/* VM state: */
struct reclaim_state *reclaim_state;
- struct backing_dev_info *backing_dev_info;
-
struct io_context *io_context;
#ifdef CONFIG_COMPACTION
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e58..33b54660ad2b39 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3991,8 +3991,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
ssize_t err;
ssize_t status;
- /* We can write back this queue in page reclaim */
- current->backing_dev_info = inode_to_bdi(inode);
err = file_remove_privs(file);
if (err)
goto out;
@@ -4053,7 +4051,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
iocb->ki_pos += written;
}
out:
- current->backing_dev_info = NULL;
return written ? written : err;
}
EXPORT_SYMBOL(__generic_file_write_iter);
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
` (9 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Move the ki_pos update down a bit to prepare for a better common
helper that invalidates pages based of an iocb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/iomap/direct-io.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 019cc87d0fb339..6207a59d2162e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -94,7 +94,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
if (offset + ret > dio->i_size &&
!(dio->flags & IOMAP_DIO_WRITE))
ret = dio->i_size - offset;
- iocb->ki_pos += ret;
}
/*
@@ -120,19 +119,21 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
}
inode_dio_end(file_inode(iocb->ki_filp));
- /*
- * If this is a DSYNC write, make sure we push it to stable storage now
- * that we've written data.
- */
- if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
- ret = generic_write_sync(iocb, ret);
- if (ret > 0)
- ret += dio->done_before;
+ if (ret > 0) {
+ iocb->ki_pos += ret;
+ /*
+ * If this is a DSYNC write, make sure we push it to stable
+ * storage now that we've written data.
+ */
+ if (dio->flags & IOMAP_DIO_NEED_SYNC)
+ ret = generic_write_sync(iocb, ret);
+ if (ret > 0)
+ ret += dio->done_before;
+ }
trace_iomap_dio_complete(iocb, dio->error, ret);
kfree(dio);
-
return ret;
}
EXPORT_SYMBOL_GPL(iomap_dio_complete);
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-06-01 4:06 ` Theodore Ts'o
2023-05-31 7:50 ` [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper Christoph Hellwig
` (8 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
All callers of generic_perform_write need to updated ki_pos, move it into
common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
fs/ceph/file.c | 2 --
fs/ext4/file.c | 9 +++------
fs/f2fs/file.c | 1 -
fs/nfs/file.c | 1 -
mm/filemap.c | 8 ++++----
5 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
* can not run at the same time
*/
written = generic_perform_write(iocb, from);
- if (likely(written >= 0))
- iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
out:
inode_unlock(inode);
- if (likely(ret > 0)) {
- iocb->ki_pos += ret;
- ret = generic_write_sync(iocb, ret);
- }
-
- return ret;
+ if (unlikely(ret <= 0))
+ return ret;
+ return generic_write_sync(iocb, ret);
}
static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
ret = generic_perform_write(iocb, from);
if (ret > 0) {
- iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
written = result;
- iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
- return written ? written : status;
+ if (!written)
+ return status;
+ iocb->ki_pos += written;
+ return written;
}
EXPORT_SYMBOL(generic_perform_write);
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
- iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
- if (likely(written > 0))
- iocb->ki_pos += written;
}
out:
return written ? written : err;
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (2 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper Christoph Hellwig
` (7 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
block/fops.c | 18 +++---------------
include/linux/pagemap.h | 2 ++
mm/filemap.c | 30 ++++++++++++++++++------------
3 files changed, 23 insertions(+), 27 deletions(-)
diff --git a/block/fops.c b/block/fops.c
index 58d0aebc7313a8..575171049c5d83 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
goto reexpand; /* skip atime */
if (iocb->ki_flags & IOCB_DIRECT) {
- struct address_space *mapping = iocb->ki_filp->f_mapping;
-
- if (iocb->ki_flags & IOCB_NOWAIT) {
- if (filemap_range_needs_writeback(mapping, pos,
- pos + count - 1)) {
- ret = -EAGAIN;
- goto reexpand;
- }
- } else {
- ret = filemap_write_and_wait_range(mapping, pos,
- pos + count - 1);
- if (ret < 0)
- goto reexpand;
- }
-
+ ret = kiocb_write_and_wait(iocb, count);
+ if (ret < 0)
+ goto reexpand;
file_accessed(iocb->ki_filp);
ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
int invalidate_inode_pages2(struct address_space *mapping);
int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+
int write_inode_now(struct inode *, int sync);
int filemap_fdatawrite(struct address_space *);
int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
void __filemap_set_wb_err(struct address_space *mapping, int err);
int filemap_fdatawrite_wbc(struct address_space *mapping,
struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
static inline int filemap_write_and_wait(struct address_space *mapping)
{
diff --git a/mm/filemap.c b/mm/filemap.c
index 15907af4a57ff5..5fcd5227f9cae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
}
EXPORT_SYMBOL_GPL(filemap_read);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
+ loff_t pos = iocb->ki_pos;
+ loff_t end = pos + count - 1;
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (filemap_range_needs_writeback(mapping, pos, end))
+ return -EAGAIN;
+ return 0;
+ }
+
+ return filemap_write_and_wait_range(mapping, pos, end);
+}
+
/**
* generic_file_read_iter - generic filesystem read routine
* @iocb: kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
- if (iocb->ki_flags & IOCB_NOWAIT) {
- if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
- iocb->ki_pos + count - 1))
- return -EAGAIN;
- } else {
- retval = filemap_write_and_wait_range(mapping,
- iocb->ki_pos,
- iocb->ki_pos + count - 1);
- if (retval < 0)
- return retval;
- }
-
+ retval = kiocb_write_and_wait(iocb, count);
+ if (retval < 0)
+ return retval;
file_accessed(file);
retval = mapping->a_ops->direct_IO(iocb, iter);
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (3 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper Christoph Hellwig
` (6 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 48 ++++++++++++++++++++++++-----------------
2 files changed, 29 insertions(+), 20 deletions(-)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
int invalidate_inode_pages2(struct address_space *mapping);
int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
int write_inode_now(struct inode *, int sync);
int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 5fcd5227f9cae2..a1cb01a4b8046a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
return filemap_write_and_wait_range(mapping, pos, end);
}
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
+ loff_t pos = iocb->ki_pos;
+ loff_t end = pos + count - 1;
+ int ret;
+
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ /* we could block if there are any pages in the range */
+ if (filemap_range_has_page(mapping, pos, end))
+ return -EAGAIN;
+ } else {
+ ret = filemap_write_and_wait_range(mapping, pos, end);
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * After a write we want buffered reads to be sure to go to disk to get
+ * the new data. We invalidate clean cached page from the region we're
+ * about to write. We do this *before* the write so that we can return
+ * without clobbering -EIOCBQUEUED from ->direct_IO().
+ */
+ return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+ end >> PAGE_SHIFT);
+}
+
/**
* generic_file_read_iter - generic filesystem read routine
* @iocb: kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
- if (iocb->ki_flags & IOCB_NOWAIT) {
- /* If there are pages to writeback, return */
- if (filemap_range_has_page(file->f_mapping, pos,
- pos + write_len - 1))
- return -EAGAIN;
- } else {
- written = filemap_write_and_wait_range(mapping, pos,
- pos + write_len - 1);
- if (written)
- goto out;
- }
-
- /*
- * After a write we want buffered reads to be sure to go to disk to get
- * the new data. We invalidate clean cached page from the region we're
- * about to write. We do this *before* the write so that we can return
- * without clobbering -EIOCBQUEUED from ->direct_IO().
- */
- written = invalidate_inode_pages2_range(mapping,
- pos >> PAGE_SHIFT, end);
/*
* If a page can not be invalidated, return 0 to fall back
* to buffered write.
*/
+ written = kiocb_invalidate_pages(iocb, write_len);
if (written) {
if (written == -EBUSY)
return 0;
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (4 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write Christoph Hellwig
` (5 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Add a helper to invalidate page cache after a dio write.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
fs/direct-io.c | 10 ++--------
fs/iomap/direct-io.c | 12 ++----------
include/linux/fs.h | 5 -----
include/linux/pagemap.h | 1 +
mm/filemap.c | 37 ++++++++++++++++++++-----------------
5 files changed, 25 insertions(+), 40 deletions(-)
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..4f9069aee0fe19 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
* zeros from unwritten extents.
*/
if (flags & DIO_COMPLETE_INVALIDATE &&
- ret > 0 && dio_op == REQ_OP_WRITE &&
- dio->inode->i_mapping->nrpages) {
- err = invalidate_inode_pages2_range(dio->inode->i_mapping,
- offset >> PAGE_SHIFT,
- (offset + ret - 1) >> PAGE_SHIFT);
- if (err)
- dio_warn_stale_pagecache(dio->iocb->ki_filp);
- }
+ ret > 0 && dio_op == REQ_OP_WRITE)
+ kiocb_invalidate_post_direct_write(dio->iocb, ret);
inode_dio_end(dio->inode);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..0795c54a745bca 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
{
const struct iomap_dio_ops *dops = dio->dops;
struct kiocb *iocb = dio->iocb;
- struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
ssize_t ret = dio->error;
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
* ->end_io() when necessary, otherwise a racing buffer read would cache
* zeros from unwritten extents.
*/
- if (!dio->error && dio->size &&
- (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
- int err;
- err = invalidate_inode_pages2_range(inode->i_mapping,
- offset >> PAGE_SHIFT,
- (offset + dio->size - 1) >> PAGE_SHIFT);
- if (err)
- dio_warn_stale_pagecache(iocb->ki_filp);
- }
+ if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+ kiocb_invalidate_post_direct_write(iocb, dio->size);
inode_dio_end(file_inode(iocb->ki_filp));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb2411..91021b4e1f6f48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
}
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
extern void inode_set_flags(struct inode *inode, unsigned int flags,
unsigned int mask);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..6ecc4aaf5e3d51 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
int write_inode_now(struct inode *, int sync);
int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index a1cb01a4b8046a..ddb6f8aa86d6ca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
/*
* Warn about a page cache invalidation failure during a direct I/O write.
*/
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
{
static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
}
}
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+ if (mapping->nrpages &&
+ invalidate_inode_pages2_range(mapping,
+ iocb->ki_pos >> PAGE_SHIFT,
+ (iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+ dio_warn_stale_pagecache(iocb->ki_filp);
+}
+
ssize_t
generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
{
- struct file *file = iocb->ki_filp;
- struct address_space *mapping = file->f_mapping;
- struct inode *inode = mapping->host;
- loff_t pos = iocb->ki_pos;
- ssize_t written;
- size_t write_len;
- pgoff_t end;
-
- write_len = iov_iter_count(from);
- end = (pos + write_len - 1) >> PAGE_SHIFT;
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
+ size_t write_len = iov_iter_count(from);
+ ssize_t written;
/*
* If a page can not be invalidated, return 0 to fall back
@@ -3855,7 +3859,7 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
if (written) {
if (written == -EBUSY)
return 0;
- goto out;
+ return written;
}
written = mapping->a_ops->direct_IO(iocb, from);
@@ -3877,11 +3881,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
*
* Skip invalidation for async writes or if mapping has no pages.
*/
- if (written > 0 && mapping->nrpages &&
- invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end))
- dio_warn_stale_pagecache(file);
-
if (written > 0) {
+ struct inode *inode = mapping->host;
+ loff_t pos = iocb->ki_pos;
+
+ kiocb_invalidate_post_direct_write(iocb, written);
pos += written;
write_len -= written;
if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) {
@@ -3892,7 +3896,6 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
}
if (written != -EIOCBQUEUED)
iov_iter_revert(from, write_len - iov_iter_count(from));
-out:
return written;
}
EXPORT_SYMBOL(generic_file_direct_write);
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (5 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Christoph Hellwig
` (4 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
All callers of iomap_file_buffered_write need to updated ki_pos, move it
into common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
---
fs/gfs2/file.c | 4 +---
fs/iomap/buffered-io.c | 9 ++++++---
fs/xfs/xfs_file.c | 2 --
fs/zonefs/file.c | 4 +---
4 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 904a0d6ac1a1a9..c6a7555d5ad8bb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1044,10 +1044,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
- if (ret > 0) {
- iocb->ki_pos += ret;
+ if (ret > 0)
written += ret;
- }
if (inode == sdp->sd_rindex)
gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
.len = iov_iter_count(i),
.flags = IOMAP_WRITE,
};
- int ret;
+ ssize_t ret;
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
- if (iter.pos == iocb->ki_pos)
+
+ if (unlikely(ret < 0))
return ret;
- return iter.pos - iocb->ki_pos;
+ ret = iter.pos - iocb->ki_pos;
+ iocb->ki_pos += ret;
+ return ret;
}
EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 431c3fd0e2b598..d57443db633637 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -720,8 +720,6 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
- if (likely(ret >= 0))
- iocb->ki_pos += ret;
/*
* If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
goto inode_unlock;
ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
- if (ret > 0)
- iocb->ki_pos += ret;
- else if (ret == -EIO)
+ if (ret == -EIO)
zonefs_io_error(inode, true);
inode_unlock:
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (6 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
` (3 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Use the common helpers for direct I/O page invalidation instead of
open coding the logic. This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/iomap/direct-io.c | 55 ++++++++++++++++----------------------------
1 file changed, 20 insertions(+), 35 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0795c54a745bca..6bd14691f96e07 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before)
{
- struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = file_inode(iocb->ki_filp);
struct iomap_iter iomi = {
.inode = inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
.flags = IOMAP_DIRECT,
.private = private,
};
- loff_t end = iomi.pos + iomi.len - 1, ret = 0;
bool wait_for_completion =
is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
struct blk_plug plug;
struct iomap_dio *dio;
+ loff_t ret = 0;
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ iomi.flags |= IOMAP_NOWAIT;
+
if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
- if (iocb->ki_flags & IOCB_NOWAIT) {
- if (filemap_range_needs_writeback(mapping, iomi.pos,
- end)) {
- ret = -EAGAIN;
- goto out_free_dio;
- }
- iomi.flags |= IOMAP_NOWAIT;
- }
-
if (user_backed_iter(iter))
dio->flags |= IOMAP_DIO_DIRTY;
+
+ ret = kiocb_write_and_wait(iocb, iomi.len);
+ if (ret)
+ goto out_free_dio;
} else {
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
- if (iocb->ki_flags & IOCB_NOWAIT) {
- if (filemap_range_has_page(mapping, iomi.pos, end)) {
- ret = -EAGAIN;
+ if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+ ret = -EAGAIN;
+ if (iomi.pos >= dio->i_size ||
+ iomi.pos + iomi.len > dio->i_size)
goto out_free_dio;
- }
- iomi.flags |= IOMAP_NOWAIT;
+ iomi.flags |= IOMAP_OVERWRITE_ONLY;
}
/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!(iocb->ki_flags & IOCB_SYNC))
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
- }
-
- if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
- ret = -EAGAIN;
- if (iomi.pos >= dio->i_size ||
- iomi.pos + iomi.len > dio->i_size)
- goto out_free_dio;
- iomi.flags |= IOMAP_OVERWRITE_ONLY;
- }
- ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
- if (ret)
- goto out_free_dio;
-
- if (iov_iter_rw(iter) == WRITE) {
/*
* Try to invalidate cache pages for the range we are writing.
* If this invalidation fails, let the caller fall back to
* buffered I/O.
*/
- if (invalidate_inode_pages2_range(mapping,
- iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
- trace_iomap_dio_invalidate_fail(inode, iomi.pos,
- iomi.len);
- ret = -ENOTBLK;
+ ret = kiocb_invalidate_pages(iocb, iomi.len);
+ if (ret) {
+ if (ret != -EAGAIN) {
+ trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+ iomi.len);
+ ret = -ENOTBLK;
+ }
goto out_free_dio;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (7 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 9:08 ` Damien Le Moal
2023-05-31 9:15 ` Miklos Szeredi
2023-05-31 7:50 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
` (2 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Add a helper dealing with handling the syncing of a buffered write fallback
for direct I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/libfs.c | 41 ++++++++++++++++++++++++++++
include/linux/fs.h | 2 ++
mm/filemap.c | 66 +++++++++++-----------------------------------
3 files changed, 58 insertions(+), 51 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a327158..5b851315eeed03 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1613,3 +1613,44 @@ u64 inode_query_iversion(struct inode *inode)
return cur >> I_VERSION_QUERIED_SHIFT;
}
EXPORT_SYMBOL(inode_query_iversion);
+
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+ ssize_t direct_written, ssize_t buffered_written)
+{
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
+ loff_t pos = iocb->ki_pos - buffered_written;
+ loff_t end = iocb->ki_pos - 1;
+ int err;
+
+ /*
+ * If the buffered write fallback returned an error, we want to return
+ * the number of bytes which were written by direct I/O, or the error
+ * code if that was zero.
+ *
+ * Note that this differs from normal direct-io semantics, which will
+ * return -EFOO even if some bytes were written.
+ */
+ if (unlikely(buffered_written < 0)) {
+ if (direct_written)
+ return direct_written;
+ return buffered_written;
+ }
+
+ /*
+ * We need to ensure that the page cache pages are written to disk and
+ * invalidated to preserve the expected O_DIRECT semantics.
+ */
+ err = filemap_write_and_wait_range(mapping, pos, end);
+ if (err < 0) {
+ /*
+ * We don't know how much we wrote, so just return the number of
+ * bytes which were direct-written
+ */
+ if (direct_written)
+ return direct_written;
+ return err;
+ }
+ invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+ return direct_written + buffered_written;
+}
+EXPORT_SYMBOL_GPL(direct_write_fallback);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 91021b4e1f6f48..6af25137543824 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+ ssize_t direct_written, ssize_t buffered_written);
ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index ddb6f8aa86d6ca..137508da5525b6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4006,23 +4006,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
- struct inode *inode = mapping->host;
- ssize_t written = 0;
- ssize_t err;
- ssize_t status;
+ struct inode *inode = mapping->host;
+ ssize_t ret;
- err = file_remove_privs(file);
- if (err)
- goto out;
+ ret = file_remove_privs(file);
+ if (ret)
+ return ret;
- err = file_update_time(file);
- if (err)
- goto out;
+ ret = file_update_time(file);
+ if (ret)
+ return ret;
if (iocb->ki_flags & IOCB_DIRECT) {
- loff_t pos, endbyte;
-
- written = generic_file_direct_write(iocb, from);
+ ret = generic_file_direct_write(iocb, from);
/*
* If the write stopped short of completing, fall back to
* buffered writes. Some filesystems do this for writes to
@@ -4030,45 +4026,13 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
* not succeed (even if it did, DAX does not handle dirty
* page-cache pages correctly).
*/
- if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
- goto out;
-
- pos = iocb->ki_pos;
- status = generic_perform_write(iocb, from);
- /*
- * If generic_perform_write() returned a synchronous error
- * then we want to return the number of bytes which were
- * direct-written, or the error code if that was zero. Note
- * that this differs from normal direct-io semantics, which
- * will return -EFOO even if some bytes were written.
- */
- if (unlikely(status < 0)) {
- err = status;
- goto out;
- }
- /*
- * We need to ensure that the page cache pages are written to
- * disk and invalidated to preserve the expected O_DIRECT
- * semantics.
- */
- endbyte = pos + status - 1;
- err = filemap_write_and_wait_range(mapping, pos, endbyte);
- if (err == 0) {
- written += status;
- invalidate_mapping_pages(mapping,
- pos >> PAGE_SHIFT,
- endbyte >> PAGE_SHIFT);
- } else {
- /*
- * We don't know how much we wrote, so just return
- * the number of bytes which were direct-written
- */
- }
- } else {
- written = generic_perform_write(iocb, from);
+ if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
+ return ret;
+ return direct_write_fallback(iocb, from, ret,
+ generic_perform_write(iocb, from));
}
-out:
- return written ? written : err;
+
+ return generic_perform_write(iocb, from);
}
EXPORT_SYMBOL(__generic_file_write_iter);
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (8 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 9:11 ` Miklos Szeredi
2023-05-31 7:50 ` [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
11 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Both callers of fuse_perform_write need to updated ki_pos, move it into
common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
fs/fuse/file.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97d435874b14aa..e60e48bf392d49 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
fuse_write_update_attr(inode, pos, res);
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
- return res > 0 ? res : err;
+ if (!res)
+ return err;
+ iocb->ki_pos += res;
+ return res;
}
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
@@ -1341,7 +1344,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = mapping->host;
ssize_t err;
struct fuse_conn *fc = get_fuse_conn(inode);
- loff_t endbyte = 0;
if (fc->writeback_cache) {
/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1375,19 +1377,20 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
if (iocb->ki_flags & IOCB_DIRECT) {
- loff_t pos = iocb->ki_pos;
+ loff_t pos, endbyte;
+
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
- pos += written;
-
- written_buffered = fuse_perform_write(iocb, mapping, from, pos);
+ written_buffered = fuse_perform_write(iocb, mapping, from,
+ iocb->ki_pos);
if (written_buffered < 0) {
err = written_buffered;
goto out;
}
- endbyte = pos + written_buffered - 1;
+ pos = iocb->ki_pos - written_buffered;
+ endbyte = iocb->ki_pos - 1;
err = filemap_write_and_wait_range(file->f_mapping, pos,
endbyte);
@@ -1399,17 +1402,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte >> PAGE_SHIFT);
written += written_buffered;
- iocb->ki_pos = pos + written_buffered;
} else {
written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
- if (written >= 0)
- iocb->ki_pos += written;
}
out:
inode_unlock(inode);
- if (written > 0)
- written = generic_write_sync(iocb, written);
-
return written ? written : err;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (9 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
11 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
pos is always equal to iocb->ki_pos, and mapping is always equal to
iocb->ki_filp->f_mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
---
fs/fuse/file.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e60e48bf392d49..025973ad813e05 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1280,13 +1280,13 @@ static inline unsigned int fuse_wr_pages(loff_t pos, size_t len,
max_pages);
}
-static ssize_t fuse_perform_write(struct kiocb *iocb,
- struct address_space *mapping,
- struct iov_iter *ii, loff_t pos)
+static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
{
+ struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
+ loff_t pos = iocb->ki_pos;
int err = 0;
ssize_t res = 0;
@@ -1383,8 +1383,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (written < 0 || !iov_iter_count(from))
goto out;
- written_buffered = fuse_perform_write(iocb, mapping, from,
- iocb->ki_pos);
+ written_buffered = fuse_perform_write(iocb, from);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1403,7 +1402,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
written += written_buffered;
} else {
- written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
+ written = fuse_perform_write(iocb, from);
}
out:
inode_unlock(inode);
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
` (10 preceding siblings ...)
2023-05-31 7:50 ` [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write Christoph Hellwig
@ 2023-05-31 7:50 ` Christoph Hellwig
2023-05-31 9:01 ` Miklos Szeredi
11 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-05-31 7:50 UTC (permalink / raw)
To: cluster-devel.redhat.com
Use the generic direct_write_fallback helper instead of duplicating the
logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
fs/fuse/file.c | 24 ++----------------------
1 file changed, 2 insertions(+), 22 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 025973ad813e05..7a72dc0a691201 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1340,7 +1340,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
ssize_t written = 0;
- ssize_t written_buffered = 0;
struct inode *inode = mapping->host;
ssize_t err;
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1377,30 +1376,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
if (iocb->ki_flags & IOCB_DIRECT) {
- loff_t pos, endbyte;
-
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
-
- written_buffered = fuse_perform_write(iocb, from);
- if (written_buffered < 0) {
- err = written_buffered;
- goto out;
- }
- pos = iocb->ki_pos - written_buffered;
- endbyte = iocb->ki_pos - 1;
-
- err = filemap_write_and_wait_range(file->f_mapping, pos,
- endbyte);
- if (err)
- goto out;
-
- invalidate_mapping_pages(file->f_mapping,
- pos >> PAGE_SHIFT,
- endbyte >> PAGE_SHIFT);
-
- written += written_buffered;
+ written = direct_write_fallback(iocb, from, written,
+ generic_perform_write(iocb, from));
} else {
written = fuse_perform_write(iocb, from);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback
2023-05-31 7:50 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
@ 2023-05-31 9:01 ` Miklos Szeredi
0 siblings, 0 replies; 34+ messages in thread
From: Miklos Szeredi @ 2023-05-31 9:01 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, 31 May 2023 at 09:51, Christoph Hellwig <hch@lst.de> wrote:
>
> Use the generic direct_write_fallback helper instead of duplicating the
> logic.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> fs/fuse/file.c | 24 ++----------------------
> 1 file changed, 2 insertions(+), 22 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 025973ad813e05..7a72dc0a691201 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1340,7 +1340,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct file *file = iocb->ki_filp;
> struct address_space *mapping = file->f_mapping;
> ssize_t written = 0;
> - ssize_t written_buffered = 0;
> struct inode *inode = mapping->host;
> ssize_t err;
> struct fuse_conn *fc = get_fuse_conn(inode);
> @@ -1377,30 +1376,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> goto out;
>
> if (iocb->ki_flags & IOCB_DIRECT) {
> - loff_t pos, endbyte;
> -
> written = generic_file_direct_write(iocb, from);
> if (written < 0 || !iov_iter_count(from))
> goto out;
> -
> - written_buffered = fuse_perform_write(iocb, from);
> - if (written_buffered < 0) {
> - err = written_buffered;
> - goto out;
> - }
> - pos = iocb->ki_pos - written_buffered;
> - endbyte = iocb->ki_pos - 1;
> -
> - err = filemap_write_and_wait_range(file->f_mapping, pos,
> - endbyte);
> - if (err)
> - goto out;
> -
> - invalidate_mapping_pages(file->f_mapping,
> - pos >> PAGE_SHIFT,
> - endbyte >> PAGE_SHIFT);
> -
> - written += written_buffered;
> + written = direct_write_fallback(iocb, from, written,
> + generic_perform_write(iocb, from));
This should use fuse_perform_write().
Last version of the patch was correct; copy-paste error?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
@ 2023-05-31 9:04 ` Damien Le Moal
2023-05-31 11:38 ` Johannes Thumshirn
` (2 subsequent siblings)
3 siblings, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2023-05-31 9:04 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 5/31/23 16:50, Christoph Hellwig wrote:
> The last user of current->backing_dev_info disappeared in commit
> b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
> functions"). Remove the field and all assignments to it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Nice cleanup !
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper
2023-05-31 7:50 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
@ 2023-05-31 9:08 ` Damien Le Moal
2023-05-31 9:15 ` Miklos Szeredi
1 sibling, 0 replies; 34+ messages in thread
From: Damien Le Moal @ 2023-05-31 9:08 UTC (permalink / raw)
To: cluster-devel.redhat.com
On 5/31/23 16:50, Christoph Hellwig wrote:
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write
2023-05-31 7:50 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
@ 2023-05-31 9:11 ` Miklos Szeredi
2023-06-01 6:34 ` Christoph Hellwig
0 siblings, 1 reply; 34+ messages in thread
From: Miklos Szeredi @ 2023-05-31 9:11 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, 31 May 2023 at 09:51, Christoph Hellwig <hch@lst.de> wrote:
>
> Both callers of fuse_perform_write need to updated ki_pos, move it into
> common code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> fs/fuse/file.c | 23 ++++++++++-------------
> 1 file changed, 10 insertions(+), 13 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 97d435874b14aa..e60e48bf392d49 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
> fuse_write_update_attr(inode, pos, res);
> clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
>
> - return res > 0 ? res : err;
> + if (!res)
> + return err;
> + iocb->ki_pos += res;
> + return res;
> }
>
> static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> @@ -1341,7 +1344,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> struct inode *inode = mapping->host;
> ssize_t err;
> struct fuse_conn *fc = get_fuse_conn(inode);
> - loff_t endbyte = 0;
>
> if (fc->writeback_cache) {
> /* Update size (EOF optimization) and mode (SUID clearing) */
> @@ -1375,19 +1377,20 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> goto out;
>
> if (iocb->ki_flags & IOCB_DIRECT) {
> - loff_t pos = iocb->ki_pos;
> + loff_t pos, endbyte;
> +
> written = generic_file_direct_write(iocb, from);
> if (written < 0 || !iov_iter_count(from))
> goto out;
>
> - pos += written;
> -
> - written_buffered = fuse_perform_write(iocb, mapping, from, pos);
> + written_buffered = fuse_perform_write(iocb, mapping, from,
> + iocb->ki_pos);
> if (written_buffered < 0) {
> err = written_buffered;
> goto out;
> }
> - endbyte = pos + written_buffered - 1;
> + pos = iocb->ki_pos - written_buffered;
> + endbyte = iocb->ki_pos - 1;
>
> err = filemap_write_and_wait_range(file->f_mapping, pos,
> endbyte);
> @@ -1399,17 +1402,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
> endbyte >> PAGE_SHIFT);
>
> written += written_buffered;
> - iocb->ki_pos = pos + written_buffered;
> } else {
> written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
> - if (written >= 0)
> - iocb->ki_pos += written;
> }
> out:
> inode_unlock(inode);
> - if (written > 0)
> - written = generic_write_sync(iocb, written);
Why remove generic_write_sync()? Definitely doesn't belong in this
patch even if there's a good reason.
Sorry, didn't notice this in the last round.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper
2023-05-31 7:50 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
2023-05-31 9:08 ` Damien Le Moal
@ 2023-05-31 9:15 ` Miklos Szeredi
1 sibling, 0 replies; 34+ messages in thread
From: Miklos Szeredi @ 2023-05-31 9:15 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, 31 May 2023 at 09:50, Christoph Hellwig <hch@lst.de> wrote:
>
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
2023-05-31 9:04 ` Damien Le Moal
@ 2023-05-31 11:38 ` Johannes Thumshirn
2023-05-31 14:11 ` Christian Brauner
2023-06-01 4:04 ` Theodore Ts'o
3 siblings, 0 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2023-05-31 11:38 UTC (permalink / raw)
To: cluster-devel.redhat.com
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
2023-05-31 9:04 ` Damien Le Moal
2023-05-31 11:38 ` Johannes Thumshirn
@ 2023-05-31 14:11 ` Christian Brauner
2023-06-01 4:04 ` Theodore Ts'o
3 siblings, 0 replies; 34+ messages in thread
From: Christian Brauner @ 2023-05-31 14:11 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, May 31, 2023 at 09:50:15AM +0200, Christoph Hellwig wrote:
> The last user of current->backing_dev_info disappeared in commit
> b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
> functions"). Remove the field and all assignments to it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
I somehow thought I'd already acked this... Anyway,
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
` (2 preceding siblings ...)
2023-05-31 14:11 ` Christian Brauner
@ 2023-06-01 4:04 ` Theodore Ts'o
3 siblings, 0 replies; 34+ messages in thread
From: Theodore Ts'o @ 2023-06-01 4:04 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, May 31, 2023 at 09:50:15AM +0200, Christoph Hellwig wrote:
> The last user of current->backing_dev_info disappeared in commit
> b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
> functions"). Remove the field and all assignments to it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-05-31 7:50 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
@ 2023-06-01 4:06 ` Theodore Ts'o
0 siblings, 0 replies; 34+ messages in thread
From: Theodore Ts'o @ 2023-06-01 4:06 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, May 31, 2023 at 09:50:17AM +0200, Christoph Hellwig wrote:
> All callers of generic_perform_write need to updated ki_pos, move it into
> common code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Xiubo Li <xiubli@redhat.com>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Acked-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write
2023-05-31 9:11 ` Miklos Szeredi
@ 2023-06-01 6:34 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-01 6:34 UTC (permalink / raw)
To: cluster-devel.redhat.com
On Wed, May 31, 2023 at 11:11:13AM +0200, Miklos Szeredi wrote:
> Why remove generic_write_sync()? Definitely doesn't belong in this
> patch even if there's a good reason.
Yes, this shouldn't have happened. I think this was a bad merge
resolution after the current->backing_dev removal.
^ permalink raw reply [flat|nested] 34+ messages in thread
* [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
2023-08-27 19:41 ` Al Viro
0 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
To: cluster-devel.redhat.com
All callers of generic_perform_write need to updated ki_pos, move it into
common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
fs/ceph/file.c | 2 --
fs/ext4/file.c | 9 +++------
fs/f2fs/file.c | 1 -
fs/nfs/file.c | 1 -
mm/filemap.c | 8 ++++----
5 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
* can not run at the same time
*/
written = generic_perform_write(iocb, from);
- if (likely(written >= 0))
- iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
out:
inode_unlock(inode);
- if (likely(ret > 0)) {
- iocb->ki_pos += ret;
- ret = generic_write_sync(iocb, ret);
- }
-
- return ret;
+ if (unlikely(ret <= 0))
+ return ret;
+ return generic_write_sync(iocb, ret);
}
static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
ret = generic_perform_write(iocb, from);
if (ret > 0) {
- iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
written = result;
- iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
- return written ? written : status;
+ if (!written)
+ return status;
+ iocb->ki_pos += written;
+ return written;
}
EXPORT_SYMBOL(generic_perform_write);
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
- iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
- if (likely(written > 0))
- iocb->ki_pos += written;
}
out:
return written ? written : err;
--
2.39.2
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-06-01 14:58 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
@ 2023-08-27 19:41 ` Al Viro
2023-08-27 21:45 ` Al Viro
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Al Viro @ 2023-08-27 19:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Thu, Jun 01, 2023 at 04:58:55PM +0200, Christoph Hellwig wrote:
> All callers of generic_perform_write need to updated ki_pos, move it into
> common code.
> @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> endbyte = pos + status - 1;
> err = filemap_write_and_wait_range(mapping, pos, endbyte);
> if (err == 0) {
> - iocb->ki_pos = endbyte + 1;
> written += status;
> invalidate_mapping_pages(mapping,
> pos >> PAGE_SHIFT,
> @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> }
> } else {
> written = generic_perform_write(iocb, from);
> - if (likely(written > 0))
> - iocb->ki_pos += written;
> }
> out:
> return written ? written : err;
[another late reply, sorry]
That part is somewhat fishy - there's a case where you return a positive value
and advance ->ki_pos by more than that amount. I really wonder if all callers
of ->write_iter() are OK with that. Consider e.g. this:
ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos, *ppos = file_ppos(f.file);
if (ppos) {
pos = *ppos;
ppos = &pos;
}
ret = vfs_write(f.file, buf, count, ppos);
if (ret >= 0 && ppos)
f.file->f_pos = pos;
fdput_pos(f);
}
return ret;
}
ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
ssize_t ret;
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
if (unlikely(!access_ok(buf, count)))
return -EFAULT;
ret = rw_verify_area(WRITE, file, pos, count);
if (ret)
return ret;
if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
file_start_write(file);
if (file->f_op->write)
ret = file->f_op->write(file, buf, count, pos);
else if (file->f_op->write_iter)
ret = new_sync_write(file, buf, count, pos);
else
ret = -EINVAL;
if (ret > 0) {
fsnotify_modify(file);
add_wchar(current, ret);
}
inc_syscw(current);
file_end_write(file);
return ret;
}
static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
struct kiocb kiocb;
struct iov_iter iter;
ssize_t ret;
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = (ppos ? *ppos : 0);
iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len);
ret = call_write_iter(filp, &kiocb, &iter);
BUG_ON(ret == -EIOCBQUEUED);
if (ret > 0 && ppos)
*ppos = kiocb.ki_pos;
return ret;
}
Suppose ->write_iter() ends up doing returning a positive value smaller than
the increment of kiocb.ki_pos. What do we get? ret is positive, so
kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
we copy it into file->f_pos.
Is it really OK to have write() return 4096 and advance the file position
by 16K? AFAICS, userland wouldn't get any indication of something
odd going on - just a short write to a regular file, with followup write
of remaining 12K getting quietly written in the range 16K..28K.
I don't remember what POSIX says about that, but it would qualify as
nasty surprise for any userland program - sure, one can check fsync()
results before closing the sucker and see if everything looks fine,
but the way it's usually discussed could easily lead to assumption that
(synchronous) O_DIRECT writes would not be affected by anything of that
sort.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-27 19:41 ` Al Viro
@ 2023-08-27 21:45 ` Al Viro
2023-08-28 12:32 ` Christoph Hellwig
2023-09-13 11:00 ` Christoph Hellwig
2023-08-28 1:04 ` Al Viro
2023-08-28 12:30 ` Christoph Hellwig
2 siblings, 2 replies; 34+ messages in thread
From: Al Viro @ 2023-08-27 21:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> On Thu, Jun 01, 2023 at 04:58:55PM +0200, Christoph Hellwig wrote:
> > All callers of generic_perform_write need to updated ki_pos, move it into
> > common code.
>
> > @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > endbyte = pos + status - 1;
> > err = filemap_write_and_wait_range(mapping, pos, endbyte);
> > if (err == 0) {
> > - iocb->ki_pos = endbyte + 1;
> > written += status;
> > invalidate_mapping_pages(mapping,
> > pos >> PAGE_SHIFT,
> > @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > }
> > } else {
> > written = generic_perform_write(iocb, from);
> > - if (likely(written > 0))
> > - iocb->ki_pos += written;
> > }
> > out:
> > return written ? written : err;
>
> [another late reply, sorry]
>
> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount. I really wonder if all callers
> of ->write_iter() are OK with that. Consider e.g. this:
>
> ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
> {
> struct fd f = fdget_pos(fd);
> ssize_t ret = -EBADF;
>
> if (f.file) {
> loff_t pos, *ppos = file_ppos(f.file);
> if (ppos) {
> pos = *ppos;
> ppos = &pos;
> }
> ret = vfs_write(f.file, buf, count, ppos);
> if (ret >= 0 && ppos)
> f.file->f_pos = pos;
> fdput_pos(f);
> }
>
> return ret;
> }
>
> ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
> {
> ssize_t ret;
>
> if (!(file->f_mode & FMODE_WRITE))
> return -EBADF;
> if (!(file->f_mode & FMODE_CAN_WRITE))
> return -EINVAL;
> if (unlikely(!access_ok(buf, count)))
> return -EFAULT;
>
> ret = rw_verify_area(WRITE, file, pos, count);
> if (ret)
> return ret;
> if (count > MAX_RW_COUNT)
> count = MAX_RW_COUNT;
> file_start_write(file);
> if (file->f_op->write)
> ret = file->f_op->write(file, buf, count, pos);
> else if (file->f_op->write_iter)
> ret = new_sync_write(file, buf, count, pos);
> else
> ret = -EINVAL;
> if (ret > 0) {
> fsnotify_modify(file);
> add_wchar(current, ret);
> }
> inc_syscw(current);
> file_end_write(file);
> return ret;
> }
>
> static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
> {
> struct kiocb kiocb;
> struct iov_iter iter;
> ssize_t ret;
>
> init_sync_kiocb(&kiocb, filp);
> kiocb.ki_pos = (ppos ? *ppos : 0);
> iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len);
>
> ret = call_write_iter(filp, &kiocb, &iter);
> BUG_ON(ret == -EIOCBQUEUED);
> if (ret > 0 && ppos)
> *ppos = kiocb.ki_pos;
> return ret;
> }
>
> Suppose ->write_iter() ends up doing returning a positive value smaller than
> the increment of kiocb.ki_pos. What do we get? ret is positive, so
> kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
> we copy it into file->f_pos.
>
> Is it really OK to have write() return 4096 and advance the file position
> by 16K? AFAICS, userland wouldn't get any indication of something
> odd going on - just a short write to a regular file, with followup write
> of remaining 12K getting quietly written in the range 16K..28K.
>
> I don't remember what POSIX says about that, but it would qualify as
> nasty surprise for any userland program - sure, one can check fsync()
> results before closing the sucker and see if everything looks fine,
> but the way it's usually discussed could easily lead to assumption that
> (synchronous) O_DIRECT writes would not be affected by anything of that
> sort.
IOW, I suspect that the right thing to do would be something along the lines
of
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
If we fail filemap_write_and_wait_range() on the range the buffered write went
into, we only report the "number of bytes which we direct-written", to quote
the comment in there. Which is fine, but buffered write has already advanced
iocb->ki_pos, so we need to roll that back. Otherwise we end up with e.g.
write(2) advancing position by more than the amount it reports having written.
Fixes: 182c25e9c157 "filemap: update ki_pos in generic_perform_write"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/libfs.c b/fs/libfs.c
index 5b851315eeed..712c57828c0e 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1646,6 +1646,7 @@ ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
* We don't know how much we wrote, so just return the number of
* bytes which were direct-written
*/
+ iocb->ki_pos -= buffered_written;
if (direct_written)
return direct_written;
return err;
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-27 19:41 ` Al Viro
2023-08-27 21:45 ` Al Viro
@ 2023-08-28 1:04 ` Al Viro
2023-08-28 12:30 ` Christoph Hellwig
2 siblings, 0 replies; 34+ messages in thread
From: Al Viro @ 2023-08-28 1:04 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount. I really wonder if all callers
> of ->write_iter() are OK with that.
Speaking of which, in case of negative return value we'd better *not* use
->ki_pos; consider e.g. generic_file_write_iter() with O_DSYNC and
vfs_fsync_range() failure. An error gets returned, but ->ki_pos is left
advanced. Normal write(2) is fine - it will only update file->f_pos if
->write_iter() has returned a non-negative. However, io_uring
kiocb_done() starts with
if (req->flags & REQ_F_CUR_POS)
req->file->f_pos = rw->kiocb.ki_pos;
if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
if (!__io_complete_rw_common(req, ret)) {
/*
* Safe to call io_end from here as we're inline
* from the submission path.
*/
io_req_io_end(req);
io_req_set_res(req, final_ret,
io_put_kbuf(req, issue_flags));
return IOU_OK;
}
} else {
io_rw_done(&rw->kiocb, ret);
}
Note that ->f_pos update is *NOT* conditional upon ret >= 0 - it happens
no matter what, provided that original request had ->kiocb.ki_pos equal
to -1 (on a non-FMODE_STREAM file).
Jens, is there any reason for doing that unconditionally? IMO it's
a bad idea - there's a wide scope for fuckups that way, especially
since write(2) is not sensitive to that and this use of -1 ki_pos
is not particularly encouraged on io_uring side either, AFAICT.
Worse, it's handling of failure exits in the first place, which
already gets little testing...
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-27 19:41 ` Al Viro
2023-08-27 21:45 ` Al Viro
2023-08-28 1:04 ` Al Viro
@ 2023-08-28 12:30 ` Christoph Hellwig
2023-08-28 14:02 ` Al Viro
2 siblings, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-08-28 12:30 UTC (permalink / raw)
To: Al Viro
Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount. I really wonder if all callers
> of ->write_iter() are OK with that. Consider e.g. this:
This should not exist in the latest version merged by Jens. Can you
check if you still see issues in the version in the block tree or
linux-next.
> Suppose ->write_iter() ends up doing returning a positive value smaller than
> the increment of kiocb.ki_pos. What do we get? ret is positive, so
> kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
> we copy it into file->f_pos.
>
> Is it really OK to have write() return 4096 and advance the file position
> by 16K? AFAICS, userland wouldn't get any indication of something
> odd going on - just a short write to a regular file, with followup write
> of remaining 12K getting quietly written in the range 16K..28K.
>
> I don't remember what POSIX says about that, but it would qualify as
> nasty surprise for any userland program - sure, one can check fsync()
> results before closing the sucker and see if everything looks fine,
> but the way it's usually discussed could easily lead to assumption that
> (synchronous) O_DIRECT writes would not be affected by anything of that
> sort.
ki_pos should always be updated by the write return value. Everything
else is a bug.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-27 21:45 ` Al Viro
@ 2023-08-28 12:32 ` Christoph Hellwig
2023-08-28 13:56 ` Al Viro
2023-09-13 11:00 ` Christoph Hellwig
1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-08-28 12:32 UTC (permalink / raw)
To: Al Viro
Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote:
> IOW, I suspect that the right thing to do would be something along the lines
> of
The idea looks sensible to me, but we'll also need to do it for the
filemap_write_and_wait_range failure case.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-28 12:32 ` Christoph Hellwig
@ 2023-08-28 13:56 ` Al Viro
2023-08-28 14:15 ` Christoph Hellwig
0 siblings, 1 reply; 34+ messages in thread
From: Al Viro @ 2023-08-28 13:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Mon, Aug 28, 2023 at 02:32:59PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote:
> > IOW, I suspect that the right thing to do would be something along the lines
> > of
>
> The idea looks sensible to me, but we'll also need to do it for the
> filemap_write_and_wait_range failure case.
Huh? That's precisely where this patch is doing that... That function
in mainline is
if (unlikely(buffered_written < 0)) {
if (direct_written)
return direct_written;
return buffered_written;
}
/*
* We need to ensure that the page cache pages are written to disk and
* invalidated to preserve the expected O_DIRECT semantics.
*/
err = filemap_write_and_wait_range(mapping, pos, end);
if (err < 0) {
/*
* We don't know how much we wrote, so just return the number of
* bytes which were direct-written
*/
if (direct_written)
return direct_written;
return err;
}
invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
return direct_written + buffered_written;
The first failure exit does not need any work - the caller had not bumped
->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's
where the patch upthread adds iocb->ki_pos -= buffered_written.
Or am I completely misparsing what you've written?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-28 12:30 ` Christoph Hellwig
@ 2023-08-28 14:02 ` Al Viro
0 siblings, 0 replies; 34+ messages in thread
From: Al Viro @ 2023-08-28 14:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Mon, Aug 28, 2023 at 02:30:23PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> > That part is somewhat fishy - there's a case where you return a positive value
> > and advance ->ki_pos by more than that amount. I really wonder if all callers
> > of ->write_iter() are OK with that. Consider e.g. this:
>
> This should not exist in the latest version merged by Jens. Can you
> check if you still see issues in the version in the block tree or
> linux-next.
Still does - the problem has migrated into direct_write_fallback(), but
that hadn't changed the situation. We are still left with ->ki_pos bumped
by generic_perform_write() (evaluated as an argument of direct_write_fallback()
now) and *not* retraced in case when direct_write_fallback() decides to
discard the buffered write result. Both in -next and in mainline (since
6.5-rc1).
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-28 13:56 ` Al Viro
@ 2023-08-28 14:15 ` Christoph Hellwig
0 siblings, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2023-08-28 14:15 UTC (permalink / raw)
To: Al Viro
Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
On Mon, Aug 28, 2023 at 02:56:15PM +0100, Al Viro wrote:
> The first failure exit does not need any work - the caller had not bumped
> ->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's
> where the patch upthread adds iocb->ki_pos -= buffered_written.
>
> Or am I completely misparsing what you've written?
No, I misread the patch. Looks good:
Acked-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-08-27 21:45 ` Al Viro
2023-08-28 12:32 ` Christoph Hellwig
@ 2023-09-13 11:00 ` Christoph Hellwig
2023-09-13 16:33 ` Christian Brauner
1 sibling, 1 reply; 34+ messages in thread
From: Christoph Hellwig @ 2023-09-13 11:00 UTC (permalink / raw)
To: Al Viro
Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
Anna Schumaker, linux-fsdevel, Andrew Morton
> direct_write_fallback(): on error revert the ->ki_pos update from buffered write
Al, Christian: can you send this fix on top Linus?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
2023-09-13 11:00 ` Christoph Hellwig
@ 2023-09-13 16:33 ` Christian Brauner
0 siblings, 0 replies; 34+ messages in thread
From: Christian Brauner @ 2023-09-13 16:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-block,
Damien Le Moal, Al Viro, Jaegeuk Kim, ceph-devel, Xiubo Li,
Trond Myklebust, Jens Axboe, linux-nfs, Theodore Ts'o,
linux-f2fs-devel, linux-xfs, Anna Schumaker, linux-fsdevel,
Andrew Morton, Hannes Reinecke
On Wed, Sep 13, 2023 at 01:00:10PM +0200, Christoph Hellwig wrote:
> > direct_write_fallback(): on error revert the ->ki_pos update from buffered write
>
> Al, Christian: can you send this fix on top Linus?
Wasn't aware of this, sorry. I've picked it up and placed it with
another set of small fixes I already have.
I'm happy to have Al take it ofc.
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2023-09-13 16:48 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-31 7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
2023-05-31 9:04 ` Damien Le Moal
2023-05-31 11:38 ` Johannes Thumshirn
2023-05-31 14:11 ` Christian Brauner
2023-06-01 4:04 ` Theodore Ts'o
2023-05-31 7:50 ` [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
2023-06-01 4:06 ` Theodore Ts'o
2023-05-31 7:50 ` [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
2023-05-31 9:08 ` Damien Le Moal
2023-05-31 9:15 ` Miklos Szeredi
2023-05-31 7:50 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
2023-05-31 9:11 ` Miklos Szeredi
2023-06-01 6:34 ` Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write Christoph Hellwig
2023-05-31 7:50 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
2023-05-31 9:01 ` Miklos Szeredi
-- strict thread matches above, loose matches on Subject: below --
2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
2023-06-01 14:58 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
2023-08-27 19:41 ` Al Viro
2023-08-27 21:45 ` Al Viro
2023-08-28 12:32 ` Christoph Hellwig
2023-08-28 13:56 ` Al Viro
2023-08-28 14:15 ` Christoph Hellwig
2023-09-13 11:00 ` Christoph Hellwig
2023-09-13 16:33 ` Christian Brauner
2023-08-28 1:04 ` Al Viro
2023-08-28 12:30 ` Christoph Hellwig
2023-08-28 14:02 ` Al Viro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).