cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-05-31  7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
@ 2023-05-31  7:50 ` Christoph Hellwig
  2023-06-01  4:06   ` Theodore Ts'o
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-31  7:50 UTC (permalink / raw)
  To: cluster-devel.redhat.com

All callers of generic_perform_write need to updated ki_pos, move it into
common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++------
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 ++++----
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		 * can not run at the same time
 		 */
 		written = generic_perform_write(iocb, from);
-		if (likely(written >= 0))
-			iocb->ki_pos = pos + written;
 		ceph_end_io_write(inode);
 	}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
 	inode_unlock(inode);
-	if (likely(ret > 0)) {
-		iocb->ki_pos += ret;
-		ret = generic_write_sync(iocb, ret);
-	}
-
-	return ret;
+	if (unlikely(ret <= 0))
+		return ret;
+	return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
 	ret = generic_perform_write(iocb, from);
 
 	if (ret > 0) {
-		iocb->ki_pos += ret;
 		f2fs_update_iostat(F2FS_I_SB(inode), inode,
 						APP_BUFFERED_IO, ret);
 	}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		goto out;
 
 	written = result;
-	iocb->ki_pos += written;
 	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
 	if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		balance_dirty_pages_ratelimited(mapping);
 	} while (iov_iter_count(i));
 
-	return written ? written : status;
+	if (!written)
+		return status;
+	iocb->ki_pos += written;
+	return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		endbyte = pos + status - 1;
 		err = filemap_write_and_wait_range(mapping, pos, endbyte);
 		if (err == 0) {
-			iocb->ki_pos = endbyte + 1;
 			written += status;
 			invalidate_mapping_pages(mapping,
 						 pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		}
 	} else {
 		written = generic_perform_write(iocb, from);
-		if (likely(written > 0))
-			iocb->ki_pos += written;
 	}
 out:
 	return written ? written : err;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-05-31  7:50 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
@ 2023-06-01  4:06   ` Theodore Ts'o
  0 siblings, 0 replies; 28+ messages in thread
From: Theodore Ts'o @ 2023-06-01  4:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, May 31, 2023 at 09:50:17AM +0200, Christoph Hellwig wrote:
> All callers of generic_perform_write need to updated ki_pos, move it into
> common code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Xiubo Li <xiubli@redhat.com>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Acked-by: Darrick J. Wong <djwong@kernel.org>

Acked-by: Theodore Ts'o <tytso@mit.edu>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Cluster-devel] cleanup the filemap / direct I/O interaction v4
@ 2023-06-01 14:58 Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi all,

this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O.  This is a spinoff from the no-bufferhead kernel
project, for which we'll want to an use iomap based buffered
write path in the block layer.

Changes since v3:
 - fix a generic_sync_file that got lost in fuse
 - fix fuse to call fuse_perform_write and not generic_perform_write

Changes since v2:
 - stick to the existing behavior of returning a short write
   if the buffer fallback write or sync fails
 - bring back "fuse: use direct_write_fallback" which accidentally
   got lost in v2

Changes since v1:
 - remove current->backing_dev_info entirely
 - fix the pos/end calculation in direct_write_fallback
 - rename kiocb_invalidate_post_write to
   kiocb_invalidate_post_direct_write
 - typo fixes

diffstat:
 block/fops.c            |   18 ----
 fs/btrfs/file.c         |    6 -
 fs/ceph/file.c          |    6 -
 fs/direct-io.c          |   10 --
 fs/ext4/file.c          |   11 --
 fs/f2fs/file.c          |    3 
 fs/fuse/file.c          |   45 ++---------
 fs/gfs2/file.c          |    6 -
 fs/iomap/buffered-io.c  |    9 +-
 fs/iomap/direct-io.c    |   88 ++++++++-------------
 fs/libfs.c              |   41 ++++++++++
 fs/nfs/file.c           |    6 -
 fs/ntfs/file.c          |    2 
 fs/ntfs3/file.c         |    3 
 fs/xfs/xfs_file.c       |    6 -
 fs/zonefs/file.c        |    4 
 include/linux/fs.h      |    7 -
 include/linux/pagemap.h |    4 
 include/linux/sched.h   |    3 
 mm/filemap.c            |  194 +++++++++++++++++++++---------------------------
 20 files changed, 194 insertions(+), 278 deletions(-)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-07-06  0:18   ` [Cluster-devel] [f2fs-dev] " patchwork-bot+f2fs
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete Christoph Hellwig
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions").  Remove the field and all assignments to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/btrfs/file.c       | 6 +-----
 fs/ceph/file.c        | 4 ----
 fs/ext4/file.c        | 2 --
 fs/f2fs/file.c        | 2 --
 fs/fuse/file.c        | 4 ----
 fs/gfs2/file.c        | 2 --
 fs/nfs/file.c         | 5 +----
 fs/ntfs/file.c        | 2 --
 fs/ntfs3/file.c       | 3 ---
 fs/xfs/xfs_file.c     | 4 ----
 include/linux/sched.h | 3 ---
 mm/filemap.c          | 3 ---
 12 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0e4..ecd43ab66fa6c7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1145,7 +1145,6 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
 	    !(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
 		return -EAGAIN;
 
-	current->backing_dev_info = inode_to_bdi(inode);
 	ret = file_remove_privs(file);
 	if (ret)
 		return ret;
@@ -1165,10 +1164,8 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
 		loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
 
 		ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
-		if (ret) {
-			current->backing_dev_info = NULL;
+		if (ret)
 			return ret;
-		}
 	}
 
 	return 0;
@@ -1689,7 +1686,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	if (sync)
 		atomic_dec(&inode->sync_writers);
 
-	current->backing_dev_info = NULL;
 	return num_written;
 }
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..c8ef72f723badd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1791,9 +1791,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	else
 		ceph_start_io_write(inode);
 
-	/* We can write back this queue in page reclaim */
-	current->backing_dev_info = inode_to_bdi(inode);
-
 	if (iocb->ki_flags & IOCB_APPEND) {
 		err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
 		if (err < 0)
@@ -1940,7 +1937,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ceph_end_io_write(inode);
 out_unlocked:
 	ceph_free_cap_flush(prealloc_cf);
-	current->backing_dev_info = NULL;
 	return written ? written : err;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..bc430270c23c19 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,9 +285,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 	if (ret <= 0)
 		goto out;
 
-	current->backing_dev_info = inode_to_bdi(inode);
 	ret = generic_perform_write(iocb, from);
-	current->backing_dev_info = NULL;
 
 out:
 	inode_unlock(inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..4f423d367a44b9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4517,9 +4517,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		return -EOPNOTSUPP;
 
-	current->backing_dev_info = inode_to_bdi(inode);
 	ret = generic_perform_write(iocb, from);
-	current->backing_dev_info = NULL;
 
 	if (ret > 0) {
 		iocb->ki_pos += ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..97d435874b14aa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1362,9 +1362,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 writethrough:
 	inode_lock(inode);
 
-	/* We can write back this queue in page reclaim */
-	current->backing_dev_info = inode_to_bdi(inode);
-
 	err = generic_write_checks(iocb, from);
 	if (err <= 0)
 		goto out;
@@ -1409,7 +1406,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			iocb->ki_pos += written;
 	}
 out:
-	current->backing_dev_info = NULL;
 	inode_unlock(inode);
 	if (written > 0)
 		written = generic_write_sync(iocb, written);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..904a0d6ac1a1a9 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1041,11 +1041,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 			goto out_unlock;
 	}
 
-	current->backing_dev_info = inode_to_bdi(inode);
 	pagefault_disable();
 	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
 	pagefault_enable();
-	current->backing_dev_info = NULL;
 	if (ret > 0) {
 		iocb->ki_pos += ret;
 		written += ret;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index f0edf5a36237d1..665ce3fc62eaf4 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -648,11 +648,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 	since = filemap_sample_wb_err(file->f_mapping);
 	nfs_start_io_write(inode);
 	result = generic_write_checks(iocb, from);
-	if (result > 0) {
-		current->backing_dev_info = inode_to_bdi(inode);
+	if (result > 0)
 		result = generic_perform_write(iocb, from);
-		current->backing_dev_info = NULL;
-	}
 	nfs_end_io_write(inode);
 	if (result <= 0)
 		goto out;
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index c481b14e4fd989..e296f804a9c442 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1911,11 +1911,9 @@ static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 	inode_lock(vi);
 	/* We can write back this queue in page reclaim. */
-	current->backing_dev_info = inode_to_bdi(vi);
 	err = ntfs_prepare_file_for_write(iocb, from);
 	if (iov_iter_count(from) && !err)
 		written = ntfs_perform_write(file, from, iocb->ki_pos);
-	current->backing_dev_info = NULL;
 	inode_unlock(vi);
 	iocb->ki_pos += written;
 	if (likely(written > 0))
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 9a3d55c367d920..86d16a2c8339ca 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -820,7 +820,6 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
 	if (!pages)
 		return -ENOMEM;
 
-	current->backing_dev_info = inode_to_bdi(inode);
 	err = file_remove_privs(file);
 	if (err)
 		goto out;
@@ -993,8 +992,6 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
 out:
 	kfree(pages);
 
-	current->backing_dev_info = NULL;
-
 	if (err < 0)
 		return err;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index aede746541f8ae..431c3fd0e2b598 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -717,9 +717,6 @@ xfs_file_buffered_write(
 	if (ret)
 		goto out;
 
-	/* We can write back this queue in page reclaim */
-	current->backing_dev_info = inode_to_bdi(inode);
-
 	trace_xfs_file_buffered_write(iocb, from);
 	ret = iomap_file_buffered_write(iocb, from,
 			&xfs_buffered_write_iomap_ops);
@@ -753,7 +750,6 @@ xfs_file_buffered_write(
 		goto write_retry;
 	}
 
-	current->backing_dev_info = NULL;
 out:
 	if (iolock)
 		xfs_iunlock(ip, iolock);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f4d..54780571fe9a07 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -41,7 +41,6 @@
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
-struct backing_dev_info;
 struct bio_list;
 struct blk_plug;
 struct bpf_local_storage;
@@ -1186,8 +1185,6 @@ struct task_struct {
 	/* VM state: */
 	struct reclaim_state		*reclaim_state;
 
-	struct backing_dev_info		*backing_dev_info;
-
 	struct io_context		*io_context;
 
 #ifdef CONFIG_COMPACTION
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e58..33b54660ad2b39 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3991,8 +3991,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ssize_t		err;
 	ssize_t		status;
 
-	/* We can write back this queue in page reclaim */
-	current->backing_dev_info = inode_to_bdi(inode);
 	err = file_remove_privs(file);
 	if (err)
 		goto out;
@@ -4053,7 +4051,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			iocb->ki_pos += written;
 	}
 out:
-	current->backing_dev_info = NULL;
 	return written ? written : err;
 }
 EXPORT_SYMBOL(__generic_file_write_iter);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Move the ki_pos update down a bit to prepare for a better common
helper that invalidates pages based of an iocb.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/direct-io.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 019cc87d0fb339..6207a59d2162e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -94,7 +94,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 		if (offset + ret > dio->i_size &&
 		    !(dio->flags & IOMAP_DIO_WRITE))
 			ret = dio->i_size - offset;
-		iocb->ki_pos += ret;
 	}
 
 	/*
@@ -120,19 +119,21 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	}
 
 	inode_dio_end(file_inode(iocb->ki_filp));
-	/*
-	 * If this is a DSYNC write, make sure we push it to stable storage now
-	 * that we've written data.
-	 */
-	if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
-		ret = generic_write_sync(iocb, ret);
 
-	if (ret > 0)
-		ret += dio->done_before;
+	if (ret > 0) {
+		iocb->ki_pos += ret;
 
+		/*
+		 * If this is a DSYNC write, make sure we push it to stable
+		 * storage now that we've written data.
+		 */
+		if (dio->flags & IOMAP_DIO_NEED_SYNC)
+			ret = generic_write_sync(iocb, ret);
+		if (ret > 0)
+			ret += dio->done_before;
+	}
 	trace_iomap_dio_complete(iocb, dio->error, ret);
 	kfree(dio);
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-08-27 19:41   ` Al Viro
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper Christoph Hellwig
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

All callers of generic_perform_write need to updated ki_pos, move it into
common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++------
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 ++++----
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		 * can not run at the same time
 		 */
 		written = generic_perform_write(iocb, from);
-		if (likely(written >= 0))
-			iocb->ki_pos = pos + written;
 		ceph_end_io_write(inode);
 	}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
 	inode_unlock(inode);
-	if (likely(ret > 0)) {
-		iocb->ki_pos += ret;
-		ret = generic_write_sync(iocb, ret);
-	}
-
-	return ret;
+	if (unlikely(ret <= 0))
+		return ret;
+	return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
 	ret = generic_perform_write(iocb, from);
 
 	if (ret > 0) {
-		iocb->ki_pos += ret;
 		f2fs_update_iostat(F2FS_I_SB(inode), inode,
 						APP_BUFFERED_IO, ret);
 	}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		goto out;
 
 	written = result;
-	iocb->ki_pos += written;
 	nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
 	if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		balance_dirty_pages_ratelimited(mapping);
 	} while (iov_iter_count(i));
 
-	return written ? written : status;
+	if (!written)
+		return status;
+	iocb->ki_pos += written;
+	return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		endbyte = pos + status - 1;
 		err = filemap_write_and_wait_range(mapping, pos, endbyte);
 		if (err == 0) {
-			iocb->ki_pos = endbyte + 1;
 			written += status;
 			invalidate_mapping_pages(mapping,
 						 pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		}
 	} else {
 		written = generic_perform_write(iocb, from);
-		if (likely(written > 0))
-			iocb->ki_pos += written;
 	}
 out:
 	return written ? written : err;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper Christoph Hellwig
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 block/fops.c            | 18 +++---------------
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 30 ++++++++++++++++++------------
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 58d0aebc7313a8..575171049c5d83 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		goto reexpand; /* skip atime */
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (filemap_range_needs_writeback(mapping, pos,
-							  pos + count - 1)) {
-				ret = -EAGAIN;
-				goto reexpand;
-			}
-		} else {
-			ret = filemap_write_and_wait_range(mapping, pos,
-							   pos + count - 1);
-			if (ret < 0)
-				goto reexpand;
-		}
-
+		ret = kiocb_write_and_wait(iocb, count);
+		if (ret < 0)
+			goto reexpand;
 		file_accessed(iocb->ki_filp);
 
 		ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
 		pgoff_t start, pgoff_t end);
+
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
 int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
 void __filemap_set_wb_err(struct address_space *mapping, int err);
 int filemap_fdatawrite_wbc(struct address_space *mapping,
 			   struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 15907af4a57ff5..5fcd5227f9cae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(filemap_read);
 
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	loff_t pos = iocb->ki_pos;
+	loff_t end = pos + count - 1;
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (filemap_range_needs_writeback(mapping, pos, end))
+			return -EAGAIN;
+		return 0;
+	}
+
+	return filemap_write_and_wait_range(mapping, pos, end);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:	kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 		struct address_space *mapping = file->f_mapping;
 		struct inode *inode = mapping->host;
 
-		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
-						iocb->ki_pos + count - 1))
-				return -EAGAIN;
-		} else {
-			retval = filemap_write_and_wait_range(mapping,
-						iocb->ki_pos,
-					        iocb->ki_pos + count - 1);
-			if (retval < 0)
-				return retval;
-		}
-
+		retval = kiocb_write_and_wait(iocb, count);
+		if (retval < 0)
+			return retval;
 		file_accessed(file);
 
 		retval = mapping->a_ops->direct_IO(iocb, iter);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper Christoph Hellwig
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 48 ++++++++++++++++++++++++-----------------
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
 		pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 5fcd5227f9cae2..a1cb01a4b8046a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
 	return filemap_write_and_wait_range(mapping, pos, end);
 }
 
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	loff_t pos = iocb->ki_pos;
+	loff_t end = pos + count - 1;
+	int ret;
+
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		/* we could block if there are any pages in the range */
+		if (filemap_range_has_page(mapping, pos, end))
+			return -EAGAIN;
+	} else {
+		ret = filemap_write_and_wait_range(mapping, pos, end);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * After a write we want buffered reads to be sure to go to disk to get
+	 * the new data.  We invalidate clean cached page from the region we're
+	 * about to write.  We do this *before* the write so that we can return
+	 * without clobbering -EIOCBQUEUED from ->direct_IO().
+	 */
+	return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+					     end >> PAGE_SHIFT);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:	kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	write_len = iov_iter_count(from);
 	end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-	if (iocb->ki_flags & IOCB_NOWAIT) {
-		/* If there are pages to writeback, return */
-		if (filemap_range_has_page(file->f_mapping, pos,
-					   pos + write_len - 1))
-			return -EAGAIN;
-	} else {
-		written = filemap_write_and_wait_range(mapping, pos,
-							pos + write_len - 1);
-		if (written)
-			goto out;
-	}
-
-	/*
-	 * After a write we want buffered reads to be sure to go to disk to get
-	 * the new data.  We invalidate clean cached page from the region we're
-	 * about to write.  We do this *before* the write so that we can return
-	 * without clobbering -EIOCBQUEUED from ->direct_IO().
-	 */
-	written = invalidate_inode_pages2_range(mapping,
-					pos >> PAGE_SHIFT, end);
 	/*
 	 * If a page can not be invalidated, return 0 to fall back
 	 * to buffered write.
 	 */
+	written = kiocb_invalidate_pages(iocb, write_len);
 	if (written) {
 		if (written == -EBUSY)
 			return 0;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write Christoph Hellwig
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Add a helper to invalidate page cache after a dio write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/direct-io.c          | 10 ++--------
 fs/iomap/direct-io.c    | 12 ++----------
 include/linux/fs.h      |  5 -----
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 37 ++++++++++++++++++++-----------------
 5 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..4f9069aee0fe19 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
 	 * zeros from unwritten extents.
 	 */
 	if (flags & DIO_COMPLETE_INVALIDATE &&
-	    ret > 0 && dio_op == REQ_OP_WRITE &&
-	    dio->inode->i_mapping->nrpages) {
-		err = invalidate_inode_pages2_range(dio->inode->i_mapping,
-					offset >> PAGE_SHIFT,
-					(offset + ret - 1) >> PAGE_SHIFT);
-		if (err)
-			dio_warn_stale_pagecache(dio->iocb->ki_filp);
-	}
+	    ret > 0 && dio_op == REQ_OP_WRITE)
+		kiocb_invalidate_post_direct_write(dio->iocb, ret);
 
 	inode_dio_end(dio->inode);
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..0795c54a745bca 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	const struct iomap_dio_ops *dops = dio->dops;
 	struct kiocb *iocb = dio->iocb;
-	struct inode *inode = file_inode(iocb->ki_filp);
 	loff_t offset = iocb->ki_pos;
 	ssize_t ret = dio->error;
 
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	 * ->end_io() when necessary, otherwise a racing buffer read would cache
 	 * zeros from unwritten extents.
 	 */
-	if (!dio->error && dio->size &&
-	    (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
-		int err;
-		err = invalidate_inode_pages2_range(inode->i_mapping,
-				offset >> PAGE_SHIFT,
-				(offset + dio->size - 1) >> PAGE_SHIFT);
-		if (err)
-			dio_warn_stale_pagecache(iocb->ki_filp);
-	}
+	if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+		kiocb_invalidate_post_direct_write(iocb, dio->size);
 
 	inode_dio_end(file_inode(iocb->ki_filp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb2411..91021b4e1f6f48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
 		wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
 
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
 			    unsigned int mask);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..6ecc4aaf5e3d51 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
 		pgoff_t start, pgoff_t end);
 int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index a1cb01a4b8046a..ddb6f8aa86d6ca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
 /*
  * Warn about a page cache invalidation failure during a direct I/O write.
  */
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
 {
 	static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
 	char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
 	}
 }
 
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+	if (mapping->nrpages &&
+	    invalidate_inode_pages2_range(mapping,
+			iocb->ki_pos >> PAGE_SHIFT,
+			(iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+		dio_warn_stale_pagecache(iocb->ki_filp);
+}
+
 ssize_t
 generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
-	struct file	*file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode	*inode = mapping->host;
-	loff_t		pos = iocb->ki_pos;
-	ssize_t		written;
-	size_t		write_len;
-	pgoff_t		end;
-
-	write_len = iov_iter_count(from);
-	end = (pos + write_len - 1) >> PAGE_SHIFT;
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	size_t write_len = iov_iter_count(from);
+	ssize_t written;
 
 	/*
 	 * If a page can not be invalidated, return 0 to fall back
@@ -3855,7 +3859,7 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	if (written) {
 		if (written == -EBUSY)
 			return 0;
-		goto out;
+		return written;
 	}
 
 	written = mapping->a_ops->direct_IO(iocb, from);
@@ -3877,11 +3881,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	 *
 	 * Skip invalidation for async writes or if mapping has no pages.
 	 */
-	if (written > 0 && mapping->nrpages &&
-	    invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end))
-		dio_warn_stale_pagecache(file);
-
 	if (written > 0) {
+		struct inode *inode = mapping->host;
+		loff_t pos = iocb->ki_pos;
+
+		kiocb_invalidate_post_direct_write(iocb, written);
 		pos += written;
 		write_len -= written;
 		if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) {
@@ -3892,7 +3896,6 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	}
 	if (written != -EIOCBQUEUED)
 		iov_iter_revert(from, write_len - iov_iter_count(from));
-out:
 	return written;
 }
 EXPORT_SYMBOL(generic_file_direct_write);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper Christoph Hellwig
@ 2023-06-01 14:58 ` Christoph Hellwig
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Christoph Hellwig
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

All callers of iomap_file_buffered_write need to updated ki_pos, move it
into common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
---
 fs/gfs2/file.c         | 4 +---
 fs/iomap/buffered-io.c | 9 ++++++---
 fs/xfs/xfs_file.c      | 2 --
 fs/zonefs/file.c       | 4 +---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 904a0d6ac1a1a9..c6a7555d5ad8bb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1044,10 +1044,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	pagefault_disable();
 	ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
 	pagefault_enable();
-	if (ret > 0) {
-		iocb->ki_pos += ret;
+	if (ret > 0)
 		written += ret;
-	}
 
 	if (inode == sdp->sd_rindex)
 		gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 		.len		= iov_iter_count(i),
 		.flags		= IOMAP_WRITE,
 	};
-	int ret;
+	ssize_t ret;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		iter.flags |= IOMAP_NOWAIT;
 
 	while ((ret = iomap_iter(&iter, ops)) > 0)
 		iter.processed = iomap_write_iter(&iter, i);
-	if (iter.pos == iocb->ki_pos)
+
+	if (unlikely(ret < 0))
 		return ret;
-	return iter.pos - iocb->ki_pos;
+	ret = iter.pos - iocb->ki_pos;
+	iocb->ki_pos += ret;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 431c3fd0e2b598..d57443db633637 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -720,8 +720,6 @@ xfs_file_buffered_write(
 	trace_xfs_file_buffered_write(iocb, from);
 	ret = iomap_file_buffered_write(iocb, from,
 			&xfs_buffered_write_iomap_ops);
-	if (likely(ret >= 0))
-		iocb->ki_pos += ret;
 
 	/*
 	 * If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
 		goto inode_unlock;
 
 	ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
-	if (ret > 0)
-		iocb->ki_pos += ret;
-	else if (ret == -EIO)
+	if (ret == -EIO)
 		zonefs_io_error(inode, true);
 
 inode_unlock:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write Christoph Hellwig
@ 2023-06-01 14:59 ` Christoph Hellwig
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Use the common helpers for direct I/O page invalidation instead of
open coding the logic.  This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/direct-io.c | 55 ++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0795c54a745bca..6bd14691f96e07 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before)
 {
-	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct iomap_iter iomi = {
 		.inode		= inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		.flags		= IOMAP_DIRECT,
 		.private	= private,
 	};
-	loff_t end = iomi.pos + iomi.len - 1, ret = 0;
 	bool wait_for_completion =
 		is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
 	struct blk_plug plug;
 	struct iomap_dio *dio;
+	loff_t ret = 0;
 
 	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
 
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->submit.waiter = current;
 	dio->submit.poll_bio = NULL;
 
+	if (iocb->ki_flags & IOCB_NOWAIT)
+		iomi.flags |= IOMAP_NOWAIT;
+
 	if (iov_iter_rw(iter) == READ) {
 		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;
 
-		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (filemap_range_needs_writeback(mapping, iomi.pos,
-					end)) {
-				ret = -EAGAIN;
-				goto out_free_dio;
-			}
-			iomi.flags |= IOMAP_NOWAIT;
-		}
-
 		if (user_backed_iter(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
+
+		ret = kiocb_write_and_wait(iocb, iomi.len);
+		if (ret)
+			goto out_free_dio;
 	} else {
 		iomi.flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
-		if (iocb->ki_flags & IOCB_NOWAIT) {
-			if (filemap_range_has_page(mapping, iomi.pos, end)) {
-				ret = -EAGAIN;
+		if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+			ret = -EAGAIN;
+			if (iomi.pos >= dio->i_size ||
+			    iomi.pos + iomi.len > dio->i_size)
 				goto out_free_dio;
-			}
-			iomi.flags |= IOMAP_NOWAIT;
+			iomi.flags |= IOMAP_OVERWRITE_ONLY;
 		}
 
 		/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			if (!(iocb->ki_flags & IOCB_SYNC))
 				dio->flags |= IOMAP_DIO_WRITE_FUA;
 		}
-	}
-
-	if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
-		ret = -EAGAIN;
-		if (iomi.pos >= dio->i_size ||
-		    iomi.pos + iomi.len > dio->i_size)
-			goto out_free_dio;
-		iomi.flags |= IOMAP_OVERWRITE_ONLY;
-	}
 
-	ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
-	if (ret)
-		goto out_free_dio;
-
-	if (iov_iter_rw(iter) == WRITE) {
 		/*
 		 * Try to invalidate cache pages for the range we are writing.
 		 * If this invalidation fails, let the caller fall back to
 		 * buffered I/O.
 		 */
-		if (invalidate_inode_pages2_range(mapping,
-				iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
-			trace_iomap_dio_invalidate_fail(inode, iomi.pos,
-							iomi.len);
-			ret = -ENOTBLK;
+		ret = kiocb_invalidate_pages(iocb, iomi.len);
+		if (ret) {
+			if (ret != -EAGAIN) {
+				trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+								iomi.len);
+				ret = -ENOTBLK;
+			}
 			goto out_free_dio;
 		}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Christoph Hellwig
@ 2023-06-01 14:59 ` Christoph Hellwig
  2023-06-06  0:04   ` Darrick J. Wong
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Add a helper dealing with handling the syncing of a buffered write fallback
for direct I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/libfs.c         | 41 ++++++++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 mm/filemap.c       | 66 +++++++++++-----------------------------------
 3 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a327158..5b851315eeed03 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1613,3 +1613,44 @@ u64 inode_query_iversion(struct inode *inode)
 	return cur >> I_VERSION_QUERIED_SHIFT;
 }
 EXPORT_SYMBOL(inode_query_iversion);
+
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+		ssize_t direct_written, ssize_t buffered_written)
+{
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
+	loff_t pos = iocb->ki_pos - buffered_written;
+	loff_t end = iocb->ki_pos - 1;
+	int err;
+
+	/*
+	 * If the buffered write fallback returned an error, we want to return
+	 * the number of bytes which were written by direct I/O, or the error
+	 * code if that was zero.
+	 *
+	 * Note that this differs from normal direct-io semantics, which will
+	 * return -EFOO even if some bytes were written.
+	 */
+	if (unlikely(buffered_written < 0)) {
+		if (direct_written)
+			return direct_written;
+		return buffered_written;
+	}
+
+	/*
+	 * We need to ensure that the page cache pages are written to disk and
+	 * invalidated to preserve the expected O_DIRECT semantics.
+	 */
+	err = filemap_write_and_wait_range(mapping, pos, end);
+	if (err < 0) {
+		/*
+		 * We don't know how much we wrote, so just return the number of
+		 * bytes which were direct-written
+		 */
+		if (direct_written)
+			return direct_written;
+		return err;
+	}
+	invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+	return direct_written + buffered_written;
+}
+EXPORT_SYMBOL_GPL(direct_write_fallback);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 91021b4e1f6f48..6af25137543824 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
 ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+		ssize_t direct_written, ssize_t buffered_written);
 
 ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
 		rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index ddb6f8aa86d6ca..137508da5525b6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4006,23 +4006,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
-	struct inode 	*inode = mapping->host;
-	ssize_t		written = 0;
-	ssize_t		err;
-	ssize_t		status;
+	struct inode *inode = mapping->host;
+	ssize_t ret;
 
-	err = file_remove_privs(file);
-	if (err)
-		goto out;
+	ret = file_remove_privs(file);
+	if (ret)
+		return ret;
 
-	err = file_update_time(file);
-	if (err)
-		goto out;
+	ret = file_update_time(file);
+	if (ret)
+		return ret;
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		loff_t pos, endbyte;
-
-		written = generic_file_direct_write(iocb, from);
+		ret = generic_file_direct_write(iocb, from);
 		/*
 		 * If the write stopped short of completing, fall back to
 		 * buffered writes.  Some filesystems do this for writes to
@@ -4030,45 +4026,13 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		 * not succeed (even if it did, DAX does not handle dirty
 		 * page-cache pages correctly).
 		 */
-		if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
-			goto out;
-
-		pos = iocb->ki_pos;
-		status = generic_perform_write(iocb, from);
-		/*
-		 * If generic_perform_write() returned a synchronous error
-		 * then we want to return the number of bytes which were
-		 * direct-written, or the error code if that was zero.  Note
-		 * that this differs from normal direct-io semantics, which
-		 * will return -EFOO even if some bytes were written.
-		 */
-		if (unlikely(status < 0)) {
-			err = status;
-			goto out;
-		}
-		/*
-		 * We need to ensure that the page cache pages are written to
-		 * disk and invalidated to preserve the expected O_DIRECT
-		 * semantics.
-		 */
-		endbyte = pos + status - 1;
-		err = filemap_write_and_wait_range(mapping, pos, endbyte);
-		if (err == 0) {
-			written += status;
-			invalidate_mapping_pages(mapping,
-						 pos >> PAGE_SHIFT,
-						 endbyte >> PAGE_SHIFT);
-		} else {
-			/*
-			 * We don't know how much we wrote, so just return
-			 * the number of bytes which were direct-written
-			 */
-		}
-	} else {
-		written = generic_perform_write(iocb, from);
+		if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
+			return ret;
+		return direct_write_fallback(iocb, from, ret,
+				generic_perform_write(iocb, from));
 	}
-out:
-	return written ? written : err;
+
+	return generic_perform_write(iocb, from);
 }
 EXPORT_SYMBOL(__generic_file_write_iter);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
@ 2023-06-01 14:59 ` Christoph Hellwig
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write Christoph Hellwig
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Both callers of fuse_perform_write need to updated ki_pos, move it into
common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 fs/fuse/file.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97d435874b14aa..d5902506cdcc65 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
 	fuse_write_update_attr(inode, pos, res);
 	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
 
-	return res > 0 ? res : err;
+	if (!res)
+		return err;
+	iocb->ki_pos += res;
+	return res;
 }
 
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
@@ -1341,7 +1344,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = mapping->host;
 	ssize_t err;
 	struct fuse_conn *fc = get_fuse_conn(inode);
-	loff_t endbyte = 0;
 
 	if (fc->writeback_cache) {
 		/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1375,19 +1377,20 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		goto out;
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		loff_t pos = iocb->ki_pos;
+		loff_t pos, endbyte;
+
 		written = generic_file_direct_write(iocb, from);
 		if (written < 0 || !iov_iter_count(from))
 			goto out;
 
-		pos += written;
-
-		written_buffered = fuse_perform_write(iocb, mapping, from, pos);
+		written_buffered = fuse_perform_write(iocb, mapping, from,
+						      iocb->ki_pos);
 		if (written_buffered < 0) {
 			err = written_buffered;
 			goto out;
 		}
-		endbyte = pos + written_buffered - 1;
+		pos = iocb->ki_pos - written_buffered;
+		endbyte = iocb->ki_pos - 1;
 
 		err = filemap_write_and_wait_range(file->f_mapping, pos,
 						   endbyte);
@@ -1399,11 +1402,8 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 					 endbyte >> PAGE_SHIFT);
 
 		written += written_buffered;
-		iocb->ki_pos = pos + written_buffered;
 	} else {
 		written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
-		if (written >= 0)
-			iocb->ki_pos += written;
 	}
 out:
 	inode_unlock(inode);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
@ 2023-06-01 14:59 ` Christoph Hellwig
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

pos is always equal to iocb->ki_pos, and mapping is always equal to
iocb->ki_filp->f_mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/fuse/file.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d5902506cdcc65..b4e272a65fdd25 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1280,13 +1280,13 @@ static inline unsigned int fuse_wr_pages(loff_t pos, size_t len,
 		     max_pages);
 }
 
-static ssize_t fuse_perform_write(struct kiocb *iocb,
-				  struct address_space *mapping,
-				  struct iov_iter *ii, loff_t pos)
+static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 {
+	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = mapping->host;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
+	loff_t pos = iocb->ki_pos;
 	int err = 0;
 	ssize_t res = 0;
 
@@ -1383,8 +1383,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		if (written < 0 || !iov_iter_count(from))
 			goto out;
 
-		written_buffered = fuse_perform_write(iocb, mapping, from,
-						      iocb->ki_pos);
+		written_buffered = fuse_perform_write(iocb, from);
 		if (written_buffered < 0) {
 			err = written_buffered;
 			goto out;
@@ -1403,7 +1402,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 		written += written_buffered;
 	} else {
-		written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
+		written = fuse_perform_write(iocb, from);
 	}
 out:
 	inode_unlock(inode);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback
  2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write Christoph Hellwig
@ 2023-06-01 14:59 ` Christoph Hellwig
  11 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 14:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Use the generic direct_write_fallback helper instead of duplicating the
logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 fs/fuse/file.c | 24 ++----------------------
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b4e272a65fdd25..3a7c7d7181ccb9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1340,7 +1340,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
 	ssize_t written = 0;
-	ssize_t written_buffered = 0;
 	struct inode *inode = mapping->host;
 	ssize_t err;
 	struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1377,30 +1376,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		goto out;
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
-		loff_t pos, endbyte;
-
 		written = generic_file_direct_write(iocb, from);
 		if (written < 0 || !iov_iter_count(from))
 			goto out;
-
-		written_buffered = fuse_perform_write(iocb, from);
-		if (written_buffered < 0) {
-			err = written_buffered;
-			goto out;
-		}
-		pos = iocb->ki_pos - written_buffered;
-		endbyte = iocb->ki_pos - 1;
-
-		err = filemap_write_and_wait_range(file->f_mapping, pos,
-						   endbyte);
-		if (err)
-			goto out;
-
-		invalidate_mapping_pages(file->f_mapping,
-					 pos >> PAGE_SHIFT,
-					 endbyte >> PAGE_SHIFT);
-
-		written += written_buffered;
+		written = direct_write_fallback(iocb, from, written,
+				fuse_perform_write(iocb, from));
 	} else {
 		written = fuse_perform_write(iocb, from);
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper
  2023-06-01 14:59 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
@ 2023-06-06  0:04   ` Darrick J. Wong
  2023-06-06  6:43     ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2023-06-06  0:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jun 01, 2023 at 04:59:01PM +0200, Christoph Hellwig wrote:
> Add a helper dealing with handling the syncing of a buffered write fallback
> for direct I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>

Looks good to me; whose tree do you want this to go through?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/libfs.c         | 41 ++++++++++++++++++++++++++++
>  include/linux/fs.h |  2 ++
>  mm/filemap.c       | 66 +++++++++++-----------------------------------
>  3 files changed, 58 insertions(+), 51 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 89cf614a327158..5b851315eeed03 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1613,3 +1613,44 @@ u64 inode_query_iversion(struct inode *inode)
>  	return cur >> I_VERSION_QUERIED_SHIFT;
>  }
>  EXPORT_SYMBOL(inode_query_iversion);
> +
> +ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
> +		ssize_t direct_written, ssize_t buffered_written)
> +{
> +	struct address_space *mapping = iocb->ki_filp->f_mapping;
> +	loff_t pos = iocb->ki_pos - buffered_written;
> +	loff_t end = iocb->ki_pos - 1;
> +	int err;
> +
> +	/*
> +	 * If the buffered write fallback returned an error, we want to return
> +	 * the number of bytes which were written by direct I/O, or the error
> +	 * code if that was zero.
> +	 *
> +	 * Note that this differs from normal direct-io semantics, which will
> +	 * return -EFOO even if some bytes were written.
> +	 */
> +	if (unlikely(buffered_written < 0)) {
> +		if (direct_written)
> +			return direct_written;
> +		return buffered_written;
> +	}
> +
> +	/*
> +	 * We need to ensure that the page cache pages are written to disk and
> +	 * invalidated to preserve the expected O_DIRECT semantics.
> +	 */
> +	err = filemap_write_and_wait_range(mapping, pos, end);
> +	if (err < 0) {
> +		/*
> +		 * We don't know how much we wrote, so just return the number of
> +		 * bytes which were direct-written
> +		 */
> +		if (direct_written)
> +			return direct_written;
> +		return err;
> +	}
> +	invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> +	return direct_written + buffered_written;
> +}
> +EXPORT_SYMBOL_GPL(direct_write_fallback);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 91021b4e1f6f48..6af25137543824 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
>  extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
>  extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
>  ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
> +ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
> +		ssize_t direct_written, ssize_t buffered_written);
>  
>  ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
>  		rwf_t flags);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ddb6f8aa86d6ca..137508da5525b6 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4006,23 +4006,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct address_space *mapping = file->f_mapping;
> -	struct inode 	*inode = mapping->host;
> -	ssize_t		written = 0;
> -	ssize_t		err;
> -	ssize_t		status;
> +	struct inode *inode = mapping->host;
> +	ssize_t ret;
>  
> -	err = file_remove_privs(file);
> -	if (err)
> -		goto out;
> +	ret = file_remove_privs(file);
> +	if (ret)
> +		return ret;
>  
> -	err = file_update_time(file);
> -	if (err)
> -		goto out;
> +	ret = file_update_time(file);
> +	if (ret)
> +		return ret;
>  
>  	if (iocb->ki_flags & IOCB_DIRECT) {
> -		loff_t pos, endbyte;
> -
> -		written = generic_file_direct_write(iocb, from);
> +		ret = generic_file_direct_write(iocb, from);
>  		/*
>  		 * If the write stopped short of completing, fall back to
>  		 * buffered writes.  Some filesystems do this for writes to
> @@ -4030,45 +4026,13 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		 * not succeed (even if it did, DAX does not handle dirty
>  		 * page-cache pages correctly).
>  		 */
> -		if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> -			goto out;
> -
> -		pos = iocb->ki_pos;
> -		status = generic_perform_write(iocb, from);
> -		/*
> -		 * If generic_perform_write() returned a synchronous error
> -		 * then we want to return the number of bytes which were
> -		 * direct-written, or the error code if that was zero.  Note
> -		 * that this differs from normal direct-io semantics, which
> -		 * will return -EFOO even if some bytes were written.
> -		 */
> -		if (unlikely(status < 0)) {
> -			err = status;
> -			goto out;
> -		}
> -		/*
> -		 * We need to ensure that the page cache pages are written to
> -		 * disk and invalidated to preserve the expected O_DIRECT
> -		 * semantics.
> -		 */
> -		endbyte = pos + status - 1;
> -		err = filemap_write_and_wait_range(mapping, pos, endbyte);
> -		if (err == 0) {
> -			written += status;
> -			invalidate_mapping_pages(mapping,
> -						 pos >> PAGE_SHIFT,
> -						 endbyte >> PAGE_SHIFT);
> -		} else {
> -			/*
> -			 * We don't know how much we wrote, so just return
> -			 * the number of bytes which were direct-written
> -			 */
> -		}
> -	} else {
> -		written = generic_perform_write(iocb, from);
> +		if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
> +			return ret;
> +		return direct_write_fallback(iocb, from, ret,
> +				generic_perform_write(iocb, from));
>  	}
> -out:
> -	return written ? written : err;
> +
> +	return generic_perform_write(iocb, from);
>  }
>  EXPORT_SYMBOL(__generic_file_write_iter);
>  
> -- 
> 2.39.2
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper
  2023-06-06  0:04   ` Darrick J. Wong
@ 2023-06-06  6:43     ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-06  6:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, Jun 05, 2023 at 05:04:14PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 01, 2023 at 04:59:01PM +0200, Christoph Hellwig wrote:
> > Add a helper dealing with handling the syncing of a buffered write fallback
> > for direct I/O.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> > Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
> 
> Looks good to me; whose tree do you want this to go through?

Andrew has already picked them up.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Cluster-devel] [f2fs-dev] [PATCH 01/12] backing_dev: remove current->backing_dev_info
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
@ 2023-07-06  0:18   ` patchwork-bot+f2fs
  0 siblings, 0 replies; 28+ messages in thread
From: patchwork-bot+f2fs @ 2023-07-06  0:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hello:

This series was applied to jaegeuk/f2fs.git (dev)
by Andrew Morton <akpm@linux-foundation.org>:

On Thu,  1 Jun 2023 16:58:53 +0200 you wrote:
> The last user of current->backing_dev_info disappeared in commit
> b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
> functions").  Remove the field and all assignments to it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Christian Brauner <brauner@kernel.org>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> Acked-by: Theodore Ts'o <tytso@mit.edu>
> 
> [...]

Here is the summary with links:
  - [f2fs-dev,01/12] backing_dev: remove current->backing_dev_info
    https://git.kernel.org/jaegeuk/f2fs/c/0d625446d0a4
  - [f2fs-dev,02/12] iomap: update ki_pos a little later in iomap_dio_complete
    https://git.kernel.org/jaegeuk/f2fs/c/936e114a245b
  - [f2fs-dev,03/12] filemap: update ki_pos in generic_perform_write
    https://git.kernel.org/jaegeuk/f2fs/c/182c25e9c157
  - [f2fs-dev,04/12] filemap: add a kiocb_write_and_wait helper
    https://git.kernel.org/jaegeuk/f2fs/c/3c435a0fe35c
  - [f2fs-dev,05/12] filemap: add a kiocb_invalidate_pages helper
    https://git.kernel.org/jaegeuk/f2fs/c/e003f74afbd2
  - [f2fs-dev,06/12] filemap: add a kiocb_invalidate_post_direct_write helper
    https://git.kernel.org/jaegeuk/f2fs/c/c402a9a9430b
  - [f2fs-dev,07/12] iomap: update ki_pos in iomap_file_buffered_write
    https://git.kernel.org/jaegeuk/f2fs/c/219580eea1ee
  - [f2fs-dev,08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages
    https://git.kernel.org/jaegeuk/f2fs/c/8ee93b4bb626
  - [f2fs-dev,09/12] fs: factor out a direct_write_fallback helper
    https://git.kernel.org/jaegeuk/f2fs/c/44fff0fa08ec
  - [f2fs-dev,10/12] fuse: update ki_pos in fuse_perform_write
    https://git.kernel.org/jaegeuk/f2fs/c/70e986c3b4f4
  - [f2fs-dev,11/12] fuse: drop redundant arguments to fuse_perform_write
    https://git.kernel.org/jaegeuk/f2fs/c/596df33d673d
  - [f2fs-dev,12/12] fuse: use direct_write_fallback
    https://git.kernel.org/jaegeuk/f2fs/c/64d1b4dd826d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-06-01 14:58 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
@ 2023-08-27 19:41   ` Al Viro
  2023-08-27 21:45     ` Al Viro
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Al Viro @ 2023-08-27 19:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
	cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
	linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
	ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Thu, Jun 01, 2023 at 04:58:55PM +0200, Christoph Hellwig wrote:
> All callers of generic_perform_write need to updated ki_pos, move it into
> common code.

> @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		endbyte = pos + status - 1;
>  		err = filemap_write_and_wait_range(mapping, pos, endbyte);
>  		if (err == 0) {
> -			iocb->ki_pos = endbyte + 1;
>  			written += status;
>  			invalidate_mapping_pages(mapping,
>  						 pos >> PAGE_SHIFT,
> @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		}
>  	} else {
>  		written = generic_perform_write(iocb, from);
> -		if (likely(written > 0))
> -			iocb->ki_pos += written;
>  	}
>  out:
>  	return written ? written : err;

[another late reply, sorry]

That part is somewhat fishy - there's a case where you return a positive value
and advance ->ki_pos by more than that amount.  I really wonder if all callers
of ->write_iter() are OK with that.  Consider e.g. this:

ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
{
        struct fd f = fdget_pos(fd);
        ssize_t ret = -EBADF;

        if (f.file) {
                loff_t pos, *ppos = file_ppos(f.file);
                if (ppos) {
                        pos = *ppos;   
                        ppos = &pos;
                }
                ret = vfs_write(f.file, buf, count, ppos);
                if (ret >= 0 && ppos)
                        f.file->f_pos = pos;
                fdput_pos(f);
        }

        return ret;
}

ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
        ssize_t ret;

        if (!(file->f_mode & FMODE_WRITE))
                return -EBADF;
        if (!(file->f_mode & FMODE_CAN_WRITE))
                return -EINVAL;
        if (unlikely(!access_ok(buf, count)))
                return -EFAULT;

        ret = rw_verify_area(WRITE, file, pos, count);
        if (ret)
                return ret;
        if (count > MAX_RW_COUNT)
                count =  MAX_RW_COUNT;
        file_start_write(file);
        if (file->f_op->write)
                ret = file->f_op->write(file, buf, count, pos);
        else if (file->f_op->write_iter)
                ret = new_sync_write(file, buf, count, pos);
        else   
                ret = -EINVAL;
        if (ret > 0) {
                fsnotify_modify(file);
                add_wchar(current, ret);
        }
        inc_syscw(current);
        file_end_write(file);
        return ret;
}

static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
        struct kiocb kiocb;
        struct iov_iter iter;
        ssize_t ret; 

        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = (ppos ? *ppos : 0);
        iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len);

        ret = call_write_iter(filp, &kiocb, &iter);
        BUG_ON(ret == -EIOCBQUEUED);
        if (ret > 0 && ppos)
                *ppos = kiocb.ki_pos;
        return ret;
} 

Suppose ->write_iter() ends up doing returning a positive value smaller than
the increment of kiocb.ki_pos.  What do we get?  ret is positive, so
kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
we copy it into file->f_pos.

Is it really OK to have write() return 4096 and advance the file position
by 16K?  AFAICS, userland wouldn't get any indication of something
odd going on - just a short write to a regular file, with followup write
of remaining 12K getting quietly written in the range 16K..28K.

I don't remember what POSIX says about that, but it would qualify as
nasty surprise for any userland program - sure, one can check fsync()
results before closing the sucker and see if everything looks fine,
but the way it's usually discussed could easily lead to assumption that
(synchronous) O_DIRECT writes would not be affected by anything of that
sort.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-27 19:41   ` Al Viro
@ 2023-08-27 21:45     ` Al Viro
  2023-08-28 12:32       ` Christoph Hellwig
  2023-09-13 11:00       ` Christoph Hellwig
  2023-08-28  1:04     ` Al Viro
  2023-08-28 12:30     ` Christoph Hellwig
  2 siblings, 2 replies; 28+ messages in thread
From: Al Viro @ 2023-08-27 21:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
	cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
	linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
	ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> On Thu, Jun 01, 2023 at 04:58:55PM +0200, Christoph Hellwig wrote:
> > All callers of generic_perform_write need to updated ki_pos, move it into
> > common code.
> 
> > @@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  		endbyte = pos + status - 1;
> >  		err = filemap_write_and_wait_range(mapping, pos, endbyte);
> >  		if (err == 0) {
> > -			iocb->ki_pos = endbyte + 1;
> >  			written += status;
> >  			invalidate_mapping_pages(mapping,
> >  						 pos >> PAGE_SHIFT,
> > @@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >  		}
> >  	} else {
> >  		written = generic_perform_write(iocb, from);
> > -		if (likely(written > 0))
> > -			iocb->ki_pos += written;
> >  	}
> >  out:
> >  	return written ? written : err;
> 
> [another late reply, sorry]
> 
> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount.  I really wonder if all callers
> of ->write_iter() are OK with that.  Consider e.g. this:
> 
> ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
> {
>         struct fd f = fdget_pos(fd);
>         ssize_t ret = -EBADF;
> 
>         if (f.file) {
>                 loff_t pos, *ppos = file_ppos(f.file);
>                 if (ppos) {
>                         pos = *ppos;   
>                         ppos = &pos;
>                 }
>                 ret = vfs_write(f.file, buf, count, ppos);
>                 if (ret >= 0 && ppos)
>                         f.file->f_pos = pos;
>                 fdput_pos(f);
>         }
> 
>         return ret;
> }
> 
> ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
> {
>         ssize_t ret;
> 
>         if (!(file->f_mode & FMODE_WRITE))
>                 return -EBADF;
>         if (!(file->f_mode & FMODE_CAN_WRITE))
>                 return -EINVAL;
>         if (unlikely(!access_ok(buf, count)))
>                 return -EFAULT;
> 
>         ret = rw_verify_area(WRITE, file, pos, count);
>         if (ret)
>                 return ret;
>         if (count > MAX_RW_COUNT)
>                 count =  MAX_RW_COUNT;
>         file_start_write(file);
>         if (file->f_op->write)
>                 ret = file->f_op->write(file, buf, count, pos);
>         else if (file->f_op->write_iter)
>                 ret = new_sync_write(file, buf, count, pos);
>         else   
>                 ret = -EINVAL;
>         if (ret > 0) {
>                 fsnotify_modify(file);
>                 add_wchar(current, ret);
>         }
>         inc_syscw(current);
>         file_end_write(file);
>         return ret;
> }
> 
> static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
> {
>         struct kiocb kiocb;
>         struct iov_iter iter;
>         ssize_t ret; 
> 
>         init_sync_kiocb(&kiocb, filp);
>         kiocb.ki_pos = (ppos ? *ppos : 0);
>         iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len);
> 
>         ret = call_write_iter(filp, &kiocb, &iter);
>         BUG_ON(ret == -EIOCBQUEUED);
>         if (ret > 0 && ppos)
>                 *ppos = kiocb.ki_pos;
>         return ret;
> } 
> 
> Suppose ->write_iter() ends up doing returning a positive value smaller than
> the increment of kiocb.ki_pos.  What do we get?  ret is positive, so
> kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
> we copy it into file->f_pos.
> 
> Is it really OK to have write() return 4096 and advance the file position
> by 16K?  AFAICS, userland wouldn't get any indication of something
> odd going on - just a short write to a regular file, with followup write
> of remaining 12K getting quietly written in the range 16K..28K.
> 
> I don't remember what POSIX says about that, but it would qualify as
> nasty surprise for any userland program - sure, one can check fsync()
> results before closing the sucker and see if everything looks fine,
> but the way it's usually discussed could easily lead to assumption that
> (synchronous) O_DIRECT writes would not be affected by anything of that
> sort.

IOW, I suspect that the right thing to do would be something along the lines
of

direct_write_fallback(): on error revert the ->ki_pos update from buffered write

If we fail filemap_write_and_wait_range() on the range the buffered write went
into, we only report the "number of bytes which we direct-written", to quote
the comment in there.  Which is fine, but buffered write has already advanced
iocb->ki_pos, so we need to roll that back.  Otherwise we end up with e.g.
write(2) advancing position by more than the amount it reports having written.

Fixes: 182c25e9c157 "filemap: update ki_pos in generic_perform_write"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/libfs.c b/fs/libfs.c
index 5b851315eeed..712c57828c0e 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1646,6 +1646,7 @@ ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
 		 * We don't know how much we wrote, so just return the number of
 		 * bytes which were direct-written
 		 */
+		iocb->ki_pos -= buffered_written;
 		if (direct_written)
 			return direct_written;
 		return err;


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-27 19:41   ` Al Viro
  2023-08-27 21:45     ` Al Viro
@ 2023-08-28  1:04     ` Al Viro
  2023-08-28 12:30     ` Christoph Hellwig
  2 siblings, 0 replies; 28+ messages in thread
From: Al Viro @ 2023-08-28  1:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
	cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
	linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
	ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:

> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount.  I really wonder if all callers
> of ->write_iter() are OK with that.

Speaking of which, in case of negative return value we'd better *not* use
->ki_pos; consider e.g. generic_file_write_iter() with O_DSYNC and
vfs_fsync_range() failure.  An error gets returned, but ->ki_pos is left
advanced.  Normal write(2) is fine - it will only update file->f_pos if
->write_iter() has returned a non-negative.  However, io_uring
kiocb_done() starts with
        if (req->flags & REQ_F_CUR_POS)
                req->file->f_pos = rw->kiocb.ki_pos;
        if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
                if (!__io_complete_rw_common(req, ret)) {
                        /*
                         * Safe to call io_end from here as we're inline
                         * from the submission path.
                         */
                        io_req_io_end(req);
                        io_req_set_res(req, final_ret,
                                       io_put_kbuf(req, issue_flags));
                        return IOU_OK;
                }
        } else {
                io_rw_done(&rw->kiocb, ret);
        }
Note that ->f_pos update is *NOT* conditional upon ret >= 0 - it happens
no matter what, provided that original request had ->kiocb.ki_pos equal
to -1 (on a non-FMODE_STREAM file).

Jens, is there any reason for doing that unconditionally?  IMO it's
a bad idea - there's a wide scope for fuckups that way, especially
since write(2) is not sensitive to that and this use of -1 ki_pos
is not particularly encouraged on io_uring side either, AFAICT.
Worse, it's handling of failure exits in the first place, which
already gets little testing...


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-27 19:41   ` Al Viro
  2023-08-27 21:45     ` Al Viro
  2023-08-28  1:04     ` Al Viro
@ 2023-08-28 12:30     ` Christoph Hellwig
  2023-08-28 14:02       ` Al Viro
  2 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-08-28 12:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
	Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
	linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
	Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount.  I really wonder if all callers
> of ->write_iter() are OK with that.  Consider e.g. this:

This should not exist in the latest version merged by Jens.  Can you
check if you still  see issues in the version in the block tree or
linux-next.

> Suppose ->write_iter() ends up doing returning a positive value smaller than
> the increment of kiocb.ki_pos.  What do we get?  ret is positive, so
> kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
> we copy it into file->f_pos.
> 
> Is it really OK to have write() return 4096 and advance the file position
> by 16K?  AFAICS, userland wouldn't get any indication of something
> odd going on - just a short write to a regular file, with followup write
> of remaining 12K getting quietly written in the range 16K..28K.
> 
> I don't remember what POSIX says about that, but it would qualify as
> nasty surprise for any userland program - sure, one can check fsync()
> results before closing the sucker and see if everything looks fine,
> but the way it's usually discussed could easily lead to assumption that
> (synchronous) O_DIRECT writes would not be affected by anything of that
> sort.

ki_pos should always be updated by the write return value.  Everything
else is a bug.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-27 21:45     ` Al Viro
@ 2023-08-28 12:32       ` Christoph Hellwig
  2023-08-28 13:56         ` Al Viro
  2023-09-13 11:00       ` Christoph Hellwig
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-08-28 12:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
	Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
	linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
	Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote:
> IOW, I suspect that the right thing to do would be something along the lines
> of

The idea looks sensible to me, but we'll also need to do it for the
filemap_write_and_wait_range failure case.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-28 12:32       ` Christoph Hellwig
@ 2023-08-28 13:56         ` Al Viro
  2023-08-28 14:15           ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2023-08-28 13:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
	cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
	linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
	ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Mon, Aug 28, 2023 at 02:32:59PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote:
> > IOW, I suspect that the right thing to do would be something along the lines
> > of
> 
> The idea looks sensible to me, but we'll also need to do it for the
> filemap_write_and_wait_range failure case.

Huh?  That's precisely where this patch is doing that...  That function
in mainline is
        if (unlikely(buffered_written < 0)) {
                if (direct_written)
                        return direct_written;
                return buffered_written;
        }

        /*
         * We need to ensure that the page cache pages are written to disk and
         * invalidated to preserve the expected O_DIRECT semantics.
         */
        err = filemap_write_and_wait_range(mapping, pos, end);
        if (err < 0) {
                /*
                 * We don't know how much we wrote, so just return the number of
                 * bytes which were direct-written
                 */
                if (direct_written)
                        return direct_written;
                return err;
        }
        invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
        return direct_written + buffered_written;

The first failure exit does not need any work - the caller had not bumped
->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's
where the patch upthread adds iocb->ki_pos -= buffered_written.

Or am I completely misparsing what you've written?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-28 12:30     ` Christoph Hellwig
@ 2023-08-28 14:02       ` Al Viro
  0 siblings, 0 replies; 28+ messages in thread
From: Al Viro @ 2023-08-28 14:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
	cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-nfs,
	linux-block, Damien Le Moal, Hannes Reinecke, Jaegeuk Kim,
	ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Mon, Aug 28, 2023 at 02:30:23PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> > That part is somewhat fishy - there's a case where you return a positive value
> > and advance ->ki_pos by more than that amount.  I really wonder if all callers
> > of ->write_iter() are OK with that.  Consider e.g. this:
> 
> This should not exist in the latest version merged by Jens.  Can you
> check if you still  see issues in the version in the block tree or
> linux-next.

Still does - the problem has migrated into direct_write_fallback(), but
that hadn't changed the situation.  We are still left with ->ki_pos bumped
by generic_perform_write() (evaluated as an argument of direct_write_fallback()
now) and *not* retraced in case when direct_write_fallback() decides to
discard the buffered write result.  Both in -next and in mainline (since
6.5-rc1).


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-28 13:56         ` Al Viro
@ 2023-08-28 14:15           ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-08-28 14:15 UTC (permalink / raw)
  To: Al Viro
  Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
	Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
	linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
	Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

On Mon, Aug 28, 2023 at 02:56:15PM +0100, Al Viro wrote:
> The first failure exit does not need any work - the caller had not bumped
> ->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's
> where the patch upthread adds iocb->ki_pos -= buffered_written.
> 
> Or am I completely misparsing what you've written?

No, I misread the patch.  Looks good:

Acked-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-08-27 21:45     ` Al Viro
  2023-08-28 12:32       ` Christoph Hellwig
@ 2023-09-13 11:00       ` Christoph Hellwig
  2023-09-13 16:33         ` Christian Brauner
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-09-13 11:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Darrick J. Wong, linux-mm, Christoph Hellwig, Miklos Szeredi,
	Matthew Wilcox, cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu,
	linux-nfs, linux-block, Damien Le Moal, Hannes Reinecke,
	Jaegeuk Kim, ceph-devel, Xiubo Li, Trond Myklebust, Jens Axboe,
	Christian Brauner, Theodore Ts'o, linux-f2fs-devel, linux-xfs,
	Anna Schumaker, linux-fsdevel, Andrew Morton

> direct_write_fallback(): on error revert the ->ki_pos update from buffered write

Al, Christian: can you send this fix on top Linus?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write
  2023-09-13 11:00       ` Christoph Hellwig
@ 2023-09-13 16:33         ` Christian Brauner
  0 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2023-09-13 16:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, linux-mm, Miklos Szeredi, Matthew Wilcox,
	cluster-devel, Ilya Dryomov, linux-ext4, Chao Yu, linux-block,
	Damien Le Moal, Al Viro, Jaegeuk Kim, ceph-devel, Xiubo Li,
	Trond Myklebust, Jens Axboe, linux-nfs, Theodore Ts'o,
	linux-f2fs-devel, linux-xfs, Anna Schumaker, linux-fsdevel,
	Andrew Morton, Hannes Reinecke

On Wed, Sep 13, 2023 at 01:00:10PM +0200, Christoph Hellwig wrote:
> > direct_write_fallback(): on error revert the ->ki_pos update from buffered write
> 
> Al, Christian: can you send this fix on top Linus?

Wasn't aware of this, sorry. I've picked it up and placed it with
another set of small fixes I already have.
I'm happy to have Al take it ofc.


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-09-13 16:48 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-01 14:58 [Cluster-devel] cleanup the filemap / direct I/O interaction v4 Christoph Hellwig
2023-06-01 14:58 ` [Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info Christoph Hellwig
2023-07-06  0:18   ` [Cluster-devel] [f2fs-dev] " patchwork-bot+f2fs
2023-06-01 14:58 ` [Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete Christoph Hellwig
2023-06-01 14:58 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
2023-08-27 19:41   ` Al Viro
2023-08-27 21:45     ` Al Viro
2023-08-28 12:32       ` Christoph Hellwig
2023-08-28 13:56         ` Al Viro
2023-08-28 14:15           ` Christoph Hellwig
2023-09-13 11:00       ` Christoph Hellwig
2023-09-13 16:33         ` Christian Brauner
2023-08-28  1:04     ` Al Viro
2023-08-28 12:30     ` Christoph Hellwig
2023-08-28 14:02       ` Al Viro
2023-06-01 14:58 ` [Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper Christoph Hellwig
2023-06-01 14:58 ` [Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper Christoph Hellwig
2023-06-01 14:58 ` [Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper Christoph Hellwig
2023-06-01 14:58 ` [Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write Christoph Hellwig
2023-06-01 14:59 ` [Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages Christoph Hellwig
2023-06-01 14:59 ` [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper Christoph Hellwig
2023-06-06  0:04   ` Darrick J. Wong
2023-06-06  6:43     ` Christoph Hellwig
2023-06-01 14:59 ` [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write Christoph Hellwig
2023-06-01 14:59 ` [Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write Christoph Hellwig
2023-06-01 14:59 ` [Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2023-05-31  7:50 [Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now) Christoph Hellwig
2023-05-31  7:50 ` [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write Christoph Hellwig
2023-06-01  4:06   ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).