[PATCH v2 00/10] ext4: clean up and refactor fallocate

linux-fsdevel.vger.kernel.org archive mirror

* [PATCH v2 00/10] ext4: clean up and refactor fallocate
@ 2024-09-04  6:29 Zhang Yi
  2024-09-04  6:29 ` [PATCH v2 01/10] ext4: write out dirty data before dropping pages Zhang Yi
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Changes since v1:
 - Fix the use of an uninitialized variable on the error path of
   ext4_do_fallocate() in patch 08.

The current ext4 fallocate code is a mess: mode checking, locking, input
parameter checking and position calculation are mixed together, and some
of the code is stale. Almost all of the five sub-functions perform the
same preparation, so it deserves a cleanup now.

This series cleans this up by refactoring all fallocate-related
operations. It unifies variable naming, removes some unnecessary position
calculations, factors out a common helper to check input parameters, and
factors out another common helper that waits for in-flight DIOs to
finish, takes the filemap invalidate lock, writes back dirty data and
drops the page cache.

The first patch fixes a potential data loss problem in punch hole, zero
range and collapse range by always writing back dirty pages. The later
patches do the cleanup and refactoring work; please see them for details.
After this series, a lot of redundant code is removed and the code is
clearer than before.

Thanks,
Yi.

Zhang Yi (10):
  ext4: write out dirty data before dropping pages
  ext4: don't explicit update times in ext4_fallocate()
  ext4: drop ext4_update_disksize_before_punch()
  ext4: refactor ext4_zero_range()
  ext4: refactor ext4_punch_hole()
  ext4: refactor ext4_collapse_range()
  ext4: refactor ext4_insert_range()
  ext4: factor out ext4_do_fallocate()
  ext4: factor out the common checking part of all fallocate operations
  ext4: factor out a common helper to lock and flush data before
    fallocate

 fs/ext4/ext4.h    |   5 +-
 fs/ext4/extents.c | 566 +++++++++++++++++++---------------------------
 fs/ext4/inode.c   | 173 ++++----------
 3 files changed, 278 insertions(+), 466 deletions(-)

-- 
2.39.2


* [PATCH v2 01/10] ext4: write out dirty data before dropping pages
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-17 16:50   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 02/10] ext4: don't explicit update times in ext4_fallocate() Zhang Yi
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Zero range, punch hole and collapse range currently share a potential
data loss problem. In general, ext4_zero_range(), ext4_collapse_range()
and ext4_punch_hole() discard all page cache in the operation range
before converting the extent status. However, the first two functions do
not write back all dirty data before discarding the page cache, and
ext4_punch_hole() writes it back at the very beginning without holding
i_rwsem and the mapping invalidate lock. Hence, if something goes wrong
(e.g. EIO or ENOMEM) just after the dirty page cache has been dropped,
the operation fails and the user's valid data in the dirty page cache is
lost. Fix this by writing back all dirty data under i_rwsem and the
mapping invalidate lock before discarding pages.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 77 +++++++++++++++++------------------------------
 fs/ext4/inode.c   | 19 +++++-------
 2 files changed, 36 insertions(+), 60 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e067f2dd0335..7d5edfa2e630 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4602,6 +4602,24 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	if (ret)
 		goto out_mutex;
 
+	/*
+	 * Prevent page faults from reinstantiating pages we have released
+	 * from page cache.
+	 */
+	filemap_invalidate_lock(mapping);
+
+	ret = ext4_break_layouts(inode);
+	if (ret)
+		goto out_invalidate_lock;
+
+	/*
+	 * Write data that will be zeroed to preserve them when successfully
+	 * discarding page cache below but fail to convert extents.
+	 */
+	ret = filemap_write_and_wait_range(mapping, start, end - 1);
+	if (ret)
+		goto out_invalidate_lock;
+
 	/* Preallocate the range including the unaligned edges */
 	if (partial_begin || partial_end) {
 		ret = ext4_alloc_file_blocks(file,
@@ -4610,7 +4628,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 				 round_down(offset, 1 << blkbits)) >> blkbits,
 				new_size, flags);
 		if (ret)
-			goto out_mutex;
+			goto out_invalidate_lock;
 
 	}
 
@@ -4619,37 +4637,9 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
 			  EXT4_EX_NOCACHE);
 
-		/*
-		 * Prevent page faults from reinstantiating pages we have
-		 * released from page cache.
-		 */
-		filemap_invalidate_lock(mapping);
-
-		ret = ext4_break_layouts(inode);
-		if (ret) {
-			filemap_invalidate_unlock(mapping);
-			goto out_mutex;
-		}
-
 		ret = ext4_update_disksize_before_punch(inode, offset, len);
-		if (ret) {
-			filemap_invalidate_unlock(mapping);
-			goto out_mutex;
-		}
-
-		/*
-		 * For journalled data we need to write (and checkpoint) pages
-		 * before discarding page cache to avoid inconsitent data on
-		 * disk in case of crash before zeroing trans is committed.
-		 */
-		if (ext4_should_journal_data(inode)) {
-			ret = filemap_write_and_wait_range(mapping, start,
-							   end - 1);
-			if (ret) {
-				filemap_invalidate_unlock(mapping);
-				goto out_mutex;
-			}
-		}
+		if (ret)
+			goto out_invalidate_lock;
 
 		/* Now release the pages and zero block aligned part of pages */
 		truncate_pagecache_range(inode, start, end - 1);
@@ -4657,12 +4647,11 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 
 		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
 					     flags);
-		filemap_invalidate_unlock(mapping);
 		if (ret)
-			goto out_mutex;
+			goto out_invalidate_lock;
 	}
 	if (!partial_begin && !partial_end)
-		goto out_mutex;
+		goto out_invalidate_lock;
 
 	/*
 	 * In worst case we have to writeout two nonadjacent unwritten
@@ -4675,7 +4664,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
 		ext4_std_error(inode->i_sb, ret);
-		goto out_mutex;
+		goto out_invalidate_lock;
 	}
 
 	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
@@ -4694,6 +4683,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 
 out_handle:
 	ext4_journal_stop(handle);
+out_invalidate_lock:
+	filemap_invalidate_unlock(mapping);
 out_mutex:
 	inode_unlock(inode);
 	return ret;
@@ -5363,20 +5354,8 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	 * for page size > block size.
 	 */
 	ioffset = round_down(offset, PAGE_SIZE);
-	/*
-	 * Write tail of the last page before removed range since it will get
-	 * removed from the page cache below.
-	 */
-	ret = filemap_write_and_wait_range(mapping, ioffset, offset);
-	if (ret)
-		goto out_mmap;
-	/*
-	 * Write data that will be shifted to preserve them when discarding
-	 * page cache below. We are also protected from pages becoming dirty
-	 * by i_rwsem and invalidate_lock.
-	 */
-	ret = filemap_write_and_wait_range(mapping, offset + len,
-					   LLONG_MAX);
+	/* Write out all dirty pages */
+	ret = filemap_write_and_wait_range(mapping, ioffset, LLONG_MAX);
 	if (ret)
 		goto out_mmap;
 	truncate_pagecache(inode, ioffset);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 941c1c0d5c6e..c3d7606a5315 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3957,17 +3957,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	trace_ext4_punch_hole(inode, offset, length, 0);
 
-	/*
-	 * Write out all dirty pages to avoid race conditions
-	 * Then release them.
-	 */
-	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		ret = filemap_write_and_wait_range(mapping, offset,
-						   offset + length - 1);
-		if (ret)
-			return ret;
-	}
-
 	inode_lock(inode);
 
 	/* No need to punch hole beyond i_size */
@@ -4021,6 +4010,14 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (ret)
 		goto out_dio;
 
+	/* Write out all dirty pages to avoid race conditions */
+	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+		ret = filemap_write_and_wait_range(mapping, offset,
+						   offset + length - 1);
+		if (ret)
+			goto out_dio;
+	}
+
 	first_block_offset = round_up(offset, sb->s_blocksize);
 	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
-- 
2.39.2



* [PATCH v2 02/10] ext4: don't explicit update times in ext4_fallocate()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
  2024-09-04  6:29 ` [PATCH v2 01/10] ext4: write out dirty data before dropping pages Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-20 16:04   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch() Zhang Yi
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

After commit ad5cd4f4ee4d ("ext4: fix fallocate to use file_modified to
update permissions consistently"), mtime and ctime are updated
appropriately through file_modified() when doing zero range, collapse
range, insert range and punch hole. Hence there is no need to explicitly
update the times in those paths, so just drop those updates.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 4 ----
 fs/ext4/inode.c   | 1 -
 2 files changed, 5 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7d5edfa2e630..19a9b14935b7 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4643,7 +4643,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 
 		/* Now release the pages and zero block aligned part of pages */
 		truncate_pagecache_range(inode, start, end - 1);
-		inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 
 		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
 					     flags);
@@ -4667,7 +4666,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		goto out_invalidate_lock;
 	}
 
-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	if (new_size)
 		ext4_update_inode_size(inode, new_size);
 	ret = ext4_mark_inode_dirty(handle, inode);
@@ -5393,7 +5391,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	ret = ext4_mark_inode_dirty(handle, inode);
 	ext4_update_inode_fsync_trans(handle, inode, 1);
 
@@ -5503,7 +5500,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	/* Expand file to avoid data loss if there is error while shifting */
 	inode->i_size += len;
 	EXT4_I(inode)->i_disksize += len;
-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (ret)
 		goto out_stop;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c3d7606a5315..8af25442d44d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4074,7 +4074,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
 
-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	ret2 = ext4_mark_inode_dirty(handle, inode);
 	if (unlikely(ret2))
 		ret = ret2;
-- 
2.39.2



* [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
  2024-09-04  6:29 ` [PATCH v2 01/10] ext4: write out dirty data before dropping pages Zhang Yi
  2024-09-04  6:29 ` [PATCH v2 02/10] ext4: don't explicit update times in ext4_fallocate() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-20 16:13   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 04/10] ext4: refactor ext4_zero_range() Zhang Yi
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Since we now always write back dirty data before zeroing a range or
punching a hole, the disksize of a file extended by delayed allocation
is updated properly when its pages are written back. Hence there is no
need to update the file's disksize before discarding the page cache in
ext4_zero_range() and ext4_punch_hole(); just drop
ext4_update_disksize_before_punch().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/ext4.h    |  3 ---
 fs/ext4/extents.c |  4 ----
 fs/ext4/inode.c   | 37 +------------------------------------
 3 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 08acd152261e..e8d7965f62c4 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3414,9 +3414,6 @@ static inline int ext4_update_inode_size(struct inode *inode, loff_t newsize)
 	return changed;
 }
 
-int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
-				      loff_t len);
-
 struct ext4_group_info {
 	unsigned long   bb_state;
 #ifdef AGGRESSIVE_CHECK
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 19a9b14935b7..d9fccf2970e9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4637,10 +4637,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
 			  EXT4_EX_NOCACHE);
 
-		ret = ext4_update_disksize_before_punch(inode, offset, len);
-		if (ret)
-			goto out_invalidate_lock;
-
 		/* Now release the pages and zero block aligned part of pages */
 		truncate_pagecache_range(inode, start, end - 1);
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8af25442d44d..9343ce9f2b01 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3872,37 +3872,6 @@ int ext4_can_truncate(struct inode *inode)
 	return 0;
 }
 
-/*
- * We have to make sure i_disksize gets properly updated before we truncate
- * page cache due to hole punching or zero range. Otherwise i_disksize update
- * can get lost as it may have been postponed to submission of writeback but
- * that will never happen after we truncate page cache.
- */
-int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
-				      loff_t len)
-{
-	handle_t *handle;
-	int ret;
-
-	loff_t size = i_size_read(inode);
-
-	WARN_ON(!inode_is_locked(inode));
-	if (offset > size || offset + len < size)
-		return 0;
-
-	if (EXT4_I(inode)->i_disksize >= size)
-		return 0;
-
-	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
-	if (IS_ERR(handle))
-		return PTR_ERR(handle);
-	ext4_update_i_disksize(inode, size);
-	ret = ext4_mark_inode_dirty(handle, inode);
-	ext4_journal_stop(handle);
-
-	return ret;
-}
-
 static void ext4_wait_dax_page(struct inode *inode)
 {
 	filemap_invalidate_unlock(inode->i_mapping);
@@ -4022,13 +3991,9 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
 	/* Now release the pages and zero block aligned part of pages*/
-	if (last_block_offset > first_block_offset) {
-		ret = ext4_update_disksize_before_punch(inode, offset, length);
-		if (ret)
-			goto out_dio;
+	if (last_block_offset > first_block_offset)
 		truncate_pagecache_range(inode, first_block_offset,
 					 last_block_offset);
-	}
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
 		credits = ext4_writepage_trans_blocks(inode);
-- 
2.39.2



* [PATCH v2 04/10] ext4: refactor ext4_zero_range()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (2 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-20 16:24   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 05/10] ext4: refactor ext4_punch_hole() Zhang Yi
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

The current ext4_zero_range() is full of complex position calculations
and stale error-out labels. To clean up the code and make it clearer,
refactor it by a) simplifying and renaming variables, b) removing some
unnecessary position calculations and always writing back dirty data and
dropping the page cache from offset to end, instead of only the aligned
blocks, and c) renaming the stale out_mutex label.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 96 ++++++++++++++++++-----------------------------
 1 file changed, 37 insertions(+), 59 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d9fccf2970e9..2fb0c2e303c7 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4540,40 +4540,15 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	struct inode *inode = file_inode(file);
 	struct address_space *mapping = file->f_mapping;
 	handle_t *handle = NULL;
-	unsigned int max_blocks;
 	loff_t new_size = 0;
-	int ret = 0;
-	int flags;
-	int credits;
-	int partial_begin, partial_end;
-	loff_t start, end;
-	ext4_lblk_t lblk;
+	loff_t end = offset + len;
+	ext4_lblk_t start_lblk, end_lblk;
+	unsigned int blocksize = i_blocksize(inode);
 	unsigned int blkbits = inode->i_blkbits;
+	int ret, flags, credits;
 
 	trace_ext4_zero_range(inode, offset, len, mode);
 
-	/*
-	 * Round up offset. This is not fallocate, we need to zero out
-	 * blocks, so convert interior block aligned part of the range to
-	 * unwritten and possibly manually zero out unaligned parts of the
-	 * range. Here, start and partial_begin are inclusive, end and
-	 * partial_end are exclusive.
-	 */
-	start = round_up(offset, 1 << blkbits);
-	end = round_down((offset + len), 1 << blkbits);
-
-	if (start < offset || end > offset + len)
-		return -EINVAL;
-	partial_begin = offset & ((1 << blkbits) - 1);
-	partial_end = (offset + len) & ((1 << blkbits) - 1);
-
-	lblk = start >> blkbits;
-	max_blocks = (end >> blkbits);
-	if (max_blocks < lblk)
-		max_blocks = 0;
-	else
-		max_blocks -= lblk;
-
 	inode_lock(inode);
 
 	/*
@@ -4581,26 +4556,23 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	 */
 	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
 		ret = -EOPNOTSUPP;
-		goto out_mutex;
+		goto out;
 	}
 
 	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
-	    (offset + len > inode->i_size ||
-	     offset + len > EXT4_I(inode)->i_disksize)) {
-		new_size = offset + len;
+	    (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) {
+		new_size = end;
 		ret = inode_newsize_ok(inode, new_size);
 		if (ret)
-			goto out_mutex;
+			goto out;
 	}
 
-	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
-
 	/* Wait all existing dio workers, newcomers will block on i_rwsem */
 	inode_dio_wait(inode);
 
 	ret = file_modified(file);
 	if (ret)
-		goto out_mutex;
+		goto out;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released
@@ -4616,36 +4588,40 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	 * Write data that will be zeroed to preserve them when successfully
 	 * discarding page cache below but fail to convert extents.
 	 */
-	ret = filemap_write_and_wait_range(mapping, start, end - 1);
+	ret = filemap_write_and_wait_range(mapping, offset, end - 1);
 	if (ret)
 		goto out_invalidate_lock;
 
+	/* Now release the pages and zero block aligned part of pages */
+	truncate_pagecache_range(inode, offset, end - 1);
+
+	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
 	/* Preallocate the range including the unaligned edges */
-	if (partial_begin || partial_end) {
-		ret = ext4_alloc_file_blocks(file,
-				round_down(offset, 1 << blkbits) >> blkbits,
-				(round_up((offset + len), 1 << blkbits) -
-				 round_down(offset, 1 << blkbits)) >> blkbits,
-				new_size, flags);
+	if (offset & (blocksize - 1) || end & (blocksize - 1)) {
+		ext4_lblk_t alloc_lblk = offset >> blkbits;
+		ext4_lblk_t len_lblk = EXT4_MAX_BLOCKS(len, offset, blkbits);
+
+		ret = ext4_alloc_file_blocks(file, alloc_lblk, len_lblk,
+					     new_size, flags);
 		if (ret)
 			goto out_invalidate_lock;
 
 	}
 
 	/* Zero range excluding the unaligned edges */
-	if (max_blocks > 0) {
-		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
-			  EXT4_EX_NOCACHE);
-
-		/* Now release the pages and zero block aligned part of pages */
-		truncate_pagecache_range(inode, start, end - 1);
-
-		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
-					     flags);
+	start_lblk = round_up(offset, blocksize) >> blkbits;
+	end_lblk = end >> blkbits;
+	if (end_lblk > start_lblk) {
+		ext4_lblk_t zero_blks = end_lblk - start_lblk;
+
+		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | EXT4_EX_NOCACHE);
+		ret = ext4_alloc_file_blocks(file, start_lblk, zero_blks,
+					     new_size, flags);
 		if (ret)
 			goto out_invalidate_lock;
 	}
-	if (!partial_begin && !partial_end)
+	/* Finish zeroing out if it doesn't contain partial block */
+	if (!(offset & (blocksize - 1)) && !(end & (blocksize - 1)))
 		goto out_invalidate_lock;
 
 	/*
@@ -4662,16 +4638,18 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		goto out_invalidate_lock;
 	}
 
+	/* Zero out partial block at the edges of the range */
+	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
+	if (ret)
+		goto out_handle;
+
 	if (new_size)
 		ext4_update_inode_size(inode, new_size);
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (unlikely(ret))
 		goto out_handle;
-	/* Zero out partial block at the edges of the range */
-	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
-	if (ret >= 0)
-		ext4_update_inode_fsync_trans(handle, inode, 1);
 
+	ext4_update_inode_fsync_trans(handle, inode, 1);
 	if (file->f_flags & O_SYNC)
 		ext4_handle_sync(handle);
 
@@ -4679,7 +4657,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	ext4_journal_stop(handle);
 out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out_mutex:
+out:
 	inode_unlock(inode);
 	return ret;
 }
-- 
2.39.2



* [PATCH v2 05/10] ext4: refactor ext4_punch_hole()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (3 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 04/10] ext4: refactor ext4_zero_range() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-20 16:31   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 06/10] ext4: refactor ext4_collapse_range() Zhang Yi
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

The current ext4_punch_hole() is full of complex position calculations
and stale error-out labels. To clean up the code and make it clearer,
refactor it by a) simplifying and renaming variables so that the style
matches ext4_zero_range(), b) removing some unnecessary position
calculations and always writing back dirty data and dropping the page
cache from offset to end, instead of only the aligned blocks, and c)
renaming the three stale error labels.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 114 ++++++++++++++++++++++--------------------------
 1 file changed, 51 insertions(+), 63 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9343ce9f2b01..dfaf9e9d6ad8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3916,13 +3916,14 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 {
 	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
-	ext4_lblk_t first_block, stop_block;
+	ext4_lblk_t start_lblk, end_lblk;
 	struct address_space *mapping = inode->i_mapping;
-	loff_t first_block_offset, last_block_offset, max_length;
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	loff_t max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;
+	loff_t end = offset + length;
+	unsigned long blocksize = i_blocksize(inode);
 	handle_t *handle;
 	unsigned int credits;
-	int ret = 0, ret2 = 0;
+	int ret = 0;
 
 	trace_ext4_punch_hole(inode, offset, length, 0);
 
@@ -3930,36 +3931,27 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	/* No need to punch hole beyond i_size */
 	if (offset >= inode->i_size)
-		goto out_mutex;
+		goto out;
 
 	/*
-	 * If the hole extends beyond i_size, set the hole
-	 * to end after the page that contains i_size
+	 * If the hole extends beyond i_size, set the hole to end after
+	 * the page that contains i_size, and also make sure that the hole
+	 * within one block before last range.
 	 */
-	if (offset + length > inode->i_size) {
-		length = inode->i_size +
-		   PAGE_SIZE - (inode->i_size & (PAGE_SIZE - 1)) -
-		   offset;
-	}
+	if (end > inode->i_size)
+		end = round_up(inode->i_size, PAGE_SIZE);
+	if (end > max_end)
+		end = max_end;
+	length = end - offset;
 
 	/*
-	 * For punch hole the length + offset needs to be within one block
-	 * before last range. Adjust the length if it goes beyond that limit.
+	 * Attach jinode to inode for jbd2 if we do any zeroing of partial
+	 * block.
 	 */
-	max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize;
-	if (offset + length > max_length)
-		length = max_length - offset;
-
-	if (offset & (sb->s_blocksize - 1) ||
-	    (offset + length) & (sb->s_blocksize - 1)) {
-		/*
-		 * Attach jinode to inode for jbd2 if we do any zeroing of
-		 * partial block
-		 */
+	if (offset & (blocksize - 1) || end & (blocksize - 1)) {
 		ret = ext4_inode_attach_jinode(inode);
 		if (ret < 0)
-			goto out_mutex;
-
+			goto out;
 	}
 
 	/* Wait all existing dio workers, newcomers will block on i_rwsem */
@@ -3967,7 +3959,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	ret = file_modified(file);
 	if (ret)
-		goto out_mutex;
+		goto out;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released from
@@ -3977,23 +3969,17 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	ret = ext4_break_layouts(inode);
 	if (ret)
-		goto out_dio;
+		goto out_invalidate_lock;
 
 	/* Write out all dirty pages to avoid race conditions */
 	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		ret = filemap_write_and_wait_range(mapping, offset,
-						   offset + length - 1);
+		ret = filemap_write_and_wait_range(mapping, offset, end - 1);
 		if (ret)
-			goto out_dio;
+			goto out_invalidate_lock;
 	}
 
-	first_block_offset = round_up(offset, sb->s_blocksize);
-	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
-
 	/* Now release the pages and zero block aligned part of pages*/
-	if (last_block_offset > first_block_offset)
-		truncate_pagecache_range(inode, first_block_offset,
-					 last_block_offset);
+	truncate_pagecache_range(inode, offset, end - 1);
 
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
 		credits = ext4_writepage_trans_blocks(inode);
@@ -4003,52 +3989,54 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
 		ext4_std_error(sb, ret);
-		goto out_dio;
+		goto out_invalidate_lock;
 	}
 
-	ret = ext4_zero_partial_blocks(handle, inode, offset,
-				       length);
+	ret = ext4_zero_partial_blocks(handle, inode, offset, length);
 	if (ret)
-		goto out_stop;
-
-	first_block = (offset + sb->s_blocksize - 1) >>
-		EXT4_BLOCK_SIZE_BITS(sb);
-	stop_block = (offset + length) >> EXT4_BLOCK_SIZE_BITS(sb);
+		goto out_handle;
 
 	/* If there are blocks to remove, do it */
-	if (stop_block > first_block) {
-		ext4_lblk_t hole_len = stop_block - first_block;
+	start_lblk = round_up(offset, blocksize) >> inode->i_blkbits;
+	end_lblk = end >> inode->i_blkbits;
+
+	if (end_lblk > start_lblk) {
+		ext4_lblk_t hole_len = end_lblk - start_lblk;
 
 		down_write(&EXT4_I(inode)->i_data_sem);
 		ext4_discard_preallocations(inode);
 
-		ext4_es_remove_extent(inode, first_block, hole_len);
+		ext4_es_remove_extent(inode, start_lblk, hole_len);
 
 		if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-			ret = ext4_ext_remove_space(inode, first_block,
-						    stop_block - 1);
+			ret = ext4_ext_remove_space(inode, start_lblk,
+						    end_lblk - 1);
 		else
-			ret = ext4_ind_remove_space(handle, inode, first_block,
-						    stop_block);
+			ret = ext4_ind_remove_space(handle, inode, start_lblk,
+						    end_lblk);
+		if (ret) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			goto out_handle;
+		}
 
-		ext4_es_insert_extent(inode, first_block, hole_len, ~0,
+		ext4_es_insert_extent(inode, start_lblk, hole_len, ~0,
 				      EXTENT_STATUS_HOLE);
 		up_write(&EXT4_I(inode)->i_data_sem);
 	}
-	ext4_fc_track_range(handle, inode, first_block, stop_block);
+	ext4_fc_track_range(handle, inode, start_lblk, end_lblk);
+
+	ret = ext4_mark_inode_dirty(handle, inode);
+	if (unlikely(ret))
+		goto out_handle;
+
+	ext4_update_inode_fsync_trans(handle, inode, 1);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
-
-	ret2 = ext4_mark_inode_dirty(handle, inode);
-	if (unlikely(ret2))
-		ret = ret2;
-	if (ret >= 0)
-		ext4_update_inode_fsync_trans(handle, inode, 1);
-out_stop:
+out_handle:
 	ext4_journal_stop(handle);
-out_dio:
+out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out_mutex:
+out:
 	inode_unlock(inode);
 	return ret;
 }
-- 
2.39.2



* [PATCH v2 06/10] ext4: refactor ext4_collapse_range()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (4 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 05/10] ext4: refactor ext4_punch_hole() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-20 16:35   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 07/10] ext4: refactor ext4_insert_range() Zhang Yi
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Simplify ext4_collapse_range() and make its code style match
ext4_zero_range() and ext4_punch_hole() by a) renaming variables, b)
dropping the redundant input parameter check and moving the remaining
checks under i_rwsem in preparation for later refactoring, and c)
renaming the three stale error labels.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 80 +++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 41 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2fb0c2e303c7..5c0b4d512531 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5265,43 +5265,35 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	struct address_space *mapping = inode->i_mapping;
-	ext4_lblk_t punch_start, punch_stop;
+	ext4_lblk_t start_lblk, end_lblk;
 	handle_t *handle;
 	unsigned int credits;
-	loff_t new_size, ioffset;
+	loff_t start, new_size;
 	int ret;
 
-	/*
-	 * We need to test this early because xfstests assumes that a
-	 * collapse range of (0, 1) will return EOPNOTSUPP if the file
-	 * system does not support collapse range.
-	 */
-	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-		return -EOPNOTSUPP;
+	trace_ext4_collapse_range(inode, offset, len);
 
-	/* Collapse range works only on fs cluster size aligned regions. */
-	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb)))
-		return -EINVAL;
+	inode_lock(inode);
 
-	trace_ext4_collapse_range(inode, offset, len);
+	/* Currently just for extent based files */
+	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
-	punch_start = offset >> EXT4_BLOCK_SIZE_BITS(sb);
-	punch_stop = (offset + len) >> EXT4_BLOCK_SIZE_BITS(sb);
+	/* Collapse range works only on fs cluster size aligned regions. */
+	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) {
+		ret = -EINVAL;
+		goto out;
+	}
 
-	inode_lock(inode);
 	/*
 	 * There is no need to overlap collapse range with EOF, in which case
 	 * it is effectively a truncate operation
 	 */
 	if (offset + len >= inode->i_size) {
 		ret = -EINVAL;
-		goto out_mutex;
-	}
-
-	/* Currently just for extent based files */
-	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
-		ret = -EOPNOTSUPP;
-		goto out_mutex;
+		goto out;
 	}
 
 	/* Wait for existing dio to complete */
@@ -5309,7 +5301,7 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 
 	ret = file_modified(file);
 	if (ret)
-		goto out_mutex;
+		goto out;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released from
@@ -5319,43 +5311,46 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 
 	ret = ext4_break_layouts(inode);
 	if (ret)
-		goto out_mmap;
+		goto out_invalidate_lock;
 
 	/*
 	 * Need to round down offset to be aligned with page size boundary
 	 * for page size > block size.
 	 */
-	ioffset = round_down(offset, PAGE_SIZE);
+	start = round_down(offset, PAGE_SIZE);
 	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(mapping, ioffset, LLONG_MAX);
+	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
 	if (ret)
-		goto out_mmap;
-	truncate_pagecache(inode, ioffset);
+		goto out_invalidate_lock;
+	truncate_pagecache(inode, start);
 
 	credits = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
-		goto out_mmap;
+		goto out_invalidate_lock;
 	}
 	ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
 
+	start_lblk = offset >> inode->i_blkbits;
+	end_lblk = (offset + len) >> inode->i_blkbits;
+
 	down_write(&EXT4_I(inode)->i_data_sem);
 	ext4_discard_preallocations(inode);
-	ext4_es_remove_extent(inode, punch_start, EXT_MAX_BLOCKS - punch_start);
+	ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk);
 
-	ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1);
+	ret = ext4_ext_remove_space(inode, start_lblk, end_lblk - 1);
 	if (ret) {
 		up_write(&EXT4_I(inode)->i_data_sem);
-		goto out_stop;
+		goto out_handle;
 	}
 	ext4_discard_preallocations(inode);
 
-	ret = ext4_ext_shift_extents(inode, handle, punch_stop,
-				     punch_stop - punch_start, SHIFT_LEFT);
+	ret = ext4_ext_shift_extents(inode, handle, end_lblk,
+				     end_lblk - start_lblk, SHIFT_LEFT);
 	if (ret) {
 		up_write(&EXT4_I(inode)->i_data_sem);
-		goto out_stop;
+		goto out_handle;
 	}
 
 	new_size = inode->i_size - len;
@@ -5363,16 +5358,19 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	EXT4_I(inode)->i_disksize = new_size;
 
 	up_write(&EXT4_I(inode)->i_data_sem);
-	if (IS_SYNC(inode))
-		ext4_handle_sync(handle);
 	ret = ext4_mark_inode_dirty(handle, inode);
+	if (ret)
+		goto out_handle;
+
 	ext4_update_inode_fsync_trans(handle, inode, 1);
+	if (IS_SYNC(inode))
+		ext4_handle_sync(handle);
 
-out_stop:
+out_handle:
 	ext4_journal_stop(handle);
-out_mmap:
+out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out_mutex:
+out:
 	inode_unlock(inode);
 	return ret;
 }
-- 
2.39.2


* [PATCH v2 07/10] ext4: refactor ext4_insert_range()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (5 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 06/10] ext4: refactor ext4_collapse_range() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-23  8:17   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 08/10] ext4: factor out ext4_do_fallocate() Zhang Yi
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Simplify ext4_insert_range() and make its code style consistent with
ext4_collapse_range() by a) renaming variables, b) dropping redundant
input parameter checks and moving the remaining ones under i_rwsem in
preparation for later refactoring, and c) renaming the three stale error
labels.
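A minimal userspace sketch, assuming a power-of-two cluster size, of the parameter checks and block conversion that ext4_insert_range() performs after this patch. The function names are hypothetical, and int64_t/uint32_t stand in for loff_t/ext4_lblk_t. Note that the size limit is tested as len > s_maxbytes - i_size, which cannot overflow because i_size never exceeds s_maxbytes, whereas i_size + len could.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Returns 0 or a negative errno, mirroring the kernel convention. */
static int sim_insert_range_check(int64_t offset, int64_t len,
				  int64_t i_size, int64_t s_maxbytes,
				  int64_t cluster_size)
{
	/* IS_ALIGNED(offset | len, cluster_size) for a power of two. */
	if ((offset | len) & (cluster_size - 1))
		return -EINVAL;
	/* Offset must be less than i_size. */
	if (offset >= i_size)
		return -EINVAL;
	/* Subtraction form: safe because 0 <= i_size <= s_maxbytes. */
	if (len > s_maxbytes - i_size)
		return -EFBIG;
	return 0;
}

/* Byte offset/length to logical block numbers (both are aligned). */
static void sim_insert_lblks(int64_t offset, int64_t len, unsigned int blkbits,
			     uint32_t *start_lblk, uint32_t *len_lblk)
{
	*start_lblk = (uint32_t)(offset >> blkbits);
	*len_lblk = (uint32_t)(len >> blkbits);
}
```

For example, with 4 KiB blocks (blkbits = 12), inserting 12288 bytes at offset 8192 gives start_lblk = 2 and len_lblk = 3.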

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 95 ++++++++++++++++++++++-------------------------
 1 file changed, 45 insertions(+), 50 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5c0b4d512531..a6c24c229cb4 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5391,45 +5391,37 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	handle_t *handle;
 	struct ext4_ext_path *path;
 	struct ext4_extent *extent;
-	ext4_lblk_t offset_lblk, len_lblk, ee_start_lblk = 0;
+	ext4_lblk_t start_lblk, len_lblk, ee_start_lblk = 0;
 	unsigned int credits, ee_len;
-	int ret = 0, depth, split_flag = 0;
-	loff_t ioffset;
-
-	/*
-	 * We need to test this early because xfstests assumes that an
-	 * insert range of (0, 1) will return EOPNOTSUPP if the file
-	 * system does not support insert range.
-	 */
-	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
-		return -EOPNOTSUPP;
-
-	/* Insert range works only on fs cluster size aligned regions. */
-	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb)))
-		return -EINVAL;
+	int ret, depth, split_flag = 0;
+	loff_t start;
 
 	trace_ext4_insert_range(inode, offset, len);
 
-	offset_lblk = offset >> EXT4_BLOCK_SIZE_BITS(sb);
-	len_lblk = len >> EXT4_BLOCK_SIZE_BITS(sb);
-
 	inode_lock(inode);
+
 	/* Currently just for extent based files */
 	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
 		ret = -EOPNOTSUPP;
-		goto out_mutex;
+		goto out;
 	}
 
-	/* Check whether the maximum file size would be exceeded */
-	if (len > inode->i_sb->s_maxbytes - inode->i_size) {
-		ret = -EFBIG;
-		goto out_mutex;
+	/* Insert range works only on fs cluster size aligned regions. */
+	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) {
+		ret = -EINVAL;
+		goto out;
 	}
 
 	/* Offset must be less than i_size */
 	if (offset >= inode->i_size) {
 		ret = -EINVAL;
-		goto out_mutex;
+		goto out;
+	}
+
+	/* Check whether the maximum file size would be exceeded */
+	if (len > inode->i_sb->s_maxbytes - inode->i_size) {
+		ret = -EFBIG;
+		goto out;
 	}
 
 	/* Wait for existing dio to complete */
@@ -5437,7 +5429,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 
 	ret = file_modified(file);
 	if (ret)
-		goto out_mutex;
+		goto out;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released from
@@ -5447,25 +5439,24 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 
 	ret = ext4_break_layouts(inode);
 	if (ret)
-		goto out_mmap;
+		goto out_invalidate_lock;
 
 	/*
 	 * Need to round down to align start offset to page size boundary
 	 * for page size > block size.
 	 */
-	ioffset = round_down(offset, PAGE_SIZE);
+	start = round_down(offset, PAGE_SIZE);
 	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
-			LLONG_MAX);
+	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
 	if (ret)
-		goto out_mmap;
-	truncate_pagecache(inode, ioffset);
+		goto out_invalidate_lock;
+	truncate_pagecache(inode, start);
 
 	credits = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
-		goto out_mmap;
+		goto out_invalidate_lock;
 	}
 	ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
 
@@ -5474,15 +5465,18 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	EXT4_I(inode)->i_disksize += len;
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (ret)
-		goto out_stop;
+		goto out_handle;
+
+	start_lblk = offset >> inode->i_blkbits;
+	len_lblk = len >> inode->i_blkbits;
 
 	down_write(&EXT4_I(inode)->i_data_sem);
 	ext4_discard_preallocations(inode);
 
-	path = ext4_find_extent(inode, offset_lblk, NULL, 0);
+	path = ext4_find_extent(inode, start_lblk, NULL, 0);
 	if (IS_ERR(path)) {
 		up_write(&EXT4_I(inode)->i_data_sem);
-		goto out_stop;
+		goto out_handle;
 	}
 
 	depth = ext_depth(inode);
@@ -5492,16 +5486,16 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 		ee_len = ext4_ext_get_actual_len(extent);
 
 		/*
-		 * If offset_lblk is not the starting block of extent, split
-		 * the extent @offset_lblk
+		 * If start_lblk is not the starting block of extent, split
+		 * the extent @start_lblk
 		 */
-		if ((offset_lblk > ee_start_lblk) &&
-				(offset_lblk < (ee_start_lblk + ee_len))) {
+		if ((start_lblk > ee_start_lblk) &&
+				(start_lblk < (ee_start_lblk + ee_len))) {
 			if (ext4_ext_is_unwritten(extent))
 				split_flag = EXT4_EXT_MARK_UNWRIT1 |
 					EXT4_EXT_MARK_UNWRIT2;
 			ret = ext4_split_extent_at(handle, inode, &path,
-					offset_lblk, split_flag,
+					start_lblk, split_flag,
 					EXT4_EX_NOCACHE |
 					EXT4_GET_BLOCKS_PRE_IO |
 					EXT4_GET_BLOCKS_METADATA_NOFAIL);
@@ -5510,32 +5504,33 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 		ext4_free_ext_path(path);
 		if (ret < 0) {
 			up_write(&EXT4_I(inode)->i_data_sem);
-			goto out_stop;
+			goto out_handle;
 		}
 	} else {
 		ext4_free_ext_path(path);
 	}
 
-	ext4_es_remove_extent(inode, offset_lblk, EXT_MAX_BLOCKS - offset_lblk);
+	ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk);
 
 	/*
-	 * if offset_lblk lies in a hole which is at start of file, use
+	 * if start_lblk lies in a hole which is at start of file, use
 	 * ee_start_lblk to shift extents
 	 */
 	ret = ext4_ext_shift_extents(inode, handle,
-		max(ee_start_lblk, offset_lblk), len_lblk, SHIFT_RIGHT);
-
+		max(ee_start_lblk, start_lblk), len_lblk, SHIFT_RIGHT);
 	up_write(&EXT4_I(inode)->i_data_sem);
+	if (ret)
+		goto out_handle;
+
+	ext4_update_inode_fsync_trans(handle, inode, 1);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
-	if (ret >= 0)
-		ext4_update_inode_fsync_trans(handle, inode, 1);
 
-out_stop:
+out_handle:
 	ext4_journal_stop(handle);
-out_mmap:
+out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out_mutex:
+out:
 	inode_unlock(inode);
 	return ret;
 }
-- 
2.39.2


* [PATCH v2 08/10] ext4: factor out ext4_do_fallocate()
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (6 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 07/10] ext4: refactor ext4_insert_range() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-23  8:20   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations Zhang Yi
  2024-09-04  6:29 ` [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate Zhang Yi
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

The actual work of a normal fallocate is currently open-coded in
ext4_fallocate(). Factor it out into a new helper, ext4_do_fallocate(),
as is already done for the other modes (e.g. ext4_zero_range()). This
makes the code clearer; no functional change.
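The resulting dispatch order in ext4_fallocate() can be modelled in userspace as below. This is a hypothetical sketch: the SIM_FL_* constants copy the values of the FALLOC_FL_* flags in <linux/falloc.h>, and the function only reports which handler the else-if chain would select, assuming the VFS has already rejected invalid flag combinations.

```c
#include <assert.h>

/* Values copied from the FALLOC_FL_* flags in <linux/falloc.h>. */
#define SIM_FL_KEEP_SIZE	0x01
#define SIM_FL_PUNCH_HOLE	0x02
#define SIM_FL_COLLAPSE_RANGE	0x08
#define SIM_FL_ZERO_RANGE	0x10
#define SIM_FL_INSERT_RANGE	0x20

/* Hypothetical identifiers for the five handlers. */
enum sim_handler {
	SIM_PUNCH_HOLE,
	SIM_COLLAPSE_RANGE,
	SIM_INSERT_RANGE,
	SIM_ZERO_RANGE,
	SIM_DO_FALLOCATE,
};

/* Same test order as the else-if chain in the patched ext4_fallocate(). */
static enum sim_handler sim_falloc_handler(int mode)
{
	if (mode & SIM_FL_PUNCH_HOLE)
		return SIM_PUNCH_HOLE;
	else if (mode & SIM_FL_COLLAPSE_RANGE)
		return SIM_COLLAPSE_RANGE;
	else if (mode & SIM_FL_INSERT_RANGE)
		return SIM_INSERT_RANGE;
	else if (mode & SIM_FL_ZERO_RANGE)
		return SIM_ZERO_RANGE;
	else
		return SIM_DO_FALLOCATE;	/* plain preallocation */
}
```

A mode of 0 or FALLOC_FL_KEEP_SIZE alone falls through to ext4_do_fallocate().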

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 125 ++++++++++++++++++++++------------------------
 1 file changed, 60 insertions(+), 65 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a6c24c229cb4..06b2c1190181 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4662,6 +4662,58 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	return ret;
 }
 
+static long ext4_do_fallocate(struct file *file, loff_t offset,
+			      loff_t len, int mode)
+{
+	struct inode *inode = file_inode(file);
+	loff_t end = offset + len;
+	loff_t new_size = 0;
+	ext4_lblk_t start_lblk, len_lblk;
+	int ret;
+
+	trace_ext4_fallocate_enter(inode, offset, len, mode);
+
+	start_lblk = offset >> inode->i_blkbits;
+	len_lblk = EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits);
+
+	inode_lock(inode);
+
+	/* We only support preallocation for extent-based files only. */
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+	    (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) {
+		new_size = end;
+		ret = inode_newsize_ok(inode, new_size);
+		if (ret)
+			goto out;
+	}
+
+	/* Wait all existing dio workers, newcomers will block on i_rwsem */
+	inode_dio_wait(inode);
+
+	ret = file_modified(file);
+	if (ret)
+		goto out;
+
+	ret = ext4_alloc_file_blocks(file, start_lblk, len_lblk, new_size,
+				     EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
+	if (ret)
+		goto out;
+
+	if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
+		ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
+					EXT4_I(inode)->i_sync_tid);
+	}
+out:
+	inode_unlock(inode);
+	trace_ext4_fallocate_exit(inode, offset, len_lblk, ret);
+	return ret;
+}
+
 /*
  * preallocate space for a file. This implements ext4's fallocate file
  * operation, which gets called from sys_fallocate system call.
@@ -4672,12 +4724,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 {
 	struct inode *inode = file_inode(file);
-	loff_t new_size = 0;
-	unsigned int max_blocks;
-	int ret = 0;
-	int flags;
-	ext4_lblk_t lblk;
-	unsigned int blkbits = inode->i_blkbits;
+	int ret;
 
 	/*
 	 * Encrypted inodes can't handle collapse range or insert
@@ -4699,71 +4746,19 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	ret = ext4_convert_inline_data(inode);
 	inode_unlock(inode);
 	if (ret)
-		goto exit;
+		return ret;
 
-	if (mode & FALLOC_FL_PUNCH_HOLE) {
+	if (mode & FALLOC_FL_PUNCH_HOLE)
 		ret = ext4_punch_hole(file, offset, len);
-		goto exit;
-	}
-
-	if (mode & FALLOC_FL_COLLAPSE_RANGE) {
+	else if (mode & FALLOC_FL_COLLAPSE_RANGE)
 		ret = ext4_collapse_range(file, offset, len);
-		goto exit;
-	}
-
-	if (mode & FALLOC_FL_INSERT_RANGE) {
+	else if (mode & FALLOC_FL_INSERT_RANGE)
 		ret = ext4_insert_range(file, offset, len);
-		goto exit;
-	}
-
-	if (mode & FALLOC_FL_ZERO_RANGE) {
+	else if (mode & FALLOC_FL_ZERO_RANGE)
 		ret = ext4_zero_range(file, offset, len, mode);
-		goto exit;
-	}
-	trace_ext4_fallocate_enter(inode, offset, len, mode);
-	lblk = offset >> blkbits;
-
-	max_blocks = EXT4_MAX_BLOCKS(len, offset, blkbits);
-	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
-
-	inode_lock(inode);
-
-	/*
-	 * We only support preallocation for extent-based files only
-	 */
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
-	    (offset + len > inode->i_size ||
-	     offset + len > EXT4_I(inode)->i_disksize)) {
-		new_size = offset + len;
-		ret = inode_newsize_ok(inode, new_size);
-		if (ret)
-			goto out;
-	}
-
-	/* Wait all existing dio workers, newcomers will block on i_rwsem */
-	inode_dio_wait(inode);
-
-	ret = file_modified(file);
-	if (ret)
-		goto out;
-
-	ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags);
-	if (ret)
-		goto out;
+	else
+		ret = ext4_do_fallocate(file, offset, len, mode);
 
-	if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
-		ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
-					EXT4_I(inode)->i_sync_tid);
-	}
-out:
-	inode_unlock(inode);
-	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
-exit:
 	return ret;
 }
 
-- 
2.39.2


* [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (7 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 08/10] ext4: factor out ext4_do_fallocate() Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-23  8:31   ` Jan Kara
  2024-09-04  6:29 ` [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate Zhang Yi
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

The beginnings of all five operations dispatched by ext4_fallocate()
(punch hole, zero range, insert range, collapse range and normal
fallocate) are now nearly identical: each takes i_rwsem and validates
its input parameters. Move the acquisition of i_rwsem into
ext4_fallocate() and factor out a common helper to check the input
parameters, which makes the code clearer.
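A userspace sketch of the combined checks in ext4_fallocate_check(), using a hypothetical sim_inode in place of struct inode and assert() in place of WARN_ON_ONCE(!inode_is_locked(inode)). It assumes a power-of-two cluster size. Holding i_rwsem across both the checks and the operation is what keeps i_size stable in between.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the FALLOC_FL_* flags in <linux/falloc.h>. */
#define SIM_FL_PUNCH_HOLE	0x02
#define SIM_FL_COLLAPSE_RANGE	0x08
#define SIM_FL_INSERT_RANGE	0x20

/* Hypothetical subset of the inode/superblock state the checks read. */
struct sim_inode {
	bool	locked;		/* models inode_is_locked() */
	bool	extents;	/* models EXT4_INODE_EXTENTS */
	int64_t	i_size;
	int64_t	s_maxbytes;
	int64_t	cluster_size;	/* power of two */
};

static int sim_fallocate_check(const struct sim_inode *inode, int mode,
			       int64_t offset, int64_t len)
{
	/* The caller now takes i_rwsem before calling this helper. */
	assert(inode->locked);

	/* Every mode except punch hole needs an extent-based file. */
	if (!(mode & SIM_FL_PUNCH_HOLE) && !inode->extents)
		return -EOPNOTSUPP;

	/* Insert and collapse range need cluster-aligned offset and len. */
	if ((mode & (SIM_FL_INSERT_RANGE | SIM_FL_COLLAPSE_RANGE)) &&
	    ((offset | len) & (inode->cluster_size - 1)))
		return -EINVAL;

	if (mode & SIM_FL_INSERT_RANGE) {
		/* Insert range: offset must be below i_size ... */
		if (offset >= inode->i_size)
			return -EINVAL;
		/* ... and the grown file must fit in s_maxbytes. */
		if (len > inode->s_maxbytes - inode->i_size)
			return -EFBIG;
	} else if (mode & SIM_FL_COLLAPSE_RANGE) {
		/* Collapse range reaching EOF would just be a truncate. */
		if (offset + len >= inode->i_size)
			return -EINVAL;
	}
	return 0;
}
```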

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 132 ++++++++++++++++++----------------------------
 fs/ext4/inode.c   |  13 ++---
 2 files changed, 56 insertions(+), 89 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 06b2c1190181..91e509201915 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4548,23 +4548,14 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	int ret, flags, credits;
 
 	trace_ext4_zero_range(inode, offset, len, mode);
-
-	inode_lock(inode);
-
-	/*
-	 * Indirect files do not support unwritten extents
-	 */
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
+	WARN_ON_ONCE(!inode_is_locked(inode));
 
 	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
 	    (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) {
 		new_size = end;
 		ret = inode_newsize_ok(inode, new_size);
 		if (ret)
-			goto out;
+			return ret;
 	}
 
 	/* Wait all existing dio workers, newcomers will block on i_rwsem */
@@ -4572,7 +4563,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 
 	ret = file_modified(file);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released
@@ -4657,8 +4648,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 	ext4_journal_stop(handle);
 out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out:
-	inode_unlock(inode);
 	return ret;
 }
 
@@ -4672,18 +4661,11 @@ static long ext4_do_fallocate(struct file *file, loff_t offset,
 	int ret;
 
 	trace_ext4_fallocate_enter(inode, offset, len, mode);
+	WARN_ON_ONCE(!inode_is_locked(inode));
 
 	start_lblk = offset >> inode->i_blkbits;
 	len_lblk = EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits);
 
-	inode_lock(inode);
-
-	/* We only support preallocation for extent-based files only. */
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
 	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
 	    (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) {
 		new_size = end;
@@ -4709,11 +4691,46 @@ static long ext4_do_fallocate(struct file *file, loff_t offset,
 					EXT4_I(inode)->i_sync_tid);
 	}
 out:
-	inode_unlock(inode);
 	trace_ext4_fallocate_exit(inode, offset, len_lblk, ret);
 	return ret;
 }
 
+static int ext4_fallocate_check(struct inode *inode, int mode,
+				loff_t offset, loff_t len)
+{
+	/* Currently, all modes except punch hole need extent-based files. */
+	if (!(mode & FALLOC_FL_PUNCH_HOLE) &&
+	    !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+		return -EOPNOTSUPP;
+
+	/*
+	 * Insert range and collapse range work only on fs cluster size
+	 * aligned regions.
+	 */
+	if (mode & (FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE) &&
+	    !IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(inode->i_sb)))
+		return -EINVAL;
+
+	if (mode & FALLOC_FL_INSERT_RANGE) {
+		/* Insert range, offset must be less than i_size */
+		if (offset >= inode->i_size)
+			return -EINVAL;
+		/* Check whether the maximum file size would be exceeded */
+		if (len > inode->i_sb->s_maxbytes - inode->i_size)
+			return -EFBIG;
+	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
+		/*
+		 * Collapse range, there is no need to overlap collapse
+		 * range with EOF, in which case it is effectively a
+		 * truncate operation.
+		 */
+		if (offset + len >= inode->i_size)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 /*
  * preallocate space for a file. This implements ext4's fallocate file
  * operation, which gets called from sys_fallocate system call.
@@ -4744,9 +4761,12 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 
 	inode_lock(inode);
 	ret = ext4_convert_inline_data(inode);
-	inode_unlock(inode);
 	if (ret)
-		return ret;
+		goto out;
+
+	ret = ext4_fallocate_check(inode, mode, offset, len);
+	if (ret)
+		goto out;
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
 		ret = ext4_punch_hole(file, offset, len);
@@ -4758,7 +4778,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		ret = ext4_zero_range(file, offset, len, mode);
 	else
 		ret = ext4_do_fallocate(file, offset, len, mode);
-
+out:
+	inode_unlock(inode);
 	return ret;
 }
 
@@ -5267,36 +5288,14 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	int ret;
 
 	trace_ext4_collapse_range(inode, offset, len);
-
-	inode_lock(inode);
-
-	/* Currently just for extent based files */
-	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	/* Collapse range works only on fs cluster size aligned regions. */
-	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/*
-	 * There is no need to overlap collapse range with EOF, in which case
-	 * it is effectively a truncate operation
-	 */
-	if (offset + len >= inode->i_size) {
-		ret = -EINVAL;
-		goto out;
-	}
+	WARN_ON_ONCE(!inode_is_locked(inode));
 
 	/* Wait for existing dio to complete */
 	inode_dio_wait(inode);
 
 	ret = file_modified(file);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released from
@@ -5365,8 +5364,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	ext4_journal_stop(handle);
 out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out:
-	inode_unlock(inode);
 	return ret;
 }
 
@@ -5392,39 +5389,14 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	loff_t start;
 
 	trace_ext4_insert_range(inode, offset, len);
-
-	inode_lock(inode);
-
-	/* Currently just for extent based files */
-	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	/* Insert range works only on fs cluster size aligned regions. */
-	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* Offset must be less than i_size */
-	if (offset >= inode->i_size) {
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* Check whether the maximum file size would be exceeded */
-	if (len > inode->i_sb->s_maxbytes - inode->i_size) {
-		ret = -EFBIG;
-		goto out;
-	}
+	WARN_ON_ONCE(!inode_is_locked(inode));
 
 	/* Wait for existing dio to complete */
 	inode_dio_wait(inode);
 
 	ret = file_modified(file);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released from
@@ -5525,8 +5497,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	ext4_journal_stop(handle);
 out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out:
-	inode_unlock(inode);
 	return ret;
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index dfaf9e9d6ad8..57636c656fa5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3923,15 +3923,14 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	unsigned long blocksize = i_blocksize(inode);
 	handle_t *handle;
 	unsigned int credits;
-	int ret = 0;
+	int ret;
 
 	trace_ext4_punch_hole(inode, offset, length, 0);
-
-	inode_lock(inode);
+	WARN_ON_ONCE(!inode_is_locked(inode));
 
 	/* No need to punch hole beyond i_size */
 	if (offset >= inode->i_size)
-		goto out;
+		return 0;
 
 	/*
 	 * If the hole extends beyond i_size, set the hole to end after
@@ -3951,7 +3950,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (offset & (blocksize - 1) || end & (blocksize - 1)) {
 		ret = ext4_inode_attach_jinode(inode);
 		if (ret < 0)
-			goto out;
+			return ret;
 	}
 
 	/* Wait all existing dio workers, newcomers will block on i_rwsem */
@@ -3959,7 +3958,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
 	ret = file_modified(file);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * Prevent page faults from reinstantiating pages we have released from
@@ -4036,8 +4035,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	ext4_journal_stop(handle);
 out_invalidate_lock:
 	filemap_invalidate_unlock(mapping);
-out:
-	inode_unlock(inode);
 	return ret;
 }
 
-- 
2.39.2


* [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate
  2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
                   ` (8 preceding siblings ...)
  2024-09-04  6:29 ` [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations Zhang Yi
@ 2024-09-04  6:29 ` Zhang Yi
  2024-09-23  8:54   ` Jan Kara
  9 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-04  6:29 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

The beginnings of the first four operations dispatched by
ext4_fallocate() (punch hole, zero range, insert range and collapse
range) are now nearly identical: each waits for in-flight direct I/O to
finish, takes the filemap invalidate lock, writes back dirty data and
finally drops the page cache. Factoring this out into a common helper
removes a lot of redundant code.
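A simplified userspace model of the locking contract of ext4_prepare_falloc(): on success it returns with the invalidate lock still held and the caller releases it; on failure it releases the lock itself. Everything here is hypothetical. Pages are modelled as whole 4 KiB slots, so the partial-page zeroing done by truncate_pagecache_range() is not represented, and file_modified() and ext4_break_layouts() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SIM_PAGE_SHIFT		12	/* 4 KiB pages */
#define SIM_NR_PAGES		16	/* size of the modelled file */
#define SIM_FL_COLLAPSE_RANGE	0x08	/* value from <linux/falloc.h> */
#define SIM_FL_INSERT_RANGE	0x20	/* value from <linux/falloc.h> */

struct sim_file {
	bool inode_locked;		/* i_rwsem, taken by the caller */
	bool invalidate_locked;		/* models the filemap invalidate lock */
	bool cached[SIM_NR_PAGES];	/* page present in the page cache */
	bool dirty[SIM_NR_PAGES];	/* page needs writeback */
	bool fail_writeback;		/* inject a writeback error */
};

/* start and end are inclusive byte offsets; end may lie beyond EOF. */
static int sim_prepare_falloc(struct sim_file *f, long long start,
			      long long end, int mode)
{
	long long first = start >> SIM_PAGE_SHIFT;
	long long last = end >> SIM_PAGE_SHIFT;

	assert(f->inode_locked);
	if (last >= SIM_NR_PAGES)
		last = SIM_NR_PAGES - 1;

	f->invalidate_locked = true;

	/* Write back dirty pages in [start, end]; unlock on failure. */
	if (f->fail_writeback) {
		f->invalidate_locked = false;
		return -EIO;
	}
	for (long long i = first; i <= last; i++)
		f->dirty[i] = false;

	/* Insert/collapse shift data after start: drop the cache to EOF. */
	if (mode & (SIM_FL_INSERT_RANGE | SIM_FL_COLLAPSE_RANGE))
		last = SIM_NR_PAGES - 1;
	for (long long i = first; i <= last; i++)
		f->cached[i] = false;

	return 0;	/* invalidate lock is still held */
}
```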

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/ext4.h    |   2 +
 fs/ext4/extents.c | 125 ++++++++++++++++++++--------------------------
 fs/ext4/inode.c   |  25 +---------
 3 files changed, 57 insertions(+), 95 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e8d7965f62c4..281fab9abc42 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3696,6 +3696,8 @@ extern int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
 				 ext4_lblk_t end);
 extern void ext4_ext_init(struct super_block *);
 extern void ext4_ext_release(struct super_block *);
+extern int ext4_prepare_falloc(struct file *file, loff_t start, loff_t end,
+			       int mode);
 extern long ext4_fallocate(struct file *file, int mode, loff_t offset,
 			  loff_t len);
 extern int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 91e509201915..eee63e92dcc6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4558,34 +4558,10 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 			return ret;
 	}
 
-	/* Wait all existing dio workers, newcomers will block on i_rwsem */
-	inode_dio_wait(inode);
-
-	ret = file_modified(file);
+	ret = ext4_prepare_falloc(file, offset, end - 1, FALLOC_FL_ZERO_RANGE);
 	if (ret)
 		return ret;
 
-	/*
-	 * Prevent page faults from reinstantiating pages we have released
-	 * from page cache.
-	 */
-	filemap_invalidate_lock(mapping);
-
-	ret = ext4_break_layouts(inode);
-	if (ret)
-		goto out_invalidate_lock;
-
-	/*
-	 * Write data that will be zeroed to preserve them when successfully
-	 * discarding page cache below but fail to convert extents.
-	 */
-	ret = filemap_write_and_wait_range(mapping, offset, end - 1);
-	if (ret)
-		goto out_invalidate_lock;
-
-	/* Now release the pages and zero block aligned part of pages */
-	truncate_pagecache_range(inode, offset, end - 1);
-
 	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
 	/* Preallocate the range including the unaligned edges */
 	if (offset & (blocksize - 1) || end & (blocksize - 1)) {
@@ -4731,6 +4707,52 @@ static int ext4_fallocate_check(struct inode *inode, int mode,
 	return 0;
 }
 
+int ext4_prepare_falloc(struct file *file, loff_t start, loff_t end, int mode)
+{
+	struct inode *inode = file_inode(file);
+	struct address_space *mapping = inode->i_mapping;
+	int ret;
+
+	/* Wait all existing dio workers, newcomers will block on i_rwsem */
+	inode_dio_wait(inode);
+	ret = file_modified(file);
+	if (ret)
+		return ret;
+
+	/*
+	 * Prevent page faults from reinstantiating pages we have released
+	 * from page cache.
+	 */
+	filemap_invalidate_lock(mapping);
+
+	ret = ext4_break_layouts(inode);
+	if (ret)
+		goto failed;
+
+	/*
+	 * Write back dirty data in the range to preserve it in case the page
+	 * cache is dropped below but the subsequent operation fails.
+	 */
+	ret = filemap_write_and_wait_range(mapping, start, end);
+	if (ret)
+		goto failed;
+
+	/*
+	 * For insert range and collapse range, COWed private pages should
+	 * be removed since the file's logical offset will be changed, but
+	 * punch hole and zero range don't.
+	 */
+	if (mode & (FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE))
+		truncate_pagecache(inode, start);
+	else
+		truncate_pagecache_range(inode, start, end);
+
+	return 0;
+failed:
+	filemap_invalidate_unlock(mapping);
+	return ret;
+}
+
 /*
  * preallocate space for a file. This implements ext4's fallocate file
  * operation, which gets called from sys_fallocate system call.
@@ -5284,39 +5306,20 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	ext4_lblk_t start_lblk, end_lblk;
 	handle_t *handle;
 	unsigned int credits;
-	loff_t start, new_size;
+	loff_t new_size;
 	int ret;
 
 	trace_ext4_collapse_range(inode, offset, len);
 	WARN_ON_ONCE(!inode_is_locked(inode));
 
-	/* Wait for existing dio to complete */
-	inode_dio_wait(inode);
-
-	ret = file_modified(file);
-	if (ret)
-		return ret;
-
-	/*
-	 * Prevent page faults from reinstantiating pages we have released from
-	 * page cache.
-	 */
-	filemap_invalidate_lock(mapping);
-
-	ret = ext4_break_layouts(inode);
-	if (ret)
-		goto out_invalidate_lock;
-
 	/*
 	 * Need to round down offset to be aligned with page size boundary
 	 * for page size > block size.
 	 */
-	start = round_down(offset, PAGE_SIZE);
-	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
+	ret = ext4_prepare_falloc(file, round_down(offset, PAGE_SIZE),
+				  LLONG_MAX, FALLOC_FL_COLLAPSE_RANGE);
 	if (ret)
-		goto out_invalidate_lock;
-	truncate_pagecache(inode, start);
+		return ret;
 
 	credits = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
@@ -5386,38 +5389,18 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	ext4_lblk_t start_lblk, len_lblk, ee_start_lblk = 0;
 	unsigned int credits, ee_len;
 	int ret, depth, split_flag = 0;
-	loff_t start;
 
 	trace_ext4_insert_range(inode, offset, len);
 	WARN_ON_ONCE(!inode_is_locked(inode));
 
-	/* Wait for existing dio to complete */
-	inode_dio_wait(inode);
-
-	ret = file_modified(file);
-	if (ret)
-		return ret;
-
-	/*
-	 * Prevent page faults from reinstantiating pages we have released from
-	 * page cache.
-	 */
-	filemap_invalidate_lock(mapping);
-
-	ret = ext4_break_layouts(inode);
-	if (ret)
-		goto out_invalidate_lock;
-
 	/*
 	 * Need to round down to align start offset to page size boundary
 	 * for page size > block size.
 	 */
-	start = round_down(offset, PAGE_SIZE);
-	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
+	ret = ext4_prepare_falloc(file, round_down(offset, PAGE_SIZE),
+				  LLONG_MAX, FALLOC_FL_INSERT_RANGE);
 	if (ret)
-		goto out_invalidate_lock;
-	truncate_pagecache(inode, start);
+		return ret;
 
 	credits = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 57636c656fa5..4b7f8fcaa5c2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3953,33 +3953,10 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 			return ret;
 	}
 
-	/* Wait all existing dio workers, newcomers will block on i_rwsem */
-	inode_dio_wait(inode);
-
-	ret = file_modified(file);
+	ret = ext4_prepare_falloc(file, offset, end - 1, FALLOC_FL_PUNCH_HOLE);
 	if (ret)
 		return ret;
 
-	/*
-	 * Prevent page faults from reinstantiating pages we have released from
-	 * page cache.
-	 */
-	filemap_invalidate_lock(mapping);
-
-	ret = ext4_break_layouts(inode);
-	if (ret)
-		goto out_invalidate_lock;
-
-	/* Write out all dirty pages to avoid race conditions */
-	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		ret = filemap_write_and_wait_range(mapping, offset, end - 1);
-		if (ret)
-			goto out_invalidate_lock;
-	}
-
-	/* Now release the pages and zero block aligned part of pages*/
-	truncate_pagecache_range(inode, offset, end - 1);
-
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
 		credits = ext4_writepage_trans_blocks(inode);
 	else
-- 
2.39.2



* Re: [PATCH v2 01/10] ext4: write out dirty data before dropping pages
  2024-09-04  6:29 ` [PATCH v2 01/10] ext4: write out dirty data before dropping pages Zhang Yi
@ 2024-09-17 16:50   ` Jan Kara
  2024-09-18 12:27     ` Zhang Yi
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2024-09-17 16:50 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:16, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Current zero range, punch hole and collapse range have a common
> potential data loss problem. In general, ext4_zero_range(),
> ext4_collapse_range() and ext4_punch_hole() discard all page cache
> of the operation range before converting the extents status. However,
> the first two functions don't write back dirty data before discarding
> page cache, and ext4_punch_hole() writes back at the very beginning
> without holding i_rwsem and the mapping invalidate lock. Hence, if
> something bad (e.g. EIO or ENOMEM) happens just after dropping dirty
> page cache, the operation will fail but the user's valid data in the
> dirty page cache will be lost. Fix this by writing all dirty data under
> i_rwsem and the mapping invalidate lock before discarding pages.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

I'm not sure this is the direction we want to go. When zeroing / collapsing
/ punching, writing out all the data we are going to remove seems suboptimal
and we can spend significant time doing work that is mostly unnecessary.
After all, with truncate we also drop pagecache pages and then do on-disk
modification which can fail.

The case of EIO is in my opinion OK - when there are disk errors, we are
going to lose data and e2fsck is needed. So protecting with writeout
against possible damage is pointless. For ENOMEM I agree we'd better
preserve filesystem consistency. Is there some case where we would leave
the filesystem inconsistent on ENOMEM?

								Honza

> ---
>  fs/ext4/extents.c | 77 +++++++++++++++++------------------------------
>  fs/ext4/inode.c   | 19 +++++-------
>  2 files changed, 36 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index e067f2dd0335..7d5edfa2e630 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4602,6 +4602,24 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	if (ret)
>  		goto out_mutex;
>  
> +	/*
> +	 * Prevent page faults from reinstantiating pages we have released
> +	 * from page cache.
> +	 */
> +	filemap_invalidate_lock(mapping);
> +
> +	ret = ext4_break_layouts(inode);
> +	if (ret)
> +		goto out_invalidate_lock;
> +
> +	/*
> +	 * Write data that will be zeroed to preserve them when successfully
> +	 * discarding page cache below but fail to convert extents.
> +	 */
> +	ret = filemap_write_and_wait_range(mapping, start, end - 1);
> +	if (ret)
> +		goto out_invalidate_lock;
> +
>  	/* Preallocate the range including the unaligned edges */
>  	if (partial_begin || partial_end) {
>  		ret = ext4_alloc_file_blocks(file,
> @@ -4610,7 +4628,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  				 round_down(offset, 1 << blkbits)) >> blkbits,
>  				new_size, flags);
>  		if (ret)
> -			goto out_mutex;
> +			goto out_invalidate_lock;
>  
>  	}
>  
> @@ -4619,37 +4637,9 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
>  			  EXT4_EX_NOCACHE);
>  
> -		/*
> -		 * Prevent page faults from reinstantiating pages we have
> -		 * released from page cache.
> -		 */
> -		filemap_invalidate_lock(mapping);
> -
> -		ret = ext4_break_layouts(inode);
> -		if (ret) {
> -			filemap_invalidate_unlock(mapping);
> -			goto out_mutex;
> -		}
> -
>  		ret = ext4_update_disksize_before_punch(inode, offset, len);
> -		if (ret) {
> -			filemap_invalidate_unlock(mapping);
> -			goto out_mutex;
> -		}
> -
> -		/*
> -		 * For journalled data we need to write (and checkpoint) pages
> -		 * before discarding page cache to avoid inconsitent data on
> -		 * disk in case of crash before zeroing trans is committed.
> -		 */
> -		if (ext4_should_journal_data(inode)) {
> -			ret = filemap_write_and_wait_range(mapping, start,
> -							   end - 1);
> -			if (ret) {
> -				filemap_invalidate_unlock(mapping);
> -				goto out_mutex;
> -			}
> -		}
> +		if (ret)
> +			goto out_invalidate_lock;
>  
>  		/* Now release the pages and zero block aligned part of pages */
>  		truncate_pagecache_range(inode, start, end - 1);
> @@ -4657,12 +4647,11 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  
>  		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
>  					     flags);
> -		filemap_invalidate_unlock(mapping);
>  		if (ret)
> -			goto out_mutex;
> +			goto out_invalidate_lock;
>  	}
>  	if (!partial_begin && !partial_end)
> -		goto out_mutex;
> +		goto out_invalidate_lock;
>  
>  	/*
>  	 * In worst case we have to writeout two nonadjacent unwritten
> @@ -4675,7 +4664,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	if (IS_ERR(handle)) {
>  		ret = PTR_ERR(handle);
>  		ext4_std_error(inode->i_sb, ret);
> -		goto out_mutex;
> +		goto out_invalidate_lock;
>  	}
>  
>  	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
> @@ -4694,6 +4683,8 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  
>  out_handle:
>  	ext4_journal_stop(handle);
> +out_invalidate_lock:
> +	filemap_invalidate_unlock(mapping);
>  out_mutex:
>  	inode_unlock(inode);
>  	return ret;
> @@ -5363,20 +5354,8 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
>  	 * for page size > block size.
>  	 */
>  	ioffset = round_down(offset, PAGE_SIZE);
> -	/*
> -	 * Write tail of the last page before removed range since it will get
> -	 * removed from the page cache below.
> -	 */
> -	ret = filemap_write_and_wait_range(mapping, ioffset, offset);
> -	if (ret)
> -		goto out_mmap;
> -	/*
> -	 * Write data that will be shifted to preserve them when discarding
> -	 * page cache below. We are also protected from pages becoming dirty
> -	 * by i_rwsem and invalidate_lock.
> -	 */
> -	ret = filemap_write_and_wait_range(mapping, offset + len,
> -					   LLONG_MAX);
> +	/* Write out all dirty pages */
> +	ret = filemap_write_and_wait_range(mapping, ioffset, LLONG_MAX);
>  	if (ret)
>  		goto out_mmap;
>  	truncate_pagecache(inode, ioffset);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 941c1c0d5c6e..c3d7606a5315 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3957,17 +3957,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  
>  	trace_ext4_punch_hole(inode, offset, length, 0);
>  
> -	/*
> -	 * Write out all dirty pages to avoid race conditions
> -	 * Then release them.
> -	 */
> -	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> -		ret = filemap_write_and_wait_range(mapping, offset,
> -						   offset + length - 1);
> -		if (ret)
> -			return ret;
> -	}
> -
>  	inode_lock(inode);
>  
>  	/* No need to punch hole beyond i_size */
> @@ -4021,6 +4010,14 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	if (ret)
>  		goto out_dio;
>  
> +	/* Write out all dirty pages to avoid race conditions */
> +	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> +		ret = filemap_write_and_wait_range(mapping, offset,
> +						   offset + length - 1);
> +		if (ret)
> +			goto out_dio;
> +	}
> +
>  	first_block_offset = round_up(offset, sb->s_blocksize);
>  	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
>  
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 01/10] ext4: write out dirty data before dropping pages
  2024-09-17 16:50   ` Jan Kara
@ 2024-09-18 12:27     ` Zhang Yi
  0 siblings, 0 replies; 28+ messages in thread
From: Zhang Yi @ 2024-09-18 12:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/9/18 0:50, Jan Kara wrote:
> On Wed 04-09-24 14:29:16, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Current zero range, punch hole and collapse range have a common
>> potential data loss problem. In general, ext4_zero_range(),
>> ext4_collapse_range() and ext4_punch_hole() discard all page cache
>> of the operation range before converting the extents status. However,
>> the first two functions don't write back dirty data before discarding
>> page cache, and ext4_punch_hole() writes back at the very beginning
>> without holding i_rwsem and the mapping invalidate lock. Hence, if
>> something bad (e.g. EIO or ENOMEM) happens just after dropping dirty
>> page cache, the operation will fail but the user's valid data in the
>> dirty page cache will be lost. Fix this by writing all dirty data under
>> i_rwsem and the mapping invalidate lock before discarding pages.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> I'm not sure this is the direction we want to go. When zeroing / collapsing
> / punching, writing out all the data we are going to remove seems suboptimal
> and we can spend significant time doing work that is mostly unnecessary.

Yes, I agree with you that it does bring some performance cost and does
not seem to be the best solution, but at the moment I can't find a
simpler and better one.

I've also checked some other modern disk filesystems. IIUC, each of them
behaves differently for these three operations: bcachefs only writes back
before dropping the pagecache when collapsing, f2fs writes back when
zeroing range and collapsing, btrfs writes back when punching and zeroing
(it doesn't support collapse), and xfs writes back for all three
operations. So it seems that only btrfs and xfs would survive this case
now.

> After all, with truncate we also drop pagecache pages and then do on-disk
> modification which can fail.

Yeah, right, truncate may have the same problem too, and so do all four
of the other filesystems above.

> The case of EIO is in my opinion OK - when there are disk errors, we are
> going to lose data and e2fsck is needed. So protecting with writeout
> against possible damage is pointless.

Yeah, please forgive me, that was not a good example.

> For ENOMEM I agree we'd better
> preserve filesystem consistency. Is there some case where we would leave
> the filesystem inconsistent on ENOMEM?

The ENOMEM case seldom happens on our products, so it hasn't triggered
any real problem so far; I found it while refactoring these fallocate
functions. Theoretically, I believe it is a problem, but based on the
current implementations of other filesystems, I'm not sure whether we
really need to care about it. Maybe xfs and btrfs write back because they
have more opportunities to fail after dropping the pagecache when
punching/zeroing/(collapsing), so they have to write the data back?

Thanks,
Yi.




* Re: [PATCH v2 02/10] ext4: don't explicit update times in ext4_fallocate()
  2024-09-04  6:29 ` [PATCH v2 02/10] ext4: don't explicit update times in ext4_fallocate() Zhang Yi
@ 2024-09-20 16:04   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-20 16:04 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:17, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> After commit ad5cd4f4ee4d ("ext4: fix fallocate to use file_modified to
> update permissions consistently"), we can update mtime and ctime
> appropriately through file_modified() when doing zero range, collapse
> range, insert range and punch hole, hence there is no need to explicitly
> update times in those paths; just drop them.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Good point! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/extents.c | 4 ----
>  fs/ext4/inode.c   | 1 -
>  2 files changed, 5 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 7d5edfa2e630..19a9b14935b7 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4643,7 +4643,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  
>  		/* Now release the pages and zero block aligned part of pages */
>  		truncate_pagecache_range(inode, start, end - 1);
> -		inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
>  
>  		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
>  					     flags);
> @@ -4667,7 +4666,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  		goto out_invalidate_lock;
>  	}
>  
> -	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
>  	if (new_size)
>  		ext4_update_inode_size(inode, new_size);
>  	ret = ext4_mark_inode_dirty(handle, inode);
> @@ -5393,7 +5391,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
>  	up_write(&EXT4_I(inode)->i_data_sem);
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
> -	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
>  	ret = ext4_mark_inode_dirty(handle, inode);
>  	ext4_update_inode_fsync_trans(handle, inode, 1);
>  
> @@ -5503,7 +5500,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  	/* Expand file to avoid data loss if there is error while shifting */
>  	inode->i_size += len;
>  	EXT4_I(inode)->i_disksize += len;
> -	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
>  	ret = ext4_mark_inode_dirty(handle, inode);
>  	if (ret)
>  		goto out_stop;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c3d7606a5315..8af25442d44d 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4074,7 +4074,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
>  
> -	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
>  	ret2 = ext4_mark_inode_dirty(handle, inode);
>  	if (unlikely(ret2))
>  		ret = ret2;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch()
  2024-09-04  6:29 ` [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch() Zhang Yi
@ 2024-09-20 16:13   ` Jan Kara
  2024-09-24  7:43     ` Zhang Yi
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2024-09-20 16:13 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:18, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Since we always write back dirty data before zeroing range and punching
> hole, the disksize of a delalloc-extended file should be updated
> properly when writing back pages, hence we don't need to update the
> file's disksize before discarding page cache in ext4_zero_range() and
> ext4_punch_hole(); just drop ext4_update_disksize_before_punch().
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

So if we don't write out before hole punching & company, this needs to stay
in some shape or form.

								Honza

> ---
>  fs/ext4/ext4.h    |  3 ---
>  fs/ext4/extents.c |  4 ----
>  fs/ext4/inode.c   | 37 +------------------------------------
>  3 files changed, 1 insertion(+), 43 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 08acd152261e..e8d7965f62c4 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3414,9 +3414,6 @@ static inline int ext4_update_inode_size(struct inode *inode, loff_t newsize)
>  	return changed;
>  }
>  
> -int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
> -				      loff_t len);
> -
>  struct ext4_group_info {
>  	unsigned long   bb_state;
>  #ifdef AGGRESSIVE_CHECK
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 19a9b14935b7..d9fccf2970e9 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4637,10 +4637,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
>  			  EXT4_EX_NOCACHE);
>  
> -		ret = ext4_update_disksize_before_punch(inode, offset, len);
> -		if (ret)
> -			goto out_invalidate_lock;
> -
>  		/* Now release the pages and zero block aligned part of pages */
>  		truncate_pagecache_range(inode, start, end - 1);
>  
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 8af25442d44d..9343ce9f2b01 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3872,37 +3872,6 @@ int ext4_can_truncate(struct inode *inode)
>  	return 0;
>  }
>  
> -/*
> - * We have to make sure i_disksize gets properly updated before we truncate
> - * page cache due to hole punching or zero range. Otherwise i_disksize update
> - * can get lost as it may have been postponed to submission of writeback but
> - * that will never happen after we truncate page cache.
> - */
> -int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
> -				      loff_t len)
> -{
> -	handle_t *handle;
> -	int ret;
> -
> -	loff_t size = i_size_read(inode);
> -
> -	WARN_ON(!inode_is_locked(inode));
> -	if (offset > size || offset + len < size)
> -		return 0;
> -
> -	if (EXT4_I(inode)->i_disksize >= size)
> -		return 0;
> -
> -	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
> -	if (IS_ERR(handle))
> -		return PTR_ERR(handle);
> -	ext4_update_i_disksize(inode, size);
> -	ret = ext4_mark_inode_dirty(handle, inode);
> -	ext4_journal_stop(handle);
> -
> -	return ret;
> -}
> -
>  static void ext4_wait_dax_page(struct inode *inode)
>  {
>  	filemap_invalidate_unlock(inode->i_mapping);
> @@ -4022,13 +3991,9 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
>  
>  	/* Now release the pages and zero block aligned part of pages*/
> -	if (last_block_offset > first_block_offset) {
> -		ret = ext4_update_disksize_before_punch(inode, offset, length);
> -		if (ret)
> -			goto out_dio;
> +	if (last_block_offset > first_block_offset)
>  		truncate_pagecache_range(inode, first_block_offset,
>  					 last_block_offset);
> -	}
>  
>  	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>  		credits = ext4_writepage_trans_blocks(inode);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 04/10] ext4: refactor ext4_zero_range()
  2024-09-04  6:29 ` [PATCH v2 04/10] ext4: refactor ext4_zero_range() Zhang Yi
@ 2024-09-20 16:24   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-20 16:24 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:19, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Current ext4_zero_range() is full of complex position calculation and
> stale error out tags. In order to clean up the code and make things
> clear, refactor it by a) simplifying and renaming variables, b) removing
> some unnecessary position calculations, always writing back dirty data
> and dropping cache from offset to end instead of only writing back
> aligned blocks, and c) renaming the stale out_mutex tag.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/extents.c | 96 ++++++++++++++++++-----------------------------
>  1 file changed, 37 insertions(+), 59 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index d9fccf2970e9..2fb0c2e303c7 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4540,40 +4540,15 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	struct inode *inode = file_inode(file);
>  	struct address_space *mapping = file->f_mapping;
>  	handle_t *handle = NULL;
> -	unsigned int max_blocks;
>  	loff_t new_size = 0;
> -	int ret = 0;
> -	int flags;
> -	int credits;
> -	int partial_begin, partial_end;
> -	loff_t start, end;
> -	ext4_lblk_t lblk;
> +	loff_t end = offset + len;
> +	ext4_lblk_t start_lblk, end_lblk;
> +	unsigned int blocksize = i_blocksize(inode);
>  	unsigned int blkbits = inode->i_blkbits;
> +	int ret, flags, credits;
>  
>  	trace_ext4_zero_range(inode, offset, len, mode);
>  
> -	/*
> -	 * Round up offset. This is not fallocate, we need to zero out
> -	 * blocks, so convert interior block aligned part of the range to
> -	 * unwritten and possibly manually zero out unaligned parts of the
> -	 * range. Here, start and partial_begin are inclusive, end and
> -	 * partial_end are exclusive.
> -	 */
> -	start = round_up(offset, 1 << blkbits);
> -	end = round_down((offset + len), 1 << blkbits);
> -
> -	if (start < offset || end > offset + len)
> -		return -EINVAL;
> -	partial_begin = offset & ((1 << blkbits) - 1);
> -	partial_end = (offset + len) & ((1 << blkbits) - 1);
> -
> -	lblk = start >> blkbits;
> -	max_blocks = (end >> blkbits);
> -	if (max_blocks < lblk)
> -		max_blocks = 0;
> -	else
> -		max_blocks -= lblk;
> -
>  	inode_lock(inode);
>  
>  	/*
> @@ -4581,26 +4556,23 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	 */
>  	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
>  		ret = -EOPNOTSUPP;
> -		goto out_mutex;
> +		goto out;
>  	}
>  
>  	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
> -	    (offset + len > inode->i_size ||
> -	     offset + len > EXT4_I(inode)->i_disksize)) {
> -		new_size = offset + len;
> +	    (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) {
> +		new_size = end;
>  		ret = inode_newsize_ok(inode, new_size);
>  		if (ret)
> -			goto out_mutex;
> +			goto out;
>  	}
>  
> -	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
> -
>  	/* Wait all existing dio workers, newcomers will block on i_rwsem */
>  	inode_dio_wait(inode);
>  
>  	ret = file_modified(file);
>  	if (ret)
> -		goto out_mutex;
> +		goto out;
>  
>  	/*
>  	 * Prevent page faults from reinstantiating pages we have released
> @@ -4616,36 +4588,40 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	 * Write data that will be zeroed to preserve them when successfully
>  	 * discarding page cache below but fail to convert extents.
>  	 */
> -	ret = filemap_write_and_wait_range(mapping, start, end - 1);
> +	ret = filemap_write_and_wait_range(mapping, offset, end - 1);
>  	if (ret)
>  		goto out_invalidate_lock;
>  
> +	/* Now release the pages and zero block aligned part of pages */
> +	truncate_pagecache_range(inode, offset, end - 1);
> +
> +	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
>  	/* Preallocate the range including the unaligned edges */
> -	if (partial_begin || partial_end) {
> -		ret = ext4_alloc_file_blocks(file,
> -				round_down(offset, 1 << blkbits) >> blkbits,
> -				(round_up((offset + len), 1 << blkbits) -
> -				 round_down(offset, 1 << blkbits)) >> blkbits,
> -				new_size, flags);
> +	if (offset & (blocksize - 1) || end & (blocksize - 1)) {
> +		ext4_lblk_t alloc_lblk = offset >> blkbits;
> +		ext4_lblk_t len_lblk = EXT4_MAX_BLOCKS(len, offset, blkbits);
> +
> +		ret = ext4_alloc_file_blocks(file, alloc_lblk, len_lblk,
> +					     new_size, flags);
>  		if (ret)
>  			goto out_invalidate_lock;
>  
>  	}
>  
>  	/* Zero range excluding the unaligned edges */
> -	if (max_blocks > 0) {
> -		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
> -			  EXT4_EX_NOCACHE);
> -
> -		/* Now release the pages and zero block aligned part of pages */
> -		truncate_pagecache_range(inode, start, end - 1);
> -
> -		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
> -					     flags);
> +	start_lblk = round_up(offset, blocksize) >> blkbits;
> +	end_lblk = end >> blkbits;
> +	if (end_lblk > start_lblk) {
> +		ext4_lblk_t zero_blks = end_lblk - start_lblk;
> +
> +		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | EXT4_EX_NOCACHE);
> +		ret = ext4_alloc_file_blocks(file, start_lblk, zero_blks,
> +					     new_size, flags);
>  		if (ret)
>  			goto out_invalidate_lock;
>  	}
> -	if (!partial_begin && !partial_end)
> +	/* Finish zeroing out if it doesn't contain partial block */
> +	if (!(offset & (blocksize - 1)) && !(end & (blocksize - 1)))
>  		goto out_invalidate_lock;
>  
>  	/*
> @@ -4662,16 +4638,18 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  		goto out_invalidate_lock;
>  	}
>  
> +	/* Zero out partial block at the edges of the range */
> +	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
> +	if (ret)
> +		goto out_handle;
> +
>  	if (new_size)
>  		ext4_update_inode_size(inode, new_size);
>  	ret = ext4_mark_inode_dirty(handle, inode);
>  	if (unlikely(ret))
>  		goto out_handle;
> -	/* Zero out partial block at the edges of the range */
> -	ret = ext4_zero_partial_blocks(handle, inode, offset, len);
> -	if (ret >= 0)
> -		ext4_update_inode_fsync_trans(handle, inode, 1);
>  
> +	ext4_update_inode_fsync_trans(handle, inode, 1);
>  	if (file->f_flags & O_SYNC)
>  		ext4_handle_sync(handle);
>  
> @@ -4679,7 +4657,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	ext4_journal_stop(handle);
>  out_invalidate_lock:
>  	filemap_invalidate_unlock(mapping);
> -out_mutex:
> +out:
>  	inode_unlock(inode);
>  	return ret;
>  }
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
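
The ext4_zero_range() refactor quoted above splits [offset, offset + len) into two parts. The first is an unwritten preallocation that covers every block the range touches. The second is an interior run of fully covered blocks, which is converted and zeroed. The following is a minimal user-space sketch of that arithmetic, not kernel code. `plan_zero_range()` and `struct zero_range_plan` are hypothetical names, and `loff_t`/`ext4_lblk_t` are modelled as `int64_t`/`uint32_t`. The preallocation length is meant to reproduce the arithmetic we understand `EXT4_MAX_BLOCKS()` to perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the range split in ext4_zero_range(). */
struct zero_range_plan {
	bool partial;        /* offset or end is not block aligned */
	uint32_t alloc_lblk; /* first block of the unwritten preallocation */
	uint32_t alloc_len;  /* number of blocks touched by [offset, end) */
	uint32_t start_lblk; /* first block fully inside the range */
	uint32_t zero_blks;  /* fully covered blocks to convert, may be 0 */
};

static struct zero_range_plan plan_zero_range(int64_t offset, int64_t len,
					      unsigned int blkbits)
{
	const int64_t blocksize = (int64_t)1 << blkbits;
	const int64_t end = offset + len;
	struct zero_range_plan p;
	uint32_t end_lblk;

	assert(offset >= 0 && len > 0 && blkbits < 32);

	p.partial = (offset & (blocksize - 1)) || (end & (blocksize - 1));

	/* Round end up to a block boundary and subtract offset's block index
	 * (assumed to match EXT4_MAX_BLOCKS(len, offset, blkbits)). */
	p.alloc_lblk = (uint32_t)(offset >> blkbits);
	p.alloc_len = (uint32_t)(((end + blocksize - 1) >> blkbits) -
				 (offset >> blkbits));

	/* round_up(offset, blocksize) >> blkbits and end >> blkbits */
	p.start_lblk = (uint32_t)((offset + blocksize - 1) >> blkbits);
	end_lblk = (uint32_t)(end >> blkbits);
	p.zero_blks = end_lblk > p.start_lblk ? end_lblk - p.start_lblk : 0;
	return p;
}
```

With 4 KiB blocks, zeroing 10000 bytes at offset 1000 touches blocks 0..2 (`alloc_len == 3`) but fully covers only block 1. The unaligned head and tail are therefore left to ext4_zero_partial_blocks().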

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 05/10] ext4: refactor ext4_punch_hole()
  2024-09-04  6:29 ` [PATCH v2 05/10] ext4: refactor ext4_punch_hole() Zhang Yi
@ 2024-09-20 16:31   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-20 16:31 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:20, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> The current ext4_punch_hole() is full of complex position calculations
> and stale error-out labels. To clean up the code and make things
> clearer, refactor it by: a) simplifying and renaming variables to match
> the style of ext4_zero_range(); b) removing some unnecessary position
> calculations and always writing back dirty data and dropping the page
> cache from offset to end, instead of only the block-aligned part; c)
> renaming the three stale error labels.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 114 ++++++++++++++++++++++--------------------------
>  1 file changed, 51 insertions(+), 63 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9343ce9f2b01..dfaf9e9d6ad8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3916,13 +3916,14 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  {
>  	struct inode *inode = file_inode(file);
>  	struct super_block *sb = inode->i_sb;
> -	ext4_lblk_t first_block, stop_block;
> +	ext4_lblk_t start_lblk, end_lblk;
>  	struct address_space *mapping = inode->i_mapping;
> -	loff_t first_block_offset, last_block_offset, max_length;
> -	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> +	loff_t max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;
> +	loff_t end = offset + length;
> +	unsigned long blocksize = i_blocksize(inode);
>  	handle_t *handle;
>  	unsigned int credits;
> -	int ret = 0, ret2 = 0;
> +	int ret = 0;
>  
>  	trace_ext4_punch_hole(inode, offset, length, 0);
>  
> @@ -3930,36 +3931,27 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  
>  	/* No need to punch hole beyond i_size */
>  	if (offset >= inode->i_size)
> -		goto out_mutex;
> +		goto out;
>  
>  	/*
> -	 * If the hole extends beyond i_size, set the hole
> -	 * to end after the page that contains i_size
> +	 * If the hole extends beyond i_size, set the hole to end after
> +	 * the page that contains i_size, and also make sure that the hole
> +	 * ends within one block before the last range.
>  	 */
> -	if (offset + length > inode->i_size) {
> -		length = inode->i_size +
> -		   PAGE_SIZE - (inode->i_size & (PAGE_SIZE - 1)) -
> -		   offset;
> -	}
> +	if (end > inode->i_size)
> +		end = round_up(inode->i_size, PAGE_SIZE);
> +	if (end > max_end)
> +		end = max_end;
> +	length = end - offset;
>  
>  	/*
> -	 * For punch hole the length + offset needs to be within one block
> -	 * before last range. Adjust the length if it goes beyond that limit.
> +	 * Attach jinode to inode for jbd2 if we do any zeroing of partial
> +	 * block.
>  	 */
> -	max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize;
> -	if (offset + length > max_length)
> -		length = max_length - offset;
> -
> -	if (offset & (sb->s_blocksize - 1) ||
> -	    (offset + length) & (sb->s_blocksize - 1)) {
> -		/*
> -		 * Attach jinode to inode for jbd2 if we do any zeroing of
> -		 * partial block
> -		 */
> +	if (offset & (blocksize - 1) || end & (blocksize - 1)) {
>  		ret = ext4_inode_attach_jinode(inode);
>  		if (ret < 0)
> -			goto out_mutex;
> -
> +			goto out;
>  	}
>  
>  	/* Wait all existing dio workers, newcomers will block on i_rwsem */
> @@ -3967,7 +3959,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  
>  	ret = file_modified(file);
>  	if (ret)
> -		goto out_mutex;
> +		goto out;
>  
>  	/*
>  	 * Prevent page faults from reinstantiating pages we have released from
> @@ -3977,23 +3969,17 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  
>  	ret = ext4_break_layouts(inode);
>  	if (ret)
> -		goto out_dio;
> +		goto out_invalidate_lock;
>  
>  	/* Write out all dirty pages to avoid race conditions */
>  	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> -		ret = filemap_write_and_wait_range(mapping, offset,
> -						   offset + length - 1);
> +		ret = filemap_write_and_wait_range(mapping, offset, end - 1);
>  		if (ret)
> -			goto out_dio;
> +			goto out_invalidate_lock;
>  	}
>  
> -	first_block_offset = round_up(offset, sb->s_blocksize);
> -	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
> -
>  	/* Now release the pages and zero block aligned part of pages*/
> -	if (last_block_offset > first_block_offset)
> -		truncate_pagecache_range(inode, first_block_offset,
> -					 last_block_offset);
> +	truncate_pagecache_range(inode, offset, end - 1);
>  
>  	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>  		credits = ext4_writepage_trans_blocks(inode);
> @@ -4003,52 +3989,54 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>  	if (IS_ERR(handle)) {
>  		ret = PTR_ERR(handle);
>  		ext4_std_error(sb, ret);
> -		goto out_dio;
> +		goto out_invalidate_lock;
>  	}
>  
> -	ret = ext4_zero_partial_blocks(handle, inode, offset,
> -				       length);
> +	ret = ext4_zero_partial_blocks(handle, inode, offset, length);
>  	if (ret)
> -		goto out_stop;
> -
> -	first_block = (offset + sb->s_blocksize - 1) >>
> -		EXT4_BLOCK_SIZE_BITS(sb);
> -	stop_block = (offset + length) >> EXT4_BLOCK_SIZE_BITS(sb);
> +		goto out_handle;
>  
>  	/* If there are blocks to remove, do it */
> -	if (stop_block > first_block) {
> -		ext4_lblk_t hole_len = stop_block - first_block;
> +	start_lblk = round_up(offset, blocksize) >> inode->i_blkbits;
> +	end_lblk = end >> inode->i_blkbits;
> +
> +	if (end_lblk > start_lblk) {
> +		ext4_lblk_t hole_len = end_lblk - start_lblk;
>  
>  		down_write(&EXT4_I(inode)->i_data_sem);
>  		ext4_discard_preallocations(inode);
>  
> -		ext4_es_remove_extent(inode, first_block, hole_len);
> +		ext4_es_remove_extent(inode, start_lblk, hole_len);
>  
>  		if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> -			ret = ext4_ext_remove_space(inode, first_block,
> -						    stop_block - 1);
> +			ret = ext4_ext_remove_space(inode, start_lblk,
> +						    end_lblk - 1);
>  		else
> -			ret = ext4_ind_remove_space(handle, inode, first_block,
> -						    stop_block);
> +			ret = ext4_ind_remove_space(handle, inode, start_lblk,
> +						    end_lblk);
> +		if (ret) {
> +			up_write(&EXT4_I(inode)->i_data_sem);
> +			goto out_handle;
> +		}
>  
> -		ext4_es_insert_extent(inode, first_block, hole_len, ~0,
> +		ext4_es_insert_extent(inode, start_lblk, hole_len, ~0,
>  				      EXTENT_STATUS_HOLE);
>  		up_write(&EXT4_I(inode)->i_data_sem);
>  	}
> -	ext4_fc_track_range(handle, inode, first_block, stop_block);
> +	ext4_fc_track_range(handle, inode, start_lblk, end_lblk);
> +
> +	ret = ext4_mark_inode_dirty(handle, inode);
> +	if (unlikely(ret))
> +		goto out_handle;
> +
> +	ext4_update_inode_fsync_trans(handle, inode, 1);
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
> -
> -	ret2 = ext4_mark_inode_dirty(handle, inode);
> -	if (unlikely(ret2))
> -		ret = ret2;
> -	if (ret >= 0)
> -		ext4_update_inode_fsync_trans(handle, inode, 1);
> -out_stop:
> +out_handle:
>  	ext4_journal_stop(handle);
> -out_dio:
> +out_invalidate_lock:
>  	filemap_invalidate_unlock(mapping);
> -out_mutex:
> +out:
>  	inode_unlock(inode);
>  	return ret;
>  }
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
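
The refactored ext4_punch_hole() above now derives everything from a single clamped `end`. If the hole extends past i_size, `end` is rounded up to the page containing i_size. It is then capped at `s_bitmap_maxbytes - s_blocksize`. Only the blocks fully inside [offset, end) are removed. Below is a minimal user-space sketch of that computation under stated assumptions: `plan_punch_hole()`, `round_up_pow2()` and `struct punch_plan` are hypothetical, `max_end` stands in for the superblock-derived limit, and page and block sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct punch_plan {
	int64_t end;         /* clamped exclusive end of the hole */
	uint32_t start_lblk; /* first block that can be freed */
	uint32_t end_lblk;   /* one past the last block that can be freed */
};

/* Round x up to a multiple of align; align must be a power of two. */
static int64_t round_up_pow2(int64_t x, int64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Returns false when there is nothing to punch (offset at or past EOF). */
static bool plan_punch_hole(int64_t offset, int64_t length, int64_t i_size,
			    int64_t max_end, unsigned int blkbits,
			    int64_t page_size, struct punch_plan *p)
{
	const int64_t blocksize = (int64_t)1 << blkbits;
	int64_t end = offset + length;

	assert(offset >= 0 && length > 0 && blkbits < 32);
	if (offset >= i_size)
		return false;

	if (end > i_size)        /* stop after the page holding i_size */
		end = round_up_pow2(i_size, page_size);
	if (end > max_end)       /* models s_bitmap_maxbytes - s_blocksize */
		end = max_end;

	p->end = end;
	p->start_lblk = (uint32_t)(round_up_pow2(offset, blocksize) >> blkbits);
	p->end_lblk = (uint32_t)(end >> blkbits);
	return true;
}
```

For example, punching 1 MiB at offset 5000 in a 10000-byte file with 4 KiB pages and blocks clamps `end` to 12288. Only block 2 is freed, and the head 5000..8191 is zeroed by ext4_zero_partial_blocks() instead.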

* Re: [PATCH v2 06/10] ext4: refactor ext4_collapse_range()
  2024-09-04  6:29 ` [PATCH v2 06/10] ext4: refactor ext4_collapse_range() Zhang Yi
@ 2024-09-20 16:35   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-20 16:35 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:21, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Simplify ext4_collapse_range() and make its code style match
> ext4_zero_range() and ext4_punch_hole(). Refactor it by: a) renaming
> variables; b) dropping redundant input parameter checks and moving the
> remaining ones under i_rwsem, in preparation for a later refactor; c)
> renaming the three stale error labels.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/extents.c | 80 +++++++++++++++++++++++------------------------
>  1 file changed, 39 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 2fb0c2e303c7..5c0b4d512531 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5265,43 +5265,35 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
>  	struct inode *inode = file_inode(file);
>  	struct super_block *sb = inode->i_sb;
>  	struct address_space *mapping = inode->i_mapping;
> -	ext4_lblk_t punch_start, punch_stop;
> +	ext4_lblk_t start_lblk, end_lblk;
>  	handle_t *handle;
>  	unsigned int credits;
> -	loff_t new_size, ioffset;
> +	loff_t start, new_size;
>  	int ret;
>  
> -	/*
> -	 * We need to test this early because xfstests assumes that a
> -	 * collapse range of (0, 1) will return EOPNOTSUPP if the file
> -	 * system does not support collapse range.
> -	 */
> -	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> -		return -EOPNOTSUPP;
> +	trace_ext4_collapse_range(inode, offset, len);
>  
> -	/* Collapse range works only on fs cluster size aligned regions. */
> -	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb)))
> -		return -EINVAL;
> +	inode_lock(inode);
>  
> -	trace_ext4_collapse_range(inode, offset, len);
> +	/* Currently just for extent based files */
> +	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> +		ret = -EOPNOTSUPP;
> +		goto out;
> +	}
>  
> -	punch_start = offset >> EXT4_BLOCK_SIZE_BITS(sb);
> -	punch_stop = (offset + len) >> EXT4_BLOCK_SIZE_BITS(sb);
> +	/* Collapse range works only on fs cluster size aligned regions. */
> +	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
>  
> -	inode_lock(inode);
>  	/*
>  	 * There is no need to overlap collapse range with EOF, in which case
>  	 * it is effectively a truncate operation
>  	 */
>  	if (offset + len >= inode->i_size) {
>  		ret = -EINVAL;
> -		goto out_mutex;
> -	}
> -
> -	/* Currently just for extent based files */
> -	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
> -		ret = -EOPNOTSUPP;
> -		goto out_mutex;
> +		goto out;
>  	}
>  
>  	/* Wait for existing dio to complete */
> @@ -5309,7 +5301,7 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
>  
>  	ret = file_modified(file);
>  	if (ret)
> -		goto out_mutex;
> +		goto out;
>  
>  	/*
>  	 * Prevent page faults from reinstantiating pages we have released from
> @@ -5319,43 +5311,46 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
>  
>  	ret = ext4_break_layouts(inode);
>  	if (ret)
> -		goto out_mmap;
> +		goto out_invalidate_lock;
>  
>  	/*
>  	 * Need to round down offset to be aligned with page size boundary
>  	 * for page size > block size.
>  	 */
> -	ioffset = round_down(offset, PAGE_SIZE);
> +	start = round_down(offset, PAGE_SIZE);
>  	/* Write out all dirty pages */
> -	ret = filemap_write_and_wait_range(mapping, ioffset, LLONG_MAX);
> +	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
>  	if (ret)
> -		goto out_mmap;
> -	truncate_pagecache(inode, ioffset);
> +		goto out_invalidate_lock;
> +	truncate_pagecache(inode, start);
>  
>  	credits = ext4_writepage_trans_blocks(inode);
>  	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
>  	if (IS_ERR(handle)) {
>  		ret = PTR_ERR(handle);
> -		goto out_mmap;
> +		goto out_invalidate_lock;
>  	}
>  	ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
>  
> +	start_lblk = offset >> inode->i_blkbits;
> +	end_lblk = (offset + len) >> inode->i_blkbits;
> +
>  	down_write(&EXT4_I(inode)->i_data_sem);
>  	ext4_discard_preallocations(inode);
> -	ext4_es_remove_extent(inode, punch_start, EXT_MAX_BLOCKS - punch_start);
> +	ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk);
>  
> -	ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1);
> +	ret = ext4_ext_remove_space(inode, start_lblk, end_lblk - 1);
>  	if (ret) {
>  		up_write(&EXT4_I(inode)->i_data_sem);
> -		goto out_stop;
> +		goto out_handle;
>  	}
>  	ext4_discard_preallocations(inode);
>  
> -	ret = ext4_ext_shift_extents(inode, handle, punch_stop,
> -				     punch_stop - punch_start, SHIFT_LEFT);
> +	ret = ext4_ext_shift_extents(inode, handle, end_lblk,
> +				     end_lblk - start_lblk, SHIFT_LEFT);
>  	if (ret) {
>  		up_write(&EXT4_I(inode)->i_data_sem);
> -		goto out_stop;
> +		goto out_handle;
>  	}
>  
>  	new_size = inode->i_size - len;
> @@ -5363,16 +5358,19 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
>  	EXT4_I(inode)->i_disksize = new_size;
>  
>  	up_write(&EXT4_I(inode)->i_data_sem);
> -	if (IS_SYNC(inode))
> -		ext4_handle_sync(handle);
>  	ret = ext4_mark_inode_dirty(handle, inode);
> +	if (ret)
> +		goto out_handle;
> +
>  	ext4_update_inode_fsync_trans(handle, inode, 1);
> +	if (IS_SYNC(inode))
> +		ext4_handle_sync(handle);
>  
> -out_stop:
> +out_handle:
>  	ext4_journal_stop(handle);
> -out_mmap:
> +out_invalidate_lock:
>  	filemap_invalidate_unlock(mapping);
> -out_mutex:
> +out:
>  	inode_unlock(inode);
>  	return ret;
>  }
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
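
In the ext4_collapse_range() diff above, blocks [start_lblk, end_lblk) are removed. Every extent from end_lblk onward is shifted left by `end_lblk - start_lblk` blocks, and i_size shrinks by `len`. Page-cache writeback and truncation start at `round_down(offset, PAGE_SIZE)` so that the case of page size > block size is covered. A small user-space sketch of that bookkeeping follows. `plan_collapse()` and `struct collapse_plan` are hypothetical. The asserts stand in for the kernel's earlier validation, which checks cluster alignment, and cluster alignment implies block alignment.

```c
#include <assert.h>
#include <stdint.h>

struct collapse_plan {
	int64_t wb_start;    /* page-aligned start of writeback/truncation */
	uint32_t start_lblk; /* first block removed */
	uint32_t end_lblk;   /* first block that is shifted left */
	uint32_t shift;      /* distance, in blocks, of the left shift */
	int64_t new_size;    /* i_size after the collapse */
};

static struct collapse_plan plan_collapse(int64_t offset, int64_t len,
					  int64_t i_size, unsigned int blkbits,
					  int64_t page_size)
{
	struct collapse_plan p;

	/* Preconditions assumed to have been validated by the caller. */
	assert(((offset | len) & (((int64_t)1 << blkbits) - 1)) == 0);
	assert(offset + len < i_size);

	p.wb_start = offset & ~(page_size - 1); /* round_down(offset, PAGE_SIZE) */
	p.start_lblk = (uint32_t)(offset >> blkbits);
	p.end_lblk = (uint32_t)((offset + len) >> blkbits);
	p.shift = p.end_lblk - p.start_lblk;
	p.new_size = i_size - len;
	return p;
}
```

With 1 KiB blocks and 4 KiB pages, collapsing 2048 bytes at offset 5120 removes blocks 5..6 and shifts left by 2 blocks. Writeback, however, starts at 4096, the beginning of the page that contains the offset.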

* Re: [PATCH v2 07/10] ext4: refactor ext4_insert_range()
  2024-09-04  6:29 ` [PATCH v2 07/10] ext4: refactor ext4_insert_range() Zhang Yi
@ 2024-09-23  8:17   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-23  8:17 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:22, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Simplify ext4_insert_range() and make its code style match
> ext4_collapse_range(). Refactor it by: a) renaming variables; b)
> dropping redundant input parameter checks and moving the remaining ones
> under i_rwsem, in preparation for a later refactor; c) renaming the
> three stale error labels.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/extents.c | 95 ++++++++++++++++++++++-------------------------
>  1 file changed, 45 insertions(+), 50 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 5c0b4d512531..a6c24c229cb4 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5391,45 +5391,37 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  	handle_t *handle;
>  	struct ext4_ext_path *path;
>  	struct ext4_extent *extent;
> -	ext4_lblk_t offset_lblk, len_lblk, ee_start_lblk = 0;
> +	ext4_lblk_t start_lblk, len_lblk, ee_start_lblk = 0;
>  	unsigned int credits, ee_len;
> -	int ret = 0, depth, split_flag = 0;
> -	loff_t ioffset;
> -
> -	/*
> -	 * We need to test this early because xfstests assumes that an
> -	 * insert range of (0, 1) will return EOPNOTSUPP if the file
> -	 * system does not support insert range.
> -	 */
> -	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> -		return -EOPNOTSUPP;
> -
> -	/* Insert range works only on fs cluster size aligned regions. */
> -	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb)))
> -		return -EINVAL;
> +	int ret, depth, split_flag = 0;
> +	loff_t start;
>  
>  	trace_ext4_insert_range(inode, offset, len);
>  
> -	offset_lblk = offset >> EXT4_BLOCK_SIZE_BITS(sb);
> -	len_lblk = len >> EXT4_BLOCK_SIZE_BITS(sb);
> -
>  	inode_lock(inode);
> +
>  	/* Currently just for extent based files */
>  	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
>  		ret = -EOPNOTSUPP;
> -		goto out_mutex;
> +		goto out;
>  	}
>  
> -	/* Check whether the maximum file size would be exceeded */
> -	if (len > inode->i_sb->s_maxbytes - inode->i_size) {
> -		ret = -EFBIG;
> -		goto out_mutex;
> +	/* Insert range works only on fs cluster size aligned regions. */
> +	if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) {
> +		ret = -EINVAL;
> +		goto out;
>  	}
>  
>  	/* Offset must be less than i_size */
>  	if (offset >= inode->i_size) {
>  		ret = -EINVAL;
> -		goto out_mutex;
> +		goto out;
> +	}
> +
> +	/* Check whether the maximum file size would be exceeded */
> +	if (len > inode->i_sb->s_maxbytes - inode->i_size) {
> +		ret = -EFBIG;
> +		goto out;
>  	}
>  
>  	/* Wait for existing dio to complete */
> @@ -5437,7 +5429,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  
>  	ret = file_modified(file);
>  	if (ret)
> -		goto out_mutex;
> +		goto out;
>  
>  	/*
>  	 * Prevent page faults from reinstantiating pages we have released from
> @@ -5447,25 +5439,24 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  
>  	ret = ext4_break_layouts(inode);
>  	if (ret)
> -		goto out_mmap;
> +		goto out_invalidate_lock;
>  
>  	/*
>  	 * Need to round down to align start offset to page size boundary
>  	 * for page size > block size.
>  	 */
> -	ioffset = round_down(offset, PAGE_SIZE);
> +	start = round_down(offset, PAGE_SIZE);
>  	/* Write out all dirty pages */
> -	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
> -			LLONG_MAX);
> +	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
>  	if (ret)
> -		goto out_mmap;
> -	truncate_pagecache(inode, ioffset);
> +		goto out_invalidate_lock;
> +	truncate_pagecache(inode, start);
>  
>  	credits = ext4_writepage_trans_blocks(inode);
>  	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
>  	if (IS_ERR(handle)) {
>  		ret = PTR_ERR(handle);
> -		goto out_mmap;
> +		goto out_invalidate_lock;
>  	}
>  	ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
>  
> @@ -5474,15 +5465,18 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  	EXT4_I(inode)->i_disksize += len;
>  	ret = ext4_mark_inode_dirty(handle, inode);
>  	if (ret)
> -		goto out_stop;
> +		goto out_handle;
> +
> +	start_lblk = offset >> inode->i_blkbits;
> +	len_lblk = len >> inode->i_blkbits;
>  
>  	down_write(&EXT4_I(inode)->i_data_sem);
>  	ext4_discard_preallocations(inode);
>  
> -	path = ext4_find_extent(inode, offset_lblk, NULL, 0);
> +	path = ext4_find_extent(inode, start_lblk, NULL, 0);
>  	if (IS_ERR(path)) {
>  		up_write(&EXT4_I(inode)->i_data_sem);
> -		goto out_stop;
> +		goto out_handle;
>  	}
>  
>  	depth = ext_depth(inode);
> @@ -5492,16 +5486,16 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  		ee_len = ext4_ext_get_actual_len(extent);
>  
>  		/*
> -		 * If offset_lblk is not the starting block of extent, split
> -		 * the extent @offset_lblk
> +		 * If start_lblk is not the starting block of extent, split
> +		 * the extent @start_lblk
>  		 */
> -		if ((offset_lblk > ee_start_lblk) &&
> -				(offset_lblk < (ee_start_lblk + ee_len))) {
> +		if ((start_lblk > ee_start_lblk) &&
> +				(start_lblk < (ee_start_lblk + ee_len))) {
>  			if (ext4_ext_is_unwritten(extent))
>  				split_flag = EXT4_EXT_MARK_UNWRIT1 |
>  					EXT4_EXT_MARK_UNWRIT2;
>  			ret = ext4_split_extent_at(handle, inode, &path,
> -					offset_lblk, split_flag,
> +					start_lblk, split_flag,
>  					EXT4_EX_NOCACHE |
>  					EXT4_GET_BLOCKS_PRE_IO |
>  					EXT4_GET_BLOCKS_METADATA_NOFAIL);
> @@ -5510,32 +5504,33 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
>  		ext4_free_ext_path(path);
>  		if (ret < 0) {
>  			up_write(&EXT4_I(inode)->i_data_sem);
> -			goto out_stop;
> +			goto out_handle;
>  		}
>  	} else {
>  		ext4_free_ext_path(path);
>  	}
>  
> -	ext4_es_remove_extent(inode, offset_lblk, EXT_MAX_BLOCKS - offset_lblk);
> +	ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk);
>  
>  	/*
> -	 * if offset_lblk lies in a hole which is at start of file, use
> +	 * if start_lblk lies in a hole which is at start of file, use
>  	 * ee_start_lblk to shift extents
>  	 */
>  	ret = ext4_ext_shift_extents(inode, handle,
> -		max(ee_start_lblk, offset_lblk), len_lblk, SHIFT_RIGHT);
> -
> +		max(ee_start_lblk, start_lblk), len_lblk, SHIFT_RIGHT);
>  	up_write(&EXT4_I(inode)->i_data_sem);
> +	if (ret)
> +		goto out_handle;
> +
> +	ext4_update_inode_fsync_trans(handle, inode, 1);
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
> -	if (ret >= 0)
> -		ext4_update_inode_fsync_trans(handle, inode, 1);
>  
> -out_stop:
> +out_handle:
>  	ext4_journal_stop(handle);
> -out_mmap:
> +out_invalidate_lock:
>  	filemap_invalidate_unlock(mapping);
> -out_mutex:
> +out:
>  	inode_unlock(inode);
>  	return ret;
>  }
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
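
The core decision in the ext4_insert_range() diff above is whether the extent found at `start_lblk` has to be split first. A split is needed only when `start_lblk` falls strictly inside [ee_start_lblk, ee_start_lblk + ee_len). The right shift then begins at `max(ee_start_lblk, start_lblk)`, which covers the case where start_lblk lies in a hole at the start of the file. The sketch below models only that condition in user space. `plan_insert_split()` and `struct insert_split` are hypothetical, and the sketch assumes an extent was found (the kernel code keeps `ee_start_lblk = 0` when none is).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct insert_split {
	bool split;          /* start_lblk falls strictly inside the extent */
	uint32_t shift_from; /* first block handed to the right shift */
};

static struct insert_split plan_insert_split(uint32_t start_lblk,
					     uint32_t ee_start_lblk,
					     uint32_t ee_len)
{
	struct insert_split s;

	assert(ee_len > 0);
	s.split = start_lblk > ee_start_lblk &&
		  start_lblk < ee_start_lblk + ee_len;
	/* max(ee_start_lblk, start_lblk) */
	s.shift_from = start_lblk > ee_start_lblk ? start_lblk : ee_start_lblk;
	return s;
}
```

Inserting at block 10 inside an extent covering blocks 8..12 requires a split at block 10. Inserting exactly at the extent's first block, or past its end, shifts without splitting.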

* Re: [PATCH v2 08/10] ext4: factor out ext4_do_fallocate()
  2024-09-04  6:29 ` [PATCH v2 08/10] ext4: factor out ext4_do_fallocate() Zhang Yi
@ 2024-09-23  8:20   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-23  8:20 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:23, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> The real work of a normal fallocate is currently open-coded in
> ext4_fallocate(). Factor it out into a new helper, ext4_do_fallocate(),
> as is already done for the other operations (e.g. ext4_zero_range())
> called from ext4_fallocate(). This makes the code clearer; no
> functional changes.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/extents.c | 125 ++++++++++++++++++++++------------------------
>  1 file changed, 60 insertions(+), 65 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index a6c24c229cb4..06b2c1190181 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4662,6 +4662,58 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  	return ret;
>  }
>  
> +static long ext4_do_fallocate(struct file *file, loff_t offset,
> +			      loff_t len, int mode)
> +{
> +	struct inode *inode = file_inode(file);
> +	loff_t end = offset + len;
> +	loff_t new_size = 0;
> +	ext4_lblk_t start_lblk, len_lblk;
> +	int ret;
> +
> +	trace_ext4_fallocate_enter(inode, offset, len, mode);
> +
> +	start_lblk = offset >> inode->i_blkbits;
> +	len_lblk = EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits);
> +
> +	inode_lock(inode);
> +
> +	/* We support preallocation for extent-based files only. */
> +	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
> +		ret = -EOPNOTSUPP;
> +		goto out;
> +	}
> +
> +	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
> +	    (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) {
> +		new_size = end;
> +		ret = inode_newsize_ok(inode, new_size);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	/* Wait all existing dio workers, newcomers will block on i_rwsem */
> +	inode_dio_wait(inode);
> +
> +	ret = file_modified(file);
> +	if (ret)
> +		goto out;
> +
> +	ret = ext4_alloc_file_blocks(file, start_lblk, len_lblk, new_size,
> +				     EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
> +	if (ret)
> +		goto out;
> +
> +	if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
> +		ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
> +					EXT4_I(inode)->i_sync_tid);
> +	}
> +out:
> +	inode_unlock(inode);
> +	trace_ext4_fallocate_exit(inode, offset, len_lblk, ret);
> +	return ret;
> +}
> +
>  /*
>   * preallocate space for a file. This implements ext4's fallocate file
>   * operation, which gets called from sys_fallocate system call.
> @@ -4672,12 +4724,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>  long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  {
>  	struct inode *inode = file_inode(file);
> -	loff_t new_size = 0;
> -	unsigned int max_blocks;
> -	int ret = 0;
> -	int flags;
> -	ext4_lblk_t lblk;
> -	unsigned int blkbits = inode->i_blkbits;
> +	int ret;
>  
>  	/*
>  	 * Encrypted inodes can't handle collapse range or insert
> @@ -4699,71 +4746,19 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>  	ret = ext4_convert_inline_data(inode);
>  	inode_unlock(inode);
>  	if (ret)
> -		goto exit;
> +		return ret;
>  
> -	if (mode & FALLOC_FL_PUNCH_HOLE) {
> +	if (mode & FALLOC_FL_PUNCH_HOLE)
>  		ret = ext4_punch_hole(file, offset, len);
> -		goto exit;
> -	}
> -
> -	if (mode & FALLOC_FL_COLLAPSE_RANGE) {
> +	else if (mode & FALLOC_FL_COLLAPSE_RANGE)
>  		ret = ext4_collapse_range(file, offset, len);
> -		goto exit;
> -	}
> -
> -	if (mode & FALLOC_FL_INSERT_RANGE) {
> +	else if (mode & FALLOC_FL_INSERT_RANGE)
>  		ret = ext4_insert_range(file, offset, len);
> -		goto exit;
> -	}
> -
> -	if (mode & FALLOC_FL_ZERO_RANGE) {
> +	else if (mode & FALLOC_FL_ZERO_RANGE)
>  		ret = ext4_zero_range(file, offset, len, mode);
> -		goto exit;
> -	}
> -	trace_ext4_fallocate_enter(inode, offset, len, mode);
> -	lblk = offset >> blkbits;
> -
> -	max_blocks = EXT4_MAX_BLOCKS(len, offset, blkbits);
> -	flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT;
> -
> -	inode_lock(inode);
> -
> -	/*
> -	 * We only support preallocation for extent-based files only
> -	 */
> -	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
> -		ret = -EOPNOTSUPP;
> -		goto out;
> -	}
> -
> -	if (!(mode & FALLOC_FL_KEEP_SIZE) &&
> -	    (offset + len > inode->i_size ||
> -	     offset + len > EXT4_I(inode)->i_disksize)) {
> -		new_size = offset + len;
> -		ret = inode_newsize_ok(inode, new_size);
> -		if (ret)
> -			goto out;
> -	}
> -
> -	/* Wait all existing dio workers, newcomers will block on i_rwsem */
> -	inode_dio_wait(inode);
> -
> -	ret = file_modified(file);
> -	if (ret)
> -		goto out;
> -
> -	ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags);
> -	if (ret)
> -		goto out;
> +	else
> +		ret = ext4_do_fallocate(file, offset, len, mode);
>  
> -	if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) {
> -		ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal,
> -					EXT4_I(inode)->i_sync_tid);
> -	}
> -out:
> -	inode_unlock(inode);
> -	trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> -exit:
>  	return ret;
>  }
>  
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
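
After patch 08, ext4_fallocate() is a plain if/else-if chain, and only a mode with none of the range-operation bits reaches ext4_do_fallocate(). The user-space sketch below mirrors that dispatch order. The `MODE_*` constants are local copies of the `FALLOC_FL_*` values from `include/uapi/linux/falloc.h`, renamed so they do not clash with `<fcntl.h>`. `pick_falloc_op()` and `enum falloc_op` are hypothetical. The assert encodes an assumption that unsupported mode bits were rejected before dispatch.

```c
#include <assert.h>

/* Local copies of FALLOC_FL_* values from include/uapi/linux/falloc.h. */
#define MODE_KEEP_SIZE      0x01
#define MODE_PUNCH_HOLE     0x02
#define MODE_COLLAPSE_RANGE 0x08
#define MODE_ZERO_RANGE     0x10
#define MODE_INSERT_RANGE   0x20
#define MODE_SUPPORTED (MODE_KEEP_SIZE | MODE_PUNCH_HOLE | \
			MODE_COLLAPSE_RANGE | MODE_ZERO_RANGE | \
			MODE_INSERT_RANGE)

enum falloc_op {
	OP_PUNCH_HOLE,
	OP_COLLAPSE_RANGE,
	OP_INSERT_RANGE,
	OP_ZERO_RANGE,
	OP_DO_FALLOCATE,
};

static enum falloc_op pick_falloc_op(int mode)
{
	/* Assumption: unsupported bits were already rejected by the caller. */
	assert((mode & ~MODE_SUPPORTED) == 0);

	if (mode & MODE_PUNCH_HOLE)
		return OP_PUNCH_HOLE;
	else if (mode & MODE_COLLAPSE_RANGE)
		return OP_COLLAPSE_RANGE;
	else if (mode & MODE_INSERT_RANGE)
		return OP_INSERT_RANGE;
	else if (mode & MODE_ZERO_RANGE)
		return OP_ZERO_RANGE;
	return OP_DO_FALLOCATE; /* plain preallocation, with or without KEEP_SIZE */
}
```

KEEP_SIZE does not select an operation on its own; it only changes whether the normal and zero-range paths may grow i_size.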

* Re: [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations
  2024-09-04  6:29 ` [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations Zhang Yi
@ 2024-09-23  8:31   ` Jan Kara
  2024-09-24  7:52     ` Zhang Yi
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2024-09-23  8:31 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:24, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> The beginnings of all five functions called from ext4_fallocate()
> (punch hole, zero range, insert range, collapse range and normal
> fallocate) are now almost identical: each takes i_rwsem and checks the
> validity of its input parameters. Move the acquisition of i_rwsem into
> ext4_fallocate() and factor out a common helper to check the input
> parameters, which makes the code clearer.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
...
> +static int ext4_fallocate_check(struct inode *inode, int mode,
> +				loff_t offset, loff_t len)
> +{
> +	/* Except punch hole, currently only for extent based files. */
> +	if (!(mode & FALLOC_FL_PUNCH_HOLE) &&
> +	    !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> +		return -EOPNOTSUPP;
> +
> +	/*
> +	 * Insert range and collapse range work only on fs cluster size
> +	 * aligned regions.
> +	 */
> +	if (mode & (FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE) &&
> +	    !IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(inode->i_sb)))
> +		return -EINVAL;
> +
> +	if (mode & FALLOC_FL_INSERT_RANGE) {
> +		/* Insert range, offset must be less than i_size */
> +		if (offset >= inode->i_size)
> +			return -EINVAL;
> +		/* Check whether the maximum file size would be exceeded */
> +		if (len > inode->i_sb->s_maxbytes - inode->i_size)
> +			return -EFBIG;
> +	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
> +		/*
> +		 * Collapse range, there is no need to overlap collapse
> +		 * range with EOF, in which case it is effectively a
> +		 * truncate operation.
> +		 */
> +		if (offset + len >= inode->i_size)
> +			return -EINVAL;
> +	}
> +
> +	return 0;
> +}

I don't think this helps. If the code is really shared, then the
factorization is good, but here you have to check which operation we
perform before the various checks, and in that case I don't think it
really helps readability to factor the checks out into a common function.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
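The range checks in the quoted ext4_fallocate_check() can be modeled outside the kernel. The sketch below is a userspace C model, not ext4 code: model_falloc_check(), is_aligned() and the MODEL_* mode bits are hypothetical stand-ins for ext4_fallocate_check(), IS_ALIGNED() and the FALLOC_FL_* flags, and the extent-flag test is left out. It assumes, as vfs_fallocate() does before calling into the filesystem, that ranges where offset + len overflows were already rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode bits; the real FALLOC_FL_* values differ. */
enum {
	MODEL_PUNCH_HOLE     = 1 << 0,
	MODEL_COLLAPSE_RANGE = 1 << 1,
	MODEL_INSERT_RANGE   = 1 << 2,
};

/* Same idea as the kernel's IS_ALIGNED(): only valid for a power of two. */
static bool is_aligned(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x & (a - 1)) == 0;
}

/* Returns 0 or a negative errno, like the quoted kernel helper. */
static int model_falloc_check(int mode, int64_t offset, int64_t len,
			      int64_t i_size, int64_t s_maxbytes,
			      uint64_t cluster_size)
{
	/*
	 * A low bit set in either offset or len survives the OR, so one
	 * alignment test covers both values.
	 */
	if ((mode & (MODEL_INSERT_RANGE | MODEL_COLLAPSE_RANGE)) &&
	    !is_aligned((uint64_t)(offset | len), cluster_size))
		return -EINVAL;

	if (mode & MODEL_INSERT_RANGE) {
		/* Insert range: offset must be inside the file. */
		if (offset >= i_size)
			return -EINVAL;
		/* Subtraction form, so i_size + len is never computed. */
		if (len > s_maxbytes - i_size)
			return -EFBIG;
	} else if (mode & MODEL_COLLAPSE_RANGE) {
		/* Collapse range reaching EOF would just be a truncate. */
		if (offset + len >= i_size)
			return -EINVAL;
	}
	return 0;
}
```

Punch hole falls through every branch here, so an unaligned punch request is accepted by this model.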

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate
  2024-09-04  6:29 ` [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate Zhang Yi
@ 2024-09-23  8:54   ` Jan Kara
  2024-09-24  8:11     ` Zhang Yi
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2024-09-23  8:54 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Wed 04-09-24 14:29:25, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Now the beginnings of the first four functions in ext4_fallocate()
> (punch hole, zero range, insert range and collapse range) are almost
> the same: they need to wait for the dio to finish, get the filemap
> invalidate lock, write back dirty data and finally drop the page cache.
> Factoring out a common helper to do this work removes a lot of
> redundant code.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

I like that we factor out this functionality in a common helper. But see
below:

> @@ -4731,6 +4707,52 @@ static int ext4_fallocate_check(struct inode *inode, int mode,
>  	return 0;
>  }
>  
> +int ext4_prepare_falloc(struct file *file, loff_t start, loff_t end, int mode)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct address_space *mapping = inode->i_mapping;
> +	int ret;
> +
> +	/* Wait all existing dio workers, newcomers will block on i_rwsem */
> +	inode_dio_wait(inode);
> +	ret = file_modified(file);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Prevent page faults from reinstantiating pages we have released
> +	 * from page cache.
> +	 */
> +	filemap_invalidate_lock(mapping);
> +
> +	ret = ext4_break_layouts(inode);
> +	if (ret)
> +		goto failed;
> +
> +	/*
> +	 * Write data that will be zeroed to preserve them when successfully
> +	 * discarding page cache below but fail to convert extents.
> +	 */
> +	ret = filemap_write_and_wait_range(mapping, start, end);

The comment is somewhat outdated now. Also the range is wrong for collapse
and insert range. There we need to write out data up to the EOF because we
truncate it below.

								Honza

> +	if (ret)
> +		goto failed;
> +
> +	/*
> +	 * For insert range and collapse range, COWed private pages should
> +	 * be removed since the file's logical offset will be changed, but
> +	 * punch hole and zero range doesn't.
> +	 */
> +	if (mode & (FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE))
> +		truncate_pagecache(inode, start);
> +	else
> +		truncate_pagecache_range(inode, start, end);
> +
> +	return 0;
> +failed:
> +	filemap_invalidate_unlock(mapping);
> +	return ret;
> +}

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
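The locking contract of the quoted ext4_prepare_falloc() is worth keeping in mind at the call sites: it returns 0 with the filemap invalidate lock still held, and on any failure after taking the lock it drops the lock itself. A minimal userspace sketch of that control flow follows; struct model_mapping, model_prepare() and the fail_at knob are hypothetical, and each kernel call is replaced by a stub that fails on request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mapping's invalidate lock. */
struct model_mapping {
	bool invalidate_locked;
};

static void model_invalidate_lock(struct model_mapping *m)
{
	assert(!m->invalidate_locked);
	m->invalidate_locked = true;
}

static void model_invalidate_unlock(struct model_mapping *m)
{
	assert(m->invalidate_locked);
	m->invalidate_locked = false;
}

/*
 * Mirrors the quoted helper's flow. fail_at picks which stubbed step
 * fails (0 = none); -EIO is just a stand-in error code.
 */
static int model_prepare(struct model_mapping *m, int fail_at)
{
	int ret;

	ret = (fail_at == 1) ? -EIO : 0;	/* file_modified() */
	if (ret)
		return ret;			/* lock not taken yet */

	model_invalidate_lock(m);		/* filemap_invalidate_lock() */

	ret = (fail_at == 2) ? -EIO : 0;	/* ext4_break_layouts() */
	if (ret)
		goto failed;

	ret = (fail_at == 3) ? -EIO : 0;	/* filemap_write_and_wait_range() */
	if (ret)
		goto failed;

	/* truncate_pagecache*() return void, so nothing fails past here. */
	return 0;				/* caller must unlock */
failed:
	model_invalidate_unlock(m);
	return ret;
}
```

The goto-based unwind keeps a single unlock site for every failure that happens after the lock is taken.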


* Re: [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch()
  2024-09-20 16:13   ` Jan Kara
@ 2024-09-24  7:43     ` Zhang Yi
  2024-09-24 10:11       ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-24  7:43 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/9/21 0:13, Jan Kara wrote:
> On Wed 04-09-24 14:29:18, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Since we always write back dirty data before zeroing a range and
>> punching a hole, the disksize of a delalloc-extended file should be
>> updated properly when writing back pages. Hence we don't need to
>> update the file's disksize before discarding the page cache in
>> ext4_zero_range() and ext4_punch_hole(); just drop
>> ext4_update_disksize_before_punch().
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> So when we don't write out before hole punching & company this needs to stay
> in some shape or form. 
> 

Thanks for taking time to review this series!

I don't fully understand this comment, so let me confirm. Do you
suggest that we keep not writing out all the data before punching /
zeroing / collapsing (i.e. drop patch 01), in which case we need to keep
ext4_update_disksize_before_punch() (i.e. also drop this patch)? Is
that right?

Thanks,
Yi.

> 
>> ---
>>  fs/ext4/ext4.h    |  3 ---
>>  fs/ext4/extents.c |  4 ----
>>  fs/ext4/inode.c   | 37 +------------------------------------
>>  3 files changed, 1 insertion(+), 43 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 08acd152261e..e8d7965f62c4 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -3414,9 +3414,6 @@ static inline int ext4_update_inode_size(struct inode *inode, loff_t newsize)
>>  	return changed;
>>  }
>>  
>> -int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>> -				      loff_t len);
>> -
>>  struct ext4_group_info {
>>  	unsigned long   bb_state;
>>  #ifdef AGGRESSIVE_CHECK
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 19a9b14935b7..d9fccf2970e9 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -4637,10 +4637,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
>>  		flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN |
>>  			  EXT4_EX_NOCACHE);
>>  
>> -		ret = ext4_update_disksize_before_punch(inode, offset, len);
>> -		if (ret)
>> -			goto out_invalidate_lock;
>> -
>>  		/* Now release the pages and zero block aligned part of pages */
>>  		truncate_pagecache_range(inode, start, end - 1);
>>  
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 8af25442d44d..9343ce9f2b01 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3872,37 +3872,6 @@ int ext4_can_truncate(struct inode *inode)
>>  	return 0;
>>  }
>>  
>> -/*
>> - * We have to make sure i_disksize gets properly updated before we truncate
>> - * page cache due to hole punching or zero range. Otherwise i_disksize update
>> - * can get lost as it may have been postponed to submission of writeback but
>> - * that will never happen after we truncate page cache.
>> - */
>> -int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>> -				      loff_t len)
>> -{
>> -	handle_t *handle;
>> -	int ret;
>> -
>> -	loff_t size = i_size_read(inode);
>> -
>> -	WARN_ON(!inode_is_locked(inode));
>> -	if (offset > size || offset + len < size)
>> -		return 0;
>> -
>> -	if (EXT4_I(inode)->i_disksize >= size)
>> -		return 0;
>> -
>> -	handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>> -	if (IS_ERR(handle))
>> -		return PTR_ERR(handle);
>> -	ext4_update_i_disksize(inode, size);
>> -	ret = ext4_mark_inode_dirty(handle, inode);
>> -	ext4_journal_stop(handle);
>> -
>> -	return ret;
>> -}
>> -
>>  static void ext4_wait_dax_page(struct inode *inode)
>>  {
>>  	filemap_invalidate_unlock(inode->i_mapping);
>> @@ -4022,13 +3991,9 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>  	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
>>  
>>  	/* Now release the pages and zero block aligned part of pages*/
>> -	if (last_block_offset > first_block_offset) {
>> -		ret = ext4_update_disksize_before_punch(inode, offset, length);
>> -		if (ret)
>> -			goto out_dio;
>> +	if (last_block_offset > first_block_offset)
>>  		truncate_pagecache_range(inode, first_block_offset,
>>  					 last_block_offset);
>> -	}
>>  
>>  	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>>  		credits = ext4_writepage_trans_blocks(inode);
>> -- 
>> 2.39.2
>>
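The removed ext4_update_disksize_before_punch() only does work in one case: the punched or zeroed range reaches EOF and i_disksize still lags behind i_size, for example after a delayed-allocation append whose i_disksize update is postponed to writeback. The sketch below models just that decision in userspace C; needs_disksize_update() is a hypothetical name, and the journal handle and ext4_mark_inode_dirty() steps are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the removed helper would go on to start a journal
 * handle and raise i_disksize to i_size.
 */
static bool needs_disksize_update(long long offset, long long len,
				  long long i_size, long long i_disksize)
{
	assert(offset >= 0 && len >= 0);

	/* Only a range with offset <= i_size <= offset + len matters. */
	if (offset > i_size || offset + len < i_size)
		return false;

	/* The on-disk size already covers i_size. */
	if (i_disksize >= i_size)
		return false;

	return true;
}
```

For a file with i_size 8192 but i_disksize 4096, punching [4096, 12288) needs the update, while punching [0, 4096) does not.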



* Re: [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations
  2024-09-23  8:31   ` Jan Kara
@ 2024-09-24  7:52     ` Zhang Yi
  0 siblings, 0 replies; 28+ messages in thread
From: Zhang Yi @ 2024-09-24  7:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/9/23 16:31, Jan Kara wrote:
> On Wed 04-09-24 14:29:24, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Now the beginnings of all five functions in ext4_fallocate() (punch
>> hole, zero range, insert range, collapse range and normal fallocate)
>> are almost the same: they need to hold i_rwsem and check the validity
>> of the input parameters. So move the holding of i_rwsem to
>> ext4_fallocate() and factor out a common helper to check the input
>> parameters, which makes the code clearer.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ...
>> +static int ext4_fallocate_check(struct inode *inode, int mode,
>> +				loff_t offset, loff_t len)
>> +{
>> +	/* Currently except punch_hole, just for extent based files. */
>> +	if (!(mode & FALLOC_FL_PUNCH_HOLE) &&
>> +	    !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>> +		return -EOPNOTSUPP;
>> +
>> +	/*
>> +	 * Insert range and collapse range works only on fs cluster size
>> +	 * aligned regions.
>> +	 */
>> +	if (mode & (FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE) &&
>> +	    !IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(inode->i_sb)))
>> +		return -EINVAL;
>> +
>> +	if (mode & FALLOC_FL_INSERT_RANGE) {
>> +		/* Insert range, offset must be less than i_size */
>> +		if (offset >= inode->i_size)
>> +			return -EINVAL;
>> +		/* Check whether the maximum file size would be exceeded */
>> +		if (len > inode->i_sb->s_maxbytes - inode->i_size)
>> +			return -EFBIG;
>> +	} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
>> +		/*
>> +		 * Collapse range, there is no need to overlap collapse
>> +		 * range with EOF, in which case it is effectively a
>> +		 * truncate operation.
>> +		 */
>> +		if (offset + len >= inode->i_size)
>> +			return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
> 
> I don't think this helps. If the code is really shared, then the
> factorization is good, but here you have to check which operation we
> perform anyway, and in that case I don't think it really helps
> readability to factor out the checks into a common function.
> 

Yeah, I think you are right. This just moves the checks out and may
make the code harder to read; it should be easier to understand if
they stay in their original places.

Thanks,
Yi.



* Re: [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate
  2024-09-23  8:54   ` Jan Kara
@ 2024-09-24  8:11     ` Zhang Yi
  2024-09-24 10:05       ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Zhang Yi @ 2024-09-24  8:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/9/23 16:54, Jan Kara wrote:
> On Wed 04-09-24 14:29:25, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Now the beginnings of the first four functions in ext4_fallocate()
>> (punch hole, zero range, insert range and collapse range) are almost
>> the same: they need to wait for the dio to finish, get the filemap
>> invalidate lock, write back dirty data and finally drop the page cache.
>> Factoring out a common helper to do this work removes a lot of
>> redundant code.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> I like that we factor out this functionality in a common helper. But see
> below:
> 
>> @@ -4731,6 +4707,52 @@ static int ext4_fallocate_check(struct inode *inode, int mode,
>>  	return 0;
>>  }
>>  
>> +int ext4_prepare_falloc(struct file *file, loff_t start, loff_t end, int mode)
>> +{
>> +	struct inode *inode = file_inode(file);
>> +	struct address_space *mapping = inode->i_mapping;
>> +	int ret;
>> +
>> +	/* Wait all existing dio workers, newcomers will block on i_rwsem */
>> +	inode_dio_wait(inode);
>> +	ret = file_modified(file);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/*
>> +	 * Prevent page faults from reinstantiating pages we have released
>> +	 * from page cache.
>> +	 */
>> +	filemap_invalidate_lock(mapping);
>> +
>> +	ret = ext4_break_layouts(inode);
>> +	if (ret)
>> +		goto failed;
>> +
>> +	/*
>> +	 * Write data that will be zeroed to preserve them when successfully
>> +	 * discarding page cache below but fail to convert extents.
>> +	 */
>> +	ret = filemap_write_and_wait_range(mapping, start, end);
> 
> The comment is somewhat outdated now.

Sure, will update it in next iteration.

> Also the range is wrong for collapse
> and insert range. There we need to write out data up to the EOF because we
> truncate it below.
> 

For collapse and insert range, I passed LLONG_MAX as the end, which is
the same as before, so this should cover everything up to the EOF. I
think it's right, or am I missing something?

ext4_collapse_range():

-	start = round_down(offset, PAGE_SIZE);
-	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
+	ret = ext4_prepare_falloc(file, round_down(offset, PAGE_SIZE),
+				  LLONG_MAX, FALLOC_FL_COLLAPSE_RANGE);


ext4_insert_range():

-	start = round_down(offset, PAGE_SIZE);
-	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
+	ret = ext4_prepare_falloc(file, round_down(offset, PAGE_SIZE),
+				  LLONG_MAX, FALLOC_FL_INSERT_RANGE);

Thanks,
Yi.

> 
>> +	if (ret)
>> +		goto failed;
>> +
>> +	/*
>> +	 * For insert range and collapse range, COWed private pages should
>> +	 * be removed since the file's logical offset will be changed, but
>> +	 * punch hole and zero range doesn't.
>> +	 */
>> +	if (mode & (FALLOC_FL_INSERT_RANGE | FALLOC_FL_COLLAPSE_RANGE))
>> +		truncate_pagecache(inode, start);
>> +	else
>> +		truncate_pagecache_range(inode, start, end);
>> +
>> +	return 0;
>> +failed:
>> +	filemap_invalidate_unlock(mapping);
>> +	return ret;
>> +}
> 



* Re: [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate
  2024-09-24  8:11     ` Zhang Yi
@ 2024-09-24 10:05       ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2024-09-24 10:05 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, tytso,
	adilger.kernel, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Tue 24-09-24 16:11:08, Zhang Yi wrote:
> On 2024/9/23 16:54, Jan Kara wrote:
> > On Wed 04-09-24 14:29:25, Zhang Yi wrote:
> > Also the range is wrong for collapse
> > and insert range. There we need to write out data up to the EOF because we
> > truncate it below.
> > 
> 
> For collapse and insert range, I passed LLONG_MAX as the end, which is
> the same as before, so this should cover everything up to the EOF. I
> think it's right, or am I missing something?
> 
> ext4_collapse_range():
> 
> -	start = round_down(offset, PAGE_SIZE);
> -	/* Write out all dirty pages */
> -	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
> +	ret = ext4_prepare_falloc(file, round_down(offset, PAGE_SIZE),
> +				  LLONG_MAX, FALLOC_FL_COLLAPSE_RANGE);
> 
> 
> ext4_insert_range():
> 
> -	start = round_down(offset, PAGE_SIZE);
> -	/* Write out all dirty pages */
> -	ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX);
> +	ret = ext4_prepare_falloc(file, round_down(offset, PAGE_SIZE),
> +				  LLONG_MAX, FALLOC_FL_INSERT_RANGE);
> 

Ah sorry, I've missed these bits. So we should be fine.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
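The point settled above, that passing LLONG_MAX as the end makes the write-back cover everything the later truncation drops, can be stated as a small check. The sketch is a userspace model with hypothetical names (struct byte_range, dropped_range(), dropped_within_written() and the M_* mode bits); it assumes inclusive end offsets, as filemap_write_and_wait_range() and truncate_pagecache_range() use, and models truncate_pagecache(inode, start) as dropping everything from start onward, represented by LLONG_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical mode bits; the real FALLOC_FL_* values differ. */
enum { M_PUNCH_HOLE = 1, M_ZERO_RANGE = 2, M_COLLAPSE = 4, M_INSERT = 8 };

/* Byte range with an inclusive end. */
struct byte_range {
	long long start;
	long long end;
};

/*
 * Range the helper drops from the page cache: insert and collapse drop
 * from start to EOF, punch hole and zero range stop at end.
 */
static struct byte_range dropped_range(int mode, long long start,
				       long long end)
{
	struct byte_range r = { start, end };

	assert(start <= end);
	if (mode & (M_INSERT | M_COLLAPSE))
		r.end = LLONG_MAX;
	return r;
}

/*
 * True when every dropped byte lies inside the written-back range
 * [start, end], so no dirty page is discarded without being written.
 */
static bool dropped_within_written(int mode, long long start, long long end)
{
	struct byte_range written = { start, end };
	struct byte_range dropped = dropped_range(mode, start, end);

	return dropped.start >= written.start && dropped.end <= written.end;
}
```

With a finite end, collapse and insert would fail this check, which is the problem raised in the review; with LLONG_MAX they pass.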


* Re: [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch()
  2024-09-24  7:43     ` Zhang Yi
@ 2024-09-24 10:11       ` Jan Kara
  2024-09-24 11:09         ` Zhang Yi
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2024-09-24 10:11 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, tytso,
	adilger.kernel, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Tue 24-09-24 15:43:22, Zhang Yi wrote:
> On 2024/9/21 0:13, Jan Kara wrote:
> > On Wed 04-09-24 14:29:18, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Since we always write back dirty data before zeroing a range and
> >> punching a hole, the disksize of a delalloc-extended file should be
> >> updated properly when writing back pages. Hence we don't need to
> >> update the file's disksize before discarding the page cache in
> >> ext4_zero_range() and ext4_punch_hole(); just drop
> >> ext4_update_disksize_before_punch().
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > 
> > So when we don't write out before hole punching & company this needs to stay
> > in some shape or form. 
> > 
> 
> Thanks for taking time to review this series!
> 
> I don't fully understand this comment, so let me confirm. Do you
> suggest that we keep not writing out all the data before punching /
> zeroing / collapsing (i.e. drop patch 01), in which case we need to keep
> ext4_update_disksize_before_punch() (i.e. also drop this patch)? Is
> that right?

Yes, this is what I meant. Sorry for not being clear.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch()
  2024-09-24 10:11       ` Jan Kara
@ 2024-09-24 11:09         ` Zhang Yi
  0 siblings, 0 replies; 28+ messages in thread
From: Zhang Yi @ 2024-09-24 11:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/9/24 18:11, Jan Kara wrote:
> On Tue 24-09-24 15:43:22, Zhang Yi wrote:
>> On 2024/9/21 0:13, Jan Kara wrote:
>>> On Wed 04-09-24 14:29:18, Zhang Yi wrote:
>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> Since we always write back dirty data before zeroing a range and
>>>> punching a hole, the disksize of a delalloc-extended file should be
>>>> updated properly when writing back pages. Hence we don't need to
>>>> update the file's disksize before discarding the page cache in
>>>> ext4_zero_range() and ext4_punch_hole(); just drop
>>>> ext4_update_disksize_before_punch().
>>>>
>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>
>>> So when we don't write out before hole punching & company this needs to stay
>>> in some shape or form. 
>>>
>>
>> Thanks for taking time to review this series!
>>
>> I don't fully understand this comment, so let me confirm. Do you
>> suggest that we keep not writing out all the data before punching /
>> zeroing / collapsing (i.e. drop patch 01), in which case we need to keep
>> ext4_update_disksize_before_punch() (i.e. also drop this patch)? Is
>> that right?
> 
> Yes, this is what I meant. Sorry for not being clear.
> 
> 								Honza
> 

OK, this looks fine to me. Let me revise this series.

Thanks,
Yi.



end of thread, other threads:[~2024-09-24 11:10 UTC | newest]

Thread overview: 28+ messages
2024-09-04  6:29 [PATCH v2 00/10] ext4: clean up and refactor fallocate Zhang Yi
2024-09-04  6:29 ` [PATCH v2 01/10] ext4: write out dirty data before dropping pages Zhang Yi
2024-09-17 16:50   ` Jan Kara
2024-09-18 12:27     ` Zhang Yi
2024-09-04  6:29 ` [PATCH v2 02/10] ext4: don't explicit update times in ext4_fallocate() Zhang Yi
2024-09-20 16:04   ` Jan Kara
2024-09-04  6:29 ` [PATCH v2 03/10] ext4: drop ext4_update_disksize_before_punch() Zhang Yi
2024-09-20 16:13   ` Jan Kara
2024-09-24  7:43     ` Zhang Yi
2024-09-24 10:11       ` Jan Kara
2024-09-24 11:09         ` Zhang Yi
2024-09-04  6:29 ` [PATCH v2 04/10] ext4: refactor ext4_zero_range() Zhang Yi
2024-09-20 16:24   ` Jan Kara
2024-09-04  6:29 ` [PATCH v2 05/10] ext4: refactor ext4_punch_hole() Zhang Yi
2024-09-20 16:31   ` Jan Kara
2024-09-04  6:29 ` [PATCH v2 06/10] ext4: refactor ext4_collapse_range() Zhang Yi
2024-09-20 16:35   ` Jan Kara
2024-09-04  6:29 ` [PATCH v2 07/10] ext4: refactor ext4_insert_range() Zhang Yi
2024-09-23  8:17   ` Jan Kara
2024-09-04  6:29 ` [PATCH v2 08/10] ext4: factor out ext4_do_fallocate() Zhang Yi
2024-09-23  8:20   ` Jan Kara
2024-09-04  6:29 ` [PATCH v2 09/10] ext4: factor out the common checking part of all fallocate operations Zhang Yi
2024-09-23  8:31   ` Jan Kara
2024-09-24  7:52     ` Zhang Yi
2024-09-04  6:29 ` [PATCH v2 10/10] ext4: factor out a common helper to lock and flush data before fallocate Zhang Yi
2024-09-23  8:54   ` Jan Kara
2024-09-24  8:11     ` Zhang Yi
2024-09-24 10:05       ` Jan Kara
