[PATCH 1/3] mpage: terminate read-ahead on read error

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/3] mpage: terminate read-ahead on read error
@ 2025-08-12  7:22 Chi Zhiling
  2025-08-12  7:22 ` [PATCH 2/3] mpage: clean up do_mpage_readpage() Chi Zhiling
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Chi Zhiling @ 2025-08-12  7:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	Andrew Morton, Namjae Jeon, Sungjong Seo, Yuezhang Mo,
	Chi Zhiling

From: Chi Zhiling <chizhiling@kylinos.cn>

For exFAT filesystems with 4MB read_ahead_size, removing the storage device
during read operations can delay EIO error reporting by several minutes.
This occurs because the read-ahead implementation in mpage doesn't handle
errors.

Another reason for the delay is that the filesystem requires metadata to
issue file read request. When the storage device is removed, the metadata
buffers are invalidated, causing mpage to repeatedly attempt to fetch
metadata during each get_block call.

The original purpose of this patch is terminate read ahead when we fail
to get metadata, to make the patch more generic, implement it by checking
folio status, instead of checking the return of get_block().

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
---
 fs/mpage.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/mpage.c b/fs/mpage.c
index c5fd821fd30e..b6510b8dfa2b 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -369,6 +369,9 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
 		args.folio = folio;
 		args.nr_pages = readahead_count(rac);
 		args.bio = do_mpage_readpage(&args);
+		if (!folio_test_locked(folio) &&
+		    !folio_test_uptodate(folio))
+			break;
 	}
 	if (args.bio)
 		mpage_bio_submit_read(args.bio);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] mpage: clean up do_mpage_readpage()
  2025-08-12  7:22 [PATCH 1/3] mpage: terminate read-ahead on read error Chi Zhiling
@ 2025-08-12  7:22 ` Chi Zhiling
  2025-08-12  7:22 ` [PATCH 3/3] mpage: convert do_mpage_readpage() to return int type Chi Zhiling
  2025-08-18  2:41 ` [PATCH 1/3] mpage: terminate read-ahead on read error Andrew Morton
  2 siblings, 0 replies; 7+ messages in thread
From: Chi Zhiling @ 2025-08-12  7:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	Andrew Morton, Namjae Jeon, Sungjong Seo, Yuezhang Mo,
	Chi Zhiling

From: Chi Zhiling <chizhiling@kylinos.cn>

Replace two loop iterations with direct calculations.
The variable nblocks now represents the number of avalid blocks we can
obtain from map_bh. no functional change.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
---
 fs/mpage.c | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index b6510b8dfa2b..a81a71de8f59 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -158,7 +158,6 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	struct buffer_head *map_bh = &args->map_bh;
 	sector_t block_in_file;
 	sector_t last_block;
-	sector_t last_block_in_file;
 	sector_t first_block;
 	unsigned page_block;
 	unsigned first_hole = blocks_per_folio;
@@ -180,9 +179,8 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 
 	block_in_file = folio_pos(folio) >> blkbits;
 	last_block = block_in_file + ((args->nr_pages * PAGE_SIZE) >> blkbits);
-	last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
-	if (last_block > last_block_in_file)
-		last_block = last_block_in_file;
+	last_block = min_t(sector_t, last_block,
+			   (i_size_read(inode) + blocksize - 1) >> blkbits);
 	page_block = 0;
 
 	/*
@@ -193,19 +191,15 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			block_in_file > args->first_logical_block &&
 			block_in_file < (args->first_logical_block + nblocks)) {
 		unsigned map_offset = block_in_file - args->first_logical_block;
-		unsigned last = nblocks - map_offset;
 
 		first_block = map_bh->b_blocknr + map_offset;
-		for (relative_block = 0; ; relative_block++) {
-			if (relative_block == last) {
-				clear_buffer_mapped(map_bh);
-				break;
-			}
-			if (page_block == blocks_per_folio)
-				break;
-			page_block++;
-			block_in_file++;
-		}
+		nblocks -= map_offset;
+		if (nblocks > blocks_per_folio - page_block)
+			nblocks = blocks_per_folio - page_block;
+		else
+			clear_buffer_mapped(map_bh);
+		page_block += nblocks;
+		block_in_file += nblocks;
 		bdev = map_bh->b_bdev;
 	}
 
@@ -243,7 +237,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			map_buffer_to_folio(folio, map_bh, page_block);
 			goto confused;
 		}
-	
+
 		if (first_hole != blocks_per_folio)
 			goto confused;		/* hole -> non-hole */
 
@@ -252,16 +246,14 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			first_block = map_bh->b_blocknr;
 		else if (first_block + page_block != map_bh->b_blocknr)
 			goto confused;
+
 		nblocks = map_bh->b_size >> blkbits;
-		for (relative_block = 0; ; relative_block++) {
-			if (relative_block == nblocks) {
-				clear_buffer_mapped(map_bh);
-				break;
-			} else if (page_block == blocks_per_folio)
-				break;
-			page_block++;
-			block_in_file++;
-		}
+		if (nblocks > blocks_per_folio - page_block)
+			nblocks = blocks_per_folio - page_block;
+		else
+			clear_buffer_mapped(map_bh);
+		page_block += nblocks;
+		block_in_file += nblocks;
 		bdev = map_bh->b_bdev;
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] mpage: convert do_mpage_readpage() to return int type
  2025-08-12  7:22 [PATCH 1/3] mpage: terminate read-ahead on read error Chi Zhiling
  2025-08-12  7:22 ` [PATCH 2/3] mpage: clean up do_mpage_readpage() Chi Zhiling
@ 2025-08-12  7:22 ` Chi Zhiling
  2025-08-18  2:41 ` [PATCH 1/3] mpage: terminate read-ahead on read error Andrew Morton
  2 siblings, 0 replies; 7+ messages in thread
From: Chi Zhiling @ 2025-08-12  7:22 UTC (permalink / raw)
  To: linux-fsdevel, linux-mm, linux-kernel
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Matthew Wilcox,
	Andrew Morton, Namjae Jeon, Sungjong Seo, Yuezhang Mo,
	Chi Zhiling

From: Chi Zhiling <chizhiling@kylinos.cn>

The return value of do_mpage_readpage() is arg->bio, which is already set
in the arg structure. Returning it again is redundant.

This patch changes the return type to int and always returns 0 since
the caller does not care about the return value.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
---
 fs/mpage.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index a81a71de8f59..718c2c448947 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -148,7 +148,7 @@ struct mpage_readpage_args {
  * represent the validity of its disk mapping and to decide when to do the next
  * get_block() call.
  */
-static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
+static int do_mpage_readpage(struct mpage_readpage_args *args)
 {
 	struct folio *folio = args->folio;
 	struct inode *inode = folio->mapping->host;
@@ -297,7 +297,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	else
 		args->last_block_in_bio = first_block + blocks_per_folio - 1;
 out:
-	return args->bio;
+	return 0;
 
 confused:
 	if (args->bio)
@@ -360,7 +360,7 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
 		prefetchw(&folio->flags);
 		args.folio = folio;
 		args.nr_pages = readahead_count(rac);
-		args.bio = do_mpage_readpage(&args);
+		do_mpage_readpage(&args);
 		if (!folio_test_locked(folio) &&
 		    !folio_test_uptodate(folio))
 			break;
@@ -381,7 +381,7 @@ int mpage_read_folio(struct folio *folio, get_block_t get_block)
 		.get_block = get_block,
 	};
 
-	args.bio = do_mpage_readpage(&args);
+	do_mpage_readpage(&args);
 	if (args.bio)
 		mpage_bio_submit_read(args.bio);
 	return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] mpage: terminate read-ahead on read error
  2025-08-12  7:22 [PATCH 1/3] mpage: terminate read-ahead on read error Chi Zhiling
  2025-08-12  7:22 ` [PATCH 2/3] mpage: clean up do_mpage_readpage() Chi Zhiling
  2025-08-12  7:22 ` [PATCH 3/3] mpage: convert do_mpage_readpage() to return int type Chi Zhiling
@ 2025-08-18  2:41 ` Andrew Morton
  2025-08-18 10:04   ` Chi Zhiling
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2025-08-18  2:41 UTC (permalink / raw)
  To: Chi Zhiling
  Cc: linux-fsdevel, linux-mm, linux-kernel, Alexander Viro,
	Christian Brauner, Jan Kara, Matthew Wilcox, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Chi Zhiling

On Tue, 12 Aug 2025 15:22:23 +0800 Chi Zhiling <chizhiling@163.com> wrote:

> From: Chi Zhiling <chizhiling@kylinos.cn>
> 
> For exFAT filesystems with 4MB read_ahead_size, removing the storage device
> during read operations can delay EIO error reporting by several minutes.
> This occurs because the read-ahead implementation in mpage doesn't handle
> errors.
> 
> Another reason for the delay is that the filesystem requires metadata to
> issue file read request. When the storage device is removed, the metadata
> buffers are invalidated, causing mpage to repeatedly attempt to fetch
> metadata during each get_block call.
> 
> The original purpose of this patch is terminate read ahead when we fail
> to get metadata, to make the patch more generic, implement it by checking
> folio status, instead of checking the return of get_block().
> 
> ...
>
> --- a/fs/mpage.c
> +++ b/fs/mpage.c
> @@ -369,6 +369,9 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
>  		args.folio = folio;
>  		args.nr_pages = readahead_count(rac);
>  		args.bio = do_mpage_readpage(&args);
> +		if (!folio_test_locked(folio) &&
> +		    !folio_test_uptodate(folio))
> +			break;
>  	}
>  	if (args.bio)
>  		mpage_bio_submit_read(args.bio);

So...  this is what the fs does when the device is unplugged? 
Synchronously return an unlocked !uptodate folio?  Or is this specific
to FAT?

I think a comment here telling readers why we're doing this would be
helpful.  It isn't obvious that we're dealing with e removed device!

Also, boy this is old code.  Basically akpm code from pre-git times. 
It was quite innovative back then, but everybody who understood it has
since moved on,  got senile or probably died.  Oh well.

Also, that if statement didn't need a newline ;)



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] mpage: terminate read-ahead on read error
  2025-08-18  2:41 ` [PATCH 1/3] mpage: terminate read-ahead on read error Andrew Morton
@ 2025-08-18 10:04   ` Chi Zhiling
  2025-08-18 14:33     ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Chi Zhiling @ 2025-08-18 10:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel, linux-mm, linux-kernel, Alexander Viro,
	Christian Brauner, Jan Kara, Matthew Wilcox, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Chi Zhiling

On 2025/8/18 10:41, Andrew Morton wrote:
> On Tue, 12 Aug 2025 15:22:23 +0800 Chi Zhiling <chizhiling@163.com> wrote:
> 
>> From: Chi Zhiling <chizhiling@kylinos.cn>
>>
>> For exFAT filesystems with 4MB read_ahead_size, removing the storage device
>> during read operations can delay EIO error reporting by several minutes.
>> This occurs because the read-ahead implementation in mpage doesn't handle
>> errors.
>>
>> Another reason for the delay is that the filesystem requires metadata to
>> issue file read request. When the storage device is removed, the metadata
>> buffers are invalidated, causing mpage to repeatedly attempt to fetch
>> metadata during each get_block call.
>>
>> The original purpose of this patch is terminate read ahead when we fail
>> to get metadata, to make the patch more generic, implement it by checking
>> folio status, instead of checking the return of get_block().
>>
>> ...
>>
>> --- a/fs/mpage.c
>> +++ b/fs/mpage.c
>> @@ -369,6 +369,9 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
>>   		args.folio = folio;
>>   		args.nr_pages = readahead_count(rac);
>>   		args.bio = do_mpage_readpage(&args);
>> +		if (!folio_test_locked(folio) &&
>> +		    !folio_test_uptodate(folio))
>> +			break;
>>   	}
>>   	if (args.bio)
>>   		mpage_bio_submit_read(args.bio);
> 
> So...  this is what the fs does when the device is unplugged?
> Synchronously return an unlocked !uptodate folio?  Or is this specific
> to FAT?

It's fs behavior,

AFAIK, all filesystems that use mpage will lock the folio until I/O 
finishes or encounters an error. This avoids races like buffered writes, 
etc. The uptodate flag being set or not depends on the I/O status.

So, if a folio is synchronously unlocked and non-uptodate, should we 
quit the read ahead?

I think it depends on whether the error is permanent or temporary, and 
whether further read ahead might succeed.

A device being unplugged is one reason for returning such a folio, but 
we could return it for many other reasons (e.g., metadata errors).

I think most errors won't be restored in a short time, so we should quit 
read ahead when they occur.

Besides, IOMAP also quits read ahead when some errors are encountered in 
iomap_begin().

> 
> I think a comment here telling readers why we're doing this would be
> helpful.  It isn't obvious that we're dealing with e removed device!

okay, I will comment here.
/*
  * If read ahead failed synchronously, it may cause by removed device,
  * or some filesystem metadata error.
  */

> 
> Also, boy this is old code.  Basically akpm code from pre-git times.
> It was quite innovative back then, but everybody who understood it has
> since moved on,  got senile or probably died.  Oh well.

Actually, I think this patch is safe, but I'm not sure if we should fix 
this issue. After all, this code has existed for a long time, and it's 
quite rare to unplug the device during a copy operation :)

Thanks,
Chi Zhiling

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] mpage: terminate read-ahead on read error
  2025-08-18 10:04   ` Chi Zhiling
@ 2025-08-18 14:33     ` Matthew Wilcox
  2025-08-19 12:22       ` Chi Zhiling
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2025-08-18 14:33 UTC (permalink / raw)
  To: Chi Zhiling
  Cc: Andrew Morton, linux-fsdevel, linux-mm, linux-kernel,
	Alexander Viro, Christian Brauner, Jan Kara, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Chi Zhiling

On Mon, Aug 18, 2025 at 06:04:23PM +0800, Chi Zhiling wrote:
> > Also, boy this is old code.  Basically akpm code from pre-git times.
> > It was quite innovative back then, but everybody who understood it has
> > since moved on,  got senile or probably died.  Oh well.
> 
> Actually, I think this patch is safe, but I'm not sure if we should fix this
> issue. After all, this code has existed for a long time, and it's quite rare
> to unplug the device during a copy operation :)

Converting exfat to use iomap would be a valuable piece of work ...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/3] mpage: terminate read-ahead on read error
  2025-08-18 14:33     ` Matthew Wilcox
@ 2025-08-19 12:22       ` Chi Zhiling
  0 siblings, 0 replies; 7+ messages in thread
From: Chi Zhiling @ 2025-08-19 12:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, linux-fsdevel, linux-mm, linux-kernel,
	Alexander Viro, Christian Brauner, Jan Kara, Namjae Jeon,
	Sungjong Seo, Yuezhang Mo, Chi Zhiling

On 2025/8/18 22:33, Matthew Wilcox wrote:
> On Mon, Aug 18, 2025 at 06:04:23PM +0800, Chi Zhiling wrote:
>>> Also, boy this is old code.  Basically akpm code from pre-git times.
>>> It was quite innovative back then, but everybody who understood it has
>>> since moved on,  got senile or probably died.  Oh well.
>>
>> Actually, I think this patch is safe, but I'm not sure if we should fix this
>> issue. After all, this code has existed for a long time, and it's quite rare
>> to unplug the device during a copy operation :)
> 
> Converting exfat to use iomap would be a valuable piece of work ...

Yes, this is indeed worthwhile, and exFAT should also be restructured to 
support extents rather than fetching entries one by one.

I estimate this would bring significant performance improvements


Thanks,
Chi Zhiling


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-08-19 12:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-12  7:22 [PATCH 1/3] mpage: terminate read-ahead on read error Chi Zhiling
2025-08-12  7:22 ` [PATCH 2/3] mpage: clean up do_mpage_readpage() Chi Zhiling
2025-08-12  7:22 ` [PATCH 3/3] mpage: convert do_mpage_readpage() to return int type Chi Zhiling
2025-08-18  2:41 ` [PATCH 1/3] mpage: terminate read-ahead on read error Andrew Morton
2025-08-18 10:04   ` Chi Zhiling
2025-08-18 14:33     ` Matthew Wilcox
2025-08-19 12:22       ` Chi Zhiling

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).