linux-kernel.vger.kernel.org archive mirror
* [PATCH] f2fs: f2fs supports uncached buffered I/O
@ 2025-07-15  3:10 Qi Han
  2025-07-15  6:58 ` Chao Yu
  2025-07-15 14:28 ` Jens Axboe
  0 siblings, 2 replies; 10+ messages in thread
From: Qi Han @ 2025-07-15  3:10 UTC (permalink / raw)
  To: jaegeuk, chao; +Cc: axboe, linux-f2fs-devel, linux-kernel, Qi Han

Jens has already completed the development of uncached buffered I/O
in git [1], and in f2fs, the feature can be enabled simply by setting
the FOP_DONTCACHE flag in f2fs_file_operations.

[1]
https://lore.kernel.org/all/20241220154831.1086649-10-axboe@kernel.dk/T/#m58520a94b46f543d82db3711453dfc7bb594b2b0

Signed-off-by: Qi Han <hanqi@vivo.com>
---
 fs/f2fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 696131e655ed..d8da1fc2febf 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -5425,5 +5425,5 @@ const struct file_operations f2fs_file_operations = {
 	.splice_read	= f2fs_file_splice_read,
 	.splice_write	= iter_file_splice_write,
 	.fadvise	= f2fs_file_fadvise,
-	.fop_flags	= FOP_BUFFER_RASYNC,
+	.fop_flags	= FOP_BUFFER_RASYNC | FOP_DONTCACHE,
 };
-- 
2.48.1
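Once FOP_DONTCACHE is set on f2fs_file_operations, userspace opts in per I/O by passing RWF_DONTCACHE to preadv2()/pwritev2(). A filesystem without the flag should reject the request with EOPNOTSUPP. The following is a minimal userspace sketch of a reader that asks for an uncached read and falls back to an ordinary buffered read. It makes two assumptions: the RWF_DONTCACHE value comes from the Linux 6.14+ uapi header include/uapi/linux/fs.h (verify it against your own headers), and read_dontcache() and demo_roundtrip() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumed value, copied from include/uapi/linux/fs.h (Linux 6.14+). */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/*
 * Read up to len bytes at off as uncached buffered I/O. If the kernel or
 * the filesystem does not support it (no FOP_DONTCACHE), preadv2() fails
 * with EOPNOTSUPP and we retry as an ordinary buffered read.
 * *uncached is set to 1 only if the RWF_DONTCACHE read was accepted.
 */
static ssize_t read_dontcache(int fd, void *buf, size_t len, off_t off,
			      int *uncached)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret = preadv2(fd, &iov, 1, off, RWF_DONTCACHE);

	*uncached = ret >= 0;
	if (ret < 0 && errno == EOPNOTSUPP)
		ret = preadv2(fd, &iov, 1, off, 0);
	return ret;
}

/* Usage: write a small temporary file and read it back. Returns 0 on success. */
static int demo_roundtrip(void)
{
	static const char msg[] = "hello, uncached";
	char path[] = "/tmp/dontcache-demo-XXXXXX";
	char buf[sizeof(msg)] = { 0 };
	int uncached = 0, rc = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (write(fd, msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
	    read_dontcache(fd, buf, sizeof(buf), 0, &uncached) ==
		    (ssize_t)sizeof(msg) &&
	    memcmp(buf, msg, sizeof(msg)) == 0)
		rc = 0;
	close(fd);
	unlink(path);
	return rc;
}
```

Falling back instead of failing keeps the caller working on older kernels and on filesystems that have not opted in.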


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-15  3:10 [PATCH] f2fs: f2fs supports uncached buffered I/O Qi Han
@ 2025-07-15  6:58 ` Chao Yu
  2025-07-15  8:14   ` hanqi
  2025-07-15 14:28 ` Jens Axboe
  1 sibling, 1 reply; 10+ messages in thread
From: Chao Yu @ 2025-07-15  6:58 UTC (permalink / raw)
  To: Qi Han, jaegeuk; +Cc: chao, axboe, linux-f2fs-devel, linux-kernel

On 7/15/25 11:10, Qi Han wrote:
> Jens has already completed the development of uncached buffered I/O
> in git [1], and in f2fs, the feature can be enabled simply by setting
> the FOP_DONTCACHE flag in f2fs_file_operations.

Hi Qi, do you have any numbers for f2fs before/after this change? Though
I'm not against supporting this feature in f2fs.

Thanks,


* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-15  6:58 ` Chao Yu
@ 2025-07-15  8:14   ` hanqi
  0 siblings, 0 replies; 10+ messages in thread
From: hanqi @ 2025-07-15  8:14 UTC (permalink / raw)
  To: Chao Yu, jaegeuk; +Cc: axboe, linux-f2fs-devel, linux-kernel



On 2025/7/15 14:58, Chao Yu wrote:
> On 7/15/25 11:10, Qi Han wrote:
>> Jens has already completed the development of uncached buffered I/O
>> in git [1], and in f2fs, the feature can be enabled simply by setting
>> the FOP_DONTCACHE flag in f2fs_file_operations.
> Hi Qi, do you have any numbers of f2fs before/after this change? though
> I'm not against supporting this feature in f2fs.
>
> Thanks,

Hi, Chao
I have been testing a use case locally that mirrors Jens' test
case [1]. In the read scenario, uncached buffered I/O gives more
stable read throughput and a much lower load on the background memory
reclaim thread (kswapd).
In the write scenario, however, uncached buffered I/O may not be
suitable for F2FS yet, because F2FS calls folio_end_writeback() from
softirq context, as discussed in [2].

Read test data without uncached buffered I/O:
reading bs 32768, uncached 0
   1s: 1856MB/sec, MB=1856
   2s: 1907MB/sec, MB=3763
   3s: 1830MB/sec, MB=5594
   4s: 1745MB/sec, MB=7333
   5s: 1829MB/sec, MB=9162
   6s: 1903MB/sec, MB=11075
   7s: 1878MB/sec, MB=12942
   8s: 1763MB/sec, MB=14718
   9s: 1845MB/sec, MB=16549
  10s: 1915MB/sec, MB=18481
  11s: 1831MB/sec, MB=20295
  12s: 1750MB/sec, MB=22066
  13s: 1787MB/sec, MB=23832
  14s: 1913MB/sec, MB=25769
  15s: 1898MB/sec, MB=27668
  16s: 1795MB/sec, MB=29436
  17s: 1812MB/sec, MB=31248
  18s: 1890MB/sec, MB=33139
  19s: 1880MB/sec, MB=35020
  20s: 1754MB/sec, MB=36810

08:36:26      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
08:36:27        0        93    0.00    0.00    0.00    0.00    0.00     7  kswapd0
08:36:28        0        93    0.00    0.00    0.00    0.00    0.00     7  kswapd0
08:36:29        0        93    0.00    0.00    0.00    0.00    0.00     7  kswapd0
08:36:30        0        93    0.00   56.00    0.00    0.00   56.00     7  kswapd0
08:36:31        0        93    0.00   73.00    0.00    0.00   73.00     7  kswapd0
08:36:32        0        93    0.00   83.00    0.00    0.00   83.00     7  kswapd0
08:36:33        0        93    0.00   75.00    0.00    0.00   75.00     7  kswapd0
08:36:34        0        93    0.00   81.00    0.00    0.00   81.00     7  kswapd0
08:36:35        0        93    0.00   54.00    0.00    1.00   54.00     2  kswapd0
08:36:36        0        93    0.00   61.00    0.00    0.00   61.00     0  kswapd0
08:36:37        0        93    0.00   68.00    0.00    0.00   68.00     7  kswapd0
08:36:38        0        93    0.00   53.00    0.00    0.00   53.00     2  kswapd0
08:36:39        0        93    0.00   82.00    0.00    0.00   82.00     7  kswapd0
08:36:40        0        93    0.00   77.00    0.00    0.00   77.00     1  kswapd0
08:36:41        0        93    0.00   74.00    0.00    1.00   74.00     7  kswapd0
08:36:42        0        93    0.00   71.00    0.00    0.00   71.00     7  kswapd0
08:36:43        0        93    0.00   78.00    0.00    0.00   78.00     7  kswapd0
08:36:44        0        93    0.00   85.00    0.00    0.00   85.00     7  kswapd0
08:36:45        0        93    0.00   83.00    0.00    0.00   83.00     7  kswapd0
08:36:46        0        93    0.00   70.00    0.00    0.00   70.00     7  kswapd0
08:36:47        0        93    0.00   78.00    0.00    1.00   78.00     2  kswapd0
08:36:48        0        93    0.00   81.00    0.00    0.00   81.00     3  kswapd0
08:36:49        0        93    0.00   54.00    0.00    0.00   54.00     7  kswapd0
08:36:50        0        93    0.00   76.00    0.00    0.00   76.00     1  kswapd0
08:36:51        0        93    0.00   75.00    0.00    0.00   75.00     0  kswapd0
08:36:52        0        93    0.00   73.00    0.00    0.00   73.00     7  kswapd0
08:36:53        0        93    0.00   61.00    0.00    1.00   61.00     7  kswapd0
08:36:54        0        93    0.00   80.00    0.00    0.00   80.00     7  kswapd0
08:36:55        0        93    0.00   64.00    0.00    0.00   64.00     7  kswapd0
08:36:56        0        93    0.00   56.00    0.00    0.00   56.00     7  kswapd0
08:36:57        0        93    0.00   26.00    0.00    0.00   26.00     2  kswapd0
08:36:58        0        93    0.00   24.00    0.00    1.00   24.00     3  kswapd0
08:36:59        0        93    0.00   22.00    0.00    1.00   22.00     3  kswapd0
08:37:00        0        93    0.00   15.84    0.00    0.00   15.84     3  kswapd0
08:37:01        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:37:02        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0

Read test data with uncached buffered I/O:
reading bs 32768, uncached 1
   1s: 1863MB/sec, MB=1863
   2s: 1903MB/sec, MB=3766
   3s: 1860MB/sec, MB=5627
   4s: 1864MB/sec, MB=7491
   5s: 1860MB/sec, MB=9352
   6s: 1854MB/sec, MB=11206
   7s: 1874MB/sec, MB=13081
   8s: 1874MB/sec, MB=14943
   9s: 1840MB/sec, MB=16798
  10s: 1849MB/sec, MB=18647
  11s: 1863MB/sec, MB=20511
  12s: 1798MB/sec, MB=22310
  13s: 1897MB/sec, MB=24207
  14s: 1817MB/sec, MB=26025
  15s: 1893MB/sec, MB=27918
  16s: 1917MB/sec, MB=29836
  17s: 1863MB/sec, MB=31699
  18s: 1904MB/sec, MB=33604
  19s: 1894MB/sec, MB=35499
  20s: 1907MB/sec, MB=37407
  
08:38:00      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
08:38:01        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:02        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:03        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:04        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:05        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:06        0        93    0.00    1.00    0.00    1.00    1.00     0  kswapd0
08:38:07        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:08        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:09        0        93    0.00    1.00    0.00    0.00    1.00     1  kswapd0
08:38:10        0        93    0.00    0.00    0.00    0.00    0.00     1  kswapd0
08:38:11        0        93    0.00    0.00    0.00    0.00    0.00     1  kswapd0
08:38:12        0        93    0.00    0.00    0.00    0.00    0.00     1  kswapd0
08:38:13        0        93    0.00    0.00    0.00    0.00    0.00     1  kswapd0
08:38:14        0        93    0.00    0.00    0.00    0.00    0.00     1  kswapd0
08:38:15        0        93    0.00    3.00    0.00    0.00    3.00     0  kswapd0
08:38:16        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:17        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:18        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:19        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:20        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:21        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:22        0        93    0.00    0.00    0.00    0.00    0.00     0  kswapd0
08:38:23        0        93    0.00    3.00    0.00    0.00    3.00     4  kswapd0
08:38:24        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:25        0        93    0.00    0.00    0.00    0.00    0.00     4  kswapd0
08:38:26        0        93    0.00    4.00    0.00    0.00    4.00     3  kswapd0
08:38:27        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:38:28        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:38:29        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:38:30        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:38:31        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:38:32        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
08:38:33        0        93    0.00    0.00    0.00    0.00    0.00     3  kswapd0
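To put a number on "more stable", here is a small sketch that computes the spread (max minus min) and the sum of the per-second MB/sec samples from the two 20-second runs above. The samples are copied from the output. spread() and sum() are illustrative helper names, and max minus min is only a crude stability measure.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Per-second MB/sec samples copied from the two runs above. */
static const int buffered_mbps[] = {
	1856, 1907, 1830, 1745, 1829, 1903, 1878, 1763, 1845, 1915,
	1831, 1750, 1787, 1913, 1898, 1795, 1812, 1890, 1880, 1754,
};
static const int uncached_mbps[] = {
	1863, 1903, 1860, 1864, 1860, 1854, 1874, 1874, 1840, 1849,
	1863, 1798, 1897, 1817, 1893, 1917, 1863, 1904, 1894, 1907,
};

/* max - min over n > 0 samples. */
static int spread(const int *v, size_t n)
{
	int lo = v[0], hi = v[0];

	for (size_t i = 1; i < n; i++) {
		if (v[i] < lo)
			lo = v[i];
		if (v[i] > hi)
			hi = v[i];
	}
	return hi - lo;
}

/* Sum of n samples; the mean is sum / n. */
static long sum(const int *v, size_t n)
{
	long s = 0;

	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}
```

For these samples, the spread shrinks from 170 MB/sec (1745 to 1915) without uncached I/O to 119 MB/sec (1798 to 1917) with it. The means (sum / 20) are about 1839 and 1870 MB/sec. The kswapd0 %CPU columns show the larger difference.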

[1] https://pastebin.com/u8eCBzB5
[2] https://lore.kernel.org/all/20241220154831.1086649-10-axboe@kernel.dk/T/#m0dff9e4f79c95a75c6b2cf202bc9d3d6f4559723
Thanks,
Qi


* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-15  3:10 [PATCH] f2fs: f2fs supports uncached buffered I/O Qi Han
  2025-07-15  6:58 ` Chao Yu
@ 2025-07-15 14:28 ` Jens Axboe
  2025-07-16  3:34   ` hanqi
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2025-07-15 14:28 UTC (permalink / raw)
  To: Qi Han, jaegeuk, chao; +Cc: linux-f2fs-devel, linux-kernel

On 7/14/25 9:10 PM, Qi Han wrote:
> Jens has already completed the development of uncached buffered I/O
> in git [1], and in f2fs, the feature can be enabled simply by setting
> the FOP_DONTCACHE flag in f2fs_file_operations.

You need to ensure that for any DONTCACHE IO that the completion is
routed via non-irq context, if applicable. I didn't verify that this is
the case for f2fs. Generally you can deduce this as well through
testing, I'd say the following cases would be interesting to test:

1) Normal DONTCACHE buffered read
2) Overwrite DONTCACHE buffered write
3) Append DONTCACHE buffered write

Test those with DEBUG_ATOMIC_SLEEP set in your config, and if that
doesn't complain, that's a great start.
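Below is a minimal userspace sketch of the three cases. It assumes a kernel with RWF_DONTCACHE support, and the flag value comes from the 6.14+ uapi headers. To use it, point dir at an f2fs mount, run it on a kernel built with CONFIG_DEBUG_ATOMIC_SLEEP=y, and check dmesg for "sleeping function called from invalid context" warnings. dc_io() and run_dontcache_cases() are hypothetical names. The fsync() calls make writeback, and therefore its completion, happen while the test is running.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080 /* assumed uapi value, Linux 6.14+ */
#endif

enum dc_case { DC_READ, DC_OVERWRITE, DC_APPEND };

/*
 * Issue one RWF_DONTCACHE I/O of len bytes. Returns bytes transferred,
 * or -1 with errno set; EOPNOTSUPP means uncached buffered I/O was
 * rejected for this file.
 */
static ssize_t dc_io(int fd, enum dc_case c, char *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	off_t end;

	switch (c) {
	case DC_READ:		/* 1) buffered read of existing data */
		return preadv2(fd, &iov, 1, 0, RWF_DONTCACHE);
	case DC_OVERWRITE:	/* 2) rewrite bytes already present at offset 0 */
		return pwritev2(fd, &iov, 1, 0, RWF_DONTCACHE);
	case DC_APPEND:		/* 3) write at EOF, extending the file */
		end = lseek(fd, 0, SEEK_END);
		if (end < 0)
			return -1;
		return pwritev2(fd, &iov, 1, end, RWF_DONTCACHE);
	}
	errno = EINVAL;
	return -1;
}

/*
 * Run the three cases on a scratch file created in dir. fsync() after each
 * write forces writeback, and its completion, to happen before moving on.
 * Returns how many cases the kernel accepted (0..3), or -1 on other errors.
 */
static int run_dontcache_cases(const char *dir)
{
	static const enum dc_case cases[] = { DC_READ, DC_OVERWRITE, DC_APPEND };
	static char buf[4096];	/* one 4 KiB chunk of zeroes */
	char path[4096];
	int accepted = 0, fd;
	int n = snprintf(path, sizeof(path), "%s/dc-cases-XXXXXX", dir);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	/* Seed the file with ordinary buffered data so there is something to read. */
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) || fsync(fd) < 0)
		goto fail;
	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		ssize_t ret = dc_io(fd, cases[i], buf, sizeof(buf));

		if (ret == (ssize_t)sizeof(buf))
			accepted++;
		else if (!(ret < 0 && errno == EOPNOTSUPP))
			goto fail;
		if (cases[i] != DC_READ && fsync(fd) < 0)
			goto fail;
	}
	close(fd);
	unlink(path);
	return accepted;
fail:
	close(fd);
	unlink(path);
	return -1;
}
```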

For the above test cases as well, verify that page cache doesn't grow as
IO is performed. A bit is fine for things like meta data, but generally
you want to see it remain basically flat in terms of page cache usage.
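To check that the page cache stays flat for the file under test, rather than system-wide, one option is to map the file and ask mincore() which pages are resident. The following is a sketch, and resident_pages() is a hypothetical helper. Recent kernels only report real page cache residency through mincore() for files the caller owns or could open for writing, and may report every page as resident otherwise.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return how many pages of the file at path are resident in the page
 * cache, or -1 on error; *npages receives the file size in pages.
 * mmap() without MAP_POPULATE does not fault pages in, so the probe
 * itself does not add the file's data to the page cache.
 */
static long resident_pages(const char *path, long *npages)
{
	long page = sysconf(_SC_PAGESIZE);
	long n, resident = -1;
	struct stat st;
	unsigned char *vec;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	n = (long)((st.st_size + page - 1) / page);
	*npages = n;
	if (n == 0) {		/* mmap() of an empty file would fail */
		close(fd);
		return 0;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;
	vec = malloc((size_t)n);
	if (vec && mincore(map, (size_t)st.st_size, vec) == 0) {
		resident = 0;
		for (long i = 0; i < n; i++)
			resident += vec[i] & 1;	/* bit 0 set: page resident */
	}
	free(vec);
	munmap(map, (size_t)st.st_size);
	return resident;
}
```

Sampling this before and after each DONTCACHE pass should show the resident count staying near zero if the pages are being dropped.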

Maybe this is all fine; like I said, I didn't verify. Just mentioning it
for completeness' sake.

-- 
Jens Axboe


* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-15 14:28 ` Jens Axboe
@ 2025-07-16  3:34   ` hanqi
  2025-07-16  3:43     ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: hanqi @ 2025-07-16  3:34 UTC (permalink / raw)
  To: Jens Axboe, jaegeuk, chao; +Cc: linux-f2fs-devel, linux-kernel



On 2025/7/15 22:28, Jens Axboe wrote:
> On 7/14/25 9:10 PM, Qi Han wrote:
>> Jens has already completed the development of uncached buffered I/O
>> in git [1], and in f2fs, the feature can be enabled simply by setting
>> the FOP_DONTCACHE flag in f2fs_file_operations.
> You need to ensure that for any DONTCACHE IO that the completion is
> routed via non-irq context, if applicable. I didn't verify that this is
> the case for f2fs. Generally you can deduce this as well through
> testing, I'd say the following cases would be interesting to test:
>
> 1) Normal DONTCACHE buffered read
> 2) Overwrite DONTCACHE buffered write
> 3) Append DONTCACHE buffered write
>
> Test those with DEBUG_ATOMIC_SLEEP set in your config, and it that
> doesn't complain, that's a great start.
>
> For the above test cases as well, verify that page cache doesn't grow as
> IO is performed. A bit is fine for things like meta data, but generally
> you want to see it remain basically flat in terms of page cache usage.
>
> Maybe this is all fine, like I said I didn't verify. Just mentioning it
> for completeness sake.

Hi, Jens
Thanks for your suggestion. As I mentioned earlier in [1], in f2fs,
the regular buffered write path invokes folio_end_writeback from a
softirq context. Therefore, it seems that f2fs may not be suitable
for DONTCACHE I/O writes.

I’d like to ask a question: why is DONTCACHE I/O write restricted to
non-interrupt context only? Is it because dropping the page might be
too time-consuming to be done safely in interrupt context? This might
be a naive question, but I’d really appreciate your clarification.
Thanks in advance.

[1] https://lore.kernel.org/all/137c0a07-ea0a-48fa-acc4-3e0ec63681f4@vivo.com/


* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-16  3:34   ` hanqi
@ 2025-07-16  3:43     ` Jens Axboe
  2025-07-16  8:27       ` hanqi
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2025-07-16  3:43 UTC (permalink / raw)
  To: hanqi, jaegeuk, chao; +Cc: linux-f2fs-devel, linux-kernel

On 7/15/25 9:34 PM, hanqi wrote:
> Hi, Jens
> Thanks for your suggestion. As I mentioned earlier in [1], in f2fs,
> the regular buffered write path invokes folio_end_writeback from a
> softirq context. Therefore, it seems that f2fs may not be suitable
> for DONTCACHE I/O writes.
> 
> I'd like to ask a question: why is DONTCACHE I/O write restricted to
> non-interrupt context only? Is it because dropping the page might be
> too time-consuming to be done safely in interrupt context? This might
> be a naive question, but I'd really appreciate your clarification.
> Thanks in advance.

Because (as of right now, at least) the code doing the invalidation
needs process context. There are various reasons for this, which you'll
see if you follow the path from folio_end_writeback() ->
filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
that may be doable, the inode i_lock is not IRQ safe.

Most file systems need to punt some writeback completions to
non-irq context, e.g. for file extension. Hence for most file systems,
the dontcache case just becomes another case that needs to go through
that path.

It'd certainly be possible to improve upon this, for example by having
an opportunistic dontcache unmap from IRQ/soft-irq context, and then
punting to a workqueue if that doesn't pan out. But this doesn't exist
as of yet, hence the need for the workqueue punt.
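The improvement described above follows a common "try the fast path, otherwise defer" pattern. Here is a userspace analogy using C11 atomics; it is not kernel code. A try-lock stands in for the opportunistic unmap from IRQ/soft-irq context, and a spinlock-protected queue drained by a worker stands in for the workqueue punt. All names are hypothetical. In the kernel, this would be built from its own primitives, such as spinlocks and queue_work().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEFERRED 64

/*
 * Userspace analogy, not kernel code; all names are hypothetical.
 * mapping_lock stands in for state that must not be waited on from
 * IRQ/softirq context; queue_lock is a short spinlock that may be taken there.
 */
struct drop_ctx {
	atomic_flag mapping_lock;
	atomic_flag queue_lock;
	int deferred[MAX_DEFERRED];	/* folio ids waiting for the worker */
	size_t ndeferred;		/* protected by queue_lock */
	size_t dropped_inline;		/* protected by mapping_lock */
	size_t dropped_by_worker;	/* protected by mapping_lock */
};

static bool flag_trylock(atomic_flag *f)
{
	return !atomic_flag_test_and_set_explicit(f, memory_order_acquire);
}

static void flag_spin_lock(atomic_flag *f)
{
	while (!flag_trylock(f))
		;	/* busy-wait, as a spinlock would */
}

static void flag_unlock(atomic_flag *f)
{
	atomic_flag_clear_explicit(f, memory_order_release);
}

static void drop_ctx_init(struct drop_ctx *c)
{
	atomic_flag_clear(&c->mapping_lock);
	atomic_flag_clear(&c->queue_lock);
	c->ndeferred = 0;
	c->dropped_inline = 0;
	c->dropped_by_worker = 0;
}

/*
 * Completion path (must not block): drop the folio now if mapping_lock is
 * free, otherwise queue it for the worker. Returns 1 if dropped inline,
 * 0 if deferred, -1 if the queue is full (the folio simply stays cached).
 */
static int complete_and_drop(struct drop_ctx *c, int id)
{
	int ret = 0;

	if (flag_trylock(&c->mapping_lock)) {
		c->dropped_inline++;		/* opportunistic fast path */
		flag_unlock(&c->mapping_lock);
		return 1;
	}
	flag_spin_lock(&c->queue_lock);
	if (c->ndeferred < MAX_DEFERRED)
		c->deferred[c->ndeferred++] = id;	/* punt to the worker */
	else
		ret = -1;
	flag_unlock(&c->queue_lock);
	return ret;
}

/* Worker (allowed to block): snapshot the queue, then drop each folio. */
static size_t drain_deferred(struct drop_ctx *c)
{
	int batch[MAX_DEFERRED];
	size_t n;

	flag_spin_lock(&c->queue_lock);
	n = c->ndeferred;
	memcpy(batch, c->deferred, n * sizeof(batch[0]));
	c->ndeferred = 0;
	flag_unlock(&c->queue_lock);

	for (size_t i = 0; i < n; i++) {
		flag_spin_lock(&c->mapping_lock);	/* would be a sleeping lock */
		(void)batch[i];	/* real code would invalidate folio batch[i] */
		c->dropped_by_worker++;
		flag_unlock(&c->mapping_lock);
	}
	return n;
}
```

In this analogy, the completion path never waits on mapping_lock. It only spins on queue_lock for the few instructions needed to append an entry.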

-- 
Jens Axboe


* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-16  3:43     ` Jens Axboe
@ 2025-07-16  8:27       ` hanqi
  2025-07-24 13:09         ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: hanqi @ 2025-07-16  8:27 UTC (permalink / raw)
  To: Jens Axboe, jaegeuk, chao; +Cc: linux-f2fs-devel, linux-kernel



On 2025/7/16 11:43, Jens Axboe wrote:
> Because (as of right now, at least) the code doing the invalidation
> needs process context. There are various reasons for this, which you'll
> see if you follow the path off folio_end_writeback() ->
> filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
> folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
> that may be doable, the inode i_lock is not IRQ safe.
>
> Most file systems have a need to punt some writeback completions to
> non-irq context, eg for file extending etc. Hence for most file systems,
> the dontcache case just becomes another case that needs to go through
> that path.
>
> It'd certainly be possible to improve upon this, for example by having
> an opportunistic dontcache unmap from IRQ/soft-irq context, and then
> punting to a workqueue if that doesn't pan out. But this doesn't exist
> as of yet, hence the need for the workqueue punt.

Hi, Jens
Thank you for your response. I tested uncached buffered I/O reads of
a 50GB file on a local F2FS filesystem, and the page cache size
increased only slightly, which I believe is as expected.
After dropping the page cache, its size returned close to its
initial value. The test results are as follows:

stat 50G.txt
   File: 50G.txt
   Size: 53687091200      Blocks: 104960712       IO Blocks: 512  regular file

[read before]:
echo 3 > /proc/sys/vm/drop_caches
01:48:17 kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
01:50:59   6404648   8149508   2719384     23.40       512   1898092 199384760    823.75   1846756    466832        44

./uncached_io_test 8192 1 1 50G.txt
Starting 1 threads
reading bs 8192, uncached 1
   1s: 754MB/sec, MB=754
   ...
  64s: 844MB/sec, MB=262144

[read after]:
01:52:33   6326664   8121240   2747968     23.65       728   1947656 199384788    823.75   1887896    502004        68
echo 3 > /proc/sys/vm/drop_caches
01:53:11   6351136   8096936   2772400     23.86       512   1900500 199385216    823.75   1847252    533768       104
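The kbcached column (KiB) gives the page cache growth directly. Here is a small worked sketch of that arithmetic, using the numbers from the sar and stat output above. The helper names are illustrative.

```c
/* sar -r kbcached values (KiB) and the stat size, copied from the output above. */
static const long long kbcached_before = 1898092;  /* before the read */
static const long long kbcached_after = 1947656;   /* right after the read */
static const long long kbcached_dropped = 1900500; /* after drop_caches again */
static const long long file_bytes = 53687091200LL; /* 50G.txt */

/* Page cache growth during the read, in KiB. */
static long long cache_growth_kib(void)
{
	return kbcached_after - kbcached_before;
}

/* What is left once the caches are dropped again, in KiB. */
static long long residual_kib(void)
{
	return kbcached_dropped - kbcached_before;
}

/* Growth as basis points (1/100 of a percent) of the file size, rounded down. */
static long long growth_basis_points(void)
{
	long long file_kib = file_bytes / 1024;

	return cache_growth_kib() * 10000 / file_kib;
}
```

This works out to about 48 MiB of growth (49564 KiB) while reading a 50 GiB file, roughly 9 basis points or under 0.1% of the file size. Once the caches are dropped again, 2408 KiB remains above the starting point.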

Hi Chao,
Given that F2FS currently calls folio_end_writeback in softirq
context for normal writes, could we first support uncached
buffered I/O reads? For uncached buffered I/O writes, would it be
feasible for F2FS to introduce an asynchronous workqueue to handle the
page drop operation in the future? What are your thoughts on this?
Thank you!




* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-16  8:27       ` hanqi
@ 2025-07-24 13:09         ` Chao Yu
  2025-07-25  1:44           ` hanqi
  0 siblings, 1 reply; 10+ messages in thread
From: Chao Yu @ 2025-07-24 13:09 UTC (permalink / raw)
  To: hanqi, Jens Axboe, jaegeuk; +Cc: chao, linux-f2fs-devel, linux-kernel

On 2025/7/16 16:27, hanqi wrote:
> 
> 
> On 2025/7/16 11:43, Jens Axboe wrote:
>> Because (as of right now, at least) the code doing the invalidation
>> needs process context. There are various reasons for this, which you'll
>> see if you follow the path off folio_end_writeback() ->
>> filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
>> folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
>> that may be doable, the inode i_lock is not IRQ safe.
>>
>> Most file systems have a need to punt some writeback completions to
>> non-irq context, eg for file extending etc. Hence for most file systems,
>> the dontcache case just becomes another case that needs to go through
>> that path.
>>
>> It'd certainly be possible to improve upon this, for example by having
>> an opportunistic dontcache unmap from IRQ/soft-irq context, and then
>> punting to a workqueue if that doesn't pan out. But this doesn't exist
>> as of yet, hence the need for the workqueue punt.

Thanks Jens for the detailed explanation.

> 
> Hi, Jens
> Thank you for your response. I tested uncached buffer I/O reads with
> a 50GB dataset on a local F2FS filesystem, and the page cache size
> only increased slightly, which I believe aligns with expectations.
> After clearing the page cache, the page cache size returned to its
> initial state. The test results are as follows:
> 
> stat 50G.txt
>     File: 50G.txt
>     Size: 53687091200      Blocks: 104960712       IO Blocks: 512  regular file
> 
> [read before]:
> echo 3 > /proc/sys/vm/drop_caches
> 01:48:17        kbmemfree kbavail     kbmemused  %memused      kbbuffers kbcached   kbcommit     %commit   kbactive    kbinact     kbdirty
> 01:50:59      6404648   8149508   2719384   23.40     512     1898092   199384760    823.75   1846756    466832     44
> 
> ./uncached_io_test 8192 1 1 50G.txt
> Starting 1 threads
> reading bs 8192, uncached 1
>     1s: 754MB/sec, MB=754
>     ...
>    64s: 844MB/sec, MB=262144
> 
> [read after]:
> 01:52:33      6326664   8121240   2747968    23.65      728     1947656   199384788    823.75   1887896    502004     68
> echo 3 > /proc/sys/vm/drop_caches
> 01:53:11      6351136   8096936   2772400   23.86     512     1900500   199385216    823.75   1847252    533768      104
> 
> Hi Chao,
> Given that F2FS currently calls folio_end_writeback in the softirq
> context for normal write scenarios, could we first support uncached
> buffer I/O reads? For normal uncached buffer I/O writes, would it be
> feasible for F2FS to introduce an asynchronous workqueue to handle the
> page drop operation in the future? What are your thoughts on this?

Qi,

Sorry for the delay.

I think it would be good to support uncached buffered I/O in the read
path first, and then take a look at what we can do for the write path;
let's do this step by step.

Can you please update the patch?
- support read path only
- include test data in commit message

> Thank you!
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-24 13:09         ` Chao Yu
@ 2025-07-25  1:44           ` hanqi
  2025-07-25  2:37             ` Chao Yu
  0 siblings, 1 reply; 10+ messages in thread
From: hanqi @ 2025-07-25  1:44 UTC (permalink / raw)
  To: Chao Yu, Jens Axboe, jaegeuk; +Cc: linux-f2fs-devel, linux-kernel



On 2025/7/24 21:09, Chao Yu wrote:
> On 2025/7/16 16:27, hanqi wrote:
>>
>>
>> 在 2025/7/16 11:43, Jens Axboe 写道:
>>> On 7/15/25 9:34 PM, hanqi wrote:
>>>>
>>>> ? 2025/7/15 22:28, Jens Axboe ??:
>>>>> On 7/14/25 9:10 PM, Qi Han wrote:
>>>>>> Jens has already completed the development of uncached buffered I/O
>>>>>> in git [1], and in f2fs, the feature can be enabled simply by 
>>>>>> setting
>>>>>> the FOP_DONTCACHE flag in f2fs_file_operations.
>>>>> You need to ensure that for any DONTCACHE IO that the completion is
>>>>> routed via non-irq context, if applicable. I didn't verify that 
>>>>> this is
>>>>> the case for f2fs. Generally you can deduce this as well through
>>>>> testing, I'd say the following cases would be interesting to test:
>>>>>
>>>>> 1) Normal DONTCACHE buffered read
>>>>> 2) Overwrite DONTCACHE buffered write
>>>>> 3) Append DONTCACHE buffered write
>>>>>
>>>>> Test those with DEBUG_ATOMIC_SLEEP set in your config, and it that
>>>>> doesn't complain, that's a great start.
>>>>>
>>>>> For the above test cases as well, verify that page cache doesn't 
>>>>> grow as
>>>>> IO is performed. A bit is fine for things like meta data, but 
>>>>> generally
>>>>> you want to see it remain basically flat in terms of page cache 
>>>>> usage.
>>>>>
>>>>> Maybe this is all fine, like I said I didn't verify. Just 
>>>>> mentioning it
>>>>> for completeness sake.
>>>> Hi, Jens
>>>> Thanks for your suggestion. As I mentioned earlier in [1], in f2fs,
>>>> the regular buffered write path invokes folio_end_writeback from a
>>>> softirq context. Therefore, it seems that f2fs may not be suitable
>>>> for DONTCACHE I/O writes.
>>>>
>>>> I?d like to ask a question: why is DONTCACHE I/O write restricted to
>>>> non-interrupt context only? Is it because dropping the page might be
>>>> too time-consuming to be done safely in interrupt context? This might
>>>> be a naive question, but I?d really appreciate your clarification.
>>>> Thanks in advance.
>>> Because (as of right now, at least) the code doing the invalidation
>>> needs process context. There are various reasons for this, which you'll
>>> see if you follow the path off folio_end_writeback() ->
>>> filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
>>> folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
>>> that may be doable, the inode i_lock is not IRQ safe.
>>>
>>> Most file systems have a need to punt some writeback completions to
>>> non-irq context, eg for file extending etc. Hence for most file 
>>> systems,
>>> the dontcache case just becomes another case that needs to go through
>>> that path.
>>>
>>> It'd certainly be possible to improve upon this, for example by having
>>> an opportunistic dontcache unmap from IRQ/soft-irq context, and then
>>> punting to a workqueue if that doesn't pan out. But this doesn't exist
>>> as of yet, hence the need for the workqueue punt.
>
> Thanks Jens for the detailed explanation.
>
>>
>> Hi, Jens
>> Thank you for your response. I tested uncached buffer I/O reads with
>> a 50GB dataset on a local F2FS filesystem, and the page cache size
>> only increased slightly, which I believe aligns with expectations.
>> After clearing the page cache, the page cache size returned to its
>> initial state. The test results are as follows:
>>
>> stat 50G.txt
>>     File: 50G.txt
>>     Size: 53687091200      Blocks: 104960712       IO Blocks: 512    regular file
>>
>> [read before]:
>> echo 3 > /proc/sys/vm/drop_caches
>> 01:48:17  kbmemfree  kbavail  kbmemused  %memused  kbbuffers  kbcached   kbcommit  %commit  kbactive  kbinact  kbdirty
>> 01:50:59    6404648  8149508    2719384     23.40        512   1898092  199384760   823.75   1846756   466832       44
>>
>> ./uncached_io_test 8192 1 1 50G.txt
>> Starting 1 threads
>> reading bs 8192, uncached 1
>>     1s: 754MB/sec, MB=754
>>     ...
>>    64s: 844MB/sec, MB=262144
>>
>> [read after]:
>> 01:52:33    6326664  8121240    2747968     23.65        728   1947656  199384788   823.75   1887896   502004       68
>> echo 3 > /proc/sys/vm/drop_caches
>> 01:53:11    6351136  8096936    2772400     23.86        512   1900500  199385216   823.75   1847252   533768      104
>>
>> Hi Chao,
>> Given that F2FS currently calls folio_end_writeback from softirq
>> context for normal writes, could we first support uncached buffered
>> I/O reads? For normal uncached buffered I/O writes, would it be
>> feasible for F2FS to introduce an asynchronous workqueue to handle the
>> page drop operation in the future? What are your thoughts on this?
>
> Qi,
>
> Sorry for the delay.
>
> I think it would be good to support uncached buffered I/O in the read
> path first, and then take a look at what we can do for the write path.
> Anyway, let's do this step by step.
>
> Can you please update the patch?
> - support read path only
> - include test data in commit message
Chao

I will re-submit a patch to first enable F2FS support for uncached
buffered I/O reads. Following that, I will work on implementing
asynchronous page dropping in F2FS.

Thank you!
>
>> Thank you!
>>
>>



* Re: [PATCH] f2fs: f2fs supports uncached buffered I/O
  2025-07-25  1:44           ` hanqi
@ 2025-07-25  2:37             ` Chao Yu
  0 siblings, 0 replies; 10+ messages in thread
From: Chao Yu @ 2025-07-25  2:37 UTC (permalink / raw)
  To: hanqi, Jens Axboe, jaegeuk; +Cc: chao, linux-f2fs-devel, linux-kernel

On 7/25/2025 9:44 AM, hanqi wrote:
> 
> 
> On 2025/7/24 21:09, Chao Yu wrote:
>> On 2025/7/16 16:27, hanqi wrote:
>>>
>>>
>>> On 2025/7/16 11:43, Jens Axboe wrote:
>>>> On 7/15/25 9:34 PM, hanqi wrote:
>>>>>
>>>>> On 2025/7/15 22:28, Jens Axboe wrote:
>>>>>> On 7/14/25 9:10 PM, Qi Han wrote:
>>>>>>> Jens has already completed the development of uncached buffered I/O
>>>>>>> in git [1], and in f2fs, the feature can be enabled simply by
>>>>>>> setting the FOP_DONTCACHE flag in f2fs_file_operations.
>>>>>> You need to ensure that, for any DONTCACHE IO, the completion is
>>>>>> routed via non-irq context, if applicable. I didn't verify that this
>>>>>> is the case for f2fs. Generally you can deduce this as well through
>>>>>> testing. I'd say the following cases would be interesting to test:
>>>>>>
>>>>>> 1) Normal DONTCACHE buffered read
>>>>>> 2) Overwrite DONTCACHE buffered write
>>>>>> 3) Append DONTCACHE buffered write
>>>>>>
>>>>>> Test those with DEBUG_ATOMIC_SLEEP set in your config, and if that
>>>>>> doesn't complain, that's a great start.
>>>>>>
>>>>>> For the above test cases as well, verify that the page cache doesn't
>>>>>> grow as IO is performed. A bit is fine for things like metadata, but
>>>>>> generally you want to see it remain basically flat in terms of page
>>>>>> cache usage.
>>>>>>
>>>>>> Maybe this is all fine, like I said I didn't verify. Just
>>>>>> mentioning it for completeness' sake.
>>>>> Hi, Jens
>>>>> Thanks for your suggestion. As I mentioned earlier in [1], in f2fs,
>>>>> the regular buffered write path invokes folio_end_writeback from a
>>>>> softirq context. Therefore, it seems that f2fs may not be suitable
>>>>> for DONTCACHE I/O writes.
>>>>>
>>>>> I'd like to ask a question: why is DONTCACHE I/O write restricted to
>>>>> non-interrupt context only? Is it because dropping the page might be
>>>>> too time-consuming to be done safely in interrupt context? This might
>>>>> be a naive question, but I'd really appreciate your clarification.
>>>>> Thanks in advance.
>>>> Because (as of right now, at least) the code doing the invalidation
>>>> needs process context. There are various reasons for this, which you'll
>>>> see if you follow the path of folio_end_writeback() ->
>>>> filemap_end_dropbehind_write() -> filemap_end_dropbehind() ->
>>>> folio_unmap_invalidate(). unmap_mapping_folio() is one case, and while
>>>> that may be doable, the inode i_lock is not IRQ safe.
>>>>
>>>> Most file systems have a need to punt some writeback completions to
>>>> non-irq context, e.g. for file extending etc. Hence for most file
>>>> systems, the dontcache case just becomes another case that needs to
>>>> go through that path.
>>>>
>>>> It'd certainly be possible to improve upon this, for example by having
>>>> an opportunistic dontcache unmap from IRQ/soft-irq context, and then
>>>> punting to a workqueue if that doesn't pan out. But this doesn't exist
>>>> as of yet, hence the need for the workqueue punt.
>>
>> Thanks Jens for the detailed explanation.
>>
>>>
>>> Hi, Jens
>>> Thank you for your response. I tested uncached buffered I/O reads with
>>> a 50GB dataset on a local F2FS filesystem, and the page cache size
>>> only increased slightly, which I believe aligns with expectations.
>>> After clearing the page cache, the page cache size returned to its
>>> initial state. The test results are as follows:
>>>
>>> stat 50G.txt
>>>      File: 50G.txt
>>>      Size: 53687091200      Blocks: 104960712       IO Blocks: 512    regular file
>>>
>>> [read before]:
>>> echo 3 > /proc/sys/vm/drop_caches
>>> 01:48:17  kbmemfree  kbavail  kbmemused  %memused  kbbuffers  kbcached   kbcommit  %commit  kbactive  kbinact  kbdirty
>>> 01:50:59    6404648  8149508    2719384     23.40        512   1898092  199384760   823.75   1846756   466832       44
>>>
>>> ./uncached_io_test 8192 1 1 50G.txt
>>> Starting 1 threads
>>> reading bs 8192, uncached 1
>>>      1s: 754MB/sec, MB=754
>>>      ...
>>>     64s: 844MB/sec, MB=262144
>>>
>>> [read after]:
>>> 01:52:33    6326664  8121240    2747968     23.65        728   1947656  199384788   823.75   1887896   502004       68
>>> echo 3 > /proc/sys/vm/drop_caches
>>> 01:53:11    6351136  8096936    2772400     23.86        512   1900500  199385216   823.75   1847252   533768      104
>>>
>>> Hi Chao,
>>> Given that F2FS currently calls folio_end_writeback from softirq
>>> context for normal writes, could we first support uncached buffered
>>> I/O reads? For normal uncached buffered I/O writes, would it be
>>> feasible for F2FS to introduce an asynchronous workqueue to handle the
>>> page drop operation in the future? What are your thoughts on this?
>>
>> Qi,
>>
>> Sorry for the delay.
>>
>> I think it would be good to support uncached buffered I/O in the read
>> path first, and then take a look at what we can do for the write path.
>> Anyway, let's do this step by step.
>>
>> Can you please update the patch?
>> - support read path only
>> - include test data in commit message
> Chao
> 
> I will re-submit a patch to first enable F2FS support for uncached
> buffered I/O reads. Following that, I will work on implementing
> asynchronous page dropping in F2FS.

Qi, sure, please go ahead, thanks for the work. :)

Thanks,

> 
> Thank you!
>>
>>> Thank you!
>>>
>>>
> 



end of thread, other threads:[~2025-07-25  2:37 UTC | newest]

Thread overview: 10+ messages
2025-07-15  3:10 [PATCH] f2fs: f2fs supports uncached buffered I/O Qi Han
2025-07-15  6:58 ` Chao Yu
2025-07-15  8:14   ` hanqi
2025-07-15 14:28 ` Jens Axboe
2025-07-16  3:34   ` hanqi
2025-07-16  3:43     ` Jens Axboe
2025-07-16  8:27       ` hanqi
2025-07-24 13:09         ` Chao Yu
2025-07-25  1:44           ` hanqi
2025-07-25  2:37             ` Chao Yu
