* Re: [PATCH] exfat: enable request merging for dir readahead
2025-04-07 10:23 ` [PATCH] exfat: enable request merging for dir readahead Anthony Iliopoulos
@ 2025-04-07 13:00 ` Namjae Jeon
2025-04-08 1:15 ` Sungjong Seo
1 sibling, 0 replies; 3+ messages in thread
From: Namjae Jeon @ 2025-04-07 13:00 UTC (permalink / raw)
To: Anthony Iliopoulos; +Cc: Sungjong Seo, Yuezhang Mo, linux-fsdevel, linux-kernel
On Mon, Apr 7, 2025 at 7:23 PM Anthony Iliopoulos <ailiop@suse.com> wrote:
>
> Directory listings that need to access inode metadata (e.g. via statx
> to obtain file types) on large filesystems, where much of that metadata
> is not yet in the dcache, take a long time: directory readahead submits
> one I/O request at a time, and although these requests target
> sequential disk sectors (up to EXFAT_MAX_RA_SIZE), they are not merged
> at the block layer.
>
> Add plugging around sb_breadahead so that the requests are batched and
> submitted to the block layer together, where the I/O schedulers can
> merge them, instead of each request being submitted individually to
> the hardware queues.
>
> This significantly improves directory listing throughput, since it
> also minimizes the number of I/O completions and the associated
> handling on the device driver side.
>
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
> fs/exfat/dir.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
Hi Anthony,
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
> {
> struct exfat_sb_info *sbi = EXFAT_SB(sb);
> struct buffer_head *bh;
> + struct blk_plug plug;
> unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
> unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
> unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
> if (!bh || !buffer_uptodate(bh)) {
> unsigned int i;
It would be better to move the plug declaration here, into the block that uses it.
Thanks!
>
> + blk_start_plug(&plug);
> for (i = 0; i < ra_count; i++)
> sb_breadahead(sb, (sector_t)(sec + i));
> + blk_finish_plug(&plug);
> }
> brelse(bh);
> return 0;
> --
> 2.49.0
>
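The batching that the commit message describes can be illustrated with a small userspace model. This is a sketch, not kernel code: model_queue, model_readahead and the other model_* names are hypothetical. The model assumes that without a plug each readahead reaches the device as its own request (as the commit message reports for this path), and that finishing a plug merges every run of consecutive queued sectors into one request. Real merging is also bounded by queue limits such as max_sectors, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PENDING 256

/* Hypothetical request queue with an optional plug; not struct blk_plug. */
struct model_queue {
	bool plugged;                             /* between start and finish */
	unsigned long pending[MODEL_MAX_PENDING]; /* queued sector numbers */
	size_t npending;
	unsigned int dispatched;                  /* requests sent to the device */
};

/* Send pending sectors to the "device", merging each run of consecutive
 * sectors into one request (assumes sectors were queued in ascending order). */
static void model_flush(struct model_queue *q)
{
	for (size_t i = 0; i < q->npending; i++)
		if (i == 0 || q->pending[i] != q->pending[i - 1] + 1)
			q->dispatched++;
	q->npending = 0;
}

static void model_start_plug(struct model_queue *q)
{
	q->plugged = true;
}

static void model_finish_plug(struct model_queue *q)
{
	model_flush(q);
	q->plugged = false;
}

/* Stand-in for sb_breadahead(): without a plug, the request goes out at once. */
static void model_readahead(struct model_queue *q, unsigned long sector)
{
	assert(q->npending < MODEL_MAX_PENDING);
	q->pending[q->npending++] = sector;
	if (!q->plugged)
		model_flush(q);
}

/* Read ahead ra_count sectors from sec; return the number of device requests. */
static unsigned int model_dir_readahead(unsigned long sec, unsigned int ra_count,
					bool use_plug)
{
	struct model_queue q = { 0 };

	if (use_plug)
		model_start_plug(&q);
	for (unsigned int i = 0; i < ra_count; i++)
		model_readahead(&q, sec + i);
	if (use_plug)
		model_finish_plug(&q);
	return q.dispatched;
}
```

Under these assumptions, model_dir_readahead(100, 32, false) issues 32 device requests, while model_dir_readahead(100, 32, true) issues 1.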
* RE: [PATCH] exfat: enable request merging for dir readahead
From: Sungjong Seo @ 2025-04-08 1:15 UTC (permalink / raw)
To: 'Anthony Iliopoulos', 'Namjae Jeon', 'Yuezhang Mo'
Cc: linux-fsdevel, linux-kernel, sjdev.seo, cpgs, sj1557.seo
Hi, Anthony
> Directory listings that need to access inode metadata (e.g. via statx
> to obtain file types) on large filesystems, where much of that metadata
> is not yet in the dcache, take a long time: directory readahead submits
> one I/O request at a time, and although these requests target
> sequential disk sectors (up to EXFAT_MAX_RA_SIZE), they are not merged
> at the block layer.
>
> Add plugging around sb_breadahead so that the requests are batched and
> submitted to the block layer together, where the I/O schedulers can
> merge them, instead of each request being submitted individually to
> the hardware queues.
>
> This significantly improves directory listing throughput, since it
> also minimizes the number of I/O completions and the associated
> handling on the device driver side.
Good approach. However, this was already tried in earlier Samsung code, and
it increased the latency of directory-related operations when ra_count was
large (presumably at MAX_RA_SIZE).
In the most recent version of that code, the plug is flushed with
blk_flush_plug at every page boundary, as follows.
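```
blk_start_plug(&plug);
for (i = 0; i < ra_count; i++) {
	/* flush the plug each time i reaches a new page (sects_per_page is a power of two) */
	if (i && !(i & (sects_per_page - 1)))
		blk_flush_plug(&plug, false);
	sb_breadahead(sb, sec + i);
}
/* submit whatever is still plugged */
blk_finish_plug(&plug);
```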
However, blk_flush_plug is not exported, so it can no longer be used when
exfat is built as a module. Either blk_flush_plug needs to be exported, or
the code should call blk_start_plug and blk_finish_plug repeatedly, once per
page's worth of sectors.
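The trade-off between one large plug and per-page plugging can be sketched with another userspace model. It assumes that every flush of a plug dispatches its pending sectors as one merged request, and it uses the number of sectors queued before the first dispatch as a rough, hypothetical proxy for the latency described above. ra_flush_per_page() mirrors the blk_flush_plug() loop, and ra_replug_per_page() mirrors the alternative of calling blk_start_plug()/blk_finish_plug() once per page; struct ra_stats and all function names are hypothetical. sects_per_page is assumed to be a power of two, which holds when it is computed as PAGE_SIZE >> s_blocksize_bits (like page_ra_count in the patch).

```c
#include <assert.h>

/* Hypothetical per-readahead statistics; not a kernel structure. */
struct ra_stats {
	unsigned int dispatched;          /* merged requests sent to the device */
	unsigned int queued;              /* sectors queued so far */
	unsigned int pending;             /* sectors waiting in the current plug */
	unsigned int queued_before_first; /* sectors queued when the first request went out */
};

/* Flush the plug: assume the pending (consecutive) sectors merge into one request. */
static void model_flush(struct ra_stats *s)
{
	if (s->pending == 0)
		return;
	if (s->dispatched == 0)
		s->queued_before_first = s->queued;
	s->dispatched++;
	s->pending = 0;
}

/* Stand-in for sb_breadahead() issued while a plug is active. */
static void model_queue_sector(struct ra_stats *s)
{
	s->queued++;
	s->pending++;
}

/* One plug for the whole loop, flushed at every page boundary. */
static struct ra_stats ra_flush_per_page(unsigned int ra_count,
					 unsigned int sects_per_page)
{
	struct ra_stats s = { 0 };

	/* The bitmask test below only detects page boundaries for powers of two. */
	assert(sects_per_page && !(sects_per_page & (sects_per_page - 1)));
	for (unsigned int i = 0; i < ra_count; i++) {
		if (i && !(i & (sects_per_page - 1)))
			model_flush(&s);	/* like blk_flush_plug() */
		model_queue_sector(&s);
	}
	model_flush(&s);			/* like blk_finish_plug() */
	return s;
}

/* A fresh plug per page-sized chunk, using only start/finish. */
static struct ra_stats ra_replug_per_page(unsigned int ra_count,
					  unsigned int sects_per_page)
{
	struct ra_stats s = { 0 };

	assert(sects_per_page > 0);
	for (unsigned int i = 0; i < ra_count; i += sects_per_page) {
		unsigned int end = ra_count - i < sects_per_page ?
				   ra_count : i + sects_per_page;

		/* like blk_start_plug() */
		for (unsigned int j = i; j < end; j++)
			model_queue_sector(&s);
		model_flush(&s);		/* like blk_finish_plug() */
	}
	return s;
}
```

With ra_count = 32 and sects_per_page = 8, both variants issue 4 requests and dispatch the first one after 8 sectors are queued, whereas a single plug, modeled as ra_replug_per_page(32, 32), issues 1 request but only after all 32 sectors are queued. The model says nothing about absolute timings, which only a real measurement can show.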
After switching to per-page plugging, could you also compare the throughput?
Thanks
>
> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
> ---
> fs/exfat/dir.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
> index 3103b932b674..a46ab2690b4d 100644
> --- a/fs/exfat/dir.c
> +++ b/fs/exfat/dir.c
> @@ -621,6 +621,7 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
> {
> struct exfat_sb_info *sbi = EXFAT_SB(sb);
> struct buffer_head *bh;
> + struct blk_plug plug;
> unsigned int max_ra_count = EXFAT_MAX_RA_SIZE >> sb->s_blocksize_bits;
> unsigned int page_ra_count = PAGE_SIZE >> sb->s_blocksize_bits;
> unsigned int adj_ra_count = max(sbi->sect_per_clus, page_ra_count);
> @@ -644,8 +645,10 @@ static int exfat_dir_readahead(struct super_block *sb, sector_t sec)
> if (!bh || !buffer_uptodate(bh)) {
> unsigned int i;
>
> + blk_start_plug(&plug);
> for (i = 0; i < ra_count; i++)
> sb_breadahead(sb, (sector_t)(sec + i));
> + blk_finish_plug(&plug);
> }
> brelse(bh);
> return 0;
> --
> 2.49.0