* [PATCH] ext4: check flags's EXT4_GET_BLOCKS_DELALLOC_RESERVE before call ext4_find_delalloc_cluster()
@ 2011-12-07 6:04 Robin Dong
2011-12-07 11:02 ` Yongqiang Yang
2011-12-08 6:59 ` [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc" Robin Dong
0 siblings, 2 replies; 5+ messages in thread
From: Robin Dong @ 2011-12-07 6:04 UTC
To: linux-ext4; +Cc: Robin Dong
From: Robin Dong <sanbai@taobao.com>
We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
"dd" takes less than 1 second.
The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_cluster() scans all pages
in the cluster because no buffer is "delayed".
A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
severely hurts performance.
Therefore, we don't call ext4_find_delalloc_cluster() when using "nodelalloc".
Signed-off-by: Robin Dong <sanbai@taobao.com>
---
fs/ext4/extents.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..e15d32b 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3724,7 +3724,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
if (!(flags & EXT4_GET_BLOCKS_PUNCH_OUT_EXT) &&
ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
if (!newex.ee_start_lo && !newex.ee_start_hi) {
- if ((sbi->s_cluster_ratio > 1) &&
+ if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+ (sbi->s_cluster_ratio > 1) &&
ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
map->m_flags |= EXT4_MAP_FROM_CLUSTER;
@@ -3900,7 +3901,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
}
}
- if ((sbi->s_cluster_ratio > 1) &&
+ if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+ (sbi->s_cluster_ratio > 1) &&
ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
map->m_flags |= EXT4_MAP_FROM_CLUSTER;
--
1.7.4.1
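
The "256 * 256k pages" figure in the description above can be checked with a
little arithmetic. The following is only a sketch: the helper names are
hypothetical (they are not kernel functions), a 4 KiB page size is assumed
(which is what "256 pages per 1MB cluster" implies), and it assumes one
mapping call per page written, each rescanning the whole cluster.

```c
#include <stdint.h>

/* Hypothetical helper: number of pages covered by one bigalloc cluster. */
static uint64_t pages_per_cluster(uint64_t cluster_bytes, uint64_t page_bytes)
{
    return cluster_bytes / page_bytes;
}

/*
 * Worst-case page lookups for a sequential write, assuming every
 * per-page mapping call rescans all pages of the enclosing cluster.
 */
static uint64_t worst_case_page_scans(uint64_t file_bytes,
                                      uint64_t cluster_bytes,
                                      uint64_t page_bytes)
{
    /* Assumption: one mapping call per page written. */
    uint64_t map_calls = file_bytes / page_bytes;

    return map_calls * pages_per_cluster(cluster_bytes, page_bytes);
}
```

Under these assumptions a 1 GiB file with 1 MiB clusters gives 256k mapping
calls times 256 pages each, i.e. roughly 64 million page lookups.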
* Re: [PATCH] ext4: check flags's EXT4_GET_BLOCKS_DELALLOC_RESERVE before call ext4_find_delalloc_cluster()
2011-12-07 6:04 [PATCH] ext4: check flags's EXT4_GET_BLOCKS_DELALLOC_RESERVE before call ext4_find_delalloc_cluster() Robin Dong
@ 2011-12-07 11:02 ` Yongqiang Yang
2011-12-08 6:59 ` [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc" Robin Dong
1 sibling, 0 replies; 5+ messages in thread
From: Yongqiang Yang @ 2011-12-07 11:02 UTC
To: Robin Dong; +Cc: linux-ext4, Robin Dong
Hi Robin,
If a file system is mounted with delalloc and it changes to nodelalloc
mode thereafter, does the patch work?
Yongqiang.
On Wed, Dec 7, 2011 at 2:04 PM, Robin Dong <hao.bigrat@gmail.com> wrote:
> From: Robin Dong <sanbai@taobao.com>
>
> We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
>
> 1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
> 2. mount -o nodelalloc /dev/sda /test/
> 3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
>
> The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
> "dd" takes less than 1 second.
>
> The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
> every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_cluster() scans all pages
> in the cluster because no buffer is "delayed".
> A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
> severely hurts performance.
>
> Therefore, we don't call ext4_find_delalloc_cluster() when using "nodelalloc".
>
> Signed-off-by: Robin Dong <sanbai@taobao.com>
> ---
> fs/ext4/extents.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 61fa9e1..e15d32b 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3724,7 +3724,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
> if (!(flags & EXT4_GET_BLOCKS_PUNCH_OUT_EXT) &&
> ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
> if (!newex.ee_start_lo && !newex.ee_start_hi) {
> - if ((sbi->s_cluster_ratio > 1) &&
> + if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> + (sbi->s_cluster_ratio > 1) &&
> ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
> map->m_flags |= EXT4_MAP_FROM_CLUSTER;
>
> @@ -3900,7 +3901,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
> }
> }
>
> - if ((sbi->s_cluster_ratio > 1) &&
> + if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
> + (sbi->s_cluster_ratio > 1) &&
> ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
> map->m_flags |= EXT4_MAP_FROM_CLUSTER;
>
> --
> 1.7.4.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Best Wishes
Yongqiang Yang
* [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc"
2011-12-07 6:04 [PATCH] ext4: check flags's EXT4_GET_BLOCKS_DELALLOC_RESERVE before call ext4_find_delalloc_cluster() Robin Dong
2011-12-07 11:02 ` Yongqiang Yang
@ 2011-12-08 6:59 ` Robin Dong
2011-12-08 8:36 ` Yongqiang Yang
2011-12-19 15:39 ` Ted Ts'o
1 sibling, 2 replies; 5+ messages in thread
From: Robin Dong @ 2011-12-08 6:59 UTC
To: linux-ext4; +Cc: Robin Dong
From: Robin Dong <sanbai@taobao.com>
We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
"dd" takes less than 1 second.
The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() scans all pages
in the cluster because no buffer is "delayed".
A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
severely hurts performance.
Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".
Signed-off-by: Robin Dong <sanbai@taobao.com>
---
fs/ext4/extents.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..60f5f25 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3282,6 +3282,9 @@ static int ext4_find_delalloc_range(struct inode *inode,
ext4_lblk_t i, pg_lblk;
pgoff_t index;
+ if (!test_opt(inode->i_sb, DELALLOC))
+ return 0;
+
/* reverse search wont work if fs block size is less than page size */
if (inode->i_blkbits < PAGE_CACHE_SHIFT)
search_hint_reverse = 0;
--
1.7.4.1
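
The effect of the early return added in v2 can be illustrated with a small
user-space model. This is a sketch only, not the kernel code: struct
toy_mount and toy_find_delalloc_range() are hypothetical stand-ins for the
mount-option test done by test_opt() and for the page scan in
ext4_find_delalloc_range(), and the "delayed" state is reduced to a plain
array of booleans.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mount state consulted by test_opt(). */
struct toy_mount {
    bool delalloc;        /* false when mounted with "nodelalloc" */
    unsigned long scans;  /* number of pages examined so far */
};

/*
 * Toy cluster scan: returns 1 if any page in the cluster is marked
 * delayed, 0 otherwise. Every page examined is counted in m->scans.
 */
static int toy_find_delalloc_range(struct toy_mount *m,
                                   const bool *delayed, size_t npages)
{
    /* v2-style early exit: without delalloc nothing can be delayed. */
    if (!m->delalloc)
        return 0;

    for (size_t i = 0; i < npages; i++) {
        m->scans++;
        if (delayed[i])
            return 1;
    }
    return 0;
}
```

In this model a "nodelalloc" mount never touches a page, while a delalloc
mount with no delayed buffers still walks all 256 pages of a 1MB cluster,
which is the cost the patch description refers to.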
* Re: [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc"
2011-12-08 6:59 ` [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc" Robin Dong
@ 2011-12-08 8:36 ` Yongqiang Yang
2011-12-19 15:39 ` Ted Ts'o
1 sibling, 0 replies; 5+ messages in thread
From: Yongqiang Yang @ 2011-12-08 8:36 UTC
To: Robin Dong; +Cc: linux-ext4, Ted Ts'o
On Thu, Dec 8, 2011 at 2:59 PM, Robin Dong <hao.bigrat@gmail.com> wrote:
> From: Robin Dong <sanbai@taobao.com>
>
> We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
>
> 1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
> 2. mount -o nodelalloc /dev/sda /test/
> 3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
>
> The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
> "dd" takes less than 1 second.
>
> The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
> every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() scans all pages
> in the cluster because no buffer is "delayed".
> A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
> severely hurts performance.
Looks good to me.
I think the delayed extent tree can help a lot in the delalloc case when a
cluster has hundreds of pages.
Hi Ted,
Any plans to merge the delayed extent tree patches?
Yongqiang.
>
> Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".
>
> Signed-off-by: Robin Dong <sanbai@taobao.com>
> ---
> fs/ext4/extents.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 61fa9e1..60f5f25 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3282,6 +3282,9 @@ static int ext4_find_delalloc_range(struct inode *inode,
> ext4_lblk_t i, pg_lblk;
> pgoff_t index;
>
> + if (!test_opt(inode->i_sb, DELALLOC))
> + return 0;
> +
> /* reverse search wont work if fs block size is less than page size */
> if (inode->i_blkbits < PAGE_CACHE_SHIFT)
> search_hint_reverse = 0;
> --
> 1.7.4.1
>
--
Best Wishes
Yongqiang Yang
* Re: [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc"
2011-12-08 6:59 ` [PATCH v2] ext4: directly leave out of ext4_find_delalloc_range() if filesystem mount with "nodelalloc" Robin Dong
2011-12-08 8:36 ` Yongqiang Yang
@ 2011-12-19 15:39 ` Ted Ts'o
1 sibling, 0 replies; 5+ messages in thread
From: Ted Ts'o @ 2011-12-19 15:39 UTC
To: Robin Dong; +Cc: linux-ext4, Robin Dong
On Thu, Dec 08, 2011 at 02:59:54PM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai@taobao.com>
>
> We found a performance regression when using bigalloc with "nodelalloc" (1MB cluster size):
>
> 1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
> 2. mount -o nodelalloc /dev/sda /test/
> 3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
>
> The "dd" takes about 2 seconds to finish, but if we run mke2fs without "bigalloc",
> "dd" takes less than 1 second.
>
> The reason is: when ext4 is mounted with "nodelalloc", it calls ext4_find_delalloc_cluster() nearly
> every time it calls ext4_ext_map_blocks(), and ext4_find_delalloc_range() scans all pages
> in the cluster because no buffer is "delayed".
> A cluster has 256 pages (1MB cluster), so it scans 256 * 256k pages when creating a 1G file. That
> severely hurts performance.
>
> Therefore, we return early from ext4_find_delalloc_range() when using "nodelalloc".
>
> Signed-off-by: Robin Dong <sanbai@taobao.com>
Thanks, applied.
- Ted