* [RFC PATCH] ext4: Fix group number calculation
From: Li Zhong @ 2013-07-05 9:29 UTC
To: linux-ext4; +Cc: tytso, adilger.kernel
This patch tries to fix the following BUG:
[ 568.877799] ------------[ cut here ]------------
[ 568.877833] kernel BUG at fs/ext4/mballoc.c:3728!
[ 568.877840] Oops: Exception in kernel mode, sig: 5 [#1]
[ 568.877845] SMP NR_CPUS=32 NUMA pSeries
[ 568.877852] Modules linked in: binfmt_misc
[ 568.877861] CPU: 1 PID: 3516 Comm: fs_mark Not tainted 3.10.0-03216-g7c6809f-dirty #1
[ 568.877867] task: c0000001fb0b8000 ti: c0000001fa954000 task.ti: c0000001fa954000
[ 568.877873] NIP: c0000000002f42a4 LR: c0000000002f4274 CTR: c000000000317ef8
[ 568.877879] REGS: c0000001fa956ed0 TRAP: 0700 Not tainted (3.10.0-03216-g7c6809f-dirty)
[ 568.877884] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 24000428 XER: 00000000
[ 568.877902] SOFTE: 1
[ 568.877905] CFAR: c0000000002b5464
[ 568.877908]
GPR00: 0000000000000001 c0000001fa957150 c000000000c6a408 c0000001fb588000
GPR04: 0000000000003fff c0000001fa9571c0 c0000001fa9571c4 000138098c50625f
GPR08: 1301200000000000 0000000000000002 0000000000000001 0000000000000000
GPR12: 0000000024000422 c00000000f33a300 0000000000008000 c0000001fa9577f0
GPR16: c0000001fb7d0100 c000000000c29190 c0000000007f46e8 c000000000a14672
GPR20: 0000000000000001 0000000000000008 ffffffffffffffff 0000000000000000
GPR24: 0000000000000100 c0000001fa957278 c0000001fdb2bc78 c0000001fa957288
GPR28: 0000000000100100 c0000001fa957288 c0000001fb588000 c0000001fdb2bd10
[ 568.877993] NIP [c0000000002f42a4] .ext4_mb_release_group_pa+0xec/0x1c0
[ 568.877999] LR [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0
[ 568.878004] Call Trace:
[ 568.878008] [c0000001fa957150] [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 (unreliable)
[ 568.878017] [c0000001fa957200] [c0000000002fb070] .ext4_mb_discard_lg_preallocations+0x394/0x444
[ 568.878025] [c0000001fa957340] [c0000000002fb45c] .ext4_mb_release_context+0x33c/0x734
[ 568.878032] [c0000001fa957440] [c0000000002fbcf8] .ext4_mb_new_blocks+0x4a4/0x5f4
[ 568.878039] [c0000001fa957510] [c0000000002ef56c] .ext4_ext_map_blocks+0xc28/0x1178
[ 568.878047] [c0000001fa957640] [c0000000002c1a94] .ext4_map_blocks+0x2c8/0x490
[ 568.878054] [c0000001fa957730] [c0000000002c536c] .ext4_writepages+0x738/0xc60
[ 568.878062] [c0000001fa957950] [c000000000168a78] .do_writepages+0x5c/0x80
[ 568.878069] [c0000001fa9579d0] [c00000000015d1c4] .__filemap_fdatawrite_range+0x88/0xb0
[ 568.878078] [c0000001fa957aa0] [c00000000015d23c] .filemap_write_and_wait_range+0x50/0xfc
[ 568.878085] [c0000001fa957b30] [c0000000002b8edc] .ext4_sync_file+0x220/0x3c4
[ 568.878092] [c0000001fa957be0] [c0000000001f849c] .vfs_fsync_range+0x64/0x80
[ 568.878098] [c0000001fa957c70] [c0000000001f84f0] .vfs_fsync+0x38/0x4c
[ 568.878105] [c0000001fa957d00] [c0000000001f87f4] .do_fsync+0x54/0x90
[ 568.878111] [c0000001fa957db0] [c0000000001f8894] .SyS_fsync+0x28/0x3c
[ 568.878120] [c0000001fa957e30] [c000000000009c88] syscall_exit+0x0/0x7c
[ 568.878125] Instruction dump:
[ 568.878130] 60000000 813d0034 81610070 38000000 7f8b4800 419e001c 813f007c 7d2bfe70
[ 568.878144] 7d604a78 7c005850 54000ffe 7c0007b4 <0b000000> e8a10076 e87f0090 7fa4eb78
[ 568.878160] ---[ end trace 594d911d9654770b ]---
It seems to be caused by ext4_get_group_number() and
ext4_get_group_no_and_offset() calculating different group numbers.
The fix below tries to make ext4_get_group_number() consistent with
ext4_get_group_no_and_offset():
1) subtract the first data block number instead of adding it
2) remove the cluster bits, since they seem to be used for the offset
calculation in ext4_get_group_no_and_offset().
Signed-off-by: Li Zhong <lizhongfs@gmail.com>
---
fs/ext4/balloc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 5833939..938c97e 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -38,9 +38,9 @@ ext4_group_t ext4_get_group_number(struct super_block *sb,
 	ext4_group_t group;
 
 	if (test_opt2(sb, STD_GROUP_SIZE))
-		group = (le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) +
-			 block) >>
-			(EXT4_BLOCK_SIZE_BITS(sb) + EXT4_CLUSTER_BITS(sb) + 3);
+		group = (block -
+			 le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) >>
+			(EXT4_BLOCK_SIZE_BITS(sb) + 3);
 	else
 		ext4_get_group_no_and_offset(sb, block, &group, NULL);
 	return group;
--
1.7.9.5
* Re: [RFC PATCH] ext4: Fix group number calculation
From: Theodore Ts'o @ 2013-07-06 3:11 UTC
To: Li Zhong; +Cc: linux-ext4, adilger.kernel, Lukas Czerner
On Fri, Jul 05, 2013 at 05:29:48PM +0800, Li Zhong wrote:
>
> It seems that it is caused by different group numbers calculated from
> ext4_get_group_number() and ext4_get_group_no_and_offset().
>
> The fix below tries to make ext4_get_group_number() consistent with
> ext4_get_group_no_and_offset():
> 1) minus first data block number instead of adding it
> 2) remove the cluster bits, as seems it is used for offset calculation in
> ext4_get_group_no_and_offset().
Thanks for reporting the problem, but only (1) is a problem. The
cluster bits part is right, although it was never actually going to
be used because we were only setting STD_GROUP_SIZE for non-bigalloc
file systems. The following is, I believe, the correct fix.
- Ted
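To make point (1) concrete, here is a minimal userspace sketch, not kernel code. It assumes a standard group size of clustersize * 8 blocks; struct fs_geom and all function names are hypothetical stand-ins for the superblock fields and helpers discussed above, and every block number passed in is assumed to be >= first_data_block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields used in this thread. */
struct fs_geom {
	uint32_t first_data_block;	/* 1 with a 1k block size, 0 otherwise */
	unsigned int block_size_bits;	/* log2(block size in bytes) */
	unsigned int cluster_bits;	/* log2(blocks per cluster), 0 without bigalloc */
};

/* log2 of a standard group size: one bitmap block has blocksize * 8 bits,
 * and each bit covers one cluster of 2^cluster_bits blocks. */
static inline unsigned int std_group_shift(const struct fs_geom *g)
{
	return g->block_size_bits + g->cluster_bits + 3;
}

/* Reference result: subtract the first data block, then divide.
 * This models what the slow path is described as doing. */
static inline uint64_t group_by_division(const struct fs_geom *g, uint64_t block)
{
	uint64_t blocks_per_group = (uint64_t)1 << std_group_shift(g);

	return (block - g->first_data_block) / blocks_per_group;
}

/* Fast path as originally written: adds the first data block. */
static inline uint64_t group_by_shift_buggy(const struct fs_geom *g, uint64_t block)
{
	return (g->first_data_block + block) >> std_group_shift(g);
}

/* Fast path after the fix: subtracts the first data block. */
static inline uint64_t group_by_shift_fixed(const struct fs_geom *g, uint64_t block)
{
	return (block - g->first_data_block) >> std_group_shift(g);
}

/* Assert that the fixed fast path matches the reference over [first, last]. */
static inline void check_fixed_fast_path(const struct fs_geom *g,
					 uint64_t first, uint64_t last)
{
	for (uint64_t b = first; b <= last; b++)
		assert(group_by_shift_fixed(g, b) == group_by_division(g, b));
}
```

With a 1k block size (first data block 1, 8192 blocks per group), block 8192 still belongs to group 0, but the original fast path reports group 1. When the first data block is 0, all three calculations agree, which is why the problem only shows up on 1k-block file systems.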
From 96eaeed0d7fc516481ed07739826dac840755eb3 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Fri, 5 Jul 2013 23:11:16 -0400
Subject: [PATCH] ext4: fix ext4_get_group_number()
The function ext4_get_group_number() was introduced as an optimization
in commit bd86298e60b8. Unfortunately, this commit incorrectly
calculates the group number for file systems with a 1k block size (when
s_first_data_block is 1 instead of zero). This could cause the
following kernel BUG:
[ 568.877799] ------------[ cut here ]------------
[ 568.877833] kernel BUG at fs/ext4/mballoc.c:3728!
[ 568.877840] Oops: Exception in kernel mode, sig: 5 [#1]
[ 568.877845] SMP NR_CPUS=32 NUMA pSeries
[ 568.877852] Modules linked in: binfmt_misc
[ 568.877861] CPU: 1 PID: 3516 Comm: fs_mark Not tainted 3.10.0-03216-g7c6809f-dirty #1
[ 568.877867] task: c0000001fb0b8000 ti: c0000001fa954000 task.ti: c0000001fa954000
[ 568.877873] NIP: c0000000002f42a4 LR: c0000000002f4274 CTR: c000000000317ef8
[ 568.877879] REGS: c0000001fa956ed0 TRAP: 0700 Not tainted (3.10.0-03216-g7c6809f-dirty)
[ 568.877884] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 24000428 XER: 00000000
[ 568.877902] SOFTE: 1
[ 568.877905] CFAR: c0000000002b5464
[ 568.877908]
GPR00: 0000000000000001 c0000001fa957150 c000000000c6a408 c0000001fb588000
GPR04: 0000000000003fff c0000001fa9571c0 c0000001fa9571c4 000138098c50625f
GPR08: 1301200000000000 0000000000000002 0000000000000001 0000000000000000
GPR12: 0000000024000422 c00000000f33a300 0000000000008000 c0000001fa9577f0
GPR16: c0000001fb7d0100 c000000000c29190 c0000000007f46e8 c000000000a14672
GPR20: 0000000000000001 0000000000000008 ffffffffffffffff 0000000000000000
GPR24: 0000000000000100 c0000001fa957278 c0000001fdb2bc78 c0000001fa957288
GPR28: 0000000000100100 c0000001fa957288 c0000001fb588000 c0000001fdb2bd10
[ 568.877993] NIP [c0000000002f42a4] .ext4_mb_release_group_pa+0xec/0x1c0
[ 568.877999] LR [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0
[ 568.878004] Call Trace:
[ 568.878008] [c0000001fa957150] [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 (unreliable)
[ 568.878017] [c0000001fa957200] [c0000000002fb070] .ext4_mb_discard_lg_preallocations+0x394/0x444
[ 568.878025] [c0000001fa957340] [c0000000002fb45c] .ext4_mb_release_context+0x33c/0x734
[ 568.878032] [c0000001fa957440] [c0000000002fbcf8] .ext4_mb_new_blocks+0x4a4/0x5f4
[ 568.878039] [c0000001fa957510] [c0000000002ef56c] .ext4_ext_map_blocks+0xc28/0x1178
[ 568.878047] [c0000001fa957640] [c0000000002c1a94] .ext4_map_blocks+0x2c8/0x490
[ 568.878054] [c0000001fa957730] [c0000000002c536c] .ext4_writepages+0x738/0xc60
[ 568.878062] [c0000001fa957950] [c000000000168a78] .do_writepages+0x5c/0x80
[ 568.878069] [c0000001fa9579d0] [c00000000015d1c4] .__filemap_fdatawrite_range+0x88/0xb0
[ 568.878078] [c0000001fa957aa0] [c00000000015d23c] .filemap_write_and_wait_range+0x50/0xfc
[ 568.878085] [c0000001fa957b30] [c0000000002b8edc] .ext4_sync_file+0x220/0x3c4
[ 568.878092] [c0000001fa957be0] [c0000000001f849c] .vfs_fsync_range+0x64/0x80
[ 568.878098] [c0000001fa957c70] [c0000000001f84f0] .vfs_fsync+0x38/0x4c
[ 568.878105] [c0000001fa957d00] [c0000000001f87f4] .do_fsync+0x54/0x90
[ 568.878111] [c0000001fa957db0] [c0000000001f8894] .SyS_fsync+0x28/0x3c
[ 568.878120] [c0000001fa957e30] [c000000000009c88] syscall_exit+0x0/0x7c
[ 568.878125] Instruction dump:
[ 568.878130] 60000000 813d0034 81610070 38000000 7f8b4800 419e001c 813f007c 7d2bfe70
[ 568.878144] 7d604a78 7c005850 54000ffe 7c0007b4 <0b000000> e8a10076 e87f0090 7fa4eb78
[ 568.878160] ---[ end trace 594d911d9654770b ]---
In addition, fix the STD_GROUP_SIZE optimization so that it works for
bigalloc file systems as well.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Li Zhong <lizhongfs@gmail.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org # 3.10
---
fs/ext4/balloc.c | 4 ++--
fs/ext4/super.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 5833939..ddd715e 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
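@@ -38,8 +38,8 @@ ext4_group_t ext4_get_group_number(struct super_block *sb,
 	ext4_group_t group;
 
 	if (test_opt2(sb, STD_GROUP_SIZE))
-		group = (le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) +
-			 block) >>
+		group = (block -
+			 le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) >>
 			(EXT4_BLOCK_SIZE_BITS(sb) + EXT4_CLUSTER_BITS(sb) + 3);
 	else
 		ext4_get_group_no_and_offset(sb, block, &group, NULL);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 85b3dd6..8862d4d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3624,10 +3624,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_addr_per_block_bits = ilog2(EXT4_ADDR_PER_BLOCK(sb));
 	sbi->s_desc_per_block_bits = ilog2(EXT4_DESC_PER_BLOCK(sb));
 
-	/* Do we have standard group size of blocksize * 8 blocks ? */
-	if (sbi->s_blocks_per_group == blocksize << 3)
-		set_opt2(sb, STD_GROUP_SIZE);
-
 	for (i = 0; i < 4; i++)
 		sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
 	sbi->s_def_hash_version = es->s_def_hash_version;
@@ -3697,6 +3693,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		goto failed_mount;
 	}
 
+	/* Do we have standard group size of clustersize * 8 blocks ? */
+	if (sbi->s_blocks_per_group == clustersize << 3)
+		set_opt2(sb, STD_GROUP_SIZE);
+
 	/*
 	 * Test whether we have more sectors than will fit in sector_t,
 	 * and whether the max offset is addressable by the page cache.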
* Re: [RFC PATCH] ext4: Fix group number calculation
From: Li Zhong @ 2013-07-07 0:14 UTC
To: Theodore Ts'o; +Cc: linux-ext4, adilger.kernel, Lukas Czerner
On Fri, 2013-07-05 at 23:11 -0400, Theodore Ts'o wrote:
> On Fri, Jul 05, 2013 at 05:29:48PM +0800, Li Zhong wrote:
> >
> > It seems that it is caused by different group numbers calculated from
> > ext4_get_group_number() and ext4_get_group_no_and_offset().
> >
> > The fix below tries to make ext4_get_group_number() consistent with
> > ext4_get_group_no_and_offset():
> > 1) minus first data block number instead of adding it
> > 2) remove the cluster bits, as seems it is used for offset calculation in
> > ext4_get_group_no_and_offset().
>
> Thanks for reporting the problem, but only (1) is a problem. The
> cluster bits part is right, although it wasn't actually ever going to
> be used because we were only setting STD_GROUP_SIZE for non-bigalloc
> file systems. The following is I believe the correct fix.
Ah, yes, it seems correct to fix the place where STD_GROUP_SIZE is set.
So the standard (default) blocks per group is always clustersize * 8:
for non-bigalloc file systems, it equals blocksize * 8 (a cluster has
only 1 block -- cluster_bits is 0)
for bigalloc ones, it equals blocksize * 2 ^ cluster_bits * 8
(a cluster has more than 1 block -- 2 ^ cluster_bits)
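The arithmetic above can be sketched in plain C as follows. This is only an illustration under the assumption stated in this thread (standard group size is clustersize * 8); the function names are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: compares against blocksize * 8, so it can only match
 * when a cluster is a single block (no bigalloc). */
static inline bool std_group_old_check(uint64_t blocks_per_group,
				       unsigned int block_size_bits)
{
	uint64_t blocksize = (uint64_t)1 << block_size_bits;

	return blocks_per_group == blocksize << 3;
}

/* New test: compares against clustersize * 8, where
 * clustersize = blocksize * 2^cluster_bits (in bytes). */
static inline bool std_group_new_check(uint64_t blocks_per_group,
				       unsigned int block_size_bits,
				       unsigned int cluster_bits)
{
	uint64_t clustersize = (uint64_t)1 << (block_size_bits + cluster_bits);

	return blocks_per_group == clustersize << 3;
}

/* Shift that turns a block number into a group number when the
 * group size is standard: log2(blocksize * 2^cluster_bits * 8). */
static inline unsigned int std_group_shift_bits(unsigned int block_size_bits,
						unsigned int cluster_bits)
{
	unsigned int shift = block_size_bits + cluster_bits + 3;

	assert(shift < 64);	/* keep the shift valid for 64-bit block numbers */
	return shift;
}
```

For example, with 4k blocks and cluster_bits = 4 (64k clusters), a standard group holds 4096 * 16 * 8 = 524288 blocks. The old test compares that against 32768 and fails, while the new test matches, and the corresponding shift is 12 + 4 + 3 = 19.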
Thanks, Zhong
>
> - Ted
>
> From 96eaeed0d7fc516481ed07739826dac840755eb3 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Fri, 5 Jul 2013 23:11:16 -0400
> Subject: [PATCH] ext4: fix ext4_get_group_number()
>
> The function ext4_get_group_number() was introduced as an optimization
> in commit bd86298e60b8. Unfortunately, this commit incorrectly
> calculate the group number for file systems with a 1k block size (when
> s_first_data_block is 1 instead of zero). This could cause the
> following kernel BUG:
>
> [...]
>
> In addition fix the STD_GROUP optimization so that it works for
> bigalloc file systems as well.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Reported-by: Li Zhong <lizhongfs@gmail.com>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Cc: stable@vger.kernel.org # 3.10
> ---
> fs/ext4/balloc.c | 4 ++--
> fs/ext4/super.c | 8 ++++----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 5833939..ddd715e 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -38,8 +38,8 @@ ext4_group_t ext4_get_group_number(struct super_block *sb,
> ext4_group_t group;
>
> if (test_opt2(sb, STD_GROUP_SIZE))
> - group = (le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) +
> - block) >>
> + group = (block -
> + le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) >>
> (EXT4_BLOCK_SIZE_BITS(sb) + EXT4_CLUSTER_BITS(sb) + 3);
> else
> ext4_get_group_no_and_offset(sb, block, &group, NULL);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 85b3dd6..8862d4d 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3624,10 +3624,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> sbi->s_addr_per_block_bits = ilog2(EXT4_ADDR_PER_BLOCK(sb));
> sbi->s_desc_per_block_bits = ilog2(EXT4_DESC_PER_BLOCK(sb));
>
> - /* Do we have standard group size of blocksize * 8 blocks ? */
> - if (sbi->s_blocks_per_group == blocksize << 3)
> - set_opt2(sb, STD_GROUP_SIZE);
> -
> for (i = 0; i < 4; i++)
> sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
> sbi->s_def_hash_version = es->s_def_hash_version;
> @@ -3697,6 +3693,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> goto failed_mount;
> }
>
> + /* Do we have standard group size of clustersize * 8 blocks ? */
> + if (sbi->s_blocks_per_group == clustersize << 3)
> + set_opt2(sb, STD_GROUP_SIZE);
> +
> /*
> * Test whether we have more sectors than will fit in sector_t,
> * and whether the max offset is addressable by the page cache.
* Re: [RFC PATCH] ext4: Fix group number calculation
From: Lukáš Czerner @ 2013-07-08 6:25 UTC
To: Theodore Ts'o; +Cc: Li Zhong, linux-ext4, adilger.kernel
On Fri, 5 Jul 2013, Theodore Ts'o wrote:
> Date: Fri, 5 Jul 2013 23:11:47 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Li Zhong <lizhongfs@gmail.com>
> Cc: linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca,
> Lukas Czerner <lczerner@redhat.com>
> Subject: Re: [RFC PATCH] ext4: Fix group number calculation
>
> On Fri, Jul 05, 2013 at 05:29:48PM +0800, Li Zhong wrote:
> >
> > It seems that it is caused by different group numbers calculated from
> > ext4_get_group_number() and ext4_get_group_no_and_offset().
> >
> > The fix below tries to make ext4_get_group_number() consistent with
> > ext4_get_group_no_and_offset():
> > 1) minus first data block number instead of adding it
> > 2) remove the cluster bits, as seems it is used for offset calculation in
> > ext4_get_group_no_and_offset().
>
> Thanks for reporting the problem, but only (1) is a problem. The
> cluster bits part is right, although it wasn't actually ever going to
> be used because we were only setting STD_GROUP_SIZE for non-bigalloc
> file systems. The following is I believe the correct fix.
>
> - Ted
>
> From 96eaeed0d7fc516481ed07739826dac840755eb3 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Fri, 5 Jul 2013 23:11:16 -0400
> Subject: [PATCH] ext4: fix ext4_get_group_number()
>
> The function ext4_get_group_number() was introduced as an optimization
> in commit bd86298e60b8. Unfortunately, this commit incorrectly
> calculate the group number for file systems with a 1k block size (when
> s_first_data_block is 1 instead of zero). This could cause the
> following kernel BUG:
Yes this is a correct fix, thanks!
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
>
> [...]
>
> In addition fix the STD_GROUP optimization so that it works for
> bigalloc file systems as well.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Reported-by: Li Zhong <lizhongfs@gmail.com>
> Cc: Lukas Czerner <lczerner@redhat.com>
> Cc: stable@vger.kernel.org # 3.10
> ---
> fs/ext4/balloc.c | 4 ++--
> fs/ext4/super.c | 8 ++++----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 5833939..ddd715e 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -38,8 +38,8 @@ ext4_group_t ext4_get_group_number(struct super_block *sb,
> ext4_group_t group;
>
> if (test_opt2(sb, STD_GROUP_SIZE))
> - group = (le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block) +
> - block) >>
> + group = (block -
> + le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) >>
> (EXT4_BLOCK_SIZE_BITS(sb) + EXT4_CLUSTER_BITS(sb) + 3);
> else
> ext4_get_group_no_and_offset(sb, block, &group, NULL);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 85b3dd6..8862d4d 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3624,10 +3624,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> sbi->s_addr_per_block_bits = ilog2(EXT4_ADDR_PER_BLOCK(sb));
> sbi->s_desc_per_block_bits = ilog2(EXT4_DESC_PER_BLOCK(sb));
>
> - /* Do we have standard group size of blocksize * 8 blocks ? */
> - if (sbi->s_blocks_per_group == blocksize << 3)
> - set_opt2(sb, STD_GROUP_SIZE);
> -
> for (i = 0; i < 4; i++)
> sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
> sbi->s_def_hash_version = es->s_def_hash_version;
> @@ -3697,6 +3693,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> goto failed_mount;
> }
>
> + /* Do we have standard group size of clustersize * 8 blocks ? */
> + if (sbi->s_blocks_per_group == clustersize << 3)
> + set_opt2(sb, STD_GROUP_SIZE);
> +
> /*
> * Test whether we have more sectors than will fit in sector_t,
> * and whether the max offset is addressable by the page cache.
>