[BUG][Bigalloc] applications will be blocked for more than 120s when we run xfstests #083

* [BUG][Bigalloc] applications will be blocked for more than 120s when we run xfstests #083
@ 2013-03-07 12:11 Zheng Liu
  2013-03-07 14:07 ` Lukáš Czerner
  0 siblings, 1 reply; 3+ messages in thread
From: Zheng Liu @ 2013-03-07 12:11 UTC (permalink / raw)
  To: linux-ext4

Hi all,

This bug has been confirmed by Ted and Lukas.  When we run xfstests #083
on an ext4 file system with the bigalloc feature, the test is blocked for
more than 120s.  I hit this bug on a 3.8 kernel, and I can confirm that
it has not been fixed in the dev branch so far.  The bug is hard to hit
in my sandbox; I need to run the following commands to trigger it.

  for i in {0..9}
  do
    ./check 083
  done

My sandbox is a Dell desktop with an Intel(R) Core(TM)2 Duo CPU E8400
@ 3.00GHz, 4G of memory, a 160G HDD and an Intel SSD.  The test runs
against the SSD.

On the 3.8 kernel, we get the following messages in dmesg, and the test
is blocked for more than 120s:

Mar  7 15:15:17 lz-desktop wenqing: run xfstest 083
Mar  7 15:15:17 lz-desktop kernel: EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): delayed block allocation failed for inode 32 at logical offset 631 with max blocks 29 with error -28
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): This should not happen!! Data will be lost
Mar  7 15:15:18 lz-desktop kernel:
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): Total free blocks count 288
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): Free/Dirty block details
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): free_blocks=288
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): dirty_blocks=96
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): Block reservation details
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): i_reserved_data_blocks=3
Mar  7 15:15:18 lz-desktop kernel: EXT4-fs (sda2): i_reserved_meta_blocks=3
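
For reference, the "error -28" in the log above is a negative errno
value.  A minimal userspace sketch of decoding it, assuming a Linux
system where errno 28 is ENOSPC; describe_kernel_error() is a
hypothetical helper written for this note, not a kernel function:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a kernel-style negative error code into buf, for example
 * "-28 (<text from strerror>)".  Returns buf for convenience.
 * Hypothetical helper; the kernel itself only logs the number.
 */
const char *describe_kernel_error(int err, char *buf, size_t len)
{
	/* Kernel code returns -errno, so negate before the lookup. */
	snprintf(buf, len, "%d (%s)", err, strerror(-err));
	return buf;
}
```

Calling describe_kernel_error(-28, buf, sizeof(buf)) should yield the
"no space" description on such a system, which matches the -ENOSPC
discussed later in this thread.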

These messages *disappear* in the dev branch because of Lukas's patches.

But we are still blocked for more than 120s, as shown below (after
running the test 9 times):

wenqing: run xfstest 083
kernel: EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: acl,user_xattr
kernel: INFO: task fsstress:9190 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000000     0  9190   9189 0x00000000
kernel: ffff88010878dd58 0000000000000086 ffff880102928230 ffff88010878c010
kernel: ffff880110d526f0 0000000000012080 ffff88010878dfd8 0000000000004000
kernel: ffff88010878dfd8 0000000000012080 ffffffff82613410 ffff880110d526f0
kernel: Call Trace:
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205e797>] ? enqueue_task_fair+0x14a/0x16c
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9191 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000001     0  9191   9189 0x00000000
kernel: ffff880102db3d58 0000000000000086 ffff88011238ad18 ffff880102db2010
kernel: ffff880112d3a6f0 0000000000012080 ffff880102db3fd8 0000000000004000
kernel: ffff880102db3fd8 0000000000012080 ffff880112c84670 ffff880112d3a6f0
kernel: Call Trace:
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205da55>] ? check_preempt_wakeup+0x11a/0x1b6
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9192 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000000     0  9192   9189 0x00000000
kernel: ffff880112317d58 0000000000000086 0000000000000000 ffff880112316010
kernel: ffff880112c1b950 0000000000012080 ffff880112317fd8 0000000000004000
kernel: ffff880112317fd8 0000000000012080 ffff880112c5ad50 ffff880112c1b950
kernel: Call Trace:
kernel: [<ffffffff820b02c3>] ? find_get_pages_tag+0xfb/0x130
kernel: [<ffffffff8237ebe7>] ? __schedule+0x740/0x7d2
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9193 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000000     0  9193   9189 0x00000000
kernel: ffff8801122f7d58 0000000000000082 ffffffffa01f5eea ffff8801122f6010
kernel: ffff8801128492b0 0000000000012080 ffff8801122f7fd8 0000000000004000
kernel: ffff8801122f7fd8 0000000000012080 ffffffff82613410 ffff8801128492b0
kernel: Call Trace:
kernel: [<ffffffffa01f5eea>] ? ext4_release_file+0xb2/0xb2 [ext4]
kernel: [<ffffffff821074eb>] ? mntput+0x2a/0x2c
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205e797>] ? enqueue_task_fair+0x14a/0x16c
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9194 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000000     0  9194   9189 0x00000000
kernel: ffff88010371bd58 0000000000000086 ffff88011238ad18 ffff88010371a010
kernel: ffff880112c5ad50 0000000000012080 ffff88010371bfd8 0000000000004000
kernel: ffff88010371bfd8 0000000000012080 ffffffff82613410 ffff880112c5ad50
kernel: Call Trace:
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205e797>] ? enqueue_task_fair+0x14a/0x16c
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9195 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000001     0  9195   9189 0x00000000
kernel: ffff8801036f3d58 0000000000000086 ffff88011238ad18 ffff8801036f2010
kernel: ffff880112c912f0 0000000000012080 ffff8801036f3fd8 0000000000004000
kernel: ffff8801036f3fd8 0000000000012080 ffff88011331d950 ffff880112c912f0
kernel: Call Trace:
kernel: [<ffffffff8201e533>] ? native_smp_send_reschedule+0x5c/0x5e
kernel: [<ffffffff82055b9e>] ? resched_task+0x61/0x63
kernel: [<ffffffff8205da8f>] ? check_preempt_wakeup+0x154/0x1b6
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9196 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000000     0  9196   9189 0x00000000
kernel: ffff8801081d9d58 0000000000000082 ffff88011238ad18 ffff8801081d8010
kernel: ffff88011238a6b0 0000000000012080 ffff8801081d9fd8 0000000000004000
kernel: ffff8801081d9fd8 0000000000012080 ffff880112d0a0d0 ffff88011238a6b0
kernel: Call Trace:
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205da55>] ? check_preempt_wakeup+0x11a/0x1b6
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9197 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000001     0  9197   9189 0x00000000
kernel: ffff88011087bd58 0000000000000086 ffff8801110c0800 ffff88011087a010
kernel: ffff880112c84670 0000000000012080 ffff88011087bfd8 0000000000004000
kernel: ffff88011087bfd8 0000000000012080 ffff880112cd4c90 ffff880112c84670
kernel: Call Trace:
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205e797>] ? enqueue_task_fair+0x14a/0x16c
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9198 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000000     0  9198   9189 0x00000000
kernel: ffff880110993d58 0000000000000082 ffff88011238ad18 ffff880110992010
kernel: ffff880112c1a0d0 0000000000012080 ffff880110993fd8 0000000000004000
kernel: ffff880110993fd8 0000000000012080 ffff88011238a6b0 ffff880112c1a0d0
kernel: Call Trace:
kernel: [<ffffffff820b02c3>] ? find_get_pages_tag+0xfb/0x130
kernel: [<ffffffff8205da8f>] ? check_preempt_wakeup+0x154/0x1b6
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b
kernel: INFO: task fsstress:9199 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: fsstress        D 0000000000000001     0  9199   9189 0x00000000
kernel: ffff88010365dd58 0000000000000082 ffff880101128170 ffff88010365c010
kernel: ffff880112cec0d0 0000000000012080 ffff88010365dfd8 0000000000004000
kernel: ffff88010365dfd8 0000000000012080 ffff88011331d950 ffff880112cec0d0
kernel: Call Trace:
kernel: [<ffffffff8205c88d>] ? sched_clock_local+0x1c/0x82
kernel: [<ffffffff8205e797>] ? enqueue_task_fair+0x14a/0x16c
kernel: [<ffffffff8237ef2f>] schedule+0x64/0x66
kernel: [<ffffffff8237d4cc>] schedule_timeout+0x2b/0x178
kernel: [<ffffffff8205c547>] ? ttwu_do_activate.clone.0+0x3f/0x44
kernel: [<ffffffff8237edbc>] wait_for_common+0xbd/0x112
kernel: [<ffffffff8205c76a>] ? try_to_wake_up+0x21e/0x21e
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8237eec9>] wait_for_completion+0x1d/0x1f
kernel: [<ffffffff821105c8>] sync_inodes_sb+0xb3/0x198
kernel: [<ffffffff82114559>] ? fdatawrite_one_bdev+0x18/0x18
kernel: [<ffffffff8211456d>] sync_inodes_one_sb+0x14/0x16
kernel: [<ffffffff820f0f0a>] iterate_supers+0x6d/0xbf
kernel: [<ffffffff821145a4>] sys_sync+0x35/0x83
kernel: [<ffffffff82386942>] system_call_fastpath+0x16/0x1b

Regards,
                                                - Zheng

* Re: [BUG][Bigalloc] applications will be blocked for more than 120s when we run xfstests #083
  2013-03-07 12:11 [BUG][Bigalloc] applications will be blocked for more than 120s when we run xfstests #083 Zheng Liu
@ 2013-03-07 14:07 ` Lukáš Czerner
  2013-03-08 12:49   ` Zheng Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Lukáš Czerner @ 2013-03-07 14:07 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4

On Thu, 7 Mar 2013, Zheng Liu wrote:

> But we still are blocked for more than 120s as below (after running 9
> times):

Yes, I can confirm that. The problem is that when we have a delayed
write into an unwritten extent we do not reserve any space. That is OK
for the data, because the data blocks have already been allocated;
however, we might need metadata blocks to cover the unwritten extent
conversion, and we have not reserved those.

Then at writeback time, when the extent split actually happens, we
might not have enough space to allocate metadata blocks, hence
ext4_map_blocks() in mpage_da_map_and_submit() will return -ENOSPC
to the ext4_da_writepages() caller.

However, we're in writeback and we do not expect allocation to fail
with ENOSPC at all, because we should have reserved everything we
need to complete successfully. So in the loop we'll force a journal
commit, hoping that some blocks will be released, and retry the
allocation... and we'll be stuck in this loop forever.
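
The accounting problem described above can be illustrated with a toy
userspace model.  This is only a sketch under simplifying assumptions:
struct toy_fs and both functions are hypothetical names invented for
this note, space is counted in whole clusters, and locking and the
journal are ignored.  It is not the ext4 code.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of free-space accounting; all names are hypothetical. */
struct toy_fs {
	unsigned int free_clusters;	/* clusters nobody has claimed */
	unsigned int reserved_meta;	/* clusters set aside for metadata */
};

/*
 * Buffered write into an already allocated (unwritten) extent.
 * The data needs no new space.  With reserve_meta set we also claim
 * the metadata that a later extent split may need.
 */
int toy_write_unwritten(struct toy_fs *fs, unsigned int meta_needed,
			bool reserve_meta)
{
	if (!reserve_meta)
		return 0;		/* the behaviour described as the bug */
	if (fs->free_clusters < meta_needed)
		return -ENOSPC;		/* fail early, in the write path */
	fs->free_clusters -= meta_needed;
	fs->reserved_meta += meta_needed;
	return 0;
}

/* Writeback: the extent split consumes metadata clusters. */
int toy_writeback(struct toy_fs *fs, unsigned int meta_needed)
{
	if (fs->reserved_meta >= meta_needed) {
		fs->reserved_meta -= meta_needed;
		return 0;
	}
	if (fs->free_clusters >= meta_needed) {
		fs->free_clusters -= meta_needed;
		return 0;
	}
	return -ENOSPC;		/* too late: the caller can only retry */
}
```

Without the reservation, a write succeeds and a later
toy_writeback() can return -ENOSPC once the free clusters are gone.
With the reservation, the same shortage shows up as -ENOSPC from the
write itself, where it can be reported to the application.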

Now here is a patch which fixes the problem for me; however, it still
needs some testing. Also, we should probably do something about the
infinite loop in ext4_da_writepages(): at least warn the user if we
retry too many times, so that we know what's happening, because it
was not easy to find this out.
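
One possible shape for such a warning is a retry loop with an upper
bound.  The sketch below is userspace C with hypothetical names
(toy_alloc_with_retry, TOY_MAX_RETRIES and the callbacks are invented
for this note); the limit of 3 is arbitrary, and this is not what
ext4_da_writepages() does today.

```c
#include <errno.h>
#include <stdio.h>

#define TOY_MAX_RETRIES 3	/* arbitrary illustrative limit */

/* Hypothetical allocator callback: returns 0 or a negative errno. */
typedef int (*toy_alloc_fn)(void *ctx);

/*
 * Retry an allocation that fails with -ENOSPC, but give up and warn
 * after TOY_MAX_RETRIES attempts instead of looping forever.
 * The number of attempts made is stored in *attempts.
 */
int toy_alloc_with_retry(toy_alloc_fn alloc, void *ctx, int *attempts)
{
	int ret;

	*attempts = 0;
	do {
		ret = alloc(ctx);
		(*attempts)++;
		if (ret != -ENOSPC)
			return ret;
		/* A real writeback path would force a journal commit here. */
	} while (*attempts < TOY_MAX_RETRIES);

	fprintf(stderr, "toy: allocation still failing after %d attempts\n",
		*attempts);
	return ret;
}

/* Example callback that never finds space. */
int toy_always_enospc(void *ctx)
{
	(void)ctx;
	return -ENOSPC;
}

/* Example callback that succeeds on its second call; ctx counts calls. */
int toy_succeeds_on_second_try(void *ctx)
{
	int *calls = ctx;

	return (++*calls >= 2) ? 0 : -ENOSPC;
}
```

The point of the bound is diagnostic: the caller still gets -ENOSPC,
but the message makes the repeated failure visible instead of leaving
a silent hang.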

Hopefully I'll send the proper patch soon, but feel free to test the
fix yourself.

Thanks!
-Lukas

---
 fs/ext4/ext4.h  |    1 +
 fs/ext4/inode.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6e16c18..c20efe2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -581,6 +581,7 @@ enum {
 #define EXT4_GET_BLOCKS_NO_LOCK			0x0100
 	/* Do not put hole in extent cache */
 #define EXT4_GET_BLOCKS_NO_PUT_HOLE		0x0200
+#define EXT4_GET_BLOCKS_METADATA_RESERVED	0x0400
 
 /*
  * Flags used by ext4_free_blocks
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9ea0cde..46cc739 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -606,7 +606,8 @@ found:
 	 * let the underlying get_block() function know to
 	 * avoid double accounting
 	 */
-	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
+	if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) ||
+	    (flags & EXT4_GET_BLOCKS_METADATA_RESERVED))
 		ext4_set_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);
 	/*
 	 * We need to check for EXT4 here because migrate
@@ -636,7 +637,8 @@ found:
 			(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE))
 			ext4_da_update_reserve_space(inode, retval, 1);
 	}
-	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
+	if ((flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) ||
+	    (flags & EXT4_GET_BLOCKS_METADATA_RESERVED))
 		ext4_clear_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);
 
 	if (retval > 0) {
@@ -1215,6 +1217,56 @@ static int ext4_journalled_write_end(struct file *file,
 	return ret ? ret : copied;
 }
 
+
+/*
+ * Reserve metadata for a single block located at lblock
+ */
+static int ext4_da_reserve_metadata(struct inode *inode, ext4_lblk_t lblock)
+{
+	int retries = 0;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	unsigned int md_needed;
+	ext4_lblk_t save_last_lblock;
+	int save_len;
+
+	/*
+	 * recalculate the amount of metadata blocks to reserve
+	 * for the block at lblock;
+	 * worst case is one extent per block
+	 */
+repeat:
+	spin_lock(&ei->i_block_reservation_lock);
+	/*
+	 * ext4_calc_metadata_amount() has side effects, which we have
+	 * to be prepared to undo if we fail to claim space.
+	 */
+	save_len = ei->i_da_metadata_calc_len;
+	save_last_lblock = ei->i_da_metadata_calc_last_lblock;
+	md_needed = EXT4_NUM_B2C(sbi,
+				 ext4_calc_metadata_amount(inode, lblock));
+	trace_ext4_da_reserve_space(inode, md_needed);
+
+	/*
+	 * We do still charge estimated metadata to the sb though;
+	 * we cannot afford to run out of free blocks.
+	 */
+	if (ext4_claim_free_clusters(sbi, md_needed, 0)) {
+		ei->i_da_metadata_calc_len = save_len;
+		ei->i_da_metadata_calc_last_lblock = save_last_lblock;
+		spin_unlock(&ei->i_block_reservation_lock);
+		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
+			yield();
+			goto repeat;
+		}
+		return -ENOSPC;
+	}
+	ei->i_reserved_meta_blocks += md_needed;
+	spin_unlock(&ei->i_block_reservation_lock);
+
+	return 0;       /* success */
+}
+
 /*
  * Reserve a single cluster located at lblock
  */
@@ -1601,7 +1653,8 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
 	 */
 	map.m_lblk = next;
 	map.m_len = max_blocks;
-	get_blocks_flags = EXT4_GET_BLOCKS_CREATE;
+	get_blocks_flags = EXT4_GET_BLOCKS_CREATE |
+			   EXT4_GET_BLOCKS_METADATA_RESERVED;
 	if (ext4_should_dioread_nolock(mpd->inode))
 		get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
 	if (mpd->b_state & (1 << BH_Delay))
@@ -1766,7 +1819,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 			      struct buffer_head *bh)
 {
 	struct extent_status es;
-	int retval;
+	int retval, ret;
 	sector_t invalid_block = ~((sector_t) 0xffff);
 
 	if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
@@ -1804,9 +1857,19 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		map->m_len = retval;
 		if (ext4_es_is_written(&es))
 			map->m_flags |= EXT4_MAP_MAPPED;
-		else if (ext4_es_is_unwritten(&es))
+		else if (ext4_es_is_unwritten(&es)) {
+			/*
+			 * We have delalloc write into the unwritten extent
+			 * which means that we have to reserve metadata
+			 * potentially required for converting unwritten
+			 * extent.
+			 */
+			ret = ext4_da_reserve_metadata(inode, iblock);
+			if (ret)
+				/* not enough space to reserve */
+				retval = ret;
 			map->m_flags |= EXT4_MAP_UNWRITTEN;
-		else
+		} else
 			BUG_ON(1);
 
 		return retval;
@@ -1838,7 +1901,6 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 add_delayed:
 	if (retval == 0) {
-		int ret;
 		/*
 		 * XXX: __block_prepare_write() unmaps passed block,
 		 * is it OK?
-- 
1.7.7.6
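
The save-and-restore step in ext4_da_reserve_metadata() above can be
shown in isolation.  The following userspace sketch uses hypothetical
names (struct toy_inode, toy_calc_metadata, toy_reserve_metadata) and
assumes one metadata cluster per block; it leaves out the spinlock and
the ext4_should_retry_alloc() retry that the patch has.

```c
#include <errno.h>

/* Toy per-inode estimator state; field names are hypothetical. */
struct toy_inode {
	unsigned int calc_len;		/* blocks covered by the estimate */
	unsigned int reserved_meta;	/* metadata clusters reserved */
};

/* Toy estimate with a side effect: it advances calc_len. */
unsigned int toy_calc_metadata(struct toy_inode *ino)
{
	ino->calc_len++;
	return 1;	/* pretend every block needs one metadata cluster */
}

/*
 * Reserve metadata for one block.  On failure the estimator state is
 * restored so that a later attempt starts from the same point.
 */
int toy_reserve_metadata(struct toy_inode *ino, unsigned int *free_clusters)
{
	unsigned int saved_len = ino->calc_len;
	unsigned int needed = toy_calc_metadata(ino);

	if (*free_clusters < needed) {
		ino->calc_len = saved_len;	/* undo the side effect */
		return -ENOSPC;
	}
	*free_clusters -= needed;
	ino->reserved_meta += needed;
	return 0;
}
```

If the restore were skipped, a failed reservation would leave the
estimate advanced without any space behind it, so later estimates
would start from the wrong state.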

* Re: [BUG][Bigalloc] applications will be blocked for more than 120s when we run xfstests #083
  2013-03-07 14:07 ` Lukáš Czerner
@ 2013-03-08 12:49   ` Zheng Liu
  0 siblings, 0 replies; 3+ messages in thread
From: Zheng Liu @ 2013-03-08 12:49 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-ext4

On Thu, Mar 07, 2013 at 03:07:25PM +0100, Lukáš Czerner wrote:
[snip]

I have seen that you have sent the patch series to the mailing list, and
I will take a close look at it.

With this patch, I can confirm that xfstests #083 no longer hangs, and I
only see the warning from ext4_da_update_reserve_space() in #269.  I guess
that has been fixed by your patch series.  Thanks for fixing it.
Tested-by: Zheng Liu <wenqing.lz@taobao.com>

Regards,
                                                - Zheng
