* f2fs freezes the machine
@ 2015-02-20 20:10 Sebastian Andrzej Siewior
2015-02-23 23:33 ` Jaegeuk Kim
0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-20 20:10 UTC (permalink / raw)
To: Jaegeuk Kim, Changman Lee; +Cc: linux-f2fs-devel
Hi,
I've been playing with f2fs on eMMC with v3.19. I simply cp/rm files on
the partition for a while. The partition has a size of 510MiB and runs
out of space now and then.
I managed to run into this:
|[ 428.035231] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
|[ 428.042896] Modules linked in:
|[ 428.045972] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
|[ 428.062166] Workqueue: writeback bdi_writeback_workfn (flush-179:0)
|[ 428.068460] task: df230000 ti: df23e000 task.ti: df23e000
|[ 428.073866] PC is at __submit_merged_bio+0x6c/0x110
|[ 428.078746] LR is at f2fs_submit_merged_bio+0x74/0x80
|[ 428.126618] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
|[ 428.142802] Workqueue: writeback bdi_writeback_workfn (flush-179:0)
…
|[ 428.256579] [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
|[ 428.264063] Exception stack(0xdf23fb48 to 0xdf23fb90)
|[ 428.269119] fb40: deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
|[ 428.277299] fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
|[ 428.285477] fb80: c02191f0 c0218fa0 60000013 ffffffff
|[ 428.290539] [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
|[ 428.298552] [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
|[ 428.307604] [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
|[ 428.316831] [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
|[ 428.325709] [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
|[ 428.333637] [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
|[ 428.341736] [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
|[ 428.350704] [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
|[ 428.359152] [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
|[ 428.367946] [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
|[ 428.377346] [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
|[ 428.386312] [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
|[ 428.394843] [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
|[ 428.403293] [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
|[ 428.412087] [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
|[ 428.420272] [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
|[ 428.427590] [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
The kernel config is UP no preemption/server. Once in this state, it
does not recover. I have more backtraces and they all look more or less
the same. I pulled in the f2fs changes from this merge window and was able
to reproduce the same thing. It can take hours to get it reproduced.
The backtrace up to sync_dirty_dir_inodes() is same once this "stall"
occures. I meassured the time spent within this function and added
printk() once 5 seconds were reached and I saw output.
At the point of the stall, the task state of this work queue:
|kworker/u2:1 R running 0 53 2 0x00000002
|Workqueue: writeback bdi_writeback_workfn (flush-179:0)
|[<c0012e84>] (__irq_svc) from [<c0219dd8>] (f2fs_submit_merged_bio+0x4c/0x84)
|[<c0219dd8>] (f2fs_submit_merged_bio) from [<c021632c>] (sync_dirty_dir_inodes+0x70/0x78)
|[<c021632c>] (sync_dirty_dir_inodes) from [<c0216448>] (write_checkpoint+0x114/0xbf0)
|[<c0216448>] (write_checkpoint) from [<c0212314>] (f2fs_sync_fs+0x7c/0xcc)
|[<c0212314>] (f2fs_sync_fs) from [<c0221f14>] (f2fs_balance_fs_bg+0x4c/0x68)
|[<c0221f14>] (f2fs_balance_fs_bg) from [<c021ed28>] (f2fs_write_node_pages+0x40/0x110)
|[<c021ed28>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
|[<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
|[<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
|[<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
|[<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
|[<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
|[<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
|[<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
|[<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
|[<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
The 0x0…2 mean TIF_NEED_RESCHED is set but since preemption is off this
schedule() seem not to happen.
I have no idea what sync_dirty_dir_inodes() is doing :) Can you image a
case where it could busy loop? I could come up with this:
| void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
| {
| retry:
…
| head = &sbi->dir_inode_list;
…
| entry = list_entry(head->next, struct inode_entry, list);
| inode = igrab(entry->inode);
| spin_unlock(&sbi->dir_inode_lock);
…
| } else {
| /*
| * We should submit bio, since it exists several
| * wribacking dentry pages in the freeing inode.
| */
| f2fs_submit_merged_bio(sbi, DATA, WRITE);
->
|void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
| enum page_type type, int rw)
| {
| io = is_read_io(rw) ? &sbi->read_io : &sbi->write_io[btype];
|
| down_write(&io->io_rwsem);
…
|
| __submit_merged_bio(io);
->
| static void __submit_merged_bio(struct f2fs_bio_info *io)
| {
| if (!io->bio)
exit here since it is set to NULL at the end of this function.
| return;
…
| submit_bio(fio->rw, io->bio);
| io->bio = NULL;
| }
<-
| up_write(&io->io_rwsem);
| }
|
<-
| }
| goto retry;
| }
I don't know what submit_bio() does (in terms of terminating the loop
here) but I haven't seen it in my backtraces (and assume it does not get
invoked).
Ah. The other process that (I think) could run:
| rm R running 0 1989 1774 0x00000000
| [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
| [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
| [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
| [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
| [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
| [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
| [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
| [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
| [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
| [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
| [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
| [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
| [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
I wasn't able reproduce the issue after enable preemption (low latency
desktop). I saw however the the code looping in sync_dirty_dir_inodes()
for 5 seconds but it did not lockup the machine. The box continued to
work and the file system access was working.
Sebastian
------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: f2fs freezes the machine
2015-02-20 20:10 f2fs freezes the machine Sebastian Andrzej Siewior
@ 2015-02-23 23:33 ` Jaegeuk Kim
2015-02-24 1:20 ` Nicholas Krause
2015-02-27 12:13 ` [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes() Sebastian Andrzej Siewior
0 siblings, 2 replies; 8+ messages in thread
From: Jaegeuk Kim @ 2015-02-23 23:33 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-f2fs-devel
Hi Sebastian,
Thank you for the report.
On Fri, Feb 20, 2015 at 09:10:35PM +0100, Sebastian Andrzej Siewior wrote:
> Hi,
>
> I've been playing with f2fs on eMMC with v3.19. I simply cp/rm files on
> the partition for a while. The partition has a size of 510MiB and runs
> out of space now and then.
> I managed to run into this:
>
> |[ 428.035231] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
> |[ 428.042896] Modules linked in:
> |[ 428.045972] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
> |[ 428.062166] Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> |[ 428.068460] task: df230000 ti: df23e000 task.ti: df23e000
> |[ 428.073866] PC is at __submit_merged_bio+0x6c/0x110
> |[ 428.078746] LR is at f2fs_submit_merged_bio+0x74/0x80
> |[ 428.126618] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
> |[ 428.142802] Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> …
> |[ 428.256579] [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
> |[ 428.264063] Exception stack(0xdf23fb48 to 0xdf23fb90)
> |[ 428.269119] fb40: deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
> |[ 428.277299] fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
> |[ 428.285477] fb80: c02191f0 c0218fa0 60000013 ffffffff
> |[ 428.290539] [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
> |[ 428.298552] [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
> |[ 428.307604] [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
> |[ 428.316831] [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
> |[ 428.325709] [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
> |[ 428.333637] [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
> |[ 428.341736] [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
> |[ 428.350704] [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
> |[ 428.359152] [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
> |[ 428.367946] [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
> |[ 428.377346] [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
> |[ 428.386312] [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
> |[ 428.394843] [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
> |[ 428.403293] [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
> |[ 428.412087] [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
> |[ 428.420272] [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
> |[ 428.427590] [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
>
> The kernel config is UP no preemption/server. Once in this state, it
> does not recover. I have more backtraces and they all look more or less
> the same. I pulled in the f2fs changes from this merge window and was able
> to reproduce the same thing. It can take hours to get it reproduced.
>
> The backtrace up to sync_dirty_dir_inodes() is same once this "stall"
> occures. I meassured the time spent within this function and added
> printk() once 5 seconds were reached and I saw output.
> At the point of the stall, the task state of this work queue:
>
> |kworker/u2:1 R running 0 53 2 0x00000002
> |Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> |[<c0012e84>] (__irq_svc) from [<c0219dd8>] (f2fs_submit_merged_bio+0x4c/0x84)
> |[<c0219dd8>] (f2fs_submit_merged_bio) from [<c021632c>] (sync_dirty_dir_inodes+0x70/0x78)
> |[<c021632c>] (sync_dirty_dir_inodes) from [<c0216448>] (write_checkpoint+0x114/0xbf0)
> |[<c0216448>] (write_checkpoint) from [<c0212314>] (f2fs_sync_fs+0x7c/0xcc)
> |[<c0212314>] (f2fs_sync_fs) from [<c0221f14>] (f2fs_balance_fs_bg+0x4c/0x68)
> |[<c0221f14>] (f2fs_balance_fs_bg) from [<c021ed28>] (f2fs_write_node_pages+0x40/0x110)
> |[<c021ed28>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
> |[<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
> |[<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
> |[<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
> |[<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
> |[<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
> |[<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
> |[<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
> |[<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
> |[<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
>
> The 0x0…2 mean TIF_NEED_RESCHED is set but since preemption is off this
> schedule() seem not to happen.
>
> I have no idea what sync_dirty_dir_inodes() is doing :) Can you image a
> case where it could busy loop? I could come up with this:
>
> | void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
> | {
> | retry:
> …
> | head = &sbi->dir_inode_list;
> …
> | entry = list_entry(head->next, struct inode_entry, list);
> | inode = igrab(entry->inode);
> | spin_unlock(&sbi->dir_inode_lock);
> …
> | } else {
> | /*
> | * We should submit bio, since it exists several
> | * wribacking dentry pages in the freeing inode.
> | */
> | f2fs_submit_merged_bio(sbi, DATA, WRITE);
> ->
> |void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
> | enum page_type type, int rw)
> | {
> | io = is_read_io(rw) ? &sbi->read_io : &sbi->write_io[btype];
> |
> | down_write(&io->io_rwsem);
> …
> |
> | __submit_merged_bio(io);
> ->
> | static void __submit_merged_bio(struct f2fs_bio_info *io)
> | {
> | if (!io->bio)
> exit here since it is set to NULL at the end of this function.
> | return;
> …
> | submit_bio(fio->rw, io->bio);
> | io->bio = NULL;
> | }
> <-
> | up_write(&io->io_rwsem);
> | }
> |
> <-
> | }
> | goto retry;
> | }
>
> I don't know what submit_bio() does (in terms of terminating the loop
> here) but I haven't seen it in my backtraces (and assume it does not get
> invoked).
> Ah. The other process that (I think) could run:
> | rm R running 0 1989 1774 0x00000000
> | [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
> | [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
> | [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
> | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
> | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
> | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
> | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
> | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
> | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
> | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
> | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
> | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
> | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
This inode is the directory (c.f., do_rmdir) causing a infinite loop on
sync_dirty_dir_inodes.
The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the inode is
under eviction, it submits bios and do it again until eviction is finished.
It needs to check why truncate_inode_pages_range cannot be completed.
How about adding cond_resched() right after f2fs_submit_merged_bio in
sync_dirty_dir_inodes?
Could you test this?
Thanks,
>
> I wasn't able reproduce the issue after enable preemption (low latency
> desktop). I saw however the the code looping in sync_dirty_dir_inodes()
> for 5 seconds but it did not lockup the machine. The box continued to
> work and the file system access was working.
>
> Sebastian
------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: f2fs freezes the machine
2015-02-23 23:33 ` Jaegeuk Kim
@ 2015-02-24 1:20 ` Nicholas Krause
2015-02-27 12:13 ` [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes() Sebastian Andrzej Siewior
1 sibling, 0 replies; 8+ messages in thread
From: Nicholas Krause @ 2015-02-24 1:20 UTC (permalink / raw)
To: Jaegeuk Kim, Sebastian Andrzej Siewior; +Cc: linux-f2fs-devel
Greetings All,
I was wondering if there is anyone working on tracing this bug.
If no one else is I am will look into this later when I am at my actual
work system.
Nick
On February 23, 2015 6:33:53 PM EST, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
>Hi Sebastian,
>
>Thank you for the report.
>
>On Fri, Feb 20, 2015 at 09:10:35PM +0100, Sebastian Andrzej Siewior
>wrote:
>> Hi,
>>
>> I've been playing with f2fs on eMMC with v3.19. I simply cp/rm files
>on
>> the partition for a while. The partition has a size of 510MiB and
>runs
>> out of space now and then.
>> I managed to run into this:
>>
>> |[ 428.035231] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
>[kworker/u2:2:59]
>> |[ 428.042896] Modules linked in:
>> |[ 428.045972] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W
> 3.19.0-00010-g10c11c51ffed #153
>> |[ 428.062166] Workqueue: writeback bdi_writeback_workfn
>(flush-179:0)
>> |[ 428.068460] task: df230000 ti: df23e000 task.ti: df23e000
>> |[ 428.073866] PC is at __submit_merged_bio+0x6c/0x110
>> |[ 428.078746] LR is at f2fs_submit_merged_bio+0x74/0x80
>> |[ 428.126618] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W
> 3.19.0-00010-g10c11c51ffed #153
>> |[ 428.142802] Workqueue: writeback bdi_writeback_workfn
>(flush-179:0)
>> …
>> |[ 428.256579] [<c00085c4>] (gic_handle_irq) from [<c0012e84>]
>(__irq_svc+0x44/0x5c)
>> |[ 428.264063] Exception stack(0xdf23fb48 to 0xdf23fb90)
>> |[ 428.269119] fb40: deef3484 ffff0001 ffff0001
>00000027 deef3484 00000000
>> |[ 428.277299] fb60: deef3440 00000000 de426000 deef34ec deefc440
>df23fbb4 df23fbb8 df23fb90
>> |[ 428.285477] fb80: c02191f0 c0218fa0 60000013 ffffffff
>> |[ 428.290539] [<c0012e84>] (__irq_svc) from [<c0218fa0>]
>(__submit_merged_bio+0x6c/0x110)
>> |[ 428.298552] [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>]
>(f2fs_submit_merged_bio+0x74/0x80)
>> |[ 428.307604] [<c02191f0>] (f2fs_submit_merged_bio) from
>[<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
>> |[ 428.316831] [<c021624c>] (sync_dirty_dir_inodes) from
>[<c0216358>] (write_checkpoint+0x104/0xc10)
>> |[ 428.325709] [<c0216358>] (write_checkpoint) from [<c021231c>]
>(f2fs_sync_fs+0x80/0xbc)
>> |[ 428.333637] [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>]
>(f2fs_balance_fs_bg+0x4c/0x68)
>> |[ 428.341736] [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>]
>(f2fs_write_node_pages+0x40/0x110)
>> |[ 428.350704] [<c021e9b8>] (f2fs_write_node_pages) from
>[<c00de620>] (do_writepages+0x34/0x48)
>> |[ 428.359152] [<c00de620>] (do_writepages) from [<c0145714>]
>(__writeback_single_inode+0x50/0x228)
>> |[ 428.367946] [<c0145714>] (__writeback_single_inode) from
>[<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
>> |[ 428.377346] [<c0146184>] (writeback_sb_inodes) from [<c01463e4>]
>(__writeback_inodes_wb+0x90/0xc8)
>> |[ 428.386312] [<c01463e4>] (__writeback_inodes_wb) from
>[<c01465f8>] (wb_writeback+0x1dc/0x28c)
>> |[ 428.394843] [<c01465f8>] (wb_writeback) from [<c0146dd8>]
>(bdi_writeback_workfn+0x2ac/0x460)
>> |[ 428.403293] [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>]
>(process_one_work+0x11c/0x3a4)
>> |[ 428.412087] [<c003c3fc>] (process_one_work) from [<c003c844>]
>(worker_thread+0x17c/0x490)
>> |[ 428.420272] [<c003c844>] (worker_thread) from [<c0041398>]
>(kthread+0xec/0x100)
>> |[ 428.427590] [<c0041398>] (kthread) from [<c000ed10>]
>(ret_from_fork+0x14/0x24)
>>
>> The kernel config is UP no preemption/server. Once in this state, it
>> does not recover. I have more backtraces and they all look more or
>less
>> the same. I pulled in the f2fs changes from this merge window and was
>able
>> to reproduce the same thing. It can take hours to get it reproduced.
>>
>> The backtrace up to sync_dirty_dir_inodes() is same once this "stall"
>> occures. I meassured the time spent within this function and added
>> printk() once 5 seconds were reached and I saw output.
>> At the point of the stall, the task state of this work queue:
>>
>> |kworker/u2:1 R running 0 53 2 0x00000002
>> |Workqueue: writeback bdi_writeback_workfn (flush-179:0)
>> |[<c0012e84>] (__irq_svc) from [<c0219dd8>]
>(f2fs_submit_merged_bio+0x4c/0x84)
>> |[<c0219dd8>] (f2fs_submit_merged_bio) from [<c021632c>]
>(sync_dirty_dir_inodes+0x70/0x78)
>> |[<c021632c>] (sync_dirty_dir_inodes) from [<c0216448>]
>(write_checkpoint+0x114/0xbf0)
>> |[<c0216448>] (write_checkpoint) from [<c0212314>]
>(f2fs_sync_fs+0x7c/0xcc)
>> |[<c0212314>] (f2fs_sync_fs) from [<c0221f14>]
>(f2fs_balance_fs_bg+0x4c/0x68)
>> |[<c0221f14>] (f2fs_balance_fs_bg) from [<c021ed28>]
>(f2fs_write_node_pages+0x40/0x110)
>> |[<c021ed28>] (f2fs_write_node_pages) from [<c00de620>]
>(do_writepages+0x34/0x48)
>> |[<c00de620>] (do_writepages) from [<c0145714>]
>(__writeback_single_inode+0x50/0x228)
>> |[<c0145714>] (__writeback_single_inode) from [<c0146184>]
>(writeback_sb_inodes+0x1a8/0x378)
>> |[<c0146184>] (writeback_sb_inodes) from [<c01463e4>]
>(__writeback_inodes_wb+0x90/0xc8)
>> |[<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>]
>(wb_writeback+0x1dc/0x28c)
>> |[<c01465f8>] (wb_writeback) from [<c0146dd8>]
>(bdi_writeback_workfn+0x2ac/0x460)
>> |[<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>]
>(process_one_work+0x11c/0x3a4)
>> |[<c003c3fc>] (process_one_work) from [<c003c844>]
>(worker_thread+0x17c/0x490)
>> |[<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
>> |[<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
>>
>> The 0x0…2 mean TIF_NEED_RESCHED is set but since preemption is off
>this
>> schedule() seem not to happen.
>>
>> I have no idea what sync_dirty_dir_inodes() is doing :) Can you image
>a
>> case where it could busy loop? I could come up with this:
>>
>> | void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
>> | {
>> | retry:
>> …
>> | head = &sbi->dir_inode_list;
>> …
>> | entry = list_entry(head->next, struct inode_entry, list);
>> | inode = igrab(entry->inode);
>> | spin_unlock(&sbi->dir_inode_lock);
>> …
>> | } else {
>> | /*
>> | * We should submit bio, since it exists several
>> | * wribacking dentry pages in the freeing inode.
>> | */
>> | f2fs_submit_merged_bio(sbi, DATA, WRITE);
>> ->
>> |void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
>> | enum page_type type, int rw)
>> | {
>> | io = is_read_io(rw) ? &sbi->read_io :
>&sbi->write_io[btype];
>> |
>> | down_write(&io->io_rwsem);
>> …
>> |
>> | __submit_merged_bio(io);
>> ->
>> | static void __submit_merged_bio(struct f2fs_bio_info *io)
>> | {
>> | if (!io->bio)
>> exit here since it is set to NULL at the end of this function.
>> | return;
>> …
>> | submit_bio(fio->rw, io->bio);
>> | io->bio = NULL;
>> | }
>> <-
>> | up_write(&io->io_rwsem);
>> | }
>> |
>> <-
>> | }
>> | goto retry;
>> | }
>>
>> I don't know what submit_bio() does (in terms of terminating the loop
>> here) but I haven't seen it in my backtraces (and assume it does not
>get
>> invoked).
>> Ah. The other process that (I think) could run:
>> | rm R running 0 1989 1774 0x00000000
>> | [<c047c55c>] (__schedule) from [<c00486dc>]
>(__cond_resched+0x30/0x4c)
>> | [<c00486dc>] (__cond_resched) from [<c047c8c8>]
>(_cond_resched+0x4c/0x54)
>> | [<c047c8c8>] (_cond_resched) from [<c00e1aec>]
>(truncate_inode_pages_range+0x1f0/0x5e8)
>> | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>]
>(truncate_inode_pages+0x28/0x30)
>> | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>]
>(truncate_inode_pages_final+0x60/0x64)
>> | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>]
>(f2fs_evict_inode+0x4c/0x268)
>> | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>]
>(evict+0x94/0x140)
>> | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
>> | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
>> | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
>> | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
>> | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
>> | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>]
>(ret_fast_syscall+0x0/0x4c)
>
>This inode is the directory (c.f., do_rmdir) causing a infinite loop on
>sync_dirty_dir_inodes.
>The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
>inode is
>under eviction, it submits bios and do it again until eviction is
>finished.
>
>It needs to check why truncate_inode_pages_range cannot be completed.
>
>How about adding cond_resched() right after f2fs_submit_merged_bio in
>sync_dirty_dir_inodes?
>
>Could you test this?
>
>Thanks,
>
>>
>> I wasn't able reproduce the issue after enable preemption (low
>latency
>> desktop). I saw however the the code looping in
>sync_dirty_dir_inodes()
>> for 5 seconds but it did not lockup the machine. The box continued to
>> work and the file system access was working.
>>
>> Sebastian
>
>------------------------------------------------------------------------------
>Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
>from Actuate! Instantly Supercharge Your Business Reports and
>Dashboards
>with Interactivity, Sharing, Native Excel Exports, App Integration &
>more
>Get technology previously reserved for billion-dollar corporations,
>FREE
>http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
>_______________________________________________
>Linux-f2fs-devel mailing list
>Linux-f2fs-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes()
2015-02-23 23:33 ` Jaegeuk Kim
2015-02-24 1:20 ` Nicholas Krause
@ 2015-02-27 12:13 ` Sebastian Andrzej Siewior
2015-03-03 1:13 ` Changman Lee
1 sibling, 1 reply; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-02-27 12:13 UTC (permalink / raw)
To: Jaegeuk Kim; +Cc: linux-f2fs-devel
In a preempt-off enviroment a alot of FS activity (write/delete) I run
into a CPU stall:
| NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
| Modules linked in:
| CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
| Workqueue: writeback bdi_writeback_workfn (flush-179:0)
| task: df230000 ti: df23e000 task.ti: df23e000
| PC is at __submit_merged_bio+0x6c/0x110
| LR is at f2fs_submit_merged_bio+0x74/0x80
…
| [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
| Exception stack(0xdf23fb48 to 0xdf23fb90)
| fb40: deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
| fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
| fb80: c02191f0 c0218fa0 60000013 ffffffff
| [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
| [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
| [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
| [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
| [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
| [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
| [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
| [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
| [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
| [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
| [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
| [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
| [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
| [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
| [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
| [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
| [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
others to make progress but since it never leaves the CPU there is no
progress made. At the time of this stall, there is also a rm process
blocked:
| rm R running 0 1989 1774 0x00000000
| [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
| [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
| [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
| [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
| [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
| [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
| [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
| [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
| [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
| [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
| [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
| [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
| [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
As explained by Jaegeuk Kim:
|This inode is the directory (c.f., do_rmdir) causing a infinite loop on
|sync_dirty_dir_inodes.
|The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
|inode is under eviction, it submits bios and do it again until eviction
|is finished.
This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
is submitted so other thread can make progress.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Hi Jaegeuk,
>How about adding cond_resched() right after f2fs_submit_merged_bio in
>sync_dirty_dir_inodes?
>
>Could you test this?
So I added it as you suggsted. I've seen that the two function looped
for >5sec but the system did not freeze like before that patch. So it
seems to work, thanks.
fs/f2fs/checkpoint.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7f794b72b3b7..a2ad3df39f24 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -796,6 +796,7 @@ void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
* wribacking dentry pages in the freeing inode.
*/
f2fs_submit_merged_bio(sbi, DATA, WRITE);
+ cond_resched();
}
goto retry;
}
--
2.1.4
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes()
2015-02-27 12:13 ` [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes() Sebastian Andrzej Siewior
@ 2015-03-03 1:13 ` Changman Lee
2015-03-03 1:32 ` nick
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Changman Lee @ 2015-03-03 1:13 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: Jaegeuk Kim, linux-f2fs-devel
On Fri, Feb 27, 2015 at 01:13:14PM +0100, Sebastian Andrzej Siewior wrote:
> In a preempt-off enviroment a alot of FS activity (write/delete) I run
> into a CPU stall:
>
> | NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
> | Modules linked in:
> | CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
> | Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> | task: df230000 ti: df23e000 task.ti: df23e000
> | PC is at __submit_merged_bio+0x6c/0x110
> | LR is at f2fs_submit_merged_bio+0x74/0x80
> …
> | [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
> | Exception stack(0xdf23fb48 to 0xdf23fb90)
> | fb40: deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
> | fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
> | fb80: c02191f0 c0218fa0 60000013 ffffffff
> | [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
> | [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
> | [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
> | [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
> | [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
> | [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
> | [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
> | [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
> | [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
> | [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
> | [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
> | [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
> | [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
> | [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
> | [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
> | [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
> | [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
>
> As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
> others to make progress but since it never leaves the CPU there is no
> progress made. At the time of this stall, there is also a rm process
> blocked:
> | rm R running 0 1989 1774 0x00000000
> | [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
> | [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
> | [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
> | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
> | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
> | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
> | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
> | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
> | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
> | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
> | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
> | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
> | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
>
> As explained by Jaegeuk Kim:
> |This inode is the directory (c.f., do_rmdir) causing a infinite loop on
> |sync_dirty_dir_inodes.
> |The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
> |inode is under eviction, it submits bios and do it again until eviction
> |is finished.
>
> This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
> is submitted so other thread can make progress.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Hi Jaegeuk,
>
> >How about adding cond_resched() right after f2fs_submit_merged_bio in
> >sync_dirty_dir_inodes?
> >
> >Could you test this?
>
> So I added it as you suggsted. I've seen that the two function looped
> for >5sec but the system did not freeze like before that patch. So it
> seems to work, thanks.
Hi Sebastian,
After this patch, your test is all done without any CPU stall, Right?
IMHO, context should be switched without cond_resched() after consumed
own time quota. So, it just reduces system latency due to yielding.
I thought another way to discard pages of inode to be evicted in merged bio
instead of submit. If so, evict() doesn't need to wait for writeback.
Just my curiousity out of this problem.
Thanks,
>
> fs/f2fs/checkpoint.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 7f794b72b3b7..a2ad3df39f24 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -796,6 +796,7 @@ void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
> * wribacking dentry pages in the freeing inode.
> */
> f2fs_submit_merged_bio(sbi, DATA, WRITE);
> + cond_resched();
> }
> goto retry;
> }
> --
> 2.1.4
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes()
2015-03-03 1:13 ` Changman Lee
@ 2015-03-03 1:32 ` nick
2015-03-03 8:22 ` Sebastian Andrzej Siewior
2015-06-12 13:28 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 8+ messages in thread
From: nick @ 2015-03-03 1:32 UTC (permalink / raw)
To: Changman Lee, Sebastian Andrzej Siewior; +Cc: Jaegeuk Kim, linux-f2fs-devel
On 2015-03-02 08:13 PM, Changman Lee wrote:
> On Fri, Feb 27, 2015 at 01:13:14PM +0100, Sebastian Andrzej Siewior wrote:
>> In a preempt-off enviroment a alot of FS activity (write/delete) I run
>> into a CPU stall:
>>
>> | NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
>> | Modules linked in:
>> | CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G W 3.19.0-00010-g10c11c51ffed #153
>> | Workqueue: writeback bdi_writeback_workfn (flush-179:0)
>> | task: df230000 ti: df23e000 task.ti: df23e000
>> | PC is at __submit_merged_bio+0x6c/0x110
>> | LR is at f2fs_submit_merged_bio+0x74/0x80
>> …
>> | [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
>> | Exception stack(0xdf23fb48 to 0xdf23fb90)
>> | fb40: deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
>> | fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
>> | fb80: c02191f0 c0218fa0 60000013 ffffffff
>> | [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
>> | [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
>> | [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
>> | [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
>> | [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
>> | [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
>> | [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
>> | [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
>> | [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
>> | [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
>> | [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
>> | [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
>> | [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
>> | [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
>> | [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
>> | [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
>> | [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
>>
>> As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
>> others to make progress but since it never leaves the CPU there is no
>> progress made. At the time of this stall, there is also a rm process
>> blocked:
>> | rm R running 0 1989 1774 0x00000000
>> | [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
>> | [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
>> | [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
>> | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
>> | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
>> | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
>> | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
>> | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
>> | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
>> | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
>> | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
>> | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
>> | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
>>
>> As explained by Jaegeuk Kim:
>> |This inode is the directory (c.f., do_rmdir) causing a infinite loop on
>> |sync_dirty_dir_inodes.
>> |The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
>> |inode is under eviction, it submits bios and do it again until eviction
>> |is finished.
>>
>> This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
>> is submitted so other thread can make progress.
>>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> ---
>> Hi Jaegeuk,
>>
>>> How about adding cond_resched() right after f2fs_submit_merged_bio in
>>> sync_dirty_dir_inodes?
>>>
>>> Could you test this?
>>
>> So I added it as you suggsted. I've seen that the two function looped
>> for >5sec but the system did not freeze like before that patch. So it
>> seems to work, thanks.
>
> Hi Sebastian,
>
> After this patch, your test is all done without any CPU stall, Right?
> IMHO, context should be switched without cond_resched() after consumed
> own time quota. So, it just reduces system latency due to yielding.
> I thought another way to discard pages of inode to be evicted in merged bio
> instead of submit. If so, evict() doesn't need to wait for writeback.
> Just my curiousity out of this problem.
>
> Thanks,
>
>>
>> fs/f2fs/checkpoint.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index 7f794b72b3b7..a2ad3df39f24 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -796,6 +796,7 @@ void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
>> * wribacking dentry pages in the freeing inode.
>> */
>> f2fs_submit_merged_bio(sbi, DATA, WRITE);
>> + cond_resched();
>> }
>> goto retry;
>> }
>> --
>> 2.1.4
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website, sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for all
> things parallel software development, from weekly thought leadership blogs to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>
Changman,
That seems a good idea until we think about file system caching and if that would
cause a performance hit. I don't f2fs that while but this seems the simplest and
cleanest way to do this without removing the caching of the inodes and related
flash blocks.
Regards,
Nick
P.S. Good caught goes out to Sebastian on this patch :)
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes()
2015-03-03 1:13 ` Changman Lee
2015-03-03 1:32 ` nick
@ 2015-03-03 8:22 ` Sebastian Andrzej Siewior
2015-06-12 13:28 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-03-03 8:22 UTC (permalink / raw)
To: Changman Lee; +Cc: Jaegeuk Kim, linux-f2fs-devel
On 03/03/2015 02:13 AM, Changman Lee wrote:
> Hi Sebastian,
Hi Jaegeuk,
> After this patch, your test is all done without any CPU stall, Right?
correct.
> IMHO, context should be switched without cond_resched() after consumed
> own time quota. So, it just reduces system latency due to yielding.
Correct but without preemption enabled there is no time quota. I still
have the >5sec printk but since the task schedules away I don't have a
"problem" anymore.
> I thought another way to discard pages of inode to be evicted in merged bio
> instead of submit. If so, evict() doesn't need to wait for writeback.
You are lucky that the workqueue code is aware of "that". That means it
will schedule another worker if the current task schedules away because
it waits for something. Otherwise you would stall other worker.
If you could solve it in another way, that would be better than just
polling for it.
> Just my curiousity out of this problem.
>
> Thanks,
Sebastian
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes()
2015-03-03 1:13 ` Changman Lee
2015-03-03 1:32 ` nick
2015-03-03 8:22 ` Sebastian Andrzej Siewior
@ 2015-06-12 13:28 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-06-12 13:28 UTC (permalink / raw)
To: Changman Lee; +Cc: Jaegeuk Kim, linux-f2fs-devel
On 03/03/2015 02:13 AM, Changman Lee wrote:
>
> Hi Sebastian,
Hi Changman,
> After this patch, your test is all done without any CPU stall, Right?
> IMHO, context should be switched without cond_resched() after consumed
> own time quota. So, it just reduces system latency due to yielding.
> I thought another way to discard pages of inode to be evicted in merged bio
> instead of submit. If so, evict() doesn't need to wait for writeback.
> Just my curiousity out of this problem.
Where do we stand on this?
>
> Thanks,
>
Sebastian
------------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-06-12 13:28 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-20 20:10 f2fs freezes the machine Sebastian Andrzej Siewior
2015-02-23 23:33 ` Jaegeuk Kim
2015-02-24 1:20 ` Nicholas Krause
2015-02-27 12:13 ` [PATCH] fs/f2fs: add cond_resched() to sync_dirty_dir_inodes() Sebastian Andrzej Siewior
2015-03-03 1:13 ` Changman Lee
2015-03-03 1:32 ` nick
2015-03-03 8:22 ` Sebastian Andrzej Siewior
2015-06-12 13:28 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).