* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
       [not found] ` <543CD148.7010904@kot-begemot.co.uk>
@ 2014-10-19 14:59   ` Thomas Meyer
  2014-10-19 16:02     ` Anton Ivanov
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Meyer @ 2014-10-19 14:59 UTC (permalink / raw)
To: Anton Ivanov; +Cc: user-mode-linux-devel, Linux Kernel Mailing List

On Tuesday, 2014-10-14 at 08:31 +0100, Anton Ivanov wrote:
> I see a very similar stall on writeout to ubd with my patches (easy) and
> without (difficult - takes running an IO soak for a few days).
>
> It stalls (usually) when trying to flush the journal file of ext4.

Hi,

here is an extract of the trace of all writeback:* tracepoints:

# tracer: nop
#
# entries-in-buffer/entries-written: 19322/2052430   #P:1
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         yum-1553 [000] ....  1246.000000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.000000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
kworker/u2:0-2603 [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
kworker/u2:0-2603 [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
kworker/u2:0-2603 [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=0
kworker/u2:0-2603 [000] ....  1246.000000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.000000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.000000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=1
kworker/u2:0-2603 [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358956
kworker/u2:0-2603 [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=0 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_pages_written: 1
         yum-1553 [000] ....  1246.010000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.010000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] ....  1246.010000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] d...  1246.010000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
         yum-1553 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
         yum-1553 [000] ....  1246.010000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.010000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
kworker/u2:0-2603 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
kworker/u2:0-2603 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
kworker/u2:0-2603 [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=0
kworker/u2:0-2603 [000] ....  1246.010000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.010000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.010000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=1
kworker/u2:0-2603 [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358957
kworker/u2:0-2603 [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=0 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_pages_written: 1
         yum-1553 [000] ....  1246.020000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.020000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] ....  1246.020000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] d...  1246.020000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
         yum-1553 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
         yum-1553 [000] ....  1246.020000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
kworker/u2:0-2603 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
kworker/u2:0-2603 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
kworker/u2:0-2603 [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
kworker/u2:0-2603 [000] ....  1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
kworker/u2:0-2603 [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
kworker/u2:0-2603 [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_pages_written: 1

I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
Maybe it is printed as a signed long somewhere, but "nr_dirty" should be an unsigned long. The two's complement of the value above is 332...

I'm not sure what's going on here... any ideas?

>
> A.
>
>
> On 14/10/14 08:21, Thomas Meyer wrote:
> > On Tuesday, 2014-10-14 at 07:43 +0100, Anton Ivanov wrote:
> >> On 14/10/14 06:38, Anton Ivanov wrote:
> >>> How does the stall manifest itself?
> >>>
> >>> Do you have the journal thread (and sometimes a couple of other threads)
> >>> sitting in D state?
> >> Sorry, should not be asking questions at 6 am before the 3rd double
> >> espresso.
> >>
> >> I think it is the same bug I am chasing - a stall in ubd; you hit it on
> >> swap while I hit it in normal operation on a swapless system. I see a
> >> stall in the journal instead of a backing dev stall.
> >>
> >> If you apply the ubd patches out of my patchsets, you can trigger this
> >> one with ease. In theory, all they do is make UBD faster, so they
> >> should not by themselves introduce new races. They may, however, make
> >> the older ones more pronounced.
> >>
> >> My working hypothesis is a race somewhere in the vm subsystem. I have
> >> been unable to nail it, though.
> > Hi Anton,
> >
> > I see this bug on a 3.17 uml kernel with the sync fix patch from
> > Thorsten Knabe applied.
> >
> > The stall has to do with the writepage ratelimit mechanism, as the
> > mechanism seems to reach a state where it tries to write out page by
> > page:
> >
> > Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> > (gdb) bt
> > #0  balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >
> > pages_dirtied = 1 !!
> >
> > #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> > #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> > #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> > #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> > #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> > #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> > #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> > #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> > #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> > #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> > #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> > #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> > #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> > #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> > #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> > #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> > #16 0x000000000070b620 in ?? ()
> > #17 0x0000000000000000 in ?? ()
> >
> > I'm not sure if this is the same error you encounter.
> >
> > This is on an ubd device with a cow image attached to it.
> >
> > The original ubd file and the cow file are sparse, and also
> > contain a swap partition.
> >
> > I hope to get tracepoints/perf working now that there is stacktrace
> > support in uml. Interesting tracepoints would be
> > TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
> >
> >
> >> A.
> >>
> >>> A.
> >>>
> >>> On 13/10/14 22:48, Thomas Meyer wrote:
> >>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>>> #10 0x0000000000000000 in ?? ()
> >>>>
> >>>> backing_dev_info:
> >>>> p *mapping->backing_dev_info
> >>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, congested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, balanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>>>
> >>>> when I set cap_dirty on the backing-dev (capabilities = 5), the system comes back to normal.
> >>>>
> >>>> any ideas what's going on here?
> >>>>
> >>>> with kind regards
> >>>> thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 14:59 ` [uml-devel] kernel stalls in balance_dirty_pages_ratelimited() Thomas Meyer
@ 2014-10-19 16:02     ` Anton Ivanov
  2014-10-19 19:35       ` Thomas Meyer
  0 siblings, 1 reply; 4+ messages in thread
From: Anton Ivanov @ 2014-10-19 16:02 UTC (permalink / raw)
To: Thomas Meyer; +Cc: user-mode-linux-devel, Linux Kernel Mailing List

On 19/10/14 15:59, Thomas Meyer wrote:
> On Tuesday, 2014-10-14 at 08:31 +0100, Anton Ivanov wrote:
>> I see a very similar stall on writeout to ubd with my patches (easy) and
>> without (difficult - takes running an IO soak for a few days).
>>
>> It stalls (usually) when trying to flush the journal file of ext4.
> Hi,
>
> here is an extract of the trace of all writeback:* tracepoints:
>
> [trace snipped - identical to the previous message in this thread]
>
> I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> Maybe it is printed as a signed long somewhere, but "nr_dirty" should be an unsigned long. The two's complement of the value above is 332...
> > I'm not sure what's going on here... Neither am I, but, unless I am mistaken, it does look like the writeout just stops working and the dirty starts to grow uncontrolled. :) > > any ideas? I had some suspicion of a race somewhere in the UML VM subsystem. I sprinked barrier() all over it, nope not the case. I have gone through the ubd driver itself with a microscope ~ 3 times, cannot see anything wrong with the logic. While it does not win a beauty contest, it does seem like it should work. All I know is that my fixes to UBD and the replacement epoll IRQ controller for UML make it more likely, however, you can trigger it with the stock UML too. A. > >> A. >> >> >> On 14/10/14 08:21, Thomas Meyer wrote: >>> Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov: >>>> On 14/10/14 06:38, Anton Ivanov wrote: >>>>> How does the stall manifest itself? >>>>> >>>>> Do you have the journal thread (and sometimes a couple of other threads) >>>>> sitting in D state? >>>> Sorry, should not be asking questions at 6 am before the 3rd double >>>> espresso. >>>> >>>> I think it is the same bug I am chasing - a stall in ubd, you hit it on >>>> swap while I hit it in normal operation on a swapless system. I see a >>>> stall in the journal instead of a backing dev stall. >>>> >>>> If you apply the ubd patches out of my patchsets, you can trigger this >>>> one with ease. In theory, all they do is to make UBD faster so they >>>> should not by themselves introduce new races. They may however make the >>>> older ones more pronounced. >>>> >>>> My working hypothesis is a race somewhere in the vm subsystem. I have >>>> been unable to nail it though. >>> Hi Anton, >>> >>> I see this bug on a 3.17 uml kernel with the sync fix patch from >>> Thorsten Knabe applied. 
>>> >>> The stall has to do with the writepage ratelimit mechanism, as the >>> mechanism seems to reach a state where it tries to write out page, per >>> page: >>> >>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338 >>> (gdb) bt >>> #0 balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338 >>> >>> pages_dirtied = 1 !! >>> >>> #0 try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159 >>> #1 0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510 >>> #2 0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504 >>> #3 bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98 >>> #4 0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179 >>> #5 0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408 >>> #6 0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627 >>> #7 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178 >>> #8 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230 >>> #9 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335 >>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364 >>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, 
is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75 >>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222 >>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191 >>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429 >>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149 >>> #16 0x000000000070b620 in ?? () >>> #17 0x0000000000000000 in ?? () >>> >>> I'm not sure if this is the same error you encounter. >>> >>> This is on an ubd device with a cow image attached to it. >>> >>> The original ubd file and the cow file are spares ones, and do also >>> contain a swap partition. >>> >>> I hope to get tracepoints/perf working, now as there is stacktrace >>> support in uml. An interessting tracepoint would be >>> TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages) >>> >>> >>>> A. >>>> >>>>> A. 
>>>>>
>>>>> On 13/10/14 22:48, Thomas Meyer wrote:
>>>>>> #0 balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
>>>>>> #1 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>>>>> #2 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>>>>> #3 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>>>>> #4 handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>>>>>> #5 0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
>>>>>> #6 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
>>>>>> #7 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
>>>>>> #8 0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
>>>>>> #9 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>
>>>>>> backing_dev_info:
>>>>>> p *mapping->backing_dev_info
>>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, congested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, balanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>>>>>
>>>>>> When I set cap_dirty on the backing-dev (capabilities = 5), the system comes back to normal.
>>>>>>
>>>>>> Any ideas what's going on here?
>>>>>>
>>>>>> with kind regards
>>>>>> thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 16:02 ` Anton Ivanov
@ 2014-10-19 19:35 ` Thomas Meyer
  2014-10-20 20:01 ` Thomas Meyer
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Meyer @ 2014-10-19 19:35 UTC (permalink / raw)
To: Anton Ivanov; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

On Sunday, 19.10.2014 at 17:02 +0100, Anton Ivanov wrote:
> On 19/10/14 15:59, Thomas Meyer wrote:
> > On Tuesday, 14.10.2014 at 08:31 +0100, Anton Ivanov wrote:
> >> I see a very similar stall on writeout to ubd with my patches (easy) and
> >> without (difficult - takes running an IO soak for a few days).
> >>
> >> It stalls (usually) when trying to flush the journal file of ext4.
> > Hi,
> >
> > here is an extract of the trace of all writeback:* tracepoints:
> >
> > yum-1553 [000] .... 1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
> > kworker/u2:0-2603 [000] .... 1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
> > kworker/u2:0-2603 [000] .... 1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_pages_written: 1
> >
> > I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> > I think this may be a signed long, but "nr_dirty" should be an unsigned long. The two's complement of the value above is 332, i.e. the counter is at -332.
> >
> > I'm not sure what's going on here...
>
> Neither am I, but, unless I am mistaken, it does look like the writeout
> just stops working and the dirty starts to grow uncontrolled. :)
>
> >
> > any ideas?
>
> I had some suspicion of a race somewhere in the UML VM subsystem. I
> sprinkled barrier() all over it, nope not the case.
>
> I have gone through the ubd driver itself with a microscope ~ 3 times,
> cannot see anything wrong with the logic.
> While it does not win a beauty contest, it does seem like it should work.
>
> All I know is that my fixes to UBD and the replacement epoll IRQ
> controller for UML make it more likely; however, you can trigger it with
> the stock UML too.

This is a trace with traceoff triggered at the first global_dirty_state with nr_dirty > 1000000:

kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=51 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=51 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=51 state=I_SYNC dirtied_when=4294988447 age=117 index=69 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode_start: bdi 98:0: ino=52 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=4 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=52 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=52 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=52 state=I_SYNC dirtied_when=4294988448 age=117 index=4 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode_start: bdi 98:0: ino=53 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=7 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=53 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=53 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=53 state=I_SYNC dirtied_when=4294988448 age=117 index=7 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode_start: bdi 98:0: ino=54 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=18 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=54 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=54 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=54 state=I_SYNC dirtied_when=4294988448 age=117 index=18 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: writeback_queue_io: bdi 98:0: older=4294988455 age=30000 enqueue=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: global_dirty_state: dirty=18446744073709551604 writeback=30 unstable=0 bg_thresh=6954 thresh=13909 limit=24789 dirtied=278225 written=295588

So something seems to mess with global_page_state(NR_FILE_DIRTY), which is vm_stat[11].counter, and indeed vm_stat is defined as signed long; that explains the negative value.
Maybe some unbalanced calls between dec/inc_zone_page_state(page, NR_FILE_DIRTY)?

Is it possible in gdb to set a watchpoint on vm_stat[11].counter that fires when a value < 0 is written to this address?

> >> On 14/10/14 08:21, Thomas Meyer wrote:
> >>> On Tuesday, 14.10.2014 at 07:43 +0100, Anton Ivanov wrote:
> >>>> On 14/10/14 06:38, Anton Ivanov wrote:
> >>>>> How does the stall manifest itself?
> >>>>>
> >>>>> Do you have the journal thread (and sometimes a couple of other threads)
> >>>>> sitting in D state?
> >>>> Sorry, should not be asking questions at 6 am before the 3rd double
> >>>> espresso.
> >>>>
> >>>> I think it is the same bug I am chasing - a stall in ubd; you hit it on
> >>>> swap while I hit it in normal operation on a swapless system. I see a
> >>>> stall in the journal instead of a backing dev stall.
> >>>>
> >>>> If you apply the ubd patches out of my patchsets, you can trigger this
> >>>> one with ease. In theory, all they do is make UBD faster, so they
> >>>> should not by themselves introduce new races. They may, however, make
> >>>> the older ones more pronounced.
> >>>>
> >>>> My working hypothesis is a race somewhere in the vm subsystem. I have
> >>>> been unable to nail it, though.
> >>> Hi Anton,
> >>>
> >>> I see this bug on a 3.17 uml kernel with the sync fix patch from
> >>> Thorsten Knabe applied.
> >>>
> >>> The stall has to do with the writepage ratelimit mechanism, as the
> >>> mechanism seems to reach a state where it tries to write out page by
> >>> page:
> >>>
> >>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >>> (gdb) bt
> >>> #0 balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >>>
> >>> pages_dirtied = 1 !!
> >>>
> >>> #0 try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> >>> #1 0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> >>> #2 0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> >>> #3 bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> >>> #4 0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> >>> #5 0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> >>> #6 0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> >>> #7 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>> #8 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>> #9 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> >>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> >>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> >>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>> #16 0x000000000070b620 in ?? ()
> >>> #17 0x0000000000000000 in ?? ()
> >>>
> >>> I'm not sure if this is the same error you encounter.
> >>>
> >>> This is on an ubd device with a cow image attached to it.
> >>>
> >>> The original ubd file and the cow file are sparse ones, and also contain a swap partition.
> >>>
> >>> I hope to get tracepoints/perf working, now that there is stacktrace support in uml. Interesting tracepoints would be TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages).
> >>>
> >>>
> >>>> A.
> >>>>
> >>>>> A.
> >>>>>
> >>>>> On 13/10/14 22:48, Thomas Meyer wrote:
> >>>>>> #0 balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >>>>>> #1 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>>>>> #2 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>>>>> #3 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>>>>> #4 handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >>>>>> #5 0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>>>>> #6 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >>>>>> #7 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>>>>> #8 0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >>>>>> #9 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>>>>> #10 0x0000000000000000 in ?? ()
> >>>>>>
> >>>>>> backing_dev_info:
> >>>>>> p *mapping->backing_dev_info
> >>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, congested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, balanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>>>>>
> >>>>>> When I set cap_dirty on the backing-dev (capabilities = 5), the system comes back to normal.
> >>>>>>
> >>>>>> Any ideas what's going on here?
> >>>>>>
> >>>>>> with kind regards
> >>>>>> thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 19:35 ` Thomas Meyer
@ 2014-10-20 20:01 ` Thomas Meyer
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Meyer @ 2014-10-20 20:01 UTC (permalink / raw)
To: Anton Ivanov; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

On Sunday, 19.10.2014 at 21:35 +0200, Thomas Meyer wrote:
> On Sunday, 19.10.2014 at 17:02 +0100, Anton Ivanov wrote:
> > On 19/10/14 15:59, Thomas Meyer wrote:
> > > On Tuesday, 14.10.2014 at 08:31 +0100, Anton Ivanov wrote:
> > >> I see a very similar stall on writeout to ubd with my patches (easy) and
> > >> without (difficult - takes running an IO soak for a few days).
> > >>
> > >> It stalls (usually) when trying to flush the journal file of ext4.
> > > any ideas?
> >
> > I had some suspicion of a race somewhere in the UML VM subsystem. I
> > sprinkled barrier() all over it, nope not the case.

I added this patch to the uml kernel:

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..7f35fa4 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -241,6 +241,10 @@ static inline void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 static inline void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
 	atomic_long_dec(&zone->vm_stat[item]);
+	if (&vm_stat[item] == &vm_stat[NR_FILE_DIRTY] &&
+	    atomic_long_read(&vm_stat[item]) < 0) {
+		asm("int3");
+	}
 	atomic_long_dec(&vm_stat[item]);
 }

And this is the backtrace leading to the negative nr_dirty value:

Program received signal SIGTRAP, Trace/breakpoint trap.
__dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
(gdb) bt
#0 __dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
#1 __dec_zone_page_state (item=<optimized out>, page=<optimized out>) at include/linux/vmstat.h:260
#2 clear_page_dirty_for_io (page=0x628b7308) at mm/page-writeback.c:2333
#3 0x0000000060188c36 in mpage_submit_page (mpd=0x808ebb90, page=<optimized out>) at fs/ext4/inode.c:1785
#4 0x000000006018917e in mpage_map_and_submit_buffers (mpd=0x808ebb90) at fs/ext4/inode.c:1981
#5 0x000000006018d64a in mpage_map_and_submit_extent (give_up_on_write=<optimized out>, mpd=<optimized out>, handle=<optimized out>) at fs/ext4/inode.c:2123
#6 ext4_writepages (mapping=<optimized out>, wbc=<optimized out>) at fs/ext4/inode.c:2428
#7 0x00000000600f0838 in do_writepages (mapping=<optimized out>, wbc=<optimized out>) at mm/page-writeback.c:2043
#8 0x0000000060143d29 in __writeback_single_inode (inode=0x75e191a8, wbc=0x808ebcb8) at fs/fs-writeback.c:461
#9 0x0000000060144c00 in writeback_sb_inodes (sb=<optimized out>, wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:688
#10 0x0000000060144e0e in __writeback_inodes_wb (wb=0x808eb990, work=0x628b7308) at fs/fs-writeback.c:733
#11 0x0000000060144f8d in wb_writeback (wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:864
#12 0x0000000060145375 in wb_check_old_data_flush (wb=<optimized out>) at fs/fs-writeback.c:979
#13 wb_do_writeback (wb=<optimized out>) at fs/fs-writeback.c:1014
#14 bdi_writeback_workfn (work=0x808eb990) at fs/fs-writeback.c:1044
#15 0x00000000600690a2 in process_one_work (worker=0x808c3700, work=0x80a92340) at kernel/workqueue.c:2023
#16 0x0000000060069b5e in worker_thread (__worker=0x808eb990) at kernel/workqueue.c:2155
#17 0x000000006006dd9f in kthread (_create=0x80822040) at kernel/kthread.c:207
#18 0x000000006003ab59 in new_thread_handler () at arch/um/kernel/process.c:129
#19 0x0000000000000000 in ?? ()

^ permalink raw reply related	[flat|nested] 4+ messages in thread
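The gdb watchpoint asked about earlier in the thread can be expressed directly, instead of patching an asm("int3") into __dec_zone_state. This is an untested sketch (standard gdb watchpoint syntax; it assumes `vm_stat` is visible in the kernel's debug info):

```
(gdb) watch -location vm_stat[11].counter if vm_stat[11].counter < 0
(gdb) continue
```

One caveat: if gdb cannot use a hardware watchpoint here, it falls back to a software watchpoint and single-steps the whole kernel, which is extremely slow; that makes the in-kernel int3 probe used above the more practical approach for a long soak test.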
end of thread, other threads:[~2014-10-20 20:01 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
[not found] <1413236904.13916.13.camel@localhost.localdomain>
[not found] ` <543CB6BA.9030907@kot-begemot.co.uk>
[not found] ` <543CC5FE.4000601@kot-begemot.co.uk>
[not found] ` <1413271301.13744.13.camel@localhost.localdomain>
[not found] ` <543CD148.7010904@kot-begemot.co.uk>
2014-10-19 14:59 ` [uml-devel] kernel stalls in balance_dirty_pages_ratelimited() Thomas Meyer
2014-10-19 16:02 ` Anton Ivanov
2014-10-19 19:35 ` Thomas Meyer
2014-10-20 20:01 ` Thomas Meyer