* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
       [not found] ` <543CD148.7010904@kot-begemot.co.uk>
@ 2014-10-19 14:59   ` Thomas Meyer
  2014-10-19 16:02     ` Anton Ivanov
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Meyer @ 2014-10-19 14:59 UTC (permalink / raw)
To: Anton Ivanov; +Cc: user-mode-linux-devel, Linux Kernel Mailing List

On Tuesday, 2014-10-14 at 08:31 +0100, Anton Ivanov wrote:
> I see a very similar stall on writeout to ubd with my patches (easy) and
> without (difficult - takes running an IO soak for a few days).
>
> It stalls (usually) when trying to flush the journal file of ext4.

Hi,

here is an extract of the trace of all writeback:* tracepoints:

# tracer: nop
#
# entries-in-buffer/entries-written: 19322/2052430   #P:1
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         yum-1553 [000] ....  1246.000000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.000000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
kworker/u2:0-2603 [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
kworker/u2:0-2603 [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
kworker/u2:0-2603 [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=0
kworker/u2:0-2603 [000] ....  1246.000000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.000000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.000000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=1
kworker/u2:0-2603 [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358956
kworker/u2:0-2603 [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=0 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.000000: writeback_pages_written: 1
         yum-1553 [000] ....  1246.010000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.010000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] ....  1246.010000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] d...  1246.010000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
         yum-1553 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
         yum-1553 [000] ....  1246.010000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.010000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
kworker/u2:0-2603 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
kworker/u2:0-2603 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
kworker/u2:0-2603 [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=0
kworker/u2:0-2603 [000] ....  1246.010000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.010000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.010000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=1
kworker/u2:0-2603 [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358957
kworker/u2:0-2603 [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=0 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.010000: writeback_pages_written: 1
         yum-1553 [000] ....  1246.020000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.020000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] ....  1246.020000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
         yum-1553 [000] d...  1246.020000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
         yum-1553 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
         yum-1553 [000] ....  1246.020000: writeback_wake_background: bdi 98:0
         yum-1553 [000] ....  1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
kworker/u2:0-2603 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
kworker/u2:0-2603 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
kworker/u2:0-2603 [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
kworker/u2:0-2603 [000] ....  1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
kworker/u2:0-2603 [000] ....  1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
kworker/u2:0-2603 [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
kworker/u2:0-2603 [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
kworker/u2:0-2603 [000] ....  1246.020000: writeback_pages_written: 1

I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
Maybe it is printed as a signed long somewhere, but "nr_dirty" should be an unsigned long. The two's complement of the value above is 332...

I'm not sure what's going on here... any ideas?

>
> A.
>
>
> On 14/10/14 08:21, Thomas Meyer wrote:
> > On Tuesday, 2014-10-14 at 07:43 +0100, Anton Ivanov wrote:
> >> On 14/10/14 06:38, Anton Ivanov wrote:
> >>> How does the stall manifest itself?
> >>>
> >>> Do you have the journal thread (and sometimes a couple of other threads)
> >>> sitting in D state?
> >> Sorry, should not be asking questions at 6 am before the 3rd double
> >> espresso.
> >>
> >> I think it is the same bug I am chasing - a stall in ubd; you hit it on
> >> swap while I hit it in normal operation on a swapless system. I see a
> >> stall in the journal instead of a backing dev stall.
> >>
> >> If you apply the ubd patches out of my patchsets, you can trigger this
> >> one with ease. In theory, all they do is make UBD faster, so they
> >> should not by themselves introduce new races. They may, however, make
> >> the older ones more pronounced.
> >>
> >> My working hypothesis is a race somewhere in the vm subsystem. I have
> >> been unable to nail it, though.
> > Hi Anton,
> >
> > I see this bug on a 3.17 uml kernel with the sync fix patch from
> > Thorsten Knabe applied.
> >
> > The stall has to do with the writepage ratelimit mechanism, as the
> > mechanism seems to reach a state where it tries to write out page by
> > page:
> >
> > Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> > (gdb) bt
> > #0  balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >
> > pages_dirtied = 1 !!
> >
> > #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> > #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> > #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> > #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> > #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> > #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> > #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> > #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> > #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> > #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> > #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> > #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> > #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> > #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> > #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> > #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> > #16 0x000000000070b620 in ?? ()
> > #17 0x0000000000000000 in ?? ()
> >
> > I'm not sure if this is the same error you encounter.
> >
> > This is on an ubd device with a cow image attached to it.
> >
> > The original ubd file and the cow file are sparse, and also
> > contain a swap partition.
> >
> > I hope to get tracepoints/perf working now that there is stacktrace
> > support in uml. Interesting tracepoints would be
> > TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
> >
> >
> >> A.
> >>
> >>> A.
> >>>
> >>> On 13/10/14 22:48, Thomas Meyer wrote:
> >>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>>> #10 0x0000000000000000 in ?? ()
> >>>>
> >>>> backing_dev_info:
> >>>> p *mapping->backing_dev_info
> >>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, congested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, balanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>>>
> >>>> when I set cap_dirty on the backing-dev (capabilities = 5), the system comes back to normal.
> >>>>
> >>>> any ideas what's going on here?
> >>>>
> >>>> with kind regards
> >>>> thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 14:59 ` [uml-devel] kernel stalls in balance_dirty_pages_ratelimited() Thomas Meyer
@ 2014-10-19 16:02     ` Anton Ivanov
  2014-10-19 19:35       ` Thomas Meyer
  0 siblings, 1 reply; 4+ messages in thread
From: Anton Ivanov @ 2014-10-19 16:02 UTC (permalink / raw)
To: Thomas Meyer; +Cc: user-mode-linux-devel, Linux Kernel Mailing List

On 19/10/14 15:59, Thomas Meyer wrote:
> On Tuesday, 2014-10-14 at 08:31 +0100, Anton Ivanov wrote:
>> I see a very similar stall on writeout to ubd with my patches (easy) and
>> without (difficult - takes running an IO soak for a few days).
>>
>> It stalls (usually) when trying to flush the journal file of ext4.
> Hi,
>
> here is an extract of the trace of all writeback:* tracepoints:
>
> [trace snipped - identical to the previous message in this thread]
>
> I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> Maybe it is printed as a signed long somewhere, but "nr_dirty" should be an unsigned long. The two's complement of the value above is 332...
> > I'm not sure what's going on here... Neither am I, but, unless I am mistaken, it does look like the writeout just stops working and the dirty starts to grow uncontrolled. :) > > any ideas? I had some suspicion of a race somewhere in the UML VM subsystem. I sprinked barrier() all over it, nope not the case. I have gone through the ubd driver itself with a microscope ~ 3 times, cannot see anything wrong with the logic. While it does not win a beauty contest, it does seem like it should work. All I know is that my fixes to UBD and the replacement epoll IRQ controller for UML make it more likely, however, you can trigger it with the stock UML too. A. > >> A. >> >> >> On 14/10/14 08:21, Thomas Meyer wrote: >>> Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov: >>>> On 14/10/14 06:38, Anton Ivanov wrote: >>>>> How does the stall manifest itself? >>>>> >>>>> Do you have the journal thread (and sometimes a couple of other threads) >>>>> sitting in D state? >>>> Sorry, should not be asking questions at 6 am before the 3rd double >>>> espresso. >>>> >>>> I think it is the same bug I am chasing - a stall in ubd, you hit it on >>>> swap while I hit it in normal operation on a swapless system. I see a >>>> stall in the journal instead of a backing dev stall. >>>> >>>> If you apply the ubd patches out of my patchsets, you can trigger this >>>> one with ease. In theory, all they do is to make UBD faster so they >>>> should not by themselves introduce new races. They may however make the >>>> older ones more pronounced. >>>> >>>> My working hypothesis is a race somewhere in the vm subsystem. I have >>>> been unable to nail it though. >>> Hi Anton, >>> >>> I see this bug on a 3.17 uml kernel with the sync fix patch from >>> Thorsten Knabe applied. 
>>> >>> The stall has to do with the writepage ratelimit mechanism, as the >>> mechanism seems to reach a state where it tries to write out page, per >>> page: >>> >>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338 >>> (gdb) bt >>> #0 balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338 >>> >>> pages_dirtied = 1 !! >>> >>> #0 try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159 >>> #1 0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510 >>> #2 0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504 >>> #3 bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98 >>> #4 0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179 >>> #5 0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408 >>> #6 0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627 >>> #7 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178 >>> #8 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230 >>> #9 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335 >>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364 >>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, 
is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75 >>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222 >>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191 >>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429 >>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149 >>> #16 0x000000000070b620 in ?? () >>> #17 0x0000000000000000 in ?? () >>> >>> I'm not sure if this is the same error you encounter. >>> >>> This is on an ubd device with a cow image attached to it. >>> >>> The original ubd file and the cow file are spares ones, and do also >>> contain a swap partition. >>> >>> I hope to get tracepoints/perf working, now as there is stacktrace >>> support in uml. An interessting tracepoint would be >>> TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages) >>> >>> >>>> A. >>>> >>>>> A. 
>>>>>
>>>>> On 13/10/14 22:48, Thomas Meyer wrote:
>>>>>> #0 balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
>>>>>> #1 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>>>>> #2 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>>>>> #3 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>>>>> #4 handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>>>>>> #5 0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
>>>>>> #6 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
>>>>>> #7 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
>>>>>> #8 0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
>>>>>> #9 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>
>>>>>> backing_dev_info:
>>>>>> p *mapping->backing_dev_info
>>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, congested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, balanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>>>>>
>>>>>> When I set cap_dirty on the backing-dev (capabilities = 5), the system comes back to normal.
>>>>>>
>>>>>> Any ideas what's going on here?
>>>>>>
>>>>>> with kind regards
>>>>>> thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 16:02 ` Anton Ivanov
@ 2014-10-19 19:35 ` Thomas Meyer
  2014-10-20 20:01 ` Thomas Meyer
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Meyer @ 2014-10-19 19:35 UTC (permalink / raw)
To: Anton Ivanov; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

On Sunday, 19.10.2014 at 17:02 +0100, Anton Ivanov wrote:
> On 19/10/14 15:59, Thomas Meyer wrote:
> > On Tuesday, 14.10.2014 at 08:31 +0100, Anton Ivanov wrote:
> >> I see a very similar stall on writeout to ubd with my patches (easy) and
> >> without (difficult - takes running an IO soak for a few days).
> >>
> >> It stalls (usually) when trying to flush the journal file of ext4.
> > Hi,
> >
> > here is an extract of the trace of all writeback:* tracepoints:
> >
> > yum-1553 [000] .... 1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
> > kworker/u2:0-2603 [000] .... 1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
> > kworker/u2:0-2603 [000] .... 1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> > kworker/u2:0-2603 [000] .... 1246.020000: writeback_pages_written: 1
> >
> > I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> > I think this may be a signed long, but "nr_dirty" should be an unsigned long. The two's complement of the value above is 332, i.e. the counter is at -332.
> >
> > I'm not sure what's going on here...
>
> Neither am I, but, unless I am mistaken, it does look like the writeout
> just stops working and the dirty starts to grow uncontrolled. :)
>
> >
> > any ideas?
>
> I had some suspicion of a race somewhere in the UML VM subsystem. I
> sprinkled barrier() all over it, nope not the case.
>
> I have gone through the ubd driver itself with a microscope ~ 3 times,
> cannot see anything wrong with the logic.
> While it does not win a beauty contest, it does seem like it should work.
>
> All I know is that my fixes to UBD and the replacement epoll IRQ
> controller for UML make it more likely; however, you can trigger it with
> the stock UML too.

This is a trace with traceoff triggered at the first global_dirty_state with nr_dirty > 1000000:

kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=51 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=51 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=51 state=I_SYNC dirtied_when=4294988447 age=117 index=69 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode_start: bdi 98:0: ino=52 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=4 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=52 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=52 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=52 state=I_SYNC dirtied_when=4294988448 age=117 index=4 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode_start: bdi 98:0: ino=53 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=7 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=53 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=53 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=53 state=I_SYNC dirtied_when=4294988448 age=117 index=7 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode_start: bdi 98:0: ino=54 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=18 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode_start: bdi 98:0: ino=54 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_write_inode: bdi 98:0: ino=54 sync_mode=0
kworker/u2:2-112 [000] .... 541.590000: writeback_single_inode: bdi 98:0: ino=54 state=I_SYNC dirtied_when=4294988448 age=117 index=18 to_write=4096 wrote=0
kworker/u2:2-112 [000] .... 541.590000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: writeback_queue_io: bdi 98:0: older=4294988455 age=30000 enqueue=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
kworker/u2:2-112 [000] .... 541.590000: global_dirty_state: dirty=18446744073709551604 writeback=30 unstable=0 bg_thresh=6954 thresh=13909 limit=24789 dirtied=278225 written=295588

So something seems to mess with global_page_state(NR_FILE_DIRTY), which is vm_stat[11].counter, and indeed vm_stat is defined as signed long; that explains the negative value.
Maybe some unbalanced calls between dec/inc_zone_page_state(page, NR_FILE_DIRTY)?

Is it possible in gdb to set a watchpoint on vm_stat[11].counter that fires when a value < 0 is written to this address?

> >> On 14/10/14 08:21, Thomas Meyer wrote:
> >>> On Tuesday, 14.10.2014 at 07:43 +0100, Anton Ivanov wrote:
> >>>> On 14/10/14 06:38, Anton Ivanov wrote:
> >>>>> How does the stall manifest itself?
> >>>>>
> >>>>> Do you have the journal thread (and sometimes a couple of other threads)
> >>>>> sitting in D state?
> >>>> Sorry, should not be asking questions at 6 am before the 3rd double
> >>>> espresso.
> >>>>
> >>>> I think it is the same bug I am chasing - a stall in ubd; you hit it on
> >>>> swap while I hit it in normal operation on a swapless system. I see a
> >>>> stall in the journal instead of a backing dev stall.
> >>>>
> >>>> If you apply the ubd patches out of my patchsets, you can trigger this
> >>>> one with ease. In theory, all they do is make UBD faster, so they
> >>>> should not by themselves introduce new races. They may, however, make
> >>>> the older ones more pronounced.
> >>>>
> >>>> My working hypothesis is a race somewhere in the vm subsystem. I have
> >>>> been unable to nail it, though.
> >>> Hi Anton,
> >>>
> >>> I see this bug on a 3.17 uml kernel with the sync fix patch from
> >>> Thorsten Knabe applied.
> >>>
> >>> The stall has to do with the writepage ratelimit mechanism, as the
> >>> mechanism seems to reach a state where it tries to write out page by
> >>> page:
> >>>
> >>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >>> (gdb) bt
> >>> #0 balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >>>
> >>> pages_dirtied = 1 !!
> >>>
> >>> #0 try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> >>> #1 0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> >>> #2 0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> >>> #3 bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> >>> #4 0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> >>> #5 0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> >>> #6 0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> >>> #7 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>> #8 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>> #9 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> >>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> >>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> >>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>> #16 0x000000000070b620 in ?? ()
> >>> #17 0x0000000000000000 in ?? ()
> >>>
> >>> I'm not sure if this is the same error you encounter.
> >>>
> >>> This is on an ubd device with a cow image attached to it.
> >>>
> >>> The original ubd file and the cow file are sparse ones, and also contain a swap partition.
> >>>
> >>> I hope to get tracepoints/perf working, now that there is stacktrace support in uml. Interesting tracepoints would be TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages).
> >>>
> >>>
> >>>> A.
> >>>>
> >>>>> A.
> >>>>>
> >>>>> On 13/10/14 22:48, Thomas Meyer wrote:
> >>>>>> #0 balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >>>>>> #1 0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>>>>> #2 0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>>>>> #3 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>>>>> #4 handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >>>>>> #5 0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>>>>> #6 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >>>>>> #7 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>>>>> #8 0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >>>>>> #9 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>>>>> #10 0x0000000000000000 in ?? ()
> >>>>>>
> >>>>>> backing_dev_info:
> >>>>>> p *mapping->backing_dev_info
> >>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, congested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, balanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x806ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, function = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>>>>>
> >>>>>> When I set cap_dirty on the backing-dev (capabilities = 5), the system comes back to normal.
> >>>>>>
> >>>>>> Any ideas what's going on here?
> >>>>>>
> >>>>>> with kind regards
> >>>>>> thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 19:35 ` Thomas Meyer
@ 2014-10-20 20:01 ` Thomas Meyer
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Meyer @ 2014-10-20 20:01 UTC (permalink / raw)
To: Anton Ivanov; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

On Sunday, 19.10.2014 at 21:35 +0200, Thomas Meyer wrote:
> On Sunday, 19.10.2014 at 17:02 +0100, Anton Ivanov wrote:
> > On 19/10/14 15:59, Thomas Meyer wrote:
> > > On Tuesday, 14.10.2014 at 08:31 +0100, Anton Ivanov wrote:
> > >> I see a very similar stall on writeout to ubd with my patches (easy) and
> > >> without (difficult - takes running an IO soak for a few days).
> > >>
> > >> It stalls (usually) when trying to flush the journal file of ext4.
> > > any ideas?
> >
> > I had some suspicion of a race somewhere in the UML VM subsystem. I
> > sprinkled barrier() all over it, nope not the case.

I added this patch to the uml kernel:

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..7f35fa4 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -241,6 +241,10 @@ static inline void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 static inline void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
 	atomic_long_dec(&zone->vm_stat[item]);
+	if (&vm_stat[item] == &vm_stat[NR_FILE_DIRTY] &&
+	    atomic_long_read(&vm_stat[item]) < 0) {
+		asm("int3");
+	}
 	atomic_long_dec(&vm_stat[item]);
 }

And this is the backtrace leading to the negative nr_dirty value:

Program received signal SIGTRAP, Trace/breakpoint trap.
__dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
(gdb) bt
#0 __dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
#1 __dec_zone_page_state (item=<optimized out>, page=<optimized out>) at include/linux/vmstat.h:260
#2 clear_page_dirty_for_io (page=0x628b7308) at mm/page-writeback.c:2333
#3 0x0000000060188c36 in mpage_submit_page (mpd=0x808ebb90, page=<optimized out>) at fs/ext4/inode.c:1785
#4 0x000000006018917e in mpage_map_and_submit_buffers (mpd=0x808ebb90) at fs/ext4/inode.c:1981
#5 0x000000006018d64a in mpage_map_and_submit_extent (give_up_on_write=<optimized out>, mpd=<optimized out>, handle=<optimized out>) at fs/ext4/inode.c:2123
#6 ext4_writepages (mapping=<optimized out>, wbc=<optimized out>) at fs/ext4/inode.c:2428
#7 0x00000000600f0838 in do_writepages (mapping=<optimized out>, wbc=<optimized out>) at mm/page-writeback.c:2043
#8 0x0000000060143d29 in __writeback_single_inode (inode=0x75e191a8, wbc=0x808ebcb8) at fs/fs-writeback.c:461
#9 0x0000000060144c00 in writeback_sb_inodes (sb=<optimized out>, wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:688
#10 0x0000000060144e0e in __writeback_inodes_wb (wb=0x808eb990, work=0x628b7308) at fs/fs-writeback.c:733
#11 0x0000000060144f8d in wb_writeback (wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:864
#12 0x0000000060145375 in wb_check_old_data_flush (wb=<optimized out>) at fs/fs-writeback.c:979
#13 wb_do_writeback (wb=<optimized out>) at fs/fs-writeback.c:1014
#14 bdi_writeback_workfn (work=0x808eb990) at fs/fs-writeback.c:1044
#15 0x00000000600690a2 in process_one_work (worker=0x808c3700, work=0x80a92340) at kernel/workqueue.c:2023
#16 0x0000000060069b5e in worker_thread (__worker=0x808eb990) at kernel/workqueue.c:2155
#17 0x000000006006dd9f in kthread (_create=0x80822040) at kernel/kthread.c:207
#18 0x000000006003ab59 in new_thread_handler () at arch/um/kernel/process.c:129
#19 0x0000000000000000 in ?? ()

^ permalink raw reply related	[flat|nested] 4+ messages in thread
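The gdb watchpoint asked about earlier in the thread can be expressed directly, instead of patching an asm("int3") into __dec_zone_state. This is an untested sketch (standard gdb watchpoint syntax; it assumes `vm_stat` is visible in the kernel's debug info):

```
(gdb) watch -location vm_stat[11].counter if vm_stat[11].counter < 0
(gdb) continue
```

One caveat: if gdb cannot use a hardware watchpoint here, it falls back to a software watchpoint and single-steps the whole kernel, which is extremely slow; that makes the in-kernel int3 probe used above the more practical approach for a long soak test.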
end of thread, other threads:[~2014-10-20 20:01 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
[not found] <1413236904.13916.13.camel@localhost.localdomain>
[not found] ` <543CB6BA.9030907@kot-begemot.co.uk>
[not found] ` <543CC5FE.4000601@kot-begemot.co.uk>
[not found] ` <1413271301.13744.13.camel@localhost.localdomain>
[not found] ` <543CD148.7010904@kot-begemot.co.uk>
2014-10-19 14:59 ` [uml-devel] kernel stalls in balance_dirty_pages_ratelimited() Thomas Meyer
2014-10-19 16:02 ` Anton Ivanov
2014-10-19 19:35 ` Thomas Meyer
2014-10-20 20:01 ` Thomas Meyer