[uml-devel] kernel stalls in balance_dirty_pages

All of lore.kernel.org
 help / color / mirror / Atom feed

* [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
@ 2014-10-13 21:48 Thomas Meyer
  2014-10-14  5:38 ` Anton Ivanov
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Meyer @ 2014-10-13 21:48 UTC (permalink / raw)
  To: user-mode-linux-devel

#0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
#1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
#2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
#3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
#4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
#5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
imized out>) at arch/um/kernel/trap.c:75
#6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
#7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
191
#8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
#9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
#10 0x0000000000000000 in ?? ()

backing_dev_info:
p *mapping->backing_dev_info
$2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
= 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
 = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
= {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
}, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}

when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.

any ideas what's going on here?

with kind regards
thomas

------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-13 21:48 [uml-devel] kernel stalls in balance_dirty_pages_ratelimited() Thomas Meyer
@ 2014-10-14  5:38 ` Anton Ivanov
  2014-10-14  6:43   ` Anton Ivanov
  0 siblings, 1 reply; 13+ messages in thread
From: Anton Ivanov @ 2014-10-14  5:38 UTC (permalink / raw)
  To: user-mode-linux-devel

How does the stall manifest itself?

Do you have the journal thread (and sometimes a couple of other threads)
sitting in D state?

A.

On 13/10/14 22:48, Thomas Meyer wrote:
> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
> imized out>) at arch/um/kernel/trap.c:75
> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
> 191
> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> #10 0x0000000000000000 in ?? ()
>
> backing_dev_info:
> p *mapping->backing_dev_info
> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
>  = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>
> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
>
> any ideas what's going on here?
>
> with kind regards
> thomas
>
>
>
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>


------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-14  5:38 ` Anton Ivanov
@ 2014-10-14  6:43   ` Anton Ivanov
  2014-10-14  7:21     ` Thomas Meyer
  0 siblings, 1 reply; 13+ messages in thread
From: Anton Ivanov @ 2014-10-14  6:43 UTC (permalink / raw)
  To: user-mode-linux-devel

On 14/10/14 06:38, Anton Ivanov wrote:
> How does the stall manifest itself?
>
> Do you have the journal thread (and sometimes a couple of other threads)
> sitting in D state?

Sorry, should not be asking questions at 6 am before the 3rd double 
espresso.

I think it is the same bug I am chasing - a stall in ubd, you hit it on 
swap while I hit it in normal operation on a swapless system. I see a 
stall in the journal instead of a backing dev stall.

If you apply the ubd patches out of my patchsets, you can trigger this 
one with ease. In theory, all they do is to make UBD faster so they 
should not by themselves introduce new races. They may however make the 
older ones more pronounced.

My working hypothesis is a race somewhere in the vm subsystem. I have 
been unable to nail it though.

A.

>
> A.
>
> On 13/10/14 22:48, Thomas Meyer wrote:
>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
>> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
>> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
>> imized out>) at arch/um/kernel/trap.c:75
>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
>> 191
>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>> #10 0x0000000000000000 in ?? ()
>>
>> backing_dev_info:
>> p *mapping->backing_dev_info
>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
>> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
>> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
>> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
>>   = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
>> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
>> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
>> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
>> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
>> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
>> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
>> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>
>> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
>>
>> any ideas what's going on here?
>>
>> with kind regards
>> thomas
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> User-mode-linux-devel mailing list
>> User-mode-linux-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>


------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-14  6:43   ` Anton Ivanov
@ 2014-10-14  7:21     ` Thomas Meyer
  2014-10-14  7:31       ` Anton Ivanov
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Meyer @ 2014-10-14  7:21 UTC (permalink / raw)
  To: user-mode-linux-devel

Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov:
> On 14/10/14 06:38, Anton Ivanov wrote:
> > How does the stall manifest itself?
> >
> > Do you have the journal thread (and sometimes a couple of other threads)
> > sitting in D state?
> 
> Sorry, should not be asking questions at 6 am before the 3rd double 
> espresso.
> 
> I think it is the same bug I am chasing - a stall in ubd, you hit it on 
> swap while I hit it in normal operation on a swapless system. I see a 
> stall in the journal instead of a backing dev stall.
> 
> If you apply the ubd patches out of my patchsets, you can trigger this 
> one with ease. In theory, all they do is to make UBD faster so they 
> should not by themselves introduce new races. They may however make the 
> older ones more pronounced.
> 
> My working hypothesis is a race somewhere in the vm subsystem. I have 
> been unable to nail it though.

Hi Anton,

I see this bug on a 3.17 uml kernel with the sync fix patch from
Thorsten Knabe applied.

The stall has to do with the writepage ratelimit mechanism, as the
mechanism seems to reach a state where it tries to write out page, per
page:

Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at	mm/page-writeback.c:1338
(gdb) bt
#0  balance_dirty_pages	(pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338

pages_dirtied = 1 !!

#0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
#1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
#2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
#3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
#4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
#5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
#6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
#7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
#8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
#9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
#10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
#11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
#12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
#13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
#14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
#15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
#16 0x000000000070b620 in ?? ()
#17 0x0000000000000000 in ?? ()

I'm not sure if this is the same error you encounter.

This is on an ubd device with a cow image attached to it.

The original ubd file and the cow file are spares ones, and do also
contain a swap partition.

I hope to get tracepoints/perf working, now as there is stacktrace
support in uml. An interessting tracepoint would be 
TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)


> 
> A.
> 
> >
> > A.
> >
> > On 13/10/14 22:48, Thomas Meyer wrote:
> >> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
> >> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
> >> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
> >> imized out>) at arch/um/kernel/trap.c:75
> >> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
> >> 191
> >> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >> #10 0x0000000000000000 in ?? ()
> >>
> >> backing_dev_info:
> >> p *mapping->backing_dev_info
> >> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
> >> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
> >> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
> >> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
> >>   = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
> >> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
> >> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
> >> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
> >> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
> >> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
> >> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
> >> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>
> >> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
> >>
> >> any ideas what's going on here?
> >>
> >> with kind regards
> >> thomas
> >>
> >>
> >>
> >> ------------------------------------------------------------------------------
> >> Comprehensive Server Monitoring with Site24x7.
> >> Monitor 10 servers for $9/Month.
> >> Get alerted through email, SMS, voice calls or mobile push notifications.
> >> Take corrective actions from your mobile device.
> >> http://p.sf.net/sfu/Zoho
> >> _______________________________________________
> >> User-mode-linux-devel mailing list
> >> User-mode-linux-devel@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
> >>
> 
> 
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel



------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-14  7:21     ` Thomas Meyer
@ 2014-10-14  7:31       ` Anton Ivanov
  2014-10-19 14:59         ` Thomas Meyer
  0 siblings, 1 reply; 13+ messages in thread
From: Anton Ivanov @ 2014-10-14  7:31 UTC (permalink / raw)
  To: user-mode-linux-devel

I see a very similar stall on writeout to ubd with my patches (easy) and 
without (difficult - takes running an IO soak for a few days).

It stalls (usually) when trying to flush the journal file of ext4.

A.


On 14/10/14 08:21, Thomas Meyer wrote:
> Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov:
>> On 14/10/14 06:38, Anton Ivanov wrote:
>>> How does the stall manifest itself?
>>>
>>> Do you have the journal thread (and sometimes a couple of other threads)
>>> sitting in D state?
>> Sorry, should not be asking questions at 6 am before the 3rd double
>> espresso.
>>
>> I think it is the same bug I am chasing - a stall in ubd, you hit it on
>> swap while I hit it in normal operation on a swapless system. I see a
>> stall in the journal instead of a backing dev stall.
>>
>> If you apply the ubd patches out of my patchsets, you can trigger this
>> one with ease. In theory, all they do is to make UBD faster so they
>> should not by themselves introduce new races. They may however make the
>> older ones more pronounced.
>>
>> My working hypothesis is a race somewhere in the vm subsystem. I have
>> been unable to nail it though.
> Hi Anton,
>
> I see this bug on a 3.17 uml kernel with the sync fix patch from
> Thorsten Knabe applied.
>
> The stall has to do with the writepage ratelimit mechanism, as the
> mechanism seems to reach a state where it tries to write out page, per
> page:
>
> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at	mm/page-writeback.c:1338
> (gdb) bt
> #0  balance_dirty_pages	(pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
>
> pages_dirtied = 1 !!
>
> #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> #16 0x000000000070b620 in ?? ()
> #17 0x0000000000000000 in ?? ()
>
> I'm not sure if this is the same error you encounter.
>
> This is on an ubd device with a cow image attached to it.
>
> The original ubd file and the cow file are spares ones, and do also
> contain a swap partition.
>
> I hope to get tracepoints/perf working, now as there is stacktrace
> support in uml. An interessting tracepoint would be
> TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
>
>
>> A.
>>
>>> A.
>>>
>>> On 13/10/14 22:48, Thomas Meyer wrote:
>>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
>>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
>>>> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
>>>> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
>>>> imized out>) at arch/um/kernel/trap.c:75
>>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
>>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
>>>> 191
>>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
>>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>>> #10 0x0000000000000000 in ?? ()
>>>>
>>>> backing_dev_info:
>>>> p *mapping->backing_dev_info
>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
>>>> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
>>>> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
>>>> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
>>>>    = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
>>>> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
>>>> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
>>>> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
>>>> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
>>>> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
>>>> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
>>>> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>>>
>>>> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
>>>>
>>>> any ideas what's going on here?
>>>>
>>>> with kind regards
>>>> thomas
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Comprehensive Server Monitoring with Site24x7.
>>>> Monitor 10 servers for $9/Month.
>>>> Get alerted through email, SMS, voice calls or mobile push notifications.
>>>> Take corrective actions from your mobile device.
>>>> http://p.sf.net/sfu/Zoho
>>>> _______________________________________________
>>>> User-mode-linux-devel mailing list
>>>> User-mode-linux-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>>>>
>>
>> ------------------------------------------------------------------------------
>> Comprehensive Server Monitoring with Site24x7.
>> Monitor 10 servers for $9/Month.
>> Get alerted through email, SMS, voice calls or mobile push notifications.
>> Take corrective actions from your mobile device.
>> http://p.sf.net/sfu/Zoho
>> _______________________________________________
>> User-mode-linux-devel mailing list
>> User-mode-linux-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
>
>
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-14  7:31       ` Anton Ivanov
@ 2014-10-19 14:59         ` Thomas Meyer
  2014-10-19 16:02             ` Anton Ivanov
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Meyer @ 2014-10-19 14:59 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: user-mode-linux-devel, Linux Kernel Mailing List

Am Dienstag, den 14.10.2014, 08:31 +0100 schrieb Anton Ivanov:
> I see a very similar stall on writeout to ubd with my patches (easy) and 
> without (difficult - takes running an IO soak for a few days).
> 
> It stalls (usually) when trying to flush the journal file of ext4.

Hi,

here an extract of the trace of all writeback:* tracepoints:

# tracer: nop
#
# entries-in-buffer/entries-written: 19322/2052430   #P:1
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
             yum-1553  [000] ....  1246.000000: writeback_wake_background: bdi 98:0
             yum-1553  [000] ....  1246.000000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
    kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
    kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=0
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=1
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358956
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=0 reason=background
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.000000: writeback_pages_written: 1
             yum-1553  [000] ....  1246.010000: writeback_wake_background: bdi 98:0
             yum-1553  [000] ....  1246.010000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
             yum-1553  [000] ....  1246.010000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
             yum-1553  [000] d...  1246.010000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
             yum-1553  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
             yum-1553  [000] ....  1246.010000: writeback_wake_background: bdi 98:0
             yum-1553  [000] ....  1246.010000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
    kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
    kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=0
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=1
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358957
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=0 reason=background
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.010000: writeback_pages_written: 1
             yum-1553  [000] ....  1246.020000: writeback_wake_background: bdi 98:0
             yum-1553  [000] ....  1246.020000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
             yum-1553  [000] ....  1246.020000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
             yum-1553  [000] d...  1246.020000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
             yum-1553  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
             yum-1553  [000] ....  1246.020000: writeback_wake_background: bdi 98:0
             yum-1553  [000] ....  1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
    kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
    kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
    kworker/u2:0-2603  [000] ....  1246.020000: writeback_pages_written: 1

I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
I think this is maybe an signed long, but "nr_dirty" should be a unsigned long. Two's complement of above value is 332...

I'm not sure what's going on here...

any ideas?

> 
> A.
> 
> 
> On 14/10/14 08:21, Thomas Meyer wrote:
> > Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov:
> >> On 14/10/14 06:38, Anton Ivanov wrote:
> >>> How does the stall manifest itself?
> >>>
> >>> Do you have the journal thread (and sometimes a couple of other threads)
> >>> sitting in D state?
> >> Sorry, should not be asking questions at 6 am before the 3rd double
> >> espresso.
> >>
> >> I think it is the same bug I am chasing - a stall in ubd, you hit it on
> >> swap while I hit it in normal operation on a swapless system. I see a
> >> stall in the journal instead of a backing dev stall.
> >>
> >> If you apply the ubd patches out of my patchsets, you can trigger this
> >> one with ease. In theory, all they do is to make UBD faster so they
> >> should not by themselves introduce new races. They may however make the
> >> older ones more pronounced.
> >>
> >> My working hypothesis is a race somewhere in the vm subsystem. I have
> >> been unable to nail it though.
> > Hi Anton,
> >
> > I see this bug on a 3.17 uml kernel with the sync fix patch from
> > Thorsten Knabe applied.
> >
> > The stall has to do with the writepage ratelimit mechanism, as the
> > mechanism seems to reach a state where it tries to write out page, per
> > page:
> >
> > Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at	mm/page-writeback.c:1338
> > (gdb) bt
> > #0  balance_dirty_pages	(pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >
> > pages_dirtied = 1 !!
> >
> > #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> > #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> > #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> > #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> > #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> > #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> > #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> > #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> > #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> > #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> > #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> > #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> > #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> > #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> > #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> > #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> > #16 0x000000000070b620 in ?? ()
> > #17 0x0000000000000000 in ?? ()
> >
> > I'm not sure if this is the same error you encounter.
> >
> > This is on an ubd device with a cow image attached to it.
> >
> > The original ubd file and the cow file are spares ones, and do also
> > contain a swap partition.
> >
> > I hope to get tracepoints/perf working, now as there is stacktrace
> > support in uml. An interessting tracepoint would be
> > TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
> >
> >
> >> A.
> >>
> >>> A.
> >>>
> >>> On 13/10/14 22:48, Thomas Meyer wrote:
> >>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
> >>>> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
> >>>> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
> >>>> imized out>) at arch/um/kernel/trap.c:75
> >>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
> >>>> 191
> >>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>>> #10 0x0000000000000000 in ?? ()
> >>>>
> >>>> backing_dev_info:
> >>>> p *mapping->backing_dev_info
> >>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
> >>>> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
> >>>> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
> >>>> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
> >>>>    = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
> >>>> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
> >>>> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
> >>>> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
> >>>> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
> >>>> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
> >>>> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
> >>>> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>>>
> >>>> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
> >>>>
> >>>> any ideas what's going on here?
> >>>>
> >>>> with kind regards
> >>>> thomas



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 14:59         ` Thomas Meyer
@ 2014-10-19 16:02             ` Anton Ivanov
  0 siblings, 0 replies; 13+ messages in thread
From: Anton Ivanov @ 2014-10-19 16:02 UTC (permalink / raw)
  To: Thomas Meyer; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

On 19/10/14 15:59, Thomas Meyer wrote:
> Am Dienstag, den 14.10.2014, 08:31 +0100 schrieb Anton Ivanov:
>> I see a very similar stall on writeout to ubd with my patches (easy) and 
>> without (difficult - takes running an IO soak for a few days).
>>
>> It stalls (usually) when trying to flush the journal file of ext4.
> Hi,
>
> here an extract of the trace of all writeback:* tracepoints:
>
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 19322/2052430   #P:1
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>              yum-1553  [000] ....  1246.000000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.000000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
>     kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
>     kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=0
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=1
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358956
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=0 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_pages_written: 1
>              yum-1553  [000] ....  1246.010000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.010000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] ....  1246.010000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] d...  1246.010000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
>              yum-1553  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
>              yum-1553  [000] ....  1246.010000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.010000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
>     kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
>     kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=0
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=1
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358957
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=0 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_pages_written: 1
>              yum-1553  [000] ....  1246.020000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.020000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] ....  1246.020000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] d...  1246.020000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
>              yum-1553  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
>              yum-1553  [000] ....  1246.020000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
>     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
>     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_pages_written: 1
>
> I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> I think this is maybe an signed long, but "nr_dirty" should be a unsigned long. Two's complement of above value is 332...
>
> I'm not sure what's going on here...

Neither am I, but, unless I am mistaken, it does look like the writeout
just stops working and the dirty starts to grow uncontrolled. :)

>
> any ideas?

I had some suspicion of a race somewhere in the UML VM subsystem. I
sprinked barrier() all over it, nope not the case.

I have gone through the ubd driver itself with a microscope ~ 3 times,
cannot see anything wrong with the logic. While it does not win a beauty
contest, it does seem like it should work.

All I know is that my fixes to UBD and the replacement epoll IRQ
controller for UML make it more likely, however, you can trigger it with
the stock UML too.

A.

>
>> A.
>>
>>
>> On 14/10/14 08:21, Thomas Meyer wrote:
>>> Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov:
>>>> On 14/10/14 06:38, Anton Ivanov wrote:
>>>>> How does the stall manifest itself?
>>>>>
>>>>> Do you have the journal thread (and sometimes a couple of other threads)
>>>>> sitting in D state?
>>>> Sorry, should not be asking questions at 6 am before the 3rd double
>>>> espresso.
>>>>
>>>> I think it is the same bug I am chasing - a stall in ubd, you hit it on
>>>> swap while I hit it in normal operation on a swapless system. I see a
>>>> stall in the journal instead of a backing dev stall.
>>>>
>>>> If you apply the ubd patches out of my patchsets, you can trigger this
>>>> one with ease. In theory, all they do is to make UBD faster so they
>>>> should not by themselves introduce new races. They may however make the
>>>> older ones more pronounced.
>>>>
>>>> My working hypothesis is a race somewhere in the vm subsystem. I have
>>>> been unable to nail it though.
>>> Hi Anton,
>>>
>>> I see this bug on a 3.17 uml kernel with the sync fix patch from
>>> Thorsten Knabe applied.
>>>
>>> The stall has to do with the writepage ratelimit mechanism, as the
>>> mechanism seems to reach a state where it tries to write out page, per
>>> page:
>>>
>>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at	mm/page-writeback.c:1338
>>> (gdb) bt
>>> #0  balance_dirty_pages	(pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
>>>
>>> pages_dirtied = 1 !!
>>>
>>> #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
>>> #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
>>> #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
>>> #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
>>> #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
>>> #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
>>> #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
>>> #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>> #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>> #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
>>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
>>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
>>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
>>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
>>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>> #16 0x000000000070b620 in ?? ()
>>> #17 0x0000000000000000 in ?? ()
>>>
>>> I'm not sure if this is the same error you encounter.
>>>
>>> This is on an ubd device with a cow image attached to it.
>>>
>>> The original ubd file and the cow file are spares ones, and do also
>>> contain a swap partition.
>>>
>>> I hope to get tracepoints/perf working, now as there is stacktrace
>>> support in uml. An interessting tracepoint would be
>>> TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
>>>
>>>
>>>> A.
>>>>
>>>>> A.
>>>>>
>>>>> On 13/10/14 22:48, Thomas Meyer wrote:
>>>>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
>>>>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
>>>>>> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>>>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
>>>>>> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>>>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>>>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>>>>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
>>>>>> imized out>) at arch/um/kernel/trap.c:75
>>>>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
>>>>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
>>>>>> 191
>>>>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
>>>>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>
>>>>>> backing_dev_info:
>>>>>> p *mapping->backing_dev_info
>>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
>>>>>> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
>>>>>> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
>>>>>> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
>>>>>>    = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
>>>>>> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
>>>>>> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
>>>>>> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
>>>>>> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
>>>>>> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
>>>>>> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
>>>>>> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>>>>>
>>>>>> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
>>>>>>
>>>>>> any ideas what's going on here?
>>>>>>
>>>>>> with kind regards
>>>>>> thomas
>
>


------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
@ 2014-10-19 16:02             ` Anton Ivanov
  0 siblings, 0 replies; 13+ messages in thread
From: Anton Ivanov @ 2014-10-19 16:02 UTC (permalink / raw)
  To: Thomas Meyer; +Cc: user-mode-linux-devel, Linux Kernel Mailing List

On 19/10/14 15:59, Thomas Meyer wrote:
> Am Dienstag, den 14.10.2014, 08:31 +0100 schrieb Anton Ivanov:
>> I see a very similar stall on writeout to ubd with my patches (easy) and 
>> without (difficult - takes running an IO soak for a few days).
>>
>> It stalls (usually) when trying to flush the journal file of ext4.
> Hi,
>
> here an extract of the trace of all writeback:* tracepoints:
>
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 19322/2052430   #P:1
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>              yum-1553  [000] ....  1246.000000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.000000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
>     kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
>     kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358955
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=0
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061896 age=8 index=9 to_write=1024 wrote=1
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340953 written=358956
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_queue_io: bdi 98:0: older=4295061896 age=0 enqueue=0 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.000000: writeback_pages_written: 1
>              yum-1553  [000] ....  1246.010000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.010000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] ....  1246.010000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] d...  1246.010000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
>              yum-1553  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
>              yum-1553  [000] ....  1246.010000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.010000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
>     kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
>     kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358956
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=0
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061897 age=8 index=9 to_write=1024 wrote=1
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340954 written=358957
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_queue_io: bdi 98:0: older=4295061897 age=0 enqueue=0 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.010000: writeback_pages_written: 1
>              yum-1553  [000] ....  1246.020000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.020000: writeback_dirty_inode_start: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] ....  1246.020000: writeback_dirty_inode: bdi 98:0: ino=29951 flags=I_DIRTY_SYNC
>              yum-1553  [000] d...  1246.020000: writeback_dirty_page: bdi 98:0: ino=29951 index=8
>              yum-1553  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
>              yum-1553  [000] ....  1246.020000: writeback_wake_background: bdi 98:0
>              yum-1553  [000] ....  1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
>     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
>     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
>     kworker/u2:0-2603  [000] ....  1246.020000: writeback_pages_written: 1
>
> I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> I think this is maybe an signed long, but "nr_dirty" should be a unsigned long. Two's complement of above value is 332...
>
> I'm not sure what's going on here...

Neither am I, but, unless I am mistaken, it does look like the writeout
just stops working and the dirty starts to grow uncontrolled. :)

>
> any ideas?

I had some suspicion of a race somewhere in the UML VM subsystem. I
sprinked barrier() all over it, nope not the case.

I have gone through the ubd driver itself with a microscope ~ 3 times,
cannot see anything wrong with the logic. While it does not win a beauty
contest, it does seem like it should work.

All I know is that my fixes to UBD and the replacement epoll IRQ
controller for UML make it more likely, however, you can trigger it with
the stock UML too.

A.

>
>> A.
>>
>>
>> On 14/10/14 08:21, Thomas Meyer wrote:
>>> Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov:
>>>> On 14/10/14 06:38, Anton Ivanov wrote:
>>>>> How does the stall manifest itself?
>>>>>
>>>>> Do you have the journal thread (and sometimes a couple of other threads)
>>>>> sitting in D state?
>>>> Sorry, should not be asking questions at 6 am before the 3rd double
>>>> espresso.
>>>>
>>>> I think it is the same bug I am chasing - a stall in ubd, you hit it on
>>>> swap while I hit it in normal operation on a swapless system. I see a
>>>> stall in the journal instead of a backing dev stall.
>>>>
>>>> If you apply the ubd patches out of my patchsets, you can trigger this
>>>> one with ease. In theory, all they do is to make UBD faster so they
>>>> should not by themselves introduce new races. They may however make the
>>>> older ones more pronounced.
>>>>
>>>> My working hypothesis is a race somewhere in the vm subsystem. I have
>>>> been unable to nail it though.
>>> Hi Anton,
>>>
>>> I see this bug on a 3.17 uml kernel with the sync fix patch from
>>> Thorsten Knabe applied.
>>>
>>> The stall has to do with the writepage ratelimit mechanism, as the
>>> mechanism seems to reach a state where it tries to write out page, per
>>> page:
>>>
>>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at	mm/page-writeback.c:1338
>>> (gdb) bt
>>> #0  balance_dirty_pages	(pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
>>>
>>> pages_dirtied = 1 !!
>>>
>>> #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
>>> #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
>>> #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
>>> #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
>>> #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
>>> #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
>>> #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
>>> #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>> #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>> #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
>>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
>>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
>>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
>>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
>>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>> #16 0x000000000070b620 in ?? ()
>>> #17 0x0000000000000000 in ?? ()
>>>
>>> I'm not sure if this is the same error you encounter.
>>>
>>> This is on an ubd device with a cow image attached to it.
>>>
>>> The original ubd file and the cow file are spares ones, and do also
>>> contain a swap partition.
>>>
>>> I hope to get tracepoints/perf working, now as there is stacktrace
>>> support in uml. An interessting tracepoint would be
>>> TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
>>>
>>>
>>>> A.
>>>>
>>>>> A.
>>>>>
>>>>> On 13/10/14 22:48, Thomas Meyer wrote:
>>>>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
>>>>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
>>>>>> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
>>>>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
>>>>>> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
>>>>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
>>>>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
>>>>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
>>>>>> imized out>) at arch/um/kernel/trap.c:75
>>>>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
>>>>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
>>>>>> 191
>>>>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
>>>>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
>>>>>> #10 0x0000000000000000 in ?? ()
>>>>>>
>>>>>> backing_dev_info:
>>>>>> p *mapping->backing_dev_info
>>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
>>>>>> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
>>>>>> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
>>>>>> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
>>>>>>    = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
>>>>>> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
>>>>>> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
>>>>>> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
>>>>>> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
>>>>>> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
>>>>>> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
>>>>>> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
>>>>>>
>>>>>> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
>>>>>>
>>>>>> any ideas what's going on here?
>>>>>>
>>>>>> with kind regards
>>>>>> thomas
>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 16:02             ` Anton Ivanov
  (?)
@ 2014-10-19 19:35             ` Thomas Meyer
  2014-10-20 20:01               ` Thomas Meyer
  -1 siblings, 1 reply; 13+ messages in thread
From: Thomas Meyer @ 2014-10-19 19:35 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

Am Sonntag, den 19.10.2014, 17:02 +0100 schrieb Anton Ivanov:
> On 19/10/14 15:59, Thomas Meyer wrote:
> > Am Dienstag, den 14.10.2014, 08:31 +0100 schrieb Anton Ivanov:
> >> I see a very similar stall on writeout to ubd with my patches (easy) and 
> >> without (difficult - takes running an IO soak for a few days).
> >>
> >> It stalls (usually) when trying to flush the journal file of ext4.
> > Hi,
> >
> > here an extract of the trace of all writeback:* tracepoints:
> >
> >              yum-1553  [000] ....  1246.020000: balance_dirty_pages: bdi 98:0: limit=24732 setpoint=16229 dirty=18446744073709551284 bdi_setpoint=16227 bdi_dirty=1 dirty_ratelimit=4 task_ratelimit=0 dirtied=1 dirtied_pause=0 paused=0 pause=10 period=10 think=0
> >     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
> >     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551284 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358957
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775807 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=1 reason=background
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode_start: bdi 98:0: ino=29951 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=0
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode_start: bdi 98:0: ino=29951 sync_mode=0
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_write_inode: bdi 98:0: ino=29951 sync_mode=0
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_single_inode: bdi 98:0: ino=29951 state=I_SYNC dirtied_when=4295061898 age=8 index=9 to_write=1024 wrote=1
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> >     kworker/u2:0-2603  [000] ....  1246.020000: global_dirty_state: dirty=18446744073709551283 writeback=0 unstable=0 bg_thresh=5151 thresh=10303 limit=24732 dirtied=340955 written=358958
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_queue_io: bdi 98:0: older=4295061898 age=0 enqueue=0 reason=background
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=9223372036854775806 sync_mode=0 kupdate=0 range_cyclic=1 background=1 reason=background
> >     kworker/u2:0-2603  [000] ....  1246.020000: writeback_pages_written: 1
> >
> > I wonder why "global_dirty_state: dirty=18446744073709551284" can get so high!
> > I think this is maybe an signed long, but "nr_dirty" should be a unsigned long. Two's complement of above value is 332...
> >
> > I'm not sure what's going on here...
> 
> Neither am I, but, unless I am mistaken, it does look like the writeout
> just stops working and the dirty starts to grow uncontrolled. :)
> 
> >
> > any ideas?
> 
> I had some suspicion of a race somewhere in the UML VM subsystem. I
> sprinked barrier() all over it, nope not the case.
> 
> I have gone through the ubd driver itself with a microscope ~ 3 times,
> cannot see anything wrong with the logic. While it does not win a beauty
> contest, it does seem like it should work.
> 
> All I know is that my fixes to UBD and the replacement epoll IRQ
> controller for UML make it more likely, however, you can trigger it with
> the stock UML too.

this is a trace with traceoff with the first global_dirty_state nr_dirty > 1000000

    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode_start: bdi 98:0: ino=51 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode: bdi 98:0: ino=51 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode: bdi 98:0: ino=51 state=I_SYNC dirtied_when=4294988447 age=117 index=69 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode_start: bdi 98:0: ino=52 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=4 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode_start: bdi 98:0: ino=52 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode: bdi 98:0: ino=52 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode: bdi 98:0: ino=52 state=I_SYNC dirtied_when=4294988448 age=117 index=4 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode_start: bdi 98:0: ino=53 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=7 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode_start: bdi 98:0: ino=53 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode: bdi 98:0: ino=53 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode: bdi 98:0: ino=53 state=I_SYNC dirtied_when=4294988448 age=117 index=7 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode_start: bdi 98:0: ino=54 state=I_DIRTY_SYNC|I_DIRTY_PAGES|I_SYNC dirtied_when=4294988448 age=117 index=18 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode_start: bdi 98:0: ino=54 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_write_inode: bdi 98:0: ino=54 sync_mode=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_single_inode: bdi 98:0: ino=54 state=I_SYNC dirtied_when=4294988448 age=117 index=18 to_write=4096 wrote=0
    kworker/u2:2-112   [000] ....   541.590000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
    kworker/u2:2-112   [000] ....   541.590000: writeback_start: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
    kworker/u2:2-112   [000] ....   541.590000: writeback_queue_io: bdi 98:0: older=4294988455 age=30000 enqueue=0 reason=periodic
    kworker/u2:2-112   [000] ....   541.590000: writeback_written: bdi 98:0: sb_dev 0:0 nr_pages=15382 sync_mode=0 kupdate=1 range_cyclic=1 background=0 reason=periodic
    kworker/u2:2-112   [000] ....   541.590000: global_dirty_state: dirty=18446744073709551604 writeback=30 unstable=0 bg_thresh=6954 thresh=13909 limit=24789 dirtied=278225 written=295588

So something seems to mess with global_page_state(NR_FILE_DIRTY) which
is vm_stat[11].counter
and indeed vm_stat is defined as signed long, that explains the negativ
value.

Maybe some unbalanced calls between dec/inc_zone_page_state(page, NR_FILE_DIRTY)?

Is it possible in gdb to set a watchpoint on vm_stat[11].counter when a value < 0 is written to this address?

> >> On 14/10/14 08:21, Thomas Meyer wrote:
> >>> Am Dienstag, den 14.10.2014, 07:43 +0100 schrieb Anton Ivanov:
> >>>> On 14/10/14 06:38, Anton Ivanov wrote:
> >>>>> How does the stall manifest itself?
> >>>>>
> >>>>> Do you have the journal thread (and sometimes a couple of other threads)
> >>>>> sitting in D state?
> >>>> Sorry, should not be asking questions at 6 am before the 3rd double
> >>>> espresso.
> >>>>
> >>>> I think it is the same bug I am chasing - a stall in ubd, you hit it on
> >>>> swap while I hit it in normal operation on a swapless system. I see a
> >>>> stall in the journal instead of a backing dev stall.
> >>>>
> >>>> If you apply the ubd patches out of my patchsets, you can trigger this
> >>>> one with ease. In theory, all they do is to make UBD faster so they
> >>>> should not by themselves introduce new races. They may however make the
> >>>> older ones more pronounced.
> >>>>
> >>>> My working hypothesis is a race somewhere in the vm subsystem. I have
> >>>> been unable to nail it though.
> >>> Hi Anton,
> >>>
> >>> I see this bug on a 3.17 uml kernel with the sync fix patch from
> >>> Thorsten Knabe applied.
> >>>
> >>> The stall has to do with the writepage ratelimit mechanism, as the
> >>> mechanism seems to reach a state where it tries to write out page, per
> >>> page:
> >>>
> >>> Breakpoint 1, balance_dirty_pages (pages_dirtied=1, mapping=<optimized out>) at	mm/page-writeback.c:1338
> >>> (gdb) bt
> >>> #0  balance_dirty_pages	(pages_dirtied=1, mapping=<optimized out>) at mm/page-writeback.c:1338
> >>>
> >>> pages_dirtied = 1 !!
> >>>
> >>> #0  try_to_grab_pending (work=0x7fa2a348, is_dwork=true, flags=0x72ff5ab8) at kernel/workqueue.c:1159
> >>> #1  0x0000000060051feb in mod_delayed_work_on (cpu=2141365064, wq=0x1, dwork=0x72ff5ab8, delay=<optimized out>) at kernel/workqueue.c:1510
> >>> #2  0x00000000600f382c in mod_delayed_work (delay=<optimized out>, dwork=<optimized out>, wq=<optimized out>) at include/linux/workqueue.h:504
> >>> #3  bdi_wakeup_thread (bdi=<optimized out>) at fs/fs-writeback.c:98
> >>> #4  0x00000000600f4aca in bdi_start_background_writeback (bdi=<optimized out>) at fs/fs-writeback.c:179
> >>> #5  0x000000006042d4c0 in balance_dirty_pages (pages_dirtied=<optimized out>, mapping=<optimized out>) at mm/page-writeback.c:1408
> >>> #6  0x00000000600a6e1a in balance_dirty_pages_ratelimited (mapping=<optimized out>) at mm/page-writeback.c:1627
> >>> #7  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd=<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>> #8  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>> #9  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>> #10 handle_mm_fault (mm=<optimized out>, vma=0x7f653228, address=1472490776, flags=<optimized out>) at mm/memory.c:3364
> >>> #11 0x0000000060028cec in handle_page_fault (address=1472490776, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<optimized out>) at arch/um/kernel/trap.c:75
> >>> #12 0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x73eb8de8) at arch/um/kernel/trap.c:222
> >>> #13 0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:191
> >>> #14 0x0000000060039c0f in userspace (regs=0x73eb8de8) at arch/um/os-Linux/skas/process.c:429
> >>> #15 0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>> #16 0x000000000070b620 in ?? ()
> >>> #17 0x0000000000000000 in ?? ()
> >>>
> >>> I'm not sure if this is the same error you encounter.
> >>>
> >>> This is on an ubd device with a cow image attached to it.
> >>>
> >>> The original ubd file and the cow file are spares ones, and do also
> >>> contain a swap partition.
> >>>
> >>> I hope to get tracepoints/perf working, now as there is stacktrace
> >>> support in uml. An interessting tracepoint would be
> >>> TRACE_EVENT(bdi_dirty_ratelimit) or TRACE_EVENT(balance_dirty_pages)
> >>>
> >>>
> >>>> A.
> >>>>
> >>>>> A.
> >>>>>
> >>>>> On 13/10/14 22:48, Thomas Meyer wrote:
> >>>>>> #0  balance_dirty_pages_ratelimited (mapping=0x792cc618) at mm/page-writeback.c:1587
> >>>>>> #1  0x00000000600ba54f in do_wp_page (mm=<optimized out>, vma=<optimized out>, address=<optimized out>, page_table=<optimized out>, pmd
> >>>>>> =<optimized out>, orig_pte=..., ptl=<optimized out>) at mm/memory.c:2178
> >>>>>> #2  0x00000000600bc986 in handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, v
> >>>>>> ma=<optimized out>, mm=<optimized out>) at mm/memory.c:3230
> >>>>>> #3  __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3335
> >>>>>> #4  handle_mm_fault (mm=<optimized out>, vma=0x78008e88, address=1462695424, flags=<optimized out>) at mm/memory.c:3364
> >>>>>> #5  0x0000000060028cec in handle_page_fault (address=1462695424, ip=<optimized out>, is_write=<optimized out>, is_user=0, code_out=<opt
> >>>>>> imized out>) at arch/um/kernel/trap.c:75
> >>>>>> #6  0x00000000600290d7 in segv (fi=..., ip=1228924391, is_user=<optimized out>, regs=0x624f5728) at arch/um/kernel/trap.c:222
> >>>>>> #7  0x0000000060029395 in segv_handler (sig=<optimized out>, unused_si=<optimized out>, regs=<optimized out>) at arch/um/kernel/trap.c:
> >>>>>> 191
> >>>>>> #8  0x0000000060039c0f in userspace (regs=0x624f5728) at arch/um/os-Linux/skas/process.c:429
> >>>>>> #9  0x0000000060026a8c in fork_handler () at arch/um/kernel/process.c:149
> >>>>>> #10 0x0000000000000000 in ?? ()
> >>>>>>
> >>>>>> backing_dev_info:
> >>>>>> p *mapping->backing_dev_info
> >>>>>> $2 = {bdi_list = {next = 0x605901a0 <bdi_list>, prev = 0x80a42890}, ra_pages = 32, state = 8, capabilities = 4, congested_fn = 0x0, con
> >>>>>> gested_data = 0x0, name = 0x604fb827 "block", bdi_stat = {{count = 4}, {count = 0}, {count = 318691}, {count = 314567}}, bw_time_stamp
> >>>>>> = 4339445229, dirtied_stamp = 318686, written_stamp = 314564, write_bandwidth = 166, avg_write_bandwidth = 164, dirty_ratelimit = 1, ba
> >>>>>> lanced_dirty_ratelimit = 1, completions = {events = {count = 3}, period = 4481, lock = {raw_lock = {<No data fields>}}}, dirty_exceeded
> >>>>>>    = 0, min_ratio = 0, max_ratio = 100, max_prop_frac = 1024, wb = {bdi = 0x80a42278, nr = 0, last_old_flush = 4339445229, dwork = {work
> >>>>>> = {data = {counter = 65}, entry = {next = 0x80a42350, prev = 0x80a42350}, func = 0x600f4b25 <bdi_writeback_workfn>}, timer = {entry = {
> >>>>>> next = 0x606801a0 <boot_tvec_bases+4896>, prev = 0x803db650}, expires = 4339445730, base = 0x6067ee82 <boot_tvec_bases+2>, function = 0
> >>>>>> x60051dbb <delayed_work_timer_fn>, data = 2158240584, slack = -1}, wq = 0x808d9c00, cpu = 1}, b_dirty = {next = 0x7a4ce1f8, prev = 0x80
> >>>>>> 6ad9a8}, b_io = {next = 0x80a423c0, prev = 0x80a423c0}, b_more_io = {next = 0x80a423d0, prev = 0x80a423d0}, list_lock = {{rlock = {raw_
> >>>>>> lock = {<No data fields>}}}}}, wb_lock = {{rlock = {raw_lock = {<No data fields>}}}}, work_list = {next = 0x80a423e0, prev = 0x80a423e0
> >>>>>> }, dev = 0x80b68e00, laptop_mode_wb_timer = {entry = {next = 0x0, prev = 0x0}, expires = 0, base = 0x6067ee80 <boot_tvec_bases>, functi
> >>>>>> on = 0x600a6efd <laptop_mode_timer_fn>, data = 2158240008, slack = -1}, debug_dir = 0x80419e58, debug_stats = 0x80419d98}
> >>>>>>
> >>>>>> when i set the cap_dirty from the backing-dev ( capabilities = 5 ) the system comes back to normal.
> >>>>>>
> >>>>>> any ideas what's going on here?
> >>>>>>
> >>>>>> with kind regards
> >>>>>> thomas
> >




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-19 19:35             ` Thomas Meyer
@ 2014-10-20 20:01               ` Thomas Meyer
  2014-10-23  7:34                 ` Anton Ivanov
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Meyer @ 2014-10-20 20:01 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Linux Kernel Mailing List, user-mode-linux-devel

Am Sonntag, den 19.10.2014, 21:35 +0200 schrieb Thomas Meyer:
> Am Sonntag, den 19.10.2014, 17:02 +0100 schrieb Anton Ivanov:
> > On 19/10/14 15:59, Thomas Meyer wrote:
> > > Am Dienstag, den 14.10.2014, 08:31 +0100 schrieb Anton Ivanov:
> > >> I see a very similar stall on writeout to ubd with my patches (easy) and 
> > >> without (difficult - takes running an IO soak for a few days).
> > >>
> > >> It stalls (usually) when trying to flush the journal file of ext4.
> > > any ideas?
> > 
> > I had some suspicion of a race somewhere in the UML VM subsystem. I
> > sprinked barrier() all over it, nope not the case.
> > 

I added this patch to the uml kernel:

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..7f35fa4 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -241,6 +241,10 @@ static inline void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 static inline void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
        atomic_long_dec(&zone->vm_stat[item]);
+       if (&vm_stat[item] == &vm_stat[NR_FILE_DIRTY] &&
+               atomic_long_read(&vm_stat[item]) < 0) {
+               asm("int3");
+       }
        atomic_long_dec(&vm_stat[item]);
 }


And this is the backtrace leading to the negative nr_dirty value:

Program received signal SIGTRAP, Trace/breakpoint trap.
__dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
(gdb) bt
#0  __dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
#1  __dec_zone_page_state (item=<optimized out>, page=<optimized out>) at include/linux/vmstat.h:260
#2  clear_page_dirty_for_io (page=0x628b7308) at mm/page-writeback.c:2333
#3  0x0000000060188c36 in mpage_submit_page (mpd=0x808ebb90, page=<optimized out>) at fs/ext4/inode.c:1785
#4  0x000000006018917e in mpage_map_and_submit_buffers (mpd=0x808ebb90) at fs/ext4/inode.c:1981
#5  0x000000006018d64a in mpage_map_and_submit_extent (give_up_on_write=<optimized out>, mpd=<optimized out>, handle=<optimized out>) at fs/ext4/inode.c:2123
#6  ext4_writepages (mapping=<optimized out>, wbc=<optimized out>) at fs/ext4/inode.c:2428
#7  0x00000000600f0838 in do_writepages	(mapping=<optimized out>, wbc=<optimized out>) at mm/page-writeback.c:2043
#8  0x0000000060143d29 in __writeback_single_inode (inode=0x75e191a8, wbc=0x808ebcb8) at fs/fs-writeback.c:461
#9  0x0000000060144c00 in writeback_sb_inodes (sb=<optimized out>, wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:688
#10 0x0000000060144e0e in __writeback_inodes_wb (wb=0x808eb990,	work=0x628b7308) at fs/fs-writeback.c:733
#11 0x0000000060144f8d in wb_writeback (wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:864
#12 0x0000000060145375 in wb_check_old_data_flush (wb=<optimized out>) at fs/fs-writeback.c:979
#13 wb_do_writeback (wb=<optimized out>) at fs/fs-writeback.c:1014
#14 bdi_writeback_workfn (work=0x808eb990) at fs/fs-writeback.c:1044
#15 0x00000000600690a2 in process_one_work (worker=0x808c3700, work=0x80a92340)	at kernel/workqueue.c:2023
#16 0x0000000060069b5e in worker_thread	(__worker=0x808eb990) at kernel/workqueue.c:2155
#17 0x000000006006dd9f in kthread (_create=0x80822040) at kernel/kthread.c:207
#18 0x000000006003ab59 in new_thread_handler ()	at arch/um/kernel/process.c:129
#19 0x0000000000000000 in ?? ()




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-20 20:01               ` Thomas Meyer
@ 2014-10-23  7:34                 ` Anton Ivanov
  2014-10-23  8:36                   ` Richard Weinberger
  0 siblings, 1 reply; 13+ messages in thread
From: Anton Ivanov @ 2014-10-23  7:34 UTC (permalink / raw)
  To: user-mode-linux-devel, Richard Weinberger

Hi Richard,

I have had a question in my mind on this for quite a while around this 
bug (and a quite a few others).

UML by the nature of being UP includes the generic spinlock definition 
for UP which does very little if any.

How exactly does this work on a multicore host if you have a different 
thread hitting the same critical section and that thread is running on a 
different core? Isn't that effectively a form of weird SMP (which in 
turn requires stricter locking). Am I missing something here?

A.

On 20/10/14 21:01, Thomas Meyer wrote:
> Am Sonntag, den 19.10.2014, 21:35 +0200 schrieb Thomas Meyer:
>> Am Sonntag, den 19.10.2014, 17:02 +0100 schrieb Anton Ivanov:
>>> On 19/10/14 15:59, Thomas Meyer wrote:
>>>> Am Dienstag, den 14.10.2014, 08:31 +0100 schrieb Anton Ivanov:
>>>>> I see a very similar stall on writeout to ubd with my patches (easy) and
>>>>> without (difficult - takes running an IO soak for a few days).
>>>>>
>>>>> It stalls (usually) when trying to flush the journal file of ext4.
>>>> any ideas?
>>> I had some suspicion of a race somewhere in the UML VM subsystem. I
>>> sprinked barrier() all over it, nope not the case.
>>>
> I added this patch to the uml kernel:
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 82e7db7..7f35fa4 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -241,6 +241,10 @@ static inline void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>   static inline void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>   {
>          atomic_long_dec(&zone->vm_stat[item]);
> +       if (&vm_stat[item] == &vm_stat[NR_FILE_DIRTY] &&
> +               atomic_long_read(&vm_stat[item]) < 0) {
> +               asm("int3");
> +       }
>          atomic_long_dec(&vm_stat[item]);
>   }
>
>
> And this is the backtrace leading to the negative nr_dirty value:
>
> Program received signal SIGTRAP, Trace/breakpoint trap.
> __dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
> (gdb) bt
> #0  __dec_zone_state (item=<optimized out>, zone=<optimized out>) at include/linux/vmstat.h:248
> #1  __dec_zone_page_state (item=<optimized out>, page=<optimized out>) at include/linux/vmstat.h:260
> #2  clear_page_dirty_for_io (page=0x628b7308) at mm/page-writeback.c:2333
> #3  0x0000000060188c36 in mpage_submit_page (mpd=0x808ebb90, page=<optimized out>) at fs/ext4/inode.c:1785
> #4  0x000000006018917e in mpage_map_and_submit_buffers (mpd=0x808ebb90) at fs/ext4/inode.c:1981
> #5  0x000000006018d64a in mpage_map_and_submit_extent (give_up_on_write=<optimized out>, mpd=<optimized out>, handle=<optimized out>) at fs/ext4/inode.c:2123
> #6  ext4_writepages (mapping=<optimized out>, wbc=<optimized out>) at fs/ext4/inode.c:2428
> #7  0x00000000600f0838 in do_writepages	(mapping=<optimized out>, wbc=<optimized out>) at mm/page-writeback.c:2043
> #8  0x0000000060143d29 in __writeback_single_inode (inode=0x75e191a8, wbc=0x808ebcb8) at fs/fs-writeback.c:461
> #9  0x0000000060144c00 in writeback_sb_inodes (sb=<optimized out>, wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:688
> #10 0x0000000060144e0e in __writeback_inodes_wb (wb=0x808eb990,	work=0x628b7308) at fs/fs-writeback.c:733
> #11 0x0000000060144f8d in wb_writeback (wb=0x80a92330, work=0x808ebe00) at fs/fs-writeback.c:864
> #12 0x0000000060145375 in wb_check_old_data_flush (wb=<optimized out>) at fs/fs-writeback.c:979
> #13 wb_do_writeback (wb=<optimized out>) at fs/fs-writeback.c:1014
> #14 bdi_writeback_workfn (work=0x808eb990) at fs/fs-writeback.c:1044
> #15 0x00000000600690a2 in process_one_work (worker=0x808c3700, work=0x80a92340)	at kernel/workqueue.c:2023
> #16 0x0000000060069b5e in worker_thread	(__worker=0x808eb990) at kernel/workqueue.c:2155
> #17 0x000000006006dd9f in kthread (_create=0x80822040) at kernel/kthread.c:207
> #18 0x000000006003ab59 in new_thread_handler ()	at arch/um/kernel/process.c:129
> #19 0x0000000000000000 in ?? ()
>
>
>
>
> ------------------------------------------------------------------------------
> Comprehensive Server Monitoring with Site24x7.
> Monitor 10 servers for $9/Month.
> Get alerted through email, SMS, voice calls or mobile push notifications.
> Take corrective actions from your mobile device.
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-23  7:34                 ` Anton Ivanov
@ 2014-10-23  8:36                   ` Richard Weinberger
  2014-10-23  8:47                     ` Anton Ivanov
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-10-23  8:36 UTC (permalink / raw)
  To: Anton Ivanov, user-mode-linux-devel

Anton,

Am 23.10.2014 um 09:34 schrieb Anton Ivanov:
> Hi Richard,
> 
> I have had a question in my mind on this for quite a while around this bug (and a quite a few others).

Fell free to ask, I'll try hand to find time to answer.

> UML by the nature of being UP includes the generic spinlock definition for UP which does very little if any.

Correct. IIRC they are a NOP on UML as it does not support preemption.

> How exactly does this work on a multicore host if you have a different thread hitting the same critical section and that thread is running on a different core? Isn't that
> effectively a form of weird SMP (which in turn requires stricter locking). Am I missing something here?

>>From the host side of view the UML kernel a single process and therefore there is no parallel execution at all.

Does this help?

Thanks,
//richard

------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [uml-devel] kernel stalls in balance_dirty_pages_ratelimited()
  2014-10-23  8:36                   ` Richard Weinberger
@ 2014-10-23  8:47                     ` Anton Ivanov
  0 siblings, 0 replies; 13+ messages in thread
From: Anton Ivanov @ 2014-10-23  8:47 UTC (permalink / raw)
  To: user-mode-linux-devel

On 23/10/14 09:36, Richard Weinberger wrote:
> Anton,
>
> Am 23.10.2014 um 09:34 schrieb Anton Ivanov:
>> Hi Richard,
>>
>> I have had a question in my mind on this for quite a while around this bug (and a quite a few others).
> Fell free to ask, I'll try hand to find time to answer.
>
>> UML by the nature of being UP includes the generic spinlock definition for UP which does very little if any.
> Correct. IIRC they are a NOP on UML as it does not support preemption.
>
>> How exactly does this work on a multicore host if you have a different thread hitting the same critical section and that thread is running on a different core? Isn't that
>> effectively a form of weird SMP (which in turn requires stricter locking). Am I missing something here?
> >From the host side of view the UML kernel a single process and therefore there is no parallel execution at all.
>
> Does this help?

That is what I have always thought as well.

Just checking that I am not missing something, thanks.

A.
>
> Thanks,
> //richard
>
> ------------------------------------------------------------------------------
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


------------------------------------------------------------------------------
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-10-23  8:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-13 21:48 [uml-devel] kernel stalls in balance_dirty_pages_ratelimited() Thomas Meyer
2014-10-14  5:38 ` Anton Ivanov
2014-10-14  6:43   ` Anton Ivanov
2014-10-14  7:21     ` Thomas Meyer
2014-10-14  7:31       ` Anton Ivanov
2014-10-19 14:59         ` Thomas Meyer
2014-10-19 16:02           ` Anton Ivanov
2014-10-19 16:02             ` Anton Ivanov
2014-10-19 19:35             ` Thomas Meyer
2014-10-20 20:01               ` Thomas Meyer
2014-10-23  7:34                 ` Anton Ivanov
2014-10-23  8:36                   ` Richard Weinberger
2014-10-23  8:47                     ` Anton Ivanov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.