From: Richard Weinberger <richard@nod.at>
To: "Toralf Förster" <toralf.foerster@gmx.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
UML devel <user-mode-linux-devel@lists.sourceforge.net>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
hannes@cmpxchg.org, darrick.wong@oracle.com,
Michal Hocko <mhocko@suse.cz>
Subject: Re: [uml-devel] BUG: soft lockup for a user mode linux image
Date: Wed, 09 Oct 2013 20:43:50 +0200 [thread overview]
Message-ID: <5255A3E6.6020100@nod.at> (raw)
In-Reply-To: <525591AD.4060401@gmx.de>
CC'ing mm folks.
Please see below.
Am 09.10.2013 19:26, schrieb Toralf Förster:
> On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote:
>> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <toralf.foerster@gmx.de> wrote:
>>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON()
>>>> asserts its strict positive?!?
>>>>
>>>> Can you please try the following instead of the BUG_ON():
>>>>
>>>> if (pause < 0) {
>>>> printk("pages_dirtied = %lu\n", pages_dirtied);
>>>> printk("task_ratelimit = %lu\n", task_ratelimit);
>>>> printk("pause = %ld\n", pause);
>>>> }
>>>>
>>>> Gr{oetje,eeting}s,
>>>>
>>>> Geert
>>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output.
>>> As soon as the issue happens I do have a
>>>
>>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521]
>>>
>>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able
>>> to run very basic commands like a crash dump, sysrq ond so on.
>>
>> You may get an idea of the magnitude of pages_dirtied by using a chain of
>> BUG_ON()s, like:
>>
>> BUG_ON(pages_dirtied > 2000000000);
>> BUG_ON(pages_dirtied > 1000000000);
>> BUG_ON(pages_dirtied > 100000000);
>> BUG_ON(pages_dirtied > 10000000);
>> BUG_ON(pages_dirtied > 1000000);
>>
>> Probably 1 million is already too much for normal operation?
>>
> period = HZ * pages_dirtied / task_ratelimit;
> BUG_ON(pages_dirtied > 2000000000);
> BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467
Summary for mm people:
Toralf runs trinty on UML/i386.
After some time pages_dirtied becomes very large.
More than 1000000000 pages in this case.
Thus, period = HZ * pages_dirtied / task_ratelimit overflows
and period/pause becomes extremely large.
It looks like io_schedule_timeout() get's called with a very large timeout.
I don't know why "if (unlikely(pause > max_pause)) {" does not help.
Any ideas?
Thanks,
//richard
> the back trace is :
>
> tfoerste@n22 ~/devel/linux $ gdb --core=/mnt/ramdisk/core /home/tfoerste/devel/linux/linux -batch -ex bt
> [New LWP 6911]
> Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'.
> Program terminated with signal 6, Aborted.
> #0 0xb77a7424 in __kernel_vsyscall ()
> #0 0xb77a7424 in __kernel_vsyscall ()
> #1 0x083bdf35 in kill ()
> #2 0x0807296d in uml_abort () at arch/um/os-Linux/util.c:93
> #3 0x08072ca5 in os_dump_core () at arch/um/os-Linux/util.c:148
> #4 0x080623c4 in panic_exit (self=0x85c1558 <panic_exit_notifier>, unused1=0, unused2=0x85f76e0 <buf.16221>) at arch/um/kernel/um_arch.c:240
> #5 0x0809ba86 in notifier_call_chain (nl=0x0, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
> #6 0x0809bba1 in __atomic_notifier_call_chain (nh=0x85f76c4 <panic_notifier_list>, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=0, nr_calls=0x0) at kernel/notifier.c:182
> #7 0x0809bbdf in atomic_notifier_call_chain (nh=0x0, val=0, v=0x0) at kernel/notifier.c:191
> #8 0x0841b5bc in panic (fmt=0x0) at kernel/panic.c:130
> #9 0x0841c470 in balance_dirty_pages (pages_dirtied=23, mapping=<optimized out>) at mm/page-writeback.c:1467
> #10 0x080d3595 in balance_dirty_pages_ratelimited (mapping=0x6) at mm/page-writeback.c:1661
> #11 0x080e4a3f in __do_fault (mm=0x45ba3600, vma=0x48777b90, address=1084678144, pmd=0x0, pgoff=0, flags=0, orig_pte=<incomplete type>) at mm/memory.c:3452
> #12 0x080e6e0f in do_linear_fault (orig_pte=..., flags=<optimized out>, pmd=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>, page_table=<optimized out>) at mm/memory.c:3486
> #13 handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3710
> #14 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3845
> #15 handle_mm_fault (mm=0x45ba3600, vma=0x487034c8, address=1084678144, flags=1) at mm/memory.c:3868
> #16 0x080e7817 in __get_user_pages (tsk=0x48705800, mm=0x45ba3600, start=1084678144, nr_pages=1025, gup_flags=519, pages=0x48558000, vmas=0x0, nonblocking=0x0) at mm/memory.c:1822
> #17 0x080e7ae3 in get_user_pages (tsk=0x0, mm=0x0, start=0, nr_pages=0, write=1, force=0, pages=0x48777b90, vmas=0x6) at mm/memory.c:2019
> #18 0x08143dc6 in aio_setup_ring (ctx=<optimized out>) at fs/aio.c:340
> #19 ioctx_alloc (nr_events=<optimized out>) at fs/aio.c:605
> #20 SYSC_io_setup (ctxp=<optimized out>, nr_events=<optimized out>) at fs/aio.c:1122
> #21 SyS_io_setup (nr_events=65535, ctxp=135081984) at fs/aio.c:1105
> #22 0x08062984 in handle_syscall (r=0x487059d4) at arch/um/kernel/skas/syscall.c:35
> #23 0x08074fb5 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:198
> #24 userspace (regs=0x487059d4) at arch/um/os-Linux/skas/process.c:431
> #25 0x0805f750 in fork_handler () at arch/um/kernel/process.c:160
> #26 0x00000000 in ?? ()
>
>
>
>> Gr{oetje,eeting}s,
>>
>> Geert
>>
>> --
>> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>>
>> In personal conversations with technical people, I call myself a hacker. But
>> when I'm talking to journalists I just say "programmer" or something like that.
>> -- Linus Torvalds
>>
>
>
WARNING: multiple messages have this Message-ID (diff)
From: Richard Weinberger <richard@nod.at>
To: "Toralf Förster" <toralf.foerster@gmx.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>,
UML devel <user-mode-linux-devel@lists.sourceforge.net>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
hannes@cmpxchg.org, darrick.wong@oracle.com,
Michal Hocko <mhocko@suse.cz>
Subject: Re: [uml-devel] BUG: soft lockup for a user mode linux image
Date: Wed, 09 Oct 2013 20:43:50 +0200 [thread overview]
Message-ID: <5255A3E6.6020100@nod.at> (raw)
In-Reply-To: <525591AD.4060401@gmx.de>
CC'ing mm folks.
Please see below.
Am 09.10.2013 19:26, schrieb Toralf FA?rster:
> On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote:
>> On Sun, Oct 6, 2013 at 11:01 PM, Toralf FA?rster <toralf.foerster@gmx.de> wrote:
>>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON()
>>>> asserts its strict positive?!?
>>>>
>>>> Can you please try the following instead of the BUG_ON():
>>>>
>>>> if (pause < 0) {
>>>> printk("pages_dirtied = %lu\n", pages_dirtied);
>>>> printk("task_ratelimit = %lu\n", task_ratelimit);
>>>> printk("pause = %ld\n", pause);
>>>> }
>>>>
>>>> Gr{oetje,eeting}s,
>>>>
>>>> Geert
>>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output.
>>> As soon as the issue happens I do have a
>>>
>>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521]
>>>
>>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able
>>> to run very basic commands like a crash dump, sysrq ond so on.
>>
>> You may get an idea of the magnitude of pages_dirtied by using a chain of
>> BUG_ON()s, like:
>>
>> BUG_ON(pages_dirtied > 2000000000);
>> BUG_ON(pages_dirtied > 1000000000);
>> BUG_ON(pages_dirtied > 100000000);
>> BUG_ON(pages_dirtied > 10000000);
>> BUG_ON(pages_dirtied > 1000000);
>>
>> Probably 1 million is already too much for normal operation?
>>
> period = HZ * pages_dirtied / task_ratelimit;
> BUG_ON(pages_dirtied > 2000000000);
> BUG_ON(pages_dirtied > 1000000000); <-------------- this is line 1467
Summary for mm people:
Toralf runs trinty on UML/i386.
After some time pages_dirtied becomes very large.
More than 1000000000 pages in this case.
Thus, period = HZ * pages_dirtied / task_ratelimit overflows
and period/pause becomes extremely large.
It looks like io_schedule_timeout() get's called with a very large timeout.
I don't know why "if (unlikely(pause > max_pause)) {" does not help.
Any ideas?
Thanks,
//richard
> the back trace is :
>
> tfoerste@n22 ~/devel/linux $ gdb --core=/mnt/ramdisk/core /home/tfoerste/devel/linux/linux -batch -ex bt
> [New LWP 6911]
> Core was generated by `/home/tfoerste/devel/linux/linux earlyprintk ubda=/home/tfoerste/virtual/uml/tr'.
> Program terminated with signal 6, Aborted.
> #0 0xb77a7424 in __kernel_vsyscall ()
> #0 0xb77a7424 in __kernel_vsyscall ()
> #1 0x083bdf35 in kill ()
> #2 0x0807296d in uml_abort () at arch/um/os-Linux/util.c:93
> #3 0x08072ca5 in os_dump_core () at arch/um/os-Linux/util.c:148
> #4 0x080623c4 in panic_exit (self=0x85c1558 <panic_exit_notifier>, unused1=0, unused2=0x85f76e0 <buf.16221>) at arch/um/kernel/um_arch.c:240
> #5 0x0809ba86 in notifier_call_chain (nl=0x0, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=-2, nr_calls=0x0) at kernel/notifier.c:93
> #6 0x0809bba1 in __atomic_notifier_call_chain (nh=0x85f76c4 <panic_notifier_list>, val=0, v=0x85f76e0 <buf.16221>, nr_to_call=0, nr_calls=0x0) at kernel/notifier.c:182
> #7 0x0809bbdf in atomic_notifier_call_chain (nh=0x0, val=0, v=0x0) at kernel/notifier.c:191
> #8 0x0841b5bc in panic (fmt=0x0) at kernel/panic.c:130
> #9 0x0841c470 in balance_dirty_pages (pages_dirtied=23, mapping=<optimized out>) at mm/page-writeback.c:1467
> #10 0x080d3595 in balance_dirty_pages_ratelimited (mapping=0x6) at mm/page-writeback.c:1661
> #11 0x080e4a3f in __do_fault (mm=0x45ba3600, vma=0x48777b90, address=1084678144, pmd=0x0, pgoff=0, flags=0, orig_pte=<incomplete type>) at mm/memory.c:3452
> #12 0x080e6e0f in do_linear_fault (orig_pte=..., flags=<optimized out>, pmd=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>, page_table=<optimized out>) at mm/memory.c:3486
> #13 handle_pte_fault (flags=<optimized out>, pmd=<optimized out>, pte=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3710
> #14 __handle_mm_fault (flags=<optimized out>, address=<optimized out>, vma=<optimized out>, mm=<optimized out>) at mm/memory.c:3845
> #15 handle_mm_fault (mm=0x45ba3600, vma=0x487034c8, address=1084678144, flags=1) at mm/memory.c:3868
> #16 0x080e7817 in __get_user_pages (tsk=0x48705800, mm=0x45ba3600, start=1084678144, nr_pages=1025, gup_flags=519, pages=0x48558000, vmas=0x0, nonblocking=0x0) at mm/memory.c:1822
> #17 0x080e7ae3 in get_user_pages (tsk=0x0, mm=0x0, start=0, nr_pages=0, write=1, force=0, pages=0x48777b90, vmas=0x6) at mm/memory.c:2019
> #18 0x08143dc6 in aio_setup_ring (ctx=<optimized out>) at fs/aio.c:340
> #19 ioctx_alloc (nr_events=<optimized out>) at fs/aio.c:605
> #20 SYSC_io_setup (ctxp=<optimized out>, nr_events=<optimized out>) at fs/aio.c:1122
> #21 SyS_io_setup (nr_events=65535, ctxp=135081984) at fs/aio.c:1105
> #22 0x08062984 in handle_syscall (r=0x487059d4) at arch/um/kernel/skas/syscall.c:35
> #23 0x08074fb5 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:198
> #24 userspace (regs=0x487059d4) at arch/um/os-Linux/skas/process.c:431
> #25 0x0805f750 in fork_handler () at arch/um/kernel/process.c:160
> #26 0x00000000 in ?? ()
>
>
>
>> Gr{oetje,eeting}s,
>>
>> Geert
>>
>> --
>> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>>
>> In personal conversations with technical people, I call myself a hacker. But
>> when I'm talking to journalists I just say "programmer" or something like that.
>> -- Linus Torvalds
>>
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-10-09 18:43 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-02 18:30 [uml-devel] BUG: soft lockup for a user mode linux image Toralf Förster
2013-10-02 18:30 ` Toralf Förster
2013-10-02 18:31 ` [uml-devel] " Toralf Förster
2013-10-02 18:31 ` Toralf Förster
2013-10-02 19:55 ` [uml-devel] " Richard Weinberger
2013-10-02 19:55 ` Richard Weinberger
2013-10-03 18:54 ` Toralf Förster
2013-10-03 18:54 ` Toralf Förster
2013-10-03 19:04 ` Richard Weinberger
2013-10-03 19:04 ` Richard Weinberger
2013-10-03 19:16 ` [uml-devel] " Toralf Förster
2013-10-03 19:16 ` Toralf Förster
2013-10-03 19:20 ` Richard Weinberger
2013-10-03 19:20 ` Richard Weinberger
2013-10-03 19:33 ` Toralf Förster
2013-10-03 19:33 ` Toralf Förster
2013-10-04 5:52 ` Richard Weinberger
2013-10-04 5:52 ` Richard Weinberger
2013-10-06 14:17 ` Toralf Förster
2013-10-06 14:17 ` Toralf Förster
2013-10-06 18:38 ` Geert Uytterhoeven
2013-10-06 18:38 ` Geert Uytterhoeven
2013-10-06 20:08 ` Toralf Förster
2013-10-06 20:26 ` Geert Uytterhoeven
2013-10-06 21:01 ` Toralf Förster
2013-10-08 20:07 ` Geert Uytterhoeven
2013-10-08 20:07 ` Geert Uytterhoeven
2013-10-09 17:26 ` Toralf Förster
2013-10-09 18:43 ` Richard Weinberger [this message]
2013-10-09 18:43 ` Richard Weinberger
2013-10-09 21:47 ` Jan Kara
2013-10-09 21:47 ` Jan Kara
2013-10-09 22:33 ` Richard Weinberger
2013-10-09 22:33 ` Richard Weinberger
2013-10-09 22:33 ` Richard Weinberger
2013-10-10 16:49 ` Toralf Förster
2013-10-10 16:49 ` Toralf Förster
2013-10-10 16:49 ` Toralf Förster
2013-10-11 1:16 ` Fengguang Wu
2013-10-11 1:16 ` Fengguang Wu
2013-10-11 8:42 ` Toralf Förster
2013-10-11 8:42 ` Toralf Förster
2013-10-11 8:57 ` Fengguang Wu
2013-10-11 8:57 ` Fengguang Wu
2013-10-11 8:57 ` Fengguang Wu
2013-10-11 9:05 ` Fengguang Wu
2013-10-11 9:05 ` Fengguang Wu
2013-10-11 14:12 ` Toralf Förster
2013-10-11 14:12 ` Toralf Förster
2013-10-11 14:12 ` Toralf Förster
2013-10-12 0:43 ` [PATCH] writeback: fix negative bdi max pause Fengguang Wu
2013-10-12 0:43 ` Fengguang Wu
2013-10-12 4:45 ` [PATCH v2] " Fengguang Wu
2013-10-12 4:45 ` Fengguang Wu
2013-10-14 12:34 ` Jan Kara
2013-10-14 12:34 ` Jan Kara
2013-10-10 2:46 ` [uml-devel] BUG: soft lockup for a user mode linux image Fengguang Wu
2013-10-10 2:46 ` Fengguang Wu
2013-10-10 6:52 ` Geert Uytterhoeven
2013-10-10 6:52 ` Geert Uytterhoeven
2013-10-10 7:03 ` Fengguang Wu
2013-10-10 7:03 ` Fengguang Wu
2013-10-08 19:56 ` Toralf Förster
2013-10-09 10:35 ` stian
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5255A3E6.6020100@nod.at \
--to=richard@nod.at \
--cc=akpm@linux-foundation.org \
--cc=darrick.wong@oracle.com \
--cc=geert@linux-m68k.org \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=toralf.foerster@gmx.de \
--cc=user-mode-linux-devel@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.