From: Fengguang Wu <fengguang.wu@intel.com>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Jan Kara" <jack@suse.cz>, "Richard Weinberger" <richard@nod.at>,
	"Toralf Förster" <toralf.foerster@gmx.de>,
	"UML devel" <user-mode-linux-devel@lists.sourceforge.net>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	hannes@cmpxchg.org, darrick.wong@oracle.com,
	"Michal Hocko" <mhocko@suse.cz>,
	"Gu Zheng" <guz.fnst@cn.fujitsu.com>,
	"Benjamin LaHaise" <bcrl@kvack.org>
Subject: Re: [uml-devel] BUG: soft lockup for a user mode linux image
Date: Thu, 10 Oct 2013 15:03:24 +0800
Message-ID: <20131010070324.GA24244@localhost>
In-Reply-To: <CAMuHMdWSQrgMtW84QLs1Q96Jg-sYntS9Ohz-sXd3dWhuR2O7mw@mail.gmail.com>

On Thu, Oct 10, 2013 at 08:52:33AM +0200, Geert Uytterhoeven wrote:
> On Thu, Oct 10, 2013 at 4:46 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> > On Wed, Oct 09, 2013 at 11:47:33PM +0200, Jan Kara wrote:
> >> On Wed 09-10-13 20:43:50, Richard Weinberger wrote:
> >> > Am 09.10.2013 19:26, schrieb Toralf Förster:
> >> > > On 10/08/2013 10:07 PM, Geert Uytterhoeven wrote:
> >> > >> On Sun, Oct 6, 2013 at 11:01 PM, Toralf Förster <toralf.foerster@gmx.de> wrote:
> >> > >>>> Hmm, now pages_dirtied is zero, according to the backtrace, but the BUG_ON()
> >> > >>>> asserts it's strictly positive?!?
> >> > >>>>
> >> > >>>> Can you please try the following instead of the BUG_ON():
> >> > >>>>
> >> > >>>> if (pause < 0) {
> >> > >>>>         printk("pages_dirtied = %lu\n", pages_dirtied);
> >> > >>>>         printk("task_ratelimit = %lu\n", task_ratelimit);
> >> > >>>>         printk("pause = %ld\n", pause);
> 
> >> > >>> I tried it in different ways already - I'm completely unsuccessful in getting any printk output.
> >> > >>> As soon as the issue happens I do have a
> >> > >>>
> >> > >>> BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521]
> >> > >>>
> >> > >>> at stderr of the UML and then no further input is accepted. With uml_mconsole I'm however able
> >> > >>> to run very basic commands like a crash dump, sysrq and so on.
> >> > >>
> >> > >> You may get an idea of the magnitude of pages_dirtied by using a chain of
> >> > >> BUG_ON()s, like:
> >> > >>
> >> > >> BUG_ON(pages_dirtied > 2000000000);
> >> > >> BUG_ON(pages_dirtied > 1000000000);
> >> > >> BUG_ON(pages_dirtied > 100000000);
> >> > >> BUG_ON(pages_dirtied > 10000000);
> >> > >> BUG_ON(pages_dirtied > 1000000);
> >> > >>
> >> > >> Probably 1 million is already too much for normal operation?
> >> > >>
> >> > > period = HZ * pages_dirtied / task_ratelimit;
> >> > >           BUG_ON(pages_dirtied > 2000000000);
> >> > >           BUG_ON(pages_dirtied > 1000000000);      <-------------- this is line 1467
> >> >
> >> > Summary for mm people:
> >> >
> >> > Toralf runs trinity on UML/i386.
> >> > After some time pages_dirtied becomes very large.
> >> > More than 1000000000 pages in this case.
> >>   Huh, this is really strange. pages_dirtied is passed into
> >> balance_dirty_pages() from current->nr_dirtied. So I wonder how a value
> >> over 10^9 can get there.
> >
> > I noticed aio_setup_ring() in the call trace and found that commit
> > 36bc08cc01 ("fs/aio: Add support to aio ring pages migration") recently
> > added a SetPageDirty() call in a loop there. So I added its authors to CC.
> >
> >> After all that is over 4TB so I somewhat doubt the
> >> task was ever able to dirty that much during its lifetime (but correct me
> >> if I'm wrong here, with UML and memory backed disks it is not totally
> >> impossible)... I went through the logic of handling ->nr_dirtied but
> >> I didn't find any obvious problem there. Hum, maybe one thing - what
> >> 'task_ratelimit' values do you see in balance_dirty_pages? If that one was
> >> huge, we could possibly accumulate huge current->nr_dirtied.
> >>
> >> > Thus, period = HZ * pages_dirtied / task_ratelimit overflows
> >> > and period/pause becomes extremely large.
> 
> period/pause are signed long, so they become negative instead of
> extremely large when overflowing.

Yeah. For that we have underflow detection as well:

                if (pause < min_pause) {
                        ...
                        break;
                }

So we'll break out of the loop -- but yeah, whether the break is the
right behavior on underflow is still questionable.
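
To make the arithmetic concrete, here is a minimal user-space sketch of how
the period computation can go negative on i386. It is an illustration only,
not the kernel code: period_i386() and pause_underflows() are hypothetical
helpers, the 32-bit types stand in for i386 "unsigned long"/"long", pause is
assumed to equal period, and the input values are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "period = HZ * pages_dirtied / task_ratelimit" on i386, where
 * unsigned long and long are 32 bits wide. The multiplication is done in
 * unsigned arithmetic and wraps modulo 2^32; the result is then stored in
 * a signed variable. (Hypothetical helper, not a kernel function.)
 */
static int32_t period_i386(uint32_t hz, uint32_t pages_dirtied,
                           uint32_t task_ratelimit)
{
    assert(task_ratelimit != 0);            /* avoid division by zero */
    uint32_t product = hz * pages_dirtied;  /* wraps modulo 2^32 */
    uint32_t quotient = product / task_ratelimit;

    /*
     * Out-of-range unsigned-to-signed conversion is implementation-defined
     * in C; gcc and clang reduce it modulo 2^32, so any quotient above
     * INT32_MAX shows up as a negative period.
     */
    return (int32_t)quotient;
}

/* Mirrors the shape of the "if (pause < min_pause) break;" check. */
static int pause_underflows(int32_t pause, int32_t min_pause)
{
    return pause < min_pause;
}
```

With a large enough pages_dirtied and a small task_ratelimit, the quotient
exceeds INT32_MAX, period_i386() returns a negative value, and
pause_underflows() reports that the loop would take the break path.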

> >> > It looks like io_schedule_timeout() gets called with a very large timeout.
> >> > I don't know why "if (unlikely(pause > max_pause)) {" does not help.
> 
> Because pause is now negative.

So here io_schedule_timeout() won't be called with a negative pause.

And if io_schedule_timeout() ever calls schedule_timeout() with a
negative timeout, the latter will emit a warning and break out, too:

                if (timeout < 0) {
                        printk(KERN_ERR "schedule_timeout: wrong timeout "
                                "value %lx\n", timeout);
                        dump_stack();
                        current->state = TASK_RUNNING;
                        goto out;
                }
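
As a rough user-space model of that guard (a sketch only: classify_timeout()
and the enum are hypothetical names, and LONG_MAX stands in for the kernel's
MAX_SCHEDULE_TIMEOUT), the three cases can be told apart like this:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical labels for what the caller would do with a timeout. */
enum timeout_action {
    SLEEP_FOREVER,    /* timeout equals the "no timeout" sentinel */
    REJECT_NEGATIVE,  /* bogus negative value: warn and return at once */
    SLEEP_TIMED       /* ordinary case: arm a timer and sleep */
};

static enum timeout_action classify_timeout(long timeout)
{
    /* Assumption: the "sleep forever" sentinel is LONG_MAX. */
    if (timeout == LONG_MAX)
        return SLEEP_FOREVER;

    if (timeout < 0) {
        /* Same message format as the snippet quoted above. */
        fprintf(stderr, "schedule_timeout: wrong timeout value %lx\n",
                (unsigned long)timeout);
        return REJECT_NEGATIVE;
    }

    return SLEEP_TIMED;
}
```

So a negative value that slipped through would be reported and skipped
rather than turned into a very long sleep.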

Thanks,
Fengguang

