From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Bruno Prémont" <bonbons@linux-vserver.org>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Ingo Molnar" <mingo@elte.hu>,
"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
"Mike Frysinger" <vapier.adi@gmail.com>,
"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org,
"Paul E. McKenney" <paul.mckenney@linaro.org>,
"Pekka Enberg" <penberg@kernel.org>
Subject: Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?
Date: Wed, 27 Apr 2011 15:27:27 -0700
Message-ID: <20110427222727.GU2135@linux.vnet.ibm.com>
In-Reply-To: <alpine.LFD.2.02.1104272351290.3323@ionos>

On Thu, Apr 28, 2011 at 12:06:11AM +0200, Thomas Gleixner wrote:
> On Wed, 27 Apr 2011, Bruno Prémont wrote:
> > On Wed, 27 April 2011 Bruno Prémont wrote:
> > Voluntary context switches stay constant from the time on SLABs pile up.
> > (which makes sense as it doesn't run get CPU slices anymore)
> >
> > > > Can you please enable CONFIG_SCHED_DEBUG and provide the output of
> > > > /proc/sched_stat when the problem surfaces and a minute after the
> > > > first snapshot?
> >
> > hm, did you mean CONFIG_SCHEDSTAT or /proc/sched_debug?
> >
> > I did use CONFIG_SCHED_DEBUG (and there is no /proc/sched_stat) so I took
> > /proc/sched_debug which exists... (attached, taken about 7min and +1min
> > after SLABs started piling up), though build processes were SIGSTOPped
> > during first minute.
>
> Oops. /proc/sched_debug is the right thing.
>
> > printk wrote (in case its timestamp is useful, more below):
> > [ 518.480103] sched: RT throttling activated
>
> Ok. Aside of the fact that the CPU time accounting is completely hosed
> this is pointing to the root cause of the problem.
>
> kthread_rcu seems to run in circles for whatever reason and the RT
> throttler catches it. After that things go down the drain completely
> as it should get on the CPU again after that 50ms throttling break.

Ah.  This could happen if there was a huge number of callbacks, in
which case blimit would be set very large and kthread_rcu could then
go CPU-bound.  And this workload was generating large numbers of
callbacks due to filesystem operations, right?

So, perhaps I should kick kthread_rcu back to SCHED_NORMAL if blimit
has been set high.  Or have some throttling of my own.  I must confess
that throttling kthread_rcu for two hours seems a bit harsh.  ;-)

If this was just throttling kthread_rcu for a few hundred milliseconds,
or even for a second or two, things would be just fine.

Left to myself, I will put together a patch that puts callback processing
down to SCHED_NORMAL in the case where there are huge numbers of
callbacks to be processed.

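The batch-limit behaviour and the proposed fallback described above can be modelled in a few lines. This is a minimal user-space sketch in Python, not kernel code and not the eventual patch. The names batch_limit, choose_policy and run_pass are hypothetical, the threshold values are illustrative assumptions, and SCHED_FIFO is assumed as the real-time policy that the kthread normally runs under.

```python
from collections import deque

BLIMIT_DEFAULT = 10   # callbacks invoked per pass under normal load (assumed value)
QHIMARK = 10_000      # backlog above which the batch limit is lifted (assumed value)


def batch_limit(queue_len, blimit=BLIMIT_DEFAULT, qhimark=QHIMARK):
    """Return how many callbacks a single pass may invoke.

    Below the high-water mark the pass is capped at blimit. Above it the
    cap is effectively removed, which is what lets the processing thread
    become CPU-bound when the backlog is huge.
    """
    if queue_len > qhimark:
        return queue_len
    return min(blimit, queue_len)


def choose_policy(queue_len, qhimark=QHIMARK):
    """Hypothetical policy choice: leave the real-time class when the
    backlog is large enough that the batch limit has been lifted."""
    return "SCHED_NORMAL" if queue_len > qhimark else "SCHED_FIFO"


def run_pass(queue):
    """Invoke one batch of callbacks from the queue; return the count invoked."""
    limit = batch_limit(len(queue))
    for _ in range(limit):
        callback = queue.popleft()
        callback()
    return limit


if __name__ == "__main__":
    backlog = deque([lambda: None] * 25)
    print(run_pass(backlog), len(backlog), choose_policy(len(backlog)))
```

With a short backlog each pass stays bounded and the thread yields quickly. Once the backlog crosses the threshold, a single pass drains everything, so running that pass under a real-time policy is what invites the RT throttler.
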
> Though we should not ignore the fact, that the RT throttler hit, but
> none of the RT tasks actually accumulated runtime.
>
> So there is a couple of questions:
>
> - Why does the scheduler detect the 950 ms RT runtime, but does
> not accumulate that runtime to any thread
>
> - Why is the runtime accounting totally hosed
>
> - Why does that not happen (at least not reproducible) with
> TREE_RCU

This one I can answer -- In Linus's tree, TREE_RCU still uses softirq,
so there is no RCU kthread, so there is nothing to throttle other
than ksoftirqd itself.

							Thanx, Paul

> I need some sleep now, but I will try to come up with sensible
> debugging tomorrow unless Paul or someone else beats me to it.
>
> Thanks,
>
> tglx