From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Paul Menage <menage@google.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>
Subject: Re: [BUG?] 2.6.25-rc[23]-mm1 cgroup list corruption under load with VM Scalability patches
Date: Tue, 18 Mar 2008 14:10:11 -0400
Message-ID: <1205863811.5032.26.camel@localhost>
In-Reply-To: <6599ad830803051309g22d5b746ta30c4f28a394572c@mail.gmail.com>
On Wed, 2008-03-05 at 13:09 -0800, Paul Menage wrote:
> On Wed, Mar 5, 2008 at 11:37 AM, Lee Schermerhorn
> <Lee.Schermerhorn@hp.com> wrote:
> > list_del corruption in cgroup_exit() on 16 cpu, 32GB ia64 NUMA platform.
> >
> > I've been seeing this for a while now, but we've had known problems
> > [page leaks, ...] with the VM scalability series. Now the system
> > appears to be running very well with these patches under stress loads
> > that would hang it or cause OOM kill of tests with plenty of swap space
> > left. Eventually, [after 40-45 minutes], I hit a list corruption in
> > cgroup_exit().
> >
> > I can't say for sure that our patches aren't causing this, but I've been
> > unable to keep the system up long enough under the stress load w/o the
> > splitlru+noreclaim patches to hit the problem.
> >
> > I looked in the mailing lists and found one other thread related to
> > cgroup list corruption:
> >
> > http://marc.info/?l=linux-kernel&m=119263666823236&w=4
> >
> > Paul looked into this and couldn't see anywhere that the lists are
> > manipulated w/o holding the css set lock. I concur. I did find one
> > possible race in enabling the task cg_lists [see patch below], but this
> > did not solve the problem. And I did not hit the printk in the patch.
>
> No, that's not a (malign) race - cgroup_enable_task_cg_lists() is
> idempotent. In the case that you see, every thread seen in the
> do_each_thread() loop will already have a non-empty cg_list field, so
> it will be a no-op. So adding the additional check isn't wrong but
> it's not needed.
>
> I'll look again at the code to try to figure out where the problem is.
Paul:
just wanted to let you know that I did manage to hit this list
corruption--same stack trace: cgroup_exit() from do_exit() ...--on
25-rc3-mm1 WITHOUT any of the vm scalability [split-lru/noreclaim-mlock]
patches applied. This occurred ~9 minutes into a fairly heavy 'usex'
load on my 16 cpu ia64 platform.
An x86_64 version [w/ prebuilt binaries of the tools used] of the stress
load is available here:
http://free.linux.hp.com/~lts/Temp/
There's a README there describing the contents of the tarball. I
haven't tried this load on an x86_64 recently, so I don't know if it
will trigger the problem there.
Lee