From: Li Zefan <lizf@cn.fujitsu.com>
To: Paul Menage <menage@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Linux Containers <containers@lists.linux-foundation.org>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Paul Jackson <pj@sgi.com>
Subject: Re: [PATCH] cgroup: fix a race condition in manipulating tsk->cg_list
Date: Thu, 17 Apr 2008 13:04:47 +0800
Message-ID: <4806DA6F.3000405@cn.fujitsu.com>
In-Reply-To: <6599ad830804162118g6b24d8ebq26b0d72133b0e19e@mail.gmail.com>
Paul Menage wrote:
> On Wed, Apr 16, 2008 at 8:37 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> When I ran a test program to fork mass processes and at the same time
>> 'cat /cgroup/tasks', I got the following oops:
>>
>> ------------[ cut here ]------------
>> kernel BUG at lib/list_debug.c:72!
>> invalid opcode: 0000 [#1] SMP
>> Pid: 4178, comm: a.out Not tainted (2.6.25-rc9 #72)
>> ...
>> Call Trace:
>> [<c044a5f9>] ? cgroup_exit+0x55/0x94
>> [<c0427acf>] ? do_exit+0x217/0x5ba
>> [<c0427ed7>] ? do_group_exit+0x65/0x7c
>> [<c0427efd>] ? sys_exit_group+0xf/0x11
>> [<c0404842>] ? syscall_call+0x7/0xb
>> [<c05e0000>] ? init_cyrix+0x2fa/0x479
>> ...
>> EIP: [<c04df671>] list_del+0x35/0x53 SS:ESP 0068:ebc7df4
>> ---[ end trace caffb7332252612b ]---
>> Fixing recursive fault but reboot is needed!
>>
>> After digging into the code and debugging, I finally found a race
>> condition:
>>   do_exit()
>>     ->cgroup_exit()
>>       ->if (!list_empty(&tsk->cg_list))
>>           list_del(&tsk->cg_list);
>>
>>   cgroup_iter_start()
>>     ->cgroup_enable_task_cg_list()
>>       ->list_add(&tsk->cg_list, ..);
>>
>> In this case the tsk->cg_list entry is never removed from the list,
>> even though the process has exited.
>>
>> We got two bug reports in the past, which seem to be the same bug as
>> this one:
>> http://lkml.org/lkml/2008/3/5/332
>> http://lkml.org/lkml/2007/10/17/224
>
> Yes, that looks like it could be the same one - great. But this
> corruption can only be triggered the first time you cat a tasks file
> after a reboot, right? That would partly explain why it was hard to
> reproduce (at least, I had trouble).
>
Right. I was lucky to trigger it, and that showed me how to reproduce it.
> My only thought about the downside of this is that an exiting task
> that gets stuck somewhere between setting PF_EXITING and calling
> cgroup_exit() won't show up in its cgroup's tasks file, since we'll
> enable cgroup links but skip it. I guess that's not a big deal.
>
Agreed. I don't think it will be a problem.
> Maybe it would be better to not do a cgroup_exit() until we're
> unhashed, so that cgroup_enable_task_cg_list() can't find the exiting
> task?
>
> Paul
>
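
For reference, the interleaving discussed above can be replayed in a small
single-threaded userspace model. This is only a sketch under stated
assumptions, not the kernel code: the list helpers are simplified stand-ins
for the kernel's list API, and struct task, model_cgroup_exit(),
model_enable_cg_list() and replay_race() are hypothetical names introduced
for illustration. The "exiting" flag stands in for PF_EXITING. The real code
also depends on locking, which this model leaves out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink an entry and leave it pointing at itself (no list poisoning). */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

/* Hypothetical model of a task: only the two fields the race involves. */
struct task {
    bool exiting;             /* models PF_EXITING */
    struct list_head cg_list; /* models tsk->cg_list */
};

/* Models the exit path: unlink only if the task is already linked. */
static void model_cgroup_exit(struct task *t)
{
    if (!list_empty(&t->cg_list))
        list_del_init(&t->cg_list);
}

/*
 * Models the first reader of a tasks file linking one task.
 * With skip_exiting set, tasks already marked as exiting are skipped,
 * which is the behaviour discussed in this thread.
 */
static void model_enable_cg_list(struct task *t, struct list_head *tasks,
                                 bool skip_exiting)
{
    if (skip_exiting && t->exiting)
        return;
    if (list_empty(&t->cg_list))
        list_add(&t->cg_list, tasks);
}

/*
 * Replays the losing interleaving sequentially: the exit path runs its
 * list_empty() check first, then the reader links the task.
 * Returns true if an exited task is left behind on the list.
 */
static bool replay_race(bool skip_exiting)
{
    struct list_head tasks;
    struct task t;

    INIT_LIST_HEAD(&tasks);
    INIT_LIST_HEAD(&t.cg_list);
    t.exiting = false;
    assert(list_empty(&t.cg_list)); /* task starts unlinked */

    t.exiting = true;                               /* exit begins */
    model_cgroup_exit(&t);                          /* nothing to unlink */
    model_enable_cg_list(&t, &tasks, skip_exiting); /* reader links late */

    return !list_empty(&tasks);
}
```

Calling replay_race(false) leaves the exited task on the list, which is the
stale entry that a later list operation trips over. Calling
replay_race(true) leaves the list empty, at the cost Paul mentions: a task
that has started exiting is not shown in the tasks file.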