Re: Kernel crash in cgroup_pidlist_destroy_work_fn()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Cong Wang <xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Kernel crash in cgroup_pidlist_destroy_work_fn()
Date: Wed, 17 Sep 2014 17:26:07 +0800	[thread overview]
Message-ID: <541953AF.7000201@huawei.com> (raw)
In-Reply-To: <54191C41.3080003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

On 2014/9/17 13:29, Li Zefan wrote:
> On 2014/9/17 7:56, Cong Wang wrote:
>> Hi, Tejun
>>
>>
>> We saw some kernel null pointer dereference in
>> cgroup_pidlist_destroy_work_fn(), more precisely at
>> __mutex_lock_slowpath(), on 3.14. I can show you the full stack trace
>> on request.
>>
> 
> Yes, please.
> 
>> Looking at the code, it seems flush_workqueue() doesn't care about new
>> incoming works, it only processes currently pending ones, if this is
>> correct, then we could have the following race condition:
>>
>> cgroup_pidlist_destroy_all():
>>         //...
>>         mutex_lock(&cgrp->pidlist_mutex);
>>         list_for_each_entry_safe(l, tmp_l, &cgrp->pidlists, links)
>>                 mod_delayed_work(cgroup_pidlist_destroy_wq,
>> &l->destroy_dwork, 0);
>>         mutex_unlock(&cgrp->pidlist_mutex);
>>
>>         // <--- another process calls cgroup_pidlist_start() here
>> since mutex is released
>>
>>         flush_workqueue(cgroup_pidlist_destroy_wq); // <--- another
>> process adds new pidlist and queue work in pararell
>>         BUG_ON(!list_empty(&cgrp->pidlists)); // <--- This check is
>> passed, list_add() could happen after this
>>
> 
> Did you confirm this is what happened when the bug was triggered?
> 
> I don't think the race condition you described exists. In 3.14 kernel,
> cgroup_diput() won't be called if there is any thread running
> cgroup_pidlist_start(). This is guaranteed by vfs.
> 
> But newer kernels are different. Looks like the bug exists in those
> kernels.
> 

Newer kernels should be also fine.

If cgroup_pidlist_destroy_all() is called, it means kernfs has already
removed the tasks file, and even if you still have it opened, when
you try to read it, it will immediately return an errno.

                                fd = open(cgrp/tasks)
cgroup_rmdir(cgrp)
  cgroup_destroy_locked(c)
    kernfs_remove()
  ...
css_free_work_fn()
  cgroup_pidlist_destroy_all()
                               read(fd of cgrp/tasks)
                                 return -ENODEV

So cgroup_pidlist_destroy_all() won't race with cgroup_pidlist_start().

WARNING: multiple messages have this Message-ID (diff)

From: Li Zefan <lizefan@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	<cgroups@vger.kernel.org>
Subject: Re: Kernel crash in cgroup_pidlist_destroy_work_fn()
Date: Wed, 17 Sep 2014 17:26:07 +0800	[thread overview]
Message-ID: <541953AF.7000201@huawei.com> (raw)
In-Reply-To: <54191C41.3080003@huawei.com>

On 2014/9/17 13:29, Li Zefan wrote:
> On 2014/9/17 7:56, Cong Wang wrote:
>> Hi, Tejun
>>
>>
>> We saw some kernel null pointer dereference in
>> cgroup_pidlist_destroy_work_fn(), more precisely at
>> __mutex_lock_slowpath(), on 3.14. I can show you the full stack trace
>> on request.
>>
> 
> Yes, please.
> 
>> Looking at the code, it seems flush_workqueue() doesn't care about new
>> incoming works, it only processes currently pending ones, if this is
>> correct, then we could have the following race condition:
>>
>> cgroup_pidlist_destroy_all():
>>         //...
>>         mutex_lock(&cgrp->pidlist_mutex);
>>         list_for_each_entry_safe(l, tmp_l, &cgrp->pidlists, links)
>>                 mod_delayed_work(cgroup_pidlist_destroy_wq,
>> &l->destroy_dwork, 0);
>>         mutex_unlock(&cgrp->pidlist_mutex);
>>
>>         // <--- another process calls cgroup_pidlist_start() here
>> since mutex is released
>>
>>         flush_workqueue(cgroup_pidlist_destroy_wq); // <--- another
>> process adds new pidlist and queue work in pararell
>>         BUG_ON(!list_empty(&cgrp->pidlists)); // <--- This check is
>> passed, list_add() could happen after this
>>
> 
> Did you confirm this is what happened when the bug was triggered?
> 
> I don't think the race condition you described exists. In 3.14 kernel,
> cgroup_diput() won't be called if there is any thread running
> cgroup_pidlist_start(). This is guaranteed by vfs.
> 
> But newer kernels are different. Looks like the bug exists in those
> kernels.
> 

Newer kernels should be also fine.

If cgroup_pidlist_destroy_all() is called, it means kernfs has already
removed the tasks file, and even if you still have it opened, when
you try to read it, it will immediately return an errno.

                                fd = open(cgrp/tasks)
cgroup_rmdir(cgrp)
  cgroup_destroy_locked(c)
    kernfs_remove()
  ...
css_free_work_fn()
  cgroup_pidlist_destroy_all()
                               read(fd of cgrp/tasks)
                                 return -ENODEV

So cgroup_pidlist_destroy_all() won't race with cgroup_pidlist_start().

next prev parent reply	other threads:[~2014-09-17  9:26 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-16 23:56 Kernel crash in cgroup_pidlist_destroy_work_fn() Cong Wang
2014-09-16 23:56 ` Cong Wang
     [not found] ` <CAM_iQpVNzx1r8x-bP5CoiCX8PFk15JYHw_XfpYvJGgdkFHj8Gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-17  5:29   ` Li Zefan
2014-09-17  5:29     ` Li Zefan
     [not found]     ` <54191C41.3080003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-09-17  9:26       ` Li Zefan [this message]
2014-09-17  9:26         ` Li Zefan
     [not found]         ` <541953AF.7000201-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-09-19  0:23           ` Cong Wang
2014-09-19  0:23             ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=541953AF.7000201@huawei.com \
    --to=lizefan-hv44wf8li93qt0dzr+alfa@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.