All of lore.kernel.org
 help / color / mirror / Atom feed
From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Cong Wang <xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Kernel crash in cgroup_pidlist_destroy_work_fn()
Date: Wed, 17 Sep 2014 13:29:37 +0800	[thread overview]
Message-ID: <54191C41.3080003@huawei.com> (raw)
In-Reply-To: <CAM_iQpVNzx1r8x-bP5CoiCX8PFk15JYHw_XfpYvJGgdkFHj8Gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 2014/9/17 7:56, Cong Wang wrote:
> Hi, Tejun
> 
> 
> We saw some kernel null pointer dereference in
> cgroup_pidlist_destroy_work_fn(), more precisely at
> __mutex_lock_slowpath(), on 3.14. I can show you the full stack trace
> on request.
> 

Yes, please.

> Looking at the code, it seems flush_workqueue() doesn't care about new
> incoming works, it only processes currently pending ones, if this is
> correct, then we could have the following race condition:
> 
> cgroup_pidlist_destroy_all():
>         //...
>         mutex_lock(&cgrp->pidlist_mutex);
>         list_for_each_entry_safe(l, tmp_l, &cgrp->pidlists, links)
>                 mod_delayed_work(cgroup_pidlist_destroy_wq,
> &l->destroy_dwork, 0);
>         mutex_unlock(&cgrp->pidlist_mutex);
> 
>         // <--- another process calls cgroup_pidlist_start() here
> since mutex is released
> 
>         flush_workqueue(cgroup_pidlist_destroy_wq); // <--- another
> process adds new pidlist and queue work in pararell
>         BUG_ON(!list_empty(&cgrp->pidlists)); // <--- This check is
> passed, list_add() could happen after this
> 

Did you confirm this is what happened when the bug was triggered?

I don't think the race condition you described exists. In 3.14 kernel,
cgroup_diput() won't be called if there is any thread running
cgroup_pidlist_start(). This is guaranteed by vfs.

But newer kernels are different. Looks like the bug exists in those
kernels.

> 
> Therefore, the newly added pidlist will point to a freed cgroup, and
> when it is freed in the delayed work we will crash.
> 
> The attached patch (compile test ONLY) could be a possible fix, since
> it will check and hold a refcount on this cgroup in
> cgroup_pidlist_start(). But I could very easily miss something here
> since there are many cgroup changes after 3.14 and I don't follow
> cgroup development.
> 
> What do you think?
> 

WARNING: multiple messages have this Message-ID (diff)
From: Li Zefan <lizefan@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	<cgroups@vger.kernel.org>
Subject: Re: Kernel crash in cgroup_pidlist_destroy_work_fn()
Date: Wed, 17 Sep 2014 13:29:37 +0800	[thread overview]
Message-ID: <54191C41.3080003@huawei.com> (raw)
In-Reply-To: <CAM_iQpVNzx1r8x-bP5CoiCX8PFk15JYHw_XfpYvJGgdkFHj8Gw@mail.gmail.com>

On 2014/9/17 7:56, Cong Wang wrote:
> Hi, Tejun
> 
> 
> We saw some kernel null pointer dereference in
> cgroup_pidlist_destroy_work_fn(), more precisely at
> __mutex_lock_slowpath(), on 3.14. I can show you the full stack trace
> on request.
> 

Yes, please.

> Looking at the code, it seems flush_workqueue() doesn't care about new
> incoming works, it only processes currently pending ones, if this is
> correct, then we could have the following race condition:
> 
> cgroup_pidlist_destroy_all():
>         //...
>         mutex_lock(&cgrp->pidlist_mutex);
>         list_for_each_entry_safe(l, tmp_l, &cgrp->pidlists, links)
>                 mod_delayed_work(cgroup_pidlist_destroy_wq,
> &l->destroy_dwork, 0);
>         mutex_unlock(&cgrp->pidlist_mutex);
> 
>         // <--- another process calls cgroup_pidlist_start() here
> since mutex is released
> 
>         flush_workqueue(cgroup_pidlist_destroy_wq); // <--- another
> process adds new pidlist and queue work in pararell
>         BUG_ON(!list_empty(&cgrp->pidlists)); // <--- This check is
> passed, list_add() could happen after this
> 

Did you confirm this is what happened when the bug was triggered?

I don't think the race condition you described exists. In 3.14 kernel,
cgroup_diput() won't be called if there is any thread running
cgroup_pidlist_start(). This is guaranteed by vfs.

But newer kernels are different. Looks like the bug exists in those
kernels.

> 
> Therefore, the newly added pidlist will point to a freed cgroup, and
> when it is freed in the delayed work we will crash.
> 
> The attached patch (compile test ONLY) could be a possible fix, since
> it will check and hold a refcount on this cgroup in
> cgroup_pidlist_start(). But I could very easily miss something here
> since there are many cgroup changes after 3.14 and I don't follow
> cgroup development.
> 
> What do you think?
> 


  parent reply	other threads:[~2014-09-17  5:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-16 23:56 Kernel crash in cgroup_pidlist_destroy_work_fn() Cong Wang
2014-09-16 23:56 ` Cong Wang
     [not found] ` <CAM_iQpVNzx1r8x-bP5CoiCX8PFk15JYHw_XfpYvJGgdkFHj8Gw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-09-17  5:29   ` Li Zefan [this message]
2014-09-17  5:29     ` Li Zefan
     [not found]     ` <54191C41.3080003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-09-17  9:26       ` Li Zefan
2014-09-17  9:26         ` Li Zefan
     [not found]         ` <541953AF.7000201-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-09-19  0:23           ` Cong Wang
2014-09-19  0:23             ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54191C41.3080003@huawei.com \
    --to=lizefan-hv44wf8li93qt0dzr+alfa@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=xiyou.wangcong-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.