cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code
@ 2013-01-09 12:11 Hans de Goede
       [not found] ` <1357733486-1742-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2013-01-09 12:11 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

As a spare-time project I'm working on Linux on Allwinner A10 socs (ARM
mach-sun4i). While the linux-sunxi project is slowly working on cleaning
up the code and getting support for these socs upstream (first bits have
landed in 3.8), to be able to truely use these systems we're stuck on using
3.4 for now, since that is the latest to which all the ugly Allwinner code
has been forward-ported. See: http://linux-sunxi.org

I'm using Fedora-18 as userland, and since that uses systemd it exercises
the cgroup subsystem quite a bit. Thanks to building the kernel with
CONFIG_DEBUG_LIST this has exposed a use after free bug in the 3.4 cgroup
code. While debugging this I've also noticed some missing locking (or so
I believe).

The current cgroup code seems to be unaffected by both issues. Since the
current code is significantly changed, I've written a simple fix for 3.4,
which I would like to propose to be added to the 3.4.xx bugfix releases.

Allthough I know everyone dislikes looking back at old code once it has
been rewritten in a better fashion, I hope you are still willing to make
some time to review these 2 simple patches, as a first step in getting
them into 3.4.xx.

Some details on the differences between the current cgroup code, which
does not have these issues and the 3.4 code, in the current code:
1) The code-segment missing the locking has been removed
2) Unlike the 3.4 code cgroup_rmdir() properly checks cgrp->count

Note that the cgrp->count check in the current cgroup_rmdir() code has the
check before waiting on the cgroup_rmdir_waitq, and simply returns EBUSY when
cgrp->count != 0, where as my patch adds the check inside the waiting block
and waits for the waitq to be woken up.

The reason for this difference is that the current cgroup code handles
the clearing of a css_set / the link->cgrp_link_list elements from
cgrp->css_sets directly (and does not call cgroup_wakeup_rmdir_waiter()
when doing so), where as the 3.4 code handles the clearing of this list
from a workqueue, and *does* call cgroup_wakeup_rmdir_waiter() when clearing
the list.

Regards,

Hans

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty()
       [not found] ` <1357733486-1742-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-01-09 12:11   ` Hans de Goede
       [not found]     ` <1357733486-1742-2-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-01-09 12:11   ` [PATCH 3.4 2/2] cgroup: Fix use after free of cgrp (cgrp->css_sets) Hans de Goede
  2013-01-09 14:38   ` [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code Tejun Heo
  2 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2013-01-09 12:11 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Hans de Goede

As indicated in the comment above cgroup_css_sets_empty the
css_set_lock must be hold when calling it.

Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c908354..59e711a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3980,7 +3980,10 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 	/* the vfs holds both inode->i_mutex already */
 again:
 	mutex_lock(&cgroup_mutex);
-	if (!cgroup_css_sets_empty(cgrp)) {
+	read_lock(&css_set_lock);
+	ret = cgroup_css_sets_empty(cgrp);
+	read_unlock(&css_set_lock);
+	if (!ret) {
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3.4 2/2] cgroup: Fix use after free of cgrp (cgrp->css_sets)
       [not found] ` <1357733486-1742-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-01-09 12:11   ` [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty() Hans de Goede
@ 2013-01-09 12:11   ` Hans de Goede
  2013-01-09 14:38   ` [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code Tejun Heo
  2 siblings, 0 replies; 10+ messages in thread
From: Hans de Goede @ 2013-01-09 12:11 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Hans de Goede

Running a 3.4 kernel + Fedora-18 (systemd) userland on my Allwinner A10
(arm cortex a8), I'm seeing repeated, reproducable list_del list corruption
errors when build with CONFIG_DEBUG_LIST, and the backtrace always shows
free_css_set_work as the function making the problematic list_del call.

I've tracked this doen to a use after free of the cgrp struct, specifically
of the cgrp->css_sets list_head, which gets cleared by free_css_set_work.

Since free_css_set_work runs form a workqueue, it is possible for it to not be
done with clearing the list when the cgrp gets free-ed. To avoid this the code
adding the links increases cgrp->count, and the freeing code running from the
workqueue decreases cgrp->count *after* doing list_del, and then if the count
goes to 0 calls cgroup_wakeup_rmdir_waiter().

However cgroup_rmdir() is missing a check for cgrp->count != 0, causing it
to still continue with the rmdir (which leads to the free-ing of the cgrp),
before free_css_set_work is done. Sometimes the free-ed memory is re-used
before free_css_set_work gets around to unlinking link->cgrp_link_list,
triggering the list_del list corruption messages.

This patch fixes this by properly checking for cgrp->count != 0 and waiting
for the cgroup_rmdir_waitq in that case.

Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 59e711a..78c7a6f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4022,7 +4022,7 @@ again:
 		return -EBUSY;
 	}
 	prepare_to_wait(&cgroup_rmdir_waitq, &wait, TASK_INTERRUPTIBLE);
-	if (!cgroup_clear_css_refs(cgrp)) {
+	if (!cgroup_clear_css_refs(cgrp) || atomic_read(&cgrp->count) != 0) {
 		mutex_unlock(&cgroup_mutex);
 		/*
 		 * Because someone may call cgroup_wakeup_rmdir_waiter() before
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty()
       [not found]     ` <1357733486-1742-2-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-01-09 12:37       ` Hans de Goede
  0 siblings, 0 replies; 10+ messages in thread
From: Hans de Goede @ 2013-01-09 12:37 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Tejun Heo, Li Zefan, cgroups-u79uwXL29TY76Z2rM5mHXA

Hi,

Self-nack, looks like I posted this series too soon. With a config which actually
has CGROUP controllers enabled this causes a panic, so looks like I need to do some
more work on this.

Feedback on this issue is still appreciated.

Regards,

Hans


On 01/09/2013 01:11 PM, Hans de Goede wrote:
> As indicated in the comment above cgroup_css_sets_empty the
> css_set_lock must be hold when calling it.
>
> Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>   kernel/cgroup.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index c908354..59e711a 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -3980,7 +3980,10 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
>   	/* the vfs holds both inode->i_mutex already */
>   again:
>   	mutex_lock(&cgroup_mutex);
> -	if (!cgroup_css_sets_empty(cgrp)) {
> +	read_lock(&css_set_lock);
> +	ret = cgroup_css_sets_empty(cgrp);
> +	read_unlock(&css_set_lock);
> +	if (!ret) {
>   		mutex_unlock(&cgroup_mutex);
>   		return -EBUSY;
>   	}
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code
       [not found] ` <1357733486-1742-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-01-09 12:11   ` [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty() Hans de Goede
  2013-01-09 12:11   ` [PATCH 3.4 2/2] cgroup: Fix use after free of cgrp (cgrp->css_sets) Hans de Goede
@ 2013-01-09 14:38   ` Tejun Heo
       [not found]     ` <20130109143850.GG3926-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  2 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2013-01-09 14:38 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Li Zefan, cgroups-u79uwXL29TY76Z2rM5mHXA

Hello, Hans.

On Wed, Jan 09, 2013 at 01:11:24PM +0100, Hans de Goede wrote:
> Allthough I know everyone dislikes looking back at old code once it has
> been rewritten in a better fashion, I hope you are still willing to make
> some time to review these 2 simple patches, as a first step in getting
> them into 3.4.xx.

Well, the problem is that -stable doesn't take any patch which isn't
already upstream, so with upstream already diverted from the 3.4 code
base, there isn't any way to feed these patches to -stable.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code
       [not found]     ` <20130109143850.GG3926-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-01-09 14:48       ` Hans de Goede
       [not found]         ` <50ED8347.7000806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2013-01-09 14:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, cgroups-u79uwXL29TY76Z2rM5mHXA

Hi,

On 01/09/2013 03:38 PM, Tejun Heo wrote:
> Hello, Hans.
>
> On Wed, Jan 09, 2013 at 01:11:24PM +0100, Hans de Goede wrote:
>> Allthough I know everyone dislikes looking back at old code once it has
>> been rewritten in a better fashion, I hope you are still willing to make
>> some time to review these 2 simple patches, as a first step in getting
>> them into 3.4.xx.
>
> Well, the problem is that -stable doesn't take any patch which isn't
> already upstream,

I know that that is the generic rule, but I was hoping Greg KH would be
willing to make an exception. At least I can try :)

I've solved the panic btw, I'll send a v2 of the set right after this
mail, including an explanation what was wrong with v1.

I was hoping that I could get a review from you on v2 including an
Acked-by (assuming the review does not find any issues) before I very kindly
ask Greg KH to make an exception for this.

Thanks & Regards,

Hans

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty()
       [not found] ` <1357743822-21707-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-01-09 15:03   ` Hans de Goede
       [not found]     ` <1357743822-21707-2-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2013-01-09 15:03 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, Hans de Goede

As indicated in the comment above cgroup_css_sets_empty the
css_set_lock must be hold when calling it.

Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index c908354..59e711a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3980,7 +3980,10 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
 	/* the vfs holds both inode->i_mutex already */
 again:
 	mutex_lock(&cgroup_mutex);
-	if (!cgroup_css_sets_empty(cgrp)) {
+	read_lock(&css_set_lock);
+	ret = cgroup_css_sets_empty(cgrp);
+	read_unlock(&css_set_lock);
+	if (!ret) {
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code
       [not found]         ` <50ED8347.7000806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-01-09 15:39           ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2013-01-09 15:39 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Li Zefan, cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Wed, Jan 09, 2013 at 03:48:39PM +0100, Hans de Goede wrote:
> I know that that is the generic rule, but I was hoping Greg KH would be
> willing to make an exception. At least I can try :)
> 
> I've solved the panic btw, I'll send a v2 of the set right after this
> mail, including an explanation what was wrong with v1.
> 
> I was hoping that I could get a review from you on v2 including an
> Acked-by (assuming the review does not find any issues) before I very kindly
> ask Greg KH to make an exception for this.

Unfortunately, I'm fairly sure Greg won't take these.  It's simply not
how -stable process works, at least until now.  I'll try to review
them later.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty()
       [not found]     ` <1357743822-21707-2-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-01-10  2:43       ` Li Zefan
       [not found]         ` <50EE2ABD.5070404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Li Zefan @ 2013-01-10  2:43 UTC (permalink / raw)
  To: Hans de Goede; +Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

On 2013/1/9 23:03, Hans de Goede wrote:
> As indicated in the comment above cgroup_css_sets_empty the
> css_set_lock must be hold when calling it.
> 
> Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/cgroup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index c908354..59e711a 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -3980,7 +3980,10 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
>  	/* the vfs holds both inode->i_mutex already */
>  again:
>  	mutex_lock(&cgroup_mutex);
> -	if (!cgroup_css_sets_empty(cgrp)) {
> +	read_lock(&css_set_lock);
> +	ret = cgroup_css_sets_empty(cgrp);

This function doesn't exist in 3.4.xx kernel! It was introduced by an
out-of-tree patch, and that patch introduced these two bugs (maybe more).

> +	read_unlock(&css_set_lock);
> +	if (!ret) {
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty()
       [not found]         ` <50EE2ABD.5070404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2013-01-10  8:49           ` Hans de Goede
  0 siblings, 0 replies; 10+ messages in thread
From: Hans de Goede @ 2013-01-10  8:49 UTC (permalink / raw)
  To: Li Zefan; +Cc: Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA

Hi,

On 01/10/2013 03:43 AM, Li Zefan wrote:
> On 2013/1/9 23:03, Hans de Goede wrote:
>> As indicated in the comment above cgroup_css_sets_empty the
>> css_set_lock must be hold when calling it.
>>
>> Signed-off-by: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>>   kernel/cgroup.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index c908354..59e711a 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -3980,7 +3980,10 @@ static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry)
>>   	/* the vfs holds both inode->i_mutex already */
>>   again:
>>   	mutex_lock(&cgroup_mutex);
>> -	if (!cgroup_css_sets_empty(cgrp)) {
>> +	read_lock(&css_set_lock);
>> +	ret = cgroup_css_sets_empty(cgrp);
> 
> This function doesn't exist in 3.4.xx kernel! It was introduced by an
> out-of-tree patch, and that patch introduced these two bugs (maybe more).

Ah, I didn't expect the sunxi support patches in linux-sunxi to touch the
cgroup code, but since they also bring in a ton of android stuff they do,
and you're right, this android specific patch is the culprit:
https://github.com/linux-sunxi/linux-sunxi/commit/dbc38c633f4b7abe97c53036df55fbb2040188bc

Sorry about the noise.

Regards,

Hans

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-01-10  8:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-09 12:11 [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code Hans de Goede
     [not found] ` <1357733486-1742-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-01-09 12:11   ` [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty() Hans de Goede
     [not found]     ` <1357733486-1742-2-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-01-09 12:37       ` Hans de Goede
2013-01-09 12:11   ` [PATCH 3.4 2/2] cgroup: Fix use after free of cgrp (cgrp->css_sets) Hans de Goede
2013-01-09 14:38   ` [PATCH 3.4 0/2] 2 bug fixes for the 3.4 cgroup code Tejun Heo
     [not found]     ` <20130109143850.GG3926-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-01-09 14:48       ` Hans de Goede
     [not found]         ` <50ED8347.7000806-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-01-09 15:39           ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2013-01-09 15:03 2 bug fixes for the 3.4 cgroup code (v2) Hans de Goede
     [not found] ` <1357743822-21707-1-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-01-09 15:03   ` [PATCH 3.4 1/2] cgroup: Take css_set_lock when calling cgroup_css_sets_empty() Hans de Goede
     [not found]     ` <1357743822-21707-2-git-send-email-hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-01-10  2:43       ` Li Zefan
     [not found]         ` <50EE2ABD.5070404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-01-10  8:49           ` Hans de Goede

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).