* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 2:18 ` Lai Jiangshan
@ 2008-09-10 2:40 ` Li Zefan
2008-09-10 3:11 ` Paul Menage
2008-09-10 5:01 ` Greg KH
2 siblings, 0 replies; 13+ messages in thread
From: Li Zefan @ 2008-09-10 2:40 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Paul Menage, Andrew Morton, Linux Kernel Mailing List,
Greg Kroah-Hartman
Lai Jiangshan wrote:
> Paul Menage wrote:
>> On Mon, Aug 18, 2008 at 11:29 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>
>> 2) Use atomic_inc_not_zero() in find_existing_css_set(), to ensure
>> that we only return a referenced css, and remove the get_css_set()
>> call from find_css_set(). (Possibly wrapping this in a new
>> kref_get_not_zero() function)
>>
>
> [CC: Greg Kroah-Hartman <greg@kroah.com>]
>
> There are indeed several ways fix this race by Using the
> atomic-functions directly. I prefer the second one, i makes all
> code clearly. And put_css_set[_taskexit] do not need to be changed.
>
> I don't think adding kref_get_not_zero() API is a good idea.
> It will bring kref APIs to a little chaos, kref_get_not_zero() is
> hard to be used, for this function needs a special lock held.
>
> But I tried:
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/include/linux/kref.h b/include/linux/kref.h
> index 0cef6ba..400ffab 100644
> --- a/include/linux/kref.h
> +++ b/include/linux/kref.h
> @@ -25,6 +25,7 @@ struct kref {
> void kref_set(struct kref *kref, int num);
> void kref_init(struct kref *kref);
> void kref_get(struct kref *kref);
> +int kref_get_not_zero(struct kref *kref);
> int kref_put(struct kref *kref, void (*release) (struct kref *kref));
>
> #endif /* _KREF_H_ */
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 13932ab..0bbb98d 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -347,6 +347,8 @@ static struct css_set *find_existing_css_set(
> hlist_for_each_entry(cg, node, hhead, hlist) {
> if (!memcmp(template, cg->subsys, sizeof(cg->subsys))) {
> /* All subsystems matched */
> + if (!kref_get_not_zero(&cg->ref))
> + return NULL;
Better add comments to explain why we might get 0 refcount here.
> return cg;
> }
> }
> @@ -410,8 +412,6 @@ static struct css_set *find_css_set(
> * the desired set */
> read_lock(&css_set_lock);
> res = find_existing_css_set(oldcg, cgrp, template);
> - if (res)
> - get_css_set(res);
> read_unlock(&css_set_lock);
>
> if (res)
> diff --git a/lib/kref.c b/lib/kref.c
> index 9ecd6e8..b8c1ce6 100644
> --- a/lib/kref.c
> +++ b/lib/kref.c
> @@ -46,6 +46,25 @@ void kref_get(struct kref *kref)
> }
>
> /**
> + * kref_get_not_zero - increment refcount for object if current refcount
> + * is not zero.
> + * @kref: object.
> + *
> + * Beware, the object maybe be being released, so we need a special lock held
s/maybe be being/may be being/
> + * to ensure the object's refcount is remaining access.
> + *
need better wording... what do you mean by "remaining access"? to ensure the object
is remaining valid?
> + * Return 0 if this refcount is 0, otherwise return 1.
> + */
> +int kref_get_not_zero(struct kref *kref)
> +{
> + if (atomic_inc_not_zero(&kref->refcount)) {
> + smp_mb__after_atomic_inc();
> + return 1;
> + }
> + return 0;
> +}
> +
> +/**
> * kref_put - decrement refcount for object.
> * @kref: object.
> * @release: pointer to the function that will clean up the object when the
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 2:18 ` Lai Jiangshan
2008-09-10 2:40 ` Li Zefan
@ 2008-09-10 3:11 ` Paul Menage
2008-09-10 5:01 ` Greg KH
2 siblings, 0 replies; 13+ messages in thread
From: Paul Menage @ 2008-09-10 3:11 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Andrew Morton, Linux Kernel Mailing List, Greg Kroah-Hartman
On Tue, Sep 9, 2008 at 7:18 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> hlist_for_each_entry(cg, node, hhead, hlist) {
> if (!memcmp(template, cg->subsys, sizeof(cg->subsys))) {
> /* All subsystems matched */
> + if (!kref_get_not_zero(&cg->ref))
> + return NULL;
Maybe it would be better to do:
if (!memcmp(...) && kref_get_not_zero(&cg->ref))
return cg;
so that in the event of a race, we carry on looking in the array.
> + * Return 0 if this refcount is 0, otherwise return 1.
> + */
> +int kref_get_not_zero(struct kref *kref)
Probably clearer to have a bool return type.
Paul
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 2:18 ` Lai Jiangshan
2008-09-10 2:40 ` Li Zefan
2008-09-10 3:11 ` Paul Menage
@ 2008-09-10 5:01 ` Greg KH
2008-09-10 5:31 ` Paul Menage
2 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2008-09-10 5:01 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Paul Menage, Andrew Morton, Linux Kernel Mailing List
On Wed, Sep 10, 2008 at 10:18:36AM +0800, Lai Jiangshan wrote:
> Paul Menage wrote:
> > On Mon, Aug 18, 2008 at 11:29 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> >
> > 2) Use atomic_inc_not_zero() in find_existing_css_set(), to ensure
> > that we only return a referenced css, and remove the get_css_set()
> > call from find_css_set(). (Possibly wrapping this in a new
> > kref_get_not_zero() function)
> >
>
> [CC: Greg Kroah-Hartman <greg@kroah.com>]
>
> There are indeed several ways fix this race by Using the
> atomic-functions directly. I prefer the second one, i makes all
> code clearly. And put_css_set[_taskexit] do not need to be changed.
>
> I don't think adding kref_get_not_zero() API is a good idea.
> It will bring kref APIs to a little chaos, kref_get_not_zero() is
> hard to be used, for this function needs a special lock held.
>
> But I tried:
What are you trying to solve here with this change? I agree, it does
seem a bit "chaotic" :)
I thought we used to have something like this for kref in the past, but
I must be mistaken as it's no longer there...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 5:01 ` Greg KH
@ 2008-09-10 5:31 ` Paul Menage
2008-09-10 6:17 ` Greg KH
0 siblings, 1 reply; 13+ messages in thread
From: Paul Menage @ 2008-09-10 5:31 UTC (permalink / raw)
To: Greg KH; +Cc: Lai Jiangshan, Andrew Morton, Linux Kernel Mailing List
On Tue, Sep 9, 2008 at 10:01 PM, Greg KH <greg@kroah.com> wrote:
>
> What are you trying to solve here with this change? I agree, it does
> seem a bit "chaotic" :)
There's a place in cgroups that uses kref_put() to release an object;
the release function *then* takes a write-lock and removes the object
from a lookup table; it could race with another thread that searches
the lookup table (while holding a read-lock) and does kref_get() on
the same object.
The current fix is for the release function to recheck inside the lock
that the object's refcount is still zero, and only actually
unlink/free it if so. And actually I've just realised that this isn't
actually even safe, since the thread that just acquired the object
could kref_put() it almost immediately, which would leave two threads
both trying to unlink/free the object.
The two solutions being considered are:
- add a kref_put_and_write_lock(), similar to atomic_dec_and_lock(),
which would ensure that the final refcount on the object was only
released inside the lock
- add a kref_get_if_not_zero(), which would prevent a lookup from
succeeding if another thread had just dropped the last reference on
the object.
Paul
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 5:31 ` Paul Menage
@ 2008-09-10 6:17 ` Greg KH
2008-09-10 6:25 ` Li Zefan
0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2008-09-10 6:17 UTC (permalink / raw)
To: Paul Menage; +Cc: Lai Jiangshan, Andrew Morton, Linux Kernel Mailing List
On Tue, Sep 09, 2008 at 10:31:24PM -0700, Paul Menage wrote:
> On Tue, Sep 9, 2008 at 10:01 PM, Greg KH <greg@kroah.com> wrote:
> >
> > What are you trying to solve here with this change? I agree, it does
> > seem a bit "chaotic" :)
>
> There's a place in cgroups that uses kref_put() to release an object;
> the release function *then* takes a write-lock and removes the object
> from a lookup table; it could race with another thread that searches
> the lookup table (while holding a read-lock) and does kref_get() on
> the same object.
Ick, yeah that's not good.
What about the way everyone else solves this, grab the lock before you
call kref_put()?
> The current fix is for the release function to recheck inside the lock
> that the object's refcount is still zero, and only actually
> unlink/free it if so. And actually I've just realised that this isn't
> actually even safe, since the thread that just acquired the object
> could kref_put() it almost immediately, which would leave two threads
> both trying to unlink/free the object.
Yeah, don't do that :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 6:17 ` Greg KH
@ 2008-09-10 6:25 ` Li Zefan
2008-09-10 6:29 ` Greg KH
0 siblings, 1 reply; 13+ messages in thread
From: Li Zefan @ 2008-09-10 6:25 UTC (permalink / raw)
To: Greg KH
Cc: Paul Menage, Lai Jiangshan, Andrew Morton,
Linux Kernel Mailing List
Greg KH wrote:
> On Tue, Sep 09, 2008 at 10:31:24PM -0700, Paul Menage wrote:
>> On Tue, Sep 9, 2008 at 10:01 PM, Greg KH <greg@kroah.com> wrote:
>>> What are you trying to solve here with this change? I agree, it does
>>> seem a bit "chaotic" :)
>> There's a place in cgroups that uses kref_put() to release an object;
>> the release function *then* takes a write-lock and removes the object
>> from a lookup table; it could race with another thread that searches
>> the lookup table (while holding a read-lock) and does kref_get() on
>> the same object.
>
> Ick, yeah that's not good.
>
> What about the way everyone else solves this, grab the lock before you
> call kref_put()?
>
do_exit()
cgroup_exit()
put_css_set_taskexit()
kref_put()
If we grab the lock before kref_put(), we add overhead to do_exit(), which
is what we are trying to avoid here.
>> The current fix is for the release function to recheck inside the lock
>> that the object's refcount is still zero, and only actually
>> unlink/free it if so. And actually I've just realised that this isn't
>> actually even safe, since the thread that just acquired the object
>> could kref_put() it almost immediately, which would leave two threads
>> both trying to unlink/free the object.
>
> Yeah, don't do that :)
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 6:25 ` Li Zefan
@ 2008-09-10 6:29 ` Greg KH
2008-09-10 15:03 ` Paul Menage
0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2008-09-10 6:29 UTC (permalink / raw)
To: Li Zefan
Cc: Paul Menage, Lai Jiangshan, Andrew Morton,
Linux Kernel Mailing List
On Wed, Sep 10, 2008 at 02:25:57PM +0800, Li Zefan wrote:
> Greg KH wrote:
> > On Tue, Sep 09, 2008 at 10:31:24PM -0700, Paul Menage wrote:
> >> On Tue, Sep 9, 2008 at 10:01 PM, Greg KH <greg@kroah.com> wrote:
> >>> What are you trying to solve here with this change? I agree, it does
> >>> seem a bit "chaotic" :)
> >> There's a place in cgroups that uses kref_put() to release an object;
> >> the release function *then* takes a write-lock and removes the object
> >> from a lookup table; it could race with another thread that searches
> >> the lookup table (while holding a read-lock) and does kref_get() on
> >> the same object.
> >
> > Ick, yeah that's not good.
> >
> > What about the way everyone else solves this, grab the lock before you
> > call kref_put()?
> >
>
> do_exit()
> cgroup_exit()
> put_css_set_taskexit()
> kref_put()
>
> If we grab the lock before kref_put(), we add overhead to do_exit(), which
> is what we are trying to avoid here.
But you can't put such logic in the release() function as you are
finding out, that's not going to work either.
Maybe you need to just "open-code" an atomic counter here and not use
kref as it sounds like you are needing to do something very "special"
here.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 6:29 ` Greg KH
@ 2008-09-10 15:03 ` Paul Menage
2008-09-12 15:58 ` Greg KH
0 siblings, 1 reply; 13+ messages in thread
From: Paul Menage @ 2008-09-10 15:03 UTC (permalink / raw)
To: Greg KH; +Cc: Li Zefan, Lai Jiangshan, Andrew Morton, Linux Kernel Mailing List
On Tue, Sep 9, 2008 at 11:29 PM, Greg KH <greg@kroah.com> wrote:
>
> But you can't put such logic in the release() function as you are
> finding out, that's not going to work either.
Right - that's why a kref_put_and_write_lock() function would be handy
- it would take the lock only when necessary. It would delegate most
of its work to the (as-yet nonexistant) atomic_dec_and_write_lock()
function, which would be an rwlock equivalent of atomic_dec_and_lock()
>
> Maybe you need to just "open-code" an atomic counter here and not use
> kref as it sounds like you are needing to do something very "special"
> here.
>
It's basically the same situation as the dentry cache - ref-counted
objects that are also referenced from a hash table protected by a
global lock.
If you're opposed to the addition of kref_put_and_write_lock() then
yes, I'll replace kref with a custom refcount.
Paul
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-10 15:03 ` Paul Menage
@ 2008-09-12 15:58 ` Greg KH
2008-09-12 19:33 ` Paul Menage
0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2008-09-12 15:58 UTC (permalink / raw)
To: Paul Menage
Cc: Li Zefan, Lai Jiangshan, Andrew Morton, Linux Kernel Mailing List
On Wed, Sep 10, 2008 at 08:03:18AM -0700, Paul Menage wrote:
> On Tue, Sep 9, 2008 at 11:29 PM, Greg KH <greg@kroah.com> wrote:
> >
> > But you can't put such logic in the release() function as you are
> > finding out, that's not going to work either.
>
> Right - that's why a kref_put_and_write_lock() function would be handy
> - it would take the lock only when necessary. It would delegate most
> of its work to the (as-yet nonexistant) atomic_dec_and_write_lock()
> function, which would be an rwlock equivalent of atomic_dec_and_lock()
>
> >
> > Maybe you need to just "open-code" an atomic counter here and not use
> > kref as it sounds like you are needing to do something very "special"
> > here.
> >
>
> It's basically the same situation as the dentry cache - ref-counted
> objects that are also referenced from a hash table protected by a
> global lock.
>
> If you're opposed to the addition of kref_put_and_write_lock() then
> yes, I'll replace kref with a custom refcount.
It just seems messy, but if you want to try it, I'll be glad to look at
the code. Oh wait, was that the patch that you sent out last time?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] cgroups: fix probable race with put_css_set[_taskexit] and find_css_set
2008-09-12 15:58 ` Greg KH
@ 2008-09-12 19:33 ` Paul Menage
0 siblings, 0 replies; 13+ messages in thread
From: Paul Menage @ 2008-09-12 19:33 UTC (permalink / raw)
To: Greg KH; +Cc: Li Zefan, Lai Jiangshan, Andrew Morton, Linux Kernel Mailing List
On Fri, Sep 12, 2008 at 8:58 AM, Greg KH <greg@kroah.com> wrote:
>> If you're opposed to the addition of kref_put_and_write_lock() then
>> yes, I'll replace kref with a custom refcount.
>
> It just seems messy, but if you want to try it, I'll be glad to look at
> the code. Oh wait, was that the patch that you sent out last time?
Yes - that patch replaced kref with a plain refcount. We weren't
really using kref as anything other than an atomic_dec_and_test()
wrapper anyway.
Paul
^ permalink raw reply [flat|nested] 13+ messages in thread