public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Containers: css_put() dilemma
@ 2007-07-16 18:50 Balbir Singh
  2007-07-16 19:03 ` Paul (宝瑠) Menage
  0 siblings, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-07-16 18:50 UTC (permalink / raw)
  To: Paul Menage, Andrew Morton
  Cc: Srivatsa Vaddagiri, Pavel Emelianov, Vaidyanathan Srinivasan,
	Serge E. Hallyn, linux kernel mailing list, Linux Containers,
	Paul Jackson

Hi, Paul,

I've run into a strange problem with css_put(). After the changes for notify_on_release(), the css_put() routine can now block and it blocks on
the container_mutex. This implies that css_put() cannot be called if

1. We cannot block
2. We already hold the container_mutex

The problem I have is that of preventing the destruction of my container
(when the user does rmdir). If the user migrates away all tasks and does
an rmdir, the only way to prevent the container from going away is through
css_get() references. In my case, some pages have been allocated from the
container and hence I do not want it to go away, until all the pages
charged to it are freed. When I use css_get/put() to prevent destruction
I am blocked by the limitations of css_put() listed above.

Do you have any recommendations for a cleaner solution? I suspect we'll
need can_destroy() callbacks (similar to can_attach()).

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-16 18:50 Containers: css_put() dilemma Balbir Singh
@ 2007-07-16 19:03 ` Paul (宝瑠) Menage
  2007-07-17  2:21   ` Balbir Singh
  0 siblings, 1 reply; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-16 19:03 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, Srivatsa Vaddagiri, Pavel Emelianov,
	Vaidyanathan Srinivasan, Serge E. Hallyn,
	linux kernel mailing list, Linux Containers, Paul Jackson

On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Hi, Paul,
>
> I've run into a strange problem with css_put(). After the changes for notify_on_release(), the css_put() routine can now block and it blocks on
> the container_mutex. This implies that css_put() cannot be called if
>
> 1. We cannot block
> 2. We already hold the container_mutex
>
> The problem I have is that of preventing the destruction of my container
> (when the user does rmdir). If the user migrates away all tasks and does
> an rmdir, the only way to prevent the container from going away is through
> css_get() references. In my case, some pages have been allocated from the
> container and hence I do not want it to go away, until all the pages
> charged to it are freed. When I use css_get/put() to prevent destruction
> I am blocked by the limitations of css_put() listed above.
>
> Do you have any recommendations for a cleaner solution? I suspect we'll
> need can_destroy() callbacks (similar to can_attach()).

I think moving the release_list synchronization inside a separate
spinlock, and thus not requiring container_mutex to be held for
check_for_release(), is the simplest solution. I'll do that. I'm
hoping to get a new set of patches to Andrew today or tomorrow.

Adding a can_destroy() callback is possible, but since I envisage that
most subsystems that would want to implement it would basically be
doing reference counting anyway, it seems worth having a generic
reference counting mechanism in the framework. In particular, since
once the container does become releasable due to all the
subsystem-specific refcounts being released, we want to be able to
invoke the release agent, we'll end up with the same synchronization
problems that we have now if we just pushed everything into a
can_destroy() method. (Unless the framework polled all can_destroy()
methods for potentially-removable containers, which seems a bit
nasty).

We can add can_destroy() if we encounter a situation that can't be
handled by generic reference counting.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-16 19:03 ` Paul (宝瑠) Menage
@ 2007-07-17  2:21   ` Balbir Singh
  2007-07-17  2:35     ` Paul (宝瑠) Menage
  0 siblings, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-07-17  2:21 UTC (permalink / raw)
  To: "Paul (??) Menage"
  Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

Paul (??) Menage wrote:
> On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Hi, Paul,
>>
>> I've run into a strange problem with css_put(). After the changes for
>> notify_on_release(), the css_put() routine can now block and it blocks on
>> the container_mutex. This implies that css_put() cannot be called if
>>
>> 1. We cannot block
>> 2. We already hold the container_mutex
>>
>> The problem I have is that of preventing the destruction of my container
>> (when the user does rmdir). If the user migrates away all tasks and does
>> an rmdir, the only way to prevent the container from going away is
>> through
>> css_get() references. In my case, some pages have been allocated from the
>> container and hence I do not want it to go away, until all the pages
>> charged to it are freed. When I use css_get/put() to prevent destruction
>> I am blocked by the limitations of css_put() listed above.
>>
>> Do you have any recommendations for a cleaner solution? I suspect we'll
>> need can_destroy() callbacks (similar to can_attach()).
> 
> I think moving the release_list synchronization inside a separate
> spinlock, and thus not requiring container_mutex to be held for
> check_for_release(), is the simplest solution. I'll do that. I'm
> hoping to get a new set of patches to Andrew today or tomorrow.
> 

That sounds good to me. But I worry about having to do release synchronization
on every css_put(). The current patch I have, but does not work 100%
does the following (WARNING: white spaces ahead, do not use the patch
directly)

-       if (notify_on_release(cont)) {
+       if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) {
                mutex_lock(&container_mutex);
                set_bit(CONT_RELEASABLE, &cont->flags);
-               if (atomic_dec_and_test(&css->refcnt)) {
-                       check_for_release(cont);
-               }
+               check_for_release(cont);
                mutex_unlock(&container_mutex);

That way we set the CONT_RELEASABLE bit only when the ref count drops
to zero.


> Adding a can_destroy() callback is possible, but since I envisage that
> most subsystems that would want to implement it would basically be
> doing reference counting anyway, it seems worth having a generic
> reference counting mechanism in the framework. In particular, since
> once the container does become releasable due to all the
> subsystem-specific refcounts being released, we want to be able to
> invoke the release agent, we'll end up with the same synchronization
> problems that we have now if we just pushed everything into a
> can_destroy() method. (Unless the framework polled all can_destroy()
> methods for potentially-removable containers, which seems a bit
> nasty).
> 
> We can add can_destroy() if we encounter a situation that can't be
> handled by generic reference counting.
> 

Yes, that is correct, the advantage is that with can_destroy() we
don't need to go through release synchronization each time we do
a css_put(). May be the patch above will fix the problem along
with your release locking proposal.


> Paul
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17  2:21   ` Balbir Singh
@ 2007-07-17  2:35     ` Paul (宝瑠) Menage
  2007-07-17  7:00       ` Balbir Singh
  0 siblings, 1 reply; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17  2:35 UTC (permalink / raw)
  To: balbir
  Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> -       if (notify_on_release(cont)) {
> +       if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) {

This seems like a good idea, as long as atomic_dec_and_test() isn't
noticeably more expensive than atomic_dec(). I assume it shouldn't
need to be, since the bus locking operations are presumably the same
in each case.

>                 mutex_lock(&container_mutex);
>                 set_bit(CONT_RELEASABLE, &cont->flags);
> -               if (atomic_dec_and_test(&css->refcnt)) {
> -                       check_for_release(cont);
> -               }
> +               check_for_release(cont);
>                 mutex_unlock(&container_mutex);
>
> That way we set the CONT_RELEASABLE bit only when the ref count drops
> to zero.
>

That's probably a good idea, in conjunction with another part of my
patch for this that frees container objects under RCU - as soon as you
do the atomic_dec_and_test(), then in theory some other thread could
delete the container (since we're no longer going to be taking
container_mutex in this function). But as long as the container object
remains valid until synchronize_rcu() completes, then we can safely
set the CONT_RELEASABLE bit on it.

>
> Yes, that is correct, the advantage is that with can_destroy() we
> don't need to go through release synchronization each time we do
> a css_put().

I think the amount of release synchronization *needed* is going to be
the same whether you have the refcounting done in the subsystem or in
the framework. But I agree that right now we're doing one more atomic
op than we strictly need to, and can remove it.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17  2:35     ` Paul (宝瑠) Menage
@ 2007-07-17  7:00       ` Balbir Singh
  2007-07-17  7:18         ` Paul (宝瑠) Menage
  0 siblings, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-07-17  7:00 UTC (permalink / raw)
  To: Paul (宝瑠) Menage
  Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton, dhaval

On Mon, Jul 16, 2007 at 07:35:01PM -0700, Paul (宝瑠) Menage wrote:
> On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> >-       if (notify_on_release(cont)) {
> >+       if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) {
> 
> This seems like a good idea, as long as atomic_dec_and_test() isn't
> noticeably more expensive than atomic_dec(). I assume it shouldn't
> need to be, since the bus locking operations are presumably the same
> in each case.
> 
> >                mutex_lock(&container_mutex);
> >                set_bit(CONT_RELEASABLE, &cont->flags);
> >-               if (atomic_dec_and_test(&css->refcnt)) {
> >-                       check_for_release(cont);
> >-               }
> >+               check_for_release(cont);
> >                mutex_unlock(&container_mutex);
> >
> >That way we set the CONT_RELEASABLE bit only when the ref count drops
> >to zero.
> >
> 
> That's probably a good idea, in conjunction with another part of my
> patch for this that frees container objects under RCU - as soon as you
> do the atomic_dec_and_test(), then in theory some other thread could
> delete the container (since we're no longer going to be taking
> container_mutex in this function). But as long as the container object
> remains valid until synchronize_rcu() completes, then we can safely
> set the CONT_RELEASABLE bit on it.
> 
> >
> >Yes, that is correct, the advantage is that with can_destroy() we
> >don't need to go through release synchronization each time we do
> >a css_put().
> 
> I think the amount of release synchronization *needed* is going to be
> the same whether you have the refcounting done in the subsystem or in
> the framework. But I agree that right now we're doing one more atomic
> op than we strictly need to, and can remove it.
> 
> Paul

Hi, Paul/Andrew

Would you accept this fix, while we wait for the complete solution.
It worked for me quite well.


Description

Stop checking if the container can be released every time we do css_put().
A better solution that avoids container_mutex has been suggested by
Paul, but meanwhile, to get containers working correctly, this fix
would be very useful.

Signed-off-by: <balbir@linux.vnet.ibm.com>
---

 kernel/container.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff -puN kernel/container.c~container-css-put-on-refcount-zero kernel/container.c
--- linux-2.6.22-rc6/kernel/container.c~container-css-put-on-refcount-zero	2007-07-17 12:18:52.000000000 +0530
+++ linux-2.6.22-rc6-balbir/kernel/container.c	2007-07-17 12:23:29.000000000 +0530
@@ -2515,15 +2515,11 @@ static void check_for_release(struct con
 void css_put(struct container_subsys_state *css)
 {
 	struct container *cont = css->container;
-	if (notify_on_release(cont)) {
+	if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) {
 		mutex_lock(&container_mutex);
 		set_bit(CONT_RELEASABLE, &cont->flags);
-		if (atomic_dec_and_test(&css->refcnt)) {
-			check_for_release(cont);
-		}
+		check_for_release(cont);
 		mutex_unlock(&container_mutex);
-	} else {
-		atomic_dec(&css->refcnt);
 	}
 }
 
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17  7:00       ` Balbir Singh
@ 2007-07-17  7:18         ` Paul (宝瑠) Menage
  2007-07-17 10:28           ` Balbir Singh
  0 siblings, 1 reply; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17  7:18 UTC (permalink / raw)
  To: balbir
  Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton, dhaval

On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > >                mutex_lock(&container_mutex);
> > >                set_bit(CONT_RELEASABLE, &cont->flags);
> > >-               if (atomic_dec_and_test(&css->refcnt)) {
> > >-                       check_for_release(cont);
> > >-               }
> > >+               check_for_release(cont);
> > >                mutex_unlock(&container_mutex);
> > >

I think that this isn't safe as it stands, without a synchronize_rcu()
in container_diput() prior to the kfree(). Also, it will break if
anyone tries to use a release agent on a hierarchy that has your
memory controller bound to it.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17  7:18         ` Paul (宝瑠) Menage
@ 2007-07-17 10:28           ` Balbir Singh
  2007-07-17 15:49             ` Paul (宝瑠) Menage
  0 siblings, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-07-17 10:28 UTC (permalink / raw)
  To: "Paul (??) Menage"
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

Paul (??) Menage wrote:
> On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> >
>> > >                mutex_lock(&container_mutex);
>> > >                set_bit(CONT_RELEASABLE, &cont->flags);
>> > >-               if (atomic_dec_and_test(&css->refcnt)) {
>> > >-                       check_for_release(cont);
>> > >-               }
>> > >+               check_for_release(cont);
>> > >                mutex_unlock(&container_mutex);
>> > >
> 
> I think that this isn't safe as it stands, without a synchronize_rcu()
> in container_diput() prior to the kfree(). Also, it will break if
> anyone tries to use a release agent on a hierarchy that has your
> memory controller bound to it.
> 


Isn't the code functionally the same as before? We still do atomic_test_and_dec()
as before. We still set_bit() CONT_RELEASABLE, we take the container_mutex
and check_for_release(). I am not sure I understand what changed?

Could you please elaborate as to why using a release agent is broken
when the memory controller is attached to it? I am not quite sure why we
need the synchronize_rcu() either in container_diput().



> Paul
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 10:28           ` Balbir Singh
@ 2007-07-17 15:49             ` Paul (宝瑠) Menage
  2007-07-17 16:02               ` Dave Hansen
                                 ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17 15:49 UTC (permalink / raw)
  To: balbir
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Paul (??) Menage wrote:
> > On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >> >
> >> > >                mutex_lock(&container_mutex);
> >> > >                set_bit(CONT_RELEASABLE, &cont->flags);
> >> > >-               if (atomic_dec_and_test(&css->refcnt)) {
> >> > >-                       check_for_release(cont);
> >> > >-               }
> >> > >+               check_for_release(cont);
> >> > >                mutex_unlock(&container_mutex);
> >> > >
> >
> > I think that this isn't safe as it stands, without a synchronize_rcu()
> > in container_diput() prior to the kfree(). Also, it will break if
> > anyone tries to use a release agent on a hierarchy that has your
> > memory controller bound to it.
> >
>
>
> Isn't the code functionally the same as before? We still do atomic_test_and_dec()
> as before. We still set_bit() CONT_RELEASABLE, we take the container_mutex
> and check_for_release(). I am not sure I understand what changed?

Because as soon as you do the atomic_dec_and_test() on css->refcnt and
the refcnt hits zero, then theoretically someone other thread (that
already holds container_mutex) could check that the refcount is zero
and free the container structure.

Adding a synchronize_rcu in container_diput() guarantees that the
container structure won't be freed while someone may still be
accessing it.

>
> Could you please elaborate as to why using a release agent is broken
> when the memory controller is attached to it?

Because then it will try to take container_mutex in css_put() if it
drops the last reference to a container, which is the thing that you
said you had to avoid since you called css_put() in contexts that
couldn't sleep.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 15:49             ` Paul (宝瑠) Menage
@ 2007-07-17 16:02               ` Dave Hansen
  2007-07-17 16:15                 ` Paul (宝瑠) Menage
  2007-07-17 17:23               ` Paul Jackson
  2007-07-17 17:40               ` Balbir Singh
  2 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2007-07-17 16:02 UTC (permalink / raw)
  To: Paul (宝瑠) Menage
  Cc: balbir, dhaval, linux kernel mailing list, Pavel Emelianov,
	Paul Jackson, Linux Containers, Andrew Morton

On Tue, 2007-07-17 at 08:49 -0700, Paul (宝瑠) Menage wrote:
> Because as soon as you do the atomic_dec_and_test() on css->refcnt and
> the refcnt hits zero, then theoretically someone other thread (that
> already holds container_mutex) could check that the refcount is zero
> and free the container structure.

Then that other task had a reference and itself should have bumped the
count, and the other user would never have seen it hit zero.

Even if there are still pages attached to the container, why not just
have those take a reference, and don't bother actually freeing the
container until the last true reference is dropped?

Does it matter if the destruction callbacks don't happen until well
after an attempt to destroy the container is made?

-- Dave


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 16:02               ` Dave Hansen
@ 2007-07-17 16:15                 ` Paul (宝瑠) Menage
  0 siblings, 0 replies; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17 16:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: balbir, dhaval, linux kernel mailing list, Pavel Emelianov,
	Paul Jackson, Linux Containers, Andrew Morton

On 7/17/07, Dave Hansen <haveblue@us.ibm.com> wrote:
> On Tue, 2007-07-17 at 08:49 -0700, Paul (宝瑠) Menage wrote:
> > Because as soon as you do the atomic_dec_and_test() on css->refcnt and
> > the refcnt hits zero, then theoretically someone other thread (that
> > already holds container_mutex) could check that the refcount is zero
> > and free the container structure.
>
> Then that other task had a reference and itself should have bumped the
> count, and the other user would never have seen it hit zero.

Nope. The container could have been empty (of tasks) and hence had a zero count.

The liveness model used by containers is that when the refcount hits
zero, the container isn't immediately destroyed (since it can contain
useful historical usage data, etc) but simply becomes eligible for
destruction by userspace via an rmdir().

>
> Even if there are still pages attached to the container, why not just
> have those take a reference, and don't bother actually freeing the
> container until the last true reference is dropped?

Yes, we could potentially just use the main count variable rather than
having separate per-subsystem extra refcounts. The main reasons to do
it this way are:

- the root subsystem state for a subsystem can shift between different
"struct container" objects if it was previously inactive and gets
mounted as part of a hierarchy (or similarly, gets unmounted and goes
inactive). Possibly we could get around this by simply saying not
doing refcounting on the subsys states attached to root containers
since they can never be freed anyway.

- At some point I'd like to be able to support shifting subsystems
between active hierarchies, at least in limited cases such as where
the hierarchies are isomorphic, or binding/unbinding subsystems
to/from active hierarchies. At that point we definitely need to be
able to split out the different subsystem state refcounts from one
another in the same hierarchy.

- we'd still have the issue that Balbir wants to be able to drop a
reference in a non-sleeping context, and we want to avoid doing
excessive synchronization in the normal case when the css_put()
doesn't put the reference count to zero.

>
> Does it matter if the destruction callbacks don't happen until well
> after an attempt to destroy the container is made?

Well that's sort of the point of putting a synchronize_rcu() in
container_diput() - it ensures that the actual destruction of the
object doesn't occur until sufficiently after the destruction attempt
is initiated that no one is still using the reference.

The alternative would be something that polls to spot whether
refcounts have reached zero and if so runs the userspace helper. That
doesn't seem particularly palatable when you have large numbers of
containers, if we can avoid it easily.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 15:49             ` Paul (宝瑠) Menage
  2007-07-17 16:02               ` Dave Hansen
@ 2007-07-17 17:23               ` Paul Jackson
  2007-07-17 17:40               ` Balbir Singh
  2 siblings, 0 replies; 26+ messages in thread
From: Paul Jackson @ 2007-07-17 17:23 UTC (permalink / raw)
  To: Paul (______) Menage
  Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm

> Because as soon as you do the atomic_dec_and_test() on css->refcnt and
> the refcnt hits zero, then theoretically someone other thread (that
> already holds container_mutex) could check that the refcount is zero
> and free the container structure.

Not just theory ... I've debugged crashes from live customer systems
that were basically this race.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 15:49             ` Paul (宝瑠) Menage
  2007-07-17 16:02               ` Dave Hansen
  2007-07-17 17:23               ` Paul Jackson
@ 2007-07-17 17:40               ` Balbir Singh
  2007-07-17 17:44                 ` Paul (宝瑠) Menage
  2007-07-17 17:53                 ` Paul Jackson
  2 siblings, 2 replies; 26+ messages in thread
From: Balbir Singh @ 2007-07-17 17:40 UTC (permalink / raw)
  To: "Paul (??) Menage"
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

Paul (??) Menage wrote:
> Because as soon as you do the atomic_dec_and_test() on css->refcnt and
> the refcnt hits zero, then theoretically someone other thread (that
> already holds container_mutex) could check that the refcount is zero
> and free the container structure.
> 

Hi, Paul,

That sounds correct. I wonder now if the solution should be some form
of delegation for deletion of unreferenced containers (HINT: work queue
or kernel threads).

> Adding a synchronize_rcu in container_diput() guarantees that the
> container structure won't be freed while someone may still be
> accessing it.
> 

Do we take rcu_read_lock() in css_put() path or use call_rcu() to
free the container?

>>
>> Could you please elaborate as to why using a release agent is broken
>> when the memory controller is attached to it?
> 
> Because then it will try to take container_mutex in css_put() if it
> drops the last reference to a container, which is the thing that you
> said you had to avoid since you called css_put() in contexts that
> couldn't sleep.
> 
> Paul


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:40               ` Balbir Singh
@ 2007-07-17 17:44                 ` Paul (宝瑠) Menage
  2007-07-17 17:55                   ` Paul Jackson
  2007-07-17 18:11                   ` Balbir Singh
  2007-07-17 17:53                 ` Paul Jackson
  1 sibling, 2 replies; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17 17:44 UTC (permalink / raw)
  To: balbir
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> That sounds correct. I wonder now if the solution should be some form
> of delegation for deletion of unreferenced containers (HINT: work queue
> or kernel threads).

What a great idea. In fact, that's exactly what the release agent
patch already does.

>
> > Adding a synchronize_rcu in container_diput() guarantees that the
> > container structure won't be freed while someone may still be
> > accessing it.
> >
>
> Do we take rcu_read_lock() in css_put() path or use call_rcu() to
> free the container?

Good point, we ought to add rcu_read_lock() (even though it doesn't
actually do anything on architectures other than alpha, right?)

Using call_rcu to do the container kfree rather than synchronize_rcu()
would be a possible future optimization, yes.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:40               ` Balbir Singh
  2007-07-17 17:44                 ` Paul (宝瑠) Menage
@ 2007-07-17 17:53                 ` Paul Jackson
  2007-07-17 17:55                   ` Paul (宝瑠) Menage
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Jackson @ 2007-07-17 17:53 UTC (permalink / raw)
  To: balbir; +Cc: menage, dhaval, xemul, linux-kernel, containers, akpm

> That sounds correct. I wonder now if the solution should be some form
> of delegation for deletion of unreferenced containers (HINT: work queue
> or kernel threads).

At least for cpusets (the mother of all containers), notify on release
is part of the user visible API of cpusets.  The kernel does not remove
cpusets; it runs a user program, /sbin/cpuset_release_agent.  That
program might choose to rmdir the cpuset directory, and/or do other
actions, like notify a batch scheduler that one of its cpusets was
released.

That API is not as sacrosanct as some, but it is working, and I wouldn't
want to break it without good reason.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:44                 ` Paul (宝瑠) Menage
@ 2007-07-17 17:55                   ` Paul Jackson
  2007-07-17 17:57                     ` Paul (宝瑠) Menage
  2007-07-17 18:11                   ` Balbir Singh
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Jackson @ 2007-07-17 17:55 UTC (permalink / raw)
  To: Paul (______) Menage
  Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm

Paul M wrote:
> In fact, that's exactly what the release agent
> patch already does.

I'm feeling lazy ;)  What's the Subject of that
patch, for my easy searching?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:53                 ` Paul Jackson
@ 2007-07-17 17:55                   ` Paul (宝瑠) Menage
  2007-07-17 17:58                     ` Paul Jackson
  0 siblings, 1 reply; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17 17:55 UTC (permalink / raw)
  To: Paul Jackson; +Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm

On 7/17/07, Paul Jackson <pj@sgi.com> wrote:
>
> At least for cpusets (the mother of all containers), notify on release
> is part of the user visible API of cpusets.  The kernel does not remove
> cpusets; it runs a user program, /sbin/cpuset_release_agent.  That
> program might choose to rmdir the cpuset directory, and/or do other
> actions, like notify a batch scheduler that one of its cpusets was
> released.

Right, that's what the release agent patch for process containers does
- there's a file in the hierarchy root called "release_agent" that
contains the path to the program to run if a notify_on_release
container goes idle. When you mount the "cpuset" filesystem, it
automatically populates that file with "/sbin/cpuset_release_agent". A
workqueue task is used to actually do the notifications so that
references can be dropped without having to potentially run a
userspace helper.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:55                   ` Paul Jackson
@ 2007-07-17 17:57                     ` Paul (宝瑠) Menage
  0 siblings, 0 replies; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17 17:57 UTC (permalink / raw)
  To: Paul Jackson; +Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm

On 7/17/07, Paul Jackson <pj@sgi.com> wrote:
> Paul M wrote:
> > In fact, that's exactly what the release agent
> > patch already does.
>
> I'm feeling lazy ;)  What's the Subject of that
> patch, for my easy searching?

"Support for automatic userspace release agents"

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:55                   ` Paul (宝瑠) Menage
@ 2007-07-17 17:58                     ` Paul Jackson
  0 siblings, 0 replies; 26+ messages in thread
From: Paul Jackson @ 2007-07-17 17:58 UTC (permalink / raw)
  To: Paul (______) Menage
  Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm

Paul M wrote:
> Right, that's what the release agent patch for process containers does
> - there's a file in the hierarchy root called "release_agent" that
> contains the path to the program to run if a notify_on_release
> container goes idle. When you mount the "cpuset" filesystem, it
> automatically populates that file with "/sbin/cpuset_release_agent".

Cool -- thanks.  Sounds good.

Nevermind my other "what's the Subject" query, if you haven't
already answered it.  You just nicely answered my questions.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 17:44                 ` Paul (宝瑠) Menage
  2007-07-17 17:55                   ` Paul Jackson
@ 2007-07-17 18:11                   ` Balbir Singh
  2007-07-17 18:26                     ` Paul (宝瑠) Menage
  1 sibling, 1 reply; 26+ messages in thread
From: Balbir Singh @ 2007-07-17 18:11 UTC (permalink / raw)
  To: "Paul (??) Menage"
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

Paul (??) Menage wrote:
> On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>
>> That sounds correct. I wonder now if the solution should be some form
>> of delegation for deletion of unreferenced containers (HINT: work queue
>> or kernel threads).
> 
> What a great idea. In fact, that's exactly what the release agent
> patch already does.

:-) I should have seen that. I am a little lost thinking that
container_rmdir() and the release agent check_for_release() work
without too much knowledge of each other. BTW, what are the semantics
of css_put() is it expected to free the container/run the release agent
when the reference count of the container_subsys_state drops to zero?
 

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 18:11                   ` Balbir Singh
@ 2007-07-17 18:26                     ` Paul (宝瑠) Menage
  2007-07-18  4:29                       ` Balbir Singh
  0 siblings, 1 reply; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-17 18:26 UTC (permalink / raw)
  To: balbir
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> without too much knowledge of each other. BTW, what are the semantics
> of css_put() is it expected to free the container/run the release agent
> when the reference count of the container_subsys_state drops to zero?
>

If you css_put() the last reference on a subsystem state object and
the associated container is marked as notify_on_release, then
check_for_release() is called which does a more full check of whether
the container is releasable. If it is, a workqueue task is scheduled
to run the userspace release agent, which can then do anything it
wants, including potentially deleting the empty container.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-17 18:26                     ` Paul (宝瑠) Menage
@ 2007-07-18  4:29                       ` Balbir Singh
  2007-07-18  5:30                         ` Balbir Singh
  2007-07-18  6:07                         ` Paul (宝瑠) Menage
  0 siblings, 2 replies; 26+ messages in thread
From: Balbir Singh @ 2007-07-18  4:29 UTC (permalink / raw)
  To: "Paul (??) Menage"
  Cc: dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson,
	Linux Containers, Andrew Morton

Paul (??) Menage wrote:
> On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> without too much knowledge of each other. BTW, what are the semantics
>> of css_put() is it expected to free the container/run the release agent
>> when the reference count of the container_subsys_state drops to zero?
>>
> 
> If you css_put() the last reference on a subsystem state object and
> the associated container is marked as notify_on_release, then
> check_for_release() is called which does a more full check of whether
> the container is releasable. If it is, a workqueue task is scheduled
> to run the userspace release agent, which can then do anything it
> wants, including potentially deleting the empty container.
> 

Ok.. so my problem still remains, how do I get a non-blocking atomic
reference increment/decrement routine, that would prevent my
container from being deleted?

I don't find cpusets using css_put(). I was hoping that we could
alter css_* would provide the functionality I need.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-18  4:29                       ` Balbir Singh
@ 2007-07-18  5:30                         ` Balbir Singh
  2007-07-18  5:52                           ` Srivatsa Vaddagiri
  2007-07-18 23:15                           ` Paul Menage
  2007-07-18  6:07                         ` Paul (宝瑠) Menage
  1 sibling, 2 replies; 26+ messages in thread
From: Balbir Singh @ 2007-07-18  5:30 UTC (permalink / raw)
  To: Paul (??) Menage
  Cc: balbir, dhaval, linux kernel mailing list, Pavel Emelianov,
	Paul Jackson, Linux Containers, Andrew Morton

Balbir Singh wrote:
> Paul (??) Menage wrote:
>> On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>> without too much knowledge of each other. BTW, what are the semantics
>>> of css_put() is it expected to free the container/run the release agent
>>> when the reference count of the container_subsys_state drops to zero?
>>>
>> If you css_put() the last reference on a subsystem state object and
>> the associated container is marked as notify_on_release, then
>> check_for_release() is called which does a more full check of whether
>> the container is releasable. If it is, a workqueue task is scheduled
>> to run the userspace release agent, which can then do anything it
>> wants, including potentially deleting the empty container.
>>
> 
> Ok.. so my problem still remains, how do I get a non-blocking atomic
> reference increment/decrement routine, that would prevent my
> container from being deleted?
> 
> I don't find cpusets using css_put(). I was hoping that we could
> alter css_* would provide the functionality I need.
> 
> 

Thinking out loud again, can we add can_destory() callbacks?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-18  5:30                         ` Balbir Singh
@ 2007-07-18  5:52                           ` Srivatsa Vaddagiri
  2007-07-18 23:15                           ` Paul Menage
  1 sibling, 0 replies; 26+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-18  5:52 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Paul (??) Menage, dhaval, linux kernel mailing list,
	Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton

On Wed, Jul 18, 2007 at 11:00:39AM +0530, Balbir Singh wrote:
> Thinking out loud again, can we add can_destory() callbacks?

I remember suggesting such a callback long before :

	http://marc.info/?l=linux-kernel&m=117576241131788&w=2

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-18  4:29                       ` Balbir Singh
  2007-07-18  5:30                         ` Balbir Singh
@ 2007-07-18  6:07                         ` Paul (宝瑠) Menage
  1 sibling, 0 replies; 26+ messages in thread
From: Paul (宝瑠) Menage @ 2007-07-18  6:07 UTC (permalink / raw)
  To: balbir
  Cc: dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> Ok.. so my problem still remains, how do I get a non-blocking atomic
> reference increment/decrement routine, that would prevent my
> container from being deleted?

css_put() in my new patchset will be non-blocking.

>
> I don't find cpusets using css_put().

Cpusets never needs to keep a non-process reference on a cpuset
object, so the framework handles everything.

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-18  5:30                         ` Balbir Singh
  2007-07-18  5:52                           ` Srivatsa Vaddagiri
@ 2007-07-18 23:15                           ` Paul Menage
  2007-07-19  3:44                             ` Balbir Singh
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Menage @ 2007-07-18 23:15 UTC (permalink / raw)
  To: balbir
  Cc: dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> Thinking out loud again, can we add can_destroy() callbacks?
>

What would the exact semantics of such a callback be?

Since for proper interaction with release agents we need the subsystem
to notify the framework when a subsystem object becomes releasable
(currently as part of css_put()), what would a can_destroy() callback
let you do that you couldn't do just by taking an extra css refcount
to prevent destruction and releasing that refcount to allow
destruction?

Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Containers: css_put() dilemma
  2007-07-18 23:15                           ` Paul Menage
@ 2007-07-19  3:44                             ` Balbir Singh
  0 siblings, 0 replies; 26+ messages in thread
From: Balbir Singh @ 2007-07-19  3:44 UTC (permalink / raw)
  To: Paul Menage
  Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson,
	Linux Containers, Andrew Morton

On 7/19/07, Paul Menage <menage@google.com> wrote:
> On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > Thinking out loud again, can we add can_destroy() callbacks?
> >
>
> What would the exact semantics of such a callback be?
>
> Since for proper interaction with release agents we need the subsystem
> to notify the framework when a subsystem object becomes releasable
> (currently as part of css_put()), what would a can_destroy() callback
> let you do that you couldn't do just by taking an extra css refcount
> to prevent destruction and releasing that refcount to allow
> destruction?

I was thinking along those lines before you mentioned that the next
version of css_put() will not block. The advantage I see of
can_destory() is that it allows subsystems to do their own reference
counting and decide whether they are ready to be deleted or not. The
other advantage I see is that it can act like a prepare to be deleted
phase for the controller, the controller might decide to take some
action in the can_destroy() phase, like the memory controller could
decide to start reclaiming all  the remaining page cache pages.

BTW, do you know upfront as to when the next set of container
enhancement patches will be ready? The css_put() issue is blocking us
currently.

Balbir

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-07-19  3:44 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-16 18:50 Containers: css_put() dilemma Balbir Singh
2007-07-16 19:03 ` Paul (宝瑠) Menage
2007-07-17  2:21   ` Balbir Singh
2007-07-17  2:35     ` Paul (宝瑠) Menage
2007-07-17  7:00       ` Balbir Singh
2007-07-17  7:18         ` Paul (宝瑠) Menage
2007-07-17 10:28           ` Balbir Singh
2007-07-17 15:49             ` Paul (宝瑠) Menage
2007-07-17 16:02               ` Dave Hansen
2007-07-17 16:15                 ` Paul (宝瑠) Menage
2007-07-17 17:23               ` Paul Jackson
2007-07-17 17:40               ` Balbir Singh
2007-07-17 17:44                 ` Paul (宝瑠) Menage
2007-07-17 17:55                   ` Paul Jackson
2007-07-17 17:57                     ` Paul (宝瑠) Menage
2007-07-17 18:11                   ` Balbir Singh
2007-07-17 18:26                     ` Paul (宝瑠) Menage
2007-07-18  4:29                       ` Balbir Singh
2007-07-18  5:30                         ` Balbir Singh
2007-07-18  5:52                           ` Srivatsa Vaddagiri
2007-07-18 23:15                           ` Paul Menage
2007-07-19  3:44                             ` Balbir Singh
2007-07-18  6:07                         ` Paul (宝瑠) Menage
2007-07-17 17:53                 ` Paul Jackson
2007-07-17 17:55                   ` Paul (宝瑠) Menage
2007-07-17 17:58                     ` Paul Jackson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox