* Containers: css_put() dilemma @ 2007-07-16 18:50 Balbir Singh 2007-07-16 19:03 ` Paul (宝瑠) Menage 0 siblings, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-07-16 18:50 UTC (permalink / raw) To: Paul Menage, Andrew Morton Cc: Srivatsa Vaddagiri, Pavel Emelianov, Vaidyanathan Srinivasan, Serge E. Hallyn, linux kernel mailing list, Linux Containers, Paul Jackson Hi, Paul, I've run into a strange problem with css_put(). After the changes for notify_on_release(), the css_put() routine can now block and it blocks on the container_mutex. This implies that css_put() cannot be called if 1. We cannot block 2. We already hold the container_mutex The problem I have is that of preventing the destruction of my container (when the user does rmdir). If the user migrates away all tasks and does an rmdir, the only way to prevent the container from going away is through css_get() references. In my case, some pages have been allocated from the container and hence I do not want it to go away, until all the pages charged to it are freed. When I use css_get/put() to prevent destruction I am blocked by the limitations of css_put() listed above. Do you have any recommendations for a cleaner solution? I suspect we'll need can_destroy() callbacks (similar to can_attach()). -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-16 18:50 Containers: css_put() dilemma Balbir Singh @ 2007-07-16 19:03 ` Paul (宝瑠) Menage 2007-07-17 2:21 ` Balbir Singh 0 siblings, 1 reply; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-16 19:03 UTC (permalink / raw) To: balbir Cc: Andrew Morton, Srivatsa Vaddagiri, Pavel Emelianov, Vaidyanathan Srinivasan, Serge E. Hallyn, linux kernel mailing list, Linux Containers, Paul Jackson On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Hi, Paul, > > I've run into a strange problem with css_put(). After the changes for notify_on_release(), the css_put() routine can now block and it blocks on > the container_mutex. This implies that css_put() cannot be called if > > 1. We cannot block > 2. We already hold the container_mutex > > The problem I have is that of preventing the destruction of my container > (when the user does rmdir). If the user migrates away all tasks and does > an rmdir, the only way to prevent the container from going away is through > css_get() references. In my case, some pages have been allocated from the > container and hence I do not want it to go away, until all the pages > charged to it are freed. When I use css_get/put() to prevent destruction > I am blocked by the limitations of css_put() listed above. > > Do you have any recommendations for a cleaner solution? I suspect we'll > need can_destroy() callbacks (similar to can_attach()). I think moving the release_list synchronization inside a separate spinlock, and thus not requiring container_mutex to be held for check_for_release(), is the simplest solution. I'll do that. I'm hoping to get a new set of patches to Andrew today or tomorrow. Adding a can_destroy() callback is possible, but since I envisage that most subsystems that would want to implement it would basically be doing reference counting anyway, it seems worth having a generic reference counting mechanism in the framework. In particular, since once the container does become releasable due to all the subsystem-specific refcounts being released, we want to be able to invoke the release agent, we'll end up with the same synchronization problems that we have now if we just pushed everything into a can_destroy() method. (Unless the framework polled all can_destroy() methods for potentially-removable containers, which seems a bit nasty). We can add can_destroy() if we encounter a situation that can't be handled by generic reference counting. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-16 19:03 ` Paul (宝瑠) Menage @ 2007-07-17 2:21 ` Balbir Singh 2007-07-17 2:35 ` Paul (宝瑠) Menage 0 siblings, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-07-17 2:21 UTC (permalink / raw) To: "Paul (??) Menage" Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton Paul (??) Menage wrote: > On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> Hi, Paul, >> >> I've run into a strange problem with css_put(). After the changes for >> notify_on_release(), the css_put() routine can now block and it blocks on >> the container_mutex. This implies that css_put() cannot be called if >> >> 1. We cannot block >> 2. We already hold the container_mutex >> >> The problem I have is that of preventing the destruction of my container >> (when the user does rmdir). If the user migrates away all tasks and does >> an rmdir, the only way to prevent the container from going away is >> through >> css_get() references. In my case, some pages have been allocated from the >> container and hence I do not want it to go away, until all the pages >> charged to it are freed. When I use css_get/put() to prevent destruction >> I am blocked by the limitations of css_put() listed above. >> >> Do you have any recommendations for a cleaner solution? I suspect we'll >> need can_destroy() callbacks (similar to can_attach()). > > I think moving the release_list synchronization inside a separate > spinlock, and thus not requiring container_mutex to be held for > check_for_release(), is the simplest solution. I'll do that. I'm > hoping to get a new set of patches to Andrew today or tomorrow. > That sounds good to me. But I worry about having to do release synchronization on every css_put(). The current patch I have, but does not work 100% does the following (WARNING: white spaces ahead, do not use the patch directly) - if (notify_on_release(cont)) { + if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) { mutex_lock(&container_mutex); set_bit(CONT_RELEASABLE, &cont->flags); - if (atomic_dec_and_test(&css->refcnt)) { - check_for_release(cont); - } + check_for_release(cont); mutex_unlock(&container_mutex); That way we set the CONT_RELEASABLE bit only when the ref count drops to zero. > Adding a can_destroy() callback is possible, but since I envisage that > most subsystems that would want to implement it would basically be > doing reference counting anyway, it seems worth having a generic > reference counting mechanism in the framework. In particular, since > once the container does become releasable due to all the > subsystem-specific refcounts being released, we want to be able to > invoke the release agent, we'll end up with the same synchronization > problems that we have now if we just pushed everything into a > can_destroy() method. (Unless the framework polled all can_destroy() > methods for potentially-removable containers, which seems a bit > nasty). > > We can add can_destroy() if we encounter a situation that can't be > handled by generic reference counting. > Yes, that is correct, the advantage is that with can_destroy() we don't need to go through release synchronization each time we do a css_put(). May be the patch above will fix the problem along with your release locking proposal. > Paul > _______________________________________________ > Containers mailing list > Containers@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/containers -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 2:21 ` Balbir Singh @ 2007-07-17 2:35 ` Paul (宝瑠) Menage 2007-07-17 7:00 ` Balbir Singh 0 siblings, 1 reply; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 2:35 UTC (permalink / raw) To: balbir Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > - if (notify_on_release(cont)) { > + if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) { This seems like a good idea, as long as atomic_dec_and_test() isn't noticeably more expensive than atomic_dec(). I assume it shouldn't need to be, since the bus locking operations are presumably the same in each case. > mutex_lock(&container_mutex); > set_bit(CONT_RELEASABLE, &cont->flags); > - if (atomic_dec_and_test(&css->refcnt)) { > - check_for_release(cont); > - } > + check_for_release(cont); > mutex_unlock(&container_mutex); > > That way we set the CONT_RELEASABLE bit only when the ref count drops > to zero. > That's probably a good idea, in conjunction with another part of my patch for this that frees container objects under RCU - as soon as you do the atomic_dec_and_test(), then in theory some other thread could delete the container (since we're no longer going to be taking container_mutex in this function). But as long as the container object remains valid until synchronize_rcu() completes, then we can safely set the CONT_RELEASABLE bit on it. > > Yes, that is correct, the advantage is that with can_destroy() we > don't need to go through release synchronization each time we do > a css_put(). I think the amount of release synchronization *needed* is going to be the same whether you have the refcounting done in the subsystem or in the framework. But I agree that right now we're doing one more atomic op than we strictly need to, and can remove it. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 2:35 ` Paul (宝瑠) Menage @ 2007-07-17 7:00 ` Balbir Singh 2007-07-17 7:18 ` Paul (宝瑠) Menage 0 siblings, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-07-17 7:00 UTC (permalink / raw) To: Paul (宝瑠) Menage Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton, dhaval On Mon, Jul 16, 2007 at 07:35:01PM -0700, Paul (宝瑠) Menage wrote: > On 7/16/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > >- if (notify_on_release(cont)) { > >+ if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) { > > This seems like a good idea, as long as atomic_dec_and_test() isn't > noticeably more expensive than atomic_dec(). I assume it shouldn't > need to be, since the bus locking operations are presumably the same > in each case. > > > mutex_lock(&container_mutex); > > set_bit(CONT_RELEASABLE, &cont->flags); > >- if (atomic_dec_and_test(&css->refcnt)) { > >- check_for_release(cont); > >- } > >+ check_for_release(cont); > > mutex_unlock(&container_mutex); > > > >That way we set the CONT_RELEASABLE bit only when the ref count drops > >to zero. > > > > That's probably a good idea, in conjunction with another part of my > patch for this that frees container objects under RCU - as soon as you > do the atomic_dec_and_test(), then in theory some other thread could > delete the container (since we're no longer going to be taking > container_mutex in this function). But as long as the container object > remains valid until synchronize_rcu() completes, then we can safely > set the CONT_RELEASABLE bit on it. > > > > >Yes, that is correct, the advantage is that with can_destroy() we > >don't need to go through release synchronization each time we do > >a css_put(). > > I think the amount of release synchronization *needed* is going to be > the same whether you have the refcounting done in the subsystem or in > the framework. But I agree that right now we're doing one more atomic > op than we strictly need to, and can remove it. > > Paul Hi, Paul/Andrew Would you accept this fix, while we wait for the complete solution. It worked for me quite well. Description Stop checking if the container can be released every time we do css_put(). A better solution that avoids container_mutex has been suggested by Paul, but meanwhile, to get containers working correctly, this fix would be very useful. Signed-off-by: <balbir@linux.vnet.ibm.com> --- kernel/container.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff -puN kernel/container.c~container-css-put-on-refcount-zero kernel/container.c --- linux-2.6.22-rc6/kernel/container.c~container-css-put-on-refcount-zero 2007-07-17 12:18:52.000000000 +0530 +++ linux-2.6.22-rc6-balbir/kernel/container.c 2007-07-17 12:23:29.000000000 +0530 @@ -2515,15 +2515,11 @@ static void check_for_release(struct con void css_put(struct container_subsys_state *css) { struct container *cont = css->container; - if (notify_on_release(cont)) { + if (atomic_dec_and_test(&css->refcnt) && notify_on_release(cont)) { mutex_lock(&container_mutex); set_bit(CONT_RELEASABLE, &cont->flags); - if (atomic_dec_and_test(&css->refcnt)) { - check_for_release(cont); - } + check_for_release(cont); mutex_unlock(&container_mutex); - } else { - atomic_dec(&css->refcnt); } } _ -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 7:00 ` Balbir Singh @ 2007-07-17 7:18 ` Paul (宝瑠) Menage 2007-07-17 10:28 ` Balbir Singh 0 siblings, 1 reply; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 7:18 UTC (permalink / raw) To: balbir Cc: Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton, dhaval On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > > > mutex_lock(&container_mutex); > > > set_bit(CONT_RELEASABLE, &cont->flags); > > >- if (atomic_dec_and_test(&css->refcnt)) { > > >- check_for_release(cont); > > >- } > > >+ check_for_release(cont); > > > mutex_unlock(&container_mutex); > > > I think that this isn't safe as it stands, without a synchronize_rcu() in container_diput() prior to the kfree(). Also, it will break if anyone tries to use a release agent on a hierarchy that has your memory controller bound to it. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 7:18 ` Paul (宝瑠) Menage @ 2007-07-17 10:28 ` Balbir Singh 2007-07-17 15:49 ` Paul (宝瑠) Menage 0 siblings, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-07-17 10:28 UTC (permalink / raw) To: "Paul (??) Menage" Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton Paul (??) Menage wrote: > On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> > >> > > mutex_lock(&container_mutex); >> > > set_bit(CONT_RELEASABLE, &cont->flags); >> > >- if (atomic_dec_and_test(&css->refcnt)) { >> > >- check_for_release(cont); >> > >- } >> > >+ check_for_release(cont); >> > > mutex_unlock(&container_mutex); >> > > > > I think that this isn't safe as it stands, without a synchronize_rcu() > in container_diput() prior to the kfree(). Also, it will break if > anyone tries to use a release agent on a hierarchy that has your > memory controller bound to it. > Isn't the code functionally the same as before? We still do atomic_test_and_dec() as before. We still set_bit() CONT_RELEASABLE, we take the container_mutex and check_for_release(). I am not sure I understand what changed? Could you please elaborate as to why using a release agent is broken when the memory controller is attached to it? I am not quite sure why we need the synchronize_rcu() either in container_diput(). > Paul > _______________________________________________ > Containers mailing list > Containers@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/containers -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 10:28 ` Balbir Singh @ 2007-07-17 15:49 ` Paul (宝瑠) Menage 2007-07-17 16:02 ` Dave Hansen ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 15:49 UTC (permalink / raw) To: balbir Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > Paul (??) Menage wrote: > > On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > >> > > >> > > mutex_lock(&container_mutex); > >> > > set_bit(CONT_RELEASABLE, &cont->flags); > >> > >- if (atomic_dec_and_test(&css->refcnt)) { > >> > >- check_for_release(cont); > >> > >- } > >> > >+ check_for_release(cont); > >> > > mutex_unlock(&container_mutex); > >> > > > > > > I think that this isn't safe as it stands, without a synchronize_rcu() > > in container_diput() prior to the kfree(). Also, it will break if > > anyone tries to use a release agent on a hierarchy that has your > > memory controller bound to it. > > > > > Isn't the code functionally the same as before? We still do atomic_test_and_dec() > as before. We still set_bit() CONT_RELEASABLE, we take the container_mutex > and check_for_release(). I am not sure I understand what changed? Because as soon as you do the atomic_dec_and_test() on css->refcnt and the refcnt hits zero, then theoretically someone other thread (that already holds container_mutex) could check that the refcount is zero and free the container structure. Adding a synchronize_rcu in container_diput() guarantees that the container structure won't be freed while someone may still be accessing it. > > Could you please elaborate as to why using a release agent is broken > when the memory controller is attached to it? Because then it will try to take container_mutex in css_put() if it drops the last reference to a container, which is the thing that you said you had to avoid since you called css_put() in contexts that couldn't sleep. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 15:49 ` Paul (宝瑠) Menage @ 2007-07-17 16:02 ` Dave Hansen 2007-07-17 16:15 ` Paul (宝瑠) Menage 2007-07-17 17:23 ` Paul Jackson 2007-07-17 17:40 ` Balbir Singh 2 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2007-07-17 16:02 UTC (permalink / raw) To: Paul (宝瑠) Menage Cc: balbir, dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton On Tue, 2007-07-17 at 08:49 -0700, Paul (宝瑠) Menage wrote: > Because as soon as you do the atomic_dec_and_test() on css->refcnt and > the refcnt hits zero, then theoretically someone other thread (that > already holds container_mutex) could check that the refcount is zero > and free the container structure. Then that other task had a reference and itself should have bumped the count, and the other user would never have seen it hit zero. Even if there are still pages attached to the container, why not just have those take a reference, and don't bother actually freeing the container until the last true reference is dropped? Does it matter if the destruction callbacks don't happen until well after an attempt to destroy the container is made? -- Dave ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 16:02 ` Dave Hansen @ 2007-07-17 16:15 ` Paul (宝瑠) Menage 0 siblings, 0 replies; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 16:15 UTC (permalink / raw) To: Dave Hansen Cc: balbir, dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton On 7/17/07, Dave Hansen <haveblue@us.ibm.com> wrote: > On Tue, 2007-07-17 at 08:49 -0700, Paul (宝瑠) Menage wrote: > > Because as soon as you do the atomic_dec_and_test() on css->refcnt and > > the refcnt hits zero, then theoretically someone other thread (that > > already holds container_mutex) could check that the refcount is zero > > and free the container structure. > > Then that other task had a reference and itself should have bumped the > count, and the other user would never have seen it hit zero. Nope. The container could have been empty (of tasks) and hence had a zero count. The liveness model used by containers is that when the refcount hits zero, the container isn't immediately destroyed (since it can contain useful historical usage data, etc) but simply becomes eligible for destruction by userspace via an rmdir(). > > Even if there are still pages attached to the container, why not just > have those take a reference, and don't bother actually freeing the > container until the last true reference is dropped? Yes, we could potentially just use the main count variable rather than having separate per-subsystem extra refcounts. The main reasons to do it this way are: - the root subsystem state for a subsystem can shift between different "struct container" objects if it was previously inactive and gets mounted as part of a hierarchy (or similarly, gets unmounted and goes inactive). Possibly we could get around this by simply saying not doing refcounting on the subsys states attached to root containers since they can never be freed anyway. - At some point I'd like to be able to support shifting subsystems between active hierarchies, at least in limited cases such as where the hierarchies are isomorphic, or binding/unbinding subsystems to/from active hierarchies. At that point we definitely need to be able to split out the different subsystem state refcounts from one another in the same hierarchy. - we'd still have the issue that Balbir wants to be able to drop a reference in a non-sleeping context, and we want to avoid doing excessive synchronization in the normal case when the css_put() doesn't put the reference count to zero. > > Does it matter if the destruction callbacks don't happen until well > after an attempt to destroy the container is made? Well that's sort of the point of putting a synchronize_rcu() in container_diput() - it ensures that the actual destruction of the object doesn't occur until sufficiently after the destruction attempt is initiated that no one is still using the reference. The alternative would be something that polls to spot whether refcounts have reached zero and if so runs the userspace helper. That doesn't seem particularly palatable when you have large numbers of containers, if we can avoid it easily. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 15:49 ` Paul (宝瑠) Menage 2007-07-17 16:02 ` Dave Hansen @ 2007-07-17 17:23 ` Paul Jackson 2007-07-17 17:40 ` Balbir Singh 2 siblings, 0 replies; 26+ messages in thread From: Paul Jackson @ 2007-07-17 17:23 UTC (permalink / raw) To: Paul (______) Menage Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm > Because as soon as you do the atomic_dec_and_test() on css->refcnt and > the refcnt hits zero, then theoretically someone other thread (that > already holds container_mutex) could check that the refcount is zero > and free the container structure. Not just theory ... I've debugged crashes from live customer systems that were basically this race. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 15:49 ` Paul (宝瑠) Menage 2007-07-17 16:02 ` Dave Hansen 2007-07-17 17:23 ` Paul Jackson @ 2007-07-17 17:40 ` Balbir Singh 2007-07-17 17:44 ` Paul (宝瑠) Menage 2007-07-17 17:53 ` Paul Jackson 2 siblings, 2 replies; 26+ messages in thread From: Balbir Singh @ 2007-07-17 17:40 UTC (permalink / raw) To: "Paul (??) Menage" Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton Paul (??) Menage wrote: > Because as soon as you do the atomic_dec_and_test() on css->refcnt and > the refcnt hits zero, then theoretically someone other thread (that > already holds container_mutex) could check that the refcount is zero > and free the container structure. > Hi, Paul, That sounds correct. I wonder now if the solution should be some form of delegation for deletion of unreferenced containers (HINT: work queue or kernel threads). > Adding a synchronize_rcu in container_diput() guarantees that the > container structure won't be freed while someone may still be > accessing it. > Do we take rcu_read_lock() in css_put() path or use call_rcu() to free the container? >> >> Could you please elaborate as to why using a release agent is broken >> when the memory controller is attached to it? > > Because then it will try to take container_mutex in css_put() if it > drops the last reference to a container, which is the thing that you > said you had to avoid since you called css_put() in contexts that > couldn't sleep. > > Paul -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:40 ` Balbir Singh @ 2007-07-17 17:44 ` Paul (宝瑠) Menage 2007-07-17 17:55 ` Paul Jackson 2007-07-17 18:11 ` Balbir Singh 2007-07-17 17:53 ` Paul Jackson 1 sibling, 2 replies; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 17:44 UTC (permalink / raw) To: balbir Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > That sounds correct. I wonder now if the solution should be some form > of delegation for deletion of unreferenced containers (HINT: work queue > or kernel threads). What a great idea. In fact, that's exactly what the release agent patch already does. > > > Adding a synchronize_rcu in container_diput() guarantees that the > > container structure won't be freed while someone may still be > > accessing it. > > > > Do we take rcu_read_lock() in css_put() path or use call_rcu() to > free the container? Good point, we ought to add rcu_read_lock() (even though it doesn't actually do anything on architectures other than alpha, right?) Using call_rcu to do the container kfree rather than synchronize_rcu() would be a possible future optimization, yes. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:44 ` Paul (宝瑠) Menage @ 2007-07-17 17:55 ` Paul Jackson 2007-07-17 17:57 ` Paul (宝瑠) Menage 2007-07-17 18:11 ` Balbir Singh 1 sibling, 1 reply; 26+ messages in thread From: Paul Jackson @ 2007-07-17 17:55 UTC (permalink / raw) To: Paul (______) Menage Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm Paul M wrote: > In fact, that's exactly what the release agent > patch already does. I'm feeling lazy ;) What's the Subject of that patch, for my easy searching? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:55 ` Paul Jackson @ 2007-07-17 17:57 ` Paul (宝瑠) Menage 0 siblings, 0 replies; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 17:57 UTC (permalink / raw) To: Paul Jackson; +Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm On 7/17/07, Paul Jackson <pj@sgi.com> wrote: > Paul M wrote: > > In fact, that's exactly what the release agent > > patch already does. > > I'm feeling lazy ;) What's the Subject of that > patch, for my easy searching? "Support for automatic userspace release agents" Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:44 ` Paul (宝瑠) Menage 2007-07-17 17:55 ` Paul Jackson @ 2007-07-17 18:11 ` Balbir Singh 2007-07-17 18:26 ` Paul (宝瑠) Menage 1 sibling, 1 reply; 26+ messages in thread From: Balbir Singh @ 2007-07-17 18:11 UTC (permalink / raw) To: "Paul (??) Menage" Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton Paul (??) Menage wrote: > On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> >> That sounds correct. I wonder now if the solution should be some form >> of delegation for deletion of unreferenced containers (HINT: work queue >> or kernel threads). > > What a great idea. In fact, that's exactly what the release agent > patch already does. :-) I should have seen that. I am a little lost thinking that container_rmdir() and the release agent check_for_release() work without too much knowledge of each other. BTW, what are the semantics of css_put() is it expected to free the container/run the release agent when the reference count of the container_subsys_state drops to zero? -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 18:11 ` Balbir Singh @ 2007-07-17 18:26 ` Paul (宝瑠) Menage 2007-07-18 4:29 ` Balbir Singh 0 siblings, 1 reply; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 18:26 UTC (permalink / raw) To: balbir Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > without too much knowledge of each other. BTW, what are the semantics > of css_put() is it expected to free the container/run the release agent > when the reference count of the container_subsys_state drops to zero? > If you css_put() the last reference on a subsystem state object and the associated container is marked as notify_on_release, then check_for_release() is called which does a more full check of whether the container is releasable. If it is, a workqueue task is scheduled to run the userspace release agent, which can then do anything it wants, including potentially deleting the empty container. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 18:26 ` Paul (宝瑠) Menage @ 2007-07-18 4:29 ` Balbir Singh 2007-07-18 5:30 ` Balbir Singh 2007-07-18 6:07 ` Paul (宝瑠) Menage 0 siblings, 2 replies; 26+ messages in thread From: Balbir Singh @ 2007-07-18 4:29 UTC (permalink / raw) To: "Paul (??) Menage" Cc: dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton Paul (??) Menage wrote: > On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >> without too much knowledge of each other. BTW, what are the semantics >> of css_put() is it expected to free the container/run the release agent >> when the reference count of the container_subsys_state drops to zero? >> > > If you css_put() the last reference on a subsystem state object and > the associated container is marked as notify_on_release, then > check_for_release() is called which does a more full check of whether > the container is releasable. If it is, a workqueue task is scheduled > to run the userspace release agent, which can then do anything it > wants, including potentially deleting the empty container. > Ok.. so my problem still remains, how do I get a non-blocking atomic reference increment/decrement routine, that would prevent my container from being deleted? I don't find cpusets using css_put(). I was hoping that we could alter css_* would provide the functionality I need. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-18 4:29 ` Balbir Singh @ 2007-07-18 5:30 ` Balbir Singh 2007-07-18 5:52 ` Srivatsa Vaddagiri 2007-07-18 23:15 ` Paul Menage 2007-07-18 6:07 ` Paul (宝瑠) Menage 1 sibling, 2 replies; 26+ messages in thread From: Balbir Singh @ 2007-07-18 5:30 UTC (permalink / raw) To: Paul (??) Menage Cc: balbir, dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton Balbir Singh wrote: > Paul (??) Menage wrote: >> On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: >>> without too much knowledge of each other. BTW, what are the semantics >>> of css_put() is it expected to free the container/run the release agent >>> when the reference count of the container_subsys_state drops to zero? >>> >> If you css_put() the last reference on a subsystem state object and >> the associated container is marked as notify_on_release, then >> check_for_release() is called which does a more full check of whether >> the container is releasable. If it is, a workqueue task is scheduled >> to run the userspace release agent, which can then do anything it >> wants, including potentially deleting the empty container. >> > > Ok.. so my problem still remains, how do I get a non-blocking atomic > reference increment/decrement routine, that would prevent my > container from being deleted? > > I don't find cpusets using css_put(). I was hoping that we could > alter css_* would provide the functionality I need. > > Thinking out loud again, can we add can_destory() callbacks? -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-18 5:30 ` Balbir Singh @ 2007-07-18 5:52 ` Srivatsa Vaddagiri 2007-07-18 23:15 ` Paul Menage 1 sibling, 0 replies; 26+ messages in thread From: Srivatsa Vaddagiri @ 2007-07-18 5:52 UTC (permalink / raw) To: Balbir Singh Cc: Paul (??) Menage, dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton On Wed, Jul 18, 2007 at 11:00:39AM +0530, Balbir Singh wrote: > Thinking out loud again, can we add can_destory() callbacks? I remember suggesting such a callback long before : http://marc.info/?l=linux-kernel&m=117576241131788&w=2 -- Regards, vatsa ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-18 5:30 ` Balbir Singh 2007-07-18 5:52 ` Srivatsa Vaddagiri @ 2007-07-18 23:15 ` Paul Menage 2007-07-19 3:44 ` Balbir Singh 1 sibling, 1 reply; 26+ messages in thread From: Paul Menage @ 2007-07-18 23:15 UTC (permalink / raw) To: balbir Cc: dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > Thinking out loud again, can we add can_destroy() callbacks? > What would the exact semantics of such a callback be? Since for proper interaction with release agents we need the subsystem to notify the framework when a subsystem object becomes releasable (currently as part of css_put()), what would a can_destroy() callback let you do that you couldn't do just by taking an extra css refcount to prevent destruction and releasing that refcount to allow destruction? Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-18 23:15 ` Paul Menage @ 2007-07-19 3:44 ` Balbir Singh 0 siblings, 0 replies; 26+ messages in thread From: Balbir Singh @ 2007-07-19 3:44 UTC (permalink / raw) To: Paul Menage Cc: dhaval, Pavel Emelianov, linux kernel mailing list, Paul Jackson, Linux Containers, Andrew Morton On 7/19/07, Paul Menage <menage@google.com> wrote: > On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > > > Thinking out loud again, can we add can_destroy() callbacks? > > > > What would the exact semantics of such a callback be? > > Since for proper interaction with release agents we need the subsystem > to notify the framework when a subsystem object becomes releasable > (currently as part of css_put()), what would a can_destroy() callback > let you do that you couldn't do just by taking an extra css refcount > to prevent destruction and releasing that refcount to allow > destruction? I was thinking along those lines before you mentioned that the next version of css_put() will not block. The advantage I see of can_destory() is that it allows subsystems to do their own reference counting and decide whether they are ready to be deleted or not. The other advantage I see is that it can act like a prepare to be deleted phase for the controller, the controller might decide to take some action in the can_destroy() phase, like the memory controller could decide to start reclaiming all the remaining page cache pages. BTW, do you know upfront as to when the next set of container enhancement patches will be ready? The css_put() issue is blocking us currently. Balbir ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-18 4:29 ` Balbir Singh 2007-07-18 5:30 ` Balbir Singh @ 2007-07-18 6:07 ` Paul (宝瑠) Menage 1 sibling, 0 replies; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-18 6:07 UTC (permalink / raw) To: balbir Cc: dhaval, linux kernel mailing list, Pavel Emelianov, Paul Jackson, Linux Containers, Andrew Morton On 7/17/07, Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > > Ok.. so my problem still remains, how do I get a non-blocking atomic > reference increment/decrement routine, that would prevent my > container from being deleted? css_put() in my new patchset will be non-blocking. > > I don't find cpusets using css_put(). Cpusets never needs to keep a non-process reference on a cpuset object, so the framework handles everything. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:40 ` Balbir Singh 2007-07-17 17:44 ` Paul (宝瑠) Menage @ 2007-07-17 17:53 ` Paul Jackson 2007-07-17 17:55 ` Paul (宝瑠) Menage 1 sibling, 1 reply; 26+ messages in thread From: Paul Jackson @ 2007-07-17 17:53 UTC (permalink / raw) To: balbir; +Cc: menage, dhaval, xemul, linux-kernel, containers, akpm > That sounds correct. I wonder now if the solution should be some form > of delegation for deletion of unreferenced containers (HINT: work queue > or kernel threads). At least for cpusets (the mother of all containers), notify on release is part of the user visible API of cpusets. The kernel does not remove cpusets; it runs a user program, /sbin/cpuset_release_agent. That program might choose to rmdir the cpuset directory, and/or do other actions, like notify a batch scheduler that one of its cpusets was released. That API is not as sacrosanct as some, but it is working, and I wouldn't want to break it without good reason. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:53 ` Paul Jackson @ 2007-07-17 17:55 ` Paul (宝瑠) Menage 2007-07-17 17:58 ` Paul Jackson 0 siblings, 1 reply; 26+ messages in thread From: Paul (宝瑠) Menage @ 2007-07-17 17:55 UTC (permalink / raw) To: Paul Jackson; +Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm On 7/17/07, Paul Jackson <pj@sgi.com> wrote: > > At least for cpusets (the mother of all containers), notify on release > is part of the user visible API of cpusets. The kernel does not remove > cpusets; it runs a user program, /sbin/cpuset_release_agent. That > program might choose to rmdir the cpuset directory, and/or do other > actions, like notify a batch scheduler that one of its cpusets was > released. Right, that's what the release agent patch for process containers does - there's a file in the hierarchy root called "release_agent" that contains the path to the program to run if a notify_on_release container goes idle. When you mount the "cpuset" filesystem, it automatically populates that file with "/sbin/cpuset_release_agent". A workqueue task is used to actually do the notifications so that references can be dropped without having to potentially run a userspace helper. Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Containers: css_put() dilemma 2007-07-17 17:55 ` Paul (宝瑠) Menage @ 2007-07-17 17:58 ` Paul Jackson 0 siblings, 0 replies; 26+ messages in thread From: Paul Jackson @ 2007-07-17 17:58 UTC (permalink / raw) To: Paul (______) Menage Cc: balbir, dhaval, xemul, linux-kernel, containers, akpm Paul M wrote: > Right, that's what the release agent patch for process containers does > - there's a file in the hierarchy root called "release_agent" that > contains the path to the program to run if a notify_on_release > container goes idle. When you mount the "cpuset" filesystem, it > automatically populates that file with "/sbin/cpuset_release_agent". Cool -- thanks. Sounds good. Nevermind my other "what's the Subject" query, if you haven't already answered it. You just nicely answered my questions. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2007-07-19 3:44 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-16 18:50 Containers: css_put() dilemma Balbir Singh 2007-07-16 19:03 ` Paul (宝瑠) Menage 2007-07-17 2:21 ` Balbir Singh 2007-07-17 2:35 ` Paul (宝瑠) Menage 2007-07-17 7:00 ` Balbir Singh 2007-07-17 7:18 ` Paul (宝瑠) Menage 2007-07-17 10:28 ` Balbir Singh 2007-07-17 15:49 ` Paul (宝瑠) Menage 2007-07-17 16:02 ` Dave Hansen 2007-07-17 16:15 ` Paul (宝瑠) Menage 2007-07-17 17:23 ` Paul Jackson 2007-07-17 17:40 ` Balbir Singh 2007-07-17 17:44 ` Paul (宝瑠) Menage 2007-07-17 17:55 ` Paul Jackson 2007-07-17 17:57 ` Paul (宝瑠) Menage 2007-07-17 18:11 ` Balbir Singh 2007-07-17 18:26 ` Paul (宝瑠) Menage 2007-07-18 4:29 ` Balbir Singh 2007-07-18 5:30 ` Balbir Singh 2007-07-18 5:52 ` Srivatsa Vaddagiri 2007-07-18 23:15 ` Paul Menage 2007-07-19 3:44 ` Balbir Singh 2007-07-18 6:07 ` Paul (宝瑠) Menage 2007-07-17 17:53 ` Paul Jackson 2007-07-17 17:55 ` Paul (宝瑠) Menage 2007-07-17 17:58 ` Paul Jackson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox