From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: "Paul Menage" <menage@google.com>
Cc: sekharan@us.ibm.com, ckrm-tech@lists.sourceforge.net,
linux-kernel@vger.kernel.org, xemul@sw.ru, rohitseth@google.com,
pj@sgi.com, "Eric W. Biederman" <ebiederm@xmission.com>,
mbligh@google.com, winget@google.com, containers@lists.osdl.org,
"Serge E. Hallyn" <serue@us.ibm.com>,
dev@sw.ru, devel@openvz.org
Subject: Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem
Date: Thu, 5 Apr 2007 12:09:50 +0530
Message-ID: <20070405063950.GA3435@in.ibm.com>
In-Reply-To: <6599ad830704041957y7b81c4ecrd21f4c08b9d7c72d@mail.gmail.com>
On Wed, Apr 04, 2007 at 07:57:40PM -0700, Paul Menage wrote:
> >Firstly, this is not a unique problem introduced by using ->nsproxy.
> >Secondly we have discussed this to some extent before
> >(http://lkml.org/lkml/2007/2/13/122). Essentially if we see zero tasks
> >sharing a resource object pointed to by ->nsproxy, then we can't be
> >racing with a function like bc_file_charge(), which simplifies the
> >problem quite a bit. In other words, seeing zero tasks in xxx_rmdir()
> >after taking manage_mutex is permission to kill nsproxy and associated
> >objects. Correct me if I am wrong here.
Let me clarify first that I wasn't proposing an extra refcount in
nsproxy to account for non-task references to a resource object pointed
to by nsproxy (say nsproxy->ctlr_data[BC_ID]). Any refcount needed on a
beancounter because a non-task object (like struct file) points to it
will be kept in the beancounter itself.
What I did want to say was this (sorry about the verbose rant):
	mount -t container -obeancounter none /dev/bean
	mkdir /dev/bean/foo
	echo some_pid > /dev/bean/foo/tasks
Associated with foo is a beancounter object A1 which contains (among other
things) max files that can be opened by tasks in foo. Also upon
successful file open, file->f_bc will point to A1.
Now let's say that someone is doing
	rmdir /dev/bean/foo
which will lead us to xxx_rmdir() doing this:
	mutex_lock(&manage_mutex);
	count = rcfs_task_count(foo's dentry);
rcfs_task_count() will essentially return the number of tasks pointing to
A1 through their nsproxy->ctlr_data[BC_ID].
IF (note that /if/ again) the count returned is zero, then my point was
that we can destroy the nsproxy behind foo and also A1, without worrying
about a 'struct file' still pointing to A1. This stems from the fact that
you cannot have a task's file->f_bc pointing to A1 without the task itself
pointing to A1 also (task->nsproxy->ctlr_data[BC_ID] == A1). I also
assume f_bc will get migrated with its owner task across beancounters
(which seems reasonable to me, at least from the 'struct file' context).
If there were indeed a file object still pointing to A1, then that can
only be true if rcfs_task_count() returns a non-zero value. Correct?
This is what I had in mind when I said this above: "In other words, seeing
zero tasks in xxx_rmdir() after taking manage_mutex is permission to kill
nsproxy and associated objects".
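To make the rmdir argument concrete, here is a minimal user-space sketch, not kernel code. Every name in it (bc_sketch, dir_sketch, the sketch_* helpers) is hypothetical, task_count stands in for what rcfs_task_count() would return, and the locking under manage_mutex is left out. It also bakes in the assumption stated above that f_bc migrates with its owner task.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the layouts from the patches. */
struct bc_sketch {
	int nr_files;      /* open files whose f_bc points here */
	bool destroyed;
};

struct dir_sketch {
	int task_count;    /* what rcfs_task_count() would return */
	struct bc_sketch *bc;
};

/* A task in the directory opens a file; the file is charged to bc. */
static int sketch_file_open(struct dir_sketch *dir)
{
	if (dir->task_count == 0)
		return -ESRCH;     /* no task here, so nobody can charge bc */
	dir->bc->nr_files++;
	return 0;
}

/* A task leaves; assumed: its files (f_bc) migrate away with it. */
static void sketch_task_leave(struct dir_sketch *dir, int files_held)
{
	assert(dir->task_count > 0 && files_held <= dir->bc->nr_files);
	dir->bc->nr_files -= files_held;
	dir->task_count--;
}

/* Models xxx_rmdir(); the real one would run under manage_mutex. */
static int sketch_rmdir(struct dir_sketch *dir)
{
	if (dir->task_count > 0)
		return -EBUSY;
	/* Zero tasks implies zero files under the assumption above. */
	assert(dir->bc->nr_files == 0);
	dir->bc->destroyed = true;
	return 0;
}
```

In this model the rmdir fails with -EBUSY while a task is present and succeeds only once the last task, along with its files, has left. If f_bc were not migrated, the inner assertion is exactly the check that would fail.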
OT: In your posting of the beancounter patches on top of containers, f_bc
isn't being migrated upon task movements. Is that intentional?
> OK, I've managed to reconstruct my reasoning remembered why it's
> important to have the refcounts associated with the subsystems, and
> why the simple use of the nsproxy count doesn't work.
I didn't mean to have non-task objects add refcounts to nsproxy. See
above.
> 1) Assume the system has a single task T, and two subsystems, A and B
>
> 2) Mount hierarchy H1, with subsystem A and root subsystem state A0,
> and hierarchy H2 with subsystem B and root subsystem state B0. Both
> H1/ and H2/ share a single nsproxy N0, with refcount 3 (including the
> reference from T), pointing at A0 and B0.
Why refcount 3? It can only be 1 (from T).
> 3) Create directory H1/foo, which creates subsystem state A1 (nsproxy
> N1, refcount 1, pointing at A1 and B0)
right. At this point A1.count should be 1 (because N1 is pointing to it)
> 4) Create directory H2/bar, which creates subsystem state B1 (nsproxy
> N2, refcount 1, pointing at A0 and B1)
right. B1.count = 1 also.
> 5) Move T into H1/foo/tasks and then H2/bar/tasks. It ends up with
> nsproxy N3, refcount 1, pointing at A1 and B1.
right. A1.count = 2 (N1, N3) and B1.count = 2 (N2, N3)
> 6) T creates an object that is charged to A1 and hence needs to take a
> reference on A1 in order to uncharge it later when it's released. So
> N3 now has a refcount of 2
No, N3 can continue to have a refcount of 1 while A1.count becomes 3
(N1, N3 and file->f_bc).
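The counts I am describing for steps 3 to 6 can be replayed with a small user-space sketch. The struct names and helpers below are hypothetical, and the nsproxy count here tracks task references only, which is my assumption for this discussion; each nsproxy takes one reference on every subsystem state it points to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-subsystem state (A0, A1, B0, B1) with its own refcount. */
struct ss_state {
	int count;
};

/* Hypothetical nsproxy model; count tracks task references only. */
struct nsp_counts {
	int count;
	struct ss_state *a;
	struct ss_state *b;
};

/* Creating an nsproxy takes one reference on each state it points to. */
static void nsp_init(struct nsp_counts *ns, struct ss_state *a,
		     struct ss_state *b)
{
	assert(ns != NULL && a != NULL && b != NULL);
	ns->count = 0;
	ns->a = a;
	ns->b = b;
	a->count++;
	b->count++;
}

/* A task starts using this nsproxy. */
static void nsp_attach_task(struct nsp_counts *ns)
{
	ns->count++;
}

/* A non-task object (e.g. file->f_bc) pins the state, not the nsproxy. */
static void ss_charge_object(struct ss_state *s)
{
	s->count++;
}
```

Replaying the steps (N1 pointing at A1/B0, N2 at A0/B1, N3 at A1/B1, T attached to N3, one file charged to A1) leaves N3.count at 1, A1.count at 3 and B1.count at 2, so the file reference is visible on A1 alone and says nothing about B1.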
> 7) Move T back to H1/tasks and H2/tasks; assume it picks up nsproxy N0
> again; N3 has a refcount of 1 now. (Assume that the object created in
> step 6 isn't one that's practical/desirable to relocate when the task
> that created it moves to a different container)
The object was created by the task, so I would expect it to get
migrated to the task's new context as well (which should be true in the
case of f_bc at least?). Can you give a practical example where you want
to migrate the task and not the object it created?
Anyway, coming down to the impact of all this on an nsproxy-based
solution, I would imagine this is what will happen when T moves back to
H1/tasks and H2/tasks:
- N3.count becomes zero
- We invoke free_nsproxy(N3), which drops refcounts on
all objects it is pointing to, i.e.:
	free_nsproxy()
	{
		if (N3->mnt_ns)
			put_mnt_ns(N3->mnt_ns);
		...
		if (N3->ctlr_data[BC_ID])
			put_bc(N3->ctlr_data[BC_ID]);
	}
put_bc()/get_bc() manage refcounts on beancounters. put_bc() will drop
A1.count to 2 (if f_bc wasn't migrated) and, not finding it zero, will not
destroy A1.
Essentially, in the nsproxy-based approach, I am having individual
controllers maintain their own refcount mechanism (just like mnt_ns or
uts_ns are doing today).
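As a rough user-space illustration of that last point, the sketch below models a beancounter that keeps its own refcount and an nsproxy that drops its one reference when its last task goes away. All names (bc_ref, nsp_ref and the helper functions) are hypothetical; the real put_bc()/get_bc() and free_nsproxy() would of course look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical beancounter with its own refcount, like mnt_ns or uts_ns. */
struct bc_ref {
	int count;
	bool destroyed;
};

static void bc_ref_get(struct bc_ref *bc)
{
	bc->count++;
}

/* Drop one reference; the beancounter dies only when the count hits zero. */
static void bc_ref_put(struct bc_ref *bc)
{
	assert(bc->count > 0);
	if (--bc->count == 0)
		bc->destroyed = true;
}

/* Hypothetical nsproxy model holding a single controller pointer. */
struct nsp_ref {
	int count;           /* task references */
	struct bc_ref *bc;   /* stands in for ctlr_data[BC_ID] */
};

/* Last task reference gone: drop the nsproxy's reference on the beancounter. */
static void nsp_ref_put(struct nsp_ref *ns)
{
	assert(ns->count > 0);
	if (--ns->count == 0 && ns->bc != NULL) {
		bc_ref_put(ns->bc);
		ns->bc = NULL;
	}
}
```

Starting from A1.count == 3 (N1, N3 and f_bc), dropping N3 takes A1 to 2 and leaves it alive. A1 is destroyed only after the file and N1 release their references too, and nothing here touches B1.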
> In this particular case the extra refcount on N3 is intended to keep
> A1 alive (which prevents H1/foo being deleted), but there's no way to
> tell from the structures in use whether it was taken on A1 or on B1.
> Neither H1/foo nor H2/bar can be deleted, even though nothing is
> intending to have a reference count on H2/bar.
>
> Putting the extra refcount explicitly either in A1, or else in a
> container object associated with H1/foo makes this more obvious.
I hope the above description resolves these points.
--
Regards,
vatsa