public inbox for linux-kernel@vger.kernel.org
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: "Paul Menage" <menage@google.com>
Cc: sekharan@us.ibm.com, ckrm-tech@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, xemul@sw.ru, rohitseth@google.com,
	pj@sgi.com, "Eric W. Biederman" <ebiederm@xmission.com>,
	mbligh@google.com, winget@google.com, containers@lists.osdl.org,
	"Serge E. Hallyn" <serue@us.ibm.com>,
	dev@sw.ru, devel@openvz.org
Subject: Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem
Date: Thu, 5 Apr 2007 12:09:50 +0530
Message-ID: <20070405063950.GA3435@in.ibm.com>
In-Reply-To: <6599ad830704041957y7b81c4ecrd21f4c08b9d7c72d@mail.gmail.com>

On Wed, Apr 04, 2007 at 07:57:40PM -0700, Paul Menage wrote:
> >Firstly, this is not a unique problem introduced by using ->nsproxy.
> >Secondly we have discussed this to some extent before
> >(http://lkml.org/lkml/2007/2/13/122). Essentially if we see zero tasks
> >sharing a resource object pointed to by ->nsproxy, then we can't be
> >racing with a function like bc_file_charge(), which simplifies the
> >problem quite a bit. In other words, seeing zero tasks in xxx_rmdir()
> >after taking manage_mutex is permission to kill nsproxy and associated
> >objects. Correct me if I am wrong here.

Let me clarify first that I wasn't proposing an extra ref count in
nsproxy to account for non-task references to a resource object pointed
to by nsproxy (say nsproxy->ctlr_data[BC_ID]). Refcounts that a
beancounter needs because a non-task object (like struct file) points
to it will be kept in the beancounter itself.

What I did want to say was this (sorry about the verbose rant):

	mount -t container -obeancounter none /dev/bean
	mkdir /dev/bean/foo
	echo some_pid > /dev/bean/foo/tasks

Associated with foo is a beancounter object A1 which contains (among other
things) max files that can be opened by tasks in foo. Also upon
successful file open, file->f_bc will point to A1.

Now let's say that someone is doing

	rmdir /dev/bean/foo

which will lead us to xxx_rmdir() doing this:
	
	mutex_lock(&manage_mutex);

	count = rcfs_task_count(foo's dentry);

rcfs_task_count() will essentially return the number of tasks pointing
to A1 through their nsproxy->ctlr_data[BC_ID].

IF (note that /if/ again) the count returned is zero, then my point was
we can destroy the nsproxy behind foo and also A1, not worrying about a
'struct file' still pointing to A1. This stems from the fact that you
cannot have a task's file->f_bc pointing to A1 w/o the task itself
pointing to A1 also (task->nsproxy->ctlr_data[BC_ID] == A1). I also
assume f_bc will get migrated with its owner task across beancounters
(which seems reasonable to me, at least from the 'struct file' context).

If there was indeed a file object still pointing to A1, then that can
only be true if rcfs_task_count() returns a non-zero value. Correct?

This is what I had in mind when I said this above : "In other words, seeing 
zero tasks in xxx_rmdir() after taking manage_mutex is permission to kill 
nsproxy and associated objects".
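
A minimal userspace sketch of the argument above, assuming that f_bc
always migrates together with its owner task. The names bc_model,
bc_attach_task(), bc_file_open(), bc_move_task() and bc_rmdir() are
hypothetical and do not come from the patches; the locking done by
manage_mutex is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's data structures. */
struct bc_model {
	int nr_tasks;	/* tasks whose nsproxy->ctlr_data[BC_ID] points here */
	int nr_files;	/* open files whose f_bc points here */
	bool destroyed;
};

/* Attach one task to a beancounter (echo pid > .../tasks). */
static void bc_attach_task(struct bc_model *bc)
{
	bc->nr_tasks++;
}

/* A task attached to bc opens a file, so file->f_bc is set to bc. */
static void bc_file_open(struct bc_model *bc)
{
	assert(bc->nr_tasks > 0);	/* only attached tasks can charge bc */
	bc->nr_files++;
}

/*
 * Move one task, and the files it owns, to another beancounter.
 * Assumption: f_bc migrates together with its owner task.
 */
static void bc_move_task(struct bc_model *from, struct bc_model *to,
			 int files_owned)
{
	assert(from->nr_tasks > 0 && from->nr_files >= files_owned);
	from->nr_tasks--;
	from->nr_files -= files_owned;
	to->nr_tasks++;
	to->nr_files += files_owned;
}

/*
 * Model of xxx_rmdir(). In the kernel this check would run with
 * manage_mutex held. Returns 0 if bc was destroyed, -1 if tasks
 * still point to it.
 */
static int bc_rmdir(struct bc_model *bc)
{
	if (bc->nr_tasks != 0)
		return -1;
	/* Under the migration assumption, zero tasks implies zero files. */
	assert(bc->nr_files == 0);
	bc->destroyed = true;
	return 0;
}
```

In this model, removing foo fails while a task is still attached and
succeeds once the task, along with its files, has moved elsewhere. If
f_bc were not migrated, the nr_files check in bc_rmdir() is exactly
what would trip.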

OT: In your posting of beancounter patches on top of containers, f_bc
isn't being migrated upon task movements. Is that intentional?

> OK, I've managed to reconstruct my reasoning and remembered why it's
> important to have the refcounts associated with the subsystems, and
> why the simple use of the nsproxy count doesn't work.

I didn't mean to have non-task objects add refcounts to nsproxy. See
above.

> 1) Assume the system has a single task T, and two subsystems, A and B
> 
> 2) Mount hierarchy H1, with subsystem A and root subsystem state A0,
> and hierarchy H2 with subsystem B and root subsystem state B0. Both
> H1/ and H2/ share a single nsproxy N0, with refcount 3 (including the
> reference from T), pointing at A0 and B0.

Why refcount 3? It can only be 1 (from T) ..

> 3) Create directory H1/foo, which creates subsystem state A1 (nsproxy
> N1, refcount 1, pointing at A1 and B0)

right. At this point A1.count should be 1 (because N1 is pointing to it)

> 4) Create directory H2/bar, which creates subsystem state B1 (nsproxy
> N2, refcount 1, pointing at A0 and B1)

right. B1.count = 1 also.

> 5) Move T into H1/foo/tasks and then H2/bar/tasks. It ends up with
> nsproxy N3, refcount 1, pointing at A1 and B1.

right. A1.count = 2 (N1, N3) and B1.count = 2 (N2, N3)

> 6) T creates an object that is charged to A1 and hence needs to take a
> reference on A1 in order to uncharge it later when it's released. So
> N3 now has a refcount of 2

no ..N3 can continue to have 1 while A1.count becomes 3 (N1, N3 and
file->f_bc)

> 7) Move T back to H1/tasks and H2/tasks; assume it picks up nsproxy N0
> again; N3 has a refcount of 1 now. (Assume that the object created in
> step 6 isn't one that's practical/desirable to relocate when the task
> that created it moves to a different container)

The object was created by the task, so I would expect it to be migrated
to the task's new context too (which should be true in the case of
f_bc at least?). Can you give a practical example where you want to
migrate the task and not the object it created?

Anyway, coming down to the impact of all this for a nsproxy based
solution, I would imagine this is what will happen when T moves back to
H1/tasks and H2/tasks:

	- N3.count becomes zero
	- We invoke free_nsproxy(N3), which drops refcounts on
	  all objects it is pointing to i.e
		
	free_nsproxy()
	{
		if (N3->mnt_ns)
			put_mnt_ns(N3->mnt_ns);
		...
		if (N3->ctlr_data[BC_ID])
			put_bc(N3->ctlr_data[BC_ID]);
	}

put_bc()/get_bc() manage refcounts on beancounters. put_bc() will drop
A1.count to 2 (if f_bc wasn't migrated) and, not finding it zero, will
not destroy A1.

Essentially, in the nsproxy based approach, I am having individual
controllers maintain their own refcount mechanism (just like mnt_ns or
uts_ns are doing today).
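
The counts in the walk-through above can be checked with a small
userspace model. This is only a sketch under stated assumptions: the
types ctlr_state and nsproxy_model and the helpers get_state(),
put_state(), nsproxy_create() and put_nsproxy() are hypothetical
stand-ins for the beancounter, nsproxy, get_bc()/put_bc() and
free_nsproxy(), and an nsproxy is assumed to start with a single
reference held by whoever created it.

```c
#include <assert.h>
#include <stdbool.h>

enum { A_ID, B_ID, NR_CTLR };	/* hypothetical controller ids */

/* Per-controller state (e.g. a beancounter) with its own refcount. */
struct ctlr_state {
	int count;
	bool destroyed;
};

/* Model of an nsproxy: it only counts its direct users. */
struct nsproxy_model {
	int count;
	bool freed;
	struct ctlr_state *ctlr_data[NR_CTLR];
};

static void get_state(struct ctlr_state *s)
{
	s->count++;
}

/* Drop one reference; destroy the state when the last one goes. */
static void put_state(struct ctlr_state *s)
{
	assert(s->count > 0);
	if (--s->count == 0)
		s->destroyed = true;
}

/* New nsproxy with one reference; it pins both states it points to. */
static void nsproxy_create(struct nsproxy_model *ns,
			   struct ctlr_state *a, struct ctlr_state *b)
{
	ns->count = 1;
	ns->freed = false;
	ns->ctlr_data[A_ID] = a;
	ns->ctlr_data[B_ID] = b;
	get_state(a);
	get_state(b);
}

/* Drop one reference; on the last one, release every pinned state. */
static void put_nsproxy(struct nsproxy_model *ns)
{
	int i;

	assert(ns->count > 0);
	if (--ns->count > 0)
		return;
	for (i = 0; i < NR_CTLR; i++)
		put_state(ns->ctlr_data[i]);
	ns->freed = true;
}
```

Replaying steps 3 to 7 with this model: creating N1 (A1, B0), N2
(A0, B1) and N3 (A1, B1) leaves A1.count and B1.count at 2; a file
charged to A1 calls get_state() and raises A1.count to 3 while N3.count
stays at 1; dropping N3 frees it and leaves A1.count at 2 and B1.count
at 1. The extra reference is visibly on A1 alone, so B1 goes away as
soon as N2 is dropped.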

> In this particular case the extra refcount on N3 is intended to keep
> A1 alive (which prevents H1/foo being deleted), but there's no way to
> tell from the structures in use whether it was taken on A1 or on B1.
> Neither H1/foo nor H2/bar can be deleted, even though nothing is
> intending to have a reference count on H2/bar.
> 
> Putting the extra refcount explicitly either in A1, or else in a
> container object associated with H1/foo makes this more obvious.

Hope the above description resolves these points ..


-- 
Regards,
vatsa
