* [Fwd: [-mm] Add an owner to the mm_struct (v9)]
@ 2008-04-14 14:13 Balbir Singh
2008-04-14 14:49 ` Pekka Enberg
2008-04-15 0:07 ` Andrew Morton
0 siblings, 2 replies; 18+ messages in thread
From: Balbir Singh @ 2008-04-14 14:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Paul Menage, Serge E. Hallyn, Pekka Enberg,
linux kernel mailing list
Hi,
If nobody has any specific objection to this patch, can I go ahead and request
Andrew to include this patch?
Balbir
-------- Original Message --------
Subject: [-mm] Add an owner to the mm_struct (v9)
Date: Thu, 10 Apr 2008 14:46:02 +0530
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Paul Menage <menage@google.com>, Pavel Emelianov <xemul@openvz.org>
CC: Hugh Dickins <hugh@veritas.com>, Sudhir Kumar <skumar@linux.vnet.ibm.com>,
YAMAMOTO Takashi <yamamoto@valinux.co.jp>, linux-kernel@vger.kernel.org,
taka@valinux.co.jp, linux-mm@kvack.org, David Rientjes <rientjes@google.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>, Andrew Morton
<akpm@linux-foundation.org>, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com>
Changelog v8
------------
1. CONFIG_MM_OWNER has no help, it is an internal option to be used by
other configuration controls.
2. Added __read_mostly for cgroup_mm_owner_callback
3. Removed delay_group_leader() optimization
Changelog v7
------------
1. Make mm_need_new_owner() more readable
2. Remove extra white space from init_task.h
Changelog v6
------------
1. Fix typos
2. Document the use of delay_group_leader()
Changelog v5
------------
Remove the hooks for .owner from init_task.h and move it to init/main.c
Changelog v4
------------
1. Release rcu_read_lock() after acquiring task_lock(). Also get a reference
to the task_struct
2. Change cgroup mm_owner_changed callback to callback only if the
cgroup of old and new task is different and to pass the old and new
cgroups instead of task pointers
3. Port the patch to 2.6.25-rc8-mm1
Changelog v3
------------
1. Add mm->owner change callbacks using cgroups
This patch removes the mem_cgroup member from mm_struct and instead adds
an owner. This approach was suggested by Paul Menage. The advantage is
that, once mm->owner is known, the cgroup can be determined from it via
the subsystem id. It also allows several control groups that are
virtually grouped by mm_struct to exist independently of the memory
controller, i.e., without adding a mem_cgroup-style pointer to mm_struct
for each controller.
A new config option CONFIG_MM_OWNER is added and the memory resource
controller selects this config option.
This patch also adds cgroup callbacks to notify subsystems when mm->owner
changes. The mm_owner_changed callback is called with the task_lock()
of the new task held, just prior to changing mm->owner.
I am indebted to Paul Menage for the several reviews of this patchset
and helping me make it lighter and simpler.
This patch was tested on a powerpc box; it was compiled with the
MM_OWNER config both turned on and off.
After the thread group leader exits, it is moved to the init_css_set by
cgroup_exit(); thus all future charges from running threads will
be redirected to the init_css_set's subsystems.
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
fs/exec.c | 1
include/linux/cgroup.h | 15 ++++++++
include/linux/memcontrol.h | 16 +-------
include/linux/mm_types.h | 5 +-
include/linux/sched.h | 13 +++++++
init/Kconfig | 7 +++
init/main.c | 1
kernel/cgroup.c | 30 ++++++++++++++++
kernel/exit.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
kernel/fork.c | 11 ++++-
mm/memcontrol.c | 24 +------------
11 files changed, 167 insertions(+), 39 deletions(-)
diff -puN fs/exec.c~memory-controller-add-mm-owner fs/exec.c
--- linux-2.6.25-rc8/fs/exec.c~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/fs/exec.c 2008-04-08 17:23:20.000000000 +0530
@@ -735,6 +735,7 @@ static int exec_mmap(struct mm_struct *m
tsk->active_mm = mm;
activate_mm(active_mm, mm);
task_unlock(tsk);
+ mm_update_next_owner(mm);
arch_pick_mmap_layout(mm);
if (old_mm) {
up_read(&old_mm->mmap_sem);
diff -puN include/linux/cgroup.h~memory-controller-add-mm-owner
include/linux/cgroup.h
--- linux-2.6.25-rc8/include/linux/cgroup.h~memory-controller-add-mm-owner
2008-04-08 17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/include/linux/cgroup.h 2008-04-08 17:23:20.000000000
+0530
@@ -300,6 +300,12 @@ struct cgroup_subsys {
struct cgroup *cgrp);
void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
+ /*
+ * This routine is called with the task_lock of mm->owner held
+ */
+ void (*mm_owner_changed)(struct cgroup_subsys *ss,
+ struct cgroup *old,
+ struct cgroup *new);
int subsys_id;
int active;
int disabled;
@@ -385,4 +391,13 @@ static inline int cgroupstats_build(stru
#endif /* !CONFIG_CGROUPS */
+#ifdef CONFIG_MM_OWNER
+extern void
+cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new);
+#else /* !CONFIG_MM_OWNER */
+static inline void
+cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
+{
+}
+#endif /* CONFIG_MM_OWNER */
#endif /* _LINUX_CGROUP_H */
diff -puN include/linux/init_task.h~memory-controller-add-mm-owner
include/linux/init_task.h
diff -puN include/linux/memcontrol.h~memory-controller-add-mm-owner
include/linux/memcontrol.h
--- linux-2.6.25-rc8/include/linux/memcontrol.h~memory-controller-add-mm-owner
2008-04-08 17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/include/linux/memcontrol.h 2008-04-08
17:23:20.000000000 +0530
@@ -27,9 +27,6 @@ struct mm_struct;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-extern void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p);
-extern void mm_free_cgroup(struct mm_struct *mm);
-
#define page_reset_bad_cgroup(page) ((page)->page_cgroup = 0)
extern struct page_cgroup *page_get_page_cgroup(struct page *page);
@@ -48,8 +45,10 @@ extern unsigned long mem_cgroup_isolate_
extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
+extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
+
#define mm_match_cgroup(mm, cgroup) \
- ((cgroup) == rcu_dereference((mm)->mem_cgroup))
+ ((cgroup) == mem_cgroup_from_task((mm)->owner))
extern int mem_cgroup_prepare_migration(struct page *page);
extern void mem_cgroup_end_migration(struct page *page);
@@ -73,15 +72,6 @@ extern long mem_cgroup_calc_reclaim_inac
struct zone *zone, int priority);
#else /* CONFIG_CGROUP_MEM_RES_CTLR */
-static inline void mm_init_cgroup(struct mm_struct *mm,
- struct task_struct *p)
-{
-}
-
-static inline void mm_free_cgroup(struct mm_struct *mm)
-{
-}
-
static inline void page_reset_bad_cgroup(struct page *page)
{
}
diff -puN include/linux/mm_types.h~memory-controller-add-mm-owner
include/linux/mm_types.h
--- linux-2.6.25-rc8/include/linux/mm_types.h~memory-controller-add-mm-owner
2008-04-08 17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/include/linux/mm_types.h 2008-04-08
17:23:20.000000000 +0530
@@ -230,8 +230,9 @@ struct mm_struct {
/* aio bits */
rwlock_t ioctx_list_lock; /* aio lock */
struct kioctx *ioctx_list;
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
- struct mem_cgroup *mem_cgroup;
+#ifdef CONFIG_MM_OWNER
+ struct task_struct *owner; /* The thread group leader that */
+ /* owns the mm_struct. */
#endif
#ifdef CONFIG_PROC_FS
diff -puN include/linux/sched.h~memory-controller-add-mm-owner include/linux/sched.h
--- linux-2.6.25-rc8/include/linux/sched.h~memory-controller-add-mm-owner
2008-04-08 17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/include/linux/sched.h 2008-04-08 17:23:20.000000000
+0530
@@ -2144,6 +2144,19 @@ static inline void migration_init(void)
#define TASK_STATE_TO_CHAR_STR "RSDTtZX"
+#ifdef CONFIG_MM_OWNER
+extern void mm_update_next_owner(struct mm_struct *mm);
+extern void mm_init_owner(struct mm_struct *mm, struct task_struct *p);
+#else
+static inline void mm_update_next_owner(struct mm_struct *mm)
+{
+}
+
+static inline void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
+{
+}
+#endif /* CONFIG_MM_OWNER */
+
#endif /* __KERNEL__ */
#endif
diff -puN init/Kconfig~memory-controller-add-mm-owner init/Kconfig
--- linux-2.6.25-rc8/init/Kconfig~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/init/Kconfig 2008-04-08 17:31:38.000000000 +0530
@@ -371,9 +371,13 @@ config RESOURCE_COUNTERS
infrastructure that works with cgroups
depends on CGROUPS
+config MM_OWNER
+ bool
+
config CGROUP_MEM_RES_CTLR
bool "Memory Resource Controller for Control Groups"
depends on CGROUPS && RESOURCE_COUNTERS
+ select MM_OWNER
help
Provides a memory resource controller that manages both page cache and
RSS memory.
@@ -386,6 +390,9 @@ config CGROUP_MEM_RES_CTLR
Only enable when you're ok with these trade offs and really
sure you need the memory resource controller.
+ This config option also selects MM_OWNER config option, which
+ could in turn add some fork/exit overhead.
+
config SYSFS_DEPRECATED
bool
diff -puN kernel/cgroup.c~memory-controller-add-mm-owner kernel/cgroup.c
--- linux-2.6.25-rc8/kernel/cgroup.c~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/kernel/cgroup.c 2008-04-08 17:29:45.000000000 +0530
@@ -118,6 +118,7 @@ static int root_count;
* be called.
*/
static int need_forkexit_callback;
+static int need_mm_owner_callback __read_mostly;
/* convenient tests for these bits */
inline int cgroup_is_removed(const struct cgroup *cgrp)
@@ -2485,6 +2486,7 @@ static void __init cgroup_init_subsys(st
}
need_forkexit_callback |= ss->fork || ss->exit;
+ need_mm_owner_callback |= !!ss->mm_owner_changed;
ss->active = 1;
}
@@ -2721,6 +2723,34 @@ void cgroup_fork_callbacks(struct task_s
}
}
+#ifdef CONFIG_MM_OWNER
+/**
+ * cgroup_mm_owner_callbacks - run callbacks when the mm->owner changes
+ * @p: the new owner
+ *
+ * Called on every change to mm->owner. mm_init_owner() does not
+ * invoke this routine, since it assigns the mm->owner the first time
+ * and does not change it.
+ */
+void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
+{
+ struct cgroup *oldcgrp, *newcgrp;
+
+ if (need_mm_owner_callback) {
+ int i;
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+ struct cgroup_subsys *ss = subsys[i];
+ oldcgrp = task_cgroup(old, ss->subsys_id);
+ newcgrp = task_cgroup(new, ss->subsys_id);
+ if (oldcgrp == newcgrp)
+ continue;
+ if (ss->mm_owner_changed)
+ ss->mm_owner_changed(ss, oldcgrp, newcgrp);
+ }
+ }
+}
+#endif /* CONFIG_MM_OWNER */
+
/**
* cgroup_post_fork - called on a new task after adding it to the task list
* @child: the task in question
diff -puN kernel/exit.c~memory-controller-add-mm-owner kernel/exit.c
--- linux-2.6.25-rc8/kernel/exit.c~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/kernel/exit.c 2008-04-10 14:42:11.000000000 +0530
@@ -577,6 +577,88 @@ void exit_fs(struct task_struct *tsk)
EXPORT_SYMBOL_GPL(exit_fs);
+#ifdef CONFIG_MM_OWNER
+/*
+ * Task p is exiting and it owned mm, lets find a new owner for it
+ */
+static inline int
+mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
+{
+ /*
+ * If there are other users of the mm and the owner (us) is exiting
+ * we need to find a new owner to take on the responsibility.
+ */
+ if (!mm)
+ return 0;
+ if (atomic_read(&mm->mm_users) <= 1)
+ return 0;
+ if (mm->owner != p)
+ return 0;
+ return 1;
+}
+
+void mm_update_next_owner(struct mm_struct *mm)
+{
+ struct task_struct *c, *g, *p = current;
+
+retry:
+ if (!mm_need_new_owner(mm, p))
+ return;
+
+ read_lock(&tasklist_lock);
+ /*
+ * Search in the children
+ */
+ list_for_each_entry(c, &p->children, sibling) {
+ if (c->mm == mm)
+ goto assign_new_owner;
+ }
+
+ /*
+ * Search in the siblings
+ */
+ list_for_each_entry(c, &p->parent->children, sibling) {
+ if (c->mm == mm)
+ goto assign_new_owner;
+ }
+
+ /*
+ * Search through everything else. We should not get
+ * here often
+ */
+ do_each_thread(g, c) {
+ if (c->mm == mm)
+ goto assign_new_owner;
+ } while_each_thread(g, c);
+
+ read_unlock(&tasklist_lock);
+ return;
+
+assign_new_owner:
+ BUG_ON(c == p);
+ get_task_struct(c);
+ /*
+ * The task_lock protects c->mm from changing.
+ * We always want mm->owner->mm == mm
+ */
+ task_lock(c);
+ /*
+ * Delay read_unlock() till we have the task_lock()
+ * to ensure that c does not slip away underneath us
+ */
+ read_unlock(&tasklist_lock);
+ if (c->mm != mm) {
+ task_unlock(c);
+ put_task_struct(c);
+ goto retry;
+ }
+ cgroup_mm_owner_callbacks(mm->owner, c);
+ mm->owner = c;
+ task_unlock(c);
+ put_task_struct(c);
+}
+#endif /* CONFIG_MM_OWNER */
+
/*
* Turn us into a lazy TLB process if we
* aren't already..
@@ -616,6 +698,7 @@ static void exit_mm(struct task_struct *
/* We don't want this task to be frozen prematurely */
clear_freeze_flag(tsk);
task_unlock(tsk);
+ mm_update_next_owner(mm);
mmput(mm);
}
diff -puN kernel/fork.c~memory-controller-add-mm-owner kernel/fork.c
--- linux-2.6.25-rc8/kernel/fork.c~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/kernel/fork.c 2008-04-08 17:23:20.000000000 +0530
@@ -358,14 +358,13 @@ static struct mm_struct * mm_init(struct
mm->ioctx_list = NULL;
mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->cached_hole_size = ~0UL;
- mm_init_cgroup(mm, p);
+ mm_init_owner(mm, p);
if (likely(!mm_alloc_pgd(mm))) {
mm->def_flags = 0;
return mm;
}
- mm_free_cgroup(mm);
free_mm(mm);
return NULL;
}
@@ -416,7 +415,6 @@ void mmput(struct mm_struct *mm)
spin_unlock(&mmlist_lock);
}
put_swap_token(mm);
- mm_free_cgroup(mm);
mmdrop(mm);
}
}
@@ -996,6 +994,13 @@ static void rt_mutex_init_task(struct ta
#endif
}
+#ifdef CONFIG_MM_OWNER
+void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
+{
+ mm->owner = p;
+}
+#endif /* CONFIG_MM_OWNER */
+
/*
* This creates a new process as a copy of the old one,
* but does not actually start it yet.
diff -puN mm/memcontrol.c~memory-controller-add-mm-owner mm/memcontrol.c
--- linux-2.6.25-rc8/mm/memcontrol.c~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/mm/memcontrol.c 2008-04-08 17:23:20.000000000 +0530
@@ -238,26 +238,12 @@ static struct mem_cgroup *mem_cgroup_fro
css);
}
-static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
+struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
{
return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
struct mem_cgroup, css);
}
-void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p)
-{
- struct mem_cgroup *mem;
-
- mem = mem_cgroup_from_task(p);
- css_get(&mem->css);
- mm->mem_cgroup = mem;
-}
-
-void mm_free_cgroup(struct mm_struct *mm)
-{
- css_put(&mm->mem_cgroup->css);
-}
-
static inline int page_cgroup_locked(struct page *page)
{
return bit_spin_is_locked(PAGE_CGROUP_LOCK_BIT, &page->page_cgroup);
@@ -478,6 +464,7 @@ unsigned long mem_cgroup_isolate_pages(u
int zid = zone_idx(z);
struct mem_cgroup_per_zone *mz;
+ BUG_ON(!mem_cont);
mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
if (active)
src = &mz->active_list;
@@ -576,7 +563,7 @@ retry:
mm = &init_mm;
rcu_read_lock();
- mem = rcu_dereference(mm->mem_cgroup);
+ mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
/*
* For every charge from the cgroup, increment reference count
*/
@@ -1006,7 +993,6 @@ mem_cgroup_create(struct cgroup_subsys *
if (unlikely((cont->parent) == NULL)) {
mem = &init_mem_cgroup;
- init_mm.mem_cgroup = mem;
page_cgroup_cache = KMEM_CACHE(page_cgroup, SLAB_PANIC);
} else
mem = kzalloc(sizeof(struct mem_cgroup), GFP_KERNEL);
@@ -1087,10 +1073,6 @@ static void mem_cgroup_move_task(struct
if (!thread_group_leader(p))
goto out;
- css_get(&mem->css);
- rcu_assign_pointer(mm->mem_cgroup, mem);
- css_put(&old_mem->css);
-
out:
mmput(mm);
}
diff -puN init/main.c~memory-controller-add-mm-owner init/main.c
--- linux-2.6.25-rc8/init/main.c~memory-controller-add-mm-owner 2008-04-08
17:23:20.000000000 +0530
+++ linux-2.6.25-rc8-balbir/init/main.c 2008-04-08 17:23:20.000000000 +0530
@@ -537,6 +537,7 @@ asmlinkage void __init start_kernel(void
printk(KERN_NOTICE);
printk(linux_banner);
setup_arch(&command_line);
+ mm_init_owner(&init_mm, &init_task);
setup_command_line(command_line);
unwind_setup();
setup_per_cpu_areas();
_
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-14 14:13 [Fwd: [-mm] Add an owner to the mm_struct (v9)] Balbir Singh
@ 2008-04-14 14:49 ` Pekka Enberg
2008-04-14 16:04 ` Paul Menage
2008-04-15 0:07 ` Andrew Morton
1 sibling, 1 reply; 18+ messages in thread
From: Pekka Enberg @ 2008-04-14 14:49 UTC (permalink / raw)
To: balbir
Cc: Andrew Morton, Paul Menage, Serge E. Hallyn,
linux kernel mailing list
Balbir Singh wrote:
> If nobody has any specific objection to this patch, can I go ahead and request
> Andrew to include this patch?
->owner in struct mm_struct is very useful for revokeat() so
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-14 14:49 ` Pekka Enberg
@ 2008-04-14 16:04 ` Paul Menage
2008-04-14 19:36 ` Andrew Morton
0 siblings, 1 reply; 18+ messages in thread
From: Paul Menage @ 2008-04-14 16:04 UTC (permalink / raw)
To: Balbir Singh; +Cc: Andrew Morton, Serge E. Hallyn, linux kernel mailing list
On Mon, Apr 14, 2008 at 7:49 AM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> Balbir Singh wrote:
>
> > If nobody has any specific objection to this patch, can I go ahead and
> request
> > Andrew to include this patch?
> >
>
> ->owner in struct mm_struct is very useful for revokeat() so
>
> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
>
Reviewed-by: Paul Menage <menage@google.com>
I'm still not sure when to use Acked-by versus Reviewed-by - is it
just based on the level of scrutiny/review?
Paul
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-14 16:04 ` Paul Menage
@ 2008-04-14 19:36 ` Andrew Morton
2008-04-14 21:06 ` Pekka Enberg
0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-04-14 19:36 UTC (permalink / raw)
To: Paul Menage; +Cc: balbir, serue, linux-kernel
On Mon, 14 Apr 2008 09:04:28 -0700
"Paul Menage" <menage@google.com> wrote:
> On Mon, Apr 14, 2008 at 7:49 AM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> > Balbir Singh wrote:
> >
> > > If nobody has any specific objection to this patch, can I go ahead and
> > request
> > > Andrew to include this patch?
> > >
> >
> > ->owner in struct mm_struct is very useful for revokeat() so
> >
> > Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
> >
>
> Reviewed-by: Paul Menage <menage@google.com>
>
> I'm still not sure when to use Acked-by versus Reviewed-by - is it
> just based on the level of scrutiny/review?
>
Yes. Reviewed-by: is a lot stronger than Acked-by:. I've seen some pretty
shocking Acked-by:'s :)
Documentation/SubmittingPatches goes into a surprising amount of detail.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-14 19:36 ` Andrew Morton
@ 2008-04-14 21:06 ` Pekka Enberg
0 siblings, 0 replies; 18+ messages in thread
From: Pekka Enberg @ 2008-04-14 21:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Paul Menage, balbir, serue, linux-kernel
On Mon, Apr 14, 2008 at 10:36 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> > > Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
> >
> > Reviewed-by: Paul Menage <menage@google.com>
> >
> > I'm still not sure when to use Acked-by versus Reviewed-by - is it
> > just based on the level of scrutiny/review?
>
> Yes. Reviewed-by: is a lot stronger that Acked-by:. I've seen some pretty
> shocking Acked-by:'s :)
Heh, I hope mine wasn't one. But the point here is that, no, I didn't
review the patch that closely (it's mostly cgroup stuff anyway), but I
do want to see struct mm_struct ->owner in the kernel for revoke, hence
the Acked-by.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-14 14:13 [Fwd: [-mm] Add an owner to the mm_struct (v9)] Balbir Singh
2008-04-14 14:49 ` Pekka Enberg
@ 2008-04-15 0:07 ` Andrew Morton
2008-04-15 17:13 ` Oleg Nesterov
1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-04-15 0:07 UTC (permalink / raw)
To: balbir; +Cc: menage, serue, penberg, linux-kernel, Oleg Nesterov
On Mon, 14 Apr 2008 19:43:11 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 1. Add mm->owner change callbacks using cgroups
>
> This patch removes the mem_cgroup member from mm_struct and instead adds
> an owner. This approach was suggested by Paul Menage. The advantage of
> this approach is that, once the mm->owner is known, using the subsystem
> id, the cgroup can be determined. It also allows several control groups
> that are virtually grouped by mm_struct, to exist independent of the memory
> controller i.e., without adding mem_cgroup's for each controller,
> to mm_struct.
>
> A new config option CONFIG_MM_OWNER is added and the memory resource
> controller selects this config option.
>
> This patch also adds cgroup callbacks to notify subsystems when mm->owner
> changes. The mm_cgroup_changed callback is called with the task_lock()
> of the new task held and is called just prior to changing the mm->owner.
>
> I am indebted to Paul Menage for the several reviews of this patchset
> and helping me make it lighter and simpler.
>
> This patch was tested on a powerpc box, it was compiled with both the
> MM_OWNER config turned on and off.
>
> After the thread group leader exits, it's moved to init_css_state by
> cgroup_exit(), thus all future charges from runnings threads would
> be redirected to the init_css_set's subsystem.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
> fs/exec.c | 1
> include/linux/cgroup.h | 15 ++++++++
> include/linux/memcontrol.h | 16 +-------
> include/linux/mm_types.h | 5 +-
> include/linux/sched.h | 13 +++++++
> init/Kconfig | 7 +++
> init/main.c | 1
> kernel/cgroup.c | 30 ++++++++++++++++
> kernel/exit.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
> kernel/fork.c | 11 ++++-
> mm/memcontrol.c | 24 +------------
> 11 files changed, 167 insertions(+), 39 deletions(-)
>
> diff -puN fs/exec.c~memory-controller-add-mm-owner fs/exec.c
> --- linux-2.6.25-rc8/fs/exec.c~memory-controller-add-mm-owner 2008-04-08
> 17:23:20.000000000 +0530
> +++ linux-2.6.25-rc8-balbir/fs/exec.c 2008-04-08 17:23:20.000000000 +0530
Rather a lot of wordwrapping here, but the patch applied OK nonetheless.
This means that all the code was within 80 columns ;)
> diff -puN kernel/cgroup.c~memory-controller-add-mm-owner kernel/cgroup.c
> --- linux-2.6.25-rc8/kernel/cgroup.c~memory-controller-add-mm-owner 2008-04-08
> 17:23:20.000000000 +0530
> +++ linux-2.6.25-rc8-balbir/kernel/cgroup.c 2008-04-08 17:29:45.000000000 +0530
> @@ -118,6 +118,7 @@ static int root_count;
> * be called.
> */
> static int need_forkexit_callback;
> +static int need_mm_owner_callback __read_mostly;
>
> /* convenient tests for these bits */
> inline int cgroup_is_removed(const struct cgroup *cgrp)
> @@ -2485,6 +2486,7 @@ static void __init cgroup_init_subsys(st
> }
>
> need_forkexit_callback |= ss->fork || ss->exit;
> + need_mm_owner_callback |= !!ss->mm_owner_changed;
I assume this trick is here to minimise runtime overhead on systems which
never use cgroups?
It'd be nice if the changelog were to describe what that overhead is.
Then I'd know whether to ask whether we should look at clearing
need_mm_owner_callback when the system stops using cgroups. Somehow.
>
> +#ifdef CONFIG_MM_OWNER
> +/*
> + * Task p is exiting and it owned mm, lets find a new owner for it
"let's"
> + */
> +static inline int
> +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
> +{
> + /*
> + * If there are other users of the mm and the owner (us) is exiting
> + * we need to find a new owner to take on the responsibility.
> + */
But this comment rather duplicates the above one.
> + if (!mm)
> + return 0;
> + if (atomic_read(&mm->mm_users) <= 1)
> + return 0;
> + if (mm->owner != p)
> + return 0;
> + return 1;
> +}
> +
> +void mm_update_next_owner(struct mm_struct *mm)
> +{
> + struct task_struct *c, *g, *p = current;
> +
> +retry:
> + if (!mm_need_new_owner(mm, p))
> + return;
> +
> + read_lock(&tasklist_lock);
> + /*
> + * Search in the children
> + */
> + list_for_each_entry(c, &p->children, sibling) {
> + if (c->mm == mm)
> + goto assign_new_owner;
> + }
> +
> + /*
> + * Search in the siblings
> + */
> + list_for_each_entry(c, &p->parent->children, sibling) {
> + if (c->mm == mm)
> + goto assign_new_owner;
> + }
> +
> + /*
> + * Search through everything else. We should not get
> + * here often
> + */
> + do_each_thread(g, c) {
> + if (c->mm == mm)
> + goto assign_new_owner;
> + } while_each_thread(g, c);
> +
> + read_unlock(&tasklist_lock);
Potentially-long tasklist_lock hold times are a concern. I don't suppose
rcu can save us?
Some additional commentary fleshing out "We should not get here often"
might set minds at ease. How not-often? Under which circumstances?
> + return;
> +
> +assign_new_owner:
> + BUG_ON(c == p);
> + get_task_struct(c);
> + /*
> + * The task_lock protects c->mm from changing.
> + * We always want mm->owner->mm == mm
> + */
> + task_lock(c);
> + /*
> + * Delay read_unlock() till we have the task_lock()
> + * to ensure that c does not slip away underneath us
> + */
> + read_unlock(&tasklist_lock);
> + if (c->mm != mm) {
> + task_unlock(c);
> + put_task_struct(c);
> + goto retry;
> + }
> + cgroup_mm_owner_callbacks(mm->owner, c);
afaict no callbacks are implemented in this patch? What's the plan here?
Should be covered in the changelog, please.
> + mm->owner = c;
> + task_unlock(c);
> + put_task_struct(c);
> +}
> +#endif /* CONFIG_MM_OWNER */
> +
>
> ...
>
> static inline int page_cgroup_locked(struct page *page)
> {
> return bit_spin_is_locked(PAGE_CGROUP_LOCK_BIT, &page->page_cgroup);
> @@ -478,6 +464,7 @@ unsigned long mem_cgroup_isolate_pages(u
> int zid = zone_idx(z);
> struct mem_cgroup_per_zone *mz;
>
> + BUG_ON(!mem_cont);
> mz = mem_cgroup_zoneinfo(mem_cont, nid, zid);
The mem_cgroup_zoneinfo() will immediately and reliably oops if mem_cont is
NULL, won't it? If so, the BUG_ON() is redundant?
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-15 0:07 ` Andrew Morton
@ 2008-04-15 17:13 ` Oleg Nesterov
2008-04-15 18:13 ` Paul Menage
0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2008-04-15 17:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: balbir, menage, serue, penberg, linux-kernel
On 04/14, Andrew Morton wrote:
>
> On Mon, 14 Apr 2008 19:43:11 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>
> > +void mm_update_next_owner(struct mm_struct *mm)
> > +{
> > + struct task_struct *c, *g, *p = current;
> > +
> > +retry:
> > + if (!mm_need_new_owner(mm, p))
> > + return;
> > +
> > + read_lock(&tasklist_lock);
> > + /*
> > + * Search in the children
> > + */
> > + list_for_each_entry(c, &p->children, sibling) {
> > + if (c->mm == mm)
> > + goto assign_new_owner;
> > + }
> > +
> > + /*
> > + * Search in the siblings
> > + */
> > + list_for_each_entry(c, &p->parent->children, sibling) {
> > + if (c->mm == mm)
> > + goto assign_new_owner;
> > + }
> > +
> > + /*
> > + * Search through everything else. We should not get
> > + * here often
> > + */
> > + do_each_thread(g, c) {
> > + if (c->mm == mm)
> > + goto assign_new_owner;
> > + } while_each_thread(g, c);
> > +
> > + read_unlock(&tasklist_lock);
>
> Potentially-long tasklist_lock hold times are a concern. I don't suppose
> rcu can save us?
I guess rcu can't help...
> Some additional commentary fleshing out "We should not get here often"
> might set minds at ease. How not-often? Under which circumstances?
Oh, I don't know what cgroup is, at all, but this looks really strange.
What about use_mm()? We can choose a kernel thread, but unuse_mm() doesn't
try to change ->owner...
Let's suppose the process with a lot of threads does exit_group() and nobody
else uses this ->mm. How many times will we re-assign mm->owner and iterate
over all the threads in the system?
Perhaps we can add an mm_struct->mm_user_list instead? In that case mm->owner
becomes first_entry()...
> > +assign_new_owner:
> > + BUG_ON(c == p);
> > + get_task_struct(c);
> > + /*
> > + * The task_lock protects c->mm from changing.
> > + * We always want mm->owner->mm == mm
> > + */
> > + task_lock(c);
> > + /*
> > + * Delay read_unlock() till we have the task_lock()
> > + * to ensure that c does not slip away underneath us
> > + */
> > + read_unlock(&tasklist_lock);
You can drop tasklist_lock right after get_task_struct(), the nested locks
are not preempt-friendly.
> > + if (c->mm != mm) {
> > + task_unlock(c);
> > + put_task_struct(c);
> > + goto retry;
> > + }
> > + cgroup_mm_owner_callbacks(mm->owner, c);
Can't we avoid calling cgroup_mm_owner_callbacks() at least when
mm->owner->cgroups == c->cgroups ?
Minor, but perhaps cgroup_mm_owner_callbacks() should check ->mm_owner_changed
!= NULL first, then play with task_cgroup()...
Oleg.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-15 17:13 ` Oleg Nesterov
@ 2008-04-15 18:13 ` Paul Menage
2008-04-15 19:59 ` Oleg Nesterov
0 siblings, 1 reply; 18+ messages in thread
From: Paul Menage @ 2008-04-15 18:13 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, balbir, serue, penberg, linux-kernel
On Tue, Apr 15, 2008 at 10:13 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> Let's suppose the process with a lot of threads does exit_group() and nobody
> else uses this ->mm. How many time we will re-assign mm->owner and iterate
> over the all threads in system ?
>
In general we won't get to the third loop, since one of the first two
loops (children or siblings) will find another mm user.
Paul
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-15 18:13 ` Paul Menage
@ 2008-04-15 19:59 ` Oleg Nesterov
2008-04-17 3:38 ` Balbir Singh
0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2008-04-15 19:59 UTC (permalink / raw)
To: Paul Menage; +Cc: Andrew Morton, balbir, serue, penberg, linux-kernel
On 04/15, Paul Menage wrote:
>
> On Tue, Apr 15, 2008 at 10:13 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> > Let's suppose the process with a lot of threads does exit_group() and nobody
> > else uses this ->mm. How many time we will re-assign mm->owner and iterate
> > over the all threads in system ?
> >
>
> In general we won't get to the third loop, since one of the first two
> loops (children or siblings) will find another mm user.
Well yes, the second loop checks parent->children ... all sub-threads have
the same parent.
I'd suggest to use ->real_parent though. And the third loop could be
for_each_process(g) {
c = g;
do {
if (!c->mm)
continue;
if (c->mm != mm)
break;
goto assign_new_owner;
} while_each_thread(g, c);
}
Still, can't we make mm->mm_users_list?
Oleg.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-15 19:59 ` Oleg Nesterov
@ 2008-04-17 3:38 ` Balbir Singh
2008-04-17 11:30 ` Oleg Nesterov
0 siblings, 1 reply; 18+ messages in thread
From: Balbir Singh @ 2008-04-17 3:38 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Paul Menage, Andrew Morton, serue, penberg, linux-kernel
Oleg Nesterov wrote:
> On 04/15, Paul Menage wrote:
>> On Tue, Apr 15, 2008 at 10:13 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
>>> Let's suppose a process with a lot of threads does exit_group() and nobody
>>> else uses this ->mm. How many times will we re-assign mm->owner and iterate
>>> over all the threads in the system?
>>>
>> In general we won't get to the third loop, since one of the first two
>> loops (children or siblings) will find another mm user.
>
> Well yes, the second loop checks parent->children ... all sub-threads have
> the same parent.
>
> I'd suggest using ->real_parent though. And the third loop could be
>
real_parent is for ptraced processes right?
> for_each_process(g) {
> 	c = g;
> 	do {
> 		if (!c->mm)
> 			continue;
> 		if (c->mm != mm)
> 			break;
> 		goto assign_new_owner;
> 	} while_each_thread(g, c);
> }
>
I had this loop earlier (inspired by zap_threads()). Is this loop more
efficient than what we have?
> Still, can't we make mm->mm_users_list?
>
I suspect that will be expensive to maintain, especially with a large number of
threads. I see a large space overhead, time overhead, and additional
synchronization overhead. Apart from finding the next owner, is there any other
advantage?
> Oleg.
>
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 3:38 ` Balbir Singh
@ 2008-04-17 11:30 ` Oleg Nesterov
2008-04-17 16:34 ` Paul Menage
0 siblings, 1 reply; 18+ messages in thread
From: Oleg Nesterov @ 2008-04-17 11:30 UTC (permalink / raw)
To: Balbir Singh; +Cc: Paul Menage, Andrew Morton, serue, penberg, linux-kernel
On 04/17, Balbir Singh wrote:
>
> Oleg Nesterov wrote:
> >
> > I'd suggest using ->real_parent though. And the third loop could be
>
> real_parent is for ptraced processes right?
No, please look at __ptrace_link(). ->real_parent is parent, ->parent
is ptracer or it is equal to ->real_parent.
> > for_each_process(g) {
> > 	c = g;
> > 	do {
> > 		if (!c->mm)
> > 			continue;
> > 		if (c->mm != mm)
> > 			break;
> > 		goto assign_new_owner;
> > 	} while_each_thread(g, c);
> > }
> >
>
> I had this loop earlier (inspired by zap_threads()). Is this loop more
> efficient than what we have?
All sub-threads have the same ->mm. Once we see that c->mm != mm, we don't
need to waste CPU iterating over all the other threads in the thread group.
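The saving from that early break can be seen with a small userspace model. The toy structs below are invented for illustration; the real code walks task lists with for_each_process()/while_each_thread():

```c
#include <assert.h>

/* Toy model: each "process" is a thread group whose threads all
 * share one mm, which is exactly the invariant the break relies on
 * (mm == 0 means no mm, i.e. a kernel thread). */
struct toy_thread { int mm; };
struct toy_proc   { struct toy_thread threads[4]; int nthreads; };

/* Scan for a thread using 'mm'.  Returns how many threads were
 * visited, so the effect of "break" is observable. */
static int scan(struct toy_proc *procs, int nprocs, int mm, int *found)
{
    int visited = 0;
    *found = 0;
    for (int p = 0; p < nprocs; p++) {
        for (int t = 0; t < procs[p].nthreads; t++) {
            visited++;
            int m = procs[p].threads[t].mm;
            if (!m)
                continue;  /* no mm: keep looking inside this group */
            if (m != mm)
                break;     /* whole group uses another mm: skip it */
            *found = 1;    /* the "goto assign_new_owner" case */
            return visited;
        }
    }
    return visited;
}
```

With break, one foreign mm disqualifies the whole group at once; replacing break with continue would visit every thread in every group.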
> > Still, can't we make mm->mm_users_list?
>
> I suspect that will be expensive to maintain, especially with a large number of
> threads. I see a large space overhead, time overhead, and additional
> synchronization overhead.
Not sure... but I didn't really think about the implementation.
> Apart from finding the next owner, is there any other
> advantage?
It could be used by coredump.
OK, please forget it. Even _if_ I am right, we can do this later.
Sadly, I don't have any time to read cgroup.c currently. Balbir, any
chance you have the "for dummies" explanation of what mm->owner is?
I mean, I can't understand how it is possible that 2 CLONE_VM tasks
are not equal wrt "ownering". When the old owner dies, we choose a
random thread with the same mm. But we do nothing when the last user
of ->mm dies. What is the point? (please feel free to ignore my q
if it is not easy to explain).
Also, please let me remind you,
> > + get_task_struct(c);
> > + /*
> > + * The task_lock protects c->mm from changing.
> > + * We always want mm->owner->mm == mm
> > + */
> > + task_lock(c);
> > + /*
> > + * Delay read_unlock() till we have the task_lock()
> > + * to ensure that c does not slip away underneath us
> > + */
> > + read_unlock(&tasklist_lock);
The comment is misleading: tasklist_lock buys nothing and could
be dropped right after get_task_struct(). tasklist_lock can't prevent
the task from exiting, but get_task_struct() pins the task_struct,
so it is safe to do task_lock() and re-check ->mm.
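The ordering described here (pin the task with get_task_struct(), after which the list lock can be dropped, then take task_lock() and re-check ->mm) can be modeled in plain C. Everything below is a toy stand-in for the kernel primitives, not the real API:

```c
#include <assert.h>

struct toy_task { int refcount; int mm; };  /* mm == 0: task exited */

static void toy_get_task(struct toy_task *t)    { t->refcount++; }
static void toy_task_lock(struct toy_task *t)   { (void)t; /* would lock */ }
static void toy_task_unlock(struct toy_task *t) { (void)t; /* would unlock */ }

/* Pin the candidate first; the pin (not the list lock) is what keeps
 * the task_struct alive.  Then lock and re-check ->mm, because the
 * task may have exited between the scan and the lock. */
static int toy_try_assign_owner(struct toy_task *c, int mm)
{
    toy_get_task(c);          /* c can't be freed from here on */
    /* read_unlock(&tasklist_lock) would already be safe here */
    toy_task_lock(c);
    if (c->mm != mm) {        /* lost the race: c dropped its mm */
        toy_task_unlock(c);
        return 0;
    }
    /* c is a valid new owner while the lock is held */
    toy_task_unlock(c);
    return 1;
}
```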
And we seem to have problems with use_mm(), no? Btw, what do you
think about killing PF_BORROWED_MM?
http://marc.info/?l=linux-kernel&m=120825843403378
Oleg.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 16:34 ` Paul Menage
@ 2008-04-17 16:19 ` Oleg Nesterov
2008-04-17 17:40 ` Balbir Singh
2008-04-17 17:49 ` Paul Menage
0 siblings, 2 replies; 18+ messages in thread
From: Oleg Nesterov @ 2008-04-17 16:19 UTC (permalink / raw)
To: Paul Menage; +Cc: Balbir Singh, Andrew Morton, serue, penberg, linux-kernel
On 04/17, Paul Menage wrote:
>
> On Thu, Apr 17, 2008 at 4:30 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
> > >
> > > I had this loop earlier (inspired by zap_threads()). Is this loop more
> > > efficient than what we have?
> >
> > All sub-threads have the same ->mm. Once we see that c->mm != mm, we don't
> > need to waste CPU iterating over all the other threads in the thread group.
>
> Technically they don't have to have the same mm, right? You can use
> CLONE_THREAD without CLONE_VM when creating a new subthread.
No, no, this is not possible/allowed.
Please look at copy_process, CLONE_THREAD requires CLONE_SIGHAND,
CLONE_SIGHAND needs CLONE_VM.
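Those two implications can be condensed into a small check. The flag values match <linux/sched.h> of that era; check_clone_flags() is only a stand-in for the sanity checks at the top of copy_process():

```c
#include <assert.h>

#define CLONE_VM      0x00000100
#define CLONE_SIGHAND 0x00000800
#define CLONE_THREAD  0x00010000
#define TOY_EINVAL    22

/* CLONE_THREAD requires CLONE_SIGHAND, and CLONE_SIGHAND requires
 * CLONE_VM, so a CLONE_THREAD sub-thread always shares the mm. */
static int check_clone_flags(unsigned long flags)
{
    if ((flags & CLONE_THREAD) && !(flags & CLONE_SIGHAND))
        return -TOY_EINVAL;
    if ((flags & CLONE_SIGHAND) && !(flags & CLONE_VM))
        return -TOY_EINVAL;
    return 0;
}
```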
> > chance you have the "for dummies" explanation of what mm->owner is?
> > I mean, I can't understand how it is possible that 2 CLONE_VM tasks
> > are not equal wrt "ownering".
>
> The idea is to be able to get from an mm to a task, where that task is
> representative of the tasks using the mm. Uses include:
Thanks! but...
Let's suppose 2 tasks belong to different cgroups but share ->mm,
> - swap cgroup - when swapping from an mm, find a task whose swap
> cgroup we can charge the swap page to
now we will charge a somewhat "random" cgroup: the one to which
the result of the last "find the next suitable owner" search belongs.
This looks a bit strange to me... but OK, as I said, I don't know
what cgroup is, please ignore me ;)
Oleg.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 11:30 ` Oleg Nesterov
@ 2008-04-17 16:34 ` Paul Menage
2008-04-17 16:19 ` Oleg Nesterov
0 siblings, 1 reply; 18+ messages in thread
From: Paul Menage @ 2008-04-17 16:34 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Balbir Singh, Andrew Morton, serue, penberg, linux-kernel
On Thu, Apr 17, 2008 at 4:30 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> > I had this loop earlier (inspired by zap_threads()). Is this loop more
> > efficient than what we have?
>
> All sub-threads have the same ->mm. Once we see that c->mm != mm, we don't
> need to waste CPU iterating over all the other threads in the thread group.
Technically they don't have to have the same mm, right? You can use
CLONE_THREAD without CLONE_VM when creating a new subthread.
So the complete loop is required for correctness - but it might make
sense to include your version of the loop first, since that will be
faster whenever there are heavily-threaded apps on the system, and
will give the right answer 99.9% of the time (i.e. except when the
user is doing something really weird with clone flags).
> chance you have the "for dummies" explanation of what mm->owner is?
> I mean, I can't understand how it is possible that 2 CLONE_VM tasks
> are not equal wrt "ownering".
The idea is to be able to get from an mm to a task, where that task is
representative of the tasks using the mm. Uses include:
- virtual address space cgroup - when we extend an mm, we don't always
have a task pointer available currently.
- swap cgroup - when swapping from an mm, find a task whose swap
cgroup we can charge the swap page to
- revokeat() - apparently needs this for locating tasks based on file mappings
> When the old owner dies, we choose a
> random thread with the same mm. But we do nothing when the last user
> of ->mm dies. What is the point? (please feel free to ignore my q
When the last user of the mm dies, the mm is freed, so there's no need
for mm->owner to be valid any longer.
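A toy refcount model of that lifetime rule (the names echo mm_users and mmput(), but everything here is userspace):

```c
#include <assert.h>

/* mm->owner only has to stay valid while mm_users > 0; the final
 * put frees the mm, owner field and all. */
struct toy_mm { int mm_users; int freed; int owner_valid; };

static void toy_mmget(struct toy_mm *mm) { mm->mm_users++; }

static void toy_mmput(struct toy_mm *mm)
{
    if (--mm->mm_users == 0) {
        mm->owner_valid = 0;  /* nobody can look at mm->owner now */
        mm->freed = 1;        /* the point where the real mm is freed */
    }
}
```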
Paul
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 17:40 ` Balbir Singh
@ 2008-04-17 17:08 ` Oleg Nesterov
2008-04-17 17:50 ` Paul Menage
1 sibling, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2008-04-17 17:08 UTC (permalink / raw)
To: Balbir Singh; +Cc: Paul Menage, Andrew Morton, serue, penberg, linux-kernel
On 04/17, Balbir Singh wrote:
>
> Oleg Nesterov wrote:
> >> CLONE_THREAD without CLONE_VM when creating a new subthread.
> >
> > No, no, this is not possible/allowed.
> >
> > Please look at copy_process, CLONE_THREAD requires CLONE_SIGHAND,
> > CLONE_SIGHAND needs CLONE_VM.
> >
>
> What about the other way round, CLONE_VM without CLONE_THREAD?
Yes, this is possible, and this is why mm_update_next_owner() has to be so
complicated. The same for zap_threads/elf_core_dump.
If only I knew why we allow CLONE_VM! Except for CLONE_THREAD/CLONE_VFORK,
of course, but those are trivial.
And thanks to you and Paul for your explanations.
Oleg.
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 16:19 ` Oleg Nesterov
@ 2008-04-17 17:40 ` Balbir Singh
2008-04-17 17:08 ` Oleg Nesterov
2008-04-17 17:50 ` Paul Menage
2008-04-17 17:49 ` Paul Menage
1 sibling, 2 replies; 18+ messages in thread
From: Balbir Singh @ 2008-04-17 17:40 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Paul Menage, Andrew Morton, serue, penberg, linux-kernel
Hi, Oleg,
I've been slow in responding and I expect that situation to continue till the
weekend. I hope to respond to your queries soon.
Oleg Nesterov wrote:
> On 04/17, Paul Menage wrote:
>> On Thu, Apr 17, 2008 at 4:30 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
>>> >
>>> > I had this loop earlier (inspired by zap_threads()). Is this loop more
>>> > efficient than what we have?
>>>
>>> All sub-threads have the same ->mm. Once we see that c->mm != mm, we don't
>>> need to waste CPU iterating over all the other threads in the thread group.
>> Technically they don't have to have the same mm, right? You can use
>> CLONE_THREAD without CLONE_VM when creating a new subthread.
>
> No, no, this is not possible/allowed.
>
> Please look at copy_process, CLONE_THREAD requires CLONE_SIGHAND,
> CLONE_SIGHAND needs CLONE_VM.
>
What about the other way round, CLONE_VM without CLONE_THREAD?
>>> chance you have the "for dummies" explanation of what mm->owner is?
>>> I mean, I can't understand how it is possible that 2 CLONE_VM tasks
>>> are not equal wrt "ownering".
>> The idea is to be able to get from an mm to a task, where that task is
>> representative of the tasks using the mm. Uses include:
>
> Thanks! but...
>
> Let's suppose 2 tasks belong to different cgroups but share ->mm,
>
>> - swap cgroup - when swapping from an mm, find a task whose swap
>> cgroup we can charge the swap page to
>
> now we will charge a somewhat "random" cgroup: the one to which
> the result of the last "find the next suitable owner" search belongs.
>
Basically, with mm->owner (and prior to that with the memory controller), we group
tasks by mm_struct instead of by task (virtually). Earlier, we had a hook in
mm_struct to tell us which cgroup the mm_struct belonged to. With
mm->owner, each subsystem can figure out where the mm belongs without having hooks
in mm_struct for each memory-based controller subsystem.
When the owner (mm->owner) exits, we pick the next owner and send out a
notification that the cgroup has changed (if it did). This does bring up some
issues: accounting can suddenly move to a different cgroup without the user
having any control over it. To avoid that, we would recommend that the
memory-based controllers be mounted separately and that all related threads be
put in the same cgroup.
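The difference can be sketched with two toy layouts (all names are illustrative, not the kernel's actual definitions):

```c
#include <assert.h>

struct toy_cgroup { int id; };

/* Before: each memory-based controller needed its own hook in
 * mm_struct. */
struct toy_mm_before {
    struct toy_cgroup *mem_cgroup;
    /* struct toy_cgroup *swap_cgroup;  ...one more per subsystem */
};

/* After: one owner task, through which ANY subsystem reaches its
 * own per-task cgroup state, with no per-subsystem hook in the mm. */
struct toy_task     { struct toy_cgroup *cgroups[2]; };
struct toy_mm_after { struct toy_task *owner; };

static struct toy_cgroup *toy_mm_cgroup(struct toy_mm_after *mm, int ss)
{
    return mm->owner->cgroups[ss];
}
```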
> This looks a bit strange to me... but OK, as I said, I don't know
> what cgroup is, please ignore me ;)
>
> Oleg.
>
Thanks for your review!
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 16:19 ` Oleg Nesterov
2008-04-17 17:40 ` Balbir Singh
@ 2008-04-17 17:49 ` Paul Menage
1 sibling, 0 replies; 18+ messages in thread
From: Paul Menage @ 2008-04-17 17:49 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Balbir Singh, Andrew Morton, serue, penberg, linux-kernel
On Thu, Apr 17, 2008 at 9:19 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
> On 04/17, Paul Menage wrote:
> >
> > On Thu, Apr 17, 2008 at 4:30 AM, Oleg Nesterov <oleg@tv-sign.ru> wrote:
> > > >
> > > > I had this loop earlier (inspired by zap_threads()). Is this loop more
> > > > efficient than what we have?
> > >
> > > All sub-threads have the same ->mm. Once we see that c->mm != mm, we don't
> > > need to waste CPU iterating over all the other threads in the thread group.
> >
> > Technically they don't have to have the same mm, right? You can use
> > CLONE_THREAD without CLONE_VM when creating a new subthread.
>
> No, no, this is not possible/allowed.
>
> Please look at copy_process, CLONE_THREAD requires CLONE_SIGHAND,
> CLONE_SIGHAND needs CLONE_VM.
Ah, OK. I saw that CLONE_THREAD -> CLONE_SIGHAND but I didn't notice
the second implication. That would mean that your optimization is
valid, good.
>
> Let's suppose 2 tasks belong to different cgroups but share ->mm,
>
>
> > - swap cgroup - when swapping from an mm, find a task whose swap
> > cgroup we can charge the swap page to
>
> now we will charge a somewhat "random" cgroup: the one to which
> the result of the last "find the next suitable owner" search belongs.
Correct. (Although if two tasks are sharing an mm, charging either of
them seems perfectly fair, as long as we charge exactly one of them
and remember who we charged).
>
> This looks a bit strange to me... but OK, as I said, I don't know
> what cgroup is, please ignore me ;)
A control group is just a way of grouping tasks together for the
purposes of resource control, isolation, etc.
Yes, weird things can potentially happen in this case - if the user
sets up situations like this, hopefully they know what they're doing.
The important thing is that these weird setups not crash.
Paul
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 17:40 ` Balbir Singh
2008-04-17 17:08 ` Oleg Nesterov
@ 2008-04-17 17:50 ` Paul Menage
2008-04-17 19:07 ` Balbir Singh
1 sibling, 1 reply; 18+ messages in thread
From: Paul Menage @ 2008-04-17 17:50 UTC (permalink / raw)
To: balbir; +Cc: Oleg Nesterov, Andrew Morton, serue, penberg, linux-kernel
On Thu, Apr 17, 2008 at 10:40 AM, Balbir Singh
<balbir@linux.vnet.ibm.com> wrote:
> >
> > Please look at copy_process, CLONE_THREAD requires CLONE_SIGHAND,
> > CLONE_SIGHAND needs CLONE_VM.
> >
>
> What about the other way round, CLONE_VM without CLONE_THREAD?
>
That's just a standard LinuxThreads clone() call. The patch handles
that just fine.
Paul
* Re: [Fwd: [-mm] Add an owner to the mm_struct (v9)]
2008-04-17 17:50 ` Paul Menage
@ 2008-04-17 19:07 ` Balbir Singh
0 siblings, 0 replies; 18+ messages in thread
From: Balbir Singh @ 2008-04-17 19:07 UTC (permalink / raw)
To: Paul Menage; +Cc: Oleg Nesterov, Andrew Morton, serue, penberg, linux-kernel
Paul Menage wrote:
> On Thu, Apr 17, 2008 at 10:40 AM, Balbir Singh
> <balbir@linux.vnet.ibm.com> wrote:
>> >
>> > Please look at copy_process, CLONE_THREAD requires CLONE_SIGHAND,
>> > CLONE_SIGHAND needs CLONE_VM.
>> >
>>
>> What about the other way round, CLONE_VM without CLONE_THREAD?
>>
>
> That's just a standard LinuxThreads clone() call. The patch handles
> that just fine.
>
> Paul
Paul, it does. I was just explaining to Oleg why we need the loop.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
end of thread, other threads:[~2008-04-17 19:10 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-14 14:13 [Fwd: [-mm] Add an owner to the mm_struct (v9)] Balbir Singh
2008-04-14 14:49 ` Pekka Enberg
2008-04-14 16:04 ` Paul Menage
2008-04-14 19:36 ` Andrew Morton
2008-04-14 21:06 ` Pekka Enberg
2008-04-15 0:07 ` Andrew Morton
2008-04-15 17:13 ` Oleg Nesterov
2008-04-15 18:13 ` Paul Menage
2008-04-15 19:59 ` Oleg Nesterov
2008-04-17 3:38 ` Balbir Singh
2008-04-17 11:30 ` Oleg Nesterov
2008-04-17 16:34 ` Paul Menage
2008-04-17 16:19 ` Oleg Nesterov
2008-04-17 17:40 ` Balbir Singh
2008-04-17 17:08 ` Oleg Nesterov
2008-04-17 17:50 ` Paul Menage
2008-04-17 19:07 ` Balbir Singh
2008-04-17 17:49 ` Paul Menage