From mboxrd@z Thu Jan 1 00:00:00 1970
From: Balbir Singh
Subject: Re: [-mm] Add an owner to the mm_struct (v7)
Date: Fri, 04 Apr 2008 00:21:33 +0530
Message-ID: <47F52735.7090502@linux.vnet.ibm.com>
References: <20080403174433.26356.42121.sendpatchset@localhost.localdomain> <1207247113.21922.63.camel@nimitz.home.sr71.net>
Reply-To: balbir@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1207247113.21922.63.camel@nimitz.home.sr71.net>
Sender: linux-kernel-owner@vger.kernel.org
To: Dave Hansen
Cc: Paul Menage, Pavel Emelianov, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, taka@valinux.co.jp, linux-mm@kvack.org, David Rientjes, Andrew Morton, KAMEZAWA Hiroyuki
List-Id: linux-mm.kvack.org

Dave Hansen wrote:
> On Thu, 2008-04-03 at 23:14 +0530, Balbir Singh wrote:
>> diff -puN include/linux/init_task.h~memory-controller-add-mm-owner include/linux/init_task.h
>> --- linux-2.6.25-rc8/include/linux/init_task.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
>> +++ linux-2.6.25-rc8-balbir/include/linux/init_task.h 2008-04-03 22:43:27.000000000 +0530
>> @@ -199,7 +199,6 @@ extern struct group_info init_groups;
>> 	INIT_LOCKDEP \
>> }
>>
>> -
>> #define INIT_CPU_TIMERS(cpu_timers) \
>> { \
>> 	LIST_HEAD_INIT(cpu_timers[0]), \
>
> I assume you didn't mean to do that one.
>

No, I made some changes to that file and undoing them removed an extra blank line, which showed up in the diff. Thanks for spotting it.

>> diff -puN include/linux/memcontrol.h~memory-controller-add-mm-owner include/linux/memcontrol.h
>> --- linux-2.6.25-rc8/include/linux/memcontrol.h~memory-controller-add-mm-owner 2008-04-03 22:43:27.000000000 +0530
>> +++ linux-2.6.25-rc8-balbir/include/linux/memcontrol.h 2008-04-03 22:43:27.000000000 +0530
>> @@ -27,9 +27,6 @@ struct mm_struct;
>>
>> #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>>
>> -extern void mm_init_cgroup(struct mm_struct *mm, struct task_struct *p);
>> -extern void mm_free_cgroup(struct mm_struct *mm);
>> -
>> #define page_reset_bad_cgroup(page) ((page)->page_cgroup = 0)
>>
>> extern struct page_cgroup *page_get_page_cgroup(struct page *page);
>> @@ -48,8 +45,10 @@ extern unsigned long mem_cgroup_isolate_
>> extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
>> int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
>>
>> +extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
>> +
>> #define mm_match_cgroup(mm, cgroup) \
>> -	((cgroup) == rcu_dereference((mm)->mem_cgroup))
>> +	((cgroup) == mem_cgroup_from_task((mm)->owner))
>
> Now that you've mucked with this one, can you just turn this into a
> static inline?
>

It was a static inline, but there was a build error on sparc. David Rientjes made this change to make it compile. This might not be the right patch for the change you are suggesting, but it is definitely worth doing (a rough sketch of the inline version is at the end of this mail).

> ...
>> +#ifdef CONFIG_MM_OWNER
>> +/*
>> + * Task p is exiting and it owned p, so lets find a new owner for it
>> + */
>> +static inline int
>> +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
>> +{
>> +	int ret;
>> +
>> +	/*
>> +	 * If there are other users of the mm and the owner (us) is exiting
>> +	 * we need to find a new owner to take on the responsibility.
>> +	 * When we use thread groups (CLONE_THREAD), the thread group
>> +	 * leader is kept around in zombie state, even after it exits.
>> +	 * delay_group_leader() ensures that if the group leader is around
>> +	 * we need not select a new owner.
>> +	 */
>> +	ret = (mm && (atomic_read(&mm->mm_users) > 1) && (mm->owner == p) &&
>> +	       !delay_group_leader(p));
>> +	return ret;
>> +}
>
> Ugh. Could you please spell this out a bit more. I find that stuff
> above really hard to read. Something like:
>
> 	if (!mm)
> 		return 0;
> 	if (atomic_read(&mm->mm_users) <= 1)
> 		return 0;
> 	if (mm->owner != p)
> 		return 0;
> 	if (delay_group_leader(p))
> 		return 0;
> 	return 1;
>

The problem with the code above is that it is four branch instructions, whereas the code I have is just four AND operations. I don't think &&'s are that hard to read; if there were a mixture of operators (&& and ||) it could get a little harder to follow.

> It also gives you a nice spot to stick comments for each particular
> check.
>
>> +void mm_update_next_owner(struct mm_struct *mm)
>> +{
>> +	struct task_struct *c, *g, *p = current;
>
> Any chance I can talk you into spelling these out a bit? By the time I
> get down in the function, it's easy to forget what they are.
>

That can be done, will do.

>> +retry:
>> +	if (!mm_need_new_owner(mm, p))
>> +		return;
>> +
>> +	rcu_read_lock();
>> +	/*
>> +	 * Search in the children
>> +	 */
>> +	list_for_each_entry(c, &p->children, sibling) {
>> +		if (c->mm == mm)
>> +			goto assign_new_owner;
>> +	}
>> +
>> +	/*
>> +	 * Search in the siblings
>> +	 */
>> +	list_for_each_entry(c, &p->parent->children, sibling) {
>> +		if (c->mm == mm)
>> +			goto assign_new_owner;
>> +	}
>> +
>> +	/*
>> +	 * Search through everything else. We should not get
>> +	 * here often
>> +	 */
>> +	do_each_thread(g, c) {
>> +		if (c->mm == mm)
>> +			goto assign_new_owner;
>> +	} while_each_thread(g, c);
>
> What is the case in which we get here? Threading that's two deep where
> none of the immediate siblings or children is still alive?
>

This usually happens when threads were created without CLONE_THREAD but still share the mm (clone() with CLONE_VM). In that case we need to scan all processes for a shared mm, because the children and sibling scans have not been successful. (There is a small, contrived example of such a setup at the end of this mail.)

> Have you happened to instrument this and see if it happens in practice
> much?
>

Yes, I have. I removed the !delay_group_leader() check, registered the cgroup mm_owner_changed callback, and saw mm->owner change.

>> +	rcu_read_unlock();
>> +	return;
>> +
>> +assign_new_owner:
>> +	BUG_ON(c == p);
>> +	get_task_struct(c);
>> +	/*
>> +	 * The task_lock protects c->mm from changing.
>> +	 * We always want mm->owner->mm == mm
>> +	 */
>> +	task_lock(c);
>> +	/*
>> +	 * Delay rcu_read_unlock() till we have the task_lock()
>> +	 * to ensure that c does not slip away underneath us
>> +	 */
>> +	rcu_read_unlock();
>> +	if (c->mm != mm) {
>> +		task_unlock(c);
>> +		put_task_struct(c);
>> +		goto retry;
>> +	}
>> +	cgroup_mm_owner_callbacks(mm->owner, c);
>> +	mm->owner = c;
>> +	task_unlock(c);
>> +	put_task_struct(c);
>> +}
>> +#endif /* CONFIG_MM_OWNER */

Thanks for the review,

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
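
P.S. A few untested sketches to make some of the points above concrete. None of this is the actual patch; it is just the shape of what I mean.

First, the static inline version of mm_match_cgroup() you are asking for. The parameter const-ness and the header ordering problem that broke the sparc build would still need to be sorted out, so this is a sketch only:

	/* Sketch only: inline replacement for the mm_match_cgroup() macro. */
	static inline int mm_match_cgroup(struct mm_struct *mm,
					  struct mem_cgroup *cgroup)
	{
		/* mem_cgroup_from_task() is the helper exported by this patch. */
		return cgroup == mem_cgroup_from_task(mm->owner);
	}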
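
Second, the kind of task setup that makes the do_each_thread() fallback necessary. The program below is a contrived userspace illustration, not something from the patch or from my tests; the function names and stack size are made up. clone() with CLONE_VM but without CLONE_THREAD gives a separate process that shares the mm, and once the intermediate child exits and its own child is re-parented to init, the remaining user of the mm is neither a child nor a sibling of the mm owner, so only the global scan can find it:

	/* Contrived illustration: leave a CLONE_VM process around that is
	 * neither a child nor a sibling of the task owning the mm. */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define STACK_SZ (64 * 1024)

	static int grandchild_fn(void *arg)
	{
		/* Shares the original mm; after its parent exits below it is
		 * re-parented to init. */
		sleep(60);
		return 0;
	}

	static int child_fn(void *arg)
	{
		char *stack = malloc(STACK_SZ);

		/* CLONE_VM without CLONE_THREAD: separate process, same mm. */
		clone(grandchild_fn, stack + STACK_SZ, CLONE_VM | SIGCHLD, NULL);
		return 0;	/* exit right away */
	}

	int main(void)
	{
		char *stack = malloc(STACK_SZ);

		clone(child_fn, stack + STACK_SZ, CLONE_VM | SIGCHLD, NULL);
		sleep(1);	/* give the intermediate child time to exit */
		/*
		 * This task created the mm, so mm->owner points at it.  When
		 * it exits now, mm_users is still > 1 because of the
		 * grandchild, but the grandchild is neither in ->children nor
		 * in ->parent->children, so only the do_each_thread() pass in
		 * mm_update_next_owner() will find it.
		 */
		return 0;
	}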
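
Finally, for context on when mm_update_next_owner() runs: the intent is to call it from the task exit path, after the exiting task has dropped mm from its task_struct and just before it drops its reference on the mm. The placement below is approximate and from memory, not the exact hunk from the series:

	/* Sketch of the intended call site in exit_mm(), kernel/exit.c. */
	task_lock(tsk);
	tsk->mm = NULL;
	/* ... */
	task_unlock(tsk);
	mm_update_next_owner(mm);
	mmput(mm);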