Re: [-mm] Add an owner to the mm_struct (v8)


From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Paul Menage <menage@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>,
	Hugh Dickins <hugh@veritas.com>,
	Sudhir Kumar <skumar@linux.vnet.ibm.com>,
	YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
	lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org,
	taka@valinux.co.jp, linux-mm@kvack.org,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [-mm] Add an owner to the mm_struct (v8)
Date: Sat, 05 Apr 2008 20:17:30 +0530
Message-ID: <47F79102.6090406@linux.vnet.ibm.com>
In-Reply-To: <6599ad830804041211r37848a6coaa900d8bdac40fbe@mail.gmail.com>

Paul Menage wrote:
> On Fri, Apr 4, 2008 at 2:25 AM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>  >>  For other controllers,
>>  >>  they'll need to monitor exit() callbacks to know when the leader is dead :( (sigh).
>>  >
>>  > That sounds like a nightmare ...
>>  >
>>
>>  Yes, it would be, but worth the trouble. Is it really critical to move a dead
>>  cgroup leader to init_css_set in cgroup_exit()?
> 
> It struck me that this whole group leader optimization is broken as it
> stands since there could (in strange configurations) be multiple
> thread groups sharing the same mm.
> 
> I wonder if we can't just delay the exit_mm() call of a group leader
> until all its threads have exited?
> 

Not sure about this one. I suspect keeping the group_leader around is an
optimization, and I'm not sure how changing exit_mm() for the group_leader would
impact functionality or standards compliance. It might even break some applications.

Repeating my earlier question:

Can we delay setting task->cgroups = &init_css_set for the group_leader until
all threads have exited? If the user is unable to remove a cgroup node, it will
be for a valid reason: the group_leader is still around because its threads are
still around. In that case the user should wait for notify_on_release.

>>  > As long as we find someone to pass the mm to quickly, it shouldn't be
>>  > too bad - I think we're already optimized for that case. Generally the
>>  > group leader's first child will be the new owner, and any subsequent
>>  > times the owner exits, they're unlikely to have any children so
>>  > they'll go straight to the sibling check and pass the mm to the
>>  > parent's first child.
>>  >
>>  > Unless they all exit in strict sibling order and hence pass the mm
>>  > along the chain one by one, we should be fine. And if that exit
>>  > ordering does turn out to be common, then simply walking the child and
>>  > sibling lists in reverse order to find a victim will minimize the
>>  > amount of passing.
>>  >
>>
>>
>>  Finding the next mm might not be all that bad, but doing it each time a task
>>  exits can add overhead, especially for large multi-threaded programs.
> 
> Right, but we only have that overhead if we actually end up passing
> the mm from one to another each time they exit. It would be
> interesting to know what order the threads in a large multi-threaded
> process exit typically (when the main process exits and all the
> threads die).
> 
> I guess it's likely to be one of:
> 
> - in thread creation order (i.e. in order of parent->children list),
> in which case we should try to throw the mm to the parent's last child
> - in reverse creation order, in which case we should try to throw the
> mm to the parent's first child
> - in random order depending on which threads the scheduler runs first
> (in which case we can expect that a small fraction of the threads will
> have to throw the mm whichever end we start from)
> 
>>  This can
>>  get severe if the new mm->owner belongs to a different cgroup, in which case we
>>  need to use callbacks as well.
>>
>>  If half the threads belonged to a different cgroup and the new mm->owner kept
>>  switching between cgroups, the overhead would be really high, with the callbacks
>>  and the mm->owner changing frequently.
> 
> To me, it seems that setting up a *virtual address space* cgroup
> hierarchy and then putting half your threads in one group and half in
> another is asking for trouble. We need to not break in that
> situation, but I'm not sure it's a case to optimize for.

That could happen if the sysadmin mounted the virtual address space controller
and the CPU controller together in the same cgroup hierarchy.

I measured the overhead of removing the delay_group_leader optimization and
found a 4% impact on throughput with VolanoMark, one of the multi-threaded
benchmarks I know of.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL
