Re: [PATCH v3 2/7] socket: initial cgroup code.

From: Glauber Costa <glommer@parallels.com>
To: Greg Thelen <gthelen@google.com>
Cc: <linux-kernel@vger.kernel.org>, <paul@paulmenage.org>,
	<lizf@cn.fujitsu.com>, <kamezawa.hiroyu@jp.fujitsu.com>,
	<ebiederm@xmission.com>, <davem@davemloft.net>,
	<netdev@vger.kernel.org>, <linux-mm@kvack.org>,
	<kirill@shutemov.name>
Subject: Re: [PATCH v3 2/7] socket: initial cgroup code.
Date: Wed, 21 Sep 2011 15:59:55 -0300
Message-ID: <4E7A342B.5040608@parallels.com>
In-Reply-To: <CAHH2K0YgkG2J_bO+U9zbZYhTTqSLvr6NtxKxN8dRtfHs=iB8iA@mail.gmail.com>

On 09/21/2011 03:47 PM, Greg Thelen wrote:
> On Sun, Sep 18, 2011 at 5:56 PM, Glauber Costa<glommer@parallels.com>  wrote:
>> We aim to control the amount of kernel memory pinned at any
>> time by tcp sockets. To lay the foundations for this work,
>> this patch adds a pointer to the kmem_cgroup to the socket
>> structure.
>>
>> Signed-off-by: Glauber Costa<glommer@parallels.com>
>> CC: David S. Miller<davem@davemloft.net>
>> CC: Hiroyouki Kamezawa<kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Eric W. Biederman<ebiederm@xmission.com>
> ...
>> +void sock_update_memcg(struct sock *sk)
>> +{
>> +       /* right now a socket spends its whole life in the same cgroup */
>> +       BUG_ON(sk->sk_cgrp);
>> +
>> +       rcu_read_lock();
>> +       sk->sk_cgrp = mem_cgroup_from_task(current);
>> +
>> +       /*
>> +        * We don't need to protect against anything task-related, because
>> +        * we are basically stuck with the sock pointer that won't change,
>> +        * even if the task that originated the socket changes cgroups.
>> +        *
>> +        * What we do have to guarantee, is that the chain leading us to
>> +        * the top level won't change under our noses. Incrementing the
>> +        * reference count via cgroup_exclude_rmdir guarantees that.
>> +        */
>> +       cgroup_exclude_rmdir(mem_cgroup_css(sk->sk_cgrp));
>
> This grabs a css_get() reference, which prevents rmdir (will return
> -EBUSY).
Yes.

> How long is this reference held?
For the socket lifetime.

> I wonder about the case
> where a process creates a socket in memcg M1 and later is moved into
> memcg M2.  At that point an admin would expect to be able to 'rmdir
> M1'.  I think this rmdir would return -EBUSY and I suspect it would be
> difficult for the admin to understand why the rmdir of M1 failed.  It
> seems that to rmdir a memcg, an admin would have to kill all processes
> that allocated sockets while in M1.  Such processes may not still be
> in M1.
>
>> +       rcu_read_unlock();
>> +}
I agree. But I also don't see many ways around it without
implementing full task migration.

Right now I am working under the assumption that tasks are long-lived
inside the cgroup. Migration potentially introduces some nasty locking
problems in the mem_schedule path.

Also, unless I am missing something, the memcg already has the policy of
not carrying charges around, probably because of this very same complexity.

True, at least that policy won't give you EBUSY... But I think holding
the reference is at least a way to guarantee that the cgroup won't
disappear from under our noses in the middle of our allocations.
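
The M1/M2 scenario discussed above can be sketched as a toy userspace
model. This is only an illustration under stated assumptions: every
toy_* name below is hypothetical, the refcount is a plain int, and
none of it is the kernel's cgroup or memcg API. It shows a socket
pinning the cgroup it was created in, and rmdir of that cgroup
failing with -EBUSY after the task has moved, until the socket goes
away.

```c
#include <errno.h>
#include <stddef.h>

/* Toy userspace model; none of these types are the kernel's. */
struct toy_cgroup {
	int refcnt;   /* references held by live sockets */
	int removed;  /* set once rmdir succeeds */
};

struct toy_sock {
	struct toy_cgroup *cgrp;  /* fixed at creation, like sk->sk_cgrp */
};

struct toy_task {
	struct toy_cgroup *cgrp;  /* changes when the task migrates */
};

/* Models the idea of sock_update_memcg(): pin the creator's cgroup. */
static void toy_sock_create(struct toy_sock *sk, const struct toy_task *t)
{
	sk->cgrp = t->cgrp;
	sk->cgrp->refcnt++;
}

/* Destroying the socket is the only thing that releases the pin. */
static void toy_sock_destroy(struct toy_sock *sk)
{
	sk->cgrp->refcnt--;
	sk->cgrp = NULL;
}

/* Moving the task does not touch sockets it already created. */
static void toy_task_move(struct toy_task *t, struct toy_cgroup *to)
{
	t->cgrp = to;
}

/* rmdir fails with -EBUSY while any socket still pins the group. */
static int toy_rmdir(struct toy_cgroup *cg)
{
	if (cg->refcnt > 0)
		return -EBUSY;
	cg->removed = 1;
	return 0;
}

/* Scenario from the thread: create in M1, move to M2, rmdir M1. */
static int toy_scenario(int close_socket_first)
{
	struct toy_cgroup m1 = { 0, 0 }, m2 = { 0, 0 };
	struct toy_task task = { &m1 };
	struct toy_sock sk;

	toy_sock_create(&sk, &task);
	toy_task_move(&task, &m2);
	if (close_socket_first)
		toy_sock_destroy(&sk);
	return toy_rmdir(&m1);
}
```

In this model toy_scenario(0) reports -EBUSY even though no task is
left in M1, which is the admin-visible surprise raised in the review;
toy_scenario(1) succeeds because the socket was destroyed first.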
