From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>,
netdev@vger.kernel.org, cgroups@vger.kernel.org,
Li Zefan <lizefan@huawei.com>, David Miller <davem@davemloft.net>,
devel@openvz.org
Subject: Re: [PATCH v2 5/5] decrement static keys on real destroy time
Date: Wed, 25 Apr 2012 09:22:37 +0900 [thread overview]
Message-ID: <4F9743CD.9030209@jp.fujitsu.com> (raw)
In-Reply-To: <4F969176.8010804@parallels.com>
(2012/04/24 20:41), Glauber Costa wrote:
> On 04/23/2012 11:40 PM, KAMEZAWA Hiroyuki wrote:
>> (2012/04/24 4:37), Glauber Costa wrote:
>>
>>> We call the destroy function when a cgroup starts to be removed,
>>> such as by a rmdir event.
>>>
>>> However, because of our reference counters, some objects are still
>>> inflight. Right now, we are decrementing the static_keys at destroy()
>>> time, meaning that if we get rid of the last static_key reference,
>>> some objects will still have charges, but the code to properly
>>> uncharge them won't be run.
>>>
>>> This becomes a problem specially if it is ever enabled again, because
>>> now new charges will be added to the staled charges making keeping
>>> it pretty much impossible.
>>>
>>> We just need to be careful with the static branch activation:
>>> since there is no particular preferred order of their activation,
>>> we need to make sure that we only start using it after all
>>> call sites are active. This is achieved by having a per-memcg
>>> flag that is only updated after static_key_slow_inc() returns.
>>> At this time, we are sure all sites are active.
>>>
>>> This is made per-memcg, not global, for a reason:
>>> it also has the effect of making socket accounting more
>>> consistent. The first memcg to be limited will trigger static_key()
>>> activation, therefore, accounting. But all the others will then be
>>> accounted no matter what. After this patch, only limited memcgs
>>> will have its sockets accounted.
>>>
>>> [v2: changed a tcp limited flag for a generic proto limited flag ]
>>> [v3: update the current active flag only after the static_key update ]
>>>
>>> Signed-off-by: Glauber Costa<glommer@parallels.com>
>>
>>
>> Acked-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>>
>> A small request below.
>>
>> <snip>
>>
>>
>>> + * ->activated needs to be written after the static_key update.
>>> + * This is what guarantees that the socket activation function
>>> + * is the last one to run. See sock_update_memcg() for details,
>>> + * and note that we don't mark any socket as belonging to this
>>> + * memcg until that flag is up.
>>> + *
>>> + * We need to do this, because static_keys will span multiple
>>> + * sites, but we can't control their order. If we mark a socket
>>> + * as accounted, but the accounting functions are not patched in
>>> + * yet, we'll lose accounting.
>>> + *
>>> + * We never race with the readers in sock_update_memcg(), because
>>> + * when this value change, the code to process it is not patched in
>>> + * yet.
>>> + */
>>> + mutex_lock(&tcp_set_limit_mutex);
>>
>>
>> Could you explain for what this mutex is in above comment ?
>>
> This is explained at the site where the mutex is defined.
> If you still want me to mention it here, or maybe expand the explanation
> there, I surely can.
>
Ah, I think it's better to mention one more complicated race.
Let me explain.
Assume we don't have tcp_set_limit_mutex. And jump_label is not activated yet
i.e. memcg_socket_limit_enabled->count == 0.
When a user updates limit of 2 cgroups at once, following happens.
CPU A CPU B
if (cg_proto->activated) if (cg->proto_activated)
static_key_inc() static_key_inc()
=> set counter 0->1 => set counter 1->2, return immediately.
=> hold mutex => cg_proto->activated = true.
=> overwrite jmps.
Then, without mutex, activated/active may be set 'true' before the end
of jump_label modification.
Thanks,
-Kame
prev parent reply other threads:[~2012-04-25 0:22 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-23 19:37 [PATCH v2 0/5] Fix problem with static_key decrement Glauber Costa
2012-04-23 19:37 ` Glauber Costa
[not found] ` <1335209867-1831-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-23 19:37 ` [PATCH v2 1/5] don't attach a task to a dead cgroup Glauber Costa
2012-04-23 19:37 ` Glauber Costa
[not found] ` <1335209867-1831-2-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-24 2:20 ` KAMEZAWA Hiroyuki
2012-04-23 19:37 ` [PATCH v2 2/5] blkcg: protect blkcg->policy_list Glauber Costa
2012-04-23 19:37 ` Glauber Costa
2012-04-23 19:37 ` [PATCH v2 3/5] change number_of_cpusets to an atomic Glauber Costa
2012-04-23 19:37 ` Glauber Costa
[not found] ` <1335209867-1831-4-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-24 2:25 ` KAMEZAWA Hiroyuki
2012-04-24 15:02 ` Christoph Lameter
[not found] ` <alpine.DEB.2.00.1204241001050.26005-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2012-04-24 16:15 ` Glauber Costa
2012-04-24 16:15 ` Glauber Costa
[not found] ` <4F96D1A8.6040604-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-24 16:24 ` Christoph Lameter
[not found] ` <alpine.DEB.2.00.1204241120130.26005-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2012-04-24 16:30 ` Glauber Costa
2012-04-24 16:30 ` Glauber Costa
[not found] ` <4F96D50D.4020804-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-24 18:27 ` Christoph Lameter
2012-04-23 19:37 ` [PATCH v2 4/5] don't take cgroup_mutex in destroy() Glauber Costa
2012-04-23 19:37 ` Glauber Costa
[not found] ` <1335209867-1831-5-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-24 2:31 ` KAMEZAWA Hiroyuki
[not found] ` <4F96109A.8000907-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2012-04-24 11:42 ` Glauber Costa
2012-04-24 11:42 ` Glauber Costa
[not found] ` <4F9691A8.1070106-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-25 8:01 ` Li Zefan
2012-04-23 19:37 ` [PATCH v2 5/5] decrement static keys on real destroy time Glauber Costa
2012-04-23 19:37 ` Glauber Costa
[not found] ` <1335209867-1831-6-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-24 2:40 ` KAMEZAWA Hiroyuki
[not found] ` <4F9612B9.7050705-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2012-04-24 11:41 ` Glauber Costa
2012-04-24 11:41 ` Glauber Costa
2012-04-25 0:22 ` KAMEZAWA Hiroyuki [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F9743CD.9030209@jp.fujitsu.com \
--to=kamezawa.hiroyu@jp.fujitsu.com \
--cc=cgroups@vger.kernel.org \
--cc=davem@davemloft.net \
--cc=devel@openvz.org \
--cc=glommer@parallels.com \
--cc=lizefan@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.