From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lord Glauber Costa of Sealand Subject: Re: [PATCH v4 0/6] replace cgroup_lock with memcg specific locking Date: Fri, 25 Jan 2013 14:05:04 +0400 Message-ID: <510258D0.6060407@parallels.com> References: <1358862461-18046-1-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1358862461-18046-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Tejun Heo , Michal Hocko , Johannes Weiner , kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org On 01/22/2013 05:47 PM, Glauber Costa wrote: > Hi, > > In memcg, we use the cgroup_lock basically to synchronize against > attaching new children to a cgroup. We do this because we rely on cgroup core to > provide us with this information. > > We need to guarantee that upon child creation, our tunables are consistent. > For those, the calls to cgroup_lock() all live in handlers like > mem_cgroup_hierarchy_write(), where we change a tunable in the group that is > hierarchy-related. For instance, the use_hierarchy flag cannot be changed if > the cgroup already have children. > > Furthermore, those values are propageted from the parent to the child when a > new child is created. So if we don't lock like this, we can end up with the > following situation: > > A B > memcg_css_alloc() mem_cgroup_hierarchy_write() > copy use hierarchy from parent change use hierarchy in parent > finish creation. > > This is mainly because during create, we are still not fully connected to the > css tree. So all iterators and the such that we could use, will fail to show > that the group has children. > > My observation is that all of creation can proceed in parallel with those > tasks, except value assignment. So what this patchseries does is to first move > all value assignment that is dependent on parent values from css_alloc to > css_online, where the iterators all work, and then we lock only the value > assignment. This will guarantee that parent and children always have consistent > values. Together with an online test, that can be derived from the observation > that the refcount of an online memcg can be made to be always positive, we > should be able to synchronize our side without the cgroup lock. > > *v4: > - revert back to using the set_limit_mutex for kmemcg limit setting. > > *v3: > - simplified test for presence of children, and no longer using refcnt for > online testing > - some cleanups as suggested by Michal > > *v2: > - sanitize kmemcg assignment in the light of the current locking change. > - don't grab locks on immigrate charges by caching the value during can_attach > > Glauber Costa (6): > memcg: prevent changes to move_charge_at_immigrate during task attach > memcg: split part of memcg creation to css_online > memcg: fast hierarchy-aware child test. > memcg: replace cgroup_lock with memcg specific memcg_lock > memcg: increment static branch right after limit set. > memcg: avoid dangling reference count in creation failure. > Tejun, This applies ontop of your cpuset patches. Would you pick this (would be my choice), or would you rather have it routed through somewhere mmish ?