From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f49.google.com (mail-oi0-f49.google.com [209.85.218.49])
    by kanga.kvack.org (Postfix) with ESMTP id 6B8FA828DF
    for ; Fri, 22 Jan 2016 13:35:30 -0500 (EST)
Received: by mail-oi0-f49.google.com with SMTP id w75so52570470oie.0
    for ; Fri, 22 Jan 2016 10:35:30 -0800 (PST)
Received: from mail-ob0-x233.google.com (mail-ob0-x233.google.com. [2607:f8b0:4003:c01::233])
    by mx.google.com with ESMTPS id z2si6917697oek.73.2016.01.22.10.35.29
    for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
    Fri, 22 Jan 2016 10:35:29 -0800 (PST)
Received: by mail-ob0-x233.google.com with SMTP id ba1so70731741obb.3
    for ; Fri, 22 Jan 2016 10:35:29 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20160122163324.GH26192@esperanza>
References: <20160122135042.GF26192@esperanza>
    <20160122144854.GA14432@cmpxchg.org>
    <20160122155104.GG32380@htj.duckdns.org>
    <20160122163324.GH26192@esperanza>
Date: Fri, 22 Jan 2016 10:35:29 -0800
Message-ID:
Subject: Re: PROBLEM: BUG when using memory.kmem.limit_in_bytes
From: Brian Christiansen
Content-Type: multipart/alternative; boundary=089e0149d21ccb76300529f07bc1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Vladimir Davydov
Cc: Tejun Heo, Johannes Weiner, Michal Hocko, cgroups@vger.kernel.org, linux-mm@kvack.org

--089e0149d21ccb76300529f07bc1
Content-Type: text/plain; charset=UTF-8

On Fri, Jan 22, 2016 at 8:33 AM, Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Fri, Jan 22, 2016 at 10:51:04AM -0500, Tejun Heo wrote:
> > On Fri, Jan 22, 2016 at 09:48:54AM -0500, Johannes Weiner wrote:
> > > On Fri, Jan 22, 2016 at 04:50:42PM +0300, Vladimir Davydov wrote:
> > > > From first glance, it looks like the bug was triggered, because
> > > > mem_cgroup_css_offline was run for a child cgroup earlier than for its
> > > > parent. This couldn't happen for sure before the cgroup was switched to
> > > > percpu_ref, because cgroup_destroy_wq has always had max_active == 1.
> > > > Now, however, it looks like this is perfectly possible for
> > > > css_killed_ref_fn is called from an rcu callback - see kill_css ->
> > > > percpu_ref_kill_and_confirm. This breaks kmemcg assumptions.
> > > >
> > > > I'll take a look what can be done about that.
> > >
> > > It's an acknowledged problem in the cgroup core then, and not an issue
> > > with kmemcg. Tejun sent a fix to correct the offlining order here:
> > >
> > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1056544.html
> >
> > Patch descriptions updated and applied to cgroup/for-4.5-fixes.
> >
> >  http://lkml.kernel.org/g/20160122154503.GD32380@htj.duckdns.org
> >  http://lkml.kernel.org/g/20160122154552.GE32380@htj.duckdns.org
>
> I couldn't reproduce the issue with the two patches applied. Looks like
> they fix it.
>
> Thanks,
> Vladimir
>

Thanks for the quick turnaround! I'll test it when it gets into the mainline.
Do you know which versions the fixes will go into?

Thanks,
Brian

--089e0149d21ccb76300529f07bc1
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--089e0149d21ccb76300529f07bc1--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org