From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
Date: Tue, 26 Nov 2019 18:41:41 +0000
Message-ID: <20191126184135.GA66034@localhost.localdomain>
In-Reply-To: <20191126092918.GB20912@dhcp22.suse.cz>

On Tue, Nov 26, 2019 at 10:29:18AM +0100, Michal Hocko wrote:
> On Mon 25-11-19 10:54:53, Roman Gushchin wrote:
> > Christian reported a warning like the following obtained during running some
> > KVM-related tests on s390:
> > 
> > WARNING: CPU: 8 PID: 208 at lib/percpu-refcount.c:108 percpu_ref_exit+0x50/0x58
> > Modules linked in: kvm(-) xt_CHECKSUM xt_MASQUERADE bonding xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_na>
> > CPU: 8 PID: 208 Comm: kworker/8:1 Not tainted 5.2.0+ #66
> > Hardware name: IBM 2964 NC9 712 (LPAR)
> > Workqueue: events sysfs_slab_remove_workfn
> > Krnl PSW : 0704e00180000000 0000001529746850 (percpu_ref_exit+0x50/0x58)
> >            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> > Krnl GPRS: 00000000ffff8808 0000001529746740 000003f4e30e8e18 0036008100000000
> >            0000001f00000000 0035008100000000 0000001fb3573ab8 0000000000000000
> >            0000001fbdb6de00 0000000000000000 0000001529f01328 0000001fb3573b00
> >            0000001fbb27e000 0000001fbdb69300 000003e009263d00 000003e009263cd0
> > Krnl Code: 0000001529746842: f0a0000407fe        srp        4(11,%r0),2046,0
> >            0000001529746848: 47000700            bc         0,1792
> >           #000000152974684c: a7f40001            brc        15,152974684e
> >           >0000001529746850: a7f4fff2            brc        15,1529746834
> >            0000001529746854: 0707                bcr        0,%r7
> >            0000001529746856: 0707                bcr        0,%r7
> >            0000001529746858: eb8ff0580024        stmg       %r8,%r15,88(%r15)
> >            000000152974685e: a738ffff            lhi        %r3,-1
> > Call Trace:
> > ([<000003e009263d00>] 0x3e009263d00)
> >  [<00000015293252ea>] slab_kmem_cache_release+0x3a/0x70
> >  [<0000001529b04882>] kobject_put+0xaa/0xe8
> >  [<000000152918cf28>] process_one_work+0x1e8/0x428
> >  [<000000152918d1b0>] worker_thread+0x48/0x460
> >  [<00000015291942c6>] kthread+0x126/0x160
> >  [<0000001529b22344>] ret_from_fork+0x28/0x30
> >  [<0000001529b2234c>] kernel_thread_starter+0x0/0x10
> > Last Breaking-Event-Address:
> >  [<000000152974684c>] percpu_ref_exit+0x4c/0x58
> > ---[ end trace b035e7da5788eb09 ]---
> > 
> > The problem occurs because kmem_cache_destroy() is called immediately
> > after a memcg is deleted, so it races with the deactivation of the
> > memcg's kmem_caches.
> > 
> > flush_memcg_workqueue() at the beginning of kmem_cache_destroy()
> > is supposed to guarantee that all deactivation processes are finished,
> > but it fails to do so. It waits for an RCU grace period, after which all
> > child kmem_caches should be deactivated. During the deactivation,
> > percpu_ref_kill() is called for the refcounters of non-root kmem_caches,
> > but it requires yet another RCU grace period to finish the transition
> > to the atomic (dead) state.
> > 
> > So in the rare case when not all child kmem_caches have been destroyed
> > by the time the root kmem_cache is about to go away, we need to wait
> > for another RCU grace period before destroying the root kmem_cache.
> 
> Could you explain how rare this really is please?

It seems that we don't destroy root kmem_caches with memcg accounting
enabled that often, but maybe I'm biased here.

> I still have to wrap
> my head around the overall logic here. It looks quite fragile to me TBH.
> I am worried that it relies on implementation details of the PCP ref
> counters too much.

It is definitely very complicated and fragile, but I hope it won't remain
in this state for long. The new slab controller, which I'm working on,
eliminates all this logic altogether and generally simplifies things a lot,
simply because there will be no need to create and destroy per-memcg
kmem_caches.

Thanks!



Thread overview: 9+ messages
2019-11-25 18:54 [PATCH] mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction Roman Gushchin
2019-11-25 19:20 ` Shakeel Butt
2019-11-26  9:29 ` Michal Hocko
2019-11-26  9:33   ` Christian Borntraeger
2019-11-26 18:41   ` Roman Gushchin [this message]
2019-11-27 12:32     ` Michal Hocko
2019-11-27 17:27       ` Roman Gushchin
2019-11-28  9:43         ` Michal Hocko
2019-11-29  2:28           ` Roman Gushchin
