From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Kernel Team <Kernel-team@fb.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] mm: memcg/slab: wait for !root kmem_cache refcnt killing on root kmem_cache destruction
Date: Tue, 26 Nov 2019 18:41:41 +0000
Message-ID: <20191126184135.GA66034@localhost.localdomain>
In-Reply-To: <20191126092918.GB20912@dhcp22.suse.cz>

On Tue, Nov 26, 2019 at 10:29:18AM +0100, Michal Hocko wrote:
> On Mon 25-11-19 10:54:53, Roman Gushchin wrote:
> > Christian reported a warning like the following, obtained while running some
> > KVM-related tests on s390:
> >
> > WARNING: CPU: 8 PID: 208 at lib/percpu-refcount.c:108 percpu_ref_exit+0x50/0x58
> > Modules linked in: kvm(-) xt_CHECKSUM xt_MASQUERADE bonding xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_na>
> > CPU: 8 PID: 208 Comm: kworker/8:1 Not tainted 5.2.0+ #66
> > Hardware name: IBM 2964 NC9 712 (LPAR)
> > Workqueue: events sysfs_slab_remove_workfn
> > Krnl PSW : 0704e00180000000 0000001529746850 (percpu_ref_exit+0x50/0x58)
> > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> > Krnl GPRS: 00000000ffff8808 0000001529746740 000003f4e30e8e18 0036008100000000
> > 0000001f00000000 0035008100000000 0000001fb3573ab8 0000000000000000
> > 0000001fbdb6de00 0000000000000000 0000001529f01328 0000001fb3573b00
> > 0000001fbb27e000 0000001fbdb69300 000003e009263d00 000003e009263cd0
> > Krnl Code: 0000001529746842: f0a0000407fe srp 4(11,%r0),2046,0
> > 0000001529746848: 47000700 bc 0,1792
> > #000000152974684c: a7f40001 brc 15,152974684e
> > >0000001529746850: a7f4fff2 brc 15,1529746834
> > 0000001529746854: 0707 bcr 0,%r7
> > 0000001529746856: 0707 bcr 0,%r7
> > 0000001529746858: eb8ff0580024 stmg %r8,%r15,88(%r15)
> > 000000152974685e: a738ffff lhi %r3,-1
> > Call Trace:
> > ([<000003e009263d00>] 0x3e009263d00)
> > [<00000015293252ea>] slab_kmem_cache_release+0x3a/0x70
> > [<0000001529b04882>] kobject_put+0xaa/0xe8
> > [<000000152918cf28>] process_one_work+0x1e8/0x428
> > [<000000152918d1b0>] worker_thread+0x48/0x460
> > [<00000015291942c6>] kthread+0x126/0x160
> > [<0000001529b22344>] ret_from_fork+0x28/0x30
> > [<0000001529b2234c>] kernel_thread_starter+0x0/0x10
> > Last Breaking-Event-Address:
> > [<000000152974684c>] percpu_ref_exit+0x4c/0x58
> > ---[ end trace b035e7da5788eb09 ]---
> >
> > The problem occurs because kmem_cache_destroy() is called immediately
> > after a memcg is deleted, so it races with the memcg kmem_cache
> > deactivation.
> >
> > flush_memcg_workqueue() at the beginning of kmem_cache_destroy()
> > is supposed to guarantee that all deactivation processes are finished,
> > but failed to do so. It waits for an rcu grace period, after which all
> > children kmem_caches should be deactivated. During the deactivation
> > percpu_ref_kill() is called for non root kmem_cache refcounters,
> > but it requires yet another rcu grace period to finish the transition
> > to the atomic (dead) state.
> >
> > So in a rare case when not all children kmem_caches are destroyed
> > at the moment when the root kmem_cache is about to be gone, we need
> > to wait another rcu grace period before destroying the root
> > kmem_cache.
>
> Could you explain how rare this really is please?
It seems that we don't destroy root kmem_caches with memcg accounting
enabled very often, but maybe I'm biased here.
> I still have to wrap
> my head around the overall logic here. It looks quite fragile to me TBH.
> I am worried that it relies on implementation details of the PCP ref
> counters too much.
It is definitely very complicated and fragile, but I hope it won't remain
in this state for long. The new slab controller, which I'm working on,
eliminates all this logic altogether and generally simplifies things a lot,
simply because there will be no need to create and destroy per-memcg
kmem_caches.
Thanks!
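
The ordering problem described in the quoted patch description can be
illustrated with a small userspace toy model. This is only a sketch, not
kernel code: every name below (toy_ref, toy_wait_grace_period(),
toy_destroy_root() and so on) is hypothetical, a "grace period" is reduced
to a counter tick, and the only assumption taken from the description is
that killing a child refcounter needs one further grace period before the
refcounter may be torn down cleanly.

```c
#include <stdbool.h>

/* Toy model only: one "grace period" is one tick of this counter. */
static unsigned long toy_gp_now;

/* Hypothetical stand-in for the refcounter of a non-root kmem_cache. */
struct toy_ref {
	bool killed;             /* the kill step has been requested */
	unsigned long atomic_at; /* tick at which the switch to atomic mode is done */
};

/* Stand-in for waiting until one grace period has elapsed. */
static void toy_wait_grace_period(void)
{
	toy_gp_now++;
}

/* Killing the refcounter only starts the switch; it completes one tick later. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->killed = true;
	ref->atomic_at = toy_gp_now + 1;
}

/* Tearing the refcounter down is clean only once the switch has completed. */
static bool toy_ref_exit_is_clean(const struct toy_ref *ref)
{
	return ref->killed && toy_gp_now >= ref->atomic_at;
}

/*
 * Model of destroying the root cache right after a memcg went away.
 * Returns true if the child refcounter could be torn down cleanly,
 * false if the teardown would hit the "switch still in progress" case.
 */
static bool toy_destroy_root(unsigned int extra_grace_periods)
{
	struct toy_ref child = { .killed = false, .atomic_at = 0 };
	unsigned int i;

	toy_gp_now = 0;

	/* The flush at the start of destruction waits one grace period... */
	toy_wait_grace_period();
	/* ...after which the child is deactivated and its refcounter killed. */
	toy_ref_kill(&child);

	/* Any additional waiting done before the teardown. */
	for (i = 0; i < extra_grace_periods; i++)
		toy_wait_grace_period();

	return toy_ref_exit_is_clean(&child);
}
```

In this model toy_destroy_root(0) reports an unclean teardown, while
toy_destroy_root(1) reports a clean one, which mirrors the need to wait
another grace period before the root kmem_cache is destroyed.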