From: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
To: Brian Christiansen
<brian.o.christiansen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Subject: Re: PROBLEM: BUG when using memory.kmem.limit_in_bytes
Date: Fri, 22 Jan 2016 16:50:42 +0300
Message-ID: <20160122135042.GF26192@esperanza>
In-Reply-To: <CAKB58ikDkzc8REt31WBkD99+hxNzjK4+FBmhkgS+NVrC9vjMSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hi Brian,
Thanks for the report.
I managed to reproduce the bug on the latest mmotm kernel using the
script you attached, so it isn't Ubuntu-specific:
: kernel BUG at mm/memcontrol.c:2929!
: invalid opcode: 0000 [#1] SMP
: CPU: 0 PID: 4441 Comm: kworker/0:2 Not tainted 4.4.0-mm1+ #256
: Workqueue: cgroup_destroy css_killed_work_fn
: task: ffff8800aaddd880 ti: ffff8800369b0000 task.ti: ffff8800369b0000
: RIP: 0010:[<ffffffff81220551>] [<ffffffff81220551>] memcg_offline_kmem+0xd1/0xe0
: RSP: 0018:ffff8800369b3b08 EFLAGS: 00010293
: RAX: ffff8800a9cba800 RBX: ffff8800ab1c7000 RCX: 0000000000000003
: RDX: ffff8800a9cba850 RSI: ffff8800ab1c7060 RDI: ffff8800ab1c1000
: RBP: ffff8800369b3b28 R08: 0000000000000001 R09: 0000000000000000
: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ab1c5000
: R13: 0000000000000000 R14: ffff8800ab1c7650 R15: ffff8800ab1c7640
: FS: 0000000000000000(0000) GS:ffff88014ae00000(0000) knlGS:0000000000000000
: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
: CR2: 00007f3ad9d3b090 CR3: 0000000148d61000 CR4: 00000000000006f0
: Stack:
: ffff8800369b3b28 ffff8800ab1c7640 ffff8800ab1c7640 ffff8800ab1c7000
: ffff8800369b3b88 ffffffff81220601 0000000000000001 0000000000000000
: 0000000000000002 ffff8800aaddd880 ffff8800369b3b88 ffff8800ab1c7090
: Call Trace:
: [<ffffffff81220601>] mem_cgroup_css_offline+0xa1/0xc0
: [<ffffffff81124b5c>] css_killed_work_fn+0x5c/0x170
: [<ffffffff8109ea30>] process_one_work+0x200/0x560
: [<ffffffff8109e99f>] ? process_one_work+0x16f/0x560
: [<ffffffff810d2bb2>] ? __lock_acquire+0x1a2/0x440
: [<ffffffff8109f6d4>] ? worker_thread+0x204/0x530
: [<ffffffff8109f6c7>] ? worker_thread+0x1f7/0x530
: [<ffffffff8109f63e>] worker_thread+0x16e/0x530
: [<ffffffff816f2eb4>] ? __schedule+0x354/0x900
: [<ffffffff810b2e22>] ? default_wake_function+0x12/0x20
: [<ffffffff810ca356>] ? __wake_up_common+0x56/0x90
: [<ffffffff8109f4d0>] ? maybe_create_worker+0x110/0x110
: [<ffffffff816f3567>] ? schedule+0x47/0xc0
: [<ffffffff8109f4d0>] ? maybe_create_worker+0x110/0x110
: [<ffffffff810a4ac9>] kthread+0xe9/0x110
: [<ffffffff810af22e>] ? schedule_tail+0x1e/0xd0
: [<ffffffff810a49e0>] ? __init_kthread_worker+0x70/0x70
: [<ffffffff816f92cf>] ret_from_fork+0x3f/0x70
: [<ffffffff810a49e0>] ? __init_kthread_worker+0x70/0x70
At first glance, it looks like the bug was triggered because
mem_cgroup_css_offline was run for a child cgroup earlier than for its
parent. This could not have happened before cgroup was switched to
percpu_ref, because cgroup_destroy_wq has always had max_active == 1.
Now, however, it looks perfectly possible, since
css_killed_ref_fn is called from an RCU callback - see kill_css ->
percpu_ref_kill_and_confirm. This breaks kmemcg assumptions.
I'll take a look at what can be done about that.
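To illustrate the ordering argument above, here is a toy user-space
model, not kernel code. It assumes a FIFO queue that runs one item at a
time (standing in for a workqueue with max_active == 1) and treats the
order in which the deferred confirmation callbacks fire as an input.
The names SerialWorkqueue, offline_order, cg_a and cg_b are hypothetical
and exist only for this sketch. It does not model which of the two
orders actually upsets kmemcg.

```python
from collections import deque


class SerialWorkqueue:
    """FIFO queue that runs one item at a time (toy max_active == 1)."""

    def __init__(self):
        self._items = deque()

    def queue(self, fn, arg):
        self._items.append((fn, arg))

    def drain(self):
        # Items run strictly in the order they were queued.
        while self._items:
            fn, arg = self._items.popleft()
            fn(arg)


def offline_order(kill_order, confirm_order=None):
    """Return the order in which the offline handlers run.

    kill_order:    cgroups in the order they are killed.
    confirm_order: order in which the deferred confirmation callbacks
                   fire (an assumed input); None means the offline work
                   is queued directly at kill time.
    """
    wq = SerialWorkqueue()
    ran = []
    # The work item is queued either at kill time or from the callback.
    queue_order = kill_order if confirm_order is None else confirm_order
    for name in queue_order:
        wq.queue(ran.append, name)
    wq.drain()
    return ran


if __name__ == "__main__":
    # Queued at kill time: handlers run in kill order.
    print(offline_order(["cg_a", "cg_b"]))
    # Queued from callbacks that fire in the opposite order.
    print(offline_order(["cg_a", "cg_b"], confirm_order=["cg_b", "cg_a"]))
```

The point of the sketch is that a serial queue only preserves the order
in which work was queued. Once queueing is done from callbacks, the
queue no longer guarantees the order in which the cgroups were killed.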
Thanks,
Vladimir