From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Markus Blank-Burian <burian-iYtK5bfT9M8b1SvskN2V4Q@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: Possible regression with cgroups in 3.11
Date: Wed, 30 Oct 2013 16:14:31 +0800
Message-ID: <5270BFE7.4000602@huawei.com>
In-Reply-To: <CA+SBX_OJBbYzrNX5Mi4rmM2SANShXMmAvuPGczAyBdx8F2hBDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Sorry for the late reply.
It seems we are stuck in the while loop in mem_cgroup_reparent_charges().
I talked with Michal during Kernel Summit, and it seems Google has also
hit this bug. Let's get more people involved.
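To make the suspected failure mode concrete, here is a toy model in Python.
It is only a sketch, not the kernel code: it assumes the loop keeps retrying
until the group's usage reaches zero, and the names reparent_charges, on_lru,
off_lru and max_passes are hypothetical. The idea that some charges cannot be
found by the loop is likewise an assumption for illustration, not a diagnosis.

```python
def reparent_charges(on_lru, off_lru, max_passes=5):
    """Toy model of a 'retry until usage is zero' loop.

    on_lru:  charges the loop can find and move to the parent
    off_lru: charges the loop cannot find (hypothetical cause of the hang)
    Returns (passes, remaining_usage, finished).
    """
    passes = 0
    usage = on_lru + off_lru
    while usage > 0:
        if passes == max_passes:
            # Assumption: the loop being modelled has no such cap.
            # The cap exists only so this model terminates.
            return passes, usage, False
        passes += 1
        on_lru = 0  # one pass moves every reachable charge to the parent
        usage = on_lru + off_lru
    return passes, usage, True


if __name__ == "__main__":
    print(reparent_charges(10, 0))  # every charge is reachable
    print(reparent_charges(10, 3))  # three charges are never found
```

With every charge reachable the model finishes after one pass; with any
unreachable charge the usage never drops to zero and only the artificial cap
ends the loop, which is the behaviour a hung worker would show.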
On 2013/10/18 17:57, Markus Blank-Burian wrote:
> My test-runs now reproduced the bug with tracing enabled. The mutex
> holding thread is definitely the one I posted earlier, and with the
> "-t" option the crash utility can also display the whole stack
> backtrace. (Did only show the first 3 lines without this options,
> which confused me earlier into thinking, that the worker thread was
> idle). I will keep the test machine running in this state if you need
> more information.
>
> crash> bt 13115 -t
> PID: 13115 TASK: ffff88082e34a050 CPU: 4 COMMAND: "kworker/4:0"
> START: __schedule at ffffffff813e0f4f
> [ffff88082f673ad8] schedule at ffffffff813e111f
> [ffff88082f673ae8] schedule_timeout at ffffffff813ddd6c
> [ffff88082f673af8] mark_held_locks at ffffffff8107bec4
> [ffff88082f673b10] _raw_spin_unlock_irq at ffffffff813e2625
> [ffff88082f673b38] trace_hardirqs_on_caller at ffffffff8107c04f
> [ffff88082f673b58] trace_hardirqs_on at ffffffff8107c078
> [ffff88082f673b80] __wait_for_common at ffffffff813e0980
> [ffff88082f673b88] schedule_timeout at ffffffff813ddd38
> [ffff88082f673ba0] default_wake_function at ffffffff8105a258
> [ffff88082f673bb8] call_rcu at ffffffff810a552b
> [ffff88082f673be8] wait_for_completion at ffffffff813e0a1c
> [ffff88082f673bf8] wait_rcu_gp at ffffffff8104c736
> [ffff88082f673c08] wakeme_after_rcu at ffffffff8104c6d1
> [ffff88082f673c60] __mutex_unlock_slowpath at ffffffff813e0217
> [ffff88082f673c88] synchronize_rcu at ffffffff810a3f50
> [ffff88082f673c98] mem_cgroup_reparent_charges at ffffffff810f6765
> [ffff88082f673d28] mem_cgroup_css_offline at ffffffff810f6b9f
> [ffff88082f673d58] offline_css at ffffffff8108b4aa
> [ffff88082f673d80] cgroup_offline_fn at ffffffff8108e112
> [ffff88082f673dc0] process_one_work at ffffffff810493b3
> [ffff88082f673dc8] process_one_work at ffffffff81049348
> [ffff88082f673e28] worker_thread at ffffffff81049d7b
> [ffff88082f673e48] worker_thread at ffffffff81049c37
> [ffff88082f673e60] kthread at ffffffff8104ef80
> [ffff88082f673f28] kthread at ffffffff8104eed4
> [ffff88082f673f50] ret_from_fork at ffffffff813e31ec
> [ffff88082f673f80] kthread at ffffffff8104eed4
>
> On Fri, Oct 18, 2013 at 11:34 AM, Markus Blank-Burian
> <burian-iYtK5bfT9M8b1SvskN2V4Q@public.gmane.org> wrote:
>> I guess I found out, where it is hanging: While waiting for the
>> test-runs to trigger the bug, I tried "echo w > /proc/sysrq-trigger"
>> to show the stacks of all blocked tasks, and one of them was always
>> this one:
>>
>> [586147.824671] kworker/3:5 D ffff8800df81e208 0 10909 2 0x00000000
>> [586147.824671] Workqueue: events cgroup_offline_fn
>> [586147.824671] ffff8800fba7bbd0 0000000000000002 ffff88007afc2ee0 ffff8800fba7bfd8
>> [586147.824671] ffff8800fba7bfd8 0000000000011c40 ffff8800df81ddc0 7fffffffffffffff
>> [586147.824671] ffff8800fba7bcf8 ffff8800df81ddc0 0000000000000002 ffff8800fba7bcf0
>> [586147.824671] Call Trace:
>> [586147.824671] [<ffffffff813c57e4>] schedule+0x60/0x62
>> [586147.824671] [<ffffffff813c374c>] schedule_timeout+0x34/0x11c
>> [586147.824671] [<ffffffff81053305>] ? __wake_up_common+0x51/0x7e
>> [586147.824671] [<ffffffff813c6a73>] ? _raw_spin_unlock_irqrestore+0x29/0x34
>> [586147.824671] [<ffffffff813c5097>] __wait_for_common+0x9c/0x119
>> [586147.824671] [<ffffffff813c3718>] ? svcauth_gss_legacy_init+0x176/0x176
>> [586147.824671] [<ffffffff8105790d>] ? wake_up_state+0xd/0xd
>> [586147.824671] [<ffffffff8109c237>] ? call_rcu_bh+0x18/0x18
>> [586147.824671] [<ffffffff813c5133>] wait_for_completion+0x1f/0x21
>> [586147.824671] [<ffffffff8104a8ee>] wait_rcu_gp+0x46/0x4c
>> [586147.824671] [<ffffffff8104a899>] ? __rcu_read_unlock+0x4c/0x4c
>> [586147.824671] [<ffffffff8109ad6b>] synchronize_rcu+0x29/0x2b
>> [586147.824671] [<ffffffff810ec34e>] mem_cgroup_reparent_charges+0x63/0x2fb
>> [586147.824671] [<ffffffff810ec75a>] mem_cgroup_css_offline+0xa5/0x14a
>> [586147.824671] [<ffffffff8108329e>] offline_css.part.15+0x1b/0x2e
>> [586147.824671] [<ffffffff81084f8b>] cgroup_offline_fn+0x72/0x137
>> [586147.824671] [<ffffffff81047cb7>] process_one_work+0x15f/0x21e
>> [586147.824671] [<ffffffff81048159>] worker_thread+0x144/0x1f0
>> [586147.824671] [<ffffffff81048015>] ? rescuer_thread+0x275/0x275
>> [586147.824671] [<ffffffff8104cbec>] kthread+0x88/0x90
>> [586147.824671] [<ffffffff8104cb64>] ? __kthread_parkme+0x60/0x60
>> [586147.824671] [<ffffffff813c756c>] ret_from_fork+0x7c/0xb0
>> [586147.824671] [<ffffffff8104cb64>] ? __kthread_parkme+0x60/0x60
>>
>>
>> On Tue, Oct 15, 2013 at 5:15 AM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>>> On 2013/10/14 16:06, Markus Blank-Burian wrote:
>>>> The crash utility indicated, that the lock was held by a kworker
>>>> thread, which was idle at the moment. So there might be a case, where
>>>> no unlock is done. I am trying to reproduce the problem at the moment
>>>> with CONFIG_PROVE_LOCKING, but without luck so far. It seems, that my
>>>> test-job is quite bad at reproducing the bug. I'll let you know, if I
>>>> can find out more.
>>>>
>>>
>>> Thanks. I'll review the code to see if I can find some suspect.
>>>
>>> PS: I'll be travelling from 10/16 ~ 10/28, so I may not be able
>>> to spend much time on this.