From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Markus Blank-Burian <burian-iYtK5bfT9M8b1SvskN2V4Q@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Ying Han <yinghan-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: Possible regression with cgroups in 3.11
Date: Wed, 30 Oct 2013 16:14:31 +0800
Message-ID: <5270BFE7.4000602@huawei.com>
In-Reply-To: <CA+SBX_OJBbYzrNX5Mi4rmM2SANShXMmAvuPGczAyBdx8F2hBDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Sorry for the late reply.
It seems we are stuck in the while loop in mem_cgroup_reparent_charges().
I talked with Michal during Kernel Summit, and it seems Google has also
hit this bug. Let's get more people involved.
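
For readers who do not have that loop in mind, here is a toy user-space
model of the failure mode. It is only a sketch. It assumes the loop
keeps retrying until the group's usage counter reaches zero, and that
each pass can only move charges it can find on an LRU list. FakeMemcg,
on_lru and max_passes are hypothetical names made up for this example;
none of them is kernel code, and the real loop has no pass limit.

```python
from dataclasses import dataclass


@dataclass
class FakeMemcg:
    """Hypothetical stand-in for a memory cgroup; not the kernel struct."""
    usage: int   # pages still charged to the group
    on_lru: int  # charged pages that a reparenting pass can find on an LRU


def reparent_charges(cg: FakeMemcg, max_passes: int) -> int:
    """Retry until usage reaches zero.

    Returns the number of passes used, or -1 if usage never drains.
    max_passes exists only so that this demo terminates.
    """
    for n in range(1, max_passes + 1):
        cg.usage -= cg.on_lru  # move every reachable charge to the parent
        cg.on_lru = 0
        if cg.usage <= 0:
            return n
    return -1


healthy = FakeMemcg(usage=8, on_lru=8)
leaked = FakeMemcg(usage=8, on_lru=7)  # one charge is not on any LRU
print(reparent_charges(healthy, 5))
print(reparent_charges(leaked, 5))
```

In this model a single charge that no pass can reach is enough to keep
the loop going forever, which would match a worker that never leaves
mem_cgroup_reparent_charges().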
On 2013/10/18 17:57, Markus Blank-Burian wrote:
> My test runs have now reproduced the bug with tracing enabled. The mutex
> holding thread is definitely the one I posted earlier, and with the
> "-t" option the crash utility can also display the whole stack
> backtrace. (It only showed the first 3 lines without this option,
> which confused me earlier into thinking that the worker thread was
> idle.) I will keep the test machine running in this state if you need
> more information.
>
> crash> bt 13115 -t
> PID: 13115 TASK: ffff88082e34a050 CPU: 4 COMMAND: "kworker/4:0"
> START: __schedule at ffffffff813e0f4f
> [ffff88082f673ad8] schedule at ffffffff813e111f
> [ffff88082f673ae8] schedule_timeout at ffffffff813ddd6c
> [ffff88082f673af8] mark_held_locks at ffffffff8107bec4
> [ffff88082f673b10] _raw_spin_unlock_irq at ffffffff813e2625
> [ffff88082f673b38] trace_hardirqs_on_caller at ffffffff8107c04f
> [ffff88082f673b58] trace_hardirqs_on at ffffffff8107c078
> [ffff88082f673b80] __wait_for_common at ffffffff813e0980
> [ffff88082f673b88] schedule_timeout at ffffffff813ddd38
> [ffff88082f673ba0] default_wake_function at ffffffff8105a258
> [ffff88082f673bb8] call_rcu at ffffffff810a552b
> [ffff88082f673be8] wait_for_completion at ffffffff813e0a1c
> [ffff88082f673bf8] wait_rcu_gp at ffffffff8104c736
> [ffff88082f673c08] wakeme_after_rcu at ffffffff8104c6d1
> [ffff88082f673c60] __mutex_unlock_slowpath at ffffffff813e0217
> [ffff88082f673c88] synchronize_rcu at ffffffff810a3f50
> [ffff88082f673c98] mem_cgroup_reparent_charges at ffffffff810f6765
> [ffff88082f673d28] mem_cgroup_css_offline at ffffffff810f6b9f
> [ffff88082f673d58] offline_css at ffffffff8108b4aa
> [ffff88082f673d80] cgroup_offline_fn at ffffffff8108e112
> [ffff88082f673dc0] process_one_work at ffffffff810493b3
> [ffff88082f673dc8] process_one_work at ffffffff81049348
> [ffff88082f673e28] worker_thread at ffffffff81049d7b
> [ffff88082f673e48] worker_thread at ffffffff81049c37
> [ffff88082f673e60] kthread at ffffffff8104ef80
> [ffff88082f673f28] kthread at ffffffff8104eed4
> [ffff88082f673f50] ret_from_fork at ffffffff813e31ec
> [ffff88082f673f80] kthread at ffffffff8104eed4
>
> On Fri, Oct 18, 2013 at 11:34 AM, Markus Blank-Burian
> <burian-iYtK5bfT9M8b1SvskN2V4Q@public.gmane.org> wrote:
>> I guess I found out where it is hanging: While waiting for the
>> test runs to trigger the bug, I tried "echo w > /proc/sysrq-trigger"
>> to show the stacks of all blocked tasks, and one of them was always
>> this one:
>>
>> [586147.824671] kworker/3:5 D ffff8800df81e208 0 10909 2 0x00000000
>> [586147.824671] Workqueue: events cgroup_offline_fn
>> [586147.824671] ffff8800fba7bbd0 0000000000000002 ffff88007afc2ee0 ffff8800fba7bfd8
>> [586147.824671] ffff8800fba7bfd8 0000000000011c40 ffff8800df81ddc0 7fffffffffffffff
>> [586147.824671] ffff8800fba7bcf8 ffff8800df81ddc0 0000000000000002 ffff8800fba7bcf0
>> [586147.824671] Call Trace:
>> [586147.824671] [<ffffffff813c57e4>] schedule+0x60/0x62
>> [586147.824671] [<ffffffff813c374c>] schedule_timeout+0x34/0x11c
>> [586147.824671] [<ffffffff81053305>] ? __wake_up_common+0x51/0x7e
>> [586147.824671] [<ffffffff813c6a73>] ? _raw_spin_unlock_irqrestore+0x29/0x34
>> [586147.824671] [<ffffffff813c5097>] __wait_for_common+0x9c/0x119
>> [586147.824671] [<ffffffff813c3718>] ? svcauth_gss_legacy_init+0x176/0x176
>> [586147.824671] [<ffffffff8105790d>] ? wake_up_state+0xd/0xd
>> [586147.824671] [<ffffffff8109c237>] ? call_rcu_bh+0x18/0x18
>> [586147.824671] [<ffffffff813c5133>] wait_for_completion+0x1f/0x21
>> [586147.824671] [<ffffffff8104a8ee>] wait_rcu_gp+0x46/0x4c
>> [586147.824671] [<ffffffff8104a899>] ? __rcu_read_unlock+0x4c/0x4c
>> [586147.824671] [<ffffffff8109ad6b>] synchronize_rcu+0x29/0x2b
>> [586147.824671] [<ffffffff810ec34e>] mem_cgroup_reparent_charges+0x63/0x2fb
>> [586147.824671] [<ffffffff810ec75a>] mem_cgroup_css_offline+0xa5/0x14a
>> [586147.824671] [<ffffffff8108329e>] offline_css.part.15+0x1b/0x2e
>> [586147.824671] [<ffffffff81084f8b>] cgroup_offline_fn+0x72/0x137
>> [586147.824671] [<ffffffff81047cb7>] process_one_work+0x15f/0x21e
>> [586147.824671] [<ffffffff81048159>] worker_thread+0x144/0x1f0
>> [586147.824671] [<ffffffff81048015>] ? rescuer_thread+0x275/0x275
>> [586147.824671] [<ffffffff8104cbec>] kthread+0x88/0x90
>> [586147.824671] [<ffffffff8104cb64>] ? __kthread_parkme+0x60/0x60
>> [586147.824671] [<ffffffff813c756c>] ret_from_fork+0x7c/0xb0
>> [586147.824671] [<ffffffff8104cb64>] ? __kthread_parkme+0x60/0x60
>>
>>
>> On Tue, Oct 15, 2013 at 5:15 AM, Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>>> On 2013/10/14 16:06, Markus Blank-Burian wrote:
>>>> The crash utility indicated that the lock was held by a kworker
>>>> thread, which was idle at the moment. So there might be a case where
>>>> no unlock is done. I am trying to reproduce the problem at the moment
>>>> with CONFIG_PROVE_LOCKING, but without luck so far. It seems that my
>>>> test job is quite bad at reproducing the bug. I'll let you know if I
>>>> can find out more.
>>>>
>>>
>>> Thanks. I'll review the code to see if I can find some suspect.
>>>
>>> PS: I'll be travelling from 10/16 ~ 10/28, so I may not be able
>>> to spend much time on this.