From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Markus Blank-Burian <burian-iYtK5bfT9M8b1SvskN2V4Q@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: Possible regression with cgroups in 3.11
Date: Fri, 11 Oct 2013 21:06:22 +0800
Message-ID: <5257F7CE.90702@huawei.com>
In-Reply-To: <4431690.ZqnBIdaGMg-fhzw3bAB8VLGE+7tAf435K1T39T6GgSB@public.gmane.org>
On 2013/10/10 16:50, Markus Blank-Burian wrote:
> Hi,
>
Thanks for the report.
> I have upgraded all nodes on our computing cluster to 3.11.3 last week (from
> 3.10.9) and experience deadlocks in kernel threads connected to cgroups. They
> appear sometimes, when our queuing system (slurm 2.6.0) tries to clean up its
> cgroups (using freezer, cpuset, memory and devices subsets). I have attached
> the associated kernel messages as well as the cleanup script.
>
We changed the cgroup destroy path dramatically, including a switch to
per-cpu reference counting, so those changes probably introduced this bug.
> Oct 10 00:39:48 kaa-14 kernel: [169967.617545] INFO: task kworker/7:0:5201 blocked for more than 120 seconds.
> Oct 10 00:39:48 kaa-14 kernel: [169967.617557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 10 00:39:48 kaa-14 kernel: [169967.617563] kworker/7:0 D ffff88077e873328 0 5201 2 0x00000000
> Oct 10 00:39:48 kaa-14 kernel: [169967.617583] Workqueue: events cgroup_offline_fn
> Oct 10 00:39:48 kaa-14 kernel: [169967.617590] ffff8804a4129d70 0000000000000002 ffff8804adc60000 ffff8804a4129fd8
> Oct 10 00:39:48 kaa-14 kernel: [169967.617599] ffff8804a4129fd8 0000000000011c40 ffff88077e872ee0 ffffffff81634ae0
> Oct 10 00:39:48 kaa-14 kernel: [169967.617608] ffffffff81634ae4 ffff88077e872ee0 ffffffff81634ae8 00000000ffffffff
> Oct 10 00:39:48 kaa-14 kernel: [169967.617617] Call Trace:
> Oct 10 00:39:48 kaa-14 kernel: [169967.617634] [<ffffffff813c57e4>] schedule+0x60/0x62
> Oct 10 00:39:48 kaa-14 kernel: [169967.617645] [<ffffffff813c5a6b>] schedule_preempt_disabled+0x13/0x1f
> Oct 10 00:39:48 kaa-14 kernel: [169967.617654] [<ffffffff813c4987>] __mutex_lock_slowpath+0x143/0x1d4
> Oct 10 00:39:48 kaa-14 kernel: [169967.617665] [<ffffffff8105a3e8>] ? arch_vtime_task_switch+0x6a/0x6f
> Oct 10 00:39:48 kaa-14 kernel: [169967.617673] [<ffffffff813c3b58>] mutex_lock+0x12/0x22
> Oct 10 00:39:48 kaa-14 kernel: [169967.617681] [<ffffffff81084f4f>] cgroup_offline_fn+0x36/0x137
All the tasks are blocked on the cgroup mutex, but the traces don't tell us
who is holding the lock, which is the vital piece of information.
Are there any other kernel warnings in the kernel log?
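One possible way to find the holder is sketched below. It is not a tested recipe for this setup: it assumes a kernel built with CONFIG_LOCKDEP (without it, SysRq 'd' does not list held locks) and it must be run as root on a node that is currently hung. The helper name dump_lock_state and its optional path argument are made up for illustration.

```shell
# Hypothetical helper: ask the kernel to log blocked tasks and held locks.
# The optional argument (default /proc/sysrq-trigger) exists only so the
# function can be pointed at a scratch file when trying it out.
dump_lock_state() {
    trigger="${1:-/proc/sysrq-trigger}"
    # 'w' dumps tasks in uninterruptible (blocked) state,
    # 'd' shows all locks currently held (needs CONFIG_LOCKDEP).
    for key in w d; do
        printf '%s\n' "$key" > "$trigger" || return 1
    done
}
```

After calling dump_lock_state as root, the output appears in the kernel log (dmesg). The held-locks dump should name the task that owns the cgroup mutex.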