From: Li Zefan <lizefan@huawei.com>
To: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>,
	Shawn Bohrer <shawn.bohrer@gmail.com>,
	Michal Hocko <mhocko@suse.cz>, <cgroups@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Markus Blank-Burian <burian@muenster.de>
Subject: Re: [PATCH cgroup/for-3.13-fixes] cgroup: use a dedicated workqueue for cgroup destruction
Date: Mon, 25 Nov 2013 09:16:39 +0800	[thread overview]
Message-ID: <5292A4F7.3030105@huawei.com> (raw)
In-Reply-To: <20131122221752.GC8981@mtj.dyndns.org>

> Since be44562613851 ("cgroup: remove synchronize_rcu() from
> cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
> freeing is performed from a work item from that point on and a later
> commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two
> steps"), moves css offlining to workqueue too.
> 
> As cgroup destruction isn't depended upon for memory reclaim, the
> destruction work items were put on the system_wq; unfortunately, some
> controller may block in the destruction path for considerable duration
> while holding cgroup_mutex.  As large part of destruction path is
> synchronized through cgroup_mutex, when combined with high rate of
> cgroup removals, this has potential to fill up system_wq's max_active
> of 256.
> 
> Also, it turns out that memcg's css destruction path ends up queueing
> and waiting for work items on system_wq through work_on_cpu().  If
> such operation happens while system_wq is fully occupied by cgroup
> destruction work items, work_on_cpu() can't make forward progress
> because system_wq is full and other destruction work items on
> system_wq can't make forward progress because the work item waiting
> for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
> 
> This can be fixed by queueing destruction work items on a separate
> workqueue.  This patch creates a dedicated workqueue -
> cgroup_destroy_wq - for this purpose.  As these work items shouldn't
> have inter-dependencies and mostly serialized by cgroup_mutex anyway,
> giving high concurrency level doesn't buy anything and the workqueue's
> @max_active is set to 1 so that destruction work items are executed
> one by one on each CPU.
> 
> Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
> cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
> separate core_initcall().  In the future, we probably want to reorder
> so that workqueue init happens before cgroup_init().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Hugh Dickins <hughd@google.com>
> Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
> Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
> Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
> Cc: stable@vger.kernel.org # v3.9+

Acked-by: Li Zefan <lizefan@huawei.com>
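
The deadlock described in the quoted commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct wq_model`, `wq_try_start()` and `destruction_makes_progress()` are hypothetical names invented for the illustration, and the per-CPU nature of max_active is collapsed into a single counter. The value 256 for system_wq and 1 for cgroup_destroy_wq are taken from the message above.

```c
#include <stdbool.h>

/* Toy model of a workqueue: a concurrency limit and a count of busy slots. */
struct wq_model {
	int max_active;	/* maximum number of work items running at once */
	int in_flight;	/* work items currently occupying a slot */
};

/* Try to start one work item; fails when the queue is saturated. */
static bool wq_try_start(struct wq_model *wq)
{
	if (wq->in_flight >= wq->max_active)
		return false;	/* item stays queued, it cannot run yet */
	wq->in_flight++;
	return true;
}

/*
 * Model n_destroy destruction work items being queued at once.  One of
 * them holds cgroup_mutex and, as memcg does through work_on_cpu(), must
 * run a helper work item on system_wq before it can release the mutex.
 * The others occupy their slots while blocked on the mutex.
 *
 * Returns true if the helper can start (forward progress is possible),
 * false if system_wq is already full (the deadlock from the message).
 */
static bool destruction_makes_progress(int n_destroy, bool dedicated_wq)
{
	struct wq_model system_wq = { .max_active = 256, .in_flight = 0 };
	struct wq_model destroy_wq = { .max_active = 1, .in_flight = 0 };
	struct wq_model *target = dedicated_wq ? &destroy_wq : &system_wq;
	int i;

	/* Items beyond max_active simply stay queued and hold no slot. */
	for (i = 0; i < n_destroy; i++)
		(void)wq_try_start(target);

	/* The mutex holder's helper always goes to system_wq. */
	return wq_try_start(&system_wq);
}
```

In this model, 300 simultaneous removals sharing system_wq leave no slot for the helper, so destruction_makes_progress(300, false) is false. With the dedicated queue the destruction items never consume system_wq slots, so destruction_makes_progress(300, true) is true.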

