From: Michal Hocko <mhocko@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH wq/for-4.5-fixes] workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
Date: Thu, 4 Feb 2016 09:40:05 +0100
Message-ID: <20160204084005.GA14430@dhcp22.suse.cz>
In-Reply-To: <20160203185425.GK14091@mtj.duckdns.org>

On Wed 03-02-16 13:54:25, Tejun Heo wrote:
> When looking up the pool_workqueue to use for an unbound workqueue,
> workqueue assumes that the target CPU is always bound to a valid NUMA
> node. However, currently, when a CPU goes offline, the mapping is
> destroyed and cpu_to_node() returns NUMA_NO_NODE. This has always
> been broken but hasn't triggered until recently.
>
> After 874bbfe600a6 ("workqueue: make sure delayed work run in local
> cpu"), workqueue forcibly assigns the local CPU to delayed work
> items without an explicit target CPU to fix a different issue. This
> widens the window in which a CPU can go offline while a delayed work
> item is pending, causing delayed work items to be dispatched with the
> target CPU set to an already offlined CPU. The resulting NUMA_NO_NODE
> mapping makes workqueue try to queue the work item on a NULL
> pool_workqueue and thus crash.
>
> Fix it by mapping NUMA_NO_NODE to the default pool_workqueue from
> unbound_pwq_by_node(). This is a temporary workaround. The long-term
> solution is to keep the CPU -> NODE mapping stable across CPU
> off/online cycles, which is in the works.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Cc: Tang Chen <tangchen@cn.fujitsu.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Len Brown <len.brown@intel.com>
> Cc: stable@vger.kernel.org # v4.3+
> Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
> Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
> Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com

Reviewed-by: Michal Hocko <mhocko@suse.com>

> ---
> kernel/workqueue.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 61a0264..f748eab 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -570,6 +570,16 @@ static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq,
>  						  int node)
>  {
>  	assert_rcu_or_wq_mutex_or_pool_mutex(wq);
> +
> +	/*
> +	 * XXX: @node can be NUMA_NO_NODE if CPU goes offline while a
> +	 * delayed item is pending. The plan is to keep CPU -> NODE
> +	 * mapping valid and stable across CPU on/offlines. Once that
> +	 * happens, this workaround can be removed.
> +	 */
> +	if (unlikely(node == NUMA_NO_NODE))
> +		return wq->dfl_pwq;
> +
>  	return rcu_dereference_raw(wq->numa_pwq_tbl[node]);
>  }
>
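
For anyone following the thread, here is a minimal user-space sketch of
the lookup behavior described in the patch above. It is an illustration
only: struct pwq, struct wq, pwq_by_node() and MAX_NODES are simplified,
hypothetical stand-ins rather than the kernel definitions, and RCU and
locking are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
#define NUMA_NO_NODE	(-1)	/* same value the kernel uses */
#define MAX_NODES	4	/* assumed table size for this sketch */

struct pwq {
	int id;
};

struct wq {
	struct pwq *dfl_pwq;			/* default entry, used as fallback */
	struct pwq *numa_pwq_tbl[MAX_NODES];	/* per-node entries */
};

/*
 * Models the patched lookup: when the node is unknown, return the
 * default entry instead of indexing the table with NUMA_NO_NODE.
 */
static struct pwq *pwq_by_node(struct wq *wq, int node)
{
	if (node == NUMA_NO_NODE)
		return wq->dfl_pwq;

	/* Any other value must be a valid table index in this sketch. */
	assert(node >= 0 && node < MAX_NODES);
	return wq->numa_pwq_tbl[node];
}
```

Without the early return, a caller passing NUMA_NO_NODE would index the
table with -1, which is the situation the workaround avoids.
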
--
Michal Hocko
SUSE Labs