Patch "workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup" has been added to the 4.4-stable tree

All of lore.kernel.org
 help / color / mirror / Atom feed

From: <gregkh@linuxfoundation.org>
To: <tj@kernel.org>, <gregkh@linuxfoundation.org>,
	<len.brown@intel.com>, <rafael@kernel.org>,
	<tangchen@cn.fujitsu.com>, <umgwanakikbuti@gmail.com>
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup" has been added to the 4.4-stable tree
Date: Tue, 01 Mar 2016 22:00:53 +0000	[thread overview]
Message-ID: <145686964814242@kroah.com> (raw)

This is a note to let you know that I've just added the patch titled

    workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     workqueue-handle-numa_no_node-for-unbound-pool_workqueue-lookup.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.

>From d6e022f1d207a161cd88e08ef0371554680ffc46 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Wed, 3 Feb 2016 13:54:25 -0500
Subject: workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup

From: Tejun Heo <tj@kernel.org>

commit d6e022f1d207a161cd88e08ef0371554680ffc46 upstream.

When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node.  However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.

This has always been broken but hasn't triggered often enough before
874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcifully assigns the local CPU for
delayed work items without explicit target CPU to fix a different
issue.  This widens the window where CPU can go offline while a
delayed work item is pending causing delayed work items dispatched
with target CPU set to an already offlined CPU.  The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.

While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen.  Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround.  The long term solution is keeping CPU
-> NODE mapping stable across CPU off/online cycles which is being
worked on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/workqueue.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -568,6 +568,16 @@ static struct pool_workqueue *unbound_pw
 						  int node)
 {
 	assert_rcu_or_wq_mutex_or_pool_mutex(wq);
+
+	/*
+	 * XXX: @node can be NUMA_NO_NODE if CPU goes offline while a
+	 * delayed item is pending.  The plan is to keep CPU -> NODE
+	 * mapping valid and stable across CPU on/offlines.  Once that
+	 * happens, this workaround can be removed.
+	 */
+	if (unlikely(node == NUMA_NO_NODE))
+		return wq->dfl_pwq;
+
 	return rcu_dereference_raw(wq->numa_pwq_tbl[node]);
 }

Patches currently in stable-queue which might be from tj@kernel.org are

queue-4.4/workqueue-handle-numa_no_node-for-unbound-pool_workqueue-lookup.patch
queue-4.4/libata-fix-sff-host-state-machine-locking-while-polling.patch
queue-4.4/revert-workqueue-make-sure-delayed-work-run-in-local-cpu.patch

                 reply	other threads:[~2016-03-01 22:00 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=145686964814242@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=len.brown@intel.com \
    --cc=rafael@kernel.org \
    --cc=stable-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tangchen@cn.fujitsu.com \
    --cc=tj@kernel.org \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.