Date: Thu, 4 Feb 2016 09:40:05 +0100
From: Michal Hocko
To: Tejun Heo
Cc: Mike Galbraith, LKML
Subject: Re: [PATCH wq/for-4.5-fixes] workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
Message-ID: <20160204084005.GA14430@dhcp22.suse.cz>
References: <1454424264.11183.46.camel@gmail.com> <20160203185425.GK14091@mtj.duckdns.org>
In-Reply-To: <20160203185425.GK14091@mtj.duckdns.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 03-02-16 13:54:25, Tejun Heo wrote:
> When looking up the pool_workqueue to use for an unbound workqueue,
> workqueue assumes that the target CPU is always bound to a valid NUMA
> node. However, currently, when a CPU goes offline, the mapping is
> destroyed and cpu_to_node() returns NUMA_NO_NODE. This has always
> been broken but hasn't triggered until recently.
>
> After 874bbfe600a6 ("workqueue: make sure delayed work run in local
> cpu"), workqueue forcibly assigns the local CPU for delayed work
> items without an explicit target CPU to fix a different issue. This
> widens the window where a CPU can go offline while a delayed work item
> is pending, causing delayed work items to be dispatched with the target
> CPU set to an already offlined CPU. The resulting NUMA_NO_NODE mapping
> makes workqueue try to queue the work item on a NULL pool_workqueue
> and thus crash.
>
> Fix it by mapping NUMA_NO_NODE to the default pool_workqueue from
> unbound_pwq_by_node().
> This is a temporary workaround. The long-term solution, keeping the
> CPU -> NODE mapping stable across CPU off/online cycles, is in the works.
>
> Signed-off-by: Tejun Heo
> Reported-by: Mike Galbraith
> Cc: Tang Chen
> Cc: Rafael J. Wysocki
> Cc: Len Brown
> Cc: stable@vger.kernel.org # v4.3+
> Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
> Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
> Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com

Reviewed-by: Michal Hocko

> ---
>  kernel/workqueue.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 61a0264..f748eab 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -570,6 +570,16 @@ static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq,
>  						  int node)
>  {
>  	assert_rcu_or_wq_mutex_or_pool_mutex(wq);
> +
> +	/*
> +	 * XXX: @node can be NUMA_NO_NODE if CPU goes offline while a
> +	 * delayed item is pending. The plan is to keep CPU -> NODE
> +	 * mapping valid and stable across CPU on/offlines. Once that
> +	 * happens, this workaround can be removed.
> +	 */
> +	if (unlikely(node == NUMA_NO_NODE))
> +		return wq->dfl_pwq;
> +
>  	return rcu_dereference_raw(wq->numa_pwq_tbl[node]);
>  }
>  

-- 
Michal Hocko
SUSE Labs