linuxppc-dev.lists.ozlabs.org archive mirror
From: Gautham R Shenoy <ego@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tejun Heo <htejun@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Abdul Haleem <abdhalee@linux.vnet.ibm.com>,
	Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU
Date: Wed, 15 Jun 2016 15:49:36 +0530
Message-ID: <20160615101936.GA31671@in.ibm.com>
In-Reply-To: <20160614112234.GF30154@twins.programming.kicks-ass.net>

Hi Peter,

On Tue, Jun 14, 2016 at 01:22:34PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 07, 2016 at 08:44:03PM +0530, Gautham R. Shenoy wrote:
> 
> I'm still puzzled why we don't see this on x86. Afaict there's nothing
> PPC specific about this.

You are right. On PPC, we hit the WARN_ON() at boot time roughly once
in five boots. Using some debug prints, I have verified that these are
the instances where the workqueue subsystem gets initialized before
all the CPUs come online. On x86, I have never been able to hit this,
since the workqueues there appear to always get initialized only after
all the CPUs have come online.

PPC doesn't use any specific unbound workqueue early in the boot. The
unbound workqueue causing the WARN_ON() was the "events_unbound"
workqueue, which was created by workqueue_init().

=================================================================================
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-127.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 0-31.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 32-63.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 64-95.
     online mask 0
[WQ] Creating Unbound workers for WQ events_unbound,cpumask 96-127.
     online mask 0
=================================================================================

Also, with the first patch in the series (which ensures that
restore_unbound_workers is called *after* the new workers for the
newly onlined CPUs are created) and without this one, you can
reproduce this WARN_ON() on both x86 and PPC by offlining all the CPUs
of a node and bringing just one of them back online. So essentially
the bug fixed by the previous patch is currently hiding this bug,
which is why we are not able to reproduce this WARN_ON() with
CPU-hotplug once the system has booted.
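
For anyone who wants to try the hotplug reproduction, here is a rough
sketch of the sequence in Python. It only relies on the standard
per-CPU sysfs control file /sys/devices/system/cpu/cpuN/online. The
helper names (hotplug_steps, run_steps) and the CPU numbers are made
up for illustration, and it defaults to a dry run that just prints the
equivalent echo commands. Actually applying the writes needs root, and
the CPUs picked must be ones the platform allows to go offline.

```python
def hotplug_steps(node_cpus, keep):
    """Build the sysfs writes: offline every CPU of a node, then online one.

    node_cpus and keep are placeholders; use the CPU numbers of a real node.
    """
    cpus = sorted(node_cpus)
    if keep not in cpus:
        raise ValueError("the CPU to bring back must belong to the node")
    path = "/sys/devices/system/cpu/cpu{}/online"
    # Step 1: take every CPU of the node offline.
    steps = [(path.format(cpu), "0") for cpu in cpus]
    # Step 2: bring exactly one of them back online.
    steps.append((path.format(keep), "1"))
    return steps


def run_steps(steps, dry_run=True):
    """Print the equivalent shell commands, or perform the writes (needs root)."""
    for sysfs_path, value in steps:
        if dry_run:
            print(f"echo {value} > {sysfs_path}")
        else:
            with open(sysfs_path, "w") as f:
                f.write(value)


if __name__ == "__main__":
    # Hypothetical node whose CPUs are 32-35; only CPU 32 is brought back.
    run_steps(hotplug_steps(range(32, 36), keep=32))
```

The dry run is the default on purpose, so the sequence can be reviewed
before anything is taken offline.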

> 
> > This patch sets the affinity of the worker to
> > a) the only online CPU in the cpumask of the worker pool when it comes
> >    online.
> > b) the cpumask of the worker pool when the second CPU in the pool's
> >    cpumask comes online.
> 
> This basically works around the WARN conditions, which I suppose is fair
> enough, but I would like a note here to revisit this once the whole cpu
> hotplug rework has settled.
> 

Sure.

> The real problem is that workqueues seem to want to create worker
> threads before there's anybody who would use them or something like
> that.

I am not sure about that. The workqueue creates unbound workers for a
node via wq_update_unbound_numa() whenever the first CPU of every node
comes online. So that seems legitimate. It then tries to affine these
workers to the cpumask of that node. Again this seems right. As an
optimization, it does this only when the first CPU of the node comes
online. Since this newly onlined CPU is not yet active, and since the
worker's nr_cpus_allowed > 1, we hit the WARN_ON().
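
To make the condition concrete, here is a toy user-space model in
Python; it is not kernel code. It assumes the warning fires when the
requested mask intersects the online mask, does not intersect the
active mask, and covers more than one CPU (i.e. that nr_cpus_allowed
is the weight of the mask being applied). It also models the affinity
choice from the patch description. The function names and the use of
Python sets as cpumasks are purely illustrative.

```python
def would_warn(new_mask, online, active):
    """Toy model of the WARN_ON() condition (an assumption, see above)."""
    nr_cpus_allowed = len(new_mask)
    return (bool(new_mask & online)          # some allowed CPU is online
            and not (new_mask & active)      # but none of them is active yet
            and nr_cpus_allowed > 1)         # and the task is not strictly per-CPU


def pick_affinity(pool_mask, online):
    """Affinity choice as described in the patch changelog.

    a) exactly one CPU of the pool is online: pin the worker to that CPU;
    b) otherwise: use the pool's full cpumask.
    """
    online_in_pool = pool_mask & online
    if len(online_in_pool) == 1:
        return set(online_in_pool)
    return set(pool_mask)


if __name__ == "__main__":
    node1 = set(range(32, 64))   # pool cpumask 32-63
    online = {0, 32}             # CPU 32 has just come online ...
    active = {0}                 # ... but is not yet active
    # Affining the worker to the whole node trips the modelled condition.
    print(would_warn(node1, online, active))
    # Pinning it to the only online CPU of the node does not.
    print(would_warn(pick_affinity(node1, online), online, active))
```

Once a second CPU of the node is online, pick_affinity() falls back to
the full pool cpumask, which corresponds to case (b) of the patch
description.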

However, I agree with you that during boot-up, the workqueue subsystem
should create unbound worker threads only for the online CPUs
(instead of for all possible CPUs as it currently does!) and let the
CPU_ONLINE notification take care of creating the remaining workers
when they are really required.

> 
> Or is that what PPC does funny? Use an unbound workqueue this early in
> cpu bringup?

Like I pointed out above, PPC doesn't use an unbound workqueue early
in the CPU bring up.

--
Thanks and Regards
gautham.
