linux-arm-kernel.lists.infradead.org archive mirror
From: santosh.shilimkar@ti.com (Santosh Shilimkar)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.
Date: Mon, 20 Jun 2011 20:24:33 +0530
Message-ID: <4DFF5F29.2000904@ti.com>
In-Reply-To: <20110620142338.GL2082@n2100.arm.linux.org.uk>

On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
>> Ok.  So loops_per_jiffy must be too small.  My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>
> Right, this commit above helps show the problem - and it's fairly subtle.
>
> It's a race condition.  Let's first look at the spinlock debugging code.
> It does this:
>
> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
> 	u64 i;
> 	u64 loops = loops_per_jiffy * HZ;
>
> 	for (;;) {
> 		for (i = 0; i < loops; i++) {
> 			if (arch_spin_trylock(&lock->raw_lock))
> 				return;
> 			__delay(1);
> 		}
> 		/* print warning */
> 	}
> }
>
> If loops_per_jiffy is zero, we never try to grab the spinlock, because
> we never enter the inner for loop.  We immediately print a warning,
> and re-execute the outer loop for ever, resulting in the CPU locking up
> in this condition.
>
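To make the zero case concrete, here is a rough user-space model of the
loop quoted above. It is only a sketch and not kernel code:
fake_trylock(), lock_debug_attempts() and the max_warnings cap are
hypothetical names introduced for illustration, and the cap exists only
so that the model terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_spin_trylock(): the lock is always free. */
static bool fake_trylock(void)
{
	return true;
}

/*
 * Model of the quoted __spin_lock_debug() loop.  Returns the number of
 * trylock attempts made.  The outer loop is capped at max_warnings
 * passes (an assumption of this sketch) instead of spinning for ever.
 */
static uint64_t lock_debug_attempts(uint64_t loops, unsigned int max_warnings,
				    bool *acquired)
{
	uint64_t attempts = 0;
	unsigned int w;
	uint64_t i;

	*acquired = false;
	for (w = 0; w < max_warnings; w++) {
		for (i = 0; i < loops; i++) {
			attempts++;
			if (fake_trylock()) {
				*acquired = true;
				return attempts;
			}
		}
		/* the kernel would print the lockup warning here */
	}
	return attempts;
}
```

With loops == 0 the inner loop body never runs, so the function makes no
attempt at all and reports the lock as not acquired even though it is
free. With any non-zero value it succeeds on the first attempt.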
> In theory, we should never see a zero loops_per_jiffy value, because it
> represents the number of loops __delay() needs to delay by one jiffy and
> clearly zero makes no sense.
>
> However, calibrate_delay() does this (which x86 and ARM call on secondary
> CPU startup):
>
> calibrate_delay()
> {
> ...
> 	if (preset_lpj) {
> 	} else if ((!printed) && lpj_fine) {
> 	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
> 	} else {
> 		/* approximation/convergence stuff */
> 	}
> }
>
> Now, before 71c696b, this used to be:
>
>          } else {
>                  loops_per_jiffy = (1<<12);
>
> So the window between calibrate_delay_direct() returning and setting
> loops_per_jiffy to zero, and the re-initialization of loops_per_jiffy
> was relatively short (maybe even the compiler optimized away the zero
> write.)
>
> However, after 71c696b, this now does:
>
>          } else {
>                  if (!printed)
>                          pr_info("Calibrating delay loop... ");
> +               loops_per_jiffy = calibrate_delay_converge();
>
> So, as loops_per_jiffy is not local to this function, the compiler has
> to write out that zero value, before calling calibrate_delay_converge(),
> and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
> has returned.  This opens the window and allows the spinlock debugging
> code to explode.
>
> This patch closes the window completely, by writing to loops_per_jiffy
> only when we have a real value for it.
>
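As I understand the idea, it amounts to doing the calibration in a local
variable and storing to the global once at the end. Below is a
single-threaded user-space sketch of that shape, not the actual patch.
All helper names (observe(), direct_fails(), converge(),
calibrate_racy(), calibrate_local()) and the values 4096 and 5000 are
made up for illustration; observe() stands in for another CPU reading
the global while the fallback calibration is still running.

```c
/* Stand-in for the kernel global; starts with a previously valid value. */
static volatile unsigned long loops_per_jiffy = 4096;

/* Set if the modelled observer ever sees a zero in the global. */
static int saw_zero;

/* Hypothetical observer: another CPU sampling loops_per_jiffy. */
static void observe(void)
{
	if (loops_per_jiffy == 0)
		saw_zero = 1;
}

/* Models calibrate_delay_direct() failing and returning 0. */
static unsigned long direct_fails(void)
{
	return 0;
}

/* Models calibrate_delay_converge(); the observer runs while it works. */
static unsigned long converge(void)
{
	observe();
	return 5000;
}

/* Racy shape: the failed direct result is stored in the global first. */
static void calibrate_racy(void)
{
	if ((loops_per_jiffy = direct_fails()) != 0)
		return;
	loops_per_jiffy = converge();
}

/* Fixed shape: work in a local, write the global exactly once. */
static void calibrate_local(void)
{
	unsigned long lpj = direct_fails();

	if (lpj == 0)
		lpj = converge();
	loops_per_jiffy = lpj;
}
```

In this model calibrate_racy() lets the observer see zero, while
calibrate_local() leaves the old value in place until the final result
is ready.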
> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
> without this it fails with spinlock lockup and rcu problems.
>
>   init/calibrate.c |   14 ++++++++------
>   1 files changed, 8 insertions(+), 6 deletions(-)
>
I am away from my board now. Will test this change.
Btw, the online-active race is still open even with this patch and
should still be fixed.

Regards
Santosh

