From: andreiw@motorola.com (Andrei Warkentin)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC] Make SMP secondary CPU up more resilient to failure.
Date: Wed, 15 Dec 2010 17:45:13 -0600
Message-ID: <AANLkTinM524ozYQhHSpCN49LvZLePO6faHJrqxKBtyJe@mail.gmail.com>

Hi,

This is my first time on linux-arm-kernel, and while I've read the
FAQ, hopefully I don't screw up too badly :).

Anyway, we're running 2.6.36 on a dual-core ARMv7, and during
stability stress testing we saw the following:
1) After a number of hotplug iterations, CPU1 fails to set its online
bit quickly enough and __cpu_up() times out.
2) CPU1 eventually completes its startup and sets the bit; however,
since _cpu_up() failed, CPU1's active bit is never set.
3) On the next call to cpu_down(), the function checks that the online
bit is set and proceeds.
4) The workqueue code receives a CPU_DOWN_PREPARE notification and
creates a trustee_thread to run on CPU1.
5) Since CPU1's active bit is not set, the scheduler runs the thread
on CPU0 instead and the BUG_ON at kernel/workqueue.c:3111 fires (the
check that the thread is running on the intended CPU).
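
To make the sequence above concrete, here is a minimal userspace
sketch of the mask state. It is a simplified model, not kernel code:
struct cpu_masks and every model_* function are hypothetical names
made up for illustration, and the real kernel sets the active bit
through a different path than shown here. It only demonstrates why a
late online bit without an active bit lets cpu_down() proceed while
the bound thread ends up on CPU0.

```c
#include <stdbool.h>

/* Hypothetical model of the two per-CPU bits involved in the failure. */
struct cpu_masks {
    unsigned online;  /* bit n set: CPU n has marked itself online */
    unsigned active;  /* bit n set: scheduler may place tasks on CPU n */
};

/* Models bring-up: both bits end up set only if the secondary set its
 * online bit before the caller's timeout expired. */
static bool model_cpu_up(struct cpu_masks *m, int cpu, bool online_in_time)
{
    if (!online_in_time)
        return false;           /* timed out: caller treats CPU as failed */
    m->online |= 1u << cpu;
    m->active |= 1u << cpu;
    return true;
}

/* Models the secondary finishing its startup after the timeout. */
static void model_late_secondary(struct cpu_masks *m, int cpu)
{
    m->online |= 1u << cpu;     /* nobody is left to set the active bit */
}

/* Models the precondition of cpu_down(): only the online bit is checked. */
static bool model_cpu_down_allowed(const struct cpu_masks *m, int cpu)
{
    return (m->online & (1u << cpu)) != 0;
}

/* Models thread placement: fall back to CPU0 if the target is not active. */
static int model_place_thread(const struct cpu_masks *m, int cpu)
{
    return (m->active & (1u << cpu)) ? cpu : 0;
}
```

Starting from CPU0 online and active, a timed-out model_cpu_up() for
CPU1 followed by model_late_secondary() leaves CPU1 online but not
active, so model_cpu_down_allowed() is true while model_place_thread()
returns 0, which is the mismatch the BUG_ON catches.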

I have a patch that addresses this in two ways:
1) I only wait with timeouts to make sure the CPU registers the SIPI
and enters secondary_start_kernel. After that we're in "safe"
territory and it doesn't matter how long the rest of initialization
takes. That keeps the timed part of the wait deterministic.

2) Additionally, I ensure that if the CPU comes up later than it was
supposed to (it shouldn't, but...), it will not start initializing
behind cpu_up's back (which cannot really be undone). This solves the
problem with both cpu_up+secondary_start_kernel races and
platform_cpu_kill+secondary_start_kernel races.
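
The two points above can be sketched as a small handshake. This is
not the attached patch; it is a userspace illustration using C11
atomics, and struct boot_slot, the BOOT_* states and all function
names are assumptions invented for the example. The idea it shows:
the boot CPU and the secondary both try to move the slot out of the
"requested" state, exactly one of them wins, and a secondary that
loses (because it arrived after the timeout) must not initialize.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical boot states for one secondary CPU. */
enum { BOOT_IDLE, BOOT_REQUESTED, BOOT_ACKED, BOOT_CANCELLED, BOOT_ONLINE };

struct boot_slot {
    atomic_int state;
};

/* Boot CPU: publish the request. Real code would then kick the
 * secondary by platform-specific means and start the timed wait. */
static void request_boot(struct boot_slot *s)
{
    atomic_store(&s->state, BOOT_REQUESTED);
}

/* Secondary: first thing on entry. Returns true only if the request is
 * still pending; a late or spurious arrival gets false and must park. */
static bool secondary_entry(struct boot_slot *s)
{
    int expected = BOOT_REQUESTED;
    return atomic_compare_exchange_strong(&s->state, &expected, BOOT_ACKED);
}

/* Secondary: the rest of initialization may take arbitrarily long. */
static void secondary_finish_init(struct boot_slot *s)
{
    atomic_store(&s->state, BOOT_ONLINE);
}

/* Boot CPU: called when the timed wait ends. If the request is still
 * pending it is cancelled atomically; returns true if the secondary
 * had already acknowledged, so waiting without a timeout is safe. */
static bool boot_acked_or_cancel(struct boot_slot *s)
{
    int expected = BOOT_REQUESTED;
    if (atomic_compare_exchange_strong(&s->state, &expected, BOOT_CANCELLED))
        return false;           /* never acked: attempt cancelled */
    return expected == BOOT_ACKED || expected == BOOT_ONLINE;
}
```

Because both sides use a compare-and-swap from the same state, there
is no window in which the boot CPU gives up and the secondary still
proceeds, which is the property point 2) is after.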

Tested this by injecting mdelays into secondary_start_kernel (before
clearing the booting bit and after), and by putting a while(1) into
secondary_start_kernel at different points.

There were concerns brought up that this patch might conflict with all
the ARM SMP work going into .38, so I wanted to check with the list
first.

Thank You,
A
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARM-Make-SMP-init-more-resilient-to-failures.patch
Type: text/x-patch
Size: 4678 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20101215/ec26c93b/attachment.bin>

