From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: peterz@infradead.org, mingo@kernel.org, prarit@redhat.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH -tip] fix race between stop_two_cpus and stop_cpus
Date: Fri, 01 Nov 2013 07:36:36 -0400
Message-ID: <52739244.3060209@redhat.com>
In-Reply-To: <20131101110825.GX2400@suse.de>

On 11/01/2013 07:08 AM, Mel Gorman wrote:
> On Thu, Oct 31, 2013 at 04:31:44PM -0400, Rik van Riel wrote:
>> There is a race between stop_two_cpus and the global stop_cpus.
>>
>
> What was the trigger for this? I want to see what was missing from my own
> testing. I'm going to go out on a limb and guess that CPU hotplug was also
> running in the background to specifically stress this sort of rare condition.
> Something like running a standard test with the monitors/watch-cpuoffline.sh
> from mmtests running in parallel.
AFAIK the trigger was a test that continuously loads and
unloads kernel modules, while doing other stuff.
>> It is possible for two CPUs to get their stopper functions queued
>> "backwards" from one another, resulting in the stopper threads
>> getting stuck, and the system hanging. This can happen because
>> queuing up stoppers is not synchronized.
>>
>> This patch adds synchronization between stop_cpus (a rare operation)
>> and stop_two_cpus.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> ---
>> Prarit is running a test with this patch. By now the kernel would have
>> crashed already, yet it is still going. I expect Prarit will add his
>> Tested-by: some time tomorrow morning.
>>
>> kernel/stop_machine.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index 32a6c44..46cb4c2 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -40,8 +40,10 @@ struct cpu_stopper {
>> };
>>
>> static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
>> +static DEFINE_PER_CPU(bool, stop_two_cpus_queueing);
>> static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
>> static bool stop_machine_initialized = false;
>> +static bool stop_cpus_queueing = false;
>>
>> static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
>> {
>> @@ -261,16 +263,37 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
>> cpu_stop_init_done(&done, 2);
>> set_state(&msdata, MULTI_STOP_PREPARE);
>>
>> + wait_for_global:
>> + /* If a global stop_cpus is queuing up stoppers, wait. */
>> + while (unlikely(stop_cpus_queueing))
>> + cpu_relax();
>> +
>
> This partially serialises callers to migrate_swap() while it is checked
> if the pair of CPUs are being affected at the moment. It's two-stage

Not really. This only serializes migrate_swap if there is a global
stop_cpus underway.

If there is no global stop_cpus, migrate_swap will continue the way
it did before, without locking.

> locking. The global lock is short-lived while the per-cpu data is updated
> and the per-cpu values allow a degree of parallelisation on call_cpu which
> could not be done with a spinlock held anyway. Why not make protection
> of the initial update a normal spinlock? i.e.
>
> spin_lock(&stop_cpus_queue_lock);
> this_cpu_write(stop_two_cpus_queueing, true);
> spin_unlock(&stop_cpus_queue_lock);

Because that would result in all migrate_swap instances serializing
with each other.
--
All rights reversed