All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org,
	Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@kernel.org>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [RFC][PATCH] sched: Start stopper early
Date: Wed, 7 Oct 2015 15:20:07 +0200	[thread overview]
Message-ID: <20151007132007.GA24540@redhat.com> (raw)
In-Reply-To: <20151007123852.GH17308@twins.programming.kicks-ass.net>

On 10/07, Peter Zijlstra wrote:
>
> On Wed, Oct 07, 2015 at 02:30:46PM +0200, Oleg Nesterov wrote:
> > On 10/07, Peter Zijlstra wrote:
> > >
> > > So Heiko reported some 'interesting' fail where stop_two_cpus() got
> > > stuck in multi_cpu_stop() with one cpu waiting for another that never
> > > happens.
> > >
> > > It _looks_ like the 'other' cpu isn't running and the current best
> > > theory is that we race on cpu-up and get the stop_two_cpus() call in
> > > before the stopper task is running.
> > >
> > > This _is_ possible because we set 'online && active'
> >
> > Argh. Can't really comment this change right now, but this reminds me
> > that stop_two_cpus() path should not rely on cpu_active() at all. I mean
> > we should not use this check to avoid the deadlock, migrate_swap_stop()
> > can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should
> > be replaced by BUG_ON().
> >
> > Probably slightly off-topic, but what do you finally think about the old
> > "[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()"
> > we discussed in http://marc.info/?t=143750670300014 ?
> >
> > I won't really insist if you still dislike it, but it seems we both
> > agree that "lg_lock stop_cpus_lock" must die in any case, and after that
> > we can the cleanups mentioned above.
>
> Yes, I was looking at that, this issue reminded me we still had that
> issue open.

Great, thanks!

But let me add that I tried to confuse you because I forgot what actually
I was going to do... I meant something like the (incomplete) patch below,
and after that we can change stop_two_cpus() to rely on ->enabled and
remove the cpu_active() checks (again, ignoring the fact we do not want
to migrate to inactive CPU). Although I need to recall/recheck this all,
perhaps I missed something...

So while I think we should kill lg_lock in any case, this and the patch
above is absolutely off-topic, we can do this with or without lg_lock
removal.

Oleg.


--- x/kernel/cpu.c
+++ x/kernel/cpu.c
@@ -344,7 +344,7 @@ static int take_cpu_down(void *_param)
 	/* Give up timekeeping duties */
 	tick_handover_do_timer();
 	/* Park the stopper thread */
-	kthread_park(current);
+	stop_machine_park(param->hcpu);
 	return 0;
 }
 
--- x/kernel/stop_machine.c
+++ x/kernel/stop_machine.c
@@ -452,6 +452,15 @@ repeat:
 	}
 }
 
+void stop_machine_park(int cpu)
+{
+	struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+	spin_lock(&stopper->lock);
+	stopper->enabled = false;
+	spin_unlock(&stopper->lock);
+}
+
 extern void sched_set_stop_task(int cpu, struct task_struct *stop);
 
 static void cpu_stop_create(unsigned int cpu)
@@ -468,10 +477,10 @@ static void cpu_stop_park(unsigned int c
 	/* drain remaining works */
 	spin_lock_irqsave(&stopper->lock, flags);
 	list_for_each_entry_safe(work, tmp, &stopper->works, list) {
+		WARN_ON(1);
 		list_del_init(&work->list);
 		cpu_stop_signal_done(work->done, false);
 	}
-	stopper->enabled = false;
 	spin_unlock_irqrestore(&stopper->lock, flags);
 }
 


  reply	other threads:[~2015-10-07 13:23 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-07  8:41 [RFC][PATCH] sched: Start stopper early Peter Zijlstra
2015-10-07 12:30 ` Oleg Nesterov
2015-10-07 12:38   ` Peter Zijlstra
2015-10-07 13:20     ` Oleg Nesterov [this message]
2015-10-07 13:24       ` Oleg Nesterov
2015-10-07 13:36       ` kbuild test robot
2015-10-08 14:50 ` [PATCH 0/3] (Was: [RFC][PATCH] sched: Start stopper early) Oleg Nesterov
2015-10-08 14:51   ` [PATCH 1/3] stop_machine: ensure that a queued callback will be called before cpu_stop_park() Oleg Nesterov
2015-10-14 15:34     ` Peter Zijlstra
2015-10-14 19:03       ` Oleg Nesterov
2015-10-14 20:32         ` Peter Zijlstra
2015-10-15 17:02           ` Oleg Nesterov
2015-10-16 10:49             ` Peter Zijlstra
2015-10-20  9:32     ` [tip:sched/core] stop_machine: Ensure " tip-bot for Oleg Nesterov
2015-10-08 14:51   ` [PATCH 2/3] stop_machine: introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works() Oleg Nesterov
2015-10-20  9:33     ` [tip:sched/core] stop_machine: Introduce " tip-bot for Oleg Nesterov
2015-10-08 14:51   ` [PATCH 3/3] stop_machine: change cpu_stop_queue_two_works() to rely on stopper->enabled Oleg Nesterov
2015-10-08 15:04     ` Peter Zijlstra
2015-10-08 15:59       ` Oleg Nesterov
2015-10-08 16:08         ` Oleg Nesterov
2015-10-08 17:01     ` [PATCH v2 " Oleg Nesterov
2015-10-09 16:37       ` Peter Zijlstra
2015-10-09 16:40         ` Oleg Nesterov
2015-10-20  9:33       ` [tip:sched/core] stop_machine: Change " tip-bot for Oleg Nesterov
2015-10-08 18:05 ` [RFC][PATCH] sched: Start stopper early Oleg Nesterov
2015-10-08 18:47   ` Oleg Nesterov
2015-10-09 16:00 ` [PATCH 0/3] make stopper threads more "selfparking" Oleg Nesterov
2015-10-09 16:00   ` [PATCH 1/3] stop_machine: kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() Oleg Nesterov
2015-10-20  9:33     ` [tip:sched/core] stop_machine: Kill smp_hotplug_thread-> pre_unpark, " tip-bot for Oleg Nesterov
2015-10-09 16:00   ` [PATCH 2/3] stop_machine: kill cpu_stop_threads->setup() and cpu_stop_unpark() Oleg Nesterov
2015-10-20  9:34     ` [tip:sched/core] stop_machine: Kill " tip-bot for Oleg Nesterov
2015-10-09 16:00   ` [PATCH 3/3] sched: start stopper early Oleg Nesterov
2015-10-09 16:49     ` Oleg Nesterov
2015-10-20  9:34     ` [tip:sched/core] sched: Start " tip-bot for Peter Zijlstra
2015-10-16  8:22 ` [RFC][PATCH] " Heiko Carstens
2015-10-16  9:57   ` Peter Zijlstra
2015-10-16 12:01     ` Heiko Carstens
2015-10-26 14:24       ` Michael Holzheu
2015-10-26 20:20         ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151007132007.GA24540@redhat.com \
    --to=oleg@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.