From: Gautham R Shenoy <ego@in.ibm.com>
To: Gregory Haskins <ghaskins@novell.com>
Cc: suresh.b.siddha@intel.com, rjw@sisk.pl,
akpm@linux-foundation.org, dmitry.adamushko@gmail.com,
mingo@elte.hu, oleg@sign.ru, yi.y.yang@intel.com,
linux-kernel@vger.kernel.org, tglx@linutronix.de
Subject: Re: [PATCH] adjust root-domain->online span in response to hotplug event
Date: Mon, 10 Mar 2008 13:44:26 +0530 [thread overview]
Message-ID: <20080310081425.GA11031@in.ibm.com> (raw)
In-Reply-To: <20080308050627.4831.87630.stgit@novell1.haskins.net>
On Sat, Mar 08, 2008 at 12:10:15AM -0500, Gregory Haskins wrote:
> Suresh Siddha wrote:
> > On Sat, Mar 08, 2008 at 12:43:15AM +0100, Rafael J. Wysocki wrote:
> >> On Saturday, 8 of March 2008, Andrew Morton wrote:
> >>> On Fri, 7 Mar 2008 15:01:26 -0800
> >>> Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> >>>
> >>>> Andrew, Please check if the appended patch fixes your power-off
> problem aswell.
> >>>> ...
> >>>>
> >>>> --- a/kernel/sched.c
> >>>> +++ b/kernel/sched.c
> >>>> @@ -5882,6 +5882,7 @@ migration_call(struct notifier_block *nfb,
> unsigned long action, void *hcpu)
> >>>> break;
> >>>>
> >>>> case CPU_DOWN_PREPARE:
> >>>> + case CPU_DOWN_PREPARE_FROZEN:
> >>>> /* Update our root-domain */
> >>>> rq = cpu_rq(cpu);
> >>>> spin_lock_irqsave(&rq->lock, flags);
> >>> No, it does not.
> >> Well, this is a bug nevertheless.
> >>
> >
> > Well, my previous root cause needs some small changes.
> >
> > During the notifier call chain for CPU_DOWN(till 'update_sched_domains'
> > is called atleast), all the cpu's are attached to 'def_root_domain',
> for whom
> > online mask still has the offline cpu.
> >
> > This is because, during CPU_DOWN_PREPARE, migration_call() first clears
> > the root_domain->online, and later during the DOWN_PREPARE call chain
> > detach_destroy_domains() attach to def_root_domain with
> cpu_online_map(which
> > still has the just about to die 'cpu' set).
> >
> > So essentially, during the notifier call chain of CPU_DOWN (before
> > 'update_sched_domains' is called atleast), any one doing RT process
> > wakeup's (for example: kthread_stop()) can still end up on the dead cpu.
> >
> > Andrew, Can you please try one more patch(appended) to see if it helps?
> >
> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> > ---
> >
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index 0a6d2e5..8cb707c 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -597,7 +597,7 @@ static int find_lowest_cpus(struct task_struct
> *task, cpumask_t *lowest_mask)
> > int count = 0;
> > int cpu;
> >
> > - cpus_and(*lowest_mask, task_rq(task)->rd->online, task->cpus_allowed);
> > + cpus_and(*lowest_mask, task->cpus_allowed, cpu_online_map);
> >
> > /*
>
> Hi Suresh,
> Unfortunately, this patch will introduce its own set of bugs.
> However, your analysis was spot-on. I think I see the problem now. It
> was introduced when I put a hack in to "fix" s2ram problems in -mm as a
> result of the new root-domain logic. I think the following patch will
> fix both issues:
>
> (I verified that I could take a cpu offline/online, but I don't have an
> s2ram compatible machine handy. Andrew, I believe you could reproduce
> the s2ram problem a few months ago when that issue popped up. So if you
> could, please verify that s2ram also works with this patch applied, in
> addition to the hotplug problem.
>
> Regards,
> -Greg
>
> -------------------------------------------------
>
> adjust root-domain->online span in response to hotplug event
>
> We currently set the root-domain online span automatically when the domain
> is added to the cpu if the cpu is already a member of cpu_online_map.
> This was done as a hack/bug-fix for s2ram, but it also causes a problem
> with hotplug CPU_DOWN transitioning. The right way to fix the original
> problem is to actually respond to CPU_UP events, instead of CPU_ONLINE,
> which is already too late.
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> ---
>
> kernel/sched.c | 18 +++++++-----------
> 1 files changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 52b9867..b02e4fc 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5813,6 +5813,13 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> /* Must be high prio: stop_machine expects to yield to it. */
> rq = task_rq_lock(p, &flags);
> __setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
> +
> + /* Update our root-domain */
> + if (rq->rd) {
> + BUG_ON(!cpu_isset(cpu, rq->rd->span));
> + cpu_set(cpu, rq->rd->online);
> + }
Hi Greg,
Suppose someone issues a wakeup at some point between CPU_UP_PREPARE
and CPU_ONLINE, then isn't there a possibility that the task could
be woken up on the cpu which has not yet come online ?
Because at this point in find_lowest_cpus()
cpus_and(*lowest_mask, task_rq(task)->rd->online, task->cpus_allowed);
the cpu which has not yet come online is set in both rd->online map
and the task->cpus_allowed.
I wonder if assigning a priority to the update_sched_domains() notifier
so that it's called immediately after migration_call() would solve the
problem.
> +
> task_rq_unlock(rq, &flags);
> cpu_rq(cpu)->migration_thread = p;
> break;
> @@ -5821,15 +5828,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> case CPU_ONLINE_FROZEN:
> /* Strictly unnecessary, as first user will wake it. */
> wake_up_process(cpu_rq(cpu)->migration_thread);
> -
> - /* Update our root-domain */
> - rq = cpu_rq(cpu);
> - spin_lock_irqsave(&rq->lock, flags);
> - if (rq->rd) {
> - BUG_ON(!cpu_isset(cpu, rq->rd->span));
> - cpu_set(cpu, rq->rd->online);
> - }
> - spin_unlock_irqrestore(&rq->lock, flags);
> break;
>
> #ifdef CONFIG_HOTPLUG_CPU
> @@ -6105,8 +6103,6 @@ static void rq_attach_root(struct rq *rq, struct root_domain *rd)
> rq->rd = rd;
>
> cpu_set(rq->cpu, rd->span);
> - if (cpu_isset(rq->cpu, cpu_online_map))
> - cpu_set(rq->cpu, rd->online);
>
> for (class = sched_class_highest; class; class = class->next) {
> if (class->join_domain)
--
Thanks and Regards
gautham
next prev parent reply other threads:[~2008-03-10 8:14 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-03-02 18:42 [BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked when cpu is set to offline Yi Yang
2008-03-03 11:54 ` Dmitry Adamushko
2008-03-03 11:56 ` Ingo Molnar
2008-03-03 12:02 ` Dmitry Adamushko
2008-03-03 14:53 ` Yi Yang
2008-03-03 17:37 ` Yi Yang
2008-03-03 15:31 ` Gautham R Shenoy
2008-03-03 14:45 ` Yi Yang
2008-03-04 5:26 ` Gautham R Shenoy
2008-03-04 9:09 ` Gautham R Shenoy
2008-03-03 21:56 ` Yi Yang
2008-03-04 15:01 ` Oleg Nesterov
2008-03-04 14:37 ` Yi Yang
2008-03-06 20:05 ` Yi Yang
2008-03-05 10:05 ` Gautham R Shenoy
2008-03-05 13:53 ` Oleg Nesterov
2008-03-06 11:15 ` Gautham R Shenoy
2008-03-06 12:22 ` Gautham R Shenoy
2008-03-06 13:44 ` Gautham R Shenoy
2008-03-07 2:54 ` Oleg Nesterov
2008-03-07 9:10 ` Gautham R Shenoy
2008-03-07 10:51 ` Gautham R Shenoy
2008-03-06 23:20 ` Yi Yang
2008-03-07 13:02 ` Dmitry Adamushko
2008-03-07 13:55 ` Gautham R Shenoy
2008-03-07 15:50 ` Gautham R Shenoy
2008-03-07 19:14 ` [BUG 2.6.25-rc3] scheduler/hotplug: some processes aredealocked " Suresh Siddha
2008-03-07 20:18 ` [BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked " Andrew Morton
2008-03-07 21:36 ` Rafael J. Wysocki
2008-03-07 23:01 ` Suresh Siddha
2008-03-07 23:29 ` Andrew Morton
2008-03-07 23:43 ` Rafael J. Wysocki
2008-03-08 1:50 ` Suresh Siddha
2008-03-08 2:09 ` Andrew Morton
2008-03-08 5:10 ` [PATCH] adjust root-domain->online span in response to hotplug event Gregory Haskins
2008-03-08 8:41 ` Ingo Molnar
2008-03-08 17:50 ` [PATCH] adjust root-domain->online span in response to hotplugevent Gregory Haskins
2008-03-09 0:31 ` Dmitry Adamushko
2008-03-10 14:12 ` Gregory Haskins
2008-03-09 2:35 ` [PATCH] adjust root-domain->online span in response to hotplug event Suresh Siddha
2008-03-10 12:41 ` Gregory Haskins
2008-03-10 8:14 ` Gautham R Shenoy [this message]
2008-03-10 13:13 ` [PATCH] cpu-hotplug: Register update_sched_domains() notifier with higher prio Gautham R Shenoy
2008-03-10 22:25 ` Andrew Morton
2008-03-10 13:39 ` [PATCH] keep rd->online and cpu_online_map in sync Gregory Haskins
2008-03-10 14:21 ` Gautham R Shenoy
2008-03-10 18:12 ` Suresh Siddha
2008-03-10 22:03 ` Rafael J. Wysocki
2008-03-10 22:00 ` Gregory Haskins
2008-03-10 22:10 ` Suresh Siddha
2008-03-10 21:59 ` [PATCH v2] " Gregory Haskins
2008-03-10 23:36 ` Andrew Morton
2008-03-11 1:34 ` Suresh Siddha
2008-03-11 4:39 ` Gautham R Shenoy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080310081425.GA11031@in.ibm.com \
--to=ego@in.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=dmitry.adamushko@gmail.com \
--cc=ghaskins@novell.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=oleg@sign.ru \
--cc=rjw@sisk.pl \
--cc=suresh.b.siddha@intel.com \
--cc=tglx@linutronix.de \
--cc=yi.y.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.