All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Gregory Haskins" <ghaskins@novell.com>
To: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Ingo Molnar" <mingo@elte.hu>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@linutonix.de>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>,
	<linux-kernel@vger.kernel.org>, <linux-rt-users@vger.kernel.org>
Subject: Re: [PATCH] sched: fix cpupri hotplug support
Date: Wed, 04 Jun 2008 08:28:53 -0600	[thread overview]
Message-ID: <48466E65.BA47.005A.0@novell.com> (raw)
In-Reply-To: <1212580796.23439.10.camel@twins>

>>> On Wed, Jun 4, 2008 at  7:59 AM, in message <1212580796.23439.10.camel@twins>,
Peter Zijlstra <peterz@infradead.org> wrote: 
> On Tue, 2008-06-03 at 23:24 -0400, Gregory Haskins wrote:
>> Hi Ingo, Steven, Thomas,
>>   This patch is critical to fix a snafu in the root-domain/cpupri code.  It
>>   applies to 25.4-rt4, but as mentioned below it affects most recent -rt
>>   kernels as well as sched-devel.  If you accept this patch, and you have
>>   non-trivial merge issues with other flavors just let me know and I will 
> port
>>   it to those kernels as well.
>> 
>> Regards
>> -Greg
>> 
>> --------------------------------
>> sched: fix cpupri hotplug support
>> 
>> The RT folks over at RedHat found an issue w.r.t. hotplug support which
>> was traced to problems with the cpupri infrastructure in the scheduler:
>> 
>> https://bugzilla.redhat.com/show_bug.cgi?id=449676
>> 
>> This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel.  This patch
>> applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
>> kernels mentioned above.
>> 
>> It turned out that the issue was that offline cpus could get inadvertently
>> registered with cpupri so that they were erroneously selected during
>> migration decisions.  The end result would be an OOPS as the offline cpu
>> had tasks routed to it.
> 
> This is not what I saw, I traced it to select_task_rq_rt() selecting a
> cpu that never was online. In my case cpu 2 on a dual core.

First off, Peter, I didnt realize you were involved too.  So I apologize for omitting you in the credits. :(

Note that despite the reproduction case difference, I think we are still talking about the same basic problem here...and I think your observation would also be fixed by this patch.

The issue is that a cpu that is not online is being set to something other than INVALID in cpupri, so it comes up as a valid target during the various migration decisions.  This should now be fixed.


> 
>> This patch adds a new per-sched-class pair of interfaces:
>> online/offline. They allow for the sched-class to register for
>> notification when a run-queue is taken online or offline.  Additionally,
>> the existing join/leave-domain code has been unified behind the new
>> online/offline handlers so that they do the right thing w.r.t. the online
>> status of a particular CPU.
>> 
>> I was able to easily reproduce the issue prior to this patch, and am no
>> longer able to reproduce it after this patch.  I can offline cpus
>> indefinately and everything seems to be in working order.
>> 
>> Thanks to Arnaldo (acme) and Thomas (tglx) for doing the legwork to point
>> me in the right direction. 
>> 
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> CC: Ingo Molnar <mingo@elte.hu>
>> CC: Thomas Gleixner <tglx@linutonix.de>
>> CC: Steven Rostedt <rostedt@goodmis.org>
>> CC: Arnaldo Carvalho de Melo <acme@redhat.com>
>> ---
>> 
>>  include/linux/sched.h |    2 ++
>>  kernel/sched.c        |   19 +++++++++++++++++++
>>  kernel/sched_rt.c     |   16 ++++++++++++----
>>  3 files changed, 33 insertions(+), 4 deletions(-)
>> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index bb71371..9c5b8c9 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -970,6 +970,8 @@ struct sched_class {
>>  
>>  	void (*join_domain)(struct rq *rq);
>>  	void (*leave_domain)(struct rq *rq);
>> +	void (*online)(struct rq *rq);
>> +	void (*offline)(struct rq *rq);
> 
> Rather unfortunate to add yet another two callbacks, can't this be done
> with a creative use of leave/join?

Yeah, I was debating how to do this.  In fact, as you can see the implementation of join/online and leave/offline is identical.  I wasnt sure if putting a join/leave in the hotplug code made sense, so I added a new callback that maps to the same logic.  Perhaps what I should have done was to s/join/online and s/leave/offline and fixed the root-domain code to use the online/offline variant.  I like this better so I will put out a v2 patch in a few minutes.

> 
>>  	void (*switched_from) (struct rq *this_rq, struct task_struct *task,
>>  			       int running);
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 0399411..fe96984 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -6278,6 +6278,9 @@ static void unregister_sched_domain_sysctl(void)
>>  }
>>  #endif
>>  
>> +#define for_each_class(class) \
>> +   for (class = sched_class_highest; class; class = class->next)
>> +
>>  /*
>>   * migration_call - callback that gets triggered when a CPU is added.
>>   * Here we can start up the necessary migration thread for the new CPU.
>> @@ -6314,8 +6317,15 @@ migration_call(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>>  		rq = cpu_rq(cpu);
>>  		spin_lock_irqsave(&rq->lock, flags);
>>  		if (rq->rd) {
>> +			const struct sched_class *class;
>> +
>>  			BUG_ON(!cpu_isset(cpu, rq->rd->span));
>>  			cpu_set(cpu, rq->rd->online);
>> +
>> +			for_each_class(class) {
>> +				if (class->online)
>> +					class->online(rq);
>> +			}
>>  		}
>>  		spin_unlock_irqrestore(&rq->lock, flags);
>>  		break;
>> @@ -6375,8 +6385,17 @@ migration_call(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>>  		rq = cpu_rq(cpu);
>>  		spin_lock_irqsave(&rq->lock, flags);
>>  		if (rq->rd) {
>> +			const struct sched_class *class;
>> +
>>  			BUG_ON(!cpu_isset(cpu, rq->rd->span));
>> +
>> +			for_each_class(class) {
>> +				if (class->offline)
>> +					class->offline(rq);
>> +			}
>> +
>>  			cpu_clear(cpu, rq->rd->online);
>> +
>>  		}
>>  		spin_unlock_irqrestore(&rq->lock, flags);
>>  		break;
> 
> I'm thinking this should be done in update_sched_domains().

Hmm...I think I agree with you.  I will add this to v2 as well.

> 
>> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
>> index 425cf01..8baf2e8 100644
>> --- a/kernel/sched_rt.c
>> +++ b/kernel/sched_rt.c
>> @@ -1101,8 +1101,11 @@ static void set_cpus_allowed_rt(struct task_struct *p, 
> cpumask_t *new_mask)
>>  }
>>  
>>  /* Assumes rq->lock is held */
>> -static void join_domain_rt(struct rq *rq)
>> +static void rq_online_rt(struct rq *rq)
>>  {
>> +	if (!cpu_isset(rq->cpu, rq->rd->online))
>> +		return;
>> +
>>  	if (rq->rt.overloaded)
>>  		rt_set_overload(rq);
>>  
>> @@ -1110,8 +1113,11 @@ static void join_domain_rt(struct rq *rq)
>>  }
>>  
>>  /* Assumes rq->lock is held */
>> -static void leave_domain_rt(struct rq *rq)
>> +static void rq_offline_rt(struct rq *rq)
>>  {
>> +	if (!cpu_isset(rq->cpu, rq->rd->online))
>> +		return;
>> +
>>  	if (rq->rt.overloaded)
>>  		rt_clear_overload(rq);
> 
> So you now fully remove the leave/join calls because switching between
> root domains doesn't need to migrate the overload status between those
> domains? - seems unlikely.

I think you are misinterpreting that part.  leave/join is fully functional and intact.  I just merged the two implementations to call one function (see the sched_class assignment below)

> 
>> @@ -1278,8 +1284,10 @@ const struct sched_class rt_sched_class = {
>>  	.load_balance		= load_balance_rt,
>>  	.move_one_task		= move_one_task_rt,
>>  	.set_cpus_allowed       = set_cpus_allowed_rt,
>> -	.join_domain            = join_domain_rt,
>> -	.leave_domain           = leave_domain_rt,
>> +	.join_domain            = rq_online_rt,
>> +	.leave_domain           = rq_offline_rt,
>> +	.online                 = rq_online_rt,
>> +	.offline                = rq_offline_rt,
>>  	.pre_schedule		= pre_schedule_rt,
>>  	.post_schedule		= post_schedule_rt,
>>  	.task_wake_up		= task_wake_up_rt,



  reply	other threads:[~2008-06-04 14:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-04  3:24 [PATCH] sched: fix cpupri hotplug support Gregory Haskins
2008-06-04 11:59 ` Peter Zijlstra
2008-06-04 14:28   ` Gregory Haskins [this message]
2008-06-04 14:42     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48466E65.BA47.005A.0@novell.com \
    --to=ghaskins@novell.com \
    --cc=acme@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutonix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.