All of lore.kernel.org
 help / color / mirror / Atom feed
From: Clark Williams <williams@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH v2] sched: add a tuning knob to allow changing RR tmeslice
Date: Fri, 25 Jan 2013 12:02:56 -0600	[thread overview]
Message-ID: <20130125120256.52733b51@redhat.com> (raw)
In-Reply-To: <20130125081916.GC25314@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 10417 bytes --]

On Fri, 25 Jan 2013 09:19:16 +0100
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Clark Williams <williams@redhat.com> wrote:
> 
> > On Thu, 24 Jan 2013 17:59:55 +0100
> > Ingo Molnar <mingo@kernel.org> wrote:
> > 
> > > 
> > > * Clark Williams <williams@redhat.com> wrote:
> > > 
> > > > This version stores the user-input value in a separate 
> > > > location from the jiffies values used by the scheduler, to 
> > > > prevent a race condition.
> > > > 
> > > > Subject: [PATCH v2] sched: add a tuning knob to allow changing 
> > > > RR timeslice
> > > 
> > > looks useful.
> > > 
> > > > @@ -2010,7 +2010,7 @@ static void task_tick_rt(struct rq *rq, struct
> > > > task_struct *p, int queued) if (--p->rt.time_slice)
> > > >  		return;
> > > >  
> > > > -	p->rt.time_slice = RR_TIMESLICE;
> > > > +	p->rt.time_slice = sched_rr_timeslice;
> > > >  
> > > >  	/*
> > > >  	 * Requeue to the end of queue if we (and all of our
> > > > ancestors) are the @@ -2041,7 +2041,7 @@ static unsigned int
> > > > get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> > > >  	 * Time slice is 0 for SCHED_FIFO tasks
> > > 
> > > Patch wont apply due to patch corruption, alas.
> > > 
> > > 
> > 
> > Easily fixed. Modified for 3.8-rc4:
> 
> Thanks. Some more substantial review feedback this time around:
> 
> > 
> > commit 0e2d40c5c84d06670f85cc212591f27f69f59c62
> > Author: Clark Williams <williams@redhat.com>
> > Date:   Thu Jan 24 13:51:01 2013 -0600
> > 
> >     [kernel] sched: add a tuning knob to allow changing RR timeslice
> >     
> >     Add a /proc/sys/kernel scheduler knob named sched_rr_timeslice_ms
> 
> s/Add a /proc/sys/kernel/sched_rr_timeslice_ms scheduler knob
> 
> >     that allows global changing of the SCHED_RR timeslice value. User
> >     visable value is in milliseconds but is stored as jiffies.  Setting
> 
> s/visible
> 
> >     to 0 (zero) resets to the default (currently 100ms).
> >     
> >     Signed-off-by: Clark Williams <williams@redhat.com>
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 6fc8f45..d803690 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2082,6 +2082,11 @@ int sched_rt_handler(struct ctl_table *table, int write,
> >  		void __user *buffer, size_t *lenp,
> >  		loff_t *ppos);
> >  
> > +extern int sched_rr_timeslice;
> > +extern int sched_rr_handler(struct ctl_table *table, int write,
> > +		void __user *buffer, size_t *lenp,
> > +		loff_t *ppos);
> > +
> 
> Shouldn't this be in kernel/sched/sched.h instead of the 
> (already too large) linux/sched.h?
> 

I don't think that will work be cause kernel/sysctl.c needs the
externs and I doubt we want to include kernel/sched/sched.h from there. 

> >  #ifdef CONFIG_SCHED_AUTOGROUP
> >  extern unsigned int sysctl_sched_autogroup_enabled;
> >  
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 257002c..5675074 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7507,6 +7507,24 @@ static int sched_rt_global_constraints(void)
> >  }
> >  #endif /* CONFIG_RT_GROUP_SCHED */
> >  
> > +int sched_rr_handler(struct ctl_table *table, int write,
> > +		void __user *buffer, size_t *lenp,
> > +		loff_t *ppos)
> > +{
> > +	int ret;
> > +	static DEFINE_MUTEX(mutex);
> 
> This mutex should be outside the function (not in local scope), 
> and named like this, with an explanation:
> 
> +/*
> + * Since it's an int the CPU will always read a full word
> + * of the RR timeslice interval - no need for locking.
> + *
> + * But in the RR handler we read the value multiple times
> + * before setting it, which should be protected - hence
> + * this mutex:
> + */
> +static DEFINE_MUTEX(rr_timeslice_mutex);
> 

Done.

> 
> > +
> > +	mutex_lock(&mutex);
> > +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
> > +	/* make sure that internally we keep jiffies */
> > +	/* also, writing zero resets timeslice to default */
> > +	if (!ret && write) 
> > +		sched_rr_timeslice = sched_rr_timeslice <= 0 ? 
> > +			RR_TIMESLICE : msecs_to_jiffies(sched_rr_timeslice);
> > +	mutex_unlock(&mutex);
> > +	return ret;
> 
> A couple of stray spaces at end of lines. Also, please put curly 
> braces around multi-line statements.
> 

I didn't put curly braces there because "technically" that's a single
line statement. But since in fact it is multi-line, I've added them :).

> Multi-line comments should be like this:
> 
>   /*
>    * Comment .....
>    * ...... goes here.
>    */
> 

Done.

> 
> > +}
> > +
> >  int sched_rt_handler(struct ctl_table *table, int write,
> >  		void __user *buffer, size_t *lenp,
> >  		loff_t *ppos)
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 418feb0..6c54e83 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -11,6 +11,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
> >  
> >  struct rt_bandwidth def_rt_bandwidth;
> >  
> > +int sched_rr_timeslice = RR_TIMESLICE;
> > +
> 
> I think this could go into core.c as well, together with the 
> mutex. That way it's easier to see why the mutex is needed as 
> well.
> 
> It should also be __read_mostly.

Done and done. 

> 
> >  static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
> >  {
> >  	struct rt_bandwidth *rt_b =
> > @@ -2010,7 +2012,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
> >  	if (--p->rt.time_slice)
> >  		return;
> >  
> > -	p->rt.time_slice = RR_TIMESLICE;
> > +	p->rt.time_slice = sched_rr_timeslice;
> >  
> >  	/*
> >  	 * Requeue to the end of queue if we (and all of our ancestors) are the
> > @@ -2041,7 +2043,7 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
> >  	 * Time slice is 0 for SCHED_FIFO tasks
> >  	 */
> >  	if (task->policy == SCHED_RR)
> > -		return RR_TIMESLICE;
> > +		return sched_rr_timeslice;
> >  	else
> >  		return 0;
> >  }
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index c88878d..1eabf86 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -403,6 +403,14 @@ static struct ctl_table kern_table[] = {
> >  		.mode		= 0644,
> >  		.proc_handler	= sched_rt_handler,
> >  	},
> > +	{
> > +		.procname	= "sched_rr_timeslice_ms",
> > +		.data		= &sched_rr_timeslice,
> > +		.maxlen		= sizeof(int),
> > +		.mode		= 0644,
> > +		.proc_handler	= sched_rr_handler,
> 
> Does this allow negative values? Wouldn't it be better to make 
> it unsigned int all around?

Good point. Changed to unsigned int. 

Updated patch:

commit 93dfbf6326cc4ba85c917f9440203f9fc19e9bcc
Author: Clark Williams <williams@redhat.com>
Date:   Thu Jan 24 13:51:01 2013 -0600

    [kernel] sched: add a tuning knob to allow changing RR timeslice
    
    Add a /proc/sys/kernel/sched_rr_timeslice_ms tuning knob
    that allows global changing of the SCHED_RR timeslice value. User
    visible value is in milliseconds but is stored as jiffies.  Setting
    to 0 (zero) resets to the default (currently 100ms).
    
    Signed-off-by: Clark Williams <williams@redhat.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6fc8f45..5d0a7cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1226,6 +1226,11 @@ struct sched_rt_entity {
  */
 #define RR_TIMESLICE		(100 * HZ / 1000)
 
+extern __read_mostly unsigned int sched_rr_timeslice;
+
+extern int sched_rr_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos);
 struct rcu_node;
 
 enum perf_event_task_context {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 257002c..0f7e6a2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7507,6 +7507,43 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
+/*
+ * Since it's an int the CPU will always read a full word
+ * of the RR timeslice interval - no need for locking.
+ *
+ * But in the RR handler we read the value multiple times
+ * before setting it, which should be protected - hence
+ * this mutex:
+ */
+static DEFINE_MUTEX(rr_timeslice_mutex);
+
+unsigned int __read_mostly sched_rr_timeslice = RR_TIMESLICE;
+
+/*
+ * manage /proc/sys/kernel/sched_rr_timeslice_ms entry
+ * for changing SCHED_RR quantum interval
+ */
+
+int sched_rr_handler(struct ctl_table *table, int write,
+		void __user *buffer, size_t *lenp,
+		loff_t *ppos)
+{
+	int ret;
+
+	mutex_lock(&rr_timeslice_mutex);
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	/*
+	 * make sure that internally we keep jiffies
+	 * also, writing zero resets timeslice to default
+	 */
+	if (!ret && write)  {
+		sched_rr_timeslice = sched_rr_timeslice == 0 ?
+			RR_TIMESLICE : msecs_to_jiffies(sched_rr_timeslice);
+	}
+	mutex_unlock(&rr_timeslice_mutex);
+	return ret;
+}
+
 int sched_rt_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
 		loff_t *ppos)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 418feb0..71aa6d0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2010,7 +2010,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	if (--p->rt.time_slice)
 		return;
 
-	p->rt.time_slice = RR_TIMESLICE;
+	p->rt.time_slice = sched_rr_timeslice;
 
 	/*
 	 * Requeue to the end of queue if we (and all of our ancestors) are the
@@ -2041,7 +2041,7 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 	 * Time slice is 0 for SCHED_FIFO tasks
 	 */
 	if (task->policy == SCHED_RR)
-		return RR_TIMESLICE;
+		return sched_rr_timeslice;
 	else
 		return 0;
 }
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c88878d..6770fc8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -403,6 +403,14 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= sched_rt_handler,
 	},
+	{
+		.procname	= "sched_rr_timeslice_ms",
+		.data		= &sched_rr_timeslice,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_rr_handler,
+	},
+
 #ifdef CONFIG_SCHED_AUTOGROUP
 	{
 		.procname	= "sched_autogroup_enabled",




[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

  reply	other threads:[~2013-01-25 18:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-09  1:51 [PATCH v2] sched: add a tuning knob to allow changing RR tmeslice Clark Williams
2013-01-24 16:59 ` Ingo Molnar
2013-01-24 19:54   ` Clark Williams
2013-01-25  8:19     ` Ingo Molnar
2013-01-25 18:02       ` Clark Williams [this message]
2013-01-25 18:36         ` Ingo Molnar
2013-01-25 21:43           ` Clark Williams
2013-01-26 12:16             ` Ingo Molnar
2013-01-26 12:18             ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130125120256.52733b51@redhat.com \
    --to=williams@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.