public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up
@ 2016-02-03 16:56 Steven Rostedt
  2016-02-03 16:56 ` [RFC][PATCH 1/3 v2] sched: Move sched_feature file setup into debug.c Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Steven Rostedt @ 2016-02-03 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Clark Williams, Andrew Morton

I'm starting to play with SCHED_DEADLINE a bit and I'm able to cause
a bandwidth "leak". Then I realized there's no way to examine what bandwidths
are enabled on which CPUs. I added the bandwith ratios to the
/proc/sched_debug file.

I will be posting the SCHED_DEADLINE issue in a separate thread.

# grep dl /proc/sched_debug         
dl_rq[0]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0
dl_rq[1]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 104857
dl_rq[2]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0
dl_rq[3]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0
dl_rq[4]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0
dl_rq[5]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0
dl_rq[6]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0
dl_rq[7]:
  .dl_nr_running                 : 0
  .dl_bw->bw                     : 996147
  .dl_bw->total_bw               : 0

Before adding this code, I also realized there was a bit of 
SCHED_DEBUG code in the kernel/sched/core.c file, and decided to move that
to kernel/sched/debug.c to clean the core.c file up a bit. Those patches
are mostly orthognal to the deadline_bw file, but decided to group them
together here.

Changes since v1:

  o Added the bandwith to /proc/sched_debug instead of creating a new file.

  o Removed patch: sched: Move sched_domain_debug into debug.c
      As it was decided that debug.c shouldn't have domain debugging.

Steven Rostedt (Red Hat) (3):
      sched: Move sched_feature file setup into debug.c
      sched: Move sched_domain_sysctl to debug.c
      sched: Add bandwidth ratio to /proc/sched_debug

----
 kernel/sched/core.c  | 311 --------------------------------------------------
 kernel/sched/debug.c | 313 +++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  13 +++
 3 files changed, 326 insertions(+), 311 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC][PATCH 1/3 v2] sched: Move sched_feature file setup into debug.c
  2016-02-03 16:56 [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
@ 2016-02-03 16:56 ` Steven Rostedt
  2016-02-03 16:57 ` [RFC][PATCH 2/3 v2] sched: Move sched_domain_sysctl to debug.c Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2016-02-03 16:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Clark Williams, Andrew Morton

[-- Attachment #1: 0001-sched-Move-sched_feature-file-setup-into-debug.c.patch --]
[-- Type: text/plain, Size: 6768 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

As sched_feature is only created when SCHED_DEBUG is enabled, and the file
debug.c is only compiled when SCHED_DEBUG is enabled, it makes sense to move
sched_feature setup into that file and get rid of the #ifdef.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/core.c  | 133 ---------------------------------------------------
 kernel/sched/debug.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 131 insertions(+), 133 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 63d3a24e081a..c49e64c4675e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -66,7 +66,6 @@
 #include <linux/pagemap.h>
 #include <linux/hrtimer.h>
 #include <linux/tick.h>
-#include <linux/debugfs.h>
 #include <linux/ctype.h>
 #include <linux/ftrace.h>
 #include <linux/slab.h>
@@ -124,138 +123,6 @@ const_debug unsigned int sysctl_sched_features =
 
 #undef SCHED_FEAT
 
-#ifdef CONFIG_SCHED_DEBUG
-#define SCHED_FEAT(name, enabled)	\
-	#name ,
-
-static const char * const sched_feat_names[] = {
-#include "features.h"
-};
-
-#undef SCHED_FEAT
-
-static int sched_feat_show(struct seq_file *m, void *v)
-{
-	int i;
-
-	for (i = 0; i < __SCHED_FEAT_NR; i++) {
-		if (!(sysctl_sched_features & (1UL << i)))
-			seq_puts(m, "NO_");
-		seq_printf(m, "%s ", sched_feat_names[i]);
-	}
-	seq_puts(m, "\n");
-
-	return 0;
-}
-
-#ifdef HAVE_JUMP_LABEL
-
-#define jump_label_key__true  STATIC_KEY_INIT_TRUE
-#define jump_label_key__false STATIC_KEY_INIT_FALSE
-
-#define SCHED_FEAT(name, enabled)	\
-	jump_label_key__##enabled ,
-
-struct static_key sched_feat_keys[__SCHED_FEAT_NR] = {
-#include "features.h"
-};
-
-#undef SCHED_FEAT
-
-static void sched_feat_disable(int i)
-{
-	static_key_disable(&sched_feat_keys[i]);
-}
-
-static void sched_feat_enable(int i)
-{
-	static_key_enable(&sched_feat_keys[i]);
-}
-#else
-static void sched_feat_disable(int i) { };
-static void sched_feat_enable(int i) { };
-#endif /* HAVE_JUMP_LABEL */
-
-static int sched_feat_set(char *cmp)
-{
-	int i;
-	int neg = 0;
-
-	if (strncmp(cmp, "NO_", 3) == 0) {
-		neg = 1;
-		cmp += 3;
-	}
-
-	for (i = 0; i < __SCHED_FEAT_NR; i++) {
-		if (strcmp(cmp, sched_feat_names[i]) == 0) {
-			if (neg) {
-				sysctl_sched_features &= ~(1UL << i);
-				sched_feat_disable(i);
-			} else {
-				sysctl_sched_features |= (1UL << i);
-				sched_feat_enable(i);
-			}
-			break;
-		}
-	}
-
-	return i;
-}
-
-static ssize_t
-sched_feat_write(struct file *filp, const char __user *ubuf,
-		size_t cnt, loff_t *ppos)
-{
-	char buf[64];
-	char *cmp;
-	int i;
-	struct inode *inode;
-
-	if (cnt > 63)
-		cnt = 63;
-
-	if (copy_from_user(&buf, ubuf, cnt))
-		return -EFAULT;
-
-	buf[cnt] = 0;
-	cmp = strstrip(buf);
-
-	/* Ensure the static_key remains in a consistent state */
-	inode = file_inode(filp);
-	inode_lock(inode);
-	i = sched_feat_set(cmp);
-	inode_unlock(inode);
-	if (i == __SCHED_FEAT_NR)
-		return -EINVAL;
-
-	*ppos += cnt;
-
-	return cnt;
-}
-
-static int sched_feat_open(struct inode *inode, struct file *filp)
-{
-	return single_open(filp, sched_feat_show, NULL);
-}
-
-static const struct file_operations sched_feat_fops = {
-	.open		= sched_feat_open,
-	.write		= sched_feat_write,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= single_release,
-};
-
-static __init int sched_init_debug(void)
-{
-	debugfs_create_file("sched_features", 0644, NULL, NULL,
-			&sched_feat_fops);
-
-	return 0;
-}
-late_initcall(sched_init_debug);
-#endif /* CONFIG_SCHED_DEBUG */
-
 /*
  * Number of tasks to iterate in a single balance run.
  * Limited because this is done with IRQs disabled.
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 641511771ae6..593178be8124 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -16,6 +16,7 @@
 #include <linux/kallsyms.h>
 #include <linux/utsname.h>
 #include <linux/mempolicy.h>
+#include <linux/debugfs.h>
 
 #include "sched.h"
 
@@ -58,6 +59,136 @@ static unsigned long nsec_low(unsigned long long nsec)
 
 #define SPLIT_NS(x) nsec_high(x), nsec_low(x)
 
+#define SCHED_FEAT(name, enabled)	\
+	#name ,
+
+static const char * const sched_feat_names[] = {
+#include "features.h"
+};
+
+#undef SCHED_FEAT
+
+static int sched_feat_show(struct seq_file *m, void *v)
+{
+	int i;
+
+	for (i = 0; i < __SCHED_FEAT_NR; i++) {
+		if (!(sysctl_sched_features & (1UL << i)))
+			seq_puts(m, "NO_");
+		seq_printf(m, "%s ", sched_feat_names[i]);
+	}
+	seq_puts(m, "\n");
+
+	return 0;
+}
+
+#ifdef HAVE_JUMP_LABEL
+
+#define jump_label_key__true  STATIC_KEY_INIT_TRUE
+#define jump_label_key__false STATIC_KEY_INIT_FALSE
+
+#define SCHED_FEAT(name, enabled)	\
+	jump_label_key__##enabled ,
+
+struct static_key sched_feat_keys[__SCHED_FEAT_NR] = {
+#include "features.h"
+};
+
+#undef SCHED_FEAT
+
+static void sched_feat_disable(int i)
+{
+	static_key_disable(&sched_feat_keys[i]);
+}
+
+static void sched_feat_enable(int i)
+{
+	static_key_enable(&sched_feat_keys[i]);
+}
+#else
+static void sched_feat_disable(int i) { };
+static void sched_feat_enable(int i) { };
+#endif /* HAVE_JUMP_LABEL */
+
+static int sched_feat_set(char *cmp)
+{
+	int i;
+	int neg = 0;
+
+	if (strncmp(cmp, "NO_", 3) == 0) {
+		neg = 1;
+		cmp += 3;
+	}
+
+	for (i = 0; i < __SCHED_FEAT_NR; i++) {
+		if (strcmp(cmp, sched_feat_names[i]) == 0) {
+			if (neg) {
+				sysctl_sched_features &= ~(1UL << i);
+				sched_feat_disable(i);
+			} else {
+				sysctl_sched_features |= (1UL << i);
+				sched_feat_enable(i);
+			}
+			break;
+		}
+	}
+
+	return i;
+}
+
+static ssize_t
+sched_feat_write(struct file *filp, const char __user *ubuf,
+		size_t cnt, loff_t *ppos)
+{
+	char buf[64];
+	char *cmp;
+	int i;
+	struct inode *inode;
+
+	if (cnt > 63)
+		cnt = 63;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+	cmp = strstrip(buf);
+
+	/* Ensure the static_key remains in a consistent state */
+	inode = file_inode(filp);
+	inode_lock(inode);
+	i = sched_feat_set(cmp);
+	inode_unlock(inode);
+	if (i == __SCHED_FEAT_NR)
+		return -EINVAL;
+
+	*ppos += cnt;
+
+	return cnt;
+}
+
+static int sched_feat_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_feat_show, NULL);
+}
+
+static const struct file_operations sched_feat_fops = {
+	.open		= sched_feat_open,
+	.write		= sched_feat_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static __init int sched_init_debug(void)
+{
+	debugfs_create_file("sched_features", 0644, NULL, NULL,
+			&sched_feat_fops);
+
+	return 0;
+}
+late_initcall(sched_init_debug);
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group *tg)
 {
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC][PATCH 2/3 v2] sched: Move sched_domain_sysctl to debug.c
  2016-02-03 16:56 [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
  2016-02-03 16:56 ` [RFC][PATCH 1/3 v2] sched: Move sched_feature file setup into debug.c Steven Rostedt
@ 2016-02-03 16:57 ` Steven Rostedt
  2016-02-03 16:57 ` [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug Steven Rostedt
  2016-02-10 15:13 ` [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
  3 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2016-02-03 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Clark Williams, Andrew Morton

[-- Attachment #1: 0002-sched-Move-sched_domain_sysctl-to-debug.c.patch --]
[-- Type: text/plain, Size: 11900 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The sched_domain_sysctl setup is only enabled when SCHED_DEBUG is
configured. As debug.c is only compiled when SCHED_DEBUG is configured as
well, move the setup of sched_domain_sysctl into that file.

Note, the (un)register_sched_domain_sysctl() functions had to be changed
from static to allow access to them from core.c.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/core.c  | 178 ---------------------------------------------------
 kernel/sched/debug.c | 173 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |  13 ++++
 3 files changed, 186 insertions(+), 178 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c49e64c4675e..9c6d976d94f0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -70,7 +70,6 @@
 #include <linux/ftrace.h>
 #include <linux/slab.h>
 #include <linux/init_task.h>
-#include <linux/binfmts.h>
 #include <linux/context_tracking.h>
 #include <linux/compiler.h>
 
@@ -5272,183 +5271,6 @@ static void migrate_tasks(struct rq *dead_rq)
 }
 #endif /* CONFIG_HOTPLUG_CPU */
 
-#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
-
-static struct ctl_table sd_ctl_dir[] = {
-	{
-		.procname	= "sched_domain",
-		.mode		= 0555,
-	},
-	{}
-};
-
-static struct ctl_table sd_ctl_root[] = {
-	{
-		.procname	= "kernel",
-		.mode		= 0555,
-		.child		= sd_ctl_dir,
-	},
-	{}
-};
-
-static struct ctl_table *sd_alloc_ctl_entry(int n)
-{
-	struct ctl_table *entry =
-		kcalloc(n, sizeof(struct ctl_table), GFP_KERNEL);
-
-	return entry;
-}
-
-static void sd_free_ctl_entry(struct ctl_table **tablep)
-{
-	struct ctl_table *entry;
-
-	/*
-	 * In the intermediate directories, both the child directory and
-	 * procname are dynamically allocated and could fail but the mode
-	 * will always be set. In the lowest directory the names are
-	 * static strings and all have proc handlers.
-	 */
-	for (entry = *tablep; entry->mode; entry++) {
-		if (entry->child)
-			sd_free_ctl_entry(&entry->child);
-		if (entry->proc_handler == NULL)
-			kfree(entry->procname);
-	}
-
-	kfree(*tablep);
-	*tablep = NULL;
-}
-
-static int min_load_idx = 0;
-static int max_load_idx = CPU_LOAD_IDX_MAX-1;
-
-static void
-set_table_entry(struct ctl_table *entry,
-		const char *procname, void *data, int maxlen,
-		umode_t mode, proc_handler *proc_handler,
-		bool load_idx)
-{
-	entry->procname = procname;
-	entry->data = data;
-	entry->maxlen = maxlen;
-	entry->mode = mode;
-	entry->proc_handler = proc_handler;
-
-	if (load_idx) {
-		entry->extra1 = &min_load_idx;
-		entry->extra2 = &max_load_idx;
-	}
-}
-
-static struct ctl_table *
-sd_alloc_ctl_domain_table(struct sched_domain *sd)
-{
-	struct ctl_table *table = sd_alloc_ctl_entry(14);
-
-	if (table == NULL)
-		return NULL;
-
-	set_table_entry(&table[0], "min_interval", &sd->min_interval,
-		sizeof(long), 0644, proc_doulongvec_minmax, false);
-	set_table_entry(&table[1], "max_interval", &sd->max_interval,
-		sizeof(long), 0644, proc_doulongvec_minmax, false);
-	set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
-		sizeof(int), 0644, proc_dointvec_minmax, true);
-	set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[9], "cache_nice_tries",
-		&sd->cache_nice_tries,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[10], "flags", &sd->flags,
-		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[11], "max_newidle_lb_cost",
-		&sd->max_newidle_lb_cost,
-		sizeof(long), 0644, proc_doulongvec_minmax, false);
-	set_table_entry(&table[12], "name", sd->name,
-		CORENAME_MAX_SIZE, 0444, proc_dostring, false);
-	/* &table[13] is terminator */
-
-	return table;
-}
-
-static struct ctl_table *sd_alloc_ctl_cpu_table(int cpu)
-{
-	struct ctl_table *entry, *table;
-	struct sched_domain *sd;
-	int domain_num = 0, i;
-	char buf[32];
-
-	for_each_domain(cpu, sd)
-		domain_num++;
-	entry = table = sd_alloc_ctl_entry(domain_num + 1);
-	if (table == NULL)
-		return NULL;
-
-	i = 0;
-	for_each_domain(cpu, sd) {
-		snprintf(buf, 32, "domain%d", i);
-		entry->procname = kstrdup(buf, GFP_KERNEL);
-		entry->mode = 0555;
-		entry->child = sd_alloc_ctl_domain_table(sd);
-		entry++;
-		i++;
-	}
-	return table;
-}
-
-static struct ctl_table_header *sd_sysctl_header;
-static void register_sched_domain_sysctl(void)
-{
-	int i, cpu_num = num_possible_cpus();
-	struct ctl_table *entry = sd_alloc_ctl_entry(cpu_num + 1);
-	char buf[32];
-
-	WARN_ON(sd_ctl_dir[0].child);
-	sd_ctl_dir[0].child = entry;
-
-	if (entry == NULL)
-		return;
-
-	for_each_possible_cpu(i) {
-		snprintf(buf, 32, "cpu%d", i);
-		entry->procname = kstrdup(buf, GFP_KERNEL);
-		entry->mode = 0555;
-		entry->child = sd_alloc_ctl_cpu_table(i);
-		entry++;
-	}
-
-	WARN_ON(sd_sysctl_header);
-	sd_sysctl_header = register_sysctl_table(sd_ctl_root);
-}
-
-/* may be called multiple times per register */
-static void unregister_sched_domain_sysctl(void)
-{
-	unregister_sysctl_table(sd_sysctl_header);
-	sd_sysctl_header = NULL;
-	if (sd_ctl_dir[0].child)
-		sd_free_ctl_entry(&sd_ctl_dir[0].child);
-}
-#else
-static void register_sched_domain_sysctl(void)
-{
-}
-static void unregister_sched_domain_sysctl(void)
-{
-}
-#endif /* CONFIG_SCHED_DEBUG && CONFIG_SYSCTL */
-
 static void set_rq_online(struct rq *rq)
 {
 	if (!rq->online) {
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 593178be8124..a5625e793d15 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -189,6 +189,179 @@ static __init int sched_init_debug(void)
 }
 late_initcall(sched_init_debug);
 
+#ifdef CONFIG_SMP
+
+#ifdef CONFIG_SYSCTL
+
+static struct ctl_table sd_ctl_dir[] = {
+	{
+		.procname	= "sched_domain",
+		.mode		= 0555,
+	},
+	{}
+};
+
+static struct ctl_table sd_ctl_root[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= sd_ctl_dir,
+	},
+	{}
+};
+
+static struct ctl_table *sd_alloc_ctl_entry(int n)
+{
+	struct ctl_table *entry =
+		kcalloc(n, sizeof(struct ctl_table), GFP_KERNEL);
+
+	return entry;
+}
+
+static void sd_free_ctl_entry(struct ctl_table **tablep)
+{
+	struct ctl_table *entry;
+
+	/*
+	 * In the intermediate directories, both the child directory and
+	 * procname are dynamically allocated and could fail but the mode
+	 * will always be set. In the lowest directory the names are
+	 * static strings and all have proc handlers.
+	 */
+	for (entry = *tablep; entry->mode; entry++) {
+		if (entry->child)
+			sd_free_ctl_entry(&entry->child);
+		if (entry->proc_handler == NULL)
+			kfree(entry->procname);
+	}
+
+	kfree(*tablep);
+	*tablep = NULL;
+}
+
+static int min_load_idx = 0;
+static int max_load_idx = CPU_LOAD_IDX_MAX-1;
+
+static void
+set_table_entry(struct ctl_table *entry,
+		const char *procname, void *data, int maxlen,
+		umode_t mode, proc_handler *proc_handler,
+		bool load_idx)
+{
+	entry->procname = procname;
+	entry->data = data;
+	entry->maxlen = maxlen;
+	entry->mode = mode;
+	entry->proc_handler = proc_handler;
+
+	if (load_idx) {
+		entry->extra1 = &min_load_idx;
+		entry->extra2 = &max_load_idx;
+	}
+}
+
+static struct ctl_table *
+sd_alloc_ctl_domain_table(struct sched_domain *sd)
+{
+	struct ctl_table *table = sd_alloc_ctl_entry(14);
+
+	if (table == NULL)
+		return NULL;
+
+	set_table_entry(&table[0], "min_interval", &sd->min_interval,
+		sizeof(long), 0644, proc_doulongvec_minmax, false);
+	set_table_entry(&table[1], "max_interval", &sd->max_interval,
+		sizeof(long), 0644, proc_doulongvec_minmax, false);
+	set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
+		sizeof(int), 0644, proc_dointvec_minmax, true);
+	set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
+		sizeof(int), 0644, proc_dointvec_minmax, true);
+	set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
+		sizeof(int), 0644, proc_dointvec_minmax, true);
+	set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
+		sizeof(int), 0644, proc_dointvec_minmax, true);
+	set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
+		sizeof(int), 0644, proc_dointvec_minmax, true);
+	set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
+		sizeof(int), 0644, proc_dointvec_minmax, false);
+	set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
+		sizeof(int), 0644, proc_dointvec_minmax, false);
+	set_table_entry(&table[9], "cache_nice_tries",
+		&sd->cache_nice_tries,
+		sizeof(int), 0644, proc_dointvec_minmax, false);
+	set_table_entry(&table[10], "flags", &sd->flags,
+		sizeof(int), 0644, proc_dointvec_minmax, false);
+	set_table_entry(&table[11], "max_newidle_lb_cost",
+		&sd->max_newidle_lb_cost,
+		sizeof(long), 0644, proc_doulongvec_minmax, false);
+	set_table_entry(&table[12], "name", sd->name,
+		CORENAME_MAX_SIZE, 0444, proc_dostring, false);
+	/* &table[13] is terminator */
+
+	return table;
+}
+
+static struct ctl_table *sd_alloc_ctl_cpu_table(int cpu)
+{
+	struct ctl_table *entry, *table;
+	struct sched_domain *sd;
+	int domain_num = 0, i;
+	char buf[32];
+
+	for_each_domain(cpu, sd)
+		domain_num++;
+	entry = table = sd_alloc_ctl_entry(domain_num + 1);
+	if (table == NULL)
+		return NULL;
+
+	i = 0;
+	for_each_domain(cpu, sd) {
+		snprintf(buf, 32, "domain%d", i);
+		entry->procname = kstrdup(buf, GFP_KERNEL);
+		entry->mode = 0555;
+		entry->child = sd_alloc_ctl_domain_table(sd);
+		entry++;
+		i++;
+	}
+	return table;
+}
+
+static struct ctl_table_header *sd_sysctl_header;
+void register_sched_domain_sysctl(void)
+{
+	int i, cpu_num = num_possible_cpus();
+	struct ctl_table *entry = sd_alloc_ctl_entry(cpu_num + 1);
+	char buf[32];
+
+	WARN_ON(sd_ctl_dir[0].child);
+	sd_ctl_dir[0].child = entry;
+
+	if (entry == NULL)
+		return;
+
+	for_each_possible_cpu(i) {
+		snprintf(buf, 32, "cpu%d", i);
+		entry->procname = kstrdup(buf, GFP_KERNEL);
+		entry->mode = 0555;
+		entry->child = sd_alloc_ctl_cpu_table(i);
+		entry++;
+	}
+
+	WARN_ON(sd_sysctl_header);
+	sd_sysctl_header = register_sysctl_table(sd_ctl_root);
+}
+
+/* may be called multiple times per register */
+void unregister_sched_domain_sysctl(void)
+{
+	unregister_sysctl_table(sd_sysctl_header);
+	sd_sysctl_header = NULL;
+	if (sd_ctl_dir[0].child)
+		sd_free_ctl_entry(&sd_ctl_dir[0].child);
+}
+#endif /* CONFIG_SYSCTL */
+#endif /* CONFIG_SMP */
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group *tg)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10f16374df7f..b146f5298cb1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3,6 +3,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/sched/rt.h>
 #include <linux/sched/deadline.h>
+#include <linux/binfmts.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 #include <linux/stop_machine.h>
@@ -909,6 +910,18 @@ static inline unsigned int group_first_cpu(struct sched_group *group)
 
 extern int group_balance_cpu(struct sched_group *sg);
 
+#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
+void register_sched_domain_sysctl(void);
+void unregister_sched_domain_sysctl(void);
+#else
+static inline void register_sched_domain_sysctl(void)
+{
+}
+static inline void unregister_sched_domain_sysctl(void)
+{
+}
+#endif
+
 #else
 
 static inline void sched_ttwu_pending(void) { }
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug
  2016-02-03 16:56 [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
  2016-02-03 16:56 ` [RFC][PATCH 1/3 v2] sched: Move sched_feature file setup into debug.c Steven Rostedt
  2016-02-03 16:57 ` [RFC][PATCH 2/3 v2] sched: Move sched_domain_sysctl to debug.c Steven Rostedt
@ 2016-02-03 16:57 ` Steven Rostedt
  2016-02-03 17:56   ` Juri Lelli
  2016-02-10 15:13 ` [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
  3 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2016-02-03 16:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Clark Williams, Andrew Morton

[-- Attachment #1: 0003-sched-Add-bandwidth-ratio-to-proc-sched_debug.patch --]
[-- Type: text/plain, Size: 4010 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Playing with SCHED_DEADLINE and cpusets, I found that I was unable to create
new SCHED_DEADLINE tasks, with the error of EBUSY as if the bandwidth was
already used up. I then realized there wa no way to see what bandwidth is
used by the runqueues to debug the issue.

By adding the dl_bw->bw and dl_bw->total_bw to the output of the deadline
info in /proc/sched_debug, this allows us to see what bandwidth has been
reserved and where a problem may exist.

For example, before the issue we see the ratio of the bandwidth:

 # cat /proc/sys/kernel/sched_rt_runtime_us
 950000
 # cat /proc/sys/kernel/sched_rt_period_us
 1000000

  # grep dl /proc/sched_debug
  dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[1]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[2]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[3]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[4]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[5]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[6]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0
  dl_rq[7]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 0

Note: (950000 / 1000000) << 20 == 996147

After I played with cpusets and hit the issue, the result is now:

  # grep dl /proc/sched_debug
  dl_rq[0]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[1]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 104857
  dl_rq[2]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 104857
  dl_rq[3]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : 104857
  dl_rq[4]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[5]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[6]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857
  dl_rq[7]:
    .dl_nr_running                 : 0
    .dl_bw->bw                     : 996147
    .dl_bw->total_bw               : -104857

This shows that there is definitely a problem as we should never have a
negative total bandwidth.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/sched/debug.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a5625e793d15..a1f161a7500e 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -562,8 +562,17 @@ void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
 
 void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq)
 {
+	struct dl_bw *dl_bw;
+
 	SEQ_printf(m, "\ndl_rq[%d]:\n", cpu);
 	SEQ_printf(m, "  .%-30s: %ld\n", "dl_nr_running", dl_rq->dl_nr_running);
+#ifdef CONFIG_SMP
+	dl_bw = &cpu_rq(cpu)->rd->dl_bw;
+#else
+	dl_bw = &dl_rq->dl_bw;
+#endif
+	SEQ_printf(m, "  .%-30s: %lld\n", "dl_bw->bw", dl_bw->bw);
+	SEQ_printf(m, "  .%-30s: %lld\n", "dl_bw->total_bw", dl_bw->total_bw);
 }
 
 extern __read_mostly int sched_clock_running;
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug
  2016-02-03 16:57 ` [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug Steven Rostedt
@ 2016-02-03 17:56   ` Juri Lelli
  2016-02-03 18:10     ` Steven Rostedt
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Lelli @ 2016-02-03 17:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Juri Lelli, Clark Williams, Andrew Morton

Hi,

On 03/02/16 11:57, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> Playing with SCHED_DEADLINE and cpusets, I found that I was unable to create
> new SCHED_DEADLINE tasks, with the error of EBUSY as if the bandwidth was
> already used up. I then realized there wa no way to see what bandwidth is
> used by the runqueues to debug the issue.
> 
> By adding the dl_bw->bw and dl_bw->total_bw to the output of the deadline
> info in /proc/sched_debug, this allows us to see what bandwidth has been
> reserved and where a problem may exist.
> 
> For example, before the issue we see the ratio of the bandwidth:
> 
>  # cat /proc/sys/kernel/sched_rt_runtime_us
>  950000
>  # cat /proc/sys/kernel/sched_rt_period_us
>  1000000
> 
>   # grep dl /proc/sched_debug
>   dl_rq[0]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[1]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[2]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[3]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[4]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[5]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[6]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>   dl_rq[7]:
>     .dl_nr_running                 : 0
>     .dl_bw->bw                     : 996147
>     .dl_bw->total_bw               : 0
>

I think this is already quite useful. But, do you think we can also
display information about the root_domain (like the span for example)?
Or do we have that some place else already?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug
  2016-02-03 17:56   ` Juri Lelli
@ 2016-02-03 18:10     ` Steven Rostedt
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2016-02-03 18:10 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Juri Lelli, Clark Williams, Andrew Morton

On Wed, 3 Feb 2016 17:56:46 +0000
Juri Lelli <juri.lelli@arm.com> wrote:


> I think this is already quite useful. But, do you think we can also
> display information about the root_domain (like the span for example)?
> Or do we have that some place else already?

That's something I was planning on adding to that debugfs/sched
directory. But it may already be present. Perhaps it can be added to
the sched_debug file too if it isn't already there.

But that's a question for Peter.

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up
  2016-02-03 16:56 [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
                   ` (2 preceding siblings ...)
  2016-02-03 16:57 ` [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug Steven Rostedt
@ 2016-02-10 15:13 ` Steven Rostedt
  3 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2016-02-10 15:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Juri Lelli,
	Clark Williams, Andrew Morton

On Wed, 03 Feb 2016 11:56:58 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> I'm starting to play with SCHED_DEADLINE a bit and I'm able to cause
> a bandwidth "leak". Then I realized there's no way to examine what bandwidths
> are enabled on which CPUs. I added the bandwith ratios to the
> /proc/sched_debug file.
> 

ping?

-- Steve

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-02-10 15:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-03 16:56 [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt
2016-02-03 16:56 ` [RFC][PATCH 1/3 v2] sched: Move sched_feature file setup into debug.c Steven Rostedt
2016-02-03 16:57 ` [RFC][PATCH 2/3 v2] sched: Move sched_domain_sysctl to debug.c Steven Rostedt
2016-02-03 16:57 ` [RFC][PATCH 3/3 v2] sched: Add bandwidth ratio to /proc/sched_debug Steven Rostedt
2016-02-03 17:56   ` Juri Lelli
2016-02-03 18:10     ` Steven Rostedt
2016-02-10 15:13 ` [RFC][PATCH 0/3 v2] sched: Display deadline bandwidth and other SCHED_DEBUG clean up Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox