* RFC: cgroups aware proc
From: Marian Marinov @ 2014-01-04  4:28 UTC
To: Daniel P. Berrange, Serge Hallyn
Cc: lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 749 bytes --]

Happy new year, guys.

I need to have /proc cgroups aware, as I want to have LXC containers that
see only the resources that are given to them.

In order to do that I had to patch the kernel. I decided to start with
cpuinfo, stat and interrupts, and then continue with meminfo and loadavg.

I managed to patch the kernel (Linux 3.12.0) and make /proc/cpuinfo,
/proc/stat and /proc/interrupts cgroups aware.

Attached are the patches that make the necessary changes.

The change for /proc/cpuinfo and /proc/interrupts is currently done only
for the x86 arch, but I will patch the rest of the architectures if the
style of the patches is acceptable.

Tomorrow I will check if the patches apply and build with the latest
kernel.

Best regards,
Marian

[-- Attachment #2: 0001-arch-x86-kernel-cpu-proc.c-Make-proc-cpuinfo-display.patch --]
[-- Type: text/x-patch, Size: 1585 bytes --]

From 94891538f4a6a6b57aab0a2b917589ba73adfad9 Mon Sep 17 00:00:00 2001
From: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
Date: Sat, 4 Jan 2014 05:45:42 +0200
Subject: [PATCH 1/2] arch/x86/kernel/cpu/proc.c: Make /proc/cpuinfo display
 cpu information relative only to the current cgroup

- added the linux/cgroup.h include because it is needed for
  cpumask_test_cpu()
- added a task_struct to c_start()
- added a loop that skips all CPUs that are not part of the current
  cgroup, using the cpus_allowed mask from the task_struct

Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
---
 arch/x86/kernel/cpu/proc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index aee6317..d9e9fb6 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -3,6 +3,7 @@
 #include <linux/string.h>
 #include <linux/seq_file.h>
 #include <linux/cpufreq.h>
+#include <linux/cgroup.h>
 
 /*
  * Get CPU information for use by the procfs.
@@ -133,9 +134,19 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 
 static void *c_start(struct seq_file *m, loff_t *pos)
 {
+	struct task_struct *tsk;
 	*pos = cpumask_next(*pos - 1, cpu_online_mask);
-	if ((*pos) < nr_cpu_ids)
+	tsk = current_thread_info()->task;
+	if ((*pos) < nr_cpu_ids) {
+		if (tsk != NULL) {
+			while (cpumask_test_cpu((*pos), &tsk->cpus_allowed) == 0) {
+				(*pos)++;
+				if ((*pos) >= nr_cpu_ids)
+					return NULL;
+			}
+		}
 		return &cpu_data(*pos);
+	}
 	return NULL;
 }
-- 
1.8.4

[-- Attachment #3: 0001-arch-x86-kernel-irq.c-Made-proc-interrupts-to-be-cgr.patch --]
[-- Type: text/x-patch, Size: 5885 bytes --]

From ff68f073cb90316baa78936ff219a155788e29c2 Mon Sep 17 00:00:00 2001
From: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
Date: Sat, 4 Jan 2014 06:10:24 +0200
Subject: [PATCH 1/1] arch/x86/kernel/irq.c: Make /proc/interrupts cgroups
 aware - print only the CPUs that are part of the current cgroup

- added code to handle the Kconfig options
- added code to skip all CPUs that are not part of the current cgroup,
  using the task_struct's cpus_allowed mask

Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
---
 arch/x86/kernel/irq.c | 74 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 60 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 22d0687..b0a17c0 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -54,75 +54,121 @@ void ack_bad_irq(unsigned int irq)
 int arch_show_interrupts(struct seq_file *p, int prec)
 {
 	int j;
+#ifdef CONFIG_CPUSETS
+	struct task_struct *tsk;
+	tsk = current_thread_info()->task;
+#endif
 
 	seq_printf(p, "%*s: ", prec, "NMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
 	seq_printf(p, "  Non-maskable interrupts\n");
 #ifdef CONFIG_X86_LOCAL_APIC
 	seq_printf(p, "%*s: ", prec, "LOC");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);
 	seq_printf(p, "  Local timer interrupts\n");
 	seq_printf(p, "%*s: ", prec, "SPU");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_spurious_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->irq_spurious_count);
 	seq_printf(p, "  Spurious interrupts\n");
 	seq_printf(p, "%*s: ", prec, "PMI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_perf_irqs);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->apic_perf_irqs);
 	seq_printf(p, "  Performance monitoring interrupts\n");
 	seq_printf(p, "%*s: ", prec, "IWI");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->apic_irq_work_irqs);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->apic_irq_work_irqs);
 	seq_printf(p, "  IRQ work interrupts\n");
 	seq_printf(p, "%*s: ", prec, "RTR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->icr_read_retry_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->icr_read_retry_count);
 	seq_printf(p, "  APIC ICR read retries\n");
 #endif
 	if (x86_platform_ipi_callback) {
 		seq_printf(p, "%*s: ", prec, "PLT");
 		for_each_online_cpu(j)
-			seq_printf(p, "%10u ", irq_stats(j)->x86_platform_ipis);
+#ifdef CONFIG_CPUSETS
+			if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+				seq_printf(p, "%10u ", irq_stats(j)->x86_platform_ipis);
 		seq_printf(p, "  Platform interrupts\n");
 	}
 #ifdef CONFIG_SMP
 	seq_printf(p, "%*s: ", prec, "RES");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_resched_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->irq_resched_count);
 	seq_printf(p, "  Rescheduling interrupts\n");
 	seq_printf(p, "%*s: ", prec, "CAL");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_call_count -
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->irq_call_count -
 					irq_stats(j)->irq_tlb_count);
 	seq_printf(p, "  Function call interrupts\n");
 	seq_printf(p, "%*s: ", prec, "TLB");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_tlb_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->irq_tlb_count);
 	seq_printf(p, "  TLB shootdowns\n");
 #endif
 #ifdef CONFIG_X86_THERMAL_VECTOR
 	seq_printf(p, "%*s: ", prec, "TRM");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->irq_thermal_count);
 	seq_printf(p, "  Thermal event interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE_THRESHOLD
 	seq_printf(p, "%*s: ", prec, "THR");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", irq_stats(j)->irq_threshold_count);
 	seq_printf(p, "  Threshold APIC interrupts\n");
 #endif
 #ifdef CONFIG_X86_MCE
 	seq_printf(p, "%*s: ", prec, "MCE");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(mce_exception_count, j));
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", per_cpu(mce_exception_count, j));
 	seq_printf(p, "  Machine check exceptions\n");
 	seq_printf(p, "%*s: ", prec, "MCP");
 	for_each_online_cpu(j)
-		seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
+#endif
+			seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
 	seq_printf(p, "  Machine check polls\n");
 #endif
 	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
-- 
1.8.4

[-- Attachment #4: 0001-fs-proc-stat.c-kernel-sched-stats.c-List-only-the-CP.patch --]
[-- Type: text/x-patch, Size: 2798 bytes --]

From 00af9f7b5eeef770d0da240a6bf2064a2ba11e47 Mon Sep 17 00:00:00 2001
From: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
Date: Sat, 4 Jan 2014 06:03:11 +0200
Subject: [PATCH 1/1] fs/proc/stat.c & kernel/sched/stats.c: List only the
 CPUs that are in the current cpuset

- added a check that displays cpu information only if the cpu is part of
  the current cpuset, using the task_struct's cpus_allowed mask

Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
---
 fs/proc/stat.c       | 14 ++++++++++++++
 kernel/sched/stats.c |  9 +++++++++
 2 files changed, 23 insertions(+)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 1cf86c0..e5ca3ef 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -87,6 +87,11 @@ static int show_stat(struct seq_file *p, void *v)
 	u64 sum_softirq = 0;
 	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
 	struct timespec boottime;
+#ifdef CONFIG_CPUSETS
+	struct task_struct *tsk;
+
+	tsk = current_thread_info()->task;
+#endif
 
 	user = nice = system = idle = iowait = irq = softirq = steal = 0;
@@ -94,7 +99,12 @@ static int show_stat(struct seq_file *p, void *v)
 	getboottime(&boottime);
 	jif = boottime.tv_sec;
+
 	for_each_possible_cpu(i) {
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(i, &tsk->cpus_allowed) == 0)
+			continue;
+#endif
 		user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
 		nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
 		system += kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM];
@@ -142,6 +152,10 @@ static int show_stat(struct seq_file *p, void *v)
 		steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
 		guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
 		guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(i, &tsk->cpus_allowed) == 0)
+			continue;
+#endif
 		seq_printf(p, "cpu%d", i);
 		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
 		seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index da98af3..5897358 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -17,6 +17,10 @@ static int show_schedstat(struct seq_file *seq, void *v)
 	int cpu;
 	int mask_len = DIV_ROUND_UP(NR_CPUS, 32) * 9;
 	char *mask_str = kmalloc(mask_len, GFP_KERNEL);
+#ifdef CONFIG_CPUSETS
+	struct task_struct *tsk;
+	tsk = current_thread_info()->task;
+#endif
 
 	if (mask_str == NULL)
 		return -ENOMEM;
@@ -33,6 +37,11 @@ static int show_schedstat(struct seq_file *seq, void *v)
 		cpu = (unsigned long)(v - 2);
 		rq = cpu_rq(cpu);
+#ifdef CONFIG_CPUSETS
+		if (tsk != NULL && cpumask_test_cpu(cpu, &tsk->cpus_allowed) == 0)
+			return 0;
+#endif
+
 		/* runqueue-specific stats */
 		seq_printf(seq, "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
-- 
1.8.4

[-- Attachment #5: 0002-arch-x86-kernel-cpu-proc.c-Added-Kconfig-option-hand.patch --]
[-- Type: text/x-patch, Size: 1421 bytes --]

From dec97e6141f92109c0cd02883cff20e3f1429564 Mon Sep 17 00:00:00 2001
From: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
Date: Sat, 4 Jan 2014 05:50:03 +0200
Subject: [PATCH 2/2] arch/x86/kernel/cpu/proc.c: Add Kconfig option handling

Signed-off-by: Marian Marinov <mm-NV7Lj0SOnH0@public.gmane.org>
---
 arch/x86/kernel/cpu/proc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index d9e9fb6..114fd95 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -3,7 +3,10 @@
 #include <linux/string.h>
 #include <linux/seq_file.h>
 #include <linux/cpufreq.h>
+
+#ifdef CONFIG_CPUSETS
 #include <linux/cgroup.h>
+#endif
 
 /*
  * Get CPU information for use by the procfs.
@@ -134,10 +137,13 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 
 static void *c_start(struct seq_file *m, loff_t *pos)
 {
+#ifdef CONFIG_CPUSETS
 	struct task_struct *tsk;
+#endif
 	*pos = cpumask_next(*pos - 1, cpu_online_mask);
-	tsk = current_thread_info()->task;
 	if ((*pos) < nr_cpu_ids) {
+#ifdef CONFIG_CPUSETS
+		tsk = current_thread_info()->task;
 		if (tsk != NULL) {
 			while (cpumask_test_cpu((*pos), &tsk->cpus_allowed) == 0) {
 				(*pos)++;
@@ -145,6 +151,7 @@ static void *c_start(struct seq_file *m, loff_t *pos)
 					return NULL;
 			}
 		}
+#endif
 		return &cpu_data(*pos);
 	}
 	return NULL;
-- 
1.8.4
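A quick way to exercise these patches from userspace is to shrink the
calling task's cpus_allowed mask and then read the patched files back.
The standalone test below is a sketch, not part of the posted series; it
assumes the patches are applied and uses sched_setaffinity(), which
updates the same cpus_allowed mask the patches consult:

	/* cpuinfo_test.c -- hypothetical check for the cgroups-aware /proc
	 * patches: restrict this task to CPU 0, then dump /proc/cpuinfo,
	 * which should then describe only CPU 0. */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		cpu_set_t set;
		char line[512];
		FILE *f;

		CPU_ZERO(&set);
		CPU_SET(0, &set);		/* allow CPU 0 only */
		if (sched_setaffinity(0, sizeof(set), &set) != 0) {
			perror("sched_setaffinity");
			return 1;
		}

		f = fopen("/proc/cpuinfo", "r");
		if (f == NULL) {
			perror("fopen");
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);	/* expect only a "processor : 0" entry */
		fclose(f);
		return 0;
	}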
* Re: RFC: cgroups aware proc
From: Li Zefan @ 2014-01-07 11:17 UTC
To: Marian Marinov
Cc: lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Serge Hallyn

On 2014/1/5 8:12, Marian Marinov wrote:
> Happy new year, guys.
>
> I need to have /proc cgroups aware, as I want to have LXC containers
> that see only the resources that are given to them.
>
> In order to do that I had to patch the kernel. I decided to start with
> cpuinfo, stat and interrupts, and then continue with meminfo and
> loadavg.
>
> I managed to patch the kernel (Linux 3.12.0) and make /proc/cpuinfo,
> /proc/stat and /proc/interrupts cgroups aware.
>
> Attached are the patches that make the necessary changes.
>
> The change for /proc/cpuinfo and /proc/interrupts is currently done
> only for the x86 arch, but I will patch the rest of the architectures
> if the style of the patches is acceptable.
>
> Tomorrow I will check if the patches apply and build with the latest
> kernel.

People tried to do this before, but it got rejected by the upstream
maintainers, and the opinion then was to do this in userspace through
FUSE.

It seems libvirt-lxc already supports a containerized /proc/meminfo in
this way. See:
  http://libvirt.org/drvlxc.html
* Re: RFC: cgroups aware proc
From: Marian Marinov @ 2014-01-07 17:42 UTC
To: Li Zefan
Cc: lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Serge Hallyn

On 01/07/2014 01:17 PM, Li Zefan wrote:
> People tried to do this before, but it got rejected by the upstream
> maintainers, and the opinion then was to do this in userspace through
> FUSE.
>
> It seems libvirt-lxc already supports a containerized /proc/meminfo in
> this way. See:
>   http://libvirt.org/drvlxc.html

I'm well aware of the FUSE approach and of the fact that the kernel
maintainers do not accept this kind of change to the kernel, but the
simple truth is that FUSE is too heavy for this.

I'm setting up a repo on GitHub which will hold all the patches for this,
and I will keep updating it even if it is not accepted by the upstream
maintainers. I'll give you the link within a few days.

I have already finished with CPU and memory... the only thing left is
/proc/loadavg, which will take more time, but it will be done.

I hope at least some of the scheduler maintainers will give me comments
on the patches I have done.

Marian
* Re: RFC: cgroups aware proc
From: Serge Hallyn @ 2014-01-08 15:27 UTC
To: Marian Marinov
Cc: Li Zefan, lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange

Quoting Marian Marinov (mm-NV7Lj0SOnH0@public.gmane.org):
> I'm well aware of the FUSE approach and of the fact that the kernel
> maintainers do not accept this kind of change to the kernel, but the
> simple truth is that FUSE is too heavy for this.
>
> I'm setting up a repo on GitHub which will hold all the patches for
> this, [...]

Thanks, that'll be easier to look at than the in-line patches.

From my very quick look, I would recommend:

1. Coming up with some helpers to reduce the degree to which you are
   negatively affecting the flow of the existing code. Currently it looks
   like you're obfuscating it a lot, and I think you can make it so only
   a few clean lines are added per function.

   For instance, in arch_show_interrupts(), instead of plopping

	+#ifdef CONFIG_CPUSETS
	+	if (tsk != NULL && cpumask_test_cpu(j, &tsk->cpus_allowed))
	+#endif

   in several places, write

	static inline bool task_has_cpu(struct task_struct *tsk, int cpu)
	{
	#ifdef CONFIG_CPUSETS
		return tsk != NULL && cpumask_test_cpu(cpu, &tsk->cpus_allowed);
	#else
		return true;
	#endif
	}

   and then just use 'if (task_has_cpu(tsk, j))' several times.

2. Showing the performance degradation in the not-using-it case (that
   is, with cgroups enabled but in the root cpuset, for instance), which
   hopefully will be near nil.

If you can avoid confounding the readability of the code and not impact
the performance, that'll help your chances a lot.
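For illustration, one stanza of arch_show_interrupts() rewritten with
such a helper would read roughly as follows. This is a sketch, not code
from the thread; it assumes tsk is still set up once at the top of the
function, as in the posted patch:

	/* NMI stanza with the CONFIG_CPUSETS ifdef hidden in the helper */
	seq_printf(p, "%*s: ", prec, "NMI");
	for_each_online_cpu(j)
		if (task_has_cpu(tsk, j))
			seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
	seq_printf(p, "  Non-maskable interrupts\n");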
* Re: RFC: cgroups aware proc
From: Marian Marinov @ 2014-01-10 16:29 UTC
To: Serge Hallyn
Cc: Li Zefan, lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange

On 01/08/2014 05:27 PM, Serge Hallyn wrote:
> Thanks, that'll be easier to look at than the in-line patches.
>
> From my very quick look, I would recommend:
>
> 1. Coming up with some helpers to reduce the degree to which you are
>    negatively affecting the flow of the existing code. [...]
>
> 2. Showing the performance degradation in the not-using-it case (that
>    is, with cgroups enabled but in the root cpuset, for instance),
>    which hopefully will be near nil.
>
> If you can avoid confounding the readability of the code and not impact
> the performance, that'll help your chances a lot.

Thanks for the suggestions.

I have merged all of my changes into this branch:
  https://github.com/1HLtd/linux/tree/cgroup-aware-proc

I'm still working on the loadavg issue; I hope to have it finished next
week. If anyone has any suggestions for it, I would be more than happy
to hear them.

Marian
* Re: RFC: cgroups aware proc
From: Li Zefan @ 2014-01-13 3:26 UTC
To: Marian Marinov
Cc: lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Serge Hallyn

> I hope at least some of the scheduler maintainers will give me comments
> on the patches I have done.

Then you should add Peter, Ingo and LKML to your Cc list. :)
* Re: Fwd: Re: RFC: cgroups aware proc
From: Peter Zijlstra @ 2014-01-13 17:12 UTC
To: Marian Marinov
Cc: lxc-devel, cgroups, Daniel P. Berrange, Serge Hallyn, Li Zefan,
    Ingo Molnar, linux-kernel

On Mon, Jan 13, 2014 at 06:23:50PM +0200, Marian Marinov wrote:
> Hello Peter,
>
> I need help with the scheduler.
>
> I'm currently trying to patch /proc/loadavg to show the load that is
> related only to the processes from the current cgroup.
>
> I looked through the code and I was hoping that the
> tsk->sched_task_group->cfs_rq struct would give me the needed
> information, but unfortunately for me, it did not.
>
> Can you advise me how to approach this problem?

Yeah, don't :-) Really, loadavg is a stupid metric.

> I'm totally new to the scheduler code.

Luckily you won't actually have to touch much of it. Most of the actual
loadavg code lives in the first ~400 lines of kernel/sched/proc.c; read
and weep. It's one of the best-documented bits around.

Your proposition, however, is extremely expensive: you turn something
that's already expensive, O(nr_cpus), into something O(nr_cpus *
nr_cgroups).

I'm fairly sure people will not like that, especially for something of
such questionable use as the loadavg -- it's really only a pretty number
that doesn't mean all that much.

> -------- Original Message --------
> From: Li Zefan <lizefan@huawei.com>
>
> Then you should add Peter, Ingo and LKML to your Cc list. :)

You failed that, let me fix that.
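For reference, the averaging step at the heart of those ~400 lines is a
fixed-point exponential moving average. In kernels of that era it looked
roughly like the following (paraphrased, not an exact excerpt; FIXED_1 is
1.0 in the kernel's 11-bit fixed-point format, exp is the precomputed
decay factor for the 1-, 5- or 15-minute window, and active is the
sampled count of runnable plus uninterruptible tasks):

	/* one decay step: a(t) = a(t-1) * e + active * (1 - e),
	 * computed in 11-bit fixed point (FSHIFT == 11) */
	static unsigned long
	calc_load(unsigned long load, unsigned long exp, unsigned long active)
	{
		load *= exp;				/* decay the old average */
		load += active * (FIXED_1 - exp);	/* blend in the new sample */
		return load >> FSHIFT;			/* back to fixed-point scale */
	}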
* Re: RFC: cgroups aware proc
From: Marian Marinov @ 2014-01-14  0:58 UTC
To: Peter Zijlstra
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar

On 01/13/2014 07:12 PM, Peter Zijlstra wrote:
> Yeah, don't :-) Really, loadavg is a stupid metric.

Yes... stupid, but unfortunately everyone is looking at it :(

> Luckily you won't actually have to touch much of it. Most of the actual
> loadavg code lives in the first ~400 lines of kernel/sched/proc.c; read
> and weep. It's one of the best-documented bits around.

I looked through it, but I don't understand how to introduce the
per-cgroup calculation.

I looked through the headers and found the following, which is already
implemented:

  task->sched_task_group->load_avg
  task->sched_task_group->cfs_rq->load_avg
  task->sched_task_group->cfs_rq->load.weight
  task->sched_task_group->cfs_rq->runnable_load_avg

Unfortunately there is almost no documentation for these elements of the
cfs_rq and task_group structs.

It seems to me that part of the per-task-group loadavg code is already
present.

> Your proposition, however, is extremely expensive: you turn something
> that's already expensive, O(nr_cpus), into something O(nr_cpus *
> nr_cgroups).
>
> I'm fairly sure people will not like that, especially for something of
> such questionable use as the loadavg -- it's really only a pretty
> number that doesn't mean all that much.

I know that its use is questionable, but in my case I need to have it, or
I will not be able to offer correct loadavg values in the containers.
* Re: RFC: cgroups aware proc
From: Peter Zijlstra @ 2014-01-14 10:05 UTC
To: Marian Marinov
Cc: lxc-devel-cunTk1MwBs9qMoObBWhMNEqPaTDuhLve2LY78lusg7I,
    cgroups-u79uwXL29TY76Z2rM5mHXA, Daniel P. Berrange, Serge Hallyn,
    Li Zefan, Ingo Molnar, linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, Jan 14, 2014 at 02:58:14AM +0200, Marian Marinov wrote:
> I looked through the headers and found the following, which is already
> implemented:
>
>   task->sched_task_group->load_avg
>   task->sched_task_group->cfs_rq->load_avg
>   task->sched_task_group->cfs_rq->load.weight
>   task->sched_task_group->cfs_rq->runnable_load_avg
>
> It seems to me that part of the per-task-group loadavg code is already
> present.

No, those are actual load metrics and completely unrelated to loadavg.

Loadavg requires per-cgroup, per-cpu variants of nr_running and
nr_uninterruptible. Those are the only metrics used in
kernel/sched/proc.c for loadavg.
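To make the cost Peter describes concrete, a cgroup-aware loadavg would
need per-cgroup, per-cpu state shaped roughly like the sketch below.
This is purely hypothetical, none of these names exist in the kernel,
but it shows why sampling becomes O(nr_cpus * nr_cgroups): every sample
has to fold the per-cpu counters of every cgroup.

	/* Hypothetical per-cgroup loadavg state; not actual kernel code. */
	struct tg_loadavg {
		long __percpu *nr_running;		/* runnable tasks, per cpu */
		long __percpu *nr_uninterruptible;	/* D-state tasks, per cpu */
		unsigned long avenrun[3];		/* 1-, 5-, 15-minute averages */
	};

	/* Each sampling tick would need this fold, once per cgroup. */
	static unsigned long tg_active_count(struct tg_loadavg *tg)
	{
		unsigned long active = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			active += *per_cpu_ptr(tg->nr_running, cpu) +
				  *per_cpu_ptr(tg->nr_uninterruptible, cpu);
		return active;
	}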