public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Venkatesh Pallipadi <venki@google.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, Paul Turner <pjt@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Shaun Ruffell <sruffell@digium.com>,
	Venkatesh Pallipadi <venki@google.com>
Subject: [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2
Date: Tue, 21 Dec 2010 17:09:00 -0800	[thread overview]
Message-ID: <1292980144-28796-2-git-send-email-venki@google.com> (raw)
In-Reply-To: <1292980144-28796-1-git-send-email-venki@google.com>

Patchset:
This is Part 2 of
"Proper kernel irq time accounting -v4"
http://lkml.indiana.edu/hypermail//linux/kernel/1010.0/01175.html

and applies 2.6.37-rc7.

Part 1 solves the way irqs are accounted in scheduler and tasks. This
patchset solves how irq times are reported in /proc/stat and also not
to include irq time in task->stime, etc.

Example:
Running a cpu intensive loop and network intensive nc on a 4 CPU system
and looking at 'top' output.

With vanilla kernel:
Cpu0  :  0.0% us,  0.3% sy,  0.0% ni, 99.3% id,  0.0% wa,  0.0% hi,  0.3% si
Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  1.3% us, 27.2% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 71.4% si
Cpu3  :  1.6% us,  1.3% sy,  0.0% ni, 96.7% id,  0.0% wa,  0.0% hi,  0.3% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7555 root      20   0  1760  528  436 R  100  0.0   0:15.79 nc
 7563 root      20   0  3632  268  204 R  100  0.0   0:13.13 loop

Notes:
* Both tasks show 100% CPU, even when one of them is stuck on a CPU thats
  processing 70% softirq.
* no hardirq time.


With "Part 1" patches:
Cpu0  :  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1  : 100.0% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu2  :  2.0% us, 30.6% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi, 67.4% si
Cpu3  :  0.7% us,  0.7% sy,  0.3% ni, 98.3% id,  0.0% wa,  0.0% hi,  0.0% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6289 root      20   0  3632  268  204 R  100  0.0   2:18.67 loop
 5737 root      20   0  1760  528  436 R   33  0.0   0:26.72 nc

Notes:
* Tasks show 100% CPU and 33% CPU that correspond to their non-irq exec time.
* no hardirq time.


With "Part 1 + Part 2" patches:
Cpu0  :  1.3% us,  1.0% sy,  0.3% ni, 97.0% id,  0.0% wa,  0.0% hi,  0.3% si
Cpu1  : 99.3% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.7% hi,  0.0% si
Cpu2  :  1.3% us, 31.5% sy,  0.0% ni,  0.0% id,  0.0% wa,  8.3% hi, 58.9% si
Cpu3  :  1.0% us,  2.0% sy,  0.3% ni, 95.0% id,  0.0% wa,  0.7% hi,  1.0% si

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20929 root      20   0  3632  268  204 R   99  0.0   3:48.25 loop
20796 root      20   0  1760  528  436 R   33  0.0   2:38.65 nc

Notes:
* Both task exec time and hard irq time reported correctly.
* hi and si time are based on fine granularity info and not on samples.
* getrusage would give proper utime/stime split not including irq times
  in that ratio.
* Other places that report user/sys time like, cgroup cpuacct.stat will
  now include only non-irq exectime.

This patch:

Cleanup patch, freeing up PF_KSOFTIRQD and use per_cpu ksoftirqd pointer
instead, as suggested by Eric Dumazet.

Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 include/linux/interrupt.h |    7 +++++++
 include/linux/sched.h     |    1 -
 kernel/sched.c            |    2 +-
 kernel/softirq.c          |    3 +--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 79d0c4f..3802fac 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -426,6 +426,13 @@ extern void raise_softirq(unsigned int nr);
  */
 DECLARE_PER_CPU(struct list_head [NR_SOFTIRQS], softirq_work_list);
 
+DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+
+static inline struct task_struct *this_cpu_ksoftirqd(void)
+{
+	return this_cpu_read(ksoftirqd);
+}
+
 /* Try to send a softirq to a remote cpu.  If this cannot be done, the
  * work will be queued to the local cpu.
  */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2238745..86924ff 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1699,7 +1699,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
 /*
  * Per process flags
  */
-#define PF_KSOFTIRQD	0x00000001	/* I am ksoftirqd */
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_EXITPIDONE	0x00000008	/* pi exit done on shut down */
diff --git a/kernel/sched.c b/kernel/sched.c
index 297d1a0..bfc9646 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2011,7 +2011,7 @@ void account_system_vtime(struct task_struct *curr)
 	 */
 	if (hardirq_count())
 		__this_cpu_add(cpu_hardirq_time, delta);
-	else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
+	else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
 		__this_cpu_add(cpu_softirq_time, delta);
 
 	irq_time_write_end();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 18f4be0..b904be8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -54,7 +54,7 @@ EXPORT_SYMBOL(irq_stat);
 
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
-static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -721,7 +721,6 @@ static int run_ksoftirqd(void * __bind_cpu)
 {
 	set_current_state(TASK_INTERRUPTIBLE);
 
-	current->flags |= PF_KSOFTIRQD;
 	while (!kthread_should_stop()) {
 		preempt_disable();
 		if (!local_softirq_pending()) {
-- 
1.7.3.1


  reply	other threads:[~2010-12-22  1:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-22  1:08 [PATCH 0/5] Proper kernel irq time reporting -v2 Venkatesh Pallipadi
2010-12-22  1:09 ` Venkatesh Pallipadi [this message]
2010-12-22  9:17   ` [PATCH 1/5] Free up pf flag PF_KSOFTIRQD -v2 Peter Zijlstra
2011-01-26 12:11   ` [tip:sched/core] softirqs: Free up pf flag PF_KSOFTIRQD tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 2/5] Add nsecs_to_cputime64 interface for asm-generic -v2 Venkatesh Pallipadi
2010-12-22  8:30   ` Martin Schwidefsky
2010-12-22 14:23     ` Venkatesh Pallipadi
2010-12-22 15:25       ` Martin Schwidefsky
2011-01-26 12:12   ` [tip:sched/core] time: Add nsecs_to_cputime64 interface for asm-generic tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 3/5] Refactor account_system_time separating id-update -v2 Venkatesh Pallipadi
2011-01-26 12:12   ` [tip:sched/core] sched: Refactor account_system_time separating id-update tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 4/5] Export ns irqtimes through /proc/stat -v2 Venkatesh Pallipadi
2011-01-26 12:13   ` [tip:sched/core] sched: Export ns irqtimes through /proc/stat tip-bot for Venkatesh Pallipadi
2010-12-22  1:09 ` [PATCH 5/5] Account ksoftirqd time as cpustat softirq -v2 Venkatesh Pallipadi
2010-12-22  9:20   ` Peter Zijlstra
2010-12-22 13:59     ` Venkatesh Pallipadi
2010-12-22 14:05       ` Peter Zijlstra
2010-12-22 14:17         ` Venkatesh Pallipadi
2011-01-26 12:13   ` [tip:sched/core] softirqs: Account ksoftirqd time as cpustat softirq tip-bot for Venkatesh Pallipadi
2011-01-06 15:31 ` [PATCH 0/5] Proper kernel irq time reporting -v2 Shaun Ruffell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1292980144-28796-2-git-send-email-venki@google.com \
    --to=venki@google.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=eric.dumazet@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=sruffell@digium.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox