From: Venkatesh Pallipadi <venki@google.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, Paul Turner <pjt@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Shaun Ruffell <sruffell@digium.com>,
	Yong Zhang <yong.zhang0@gmail.com>,
	Venkatesh Pallipadi <venki@google.com>
Subject: [PATCH 4/6] Export ns irqtimes from IRQ_TIME_ACCOUNTING through /proc/stat
Date: Wed, 20 Oct 2010 15:49:00 -0700
Message-ID: <1287614941-32325-5-git-send-email-venki@google.com>
In-Reply-To: <1287614941-32325-1-git-send-email-venki@google.com>

CONFIG_IRQ_TIME_ACCOUNTING tracks irq time on each CPU at ns granularity.
The scheduler already uses this information to do proper task chargeback
(earlier patches). This patch retrofits the ns-granularity hardirq and
softirq information into the /proc/stat irq and softirq fields.

The update is still done on the timer tick, where we look at the accumulated
ns hardirq/softirq time and account the tick to
user/system/irq/softirq/idle/guest accordingly.
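
To make the ordering concrete, here is a rough userspace model of the tick
demultiplexing described above. It is only a sketch, not the kernel code:
struct cpu_model, enum bucket, account_tick() and the 10 ms tick length
(HZ=100) are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define NSEC_PER_TICK 10000000ULL	/* assumes HZ=100, i.e. 10 ms per tick */

enum bucket { B_IRQ, B_USER, B_SOFTIRQ, B_IDLE, B_GUEST, B_SYSTEM, B_MAX };

struct cpu_model {
	uint64_t hardirq_ns;	/* accumulated ns-granularity hardirq time */
	uint64_t softirq_ns;	/* accumulated ns-granularity softirq time */
	uint64_t ticks[B_MAX];	/* tick-granular counters, as seen in /proc/stat */
};

/*
 * Charge one tick to exactly one bucket and return that bucket.
 * A pending hardirq update is checked first, then user time, then a
 * pending softirq update, then idle, then guest, and finally system.
 */
static enum bucket account_tick(struct cpu_model *c, int user_tick,
				int is_idle, int is_vcpu)
{
	enum bucket b;

	if (c->hardirq_ns / NSEC_PER_TICK > c->ticks[B_IRQ])
		b = B_IRQ;	/* ns counter is ahead of the reported ticks */
	else if (user_tick)
		b = B_USER;
	else if (c->softirq_ns / NSEC_PER_TICK > c->ticks[B_SOFTIRQ])
		b = B_SOFTIRQ;
	else if (is_idle)
		b = B_IDLE;
	else if (is_vcpu)
		b = B_GUEST;
	else
		b = B_SYSTEM;

	c->ticks[b]++;
	return b;
}
```

In this model, 25 ms of accumulated hardirq time causes the next two ticks
to be charged to irq, whatever context they interrupted. Later ticks fall
through to the remaining checks.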

No new interface added.
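
Since the format of /proc/stat does not change, an existing reader keeps
working as is. As a minimal sketch of such a reader (parse_cpu_irq_fields()
is a hypothetical helper, not part of this patch), the irq and softirq
values are the 6th and 7th numbers on the aggregate "cpu" line:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the irq and softirq fields from the
 * aggregate "cpu" line of /proc/stat. Returns 0 on success and -1 if
 * the line is not the aggregate line or has too few fields.
 */
static int parse_cpu_irq_fields(const char *line,
				unsigned long long *irq,
				unsigned long long *softirq)
{
	unsigned long long user, nice, system, idle, iowait;

	/* Reject per-CPU lines such as "cpu0 ..." */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;

	if (sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu",
		   &user, &nice, &system, &idle, &iowait, irq, softirq) != 7)
		return -1;

	return 0;
}
```

The values are still reported in tick-based units; only the way the kernel
decides which field receives a tick changes.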

Earlier versions looked at adding this as new fields in some /proc
files. This approach seems to be the best in terms of impact on existing
apps, even though it needs somewhat more kernel code than earlier versions.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8b97958..7e812a6 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2002,8 +2002,40 @@ static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time)
 	}
 }
 
+static int irqtime_account_hi_update(void)
+{
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+	unsigned long flags;
+	u64 latest_ns;
+	int ret = 0;
+
+	local_irq_save(flags);
+	latest_ns = __get_cpu_var(cpu_hardirq_time);
+	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->irq))
+		ret = 1;
+	local_irq_restore(flags);
+	return ret;
+}
+
+static int irqtime_account_si_update(void)
+{
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+	unsigned long flags;
+	u64 latest_ns;
+	int ret = 0;
+
+	local_irq_save(flags);
+	latest_ns = __get_cpu_var(cpu_softirq_time);
+	if (cputime64_gt(nsecs_to_cputime64(latest_ns), cpustat->softirq))
+		ret = 1;
+	local_irq_restore(flags);
+	return ret;
+}
+
 #else
 
+#define sched_clock_irqtime	(0)
+
 static u64 irq_time_cpu(int cpu)
 {
 	return 0;
@@ -3554,6 +3586,65 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	__account_system_time(p, cputime, cputime_scaled, target_cputime64);
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+/*
+ * Account a tick to a process and cpustat
+ * @p: the process that the cpu time gets accounted to
+ * @user_tick: is the tick from userspace
+ * @rq: the pointer to rq
+ *
+ * Tick demultiplexing follows the order
+ * - pending hardirq update
+ * - user_time
+ * - pending softirq update
+ * - idle_time
+ * - system time
+ *   - check for guest_time
+ *   - else account as system_time
+ *
+ * Check for hardirq is done both for system and user time as there is
+ * no timer going off while we are on hardirq and hence we may never get an
+ * opportunity to update it solely in system time.
+ * p->stime and friends are only updated on system time and not on irq or
+ * softirq as those do not count in task exec_runtime any more.
+ */
+static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
+						struct rq *rq)
+{
+	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
+	cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
+	struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+
+	if (irqtime_account_hi_update()) {
+		cpustat->irq = cputime64_add(cpustat->irq, tmp);
+	} else if (user_tick) {
+		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
+	} else if (irqtime_account_si_update()) {
+		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
+	} else if (p == rq->idle) {
+		account_idle_time(cputime_one_jiffy);
+	} else if (p->flags & PF_VCPU) { /* System time or guest time */
+		account_guest_time(p, cputime_one_jiffy, one_jiffy_scaled);
+	} else {
+		__account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
+					&cpustat->system);
+	}
+}
+
+static void irqtime_account_idle_ticks(int ticks)
+{
+	int i;
+	struct rq *rq = this_rq();
+
+	for (i = 0; i < ticks; i++)
+		irqtime_account_process_tick(current, 0, rq);
+}
+#else
+static void irqtime_account_idle_ticks(int ticks) {}
+static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
+						struct rq *rq) {}
+#endif
+
 /*
  * Account for involuntary wait time.
  * @steal: the cpu time spent in involuntary wait
@@ -3594,6 +3685,11 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
 	struct rq *rq = this_rq();
 
+	if (sched_clock_irqtime) {
+		irqtime_account_process_tick(p, user_tick, rq);
+		return;
+	}
+
 	if (user_tick)
 		account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
 	else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
@@ -3619,6 +3715,12 @@ void account_steal_ticks(unsigned long ticks)
  */
 void account_idle_ticks(unsigned long ticks)
 {
+
+	if (sched_clock_irqtime) {
+		irqtime_account_idle_ticks(ticks);
+		return;
+	}
+
 	account_idle_time(jiffies_to_cputime(ticks));
 }
 
-- 
1.7.1

