All of lore.kernel.org
 help / color / mirror / Atom feed
From: Venkatesh Pallipadi <venki@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, Paul Turner <pjt@google.com>,
	Venkatesh Pallipadi <venki@google.com>
Subject: [PATCH] si time accounting accounts bh_disable'd time to si
Date: Mon, 27 Sep 2010 13:35:53 -0700	[thread overview]
Message-ID: <1285619753-10892-1-git-send-email-venki@google.com> (raw)
In-Reply-To: <1285003599.2275.756.camel@laptop>

>> >> > You still do have the problem with local_bh_disable() though, since you
>> >> > cannot distinguish between having bh disabled and processing softirq.
>> >> >
>> >> > So a hardirq that hits while you have bh disabled will inflate your
>> >> > softirq time.

>> >> Hmm, that bug is valid for CONFIG_VIRT_CPU_ACCOUNTING=y as well.
>> >
>> > And nobody ever noticed?
>> >
>> Yes. I inherited the API from VIRT_CPU_ACCOUNTING along with this
>> local_bh_disable bug. Agree that we need one extra bit to handle this
>> case. I will take a stab at fixing this along with refresh of this
>> patchset if no one else has beaten me to it until then.
>
>Make sure to only fix the softirq processing on the hardirq tail, not
> the ksoftirqd one :-)

softirq processing from hardirq tail and ksoftirqd are currently
handled in the same way and I didn't see any issues changing both of
them. Am I missing something?

Here's the patch I have for this. 

[PATCH] si time accounting accounts bh_disable'd time to si

Peter Zijlstra found a bug in the way softirq time is accounted in
VIRT_CPU_ACCOUNTING on this thread.
http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

The problem is, softirq processing uses local_bh_disable internally and there
would be no way later in the flow to differentiate between whether softirq is
being processed or is it just that bh has been disabled. So, a hardirq when bh
id disabled results in time being wrongly accounted for softirq.

Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
as well. As account_system_time() in normal tick based accouting also uses
softirq_count, which will be set when not in softirq with bh disabled.

Peter also suggested solution of using 2 * SOFTIRQ_OFFSET as irq count
for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq
processing. The patch below does that and adds API in_serving_softirq() which
returns whether we are currently processing softirq or not.

Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
to in_serving_softirq.

Looks like many usages of in_softirq really want in_serving_softirq. Those
changes can be made individually on a case by case basis.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 include/linux/hardirq.h |    3 ++
 include/linux/sched.h   |    2 +-
 kernel/sched.c          |    2 +-
 kernel/softirq.c        |   51 +++++++++++++++++++++++++++++++---------------
 net/sched/cls_cgroup.c  |    2 +-
 5 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d5b3876..1c736ae 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -82,10 +82,13 @@
 /*
  * Are we doing bottom half or hardware interrupt processing?
  * Are we in a softirq context? Interrupt context?
+ * in_softirq answers - are we currently processing softirq or have bh disabled?
+ * in_serving_softirq answers - are we currently processing softirq?
  */
 #define in_irq()		(hardirq_count())
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
+#define in_serving_softirq()	(softirq_count() == SOFTIRQ_OFFSET)
 
 /*
  * Are we in NMI context?
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1e2a6db..1c40289 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2368,7 +2368,7 @@ extern int __cond_resched_lock(spinlock_t *lock);
 extern int __cond_resched_softirq(void);
 
 #define cond_resched_softirq() ({				\
-	__might_sleep(__FILE__, __LINE__, SOFTIRQ_OFFSET);	\
+	__might_sleep(__FILE__, __LINE__, SOFTIRQ_OFFSET * 2);	\
 	__cond_resched_softirq();				\
 })
 
diff --git a/kernel/sched.c b/kernel/sched.c
index dc85ceb..b6e714b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3397,7 +3397,7 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	tmp = cputime_to_cputime64(cputime);
 	if (hardirq_count() - hardirq_offset)
 		cpustat->irq = cputime64_add(cpustat->irq, tmp);
-	else if (softirq_count())
+	else if (in_serving_softirq())
 		cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
 	else
 		cpustat->system = cputime64_add(cpustat->system, tmp);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 07b4f1b..7bfc67b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -77,11 +77,21 @@ void wakeup_softirqd(void)
 }
 
 /*
+ * preempt count and SOFTIRQ_OFFSET usage:
+ * - preempt count is changed by SOFTIRQ_OFFSET on entering or leaving
+ *   softirq processing.
+ * - preempt count is changed by 2 * SOFTIRQ_OFFSET on local_bh_disable or
+ *   local_bh_enable.
+ * This lets us distinguish between whether we are currently processing
+ * softirq and whether we have bh disabled.
+ */
+
+/*
  * This one is for softirq.c-internal use,
  * where hardirqs are disabled legitimately:
  */
 #ifdef CONFIG_TRACE_IRQFLAGS
-static void __local_bh_disable(unsigned long ip)
+static void __local_bh_disable(unsigned long ip, unsigned int cnt)
 {
 	unsigned long flags;
 
@@ -95,32 +105,43 @@ static void __local_bh_disable(unsigned long ip)
 	 * We must manually increment preempt_count here and manually
 	 * call the trace_preempt_off later.
 	 */
-	preempt_count() += SOFTIRQ_OFFSET;
+	preempt_count() += cnt;
 	/*
 	 * Were softirqs turned off above:
 	 */
-	if (softirq_count() == SOFTIRQ_OFFSET)
+	if (softirq_count() == cnt)
 		trace_softirqs_off(ip);
 	raw_local_irq_restore(flags);
 
-	if (preempt_count() == SOFTIRQ_OFFSET)
+	if (preempt_count() == cnt)
 		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 }
 #else /* !CONFIG_TRACE_IRQFLAGS */
-static inline void __local_bh_disable(unsigned long ip)
+static inline void __local_bh_disable(unsigned long ip, unsigned int cnt)
 {
-	add_preempt_count(SOFTIRQ_OFFSET);
+	add_preempt_count(cnt);
 	barrier();
 }
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 void local_bh_disable(void)
 {
-	__local_bh_disable((unsigned long)__builtin_return_address(0));
+	__local_bh_disable((unsigned long)__builtin_return_address(0),
+				2 * SOFTIRQ_OFFSET);
 }
 
 EXPORT_SYMBOL(local_bh_disable);
 
+static void __local_bh_enable(unsigned int cnt)
+{
+	WARN_ON_ONCE(in_irq());
+	WARN_ON_ONCE(!irqs_disabled());
+
+	if (softirq_count() == cnt)
+		trace_softirqs_on((unsigned long)__builtin_return_address(0));
+	sub_preempt_count(cnt);
+}
+
 /*
  * Special-case - softirqs can safely be enabled in
  * cond_resched_softirq(), or by __do_softirq(),
@@ -128,12 +149,7 @@ EXPORT_SYMBOL(local_bh_disable);
  */
 void _local_bh_enable(void)
 {
-	WARN_ON_ONCE(in_irq());
-	WARN_ON_ONCE(!irqs_disabled());
-
-	if (softirq_count() == SOFTIRQ_OFFSET)
-		trace_softirqs_on((unsigned long)__builtin_return_address(0));
-	sub_preempt_count(SOFTIRQ_OFFSET);
+	__local_bh_enable(2 * SOFTIRQ_OFFSET);
 }
 
 EXPORT_SYMBOL(_local_bh_enable);
@@ -147,13 +163,13 @@ static inline void _local_bh_enable_ip(unsigned long ip)
 	/*
 	 * Are softirqs going to be turned on now:
 	 */
-	if (softirq_count() == SOFTIRQ_OFFSET)
+	if (softirq_count() == 2 * SOFTIRQ_OFFSET)
 		trace_softirqs_on(ip);
 	/*
 	 * Keep preemption disabled until we are done with
 	 * softirq processing:
  	 */
- 	sub_preempt_count(SOFTIRQ_OFFSET - 1);
+	sub_preempt_count(2 * SOFTIRQ_OFFSET - 1);
 
 	if (unlikely(!in_interrupt() && local_softirq_pending()))
 		do_softirq();
@@ -198,7 +214,8 @@ asmlinkage void __do_softirq(void)
 	pending = local_softirq_pending();
 	account_system_vtime(current);
 
-	__local_bh_disable((unsigned long)__builtin_return_address(0));
+	__local_bh_disable((unsigned long)__builtin_return_address(0),
+				SOFTIRQ_OFFSET);
 	lockdep_softirq_enter();
 
 	cpu = smp_processor_id();
@@ -245,7 +262,7 @@ restart:
 	lockdep_softirq_exit();
 
 	account_system_vtime(current);
-	_local_bh_enable();
+	__local_bh_enable(SOFTIRQ_OFFSET);
 }
 
 #ifndef __ARCH_HAS_DO_SOFTIRQ
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 78ef2c5..37dff78 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -123,7 +123,7 @@ static int cls_cgroup_classify(struct sk_buff *skb, struct tcf_proto *tp,
 	 * calls by looking at the number of nested bh disable calls because
 	 * softirqs always disables bh.
 	 */
-	if (softirq_count() != SOFTIRQ_OFFSET) {
+	if (in_serving_softirq()) {
 		/* If there is an sk_classid we'll use that. */
 		if (!skb->sk)
 			return -1;
-- 
1.7.1


  reply	other threads:[~2010-09-27 20:36 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-17  1:56 [PATCH 0/6] Proper kernel irq time accounting Venkatesh Pallipadi
2010-09-17  1:56 ` [PATCH 1/6] Consolidate account_system_vtime extern declaration Venkatesh Pallipadi
2010-09-17  1:56 ` [PATCH 2/6] Add IRQ_TIME_ACCOUNTING, finer accounting of CPU irq time Venkatesh Pallipadi
2010-09-19 11:11   ` Peter Zijlstra
2010-09-20 17:13     ` Venkatesh Pallipadi
2010-09-20 17:23       ` Peter Zijlstra
2010-09-19 11:21   ` Peter Zijlstra
2010-09-19 11:42     ` Peter Zijlstra
2010-09-19 12:01     ` Peter Zijlstra
2010-09-20  7:27       ` Martin Schwidefsky
2010-09-20  9:27         ` Peter Zijlstra
2010-09-20 17:16           ` Venkatesh Pallipadi
2010-09-20 17:26             ` Peter Zijlstra
2010-09-27 20:35               ` Venkatesh Pallipadi [this message]
2010-09-27 20:53                 ` [PATCH] si time accounting accounts bh_disable'd time to si Eric Dumazet
2010-09-27 21:11                   ` Venkatesh Pallipadi
2010-09-27 21:16                     ` Eric Dumazet
2010-09-30 11:17                 ` Peter Zijlstra
2010-09-17  1:56 ` [PATCH 3/6] x86: Add IRQ_TIME_ACCOUNTING in x86 Venkatesh Pallipadi
2010-09-17  1:56 ` [PATCH 4/6] sched: Do not account irq time to current task Venkatesh Pallipadi
2010-09-19 11:28   ` Peter Zijlstra
2010-09-20 17:33     ` Venkatesh Pallipadi
2010-09-20 17:38       ` Peter Zijlstra
2010-09-20 17:40         ` Venkatesh Pallipadi
2010-09-17  1:56 ` [PATCH 5/6] sched: Remove irq time from available CPU power Venkatesh Pallipadi
2010-09-19 11:31   ` Peter Zijlstra
2010-09-20 17:38     ` Venkatesh Pallipadi
2010-09-17  1:56 ` [PATCH 6/6] Export per cpu hardirq and softirq time in proc Venkatesh Pallipadi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1285619753-10892-1-git-send-email-venki@google.com \
    --to=venki@google.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.