public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [BUG-REPORT] hrtick_start_fair and CPU-Hotplug
@ 2008-04-29  6:52 Gautham R Shenoy
  2008-04-29  6:59 ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Gautham R Shenoy @ 2008-04-29  6:52 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Dhaval Giani, Srivatsa Vaddagiri,
	Balbir Singh
  Cc: linux-kernel

Hello!

While running the usual CPU-Hotplug stress tests on linux-2.6.25,
I noticed the following in the console logs.

This is a wee bit difficult to reproduce. In the past 10 runs I hit this
only once.

------------[ cut here ]------------

WARNING: at kernel/sched.c:962 hrtick+0x2e/0x65()

Modules linked in: usbhid

Pid: 15, comm: events/0 Not tainted 2.6.25 #3

 [<c0122230>] warn_on_slowpath+0x40/0x65

 [<c0130035>] schedule_on_each_cpu+0x74/0xbc

 [<c0119de3>] hrtick+0x0/0x65

 [<c013e095>] __lock_acquire+0xbe8/0xbf7

 [<c013e095>] __lock_acquire+0xbe8/0xbf7

 [<c0134bc1>] __run_hrtimer+0x42/0x79

 [<c0119e11>] hrtick+0x2e/0x65

 [<c0119de3>] hrtick+0x0/0x65

 [<c0134bc5>] __run_hrtimer+0x46/0x79

 [<c0135526>] hrtimer_interrupt+0x100/0x173

 [<c0111bc7>] smp_apic_timer_interrupt+0x68/0x77

 [<c01053b7>] apic_timer_interrupt+0x33/0x38

 [<c0108a3e>] native_read_tsc+0xe/0xf

 [<c0291dc9>] delay_tsc+0x12/0x1d

 [<c0291d5f>] __const_udelay+0x29/0x2a

 [<c04ed960>] do_boot_cpu+0x2f0/0x473

 [<c04edaf4>] do_warm_boot_cpu+0x11/0x1a

 [<c012f72a>] run_workqueue+0xd1/0x1a5

 [<c012f6ea>] run_workqueue+0x91/0x1a5

 [<c04edae3>] do_warm_boot_cpu+0x0/0x1a

 [<c012f8b4>] worker_thread+0xb6/0xc2

 [<c013256a>] autoremove_wake_function+0x0/0x2d

 [<c012f7fe>] worker_thread+0x0/0xc2

 [<c013231a>] kthread+0x38/0x5d

 [<c01322e2>] kthread+0x0/0x5d

 [<c010554f>] kernel_thread_helper+0x7/0x10

 =======================

---[ end trace 9ccf64fec08bb0d7 ]---

------------[ cut here ]------------

WARNING: at kernel/sched.c:514 __update_rq_clock+0x69/0x12b()

Modules linked in: usbhid

Pid: 15, comm: events/0 Not tainted 2.6.25 #3

 [<c0122230>] warn_on_slowpath+0x40/0x65

 [<c0130035>] schedule_on_each_cpu+0x74/0xbc

 [<c013e095>] __lock_acquire+0xbe8/0xbf7

 [<c0108740>] native_sched_clock+0x91/0xa2

 [<c0118683>] __update_rq_clock+0x69/0x12b

 [<c0119e29>] hrtick+0x46/0x65

 [<c0119de3>] hrtick+0x0/0x65

 [<c0134bc5>] __run_hrtimer+0x46/0x79

 [<c0135526>] hrtimer_interrupt+0x100/0x173

 [<c0111bc7>] smp_apic_timer_interrupt+0x68/0x77

 [<c01053b7>] apic_timer_interrupt+0x33/0x38

 [<c0108a3e>] native_read_tsc+0xe/0xf

 [<c0291dc9>] delay_tsc+0x12/0x1d

 [<c0291d5f>] __const_udelay+0x29/0x2a

 [<c04ed960>] do_boot_cpu+0x2f0/0x473

 [<c04edaf4>] do_warm_boot_cpu+0x11/0x1a

 [<c012f72a>] run_workqueue+0xd1/0x1a5

 [<c012f6ea>] run_workqueue+0x91/0x1a5

 [<c04edae3>] do_warm_boot_cpu+0x0/0x1a

 [<c012f8b4>] worker_thread+0xb6/0xc2

 [<c013256a>] autoremove_wake_function+0x0/0x2d

 [<c012f7fe>] worker_thread+0x0/0xc2

 [<c013231a>] kthread+0x38/0x5d

 [<c01322e2>] kthread+0x0/0x5d

 [<c010554f>] kernel_thread_helper+0x7/0x10

 =======================

---[ end trace 9ccf64fec08bb0d7 ]---

Calibrating delay using timer specific routine.. 186544.67 BogoMIPS (lpj=373089344)

CPU: L1 I cache: 32K, L1 D cache: 32K

CPU: L2 cache: 4096K

CPU: Physical Processor ID: 3

CPU: Processor Core ID: 1

Intel machine check architecture supported.

Intel machine check reporting enabled on CPU#3.

CPU3: Intel(R) Xeon(R) CPU            5160  @ 3.00GHz stepping 06

checking TSC synchronization [CPU#0 -> CPU#3]: passed.

lockdep: fixing up alternatives.

Booting processor 1/1 ip 4000

Initializing CPU#1

Calibrating delay using timer specific routine.. 5985.62 BogoMIPS (lpj=11971248)

CPU: L1 I cache: 32K, L1 D cache: 32K

CPU: L2 cache: 4096K

CPU: Physical Processor ID: 0

CPU: Processor Core ID: 1

Intel machine check architecture supported.

.
.
.
.
.
After some half hour,

BUG: unable to handle kernel NULL pointer dereference at 00000000

IP: [<c011845c>] hrtick_start_fair+0x57/0x156

*pde = 00000000 

Oops: 0000 [#1] SMP 

Modules linked in: usbhid



Pid: 3316, comm: bash Not tainted (2.6.25 #3)

EIP: 0060:[<c011845c>] EFLAGS: 00010002 CPU: 1

EIP is at hrtick_start_fair+0x57/0x156

EAX: 00000000 EBX: c04ffc88 ECX: 00000001 EDX: 00000003

ESI: f35a1020 EDI: f31a90e0 EBP: cc84fe00 ESP: f60e7e80

 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

Process bash (pid: 3316, ti=f60e6000 task=f78dc020 task.ti=f60e6000)

Stack: 00b873a6 c011c4b9 f31a90e0 cc84cb80 cc84fe00 c04ffc88 f35a1020 cc84fe00 

       00000001 c011715d cc84fe00 f35a1020 cc84fe00 c0117183 f35a1020 00000001 

       c011c961 00000000 0000000f 00000003 00000282 c06c7d7c 00000003 00000000 

Call Trace:

 [<c011c4b9>] task_rq_lock+0x2d/0x53

 [<c011715d>] enqueue_task+0x49/0x54

 [<c0117183>] activate_task+0x1b/0x2e

 [<c011c961>] try_to_wake_up+0x15e/0x1a1

 [<c04ef325>] cpu_callback+0xb2/0x12a

 [<c0154b00>] writeback_set_ratelimit+0x11/0x43

 [<c04f4881>] notifier_call_chain+0x2a/0x47

 [<c0135bc2>] raw_notifier_call_chain+0x9/0xc

 [<c04ef1c1>] _cpu_up+0xa8/0xd6

 [<c04ef237>] cpu_up+0x48/0x59

 [<c02f13da>] store_online+0x3a/0x56

 [<c02f13a0>] store_online+0x0/0x56

 [<c02ee5ee>] sysdev_store+0x1e/0x22

 [<c01a1ccf>] sysfs_write_file+0xb5/0xe2

 [<c01a1c1a>] sysfs_write_file+0x0/0xe2

 [<c016f45e>] vfs_write+0x84/0xf7

 [<c016f569>] sys_write+0x3c/0x63

 [<c010483e>] sysenter_past_esp+0x5f/0xa5

 =======================

Code: 95 00 43 6d c0 39 c5 74 0f ba 2d 03 00 00 b8 76 a7 5c c0 e8 a7 9d 00 00 f6 05 50 4a 6d c0 08 0f 84 fd 00 00 00 8b 85 84 04 00 00 <8b> 00 83 b8 9c 00 00 00 00 0f 84 e8 00 00 00 8b 44 24 0c 8b 58 

EIP: [<c011845c>] hrtick_start_fair+0x57/0x156 SS:ESP 0068:f60e7e80

---[ end trace 9ccf64fec08bb0d7 ]---


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Disassembly of hrtick_start_fair + 0x57 is as follows:

00001abd <hrtick_start_fair>:
hrtick_start_fair():
.
.
.
.
/home/ego/kernels/linux-2.6.25/kernel/sched_fair.c:815
    1b0e:	8b 85 84 04 00 00    	mov    0x484(%ebp),%eax
    ------------------------------------------------------------
    1b14:	8b 00                	mov    (%eax),%eax
    -------------------------------------------------------------
    1b16:	83 b8 9c 00 00 00 00 	cmpl   $0x0,0x9c(%eax)
    1b1d:	0f 84 e8 00 00 00    	je     1c0b <hrtick_start_fair+0x14e>
    1b23:	8b 44 24 0c          	mov    0xc(%esp),%eax
    1b27:	8b 58 08             	mov    0x8(%eax),%ebx
    1b2a:	83 fb 01             	cmp    $0x1,%ebx
    1b2d:	0f 86 d8 00 00 00    	jbe    1c0b <hrtick_start_fair+0x14e>


I am not well versed with this part of the code, so just wondering if we
are doing a good job at handling the cancellation of any per-cpu
scheduler timers during CPU-Hotplug.

--
Thanks and Regards
gautham

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG-REPORT] hrtick_start_fair and CPU-Hotplug
  2008-04-29  6:52 [BUG-REPORT] hrtick_start_fair and CPU-Hotplug Gautham R Shenoy
@ 2008-04-29  6:59 ` Peter Zijlstra
  2008-04-29  8:02   ` Peter Zijlstra
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2008-04-29  6:59 UTC (permalink / raw)
  To: ego
  Cc: Ingo Molnar, Dhaval Giani, Srivatsa Vaddagiri, Balbir Singh,
	linux-kernel

On Tue, 2008-04-29 at 12:22 +0530, Gautham R Shenoy wrote:
> Hello!
> 
> While running the usual CPU-Hotplug stress tests on linux-2.6.25,
> I noticed the following in the console logs.
> 
> This is a wee bit difficult to reproduce. In the past 10 runs I hit this
> only once.
> 
> ------------[ cut here ]------------
> 
> WARNING: at kernel/sched.c:962 hrtick+0x2e/0x65()

> I am not well versed with this part of the code, so just wondering if we
> are doing a good job at handling the cancellation of any per-cpu
> scheduler timers during CPU-Hotplug.

This looks like its not cancelled at all and migrates the it to another
cpu. I'll see what I can come up with.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG-REPORT] hrtick_start_fair and CPU-Hotplug
  2008-04-29  6:59 ` Peter Zijlstra
@ 2008-04-29  8:02   ` Peter Zijlstra
  2008-04-29  9:46     ` Ingo Molnar
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2008-04-29  8:02 UTC (permalink / raw)
  To: ego
  Cc: Ingo Molnar, Dhaval Giani, Srivatsa Vaddagiri, Balbir Singh,
	linux-kernel

On Tue, 2008-04-29 at 08:59 +0200, Peter Zijlstra wrote:
> On Tue, 2008-04-29 at 12:22 +0530, Gautham R Shenoy wrote:
> > Hello!
> > 
> > While running the usual CPU-Hotplug stress tests on linux-2.6.25,
> > I noticed the following in the console logs.
> > 
> > This is a wee bit difficult to reproduce. In the past 10 runs I hit this
> > only once.
> > 
> > ------------[ cut here ]------------
> > 
> > WARNING: at kernel/sched.c:962 hrtick+0x2e/0x65()
> 
> > I am not well versed with this part of the code, so just wondering if we
> > are doing a good job at handling the cancellation of any per-cpu
> > scheduler timers during CPU-Hotplug.
> 
> This looks like its not cancelled at all and migrates the it to another
> cpu. I'll see what I can come up with.


Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

Index: linux-2.6-2/kernel/sched.c
===================================================================
--- linux-2.6-2.orig/kernel/sched.c
+++ linux-2.6-2/kernel/sched.c
@@ -1201,6 +1201,7 @@ static inline void resched_rq(struct rq 
 enum {
 	HRTICK_SET,		/* re-programm hrtick_timer */
 	HRTICK_RESET,		/* not a new slice */
+	HRTICK_BLOCK,		/* stop hrtick operations */
 };
 
 /*
@@ -1212,6 +1213,8 @@ static inline int hrtick_enabled(struct 
 {
 	if (!sched_feat(HRTICK))
 		return 0;
+	if (unlikely(test_bit(HRTICK_BLOCK, &rq->hrtick_flags)))
+		return 0;
 	return hrtimer_is_hres_active(&rq->hrtick_timer);
 }
 
@@ -1294,7 +1297,63 @@ static enum hrtimer_restart hrtick(struc
 	return HRTIMER_NORESTART;
 }
 
-static inline void init_rq_hrtick(struct rq *rq)
+static void hotplug_hrtick_disable(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	rq->hrtick_flags = 0;
+	__set_bit(HRTICK_BLOCK, &rq->hrtick_flags);
+	spin_unlock_irqrestore(&rq->lock, flags);
+
+	hrtick_clear(rq);
+}
+
+static void hotplug_hrtick_enable(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	__clear_bit(HRTICK_BLOCK, &rq->hrtick_flags);
+	spin_unlock_irqrestore(&rq->lock, flags);
+}
+
+static int
+hotplug_hrtick(struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+	int cpu = (int)hcpu;
+
+	switch (action) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		hotplug_hrtick_disable(cpu);
+		return NOTIFY_OK;
+
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		hotplug_hrtick_enable(cpu);
+		return NOTIFY_OK;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static void init_hrtick(void)
+{
+	hotcpu_notifier(hotplug_hrtick, 0);
+}
+
+static void init_rq_hrtick(struct rq *rq)
 {
 	rq->hrtick_flags = 0;
 	hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
@@ -1331,6 +1390,10 @@ static inline void init_rq_hrtick(struct
 void hrtick_resched(void)
 {
 }
+
+static inline void init_hrtick(void)
+{
+}
 #endif
 
 /*
@@ -7964,6 +8027,7 @@ void __init sched_init_smp(void)
 	put_online_cpus();
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);
+	init_hrtick();
 
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, &non_isolated_cpus) < 0)



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG-REPORT] hrtick_start_fair and CPU-Hotplug
  2008-04-29  8:02   ` Peter Zijlstra
@ 2008-04-29  9:46     ` Ingo Molnar
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2008-04-29  9:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: ego, Dhaval Giani, Srivatsa Vaddagiri, Balbir Singh, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> > This looks like its not cancelled at all and migrates the it to 
> > another cpu. I'll see what I can come up with.

thanks - i've queued up the patch below. Once this goes upstream it 
would be a backport candidate as wel.

	Ingo

-------------------------->
Subject: sched: fix hrtick_start_fair and CPU-Hotplug
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 29 Apr 2008 10:02:46 +0200

Gautham R Shenoy reported:

 > While running the usual CPU-Hotplug stress tests on linux-2.6.25,
 > I noticed the following in the console logs.
 >
 > This is a wee bit difficult to reproduce. In the past 10 runs I hit this
 > only once.
 >
 > ------------[ cut here ]------------
 >
 > WARNING: at kernel/sched.c:962 hrtick+0x2e/0x65()
 >
 > Just wondering if we are doing a good job at handling the cancellation
 > of any per-cpu scheduler timers during CPU-Hotplug.

This looks like its indeed not cancelled at all and migrates the it to
another cpu. Fix it via a proper hotplug notifier mechanism.

Reported-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1238,6 +1238,7 @@ static inline void resched_rq(struct rq 
 enum {
 	HRTICK_SET,		/* re-programm hrtick_timer */
 	HRTICK_RESET,		/* not a new slice */
+	HRTICK_BLOCK,		/* stop hrtick operations */
 };
 
 /*
@@ -1249,6 +1250,8 @@ static inline int hrtick_enabled(struct 
 {
 	if (!sched_feat(HRTICK))
 		return 0;
+	if (unlikely(test_bit(HRTICK_BLOCK, &rq->hrtick_flags)))
+		return 0;
 	return hrtimer_is_hres_active(&rq->hrtick_timer);
 }
 
@@ -1331,7 +1334,63 @@ static enum hrtimer_restart hrtick(struc
 	return HRTIMER_NORESTART;
 }
 
-static inline void init_rq_hrtick(struct rq *rq)
+static void hotplug_hrtick_disable(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	rq->hrtick_flags = 0;
+	__set_bit(HRTICK_BLOCK, &rq->hrtick_flags);
+	spin_unlock_irqrestore(&rq->lock, flags);
+
+	hrtick_clear(rq);
+}
+
+static void hotplug_hrtick_enable(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	spin_lock_irqsave(&rq->lock, flags);
+	__clear_bit(HRTICK_BLOCK, &rq->hrtick_flags);
+	spin_unlock_irqrestore(&rq->lock, flags);
+}
+
+static int
+hotplug_hrtick(struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+	int cpu = (int)hcpu;
+
+	switch (action) {
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		hotplug_hrtick_disable(cpu);
+		return NOTIFY_OK;
+
+	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		hotplug_hrtick_enable(cpu);
+		return NOTIFY_OK;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static void init_hrtick(void)
+{
+	hotcpu_notifier(hotplug_hrtick, 0);
+}
+
+static void init_rq_hrtick(struct rq *rq)
 {
 	rq->hrtick_flags = 0;
 	hrtimer_init(&rq->hrtick_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
@@ -1368,6 +1427,10 @@ static inline void init_rq_hrtick(struct
 void hrtick_resched(void)
 {
 }
+
+static inline void init_hrtick(void)
+{
+}
 #endif
 
 /*
@@ -8020,6 +8083,7 @@ void __init sched_init_smp(void)
 	put_online_cpus();
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);
+	init_hrtick();
 
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, &non_isolated_cpus) < 0)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-04-29  9:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-29  6:52 [BUG-REPORT] hrtick_start_fair and CPU-Hotplug Gautham R Shenoy
2008-04-29  6:59 ` Peter Zijlstra
2008-04-29  8:02   ` Peter Zijlstra
2008-04-29  9:46     ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox