public inbox for linux-kernel@vger.kernel.org
* [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
@ 2005-07-30 16:03 Ingo Molnar
  2005-07-30 20:47 ` Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-07-30 16:03 UTC (permalink / raw)
  To: linux-kernel


i have released the -V0.7.52-01 Real-Time Preemption patch, which can be 
downloaded from the usual place:

    http://redhat.com/~mingo/realtime-preempt/

this release is mainly a merge to 2.6.13-rc4. (That merge slashed ~30K 
off the patch, due to the continuing merge of various bits of the -RT 
tree to mainline.)

to build a -V0.7.52-01 tree, the following patches should be applied:

   http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.bz2
   http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.13-rc4.bz2
   http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.13-rc4-RT-V0.7.52-01

reports, patches, suggestions welcome.

	Ingo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 16:03 [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Ingo Molnar
@ 2005-07-30 20:47 ` Peter Zijlstra
  2005-07-30 20:52   ` Ingo Molnar
  2005-08-01 18:22 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2005-07-30 20:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 92 bytes --]

Hi Ingo,

-02 needs the attached patch to compile with my config.

Regards,

Peter Zijlstra

[-- Attachment #2: linux-2.6.13-rc4-RT-V0.7.52-02_compile-fix.diff --]
[-- Type: text/x-patch, Size: 1210 bytes --]

--- linux-2.6.13-rc4-RT-V0.7.52-02/mm/swap.c.orig	2005-07-30 21:38:28.000000000 +0200
+++ linux-2.6.13-rc4-RT-V0.7.52-02/mm/swap.c	2005-07-30 21:40:11.000000000 +0200
@@ -422,14 +422,17 @@
 #ifdef CONFIG_HOTPLUG_CPU
 static void lru_drain_cache(unsigned int cpu)
 {
-	struct pagevec *pvec = &per_cpu(lru_add_pvecs, cpu);
+	struct pagevec *pvec = &get_cpu_var_locked(lru_add_pvecs, cpu);
 
 	/* CPU is dead, so no locking needed. */
 	if (pagevec_count(pvec))
 		__pagevec_lru_add(pvec);
-	pvec = &per_cpu(lru_add_active_pvecs, cpu);
+	put_cpu_var_locked(lru_add_pvecs, cpu);
+
+	pvec = &get_cpu_var_locked(lru_add_active_pvecs, cpu);
 	if (pagevec_count(pvec))
 		__pagevec_lru_add_active(pvec);
+	put_cpu_var_locked(lru_add_active_pvecs, cpu);
 }
 
 /* Drop the CPU's cached committed space back into the central pool. */
--- linux-2.6.13-rc4-RT-V0.7.52-02/drivers/message/i2o/exec-osm.c~	2005-07-30 20:46:19.000000000 +0200
+++ linux-2.6.13-rc4-RT-V0.7.52-02/drivers/message/i2o/exec-osm.c	2005-07-30 21:58:38.000000000 +0200
@@ -204,7 +204,7 @@
 {
 	struct i2o_exec_wait *wait, *tmp;
 	unsigned long flags;
-	static spinlock_t lock = SPIN_LOCK_UNLOCKED;
+	static DEFINE_SPINLOCK(lock);
 	int rc = 1;
 
 	/*


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 20:47 ` Peter Zijlstra
@ 2005-07-30 20:52   ` Ingo Molnar
  2005-07-31  4:47     ` Lee Revell
  2005-07-31  8:03     ` Peter Zijlstra
  0 siblings, 2 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-07-30 20:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> Hi Ingo,
> 
> -02 needs the attached patch to compile with my config.

thanks, i've released -03 with your fixes.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 20:52   ` Ingo Molnar
@ 2005-07-31  4:47     ` Lee Revell
  2005-07-31  6:38       ` Ingo Molnar
  2005-07-31  8:03     ` Peter Zijlstra
  1 sibling, 1 reply; 66+ messages in thread
From: Lee Revell @ 2005-07-31  4:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel

On Sat, 2005-07-30 at 22:52 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > Hi Ingo,
> > 
> > -02 needs the attached patch to compile with my config.
> 
> thanks, i've released -03 with your fixes.
> 

Does not compile with highmem enabled:

  CC      arch/i386/mm/highmem.o
arch/i386/mm/highmem.c:102: error: syntax error before '(' token
arch/i386/mm/highmem.c:107: error: syntax error before numeric constant
arch/i386/mm/highmem.c:107: warning: type defaults to 'int' in declaration of 'add_preempt_count'
arch/i386/mm/highmem.c:107: warning: function declaration isn't a prototype
arch/i386/mm/highmem.c:107: error: conflicting types for 'add_preempt_count'
include/linux/preempt.h:14: error: previous declaration of 'add_preempt_count' was here
arch/i386/mm/highmem.c:107: warning: data definition has no type or storage class
arch/i386/mm/highmem.c:109: warning: type defaults to 'int' in declaration of 'idx'
arch/i386/mm/highmem.c:109: error: 'type' undeclared here (not in a function)
arch/i386/mm/highmem.c:109: warning: data definition has no type or storage class
arch/i386/mm/highmem.c:110: warning: type defaults to 'int' in declaration of 'vaddr'
arch/i386/mm/highmem.c:110: error: conflicting types for 'vaddr'
arch/i386/mm/highmem.c:105: error: previous declaration of 'vaddr' was here
arch/i386/mm/highmem.c:110: error: initializer element is not constant
arch/i386/mm/highmem.c:110: warning: data definition has no type or storage class
arch/i386/mm/highmem.c:111: error: syntax error before '-' token
arch/i386/mm/highmem.c:132: error: 'kmap_atomic' undeclared here (not in a function)
arch/i386/mm/highmem.c:133: error: 'kunmap_atomic' undeclared here (not in a function)
arch/i386/mm/highmem.c:134: error: 'kmap_atomic_to_page' undeclared here (not in a function)
make[1]: *** [arch/i386/mm/highmem.o] Error 1
make: *** [arch/i386/mm] Error 2

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-31  4:47     ` Lee Revell
@ 2005-07-31  6:38       ` Ingo Molnar
  2005-08-01  4:45         ` Lee Revell
  0 siblings, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-07-31  6:38 UTC (permalink / raw)
  To: Lee Revell; +Cc: Peter Zijlstra, linux-kernel

* Lee Revell <rlrevell@joe-job.com> wrote:

> On Sat, 2005-07-30 at 22:52 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > 
> > > Hi Ingo,
> > > 
> > > -02 needs the attached patch to compile with my config.
> > 
> > thanks, i've released -03 with your fixes.
> > 
> 
> Does not compile with highmem enabled:

ok - i've uploaded the -52-04 patch, does that fix it for you?

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 20:52   ` Ingo Molnar
  2005-07-31  4:47     ` Lee Revell
@ 2005-07-31  8:03     ` Peter Zijlstra
  2005-07-31 10:44       ` Ingo Molnar
  1 sibling, 1 reply; 66+ messages in thread
From: Peter Zijlstra @ 2005-07-31  8:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5380 bytes --]

Hi Ingo,

Still on -02;

I'm now running the thing and having some trouble fixing it.
Attached is a patch to the new CFQ iosched that I had lying about in an
old -mm port of the RT patch.

This trace is giving me a headache to fix:

 BUG: scheduling with irqs disabled: IRQ 14/0x20000000/772
 caller is __down_mutex+0x446/0x600
  [<c0104693>] dump_stack+0x23/0x30 (20)
  [<c036defc>] schedule+0xac/0x120 (28)
  [<c036f086>] __down_mutex+0x446/0x600 (120)
  [<c0370a48>] _spin_lock_irqsave+0x28/0x60 (28)
  [<c0121220>] __wake_up+0x20/0x70 (44)
  [<c013fab9>] __wake_up_bit+0x39/0x40 (24)
  [<c013fae4>] wake_up_bit+0x24/0x30 (16)
  [<c01790c6>] unlock_buffer+0x16/0x20 (8)
  [<c0179d19>] end_buffer_async_write+0x69/0x200 (68)
  [<c017d234>] end_bio_bh_io_sync+0x34/0x70 (20)
  [<c017eccd>] bio_endio+0x5d/0x90 (32)
  [<c02890f7>] __end_that_request_first+0xd7/0x250 (52)
  [<c0289297>] end_that_request_first+0x27/0x30 (20)
  [<c029f9ca>] __ide_end_request+0x6a/0x1b0 (36)
  [<c029fb9a>] ide_end_request+0x8a/0xb0 (40)
  [<c02a9ad0>] ide_dma_intr+0xa0/0xf0 (32)
  [<c02a1ba5>] ide_intr+0xb5/0x160 (36)
  [<c01544b6>] handle_IRQ_event+0x76/0x110 (52)
  [<c0154d09>] do_hardirq+0x59/0x110 (44)
  [<c0154e87>] do_irqd+0xc7/0x1e0 (36)
  [<c013f426>] kthread+0xb6/0xf0 (48)
  [<c01015f9>] kernel_thread_helper+0x5/0xc (271781916)
 ---------------------------
 | preempt count: 20000001 ]
 | 1-level deep critical section nesting:
 ----------------------------------------
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c014b0d7>] ..   ( <= print_traces+0x17/0x60)

 ------------------------------
 | showing all locks held by: |  (IRQ 14/772 [efef4660,  53]):
 ------------------------------


because end_buffer_async_read/write use bit_spin_(un)lock and I do not
know how those interact with -RT.

=-=-=-=-=-=-=-=-=-=-=-=

Also (just for fun) I enabled CONFIG_HOTPLUG_CPU and tried to offline a
cpu. This is what happened:

 kstopmachine/14814[CPU#1]: BUG in __activate_idle_task at kernel/sched.c:871
  [<c0104693>] dump_stack+0x23/0x30 (20)
  [<c0128473>] __WARN_ON+0x63/0x80 (44)
  [<c01230bd>] sched_idle_next+0xfd/0x100 (40)
  [<c014c276>] take_cpu_down+0x16/0x20 (8)
  [<c0153e5e>] do_stop+0x6e/0x80 (20)
  [<c013f426>] kthread+0xb6/0xf0 (48)
  [<c01015f9>] kernel_thread_helper+0x5/0xc (288305180)
 ---------------------------
 | preempt count: 20000003 ]
 | 3-level deep critical section nesting:
 ----------------------------------------
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c012300d>] ..   ( <= sched_idle_next+0x4d/0x100)
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c0128427>] ..   ( <= __WARN_ON+0x17/0x80)
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c014b0d7>] ..   ( <= print_traces+0x17/0x60)

 ------------------------------
 | showing all locks held by: |  (kstopmachine/14814 [ee9226a0,   0]):
 ------------------------------

 BUG: scheduling with irqs disabled: kstopmachine/0x00000000/14814
 caller is do_stop+0x3c/0x80
  [<c0104693>] dump_stack+0x23/0x30 (20)
  [<c036defc>] schedule+0xac/0x120 (28)
  [<c0153e2c>] do_stop+0x3c/0x80 (20)
  [<c013f426>] kthread+0xb6/0xf0 (48)
  [<c01015f9>] kernel_thread_helper+0x5/0xc (288305180)
 ---------------------------
 | preempt count: 00000001 ]
 | 1-level deep critical section nesting:
 ----------------------------------------
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c014b0d7>] ..   ( <= print_traces+0x17/0x60)

 ------------------------------
 | showing all locks held by: |  (kstopmachine/14814 [ee9226a0,   0]):
 ------------------------------

 CPU 1 is now offline
 bash/14800[CPU#0]: BUG in dec_rt_tasks at kernel/sched.c:647
  [<c0104693>] dump_stack+0x23/0x30 (20)
  [<c0128473>] __WARN_ON+0x63/0x80 (44)
  [<c011e01d>] dequeue_task+0x8d/0xa0 (28)
  [<c011e4c4>] deactivate_task+0x24/0x40 (20)
  [<c0123441>] migration_call+0xe1/0x2e0 (44)
  [<c0136ad5>] notifier_call_chain+0x25/0x40 (32)
  [<c014c40c>] cpu_down+0x18c/0x2d0 (52)
  [<c028194f>] store_online+0x3f/0xa0 (24)
  [<c027e533>] sysdev_store+0x33/0x40 (20)
  [<c01b5a67>] flush_write_buffer+0x47/0x50 (32)
  [<c01b5ac9>] sysfs_write_file+0x59/0x80 (32)
  [<c0177d52>] vfs_write+0xe2/0x1b0 (40)
  [<c0177ef0>] sys_write+0x50/0x80 (44)
  [<c0103698>] sysenter_past_esp+0x61/0x89 (-4020)
 ---------------------------
 | preempt count: 00000003 ]
 | 3-level deep critical section nesting:
 ----------------------------------------
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c0154632>] ..   ( <= __do_IRQ+0xe2/0x170)
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c0119c26>] ..   ( <= unmask_IO_APIC_irq+0x16/0x50)
 .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 .....[<c014b0d7>] ..   ( <= print_traces+0x17/0x60)

 ------------------------------
 | showing all locks held by: |  (bash/14800 [ee044660, 115]):
 ------------------------------

 #001:             [eaff1078] {(struct semaphore *)(&buffer->sem)}
 ... acquired at:               sysfs_write_file+0x27/0x80

 #002:             [c03d0204] {cpucontrol.lock}
 ... acquired at:               cpu_down+0x21/0x2d0


and a shitload of smp_processor_id in preemptible section thingies.
And as expected getting the cpu back online didn't work :-)

I haven't looked at these traces yet; but since I am writing this email
anyway I might as well include them.

Kind regards,

Peter Zijlstra

[-- Attachment #2: linux-2.6.13-rc4-RT-V0.7.52-02_fixups.diff --]
[-- Type: text/x-patch, Size: 1922 bytes --]

 kernel: BUG: scheduling with irqs disabled: uname/0x20000000/983
 kernel: caller is __down_mutex+0x446/0x600
 kernel:  [<c0104693>] dump_stack+0x23/0x30 (20)
 kernel:  [<c036defc>] schedule+0xac/0x120 (28)
 kernel:  [<c036f086>] __down_mutex+0x446/0x600 (120)
 kernel:  [<c0370888>] _spin_lock+0x28/0x50 (28)
 kernel:  [<c0292df4>] cfq_exit_single_io_context+0x34/0xc0 (32)
 kernel:  [<c0292ebc>] cfq_exit_io_context+0x3c/0x50 (24)
 kernel:  [<c028975c>] exit_io_context+0x8c/0xb0 (24)
 kernel:  [<c012ac41>] do_exit+0x411/0x460 (44)
 kernel:  [<c012ad2e>] do_group_exit+0x3e/0xd0 (40)
 kernel:  [<c012adda>] sys_exit_group+0x1a/0x20 (12)
 kernel:  [<c0103698>] sysenter_past_esp+0x61/0x89 (-4020)
 kernel: ---------------------------
 kernel: | preempt count: 20000001 ]
 kernel: | 1-level deep critical section nesting:
 kernel: ----------------------------------------
 kernel: .. [<c0149d2c>] .... add_preempt_count+0x1c/0x20
 kernel: .....[<c014b0d7>] ..   ( <= print_traces+0x17/0x60)
 kernel:
 kernel: ------------------------------
 kernel: | showing all locks held by: |  (uname/983 [ee9fa720, 114]):
 kernel: ------------------------------
 kernel:

--- linux-2.6.13-rc4-RT-V0.7.52-02/drivers/block/cfq-iosched.c~	2005-07-30 20:45:57.000000000 +0200
+++ linux-2.6.13-rc4-RT-V0.7.52-02/drivers/block/cfq-iosched.c	2005-07-31 09:38:35.000000000 +0200
@@ -1376,7 +1376,7 @@
 	struct cfq_data *cfqd = cic->cfqq->cfqd;
 	request_queue_t *q = cfqd->queue;
 
-	WARN_ON(!irqs_disabled());
+	WARN_ON_NONRT(!irqs_disabled());
 
 	spin_lock(q->queue_lock);
 
@@ -1400,7 +1400,7 @@
 	struct list_head *entry;
 	unsigned long flags;
 
-	local_irq_save(flags);
+	local_irq_save_nort(flags);
 
 	/*
 	 * put the reference this task is holding to the various queues
@@ -1411,7 +1411,7 @@
 	}
 
 	cfq_exit_single_io_context(cic);
-	local_irq_restore(flags);
+	local_irq_restore_nort(flags);
 }
 
 static struct cfq_io_context *


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-31  8:03     ` Peter Zijlstra
@ 2005-07-31 10:44       ` Ingo Molnar
  2005-07-31 15:56         ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-05 Gene Heskett
  0 siblings, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-07-31 10:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> because end_buffer_async_read/write use bit_spin_(un)lock and I do not 
> know how those interact with -RT.

bit_spin_lock is preemptible too - but it's not a too nice construct.  
What seems to have happened in your trace is that local_irq_disable() 
was used too in combination with bit-spinlocks, and a spinlock was taken 
from within it. The best fix would be to get rid of bit-spinlocks ...

	Ingo
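
For readers unfamiliar with the construct under discussion, here is a
minimal userspace sketch of a bit spinlock. It is not the kernel's
bit_spin_lock() implementation; the sketch_* names are hypothetical and
C11 atomics stand in for the kernel's bitops. It shows one property that
may explain why the construct is awkward on -RT: the lock is a single bit
inside a word that also carries other flags, so there is no room for an
owner field or a wait queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace sketch only (hypothetical names, not kernel code).
 * One bit of a shared flags word doubles as the lock, much as
 * BH_State lives inside bh->b_state.
 */

/* Try once: atomically set the bit; we own the lock if it was clear. */
static bool sketch_bit_trylock(atomic_ulong *word, unsigned int bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old;

	old = atomic_fetch_or_explicit(word, mask, memory_order_acquire);
	return !(old & mask);
}

/* Busy-wait until the bit is ours. There is no owner to boost and
 * no queue to sleep on, only the bit itself. */
static void sketch_bit_lock(atomic_ulong *word, unsigned int bit)
{
	while (!sketch_bit_trylock(word, bit))
		;
}

/* Clear the lock bit, leaving the other flag bits untouched. */
static void sketch_bit_unlock(atomic_ulong *word, unsigned int bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old;

	old = atomic_fetch_and_explicit(word, ~mask, memory_order_release);
	assert(old & mask);	/* unlocking a bit that was not held is a bug */
	(void)old;
}
```

A waiter in this sketch can only spin. Code that disables interrupts
around such a lock and then takes a lock that may sleep on -RT would
presumably trigger the "scheduling with irqs disabled" report quoted
above.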


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-05
  2005-07-31 10:44       ` Ingo Molnar
@ 2005-07-31 15:56         ` Gene Heskett
  0 siblings, 0 replies; 66+ messages in thread
From: Gene Heskett @ 2005-07-31 15:56 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar

On Sunday 31 July 2005 06:44, Ingo Molnar wrote:
>* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> because end_buffer_async_read/write use bit_spin_(un)lock and I do
>> not know how those interact with -RT.
>
>bit_spin_lock is preemptible too - but it's not a too nice
> construct. What seems to have happened in your trace is that
> local_irq_disable() was used too in combination with bit-spinlocks,
> and a spinlock was taken from within it. The best fix would be to
> get rid of bit-spinlocks ...
>
> Ingo

And that refuses to build here, mode 4 for 52-05:

fs/nfsd/nfs4state.c:125: error: `SPIN_LOCK_UNLOCKED' undeclared here 
(not in a function)
make[2]: *** [fs/nfsd/nfs4state.o] Error 1
make[1]: *** [fs/nfsd] Error 2
make: *** [fs] Error 2

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-31  6:38       ` Ingo Molnar
@ 2005-08-01  4:45         ` Lee Revell
  2005-08-01 21:08           ` Ingo Molnar
  2005-08-02 13:56           ` Steven Rostedt
  0 siblings, 2 replies; 66+ messages in thread
From: Lee Revell @ 2005-08-01  4:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel

On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> ok - i've uploaded the -52-04 patch, does that fix it for you?

Has anyone found their PS2 keyboard rather sluggish with this kernel?
I'm not sure whether it's an -RT problem, I'll have to try rc4.

Lee



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 16:03 [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Ingo Molnar
  2005-07-30 20:47 ` Peter Zijlstra
@ 2005-08-01 18:22 ` Steven Rostedt
  2005-08-01 19:49   ` Steven Rostedt
                     ` (2 more replies)
  2005-08-02 14:53 ` Steven Rostedt
  2005-08-04 12:20 ` Andrzej Nowak
  3 siblings, 3 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-01 18:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Ingo,

What's with the "BUG: possible soft lockup detected on CPU..."? I'm
getting a bunch of them from the IDE interrupt.  It's not locking up,
but it does things that probably do take some time.  Is this really
necessary? Here's an example dump:

-- Steve

Note: I added the curr=%s:%d,current->comm,current->pid just to see who
was at fault. 

BUG: possible soft lockup detected on CPU#0! 578977-577975(578975)
curr=IRQ 14:713
 [<c010410f>] dump_stack+0x1f/0x30 (20)
 [<c01441e2>] softlockup_tick+0x172/0x1a0 (44)
 [<c0125d32>] update_process_times+0x62/0x140 (28)
 [<c010861d>] timer_interrupt+0x4d/0x100 (20)
 [<c014450f>] handle_IRQ_event+0x6f/0x120 (48)
 [<c014469c>] __do_IRQ+0xdc/0x1a0 (48)
 [<c0105abe>] do_IRQ+0x4e/0x90 (28)
 [<c0103b63>] common_interrupt+0x1f/0x24 (64)
 [<c02bddbe>] ata_input_data+0xbe/0xd0 (36)
 [<c02c29e1>] taskfile_input_data+0x31/0x60 (32)
 [<c02c3178>] ide_pio_sector+0xc8/0xf0 (36)
 [<c02c31f0>] ide_pio_multi+0x50/0x70 (28)
 [<c02c344e>] task_in_intr+0xfe/0x120 (36)
 [<c02bd450>] ide_intr+0x80/0x170 (36)
 [<c014450f>] handle_IRQ_event+0x6f/0x120 (48)
 [<c0144e5d>] do_hardirq+0x6d/0x150 (40)
 [<c0144fa9>] do_irqd+0x69/0xa0 (28)
 [<c013255e>] kthread+0xae/0xc0 (44)
 [<c01011ed>] kernel_thread_helper+0x5/0x18 (1052794908)
---------------------------
| preempt count: 20010003 ]
| 3-level deep critical section nesting:
----------------------------------------
.. [<c013d49a>] .... add_preempt_count+0x1a/0x20
.....[<c01085ec>] ..   ( <= timer_interrupt+0x1c/0x100)
.. [<c013d49a>] .... add_preempt_count+0x1a/0x20
.....[<c0144199>] ..   ( <= softlockup_tick+0x129/0x1a0)
.. [<c013d49a>] .... add_preempt_count+0x1a/0x20
.....[<c013e727>] ..   ( <= print_traces+0x17/0x50)

------------------------------
| showing all locks held by: |  (IRQ 14/713 [c13e2d10,  53]):
------------------------------




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 18:22 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
@ 2005-08-01 19:49   ` Steven Rostedt
  2005-08-01 20:52   ` Ingo Molnar
  2005-08-01 21:20   ` Daniel Walker
  2 siblings, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-01 19:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephen C. Tweedie, linux-kernel

Ingo,

Here's a conversion of the bit_spin_locks to wait_on_bit.
Unfortunately, this doesn't have PI but it is better than just a normal
bit_spin_lock.

This patch applies cleanly to 2.6.13-rc3 (that's what I tried it on).  I
haven't done any benchmarking and I only booted this on a RT UP machine
so far. The RT part uses this portion.

So Stephen,  please take a look and let me know if this is something
that is suitable for the mainline?

Thanks,

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux_realtime_ernie/include/linux/jbd.h
===================================================================
--- linux_realtime_ernie/include/linux/jbd.h	(revision 266)
+++ linux_realtime_ernie/include/linux/jbd.h	(working copy)
@@ -324,36 +324,88 @@
 	return bh->b_private;
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+
+extern int jbd_lock_bh_sleep(void *notused);
+
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	wait_on_bit_lock(&bh->b_state,BH_State,&jbd_lock_bh_sleep,
+			 TASK_UNINTERRUPTIBLE);
+	__acquire(bitlock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	if (test_and_set_bit(BH_State, &bh->b_state))
+		return 0;
+	__acquire(bitlock);
+	return 1;
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return test_bit(BH_State, &bh->b_state);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	clear_bit(BH_State, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_State);
+	__release(bitlock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	wait_on_bit_lock(&bh->b_state, BH_JournalHead, &jbd_lock_bh_sleep,
+			 TASK_UNINTERRUPTIBLE);
+	__acquire(bitlock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	clear_bit(BH_JournalHead, &bh->b_state);
+	smp_mb__after_clear_bit();
+	wake_up_bit(&bh->b_state, BH_JournalHead);
+	__release(bitlock);
 }
 
+#else  /* ! (CONFIG_SMP || CONFIG_DEBUG_SPINLOCK || CONFIG_PREEMPT) */
+
+static inline void jbd_lock_bh_state(struct buffer_head *bh)
+{
+	__acquire(journal_bh_state_lock);
+}
+
+static inline int jbd_trylock_bh_state(struct buffer_head *bh)
+{
+	__acquire(journal_bh_state_lock);
+	return 1;
+}
+
+static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
+{
+	return 1;
+}
+
+static inline void jbd_unlock_bh_state(struct buffer_head *bh)
+{
+	__release(journal_bh_state_lock);
+}
+
+static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
+{
+	__acquire(journal_bh_journal_lock);
+}
+
+static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
+{
+	__release(journal_bh_journal_lock);
+}
+#endif /* (CONFIG_SMP || CONFIG_DEBUG_SPINLOCK || CONFIG_PREEMPT) */
+
+
 struct jbd_revoke_table_s;
 
 /**
Index: linux_realtime_ernie/fs/jbd/journal.c
===================================================================
--- linux_realtime_ernie/fs/jbd/journal.c	(revision 266)
+++ linux_realtime_ernie/fs/jbd/journal.c	(working copy)
@@ -80,6 +80,14 @@
 EXPORT_SYMBOL(journal_try_to_free_buffers);
 EXPORT_SYMBOL(journal_force_commit);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT)
+int jbd_lock_bh_sleep(void *notused)
+{
+	schedule();
+	return 0;
+}
+#endif
+
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 
 /*




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 18:22 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
  2005-08-01 19:49   ` Steven Rostedt
@ 2005-08-01 20:52   ` Ingo Molnar
  2005-08-01 21:09     ` Daniel Walker
  2005-08-01 21:15     ` Steven Rostedt
  2005-08-01 21:20   ` Daniel Walker
  2 siblings, 2 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-08-01 20:52 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, dwalker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
> 
> What's with the "BUG: possible soft lockup detected on CPU..."? I'm 
> getting a bunch of them from the IDE interrupt.  It's not locking up, 
> but it does things that probably do take some time.  Is this really 
> necessary? Here's an example dump:

doh - it's Daniel not Cc:-ing lkml when sending me patches, so people 
dont know what's going on ...

here's the patch below. Could you try to revert it?

	Ingo

On Sun, 2005-07-31 at 20:27 +0200, Ingo Molnar wrote:
> looks good, but i'd suggest to use printk_ratelimit(). (and the use of 
> u16 can be a performance hit on x86 due to potential 16-bit prefixes - 
> the best thing to use is an 'int' on pretty much every arch. with 
> printk_ratelimit() this flag goes away anyway.)


Ok, here's with your suggestions.


Index: linux-2.6.12/kernel/softlockup.c
===================================================================
--- linux-2.6.12.orig/kernel/softlockup.c	2005-07-31 15:31:09.000000000 +0000
+++ linux-2.6.12/kernel/softlockup.c	2005-07-31 18:43:35.000000000 +0000
@@ -9,6 +9,7 @@
 
 #include <linux/mm.h>
 #include <linux/cpu.h>
+#include <linux/sched.h>
 #include <linux/init.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
@@ -19,6 +20,7 @@ static DEFINE_RAW_SPINLOCK(print_lock);
 static DEFINE_PER_CPU(unsigned long, timeout) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, timestamp) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, print_timestamp) = INITIAL_JIFFIES;
+static DEFINE_PER_CPU(struct task_struct *, prev_task);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
 
 static int did_panic = 0;
@@ -56,6 +58,23 @@ void softlockup_tick(void)
 		if (!per_cpu(watchdog_task, this_cpu))
 			return;
 
+		if (per_cpu(prev_task, this_cpu) != current || 
+			!rt_task(current)) {
+			per_cpu(prev_task, this_cpu) = current;
+		}
+		else if (printk_ratelimit()) {
+
+			spin_lock(&print_lock);
+			printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
+				this_cpu, jiffies, timestamp, timeout);
+			dump_stack();
+#if defined(__i386__) && defined(CONFIG_SMP)
+			nmi_show_all_regs();
+#endif
+			spin_unlock(&print_lock);
+
+		}
+
 		wake_up_process(per_cpu(watchdog_task, this_cpu));
 		per_cpu(timeout, this_cpu) = jiffies + msecs_to_jiffies(1000);
 	}
@@ -71,7 +90,7 @@ void softlockup_tick(void)
 		per_cpu(print_timestamp, this_cpu) = timestamp;
 
 		spin_lock(&print_lock);
-		printk(KERN_ERR "BUG: soft lockup detected on CPU#%d! %ld-%ld(%ld)\n",
+		printk(KERN_ERR "BUG: soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
 			this_cpu, jiffies, timestamp, timeout);
 		dump_stack();
 #if defined(__i386__) && defined(CONFIG_SMP)



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01  4:45         ` Lee Revell
@ 2005-08-01 21:08           ` Ingo Molnar
  2005-08-01 21:12             ` Ingo Molnar
  2005-08-02 13:56           ` Steven Rostedt
  1 sibling, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-08-01 21:08 UTC (permalink / raw)
  To: Lee Revell; +Cc: Peter Zijlstra, linux-kernel


* Lee Revell <rlrevell@joe-job.com> wrote:

> On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > ok - i've uploaded the -52-04 patch, does that fix it for you?
> 
> Has anyone found their PS2 keyboard rather sluggish with this kernel? 
> I'm not sure whether it's an -RT problem, I'll have to try rc4.

hm, i've got no other reports about that.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 20:52   ` Ingo Molnar
@ 2005-08-01 21:09     ` Daniel Walker
  2005-08-01 21:15       ` Ingo Molnar
  2005-08-02  0:43       ` Steven Rostedt
  2005-08-01 21:15     ` Steven Rostedt
  1 sibling, 2 replies; 66+ messages in thread
From: Daniel Walker @ 2005-08-01 21:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Steven Rostedt, linux-kernel

On Mon, 2005-08-01 at 22:52 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Ingo,
> > 
> > What's with the "BUG: possible soft lockup detected on CPU..."? I'm 
> > getting a bunch of them from the IDE interrupt.  It's not locking up, 
> > but it does things that probably do take some time.  Is this really 
> > necessary? Here's an example dump:
> 
> doh - it's Daniel not Cc:-ing lkml when sending me patches, so people 
> dont know what's going on ...
> 
> here's the patch below. Could you try to revert it?

You guys want me to always CC in the future? 

Daniel



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 21:08           ` Ingo Molnar
@ 2005-08-01 21:12             ` Ingo Molnar
  0 siblings, 0 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-08-01 21:12 UTC (permalink / raw)
  To: Lee Revell; +Cc: Peter Zijlstra, linux-kernel


> * Lee Revell <rlrevell@joe-job.com> wrote:
> 
> > On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > > ok - i've uploaded the -52-04 patch, does that fix it for you?
> > 
> > Has anyone found their PS2 keyboard rather sluggish with this kernel? 
> > I'm not sure whether it's an -RT problem, I'll have to try rc4.

There was one irq-redirection change done recently, i've undone it in 
-52-09, does it work any better?

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 21:09     ` Daniel Walker
@ 2005-08-01 21:15       ` Ingo Molnar
  2005-08-02  0:43       ` Steven Rostedt
  1 sibling, 0 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-08-01 21:15 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Steven Rostedt, linux-kernel


* Daniel Walker <dwalker@mvista.com> wrote:

> > here's the patch below. Could you try to revert it?
> 
> You guys want me to always CC in the future?

well if it's somewhat larger than a trivial fix then it would definitely 
be useful to always Cc: lkml. Trivial fixes can go to lkml too, just in 
case i dont upload fast enough and someone else wants the fix too.  
Generally, Cc:-ing the mailing list also puts less of a burden on me, 
because others might find flaws in patches i dont spot right away.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 20:52   ` Ingo Molnar
  2005-08-01 21:09     ` Daniel Walker
@ 2005-08-01 21:15     ` Steven Rostedt
  2005-08-01 21:23       ` Ingo Molnar
  1 sibling, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-01 21:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, dwalker

On Mon, 2005-08-01 at 22:52 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Ingo,
> > 
> > What's with the "BUG: possible soft lockup detected on CPU..."? I'm 
> > getting a bunch of them from the IDE interrupt.  It's not locking up, 
> > but it does things that probably do take some time.  Is this really 
> > necessary? Here's an example dump:
> 
> doh - it's Daniel not Cc:-ing lkml when sending me patches, so people 
> dont know what's going on ...
> 
> here's the patch below. Could you try to revert it?

Thanks Ingo.

If Daniel was trying to detect soft lockups of lower priority tasks
(tasks that block all tasks lower than themselves), I've added a counter
to Daniel's patch to keep from showing this for the one-time case.  This
doesn't spit anything out for me anymore.  But I guess this could detect
a higher priority task blocking lower ones, as long as higher tasks
don't run often (thus resetting the count).
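
A minimal user-space sketch of the counting scheme described above, for
readers who want to trace it without a kernel tree. It is a model under
assumptions, not the kernel code: the names model_task, model_cpu,
MODEL_NR_CPUS and softlockup_check() are hypothetical, a plain array
stands in for the per-CPU variables, each call stands for one watchdog
check whose timeout has already expired, and the printk_ratelimit()
gate is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS    4   /* hypothetical CPU count for this model */
#define REPORT_THRESHOLD 10  /* mirrors the "> 10" test in the patch */

/* Stand-in for the two task properties the check looks at. */
struct model_task {
	int pid;
	bool rt;	/* what rt_task() would report for this task */
};

/* Stand-in for the per-CPU prev_task and task_counter variables. */
struct model_cpu {
	const struct model_task *prev_task;
	unsigned long task_counter;
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];

/*
 * Model of one watchdog check on the given CPU.
 * Returns true when the soft lockup warning would be printed.
 */
static bool softlockup_check(unsigned int cpu, const struct model_task *curr)
{
	struct model_cpu *s;

	assert(cpu < MODEL_NR_CPUS);
	s = &model_cpus[cpu];

	/* A different task, or a non-RT task: remember it and start over. */
	if (s->prev_task != curr || !curr->rt) {
		s->prev_task = curr;
		s->task_counter = 0;
		return false;
	}

	/* Same RT task again: report only once the count passes the threshold. */
	if (++s->task_counter > REPORT_THRESHOLD) {
		s->task_counter = 0;
		return true;
	}
	return false;
}
```

In this model an RT task has to be seen on twelve consecutive checks
before the first report: one check records it, ten more are counted
silently, and the next one crosses the threshold.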

-- Steve

Index: linux_realtime_ernie/kernel/softlockup.c
===================================================================
--- linux_realtime_ernie/kernel/softlockup.c	(revision 266)
+++ linux_realtime_ernie/kernel/softlockup.c	(working copy)
@@ -22,6 +22,7 @@
 static DEFINE_PER_CPU(unsigned long, print_timestamp) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(struct task_struct *, prev_task);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
+static DEFINE_PER_CPU(unsigned long, task_counter);
 
 static int did_panic = 0;
 static int softlock_panic(struct notifier_block *this, unsigned long event,
@@ -61,18 +62,21 @@
 		if (per_cpu(prev_task, this_cpu) != current || 
 			!rt_task(current)) {
 			per_cpu(prev_task, this_cpu) = current;
+			per_cpu(task_counter, this_cpu) = 0;
 		}
-		else if (printk_ratelimit()) {
+		else if ((++per_cpu(task_counter, this_cpu) > 10) && printk_ratelimit()) {
 
 			spin_lock(&print_lock);
 			printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
 				this_cpu, jiffies, timestamp, timeout);
+			printk("curr=%s:%d\n",current->comm,current->pid);
+			
 			dump_stack();
 #if defined(__i386__) && defined(CONFIG_SMP)
 			nmi_show_all_regs();
 #endif
 			spin_unlock(&print_lock);
-
+			per_cpu(task_counter, this_cpu) = 0;
 		}
 
 		wake_up_process(per_cpu(watchdog_task, this_cpu));
@@ -97,6 +101,7 @@
 		nmi_show_all_regs();
 #endif
 		spin_unlock(&print_lock);
+		per_cpu(task_counter, this_cpu) = 0;
 	}
 }
 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 18:22 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
  2005-08-01 19:49   ` Steven Rostedt
  2005-08-01 20:52   ` Ingo Molnar
@ 2005-08-01 21:20   ` Daniel Walker
  2005-08-02  0:53     ` Steven Rostedt
  2005-08-02  3:55     ` Steven Rostedt
  2 siblings, 2 replies; 66+ messages in thread
From: Daniel Walker @ 2005-08-01 21:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel

On Mon, 2005-08-01 at 14:22 -0400, Steven Rostedt wrote:
> Ingo,
> 
> What's with the "BUG: possible soft lockup detected on CPU..."? I'm
> getting a bunch of them from the IDE interrupt.  It's not locking up,
> but it does things that probably do take some time.  Is this really
> necessary? Here's an example dump:
> 
> -- Steve
> 
> Note: I added the curr=%s:%d,current->comm,current->pid just to see who
> was at fault. 

It means that IRQ 14 is running for a long time as an RT task. Btw,
the curr=%s:%d information duplicates some of what is in the "show all
held locks" section.

I could base it off current_sched_time() to only trigger if the task has
actually been running for 10 seconds, instead of just assuming that it
has..
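
The message does not say how the check would use current_sched_time(),
so the following is only one possible reading, sketched in user space:
remember when the task was last switched onto the CPU and warn only if
it has been on the CPU for the full 10 seconds since then. All names
here (model_task, model_switch_in, model_ran_too_long) are hypothetical,
and timestamps are passed in as plain nanosecond values instead of being
read from the scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCKUP_THRESHOLD_NS (10ULL * 1000000000ULL)	/* 10 seconds */

/* Stand-in for the scheduler state this sketch needs. */
struct model_task {
	int pid;
	uint64_t switched_in_ns;	/* when the task last got the CPU */
};

/* Called by the model "scheduler" whenever the task is switched in. */
static void model_switch_in(struct model_task *t, uint64_t now_ns)
{
	t->switched_in_ns = now_ns;
}

/* True if the task has held the CPU for at least the threshold. */
static bool model_ran_too_long(const struct model_task *t, uint64_t now_ns)
{
	assert(now_ns >= t->switched_in_ns);
	return now_ns - t->switched_in_ns >= LOCKUP_THRESHOLD_NS;
}
```

Because the start time is rewritten on every switch-in, a task that is
scheduled out even briefly starts its 10 seconds again in this model.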

Daniel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 21:15     ` Steven Rostedt
@ 2005-08-01 21:23       ` Ingo Molnar
  0 siblings, 0 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-08-01 21:23 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, dwalker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > here's the patch below. Could you try to revert it?
> 
> Thanks Ingo.
> 
> If Daniel was trying to detect soft lockups of lower priority tasks 
> (tasks that block all tasks lower than themselves), I've added a counter 
> to Daniel's patch to keep from showing this for the one-time case.  
> This doesn't spit anything out for me anymore.  But I guess this could 
> detect a higher priority task blocking lower ones, as long as higher 
> tasks don't run often (thus resetting the count).

thanks. In -52-09 i've unapplied the original patch, and i've now 
uploaded -52-10 with Daniel's original patch plus your patch applied.

I think 10 seconds is pretty reasonable - if an RT task runs 
uninterrupted for that long time i think we want to know about it. It's 
not illegal for an RT task to monopolize the CPU for that long, but it's 
certainly unusual enough to warn about. (and the warning can be turned 
off)

	Ingo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 21:09     ` Daniel Walker
  2005-08-01 21:15       ` Ingo Molnar
@ 2005-08-02  0:43       ` Steven Rostedt
  1 sibling, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02  0:43 UTC (permalink / raw)
  To: dwalker; +Cc: Ingo Molnar, linux-kernel

On Mon, 2005-08-01 at 14:09 -0700, Daniel Walker wrote:
> 
> You guys want me to always CC in the future? 

Yes, please CC the LKML.  I try to for all updates since I might have
made a mistake in my code that my shallow tests don't catch, and others
might. Also to let others know what I'm suggesting to Ingo so they may
comment as well. I've had some pretty good comments from people, as well
as just deeper thoughts in what's being changed.  I also try to see
what's being proposed to Ingo's patch to make sure that I understand
what my code will be depending on.

Your wakeup race patch is a perfect example of something that should
definitely be CC'd to LKML.

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 21:20   ` Daniel Walker
@ 2005-08-02  0:53     ` Steven Rostedt
  2005-08-02 10:19       ` Ingo Molnar
  2005-08-02  3:55     ` Steven Rostedt
  1 sibling, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02  0:53 UTC (permalink / raw)
  To: dwalker; +Cc: Ingo Molnar, linux-kernel

On Mon, 2005-08-01 at 14:20 -0700, Daniel Walker wrote:
> On Mon, 2005-08-01 at 14:22 -0400, Steven Rostedt wrote:
> > Ingo,
> > 
> > What's with the "BUG: possible soft lockup detected on CPU..."? I'm
> > getting a bunch of them from the IDE interrupt.  It's not locking up,
> > but it does things that probably do take some time.  Is this really
> > necessary? Here's an example dump:
> > 
> > -- Steve
> > 
> > Note: I added the curr=%s:%d,current->comm,current->pid just to see who
> > was at fault. 
> 
> It means that IRQ 14 is running for a long time as an RT task .. btw,
> the curr=%s:%d information duplicates some in the "show all held locks"
> section .

yeah I know that was redundant (after putting it in), but I wanted to
make sure what current was. The held-locks output wasn't as
straightforward about what was current (I wasn't looking at what
produced that, I just noticed the output).

> 
> I could base it off current_sched_time() to only trigger if the task has
> actually been running for 10 seconds, instead of just assuming that it
> has..

I thought about changing that too. But I'm assuming that you are looking
for bugs (like the kjournald as RT) where a task may be in a loop, but
higher priority tasks can still preempt it.  Putting the check elsewhere
will still be screwed up by preempting higher prio tasks.

In my custom kernel, I have a wchan field of the task that records where
the task calls something that might schedule. This way I can see where
things locked up if I don't have a back trace of the task.  This field
is always zero when it switches to usermode.  Something like this can
also be used to check how long the process is in kernel mode.  If a task
is in the kernel for more than 10 seconds without sleeping, that would
definitely be a good indication of something wrong.  I probably could
write something to check for this if people are interested.  I won't
waste my time if nobody would want it.
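
A rough user-space sketch of the "in the kernel for more than 10
seconds without sleeping" test, to make the idea concrete. This is not
the code from the custom kernel mentioned above: the struct and function
names are hypothetical, a boolean plus a timestamp stand in for the
wchan-style field, and the caller supplies the time in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_LIMIT_NS (10ULL * 1000000000ULL)	/* 10 seconds */

/* Per-task bookkeeping for this model. */
struct model_task {
	bool in_kernel;		/* false in user mode or while asleep */
	uint64_t since_ns;	/* start of the current uninterrupted kernel run */
};

/* Call on kernel entry, and again when the task wakes up in kernel code. */
static void model_kernel_run_start(struct model_task *t, uint64_t now_ns)
{
	t->in_kernel = true;
	t->since_ns = now_ns;
}

/* Call when the task goes to sleep or returns to user mode. */
static void model_kernel_run_end(struct model_task *t)
{
	t->in_kernel = false;
}

/* True if the task has been in the kernel too long without sleeping. */
static bool model_stuck_in_kernel(const struct model_task *t, uint64_t now_ns)
{
	if (!t->in_kernel)
		return false;
	assert(now_ns >= t->since_ns);
	return now_ns - t->since_ns > KERNEL_LIMIT_NS;
}
```

Unlike a check that only looks at which task happens to be current at
each tick, this one is reset only by a sleep or a return to user mode,
so being preempted by a higher priority task does not clear it.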

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01 21:20   ` Daniel Walker
  2005-08-02  0:53     ` Steven Rostedt
@ 2005-08-02  3:55     ` Steven Rostedt
  2005-08-02  4:07       ` Daniel Walker
  1 sibling, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02  3:55 UTC (permalink / raw)
  To: dwalker; +Cc: Ingo Molnar, linux-kernel

On Mon, 2005-08-01 at 14:20 -0700, Daniel Walker wrote:
> It means that IRQ 14 is running for a long time as an RT task 

Oh yeah, I forgot to comment on this.  Yes IRQ 14 is rather slow. It's
the IDE drive interrupt and it gets pretty busy.  Actually the check
doesn't really see if it is running for a long time, since it gets
scheduled out.  But I'm running this on a slow 368MHz machine and it
takes some time. There are cases where every second the interrupt just
happened to be running, since that is what it checks.  It doesn't check
to see if the thread actually sleeps.

I may add something to your patch to see if a thread actually goes to
sleep. If it doesn't, then flag it as possibly stuck.

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02  3:55     ` Steven Rostedt
@ 2005-08-02  4:07       ` Daniel Walker
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Walker @ 2005-08-02  4:07 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel

On Mon, 2005-08-01 at 23:55 -0400, Steven Rostedt wrote:
> On Mon, 2005-08-01 at 14:20 -0700, Daniel Walker wrote:
> > It means that IRQ 14 is running for a long time as an RT task 
> 
> Oh yeah, I forgot to comment on this.  Yes IRQ 14 is rather slow. It's
> the IDE drive interrupt and it gets pretty busy.  Actually the check
> doesn't really see if it is running for a long time, since it gets
> scheduled out.  But I'm running this on a slow 368MHz machine and it
> takes some time. There are cases where every second the interrupt just
> happened to be running, since that is what it checks.  It doesn't check
> to see if the thread actually sleeps.
> 
> I may add something to your patch to see if a thread actually goes to
> sleep. If it doesn't, then flag it as possibly stuck.

I was offering to do that earlier, but I assumed your other patch was
sufficient. Feel free to add it if you think it's needed.

Daniel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02  0:53     ` Steven Rostedt
@ 2005-08-02 10:19       ` Ingo Molnar
  2005-08-02 19:45         ` Steven Rostedt
  0 siblings, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-08-02 10:19 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: dwalker, linux-kernel


* Steven Rostedt <rostedt@goodmis.org> wrote:

> In my custom kernel, I have a wchan field of the task that records 
> where the task calls something that might schedule. This way I can see 
> where things locked up if I don't have a back trace of the task.  This 
> field is always zero when it switches to usermode.  Something like 
> this can also be used to check how long the process is in kernel mode.  
> If a task is in the kernel for more than 10 seconds without sleeping, 
> that would definitely be a good indication of something wrong.  I 
> probably could write something to check for this if people are 
> interested.  I won't waste my time if nobody would want it.

this would be a pretty useful extension of the softlockup checker!

	Ingo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-01  4:45         ` Lee Revell
  2005-08-01 21:08           ` Ingo Molnar
@ 2005-08-02 13:56           ` Steven Rostedt
  2005-08-02 14:05             ` Lee Revell
  1 sibling, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02 13:56 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Mon, 2005-08-01 at 00:45 -0400, Lee Revell wrote:
> On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > ok - i've uploaded the -52-04 patch, does that fix it for you?
> 
> Has anyone found their PS2 keyboard rather sluggish with this kernel?
> I'm not sure whether it's an -RT problem, I'll have to try rc4.

I've just noticed this now. While I have lots of ssh sessions running,
my keyboard does get really sluggish. This hasn't happened before. I'm
currently running 2.6.13-rc3 with no RT.  So this may well be a
mainline issue.

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 13:56           ` Steven Rostedt
@ 2005-08-02 14:05             ` Lee Revell
  2005-08-02 14:20               ` Steven Rostedt
  0 siblings, 1 reply; 66+ messages in thread
From: Lee Revell @ 2005-08-02 14:05 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 2005-08-02 at 09:56 -0400, Steven Rostedt wrote:
> On Mon, 2005-08-01 at 00:45 -0400, Lee Revell wrote:
> > On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > > ok - i've uploaded the -52-04 patch, does that fix it for you?
> > 
> > Has anyone found their PS2 keyboard rather sluggish with this kernel?
> > I'm not sure whether it's an -RT problem, I'll have to try rc4.
> 
> I've just noticed this now. While I have lots of ssh sessions running,
> my keyboard does get really sluggish. This hasn't happened before. I'm
> currently running 2.6.13-rc3 with no RT.  So this may definitely be a
> mainline issue.

I'm on a slower machine, and I seem to get this behavior regardless of
load.  Probably just running X+Gnome on this box is enough.

Are you in any position to do a binary search?  It would be really bad
to release 2.6.13 with this problem...

Lee


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 14:05             ` Lee Revell
@ 2005-08-02 14:20               ` Steven Rostedt
  2005-08-02 15:37                 ` 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01) Lee Revell
  2005-08-02 15:38                 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Lee Revell
  0 siblings, 2 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02 14:20 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 2005-08-02 at 10:05 -0400, Lee Revell wrote:
> On Tue, 2005-08-02 at 09:56 -0400, Steven Rostedt wrote:
> > On Mon, 2005-08-01 at 00:45 -0400, Lee Revell wrote:
> > > On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > > > ok - i've uploaded the -52-04 patch, does that fix it for you?
> > > 
> > > Has anyone found their PS2 keyboard rather sluggish with this kernel?
> > > I'm not sure whether it's an -RT problem, I'll have to try rc4.
> > 
> > I've just noticed this now. While I have lots of ssh sessions running,
> > my keyboard does get really sluggish. This hasn't happened before. I'm
> > currently running 2.6.13-rc3 with no RT.  So this may definitely be a
> > mainline issue.
> 
> I'm on a slower machine, and I seem to get this behavior regardless of
> load.  Probably just running X+Gnome on this box is enough.
> 
> Are you in any position to do a binary search?  It would be really bad
> to release 2.6.13 with this problem...

Unfortunately no. I'm trying to finish a milestone that was due last
Friday, debug a problem that was found on my last milestone, and add a
feature to Ingo's RT patch. So I can't get to this until next week at
the earliest.

Also, I don't know if this is a kernel issue or a debian issue since I
updated my kernel at the same time I did a debian upgrade, and I'm using
debian unstable. Since debian unstable is going through some major
changes, this could be caused by that.  I may be able to try some other
machines to see if they are affected, but that might take some time
before I can get to it.

The machine that I noticed this on is an SMP AMD 2.1GHz with a gig of
RAM. So the keyboard shouldn't be affected by the screen display. But I
have had X problems with the latest release of debian unstable.

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 16:03 [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Ingo Molnar
  2005-07-30 20:47 ` Peter Zijlstra
  2005-08-01 18:22 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
@ 2005-08-02 14:53 ` Steven Rostedt
  2005-08-04 12:20 ` Andrzej Nowak
  3 siblings, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02 14:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Trivial patch.  Remove redundant unlikely.
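
To illustrate why the inner unlikely() adds nothing, here is a small
stand-alone sketch. It assumes the generic BUG_ON() already wraps its
condition in unlikely(), and it replaces BUG() with a counter so the
behaviour can be observed in user space; model_bug_hits and the two
underflow_check_* helpers are hypothetical names, and __builtin_expect
is the GCC/Clang builtin.

```c
/* Branch hint as GCC and Clang provide it. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int model_bug_hits;		/* counts BUG() calls in this model */
#define BUG() (model_bug_hits++)	/* stand-in: the real BUG() does not return */

/* Assumed shape of the generic BUG_ON(): the hint is already inside. */
#define BUG_ON(cond) do { if (unlikely(cond)) BUG(); } while (0)

/* Before the patch: the hint is applied twice. Returns 1 if BUG() fired. */
static int underflow_check_old(unsigned long val, unsigned long count)
{
	int before = model_bug_hits;

	BUG_ON(unlikely(val > count));
	return model_bug_hits - before;
}

/* After the patch: same test, hint applied once by BUG_ON() itself. */
static int underflow_check_new(unsigned long val, unsigned long count)
{
	int before = model_bug_hits;

	BUG_ON(val > count);
	return model_bug_hits - before;
}
```

Both helpers fire for exactly the same inputs; the nested form only
normalises an already normalised truth value a second time.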

Side note. Ingo, I've also completed the softdeadlock update patch, and
I'm right now trying to trigger the kjournald deadlock by upping it to
FIFO 30 and having non-RT tasks running find and compiles.  Since you
removed my inverted lock patch, this should cause the bug. This used to
cause the deadlock right away, but now I can't get it to lock.  Did you
change anything else? The jbd looks identical to 2.6.12.

-- Steve

Index: linux_realtime_ernie/kernel/latency.c
===================================================================
--- linux_realtime_ernie/kernel/latency.c	(revision 266)
+++ linux_realtime_ernie/kernel/latency.c	(working copy)
@@ -1481,7 +1481,7 @@
 	/*
 	 * Underflow?
 	 */
-	BUG_ON(unlikely(val > preempt_count_ti(ti)));
+	BUG_ON(val > preempt_count_ti(ti));
 
 	/*
 	 * Is the spinlock portion underflowing?



^ permalink raw reply	[flat|nested] 66+ messages in thread

* 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 14:20               ` Steven Rostedt
@ 2005-08-02 15:37                 ` Lee Revell
  2005-08-02 15:44                   ` Vojtech Pavlik
  2005-08-02 15:55                   ` Dmitry Torokhov
  2005-08-02 15:38                 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Lee Revell
  1 sibling, 2 replies; 66+ messages in thread
From: Lee Revell @ 2005-08-02 15:37 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Vojtech Pavlik

On Tue, 2005-08-02 at 10:20 -0400, Steven Rostedt wrote:
> On Tue, 2005-08-02 at 10:05 -0400, Lee Revell wrote:
> > On Tue, 2005-08-02 at 09:56 -0400, Steven Rostedt wrote:
> > > On Mon, 2005-08-01 at 00:45 -0400, Lee Revell wrote:
> > > > On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > > > > ok - i've uploaded the -52-04 patch, does that fix it for you?
> > > > 
> > > > Has anyone found their PS2 keyboard rather sluggish with this kernel?
> > > > I'm not sure whether it's an -RT problem, I'll have to try rc4.
> > > 
> > > I've just noticed this now. While I have lots of ssh sessions running,
> > > my keyboard does get really sluggish. This hasn't happened before. I'm
> > > currently running 2.6.13-rc3 with no RT.  So this may definitely be a
> > > mainline issue.
> > 
> > I'm on a slower machine, and I seem to get this behavior regardless of
> > load.  Probably just running X+Gnome on this box is enough.
> > 
> Also, I don't know if this is a kernel issue or a debian issue since I
> updated my kernel at the same time I did a debian upgrade, and I'm using
> debian unstable. Since debian unstable is going through some major
> changes, this could be caused by that.  I may be able to try some other
> machines to see if they are affected, but that might take some time
> before I can get to it.
> 

Same here (s/debian/ubuntu/), but I have the exact same problem at the
console, so I don't think it could be an X issue unless X was able to
wedge the keyboard controller.

It feels like typing over a slow modem link: I can get about one word
ahead of the cursor (X or console, regardless of load) but the delay
seems to be constant.

Lee


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 14:20               ` Steven Rostedt
  2005-08-02 15:37                 ` 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01) Lee Revell
@ 2005-08-02 15:38                 ` Lee Revell
  1 sibling, 0 replies; 66+ messages in thread
From: Lee Revell @ 2005-08-02 15:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 2005-08-02 at 10:20 -0400, Steven Rostedt wrote:
> > Are you in any position to do a binary search?  It would be really bad
> > to release 2.6.13 with this problem...
> 
> Unfortunately no. I'm trying to finish a milestone that was due last
> Friday, debug a problem that was found on my last milestone, and add a
> feature to Ingo's RT patch. So I can't get to this till at earliest next
> week.

OK I have time to try -rc1 then -rc2, hopefully this will nail it down.

Lee


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 15:37                 ` 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01) Lee Revell
@ 2005-08-02 15:44                   ` Vojtech Pavlik
  2005-08-02 15:46                     ` Lee Revell
  2005-08-02 15:47                     ` Lee Revell
  2005-08-02 15:55                   ` Dmitry Torokhov
  1 sibling, 2 replies; 66+ messages in thread
From: Vojtech Pavlik @ 2005-08-02 15:44 UTC (permalink / raw)
  To: Lee Revell; +Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, Aug 02, 2005 at 11:37:41AM -0400, Lee Revell wrote:
> On Tue, 2005-08-02 at 10:20 -0400, Steven Rostedt wrote:
> > On Tue, 2005-08-02 at 10:05 -0400, Lee Revell wrote:
> > > On Tue, 2005-08-02 at 09:56 -0400, Steven Rostedt wrote:
> > > > On Mon, 2005-08-01 at 00:45 -0400, Lee Revell wrote:
> > > > > On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > > > > > ok - i've uploaded the -52-04 patch, does that fix it for you?
> > > > > 
> > > > > Has anyone found their PS2 keyboard rather sluggish with this kernel?
> > > > > I'm not sure whether it's an -RT problem, I'll have to try rc4.
> > > > 
> > > > I've just noticed this now. While I have lots of ssh sessions running,
> > > > my keyboard does get really sluggish. This hasn't happened before. I'm
> > > > currently running 2.6.13-rc3 with no RT.  So this may definitely be a
> > > > mainline issue.
> > > 
> > > I'm on a slower machine, and I seem to get this behavior regardless of
> > > load.  Probably just running X+Gnome on this box is enough.
> > > 
> > Also, I don't know if this is a kernel issue or a debian issue since I
> > updated my kernel at the same time I did a debian upgrade, and I'm using
> > debian unstable. Since debian unstable is going through some major
> > changes, this could be caused by that.  I may be able to try some other
> > machines to see if they are affected, but that might take some time
> > before I can get to it.
> > 
> 
> Same here (s/debian/ubuntu/) but I have the exact same problem at the
> console, I don't think it could be an X issue unless X was able to wedge
> the keyboard controller.
> 
> It feels like typing over a slow modem link, I can get about one word
> ahead of the cursor (X or console, regardless of load) but the delay
> seems to be constant.
 
Is your keyboard interrupt (irq #1) working correctly? If not, then the
keyboard controller is polled at 20Hz to compensate for lost interrupts,
which would make it work, but if no interrupts work, it would seem like
typing over a slow link.

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 15:44                   ` Vojtech Pavlik
@ 2005-08-02 15:46                     ` Lee Revell
  2005-08-02 15:47                     ` Lee Revell
  1 sibling, 0 replies; 66+ messages in thread
From: Lee Revell @ 2005-08-02 15:46 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 2005-08-02 at 17:44 +0200, Vojtech Pavlik wrote:
> Is your keyboard interrupt (irq #1) working correctly? If not, then the
> keyboard controller is polled at 20Hz to compensate for lost interrupts,
> which would make it work, but if no interrupts work, it would seem like
> typing over a slow link.
> 

Bingo, no interrupts when I type.

What could cause this?  I was switching machines so I unplugged this
keyboard a few times since booting.

Lee


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 15:44                   ` Vojtech Pavlik
  2005-08-02 15:46                     ` Lee Revell
@ 2005-08-02 15:47                     ` Lee Revell
  2005-08-02 15:53                       ` Steven Rostedt
  2005-08-02 15:55                       ` Vojtech Pavlik
  1 sibling, 2 replies; 66+ messages in thread
From: Lee Revell @ 2005-08-02 15:47 UTC (permalink / raw)
  To: Vojtech Pavlik; +Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 2005-08-02 at 17:44 +0200, Vojtech Pavlik wrote:
> Is your keyboard interrupt (irq #1) working correctly? If not, then the
> keyboard controller is polled at 20Hz to compensate for lost interrupts,
> which would make it work, but if no interrupts work, it would seem like
> typing over a slow link.

I am an idiot.  The keyboard was plugged into the mouse port.

I'm impressed this worked at all.

Lee


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 15:47                     ` Lee Revell
@ 2005-08-02 15:53                       ` Steven Rostedt
  2005-08-02 15:55                       ` Vojtech Pavlik
  1 sibling, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02 15:53 UTC (permalink / raw)
  To: Lee Revell; +Cc: Vojtech Pavlik, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, 2005-08-02 at 11:47 -0400, Lee Revell wrote:
> On Tue, 2005-08-02 at 17:44 +0200, Vojtech Pavlik wrote:
> > Is your keyboard interrupt (irq #1) working correctly? If not, then the
> > keyboard controller is polled at 20Hz to compensate for lost interrupts,
> > which would make it work, but if no interrupts work, it would seem like
> > typing over a slow link.
> 
> I am an idiot.  The keyboard was plugged into the mouse port.
> 
> I'm impressed this worked at all.

:)

I guess this also makes the case that my sluggish keyboard is from the X
updates in debian. I wasn't able to get it to be sluggish at the
console, and it was only sluggish under X load.

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 15:47                     ` Lee Revell
  2005-08-02 15:53                       ` Steven Rostedt
@ 2005-08-02 15:55                       ` Vojtech Pavlik
  1 sibling, 0 replies; 66+ messages in thread
From: Vojtech Pavlik @ 2005-08-02 15:55 UTC (permalink / raw)
  To: Lee Revell; +Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Ingo Molnar

On Tue, Aug 02, 2005 at 11:47:13AM -0400, Lee Revell wrote:

> On Tue, 2005-08-02 at 17:44 +0200, Vojtech Pavlik wrote:
> > Is your keyboard interrupt (irq #1) working correctly? If not, then the
> > keyboard controller is polled at 20Hz to compensate for lost interrupts,
> > which would make it work, but if no interrupts work, it would seem like
> > typing over a slow link.
> 
> I am an idiot.  The keyboard was plugged into the mouse port.
> 
> I'm impressed this worked at all.
 
It would likely even work correctly if irq 12 was available and working
on the AUX port.

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01)
  2005-08-02 15:37                 ` 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01) Lee Revell
  2005-08-02 15:44                   ` Vojtech Pavlik
@ 2005-08-02 15:55                   ` Dmitry Torokhov
  1 sibling, 0 replies; 66+ messages in thread
From: Dmitry Torokhov @ 2005-08-02 15:55 UTC (permalink / raw)
  To: Lee Revell
  Cc: Steven Rostedt, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Vojtech Pavlik

On 8/2/05, Lee Revell <rlrevell@joe-job.com> wrote:
> On Tue, 2005-08-02 at 10:20 -0400, Steven Rostedt wrote:
> > On Tue, 2005-08-02 at 10:05 -0400, Lee Revell wrote:
> > > On Tue, 2005-08-02 at 09:56 -0400, Steven Rostedt wrote:
> > > > On Mon, 2005-08-01 at 00:45 -0400, Lee Revell wrote:
> > > > > On Sun, 2005-07-31 at 08:38 +0200, Ingo Molnar wrote:
> > > > > > ok - i've uploaded the -52-04 patch, does that fix it for you?
> > > > >
> > > > > Has anyone found their PS2 keyboard rather sluggish with this kernel?
> > > > > I'm not sure whether it's an -RT problem, I'll have to try rc4.
> > > >
> > > > I've just noticed this now. While I have lots of ssh sessions running,
> > > > my keyboard does get really sluggish. This hasn't happened before. I'm
> > > > currently running 2.6.13-rc3 with no RT.  So this may definitely be a
> > > > mainline issue.
> > >
> > > I'm on a slower machine, and I seem to get this behavior regardless of
> > > load.  Probably just running X+Gnome on this box is enough.
> > >
> > Also, I don't know if this is a kernel issue or a debian issue since I
> > updated my kernel at the same time I did a debian upgrade, and I'm using
> > debian unstable. Since debian unstable is going through some major
> > changes, this could be caused by that.  I may be able to try some other
> > machines to see if they are affected, but that might take some time
> > before I can get to it.
> >
> 
> Same here (s/debian/ubuntu/) but I have the exact same problem at the
> console, I don't think it could be an X issue unless X was able to wedge
> the keyboard controller.
> 
> It feels like typing over a slow modem link, I can get about one word
> ahead of the cursor (X or console, regardless of load) but the delay
> seems to be constant.
> 

Is this with ACPI? Have you tried playing with ec_polling parameter?

-- 
Dmitry

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 10:19       ` Ingo Molnar
@ 2005-08-02 19:45         ` Steven Rostedt
  2005-08-02 19:56           ` Steven Rostedt
  2005-08-02 23:38           ` Daniel Walker
  0 siblings, 2 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02 19:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: dwalker, linux-kernel

On Tue, 2005-08-02 at 12:19 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > In my custom kernel, I have a wchan field of the task that records 
> > where the task calls something that might schedule. This way I can see 
> > where things locked up if I don't have a back trace of the task.  This 
> > field is always zero when it switches to usermode.  Something like 
> > this can also be used to check how long the process is in kernel mode.  
> > If a task is in the kernel for more than 10 seconds without sleeping, 
> > that would definitely be a good indication of something wrong.  I 
> > probably could write something to check for this if people are 
> > interested.  I won't waste my time if nobody would want it.
> 
> this would be a pretty useful extension of the softlockup checker!

Here it is (Finally).  I just had to be patient with the kjournal
lockup.  I had to wait some time before the lockup occurred, but when it
did, I got my message out:
--------------------------------------------
BUG: possible soft lockup detected on CPU#0! 1314840-1313839(1314839)
curr=kjournald:734 count=11
 [<c010410f>] dump_stack+0x1f/0x30 (20)
 [<c01441e0>] softlockup_tick+0x170/0x1a0 (44)
 [<c0125d32>] update_process_times+0x62/0x140 (28)
 [<c010861d>] timer_interrupt+0x4d/0x100 (20)
 [<c014450f>] handle_IRQ_event+0x6f/0x120 (48)
 [<c014469c>] __do_IRQ+0xdc/0x1a0 (48)
 [<c0105abe>] do_IRQ+0x4e/0x90 (28)
 [<c0103b67>] common_interrupt+0x1f/0x24 (112)
 [<c01edc36>] journal_commit_transaction+0x1206/0x1430 (112)
 [<c01f06d0>] kjournald+0xd0/0x1e0 (84)
 [<c01011ed>] kernel_thread_helper+0x5/0x18 (825638940)
---------------------------
| preempt count: 20010003 ]
| 3-level deep critical section nesting:
----------------------------------------
.. [<c013d49a>] .... add_preempt_count+0x1a/0x20
.....[<c01085ec>] ..   ( <= timer_interrupt+0x1c/0x100)
.. [<c013d49a>] .... add_preempt_count+0x1a/0x20
.....[<c014418d>] ..   ( <= softlockup_tick+0x11d/0x1a0)
.. [<c013d49a>] .... add_preempt_count+0x1a/0x20
.....[<c013e727>] ..   ( <= print_traces+0x17/0x50)

------------------------------
| showing all locks held by: |  (kjournald/734 [c13e20b0,  69]):
------------------------------
-----------------------------------------------

This does NOT detect lockups in user space. If an RT user program gets
stuck in an infinite loop, it's their problem. This only detects lockups
in the kernel. I also don't distinguish whether the task stuck in the
kernel is an RT task or not; as long as it tries to sleep once in a
while, this message won't appear. I'm not aware of any kernel thread
that spins in the kernel, so I don't think this will be a problem. (I
forgot about swapper and it was showing up, hence the check for
current->pid :-).

Note this currently only works with i386. If other archs want this, they
need to add softlockup_count to their thread_info and reset it upon
returning to user space.  Then ARCH_HAS_SOFTLOCKUP_DETECT needs to be
defined. The counter is reset every time the kernel returns to user
space, and that includes returning from the timer interrupt.
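
The counter logic described above can be modeled in user space. What
follows is only a minimal sketch, not the patch itself: the struct and
the model_* names are hypothetical stand-ins for thread_info,
touch_light_softlockup_watchdog(), the entry.S hunk and the check in
softlockup_tick(), and it assumes the check runs roughly once a second.

```c
#include <stdbool.h>
#include <stdio.h>

#define SOFTLOCKUP_THRESHOLD 10UL	/* the patch tests "++count > 10" */

/* Hypothetical stand-in for the field the patch adds to thread_info. */
struct model_thread {
	const char	*comm;
	int		pid;			/* pid 0 models swapper, which is skipped */
	unsigned long	softlockup_count;
};

/* Models touch_light_softlockup_watchdog(): the task is going to sleep. */
static void model_touch(struct model_thread *t)
{
	t->softlockup_count = 0;
}

/* Models the entry.S hunk: zero the count when returning to user space. */
static void model_return_to_user(struct model_thread *t)
{
	t->softlockup_count = 0;
}

/* Models the periodic check; returns true when a report is printed. */
static bool model_softlockup_tick(struct model_thread *t)
{
	unsigned long count;

	if (t->pid == 0)
		return false;

	count = t->softlockup_count;
	if (++count > SOFTLOCKUP_THRESHOLD) {
		printf("possible soft lockup: curr=%s:%d count=%lu\n",
		       t->comm, t->pid, count);
		t->softlockup_count = 0;
		return true;
	}
	t->softlockup_count = count;
	return false;
}
```

In this model a task that neither sleeps nor returns to user space is
reported on its 11th consecutive check, which matches the count=11 in
the output above.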


-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux_realtime_ernie/kernel/sched.c
===================================================================
--- linux_realtime_ernie/kernel/sched.c	(revision 266)
+++ linux_realtime_ernie/kernel/sched.c	(working copy)
@@ -3367,6 +3367,8 @@
 		send_sig(SIGUSR2, current, 1);
 	}
 	do {
+		if (current->state & ~TASK_RUNNING_MUTEX)
+			touch_light_softlockup_watchdog();
 		__schedule();
 	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED) || test_thread_flag(TIF_NEED_RESCHED_DELAYED)));
 	raw_local_irq_enable(); // TODO: do sti; ret
Index: linux_realtime_ernie/kernel/softlockup.c
===================================================================
--- linux_realtime_ernie/kernel/softlockup.c	(revision 269)
+++ linux_realtime_ernie/kernel/softlockup.c	(working copy)
@@ -3,6 +3,10 @@
  *
  * started by Ingo Molnar, (C) 2005, Red Hat
  *
+ * Steven Rostedt, Kihon Technologies Inc.
+ *   Added light softlockup detection off of what Daniel Walker of
+ *   MontaVista started.
+ *
  * this code detects soft lockups: incidents in where on a CPU
  * the kernel does not reschedule for 10 seconds or more.
  */
@@ -20,9 +24,7 @@
 static DEFINE_PER_CPU(unsigned long, timeout) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, timestamp) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, print_timestamp) = INITIAL_JIFFIES;
-static DEFINE_PER_CPU(struct task_struct *, prev_task);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
-static DEFINE_PER_CPU(unsigned long, task_counter);
 
 static int did_panic = 0;
 static int softlock_panic(struct notifier_block *this, unsigned long event,
@@ -59,25 +61,25 @@
 		if (!per_cpu(watchdog_task, this_cpu))
 			return;
 
-		if (per_cpu(prev_task, this_cpu) != current || 
-			!rt_task(current)) {
-			per_cpu(prev_task, this_cpu) = current;
-			per_cpu(task_counter, this_cpu) = 0;
-		}
-		else if ((++per_cpu(task_counter, this_cpu) > 10) && printk_ratelimit()) {
-
-			spin_lock(&print_lock);
-			printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
-				this_cpu, jiffies, timestamp, timeout);
-			printk("curr=%s:%d\n",current->comm,current->pid);
-			
-			dump_stack();
+#ifdef ARCH_HAS_SOFTLOCKUP_DETECT
+		if (current->pid) {
+			unsigned long count;
+			count = task_softlockup_count(current);
+			if (++count > 10) {
+				spin_lock(&print_lock);
+				printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
+				       this_cpu, jiffies, timestamp, timeout);
+				printk("curr=%s:%d count=%lu\n",current->comm,current->pid,count);
+				dump_stack();
 #if defined(__i386__) && defined(CONFIG_SMP)
-			nmi_show_all_regs();
+				nmi_show_all_regs();
 #endif
-			spin_unlock(&print_lock);
-			per_cpu(task_counter, this_cpu) = 0;
+				spin_unlock(&print_lock);
+				count = 0;
+			}
+			set_task_softlockup_count(current,count);
 		}
+#endif /* ARCH_HAS_SOFTLOCKUP_DETECT */
 
 		wake_up_process(per_cpu(watchdog_task, this_cpu));
 		per_cpu(timeout, this_cpu) = jiffies + msecs_to_jiffies(1000);
@@ -101,7 +103,6 @@
 		nmi_show_all_regs();
 #endif
 		spin_unlock(&print_lock);
-		per_cpu(task_counter, this_cpu) = 0;
 	}
 }
 
Index: linux_realtime_ernie/include/asm-i386/thread_info.h
===================================================================
--- linux_realtime_ernie/include/asm-i386/thread_info.h	(revision 266)
+++ linux_realtime_ernie/include/asm-i386/thread_info.h	(working copy)
@@ -43,6 +43,13 @@
 	unsigned long           previous_esp;   /* ESP of the previous stack in case
 						   of nested (IRQ) stacks
 						*/
+#ifdef CONFIG_DETECT_SOFTLOCKUP
+#define ARCH_HAS_SOFTLOCKUP_DETECT
+	unsigned long		softlockup_count; /* Count to keep track how long the
+						     thread is in the kernel without
+						     sleeping.
+						  */
+#endif
 	__u8			supervisor_stack[0];
 };
 
Index: linux_realtime_ernie/include/linux/sched.h
===================================================================
--- linux_realtime_ernie/include/linux/sched.h	(revision 266)
+++ linux_realtime_ernie/include/linux/sched.h	(working copy)
@@ -1497,6 +1497,26 @@
 
 #endif /* CONFIG_SMP */
 
+#ifdef ARCH_HAS_SOFTLOCKUP_DETECT
+static inline unsigned long task_softlockup_count(const struct task_struct *p)
+{
+	return p->thread_info->softlockup_count;
+}
+static inline void set_task_softlockup_count(const struct task_struct *p,
+					     unsigned long count)
+{
+	p->thread_info->softlockup_count = count;
+}
+static inline void touch_light_softlockup_watchdog(void)
+{
+	set_task_softlockup_count(current, 0);
+}
+#else
+static inline void touch_light_softlockup_watchdog(void)
+{
+}
+#endif
+
 #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
 extern void arch_pick_mmap_layout(struct mm_struct *mm);
 #else
Index: linux_realtime_ernie/arch/i386/kernel/asm-offsets.c
===================================================================
--- linux_realtime_ernie/arch/i386/kernel/asm-offsets.c	(revision 266)
+++ linux_realtime_ernie/arch/i386/kernel/asm-offsets.c	(working copy)
@@ -53,6 +53,9 @@
 	OFFSET(TI_preempt_count, thread_info, preempt_count);
 	OFFSET(TI_addr_limit, thread_info, addr_limit);
 	OFFSET(TI_restart_block, thread_info, restart_block);
+#ifdef CONFIG_DETECT_SOFTLOCKUP
+	OFFSET(TI_softlockup_count, thread_info, softlockup_count);
+#endif
 	BLANK();
 
 	OFFSET(EXEC_DOMAIN_handler, exec_domain, handler);
Index: linux_realtime_ernie/arch/i386/kernel/entry.S
===================================================================
--- linux_realtime_ernie/arch/i386/kernel/entry.S	(revision 266)
+++ linux_realtime_ernie/arch/i386/kernel/entry.S	(working copy)
@@ -155,6 +155,10 @@
 	andl $_TIF_WORK_MASK, %ecx	# is there any work to be done on
 					# int/exception return?
 	jne work_pending
+#ifdef CONFIG_DETECT_SOFTLOCKUP
+	movl $0, TI_softlockup_count(%ebp)  # Zero out the count when going
+					# back to userland
+#endif
 	jmp restore_all
 
 #ifdef CONFIG_PREEMPT




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 19:45         ` Steven Rostedt
@ 2005-08-02 19:56           ` Steven Rostedt
  2005-08-02 23:38           ` Daniel Walker
  1 sibling, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-02 19:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, dwalker

On Tue, 2005-08-02 at 15:45 -0400, Steven Rostedt wrote:

> Here it is (Finally).  I just had to be patient with the kjournal
> lockup.  I had to wait some time before the lockup occurred, but when it
> did, I got my message out:

Oh yeah, this goes against 52-10.

-- Steve




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 19:45         ` Steven Rostedt
  2005-08-02 19:56           ` Steven Rostedt
@ 2005-08-02 23:38           ` Daniel Walker
  2005-08-03  0:00             ` Steven Rostedt
  1 sibling, 1 reply; 66+ messages in thread
From: Daniel Walker @ 2005-08-02 23:38 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel


Couldn't you just do some math off current->timestamp to see how long
the task has been running? This per arch stuff seems a bit invasive..

Daniel

On Tue, 2005-08-02 at 15:45 -0400, Steven Rostedt wrote:
> On Tue, 2005-08-02 at 12:19 +0200, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > In my custom kernel, I have a wchan field of the task that records 
> > > where the task calls something that might schedule. This way I can see 
> > > where things locked up if I don't have a back trace of the task.  This 
> > > field is always zero when it switches to usermode.  Something like 
> > > this can also be used to check how long the process is in kernel mode.  
> > > If a task is in the kernel for more than 10 seconds without sleeping, 
> > > that would definitely be a good indication of something wrong.  I 
> > > probably could write something to check for this if people are 
> > > interested.  I won't waste my time if nobody would want it.
> > 
> > this would be a pretty useful extension of the softlockup checker!
> 
> Here it is (Finally).  I just had to be patient with the kjournal
> lockup.  I had to wait some time before the lockup occurred, but when it
> did, I got my message out:
> --------------------------------------------
> BUG: possible soft lockup detected on CPU#0! 1314840-1313839(1314839)
> curr=kjournald:734 count=11
>  [<c010410f>] dump_stack+0x1f/0x30 (20)
>  [<c01441e0>] softlockup_tick+0x170/0x1a0 (44)
>  [<c0125d32>] update_process_times+0x62/0x140 (28)
>  [<c010861d>] timer_interrupt+0x4d/0x100 (20)
>  [<c014450f>] handle_IRQ_event+0x6f/0x120 (48)
>  [<c014469c>] __do_IRQ+0xdc/0x1a0 (48)
>  [<c0105abe>] do_IRQ+0x4e/0x90 (28)
>  [<c0103b67>] common_interrupt+0x1f/0x24 (112)
>  [<c01edc36>] journal_commit_transaction+0x1206/0x1430 (112)
>  [<c01f06d0>] kjournald+0xd0/0x1e0 (84)
>  [<c01011ed>] kernel_thread_helper+0x5/0x18 (825638940)
> ---------------------------
> | preempt count: 20010003 ]
> | 3-level deep critical section nesting:
> ----------------------------------------
> .. [<c013d49a>] .... add_preempt_count+0x1a/0x20
> .....[<c01085ec>] ..   ( <= timer_interrupt+0x1c/0x100)
> .. [<c013d49a>] .... add_preempt_count+0x1a/0x20
> .....[<c014418d>] ..   ( <= softlockup_tick+0x11d/0x1a0)
> .. [<c013d49a>] .... add_preempt_count+0x1a/0x20
> .....[<c013e727>] ..   ( <= print_traces+0x17/0x50)
> 
> ------------------------------
> | showing all locks held by: |  (kjournald/734 [c13e20b0,  69]):
> ------------------------------
> -----------------------------------------------




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-02 23:38           ` Daniel Walker
@ 2005-08-03  0:00             ` Steven Rostedt
  2005-08-03  1:12               ` George Anzinger
  2005-08-03  2:25               ` Daniel Walker
  0 siblings, 2 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03  0:00 UTC (permalink / raw)
  To: dwalker; +Cc: Ingo Molnar, linux-kernel

On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
> Couldn't you just do some math off current->timestamp to see how long
> the task has been running? This per arch stuff seems a bit invasive..

The thing is, I'm tracking how long the task is running in the kernel
without doing a schedule.  That's actually easy, but I don't want to
count when the task is in userspace. The per-arch is only updating so
that we don't count user space, otherwise the count could be in the
task_struct.  If there is an arch-independent way to tell if a task is
running in user-space or kernel when an interrupt goes off then I would
use it.  The per arch is actually easy, and I would write it, but I
don't have the hardware now to test it.  I could at least do PPC and
MIPS since I'm quite familiar with both, but I don't currently have a
cross compiler to compile it.

I understand your point, I would really prefer an arch-independent
solution, but the timestamp from current just won't cut it.  If you have
another idea, I'm all open to it.

So far, what I submitted works with no known side effects except that it
is a per-arch patch, which does suck.  

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  0:00             ` Steven Rostedt
@ 2005-08-03  1:12               ` George Anzinger
  2005-08-03  1:48                 ` Keith Owens
  2005-08-03  2:25               ` Daniel Walker
  1 sibling, 1 reply; 66+ messages in thread
From: George Anzinger @ 2005-08-03  1:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: dwalker, Ingo Molnar, linux-kernel

Steven Rostedt wrote:
> On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
> 
>>Couldn't you just do some math off current->timestamp to see how long
>>the task has been running? This per arch stuff seems a bit invasive..
> 
> 
> The thing is, I'm tracking how long the task is running in the kernel
> without doing a schedule.  That's actually easy, but I don't want to
> count when the task is in userspace. The per-arch is only updating so
> that we don't count user space, otherwise the count could be in the
> task_struct.  If there is an arch-independent way to tell if a task is
> running in user-space or kernel when an interrupt goes off then I would
> use it.  The per arch is actually easy, and I would write it, but I
> don't have the hardware now to test it.  I could at least do PPC and
> MIPS since I'm quite familiar with both, but I don't currently have a
> cross compiler to compile it.
> 
> I understand your point, I would really prefer an arch-independent
> solution, but the timestamp from current just won't cut it.  If you have
> another idea, I'm all open to it.

How about something like:
	if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC)

The idea is that an interrupt from user space will be the ONLY thing on 
the stack while an interrupt from the kernel will have kernel stack 
under it.  Current is the bottom end of the kernel stack and regs + 
sizeof(pt_regs) is where the interrupt context started.  Assumptions a) 
stack grows down, b) no switch stack at interrupt.
MAGIC is some small number.  For x86 user it is actually zero; I don't 
know about others, but the saved context should be the first thing on the 
stack so a minimum frame size should do.
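
The idea above can be sketched in user space with plain byte addresses.
This is only an illustration under the stated assumptions (stack grows
down, saved registers live on the task's own kernel stack); the struct,
the enum and classify_irq_frame() are hypothetical names, and the third
result covers the case where the frame is on some other stack.

```c
#include <stddef.h>
#include <stdint.h>

/* All values are byte addresses or byte sizes; names are illustrative. */
struct stack_model {
	uintptr_t	stack_base;	/* lowest address of the kernel stack */
	size_t		thread_size;	/* models THREAD_SIZE */
	size_t		regs_size;	/* models sizeof(struct pt_regs) */
	size_t		slack;		/* models MAGIC */
};

enum irq_origin { IRQ_FROM_USER, IRQ_FROM_KERNEL, IRQ_ORIGIN_UNKNOWN };

static enum irq_origin classify_irq_frame(const struct stack_model *m,
					  uintptr_t regs_addr)
{
	uintptr_t stack_top = m->stack_base + m->thread_size;
	size_t above;

	/* The frame is not on this stack, so the test says nothing. */
	if (regs_addr < m->stack_base || regs_addr > stack_top ||
	    m->regs_size > stack_top - regs_addr)
		return IRQ_ORIGIN_UNKNOWN;

	/* Bytes of kernel stack already in use above the saved frame. */
	above = stack_top - (regs_addr + m->regs_size);

	return above > m->slack ? IRQ_FROM_KERNEL : IRQ_FROM_USER;
}
```

With slack set to 0, only a frame sitting at the very top of the stack
is classified as coming from user space.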


-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  1:12               ` George Anzinger
@ 2005-08-03  1:48                 ` Keith Owens
  2005-08-03  2:12                   ` George Anzinger
  0 siblings, 1 reply; 66+ messages in thread
From: Keith Owens @ 2005-08-03  1:48 UTC (permalink / raw)
  To: george; +Cc: Steven Rostedt, dwalker, Ingo Molnar, linux-kernel

On Tue, 02 Aug 2005 18:12:27 -0700, 
George Anzinger <george@mvista.com> wrote:
>How about something like:
>	if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC)

current points to the current struct task, regs points to the kernel
stack.  Those two data areas can be completely separate, as they are on
i386.  Also i386 uses a separate kernel stack for interrupts.



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  1:48                 ` Keith Owens
@ 2005-08-03  2:12                   ` George Anzinger
  0 siblings, 0 replies; 66+ messages in thread
From: George Anzinger @ 2005-08-03  2:12 UTC (permalink / raw)
  To: Keith Owens; +Cc: Steven Rostedt, dwalker, Ingo Molnar, linux-kernel

Keith Owens wrote:
> On Tue, 02 Aug 2005 18:12:27 -0700, 
> George Anzinger <george@mvista.com> wrote:
> 
>>How about something like:
>>	if (current + THREAD_SIZE/sizeof(long) - (regs + sizeof(pt_regs)) > MAGIC)
> 
> 
> current points to the current struct task, regs points to the kernel
> stack.  Those two data areas can be completely separate, as they are on
> i386.  Also i386 uses a separate kernel stack for interrupts.

Actually I must mean the thread_info and not current.  i386 only uses a 
separate stack if you use 4K stacks.  I think others use separate 
interrupt stacks, however :(.  Also, on thinking on it, I think some 
archs don't call the registers pt_regs either.  Oh, well, it was a 
thought...

Waiting for its brother... :)
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  0:00             ` Steven Rostedt
  2005-08-03  1:12               ` George Anzinger
@ 2005-08-03  2:25               ` Daniel Walker
  2005-08-03  2:42                 ` Steven Rostedt
  1 sibling, 1 reply; 66+ messages in thread
From: Daniel Walker @ 2005-08-03  2:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel

On Tue, 2005-08-02 at 20:00 -0400, Steven Rostedt wrote:
> On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
> > Couldn't you just do some math off current->timestamp to see how long
> > the task has been running? This per arch stuff seems a bit invasive..
> 
> The thing is, I'm tracking how long the task is running in the kernel
> without doing a schedule.  That's actually easy, but I don't want to

Why make the distinction ? For what I was going for all I wanted to know
was that an RT task was eating up all the CPU . Did you have something
else in mind?

Daniel



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  2:25               ` Daniel Walker
@ 2005-08-03  2:42                 ` Steven Rostedt
  2005-08-03  2:58                   ` Daniel Walker
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03  2:42 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Ingo Molnar, linux-kernel

On Tue, 2005-08-02 at 19:25 -0700, Daniel Walker wrote:
> On Tue, 2005-08-02 at 20:00 -0400, Steven Rostedt wrote:
> > On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
> > > Couldn't you just do some math off current->timestamp to see how long
> > > the task has been running? This per arch stuff seems a bit invasive..
> > 
> > The thing is, I'm tracking how long the task is running in the kernel
> > without doing a schedule.  That's actually easy, but I don't want to
> 
> Why make the distinction ? For what I was going for all I wanted to know
> was that an RT task was eating up all the CPU . Did you have something
> else in mind?

Yeah, bugs in the kernel :-)

I can change the patch to just see who is hogging the CPU for more than
X amount of seconds (10 by default) if that pleases everyone. If that's
what people want, then I'll send another patch tomorrow. If this is the
way to go, then I'll add back the check for RT tasks to limit the output
to just RT hogs.  Or is any hog OK?

I guess this would at least keep it arch independent.  Although I like
my little hack with the additions to thread_info and entry.S ;-)

-- Steve




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  2:42                 ` Steven Rostedt
@ 2005-08-03  2:58                   ` Daniel Walker
  2005-08-03 10:30                     ` Steven Rostedt
                                       ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Daniel Walker @ 2005-08-03  2:58 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel

On Tue, 2005-08-02 at 22:42 -0400, Steven Rostedt wrote:
> On Tue, 2005-08-02 at 19:25 -0700, Daniel Walker wrote:
> > On Tue, 2005-08-02 at 20:00 -0400, Steven Rostedt wrote:
> > > On Tue, 2005-08-02 at 16:38 -0700, Daniel Walker wrote:
> > > > Couldn't you just do some math off current->timestamp to see how long
> > > > the task has been running? This per arch stuff seems a bit invasive..
> > > 
> > > The thing is, I'm tracking how long the task is running in the kernel
> > > without doing a schedule.  That's actually easy, but I don't want to
> > 
> > Why make the distinction ? For what I was going for all I wanted to know
> > was that an RT task was eating up all the CPU . Did you have something
> > else in mind?
> 
> Yeah, bugs in the kernel :-)
> 
> I can change the patch to just see who is hogging the CPU for more than
> X amount of seconds (10 by default) if that pleases everyone. If that's
> what people want, then I'll send another patch tomorrow. If this is the
> way to go, then I'll add back the check for RT tasks to limit the output
> to just RT hogs.  Or is any hog OK?

The stack trace should show where the problem is . If it's in the kernel
we will see kernel functions before do_IRQ() , if it's just a whacked
out task then do_IRQ() would be first in the stack trace . 

I can't speak for everyone else, but I would want to catch both. That
way we'll know if it's just a whacked out task, or a kernel problem.

Daniel





* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  2:58                   ` Daniel Walker
@ 2005-08-03 10:30                     ` Steven Rostedt
  2005-08-03 15:10                       ` Daniel Walker
  2005-08-03 10:37                     ` [Question] arch-independent way to differentiate between user and kernel Steven Rostedt
  2005-08-03 14:50                     ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
  2 siblings, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 10:30 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Ingo Molnar, linux-kernel

On Tue, 2005-08-02 at 19:58 -0700, Daniel Walker wrote:
> The stack trace should show where the problem is . If it's in the kernel
> we will see kernel functions before do_IRQ() , if it's just a whacked
> out task then do_IRQ() would be first in the stack trace . 

The problem is not differentiating the output as kernel or user, I just
don't want too many false positives.

> 
> I can't speak for everyone else, but I would want to catch both. That
> way we'll know if it's just a whacked out task, or a kernel problem.

The thing is, it may be OK for an RT process to run in userspace for 10
seconds without sleeping.  If this is the case, you will constantly get
this output saying you may have a bug. But if the kernel is running for
10 seconds without scheduling, I strongly believe that is a bug.  Unless
someone has some special driver thread, I don't know of any kernel path
that runs for 10 seconds without going back to userspace or sleeping.

I still wish there was a nice arch-independent way to tell if the task
is running in user space from do_IRQ.  Maybe there is?  I'll post
another thread and ask the question.

-- Steve



* [Question] arch-independent way to differentiate between user and kernel
  2005-08-03  2:58                   ` Daniel Walker
  2005-08-03 10:30                     ` Steven Rostedt
@ 2005-08-03 10:37                     ` Steven Rostedt
  2005-08-03 10:48                       ` Ingo Molnar
  2005-08-03 10:56                       ` [Question] arch-independent way to differentiate between user andkernel linux-os (Dick Johnson)
  2005-08-03 14:50                     ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
  2 siblings, 2 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 10:37 UTC (permalink / raw)
  To: LKML; +Cc: Daniel Walker, Ingo Molnar

Hi all,

I'm dealing with a problem where I want to know from __do_IRQ in
kernel/irq/handle.c if the interrupt occurred while the process was in
user space or kernel space.  But the trick here is that it must work on
all architectures.

Does anyone know of some way that that function can tell if it had
interrupted the kernel or user space?  I know of several arch-dependent
ways, but that's not acceptable right now.

Thanks,

-- Steve




* Re: [Question] arch-independent way to differentiate between user and kernel
  2005-08-03 10:37                     ` [Question] arch-independent way to differentiate between user and kernel Steven Rostedt
@ 2005-08-03 10:48                       ` Ingo Molnar
  2005-08-03 12:18                         ` Steven Rostedt
  2005-08-03 10:56                       ` [Question] arch-independent way to differentiate between user andkernel linux-os (Dick Johnson)
  1 sibling, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-08-03 10:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Daniel Walker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Hi all,
> 
> I'm dealing with a problem where I want to know from __do_IRQ in 
> kernel/irq/handle.c if the interrupt occurred while the process was in 
> user space or kernel space.  But the trick here is that it must work 
> on all architectures.
> 
> Does anyone know of some way that that function can tell if it had 
> interrupted the kernel or user space?  I know of several 
> arch-dependent ways, but that's not acceptable right now.

i don't think there's any. user_mode(regs) gets the closest - it might 
make sense to generalize it over all arches.

update_process_times() gets an arch-independent 'was the tick user-space 
or kernel-space' flag, so the best starting point would be to look at 
the output of:

 for N in `find . -name '*.c' | xargs grep update_process_times |
  grep arch`; do echo $N; done | grep update_process_times |
   sort | uniq -c

which gives:

      2 update_process_times()
      1 update_process_times(CHOOSE_MODE(user_context(UPT_SP(regs)),
      6 update_process_times(user);
      1 update_process_times(user_mode(fp));
     33 update_process_times(user_mode(regs));
      2 update_process_times(user_mode_vm(regs));

so ~33 calls use user_mode(regs), and the rest needs to be reviewed and 
possibly changed. Looks doable.
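
A rough sketch of how that tick flag might drive the counter instead of
the entry.S reset. This is an assumption-laden user-space model, not
kernel code: struct model_task, model_task_slept() and model_tick() are
hypothetical, and user_tick stands for the flag that
update_process_times() receives.

```c
#include <stdbool.h>

#define MODEL_THRESHOLD 10UL

struct model_task {
	unsigned long kernel_ticks;	/* consecutive ticks sampled in kernel mode */
};

/* Models the hook in schedule(): the task went to sleep. */
static void model_task_slept(struct model_task *t)
{
	t->kernel_ticks = 0;
}

/* Returns true when the task should be reported as a possible lockup. */
static bool model_tick(struct model_task *t, bool user_tick)
{
	if (user_tick) {
		/* The tick interrupted user space: the task left the kernel. */
		t->kernel_ticks = 0;
		return false;
	}
	if (++t->kernel_ticks > MODEL_THRESHOLD) {
		t->kernel_ticks = 0;
		return true;
	}
	return false;
}
```

One caveat of this sampling model: a task that keeps re-entering the
kernel and always happens to be sampled there would still accumulate
ticks, which the reset on every return to user space avoids.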

	Ingo


* Re: [Question] arch-independent way to differentiate between user andkernel
  2005-08-03 10:37                     ` [Question] arch-independent way to differentiate between user and kernel Steven Rostedt
  2005-08-03 10:48                       ` Ingo Molnar
@ 2005-08-03 10:56                       ` linux-os (Dick Johnson)
  2005-08-03 11:44                         ` Steven Rostedt
  1 sibling, 1 reply; 66+ messages in thread
From: linux-os (Dick Johnson) @ 2005-08-03 10:56 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Daniel Walker, Ingo Molnar


On Wed, 3 Aug 2005, Steven Rostedt wrote:

> Hi all,
>
> I'm dealing with a problem where I want to know from __do_IRQ in
> kernel/irq/handle.c if the interrupt occurred while the process was in
> user space or kernel space.  But the trick here is that it must work on
> all architectures.
>
> Does anyone know of some way that that function can tell if it had
> interrupted the kernel or user space?  I know of several arch-dependent
> ways, but that's not acceptable right now.
>
> Thanks,
>
> -- Steve
>

The interrupt handler gets a pointer to a structure called "struct pt_regs".
That contains, amongst other things, the registers pushed onto the stack
during the interrupt. If the segments were kernel segments, the interrupt
occurred while in kernel mode. But..... If you have any code that
needs to know, it's horribly and irreparably broken beyond all
repair. Interrupts need to be handled NOW, without regard to what
got interrupted.
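
The segment test described above can be sketched for x86 as follows.
This is a simplified user-space model and an assumption about how such
a check is commonly written, not a copy of any kernel header: the
struct, its field names and the MODEL_* constants are hypothetical, and
it relies on the low two bits of a code segment selector holding the
privilege level and on bit 17 of EFLAGS marking virtual-8086 mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RPL_MASK	0x3u		/* privilege level bits of a selector */
#define MODEL_VM_MASK	0x00020000u	/* EFLAGS.VM, virtual-8086 mode */

/* Only the two saved registers this check needs. */
struct model_pt_regs {
	uint32_t eflags;
	uint32_t xcs;
};

/* Non-zero privilege level in the saved CS means user mode was interrupted. */
static bool model_user_mode(const struct model_pt_regs *regs)
{
	return (regs->xcs & MODEL_RPL_MASK) != 0;
}

/* Same test, but also treats virtual-8086 mode as user mode. */
static bool model_user_mode_vm(const struct model_pt_regs *regs)
{
	return ((regs->xcs & MODEL_RPL_MASK) |
		(regs->eflags & MODEL_VM_MASK)) != 0;
}
```

Other architectures keep the equivalent information in different
registers, which is why the check has to sit behind a per-arch helper.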


Cheers,
Dick Johnson
Penguin : Linux version 2.6.12 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.


* Re: [Question] arch-independent way to differentiate between user andkernel
  2005-08-03 10:56                       ` [Question] arch-independent way to differentiate between user andkernel linux-os (Dick Johnson)
@ 2005-08-03 11:44                         ` Steven Rostedt
  2005-08-03 12:04                           ` Ingo Molnar
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 11:44 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: LKML, Daniel Walker, Ingo Molnar

On Wed, 2005-08-03 at 06:56 -0400, linux-os (Dick Johnson) wrote:
> On Wed, 3 Aug 2005, Steven Rostedt wrote:
> The interrupt handler gets a pointer to a structure called "struct pt_regs".
> That contains, amongst other things, the registers pushed onto the stack
> during the interrupt. If the segments were kernel segments, the interrupt
> occurred while in kernel mode. But..... If you have any code that
> needs to know, it's horribly and irreparably broken beyond all
> repair. Interrupts need to be handled NOW, without regard to what
> got interrupted.
> 

By the time you get to __do_IRQ there's already more stuff on the stack.
And the pt_regs is arch specific so this doesn't help.

This is for debugging, so please don't jump to conclusions that what I'm
doing is broken.  I'm writing code to look for soft deadlocks on a fully
preemptible kernel (Ingo's RT).  For example, there's a nasty location
in kjournald that goes into a busy loop waiting for a lock to be
released. In the mainline kernel this is OK since holding these
locks turns off preemption. But Ingo's kernel does not turn off
preemption here, and if kjournald is an RT task, it will prevent all
tasks lower in priority than itself from running.  I'm writing code to
detect this.

Take a look at the 
"[patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01" thread.

-- Steve




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Question] arch-independent way to differentiate between user andkernel
  2005-08-03 11:44                         ` Steven Rostedt
@ 2005-08-03 12:04                           ` Ingo Molnar
  2005-08-03 12:30                             ` Steven Rostedt
  0 siblings, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-08-03 12:04 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-os (Dick Johnson), LKML, Daniel Walker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 2005-08-03 at 06:56 -0400, linux-os (Dick Johnson) wrote:
> > On Wed, 3 Aug 2005, Steven Rostedt wrote:
> > The interrupt handler gets a pointer to a structure called "struct pt_regs".
> > That contains, amongst other things, the registers pushed onto the stack
> > during the interrupt. If the segments were kernel segments, the interrupt
> > occurred while in kernel mode. But..... If you have any code that
> > needs to know, it's horribly and irreparably broken beyond all
> > repair. Interrupts need to be handled NOW, without regard to what
> > got interrupted.
> > 
> 
> By the time you get to __do_IRQ there's already more stuff on the 
> stack. And the pt_regs is arch specific so this doesn't help.

the actual layout of pt_regs is arch-specific, but user_mode(regs) is 
pretty much generic across most arches.
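
To make that concrete, here is a minimal user-space sketch, not kernel
code. The struct, the two selector constants and mock_user_mode() are all
made up for illustration. The only thing borrowed from i386 is the idea
that the low two bits of the saved code-segment selector carry the
privilege level (0 for kernel, 3 for user); other arches implement their
own user_mode(regs) differently.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the arch-specific struct pt_regs. */
struct mock_pt_regs {
	unsigned long xcs;	/* saved code-segment selector */
};

/* Illustrative selector values; the low two bits are the privilege level. */
#define MOCK_KERNEL_CS	0x60UL	/* privilege level 0 */
#define MOCK_USER_CS	0x73UL	/* privilege level 3 */

/* Sketch of an i386-style check: a nonzero privilege level means user space. */
static bool mock_user_mode(const struct mock_pt_regs *regs)
{
	return (regs->xcs & 3) != 0;
}
```

A caller only ever sees the boolean, which is why code such as
update_process_times(user_mode(regs)) can stay arch-independent even
though the register layout is not.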

	Ingo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Question] arch-independent way to differentiate between user and kernel
  2005-08-03 10:48                       ` Ingo Molnar
@ 2005-08-03 12:18                         ` Steven Rostedt
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 12:18 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, Daniel Walker

On Wed, 2005-08-03 at 12:48 +0200, Ingo Molnar wrote:
> 
> i dont think there's any. user_mode(regs) gets the closest - it might 
> make sense to generalize it over all arches.
> 
> update_process_times() gets an arch-independent 'was the tick user-space 
> or kernel-space' flag, so the best starting point would be to look at 
> the output of:
> 
>  for N in `find . -name '*.c' | xargs grep update_process_times |
>   grep arch`; do echo $N; done | grep update_process_times |
>    sort | uniq -c
> 
> which gives:
> 
>       2 update_process_times()
>       1 update_process_times(CHOOSE_MODE(user_context(UPT_SP(regs)),
>       6 update_process_times(user);
>       1 update_process_times(user_mode(fp));
>      33 update_process_times(user_mode(regs));
>       2 update_process_times(user_mode_vm(regs));
> 
> so ~33 calls use user_mode(regs), and the rest needs to be reviewed and 
> possibly changed. Looks doable.

Ingo,

Thanks for the starting point. For now I'll submit a patch that doesn't
do the user_mode checks as discussed before. Then if I can find
something here, I'll send another patch on top of the first patch later.

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [Question] arch-independent way to differentiate between user andkernel
  2005-08-03 12:04                           ` Ingo Molnar
@ 2005-08-03 12:30                             ` Steven Rostedt
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 12:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-os (Dick Johnson), LKML, Daniel Walker

On Wed, 2005-08-03 at 14:04 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 2005-08-03 at 06:56 -0400, linux-os (Dick Johnson) wrote:
> > > On Wed, 3 Aug 2005, Steven Rostedt wrote:
> > > The interrupt handler gets a pointer to a structure called "struct pt_regs".
> > > That contains, amongst other things, the registers pushed onto the stack
> > > during the interrupt. If the segments were kernel segments, the interrupt
> > > occurred while in kernel mode. But..... If you have any code that
> > > needs to know, it's horribly and irreparably broken beyond all
> > > repair. Interrupts need to be handled NOW, without regard to what
> > > got interrupted.
> > > 
> > 
> > By the time you get to __do_IRQ there's already more stuff on the 
> > stack. And the pt_regs is arch specific so this doesn't help.
> 
> the actual layout of pt_regs is arch-specific, but user_mode(regs) is 
> pretty much generic across most arches.
> 

OK I did the following:

 find arch -name "*.c" ! -type d | xargs grep  "update_process_times" |grep -v user_mode
arch/arm/kernel/smp.c:  update_process_times(user);
arch/um/kernel/time_kern.c:     update_process_times(CHOOSE_MODE(user_context(UPT_SP(regs)), (regs)->skas.is_user));
arch/sparc64/kernel/smp.c:                      update_process_times(user);
arch/m32r/kernel/smp.c:         update_process_times(user);
arch/alpha/kernel/smp.c:                update_process_times(user);
arch/i386/kernel/apic.c:         * update_process_times() expects us to have done irq_enter().
arch/x86_64/kernel/apic.c:       * update_process_times() expects us to have done irq_enter().
arch/sparc/kernel/sun4d_smp.c:          update_process_times(user);
arch/sparc/kernel/sun4m_smp.c:          update_process_times(user);

I also did a find without the -v user_mode and here's some of the output
(filtered to only show what's relevant):

arch/arm/kernel/time.c: update_process_times(user_mode(regs));
arch/sparc64/kernel/time.c:             update_process_times(user_mode(regs));
arch/m32r/kernel/time.c:        update_process_times(user_mode(regs));
arch/alpha/kernel/time.c:               update_process_times(user_mode(regs));
arch/i386/kernel/apic.c:                update_process_times(user_mode_vm(regs));
arch/x86_64/kernel/time.c:      update_process_times(user_mode(regs));
arch/sparc/kernel/pcic.c:       update_process_times(user_mode(regs));

So all but (amusingly) user-mode-linux use the user_mode macro. So it
does look good. I'm not too worried right now about user-mode-linux, but
that should be fixed too if need be.

So I'll add this to the patch I'll be sending you soon (after it's all
tested).

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03  2:58                   ` Daniel Walker
  2005-08-03 10:30                     ` Steven Rostedt
  2005-08-03 10:37                     ` [Question] arch-independent way to differentiate between user and kernel Steven Rostedt
@ 2005-08-03 14:50                     ` Steven Rostedt
  2005-08-03 15:15                       ` Steven Rostedt
  2005-08-03 16:44                       ` Steven Rostedt
  2 siblings, 2 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 14:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Daniel Walker, linux-kernel

Ingo,

Here it is. The patch rewritten to use user_mode instead of arch
dependent modifications.  It seems to work just like my previous patch.

On Tue, 2005-08-02 at 19:58 -0700, Daniel Walker wrote:
> 
> I can't speak for everyone else, but I would want to catch both. That
> way we'll know if it's just a whacked out task, or a kernel problem.
> 
> Daniel

Sorry Daniel, this patch still only detects kernel lockups.  You can
easily modify this (perhaps add another config option) to detect user
mode lockups as well. What you would do is remove the
kernel/irq/handle.c part (ifdef out with a config?), as well as adding
something in schedule that would do a touch_light_softlockup_watchdog if
what was running happens to be the only thing runnable on the CPU.  Or
just test for RT tasks.  I'm more interested in debugging kernel
problems, but I do see a benefit in testing RT user tasks, especially if
something gets boosted with a PI leak.
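
The counting scheme used by the patch below can be modelled in user
space roughly as follows. This is only a sketch under assumptions:
mock_task, mock_touch(), mock_irq() and mock_tick() are hypothetical
names, and the model ignores SMP, printing and the existing
per-CPU watchdog thread. It only shows when the per-task counter is
bumped, when it is cleared, and when a report would fire.

```c
#include <stdbool.h>

#define SOFTLOCKUP_LIMIT 10	/* mirrors the "> 10" test in the patch */

/* Hypothetical stand-in for the one task_struct field the patch adds. */
struct mock_task {
	int pid;			/* 0 models the idle task */
	unsigned long softlockup_count;
};

/* Same job as touch_light_softlockup_watchdog(): forget accumulated checks. */
static void mock_touch(struct mock_task *t)
{
	t->softlockup_count = 0;
}

/* Models the handle.c hunk; in_user stands for user_mode(regs). */
static void mock_irq(struct mock_task *t, bool in_user)
{
	if (in_user)
		mock_touch(t);
}

/*
 * Models one pass through the watchdog check.  Returns true when a
 * lockup would be reported; the counter is cleared afterwards so the
 * report is not repeated on every pass.
 */
static bool mock_tick(struct mock_task *t)
{
	if (t->pid == 0)
		return false;		/* the idle task is never counted */
	if (++t->softlockup_count > SOFTLOCKUP_LIMIT) {
		mock_touch(t);
		return true;
	}
	return false;
}
```

A task that keeps being interrupted in user mode, or that sleeps (the
sched.c hunk, not modelled here), never reaches the limit; a task that
stays in the kernel across more than ten consecutive checks does.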

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux_realtime_ernie/kernel/irq/handle.c
===================================================================
--- linux_realtime_ernie/kernel/irq/handle.c	(revision 266)
+++ linux_realtime_ernie/kernel/irq/handle.c	(working copy)
@@ -177,6 +177,14 @@
 	 */
 	local_irq_save(flags);
 #endif
+	/*
+	 * If the task is currently running in user mode, don't 
+	 * detect soft lockups.  If CONFIG_DETECT_SOFTLOCKUP is not
+	 * configured, this should be optimized out.
+	 */
+	if (user_mode(regs))
+		touch_light_softlockup_watchdog();
+
 	kstat_this_cpu.irqs[irq]++;
 	if (desc->status & IRQ_PER_CPU) {
 		irqreturn_t action_ret;
Index: linux_realtime_ernie/kernel/sched.c
===================================================================
--- linux_realtime_ernie/kernel/sched.c	(revision 266)
+++ linux_realtime_ernie/kernel/sched.c	(working copy)
@@ -3367,6 +3367,8 @@
 		send_sig(SIGUSR2, current, 1);
 	}
 	do {
+		if (current->state & ~TASK_RUNNING_MUTEX)
+			touch_light_softlockup_watchdog();
 		__schedule();
 	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED) || test_thread_flag(TIF_NEED_RESCHED_DELAYED)));
 	raw_local_irq_enable(); // TODO: do sti; ret
Index: linux_realtime_ernie/kernel/softlockup.c
===================================================================
--- linux_realtime_ernie/kernel/softlockup.c	(revision 269)
+++ linux_realtime_ernie/kernel/softlockup.c	(working copy)
@@ -3,6 +3,10 @@
  *
  * started by Ingo Molnar, (C) 2005, Red Hat
  *
+ * Steven Rostedt, Kihon Technologies Inc.
+ *   Added light softlockup detection off of what Daniel Walker of
+ *   MontaVista started.
+ *
  * this code detects soft lockups: incidents in where on a CPU
  * the kernel does not reschedule for 10 seconds or more.
  */
@@ -20,9 +24,7 @@
 static DEFINE_PER_CPU(unsigned long, timeout) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, timestamp) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, print_timestamp) = INITIAL_JIFFIES;
-static DEFINE_PER_CPU(struct task_struct *, prev_task);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
-static DEFINE_PER_CPU(unsigned long, task_counter);
 
 static int did_panic = 0;
 static int softlock_panic(struct notifier_block *this, unsigned long event,
@@ -42,6 +44,11 @@
 	per_cpu(timestamp, raw_smp_processor_id()) = jiffies;
 }
 
+void touch_light_softlockup_watchdog(void)
+{
+	current->softlockup_count = 0;
+}
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
@@ -59,24 +66,20 @@
 		if (!per_cpu(watchdog_task, this_cpu))
 			return;
 
-		if (per_cpu(prev_task, this_cpu) != current || 
-			!rt_task(current)) {
-			per_cpu(prev_task, this_cpu) = current;
-			per_cpu(task_counter, this_cpu) = 0;
-		}
-		else if ((++per_cpu(task_counter, this_cpu) > 10) && printk_ratelimit()) {
-
-			spin_lock(&print_lock);
-			printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
-				this_cpu, jiffies, timestamp, timeout);
-			printk("curr=%s:%d\n",current->comm,current->pid);
-			
-			dump_stack();
+		if (current->pid) {
+			if (++current->softlockup_count > 10) {
+				spin_lock(&print_lock);
+				printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
+				       this_cpu, jiffies, timestamp, timeout);
+				printk("curr=%s:%d count=%ld\n",current->comm,current->pid,
+				       current->softlockup_count);
+				dump_stack();
 #if defined(__i386__) && defined(CONFIG_SMP)
-			nmi_show_all_regs();
+				nmi_show_all_regs();
 #endif
-			spin_unlock(&print_lock);
-			per_cpu(task_counter, this_cpu) = 0;
+				spin_unlock(&print_lock);
+				touch_light_softlockup_watchdog();
+			}
 		}
 
 		wake_up_process(per_cpu(watchdog_task, this_cpu));
@@ -101,7 +104,6 @@
 		nmi_show_all_regs();
 #endif
 		spin_unlock(&print_lock);
-		per_cpu(task_counter, this_cpu) = 0;
 	}
 }
 
Index: linux_realtime_ernie/include/linux/sched.h
===================================================================
--- linux_realtime_ernie/include/linux/sched.h	(revision 266)
+++ linux_realtime_ernie/include/linux/sched.h	(working copy)
@@ -307,6 +307,7 @@
 extern void softlockup_tick(void);
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
+extern void touch_light_softlockup_watchdog(void);
 #else
 static inline void softlockup_tick(void)
 {
@@ -317,6 +318,9 @@
 static inline void touch_softlockup_watchdog(void)
 {
 }
+static inline void touch_light_softlockup_watchdog(void)
+{
+}
 #endif
 
 /* Attach to any functions which should be ignored in wchan output. */
@@ -898,6 +902,12 @@
 #ifdef CONFIG_DEBUG_PREEMPT
 	int lock_count;
 #endif
+#ifdef CONFIG_DETECT_SOFTLOCKUP
+	unsigned long	softlockup_count; /* Count to keep track how long the
+					   *  thread is in the kernel without
+					   *  sleeping.
+					   */
+#endif
 	/* realtime bits */
 	struct list_head delayed_put;
 	struct plist pi_waiters;



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03 10:30                     ` Steven Rostedt
@ 2005-08-03 15:10                       ` Daniel Walker
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Walker @ 2005-08-03 15:10 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, linux-kernel

On Wed, 2005-08-03 at 06:30 -0400, Steven Rostedt wrote:
> On Tue, 2005-08-02 at 19:58 -0700, Daniel Walker wrote:
> > The stack trace should show where the problem is . If it's in the kernel
> > we will see kernel functions before do_IRQ() , if it's just a whacked
> > out task then do_IRQ() would be first in the stack trace . 
> 
> The problem is not differentiating the output as kernel or user, I just
> don't want too many false positives.

I was just testing RT tasks, which are few enough currently.

> > 
> > I can't speak for everyone else, but I would want to catch both. That
> > way we'll know if it's just a whacked out task, or a kernel problem.
> 
> The thing is, it may be OK for a RT process to run in userspace for 10
> seconds without sleeping.  If this is the case, you will constantly get
> this output saying you may have a bug. But if the kernel is running for
> 10 seconds without scheduling, I strongly believe that is a bug.  Unless

True, it's just really odd .. If someone complained to the list about a
crash, but they had a "possible softlockup" we might be able to conclude
that the task hung the system.

You said that your IRQ 14 would trigger this, but I think it wasn't
running for 10 seconds straight, it was just running frequently enough
that it was often running during the timer interrupt. I think that would
be solved if we just checked the running time.


> someone has some special driver thread, I don't know of any kernel path
> that runs for 10 seconds without going back to userspace or sleeping.

Right, and if someone did make a path like that, they wouldn't run the
softlockup code..

> I still wish there was a nice arch-independent way to tell if the task
> is running in user space from do_IRQ.  Maybe there is?  I'll post
> another thread and ask the question.

There should be a way to tell which protection level a task was at when it
was interrupted. I doubt it's arch independent though.

Daniel


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03 14:50                     ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
@ 2005-08-03 15:15                       ` Steven Rostedt
  2005-08-03 15:57                         ` Steven Rostedt
  2005-08-03 16:44                       ` Steven Rostedt
  1 sibling, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 15:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Daniel Walker, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

On Wed, 2005-08-03 at 10:50 -0400, Steven Rostedt wrote:
> Ingo,
> 
> Here it is. The patch rewritten to use user_mode instead of arch
> dependent modifications.  It seems to work just like my previous patch.

I tested this patch by switching the kjournald to FIFO prio 30 and doing
a make of the kernel and "find / -name vmlinux", but this took 40
minutes to cause the deadlock.  Attached is a dumb module that creates a
thread and spins. If you want to test my softlockup detect, this will
trigger it.  The thread that is created is not RT so you can still do
work. My detector doesn't care if something spins in the kernel as an RT
task or not, it will flag it regardless.

-- Steve


[-- Attachment #2: deadlock_module.tar --]
[-- Type: application/x-tar, Size: 10240 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03 15:15                       ` Steven Rostedt
@ 2005-08-03 15:57                         ` Steven Rostedt
  0 siblings, 0 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 15:57 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Daniel Walker

[-- Attachment #1: Type: text/plain, Size: 726 bytes --]

On Wed, 2005-08-03 at 11:15 -0400, Steven Rostedt wrote:
> I tested this patch by switching the kjournald to FIFO prio 30 and doing
> a make of the kernel and "find / -name vmlinux", but this took 40
> minutes to cause the deadlock.  Attached is a dumb module that creates a
> thread and spins. If you want to test my softlockup detect, this will
> trigger it.  The thread that is created is not RT so you can still do
> work. My detector doesn't care if something spins in the kernel as an RT
> task or not, it will flag it regardless.

Here's an updated version of the tarball. I've included another module
called nodeadlock that does a msleep(1) instead of a schedule.  This
should _not_ be detected as a lockup.

-- Steve

[-- Attachment #2: deadlock_module.tar --]
[-- Type: application/x-tar, Size: 10240 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-03 14:50                     ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
  2005-08-03 15:15                       ` Steven Rostedt
@ 2005-08-03 16:44                       ` Steven Rostedt
       [not found]                         ` <20050812125844.GA13357@elte.hu>
  1 sibling, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-03 16:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Daniel Walker

Ingo,

Doing some more tests on the patch, I found that I don't like the
placement in sched.c of the touch_light_softlockup_watchdog. I figure
that it may be better to put it into __schedule instead.  This really
shows where a task is taken off the run queue.

-- Steve

New patch: I've only tested this with the deadlock modules, and not with
the kjournald case. I'll do that tonight when I'm no longer needing my
computer for other tests.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux_realtime_ernie/kernel/irq/handle.c
===================================================================
--- linux_realtime_ernie/kernel/irq/handle.c	(revision 266)
+++ linux_realtime_ernie/kernel/irq/handle.c	(working copy)
@@ -177,6 +177,14 @@
 	 */
 	local_irq_save(flags);
 #endif
+	/*
+	 * If the task is currently running in user mode, don't 
+	 * detect soft lockups.  If CONFIG_DETECT_SOFTLOCKUP is not
+	 * configured, this should be optimized out.
+	 */
+	if (user_mode(regs))
+		touch_light_softlockup_watchdog();
+
 	kstat_this_cpu.irqs[irq]++;
 	if (desc->status & IRQ_PER_CPU) {
 		irqreturn_t action_ret;
Index: linux_realtime_ernie/kernel/sched.c
===================================================================
--- linux_realtime_ernie/kernel/sched.c	(revision 266)
+++ linux_realtime_ernie/kernel/sched.c	(working copy)
@@ -3209,6 +3209,7 @@
 		else {
 			if (prev->state == TASK_UNINTERRUPTIBLE)
 				rq->nr_uninterruptible++;
+			touch_light_softlockup_watchdog();
 			deactivate_task(prev, rq);
 		}
 	}
Index: linux_realtime_ernie/kernel/softlockup.c
===================================================================
--- linux_realtime_ernie/kernel/softlockup.c	(revision 269)
+++ linux_realtime_ernie/kernel/softlockup.c	(working copy)
@@ -3,6 +3,10 @@
  *
  * started by Ingo Molnar, (C) 2005, Red Hat
  *
+ * Steven Rostedt, Kihon Technologies Inc.
+ *   Added light softlockup detection off of what Daniel Walker of
+ *   MontaVista started.
+ *
  * this code detects soft lockups: incidents in where on a CPU
  * the kernel does not reschedule for 10 seconds or more.
  */
@@ -20,9 +24,7 @@
 static DEFINE_PER_CPU(unsigned long, timeout) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, timestamp) = INITIAL_JIFFIES;
 static DEFINE_PER_CPU(unsigned long, print_timestamp) = INITIAL_JIFFIES;
-static DEFINE_PER_CPU(struct task_struct *, prev_task);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
-static DEFINE_PER_CPU(unsigned long, task_counter);
 
 static int did_panic = 0;
 static int softlock_panic(struct notifier_block *this, unsigned long event,
@@ -42,6 +44,11 @@
 	per_cpu(timestamp, raw_smp_processor_id()) = jiffies;
 }
 
+void touch_light_softlockup_watchdog(void)
+{
+	current->softlockup_count = 0;
+}
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
@@ -59,24 +66,20 @@
 		if (!per_cpu(watchdog_task, this_cpu))
 			return;
 
-		if (per_cpu(prev_task, this_cpu) != current || 
-			!rt_task(current)) {
-			per_cpu(prev_task, this_cpu) = current;
-			per_cpu(task_counter, this_cpu) = 0;
-		}
-		else if ((++per_cpu(task_counter, this_cpu) > 10) && printk_ratelimit()) {
-
-			spin_lock(&print_lock);
-			printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
-				this_cpu, jiffies, timestamp, timeout);
-			printk("curr=%s:%d\n",current->comm,current->pid);
-			
-			dump_stack();
+		if (current->pid) {
+			if (++current->softlockup_count > 10) {
+				spin_lock(&print_lock);
+				printk(KERN_ERR "BUG: possible soft lockup detected on CPU#%u! %lu-%lu(%lu)\n",
+				       this_cpu, jiffies, timestamp, timeout);
+				printk("curr=%s:%d count=%ld\n",current->comm,current->pid,
+				       current->softlockup_count);
+				dump_stack();
 #if defined(__i386__) && defined(CONFIG_SMP)
-			nmi_show_all_regs();
+				nmi_show_all_regs();
 #endif
-			spin_unlock(&print_lock);
-			per_cpu(task_counter, this_cpu) = 0;
+				spin_unlock(&print_lock);
+				touch_light_softlockup_watchdog();
+			}
 		}
 
 		wake_up_process(per_cpu(watchdog_task, this_cpu));
@@ -101,7 +104,6 @@
 		nmi_show_all_regs();
 #endif
 		spin_unlock(&print_lock);
-		per_cpu(task_counter, this_cpu) = 0;
 	}
 }
 
Index: linux_realtime_ernie/include/linux/sched.h
===================================================================
--- linux_realtime_ernie/include/linux/sched.h	(revision 266)
+++ linux_realtime_ernie/include/linux/sched.h	(working copy)
@@ -307,6 +307,7 @@
 extern void softlockup_tick(void);
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
+extern void touch_light_softlockup_watchdog(void);
 #else
 static inline void softlockup_tick(void)
 {
@@ -317,6 +318,9 @@
 static inline void touch_softlockup_watchdog(void)
 {
 }
+static inline void touch_light_softlockup_watchdog(void)
+{
+}
 #endif
 
 /* Attach to any functions which should be ignored in wchan output. */
@@ -898,6 +902,12 @@
 #ifdef CONFIG_DEBUG_PREEMPT
 	int lock_count;
 #endif
+#ifdef CONFIG_DETECT_SOFTLOCKUP
+	unsigned long	softlockup_count; /* Count to keep track how long the
+					   *  thread is in the kernel without
+					   *  sleeping.
+					   */
+#endif
 	/* realtime bits */
 	struct list_head delayed_put;
 	struct plist pi_waiters;



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-07-30 16:03 [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Ingo Molnar
                   ` (2 preceding siblings ...)
  2005-08-02 14:53 ` Steven Rostedt
@ 2005-08-04 12:20 ` Andrzej Nowak
  3 siblings, 0 replies; 66+ messages in thread
From: Andrzej Nowak @ 2005-08-04 12:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

On 7/30/05, Ingo Molnar <mingo@elte.hu> wrote:
> 
> i have released the -V0.7.52-01 Real-Time Preemption patch, which can be
> downloaded from the usual place:
> ...
> reports, patches, suggestions welcome.

I can't get it to run on x86_64. The kernel won't build with
"voluntary preemption" enabled, it's complaining about mce_read_sem
being undeclared. Including linux/semaphore.h in
arch/x86_64/kernel/mce.c does get the compilation past that point, but
later on mtrr and kprobes won't build. I can turn those off, but the
build stops on kernel/printk.c with a "console_sem undeclared" error.

Everything builds fine with "real-time preemption" enabled, though the
linux system as a whole still won't run, as init crashes on startup
(kernel panic).

I saw earlier postings on lkml related to RT and x86_64, but
unfortunately the suggestions made, such as turning off latency
timing, didn't help. I tried this on a dual Xeon HT server with SLES
9.1 64bit installed (config has SMP/SMT set to yes). I used the
2.6.13-rc4 kernel patched with
realtime-preempt-2.6.13-rc4-RT-V0.7.52-10.

Any suggestions or any extra info I've missed would be appreciated.

Andrzej Nowak

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
       [not found]                         ` <20050812125844.GA13357@elte.hu>
@ 2005-08-26  4:24                           ` Steven Rostedt
  2005-08-26  6:08                             ` Ingo Molnar
  2005-08-30 11:00                             ` Stephen C. Tweedie
  0 siblings, 2 replies; 66+ messages in thread
From: Steven Rostedt @ 2005-08-26  4:24 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephen C. Tweedie, LKML

On Fri, 2005-08-12 at 14:58 +0200, Ingo Molnar wrote:
> FYI, in -53-05 i've added a bh->b_update_lock, which enabled me to get 
> rid of the bitlock ugliness in fs/buffer.c. Maybe it could be used to 
> have a better fix for the jbd bitlock thing too?

Well, I just spent several hours trying to use the b_update_lock in
implementing something to replace the bit spinlocks for RT.  It's
getting really ugly and I just hit a stone wall.

The problem is that I have two locks to work with. A
jbd_lock_bh_journal_head and a jbd_lock_bh_state. Unfortunately, I also
have a ranking order of:

jbd_lock_bh_state -> j_state_lock -> jbd_lock_bh_journal_head

If the ranking wasn't like this, I could probably make a little more
progress.
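
As an illustration of what such a ranking demands, here is a small
user-space sketch of a rank checker. It is not how the kernel enforces
anything; mock_rank, mock_held, mock_acquire() and mock_release() are
hypothetical names, and the numeric ranks are simply assigned in the
order quoted above.

```c
#include <stdbool.h>

/* Ranks follow the order quoted above; a lower rank must be taken first. */
enum mock_rank {
	RANK_BH_STATE = 1,		/* jbd_lock_bh_state */
	RANK_J_STATE = 2,		/* j_state_lock */
	RANK_BH_JOURNAL_HEAD = 3,	/* jbd_lock_bh_journal_head */
};

/* Hypothetical per-thread record of the locks currently held, in order. */
struct mock_held {
	int depth;
	enum mock_rank stack[3];
};

/* Returns false when taking 'r' now would violate the ranking. */
static bool mock_acquire(struct mock_held *h, enum mock_rank r)
{
	if (h->depth > 0 && h->stack[h->depth - 1] >= r)
		return false;
	h->stack[h->depth++] = r;
	return true;
}

/* Release the most recently taken lock. */
static void mock_release(struct mock_held *h)
{
	if (h->depth > 0)
		h->depth--;
}
```

With this model, a path that already holds the journal_head lock cannot
go on to take the state lock, which is the constraint that makes it
awkward to keep the state lock inside something that the journal_head
lock protects.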

The jbd_lock_bh_journal_head is used to protect against creating a
journal_head and adding it to a buffer_head.  This was the obvious
choice to use your b_update_lock as a replacement, since I need to have
a lock before I acquired a journal descriptor.

The jbd_lock_bh_state was going to exist in the journal descriptor that
is stored in the buffer_head private data.  But this led to a problem
when this is deleted.  The private data is freed while the lock is held.
So, keeping the lock in with the journal descriptor had the problem of
being freed before it was unlocked.

I started adding code to delay the freeing of the descriptor until after
the lock was released, but this added another problem.  There might be
another process waiting on this lock, and when it gets it, it tests if
the buffer_head even has a journal descriptor for it. So, even if I
delayed the freeing, another process could be waiting on this lock, so you
still may have a premature free.  Not to mention that this code was
becoming _very_ intrusive, since the freeing takes place deep inside
functions that acquire the lock.

So this lock has the same problem as the jbd_lock_bh_journal_head: you
have a buffer_head and you want to take this lock before you
know that this buffer_head even has a journal descriptor attached to it.

So, the only other solutions that I can think of are:

a) add yet another (bloat) lock to the buffer head.

b) Still use your b_update_lock for the jbd_lock_bh_journal_head and
change the jbd_lock_bh_state to what I discussed earlier, and that being
the hash wait_on_bit code.
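
A user-space sketch of why option (a) sidesteps the lifetime problem,
under the assumption that the lock is modelled by a plain flag:
mock_bh, mock_jh and the helper functions are hypothetical names, and
nothing here is real locking. The point is only that the lock's storage
belongs to the buffer head, so it can be taken before a descriptor
exists and stays valid after the descriptor is freed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the journal descriptor hung off b_private. */
struct mock_jh {
	int b_jcount;
};

/* Hypothetical stand-in for struct buffer_head with option (a) applied. */
struct mock_bh {
	bool state_locked;	/* models a lock that lives in the buffer head */
	struct mock_jh *priv;	/* models b_private; may be NULL */
};

static void mock_lock_state(struct mock_bh *bh)
{
	bh->state_locked = true;
}

static void mock_unlock_state(struct mock_bh *bh)
{
	bh->state_locked = false;
}

/* Attach a descriptor; returns false on allocation failure. */
static bool mock_attach(struct mock_bh *bh)
{
	bh->priv = calloc(1, sizeof(*bh->priv));
	return bh->priv != NULL;
}

/* Drop the descriptor while the state lock is held. */
static void mock_detach_locked(struct mock_bh *bh)
{
	free(bh->priv);
	bh->priv = NULL;
	/* The lock is still usable here because it is not inside *priv. */
}
```

Had the flag lived inside mock_jh instead, mock_detach_locked() would
have freed the lock while it was still held, which is the premature
free described above.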

So do you have any ideas?

-- Steve



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-26  4:24                           ` Steven Rostedt
@ 2005-08-26  6:08                             ` Ingo Molnar
  2005-08-26 11:20                               ` Steven Rostedt
  2005-08-30 11:00                             ` Stephen C. Tweedie
  1 sibling, 1 reply; 66+ messages in thread
From: Ingo Molnar @ 2005-08-26  6:08 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Stephen C. Tweedie, LKML


* Steven Rostedt <rostedt@goodmis.org> wrote:

> So, the only other solutions that I can think of are:
> 
> a) add yet another (bloat) lock to the buffer head.
> 
> b) Still use your b_update_lock for the jbd_lock_bh_journal_head and 
> change the jbd_lock_bh_state to what I discussed earlier, and that 
> being the hash wait_on_bit code.

could you try a), how clean does it get? Personally i'm much more in 
favor of cleanliness. On the vanilla kernel a spinlock is zero bytes on 
UP [the most RAM-sensitive platform], and it's a word on typical SMP.

	Ingo

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-26  6:08                             ` Ingo Molnar
@ 2005-08-26 11:20                               ` Steven Rostedt
  2005-08-30 10:58                                 ` Stephen C. Tweedie
  0 siblings, 1 reply; 66+ messages in thread
From: Steven Rostedt @ 2005-08-26 11:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Stephen C. Tweedie, LKML

On Fri, 2005-08-26 at 08:08 +0200, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > So, the only other solutions that I can think of are:
> > 
> > a) add yet another (bloat) lock to the buffer head.
> > 
> > b) Still use your b_update_lock for the jbd_lock_bh_journal_head and 
> > change the jbd_lock_bh_state to what I discussed earlier, and that 
> > being the hash wait_on_bit code.
> 
> could you try a), how clean does it get? Personally i'm much more in 
> favor of cleanliness. On the vanilla kernel a spinlock is zero bytes on 
> UP [the most RAM-sensitive platform], and it's a word on typical SMP.

Not only the cleanest, but also the simplest :-)

-- Steve

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux_realtime_ernie/fs/buffer.c
===================================================================
--- linux_realtime_ernie/fs/buffer.c	(revision 303)
+++ linux_realtime_ernie/fs/buffer.c	(working copy)
@@ -3053,6 +3053,7 @@
 {
 	BUG_ON(!list_empty(&bh->b_assoc_buffers));
 	BUG_ON(spin_is_locked(&bh->b_uptodate_lock));
+	BUG_ON(spin_is_locked(&bh->b_state_lock));
 	kmem_cache_free(bh_cachep, bh);
 	preempt_disable();
 	__get_cpu_var(bh_accounting).nr--;
@@ -3071,6 +3072,7 @@
 		memset(bh, 0, sizeof(*bh));
 		INIT_LIST_HEAD(&bh->b_assoc_buffers);
 		spin_lock_init(&bh->b_uptodate_lock);
+		spin_lock_init(&bh->b_state_lock);
 	}
 }
 
Index: linux_realtime_ernie/include/linux/buffer_head.h
===================================================================
--- linux_realtime_ernie/include/linux/buffer_head.h	(revision 303)
+++ linux_realtime_ernie/include/linux/buffer_head.h	(working copy)
@@ -62,6 +62,7 @@
  	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
 	spinlock_t b_uptodate_lock;
+	spinlock_t b_state_lock;
 };
 
 /*
Index: linux_realtime_ernie/include/linux/jbd.h
===================================================================
--- linux_realtime_ernie/include/linux/jbd.h	(revision 303)
+++ linux_realtime_ernie/include/linux/jbd.h	(working copy)
@@ -326,32 +326,32 @@
 
 static inline void jbd_lock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_State, &bh->b_state);
+	spin_lock(&bh->b_state_lock);
 }
 
 static inline int jbd_trylock_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_trylock(BH_State, &bh->b_state);
+	return spin_trylock(&bh->b_state_lock);
 }
 
 static inline int jbd_is_locked_bh_state(struct buffer_head *bh)
 {
-	return bit_spin_is_locked(BH_State, &bh->b_state);
+	return spin_is_locked(&bh->b_state_lock);
 }
 
 static inline void jbd_unlock_bh_state(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_State, &bh->b_state);
+	spin_unlock(&bh->b_state_lock);
 }
 
 static inline void jbd_lock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_lock(BH_JournalHead, &bh->b_state);
+	spin_lock(&bh->b_uptodate_lock);
 }
 
 static inline void jbd_unlock_bh_journal_head(struct buffer_head *bh)
 {
-	bit_spin_unlock(BH_JournalHead, &bh->b_state);
+	spin_unlock(&bh->b_uptodate_lock);
 }
 
 struct jbd_revoke_table_s;
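
For readers following the thread, the difference between the two locking
styles in the patch above can be modelled in user space. This is only a
sketch, not kernel code: the struct, the function names and the bit number
(model_bh, model_*_trylock, MODEL_BH_STATE_BIT) are hypothetical, and C11
atomics stand in for the kernel primitives. The bit-lock style reuses one
bit of the existing b_state flag word, so it costs no extra storage; option
a) adds a separate lock field, which costs space but gives the lock its own
type (presumably the property the -RT tree needs in order to substitute its
own spinlock implementation).

```c
/* User-space model only; names are illustrative, not the kernel's. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BH_STATE_BIT 3u	/* hypothetical bit number */

struct model_bh {
	atomic_ulong b_state;		/* flag word; one bit doubles as a lock */
	atomic_bool  b_state_lock;	/* dedicated lock, as option a) adds */
};

static void model_bh_init(struct model_bh *bh)
{
	atomic_init(&bh->b_state, 0ul);
	atomic_init(&bh->b_state_lock, false);
}

/* Bit-lock style: set the lock bit; we own it only if it was clear before. */
static bool model_bit_trylock(struct model_bh *bh)
{
	unsigned long mask = 1ul << MODEL_BH_STATE_BIT;
	unsigned long old = atomic_fetch_or_explicit(&bh->b_state, mask,
						     memory_order_acquire);
	return !(old & mask);
}

static void model_bit_unlock(struct model_bh *bh)
{
	unsigned long mask = 1ul << MODEL_BH_STATE_BIT;
	atomic_fetch_and_explicit(&bh->b_state, ~mask, memory_order_release);
}

/* Dedicated-lock style: the lock lives in its own field. */
static bool model_state_trylock(struct model_bh *bh)
{
	return !atomic_exchange_explicit(&bh->b_state_lock, true,
					 memory_order_acquire);
}

static void model_state_unlock(struct model_bh *bh)
{
	atomic_store_explicit(&bh->b_state_lock, false, memory_order_release);
}

/* Exercises both styles once; returns 0 if every check passes. */
static int model_demo(void)
{
	struct model_bh bh;

	model_bh_init(&bh);
	assert(model_bit_trylock(&bh));		/* first taker wins */
	assert(!model_bit_trylock(&bh));	/* second attempt fails */
	model_bit_unlock(&bh);
	assert(model_state_trylock(&bh));
	assert(!model_state_trylock(&bh));
	model_state_unlock(&bh);
	return 0;
}
```

In this model the two locks are independent: holding the bit lock does not
prevent taking the dedicated lock, which mirrors how the patch keeps
jbd_lock_bh_state and jbd_lock_bh_journal_head on two different locks.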




* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-26 11:20                               ` Steven Rostedt
@ 2005-08-30 10:58                                 ` Stephen C. Tweedie
  2005-08-30 11:14                                   ` Ingo Molnar
  0 siblings, 1 reply; 66+ messages in thread
From: Stephen C. Tweedie @ 2005-08-30 10:58 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, LKML, Stephen Tweedie

Hi,

On Fri, 2005-08-26 at 12:20, Steven Rostedt wrote:

> > could you try a), how clean does it get? Personally i'm much more in 
> > favor of cleanliness. On the vanilla kernel a spinlock is zero bytes on 
> > UP [the most RAM-sensitive platform], and it's a word on typical SMP.

It's a word, maybe; but it's a word used only by ext3 afaik, and it's
getting added to the core buffer_head.  Not very nice.  It certainly
looks like the easiest short-term way out for a development patch
series, though.

--Stephen



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-26  4:24                           ` Steven Rostedt
  2005-08-26  6:08                             ` Ingo Molnar
@ 2005-08-30 11:00                             ` Stephen C. Tweedie
  1 sibling, 0 replies; 66+ messages in thread
From: Stephen C. Tweedie @ 2005-08-30 11:00 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, LKML, Stephen Tweedie

Hi,

On Fri, 2005-08-26 at 05:24, Steven Rostedt wrote:

> Well, I just spent several hours trying to use the b_uptodate_lock in
> implementing something to replace the bit spinlocks for RT.  It's
> getting really ugly and I just hit a stone wall.
> 
> The problem is that I have two locks to work with. A
> jbd_lock_bh_journal_head and a jbd_lock_bh_state.  ...

For now, yes.

> So, the only other solutions that I can think of are:
> 
> a) add yet another (bloat) lock to the buffer head.

This one looks like the right answer for now, just to get the patch
series running.  I've got a WIP patch under development which removes
the bh_journal_head lock entirely; if that works out, you may find
things get a bit easier.

--Stephen



* Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01
  2005-08-30 10:58                                 ` Stephen C. Tweedie
@ 2005-08-30 11:14                                   ` Ingo Molnar
  0 siblings, 0 replies; 66+ messages in thread
From: Ingo Molnar @ 2005-08-30 11:14 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Steven Rostedt, LKML


* Stephen C. Tweedie <sct@redhat.com> wrote:

> On Fri, 2005-08-26 at 12:20, Steven Rostedt wrote:
> 
> > > could you try a), how clean does it get? Personally i'm much more in 
> > > favor of cleanliness. On the vanilla kernel a spinlock is zero bytes on 
> > > UP [the most RAM-sensitive platform], and it's a word on typical SMP.
> 
> It's a word, maybe; but it's a word used only by ext3 afaik, and it's 
> getting added to the core buffer_head.  Not very nice.  It certainly 
> looks like the easiest short-term way out for a development patch 
> series, though.

but ext3 is pretty much the only mainstream FS that still makes use of 
buffer_heads, so it should be fine. Any other solution looks _way_ too 
hacky - and the current bit-spin-lock solution is less than charming 
too.

	Ingo


Thread overview: 66+ messages
2005-07-30 16:03 [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Ingo Molnar
2005-07-30 20:47 ` Peter Zijlstra
2005-07-30 20:52   ` Ingo Molnar
2005-07-31  4:47     ` Lee Revell
2005-07-31  6:38       ` Ingo Molnar
2005-08-01  4:45         ` Lee Revell
2005-08-01 21:08           ` Ingo Molnar
2005-08-01 21:12             ` Ingo Molnar
2005-08-02 13:56           ` Steven Rostedt
2005-08-02 14:05             ` Lee Revell
2005-08-02 14:20               ` Steven Rostedt
2005-08-02 15:37                 ` 2.6.13-rc3 -> sluggish PS2 keyboard (was Re: [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01) Lee Revell
2005-08-02 15:44                   ` Vojtech Pavlik
2005-08-02 15:46                     ` Lee Revell
2005-08-02 15:47                     ` Lee Revell
2005-08-02 15:53                       ` Steven Rostedt
2005-08-02 15:55                       ` Vojtech Pavlik
2005-08-02 15:55                   ` Dmitry Torokhov
2005-08-02 15:38                 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Lee Revell
2005-07-31  8:03     ` Peter Zijlstra
2005-07-31 10:44       ` Ingo Molnar
2005-07-31 15:56         ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-05 Gene Heskett
2005-08-01 18:22 ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
2005-08-01 19:49   ` Steven Rostedt
2005-08-01 20:52   ` Ingo Molnar
2005-08-01 21:09     ` Daniel Walker
2005-08-01 21:15       ` Ingo Molnar
2005-08-02  0:43       ` Steven Rostedt
2005-08-01 21:15     ` Steven Rostedt
2005-08-01 21:23       ` Ingo Molnar
2005-08-01 21:20   ` Daniel Walker
2005-08-02  0:53     ` Steven Rostedt
2005-08-02 10:19       ` Ingo Molnar
2005-08-02 19:45         ` Steven Rostedt
2005-08-02 19:56           ` Steven Rostedt
2005-08-02 23:38           ` Daniel Walker
2005-08-03  0:00             ` Steven Rostedt
2005-08-03  1:12               ` George Anzinger
2005-08-03  1:48                 ` Keith Owens
2005-08-03  2:12                   ` George Anzinger
2005-08-03  2:25               ` Daniel Walker
2005-08-03  2:42                 ` Steven Rostedt
2005-08-03  2:58                   ` Daniel Walker
2005-08-03 10:30                     ` Steven Rostedt
2005-08-03 15:10                       ` Daniel Walker
2005-08-03 10:37                     ` [Question] arch-independent way to differentiate between user and kernel Steven Rostedt
2005-08-03 10:48                       ` Ingo Molnar
2005-08-03 12:18                         ` Steven Rostedt
2005-08-03 10:56                       ` [Question] arch-independent way to differentiate between user andkernel linux-os (Dick Johnson)
2005-08-03 11:44                         ` Steven Rostedt
2005-08-03 12:04                           ` Ingo Molnar
2005-08-03 12:30                             ` Steven Rostedt
2005-08-03 14:50                     ` [patch] Real-Time Preemption, -RT-2.6.13-rc4-V0.7.52-01 Steven Rostedt
2005-08-03 15:15                       ` Steven Rostedt
2005-08-03 15:57                         ` Steven Rostedt
2005-08-03 16:44                       ` Steven Rostedt
     [not found]                         ` <20050812125844.GA13357@elte.hu>
2005-08-26  4:24                           ` Steven Rostedt
2005-08-26  6:08                             ` Ingo Molnar
2005-08-26 11:20                               ` Steven Rostedt
2005-08-30 10:58                                 ` Stephen C. Tweedie
2005-08-30 11:14                                   ` Ingo Molnar
2005-08-30 11:00                             ` Stephen C. Tweedie
2005-08-02  3:55     ` Steven Rostedt
2005-08-02  4:07       ` Daniel Walker
2005-08-02 14:53 ` Steven Rostedt
2005-08-04 12:20 ` Andrzej Nowak
