linux-kernel.vger.kernel.org archive mirror
* [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09  9:32                 ` Ingo Molnar
@ 2004-12-09 13:13                   ` Ingo Molnar
  2004-12-09 14:23                     ` Gene Heskett
                                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-09 13:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra


* Ingo Molnar <mingo@elte.hu> wrote:

> SLAB draining was an oversight - it's mainly called when there is VM
> pressure (which is not a strictly necessary feature, so i disabled it),
> but i forgot about the module-unload case where it's a correctness
> feature. Your patch is a good starting point, i'll try to fix it on
> SMP too.

here's the full patch against a recent tree, or download the -32-12
patch from the usual place:

    http://redhat.com/~mingo/realtime-preempt/

Rui, Gene, does this fix the module unload crash you are seeing?

	Ingo

--- linux/mm/slab.c.orig
+++ linux/mm/slab.c
@@ -1509,22 +1509,26 @@ static void smp_call_function_all_cpus(v
 static void drain_array_locked(kmem_cache_t* cachep,
 				struct array_cache *ac, int force);
 
-#ifndef CONFIG_PREEMPT_RT
-/*
- * Executes in an IRQ context:
- */
-static void do_drain(void *arg)
+static void do_drain_cpu(kmem_cache_t *cachep, int cpu)
 {
-	kmem_cache_t *cachep = (kmem_cache_t*)arg;
 	struct array_cache *ac;
-	int cpu = smp_processor_id();
 
 	check_irq_off();
-	ac = ac_data(cachep, cpu);
+
 	spin_lock(&cachep->spinlock);
+	ac = ac_data(cachep, cpu);
 	free_block(cachep, &ac_entry(ac)[0], ac->avail);
-	spin_unlock(&cachep->spinlock);
 	ac->avail = 0;
+	spin_unlock(&cachep->spinlock);
+}
+
+#ifndef CONFIG_PREEMPT_RT
+/*
+ * Executes in an IRQ context:
+ */
+static void do_drain(void *arg)
+{
+	do_drain_cpu((kmem_cache_t*)arg, smp_processor_id());
 }
 #endif
 
@@ -1532,6 +1536,11 @@ static void drain_cpu_caches(kmem_cache_
 {
 #ifndef CONFIG_PREEMPT_RT
 	smp_call_function_all_cpus(do_drain, cachep);
+#else
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		do_drain_cpu(cachep, cpu);
 #endif
 	check_irq_on();
 	spin_lock_irq(&cachep->spinlock);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 13:13                   ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Ingo Molnar
@ 2004-12-09 14:23                     ` Gene Heskett
  2004-12-09 14:33                     ` Steven Rostedt
  2004-12-09 14:43                     ` Rui Nuno Capela
  2 siblings, 0 replies; 25+ messages in thread
From: Gene Heskett @ 2004-12-09 14:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Steven Rostedt, Rui Nuno Capela, Lee Revell,
	Mark Johnson, K.R. Foley, Florian Schmidt, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, emann, Peter Zijlstra

On Thursday 09 December 2004 08:13, Ingo Molnar wrote:
>* Ingo Molnar <mingo@elte.hu> wrote:
>> SLAB draining was an oversight - it's mainly called when there is
>> VM pressure (which is not a strictly necessary feature, so i
>> disabled it), but i forgot about the module-unload case where it's
>> a correctness feature. Your patch is a good starting point, i'll
>> try to fix it on SMP too.
>
>here's the full patch against a recent tree, or download the -32-12
>patch from the usual place:
>
>    http://redhat.com/~mingo/realtime-preempt/
>
>Rui, Gene, does this fix the module unload crash you are seeing?

I'm still on -9 here, Ingo, and I just rmmod'ed eeprom with no ill
effects.  I'd built -10 last night, but since kde was still building, I
didn't reboot.  I just got -12, so I'll build it and reboot into it next.

Or am I the wrong Gene?

>
> Ingo
>

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 13:13                   ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Ingo Molnar
  2004-12-09 14:23                     ` Gene Heskett
@ 2004-12-09 14:33                     ` Steven Rostedt
  2004-12-09 19:19                       ` Steven Rostedt
  2004-12-09 14:43                     ` Rui Nuno Capela
  2 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2004-12-09 14:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

On Thu, 2004-12-09 at 14:13 +0100, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > SLAB draining was an oversight - it's mainly called when there is VM
> > pressure (which is not a strictly necessary feature, so i disabled it),
> > but i forgot about the module-unload case where it's a correctness
> > feature. Your patch is a good starting point, i'll try to fix it on
> > SMP too.
> 
> here's the full patch against a recent tree, or download the -32-12
> patch from the usual place:
> 
>     http://redhat.com/~mingo/realtime-preempt/

Ingo,

I just tried out your changes with both my sillycaches test as well as
my real modules that had the original problems. They both work fine.

I'll even reboot my main machine now (SMP) and run your kernel there
too, and see how it works.

Thanks,

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 13:13                   ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Ingo Molnar
  2004-12-09 14:23                     ` Gene Heskett
  2004-12-09 14:33                     ` Steven Rostedt
@ 2004-12-09 14:43                     ` Rui Nuno Capela
  2 siblings, 0 replies; 25+ messages in thread
From: Rui Nuno Capela @ 2004-12-09 14:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

Ingo Molnar wrote:
>
>> SLAB draining was an oversight - it's mainly called when there is VM
>> pressure (which is not a strictly necessary feature, so i disabled it),
>> but i forgot about the module-unload case where it's a correctness
>> feature. Your patch is a good starting point, i'll try to fix it on
>> SMP too.
>
> here's the full patch against a recent tree, or download the -32-12
> patch from the usual place:
>
>     http://redhat.com/~mingo/realtime-preempt/
>
> Rui, Gene, does this fix the module unload crash you are seeing?
>

Tested with RT-V0.7.32-12 after some usb-storage plug/unplug bonanza. All
seems to be OK, at least on my laptop (P4/UP).

Cheers.
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 14:33                     ` Steven Rostedt
@ 2004-12-09 19:19                       ` Steven Rostedt
  2004-12-09 20:33                         ` john cooper
  2004-12-09 22:10                         ` Ingo Molnar
  0 siblings, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2004-12-09 19:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

On Thu, 2004-12-09 at 09:33 -0500, Steven Rostedt wrote:
> On Thu, 2004-12-09 at 14:13 +0100, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> > here's the full patch against a recent tree, or download the -32-12
> > patch from the usual place:
> > 
> >     http://redhat.com/~mingo/realtime-preempt/
> 
> Ingo,
> 
> I just tried out your changes with both my sillycaches test as well as
> my real modules that had the original problems. They both work fine.
> 
> I'll even reboot my main machine now (SMP) and run your kernel there
> too, and see how it works.

Hi Ingo,

I just tried it on my main machine (SMP Dual Athlon MP, with a Gig of
RAM), and I just got the following dump on boot-up.  It never made it to
a prompt.

Freeing unused kernel memory: 216k freed
BUG: sleeping function called from invalid context IRQ 14(814) at
arch/i386/mm/highmem.c:5
in_atomic():0 [00000000], irqs_disabled():1
 [<c011c064>] __might_sleep+0xd4/0xf0 (8)
 [<c011749f>] kmap+0x1f/0x50 (36)
 [<c014ffb8>] bounce_copy_vec+0x28/0x70 (16)
 [<c01500bc>] copy_to_high_bio_irq+0x5c/0x70 (32)
 [<c0150221>] __bounce_end_io_read+0x41/0x50 (28)
 [<c0150258>] bounce_end_io_read+0x28/0x30 (20)
 [<c01686b9>] bio_endio+0x59/0x80 (12)
 [<c025a2bc>] ide_end_request+0x2c/0xc0 (16)
 [<c024bc02>] __end_that_request_first+0x1c2/0x230 (12)
 [<c01386cf>] up_mutex+0xaf/0x100 (16)
 [<c025a197>] __ide_end_request+0x77/0x170 (36)
 [<c025a2f5>] ide_end_request+0x65/0xc0 (36)
 [<c0260810>] task_end_request+0x40/0x80 (36)
 [<c0260941>] task_in_intr+0xf1/0x110 (24)
 [<c0260850>] task_in_intr+0x0/0x110 (20)
 [<c025bda1>] ide_intr+0xe1/0x170 (12)
 [<c0140e3b>] handle_IRQ_event+0x5b/0xd0 (32)
 [<c0141683>] do_hardirq+0xa3/0x100 (48)
 [<c01417f6>] do_irqd+0x116/0x1e0 (36)
 [<c01416e0>] do_irqd+0x0/0x1e0 (44)
 [<c0135e37>] kthread+0xb7/0xc0 (4)
 [<c0135d80>] kthread+0x0/0xc0 (28)
 [<c01012e5>] kernel_thread_helper+0x5/0x10 (16)
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c01398e7>] .... print_traces+0x17/0x50
.....[<00000000>] ..   ( <= 0x0)


This looks like it was triggered by bounce_copy_vec calling kmap_atomic
which is now just kmap with irqs disabled.  Does this need to change to
__kmap_atomic?  Is this also used to make things more preemptible, and
start removing the local_irq_saves?  I'd like to know so that you don't
need to make the patches yourself and I can handle things like this, but
I need to know what the general ideas are.  Also, am I the only one that
has highmem support enabled, because this looks like this bug would have
been triggered by anyone.

Thanks,

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 19:19                       ` Steven Rostedt
@ 2004-12-09 20:33                         ` john cooper
  2004-12-09 22:19                           ` Steven Rostedt
  2004-12-09 22:10                         ` Ingo Molnar
  1 sibling, 1 reply; 25+ messages in thread
From: john cooper @ 2004-12-09 20:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, emann, Peter Zijlstra, john cooper

[-- Attachment #1: Type: text/plain, Size: 317 bytes --]

Steven Rostedt wrote:

> ...Also, am I the only one that
> has highmem support enabled, because this looks like this bug would have
> been triggered by anyone.

I have not encountered this in -12 with CONFIG_HIGHMEM4G
running on an SMP Opteron with 1GB memory.  Details attached.

-john


-- 
john.cooper@timesys.com

[-- Attachment #2: dmesg.gz --]
[-- Type: application/x-gzip, Size: 4170 bytes --]

[-- Attachment #3: dot-config.gz --]
[-- Type: application/x-gzip, Size: 7125 bytes --]


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 19:19                       ` Steven Rostedt
  2004-12-09 20:33                         ` john cooper
@ 2004-12-09 22:10                         ` Ingo Molnar
  2004-12-10  6:11                           ` Steven Rostedt
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2004-12-09 22:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra


* Steven Rostedt <rostedt@goodmis.org> wrote:

> This looks like it was triggered by bounce_copy_vec calling
> kmap_atomic which is now just kmap with irqs disabled.  Does this need
> to change to __kmap_atomic?  Is this also used to make things more
> preemptible, and start removing the local_irq_saves?  I'd like to know
> so that you don't need to make the patches yourself and I can handle
> things like this, but I need to know what the general ideas are. 
> Also, am I the only one that has highmem support enabled, because this
> looks like this bug would have been triggered by anyone.

the fix would be to find the place that disabled interrupts, and to
check that it's safe to change it to local_irq_disable_nort() (or
whatever other variant is used). Usually it's safe.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 20:33                         ` john cooper
@ 2004-12-09 22:19                           ` Steven Rostedt
  2004-12-09 23:10                             ` john cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2004-12-09 22:19 UTC (permalink / raw)
  To: john cooper
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, emann, Peter Zijlstra

On Thu, 2004-12-09 at 15:33 -0500, john cooper wrote:
> Steven Rostedt wrote:
> 
> > ...Also, am I the only one that
> > has highmem support enabled, because this looks like this bug would have
> > been triggered by anyone.
> 
> I have not encountered this in -12 with CONFIG_HIGHMEM4G
> running on an SMP Opteron with 1GB memory.  Details attached.

Hi John,

Could you do me a big favor? Put a print in mm/highmem.c bounce_copy_vec
to see if you get into it.  If you don't then it seems that my system is
triggering this and it just so happens that yours does not. Looking at
my dump, it shows that there may have been some contention in the ide
interrupt. 

I let it run for a little longer and the system does eventually get to a
login prompt, but I'm looking into this further.

-- Steve


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 22:19                           ` Steven Rostedt
@ 2004-12-09 23:10                             ` john cooper
  0 siblings, 0 replies; 25+ messages in thread
From: john cooper @ 2004-12-09 23:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt,
	Fernando Pablo Lopez-Lezcano, emann, Peter Zijlstra

Steven Rostedt wrote:

> Could you do me a big favor? Put a print in mm/highmem.c bounce_copy_vec
> to see if you get into it.  If you don't then it seems that my system is
> triggering this and it just so happens that yours does not.

Did so.  For whatever reason I don't appear to be getting into
bounce_copy_vec() during bootup as you seem to be.

-john


-- 
john.cooper@timesys.com


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-09 22:10                         ` Ingo Molnar
@ 2004-12-10  6:11                           ` Steven Rostedt
  2004-12-10 11:05                             ` Ingo Molnar
  2004-12-10 11:11                             ` Ingo Molnar
  0 siblings, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2004-12-10  6:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

On Thu, 2004-12-09 at 23:10 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > This looks like it was triggered by bounce_copy_vec calling
> > kmap_atomic which is now just kmap with irqs disabled.  Does this need
> > to change to __kmap_atomic?  Is this also used to make things more
> > preemptible, and start removing the local_irq_saves?  I'd like to know
> > so that you don't need to make the patches yourself and I can handle
> > things like this, but I need to know what the general ideas are. 
> > Also, am I the only one that has highmem support enabled, because this
> > looks like this bug would have been triggered by anyone.
> 
> the fix would be to find the place that disabled interrupts, and to
> check that it's safe to change it to local_irq_disable_nort() (or
> whatever other variant is used). Usually it's safe.

Hi Ingo,

Here's your fix. I haven't seen anything else cause the bug, and since
it uses local_irq_save, I guess the bounce_copy_vec can be called with
interrupts disabled. Since the kmap_atomic (or just kmap) checks for
that, I don't think I need more than what I've done.

Second, my ethernet doesn't work, and it really seems to be some kind of
interrupt trouble.  It sends out ARPs but doesn't see them come back,
and it also doesn't seem to know that it sent them out. I get the
following:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, tx_status 00 status e601.
  diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
eth0: Interrupt posted but not delivered -- IRQ blocked by another
device?
  Flags; bus-master 1, dirty 33(1) current 33(1)
  Transmit list 00000000 vs. f75012a0.
  0: @f7501200  length 80000043 status 8c010043
  1: @f75012a0  length 8000007a status 0c01007a
  2: @f7501340  length 8000002a status 0001002a
  3: @f75013e0  length 80000098 status 0c010098
  4: @f7501480  length 8000002a status 0001002a
  5: @f7501520  length 8000002a status 0001002a
  6: @f75015c0  length 8000002a status 0001002a
  7: @f7501660  length 8000002a status 0001002a
  8: @f7501700  length 80000043 status 0c010043
  9: @f75017a0  length 80000043 status 0c010043
  10: @f7501840  length 8000004f status 0c01004f
  11: @f75018e0  length 8000004f status 0c01004f
  12: @f7501980  length 80000043 status 0c010043
  13: @f7501a20  length 8000007a status 0c01007a
  14: @f7501ac0  length 80000098 status 0c010098
  15: @f7501b60  length 8000002a status 8001002a 

I have a (from lspci) 
0000:02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M
[Tornado] (rev 78)


I can look into this further to see what the problem is. One funny note:
on the vanilla kernel, my eth0 is at interrupt 177, but on the rt
patched kernel it's swapped with the sound card and is at interrupt 169.
Well, it's getting too late for me now (it's 1am my time (01:00 for you
European folks ;-), and I need to get up at 6:30 am). Tomorrow, I'll
hack on it some more.

Oh, and here's the highmem patch:

Index: mm/highmem.c
===================================================================
--- mm/highmem.c	(revision 16)
+++ mm/highmem.c	(working copy)
@@ -240,11 +240,11 @@
 	unsigned long flags;
 	unsigned char *vto;
 
-	local_irq_save(flags);
+	local_irq_save_nort(flags);
 	vto = kmap_atomic(to->bv_page, KM_BOUNCE_READ);
 	memcpy(vto + to->bv_offset, vfrom, to->bv_len);
 	kunmap_atomic(vto, KM_BOUNCE_READ);
-	local_irq_restore(flags);
+	local_irq_restore_nort(flags);
 }
 
 #else /* CONFIG_HIGHMEM */



-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-10  6:11                           ` Steven Rostedt
@ 2004-12-10 11:05                             ` Ingo Molnar
  2004-12-10 11:11                             ` Ingo Molnar
  1 sibling, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-10 11:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Hi Ingo,
> 
> Here's your fix. I haven't seen anything else cause the bug, and since
> it uses local_irq_save, I guess the bounce_copy_vec can be called with
> interrupts disabled. Since the kmap_atomic (or just kmap) checks for
> that, I don't think I need more than what I've done.

yeah, it looks good to me too - thx. I've uploaded -32-16 with your fix
included.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-10  6:11                           ` Steven Rostedt
  2004-12-10 11:05                             ` Ingo Molnar
@ 2004-12-10 11:11                             ` Ingo Molnar
  2004-12-10 16:32                               ` K.R. Foley
  2004-12-11  2:26                               ` Steven Rostedt
  1 sibling, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-10 11:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Second, my ethernet doesn't work, and it really seems to be some kind
> of interrupt trouble.  It sends out ARPs but doesn't see them come
> back, and it also doesn't seem to know that it sent them out. I get
> the following:
> 
> NETDEV WATCHDOG: eth0: transmit timed out
> eth0: transmit timed out, tx_status 00 status e601.
>   diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
> eth0: Interrupt posted but not delivered -- IRQ blocked by another
> device?
>   Flags; bus-master 1, dirty 33(1) current 33(1)
>   Transmit list 00000000 vs. f75012a0.
>   0: @f7501200  length 80000043 status 8c010043
>   1: @f75012a0  length 8000007a status 0c01007a
>   2: @f7501340  length 8000002a status 0001002a
>   3: @f75013e0  length 80000098 status 0c010098
>   4: @f7501480  length 8000002a status 0001002a
>   5: @f7501520  length 8000002a status 0001002a
>   6: @f75015c0  length 8000002a status 0001002a
>   7: @f7501660  length 8000002a status 0001002a
>   8: @f7501700  length 80000043 status 0c010043
>   9: @f75017a0  length 80000043 status 0c010043
>   10: @f7501840  length 8000004f status 0c01004f
>   11: @f75018e0  length 8000004f status 0c01004f
>   12: @f7501980  length 80000043 status 0c010043
>   13: @f7501a20  length 8000007a status 0c01007a
>   14: @f7501ac0  length 80000098 status 0c010098
>   15: @f7501b60  length 8000002a status 8001002a 
> 
> I have a (from lspci) 
> 0000:02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M
> [Tornado] (rev 78)
> 
> 
> I can look into this further to see what the problem is. One funny
> note, on the vanilla kernel, my eth0 is at interrupt 177, but on the
> rt patched kernel it's swapped with the sound card and is at interrupt
> 169. Well, it's getting too late for me now (it's 1am my time (01:00 for
> you European folks ;-), and I need to get up at 6:30 am). Tomorrow,
> I'll hack on it some more.

yeah, please check this - you are the first one to report this issue.

A good first step would be to switch to a non-PREEMPT_RT preemption
model (but keep the -RT codebase) and see whether the breakage is
related to that. If the breakage goes away with, say, PREEMPT_DESKTOP,
then i'd suggest enabling PREEMPT_HARDIRQS and PREEMPT_SOFTIRQS (but
keeping PREEMPT_DESKTOP): do these alone trigger the breakage? E.g. there
was an obscure timing bug in the floppy driver that only triggered with
PREEMPT_HARDIRQS enabled. So it's not out of the question that there's
some other driver bug/race in hiding. The other possibility is some
generic -RT kernel breakage, like the SLAB issue was.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-10 11:11                             ` Ingo Molnar
@ 2004-12-10 16:32                               ` K.R. Foley
  2004-12-10 18:02                                 ` Steven Rostedt
  2004-12-11  2:26                               ` Steven Rostedt
  1 sibling, 1 reply; 25+ messages in thread
From: K.R. Foley @ 2004-12-10 16:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> 
>>Second, my ethernet doesn't work, and it really seems to be some kind
>>of interrupt trouble.  It sends out ARPs but doesn't see them come
>>back, and it also doesn't seem to know that it sent them out. I get
>>the following:
>>
>>NETDEV WATCHDOG: eth0: transmit timed out
>>eth0: transmit timed out, tx_status 00 status e601.
>>  diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
>>eth0: Interrupt posted but not delivered -- IRQ blocked by another
>>device?
>>  Flags; bus-master 1, dirty 33(1) current 33(1)
>>  Transmit list 00000000 vs. f75012a0.
>>  0: @f7501200  length 80000043 status 8c010043
>>  1: @f75012a0  length 8000007a status 0c01007a
>>  2: @f7501340  length 8000002a status 0001002a
>>  3: @f75013e0  length 80000098 status 0c010098
>>  4: @f7501480  length 8000002a status 0001002a
>>  5: @f7501520  length 8000002a status 0001002a
>>  6: @f75015c0  length 8000002a status 0001002a
>>  7: @f7501660  length 8000002a status 0001002a
>>  8: @f7501700  length 80000043 status 0c010043
>>  9: @f75017a0  length 80000043 status 0c010043
>>  10: @f7501840  length 8000004f status 0c01004f
>>  11: @f75018e0  length 8000004f status 0c01004f
>>  12: @f7501980  length 80000043 status 0c010043
>>  13: @f7501a20  length 8000007a status 0c01007a
>>  14: @f7501ac0  length 80000098 status 0c010098
>>  15: @f7501b60  length 8000002a status 8001002a 
>>
>>I have a (from lspci) 
>>0000:02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M
>>[Tornado] (rev 78)
>>
>>
>>I can look into this further to see what the problem is. One funny
>>note, on the vanilla kernel, my eth0 is at interrupt 177, but on the
>>rt patched kernel it's swapped with the sound card and is at interrupt
>>169. Well, it's getting too late for me now (it's 1am my time (01:00 for
>>you European folks ;-), and I need to get up at 6:30 am). Tomorrow,
>>I'll hack on it some more.
> 
> 
> yeah, please check this - you are the first one to report this issue.
> 
> A good first step would be to switch to a non-PREEMPT_RT preemption
> model (but keep the -RT codebase) and see whether the breakage is
> related to that. If the breakage goes away with, say, PREEMPT_DESKTOP,
> then i'd suggest enabling PREEMPT_HARDIRQS and PREEMPT_SOFTIRQS (but
> keeping PREEMPT_DESKTOP): do these alone trigger the breakage? E.g. there
> was an obscure timing bug in the floppy driver that only triggered with
> PREEMPT_HARDIRQS enabled. So it's not out of the question that there's
> some other driver bug/race in hiding. The other possibility is some
> generic -RT kernel breakage, like the SLAB issue was.
> 
> 	Ingo
> 

I actually have this same card in the system I am sending this from 
currently running 2.6.10-rc2-mm3-V0.7.32-12 #15 SMP.

kr


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-10 16:32                               ` K.R. Foley
@ 2004-12-10 18:02                                 ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2004-12-10 18:02 UTC (permalink / raw)
  To: K.R. Foley
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

On Fri, 2004-12-10 at 10:32 -0600, K.R. Foley wrote:

> >>I have a (from lspci) 
> >>0000:02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M
> >>[Tornado] (rev 78)
> >>
> >>
> 

<snip> 

> I actually have this same card in the system I am sending this from 
> currently running 2.6.10-rc2-mm3-V0.7.32-12 #15 SMP.

I have a dual Athlon MP system with a Tyan S2466N4 motherboard. It looks
more like a problem with the interrupt controller.  I downloaded -17 and
tried that, but now it hangs while starting up cups, and the last message
from the kernel is:

lp0: using parport0 (interrupt-driven).

It looks like a pretty bad lockup: I can tell exactly when it happens
because the cursor stops blinking.

I'm right now compiling the PREEMPT_DESKTOP kernel to see if that gives
me the same problems.

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-10 11:11                             ` Ingo Molnar
  2004-12-10 16:32                               ` K.R. Foley
@ 2004-12-11  2:26                               ` Steven Rostedt
  2004-12-11  3:01                                 ` Steven Rostedt
                                                   ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Steven Rostedt @ 2004-12-11  2:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

On Fri, 2004-12-10 at 12:11 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Second, my ethernet doesn't work, and it really seems to be some kind
> > of interrupt trouble.  It sends out ARPs but doesn't see them come
> > back, and it also doesn't seem to know that it sent them out. I get
> > the following:
> > 

<snip>

> > I'll hack on it some more.
> 
> yeah, please check this - you are the first one to report this issue.

Hi Ingo,  I found the problem! And I now know why John Cooper didn't
have this problem.  I have CONFIG_PCI_MSI defined. I don't know why;
I must have seen the option a while ago and said to myself "That looks
cool, let's try it". Since I started the rt-patched kernel from the
config file of the vanilla kernel, it was still on.

Anyway, what is happening is that with CONFIG_PCI_MSI the io_apic code
maps irqs to vectors, and your code didn't account for it. So here's my
patch.

Index: arch/i386/kernel/io_apic.c
===================================================================
--- arch/i386/kernel/io_apic.c	(revision 18)
+++ arch/i386/kernel/io_apic.c	(working copy)
@@ -1942,12 +1942,14 @@
 
 static void end_level_ioapic_irq(unsigned int irq)
 {
+#ifndef CONFIG_PCI_MSI
 	if (!(irq_desc[irq].status & (IRQ_DISABLED | IRQ_INPROGRESS)) &&
 							irq_desc[irq].action)
+#endif
 		unmask_IO_APIC_irq(irq);
 }
 
-#else /* !CONFIG_PREEMPT_HARDIRQS || !CONFIG_SMP */
+#else /* !CONFIG_PREEMPT_HARDIRQS */
 
 static void mask_and_ack_level_ioapic_irq(unsigned int irq)
 {
@@ -2035,7 +2037,11 @@
 {
 	int irq = vector_to_irq(vector);
 
-	end_level_ioapic_irq(irq);
+#if defined(CONFIG_PREEMPT_HARDIRQS)
+	if (!(irq_desc[vector].status & (IRQ_DISABLED | IRQ_INPROGRESS)) &&
+							irq_desc[vector].action)
+#endif
+		end_level_ioapic_irq(irq);
 }
 
 static void enable_level_ioapic_vector(unsigned int vector)



--------------------

I also removed the "!CONFIG_SMP" part of the comment, since it really
wasn't correct. So now I can get back to looking at other things. This
may also explain why my system would hang with my USB printer attached
(the USB interrupts were vectored too). I'll plug my printer back in and
see if it works now. I'll let you know if I have any more problems.

Thanks,

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11  2:26                               ` Steven Rostedt
@ 2004-12-11  3:01                                 ` Steven Rostedt
  2004-12-11  7:37                                 ` Fernando Lopez-Lezcano
  2004-12-11  9:57                                 ` Ingo Molnar
  2 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2004-12-11  3:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra

On Fri, 2004-12-10 at 21:26 -0500, Steven Rostedt wrote:

> Hi Ingo,  I found the problem! and I now know why John Cooper didn't
                                                    ^^^^^^^^^^^
                                            Sorry, I meant K.R. Foley,
                                          since he's the one with the
                                          same ethernet card as me.

> have this problem.  I have CONFIG_PCI_MSI defined. I don't know why;
> I must have seen the option a while ago and said to myself "That looks
> cool, let's try it". Since I started with the config file of the vanilla
> kernel with your rt patches, it was still on.



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11  2:26                               ` Steven Rostedt
  2004-12-11  3:01                                 ` Steven Rostedt
@ 2004-12-11  7:37                                 ` Fernando Lopez-Lezcano
  2004-12-11 12:30                                   ` Steven Rostedt
  2004-12-11  9:57                                 ` Ingo Molnar
  2 siblings, 1 reply; 25+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-11  7:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt, emann,
	Peter Zijlstra

On Fri, 2004-12-10 at 18:26, Steven Rostedt wrote:
> On Fri, 2004-12-10 at 12:11 +0100, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > Second, my ethernet doesn't work, and it really seems to be some kind
> > > of interrupt trouble.  It sends out ARPs but doesn't see them come
> > > back, and it also doesn't seem to know that it sent them out. I get
> > > the following:
> > > 
> 
> <snip>
> 
> > > I'll hack on it some more.
> > 
> > yeah, please check this - you are the first one to report this issue.
> 
> Hi Ingo,  I found the problem! And I now know why John Cooper didn't
> have this problem.  I have CONFIG_PCI_MSI defined. I don't know why;
> I must have seen the option a while ago and said to myself "That looks
> cool, let's try it". Since I started with the config file of the vanilla
> kernel with your rt patches, it was still on.
> 
> Anyway, what is happening is that the io_apic code is mapping irqs to
> vectors, and your code didn't account for it. So here's my patch.

Can't wait to try the patch, although I don't have CONFIG_PCI_MSI defined
in the configurations I use. I've had problems with a network card (R8169
driver) for a while (I think I reported it): its interrupts were being
ignored. Hopefully it's the same problem...

-- Fernando




* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11  2:26                               ` Steven Rostedt
  2004-12-11  3:01                                 ` Steven Rostedt
  2004-12-11  7:37                                 ` Fernando Lopez-Lezcano
@ 2004-12-11  9:57                                 ` Ingo Molnar
  2 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-11  9:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rui Nuno Capela, LKML, Lee Revell, Mark Johnson, K.R. Foley,
	Florian Schmidt, Michal Schmidt, Fernando Pablo Lopez-Lezcano,
	emann, Peter Zijlstra


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Anyway, what is happening is that the io_apic code is mapping irqs to
> vectors, and your code didn't account for it. So here's my patch.

ah .. thanks, great debugging!

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11  7:37                                 ` Fernando Lopez-Lezcano
@ 2004-12-11 12:30                                   ` Steven Rostedt
  2004-12-13 23:34                                     ` Fernando Lopez-Lezcano
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2004-12-11 12:30 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt, emann,
	Peter Zijlstra

On Fri, 2004-12-10 at 23:37 -0800, Fernando Lopez-Lezcano wrote:

> Can't wait to try the patch, although I don't have CONFIG_PCI_MSI defined
> in the configurations I use. I've had problems with a network card (R8169
> driver) for a while (I think I reported it): its interrupts were being
> ignored. Hopefully it's the same problem...

Hi Fernando,

You may have the same problem, but the patch I sent won't solve it. My
patch only fixes a problem that occurs when CONFIG_PCI_MSI is defined.
But I'm sure there are other cases where threaded hardirqs might not work
properly with other configurations. Send me your .config, and if I get
time I'll take a look (your /proc/cpuinfo might also help).

Before you send this, make sure that the threaded hardirqs really are the
problem. Switch to PREEMPT_DESKTOP and make sure hardirqs are not
threaded.  If the problem goes away, then this may be your problem. If
it does not, I'm afraid that it's something else, and you don't need to
send me anything.

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
@ 2004-12-11 14:52 Nicolas Mailhot
  2004-12-11 15:41 ` Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Nicolas Mailhot @ 2004-12-11 14:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 429 bytes --]

Hi,

Just FYI, real-time kernels do not boot on my Fedora Devel (Rawhide)
system, including -RT-2.6.10-rc2-mm3-V0.7.32-12. 2.6.10-rc2-mm4, OTOH,
boots fine. It freezes just after initial hardware init, before going
into gfx mode.

(kernel config available on demand; it's almost the same, since the
2.6.10-rc2-mm4 config was generated using make oldconfig on the
-RT-2.6.10-rc2-mm3-V0.7.32-12 file)

Regards,

-- 
Nicolas Mailhot

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11 14:52 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Nicolas Mailhot
@ 2004-12-11 15:41 ` Steven Rostedt
  2004-12-12  9:45 ` Ingo Molnar
  2004-12-13  7:09 ` Ingo Molnar
  2 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2004-12-11 15:41 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: LKML, Ingo Molnar

On Sat, 2004-12-11 at 15:52 +0100, Nicolas Mailhot wrote:
> Just FYI, real-time kernels do not boot on my Fedora Devel (Rawhide)
> system, including -RT-2.6.10-rc2-mm3-V0.7.32-12. 2.6.10-rc2-mm4, OTOH,
> boots fine. It freezes just after initial hardware init, before going
> into gfx mode.
> 
> (kernel config available on demand; it's almost the same, since the
> 2.6.10-rc2-mm4 config was generated using make oldconfig on the
> -RT-2.6.10-rc2-mm3-V0.7.32-12 file)

Do you have CONFIG_PCI_MSI set? If you do, then I already sent in a
patch to Ingo. With MSI enabled, the RT kernel had a problem with
interrupts, and if I had anything plugged into the USB port, it would
hang too.

-- Steve



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11 14:52 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Nicolas Mailhot
  2004-12-11 15:41 ` Steven Rostedt
@ 2004-12-12  9:45 ` Ingo Molnar
  2004-12-13  7:09 ` Ingo Molnar
  2 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-12  9:45 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: linux-kernel


* Nicolas Mailhot <Nicolas.Mailhot@laPoste.net> wrote:

> Hi,
> 
> Just FYI, real-time kernels do not boot on my Fedora Devel (Rawhide)
> system, including -RT-2.6.10-rc2-mm3-V0.7.32-12. 2.6.10-rc2-mm4, OTOH,
> boots fine. It freezes just after initial hardware init, before going
> into gfx mode.

so it freezes hard while still in text mode? There wasn't any particular
freeze fixed since -32-12 (-19 is the latest), but you might want to try
-19 if it's easy to do.

> (kernel config available on demand; it's almost the same, since the
> 2.6.10-rc2-mm4 config was generated using make oldconfig on the
> -RT-2.6.10-rc2-mm3-V0.7.32-12 file)

please send the .config to me in private mail, along with a bootlog if
available. Does nmi_watchdog=1 (if the system has an IO-APIC) or
nmi_watchdog=2 (if it doesn't) produce any stack dumps from the lockup? 
(but those usually only make it to a serial log, not to the syslog.) 
Find below the mini-howto on how to set up serial logging and how to
debug lockups with the NMI watchdog.

	Ingo


to set up serial logging:
-------------------------

install a null modem cable (== serial cable) to one of the serial ports
of the server, connect the cable to another box, run a terminal program
on that other box (e.g. "minicom -m" - do Alt-L to switch on logging
after starting it up) and set up the server's kernel to do serial
logging: enable CONFIG_SERIAL_8250_CONSOLE and
CONFIG_SERIAL_CORE_CONSOLE, recompile & reinstall the kernel, add
"console=ttyS0,38400 console=tty0" to your /etc/grub.conf or
/etc/lilo.conf kernel boot line, reboot the server with the new kernel
command line - and configure minicom to run with that speed (Alt-S).

e.g. my /etc/grub.conf has:

title test-2.6 (test-2.6)
        root (hd0,0)
        kernel /boot/bzImage root=/dev/sda1 console=ttyS0,38400 console=tty0 nmi_watchdog=1 kernel_preempt=1

if everything is set up correctly then you should see kernel messages
showing up in the minicom session when you boot up.

If the messages do not show up, the typical error is a mismatch between
the serial port (or speed) and the device name used: if it's COM2 then
use ttyS1, and don't forget to set up the serial speed option of
minicom, etc. You can test the serial connection by doing:

	echo x > /dev/ttyS0

and that should show up in the minicom session on the other box.

to set up the NMI watchdog:
---------------------------

add nmi_watchdog=1 to your boot parameters and reboot - that should be
all it takes to activate it. If the NMI count of every CPU increases in
/proc/interrupts, then it's working fine. If the counts do not increase
(or only one CPU's count increases), then try nmi_watchdog=2 - this is
another type of NMI that might work better. (Very rarely there are boxes
that don't have reliable NMI counts with either 1 or 2 - but I don't
think your box is one of those.)

once the NMI watchdog is up and running it should catch all hard lockups
and print backtraces to the serial console - even if you are within X
while the lockup happens. You can test hard lockups by running the
attached 'lockupcli' userspace code as root - it turns off interrupts
and goes into an infinite loop => instant lockup. The NMI watchdog
should notice this condition after a couple of seconds and should abort
the task, printing a kernel trace as well. Your box should be back in
working order after that point.

now for the real lockup your box won't be 'fixed' by the NMI watchdog; it
will likely stay locked up, but you should get messages on the serial
console, giving us an idea where the kernel locked up and why. (Very
rarely not even the NMI watchdog prints anything for a hard lockup -
this is often the sign of hardware problems.)

	Ingo

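--- lockupcli.c

#include <sys/io.h>     /* iopl(): x86 Linux only */

int main(void)
{
        /* IOPL 3 lets a root process execute "cli" in userspace */
        if (iopl(3) < 0)
                return 1;
        for (;;)
                asm volatile("cli");    /* irqs off + endless loop => hard lockup */
}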



* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11 14:52 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Nicolas Mailhot
  2004-12-11 15:41 ` Steven Rostedt
  2004-12-12  9:45 ` Ingo Molnar
@ 2004-12-13  7:09 ` Ingo Molnar
  2 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-13  7:09 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: linux-kernel


* Nicolas Mailhot <Nicolas.Mailhot@laPoste.net> wrote:

> Just FYI, real-time kernels do not boot on my Fedora Devel (Rawhide)
> system, including -RT-2.6.10-rc2-mm3-V0.7.32-12. 2.6.10-rc2-mm4, OTOH,
> boots fine. It freezes just after initial hardware init, before going
> into gfx mode.
> 
> (kernel config available on demand; it's almost the same, since the
> 2.6.10-rc2-mm4 config was generated using make oldconfig on the
> -RT-2.6.10-rc2-mm3-V0.7.32-12 file)

I cannot reproduce this on two test systems using your .config, so it's
probably some hardware detail that makes the difference.

	Ingo


* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-11 12:30                                   ` Steven Rostedt
@ 2004-12-13 23:34                                     ` Fernando Lopez-Lezcano
  2004-12-15  9:51                                       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Fernando Lopez-Lezcano @ 2004-12-13 23:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt, emann,
	Peter Zijlstra

On Sat, 2004-12-11 at 04:30, Steven Rostedt wrote:
> On Fri, 2004-12-10 at 23:37 -0800, Fernando Lopez-Lezcano wrote:
> > Can't wait to try the patch, although I don't have CONFIG_PCI_MSI defined
> > in the configurations I use. I've had problems with a network card (R8169
> > driver) for a while (I think I reported it): its interrupts were being
> > ignored. Hopefully it's the same problem...
>
> You may have the same problem, but the patch I sent won't solve it. My
> patch only fixes a problem that occurs when CONFIG_PCI_MSI is defined.
> But I'm sure there are other cases where threaded hardirqs might not work
> properly with other configurations. Send me your .config, and if I get
> time I'll take a look (your /proc/cpuinfo might also help).
> 
> Before you send this, make sure that the threaded hardirqs really are the
> problem. Switch to PREEMPT_DESKTOP and make sure hardirqs are not
> threaded.

[The following is all done booting into 0.7.32-19, interrupt scheduling
and priorities unchanged from the defaults]

I'm using PREEMPT_DESKTOP. I don't know how to force the kernel to not
thread hardirqs. What I did (maybe the same thing?) is to boot single
user, then turn off /proc/sys/kernel/hardirq_preempt, and then start the
network. It works (the network). And then I tried the same thing with
hardirq_preempt=1, and it still worked when booting single user... :-(

So, these are the interrupts after booting single user:
           CPU0       
  0:      36426    IO-APIC-edge  timer  0/35956
  1:        143    IO-APIC-edge  i8042  0/143
  4:          0    IO-APIC-edge  KGDB-stub  0/0
  8:          1    IO-APIC-edge  rtc  0/1
  9:          0   IO-APIC-level  acpi  0/0
 12:        100    IO-APIC-edge  i8042  0/100
 14:         26    IO-APIC-edge  ide0  1/24
 17:         59   IO-APIC-level  libata, libata  0/59
 20:       1462   IO-APIC-level  libata  0/1462
 21:          0   IO-APIC-level  ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd, uhci_hcd  0/0
NMI:       7544 
LOC:      36254 
ERR:          0
MIS:          0

Here's the extra one I have after I start the network

 16:          7   IO-APIC-level  eth0  0/7

(network works). I then telinit 3, no changes in interrupts (except for
the count numbers), network continues working. 

I then telinit 5 and the network dies. This is the added interrupt:

 11:          0    IO-APIC-edge  radeon@PCI:1:0:0  0/0

I tried repeating the whole thing but going through all the services in
the transition from level 3 to level 5 one by one, and nothing happened
to the network so it must be X. I then rebooted into single user,
started the network and loaded the radeon kernel module alone and the
network was not affected. 

So, this is what I get in dmesg when I go into level 5:

NET: Registered protocol family 10
Disabled Privacy Extensions on device c03e0ec0(lo)
IPv6 over IPv4 tunneling driver
divert: not allocating divert_blk for non-ethernet device sit0
eth0: no IPv6 routers present
[drm] Initialized radeon 1.11.0 20020828 on minor 0:
ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
agpgart: Found an AGP 3.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 4x mode
[drm] Loading R200 Microcode
irq 16: nobody cared!
 [<c01041e3>] dump_stack+0x23/0x30 (20)
 [<c014d480>] __report_bad_irq+0x30/0xa0 (24)
 [<c014d590>] note_interrupt+0x70/0xb0 (32)
 [<c014d30c>] do_hardirq+0x13c/0x150 (40)
 [<c014d399>] do_irqd+0x79/0xb0 (32)
 [<c013bf7a>] kthread+0xaa/0xb0 (48)
 [<c0101335>] kernel_thread_helper+0x5/0x10 (153411604)
---------------------------
| preempt count: 00000002 ]
| 2-level deep critical section nesting:
----------------------------------------
.. [<c014d2aa>] .... do_hardirq+0xda/0x150
.....[<c014d399>] ..   ( <= do_irqd+0x79/0xb0)
.. [<c014045d>] .... print_traces+0x1d/0x60
.....[<c01041e3>] ..   ( <= dump_stack+0x23/0x30)
 
handlers:
[<f88e3f00>] (rtl8169_interrupt+0x0/0x1a0 [r8169])
Disabling IRQ #16

Anything else I could test?

> If the problem goes away, then this may be your problem. If
> it does not, I'm afraid that it's something else, and you don't need to
> send me anything.

Let me know if you still want the .config...
-- Fernando




* Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
  2004-12-13 23:34                                     ` Fernando Lopez-Lezcano
@ 2004-12-15  9:51                                       ` Ingo Molnar
  0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2004-12-15  9:51 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Steven Rostedt, Rui Nuno Capela, LKML, Lee Revell, Mark Johnson,
	K.R. Foley, Florian Schmidt, Michal Schmidt, emann,
	Peter Zijlstra


* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:

> [The following is all done booting into 0.7.32-19, interrupt
> scheduling and priorities unchanged from the defaults]
> 
> I'm using PREEMPT_DESKTOP. I don't know how to force the kernel to not
> thread hardirqs. [...]

this is now moot for your case, but here's how you can disable hardirq
threading: you can disable CONFIG_PREEMPT_HARDIRQS in the .config, or
you can disable it via the hardirq-preempt=0 boot-time kernel flag.

	Ingo


end of thread, other threads:[~2004-12-15  9:52 UTC | newest]

Thread overview: 25+ messages
2004-12-11 14:52 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Nicolas Mailhot
2004-12-11 15:41 ` Steven Rostedt
2004-12-12  9:45 ` Ingo Molnar
2004-12-13  7:09 ` Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2004-11-24 10:16 [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm2-V0.7.30-10 Ingo Molnar
2004-12-03 20:58 ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm2-V0.7.32-0 Ingo Molnar
2004-12-07 13:29   ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-4 Ingo Molnar
2004-12-07 14:11     ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-6 Ingo Molnar
2004-12-08 17:13       ` Steven Rostedt
2004-12-08 18:14         ` Rui Nuno Capela
2004-12-08 19:03           ` Steven Rostedt
2004-12-08 21:39             ` Rui Nuno Capela
2004-12-08 22:11               ` Steven Rostedt
2004-12-09  9:32                 ` Ingo Molnar
2004-12-09 13:13                   ` [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12 Ingo Molnar
2004-12-09 14:23                     ` Gene Heskett
2004-12-09 14:33                     ` Steven Rostedt
2004-12-09 19:19                       ` Steven Rostedt
2004-12-09 20:33                         ` john cooper
2004-12-09 22:19                           ` Steven Rostedt
2004-12-09 23:10                             ` john cooper
2004-12-09 22:10                         ` Ingo Molnar
2004-12-10  6:11                           ` Steven Rostedt
2004-12-10 11:05                             ` Ingo Molnar
2004-12-10 11:11                             ` Ingo Molnar
2004-12-10 16:32                               ` K.R. Foley
2004-12-10 18:02                                 ` Steven Rostedt
2004-12-11  2:26                               ` Steven Rostedt
2004-12-11  3:01                                 ` Steven Rostedt
2004-12-11  7:37                                 ` Fernando Lopez-Lezcano
2004-12-11 12:30                                   ` Steven Rostedt
2004-12-13 23:34                                     ` Fernando Lopez-Lezcano
2004-12-15  9:51                                       ` Ingo Molnar
2004-12-11  9:57                                 ` Ingo Molnar
2004-12-09 14:43                     ` Rui Nuno Capela
