public inbox for linux-kernel@vger.kernel.org
* v2.6.22.1-rt3
@ 2007-07-13 11:22 Thomas Gleixner
  2007-07-13 11:36 ` v2.6.22.1-rt3 Remy Bohmer
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Thomas Gleixner @ 2007-07-13 11:22 UTC (permalink / raw)
  To: LKML; +Cc: RT-Users, Ingo Molnar

we are pleased to announce the v2.6.22.1-rt3 kernel

Attention! 

Ingo is off for a long weekend and therefore the download location for
this release is:

 http://www.tglx.de/projects/preempt-rt/2.6.22.1
  
more info about the -rt patchset can be found in the RT wiki:
  
   http://rt.wiki.kernel.org
 
This release is a bugfix release:

- update of the x86_64 -hrt queue (resolve boot problems)
- gtod vsyscall fix from Gregory Haskins

to build a 2.6.22.1-rt3 tree, the following patches should be applied:
 
   http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.1.tar.bz2
   http://www.tglx.de/projects/preempt-rt/2.6.22.1-rt3/patch-2.6.22.1-rt3.patch


	Thomas




* Re: v2.6.22.1-rt3
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
@ 2007-07-13 11:36 ` Remy Bohmer
  2007-07-13 16:05   ` v2.6.22.1-rt3 Thomas Gleixner
  2007-07-13 16:10 ` v2.6.22.1-rt3 Kevin Hilman
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Remy Bohmer @ 2007-07-13 11:36 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

Thomas,

>  http://www.tglx.de/projects/preempt-rt/2.6.22.1

This is a dead link...
It should be: http://www.tglx.de/projects/preempt-rt/2.6.22.1-rt3/

Remy

>
> more info about the -rt patchset can be found in the RT wiki:
>
>    http://rt.wiki.kernel.org
>
> This release is a bugfix release:
>
> - update of the x86_64 -hrt queue (resolve boot problems)
> - gtod vsyscall fix from Gregory Haskins
>
> to build a 2.6.22.1-rt3 tree, the following patches should be applied:
>
>    http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.1.tar.bz2
>    http://www.tglx.de/projects/preempt-rt/2.6.22.1-rt3/patch-2.6.22.1-rt3.patch
>
>
>         Thomas
>
>


* Re: v2.6.22.1-rt3
  2007-07-13 11:36 ` v2.6.22.1-rt3 Remy Bohmer
@ 2007-07-13 16:05   ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2007-07-13 16:05 UTC (permalink / raw)
  To: linux; +Cc: LKML, RT-Users, Ingo Molnar

On Fri, 2007-07-13 at 13:36 +0200, Remy Bohmer wrote:
> Thomas,
> 
> >  http://www.tglx.de/projects/preempt-rt/2.6.22.1
> 
> This is a dead link...
> It should be: http://www.tglx.de/projects/preempt-rt/2.6.22.1-rt3/

Grmbl, the publishing script choked.

	tglx




* Re: v2.6.22.1-rt3
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
  2007-07-13 11:36 ` v2.6.22.1-rt3 Remy Bohmer
@ 2007-07-13 16:10 ` Kevin Hilman
  2007-07-13 16:32 ` v2.6.22.1-rt3 Kevin Hilman
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Kevin Hilman @ 2007-07-13 16:10 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

Thomas,

In arm-preempt-config.patch, GENERIC_TIME is removed from the OMAP
arch. Can you undo that removal? OMAP is still GENERIC_TIME capable
unless something has been done to break it.

In other words, on top of -rt3:

Index: linux-2.6/arch/arm/Kconfig
===================================================================
--- linux-2.6.orig/arch/arm/Kconfig
+++ linux-2.6/arch/arm/Kconfig
@@ -394,6 +394,7 @@ config ARCH_DAVINCI
 config ARCH_OMAP
        bool "TI OMAP"
        select GENERIC_GPIO
+       select GENERIC_TIME
        help
          Support for TI's OMAP platform (OMAP1 and OMAP2).

Signed-off-by: Kevin Hilman <khilman@mvsita.com>



* Re: v2.6.22.1-rt3
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
  2007-07-13 11:36 ` v2.6.22.1-rt3 Remy Bohmer
  2007-07-13 16:10 ` v2.6.22.1-rt3 Kevin Hilman
@ 2007-07-13 16:32 ` Kevin Hilman
  2007-07-13 17:18 ` v2.6.22.1-rt3 - Early INT13 boot crash Carsten Emde
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Kevin Hilman @ 2007-07-13 16:32 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

Thomas,

There is a typo in preempt-irqs-core.patch: IRQF_TIMER is renamed to
_IRQF_TIMER but referenced later as __IRQF_TIMER.

Here's a patch that makes it compile, but I'm not sure whether you want
one or two underscores.

With these two patches, -rt3 builds and boots on ARM/OMAP1.

Kevin

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -52,7 +52,7 @@
 #define IRQF_SAMPLE_RANDOM     0x00000040
 #define IRQF_SHARED            0x00000080
 #define IRQF_PROBE_SHARED      0x00000100
-#define _IRQF_TIMER            0x00000200
+#define __IRQF_TIMER           0x00000200
 #define IRQF_PERCPU            0x00000400
 #define IRQF_NOBALANCING       0x00000800
 #define IRQF_IRQPOLL           0x00001000



* Re: v2.6.22.1-rt3 - Early INT13 boot crash
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
                   ` (2 preceding siblings ...)
  2007-07-13 16:32 ` v2.6.22.1-rt3 Kevin Hilman
@ 2007-07-13 17:18 ` Carsten Emde
  2007-07-13 17:25 ` v2.6.22.1-rt3 Fernando Lopez-Lezcano
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Carsten Emde @ 2007-07-13 17:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

Thomas,

> we are pleased to announce the v2.6.22.1-rt3 kernel
Thanks a lot!

Using the .config file from the previous stable version (2.6.21.6-rt21), 
the new version runs okay on an x86_64 system. On an i386 system, 
however, the system is crashing at an early boot stage displaying an 
INT13 reg dump at the window bottom.

After disabling paravirtualization support (CONFIG_PARAVIRT), the new 
version is working fine on i386 as well.

Just in case someone out there is experiencing the same problem.

	--cbe


* Re: v2.6.22.1-rt3
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
                   ` (3 preceding siblings ...)
  2007-07-13 17:18 ` v2.6.22.1-rt3 - Early INT13 boot crash Carsten Emde
@ 2007-07-13 17:25 ` Fernando Lopez-Lezcano
  2007-07-14  0:33 ` v2.6.22.1-rt3 Josh Triplett
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Fernando Lopez-Lezcano @ 2007-07-13 17:25 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

On Fri, 2007-07-13 at 13:22 +0200, Thomas Gleixner wrote:
> we are pleased to announce the v2.6.22.1-rt3 kernel
> 
> Attention! 
> 
> Ingo is off for a long weekend and therefore the download location for
> this release is:
> 
>  http://www.tglx.de/projects/preempt-rt/2.6.22.1
>   
> more info about the -rt patchset can be found in the RT wiki:
>   
>    http://rt.wiki.kernel.org
>  
> This release is a bugfix release:
> 
> - update of the x86_64 -hrt queue (resolve boot problems)
> - gtod vsyscall fix from Gregory Haskins

Same problem as I reported yesterday for 2.6.22.1-rt2 on a T61 laptop:
boot hangs, and the last BUG printed is similar to the one below (the
numbers have changed since yesterday, of course, but the functions listed
appear to be the same). No serial port is available to dump everything...

This was copied from the screen yesterday:

BUG: spinlock lockup on CPU#1, swapper/0, c318da88
[<c0405f34>] show_trace_log_lvl+0x1a/0x2f
[<c0406a09>] show_trace+0x12/0x14
[<c0406a71>] dump_stack+0x16/0x18
[<c0617a91>] _raw_spin_lock+0xc1/0xe2
[<c061743f>] __spin_lock_irq+0x14/0x16
[<c061541d>] __sched_text_start+0xd5/0xaef
[<c061600e>] schedule+0xe0/0xfa
[<c0616c15>] rt_spin_lock_slowlock+0xcf/0x14f
[<c061724b>] __rt_spin_lock+0x3d/0x40
[<c0617256>] rt_spin_lock+0x8/0xa
[<c052f95c>] acpi_idle_enter_c3+0x12d/0x232
[<c059af51>] cpuidle_idle_call+0x56/0x79
[<c04033a5>] cpu_idle+0x9d/0xda
[<c0419e21>] start_secondary+0x34e/0x356
[<00000000>] 0x0

Same .config as before.
-- Fernando




* Re: v2.6.22.1-rt3
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
                   ` (4 preceding siblings ...)
  2007-07-13 17:25 ` v2.6.22.1-rt3 Fernando Lopez-Lezcano
@ 2007-07-14  0:33 ` Josh Triplett
  2007-07-14 21:39 ` 2.6.22.1-rt3 lockups Rui Nuno Capela
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Josh Triplett @ 2007-07-14  0:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

On Fri, 2007-07-13 at 13:22 +0200, Thomas Gleixner wrote:
> we are pleased to announce the v2.6.22.1-rt3 kernel
[...]
> This release is a bugfix release:
> 
> - update of the x86_64 -hrt queue (resolve boot problems)
> - gtod vsyscall fix from Gregory Haskins

I can confirm that this patch fixes booting on an 8-CPU x86-64 box that
-rt2 would not boot on.

- Josh Triplett




* 2.6.22.1-rt3 lockups
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
                   ` (5 preceding siblings ...)
  2007-07-14  0:33 ` v2.6.22.1-rt3 Josh Triplett
@ 2007-07-14 21:39 ` Rui Nuno Capela
  2007-07-20  3:37 ` v2.6.22.1-rt3 Daniel Walker
  2007-07-21 22:07 ` 2.6.22.1-rt4 lockups Rui Nuno Capela
  8 siblings, 0 replies; 17+ messages in thread
From: Rui Nuno Capela @ 2007-07-14 21:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

Hi,

Current 2.6.22.1-rt3 locks up on any of my x86 SMP machines, on very
rare and non-deterministic occasions under normal desktop workloads,
though it seems more likely when heavy disk I/O is underway.

At least I was able to capture some crash traces via serial console,
with nmi_watchdog=1.

...
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: appletalk ax25 ipx p8023 snd_rtctimer snd_seq_dummy
snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_midi_event w83627hf
hwmon_vid snd_seq button hwmon battery ac eeprom loop dm_mod ohci1394
ieee1394 wacom usbhid hid ff_memless snd_ice1712 snd_ice17xx_ak4xxx
snd_ak4xxx_adda snd_cs8427 snd_i2c firewire_ohci snd_mpu401_uart sk98lin
firewire_core ide_cd nvidia(P) snd_cs46xx gameport snd_rawmidi
snd_seq_device cdrom crc_itu_t snd_intel8x0 snd_ac97_codec ehci_hcd
uhci_hcd ac97_bus snd_pcm intel_agp snd_timer i2c_i801 agpgart snd
i2c_core soundcore snd_page_alloc shpchp iTCO_wdt rtc_cmos usbcore
pci_hotplug rtc_core rtc_lib ext3 mbcache jbd edd fan piix thermal
processor ide_disk ide_core
CPU:    1
EIP:    0060:[<00000000>]    Tainted: P       VLI
EFLAGS: 00210007   (2.6.22.1-rt3.0 #1)
EIP is at _stext+0x3feff000/0x20
eax: c1812a80   ebx: c03bb540   ecx: 00000001   edx: c038e3c0
esi: c038e3c0   edi: 00000001   ebp: c5099d6c   esp: c5099d50
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068  preempt:00000003
Process cc1plus (pid: 20669, ti=c5099000 task=deca8070 task.ti=c5099000)
Stack: c011a8fc 3302bd39 000009cb c1812a80 c1812a80 3302bd39 000009cb
c5099d90
       c011b45f 3302bd39 000009cb 00000001 c038e3c0 00000001 00000000
c038e3c0
       c5099df4 c011e04d c5099dfc c011ddeb 00000000 0000001f c1812a80
0000001f
Call Trace:
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
 [<c0106521>] show_registers+0x201/0x330
 [<c0106768>] die+0x118/0x260
 [<c03041e3>] do_page_fault+0x193/0x600
 [<c03028fa>] error_code+0x72/0x78
 [<c011b45f>] activate_task+0x4f/0xb0
 [<c011e04d>] try_to_wake_up+0x2bd/0x420
 [<c011e229>] wake_up_process_mutex+0x19/0x20
 [<c014257c>] wakeup_next_waiter+0xec/0x1a0
 [<c03016ec>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301fa6>] rt_spin_unlock+0x26/0x30
 [<c015b394>] put_zone_pcp+0x14/0x20
 [<c015c215>] get_page_from_freelist+0x145/0x380
 [<c015c4a4>] __alloc_pages+0x54/0x2d0
 [<c016526d>] __handle_mm_fault+0x7dd/0x9a0
 [<c0304348>] do_page_fault+0x2f8/0x600
 [<c03028fa>] error_code+0x72/0x78
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3feff000/0x20 SS:ESP 0068:c5099d50
__sched_text_start+0x91e/0xbd0
 [<c030086e>] schedule+0x2e/0x110
 [<c030184e>] rt_spin_lock_slowlock+0x8e/0x170
 [<c0301fd0>] __rt_spin_lock+0x20/0x30
 [<c0301fe8>] rt_spin_lock+0x8/0x10
 [<c015b53b>] get_zone_pcp+0x2b/0x50
 [<c015be97>] free_hot_cold_page+0xc7/0x190
 [<c015bfba>] free_hot_page+0xa/0x10
 [<c015bfe7>] __free_pages+0x27/0x30
 [<c015c016>] free_pages+0x26/0x30
 [<c01765e5>] quicklist_trim+0xc5/0x110
 [<c011875e>] check_pgt_cache+0x1e/0x20
 [<c01033b9>] cpu_idle+0x49/0xb0
 [<c02ff88d>] rest_init+0x6d/0x70
 [<c03c1e01>] start_kernel+0x301/0x3b0
 [<00000000>] _stext+0x3feff000/0x20
 =======================
NMI watchdog detected lockup on CPU#1 (5000/5000)

...
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: appletalk ax25 ipx p8023 snd_rtctimer snd_seq_dummy
snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_midi_event snd_seq
w83627hf hwmon_vid button battery hwmon eeprom ac loop dm_mod ohci1394
ieee1394 wacom usbhid snd_ice1712 hid snd_ice17xx_ak4xxx snd_cs46xx
snd_ak4xxx_adda ff_memless snd_cs8427 gameport sk98lin firewire_ohci
snd_i2c snd_mpu401_uart nvidia(P) snd_rawmidi firewire_core
snd_seq_device crc_itu_t snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
snd_timer ide_cd snd iTCO_wdt soundcore shpchp cdrom snd_page_alloc
pci_hotplug i2c_i801 i2c_core ehci_hcd uhci_hcd usbcore rtc_cmos
rtc_core rtc_lib intel_agp agpgart ext3 mbcache jbd edd fan piix thermal
processor ide_disk ide_core
CPU:    1
EIP:    0060:[<00000000>]    Tainted: P       VLI
EFLAGS: 00010003   (2.6.22.1-rt3.0 #1)
EIP is at _stext+0x3feff000/0x20
eax: c181ca80   ebx: c03bb540   ecx: 00000001   edx: dfca0c30
esi: dfca0c30   edi: 00000001   ebp: d2025b54   esp: d2025b38
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068  preempt:00000003
Process rsync (pid: 18436, ti=d2025000 task=ca152730 task.ti=d2025000)
Stack: c011a8fc 00387b36 000008de c181ca80 c181ca80 00387b36 000008de
d2025b78
       c011b45f 00387b36 000008de 00000001 dfca0c30 00000001 00000001
dfca0c30
       d2025bdc c011e04d d2025be4 c011ddeb 00000000 0000001f c181ca80
0000001f
Call Trace:
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
 [<c0106521>] show_registers+0x201/0x330
 [<c0106768>] die+0x118/0x260
 [<c03041e3>] do_page_fault+0x193/0x600
 [<c03028fa>] error_code+0x72/0x78
 [<c011b45f>] activate_task+0x4f/0xb0
 [<c011e04d>] try_to_wake_up+0x2bd/0x420
 [<c011e229>] wake_up_process_mutex+0x19/0x20
 [<c014257c>] wakeup_next_waiter+0xec/0x1a0
 [<c03016ec>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301fa6>] rt_spin_unlock+0x26/0x30
 [<c015b394>] put_zone_pcp+0x14/0x20
 [<c015c215>] get_page_from_freelist+0x145/0x380
 [<c015c4a4>] __alloc_pages+0x54/0x2d0
 [<c0174e79>] cache_alloc_refill+0x2b9/0x510
 [<c0174bae>] kmem_cache_alloc+0x7e/0x90
 [<f95817a2>] ext3_alloc_inode+0x12/0x50 [ext3]
 [<c018c609>] alloc_inode+0x19/0x190
 [<c018c7ce>] iget_locked+0x4e/0x140
 [<f9581470>] ext3_lookup+0x80/0xe0 [ext3]
 [<c017ffd8>] do_lookup+0x138/0x180
 [<c018219d>] __link_path_walk+0x81d/0xe10
 [<c01827d6>] link_path_walk+0x46/0xd0
 [<c0182879>] path_walk+0x19/0x20
 [<c0182a2b>] do_path_lookup+0x7b/0x220
 [<c0183458>] __user_walk_fd+0x38/0x50
 [<c017be2e>] vfs_lstat_fd+0x1e/0x50
 [<c017bea1>] vfs_lstat+0x11/0x20
 [<c017bec4>] sys_lstat64+0x14/0x30
 [<c01051d2>] sysenter_past_esp+0x5f/0x85
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3feff000/0x20 SS:ESP 0068:d2025b38
NMI watchdog detected lockup on CPU#1 (5000/5000)

Pid: 18436, comm:                rsync
EIP: 0060:[<c03022b6>] CPU: 1
EIP is at __spin_lock+0x16/0x20
 EFLAGS: 00000082    Tainted: P        (2.6.22.1-rt3.0 #1)
EAX: c181ca80 EBX: c181ca80 ECX: dfc2c1b0 EDX: d2025000
ESI: c0403a80 EDI: dfc2c1b0 EBP: d202598c DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: ffffffd5 CR3: 1027e000 CR4: 000006d0
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c0106e12>] show_trace+0x12/0x20
 [<c0103af3>] show_regs+0x183/0x190
 [<c0303420>] nmi_watchdog_tick+0x1f0/0x290
 [<c0302e57>] do_nmi+0x77/0x260
 [<c03029a3>] nmi_stack_correct+0x26/0x2b
 [<c011bb77>] task_rq_lock+0x37/0x70
 [<c011e00a>] try_to_wake_up+0x27a/0x420
 [<c011e1c8>] default_wake_function+0x18/0x20
 [<c013689b>] autoremove_wake_function+0x1b/0x50
 [<c011a6a9>] __wake_up_common+0x39/0x60
 [<c01204a3>] __wake_up+0x33/0x60
 [<c012374b>] wake_up_klogd+0x3b/0x40
 [<c01ee247>] bust_spinlocks+0x27/0x30
 [<c01067bc>] die+0x16c/0x260
 [<c03041e3>] do_page_fault+0x193/0x600
 [<c03028fa>] error_code+0x72/0x78
 [<c011b45f>] activate_task+0x4f/0xb0
 [<c011e04d>] try_to_wake_up+0x2bd/0x420
 [<c011e229>] wake_up_process_mutex+0x19/0x20
 [<c014257c>] wakeup_next_waiter+0xec/0x1a0
 [<c03016ec>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301fa6>] rt_spin_unlock+0x26/0x30
 [<c015b394>] put_zone_pcp+0x14/0x20
 [<c015c215>] get_page_from_freelist+0x145/0x380
 [<c015c4a4>] __alloc_pages+0x54/0x2d0
 [<c0174e79>] cache_alloc_refill+0x2b9/0x510
 [<c0174bae>] kmem_cache_alloc+0x7e/0x90
 [<f95817a2>] ext3_alloc_inode+0x12/0x50 [ext3]
 [<c018c609>] alloc_inode+0x19/0x190
 [<c018c7ce>] iget_locked+0x4e/0x140
 [<f9581470>] ext3_lookup+0x80/0xe0 [ext3]
 [<c017ffd8>] do_lookup+0x138/0x180
 [<c018219d>] __link_path_walk+0x81d/0xe10
 [<c01827d6>] link_path_walk+0x46/0xd0
 [<c0182879>] path_walk+0x19/0x20
 [<c0182a2b>] do_path_lookup+0x7b/0x220
 [<c0183458>] __user_walk_fd+0x38/0x50
 [<c017be2e>] vfs_lstat_fd+0x1e/0x50
 [<c017bea1>] vfs_lstat+0x11/0x20
 [<c017bec4>] sys_lstat64+0x14/0x30
 [<c01051d2>] sysenter_past_esp+0x5f/0x85
 =======================
NMI watchdog detected lockup on CPU#0 (0/5000)

...
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: appletalk ax25 ipx p8023 snd_rtctimer snd_seq_dummy
snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_midi_event snd_seq button
battery ac w83627hf hwmon_vid hwmon eeprom loop dm_mod wacom usbhid hid
ff_memless nvidia(P) snd_ice1712 snd_ice17xx_ak4xxx snd_ak4xxx_adda
snd_cs8427 snd_cs46xx sk98lin snd_i2c gameport snd_mpu401_uart
snd_rawmidi snd_seq_device ohci1394 ieee1394 snd_intel8x0 snd_ac97_codec
ac97_bus snd_pcm firewire_ohci firewire_core snd_timer crc_itu_t ide_cd
cdrom shpchp intel_agp snd i2c_i801 iTCO_wdt agpgart pci_hotplug
i2c_core soundcore ehci_hcd snd_page_alloc uhci_hcd usbcore rtc_cmos
rtc_core rtc_lib ext3 mbcache jbd edd fan piix thermal processor
ide_disk ide_core
CPU:    0
EIP:    0060:[<00000000>]    Tainted: P       VLI
EFLAGS: 00213003   (2.6.22.1-rt3.0 #1)
EIP is at _stext+0x3feff000/0x20
eax: c1812a80   ebx: c03bb540   ecx: 00000001   edx: c038e3c0
esi: c038e3c0   edi: 00000001   ebp: f4fb0d6c   esp: f4fb0d50
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068  preempt:00000003
Process Xorg (pid: 4145, ti=f4fb0000 task=dfd8c6b0 task.ti=f4fb0000)
Stack: c011a8fc f00c393c 00000b0a c1812a80 c1812a80 f00c393c 00000b0a
f4fb0d90
       c011b45f f00c393c 00000b0a 00000001 c038e3c0 00000000 00000000
c038e3c0
       f4fb0df4 c011e04d 00000000 c180d000 00000000 0000001f c1812a80
f4fb0e20
Call Trace:
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
 [<c0106521>] show_registers+0x201/0x330
 [<c0106768>] die+0x118/0x260
 [<c03041e3>] do_page_fault+0x193/0x600
 [<c03028fa>] error_code+0x72/0x78
 [<c011b45f>] activate_task+0x4f/0xb0
 [<c011e04d>] try_to_wake_up+0x2bd/0x420
 [<c011e229>] wake_up_process_mutex+0x19/0x20
 [<c014257c>] wakeup_next_waiter+0xec/0x1a0
 [<c03016ec>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301fa6>] rt_spin_unlock+0x26/0x30
 [<c015b394>] put_zone_pcp+0x14/0x20
 [<c015c215>] get_page_from_freelist+0x145/0x380
 [<c015c51f>] __alloc_pages+0xcf/0x2d0
 [<c016526d>] __handle_mm_fault+0x7dd/0x9a0
 [<c0304348>] do_page_fault+0x2f8/0x600
 [<c03028fa>] error_code+0x72/0x78
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3feff000/0x20 SS:ESP 0068:f4fb0d50
NMI watchdog detected lockup on CPU#1 (5000/5000)

Pid: 2779, comm:                klogd
EIP: 0060:[<c03022b9>] CPU: 1
EIP is at __spin_lock+0x19/0x20
 EFLAGS: 00000082    Tainted: P        (2.6.22.1-rt3.0 #1)
EAX: c1812a80 EBX: c1812a80 ECX: 00000001 EDX: f4d11000
ESI: c0403a80 EDI: dff1c1b0 EBP: f4d11d1c DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: b7faa000 CR3: 1fe08000 CR4: 000006d0
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c0106e12>] show_trace+0x12/0x20
 [<c0103af3>] show_regs+0x183/0x190
 [<c0303420>] nmi_watchdog_tick+0x1f0/0x290
 [<c0302e57>] do_nmi+0x77/0x260
 [<c03029a3>] nmi_stack_correct+0x26/0x2b
 [<c011bb77>] task_rq_lock+0x37/0x70
 [<c011ddb7>] try_to_wake_up+0x27/0x420
 [<c011e1c8>] default_wake_function+0x18/0x20
 [<c011a6a9>] __wake_up_common+0x39/0x60
 [<c012050b>] __wake_up_sync+0x3b/0x50
 [<c02939b9>] sock_def_readable+0x79/0x80
 [<c02fafc0>] unix_dgram_sendmsg+0x450/0x500
 [<c028eff4>] sock_aio_write+0x114/0x130
 [<c0178160>] do_sync_write+0xd0/0x110
 [<c0178a5d>] vfs_write+0x14d/0x160
 [<c017907d>] sys_write+0x3d/0x70
 [<c01051d2>] sysenter_past_esp+0x5f/0x85
 =======================
NMI watchdog detected lockup on CPU#0 (0/5000)

...

Here are the complete console captures:

  http://www.rncbc.org/datahub/console-2.6.22.1-rt3.0-1.log
  http://www.rncbc.org/datahub/console-2.6.22.1-rt3.0-2.log
  http://www.rncbc.org/datahub/console-2.6.22.1-rt3.0-3.log

.config evidence:

  http://www.rncbc.org/datahub/config-2.6.22.1-rt3.0

Cheers.
--
rncbc aka Rui Nuno Capela


* Re: v2.6.22.1-rt3
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
                   ` (6 preceding siblings ...)
  2007-07-14 21:39 ` 2.6.22.1-rt3 lockups Rui Nuno Capela
@ 2007-07-20  3:37 ` Daniel Walker
  2007-07-20  3:41   ` v2.6.22.1-rt3 Daniel Walker
  2007-07-21  0:25   ` v2.6.22.1-rt3 Thomas Gleixner
  2007-07-21 22:07 ` 2.6.22.1-rt4 lockups Rui Nuno Capela
  8 siblings, 2 replies; 17+ messages in thread
From: Daniel Walker @ 2007-07-20  3:37 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar


I reworked the broken out series for 2.6.22.1-rt5 (note not -rt3) so
that it's a little more bisectable. I found that many of the patches
would compile but wouldn't boot. 

Combined patch,
ftp://source.mvista.com/pub/dwalker/rt/patch-2.6.22.1-rt4-dw1


The broken out series is here,
ftp://source.mvista.com/pub/dwalker/rt/patch-2.6.22.1-rt4-dw1.tar.gz

Below is a diff between the 2.6.22.1-rt4 series and mine, and an
interdiff between the two combined patches.

--- patches-2.6.22.1-rt4/series	2007-07-16 02:29:51.000000000 -0700
+++ patches/series	2007-07-19 20:40:00.000000000 -0700
@@ -306,18 +306,21 @@
 #
 # IRQ threading
 #
+preempt-softirqs-core.patch
 preempt-irqs-core.patch
+preempt-irqs-softirq-in-hardirq.patch
+preempt-irqs-direct-debug-keyboard.patch
 preempt-irqs-timer.patch
 preempt-irqs-hrtimer.patch
 
 preempt-irqs-i386.patch
+preempt-irqs-i386-ioapic-mask-quirk.patch
 
 preempt-irqs-mips.patch
 
 preempt-irqs-x86-64.patch
 preempt-irqs-x86-64-ioapic-mask-quirk.patch
 
-preempt-irqs-i386-ioapic-mask-quirk.patch
 preempt-irqs-arm.patch
 preempt-irqs-arm-fix-oprofile.patch
 
@@ -352,7 +355,7 @@
 rt-mutex-mips.patch
 
 rt-mutex-ppc.patch
-rt-mtex-ppc-fix-a5.patch
+rt-mutex-ppc-fix-a5.patch
 
 rt-mutex-x86-64.patch
 
@@ -402,6 +405,7 @@
 #
 # Posix-cpu-timers in a thread
 #
+preempt-realtime-warn-and-bug-on.patch
 cputimer-thread-rt_A0.patch
 cputimer-thread-rt-fix.patch
 posix-cpu-timers-fix.patch
@@ -501,7 +505,6 @@
 preempt-realtime-timer.patch
 preempt-realtime-usb.patch
 
-preempt-realtime-warn-and-bug-on.patch
 preempt-realtime-warn-and-bug-on-fix.patch
 
 #
@@ -611,7 +614,6 @@
 # Softirq modifications
 #
 new-softirq-code.patch
-new-softirq-code-fixlets.patch
 softirq-per-cpu-assumptions-fixes.patch
 smp-processor-id-fixups.patch
 fix-migrating-softirq.patch
@@ -659,13 +661,9 @@
 #
 # not yet backmerged tail patches:
 #
-hrt-rt-fix-merge-artifact.patch
 preempt-rt-no-slub.patch
 rfkill-input-fix.patch
-fork.c-takeover-tasklets-warning-fix.patch
 
 paravirt-function-pointer-fix.patch
-hpet-build-fix.patch
-rtc.c-build-fix.patch
 version.patch
 

diff -u linux/arch/i386/kernel/hpet.c linux-2.6.22.1/arch/i386/kernel/hpet.c
--- linux/arch/i386/kernel/hpet.c
+++ linux-2.6.22.1/arch/i386/kernel/hpet.c	2007-07-20 02:22:56.000000000 +0000
@@ -9,7 +9,6 @@
 #include <linux/pm.h>
 
 #include <asm/fixmap.h>
-#include <asm/i8253.h>
 #include <asm/hpet.h>
 #include <asm/i8253.h>
 #include <asm/io.h>
diff -u linux/include/asm-generic/bug.h linux-2.6.22.1/include/asm-generic/bug.h
--- linux/include/asm-generic/bug.h
+++ linux-2.6.22.1/include/asm-generic/bug.h	2007-07-20 03:15:15.000000000 +0000
@@ -94,14 +94,2 @@
 
-#ifdef CONFIG_PREEMPT_RT
-# define BUG_ON_RT(c)			BUG_ON(c)
-# define BUG_ON_NONRT(c)		do { } while (0)
-# define WARN_ON_RT(condition)		WARN_ON(condition)
-# define WARN_ON_NONRT(condition)	do { } while (0)
-#else
-# define BUG_ON_RT(c)			do { } while (0)
-# define BUG_ON_NONRT(c)		BUG_ON(c)
-# define WARN_ON_RT(condition)		do { } while (0)
-# define WARN_ON_NONRT(condition)	WARN_ON(condition)
-#endif
-
 #endif
diff -u linux/kernel/softirq.c linux-2.6.22.1/kernel/softirq.c
--- linux/kernel/softirq.c
+++ linux-2.6.22.1/kernel/softirq.c	2007-07-20 03:15:17.000000000 +0000
@@ -102,7 +102,6 @@
 
 	if (unlikely(!tsk))
 		return;
-#if 1
 #if defined(CONFIG_PREEMPT_SOFTIRQS) && defined(CONFIG_PREEMPT_HARDIRQS)
 	/*
 	 * Optimization: if we are in a hardirq thread context, and
@@ -117,7 +116,6 @@
 			(tsk->normal_prio == current->normal_prio))
 		return;
 #endif
-#endif
 	/*
 	 * Wake up the softirq task:
 	 */




* Re: v2.6.22.1-rt3
  2007-07-20  3:37 ` v2.6.22.1-rt3 Daniel Walker
@ 2007-07-20  3:41   ` Daniel Walker
  2007-07-21  0:25   ` v2.6.22.1-rt3 Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Daniel Walker @ 2007-07-20  3:41 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, RT-Users, Ingo Molnar

On Thu, 2007-07-19 at 20:37 -0700, Daniel Walker wrote:
> I reworked the broken out series for 2.6.22.1-rt5 (note not -rt3) so

Whoops, I mean 2.6.22.1-rt4 here.



* Re: v2.6.22.1-rt3
  2007-07-20  3:37 ` v2.6.22.1-rt3 Daniel Walker
  2007-07-20  3:41   ` v2.6.22.1-rt3 Daniel Walker
@ 2007-07-21  0:25   ` Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2007-07-21  0:25 UTC (permalink / raw)
  To: Daniel Walker; +Cc: LKML, RT-Users, Ingo Molnar

On Thu, 2007-07-19 at 20:37 -0700, Daniel Walker wrote:
> The broken out series is here,
> ftp://source.mvista.com/pub/dwalker/rt/patch-2.6.22.1-rt4-dw1.tar.gz

I'll pick that up soon.

Thanks,

	tglx




* 2.6.22.1-rt4 lockups
  2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
                   ` (7 preceding siblings ...)
  2007-07-20  3:37 ` v2.6.22.1-rt3 Daniel Walker
@ 2007-07-21 22:07 ` Rui Nuno Capela
  2007-07-22 21:00   ` Rui Nuno Capela
  2007-07-23 16:08   ` Daniel Walker
  8 siblings, 2 replies; 17+ messages in thread
From: Rui Nuno Capela @ 2007-07-21 22:07 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar; +Cc: LKML, RT-Users

Hi,

As with -rt3, I was able to capture one more crash trace, via serial
console, with nmi_watchdog=1.

Yes, current 2.6.22.1-rt4 is still locking up on my ix86 SMT/SMP boxes.
It takes some hours of uptime and normal desktop use, and then it just
locks up without warning.

The last couple of occurrences all happened while browsing with firefox
(2.0.0.5) or using openoffice.org (2.0.4), but in a rare and
non-deterministic fashion, I must say.

It looks very similar to the ones I reported for -rt3, but I am no
expert in these things.

...
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: tun appletalk ax25 ipx p8023 snd_rtctimer
snd_seq_dummy snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_midi_event
snd_seq w83627hf hwmon_vid hwmon eeprom button battery ac loop dm_mod
wacom usbhid hid ff_memless ohci1394 ieee1394 nvidia(P) snd_cs46xx
gameport firewire_ohci snd_ice1712 snd_ice17xx_ak4xxx snd_ak4xxx_adda
snd_cs8427 snd_i2c snd_mpu401_uart snd_rawmidi snd_seq_device
firewire_core sk98lin snd_intel8x0 crc_itu_t snd_ac97_codec ac97_bus
ide_cd snd_pcm cdrom snd_timer uhci_hcd ehci_hcd i2c_i801 snd rtc_cmos
shpchp iTCO_wdt i2c_core usbcore rtc_core pci_hotplug soundcore
intel_agp rtc_lib agpgart snd_page_alloc ext3 mbcache jbd edd fan piix
thermal processor ide_disk ide_core
CPU:    0
EIP:    0060:[<00000000>]    Tainted: P       VLI
EFLAGS: 00213006   (2.6.22.1-rt4.0 #1)
EIP is at _stext+0x3feff000/0x20
eax: c1812a80   ebx: c03bb540   ecx: 00000001   edx: c038e3c0
esi: c038e3c0   edi: 00000001   ebp: f6fe1d6c   esp: f6fe1d50
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068  preempt:00000003
Process Xorg (pid: 4101, ti=f6fe1000 task=f754ec30 task.ti=f6fe1000)
Stack: c011a94c 04882eab 00000ca9 c1812a80 c1812a80 04882eab 00000ca9
f6fe1d90
       c011b4af 04882eab 00000ca9 00000001 c038e3c0 00000000 00000000
c038e3c0
       f6fe1df4 c011e09d f6fe1dfc c011de3b 00000000 0000001f c1812a80
0000001f
Call Trace:
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
 [<c0106521>] show_registers+0x201/0x330
 [<c0106768>] die+0x118/0x260
 [<c0304233>] do_page_fault+0x193/0x600
 [<c030294a>] error_code+0x72/0x78
 [<c011b4af>] activate_task+0x4f/0xb0
 [<c011e09d>] try_to_wake_up+0x2bd/0x420
 [<c011e279>] wake_up_process_mutex+0x19/0x20
 [<c01425cc>] wakeup_next_waiter+0xec/0x1a0
 [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301ff6>] rt_spin_unlock+0x26/0x30
 [<c015b3e4>] put_zone_pcp+0x14/0x20
 [<c015c265>] get_page_from_freelist+0x145/0x380
 [<c015c4f4>] __alloc_pages+0x54/0x2d0
 [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0
 [<c0304398>] do_page_fault+0x2f8/0x600
 [<c030294a>] error_code+0x72/0x78
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3feff000/0x20 SS:ESP 0068:f6fe1d50
NMI watchdog detected lockup on CPU#1 (5000/5000)
...


Complete serial console capture:

  http://www.rncbc.org/datahub/console-2.6.22.1-rt4.0-1.log

.config evidence:

  http://www.rncbc.org/datahub/config-2.6.22.1-rt4.0

Cheers.
--
rncbc aka Rui Nuno Capela


* Re: 2.6.22.1-rt4 lockups
  2007-07-21 22:07 ` 2.6.22.1-rt4 lockups Rui Nuno Capela
@ 2007-07-22 21:00   ` Rui Nuno Capela
  2007-07-23 16:08   ` Daniel Walker
  1 sibling, 0 replies; 17+ messages in thread
From: Rui Nuno Capela @ 2007-07-22 21:00 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar; +Cc: LKML, RT-Users

Hi again,

Sorry to bother, but I got another one. Please advise whether these dumps
are of any use or are just garbage.

...
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: appletalk ax25 ipx p8023 snd_rtctimer snd_seq_dummy
snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_midi_event snd_seq
w83627hf hwmon_vid hwmon button eeprom battery ac loop dm_mod ohci1394
ieee1394 snd_ice1712 snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_cs8427
nvidia(P) snd_i2c wacom usbhid hid ff_memless firewire_ohci snd_cs46xx
snd_mpu401_uart gameport snd_rawmidi firewire_core crc_itu_t
snd_seq_device sk98lin snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm
snd_timer snd intel_agp soundcore ide_cd ehci_hcd uhci_hcd cdrom
iTCO_wdt shpchp snd_page_alloc agpgart usbcore i2c_i801 pci_hotplug
i2c_core rtc_cmos rtc_core rtc_lib ext3 mbcache jbd edd fan piix thermal
processor ide_disk ide_core
CPU:    0
EIP:    0060:[<00000000>]    Tainted: P       VLI
EFLAGS: 00010003   (2.6.22.1-rt4.0 #1)
EIP is at _stext+0x3feff000/0x20
eax: c1812a80   ebx: c03bb540   ecx: 00000001   edx: c038e3c0
esi: c038e3c0   edi: 00000001   ebp: c64a9d6c   esp: c64a9d50
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068  preempt:00000003
Process thunderbird-bin (pid: 12848, ti=c64a9000 task=e4514db0 task.ti=c64a9000)
Stack: c011a94c d1027dbe 00001dc1 c1812a80 c1812a80 d1027dbe 00001dc1 c64a9d90
       c011b4af d1027dbe 00001dc1 00000001 c038e3c0 00000000 00000000 c038e3c0
       c64a9df4 c011e09d 00000000 00000000 00000000 0000001f c1812a80 c64a9e20
Call Trace:
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
 [<c0106521>] show_registers+0x201/0x330
 [<c0106768>] die+0x118/0x260
 [<c0304233>] do_page_fault+0x193/0x600
 [<c030294a>] error_code+0x72/0x78
 [<c011b4af>] activate_task+0x4f/0xb0
 [<c011e09d>] try_to_wake_up+0x2bd/0x420
 [<c011e279>] wake_up_process_mutex+0x19/0x20
 [<c01425cc>] wakeup_next_waiter+0xec/0x1a0
 [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301ff6>] rt_spin_unlock+0x26/0x30
 [<c015b3e4>] put_zone_pcp+0x14/0x20
 [<c015c265>] get_page_from_freelist+0x145/0x380
 [<c015c4f4>] __alloc_pages+0x54/0x2d0
 [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0
 [<c0304398>] do_page_fault+0x2f8/0x600
 [<c030294a>] error_code+0x72/0x78
 =======================
Code:  Bad EIP value.
EIP: [<00000000>] _stext+0x3feff000/0x20 SS:ESP 0068:c64a9d50
NMI watchdog detected lockup on CPU#1 (5000/5000)

Pid: 2882, comm:                klogd
EIP: 0060:[<c0302309>] CPU: 1
EIP is at __spin_lock+0x19/0x20
 EFLAGS: 00000082    Tainted: P        (2.6.22.1-rt4.0 #1)
EAX: c1812a80 EBX: c1812a80 ECX: 00000001 EDX: f6c57000
ESI: c0403a80 EDI: dff03230 EBP: f6c57d1c DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: ae41a000 CR3: 3730e000 CR4: 000006d0
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c0106e12>] show_trace+0x12/0x20
 [<c0103af3>] show_regs+0x183/0x190
 [<c0303470>] nmi_watchdog_tick+0x1f0/0x290
 [<c0302ea7>] do_nmi+0x77/0x260
 [<c03029f3>] nmi_stack_correct+0x26/0x2b
 [<c011bbc7>] task_rq_lock+0x37/0x70
 [<c011de07>] try_to_wake_up+0x27/0x420
 [<c011e218>] default_wake_function+0x18/0x20
 [<c011a6f9>] __wake_up_common+0x39/0x60
 [<c012055b>] __wake_up_sync+0x3b/0x50
 [<c0293a09>] sock_def_readable+0x79/0x80
 [<c02fb010>] unix_dgram_sendmsg+0x450/0x500
 [<c028f044>] sock_aio_write+0x114/0x130
 [<c01781b0>] do_sync_write+0xd0/0x110
 [<c0178aad>] vfs_write+0x14d/0x160
 [<c01790cd>] sys_write+0x3d/0x70
 [<c01051d2>] sysenter_past_esp+0x5f/0x85
 =======================
NMI watchdog detected lockup on CPU#0 (0/5000)

Pid: 12848, comm:      thunderbird-bin
EIP: 0060:[<c0302309>] CPU: 0
EIP is at __spin_lock+0x19/0x20
 EFLAGS: 00000082    Tainted: P        (2.6.22.1-rt4.0 #1)
EAX: c1812a80 EBX: c1812a80 ECX: 00000000 EDX: c040d000
ESI: c0403a80 EDI: f74cf8f0 EBP: c040df50 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: ffffffd5 CR3: 0c391000 CR4: 000006d0
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c0106e12>] show_trace+0x12/0x20
 [<c0103af3>] show_regs+0x183/0x190
 [<c0303470>] nmi_watchdog_tick+0x1f0/0x290
 [<c0302ea7>] do_nmi+0x77/0x260
 [<c03029f3>] nmi_stack_correct+0x26/0x2b
 [<c011bbc7>] task_rq_lock+0x37/0x70
 [<c011de07>] try_to_wake_up+0x27/0x420
 [<c011e2b9>] wake_up_process+0x19/0x20
 [<c01519c7>] redirect_hardirq+0x47/0x60
 [<c015343b>] handle_fasteoi_irq+0x6b/0x100
 [<c01075f4>] do_IRQ+0x94/0x100
 [<c0105beb>] common_interrupt+0x23/0x28
 [<c0126798>] do_exit+0x88/0x890
 [<c01068a7>] die+0x257/0x260
 [<c0304233>] do_page_fault+0x193/0x600
 [<c030294a>] error_code+0x72/0x78
 [<c011b4af>] activate_task+0x4f/0xb0
 [<c011e09d>] try_to_wake_up+0x2bd/0x420
 [<c011e279>] wake_up_process_mutex+0x19/0x20
 [<c01425cc>] wakeup_next_waiter+0xec/0x1a0
 [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301ff6>] rt_spin_unlock+0x26/0x30
 [<c015b3e4>] put_zone_pcp+0x14/0x20
 [<c015c265>] get_page_from_freelist+0x145/0x380
 [<c015c4f4>] __alloc_pages+0x54/0x2d0
 [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0
 [<c0304398>] do_page_fault+0x2f8/0x600
 [<c030294a>] error_code+0x72/0x78
 =======================
...


Complete serial console capture:

  http://www.rncbc.org/datahub/console-2.6.22.1-rt4.0-2.log

.config evidence:

  http://www.rncbc.org/datahub/config-2.6.22.1-rt4.0

Bye now
-- 
rncbc aka Rui Nuno Capela
rncbc@rncbc.org


* Re: 2.6.22.1-rt4 lockups
  2007-07-21 22:07 ` 2.6.22.1-rt4 lockups Rui Nuno Capela
  2007-07-22 21:00   ` Rui Nuno Capela
@ 2007-07-23 16:08   ` Daniel Walker
  2007-07-23 20:15     ` Daniel Walker
  1 sibling, 1 reply; 17+ messages in thread
From: Daniel Walker @ 2007-07-23 16:08 UTC (permalink / raw)
  To: Rui Nuno Capela; +Cc: Thomas Gleixner, Ingo Molnar, LKML, RT-Users

On Sat, 2007-07-21 at 23:07 +0100, Rui Nuno Capela wrote:

> Call Trace:
>  [<c010622a>] show_trace_log_lvl+0x1a/0x30
>  [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
>  [<c0106521>] show_registers+0x201/0x330
>  [<c0106768>] die+0x118/0x260
>  [<c0304233>] do_page_fault+0x193/0x600
>  [<c030294a>] error_code+0x72/0x78
>  [<c011b4af>] activate_task+0x4f/0xb0
>  [<c011e09d>] try_to_wake_up+0x2bd/0x420
>  [<c011e279>] wake_up_process_mutex+0x19/0x20
>  [<c01425cc>] wakeup_next_waiter+0xec/0x1a0
>  [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70
>  [<c0301ff6>] rt_spin_unlock+0x26/0x30
>  [<c015b3e4>] put_zone_pcp+0x14/0x20
>  [<c015c265>] get_page_from_freelist+0x145/0x380
>  [<c015c4f4>] __alloc_pages+0x54/0x2d0
>  [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0
>  [<c0304398>] do_page_fault+0x2f8/0x600
>  [<c030294a>] error_code+0x72/0x78
>  =======================

I was able to reproduce a similar-looking hang by running kernbench
together with another load (I used ltpstress.sh from LTP).

I'm debugging it now.

Daniel



* Re: 2.6.22.1-rt4 lockups
  2007-07-23 16:08   ` Daniel Walker
@ 2007-07-23 20:15     ` Daniel Walker
  2007-07-23 20:38       ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Walker @ 2007-07-23 20:15 UTC (permalink / raw)
  To: Rui Nuno Capela; +Cc: Thomas Gleixner, Ingo Molnar, LKML, RT-Users

On Mon, 2007-07-23 at 09:08 -0700, Daniel Walker wrote:
> On Sat, 2007-07-21 at 23:07 +0100, Rui Nuno Capela wrote:
> 
> > Call Trace:
> >  [<c010622a>] show_trace_log_lvl+0x1a/0x30
> >  [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
> >  [<c0106521>] show_registers+0x201/0x330
> >  [<c0106768>] die+0x118/0x260
> >  [<c0304233>] do_page_fault+0x193/0x600
> >  [<c030294a>] error_code+0x72/0x78
> >  [<c011b4af>] activate_task+0x4f/0xb0
> >  [<c011e09d>] try_to_wake_up+0x2bd/0x420
> >  [<c011e279>] wake_up_process_mutex+0x19/0x20
> >  [<c01425cc>] wakeup_next_waiter+0xec/0x1a0
> >  [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70
> >  [<c0301ff6>] rt_spin_unlock+0x26/0x30
> >  [<c015b3e4>] put_zone_pcp+0x14/0x20
> >  [<c015c265>] get_page_from_freelist+0x145/0x380
> >  [<c015c4f4>] __alloc_pages+0x54/0x2d0
> >  [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0
> >  [<c0304398>] do_page_fault+0x2f8/0x600
> >  [<c030294a>] error_code+0x72/0x78
> >  =======================
> 
> I was able to reproduce a similar-looking hang by running kernbench
> together with another load (I used ltpstress.sh from LTP).
> 
> I'm debugging it now.

It looks like sched_class->enqueue_task() is NULL, and that's why the
system hangs.

This happens because check_pgt_cache() is called from the idle thread,
and with PREEMPT_RT check_pgt_cache() locks at least one mutex. Once the
idle thread is on a wait_list, as soon as it is woken by the mutex owner
the system will crash in enqueue_task, since the idle thread has a NULL
sched_class->enqueue_task.

check_pgt_cache() is already called from desched_thread(), so I think
it could just be removed from the i386 cpu_idle().
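
To make the failure mode concrete, here is a minimal user-space sketch of
the idea, not the kernel code itself. The struct layouts, fair_class,
idle_class and wake_up_task() are hypothetical and much reduced; only the
notion of a per-class enqueue_task hook that is NULL for the idle thread
comes from the analysis above. The real wakeup path has no NULL check and
simply calls through the pointer, which would match the EIP of 00000000
in the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct task;

/* Hypothetical, much-reduced stand-in for the scheduler class hooks. */
struct sched_class {
	void (*enqueue_task)(struct task *p);
};

/* Hypothetical, much-reduced stand-in for a task descriptor. */
struct task {
	const char *comm;
	const struct sched_class *sched_class;
	int on_rq;
};

/* An ordinary class knows how to put a task on a runqueue. */
static void fair_enqueue(struct task *p)
{
	p->on_rq = 1;
}

static const struct sched_class fair_class = { .enqueue_task = fair_enqueue };

/* The idle thread is never supposed to be enqueued, so its hook is NULL. */
static const struct sched_class idle_class = { .enqueue_task = NULL };

/*
 * Wake a task that was blocked on a lock.
 * Returns 0 on success, -1 where an unchecked call would jump to address 0.
 * The explicit check exists only so this sketch can report the problem
 * instead of crashing.
 */
static int wake_up_task(struct task *p)
{
	assert(p != NULL && p->sched_class != NULL);

	if (p->sched_class->enqueue_task == NULL) {
		fprintf(stderr, "%s: enqueue_task is NULL, call would fault\n",
			p->comm);
		return -1;
	}

	p->sched_class->enqueue_task(p);
	return 0;
}
```

Calling wake_up_task() on a task that uses fair_class succeeds and sets
on_rq, while calling it on a task that uses idle_class returns -1. That
second case corresponds to the idle thread being woken by the mutex
owner after it blocked inside check_pgt_cache().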

Anyone have comments on the theory above?

Daniel



* Re: 2.6.22.1-rt4 lockups
  2007-07-23 20:15     ` Daniel Walker
@ 2007-07-23 20:38       ` Ingo Molnar
  0 siblings, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2007-07-23 20:38 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Rui Nuno Capela, Thomas Gleixner, LKML, RT-Users


* Daniel Walker <dwalker@mvista.com> wrote:

> It looks like sched_class->enqueue_task() is NULL, and that's why the
> system hangs.
> 
> This happens because check_pgt_cache() is called from the idle thread,
> and with PREEMPT_RT check_pgt_cache() locks at least one mutex. Once
> the idle thread is on a wait_list, as soon as it is woken by the mutex
> owner the system will crash in enqueue_task, since the idle thread has
> a NULL sched_class->enqueue_task.
> 
> check_pgt_cache() is already called from desched_thread(), so I think
> it could just be removed from the i386 cpu_idle().
> 
> Anyone have comments on the theory above?

yeah, that call definitely looks wrong in cpu_idle(). Most of the other
check_pgt_cache() calls introduced by commit f1d1a842 look wrong too in
an -rt context. Fix is below.

	Ingo

Index: linux-rt.q/arch/i386/kernel/process.c
===================================================================
--- linux-rt.q.orig/arch/i386/kernel/process.c
+++ linux-rt.q/arch/i386/kernel/process.c
@@ -189,7 +189,6 @@ void cpu_idle(void)
 
 			tick_nohz_stop_sched_tick();
 
-			check_pgt_cache();
 			rmb();
 			idle = pm_idle;
 


Thread overview: 17+ messages
2007-07-13 11:22 v2.6.22.1-rt3 Thomas Gleixner
2007-07-13 11:36 ` v2.6.22.1-rt3 Remy Bohmer
2007-07-13 16:05   ` v2.6.22.1-rt3 Thomas Gleixner
2007-07-13 16:10 ` v2.6.22.1-rt3 Kevin Hilman
2007-07-13 16:32 ` v2.6.22.1-rt3 Kevin Hilman
2007-07-13 17:18 ` v2.6.22.1-rt3 - Early INT13 boot crash Carsten Emde
2007-07-13 17:25 ` v2.6.22.1-rt3 Fernando Lopez-Lezcano
2007-07-14  0:33 ` v2.6.22.1-rt3 Josh Triplett
2007-07-14 21:39 ` 2.6.22.1-rt3 lockups Rui Nuno Capela
2007-07-20  3:37 ` v2.6.22.1-rt3 Daniel Walker
2007-07-20  3:41   ` v2.6.22.1-rt3 Daniel Walker
2007-07-21  0:25   ` v2.6.22.1-rt3 Thomas Gleixner
2007-07-21 22:07 ` 2.6.22.1-rt4 lockups Rui Nuno Capela
2007-07-22 21:00   ` Rui Nuno Capela
2007-07-23 16:08   ` Daniel Walker
2007-07-23 20:15     ` Daniel Walker
2007-07-23 20:38       ` Ingo Molnar
