* [Xenomai-core] nervous nmi-watchdog
@ 2006-05-23 20:21 Jan Kiszka
  2006-06-05 16:35 ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-05-23 20:21 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 4049 bytes --]

Hi,

I'm getting nmi alarms about latency being > 100 us on a dual P-III 1GHz
(with and without CONFIG_SMP) once I start the latency test tool. But
they must be false positives. Can someone comment on this trace:

> :    *fn                 -109!  91.810  __ipipe_unstall_root+0x8 (default_idle+0x3f)
> :|    fn                  -18    0.220  __ipipe_handle_irq+0xe (common_interrupt+0x18)
> :|    fn                  -17    0.228  __ipipe_ack_system_irq+0x8 (__ipipe_handle_irq+0x7f)
> :|    fn                  -17    0.191  __ipipe_dispatch_wired+0xb (__ipipe_handle_irq+0x8a)
> :|  * fn                  -17    0.219  xnintr_clock_handler+0x8 (__ipipe_dispatch_wired+0x77)
> :|  * fn                  -17    0.198  rthal_nmi_disarm+0x8 (xnintr_clock_handler+0xd)
> :|  * fn                  -17    0.202  xnintr_irq_handler+0xb (xnintr_clock_handler+0x1d)
> :|  * fn                  -16    0.197  xnpod_announce_tick+0x8 (xnintr_irq_handler+0x24)
> :|  * fn                  -16    0.256  xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0xf)
> :|  * fn                  -16    0.203  xnthread_periodic_handler+0x8 (xntimer_do_tick_aperiodic+0x7c)
> :|  * fn                  -16    0.590  xnpod_resume_thread+0xe (xnthread_periodic_handler+0x1c)
> :|  * fn                  -15    0.315  rthal_nmi_arm+0xe (xntimer_do_tick_aperiodic+0x1ed)
> :|  * (0x00) 0x000305fb   -15    0.223  rthal_nmi_arm+0xb5 (xntimer_do_tick_aperiodic+0x1ed)

[This is an ipipe_trace_special, reporting the delay (~200 us = 100 us
period + 100 us nmi-trigger).]
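
As a rough cross-check of the value in that trace line, the sketch below
converts 0x000305fb into a time. It assumes (this is not stated above) that
the traced value is a TSC tick count and that the TSC runs at the nominal
1 GHz of the CPU. ticks_to_ns() is a hypothetical helper written for this
illustration, not a Xenomai or I-pipe function.

```c
#include <stdint.h>

/*
 * Convert a tick count to nanoseconds for a clock running at `hz` Hz.
 * Hypothetical helper for illustration only. The intermediate product
 * fits in 64 bits for tick counts below roughly 1.8e10.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t hz)
{
    return ticks * 1000000000ULL / hz;
}
```

Under those assumptions, ticks_to_ns(0x000305fb, 1000000000) gives 198139 ns,
i.e. about 198 us, which is consistent with the ~200 us mentioned above.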

> :|  * fn                  -15    0.370  xnpod_schedule+0xe (xnintr_irq_handler+0x5f)
> :|  * fn                  -14    0.694  __switch_to+0xe (xnpod_schedule+0x557)
> :|  * fn                  -13    0.826  __ipipe_restore_pipeline_head+0x8 (xnpod_wait_thread_period+0x1a1)
> :     fn                  -13    0.222  __ipipe_syscall_root+0x9 (system_call+0x20)
> :     fn                  -12    0.235  __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x55)
> :     fn                  -12    0.223  hisyscall_event+0xe (__ipipe_dispatch_event+0x5e)
> :     fn                  -12    0.188  __rt_task_wait_period+0xd (hisyscall_event+0x220)
> :     fn                  -12    0.192  rt_task_wait_period+0x8 (__rt_task_wait_period+0x39)
> :     fn                  -12    0.250  xnpod_wait_thread_period+0xe (rt_task_wait_period+0x32)
> :|  * fn                  -11    0.270  xnpod_suspend_thread+0xb (xnpod_wait_thread_period+0x6b)
> :|  * fn                  -11    0.342  xnpod_schedule+0xe (xnpod_suspend_thread+0xeb)
> :|  * fn                  -11    0.584  __switch_to+0xe (xnpod_schedule+0x557)
> :|    fn                  -10    0.264  __ipipe_walk_pipeline+0xe (__ipipe_handle_irq+0x178)
> :|    fn                  -10    0.302  __ipipe_unstall_iret_root+0x8 (restore_raw+0x0)
> :     fn                  -10    0.191  __ipipe_stall_root+0x8 (default_idle+0x33)
> :    *fn                   -9+   6.641  __ipipe_unstall_root+0x8 (default_idle+0x3f)
> :|    fn                   -3    0.530  do_nmi+0xd (nmi_stack_correct+0x1d)
> :|    fn                   -2+   1.315  dummy_nmi_callback+0x8 (do_nmi+0x39)
> :|    fn                   -1    0.384  notifier_call_chain+0xb (do_nmi+0x7b)
> :|    fn                    0    0.612  rthal_nmi_watchdog_tick+0xe (do_nmi+0x99)
> :|    fn                    0    0.373  rthal_latency_above_max+0x8 (rthal_nmi_watchdog_tick+0x21)
> <|    freeze 0x00000064     0    1.060  rthal_latency_above_max+0x13 (rthal_nmi_watchdog_tick+0x21)

[And this happens less than 15 us after the arming.]

>  |    fn                    1    0.628  __ipipe_handle_irq+0xe (common_interrupt+0x18)
>  |    fn                    1    0.199  __ipipe_ack_common_irq+0xa (__ipipe_handle_irq+0xeb)
>  |    fn                    1    0.242  ipipe_test_and_stall_pipeline_from+0x8 (__ipipe_ack_common_irq+0x17)

Is there such a thing as a spurious NMI? The kernel does not report any
other NMI-related problem.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] nervous nmi-watchdog
  2006-05-23 20:21 [Xenomai-core] nervous nmi-watchdog Jan Kiszka
@ 2006-06-05 16:35 ` Philippe Gerum
  2006-06-05 19:06   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-05 16:35 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Hi,
> 
> I'm getting nmi alarms about latency being > 100 us on a dual P-III 1GHz
> (with and without CONFIG_SMP) once I start the latency test tool. But
> they must be false positives. Can someone comment on this trace:
> 

Issue confirmed here, on a dual 750 MHz PIII Celeron. Starting the
latency test, the NMI watchdog pulls the brake on cpu #0 after a few us,
although rthal_nmi_arm() had been told to trigger the NMI more than a
millisecond in the future (likely the HTICK emulation + 100 us NMI 
threshold).
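
For reference, the arming code visible in the patch later in this thread
writes "0 - delay" to the performance counter MSR, so the counter counts up
and the NMI is raised when it overflows. The sketch below models that
arithmetic in user space. It is a simplification: the counter is treated as a
full 64-bit value, whereas the real hardware counter is narrower, and both
function names are hypothetical.

```c
#include <stdint.h>

/*
 * Value to load into an up-counting counter so that it wraps to zero
 * after `delay` increments. Mirrors the "0 - delay" expression used
 * when arming; modelled here on a 64-bit counter (assumption).
 */
static uint64_t wd_preload(uint64_t delay)
{
    return (uint64_t)0 - delay;
}

/*
 * Number of increments left before a counter currently holding the
 * non-zero `value` wraps to zero.
 */
static uint64_t ticks_until_wrap(uint64_t value)
{
    return (uint64_t)0 - value;
}
```

With this model, ticks_until_wrap(wd_preload(d)) is d for any non-zero d, so
a watchdog armed for more than a millisecond should not fire after a few
microseconds unless the counter is reprogrammed or the NMI comes from
somewhere else.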

Running a UP + LAPIC enabled kernel on the same hw did not trigger the 
spurious NMI, so I'd bet that the issue is SMP related, and not 
hw/perfctr related.

-- 

Philippe.



* Re: [Xenomai-core] nervous nmi-watchdog
  2006-06-05 16:35 ` Philippe Gerum
@ 2006-06-05 19:06   ` Gilles Chanteperdrix
  2006-06-05 19:15     ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-06-05 19:06 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
 > Jan Kiszka wrote:
 > > Hi,
 > > 
 > > I'm getting nmi alarms about latency being > 100 us on a dual P-III 1GHz
 > > (with and without CONFIG_SMP) once I start the latency test tool. But
 > > they must be false positives. Can someone comment on this trace:
 > > 
 > 
 > Issue confirmed here, on a dual 750 MHz PIII Celeron. Starting the
 > latency test, the NMI watchdog pulls the brake on cpu #0 after a few us,
 > although rthal_nmi_arm() had been told to trigger the NMI more than a
 > millisecond in the future (likely the HTICK emulation + 100 us NMI 
 > threshold).
 > 
 > Running a UP + LAPIC enabled kernel on the same hw did not trigger the 
 > spurious NMI, so I'd bet that the issue is SMP related, and not 
 > hw/perfctr related.

I have never observed this on my dual PIII, although I always have the NMI
watchdog option enabled. Are you running with or without the tracer?

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] nervous nmi-watchdog
  2006-06-05 19:06   ` Gilles Chanteperdrix
@ 2006-06-05 19:15     ` Philippe Gerum
  2006-07-09 12:50       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-05 19:15 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > Jan Kiszka wrote:
>  > > Hi,
>  > > 
>  > > I'm getting nmi alarms about latency being > 100 us on a dual P-III 1GHz
>  > > (with and without CONFIG_SMP) once I start the latency test tool. But
>  > > they must be false positives. Can someone comment on this trace:
>  > > 
>  > 
>  > Issue confirmed here, on a dual 750 MHz PIII Celeron. Starting the
>  > latency test, the NMI watchdog pulls the brake on cpu #0 after a few us,
>  > although rthal_nmi_arm() had been told to trigger the NMI more than a
>  > millisecond in the future (likely the HTICK emulation + 100 us NMI 
>  > threshold).
>  > 
>  > Running a UP + LAPIC enabled kernel on the same hw did not trigger the 
>  > spurious NMI, so I'd bet that the issue is SMP related, and not 
>  > hw/perfctr related.
> 
> I have never observed this on my dual PIII, although I always have the
> NMI watchdog option enabled. Are you running with or without the tracer?
> 

w/o.

-- 

Philippe.



* Re: [Xenomai-core] nervous nmi-watchdog
  2006-06-05 19:15     ` Philippe Gerum
@ 2006-07-09 12:50       ` Gilles Chanteperdrix
  2006-07-09 16:41         ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-07-09 12:50 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
 > > I have never observed this on my dual PIII, although I always have the
 > > NMI watchdog option enabled. Are you running with or without the tracer?
 > > 
 > 
 > w/o.

Enabling the Xenomai NMI watchdog while the nucleus is built into the
kernel breaks the Linux NMI watchdog test. Maybe some setup steps are
skipped at the Linux level when this test fails, which could explain
some weird behaviour afterwards.

Do you see this message in the Linux boot messages?
Testing NMI watchdog... CPU#0: NMI appears to be stuck (x->y)!
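
A check for that message can be sketched as a plain substring search over
the captured boot log. nmi_test_failed() is a hypothetical helper written for
this illustration; the search string is the fixed part of the message quoted
above, without the CPU number and the (x->y) counters, which vary.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when the boot log text contains the "stuck" verdict of
 * the Linux NMI watchdog self-test. Hypothetical helper; it searches
 * for the fixed part of the message only, so other kernel output
 * interleaved on the "Testing NMI watchdog" line does not matter.
 */
static bool nmi_test_failed(const char *boot_log)
{
    return strstr(boot_log, "NMI appears to be stuck") != NULL;
}
```

Searching for the verdict rather than for the whole line keeps the check
usable when another printk lands in the middle of the test output.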

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] nervous nmi-watchdog
  2006-07-09 12:50       ` Gilles Chanteperdrix
@ 2006-07-09 16:41         ` Philippe Gerum
  2006-07-09 16:56           ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-07-09 16:41 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

On Sun, 2006-07-09 at 14:50 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > > I have never observed this on my dual PIII, although I always have the
>  > > NMI watchdog option enabled. Are you running with or without the tracer?
>  > > 
>  > 
>  > w/o.
> 
> Enabling the Xenomai NMI watchdog while the nucleus is built into the
> kernel breaks the Linux NMI watchdog test. Maybe some setup steps are
> skipped at the Linux level when this test fails, which could explain
> some weird behaviour afterwards.
> 
> Do you see this message in the Linux boot messages?
> Testing NMI watchdog... CPU#0: NMI appears to be stuck (x->y)!

Nope. The NMI test looks ok.

Linux version 2.6.17-ipipe (rpm@xenomai.org) (gcc version 3.3.3 (Debian 20040321)) #1 SMP Tue Jul 11 03:17:18 CEST 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fffd000 (usable)
 BIOS-e820: 000000003fffd000 - 000000003ffff000 (ACPI data)
 BIOS-e820: 000000003ffff000 - 0000000040000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f6e80
On node 0 totalpages: 262141
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 32765 pages, LIFO batch:7
DMI 2.0 present.
Intel MultiProcessor Specification v1.1
    Virtual Wire compatibility mode.
OEM ID: OEM00000 Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #1 6:8 APIC version 17
Processor #0 6:8 APIC version 17
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 2
Allocating PCI resources starting at 50000000 (gap: 40000000:bec00000)
Built 1 zonelists
Kernel command line: root=/dev/hdc1 ro nmi_watchdog=1 vga=1 console=tty0 console=ttyS0,115200
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 751.749 MHz processor.
Using tsc for high-res timesource
I-pipe 1.3-07: pipeline enabled.
Console: colour VGA+ 80x50
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1032916k/1048564k available (2491k kernel code, 15100k reserved, 743k data, 272k init, 131060k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1505.65 BogoMIPS (lpj=7528285)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 28k freed
CPU0: Intel Pentium III (Coppermine) stepping 03
Booting processor 1/0 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 1503.45 BogoMIPS (lpj=7517293)
CPU: After generic identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0383fbff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040 00000000 00000000 00000000
CPU1: Intel Pentium III (Coppermine) stepping 03
Total of 2 processors activated (3009.11 BogoMIPS).
ExtINT not setup in hardware but reported by MP table
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
migration_cost=1897
NET: Registered protocol family 16
EISA bus registered
PCI: PCI BIOS revision 2.10 entry at 0xf0730, last bus=1
Setting up standard PCI resources
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI quirk: region e400-e43f claimed by PIIX4 ACPI
PCI quirk: region e800-e80f claimed by PIIX4 SMB
PIIX4 devres B PIO at 0290-0297
Boot video device is 0000:01:00.0
PCI: Using IRQ router PIIX/ICH [8086/7110] at 0000:00:04.0
PCI: Bridge: 0000:00:01.0
  IO window: disabled.
  MEM window: e1800000-e2dfffff
  PREFETCH window: e2f00000-e3ffffff
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 9, 2621440 bytes)
TCP bind hash table entries: 65536 (order: 8, 1310720 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
IA-32 Microcode Update Driver: v1.14 <tigran@domain.hid>
I-pipe: Domain Xenomai registered.
Xenomai: hal/x86 started.
Xenomai: real-time nucleus v2.2-rc3 (Engines Of Creation) loaded.
Xenomai: NMI watchdog started (threshold=100 us).
Xenomai: starting native API services.
Xenomai: starting POSIX services.
Xenomai: starting RTDM services.
highmem bounce pool size: 64 pages
Installing knfsd (copyright (C) 1996 okir@domain.hid).
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Limiting direct PCI/PCI transfers.
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
loop: loaded (max 8 devices)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 0000:00:04.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
Probing IDE interface ide1...
hdc: SAMSUNG SV2044D, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hdc: max request size: 128KiB
hdc: 39862368 sectors (20409 MB) w/472KiB Cache, CHS=39546/16/63, UDMA(33)
hdc: cache flushes not supported
 hdc: hdc1 hdc2
usbmon: debugfs is not available
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
EISA: Probing bus 0 at eisa.0
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Testing NMI watchdog ... <6>input: AT Raw Set 2 keyboard as /class/input/input0
OK.
Starting balanced_irq
Using IPI Shortcut mode
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 272k freed
EXT3 FS on hdc1, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hdc2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:00:09.0: 3Com PCI 3c980C Python-T at f880e000.
0000:00:0a.0: 3Com PCI 3c980C Python-T at f8810000.
eth0:  setting half-duplex.
netconsole: local port 6665
netconsole: interface eth0
netconsole: remote port 6666
netconsole: remote IP 192.168.0.8
netconsole: remote ethernet address 00:11:2f:0c:f1:ca
netconsole: local IP 192.168.0.7
netconsole: network logging started


-- 
Philippe.





* Re: [Xenomai-core] nervous nmi-watchdog
  2006-07-09 16:41         ` Philippe Gerum
@ 2006-07-09 16:56           ` Jan Kiszka
  2006-07-09 17:07             ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-07-09 16:56 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 8679 bytes --]

Philippe Gerum wrote:
> On Sun, 2006-07-09 at 14:50 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>  > > I have never observed this on my dual PIII, although I always have the
>>  > > NMI watchdog option enabled. Are you running with or without the tracer?
>>  > > 
>>  > 
>>  > w/o.
>>
>> Enabling the Xenomai NMI watchdog while the nucleus is built into the
>> kernel breaks the Linux NMI watchdog test. Maybe some setup steps are
>> skipped at the Linux level when this test fails, which could explain
>> some weird behaviour afterwards.
>>
>> Do you see this message in the Linux boot messages?
>> Testing NMI watchdog... CPU#0: NMI appears to be stuck (x->y)!
> 
> Nope. The NMI test looks ok.
> 
> 

I can confirm the failing NMI test here on my notebook with both nucleus
and native skin built into the kernel. I haven't seen false positive
NMIs yet, but the tracer is still on. Will switch off and re-check.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]


* Re: [Xenomai-core] nervous nmi-watchdog
  2006-07-09 16:56           ` Jan Kiszka
@ 2006-07-09 17:07             ` Philippe Gerum
  2006-07-09 18:13               ` Jan Kiszka
  2006-07-09 19:36               ` Gilles Chanteperdrix
  0 siblings, 2 replies; 11+ messages in thread
From: Philippe Gerum @ 2006-07-09 17:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Sun, 2006-07-09 at 18:56 +0200, Jan Kiszka wrote:
> I can confirm the failing NMI test here on my notebook with both nucleus
> and native skin built into the kernel. I haven't seen false positive
> NMIs yet, but the tracer is still on. Will switch off and re-check.

FWIW, here, the false positive is raised immediately when starting the
latency test.

-- 
Philippe.





* Re: [Xenomai-core] nervous nmi-watchdog
  2006-07-09 17:07             ` Philippe Gerum
@ 2006-07-09 18:13               ` Jan Kiszka
  2006-07-09 19:36               ` Gilles Chanteperdrix
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-07-09 18:13 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

Philippe Gerum wrote:
> On Sun, 2006-07-09 at 18:56 +0200, Jan Kiszka wrote:
>> I can confirm the failing NMI test here on my notebook with both nucleus
>> and native skin built into the kernel. I haven't seen false positive
>> NMIs yet, but the tracer is still on. Will switch off and re-check.
> 
> FWIW, here, the false positive is raised immediately when starting the
> latency test.
> 

Still no "luck" here, i.e. the watchdog only triggers when I reduce the
threshold. But I'm not on the box where I originally observed this
effect, and that one is out of reach for me now. Remains mysterious,
maybe chipset-dependent. Fortunately, it's only a debug feature.

Jan


PS: Hey, looks good for France. I do have to switch the "program" now. :)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]


* Re: [Xenomai-core] nervous nmi-watchdog
  2006-07-09 17:07             ` Philippe Gerum
  2006-07-09 18:13               ` Jan Kiszka
@ 2006-07-09 19:36               ` Gilles Chanteperdrix
  2006-07-10 17:28                 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-07-09 19:36 UTC (permalink / raw)
  To: rpm; +Cc: Jan Kiszka, xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 626 bytes --]

Philippe Gerum wrote:
 > On Sun, 2006-07-09 at 18:56 +0200, Jan Kiszka wrote:
 > > I can confirm the failing NMI test here on my notebook with both nucleus
 > > and native skin built into the kernel. I haven't seen false positive
 > > NMIs yet, but the tracer is still on. Will switch off and re-check.
 > 
 > FWIW, here, the false positive is raised immediately when starting the
 > latency test.

The attached patch attempts to work around these early shots, counts them,
and displays the counts in /proc/xenomai/nmi_early_shots. Could you try it
on your box and see whether the counters in /proc move?
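
The early-shot check that the patch adds to the NMI handler can be modelled
in user space roughly as below. This is a simplified sketch, not the kernel
code: struct wd_model and wd_on_nmi() are hypothetical stand-ins, the
timestamps are passed in as plain integers instead of being read with
rthal_rdtsc(), and the return value replaces the call to
rthal_nmi_emergency().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the per-CPU watchdog state (field names follow the patch). */
struct wd_model {
    bool armed;
    uint64_t arm_date;          /* timestamp taken when the watchdog was armed */
    uint64_t next_linux_check;  /* next deadline, in the same time unit */
    unsigned early_shots;       /* NMIs seen before the threshold elapsed */
};

/*
 * Model of the check done on each NMI. Returns true when the NMI should
 * be treated as a real latency overrun, false when it is ignored.
 * An NMI that arrives less than `maxlat` ticks after arming is counted
 * as an early shot and the deadline is re-set to arm_date + maxlat.
 */
static bool wd_on_nmi(struct wd_model *wd, uint64_t now, uint64_t maxlat)
{
    if (!wd->armed)
        return false;

    if (now - wd->arm_date < maxlat) {
        wd->early_shots++;
        wd->next_linux_check = wd->arm_date + maxlat;
        return false;
    }

    return true;
}
```

If the counter moves while the latency test runs, the NMIs really are
arriving before the threshold has elapsed, which would point away from a
genuine latency problem.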

-- 


					    Gilles Chanteperdrix.

[-- Attachment #2: xeno-nmi.diff --]
[-- Type: text/plain, Size: 2087 bytes --]

Index: ksrc/arch/i386/nmi.c
===================================================================
--- ksrc/arch/i386/nmi.c	(revision 1316)
+++ ksrc/arch/i386/nmi.c	(working copy)
@@ -61,6 +61,9 @@
         unsigned long perfctr_msr;
         unsigned long long next_linux_check;
         unsigned int p4_cccr_val;
+
+		unsigned early_shots;
+		unsigned long long arm_date;
     };
     char __pad[SMP_CACHE_BYTES];
 } rthal_nmi_wd_t ____cacheline_aligned;
@@ -108,8 +111,13 @@
     rthal_nmi_wd_t *wd = &rthal_nmi_wds[cpu];
     unsigned long long now;
 
-    if (wd->armed)
+	if (wd->armed) {
+		if (rthal_rdtsc() - wd->arm_date < rthal_maxlat_tsc) {
+			++wd->early_shots;
+			wd->next_linux_check = wd->arm_date + rthal_maxlat_tsc;
+		} else
         rthal_nmi_emergency(regs);
+	}
 
     now = rthal_rdtsc();
 
@@ -142,6 +150,27 @@
     wrmsrl(wd->perfctr_msr, now - wd->next_linux_check);
 }
 
+static int earlyshots_read_proc(char *page,
+				char **start,
+				off_t off, int count, int *eof, void *data)
+{
+	int i, len = 0;
+
+	for_each_online_cpu(i)
+		len += sprintf(page + len, "CPU#%d: %u\n",
+			       i, rthal_nmi_wds[i].early_shots);
+	len -= off;
+	if (len <= off + count)
+		*eof = 1;
+	*start = page + off;
+	if (len > count)
+		len = count;
+	if (len < 0)
+		len = 0;
+
+	return len;
+}
+
 int rthal_nmi_request(void (*emergency) (struct pt_regs *))
 {
     if (!nmi_active || !nmi_watchdog_tick)
@@ -180,6 +209,11 @@
     rthal_linux_nmi_tick = nmi_watchdog_tick;
     wmb();
     nmi_watchdog_tick = &rthal_nmi_watchdog_tick;
+
+	__rthal_add_proc_leaf("nmi_early_shots",
+			      &earlyshots_read_proc,
+			      NULL, NULL, rthal_proc_root);
+
     return 0;
 }
 
@@ -188,6 +222,8 @@
     if (!rthal_linux_nmi_tick)
         return;
 
+	remove_proc_entry("nmi_early_shots", rthal_proc_root);
+
     wrmsrl(rthal_nmi_perfctr_msr, 0 - RTHAL_CPU_FREQ);
     touch_nmi_watchdog();
     wmb();
@@ -215,6 +251,7 @@
         rthal_local_irq_restore(flags);
     }
 
+	wd->arm_date = rthal_rdtsc();
     wrmsrl(wd->perfctr_msr, 0 - delay);
     wmb();
     wd->armed = 1;


* Re: [Xenomai-core] nervous nmi-watchdog
  2006-07-09 19:36               ` Gilles Chanteperdrix
@ 2006-07-10 17:28                 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-07-10 17:28 UTC (permalink / raw)
  To: rpm, Jan Kiszka, xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 722 bytes --]

Gilles Chanteperdrix wrote:
 > Philippe Gerum wrote:
 >  > On Sun, 2006-07-09 at 18:56 +0200, Jan Kiszka wrote:
 >  > > I can confirm the failing NMI test here on my notebook with both nucleus
 >  > > and native skin built into the kernel. I haven't seen false positive
 >  > > NMIs yet, but the tracer is still on. Will switch off and re-check.
 >  > 
 >  > FWIW, here, the false positive is raised immediately when starting the
 >  > latency test.
 > 
 > The attached patch attempts to work around these early shots, count them,
 > and display them in /proc/xenomai/nmi_early_shots. Could you try it on
 > your box and see if the /proc display moves?

This patch should work better.

-- 


					    Gilles Chanteperdrix.

[-- Attachment #2: xeno-nmi.2.diff --]
[-- Type: text/plain, Size: 2129 bytes --]

Index: ksrc/arch/i386/nmi.c
===================================================================
--- ksrc/arch/i386/nmi.c	(revision 1322)
+++ ksrc/arch/i386/nmi.c	(working copy)
@@ -61,6 +61,9 @@
         unsigned long perfctr_msr;
         unsigned long long next_linux_check;
         unsigned int p4_cccr_val;
+
+		unsigned early_shots;
+		unsigned long long tick_date;
     };
     char __pad[SMP_CACHE_BYTES];
 } rthal_nmi_wd_t ____cacheline_aligned;
@@ -108,8 +111,13 @@
     rthal_nmi_wd_t *wd = &rthal_nmi_wds[cpu];
     unsigned long long now;
 
-    if (wd->armed)
+	if (wd->armed) {
+		if (rthal_rdtsc() - wd->tick_date < rthal_maxlat_tsc) {
+			++wd->early_shots;
+			wd->next_linux_check = wd->tick_date + rthal_maxlat_tsc;
+		} else
         rthal_nmi_emergency(regs);
+	}
 
     now = rthal_rdtsc();
 
@@ -142,6 +150,27 @@
     wrmsrl(wd->perfctr_msr, now - wd->next_linux_check);
 }
 
+static int earlyshots_read_proc(char *page,
+				char **start,
+				off_t off, int count, int *eof, void *data)
+{
+	int i, len = 0;
+
+	for_each_online_cpu(i)
+		len += sprintf(page + len, "CPU#%d: %u\n",
+			       i, rthal_nmi_wds[i].early_shots);
+	len -= off;
+	if (len <= off + count)
+		*eof = 1;
+	*start = page + off;
+	if (len > count)
+		len = count;
+	if (len < 0)
+		len = 0;
+
+	return len;
+}
+
 int rthal_nmi_request(void (*emergency) (struct pt_regs *))
 {
     if (!nmi_active || !nmi_watchdog_tick)
@@ -180,6 +209,11 @@
     rthal_linux_nmi_tick = nmi_watchdog_tick;
     wmb();
     nmi_watchdog_tick = &rthal_nmi_watchdog_tick;
+
+	__rthal_add_proc_leaf("nmi_early_shots",
+			      &earlyshots_read_proc,
+			      NULL, NULL, rthal_proc_root);
+
     return 0;
 }
 
@@ -188,6 +222,8 @@
     if (!rthal_linux_nmi_tick)
         return;
 
+	remove_proc_entry("nmi_early_shots", rthal_proc_root);
+
     wrmsrl(rthal_nmi_perfctr_msr, 0 - RTHAL_CPU_FREQ);
     touch_nmi_watchdog();
     wmb();
@@ -215,6 +251,8 @@
         rthal_local_irq_restore(flags);
     }
 
+	wd->tick_date = rthal_rdtsc() + (delay - rthal_maxlat_tsc);
+	wmb();
     wrmsrl(wd->perfctr_msr, 0 - delay);
     wmb();
     wd->armed = 1;
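
The early-shot test introduced by this second patch can be sketched as two small pure functions. This is an illustration only: the helper names are hypothetical, all times are plain TSC tick counts, and it assumes that delay already includes the rthal_maxlat_tsc margin, which the last hunk suggests but does not show.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of Xenomai): the date recorded when
 * arming, as in the last hunk. arm_time stands in for rthal_rdtsc().
 */
static uint64_t nmi_tick_date(uint64_t arm_time, uint64_t delay,
			      uint64_t maxlat)
{
	return arm_time + (delay - maxlat);
}

/*
 * Hypothetical helper: the test done in the watchdog tick handler.
 * The subtraction is unsigned, so if now precedes tick_date the
 * difference wraps to a very large value and the shot is NOT
 * classified as early.
 */
static int nmi_shot_is_early(uint64_t now, uint64_t tick_date,
			     uint64_t maxlat)
{
	return now - tick_date < maxlat;
}
```

For instance, arming at tick 1000 with a delay of 500 and a margin of 100 records a tick date of 1400. A shot at 1450 then counts as early, while shots at 1500 or at 1300 do not.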


Thread overview: 11+ messages
2006-05-23 20:21 [Xenomai-core] nervous nmi-watchdog Jan Kiszka
2006-06-05 16:35 ` Philippe Gerum
2006-06-05 19:06   ` Gilles Chanteperdrix
2006-06-05 19:15     ` Philippe Gerum
2006-07-09 12:50       ` Gilles Chanteperdrix
2006-07-09 16:41         ` Philippe Gerum
2006-07-09 16:56           ` Jan Kiszka
2006-07-09 17:07             ` Philippe Gerum
2006-07-09 18:13               ` Jan Kiszka
2006-07-09 19:36               ` Gilles Chanteperdrix
2006-07-10 17:28                 ` Gilles Chanteperdrix
