hrtimer problem on AT91RM9200

linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* hrtimer problem on AT91RM9200
@ 2009-08-19 12:24 Bo Hansen
  2009-08-19 15:33 ` Remy Bohmer
  2009-08-20 19:52 ` Uwe Kleine-König
  0 siblings, 2 replies; 19+ messages in thread
From: Bo Hansen @ 2009-08-19 12:24 UTC (permalink / raw)
  To: rt-users

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

Hi,

I have tested 2.6.29.6 with/without rt23 and 2.6.31-rc6 with/without rt2,
but I keep getting kernel panics sometimes soon after I start cyclictest 
(v0.5),
but sometimes it runs for hours without problems.

I have tested with different options and the only one making a
difference is disabling high resolution timers. Whenever this is disabled
(RT patch or not) the system does not panic. This of course does not
make the latency as good as I would like it to be.

A kernel panic from both an RT and non-RT kernel with high resolution
timer has been attached. The source seems to vary from time to time.

I noticed than when I stress the system I sometimes get the
following message:
hrtimer: interrupt too slow, forcing clock min delta to 52482 ns

And often (but no always) the system panics a little while later. I guess
it tells the system is too busy? but should it really end in a kernel
panic? Using the RT patch I understand as we can set the priority
as we like and stall the system completely, but on a non-RT kernel this
should not be possible?

Any thoughts on how to proceed?


Thanks in advance,
Bo

[-- Attachment #2: 190809_kernel_panic_2.6.29.6-highres_2 --]
[-- Type: text/plain, Size: 4836 bytes --]

hrtimer: interrupt too slow, forcing clock min delta to 52482 ns
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c3a8c000
[00000000] *pgd=23a27031, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1]
Modules linked in:
CPU: 0    Not tainted  (2.6.29.6 #1)
PC is at clkevt32k_next_event+0x70/0xac
LR is at clockevents_program_event+0xfc/0x168
pc : [<c002fb90>]    lr : [<c005687c>]    psr: 60000093
sp : c3a35ca0  ip : c3a35cc0  fp : c3a35cbc
r10: 00000000  r9 : 00000000  r8 : c02da8a8
r7 : 00000041  r6 : 00000000  r5 : 00000001  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : c02da8a8  r0 : 00000001
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: c000717f  Table: 23a8c000  DAC: 00000015
Process cyclictest (pid: 498, stack limit = 0xc3a34268)
Stack: (0xc3a35ca0 to 0xc3a36000)
5ca0: 00000041 00000000 000225c1 00000000 c3a35ce4 c3a35cc0 c005687c c002fb30 
5cc0: 00000041 23cc4bf5 00000041 23cbf12f 00000041 c02da8a8 c3a35d34 c3a35ce8 
5ce0: c00572b8 c0056790 23cbf12f 00000041 c02f8620 c3a34000 c3a35d2c c3a35d08 
5d00: 23cbf12f 00000041 c004f9e0 23cc4bf5 00000041 00000041 00000000 00000000 
5d20: c02dd1f8 c02dd1f8 c3a35d54 c3a35d38 c005735c c00571fc 00000001 c02da8a8 
5d40: c02dd1c8 23cbde68 c3a35db4 c3a35d58 c004fc04 c005733c c00572b8 c02da8a8 
5d60: 23cbde68 00000041 23cc4bf5 00000041 00000000 00000001 23cbde68 00000041 
5d80: c3a35d9c 23cc4bf5 00000041 c02da888 00000000 00000001 00000000 c02f10f8 
5da0: c02f8620 c3a34000 c3a35dd4 c3a35db8 c002fc80 c004fa34 c02dd1c8 c02da888 
5dc0: 00000000 00000001 c3a35df4 c3a35dd8 c005f604 c002fbdc c02dd754 00000001 
5de0: 00000000 c38165e0 c3a35e0c c3a35df8 c0061130 c005f5d0 00000001 c02faab8 
5e00: c3a35e2c c3a35e10 c002606c c00610c0 c3a35e34 ffffffff fefff000 00000001 
5e20: c3a35ea4 c3a35e30 c00269a8 c0026010 c02f0b64 00000002 c3946000 4a96e930 
5e40: c394601c c3816e20 00000015 c38165e0 c02f10f8 c02f8620 c3a34000 c3a35ea4 
5e60: c3a34044 c3a35e78 c0026d00 c0050a80 a0000013 ffffffff c3a35e9c 7fffffff 
5e80: 7fffffff c3a34000 00000000 c3a35f78 c3a34000 c3a35ef0 c3a35edc c3a35ea8 
5ea0: c023cd54 c023c868 00000000 60000093 60000093 60000013 0000000e 7fffffff 
5ec0: 7fffffff 00000000 c3a34000 00000000 c3a35eec c3a35ee0 c023cde0 c023ccd4 
5ee0: c3a35fa4 c3a35ef0 c004437c c023cdcc 0000000e 00000000 0001fffe 00000011 
5f00: c3a35f4c c3a35f10 c0053fd0 c01b4b58 00000000 00000000 c3a35f88 0000419e 
5f20: 00000000 0000419e c3a35f88 c02f7400 c02f73e0 c02f73e0 c3a34000 000172f4 
5f40: c3a35f74 c3a35f50 c004f9e0 c003aeb4 00000000 c3a35f88 4916ddf4 40029770 
5f60: 00000107 4916ddf4 40029770 00000107 c0026ec4 4916ddf4 ffffdfff ffffffff 
5f80: 4916ddfc 00000000 4916ddfc 000000b1 c0026ec4 40040000 00000000 c3a35fa8 
5fa0: c0026d40 c00441bc 4916ddfc 00000000 4916dcfc 00000000 00000000 00000008 
5fc0: 4916ddfc 00000000 4916ddfc 000000b1 00017978 00015c60 40040000 4916ddf4 
5fe0: 4916dcfc 4916dbc0 40033904 40033864 60000010 4916dcfc 00000000 00000000 
Backtrace: 
[<c002fb20>] (clkevt32k_next_event+0x0/0xac) from [<c005687c>] (clockevents_program_e)
 r6:00000000 r5:000225c1 r4:00000000
[<c0056780>] (clockevents_program_event+0x0/0x168) from [<c00572b8>] (tick_dev_progra)
 r8:c02da8a8 r7:00000041 r6:23cbf12f r5:00000041 r4:23cc4bf5
[<c00571ec>] (tick_dev_program_event+0x0/0xf8) from [<c005735c>] (tick_program_event+)
[<c005732c>] (tick_program_event+0x0/0x3c) from [<c004fc04>] (hrtimer_interrupt+0x1e0)
 r5:23cbde68 r4:c02dd1c8
[<c004fa24>] (hrtimer_interrupt+0x0/0x284) from [<c002fc80>] (at91rm9200_timer_interr)
[<c002fbcc>] (at91rm9200_timer_interrupt+0x0/0xc8) from [<c005f604>] (handle_IRQ_even)
 r6:00000001 r5:00000000 r4:c02da888
[<c005f5c0>] (handle_IRQ_event+0x0/0x80) from [<c0061130>] (handle_level_irq+0x80/0x1)
 r7:c38165e0 r6:00000000 r5:00000001 r4:c02dd754
[<c00610b0>] (handle_level_irq+0x0/0x11c) from [<c002606c>] (__exception_text_start+0)
 r5:c02faab8 r4:00000001
[<c0026000>] (__exception_text_start+0x0/0x8c) from [<c00269a8>] (__irq_svc+0x28/0x60)
Exception stack(0xc3a35e30 to 0xc3a35e78)
5e20:                                     c02f0b64 00000002 c3946000 4a96e930 
5e40: c394601c c3816e20 00000015 c38165e0 c02f10f8 c02f8620 c3a34000 c3a35ea4 
5e60: c3a34044 c3a35e78 c0026d00 c0050a80 a0000013 ffffffff                   
 r6:00000001 r5:fefff000 r4:ffffffff
[<c023c858>] (schedule+0x0/0x2ac) from [<c023cd54>] (schedule_timeout+0x90/0xd0)
[<c023ccc4>] (schedule_timeout+0x0/0xd0) from [<c023cde0>] (schedule_timeout_interrup)
 r7:00000000 r6:c3a34000 r5:00000000 r4:7fffffff
[<c023cdbc>] (schedule_timeout_interruptible+0x0/0x28) from [<c004437c>] (sys_rt_sigt)
[<c00441ac>] (sys_rt_sigtimedwait+0x0/0x270) from [<c0026d40>] (ret_fast_syscall+0x0/)
Code: e121f004 e3a00000 e89da878 e3a03000 (e5833000) 
Kernel panic - not syncing: Fatal exception in interrupt


[-- Attachment #3: 180809_kernel_panic_2.6.29.6-rt23_5 --]
[-- Type: text/plain, Size: 4082 bytes --]

Unable to handle kernel NULL pointer dereference at virtual a0
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 817 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (2.6.29.6-rt23 #1)
PC is at clkevt32k_next_event+0x70/0xac
LR is at clockevents_program_event+0xfc/0x168
pc : [<c002ef88>]    lr : [<c005a1e0>]    psr: 60000093
sp : c02f3d98  ip : c02f3db8  fp : c02f3db4
r10: 00000000  r9 : 00000000  r8 : c02f6a30
r7 : 00000045  r6 : 00000000  r5 : 00000001  r4 : 00000000
r3 : 00000000  r2 : 00000000  r1 : c02f6a30  r0 : 00000001
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: c000717f  Table: 23970000  DAC: 00000017
Process swapper (pid: 0, stack limit = 0xc02f2270)
Stack: (0xc02f3d98 to 0xc02f4000)
3d80:                                                       2d2c08e3 00000000 
3da0: 000225c1 00000000 c02f3ddc c02f3db8 c005a1e0 c002ef28 00000045 2d484309 
3dc0: 00000045 2d483fe8 00000045 c02f6a30 c02f3e2c c02f3de0 c005aea8 c005a0f4 
3de0: 2d483fe8 00000045 68e2da6f 00000000 00000000 c02f3e70 2d483fe8 00000045 
3e00: 689e360d 2d484309 00000045 00000045 c02f9de0 2d484309 00000045 c02f2000 
3e20: c02f3e4c c02f3e30 c005af4c c005adec 00000001 00000000 c02f9de0 2d48106f 
3e40: c02f3eac c02f3e50 c005215c c005af2c 3fd06258 c02f6a30 2d48106f 00000045 
3e60: 00000001 c02f9de0 00000001 00000000 2d48106f 00000045 00000001 00000000 
3e80: 00000000 c02f6a08 00010002 c02f2000 00000001 00000000 00000000 00000001 
3ea0: c02f3ecc c02f3eb0 c002f078 c0051f28 c385f7a0 c02f6a08 00010002 c02f2000 
3ec0: c02f3efc c02f3ed0 c006513c c002efd4 c02f3ef4 c02fa4f8 c02f2000 c02f6a08 
3ee0: 00000001 00000001 c02f2000 2001fc9c c02f3f1c c02f3f00 c00678dc c00650f8 
3f00: 00000001 c0302d40 00000000 00000002 c02f3f3c c02f3f20 c002506c c00677e8 
3f20: c02f3f54 ffffffff fefff000 00000001 c02f3f94 c02f3f40 c00259dc c0025010 
3f40: 00000000 00000001 c02f2000 60000013 c0026eb0 c02f2000 c02f5eb0 c03104e0 
3f60: 2001fcd0 41129200 2001fc9c c02f3f94 c02f3f98 c02f3f88 c0026ef0 c0026efc 
3f80: 60000013 ffffffff c02f3fb4 c02f3f98 c0026db0 c0026ec0 c02f2000 c031048c 
3fa0: c00222c0 c02f5ce0 c02f3fcc c02f3fb8 c02540dc c0026d70 c031048c c03195fc 
3fc0: c02f3ff4 c02f3fd0 c0008b38 c0254068 c0008594 00000000 00000000 c0021ebc 
3fe0: c0007175 c0310518 00000000 c02f3ff8 20008034 c0008970 00000000 00000000 
Backtrace: 
[<c002ef18>] (clkevt32k_next_event+0x0/0xac) from [<c005a1e0>] (clockevents_pro)
 r6:00000000 r5:000225c1 r4:00000000
[<c005a0e4>] (clockevents_program_event+0x0/0x168) from [<c005aea8>] (tick_dev_)
 r8:c02f6a30 r7:00000045 r6:2d483fe8 r5:00000045 r4:2d484309
[<c005addc>] (tick_dev_program_event+0x0/0xf8) from [<c005af4c>] (tick_program_)
[<c005af1c>] (tick_program_event+0x0/0x3c) from [<c005215c>] (hrtimer_interrupt)
 r5:2d48106f r4:c02f9de0
[<c0051f18>] (hrtimer_interrupt+0x0/0x2e8) from [<c002f078>] (at91rm9200_timer_)
[<c002efc4>] (at91rm9200_timer_interrupt+0x0/0xc8) from [<c006513c>] (handle_IR)
 r6:c02f2000 r5:00010002 r4:c02f6a08
[<c00650e8>] (handle_IRQ_event+0x0/0xec) from [<c00678dc>] (handle_level_irq+0x)
[<c00677d8>] (handle_level_irq+0x0/0x174) from [<c002506c>] (_text+0x6c/0x8c)
 r7:00000002 r6:00000000 r5:c0302d40 r4:00000001
[<c0025000>] (_text+0x0/0x8c) from [<c00259dc>] (__irq_svc+0x3c/0x80)
Exception stack(0xc02f3f40 to 0xc02f3f88)
3f40: 00000000 00000001 c02f2000 60000013 c0026eb0 c02f2000 c02f5eb0 c03104e0 
3f60: 2001fcd0 41129200 2001fc9c c02f3f94 c02f3f98 c02f3f88 c0026ef0 c0026efc 
3f80: 60000013 ffffffff                                                       
 r6:00000001 r5:fefff000 r4:ffffffff
[<c0026eb0>] (default_idle+0x0/0x54) from [<c0026db0>] (cpu_idle+0x50/0xac)
[<c0026d60>] (cpu_idle+0x0/0xac) from [<c02540dc>] (rest_init+0x84/0x9c)
 r7:c02f5ce0 r6:c00222c0 r5:c031048c r4:c02f2000
[<c0254058>] (rest_init+0x0/0x9c) from [<c0008b38>] (start_kernel+0x1d8/0x2c8)
 r4:c03195fc
[<c0008960>] (start_kernel+0x0/0x2c8) from [<20008034>] (0x20008034)
 r5:c0310518 r4:c0007175
Code: e121f004 e3a00000 e89da878 e3a03000 (e5833000) 
Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-08-19 12:24 hrtimer problem on AT91RM9200 Bo Hansen
@ 2009-08-19 15:33 ` Remy Bohmer
  2009-08-20 19:52 ` Uwe Kleine-König
  1 sibling, 0 replies; 19+ messages in thread
From: Remy Bohmer @ 2009-08-19 15:33 UTC (permalink / raw)
  To: Bo Hansen; +Cc: rt-users

Hi Bo,

> I have tested with different options and the only one making a
> difference is disabling high resolution timers. Whenever this is disabled
> (RT patch or not) the system does not panic. This of course does not
> make the latency as good as I would like it to be.

Reading this I guess you want a timer accuracy better than about 1
millisecond on this processor.(at91rm9200)
Setting the CONFIG_HZ to 1000 and disabling HRT would give you about 1
millisecond accuracy.
But, timers in the order of 1 millisecond magnitude can easily have an
system (interrupt+softirqs) overhead up to 250us while using HRT on
this processor.
So, if you want sub-millisecond timers, then to me it seems that
things are somewhat out of balance here.

But, If you want an accurate period time for your realtime
application, you can also use the TC-library to provide you a periodic
interrupt on which you schedule your RT-task.
This will provide you much better latencies, and a much lower CPU-load.

> A kernel panic from both an RT and non-RT kernel with high resolution
> timer has been attached. The source seems to vary from time to time.
>
> I noticed than when I stress the system I sometimes get the
> following message:
> hrtimer: interrupt too slow, forcing clock min delta to 52482 ns

This is exactly what I would expect, except that these 52us is still
quite fast for this processor...

> And often (but no always) the system panics a little while later. I guess
> it tells the system is too busy? but should it really end in a kernel
> panic? Using the RT patch I understand as we can set the priority
> as we like and stall the system completely, but on a non-RT kernel this
> should not be possible?

But anyway, it should not panic...

Kind regards,

Remy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-08-19 12:24 hrtimer problem on AT91RM9200 Bo Hansen
  2009-08-19 15:33 ` Remy Bohmer
@ 2009-08-20 19:52 ` Uwe Kleine-König
  2009-08-21 12:57   ` Bo Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Uwe Kleine-König @ 2009-08-20 19:52 UTC (permalink / raw)
  To: Bo Hansen; +Cc: rt-users

Hello,

> PC is at clkevt32k_next_event+0x70/0xac
This function has a WARN_ON_ONCE(...).  Can you check if this triggers?

Can you provide me a kernel image with CONFIG_DEBUG_INFO=y and a
matching oops?

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-08-20 19:52 ` Uwe Kleine-König
@ 2009-08-21 12:57   ` Bo Hansen
       [not found]     ` <fae6a9700908232327u65fca65bt97bb8c43a91345cd@mail.gmail.com>
  2009-09-03 14:12     ` Uwe Kleine-König
  0 siblings, 2 replies; 19+ messages in thread
From: Bo Hansen @ 2009-08-21 12:57 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: rt-users

Hi Uwe,

I cannot find anything in the dmesg or serial output regarding the
WARN_ON_ONCE(...)
I guess the WARN_ON_ONCE(...) should show up in dmesg output?

What puzzles me is that I tried the CONFIG_DEBUG_BUGVERBOSE=y
also when I enabled CONFIG_DEBUG_INFO. When I do
this I don't get the kernel panic. Is this somehow related to the
WARN_ON_ONCE?

Attachments only sent to Uwe - anybody are welcome to get a copy.

Best regards,
Bo

Uwe Kleine-König wrote:
> Hello,
>
>   
>> PC is at clkevt32k_next_event+0x70/0xac
>>     
> This function has a WARN_ON_ONCE(...).  Can you check if this triggers?
>
> Can you provide me a kernel image with CONFIG_DEBUG_INFO=y and a
> matching oops?
>
> Best regards
> Uwe
>
>   
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
       [not found]     ` <fae6a9700908232327u65fca65bt97bb8c43a91345cd@mail.gmail.com>
@ 2009-08-24  6:30       ` Vivek Satpute
  0 siblings, 0 replies; 19+ messages in thread
From: Vivek Satpute @ 2009-08-24  6:30 UTC (permalink / raw)
  To: Bo Hansen; +Cc: Uwe Kleine-König, rt-users

Adding RT-mailing list..

2009/8/24 Vivek Satpute <vivek@linsyssoft.com>:
> Hi Bo,
>
> 2009/8/21 Bo Hansen <bh@newtec.dk>
>>
>> Hi Uwe,
>>
>> I cannot find anything in the dmesg or serial output regarding the
>> WARN_ON_ONCE(...)
>> I guess the WARN_ON_ONCE(...) should show up in dmesg output?
>
> I am not sure that it is directly shown up in dmesg output, but you can
> add printks to verify whether WARN_ON_ONCE() code path hits or not.
>>
>> What puzzles me is that I tried the CONFIG_DEBUG_BUGVERBOSE=y
>> also when I enabled CONFIG_DEBUG_INFO. When I do
>> this I don't get the kernel panic. Is this somehow related to the
>> WARN_ON_ONCE?
>
> I have encountered this scenario. If there is any race condition in code
> then it might get suppressed when debugging options are enabled and
> kernel does not panic.
>>
>> Attachments only sent to Uwe - anybody are welcome to get a copy.
>>
>> Best regards,
>> Bo
>
> Thanks,
> Vivek Satpute
>>
>>
>> Uwe Kleine-König wrote:
>>>
>>> Hello,
>>>
>>>
>>>>
>>>> PC is at clkevt32k_next_event+0x70/0xac
>>>>
>>>
>>> This function has a WARN_ON_ONCE(...).  Can you check if this triggers?
>>>
>>> Can you provide me a kernel image with CONFIG_DEBUG_INFO=y and a
>>> matching oops?
>>>
>>> Best regards
>>> Uwe
>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rt-users"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-08-21 12:57   ` Bo Hansen
       [not found]     ` <fae6a9700908232327u65fca65bt97bb8c43a91345cd@mail.gmail.com>
@ 2009-09-03 14:12     ` Uwe Kleine-König
  2009-09-10  6:44       ` Bo Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-03 14:12 UTC (permalink / raw)
  To: Bo Hansen; +Cc: rt-users

Hello,

On Fri, Aug 21, 2009 at 02:57:43PM +0200, Bo Hansen wrote:
> Hi Uwe,
>
> I cannot find anything in the dmesg or serial output regarding the
> WARN_ON_ONCE(...)
> I guess the WARN_ON_ONCE(...) should show up in dmesg output?
>
> What puzzles me is that I tried the CONFIG_DEBUG_BUGVERBOSE=y
> also when I enabled CONFIG_DEBUG_INFO. When I do
> this I don't get the kernel panic. Is this somehow related to the
> WARN_ON_ONCE?
>
> Attachments only sent to Uwe - anybody are welcome to get a copy.
Hhhhmm, clkevt32k_next_event is called with delta=1 which hits
BUG_ON(delta < 2).

I calculated the clockeventdevice parameters by hand, and the numbers
look good:

	clkevt.shift = 32
	clkevt.mult = div_sc(AT91_SLOW_CLOCK, NSEC_PER_SEC, clkevt.shift)
		= (32768 << 32) / 1000000000
		= 140737
	clkevt.min_delta_ns = clockevent_delta2ns(2, &clkevt) + 1
		= (2 << clkevt.shift) / clkevt.mult + 1
		= (2 << 32) / 140737 + 1
		= 61036
	min_delta = (clkevt.min_delta_ns * mult) >> clkevt.shift
		= (61036 * 140737) >> 32
		= 2

Can you please apply the patch below, reproduce and provide the two
lines of output resulting from the two added printks?

Best regards
Uwe

diff --git a/arch/arm/mach-at91/at91rm9200_time.c b/arch/arm/mach-at91/at91rm9200_time.c
index 1ff1bda..83a7379 100644
--- a/arch/arm/mach-at91/at91rm9200_time.c
+++ b/arch/arm/mach-at91/at91rm9200_time.c
@@ -136,7 +136,13 @@ clkevt32k_next_event(unsigned long delta, struct clock_event_device *dev)
 	u32		alm;
 	int		status = 0;
 
-	BUG_ON(delta < 2);
+	if (unlikely(delta < 2)) {
+		pr_emerg("************** %s: shift=%d, mult=%lu, "
+				"min_delta_ns=%lu, dev=%pS\n", __func__,
+				dev->shift, dev->mult,
+				dev->min_delta_ns, dev);
+		BUG();
+	}
 
 	/* Use "raw" primitives so we behave correctly on RT kernels. */
 	raw_local_irq_save(flags);
@@ -205,6 +211,18 @@ void __init at91rm9200_timer_init(void)
 	clkevt.mult = div_sc(AT91_SLOW_CLOCK, NSEC_PER_SEC, clkevt.shift);
 	clkevt.max_delta_ns = clockevent_delta2ns(AT91_ST_ALMV, &clkevt);
 	clkevt.min_delta_ns = clockevent_delta2ns(2, &clkevt) + 1;
+
+	{
+		unsigned long long clc = (int64_t)clkevt.min_delta_ns *
+			clkevt.mult;
+
+		clc >>= clkevt.shift;
+
+		pr_info("************** %s: shift=%d, mult=%lu, "
+				"min_delta_ns=%lu, minclc=%llu\n", __func__,
+				clkevt.shift, clkevt.mult,
+				clkevt.min_delta_ns, clc);
+	}
 	clkevt.cpumask = cpumask_of(0);
 	clockevents_register_device(&clkevt);
 




-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-03 14:12     ` Uwe Kleine-König
@ 2009-09-10  6:44       ` Bo Hansen
  2009-09-10  9:51         ` Uwe Kleine-König
  0 siblings, 1 reply; 19+ messages in thread
From: Bo Hansen @ 2009-09-10  6:44 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: rt-users

[-- Attachment #1: Type: text/plain, Size: 2841 bytes --]

Hi Uwe,

Sorry for my late reply.

I only get the initialization line:
************** at91rm9200_timer_init: shift=32, mult=140737,
min_delta_ns=61036, minclc=2  

I have attached the kernel panic.

Best regards,
Bo


Uwe Kleine-König wrote:
> Hello,
>
> On Fri, Aug 21, 2009 at 02:57:43PM +0200, Bo Hansen wrote:
>   
>> Hi Uwe,
>>
>> I cannot find anything in the dmesg or serial output regarding the
>> WARN_ON_ONCE(...)
>> I guess the WARN_ON_ONCE(...) should show up in dmesg output?
>>
>> What puzzles me is that I tried the CONFIG_DEBUG_BUGVERBOSE=y
>> also when I enabled CONFIG_DEBUG_INFO. When I do
>> this I don't get the kernel panic. Is this somehow related to the
>> WARN_ON_ONCE?
>>
>> Attachments only sent to Uwe - anybody are welcome to get a copy.
>>     
> Hhhhmm, clkevt32k_next_event is called with delta=1 which hits
> BUG_ON(delta < 2).
>
> I calculated the clockeventdevice parameters by hand, and the numbers
> look good:
>
> 	clkevt.shift = 32
> 	clkevt.mult = div_sc(AT91_SLOW_CLOCK, NSEC_PER_SEC, clkevt.shift)
> 		= (32768 << 32) / 1000000000
> 		= 140737
> 	clkevt.min_delta_ns = clockevent_delta2ns(2, &clkevt) + 1
> 		= (2 << clkevt.shift) / clkevt.mult + 1
> 		= (2 << 32) / 140737 + 1
> 		= 61036
> 	min_delta = (clkevt.min_delta_ns * mult) >> clkevt.shift
> 		= (61036 * 140737) >> 32
> 		= 2
>
> Can you please apply the patch below, reproduce and provide the two
> lines of output resulting from the two added printks?
>
> Best regards
> Uwe
>
> diff --git a/arch/arm/mach-at91/at91rm9200_time.c b/arch/arm/mach-at91/at91rm9200_time.c
> index 1ff1bda..83a7379 100644
> --- a/arch/arm/mach-at91/at91rm9200_time.c
> +++ b/arch/arm/mach-at91/at91rm9200_time.c
> @@ -136,7 +136,13 @@ clkevt32k_next_event(unsigned long delta, struct clock_event_device *dev)
>  	u32		alm;
>  	int		status = 0;
>  
> -	BUG_ON(delta < 2);
> +	if (unlikely(delta < 2)) {
> +		pr_emerg("************** %s: shift=%d, mult=%lu, "
> +				"min_delta_ns=%lu, dev=%pS\n", __func__,
> +				dev->shift, dev->mult,
> +				dev->min_delta_ns, dev);
> +		BUG();
> +	}
>  
>  	/* Use "raw" primitives so we behave correctly on RT kernels. */
>  	raw_local_irq_save(flags);
> @@ -205,6 +211,18 @@ void __init at91rm9200_timer_init(void)
>  	clkevt.mult = div_sc(AT91_SLOW_CLOCK, NSEC_PER_SEC, clkevt.shift);
>  	clkevt.max_delta_ns = clockevent_delta2ns(AT91_ST_ALMV, &clkevt);
>  	clkevt.min_delta_ns = clockevent_delta2ns(2, &clkevt) + 1;
> +
> +	{
> +		unsigned long long clc = (int64_t)clkevt.min_delta_ns *
> +			clkevt.mult;
> +
> +		clc >>= clkevt.shift;
> +
> +		pr_info("************** %s: shift=%d, mult=%lu, "
> +				"min_delta_ns=%lu, minclc=%llu\n", __func__,
> +				clkevt.shift, clkevt.mult,
> +				clkevt.min_delta_ns, clc);
> +	}
>  	clkevt.cpumask = cpumask_of(0);
>  	clockevents_register_device(&clkevt);
>  
>
>
>
>
>   

[-- Attachment #2: 100909_kernel_panic_2.6.29.6-rt23-debug-uwe --]
[-- Type: text/plain, Size: 7414 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000000                                                                      
pgd = c3930000                                                                  
[00000000] *pgd=2386e031, *pte=00000000, *ppte=00000000                         
Internal error: Oops: 817 [#1] PREEMPT                                          
Modules linked in:                                                              
CPU: 0    Not tainted  (2.6.29.6-rt23 #1)                                       
PC is at clkevt32k_next_event+0x94/0xd8                                         
LR is at rt_mutex_unlock+0x10/0x14                                              
pc : [<c002efac>]    lr : [<c0256e40>]    psr: 00000093                         
sp : c3a6fc20  ip : c3a6fb30  fp : c3a6fc44                                     
r10: 00000001  r9 : 00000000  r8 : c02f6a30                                     
r7 : 00000032  r6 : 00000000  r5 : 00000001  r4 : 00000000                      
r3 : 00000000  r2 : 00010002  r1 : 60000093  r0 : 00000062                      
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user              
Control: c000717f  Table: 23930000  DAC: 00000015                               
Process cyclictest (pid: 579, stack limit = 0xc3a6e270)                         
Stack: (0xc3a6fc20 to 0xc3a70000)                                               
fc20: 0000dcaa c02f6a30 00013bc9 00000000 000225c1 00000000 c3a6fc6c c3a6fc48   
fc40: c005a20c c002ef28 00000000 0083ad27 00000032 0082d07d 00000032 c02f6a30   
fc60: c3a6fcbc c3a6fc70 c005aed4 c005a120 0082d07d 00000032 0083ad27 00000032   
fc80: 0082d07d 00000032 0082bdb6 00000032 3bccb551 00829826 00000032 00000032   
fca0: c02f9de0 00829826 00000032 c3a6e000 c3a6fcdc c3a6fcc0 c005af78 c005ae18   
fcc0: 00000001 00000000 c02f9de0 0082947f c3a6fd3c c3a6fce0 c0052188 c005af58   
fce0: 3fd06258 c02f6a30 0082947f 00000032 c3a6fd1c c02f9de0 00000001 00000000   
fd00: 0082947f 00000032 c02f9de0 198461bc c3a6fd7c c02f6a08 00010002 c3a6e000   
fd20: 00000001 00000000 00000000 00000001 c3a6fd5c c3a6fd40 c002f0a4 c0051f54   
fd40: 198461bc c02f6a08 00010002 c3a6e000 c3a6fd8c c3a6fd60 c0065168 c002f000   
fd60: 00000001 c02fa4f8 c3a6e000 c02f6a08 00000001 00000001 c3a6e000 c3a6e000   
fd80: c3a6fdac c3a6fd90 c0067908 c0065124 00000001 c0302d40 00000000 00000002   
fda0: c3a6fdcc c3a6fdb0 c002506c c0067814 00000000 ffffffff fefff000 00000001   
fdc0: c3a6fe74 c3a6fdd0 c00259dc c0025010 00000000 c3a67ef8 c3a5fef8 40000013   
fde0: 40000013 00000000 0233a240 00000032 c3a6fef8 c02f9de0 c3a6e000 c3a6fe74   
fe00: c3a67ef8 c3a6fe18 c3a47ef8 c00528a4 60000013 ffffffff 00000000 00000000   
fe20: 00000000 40000013 0233a240 00000032 00000000 00000000 c00540a8 c0054044   
fe40: 00000000 c3a6e000 c3a6fe7c 3b9aca00 00000032 00000000 00000000 00000000   
fe60: 00000000 c3a6fef8 c3a6fe94 c3a6fe78 c00529cc c0052758 00000000 00000000   
fe80: 00000001 3b9aca00 c3a6fed4 c3a6fe98 c02569ac c00529b4 00000000 00000000   
fea0: c3a6e000 00000000 c02f9de0 00000000 00000000 00000000 c3a6ff80 c3a6e000   
fec0: 00000000 00000000 c3a6ff64 c3a6fed8 c0052a8c c0256904 00000000 00000000   
fee0: 38a09550 00000000 0233a240 00000032 00000000 00000000 c0310700 00000000   
ff00: 00000000 00000000 0233a240 00000032 0233a240 00000032 c0051e68 c02f9de0   
ff20: 00000001 c3a6ff24 c3a6ff24 00000001 c3a63680 c003bf78 c3a6ff88 00000001   
ff40: 00000001 00000000 c3a6ff80 c0025f44 c3a6e000 00017690 c3a6ff7c c3a6ff68   
ff60: c004c8d0 c00529e4 c0025f44 00000001 c3a6ffa4 c3a6ff80 c004ca00 c004c8b4   
ff80: 00000032 0233a240 00000001 00000000 5396ddec 00000109 00000000 c3a6ffa8   
ffa0: c0025dc0 c004c8e4 00000001 00000000 00000001 00000001 5396ddec 00000000   
ffc0: 00000001 00000000 5396ddec 00000109 00000001 00015c60 00017690 5396ddf4   
ffe0: 00000000 5396dc50 4004680c 4004682c 60000010 00000001 00000000 00000000   
Backtrace:                                                                      
[<c002ef18>] (clkevt32k_next_event+0x0/0xd8) from [<c005a20c>] (clockevents_prog
ram_event+0xfc/0x168)                                                           
 r6:00000000 r5:000225c1 r4:00000000                                            
[<c005a110>] (clockevents_program_event+0x0/0x168) from [<c005aed4>] (tick_dev_p
rogram_event+0xcc/0xf8)                                                         
 r8:c02f6a30 r7:00000032 r6:0082d07d r5:00000032 r4:0083ad27                    
[<c005ae08>] (tick_dev_program_event+0x0/0xf8) from [<c005af78>] (tick_program_e
vent+0x30/0x3c)                                                                 
[<c005af48>] (tick_program_event+0x0/0x3c) from [<c0052188>] (hrtimer_interrupt+
0x244/0x2e8)                                                                    
 r5:0082947f r4:c02f9de0                                                        
[<c0051f44>] (hrtimer_interrupt+0x0/0x2e8) from [<c002f0a4>] (at91rm9200_timer_i
nterrupt+0xb4/0xc8)                                                             
[<c002eff0>] (at91rm9200_timer_interrupt+0x0/0xc8) from [<c0065168>] (handle_IRQ
_event+0x54/0xec)                                                               
 r6:c3a6e000 r5:00010002 r4:c02f6a08                                            
[<c0065114>] (handle_IRQ_event+0x0/0xec) from [<c0067908>] (handle_level_irq+0x1
04/0x174)                                                                       
[<c0067804>] (handle_level_irq+0x0/0x174) from [<c002506c>] (_text+0x6c/0x8c)   
 r7:00000002 r6:00000000 r5:c0302d40 r4:00000001                                
[<c0025000>] (_text+0x0/0x8c) from [<c00259dc>] (__irq_svc+0x3c/0x80)           
Exception stack(0xc3a6fdd0 to 0xc3a6fe18)                                       
fdc0:                                     00000000 c3a67ef8 c3a5fef8 40000013   
fde0: 40000013 00000000 0233a240 00000032 c3a6fef8 c02f9de0 c3a6e000 c3a6fe74   
fe00: c3a67ef8 c3a6fe18 c3a47ef8 c00528a4 60000013 ffffffff                     
 r6:00000001 r5:fefff000 r4:ffffffff                                            
[<c0052748>] (__hrtimer_start_range_ns+0x0/0x228) from [<c00529cc>] (hrtimer_sta
rt_range_ns+0x28/0x30)                                                          
[<c00529a4>] (hrtimer_start_range_ns+0x0/0x30) from [<c02569ac>] (do_nanosleep+0
xb8/0x114)                                                                      
[<c02568f4>] (do_nanosleep+0x0/0x114) from [<c0052a8c>] (hrtimer_nanosleep+0xb8/
0x140)                                                                          
[<c00529d4>] (hrtimer_nanosleep+0x0/0x140) from [<c004c8d0>] (common_nsleep+0x2c
/0x30)                                                                          
[<c004c8a4>] (common_nsleep+0x0/0x30) from [<c004ca00>] (sys_clock_nanosleep+0x1
2c/0x150)                                                                       
 r4:00000001                                                                    
[<c004c8d4>] (sys_clock_nanosleep+0x0/0x150) from [<c0025dc0>] (ret_fast_syscall
+0x0/0x2c)                                                                      
 r7:00000109 r6:5396ddec r5:00000000 r4:00000001                                
Code: e59f1040 e88d5000 eb0024fc e3a03000 (e5833000)                            
Kernel panic - not syncing: Fatal exception in interrupt 

[-- Attachment #3: 100909_kernel_panic_2.6.29.6-rt23-debug-uwe2 --]
[-- Type: text/plain, Size: 5898 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000000    
pgd = c0004000                                                                  
[00000000] *pgd=00000000                                                        
Internal error: Oops: 817 [#1] PREEMPT                                          
Modules linked in:                                                              
CPU: 0    Not tainted  (2.6.29.6-rt23 #1)                                       
PC is at clkevt32k_next_event+0x94/0xd8                                         
LR is at rt_mutex_unlock+0x10/0x14                                              
pc : [<c002efac>]    lr : [<c0256e40>]    psr: 00000093                         
sp : c02f3d90  ip : c02f3ca0  fp : c02f3db4                                     
r10: 00000000  r9 : 00000000  r8 : c02f6a30                                     
r7 : 0000004a  r6 : 00000000  r5 : 00000001  r4 : 00000000                      
r3 : 00000000  r2 : 00010002  r1 : 60000093  r0 : 00000062                      
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel            
Control: c000717f  Table: 23890000  DAC: 00000017                               
Process swapper (pid: 0, stack limit = 0xc02f2270)                              
Stack: (0xc02f3d90 to 0xc02f4000)                                               
3d80:                                     0000c6c0 c02f6a30 131806f5 00000000   
3da0: 000225c1 00000000 c02f3ddc c02f3db8 c005a20c c002ef28 0000004a 13797362   
3dc0: 0000004a 13796120 0000004a c02f6a30 c02f3e2c c02f3de0 c005aed4 c005a120   
3de0: 13796120 0000004a 4f140486 00000000 00000000 c02f3e70 13796120 0000004a   
3e00: 4e6c7a07 13797362 0000004a 0000004a c02f9de0 13797362 0000004a c02f2000   
3e20: c02f3e4c c02f3e30 c005af78 c005ae18 00000001 00000000 c02f9de0 13793a86   
3e40: c02f3eac c02f3e50 c0052188 c005af58 3fd06258 c02f6a30 13793a86 0000004a   
3e60: c0065168 c02f9de0 00000001 00000000 13793a86 0000004a c02f2000 c0302d40   
3e80: c02f3e9c c02f6a08 00010002 c02f2000 00000001 00000000 00000000 00000001   
3ea0: c02f3ecc c02f3eb0 c002f0a4 c0051f54 c38eb1e0 c02f6a08 00010002 c02f2000   
3ec0: c02f3efc c02f3ed0 c0065168 c002f000 c02f3ef4 c02fa4f8 c02f2000 c02f6a08   
3ee0: 00000001 00000001 c02f2000 2001fce8 c02f3f1c c02f3f00 c0067908 c0065124   
3f00: 00000001 c0302d40 00000000 00000002 c02f3f3c c02f3f20 c002506c c0067814   
3f20: c02f3f54 ffffffff fefff000 00000001 c02f3f94 c02f3f40 c00259dc c0025010   
3f40: 00000000 00000001 c02f2000 60000013 c0026eb0 c02f2000 c02f5eb0 c03104e0   
3f60: 2001fd1c 41129200 2001fce8 c02f3f94 c02f3f98 c02f3f88 c0026ef0 c0026efc   
3f80: 60000013 ffffffff c02f3fb4 c02f3f98 c0026db0 c0026ec0 c02f2000 c031048c   
3fa0: c00222c0 c02f5ce0 c02f3fcc c02f3fb8 c02540dc c0026d70 c031048c c03195fc   
3fc0: c02f3ff4 c02f3fd0 c0008b38 c0254068 c0008594 00000000 00000000 c0021ebc   
3fe0: c0007175 c0310518 00000000 c02f3ff8 20008034 c0008970 00000000 00000000   
Backtrace:                                                                      
[<c002ef18>] (clkevt32k_next_event+0x0/0xd8) from [<c005a20c>] (clockevents_prog
ram_event+0xfc/0x168)                                                           
 r6:00000000 r5:000225c1 r4:00000000                                            
[<c005a110>] (clockevents_program_event+0x0/0x168) from [<c005aed4>] (tick_dev_p
rogram_event+0xcc/0xf8)                                                         
 r8:c02f6a30 r7:0000004a r6:13796120 r5:0000004a r4:13797362                    
[<c005ae08>] (tick_dev_program_event+0x0/0xf8) from [<c005af78>] (tick_program_e
vent+0x30/0x3c)                                                                 
[<c005af48>] (tick_program_event+0x0/0x3c) from [<c0052188>] (hrtimer_interrupt+
0x244/0x2e8)                                                                    
 r5:13793a86 r4:c02f9de0                                                        
[<c0051f44>] (hrtimer_interrupt+0x0/0x2e8) from [<c002f0a4>] (at91rm9200_timer_i
nterrupt+0xb4/0xc8)                                                             
[<c002eff0>] (at91rm9200_timer_interrupt+0x0/0xc8) from [<c0065168>] (handle_IRQ
_event+0x54/0xec)                                                               
 r6:c02f2000 r5:00010002 r4:c02f6a08                                            
[<c0065114>] (handle_IRQ_event+0x0/0xec) from [<c0067908>] (handle_level_irq+0x1
04/0x174)                                                                       
[<c0067804>] (handle_level_irq+0x0/0x174) from [<c002506c>] (_text+0x6c/0x8c)   
 r7:00000002 r6:00000000 r5:c0302d40 r4:00000001                                
[<c0025000>] (_text+0x0/0x8c) from [<c00259dc>] (__irq_svc+0x3c/0x80)           
Exception stack(0xc02f3f40 to 0xc02f3f88)                                       
3f40: 00000000 00000001 c02f2000 60000013 c0026eb0 c02f2000 c02f5eb0 c03104e0   
3f60: 2001fd1c 41129200 2001fce8 c02f3f94 c02f3f98 c02f3f88 c0026ef0 c0026efc   
3f80: 60000013 ffffffff                                                         
 r6:00000001 r5:fefff000 r4:ffffffff                                            
[<c0026eb0>] (default_idle+0x0/0x54) from [<c0026db0>] (cpu_idle+0x50/0xac)     
[<c0026d60>] (cpu_idle+0x0/0xac) from [<c02540dc>] (rest_init+0x84/0x9c)        
 r7:c02f5ce0 r6:c00222c0 r5:c031048c r4:c02f2000                                
[<c0254058>] (rest_init+0x0/0x9c) from [<c0008b38>] (start_kernel+0x1d8/0x2c8)  
 r4:c03195fc                                                                    
[<c0008960>] (start_kernel+0x0/0x2c8) from [<20008034>] (0x20008034)            
 r5:c0310518 r4:c0007175                                                        
Code: e59f1040 e88d5000 eb0024fc e3a03000 (e5833000)                            
Kernel panic - not syncing: Fatal exception in interrupt         

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-10  6:44       ` Bo Hansen
@ 2009-09-10  9:51         ` Uwe Kleine-König
  2009-09-11 15:20           ` Uwe Kleine-König
  0 siblings, 1 reply; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-10  9:51 UTC (permalink / raw)
  To: Bo Hansen, Thomas Gleixner, john stultz; +Cc: rt-users

Hello Bo,

On Thu, Sep 10, 2009 at 08:44:11AM +0200, Bo Hansen wrote:
> Hi Uwe,
>
> Sorry for my late reply.
>
> I only get the initialization line:
> ************** at91rm9200_timer_init: shift=32, mult=140737,
> min_delta_ns=61036, minclc=2  
OK, that corresponds to my calculation.

Strange, with my patch lr is at rt_mutex_unlock in the oops, before it
was at clockevents_program_event.  I don't understand that.
Can you try changing the pr_info to pr_emerg?
If that doesn't help, do you have CONFIG_DEBUG_LL available?  If so can
you change the pr_info (now pr_emerg) to something like:

	void printch(char);
	void printhex8(unsigned);
	void printascii(const char *);
	printascii("**************");
	printhex8(dev->shift);
	printch(':');
	printhex8(dev->mult);
	printch(':');
	printhex8(dev->min_delta_ns);
	printch('\n');

and enable CONFIG_DEBUG_LL of course?

jstultz, tglx: there shouldn't be anything that fiddles around with
shift, mult and min_delta, should it?  (Well, increasing min_delta is
OK.)

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-10  9:51         ` Uwe Kleine-König
@ 2009-09-11 15:20           ` Uwe Kleine-König
  2009-09-11 23:07             ` Uwe Kleine-König
  2009-09-14  9:09             ` Bo Hansen
  0 siblings, 2 replies; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-11 15:20 UTC (permalink / raw)
  To: Bo Hansen, Thomas Gleixner, john stultz; +Cc: rt-users

Hello Bo,

In the meantime I got access to an at91rm9200, too.  To help me
reproducing the problem:

 - which options do you use for cyclictest?
 - Is the system idle or do you run hackbench or similar in parallel?
 - anything else?

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-11 15:20           ` Uwe Kleine-König
@ 2009-09-11 23:07             ` Uwe Kleine-König
  2009-09-14 10:50               ` Bo Hansen
  2009-09-14  9:09             ` Bo Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-11 23:07 UTC (permalink / raw)
  To: Bo Hansen, Thomas Gleixner, john stultz; +Cc: rt-users

Hello,

On Fri, Sep 11, 2009 at 05:20:44PM +0200, Uwe Kleine-König wrote:
> Hello Bo,
> 
> In the meantime I got access to an at91rm9200, too.  To help me
> reproducing the problem:
I still cannot reproduce, but I found something anyhow.

The problem is that hrtimer_interrupt_hanging decreases min_delta_ns.
(Initially it's 61036.)

I talked to jstultz on irc and both of us are unsure if asserting that
min_delta_ns isn't decreased in hrtimer_interrupt_hanging is enough or
if there is another problem.

Bo, can you please apply the patch below, pass the kernel parameter
ftrace_dump_on_oops (or alternatively do 

	# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops

), reproduce the problem and send the resulting oops?

Best regards
Uwe

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 51bd089..cd72ca9 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -18,6 +18,7 @@
 #include <linux/personality.h>
 #include <linux/kallsyms.h>
 #include <linux/delay.h>
+#include <linux/kdebug.h>
 #include <linux/hardirq.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
@@ -223,6 +224,8 @@ static void __die(const char *str, int err, struct thread_info *thread, struct p
 		dump_backtrace(regs, tsk);
 		dump_instr(regs);
 	}
+
+	notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
 }
 
 DEFINE_RAW_SPINLOCK(die_lock);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 9e308ab..a0c05f3 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1390,8 +1390,16 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 
 	/* Reprogramming necessary ? */
 	if (expires_next.tv64 != KTIME_MAX) {
-		if (tick_program_event(expires_next, force_clock_reprogram))
+		if (tick_program_event(expires_next, force_clock_reprogram)) {
+			if (nr_retries > 1)
+				trace_printk("tick_program_event failed, "
+						"now=%lld, expires_next=%lld, "
+						"nr_retries=%d\n",
+					(long long)now.tv64,
+					(long long)expires_next.tv64,
+				       	nr_retries);
 			goto retry;
+		}
 	}
 
 	if (raise)

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-11 15:20           ` Uwe Kleine-König
  2009-09-11 23:07             ` Uwe Kleine-König
@ 2009-09-14  9:09             ` Bo Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: Bo Hansen @ 2009-09-14  9:09 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Thomas Gleixner, john stultz, rt-users

Hi Uwe,

I use cyclictest -p95 -n -i2000 -t40 and 3x ping -s65000. In the 
beginning I had hackbench 4 running also, but
actually the cyclictest was enough to trigger the panic.
Also I'm running from NFS rootfs and running cyclictest from an ssh 
connection.

Best regards
Bo


Uwe Kleine-König wrote:
> Hello Bo,
>
> In the meantime I got access to an at91rm9200, too.  To help me
> reproducing the problem:
>
>  - which options do you use for cyclictest?
>  - Is the system idle or do you run hackbench or similar in parallel?
>  - anything else?
>
> Best regards
> Uwe
>
>   
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-11 23:07             ` Uwe Kleine-König
@ 2009-09-14 10:50               ` Bo Hansen
  2009-09-14 11:10                 ` Uwe Kleine-König
  0 siblings, 1 reply; 19+ messages in thread
From: Bo Hansen @ 2009-09-14 10:50 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Thomas Gleixner, john stultz, rt-users

[-- Attachment #1: Type: text/plain, Size: 2919 bytes --]

Hi Uwe,

I tried to apply your patch and enabled ftrace but have not been able to 
reproduce
the panic.
It should also be noted that the panic is much more rare if I apply the 
first patch you
sent.

The only panic I got was using your first patch and enabling ftrace. I 
have not yet had
the time to try changing pr_info/pr_emerg.
It seems to run stable when something like: "hrtimer: interrupt too 
slow, forcing clock
min delta to 480771 ns" is found in the dmesg output, but I cannot be 
sure as I don't
know if this is also written to the log just before the panic.

Unfortunately I don't have much time for testing this at the moment but 
I'll get back
as soon as possible.


Best regards,
Bo

Uwe Kleine-König wrote:
> Hello,
>
> On Fri, Sep 11, 2009 at 05:20:44PM +0200, Uwe Kleine-König wrote:
>   
>> Hello Bo,
>>
>> In the meantime I got access to an at91rm9200, too.  To help me
>> reproducing the problem:
>>     
> I still cannot reproduce, but I found something anyhow.
>
> The problem is that hrtimer_interrupt_hanging decreases min_delta_ns.
> (Initially it's 61036.)
>
> I talked to jstultz on irc and both of us are unsure if asserting that
> min_delta_ns isn't decreased in hrtimer_interrupt_hanging is enough or
> if there is another problem.
>
> Bo, can you please apply the patch below, pass the kernel parameter
> ftrace_dump_on_oops (or alternatively do 
>
> 	# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>
> ), reproduce the problem and send the resulting oops?
>
> Best regards
> Uwe
>
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 51bd089..cd72ca9 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -18,6 +18,7 @@
>  #include <linux/personality.h>
>  #include <linux/kallsyms.h>
>  #include <linux/delay.h>
> +#include <linux/kdebug.h>
>  #include <linux/hardirq.h>
>  #include <linux/init.h>
>  #include <linux/uaccess.h>
> @@ -223,6 +224,8 @@ static void __die(const char *str, int err, struct thread_info *thread, struct p
>  		dump_backtrace(regs, tsk);
>  		dump_instr(regs);
>  	}
> +
> +	notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
>  }
>  
>  DEFINE_RAW_SPINLOCK(die_lock);
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 9e308ab..a0c05f3 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1390,8 +1390,16 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>  
>  	/* Reprogramming necessary ? */
>  	if (expires_next.tv64 != KTIME_MAX) {
> -		if (tick_program_event(expires_next, force_clock_reprogram))
> +		if (tick_program_event(expires_next, force_clock_reprogram)) {
> +			if (nr_retries > 1)
> +				trace_printk("tick_program_event failed, "
> +						"now=%lld, expires_next=%lld, "
> +						"nr_retries=%d\n",
> +					(long long)now.tv64,
> +					(long long)expires_next.tv64,
> +				       	nr_retries);
>  			goto retry;
> +		}
>  	}
>  
>  	if (raise)
>
>   

[-- Attachment #2: 140909_kernel_panic_2.6.29.6-rt23-debug-uwe-ftrace --]
[-- Type: text/plain, Size: 5275 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c3a38000
[00000000] *pgd=23a0c031, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (2.6.29.6-rt23 #1)
PC is at clkevt32k_next_event+0x98/0xdc
LR is at rt_mutex_unlock+0x14/0x18
pc : [<c0030580>]    lr : [<c027e2cc>]    psr: 00000093
sp : c3a3dc58  ip : c3a3db68  fp : c3a3dc7c
r10: 00000000  r9 : 00000000  r8 : c0322dd8
r7 : 00000067  r6 : 00000000  r5 : 00000001  r4 : 00000000
r3 : 00000000  r2 : 00010003  r1 : 60000093  r0 : 00000062
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: c000717f  Table: 23a38000  DAC: 00000015
Process cyclictest (pid: 572, stack limit = 0xc3a3c270)
Stack: (0xc3a3dc58 to 0xc3a3e000)
dc40:                                                       0000d026 c0322dd8 
dc60: c3a3dce0 00000000 000225c1 00000000 c3a3dca4 c3a3dc80 c005d7f4 c00304f8 
dc80: 00000067 38db444d 00000067 38db1ca8 00000067 c0322dd8 c3a3dcf4 c3a3dca8 
dca0: c005e514 c005d704 38db1ca8 00000067 38daf15c 00000000 00000000 c3a3dd38 
dcc0: 38db1ca8 00000067 38726a67 38db444d 00000067 00000067 c03271f8 38db444d 
dce0: 00000067 c3a3c000 c3a3dd14 c3a3dcf8 c005e5c0 c005e454 00000001 00000001 
dd00: c03271f8 38daf15c c3a3dd74 c3a3dd18 c00551a0 c005e59c 3fcd8e40 c0322dd8 
dd20: 38daf15c 00000067 c0378b90 c03271f8 00000001 00000000 38daf15c 00000067 
dd40: 14f46b04 c0030cf0 00000000 c0322db0 c3a3c000 c0322db0 00010003 00000001 
dd60: c3a3c000 00000000 c3a3dd94 c3a3dd78 c003067c c0054f68 c0033944 c0322db0 
dd80: c3a3c000 c0322db0 c3a3ddcc c3a3dd98 c0068fbc c00305d4 c393601c 00000000 
dda0: 00000000 c0328068 c3a3c000 c0322db0 00000001 00000002 c3a3c000 c03247a8 
ddc0: c3a3ddec c3a3ddd0 c006b9b0 c0068f64 00000001 c0335330 00000000 00000003 
dde0: c3a3de0c c3a3ddf0 c0026070 c006b8b8 c0033944 ffffffff fefff000 00000001 
de00: c3a3de7c c3a3de10 c00269fc c0026010 c38ce120 00000001 00000013 0000086c 
de20: 00000000 c3a3c000 c3918800 c3918990 c38ce120 00000000 c03247a8 c3a3de7c 
de40: c3a3de30 c3a3de58 c0031c64 c027d0d0 00000013 ffffffff c3a3c000 00000000 
de60: 00000000 00000000 00000000 c3a3def8 c3a3de94 c3a3de80 c027d5b8 c027cfa4 
de80: 00000001 3b9aca00 c3a3ded4 c3a3de98 c027de28 c027d59c 00000000 00000000 
dea0: c3a3c000 00000000 c003f52c 00000000 00000000 00000000 c3a3df80 c3a3c000 
dec0: 00000000 00000000 c3a3df64 c3a3ded8 c0055ae8 c027dd60 00000000 00000000 
dee0: 007610eb 00000000 39aff350 00000067 00000000 00000000 c3a77ef8 00000000 
df00: 00000000 00000000 39aff350 00000067 39aff350 00000067 c0054e74 c03271f8 
df20: 00000001 c3a3df24 c3a3df24 00000001 c3918800 c003de3c c3a3df88 00000001 
df40: 00000001 00000000 c3a3df80 c0026fa8 c3a3c000 00017194 c3a3df7c c3a3df68 
df60: c004f578 c0055a3c c0026fa8 00000001 c3a3dfa4 c3a3df80 c004f6ac c004f558 
df80: 00000067 39aff350 00000001 00000000 4516ddec 00000109 00000000 c3a3dfa8 
dfa0: c0026de0 c004f58c 00000001 00000000 00000001 00000001 4516ddec 00000000 
dfc0: 00000001 00000000 4516ddec 00000109 00000001 00015c60 00017194 4516ddf4 
dfe0: 00000000 4516dc50 4004680c 4004682c 60000010 00000001 00000000 00000000 
Backtrace: 
[<c00304e8>] (clkevt32k_next_event+0x0/0xdc) from [<c005d7f4>] (clockevents_program_event+0x100/0x16c)
 r6:00000000 r5:000225c1 r4:00000000
[<c005d6f4>] (clockevents_program_event+0x0/0x16c) from [<c005e514>] (tick_dev_program_event+0xd0/0xfc)
 r8:c0322dd8 r7:00000067 r6:38db1ca8 r5:00000067 r4:38db444d
[<c005e444>] (tick_dev_program_event+0x0/0xfc) from [<c005e5c0>] (tick_program_event+0x34/0x40)
[<c005e58c>] (tick_program_event+0x0/0x40) from [<c00551a0>] (hrtimer_interrupt+0x248/0x2ec)
 r5:38daf15c r4:c03271f8
[<c0054f58>] (hrtimer_interrupt+0x0/0x2ec) from [<c003067c>] (at91rm9200_timer_interrupt+0xb8/0xcc)
[<c00305c4>] (at91rm9200_timer_interrupt+0x0/0xcc) from [<c0068fbc>] (handle_IRQ_event+0x68/0x1d8)
 r6:c0322db0 r5:c3a3c000 r4:c0322db0
[<c0068f54>] (handle_IRQ_event+0x0/0x1d8) from [<c006b9b0>] (handle_level_irq+0x108/0x178)
[<c006b8a8>] (handle_level_irq+0x0/0x178) from [<c0026070>] (_text+0x70/0x90)
 r7:00000003 r6:00000000 r5:c0335330 r4:00000001
[<c0026000>] (_text+0x0/0x90) from [<c00269fc>] (__irq_svc+0x3c/0x80)
Exception stack(0xc3a3de10 to 0xc3a3de58)
de00:                                     c38ce120 00000001 00000013 0000086c 
de20: 00000000 c3a3c000 c3918800 c3918990 c38ce120 00000000 c03247a8 c3a3de7c 
de40: c3a3de30 c3a3de58 c0031c64 c027d0d0 00000013 ffffffff                   
 r6:00000001 r5:fefff000 r4:ffffffff
[<c027cf94>] (__schedule+0x0/0x3b0) from [<c027d5b8>] (schedule+0x2c/0x48)
[<c027d58c>] (schedule+0x0/0x48) from [<c027de28>] (do_nanosleep+0xd8/0x118)
 r4:3b9aca00
[<c027dd50>] (do_nanosleep+0x0/0x118) from [<c0055ae8>] (hrtimer_nanosleep+0xbc/0x144)
[<c0055a2c>] (hrtimer_nanosleep+0x0/0x144) from [<c004f578>] (common_nsleep+0x30/0x34)
[<c004f548>] (common_nsleep+0x0/0x34) from [<c004f6ac>] (sys_clock_nanosleep+0x130/0x154)
 r4:00000001
[<c004f57c>] (sys_clock_nanosleep+0x0/0x154) from [<c0026de0>] (ret_fast_syscall+0x0/0x2c)
 r7:00000109 r6:4516ddec r5:00000000 r4:00000001
Code: e59f1040 e88d5000 eb00269f e3a03000 (e5833000) 
Kernel panic - not syncing: Fatal exception in interrupt
Dumping ftrace buffer:
   (ftrace buffer empty)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: hrtimer problem on AT91RM9200
  2009-09-14 10:50               ` Bo Hansen
@ 2009-09-14 11:10                 ` Uwe Kleine-König
  2009-09-15  9:29                   ` [SOLVED, RFC] " Uwe Kleine-König
  0 siblings, 1 reply; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-14 11:10 UTC (permalink / raw)
  To: Bo Hansen; +Cc: Thomas Gleixner, john stultz, rt-users

Hello Bo,

On Mon, Sep 14, 2009 at 12:50:32PM +0200, Bo Hansen wrote:
> I tried to apply your patch and enabled ftrace but have not been able to  
> reproduce
> the panic.
> It should also be noted that the panic is much more rare if I apply the  
> first patch you
> sent.
Revert this patch, only use the second.  And optimally add a BUG() in
hrtimer_interrupt_hanging (in kernel/hrtimer.c) after the printk.

> The only panic I got was using your first patch and enabling ftrace. I  
> have not yet had
> the time to try changing pr_info/pr_emerg.
In the meantime I understood why the 2nd printk doesn't appear on the
serial console.  That's because "non-atomic" consoles don't print
anything in atomic context (with PREEMPT_RT=y)

> It seems to run stable when something like: "hrtimer: interrupt too  
> slow, forcing clock
> min delta to 480771 ns" is found in the dmesg output, but I cannot be  
> sure as I don't
> know if this is also written to the log just before the panic.
Actually this is a bug because then min_delta_ns is decreased.  With the
BUG as suggested above this message would be enough to debug this
further.

> Unfortunately I don't have much time for testing this at the moment but  
> I'll get back
> as soon as possible.
:-|

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [SOLVED, RFC] Re: hrtimer problem on AT91RM9200
  2009-09-14 11:10                 ` Uwe Kleine-König
@ 2009-09-15  9:29                   ` Uwe Kleine-König
  2009-09-15 15:29                     ` Thomas Gleixner
  2009-09-16  6:26                     ` Bo Hansen
  0 siblings, 2 replies; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-15  9:29 UTC (permalink / raw)
  To: Bo Hansen; +Cc: Thomas Gleixner, john stultz, rt-users

Hello,

I'm able to reproduce this now.  The problem is really that
hrtimer_interrupt() tries to program the next timer event which fails 4
times in a row, because the expiry time is already in the past when
tick_dev_program_event is called.  This happens because there are so
much timers with small expiry times. (cyclictest -t40 on an UP machine)

Addionally the fourth interation was short (probably because then there
were no expired timers to check for) and so the try_time argument to
hrtimer_interrupt_hanging was small and so min_delta_ns was decreased
instead of increased.

I wonder why increasing min_delta_ns is supposed to fix something here.
Actually some pressure that results in hrtimer_interrupt_hanging being
called can always happen independently of min_delta_ns' value.
And moreover, is it sensible to change the value of
force_clock_reprogram?  This makes the system harder to benchmark,
because to guarantee a certain latency now you have to check both
situations force_clock_reprogram==1 and force_clock_reprogram==0.

My suggestion is to do the following:
  - fix hrtimer_interrupt_hanging not to decrease min_delta_ns
  - in clockevents_program_event instead of
	if (delta <= 0)
		return -ETIME;
    do
	if (delta <= -dev->min_delta_ns)
		return -ETIME;

    I'm not sure though if -dev->min_delta_ns is the right value or if
    that is a good idea at all.

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [SOLVED, RFC] Re: hrtimer problem on AT91RM9200
  2009-09-15  9:29                   ` [SOLVED, RFC] " Uwe Kleine-König
@ 2009-09-15 15:29                     ` Thomas Gleixner
  2009-09-15 18:40                       ` Uwe Kleine-König
  2009-09-16  6:26                     ` Bo Hansen
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2009-09-15 15:29 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Bo Hansen, john stultz, rt-users

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1870 bytes --]

On Tue, 15 Sep 2009, Uwe Kleine-König wrote:
> Hello,
> 
> I'm able to reproduce this now.  The problem is really that
> hrtimer_interrupt() tries to program the next timer event which fails 4
> times in a row, because the expiry time is already in the past when
> tick_dev_program_event is called.  This happens because there are so
> much timers with small expiry times. (cyclictest -t40 on an UP machine)

Pretty much the case which triggers the hang mechanism.

> Addionally the fourth interation was short (probably because then there
> were no expired timers to check for) and so the try_time argument to
> hrtimer_interrupt_hanging was small and so min_delta_ns was decreased
> instead of increased.

He, what is decreasing min_delta_ns ? There is no such code.
 
> I wonder why increasing min_delta_ns is supposed to fix something here.
> Actually some pressure that results in hrtimer_interrupt_hanging being
> called can always happen independently of min_delta_ns' value.

It ensures that the timer interrupts are delayed sufficiently that
something else than the timer interrupt has a chance to get on the
CPU.

> And moreover, is it sensible to change the value of
> force_clock_reprogram?  This makes the system harder to benchmark,
> because to guarantee a certain latency now you have to check both
> situations force_clock_reprogram==1 and force_clock_reprogram==0.
> 
> My suggestion is to do the following:
>   - fix hrtimer_interrupt_hanging not to decrease min_delta_ns

It does not decrease anything, so what's to fix there ?

>   - in clockevents_program_event instead of
> 	if (delta <= 0)
> 		return -ETIME;
>     do
> 	if (delta <= -dev->min_delta_ns)
> 		return -ETIME;
> 
>     I'm not sure though if -dev->min_delta_ns is the right value or if
>     that is a good idea at all.

You might expire timers early, which is wrong.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [SOLVED, RFC] Re: hrtimer problem on AT91RM9200
  2009-09-15 15:29                     ` Thomas Gleixner
@ 2009-09-15 18:40                       ` Uwe Kleine-König
  2009-09-15 19:06                         ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Uwe Kleine-König @ 2009-09-15 18:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Bo Hansen, john stultz, rt-users

Hello Thomas,

On Tue, Sep 15, 2009 at 05:29:17PM +0200, Thomas Gleixner wrote:
> On Tue, 15 Sep 2009, Uwe Kleine-König wrote:
> > Hello,
> > 
> > I'm able to reproduce this now.  The problem is really that
> > hrtimer_interrupt() tries to program the next timer event which fails 4
> > times in a row, because the expiry time is already in the past when
> > tick_dev_program_event is called.  This happens because there are so
> > much timers with small expiry times. (cyclictest -t40 on an UP machine)
> 
> Pretty much the case which triggers the hang mechanism.
OK.

> > Addionally the fourth interation was short (probably because then there
> > were no expired timers to check for) and so the try_time argument to
> > hrtimer_interrupt_hanging was small and so min_delta_ns was decreased
> > instead of increased.
> 
> He, what is decreasing min_delta_ns ? There is no such code.
hrtimer_interrupt_hanging() can decrease min_delta_ns.

Look at Bo's reports: min_delta_ns is initially 61036 and he gets

	hrtimer: interrupt too slow, forcing clock min delta to 52482 ns

If you want I can add a

	WARN_ON_ONCE((unsigned long)try_time.tv64 * 3 < dev->min_delta_ns)

and send you the output of dmesg.

> > I wonder why increasing min_delta_ns is supposed to fix something here.
> > Actually some pressure that results in hrtimer_interrupt_hanging being
> > called can always happen independently of min_delta_ns' value.
> 
> It ensures that the timer interrupts are delayed sufficiently that
> something else than the timer interrupt has a chance to get on the
> CPU.
Hhhhm, doesn't destroy that any upper bound for latency.  If a process
succeeds to install many timers that expire one after another with a
short delta in between min_delta_ns goes up.  This in turn mean that
even high priority processes loose the ability to sleep a short period.
 
> > And moreover, is it sensible to change the value of
> > force_clock_reprogram?  This makes the system harder to benchmark,
> > because to guarantee a certain latency now you have to check both
> > situations force_clock_reprogram==1 and force_clock_reprogram==0.
> > 
> > My suggestion is to do the following:
> >   - fix hrtimer_interrupt_hanging not to decrease min_delta_ns
> 
> It does not decrease anything, so what's to fix there ?
http://git.pengutronix.de/?p=ukl/linux-2.6.git;a=commitdiff;h=a1b46573904d822ac5c7495ea1dbc8ef38200bf7

> >   - in clockevents_program_event instead of
> > 	if (delta <= 0)
> > 		return -ETIME;
> >     do
> > 	if (delta <= -dev->min_delta_ns)
> > 		return -ETIME;
> > 
> >     I'm not sure though if -dev->min_delta_ns is the right value or if
> >     that is a good idea at all.
> 
> You might expire timers early, which is wrong.
The critic be right, I haven't thought much about that, didn't even try
it.  My thought was only that in the original case (delta <= 0) the
event to program is in the past, so programming min_delta_ns for it
isn't too early at all.

Best regards
Uwe

-- 
Pengutronix e.K.                              | Uwe Kleine-König            |
Industrial Linux Solutions                    | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [SOLVED, RFC] Re: hrtimer problem on AT91RM9200
  2009-09-15 18:40                       ` Uwe Kleine-König
@ 2009-09-15 19:06                         ` Thomas Gleixner
  2009-09-15 20:54                           ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2009-09-15 19:06 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Bo Hansen, john stultz, rt-users

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1101 bytes --]

On Tue, 15 Sep 2009, Uwe Kleine-König wrote:
> On Tue, Sep 15, 2009 at 05:29:17PM +0200, Thomas Gleixner wrote:
> > He, what is decreasing min_delta_ns ? There is no such code.
> hrtimer_interrupt_hanging() can decrease min_delta_ns.
> 
> Look at Bo's reports: min_delta_ns is initially 61036 and he gets
> 
> 	hrtimer: interrupt too slow, forcing clock min delta to 52482 ns

So the problem is that min delta is too small due to an unhandled
32bit overflow.

But that's not a decrease by any means. It's simply an 32bit overflow
bug which needs to be fixed.
 
> > It ensures that the timer interrupts are delayed sufficiently that
> > something else than the timer interrupt has a chance to get on the
> > CPU.
> Hhhhm, doesn't destroy that any upper bound for latency.  If a process
> succeeds to install many timers that expire one after another with a
> short delta in between min_delta_ns goes up.  This in turn mean that
> even high priority processes loose the ability to sleep a short period.

Yes, we know that already and it's on the list of things which need to
be solved.
  
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [SOLVED, RFC] Re: hrtimer problem on AT91RM9200
  2009-09-15 19:06                         ` Thomas Gleixner
@ 2009-09-15 20:54                           ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2009-09-15 20:54 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Bo Hansen, john stultz, rt-users

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1104 bytes --]

On Tue, 15 Sep 2009, Thomas Gleixner wrote:
> On Tue, 15 Sep 2009, Uwe Kleine-König wrote:
> > On Tue, Sep 15, 2009 at 05:29:17PM +0200, Thomas Gleixner wrote:
> > > He, what is decreasing min_delta_ns ? There is no such code.
> > hrtimer_interrupt_hanging() can decrease min_delta_ns.
> > 
> > Look at Bo's reports: min_delta_ns is initially 61036 and he gets
> > 
> > 	hrtimer: interrupt too slow, forcing clock min delta to 52482 ns
> 
> So the problem is that min delta is too small due to an unhandled
> 32bit overflow.
> 
> But that's not a decrease by any means. It's simply an 32bit overflow
> bug which needs to be fixed.

Ok. I read the whole thread back to avoid further confusion. Looking
at the code it's pretty simple to explain. The time delta is
multiplied times 3 and unconditionally stored in min_delta_ns. That
needs to be fixed but the real problem is how the delta is
calculated. We take it from the last iteration and not across the 5
iterations which trigger the hang check. I'm queuing a fix for that
for maninline and the -rt11 release (I already uploaded -rt10).

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [SOLVED, RFC] Re: hrtimer problem on AT91RM9200
  2009-09-15  9:29                   ` [SOLVED, RFC] " Uwe Kleine-König
  2009-09-15 15:29                     ` Thomas Gleixner
@ 2009-09-16  6:26                     ` Bo Hansen
  1 sibling, 0 replies; 19+ messages in thread
From: Bo Hansen @ 2009-09-16  6:26 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Thomas Gleixner, john stultz, rt-users

Hi Uwe,

I'm happy to see you are able to reproduce the problem now.

This is a bit over my head as a newbie in the kernel, but I
will follow from the sideline ;-).

Thank you for taking care of the problem.

Best regards,
Bo



Uwe Kleine-König wrote:
> Hello,
>
> I'm able to reproduce this now.  The problem is really that
> hrtimer_interrupt() tries to program the next timer event which fails 4
> times in a row, because the expiry time is already in the past when
> tick_dev_program_event is called.  This happens because there are so
> much timers with small expiry times. (cyclictest -t40 on an UP machine)
>
> Addionally the fourth interation was short (probably because then there
> were no expired timers to check for) and so the try_time argument to
> hrtimer_interrupt_hanging was small and so min_delta_ns was decreased
> instead of increased.
>
> I wonder why increasing min_delta_ns is supposed to fix something here.
> Actually some pressure that results in hrtimer_interrupt_hanging being
> called can always happen independently of min_delta_ns' value.
> And moreover, is it sensible to change the value of
> force_clock_reprogram?  This makes the system harder to benchmark,
> because to guarantee a certain latency now you have to check both
> situations force_clock_reprogram==1 and force_clock_reprogram==0.
>
> My suggestion is to do the following:
>   - fix hrtimer_interrupt_hanging not to decrease min_delta_ns
>   - in clockevents_program_event instead of
> 	if (delta <= 0)
> 		return -ETIME;
>     do
> 	if (delta <= -dev->min_delta_ns)
> 		return -ETIME;
>
>     I'm not sure though if -dev->min_delta_ns is the right value or if
>     that is a good idea at all.
>
> Best regards
> Uwe
>
>   
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-09-16  6:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-19 12:24 hrtimer problem on AT91RM9200 Bo Hansen
2009-08-19 15:33 ` Remy Bohmer
2009-08-20 19:52 ` Uwe Kleine-König
2009-08-21 12:57   ` Bo Hansen
     [not found]     ` <fae6a9700908232327u65fca65bt97bb8c43a91345cd@mail.gmail.com>
2009-08-24  6:30       ` Vivek Satpute
2009-09-03 14:12     ` Uwe Kleine-König
2009-09-10  6:44       ` Bo Hansen
2009-09-10  9:51         ` Uwe Kleine-König
2009-09-11 15:20           ` Uwe Kleine-König
2009-09-11 23:07             ` Uwe Kleine-König
2009-09-14 10:50               ` Bo Hansen
2009-09-14 11:10                 ` Uwe Kleine-König
2009-09-15  9:29                   ` [SOLVED, RFC] " Uwe Kleine-König
2009-09-15 15:29                     ` Thomas Gleixner
2009-09-15 18:40                       ` Uwe Kleine-König
2009-09-15 19:06                         ` Thomas Gleixner
2009-09-15 20:54                           ` Thomas Gleixner
2009-09-16  6:26                     ` Bo Hansen
2009-09-14  9:09             ` Bo Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).