3.8.4-rt2 panic in migrate_task_rq

* 3.8.4-rt2 panic in migrate_task_rq_fair
@ 2013-04-05 16:47 Darren Hart
  2013-04-05 17:17 ` Darren Hart
  2013-04-26 13:21 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 4+ messages in thread
From: Darren Hart @ 2013-04-05 16:47 UTC (permalink / raw)
  To: linux-rt-users

Running on a UEFI 32-bit Atom E6xx system, I see the following panic after
several minutes of running this cyclictest command:

root@sys940x:~# cyclictest -p 50 -d 10m -t -q
# /dev/cpu_dma_latency set to 0us

BUG: unable to handle kernel paging request at fffffff4
IP: [<c106a41c>] migrate_task_rq_fair+0x4c/0x100
*pde = 0198f067 *pte = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
Pid: 649, comm: cyclictest Not tainted 3.8.4-rt2-yocto-preempt-rt #1
EIP: 0060:[<c106a41c>] EFLAGS: 00010046 CPU: 0
EIP is at migrate_task_rq_fair+0x4c/0x100
EAX: 00000000 EBX: deec43f0 ECX: 00000000 EDX: 00000000
ESI: dde8f948 EDI: c1983900 EBP: dee9fe58 ESP: dee9fe40
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: fffffff4 CR3: 1ef64000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process cyclictest (pid: 649, ti=dee9e000 task=def74170 task.ti=dee9e000)
Stack:
 c197f060 00000000 00000000 00000000 deec43f0 00000001 dee9fec0 c1064ec5
 c10380ce 00000010 dee9fe7c c1065894 de699ef0 619b7c30 00000000 dee9feac
 00000010 00000000 dee9fea0 c1037f76 00000000 def74170 00000001 dee9ff28
Call Trace:
 [<c1064ec5>] set_task_cpu+0x55/0x1b0
 [<c10380ce>] ? unpin_current_cpu+0xe/0x70
 [<c1065894>] ? migrate_enable+0xc4/0x1c0
 [<c1037f76>] ? pin_current_cpu+0x76/0x1c0
 [<c106713c>] try_to_wake_up+0x18c/0x300
 [<c10672ef>] wake_up_process+0x1f/0x40
 [<c10595ed>] hrtimer_wakeup+0x1d/0x30
 [<c10599cb>] __run_hrtimer+0x9b/0x260
 [<c10595d0>] ? update_rmtp+0x90/0x90
 [<c105ad62>] hrtimer_interrupt+0x272/0x320
 [<c1645ed5>] smp_apic_timer_interrupt+0x55/0x87
 [<c163f75d>] apic_timer_interrupt+0x2d/0x34
 [<c163f48c>] ? resume_kernel+0x44/0x44
Code: 83 74 01 00 00 74 48 8d 4e 58 e8 94 2e 2c 00 89 45 f0 89 55 f4 8b
8b 78 01 00 00 8b 93 74 01 00 00 29 55 f0 19 4d f4 31 c0 31 d2 <8b> 49
f4 0b 4d f0 75 2c 89 83 74 01 00 00 89 93 78 01 00 00 8b

EIP: [<c106a41c>] migrate_task_rq_fair+0x4c/0x100 SS:ESP 0068:dee9fe40
CR2: 00000000fffffff4
---[ end trace 0000000000000002 ]---
Kernel panic - not syncing: Fatal exception in interrupt

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

* Re: 3.8.4-rt2 panic in migrate_task_rq_fair
  2013-04-05 16:47 3.8.4-rt2 panic in migrate_task_rq_fair Darren Hart
@ 2013-04-05 17:17 ` Darren Hart
  2013-04-26 13:21 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 4+ messages in thread
From: Darren Hart @ 2013-04-05 17:17 UTC (permalink / raw)
  To: linux-rt-users



On 04/05/2013 09:47 AM, Darren Hart wrote:
> Running on a UEFI 32bit Atom E6xx system I see the following panic after
> several minutes running the following cyclictest command.
> 
> root@sys940x:~# cyclictest -p 50 -d 10m -t -q

Whoops, I should have used "-D 10m", but the following is of course
still a problem.

--
Darren

> # /dev/cpu_dma_latency set to 0us
> 
> BUG: unable to handle kernel paging request at fffffff4
> IP: [<c106a41c>] migrate_task_rq_fair+0x4c/0x100
> *pde = 0198f067 *pte = 00000000
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in:
> Pid: 649, comm: cyclictest Not tainted 3.8.4-rt2-yocto-preempt-rt #1
> EIP: 0060:[<c106a41c>] EFLAGS: 00010046 CPU: 0
> EIP is at migrate_task_rq_fair+0x4c/0x100
> EAX: 00000000 EBX: deec43f0 ECX: 00000000 EDX: 00000000
> ESI: dde8f948 EDI: c1983900 EBP: dee9fe58 ESP: dee9fe40
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> CR0: 80050033 CR2: fffffff4 CR3: 1ef64000 CR4: 000007d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process cyclictest (pid: 649, ti=dee9e000 task=def74170 task.ti=dee9e000)
> Stack:
>  c197f060 00000000 00000000 00000000 deec43f0 00000001 dee9fec0 c1064ec5
>  c10380ce 00000010 dee9fe7c c1065894 de699ef0 619b7c30 00000000 dee9feac
>  00000010 00000000 dee9fea0 c1037f76 00000000 def74170 00000001 dee9ff28
> Call Trace:
>  [<c1064ec5>] set_task_cpu+0x55/0x1b0
>  [<c10380ce>] ? unpin_current_cpu+0xe/0x70
>  [<c1065894>] ? migrate_enable+0xc4/0x1c0
>  [<c1037f76>] ? pin_current_cpu+0x76/0x1c0
>  [<c106713c>] try_to_wake_up+0x18c/0x300
>  [<c10672ef>] wake_up_process+0x1f/0x40
>  [<c10595ed>] hrtimer_wakeup+0x1d/0x30
>  [<c10599cb>] __run_hrtimer+0x9b/0x260
>  [<c10595d0>] ? update_rmtp+0x90/0x90
>  [<c105ad62>] hrtimer_interrupt+0x272/0x320
>  [<c1645ed5>] smp_apic_timer_interrupt+0x55/0x87
>  [<c163f75d>] apic_timer_interrupt+0x2d/0x34
>  [<c163f48c>] ? resume_kernel+0x44/0x44
> Code: 83 74 01 00 00 74 48 8d 4e 58 e8 94 2e 2c 00 89 45 f0 89 55 f4 8b
> 8b 78 01 00 00 8b 93 74 01 00 00 29 55 f0 19 4d f4 31 c0 31 d2 <8b> 49
> f4 0b 4d f0 75 2c 89 83 74 01 00 00 89 93 78 01 00 00 8b
> 
> EIP: [<c106a41c>] migrate_task_rq_fair+0x4c/0x100 SS:ESP 0068:dee9fe40
> CR2: 00000000fffffff4
> ---[ end trace 0000000000000002 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> 

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel

* Re: 3.8.4-rt2 panic in migrate_task_rq_fair
  2013-04-05 16:47 3.8.4-rt2 panic in migrate_task_rq_fair Darren Hart
  2013-04-05 17:17 ` Darren Hart
@ 2013-04-26 13:21 ` Sebastian Andrzej Siewior
  2013-04-29 23:00   ` Darren Hart
  1 sibling, 1 reply; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-26 13:21 UTC (permalink / raw)
  To: Darren Hart; +Cc: linux-rt-users

* Darren Hart | 2013-04-05 09:47:09 [-0700]:

>Running on a UEFI 32bit Atom E6xx system I see the following panic after
>several minutes running the following cyclictest command.

Can you reproduce this?

>root@sys940x:~# cyclictest -p 50 -d 10m -t -q
># /dev/cpu_dma_latency set to 0us
>
>BUG: unable to handle kernel paging request at fffffff4
>IP: [<c106a41c>] migrate_task_rq_fair+0x4c/0x100
>EIP is at migrate_task_rq_fair+0x4c/0x100
>EAX: 00000000 EBX: deec43f0 ECX: 00000000 EDX: 00000000
>ESI: dde8f948 EDI: c1983900 EBP: dee9fe58 ESP: dee9fe40
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

This is the disassembly of your code:

|   0:   83 74 01 00 00          xorl   $0x0,0x0(%rcx,%rax,1)
|   5:   74 48                   je     4f <crash+0x24>
|   7:   8d 4e 58                lea    0x58(%rsi),%ecx
|   a:   e8 94 2e 2c 00          callq  2c2ea3 <crash+0x2c2e78>
|   f:   89 45 f0                mov    %eax,-0x10(%rbp)
|  12:   89 55 f4                mov    %edx,-0xc(%rbp)
|  15:   8b 8b 78 01 00 00       mov    0x178(%rbx),%ecx
|  1b:   8b 93 74 01 00 00       mov    0x174(%rbx),%edx
|  21:   29 55 f0                sub    %edx,-0x10(%rbp)
|  24:   19 4d f4                sbb    %ecx,-0xc(%rbp)
|  27:   31 c0                   xor    %eax,%eax
|  29:   31 d2                   xor    %edx,%edx
|
|000000000000002b <crash>:
|  2b:   8b 49 f4                mov    -0xc(%rcx),%ecx

So ecx is zero, and adding the -0xc displacement gives 0xfffffff4, which
matches CR2. Okay, bad pointer crash.
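
The address arithmetic can be checked with a minimal C sketch. It assumes
32-bit wraparound, as on this i386 kernel; effective_address() is a
hypothetical helper written for illustration, not kernel code:

```c
#include <stdint.h>

/* Hypothetical helper: effective address of "disp(base)" on 32-bit x86.
 * Address arithmetic wraps modulo 2^32. */
uint32_t effective_address(uint32_t base, int32_t disp)
{
	return base + (uint32_t)disp;
}
```

With base = 0 (ECX in the oops) and disp = -0xc this yields 0xfffffff4,
the value reported in CR2.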

|  2e:   0b 4d f0                or     -0x10(%rbp),%ecx
|  31:   75 2c                   jne    5f <crash+0x34>
|  33:   89 83 74 01 00 00       mov    %eax,0x174(%rbx)
|  39:   89 93 78 01 00 00       mov    %edx,0x178(%rbx)

A few lines up (offsets 0x21 and 0x24) edx and ecx are used for the u64
subtraction in __synchronize_entity_decay(). The C code:
|        decays -= se->avg.decay_count;
|        if (!decays)
|                return 0;

The result is stored on the stack at -0x10(%rbp) and -0xc(%rbp). It is
later loaded again from the stack because atomic64 is not inlined and the
zero check still has to be done.
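
As a rough illustration of what the sub/sbb pair and the following
mov/or/jne sequence compute, here is a minimal C sketch. The struct and
function names are hypothetical and only mirror the listing; this is not
the kernel's code:

```c
#include <stdint.h>

/* A 64-bit value split into 32-bit halves, as held in two stack slots. */
struct u64_halves {
	uint32_t lo;	/* -0x10(%ebp) in the listing */
	uint32_t hi;	/* -0xc(%ebp) in the listing */
};

/* a -= b using a 32-bit subtract followed by subtract-with-borrow. */
void sub64(struct u64_halves *a, uint32_t b_lo, uint32_t b_hi)
{
	uint32_t borrow = a->lo < b_lo;	/* carry flag after "sub" */

	a->lo -= b_lo;			/* sub %edx,-0x10(%ebp) */
	a->hi -= b_hi + borrow;		/* sbb %ecx,-0xc(%ebp) */
}

/* Zero check: OR both halves together, as the mov/or/jne sequence does. */
int is_zero64(const struct u64_halves *a)
{
	return (a->hi | a->lo) == 0;
}
```

The zero check needs both halves read back from the stack slots, which is
why the base register of the load at 0x2b matters.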

So *I* think that the assembly here is wrong because line 0x2b should
use rbp as the pointer, as is done in 0x2e. The two lines are the
zero check.
My gcc generates this here:

|c105c835:       e8 da 3a 1d 00          call   c1230314 <atomic64_read_cx8>
|c105c83a:       89 55 f4                mov    %edx,-0xc(%ebp)
|c105c83d:       8b 93 9c 00 00 00       mov    0x9c(%ebx),%edx
|c105c843:       89 45 f0                mov    %eax,-0x10(%ebp)
|c105c846:       8b 8b a0 00 00 00       mov    0xa0(%ebx),%ecx
|c105c84c:       29 55 f0                sub    %edx,-0x10(%ebp)
|c105c84f:       19 4d f4                sbb    %ecx,-0xc(%ebp)
|c105c852:       31 c0                   xor    %eax,%eax
|c105c854:       31 d2                   xor    %edx,%edx
crash:
|c105c856:       8b 4d f4                mov    -0xc(%ebp),%ecx

As you can see, it uses ebp instead of ecx for the zero check.

|c105c859:       0b 4d f0                or     -0x10(%ebp),%ecx
|c105c85c:       75 2a                   jne    c105c888 <migrate_task_rq_fair+0x78>
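
The two encodings of the load (8b 49 f4 in the oops, 8b 4d f4 in the
listing above) can be compared by decoding their ModRM bytes. This is a
minimal sketch with hypothetical helper names; it only covers the
register-plus-disp8 form seen here and says nothing about the cause:

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte. */
struct modrm {
	uint8_t mod;	/* addressing mode; 1 = base register + 8-bit displacement */
	uint8_t reg;	/* register operand (destination for opcode 0x8b) */
	uint8_t rm;	/* base register when mod == 1 and rm != 4 */
};

struct modrm decode_modrm(uint8_t byte)
{
	struct modrm m = { byte >> 6, (byte >> 3) & 7, byte & 7 };
	return m;
}

/* 32-bit register names in encoding order. */
const char *reg32_name(uint8_t n)
{
	static const char *const names[8] = {
		"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
	};
	return names[n & 7];
}
```

decode_modrm(0x49) gives mod 1 with ecx as both destination and base,
while decode_modrm(0x4d) gives mod 1 with ecx as destination and ebp as
base. The two bytes differ only in bit 2 (0x49 ^ 0x4d == 0x04).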

The assembly code looks wrong to me. So it is either a gcc bug or the
attributes for the inline assembly in atomic64_read() /
alternative_atomic64() are wrong.

Sebastian

* Re: 3.8.4-rt2 panic in migrate_task_rq_fair
  2013-04-26 13:21 ` Sebastian Andrzej Siewior
@ 2013-04-29 23:00   ` Darren Hart
  0 siblings, 0 replies; 4+ messages in thread
From: Darren Hart @ 2013-04-29 23:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users



On 04/26/2013 06:21 AM, Sebastian Andrzej Siewior wrote:
> * Darren Hart | 2013-04-05 09:47:09 [-0700]:
> 
>> Running on a UEFI 32bit Atom E6xx system I see the following panic after
>> several minutes running the following cyclictest command.
> 
> Can you reproduce this?


Yes, it was perfectly repeatable.


>> root@sys940x:~# cyclictest -p 50 -d 10m -t -q
>> # /dev/cpu_dma_latency set to 0us
>>
>> BUG: unable to handle kernel paging request at fffffff4
>> IP: [<c106a41c>] migrate_task_rq_fair+0x4c/0x100
>> EIP is at migrate_task_rq_fair+0x4c/0x100
>> EAX: 00000000 EBX: deec43f0 ECX: 00000000 EDX: 00000000
>> ESI: dde8f948 EDI: c1983900 EBP: dee9fe58 ESP: dee9fe40
>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> 
> This is the disassembly of your code:
> 
> |   0:   83 74 01 00 00          xorl   $0x0,0x0(%rcx,%rax,1)
> |   5:   74 48                   je     4f <crash+0x24>
> |   7:   8d 4e 58                lea    0x58(%rsi),%ecx
> |   a:   e8 94 2e 2c 00          callq  2c2ea3 <crash+0x2c2e78>
> |   f:   89 45 f0                mov    %eax,-0x10(%rbp)
> |  12:   89 55 f4                mov    %edx,-0xc(%rbp)
> |  15:   8b 8b 78 01 00 00       mov    0x178(%rbx),%ecx
> |  1b:   8b 93 74 01 00 00       mov    0x174(%rbx),%edx
> |  21:   29 55 f0                sub    %edx,-0x10(%rbp)
> |  24:   19 4d f4                sbb    %ecx,-0xc(%rbp)
> |  27:   31 c0                   xor    %eax,%eax
> |  29:   31 d2                   xor    %edx,%edx
> |
> |000000000000002b <crash>:
> |  2b:   8b 49 f4                mov    -0xc(%rcx),%ecx
> 
> So ecx is zero, -0xc gives 0xfffffff4. Okay, bad pointer crash.
> 
> |  2e:   0b 4d f0                or     -0x10(%rbp),%ecx
> |  31:   75 2c                   jne    5f <crash+0x34>
> |  33:   89 83 74 01 00 00       mov    %eax,0x174(%rbx)
> |  39:   89 93 78 01 00 00       mov    %edx,0x178(%rbx)
> 
> A few lines up (offset 0x21) rcx is used for u64 subtraction in
> __synchronize_entity_decay(), the C code:
> |        decays -= se->avg.decay_count;
> |         if (!decays)
> |                 return 0;
> 
> The result is saved in -0x10 & -0xc *rbp. Later it is loaded again from
> stack because atomic64 is not inlined and it needs to do the zero check.
> 
> So *I* think that the assembly here is wrong because line 0x2b should
> use rbp as the pointer, as is done in 0x2e. The two lines are the
> zero check.
> My gcc creates here: 
> 
> |c105c835:       e8 da 3a 1d 00          call   c1230314 <atomic64_read_cx8>
> |c105c83a:       89 55 f4                mov    %edx,-0xc(%ebp)
> |c105c83d:       8b 93 9c 00 00 00       mov    0x9c(%ebx),%edx
> |c105c843:       89 45 f0                mov    %eax,-0x10(%ebp)
> |c105c846:       8b 8b a0 00 00 00       mov    0xa0(%ebx),%ecx
> |c105c84c:       29 55 f0                sub    %edx,-0x10(%ebp)
> |c105c84f:       19 4d f4                sbb    %ecx,-0xc(%ebp)
> |c105c852:       31 c0                   xor    %eax,%eax
> |c105c854:       31 d2                   xor    %edx,%edx
> crash:
> |c105c856:       8b 4d f4                mov    -0xc(%ebp),%ecx
> 
> as you see, it uses ebp instead of rcx for the 0 check.
> 
> |c105c859:       0b 4d f0                or     -0x10(%ebp),%ecx
> |c105c85c:       75 2a                   jne    c105c888 <migrate_task_rq_fair+0x78>
> 
> The assembly code looks wrong to me. So it is either a gcc bug or the
> attributes for the inline assembly in atomic64_read() /
> alternative_atomic64() are wrong.


Something to look into. I will try to get back to this and compare a
couple of different compiler versions.

Thanks for looking into it!

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel
