* [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2
@ 2012-09-11 15:56 Gernot Hillier
2012-09-15 22:26 ` Gilles Chanteperdrix
0 siblings, 1 reply; 28+ messages in thread
From: Gernot Hillier @ 2012-09-11 15:56 UTC (permalink / raw)
To: rpm, xenomai
Hi there!
While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible
kernel BUG after some seconds after starting irqbalance:
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/ipipe.c:592!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear
Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH
RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0
RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086
RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092
RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8
RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000
R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880
R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880
FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020)
Stack:
0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8
ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8
00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000
Call Trace:
[<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90
[<ffffffff815a6649>] ? call_softirq+0x19/0x30
[<ffffffff810043d5>] do_softirq+0xc5/0x100
[<ffffffff81050955>] irq_exit+0xd5/0xf0
[<ffffffff815a6e1f>] do_IRQ+0x6f/0x100
[<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5
[<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0
[<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0
[<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170
[<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210
[<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0
[<ffffffff8159bae0>] common_interrupt+0x60/0x81
[<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50
[<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50
[<ffffffff8100b146>] default_idle+0x66/0x1a0
[<ffffffff810020cf>] cpu_idle+0xaf/0x100
[<ffffffff8157ec22>] rest_init+0x72/0x80
[<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf
[<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135
[<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138
Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55
RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0
RSP <ffffffff8177bbe0>
This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR
in the per_cpu array vector_irq[].
I found that ipipe_init_vector_irq() (which used to add the needed
entry) was factored out from arch/x86/kernel/ipipe.c. This likely
happened when porting from 2.6.38 to 3.1 - at least I can still see the
code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find
any x86-branch in-between).
The following sample patch fixes the issue for me:
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1287,6 +1287,7 @@ void __setup_vector_irq(int cpu)
if (!cpumask_test_cpu(cpu, cfg->domain))
per_cpu(vector_irq, cpu)[vector] = -1;
}
+ per_cpu(vector_irq, cpu)[IRQ_MOVE_CLEANUP_VECTOR] = IRQ_MOVE_CLEANUP_VECTOR;
raw_spin_unlock(&vector_lock);
}
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -295,7 +295,8 @@ static void __init apic_intr_init(void)
# ifdef CONFIG_IRQ_WORK
alloc_intr_gate(IRQ_WORK_VECTOR, irq_work_interrupt);
# endif
-
+ per_cpu(vector_irq, 0)[IRQ_MOVE_CLEANUP_VECTOR] =
+ IRQ_MOVE_CLEANUP_VECTOR;
#endif
}
However, this doesn't seem the way to go, so after a short discussion
with Jan Kiszka, we wondered if there was a special reason for the
removal of ipipe_init_vector_irq() or if it simply got lost and should
be re-added.
Thanks in advance!
--
With kind regards,
Gernot Hillier
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 28+ messages in thread* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-11 15:56 [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 Gernot Hillier @ 2012-09-15 22:26 ` Gilles Chanteperdrix 2012-09-16 7:20 ` Jan Kiszka 2012-09-16 8:50 ` Philippe Gerum 0 siblings, 2 replies; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-15 22:26 UTC (permalink / raw) To: Gernot Hillier; +Cc: xenomai On 09/11/2012 05:56 PM, Gernot Hillier wrote: > Hi there! > > While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible > kernel BUG after some seconds after starting irqbalance: > > ------------[ cut here ]------------ > kernel BUG at arch/x86/kernel/ipipe.c:592! > invalid opcode: 0000 [#1] SMP > CPU 0 > Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear > > Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH > RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 > RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 > RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 > RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 > RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 > R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 > R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 > FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) > Stack: > 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 > ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 > 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 > Call Trace: > [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 > [<ffffffff815a6649>] ? call_softirq+0x19/0x30 > [<ffffffff810043d5>] do_softirq+0xc5/0x100 > [<ffffffff81050955>] irq_exit+0xd5/0xf0 > [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 > [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 > [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 > [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 > [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 > [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 > [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 > [<ffffffff8159bae0>] common_interrupt+0x60/0x81 > [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 > [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 > [<ffffffff8100b146>] default_idle+0x66/0x1a0 > [<ffffffff810020cf>] cpu_idle+0xaf/0x100 > [<ffffffff8157ec22>] rest_init+0x72/0x80 > [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf > [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 > [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 > Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 > RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 > RSP <ffffffff8177bbe0> > > This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR > in the per_cpu array vector_irq[]. > > I found that ipipe_init_vector_irq() (which used to add the needed > entry) was factored out from arch/x86/kernel/ipipe.c. This likely > happened when porting from 2.6.38 to 3.1 - at least I can still see the > code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find > any x86-branch in-between). If I understand correctly, ipipe_init_vector_irq is no longer needed because the I-pipe core uses ipipe_apic_vector_irq for vectors above FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in __ipipe_handle_irq as well. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-15 22:26 ` Gilles Chanteperdrix @ 2012-09-16 7:20 ` Jan Kiszka 2012-09-16 8:36 ` Philippe Gerum 2012-09-16 8:50 ` Philippe Gerum 1 sibling, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-16 7:20 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai On 2012-09-16 00:26, Gilles Chanteperdrix wrote: > On 09/11/2012 05:56 PM, Gernot Hillier wrote: > >> Hi there! >> >> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >> kernel BUG after some seconds after starting irqbalance: >> >> ------------[ cut here ]------------ >> kernel BUG at arch/x86/kernel/ipipe.c:592! >> invalid opcode: 0000 [#1] SMP >> CPU 0 >> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >> >> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >> Stack: >> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >> Call Trace: >> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >> [<ffffffff8157ec22>] rest_init+0x72/0x80 >> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >> RSP <ffffffff8177bbe0> >> >> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >> in the per_cpu array vector_irq[]. >> >> I found that ipipe_init_vector_irq() (which used to add the needed >> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >> happened when porting from 2.6.38 to 3.1 - at least I can still see the >> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >> any x86-branch in-between). > > > If I understand correctly, ipipe_init_vector_irq is no longer needed > because the I-pipe core uses ipipe_apic_vector_irq for vectors above > FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except > IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which > should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in > __ipipe_handle_irq as well. The idea of ipipe_init_vector_irq, i.e. filling of vector_irq beyond what Linux needs, was to avoid all these special cases. Unless there was a need to leave vector_irq alone in 3.x, we should simply restore this cleaner approach from 2.6.3x. That's what Gernot's question is aiming at. Jan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 259 bytes Desc: OpenPGP digital signature URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20120916/f98e2d13/attachment.pgp> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-16 7:20 ` Jan Kiszka @ 2012-09-16 8:36 ` Philippe Gerum 0 siblings, 0 replies; 28+ messages in thread From: Philippe Gerum @ 2012-09-16 8:36 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai On 09/16/2012 09:20 AM, Jan Kiszka wrote: > On 2012-09-16 00:26, Gilles Chanteperdrix wrote: >> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >> >>> Hi there! >>> >>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>> kernel BUG after some seconds after starting irqbalance: >>> >>> ------------[ cut here ]------------ >>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>> invalid opcode: 0000 [#1] SMP >>> CPU 0 >>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>> >>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>> Stack: >>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>> Call Trace: >>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>> RSP <ffffffff8177bbe0> >>> >>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>> in the per_cpu array vector_irq[]. >>> >>> I found that ipipe_init_vector_irq() (which used to add the needed >>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>> any x86-branch in-between). >> >> >> If I understand correctly, ipipe_init_vector_irq is no longer needed >> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >> __ipipe_handle_irq as well. > > The idea of ipipe_init_vector_irq, i.e. filling of vector_irq beyond > what Linux needs, was to avoid all these special cases. Unless there was > a need to leave vector_irq alone in 3.x, we should simply restore this > cleaner approach from 2.6.3x. That's what Gernot's question is aiming at. > Ack. The change was wrongly assuming that special vectors would always belong to the system range. Please revert this. -- Philippe. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-15 22:26 ` Gilles Chanteperdrix 2012-09-16 7:20 ` Jan Kiszka @ 2012-09-16 8:50 ` Philippe Gerum 2012-09-16 10:31 ` Gilles Chanteperdrix 2012-09-17 8:13 ` Gilles Chanteperdrix 1 sibling, 2 replies; 28+ messages in thread From: Philippe Gerum @ 2012-09-16 8:50 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: > On 09/11/2012 05:56 PM, Gernot Hillier wrote: > >> Hi there! >> >> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >> kernel BUG after some seconds after starting irqbalance: >> >> ------------[ cut here ]------------ >> kernel BUG at arch/x86/kernel/ipipe.c:592! >> invalid opcode: 0000 [#1] SMP >> CPU 0 >> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >> >> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >> Stack: >> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >> Call Trace: >> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >> [<ffffffff8157ec22>] rest_init+0x72/0x80 >> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >> RSP <ffffffff8177bbe0> >> >> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >> in the per_cpu array vector_irq[]. >> >> I found that ipipe_init_vector_irq() (which used to add the needed >> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >> happened when porting from 2.6.38 to 3.1 - at least I can still see the >> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >> any x86-branch in-between). > > > If I understand correctly, ipipe_init_vector_irq is no longer needed > because the I-pipe core uses ipipe_apic_vector_irq for vectors above > FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except > IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which > should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in > __ipipe_handle_irq as well. > > This is correct, but unfortunately, upstream reshuffles the IRQ vector space every so often, and does not restrict the special vectors to the system range anymore. Therefore, we should re-introduce a post-setup routine for the vector->irq map, grouping all fixups we need in one place. -- Philippe. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-16 8:50 ` Philippe Gerum @ 2012-09-16 10:31 ` Gilles Chanteperdrix 2012-09-17 8:13 ` Gilles Chanteperdrix 1 sibling, 0 replies; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-16 10:31 UTC (permalink / raw) To: Philippe Gerum; +Cc: xenomai On 09/16/2012 10:50 AM, Philippe Gerum wrote: > On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >> >>> Hi there! >>> >>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>> kernel BUG after some seconds after starting irqbalance: >>> >>> ------------[ cut here ]------------ >>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>> invalid opcode: 0000 [#1] SMP >>> CPU 0 >>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>> >>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>> Stack: >>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>> Call Trace: >>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>> RSP <ffffffff8177bbe0> >>> >>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>> in the per_cpu array vector_irq[]. >>> >>> I found that ipipe_init_vector_irq() (which used to add the needed >>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>> any x86-branch in-between). >> >> >> If I understand correctly, ipipe_init_vector_irq is no longer needed >> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >> __ipipe_handle_irq as well. >> >> > > This is correct, but unfortunately, upstream reshuffles the IRQ vector > space every so often, and does not restrict the special vectors to the > system range anymore. Therefore, we should re-introduce a post-setup > routine for the vector->irq map, grouping all fixups we need in one place. > Ok, I will make the change in my tree and send a pull request when done, I have some pending commits (including Wolfgang's fixes). -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-16 8:50 ` Philippe Gerum 2012-09-16 10:31 ` Gilles Chanteperdrix @ 2012-09-17 8:13 ` Gilles Chanteperdrix 2012-09-18 17:25 ` Jan Kiszka 1 sibling, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-17 8:13 UTC (permalink / raw) To: Philippe Gerum; +Cc: xenomai On 09/16/2012 10:50 AM, Philippe Gerum wrote: > On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >> >>> Hi there! >>> >>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>> kernel BUG after some seconds after starting irqbalance: >>> >>> ------------[ cut here ]------------ >>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>> invalid opcode: 0000 [#1] SMP >>> CPU 0 >>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>> >>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>> Stack: >>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>> Call Trace: >>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>> RSP <ffffffff8177bbe0> >>> >>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>> in the per_cpu array vector_irq[]. >>> >>> I found that ipipe_init_vector_irq() (which used to add the needed >>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>> any x86-branch in-between). >> >> >> If I understand correctly, ipipe_init_vector_irq is no longer needed >> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >> __ipipe_handle_irq as well. >> >> > > This is correct, but unfortunately, upstream reshuffles the IRQ vector > space every so often, and does not restrict the special vectors to the > system range anymore. Therefore, we should re-introduce a post-setup > routine for the vector->irq map, grouping all fixups we need in one place. > I propose the following change: http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 Which avoids listing explicitely the vectors we want to intercept, and so, should allow some changes to happen in the kernel without having to care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which do not go through alloc_intr_gate, but this vector is the only exception, for now). -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-17 8:13 ` Gilles Chanteperdrix @ 2012-09-18 17:25 ` Jan Kiszka 2012-09-18 17:28 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-18 17:25 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai On 2012-09-17 10:13, Gilles Chanteperdrix wrote: > On 09/16/2012 10:50 AM, Philippe Gerum wrote: >> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>> >>>> Hi there! >>>> >>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>> kernel BUG after some seconds after starting irqbalance: >>>> >>>> ------------[ cut here ]------------ >>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>> invalid opcode: 0000 [#1] SMP >>>> CPU 0 >>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>> >>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>> Stack: >>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>> Call Trace: >>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>> RSP <ffffffff8177bbe0> >>>> >>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>> in the per_cpu array vector_irq[]. >>>> >>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>> any x86-branch in-between). >>> >>> >>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>> __ipipe_handle_irq as well. >>> >>> >> >> This is correct, but unfortunately, upstream reshuffles the IRQ vector >> space every so often, and does not restrict the special vectors to the >> system range anymore. Therefore, we should re-introduce a post-setup >> routine for the vector->irq map, grouping all fixups we need in one place. >> > I propose the following change: > http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 > > Which avoids listing explicitely the vectors we want to intercept, and > so, should allow some changes to happen in the kernel without having to > care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which > do not go through alloc_intr_gate, but this vector is the only exception, > for now). Crashes on boot, SMP at least. Investigating. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 17:25 ` Jan Kiszka @ 2012-09-18 17:28 ` Gilles Chanteperdrix 2012-09-18 18:25 ` Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 17:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai On 09/18/2012 07:25 PM, Jan Kiszka wrote: > On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>> >>>>> Hi there! >>>>> >>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>> kernel BUG after some seconds after starting irqbalance: >>>>> >>>>> ------------[ cut here ]------------ >>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>> invalid opcode: 0000 [#1] SMP >>>>> CPU 0 >>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>> >>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>> Stack: >>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>> Call Trace: >>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>> RSP <ffffffff8177bbe0> >>>>> >>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>> in the per_cpu array vector_irq[]. >>>>> >>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>> any x86-branch in-between). >>>> >>>> >>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>> __ipipe_handle_irq as well. >>>> >>>> >>> >>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>> space every so often, and does not restrict the special vectors to the >>> system range anymore. Therefore, we should re-introduce a post-setup >>> routine for the vector->irq map, grouping all fixups we need in one place. >>> >> I propose the following change: >> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >> >> Which avoids listing explicitely the vectors we want to intercept, and >> so, should allow some changes to happen in the kernel without having to >> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >> do not go through alloc_intr_gate, but this vector is the only exception, >> for now). > > Crashes on boot, SMP at least. Investigating. Well, I tested it in SMP, so, the crash is probably due to some option I could not activate (such as IRQ_REMAP, the one triggering the use of IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is missing, because it does not compile in UP mode without APIC/IO-APIC. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 17:28 ` Gilles Chanteperdrix @ 2012-09-18 18:25 ` Jan Kiszka 2012-09-18 18:31 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-18 18:25 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-18 19:28, Gilles Chanteperdrix wrote: > On 09/18/2012 07:25 PM, Jan Kiszka wrote: >> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>> >>>>>> Hi there! >>>>>> >>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>> >>>>>> ------------[ cut here ]------------ >>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>> invalid opcode: 0000 [#1] SMP >>>>>> CPU 0 >>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>> >>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>> Stack: >>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>> Call Trace: >>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>> RSP <ffffffff8177bbe0> >>>>>> >>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>> in the per_cpu array vector_irq[]. >>>>>> >>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>> any x86-branch in-between). >>>>> >>>>> >>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>> __ipipe_handle_irq as well. >>>>> >>>>> >>>> >>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>> space every so often, and does not restrict the special vectors to the >>>> system range anymore. Therefore, we should re-introduce a post-setup >>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>> >>> I propose the following change: >>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>> >>> Which avoids listing explicitely the vectors we want to intercept, and >>> so, should allow some changes to happen in the kernel without having to >>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>> do not go through alloc_intr_gate, but this vector is the only exception, >>> for now). >> >> Crashes on boot, SMP at least. Investigating. > > Well, I tested it in SMP, so, the crash is probably due to some option I > could not activate (such as IRQ_REMAP, the one triggering the use of > IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is > missing, because it does not compile in UP mode without APIC/IO-APIC. Not sure what the precise option difference is, but the bug is that you write to vector_irq before __setup_vector_irq is finished. That's why the original patch had ipipe_init_vector_irq precisely at the end of this function. Also, ipipe_init_vector_irq looks nicer than #ifdefs IMHO. Should I forward-port my old patch? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 18:25 ` Jan Kiszka @ 2012-09-18 18:31 ` Gilles Chanteperdrix 2012-09-18 18:55 ` Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 18:31 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On 09/18/2012 08:25 PM, Jan Kiszka wrote: > On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>> >>>>>>> Hi there! >>>>>>> >>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>> >>>>>>> ------------[ cut here ]------------ >>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>> CPU 0 >>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>> >>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>> Stack: >>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>> Call Trace: >>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>> RSP <ffffffff8177bbe0> >>>>>>> >>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>> in the per_cpu array vector_irq[]. >>>>>>> >>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>> any x86-branch in-between). >>>>>> >>>>>> >>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>> __ipipe_handle_irq as well. >>>>>> >>>>>> >>>>> >>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>> space every so often, and does not restrict the special vectors to the >>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>> >>>> I propose the following change: >>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>> >>>> Which avoids listing explicitely the vectors we want to intercept, and >>>> so, should allow some changes to happen in the kernel without having to >>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>> for now). >>> >>> Crashes on boot, SMP at least. Investigating. >> >> Well, I tested it in SMP, so, the crash is probably due to some option I >> could not activate (such as IRQ_REMAP, the one triggering the use of >> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >> missing, because it does not compile in UP mode without APIC/IO-APIC. > > Not sure what the precise option difference is, but the bug is that you > write to vector_irq before __setup_vector_irq is finished. Well, yes, the writes to vector_irq should happen even before it is started, that is the point. The test at the end of setup_vector_irq skip the... Ah, wait, a hunk is missing (I did test this modification in the middle of the LAPIC TPR modification, so the commit is not actually tested). diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 93f84c9..91d1ee9 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct irq_cfg *cfg) void __setup_vector_irq(int cpu) { /* Initialize vector_irq on a new cpu */ - int irq, vector; + int irq, vector, vectors_limit; struct irq_cfg *cfg; +#ifndef CONFIG_IPIPE + vectors_limit = NR_VECTORS; +#else + vectors_limit = first_system_vector; +#endif /* CONFIG_IPIPE */ + /* * vector_lock will make sure that we don't run into irq vector * assignments that might be happening on another cpu in parallel, @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) per_cpu(vector_irq, cpu)[vector] = irq; } /* Mark the free vectors */ - for (vector = 0; vector < NR_VECTORS; ++vector) { + for (vector = 0; vector < first_system_vector; ++vector) { + if (vector == IRQ_MOVE_CLEANUP_VECTOR) + continue; + irq = per_cpu(vector_irq, cpu)[vector]; if (irq < 0) continue; That's why > the original patch had ipipe_init_vector_irq precisely at the end of > this function. > > Also, ipipe_init_vector_irq looks nicer than #ifdefs IMHO. Should I > forward-port my old patch? Well, we can remove the #ifdefs completely, as the change is harmless on the vanilla kernel. -- Gilles. ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 18:31 ` Gilles Chanteperdrix @ 2012-09-18 18:55 ` Jan Kiszka 2012-09-18 19:01 ` Gilles Chanteperdrix 2012-09-18 19:05 ` Gilles Chanteperdrix 0 siblings, 2 replies; 28+ messages in thread From: Jan Kiszka @ 2012-09-18 18:55 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-18 20:31, Gilles Chanteperdrix wrote: > On 09/18/2012 08:25 PM, Jan Kiszka wrote: > >> On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >>> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>>> >>>>>>>> Hi there! >>>>>>>> >>>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>>> >>>>>>>> ------------[ cut here ]------------ >>>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>> CPU 0 >>>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>>> >>>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>>> Stack: >>>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>>> Call Trace: >>>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>> RSP <ffffffff8177bbe0> >>>>>>>> >>>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>>> in the per_cpu array vector_irq[]. >>>>>>>> >>>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>>> any x86-branch in-between). >>>>>>> >>>>>>> >>>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>>> __ipipe_handle_irq as well. >>>>>>> >>>>>>> >>>>>> >>>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>>> space every so often, and does not restrict the special vectors to the >>>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>>> >>>>> I propose the following change: >>>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>>> >>>>> Which avoids listing explicitely the vectors we want to intercept, and >>>>> so, should allow some changes to happen in the kernel without having to >>>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>>> for now). >>>> >>>> Crashes on boot, SMP at least. Investigating. >>> >>> Well, I tested it in SMP, so, the crash is probably due to some option I >>> could not activate (such as IRQ_REMAP, the one triggering the use of >>> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >>> missing, because it does not compile in UP mode without APIC/IO-APIC. >> >> Not sure what the precise option difference is, but the bug is that you >> write to vector_irq before __setup_vector_irq is finished. > > > Well, yes, the writes to vector_irq should happen even before it is > started, that is the point. The test at the end of setup_vector_irq skip > the... Ah, wait, a hunk is missing (I did test this modification in the > middle of the LAPIC TPR modification, so the commit is not actually tested). > > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c > index 93f84c9..91d1ee9 100644 > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct > irq_cfg *cfg) > void __setup_vector_irq(int cpu) > { > /* Initialize vector_irq on a new cpu */ > - int irq, vector; > + int irq, vector, vectors_limit; > struct irq_cfg *cfg; > > +#ifndef CONFIG_IPIPE > + vectors_limit = NR_VECTORS; > +#else > + vectors_limit = first_system_vector; > +#endif /* CONFIG_IPIPE */ > + > /* > * vector_lock will make sure that we don't run into irq vector > * assignments that might be happening on another cpu in parallel, > @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) > per_cpu(vector_irq, cpu)[vector] = irq; > } > /* Mark the free vectors */ > - for (vector = 0; vector < NR_VECTORS; ++vector) { > + for (vector = 0; vector < first_system_vector; ++vector) { > + if (vector == IRQ_MOVE_CLEANUP_VECTOR) > + continue; > + > irq = per_cpu(vector_irq, cpu)[vector]; > if (irq < 0) > continue; > > Err, that's not really getting cleaner or easier maintainable than the original version, does it? Also, some loop below this clears out certain vector_irq entries again - are you sure that this is what we want? I vaguely recall that there was a very good reason to patch vector_irq afterward... Will post the old variant, at least for comparison. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 18:55 ` Jan Kiszka @ 2012-09-18 19:01 ` Gilles Chanteperdrix 2012-09-18 19:10 ` Jan Kiszka 2012-09-18 19:05 ` Gilles Chanteperdrix 1 sibling, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 19:01 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On 09/18/2012 08:55 PM, Jan Kiszka wrote: > On 2012-09-18 20:31, Gilles Chanteperdrix wrote: >> On 09/18/2012 08:25 PM, Jan Kiszka wrote: >> >>> On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >>>> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>>>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>>>> >>>>>>>>> Hi there! >>>>>>>>> >>>>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>>>> >>>>>>>>> ------------[ cut here ]------------ >>>>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>> CPU 0 >>>>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>>>> >>>>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>>>> Stack: >>>>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>>>> Call Trace: >>>>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>> RSP <ffffffff8177bbe0> >>>>>>>>> >>>>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>>>> in the per_cpu array vector_irq[]. >>>>>>>>> >>>>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>>>> any x86-branch in-between). >>>>>>>> >>>>>>>> >>>>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>>>> __ipipe_handle_irq as well. >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>>>> space every so often, and does not restrict the special vectors to the >>>>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>>>> >>>>>> I propose the following change: >>>>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>>>> >>>>>> Which avoids listing explicitely the vectors we want to intercept, and >>>>>> so, should allow some changes to happen in the kernel without having to >>>>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>>>> for now). >>>>> >>>>> Crashes on boot, SMP at least. Investigating. >>>> >>>> Well, I tested it in SMP, so, the crash is probably due to some option I >>>> could not activate (such as IRQ_REMAP, the one triggering the use of >>>> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >>>> missing, because it does not compile in UP mode without APIC/IO-APIC. >>> >>> Not sure what the precise option difference is, but the bug is that you >>> write to vector_irq before __setup_vector_irq is finished. >> >> >> Well, yes, the writes to vector_irq should happen even before it is >> started, that is the point. The test at the end of setup_vector_irq skip >> the... Ah, wait, a hunk is missing (I did test this modification in the >> middle of the LAPIC TPR modification, so the commit is not actually tested). >> >> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >> index 93f84c9..91d1ee9 100644 >> --- a/arch/x86/kernel/apic/io_apic.c >> +++ b/arch/x86/kernel/apic/io_apic.c >> @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct >> irq_cfg *cfg) >> void __setup_vector_irq(int cpu) >> { >> /* Initialize vector_irq on a new cpu */ >> - int irq, vector; >> + int irq, vector, vectors_limit; >> struct irq_cfg *cfg; >> >> +#ifndef CONFIG_IPIPE >> + vectors_limit = NR_VECTORS; >> +#else >> + vectors_limit = first_system_vector; >> +#endif /* CONFIG_IPIPE */ >> + >> /* >> * vector_lock will make sure that we don't run into irq vector >> * assignments that might be happening on another cpu in parallel, >> @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) >> per_cpu(vector_irq, cpu)[vector] = irq; >> } >> /* Mark the free vectors */ >> - for (vector = 0; vector < NR_VECTORS; ++vector) { >> + for (vector = 0; vector < first_system_vector; ++vector) { >> + if (vector == IRQ_MOVE_CLEANUP_VECTOR) >> + continue; >> + >> irq = per_cpu(vector_irq, cpu)[vector]; >> if (irq < 0) >> continue; >> >> > > Err, that's not really getting cleaner or easier maintainable than the > original version, does it? Also, some loop below this clears out certain > vector_irq entries again - are you sure that this is what we want? I > vaguely recall that there was a very good reason to patch vector_irq > afterward... > > Will post the old variant, at least for comparison. The point is to avoid having to enumerate the vectors by hand. If a system vector is added, no further modification is needed. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 19:01 ` Gilles Chanteperdrix @ 2012-09-18 19:10 ` Jan Kiszka 2012-09-18 19:14 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-18 19:10 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-18 21:01, Gilles Chanteperdrix wrote: > On 09/18/2012 08:55 PM, Jan Kiszka wrote: > >> On 2012-09-18 20:31, Gilles Chanteperdrix wrote: >>> On 09/18/2012 08:25 PM, Jan Kiszka wrote: >>> >>>> On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >>>>> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>>>>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>>>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>>>>> >>>>>>>>>> Hi there! >>>>>>>>>> >>>>>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>>>>> >>>>>>>>>> ------------[ cut here ]------------ >>>>>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>>> CPU 0 >>>>>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>>>>> >>>>>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>>>>> Stack: >>>>>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>>>>> Call Trace: >>>>>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>>> RSP <ffffffff8177bbe0> >>>>>>>>>> >>>>>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>>>>> in the per_cpu array vector_irq[]. >>>>>>>>>> >>>>>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>>>>> any x86-branch in-between). >>>>>>>>> >>>>>>>>> >>>>>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>>>>> __ipipe_handle_irq as well. >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>>>>> space every so often, and does not restrict the special vectors to the >>>>>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>>>>> >>>>>>> I propose the following change: >>>>>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>>>>> >>>>>>> Which avoids listing explicitely the vectors we want to intercept, and >>>>>>> so, should allow some changes to happen in the kernel without having to >>>>>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>>>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>>>>> for now). >>>>>> >>>>>> Crashes on boot, SMP at least. Investigating. >>>>> >>>>> Well, I tested it in SMP, so, the crash is probably due to some option I >>>>> could not activate (such as IRQ_REMAP, the one triggering the use of >>>>> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >>>>> missing, because it does not compile in UP mode without APIC/IO-APIC. >>>> >>>> Not sure what the precise option difference is, but the bug is that you >>>> write to vector_irq before __setup_vector_irq is finished. >>> >>> >>> Well, yes, the writes to vector_irq should happen even before it is >>> started, that is the point. The test at the end of setup_vector_irq skip >>> the... Ah, wait, a hunk is missing (I did test this modification in the >>> middle of the LAPIC TPR modification, so the commit is not actually tested). >>> >>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >>> index 93f84c9..91d1ee9 100644 >>> --- a/arch/x86/kernel/apic/io_apic.c >>> +++ b/arch/x86/kernel/apic/io_apic.c >>> @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct >>> irq_cfg *cfg) >>> void __setup_vector_irq(int cpu) >>> { >>> /* Initialize vector_irq on a new cpu */ >>> - int irq, vector; >>> + int irq, vector, vectors_limit; >>> struct irq_cfg *cfg; >>> >>> +#ifndef CONFIG_IPIPE >>> + vectors_limit = NR_VECTORS; >>> +#else >>> + vectors_limit = first_system_vector; >>> +#endif /* CONFIG_IPIPE */ >>> + >>> /* >>> * vector_lock will make sure that we don't run into irq vector >>> * assignments that might be happening on another cpu in parallel, >>> @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) >>> per_cpu(vector_irq, cpu)[vector] = irq; >>> } >>> /* Mark the free vectors */ >>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>> + for (vector = 0; vector < first_system_vector; ++vector) { >>> + if (vector == IRQ_MOVE_CLEANUP_VECTOR) >>> + continue; >>> + >>> irq = per_cpu(vector_irq, cpu)[vector]; >>> if (irq < 0) >>> continue; >>> >>> >> >> Err, that's not really getting cleaner or easier maintainable than the >> original version, does it? Also, some loop below this clears out certain >> vector_irq entries again - are you sure that this is what we want? I >> vaguely recall that there was a very good reason to patch vector_irq >> afterward... >> >> Will post the old variant, at least for comparison. > > > The point is to avoid having to enumerate the vectors by hand. If a > system vector is added, no further modification is needed. I see your intention, but this comes at the price of changing other kernel code that works against us due to the initialization order. Also, we only have two ranges to address: MOVE_CLEANUP and everything > first_system_vector. Your patch also have to deal with both of them separately. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 19:10 ` Jan Kiszka @ 2012-09-18 19:14 ` Gilles Chanteperdrix 0 siblings, 0 replies; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 19:14 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On 09/18/2012 09:10 PM, Jan Kiszka wrote: > On 2012-09-18 21:01, Gilles Chanteperdrix wrote: >> On 09/18/2012 08:55 PM, Jan Kiszka wrote: >> >>> On 2012-09-18 20:31, Gilles Chanteperdrix wrote: >>>> On 09/18/2012 08:25 PM, Jan Kiszka wrote: >>>> >>>>> On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >>>>>> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>>>>>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>>>>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>>>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>>>>>> >>>>>>>>>>> Hi there! >>>>>>>>>>> >>>>>>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>>>>>> >>>>>>>>>>> ------------[ cut here ]------------ >>>>>>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>>>> CPU 0 >>>>>>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>>>>>> >>>>>>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>>>>>> Stack: >>>>>>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>>>>>> Call Trace: >>>>>>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>>>> RSP <ffffffff8177bbe0> >>>>>>>>>>> >>>>>>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>>>>>> in the per_cpu array vector_irq[]. >>>>>>>>>>> >>>>>>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>>>>>> any x86-branch in-between). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>>>>>> __ipipe_handle_irq as well. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>>>>>> space every so often, and does not restrict the special vectors to the >>>>>>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>>>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>>>>>> >>>>>>>> I propose the following change: >>>>>>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>>>>>> >>>>>>>> Which avoids listing explicitely the vectors we want to intercept, and >>>>>>>> so, should allow some changes to happen in the kernel without having to >>>>>>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>>>>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>>>>>> for now). >>>>>>> >>>>>>> Crashes on boot, SMP at least. Investigating. >>>>>> >>>>>> Well, I tested it in SMP, so, the crash is probably due to some option I >>>>>> could not activate (such as IRQ_REMAP, the one triggering the use of >>>>>> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >>>>>> missing, because it does not compile in UP mode without APIC/IO-APIC. >>>>> >>>>> Not sure what the precise option difference is, but the bug is that you >>>>> write to vector_irq before __setup_vector_irq is finished. >>>> >>>> >>>> Well, yes, the writes to vector_irq should happen even before it is >>>> started, that is the point. The test at the end of setup_vector_irq skip >>>> the... Ah, wait, a hunk is missing (I did test this modification in the >>>> middle of the LAPIC TPR modification, so the commit is not actually tested). >>>> >>>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >>>> index 93f84c9..91d1ee9 100644 >>>> --- a/arch/x86/kernel/apic/io_apic.c >>>> +++ b/arch/x86/kernel/apic/io_apic.c >>>> @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct >>>> irq_cfg *cfg) >>>> void __setup_vector_irq(int cpu) >>>> { >>>> /* Initialize vector_irq on a new cpu */ >>>> - int irq, vector; >>>> + int irq, vector, vectors_limit; >>>> struct irq_cfg *cfg; >>>> >>>> +#ifndef CONFIG_IPIPE >>>> + vectors_limit = NR_VECTORS; >>>> +#else >>>> + vectors_limit = first_system_vector; >>>> +#endif /* CONFIG_IPIPE */ >>>> + >>>> /* >>>> * vector_lock will make sure that we don't run into irq vector >>>> * assignments that might be happening on another cpu in parallel, >>>> @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) >>>> per_cpu(vector_irq, cpu)[vector] = irq; >>>> } >>>> /* Mark the free vectors */ >>>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>>> + for (vector = 0; vector < first_system_vector; ++vector) { >>>> + if (vector == IRQ_MOVE_CLEANUP_VECTOR) >>>> + continue; >>>> + >>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>> if (irq < 0) >>>> continue; >>>> >>>> >>> >>> Err, that's not really getting cleaner or easier maintainable than the >>> original version, does it? Also, some loop below this clears out certain >>> vector_irq entries again - are you sure that this is what we want? I >>> vaguely recall that there was a very good reason to patch vector_irq >>> afterward... >>> >>> Will post the old variant, at least for comparison. >> >> >> The point is to avoid having to enumerate the vectors by hand. If a >> system vector is added, no further modification is needed. > > I see your intention, but this comes at the price of changing other > kernel code that works against us due to the initialization order. > > Also, we only have two ranges to address: MOVE_CLEANUP and everything > > first_system_vector. Your patch also have to deal with both of them > separately. Yes, and on the other hand, the function you propose could use a loop from system_vector to NR_VECTORS, so, basically not depend on a systematic enumeration of the used system vectors. We simply have to ensure that it is called after the calls to alloc_intr_gate. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 18:55 ` Jan Kiszka 2012-09-18 19:01 ` Gilles Chanteperdrix @ 2012-09-18 19:05 ` Gilles Chanteperdrix 2012-09-18 19:10 ` Jan Kiszka 1 sibling, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 19:05 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On 09/18/2012 08:55 PM, Jan Kiszka wrote: > On 2012-09-18 20:31, Gilles Chanteperdrix wrote: >> On 09/18/2012 08:25 PM, Jan Kiszka wrote: >> >>> On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >>>> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>>>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>>>> >>>>>>>>> Hi there! >>>>>>>>> >>>>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>>>> >>>>>>>>> ------------[ cut here ]------------ >>>>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>> CPU 0 >>>>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>>>> >>>>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>>>> Stack: >>>>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>>>> Call Trace: >>>>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>> RSP <ffffffff8177bbe0> >>>>>>>>> >>>>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>>>> in the per_cpu array vector_irq[]. >>>>>>>>> >>>>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>>>> any x86-branch in-between). >>>>>>>> >>>>>>>> >>>>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>>>> __ipipe_handle_irq as well. >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>>>> space every so often, and does not restrict the special vectors to the >>>>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>>>> >>>>>> I propose the following change: >>>>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>>>> >>>>>> Which avoids listing explicitely the vectors we want to intercept, and >>>>>> so, should allow some changes to happen in the kernel without having to >>>>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>>>> for now). >>>>> >>>>> Crashes on boot, SMP at least. Investigating. >>>> >>>> Well, I tested it in SMP, so, the crash is probably due to some option I >>>> could not activate (such as IRQ_REMAP, the one triggering the use of >>>> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >>>> missing, because it does not compile in UP mode without APIC/IO-APIC. >>> >>> Not sure what the precise option difference is, but the bug is that you >>> write to vector_irq before __setup_vector_irq is finished. >> >> >> Well, yes, the writes to vector_irq should happen even before it is >> started, that is the point. The test at the end of setup_vector_irq skip >> the... Ah, wait, a hunk is missing (I did test this modification in the >> middle of the LAPIC TPR modification, so the commit is not actually tested). >> >> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >> index 93f84c9..91d1ee9 100644 >> --- a/arch/x86/kernel/apic/io_apic.c >> +++ b/arch/x86/kernel/apic/io_apic.c >> @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct >> irq_cfg *cfg) >> void __setup_vector_irq(int cpu) >> { >> /* Initialize vector_irq on a new cpu */ >> - int irq, vector; >> + int irq, vector, vectors_limit; >> struct irq_cfg *cfg; >> >> +#ifndef CONFIG_IPIPE >> + vectors_limit = NR_VECTORS; >> +#else >> + vectors_limit = first_system_vector; >> +#endif /* CONFIG_IPIPE */ >> + >> /* >> * vector_lock will make sure that we don't run into irq vector >> * assignments that might be happening on another cpu in parallel, >> @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) >> per_cpu(vector_irq, cpu)[vector] = irq; >> } >> /* Mark the free vectors */ >> - for (vector = 0; vector < NR_VECTORS; ++vector) { >> + for (vector = 0; vector < first_system_vector; ++vector) { >> + if (vector == IRQ_MOVE_CLEANUP_VECTOR) >> + continue; >> + >> irq = per_cpu(vector_irq, cpu)[vector]; >> if (irq < 0) >> continue; >> >> > > Err, that's not really getting cleaner or easier maintainable than the > original version, does it? Also, some loop below this clears out certain > vector_irq entries again - are you sure that this is what we want? I > vaguely recall that there was a very good reason to patch vector_irq > afterward... > > Will post the old variant, at least for comparison. rah, vectors_limit is not needed at all. I do not see which look you are talking about. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 19:05 ` Gilles Chanteperdrix @ 2012-09-18 19:10 ` Jan Kiszka 2012-09-18 19:12 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-18 19:10 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-18 21:05, Gilles Chanteperdrix wrote: > On 09/18/2012 08:55 PM, Jan Kiszka wrote: > >> On 2012-09-18 20:31, Gilles Chanteperdrix wrote: >>> On 09/18/2012 08:25 PM, Jan Kiszka wrote: >>> >>>> On 2012-09-18 19:28, Gilles Chanteperdrix wrote: >>>>> On 09/18/2012 07:25 PM, Jan Kiszka wrote: >>>>>> On 2012-09-17 10:13, Gilles Chanteperdrix wrote: >>>>>>> On 09/16/2012 10:50 AM, Philippe Gerum wrote: >>>>>>>> On 09/16/2012 12:26 AM, Gilles Chanteperdrix wrote: >>>>>>>>> On 09/11/2012 05:56 PM, Gernot Hillier wrote: >>>>>>>>> >>>>>>>>>> Hi there! >>>>>>>>>> >>>>>>>>>> While testing ipipe-core3.2 on an SMP x86 machine, I found a reproducible >>>>>>>>>> kernel BUG after some seconds after starting irqbalance: >>>>>>>>>> >>>>>>>>>> ------------[ cut here ]------------ >>>>>>>>>> kernel BUG at arch/x86/kernel/ipipe.c:592! >>>>>>>>>> invalid opcode: 0000 [#1] SMP >>>>>>>>>> CPU 0 >>>>>>>>>> Modules linked in: des_generic md4 i7core_edac psmouse nls_cp437 edac_core cifs serio_raw joydev raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid mpt2sas scsi_transport_sas raid_class igb raid6_pq async_tx raid1 raid0 multipath linear >>>>>>>>>> >>>>>>>>>> Pid: 0, comm: swapper/0 Not tainted 3.2.21-9-xenomai #3 Siemens AG Healthcare Sector MARS 2.1/X8DTH >>>>>>>>>> RIP: 0010:[<ffffffff8101edec>] [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>>> RSP: 0018:ffffffff8177bbe0 EFLAGS: 00010086 >>>>>>>>>> RAX: 000000000000d880 RBX: 00000000ffffffff RCX: 0000000000000092 >>>>>>>>>> RDX: ffffffffffffffdf RSI: ffffffff8177bc18 RDI: ffffffff8177bbf8 >>>>>>>>>> RBP: ffffffff8177bc00 R08: 0000000000000001 R09: 0000000000000000 >>>>>>>>>> R10: ffff880624ebaef8 R11: 000000000029fbc4 R12: 000000000000d880 >>>>>>>>>> R13: ffffffff8177bbf8 R14: ffff880624e00000 R15: ffff880624e0d880 >>>>>>>>>> FS: 0000000000000000(0000) GS:ffff880624e00000(0000) knlGS:0000000000000000 >>>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >>>>>>>>>> CR2: 00007f452a2efb80 CR3: 0000000c114d3000 CR4: 00000000000006f0 >>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >>>>>>>>>> Process swapper/0 (pid: 0, threadinfo ffffffff81778000, task ffffffff81787020) >>>>>>>>>> Stack: >>>>>>>>>> 0000000000000000 ffffffff8177bfd8 0000000000000063 ffff880624e1f9a8 >>>>>>>>>> ffffffff8177bca8 ffffffff815a44dd ffffffff8177bc18 ffffffff8177bca8 >>>>>>>>>> 00000000815a373b 000000000029fbc4 ffff880624eba570 0000000000000000 >>>>>>>>>> Call Trace: >>>>>>>>>> [<ffffffff815a44dd>] irq_move_cleanup_interrupt+0x5d/0x90 >>>>>>>>>> [<ffffffff815a6649>] ? call_softirq+0x19/0x30 >>>>>>>>>> [<ffffffff810043d5>] do_softirq+0xc5/0x100 >>>>>>>>>> [<ffffffff81050955>] irq_exit+0xd5/0xf0 >>>>>>>>>> [<ffffffff815a6e1f>] do_IRQ+0x6f/0x100 >>>>>>>>>> [<ffffffff815a6db0>] ? __entry_text_end+0x5/0x5 >>>>>>>>>> [<ffffffff8101f4d3>] __ipipe_do_IRQ+0x83/0xa0 >>>>>>>>>> [<ffffffff8101f4d9>] ? __ipipe_do_IRQ+0x89/0xa0 >>>>>>>>>> [<ffffffff810b7aaa>] __ipipe_dispatch_irq_fast+0x16a/0x170 >>>>>>>>>> [<ffffffff810b87b9>] __ipipe_dispatch_irq+0xe9/0x210 >>>>>>>>>> [<ffffffff8101eca1>] __ipipe_handle_irq+0x71/0x1d0 >>>>>>>>>> [<ffffffff8159bae0>] common_interrupt+0x60/0x81 >>>>>>>>>> [<ffffffff8101f3c4>] ? __ipipe_halt_root+0x34/0x50 >>>>>>>>>> [<ffffffff8101f3b7>] ? __ipipe_halt_root+0x27/0x50 >>>>>>>>>> [<ffffffff8100b146>] default_idle+0x66/0x1a0 >>>>>>>>>> [<ffffffff810020cf>] cpu_idle+0xaf/0x100 >>>>>>>>>> [<ffffffff8157ec22>] rest_init+0x72/0x80 >>>>>>>>>> [<ffffffff81aeacca>] start_kernel+0x3b4/0x3bf >>>>>>>>>> [<ffffffff81aea321>] x86_64_start_reservations+0x131/0x135 >>>>>>>>>> [<ffffffff81aea456>] x86_64_start_kernel+0x131/0x138 >>>>>>>>>> Code: ff ff 0f 1f 44 00 00 48 83 a0 98 06 00 00 fe 4c 89 ee bf 20 00 00 00 e8 63 83 09 00 e9 f6 fe ff ff be 01 00 00 00 e9 ab fe ff ff <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 >>>>>>>>>> RIP [<ffffffff8101edec>] __ipipe_handle_irq+0x1bc/0x1d0 >>>>>>>>>> RSP <ffffffff8177bbe0> >>>>>>>>>> >>>>>>>>>> This seems to be caused by a missing entry for IRQ_MOVE_CLEANUP_VECTOR >>>>>>>>>> in the per_cpu array vector_irq[]. >>>>>>>>>> >>>>>>>>>> I found that ipipe_init_vector_irq() (which used to add the needed >>>>>>>>>> entry) was factored out from arch/x86/kernel/ipipe.c. This likely >>>>>>>>>> happened when porting from 2.6.38 to 3.1 - at least I can still see the >>>>>>>>>> code in ipipe-2.6.38-x86 and missed it in ipipe-core3.1 (and didn't find >>>>>>>>>> any x86-branch in-between). >>>>>>>>> >>>>>>>>> >>>>>>>>> If I understand correctly, ipipe_init_vector_irq is no longer needed >>>>>>>>> because the I-pipe core uses ipipe_apic_vector_irq for vectors above >>>>>>>>> FIRST_SYSTEM_VECTOR. All system vectors are above this limit... except >>>>>>>>> IRQ_MOVE_CLEANUP_VECTOR. So, your patch is correct. Another one which >>>>>>>>> should work is to handle the special case IRQ_MOVE_CLEANUP_IRQ in >>>>>>>>> __ipipe_handle_irq as well. >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> This is correct, but unfortunately, upstream reshuffles the IRQ vector >>>>>>>> space every so often, and does not restrict the special vectors to the >>>>>>>> system range anymore. Therefore, we should re-introduce a post-setup >>>>>>>> routine for the vector->irq map, grouping all fixups we need in one place. >>>>>>>> >>>>>>> I propose the following change: >>>>>>> http://git.xenomai.org/?p=ipipe-gch.git;a=commitdiff;h=9d131dc33080cda3f7e40342210d9338dc0c3d02 >>>>>>> >>>>>>> Which avoids listing explicitely the vectors we want to intercept, and >>>>>>> so, should allow some changes to happen in the kernel without having to >>>>>>> care too much (except for vectors sur as IRQ_MOVE_CLEANUP_VECTOR which >>>>>>> do not go through alloc_intr_gate, but this vector is the only exception, >>>>>>> for now). >>>>>> >>>>>> Crashes on boot, SMP at least. Investigating. >>>>> >>>>> Well, I tested it in SMP, so, the crash is probably due to some option I >>>>> could not activate (such as IRQ_REMAP, the one triggering the use of >>>>> IRQ_MOVE_CLEANUP_VECTOR). One thing is sure however, some #ifdef is >>>>> missing, because it does not compile in UP mode without APIC/IO-APIC. >>>> >>>> Not sure what the precise option difference is, but the bug is that you >>>> write to vector_irq before __setup_vector_irq is finished. >>> >>> >>> Well, yes, the writes to vector_irq should happen even before it is >>> started, that is the point. The test at the end of setup_vector_irq skip >>> the... Ah, wait, a hunk is missing (I did test this modification in the >>> middle of the LAPIC TPR modification, so the commit is not actually tested). >>> >>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >>> index 93f84c9..91d1ee9 100644 >>> --- a/arch/x86/kernel/apic/io_apic.c >>> +++ b/arch/x86/kernel/apic/io_apic.c >>> @@ -1290,9 +1290,15 @@ static void __clear_irq_vector(int irq, struct >>> irq_cfg *cfg) >>> void __setup_vector_irq(int cpu) >>> { >>> /* Initialize vector_irq on a new cpu */ >>> - int irq, vector; >>> + int irq, vector, vectors_limit; >>> struct irq_cfg *cfg; >>> >>> +#ifndef CONFIG_IPIPE >>> + vectors_limit = NR_VECTORS; >>> +#else >>> + vectors_limit = first_system_vector; >>> +#endif /* CONFIG_IPIPE */ >>> + >>> /* >>> * vector_lock will make sure that we don't run into irq vector >>> * assignments that might be happening on another cpu in parallel, >>> @@ -1317,7 +1323,10 @@ void __setup_vector_irq(int cpu) >>> per_cpu(vector_irq, cpu)[vector] = irq; >>> } >>> /* Mark the free vectors */ >>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>> + for (vector = 0; vector < first_system_vector; ++vector) { >>> + if (vector == IRQ_MOVE_CLEANUP_VECTOR) >>> + continue; >>> + >>> irq = per_cpu(vector_irq, cpu)[vector]; >>> if (irq < 0) >>> continue; >>> >>> >> >> Err, that's not really getting cleaner or easier maintainable than the >> original version, does it? Also, some loop below this clears out certain >> vector_irq entries again - are you sure that this is what we want? I >> vaguely recall that there was a very good reason to patch vector_irq >> afterward... >> > >> Will post the old variant, at least for comparison. > > rah, vectors_limit is not needed at all. I do not see which look you are > talking about. cfg = irq_cfg(irq); if (!cpumask_test_cpu(cpu, cfg->domain)) per_cpu(vector_irq, cpu)[vector] = -1; Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 19:10 ` Jan Kiszka @ 2012-09-18 19:12 ` Gilles Chanteperdrix 2012-09-18 19:18 ` Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 19:12 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On 09/18/2012 09:10 PM, Jan Kiszka wrote: > On 2012-09-18 21:05, Gilles Chanteperdrix wrote: >> rah, vectors_limit is not needed at all. I do not see which look you are >> talking about. > > cfg = irq_cfg(irq); > if (!cpumask_test_cpu(cpu, cfg->domain)) > per_cpu(vector_irq, cpu)[vector] = -1; Yes, but : >>>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>>> + for (vector = 0; vector < first_system_vector; ++vector) { -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 19:12 ` Gilles Chanteperdrix @ 2012-09-18 19:18 ` Jan Kiszka 2012-09-18 19:22 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-18 19:18 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-18 21:12, Gilles Chanteperdrix wrote: > On 09/18/2012 09:10 PM, Jan Kiszka wrote: > >> On 2012-09-18 21:05, Gilles Chanteperdrix wrote: >>> rah, vectors_limit is not needed at all. I do not see which look you are >>> talking about. >> >> cfg = irq_cfg(irq); >> if (!cpumask_test_cpu(cpu, cfg->domain)) >> per_cpu(vector_irq, cpu)[vector] = -1; > > > Yes, but : > >>>>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>>>> + for (vector = 0; vector < first_system_vector; ++vector) { > And you know all side effects of that change? My point is: We have a working version, released and tested on machines that make use of the affected code paths. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 2012-09-18 19:18 ` Jan Kiszka @ 2012-09-18 19:22 ` Gilles Chanteperdrix 2012-09-19 14:17 ` [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-18 19:22 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org On 09/18/2012 09:18 PM, Jan Kiszka wrote: > On 2012-09-18 21:12, Gilles Chanteperdrix wrote: >> On 09/18/2012 09:10 PM, Jan Kiszka wrote: >> >>> On 2012-09-18 21:05, Gilles Chanteperdrix wrote: >>>> rah, vectors_limit is not needed at all. I do not see which look you are >>>> talking about. >>> >>> cfg = irq_cfg(irq); >>> if (!cpumask_test_cpu(cpu, cfg->domain)) >>> per_cpu(vector_irq, cpu)[vector] = -1; >> >> >> Yes, but : >> >>>>>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>>>>> + for (vector = 0; vector < first_system_vector; ++vector) { >> > > And you know all side effects of that change? > > My point is: We have a working version, released and tested on machines > that make use of the affected code paths. Come on, look at the code: for (vector = 0; vector < first_system_vector; ++vector) { if (vector == IRQ_MOVE_CLEANUP_VECTOR) continue; irq = per_cpu(vector_irq, cpu)[vector]; if (irq < 0) continue; cfg = irq_cfg(irq); if (!cpumask_test_cpu(cpu, cfg->domain)) per_cpu(vector_irq, cpu)[vector] = -1; } It simply skips an initialization to -1 because we know that we already initialized this part of the array elsewhere. I do not claim that it has no bug, but you can test this approach too, and it will have been tested too... -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-18 19:22 ` Gilles Chanteperdrix @ 2012-09-19 14:17 ` Jan Kiszka 2012-09-19 14:28 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-19 14:17 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-18 21:22, Gilles Chanteperdrix wrote: > On 09/18/2012 09:18 PM, Jan Kiszka wrote: > >> On 2012-09-18 21:12, Gilles Chanteperdrix wrote: >>> On 09/18/2012 09:10 PM, Jan Kiszka wrote: >>> >>>> On 2012-09-18 21:05, Gilles Chanteperdrix wrote: >>>>> rah, vectors_limit is not needed at all. I do not see which look you are >>>>> talking about. >>>> >>>> cfg = irq_cfg(irq); >>>> if (!cpumask_test_cpu(cpu, cfg->domain)) >>>> per_cpu(vector_irq, cpu)[vector] = -1; >>> >>> >>> Yes, but : >>> >>>>>>> - for (vector = 0; vector < NR_VECTORS; ++vector) { >>>>>>> + for (vector = 0; vector < first_system_vector; ++vector) { >>> >> >> And you know all side effects of that change? >> >> My point is: We have a working version, released and tested on machines >> that make use of the affected code paths. > > > Come on, look at the code: > > for (vector = 0; vector < first_system_vector; ++vector) { > if (vector == IRQ_MOVE_CLEANUP_VECTOR) > continue; > > irq = per_cpu(vector_irq, cpu)[vector]; > if (irq < 0) > continue; > > cfg = irq_cfg(irq); > if (!cpumask_test_cpu(cpu, cfg->domain)) > per_cpu(vector_irq, cpu)[vector] = -1; > } > > It simply skips an initialization to -1 because we know that we already > initialized this part of the array elsewhere. Yes, it's simple and obvious - and therefore it was wrong. To be fair: also in 2.6.38. Below a patch - using your scheme - that survived both another review and testing round. And it builds for APIC-less x86-32. Jan ---8<--- Based on patch by Gilles: This fixes the dispatching of IRQ_MOVE_CLEANUP_VECTOR in __ipipe_handle_irq and simplifies those lines by always using vector_irq to translate a vector to a Linux IRQ number. In contrast to the version in 2.6.38 and before, this one also properly marks unused vectors above first_system_vector as free. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> --- arch/x86/include/asm/desc.h | 8 ++++++++ arch/x86/kernel/apic/io_apic.c | 3 +++ arch/x86/kernel/ipipe.c | 12 ++---------- arch/x86/kernel/irqinit.c | 5 +++++ 4 files changed, 18 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index e95822d..5a5b415 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -4,6 +4,7 @@ #include <asm/desc_defs.h> #include <asm/ldt.h> #include <asm/mmu.h> +#include <asm/hw_irq.h> #include <linux/smp.h> @@ -351,6 +352,13 @@ extern unsigned long used_vectors[]; static inline void alloc_system_vector(int vector) { if (!test_bit(vector, used_vectors)) { +#ifdef CONFIG_X86_LOCAL_APIC + unsigned cpu; + + for_each_possible_cpu(cpu) + per_cpu(vector_irq, cpu)[vector] = + ipipe_apic_vector_irq(vector); +#endif set_bit(vector, used_vectors); if (first_system_vector > vector) first_system_vector = vector; diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 93f84c9..0dd658d 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1318,6 +1318,9 @@ void __setup_vector_irq(int cpu) } /* Mark the free vectors */ for (vector = 0; vector < NR_VECTORS; ++vector) { + /* I-pipe requires initialized vector_irq for system vectors */ + if (test_bit(vector, used_vectors)) + continue; irq = per_cpu(vector_irq, cpu)[vector]; if (irq < 0) continue; diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c index 1a56b5f..fcf8113 100644 --- a/arch/x86/kernel/ipipe.c +++ b/arch/x86/kernel/ipipe.c @@ -581,16 +581,8 @@ int __ipipe_handle_irq(struct pt_regs *regs) int irq, vector = regs->orig_ax, flags = 0; if (likely(vector < 0)) { - vector = ~vector; -#ifdef CONFIG_X86_LOCAL_APIC - if (vector >= FIRST_SYSTEM_VECTOR) - irq = ipipe_apic_vector_irq(vector); - else -#endif /* CONFIG_X86_LOCAL_APIC */ - { - irq = __this_cpu_read(vector_irq[vector]); - BUG_ON(irq < 0); - } + irq = __this_cpu_read(vector_irq[~vector]); + BUG_ON(irq < 0); } else { /* Software-generated. */ irq = vector; flags = IPIPE_IRQF_NOACK; diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index 6fa97b2..dd348e1 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -165,6 +165,8 @@ static void __init smp_intr_init(void) { #ifdef CONFIG_SMP #if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC) + unsigned cpu; + /* * The reschedule interrupt is a CPU-to-CPU reschedule-helper * IPI, driven by wakeup. @@ -254,6 +256,9 @@ static void __init smp_intr_init(void) /* Low priority IPI to cleanup after moving an irq */ set_intr_gate(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt); set_bit(IRQ_MOVE_CLEANUP_VECTOR, used_vectors); + for_each_possible_cpu(cpu) + per_cpu(vector_irq, cpu)[IRQ_MOVE_CLEANUP_VECTOR] = + IRQ_MOVE_CLEANUP_VECTOR; /* IPI used for rebooting/stopping */ alloc_intr_gate(REBOOT_VECTOR, reboot_interrupt); -- 1.7.3.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 14:17 ` [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors Jan Kiszka @ 2012-09-19 14:28 ` Gilles Chanteperdrix 2012-09-19 14:33 ` Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-19 14:28 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org Jan Kiszka wrote: > for (vector = 0; vector < NR_VECTORS; ++vector) { > + /* I-pipe requires initialized vector_irq for system vectors */ > + if (test_bit(vector, used_vectors)) > + continue; > irq = per_cpu(vector_irq, cpu)[vector]; > if (irq < 0) > continue; This looks wrong to me: you are skipping a part a code that was setting to -1 vectors that were allocated but not used on this cpu. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 14:28 ` Gilles Chanteperdrix @ 2012-09-19 14:33 ` Jan Kiszka 2012-09-19 14:46 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-19 14:33 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-19 16:28, Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: >> for (vector = 0; vector < NR_VECTORS; ++vector) { >> + /* I-pipe requires initialized vector_irq for system vectors */ >> + if (test_bit(vector, used_vectors)) >> + continue; >> irq = per_cpu(vector_irq, cpu)[vector]; >> if (irq < 0) >> continue; > > This looks wrong to me: you are skipping a part a code that was setting to > -1 vectors that were allocated but not used on this cpu. I'm skipping only used_vectors, i.e system vectors. They will be set up for I-pipe for that well-known reason. I'm no longer skipping to clear IRQ vectors that are not used on the current CPU (!cpumask_test_cpu(cpu, cfg->domain)). That is what was missing in both approaches so far. Note: vector_free = (vector_irq == -1) && !used_vector Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 14:33 ` Jan Kiszka @ 2012-09-19 14:46 ` Gilles Chanteperdrix 2012-09-19 14:48 ` Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-19 14:46 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org Jan Kiszka wrote: > On 2012-09-19 16:28, Gilles Chanteperdrix wrote: >> >> Jan Kiszka wrote: >>> for (vector = 0; vector < NR_VECTORS; ++vector) { >>> + /* I-pipe requires initialized vector_irq for system vectors */ >>> + if (test_bit(vector, used_vectors)) >>> + continue; >>> irq = per_cpu(vector_irq, cpu)[vector]; >>> if (irq < 0) >>> continue; >> >> This looks wrong to me: you are skipping a part a code that was setting >> to >> -1 vectors that were allocated but not used on this cpu. > > I'm skipping only used_vectors, i.e system vectors. They will be set up > for I-pipe for that well-known reason. I'm no longer skipping to clear > IRQ vectors that are not used on the current CPU (!cpumask_test_cpu(cpu, > cfg->domain)). That is what was missing in both approaches so far. > > Note: vector_free = (vector_irq == -1) && !used_vector Let us go back to the obvious loop: > for (vector = 0; vector < first_system_vector; ++vector) { > if (vector == IRQ_MOVE_CLEANUP_VECTOR) > continue; > > irq = per_cpu(vector_irq, cpu)[vector]; > if (irq < 0) > continue; > > cfg = irq_cfg(irq); > if (!cpumask_test_cpu(cpu, cfg->domain)) > per_cpu(vector_irq, cpu)[vector] = -1; > } >From what I understand, this function is called when a CPU is started, so, for instance, with hotplug cpu, on a system that has been running for days and so had a lot of time to allocate a lot of vectors. Now, some of these vectors are known not to be needed on the cpu we are currently starting. What this loop does, is for such vectors, to set the vector_irq entry on the cpu we are currently starting to -1. Now, such vectors will certainly have the used_vectors bit to 1. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 14:46 ` Gilles Chanteperdrix @ 2012-09-19 14:48 ` Jan Kiszka 2012-09-19 15:02 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-19 14:48 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-19 16:46, Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: >> On 2012-09-19 16:28, Gilles Chanteperdrix wrote: >>> >>> Jan Kiszka wrote: >>>> for (vector = 0; vector < NR_VECTORS; ++vector) { >>>> + /* I-pipe requires initialized vector_irq for system vectors */ >>>> + if (test_bit(vector, used_vectors)) >>>> + continue; >>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>> if (irq < 0) >>>> continue; >>> >>> This looks wrong to me: you are skipping a part a code that was setting >>> to >>> -1 vectors that were allocated but not used on this cpu. >> >> I'm skipping only used_vectors, i.e system vectors. They will be set up >> for I-pipe for that well-known reason. I'm no longer skipping to clear >> IRQ vectors that are not used on the current CPU (!cpumask_test_cpu(cpu, >> cfg->domain)). That is what was missing in both approaches so far. >> >> Note: vector_free = (vector_irq == -1) && !used_vector > > Let us go back to the obvious loop: > >> for (vector = 0; vector < first_system_vector; ++vector) { >> if (vector == IRQ_MOVE_CLEANUP_VECTOR) >> continue; >> >> irq = per_cpu(vector_irq, cpu)[vector]; >> if (irq < 0) >> continue; >> >> cfg = irq_cfg(irq); >> if (!cpumask_test_cpu(cpu, cfg->domain)) >> per_cpu(vector_irq, cpu)[vector] = -1; >> } > >>From what I understand, this function is called when a CPU is started, > so, for instance, with hotplug cpu, on a system that has been running > for days and so had a lot of time to allocate a lot of vectors. > > Now, some of these vectors are known not to be needed on the cpu we > are currently starting. What this loop does, is for such vectors, to > set the vector_irq entry on the cpu we are currently starting to -1. > > Now, such vectors will certainly have the used_vectors bit to 1. used_vectors should be called "used_for_system_vector". Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 14:48 ` Jan Kiszka @ 2012-09-19 15:02 ` Gilles Chanteperdrix 2012-09-19 15:18 ` Jan Kiszka 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-19 15:02 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org Jan Kiszka wrote: > On 2012-09-19 16:46, Gilles Chanteperdrix wrote: >> >> Jan Kiszka wrote: >>> On 2012-09-19 16:28, Gilles Chanteperdrix wrote: >>>> >>>> Jan Kiszka wrote: >>>>> for (vector = 0; vector < NR_VECTORS; ++vector) { >>>>> + /* I-pipe requires initialized vector_irq for system vectors */ >>>>> + if (test_bit(vector, used_vectors)) >>>>> + continue; >>>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>>> if (irq < 0) >>>>> continue; >>>> >>>> This looks wrong to me: you are skipping a part a code that was >>>> setting >>>> to >>>> -1 vectors that were allocated but not used on this cpu. >>> >>> I'm skipping only used_vectors, i.e system vectors. They will be set up >>> for I-pipe for that well-known reason. I'm no longer skipping to clear >>> IRQ vectors that are not used on the current CPU >>> (!cpumask_test_cpu(cpu, >>> cfg->domain)). That is what was missing in both approaches so far. >>> >>> Note: vector_free = (vector_irq == -1) && !used_vector >> >> Let us go back to the obvious loop: >> >>> for (vector = 0; vector < first_system_vector; ++vector) { >>> if (vector == IRQ_MOVE_CLEANUP_VECTOR) >>> continue; >>> >>> irq = per_cpu(vector_irq, cpu)[vector]; >>> if (irq < 0) >>> continue; >>> >>> cfg = irq_cfg(irq); >>> if (!cpumask_test_cpu(cpu, cfg->domain)) >>> per_cpu(vector_irq, cpu)[vector] = -1; >>> } >> >>>From what I understand, this function is called when a CPU is started, >> so, for instance, with hotplug cpu, on a system that has been running >> for days and so had a lot of time to allocate a lot of vectors. >> >> Now, some of these vectors are known not to be needed on the cpu we >> are currently starting. What this loop does, is for such vectors, to >> set the vector_irq entry on the cpu we are currently starting to -1. >> >> Now, such vectors will certainly have the used_vectors bit to 1. > > used_vectors should be called "used_for_system_vector". What about TLB flush vectors, are they not restricted to some cpus? -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 15:02 ` Gilles Chanteperdrix @ 2012-09-19 15:18 ` Jan Kiszka 2012-09-19 16:09 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Jan Kiszka @ 2012-09-19 15:18 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai@xenomai.org On 2012-09-19 17:02, Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: >> On 2012-09-19 16:46, Gilles Chanteperdrix wrote: >>> >>> Jan Kiszka wrote: >>>> On 2012-09-19 16:28, Gilles Chanteperdrix wrote: >>>>> >>>>> Jan Kiszka wrote: >>>>>> for (vector = 0; vector < NR_VECTORS; ++vector) { >>>>>> + /* I-pipe requires initialized vector_irq for system vectors */ >>>>>> + if (test_bit(vector, used_vectors)) >>>>>> + continue; >>>>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>>>> if (irq < 0) >>>>>> continue; >>>>> >>>>> This looks wrong to me: you are skipping a part a code that was >>>>> setting >>>>> to >>>>> -1 vectors that were allocated but not used on this cpu. >>>> >>>> I'm skipping only used_vectors, i.e system vectors. They will be set up >>>> for I-pipe for that well-known reason. I'm no longer skipping to clear >>>> IRQ vectors that are not used on the current CPU >>>> (!cpumask_test_cpu(cpu, >>>> cfg->domain)). That is what was missing in both approaches so far. >>>> >>>> Note: vector_free = (vector_irq == -1) && !used_vector >>> >>> Let us go back to the obvious loop: >>> >>>> for (vector = 0; vector < first_system_vector; ++vector) { >>>> if (vector == IRQ_MOVE_CLEANUP_VECTOR) >>>> continue; >>>> >>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>> if (irq < 0) >>>> continue; >>>> >>>> cfg = irq_cfg(irq); >>>> if (!cpumask_test_cpu(cpu, cfg->domain)) >>>> per_cpu(vector_irq, cpu)[vector] = -1; >>>> } >>> >>> >From what I understand, this function is called when a CPU is started, >>> so, for instance, with hotplug cpu, on a system that has been running >>> for days and so had a lot of time to allocate a lot of vectors. >>> >>> Now, some of these vectors are known not to be needed on the cpu we >>> are currently starting. What this loop does, is for such vectors, to >>> set the vector_irq entry on the cpu we are currently starting to -1. >>> >>> Now, such vectors will certainly have the used_vectors bit to 1. >> >> used_vectors should be called "used_for_system_vector". > > What about TLB flush vectors, are they not restricted to some cpus? OK, let's try this step by step: So far, the original loop ("Mark the free vectors" went over all vectors, clearing those out in vector_irq that were filled for *external IRQs* by the first loop ("Mark the inuse vector") but are actually not in use on the current CPU. Linux does not fill any system vector into vector_irq, as we know. Now that we do enter certain system vectors here (MOVE_CLEANUP and those above first_system_vector), we must avoid that this clearing happens for them. My old version did this by overwriting what was cleared with that simply 1:1 mapping again. Your version avoided any clearing for MOVE_CLEANUP and >first_system_vector. Both were wrong as both left vectors marked as in use which are actually free. What we need is to still clear unused IRQ vectors, i.e. still run the full loop, but skip used system vectors. And the latter can be achieved by looking at used_vectors which is valid for all CPUs. Is this clearer? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors 2012-09-19 15:18 ` Jan Kiszka @ 2012-09-19 16:09 ` Gilles Chanteperdrix 0 siblings, 0 replies; 28+ messages in thread From: Gilles Chanteperdrix @ 2012-09-19 16:09 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai@xenomai.org Jan Kiszka wrote: > On 2012-09-19 17:02, Gilles Chanteperdrix wrote: >> >> Jan Kiszka wrote: >>> On 2012-09-19 16:46, Gilles Chanteperdrix wrote: >>>> >>>> Jan Kiszka wrote: >>>>> On 2012-09-19 16:28, Gilles Chanteperdrix wrote: >>>>>> >>>>>> Jan Kiszka wrote: >>>>>>> for (vector = 0; vector < NR_VECTORS; ++vector) { >>>>>>> + /* I-pipe requires initialized vector_irq for system vectors */ >>>>>>> + if (test_bit(vector, used_vectors)) >>>>>>> + continue; >>>>>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>>>>> if (irq < 0) >>>>>>> continue; >>>>>> >>>>>> This looks wrong to me: you are skipping a part a code that was >>>>>> setting >>>>>> to >>>>>> -1 vectors that were allocated but not used on this cpu. >>>>> >>>>> I'm skipping only used_vectors, i.e system vectors. They will be set >>>>> up >>>>> for I-pipe for that well-known reason. I'm no longer skipping to >>>>> clear >>>>> IRQ vectors that are not used on the current CPU >>>>> (!cpumask_test_cpu(cpu, >>>>> cfg->domain)). That is what was missing in both approaches so far. >>>>> >>>>> Note: vector_free = (vector_irq == -1) && !used_vector >>>> >>>> Let us go back to the obvious loop: >>>> >>>>> for (vector = 0; vector < first_system_vector; ++vector) { >>>>> if (vector == IRQ_MOVE_CLEANUP_VECTOR) >>>>> continue; >>>>> >>>>> irq = per_cpu(vector_irq, cpu)[vector]; >>>>> if (irq < 0) >>>>> continue; >>>>> >>>>> cfg = irq_cfg(irq); >>>>> if (!cpumask_test_cpu(cpu, cfg->domain)) >>>>> per_cpu(vector_irq, cpu)[vector] = -1; >>>>> } >>>> >>>> >From what I understand, this function is called when a CPU is >>>> started, >>>> so, for instance, with hotplug cpu, on a system that has been running >>>> for days and so had a lot of time to allocate a lot of vectors. >>>> >>>> Now, some of these vectors are known not to be needed on the cpu we >>>> are currently starting. What this loop does, is for such vectors, to >>>> set the vector_irq entry on the cpu we are currently starting to -1. >>>> >>>> Now, such vectors will certainly have the used_vectors bit to 1. >>> >>> used_vectors should be called "used_for_system_vector". >> >> What about TLB flush vectors, are they not restricted to some cpus? > > OK, let's try this step by step: > > So far, the original loop ("Mark the free vectors" went over all > vectors, clearing those out in vector_irq that were filled for *external > IRQs* by the first loop ("Mark the inuse vector") but are actually not > in use on the current CPU. Linux does not fill any system vector into > vector_irq, as we know. Ok, right, that is the answer to my question. per-cpu TLB vectors will not get set in vector_irq, and so, the loop will not be able to clear them on cpus where they are not used. > > Now that we do enter certain system vectors here (MOVE_CLEANUP and those > above first_system_vector), we must avoid that this clearing happens for > them. My old version did this by overwriting what was cleared with that > simply 1:1 mapping again. Your version avoided any clearing for > MOVE_CLEANUP and >first_system_vector. Both were wrong as both left > vectors marked as in use which are actually free. Which is not actually a big issue, as this will simply result in a oops in __ipipe_dispatch_irq due to a NULL descriptor, instead of a call to BUG(). No big difference... Ok, the bug may be easier to interpret. > > What we need is to still clear unused IRQ vectors, i.e. still run the > full loop, but skip used system vectors. And the latter can be achieved > by looking at used_vectors which is valid for all CPUs. > > Is this clearer? Ok, no problem. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2012-09-19 16:09 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-09-11 15:56 [Xenomai] ipipe/x86: kernel BUG due to missing IRQ_MOVE_CLEANUP_VECTOR entry in ipipe-core3.2 Gernot Hillier 2012-09-15 22:26 ` Gilles Chanteperdrix 2012-09-16 7:20 ` Jan Kiszka 2012-09-16 8:36 ` Philippe Gerum 2012-09-16 8:50 ` Philippe Gerum 2012-09-16 10:31 ` Gilles Chanteperdrix 2012-09-17 8:13 ` Gilles Chanteperdrix 2012-09-18 17:25 ` Jan Kiszka 2012-09-18 17:28 ` Gilles Chanteperdrix 2012-09-18 18:25 ` Jan Kiszka 2012-09-18 18:31 ` Gilles Chanteperdrix 2012-09-18 18:55 ` Jan Kiszka 2012-09-18 19:01 ` Gilles Chanteperdrix 2012-09-18 19:10 ` Jan Kiszka 2012-09-18 19:14 ` Gilles Chanteperdrix 2012-09-18 19:05 ` Gilles Chanteperdrix 2012-09-18 19:10 ` Jan Kiszka 2012-09-18 19:12 ` Gilles Chanteperdrix 2012-09-18 19:18 ` Jan Kiszka 2012-09-18 19:22 ` Gilles Chanteperdrix 2012-09-19 14:17 ` [Xenomai] [PATCH] ipipe: x86: Populate vector_irq for all dispatched vectors Jan Kiszka 2012-09-19 14:28 ` Gilles Chanteperdrix 2012-09-19 14:33 ` Jan Kiszka 2012-09-19 14:46 ` Gilles Chanteperdrix 2012-09-19 14:48 ` Jan Kiszka 2012-09-19 15:02 ` Gilles Chanteperdrix 2012-09-19 15:18 ` Jan Kiszka 2012-09-19 16:09 ` Gilles Chanteperdrix
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.