* [Xenomai-help] Kernel panic: not syncing
From: Petr Cervenka @ 2008-07-07 15:45 UTC
To: xenomai

Hello,
I'm not sure whether this is off topic.
We use Linux 2.6.24 and Xenomai 2.4.1. Occasionally (once every few days) we get a kernel panic, and I don't know whether it's our fault or a problem with the kernel/xenomai/adeos/configuration/hw/...
If you have any questions, I'll try to answer them. Any help is welcome.

Petr Cervenka

I will try to reproduce the visible part of the log below. I can mail you snapshots for details, if needed:
---------------------------------------------------------------
<IRQ> profile_tick+0x5e/0xa0
 tick_sched_timer+0x85/0x170
 hrtimer_interrupt+0x12f/0x1e0
 smp_apic_timer_interrupt+0x37/0x60
 __ipipe_sync_stage+0x350/0x355
 smp_apic_timer_interrupt+0x0/0x60
 __xirq_end+0x0/0x85
 smp_apic_timer_interrupt+0x0/0x60
 __ipipe_handle_irq+0x91/0x250
 default_idle+0x0/0x40
 common_interrupt+0x61/0x7d
<EOI> default_idle+0x29/0x40
 cpu_idle+0x8b/0x120
 start_kernel+0x2ba/0x350
 _sinittext+0x120/0x130

Code: 48 8b 11 48 89 d0 48 c1 e8 16 48 85 c0 75 1b 48 8b 51 08 48
RIP profile_pc+0x46/0x80
 RSP <ffffffff80664da0>
CR2: 0000000040090fb8
---[ end trace ccd2184e479f15c8 ]---
Kernel panic - not syncing: Aiee, killing interrupt handler!

Another one is similar, with the following differences:
----------------------------------------------------------------------
<IRQ> scheduler_tick+0xf8/0x140
 ... /* same part as before */
 smp_apic_timer_interrupt+0x0/0x60
 ipipe_suspend_domain+0xb2/0xf0
 __ipipe_walk_pipeline+0xee/0x150
 __ipipe_handle_irq+0x81/0x250
 common_interrupt+0x61/0x7d
<EOI>

Code: 0f 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56
RIP run_posix_cpu_timers+0x810/0x280
...
* Re: [Xenomai-help] Kernel panic: not syncing
From: Philippe Gerum @ 2008-07-07 15:59 UTC
To: Petr Cervenka
Cc: xenomai

Petr Cervenka wrote:
> Hello,
> I'm not sure whether this is off topic.
> We use Linux 2.6.24 and Xenomai 2.4.1. Occasionally (once every few days) we get a kernel panic, and I don't know whether it's our fault or a problem with the kernel/xenomai/adeos/configuration/hw/...
> If you have any questions, I'll try to answer them. Any help is welcome.

It is an I-pipe issue, probably. We have to somewhat forge the register frame
passed to the Linux tick handler, since we may delay that call. Some register
values the profiling code attempts to dereference to find the preempted code may
be wrong in our case.

Could you 1) send back a disassembly of the profile_tick routine in your kernel
image, and 2) apply the following patch to check whether it improves the
situation? TIA,

--- 2.6.24-x86-2.0-03/arch/x86/kernel/ipipe.c~	2008-02-11 10:48:24.000000000 +0100
+++ 2.6.24-x86-2.0-03/arch/x86/kernel/ipipe.c	2008-07-07 17:55:36.000000000 +0200
@@ -933,12 +933,7 @@
 		tick_regs->eip = regs.eip;
 		tick_regs->ebp = regs.ebp;
 #else /* !CONFIG_X86_32 */
-		tick_regs->ss = regs->ss;
-		tick_regs->rsp = regs->rsp;
-		tick_regs->eflags = regs->eflags;
-		tick_regs->cs = regs->cs;
-		tick_regs->rip = regs->rip;
-		tick_regs->rbp = regs->rbp;
+		*tick_regs = *regs;
 #endif /* !CONFIG_X86_32 */
 }

--
Philippe.
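For readers following the patch: the 64-bit branch goes from copying six selected fields to copying the whole structure. The difference can be sketched in plain user-space C. This is only an illustration; `struct demo_regs`, `copy_selected` and `copy_all` are made-up names, and the field list is a reduced stand-in for the kernel's `struct pt_regs`, not its real layout.

```c
/* Hypothetical, reduced stand-in for the kernel's struct pt_regs. */
struct demo_regs {
	unsigned long rdi;	/* example of a field the old code never copied */
	unsigned long rbp;
	unsigned long rip;
	unsigned long cs;
	unsigned long eflags;
	unsigned long rsp;
	unsigned long ss;
};

/* Behaviour before the patch: only six fields reach the forged frame,
 * every other field keeps whatever value tick_regs held before. */
void copy_selected(struct demo_regs *tick_regs, const struct demo_regs *regs)
{
	tick_regs->ss = regs->ss;
	tick_regs->rsp = regs->rsp;
	tick_regs->eflags = regs->eflags;
	tick_regs->cs = regs->cs;
	tick_regs->rip = regs->rip;
	tick_regs->rbp = regs->rbp;
}

/* Behaviour after the patch: structure assignment copies every field. */
void copy_all(struct demo_regs *tick_regs, const struct demo_regs *regs)
{
	*tick_regs = *regs;
}
```

With `copy_selected`, a field such as `rdi` in the destination stays stale; with `copy_all` it always matches the source. Whether that matters for this panic is exactly what the patch is meant to test.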
* Re: [Xenomai-help] Kernel panic: not syncing
From: Petr Cervenka @ 2008-07-08 8:31 UTC
To: rpm
Cc: xenomai

>It is an I-pipe issue, probably. We have to somewhat forge the register frame
>passed to the Linux tick handler, since we may delay that call. Some register
>values the profiling code attempts to dereference to find the preempted code may
>be wrong in our case.
>
>Could you 1) send back a disassembly of the profile_tick routine in your kernel
>image, then apply the following patch to check whether it improves the situation
>as well? TIA,
>

ad 1) I would like to, but I don't know how to do it. If you have a simple guide or a link, I would be grateful.
ad 2) I will apply the patch. But it will take days (or a week) before I can tell whether it helped or not.

Petr
* Re: [Xenomai-help] Kernel panic: not syncing
From: Jan Kiszka @ 2008-07-08 8:38 UTC
To: rpm
Cc: Petr Cervenka, xenomai

Philippe Gerum wrote:
> It is an I-pipe issue, probably. We have to somewhat forge the register frame
> passed to the Linux tick handler, since we may delay that call. Some register
> values the profiling code attempts to dereference to find the preempted code may
> be wrong in our case.
>
> [...]
>  #else /* !CONFIG_X86_32 */
> -		tick_regs->ss = regs->ss;
> -		tick_regs->rsp = regs->rsp;
> -		tick_regs->eflags = regs->eflags;
> -		tick_regs->cs = regs->cs;
> -		tick_regs->rip = regs->rip;
> -		tick_regs->rbp = regs->rbp;
> +		*tick_regs = *regs;
>  #endif /* !CONFIG_X86_32 */

I'm fairly sure that this won't make a difference. According to Petr's
first dump we crash in profile_pc, and there the kernel pokes around on
the stack of the interrupted context (Petr, you are running SMP,
right?). The question is whether this stack may have vanished or may have
been swapped out after capturing the registers. Or the test
"!user_mode(regs) && in_lock_functions(pc)" returns an invalid result
(Petr, do you run Xenomai kernel tasks?).

I do not yet see the scenario behind it, but a workaround for a
vanishing stack could be to cache sp[0] and sp[1] (as accessed in
profile_pc) and let a faked regs->rsp point to that cache. Nevertheless,
understanding the actual reason should remain a goal at the same time
(to avoid papering over an even more serious issue).

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
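The workaround Jan describes (cache sp[0] and sp[1], then point a faked rsp at the cache) can be modelled in user-space C roughly as follows. This is a sketch under assumptions, not I-pipe code: `struct demo_regs`, `struct tick_frame`, `capture_tick_frame` and `probe_stack` are hypothetical names, and the `>> 22` probe is taken from the `shr $0x16` instructions in the profile_pc disassembly posted later in this thread.

```c
#include <stdint.h>

/* Hypothetical, reduced register frame: only what the probe needs. */
struct demo_regs {
	uintptr_t rip;
	uintptr_t cs;
	uintptr_t rsp;
};

/* Hypothetical forged frame carrying its own copy of two stack words. */
struct tick_frame {
	struct demo_regs regs;
	uintptr_t sp_cache[2];
};

/* Capture step: runs while the preempted stack is still readable.
 * Copies the two words the profiler will look at and redirects rsp
 * to the private copy, so later reads never touch the old stack. */
void capture_tick_frame(struct tick_frame *f, const struct demo_regs *regs)
{
	const uintptr_t *sp = (const uintptr_t *)regs->rsp;

	f->regs = *regs;
	f->sp_cache[0] = sp[0];
	f->sp_cache[1] = sp[1];
	f->regs.rsp = (uintptr_t)f->sp_cache;
}

/* Model of the stack probe: a word with any bit from 22 upwards set
 * is taken to be a return address, otherwise fall back to rip. */
uintptr_t probe_stack(const struct demo_regs *regs)
{
	const uintptr_t *sp = (const uintptr_t *)regs->rsp;

	if (sp[0] >> 22)
		return sp[0];
	if (sp[1] >> 22)
		return sp[1];
	return regs->rip;
}
```

Calling `probe_stack(&f.regs)` after `capture_tick_frame` keeps returning the cached value even if the original stack words are overwritten in the meantime. The sketch only covers the "vanishing stack" hypothesis; it does nothing for a frame whose rsp was already invalid at capture time.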
* Re: [Xenomai-help] Kernel panic: not syncing
From: Gilles Chanteperdrix @ 2008-07-08 9:21 UTC
To: Jan Kiszka
Cc: Petr Cervenka, xenomai

Jan Kiszka wrote:
> I'm fairly sure that this won't make a difference. According to Petr's
> first dump we crash in profile_pc, and there the kernel pokes around on
> the stack of the interrupted context (Petr, you are running SMP,
> right?). The question is if this stack may have vanished or may have
> been swapped out after capturing the registers.

When Xenomai has forwarded the tick to Linux, the Linux tick handler is
executed upon resume to user-space, so if the stack were to vanish, it
would have to vanish during execution of another interrupt handler before
the tick handler. However, I believe that only do_exit can kill a task,
and I am not sure whether it can be called from an interrupt handler. As for
the stack being swapped out, it is kmalloced memory, so that is impossible.

--
Gilles.
* Re: [Xenomai-help] Kernel panic: not syncing
From: Jan Kiszka @ 2008-07-08 9:33 UTC
To: Gilles Chanteperdrix
Cc: Petr Cervenka, xenomai

Gilles Chanteperdrix wrote:
> When Xenomai has forwarded the tick to linux, Linux tick handler is
> executed upon resume to user-space, so, if the stack had to vanish, it
> would have to vanish upon execution of another interrupt handler before
> the tick handler. However, I believe that only do_exit can kill a task,
> and I am not sure if it can be called from an interrupt handler. As for
> the stack being swapped out, it is kmalloced memory, so, it is impossible.
>

Yes, a vanishing stack is unlikely; more probable is an invalid state
right from the beginning. I guess we need a full oops to say more.

Petr, any chance to attach a serial cable to your box and catch those
oopses completely via a second box? The register state would be telling,
and so would, as Philippe already requested, a disassembly of the involved
function. In case it remains profile_pc: objdump -dS
linux-.../arch/x86/kernel/time_64.o.

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
* Re: [Xenomai-help] Kernel panic: not syncing
From: Petr Cervenka @ 2008-07-09 15:19 UTC
To: jan.kiszka
Cc: xenomai

Jan Kiszka wrote:
>Yes, vanishing stack is unlikely, more probable is an invalid state
>right from the beginning. I guess we need a full oops to say more.
>
>Petr, any chance to attach a serial cable to your box and catch those
>oopses completely via a second box? The register state would be telling,
>but also, as Philippe already requested, a disassembly of the involved
>function - in case it remain profile_pc: objdump -dS
>linux-.../arch/x86/kernel/time_64.o.
>

To your questions:
we have only user-space tasks, but we use an RTDM driver (with ioctl, no tasks).
The processor is an Athlon X2, 64-bit distribution (Kubuntu ?7.10?), x86_64 SMP PREEMPT,
Kernel 2.6.24, Xenomai 2.4.1, adeos-ipipe-2.6.24-x86-2.0-03.
I could send you my kernel config file if you want.
I will try to learn how to catch oopses on a second box attached via serial cable. But I don't know if it will be possible to set up such an experiment for long enough to reproduce the error.

The following disassembly is from a different machine than the one which has the kernel panics. I use it for developing and testing, but it should have the same HW and kernel configuration. I can't be totally sure that the disassembly is correct, but I hope so. I just can't recognise it. Could you explain to me what "profile_pc+0x46/0x80" means? I assume the first number is the current RIP address (relative to routine start), so 0x256 in this case. But what does the second number mean?

0000000000000210 <profile_pc>:
 210:	48 83 ec 18          	sub    $0x18,%rsp
 214:	48 89 5c 24 08       	mov    %rbx,0x8(%rsp)
 219:	48 89 6c 24 10       	mov    %rbp,0x10(%rsp)
 21e:	48 89 fb             	mov    %rdi,%rbx
 221:	f6 87 88 00 00 00 03 	testb  $0x3,0x88(%rdi)
 228:	48 8b af 80 00 00 00 	mov    0x80(%rdi),%rbp
 22f:	74 12                	je     243 <profile_pc+0x33>
 231:	48 89 e8             	mov    %rbp,%rax
 234:	48 8b 5c 24 08       	mov    0x8(%rsp),%rbx
 239:	48 8b 6c 24 10       	mov    0x10(%rsp),%rbp
 23e:	48 83 c4 18          	add    $0x18,%rsp
 242:	c3                   	retq
 243:	48 89 ef             	mov    %rbp,%rdi
 246:	e8 00 00 00 00       	callq  24b <profile_pc+0x3b>
 24b:	85 c0                	test   %eax,%eax
 24d:	74 e2                	je     231 <profile_pc+0x21>
 24f:	48 8b 8b 98 00 00 00 	mov    0x98(%rbx),%rcx
 256:	48 8b 11             	mov    (%rcx),%rdx
 259:	48 89 d0             	mov    %rdx,%rax
 25c:	48 c1 e8 16          	shr    $0x16,%rax
 260:	48 85 c0             	test   %rax,%rax
 263:	75 1b                	jne    280 <profile_pc+0x70>
 265:	48 8b 51 08          	mov    0x8(%rcx),%rdx
 269:	48 89 d0             	mov    %rdx,%rax
 26c:	48 c1 e8 16          	shr    $0x16,%rax
 270:	48 85 c0             	test   %rax,%rax
 273:	48 0f 45 ea          	cmovne %rdx,%rbp
 277:	eb b8                	jmp    231 <profile_pc+0x21>
 279:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
 280:	48 89 d5             	mov    %rdx,%rbp
 283:	eb ac                	jmp    231 <profile_pc+0x21>
 285:	66 66 2e 0f 1f 84 00 	nopw   %cs:0x0(%rax,%rax,1)
 28c:	00 00 00 00
* Re: [Xenomai-help] Kernel panic: not syncing
From: Philippe Gerum @ 2008-07-09 16:05 UTC
To: Petr Cervenka
Cc: jan.kiszka, xenomai

Petr Cervenka wrote:
> The following disassembly is from a different machine than the one which has the kernel panics. I use it for developing and testing, but it should have the same HW and kernel configuration. [...] Could you explain to me what "profile_pc+0x46/0x80" means? I assume the first number is the current RIP address (relative to routine start), so 0x256 in this case. But what does the second number mean?

Size of the routine.

>
> 0000000000000210 <profile_pc>:
> [...]
>  24f:	48 8b 8b 98 00 00 00 	mov    0x98(%rbx),%rcx
>  256:	48 8b 11             	mov    (%rcx),%rdx

We are dereferencing invalid stack memory, using the stack pointer value of the
preempted context.

What could help is to have the register dump which should appear in the oops
message, and specifically the %rcx value.

> [...]

--
Philippe.
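The `symbol+0xOFFSET/0xSIZE` notation discussed above can be decoded mechanically. Below is a minimal sketch in C; `struct sym_ref`, `parse_sym_ref` and `object_address` are illustrative names, not kernel or binutils APIs. It maps an oops reference back to an address in the objdump listing, given the symbol's start address in that listing (0x210 for profile_pc in Petr's disassembly).

```c
#include <stdio.h>

/* One "name+0xOFFSET/0xSIZE" reference as printed in an oops. */
struct sym_ref {
	char name[64];
	unsigned long offset;	/* byte offset of the instruction in the routine */
	unsigned long size;	/* total size of the routine in bytes */
};

/* Parse a reference such as "profile_pc+0x46/0x80".
 * Returns 0 on success, -1 if the string does not match the pattern. */
int parse_sym_ref(const char *s, struct sym_ref *out)
{
	if (sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}

/* Address of the referenced instruction in a listing where the
 * routine starts at sym_start. */
unsigned long object_address(unsigned long sym_start, const struct sym_ref *r)
{
	return sym_start + r->offset;
}
```

For "profile_pc+0x46/0x80" with a start of 0x210 this gives 0x256, the `mov (%rcx),%rdx` line, and 0x210 + 0x80 = 0x290 is where the listing ends, which agrees with the second number being the routine size.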
* Re: [Xenomai-help] Kernel panic: not syncing
From: Petr Cervenka @ 2008-07-10 13:45 UTC
To: rpm
Cc: xenomai

Philippe Gerum wrote:
>We are dereferencing invalid stack memory, using the stack pointer value of the
>preempted context.
>
>What could help is to have the registers dump which should appear in the oops
>message, and specifically the %rcx value.

I tried to set up the serial line console, but with only partial success. I used this guide: http://www.av8n.com/computer/htm/kernel-lockup.htm (not the watchdog part).
Now I am able to log in through gtkterm to the first (logged) machine from the second (logging) machine.
But when I try:
echo "Hi there." > /dev/console,
I see the text only on the current session (tty0) and not on the connected second machine (maybe gtkterm is not the right program for it).
When I write directly:
echo "Hi there." > /dev/ttyS0,
it freezes and I have to press Ctrl+C to continue (with an "Interrupted system call" error).
In the dmesg output of the first machine, there is the line: console [ttyS0] enabled.
The console on ttyS0, the getty on ttyS0 and gtkterm all use the same speed, 115200.
What have I done wrong?

Petr
* Re: [Xenomai-help] Kernel panic: not syncing
From: Petr Cervenka @ 2008-07-11 13:18 UTC
To: rpm
Cc: xenomai, jan.kiszka

I was able to capture the kernel panic through the serial line. This time, it took less time than I expected.

Petr

[ 2009.702873] Unable to handle kernel paging request at 0000000040090ff8 RIP:
[ 2009.708187]  [<ffffffff802101a6>] profile_pc+0x46/0x80
[ 2009.716892] PGD 116fd067 PUD 116b5067 PMD 3d93d067 PTE 0
[ 2009.722998] Oops: 0000 [1] PREEMPT SMP
[ 2009.727400] CPU 0
[ 2009.729800] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage libusual rt_e1000 rt_r8169 rtpacket rtnet ppdev pci171x_rtdm(P) container ac video output sbs sbshc dock battery parport_pc lp parport psmouse serio_raw pcspkr k8temp i2c_nforce2 i2c_core button af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod ide_cd cdrom sata_nv floppy forcedeth ata_generic libata scsi_mod amd74xx ehci_hcd ohci_hcd ide_core usbcore fan fuse
[ 2009.774477] Pid: 0, comm: swapper Tainted: P 2.6.24-adeos #1
[ 2009.781682] RIP: 0010:[<ffffffff802101a6>] [<ffffffff802101a6>] profile_pc+0x46/0x80
[ 2009.790682] RSP: 0018:ffffffff80664da0 EFLAGS: 00010202
[ 2009.796869] RAX: 0000000000000001 RBX: ffff8100010087a0 RCX: 0000000040090ff8
[ 2009.804953] RDX: ffff8100809b4000 RSI: 0000000000000906 RDI: ffffffff8048c6c8
[ 2009.813179] RBP: ffffffff8048c6c8 R08: 0000000000000004 R09: 0000000000000010
[ 2009.821383] R10: 0000000000000005 R11: ffffffff80258ee0 R12: ffff81000100a5c0
[ 2009.829667] R13: 000001d1bcf596b9 R14: 0000000000000000 R15: 0000000000000001
[ 2009.837867] FS: 0000000040091950(0000) GS:ffffffff805d6000(0000) knlGS:0000000000000000
[ 2009.847177] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 2009.853762] CR2: 0000000040090ff8 CR3: 0000000014cfd000 CR4: 00000000000006e0
[ 2009.862084] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2009.870373] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2009.878572] Process swapper (pid: 0, threadinfo ffffffff80602000, task ffffffff805a03a0)
[ 2009.887786] Stack: 000001d1bcf596b9 ffff8100010087a0 0000000000000001 ffffffff8023fd3e
[ 2009.897090]  000001d1bcf596b9 ffff81000100a6c0 ffff8100010087a0 ffffffff8025e8a5
[ 2009.905760]  ffff81000100a618 ffff81000100a6c0 ffff81000100a618 000001d1bcf58f16
[ 2009.914068] Call Trace:
[ 2009.917071]  <IRQ> [<ffffffff8023fd3e>] profile_tick+0x5e/0xa0
[ 2009.923847]  [<ffffffff8025e8a5>] tick_sched_timer+0x85/0x170
[ 2009.930374]  [<ffffffff8025900f>] hrtimer_interrupt+0x12f/0x1e0
[ 2009.937150]  [<ffffffff80220857>] smp_apic_timer_interrupt+0x37/0x60
[ 2009.944454]  [<ffffffff802777e0>] __ipipe_sync_stage+0x350/0x355
[ 2009.951351]  [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60
[ 2009.958555]  [<ffffffff802777e5>] __xirq_end+0x0/0x85
[ 2009.964349]  [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60
[ 2009.971558]  [<ffffffff80226b01>] __ipipe_handle_irq+0x91/0x250
[ 2009.978457]  [<ffffffff8020af50>] default_idle+0x0/0x40
[ 2009.984437]  [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d
[ 2009.990940]  <EOI> [<ffffffff8020af79>] default_idle+0x29/0x40
[ 2009.997762]  [<ffffffff8020b01b>] cpu_idle+0x8b/0x120
[ 2010.003541]  [<ffffffff8060cbba>] start_kernel+0x2ba/0x350
[ 2010.009741]  [<ffffffff8060c120>] _sinittext+0x120/0x130
[ 2010.015821]
[ 2010.017549]
[ 2010.017556] Code: 48 8b 11 48 89 d0 48 c1 e8 16 48 85 c0 75 1b 48 8b 51 08 48
[ 2010.028006] RIP [<ffffffff802101a6>] profile_pc+0x46/0x80
[ 2010.034335]  RSP <ffffffff80664da0>
[ 2010.038334] CR2: 0000000040090ff8
[ 2010.042166] ---[ end trace 95174c527ade95f0 ]---
[ 2010.047468] Kernel panic - not syncing: Aiee, killing interrupt handler!
* Re: [Xenomai-help] Kernel panic: not syncing 2008-07-11 13:18 ` Petr Cervenka @ 2008-07-15 14:42 ` Petr Cervenka 2008-07-15 15:03 ` Jan Kiszka 0 siblings, 1 reply; 24+ messages in thread From: Petr Cervenka @ 2008-07-15 14:42 UTC (permalink / raw) To: xenomai; +Cc: jan.kiszka I captured also the second type of kernel panic. This one seems to happen during "advanced" configuration of out system. This means lot of work in a low priority (5) xenomai task (WORK_TASK_2056) for a short time. Another question is, what does mean "(P)" after the name of our rtdm module (pci171x_rtdm(P))? [ 7815.694296] ------------[ cut here ]------------ [ 7815.699111] kernel BUG at kernel/posix-cpu-timers.c:1295! [ 7815.704715] invalid opcode: 0000 [1] PREEMPT SMP [ 7815.709672] CPU 0 [ 7815.711777] Modules linked in: rt_e1000 rt_r8169 rtpacket rtnet ppdev pci171x_rtdm(P) container ac video output sbs sbshc dock battery parport_pc lp parport psmouse serio_raw pcspkr k8temp i2c_nforce2 button i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod ide_cd cdrom sata_nv floppy ata_generic libata ohci_hcd forcedeth ehci_hcd scsi_mod amd74xx ide_core usbcore fan fuse [ 7815.747844] Pid: 6481, comm: WORK_TASK_2056 Tainted: P 2.6.24-adeos #1 [ 7815.755321] RIP: 0010:[<ffffffff80256e20>] [<ffffffff80256e20>] run_posix_cpu_timers+0x810/0x820 [ 7815.764629] RSP: 0000:ffffffff80664d70 EFLAGS: 00010246 [ 7815.770122] RAX: ffff81000100a7c0 RBX: ffff81003e082780 RCX: ffffffff805a03a0 [ 7815.777573] RDX: 0000000000000000 RSI: ffff81003e082780 RDI: ffff81003e082780 [ 7815.785080] RBP: ffff8100010087a0 R08: 0000000000000004 R09: 0000000000000010 [ 7815.792566] R10: 0000000000000005 R11: ffffffff80258ee0 R12: ffff81000100a5c0 [ 7815.800001] R13: 00000719439890f1 R14: 0000000000000000 R15: ffffffff80664d90 [ 7815.807436] FS: 0000000040112950(0063) GS:ffffffff805d6000(0000) knlGS:0000000000000000 [ 7815.815909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7815.821915] CR2: 00002b83d55aec80 CR3: 
000000003dff8000 CR4: 00000000000006e0 [ 7815.829357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 7815.836786] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 7815.844238] Process WORK_TASK_2056 (pid: 6481, threadinfo ffff810013f78000, task ffff81003e082780) [ 7815.853584] Stack: ffff810001013180 00000718f7bcb1dd ffffffff80664db0 ffffffff80238af8 [ 7815.862131] ffffffff80664d90 ffffffff80664d90 00000719439890f1 ffff81000100a6c0 [ 7815.869974] ffff8100010087a0 ffff81000100a5c0 00000719439890f1 0000000000000000 [ 7815.877667] Call Trace: [ 7815.880441] <IRQ> [<ffffffff80238af8>] scheduler_tick+0xf8/0x140 [ 7815.886908] [<ffffffff8025e89b>] tick_sched_timer+0x7b/0x170 [ 7815.892929] [<ffffffff8025900f>] hrtimer_interrupt+0x12f/0x1e0 [ 7815.899137] [<ffffffff80220857>] smp_apic_timer_interrupt+0x37/0x60 [ 7815.905752] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d [ 7815.911779] [<ffffffff802777e0>] __ipipe_sync_stage+0x350/0x355 [ 7815.918085] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 [ 7815.924655] [<ffffffff802777e5>] __xirq_end+0x0/0x85 [ 7815.929964] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 [ 7815.936587] [<ffffffff80226b01>] __ipipe_handle_irq+0x91/0x250 [ 7815.942774] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d [ 7815.948673] <EOI> [ 7815.950909] [ 7815.950909] Code: 0f 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 [ 7815.960491] RIP [<ffffffff80256e20>] run_posix_cpu_timers+0x810/0x820 [ 7815.967284] RSP <ffffffff80664d70> [ 7815.970982] ---[ end trace d192885d9858c4b2 ]--- [ 7815.975820] Kernel panic - not syncing: Aiee, killing interrupt handler! ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-help] Kernel panic: not syncing 2008-07-15 14:42 ` Petr Cervenka @ 2008-07-15 15:03 ` Jan Kiszka 2008-07-16 8:39 ` Petr Cervenka 0 siblings, 1 reply; 24+ messages in thread From: Jan Kiszka @ 2008-07-15 15:03 UTC (permalink / raw) To: Petr Cervenka; +Cc: xenomai Petr Cervenka wrote: > I captured also the second type of kernel panic. This one seems to happen during "advanced" configuration of out system. This means lot of work in a low priority (5) xenomai task (WORK_TASK_2056) for a short time. > Another question is, what does mean "(P)" after the name of our rtdm module (pci171x_rtdm(P))? That it either does not comply to the GPL or that the author forgot to announce its compliance via MODULE_LICENSE(). > > [ 7815.694296] ------------[ cut here ]------------ > [ 7815.699111] kernel BUG at kernel/posix-cpu-timers.c:1295! > [ 7815.704715] invalid opcode: 0000 [1] PREEMPT SMP > [ 7815.709672] CPU 0 > [ 7815.711777] Modules linked in: rt_e1000 rt_r8169 rtpacket rtnet ppdev pci171x_rtdm(P) container ac video output sbs sbshc dock battery parport_pc lp parport psmouse serio_raw pcspkr k8temp i2c_nforce2 button i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod ide_cd cdrom sata_nv floppy ata_generic libata ohci_hcd forcedeth ehci_hcd scsi_mod amd74xx ide_core usbcore fan fuse > [ 7815.747844] Pid: 6481, comm: WORK_TASK_2056 Tainted: P 2.6.24-adeos #1 > [ 7815.755321] RIP: 0010:[<ffffffff80256e20>] [<ffffffff80256e20>] run_posix_cpu_timers+0x810/0x820 > [ 7815.764629] RSP: 0000:ffffffff80664d70 EFLAGS: 00010246 > [ 7815.770122] RAX: ffff81000100a7c0 RBX: ffff81003e082780 RCX: ffffffff805a03a0 > [ 7815.777573] RDX: 0000000000000000 RSI: ffff81003e082780 RDI: ffff81003e082780 > [ 7815.785080] RBP: ffff8100010087a0 R08: 0000000000000004 R09: 0000000000000010 > [ 7815.792566] R10: 0000000000000005 R11: ffffffff80258ee0 R12: ffff81000100a5c0 > [ 7815.800001] R13: 00000719439890f1 R14: 0000000000000000 R15: ffffffff80664d90 > [ 7815.807436] FS: 
0000000040112950(0063) GS:ffffffff805d6000(0000) knlGS:0000000000000000 > [ 7815.815909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 7815.821915] CR2: 00002b83d55aec80 CR3: 000000003dff8000 CR4: 00000000000006e0 > [ 7815.829357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 7815.836786] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 7815.844238] Process WORK_TASK_2056 (pid: 6481, threadinfo ffff810013f78000, task ffff81003e082780) > [ 7815.853584] Stack: ffff810001013180 00000718f7bcb1dd ffffffff80664db0 ffffffff80238af8 > [ 7815.862131] ffffffff80664d90 ffffffff80664d90 00000719439890f1 ffff81000100a6c0 > [ 7815.869974] ffff8100010087a0 ffff81000100a5c0 00000719439890f1 0000000000000000 > [ 7815.877667] Call Trace: > [ 7815.880441] <IRQ> [<ffffffff80238af8>] scheduler_tick+0xf8/0x140 > [ 7815.886908] [<ffffffff8025e89b>] tick_sched_timer+0x7b/0x170 > [ 7815.892929] [<ffffffff8025900f>] hrtimer_interrupt+0x12f/0x1e0 > [ 7815.899137] [<ffffffff80220857>] smp_apic_timer_interrupt+0x37/0x60 > [ 7815.905752] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d > [ 7815.911779] [<ffffffff802777e0>] __ipipe_sync_stage+0x350/0x355 > [ 7815.918085] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 > [ 7815.924655] [<ffffffff802777e5>] __xirq_end+0x0/0x85 > [ 7815.929964] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 > [ 7815.936587] [<ffffffff80226b01>] __ipipe_handle_irq+0x91/0x250 > [ 7815.942774] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d > [ 7815.948673] <EOI> > [ 7815.950909] > [ 7815.950909] Code: 0f 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 > [ 7815.960491] RIP [<ffffffff80256e20>] run_posix_cpu_timers+0x810/0x820 > [ 7815.967284] RSP <ffffffff80664d70> > [ 7815.970982] ---[ end trace d192885d9858c4b2 ]--- > [ 7815.975820] Kernel panic - not syncing: Aiee, killing interrupt handler! 
That's now a totally different spot, and it makes me wonder if you can reproduce all these troubles with vanilla Xenomai and without your driver being loaded... Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-help] Kernel panic: not syncing 2008-07-15 15:03 ` Jan Kiszka @ 2008-07-16 8:39 ` Petr Cervenka 2008-07-17 10:21 ` Jan Kiszka 0 siblings, 1 reply; 24+ messages in thread From: Petr Cervenka @ 2008-07-16 8:39 UTC (permalink / raw) To: jan.kiszka; +Cc: xenomai Jan Kizska wrote: >Petr Cervenka wrote: >> I captured also the second type of kernel panic. This one seems to happen during "advanced" configuration of out system. This means lot of work in a low priority (5) xenomai task (WORK_TASK_2056) for a short time. >> Another question is, what does mean "(P)" after the name of our rtdm module (pci171x_rtdm(P))? > >That it either does not comply to the GPL or that the author forgot to >announce its compliance via MODULE_LICENSE(). > >> >> [ 7815.694296] ------------[ cut here ]------------ >> [ 7815.699111] kernel BUG at kernel/posix-cpu-timers.c:1295! >> [ 7815.704715] invalid opcode: 0000 [1] PREEMPT SMP >> [ 7815.709672] CPU 0 >> [ 7815.711777] Modules linked in: rt_e1000 rt_r8169 rtpacket rtnet ppdev pci171x_rtdm(P) container ac video output sbs sbshc dock battery parport_pc lp parport psmouse serio_raw pcspkr k8temp i2c_nforce2 button i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod ide_cd cdrom sata_nv floppy ata_generic libata ohci_hcd forcedeth ehci_hcd scsi_mod amd74xx ide_core usbcore fan fuse >> [ 7815.747844] Pid: 6481, comm: WORK_TASK_2056 Tainted: P 2.6.24-adeos #1 >> [ 7815.755321] RIP: 0010:[<ffffffff80256e20>] [<ffffffff80256e20>] run_posix_cpu_timers+0x810/0x820 >> [ 7815.764629] RSP: 0000:ffffffff80664d70 EFLAGS: 00010246 >> [ 7815.770122] RAX: ffff81000100a7c0 RBX: ffff81003e082780 RCX: ffffffff805a03a0 >> [ 7815.777573] RDX: 0000000000000000 RSI: ffff81003e082780 RDI: ffff81003e082780 >> [ 7815.785080] RBP: ffff8100010087a0 R08: 0000000000000004 R09: 0000000000000010 >> [ 7815.792566] R10: 0000000000000005 R11: ffffffff80258ee0 R12: ffff81000100a5c0 >> [ 7815.800001] R13: 00000719439890f1 R14: 0000000000000000 R15: 
ffffffff80664d90 >> [ 7815.807436] FS: 0000000040112950(0063) GS:ffffffff805d6000(0000) knlGS:0000000000000000 >> [ 7815.815909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 7815.821915] CR2: 00002b83d55aec80 CR3: 000000003dff8000 CR4: 00000000000006e0 >> [ 7815.829357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 7815.836786] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> [ 7815.844238] Process WORK_TASK_2056 (pid: 6481, threadinfo ffff810013f78000, task ffff81003e082780) >> [ 7815.853584] Stack: ffff810001013180 00000718f7bcb1dd ffffffff80664db0 ffffffff80238af8 >> [ 7815.862131] ffffffff80664d90 ffffffff80664d90 00000719439890f1 ffff81000100a6c0 >> [ 7815.869974] ffff8100010087a0 ffff81000100a5c0 00000719439890f1 0000000000000000 >> [ 7815.877667] Call Trace: >> [ 7815.880441] <IRQ> [<ffffffff80238af8>] scheduler_tick+0xf8/0x140 >> [ 7815.886908] [<ffffffff8025e89b>] tick_sched_timer+0x7b/0x170 >> [ 7815.892929] [<ffffffff8025900f>] hrtimer_interrupt+0x12f/0x1e0 >> [ 7815.899137] [<ffffffff80220857>] smp_apic_timer_interrupt+0x37/0x60 >> [ 7815.905752] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d >> [ 7815.911779] [<ffffffff802777e0>] __ipipe_sync_stage+0x350/0x355 >> [ 7815.918085] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 >> [ 7815.924655] [<ffffffff802777e5>] __xirq_end+0x0/0x85 >> [ 7815.929964] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 >> [ 7815.936587] [<ffffffff80226b01>] __ipipe_handle_irq+0x91/0x250 >> [ 7815.942774] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d >> [ 7815.948673] <EOI> >> [ 7815.950909] >> [ 7815.950909] Code: 0f 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 >> [ 7815.960491] RIP [<ffffffff80256e20>] run_posix_cpu_timers+0x810/0x820 >> [ 7815.967284] RSP <ffffffff80664d70> >> [ 7815.970982] ---[ end trace d192885d9858c4b2 ]--- >> [ 7815.975820] Kernel panic - not syncing: Aiee, killing interrupt handler! 
> >That's now a totally different spot, and it makes me wonder if can >reproduce all this troubles with vanilla Xenomai and without your driver >being loaded... > We measure data from our unit, connected either through rtnet or with a PCI card. It does not matter which of the two we use; these kernel panics appear in both setups. So rtnet and our module are not involved. But it does depend on the measuring frequency and the amount of data measured in every cycle. Petr ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-help] Kernel panic: not syncing 2008-07-16 8:39 ` Petr Cervenka @ 2008-07-17 10:21 ` Jan Kiszka 2008-07-21 10:58 ` Petr Cervenka 0 siblings, 1 reply; 24+ messages in thread From: Jan Kiszka @ 2008-07-17 10:21 UTC (permalink / raw) To: Petr Cervenka; +Cc: xenomai Petr Cervenka wrote: > Jan Kizska wrote: >> Petr Cervenka wrote: >>> I captured also the second type of kernel panic. This one seems to > happen during "advanced" configuration of out system. This means lot of > work in a low priority (5) xenomai task (WORK_TASK_2056) for a short time. >>> Another question is, what does mean "(P)" after the name of our rtdm > module (pci171x_rtdm(P))? >> That it either does not comply to the GPL or that the author forgot to >> announce its compliance via MODULE_LICENSE(). >> >>> [ 7815.694296] ------------[ cut here ]------------ >>> [ 7815.699111] kernel BUG at kernel/posix-cpu-timers.c:1295! >>> [ 7815.704715] invalid opcode: 0000 [1] PREEMPT SMP >>> [ 7815.709672] CPU 0 >>> [ 7815.711777] Modules linked in: rt_e1000 rt_r8169 rtpacket rtnet > ppdev pci171x_rtdm(P) container ac video output sbs sbshc dock battery > parport_pc lp parport psmouse serio_raw pcspkr k8temp i2c_nforce2 button > i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod ide_cd cdrom > sata_nv floppy ata_generic libata ohci_hcd forcedeth ehci_hcd scsi_mod > amd74xx ide_core usbcore fan fuse >>> [ 7815.747844] Pid: 6481, comm: WORK_TASK_2056 Tainted: P > 2.6.24-adeos #1 >>> [ 7815.755321] RIP: 0010:[<ffffffff80256e20>] [<ffffffff80256e20>] > run_posix_cpu_timers+0x810/0x820 >>> [ 7815.764629] RSP: 0000:ffffffff80664d70 EFLAGS: 00010246 >>> [ 7815.770122] RAX: ffff81000100a7c0 RBX: ffff81003e082780 RCX: > ffffffff805a03a0 >>> [ 7815.777573] RDX: 0000000000000000 RSI: ffff81003e082780 RDI: > ffff81003e082780 >>> [ 7815.785080] RBP: ffff8100010087a0 R08: 0000000000000004 R09: > 0000000000000010 >>> [ 7815.792566] R10: 0000000000000005 R11: ffffffff80258ee0 R12: > ffff81000100a5c0 >>> [ 
7815.800001] R13: 00000719439890f1 R14: 0000000000000000 R15: > ffffffff80664d90 >>> [ 7815.807436] FS: 0000000040112950(0063) GS:ffffffff805d6000(0000) > knlGS:0000000000000000 >>> [ 7815.815909] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 7815.821915] CR2: 00002b83d55aec80 CR3: 000000003dff8000 CR4: > 00000000000006e0 >>> [ 7815.829357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 >>> [ 7815.836786] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 >>> [ 7815.844238] Process WORK_TASK_2056 (pid: 6481, threadinfo > ffff810013f78000, task ffff81003e082780) >>> [ 7815.853584] Stack: ffff810001013180 00000718f7bcb1dd > ffffffff80664db0 ffffffff80238af8 >>> [ 7815.862131] ffffffff80664d90 ffffffff80664d90 00000719439890f1 > ffff81000100a6c0 >>> [ 7815.869974] ffff8100010087a0 ffff81000100a5c0 00000719439890f1 > 0000000000000000 >>> [ 7815.877667] Call Trace: >>> [ 7815.880441] <IRQ> [<ffffffff80238af8>] scheduler_tick+0xf8/0x140 >>> [ 7815.886908] [<ffffffff8025e89b>] tick_sched_timer+0x7b/0x170 >>> [ 7815.892929] [<ffffffff8025900f>] hrtimer_interrupt+0x12f/0x1e0 >>> [ 7815.899137] [<ffffffff80220857>] smp_apic_timer_interrupt+0x37/0x60 >>> [ 7815.905752] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d >>> [ 7815.911779] [<ffffffff802777e0>] __ipipe_sync_stage+0x350/0x355 >>> [ 7815.918085] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 >>> [ 7815.924655] [<ffffffff802777e5>] __xirq_end+0x0/0x85 >>> [ 7815.929964] [<ffffffff80220820>] smp_apic_timer_interrupt+0x0/0x60 >>> [ 7815.936587] [<ffffffff80226b01>] __ipipe_handle_irq+0x91/0x250 >>> [ 7815.942774] [<ffffffff8020c9f1>] common_interrupt+0x61/0x7d >>> [ 7815.948673] <EOI> >>> [ 7815.950909] >>> [ 7815.950909] Code: 0f 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 > 57 41 56 >>> [ 7815.960491] RIP [<ffffffff80256e20>] > run_posix_cpu_timers+0x810/0x820 >>> [ 7815.967284] RSP <ffffffff80664d70> >>> [ 7815.970982] ---[ end trace d192885d9858c4b2 ]--- >>> 
[ 7815.975820] Kernel panic - not syncing: Aiee, killing interrupt > handler! >> That's now a totally different spot, and it makes me wonder if can >> reproduce all this troubles with vanilla Xenomai and without your driver >> being loaded... >> > > We measure data from our unit connected through rtnet or with a PCI card. It's independent if we use one way or another, these kernel panics appear in both setups. So rtnet and our module are not involved. > But it does depend on the measuring frequency and the amount of measured data in every cycle. We are likely seeing some race that causes weird memory corruption. The probability of hitting such a race often increases as the code execution frequency rises. However, reducing the test case is very important now in order to narrow the search domain for this issue. E.g. try to fake peripheral access as far as possible, unload the unused driver, and leave behind only a test program that is executable on an arbitrary Xenomai installation (maybe finally on one of my boxes...). TIA, Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-help] Kernel panic: not syncing 2008-07-17 10:21 ` Jan Kiszka @ 2008-07-21 10:58 ` Petr Cervenka 2008-07-21 11:26 ` Jan Kiszka 0 siblings, 1 reply; 24+ messages in thread From: Petr Cervenka @ 2008-07-21 10:58 UTC (permalink / raw) To: jan.kiszka; +Cc: xenomai Jan Kiszka wrote: >We likely see some race that causes weird memory corruptions. Its >probability often increases when the code execution frequency raises. > >However, reducing the test case is very important now to reduce the >search domain for this issue. E.g. try to fake peripheral access as far >as possible, unloading the unused driver and only leaving the test >program behind that is executable on arbitrary Xenomai installation >(maybe finally on one of my boxes...). > I'm not sure I will be able to reduce the software. It depends on the hardware and is controlled from another Windows computer running a GUI and control application. And checking whether the error is still there usually takes a couple of days. I ran a test over the last weekend (and nothing went wrong). But the /proc/xenomai/stat output is strange. It is probably a type cast error, because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the correct value should perhaps be 0x000000008A939FDE = 2324930526.

CPU  PID   MSW      CSW                   PF  STAT      %CPU  NAME
0    0     0        18446744071739514846  0   00500088  69.8  ROOT/0
1    0     0        18446744071675175740  0   00500080  23.2  ROOT/1
0    5299  0        351459                0   00300182   0.0  LOGGER_TASK_1804289383
0    5100  8        283613                0   00300186   0.0
0    5317  0        40591                 0   00300182   0.0
0    5034  2        2330696               0   00300184   0.0  MAIN_TASK_2056
0    5318  5        18446744071736105613  3   00300180  29.5  REG_TASK_2056
0    5319  28       36                    0   00300182   0.0  WORK_TASK_2056
0    5321  38926    39159                 0   00300380   0.0  CERECV_2056
0    5323  1159385  2438330               0   00300181   0.0  CESEND_2056
1    5710  0        18446744071675175740  0   00300184  76.8  HARDWARE_KERNEL
0    0     0        18446744071964064315  0   00000000   0.7  IRQ520: [timer]
1    0     0        232145209             0   00000000   0.0  IRQ520: [timer]

My theory is that occasional "longer" work or system call usage in the real-time task corrupts the rest of the system (under some special circumstances). Petr ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-help] Kernel panic: not syncing 2008-07-21 10:58 ` Petr Cervenka @ 2008-07-21 11:26 ` Jan Kiszka 2008-07-31 16:14 ` [Xenomai-help] Segmentation error by heavy dynamic RT_QUEUE usage Petr Cervenka 2008-08-13 11:01 ` [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Jan Kiszka 0 siblings, 2 replies; 24+ messages in thread From: Jan Kiszka @ 2008-07-21 11:26 UTC (permalink / raw) To: Petr Cervenka; +Cc: xenomai Petr Cervenka wrote: > Jan Kiszka wrote: >> We likely see some race that causes weird memory corruptions. Its >> probability often increases when the code execution frequency raises. >> >> However, reducing the test case is very important now to reduce the >> search domain for this issue. E.g. try to fake peripheral access as far >> as possible, unloading the unused driver and only leaving the test >> program behind that is executable on arbitrary Xenomai installation >> (maybe finally on one of my boxes...). >> > I'm not sure if I will be able to reduce the software. It's dependent on hardware and it's controlled from another windows computer with GUI and control application. And to check if the error is still there usually takes couple of days. > I ran a test during last weekend (and nothing wrong happened). But the /proc/xenomai/stat output is strange. Probably some type cast error, because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate value perhaps should be 0x000000008A939FDE = 2324930526. 
>
> CPU  PID   MSW      CSW                   PF  STAT      %CPU  NAME
> 0    0     0        18446744071739514846  0   00500088  69.8  ROOT/0
> 1    0     0        18446744071675175740  0   00500080  23.2  ROOT/1
> 0    5299  0        351459                0   00300182   0.0  LOGGER_TASK_1804289383
> 0    5100  8        283613                0   00300186   0.0
> 0    5317  0        40591                 0   00300182   0.0
> 0    5034  2        2330696               0   00300184   0.0  MAIN_TASK_2056
> 0    5318  5        18446744071736105613  3   00300180  29.5  REG_TASK_2056
> 0    5319  28       36                    0   00300182   0.0  WORK_TASK_2056
> 0    5321  38926    39159                 0   00300380   0.0  CERECV_2056
> 0    5323  1159385  2438330               0   00300181   0.0  CESEND_2056
> 1    5710  0        18446744071675175740  0   00300184  76.8  HARDWARE_KERNEL
> 0    0     0        18446744071964064315  0   00000000   0.7  IRQ520: [timer]
> 1    0     0        232145209             0   00000000   0.0  IRQ520: [timer]

OK, at least this bug is a bit easier to fix. Please try this patch (which also takes the opportunity to extend the range of our stat counters a bit):

Index: xenomai/include/nucleus/stat.h
===================================================================
--- xenomai/include/nucleus/stat.h	(Revision 4060)
+++ xenomai/include/nucleus/stat.h	(Arbeitskopie)
@@ -84,20 +84,20 @@ do { \
 
 
 typedef struct xnstat_counter {
-	int counter;
+	unsigned long counter;
 } xnstat_counter_t;
 
-static inline int xnstat_counter_inc(xnstat_counter_t *c)
+static inline unsigned long xnstat_counter_inc(xnstat_counter_t *c)
 {
 	return c->counter++;
 }
 
-static inline int xnstat_counter_get(xnstat_counter_t *c)
+static inline unsigned long xnstat_counter_get(xnstat_counter_t *c)
 {
 	return c->counter;
 }
 
-static inline void xnstat_counter_set(xnstat_counter_t *c, int value)
+static inline void xnstat_counter_set(xnstat_counter_t *c, unsigned long value)
 {
 	c->counter = value;
 }

>
> My theory is, that a occasional "longer" work or system call usage in the real-time task corrupts the rest of the system (under some special circumstances).

Yes, some nasty memory corruption is probably the reason. And that is always hard to track down, especially if it happens very unpredictably.
Nevertheless, if the issue continues to bug you, you will not get around reducing the test case and trying to increase its occurrence probability. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Xenomai-help] Segmentation error by heavy dynamic RT_QUEUE usage 2008-07-21 11:26 ` Jan Kiszka @ 2008-07-31 16:14 ` Petr Cervenka 2008-08-12 14:37 ` Philippe Gerum 2008-08-13 11:01 ` [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Jan Kiszka 1 sibling, 1 reply; 24+ messages in thread From: Petr Cervenka @ 2008-07-31 16:14 UTC (permalink / raw) To: xenomai [-- Attachment #1: Type: text/plain, Size: 1457 bytes --]
Hello,
I wanted to make a small example to track down the kernel panic (and I failed). But during my tests I found another possible error.
I made a small application (as a NetBeans C++ project) with two tasks:
1) a server task with its RT_QUEUE, waiting for requests
2) a client task which creates RT_QUEUEs for responses and sends requests to the server task
From time to time I get a segmentation error.
It is always in the server task, when the server binds to the client's queue and allocates a message buffer in it.
It seems that by the time the server starts to work with this buffer, the client may already have closed the queue.
But this shouldn't be possible, because normally any attempt to close a queue bound by someone else fails with an -EBUSY error.
The error needs some time to reproduce, and 2 CPUs (cores): one for the server and one for the client.
My configuration(s):
Athlon XP 2600GHz X86_64
kernel 2.6.24 (and 2.6.25.11)
adeos 2.6.24 2.0-03 (and 2.0-07)
xenomai 2.4.1 and 2.4.4
I'm also sending examples of the execution script and a suitable input.txt file; both of them should be much longer (input.txt could be several MB)!
The attachment also contains a disassembly of my executable.
And finally, one of the segmentation error messages:
[ 2553.818731] QT_SERVER[5919]: segfault at 2aaaaac96800 rip 4022b5 rsp 4000fe00 error 6
There are other variants, but always while working with the allocated send buffer.
I know, I'm annoying, but I can't help myself....
;-) Petr [-- Attachment #2: queuetest.tar.bz2 --] [-- Type: application/octet-stream, Size: 10090 bytes --] [-- Attachment #3: runme --] [-- Type: application/octet-stream, Size: 164 bytes --] #!/bin/sh ../dist/Debug/GNU-Linux-x86/queuetest < input2.txt ../dist/Debug/GNU-Linux-x86/queuetest < input2.txt ../dist/Debug/GNU-Linux-x86/queuetest < input2.txt [-- Attachment #4: input.txt --] [-- Type: text/plain, Size: 156 bytes --] send 490000 recv echo 490000 sleep 100 send 490000 recv echo 490000 sleep 100 send 490000 recv echo 490000 sleep 100 send 490000 recv echo 490000 sleep 100 [-- Attachment #5: queuetest.asm.tar.bz2 --] [-- Type: application/octet-stream, Size: 52235 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-help] Segmentation error by heavy dynamic RT_QUEUE usage 2008-07-31 16:14 ` [Xenomai-help] Segmentation error by heavy dynamic RT_QUEUE usage Petr Cervenka @ 2008-08-12 14:37 ` Philippe Gerum 0 siblings, 0 replies; 24+ messages in thread From: Philippe Gerum @ 2008-08-12 14:37 UTC (permalink / raw) To: Petr Cervenka; +Cc: xenomai Petr Cervenka wrote: > Hello, > I wanted to make an small example to find the kernel panic (and I failed with it). But during my tests I found another possible error. > I made a small application (as netbeans c++ project) with two tasks: > 1) server task with its RT_QUEUE waiting for a request. > 2) client task which creates RT_QUEUES for response and sends requests to the server task >>From time to time I get an segmentation error. > It's always in the server task, when the server binds the clients queue, allocates a message buffer in it. > It seems when the server starts to work with this buffer, the client could already close the queue. > But this shouldn't be possible, because normally any attempt to close a queue binded by someone else ends with -EBUSY error. > The error needs some time to produce and 2 CPUs (cores). One for server and one for client. > My configuration(s): > Athlon XP 2600GHz X86_64 > kernel 2.6.24 (and 2.6.25.11) > adeos 2.6.24 2.0-03 (and 2.0-07) > xenomai 2.4.1 and 2.4.4 > I'm sending also examples of the execution script and proper input.txt file > both of them should be much longer (input.txt could be several MB)!!!! > In the attachement there is also disassemble of my executable > And finally, one of the segmentation error messages: > [ 2553.818731] QT_SERVER[5919]: segfault at 2aaaaac96800 rip 4022b5 rsp 4000fe00 error 6 > But there are more types, but allways when working with the allocated send buffer. > I know, I'm annoying, but I can't help myself.... ;-) Yeah, but I can't help running useful test code people cared to write either, so that's ok. 
There was a silly bug in the userland wrapper: it unmapped the memory pool from the application process even though the syscall would then deny the deletion (-EBUSY). This issue affects RT_HEAP objects in the very same way. Fixed in both trees. Thanks for narrowing the issue down.

Note: creating / binding to a _shared_ queue switches the caller to secondary mode, because in both cases, we need to use regular kernel services to mmap() the memory pool to the application process.

--- src/skins/native/queue.c	(revision 4086)
+++ src/skins/native/queue.c	(working copy)
@@ -114,21 +114,18 @@
 {
 	int err;
 
-	err = __real_munmap(q->mapbase, q->mapsize);
-
-	if (err)
-		return -EINVAL;
-
 	err = XENOMAI_SKINCALL1(__native_muxid, __native_queue_delete, q);
-
 	if (err)
 		return err;
 
+	if (__real_munmap(q->mapbase, q->mapsize))
+		err = -errno;
+
 	q->opaque = XN_NO_HANDLE;
 	q->mapbase = NULL;
 	q->mapsize = 0;
 
-	return 0;
+	return err;
 }
 
 void *rt_queue_alloc(RT_QUEUE *q, size_t size)

PS: careful with the subject line: heavy / light RT_QUEUE usage is irrelevant wrt this bug; it is purely the call sequence (queue_create -> queue_bind -> queue_delete) that triggers the rt_queue_delete() wrapper issue. -- Philippe. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing)
2008-07-21 11:26 ` Jan Kiszka
2008-07-31 16:14 ` [Xenomai-help] Segmentation error by heavy dynamic RT_QUEUE usage Petr Cervenka
@ 2008-08-13 11:01 ` Jan Kiszka
2008-08-13 15:29 ` [Xenomai-core] [PATCH] Fix stat overruns on 64-bit Philippe Gerum
1 sibling, 1 reply; 24+ messages in thread
From: Jan Kiszka @ 2008-08-13 11:01 UTC (permalink / raw)
To: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 2497 bytes --]

Jan Kiszka wrote:
> Petr Cervenka wrote:
>> I ran a test during last weekend (and nothing wrong happened). But the /proc/xenomai/stat output is strange. Probably some type cast error, because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate value perhaps should be 0x000000008A939FDE = 2324930526.
>>
>> CPU  PID    MSW      CSW                   PF  STAT      %CPU  NAME
>> 0    0      0        18446744071739514846  0   00500088  69.8  ROOT/0
>> 1    0      0        18446744071675175740  0   00500080  23.2  ROOT/1
>> 0    5299   0        351459                0   00300182   0.0  LOGGER_TASK_1804289383
>> 0    5100   8        283613                0   00300186   0.0
>> 0    5317   0        40591                 0   00300182   0.0
>> 0    5034   2        2330696               0   00300184   0.0  MAIN_TASK_2056
>> 0    5318   5        18446744071736105613  3   00300180  29.5  REG_TASK_2056
>> 0    5319   28       36                    0   00300182   0.0  WORK_TASK_2056
>> 0    5321   38926    39159                 0   00300380   0.0  CERECV_2056
>> 0    5323   1159385  2438330               0   00300181   0.0  CESEND_2056
>> 1    5710   0        18446744071675175740  0   00300184  76.8  HARDWARE_KERNEL
>> 0    0      0        18446744071964064315  0   00000000   0.7  IRQ520: [timer]
>> 1    0      0        232145209             0   00000000   0.0  IRQ520: [timer]
>
> OK, at least this bug is a bit easier to fix. Please try this patch
> (which also takes the chance and extends the range of our stat counters
> a bit):
>
> Index: xenomai/include/nucleus/stat.h
> ===================================================================
> --- xenomai/include/nucleus/stat.h	(Revision 4060)
> +++ xenomai/include/nucleus/stat.h	(Arbeitskopie)
> @@ -84,20 +84,20 @@ do { \
>  
>  
>  typedef struct xnstat_counter {
> -	int counter;
> +	unsigned long counter;
>  } xnstat_counter_t;
>  
> -static inline int xnstat_counter_inc(xnstat_counter_t *c)
> +static inline unsigned long xnstat_counter_inc(xnstat_counter_t *c)
>  {
>  	return c->counter++;
>  }
>  
> -static inline int xnstat_counter_get(xnstat_counter_t *c)
> +static inline unsigned long xnstat_counter_get(xnstat_counter_t *c)
>  {
>  	return c->counter;
>  }
>  
> -static inline void xnstat_counter_set(xnstat_counter_t *c, int value)
> +static inline void xnstat_counter_set(xnstat_counter_t *c, unsigned long value)
>  {
>  	c->counter = value;
>  }

OK to apply those bits?

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
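The odd CSW values reported above are consistent with a signed 32-bit counter that wrapped negative and was then widened to 64 bits for printing. The sketch below reproduces only the arithmetic; `struct counter_old`, `struct counter_new`, `widen_old()` and `widen_new()` are names made up for this illustration and are not the nucleus code, and how the real /proc handler performs the widening is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Pre-patch shape: a signed int counter (32 bits on x86_64 Linux). */
struct counter_old {
	int counter;
};

/* Post-patch shape: unsigned long (64 bits on x86_64 Linux). */
struct counter_new {
	unsigned long counter;
};

/*
 * Widening a negative int to a 64-bit unsigned type sign-extends it,
 * so the upper 32 bits all become 1 (0xFFFFFFFF........).
 */
unsigned long long widen_old(const struct counter_old *c)
{
	assert(c != NULL);
	return (unsigned long long)c->counter;
}

/* An unsigned counter widens with zero extension; the value is preserved. */
unsigned long long widen_new(const struct counter_new *c)
{
	assert(c != NULL);
	return (unsigned long long)c->counter;
}
```

As a usage example, an `int` holding -1970036770 has the same 32-bit pattern as 0x8A939FDE (2324930526 when read as unsigned). Passing it to `widen_old()` yields 18446744071739514846, the first CSW value in the report, whereas `widen_new()` on a counter holding 2324930526 returns 2324930526 unchanged.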
* Re: [Xenomai-core] [PATCH] Fix stat overruns on 64-bit
2008-08-13 11:01 ` [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Jan Kiszka
@ 2008-08-13 15:29 ` Philippe Gerum
0 siblings, 0 replies; 24+ messages in thread
From: Philippe Gerum @ 2008-08-13 15:29 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Petr Cervenka wrote:
>>> I ran a test during last weekend (and nothing wrong happened). But the /proc/xenomai/stat output is strange. Probably some type cast error, because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate value perhaps should be 0x000000008A939FDE = 2324930526.
>>>
>>> CPU  PID    MSW      CSW                   PF  STAT      %CPU  NAME
>>> 0    0      0        18446744071739514846  0   00500088  69.8  ROOT/0
>>> 1    0      0        18446744071675175740  0   00500080  23.2  ROOT/1
>>> 0    5299   0        351459                0   00300182   0.0  LOGGER_TASK_1804289383
>>> 0    5100   8        283613                0   00300186   0.0
>>> 0    5317   0        40591                 0   00300182   0.0
>>> 0    5034   2        2330696               0   00300184   0.0  MAIN_TASK_2056
>>> 0    5318   5        18446744071736105613  3   00300180  29.5  REG_TASK_2056
>>> 0    5319   28       36                    0   00300182   0.0  WORK_TASK_2056
>>> 0    5321   38926    39159                 0   00300380   0.0  CERECV_2056
>>> 0    5323   1159385  2438330               0   00300181   0.0  CESEND_2056
>>> 1    5710   0        18446744071675175740  0   00300184  76.8  HARDWARE_KERNEL
>>> 0    0      0        18446744071964064315  0   00000000   0.7  IRQ520: [timer]
>>> 1    0      0        232145209             0   00000000   0.0  IRQ520: [timer]
>> OK, at least this bug is a bit easier to fix. Please try this patch
>> (which also takes the chance and extends the range of our stat counters
>> a bit):
>>
>> Index: xenomai/include/nucleus/stat.h
>> ===================================================================
>> --- xenomai/include/nucleus/stat.h	(Revision 4060)
>> +++ xenomai/include/nucleus/stat.h	(Arbeitskopie)
>> @@ -84,20 +84,20 @@ do { \
>>  
>>  
>>  typedef struct xnstat_counter {
>> -	int counter;
>> +	unsigned long counter;
>>  } xnstat_counter_t;
>>  
>> -static inline int xnstat_counter_inc(xnstat_counter_t *c)
>> +static inline unsigned long xnstat_counter_inc(xnstat_counter_t *c)
>>  {
>>  	return c->counter++;
>>  }
>>  
>> -static inline int xnstat_counter_get(xnstat_counter_t *c)
>> +static inline unsigned long xnstat_counter_get(xnstat_counter_t *c)
>>  {
>>  	return c->counter;
>>  }
>>  
>> -static inline void xnstat_counter_set(xnstat_counter_t *c, int value)
>> +static inline void xnstat_counter_set(xnstat_counter_t *c, unsigned long value)
>>  {
>>  	c->counter = value;
>>  }
>
> OK to apply those bits?
>

Sure. Please apply to both branches.

> Jan
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core

--
Philippe.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing)
@ 2008-08-13 15:48 Fillod Stephane
2008-08-13 17:02 ` Gilles Chanteperdrix
2008-08-13 17:53 ` Philippe Gerum
0 siblings, 2 replies; 24+ messages in thread
From: Fillod Stephane @ 2008-08-13 15:48 UTC (permalink / raw)
To: Jan Kiszka, xenomai-core
Jan Kiszka wrote:
>/proc/xenomai/stat output is strange. Probably some type cast error,
> because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate
> value perhaps should be 0x000000008A939FDE = 2324930526.
[...]
Reminds me that other pending patch for /proc/xenomai/faults:
https://mail.gna.org/public/xenomai-core/2007-12/msg00064.html
--
Stephane
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing)
2008-08-13 15:48 [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Fillod Stephane
@ 2008-08-13 17:02 ` Gilles Chanteperdrix
2008-08-13 20:50 ` Philippe Gerum
2008-08-13 17:53 ` Philippe Gerum
1 sibling, 1 reply; 24+ messages in thread
From: Gilles Chanteperdrix @ 2008-08-13 17:02 UTC (permalink / raw)
To: Fillod Stephane; +Cc: Jan Kiszka, xenomai-core

Fillod Stephane wrote:
> Jan Kiszka wrote:
>> /proc/xenomai/stat output is strange. Probably some type cast error,
>> because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate
>> value perhaps should be 0x000000008A939FDE = 2324930526.
> [...]
>
> Reminds me that other pending patch for /proc/xenomai/faults:
> https://mail.gna.org/public/xenomai-core/2007-12/msg00064.html

december 2007? Oh dear! You should remind us more often when we
forg^H^H^H^H take so much time to include your patches.

--
Gilles.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing)
2008-08-13 17:02 ` Gilles Chanteperdrix
@ 2008-08-13 20:50 ` Philippe Gerum
0 siblings, 0 replies; 24+ messages in thread
From: Philippe Gerum @ 2008-08-13 20:50 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

Gilles Chanteperdrix wrote:
> Fillod Stephane wrote:
>> Jan Kiszka wrote:
>>> /proc/xenomai/stat output is strange. Probably some type cast error,
>>> because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate
>>> value perhaps should be 0x000000008A939FDE = 2324930526.
>> [...]
>>
>> Reminds me that other pending patch for /proc/xenomai/faults:
>> https://mail.gna.org/public/xenomai-core/2007-12/msg00064.html
>
> december 2007? Oh dear! You should remind us more often when we
> forg^H^H^H^H take so much time to include your patches.
>

Well, technically, this patch was not forgotten, but was, mmff... "swapped out".
Fact is that my swapper-in sometimes gets swapped out as well. Working on it.

--
Philippe.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing)
2008-08-13 15:48 [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Fillod Stephane
2008-08-13 17:02 ` Gilles Chanteperdrix
@ 2008-08-13 17:53 ` Philippe Gerum
1 sibling, 0 replies; 24+ messages in thread
From: Philippe Gerum @ 2008-08-13 17:53 UTC (permalink / raw)
To: Fillod Stephane; +Cc: Jan Kiszka, xenomai-core

Fillod Stephane wrote:
> Jan Kiszka wrote:
>> /proc/xenomai/stat output is strange. Probably some type cast error,
>> because 18446744071739514846 = 0xFFFFFFFF8A939FDE and the appropriate
>> value perhaps should be 0x000000008A939FDE = 2324930526.
> [...]
>
> Reminds me that other pending patch for /proc/xenomai/faults:
> https://mail.gna.org/public/xenomai-core/2007-12/msg00064.html
>

Finally applied, thanks.

--
Philippe.
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2008-08-13 20:50 UTC | newest]

Thread overview: 24+ messages
2008-07-07 15:45 [Xenomai-help] Kernel panic: not syncing Petr Cervenka
2008-07-07 15:59 ` Philippe Gerum
2008-07-08 8:31 ` Petr Cervenka
2008-07-08 8:38 ` Jan Kiszka
2008-07-08 9:21 ` Gilles Chanteperdrix
2008-07-08 9:33 ` Jan Kiszka
2008-07-09 15:19 ` Petr Cervenka
2008-07-09 16:05 ` Philippe Gerum
2008-07-10 13:45 ` Petr Cervenka
2008-07-11 13:18 ` Petr Cervenka
2008-07-15 14:42 ` Petr Cervenka
2008-07-15 15:03 ` Jan Kiszka
2008-07-16 8:39 ` Petr Cervenka
2008-07-17 10:21 ` Jan Kiszka
2008-07-21 10:58 ` Petr Cervenka
2008-07-21 11:26 ` Jan Kiszka
2008-07-31 16:14 ` [Xenomai-help] Segmentation error by heavy dynamic RT_QUEUE usage Petr Cervenka
2008-08-12 14:37 ` Philippe Gerum
2008-08-13 11:01 ` [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Jan Kiszka
2008-08-13 15:29 ` [Xenomai-core] [PATCH] Fix stat overruns on 64-bit Philippe Gerum
-- strict thread matches above, loose matches on Subject: below --
2008-08-13 15:48 [Xenomai-core] [PATCH] Fix stat overruns on 64-bit (was: [Xenomai-help] Kernel panic: not syncing) Fillod Stephane
2008-08-13 17:02 ` Gilles Chanteperdrix
2008-08-13 20:50 ` Philippe Gerum
2008-08-13 17:53 ` Philippe Gerum