From: "Petr Cervenka" <grugh@domain.hid>
To: rpm@xenomai.org
Cc: jan.kiszka@domain.hid, xenomai@xenomai.org
Subject: Re: [Xenomai-help] FPU not available
Date: Fri, 08 Feb 2008 16:27:40 +0100
Message-ID: <200802081627.15934@domain.hid>
In-Reply-To: <47AC56A1.8090706@domain.hid>
______________________________________________________________
> From: rpm@xenomai.org
> To: Petr Cervenka <grugh@domain.hid>
> CC: jan.kiszka@domain.hid, xenomai@xenomai.org
> Date: 08.02.2008 14:25
> Subject: Re: [Xenomai-help] FPU not available
>
>Petr Cervenka wrote:
>>> Philippe Gerum wrote:
>>>> Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Gilles Chanteperdrix wrote:
>>>>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka <grugh@domain.hid> wrote:
>>>>>>>> Hello.
>>>>>>>> Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>>>> Now we have a problem: in one of our tasks we are sometimes unable to use floating point operations (under very specific circumstances). In such a case, that task crashes immediately, but the rest of the application runs "normally". Output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>>>>>>> Is there anything we can check or change?
>>>>>>>> Petr Cervenka
>>>>>>> I do not know if this is related to the issue you are facing, but the
>>>>>>> first FPU fault of a thread running in primary mode may be handled by
>>>>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>>>>> epilogue implicitly expects Xenomai to have switched the fault to
>>>>>>> secondary mode and uses some secondary mode services such as
>>>>>>> ipipe_restore_root, whereas the thread never left primary mode.
>>>>>>>
>>>>>> Good point! That is probably this path (and not the one I stared at):
>>>>>>
>>>>>> __ipipe_handle_exception()
>>>>>>     ...
>>>>>>     if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>>>>         local_irq_restore(flags);
>>>>>>         return 1;
>>>>>>     }
>>>>>>
>>>>>> That needs some more thoughts...
>>>>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>>>> the early, context-independent __ipipe_stall_root(). Can we postpone
>>>>> this safely after having called any potential high-stage hooks for this
>>>>> exception, and then only if the callee migrated the thread to the root
>>>>> domain? Or is there a need to have the root domain stalled across the
>>>>> post-fault migration?
>>>>>
>>>> Someone from the root domain may want to get notified of the exceptions
>>>> occurring in that domain too, in which case we may not postpone the
>>>> virtual mask fixup after the notifier invocation, otherwise we would
>>>> call the handler with a broken interrupt state.
>>>>
>>>>> In the latter case, we would have to fiddle with the stall bits directly
>>>>> instead of calling local_irq_restore - not just to work around the
>>>>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>>>>> non-root domains...
>>>>>
>>>> This used to be done by ipipe_restore_pipeline_nosync() in older
>>>> patches, but this one has disappeared after the flat log refactoring. We
>>>> indeed need to resurrect something alike in order to reset the stall bit
>>>> without calling the syncer, when taking the fast exit path after
>>>> ipipe_trap_notify().
>>> Hmm, so it could be fairly simple in fact:
>>>
>>> --- a/arch/x86/kernel/ipipe.c
>>> +++ b/arch/x86/kernel/ipipe.c
>>> @@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
>>> #endif /* CONFIG_KGDB */
>>>
>>>      if (unlikely(ipipe_trap_notify(vector, regs))) {
>>> -        local_irq_restore(flags);
>>> +        if (!flags)
>>> +            __clear_bit(IPIPE_STALL_FLAG,
>>> +                        &ipipe_root_cpudom_var(status));
>>>          return 1;
>>>      }
>>>
>>> Petr, ready to try?
>>>
>> I tried this patch and the problem (or the race condition) disappeared. ;-)
>> Is there any (easy) method to recognise if the problem was solved?
>>
>
>This one won't break the whole thing...
>
>diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
>index ce24db7..af9d4c4 100644
>--- a/arch/x86/kernel/ipipe.c
>+++ b/arch/x86/kernel/ipipe.c
>@@ -758,6 +758,7 @@ int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
> #endif /* CONFIG_KGDB */
>
>     if (unlikely(ipipe_trap_notify(vector, regs))) {
>+        WARN_ON(!ipipe_root_domain_p);
>         if (!flags)
>             __clear_bit(IPIPE_STALL_FLAG,
>                         &ipipe_root_cpudom_var(status));
>
I applied the "WARN_ON" patch and got the kernel bug again.
The dmesg output is attached. I hope it helps a little this time.
It seems that the warning started printing and then the error occurred (if I understand correctly).
>> To your previous questions:
>> We use an Athlon64 X2 (2 cores, 64-bit) with Kubuntu 7.10 amd64.
>> We have 2 real-time userspace applications: a kind of server for rtnet communication with special measuring hardware, and clients (1-4 instances) for computing, configuration, Ethernet communication, etc. Communication between the server and the clients goes through named rt_queues. Providing a "failing example" is probably impossible.
>> Any attempt with IPIPE_DEBUG and the tracer removes the race condition.
>>
>> Thank you VERY MUCH for your help and support (all of you).
>> Petr
>>
>>> Jan
>>>
>>> --
>>> Siemens AG, Corporate Technology, CT SE 2
>>> Corporate Competence Center Embedded Linux
>>>
>>
>>
>> _______________________________________________
>> Xenomai-help mailing list
>> Xenomai-help@domain.hid
>> https://mail.gna.org/listinfo/xenomai-help
>>
>
>
>--
>Philippe.
>
[ 52.570242] WARNING: at arch/x86/kernel/ipipe.c:758 __ipipe_handle_exception()
[ 52.570249] Pid: 4758, comm: REG_TASK_2056 Not tainted 2.6.24-adeos #4
[ 52.570251]
[ 52.570251] Call Trace:
[ 52.570283] ------------[ cut here ]------------
[ 52.570318] kernel BUG at kernel/ipipe/core.c:321!
[ 52.570351] invalid opcode: 0000 [1] PREEMPT SMP
[ 52.570449] CPU 0
[ 52.570499] Modules linked in: rt_r8169 rtpacket rtnet rfcomm l2cap bluetooth ppdev container ac sbs sbshc dock battery lp irtty_sir sir_dev irda psmouse parport_pc parport crc_ccitt serio_raw k8temp pcspkr shpchp pci_hotplug button i2c_nforce2 i2c_core af_packet ipv6 evdev ext3 jbd mbcache sg sd_mod sata_nv ata_generic forcedeth libata amd74xx scsi_mod ide_core ehci_hcd ohci_hcd usbcore fan fuse
[ 52.571610] Pid: 4758, comm: REG_TASK_2056 Not tainted 2.6.24-adeos #4
[ 52.571645] RIP: 0010:[<ffffffff80278e47>] [<ffffffff80278e47>] __ipipe_restore_root+0x47/0x50
[ 52.571714] RSP: 0000:ffff81003e16bd88 EFLAGS: 00010002
[ 52.571747] RAX: ffffffff8067caa0 RBX: 00000009e4457f0f RCX: 0000000000000003
[ 52.571782] RDX: ffff810080993000 RSI: ffffffff8022743f RDI: 0000000000000001
[ 52.571816] RBP: ffff81003e16bd88 R08: ffff810001008420 R09: 0000000000000004
[ 52.571851] R10: ffff81003e16bda8 R11: 0000000000000000 R12: 0000000000000001
[ 52.571886] R13: ffff81003e16a000 R14: ffff81003e16bffd R15: 0000000000000000
[ 52.571921] FS: 0000000040091950(0063) GS:ffffffff805f5000(0000) knlGS:0000000000000000
[ 52.571963] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 52.571997] CR2: 00002b6d57fb623d CR3: 000000003a0c5000 CR4: 00000000000006e0
[ 52.572031] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 52.572066] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 52.572101] Process REG_TASK_2056 (pid: 4758, threadinfo ffff81003e16a000, task ffff810035432790)
[ 52.572144] Stack: ffff81003e16bdb8 ffffffff8023af74 ffff81003e16be98 ffffffff806777a8
[ 52.572288] ffff810080993000 ffff81003e16a000 ffff81003e16bdc8 ffffffff80273809
[ 52.572411] ffff81003e16bde8 ffffffff8027383e ffffffff8022743f ffff81003e16bf00
[ 52.572509] Call Trace:
[ 52.572569] [<ffffffff8023af74>] cpu_clock+0x84/0xa0
[ 52.572604] [<ffffffff80273809>] get_timestamp+0x9/0x10
[ 52.572638] [<ffffffff8027383e>] touch_softlockup_watchdog+0x2e/0x40
[ 52.572675] [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[ 52.572711] [<ffffffff80220c5a>] touch_nmi_watchdog+0x1a/0x80
[ 52.572746] [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[ 52.572783] [<ffffffff8020def1>] print_trace_address+0x11/0x20
[ 52.572818] [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[ 52.572853] [<ffffffff8020d8ab>] dump_trace+0x10b/0x2c0
[ 52.572890] [<ffffffff80413d18>] exception_event+0x48/0x60
[ 52.572925] [<ffffffff8020daa3>] show_trace+0x43/0x60
[ 52.572960] [<ffffffff8020e1aa>] dump_stack+0x6a/0x80
[ 52.572995] [<ffffffff8022743f>] __ipipe_handle_exception+0x25f/0x270
[ 52.573033] [<ffffffff804a2c73>] error_sti+0x1e/0x52
[ 52.573071]
[ 52.573100]
[ 52.573100] Code: 0f 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53
[ 52.573595] RIP [<ffffffff80278e47>] __ipipe_restore_root+0x47/0x50
[ 52.573650] RSP <ffff81003e16bd88>
[ 52.573690] ---[ end trace c09fed11ada7a064 ]---
[ 52.573723] note: REG_TASK_2056[4758] exited with preempt_count 1
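
The patch quoted above replaces a restore that runs the syncer with a direct reset of the root stall bit on the fast exit path after ipipe_trap_notify(). The following is a minimal userspace sketch of that idea, not I-pipe code. Every name in it (root_model, model_stall_root, model_restore_sync, model_restore_nosync, MODEL_STALL_BIT) is hypothetical, and it assumes that the BUG_ON mentioned in the thread rejects a syncing restore issued from outside the root domain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the root-domain stall bit.
 * None of these names are real I-pipe symbols. */
#define MODEL_STALL_BIT 0x1UL

struct root_model {
    unsigned long status;   /* MODEL_STALL_BIT set: root's virtual IRQ mask is on */
    unsigned int pending;   /* interrupts logged while root was stalled */
    unsigned int replayed;  /* interrupts delivered by the syncer */
};

/* Models the early, context-independent stall on exception entry.
 * Returns the previous stall state, like the "flags" value in the patch. */
static unsigned long model_stall_root(struct root_model *r)
{
    unsigned long flags = r->status & MODEL_STALL_BIT;

    r->status |= MODEL_STALL_BIT;
    return flags;
}

/* Models a restore that also runs the syncer. Assumption: this is only
 * legal when the caller runs in the root domain, so the model reports
 * failure (standing in for the BUG_ON) and changes nothing otherwise. */
static bool model_restore_sync(struct root_model *r, unsigned long flags,
                               bool caller_in_root)
{
    if (!caller_in_root)
        return false;
    if (flags) {
        r->status |= MODEL_STALL_BIT;
    } else {
        r->status &= ~MODEL_STALL_BIT;
        r->replayed += r->pending;  /* replay the interrupt log */
        r->pending = 0;
    }
    return true;
}

/* Models the fast exit path from the patch: clear the stall bit if it
 * was clear on entry, and never run the syncer. */
static void model_restore_nosync(struct root_model *r, unsigned long flags)
{
    if (!flags)
        r->status &= ~MODEL_STALL_BIT;
}
```

In this model, a thread that takes the fault outside the root domain cannot use the syncing restore, but the non-syncing variant still resets the stall bit and leaves logged interrupts untouched for a later sync from the root domain.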