From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 08 Feb 2008 13:41:48 +0100
From: "Petr Cervenka"
MIME-Version: 1.0
Message-ID: <200802081341.13030@domain.hid>
References: <200710081503.15198@domain.hid>
 <200802061509.13010@domain.hid>
 <2ff1a98a0802070523r7af4ec4fv20f514b0cf1868c@domain.hid>
 <47AB0B79.8000709@domain.hid>
 <47AB0F88.3000001@domain.hid>
 <47AB174F.5070207@domain.hid>
 <47AB1C21.3070702@domain.hid>
In-Reply-To: <47AB1C21.3070702@domain.hid>
Content-Type: text/plain; charset="windows-1250"
Content-Transfer-Encoding: 8bit
Subject: Re: [Xenomai-help] FPU not available
List-Id: Help regarding installation and common use of Xenomai
To: jan.kiszka@domain.hid
Cc: xenomai@xenomai.org

>Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> On Wed, Feb 6, 2008 at 3:09 PM, Petr Cervenka wrote:
>>>>>> Hello.
>>>>>> Recently, we switched to a newer Linux distribution (Kubuntu 7.10). During this switch we changed many things (Xenomai 2.4.1, Linux kernel 2.6.24, x86_64 architecture, ...).
>>>>>> Now we have the problem that one of our tasks is sometimes unable to use floating-point operations (under very specific circumstances). In that case the task crashes immediately, but the rest of the application runs "normally". The output from dmesg is attached to this message. The task was created with the T_FPU flag.
>>>>>> Is there anything we can check or change?
>>>>>> Petr Cervenka
>>>>> I do not know if this is related to the issue you are facing, but the
>>>>> first FPU fault of a thread running in primary mode may be handled by
>>>>> Xenomai without switching to secondary mode. So, maybe the fault
>>>>> epilogue implicitly expects Xenomai to have switched the fault to
>>>>> secondary mode and uses some secondary mode services such as
>>>>> ipipe_restore_root, whereas the thread never left primary mode.
>>>>>
>>>> Good point!
>>>> That is probably this path (and not the one I stared at):
>>>>
>>>> __ipipe_handle_exception()
>>>> ...
>>>>         if (unlikely(ipipe_trap_notify(vector, regs))) {
>>>>                 local_irq_restore(flags);
>>>>                 return 1;
>>>>         }
>>>>
>>>> That needs some more thoughts...
>>> Looking at the whole __ipipe_handle_exception, the problem is related to
>>> the early, context-independent __ipipe_stall_root(). Can we postpone
>>> this safely after having called any potential high-stage hooks for this
>>> exception, and then only if the callee migrated the thread to the root
>>> domain? Or is there a need to have the root domain stalled across the
>>> post-fault migration?
>>>
>>
>> Someone from the root domain may want to get notified of the exceptions
>> occurring in that domain too, in which case we may not postpone the
>> virtual mask fixup after the notifier invocation, otherwise we would
>> call the handler with a broken interrupt state.
>>
>>> In the latter case, we would have to fiddle with the stall bits directly
>>> instead of calling local_irq_restore - not just to work around the
>>> BUG_ON, but also to avoid sync'ing root over potentially stalled
>>> non-root domains...
>>>
>>
>> This used to be done by ipipe_restore_pipeline_nosync() in older
>> patches, but this one has disappeared after the flat log refactoring. We
>> indeed need to resurrect something similar in order to reset the stall bit
>> without calling the syncer, when taking the fast exit path after
>> ipipe_trap_notify().
>
>Hmm, so it could be fairly simple in fact:
>
>--- a/arch/x86/kernel/ipipe.c
>+++ b/arch/x86/kernel/ipipe.c
>@@ -755,7 +755,9 @@ int __ipipe_handle_exception(struct pt_r
> #endif /* CONFIG_KGDB */
>
>         if (unlikely(ipipe_trap_notify(vector, regs))) {
>-                local_irq_restore(flags);
>+                if (!flags)
>+                        __clear_bit(IPIPE_STALL_FLAG,
>+                                    &ipipe_root_cpudom_var(status));
>                 return 1;
>         }
>
>Petr, ready to try?
>

I tried this patch and the problem (or the race condition) disappeared.
;-)
Is there any (easy) way to verify that the problem has really been solved?

To your previous questions:
We use an Athlon64 X2 (2 cores, 64-bit) with Kubuntu 7.10 amd64.
We have two real-time userspace applications: a server for RTnet communication with special measuring hardware, and clients (1-4 instances) for computation, configuration, Ethernet communication, etc. Communication between the server and the clients goes through named rt_queues.
A "failing example" is probably impossible to provide: any attempt with IPIPE_DEBUG and the tracer enabled makes the race condition go away.

Thank you VERY MUCH for your help and support (all of you).

Petr

>Jan
>
>--
>Siemens AG, Corporate Technology, CT SE 2
>Corporate Competence Center Embedded Linux
>
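
For readers of the archive, here is a minimal sketch of the distinction the patch relies on: restoring the root stall state through the syncer versus only clearing the stall bit on the fast exit path. It is a toy userspace model, not I-pipe code. Every name in it (toy_domain, toy_test_and_stall, toy_sync, toy_restore_sync, toy_restore_nosync) is hypothetical, and the "syncer" here merely counts replayed interrupts.

```c
#include <stdbool.h>

/* Toy model only: all names are hypothetical, none is part of the I-pipe API. */
struct toy_domain {
    bool stalled;        /* virtual interrupt mask (the "stall bit") */
    unsigned pending;    /* bitmap of logged, not yet delivered IRQs */
    unsigned delivered;  /* number of IRQs replayed by the syncer */
};

/* Stall the domain and return the previous stall state (the saved "flags"). */
static bool toy_test_and_stall(struct toy_domain *d)
{
    bool was_stalled = d->stalled;

    d->stalled = true;
    return was_stalled;
}

/* The "syncer": replay every pending IRQ, lowest bit first. */
static void toy_sync(struct toy_domain *d)
{
    while (d->pending) {
        d->pending &= d->pending - 1;  /* clear the lowest set bit */
        d->delivered++;
    }
}

/* Full restore: put the saved state back and, when unstalling, run the syncer. */
static void toy_restore_sync(struct toy_domain *d, bool was_stalled)
{
    d->stalled = was_stalled;
    if (!was_stalled)
        toy_sync(d);
}

/*
 * Fast-exit restore, mirroring the shape of the patch: clear the stall bit
 * only if the domain was not stalled before, and never call the syncer.
 */
static void toy_restore_nosync(struct toy_domain *d, bool was_stalled)
{
    if (!was_stalled)
        d->stalled = false;
}
```

In this model, toy_restore_nosync() leaves pending interrupts logged for a later sync point, whereas toy_restore_sync() replays them immediately, which is the side effect the thread wants to avoid on the fast exit after ipipe_trap_notify().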