* [Adeos-main] Some notes on Adeos on ARM 920T (EP9301) with Linux 2.4.21-rmk1
@ 2004-12-13 14:02 Michael Neuhauser
2004-12-18 12:55 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Michael Neuhauser @ 2004-12-13 14:02 UTC (permalink / raw)
To: adeos-main
[-- Attachment #1: Type: text/plain, Size: 9711 bytes --]
First of all, sorry for the length of this - I've tried to be as brief
as possible.
I'm working with the EDB9301, a development board from Cirrus with an
ARM EP9301 CPU (ARM920T, 166 MHz). The board manufacturer supplies a
Linux patch for the 2.4.21-rmk1 Kernel (giving 2.4.21-rmk1-crus1.4.2).
The goal is to provide Adeos based RTAI on this system, supporting
hard-realtime both in kernel and user-space (i.e. LXRT).
I have Adeos now running stable on the system and want to share some of
the experiences I've gained and issues I've found in the porting
process. I will touch general issues first, but if wanted, I can provide
the architecture specific things later too (i.e. full source code, but
that will take a little time to clean up).
I've based my work on the Adeos cvs and the ARM patches included in
various RTAI versions.
1) idle loop (this may affect other architectures than ARM too)
---------------------------------------------------------------
If the CPU (like the EP9301) has a special HALT state and this feature
is used in the idle loop (in arch/arm/kernel/process.c, via arch_idle())
void default_idle(void)
{
local_irq_disable();
if (!current->need_resched)
arch_idle();
local_irq_enable();
}
then the following can happen when Adeos is used: local_irq_disable()
stalls the pipeline - interrupts that happen after this call and before
arch_idle() are handled by Adeos (put into pipe, mask in PIC) but
nothing more. If, by chance, *all* active interrupts happen just in this
time window, then the system is going to be locked-up because the HALT
state can not be left again:
* only an interrupt could wake-up the CPU from the HALT state
* all interrupts are masked and will not be delivered to the CPU
This actually happened quite often on my system: everything
froze until I sent a character to the otherwise idle 2nd UART (i.e.
triggered an unmasked interrupt).
To fix this, disable the interrupts on the hardware level in
default_idle(), not just stall the pipe (i.e. adeos_hw_cli() instead of
local_irq_disable()) if Adeos is configured.
See attached patch-file linux-2.4.21-rmk1-crus1.4.2_idle-lock-up.patch
Maybe using the HALT state should be avoided at all, when hard real-time
performance is the goal. Some RTAI patches for other ARM sub-archs have
comments that indicate problems with long wake-up times. EP9301 is OK in
this respect (comparing the latency on an idle system when HALT state is
used to using a simple busy loop showed no clear winner).
2) call asm_do_IRQ() instead of do_IRQ()
----------------------------------------
>From a mailing-list contribution by Russel M. King (sorry can't find
exact reference any more):
More thoughts - if this is 2.4.19-rmk6, calling do_IRQ is out.
do_IRQ is not safe to be called from anywhere other than within
properly-locked IRQ context. This means that all the locking which
is required would not be done, soft IRQs would not normally be run,
and things would go bang.
[...]
vector_IRQ -> {__irq_usr,__irq_svc} -> asm_do_IRQ -> do_IRQ
not
vector_IRQ -> {__irq_usr,__irq_svc} -> do_IRQ
This also holds for 2.4.21-rmk1 (especially the not called soft-irqs
result in strange phenomena). This can be fixed by replacing do_IRQ()
with asm_do_IRQ() in two places:
* in adeos/armv.c:__adeos_enable_pipeline()
* in arch/arm/kernel/adeos.c:__adeos_handle_irq()
But this makes it necessary to skip the irq fix-up in
arch/arm/kernel/irq.c:asm_do_IRQ() when the pipeline is active (because
it was already done in __adeos_handle_irq()).
See attached patch-file linux-2.4.21-rmk1-crus1.4.2_asm_do_irq.patch
I still haven't figured out if the further comments of RMK apply also:
Please note that asm_do_IRQ is intended to be called only once while
handling each interrupt, so:
asm_do_IRQ -> do_IRQ -> somewhere else -> asm_do_IRQ
isn't legal, but:
asm_do_IRQ -> do_IRQ -> somewhere else -> do_IRQ
or:
asm_do_IRQ -> do_IRQ -> somewhere else -> (cpu interrupt) ->
vector_IRQ -> __irq_svc -> asm_do_IRQ -> do_IRQ
are both legal.
I.e. the "somewhere else" would be __adeos_sync_stage(). But I haven't
noticed any troubles under heavy interrupt and general system load so it
seems to be OK the way it is now.
3) crash when Adeos is compiled into kernel (i.e. not module)
-------------------------------------------------------------
In the adeos-cvs __adeos_takeover() is called from
2.4.19-arm-rmk7-pxa2/init/main.c:do_basic_setup() when Adeos is not
compiled as a module. This crashes under 2.4.21-rmk1-crus1.4.2, while
calling it at the end of kernel/adeos.c:__adoes_init() does work. (I've
got this fix from the RTAI-list message "Preliminary ARM patch for
Stromboli/newLXRT" by Humberto Luiz Valdivia de Matos.)
I don't have PXA hardware so I can't check if it works there as it is.
4) __adeos_enter_syscall() gets called with wrong "regs" argument
-----------------------------------------------------------------
__adeos_enter_syscall() is called from
arch/arm/kernel/entry-common.S:vector_swi to monitor system-call events.
The syscall-number and a pointer to the pt_regs structure is passed. In
code:
ENTRY(vector_swi)
save_user_regs
[...]
str r4, [sp, #-S_OFF]! @ push fifth arg
[...]
#ifdef CONFIG_ADEOS_CORE
stmfd sp!, {r0-r3}
add r0, sp, #S_OFF
mov r1, scno
bl __adeos_enter_syscall
cmp r0,#0
ldmfd sp!, {r0-r3}
bne __adeos_fast_ret
#endif /* CONFIG_ADEOS_CORE */
ldr ip, [tsk, #TSK_PTRACE] @ check for syscall tracing
[...]
Register r0 should point to the stack where the registers were pushed
onto it with "save_user_regs", S_OFF is added to correct for the "str
r4, [sp, #-S_OFF]!" instruction. But the saving of the registers just
before the add is not accounted for. So the line
add r0, sp, #S_OFF
has to be replaced with
add r0, sp, #(4*4 + S_OFF) @ let r0 point to user regs
4*4: registers r0-r3 are saved, 4 bytes each.
Please note, that I've also moved the "ldr ip, [tsk, #TSK_PTRACE]" after
the ifdef so that ip hasn't to be saved on the stack.
Performance Enhancements
========================
P1) don't use write-back caching (all architectures that use virtual
addresses to access the cache, not really Adeos specific)
-------------------------------------------------------------------
Don't configure the d-cache to use write-back - it increases the worst
case irq latency by ~ 40 us on the EP9301:
* cache is dirty (i.e. contains modified data not written to RAM yet)
* a (Linux process) context switch happens (with irqs hard disabled of
course)
* cache has to be invalidated (because cache uses virtual addresses and
next process might use same address range)
* dirty words have to be written back to RAM, RAM is slow
-> 40 us of hard locked irqs.
So always use write-through caching when you need fast reaction to
interrupts. To be able to select this option in 2.4.21-rmk1, one has to
fix arch/arm/config.in (replace CONFIG_CPU_DISABLE_DCACHE with
CONFIG_CPU_DCACHE_DISABLE) and never use "make xconfig" it silently
resets the option (menuconfig, oldconfig and config is OK).
P2) optimize away adp_pipelined flag if not compiled as module
--------------------------------------------------------------
A little optimization "Preliminary ARM patch for Stromboli/newLXRT" by
Humberto Luiz Valdivia de Matos: the flag "adp_pipelined" is replaced by
a define to 1 if Adeos is directly compiled into the Kernel.
Conditionals on adp_pipelined can than be resolved during compile-time
(performance gain might not be much, but at my low end system everything
counts).
See attached patch-file linux-2.4.21-rmk1-crus1.4.2_pipelined_flag.patch
P3) return value of __adeos_handle_irq() is not used, don't compute it
----------------------------------------------------------------------
On the i386 architecture, the return value of __adeos_handle_irq() is
used to decide which return path to take when returning from an
interrupt.
* the slow path (may do reschedule, signal_return) if the current domain
is the root-domain and its pipeline is not stalled
* the fast path (just restore-registers and do "iret") otherwise
On the ARMv architecture, things are handled differently (see
arch/arm/kernel/entry-armv.S): the same return path through ret_to_user
will be taken, no matter what __adeos_handle_irq() returns.
So to save a few cycles (and cache space), __adeos_handle_irq()
should be made void so no return value needs to be computed (it would
also reflect more clearly what is really going on).
See attached patch-file linux-2.4.21-rmk1-crus1.4.2_handle_irq_noretval.patch
P4) IPIPE_SYNC_FLAG only needed for SMP
---------------------------------------
* arch/arm/kernel/adeos.c:__adeos_sync_stage() is the only place where
the IPIPE_SYNC_FLAG of a pipeline is modified.
* the flag is only modified when adeos_lock_cpu() or adeos_hw_cli() was
called before
* the flag is cleared before interrupts are hard enabled and the handler
is called
-> test_and_set_bit() on the sync-flag can never be true on an UP system
-> IPIPE_SYNC_FLAG is only needed for SMP
To save a few cycles (test_and_set_bit() is rather costly on ARM as the
MSR has to be read/modified), use "#ifdef CONFIG_SMP" around all uses of
IPIPE_SYNC_FLAG in arch/arm/kernel/adeos.c:__adeos_sync_stage().
See attached patch-file linux-2.4.21-rmk1-crus1.4.2_ipipe_sync_flag_smp.patch
Kind regards,
Mike
--
Dr. Michael Neuhauser phone: +43 1 789 08 49 - 30
Firmix Software GmbH fax: +43 1 789 08 49 - 55
Vienna/Austria/Europe email: mike@domain.hid
Embedded Linux Development and Services http://www.firmix.at/
[-- Attachment #2: linux-2.4.21-rmk1-crus1.4.2_asm_do_irq.patch --]
[-- Type: text/x-patch, Size: 2260 bytes --]
diff -d -u -r1.1.1.1 irq.c
--- arch/arm/kernel/irq.c 18 Oct 2004 10:00:17 -0000 1.1.1.1
+++ arch/arm/kernel/irq.c 13 Dec 2004 11:04:48 -0000
@@ -310,6 +318,11 @@
*/
asmlinkage void asm_do_IRQ(int irq, struct pt_regs *regs)
{
+#ifdef CONFIG_ADEOS_CORE
+ /* Only fix-up irq if pipline is not active (then it was already done
+ * in __adeos_handle_irq(). <mike@domain.hid> */
+ if (!adp_pipelined)
+#endif /* CONFIG_ADEOS_CORE */
irq = fixup_irq(irq);
/*
diff -u adeos-cvs/v2.4/adeos-core/adeos/armv.c adeos/armv.c
--- adeos-cvs/v2.4/adeos-core/adeos/armv.c 2004-10-03 16:18:50.000000000 +0200
+++ adeos/armv.c 2004-12-13 12:08:46.000000000 +0100
@@ -38,8 +42,8 @@
extern struct pt_regs __adeos_irq_regs;
-asmlinkage void do_IRQ(int irq,
- struct pt_regs *regs);
+asmlinkage void asm_do_IRQ(int irq,
+ struct pt_regs *regs);
static struct irqdesc __adeos_std_irq_desc[NR_IRQS];
@@ -126,8 +130,11 @@
/* First, virtualize all interrupts from the root domain. */
for (irq = 0; irq < NR_IRQS; irq++)
+ /* Note that according to Russel King asm_do_IRQ() has to used as the
+ * handler, not do_IRQ() (at least for 2.4.21-rmk7) as the original
+ * ADEOS code had here. <mike@domain.hid> */
adeos_virtualize_irq(irq,
- (void (*)(unsigned))&do_IRQ,
+ (void (*)(unsigned))&asm_do_IRQ,
&__adeos_ack_irq,
IPIPE_HANDLE_MASK|IPIPE_PASS_MASK);
diff -u adeos-cvs/v2.4/adeos-core/arch/arm/kernel/adeos.c adeos/armv.c
--- adeos-cvs/v2.4/adeos-core/arch/arm/kernel/adeos.c 2004-10-16 20:37:39.000000000 +0200
+++ arch/arm/kernel/adeos.c 2004-12-13 12:11:17.000000000 +0100
@@ -33,8 +33,8 @@
#include <asm/io.h>
#include <asm/arch/irq.h>
-asmlinkage void do_IRQ(int irq,
- struct pt_regs *regs);
+asmlinkage void asm_do_IRQ(int irq,
+ struct pt_regs *regs);
/* A global flag telling whether Adeos pipelining is engaged. */
int adp_pipelined;
@@ -236,8 +236,11 @@
if (!adp_pipelined)
{
- do_IRQ(irq,regs);
+ /* Note that according to Russel King asm_do_IRQ() has to be called (at
+ * least for 2.4.21-rmk7), not do_IRQ() as the original ADEOS code had
+ * here. <mike@domain.hid> */
+ asm_do_IRQ(irq,regs);
return 1;
}
if (!adeos_virtual_irq_p(irq))
[-- Attachment #3: linux-2.4.21-rmk1-crus1.4.2_handle_irq_noretval.patch --]
[-- Type: text/plain, Size: 2084 bytes --]
--- adeos-core/include/asm-arm/adeos.h 2004-11-09 09:02:14.000000000 +0100
+++ include/asm-arm/adeos.h 2004-12-09 23:35:49.000000000 +0100
@@ -279,8 +281,8 @@
void __adeos_tune_timer(unsigned long hz);
-asmlinkage int __adeos_handle_irq(int irq,
- struct pt_regs *regs);
+asmlinkage void __adeos_handle_irq(int irq,
+ struct pt_regs *regs);
asmlinkage int __adeos_switch_domain(adomain_t *adp,
adomain_t **currentp);
--- adeos-core/arch/arm/kernel/adeos.c 2004-12-13 12:30:35.000000000 +0100
+++ arch/arm/kernel/adeos.c 2004-12-13 12:30:46.000000000 +0100
@@ -224,7 +236,7 @@
interrupt protection log is maintained here for each
domain. Interrupts are off on entry. */
-asmlinkage int __adeos_handle_irq (int irq, struct pt_regs *regs)
+asmlinkage void __adeos_handle_irq (int irq, struct pt_regs *regs)
{
struct list_head *head, *pos;
@@ -240,7 +252,7 @@
* least for 2.4.21-rmk7), not do_IRQ() as the original ADEOS code had
* here. <mike@domain.hid> */
asm_do_IRQ(irq,regs);
- return 1;
+ return;
}
if (!adeos_virtual_irq_p(irq))
@@ -249,7 +261,7 @@
if (irq >= IPIPE_NR_IRQS)
{
printk(KERN_ERR "ADEOS: spurious interrupt %d\n",irq);
- return 1;
+ return;
}
adeos_load_cpuid();
@@ -328,8 +340,24 @@
__adeos_walk_pipeline(head,cpuid);
+#if 0
+ /* @TODO@
+ * The return value of this function is only used on i386 but not on ARM or
+ * PPC! On i386 the following is done in x86.c:
+ * r = __adeos_handle_irq()
+ * if (r) // current domain is root-domain && ipipe is not stalled?
+ * asm("cld") // slow path (may do reschedule, signal_return)
+ * jmp ret_from_intr
+ * else // current domain is not root-domain || ipipe is stalled
+ * restore regs // fast path
+ * asm("iret")
+ * --<mike@domain.hid>
+ */
return (adp_cpu_current[cpuid] == adp_root &&
!test_bit(IPIPE_STALL_FLAG,&adp_root->cpudata[cpuid].status));
+#endif
+
+ return;
}
/* adeos_trigger_irq() -- Push the interrupt to the pipeline entry
[-- Attachment #4: linux-2.4.21-rmk1-crus1.4.2_idle-lock-up.patch --]
[-- Type: text/x-patch, Size: 918 bytes --]
diff -d -u -r1.1.1.1 process.c
--- arch/arm/kernel/process.c 18 Oct 2004 10:00:17 -0000 1.1.1.1
+++ arch/arm/kernel/process.c 13 Dec 2004 10:21:41 -0000
@@ -66,17 +66,30 @@
void (*pm_idle)(void);
void (*pm_power_off)(void);
+#ifndef CONFIG_ADEOS_CORE
/*
* This is our default idle handler. We need to disable
* interrupts here to ensure we don't miss a wakeup call.
*/
void default_idle(void)
{
+#ifdef CONFIG_ADEOS_CORE
+ /* if we don't do a hard cli, we might miss an interrupt and
+ * sleep for an indefinite time (system lock-up or increased
+ * interrupt latency) --<mike@domain.hid> */
+ adeos_hw_cli();
+#else
local_irq_disable();
+#endif
if (!current->need_resched && !hlt_counter)
arch_idle();
+#ifdef CONFIG_ADEOS_CORE
+ adeos_hw_sti();
+#else
local_irq_enable();
+#endif
}
+#endif /* !CONFIG_ADEOS_CORE */
/*
* The idle thread. We try to conserve power, while trying to keep
[-- Attachment #5: linux-2.4.21-rmk1-crus1.4.2_ipipe_sync_flag_smp.patch --]
[-- Type: text/plain, Size: 1084 bytes --]
--- adeos-core/arch/arm/kernel/adeos.c 2004-12-13 12:30:35.000000000 +0100
+++ arch/arm/kernel/adeos.c 2004-12-13 12:30:46.000000000 +0100
@@ -114,8 +116,10 @@
do
{
+#ifdef CONFIG_SMP
if (unlikely(test_and_set_bit(IPIPE_SYNC_FLAG,&cpudata->status)))
goto release_cpu_and_exit;
+#endif
/* The policy here is to keep the dispatching code
interrupt-free by stalling the current stage. If the upper
@@ -155,7 +159,9 @@
set_bit(IPIPE_STALL_FLAG,&cpudata->status);
+#ifdef CONFIG_SMP
clear_bit(IPIPE_SYNC_FLAG,&cpudata->status);
+#endif
#ifdef CONFIG_ADEOS_PROFILING
__adeos_profile_data[cpuid].irqs[irq].n_synced++;
@@ -174,16 +180,22 @@
clear_bit(IPIPE_STALL_FLAG,&cpudata->status);
+#ifdef CONFIG_SMP
if (test_and_set_bit(IPIPE_SYNC_FLAG,&cpudata->status))
goto release_cpu_and_exit;
+#endif
}
}
+#ifdef CONFIG_SMP
clear_bit(IPIPE_SYNC_FLAG,&cpudata->status);
+#endif
}
while ((cpudata->irq_pending_hi & syncmask) != 0);
+#ifdef CONFIG_SMP
release_cpu_and_exit:
+#endif
adeos_unlock_cpu(flags);
}
[-- Attachment #6: linux-2.4.21-rmk1-crus1.4.2_pipelined_flag.patch --]
[-- Type: text/plain, Size: 922 bytes --]
--- adeos-core/include/asm-arm/adeos.h 2004-11-09 09:02:14.000000000 +0100
+++ include/asm-arm/adeos.h 2004-12-13 14:29:27.000000000 +0100
@@ -42,7 +42,11 @@
#error "Adeos: unsupported ARM architecture, sorry..."
#endif /* CONFIG_ARCH_SA1100 */
+#ifdef CONFIG_ADEOS_MODULE
extern int adp_pipelined;
+#else /* !CONFIG_ADEOS_MODULE */
+#define adp_pipelined (1) /* optimize away flag tests if not compiled as module */
+#endif /* CONFIG_ADEOS_MODULE */
typedef unsigned long cpumask_t;
--- adeos-core/arch/arm/kernel/adeos.c 2004-12-13 14:04:55.000000000 +0100
+++ arch/arm/kernel/adeos.c 2004-12-13 12:30:46.000000000 +0100
@@ -36,8 +36,10 @@
asmlinkage void asm_do_IRQ(int irq,
struct pt_regs *regs);
+#ifdef CONFIG_ADEOS_MODULE
/* A global flag telling whether Adeos pipelining is engaged. */
int adp_pipelined;
+#endif /* CONFIG_ADEOS_MODULE */
struct pt_regs __adeos_irq_regs;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Adeos-main] Some notes on Adeos on ARM 920T (EP9301) with Linux 2.4.21-rmk1
2004-12-13 14:02 [Adeos-main] Some notes on Adeos on ARM 920T (EP9301) with Linux 2.4.21-rmk1 Michael Neuhauser
@ 2004-12-18 12:55 ` Philippe Gerum
2004-12-20 15:31 ` Michael Neuhauser
0 siblings, 1 reply; 4+ messages in thread
From: Philippe Gerum @ 2004-12-18 12:55 UTC (permalink / raw)
To: Michael Neuhauser; +Cc: adeos-main
On Mon, 2004-12-13 at 15:02, Michael Neuhauser wrote:
> First of all, sorry for the length of this - I've tried to be as brief
> as possible.
>
> I'm working with the EDB9301, a development board from Cirrus with an
> ARM EP9301 CPU (ARM920T, 166 MHz). The board manufacturer supplies a
> Linux patch for the 2.4.21-rmk1 Kernel (giving 2.4.21-rmk1-crus1.4.2).
> The goal is to provide Adeos based RTAI on this system, supporting
> hard-realtime both in kernel and user-space (i.e. LXRT).
>
> I have Adeos now running stable on the system and want to share some of
> the experiences I've gained and issues I've found in the porting
> process. I will touch general issues first, but if wanted, I can provide
> the architecture specific things later too (i.e. full source code, but
> that will take a little time to clean up).
>
> I've based my work on the Adeos cvs and the ARM patches included in
> various RTAI versions.
>
> 1) idle loop (this may affect other architectures than ARM too)
> ---------------------------------------------------------------
>
> If the CPU (like the EP9301) has a special HALT state and this feature
> is used in the idle loop (in arch/arm/kernel/process.c, via arch_idle())
>
> void default_idle(void)
> {
> local_irq_disable();
> if (!current->need_resched)
> arch_idle();
> local_irq_enable();
> }
>
> then the following can happen when Adeos is used: local_irq_disable()
> stalls the pipeline - interrupts that happen after this call and before
> arch_idle() are handled by Adeos (put into pipe, mask in PIC) but
> nothing more. If, by chance, *all* active interrupts happen just in this
> time window, then the system is going to be locked-up because the HALT
> state can not be left again:
> * only an interrupt could wake-up the CPU from the HALT state
> * all interrupts are masked and will not be delivered to the CPU
>
> This actually happened quite often on my system: everything
> froze until I sent a character to the otherwise idle 2nd UART (i.e.
> triggered an unmasked interrupt).
>
> To fix this, disable the interrupts on the hardware level in
> default_idle(), not just stall the pipe (i.e. adeos_hw_cli() instead of
> local_irq_disable()) if Adeos is configured.
>
This has been overlooked in the Adeos patch for ARM. x86 properly
flushes the interrupt log when calling safe_halt(), but locking out hw
IRQs seems indeed the best way to prevent the CPU to enter the halted
mode whilst there is some IRQ-awaken task to schedule, specifically for
non-preemptible kernels.
> See attached patch-file linux-2.4.21-rmk1-crus1.4.2_idle-lock-up.patch
>
> Maybe using the HALT state should be avoided at all, when hard real-time
> performance is the goal. Some RTAI patches for other ARM sub-archs have
> comments that indicate problems with long wake-up times. EP9301 is OK in
> this respect (comparing the latency on an idle system when HALT state is
> used to using a simple busy loop showed no clear winner).
>
Power consumption?
> 2) call asm_do_IRQ() instead of do_IRQ()
> ----------------------------------------
>
> >From a mailing-list contribution by Russel M. King (sorry can't find
> exact reference any more):
>
> More thoughts - if this is 2.4.19-rmk6, calling do_IRQ is out.
> do_IRQ is not safe to be called from anywhere other than within
> properly-locked IRQ context. This means that all the locking which
> is required would not be done, soft IRQs would not normally be run,
> and things would go bang.
>
> [...]
>
> vector_IRQ -> {__irq_usr,__irq_svc} -> asm_do_IRQ -> do_IRQ
> not
> vector_IRQ -> {__irq_usr,__irq_svc} -> do_IRQ
>
> This also holds for 2.4.21-rmk1 (especially the not called soft-irqs
> result in strange phenomena). This can be fixed by replacing do_IRQ()
> with asm_do_IRQ() in two places:
> * in adeos/armv.c:__adeos_enable_pipeline()
> * in arch/arm/kernel/adeos.c:__adeos_handle_irq()
> But this makes it necessary to skip the irq fix-up in
> arch/arm/kernel/irq.c:asm_do_IRQ() when the pipeline is active (because
> it was already done in __adeos_handle_irq()).
>
> See attached patch-file linux-2.4.21-rmk1-crus1.4.2_asm_do_irq.patch
>
Ok, I'll apply this too.
> I still haven't figured out if the further comments of RMK apply also:
>
> Please note that asm_do_IRQ is intended to be called only once while
> handling each interrupt, so:
>
> asm_do_IRQ -> do_IRQ -> somewhere else -> asm_do_IRQ
>
> isn't legal, but:
>
> asm_do_IRQ -> do_IRQ -> somewhere else -> do_IRQ
> or:
> asm_do_IRQ -> do_IRQ -> somewhere else -> (cpu interrupt) ->
> vector_IRQ -> __irq_svc -> asm_do_IRQ -> do_IRQ
>
> are both legal.
>
> I.e. the "somewhere else" would be __adeos_sync_stage(). But I haven't
> noticed any troubles under heavy interrupt and general system load so it
> seems to be OK the way it is now.
>
> 3) crash when Adeos is compiled into kernel (i.e. not module)
> -------------------------------------------------------------
>
> In the adeos-cvs __adeos_takeover() is called from
> 2.4.19-arm-rmk7-pxa2/init/main.c:do_basic_setup() when Adeos is not
> compiled as a module. This crashes under 2.4.21-rmk1-crus1.4.2, while
> calling it at the end of kernel/adeos.c:__adoes_init() does work. (I've
> got this fix from the RTAI-list message "Preliminary ARM patch for
> Stromboli/newLXRT" by Humberto Luiz Valdivia de Matos.)
>
The location in the boot sequence where the pipeline should be enabled
looks like much more arch-dependent than I had first expected (e.g.
Adeos 2.6/ia64 required some changes there). This said, the crash might
be related to the change making adp_pipelined a constant. In any case,
this is what happened to the Adeos 2.6/PPC port.
> I don't have PXA hardware so I can't check if it works there as it is.
>
> 4) __adeos_enter_syscall() gets called with wrong "regs" argument
> -----------------------------------------------------------------
>
> __adeos_enter_syscall() is called from
> arch/arm/kernel/entry-common.S:vector_swi to monitor system-call events.
> The syscall-number and a pointer to the pt_regs structure is passed. In
> code:
>
> ENTRY(vector_swi)
> save_user_regs
> [...]
> str r4, [sp, #-S_OFF]! @ push fifth arg
> [...]
> #ifdef CONFIG_ADEOS_CORE
> stmfd sp!, {r0-r3}
> add r0, sp, #S_OFF
> mov r1, scno
> bl __adeos_enter_syscall
> cmp r0,#0
> ldmfd sp!, {r0-r3}
> bne __adeos_fast_ret
> #endif /* CONFIG_ADEOS_CORE */
> ldr ip, [tsk, #TSK_PTRACE] @ check for syscall tracing
> [...]
>
> Register r0 should point to the stack where the registers were pushed
> onto it with "save_user_regs", S_OFF is added to correct for the "str
> r4, [sp, #-S_OFF]!" instruction. But the saving of the registers just
> before the add is not accounted for. So the line
>
> add r0, sp, #S_OFF
>
> has to be replaced with
>
> add r0, sp, #(4*4 + S_OFF) @ let r0 point to user regs
>
> 4*4: registers r0-r3 are saved, 4 bytes each.
>
> Please note, that I've also moved the "ldr ip, [tsk, #TSK_PTRACE]" after
> the ifdef so that ip hasn't to be saved on the stack.
>
Ok, internal syscall of mine returning -ENOBRAIN. The Adeos 2.4/ARM
patch against 2.4.19 is in fact a merge between a contributed patch for
sa1100 and a previous patch of mine against uclinux for mmuless AT91
platforms. Whilst the latter has been reasonably tested, the merged
stuff has not.
> Performance Enhancements
> ========================
>
> P1) don't use write-back caching (all architectures that use virtual
> addresses to access the cache, not really Adeos specific)
> -------------------------------------------------------------------
>
> Don't configure the d-cache to use write-back - it increases the worst
> case irq latency by ~ 40 us on the EP9301:
>
> * cache is dirty (i.e. contains modified data not written to RAM yet)
> * a (Linux process) context switch happens (with irqs hard disabled of
> course)
> * cache has to be invalidated (because cache uses virtual addresses and
> next process might use same address range)
> * dirty words have to be written back to RAM, RAM is slow
>
> -> 40 us of hard locked irqs.
>
> So always use write-through caching when you need fast reaction to
> interrupts. To be able to select this option in 2.4.21-rmk1, one has to
> fix arch/arm/config.in (replace CONFIG_CPU_DISABLE_DCACHE with
> CONFIG_CPU_DCACHE_DISABLE) and never use "make xconfig" it silently
> resets the option (menuconfig, oldconfig and config is OK).
>
Ok.
> P2) optimize away adp_pipelined flag if not compiled as module
> --------------------------------------------------------------
>
> A little optimization "Preliminary ARM patch for Stromboli/newLXRT" by
> Humberto Luiz Valdivia de Matos: the flag "adp_pipelined" is replaced by
> a define to 1 if Adeos is directly compiled into the Kernel.
> Conditionals on adp_pipelined can than be resolved during compile-time
> (performance gain might not be much, but at my low end system everything
> counts).
>
Warning: this optimization was there in the early Adeos days, but I've
discarded it because it broke the PPC port. Basically, some archs need
to perform part of their init chores with interrupts enabled, thus
requesting the pipeline to be up and running. Thus, the grey area of
code between the hw mask enable and the pipeline init does cause
problems. On PPC, this is clearly the case. On x86, things are a bit
more complex: unless the IO-APIC support is enabled, adp_pipelined can
be constant because the whole boot process preceeding the pipeline init
is done IRQs off. If IO-APIC is enabled, we have to wait for the IRQ
routing and other init chores to be performed before the pipeline is
finally initialized, so we must have a variable adp_pipelined state.
This said, the best approach is likely to make this optimization on an
arch-dependent basis if nothing prevents it.
> See attached patch-file linux-2.4.21-rmk1-crus1.4.2_pipelined_flag.patch
>
> P3) return value of __adeos_handle_irq() is not used, don't compute it
> ----------------------------------------------------------------------
>
> On the i386 architecture, the return value of __adeos_handle_irq() is
> used to decide which return path to take when returning from an
> interrupt.
> * the slow path (may do reschedule, signal_return) if the current domain
> is the root-domain and its pipeline is not stalled
> * the fast path (just restore-registers and do "iret") otherwise
>
> On the ARMv architecture, things are handled differently (see
> arch/arm/kernel/entry-armv.S): the same return path through ret_to_user
> will be taken, no matter what __adeos_handle_irq() returns.
>
> So to save a few cycles (and cache space), __adeos_handle_irq()
> should be made void so no return value needs to be computed (it would
> also reflect more clearly what is really going on).
>
> See attached patch-file linux-2.4.21-rmk1-crus1.4.2_handle_irq_noretval.patch
>
Whoops, no. The real problem is the IRQ return path not caring from the
return value of __adeos_handle_irq(), not the other way around. The
current patch has a serious flaw, it should check this return value, and
branch out of any Linux epilogue code if it is non-zero.
The reason is that if this value is non-zero, then the current IRQ has
either preempted a non-Linux domain, or has preempted Linux (e.g. to
serve more prioritary domains, or at least to log the interrupt event)
while the Linux pipeline stage was stalled, i.e. while Linux thought it
was running in an interrupt-free section.
If you do not prevent the Linux epilogue code to run upon IRQ exit in
the latter case, then you are likely to execute it in a spurious
context. This issue is even more important with preemptible kernels,
because the epilogue code does even more complex things.
> P4) IPIPE_SYNC_FLAG only needed for SMP
> ---------------------------------------
>
> * arch/arm/kernel/adeos.c:__adeos_sync_stage() is the only place where
> the IPIPE_SYNC_FLAG of a pipeline is modified.
> * the flag is only modified when adeos_lock_cpu() or adeos_hw_cli() was
> called before
> * the flag is cleared before interrupts are hard enabled and the handler
> is called
> -> test_and_set_bit() on the sync-flag can never be true on an UP system
> -> IPIPE_SYNC_FLAG is only needed for SMP
>
In fact, the need for this flag originates from x86, where some drivers
(e.g. PC keyboard one) do sit in a busy loop over an ISR waiting or the
next interrupt to come from the device they dialog with (yes, it's
ugly), instead of using some kind of automaton to deal with this
asynchronously from the kernel execution POV. For this to work, then you
need to be able to re-enter __adeos_sync_stage(). So it's actually
needed on UP too for some hw.
I cannot tell for ARM, but since __adeos_sync_stage() is arch-dependent,
I guess it's ok to remove it if there is no driver problem such as the
one plaguing x86.
> To save a few cycles (test_and_set_bit() is rather costly on ARM as the
> MSR has to be read/modified), use "#ifdef CONFIG_SMP" around all uses of
> IPIPE_SYNC_FLAG in arch/arm/kernel/adeos.c:__adeos_sync_stage().
>
> See attached patch-file linux-2.4.21-rmk1-crus1.4.2_ipipe_sync_flag_smp.patch
> Kind regards,
> Mike
--
Philippe.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Adeos-main] Some notes on Adeos on ARM 920T (EP9301) with Linux 2.4.21-rmk1
2004-12-18 12:55 ` Philippe Gerum
@ 2004-12-20 15:31 ` Michael Neuhauser
2004-12-22 22:47 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Michael Neuhauser @ 2004-12-20 15:31 UTC (permalink / raw)
To: rpm; +Cc: adeos-main
Thank you for commenting on my notes.
On Sat, 2004-12-18 at 13:55, Philippe Gerum wrote:
> On Mon, 2004-12-13 at 15:02, Michael Neuhauser wrote:
> > [...]
> > 1) idle loop (this may affect other architectures than ARM too)
> > ---------------------------------------------------------------
> >
> > If the CPU (like the EP9301) has a special HALT state and this feature
> > is used in the idle loop (in arch/arm/kernel/process.c, via arch_idle())
> >
> > void default_idle(void)
> > {
> > local_irq_disable();
> > if (!current->need_resched)
> > arch_idle();
> > local_irq_enable();
> > }
> >
> > then the following can happen when Adeos is used: local_irq_disable()
> > stalls the pipeline - interrupts that happen after this call and before
> > arch_idle() are handled by Adeos (put into pipe, mask in PIC) but
> > nothing more. If, by chance, *all* active interrupts happen just in this
> > time window, then the system is going to be locked-up because the HALT
> > state can not be left again:
> > * only an interrupt could wake-up the CPU from the HALT state
> > * all interrupts are masked and will not be delivered to the CPU
> >
> > This actually happened quite often on my system: everything
> > froze until I sent a character to the otherwise idle 2nd UART (i.e.
> > triggered an unmasked interrupt).
> >
> > To fix this, disable the interrupts on the hardware level in
> > default_idle(), not just stall the pipe (i.e. adeos_hw_cli() instead of
> > local_irq_disable()) if Adeos is configured.
>
> This has been overlooked in the Adeos patch for ARM. x86 properly
> flushes the interrupt log when calling safe_halt(), but locking out hw
> IRQs seems indeed the best way to prevent the CPU to enter the halted
> mode whilst there is some IRQ-awaken task to schedule, specifically for
> non-preemptible kernels.
As I understand it, the problem has nothing to do with flushing the
interrupt log or tasks not being scheduled. There is a lock-up on the
hardware level: all interrupts that could force the CPU from HALT are
masked -> HALT state is never left again. It's not likely to happen but
it does happen on my (slow) system (depends on interrupt load etc.).
> > Maybe using the HALT state should be avoided at all, when hard real-time
> > performance is the goal. [...]
>
> Power consumption?
I imagine some people will gladly accept increased power consumption if
it shaves off some us from the latency (if this is really the case for
some ARM hardware, it is not with mine). So this might not be a good
idea for default behaviour.
> >[...]
> > 3) crash when Adeos is compiled into kernel (i.e. not module)
> >[...]
>
> The location in the boot sequence where the pipeline should be enabled
> looks like much more arch-dependent than I had first expected (e.g.
> Adeos 2.6/ia64 required some changes there). This said, the crash might
> be related to the change making adp_pipelined a constant. In any case,
> this is what happened to the Adeos 2.6/PPC port.
I'll try it without making adp_pipelined a constant.
> > [...]
> > [return value of __adeos_handle_irq() is not used, don't compute it]
> > [...]
>
> Whoops, no. The real problem is the IRQ return path not caring from the
> return value of __adeos_handle_irq(), not the other way around. The
> current patch has a serious flaw, it should check this return value, and
> branch out of any Linux epilogue code if it is non-zero.
>
> The reason is that if this value is non-zero, then the current IRQ has
> either preempted a non-Linux domain, or has preempted Linux (e.g. to
> serve more prioritary domains, or at least to log the interrupt event)
> while the Linux pipeline stage was stalled, i.e. while Linux thought it
> was running in an interrupt-free section.
> If you do not prevent the Linux epilogue code to run upon IRQ exit in
> the latter case, then you are likely to execute it in a spurious
> context. This issue is even more important with preemptible kernels,
> because the epilogue code does even more complex things.
After spending a weekend buried in the entry-*.S stuff it is quite
obvious to me too. I have a preliminary patch (but it needs more
testing).
I think there are more problems with the exception handling stuff: the
stall-state of the root domain's ipipe is not preserved across the
exception. On i386 this is done by __adeos_if_fixup_root() and
__adeos_unstall_iret_root() but nothing the like exists for ARM.
Do you know if this was omitted accidentally or intentionally (i.e.
because it is not necessary on ARM, but I don't see how this could be
the case)?
> > P4) IPIPE_SYNC_FLAG only needed for SMP
> > ---------------------------------------
> >
> > * arch/arm/kernel/adeos.c:__adeos_sync_stage() is the only place where
> > the IPIPE_SYNC_FLAG of a pipeline is modified.
> > * the flag is only modified when adeos_lock_cpu() or adeos_hw_cli() was
> > called before
> > * the flag is cleared before interrupts are hard enabled and the handler
> > is called
> > -> test_and_set_bit() on the sync-flag can never be true on an UP system
> > -> IPIPE_SYNC_FLAG is only needed for SMP
>
> In fact, the need for this flag originates from x86, where some drivers
> (e.g. PC keyboard one) do sit in a busy loop over an ISR waiting or the
> next interrupt to come from the device they dialog with (yes, it's
> ugly), instead of using some kind of automaton to deal with this
> asynchronously from the kernel execution POV. For this to work, then you
> need to be able to re-enter __adeos_sync_stage(). So it's actually
> needed on UP too for some hw.
>
> I cannot tell for ARM, but since __adeos_sync_stage() is arch-dependent,
> I guess it's ok to remove it if there is no driver problem such as the
> one plaguing x86.
I think the difference between ARM & i386 is not the driver's behaviour
but the fact that ARM __adeos_sync_stage() does an adeos_lock_cpu()
(i386 doesn't) before setting the sync-flag and the sync-flag is cleared
before hw-interrupts are enabled again. So it is impossible (on UP) that
__adeos_sync_stage() is entered with sync-flag set (as long as nobody
else is modifying the flag, which doesn't seem to be the case).
Regards
Mike
--
Dr. Michael Neuhauser phone: +43 1 789 08 49 - 30
Firmix Software GmbH fax: +43 1 789 08 49 - 55
Vienna/Austria/Europe email: mike@domain.hid
Embedded Linux Development and Services http://www.firmix.at/
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Adeos-main] Some notes on Adeos on ARM 920T (EP9301) with Linux 2.4.21-rmk1
2004-12-20 15:31 ` Michael Neuhauser
@ 2004-12-22 22:47 ` Philippe Gerum
0 siblings, 0 replies; 4+ messages in thread
From: Philippe Gerum @ 2004-12-22 22:47 UTC (permalink / raw)
To: Michael Neuhauser; +Cc: adeos-main
On Mon, 2004-12-20 at 16:31, Michael Neuhauser wrote:
> Thank you for commenting on my notes.
>
> On Sat, 2004-12-18 at 13:55, Philippe Gerum wrote:
> > On Mon, 2004-12-13 at 15:02, Michael Neuhauser wrote:
> > > [...]
> > > 1) idle loop (this may affect other architectures than ARM too)
> > > ---------------------------------------------------------------
> > >
> > > If the CPU (like the EP9301) has a special HALT state and this feature
> > > is used in the idle loop (in arch/arm/kernel/process.c, via arch_idle())
> > >
> > > void default_idle(void)
> > > {
> > > local_irq_disable();
> > > if (!current->need_resched)
> > > arch_idle();
> > > local_irq_enable();
> > > }
> > >
> > > then the following can happen when Adeos is used: local_irq_disable()
> > > stalls the pipeline - interrupts that happen after this call and before
> > > arch_idle() are handled by Adeos (put into pipe, mask in PIC) but
> > > nothing more. If, by chance, *all* active interrupts happen just in this
> > > time window, then the system is going to be locked-up because the HALT
> > > state can not be left again:
> > > * only an interrupt could wake-up the CPU from the HALT state
> > > * all interrupts are masked and will not be delivered to the CPU
> > >
> > > This actually happened quite often on my system: everything
> > > froze until I sent a character to the otherwise idle 2nd UART (i.e.
> > > triggered an unmasked interrupt).
> > >
> > > To fix this, disable the interrupts on the hardware level in
> > > default_idle(), not just stall the pipe (i.e. adeos_hw_cli() instead of
> > > local_irq_disable()) if Adeos is configured.
> >
> > This has been overlooked in the Adeos patch for ARM. x86 properly
> > flushes the interrupt log when calling safe_halt(), but locking out hw
> > IRQs seems indeed the best way to prevent the CPU to enter the halted
> > mode whilst there is some IRQ-awaken task to schedule, specifically for
> > non-preemptible kernels.
>
> As I understand it, the problem has nothing to do with flushing the
> interrupt log or tasks not being scheduled. There is a lock-up on the
> hardware level: all interrupts that could force the CPU from HALT are
> masked -> HALT state is never left again. It's not likely to happen but
> it does happen on my (slow) system (depends on interrupt load etc.).
>
Indeed, this could happen with a particular optimization done in the
latest patches, which actually masks IRQs at hw level when stalling the
highest priority stage, since no other domain could preempt it anyway.
This saves a bunch of microseconds when the hi-priority domain runs in
interrupt-free sections, by avoiding further (useless) preemption to run
__adeos_handle_irq() (which among other things has to ack the interrupt
at hw level).
x86 solves this by calling safe_halt() which explicitely re-enables hw
IRQs before calling the 'hlt' insn. Maybe something alike is missing in
our case.
> > > Maybe using the HALT state should be avoided at all, when hard real-time
> > > performance is the goal. [...]
> >
> > Power consumption?
>
> I imagine some people will gladly accept increased power consumption if
> it shaves off some us from the latency (if this is really the case for
> some ARM hardware, it is not with mine). So this might not be a good
> idea for default behaviour.
>
Maybe, since waking up a CPU from a halted state is likely to be
power-consuming too; I guess the right solution is a matter of finding
the right trade-off, as usual. This could be made optional at the
Adeos-level if the option to choose the idle mode is not already
provided by the kernel.
> > >[...]
> > > 3) crash when Adeos is compiled into kernel (i.e. not module)
> > >[...]
> >
> > The location in the boot sequence where the pipeline should be enabled
> > looks like much more arch-dependent than I had first expected (e.g.
> > Adeos 2.6/ia64 required some changes there). This said, the crash might
> > be related to the change making adp_pipelined a constant. In any case,
> > this is what happened to the Adeos 2.6/PPC port.
>
> I'll try it without making adp_pipelined a constant.
>
> > > [...]
> > > [return value of __adeos_handle_irq() is not used, don't compute it]
> > > [...]
> >
> > Whoops, no. The real problem is the IRQ return path not caring from the
> > return value of __adeos_handle_irq(), not the other way around. The
> > current patch has a serious flaw, it should check this return value, and
> > branch out of any Linux epilogue code if it is non-zero.
> >
> > The reason is that if this value is non-zero, then the current IRQ has
> > either preempted a non-Linux domain, or has preempted Linux (e.g. to
> > serve more prioritary domains, or at least to log the interrupt event)
> > while the Linux pipeline stage was stalled, i.e. while Linux thought it
> > was running in an interrupt-free section.
> > If you do not prevent the Linux epilogue code to run upon IRQ exit in
> > the latter case, then you are likely to execute it in a spurious
> > context. This issue is even more important with preemptible kernels,
> > because the epilogue code does even more complex things.
>
> After spending a weekend buried in the entry-*.S stuff it is quite
> obvious to me too. I have a preliminary patch (but it needs more
> testing).
>
> I think there are more problems with the exception handling stuff: the
> stall-state of the root domain's ipipe is not preserved across the
> exception. On i386 this is done by __adeos_if_fixup_root() and
> __adeos_unstall_iret_root() but nothing the like exists for ARM.
>
> Do you know if this was omitted accidentally or intentionally (i.e.
> because it is not necessary on ARM, but I don't see how this could be
> the case)?
>
__adeos_fixup_root() is a rather recent fix I've made for x86. I've
struggled for weeks against random and infrequent bugs induced by the
wrong software interrupt bit being spuriously inverted by an exception
return before seing the light... Its absence for ARM is only due to the
fact that I did not analyse the situation of this arch wrt to this
issue.
> > > P4) IPIPE_SYNC_FLAG only needed for SMP
> > > ---------------------------------------
> > >
> > > * arch/arm/kernel/adeos.c:__adeos_sync_stage() is the only place where
> > > the IPIPE_SYNC_FLAG of a pipeline is modified.
> > > * the flag is only modified when adeos_lock_cpu() or adeos_hw_cli() was
> > > called before
> > > * the flag is cleared before interrupts are hard enabled and the handler
> > > is called
> > > -> test_and_set_bit() on the sync-flag can never be true on an UP system
> > > -> IPIPE_SYNC_FLAG is only needed for SMP
> >
> > In fact, the need for this flag originates from x86, where some drivers
> > (e.g. PC keyboard one) do sit in a busy loop over an ISR waiting or the
> > next interrupt to come from the device they dialog with (yes, it's
> > ugly), instead of using some kind of automaton to deal with this
> > asynchronously from the kernel execution POV. For this to work, then you
> > need to be able to re-enter __adeos_sync_stage(). So it's actually
> > needed on UP too for some hw.
> >
> > I cannot tell for ARM, but since __adeos_sync_stage() is arch-dependent,
> > I guess it's ok to remove it if there is no driver problem such as the
> > one plaguing x86.
>
> I think the difference between ARM & i386 is not the driver's behaviour
> but the fact that ARM __adeos_sync_stage() does an adeos_lock_cpu()
> (i386 doesn't) before setting the sync-flag and the sync-flag is cleared
In any case, recent Adeos patches for 2.6 must enter
__adeos_sync_stage() with hw IRQs off (i.e. adeos_lock_cpu()) regardless
of the target arch; it's a prerequisite. This is why adeos_lock_cpu() is
no more called by __adeos_sync_stage() in the latest patches, but the
caller must provide for this protection in the first place.
> before hw-interrupts are enabled again. So it is impossible (on UP) that
> __adeos_sync_stage() is entered with sync-flag set (as long as nobody
> else is modifying the flag, which doesn't seem to be the case).
Not on UP, you are right. IIRC, on SMP, there used to be another issue
aside of the recursion one: cpu migration of IRQ handler contexts
started by a high-priority (non-Linux) domain. But I'm not even sure
that the latter still applies after the various rewrites I've made of
__adeos_sync_stage() over time. /me needs to check.
> Regards
> Mike
--
Philippe.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-12-22 22:47 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-12-13 14:02 [Adeos-main] Some notes on Adeos on ARM 920T (EP9301) with Linux 2.4.21-rmk1 Michael Neuhauser
2004-12-18 12:55 ` Philippe Gerum
2004-12-20 15:31 ` Michael Neuhauser
2004-12-22 22:47 ` Philippe Gerum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.