linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: u.kleine-koenig@pengutronix.de (Uwe Kleine-König)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2 5/5] Cortex-M3: Add support for exception handling
Date: Tue, 13 Mar 2012 21:39:30 +0100	[thread overview]
Message-ID: <20120313203930.GD10400@pengutronix.de> (raw)
In-Reply-To: <20120309171026.GF14148@arm.com>

Hello Catalin,

On Fri, Mar 09, 2012 at 05:10:26PM +0000, Catalin Marinas wrote:
> On Mon, Mar 05, 2012 at 05:04:02PM +0000, Uwe Kleine-K?nig wrote:
> > To save r0, I'd readd OLD_R0 at the end of pt_regs (plus one buffer word
> > to get even alignment). Or would that already be unacceptable because
> > it's an ABI change, too?
> 
> If we preserve the first part of the pt_regs structure, we could add the
> exception return at the end (with two additional 32-bit words to
> preserve the 64-bit alignment).
I will do that.

> > --- a/arch/arm/kernel/entry-common.S
> > +++ b/arch/arm/kernel/entry-common.S
> > @@ -473,7 +480,7 @@ __sys_trace:
> >  
> >  	adr	lr, BSYM(__sys_trace_return)	@ return address
> >  	mov	scno, r0			@ syscall number (possibly new)
> > -	add	r1, sp, #S_R0 + S_OFF		@ pointer to regs
> > +	add	r1, sp, #S_OFF			@ pointer to regs
> 
> This change is no longer needed since S_R0 is no 0.
hmm, I thought I had removed them. Seems I didn't.
 
> > --- a/arch/arm/kernel/entry-header.S
> > +++ b/arch/arm/kernel/entry-header.S
> > @@ -26,7 +26,7 @@
> >   * The SWI code relies on the fact that R0 is at the bottom of the stack
> >   * (due to slow/fast restore user regs).
> >   */
> > -#if S_R0 != 0
> > +#if S_R0 != 0 && !defined(CONFIG_CPU_V7M)
> 
> Same here.
Same here ;-)
> 
> > +#ifdef CONFIG_CPU_V7M
> > +/*
> > + * ARMv7-M exception entry/exit macros.
> > + *
> > + * xPSR, ReturnAddress(), LR (R14), R12, R3, R2, R1, and R0 are
> > + * automatically saved on the current stack (32 words) before
> > + * switching to the exception stack (SP_main).
> > + *
> > + * If exception is taken while in user mode, SP_main is
> > + * empty. Otherwise, SP_main is aligned to 64 bit automatically
> > + * (CCR.STKALIGN set).
> > + *
> > + * Linux assumes that the interrupts are disabled when entering an
> > + * exception handler and it may BUG if this is not the case. Interrupts
> > + * are disabled during entry and reenabled in the exit macro.
> > + *
> > + * v7m_exception_fast_exit is used when returning from interrupts.
> > + *
> > + * v7m_exception_slow_exit is used when returning from SVC or PendSV.
> > + * When returning to kernel mode, we don't return from exception.
> > + */
> > +	.macro	v7m_exception_entry
> > +	cpsid	i
> > +	sub	sp, #S_FRAME_SIZE
> > +	stmia	sp, {r0-r12}
> > +
> > +	@ set r0 to the location of the registers saved by the core during
> > +	@ exception entry. Depending on the mode the cpu was in when the
> > +	@ exception happend that is either on the main or the process stack.
> > +	@ Bit 2 of EXC_RETURN stored in the lr register specifies which stack
> > +	@ was used.
> > +	tst	lr, #0x4
> > +	mrsne	r0, psp
> > +	addeq	r0, sp, #S_FRAME_SIZE
> 
> Could we use some other registers here like r8-r12 so that we keep r0-r7
> for syscall handling later and avoid another ldmia?
One upside of using r0 is that

	addeq	r0, sp, #S_FRAME_SIZE

can be encoded in 16 bits while this is not possible and 32 bits are
needed. And I wonder if it's allowed to corrupt r8-r12?

> > +	add	r0, r0, #20		@ skip over r0-r3, r12
> > +	ldmia	r0!, {r1-r3}		@ load saved lr, return address and xPSR
> > +
> > +	@ calculate the orignal stack pointer value.
> > +	@ r0 currently points to the memory location just above the auto saved
> > +	@ xPSR. If the FP extension is implemented and bit 4 of EXC_RETURN is 0
> > +	@ then space was allocated for FP state. That is space for 18 32-bit
> > +	@ values. (If FP extension is unimplemented, bit 4 is 1.)
> > +	@ Additionally the cpu might automatically 8-byte align the stack. Bit 9
> > +	@ of the saved xPSR specifies if stack aligning took place. In this case
> > +	@ another 32-bit value is included in the stack.
> > +
> > +	tst	lr, #0x10
> > +	addeq	r0, r0, #576
> 
> I think you can ignore VFP for now. We could change it to do lazy
> save/restore and avoid the automatic state saving. If it does this while
> in kernel, it takes a bit of extra stack space.
If I understood correctly this will never trigger for kernel tasks,
won't it?

> > +	@ save original sp, lr, return address, xPSR and EXC_RETURN
> > +	add	r12, sp, #52
> 
> Just use S_ARM_SP or whatever asm offsets, it's easier to read.
OK, right.

> > +	stmia	r12, {r0-r3, lr}
> > +
> > +	@ restore registers for system calls
> > +	ldmia	sp, {r0-r12}
> 
> We could avoid reloading r0-r7 (that's what we actually need for
> syscalls) if we don't corrupt them.
See question above. And I noticed that because of tail-chaining we need
to load the register values from the exception frame at least once
anyhow. I will try to optimise here.

> > +	.endm
> > +
> > +	.macro	v7m_exception_fast_exit
> > +	@ read r12, sp, lr, return address, xPSR and EXC_RETURN
> > +	add	r12, sp, #48
> 
> S_ARM_R12?
> 
> > +	ldmia	r12, {r1-r5, lr}
> > +
> > +	tst     r5, #0x100
> > +	subne   r2, r2, #4
> > +
> > +	tst	lr, #0x10
> > +	subeq	r2, r2, #576
> > +
> > +	stmdb	r2!, {r1, r3-r5}		@ write saved r12, lr, return address and xPSR
> > +
> > +	ldmia	sp, {r1, r3-r5}			@ read saved r0-r3
> > +	stmdb	r2!, {r1, r3-r5}		@ write r0-r3 to basic exception frame
> > +
> > +	tst	lr, #0x4
> > +	msrne	psp, r2
> > +
> > +	ldmia	sp, {r0-r12}
> > +	add	sp, #S_FRAME_SIZE
> > +	cpsie	i
> > +	bx	lr
> > +	.endm
> 
> In the context v7m_exception_fast_exit is used (return from interrupts),
> the previously saved r0-r4,r12 etc. are still on the return stack and
> restored from there. There is no need for the ldmia/stmdb to re-create
> the interrupted process stack. Here we only need to make sure that we
> restore the registers that were not automatically saved and also move
> the kernel SP back to the original location (add S_FRAME_SIZE).
>
> We handle rescheduling and work pending by raising PendSV, so we get
> another interrupt which would be returned via the slow_exit macro.
I was not sure if a task could be preempted during an exception. So I
choosed to play save.

> > +	.macro	v7m_exception_slow_exit ret_r0
> > +	cpsid	i
> > +	ldr	lr, [sp, #S_EXCRET]		@ read exception LR
> > +	tst     lr, #0x8
> > +	bne	1f				@ go to thread mode using exception return
> > +
> > +	/*
> > +	 * return to kernel thread
> > +	 * sp is already set up (and might be unset in pt_regs), so only
> > +	 * restore r0-r12 and pc
> > +	 */
> > +	ldmia	sp, {r0-r12}
> > +	ldr	lr, [sp, #S_PC]
> > +	add	sp, sp, #S_FRAME_SIZE
> > +	cpsie	i
> > +	bx	lr
> > +	/*
> > +	 * return to userspace
> > +	 */
> > +1:
> > +	@ read original r12, sp, lr, pc, xPSR
> > +	add	r12, sp, #48
> > +	ldmia	r12, {r1-r5}
> > +
> > +	/* stack aligning */
> > +	tst	r5, #0x100
> > +	subne	r2, r2, #4
> > +
> > +	/* skip over stack space for fp extension */
> > +	tst	lr, #0x10
> > +	subeq	r2, r2, #576
> > +
> > +	/* write basic exception frame */
> > +	stmdb	r2!, {r1, r3-r5}		@ saved r12, lr, return address and xPSR
> > +	ldmia	sp, {r1, r3-r5}			@ read saved r0-r3
> > +	.if	\ret_r0
> > +	stmdb	r2!, {r0, r3-r5}		@ restore r0-r3
> > +	.else
> > +	stmdb	r2!, {r1, r3-r5}		@ restore r0-r3
> > +	.endif
> 
> This looks fine.
> 
> > +	msr	psp, r2
> > +
> > +	ldmia	sp, {r0-r12}			@ restore original r4-r11
> 
> Isn't this reading too much (optimisation)?
yeah, I think r12 isn't needed. For r0-r3 I'm not sure which is better:

	ldmia	sp, {r0-r11}
	add	sp, #S_FRAME_SIZE

vs.

	add	sp, #S_R4
	ldmia	sp, {r4-r11}
	add	sp, #S_FRAME_SIZE - S_R4

It's not described in the V7-M reverence. Do you know a document where I
can look that up?

> > +	add	sp, #S_FRAME_SIZE		@ restore the original MSP
> > +	cpsie	i
> > +	bx	lr
> > +	.endm
> > +#endif	/* CONFIG_CPU_V7M */
> ...
> > --- a/arch/arm/kernel/process.c
> > +++ b/arch/arm/kernel/process.c
> > @@ -452,7 +452,11 @@ asm(	".pushsection .text\n"
> >  #ifdef CONFIG_TRACE_IRQFLAGS
> >  "	bl	trace_hardirqs_on\n"
> >  #endif
> > +#ifdef CONFIG_CPU_V7M
> > +"	msr	primask, r7\n"
> > +#else
> >  "	msr	cpsr_c, r7\n"
> > +#endif
> >  "	mov	r0, r4\n"
> >  "	mov	lr, r6\n"
> >  "	mov	pc, r5\n"
> > @@ -491,6 +495,10 @@ pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
> >  	regs.ARM_r7 = SVC_MODE | PSR_ENDSTATE | PSR_ISETSTATE;
> >  	regs.ARM_pc = (unsigned long)kernel_thread_helper;
> >  	regs.ARM_cpsr = regs.ARM_r7 | PSR_I_BIT;
> > +#ifdef CONFIG_CPU_V7M
> > +	/* Return to Handler mode */
> > +	regs.ARM_EXCRET = 0xfffffff1L;
> > +#endif
> 
> BTW, do we need to set this here? We don't return to a kernel thread via
> exception return.
But we need it to decide in slow_return if we should use exception
return or not. So yes, I think it's needed.

> I currently cannot see any obvious bugs in the code. Since you said it
> doesn't work, what are the symptoms?
I found two problems. One is I wasn't aware that r0-r3,r12 doesn't
necessarily need to match the values on exception entry (because of
tail-chaining) and in start_thread decreasing sp is wrong.

I will reimplement the three macros now with the new lessons learned.

Best regards and thanks for your input (here and on irc)
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-K?nig            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

  reply	other threads:[~2012-03-13 20:39 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-22 11:12 [RFC PATCH 00/11] Cortex-M3 support Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 01/11] ARM: only show modules in the memory layout for MODULES=y Uwe Kleine-König
2012-01-26  6:16   ` Linus Walleij
2012-01-22 11:13 ` [RFC PATCH 02/11] ARM: add device tree blobs to .gitignore Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 03/11] ARM: protect usage of cr_alignment by #ifdef CONFIG_CPU_CP15 Uwe Kleine-König
2012-01-23  5:43   ` Jean-Christophe PLAGNIOL-VILLARD
2012-01-23  8:14     ` Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 04/11] ARM: Add a printk loglevel modifier Uwe Kleine-König
2012-01-23  5:50   ` Jean-Christophe PLAGNIOL-VILLARD
2012-01-22 11:13 ` [RFC PATCH 05/11] ARM: provide XIP_VIRT_ADDR for no-MMU builds Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 06/11] Cortex-M3: Add base support for Cortex-M3 Uwe Kleine-König
2012-01-22 19:45   ` Michał Mirosław
2012-01-22 20:42     ` Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 07/11] Cortex-M3: Add support for exception handling Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 08/11] Cortex-M3: Add NVIC support Uwe Kleine-König
2012-01-31 19:39   ` Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 09/11] Cortex-M3: Allow the building of Cortex-M3 kernel port Uwe Kleine-König
2012-01-22 20:05   ` Michał Mirosław
2012-02-07 19:43     ` Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 10/11] Cortex-M3: Add VFP support Uwe Kleine-König
2012-01-22 11:13 ` [RFC PATCH 11/11] HACK! ARM: no, we don't enter in ARM Uwe Kleine-König
2012-02-07 20:18 ` [RFC PATCH 00/11] Cortex-M3 support Uwe Kleine-König
2012-02-16 20:01   ` Uwe Kleine-König
2012-02-16 20:18     ` [PATCH 1/5] ARM: protect usage of cr_alignment by #ifdef CONFIG_CPU_CP15 Uwe Kleine-König
2012-02-16 20:18       ` [PATCH 2/5] ARM: Add a printk loglevel modifier Uwe Kleine-König
2012-02-16 20:18       ` [PATCH 3/5] ARM: force branch instructions to use long distance encoding Uwe Kleine-König
2012-02-16 20:18       ` [PATCH 4/5] ARM: Cortex-M3: Add base support for Cortex-M3 Uwe Kleine-König
2012-02-16 20:18       ` [PATCH 5/5] ARM: Cortex-M3: Add support for exception handling Uwe Kleine-König
2012-02-16 22:20         ` Russell King - ARM Linux
2012-02-24 22:01           ` Uwe Kleine-König
2012-02-24 22:12             ` Catalin Marinas
2012-02-24 22:43               ` Russell King - ARM Linux
2012-02-25  8:49                 ` Catalin Marinas
2012-02-25 14:07               ` Uwe Kleine-König
2012-03-05 17:04               ` [PATCH v2 4/5] Cortex-M3: Add base support for Cortex-M3 Uwe Kleine-König
2012-03-05 17:04                 ` [PATCH v2 5/5] Cortex-M3: Add support for exception handling Uwe Kleine-König
2012-03-09 17:10                   ` Catalin Marinas
2012-03-13 20:39                     ` Uwe Kleine-König [this message]
2012-03-08 10:52                 ` [PATCH v2 4/5] Cortex-M3: Add base support for Cortex-M3 Catalin Marinas
2012-02-17  0:28       ` [PATCH 1/5] ARM: protect usage of cr_alignment by #ifdef CONFIG_CPU_CP15 Ryan Mallon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120313203930.GD10400@pengutronix.de \
    --to=u.kleine-koenig@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).