From: dave.martin@linaro.org (Dave Martin)
To: linux-arm-kernel@lists.infradead.org
Subject: Question on ARM inline assembly - stackframes, sp/fp usage, __naked
Date: Wed, 18 May 2011 18:45:31 +0100 [thread overview]
Message-ID: <20110518174531.GA28300@arm.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1105181505080.2374@localhost6.localdomain6>
On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
> Hi,
>
>
> with the work on the hibernation support code, I've managed to make
> it all C-with-inline-asm. This happens to work fine on OMAP3, but
> there's an issue when I build it for Samsung 64xx. It's about this
> code:
>
>
> int __naked swsusp_arch_resume(void)
> {
> register struct pbe *pbe;
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
(Why - 8?)
> register void *saved_ctx asm ("r0") = ctx;
>
> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
Is this an attempt to force gcc to change sp before the subsequent loop?
I think you don't mean "+sp". Something like "+r" might be appropriate.
This feels subtle and fragile to me...
>
> for (pbe = restore_pblist; pbe; pbe = pbe->next)
> copy_page(pbe->orig_address, pbe->address);
>
> __asm__ __volatile__(
> "mov r1, #" __stringify(SYSTEM_MODE) "\n\t"
> "msr cpsr_c, r1\n\t"
> "ldr r1, [r0], #4\n\t"
> "msr cpsr_cxsf, r1\n\t"
> "ldm r0!, {r4-r12,lr}\n\t"
> "ldr sp, [r0], #4\n\t"
> "mov r2, #" __stringify(SVC_MODE) "\n\t"
> "msr cpsr_c, r2\n\t"
> "ldm r0!, {r2, lr}\n\t"
> "ldr sp, [r0], #4\n\t"
> "msr spsr_cxsf, r2\n\t"
> "msr cpsr, r1\n\t"
> "mov r1, #0\n\t"
> "push {r1, lr}\n\t"
> "bl __restore_processor_state\n\t"
> "pop {r0, pc}\n\t"
> : "+r"(saved_ctx)
> :
> : "memory", "cc",
> "r1", "r2", "r3", "r4", "r5", "r6", "r7",
> "r8", "r9", "r10", "ip", "fp" "sp", "lr", "pc");
>
> return 0;
> }
>
>
> This has the special requirement to run on a separate stack, because
> the copy loop will blot over whatever there is. Clobbering
> everything is part of the design/intent.
>
>
>
> On OMAP (where I compile _without_ CONFIG_FTRACE, but even when
> switching it on there's no change), this creates the following
> assembly, which is ok:
>
> 00005bb0 <swsusp_arch_resume>:
> 5bb0: e59fd074 ldr sp, [pc, #116] ; 5c2c <swsusp_arch_resume+0x7c>
> 5bb4: e3003000 movw r3, #0 ; 0x0
> 5bb8: e3403000 movt r3, #0 ; 0x0
> 5bbc: e5934000 ldr r4, [r3]
> 5bc0: e3540000 cmp r4, #0 ; 0x0
> 5bc4: 0a000005 beq 5be0 <swsusp_arch_resume+0x30>
> 5bc8: e5940004 ldr r0, [r4, #4]
> 5bcc: e5941000 ldr r1, [r4]
> 5bd0: ebfffffe bl 162b80 <copy_page>
> 5bd4: e5944008 ldr r4, [r4, #8]
> 5bd8: e3540000 cmp r4, #0 ; 0x0
> 5bdc: 1afffff9 bne 5bc8 <swsusp_arch_resume+0x18>
> 5be0: e3000000 movw r0, #0 ; 0x0
> 5be4: e3400000 movt r0, #0 ; 0x0
> 5be8: e3a0101f mov r1, #31 ; 0x1f
> 5bec: e121f001 msr CPSR_c, r1
> [ ... inline as above follows ... ]
> 5c1c: e92d4002 push {r1, lr}
> 5c20: ebfffffe bl 1538c <__restore_processor_state>
> 5c24: e8bd8001 pop {r0, pc}
> 5c28: e3a00000 mov r0, #0 ; 0x0
> 5c2c: 00000ff8 .word 0x00000ff8
>
> Notice how the for() loop has been coded into using R4.
>
>
>
> When I build it for S5P64xx though (where we have CONFIG_FTRACE, but
> see below), I get a compile error (same gcc, just different kernel
> defconfig target, ARM11xx instead of Cortex-A8):
>
> arch/arm/kernel/hibernate.c: In function 'swsusp_arch_resume':
> arch/arm/kernel/hibernate.c:136: error: fp cannot be used in asm here
Probably this is a side-effect of building for ARM versus Thumb-2.
ARM1176 only supports ARM, and fp is used as the framepointer when
building a kernel in ARM.
Building that file with -fomit-frame-pointer avoids the error,
but that's best avoided unless there's a really good reason.
You would get the same problem when building for Thumb-2 if you pass
-fno-omit-frame-pointer to the compiler and attempt to clobber r7
(by default, Thumb-2 code has no framepointer).
>
> That's the '}' of the function. Strange enough ... the compiler
> wants to do something with FP and isn't willing to give that up.
>
>
> If I take "fp" out of the clobber list then it compiles.
>
> But the generated assembly code is incorrect:
>
> 0000757c <swsusp_arch_resume>:
> 757c: e59fd078 ldr sp, [pc, #120] ; 75fc <swsusp_arch_resume+0x80>
> 7580: e59f3078 ldr r3, [pc, #120] ; 7600 <swsusp_arch_resume+0x84>
> 7584: e5933000 ldr r3, [r3]
> 7588: ea000005 b 75a4 <swsusp_arch_resume+0x28>
> 758c: e59b3000 ldr r3, [fp]
> 7590: e5930004 ldr r0, [r3, #4]
> 7594: e5931000 ldr r1, [r3]
> 7598: ebfffffe bl 1647a0 <copy_page>
> 759c: e59b3000 ldr r3, [fp]
> 75a0: e5933008 ldr r3, [r3, #8]
> 75a4: e58b3000 str r3, [fp]
> 75a8: e59b3000 ldr r3, [fp]
> 75ac: e3530000 cmp r3, #0 ; 0x0
> 75b0: 1afffff5 bne 758c <swsusp_arch_resume+0x10>
> 75b4: e59f0048 ldr r0, [pc, #72] ; 7604 <swsusp_arch_resume+0x88>
> 75b8: e3a0101f mov r1, #31 ; 0x1f
> 75bc: e121f001 msr CPSR_c, r1
> [ ... inline as above follows ... ]
> 75ec: e92d4002 push {r1, lr}
> 75f0: ebfffffe bl dc58 <__restore_processor_state>
> 75f4: e8bd8001 pop {r0, pc}
> 75f8: e59b0000 ldr r0, [fp]
> 75fc: 00000ff8 .word 0x00000ff8
> 7600: 00000000 .word 0x00000000
> 7604: 00002000 .word 0x00002000
>
>
> This assumes FP is set up so that the local variable held in R3
> (that the OMAP compile stuffs into R4) can be stored there, to
> preserve over the function call.
>
> Adding a register asm("r4") to the variable declaration changes
> nothing, it still insists on saving it to/from the stack via [fp].
>
>
> Now this means there's something wrong here; either it's lucky that
> my compiler creates the OMAP (armv7) code that it does and the 6450
> (armv6) is actually correct in assuming this reg/mem location is
> available/usable for it.
>
> Or there's a problem with the ARM11xx / armv6 code generation.
>
>
> I'm tending towards believing that the OMAP result is luck, and to
> make sure the stack switch fools the entire ABI convention into
> working the stackframe setup needs to initialize more than just SP.
>
>
> I can hack around the issue by explicitly initializing FP via:
>
> __asm__ __volatile__("add fp, %0, #4\n\t" : "+sp"(sp));
>
> instead, but that's sneaky in the sense that I'm not telling the
> compiler I've done so (adding "fp" to the clobber fails compile
> again). Also, while this specific incantation uses one word, how can
> I be sure that that'll be it and no other gcc version will pull more
> [fp] usage out of some hat ?
>
> Note that while GCC chose to use [fp], it hasn't actually put proper
> offsets in; it uses the same [fp] for both 'pbe' and the zero return
> value; that's definitely not ok, but again is an artifact of the
> __naked.
>
>
>
> Another option I've found is:
>
> int __naked swsusp_arch_resume(void)
> {
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
> __asm__ __volatile__("" : "+sp"(sp));
>
> return resume_internal();
> }
>
> and make resume_internal() the actual function, non-naked; this
> creates a normal prologue which sets up FP. resume_internal() cannot
> be made static, though, or else the compiler inlines it - without
> the prologues, and it's back to #1^2.
>
>
>
> Re, the use of CONFIG_FTRACE: Just switching that off on S5P6450
> results in using [sp #4] instead of [fp]. Still incorrect code
> (albeit it works out of the box), and still not possible to put "fp"
> into the clobber list.
>
>
>
>
> I am wondering if there's a way to convince gcc to do the right
> thing here, __naked notwithstanding.
I think basically luck is the answer. If you ask the compiler to saw
its own head off it may occasionally make a neat job of it, but don't
bet on that happening every time...
>
> Is the use of a naked "bounce function" for this specific purpose
> (switch the stackpointer) acceptable, and guaranteed to create the
> correct code ?
>
>
> It'd be nice not having to dump this piece into assembly...
Using out-of-line assembler avoids the ABI issues, and is often
more readable IMHO.
On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
[...]
> Just write the stack switching fragment in proper assembly, and
> keep both sides of it in normal C. End of problem.
Indeed, though there is only a little section of C, in the middle
of some assembler which is not implementable in C.
Maybe something like this:
ENTRY(swsusp_arch_resume)
ldr sp, resume_sp
bl __swsusp_arch_resume_copy_pages
/* the rest of your code code goes here ... */
.align 2
resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
ENDPROC(swsusp_arch_resume)
Then in C:
asmlinkage void __swsusp_arch_resume_copy_pages(void)
{
struct pbe *pbe;
for (pbe = restore_pblist; pbe; pbe = pbe->next)
copy_page(pbe->orig_address, pbe->address);
}
There's now nothing for the compiler to get wrong...
Cheers
---Dave
next prev parent reply other threads:[~2011-05-18 17:45 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
2011-05-18 16:50 ` Mikael Pettersson
2011-05-18 17:45 ` Dave Martin [this message]
2011-05-19 8:46 ` Frank Hofmann
2011-05-19 9:47 ` Dave Martin
2011-05-19 15:25 ` Frank Hofmann
2011-05-19 11:08 ` Måns Rullgård
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110518174531.GA28300@arm.com \
--to=dave.martin@linaro.org \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.