From: dave.martin@linaro.org (Dave Martin)
To: linux-arm-kernel@lists.infradead.org
Subject: Question on ARM inline assembly - stackframes, sp/fp usage, __naked
Date: Wed, 18 May 2011 18:45:31 +0100 [thread overview]
Message-ID: <20110518174531.GA28300@arm.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1105181505080.2374@localhost6.localdomain6>
On Wed, May 18, 2011 at 04:25:15PM +0100, Frank Hofmann wrote:
> Hi,
>
>
> with the work on the hibernation support code, I've managed to make
> it all C-with-inline-asm. This happens to work fine on OMAP3, but
> there's an issue when I build it for Samsung 64xx. It's about this
> code:
>
>
> int __naked swsusp_arch_resume(void)
> {
> register struct pbe *pbe;
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
(Why - 8?)
> register void *saved_ctx asm ("r0") = ctx;
>
> __asm__ __volatile__("" : "+sp"(sp)); /* don't optimize away ! */
Is this an attempt to force gcc to change sp before the subsequent loop?
I think you don't mean "+sp". Something like "+r" might be appropriate.
This feels subtle and fragile to me...
>
> for (pbe = restore_pblist; pbe; pbe = pbe->next)
> copy_page(pbe->orig_address, pbe->address);
>
> __asm__ __volatile__(
> "mov r1, #" __stringify(SYSTEM_MODE) "\n\t"
> "msr cpsr_c, r1\n\t"
> "ldr r1, [r0], #4\n\t"
> "msr cpsr_cxsf, r1\n\t"
> "ldm r0!, {r4-r12,lr}\n\t"
> "ldr sp, [r0], #4\n\t"
> "mov r2, #" __stringify(SVC_MODE) "\n\t"
> "msr cpsr_c, r2\n\t"
> "ldm r0!, {r2, lr}\n\t"
> "ldr sp, [r0], #4\n\t"
> "msr spsr_cxsf, r2\n\t"
> "msr cpsr, r1\n\t"
> "mov r1, #0\n\t"
> "push {r1, lr}\n\t"
> "bl __restore_processor_state\n\t"
> "pop {r0, pc}\n\t"
> : "+r"(saved_ctx)
> :
> : "memory", "cc",
> "r1", "r2", "r3", "r4", "r5", "r6", "r7",
> "r8", "r9", "r10", "ip", "fp" "sp", "lr", "pc");
>
> return 0;
> }
>
>
> This has the special requirement to run on a separate stack, because
> the copy loop will blot over whatever there is. Clobbering
> everything is part of the design/intent.
>
>
>
> On OMAP (where I compile _without_ CONFIG_FTRACE, but even when
> switching it on there's no change), this creates the following
> assembly, which is ok:
>
> 00005bb0 <swsusp_arch_resume>:
> 5bb0: e59fd074 ldr sp, [pc, #116] ; 5c2c <swsusp_arch_resume+0x7c>
> 5bb4: e3003000 movw r3, #0 ; 0x0
> 5bb8: e3403000 movt r3, #0 ; 0x0
> 5bbc: e5934000 ldr r4, [r3]
> 5bc0: e3540000 cmp r4, #0 ; 0x0
> 5bc4: 0a000005 beq 5be0 <swsusp_arch_resume+0x30>
> 5bc8: e5940004 ldr r0, [r4, #4]
> 5bcc: e5941000 ldr r1, [r4]
> 5bd0: ebfffffe bl 162b80 <copy_page>
> 5bd4: e5944008 ldr r4, [r4, #8]
> 5bd8: e3540000 cmp r4, #0 ; 0x0
> 5bdc: 1afffff9 bne 5bc8 <swsusp_arch_resume+0x18>
> 5be0: e3000000 movw r0, #0 ; 0x0
> 5be4: e3400000 movt r0, #0 ; 0x0
> 5be8: e3a0101f mov r1, #31 ; 0x1f
> 5bec: e121f001 msr CPSR_c, r1
> [ ... inline as above follows ... ]
> 5c1c: e92d4002 push {r1, lr}
> 5c20: ebfffffe bl 1538c <__restore_processor_state>
> 5c24: e8bd8001 pop {r0, pc}
> 5c28: e3a00000 mov r0, #0 ; 0x0
> 5c2c: 00000ff8 .word 0x00000ff8
>
> Notice how the for() loop has been coded into using R4.
>
>
>
> When I build it for S5P64xx though (where we have CONFIG_FTRACE, but
> see below), I get a compile error (same gcc, just different kernel
> defconfig target, ARM11xx instead of Cortex-A8):
>
> arch/arm/kernel/hibernate.c: In function 'swsusp_arch_resume':
> arch/arm/kernel/hibernate.c:136: error: fp cannot be used in asm here
Probably this is a side-effect of building for ARM versus Thumb-2.
ARM1176 only supports ARM, and fp is used as the framepointer when
building a kernel in ARM.
Building that file with -fomit-frame-pointer avoids the error,
but that's best avoided unless there's a really good reason.
You would get the same problem when building for Thumb-2 if you pass
-fno-omit-frame-pointer to the compiler and attempt to clobber r7
(by default, Thumb-2 code has no framepointer).
>
> That's the '}' of the function. Strange enough ... the compiler
> wants to do something with FP and isn't willing to give that up.
>
>
> If I take "fp" out of the clobber list then it compiles.
>
> But the generated assembly code is incorrect:
>
> 0000757c <swsusp_arch_resume>:
> 757c: e59fd078 ldr sp, [pc, #120] ; 75fc <swsusp_arch_resume+0x80>
> 7580: e59f3078 ldr r3, [pc, #120] ; 7600 <swsusp_arch_resume+0x84>
> 7584: e5933000 ldr r3, [r3]
> 7588: ea000005 b 75a4 <swsusp_arch_resume+0x28>
> 758c: e59b3000 ldr r3, [fp]
> 7590: e5930004 ldr r0, [r3, #4]
> 7594: e5931000 ldr r1, [r3]
> 7598: ebfffffe bl 1647a0 <copy_page>
> 759c: e59b3000 ldr r3, [fp]
> 75a0: e5933008 ldr r3, [r3, #8]
> 75a4: e58b3000 str r3, [fp]
> 75a8: e59b3000 ldr r3, [fp]
> 75ac: e3530000 cmp r3, #0 ; 0x0
> 75b0: 1afffff5 bne 758c <swsusp_arch_resume+0x10>
> 75b4: e59f0048 ldr r0, [pc, #72] ; 7604 <swsusp_arch_resume+0x88>
> 75b8: e3a0101f mov r1, #31 ; 0x1f
> 75bc: e121f001 msr CPSR_c, r1
> [ ... inline as above follows ... ]
> 75ec: e92d4002 push {r1, lr}
> 75f0: ebfffffe bl dc58 <__restore_processor_state>
> 75f4: e8bd8001 pop {r0, pc}
> 75f8: e59b0000 ldr r0, [fp]
> 75fc: 00000ff8 .word 0x00000ff8
> 7600: 00000000 .word 0x00000000
> 7604: 00002000 .word 0x00002000
>
>
> This assumes FP is set up so that the local variable held in R3
> (that the OMAP compile stuffs into R4) can be stored there, to
> preserve over the function call.
>
> Adding a register asm("r4") to the variable declaration changes
> nothing, it still insists on saving it to/from the stack via [fp].
>
>
> Now this means there's something wrong here; either it's lucky that
> my compiler creates the OMAP (armv7) code that it does and the 6450
> (armv6) is actually correct in assuming this reg/mem location is
> available/usable for it.
>
> Or there's a problem with the ARM11xx / armv6 code generation.
>
>
> I'm tending towards believing that the OMAP result is luck, and to
> make sure the stack switch fools the entire ABI convention into
> working the stackframe setup needs to initialize more than just SP.
>
>
> I can hack around the issue by explicitly initializing FP via:
>
> __asm__ __volatile__("add fp, %0, #4\n\t" : "+sp"(sp));
>
> instead, but that's sneaky in the sense that I'm not telling the
> compiler I've done so (adding "fp" to the clobber fails compile
> again). Also, while this specific incantation uses one word, how can
> I be sure that that'll be it and no other gcc version will pull more
> [fp] usage out of some hat ?
>
> Note that while GCC chose to use [fp], it hasn't actually put proper
> offsets in; it uses the same [fp] for both 'pbe' and the zero return
> value; that's definitely not ok, but again is an artifact of the
> __naked.
>
>
>
> Another option I've found is:
>
> int __naked swsusp_arch_resume(void)
> {
> register void *sp asm("sp") = swsusp_resume_stack + PAGE_SIZE - 8;
> __asm__ __volatile__("" : "+sp"(sp));
>
> return resume_internal();
> }
>
> and make resume_internal() the actual function, non-naked; this
> creates a normal prologue which sets up FP. resume_internal() cannot
> be made static, though, or else the compiler inlines it - without
> the prologues, and it's back to #1^2.
>
>
>
> Re, the use of CONFIG_FTRACE: Just switching that off on S5P6450
> results in using [sp #4] instead of [fp]. Still incorrect code
> (albeit it works out of the box), and still not possible to put "fp"
> into the clobber list.
>
>
>
>
> I am wondering if there's a way to convince gcc to do the right
> thing here, __naked notwithstanding.
I think basically luck is the answer. If you ask the compiler to saw
its own head off it may occasionally make a neat job of it, but don't
bet on that happening every time...
>
> Is the use of a naked "bounce function" for this specific purpose
> (switch the stackpointer) acceptable, and guaranteed to create the
> correct code ?
>
>
> It'd be nice not having to dump this piece into assembly...
Using out-of-line assembler avoids the ABI issues, and is often
more readable IMHO.
On Wed, May 18, 2011 at 06:50:07PM +0200, Mikael Pettersson wrote:
[...]
> Just write the stack switching fragment in proper assembly, and
> keep both sides of it in normal C. End of problem.
Indeed, though there is only a little section of C, in the middle
of some assembler which is not implementable in C.
Maybe something like this:
ENTRY(swsusp_arch_resume)
ldr sp, resume_sp
bl __swsusp_arch_resume_copy_pages
/* the rest of your code code goes here ... */
.align 2
resume_sp: .long swsusp_resume_stack + PAGE_SIZE - 8
ENDPROC(swsusp_arch_resume)
Then in C:
asmlinkage void __swsusp_arch_resume_copy_pages(void)
{
struct pbe *pbe;
for (pbe = restore_pblist; pbe; pbe = pbe->next)
copy_page(pbe->orig_address, pbe->address);
}
There's now nothing for the compiler to get wrong...
Cheers
---Dave
next prev parent reply other threads:[~2011-05-18 17:45 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-18 15:25 Question on ARM inline assembly - stackframes, sp/fp usage, __naked Frank Hofmann
2011-05-18 16:06 ` Frank Hofmann
2011-05-18 16:50 ` Mikael Pettersson
2011-05-18 17:45 ` Dave Martin [this message]
2011-05-19 8:46 ` Frank Hofmann
2011-05-19 9:47 ` Dave Martin
2011-05-19 15:25 ` Frank Hofmann
2011-05-19 11:08 ` Måns Rullgård
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110518174531.GA28300@arm.com \
--to=dave.martin@linaro.org \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).