* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russ Dill @ 2013-09-03 16:44 UTC
To: linux-arm-kernel

This RFC patchset explores an idea for loading C code into SRAM.
Currently, all the code I'm aware of that needs to run from SRAM is written
in assembler. The most common reason for code needing to run from SRAM is
that the memory controller is being disabled/enabled or is already
disabled. arch/arm has by far the most examples, but code also exists in
powerpc and sh.

The code is written in asm for two primary reasons. First, so that markers
can be put in indicating the size of the code so that it can be copied.
Second, so that data can be placed along with text and accessed in a
position independent manner.

SRAM handling code is in the process of being moved from arch directories
into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
patchset builds on that, including the limitation that the SRAM address is
not known at compile time. Because the SRAM address is not known at compile
time, the code that runs from SRAM must be compiled with -fPIC. Even if
the code were loaded to a fixed virtual address, portions of the code must
often be run with the MMU disabled.

The general idea is for each SRAM user (such as an SoC specific
suspend/resume mechanism) to create a group of sections. The section group
is created with a single macro for each user, but ends up looking like this:

.sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
	__sram_am33xx_start = .;
	*(.sram.am33xx.*)
	__sram_am33xx_end = .;
}

Any data or functions that should be copied to SRAM for this use should be
marked with an appropriate __section() attribute. A helper is then added for
translating between the original kernel symbol and the address of that
function or variable once it has been copied into SRAM. Once control is
passed to a function within the SRAM section grouping, it can access any
variables or functions within that same SRAM section grouping without
translation.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
[2] http://www.spinics.net/lists/linux-omap/msg96504.html

Russ Dill (4):
  Misc: SRAM: Create helpers for loading C code into SRAM
  ARM: SRAM: Add macro for generating SRAM resume trampoline
  Misc: SRAM: Hack for allowing executable code in SRAM.
  ARM: AM33XX: Move suspend/resume assembly to C

 arch/arm/include/asm/suspend.h    |  14 ++
 arch/arm/kernel/vmlinux.lds.S     |   2 +
 arch/arm/mach-omap2/Makefile      |   2 +-
 arch/arm/mach-omap2/pm33xx.c      |  50 ++---
 arch/arm/mach-omap2/pm33xx.h      |  23 +--
 arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
 arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
 arch/arm/mach-omap2/sram.c        |  15 --
 drivers/misc/sram.c               | 106 +++++++++-
 include/asm-generic/vmlinux.lds.h |   7 +
 include/linux/sram.h              |  44 +++++
 11 files changed, 509 insertions(+), 457 deletions(-)
 delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
 create mode 100644 arch/arm/mach-omap2/sleep33xx.c
 create mode 100644 include/linux/sram.h

--
1.8.3.2

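The symbol translation described in the cover letter can be sketched in plain C as offset arithmetic. This is only an illustration that builds in userspace: `struct sram_blob`, `sram_blob_load()` and `sram_translate()` are hypothetical names, not the helpers from the patchset, and ordinary byte arrays are assumed to stand in for the kernel image and for SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one SRAM section group (not the patchset's API). */
struct sram_blob {
	const unsigned char *start;	/* stands in for __sram_am33xx_start */
	const unsigned char *end;	/* stands in for __sram_am33xx_end */
	unsigned char *sram_base;	/* where the copy lives; known only at run time */
};

/* Copy the whole section group to its run-time location. */
static void sram_blob_load(struct sram_blob *blob, unsigned char *dst)
{
	memcpy(dst, blob->start, (size_t)(blob->end - blob->start));
	blob->sram_base = dst;
}

/*
 * Translate the link-time address of a symbol inside the group into the
 * address of its copy: same offset from the start, different base.
 */
static void *sram_translate(const struct sram_blob *blob, const void *sym)
{
	const unsigned char *p = sym;

	assert(p >= blob->start && p < blob->end);
	return blob->sram_base + (p - blob->start);
}
```

Code already running inside the copy would not need this step, since references within the group are expected to be relative.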
* [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
From: Tony Lindgren @ 2013-09-04 18:06 UTC
To: linux-arm-kernel

* Russ Dill <Russ.Dill@ti.com> [130903 09:52]:
> The generic SRAM mechanism does not ioremap memory in a
> manner that allows code to be executed from SRAM. There is
> currently no generic way to request ioremap to return a
> memory area with execution allowed.
>
> Insert a temporary hack for proof of concept on ARM.
>
> Signed-off-by: Russ Dill <Russ.Dill@ti.com>
> ---
>  drivers/misc/sram.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
> index 08baaab..e059a23 100644
> --- a/drivers/misc/sram.c
> +++ b/drivers/misc/sram.c
> @@ -31,6 +31,7 @@
>  #include <linux/genalloc.h>
>  #include <linux/sram.h>
>  #include <asm-generic/cacheflush.h>
> +#include <asm/io.h>
>
>  #define SRAM_GRANULARITY 32
>
> @@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
>  	int ret;
>
>  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	virt_base = devm_ioremap_resource(&pdev->dev, res);
> +	virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
>  	if (IS_ERR(virt_base))
>  		return PTR_ERR(virt_base);

You can get rid of this hack by defining ioremap_exec in
include/asm-generic/io.h the same way as ioremap_nocache
is done:

#ifndef ioremap_exec
#define ioremap_exec ioremap
#endif

Then the architectures that need ioremap_exec can define and
implement it. Needs to be reviewed on LKML naturally :)

Regards,

Tony

* [RFC 3/4] Misc: SRAM: Hack for allowing executable code in SRAM.
From: Russ Dill @ 2013-09-06 20:50 UTC
To: linux-arm-kernel

On Wed, Sep 4, 2013 at 11:06 AM, Tony Lindgren <tony@atomide.com> wrote:
> * Russ Dill <Russ.Dill@ti.com> [130903 09:52]:
>> The generic SRAM mechanism does not ioremap memory in a
>> manner that allows code to be executed from SRAM. There is
>> currently no generic way to request ioremap to return a
>> memory area with execution allowed.
>>
>> Insert a temporary hack for proof of concept on ARM.
>>
>> Signed-off-by: Russ Dill <Russ.Dill@ti.com>
>> ---
>>  drivers/misc/sram.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
>> index 08baaab..e059a23 100644
>> --- a/drivers/misc/sram.c
>> +++ b/drivers/misc/sram.c
>> @@ -31,6 +31,7 @@
>>  #include <linux/genalloc.h>
>>  #include <linux/sram.h>
>>  #include <asm-generic/cacheflush.h>
>> +#include <asm/io.h>
>>
>>  #define SRAM_GRANULARITY 32
>>
>> @@ -138,7 +139,7 @@ static int sram_probe(struct platform_device *pdev)
>>  	int ret;
>>
>>  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> -	virt_base = devm_ioremap_resource(&pdev->dev, res);
>> +	virt_base = __arm_ioremap_exec(res->start, resource_size(res), false);
>>  	if (IS_ERR(virt_base))
>>  		return PTR_ERR(virt_base);
>
> You can get rid of this hack by defining ioremap_exec in
> include/asm-generic/io.h the same way as ioremap_nocache
> is done:
>
> #ifndef ioremap_exec
> #define ioremap_exec ioremap
> #endif
>
> Then the arch that need ioremap_exec can define and
> implement it. Needs to be reviewed on LKML naturally :)

The similar statement for nocache in asm-generic/io.h appears in an
#ifndef CONFIG_MMU block. I think the better example would be
ioremap_wc, which looks like:

#ifndef ARCH_HAS_IOREMAP_WC
#define ioremap_wc ioremap_nocache
#endif

Of course, ioremap_exec on ARM has a slight complication since it has an
extra bool nocache argument.

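The fallback pattern under discussion, together with the complication of the extra bool argument, can be tried out in a small mock that builds in userspace. Everything here is a stand-in: `mock_ioremap()`, `mock_arch_ioremap_exec()`, `mock_map_sram()` and `MOCK_ARCH_HAS_IOREMAP_EXEC` are hypothetical names invented for the sketch, and no real mapping takes place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pretend an ARM-like architecture is selected for this build. */
#define MOCK_ARCH_HAS_IOREMAP_EXEC 1

struct mock_mapping {
	unsigned long phys;
	size_t size;
	bool exec;	/* would the mapping permit instruction fetches? */
};

/* Stand-in for a plain, non-executable ioremap(). */
static struct mock_mapping mock_ioremap(unsigned long phys, size_t size)
{
	struct mock_mapping m = { phys, size, false };
	return m;
}

/* Stand-in for an arch primitive that takes an extra bool, as on ARM. */
static struct mock_mapping mock_arch_ioremap_exec(unsigned long phys,
						  size_t size, bool cache_flag)
{
	struct mock_mapping m = { phys, size, true };
	(void)cache_flag;	/* the mock ignores cache attributes */
	return m;
}

#ifdef MOCK_ARCH_HAS_IOREMAP_EXEC
/* Arch wrapper hides the extra argument so callers see two arguments. */
#define mock_ioremap_exec(phys, size) \
	mock_arch_ioremap_exec((phys), (size), false)
#endif

/* Generic fallback: only used if the arch did not define the name. */
#ifndef mock_ioremap_exec
#define mock_ioremap_exec mock_ioremap
#endif

/* What a driver's probe path might do with the generic name. */
static struct mock_mapping mock_map_sram(unsigned long phys, size_t size)
{
	assert(size > 0);
	return mock_ioremap_exec(phys, size);
}
```

With the `MOCK_ARCH_HAS_IOREMAP_EXEC` block removed, the same caller would silently get the non-executable fallback, which is the behaviour a reviewer would want to think about for architectures that do not provide the primitive.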
* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Emilio López @ 2013-09-04 19:52 UTC
To: linux-arm-kernel

Hi,

On 03/09/13 13:44, Russ Dill wrote:
> This RFC patchset explores an idea for loading C code into SRAM.
> Currently, all the code I'm aware of that needs to run from SRAM is written
> in assembler. The most common reason for code needing to run from SRAM is
> that the memory controller is being disabled/enabled or is already
> disabled. arch/arm has by far the most examples, but code also exists in
> powerpc and sh.
>
> The code is written in asm for two primary reasons. First, so that markers
> can be put in indicating the size of the code so that it can be copied.
> Second, so that data can be placed along with text and accessed in a
> position independent manner.
>
> SRAM handling code is in the process of being moved from arch directories
> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> patchset builds on that, including the limitation that the SRAM address is
> not known at compile time. Because the SRAM address is not known at compile
> time, the code that runs from SRAM must be compiled with -fPIC. Even if
> the code were loaded to a fixed virtual address, portions of the code must
> often be run with the MMU disabled.
>
> The general idea is for each SRAM user (such as an SoC specific
> suspend/resume mechanism) to create a group of sections. The section group
> is created with a single macro for each user, but ends up looking like this:
>
> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
> 	__sram_am33xx_start = .;
> 	*(.sram.am33xx.*)
> 	__sram_am33xx_end = .;
> }
>
> Any data or functions that should be copied to SRAM for this use should be
> marked with an appropriate __section() attribute. A helper is then added for
> translating between the original kernel symbol and the address of that
> function or variable once it has been copied into SRAM. Once control is
> passed to a function within the SRAM section grouping, it can access any
> variables or functions within that same SRAM section grouping without
> translation.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
> [2] http://www.spinics.net/lists/linux-omap/msg96504.html
>
> Russ Dill (4):
>   Misc: SRAM: Create helpers for loading C code into SRAM
>   ARM: SRAM: Add macro for generating SRAM resume trampoline
>   Misc: SRAM: Hack for allowing executable code in SRAM.
>   ARM: AM33XX: Move suspend/resume assembly to C
>
>  arch/arm/include/asm/suspend.h    |  14 ++
>  arch/arm/kernel/vmlinux.lds.S     |   2 +
>  arch/arm/mach-omap2/Makefile      |   2 +-
>  arch/arm/mach-omap2/pm33xx.c      |  50 ++---
>  arch/arm/mach-omap2/pm33xx.h      |  23 +--
>  arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
>  arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
>  arch/arm/mach-omap2/sram.c        |  15 --
>  drivers/misc/sram.c               | 106 +++++++++-
>  include/asm-generic/vmlinux.lds.h |   7 +
>  include/linux/sram.h              |  44 +++++
>  11 files changed, 509 insertions(+), 457 deletions(-)
>  delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
>  create mode 100644 arch/arm/mach-omap2/sleep33xx.c
>  create mode 100644 include/linux/sram.h
>

I'm interested in this, as I'll need something like it for suspend/resume
on sunxi. Unfortunately, I only got the cover letter on my email, and the
web lakml archives don't seem to have the rest either. After a bit of
searching on Google I found a copy on linux-omap[1], but it'd be great if
I didn't have to hunt for the patches :)

I only have one comment, from a quick look at the code

+	memcpy((void *) chunk->addr, data, sz);
+	flush_icache_range(chunk->addr, chunk->addr + sz);

How would that behave on Thumb-2 mode? I believe that's the reason why
fncpy() got introduced[2] some time ago.

Thanks for working on this!

Emilio

[1] http://www.mail-archive.com/linux-omap at vger.kernel.org/msg94995.html
[2] http://www.spinics.net/lists/arm-kernel/msg110706.html

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russ Dill @ 2013-09-04 21:47 UTC
To: linux-arm-kernel

On Wed, Sep 4, 2013 at 12:52 PM, Emilio López <emilio@elopez.com.ar> wrote:
> Hi,
>
> On 03/09/13 13:44, Russ Dill wrote:
>
>> This RFC patchset explores an idea for loading C code into SRAM.
>> Currently, all the code I'm aware of that needs to run from SRAM is
>> written in assembler. The most common reason for code needing to run
>> from SRAM is that the memory controller is being disabled/enabled or is
>> already disabled. arch/arm has by far the most examples, but code also
>> exists in powerpc and sh.
>>
>> The code is written in asm for two primary reasons. First, so that
>> markers can be put in indicating the size of the code so that it can be
>> copied. Second, so that data can be placed along with text and accessed
>> in a position independent manner.
>>
>> SRAM handling code is in the process of being moved from arch directories
>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
>> patchset builds on that, including the limitation that the SRAM address
>> is not known at compile time. Because the SRAM address is not known at
>> compile time, the code that runs from SRAM must be compiled with -fPIC.
>> Even if the code were loaded to a fixed virtual address, portions of the
>> code must often be run with the MMU disabled.
>>
>> The general idea is for each SRAM user (such as an SoC specific
>> suspend/resume mechanism) to create a group of sections. The section
>> group is created with a single macro for each user, but ends up looking
>> like this:
>>
>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
>> 	__sram_am33xx_start = .;
>> 	*(.sram.am33xx.*)
>> 	__sram_am33xx_end = .;
>> }
>>
>> Any data or functions that should be copied to SRAM for this use should
>> be marked with an appropriate __section() attribute. A helper is then
>> added for translating between the original kernel symbol and the address
>> of that function or variable once it has been copied into SRAM. Once
>> control is passed to a function within the SRAM section grouping, it can
>> access any variables or functions within that same SRAM section grouping
>> without translation.
>>
>> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4984c6
>> [2] http://www.spinics.net/lists/linux-omap/msg96504.html
>>
>> Russ Dill (4):
>>   Misc: SRAM: Create helpers for loading C code into SRAM
>>   ARM: SRAM: Add macro for generating SRAM resume trampoline
>>   Misc: SRAM: Hack for allowing executable code in SRAM.
>>   ARM: AM33XX: Move suspend/resume assembly to C
>>
>>  arch/arm/include/asm/suspend.h    |  14 ++
>>  arch/arm/kernel/vmlinux.lds.S     |   2 +
>>  arch/arm/mach-omap2/Makefile      |   2 +-
>>  arch/arm/mach-omap2/pm33xx.c      |  50 ++---
>>  arch/arm/mach-omap2/pm33xx.h      |  23 +--
>>  arch/arm/mach-omap2/sleep33xx.S   | 394 --------------------------------------
>>  arch/arm/mach-omap2/sleep33xx.c   | 309 ++++++++++++++++++++++++++++++
>>  arch/arm/mach-omap2/sram.c        |  15 --
>>  drivers/misc/sram.c               | 106 +++++++++-
>>  include/asm-generic/vmlinux.lds.h |   7 +
>>  include/linux/sram.h              |  44 +++++
>>  11 files changed, 509 insertions(+), 457 deletions(-)
>>  delete mode 100644 arch/arm/mach-omap2/sleep33xx.S
>>  create mode 100644 arch/arm/mach-omap2/sleep33xx.c
>>  create mode 100644 include/linux/sram.h
>>
>
> I'm interested in this, as I'll need something like it for suspend/resume on
> sunxi. Unfortunately, I only got the cover letter on my email, and the web
> lakml archives don't seem to have the rest either. After a bit of searching
> on Google I found a copy on linux-omap[1], but it'd be great if I didn't
> have to hunt for the patches :)

The mails to arm-kernel are "awaiting moderation".

> I only have one comment, from a quick look at the code
>
> +	memcpy((void *) chunk->addr, data, sz);
> +	flush_icache_range(chunk->addr, chunk->addr + sz);
>
> How would that behave on Thumb-2 mode? I believe that's the reason why
> fncpy() got introduced[2] some time ago.
>
> Thanks for working on this!

I think this is already taken care of by the way sram.c is using
genalloc. The allocation returned should be aligned to 32 bytes. The
thumb bit shouldn't be an issue as code is copied based on the start
and end markers made by the linker. I may need to add .align statements
in the linker so that the start and end markers for the copied code
are aligned to at least 8 bytes.

Thanks!

> Emilio
>
> [1] http://www.mail-archive.com/linux-omap at vger.kernel.org/msg94995.html
> [2] http://www.spinics.net/lists/arm-kernel/msg110706.html

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Sekhar Nori @ 2013-09-06 11:02 UTC
To: linux-arm-kernel

On Thursday 05 September 2013 03:17 AM, Russ Dill wrote:
> On Wed, Sep 4, 2013 at 12:52 PM, Emilio López <emilio@elopez.com.ar> wrote:
>> I'm interested in this, as I'll need something like it for suspend/resume on
>> sunxi. Unfortunately, I only got the cover letter on my email, and the web
>> lakml archives don't seem to have the rest either. After a bit of searching
>> on Google I found a copy on linux-omap[1], but it'd be great if I didn't
>> have to hunt for the patches :)
>
> The mails to arm-kernel are "awaiting moderation".

That is because you have RFC in the subject line instead of PATCH RFC.

Thanks,
Sekhar

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russell King - ARM Linux @ 2013-09-06 11:14 UTC
To: linux-arm-kernel

On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
> I think this is already taken care of by the way sram.c is using
> genalloc. The allocation returned should be aligned to 32 bytes. The
> thumb bit shouldn't be an issue as code is copied based on the start
> and end markers made by the linker. I may need to add .align statements
> in the linker so that the start and end markers for the copied code
> are aligned to at least 8 bytes.

I think you need to read up on what fncpy does... there's more to it
than just merely copying code at an appropriate alignment.

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Dave Martin @ 2013-09-06 16:40 UTC
To: linux-arm-kernel

On Fri, Sep 06, 2013 at 12:14:08PM +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
> > I think this is already taken care of by the way sram.c is using
> > genalloc. The allocation returned should be aligned to 32 bytes. The
> > thumb bit shouldn't be an issue as code is copied based on the start
> > and end markers made by the linker. I may need to add .align statements
> > in the linker so that the start and end markers for the copied code
> > are aligned to at least 8 bytes.
>
> I think you need to read up on what fncpy does... there's more to it
> than just merely copying code at an appropriate alignment.

The technique of putting each loadable blob in a specific vmlinux
section, and then adjusting entry-point symbols by adding/subtracting
the appropriate offset, probably does work.

This relies on the functions' code alignment requirement being
honoured by both the vmlinux link map, and the allocator used to find
SRAM space to copy the functions to.

Searching the entire list of known blobs every time we want to convert
a symbol seems unnecessary though. Surely the caller could know the
blob<->symbol mapping anyway?

One thing fncpy() doesn't provide is a way to copy groups of functions
that call each other, if vmlinux needs to know about any symbol other
than the one at the start. We might need a better mechanism if that is
needed.

I actually wonder whether fncpy() contains a buglet, whereby
flush_icache_range() is used instead of coherent_kern_range().
The SRAM is probably not mapped cached, but at least a DSB would be
needed before flushing the relevant lines from the I-cache.

However, flush_icache_range() seems to be implemented by a call to
coherent_kern_range() anyway, so perhaps that's not a problem.

Cheers
---Dave

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russ Dill @ 2013-09-06 18:50 UTC
To: linux-arm-kernel

On Fri, Sep 6, 2013 at 9:40 AM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Sep 06, 2013 at 12:14:08PM +0100, Russell King - ARM Linux wrote:
>> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
>> > I think this is already taken care of by the way sram.c is using
>> > genalloc. The allocation returned should be aligned to 32 bytes. The
>> > thumb bit shouldn't be an issue as code is copied based on the start
>> > and end markers made by the linker. I may need to add .align statements
>> > in the linker so that the start and end markers for the copied code
>> > are aligned to at least 8 bytes.
>>
>> I think you need to read up on what fncpy does... there's more to it
>> than just merely copying code at an appropriate alignment.
>
> The technique of putting each loadable blob in a specific vmlinux
> section, and then adjusting entry-point symbols by adding/subtracting
> the appropriate offset, probably does work.
>
> This relies on the functions' code alignment requirement being
> honoured by both the vmlinux link map, and the allocator used to find
> SRAM space to copy the functions to.
>
> Searching the entire list of known blobs every time we want to convert
> a symbol seems unnecessary though. Surely the caller could know the
> blob<->symbol mapping anyway?

It doesn't search the list of known blobs, only loaded blobs. On all
the platforms I'm aware of, only one SRAM section is loaded with code.

> One thing fncpy() doesn't provide is a way to copy groups of functions
> that call each other, if vmlinux needs to know about any symbol other
> than the one at the start. We might need a better mechanism if that is
> needed.
>
>
> I actually wonder whether fncpy() contains a buglet, whereby
> flush_icache_range() is used instead of coherent_kern_range().
> The SRAM is probably not mapped cached, but at least a DSB would be
> needed before flushing the relevant lines from the I-cache.

It is mapped cached on most platforms.

> However, flush_icache_range() seems to be implemented by a call to
> coherent_kern_range() anyway, so perhaps that's not a problem.
>
> Cheers
> ---Dave

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russell King - ARM Linux @ 2013-09-07 8:57 UTC
To: linux-arm-kernel

On Fri, Sep 06, 2013 at 05:40:59PM +0100, Dave Martin wrote:
> I actually wonder whether fncpy() contains a buglet, whereby
> flush_icache_range() is used instead of coherent_kern_range().
> The SRAM is probably not mapped cached, but at least a DSB would be
> needed before flushing the relevant lines from the I-cache.

flush_icache_range() is correct - it's there to ensure that memory which
has been written will be readable to the instruction stream. That's its
whole purpose, and it's used when modules are loaded too. You're reading
too much into the name: it doesn't just touch the I cache.

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russ Dill @ 2013-09-06 18:40 UTC
To: linux-arm-kernel

On Fri, Sep 6, 2013 at 4:14 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Wed, Sep 04, 2013 at 02:47:51PM -0700, Russ Dill wrote:
>> I think this is already taken care of by the way sram.c is using
>> genalloc. The allocation returned should be aligned to 32 bytes. The
>> thumb bit shouldn't be an issue as code is copied based on the start
>> and end markers made by the linker. I may need to add .align statements
>> in the linker so that the start and end markers for the copied code
>> are aligned to at least 8 bytes.
>
> I think you need to read up on what fncpy does... there's more to it
> than just merely copying code at an appropriate alignment.

Yes, I need to add a pair of inlines that do the asm trickery to/from
function addresses. Thanks.

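The address adjustment that fncpy() performs around the copy can be sketched with plain integer arithmetic. This is a userspace illustration rather than the kernel macro: the real fncpy() also uses empty asm statements to stop the compiler from reasoning about the function pointer, and `code_addr()`, `relocated_entry()` and `SKETCH_ALIGN` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ALIGN 8	/* fncpy() insists on 8-byte aligned source and destination */

/* Address of the first instruction: the function address with the Thumb bit cleared. */
static uintptr_t code_addr(uintptr_t fn)
{
	return fn & ~(uintptr_t)1;
}

/*
 * Entry point of the copy: the destination address with the original
 * Thumb bit carried over, so that an interworking branch selects the
 * right instruction set.
 */
static uintptr_t relocated_entry(uintptr_t fn, uintptr_t dest)
{
	assert((dest & (SKETCH_ALIGN - 1)) == 0);
	assert((code_addr(fn) & (SKETCH_ALIGN - 1)) == 0);
	return dest | (fn & 1);
}
```

Copying from `code_addr(fn)` instead of `fn` is what keeps a Thumb-2 function from being copied starting one byte into its first instruction.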
* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Russell King - ARM Linux @ 2013-09-06 11:12 UTC
To: linux-arm-kernel

On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
> SRAM handling code is in the process of being moved from arch directories
> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> patchset builds on that, including the limitation that the SRAM address is
> not known at compile time. Because the SRAM address is not known at compile
> time, the code that runs from SRAM must be compiled with -fPIC. Even if
> the code were loaded to a fixed virtual address, portions of the code must
> often be run with the MMU disabled.

What are you doing about the various gcc utility functions that may be
implicitly called from C code such as memcpy and memset?

> The general idea is for each SRAM user (such as an SoC specific
> suspend/resume mechanism) to create a group of sections. The section group
> is created with a single macro for each user, but ends up looking like this:
>
> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
> 	__sram_am33xx_start = .;
> 	*(.sram.am33xx.*)
> 	__sram_am33xx_end = .;
> }
>
> Any data or functions that should be copied to SRAM for this use should be
> marked with an appropriate __section() attribute. A helper is then added for
> translating between the original kernel symbol and the address of that
> function or variable once it has been copied into SRAM. Once control is
> passed to a function within the SRAM section grouping, it can access any
> variables or functions within that same SRAM section grouping without
> translation.

What about the relocations which will need to be fixed up - eg, addresses
in the literal pool, the GOT table contents, etc? You say nothing about
this.

* [RFC 0/4] Create infrastructure for running C code from SRAM.
From: Dave Martin @ 2013-09-06 16:19 UTC
To: linux-arm-kernel

On Fri, Sep 06, 2013 at 12:12:21PM +0100, Russell King - ARM Linux wrote:
> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote:
> > SRAM handling code is in the process of being moved from arch directories
> > into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC
> > patchset builds on that, including the limitation that the SRAM address is
> > not known at compile time. Because the SRAM address is not known at compile
> > time, the code that runs from SRAM must be compiled with -fPIC. Even if
> > the code were loaded to a fixed virtual address, portions of the code must
> > often be run with the MMU disabled.
>
> What are you doing about the various gcc utility functions that may be
> implicitly called from C code such as memcpy and memset?
>
> > The general idea is for each SRAM user (such as an SoC specific
> > suspend/resume mechanism) to create a group of sections. The section group
> > is created with a single macro for each user, but ends up looking like this:
> >
> > .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) {
> > 	__sram_am33xx_start = .;
> > 	*(.sram.am33xx.*)
> > 	__sram_am33xx_end = .;
> > }
> >
> > Any data or functions that should be copied to SRAM for this use should be
> > marked with an appropriate __section() attribute. A helper is then added for
> > translating between the original kernel symbol and the address of that
> > function or variable once it has been copied into SRAM. Once control is
> > passed to a function within the SRAM section grouping, it can access any
> > variables or functions within that same SRAM section grouping without
> > translation.
>
> What about the relocations which will need to be fixed up - eg, addresses
> in the literal pool, the GOT table contents, etc? You say nothing about
> this.

I was also thinking about this, and there are more problems.

As well as what has already been mentioned:

 * Calls from inside the SRAM code to vmlinux (including lib1funcs etc.)
   will typically break, except on architectures where function calls are
   absolute by default (not ARM).

 * The compiler/linker won't detect unsafe constructs or code generation,
   because it assumes that anything built with -fPIC is going to be patched
   up later by ld.so or equivalent.

 * The GOT is generated by the linker, and is a single table. Yet each
   SRAM blob needs to be able to refer to its own GOT entries position-
   independently. Moving the blobs independently won't work.

In other words: -fPIC does not generate position-independent code.

It generates position-dependent code that is easier to move around than
non-fPIC code, but you still need a dynamic linker (or equivalent) to
make it all work.

There are various "correct" ways to handle this, the simplest of which
is probably to build each SRAM blob as a kernel module, embed the result
in the kernel somehow, and then use the module loader infrastructure
to handle fixing the module up to the right address.

But this is still likely to be overkill, given the small scale of the
SRAM code.

Restricting such code to carefully-written assembler (as now) may be
the more practical approach, unless there's a good example of somewhere
that C code would provide a big benefit.

Cheers
---Dave

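The relocation concern can be reproduced by analogy in ordinary userspace C: an absolute address stored inside a blob keeps pointing at the original after a byte-for-byte copy, much as a literal pool or GOT entry would. This is only an analogy and not kernel code; `struct blob`, `blob_copy()`, `blob_is_self_contained()` and `blob_relocate()` are hypothetical names used for the sketch.

```c
#include <assert.h>
#include <string.h>

struct blob {
	int counter;
	int *counter_ptr;	/* absolute address, like a literal pool or GOT entry */
};

/* Copy the blob byte for byte, the way code is copied into SRAM. */
static void blob_copy(struct blob *dst, const struct blob *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Does the stored address refer to the blob's own counter? */
static int blob_is_self_contained(const struct blob *b)
{
	return b->counter_ptr == &b->counter;
}

/* The fix-up a loader (or equivalent) would have to apply after the copy. */
static void blob_relocate(struct blob *dst, const struct blob *src)
{
	assert(src->counter_ptr == &src->counter);
	dst->counter_ptr = &dst->counter;
}
```

Until something like `blob_relocate()` runs, the copy silently reads and writes the original's data, which is the failure mode that neither the compiler nor the linker would flag.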
* [RFC 0/4] Create infrastructure for running C code from SRAM. 2013-09-06 16:19 ` Dave Martin @ 2013-09-06 19:42 ` Russ Dill 0 siblings, 0 replies; 17+ messages in thread From: Russ Dill @ 2013-09-06 19:42 UTC (permalink / raw) To: linux-arm-kernel On Fri, Sep 6, 2013 at 9:19 AM, Dave Martin <Dave.Martin@arm.com> wrote: > On Fri, Sep 06, 2013 at 12:12:21PM +0100, Russell King - ARM Linux wrote: >> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote: >> > SRAM handling code is in the process of being moved from arch directories >> > into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC >> > patchset builds on that, including the limitation that the SRAM address is >> > not known at compile time. Because the SRAM address is not known at compile >> > time, the code that runs from SRAM must be compiled with -fPIC. Even if >> > the code were loaded to a fixed virtual address, portions of the code must >> > often be run with the MMU disabled. >> >> What are you doing about the various gcc utility functions that may be >> implicitly called from C code such as memcpy and memset? >> >> > The general idea is that for each SRAM user (such as an SoC specific >> > suspend/resume mechanism) to create a group of sections. The section group >> > is created with a single macro for each user, but end up looking like this: >> > >> > .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) { >> > __sram_am33xx_start = .; >> > *(.sram.am33xx.*) >> > __sram_am33xx_end = .; >> > } >> > >> > Any data or functions that should be copied to SRAM for this use should be >> > maked with an appropriate __section() attribute. A helper is then added for >> > translating between the original kernel symbol, and the address of that >> > function or variable once it has been copied into SRAM. Once control is >> > passed to a function within the SRAM section grouping, it can access any >> > variables or functions within that same SRAM section grouping without >> > translation. 
>> >> What about the relocations which will need to be fixed up - eg, addresses >> in the literal pool, the GOT table contents, etc? You say nothing about >> this. > > I was also thinking about this, and there are more problems. > > As well as what has already been mentioned: > > * Calls from inside the SRAM code to vmlinux (including lib1funcs etc.) > will typically break, except on architectures where function calls are > (absolute by default not ARM). As in the response to RMK, I think compiler flags are enough to prevent implicit memcpy/memset calls. The code would not be allowed to do division, modulo, or 64-bit multiplication. A make rule would check the sram sections for any dynamically relocatable symbols. > * The compiler/linker won't detect unsafe constructs or code generation, > because it assumes that anything built with -fPIC is going to be patched > up later by ld.so or equivalent. Can you provide examples of what some of these other unsafe constructs might be? > * The GOT is generated by the linker, and is a single table. Yet each > SRAM blob needs to be able to refer to its own GOT entries position- > independently. Moving the blobs independently won't work. Would GOT entries only exist if there are accesses to .data or .bss? The SRAM C code would not support such a thing; only access to data and text within the SRAM grouping would be allowed. Is there a way to make the compiler or linker complain if such an access is done? If not, it'd be another make rule as above. > In other words: -fPIC does not generate position-independent code. > > It generates position-dependent code that is easier to move around than > non-fPIC code, but you still need a dynamic linker (or equivalent) to > make it all work. arch/arm/boot/compressed/ seems to manage it. Hopefully, by allowing only more limited code, I can get by with fewer tricks.
> There are various "correct" ways to handle this, the simplest of which > is probably to build each SRAM blob as a kernel module, embed the result > in the kernel somehow, and then use the module loader infrastructure > to handle fixing the module up to the right address. > > But this is still likely to be overkill, given the small scale of the > SRAM code. Yes, I'm pretty sure several people would scream rather loudly if getting suspend/resume support on their platform required CONFIG_MODULES=y. > Restricting such code to carefully-written assembler (as now) may be > the more practical approrach, unless there's a good example of somewhere > that C code would provide a big benefit. There are currently about 5000 or so lines of assembly code in arch/arm that are used for suspend/resume stubs. In one stage of am335x development, the sleep/resume stub for am335x was about 1200 lines long. Since then, a lot of that code has been moved to a firmware blob, but there has been some pushback on that, which is why I'm investigating this path. Especially given that there are some future platforms that will follow the am335x pm model. ^ permalink raw reply [flat|nested] 17+ messages in thread
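A minimal sketch of the symbol check proposed in the message above, written as a C filter rather than an actual make rule. It assumes the list of undefined symbol names for an SRAM section group has already been extracted by some other tool. The deny-list contents and the names `sram_symbol_denied()` and `sram_count_denied()` are illustrative assumptions, not part of the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical deny-list: helpers the SRAM code must not pull in from vmlinux. */
static const char *const denied_names[] = { "memcpy", "memset", "memmove" };

/* True if an undefined symbol name would need a call out of the SRAM blob. */
static bool sram_symbol_denied(const char *name)
{
	size_t i;

	assert(name != NULL);

	/* Same match as searching the symbol list for ^__aeabi_ */
	if (strncmp(name, "__aeabi_", strlen("__aeabi_")) == 0)
		return true;

	for (i = 0; i < sizeof(denied_names) / sizeof(denied_names[0]); i++)
		if (strcmp(name, denied_names[i]) == 0)
			return true;

	return false;
}

/* Count offending names in a symbol list; 0 means the blob passes the check. */
static size_t sram_count_denied(const char *const *undefined, size_t n)
{
	size_t i, bad = 0;

	for (i = 0; i < n; i++)
		if (sram_symbol_denied(undefined[i]))
			bad++;

	return bad;
}
```

A build step along these lines would fail the build whenever the count is non-zero, which turns a silent runtime breakage into a compile-time error.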
* [RFC 0/4] Create infrastructure for running C code from SRAM. 2013-09-06 11:12 ` Russell King - ARM Linux 2013-09-06 16:19 ` Dave Martin @ 2013-09-06 19:32 ` Russ Dill 2013-09-07 16:21 ` Ard Biesheuvel 1 sibling, 1 reply; 17+ messages in thread From: Russ Dill @ 2013-09-06 19:32 UTC (permalink / raw) To: linux-arm-kernel On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote: >> SRAM handling code is in the process of being moved from arch directories >> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC >> patchset builds on that, including the limitation that the SRAM address is >> not known at compile time. Because the SRAM address is not known at compile >> time, the code that runs from SRAM must be compiled with -fPIC. Even if >> the code were loaded to a fixed virtual address, portions of the code must >> often be run with the MMU disabled. > > What are you doing about the various gcc utility functions that may be > implicitly called from C code such as memcpy and memset? That would create a problem. Would '-ffreestanding' be the correct flag to add? As far as the family of __aeabi_*, I need to add documentation stating that on ARM, you can't divide, perform modulo, and can't do 64 bit multiplications. I can then add a make rule that will grep the symbol lists of .sram sections for ^__aeabi_. Is this enough? >> The general idea is that for each SRAM user (such as an SoC specific >> suspend/resume mechanism) to create a group of sections. The section group >> is created with a single macro for each user, but end up looking like this: >> >> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) { >> __sram_am33xx_start = .; >> *(.sram.am33xx.*) >> __sram_am33xx_end = .; >> } >> >> Any data or functions that should be copied to SRAM for this use should be >> maked with an appropriate __section() attribute. 
A helper is then added for >> translating between the original kernel symbol, and the address of that >> function or variable once it has been copied into SRAM. Once control is >> passed to a function within the SRAM section grouping, it can access any >> variables or functions within that same SRAM section grouping without >> translation. > > What about the relocations which will need to be fixed up - eg, addresses > in the literal pool, the GOT table contents, etc? You say nothing about > this. The C code would need to be written so that such accesses do not occur. From functions that are in the sram text section, only accesses to other sram sections in their group would be allowed. As above, a build step could be added to make the compilation fail when such things happen. The direction you are going is a good one, though: if this is fragile, it doesn't put us in a better place than having fragile asm that at least compiles reliably. As I'm looking towards the future and platforms that are similar to am335x in that they are placing more and more of the PM state machine burden on the CPU, I'd really like to try to make this happen. ^ permalink raw reply [flat|nested] 17+ messages in thread
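To make the "no division or modulo" restriction from the message above concrete, here is a hedged sketch of how SRAM code might avoid the operators that, on ARM cores without a hardware divider, are normally compiled into calls to `__aeabi_uidiv` and friends. The function names are hypothetical. The `assert()` is a host-side check only, since real SRAM code could not call out to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divide by a power of two with a shift; log2_divisor must be below 32. */
static uint32_t div_pow2(uint32_t x, unsigned int log2_divisor)
{
	return x >> log2_divisor;
}

/* Remainder for a power-of-two divisor, using a mask instead of '%'. */
static uint32_t mod_pow2(uint32_t x, unsigned int log2_divisor)
{
	return x & ((UINT32_C(1) << log2_divisor) - 1);
}

/*
 * General unsigned division by shift-and-subtract. Slow, but it uses only
 * shifts, compares and subtractions, so no library helper is required.
 */
static uint32_t div_u32(uint32_t num, uint32_t den, uint32_t *rem)
{
	uint32_t quot = 0;
	uint32_t r = 0;
	int bit;

	assert(den != 0);	/* host-side check; SRAM code needs its own policy */

	for (bit = 31; bit >= 0; bit--) {
		/* Bring down the next bit of the numerator. */
		r = (r << 1) | ((num >> bit) & 1);
		if (r >= den) {
			r -= den;
			quot |= UINT32_C(1) << bit;
		}
	}

	if (rem != NULL)
		*rem = r;
	return quot;
}
```

Whether the compiler really emits no helper calls for a given function still has to be verified on the generated object, for example with the symbol check discussed in this thread.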
* [RFC 0/4] Create infrastructure for running C code from SRAM. 2013-09-06 19:32 ` Russ Dill @ 2013-09-07 16:21 ` Ard Biesheuvel 2013-09-09 23:10 ` Russ Dill 0 siblings, 1 reply; 17+ messages in thread From: Ard Biesheuvel @ 2013-09-07 16:21 UTC (permalink / raw) To: linux-arm-kernel On 6 September 2013 21:32, Russ Dill <Russ.Dill@ti.com> wrote: > On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux > <linux@arm.linux.org.uk> wrote: >> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote: >>> SRAM handling code is in the process of being moved from arch directories >>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC >>> patchset builds on that, including the limitation that the SRAM address is >>> not known at compile time. Because the SRAM address is not known at compile >>> time, the code that runs from SRAM must be compiled with -fPIC. Even if >>> the code were loaded to a fixed virtual address, portions of the code must >>> often be run with the MMU disabled. >> >> What are you doing about the various gcc utility functions that may be >> implicitly called from C code such as memcpy and memset? > > That would create a problem. Would '-ffreestanding' be the correct > flag to add? No, unfortunately, -ffreestanding won't prevent GCC from generating implicit calls to memzero() et al. These are mainly issued when using initialized non-POD stack variables so avoiding those might help you there. > As far as the family of __aeabi_*, I need to add > documentation stating that on ARM, you can't divide, perform modulo, > and can't do 64 bit multiplications. I can then add a make rule that > will grep the symbol lists of .sram sections for ^__aeabi_. Is this > enough? > Well, even printk() needs integer division for its %d/%u modifiers, so this is really not so easy to achieve. >>> The general idea is that for each SRAM user (such as an SoC specific >>> suspend/resume mechanism) to create a group of sections. 
The section group >>> is created with a single macro for each user, but end up looking like this: >>> >>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) { >>> __sram_am33xx_start = .; >>> *(.sram.am33xx.*) >>> __sram_am33xx_end = .; >>> } >>> >>> Any data or functions that should be copied to SRAM for this use should be >>> maked with an appropriate __section() attribute. A helper is then added for >>> translating between the original kernel symbol, and the address of that >>> function or variable once it has been copied into SRAM. Once control is >>> passed to a function within the SRAM section grouping, it can access any >>> variables or functions within that same SRAM section grouping without >>> translation. >> >> What about the relocations which will need to be fixed up - eg, addresses >> in the literal pool, the GOT table contents, etc? You say nothing about >> this. > > The C code would need to be written so that such accesses do not > occur. From functions that are in the sram text section, only accesses > to other sram sections in their group would be allowed. And above, a > compilation step could be added to make the compilation fail when such > things happen. > The point is that, sadly, GCC is just not very good at generating relocatable code for embedded targets. Playing with -fvisibility may result in code that contains fewer dynamic relocations, but you will always end up with a few that need to be fixed up before the code can run. Another thing to note is that usually, these relocations can only be fixed up once, as the addend is overwritten by the fixed-up address. This means that the code can only run in SRAM, and you should probably best avoid the module loader machinery as it may clobber the addends before you get to process them. One thing that remains implicit in this discussion is that you are executing from SRAM because DRAM is not available (I presume). 
Wouldn't it be better to treat the code that lives in the SRAM as a completely separate executable? You can generate a PIE executable that supplies minimal memzero et al, fix up the relocations yourself (look at the U-Boot sources for an example of this) and you will be absolutely sure that the code can run completely autonomously. In fact, some of this stuff could potentially be reused for other disjoint execution domains such as the TrustZone secure world. Regards, Ard. > The direction you are going though is good, if this is fragile, it > doesn't put us in a better place then having fragile asm that at least > complies reliably. As I'm looking towards the future and platforms > that are similar to am335x in that they are placing more and more of > the PM state machine burden on the CPU, I'd really like to try to make > this happen. ^ permalink raw reply [flat|nested] 17+ messages in thread
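The "fix up the relocations yourself" suggestion above can be sketched as a loop over REL-style entries. This is a simplified host-side illustration, not U-Boot's or the kernel's code. `struct rel_entry` mirrors the two fields of `Elf32_Rel`, `R_ARM_RELATIVE` is the ARM ELF relocation type 23, and treating `r_offset` as a byte offset from the start of the image is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_ARM_RELATIVE 23	/* relocation type from the ARM ELF ABI */

/* Same two fields as Elf32_Rel: where to patch, and how. */
struct rel_entry {
	uint32_t r_offset;	/* assumed: byte offset of the word to patch */
	uint32_t r_info;	/* low 8 bits hold the relocation type */
};

/*
 * Add 'delta' (load address minus link address) to every word named by an
 * R_ARM_RELATIVE entry. Returns the number of words patched.
 */
static size_t apply_relative_relocs(uint8_t *image, size_t image_size,
				    const struct rel_entry *rel, size_t count,
				    uint32_t delta)
{
	size_t i, patched = 0;

	assert(image != NULL && rel != NULL);

	for (i = 0; i < count; i++) {
		uint32_t word;

		if ((rel[i].r_info & 0xff) != R_ARM_RELATIVE)
			continue;	/* other types are not handled in this sketch */
		if (image_size < sizeof(word) ||
		    rel[i].r_offset > image_size - sizeof(word))
			continue;	/* out of range: skip rather than corrupt memory */

		/* REL format: the addend lives in the word being patched. */
		memcpy(&word, image + rel[i].r_offset, sizeof(word));
		word += delta;
		memcpy(image + rel[i].r_offset, &word, sizeof(word));
		patched++;
	}

	return patched;
}
```

Because the addend is stored in the patched word itself, running the loop a second time on the same image adds `delta` again. That is the "can only be fixed up once" property mentioned earlier in this message.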
* [RFC 0/4] Create infrastructure for running C code from SRAM. 2013-09-07 16:21 ` Ard Biesheuvel @ 2013-09-09 23:10 ` Russ Dill 0 siblings, 0 replies; 17+ messages in thread From: Russ Dill @ 2013-09-09 23:10 UTC (permalink / raw) To: linux-arm-kernel On Sat, Sep 7, 2013 at 9:21 AM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 6 September 2013 21:32, Russ Dill <Russ.Dill@ti.com> wrote: >> On Fri, Sep 6, 2013 at 4:12 AM, Russell King - ARM Linux >> <linux@arm.linux.org.uk> wrote: >>> On Tue, Sep 03, 2013 at 09:44:21AM -0700, Russ Dill wrote: >>>> SRAM handling code is in the process of being moved from arch directories >>>> into drivers/misc/sram.c using device tree and genalloc [1] [2]. This RFC >>>> patchset builds on that, including the limitation that the SRAM address is >>>> not known at compile time. Because the SRAM address is not known at compile >>>> time, the code that runs from SRAM must be compiled with -fPIC. Even if >>>> the code were loaded to a fixed virtual address, portions of the code must >>>> often be run with the MMU disabled. >>> >>> What are you doing about the various gcc utility functions that may be >>> implicitly called from C code such as memcpy and memset? >> >> That would create a problem. Would '-ffreestanding' be the correct >> flag to add? > > No, unfortunately, -ffreestanding won't prevent GCC from generating > implicit calls to memzero() et al. These are mainly issued when using > initialized non-POD stack variables so avoiding those might help you > there. >> As far as the family of __aeabi_*, I need to add >> documentation stating that on ARM, you can't divide, perform modulo, >> and can't do 64 bit multiplications. I can then add a make rule that >> will grep the symbol lists of .sram sections for ^__aeabi_. Is this >> enough? >> > > Well, even printk() needs integer division for its %d/%u modifiers, so > this is really not so easy to achieve. 
> >>>> The general idea is that for each SRAM user (such as an SoC specific >>>> suspend/resume mechanism) to create a group of sections. The section group >>>> is created with a single macro for each user, but end up looking like this: >>>> >>>> .sram.am33xx : AT(ADDR(.sram.am33xx) - 0) { >>>> __sram_am33xx_start = .; >>>> *(.sram.am33xx.*) >>>> __sram_am33xx_end = .; >>>> } >>>> >>>> Any data or functions that should be copied to SRAM for this use should be >>>> maked with an appropriate __section() attribute. A helper is then added for >>>> translating between the original kernel symbol, and the address of that >>>> function or variable once it has been copied into SRAM. Once control is >>>> passed to a function within the SRAM section grouping, it can access any >>>> variables or functions within that same SRAM section grouping without >>>> translation. >>> >>> What about the relocations which will need to be fixed up - eg, addresses >>> in the literal pool, the GOT table contents, etc? You say nothing about >>> this. >> >> The C code would need to be written so that such accesses do not >> occur. From functions that are in the sram text section, only accesses >> to other sram sections in their group would be allowed. And above, a >> compilation step could be added to make the compilation fail when such >> things happen. >> > > The point is that, sadly, GCC is just not very good at generating > relocatable code for embedded targets. Playing with -fvisibility may > result in code that contains fewer dynamic relocations, but you will > always end up with a few that need to be fixed up before the code can > run. Another thing to note is that usually, these relocations can only > be fixed up once, as the addend is overwritten by the fixed-up > address. This means that the code can only run in SRAM, and you should > probably best avoid the module loader machinery as it may clobber the > addends before you get to process them. 
> > One thing that remains implicit in this discussion is that you are > executing from SRAM because DRAM is not available (I presume). > Wouldn't it be better to treat the code that lives in the SRAM as a > completely separate executable? You can generate a PIE executable that > supplies minimal memzero et al, fixup the relocations yourself (look > at the uboot sources for an example of this) and you will be > absolutely sure that the code can run completely autonomously. In > fact, some of this stuff could potentially be reused for other > disjoint execution domains such as TZ secure world.
This is the path I'm going down, but I'm trying to do it without relocations. I'm following the model of arch/arm/boot/compressed and generating a relocatable gcc builtin library with weak symbols containing lib1funcs.S, string.c, ashldi3.S, and some stubs for div0 and the unwind symbols, all in sramlib.o. I'm then doing an objcopy of the .sramlib section and the .sram.* sections into a single object file and performing a link with a linker script like:
SECTIONS
{
	.text : {
		*(.sramlib)
	}
	OVERLAY ALIGN(32) : NOCROSSREFS {
		.sram.am33xx { *(.sram.am33xx.*) }
		.sram.am437x { *(.sram.am437x.*) }
	}
}
It produces output without any relocations, but from there I'm a little fuzzy on how to get the symbols of functions and variables into the kernel. In the meantime, I'll look into the U-Boot methods.
^ permalink raw reply [flat|nested] 17+ messages in thread
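One possible shape for the symbol translation that the cover letter describes (kernel symbol address to address inside the copied SRAM group) is a plain offset calculation against the group's start and end markers. The sketch below runs on a host with ordinary buffers standing in for the kernel image and SRAM. `struct sram_group`, `sram_group_load()` and `sram_group_translate()` are hypothetical names, not the helpers from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one SRAM section group. */
struct sram_group {
	const void *kern_start;	/* e.g. the address of __sram_am33xx_start */
	const void *kern_end;	/* e.g. the address of __sram_am33xx_end */
	void *sram_base;	/* where the group was copied to, NULL if not loaded */
};

/* Copy the whole group to its SRAM location and remember that location. */
static void sram_group_load(struct sram_group *g, void *sram_base)
{
	size_t len = (uintptr_t)g->kern_end - (uintptr_t)g->kern_start;

	assert(sram_base != NULL);
	memcpy(sram_base, g->kern_start, len);
	g->sram_base = sram_base;
}

/*
 * Translate the kernel-image address of a function or variable into its
 * address in SRAM. Returns NULL if the symbol lies outside the group or
 * the group has not been loaded yet.
 */
static void *sram_group_translate(const struct sram_group *g, const void *sym)
{
	uintptr_t start = (uintptr_t)g->kern_start;
	uintptr_t end = (uintptr_t)g->kern_end;
	uintptr_t addr = (uintptr_t)sym;

	if (g->sram_base == NULL || addr < start || addr >= end)
		return NULL;

	return (uint8_t *)g->sram_base + (addr - start);
}
```

This only works if everything inside the group is reachable by relative addressing once copied, which is the constraint the rest of the thread is about.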