* PXA3xx internal SRAM
From: Daniel Mack @ 2010-03-21 12:47 UTC
To: linux-arm-kernel

The pxa3xx series features a comparatively big and fast internal SRAM of
256KB which is currently unused by the Linux kernel except for a 4-byte
return vector from suspend.

I wonder what this could be used for. Is there any kind of cache that
would be worth putting there to speed up things for example?
And could the uncompressor probably benefit from that?

Daniel

* PXA3xx internal SRAM
From: Haojian Zhuang @ 2010-03-22 0:47 UTC
To: linux-arm-kernel

On Sun, Mar 21, 2010 at 8:47 AM, Daniel Mack <daniel@caiaq.de> wrote:
> The pxa3xx series features a comparatively big and fast internal SRAM of
> 256KB which is currently unused by the Linux kernel except for a 4-byte
> return vector from suspend.
>
> I wonder what this could be used for. Is there any kind of cache that
> would be worth putting there to speed up things for example?
> And could the uncompressor probably benefit from that?
>
> Daniel

The size of the internal SRAM isn't always 256KB. It could be 128KB, 256KB
or 702KB. In the Marvell BSP, there's a memory management driver (aka IMM)
to handle internal SRAM, so drivers or user applications can make use
of it. One part of the internal SRAM is reserved for low power idle
mode. But neither of these two features has been pushed to the community yet.

Thanks
Haojian

* PXA3xx internal SRAM
From: Daniel Mack @ 2010-03-22 7:57 UTC
To: linux-arm-kernel

On Sun, Mar 21, 2010 at 08:47:22PM -0400, Haojian Zhuang wrote:
> On Sun, Mar 21, 2010 at 8:47 AM, Daniel Mack <daniel@caiaq.de> wrote:
> > The pxa3xx series features a comparatively big and fast internal SRAM of
> > 256KB which is currently unused by the Linux kernel except for a 4-byte
> > return vector from suspend.
> >
> > I wonder what this could be used for. Is there any kind of cache that
> > would be worth putting there to speed up things for example?
> > And could the uncompressor probably benefit from that?
> >
> > Daniel
>
> The size of the internal SRAM isn't always 256KB. It could be 128KB, 256KB
> or 702KB. In the Marvell BSP, there's a memory management driver (aka IMM)
> to handle internal SRAM, so drivers or user applications can make use
> of it. One part of the internal SRAM is reserved for low power idle
> mode. But neither of these two features has been pushed to the community yet.

OK. I was thinking more of some kernel-specific kind of cache that has
a fixed size of 128KB or 256KB and is consulted regularly. But I
don't know which kind of cache that could be.

As that's not necessarily ARM-related, I copied LKML. Maybe someone has an
idea.

Thanks,
Daniel

* PXA3xx internal SRAM
From: Linus Walleij @ 2010-03-22 21:09 UTC
To: linux-arm-kernel

2010/3/21 Daniel Mack <daniel@caiaq.de>:

> I wonder what this could be used for. Is there any kind of cache that
> would be worth putting there to speed up things for example?

Some ARM systems have TCM (Tightly Coupled Memory), which is similar
in character but is actually a feature of the ARM platforms rather than a
custom memory for a certain SoC.

You can find some documentation on how this can be used in
Documentation/arm/tcm.txt. As you can see there, we can compile code into
it, and the memory left over after that can be used as a generic memory
pool of fast always-on memory.

I raised the question as to whether the IRQ vectors could be put
into this memory, but Russell says this is unfortunately not possible,
because the top of the vector page is also used for other things. I don't
know whether this entire page can be moved to fast memory, though; perhaps
that concept needs to be revisited?

Linus Walleij

* PXA3xx internal SRAM
From: Haojian Zhuang @ 2010-03-25 12:58 UTC
To: linux-arm-kernel

On Mon, Mar 22, 2010 at 5:09 PM, Linus Walleij
<linus.ml.walleij@gmail.com> wrote:
> 2010/3/21 Daniel Mack <daniel@caiaq.de>:
>
>> I wonder what this could be used for. Is there any kind of cache that
>> would be worth putting there to speed up things for example?
>
> Some ARM systems have TCM (Tightly Coupled Memory), which is similar
> in character but is actually a feature of the ARM platforms rather than a
> custom memory for a certain SoC.
>
> You can find some documentation on how this can be used in
> Documentation/arm/tcm.txt. As you can see there, we can compile code into
> it, and the memory left over after that can be used as a generic memory
> pool of fast always-on memory.
>

Hi Linus,

I just have some questions on TCM to understand it.

1. TCM may be structured as a Harvard architecture with separate ITCM
and DTCM. It may also be structured as a Von Neumann architecture with
a unified TCM. It depends on the silicon implementation. It seems
that the current implementation of the tcm module is for separate ITCM &
DTCM. But if it's a unified TCM, we can't copy data into TCM at
initialization since there's no DTCM.

2. All free memory of ITCM and DTCM is joined into tcm_pool. Does it
mean that the purpose of all free memory is storing data, not
instructions?

3. Memory could be allocated from either ITCM or DTCM. If a piece of a
program is copied into memory allocated from DTCM, could the instructions
of this program piece be fetched from DTCM?

4. Both ITCM and DTCM are configured as uncached. Is it necessary to
export an API to configure them as cached in order to improve performance?

5. ARM supports smart cache, which switches the functionality between TCM
and cache. Does it need to be supported by the TCM module?

6. In the current TCM module, co-processor read instructions are
used. It means that it's closely bound to ARM TCM. In a custom SoC,
internal SRAM is merely similar to TCM. It doesn't support these
co-processor instructions for acquiring region and size. What's your
suggestion on supporting this kind of SoC in the TCM module?

Thanks
Haojian

* PXA3xx internal SRAM
From: Linus Walleij @ 2010-03-25 15:47 UTC
To: linux-arm-kernel

[Haojian]
> 1. TCM may be structured as a Harvard architecture with separate ITCM
> and DTCM. It may also be structured as a Von Neumann architecture with
> a unified TCM.

Yes and no, in that case you only have ITCM. The ARM hardware registers
have no entry for something like a unified TCM. The ITCM can always be
used for data as well, so to be precise the ITCM is a Von Neumann
arch and the DTCM is a Harvard addition, so the sum result is a hybrid.

> It depends on the silicon implementation. It seems
> that the current implementation of the tcm module is for separate ITCM &
> DTCM.

You mean that in PXA3xx it's an ITCM and a DTCM and the thing
called SRAM is actually an ITCM?

> But if it's a unified TCM, we can't copy data into TCM at
> initialization since there's no DTCM.

Yeah that's true... :-/

I think it can easily be fixed with some #if FOO preprocessor
hacking in arch/arm/kernel/vmlinux.ld.S so that symbols marked
with .tcm.data are allocated to the ITCM if DTCM_OFFSET is not
defined, so you only define ITCM_OFFSET and ITCM_END
for your system.

If you promise to test it, I can make a try at fixing this,
because I have no system to test it on.

> 2. All free memory of ITCM and DTCM is joined into tcm_pool. Does it
> mean that the purpose of all free memory is storing data, not
> instructions?

Dynamic data can be stored in this pool, yes.

> 3. Memory could be allocated from either ITCM or DTCM. If a piece of a
> program is copied into memory allocated from DTCM, could the instructions
> of this program piece be fetched from DTCM?

No. But a memory heap shall not be executable anyway; the only
scenario where this is applicable is if you want to load code into
[I|D]TCM at runtime, which is not currently supported by the API.
Code is only assigned to (I)TCM locations at compile time.

> 4. Both ITCM and DTCM are configured as uncached. Is it necessary to
> export an API to configure them as cached in order to improve performance?

Both ITCM and DTCM, if you have them, are "above" the caches.
The caches don't see them, so they must be uncached.

But don't worry: TCM memory is just as fast as cache, and
will never miss a cache line, that is why it exists :-)

> 5. ARM supports smart cache, which switches the functionality between TCM
> and cache. Does it need to be supported by the TCM module?

It is not necessary and not supported right now, but it would be cool
to have this. In that case we shouldn't free the pages used by the
TCM upload area after kernel init; instead we should keep it as a
backup storage area for TCM when it's used as cache, then switch
this back and forth when TCM is to be used.

Do you have this in your system?

I assume the only practical usage for this would be to disable a
cache and enable its use as TCM when going to sleep, for example.
Is this what you intend to do with your system?

> 6. In the current TCM module,

It's not a module really, it's a part of the core ARM arch.

> co-processor read instructions are
> used. It means that it's closely bound to ARM TCM. In a custom SoC,
> internal SRAM is merely similar to TCM. It doesn't support these
> co-processor instructions for acquiring region and size. What's your
> suggestion on supporting this kind of SoC in the TCM module?

The TCM code is used for the TCM that is part of the ARM architecture,
it has no other intended usage.

SRAMs are typically Von Neumann type and not as complicated
as a pair of TCMs.

There are several SRAM solutions already in the kernel I think,
implemented per architecture, but no generic SRAM handler,
sadly :-(

It would be good if SRAM could be handled by a per-machine
vmlinux.ld.S file so that code can be compiled there from simple
C files, a generic include/linux/sram.h file to tag code sections properly
and something generic in the style of the TCM code
to handle the copying of code to SRAM and handling the residual
memory pool.

Yours,
Linus Walleij
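
The scheme sketched in the closing paragraph of the message above (a tag that places objects in a dedicated section, a copy step into SRAM, and a pool made of whatever is left over) can be illustrated in userspace C. This is only a rough sketch under stated assumptions, not kernel code and not an existing API: SRAM_DATA, fake_sram, struct sram_layout, sram_upload() and sram_copy_of() are hypothetical names, the SRAM is simulated by an ordinary array, data rather than code is tagged to keep the example portable, and the __start_/__stop_ section symbols assume an ELF target linked with GNU ld or lld.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag, loosely modelled on the idea of the kernel's TCM tags.
 * The section name contains no dots so that the linker generates
 * __start_sram_data / __stop_sram_data automatically (ELF, GNU ld or lld). */
#define SRAM_DATA __attribute__((section("sram_data"), used))

/* Section bounds, provided by the linker. */
extern unsigned char __start_sram_data[];
extern unsigned char __stop_sram_data[];

/* Stand-in for the real SRAM mapping; the size is arbitrary. */
#define FAKE_SRAM_SIZE 4096
static unsigned char fake_sram[FAKE_SRAM_SIZE];

/* Two example objects that are tagged to live in SRAM. */
static SRAM_DATA uint32_t resume_vector = 0xdeadbeefu;
static SRAM_DATA uint8_t lookup_table[64] = { 1, 2, 3 };

/* Result of the upload: the copied image plus the residual pool. */
struct sram_layout {
	size_t image_len;          /* bytes occupied by tagged objects */
	unsigned char *pool_start; /* first free byte after the image */
	size_t pool_len;           /* bytes left for dynamic allocation */
};

/* Copy the tagged section into the simulated SRAM and report what is left. */
int sram_upload(struct sram_layout *out)
{
	size_t len = (size_t)(__stop_sram_data - __start_sram_data);

	if (len > FAKE_SRAM_SIZE)
		return -1; /* tagged objects do not fit */

	memcpy(fake_sram, __start_sram_data, len);
	out->image_len = len;
	out->pool_start = fake_sram + len;
	out->pool_len = FAKE_SRAM_SIZE - len;
	return 0;
}

/* Translate the link-time address of a tagged object to its SRAM copy. */
void *sram_copy_of(const void *obj)
{
	return fake_sram + ((const unsigned char *)obj - __start_sram_data);
}
```

Calling sram_upload() once at start-up gives the split between the copied image and the residual pool. In a real kernel the section bounds would instead come from the linker script, and the objects would be linked at their final SRAM addresses rather than translated at run time.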

* PXA3xx internal SRAM
From: Haojian Zhuang @ 2010-03-26 2:55 UTC
To: linux-arm-kernel

On Thu, Mar 25, 2010 at 11:47 AM, Linus Walleij
<linus.ml.walleij@gmail.com> wrote:
>> It depends on the silicon implementation. It seems
>> that the current implementation of the tcm module is for separate ITCM &
>> DTCM.
>
> You mean that in PXA3xx it's an ITCM and a DTCM and the thing
> called SRAM is actually an ITCM?
>
TCM is at the same level as the L1 cache. The internal SRAM of PXA3xx is
on the bus, so it sits behind L2.

>> But if it's a unified TCM, we can't copy data into TCM at
>> initialization since there's no DTCM.
>
> Yeah that's true... :-/
>
> I think it can easily be fixed with some #if FOO preprocessor
> hacking in arch/arm/kernel/vmlinux.ld.S so that symbols marked
> with .tcm.data are allocated to the ITCM if DTCM_OFFSET is not
> defined, so you only define ITCM_OFFSET and ITCM_END
> for your system.
>
> If you promise to test it, I can make a try at fixing this,
> because I have no system to test it on.
>
Internal SRAM is different from TCM. It seems that SRAM doesn't fit the
TCM code. I don't have any silicon with unified TCM, so I can't help you
with this.

>> 5. ARM supports smart cache, which switches the functionality between TCM
>> and cache. Does it need to be supported by the TCM module?
>
> It is not necessary and not supported right now, but it would be cool
> to have this. In that case we shouldn't free the pages used by the
> TCM upload area after kernel init; instead we should keep it as a
> backup storage area for TCM when it's used as cache, then switch
> this back and forth when TCM is to be used.
>
> Do you have this in your system?
>
> I assume the only practical usage for this would be to disable a
> cache and enable its use as TCM when going to sleep, for example.
> Is this what you intend to do with your system?
>
In pxa168, the SRAM can be configured as either SRAM or L2. Maybe keeping
it in one function is better.

>> co-processor read instructions are
>> used. It means that it's closely bound to ARM TCM. In a custom SoC,
>> internal SRAM is merely similar to TCM. It doesn't support these
>> co-processor instructions for acquiring region and size. What's your
>> suggestion on supporting this kind of SoC in the TCM module?
>
> The TCM code is used for the TCM that is part of the ARM architecture,
> it has no other intended usage.
>
> SRAMs are typically Von Neumann type and not as complicated
> as a pair of TCMs.
>
> There are several SRAM solutions already in the kernel I think,
> implemented per architecture, but no generic SRAM handler,
> sadly :-(
>
> It would be good if SRAM could be handled by a per-machine
> vmlinux.ld.S file so that code can be compiled there from simple
> C files, a generic include/linux/sram.h file to tag code sections properly
> and something generic in the style of the TCM code
> to handle the copying of code to SRAM and handling the residual
> memory pool.
>
It seems that copying code into SRAM may not bring a performance
benefit, since SRAM sits behind L2. If code can be locked into L2, it
may perform better. So I think the requirements for SRAM and TCM are
totally different. SRAM can't benefit from the TCM code. SRAM just
needs memory management for storing data.

Thanks
Haojian
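
As a rough illustration of the "memory management for storing data" asked for in the message above, a fixed SRAM window can be handed out in equal-sized chunks with a first-fit search. The sketch below is a userspace model under stated assumptions, not the Marvell IMM driver and not a kernel interface: struct sram_pool, sram_pool_init(), sram_alloc(), sram_free(), the 256KB size and the 64-byte chunk size are all hypothetical choices. As with the TCM pool interface, the caller passes the length again when freeing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRAM_SIZE  (256 * 1024)  /* assumed size; real parts differ */
#define CHUNK_SIZE 64            /* allocation granularity (assumption) */
#define NCHUNKS    (SRAM_SIZE / CHUNK_SIZE)

struct sram_pool {
	unsigned char *base;     /* start of the SRAM window */
	uint8_t used[NCHUNKS];   /* one flag per chunk, kept simple on purpose */
};

/* Start with every chunk free. */
void sram_pool_init(struct sram_pool *p, unsigned char *base)
{
	p->base = base;
	memset(p->used, 0, sizeof(p->used));
}

/* First-fit: return the first run of free chunks that can hold len bytes. */
void *sram_alloc(struct sram_pool *p, size_t len)
{
	size_t need, run = 0, i;

	if (len == 0 || len > SRAM_SIZE)
		return NULL;

	need = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
	for (i = 0; i < NCHUNKS; i++) {
		run = p->used[i] ? 0 : run + 1;
		if (run == need) {
			size_t first = i + 1 - need;

			memset(&p->used[first], 1, need);
			return p->base + first * CHUNK_SIZE;
		}
	}
	return NULL; /* no contiguous run is large enough */
}

/* Release a block; len must match the length passed to sram_alloc(). */
void sram_free(struct sram_pool *p, void *addr, size_t len)
{
	size_t first = (size_t)((unsigned char *)addr - p->base) / CHUNK_SIZE;
	size_t need = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;

	memset(&p->used[first], 0, need);
}
```

A byte per chunk is wasteful compared with a bitmap, but it keeps the search easy to follow. In-kernel code would more likely build on the existing general-purpose allocator in lib/genalloc.c than on a hand-rolled pool like this one.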