From mboxrd@z Thu Jan 1 00:00:00 1970
From: linus.ml.walleij@gmail.com (Linus Walleij)
Date: Thu, 25 Mar 2010 16:47:43 +0100
Subject: PXA3xx internal SRAM
In-Reply-To: <771cded01003250558y6bf9a8b6q7fc969d448faa1df@mail.gmail.com>
References: <20100321124739.GG30801@buzzloop.caiaq.de>
 <63386a3d1003221409j63413b5o7a0515836eaa8b86@mail.gmail.com>
 <771cded01003250558y6bf9a8b6q7fc969d448faa1df@mail.gmail.com>
Message-ID: <63386a3d1003250847n3ac9e1fg5f7873bad6006af@mail.gmail.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

[Haojian]
> 1. TCM may be structured as a Harvard architecture with separate ITCM
> and DTCM. It may also be structured as a Von Neumann architecture with
> a unified TCM.

Yes and no, in that case you only have ITCM. The ARM hardware
registers have no entry for something like a unified TCM. The ITCM
can always be used for data as well, so to be precise the ITCM is
a Von Neumann arch and the DTCM is a Harvard addition, so the
sum result is a hybrid.

> It depends on the silicon implementation. It seems
> that the current implementation of the tcm module is for separated
> ITCM & DTCM.

You mean that in PXA3xx it's an ITCM and a DTCM and the thing
called SRAM is actually an ITCM?

> But if it's a unified TCM, we can't copy data into TCM in
> initialization since there's no DTCM.

Yeah that's true... :-/

I think it can easily be fixed with some #if FOO preprocessor
hacking in arch/arm/kernel/vmlinux.lds.S so that symbols marked
with .tcm.data are allocated to the ITCM if DTCM_OFFSET is not
defined, so you only define ITCM_OFFSET and ITCM_END for your
system.

If you promise to test it, I can make a try at fixing this, because
I have no system to test it on.

> 2. All free memory of ITCM and DTCM is joined into tcm_pool. Does it
> mean that the purpose of all free memory is storing data, not
> instructions?

Dynamic data can be stored in this pool, yes.

> 3. Allocating memory could be from either ITCM or DTCM.
> If a piece of a program is copied into allocated memory of DTCM,
> could instructions of this program piece be fetched from DTCM?

No. But a memory heap should not be executable anyway; the only
scenario where this is applicable is if you want to load code into
[I|D]TCM at runtime, which is not currently supported by the API.
Code is only assigned to (I)TCM locations at compile time.

> 4. Both ITCM and DTCM are configured as uncached. Is it necessary to
> export an API to configure them as cached in order to improve
> performance?

Both ITCM and DTCM, if you have them, are "above" the caches.
The caches don't see them, so they must be uncached. But don't worry:
TCM memory is just as fast as cache, and will never miss a cache line,
that is why it exists :-)

> 5. ARM supports smart cache that switches the functionality between
> TCM and cache. Does it need to be supported by the TCM module?

It is not necessary and not supported right now, but it would be cool
to have this. In that case we shouldn't free the pages used by the
TCM upload area after kernel init, instead we should keep it as a
backup storage area for TCM when it's used as cache, then switch
this back and forth when TCM is to be used.

Do you have this in your system?

I assume the only practical usage for this would be to disable a
cache and enable its use as TCM when going to sleep for example,
is this what you intend to do with your system?

> 6. In current TCM module,

It's not a module really, it's a part of the core ARM arch.

> the co-processor read instructions are
> contained. It means that it's closely bound to ARM TCM. In a custom
> SoC, internal SRAM is just similar to TCM. It doesn't support these
> co-processor instructions for acquiring region and size. What's your
> suggestion on supporting this kind of SoC in the TCM module?

The TCM code is used for the TCM that is part of the ARM
architecture, it has no other intended usage. SRAMs are typically
Von Neumann type and not as complicated as a pair of TCMs.

There are several SRAM solutions already in the kernel I think,
implemented per architecture, but no generic SRAM handler,
sadly :-(

It would be good if SRAM could be handled by a per-machine
vmlinux.lds.S file so that code can be compiled there from simple
C files, a generic include/linux/sram.h file to tag code sections
properly and something generic in the style of the TCM code to
handle the copying of code to SRAM and handling the residual
memory pool.

Yours,
Linus Walleij
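
To make the "copy code to SRAM, then hand out the residual memory as
a pool" idea above a bit more concrete, here is a minimal user-space
sketch. It is only an illustration under stated assumptions: every
name in it (struct sram, sram_init, sram_alloc, sram_pool_free,
SRAM_SIZE) is hypothetical and not an existing kernel API, the SRAM
is simulated with a plain array, and a simple bump allocator stands
in for a real pool allocator, so nothing can be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT an existing kernel API. */
#define SRAM_SIZE  4096u   /* assumed size of the simulated SRAM */
#define SRAM_ALIGN 8u      /* pool offsets are rounded up to this */

struct sram {
	uint8_t mem[SRAM_SIZE]; /* stands in for the on-chip SRAM window */
	size_t image_len;       /* bytes taken by the copied code/data image */
	size_t next;            /* offset of the first unused pool byte */
};

/* Round n up to the next multiple of SRAM_ALIGN. */
static size_t sram_align_up(size_t n)
{
	return (n + SRAM_ALIGN - 1) & ~(size_t)(SRAM_ALIGN - 1);
}

/*
 * Copy the load image to the start of the SRAM; whatever is left over
 * becomes the residual pool. Returns 0 on success, -1 if the image
 * does not fit.
 */
static int sram_init(struct sram *s, const void *image, size_t len)
{
	assert(s != NULL);
	if (len > SRAM_SIZE)
		return -1;
	memset(s->mem, 0, SRAM_SIZE);
	if (len > 0)
		memcpy(s->mem, image, len);
	s->image_len = len;
	s->next = sram_align_up(len);
	return 0;
}

/* Carve len bytes out of the residual pool, or return NULL if it is full. */
static void *sram_alloc(struct sram *s, size_t len)
{
	size_t need;
	void *p;

	if (len == 0 || len > SRAM_SIZE)
		return NULL;
	need = sram_align_up(len);
	if (need > SRAM_SIZE - s->next)
		return NULL;
	p = &s->mem[s->next];
	s->next += need;
	return p;
}

/* Bytes still available in the residual pool. */
static size_t sram_pool_free(const struct sram *s)
{
	return SRAM_SIZE - s->next;
}
```

A caller would run sram_init() once with the image that the linker
script collected, then use sram_alloc() for dynamic buffers. In a
real kernel implementation the image boundaries would come from
linker symbols and the pool would need a free operation as well.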