From mboxrd@z Thu Jan 1 00:00:00 1970 From: York Sun Date: Thu, 29 May 2014 13:21:01 -0700 Subject: [U-Boot] [Patch v3 2/4] ARMv8/FSL_LSCH3: Add FSL_LSCH3 SoC In-Reply-To: References: <1401320790-11440-1-git-send-email-yorksun@freescale.com> <1401320790-11440-2-git-send-email-yorksun@freescale.com> <53875000.6070908@freescale.com> Message-ID: <538796AD.9070401@freescale.com> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: u-boot@lists.denx.de On 05/29/2014 10:37 AM, Rob Herring wrote: > On Thu, May 29, 2014 at 10:19 AM, York Sun wrote: >> On 05/29/2014 06:19 AM, Rob Herring wrote: >>> On Wed, May 28, 2014 at 6:46 PM, York Sun wrote: >> >> >> >>>> +static void set_pgtable_section(u64 *page_table, u64 index, u64 section, >>>> + u8 memory_type) >>>> +{ >>>> + u64 value; >>>> + >>>> + value = section | PMD_TYPE_SECT | PMD_SECT_AF; >>>> + value |= PMD_ATTRINDX(memory_type); >>>> + page_table[index] = value; >>>> +} >>> >>> This function looks like it should be common. >>> >> >> There is a common version in arch/arm/cpu/armv8/cache_v8.c. This version has >> more flexibility. I am not sure which one will be used as common. > > Then add the flexibility to the common one. > Will do. > >>>> + el = current_el(); >>> >>> We really can't have u-boot running at random ELs in v8 for different >>> platforms. It's a mess on v7. You should never be at EL3. u-boot could >>> be defined to run at EL1, but then you need to be able to go back to >>> EL2 to boot the kernel. So really u-boot should always run at EL2 >>> unless you are running in a VM, but that would be a different >>> platform. >> >> I have to run u-boot at EL3. Otherwise I can't access Dickens. I vaguely >> remember other reasons. I will have to dig it out if needed. > > You may start in EL3, but then the early init code should drop to EL2. > If you need EL3 later on for something PSCI does not address, then you > are probably doing things wrong. This part code supports all ELs. I think it is not wrong here. There is something planned to have a secure monitor before u-boot and call it when special access is needed. I am going to reorganize the code here to reuse more. Is the general ARMv8 code already dropping to EL2? > >>>> + * flush_l3_cache >>>> + * Dickens L3 cache can be flushed by transitioning from FAM to SFONLY power >>>> + * state, by writing to HP-F P-state request register. >>> >>> Other SOCs will have Dickens. Are these registers FSL specific? If >>> not, this should be common. >> >> I don't think they are FSL specific. But I haven't found a proper place to host >> it. Can you share what other SoCs have Dickens? If they are not supported yet, >> we can keep the code here until we are clear then move it out. > > It is safe to say most if not all SOCs based on A53 and/or A57 will > also be based on Dickens aka CCN-504. Great. But in order to to access CCN-504 register to flush L3 cache, it needs to be EL3. At least for the SoC I am debugging. Please suggest a location to host CCN-504 code. Should it go to drivers/misc? > >>> Also, I believe the proper way to flush Dickens is using the >>> architected cache flushing method where you walk the levels out to >>> level 3. >> >> False. L3 cache gets flushed with instruction DCCIVAC. So if we flush the cache >> by range, it works OK. But it doesn't work with DCISW or DCCISW. If we flush >> cache by walking the levels, it doesn't work. We can only walk level 1 and level 2. > > So the EL3 boot code should do any one-time invalidate all operations > and u-boot in EL2 should only use range operations. There are > limitations in the by way operations such as they are not SMP safe. If > the by way operations are EL3 only, then that's probably a sign you > are not doing things as intended. Or it could be an oversight and we > need to figure out a common way to handle this across SOCs. > >>>> + */ >>>> +#define HNF0_PSTATE_REQ 0x04200010 >>>> +#define HNF1_PSTATE_REQ 0x04210010 >>>> +#define HNF2_PSTATE_REQ 0x04220010 >>>> +#define HNF3_PSTATE_REQ 0x04230010 >>>> +#define HNF4_PSTATE_REQ 0x04240010 >>>> +#define HNF5_PSTATE_REQ 0x04250010 >>>> +#define HNF6_PSTATE_REQ 0x04260010 >>>> +#define HNF7_PSTATE_REQ 0x04270010 >>>> +#define HNFPSTAT_MASK (0xFFFFFFFFFFFFFFFC) >>>> +#define HNFPSTAT_FAM 0x3 >>>> +#define HNFPSTAT_SFONLY 0x01 >>>> + >>>> +static void hnf_pstate_req(u64 *ptr, u64 state) >>>> +{ >>>> + int timeout = 1000; >>>> + out_le64(ptr, (in_le64(ptr) & HNFPSTAT_MASK) | (state & 0x3)); >>>> + ptr++; >>>> + /* checking if the transition is completed */ >>>> + while (timeout > 0) { >>>> + if (((in_le64(ptr) & 0x0c) >> 2) == (state & 0x3)) >>>> + break; >>>> + udelay(100); >>>> + timeout--; >>>> + } >>>> +} >>>> + >>>> +void flush_l3_cache(void) >>>> +{ >>>> + hnf_pstate_req((u64 *)HNF0_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF1_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF2_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF3_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF4_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF5_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF6_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF7_PSTATE_REQ, HNFPSTAT_SFONLY); >>>> + hnf_pstate_req((u64 *)HNF0_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF1_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF2_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF3_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF4_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF5_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF6_PSTATE_REQ, HNFPSTAT_FAM); >>>> + hnf_pstate_req((u64 *)HNF7_PSTATE_REQ, HNFPSTAT_FAM); >>>> +} >>>> + >>>> +/* >>>> + * This function is called from lib/board.c. >>>> + * It recreates MMU table in main memory. MMU and d-cache are enabled earlier. >>>> + * There is no need to disable d-cache for this operation. >>>> + */ >>>> +void enable_caches(void) >>>> +{ >>>> + final_mmu_setup(); >>>> + flush_dcache_range(gd->arch.tlb_addr, >>>> + gd->arch.tlb_addr + gd->arch.tlb_size); >>>> + __asm_invalidate_tlb_all(); >>>> +} >>>> +#endif >>>> + >>>> +static inline u32 init_type(u32 cluster, int init_id) >>> >>> init_type? That's a great name. >> >> That is initiator type. It is a funny name used by many FSL SoCs. By the way, I am changing the name to initiator_type. >> >>> >>>> +{ >>>> + struct ccsr_gur *gur = (void *)(CONFIG_SYS_FSL_GUTS_ADDR); >>>> + u32 idx = (cluster >> (init_id * 8)) & TP_CLUSTER_INIT_MASK; >>>> + u32 type = in_le32(&gur->tp_ityp[idx]); >>>> + >>>> + if (type & TP_ITYP_AV) >>>> + return type; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +u32 cpu_mask(void) >>>> +{ >>>> + struct ccsr_gur __iomem *gur = (void *)(CONFIG_SYS_FSL_GUTS_ADDR); >>>> + int i = 0, count = 0; >>>> + u32 cluster, type, mask = 0; >>>> + >>>> + do { >>>> + int j; >>>> + cluster = in_le32(&gur->tp_cluster[i].lower); >>>> + for (j = 0; j < TP_INIT_PER_CLUSTER; j++) { >>>> + type = init_type(cluster, j); >>>> + if (type) { >>>> + if (TP_ITYP_TYPE(type) == TP_ITYP_TYPE_ARM) >>>> + mask |= 1 << count; >>>> + count++; >>>> + } >>>> + } >>>> + i++; >>>> + } while ((cluster & TP_CLUSTER_EOC) != TP_CLUSTER_EOC); >>>> + >>>> + return mask; >>>> +} >>>> + >>>> +/* >>>> + * Return the number of cores on this SOC. >>>> + */ >>>> +int cpu_numcores(void) >>>> +{ >>>> + return hweight32(cpu_mask()); >>>> +} >>>> + >>>> +int fsl_qoriq_core_to_cluster(unsigned int core) >>>> +{ >>>> + struct ccsr_gur __iomem *gur = >>>> + (void __iomem *)(CONFIG_SYS_FSL_GUTS_ADDR); >>>> + int i = 0, count = 0; >>>> + u32 cluster; >>>> + >>>> + do { >>>> + int j; >>>> + cluster = in_le32(&gur->tp_cluster[i].lower); >>>> + for (j = 0; j < TP_INIT_PER_CLUSTER; j++) { >>>> + if (init_type(cluster, j)) { >>>> + if (count == core) >>>> + return i; >>>> + count++; >>>> + } >>>> + } >>>> + i++; >>>> + } while ((cluster & TP_CLUSTER_EOC) != TP_CLUSTER_EOC); >>>> + >>>> + return -1; /* cannot identify the cluster */ >>>> +} >>>> + >>>> +u32 fsl_qoriq_core_to_type(unsigned int core) >>>> +{ >>>> + struct ccsr_gur __iomem *gur = >>>> + (void __iomem *)(CONFIG_SYS_FSL_GUTS_ADDR); >>>> + int i = 0, count = 0; >>>> + u32 cluster, type; >>>> + >>>> + do { >>>> + int j; >>>> + cluster = in_le32(&gur->tp_cluster[i].lower); >>>> + for (j = 0; j < TP_INIT_PER_CLUSTER; j++) { >>>> + type = init_type(cluster, j); >>>> + if (type) { >>>> + if (count == core) >>>> + return type; >>>> + count++; >>>> + } >>>> + } >>>> + i++; >>>> + } while ((cluster & TP_CLUSTER_EOC) != TP_CLUSTER_EOC); >>>> + >>>> + return -1; /* cannot identify the cluster */ >>>> +} >>> >>> Do you plan on supporting PSCI because all this core and cluster stuff >>> belongs there. >> >> What's PSCI? > > Power State Coordination Interface, a spec from ARM to do cpu and > cluster power/boot control. It's strongly encouraged for v8 and good > luck with upstream kernel support without it. > > There are patches for v7 to add PSCI implementation to u-boot from > Marc Zyngier on the list. It is debatable whether u-boot is the right > place for it rather than separate secure firmware. There is also ARM > Trusted Firmware which implements PSCI and you should look at. Thanks for the explanation. The above code is to probe the number of cores, to support SMP. > >> This code is similar to FSL chassis generation 2 code, but it is for chassis >> generation 3. I plan to move it to a common place once we have another platform >> using chassis generation 3. > > Gen2 would be PowerPC, right? Not sure how that is relevant. It is not relevant to ARM. > >>>> + /* >>>> + * Slave should wait for master clearing spin table. >>>> + * This sync prevent salves observing incorrect >>>> + * value of spin table and jumping to wrong place. >>>> + */ >>>> +#if defined(CONFIG_GICV2) || defined(CONFIG_GICV3) >>>> +#ifdef CONFIG_GICV2 >>>> + ldr x0, =GICC_BASE >>>> +#endif >>>> + bl gic_wait_for_interrupt >>>> +#endif >>>> + >>>> + /* >>>> + * All processors will enter EL2 and optionally EL1. >>>> + */ >>>> + bl armv8_switch_to_el2 >>>> +#ifdef CONFIG_ARMV8_SWITCH_TO_EL1 >>>> + bl armv8_switch_to_el1 >>>> +#endif >>>> + b 2f >>> >>> This all looks like cut and paste from existing startup code. Can't >>> you refactor things? >> >> Right. We have added some code which only applies to this SoC. That's why the >> copy-n-paste then modify. I am also holding other patches which add a lot more >> code into this file. > > Then add callouts so you can add SOC specific initialization. > There is a reason to use weak for lowlevel_init. It would be easier to implement real low level init without adding callout here and there. I am cleaning up the GICv2 macro here so it will look cleaner. If we end up the same code when we have the real SoC, I will cleanup this function. For now, I prefer the separate one. York