From mboxrd@z Thu Jan  1 00:00:00 1970
From: York Sun <yorksun@freescale.com>
Date: Thu, 29 May 2014 13:21:01 -0700
Subject: [U-Boot] [Patch v3 2/4] ARMv8/FSL_LSCH3: Add FSL_LSCH3 SoC
In-Reply-To: <CAL_Jsq+nhGO3HDkuebq_TGc8kCkL-wuBvhXwWsOaQZYVoGOV9A@mail.gmail.com>
References: <1401320790-11440-1-git-send-email-yorksun@freescale.com>	<1401320790-11440-2-git-send-email-yorksun@freescale.com>	<CAL_JsqKiQxzXN43zdxfKgg2M+1qLJ9LumCyF=rsZW+Sth2A0YA@mail.gmail.com>	<53875000.6070908@freescale.com>
	<CAL_Jsq+nhGO3HDkuebq_TGc8kCkL-wuBvhXwWsOaQZYVoGOV9A@mail.gmail.com>
Message-ID: <538796AD.9070401@freescale.com>
List-Id: <u-boot.lists.denx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: u-boot@lists.denx.de

On 05/29/2014 10:37 AM, Rob Herring wrote:
> On Thu, May 29, 2014 at 10:19 AM, York Sun <yorksun@freescale.com> wrote:
>> On 05/29/2014 06:19 AM, Rob Herring wrote:
>>> On Wed, May 28, 2014 at 6:46 PM, York Sun <yorksun@freescale.com> wrote:
>>
>> <snip>
>>
>>>> +static void set_pgtable_section(u64 *page_table, u64 index, u64 section,
>>>> +                               u8 memory_type)
>>>> +{
>>>> +       u64 value;
>>>> +
>>>> +       value = section | PMD_TYPE_SECT | PMD_SECT_AF;
>>>> +       value |= PMD_ATTRINDX(memory_type);
>>>> +       page_table[index] = value;
>>>> +}
>>>
>>> This function looks like it should be common.
>>>
>>
>> There is a common version in arch/arm/cpu/armv8/cache_v8.c. This version has
>> more flexibility. I am not sure which one will be used as common.
> 
> Then add the flexibility to the common one.
> 

Will do.

> 
>>>> +       el = current_el();
>>>
>>> We really can't have u-boot running at random ELs in v8 for different
>>> platforms. It's a mess on v7. You should never be at EL3. u-boot could
>>> be defined to run at EL1, but then you need to be able to go back to
>>> EL2 to boot the kernel. So really u-boot should always run at EL2
>>> unless you are running in a VM, but that would be a different
>>> platform.
>>
>> I have to run u-boot at EL3. Otherwise I can't access Dickens. I vaguely
>> remember other reasons. I will have to dig it out if needed.
> 
> You may start in EL3, but then the early init code should drop to EL2.
> If you need EL3 later on for something PSCI does not address, then you
> are probably doing things wrong.

This part code supports all ELs. I think it is not wrong here. There is
something planned to have a secure monitor before u-boot and call it when
special access is needed. I am going to reorganize the code here to reuse more.

Is the general ARMv8 code already dropping to EL2?

> 
>>>> + * flush_l3_cache
>>>> + * Dickens L3 cache can be flushed by transitioning from FAM to SFONLY power
>>>> + * state, by writing to HP-F P-state request register.
>>>
>>> Other SOCs will have Dickens. Are these registers FSL specific? If
>>> not, this should be common.
>>
>> I don't think they are FSL specific. But I haven't found a proper place to host
>> it. Can you share what other SoCs have Dickens? If they are not supported yet,
>> we can keep the code here until we are clear then move it out.
> 
> It is safe to say most if not all SOCs based on A53 and/or A57 will
> also be based on Dickens aka CCN-504.

Great. But in order to to access CCN-504 register to flush L3 cache, it needs to
be EL3. At least for the SoC I am debugging. Please suggest a location to host
CCN-504 code. Should it go to drivers/misc?

> 
>>> Also, I believe the proper way to flush Dickens is using the
>>> architected cache flushing method where you walk the levels out to
>>> level 3.
>>
>> False. L3 cache gets flushed with instruction DCCIVAC. So if we flush the cache
>> by range, it works OK. But it doesn't work with DCISW or DCCISW. If we flush
>> cache by walking the levels, it doesn't work. We can only walk level 1 and level 2.
> 
> So the EL3 boot code should do any one-time invalidate all operations
> and u-boot in EL2 should only use range operations. There are
> limitations in the by way operations such as they are not SMP safe. If
> the by way operations are EL3 only, then that's probably a sign you
> are not doing things as intended. Or it could be an oversight and we
> need to figure out a common way to handle this across SOCs.
> 
>>>> + */
>>>> +#define HNF0_PSTATE_REQ 0x04200010
>>>> +#define HNF1_PSTATE_REQ 0x04210010
>>>> +#define HNF2_PSTATE_REQ 0x04220010
>>>> +#define HNF3_PSTATE_REQ 0x04230010
>>>> +#define HNF4_PSTATE_REQ 0x04240010
>>>> +#define HNF5_PSTATE_REQ 0x04250010
>>>> +#define HNF6_PSTATE_REQ 0x04260010
>>>> +#define HNF7_PSTATE_REQ 0x04270010
>>>> +#define HNFPSTAT_MASK (0xFFFFFFFFFFFFFFFC)
>>>> +#define HNFPSTAT_FAM   0x3
>>>> +#define HNFPSTAT_SFONLY 0x01
>>>> +
>>>> +static void hnf_pstate_req(u64 *ptr, u64 state)
>>>> +{
>>>> +       int timeout = 1000;
>>>> +       out_le64(ptr, (in_le64(ptr) & HNFPSTAT_MASK) | (state & 0x3));
>>>> +       ptr++;
>>>> +       /* checking if the transition is completed */
>>>> +       while (timeout > 0) {
>>>> +               if (((in_le64(ptr) & 0x0c) >> 2) == (state & 0x3))
>>>> +                       break;
>>>> +               udelay(100);
>>>> +               timeout--;
>>>> +       }
>>>> +}
>>>> +
>>>> +void flush_l3_cache(void)
>>>> +{
>>>> +       hnf_pstate_req((u64 *)HNF0_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF1_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF2_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF3_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF4_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF5_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF6_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF7_PSTATE_REQ, HNFPSTAT_SFONLY);
>>>> +       hnf_pstate_req((u64 *)HNF0_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF1_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF2_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF3_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF4_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF5_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF6_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +       hnf_pstate_req((u64 *)HNF7_PSTATE_REQ, HNFPSTAT_FAM);
>>>> +}
>>>> +
>>>> +/*
>>>> + * This function is called from lib/board.c.
>>>> + * It recreates MMU table in main memory. MMU and d-cache are enabled earlier.
>>>> + * There is no need to disable d-cache for this operation.
>>>> + */
>>>> +void enable_caches(void)
>>>> +{
>>>> +       final_mmu_setup();
>>>> +       flush_dcache_range(gd->arch.tlb_addr,
>>>> +                          gd->arch.tlb_addr +  gd->arch.tlb_size);
>>>> +       __asm_invalidate_tlb_all();
>>>> +}
>>>> +#endif
>>>> +
>>>> +static inline u32 init_type(u32 cluster, int init_id)
>>>
>>> init_type? That's a great name.
>>
>> That is initiator type. It is a funny name used by many FSL SoCs.


By the way, I am changing the name to initiator_type.

>>
>>>
>>>> +{
>>>> +       struct ccsr_gur *gur = (void *)(CONFIG_SYS_FSL_GUTS_ADDR);
>>>> +       u32 idx = (cluster >> (init_id * 8)) & TP_CLUSTER_INIT_MASK;
>>>> +       u32 type = in_le32(&gur->tp_ityp[idx]);
>>>> +
>>>> +       if (type & TP_ITYP_AV)
>>>> +               return type;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +u32 cpu_mask(void)
>>>> +{
>>>> +       struct ccsr_gur __iomem *gur = (void *)(CONFIG_SYS_FSL_GUTS_ADDR);
>>>> +       int i = 0, count = 0;
>>>> +       u32 cluster, type, mask = 0;
>>>> +
>>>> +       do {
>>>> +               int j;
>>>> +               cluster = in_le32(&gur->tp_cluster[i].lower);
>>>> +               for (j = 0; j < TP_INIT_PER_CLUSTER; j++) {
>>>> +                       type = init_type(cluster, j);
>>>> +                       if (type) {
>>>> +                               if (TP_ITYP_TYPE(type) == TP_ITYP_TYPE_ARM)
>>>> +                                       mask |= 1 << count;
>>>> +                               count++;
>>>> +                       }
>>>> +               }
>>>> +               i++;
>>>> +       } while ((cluster & TP_CLUSTER_EOC) != TP_CLUSTER_EOC);
>>>> +
>>>> +       return mask;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Return the number of cores on this SOC.
>>>> + */
>>>> +int cpu_numcores(void)
>>>> +{
>>>> +       return hweight32(cpu_mask());
>>>> +}
>>>> +
>>>> +int fsl_qoriq_core_to_cluster(unsigned int core)
>>>> +{
>>>> +       struct ccsr_gur __iomem *gur =
>>>> +               (void __iomem *)(CONFIG_SYS_FSL_GUTS_ADDR);
>>>> +       int i = 0, count = 0;
>>>> +       u32 cluster;
>>>> +
>>>> +       do {
>>>> +               int j;
>>>> +               cluster = in_le32(&gur->tp_cluster[i].lower);
>>>> +               for (j = 0; j < TP_INIT_PER_CLUSTER; j++) {
>>>> +                       if (init_type(cluster, j)) {
>>>> +                               if (count == core)
>>>> +                                       return i;
>>>> +                               count++;
>>>> +                       }
>>>> +               }
>>>> +               i++;
>>>> +       } while ((cluster & TP_CLUSTER_EOC) != TP_CLUSTER_EOC);
>>>> +
>>>> +       return -1;      /* cannot identify the cluster */
>>>> +}
>>>> +
>>>> +u32 fsl_qoriq_core_to_type(unsigned int core)
>>>> +{
>>>> +       struct ccsr_gur __iomem *gur =
>>>> +               (void __iomem *)(CONFIG_SYS_FSL_GUTS_ADDR);
>>>> +       int i = 0, count = 0;
>>>> +       u32 cluster, type;
>>>> +
>>>> +       do {
>>>> +               int j;
>>>> +               cluster = in_le32(&gur->tp_cluster[i].lower);
>>>> +               for (j = 0; j < TP_INIT_PER_CLUSTER; j++) {
>>>> +                       type = init_type(cluster, j);
>>>> +                       if (type) {
>>>> +                               if (count == core)
>>>> +                                       return type;
>>>> +                               count++;
>>>> +                       }
>>>> +               }
>>>> +               i++;
>>>> +       } while ((cluster & TP_CLUSTER_EOC) != TP_CLUSTER_EOC);
>>>> +
>>>> +       return -1;      /* cannot identify the cluster */
>>>> +}
>>>
>>> Do you plan on supporting PSCI because all this core and cluster stuff
>>> belongs there.
>>
>> What's PSCI?
> 
> Power State Coordination Interface, a spec from ARM to do cpu and
> cluster power/boot control. It's strongly encouraged for v8 and good
> luck with upstream kernel support without it.
> 
> There are patches for v7 to add PSCI implementation to u-boot from
> Marc Zyngier on the list. It is debatable whether u-boot is the right
> place for it rather than separate secure firmware. There is also ARM
> Trusted Firmware which implements PSCI and you should look at.

Thanks for the explanation. The above code is to probe the number of cores, to
support SMP.

> 
>> This code is similar to FSL chassis generation 2 code, but it is for chassis
>> generation 3. I plan to move it to a common place once we have another platform
>> using chassis generation 3.
> 
> Gen2 would be PowerPC, right? Not sure how that is relevant.

It is not relevant to ARM.

> 
>>>> +       /*
>>>> +        * Slave should wait for master clearing spin table.
>>>> +        * This sync prevent salves observing incorrect
>>>> +        * value of spin table and jumping to wrong place.
>>>> +        */
>>>> +#if defined(CONFIG_GICV2) || defined(CONFIG_GICV3)
>>>> +#ifdef CONFIG_GICV2
>>>> +       ldr     x0, =GICC_BASE
>>>> +#endif
>>>> +       bl      gic_wait_for_interrupt
>>>> +#endif
>>>> +
>>>> +       /*
>>>> +        * All processors will enter EL2 and optionally EL1.
>>>> +        */
>>>> +       bl      armv8_switch_to_el2
>>>> +#ifdef CONFIG_ARMV8_SWITCH_TO_EL1
>>>> +       bl      armv8_switch_to_el1
>>>> +#endif
>>>> +       b       2f
>>>
>>> This all looks like cut and paste from existing startup code. Can't
>>> you refactor things?
>>
>> Right. We have added some code which only applies to this SoC. That's why the
>> copy-n-paste then modify. I am also  holding other patches which add a lot more
>> code into this file.
> 
> Then add callouts so you can add SOC specific initialization.
> 

There is a reason to use weak for lowlevel_init. It would be easier to implement
real low level init without adding callout here and there. I am cleaning up the
GICv2 macro here so it will look cleaner. If we end up the same code when we
have the real SoC, I will cleanup this function. For now, I prefer the separate one.

York