* boot time regressed a lot due to misaligned access probe
@ 2023-09-13  0:14 Jisheng Zhang
  2023-09-13 10:46 ` Ben Dooks
  0 siblings, 1 reply; 6+ messages in thread
From: Jisheng Zhang @ 2023-09-13  0:14 UTC (permalink / raw)
To: linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou, Evan Green

Hi all,

Probing one CPU for misaligned access costs about 0.06s, so it will cost
about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer,
which is powered by the SG2042.

I'm not sure of the reason for probing misaligned access on all CPUs. If
the HW doesn't behave as SMP on the misaligned-access side, then unless
userspace processes force CPU affinity, they will always suffer from this
non-SMP pain.

So, can we probe only the boot CPU?

Thanks

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: boot time regressed a lot due to misaligned access probe
  2023-09-13  0:14 boot time regressed a lot due to misaligned access probe Jisheng Zhang
@ 2023-09-13 10:46 ` Ben Dooks
  2023-09-13 15:11   ` Jisheng Zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Dooks @ 2023-09-13 10:46 UTC (permalink / raw)
To: Jisheng Zhang, linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou, Evan Green

On 13/09/2023 01:14, Jisheng Zhang wrote:
> Hi all,
>
> Probing one CPU for misaligned access costs about 0.06s, so it will cost
> about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer,
> which is powered by the SG2042.
>
> I'm not sure of the reason for probing misaligned access on all CPUs. If
> the HW doesn't behave as SMP on the misaligned-access side, then unless
> userspace processes force CPU affinity, they will always suffer from this
> non-SMP pain.
>
> So, can we probe only the boot CPU?

So a couple of ideas:

#1 is it worth adding a device-tree property to explicitly say whether
   the unaligned-access behaviour has already been measured and is known?

#2 only probe one CPU per cluster if there are multiple clusters of
   CPUs?

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html
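[Idea #1 might look something like the fragment below in a CPU node. The property name here is purely hypothetical — no such binding existed at the time; it only illustrates how firmware could declare the answer up front so the kernel can skip the runtime micro-benchmark.]

```dts
cpu@0 {
	device_type = "cpu";
	compatible = "riscv";
	/* Hypothetical binding: firmware asserts the measured behaviour,
	 * so the kernel need not probe this hart at boot. */
	riscv,misaligned-access-performance = "fast";
};
```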
* Re: boot time regressed a lot due to misaligned access probe
  2023-09-13 10:46 ` Ben Dooks
@ 2023-09-13 15:11   ` Jisheng Zhang
  2023-09-13 19:50     ` Evan Green
  0 siblings, 1 reply; 6+ messages in thread
From: Jisheng Zhang @ 2023-09-13 15:11 UTC (permalink / raw)
To: Ben Dooks
Cc: linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou, Evan Green

On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
> On 13/09/2023 01:14, Jisheng Zhang wrote:
> > Hi all,
> >
> > Probing one CPU for misaligned access costs about 0.06s, so it will cost
> > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer,
> > which is powered by the SG2042.
> >
> > I'm not sure of the reason for probing misaligned access on all CPUs. If
> > the HW doesn't behave as SMP on the misaligned-access side, then unless
> > userspace processes force CPU affinity, they will always suffer from this
> > non-SMP pain.
> >
> > So, can we probe only the boot CPU?
>
> So a couple of ideas:
>
> #1 is it worth adding a device-tree property to explicitly say whether
>    the unaligned-access behaviour has already been measured and is known?
>
> #2 only probe one CPU per cluster if there are multiple clusters of
>    CPUs?

and

#3 could userspace that cares about misaligned access probe the speed
itself? This also reminds me of the Arm case (old ARMv5TE vs. ARMv7):
there is no such probe on Arm.

>
> -- 
> Ben Dooks				http://www.codethink.co.uk/
> Senior Engineer				Codethink - Providing Genius
>
> https://www.codethink.co.uk/privacy.html
* Re: boot time regressed a lot due to misaligned access probe
  2023-09-13 15:11 ` Jisheng Zhang
@ 2023-09-13 19:50   ` Evan Green
  2023-09-13 19:53     ` Palmer Dabbelt
  2023-09-15  0:55     ` Jisheng Zhang
  0 siblings, 2 replies; 6+ messages in thread
From: Evan Green @ 2023-09-13 19:50 UTC (permalink / raw)
To: Jisheng Zhang
Cc: Ben Dooks, linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou

On Wed, Sep 13, 2023 at 8:23 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
> > On 13/09/2023 01:14, Jisheng Zhang wrote:
> > > Hi all,
> > >
> > > Probing one CPU for misaligned access costs about 0.06s, so it will cost
> > > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer,
> > > which is powered by the SG2042.
> > >
> > > I'm not sure of the reason for probing misaligned access on all CPUs. If
> > > the HW doesn't behave as SMP on the misaligned-access side, then unless
> > > userspace processes force CPU affinity, they will always suffer from this
> > > non-SMP pain.
> > >
> > > So, can we probe only the boot CPU?

Hi Jisheng,
Thanks for identifying this regression. I'd prefer to keep the probing
on each CPU, as I don't think it's safe to assume the behavior is the
same across all cores. But there's no reason this needs to be done
serially; we should be able to do the checking in parallel on each
CPU. I don't have a physical 64-core system, but I experimented with
qemu a bit:

With misaligned probing
[    0.558930] smp: Bringing up secondary CPUs ...
[    7.635580] smp: Brought up 1 node, 64 CPUs

With no misaligned probing
[    0.473012] smp: Bringing up secondary CPUs ...
[    5.438450] smp: Brought up 1 node, 64 CPUs

With the change below
[    0.615684] smp: Bringing up secondary CPUs ...
[    5.489045] smp: Brought up 1 node, 64 CPUs

I also commented out the pr_info() in my testing, mostly to keep the
UART out of the way. We should strive to improve the SMP core bringup
time in general, but hopefully with this the misaligned probing won't
be making it worse. If this works for you I can clean it up and submit
a patch (sorry, gmail mangles the diff):

diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 1b8da4e40a4d..7dce30b7c868 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -223,8 +223,18 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 	return ret;
 }
 
+static void check_unaligned_access_cpu(void *unused)
+{
+	unsigned int cpu = smp_processor_id();
+
+	/* Someone has to stay behind and tend the jiffies. */
+	if (cpu != 0)
+		check_unaligned_access(cpu);
+}
+
 void __init smp_cpus_done(unsigned int max_cpus)
 {
+	on_each_cpu(check_unaligned_access_cpu, NULL, 0);
 }
 
 /*
@@ -246,7 +256,6 @@ asmlinkage __visible void smp_callin(void)
 
 	numa_add_cpu(curr_cpuid);
 	set_cpu_online(curr_cpuid, 1);
-	check_unaligned_access(curr_cpuid);
 
 	if (has_vector()) {
 		if (riscv_v_setup_vsize())

-Evan
* Re: boot time regressed a lot due to misaligned access probe
  2023-09-13 19:50 ` Evan Green
@ 2023-09-13 19:53   ` Palmer Dabbelt
  0 siblings, 0 replies; 6+ messages in thread
From: Palmer Dabbelt @ 2023-09-13 19:53 UTC (permalink / raw)
To: Evan Green; +Cc: jszhang, ben.dooks, linux-riscv, Paul Walmsley, aou

On Wed, 13 Sep 2023 12:50:54 PDT (-0700), Evan Green wrote:
> On Wed, Sep 13, 2023 at 8:23 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>>
>> On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
>> > On 13/09/2023 01:14, Jisheng Zhang wrote:
>> > > Hi all,
>> > >
>> > > Probing one CPU for misaligned access costs about 0.06s, so it will cost
>> > > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer,
>> > > which is powered by the SG2042.
>> > >
>> > > I'm not sure of the reason for probing misaligned access on all CPUs. If
>> > > the HW doesn't behave as SMP on the misaligned-access side, then unless
>> > > userspace processes force CPU affinity, they will always suffer from this
>> > > non-SMP pain.
>> > >
>> > > So, can we probe only the boot CPU?
>
> Hi Jisheng,
> Thanks for identifying this regression. I'd prefer to keep the probing
> on each CPU, as I don't think it's safe to assume the behavior is the
> same across all cores. But there's no reason this needs to be done
> serially; we should be able to do the checking in parallel on each
> CPU. I don't have a physical 64-core system, but I experimented with
> qemu a bit:
>
> With misaligned probing
> [    0.558930] smp: Bringing up secondary CPUs ...
> [    7.635580] smp: Brought up 1 node, 64 CPUs
>
> With no misaligned probing
> [    0.473012] smp: Bringing up secondary CPUs ...
> [    5.438450] smp: Brought up 1 node, 64 CPUs
>
> With the change below
> [    0.615684] smp: Bringing up secondary CPUs ...
> [    5.489045] smp: Brought up 1 node, 64 CPUs
>
> I also commented out the pr_info() in my testing, mostly to keep the
> UART out of the way. We should strive to improve the SMP core bringup
> time in general, but hopefully with this the misaligned probing won't
> be making it worse. If this works for you I can clean it up and submit
> a patch (sorry, gmail mangles the diff):

Thanks. I think we can call something like this a fix.

> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 1b8da4e40a4d..7dce30b7c868 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -223,8 +223,18 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	return ret;
>  }
> 
> +static void check_unaligned_access_cpu(void *unused)
> +{
> +	unsigned int cpu = smp_processor_id();
> +
> +	/* Someone has to stay behind and tend the jiffies. */
> +	if (cpu != 0)
> +		check_unaligned_access(cpu);
> +}
> +
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
> +	on_each_cpu(check_unaligned_access_cpu, NULL, 0);
>  }
> 
>  /*
> @@ -246,7 +256,6 @@ asmlinkage __visible void smp_callin(void)
> 
>  	numa_add_cpu(curr_cpuid);
>  	set_cpu_online(curr_cpuid, 1);
> -	check_unaligned_access(curr_cpuid);
> 
>  	if (has_vector()) {
>  		if (riscv_v_setup_vsize())
>
> -Evan
* Re: boot time regressed a lot due to misaligned access probe
  2023-09-13 19:50 ` Evan Green
  2023-09-13 19:53 ` Palmer Dabbelt
@ 2023-09-15  0:55 ` Jisheng Zhang
  1 sibling, 0 replies; 6+ messages in thread
From: Jisheng Zhang @ 2023-09-15  0:55 UTC (permalink / raw)
To: Evan Green
Cc: Ben Dooks, linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou

On Wed, Sep 13, 2023 at 12:50:54PM -0700, Evan Green wrote:
> On Wed, Sep 13, 2023 at 8:23 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
> > > On 13/09/2023 01:14, Jisheng Zhang wrote:
> > > > Hi all,
> > > >
> > > > Probing one CPU for misaligned access costs about 0.06s, so it will cost
> > > > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer,
> > > > which is powered by the SG2042.
> > > >
> > > > I'm not sure of the reason for probing misaligned access on all CPUs. If
> > > > the HW doesn't behave as SMP on the misaligned-access side, then unless
> > > > userspace processes force CPU affinity, they will always suffer from this
> > > > non-SMP pain.
> > > >
> > > > So, can we probe only the boot CPU?
>
> Hi Jisheng,
> Thanks for identifying this regression. I'd prefer to keep the probing
> on each CPU, as I don't think it's safe to assume the behavior is the
> same across all cores. But there's no reason this needs to be done
> serially; we should be able to do the checking in parallel on each
> CPU. I don't have a physical 64-core system, but I experimented with
> qemu a bit:
>
> With misaligned probing
> [    0.558930] smp: Bringing up secondary CPUs ...
> [    7.635580] smp: Brought up 1 node, 64 CPUs
>
> With no misaligned probing
> [    0.473012] smp: Bringing up secondary CPUs ...
> [    5.438450] smp: Brought up 1 node, 64 CPUs
>
> With the change below
> [    0.615684] smp: Bringing up secondary CPUs ...
> [    5.489045] smp: Brought up 1 node, 64 CPUs
>
> I also commented out the pr_info() in my testing, mostly to keep the
> UART out of the way. We should strive to improve the SMP core bringup
> time in general, but hopefully with this the misaligned probing won't
> be making it worse. If this works for you I can clean it up and submit
> a patch (sorry, gmail mangles the diff):

The patch improved the boot time a lot! Thanks

Feel free to add:
Tested-by: Jisheng Zhang <jszhang@kernel.org>

> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 1b8da4e40a4d..7dce30b7c868 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -223,8 +223,18 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	return ret;
>  }
> 
> +static void check_unaligned_access_cpu(void *unused)
> +{
> +	unsigned int cpu = smp_processor_id();
> +
> +	/* Someone has to stay behind and tend the jiffies. */
> +	if (cpu != 0)
> +		check_unaligned_access(cpu);
> +}
> +
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
> +	on_each_cpu(check_unaligned_access_cpu, NULL, 0);
>  }
> 
>  /*
> @@ -246,7 +256,6 @@ asmlinkage __visible void smp_callin(void)
> 
>  	numa_add_cpu(curr_cpuid);
>  	set_cpu_online(curr_cpuid, 1);
> -	check_unaligned_access(curr_cpuid);
> 
>  	if (has_vector()) {
>  		if (riscv_v_setup_vsize())
>
> -Evan
End of thread, other threads: [~2023-09-15  1:07 UTC | newest]

Thread overview: 6+ messages
2023-09-13  0:14 boot time regressed a lot due to misaligned access probe Jisheng Zhang
2023-09-13 10:46 ` Ben Dooks
2023-09-13 15:11   ` Jisheng Zhang
2023-09-13 19:50     ` Evan Green
2023-09-13 19:53       ` Palmer Dabbelt
2023-09-15  0:55       ` Jisheng Zhang