* boot time regressed a lot due to misaligned access probe
From: Jisheng Zhang @ 2023-09-13 0:14 UTC (permalink / raw)
To: linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou, Evan Green
Hi all,
Probing one CPU for misaligned access costs about 0.06s, so it will cost
about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer, which
is powered by the SG2042.
I'm not sure of the reason for probing misaligned access on all CPUs. If
the HW doesn't behave as SMP from the misaligned-access side, then unless
userspace processes force CPU affinity, they will always suffer from this
non-SMP pain.
So, can we only probe the boot cpu?
Thanks
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: boot time regressed a lot due to misaligned access probe
From: Ben Dooks @ 2023-09-13 10:46 UTC (permalink / raw)
To: Jisheng Zhang, linux-riscv, Palmer Dabbelt, Paul Walmsley,
Albert Ou, Evan Green
On 13/09/2023 01:14, Jisheng Zhang wrote:
> Hi all,
>
> Probing one CPU for misaligned access costs about 0.06s, so it will cost
> about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer, which
> is powered by the SG2042.
>
> I'm not sure of the reason for probing misaligned access on all CPUs. If
> the HW doesn't behave as SMP from the misaligned-access side, then unless
> userspace processes force CPU affinity, they will always suffer from this
> non-SMP pain.
>
> So, can we only probe the boot cpu?
So a couple of ideas:
#1: is it worth adding a device-tree property to explicitly say whether
the unaligned-access speed has been measured and is known?
#2: only probe one CPU in a cluster if there are multiple clusters of
CPUs?
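For #1, a hypothetical device-tree fragment might look like the sketch below. The property name is invented here for illustration; it is not an existing binding and would need to go through the usual bindings review:

```dts
cpu@0 {
        /* ... existing cpu node properties ... */

        /* Hypothetical property for idea #1: the vendor records the
         * measured misaligned-access speed so the kernel can skip
         * the runtime probe entirely. Not a real binding. */
        riscv,misaligned-access = "fast";
};
```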
--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
https://www.codethink.co.uk/privacy.html
* Re: boot time regressed a lot due to misaligned access probe
From: Jisheng Zhang @ 2023-09-13 15:11 UTC (permalink / raw)
To: Ben Dooks
Cc: linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou, Evan Green
On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
> On 13/09/2023 01:14, Jisheng Zhang wrote:
> > Hi all,
> >
> > Probing one CPU for misaligned access costs about 0.06s, so it will cost
> > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer, which
> > is powered by the SG2042.
> >
> > I'm not sure of the reason for probing misaligned access on all CPUs. If
> > the HW doesn't behave as SMP from the misaligned-access side, then unless
> > userspace processes force CPU affinity, they will always suffer from this
> > non-SMP pain.
> >
> > So, can we only probe the boot cpu?
>
> So a couple of ideas:
>
> #1: is it worth adding a device-tree property to explicitly say whether
> the unaligned-access speed has been measured and is known?
>
> #2: only probe one CPU in a cluster if there are multiple clusters of
> CPUs?
And #3: could userspace that cares about misaligned access probe the
speed itself? This reminds me of the Arm case, old ARMv5TE vs. ARMv7:
there is no such probe on Arm yet.
>
> --
> Ben Dooks http://www.codethink.co.uk/
> Senior Engineer Codethink - Providing Genius
>
> https://www.codethink.co.uk/privacy.html
>
* Re: boot time regressed a lot due to misaligned access probe
From: Evan Green @ 2023-09-13 19:50 UTC (permalink / raw)
To: Jisheng Zhang
Cc: Ben Dooks, linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou
On Wed, Sep 13, 2023 at 8:23 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
> > On 13/09/2023 01:14, Jisheng Zhang wrote:
> > > Hi all,
> > >
> > > Probing one CPU for misaligned access costs about 0.06s, so it will cost
> > > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer, which
> > > is powered by the SG2042.
> > >
> > > I'm not sure of the reason for probing misaligned access on all CPUs. If
> > > the HW doesn't behave as SMP from the misaligned-access side, then unless
> > > userspace processes force CPU affinity, they will always suffer from this
> > > non-SMP pain.
> > >
> > > So, can we only probe the boot cpu?
Hi Jisheng,
Thanks for identifying this regression. I'd prefer to keep the probing
on each CPU, as I don't think it's safe to assume the behavior is the same
across all cores. But there's no reason this needs to be done
serially; we should be able to do the checking in parallel on each
CPU. I don't have a physical 64-core system, but I experimented with
QEMU a bit:
With misaligned probing
[ 0.558930] smp: Bringing up secondary CPUs ...
[ 7.635580] smp: Brought up 1 node, 64 CPUs
With no misaligned probing
[ 0.473012] smp: Bringing up secondary CPUs ...
[ 5.438450] smp: Brought up 1 node, 64 CPUs
With change below:
[ 0.615684] smp: Bringing up secondary CPUs ...
[ 5.489045] smp: Brought up 1 node, 64 CPUs
I also commented out the pr_info() in my testing, mostly to keep the
UART out of the way. We should strive to improve the SMP core bringup
time in general, but hopefully with this the misaligned probing won't
be making it worse. If this works for you, I can clean it up and submit
a patch (sorry, Gmail mangles the diff):
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 1b8da4e40a4d..7dce30b7c868 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -223,8 +223,18 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
return ret;
}
+static void check_unaligned_access_cpu(void *unused)
+{
+ unsigned int cpu = smp_processor_id();
+
+ /* Someone has to stay behind and tend the jiffies. */
+ if (cpu != 0)
+ check_unaligned_access(cpu);
+}
+
void __init smp_cpus_done(unsigned int max_cpus)
{
+ on_each_cpu(check_unaligned_access_cpu, NULL, 0);
}
/*
@@ -246,7 +256,6 @@ asmlinkage __visible void smp_callin(void)
numa_add_cpu(curr_cpuid);
set_cpu_online(curr_cpuid, 1);
- check_unaligned_access(curr_cpuid);
if (has_vector()) {
if (riscv_v_setup_vsize())
-Evan
* Re: boot time regressed a lot due to misaligned access probe
From: Palmer Dabbelt @ 2023-09-13 19:53 UTC (permalink / raw)
To: Evan Green; +Cc: jszhang, ben.dooks, linux-riscv, Paul Walmsley, aou
On Wed, 13 Sep 2023 12:50:54 PDT (-0700), Evan Green wrote:
> On Wed, Sep 13, 2023 at 8:23 AM Jisheng Zhang <jszhang@kernel.org> wrote:
>>
>> On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
>> > On 13/09/2023 01:14, Jisheng Zhang wrote:
>> > > Hi all,
>> > >
>> > > Probing one CPU for misaligned access costs about 0.06s, so it will cost
>> > > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer, which
>> > > is powered by the SG2042.
>> > >
>> > > I'm not sure of the reason for probing misaligned access on all CPUs. If
>> > > the HW doesn't behave as SMP from the misaligned-access side, then unless
>> > > userspace processes force CPU affinity, they will always suffer from this
>> > > non-SMP pain.
>> > >
>> > > So, can we only probe the boot cpu?
>
> Hi Jisheng,
> Thanks for identifying this regression. I'd prefer to keep the probing
> on each CPU, as I don't think it's safe to assume the behavior is the same
> across all cores. But there's no reason this needs to be done
> serially; we should be able to do the checking in parallel on each
> CPU. I don't have a physical 64-core system, but I experimented with
> QEMU a bit:
>
> With misaligned probing
> [ 0.558930] smp: Bringing up secondary CPUs ...
> [ 7.635580] smp: Brought up 1 node, 64 CPUs
>
> With no misaligned probing
> [ 0.473012] smp: Bringing up secondary CPUs ...
> [ 5.438450] smp: Brought up 1 node, 64 CPUs
>
> With change below:
> [ 0.615684] smp: Bringing up secondary CPUs ...
> [ 5.489045] smp: Brought up 1 node, 64 CPUs
>
> I also commented out the pr_info() in my testing, mostly to keep the
> UART out of the way. We should strive to improve the SMP core bringup
> time in general, but hopefully with this the misaligned probing won't
> be making it worse. If this works for you, I can clean it up and submit
> a patch (sorry, Gmail mangles the diff):
Thanks. I think we can call something like this a fix.
>
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 1b8da4e40a4d..7dce30b7c868 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -223,8 +223,18 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
> return ret;
> }
>
> +static void check_unaligned_access_cpu(void *unused)
> +{
> + unsigned int cpu = smp_processor_id();
> +
> + /* Someone has to stay behind and tend the jiffies. */
> + if (cpu != 0)
> + check_unaligned_access(cpu);
> +}
> +
> void __init smp_cpus_done(unsigned int max_cpus)
> {
> + on_each_cpu(check_unaligned_access_cpu, NULL, 0);
> }
>
> /*
> @@ -246,7 +256,6 @@ asmlinkage __visible void smp_callin(void)
>
> numa_add_cpu(curr_cpuid);
> set_cpu_online(curr_cpuid, 1);
> - check_unaligned_access(curr_cpuid);
>
> if (has_vector()) {
> if (riscv_v_setup_vsize())
>
> -Evan
* Re: boot time regressed a lot due to misaligned access probe
From: Jisheng Zhang @ 2023-09-15 0:55 UTC (permalink / raw)
To: Evan Green
Cc: Ben Dooks, linux-riscv, Palmer Dabbelt, Paul Walmsley, Albert Ou
On Wed, Sep 13, 2023 at 12:50:54PM -0700, Evan Green wrote:
> On Wed, Sep 13, 2023 at 8:23 AM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > On Wed, Sep 13, 2023 at 11:46:28AM +0100, Ben Dooks wrote:
> > > On 13/09/2023 01:14, Jisheng Zhang wrote:
> > > > Hi all,
> > > >
> > > > Probing one CPU for misaligned access costs about 0.06s, so it will cost
> > > > about 3.8s on platforms with 64 CPUs, for example the Milk-V Pioneer, which
> > > > is powered by the SG2042.
> > > >
> > > > I'm not sure of the reason for probing misaligned access on all CPUs. If
> > > > the HW doesn't behave as SMP from the misaligned-access side, then unless
> > > > userspace processes force CPU affinity, they will always suffer from this
> > > > non-SMP pain.
> > > >
> > > > So, can we only probe the boot cpu?
>
> Hi Jisheng,
> Thanks for identifying this regression. I'd prefer to keep the probing
> on each CPU, as I don't think it's safe to assume the behavior is the same
> across all cores. But there's no reason this needs to be done
> serially; we should be able to do the checking in parallel on each
> CPU. I don't have a physical 64-core system, but I experimented with
> QEMU a bit:
>
> With misaligned probing
> [ 0.558930] smp: Bringing up secondary CPUs ...
> [ 7.635580] smp: Brought up 1 node, 64 CPUs
>
> With no misaligned probing
> [ 0.473012] smp: Bringing up secondary CPUs ...
> [ 5.438450] smp: Brought up 1 node, 64 CPUs
>
> With change below:
> [ 0.615684] smp: Bringing up secondary CPUs ...
> [ 5.489045] smp: Brought up 1 node, 64 CPUs
>
> I also commented out the pr_info() in my testing, mostly to keep the
> UART out of the way. We should strive to improve the SMP core bringup
> time in general, but hopefully with this the misaligned probing won't
> be making it worse. If this works for you, I can clean it up and submit
> a patch (sorry, Gmail mangles the diff):
The patch improved the boot time a lot! Thanks.
Feel free to add:
Tested-by: Jisheng Zhang <jszhang@kernel.org>
>
> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
> index 1b8da4e40a4d..7dce30b7c868 100644
> --- a/arch/riscv/kernel/smpboot.c
> +++ b/arch/riscv/kernel/smpboot.c
> @@ -223,8 +223,18 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
> return ret;
> }
>
> +static void check_unaligned_access_cpu(void *unused)
> +{
> + unsigned int cpu = smp_processor_id();
> +
> + /* Someone has to stay behind and tend the jiffies. */
> + if (cpu != 0)
> + check_unaligned_access(cpu);
> +}
> +
> void __init smp_cpus_done(unsigned int max_cpus)
> {
> + on_each_cpu(check_unaligned_access_cpu, NULL, 0);
> }
>
> /*
> @@ -246,7 +256,6 @@ asmlinkage __visible void smp_callin(void)
>
> numa_add_cpu(curr_cpuid);
> set_cpu_online(curr_cpuid, 1);
> - check_unaligned_access(curr_cpuid);
>
> if (has_vector()) {
> if (riscv_v_setup_vsize())
>
> -Evan
Thread overview: 6+ messages
2023-09-13 0:14 boot time regressed a lot due to misaligned access probe Jisheng Zhang
2023-09-13 10:46 ` Ben Dooks
2023-09-13 15:11 ` Jisheng Zhang
2023-09-13 19:50 ` Evan Green
2023-09-13 19:53 ` Palmer Dabbelt
2023-09-15 0:55 ` Jisheng Zhang