public inbox for linux-arm-kernel@lists.infradead.org
From: Darren Hart <darren@os.amperecomputing.com>
To: Barry Song <21cnbao@gmail.com>,
	Sudeep Holla <sudeep.holla@arm.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Arm <linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Valentin Schneider <Valentin.Schneider@arm.com>,
	"D . Scott Phillips" <scott@os.amperecomputing.com>,
	Ilkka Koskinen <ilkka@os.amperecomputing.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
Date: Thu, 3 Mar 2022 08:35:35 -0800
Message-ID: <YiDuV8YkaWGNgky7@fedora>
In-Reply-To: <CAGsJ_4y8MkQhAZ9c9yz_UHee7MCZrtv3aui=Luq-ZOBeAsGbGQ@mail.gmail.com>

On Thu, Mar 03, 2022 at 06:36:30PM +1300, Barry Song wrote:
> On Thu, Mar 3, 2022 at 3:22 PM Darren Hart
> <darren@os.amperecomputing.com> wrote:
> >
> > On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> > > >
> > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > Control Unit, but have no shared CPU-side last level cache.
> > > >
> > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > >
> > > > As a result, build_sched_domain() will BUG() once per CPU with:
> > > >
> > > > BUG: arch topology borken
> > > >      the CLS domain not a subset of the MC domain
> > > >
> > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > later removed entirely as redundant. This sched domain topology is an
> > > > improvement over previous topologies, or those built without
> > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > With the current scheduler model and heuristics, this is a desirable
> > > > default topology for Ampere Altra and Altra Max systems.
> > > >
> > > > Introduce an alternate sched domain topology for arm64 without the MC
> > > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > > >
> > > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > > detect the cluster level being above the cpu llc level. This is
> > > > consistent with other architectures and provides a readily extensible
> > > > mechanism for other alternate topologies.
> > > >
> > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > >
> > > > For CPU0:
> > > >
> > > > CONFIG_SCHED_CLUSTER=y
> > > > CLS  [0-1]
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > CONFIG_SCHED_CLUSTER is not set
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > > ---
> > > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > >  1 file changed, 28 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > index 27df5c1e6baa..3597e75645e1 100644
> > > > --- a/arch/arm64/kernel/smp.c
> > > > +++ b/arch/arm64/kernel/smp.c
> > > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > > >         }
> > > >  }
> > > >
> > > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SCHED_CLUSTER
> > > > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > > +#endif
> > > > +
> > > > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > > +       { NULL, },
> > > > +};
> > > > +
> > > > +static void __init update_sched_domain_topology(void)
> > > > +{
> > > > +       int cpu;
> > > > +
> > > > +       for_each_possible_cpu(cpu) {
> > > > +               if (cpu_topology[cpu].llc_id != -1 &&
> > >
> > > Have you tested it with a non-ACPI system? AFAICT, llc_id is only set
> > > on ACPI systems, and llc_id == -1 on others, such as DT-based systems.
> > >
> > > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > > +                       return;
> > > > +       }
> >
> > Hi Vincent,
> >
> > I did not have a non-acpi system to test, no. You're right of course,
> > llc_id is only set by ACPI systems on arm64. We could wrap this in a
> > CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
> >
> > +       for_each_possible_cpu(cpu) {
> > +               if (cpu_topology[cpu].llc_id == -1 ||
> > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > +                       return;
> > +       }
> >
> > Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> > arm64 systems who can test and verify this behaves as intended. I will ask
> > around tomorrow as well to see what I may have access to.
> 
> I wonder if we can fix it with this:
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 976154140f0b..551655ccd0eb 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -627,6 +627,13 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>                 if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
>                         core_mask = &cpu_topology[cpu].llc_sibling;
>         }
> +       /*
> +        * Some machines have no LLC but have clusters, we let MC = CLUSTER
> +        * as MC should always be after CLUSTER. But anyway, the MC domain
> +        * will be removed
> +        */
> +       if (cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> +               core_mask = &cpu_topology[cpu].cluster_sibling;
> 
>         return core_mask;
>  }
> 
> as it can make all kinds of topologies happy, symmetric and asymmetric.
> 

Hah. Full circle. Yes, this works, and it's basically what we'd started
with internally. I ended up exploring various paths here to avoid a
"band aid", to target the fix, and to minimize impact. That said, after
digging through the acpi, topology, smp, and sched domains code, I
don't think this approach is a band aid; it is a very minimal
solution. The only downside I can think of is that it could mask a
topology bug that the scheduler would otherwise catch, but that seems
very unlikely. I'm perfectly happy with this solution as well.
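
To make the effect of the cluster fallback concrete, here is a minimal
user-space sketch of the tail of cpu_coregroup_mask() as quoted above. It
is a toy model, not the kernel code: toy_mask_t, struct toy_topology,
toy_cpu_range(), toy_subset(), and toy_coregroup_mask() are hypothetical
names, masks are plain 64-bit integers, and start_mask is an assumption
standing in for whatever core_mask holds before the LLC check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_mask_t;

/* Hypothetical, simplified stand-in for one cpu_topology[] entry. */
struct toy_topology {
	toy_mask_t start_mask;      /* assumed core_mask before the LLC check */
	toy_mask_t llc_sibling;     /* CPUs sharing a last level cache */
	toy_mask_t cluster_sibling; /* CPUs in the same cluster */
	int llc_id;                 /* -1 when no LLC information was set */
};

/* Mask covering CPUs first..last inclusive (toy helper, at most 64 CPUs). */
static toy_mask_t toy_cpu_range(unsigned int first, unsigned int last)
{
	toy_mask_t mask = 0;

	assert(first <= last && last < 64);
	for (unsigned int cpu = first; cpu <= last; cpu++)
		mask |= (toy_mask_t)1 << cpu;
	return mask;
}

/* True when every CPU in a is also in b, mirroring cpumask_subset(a, b). */
static bool toy_subset(toy_mask_t a, toy_mask_t b)
{
	return (a & ~b) == 0;
}

/* Models the tail of cpu_coregroup_mask(), with or without the fallback. */
static toy_mask_t toy_coregroup_mask(const struct toy_topology *t, bool fallback)
{
	toy_mask_t core_mask = t->start_mask;

	if (t->llc_id != -1 && toy_subset(t->llc_sibling, core_mask))
		core_mask = t->llc_sibling;

	/* Proposed fallback: if MC fits inside the cluster, let MC = CLUSTER. */
	if (fallback && toy_subset(core_mask, t->cluster_sibling))
		core_mask = t->cluster_sibling;

	return core_mask;
}
```

For a toy CPU0 with start_mask covering CPUs 0-7, llc_sibling covering
only CPU 0, and cluster_sibling covering CPUs 0-1, the model without the
fallback yields an MC mask of just CPU 0, so the CLS mask is not a subset
of it. With the fallback the MC mask becomes CPUs 0-1. If llc_sibling
instead covers CPUs 0-3, the fallback leaves the mask untouched.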

Will D, would you prefer this approach?

+Sudeep, Greg, and Rafael,

Are you OK with this approach?

If so, we can drop my arm64-specific new topology patch and I can send
out a version of this one (Suggested-by Barry, of course), unless you'd
prefer to send it yourself, Barry?
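
For reference, the difference between the two early-return checks
discussed for the arm64-specific patch can be modelled in user space. This
is only a sketch under stated assumptions: struct toy_cpu and both
functions are hypothetical, and llc_weight stands in for
cpumask_weight(&cpu_topology[cpu].llc_sibling).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU summary of the two fields the check reads. */
struct toy_cpu {
	int llc_id;              /* -1 when llc_id was never set (e.g. DT) */
	unsigned int llc_weight; /* number of CPUs in llc_sibling */
};

/*
 * Check as originally posted: keep the default topology only when some
 * CPU has a valid llc_id AND shares its LLC with another CPU.
 */
static bool toy_use_no_mc_original(const struct toy_cpu *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].llc_id != -1 && cpus[i].llc_weight > 1)
			return false;
	}
	return true;
}

/*
 * Revised check: also keep the default topology as soon as any CPU has
 * no llc_id at all, so systems without LLC information are left alone.
 */
static bool toy_use_no_mc_revised(const struct toy_cpu *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].llc_id == -1 || cpus[i].llc_weight > 1)
			return false;
	}
	return true;
}
```

When every CPU has llc_id == -1, the original check can never return
early, so the no-MC topology would be selected regardless of the sibling
masks. The revised check returns on the first such CPU. For CPUs with a
valid llc_id and an llc_sibling weight of 1, both variants select the
no-MC topology.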

Thanks,

> >
> > Thanks,
> >
> > > > +
> > > > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > > +       set_sched_topology(arm64_no_mc_topology);
> > > > +}
> > > > +
> > > >  void __init smp_cpus_done(unsigned int max_cpus)
> > > >  {
> > > >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > >         hyp_mode_check();
> > > >         apply_alternatives_all();
> > > >         mark_linear_text_alias_ro();
> > > > +       update_sched_domain_topology();
> > > >  }
> > > >
> > > >  void __init smp_prepare_boot_cpu(void)
> > > > --
> > > > 2.31.1
> > > >
> >
> > --
> > Darren Hart
> > Ampere Computing / OS and Kernel
> 
> Thanks
> Barry

-- 
Darren Hart
Ampere Computing / OS and Kernel
