From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Quentin Perret <quentin.perret@arm.com>
Cc: peterz@infradead.org, rjw@rjwysocki.net,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
gregkh@linuxfoundation.org, mingo@redhat.com,
dietmar.eggemann@arm.com, morten.rasmussen@arm.com,
chris.redpath@arm.com, valentin.schneider@arm.com,
vincent.guittot@linaro.org, thara.gopinath@linaro.org,
viresh.kumar@linaro.org, tkjos@google.com,
joel@joelfernandes.org, smuckle@google.com,
adharmap@codeaurora.org, skannan@codeaurora.org,
pkondeti@codeaurora.org, juri.lelli@redhat.com,
edubezval@gmail.com, srinivas.pandruvada@linux.intel.com,
currojerez@riseup.net, javi.merino@kernel.org
Subject: Re: [PATCH v6 03/14] PM: Introduce an Energy Model management framework
Date: Fri, 31 Aug 2018 10:04:42 +0100 [thread overview]
Message-ID: <20180831090442.GA25636@e110439-lin> (raw)
In-Reply-To: <20180829132811.iacfltcos6kfgp7e@queper01-lin>
On 29-Aug 14:28, Quentin Perret wrote:
> Hi Patrick,
>
> On Wednesday 29 Aug 2018 at 11:04:35 (+0100), Patrick Bellasi wrote:
> > In the loop above we use smp_store_release() to propagate the pointer
> > setting in a PER_CPU(em_data), which ultimate goal is to protect
> > em_register_perf_domain() from multiple clients registering the same
> > power domain.
> >
> > I think there are two possible optimizations there:
> >
> > 1. use of a single memory barrier
> >
> > Since we are already em_pd_mutex protected, i.e. there cannot be a
> > concurrent writers, we can use one single memory barrier after the
> > loop, i.e.
> >
> > for_each_cpu(cpu, span)
> > WRITE_ONCE()
> > smp_wmb()
> >
> > which should be just enough to ensure that all other CPUs will see
> > the pointer set once we release the mutex
>
> Right, I'm actually wondering if the memory barrier is needed at all ...
> The mutex lock()/unlock() should already ensure the ordering I want no ?
>
> WRITE_ONCE() should prevent load/store tearing with concurrent em_cpu_get(),
> and the release/acquire semantics of mutex lock/unlock should be enough to
> serialize the memory accesses of concurrent em_register_perf_domain() calls
> properly ...
>
> Hmm, let me read memory-barriers.txt again.
Yes, I think it should... but better double check.
> > 2. avoid using PER_CPU variables
> >
> > Apart from the initialization code, i.e. boot time, the em_data is
> > expected to be read only, isn't it?
>
> That's right. It's not only read only, it's also not read very often (in
> the use-cases I have in mind at least). The scheduler for example will
> call em_cpu_get() once when sched domains are built, and keep the
> reference instead of calling it again.
>
> > If that's the case, I think that using PER_CPU variables is not
> > strictly required while it unnecessarily increases the cache pressure.
> >
> > In the worst case we can end up with one cache line for each CPU to
> > host just an 8B pointer, instead of using that single cache line to host
> > up to 8 pointers if we use just an array, i.e.
> >
> > struct em_perf_domain *em_data[NR_CPUS]
> > ____cacheline_aligned_in_smp __read_mostly;
> >
> > Consider also that: up to 8 pointers in a single cache line means
> > also that single cache line can be enough to access the EM from all
> > the CPUs of almost every modern mobile phone SoC.
> >
> > Note entirely sure if PER_CPU uses less overall memory in case you
> > have much less CPUs then the compile time defined NR_CPUS.
> > But still, if the above makes sense, you still have a 8x gain
> > factor between number Write allocated .data..percp sections and
> > the value of NR_CPUS. Meaning that in the worst case we allocate
> > the same amount of memory using NR_CPUS=64 (the default on arm64)
> > while running on an 8 CPUs system... but still we should get less
> > cluster caches pressure at run-time with the array approach, 1
> > cache line vs 4.
>
> Right, using per_cpu() might cause to bring in cache things you don't
> really care about (other non-related per_cpu stuff), but that shouldn't
> waste memory I think. I mean, if my em_data var is the first in a cache
> line, the rest of the cache line will most likely be used by other
> per_cpu variables anyways ...
>
> As you suggested, the alternative would be to have a simple array. I'm
> fine with this TBH. But I would probably allocate it dynamically using
> nr_cpu_ids instead of using a static NR_CPUS-wide thing though -- the
> registration of perf domains usually happens late enough in the boot
> process.
>
> What do you think ?
Sound all reasonable to me.
> Thanks
> Quentin
Best Patrick
--
#include <best/regards.h>
Patrick Bellasi
next prev parent reply other threads:[~2018-08-31 9:04 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-20 9:44 [PATCH v6 00/14] Energy Aware Scheduling Quentin Perret
2018-08-20 9:44 ` [PATCH v6 01/14] sched: Relocate arch_scale_cpu_capacity Quentin Perret
2018-08-20 9:44 ` [PATCH v6 02/14] sched/cpufreq: Factor out utilization to frequency mapping Quentin Perret
2018-09-10 9:29 ` Rafael J. Wysocki
2018-08-20 9:44 ` [PATCH v6 03/14] PM: Introduce an Energy Model management framework Quentin Perret
2018-08-29 10:04 ` Patrick Bellasi
2018-08-29 13:28 ` Quentin Perret
2018-08-31 9:04 ` Patrick Bellasi [this message]
2018-09-11 9:34 ` Andrea Parri
2018-09-11 12:32 ` Quentin Perret
2018-09-11 13:31 ` Andrea Parri
2018-09-10 9:44 ` Rafael J. Wysocki
2018-09-10 10:38 ` Quentin Perret
2018-09-10 10:40 ` Rafael J. Wysocki
2018-08-20 9:44 ` [PATCH v6 04/14] PM / EM: Expose the Energy Model in sysfs Quentin Perret
2018-09-06 6:56 ` Dietmar Eggemann
2018-09-06 14:09 ` Quentin Perret
2018-09-07 0:14 ` Dietmar Eggemann
2018-08-20 9:44 ` [PATCH v6 05/14] sched/topology: Reference the Energy Model of CPUs when available Quentin Perret
2018-08-29 16:22 ` Patrick Bellasi
2018-08-29 16:56 ` Quentin Perret
2018-08-30 10:00 ` Patrick Bellasi
2018-08-30 10:47 ` Quentin Perret
2018-08-30 12:50 ` Patrick Bellasi
2018-08-20 9:44 ` [PATCH v6 06/14] sched/topology: Lowest CPU asymmetry sched_domain level pointer Quentin Perret
2018-08-20 9:44 ` [PATCH v6 07/14] sched/topology: Introduce sched_energy_present static key Quentin Perret
2018-08-29 16:50 ` Patrick Bellasi
2018-08-29 17:20 ` Quentin Perret
2018-08-30 9:23 ` Patrick Bellasi
2018-08-30 9:57 ` Quentin Perret
2018-08-30 10:18 ` Patrick Bellasi
2018-09-06 6:06 ` Dietmar Eggemann
2018-09-06 9:29 ` Quentin Perret
2018-09-06 23:49 ` Dietmar Eggemann
2018-09-07 8:24 ` Quentin Perret
2018-08-20 9:44 ` [PATCH v6 08/14] sched/fair: Clean-up update_sg_lb_stats parameters Quentin Perret
2018-08-20 9:44 ` [PATCH v6 09/14] sched: Add over-utilization/tipping point indicator Quentin Perret
2018-08-20 9:44 ` [PATCH v6 10/14] sched/cpufreq: Refactor the utilization aggregation method Quentin Perret
2018-09-10 9:53 ` Rafael J. Wysocki
2018-09-10 10:07 ` Quentin Perret
2018-09-10 10:25 ` Rafael J. Wysocki
2018-08-20 9:44 ` [PATCH v6 11/14] sched/fair: Introduce an energy estimation helper function Quentin Perret
2018-08-20 9:44 ` [PATCH v6 12/14] sched/fair: Select an energy-efficient CPU on task wake-up Quentin Perret
2018-08-20 9:44 ` [PATCH v6 13/14] sched/topology: Make Energy Aware Scheduling depend on schedutil Quentin Perret
2018-09-04 10:59 ` Quentin Perret
2018-09-06 9:18 ` Rafael J. Wysocki
2018-09-06 9:18 ` Rafael J. Wysocki
2018-09-06 14:38 ` Quentin Perret
2018-09-06 14:38 ` Quentin Perret
2018-09-07 8:52 ` Rafael J. Wysocki
2018-09-07 8:52 ` Rafael J. Wysocki
2018-09-07 8:56 ` Rafael J. Wysocki
2018-09-07 8:56 ` Rafael J. Wysocki
2018-09-07 9:02 ` Quentin Perret
2018-09-07 9:02 ` Quentin Perret
2018-09-07 15:29 ` Quentin Perret
2018-09-07 15:29 ` Quentin Perret
2018-09-09 20:13 ` Rafael J. Wysocki
2018-09-09 20:13 ` Rafael J. Wysocki
2018-09-10 8:24 ` Quentin Perret
2018-09-10 8:24 ` Quentin Perret
2018-09-10 8:55 ` Rafael J. Wysocki
2018-09-10 8:55 ` Rafael J. Wysocki
2018-09-10 9:43 ` Quentin Perret
2018-09-10 9:43 ` Quentin Perret
2018-08-20 9:44 ` [PATCH v6 14/14] OPTIONAL: cpufreq: dt: Register an Energy Model Quentin Perret
2018-09-10 9:12 ` [PATCH v6 00/14] Energy Aware Scheduling Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180831090442.GA25636@e110439-lin \
--to=patrick.bellasi@arm.com \
--cc=adharmap@codeaurora.org \
--cc=chris.redpath@arm.com \
--cc=currojerez@riseup.net \
--cc=dietmar.eggemann@arm.com \
--cc=edubezval@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=javi.merino@kernel.org \
--cc=joel@joelfernandes.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=morten.rasmussen@arm.com \
--cc=peterz@infradead.org \
--cc=pkondeti@codeaurora.org \
--cc=quentin.perret@arm.com \
--cc=rjw@rjwysocki.net \
--cc=skannan@codeaurora.org \
--cc=smuckle@google.com \
--cc=srinivas.pandruvada@linux.intel.com \
--cc=thara.gopinath@linaro.org \
--cc=tkjos@google.com \
--cc=valentin.schneider@arm.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.