* Re: [PATCH v3 0/2] powernv/cpuidle Device-tree parsing cleanup
From: Rafael J. Wysocki @ 2018-07-04 10:54 UTC
To: Akshay Adiga
Cc: linux-kernel, linuxppc-dev, linux-pm, stewart, benh, svaidy, ego,
npiggin, mpe
In-Reply-To: <1530609656-13301-1-git-send-email-akshay.adiga@linux.vnet.ibm.com>
On Tuesday, July 3, 2018 11:20:54 AM CEST Akshay Adiga wrote:
>
> The device tree is parsed multiple times in the powernv cpuidle and powernv
> hotplug code:
>
> first to identify the supported flags, a second time to identify
> deepest_state and the first deep state, and a third time, during cpuidle
> init, to find the available idle states. Any change in the device-tree
> format therefore requires changes in these 3 places. Errors in the
> device tree could also be handled better.
>
> This series adds code to parse the device tree once and save the result in
> a global structure.
>
> Changes from v2 :
> - Fix build error (moved a hunk from patch 1 to patch 2)
> Changes from v1 :
> - fold first 2 patches into 1
> - rename pm_ctrl_reg_* as psscr_*
> - added comment stating removal of pmicr parsing code
> - removed parsing code for pmicr
> - add member valid in pnv_idle_states_t to indicate if the psscr-mask/val
> are valid combination,
> - Change function description of pnv_parse_cpuidle_dt
> - Added error handling code.
>
>
> Akshay Adiga (2):
> powernv/cpuidle: Parse dt idle properties into global structure
> powernv/cpuidle: Use parsed device tree values for cpuidle_init
>
> arch/powerpc/include/asm/cpuidle.h | 13 ++
> arch/powerpc/platforms/powernv/idle.c | 216 ++++++++++++++++----------
> drivers/cpuidle/cpuidle-powernv.c | 154 ++++--------------
> 3 files changed, 177 insertions(+), 206 deletions(-)
>
>
I am assuming that this series will go in via the powerpc tree.
Thanks,
Rafael
* Re: [PATCHv3 0/4] drivers/base: bugfix for supplier<-consumer ordering in device_kset
From: Rafael J. Wysocki @ 2018-07-04 10:21 UTC
To: Pingfan Liu
Cc: linux-kernel, Greg Kroah-Hartman, Grygorii Strashko,
Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
linuxppc-dev, Linux PM
In-Reply-To: <CAFgQCTusBM0-E=wAJgRJA7g-RnDi9ts==-M7KezNBwSWnyO=HA@mail.gmail.com>
On Wednesday, July 4, 2018 4:47:07 AM CEST Pingfan Liu wrote:
> On Tue, Jul 3, 2018 at 10:36 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, July 3, 2018 8:50:38 AM CEST Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > places an assumption of supplier<-consumer order on the process of probe.
> > > But it turns out to break the parent <- child order in some scenarios.
> > > E.g. in PCI, a bridge is enabled by the PCI core, and the devices behind
> > > it have been probed. Then comes the bridge's module, which enables extra
> > > features (such as hotplug) on this bridge.
> >
> > So what *exactly* does happen in that case?
> >
> I saw that shpc_probe() is called on the bridge, although the probing
> failed on that bare-metal machine. But if it succeeds, then it will enable
> the hotplug feature on the bridge.
I don't understand what you are saying here, sorry.
device_reorder_to_tail() walks the entire device hierarchy below the target
and moves all of the children in there *after* their parents.
How can it break "the parent <- child order" then?
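The ordering argument above can be sketched in plain userspace C. This is only a toy model under stated assumptions: `toy_device`, `toy_list` and the `toy_*` helpers are hypothetical names, an array stands in for devices_kset/dpm_list, and device links (consumers) are left out. It is not the code in drivers/base/core.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DEVS 8

/* Hypothetical, simplified device: only the parent link matters here. */
struct toy_device {
	int parent;	/* index of the parent device, or -1 for a root */
};

/* Stand-in for devices_kset / dpm_list: order[0] is the head of the list. */
struct toy_list {
	int order[TOY_MAX_DEVS];
	size_t len;
};

/* Return the position of dev in the list, or -1 if it is not on it. */
static int toy_position(const struct toy_list *l, int dev)
{
	size_t i;

	for (i = 0; i < l->len; i++)
		if (l->order[i] == dev)
			return (int)i;
	return -1;
}

/* Move one device to the tail, keeping the relative order of the rest. */
static void toy_move_last(struct toy_list *l, int dev)
{
	int pos = toy_position(l, dev);

	assert(l->len <= TOY_MAX_DEVS);
	if (pos < 0)
		return;	/* not on the list, nothing to move */

	memmove(&l->order[pos], &l->order[pos + 1],
		(l->len - (size_t)pos - 1) * sizeof(l->order[0]));
	l->order[l->len - 1] = dev;
}

/* Move dev to the tail, then recurse so every descendant lands after it. */
static void toy_reorder_to_tail(struct toy_list *l,
				const struct toy_device *devs, size_t ndevs,
				int dev)
{
	size_t child;

	toy_move_last(l, dev);
	for (child = 0; child < ndevs; child++)
		if (devs[child].parent == dev)
			toy_reorder_to_tail(l, devs, ndevs, (int)child);
}
```

Because each parent is moved before the recursion reaches its children, every child ends up later in the list than its parent, whatever the initial order was.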
Thanks,
Rafael
* Re: [PATCHv3 3/4] drivers/base: clean up the usage of devices_kset_move_last()
From: Rafael J. Wysocki @ 2018-07-04 10:17 UTC
To: Pingfan Liu
Cc: Grygorii Strashko, linux-kernel, Greg Kroah-Hartman,
Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
linuxppc-dev, Linux PM
In-Reply-To: <CAFgQCTtu7jaUfWV4xq8CfWb7vuS5=+HuNpNxeMm9UawoUZ2CBg@mail.gmail.com>
On Wednesday, July 4, 2018 6:40:09 AM CEST Pingfan Liu wrote:
> On Tue, Jul 3, 2018 at 10:28 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > On Tuesday, July 3, 2018 8:50:41 AM CEST Pingfan Liu wrote:
> > > Clean up the references to the code in commit 52cdbdd49853 ("driver core:
> > > correct device's shutdown order"). So later we can revert it safely.
> > >
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > Cc: Dave Young <dyoung@redhat.com>
> > > Cc: linux-pci@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > ---
> > > drivers/base/core.c | 7 -------
> > > 1 file changed, 7 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 684b994..db3deb8 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -127,13 +127,6 @@ static int device_reorder_to_tail(struct device *dev, void *not_used)
> > > {
> > > struct device_link *link;
> > >
> > > - /*
> > > - * Devices that have not been registered yet will be put to the ends
> > > - * of the lists during the registration, so skip them here.
> > > - */
> > > - if (device_is_registered(dev))
> > > - devices_kset_move_last(dev);
> > > -
> > > if (device_pm_initialized(dev))
> > > device_pm_move_last(dev);
> >
> > You can't do this.
> >
> > If you do it, that will break power management in some situations.
> >
> Could you shed light on it? I had a quick browse of the pm code, but it
> is a big function, and I got lost in it.
> If the above code causes failure, then does it imply that the seq in
> devices_kset should be the same as dpm_list?
Generally, yes it should.
> But in device_shutdown(), it only intersects with pm by
> pm_runtime_get_noresume(dev) and pm_runtime_barrier(dev). How do these
> functions affect the seq in dpm_list?
They are not related to dpm_list directly.
However, if you shut down a supplier device before its consumer and that
involves power management, then the consumer shutdown may fail and lock up
the system.
I asked you elsewhere to clearly describe the problem you are trying to
address. Please do that in the first place.
Thanks,
Rafael
* Re: [v2 PATCH 1/2] powerpc: Detect the presence of big-cores via "ibm, thread-groups"
From: Gautham R Shenoy @ 2018-07-04 8:09 UTC
To: Murilo Opsfelder Araujo
Cc: Gautham R. Shenoy, Michael Ellerman, Benjamin Herrenschmidt,
Michael Neuling, Vaidyanathan Srinivasan, Akshay Adiga,
Shilpasri G Bhat, Oliver O'Halloran, Nicholas Piggin,
linuxppc-dev, linux-kernel
In-Reply-To: <20180703171655.GA6474@kermit-br-ibm-com.br.ibm.com>
Hello Murilo,
Thanks for reviewing the patch. Replies inline.
On Tue, Jul 03, 2018 at 02:16:55PM -0300, Murilo Opsfelder Araujo wrote:
> On Tue, Jul 03, 2018 at 04:33:50PM +0530, Gautham R. Shenoy wrote:
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> >
> > On IBM POWER9, the device tree exposes a property array identified by
> > "ibm,thread-groups" which will indicate which groups of threads share a
> > particular set of resources.
> >
> > As of today we only have one form of grouping identifying the group of
> > threads in the core that share the L1 cache, translation cache and
> > instruction data flow.
> >
> > This patch defines the helper function to parse the contents of
> > "ibm,thread-groups" and a new structure to contain the parsed output.
> >
> > The patch also creates the sysfs file named "small_core_siblings" that
> > returns the physical ids of the threads in the core that share the L1
> > cache, translation cache and instruction data flow.
> >
> > Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > ---
> > Documentation/ABI/testing/sysfs-devices-system-cpu | 8 ++
> > arch/powerpc/include/asm/cputhreads.h | 22 +++++
> > arch/powerpc/kernel/setup-common.c | 110 +++++++++++++++++++++
> > arch/powerpc/kernel/sysfs.c | 35 +++++++
> > 4 files changed, 175 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > index 9c5e7732..53a823a 100644
> > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > @@ -487,3 +487,11 @@ Description: Information about CPU vulnerabilities
> > "Not affected" CPU is not affected by the vulnerability
> > "Vulnerable" CPU is affected and no mitigation in effect
> > "Mitigation: $M" CPU is affected and mitigation $M is in effect
> > +
> > +What: /sys/devices/system/cpu/cpu[0-9]+/small_core_sibings
>
> s/small_core_sibings/small_core_siblings
Nice catch! Will fix this.
>
> By the way, big_core_siblings was mentioned in the introductory
> email.
It should be small_core_siblings in the introductory e-mail. My bad.
>
> > +Date: 03-Jul-2018
> > +KernelVersion: v4.18.0
> > +Contact: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > +Description: List of Physical ids of CPUs which share the L1 cache,
> > + translation cache and instruction data-flow with this CPU.
> > +Values: Comma separated list of decimal integers.
[..snip..]
> > +/*
> > + * parse_thread_groups: Parses the "ibm,thread-groups" device tree
> > + * property for the CPU device node dn and stores
> > + * the parsed output in the thread_groups
> > + * structure tg.
>
> Perhaps document the arguments of this function, as done in the second
> patch?
Will do this. Thanks.
>
> > + *
> > + * ibm,thread-groups[0..N-1] array defines which group of threads in
> > + * the CPU-device node can be grouped together based on the property.
> > + *
> > + * ibm,thread-groups[0] tells us the property based on which the
> > + * threads are being grouped together. If this value is 1, it implies
> > + * that the threads in the same group share L1, translation cache.
> > + *
> > + * ibm,thread-groups[1] tells us how many such thread groups exist.
> > + *
> > + * ibm,thread-groups[2] tells us the number of threads in each such
> > + * group.
> > + *
> > + * ibm,thread-groups[3..N-1] is the list of threads identified by
> > + * "ibm,ppc-interrupt-server#s" arranged as per their membership in
> > + * the grouping.
> > + *
> > + * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
> > + * implies that there are 2 groups of 4 threads each, where each group
> > + * of threads share L1, translation cache.
> > + *
> > + * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
> > + * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
> > + * 11, 12} structure
> > + *
> > + * Returns 0 on success, -EINVAL if the property does not exist,
> > + * -ENODATA if property does not have a value, and -EOVERFLOW if the
> > + * property data isn't large enough.
> > + */
> > +int parse_thread_groups(struct device_node *dn,
> > + struct thread_groups *tg)
> > +{
> > + unsigned int nr_groups, threads_per_group, property;
> > + int i;
> > + u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
> > + u32 *thread_list;
> > + size_t total_threads;
> > + int ret;
> > +
> > + ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> > + thread_group_array, 3);
> > +
> > + if (ret)
> > + return ret;
> > +
> > + property = thread_group_array[0];
> > + nr_groups = thread_group_array[1];
> > + threads_per_group = thread_group_array[2];
> > + total_threads = nr_groups * threads_per_group;
> > +
> > + ret = of_property_read_u32_array(dn, "ibm,thread-groups",
> > + thread_group_array,
> > + 3 + total_threads);
> > + if (ret)
> > + return ret;
> > +
> > + thread_list = &thread_group_array[3];
> > +
> > + for (i = 0 ; i < total_threads; i++)
> > + tg->thread_list[i] = thread_list[i];
> > +
> > + tg->property = property;
> > + tg->nr_groups = nr_groups;
> > + tg->threads_per_group = threads_per_group;
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * get_cpu_thread_group_start : Searches the thread group in tg->thread_list
> > + * that @cpu belongs to.
>
> Same here.
Sure.
>
> > + *
> > + * Returns the index to tg->thread_list that points to the start
> > + * of the thread_group that @cpu belongs to.
> > + *
> > + * Returns -1 if cpu doesn't belong to any of the groups pointed
> > + * to by tg->thread_list.
> > + */
> > +int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
> > +{
> > + int hw_cpu_id = get_hard_smp_processor_id(cpu);
> > + int i, j;
> > +
> > + for (i = 0; i < tg->nr_groups; i++) {
> > + int group_start = i * tg->threads_per_group;
> > +
> > + for (j = 0; j < tg->threads_per_group; j++) {
> > + int idx = group_start + j;
> > +
> > + if (tg->thread_list[idx] == hw_cpu_id)
> > + return group_start;
> > + }
> > + }
> > +
> > + return -1;
> > +}
> > +
> > /**
> > * setup_cpu_maps - initialize the following cpu maps:
> > * cpu_possible_mask
> > @@ -467,6 +571,7 @@ void __init smp_setup_cpu_maps(void)
> > const __be32 *intserv;
> > __be32 cpu_be;
> > int j, len;
> > + struct thread_groups tg = {.nr_groups = 0};
>
> We assume has_big_cores = true but here we initialize .nr_groups
> otherwise. It's kind of contradictory.
.nr_groups is being initialized to some sane value here. Perhaps I
should move the initializations of tg.nr_groups and tg.property inside
parse_thread_groups.
>
> What if has_big_cores is assumed false and members of tg are initialized
> with zeroes?
The idea here is that after parsing all the CPU nodes, the variable
"has_big_cores" remains true only if all the CPU nodes are
big cores. Even if one of them isn't a big core (not sure if this is
possible in practice) then we want to set it to false.
Hence we start with the assumption that has_big_cores is true, and
switch it to false on finding even one core that is not a big-core.
But I got to know that this is overkill, since if a component
small core is bad, the entire big-core is disabled. Thus it might be
sufficient to just check one CPU node to see whether it is a big core
or not, and set the variable from "false" to "true".
>
> >
> > DBG(" * %pOF...\n", dn);
> >
> > @@ -505,6 +610,11 @@ void __init smp_setup_cpu_maps(void)
> > cpu++;
> > }
> >
> > + if (parse_thread_groups(dn, &tg) ||
> > + tg.nr_groups < 1 || tg.property != 1) {
> > + has_big_cores = false;
> > + }
> > +
>
> parse_thread_groups() returns before setting tg.property if property
> doesn't exist. Are we confident that tg.property won't contain any
> garbage that could lead to a false positive here? Shouldn't we also
> initialize .property when declaring tg?
Yes we should. Will move the initializations to parse_thread_groups.
>
> What if this logic is encapsulated in a function? For example:
>
> has_big_cores = dt_has_big_cores(dn, &tg);
Good idea.
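A rough userspace sketch of how such a helper could look, assuming the property has already been read into a plain u32 array. All `toy_*` names, the array-based signature and the value of TOY_MAX_THREAD_LIST_SIZE are assumptions made for illustration. The real code would read the property with of_property_read_u32_array(). The bounds check against the list size is also an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_THREAD_LIST_SIZE 8	/* assumed bound for this sketch */

struct toy_thread_groups {
	unsigned int property;
	unsigned int nr_groups;
	unsigned int threads_per_group;
	uint32_t thread_list[TOY_MAX_THREAD_LIST_SIZE];
};

/* Parse [property, nr_groups, threads_per_group, thread ids...]. */
static int toy_parse_thread_groups(const uint32_t *prop, size_t len,
				   struct toy_thread_groups *tg)
{
	uint64_t total;
	size_t i;

	assert(tg != NULL);

	/* Initialise first so an early return never leaves garbage behind. */
	tg->property = 0;
	tg->nr_groups = 0;
	tg->threads_per_group = 0;

	if (!prop || len < 3)
		return -1;

	total = (uint64_t)prop[1] * prop[2];
	if (total > TOY_MAX_THREAD_LIST_SIZE || len < 3 + total)
		return -1;

	for (i = 0; i < total; i++)
		tg->thread_list[i] = prop[3 + i];

	tg->property = prop[0];
	tg->nr_groups = prop[1];
	tg->threads_per_group = prop[2];
	return 0;
}

/* True if the node describes at least one group sharing L1 (property 1). */
static bool toy_dt_has_big_cores(const uint32_t *prop, size_t len,
				 struct toy_thread_groups *tg)
{
	if (toy_parse_thread_groups(prop, len, tg))
		return false;

	return tg->nr_groups >= 1 && tg->property == 1;
}

/* Index in thread_list where the group containing hw_id starts, or -1. */
static int toy_thread_group_start(const struct toy_thread_groups *tg,
				  uint32_t hw_id)
{
	unsigned int i, j;

	for (i = 0; i < tg->nr_groups; i++) {
		unsigned int start = i * tg->threads_per_group;

		for (j = 0; j < tg->threads_per_group; j++)
			if (tg->thread_list[start + j] == hw_id)
				return (int)start;
	}
	return -1;
}
```

With the example from the patch comment, [1,2,4,5,6,7,8,9,10,11,12], the sketch reports two groups of four threads, and thread 10 belongs to the group starting at index 4 of the thread list.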
>
> > if (cpu >= nr_cpu_ids) {
> > of_node_put(dn);
> > break;
> > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
[..snip..]
Will address these changes in the subsequent patch series.
>
> Cheers
> Murilo
--
Thanks and Regards
gautham.
* [PATCH v2 2/2] hwmon: ibmpowernv: Add attributes to enable/disable sensor groups
From: Shilpasri G Bhat @ 2018-07-04 9:16 UTC
To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1530695793-4584-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>
On-Chip-Controller (OCC) is an embedded microprocessor in the POWER9 chip
which measures various system and chip level sensors. These sensors
comprise environmental sensors (like power, temperature, current
and voltage) and performance sensors (like utilization, frequency).
All these sensors are copied to main memory at a regular interval of
100ms. OCC provides a way to select a group of sensors that is copied
to the main memory to increase the update frequency of selected sensor
groups. When a sensor-group is disabled, OCC will not copy it to main
memory and those sensors read 0 values.
This patch provides support for enabling/disabling the sensor groups
like power, temperature, current and voltage. This patch adds a new
per-sensor sysfs attribute to disable and enable them.
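The relationship between the per-sensor attribute and the group state, as store_enable() in this patch handles it, can be modelled in userspace C roughly as follows. This is a simplified sketch: `toy_group` and the `toy_*` helpers are hypothetical names, and a counter stands in for the sensor_group_enable() firmware call. It assumes the group command is sent only when the first sensor of a disabled group is enabled or the last enabled sensor of a group is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SENSORS 4

struct toy_group {
	bool enable;				/* state last sent to firmware */
	bool sensor_enable[TOY_MAX_SENSORS];	/* per-sensor 'enable' values */
	size_t nr_sensor;
	int commands_sent;			/* counts simulated firmware calls */
};

/* True if at least one sensor in the group is still enabled. */
static bool toy_any_sensor_enabled(const struct toy_group *g)
{
	size_t i;

	for (i = 0; i < g->nr_sensor; i++)
		if (g->sensor_enable[i])
			return true;
	return false;
}

/* Model of a write to <sensor>_enable; returns 0 or -1 on a bad index. */
static int toy_store_enable(struct toy_group *g, size_t idx, bool on)
{
	bool want;

	assert(g != NULL);
	if (g->nr_sensor > TOY_MAX_SENSORS || idx >= g->nr_sensor)
		return -1;

	g->sensor_enable[idx] = on;

	/* The group must be on while any of its sensors is enabled. */
	want = toy_any_sensor_enabled(g);
	if (want != g->enable) {
		g->commands_sent++;	/* stands in for sensor_group_enable() */
		g->enable = want;
	}
	return 0;
}
```

In this model, disabling two of three sensors sends nothing to the firmware. Only disabling the third one, or re-enabling any sensor afterwards, changes the group state.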
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from v1:
- Add per-sensor 'enable' attribute
- Return -ENODATA when sensor is disabled
Documentation/hwmon/sysfs-interface | 22 +++
drivers/hwmon/ibmpowernv.c | 281 +++++++++++++++++++++++++++++++-----
2 files changed, 264 insertions(+), 39 deletions(-)
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index fc337c3..38ab05c 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -184,6 +184,11 @@ vrm Voltage Regulator Module version number.
Affects the way the driver calculates the CPU core reference
voltage from the vid pins.
+in[0-*]_enable Enable or disable the sensor
+ 1 : Enable
+ 0 : Disable
+ RW
+
Also see the Alarms section for status flags associated with voltages.
@@ -409,6 +414,12 @@ temp_reset_history
Reset temp_lowest and temp_highest for all sensors
WO
+temp[1-*]_enable
+ Enable or disable the sensor
+ 1 : Enable
+ 0 : Disable
+ RW
+
Some chips measure temperature using external thermistors and an ADC, and
report the temperature measurement as a voltage. Converting this voltage
back to a temperature (or the other way around for limits) requires
@@ -468,6 +479,12 @@ curr_reset_history
Reset currX_lowest and currX_highest for all sensors
WO
+curr[1-*]_enable
+ Enable or disable the sensor
+ 1 : Enable
+ 0 : Disable
+ RW
+
Also see the Alarms section for status flags associated with currents.
*********
@@ -566,6 +583,11 @@ power[1-*]_crit Critical maximum power.
Unit: microWatt
RW
+power[1-*]_enable Enable or disable the sensor
+ 1 : Enable
+ 0 : Disable
+ RW
+
Also see the Alarms section for status flags associated with power readings.
**********
diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
index f829dad..61e04cf 100644
--- a/drivers/hwmon/ibmpowernv.c
+++ b/drivers/hwmon/ibmpowernv.c
@@ -90,8 +90,28 @@ struct sensor_data {
char label[MAX_LABEL_LEN];
char name[MAX_ATTR_LEN];
struct device_attribute dev_attr;
+ struct sensor_group_data *sgdata;
+ struct sensor_data *sdata[3];
+ bool enable;
};
+static struct sensor_group_data {
+ u32 gid;
+ u32 nr_phandle;
+ u32 nr_sensor;
+ enum sensors type;
+ const __be32 *phandles;
+ struct sensor_data **sensors;
+ bool enable;
+} *sg_data;
+
+/*
+ * To synchronise writes to struct sensor_data.enable and
+ * struct sensor_group_data.enable
+ */
+DEFINE_MUTEX(sensor_groups_mutex);
+static int nr_sensor_groups;
+
struct platform_data {
const struct attribute_group *attr_groups[MAX_SENSOR_TYPE + 1];
u32 sensors_count; /* Total count of sensors from each group */
@@ -105,6 +125,9 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
ssize_t ret;
u64 x;
+ if (sdata->sgdata && !sdata->enable)
+ return -ENODATA;
+
ret = opal_get_sensor_data_u64(sdata->id, &x);
if (ret)
@@ -120,6 +143,74 @@ static ssize_t show_sensor(struct device *dev, struct device_attribute *devattr,
return sprintf(buf, "%llu\n", x);
}
+static ssize_t show_enable(struct device *dev,
+ struct device_attribute *devattr, char *buf)
+{
+ struct sensor_data *sdata = container_of(devattr, struct sensor_data,
+ dev_attr);
+
+ return sprintf(buf, "%u\n", sdata->enable);
+}
+
+static ssize_t store_enable(struct device *dev,
+ struct device_attribute *devattr,
+ const char *buf, size_t count)
+{
+ struct sensor_data *sdata = container_of(devattr, struct sensor_data,
+ dev_attr);
+ struct sensor_group_data *sg = sdata->sgdata;
+ u32 data;
+ int ret, i;
+ bool send_command;
+
+ ret = kstrtouint(buf, 0, &data);
+ if (ret)
+ return ret;
+
+ if (data != 0 && data != 1)
+ return -EIO;
+
+ ret = mutex_lock_interruptible(&sensor_groups_mutex);
+ if (ret)
+ return ret;
+
+ sdata->enable = data;
+ if ((data && sg->enable) || (!data && !sg->enable)) {
+ send_command = false;
+ } else if (data && !sg->enable) {
+ /* Enable if first sensor in the group */
+ send_command = true;
+ } else if (!data && sg->enable) {
+ /* Disable if last sensor in the group */
+ send_command = true;
+ for (i = 0; i < sg->nr_sensor; i++) {
+ struct sensor_data *sd = sg->sensors[i];
+
+ if (sd->enable) {
+ send_command = false;
+ break;
+ }
+ }
+ }
+
+ if (send_command) {
+ ret = sensor_group_enable(sg->gid, data);
+ if (!ret)
+ sg->enable = data;
+ }
+
+ if (!ret) {
+ for (i = 0; i < ARRAY_SIZE(sdata->sdata); i++)
+ sdata->sdata[i]->enable = data;
+ ret = count;
+ } else {
+ ret = -EIO;
+ }
+
+ mutex_unlock(&sensor_groups_mutex);
+ return ret;
+}
+
static ssize_t show_label(struct device *dev, struct device_attribute *devattr,
char *buf)
{
@@ -292,6 +383,90 @@ static u32 get_sensor_hwmon_index(struct sensor_data *sdata,
return ++sensor_groups[sdata->type].hwmon_index;
}
+static int init_sensor_group_data(struct platform_device *pdev)
+{
+ struct device_node *sensor_groups, *sg;
+ enum sensors type;
+ int count = 0, ret = 0;
+
+ sensor_groups = of_find_node_by_path("/ibm,opal/sensor-groups");
+ if (!sensor_groups)
+ return ret;
+
+ for_each_child_of_node(sensor_groups, sg) {
+ type = get_sensor_type(sg);
+ if (type != MAX_SENSOR_TYPE)
+ nr_sensor_groups++;
+ }
+
+ if (!nr_sensor_groups)
+ goto out;
+
+ sg_data = devm_kzalloc(&pdev->dev, nr_sensor_groups * sizeof(*sg_data),
+ GFP_KERNEL);
+ if (!sg_data) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ for_each_child_of_node(sensor_groups, sg) {
+ const __be32 *phandles;
+ int len, gid;
+
+ type = get_sensor_type(sg);
+ if (type == MAX_SENSOR_TYPE)
+ continue;
+
+ if (of_property_read_u32(sg, "sensor-group-id", &gid))
+ continue;
+
+ phandles = of_get_property(sg, "sensors", &len);
+ if (!phandles)
+ continue;
+
+ len /= sizeof(u32);
+ if (!len)
+ continue;
+
+ sg_data[count].sensors = devm_kzalloc(&pdev->dev,
+ len * sizeof(struct sensor_data *),
+ GFP_KERNEL);
+ if (!sg_data[count].sensors) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ sg_data[count].nr_sensor = 0;
+ sg_data[count].gid = gid;
+ sg_data[count].type = type;
+ sg_data[count].nr_phandle = len;
+ sg_data[count].phandles = phandles;
+ sg_data[count++].enable = true;
+ }
+out:
+ of_node_put(sensor_groups);
+ return ret;
+}
+
+static struct sensor_group_data *get_sensor_group(struct device_node *np,
+ enum sensors type)
+{
+ int i, j;
+
+ for (i = 0; i < nr_sensor_groups; i++) {
+ const __be32 *phandles = sg_data[i].phandles;
+
+ if (type != sg_data[i].type)
+ continue;
+
+ for (j = 0; j < sg_data[i].nr_phandle; j++)
+ if (be32_to_cpu(phandles[j]) == np->phandle)
+ return &sg_data[i];
+ }
+
+ return NULL;
+}
+
static int populate_attr_groups(struct platform_device *pdev)
{
struct platform_data *pdata = platform_get_drvdata(pdev);
@@ -299,6 +474,9 @@ static int populate_attr_groups(struct platform_device *pdev)
struct device_node *opal, *np;
enum sensors type;
+ if (init_sensor_group_data(pdev))
+ return -ENOMEM;
+
opal = of_find_node_by_path("/ibm,opal/sensors");
for_each_child_of_node(opal, np) {
const char *label;
@@ -313,7 +491,7 @@ static int populate_attr_groups(struct platform_device *pdev)
sensor_groups[type].attr_count++;
/*
- * add attributes for labels, min and max
+ * add attributes for labels, min, max and enable
*/
if (!of_property_read_string(np, "label", &label))
sensor_groups[type].attr_count++;
@@ -321,6 +499,8 @@ static int populate_attr_groups(struct platform_device *pdev)
sensor_groups[type].attr_count++;
if (of_find_property(np, "sensor-data-max", NULL))
sensor_groups[type].attr_count++;
+ if (get_sensor_group(np, type))
+ sensor_groups[type].attr_count++;
}
of_node_put(opal);
@@ -344,7 +524,10 @@ static int populate_attr_groups(struct platform_device *pdev)
static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name,
ssize_t (*show)(struct device *dev,
struct device_attribute *attr,
- char *buf))
+ char *buf),
+ ssize_t (*store)(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count))
{
snprintf(sdata->name, MAX_ATTR_LEN, "%s%d_%s",
sensor_groups[sdata->type].name, sdata->hwmon_index,
@@ -352,23 +535,34 @@ static void create_hwmon_attr(struct sensor_data *sdata, const char *attr_name,
sysfs_attr_init(&sdata->dev_attr.attr);
sdata->dev_attr.attr.name = sdata->name;
- sdata->dev_attr.attr.mode = S_IRUGO;
sdata->dev_attr.show = show;
+ if (store) {
+ sdata->dev_attr.store = store;
+ sdata->dev_attr.attr.mode = 0664;
+ } else {
+ sdata->dev_attr.attr.mode = 0444;
+ }
}
static void populate_sensor(struct sensor_data *sdata, int od, int hd, int sid,
const char *attr_name, enum sensors type,
const struct attribute_group *pgroup,
+ struct sensor_group_data *sgdata,
ssize_t (*show)(struct device *dev,
struct device_attribute *attr,
- char *buf))
+ char *buf),
+ ssize_t (*store)(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count))
{
sdata->id = sid;
sdata->type = type;
sdata->opal_index = od;
sdata->hwmon_index = hd;
- create_hwmon_attr(sdata, attr_name, show);
+ create_hwmon_attr(sdata, attr_name, show, store);
pgroup->attrs[sensor_groups[type].attr_count++] = &sdata->dev_attr.attr;
+ sdata->enable = true;
+ sdata->sgdata = sgdata;
}
static char *get_max_attr(enum sensors type)
@@ -408,18 +602,17 @@ static int create_device_attrs(struct platform_device *pdev)
u32 count = 0;
int err = 0;
- opal = of_find_node_by_path("/ibm,opal/sensors");
sdata = devm_kcalloc(&pdev->dev,
pdata->sensors_count, sizeof(*sdata),
GFP_KERNEL);
- if (!sdata) {
- err = -ENOMEM;
- goto exit_put_node;
- }
+ if (!sdata)
+ return -ENOMEM;
+ opal = of_find_node_by_path("/ibm,opal/sensors");
for_each_child_of_node(opal, np) {
+ struct sensor_group_data *sgdata;
const char *attr_name;
- u32 opal_index;
+ u32 opal_index, hw_id;
const char *label;
if (np->name == NULL)
@@ -456,14 +649,43 @@ static int create_device_attrs(struct platform_device *pdev)
opal_index = INVALID_INDEX;
}
- sdata[count].opal_index = opal_index;
- sdata[count].hwmon_index =
- get_sensor_hwmon_index(&sdata[count], sdata, count);
+ hw_id = get_sensor_hwmon_index(&sdata[count], sdata, count);
+ sgdata = get_sensor_group(np, type);
+ populate_sensor(&sdata[count], opal_index, hw_id, sensor_id,
+ attr_name, type, pgroups[type], sgdata,
+ show_sensor, NULL);
+ count++;
+
+ if (!of_property_read_u32(np, "sensor-data-max", &sensor_id)) {
+ attr_name = get_max_attr(type);
+ populate_sensor(&sdata[count], opal_index, hw_id,
+ sensor_id, attr_name, type,
+ pgroups[type], sgdata, show_sensor,
+ NULL);
+ count++;
+ }
+
+ if (!of_property_read_u32(np, "sensor-data-min", &sensor_id)) {
+ attr_name = get_min_attr(type);
+ populate_sensor(&sdata[count], opal_index, hw_id,
+ sensor_id, attr_name, type,
+ pgroups[type], sgdata, show_sensor,
+ NULL);
+ count++;
+ }
- create_hwmon_attr(&sdata[count], attr_name, show_sensor);
+ if (sgdata) {
+ int i;
- pgroups[type]->attrs[sensor_groups[type].attr_count++] =
- &sdata[count++].dev_attr.attr;
+ sgdata->sensors[sgdata->nr_sensor++] = &sdata[count];
+ populate_sensor(&sdata[count], opal_index, hw_id,
+ sgdata->gid, "enable", type,
+ pgroups[type], sgdata, show_enable,
+ store_enable);
+ for (i = 0; i < ARRAY_SIZE(sdata[count].sdata); i++)
+ sdata[count].sdata[i] = &sdata[count - i - 1];
+ count++;
+ }
if (!of_property_read_string(np, "label", &label)) {
/*
@@ -474,34 +696,15 @@ static int create_device_attrs(struct platform_device *pdev)
*/
make_sensor_label(np, &sdata[count], label);
- populate_sensor(&sdata[count], opal_index,
- sdata[count - 1].hwmon_index,
+ populate_sensor(&sdata[count], opal_index, hw_id,
sensor_id, "label", type, pgroups[type],
- show_label);
- count++;
- }
-
- if (!of_property_read_u32(np, "sensor-data-max", &sensor_id)) {
- attr_name = get_max_attr(type);
- populate_sensor(&sdata[count], opal_index,
- sdata[count - 1].hwmon_index,
- sensor_id, attr_name, type,
- pgroups[type], show_sensor);
- count++;
- }
-
- if (!of_property_read_u32(np, "sensor-data-min", &sensor_id)) {
- attr_name = get_min_attr(type);
- populate_sensor(&sdata[count], opal_index,
- sdata[count - 1].hwmon_index,
- sensor_id, attr_name, type,
- pgroups[type], show_sensor);
+ NULL, show_label, NULL);
count++;
}
}
-exit_put_node:
of_node_put(opal);
+
return err;
}
--
1.8.3.1
* [PATCH v2 1/2] powernv:opal-sensor-groups: Add support to enable sensor groups
From: Shilpasri G Bhat @ 2018-07-04 9:16 UTC
To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
In-Reply-To: <1530695793-4584-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com>
Add support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that need to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
- Rebased on master. No changes from v1.
arch/powerpc/include/asm/opal-api.h | 1 +
arch/powerpc/include/asm/opal.h | 2 ++
.../powerpc/platforms/powernv/opal-sensor-groups.c | 28 ++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
4 files changed, 32 insertions(+)
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 3bab299..56a94a1 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -206,6 +206,7 @@
#define OPAL_NPU_SPA_CLEAR_CACHE 160
#define OPAL_NPU_TL_SET 161
#define OPAL_SENSOR_READ_U64 162
+#define OPAL_SENSOR_GROUP_ENABLE 163
#define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164
#define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165
#define OPAL_LAST 165
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e1b2910..fc0550e 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -292,6 +292,7 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
int opal_sensor_group_clear(u32 group_hndl, int token);
+int opal_sensor_group_enable(u32 group_hndl, int token, bool enable);
s64 opal_signal_system_reset(s32 cpu);
s64 opal_quiesce(u64 shutdown_type, s32 cpu);
@@ -326,6 +327,7 @@ extern int opal_async_wait_response_interruptible(uint64_t token,
struct opal_msg *msg);
extern int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data);
extern int opal_get_sensor_data_u64(u32 sensor_hndl, u64 *sensor_data);
+extern int sensor_group_enable(u32 grp_hndl, bool enable);
struct rtc_time;
extern time64_t opal_get_boot_time(void);
diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
index 541c9ea..f7d04b6 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor-groups.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
@@ -32,6 +32,34 @@ struct sg_attr {
struct sg_attr *sgattrs;
} *sgs;
+int sensor_group_enable(u32 handle, bool enable)
+{
+ struct opal_msg msg;
+ int token, ret;
+
+ token = opal_async_get_token_interruptible();
+ if (token < 0)
+ return token;
+
+ ret = opal_sensor_group_enable(handle, token, enable);
+ if (ret == OPAL_ASYNC_COMPLETION) {
+ ret = opal_async_wait_response(token, &msg);
+ if (ret) {
+ pr_devel("Failed to wait for the async response\n");
+ ret = -EIO;
+ goto out;
+ }
+ ret = opal_error_code(opal_get_async_rc(msg));
+ } else {
+ ret = opal_error_code(ret);
+ }
+
+out:
+ opal_async_release_token(token);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(sensor_group_enable);
+
static ssize_t sg_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a8d9b40..8268a1e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -327,3 +327,4 @@ OPAL_CALL(opal_npu_tl_set, OPAL_NPU_TL_SET);
OPAL_CALL(opal_pci_get_pbcq_tunnel_bar, OPAL_PCI_GET_PBCQ_TUNNEL_BAR);
OPAL_CALL(opal_pci_set_pbcq_tunnel_bar, OPAL_PCI_SET_PBCQ_TUNNEL_BAR);
OPAL_CALL(opal_sensor_read_u64, OPAL_SENSOR_READ_U64);
+OPAL_CALL(opal_sensor_group_enable, OPAL_SENSOR_GROUP_ENABLE);
--
1.8.3.1
* [PATCH v2 0/2] hwmon: Add attributes to enable/disable sensors
From: Shilpasri G Bhat @ 2018-07-04 9:16 UTC (permalink / raw)
To: linux, mpe; +Cc: linuxppc-dev, linux-hwmon, linux-kernel, ego, Shilpasri G Bhat
This patch series adds a new attribute to enable or disable a sensor at
runtime.
v1 : https://lkml.org/lkml/2018/3/22/214
Shilpasri G Bhat (2):
powernv:opal-sensor-groups: Add support to enable sensor groups
hwmon: ibmpowernv: Add attributes to enable/disable sensor groups
Documentation/hwmon/sysfs-interface | 22 ++
arch/powerpc/include/asm/opal-api.h | 1 +
arch/powerpc/include/asm/opal.h | 2 +
.../powerpc/platforms/powernv/opal-sensor-groups.c | 28 ++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
drivers/hwmon/ibmpowernv.c | 281 ++++++++++++++++++---
6 files changed, 296 insertions(+), 39 deletions(-)
--
1.8.3.1
* [PATCH v10 6/6] kernel: tracepoints: add support for relative references
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
In-Reply-To: <20180704083651.24360-1-ard.biesheuvel@linaro.org>
To avoid the need for relocating absolute references to tracepoint
structures at boot time when running relocatable kernels (which may
take a disproportionate amount of space), add the option to emit
these tables as relative references instead.
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
include/linux/tracepoint.h | 19 ++++++--
kernel/tracepoint.c | 49 +++++++++++---------
2 files changed, 41 insertions(+), 27 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 19a690b559ca..b130e40d82cb 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -225,6 +225,19 @@ extern void syscall_unregfunc(void);
return static_key_false(&__tracepoint_##name.key); \
}
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define __TRACEPOINT_ENTRY(name) \
+ asm(" .section \"__tracepoints_ptrs\", \"a\" \n" \
+ " .balign 4 \n" \
+ " .long __tracepoint_" #name " - . \n" \
+ " .previous \n")
+#else
+#define __TRACEPOINT_ENTRY(name) \
+ static struct tracepoint * const __tracepoint_ptr_##name __used \
+ __attribute__((section("__tracepoints_ptrs"))) = \
+ &__tracepoint_##name
+#endif
+
/*
* We have no guarantee that gcc and the linker won't up-align the tracepoint
* structures, so we create an array of pointers that will be used for iteration
@@ -234,11 +247,9 @@ extern void syscall_unregfunc(void);
static const char __tpstrtab_##name[] \
__attribute__((section("__tracepoints_strings"))) = #name; \
struct tracepoint __tracepoint_##name \
- __attribute__((section("__tracepoints"))) = \
+ __attribute__((section("__tracepoints"), used)) = \
{ __tpstrtab_##name, STATIC_KEY_INIT_FALSE, reg, unreg, NULL };\
- static struct tracepoint * const __tracepoint_ptr_##name __used \
- __attribute__((section("__tracepoints_ptrs"))) = \
- &__tracepoint_##name;
+ __TRACEPOINT_ENTRY(name);
#define DEFINE_TRACE(name) \
DEFINE_TRACE_FN(name, NULL, NULL);
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 6dc6356c3327..451c8f5e8345 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -325,6 +325,27 @@ int tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data)
}
EXPORT_SYMBOL_GPL(tracepoint_probe_unregister);
+static void for_each_tracepoint_range(struct tracepoint * const *begin,
+ struct tracepoint * const *end,
+ void (*fct)(struct tracepoint *tp, void *priv),
+ void *priv)
+{
+ if (!begin)
+ return;
+
+ if (IS_ENABLED(CONFIG_HAVE_ARCH_PREL32_RELOCATIONS)) {
+ const int *iter;
+
+ for (iter = (const int *)begin; iter < (const int *)end; iter++)
+ fct(offset_to_ptr(iter), priv);
+ } else {
+ struct tracepoint * const *iter;
+
+ for (iter = begin; iter < end; iter++)
+ fct(*iter, priv);
+ }
+}
+
#ifdef CONFIG_MODULES
bool trace_module_has_bad_taint(struct module *mod)
{
@@ -389,15 +410,9 @@ EXPORT_SYMBOL_GPL(unregister_tracepoint_module_notifier);
* Ensure the tracer unregistered the module's probes before the module
* teardown is performed. Prevents leaks of probe and data pointers.
*/
-static void tp_module_going_check_quiescent(struct tracepoint * const *begin,
- struct tracepoint * const *end)
+static void tp_module_going_check_quiescent(struct tracepoint *tp, void *priv)
{
- struct tracepoint * const *iter;
-
- if (!begin)
- return;
- for (iter = begin; iter < end; iter++)
- WARN_ON_ONCE((*iter)->funcs);
+ WARN_ON_ONCE(tp->funcs);
}
static int tracepoint_module_coming(struct module *mod)
@@ -448,8 +463,9 @@ static void tracepoint_module_going(struct module *mod)
* Called the going notifier before checking for
* quiescence.
*/
- tp_module_going_check_quiescent(mod->tracepoints_ptrs,
- mod->tracepoints_ptrs + mod->num_tracepoints);
+ for_each_tracepoint_range(mod->tracepoints_ptrs,
+ mod->tracepoints_ptrs + mod->num_tracepoints,
+ tp_module_going_check_quiescent, NULL);
break;
}
}
@@ -501,19 +517,6 @@ static __init int init_tracepoints(void)
__initcall(init_tracepoints);
#endif /* CONFIG_MODULES */
-static void for_each_tracepoint_range(struct tracepoint * const *begin,
- struct tracepoint * const *end,
- void (*fct)(struct tracepoint *tp, void *priv),
- void *priv)
-{
- struct tracepoint * const *iter;
-
- if (!begin)
- return;
- for (iter = begin; iter < end; iter++)
- fct(*iter, priv);
-}
-
/**
* for_each_kernel_tracepoint - iteration on all kernel tracepoints
* @fct: callback
--
2.17.1
* [PATCH v10 5/6] PCI: Add support for relative addressing in quirk tables
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
In-Reply-To: <20180704083651.24360-1-ard.biesheuvel@linaro.org>
Allow the PCI quirk tables to be emitted in a way that avoids absolute
references to the hook functions. This reduces the size of the entries,
and, more importantly, makes them invariant under runtime relocation
(e.g., for KASLR).
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
drivers/pci/quirks.c | 12 +++++++++---
include/linux/pci.h | 20 ++++++++++++++++++++
2 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f439de848658..0ba4e446e5db 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -64,9 +64,15 @@ static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
f->vendor == (u16) PCI_ANY_ID) &&
(f->device == dev->device ||
f->device == (u16) PCI_ANY_ID)) {
- calltime = fixup_debug_start(dev, f->hook);
- f->hook(dev);
- fixup_debug_report(dev, calltime, f->hook);
+ void (*hook)(struct pci_dev *dev);
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+ hook = offset_to_ptr(&f->hook_offset);
+#else
+ hook = f->hook;
+#endif
+ calltime = fixup_debug_start(dev, hook);
+ hook(dev);
+ fixup_debug_report(dev, calltime, hook);
}
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 340029b2fb38..51baa3ab5195 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1795,7 +1795,11 @@ struct pci_fixup {
u16 device; /* Or PCI_ANY_ID */
u32 class; /* Or PCI_ANY_ID */
unsigned int class_shift; /* should be 0, 8, 16 */
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+ int hook_offset;
+#else
void (*hook)(struct pci_dev *dev);
+#endif
};
enum pci_fixup_pass {
@@ -1809,12 +1813,28 @@ enum pci_fixup_pass {
pci_fixup_suspend_late, /* pci_device_suspend_late() */
};
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
+ class_shift, hook) \
+ __ADDRESSABLE(hook) \
+ asm(".section " #sec ", \"a\" \n" \
+ ".balign 16 \n" \
+ ".short " #vendor ", " #device " \n" \
+ ".long " #class ", " #class_shift " \n" \
+ ".long " #hook " - . \n" \
+ ".previous \n");
+#define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
+ class_shift, hook) \
+ __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
+ class_shift, hook)
+#else
/* Anonymous variables would be nice... */
#define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class, \
class_shift, hook) \
static const struct pci_fixup __PASTE(__pci_fixup_##name,__LINE__) __used \
__attribute__((__section__(#section), aligned((sizeof(void *))))) \
= { vendor, device, class, class_shift, hook };
+#endif
#define DECLARE_PCI_FIXUP_CLASS_EARLY(vendor, device, class, \
class_shift, hook) \
--
2.17.1
* [PATCH v10 4/6] init: allow initcall tables to be emitted using relative references
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
In-Reply-To: <20180704083651.24360-1-ard.biesheuvel@linaro.org>
Allow the initcall tables to be emitted using relative references that
are only half the size on 64-bit architectures and don't require fixups
at runtime on relocatable kernels.
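For readers who want to see the table walk in isolation, here is a minimal
userspace sketch of the same idea. It is not the kernel code: the table is
filled in at run time instead of by the assembler's ".long fn - ." directive,
and demo_table, demo_table_fill(), demo_run_initcalls() and the two init
functions are hypothetical names made up for the illustration. It also assumes
that code and data live within 2 GB of each other, which holds for an ordinary
executable.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*initcall_t)(void);
typedef int32_t initcall_entry_t;	/* the patch uses plain int */

/* Same arithmetic as initcall_from_entry() in the PREL32 case. */
static inline initcall_t initcall_from_entry(const initcall_entry_t *entry)
{
	return (initcall_t)((uintptr_t)entry + *entry);
}

/* Hypothetical initcalls; 'calls' records which of them ran. */
static int calls;
static int first_init(void)  { calls += 1;  return 0; }
static int second_init(void) { calls += 10; return 0; }

/* Stand-in for an .initcallN.init section: one 32-bit offset per entry. */
static initcall_entry_t demo_table[2];

/* Store "target - address of the slot", as ".long fn - ." would. */
static void demo_table_fill(void)
{
	static const initcall_t fns[] = { first_init, second_init };
	size_t i;

	for (i = 0; i < 2; i++)
		demo_table[i] = (initcall_entry_t)((intptr_t)fns[i] -
						   (intptr_t)&demo_table[i]);
}

/* Walk the table the way do_initcall_level() does after this patch. */
static int demo_run_initcalls(void)
{
	initcall_entry_t *fn;
	int ret = 0;

	for (fn = demo_table; fn < demo_table + 2; fn++)
		ret |= initcall_from_entry(fn)();
	return ret;
}
```

Calling demo_table_fill() followed by demo_run_initcalls() invokes both
functions through 4 byte entries. Nothing in the table is an absolute address,
so the table contents stay valid wherever the image is loaded.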
Acked-by: James Morris <james.morris@microsoft.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
include/linux/init.h | 44 +++++++++++++++-----
init/main.c | 32 +++++++-------
kernel/printk/printk.c | 16 +++----
security/security.c | 17 ++++----
4 files changed, 68 insertions(+), 41 deletions(-)
diff --git a/include/linux/init.h b/include/linux/init.h
index bc27cf03c41e..2538d176dd1f 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -116,8 +116,24 @@
typedef int (*initcall_t)(void);
typedef void (*exitcall_t)(void);
-extern initcall_t __con_initcall_start[], __con_initcall_end[];
-extern initcall_t __security_initcall_start[], __security_initcall_end[];
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+typedef int initcall_entry_t;
+
+static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
+{
+ return offset_to_ptr(entry);
+}
+#else
+typedef initcall_t initcall_entry_t;
+
+static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
+{
+ return *entry;
+}
+#endif
+
+extern initcall_entry_t __con_initcall_start[], __con_initcall_end[];
+extern initcall_entry_t __security_initcall_start[], __security_initcall_end[];
/* Used for contructor calls. */
typedef void (*ctor_fn_t)(void);
@@ -167,9 +183,20 @@ extern bool initcall_debug;
* as KEEP() in the linker script.
*/
-#define __define_initcall(fn, id) \
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#define ___define_initcall(fn, id, __sec) \
+ __ADDRESSABLE(fn) \
+ asm(".section \"" #__sec ".init\", \"a\" \n" \
+ "__initcall_" #fn #id ": \n" \
+ ".long " #fn " - . \n" \
+ ".previous \n");
+#else
+#define ___define_initcall(fn, id, __sec) \
static initcall_t __initcall_##fn##id __used \
- __attribute__((__section__(".initcall" #id ".init"))) = fn;
+ __attribute__((__section__(#__sec ".init"))) = fn;
+#endif
+
+#define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id)
/*
* Early initcalls run before initializing SMP.
@@ -208,13 +235,8 @@ extern bool initcall_debug;
#define __exitcall(fn) \
static exitcall_t __exitcall_##fn __exit_call = fn
-#define console_initcall(fn) \
- static initcall_t __initcall_##fn \
- __used __section(.con_initcall.init) = fn
-
-#define security_initcall(fn) \
- static initcall_t __initcall_##fn \
- __used __section(.security_initcall.init) = fn
+#define console_initcall(fn) ___define_initcall(fn,, .con_initcall)
+#define security_initcall(fn) ___define_initcall(fn,, .security_initcall)
struct obs_kernel_param {
const char *str;
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..e59a01f163d6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -901,18 +901,18 @@ int __init_or_module do_one_initcall(initcall_t fn)
}
-extern initcall_t __initcall_start[];
-extern initcall_t __initcall0_start[];
-extern initcall_t __initcall1_start[];
-extern initcall_t __initcall2_start[];
-extern initcall_t __initcall3_start[];
-extern initcall_t __initcall4_start[];
-extern initcall_t __initcall5_start[];
-extern initcall_t __initcall6_start[];
-extern initcall_t __initcall7_start[];
-extern initcall_t __initcall_end[];
-
-static initcall_t *initcall_levels[] __initdata = {
+extern initcall_entry_t __initcall_start[];
+extern initcall_entry_t __initcall0_start[];
+extern initcall_entry_t __initcall1_start[];
+extern initcall_entry_t __initcall2_start[];
+extern initcall_entry_t __initcall3_start[];
+extern initcall_entry_t __initcall4_start[];
+extern initcall_entry_t __initcall5_start[];
+extern initcall_entry_t __initcall6_start[];
+extern initcall_entry_t __initcall7_start[];
+extern initcall_entry_t __initcall_end[];
+
+static initcall_entry_t *initcall_levels[] __initdata = {
__initcall0_start,
__initcall1_start,
__initcall2_start,
@@ -938,7 +938,7 @@ static char *initcall_level_names[] __initdata = {
static void __init do_initcall_level(int level)
{
- initcall_t *fn;
+ initcall_entry_t *fn;
strcpy(initcall_command_line, saved_command_line);
parse_args(initcall_level_names[level],
@@ -949,7 +949,7 @@ static void __init do_initcall_level(int level)
trace_initcall_level(initcall_level_names[level]);
for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
- do_one_initcall(*fn);
+ do_one_initcall(initcall_from_entry(fn));
}
static void __init do_initcalls(void)
@@ -980,11 +980,11 @@ static void __init do_basic_setup(void)
static void __init do_pre_smp_initcalls(void)
{
- initcall_t *fn;
+ initcall_entry_t *fn;
trace_initcall_level("early");
for (fn = __initcall_start; fn < __initcall0_start; fn++)
- do_one_initcall(*fn);
+ do_one_initcall(initcall_from_entry(fn));
}
/*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..688a27b0888c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2772,7 +2772,8 @@ EXPORT_SYMBOL(unregister_console);
void __init console_init(void)
{
int ret;
- initcall_t *call;
+ initcall_t call;
+ initcall_entry_t *ce;
/* Setup the default TTY line discipline. */
n_tty_init();
@@ -2781,13 +2782,14 @@ void __init console_init(void)
* set up the console device so that later boot sequences can
* inform about problems etc..
*/
- call = __con_initcall_start;
+ ce = __con_initcall_start;
trace_initcall_level("console");
- while (call < __con_initcall_end) {
- trace_initcall_start((*call));
- ret = (*call)();
- trace_initcall_finish((*call), ret);
- call++;
+ while (ce < __con_initcall_end) {
+ call = initcall_from_entry(ce);
+ trace_initcall_start(call);
+ ret = call();
+ trace_initcall_finish(call, ret);
+ ce++;
}
}
diff --git a/security/security.c b/security/security.c
index 68f46d849abe..1e7b1486d82a 100644
--- a/security/security.c
+++ b/security/security.c
@@ -48,14 +48,17 @@ static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] =
static void __init do_security_initcalls(void)
{
int ret;
- initcall_t *call;
- call = __security_initcall_start;
+ initcall_t call;
+ initcall_entry_t *ce;
+
+ ce = __security_initcall_start;
trace_initcall_level("security");
- while (call < __security_initcall_end) {
- trace_initcall_start((*call));
- ret = (*call) ();
- trace_initcall_finish((*call), ret);
- call++;
+ while (ce < __security_initcall_end) {
+ call = initcall_from_entry(ce);
+ trace_initcall_start(call);
+ ret = call();
+ trace_initcall_finish(call, ret);
+ ce++;
}
}
--
2.17.1
* [PATCH v10 3/6] module: use relative references for __ksymtab entries
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
In-Reply-To: <20180704083651.24360-1-ard.biesheuvel@linaro.org>
An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab
entries, each consisting of two 64-bit fields containing absolute
references, to the symbol itself and to a char array containing
its name, respectively.
When we build the same configuration with KASLR enabled, we end
up with an additional ~192 KB of relocations in the .init section,
i.e., one 24 byte entry for each absolute reference, which all need
to be processed at boot time.
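The two figures are consistent with each other, which can be checked with a
few lines of arithmetic. This is only a sketch: it assumes exactly 64 KB of
__ksymtab, and struct rela64 and struct abs_symbol are hypothetical names
that mirror the layout of Elf64_Rela and of the absolute kernel_symbol
respectively.

```c
#include <stdint.h>

/* Mirrors the layout of Elf64_Rela: three 64-bit fields, 24 bytes. */
struct rela64 {
	uint64_t r_offset;
	uint64_t r_info;
	int64_t  r_addend;
};

/* Absolute __ksymtab entry on a 64-bit build: two 64-bit fields. */
struct abs_symbol {
	uint64_t value;
	uint64_t name;
};

/* Bytes of RELA metadata needed to relocate a ksymtab of the given size. */
static uint64_t ksymtab_rela_bytes(uint64_t ksymtab_bytes)
{
	uint64_t entries = ksymtab_bytes / sizeof(struct abs_symbol);
	uint64_t abs_refs = entries * 2;	/* value + name */

	return abs_refs * sizeof(struct rela64);
}
```

With 64 KB of __ksymtab there are 4096 entries and 8192 absolute references,
and 8192 * 24 bytes is 192 KB.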
Given how the struct kernel_symbol that describes each entry is
completely local to module.c (except for the references emitted
by EXPORT_SYMBOL() itself), we can easily modify it to contain
two 32-bit relative references instead. This reduces the size of
the __ksymtab section by 50% for all 64-bit architectures, and
gets rid of the runtime relocations entirely for architectures
implementing KASLR, either via standard PIE linking (arm64) or
using custom host tools (x86).
Note that the binary search involving __ksymtab contents relies
on each section being sorted by symbol name. This is implemented
based on the input section names, not the names in the ksymtab
entries, so this patch does not interfere with that.
Given that the use of place-relative relocations requires support
both in the toolchain and in the module loader, we cannot enable
this feature for all architectures. So make it dependent on whether
CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.
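To illustrate the lookup side, here is a minimal userspace sketch of how a
place-relative entry is turned back into pointers. It is an illustration
under stated assumptions, not the kernel implementation: struct rel_symbol,
rel_symbol_init() and the demo_* objects are hypothetical, and the offsets are
computed at run time where the kernel lets the assembler emit them.

```c
#include <stdint.h>
#include <string.h>

/* Same arithmetic as the offset_to_ptr() helper added by this patch. */
static inline void *offset_to_ptr(const int32_t *off)
{
	return (void *)((uintptr_t)off + *off);
}

/* Hypothetical stand-in for struct kernel_symbol in the PREL32 layout. */
struct rel_symbol {
	int32_t value_offset;
	int32_t name_offset;
};

static int demo_value = 42;
static const char demo_name[] = "demo_value";
static struct rel_symbol demo_entry;

/*
 * Each field stores "target - address of the field itself", which is what
 * ".long sym - ." produces. Assumes the targets are within 2 GB of the entry.
 */
static void rel_symbol_init(struct rel_symbol *e, const void *value,
			    const char *name)
{
	e->value_offset = (int32_t)((intptr_t)value -
				    (intptr_t)&e->value_offset);
	e->name_offset = (int32_t)((intptr_t)name -
				   (intptr_t)&e->name_offset);
}
```

After rel_symbol_init(&demo_entry, &demo_value, demo_name), calling
offset_to_ptr() on each field recovers the original pointers, and the entry
occupies 8 bytes instead of 16.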
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/x86/include/asm/Kbuild | 1 +
arch/x86/include/asm/export.h | 5 ---
include/asm-generic/export.h | 12 ++++-
include/linux/compiler.h | 19 ++++++++
include/linux/export.h | 46 +++++++++++++++-----
kernel/module.c | 32 +++++++++++---
6 files changed, 91 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index de690c2d2e33..a0ab9ab61c75 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -8,5 +8,6 @@ generated-y += xen-hypercalls.h
generic-y += dma-contiguous.h
generic-y += early_ioremap.h
+generic-y += export.h
generic-y += mcs_spinlock.h
generic-y += mm-arch-hooks.h
diff --git a/arch/x86/include/asm/export.h b/arch/x86/include/asm/export.h
deleted file mode 100644
index 2a51d66689c5..000000000000
--- a/arch/x86/include/asm/export.h
+++ /dev/null
@@ -1,5 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifdef CONFIG_64BIT
-#define KSYM_ALIGN 16
-#endif
-#include <asm-generic/export.h>
diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
index 68efb950a918..4d73e6e3c66c 100644
--- a/include/asm-generic/export.h
+++ b/include/asm-generic/export.h
@@ -5,12 +5,10 @@
#define KSYM_FUNC(x) x
#endif
#ifdef CONFIG_64BIT
-#define __put .quad
#ifndef KSYM_ALIGN
#define KSYM_ALIGN 8
#endif
#else
-#define __put .long
#ifndef KSYM_ALIGN
#define KSYM_ALIGN 4
#endif
@@ -19,6 +17,16 @@
#define KCRC_ALIGN 4
#endif
+.macro __put, val, name
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+ .long \val - ., \name - .
+#elif defined(CONFIG_64BIT)
+ .quad \val, \name
+#else
+ .long \val, \name
+#endif
+.endm
+
/*
* note on .section use: @progbits vs %progbits nastiness doesn't matter,
* since we immediately emit into those sections anyway.
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 42506e4d1f53..61c844d4ab48 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -280,6 +280,25 @@ unsigned long read_word_at_a_time(const void *addr)
#endif /* __KERNEL__ */
+/*
+ * Force the compiler to emit 'sym' as a symbol, so that we can reference
+ * it from inline assembler. Necessary in case 'sym' could be inlined
+ * otherwise, or eliminated entirely due to lack of references that are
+ * visible to the compiler.
+ */
+#define __ADDRESSABLE(sym) \
+ static void * __attribute__((section(".discard.addressable"), used)) \
+ __PASTE(__addressable_##sym, __LINE__) = (void *)&sym;
+
+/**
+ * offset_to_ptr - convert a relative memory offset to an absolute pointer
+ * @off: the address of the 32-bit offset value
+ */
+static inline void *offset_to_ptr(const int *off)
+{
+ return (void *)((unsigned long)off + *off);
+}
+
#endif /* __ASSEMBLY__ */
#ifndef __optimize
diff --git a/include/linux/export.h b/include/linux/export.h
index ea7df303d68d..ae072bc5aacf 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -18,12 +18,6 @@
#define VMLINUX_SYMBOL_STR(x) __VMLINUX_SYMBOL_STR(x)
#ifndef __ASSEMBLY__
-struct kernel_symbol
-{
- unsigned long value;
- const char *name;
-};
-
#ifdef MODULE
extern struct module __this_module;
#define THIS_MODULE (&__this_module)
@@ -54,17 +48,47 @@ extern struct module __this_module;
#define __CRC_SYMBOL(sym, sec)
#endif
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+#include <linux/compiler.h>
+/*
+ * Emit the ksymtab entry as a pair of relative references: this reduces
+ * the size by half on 64-bit architectures, and eliminates the need for
+ * absolute relocations that require runtime processing on relocatable
+ * kernels.
+ */
+#define __KSYMTAB_ENTRY(sym, sec) \
+ __ADDRESSABLE(sym) \
+ asm(" .section \"___ksymtab" sec "+" #sym "\", \"a\" \n" \
+ " .balign 8 \n" \
+ "__ksymtab_" #sym ": \n" \
+ " .long " #sym "- . \n" \
+ " .long __kstrtab_" #sym "- . \n" \
+ " .previous \n")
+
+struct kernel_symbol {
+ int value_offset;
+ int name_offset;
+};
+#else
+#define __KSYMTAB_ENTRY(sym, sec) \
+ static const struct kernel_symbol __ksymtab_##sym \
+ __attribute__((section("___ksymtab" sec "+" #sym), used)) \
+ = { (unsigned long)&sym, __kstrtab_##sym }
+
+struct kernel_symbol {
+ unsigned long value;
+ const char *name;
+};
+#endif
+
/* For every exported symbol, place a struct in the __ksymtab section */
#define ___EXPORT_SYMBOL(sym, sec) \
extern typeof(sym) sym; \
__CRC_SYMBOL(sym, sec) \
static const char __kstrtab_##sym[] \
- __attribute__((section("__ksymtab_strings"), aligned(1))) \
+ __attribute__((section("__ksymtab_strings"), used, aligned(1))) \
= #sym; \
- static const struct kernel_symbol __ksymtab_##sym \
- __used \
- __attribute__((section("___ksymtab" sec "+" #sym), used)) \
- = { (unsigned long)&sym, __kstrtab_##sym }
+ __KSYMTAB_ENTRY(sym, sec)
#if defined(__DISABLE_EXPORTS)
diff --git a/kernel/module.c b/kernel/module.c
index f475f30eed8c..7cb82e0fcac0 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -547,12 +547,30 @@ static bool check_symbol(const struct symsearch *syms,
return true;
}
+static unsigned long kernel_symbol_value(const struct kernel_symbol *sym)
+{
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+ return (unsigned long)offset_to_ptr(&sym->value_offset);
+#else
+ return sym->value;
+#endif
+}
+
+static const char *kernel_symbol_name(const struct kernel_symbol *sym)
+{
+#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
+ return offset_to_ptr(&sym->name_offset);
+#else
+ return sym->name;
+#endif
+}
+
static int cmp_name(const void *va, const void *vb)
{
const char *a;
const struct kernel_symbol *b;
a = va; b = vb;
- return strcmp(a, b->name);
+ return strcmp(a, kernel_symbol_name(b));
}
static bool find_symbol_in_section(const struct symsearch *syms,
@@ -2192,7 +2210,7 @@ void *__symbol_get(const char *symbol)
sym = NULL;
preempt_enable();
- return sym ? (void *)sym->value : NULL;
+ return sym ? (void *)kernel_symbol_value(sym) : NULL;
}
EXPORT_SYMBOL_GPL(__symbol_get);
@@ -2222,10 +2240,12 @@ static int verify_export_symbols(struct module *mod)
for (i = 0; i < ARRAY_SIZE(arr); i++) {
for (s = arr[i].sym; s < arr[i].sym + arr[i].num; s++) {
- if (find_symbol(s->name, &owner, NULL, true, false)) {
+ if (find_symbol(kernel_symbol_name(s), &owner, NULL,
+ true, false)) {
pr_err("%s: exports duplicate symbol %s"
" (owned by %s)\n",
- mod->name, s->name, module_name(owner));
+ mod->name, kernel_symbol_name(s),
+ module_name(owner));
return -ENOEXEC;
}
}
@@ -2274,7 +2294,7 @@ static int simplify_symbols(struct module *mod, const struct load_info *info)
ksym = resolve_symbol_wait(mod, info, name);
/* Ok if resolved. */
if (ksym && !IS_ERR(ksym)) {
- sym[i].st_value = ksym->value;
+ sym[i].st_value = kernel_symbol_value(ksym);
break;
}
@@ -2534,7 +2554,7 @@ static int is_exported(const char *name, unsigned long value,
ks = lookup_symbol(name, __start___ksymtab, __stop___ksymtab);
else
ks = lookup_symbol(name, mod->syms, mod->syms + mod->num_syms);
- return ks != NULL && ks->value == value;
+ return ks != NULL && kernel_symbol_value(ks) == value;
}
/* As per nm */
--
2.17.1
* [PATCH v10 2/6] module: allow symbol exports to be disabled
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
In-Reply-To: <20180704083651.24360-1-ard.biesheuvel@linaro.org>
To allow existing C code to be incorporated into the decompressor or
the UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
declarations into nops, and #define it in places where such exports
are undesirable. Note that this gets rid of a rather dodgy redefine
of linux/export.h's header guard.
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/x86/boot/compressed/kaslr.c | 5 +----
drivers/firmware/efi/libstub/Makefile | 3 ++-
include/linux/export.h | 11 ++++++++++-
3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index b87a7582853d..ed7a123bba42 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -23,11 +23,8 @@
* _ctype[] in lib/ctype.c is needed by isspace() of linux/ctype.h.
* While both lib/ctype.c and lib/cmdline.c will bring EXPORT_SYMBOL
* which is meaningless and will cause compiling error in some cases.
- * So do not include linux/export.h and define EXPORT_SYMBOL(sym)
- * as empty.
*/
-#define _LINUX_EXPORT_H
-#define EXPORT_SYMBOL(sym)
+#define __DISABLE_EXPORTS
#include "misc.h"
#include "error.h"
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index a34e9290a699..0d0d3483241c 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@ cflags-$(CONFIG_EFI_ARMSTUB) += -I$(srctree)/scripts/dtc/libfdt
KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
-D__NO_FORTIFY \
$(call cc-option,-ffreestanding) \
- $(call cc-option,-fno-stack-protector)
+ $(call cc-option,-fno-stack-protector) \
+ -D__DISABLE_EXPORTS
GCOV_PROFILE := n
KASAN_SANITIZE := n
diff --git a/include/linux/export.h b/include/linux/export.h
index b768d6dd3c90..ea7df303d68d 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -66,7 +66,16 @@ extern struct module __this_module;
__attribute__((section("___ksymtab" sec "+" #sym), used)) \
= { (unsigned long)&sym, __kstrtab_##sym }
-#if defined(__KSYM_DEPS__)
+#if defined(__DISABLE_EXPORTS)
+
+/*
+ * Allow symbol exports to be disabled completely so that C code may
+ * be reused in other execution contexts such as the UEFI stub or the
+ * decompressor.
+ */
+#define __EXPORT_SYMBOL(sym, sec)
+
+#elif defined(__KSYM_DEPS__)
/*
* For fine grained build dependencies, we want to tell the build system
--
2.17.1
* [PATCH v10 1/6] arch: enable relative relocations for arm64, power and x86
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
In-Reply-To: <20180704083651.24360-1-ard.biesheuvel@linaro.org>
Before updating certain subsystems to use place-relative 32-bit
relocations in special sections, to save space and reduce the
number of absolute relocations that need to be processed at runtime
by relocatable kernels, introduce the Kconfig symbol and define it
for some architectures that should be able to support and benefit
from it.
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/Kconfig | 10 ++++++++++
arch/arm64/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
4 files changed, 13 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 1aa59063f1fd..2b8b70820002 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -971,4 +971,14 @@ config REFCOUNT_FULL
against various use-after-free conditions that can be used in
security flaw exploits.
+config HAVE_ARCH_PREL32_RELOCATIONS
+ bool
+ help
+ May be selected by an architecture if it supports place-relative
+ 32-bit relocations, both in the toolchain and in the module loader,
+ in which case relative references can be used in special sections
+ for PCI fixup, initcalls etc which are only half the size on 64 bit
+ architectures, and don't require runtime relocation on relocatable
+ kernels.
+
source "kernel/gcov/Kconfig"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42c090cf0292..1940c6405d04 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -95,6 +95,7 @@ config ARM64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
+ select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9f2b75fe2c2d..e4fe19789b8b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -177,6 +177,7 @@ config PPC
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
+ select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_CBPF_JIT if !PPC64
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..e10a3542db7e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
+ select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
--
2.17.1
^ permalink raw reply related
* [PATCH v10 0/6] add support for relative references in special sections
From: Ard Biesheuvel @ 2018-07-04 8:36 UTC (permalink / raw)
To: linux-kernel
Cc: Ard Biesheuvel, Arnd Bergmann, Kees Cook, Will Deacon,
Michael Ellerman, Thomas Garnier, Thomas Gleixner,
Serge E. Hallyn, Bjorn Helgaas, Benjamin Herrenschmidt,
Russell King, Paul Mackerras, Catalin Marinas, Petr Mladek,
Ingo Molnar, James Morris, Andrew Morton, Nicolas Pitre,
Josh Poimboeuf, Steven Rostedt, Sergey Senozhatsky,
Linus Torvalds, Jessica Yu, linux-arm-kernel, linuxppc-dev, x86
This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata
for these sections in relocatable kernels (e.g., for KASLR) that needs
to be fixed up at boot time. On arm64, this reduces the vmlinux footprint
of such a reference by 8x (8 byte absolute reference + 24 byte RELA entry
vs 4 byte relative reference).
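As a rough illustration of what a relative reference is (a standalone
userspace sketch, not code taken from this series): each table entry stores
a signed 32-bit offset from its own address to the target, and the reader
adds the entry's address back to recover the pointer. The arithmetic matches
the offset_to_ptr() description in the changelog below; struct rel_entry,
set_entry() and the target variables are hypothetical names used only for
this example. In the kernel the offsets are emitted at build time, whereas
here they are filled in at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry: a 32-bit offset relative to the entry itself. */
struct rel_entry {
	int32_t offset;
};

/* Hypothetical targets and table, all living in the same image. */
static int target_a = 1;
static int target_b = 2;
static struct rel_entry table[2];

/* Convert a relative reference back into a pointer: (ulong)off + *off. */
static void *offset_to_ptr(const int32_t *off)
{
	return (void *)((uintptr_t)off + *off);
}

/* Record 'target' as an offset from the entry's own address. */
static void set_entry(struct rel_entry *e, void *target)
{
	intptr_t delta = (intptr_t)target - (intptr_t)&e->offset;

	/* A place-relative 32-bit reference can only reach +/- 2 GB. */
	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	e->offset = (int32_t)delta;
}
```

Because only the distance between entry and target is stored, the value
stays valid when the whole image is moved, which is why no relocation entry
is needed for it.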
Patch #3 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which architectures
it should be blacklisted.
Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.
Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.
Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections with order
~1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.
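The 8x and 28 KB figures follow from simple arithmetic, sketched here in C.
The sizes come from the text above (8 byte absolute reference, 24 byte RELA
entry, 4 byte relative reference); the enum and function names are made up
for this example, and the entry count of 1000 is the rough order of
magnitude quoted above, not a measured value.

```c
#include <assert.h>
#include <stddef.h>

enum {
	ABS_REF_BYTES = 8,	/* absolute 64-bit reference */
	RELA_BYTES    = 24,	/* one ELF64 RELA entry describing it */
	REL_REF_BYTES = 4,	/* 32-bit place-relative reference */
};

/* Per-reference footprint shrinks from 32 bytes to 4 bytes, i.e. 8x. */
static_assert((ABS_REF_BYTES + RELA_BYTES) / REL_REF_BYTES == 8,
	      "footprint ratio should be 8x");

/* Bytes of vmlinux saved when nr_entries references become relative. */
static size_t bytes_saved(size_t nr_entries)
{
	return nr_entries * (ABS_REF_BYTES + RELA_BYTES - REL_REF_BYTES);
}
```

With about 1000 entries per section, bytes_saved(1000) gives 28000 bytes,
which is the "about 28 KB" per patch mentioned above.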
[From the v7 series blurb, which included the jump_label patches as well]:
For the arm64 kernel, all patches combined reduce the memory footprint of
vmlinux by about 1.3 MB (using a config copied from Ubuntu that has KASLR
enabled), of which ~1 MB is the size reduction of the RELA section in .init,
and the remaining 300 KB is reduction of .text/.data.
Branch:
git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git relative-special-sections-v10
Andrew, this series now has all the prerequisite acks in place. Could you please
take this through the -mm tree? Thanks.
Changes since v9:
- use .discard.addressable section (not .discard) for emitting dummy symbol
references, to work around a build issue on powerpc
- add acks from Michael, Ingo, Will, James, Petr and Sergey
Changes since v8:
- add Nico's ack (#2)
- drop 'const' qualifier from __ADDRESSABLE(sym) to prevent mismatching
attributes for the .discard section (#3)
- drop all uses of VMLINUX_SYMBOL_STR(), which is on its way out (#3 - #6)
Changes since v7:
- dropped the jump_label patches, these will be revisited in a separate series
- reorder __DISABLE_EXPORTS with __KSYM_DEPS__ check in #2
- use offset_to_ptr() helper function to abstract the relative pointer
conversion [int *off -> (ulong)off + *off] (#3 - #6)
- rebase onto v4.16-rc3
Changes since v6:
- drop S390 from patch #1 introducing HAVE_ARCH_PREL32_RELOCATIONS: kbuild
robot threw me some s390 curveballs, and given that s390 does not define
CONFIG_RELOCATABLE in the first place, it does not benefit as much from
relative references as arm64, x86 and power do
- add patch to allow symbol exports to be disabled at compilation unit
granularity (#2)
- get rid of arm64 vmlinux.lds.S hunk to ensure code generated by __ADDRESSABLE
gets discarded from the EFI stub - it is no longer needed after adding #2 (#1)
- change __ADDRESSABLE() to emit a data reference, not a code reference - this
is another simplification made possible by patch #2 (#3)
- add Steven's ack to #6
- split x86 jump_label patch into two (#9, #10)
Changes since v5:
- add missing jump_label prototypes to s390 jump_label.h (#6)
- fix inverted condition in call to jump_entry_is_module_init() (#6)
Changes since v4:
- add patches to convert x86 and arm64 to use relative references for jump
tables (#6 - #8)
- rename PCI patch and add Bjorn's ack (#4)
- rebase onto v4.15-rc5
Changes since v3:
- fix module unload issue in patch #5 reported by Jessica, by reusing the
updated routine for_each_tracepoint_range() for the quiescent check at
module unload time; this requires this routine to be moved before
tracepoint_module_going() in kernel/tracepoint.c
- add Jessica's ack to #2
- rebase onto v4.14-rc1
Changes since v2:
- Revert my slightly misguided attempt to appease checkpatch, which resulted
in needless churn and worse code. This v3 is based on v1 with a few tweaks
that were actually reasonable checkpatch warnings: unnecessary braces (as
pointed out by Ingo) and other minor whitespace misdemeanors.
Changes since v1:
- Remove checkpatch errors to the extent feasible: in some cases, this
involves moving extern declarations into C files, and switching to
struct definitions rather than typedefs. Some errors are impossible
to fix: please find the remaining ones after the diffstat.
- Used 'int' instead of 'signed int' for the various offset fields: there
is no ambiguity between architectures regarding its signedness (unlike
'char')
- Refactor the different patches to be more uniform in the way they define
the section entry type and accessors in the .h file, and avoid the need to
add #ifdefs to the C code.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Ard Biesheuvel (6):
arch: enable relative relocations for arm64, power and x86
module: allow symbol exports to be disabled
module: use relative references for __ksymtab entries
init: allow initcall tables to be emitted using relative references
PCI: Add support for relative addressing in quirk tables
kernel: tracepoints: add support for relative references
arch/Kconfig | 10 ++++
arch/arm64/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/boot/compressed/kaslr.c | 5 +-
arch/x86/include/asm/Kbuild | 1 +
arch/x86/include/asm/export.h | 5 --
drivers/firmware/efi/libstub/Makefile | 3 +-
drivers/pci/quirks.c | 12 +++--
include/asm-generic/export.h | 12 ++++-
include/linux/compiler.h | 19 +++++++
include/linux/export.h | 57 +++++++++++++++-----
include/linux/init.h | 44 +++++++++++----
include/linux/pci.h | 20 +++++++
include/linux/tracepoint.h | 19 +++++--
init/main.c | 32 +++++------
kernel/module.c | 32 ++++++++---
kernel/printk/printk.c | 16 +++---
kernel/tracepoint.c | 49 +++++++++--------
security/security.c | 17 +++---
20 files changed, 255 insertions(+), 101 deletions(-)
delete mode 100644 arch/x86/include/asm/export.h
--
2.17.1
^ permalink raw reply
* Re: [v2 PATCH 2/2] powerpc: Enable CPU_FTR_ASYM_SMT for interleaved big-cores
From: Gautham R Shenoy @ 2018-07-04 8:15 UTC (permalink / raw)
To: Murilo Opsfelder Araujo
Cc: Gautham R. Shenoy, Michael Ellerman, Benjamin Herrenschmidt,
Michael Neuling, Vaidyanathan Srinivasan, Akshay Adiga,
Shilpasri G Bhat, Oliver O'Halloran, Nicholas Piggin,
linuxppc-dev, linux-kernel
In-Reply-To: <20180703175346.GB6474@kermit-br-ibm-com.br.ibm.com>
Hi Murilo,
Thanks for the review.
On Tue, Jul 03, 2018 at 02:53:46PM -0300, Murilo Opsfelder Araujo wrote:
[..snip..]
> > - /* Initialize CPU <=> thread mapping/
> > + if (has_interleaved_big_core) {
> > + int key = __builtin_ctzl(CPU_FTR_ASYM_SMT);
> > +
> > + cur_cpu_spec->cpu_features |= CPU_FTR_ASYM_SMT;
> > + static_branch_enable(&cpu_feature_keys[key]);
> > + pr_info("Detected interleaved big-cores\n");
> > + }
>
> Shouldn't we use cpu_has_feature(CPU_FTR_ASYM_SMT) before setting
> it?
Are you suggesting that we do the following?
if (has_interleaved_big_core &&
!cpu_has_feature(CPU_FTR_ASYM_SMT)) {
...
}
Currently CPU_FTR_ASYM_SMT is set at compile time for only POWER7
where running the tasks on lower numbered threads gives us the benefit
of SMT thread folding. Interleaved big core is a feature introduced
only on POWER9. Thus, we know that CPU_FTR_ASYM_SMT is not set in
cpu_features at this point.
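For readers following along, the pattern in the quoted hunk (set the feature
bit, then enable the key whose index is that bit's position) can be sketched
in standalone C. This is only an illustration under assumptions: the mask
value, the function names and the boolean array standing in for
cpu_feature_keys[] are invented for the example and are not the kernel's
definitions. __builtin_ctzl() is the GCC/Clang builtin that counts trailing
zero bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-bit feature mask, for illustration only. */
#define EXAMPLE_FTR_ASYM_SMT	0x0000000000100000UL

/* Stand-ins for cur_cpu_spec->cpu_features and cpu_feature_keys[]. */
static unsigned long features;
static bool feature_keys[sizeof(unsigned long) * 8];

static bool has_feature(unsigned long ftr)
{
	return (features & ftr) != 0;
}

static void enable_feature(unsigned long ftr)
{
	/* The bit-position trick only makes sense for a single-bit mask. */
	assert(ftr != 0 && (ftr & (ftr - 1)) == 0);

	/* Position of the (only) set bit, used as the key index. */
	int key = __builtin_ctzl(ftr);

	features |= ftr;
	feature_keys[key] = true;
}
```

An explicit has_feature() check before enable_feature() would be harmless
here, but as explained above it is redundant when the bit is known to be
clear at that point.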
>
> > +
> > + /* Initialize CPU <=> thread mapping/
> > *
> > * WARNING: We assume that the number of threads is the same for
> > * every CPU in the system. If that is not the case, then some code
> > --
> > 1.9.4
> >
>
> --
> Murilo
--
Thanks and Regards
gautham.
^ permalink raw reply
* [PATCH] powerpc: icp-hv: fix missing of_node_put in success path
From: Nicholas Mc Guire @ 2018-07-04 8:03 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Paul Mackerras, Michael Ellerman, linuxppc-dev, linux-kernel,
Nicholas Mc Guire
Both of_find_compatible_node() and of_find_node_by_type() will
return a refcounted node on success - thus for the success path
the node must be explicitly released with a of_node_put().
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Fixes: 0b05ac6e2480 ("powerpc/xics: Rewrite XICS driver")
---
Problem found by experimental coccinelle script
Patch was compiletested with: ppc64_defconfig (implies
CONFIG_PPC_ICP_HV=y)
with sparse warnings though not related to the proposed change
Patch is against 4.18-rc3 (localversion-next is next-20180704)
arch/powerpc/sysdev/xics/icp-hv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/sysdev/xics/icp-hv.c b/arch/powerpc/sysdev/xics/icp-hv.c
index bbc839a..003deaa 100644
--- a/arch/powerpc/sysdev/xics/icp-hv.c
+++ b/arch/powerpc/sysdev/xics/icp-hv.c
@@ -179,6 +179,7 @@ int icp_hv_init(void)
icp_ops = &icp_hv_ops;
+ of_node_put(np);
return 0;
}
--
2.1.4
^ permalink raw reply related
* Re: [PATCH v11 00/26] Speculative page faults
From: Laurent Dufour @ 2018-07-04 7:51 UTC (permalink / raw)
To: Song, HaiyanX
Cc: akpm@linux-foundation.org, mhocko@kernel.org,
peterz@infradead.org, kirill@shutemov.name, ak@linux.intel.com,
dave@stgolabs.net, jack@suse.cz, Matthew Wilcox,
khandual@linux.vnet.ibm.com, aneesh.kumar@linux.vnet.ibm.com,
benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
Thomas Gleixner, Ingo Molnar, hpa@zytor.com, Will Deacon,
Sergey Senozhatsky, sergey.senozhatsky.work@gmail.com,
Andrea Arcangeli, Alexei Starovoitov, Wang, Kemi, Daniel Jordan,
David Rientjes, Jerome Glisse, Ganesh Mahendran, Minchan Kim,
Punit Agrawal, vinayak menon, Yang Shi,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
haren@linux.vnet.ibm.com, npiggin@gmail.com,
bsingharora@gmail.com, paulmck@linux.vnet.ibm.com, Tim Chen,
linuxppc-dev@lists.ozlabs.org, x86@kernel.org
In-Reply-To: <9FE19350E8A7EE45B64D8D63D368C8966B85F660@SHSMSX101.ccr.corp.intel.com>
On 04/07/2018 05:23, Song, HaiyanX wrote:
> Hi Laurent,
>
>
> For the test results on the Intel 4s Skylake platform (192 CPUs, 768G memory), the test cases below were all run 3 times.
> I checked the test results: only page_fault3_thread/enable THP has a 6% stddev for the head commit; the other tests have lower stddev.
Running the test only 3 times seems too few to me.
I'll focus on the largest changes for the moment, but I don't have access to
such hardware.
Would it be possible to provide a diff, between base and SPF, of the performance
cycles measured when running page_fault3 and page_fault2, where the 20% change
is detected?
Please stay focused on the test case process, to see exactly where the series
has an impact.
Thanks,
Laurent.
>
> And I did not find any other high variation in the test case results.
>
> a). Enable THP
> testcase                    base    stddev  change  head    stddev  metric
> page_fault3/enable THP      10519   ± 3%    -20.5%  8368    ± 6%    will-it-scale.per_thread_ops
> page_fault2/enable THP      8281    ± 2%    -18.8%  6728            will-it-scale.per_thread_ops
> brk1/enable THP             998475          -2.2%   976893          will-it-scale.per_process_ops
> context_switch1/enable THP  223910          -1.3%   220930          will-it-scale.per_process_ops
> context_switch1/enable THP  233722          -1.0%   231288          will-it-scale.per_thread_ops
>
> b). Disable THP
> page_fault3/disable THP     10856           -23.1%  8344            will-it-scale.per_thread_ops
> page_fault2/disable THP     8147            -18.8%  6613            will-it-scale.per_thread_ops
> brk1/disable THP            957             -7.9%   881             will-it-scale.per_thread_ops
> context_switch1/disable THP 237006          -2.2%   231907          will-it-scale.per_thread_ops
> brk1/disable THP            997317          -2.0%   977778          will-it-scale.per_process_ops
> page_fault3/disable THP     467454          -1.8%   459251          will-it-scale.per_process_ops
> context_switch1/disable THP 224431          -1.3%   221567          will-it-scale.per_process_ops
>
>
> Best regards,
> Haiyan Song
> ________________________________________
> From: Laurent Dufour [ldufour@linux.vnet.ibm.com]
> Sent: Monday, July 02, 2018 4:59 PM
> To: Song, HaiyanX
> Cc: akpm@linux-foundation.org; mhocko@kernel.org; peterz@infradead.org; kirill@shutemov.name; ak@linux.intel.com; dave@stgolabs.net; jack@suse.cz; Matthew Wilcox; khandual@linux.vnet.ibm.com; aneesh.kumar@linux.vnet.ibm.com; benh@kernel.crashing.org; mpe@ellerman.id.au; paulus@samba.org; Thomas Gleixner; Ingo Molnar; hpa@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.work@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi; linux-kernel@vger.kernel.org; linux-mm@kvack.org; haren@linux.vnet.ibm.com; npiggin@gmail.com; bsingharora@gmail.com; paulmck@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x86@kernel.org
> Subject: Re: [PATCH v11 00/26] Speculative page faults
>
> On 11/06/2018 09:49, Song, HaiyanX wrote:
>> Hi Laurent,
>>
>> Regression tests for the v11 patch series have been run; some regressions were found by LKP-tools (linux kernel performance),
>> tested on the Intel 4s Skylake platform. This time we only tested the cases which had been run and showed regressions on
>> the V9 patch series.
>>
>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>> branch: Laurent-Dufour/Speculative-page-faults/20180520-045126
>> commit id:
>> head commit : a7a8993bfe3ccb54ad468b9f1799649e4ad1ff12
>> base commit : ba98a1cdad71d259a194461b3a61471b49b14df1
>> Benchmark: will-it-scale
>> Download link: https://github.com/antonblanchard/will-it-scale/tree/master
>>
>> Metrics:
>> will-it-scale.per_process_ops=processes/nr_cpu
>> will-it-scale.per_thread_ops=threads/nr_cpu
>> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
>> THP: enable / disable
>> nr_task:100%
>>
>> 1. Regressions:
>>
>> a). Enable THP
>> testcase                    base    change  head    metric
>> page_fault3/enable THP      10519   -20.5%  8368    will-it-scale.per_thread_ops
>> page_fault2/enable THP      8281    -18.8%  6728    will-it-scale.per_thread_ops
>> brk1/enable THP             998475  -2.2%   976893  will-it-scale.per_process_ops
>> context_switch1/enable THP  223910  -1.3%   220930  will-it-scale.per_process_ops
>> context_switch1/enable THP  233722  -1.0%   231288  will-it-scale.per_thread_ops
>>
>> b). Disable THP
>> page_fault3/disable THP     10856   -23.1%  8344    will-it-scale.per_thread_ops
>> page_fault2/disable THP     8147    -18.8%  6613    will-it-scale.per_thread_ops
>> brk1/disable THP            957     -7.9%   881     will-it-scale.per_thread_ops
>> context_switch1/disable THP 237006  -2.2%   231907  will-it-scale.per_thread_ops
>> brk1/disable THP            997317  -2.0%   977778  will-it-scale.per_process_ops
>> page_fault3/disable THP     467454  -1.8%   459251  will-it-scale.per_process_ops
>> context_switch1/disable THP 224431  -1.3%   221567  will-it-scale.per_process_ops
>>
>> Notes: for the above values of test result, the higher is better.
>
> I tried the same tests on my PowerPC victim VM (1024 CPUs, 11TB) and I can't
> get reproducible results. The results have huge variation, even on the vanilla
> kernel, so I can't draw any conclusion about changes because of that.
>
> I tried on a smaller node (80 CPUs, 32G), and the tests ran better, but I didn't
> measure any changes between the vanilla and the SPF patched ones:
>
> test THP enabled            4.17.0-rc4-mm1  spf         delta
> page_fault3_threads         2697.7          2683.5      -0.53%
> page_fault2_threads         170660.6        169574.1    -0.64%
> context_switch1_threads     6915269.2       6877507.3   -0.55%
> context_switch1_processes   6478076.2       6529493.5   0.79%
> brk1                        243391.2        238527.5    -2.00%
>
> Tests were run 10 times, no high variation detected.
>
> Did you see high variation on your side? How many times were the tests run to
> compute the average values?
>
> Thanks,
> Laurent.
>
>
>>
>> 2. Improvements: no improvement was found based on the selected test cases.
>>
>>
>> Best regards
>> Haiyan Song
>> ________________________________________
>> From: owner-linux-mm@kvack.org [owner-linux-mm@kvack.org] on behalf of Laurent Dufour [ldufour@linux.vnet.ibm.com]
>> Sent: Monday, May 28, 2018 4:54 PM
>> To: Song, HaiyanX
>> Cc: akpm@linux-foundation.org; mhocko@kernel.org; peterz@infradead.org; kirill@shutemov.name; ak@linux.intel.com; dave@stgolabs.net; jack@suse.cz; Matthew Wilcox; khandual@linux.vnet.ibm.com; aneesh.kumar@linux.vnet.ibm.com; benh@kernel.crashing.org; mpe@ellerman.id.au; paulus@samba.org; Thomas Gleixner; Ingo Molnar; hpa@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.work@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi; linux-kernel@vger.kernel.org; linux-mm@kvack.org; haren@linux.vnet.ibm.com; npiggin@gmail.com; bsingharora@gmail.com; paulmck@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x86@kernel.org
>> Subject: Re: [PATCH v11 00/26] Speculative page faults
>>
>> On 28/05/2018 10:22, Haiyan Song wrote:
>>> Hi Laurent,
>>>
>>> Yes, these tests are done on V9 patch.
>>
>> Do you plan to give this V11 a run ?
>>
>>>
>>>
>>> Best regards,
>>> Haiyan Song
>>>
>>> On Mon, May 28, 2018 at 09:51:34AM +0200, Laurent Dufour wrote:
>>>> On 28/05/2018 07:23, Song, HaiyanX wrote:
>>>>>
>>>>> Some regressions and improvements were found by LKP-tools (linux kernel performance) on the V9 patch series,
>>>>> tested on the Intel 4s Skylake platform.
>>>>
>>>> Hi,
>>>>
>>>> Thanks for reporting these benchmark results, but you mentioned the "V9 patch
>>>> series" while responding to the v11 header series...
>>>> Were these tests done on v9 or v11 ?
>>>>
>>>> Cheers,
>>>> Laurent.
>>>>
>>>>>
>>>>> The regression result is sorted by the metric will-it-scale.per_thread_ops.
>>>>> Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)
>>>>> Commit id:
>>>>> base commit: d55f34411b1b126429a823d06c3124c16283231f
>>>>> head commit: 0355322b3577eeab7669066df42c550a56801110
>>>>> Benchmark suite: will-it-scale
>>>>> Download link:
>>>>> https://github.com/antonblanchard/will-it-scale/tree/master/tests
>>>>> Metrics:
>>>>> will-it-scale.per_process_ops=processes/nr_cpu
>>>>> will-it-scale.per_thread_ops=threads/nr_cpu
>>>>> test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
>>>>> THP: enable / disable
>>>>> nr_task: 100%
>>>>>
>>>>> 1. Regressions:
>>>>> a) THP enabled:
>>>>> testcase base change head metric
>>>>> page_fault3/ enable THP 10092 -17.5% 8323 will-it-scale.per_thread_ops
>>>>> page_fault2/ enable THP 8300 -17.2% 6869 will-it-scale.per_thread_ops
>>>>> brk1/ enable THP 957.67 -7.6% 885 will-it-scale.per_thread_ops
>>>>> page_fault3/ enable THP 172821 -5.3% 163692 will-it-scale.per_process_ops
>>>>> signal1/ enable THP 9125 -3.2% 8834 will-it-scale.per_process_ops
>>>>>
>>>>> b) THP disabled:
>>>>> testcase base change head metric
>>>>> page_fault3/ disable THP 10107 -19.1% 8180 will-it-scale.per_thread_ops
>>>>> page_fault2/ disable THP 8432 -17.8% 6931 will-it-scale.per_thread_ops
>>>>> context_switch1/ disable THP 215389 -6.8% 200776 will-it-scale.per_thread_ops
>>>>> brk1/ disable THP 939.67 -6.6% 877.33 will-it-scale.per_thread_ops
>>>>> page_fault3/ disable THP 173145 -4.7% 165064 will-it-scale.per_process_ops
>>>>> signal1/ disable THP 9162 -3.9% 8802 will-it-scale.per_process_ops
>>>>>
>>>>> 2. Improvements:
>>>>> a) THP enabled:
>>>>> testcase base change head metric
>>>>> malloc1/ enable THP 66.33 +469.8% 383.67 will-it-scale.per_thread_ops
>>>>> writeseek3/ enable THP 2531 +4.5% 2646 will-it-scale.per_thread_ops
>>>>> signal1/ enable THP 989.33 +2.8% 1016 will-it-scale.per_thread_ops
>>>>>
>>>>> b) THP disabled:
>>>>> testcase base change head metric
>>>>> malloc1/ disable THP 90.33 +417.3% 467.33 will-it-scale.per_thread_ops
>>>>> read2/ disable THP 58934 +39.2% 82060 will-it-scale.per_thread_ops
>>>>> page_fault1/ disable THP 8607 +36.4% 11736 will-it-scale.per_thread_ops
>>>>> read1/ disable THP 314063 +12.7% 353934 will-it-scale.per_thread_ops
>>>>> writeseek3/ disable THP 2452 +12.5% 2759 will-it-scale.per_thread_ops
>>>>> signal1/ disable THP 971.33 +5.5% 1024 will-it-scale.per_thread_ops
>>>>>
>>>>> Notes: for above values in column "change", the higher value means that the related testcase result
>>>>> on head commit is better than that on base commit for this benchmark.
>>>>>
>>>>>
>>>>> Best regards
>>>>> Haiyan Song
>>>>>
>>>>> ________________________________________
>>>>> From: owner-linux-mm@kvack.org [owner-linux-mm@kvack.org] on behalf of Laurent Dufour [ldufour@linux.vnet.ibm.com]
>>>>> Sent: Thursday, May 17, 2018 7:06 PM
>>>>> To: akpm@linux-foundation.org; mhocko@kernel.org; peterz@infradead.org; kirill@shutemov.name; ak@linux.intel.com; dave@stgolabs.net; jack@suse.cz; Matthew Wilcox; khandual@linux.vnet.ibm.com; aneesh.kumar@linux.vnet.ibm.com; benh@kernel.crashing.org; mpe@ellerman.id.au; paulus@samba.org; Thomas Gleixner; Ingo Molnar; hpa@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.work@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi
>>>>> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; haren@linux.vnet.ibm.com; npiggin@gmail.com; bsingharora@gmail.com; paulmck@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x86@kernel.org
>>>>> Subject: [PATCH v11 00/26] Speculative page faults
>>>>>
>>>>> This is a port on kernel 4.17 of the work done by Peter Zijlstra to handle
>>>>> page fault without holding the mm semaphore [1].
>>>>>
>>>>> The idea is to try to handle user space page faults without holding the
>>>>> mmap_sem. This should allow better concurrency for massively threaded
>>>>> process since the page fault handler will not wait for other threads memory
>>>>> layout change to be done, assuming that this change is done in another part
>>>>> of the process's memory space. This type of page fault is named a speculative
>>>>> page fault. If the speculative page fault fails because a concurrent change is
>>>>> detected or because the underlying PMD or PTE tables are not yet allocated, its
>>>>> processing is aborted and a classic page fault is then tried.
>>>>>
>>>>> The speculative page fault (SPF) has to look for the VMA matching the fault
>>>>> address without holding the mmap_sem, this is done by introducing a rwlock
>>>>> which protects the access to the mm_rb tree. Previously this was done using
>>>>> SRCU but it was introducing a lot of scheduling to process the VMA's
>>>>> freeing operation which was hitting the performance by 20% as reported by
>>>>> Kemi Wang [2]. Using a rwlock to protect access to the mm_rb tree is
>>>>> limiting the locking contention to these operations which are expected to
>>>>> be in a O(log n) order. In addition, to ensure that the VMA is not freed
>>>>> behind our back, a reference count is added and 2 services (get_vma() and
>>>>> put_vma()) are introduced to handle the reference count. Once a VMA is
>>>>> fetched from the RB tree using get_vma(), it must later be released using
>>>>> put_vma(). I no longer see the overhead I previously got with the
>>>>> will-it-scale benchmark.
>>>>>
>>>>> The VMA's attributes checked during the speculative page fault processing
>>>>> have to be protected against parallel changes. This is done by using a per
>>>>> VMA sequence lock. This sequence lock allows the speculative page fault
>>>>> handler to fast check for parallel changes in progress and to abort the
>>>>> speculative page fault in that case.
>>>>>
>>>>> Once the VMA has been found, the speculative page fault handler would check
>>>>> for the VMA's attributes to verify that the page fault has to be handled
>>>>> correctly or not. Thus, the VMA is protected through a sequence lock which
>>>>> allows fast detection of concurrent VMA changes. If such a change is
>>>>> detected, the speculative page fault is aborted and a *classic* page fault
>>>>> is tried. VMA sequence lockings are added when VMA attributes which are
>>>>> checked during the page fault are modified.
>>>>>
>>>>> When the PTE is fetched, the VMA is checked to see if it has been changed,
>>>>> so once the page table is locked, the VMA is valid, so any other changes
>>>>> leading to touching this PTE will need to lock the page table, so no
>>>>> parallel change is possible at this time.
>>>>>
>>>>> The locking of the PTE is done with interrupts disabled, this allows
>>>>> checking for the PMD to ensure that there is not an ongoing collapsing
>>>>> operation. Since khugepaged first sets the PMD to pmd_none and then
>>>>> waits for the other CPU to have caught the IPI interrupt, if the pmd is
>>>>> valid at the time the PTE is locked, we have the guarantee that the
>>>>> collapsing operation will have to wait on the PTE lock to move forward.
>>>>> This allows the SPF handler to map the PTE safely. If the PMD value is
>>>>> different from the one recorded at the beginning of the SPF operation, the
>>>>> classic page fault handler will be called to handle the operation while
>>>>> holding the mmap_sem. As the PTE lock is done with the interrupts disabled,
>>>>> the lock is done using spin_trylock() to avoid dead lock when handling a
>>>>> page fault while a TLB invalidate is requested by another CPU holding the
>>>>> PTE.
>>>>>
>>>>> In pseudo code, this could be seen as:
>>>>> speculative_page_fault()
>>>>> {
>>>>> vma = get_vma()
>>>>> check vma sequence count
>>>>> check vma's support
>>>>> disable interrupt
>>>>> check pgd,p4d,...,pte
>>>>> save pmd and pte in vmf
>>>>> save vma sequence counter in vmf
>>>>> enable interrupt
>>>>> check vma sequence count
>>>>> handle_pte_fault(vma)
>>>>> ..
>>>>> page = alloc_page()
>>>>> pte_map_lock()
>>>>> disable interrupt
>>>>> abort if sequence counter has changed
>>>>> abort if pmd or pte has changed
>>>>> pte map and lock
>>>>> enable interrupt
>>>>> if abort
>>>>> free page
>>>>> abort
>>>>> ...
>>>>> }
>>>>>
>>>>> arch_fault_handler()
>>>>> {
>>>>> if (speculative_page_fault(&vma))
>>>>> goto done
>>>>> again:
>>>>> lock(mmap_sem)
>>>>> vma = find_vma();
>>>>> handle_pte_fault(vma);
>>>>> if retry
>>>>> unlock(mmap_sem)
>>>>> goto again;
>>>>> done:
>>>>> handle fault error
>>>>> }
>>>>>
>>>>> Support for THP is not done because when checking for the PMD, we can be
>>>>> confused by an in progress collapsing operation done by khugepaged. The
>>>>> issue is that pmd_none() could be true either if the PMD is not already
>>>>> populated or if the underlying PTEs are about to be collapsed. So we
>>>>> cannot safely allocate a PMD if pmd_none() is true.
>>>>>
>>>>> This series adds a new software performance event named 'speculative-faults'
>>>>> or 'spf'. It counts the number of page fault events successfully handled
>>>>> speculatively. When recording 'faults,spf' events, 'faults' counts the
>>>>> total number of page fault events while 'spf' only counts the part of
>>>>> the faults processed speculatively.
>>>>>
>>>>> There are some trace events introduced by this series. They allow
>>>>> identifying why the page faults were not processed speculatively. This
>>>>> doesn't take into account the faults generated by a monothreaded process,
>>>>> which are directly processed while holding the mmap_sem. These trace events
>>>>> are grouped in a system named 'pagefault'; they are:
>>>>> - pagefault:spf_vma_changed : the VMA has been changed behind our back
>>>>> - pagefault:spf_vma_noanon : the vma->anon_vma field was not yet set
>>>>> - pagefault:spf_vma_notsup : the VMA's type is not supported
>>>>> - pagefault:spf_vma_access : the VMA's access rights are not respected
>>>>> - pagefault:spf_pmd_changed : the upper PMD pointer has changed behind
>>>>> our back
>>>>>
>>>>> To record all the related events, the easiest way is to run perf with the
>>>>> following arguments:
>>>>> $ perf stat -e 'faults,spf,pagefault:*' <command>
>>>>>
>>>>> There is also a dedicated vmstat counter showing the number of page faults
>>>>> successfully handled speculatively. It can be seen this way:
>>>>> $ grep speculative_pgfault /proc/vmstat
>>>>>
>>>>> This series builds on top of v4.16-mmotm-2018-04-13-17-28 and is functional
>>>>> on x86, PowerPC and arm64.
>>>>>
>>>>> ---------------------
>>>>> Real Workload results
>>>>>
>>>>> As mentioned in a previous email, we did unofficial runs using a "popular
>>>>> in-memory multithreaded database product" on a 176-core SMT8 Power system,
>>>>> which showed a 30% improvement in the number of transactions processed per
>>>>> second. This run has been done on the v6 series, but changes introduced in
>>>>> this new version should not impact the performance boost seen.
>>>>>
>>>>> Here are the perf data captured during 2 of these runs on top of the v8
>>>>> series:
>>>>> vanilla spf
>>>>> faults 89.418 101.364 +13%
>>>>> spf n/a 97.989
>>>>>
>>>>> With the SPF kernel, most of the page faults were processed in a speculative
>>>>> way.
>>>>>
>>>>> Ganesh Mahendran backported the series on top of a 4.9 kernel and gave
>>>>> it a try on an Android device. He reported that the application launch time
>>>>> was improved on average by 6%, and for large applications (~100 threads) by
>>>>> 20%.
>>>>>
>>>>> Here are the launch times Ganesh measured on Android 8.0 on top of a Qcom
>>>>> MSM845 (8 cores) with 6GB (lower is better):
>>>>>
>>>>> Application                            4.9   4.9+spf   delta
>>>>> com.tencent.mm                         416       389     -7%
>>>>> com.eg.android.AlipayGphone           1135       986    -13%
>>>>> com.tencent.mtt                        455       454      0%
>>>>> com.qqgame.hlddz                      1497      1409     -6%
>>>>> com.autonavi.minimap                   711       701     -1%
>>>>> com.tencent.tmgp.sgame                 788       748     -5%
>>>>> com.immomo.momo                        501       487     -3%
>>>>> com.tencent.peng                      2145      2112     -2%
>>>>> com.smile.gifmaker                     491       461     -6%
>>>>> com.baidu.BaiduMap                     479       366    -23%
>>>>> com.taobao.taobao                     1341      1198    -11%
>>>>> com.baidu.searchbox                    333       314     -6%
>>>>> com.tencent.mobileqq                   394       384     -3%
>>>>> com.sina.weibo                         907       906      0%
>>>>> com.youku.phone                        816       731    -11%
>>>>> com.happyelements.AndroidAnimal.qq     763       717     -6%
>>>>> com.UCMobile                           415       411     -1%
>>>>> com.tencent.tmgp.ak                   1464      1431     -2%
>>>>> com.tencent.qqmusic                    336       329     -2%
>>>>> com.sankuai.meituan                   1661      1302    -22%
>>>>> com.netease.cloudmusic                1193      1200      1%
>>>>> air.tv.douyu.android                  4257      4152     -2%
>>>>>
>>>>> ------------------
>>>>> Benchmarks results
>>>>>
>>>>> Base kernel is v4.17.0-rc4-mm1
>>>>> SPF is BASE + this series
>>>>>
>>>>> Kernbench:
>>>>> ----------
>>>>> Here are the results on a 16-CPU x86 guest using kernbench on a 4.15
>>>>> kernel (the kernel is built 5 times):
>>>>>
>>>>> Average Half load -j 8
>>>>>                   Run (std deviation)
>>>>>                   BASE                SPF
>>>>> Elapsed Time      1448.65 (5.72312)   1455.84 (4.84951)     0.50%
>>>>> User Time         10135.4 (30.3699)   10148.8 (31.1252)     0.13%
>>>>> System Time       900.47 (2.81131)    923.28 (7.52779)      2.53%
>>>>> Percent CPU       761.4 (1.14018)     760.2 (0.447214)     -0.16%
>>>>> Context Switches  85380 (3419.52)     84748 (1904.44)      -0.74%
>>>>> Sleeps            105064 (1240.96)    105074 (337.612)      0.01%
>>>>>
>>>>> Average Optimal load -j 16
>>>>>                   Run (std deviation)
>>>>>                   BASE                SPF
>>>>> Elapsed Time      920.528 (10.1212)   927.404 (8.91789)     0.75%
>>>>> User Time         11064.8 (981.142)   11085 (990.897)       0.18%
>>>>> System Time       979.904 (84.0615)   1001.14 (82.5523)     2.17%
>>>>> Percent CPU       1089.5 (345.894)    1086.1 (343.545)     -0.31%
>>>>> Context Switches  159488 (78156.4)    158223 (77472.1)     -0.79%
>>>>> Sleeps            110566 (5877.49)    110388 (5617.75)     -0.16%
>>>>>
>>>>>
>>>>> During a run on the SPF, perf events were captured:
>>>>> Performance counter stats for '../kernbench -M':
>>>>> 526743764 faults
>>>>> 210 spf
>>>>> 3 pagefault:spf_vma_changed
>>>>> 0 pagefault:spf_vma_noanon
>>>>> 2278 pagefault:spf_vma_notsup
>>>>> 0 pagefault:spf_vma_access
>>>>> 0 pagefault:spf_pmd_changed
>>>>>
>>>>> Very few speculative page faults were recorded as most of the processes
>>>>> involved are single-threaded (it seems that on this architecture some
>>>>> threads were created during the kernel build processing).
>>>>>
>>>>> Here are the kernbench results on an 80-CPU Power8 system:
>>>>>
>>>>> Average Half load -j 40
>>>>>                   Run (std deviation)
>>>>>                   BASE                SPF
>>>>> Elapsed Time      117.152 (0.774642)  117.166 (0.476057)    0.01%
>>>>> User Time         4478.52 (24.7688)   4479.76 (9.08555)     0.03%
>>>>> System Time       131.104 (0.720056)  134.04 (0.708414)     2.24%
>>>>> Percent CPU       3934 (19.7104)      3937.2 (19.0184)      0.08%
>>>>> Context Switches  92125.4 (576.787)   92581.6 (198.622)     0.50%
>>>>> Sleeps            317923 (652.499)    318469 (1255.59)      0.17%
>>>>>
>>>>> Average Optimal load -j 80
>>>>>                   Run (std deviation)
>>>>>                   BASE                SPF
>>>>> Elapsed Time      107.73 (0.632416)   107.31 (0.584936)    -0.39%
>>>>> User Time         5869.86 (1466.72)   5871.71 (1467.27)     0.03%
>>>>> System Time       153.728 (23.8573)   157.153 (24.3704)     2.23%
>>>>> Percent CPU       5418.6 (1565.17)    5436.7 (1580.91)      0.33%
>>>>> Context Switches  223861 (138865)     225032 (139632)       0.52%
>>>>> Sleeps            330529 (13495.1)    332001 (14746.2)      0.45%
>>>>>
>>>>> During a run on the SPF, perf events were captured:
>>>>> Performance counter stats for '../kernbench -M':
>>>>> 116730856 faults
>>>>> 0 spf
>>>>> 3 pagefault:spf_vma_changed
>>>>> 0 pagefault:spf_vma_noanon
>>>>> 476 pagefault:spf_vma_notsup
>>>>> 0 pagefault:spf_vma_access
>>>>> 0 pagefault:spf_pmd_changed
>>>>>
>>>>> Most of the processes involved are single-threaded, so SPF is not
>>>>> activated, but there is no impact on the performance.
>>>>>
>>>>> Ebizzy:
>>>>> -------
>>>>> The test counts the number of records per second it can manage; higher
>>>>> is better. I ran it like this: 'ebizzy -mTt <nrcpus>'. To get consistent
>>>>> results, I repeated the test 100 times and measured the average. The
>>>>> number reported is the records processed per second.
>>>>>
>>>>>                         BASE       SPF     delta
>>>>> 16 CPUs x86 VM        742.57   1490.24   100.69%
>>>>> 80 CPUs P8 node      13105.4  24174.23    84.46%
>>>>>
>>>>> Here are the performance counters read during a run on a 16-CPU x86 VM:
>>>>> Performance counter stats for './ebizzy -mTt 16':
>>>>> 1706379 faults
>>>>> 1674599 spf
>>>>> 30588 pagefault:spf_vma_changed
>>>>> 0 pagefault:spf_vma_noanon
>>>>> 363 pagefault:spf_vma_notsup
>>>>> 0 pagefault:spf_vma_access
>>>>> 0 pagefault:spf_pmd_changed
>>>>>
>>>>> And the ones captured during a run on an 80-CPU Power node:
>>>>> Performance counter stats for './ebizzy -mTt 80':
>>>>> 1874773 faults
>>>>> 1461153 spf
>>>>> 413293 pagefault:spf_vma_changed
>>>>> 0 pagefault:spf_vma_noanon
>>>>> 200 pagefault:spf_vma_notsup
>>>>> 0 pagefault:spf_vma_access
>>>>> 0 pagefault:spf_pmd_changed
>>>>>
>>>>> In ebizzy's case, most of the page faults were handled in a speculative way,
>>>>> leading to the ebizzy performance boost.
>>>>>
>>>>> ------------------
>>>>> Changes since v10 (https://lkml.org/lkml/2018/4/17/572):
>>>>> - Accounted for all review feedback from Punit Agrawal, Ganesh Mahendran
>>>>> and Minchan Kim, hopefully.
>>>>> - Remove unneeded check on CONFIG_SPECULATIVE_PAGE_FAULT in
>>>>> __do_page_fault().
>>>>> - Loop in pte_spinlock() and pte_map_lock() when pte try lock fails
>>>>> instead of aborting the speculative page fault handling. Drop the now
>>>>> useless trace event pagefault:spf_pte_lock.
>>>>> - No longer try to reuse the fetched VMA during the speculative page fault
>>>>> handling when retrying is needed. This added a lot of complexity, and
>>>>> the additional tests done didn't show a significant performance improvement.
>>>>> - Convert IS_ENABLED(CONFIG_NUMA) back to #ifdef due to build error.
>>>>>
>>>>> [1] http://linux-kernel.2935.n7.nabble.com/RFC-PATCH-0-6-Another-go-at-speculative-page-faults-tt965642.html#none
>>>>> [2] https://patchwork.kernel.org/patch/9999687/
>>>>>
>>>>>
>>>>> Laurent Dufour (20):
>>>>> mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
>>>>> x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
>>>>> powerpc/mm: set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
>>>>> mm: introduce pte_spinlock for FAULT_FLAG_SPECULATIVE
>>>>> mm: make pte_unmap_same compatible with SPF
>>>>> mm: introduce INIT_VMA()
>>>>> mm: protect VMA modifications using VMA sequence count
>>>>> mm: protect mremap() against SPF hanlder
>>>>> mm: protect SPF handler against anon_vma changes
>>>>> mm: cache some VMA fields in the vm_fault structure
>>>>> mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()
>>>>> mm: introduce __lru_cache_add_active_or_unevictable
>>>>> mm: introduce __vm_normal_page()
>>>>> mm: introduce __page_add_new_anon_rmap()
>>>>> mm: protect mm_rb tree with a rwlock
>>>>> mm: adding speculative page fault failure trace events
>>>>> perf: add a speculative page fault sw event
>>>>> perf tools: add support for the SPF perf event
>>>>> mm: add speculative page fault vmstats
>>>>> powerpc/mm: add speculative page fault
>>>>>
>>>>> Mahendran Ganesh (2):
>>>>> arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
>>>>> arm64/mm: add speculative page fault
>>>>>
>>>>> Peter Zijlstra (4):
>>>>> mm: prepare for FAULT_FLAG_SPECULATIVE
>>>>> mm: VMA sequence count
>>>>> mm: provide speculative fault infrastructure
>>>>> x86/mm: add speculative pagefault handling
>>>>>
>>>>> arch/arm64/Kconfig | 1 +
>>>>> arch/arm64/mm/fault.c | 12 +
>>>>> arch/powerpc/Kconfig | 1 +
>>>>> arch/powerpc/mm/fault.c | 16 +
>>>>> arch/x86/Kconfig | 1 +
>>>>> arch/x86/mm/fault.c | 27 +-
>>>>> fs/exec.c | 2 +-
>>>>> fs/proc/task_mmu.c | 5 +-
>>>>> fs/userfaultfd.c | 17 +-
>>>>> include/linux/hugetlb_inline.h | 2 +-
>>>>> include/linux/migrate.h | 4 +-
>>>>> include/linux/mm.h | 136 +++++++-
>>>>> include/linux/mm_types.h | 7 +
>>>>> include/linux/pagemap.h | 4 +-
>>>>> include/linux/rmap.h | 12 +-
>>>>> include/linux/swap.h | 10 +-
>>>>> include/linux/vm_event_item.h | 3 +
>>>>> include/trace/events/pagefault.h | 80 +++++
>>>>> include/uapi/linux/perf_event.h | 1 +
>>>>> kernel/fork.c | 5 +-
>>>>> mm/Kconfig | 22 ++
>>>>> mm/huge_memory.c | 6 +-
>>>>> mm/hugetlb.c | 2 +
>>>>> mm/init-mm.c | 3 +
>>>>> mm/internal.h | 20 ++
>>>>> mm/khugepaged.c | 5 +
>>>>> mm/madvise.c | 6 +-
>>>>> mm/memory.c | 612 +++++++++++++++++++++++++++++-----
>>>>> mm/mempolicy.c | 51 ++-
>>>>> mm/migrate.c | 6 +-
>>>>> mm/mlock.c | 13 +-
>>>>> mm/mmap.c | 229 ++++++++++---
>>>>> mm/mprotect.c | 4 +-
>>>>> mm/mremap.c | 13 +
>>>>> mm/nommu.c | 2 +-
>>>>> mm/rmap.c | 5 +-
>>>>> mm/swap.c | 6 +-
>>>>> mm/swap_state.c | 8 +-
>>>>> mm/vmstat.c | 5 +-
>>>>> tools/include/uapi/linux/perf_event.h | 1 +
>>>>> tools/perf/util/evsel.c | 1 +
>>>>> tools/perf/util/parse-events.c | 4 +
>>>>> tools/perf/util/parse-events.l | 1 +
>>>>> tools/perf/util/python.c | 1 +
>>>>> 44 files changed, 1161 insertions(+), 211 deletions(-)
>>>>> create mode 100644 include/trace/events/pagefault.h
>>>>>
>>>>> --
>>>>> 2.7.4
>>>>>
>>>>>
>>>>
>>>
>>
>
>
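For readers who want to check the ebizzy figures quoted above, here is a
minimal sketch of the arithmetic, assuming the delta column is the relative
change of SPF over BASE and that the "spf" counter is a subset of "faults".
The helper names percent_delta() and speculative_share() are made up for
this illustration and are not part of the series.

```c
/* Relative change of the SPF result over the BASE result, in percent. */
static double percent_delta(double base, double spf)
{
	return (spf - base) / base * 100.0;
}

/*
 * Share of all page faults that were handled speculatively, in percent.
 * Assumes the "spf" perf counter counts a subset of "faults".
 */
static double speculative_share(unsigned long faults, unsigned long spf)
{
	return (double)spf / (double)faults * 100.0;
}
```

Feeding in the 16 CPUs x86 VM row (742.57 and 1490.24) reproduces the quoted
delta to within rounding, and the perf counters from the same machine
(1706379 faults, 1674599 spf) give a speculative share of roughly 98%.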
^ permalink raw reply
* Re: [PATCH 05/11] hugetlb: Introduce generic version of huge_ptep_clear_flush
From: Alexandre Ghiti @ 2018-07-04 6:51 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
In-Reply-To: <20180704055207.27978-6-alex@ghiti.fr>
Just discovered my email provider's limit on mails per minute; please drop
this series. I'll send a v2 using the --batch-size option of git send-email.
Sorry about that.
On 07/04/2018 07:52 AM, Alexandre Ghiti wrote:
> arm, x86 architectures use the same version of
> huge_ptep_clear_flush, so move this generic implementation into
> asm-generic/hugetlb.h.
>
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
> arch/arm/include/asm/hugetlb-3level.h | 6 ------
> arch/arm64/include/asm/hugetlb.h | 1 +
> arch/ia64/include/asm/hugetlb.h | 1 +
> arch/mips/include/asm/hugetlb.h | 1 +
> arch/parisc/include/asm/hugetlb.h | 1 +
> arch/powerpc/include/asm/hugetlb.h | 1 +
> arch/sh/include/asm/hugetlb.h | 1 +
> arch/sparc/include/asm/hugetlb.h | 1 +
> arch/x86/include/asm/hugetlb.h | 6 ------
> include/asm-generic/hugetlb.h | 8 ++++++++
> 10 files changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
> index ad36e84b819a..b897541520ef 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
> return retval;
> }
>
> -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_clear_flush(vma, addr, ptep);
> -}
> -
> static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index 6ae0bcafe162..4c8dd488554d 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -71,6 +71,7 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
> extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep);
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep);
> #define __HAVE_ARCH_HUGE_PTE_CLEAR
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index 6719c74da0de..41b5f6adeee4 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -20,6 +20,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
> REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 0959cc5a41fa..7df1f116a3cc 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -48,6 +48,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> return pte;
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
> index 6e281e1bb336..9afff26747a1 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -32,6 +32,7 @@ static inline int prepare_hugepage_range(struct file *file,
> return 0;
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
> index ec3e0c2e78f8..de0769f0b5b2 100644
> --- a/arch/powerpc/include/asm/hugetlb.h
> +++ b/arch/powerpc/include/asm/hugetlb.h
> @@ -143,6 +143,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> #endif
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
> index 08ee6c00b5e9..9abf9c86b769 100644
> --- a/arch/sh/include/asm/hugetlb.h
> +++ b/arch/sh/include/asm/hugetlb.h
> @@ -25,6 +25,7 @@ static inline int prepare_hugepage_range(struct file *file,
> return 0;
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
> index 944e3a4bfaff..651a9593fcee 100644
> --- a/arch/sparc/include/asm/hugetlb.h
> +++ b/arch/sparc/include/asm/hugetlb.h
> @@ -42,6 +42,7 @@ static inline int prepare_hugepage_range(struct file *file,
> return 0;
> }
>
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> unsigned long addr, pte_t *ptep)
> {
> diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
> index 48b8d9b13cc6..8347d5abf882 100644
> --- a/arch/x86/include/asm/hugetlb.h
> +++ b/arch/x86/include/asm/hugetlb.h
> @@ -27,12 +27,6 @@ static inline int prepare_hugepage_range(struct file *file,
> return 0;
> }
>
> -static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> - unsigned long addr, pte_t *ptep)
> -{
> - ptep_clear_flush(vma, addr, ptep);
> -}
> -
> static inline int huge_pte_none(pte_t pte)
> {
> return pte_none(pte);
> diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
> index 0f6f151780dd..ffa63fd8388d 100644
> --- a/include/asm-generic/hugetlb.h
> +++ b/include/asm-generic/hugetlb.h
> @@ -65,4 +65,12 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> }
> #endif
>
> +#ifndef __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep)
> +{
> + ptep_clear_flush(vma, addr, ptep);
> +}
> +#endif
> +
> #endif /* _ASM_GENERIC_HUGETLB_H */
^ permalink raw reply
* [PATCH 03/11] hugetlb: Introduce generic version of set_huge_pte_at
From: Alexandre Ghiti @ 2018-07-04 5:51 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
Cc: Alexandre Ghiti
In-Reply-To: <20180704055207.27978-1-alex@ghiti.fr>
arm, ia64, mips, powerpc, sh, x86 architectures use the
same version of set_huge_pte_at, so move this generic
implementation into asm-generic/hugetlb.h.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
arch/arm/include/asm/hugetlb-3level.h | 6 ------
arch/arm64/include/asm/hugetlb.h | 1 +
arch/ia64/include/asm/hugetlb.h | 6 ------
arch/mips/include/asm/hugetlb.h | 6 ------
arch/parisc/include/asm/hugetlb.h | 1 +
arch/powerpc/include/asm/hugetlb.h | 6 ------
arch/sh/include/asm/hugetlb.h | 6 ------
arch/sparc/include/asm/hugetlb.h | 1 +
arch/x86/include/asm/hugetlb.h | 6 ------
include/asm-generic/hugetlb.h | 8 +++++++-
10 files changed, 10 insertions(+), 37 deletions(-)
diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index d4014fbe5ea3..398fb06e8207 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
}
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- set_pte_at(mm, addr, ptep, pte);
-}
-
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 4af1a800a900..874661a1dff1 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -60,6 +60,7 @@ static inline void arch_clear_hugepage_flags(struct page *page)
extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
struct page *page, int writable);
#define arch_make_huge_pte arch_make_huge_pte
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index afe9fa4d969b..a235d6f60fb3 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,12 +20,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
}
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- set_pte_at(mm, addr, ptep, pte);
-}
-
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 53764050243e..8ea439041d5d 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -36,12 +36,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- set_pte_at(mm, addr, ptep, pte);
-}
-
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 28c23b68d38d..77c8adbac7c3 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -4,6 +4,7 @@
#include <asm/page.h>
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index de46ee16b615..ba7d5d8b543f 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -132,12 +132,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- set_pte_at(mm, addr, ptep, pte);
-}
-
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index f6a51b609409..bc552e37c1c9 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -25,12 +25,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- set_pte_at(mm, addr, ptep, pte);
-}
-
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 59d89b52ccb7..16b0c53ea6c9 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -12,6 +12,7 @@ struct pud_huge_patch_entry {
extern struct pud_huge_patch_entry __pud_huge_patch, __pud_huge_patch_end;
#endif
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 996ce8e15365..554d5614b375 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -27,12 +27,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
- pte_t *ptep, pte_t pte)
-{
- set_pte_at(mm, addr, ptep, pte);
-}
-
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index c697ca9dda18..ee010b756246 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -47,8 +47,14 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
{
free_pgd_range(tlb, addr, end, floor, ceiling);
}
+#endif
-
+#ifndef __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ set_pte_at(mm, addr, ptep, pte);
+}
#endif
#endif /* _ASM_GENERIC_HUGETLB_H */
--
2.16.2
^ permalink raw reply related
* [PATCH 00/11] hugetlb: Factorize architecture hugetlb primitives
From: Alexandre Ghiti @ 2018-07-04 5:51 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
Cc: Alexandre Ghiti
In order to reduce copy/paste of functions across architectures and then
make riscv hugetlb port simpler and smaller, this patchset intends to
factorize the numerous hugetlb primitives that are defined across all the
architectures.
Except for prepare_hugepage_range, this patchset moves the versions that
are just pass-through to standard pte primitives into
asm-generic/hugetlb.h by using the same #ifdef semantic that can be
found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
The s390 architecture has not been tackled in this series since it does not
use asm-generic/hugetlb.h at all.
powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).
This patchset has been compiled on x86 only.
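The override scheme described above can be sketched in plain C as follows.
This is only an illustration of the #ifndef guard pattern, not code from the
series: demo_pte_t, demo_set_pte_at(), demo_set_huge_pte_at(),
DEMO_ARCH_OVERRIDE and DEMO_HAVE_ARCH_SET_HUGE_PTE_AT are hypothetical
stand-ins (the kernel uses the __HAVE_ARCH_ prefix and its own pte types).

```c
/* Hypothetical stand-in for the kernel's pte_t; not the real definition. */
typedef struct { unsigned long val; } demo_pte_t;

/* Stand-in for the standard pte primitive the generic helper passes through to. */
static inline void demo_set_pte_at(demo_pte_t *ptep, demo_pte_t pte)
{
	*ptep = pte;
}

/*
 * "Arch header" part: an architecture that needs its own version defines
 * the guard macro next to its implementation, before the generic header is
 * included. Build with -DDEMO_ARCH_OVERRIDE to emulate such an architecture.
 */
#ifdef DEMO_ARCH_OVERRIDE
#define DEMO_HAVE_ARCH_SET_HUGE_PTE_AT
static inline void demo_set_huge_pte_at(demo_pte_t *ptep, demo_pte_t pte)
{
	pte.val |= 1UL;	/* made-up arch-specific tweak */
	demo_set_pte_at(ptep, pte);
}
#endif

/*
 * "Generic header" part: the pass-through version is only compiled when no
 * architecture has claimed the primitive.
 */
#ifndef DEMO_HAVE_ARCH_SET_HUGE_PTE_AT
static inline void demo_set_huge_pte_at(demo_pte_t *ptep, demo_pte_t pte)
{
	demo_set_pte_at(ptep, pte);
}
#endif
```

Callers use demo_set_huge_pte_at() without knowing which definition was
selected. The guard only works if the arch definition is seen before the
generic one, which is presumably why some of the patches move the
asm-generic/hugetlb.h include to the end of the arch headers.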
Alexandre Ghiti (11):
hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
hugetlb: Introduce generic version of hugetlb_free_pgd_range
hugetlb: Introduce generic version of set_huge_pte_at
hugetlb: Introduce generic version of huge_ptep_get_and_clear
hugetlb: Introduce generic version of huge_ptep_clear_flush
hugetlb: Introduce generic version of huge_pte_none
hugetlb: Introduce generic version of huge_pte_wrprotect
hugetlb: Introduce generic version of prepare_hugepage_range
hugetlb: Introduce generic version of huge_ptep_set_wrprotect
hugetlb: Introduce generic version of huge_ptep_set_access_flags
hugetlb: Introduce generic version of huge_ptep_get
arch/arm/include/asm/hugetlb-3level.h | 32 +---------
arch/arm/include/asm/hugetlb.h | 33 +----------
arch/arm64/include/asm/hugetlb.h | 39 +++---------
arch/ia64/include/asm/hugetlb.h | 47 ++-------------
arch/mips/include/asm/hugetlb.h | 40 +++----------
arch/parisc/include/asm/hugetlb.h | 33 +++--------
arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
arch/powerpc/include/asm/hugetlb.h | 43 ++------------
arch/powerpc/include/asm/nohash/32/pgtable.h | 2 +
arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
arch/sh/include/asm/hugetlb.h | 54 ++---------------
arch/sparc/include/asm/hugetlb.h | 40 +++----------
arch/x86/include/asm/hugetlb.h | 72 +----------------------
include/asm-generic/hugetlb.h | 88 +++++++++++++++++++++++++++-
15 files changed, 143 insertions(+), 384 deletions(-)
--
2.16.2
^ permalink raw reply
* [PATCH 05/11] hugetlb: Introduce generic version of huge_ptep_clear_flush
From: Alexandre Ghiti @ 2018-07-04 5:52 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
Cc: Alexandre Ghiti
In-Reply-To: <20180704055207.27978-1-alex@ghiti.fr>
arm, x86 architectures use the same version of
huge_ptep_clear_flush, so move this generic implementation into
asm-generic/hugetlb.h.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
arch/arm/include/asm/hugetlb-3level.h | 6 ------
arch/arm64/include/asm/hugetlb.h | 1 +
arch/ia64/include/asm/hugetlb.h | 1 +
arch/mips/include/asm/hugetlb.h | 1 +
arch/parisc/include/asm/hugetlb.h | 1 +
arch/powerpc/include/asm/hugetlb.h | 1 +
arch/sh/include/asm/hugetlb.h | 1 +
arch/sparc/include/asm/hugetlb.h | 1 +
arch/x86/include/asm/hugetlb.h | 6 ------
include/asm-generic/hugetlb.h | 8 ++++++++
10 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index ad36e84b819a..b897541520ef 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
}
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
-{
- ptep_clear_flush(vma, addr, ptep);
-}
-
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 6ae0bcafe162..4c8dd488554d 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -71,6 +71,7 @@ extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep);
#define __HAVE_ARCH_HUGE_PTE_CLEAR
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 6719c74da0de..41b5f6adeee4 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,6 +20,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
}
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 0959cc5a41fa..7df1f116a3cc 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -48,6 +48,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
return pte;
}
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 6e281e1bb336..9afff26747a1 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -32,6 +32,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index ec3e0c2e78f8..de0769f0b5b2 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -143,6 +143,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
#endif
}
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index 08ee6c00b5e9..9abf9c86b769 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -25,6 +25,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 944e3a4bfaff..651a9593fcee 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -42,6 +42,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 48b8d9b13cc6..8347d5abf882 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -27,12 +27,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
-{
- ptep_clear_flush(vma, addr, ptep);
-}
-
static inline int huge_pte_none(pte_t pte)
{
return pte_none(pte);
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 0f6f151780dd..ffa63fd8388d 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -65,4 +65,12 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
}
#endif
+#ifndef __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
+static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+ ptep_clear_flush(vma, addr, ptep);
+}
+#endif
+
#endif /* _ASM_GENERIC_HUGETLB_H */
--
2.16.2
^ permalink raw reply related
* [PATCH 02/11] hugetlb: Introduce generic version of hugetlb_free_pgd_range
From: Alexandre Ghiti @ 2018-07-04 5:51 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
Cc: Alexandre Ghiti
In-Reply-To: <20180704055207.27978-1-alex@ghiti.fr>
arm, arm64, mips, parisc, sh, x86 architectures use the
same version of hugetlb_free_pgd_range, so move this generic
implementation into asm-generic/hugetlb.h.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
arch/arm/include/asm/hugetlb.h | 12 ++----------
arch/arm64/include/asm/hugetlb.h | 10 ----------
arch/ia64/include/asm/hugetlb.h | 5 +++--
arch/mips/include/asm/hugetlb.h | 13 ++-----------
arch/parisc/include/asm/hugetlb.h | 12 ++----------
arch/powerpc/include/asm/hugetlb.h | 4 +++-
arch/sh/include/asm/hugetlb.h | 12 ++----------
arch/sparc/include/asm/hugetlb.h | 4 +++-
arch/x86/include/asm/hugetlb.h | 11 ++---------
include/asm-generic/hugetlb.h | 11 +++++++++++
10 files changed, 30 insertions(+), 64 deletions(-)
diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 7d26f6c4f0f5..047b893ef95d 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -23,19 +23,9 @@
#define _ASM_ARM_HUGETLB_H
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
#include <asm/hugetlb-3level.h>
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
- free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
-
static inline int is_hugepage_only_range(struct mm_struct *mm,
unsigned long addr, unsigned long len)
{
@@ -68,4 +58,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
clear_bit(PG_dcache_clean, &page->flags);
}
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_ARM_HUGETLB_H */
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 3fcf14663dfa..4af1a800a900 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -25,16 +25,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return READ_ONCE(*ptep);
}
-
-
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
- free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
static inline int is_hugepage_only_range(struct mm_struct *mm,
unsigned long addr, unsigned long len)
{
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 74d2a5540aaf..afe9fa4d969b 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -3,9 +3,8 @@
#define _ASM_IA64_HUGETLB_H
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
-
+#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
@@ -70,4 +69,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
{
}
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_IA64_HUGETLB_H */
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 982bc0685330..53764050243e 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -10,8 +10,6 @@
#define __ASM_HUGETLB_H
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
-
static inline int is_hugepage_only_range(struct mm_struct *mm,
unsigned long addr,
@@ -38,15 +36,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr,
- unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
- free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
@@ -114,4 +103,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
{
}
+#include <asm-generic/hugetlb.h>
+
#endif /* __ASM_HUGETLB_H */
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 58e0f4620426..28c23b68d38d 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -3,8 +3,6 @@
#define _ASM_PARISC64_HUGETLB_H
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
-
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
@@ -32,14 +30,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
- free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
@@ -71,4 +61,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
{
}
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_PARISC64_HUGETLB_H */
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 3225eb6402cc..de46ee16b615 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -4,7 +4,6 @@
#ifdef CONFIG_HUGETLB_PAGE
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
extern struct kmem_cache *hugepte_cache;
@@ -113,6 +112,7 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
#endif
+#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
@@ -193,4 +193,6 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
}
#endif /* CONFIG_HUGETLB_PAGE */
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_POWERPC_HUGETLB_H */
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index 735939c0f513..f6a51b609409 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -4,8 +4,6 @@
#include <asm/cacheflush.h>
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
-
static inline int is_hugepage_only_range(struct mm_struct *mm,
unsigned long addr,
@@ -27,14 +25,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
- free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
@@ -85,4 +75,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
clear_bit(PG_dcache_clean, &page->flags);
}
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_SH_HUGETLB_H */
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 300557c66698..59d89b52ccb7 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -3,7 +3,6 @@
#define _ASM_SPARC64_HUGETLB_H
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
#ifdef CONFIG_HUGETLB_PAGE
struct pud_huge_patch_entry {
@@ -84,8 +83,11 @@ static inline void arch_clear_hugepage_flags(struct page *page)
{
}
+#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_SPARC64_HUGETLB_H */
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 5ed826da5e07..996ce8e15365 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -3,7 +3,6 @@
#define _ASM_X86_HUGETLB_H
#include <asm/page.h>
-#include <asm-generic/hugetlb.h>
#define hugepages_supported() boot_cpu_has(X86_FEATURE_PSE)
@@ -28,14 +27,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
- unsigned long addr, unsigned long end,
- unsigned long floor,
- unsigned long ceiling)
-{
- free_pgd_range(tlb, addr, end, floor, ceiling);
-}
-
static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
@@ -90,4 +81,6 @@ static inline void arch_clear_hugepage_flags(struct page *page)
static inline bool gigantic_page_supported(void) { return true; }
#endif
+#include <asm-generic/hugetlb.h>
+
#endif /* _ASM_X86_HUGETLB_H */
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 3da7cff52360..c697ca9dda18 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -40,4 +40,15 @@ static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
}
#endif
+#ifndef __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
+static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
+ unsigned long addr, unsigned long end,
+ unsigned long floor, unsigned long ceiling)
+{
+ free_pgd_range(tlb, addr, end, floor, ceiling);
+}
+
+
+#endif
+
#endif /* _ASM_GENERIC_HUGETLB_H */
--
2.16.2
* [PATCH 04/11] hugetlb: Introduce generic version of huge_ptep_get_and_clear
From: Alexandre Ghiti @ 2018-07-04 5:52 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
Cc: Alexandre Ghiti
In-Reply-To: <20180704055207.27978-1-alex@ghiti.fr>
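The arm, ia64, sh and x86 architectures all use the same implementation of
huge_ptep_get_and_clear(), so move that implementation into
asm-generic/hugetlb.h as the generic version. arm64, mips, parisc, powerpc
and sparc keep their own versions and define
__HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR.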
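
As a rough illustration of what the generic helper does, the sketch below
forwards a "huge" get-and-clear to the base-page operation, which returns the
old entry and leaves the slot empty. Everything prefixed demo_ or DEMO_ is a
hypothetical stand-in; the sketch drops the mm and addr parameters and ignores
atomicity, both of which the real helpers have to deal with.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_pte_t;	/* hypothetical stand-in for pte_t */

/* Stand-in for ptep_get_and_clear(): hand back the old entry, clear the slot. */
static inline demo_pte_t demo_ptep_get_and_clear(demo_pte_t *ptep)
{
	demo_pte_t old = *ptep;

	*ptep = 0;
	return old;
}

/* Generic version: used unless an arch has announced its own. */
#ifndef DEMO_HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline demo_pte_t demo_huge_ptep_get_and_clear(demo_pte_t *ptep)
{
	return demo_ptep_get_and_clear(ptep);
}
#endif

static inline void demo_self_check(void)
{
	demo_pte_t pte = 0x1234;

	assert(demo_huge_ptep_get_and_clear(&pte) == 0x1234);
	assert(pte == 0);
}
```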
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
arch/arm/include/asm/hugetlb-3level.h | 6 ------
arch/arm64/include/asm/hugetlb.h | 1 +
arch/ia64/include/asm/hugetlb.h | 6 ------
arch/mips/include/asm/hugetlb.h | 1 +
arch/parisc/include/asm/hugetlb.h | 1 +
arch/powerpc/include/asm/hugetlb.h | 1 +
arch/sh/include/asm/hugetlb.h | 6 ------
arch/sparc/include/asm/hugetlb.h | 1 +
arch/x86/include/asm/hugetlb.h | 6 ------
include/asm-generic/hugetlb.h | 8 ++++++++
10 files changed, 13 insertions(+), 24 deletions(-)
diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index 398fb06e8207..ad36e84b819a 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -49,12 +49,6 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
ptep_set_wrprotect(mm, addr, ptep);
}
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
- return ptep_get_and_clear(mm, addr, ptep);
-}
-
static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t pte, int dirty)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 874661a1dff1..6ae0bcafe162 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -66,6 +66,7 @@ extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep,
pte_t pte, int dirty);
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index a235d6f60fb3..6719c74da0de 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,12 +20,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
}
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
- return ptep_get_and_clear(mm, addr, ptep);
-}
-
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 8ea439041d5d..0959cc5a41fa 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -36,6 +36,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 77c8adbac7c3..6e281e1bb336 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -8,6 +8,7 @@
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep);
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index ba7d5d8b543f..ec3e0c2e78f8 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -132,6 +132,7 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index bc552e37c1c9..08ee6c00b5e9 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -25,12 +25,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
- return ptep_get_and_clear(mm, addr, ptep);
-}
-
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 16b0c53ea6c9..944e3a4bfaff 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -16,6 +16,7 @@ extern struct pud_huge_patch_entry __pud_huge_patch, __pud_huge_patch_end;
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep);
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 554d5614b375..48b8d9b13cc6 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -27,12 +27,6 @@ static inline int prepare_hugepage_range(struct file *file,
return 0;
}
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
- unsigned long addr, pte_t *ptep)
-{
- return ptep_get_and_clear(mm, addr, ptep);
-}
-
static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index ee010b756246..0f6f151780dd 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -57,4 +57,12 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
}
#endif
+#ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
+static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
+{
+ return ptep_get_and_clear(mm, addr, ptep);
+}
+#endif
+
#endif /* _ASM_GENERIC_HUGETLB_H */
--
2.16.2
* [PATCH 01/11] hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
From: Alexandre Ghiti @ 2018-07-04 5:51 UTC (permalink / raw)
To: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
paul.burton, jhogan, jejb, deller, benh, paulus, mpe, ysato,
dalias, davem, tglx, mingo, hpa, x86, arnd, linux-arm-kernel,
linux-kernel, linux-ia64, linux-mips, linux-parisc, linuxppc-dev,
linux-sh, sparclinux, linux-arch
Cc: Alexandre Ghiti
In-Reply-To: <20180704055207.27978-1-alex@ghiti.fr>
asm-generic/hugetlb.h provides generic implementations of hugetlb-related
functions. Use __HAVE_ARCH_HUGE* defines so that arch-specific
implementations of hugetlb functions are declared the same way as in the
pgtable.h scheme.
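
The two guard styles involved can be compared side by side in a small
userspace sketch. The names demo_clear_a, demo_clear_b and
DEMO_HAVE_ARCH_CLEAR_B are hypothetical. Style A is the self-named macro that
this patch removes for huge_pte_clear; style B is the separate
__HAVE_ARCH_*-like define it switches to. In both cases the "arch" version
wins and the fallback is compiled out.

```c
#include <assert.h>

/* Style A: the override defines a macro with the function's own name. */
static inline int demo_clear_a(void) { return 1; }	/* arch version */
#define demo_clear_a demo_clear_a

#ifndef demo_clear_a
static inline int demo_clear_a(void) { return 0; }	/* generic fallback */
#endif

/* Style B: the override defines a separate marker macro. */
#define DEMO_HAVE_ARCH_CLEAR_B
static inline int demo_clear_b(void) { return 1; }	/* arch version */

#ifndef DEMO_HAVE_ARCH_CLEAR_B
static inline int demo_clear_b(void) { return 0; }	/* generic fallback */
#endif

static inline void demo_self_check(void)
{
	/* Both styles select the arch version here. */
	assert(demo_clear_a() == 1);
	assert(demo_clear_b() == 1);
}
```

Functionally the two are equivalent; the change is about using one convention
throughout.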
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
arch/arm64/include/asm/hugetlb.h | 2 +-
include/asm-generic/hugetlb.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index e73f68569624..3fcf14663dfa 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -81,9 +81,9 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTE_CLEAR
extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz);
-#define huge_pte_clear huge_pte_clear
extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned long sz);
#define set_huge_swap_pte_at set_huge_swap_pte_at
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 9d0cde8ab716..3da7cff52360 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -32,7 +32,7 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
}
-#ifndef huge_pte_clear
+#ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long sz)
{
--
2.16.2
* [PATCH kernel v3 3/6] KVM: PPC: Make iommu_table::it_userspace big endian
From: Alexey Kardashevskiy @ 2018-07-04 6:13 UTC (permalink / raw)
To: linuxppc-dev
Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, kvm, Alex Williamson,
Benjamin Herrenschmidt, Michael Ellerman, Russell Currey
In-Reply-To: <20180704061349.20742-1-aik@ozlabs.ru>
We are going to reuse the multilevel TCE code for the userspace copy of
the TCE table. Since that code operates on big-endian entries, let's make
the copy big endian too.
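
A portable userspace sketch of the store/load discipline this implies: every
write to an entry converts from CPU order to big endian, and every read
converts back. demo_be64 and the demo_* helpers are hypothetical stand-ins for
__be64, cpu_to_be64() and be64_to_cpu(); the kernel helpers are not
implemented this way, this only shows the byte order they guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t demo_be64;	/* hypothetical stand-in for __be64 */

/* Lay the value out most-significant byte first, whatever the host order. */
static inline demo_be64 demo_cpu_to_be64(uint64_t v)
{
	unsigned char b[8];
	demo_be64 out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (unsigned char)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Reassemble a CPU-order value from the big-endian byte layout. */
static inline uint64_t demo_be64_to_cpu(demo_be64 v)
{
	unsigned char b[8];
	uint64_t out = 0;
	int i;

	memcpy(b, &v, sizeof(b));
	for (i = 0; i < 8; i++)
		out = (out << 8) | b[i];
	return out;
}

/* Mirrors "*pua = cpu_to_be64(ua)" and "be64_to_cpu(*pua)" in the patch. */
static inline void demo_set_entry(demo_be64 *pua, uint64_t ua)
{
	*pua = demo_cpu_to_be64(ua);
}

static inline uint64_t demo_get_entry(const demo_be64 *pua)
{
	return demo_be64_to_cpu(*pua);
}

static inline void demo_self_check(void)
{
	demo_be64 entry;

	demo_set_entry(&entry, 0x1122334455667788ULL);
	assert(demo_get_entry(&entry) == 0x1122334455667788ULL);
}
```

Zero has the same representation in either byte order, so the
"*pua = cpu_to_be64(0)" assignments in the diff do not change the stored
value; presumably they are written that way to keep the __be64 typing
consistent.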
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
arch/powerpc/include/asm/iommu.h | 2 +-
arch/powerpc/kvm/book3s_64_vio.c | 11 ++++++-----
arch/powerpc/kvm/book3s_64_vio_hv.c | 10 +++++-----
drivers/vfio/vfio_iommu_spapr_tce.c | 19 +++++++++----------
4 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 20febe0..803ac70 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -117,7 +117,7 @@ struct iommu_table {
unsigned long *it_map; /* A simple allocation bitmap for now */
unsigned long it_page_shift;/* table iommu page size */
struct list_head it_group_list;/* List of iommu_table_group_link */
- unsigned long *it_userspace; /* userspace view of the table */
+ __be64 *it_userspace; /* userspace view of the table */
struct iommu_table_ops *it_ops;
struct kref it_kref;
};
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8167ce8..6f34edd 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -377,19 +377,19 @@ static long kvmppc_tce_iommu_mapped_dec(struct kvm *kvm,
{
struct mm_iommu_table_group_mem_t *mem = NULL;
const unsigned long pgsize = 1ULL << tbl->it_page_shift;
- unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+ __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
if (!pua)
/* it_userspace allocation might be delayed */
return H_TOO_HARD;
- mem = mm_iommu_lookup(kvm->mm, *pua, pgsize);
+ mem = mm_iommu_lookup(kvm->mm, be64_to_cpu(*pua), pgsize);
if (!mem)
return H_TOO_HARD;
mm_iommu_mapped_dec(mem);
- *pua = 0;
+ *pua = cpu_to_be64(0);
return H_SUCCESS;
}
@@ -436,7 +436,8 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
enum dma_data_direction dir)
{
long ret;
- unsigned long hpa, *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+ unsigned long hpa;
+ __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
struct mm_iommu_table_group_mem_t *mem;
if (!pua)
@@ -463,7 +464,7 @@ long kvmppc_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
if (dir != DMA_NONE)
kvmppc_tce_iommu_mapped_dec(kvm, tbl, entry);
- *pua = ua;
+ *pua = cpu_to_be64(ua);
return 0;
}
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 5b298f5..841aef7 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -200,7 +200,7 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
{
struct mm_iommu_table_group_mem_t *mem = NULL;
const unsigned long pgsize = 1ULL << tbl->it_page_shift;
- unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+ __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
if (!pua)
/* it_userspace allocation might be delayed */
@@ -210,13 +210,13 @@ static long kvmppc_rm_tce_iommu_mapped_dec(struct kvm *kvm,
if (WARN_ON_ONCE_RM(!pua))
return H_HARDWARE;
- mem = mm_iommu_lookup_rm(kvm->mm, *pua, pgsize);
+ mem = mm_iommu_lookup_rm(kvm->mm, be64_to_cpu(*pua), pgsize);
if (!mem)
return H_TOO_HARD;
mm_iommu_mapped_dec(mem);
- *pua = 0;
+ *pua = cpu_to_be64(0);
return H_SUCCESS;
}
@@ -268,7 +268,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
{
long ret;
unsigned long hpa = 0;
- unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+ __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
struct mm_iommu_table_group_mem_t *mem;
if (!pua)
@@ -303,7 +303,7 @@ static long kvmppc_rm_tce_iommu_do_map(struct kvm *kvm, struct iommu_table *tbl,
if (dir != DMA_NONE)
kvmppc_rm_tce_iommu_mapped_dec(kvm, tbl, entry);
- *pua = ua;
+ *pua = cpu_to_be64(ua);
return 0;
}
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 7cd63b0..17a418c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -230,7 +230,7 @@ static long tce_iommu_userspace_view_alloc(struct iommu_table *tbl,
decrement_locked_vm(mm, cb >> PAGE_SHIFT);
return -ENOMEM;
}
- tbl->it_userspace = uas;
+ tbl->it_userspace = (__be64 *) uas;
return 0;
}
@@ -482,20 +482,20 @@ static void tce_iommu_unuse_page_v2(struct tce_container *container,
struct mm_iommu_table_group_mem_t *mem = NULL;
int ret;
unsigned long hpa = 0;
- unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
+ __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry);
if (!pua)
return;
- ret = tce_iommu_prereg_ua_to_hpa(container, *pua, tbl->it_page_shift,
- &hpa, &mem);
+ ret = tce_iommu_prereg_ua_to_hpa(container, be64_to_cpu(*pua),
+ tbl->it_page_shift, &hpa, &mem);
if (ret)
- pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
- __func__, *pua, entry, ret);
+ pr_debug("%s: tce %llx at #%lx was not cached, ret=%d\n",
+ __func__, be64_to_cpu(*pua), entry, ret);
if (mem)
mm_iommu_mapped_dec(mem);
- *pua = 0;
+ *pua = cpu_to_be64(0);
}
static int tce_iommu_clear(struct tce_container *container,
@@ -607,8 +607,7 @@ static long tce_iommu_build_v2(struct tce_container *container,
for (i = 0; i < pages; ++i) {
struct mm_iommu_table_group_mem_t *mem = NULL;
- unsigned long *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl,
- entry + i);
+ __be64 *pua = IOMMU_TABLE_USERSPACE_ENTRY(tbl, entry + i);
ret = tce_iommu_prereg_ua_to_hpa(container,
tce, tbl->it_page_shift, &hpa, &mem);
@@ -642,7 +641,7 @@ static long tce_iommu_build_v2(struct tce_container *container,
if (dirtmp != DMA_NONE)
tce_iommu_unuse_page_v2(container, tbl, entry + i);
- *pua = tce;
+ *pua = cpu_to_be64(tce);
tce += IOMMU_PAGE_SIZE(tbl);
}
--
2.11.0