Linux Documentation
* [PATCH RFC V2 4/6] ARM: bcm2835_defconfig: Enable RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 11:21 UTC
  To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
	Jonathan Corbet, Eric Anholt
  Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
	bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
	linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>

The patch enables the hwmon driver for the Raspberry Pi.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 arch/arm/configs/bcm2835_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/configs/bcm2835_defconfig b/arch/arm/configs/bcm2835_defconfig
index e4d188f..e9bc889 100644
--- a/arch/arm/configs/bcm2835_defconfig
+++ b/arch/arm/configs/bcm2835_defconfig
@@ -86,7 +86,7 @@ CONFIG_SPI=y
 CONFIG_SPI_BCM2835=y
 CONFIG_SPI_BCM2835AUX=y
 CONFIG_GPIO_SYSFS=y
-# CONFIG_HWMON is not set
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
 CONFIG_THERMAL=y
 CONFIG_BCM2835_THERMAL=y
 CONFIG_WATCHDOG=y
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH RFC V2 2/6] hwmon: Add support for RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 11:21 UTC
  To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
	Jonathan Corbet, Eric Anholt
  Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
	bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
	linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren,
	Noralf Trønnes
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>

Currently there is no easy way to detect undervoltage conditions on a
remote Raspberry Pi. This hwmon driver retrieves the state of the
undervoltage sensor via the mailbox interface. The handling is based on
Noralf's modifications to the downstream firmware driver. In case of
an undervoltage condition, only an entry is written to the kernel log.

CC: "Noralf Trønnes" <noralf@tronnes.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 Documentation/hwmon/raspberrypi-hwmon |  22 +++++
 drivers/hwmon/Kconfig                 |  10 ++
 drivers/hwmon/Makefile                |   1 +
 drivers/hwmon/raspberrypi-hwmon.c     | 168 ++++++++++++++++++++++++++++++++++
 4 files changed, 201 insertions(+)
 create mode 100644 Documentation/hwmon/raspberrypi-hwmon
 create mode 100644 drivers/hwmon/raspberrypi-hwmon.c

diff --git a/Documentation/hwmon/raspberrypi-hwmon b/Documentation/hwmon/raspberrypi-hwmon
new file mode 100644
index 0000000..3c92e2c
--- /dev/null
+++ b/Documentation/hwmon/raspberrypi-hwmon
@@ -0,0 +1,22 @@
+Kernel driver raspberrypi-hwmon
+===============================
+
+Supported boards:
+  * Raspberry Pi A+ (via GPIO on SoC)
+  * Raspberry Pi B+ (via GPIO on SoC)
+  * Raspberry Pi 2 B (via GPIO on SoC)
+  * Raspberry Pi 3 B (via GPIO on port expander)
+  * Raspberry Pi 3 B+ (via PMIC)
+
+Author: Stefan Wahren <stefan.wahren@i2se.com>
+
+Description
+-----------
+
+This driver periodically polls a mailbox property of the VC4 firmware to detect
+undervoltage conditions.
+
+Sysfs entries
+-------------
+
+in0_lcrit_alarm		Undervoltage alarm
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 768aed5..9a5bdb0 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1298,6 +1298,16 @@ config SENSORS_PWM_FAN
 	  This driver can also be built as a module.  If so, the module
 	  will be called pwm-fan.
 
+config SENSORS_RASPBERRYPI_HWMON
+	tristate "Raspberry Pi voltage monitor"
+	depends on RASPBERRYPI_FIRMWARE || COMPILE_TEST
+	help
+	  If you say yes here you get support for voltage sensor on the
+	  Raspberry Pi.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called raspberrypi-hwmon.
+
 config SENSORS_SHT15
 	tristate "Sensiron humidity and temperature sensors. SHT15 and compat."
 	depends on GPIOLIB || COMPILE_TEST
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index e7d52a3..a929770 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -141,6 +141,7 @@ obj-$(CONFIG_SENSORS_PC87427)	+= pc87427.o
 obj-$(CONFIG_SENSORS_PCF8591)	+= pcf8591.o
 obj-$(CONFIG_SENSORS_POWR1220)  += powr1220.o
 obj-$(CONFIG_SENSORS_PWM_FAN)	+= pwm-fan.o
+obj-$(CONFIG_SENSORS_RASPBERRYPI_HWMON)	+= raspberrypi-hwmon.o
 obj-$(CONFIG_SENSORS_S3C)	+= s3c-hwmon.o
 obj-$(CONFIG_SENSORS_SCH56XX_COMMON)+= sch56xx-common.o
 obj-$(CONFIG_SENSORS_SCH5627)	+= sch5627.o
diff --git a/drivers/hwmon/raspberrypi-hwmon.c b/drivers/hwmon/raspberrypi-hwmon.c
new file mode 100644
index 0000000..6233e84
--- /dev/null
+++ b/drivers/hwmon/raspberrypi-hwmon.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Raspberry Pi voltage sensor driver
+ *
+ * Based on firmware/raspberrypi.c by Noralf Trønnes
+ *
+ * Copyright (C) 2018 Stefan Wahren <stefan.wahren@i2se.com>
+ */
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/hwmon.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <soc/bcm2835/raspberrypi-firmware.h>
+
+#define UNDERVOLTAGE_STICKY_BIT	BIT(16)
+
+struct rpi_hwmon_data {
+	struct device *hwmon_dev;
+	struct rpi_firmware *fw;
+	u32 last_throttled;
+	struct delayed_work get_values_poll_work;
+};
+
+static void rpi_firmware_get_throttled(struct rpi_hwmon_data *data)
+{
+	u32 new_uv, old_uv, value;
+	int ret;
+
+	/* Request firmware to clear sticky bits */
+	value = 0xffff;
+
+	ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
+				    &value, sizeof(value));
+	if (ret) {
+		dev_err_once(data->hwmon_dev, "Failed to get throttled (%d)\n",
+			     ret);
+		return;
+	}
+
+	new_uv = value & UNDERVOLTAGE_STICKY_BIT;
+	old_uv = data->last_throttled & UNDERVOLTAGE_STICKY_BIT;
+	data->last_throttled = value;
+
+	if (new_uv == old_uv)
+		return;
+
+	if (new_uv)
+		dev_crit(data->hwmon_dev, "Undervoltage detected!\n");
+	else
+		dev_info(data->hwmon_dev, "Voltage normalised\n");
+
+	sysfs_notify(&data->hwmon_dev->kobj, NULL, "in0_lcrit_alarm");
+}
+
+static void get_values_poll(struct work_struct *work)
+{
+	struct rpi_hwmon_data *data;
+
+	data = container_of(work, struct rpi_hwmon_data,
+			    get_values_poll_work.work);
+
+	rpi_firmware_get_throttled(data);
+
+	/*
+	 * We can't run faster than the sticky shift (100ms) since we get
+	 * flipping in the sticky bits that are cleared.
+	 */
+	schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
+}
+
+static int rpi_read(struct device *dev, enum hwmon_sensor_types type,
+		    u32 attr, int channel, long *val)
+{
+	struct rpi_hwmon_data *data = dev_get_drvdata(dev);
+
+	*val = !!(data->last_throttled & UNDERVOLTAGE_STICKY_BIT);
+	return 0;
+}
+
+static umode_t rpi_is_visible(const void *_data, enum hwmon_sensor_types type,
+			      u32 attr, int channel)
+{
+	return 0444;
+}
+
+static const u32 rpi_in_config[] = {
+	HWMON_I_LCRIT_ALARM,
+	0
+};
+
+static const struct hwmon_channel_info rpi_in = {
+	.type = hwmon_in,
+	.config = rpi_in_config,
+};
+
+static const struct hwmon_channel_info *rpi_info[] = {
+	&rpi_in,
+	NULL
+};
+
+static const struct hwmon_ops rpi_hwmon_ops = {
+	.is_visible = rpi_is_visible,
+	.read = rpi_read,
+};
+
+static const struct hwmon_chip_info rpi_chip_info = {
+	.ops = &rpi_hwmon_ops,
+	.info = rpi_info,
+};
+
+static int rpi_hwmon_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct rpi_hwmon_data *data;
+	int ret;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->fw = platform_get_drvdata(to_platform_device(dev->parent));
+	if (!data->fw)
+		return -EPROBE_DEFER;
+
+	ret = rpi_firmware_property(data->fw, RPI_FIRMWARE_GET_THROTTLED,
+				    &data->last_throttled,
+				    sizeof(data->last_throttled));
+	if (ret)
+		return -ENODEV;
+
+	data->hwmon_dev = devm_hwmon_device_register_with_info(dev, "rpi_volt",
+							       data,
+							       &rpi_chip_info,
+							       NULL);
+
+	INIT_DELAYED_WORK(&data->get_values_poll_work, get_values_poll);
+	platform_set_drvdata(pdev, data);
+
+	if (!PTR_ERR_OR_ZERO(data->hwmon_dev))
+		schedule_delayed_work(&data->get_values_poll_work, 2 * HZ);
+
+	return PTR_ERR_OR_ZERO(data->hwmon_dev);
+}
+
+static int rpi_hwmon_remove(struct platform_device *pdev)
+{
+	struct rpi_hwmon_data *data = platform_get_drvdata(pdev);
+
+	cancel_delayed_work_sync(&data->get_values_poll_work);
+
+	return 0;
+}
+
+static struct platform_driver rpi_hwmon_driver = {
+	.probe = rpi_hwmon_probe,
+	.remove = rpi_hwmon_remove,
+	.driver = {
+		.name = "raspberrypi-hwmon",
+	},
+};
+module_platform_driver(rpi_hwmon_driver);
+
+MODULE_AUTHOR("Stefan Wahren <stefan.wahren@i2se.com>");
+MODULE_DESCRIPTION("Raspberry Pi voltage sensor driver");
+MODULE_LICENSE("GPL v2");
-- 
2.7.4
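
The change detection in rpi_firmware_get_throttled() above can be
illustrated outside the kernel. The following is a minimal userspace
sketch, not the driver code: the enum uv_event and the function
uv_update() are hypothetical names introduced here, while the bit
position and the compare-then-cache order are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 16 of the GET_THROTTLED reply, as defined in the patch. */
#define UNDERVOLTAGE_STICKY_BIT	(UINT32_C(1) << 16)

/* Hypothetical event type; the driver itself only logs and notifies. */
enum uv_event { UV_NONE, UV_DETECTED, UV_NORMALISED };

/*
 * Compare the sticky undervoltage bit of a new firmware reply against
 * the cached reply, update the cache, and report the transition.
 */
static enum uv_event uv_update(uint32_t *last_throttled, uint32_t value)
{
	uint32_t new_uv, old_uv;

	assert(last_throttled != NULL);

	new_uv = value & UNDERVOLTAGE_STICKY_BIT;
	old_uv = *last_throttled & UNDERVOLTAGE_STICKY_BIT;
	*last_throttled = value;

	if (new_uv == old_uv)
		return UV_NONE;

	return new_uv ? UV_DETECTED : UV_NORMALISED;
}
```

Only a change of the sticky bit produces an event, which is why the
driver logs once per transition rather than on every poll.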


* [PATCH RFC V2 3/6] firmware: raspberrypi: Register hwmon driver
From: Stefan Wahren @ 2018-05-22 11:21 UTC
  To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
	Jonathan Corbet, Eric Anholt
  Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
	bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
	linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>

Since the raspberrypi-hwmon driver is tied to the VC4 firmware instead of
particular hardware its registration should be in the firmware driver.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 drivers/firmware/raspberrypi.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c
index 6692888f..0602626 100644
--- a/drivers/firmware/raspberrypi.c
+++ b/drivers/firmware/raspberrypi.c
@@ -21,6 +21,8 @@
 #define MBOX_DATA28(msg)		((msg) & ~0xf)
 #define MBOX_CHAN_PROPERTY		8
 
+static struct platform_device *rpi_hwmon;
+
 struct rpi_firmware {
 	struct mbox_client cl;
 	struct mbox_chan *chan; /* The property channel. */
@@ -183,6 +185,20 @@ rpi_firmware_print_firmware_revision(struct rpi_firmware *fw)
 	}
 }
 
+static void
+rpi_register_hwmon_driver(struct device *dev, struct rpi_firmware *fw)
+{
+	u32 packet;
+	int ret = rpi_firmware_property(fw, RPI_FIRMWARE_GET_THROTTLED,
+					&packet, sizeof(packet));
+
+	if (ret)
+		return;
+
+	rpi_hwmon = platform_device_register_data(dev, "raspberrypi-hwmon",
+						  -1, NULL, 0);
+}
+
 static int rpi_firmware_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -209,6 +225,7 @@ static int rpi_firmware_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, fw);
 
 	rpi_firmware_print_firmware_revision(fw);
+	rpi_register_hwmon_driver(dev, fw);
 
 	return 0;
 }
@@ -217,6 +234,8 @@ static int rpi_firmware_remove(struct platform_device *pdev)
 {
 	struct rpi_firmware *fw = platform_get_drvdata(pdev);
 
+	platform_device_unregister(rpi_hwmon);
+	rpi_hwmon = NULL;
 	mbox_free_channel(fw->chan);
 
 	return 0;
-- 
2.7.4


* [PATCH RFC V2 6/6] arm64: defconfig: Enable RPi voltage sensor
From: Stefan Wahren @ 2018-05-22 11:21 UTC
  To: Rob Herring, Mark Rutland, Jean Delvare, Guenter Roeck,
	Jonathan Corbet, Eric Anholt
  Cc: Florian Fainelli, Ray Jui, Scott Branden, Phil Elwell,
	bcm-kernel-feedback-list, devicetree, linux-arm-kernel,
	linux-rpi-kernel, linux-hwmon, linux-doc, Stefan Wahren
In-Reply-To: <1526988112-4021-1-git-send-email-stefan.wahren@i2se.com>

The patch enables the hwmon driver for the Raspberry Pi.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index d25121b..5cdecef 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -352,6 +352,7 @@ CONFIG_BATTERY_BQ27XXX=y
 CONFIG_SENSORS_ARM_SCPI=y
 CONFIG_SENSORS_LM90=m
 CONFIG_SENSORS_INA2XX=m
+CONFIG_SENSORS_RASPBERRYPI_HWMON=m
 CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y
 CONFIG_CPU_THERMAL=y
 CONFIG_THERMAL_EMULATION=y
-- 
2.7.4
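
With CONFIG_SENSORS_RASPBERRYPI_HWMON=m the driver is built as the
raspberrypi-hwmon module. Once it is loaded, the in0_lcrit_alarm
attribute documented in patch 2/6 should be readable from userspace.
A minimal sketch, assuming the usual hwmon class layout
(/sys/class/hwmon/hwmonN/in0_lcrit_alarm, where N varies per system);
read_alarm() is a hypothetical helper and not part of this series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read a hwmon alarm attribute. Returns 1 if the alarm is set, 0 if it
 * is clear, and -1 if the file cannot be opened or parsed.
 */
static int read_alarm(const char *path)
{
	FILE *f;
	int value = 0;
	int parsed;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	parsed = (fscanf(f, "%d", &value) == 1);
	fclose(f);

	if (!parsed)
		return -1;

	return value != 0;
}
```

The caller has to find the right hwmonN directory first, for example
by looking for the one whose "name" attribute reads "rpi_volt".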


* Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets
From: Leo Yan @ 2018-05-22  9:52 UTC
  To: Robert Walker
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-doc, linux-kernel,
	Tor Jeremiassen, mike.leach, kim.phillips, coresight, Mike Leach
In-Reply-To: <20180522083920.GD31075@leoy-ThinkPad-X240s>

On Tue, May 22, 2018 at 04:39:20PM +0800, Leo Yan wrote:

[...]

Rather than the patch I posted in my previous email, I think the new
patch below is more reasonable.

With the change below, 'etmq->prev_packet' is only used to store the
previous CS_ETM_RANGE packet; we don't need to save the CS_ETM_TRACE_ON
packet into 'etmq->prev_packet'.

On the other hand, cs_etm__flush() can use 'etmq->period_instructions'
to decide whether an instruction sample needs to be generated.  If it
is non-zero, an instruction sample is generated and
'etmq->period_instructions' is cleared; so the next time another
CS_ETM_TRACE_ON packet arrives, generating the instruction sample is
skipped because 'etmq->period_instructions' is zero.

What do you think about this?

Thanks,
Leo Yan


diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..dd354ad 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
 {
 	/*
+	 * The packet is the start tracing packet if the end_addr is zero,
+	 * returns 0 for this case.
+	 */
+	if (!packet->end_addr)
+		return 0;
+
+	/*
 	 * The packet records the execution range with an exclusive end address
 	 *
 	 * A64 instructions are constant size, so the last executed
@@ -897,13 +904,27 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		etmq->period_instructions = instrs_over;
 	}
 
-	if (etm->sample_branches &&
-	    etmq->prev_packet &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-	    etmq->prev_packet->last_instr_taken_branch) {
-		ret = cs_etm__synth_branch_sample(etmq);
-		if (ret)
-			return ret;
+	if (etm->sample_branches && etmq->prev_packet) {
+		bool generate_sample = false;
+
+		/* Generate sample for start tracing packet */
+		if (etmq->prev_packet->sample_type == 0)
+			generate_sample = true;
+
+		/* Generate sample for exception packet */
+		if (etmq->prev_packet->exc == true)
+			generate_sample = true;
+
+		/* Generate sample for normal branch packet */
+		if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+		    etmq->prev_packet->last_instr_taken_branch)
+			generate_sample = true;
+
+		if (generate_sample) {
+			ret = cs_etm__synth_branch_sample(etmq);
+			if (ret)
+				return ret;
+		}
 	}
 
 	if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -922,11 +943,12 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 static int cs_etm__flush(struct cs_etm_queue *etmq)
 {
 	int err = 0;
-	struct cs_etm_packet *tmp;
 
 	if (etmq->etm->synth_opts.last_branch &&
 	    etmq->prev_packet &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    etmq->period_instructions) {
+
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -940,14 +962,6 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
 			etmq, addr,
 			etmq->period_instructions);
 		etmq->period_instructions = 0;
-
-		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 */
-		tmp = etmq->packet;
-		etmq->packet = etmq->prev_packet;
-		etmq->prev_packet = tmp;
 	}
 
 	return err;
-- 
2.7.4
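
The gating on 'etmq->period_instructions' described in this mail can be
modelled in isolation. This is a simplified sketch, not the perf code:
struct flush_model and flush_emits_sample() are hypothetical stand-ins,
the packet type enum is illustrative, and the NULL check on prev_packet
is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the packet sample types. */
enum pkt_type { PKT_NONE, PKT_RANGE, PKT_TRACE_ON };

struct flush_model {
	bool last_branch;		/* models synth_opts.last_branch */
	enum pkt_type prev_type;	/* models prev_packet->sample_type */
	uint64_t period_instructions;
};

/*
 * Returns true if a flush would synthesize an instruction sample.
 * Clearing period_instructions makes a second flush a no-op.
 */
static bool flush_emits_sample(struct flush_model *q)
{
	assert(q != NULL);

	if (!q->last_branch || q->prev_type != PKT_RANGE ||
	    !q->period_instructions)
		return false;

	/* The real code synthesizes the instruction sample here. */
	q->period_instructions = 0;
	return true;
}
```

In this model, two consecutive flushes with no range packet in between
produce exactly one sample.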

* Re: [RFT v2 1/4] perf cs-etm: Generate sample for missed packets
From: Leo Yan @ 2018-05-22  8:39 UTC
  To: Robert Walker
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-doc, linux-kernel,
	Tor Jeremiassen, mike.leach, kim.phillips, coresight, Mike Leach
In-Reply-To: <bc320d6d-a21d-b5a4-e30d-511bb69aba32@arm.com>

Hi Rob,

On Mon, May 21, 2018 at 12:27:42PM +0100, Robert Walker wrote:
> Hi Leo,
> 
> On 21/05/18 09:52, Leo Yan wrote:
> >Commit e573e978fb12 ("perf cs-etm: Inject capabilitity for CoreSight
> >traces") reworks the samples generation flow from CoreSight trace to
> >match the correct format so Perf report tool can display the samples
> >properly.  But the change has side effect for packet handling, it only
> >generate samples when 'prev_packet->last_instr_taken_branch' is true,
> >this results in the start tracing packet and exception packets are
> >dropped.
> >
> >This patch checks extra two conditions for complete samples:
> >
> >- If 'prev_packet->sample_type' is zero we can use this condition to
> >   get to know this is the start tracing packet; for this case, the start
> >   packet's end_addr is zero as well so we need to handle it in the
> >   function cs_etm__last_executed_instr();
> >
> 
> I think you also need to add something in to handle discontinuities in
> trace - for example it is possible to configure the ETM to only trace
> execution in specific code regions or to trace a few cycles every so
> often. In these cases, prev_packet->sample_type will not be zero, but
> whatever the previous packet was.  You will get a CS_ETM_TRACE_ON packet in
> such cases, generated by an I_TRACE_ON element in the trace stream.
> You also get this on exception return.
> 
> However, you should also keep the test for prev_packet->sample_type == 0
> as you may not see a CS_ETM_TRACE_ON when decoding a buffer that has
> wrapped.

Thanks for reviewing.  Let's dig into this issue in more detail;
especially for handling the CS_ETM_TRACE_ON packet, I'd like to divide
it into two sub-cases.

- The first case is using the python script:

  I use the python script to analyze packets with the command below:
  ./perf script --itrace=ril128 -s arm-cs-trace-disasm.py -F cpu,event,ip,addr,sym -- -v -d objdump -k ./vmlinux

  What I observe is that once we pass the python script with the
  parameter '-s arm-cs-trace-disasm.py', the instruction tracing option
  '--itrace=ril128' isn't really used; the perf tool creates another
  new process to launch the python script and re-enters the
  cmd_script() function, but the second time cmd_script() is invoked
  for the python script execution, the option '--itrace=ril128' is
  dropped and the only valid parameters are those defined by the
  python script.

  As a result, I can see that the variable
  'etmq->etm->synth_opts.last_branch' is always FALSE when running the
  python script.  So all CS_ETM_TRACE_ON packets will be ignored in
  the function cs_etm__flush().

  Even though the CS_ETM_TRACE_ON packets are not handled, the
  program flow still works well.  The reason is that, without
  interference from CS_ETM_TRACE_ON, the CS_ETM_RANGE packets can
  smoothly create instruction ranges by ignoring the CS_ETM_TRACE_ON
  packet in the middle.

  Please see the example below, which has 3 packets.  The first
  packet is a CS_ETM_RANGE packet labelled 'PACKET_1'; it properly
  generates branch sample data with the previous packet, as expected.
  The second packet, PACKET_2, is CS_ETM_TRACE_ON, but
  'etmq->etm->synth_opts.last_branch' is false, so the function
  cs_etm__flush() doesn't handle it and skips the swap operation
  "etmq->prev_packet = tmp".  The third packet, PACKET_3, is a
  CS_ETM_RANGE packet, and we can see that it smoothly creates a
  continuous instruction range between PACKET_1 and PACKET_3.

  cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f79c end_addr=0xffff000008a5f7bc last_instr_taken_branch=1
  PACKET_1: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
  cs_etm__synth_branch_sample: ip=0xffff000008a5f7b8 addr=0xffff000008a5f858 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=1 flags=0 cpumode=2

  cs_etm__flush: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
  PACKET_2: cs_etm__flush: packet: sample_type=2 exc=0 exc_ret=0 cpu=2 start_addr=0xdeadbeefdeadbeef end_addr=0xdeadbeefdeadbeef last_instr_taken_branch=1

  cs_etm__sample: prev_packet: sample_type=1 exc=0 exc_ret=0 cpu=1 start_addr=0xffff000008a5f858 end_addr=0xffff000008a5f864 last_instr_taken_branch=1
  PACKET_3: cs_etm__sample: packet: sample_type=1 exc=0 exc_ret=0 cpu=2 start_addr=0xffff000008be7528 end_addr=0xffff000008be7538 last_instr_taken_branch=1
  cs_etm__synth_branch_sample: ip=0xffff000008a5f860 addr=0xffff000008be7528 pid=2290 tid=2290 id=1000000021 stream_id=1000000021 period=1 cpu=2 flags=0 cpumode=2

  So it seems to me that the CS_ETM_TRACE_ON packet doesn't cause
  trouble for the program flow analysis, as long as we handle all
  CS_ETM_RANGE packets and don't handle the CS_ETM_TRACE_ON packet
  for branch samples.

- The second case is the --itrace option without the python script:
  ./perf script --itrace=ril -F cpu,event,ip,addr,sym -k ./vmlinux

  In this case, the flag 'etmq->etm->synth_opts.last_branch' is true,
  so the CS_ETM_TRACE_ON packet will be handled; but I can observe
  that the CS_ETM_RANGE packet in etmq->prev_packet isn't handled in
  the function cs_etm__flush() for branch samples, so we actually miss
  some branch samples in this case.

  So I think we also need to handle the CS_ETM_RANGE packet in the
  function cs_etm__flush() to generate branch samples.  But this has a
  side effect: we introduce extra tracking of the CS_ETM_TRACE_ON
  packet for branch samples, so we will see one branch range like:
  [ 0xdeadbeefdeadbeef .. 0xdeadbeefdeadbeef ].
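
The ip/addr pairs in the cs_etm__synth_branch_sample lines of the first
case's log appear to follow directly from the packet addresses: ip is
the last instruction executed in the previous range (its exclusive end
address minus one 4-byte A64 instruction), and addr is the start of the
current range. A small sketch under that assumption; the struct and
function names are hypothetical, and it includes the end_addr == 0
special case from the patch in this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define A64_INSTR_SIZE	4	/* A64 instructions are a fixed 4 bytes */

struct range_pkt {
	uint64_t start_addr;
	uint64_t end_addr;	/* exclusive */
};

struct branch_sample {
	uint64_t ip;		/* branch source */
	uint64_t addr;		/* branch target */
};

/* Address of the last instruction in the range, or 0 for a start packet. */
static uint64_t last_executed_instr(const struct range_pkt *p)
{
	assert(p != NULL);

	if (!p->end_addr)
		return 0;

	return p->end_addr - A64_INSTR_SIZE;
}

/* Build the branch sample that links the previous range to the current one. */
static struct branch_sample make_branch_sample(const struct range_pkt *prev,
					       const struct range_pkt *cur)
{
	struct branch_sample s;

	assert(prev != NULL && cur != NULL);

	s.ip = last_executed_instr(prev);
	s.addr = cur->start_addr;
	return s;
}
```

Feeding in the prev_packet and PACKET_1 addresses from the log gives
ip=0xffff000008a5f7b8 and addr=0xffff000008a5f858, matching the logged
sample.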

Please review whether the change below is okay for you.  Thanks a lot
for any suggestions.

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 822ba91..37d3722 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -495,6 +495,13 @@ static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
 static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
 {
 	/*
+	 * The packet is the start tracing packet if the end_addr is zero,
+	 * returns 0 for this case.
+	 */
+	if (!packet->end_addr)
+		return 0;
+
+	/*
 	 * The packet records the execution range with an exclusive end address
 	 *
 	 * A64 instructions are constant size, so the last executed
@@ -897,13 +904,28 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 		etmq->period_instructions = instrs_over;
 	}
 
-	if (etm->sample_branches &&
-	    etmq->prev_packet &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
-	    etmq->prev_packet->last_instr_taken_branch) {
-		ret = cs_etm__synth_branch_sample(etmq);
-		if (ret)
-			return ret;
+	if (etm->sample_branches && etmq->prev_packet) {
+		bool generate_sample = false;
+
+		/* Generate sample for start tracing packet */
+		if (etmq->prev_packet->sample_type == 0 ||
+		    etmq->prev_packet->sample_type == CS_ETM_TRACE_ON)
+			generate_sample = true;
+
+		/* Generate sample for exception packet */
+		if (etmq->prev_packet->exc == true)
+			generate_sample = true;
+
+		/* Generate sample for normal branch packet */
+		if (etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+		    etmq->prev_packet->last_instr_taken_branch)
+			generate_sample = true;
+
+		if (generate_sample) {
+			ret = cs_etm__synth_branch_sample(etmq);
+			if (ret)
+				return ret;
+		}
 	}
 
 	if (etm->sample_branches || etm->synth_opts.last_branch) {
@@ -921,12 +943,17 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 
 static int cs_etm__flush(struct cs_etm_queue *etmq)
 {
+	struct cs_etm_auxtrace *etm = etmq->etm;
 	int err = 0;
 	struct cs_etm_packet *tmp;
 
-	if (etmq->etm->synth_opts.last_branch &&
-	    etmq->prev_packet &&
-	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+	if (!etmq->prev_packet)
+		return 0;
+
+	if (etmq->prev_packet->sample_type != CS_ETM_RANGE)
+		return 0;
+
+	if (etmq->etm->synth_opts.last_branch) {
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -939,18 +966,25 @@ static int cs_etm__flush(struct cs_etm_queue *etmq)
 		err = cs_etm__synth_instruction_sample(
 			etmq, addr,
 			etmq->period_instructions);
+		if (err)
+			return err;
 		etmq->period_instructions = 0;
+	}
 
-		/*
-		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
-		 * the next incoming packet.
-		 */
-		tmp = etmq->packet;
-		etmq->packet = etmq->prev_packet;
-		etmq->prev_packet = tmp;
+	if (etm->sample_branches) {
+		err = cs_etm__synth_branch_sample(etmq);
+		if (err)
+			return err;
 	}
 
-	return err;
+	/*
+	 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+	 * the next incoming packet.
+	 */
+	tmp = etmq->packet;
+	etmq->packet = etmq->prev_packet;
+	etmq->prev_packet = tmp;
+	return 0;
 }
 
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
-- 
2.7.4
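
The decision of when cs_etm__sample() emits a branch sample after this
change can be summarised as a predicate on the previous packet. This is
a sketch, not the perf code: struct pkt, struct queue_model and
want_branch_sample() are hypothetical, and the enum values mirror the
sample_type values seen in the logs above (1 for CS_ETM_RANGE, 2 for
CS_ETM_TRACE_ON).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pkt_type { PKT_START = 0, PKT_RANGE = 1, PKT_TRACE_ON = 2 };

struct pkt {
	enum pkt_type sample_type;
	bool exc;			/* exception packet */
	bool last_instr_taken_branch;
};

struct queue_model {
	bool sample_branches;		/* models etm->sample_branches */
	const struct pkt *prev;		/* models etmq->prev_packet */
};

/* Returns true if a branch sample would be generated. */
static bool want_branch_sample(const struct queue_model *q)
{
	const struct pkt *prev;

	assert(q != NULL);

	prev = q->prev;
	if (!q->sample_branches || !prev)
		return false;

	/* Start of trace, or trace restarted after a discontinuity. */
	if (prev->sample_type == PKT_START ||
	    prev->sample_type == PKT_TRACE_ON)
		return true;

	/* Exception packet. */
	if (prev->exc)
		return true;

	/* Normal taken branch at the end of a range. */
	return prev->sample_type == PKT_RANGE &&
	       prev->last_instr_taken_branch;
}
```

Before the change, only the last of these conditions produced a
sample, which is why the start tracing and exception packets were
dropped.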

* Re: Documentation/translations: Italian
From: Federico Vaga @ 2018-05-22  8:39 UTC
  To: Jonathan Corbet; +Cc: linux-doc, amantegazza
In-Reply-To: <20180521170035.7e75f54b@lwn.net>

On Tuesday, 22 May 2018 01:00:35 CEST Jonathan Corbet wrote:
> On Mon, 21 May 2018 22:54:18 +0200
> 
> Federico Vaga <federico.vaga@vaga.pv.it> wrote:
> > I'm writing you because I would like to start an effort to
> > translate the Documentation in Italian. I would like also to
> > express the idea of providing guide lines for translations.
> 
> Mi sembra un'ottima idea! :) [It seems like an excellent idea to me!]

Siamo sulla stessa lunghezza d'onda :) [We are on the same wavelength]
 
> > I know that there are already translations for Asian languages but
> > I am not able to find the history of them. I do not know if
> > translations in European languages are going to be accepted
> > (perhaps there is the assumption that everyone knows English in
> > the European continent and it is a waste of energy to do
> > translations[?]). For example, even if French and Germans are
> > quite active there are not translations yet in their language: is
> > there a particular reason or simply nobody did it?
> 
> Nobody has done it.  There certainly is no policy against
> translations to any specific language - that would be hard to
> justify, to say the least.
> 
> OK, I might draw the line at Klingon.  But the discussion of error
> handling in Klingon could actually be a lot of fun.
> 
> I'm happy to accept new translations of stuff in the documentation
> directory.  In general, I've had two concerns about translations:
> they are generally impossible for me to review, and there needs to
> be somebody committed to keeping the translations current as the
> documentation changes.  For Italian, the first problem doesn't
> exist, but the second is always there. What are your intentions for
> maintaining the translations in the long term?

I can maintain the Italian translation. 
 
> > If you agree with the need to support different translations, I
> > would like to do the Italian one. But first I would like to open
> > a little discussion about translations  "how to write
> > translations"; this discussion should produce a document (in
> > English) with guide lines for translator (e.g. Documentation/
> > translation/howto.rst): what to translate first, what to NOT
> > translate, how to structure it.
> > Once this is defined I will start the Italian translation (I
> > already have some documents translated).
> 
> This can be a fine plan, assuming we're convinced that the
> guidelines document is really needed.  I guess I'm not yet
> convinced of that.  But you might also consider gaining some
> experience in writing, merging, and maintaining a translation
> before trying to lay down rules for everybody else.  In other
> words, I think you might want to do things in the opposite order.

You are right, probably I was over-engineering this thing :)

> 
> > How to do translations (IMHO)
> > -----------------------------
> > Here my personal guide lines for translations
> > 
> > - Translate only sphinx-ready documents, do not translate
> > documents which are not yet sphinx. We should avoid useless
> > double work; at some point, I guess, everything will be sphinx.
> 
> I wouldn't insist on that.  But a better idea in any case would be:
> if a document you want to translate isn't yet in RST, just do the
> conversion. The amount of work required is usually quite small.

ok

> > - Include in all documents a disclaimer saying that English is the
> > main reference (use sphinx directive 'include' to include it).
> > - Include in all documents a reference to the English version. So
> > it will be easy jump to the original document.
> 
> Remember that the docs need to be readable *without* Sphinx
> processing. Better to just name the source document in a quick line
> at the top, IMO.

ok

> > - Translate in order: non-technical documents (they are stable,
> > useful for a wider group of people (developers and managers):
> > process/, doc-guide/ ), technical documents about key concepts
> > (they are stable, and important for new-comers), subsystems (the
> > big picture is stable, typically they do not describe all little
> > details that may change), and then other documents
> If you want to work in that order, that is more than fine.  Others
> have agreed - the process docs tend to get translated first.  But
> if somebody else wants to start elsewhere, I wouldn't try to tell
> them not to.
> 
> Anyway, thanks for wanting to help improve the documentation!  If
> you have some of this work already done, you might want to consider
> going ahead and posting some patches.

I will review them and push something in the next few days.

-- 
Federico Vaga
http://www.federicovaga.it/



* Re: [PATCH 4/5] acpi/processor: Fix the return value of acpi_processor_ids_walk()
From: Dou Liyang @ 2018-05-22  1:47 UTC
  To: Thomas Gleixner
  Cc: LKML, x86, linux-acpi, linux-doc, Ingo Molnar, Jonathan Corbet,
	Rafael J. Wysocki, Len Brown, H. Peter Anvin, Peter Zijlstra
In-Reply-To: <alpine.DEB.2.21.1805191704150.1599@nanos.tec.linutronix.de>



At 05/19/2018 11:06 PM, Thomas Gleixner wrote:
> On Tue, 20 Mar 2018, Dou Liyang wrote:
> 
>> The ACPI driver should make sure all the processor IDs in its ACPI namespace
>> are unique for CPU hotplug. The driver performs a depth-first walk of the
>> namespace tree and calls acpi_processor_ids_walk().
>>
>> However, acpi_processor_ids_walk() returns true once one processor has been
>> checked, which causes the walk to stop after the first processor.
>>
>> Replace the value with AE_OK, which is the standard acpi_status value.
>>
>> Fixes 8c8cb30f49b8 ("acpi/processor: Implement DEVICE operator for processor enumeration")
>>
>> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
>> ---
>>   drivers/acpi/acpi_processor.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index 449d86d39965..db5bdb59639c 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -663,11 +663,11 @@ static acpi_status __init (acpi_handle handle,
>>   	}
>>   
>>   	processor_validated_ids_update(uid);
>> -	return true;
>> +	return AE_OK;
>>   
>>   err:
>>   	acpi_handle_info(handle, "Invalid processor object\n");
>> -	return false;
>> +	return AE_OK;
> 
> I'm not sure whether this is the right return value here. Rafael?
> 
Hi, Thomas, Rafael,

Yes, I used AE_OK so that the walk skips invalid objects and continues
with the remaining ones, but I'm also not sure it is the right value.
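
To illustrate why the original "return true" stops the walk, here is a
minimal userspace sketch. It assumes a walker that aborts on the first
non-zero status, which is how the patch description says the namespace walk
behaves. All names (walk_ids, ST_OK, count_buggy, ...) are hypothetical
stand-ins and not the ACPICA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for acpi_status values (assumption: zero means
 * success and any non-zero value is treated as a failure by the walker). */
typedef unsigned int status_t;
#define ST_OK 0u

typedef status_t (*walk_cb)(int id, void *ctx);

/* Simplified flat "walk": stops at the first non-OK status. */
status_t walk_ids(const int *ids, size_t n, walk_cb cb, void *ctx)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		status_t st = cb(ids[i], ctx);

		if (st != ST_OK)
			return st;
	}
	return ST_OK;
}

/* Buggy callback: "true" converts to 1, which the walker reads as an error. */
status_t count_buggy(int id, void *ctx)
{
	size_t *count = ctx;

	(void)id;
	(*count)++;
	return 1; /* what "return true" becomes */
}

/* Fixed callback: reports success so the walk continues. */
status_t count_fixed(int id, void *ctx)
{
	size_t *count = ctx;

	(void)id;
	(*count)++;
	return ST_OK;
}

/* Returns how many IDs the walker visited with the given callback. */
size_t visited(walk_cb cb)
{
	static const int ids[] = { 0, 1, 2, 3 };
	size_t count = 0;

	walk_ids(ids, sizeof(ids) / sizeof(ids[0]), cb, &count);
	return count;
}
```

With these definitions, visited(count_buggy) sees only the first ID, while
visited(count_fixed) sees all four.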

For this bug, I recently sent another patch that removes this check
entirely.

    https://lkml.org/lkml/2018/5/17/320

IMO, duplicate IDs can already be avoided by this other check:

    if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) ---- 1)

As the mapping between cpu_id (pr->id) and processor_id is fixed, when
hot-plugging a physical CPU whose processor_id duplicates that of a
present CPU, condition 1) above will be 0, and Linux will not add this
CPU.

Also, this code is executed every time the system starts, so it wastes
more time as the number of CPUs increases.

So I prefer to remove this code.

Thanks,
	dou



^ permalink raw reply

* Re: Documentation/translations: Italian
From: Jonathan Corbet @ 2018-05-21 23:00 UTC (permalink / raw)
  To: Federico Vaga; +Cc: linux-doc, amantegazza
In-Reply-To: <152696186.kHulSgLgWb@harkonnen>

On Mon, 21 May 2018 22:54:18 +0200
Federico Vaga <federico.vaga@vaga.pv.it> wrote:

> I'm writing you because I would like to start an effort to translate the 
> Documentation in Italian. I would like also to express the idea of providing 
> guide lines for translations.

Mi sembra un'ottima idea! :)

> I know that there are already translations for Asian languages but I am not 
> able to find the history of them. I do not know if translations in European 
> languages are going to be accepted (perhaps there is the assumption that 
> everyone knows English in the European continent and it is a waste of energy 
> to do translations[?]). For example, even if French and Germans are quite 
> active there are not translations yet in their language: is there a particular 
> reason or simply nobody did it?

Nobody has done it.  There certainly is no policy against translations to
any specific language - that would be hard to justify, to say the least.

OK, I might draw the line at Klingon.  But the discussion of error handling
in Klingon could actually be a lot of fun.

I'm happy to accept new translations of stuff in the documentation
directory.  In general, I've had two concerns about translations: they are
generally impossible for me to review, and there needs to be somebody
committed to keeping the translations current as the documentation
changes.  For Italian, the first problem doesn't exist, but the second is
always there. What are your intentions for maintaining the translations in
the long term?

> If you agree with the need to support different translations, I would like to 
> do the Italian one. But first I would like to open a little discussion about 
> translations  "how to write translations"; this discussion should produce a 
> document (in English) with guide lines for translator (e.g. Documentation/
> translation/howto.rst): what to translate first, what to NOT translate, how to 
> structure it.
> Once this is defined I will start the Italian translation (I already have some 
> documents translated).

This can be a fine plan, assuming we're convinced that the guidelines
document is really needed.  I guess I'm not yet convinced of that.  But you
might also consider gaining some experience in writing, merging, and
maintaining a translation before trying to lay down rules for everybody
else.  In other words, I think you might want to do things in the opposite
order.

> How to do translations (IMHO)
> -----------------------------
> Here my personal guide lines for translations
> 
> - Translate only sphinx-ready documents, do not translate documents which are 
> not yet sphinx. We should avoid useless double work; at some point, I guess, 
> everything will be sphinx.

I wouldn't insist on that.  But a better idea in any case would be: if a
document you want to translate isn't yet in RST, just do the conversion.
The amount of work required is usually quite small.

> - Include in all documents a disclaimer saying that English is the main 
> reference (use sphinx directive 'include' to include it).
> - Include in all documents a reference to the English version. So it will be 
> easy jump to the original document.

Remember that the docs need to be readable *without* Sphinx processing.
Better to just name the source document in a quick line at the top, IMO.

> - Translate in order: non-technical documents (they are stable, useful for a 
> wider group of people (developers and managers): process/, doc-guide/ ), 
> technical documents about key concepts (they are stable, and important for 
> new-comers), subsystems (the big picture is stable, typically they do not 
> describe all little details that may change), and then other documents

If you want to work in that order, that is more than fine.  Others have
agreed - the process docs tend to get translated first.  But if somebody
else wants to start elsewhere, I wouldn't try to tell them not to.

Anyway, thanks for wanting to help improve the documentation!  If you have
some of this work already done, you might want to consider going ahead and
posting some patches.

Thanks,

jon

^ permalink raw reply

* Documentation/translations: Italian
From: Federico Vaga @ 2018-05-21 20:54 UTC (permalink / raw)
  To: linux-doc, Jonathan Corbet; +Cc: amantegazza

Hello,

I'm writing to you because I would like to start an effort to translate the
Documentation into Italian. I would also like to propose providing
guidelines for translations.

I looked a bit in the archive but I did not find anything about these two
topics (Italian translation, guidelines for translations).

I know that there are already translations for Asian languages but I am not 
able to find the history of them. I do not know if translations in European 
languages are going to be accepted (perhaps there is the assumption that 
everyone knows English in the European continent and it is a waste of energy 
to do translations[?]). For example, even though the French and Germans are
quite active, there are no translations yet in their languages: is there a
particular reason, or has simply nobody done it?

Why
===
There is nothing better for understanding than our own mother tongue, and 
reading Documentation is one of those activities where it is important to 
understand its message rather than learning a different language (there are 
dedicated books and courses for that). This is especially true for young 
developers and new-comers who are really focused on understanding Linux and a 
different language can be an obstacle sometimes. I personally had a couple of
experiences where I pointed people to the documentation and I had to explain
English rather than Linux. They were very competent people, but they were not
used to using English every day.

I count myself among the people who prefer their mother tongue when it is
time to really understand something. I work for an international
organization in a country that is not mine, with people coming from all around
the European continent, and our common tongue is bad-English with all its
dialects and accents: true-English (with its own dialects), spaghetti-English,
kartoffel-English, paella-English, fromage-English and more. Misunderstanding
is not rare, and sometimes expressing ourselves takes more time than needed.
This is another reason why I believe that, for the purpose of understanding,
it is good to read in our own mother tongue.

Plan
====
If you agree with the need to support different translations, I would like to
do the Italian one. But first I would like to open a little discussion about
"how to write translations"; this discussion should produce a document (in
English) with guidelines for translators (e.g.
Documentation/translation/howto.rst): what to translate first, what NOT to
translate, and how to structure it.
Once this is defined I will start the Italian translation (I already have some
documents translated).

How to do translations (IMHO)
-----------------------------
Here are my personal guidelines for translations:

- Translate only Sphinx-ready documents; do not translate documents which are
not yet in Sphinx format. We should avoid useless double work; at some point,
I guess, everything will be Sphinx.
- Include in all documents a disclaimer saying that English is the main
reference (use the Sphinx directive 'include' to include it).
- Include in all documents a reference to the English version, so it will be
easy to jump to the original document.
- Translate in order: non-technical documents (they are stable and useful for
a wider group of people (developers and managers): process/, doc-guide/ ),
technical documents about key concepts (they are stable, and important for
newcomers), subsystems (the big picture is stable; typically they do not
describe all the little details that may change), and then other documents.
- Avoid scattered translations: try to finish one "topic" before translating
something else.

Probably there is much more, that's why I would like to have a little 
discussion about it.


Thanks for reading everything :)

-- 
Federico Vaga
http://www.federicovaga.it/



^ permalink raw reply

* Re: [PATCH v5 3/5] i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller
From: Wolfram Sang @ 2018-05-21 20:49 UTC (permalink / raw)
  To: Karthikeyan Ramasubramanian
  Cc: corbet, andy.gross, david.brown, robh+dt, mark.rutland, linux-doc,
	linux-arm-msm, devicetree, linux-i2c, evgreen, acourbot, swboyd,
	dianders, Sagar Dharia, Girish Mahadevan
In-Reply-To: <1521836461-6515-4-git-send-email-kramasub@codeaurora.org>


Hi,

On Fri, Mar 23, 2018 at 02:20:59PM -0600, Karthikeyan Ramasubramanian wrote:
> This bus driver supports the GENI based i2c hardware controller in the
> Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable
> module supporting a wide range of serial interfaces including I2C. The
> driver supports FIFO mode and DMA mode of transfer and switches modes
> dynamically depending on the size of the transfer.
> 
> Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
> Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
> Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>

Is one of these people interested in maintaining this driver? Then, an
entry for MAINTAINERS would be needed, too. (Same goes for
drivers/soc/qcom/ IMHO, but this is not my realm, so just saying)

> +static const struct geni_i2c_err_log gi2c_log[] = {
> +	[GP_IRQ0] = {-EINVAL, "Unknown I2C err GP_IRQ0"},
> +	[NACK] = {-ENOTCONN, "NACK: slv unresponsive, check its power/reset-ln"},
> +	[GP_IRQ2] = {-EINVAL, "Unknown I2C err GP IRQ2"},
> +	[BUS_PROTO] = {-EPROTO, "Bus proto err, noisy/unepxected start/stop"},
> +	[ARB_LOST] = {-EBUSY, "Bus arbitration lost, clock line undriveable"},
> +	[GP_IRQ5] = {-EINVAL, "Unknown I2C err GP IRQ5"},
> +	[GENI_OVERRUN] = {-EIO, "Cmd overrun, check GENI cmd-state machine"},
> +	[GENI_ILLEGAL_CMD] = {-EILSEQ, "Illegal cmd, check GENI cmd-state machine"},
> +	[GENI_ABORT_DONE] = {-ETIMEDOUT, "Abort after timeout successful"},
> +	[GENI_TIMEOUT] = {-ETIMEDOUT, "I2C TXN timed out"},
> +};

Please check Documentation/i2c/fault-codes for better -ERRNO values,
especially for NACK and ARB_LOST.
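
As a sketch of what such a table could look like: the version below assumes
the recommendations in the fault-codes document are -ENXIO for a missing
ACK in the address phase and -EAGAIN for lost arbitration. The enum and
table names are hypothetical and trimmed down, not the driver's real
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down error classes; not the driver's real enum. */
enum sketch_i2c_err {
	SK_NACK,
	SK_BUS_PROTO,
	SK_ARB_LOST,
	SK_TIMEOUT,
	SK_ERR_MAX
};

struct sketch_err_log {
	int err;
	const char *msg;
};

/* Assumed mapping: NACK -> -ENXIO, lost arbitration -> -EAGAIN. */
static const struct sketch_err_log sketch_log[SK_ERR_MAX] = {
	[SK_NACK]      = { -ENXIO,     "NACK: slave unresponsive" },
	[SK_BUS_PROTO] = { -EPROTO,    "Bus protocol error" },
	[SK_ARB_LOST]  = { -EAGAIN,    "Bus arbitration lost" },
	[SK_TIMEOUT]   = { -ETIMEDOUT, "Transfer timed out" },
};

/* Look up the errno value for one error class. */
int sketch_errno_for(enum sketch_i2c_err e)
{
	assert((unsigned int)e < SK_ERR_MAX);
	return sketch_log[e].err;
}
```

A caller would then return sketch_errno_for(SK_NACK) from its transfer
routine instead of a hand-picked value.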

Rest looks good from a glimpse.

Thanks,

   Wolfram



^ permalink raw reply

* [v4 08/11] Documentation: hwmon: Add documents for PECI hwmon client drivers
From: Jae Hyun Yoo @ 2018-05-21 19:59 UTC (permalink / raw)
  To: Jean Delvare, Guenter Roeck, Jonathan Corbet, linux-hwmon,
	linux-doc, linux-kernel
  Cc: Jae Hyun Yoo, Jason M Bills, Randy Dunlap

This commit adds hwmon documents for PECI cputemp and dimmtemp drivers.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
Reviewed-by: Haiyue Wang <haiyue.wang@linux.intel.com>
Reviewed-by: James Feist <james.feist@linux.intel.com>
Reviewed-by: Vernon Mauery <vernon.mauery@linux.intel.com>
Cc: Jason M Bills <jason.m.bills@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
 Documentation/hwmon/peci-cputemp  | 78 +++++++++++++++++++++++++++++++
 Documentation/hwmon/peci-dimmtemp | 50 ++++++++++++++++++++
 2 files changed, 128 insertions(+)
 create mode 100644 Documentation/hwmon/peci-cputemp
 create mode 100644 Documentation/hwmon/peci-dimmtemp

diff --git a/Documentation/hwmon/peci-cputemp b/Documentation/hwmon/peci-cputemp
new file mode 100644
index 000000000000..821a9258f2e6
--- /dev/null
+++ b/Documentation/hwmon/peci-cputemp
@@ -0,0 +1,78 @@
+Kernel driver peci-cputemp
+==========================
+
+Supported chips:
+	One of the Intel server CPUs listed below that is connected to a PECI bus.
+		* Intel Xeon E5/E7 v3 server processors
+			Intel Xeon E5-14xx v3 family
+			Intel Xeon E5-24xx v3 family
+			Intel Xeon E5-16xx v3 family
+			Intel Xeon E5-26xx v3 family
+			Intel Xeon E5-46xx v3 family
+			Intel Xeon E7-48xx v3 family
+			Intel Xeon E7-88xx v3 family
+		* Intel Xeon E5/E7 v4 server processors
+			Intel Xeon E5-16xx v4 family
+			Intel Xeon E5-26xx v4 family
+			Intel Xeon E5-46xx v4 family
+			Intel Xeon E7-48xx v4 family
+			Intel Xeon E7-88xx v4 family
+		* Intel Xeon Scalable server processors
+			Intel Xeon Bronze family
+			Intel Xeon Silver family
+			Intel Xeon Gold family
+			Intel Xeon Platinum family
+	Addresses scanned: PECI client address 0x30 - 0x37
+	Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author:
+	Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
+accessible using the PECI Client Command Suite via the processor PECI client.
+
+All temperature values are given in millidegrees Celsius and will be measurable
+only when the target CPU is powered on.
+
+sysfs attributes
+----------------
+
+temp1_label		"Die"
+temp1_input		Provides current die temperature of the CPU package.
+temp1_max		Provides thermal control temperature of the CPU package
+			which is also known as Tcontrol.
+temp1_crit		Provides shutdown temperature of the CPU package which
+			is also known as the maximum processor junction
+			temperature, Tjmax or Tprochot.
+temp1_crit_hyst		Provides the hysteresis value from Tcontrol to Tjmax of
+			the CPU package.
+
+temp2_label		"Tcontrol"
+temp2_input		Provides current Tcontrol temperature of the CPU
+			package which is also known as Fan Temperature target.
+			Indicates the relative value from thermal monitor trip
+			temperature at which fans should be engaged.
+temp2_crit		Provides Tcontrol critical value of the CPU package
+			which is the same as Tjmax.
+
+temp3_label		"Tthrottle"
+temp3_input		Provides current Tthrottle temperature of the CPU
+			package, used as the throttling temperature. If this
+			value is allowed and lower than Tjmax, throttling will
+			occur and be reported below Tjmax.
+
+temp4_label		"Tjmax"
+temp4_input		Provides the maximum junction temperature, Tjmax of the
+			CPU package.
+
+temp[5-*]_label		Provides string "Core X", where X is resolved core
+			number.
+temp[5-*]_input		Provides current temperature of each core.
+temp[5-*]_max		Provides thermal control temperature of the core.
+temp[5-*]_crit		Provides shutdown temperature of the core.
+temp[5-*]_crit_hyst	Provides the hysteresis value from Tcontrol to Tjmax of
+			the core.
diff --git a/Documentation/hwmon/peci-dimmtemp b/Documentation/hwmon/peci-dimmtemp
new file mode 100644
index 000000000000..c54f2526188c
--- /dev/null
+++ b/Documentation/hwmon/peci-dimmtemp
@@ -0,0 +1,50 @@
+Kernel driver peci-dimmtemp
+===========================
+
+Supported chips:
+	One of the Intel server CPUs listed below that is connected to a PECI bus.
+		* Intel Xeon E5/E7 v3 server processors
+			Intel Xeon E5-14xx v3 family
+			Intel Xeon E5-24xx v3 family
+			Intel Xeon E5-16xx v3 family
+			Intel Xeon E5-26xx v3 family
+			Intel Xeon E5-46xx v3 family
+			Intel Xeon E7-48xx v3 family
+			Intel Xeon E7-88xx v3 family
+		* Intel Xeon E5/E7 v4 server processors
+			Intel Xeon E5-16xx v4 family
+			Intel Xeon E5-26xx v4 family
+			Intel Xeon E5-46xx v4 family
+			Intel Xeon E7-48xx v4 family
+			Intel Xeon E7-88xx v4 family
+		* Intel Xeon Scalable server processors
+			Intel Xeon Bronze family
+			Intel Xeon Silver family
+			Intel Xeon Gold family
+			Intel Xeon Platinum family
+	Addresses scanned: PECI client address 0x30 - 0x37
+	Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author:
+	Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of DIMM components that are accessible
+using the PECI Client Command Suite via the processor PECI client.
+
+All temperature values are given in millidegrees Celsius and will be measurable
+only when the target CPU is powered on.
+
+sysfs attributes
+----------------
+
+temp[N]_label		Provides string "DIMM CI", where C is DIMM channel and
+			I is DIMM index of the populated DIMM.
+temp[N]_input		Provides current temperature of the populated DIMM.
+
+Note:
+	DIMM temperature attributes will appear when the client CPU's BIOS
+	completes memory training and testing.
-- 
2.17.0


^ permalink raw reply related

* [v4 02/11] Documentation: ioctl: Add ioctl numbers for PECI subsystem
From: Jae Hyun Yoo @ 2018-05-21 19:58 UTC (permalink / raw)
  To: Jonathan Corbet, Martin K . Petersen, Darrick J . Wong,
	Greg Kroah-Hartman, Bryant G . Ly, Michael Ellerman,
	Tomohiro Kusumi, Frederic Barrat, Eric Sandeen, Arnd Bergmann,
	Matthew R . Ochs, linux-doc, linux-kernel
  Cc: Jae Hyun Yoo, James Feist, Jason M Bills, Vernon Mauery

This commit updates ioctl-number.txt to reflect ioctl numbers used
by the PECI subsystem.

Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
Cc: James Feist <james.feist@linux.intel.com>
Cc: Jason M Bills <jason.m.bills@linux.intel.com>
Cc: Vernon Mauery <vernon.mauery@linux.intel.com>
---
 Documentation/ioctl/ioctl-number.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 480c8609dc58..1670ca4072b2 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -322,6 +322,8 @@ Code  Seq#(hex)	Include File		Comments
 0xB3	00	linux/mmc/ioctl.h
 0xB4	00-0F	linux/gpio.h		<mailto:linux-gpio@vger.kernel.org>
 0xB5	00-0F	uapi/linux/rpmsg.h	<mailto:linux-remoteproc@vger.kernel.org>
+0xB6	00-0F	uapi/linux/peci-ioctl.h	PECI subsystem
+					<mailto:jae.hyun.yoo@linux.intel.com>
 0xC0	00-0F	linux/usb/iowarrior.h
 0xCA	00-0F	uapi/misc/cxl.h
 0xCA	10-2F	uapi/misc/ocxl.h
-- 
2.17.0


^ permalink raw reply related

* [PATCH v4 00/11] PECI device driver introduction
From: Jae Hyun Yoo @ 2018-05-21 19:56 UTC (permalink / raw)
  To: Alan Cox, Andrew Jeffery, Andrew Lunn, Andy Shevchenko,
	Arnd Bergmann, Benjamin Herrenschmidt, Fengguang Wu, Greg KH,
	Guenter Roeck, Haiyue Wang, James Feist, Jason M Bills,
	Jean Delvare, Joel Stanley, Julia Cartwright, Miguel Ojeda,
	Milton Miller II, Pavel Machek, Randy Dunlap, Ryan Chen,
	Stef van Os, Sumeet R Pawnikar, Vernon Mauery
  Cc: linux-kernel, linux-doc, devicetree, linux-hwmon,
	linux-arm-kernel, linux-aspeed, openbmc, Jae Hyun Yoo

Introduction of the Platform Environment Control Interface (PECI) bus
device driver. PECI is a one-wire bus interface that provides a
communication channel between an Intel processor and chipset components to
external monitoring or control devices. PECI is designed to support the
following sideband functions:

* Processor and DRAM thermal management
  - Processor fan speed control is managed by comparing Digital Thermal
    Sensor (DTS) thermal readings acquired via PECI against the
    processor-specific fan speed control reference point, or TCONTROL. Both
    TCONTROL and DTS thermal readings are accessible via the processor PECI
    client. These variables are referenced to a common temperature, the TCC
    activation point, and are both defined as negative offsets from that
    reference.
  - PECI based access to the processor package configuration space provides
    a means for Baseboard Management Controllers (BMC) or other platform
    management devices to actively manage the processor and memory power
    and thermal features.

* Platform Manageability
  - Platform manageability functions including thermal, power, and error
    monitoring. Note that platform 'power' management includes monitoring
    and control for both the processor and DRAM subsystem to assist with
    data center power limiting.
  - PECI allows read access to certain error registers in the processor MSR
    space and status monitoring registers in the PCI configuration space
    within the processor and downstream devices.
  - PECI permits writes to certain registers in the processor PCI
    configuration space.

* Processor Interface Tuning and Diagnostics
  - Processor interface tuning and diagnostics capabilities
    (Intel Interconnect BIST). The processor's Intel Interconnect Built In
    Self Test (Intel IBIST) allows for in-field diagnostic capabilities in
    the Intel UPI and memory controller interfaces. PECI provides a port to
    execute these diagnostics via its PCI Configuration read and write
    capabilities.

* Failure Analysis
  - Output the state of the processor after a failure for analysis via
    Crashdump.

PECI uses a single wire for self-clocking and data transfer. The bus
requires no additional control lines. The physical layer is a self-clocked
one-wire bus that begins each bit with a driven, rising edge from an idle
level near zero volts. The duration of the signal driven high depends on
whether the bit value is a logic '0' or logic '1'. PECI also includes
variable data transfer rate established with every message. In this way, it
is highly flexible even though underlying logic is simple.
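
The pulse-width encoding described above can be sketched as a small decoder.
This is only an illustration: the cover letter does not state which duration
maps to which value, so the sketch assumes a short high pulse means '0' and
a long one means '1', with the threshold at half the bit time. The function
name and parameters are hypothetical. The bit time is passed in per call
because the transfer rate is established with every message.

```c
#include <assert.h>

/*
 * Decode one bit from the measured high time of the line.
 * Assumption (not stated in the text): high for at most half the bit
 * time is a logic '0', high for longer than that is a logic '1'.
 */
int sketch_decode_bit(unsigned int high_ns, unsigned int bit_ns)
{
	assert(bit_ns > 0 && high_ns <= bit_ns);
	return high_ns > bit_ns / 2;
}
```

For example, with a 1000 ns bit time, a 250 ns high pulse decodes as 0 and a
750 ns high pulse decodes as 1 under these assumptions.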

The interface design was optimized for interfacing between an Intel
processor and chipset components in both single processor and multiple
processor environments. The single wire interface provides low board
routing overhead for the multiple load connections in the congested routing
area near the processor and chipset components. Bus speed, error checking,
and low protocol overhead provide adequate link bandwidth and reliability
to transfer critical device operating conditions and configuration
information.

This implementation provides the basic framework to add PECI extensions to
the Linux bus and device models. A hardware-specific 'Adapter' driver can
be attached to the PECI bus to provide the sideband functions described
above. It is also possible to access all devices on an adapter from
userspace through the /dev interface. A device-specific 'Client' driver can
also be attached to the PECI bus so that each processor client's features
can be supported by the 'Client' driver through an adapter connection on
the bus. This patch set includes an Aspeed 24xx/25xx PECI driver and PECI
cputemp/dimmtemp drivers as the first implementations of adapter and
client drivers on the PECI bus framework.

Please review.

Thanks,

-Jae

Changes from v3:
* Made the code simpler and more compact.
* Removed unused header file inclusion.
* Fixed incorrect error return values and messages.
* Removed DTS margin temperature from the peci-cputemp.
* Made some magic numbers use defines.
* Moved peci_get_cpu_id() into peci-core as a common function.
* Replaced the cancel_delayed_work() call with a cancel_delayed_work_sync().
* Replaced AST and Aspeed uses with ASPEED.
* Simplified peci command timeout checking logic using
  regmap_read_poll_timeout().
* Simplified endian swap codes using endian handling macros.
* Dropped regmap read/write error checking except for the first access.
* Added a PECI reset setting in the device tree node.
* Removed unnecessary sleep from the probe context.
* Removed IRQF_SHARED flag from irq request code in the ASPEED PECI driver.
* Fixed typos in documents.
* Combined peci-bus.txt, peci-adapter.txt and peci-client.txt into peci.txt.
* Fixed and swept documents to drop some incorrect or unnecessary
  descriptions.
* Fixed device tree to make unit-address format use reg contents.
* Simplified bit manipulations using <linux/bitfield.h>.
* Made client CPU model checking use <asm/intel-family.h> if available.
* Modified the adapter heap allocation method to be based on kobject
  reference counting.
* Added the low-level PECI xfer IOCTL again to support the Redfish
  requirement.
* Added PM domain attach/detach code.
* Added logic for device instantiation through sysfs.
* Fixed a bug in the interrupt status checking code in the peci-aspeed driver.

Changes from v2:
* Divided peci-hwmon driver into two drivers, peci-cputemp and
  peci-dimmtemp.
* Added generic dt binding documents for PECI bus, adapter and client.
* Removed in_atomic() call from the PECI core driver.
* Improved PECI commands masking logic.
* Added permission check logic for PECI ioctls.
* Removed unnecessary type casts.
* Fixed some invalid error return codes.
* Added the mark_updated() function to improve update interval checking
  logic.
* Fixed a bug in populated DIMM checking function.
* Fixed some typos, grammar and style issues in documents.
* Rewrote hwmon drivers to use devm_hwmon_device_register_with_info API.
* Made the peci_match_id() function static.
* Replaced a deprecated create_singlethread_workqueue() call with an
  alloc_ordered_workqueue() call.
* Reordered local variable definitions in reversed xmas tree notation.
* Listed the client CPUs that can be supported by the peci-cputemp and
  peci-dimmtemp hwmon drivers.
* Added CPU generation detection logic which checks CPUID signature through
  PECI connection.
* Improved interrupt handling logic in the Aspeed PECI adapter driver.
* Fixed SPDX license identifier style in header files.
* Changed some macros in peci.h to static inline functions.
* Dropped sleepable context checking code in peci-core.
* Adjusted rt_mutex protection scope in peci-core.
* Moved adapter->xfer() checking code into peci_register_adapter().
* Improved PECI command retry checking logic.
* Changed ioctl base from 'P' to 0xb6 to avoid conflicts and updated
  ioctl-number.txt to reflect the ioctl number of the PECI subsystem.
* Added a comment to describe PECI retry action.
* Simplified return code handling of peci_ioctl_ping().
* Changed type of peci_ioctl_fn[] to static const.
* Fixed range checking code for valid PECI commands.
* Fixed the error return code on invalid PECI commands.
* Fixed incorrect definitions of PECI ioctl and its handling logic.
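
For readers unfamiliar with how the ioctl base ends up in a command number,
here is a minimal userspace sketch using the standard macros from
<linux/ioctl.h>. The struct, the command name and the sequence number 0 are
hypothetical and only for illustration; the real definitions live in
include/uapi/linux/peci-ioctl.h, which is not reproduced here.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* The base ("type") byte from the changelog entry above. */
#define SKETCH_PECI_IOC_BASE 0xb6

/* Hypothetical message layout, for illustration only. */
struct sketch_peci_ping_msg {
	unsigned char addr;
	unsigned char padding[3];
};

/* Hypothetical command: read/write, sequence number 0. */
#define SKETCH_PECI_IOC_PING \
	_IOWR(SKETCH_PECI_IOC_BASE, 0, struct sketch_peci_ping_msg)

/* Sanity-check that a command falls in the 0xB6, 00-0F range. */
void sketch_check_peci_cmd(unsigned int cmd)
{
	assert(_IOC_TYPE(cmd) == SKETCH_PECI_IOC_BASE);
	assert(_IOC_NR(cmd) <= 0x0f);
}
```

The type byte is what ioctl-number.txt tracks, so a command built with 'P'
would report _IOC_TYPE() == 'P' and could collide with other users of that
letter.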

Changes from v1:
* Additionally implemented a core driver to support PECI linux bus driver
  model.
* Modified the Aspeed PECI driver to be an adapter driver on the PECI
  bus.
* Modified the PECI hwmon driver to be a client driver on the PECI
  bus.
* Simplified hwmon driver attribute labels and removed redundant strings.
* Removed core_nums from device tree setting of hwmon driver and modified
  core number detection logic to check the resolved_core register in client
  CPU's local PCI configuration area.
* Removed dimm_nums from device tree setting of hwmon driver and added
  populated DIMM detection logic to support dynamic creation.
* Removed indexing gap on core temperature and DIMM temperature attributes.
* Improved hwmon registration and dynamic attribute creation logic.
* Fixed structure definitions in the PECI uapi header to use __u8,
  __u16, etc.
* Modified wait_for_completion_interruptible_timeout error handling logic
  in Aspeed PECI driver to deliver errors correctly.
* Removed low-level xfer command from ioctl and kept only high-level PECI
  command suite as ioctls.
* Fixed I/O timeout logic in Aspeed PECI driver using ktime.
* Added a function into hwmon driver to simplify update delay checking.
* Added a function into hwmon driver to convert 10.6 to millidegree.
* Dropped non-standard attributes in hwmon driver.
* Fixed the OF table for hwmon to identify it as a PECI client of the
  Intel CPU target.
* Added a maintainer of PECI subsystem into MAINTAINERS document.
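
The "10.6 to millidegree" conversion mentioned in the list above can be
sketched as follows. This assumes the raw value is a 16-bit two's-complement
number with 6 fractional bits (units of 1/64 degree); the function name is
hypothetical and this is not necessarily how the driver implements it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 16-bit signed 10.6 fixed-point temperature to millidegrees
 * Celsius. Assumption: bit 15 is the sign bit and the low 6 bits are the
 * fraction, so one LSB is 1/64 of a degree.
 */
int32_t sketch_ten_dot_six_to_millidegree(uint32_t raw)
{
	int32_t val;

	assert(raw <= 0xffff);
	/* Sign-extend from 16 bits without relying on narrowing casts. */
	val = (int32_t)(raw ^ 0x8000u) - 0x8000;
	return val * 1000 / 64;
}
```

Under these assumptions a raw value of 0x0640 (1600 / 64 = 25 degrees)
converts to 25000, and 0xffc0 (-64 / 64 = -1 degree) converts to -1000.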

Jae Hyun Yoo (11):
  dt-bindings: Add a document of PECI subsystem
  Documentation: ioctl: Add ioctl numbers for PECI subsystem
  drivers/peci: Add support for PECI bus driver core
  dt-bindings: Add a document of PECI adapter driver for ASPEED
    AST24xx/25xx SoCs
  ARM: dts: aspeed: peci: Add PECI node
  drivers/peci: Add a PECI adapter driver for Aspeed AST24xx/AST25xx
  dt-bindings: hwmon: Add documents for PECI hwmon client drivers
  Documentation: hwmon: Add documents for PECI hwmon client drivers
  drivers/hwmon: Add PECI cputemp driver
  drivers/hwmon: Add PECI dimmtemp driver
  Add maintainers for the PECI subsystem

 .../bindings/hwmon/peci-cputemp.txt           |   23 +
 .../bindings/hwmon/peci-dimmtemp.txt          |   24 +
 .../devicetree/bindings/peci/peci-aspeed.txt  |   57 +
 .../devicetree/bindings/peci/peci.txt         |   59 +
 Documentation/hwmon/peci-cputemp              |   78 +
 Documentation/hwmon/peci-dimmtemp             |   50 +
 Documentation/ioctl/ioctl-number.txt          |    2 +
 MAINTAINERS                                   |   10 +
 arch/arm/boot/dts/aspeed-g4.dtsi              |   26 +
 arch/arm/boot/dts/aspeed-g5.dtsi              |   26 +
 drivers/Kconfig                               |    2 +
 drivers/Makefile                              |    1 +
 drivers/hwmon/Kconfig                         |   28 +
 drivers/hwmon/Makefile                        |    2 +
 drivers/hwmon/peci-cputemp.c                  |  407 +++++
 drivers/hwmon/peci-dimmtemp.c                 |  300 ++++
 drivers/hwmon/peci-hwmon.c                    |  124 ++
 drivers/hwmon/peci-hwmon.h                    |   51 +
 drivers/peci/Kconfig                          |   44 +
 drivers/peci/Makefile                         |    9 +
 drivers/peci/peci-aspeed.c                    |  496 ++++++
 drivers/peci/peci-core.c                      | 1451 +++++++++++++++++
 include/linux/peci.h                          |  105 ++
 include/uapi/linux/peci-ioctl.h               |  265 +++
 24 files changed, 3640 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/peci-cputemp.txt
 create mode 100644 Documentation/devicetree/bindings/hwmon/peci-dimmtemp.txt
 create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.txt
 create mode 100644 Documentation/devicetree/bindings/peci/peci.txt
 create mode 100644 Documentation/hwmon/peci-cputemp
 create mode 100644 Documentation/hwmon/peci-dimmtemp
 create mode 100644 drivers/hwmon/peci-cputemp.c
 create mode 100644 drivers/hwmon/peci-dimmtemp.c
 create mode 100644 drivers/hwmon/peci-hwmon.c
 create mode 100644 drivers/hwmon/peci-hwmon.h
 create mode 100644 drivers/peci/Kconfig
 create mode 100644 drivers/peci/Makefile
 create mode 100644 drivers/peci/peci-aspeed.c
 create mode 100644 drivers/peci/peci-core.c
 create mode 100644 include/linux/peci.h
 create mode 100644 include/uapi/linux/peci-ioctl.h

-- 
2.17.0


* Re: [PATCH 0/3] bpf: add boot parameters for sysctl knobs
From: Alexei Starovoitov @ 2018-05-21 18:58 UTC (permalink / raw)
  To: Eugene Syromiatnikov
  Cc: netdev, linux-kernel, linux-doc, Kees Cook, Kai-Heng Feng,
	Daniel Borkmann, Alexei Starovoitov, Jonathan Corbet, Jiri Olsa,
	Jesper Dangaard Brouer
In-Reply-To: <20180521122923.GA15717@asgard.redhat.com>

On Mon, May 21, 2018 at 02:29:30PM +0200, Eugene Syromiatnikov wrote:
> Hello.
> 
> This patch set adds ability to set default values for
> kernel.unprivileged_bpf_disable, net.core.bpf_jit_harden,
> net.core.bpf_jit_kallsyms sysctl knobs as well as option to override
> them via a boot-time kernel parameter.

The commit log should explain not only 'what' is being done by the patch,
but 'why' as well.


* [PATCH] Documentation: document hung_task_panic kernel parameter
From: Omar Sandoval @ 2018-05-21 18:18 UTC (permalink / raw)
  To: Jonathan Corbet, linux-doc; +Cc: linux-kernel, kernel-team

From: Omar Sandoval <osandov@fb.com>

This parameter has been around since commit e162b39a368f ("softlockup:
decouple hung tasks check from softlockup detection") in 2009 but was
never documented.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28ecdb6d..4e37bebdc3d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1341,6 +1341,16 @@
 			x86-64 are 2M (when the CPU supports "pse") and 1G
 			(when the CPU supports the "pdpe1gb" cpuinfo flag).
 
+	hung_task_panic=
+			[KNL] Should the hung task detector generate panics.
+			Format: <integer>
+
+			A nonzero value instructs the kernel to panic when a
+			hung task is detected. The default value is controlled
+			by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
+			option. The value selected by this boot parameter can
+			be changed later by the kernel.hung_task_panic sysctl.
+
 	hvc_iucv=	[S390] Number of z/VM IUCV hypervisor console (HVC)
 			       terminal devices. Valid values: 0..8
 	hvc_iucv_allow=	[S390] Comma-separated list of z/VM user IDs.
-- 
2.17.0


* Re: [PATCH v5 3/5] i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller
From: Doug Anderson @ 2018-05-21 18:14 UTC (permalink / raw)
  To: Karthikeyan Ramasubramanian, Wolfram Sang, Andy Gross
  Cc: Jonathan Corbet, David Brown, Rob Herring, Mark Rutland,
	linux-doc, linux-arm-msm, devicetree, linux-i2c, Evan Green,
	acourbot, Stephen Boyd, Sagar Dharia, Girish Mahadevan
In-Reply-To: <CAD=FV=VGcttQ6H1-Hnqj+Jrk+eaE52bOna6NojG5DoQ--A5aEg@mail.gmail.com>

Wolfram,

On Fri, Mar 23, 2018 at 4:34 PM, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Fri, Mar 23, 2018 at 1:20 PM, Karthikeyan Ramasubramanian
> <kramasub@codeaurora.org> wrote:
>> This bus driver supports the GENI based i2c hardware controller in the
>> Qualcomm SOCs. The Qualcomm Generic Interface (GENI) is a programmable
>> module supporting a wide range of serial interfaces including I2C. The
>> driver supports FIFO mode and DMA mode of transfer and switches modes
>> dynamically depending on the size of the transfer.
>>
>> Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
>> Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
>> Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
>> ---
>>  drivers/i2c/busses/Kconfig         |  13 +
>>  drivers/i2c/busses/Makefile        |   1 +
>>  drivers/i2c/busses/i2c-qcom-geni.c | 650 +++++++++++++++++++++++++++++++++++++
>>  3 files changed, 664 insertions(+)
>
> [...]
>
>> +/*
>> + * Hardware uses the underlying formula to calculate time periods of
>> + * SCL clock cycle. Firmware uses some additional cycles excluded from the
>> + * below formula and it is confirmed that the time periods are within
>> + * specification limits.
>
> I was hoping for more than just "oh, and there's a fudge factor", but
> I guess this is the best I'm going to get?
>
>
>> +static int geni_i2c_probe(struct platform_device *pdev)
>> +{
>> +       struct geni_i2c_dev *gi2c;
>> +       struct resource *res;
>> +       u32 proto, tx_depth;
>> +       int ret;
>> +
>> +       gi2c = devm_kzalloc(&pdev->dev, sizeof(*gi2c), GFP_KERNEL);
>> +       if (!gi2c)
>> +               return -ENOMEM;
>> +
>> +       gi2c->se.dev = &pdev->dev;
>> +       gi2c->se.wrapper = dev_get_drvdata(pdev->dev.parent);
>> +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +       gi2c->se.base = devm_ioremap_resource(&pdev->dev, res);
>> +       if (IS_ERR(gi2c->se.base))
>> +               return PTR_ERR(gi2c->se.base);
>> +
>> +       gi2c->se.clk = devm_clk_get(&pdev->dev, "se");
>> +       if (IS_ERR(gi2c->se.clk)) {
>> +               ret = PTR_ERR(gi2c->se.clk);
>> +               dev_err(&pdev->dev, "Err getting SE Core clk %d\n", ret);
>> +               return ret;
>> +       }
>> +
>> +       ret = device_property_read_u32(&pdev->dev, "clock-frequency",
>> +                                                       &gi2c->clk_freq_out);
>> +       if (ret) {
>> +               /* Clock frequency not specified, so default to 100kHz. */
>> +               dev_info(&pdev->dev,
>> +                       "Bus frequency not specified, default to 100kHz.\n");
>
> If you happen to spin again, can you remove the comment since it's
> obvious from the string in the print?  It looks a lot like this code:
>
> /* Print hello, world */
> printf("hello, world\n");
>
>
> In any case, that's a pretty minor nit, so I'll add:
>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>
>
> ...assuming that the bindings and "geni" code get Acked / landed
> somewhere.  Ideally let's not land this before the geni code lands
> since if the geni API changes for some reason it'll cause us grief.

The bindings and "geni" code have landed in Andy's tree, so whenever
you get a chance it would be super if you could land this i2c driver
(assuming it looks good to you).  I know at least a few people have
been poking at this and it seems to work for basic transfers.

Thanks!

-Doug

* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: Mike Kravetz @ 2018-05-21 18:07 UTC (permalink / raw)
  To: TSUKADA Koutaro, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Jonathan Corbet, Luis R. Rodriguez, Kees Cook
  Cc: Andrew Morton, Roman Gushchin, David Rientjes, Aneesh Kumar K.V,
	Naoya Horiguchi, Anshuman Khandual, Marc-Andre Lureau,
	Punit Agrawal, Dan Williams, Vlastimil Babka, linux-doc,
	linux-kernel, linux-fsdevel, linux-mm, cgroups
In-Reply-To: <e863529b-7ce5-4fbe-8cff-581b5789a5f9@ascade.co.jp>

On 05/17/2018 09:27 PM, TSUKADA Koutaro wrote:
> Thanks to Mike Kravetz for comment on the previous version patch.
> 
> The purpose of this patch-set is to make it possible to control whether or
> not to charge surplus hugetlb pages obtained by overcommitting to memory
> cgroup. In the future, I am trying to limit the memory usage
> of applications that use both normal pages and hugetlb pages by the memory
> cgroup (not using the hugetlb cgroup).
> 
> Applications that use shared libraries like libhugetlbfs.so use both normal
> pages and hugetlb pages, but we do not know how much of each they use. Please
> suppose you want to manage the memory usage of such applications by cgroup.
> How do you set the memory cgroup and hugetlb cgroup limit when you want to
> limit memory usage to 10GB?
> 
> If you set a limit of 10GB for each, the user can use a total of 20GB of
> memory and can not limit it well. Since it is difficult to estimate the
> ratio used by user of normal pages and hugetlb pages, setting limits of 2GB
> to memory cgroup and 8GB to hugetlb cgroup is not a very good idea. In such a
> case, I thought that by using my patch-set, we could manage resources just
> by setting 10GB as the limit of memory cgroup (there is no limit to hugetlb
> cgroup).
> 
> This patch-set introduces charge_surplus_huge_pages (boolean) to
> struct hstate. If it is true, it charges to the memory cgroup to which the
> task that obtained surplus hugepages belongs. If it is false, do nothing as
> before, and the default value is false. The charge_surplus_huge_pages can
> be controlled via procfs or sysfs interfaces.
> 
> Since THP is very effective in environments with kernel page size of 4KB,
> such as x86, there is no reason to positively use HugeTLBfs, so I think
> that there is no situation to enable charge_surplus_huge_pages. However, in
> some distributions such as arm64, the page size of the kernel is 64KB, and
> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
> may support multiple huge page sizes, and in such a special environment
> there is a desire to use HugeTLBfs.

One of the basic questions/concerns I have is accounting for surplus huge
pages in the default memory resource controller.  The existing hugetlb
resource controller already takes hugetlbfs huge pages into account,
including surplus pages.  This series would allow surplus pages to be
accounted for in the default memory controller, or the hugetlb controller
or both.

I understand that current mechanisms do not meet the needs of the above
use case.  The question is whether this is an appropriate way to approach
the issue.  My cgroup experience and knowledge is extremely limited, but
it does not appear that any other resource can be controlled by multiple
controllers.  Therefore, I am concerned that this may be going against
basic cgroup design philosophy.

It would be good to get comments from people more knowledgeable about
cgroups, and especially from those involved in the decision to do separate
hugetlb control.

-- 
Mike Kravetz

> 
> The patch set is for 4.17.0-rc3+. I don't know whether the patch-set is
> acceptable or not, so I have just done a simple test.
> 
> Thanks,
> Tsukada
> 
> TSUKADA Koutaro (7):
>   hugetlb: introduce charge_surplus_huge_pages to struct hstate
>   hugetlb: supports migrate charging for surplus hugepages
>   memcg: use compound_order rather than hpage_nr_pages
>   mm, sysctl: make charging surplus hugepages controllable
>   hugetlb: add charge_surplus_hugepages attribute
>   Documentation, hugetlb: describe about charge_surplus_hugepages
>   memcg: supports movement of surplus hugepages statistics
> 
>  Documentation/vm/hugetlbpage.txt |    6 +
>  include/linux/hugetlb.h          |    4 +
>  kernel/sysctl.c                  |    7 +
>  mm/hugetlb.c                     |  148 +++++++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c                  |  109 +++++++++++++++++++++++++++-
>  5 files changed, 269 insertions(+), 5 deletions(-)
> 

* Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-05-21 16:10 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <20180521150924.GS30654@e110439-lin>

On 05/21/2018 11:09 AM, Patrick Bellasi wrote:
> On 21-May 09:55, Waiman Long wrote:
>
>> Changing cpuset.cpus will require searching for all the tasks in
>> the cpuset and changing their cpu masks.
> ... I'm wondering if that has to be the case. In principle there can
> be a different solution which is: update on demand. In the wakeup
> path, once we know a task really needs a CPU and we want to find one
> for it, at that point we can align the cpuset mask with the task's
> one. Sort of using the cpuset mask as a clamp on top of the task's
> affinity mask.
>
> The main downside of such an approach could be the overheads in the
> wakeup path... but, still... that should be measured.
> The advantage is that we do not spend time changing attributes of
> tasks which, potentially, could be sleeping for a long time.

We already have a linked list of tasks in a cgroup. So it isn't too hard
to find them. Doing update on demand will require adding a bunch of code
to the wakeup path. So unless there is a good reason to do it, I don't
see it as necessary at this point.

>
>> That isn't a fast operation, but it shouldn't be too bad either
>> depending on how many tasks are in the cpuset.
> Indeed, although it still seems a bit odd and overkill to update
> task affinity for tasks which are not currently RUNNABLE. Isn't it?
>
>> I would not suggest doing rapid changes to cpuset.cpus as a means to tune
>> the behavior of a task. So what exactly is the tuning you are thinking
>> about? Is it moving a task from a high-power cpu to a low-power one
>> or vice versa?
> That's definitely a possible use case. In Android for example we
> usually assign more resources to TOP_APP tasks (those belonging to the
> application you are currently using) while we restrict the resources
> once we switch an app to be in BACKGROUND.

Switching an app from foreground to background and vice versa shouldn't
happen that frequently. Maybe once every few seconds, at most. I am just
wondering what use cases will require changing cpuset attributes tens
of times per second.

> More in general, think about a generic Run-Time Resource
> Management framework, which assigns resources to the tasks of multiple
> applications and wants to have fine-grained control.
>
>> If so, it is probably better to move the task from one cpuset of
>> high-power cpus to another cpuset of low-power cpus.
> This is what Android does now but also what we want to possibly
> change, for two main reasons:
>
> 1. it does not fit with the "number one guideline" for proper
>    CGroups usage, which is "Organize Once and Control":
>       https://elixir.bootlin.com/linux/latest/source/Documentation/cgroup-v2.txt#L518
>    where it says that:
>       migrating processes across cgroups frequently as a means to
>       apply different resource restrictions is discouraged.
>
>    Despite this guideline, it turns out that in v1 at least, it seems
>    to be faster to move tasks across cpusets than tuning cpuset
>    attributes... also when all the tasks are sleeping.

It is probably similar in v2 as the core logic is almost the same.

> 2. it does not allow us to take advantage of accounting controllers such
>    as the memory controller where, by moving tasks around, we cannot
>    properly account and control the amount of memory a task can use.

For v1, memory controller and cpuset controller can be in different
hierarchies. For v2, we have a unified hierarchy. However, we don't need
to enable all the controllers at every level of the hierarchy. For
example,

    A (memory, cpuset) -- B1 (cpuset)
                \-- B2 (cpuset)

Cgroup A has memory and cpuset controllers enabled. The child cgroups B1
and B2 only have cpuset enabled. You can move tasks between B1 and B2
and they will be subjected to the same memory limitation as imposed by
the memory controller in A. So there is a way to work around that.

> Thus, for these reasons and also to possibly migrate to the unified
> hierarchy schema proposed by CGroups v2... we would like a
> low-overhead mechanism for setting/tuning cpuset at run-time with
> whatever frequency you like.

We may be able to improve the performance of changing cpuset attributes
somewhat, but I don't believe there will be much improvement here.

>>>> +
>>>> +The "cpuset" controller is hierarchical.  That means the controller
>>>> +cannot use CPUs or memory nodes not allowed in its parent.
>>>> +
>>>> +
>>>> +Cpuset Interface Files
>>>> +~~~~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +  cpuset.cpus
>>>> +	A read-write multiple values file which exists on non-root
>>>> +	cpuset-enabled cgroups.
>>>> +
>>>> +	It lists the CPUs allowed to be used by tasks within this
>>>> +	cgroup.  The CPU numbers are comma-separated numbers or
>>>> +	ranges.  For example:
>>>> +
>>>> +	  # cat cpuset.cpus
>>>> +	  0-4,6,8-10
>>>> +
>>>> +	An empty value indicates that the cgroup is using the same
>>>> +	setting as the nearest cgroup ancestor with a non-empty
>>>> +	"cpuset.cpus" or all the available CPUs if none is found.
>>> Does that mean that we can move tasks into a newly created group for
>>> which we have not yet configured this value?
>>> AFAIK, that's a different behavior wrt v1... and I like it better.
>>>
>> For v2, if you haven't set up the cpuset.cpus, it defaults to the
>> effective cpu list of its parent.
> +1
>
>>>> +
>>>> +	The value of "cpuset.cpus" stays constant until the next update
>>>> +	and won't be affected by any CPU hotplug events.
>>> This also sounds interesting, does it mean that we use the
>>> cpuset.cpus mask to restrict online CPUs, whatever they are?
>> cpuset.cpus holds the cpu list written by the users.
>> cpuset.cpus.effective is the actual cpu mask that is being used. The
>> effective cpu mask is always a subset of cpuset.cpus. They differ if not
>> all the CPUs in cpuset.cpus are online.
> And that's fine: the effective mask is updated based on HP events.
>
> The main limitation on this side, so far, is that in
> update_tasks_cpumask() we walk all the tasks to set_cpus_allowed_ptr()
> regardless of whether they are RUNNABLE or not. Isn't it?

That is true.

> Thus, this will ensure to have a valid mask at wakeup time, but
> perhaps it's not such a big overhead to update the same on the wakeup
> path... thus speeding up quite a lot the update_cpumasks_hier()
> especially when you have many SLEEPING tasks on a cpuset.
>
> A first measurement and tracing shows that this update could cost up
> to 4ms on a Pixel2 device where you update the cpus for a cpuset
> containing a single task always sleeping.

The 4ms cost is more than what I would have expected. If you think
delaying the update until wakeup time is the right move, you can create
a patch to do that and we can discuss the merit of doing so in LKML.

>
>>> I'll have a better look at the code, but my understanding of v1 is
>>> that we spent a lot of effort to keep task cpu-affinity masks aligned
>>> with the cpuset in which they live, and we do something similar at each
>>> HP event, which ultimately generates a lot of overheads in systems
>>> where: you have many HP events and/or cpuset.cpus change quite
>>> frequently.
>>>
>>> I hope to find some better behavior in this series.
>>>
>> The behavior of CPU offline event should be similar in v2. Any HP event
>> will cause the system to reset the cpu masks of task affected by the
>> event. The online event, however, will be a bit different between v1 and
>> v2. For v1, the online event won't restore the CPU back to those cpusets
>> that had the onlined CPU previously. For v2, the online CPU will
>> be restored back to those cpusets. So there is less work from the
>> management layer, but overhead is still there in the kernel of doing the
>> restore.
> On that side, I still have to better look into the v1 and v2
> implementations, but for the util_clamp extension of the cpu
> controller:
>    https://lkml.org/lkml/2018/4/9/601
> I'm proposing a different update schema which it seems can give you
> the benefits of "restoring the mask" after an UP event as well as a
> fast update/tuning path at run-time.
>
> Along the line of the above implementation, it would mean that the
> task affinity mask is constrained/clamped/masked by the TG's affinity
> mask. This should be an operation performed "on-demand" whenever it
> makes sense.
>
> However, to be honest, I never measured the overheads to combine two
> cpu masks and it can very well be overkill for the wakeup
> path. I don't think the AND by itself should be an issue, since it's
> already used in the fast wakeup path, e.g.
>
>    select_task_rq_fair()
>       select_idle_sibling()
>          select_idle_core()
>             cpumask_and(cpus, sched_domain_span(sd),
>                         &p->cpus_allowed);
>
> What eventually could be an issue is the race between the scheduler
> looking at the cpuset cpumask and cgroups changing it... but perhaps
> that's something that could be fixed with a proper locking mechanism.
>
> I will try to run some experiments to at least collect some overhead
> numbers.

Collecting more information on where the slowdown is will be helpful.

-Longman


* Re: [PATCH 0/3] docs/vm: transhuge: split userspace bits to admin-guide/mm
From: Jonathan Corbet @ 2018-05-21 15:27 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-doc, linux-mm, lkml
In-Reply-To: <1526285620-453-1-git-send-email-rppt@linux.vnet.ibm.com>

On Mon, 14 May 2018 11:13:37 +0300
Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:

> Here are minor updates to transparent hugepage docs. Apart from minor
> formatting and spelling updates, these patches re-arrange the transhuge.rst
> so that userspace interface description will not be interleaved with the
> implementation details and it would be possible to split the userspace
> related bits to Documentation/admin-guide/mm, which is done by the third
> patch.

Looks good, I've applied the set, after adding a changelog for #3.

Thanks,

jon

* Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy
From: Patrick Bellasi @ 2018-05-21 15:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <fbc71cfc-6801-0fda-b0c6-2a70f7c9fe25@redhat.com>

On 21-May 09:55, Waiman Long wrote:
> On 05/21/2018 07:55 AM, Patrick Bellasi wrote:
> > Hi Waiman!

[...]

> >> +Cpuset
> >> +------
> >> +
> >> +The "cpuset" controller provides a mechanism for constraining
> >> +the CPU and memory node placement of tasks to only the resources
> >> +specified in the cpuset interface files in a task's current cgroup.
> >> +This is especially valuable on large NUMA systems where placing jobs
> >> +on properly sized subsets of the systems with careful processor and
> >> +memory placement to reduce cross-node memory access and contention
> >> +can improve overall system performance.
> > Another quite important use-case for cpuset is Android, where they are
> > actively used to do both power-saving as well as performance tunings.
> > For example, depending on the status of an application, its threads
> > can be allowed to run on all available CPUs (e.g. foreground apps) or
> > be restricted only on few energy efficient CPUs (e.g. background apps).
> >
> > Since here we are at "rewriting" cpusets for v2, I think it's important
> > to keep this mobile world scenario into consideration.
> >
> > For example, in this context, we are looking at the possibility to
> > update/tune cpuset.cpus with a relatively high rate, i.e. tens of
> > times per second. Not sure that's the same update rate usually
> > required for the large NUMA systems you cite above.  However, in this
> > case it's quite important to have really small overheads for these
> > operations.
> 
> The cgroup interface isn't designed for high update throughput.

Indeed, I had the same impression...

> Changing cpuset.cpus will require searching for all the tasks in
> the cpuset and changing their cpu masks.

... I'm wondering if that has to be the case. In principle there can
be a different solution which is: update on demand. In the wakeup
path, once we know a task really needs a CPU and we want to find one
for it, at that point we can align the cpuset mask with the task's
one. Sort of using the cpuset mask as a clamp on top of the task's
affinity mask.

The main downside of such an approach could be the overheads in the
wakeup path... but, still... that should be measured.
The advantage is that we do not spend time changing attributes of
tasks which, potentially, could be sleeping for a long time.
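
The "update on demand" idea above could look roughly like the following
userspace model. It is a sketch, not kernel code: the struct and function
names are hypothetical, masks are plain 64-bit integers rather than
struct cpumask, and locking is ignored entirely.

```c
#include <stdint.h>

/* Hypothetical model of a cpuset: a mask plus a change counter. */
struct cpuset_sketch {
	uint64_t cpus;        /* allowed CPUs, one bit per CPU */
	uint64_t generation;  /* bumped on every change to cpus */
};

/* Hypothetical model of a task attached to one cpuset. */
struct task_sketch {
	const struct cpuset_sketch *cs;
	uint64_t affinity;         /* mask requested by the task itself */
	uint64_t cached_mask;      /* affinity clamped by the cpuset */
	uint64_t seen_generation;  /* cpuset generation last applied */
};

/* Writer side: constant time, no walk over the member tasks. */
void cpuset_set_cpus(struct cpuset_sketch *cs, uint64_t cpus)
{
	cs->cpus = cpus;
	cs->generation++;
}

/* Wakeup side: refresh the cached mask only if the cpuset changed. */
uint64_t task_wakeup_mask(struct task_sketch *t)
{
	if (t->seen_generation != t->cs->generation) {
		t->cached_mask = t->affinity & t->cs->cpus;
		t->seen_generation = t->cs->generation;
	}
	return t->cached_mask;
}
```

In this model a sleeping task costs nothing when the cpuset is rewritten;
the price is one extra comparison (and occasionally one AND) on each wakeup.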


> That isn't a fast operation, but it shouldn't be too bad either
> depending on how many tasks are in the cpuset.

Indeed, although it still seems a bit odd and overkill to update
task affinity for tasks which are not currently RUNNABLE. Isn't it?

> I would not suggest doing rapid changes to cpuset.cpus as a means to tune
> the behavior of a task. So what exactly is the tuning you are thinking
> about? Is it moving a task from a high-power cpu to a low-power one
> or vice versa?

That's definitely a possible use case. In Android for example we
usually assign more resources to TOP_APP tasks (those belonging to the
application you are currently using) while we restrict the resources
once we switch an app to be in BACKGROUND.

More in general, think about a generic Run-Time Resource
Management framework, which assigns resources to the tasks of multiple
applications and wants to have fine-grained control.

> If so, it is probably better to move the task from one cpuset of
> high-power cpus to another cpuset of low-power cpus.

This is what Android does now but also what we want to possibly
change, for two main reasons:

1. it does not fit with the "number one guideline" for proper
   CGroups usage, which is "Organize Once and Control":
      https://elixir.bootlin.com/linux/latest/source/Documentation/cgroup-v2.txt#L518
   where it says that:
      migrating processes across cgroups frequently as a means to
      apply different resource restrictions is discouraged.

   Despite this guideline, it turns out that in v1 at least, it seems
   to be faster to move tasks across cpusets than tuning cpuset
   attributes... also when all the tasks are sleeping.


2. it does not allow us to take advantage of accounting controllers such
   as the memory controller where, by moving tasks around, we cannot
   properly account and control the amount of memory a task can use.

Thus, for these reasons and also to possibly migrate to the unified
hierarchy schema proposed by CGroups v2... we would like a
low-overhead mechanism for setting/tuning cpuset at run-time with
whatever frequency you like.

> >> +
> >> +The "cpuset" controller is hierarchical.  That means the controller
> >> +cannot use CPUs or memory nodes not allowed in its parent.
> >> +
> >> +
> >> +Cpuset Interface Files
> >> +~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +  cpuset.cpus
> >> +	A read-write multiple values file which exists on non-root
> >> +	cpuset-enabled cgroups.
> >> +
> >> +	It lists the CPUs allowed to be used by tasks within this
> >> +	cgroup.  The CPU numbers are comma-separated numbers or
> >> +	ranges.  For example:
> >> +
> >> +	  # cat cpuset.cpus
> >> +	  0-4,6,8-10
> >> +
> >> +	An empty value indicates that the cgroup is using the same
> >> +	setting as the nearest cgroup ancestor with a non-empty
> >> +	"cpuset.cpus" or all the available CPUs if none is found.
> > Does that mean that we can move tasks into a newly created group for
> > which we have not yet configured this value?
> > AFAIK, that's a different behavior wrt v1... and I like it better.
> >
> 
> For v2, if you haven't set up the cpuset.cpus, it defaults to the
> effective cpu list of its parent.

+1
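
For reference, the list syntax shown in the quoted documentation
("0-4,6,8-10") can be turned into a bitmask along these lines. This is a
standalone sketch limited to 64 CPUs; the function name is made up for the
example and it is not the parser the kernel uses.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list such as "0-4,6,8-10" into a bitmask (bit n = CPU n).
 * Returns 0 on success, -EINVAL on malformed input or CPU numbers >= 64.
 * An empty string yields an empty mask. Whitespace is not accepted.
 */
int parse_cpu_list(const char *s, uint64_t *mask)
{
	uint64_t out = 0;

	while (*s) {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -EINVAL;
		lo = strtoul(s, &end, 10);
		hi = lo;

		/* Optional "-hi" part turns a single number into a range. */
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -EINVAL;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= 64)
			return -EINVAL;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			out |= UINT64_C(1) << cpu;

		/* A comma must be followed by another item. */
		if (*end == ',' && end[1] != '\0')
			s = end + 1;
		else if (*end == '\0')
			s = end;
		else
			return -EINVAL;
	}
	*mask = out;
	return 0;
}
```

The empty-string case corresponds to the "empty value" described in the
quoted text, where the cgroup falls back to its ancestor's setting.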

> 
> >> +
> >> +	The value of "cpuset.cpus" stays constant until the next update
> >> +	and won't be affected by any CPU hotplug events.
> > This also sounds interesting, does it mean that we use the
> > cpuset.cpus mask to restrict online CPUs, whatever they are?
> 
> cpuset.cpus holds the cpu list written by the users.
> cpuset.cpus.effective is the actual cpu mask that is being used. The
> effective cpu mask is always a subset of cpuset.cpus. They differ if not
> all the CPUs in cpuset.cpus are online.

And that's fine: the effective mask is updated based on HP events.
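
The relationship between cpuset.cpus and cpuset.cpus.effective described in
the quoted text can be modeled as below. This is a simplified sketch assuming
at most 64 CPUs; the function name and the bitmask representation are
illustrative assumptions, and the real kernel logic handles more cases.

```c
#include <stdint.h>

/*
 * configured:        what the user wrote to cpuset.cpus (0 = empty)
 * parent_effective:  effective mask of the parent cgroup
 * online:            CPUs currently online
 */
uint64_t effective_cpus(uint64_t configured, uint64_t parent_effective,
			uint64_t online)
{
	/* An empty cpuset.cpus inherits the parent's effective CPUs. */
	uint64_t base = configured ? (configured & parent_effective)
				   : parent_effective;

	/* Offline CPUs drop out; "configured" itself is never modified. */
	return base & online;
}
```

Because the configured value is left untouched, recomputing the result after
a CPU comes back online restores that CPU, which is the v2 behavior
described in this thread.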

The main limitation on this side, so far, is that in
update_tasks_cpumask() we walk all the tasks to set_cpus_allowed_ptr()
regardless of whether they are RUNNABLE or not. Isn't it?

Thus, this will ensure to have a valid mask at wakeup time, but
perhaps it's not such a big overhead to update the same on the wakeup
path... thus speeding up quite a lot the update_cpumasks_hier()
especially when you have many SLEEPING tasks on a cpuset.

A first measurement and tracing shows that this update could cost up
to 4ms on a Pixel2 device where you update the cpus for a cpuset
containing a single task always sleeping.

> > I'll have a better look at the code, but my understanding of v1 is
> > that we spent a lot of effort to keep task cpu-affinity masks aligned
> > with the cpuset in which they live, and we do something similar at each
> > HP event, which ultimately generates a lot of overheads in systems
> > where: you have many HP events and/or cpuset.cpus change quite
> > frequently.
> >
> > I hope to find some better behavior in this series.
> >
> 
> The behavior of CPU offline event should be similar in v2. Any HP event
> will cause the system to reset the cpu masks of task affected by the
> event. The online event, however, will be a bit different between v1 and
> v2. For v1, the online event won't restore the CPU back to those cpusets
> that had the onlined CPU previously. For v2, the onlined CPU will
> be restored back to those cpusets. So there is less work from the
> management layer, but overhead is still there in the kernel of doing the
> restore.

On that side, I still have to look more closely at the v1 and v2
implementations, but for the util_clamp extension of the cpu
controller:
   https://lkml.org/lkml/2018/4/9/601
I'm proposing a different update scheme which, it seems, can give you
the benefits of "restoring the mask" after an UP event as well as a
fast update/tuning path at run-time.

Along the lines of the above implementation, it would mean that the
task affinity mask is constrained/clamped/masked by the TG's affinity
mask. This should be an operation performed "on-demand" whenever it
makes sense.
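
As a rough illustration of the "on-demand" idea, here is a userspace
sketch in C. The types and the wakeup_mask() helper are hypothetical and
are not taken from the scheduler or from the util_clamp series; the
point is only that the group mask is written once and each task picks
it up lazily.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t mask_t; /* one bit per CPU, model limited to 64 CPUs */

struct group_model {
	mask_t allowed;            /* the group's (TG's) affinity mask */
};

struct task_model {
	mask_t requested;          /* the task's own affinity mask */
	const struct group_model *grp;
};

/*
 * Compute the mask to use at wakeup time. Nothing is cached per task,
 * so updating grp->allowed is O(1) no matter how many tasks sleep in
 * the group. An empty result would need a fallback policy, which this
 * sketch leaves out.
 */
static mask_t wakeup_mask(const struct task_model *t)
{
	assert(t != 0 && t->grp != 0);
	return t->requested & t->grp->allowed;
}
```

The cost moves from the update path (one walk over all tasks) to the
wakeup path (one AND per wakeup), which is the trade-off that still
needs to be measured.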

However, to be honest, I never measured the overhead of combining two
cpu masks, and it may well be overkill for the wakeup path. I don't
think the AND by itself should be an issue, since it's already used in
the fast wakeup path, e.g.

   select_task_rq_fair()
      select_idle_sibling()
         select_idle_core()
            cpumask_and(cpus, sched_domain_span(sd),
                        &p->cpus_allowed);

What could eventually be an issue is the race between the scheduler
looking at the cpuset cpumask and cgroups changing it... but perhaps
that's something that could be fixed with a proper locking mechanism.

I will try to run some experiments to at least collect some overhead
numbers.


[...]

> >> @@ -2104,8 +2144,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
> >>  	.post_attach	= cpuset_post_attach,
> >>  	.bind		= cpuset_bind,
> >>  	.fork		= cpuset_fork,
> >> -	.legacy_cftypes	= files,
> >> +	.legacy_cftypes	= legacy_files,
> >> +	.dfl_cftypes	= dfl_files,
> >>  	.early_init	= true,
> >> +	.threaded	= true,
> > Which means that by default we can attach tasks instead of only
> > processes, right?
> 
> Yes, you can control task placement on the thread level, not just process.

+1

-- 
#include <best/regards.h>

Patrick Bellasi

* Re: [PATCH v2 3/7] memcg: use compound_order rather than hpage_nr_pages
From: Punit Agrawal @ 2018-05-21 14:53 UTC (permalink / raw)
  To: TSUKADA Koutaro
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Jonathan Corbet,
	Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
	David Rientjes, Mike Kravetz, Aneesh Kumar K.V, Naoya Horiguchi,
	Anshuman Khandual, Marc-Andre Lureau, Dan Williams,
	Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
	cgroups
In-Reply-To: <2053ac36-74df-b05e-d1ce-36f69dde2a47@ascade.co.jp>

TSUKADA Koutaro <tsukada@ascade.co.jp> writes:

> On 2018/05/19 2:51, Punit Agrawal wrote:
>> Punit Agrawal <punit.agrawal@arm.com> writes:
>>
>>> Tsukada-san,
>>>
>>> I am not familiar with memcg so can't comment about whether the patchset
>>> is the right way to solve the problem outlined in the cover letter but
>>> had a couple of comments about this patch.
>>>
>>> TSUKADA Koutaro <tsukada@ascade.co.jp> writes:
>>>
>>>> The current memcg implementation assumes that the compound page is THP.
>>>> In order to be able to charge surplus hugepage, we use compound_order.
>>>>
>>>> Signed-off-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
>>>
>>> Please move this before Patch 1/7. This is to prevent wrong accounting
>>> of pages to memcg for size != PMD_SIZE.
>>
>> I just noticed that the default state is off so the change isn't enabled
>> until the sysfs node is exposed in the next patch. Please ignore this
>> comment.
>>
>> One below still applies.
>>
>>>
>>>> ---
>>>>   memcontrol.c |   10 +++++-----
>>>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>> index 2bd3df3..a8f1ff8 100644
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>>> @@ -4483,7 +4483,7 @@ static int mem_cgroup_move_account(struct page *page,
>>>>   				   struct mem_cgroup *to)
>>>>   {
>>>>   	unsigned long flags;
>>>> -	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
>>>> +	unsigned int nr_pages = compound ? (1 << compound_order(page)) : 1;
>>>
>>> Instead of replacing calls to hpage_nr_pages(), is it possible to modify
>>> it to do the calculation?
>
> Thank you for review my code and please just call me Tsukada.
>
> I think it is possible to modify the inside of itself rather than
> replacing the call to hpage_nr_pages().
>
> Inferring from the processing that hpage_nr_pages() desires, I thought
> that the definition of hpage_nr_pages() could be moved outside the
> CONFIG_TRANSPARENT_HUGEPAGE. It seems that THP and HugeTLBfs can be
> handled correctly because compound_order() is judged by seeing whether it
> is PageHead or not.
>
> Also, I would like to use compound_order() inside hpage_nr_pages(), but
> since huge_mm.h is included before mm.h where compound_order() is defined,
> move hpage_nr_pages to mm.h.
>
> Instead of patch 3/7, are the following patches implementing what you
> intended?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a8a1262..1186ab7 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -204,12 +204,6 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
>  	else
>  		return NULL;
>  }
> -static inline int hpage_nr_pages(struct page *page)
> -{
> -	if (unlikely(PageTransHuge(page)))
> -		return HPAGE_PMD_NR;
> -	return 1;
> -}
>
>  struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		pmd_t *pmd, int flags);
> @@ -254,8 +248,6 @@ static inline bool thp_migration_supported(void)
>  #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; })
>  #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; })
>
> -#define hpage_nr_pages(x) 1
> -
>  static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>  {
>  	return false;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ac1f06..082f2ee 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -673,6 +673,12 @@ static inline unsigned int compound_order(struct page *page)
>  	return page[1].compound_order;
>  }
>
> +static inline int hpage_nr_pages(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(PageTail(page), page);
> +	return (1 << compound_order(page));
> +}
> +
>  static inline void set_compound_order(struct page *page, unsigned int order)
>  {
>  	page[1].compound_order = order;

That looks a lot better. Thanks for giving it a go.
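
For reference, the arithmetic behind the new helper can be checked with
a small userspace sketch in C. The page_model struct is a stand-in
invented for this example; it only mimics the two properties the helper
relies on, and is not the kernel's struct page.

```c
#include <assert.h>

/* Hypothetical stand-in for the bits of struct page used here. */
struct page_model {
	unsigned int order; /* what compound_order() would return; 0 for a base page */
	int is_tail;        /* what PageTail() would return */
};

/* Number of base pages covered by a head (or base) page: 2^order. */
static unsigned long nr_pages_model(const struct page_model *page)
{
	assert(!page->is_tail); /* mirrors VM_BUG_ON_PAGE(PageTail(page), page) */
	return 1UL << page->order;
}
```

With 4KB base pages a 2MB huge page has order 9, i.e. 512 pages; with
64KB base pages a 512MB huge page has order 13, i.e. 8192 pages. A
base page has order 0 and counts as 1.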


* Re: [PATCH v2 0/7] mm: pages for hugetlb's overcommit may be able to charge to memcg
From: Punit Agrawal @ 2018-05-21 14:52 UTC (permalink / raw)
  To: TSUKADA Koutaro
  Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Jonathan Corbet,
	Luis R. Rodriguez, Kees Cook, Andrew Morton, Roman Gushchin,
	David Rientjes, Mike Kravetz, Aneesh Kumar K.V, Naoya Horiguchi,
	Anshuman Khandual, Marc-Andre Lureau, Dan Williams,
	Vlastimil Babka, linux-doc, linux-kernel, linux-fsdevel, linux-mm,
	cgroups
In-Reply-To: <e863529b-7ce5-4fbe-8cff-581b5789a5f9@ascade.co.jp>

Hi Tsukada,

I was staring at memcg code to better understand your changes and had
the below thought.

TSUKADA Koutaro <tsukada@ascade.co.jp> writes:

[...]

> In this patch-set, introduce the charge_surplus_huge_pages(boolean) to
> struct hstate. If it is true, it charges to the memory cgroup to which the
> task that obtained surplus hugepages belongs. If it is false, do nothing as
> before, and the default value is false. The charge_surplus_huge_pages can
> be controlled procfs or sysfs interfaces.

Instead of tying the surplus huge page charging control per-hstate,
could the control be made per-memcg?

This can be done by introducing a per-memory controller file in sysfs
(memory.charge_surplus_hugepages?) that indicates whether surplus
hugepages are to be charged to the controller and forms part of the
total limit. IIUC, the limit already accounts for page and swap cache
pages.

This would allow the control to be enabled per-cgroup and also keep the
userspace control interface in one place.

As said earlier, I'm not familiar with memcg, so the above might not be
feasible, but I think it would lead to a more coherent user
interface. Hopefully, more knowledgeable folks on the thread can chime
in.

Thanks,
Punit

> Since THP is very effective in environments with kernel page size of 4KB,
> such as x86, there is no reason to positively use HugeTLBfs, so I think
> that there is no situation to enable charge_surplus_huge_pages. However, in
> some distributions such as arm64, the page size of the kernel is 64KB, and
> the size of THP is too huge as 512MB, making it difficult to use. HugeTLBfs
> may support multiple huge page sizes, and in such a special environment
> there is a desire to use HugeTLBfs.
>
> The patch set is for 4.17.0-rc3+. I don't know whether patch-set are
> acceptable or not, so I just done a simple test.
>
> Thanks,
> Tsukada
>
> TSUKADA Koutaro (7):
>   hugetlb: introduce charge_surplus_huge_pages to struct hstate
>   hugetlb: supports migrate charging for surplus hugepages
>   memcg: use compound_order rather than hpage_nr_pages
>   mm, sysctl: make charging surplus hugepages controllable
>   hugetlb: add charge_surplus_hugepages attribute
>   Documentation, hugetlb: describe about charge_surplus_hugepages
>   memcg: supports movement of surplus hugepages statistics
>
>  Documentation/vm/hugetlbpage.txt |    6 +
>  include/linux/hugetlb.h          |    4 +
>  kernel/sysctl.c                  |    7 +
>  mm/hugetlb.c                     |  148 +++++++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c                  |  109 +++++++++++++++++++++++++++-
>  5 files changed, 269 insertions(+), 5 deletions(-)

* Re: [PATCH v8 1/6] cpuset: Enable cpuset controller in default hierarchy
From: Waiman Long @ 2018-05-21 13:55 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
	cgroups, linux-kernel, linux-doc, kernel-team, pjt, luto,
	Mike Galbraith, torvalds, Roman Gushchin, Juri Lelli
In-Reply-To: <20180521115528.GR30654@e110439-lin>

On 05/21/2018 07:55 AM, Patrick Bellasi wrote:
> Hi Waiman!
>
> I've started looking at the possibility to move Android to use cgroups
> v2 and the availability of the cpuset controller makes this even more
> promising.
>
> I'll try to give a run to this series on Android, meanwhile I have
> some (hopefully not too much dummy) questions below.
>
> On 17-May 16:55, Waiman Long wrote:
>> Given the fact that thread mode had been merged into 4.14, it is now
>> time to enable cpuset to be used in the default hierarchy (cgroup v2)
>> as it is clearly threaded.
>>
>> The cpuset controller had experienced feature creep since its
>> introduction more than a decade ago. Besides the core cpus and mems
>> control files to limit cpus and memory nodes, there are a bunch of
>> additional features that can be controlled from the userspace. Some of
>> the features are of doubtful usefulness and may not be actively used.
>>
>> This patch enables cpuset controller in the default hierarchy with
>> a minimal set of features, namely just the cpus and mems and their
>> effective_* counterparts.  We can certainly add more features to the
>> default hierarchy in the future if there is a real user need for them
>> later on.
>>
>> Alternatively, with the unified hiearachy, it may make more sense
>> to move some of those additional cpuset features, if desired, to
>> memory controller or may be to the cpu controller instead of staying
>> with cpuset.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  Documentation/cgroup-v2.txt | 90 ++++++++++++++++++++++++++++++++++++++++++---
>>  kernel/cgroup/cpuset.c      | 48 ++++++++++++++++++++++--
>>  2 files changed, 130 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
>> index 74cdeae..cf7bac6 100644
>> --- a/Documentation/cgroup-v2.txt
>> +++ b/Documentation/cgroup-v2.txt
>> @@ -53,11 +53,13 @@ v1 is available under Documentation/cgroup-v1/.
>>         5-3-2. Writeback
>>       5-4. PID
>>         5-4-1. PID Interface Files
>> -     5-5. Device
>> -     5-6. RDMA
>> -       5-6-1. RDMA Interface Files
>> -     5-7. Misc
>> -       5-7-1. perf_event
>> +     5-5. Cpuset
>> +       5.5-1. Cpuset Interface Files
>> +     5-6. Device
>> +     5-7. RDMA
>> +       5-7-1. RDMA Interface Files
>> +     5-8. Misc
>> +       5-8-1. perf_event
>>       5-N. Non-normative information
>>         5-N-1. CPU controller root cgroup process behaviour
>>         5-N-2. IO controller root cgroup process behaviour
>> @@ -1435,6 +1437,84 @@ through fork() or clone(). These will return -EAGAIN if the creation
>>  of a new process would cause a cgroup policy to be violated.
>>  
>>  
>> +Cpuset
>> +------
>> +
>> +The "cpuset" controller provides a mechanism for constraining
>> +the CPU and memory node placement of tasks to only the resources
>> +specified in the cpuset interface files in a task's current cgroup.
>> +This is especially valuable on large NUMA systems where placing jobs
>> +on properly sized subsets of the systems with careful processor and
>> +memory placement to reduce cross-node memory access and contention
>> +can improve overall system performance.
> Another quite important use-case for cpuset is Android, where they are
> actively used to do both power-saving as well as performance tunings.
> For example, depending on the status of an application, its threads
> can be allowed to run on all available CPUS (e.g. foreground apps) or
> be restricted only on few energy efficient CPUs (e.g. backgroud apps).
>
> Since here we are at "rewriting" cpusets for v2, I think it's important
> to keep this mobile world scenario into consideration.
>
> For example, in this context, we are looking at the possibility to
> update/tune cpuset.cpus with a relatively high rate, i.e. tens of
> times per second. Not sure that's the same update rate usually
> required for the large NUMA systems you cite above.  However, in this
> case it's quite important to have really small overheads for these
> operations.

The cgroup interface isn't designed for high update throughput. Changing
cpuset.cpus requires finding all the tasks in the cpuset and changing
their cpu masks. That isn't a fast operation, but it shouldn't be too
bad either, depending on how many tasks are in the cpuset.

I would not suggest doing rapid changes to cpuset.cpus as a means to tune
the behavior of a task. So what exactly is the tuning you are thinking
about? Is it moving a task from a high-power cpu to a low-power one
or vice versa? If so, it is probably better to move the task from one
cpuset of high-power cpus to another cpuset of low-power cpus.
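
A minimal sketch of that alternative in C, assuming a cgroup v2
hierarchy in which the two cpusets already exist as threaded cgroups.
The helper name is hypothetical and any directory passed to it is up to
the caller; only the cgroup.threads file name comes from the cgroup v2
interface.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Move thread `tid` into the cgroup at `cgroup_dir` by writing its id
 * to that cgroup's cgroup.threads file.
 * Returns 0 on success, -1 on failure.
 */
static int move_thread_to_cgroup(const char *cgroup_dir, pid_t tid)
{
	char path[4096];
	FILE *f;
	int n, ok;

	assert(cgroup_dir != NULL);

	n = snprintf(path, sizeof(path), "%s/cgroup.threads", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", (int)tid) > 0;
	if (fclose(f) != 0) /* the write may only fail when flushed */
		ok = 0;

	return ok ? 0 : -1;
}
```

Moving a thread this way touches only that one thread, whereas
rewriting cpuset.cpus has to update every task in the cpuset.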

>> +
>> +The "cpuset" controller is hierarchical.  That means the controller
>> +cannot use CPUs or memory nodes not allowed in its parent.
>> +
>> +
>> +Cpuset Interface Files
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +  cpuset.cpus
>> +	A read-write multiple values file which exists on non-root
>> +	cpuset-enabled cgroups.
>> +
>> +	It lists the CPUs allowed to be used by tasks within this
>> +	cgroup.  The CPU numbers are comma-separated numbers or
>> +	ranges.  For example:
>> +
>> +	  # cat cpuset.cpus
>> +	  0-4,6,8-10
>> +
>> +	An empty value indicates that the cgroup is using the same
>> +	setting as the nearest cgroup ancestor with a non-empty
>> +	"cpuset.cpus" or all the available CPUs if none is found.
> Does that means that we can move tasks into a newly created group for
> which we have not yet configured this value?
> AFAIK, that's a different behavior wrt v1... and I like it better.
>

For v2, if you haven't set up the cpuset.cpus, it defaults to the
effective cpu list of its parent.

>> +
>> +	The value of "cpuset.cpus" stays constant until the next update
>> +	and won't be affected by any CPU hotplug events.
> This also sounds interesting, does it means that we use the
> cpuset.cpus mask to restrict online CPUs, whatever they are?

cpuset.cpus holds the cpu list written by the users.
cpuset.cpus.effective is the actual cpu mask that is being used. The
effective cpu mask is always a subset of cpuset.cpus. They differ if not
all the CPUs in cpuset.cpus are online.

> I'll have a better look at the code, but my understanding of v1 is
> that we spent a lot of effort to keep task cpu-affinity masks aligned
> with the cpuset in which they live, and we do something similar at each
> HP event, which ultimately generates a lot of overheads in systems
> where: you have many HP events and/or cpuset.cpus change quite
> frequently.
>
> I hope to find some better behavior in this series.
>

The behavior of a CPU offline event should be similar in v2. Any HP event
will cause the system to reset the cpu masks of the tasks affected by the
event. The online event, however, will be a bit different between v1 and
v2. For v1, the online event won't restore the CPU back to those cpusets
that had the onlined CPU previously. For v2, the onlined CPU will
be restored back to those cpusets. So there is less work for the
management layer, but the overhead of doing the restore is still there
in the kernel.

>> +
>> +  cpuset.cpus.effective
>> +	A read-only multiple values file which exists on non-root
>> +	cpuset-enabled cgroups.
>> +
>> +	It lists the onlined CPUs that are actually allowed to be
>> +	used by tasks within the current cgroup.  If "cpuset.cpus"
>> +	is empty, it shows all the CPUs from the parent cgroup that
>> +	will be available to be used by this cgroup.  Otherwise, it is
>> +	a subset of "cpuset.cpus".  Its value will be affected by CPU
>> +	hotplug events.
> This looks similar to v1, isn't it?

For v1, cpuset.cpus.effective is the same as cpuset.cpus unless you turn
on the v2 mode when mounting the v1 cpuset. For v2, they differ. Please
see the explanation above.

>> +
>> +  cpuset.mems
>> +	A read-write multiple values file which exists on non-root
>> +	cpuset-enabled cgroups.
>> +
>> +	It lists the memory nodes allowed to be used by tasks within
>> +	this cgroup.  The memory node numbers are comma-separated
>> +	numbers or ranges.  For example:
>> +
>> +	  # cat cpuset.mems
>> +	  0-1,3
>> +
>> +	An empty value indicates that the cgroup is using the same
>> +	setting as the nearest cgroup ancestor with a non-empty
>> +	"cpuset.mems" or all the available memory nodes if none
>> +	is found.
>> +
>> +	The value of "cpuset.mems" stays constant until the next update
>> +	and won't be affected by any memory nodes hotplug events.
>> +
>> +  cpuset.mems.effective
>> +	A read-only multiple values file which exists on non-root
>> +	cpuset-enabled cgroups.
>> +
>> +	It lists the onlined memory nodes that are actually allowed to
>> +	be used by tasks within the current cgroup.  If "cpuset.mems"
>> +	is empty, it shows all the memory nodes from the parent cgroup
>> +	that will be available to be used by this cgroup.  Otherwise,
>> +	it is a subset of "cpuset.mems".  Its value will be affected
>> +	by memory nodes hotplug events.
>> +
>> +
>>  Device controller
>>  -----------------
>>  
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index b42037e..419b758 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -1823,12 +1823,11 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
>>  	return 0;
>>  }
>>  
>> -
>>  /*
>>   * for the common functions, 'private' gives the type of file
>>   */
>>  
>> -static struct cftype files[] = {
>> +static struct cftype legacy_files[] = {
>>  	{
>>  		.name = "cpus",
>>  		.seq_show = cpuset_common_seq_show,
>> @@ -1931,6 +1930,47 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
>>  };
>>  
>>  /*
>> + * This is currently a minimal set for the default hierarchy. It can be
>> + * expanded later on by migrating more features and control files from v1.
>> + */
>> +static struct cftype dfl_files[] = {
>> +	{
>> +		.name = "cpus",
>> +		.seq_show = cpuset_common_seq_show,
>> +		.write = cpuset_write_resmask,
>> +		.max_write_len = (100U + 6 * NR_CPUS),
>> +		.private = FILE_CPULIST,
>> +		.flags = CFTYPE_NOT_ON_ROOT,
>> +	},
>> +
>> +	{
>> +		.name = "mems",
>> +		.seq_show = cpuset_common_seq_show,
>> +		.write = cpuset_write_resmask,
>> +		.max_write_len = (100U + 6 * MAX_NUMNODES),
>> +		.private = FILE_MEMLIST,
>> +		.flags = CFTYPE_NOT_ON_ROOT,
>> +	},
>> +
>> +	{
>> +		.name = "cpus.effective",
>> +		.seq_show = cpuset_common_seq_show,
>> +		.private = FILE_EFFECTIVE_CPULIST,
>> +		.flags = CFTYPE_NOT_ON_ROOT,
>> +	},
>> +
>> +	{
>> +		.name = "mems.effective",
>> +		.seq_show = cpuset_common_seq_show,
>> +		.private = FILE_EFFECTIVE_MEMLIST,
>> +		.flags = CFTYPE_NOT_ON_ROOT,
>> +	},
>> +
>> +	{ }	/* terminate */
>> +};
>> +
>> +
>> +/*
>>   *	cpuset_css_alloc - allocate a cpuset css
>>   *	cgrp:	control group that the new cpuset will be part of
>>   */
>> @@ -2104,8 +2144,10 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
>>  	.post_attach	= cpuset_post_attach,
>>  	.bind		= cpuset_bind,
>>  	.fork		= cpuset_fork,
>> -	.legacy_cftypes	= files,
>> +	.legacy_cftypes	= legacy_files,
>> +	.dfl_cftypes	= dfl_files,
>>  	.early_init	= true,
>> +	.threaded	= true,
> Which means that by default we can attach tasks instead of only
> processes, right?

Yes, you can control task placement on the thread level, not just process.

Regards,
Longman


* Re: [PATCH v4 2/2] ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
From: Ganapatrao Kulkarni @ 2018-05-21 12:42 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, linux-doc, LKML, linux-arm-kernel,
	Will Deacon, jnair, Robert Richter, Vadim.Lomovtsev, Jan.Glauber
In-Reply-To: <20180521104008.z6ei5zjve7u5iwho@lakrids.cambridge.arm.com>

On Mon, May 21, 2018 at 4:10 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, May 21, 2018 at 11:37:12AM +0100, Mark Rutland wrote:
>> Hi Ganapat,
>>
>>
>> Sorry for the delay in replying; I was away most of last week.
>>
>> On Tue, May 15, 2018 at 04:03:19PM +0530, Ganapatrao Kulkarni wrote:
>> > On Sat, May 5, 2018 at 12:16 AM, Ganapatrao Kulkarni <gklkml16@gmail.com> wrote:
>> > > On Thu, Apr 26, 2018 at 4:29 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>> > >> On Wed, Apr 25, 2018 at 02:30:47PM +0530, Ganapatrao Kulkarni wrote:
>>
>> > >>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>> > >>> +{
>> > >>> +     int counter;
>> > >>> +
>> > >>> +     raw_spin_lock(&pmu_uncore->lock);
>> > >>> +     counter = find_first_zero_bit(pmu_uncore->counter_mask,
>> > >>> +                             pmu_uncore->uncore_dev->max_counters);
>> > >>> +     if (counter == pmu_uncore->uncore_dev->max_counters) {
>> > >>> +             raw_spin_unlock(&pmu_uncore->lock);
>> > >>> +             return -ENOSPC;
>> > >>> +     }
>> > >>> +     set_bit(counter, pmu_uncore->counter_mask);
>> > >>> +     raw_spin_unlock(&pmu_uncore->lock);
>> > >>> +     return counter;
>> > >>> +}
>> > >>> +
>> > >>> +static void free_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore,
>> > >>> +                                     int counter)
>> > >>> +{
>> > >>> +     raw_spin_lock(&pmu_uncore->lock);
>> > >>> +     clear_bit(counter, pmu_uncore->counter_mask);
>> > >>> +     raw_spin_unlock(&pmu_uncore->lock);
>> > >>> +}
>> > >>
>> > >> I don't believe that locking is required in either of these, as the perf
>> > >> core serializes pmu::add() and pmu::del(), where these get called.
>> >
>> > without this locking, i am seeing "BUG: scheduling while atomic" when
>> > i run perf with more events together than the maximum counters
>> > supported
>>
>> Did you manage to get to the bottom of this?
>>
>> Do you have a backtrace?
>>
>> It looks like in your latest posting you reserve counters through the
>> userspace ABI, which doesn't seem right to me, and I'd like to
>> understand the problem.
>
> Looks like I misunderstood -- those are still allocated kernel-side.
>
> I'll follow that up in the v5 posting.

please review v5.
>
> Thanks,
> Mark.

thanks
Ganapat