* [PATCH] docs: rust: fix grammar in testing documentation
From: Ariful Islam Shoikot @ 2026-04-14 9:07 UTC
To: linux-doc; +Cc: Ariful Islam Shoikot
Replace "how to test" with "on how to test" for clarity.
Signed-off-by: Ariful Islam Shoikot <islamarifulshoikat@gmail.com>
---
Documentation/rust/testing.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/rust/testing.rst b/Documentation/rust/testing.rst
index f43cb77bcc69..edce2cb6c54e 100644
--- a/Documentation/rust/testing.rst
+++ b/Documentation/rust/testing.rst
@@ -3,7 +3,7 @@
Testing
=======
-This document contains useful information how to test the Rust code in the
+This document contains useful information on how to test the Rust code in the
kernel.
There are three sorts of tests:
--
2.43.0
* [PATCH v4 0/2] hwmon: Add support for MPS mp2985
From: wenswang @ 2026-04-14 9:28 UTC
To: robh, krzk+dt, conor+dt, linux, corbet, skhan
Cc: devicetree, linux-kernel, linux-hwmon, linux-doc, Wensheng Wang
From: Wensheng Wang <wenswang@yeah.net>
Add mp2985 driver in hwmon and add dt-bindings for it.
V3 -> V4:
1. Avoid mantissa data overflow in mp2985_linear_exp_transfer()
function.
V2 -> V3:
1. The shifted mantissa is clamped to the range [-1024, 1023]
before being masked in the mp2985_linear_exp_transfer() function.
2. The PMBUS_VOUT_OV_FAULT_LIMIT and PMBUS_VOUT_UV_FAULT_LIMIT
values are clamped to 0xFFF before being written to the mp2985.
3. Fix the vout scale issue for vout linear11 mode.
v1 -> v2:
1. add Krzysztof's Acked-by
2. remove duplicate entry in mp2985.rst
3. clamp vout value to 32767
4. simplify the code for obtaining PMBUS_VOUT_MODE bit value
5. add comment for explaining MP2985 supported vout mode
6. switch back to previous page after obtaining vid scale to avoid
confusing the PMBus core
Wensheng Wang (2):
dt-bindings: hwmon: Add MPS mp2985
hwmon: add MP2985 driver
.../devicetree/bindings/trivial-devices.yaml | 2 +
Documentation/hwmon/index.rst | 1 +
Documentation/hwmon/mp2985.rst | 147 +++++++
MAINTAINERS | 7 +
drivers/hwmon/pmbus/Kconfig | 9 +
drivers/hwmon/pmbus/Makefile | 1 +
drivers/hwmon/pmbus/mp2985.c | 402 ++++++++++++++++++
7 files changed, 569 insertions(+)
create mode 100644 Documentation/hwmon/mp2985.rst
create mode 100644 drivers/hwmon/pmbus/mp2985.c
--
2.25.1
* [PATCH v4 2/2] hwmon: add MP2985 driver
From: wenswang @ 2026-04-14 9:29 UTC
To: robh, krzk+dt, conor+dt, linux, corbet, skhan
Cc: devicetree, linux-kernel, linux-hwmon, linux-doc, Wensheng Wang
In-Reply-To: <20260414092921.1067735-1-wenswang@yeah.net>
From: Wensheng Wang <wenswang@yeah.net>
Add support for the MPS mp2985 controller. This driver exposes
telemetry readings and supports reading and writing limit values.
Signed-off-by: Wensheng Wang <wenswang@yeah.net>
---
V3 -> V4:
1. Avoid mantissa data overflow in mp2985_linear_exp_transfer()
function.
V2 -> V3:
1. The shifted mantissa is clamped to the range [-1024, 1023]
before being masked in the mp2985_linear_exp_transfer() function.
2. The PMBUS_VOUT_OV_FAULT_LIMIT and PMBUS_VOUT_UV_FAULT_LIMIT
values are clamped to 0xFFF before being written to the mp2985.
3. Fix the vout scale issue for vout linear11 mode.
v1 -> v2:
1. remove duplicate entry in mp2985.rst
2. clamp vout value to 32767
3. simplify the code for obtaining PMBUS_VOUT_MODE bit value
4. add comment for explaining MP2985 supported vout mode
5. switch back to previous page after obtaining vid scale to avoid
confusing the PMBus core
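
The exponent conversion mentioned in the changelog can be sketched in
userspace C as follows. This is only an illustration modeled on the
description of mp2985_linear_exp_transfer(), not the driver code itself:
sign_extend() and linear11_set_exponent() are hypothetical names, and the
saturation range 0..0x3FF is taken from the comment in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `v` (two's complement field). */
static int sign_extend(unsigned int v, unsigned int bits)
{
	unsigned int sign = 1u << (bits - 1);

	v &= (1u << bits) - 1;
	return (int)(v ^ sign) - (int)sign;
}

/*
 * Re-encode a LINEAR11 word (5-bit exponent, 11-bit mantissa) so that it
 * uses the exponent given in target_exp (raw 5-bit field). Negative
 * mantissas become 0 and large values saturate at 0x3FF.
 * Hypothetical helper for illustration only.
 */
static uint16_t linear11_set_exponent(uint16_t word, uint16_t target_exp)
{
	int exponent = sign_extend(word >> 11, 5);
	int mantissa = sign_extend(word, 11);
	int target = sign_extend(target_exp, 5);

	if (mantissa < 0) {
		mantissa = 0;
	} else if (exponent > target) {
		int shift = exponent - target;

		/* Compare before shifting so the left shift cannot overflow. */
		if (mantissa <= (0x3FF >> shift))
			mantissa <<= shift;
		else
			mantissa = 0x3FF;
	} else {
		/* Mantissa is 0..1023 here, so the result stays in range. */
		mantissa >>= target - exponent;
	}

	return (uint16_t)(mantissa | ((target_exp & 0x1F) << 11));
}
```

For example, 12 encoded with exponent 0 (0x000C) becomes mantissa 96 with
exponent -3 (0xE860), while 200 with exponent 0 does not fit in 10 bits at
exponent -3 and saturates to 0xEBFF.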
Documentation/hwmon/index.rst | 1 +
Documentation/hwmon/mp2985.rst | 147 ++++++++++++
MAINTAINERS | 7 +
drivers/hwmon/pmbus/Kconfig | 9 +
drivers/hwmon/pmbus/Makefile | 1 +
drivers/hwmon/pmbus/mp2985.c | 402 +++++++++++++++++++++++++++++++++
6 files changed, 567 insertions(+)
create mode 100644 Documentation/hwmon/mp2985.rst
create mode 100644 drivers/hwmon/pmbus/mp2985.c
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index b2ca8513cfcd..1b7007f41b39 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -183,6 +183,7 @@ Hardware Monitoring Kernel Drivers
mp2925
mp29502
mp2975
+ mp2985
mp2993
mp5023
mp5920
diff --git a/Documentation/hwmon/mp2985.rst b/Documentation/hwmon/mp2985.rst
new file mode 100644
index 000000000000..87a39c8a300c
--- /dev/null
+++ b/Documentation/hwmon/mp2985.rst
@@ -0,0 +1,147 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver mp2985
+====================
+
+Supported chips:
+
+ * MPS mp2985
+
+ Prefix: 'mp2985'
+
+Author:
+
+ Wensheng Wang <wenswang@yeah.net>
+
+Description
+-----------
+
+This driver implements support for Monolithic Power Systems, Inc. (MPS)
+MP2985 Dual Loop Digital Multi-phase Controller.
+
+Device compliant with:
+
+- PMBus rev 1.3 interface.
+
+The driver exports the following attributes via the 'sysfs' files
+for input voltage:
+
+**in1_input**
+
+**in1_label**
+
+**in1_crit**
+
+**in1_crit_alarm**
+
+**in1_lcrit**
+
+**in1_lcrit_alarm**
+
+**in1_max**
+
+**in1_max_alarm**
+
+**in1_min**
+
+**in1_min_alarm**
+
+The driver provides the following attributes for output voltage:
+
+**in2_input**
+
+**in2_label**
+
+**in2_crit**
+
+**in2_crit_alarm**
+
+**in2_lcrit**
+
+**in2_lcrit_alarm**
+
+**in3_input**
+
+**in3_label**
+
+**in3_crit**
+
+**in3_crit_alarm**
+
+**in3_lcrit**
+
+**in3_lcrit_alarm**
+
+The driver provides the following attributes for input current:
+
+**curr1_input**
+
+**curr1_label**
+
+The driver provides the following attributes for output current:
+
+**curr2_input**
+
+**curr2_label**
+
+**curr2_crit**
+
+**curr2_crit_alarm**
+
+**curr2_max**
+
+**curr2_max_alarm**
+
+**curr3_input**
+
+**curr3_label**
+
+**curr3_crit**
+
+**curr3_crit_alarm**
+
+**curr3_max**
+
+**curr3_max_alarm**
+
+The driver provides the following attributes for input power:
+
+**power1_input**
+
+**power1_label**
+
+**power2_input**
+
+**power2_label**
+
+The driver provides the following attributes for output power:
+
+**power3_input**
+
+**power3_label**
+
+**power4_input**
+
+**power4_label**
+
+The driver provides the following attributes for temperature:
+
+**temp1_input**
+
+**temp1_crit**
+
+**temp1_crit_alarm**
+
+**temp1_max**
+
+**temp1_max_alarm**
+
+**temp2_input**
+
+**temp2_crit**
+
+**temp2_crit_alarm**
+
+**temp2_max**
+
+**temp2_max_alarm**
diff --git a/MAINTAINERS b/MAINTAINERS
index 3adc870d523b..ead04c2d1665 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17933,6 +17933,13 @@ S: Maintained
F: Documentation/hwmon/mp29502.rst
F: drivers/hwmon/pmbus/mp29502.c
+MPS MP2985 DRIVER
+M: Wensheng Wang <wenswang@yeah.net>
+L: linux-hwmon@vger.kernel.org
+S: Maintained
+F: Documentation/hwmon/mp2985.rst
+F: drivers/hwmon/pmbus/mp2985.c
+
MPS MP2993 DRIVER
M: Noah Wang <noahwang.wang@outlook.com>
L: linux-hwmon@vger.kernel.org
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index fc1273abe357..83fe5866c083 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -447,6 +447,15 @@ config SENSORS_MP2975
This driver can also be built as a module. If so, the module will
be called mp2975.
+config SENSORS_MP2985
+ tristate "MPS MP2985"
+ help
+ If you say yes here you get hardware monitoring support for MPS
+ MP2985 Dual Loop Digital Multi-Phase Controller.
+
+ This driver can also be built as a module. If so, the module will
+ be called mp2985.
+
config SENSORS_MP2993
tristate "MPS MP2993"
help
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index d6c86924f887..24505bbee2b0 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_SENSORS_MP2891) += mp2891.o
obj-$(CONFIG_SENSORS_MP2925) += mp2925.o
obj-$(CONFIG_SENSORS_MP29502) += mp29502.o
obj-$(CONFIG_SENSORS_MP2975) += mp2975.o
+obj-$(CONFIG_SENSORS_MP2985) += mp2985.o
obj-$(CONFIG_SENSORS_MP2993) += mp2993.o
obj-$(CONFIG_SENSORS_MP5023) += mp5023.o
obj-$(CONFIG_SENSORS_MP5920) += mp5920.o
diff --git a/drivers/hwmon/pmbus/mp2985.c b/drivers/hwmon/pmbus/mp2985.c
new file mode 100644
index 000000000000..eb1a25b00c0b
--- /dev/null
+++ b/drivers/hwmon/pmbus/mp2985.c
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Hardware monitoring driver for MPS Multi-phase Digital VR Controllers (MP2985)
+ *
+ * Copyright (C) 2026 MPS
+ */
+
+#include <linux/bitfield.h>
+#include <linux/i2c.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include "pmbus.h"
+
+/*
+ * Vendor-specific registers READ_PIN_EST(0x93), READ_IIN_EST(0x8E),
+ * MFR_VR_MULTI_CONFIG_R1(0x0D) and MFR_VR_MULTI_CONFIG_R2(0x1D).
+ * READ_PIN_EST is used to read pin telemetry, READ_IIN_EST is used
+ * to read iin telemetry, and MFR_VR_MULTI_CONFIG_R1 and
+ * MFR_VR_MULTI_CONFIG_R2 are used to obtain the vid scale.
+ */
+#define READ_PIN_EST 0x93
+#define READ_IIN_EST 0x8E
+#define MFR_VR_MULTI_CONFIG_R1 0x0D
+#define MFR_VR_MULTI_CONFIG_R2 0x1D
+
+#define MP2985_VOUT_DIV 64
+#define MP2985_VOUT_OVUV_UINT 125
+#define MP2985_VOUT_OVUV_DIV 64
+
+#define MP2985_PAGE_NUM 2
+
+#define MP2985_RAIL1_FUNC (PMBUS_HAVE_VIN | PMBUS_HAVE_PIN | \
+ PMBUS_HAVE_VOUT | PMBUS_HAVE_IOUT | \
+ PMBUS_HAVE_POUT | PMBUS_HAVE_TEMP | \
+ PMBUS_HAVE_STATUS_VOUT | \
+ PMBUS_HAVE_STATUS_IOUT | \
+ PMBUS_HAVE_STATUS_TEMP | \
+ PMBUS_HAVE_STATUS_INPUT)
+
+#define MP2985_RAIL2_FUNC (PMBUS_HAVE_PIN | PMBUS_HAVE_VOUT | \
+ PMBUS_HAVE_IOUT | PMBUS_HAVE_POUT | \
+ PMBUS_HAVE_TEMP | PMBUS_HAVE_IIN | \
+ PMBUS_HAVE_STATUS_VOUT | \
+ PMBUS_HAVE_STATUS_IOUT | \
+ PMBUS_HAVE_STATUS_TEMP | \
+ PMBUS_HAVE_STATUS_INPUT)
+
+struct mp2985_data {
+ struct pmbus_driver_info info;
+ int vout_scale[MP2985_PAGE_NUM];
+ int vid_offset[MP2985_PAGE_NUM];
+};
+
+#define to_mp2985_data(x) container_of(x, struct mp2985_data, info)
+
+static u16 mp2985_linear_exp_transfer(u16 word, u16 expect_exponent)
+{
+ s16 exponent, mantissa, target_exponent;
+
+ exponent = ((s16)word) >> 11;
+ mantissa = ((s16)((word & 0x7ff) << 5)) >> 5;
+ target_exponent = (s16)((expect_exponent & 0x1f) << 11) >> 11;
+
+ /*
+ * The MP2985 does not support negative limit values. If a negative
+ * limit value is written, the limit value is set to 0. The
+ * maximum positive limit value is limited to 0x3FF.
+ */
+ if (mantissa < 0) {
+ mantissa = 0;
+ } else {
+ if (exponent > target_exponent) {
+ mantissa = (1023 >> (exponent - target_exponent)) >= mantissa ?
+ mantissa << (exponent - target_exponent) :
+ 0x3FF;
+ } else {
+ mantissa = clamp_val(mantissa >> (target_exponent - exponent),
+ 0, 0x3FF);
+ }
+ }
+
+ return mantissa | ((expect_exponent << 11) & 0xf800);
+}
+
+static int mp2985_read_byte_data(struct i2c_client *client, int page, int reg)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_VOUT_MODE:
+ /*
+ * The MP2985 does not follow standard PMBus protocol completely,
+ * and the calculation of vout in this driver is based on direct
+ * format. As a result, the format of vout is enforced to direct.
+ */
+ ret = PB_VOUT_MODE_DIRECT;
+ break;
+ default:
+ ret = -ENODATA;
+ break;
+ }
+
+ return ret;
+}
+
+static int mp2985_read_word_data(struct i2c_client *client, int page, int phase,
+ int reg)
+{
+ const struct pmbus_driver_info *info = pmbus_get_driver_info(client);
+ struct mp2985_data *data = to_mp2985_data(info);
+ int ret;
+
+ switch (reg) {
+ case PMBUS_READ_VOUT:
+ ret = pmbus_read_word_data(client, page, phase, reg);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * The MP2985 supports three vout modes: direct, linear11 and vid.
+ * In vid mode, the MP2985 vout telemetry has a 49 vid step offset, but
+ * PMBUS_VOUT_OV_FAULT_LIMIT and PMBUS_VOUT_UV_FAULT_LIMIT do not take
+ * this into account; their resolution is 1.953125mV/LSB. As a
+ * result, format[PSC_VOLTAGE_OUT] cannot be set to vid mode directly,
+ * so an extra vid_offset variable is added for vout telemetry.
+ */
+ ret = clamp_val(DIV_ROUND_CLOSEST(((ret & GENMASK(11, 0)) +
+ data->vid_offset[page]) *
+ data->vout_scale[page], MP2985_VOUT_DIV),
+ 0, 0x7FFF);
+ break;
+ case PMBUS_READ_IIN:
+ /*
+ * The MP2985 has the standard PMBUS_READ_IIN register(0x89), but it is
+ * not used to read the per-rail input current. The input current
+ * is read through the vendor-redefined register READ_IIN_EST(0x8E).
+ */
+ ret = pmbus_read_word_data(client, page, phase, READ_IIN_EST);
+ break;
+ case PMBUS_READ_PIN:
+ /*
+ * The MP2985 has the standard PMBUS_READ_PIN register(0x97), but it
+ * is not used to read the per-rail input power. The per-rail input
+ * power is read through the vendor-redefined register
+ * READ_PIN_EST(0x93).
+ */
+ ret = pmbus_read_word_data(client, page, phase, READ_PIN_EST);
+ break;
+ case PMBUS_VOUT_OV_FAULT_LIMIT:
+ case PMBUS_VOUT_UV_FAULT_LIMIT:
+ ret = pmbus_read_word_data(client, page, phase, reg);
+ if (ret < 0)
+ return ret;
+
+ ret = DIV_ROUND_CLOSEST((ret & GENMASK(11, 0)) * MP2985_VOUT_OVUV_UINT,
+ MP2985_VOUT_OVUV_DIV);
+ break;
+ case PMBUS_STATUS_WORD:
+ case PMBUS_READ_VIN:
+ case PMBUS_READ_IOUT:
+ case PMBUS_READ_POUT:
+ case PMBUS_READ_TEMPERATURE_1:
+ case PMBUS_VIN_OV_FAULT_LIMIT:
+ case PMBUS_VIN_OV_WARN_LIMIT:
+ case PMBUS_VIN_UV_WARN_LIMIT:
+ case PMBUS_VIN_UV_FAULT_LIMIT:
+ case PMBUS_IOUT_OC_FAULT_LIMIT:
+ case PMBUS_IOUT_OC_WARN_LIMIT:
+ case PMBUS_OT_FAULT_LIMIT:
+ case PMBUS_OT_WARN_LIMIT:
+ /*
+ * These registers are not explicitly handled by the driver,
+ * so return -ENODATA directly.
+ */
+ ret = -ENODATA;
+ break;
+ default:
+ /*
+ * The MP2985 does not support reading other telemetry or limit
+ * values, so return -EINVAL directly.
+ */
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int mp2985_write_word_data(struct i2c_client *client, int page, int reg,
+ u16 word)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_VIN_OV_FAULT_LIMIT:
+ case PMBUS_VIN_OV_WARN_LIMIT:
+ case PMBUS_VIN_UV_WARN_LIMIT:
+ case PMBUS_VIN_UV_FAULT_LIMIT:
+ /*
+ * The PMBUS_VIN_OV_FAULT_LIMIT, PMBUS_VIN_OV_WARN_LIMIT,
+ * PMBUS_VIN_UV_WARN_LIMIT and PMBUS_VIN_UV_FAULT_LIMIT
+ * of the MP2985 are in linear11 format, and the exponent is a
+ * constant value(5'b11101), so the exponent of the word
+ * parameter should be converted to 5'b11101(0x1D).
+ */
+ ret = pmbus_write_word_data(client, page, reg,
+ mp2985_linear_exp_transfer(word, 0x1D));
+ break;
+ case PMBUS_VOUT_OV_FAULT_LIMIT:
+ case PMBUS_VOUT_UV_FAULT_LIMIT:
+ /*
+ * Bits 0-11 hold the limit value, and bits 12-15
+ * must not be changed.
+ */
+ ret = pmbus_read_word_data(client, page, 0xff, reg);
+ if (ret < 0)
+ return ret;
+
+ ret = pmbus_write_word_data(client, page, reg,
+ (ret & ~GENMASK(11, 0)) |
+ clamp_val(DIV_ROUND_CLOSEST(word * MP2985_VOUT_OVUV_DIV,
+ MP2985_VOUT_OVUV_UINT), 0, 0xFFF));
+ break;
+ case PMBUS_OT_FAULT_LIMIT:
+ case PMBUS_OT_WARN_LIMIT:
+ /*
+ * The PMBUS_OT_FAULT_LIMIT and PMBUS_OT_WARN_LIMIT of the
+ * MP2985 are in linear11 format, and the exponent is a
+ * constant value(5'b00000), so the exponent of the word
+ * parameter should be converted to 5'b00000.
+ */
+ ret = pmbus_write_word_data(client, page, reg,
+ mp2985_linear_exp_transfer(word, 0x00));
+ break;
+ case PMBUS_IOUT_OC_FAULT_LIMIT:
+ case PMBUS_IOUT_OC_WARN_LIMIT:
+ /*
+ * The PMBUS_IOUT_OC_FAULT_LIMIT and PMBUS_IOUT_OC_WARN_LIMIT
+ * of the MP2985 are in linear11 format, and the exponent cannot
+ * be changed.
+ */
+ ret = pmbus_read_word_data(client, page, 0xff, reg);
+ if (ret < 0)
+ return ret;
+
+ ret = pmbus_write_word_data(client, page, reg,
+ mp2985_linear_exp_transfer(word,
+ FIELD_GET(GENMASK(15, 11),
+ ret)));
+ break;
+ default:
+ /*
+ * The MP2985 does not support configuring other limit values,
+ * so return -EINVAL directly.
+ */
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+mp2985_identify_vout_scale(struct i2c_client *client, struct pmbus_driver_info *info,
+ int page)
+{
+ struct mp2985_data *data = to_mp2985_data(info);
+ int ret;
+
+ ret = i2c_smbus_write_byte_data(client, PMBUS_PAGE, page);
+ if (ret < 0)
+ return ret;
+
+ ret = i2c_smbus_read_byte_data(client, PMBUS_VOUT_MODE);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * The MP2985 supports three vout modes. If PMBUS_VOUT_MODE
+ * bit5 is 1, it is vid mode. If PMBUS_VOUT_MODE bit4
+ * is 1, it is linear11 mode, and the vout scale is 1.953125mV/LSB.
+ * If PMBUS_VOUT_MODE bit6 is 1, it is direct mode, and the
+ * vout scale is 1mV/LSB. In vid mode, the MP2985 vout telemetry
+ * has a 49 vid step offset.
+ */
+ if (FIELD_GET(BIT(5), ret)) {
+ ret = i2c_smbus_write_byte_data(client, PMBUS_PAGE, 2);
+ if (ret < 0)
+ return ret;
+
+ ret = i2c_smbus_read_word_data(client, page == 0 ?
+ MFR_VR_MULTI_CONFIG_R1 :
+ MFR_VR_MULTI_CONFIG_R2);
+ if (ret < 0)
+ return ret;
+
+ if (page == 0) {
+ if (FIELD_GET(BIT(4), ret))
+ data->vout_scale[page] = 320;
+ else
+ data->vout_scale[page] = 640;
+ } else {
+ if (FIELD_GET(BIT(3), ret))
+ data->vout_scale[page] = 320;
+ else
+ data->vout_scale[page] = 640;
+ }
+
+ data->vid_offset[page] = 49;
+
+ /*
+ * For vid mode, the MP2985 should be changed to page 2
+ * to obtain vout scale value, this may confuse the PMBus
+ * core. To avoid this, switch back to the previous page
+ * again.
+ */
+ ret = i2c_smbus_write_byte_data(client, PMBUS_PAGE, page);
+ if (ret < 0)
+ return ret;
+ } else if (FIELD_GET(BIT(4), ret)) {
+ data->vout_scale[page] = 125;
+ data->vid_offset[page] = 0;
+ } else {
+ data->vout_scale[page] = 64;
+ data->vid_offset[page] = 0;
+ }
+
+ return 0;
+}
+
+static int mp2985_identify(struct i2c_client *client, struct pmbus_driver_info *info)
+{
+ int ret;
+
+ ret = mp2985_identify_vout_scale(client, info, 0);
+ if (ret < 0)
+ return ret;
+
+ return mp2985_identify_vout_scale(client, info, 1);
+}
+
+static struct pmbus_driver_info mp2985_info = {
+ .pages = MP2985_PAGE_NUM,
+ .format[PSC_VOLTAGE_IN] = linear,
+ .format[PSC_CURRENT_IN] = linear,
+ .format[PSC_CURRENT_OUT] = linear,
+ .format[PSC_POWER] = linear,
+ .format[PSC_TEMPERATURE] = linear,
+ .format[PSC_VOLTAGE_OUT] = direct,
+
+ .m[PSC_VOLTAGE_OUT] = 1,
+ .R[PSC_VOLTAGE_OUT] = 3,
+ .b[PSC_VOLTAGE_OUT] = 0,
+
+ .func[0] = MP2985_RAIL1_FUNC,
+ .func[1] = MP2985_RAIL2_FUNC,
+ .read_word_data = mp2985_read_word_data,
+ .read_byte_data = mp2985_read_byte_data,
+ .write_word_data = mp2985_write_word_data,
+ .identify = mp2985_identify,
+};
+
+static int mp2985_probe(struct i2c_client *client)
+{
+ struct mp2985_data *data;
+
+ data = devm_kzalloc(&client->dev, sizeof(struct mp2985_data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ memcpy(&data->info, &mp2985_info, sizeof(mp2985_info));
+
+ return pmbus_do_probe(client, &data->info);
+}
+
+static const struct i2c_device_id mp2985_id[] = {
+ {"mp2985", 0},
+ {}
+};
+MODULE_DEVICE_TABLE(i2c, mp2985_id);
+
+static const struct of_device_id __maybe_unused mp2985_of_match[] = {
+ {.compatible = "mps,mp2985"},
+ {}
+};
+MODULE_DEVICE_TABLE(of, mp2985_of_match);
+
+static struct i2c_driver mp2985_driver = {
+ .driver = {
+ .name = "mp2985",
+ .of_match_table = mp2985_of_match,
+ },
+ .probe = mp2985_probe,
+ .id_table = mp2985_id,
+};
+
+module_i2c_driver(mp2985_driver);
+
+MODULE_AUTHOR("Wensheng Wang <wenswang@yeah.net>");
+MODULE_DESCRIPTION("PMBus driver for MPS MP2985 device");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("PMBUS");
--
2.25.1
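
The vout arithmetic in the patch above can be checked with a small userspace
sketch. It assumes the scale values (64, 125, 320, 640), the divisor 64, the
49-step vid offset and the 125/64 mV limit resolution exactly as they appear
in the driver; the type and function names below (vout_cfg, vout_raw_to_mv(),
ovuv_raw_to_mv(), ovuv_mv_to_raw()) are hypothetical and exist only for
illustration. Results are in millivolts, which matches the direct-format
coefficients m = 1, R = 3, b = 0.

```c
#include <assert.h>
#include <stdint.h>

/* Per-page settings that the driver derives in its identify step. */
struct vout_cfg {
	int scale;      /* numerator over 64: 64 -> 1 mV, 125 -> 1.953125 mV */
	int vid_offset; /* 49 in vid mode, 0 otherwise */
};

/* Rounded division for non-negative operands only. */
static long div_round_closest(long n, long d)
{
	return (n + d / 2) / d;
}

/* READ_VOUT: 12-bit raw value plus vid offset, scaled to mV, capped at 0x7FFF. */
static int vout_raw_to_mv(uint16_t raw, struct vout_cfg cfg)
{
	long mv = div_round_closest((long)((raw & 0xFFF) + cfg.vid_offset) *
				    cfg.scale, 64);

	return mv > 0x7FFF ? 0x7FFF : (int)mv;
}

/* OV/UV fault limit read: bits 0-11 at 125/64 mV per LSB. */
static int ovuv_raw_to_mv(uint16_t raw)
{
	return (int)div_round_closest((raw & 0xFFF) * 125L, 64);
}

/* OV/UV fault limit write: keep bits 12-15 of the old register value. */
static uint16_t ovuv_mv_to_raw(uint16_t old_reg, unsigned int mv)
{
	long field = div_round_closest((long)mv * 64, 125);

	if (field > 0xFFF)
		field = 0xFFF;

	return (uint16_t)((old_reg & 0xF000) | field);
}
```

Under these assumptions a vid-mode page with scale 320 works out to
320/64 = 5 mV per step, so a raw reading of 100 maps to (100 + 49) * 5 =
745 mV, and a limit of 1000 mV is stored as 512 in bits 0-11.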
* [PATCH v4 1/2] dt-bindings: hwmon: Add MPS mp2985
From: wenswang @ 2026-04-14 9:29 UTC
To: robh, krzk+dt, conor+dt, linux, corbet, skhan
Cc: devicetree, linux-kernel, linux-hwmon, linux-doc, Wensheng Wang,
Krzysztof Kozlowski
In-Reply-To: <20260414092801.1067470-1-wenswang@yeah.net>
From: Wensheng Wang <wenswang@yeah.net>
Add support for MPS mp2985 controller.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Wensheng Wang <wenswang@yeah.net>
---
v1 -> v2:
1. add Krzysztof's Acked-by
Documentation/devicetree/bindings/trivial-devices.yaml | 2 ++
1 file changed, 2 insertions(+)
diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml
index a482aeadcd44..d61482269352 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -325,6 +325,8 @@ properties:
- mps,mp29612
# Monolithic Power Systems Inc. multi-phase controller mp29816
- mps,mp29816
+ # Monolithic Power Systems Inc. multi-phase controller mp2985
+ - mps,mp2985
# Monolithic Power Systems Inc. multi-phase controller mp2993
- mps,mp2993
# Monolithic Power Systems Inc. hot-swap protection device
--
2.25.1
* Re: [PATCH v4 2/9] bus: mhi: Move sahara protocol driver under drivers/bus/mhi
From: Kishore Batta @ 2026-04-14 9:45 UTC
To: Manivannan Sadhasivam, Jeff Hugo
Cc: Jonathan Corbet, Shuah Khan, Carl Vanderlip, Oded Gabbay,
andersson, linux-doc, linux-kernel, linux-arm-msm, dri-devel, mhi
In-Reply-To: <sab2tgxtiftme5gscknsl7cfifpshtlrnnihbm2g56ppbowcit@bg4bzwuta6a6>
On 4/13/2026 4:34 PM, Manivannan Sadhasivam wrote:
> On Thu, Apr 09, 2026 at 02:20:02PM -0600, Jeff Hugo wrote:
>> On 3/19/2026 12:31 AM, Kishore Batta wrote:
>>> The Sahara protocol driver is currently located under the QAIC
>>> accelerator subsystem even though the protocol itself is transported over the
>>> MHI bus and is used by multiple Qualcomm flashless devices.
>>>
>>> Relocate the Sahara protocol driver to drivers/bus/mhi and register it as
>>> an independent MHI protocol driver. This avoids treating Sahara as QAIC
>>> specific and makes it available for reuse by other MHI based devices.
>>>
>>> As part of this move, introduce a dedicated Kconfig and Makefile under the
>>> MHI subsystem and expose the sahara interface via a common header.
>> I don't think this belongs under MHI. Mani needs to confirm that he agrees
>> with the concept of moving this there.
>>
>> The Sahara protocol as defined by the spec does not require MHI. We know
>> that there are Sahara implementations over USB. I don't see a dependency or
>> relationship to MHI other than the current in-kernel implementation uses
>> MHI, but there are plenty of things that use MHI (qaic, mhi-net, ath12k,
>> etc) which are not a part of the MHI bus.
>>
> Since Sahara is an MHI client driver, it is OK with me to place it under
> drivers/bus/mhi/host/. We do tend to host the client/controller drivers if they
> also bind to separate top level subsystems like Net, WWAN... but for the pure
> protocol drivers like Sahara, MHI can provide asylum.
>
> - Mani
Thanks for the confirmation, Mani. I will keep the Sahara driver under
drivers/bus/mhi/host/ and also move the Sahara documentation under the
Documentation/mhi/ directory.
* Re: [PATCH v4 2/9] bus: mhi: Move sahara protocol driver under drivers/bus/mhi
From: Kishore Batta @ 2026-04-14 9:48 UTC
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <enwtopztznwtvlhukkggxcdmh4t7v7duoiuapi5gd4zggqwbit@ypb4nxnds53f>
On 4/13/2026 4:50 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:42PM +0530, Kishore Batta wrote:
>> The Sahara protocol driver is currently located under the QAIC
>> accelerator subsystem even though the protocol itself is transported over the
>> MHI bus and is used by multiple Qualcomm flashless devices.
>>
>> Relocate the Sahara protocol driver to drivers/bus/mhi and register it as
>> an independent MHI protocol driver. This avoids treating Sahara as QAIC
>> specific and makes it available for reuse by other MHI based devices.
>>
>> As part of this move, introduce a dedicated Kconfig and Makefile under the
>> MHI subsystem and expose the sahara interface via a common header.
>>
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> ---
>> drivers/accel/qaic/Kconfig | 1 +
>> drivers/accel/qaic/Makefile | 3 +--
>> drivers/accel/qaic/qaic_drv.c | 11 ++---------
>> drivers/bus/mhi/Kconfig | 1 +
>> drivers/bus/mhi/Makefile | 3 +++
>> drivers/bus/mhi/sahara/Kconfig | 18 ++++++++++++++++++
>> drivers/bus/mhi/sahara/Makefile | 2 ++
> Create one more subdir 'clients' and move 'sahara' here:
> drivers/bus/mhi/host/clients/sahara/
>
> I'm not sure if we are going to have Sahara implementation for the endpoint
> itself. If so, it should be moved under drivers/bus/mhi/common/.
Thanks for the suggestion. I will create the clients directory and move the
Sahara driver there. For the endpoint, the Sahara driver is implemented in XBL,
so it is not required here.
>
>> drivers/{accel/qaic => bus/mhi/sahara}/sahara.c | 16 +++++++++++-----
>> {drivers/accel/qaic => include/linux}/sahara.h | 0
> include/linux/mhi/sahara.h
ACK. I will move the header file to include/linux/mhi/sahara.h.
>
>> 9 files changed, 39 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/accel/qaic/Kconfig b/drivers/accel/qaic/Kconfig
>> index 116e42d152ca885b8c59e33c7a87519a0abc6bb3..1e5f1f4fa93c12d8ca8fb37633f2f0bee9997499 100644
>> --- a/drivers/accel/qaic/Kconfig
>> +++ b/drivers/accel/qaic/Kconfig
>> @@ -8,6 +8,7 @@ config DRM_ACCEL_QAIC
>> depends on DRM_ACCEL
>> depends on PCI && HAS_IOMEM
>> depends on MHI_BUS
>> + select MHI_SAHARA
>> select CRC32
>> select WANT_DEV_COREDUMP
>> help
>> diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
>> index 71f727b74da3bb4478324689f02a7cea24a05c2d..e7b8458800072aa627f7f36c3257883aa56f4ce4 100644
>> --- a/drivers/accel/qaic/Makefile
>> +++ b/drivers/accel/qaic/Makefile
>> @@ -13,7 +13,6 @@ qaic-y := \
>> qaic_ras.o \
>> qaic_ssr.o \
>> qaic_sysfs.o \
>> - qaic_timesync.o \
>> - sahara.o
>> + qaic_timesync.o
>>
>> qaic-$(CONFIG_DEBUG_FS) += qaic_debugfs.o
>> diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
>> index 63fb8c7b4abcbe4f1b76c32106f4e8b9ea5e2c8e..76cc8086825e7949ed756d51fcb56a08f392d228 100644
>> --- a/drivers/accel/qaic/qaic_drv.c
>> +++ b/drivers/accel/qaic/qaic_drv.c
>> @@ -15,6 +15,7 @@
>> #include <linux/msi.h>
>> #include <linux/mutex.h>
>> #include <linux/pci.h>
>> +#include <linux/sahara.h>
>> #include <linux/spinlock.h>
>> #include <linux/workqueue.h>
>> #include <linux/wait.h>
>> @@ -32,7 +33,6 @@
>> #include "qaic_ras.h"
>> #include "qaic_ssr.h"
>> #include "qaic_timesync.h"
>> -#include "sahara.h"
>>
>> MODULE_IMPORT_NS("DMA_BUF");
>>
>> @@ -782,18 +782,12 @@ static int __init qaic_init(void)
>> ret = pci_register_driver(&qaic_pci_driver);
>> if (ret) {
>> pr_debug("qaic: pci_register_driver failed %d\n", ret);
>> - return ret;
>> + goto free_pci;
>> }
>>
>> ret = mhi_driver_register(&qaic_mhi_driver);
>> if (ret) {
>> pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>> - goto free_pci;
>> - }
>> -
>> - ret = sahara_register();
>> - if (ret) {
>> - pr_debug("qaic: sahara_register failed %d\n", ret);
>> goto free_mhi;
>> }
>>
>> @@ -847,7 +841,6 @@ static void __exit qaic_exit(void)
>> qaic_ras_unregister();
>> qaic_bootlog_unregister();
>> qaic_timesync_deinit();
>> - sahara_unregister();
>> mhi_driver_unregister(&qaic_mhi_driver);
>> pci_unregister_driver(&qaic_pci_driver);
>> }
>> diff --git a/drivers/bus/mhi/Kconfig b/drivers/bus/mhi/Kconfig
>> index b39a11e6c624ba00349cca22d74bd876020590ab..4acedb886adccc6f76f69c241d53106da59b491f 100644
>> --- a/drivers/bus/mhi/Kconfig
>> +++ b/drivers/bus/mhi/Kconfig
>> @@ -7,3 +7,4 @@
>>
>> source "drivers/bus/mhi/host/Kconfig"
>> source "drivers/bus/mhi/ep/Kconfig"
>> +source "drivers/bus/mhi/sahara/Kconfig"
>> diff --git a/drivers/bus/mhi/Makefile b/drivers/bus/mhi/Makefile
>> index 354204b0ef3ae4030469a24a659f32429d592aef..e4af535e1bb1bc9481fae60d7eb347700d2e874c 100644
>> --- a/drivers/bus/mhi/Makefile
>> +++ b/drivers/bus/mhi/Makefile
>> @@ -3,3 +3,6 @@ obj-$(CONFIG_MHI_BUS) += host/
>>
>> # Endpoint MHI stack
>> obj-$(CONFIG_MHI_BUS_EP) += ep/
>> +
>> +# Sahara MHI protocol
>> +obj-$(CONFIG_MHI_SAHARA) += sahara/
>> diff --git a/drivers/bus/mhi/sahara/Kconfig b/drivers/bus/mhi/sahara/Kconfig
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..3f1caf6acd979a4af68aaf0e250aa54762e8cda5
>> --- /dev/null
>> +++ b/drivers/bus/mhi/sahara/Kconfig
>> @@ -0,0 +1,18 @@
>> +config MHI_SAHARA
>> + tristate
>> + depends on MHI_BUS
>> + select FW_LOADER_COMPRESS
>> + select FW_LOADER_COMPRESS_XZ
>> + select FW_LOADER_COMPRESS_ZSTD
> Why suddenly these configs pop up?
I will remove these in the next version.
>
>> + help
>> + Enable support for the Sahara protocol transported over the MHI bus.
>> +
>> + The Sahara protocol is used to transfer firmware images, retrieve
>> + memory dumps and exchange command mode DDR calibration data between
>> + host and device. This driver is not tied to a specific SoC and may be
>> + used by multiple MHI based devices.
>> +
>> + If unsure, say N.
>> +
>> + To compile this driver as a module, choose M here: the module will be
>> + called mhi_sahara.
>> diff --git a/drivers/bus/mhi/sahara/Makefile b/drivers/bus/mhi/sahara/Makefile
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..fc02a25935011cbd7138ea8f24b88cf5b032a4ce
>> --- /dev/null
>> +++ b/drivers/bus/mhi/sahara/Makefile
>> @@ -0,0 +1,2 @@
>> +obj-$(CONFIG_MHI_SAHARA) += mhi_sahara.o
>> +mhi_sahara-y := sahara.o
>> diff --git a/drivers/accel/qaic/sahara.c b/drivers/bus/mhi/sahara/sahara.c
>> similarity index 99%
>> rename from drivers/accel/qaic/sahara.c
>> rename to drivers/bus/mhi/sahara/sahara.c
>> index fd3c3b2d1fd3bb698809e6ca669128e2dce06613..8ff7b6425ac5423ef8f32117151dca10397686a8 100644
>> --- a/drivers/accel/qaic/sahara.c
>> +++ b/drivers/bus/mhi/sahara/sahara.c
>> @@ -1,6 +1,8 @@
>> -// SPDX-License-Identifier: GPL-2.0-only
>> -
>> -/* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. */
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2018-2020, The Linux Foundation. All rights reserved.
> Why are you changing the copyright?
I misunderstood the comment on patch 1 of this series. Only the copyright
style needs to be changed. I will fix it in the next version.
>
>> + *
>> + */
>>
>> #include <linux/devcoredump.h>
>> #include <linux/firmware.h>
>> @@ -9,12 +11,11 @@
>> #include <linux/minmax.h>
>> #include <linux/mod_devicetable.h>
>> #include <linux/overflow.h>
>> +#include <linux/sahara.h>
>> #include <linux/types.h>
>> #include <linux/vmalloc.h>
>> #include <linux/workqueue.h>
>>
>> -#include "sahara.h"
>> -
>> #define SAHARA_HELLO_CMD 0x1 /* Min protocol version 1.0 */
>> #define SAHARA_HELLO_RESP_CMD 0x2 /* Min protocol version 1.0 */
>> #define SAHARA_READ_DATA_CMD 0x3 /* Min protocol version 1.0 */
>> @@ -928,8 +929,13 @@ int sahara_register(void)
>> {
>> return mhi_driver_register(&sahara_mhi_driver);
>> }
>> +module_init(sahara_register);
>>
>> void sahara_unregister(void)
>> {
>> mhi_driver_unregister(&sahara_mhi_driver);
>> }
>> +module_exit(sahara_unregister);
> Use module_mhi_driver().
ACK.
>
> - Mani
>
* Re: [PATCH v4 4/9] bus: mhi: Centralize firmware image table selection at probe time
From: Kishore Batta @ 2026-04-14 9:49 UTC
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <2sykuv6r643v3i6ymdoevzohoxdmgrrodvgpbaystskz7fwgun@fd3p7gcso252>
On 4/13/2026 4:56 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:44PM +0530, Kishore Batta wrote:
>> The Sahara driver currently selects firmware image tables using
>> scattered, device specific conditionals in the probe path, making the
>> logic harder to follow and extend.
>>
>> Refactor firmware image table selection into a single, explicit probe-time
>> mechanism by introducing a variant table that captures device matching,
>> firmware image tables, firmware folder names, and streaming behavior in
>> one place.
>>
>> This centralizes device specific decisions, simplifies the probe logic,
>> and avoids ad-hoc conditionals while preserving the existing behavior for
>> all supported AIC devices.
>>
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> ---
>> drivers/bus/mhi/sahara/sahara.c | 66 ++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 58 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/sahara/sahara.c b/drivers/bus/mhi/sahara/sahara.c
>> index e3499977e7c6b53bc624a8eb00d0636f2ea63307..8f1c0d72066c0cf80c09d78bfc51df2e482133b9 100644
>> --- a/drivers/bus/mhi/sahara/sahara.c
>> +++ b/drivers/bus/mhi/sahara/sahara.c
>> @@ -180,6 +180,16 @@ struct sahara_context {
>> u32 read_data_length;
>> bool is_mem_dump_mode;
>> bool non_streaming;
>> + const char *fw_folder;
>> +};
>> +
>> +struct sahara_variant {
>> + const char *match;
>> + bool match_is_chan;
> This name makes no sense.
>
> - Mani
I will drop this in the next version.
>> + const char * const *image_table;
>> + size_t table_size;
>> + const char *fw_folder;
>> + bool non_streaming;
>> };
>>
>> static const char * const aic100_image_table[] = {
>> @@ -224,11 +234,50 @@ static const char * const aic200_image_table[] = {
>> [78] = "qcom/aic200/pvs.bin",
>> };
>>
>> +static const struct sahara_variant sahara_variants[] = {
>> + {
>> + .match = "AIC100",
>> + .match_is_chan = false,
>> + .image_table = aic100_image_table,
>> + .table_size = ARRAY_SIZE(aic100_image_table),
>> + .fw_folder = "aic100",
>> + .non_streaming = true,
>> + },
>> + {
>> + .match = "AIC200",
>> + .match_is_chan = false,
>> + .image_table = aic200_image_table,
>> + .table_size = ARRAY_SIZE(aic200_image_table),
>> + .fw_folder = "aic200",
>> + .non_streaming = false,
>> + }
>> +};
>> +
>> static bool is_streaming(struct sahara_context *context)
>> {
>> return !context->non_streaming;
>> }
>>
>> +static const struct sahara_variant *sahara_select_variant(struct mhi_device *mhi_dev,
>> + const struct mhi_device_id *id)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(sahara_variants); i++) {
>> + const struct sahara_variant *v = &sahara_variants[i];
>> +
>> + if (v->match_is_chan) {
>> + if (id && id->chan && !strcmp(id->chan, v->match))
>> + return v;
>> + } else {
>> + if (mhi_dev->mhi_cntrl && mhi_dev->mhi_cntrl->name &&
>> + !strcmp(mhi_dev->mhi_cntrl->name, v->match))
>> + return v;
>> + }
>> + }
>> + return NULL;
>> +}
>> +
>> static int sahara_find_image(struct sahara_context *context, u32 image_id)
>> {
>> int ret;
>> @@ -797,6 +846,7 @@ static void sahara_read_data_processing(struct work_struct *work)
>>
>> static int sahara_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
>> {
>> + const struct sahara_variant *variant;
>> struct sahara_context *context;
>> int ret;
>> int i;
>> @@ -809,14 +859,14 @@ static int sahara_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_
>> if (!context->rx)
>> return -ENOMEM;
>>
>> - if (!strcmp(mhi_dev->mhi_cntrl->name, "AIC200")) {
>> - context->image_table = aic200_image_table;
>> - context->table_size = ARRAY_SIZE(aic200_image_table);
>> - } else {
>> - context->image_table = aic100_image_table;
>> - context->table_size = ARRAY_SIZE(aic100_image_table);
>> - context->non_streaming = true;
>> - }
>> + variant = sahara_select_variant(mhi_dev, id);
>> + if (!variant)
>> + return -ENODEV;
>> +
>> + context->image_table = variant->image_table;
>> + context->table_size = variant->table_size;
>> + context->non_streaming = variant->non_streaming;
>> + context->fw_folder = variant->fw_folder;
>>
>> /*
>> * There are two firmware implementations for READ_DATA handling.
>>
>> --
>> 2.34.1
>>
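For readers following the thread, the probe-time lookup described in the commit message can be modelled outside the kernel. The sketch below is a minimal userspace illustration, not the driver code: `struct variant`, `select_variant()` and the AIC100 file name are hypothetical stand-ins, and only the controller-name match is modelled (the `match_is_chan` branch is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified stand-in for struct sahara_variant. */
struct variant {
	const char *match;         /* controller name to compare against */
	const char *const *images; /* sparse table indexed by image ID */
	size_t n_images;
	const char *fw_folder;
	int non_streaming;
};

/* Illustrative tables; the AIC100 entry is a placeholder name. */
static const char *const aic100_images[] = { [1] = "example/aic100.bin" };
static const char *const aic200_images[] = { [78] = "qcom/aic200/pvs.bin" };

static const struct variant variants[] = {
	{ "AIC100", aic100_images, ARRAY_SIZE(aic100_images), "aic100", 1 },
	{ "AIC200", aic200_images, ARRAY_SIZE(aic200_images), "aic200", 0 },
};

/* Return the first variant whose name matches, or NULL if none does. */
static const struct variant *select_variant(const char *cntrl_name)
{
	size_t i;

	if (!cntrl_name)
		return NULL;

	for (i = 0; i < ARRAY_SIZE(variants); i++) {
		if (!strcmp(cntrl_name, variants[i].match))
			return &variants[i];
	}
	return NULL;
}

/* Usage: a known name yields its table, an unknown name yields NULL. */
static void example(void)
{
	assert(select_variant("AIC200")->n_images == 79);
	assert(select_variant("UNKNOWN") == NULL);
}
```

One point the sketch makes visible: with a table lookup, an unmatched controller name produces no variant at all (the patch returns -ENODEV), whereas the removed if/else fell back to the AIC100 table for every name other than "AIC200".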
^ permalink raw reply
* Re: [PATCH v4 5/9] bus: mhi: Add QDU100 variant and image_id firmware fallback
From: Kishore Batta @ 2026-04-14 9:51 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <5lfbhyzyyji6cuve3uzd26rfgnqotcupelppgehdj36dq7op6j@hn3jmhtqzntq>
On 4/13/2026 5:04 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:45PM +0530, Kishore Batta wrote:
>> The Sahara driver currently selects a firmware image table based on the
>> attached device, but it does not recognize QDU100 devices that expose the
>> protocol on the SAHARA MHI channel. As a result, the host cannot associate
>> QDU100 devices with the correct firmware namespace during image transfer.
>>
>> Extend the probe-time variant selection to match the SAHARA MHI channel
>> and associate it with the QDU100 firmware folder. Add an image_id based
>> firmware lookup fallback for cases where an image does not have an explicit
>> table entry. This allows required images to be provisioned by the platform
>> without requiring device specific client drivers or additional registration
>> mechanisms.
>>
>> This change only affects devices matched on the SAHARA channel and does not
>> change behavior for existing AIC100 and AIC200 devices.
>>
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> ---
>> drivers/bus/mhi/sahara/sahara.c | 77 ++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 72 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/sahara/sahara.c b/drivers/bus/mhi/sahara/sahara.c
>> index 8f1c0d72066c0cf80c09d78bfc51df2e482133b9..4ea14c57774f51a778289d7409372a6ab21fea60 100644
>> --- a/drivers/bus/mhi/sahara/sahara.c
>> +++ b/drivers/bus/mhi/sahara/sahara.c
>> @@ -234,6 +234,36 @@ static const char * const aic200_image_table[] = {
>> [78] = "qcom/aic200/pvs.bin",
>> };
>>
>> +static const char * const qdu100_image_table[] = {
>> + [5] = "qcom/qdu100/uefi.elf",
>> + [8] = "qcom/qdu100/qdsp6sw.mbn",
>> + [16] = "qcom/qdu100/efs1.bin",
>> + [17] = "qcom/qdu100/efs2.bin",
>> + [20] = "qcom/qdu100/efs3.bin",
>> + [23] = "qcom/qdu100/aop.mbn",
>> + [25] = "qcom/qdu100/tz.mbn",
>> + [29] = "qcom/qdu100/zeros_1sector.bin",
>> + [33] = "qcom/qdu100/hypvm.mbn",
>> + [34] = "qcom/qdu100/mdmddr.mbn",
>> + [36] = "qcom/qdu100/multi_image_qti.mbn",
>> + [37] = "qcom/qdu100/multi_image.mbn",
>> + [38] = "qcom/qdu100/xbl_config.elf",
>> + [39] = "qcom/qdu100/abl_userdebug.elf",
>> + [40] = "qcom/qdu100/zeros_1sector.bin",
>> + [41] = "qcom/qdu100/devcfg.mbn",
>> + [42] = "qcom/qdu100/zeros_1sector.bin",
>> + [45] = "qcom/qdu100/tools_l.elf",
>> + [46] = "qcom/qdu100/Quantum.elf",
>> + [47] = "qcom/qdu100/quest.elf",
>> + [48] = "qcom/qdu100/xbl_ramdump.elf",
>> + [49] = "qcom/qdu100/shrm.elf",
>> + [50] = "qcom/qdu100/cpucp.elf",
>> + [51] = "qcom/qdu100/aop_devcfg.mbn",
>> + [52] = "qcom/qdu100/fw_csm_gsi_3.0.elf",
>> + [53] = "qcom/qdu100/qdsp6sw_dtbs.elf",
>> + [54] = "qcom/qdu100/qupv3fw.elf",
>> +};
> Why the Sahara driver hardcodes these firmware names in the first place? Sahara
> is just a protocol to transfer these images to the device, so this driver
> shouldn't have any device specific info hardcoded. IMO, this should just act as
> a pure library. These firmware names should come from MHI controller drivers
> instead.
>
> - Mani
ACK. I will move these image tables to respective MHI controller drivers
by implementing a registration mechanism.
>
^ permalink raw reply
* Re: [PATCH 4/6] hugetlb: drop vma_hugecache_offset() in favor of linear_page_index()
From: Oscar Salvador @ 2026-04-14 9:53 UTC (permalink / raw)
To: Jane Chu
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <20260409234158.837786-5-jane.chu@oracle.com>
On Thu, Apr 09, 2026 at 05:41:55PM -0600, Jane Chu wrote:
> vma_hugecache_offset() converts a hugetlb VMA address into a mapping
> offset in hugepage units. While the helper is small, its name is not very
> clear, and the resulting code is harder to follow than using the common MM
> helper directly.
>
> Use linear_page_index() instead, with an explicit conversion from
> PAGE_SIZE units to hugepage units at each call site, and remove
> vma_hugecache_offset().
>
> This makes the code a bit more direct and avoids a hugetlb-specific helper
> whose behavior is already expressible with existing MM primitives.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
Looks good to me, the only thing is the conversion to hugepage units
which may not be very clear to the casual reader, but you already
mentioned that you will add a helper, so all good.
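To illustrate the unit conversion for the casual reader, here is a minimal userspace sketch. It assumes 4 KiB base pages; `linear_index()` is a simplified model of what linear_page_index() computes, and `page_to_huge_index()` is a hypothetical name for the kind of helper mentioned above, not the one Jane will post.

```c
#include <assert.h>

#define PAGE_SHIFT 12UL /* assumption: 4 KiB base pages */

/*
 * Simplified model of linear_page_index(): the file offset of addr,
 * expressed in PAGE_SIZE units.
 */
static unsigned long linear_index(unsigned long addr, unsigned long vm_start,
				  unsigned long vm_pgoff)
{
	return ((addr - vm_start) >> PAGE_SHIFT) + vm_pgoff;
}

/*
 * Hypothetical helper: convert a PAGE_SIZE-unit index into hugepage units.
 * huge_order is log2(hugepage size / PAGE_SIZE), e.g. 9 for 2 MiB pages.
 */
static unsigned long page_to_huge_index(unsigned long index,
					unsigned int huge_order)
{
	return index >> huge_order;
}

/* Usage: 4 MiB into a mapping is base page 1024, which is 2 MiB hugepage 2. */
static void example(void)
{
	unsigned long idx = linear_index(0x40400000UL, 0x40000000UL, 0);

	assert(idx == 1024);
	assert(page_to_huge_index(idx, 9) == 2);
}
```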
--
Oscar Salvador
SUSE Labs
^ permalink raw reply
* Re: [PATCH v4 8/9] bus: mhi: Expose DDR training data via controller sysfs
From: Kishore Batta @ 2026-04-14 9:56 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <tbwahssgudfeacfj3wcg32yw5fkqorswees4gv4geypjmmdcyu@tv6qkuhyw23l>
On 4/13/2026 5:28 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:48PM +0530, Kishore Batta wrote:
>> DDR training data captured during Sahara command mode needs to be
>> accessible to userspace so it can be persisted and reused on subsequent
>> boots. Currently, the training data is stored internally in the driver
>> but has no external visibility once the sahara channel is torn down.
>>
> Maybe share some steps on how the userspace is expected to use this calibration
> data.
Sure. I will update the commit message with the required details in the
next version.
>> Expose the captured DDR training data via a read-only binary sysfs
>> attribute on the MHI controller device. The sysfs file is created under
>> the controller node, allowing userspace to read the training data even
>> after the sahara channel device has been removed.
>>
> So once the calibration data is read, how it can be used further?
Userspace will store the calibration data in
"mdmddr_0x<serial_no>.mbn" format. On the next boot, the Sahara driver loads
the real DDR calibration data and the training data is restored. No
repeated DDR training is performed on the target.
>
>> The sysfs attribute reads directly from controller-scoped storage and
>> relies on device managed resources for cleanup when the controller
>> device is destroyed. No explicit sysfs removal is required, avoiding
>> lifetime dependencies on the Sahara channel device.
>>
> Missing ABI documentation.
>
> - Mani
Currently I have added it in a separate patch (9/9). I will squash it into
this patch in the next version.
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> ---
>> drivers/bus/mhi/sahara/sahara.c | 69 +++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 69 insertions(+)
>>
>> diff --git a/drivers/bus/mhi/sahara/sahara.c b/drivers/bus/mhi/sahara/sahara.c
>> index c88f1220199ac4373d3552167870c19a0d5f23b9..b7208738df10fc3c3895acd46873412818dc1730 100644
>> --- a/drivers/bus/mhi/sahara/sahara.c
>> +++ b/drivers/bus/mhi/sahara/sahara.c
>> @@ -415,6 +415,73 @@ static struct sahara_ctrl_trng_data *sahara_ctrl_trng_get(struct device *dev)
>> return ct;
>> }
>>
>> +static ssize_t ddr_training_data_read(struct file *filp, struct kobject *kobj,
>> + const struct bin_attribute *attr, char *buf,
>> + loff_t offset, size_t count)
>> +{
>> + struct device *dev = kobj_to_dev(kobj);
>> + struct sahara_ctrl_trng_data *ct;
>> + size_t available;
>> +
>> + ct = sahara_ctrl_trng_get(dev);
>> + if (!ct)
>> + return -ENODEV;
>> +
>> + mutex_lock(&ct->lock);
>> +
>> + /* No data yet or offset past end */
>> + if (!ct->data || offset >= ct->size) {
>> + mutex_unlock(&ct->lock);
>> + return 0;
>> + }
>> +
>> + available = ct->size - offset;
>> + count = min(count, available);
>> + memcpy(buf, (u8 *)ct->data + offset, count);
>> +
>> + mutex_unlock(&ct->lock);
>> +
>> + return count;
>> +}
>> +
>> +static const struct bin_attribute ddr_training_data_attr = {
>> + .attr = {
>> + .name = "ddr_training_data",
>> + .mode = 0444,
>> + },
>> + .read = ddr_training_data_read,
>> +};
>> +
>> +static void sahara_sysfs_devres_release(struct device *dev, void *res)
>> +{
>> + device_remove_bin_file(dev, &ddr_training_data_attr);
>> +}
>> +
>> +static void sahara_sysfs_create(struct mhi_device *mhi_dev)
>> +{
>> + struct device *dev = &mhi_dev->mhi_cntrl->mhi_dev->dev;
>> + void *cookie;
>> + int ret;
>> +
>> + if (devres_find(dev, sahara_sysfs_devres_release, NULL, NULL))
>> + return;
>> +
>> + ret = device_create_bin_file(dev, &ddr_training_data_attr);
>> + if (ret) {
>> + dev_warn(&mhi_dev->dev,
>> + "Failed to create DDR training sysfs node (%d)\n", ret);
>> + return;
>> + }
>> +
>> + cookie = devres_alloc(sahara_sysfs_devres_release, 1, GFP_KERNEL);
>> + if (!cookie) {
>> + device_remove_bin_file(dev, &ddr_training_data_attr);
>> + return;
>> + }
>> +
>> + devres_add(dev, cookie);
>> +}
>> +
>> static int sahara_find_image(struct sahara_context *context, u32 image_id)
>> {
>> char *fw_path;
>> @@ -1272,6 +1339,8 @@ static int sahara_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_
>> return ret;
>> }
>>
>> + sahara_sysfs_create(mhi_dev);
>> +
>> return 0;
>> }
>>
>>
>> --
>> 2.34.1
>>
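The offset and length handling in ddr_training_data_read() can be checked in isolation. The following is a minimal userspace sketch of the same clamping logic, with locking left out; `bounded_read()` is a hypothetical name and the buffers are plain arrays rather than the controller-scoped storage used by the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most count bytes starting at offset from a blob of the given size.
 * Returns the number of bytes copied, or 0 when there is no data yet or the
 * offset is at or past the end (which a reader sees as end of file).
 */
static size_t bounded_read(const unsigned char *data, size_t size,
			   char *buf, size_t offset, size_t count)
{
	size_t available;

	if (!data || offset >= size)
		return 0;

	available = size - offset;
	if (count > available)
		count = available;

	memcpy(buf, data + offset, count);
	return count;
}

/* Usage: a read that overruns the blob is shortened to what is left. */
static void example(void)
{
	const unsigned char blob[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
	char buf[8];

	assert(bounded_read(blob, sizeof(blob), buf, 4, sizeof(buf)) == 2);
	assert(memcmp(buf, "ef", 2) == 0);
}
```

Because the short read and the zero return are both ordinary outcomes, a userspace reader can simply loop until it gets 0 bytes back.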
^ permalink raw reply
* Re: [PATCH v4 9/9] Documentation: ABI: Add sysfs ABI documentation for DDR training data
From: Kishore Batta @ 2026-04-14 9:57 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <yttrssaw4k2vx7r6l4vsb535qcrr4phsgj6qlnu2r764inai7o@d4qgr7uu5t2s>
On 4/13/2026 5:29 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:49PM +0530, Kishore Batta wrote:
>> Add ABI documentation for the DDR training data sysfs attribute exposed by
>> the sahara MHI driver.
>>
>> The documented sysfs node provides read-only access to the DDR training
>> data captured during sahara command mode and exposed via the MHI
>> controller device. This allows userspace to read the training data and
>> manage it as needed outside the kernel.
>>
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
> Ah, this should be squashed with previous patch.
>
> - Mani
Sure. I will do it.
>> ---
>> .../ABI/testing/sysfs-bus-mhi-ddr_training_data | 19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-mhi-ddr_training_data b/Documentation/ABI/testing/sysfs-bus-mhi-ddr_training_data
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..810b487b5a5fdba133d81255f9879844e3938a10
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-bus-mhi-ddr_training_data
>> @@ -0,0 +1,19 @@
>> +What: /sys/bus/mhi/devices/<mhi-cntrl>/ddr_training_data
>> +
>> +Date: March 2026
>> +
>> +Contact: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> +
>> +Description: Contains the DDR training data for the Qualcomm device
>> + connected. MHI driver populates different controller
>> + nodes for each device. The DDR training data is exposed
>> + to userspace to read and save the training data file to
>> + the filesystem. In the subsequent boot up of the device,
>> + the training data is restored from host to device
>> + optimizing the boot up time of the device.
>> +
>> +Usage: Example for reading DDR training data:
>> + cat /sys/bus/mhi/devices/mhi0/ddr_training_data
>> +
>> +Permissions: The file permissions are set to 0444 allowing read
>> + access.
>>
>> --
>> 2.34.1
>>
^ permalink raw reply
* Re: [PATCH 5/6] hugetlb: make hugetlb_add_to_page_cache() use PAGE_SIZE-based index
From: Oscar Salvador @ 2026-04-14 10:23 UTC (permalink / raw)
To: Jane Chu
Cc: akpm, david, muchun.song, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, corbet, skhan, hughd, baolin.wang, peterx,
linux-mm, linux-doc, linux-kernel
In-Reply-To: <20260409234158.837786-6-jane.chu@oracle.com>
On Thu, Apr 09, 2026 at 05:41:56PM -0600, Jane Chu wrote:
> hugetlb_add_to_page_cache() currently takes a parameter named 'idx',
> but internally converts it from hugetlb page units into PAGE_SIZE-based
> page-cache index units before calling __filemap_add_folio().
>
> Make hugetlb_add_to_page_cache() take a PAGE_SIZE-based index directly
> and update its callers accordingly. This removes the internal shift,
> keeps the index units consistent with filemap_lock_folio() and
> __filemap_add_folio(), and simplifies the surrounding code.
>
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
^ permalink raw reply
* RE: [PATCH v7 4/6] iio: adc: ad4691: add SPI offload support
From: Sabau, Radu bogdan @ 2026-04-14 10:28 UTC (permalink / raw)
To: David Lechner, Lars-Peter Clausen, Hennerich, Michael,
Jonathan Cameron, Sa, Nuno, Andy Shevchenko, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Uwe Kleine-König,
Liam Girdwood, Mark Brown, Linus Walleij, Bartosz Golaszewski,
Philipp Zabel, Jonathan Corbet, Shuah Khan
Cc: linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <1170956f-da05-4280-990f-64306ca905c2@baylibre.com>
> -----Original Message-----
> From: David Lechner <dlechner@baylibre.com>
> Sent: Saturday, April 11, 2026 12:01 AM
...
> >
> > static const struct ad4691_chip_info ad4694_chip_info = {
> > .name = "ad4694",
> > .max_rate = 1 * HZ_PER_MHZ,
> > .sw_info = &ad4693_sw_info,
> > + .offload_info = &ad4693_offload_info,
> > +};
> > +
> > +struct ad4691_offload_state {
> > + struct spi_offload *spi;
>
> I would call this "offload" or "instance". "spi" is usually the SPI
> device handle.
I thought about this too, will implement it as offload then.
>
> > + struct spi_offload_trigger *trigger;
> > + u64 trigger_hz;
> > + u8 tx_cmd[17][2];
> > + u8 tx_reset[4];
> > };
> >
>
> ...
>
> > +
> > +static int ad4691_cnv_burst_offload_buffer_predisable(struct iio_dev
> *indio_dev)
> > +{
> > + struct ad4691_state *st = iio_priv(indio_dev);
> > + struct ad4691_offload_state *offload = st->offload;
> > + int ret;
> > +
> > + spi_offload_trigger_disable(offload->spi, offload->trigger);
> > +
> > + ret = ad4691_sampling_enable(st, false);
> > + if (ret)
> > + return ret;
> > +
> > + ret = regmap_write(st->regmap, AD4691_STD_SEQ_CONFIG,
> > + AD4691_SEQ_ALL_CHANNELS_OFF);
>
> Why this extra step? We don't have it when unwinding in the
> error path of the postenable function.
This is a mistake on my end. Perhaps this could be removed since
the sequencer is overwritten upon new buffers/raw readings anyway.
>
> > + if (ret)
> > + return ret;
> > +
> > + spi_unoptimize_message(&st->scan_msg);
> > +
> > + return ad4691_exit_conversion_mode(st);
> > +}
> > +
> > +static const struct iio_buffer_setup_ops
> ad4691_cnv_burst_offload_buffer_setup_ops = {
> > + .postenable = &ad4691_cnv_burst_offload_buffer_postenable,
> > + .predisable = &ad4691_cnv_burst_offload_buffer_predisable,
> > +};
> > +
> > static ssize_t sampling_frequency_show(struct device *dev,
> > struct device_attribute *attr,
> > char *buf)
^ permalink raw reply
* Re: [PATCH v3 2/3] mm/memory-failure: add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC option
From: Breno Leitao @ 2026-04-14 10:29 UTC (permalink / raw)
To: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, David Hildenbrand, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Cc: linux-mm, linux-kernel, linux-doc, kernel-team, gustavold
In-Reply-To: <20260413-ecc_panic-v3-2-1dcbb2f12bc4@debian.org>
On Mon, Apr 13, 2026 at 06:26:34AM -0700, Breno Leitao wrote:
> +config BOOTPARAM_MEMORY_FAILURE_PANIC
> + bool "Panic on unrecoverable memory failure"
> + depends on MEMORY_FAILURE
> + help
> + Say Y here to panic when an unrecoverable memory failure is
> + detected. This covers kernel pages, high-order kernel pages,
> + and unknown page types that cannot be recovered. Can be disabled
> + at runtime via the panic_on_unrecoverable_memory_failure sysctl.
After considering Linus's recent feedback on kernel configuration
complexity, I'm reconsidering this approach. He recently emphasized:
"The kernel config phase is probably one of the biggest pain points for
random new people trying to build their own kernels, and we DO NOT ASK
PEOPLE STUIPID THINGS." --Linus
https://lore.kernel.org/all/CAHk-=whigg3hvOy7c1j1MXFy6o6CHp0g4Tc3Y-MAk+XDssHU0A@mail.gmail.com/
I will respin a new version, dropping this patch from the series to keep Linus’
blood pressure in check.
--breno
^ permalink raw reply
* RE: [PATCH v7 5/6] iio: adc: ad4691: add oversampling support
From: Sabau, Radu bogdan @ 2026-04-14 10:32 UTC (permalink / raw)
To: David Lechner, Lars-Peter Clausen, Hennerich, Michael,
Jonathan Cameron, Sa, Nuno, Andy Shevchenko, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Uwe Kleine-König,
Liam Girdwood, Mark Brown, Linus Walleij, Bartosz Golaszewski,
Philipp Zabel, Jonathan Corbet, Shuah Khan
Cc: linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <742b1821-9103-414e-a860-c2e8d5406e35@baylibre.com>
> -----Original Message-----
> From: David Lechner <dlechner@baylibre.com>
> Sent: Saturday, April 11, 2026 12:15 AM
...
> >
> > osc_idx = FIELD_GET(AD4691_OSC_FREQ_MASK, reg_val);
> > - /* Wait 2 oscillator periods for the conversion to complete. */
> > - period_us = DIV_ROUND_UP(2UL * USEC_PER_SEC,
> ad4691_osc_freqs_Hz[osc_idx]);
> > + /* Wait osr oscillator periods for all accumulator samples to complete.
> */
>
> Why did we need to wait 2 before and only 1 now when OSR == 1?
>
You are right, that extra period should exist when reading raw, independent
of the OSR. If OSR = 4, we should wait 5 periods to make sure we are reading
a correct result, since single_shot_read doesn’t use any interrupts as the
buffers do.
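As a quick check of the arithmetic, the wait would be OSR + 1 oscillator periods, rounded up to whole microseconds. This is a minimal userspace sketch under that assumption; `raw_read_wait_us()` is a hypothetical name and the oscillator frequencies in the example are illustrative, not values from ad4691_osc_freqs_Hz[].

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/*
 * Time to wait, in microseconds, for osr accumulator samples plus one extra
 * oscillator period of margin. The division rounds up, as DIV_ROUND_UP()
 * does. Returns 0 for a zero frequency instead of dividing by zero.
 */
static uint64_t raw_read_wait_us(uint32_t osr, uint32_t osc_hz)
{
	uint64_t periods = (uint64_t)osr + 1;

	if (osc_hz == 0)
		return 0;

	return (periods * USEC_PER_SEC + osc_hz - 1) / osc_hz;
}

/* Usage: OSR == 1 keeps the old two-period wait; OSR == 4 waits five. */
static void example(void)
{
	assert(raw_read_wait_us(1, 1000000) == 2);
	assert(raw_read_wait_us(4, 1000000) == 5);
}
```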
^ permalink raw reply
* [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Jiayuan Chen @ 2026-04-14 10:57 UTC (permalink / raw)
To: bpf
Cc: Jiayuan Chen, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu,
Eric Dumazet, Neal Cardwell, Kuniyuki Iwashima, David S. Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
David Ahern, netdev, linux-doc, linux-kernel
A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
to inject custom TCP header options. When the kernel builds a TCP packet,
it calls tcp_established_options() to calculate the header size, which
invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
callback.
If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
__tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
tcp_current_mss(), which calls tcp_established_options() again,
re-triggering the same BPF callback. This creates an infinite recursion
that exhausts the kernel stack and causes a panic.
BPF_SOCK_OPS_HDR_OPT_LEN_CB
-> bpf_setsockopt(TCP_NODELAY)
-> tcp_push_pending_frames()
-> tcp_current_mss()
-> tcp_established_options()
-> bpf_skops_hdr_opt_len()
/* infinite recursion */
-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
A similar reentrancy issue exists for TCP congestion control, which is
guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
bpf_setsockopt(TCP_NODELAY) and bpf_setsockopt(TCP_CORK) calls that would
trigger tcp_push_pending_frames() and cause the recursion.
Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 0813a841566f ("bpf: tcp: Allow bpf prog to write and parse TCP header option")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
Documentation/networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 11 ++++++++++-
net/core/filter.c | 4 ++++
net/ipv4/tcp_minisocks.c | 1 +
net/ipv4/tcp_output.c | 3 +++
5 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 563daea10d6c..07d3226d90cc 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -152,6 +152,7 @@ unsigned_int keepalive_intvl
int linger2
u8 bpf_sock_ops_cb_flags
u8:1 bpf_chg_cc_inprogress
+u8:1 bpf_hdr_opt_len_cb_inprogress
u16 timeout_rehash
u32 rcv_ooopack
u32 rcv_rtt_last_tsecr
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f72eef31fa23..2bfb73cf922e 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -475,12 +475,21 @@ struct tcp_sock {
u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs
* values defined in uapi/linux/tcp.h
*/
- u8 bpf_chg_cc_inprogress:1; /* In the middle of
+ u8 bpf_chg_cc_inprogress:1, /* In the middle of
* bpf_setsockopt(TCP_CONGESTION),
* it is to avoid the bpf_tcp_cc->init()
* to recur itself by calling
* bpf_setsockopt(TCP_CONGESTION, "itself").
*/
+ bpf_hdr_opt_len_cb_inprogress:1; /* It is set before invoking the
+ * callback so that a nested
+ * bpf_setsockopt(TCP_NODELAY) or
+ * bpf_setsockopt(TCP_CORK) cannot
+ * trigger tcp_push_pending_frames(),
+ * which would call tcp_current_mss()
+ * -> bpf_skops_hdr_opt_len(), causing
+ * infinite recursion.
+ */
#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG)
#else
#define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0
diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158fb0..518699429a7a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5483,6 +5483,10 @@ static int sol_tcp_sockopt(struct sock *sk, int optname,
if (sk->sk_protocol != IPPROTO_TCP)
return -EINVAL;
+ if ((optname == TCP_NODELAY || optname == TCP_CORK) &&
+ tcp_sk(sk)->bpf_hdr_opt_len_cb_inprogress)
+ return -EBUSY;
+
switch (optname) {
case TCP_NODELAY:
case TCP_MAXSEG:
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index dafb63b923d0..fb06c464ac16 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -663,6 +663,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
RCU_INIT_POINTER(newtp->fastopen_rsk, NULL);
newtp->bpf_chg_cc_inprogress = 0;
+ newtp->bpf_hdr_opt_len_cb_inprogress = 0;
tcp_bpf_clone(sk, newsk);
__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 326b58ff1118..c9654e690e1a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -475,6 +475,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
unsigned int *remaining)
{
struct bpf_sock_ops_kern sock_ops;
+ struct tcp_sock *tp = tcp_sk(sk);
int err;
if (likely(!BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk),
@@ -519,7 +520,9 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
if (skb)
bpf_skops_init_skb(&sock_ops, skb, 0);
+ tp->bpf_hdr_opt_len_cb_inprogress = 1;
err = BPF_CGROUP_RUN_PROG_SOCK_OPS_SK(&sock_ops, sk);
+ tp->bpf_hdr_opt_len_cb_inprogress = 0;
if (err || sock_ops.remaining_opt_len == *remaining)
return;
--
2.43.0
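The guard in this patch follows a common in-progress flag pattern. The sketch below models it in plain userspace C to show why the nested call is cut off after one level; `struct conn`, `set_option()`, `compute_header_len()` and `user_callback()` are hypothetical stand-ins for the socket, bpf_setsockopt(), bpf_skops_hdr_opt_len() and the BPF program, not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-socket state. */
struct conn {
	unsigned int cb_inprogress : 1; /* set while the callback runs */
	int callback_runs;              /* how many times the callback ran */
	int nested_result;              /* what the nested set_option() returned */
	int outer_result;               /* what the outermost set_option() returned */
};

static int set_option(struct conn *c);

/* Models a callback that calls back into the option setter. */
static void user_callback(struct conn *c)
{
	c->callback_runs++;
	c->nested_result = set_option(c);
}

/* Models the header-length computation that invokes the callback. */
static void compute_header_len(struct conn *c)
{
	c->cb_inprogress = 1;
	user_callback(c);
	c->cb_inprogress = 0;
}

/* Models the option setter; without the flag check this would never return. */
static int set_option(struct conn *c)
{
	if (c->cb_inprogress)
		return -EBUSY;

	compute_header_len(c); /* stands in for the push-frames path */
	return 0;
}

/* Run one outermost call and return the resulting state by value. */
static struct conn demo(void)
{
	struct conn c = { 0 };

	c.outer_result = set_option(&c);
	return c;
}

/* Usage: the callback runs once and its nested call is refused. */
static void example(void)
{
	assert(demo().callback_runs == 1);
	assert(demo().nested_result == -EBUSY);
}
```

In this model the outermost call still succeeds; only the call made from inside the callback sees -EBUSY, which matches the behaviour the patch describes for sol_tcp_sockopt().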
^ permalink raw reply related
* Re: maintainer profiles
From: Krzysztof Kozlowski @ 2026-04-14 11:18 UTC (permalink / raw)
To: Randy Dunlap, Linux Documentation, Linux Kernel Mailing List
Cc: Jonathan Corbet, Linux Kernel Workflows
In-Reply-To: <b7775383-da94-4098-8af9-2f672c4f1a71@infradead.org>
On 10/04/2026 02:18, Randy Dunlap wrote:
> Hi,
>
> Is there supposed to be a difference (or distinction) in the contents of
>
> Documentation/process/maintainer-handbooks.rst
> and
> Documentation/maintainer/maintainer-entry-profile.rst
> ?
>
> Can they be combined into one location?
Yes, please! That should also include the location of the actual profiles. I
mostly look at them in the sources directly, not the web docs, so it is
confusing and annoying to find them scattered around.
Best regards,
Krzysztof
^ permalink raw reply
* Re: maintainer profiles
From: Mauro Carvalho Chehab @ 2026-04-14 12:37 UTC (permalink / raw)
To: Dan Williams
Cc: Jonathan Corbet, Randy Dunlap, Linux Documentation,
Linux Kernel Mailing List, Linux Kernel Workflows
In-Reply-To: <69dd6299440be_147c801005b@djbw-dev.notmuch>
On Mon, 13 Apr 2026 14:39:37 -0700
Dan Williams <djbw@kernel.org> wrote:
> Jonathan Corbet wrote:
> > Randy Dunlap <rdunlap@infradead.org> writes:
> >
> > > Hi,
> > >
> > > Is there supposed to be a difference (or distinction) in the contents of
> > >
> > > Documentation/process/maintainer-handbooks.rst
> > > and
> > > Documentation/maintainer/maintainer-entry-profile.rst
> > > ?
> > >
> > > Can they be combined into one location?
> >
> > Late to the party, sorry ... the original idea, I believe, was that
> > maintainer-handbooks.rst would be for developers looking for a guidebook
> > for a specific subsystem, while maintainer-entry-profile.rst was about
> > how maintainers themselves should write their subsystem guide.
> > Doubtless things have drifted since then... But the intended audiences
> > were different, so it might be good to think about bringing them back
> > into focus.
>
> Right, I think something (roughly / hand-wavy) like the below is the
> intent. However, as I write that I notice that the combined list is a
> bit of a mess. I also notice that there are more "P:" entries in
> MAINTAINERS than there are entries in this maintainer-handbooks.rst
> list.
>
> So this probably wants to be a script that can build Documentation links
> from MAINTAINERS, or otherwise provide a script for developers to query
> a kernel tree for additional submission guides. It is probably not as
> important for the built docs to link all guides as it is for developers
> (or their agents) to live query a tree they are developing against.
There is already a Python script which parses MAINTAINERS file
(Documentation/sphinx/maintainers_include.py).
Currently, it expects a Sphinx meta-tag inside
Documentation/process/maintainers.rst:
.. maintainers-include::
I guess it shouldn't be hard to add support there for a
.. maintainers-profile::
Making it create a set of cross-references is probably easy. I am not
sure how easy or hard it would be to create a TOC tree, though.
> Note the problem goes both ways, there are P: entries not in the
> combined handbook list, like the Security subsystem, and there are
> handbook entries without a P:, like the Tip tree.
Assuming we add such extension, we'll need to sync the P: entries.
I'll take a look at extending the Sphinx maintainers
extension.
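For reference, pulling the profile path out of a MAINTAINERS line is a small amount of work in any language. The sketch below does it in C purely as an illustration of the matching rule; `profile_path()` is a hypothetical name, it is not part of maintainers_include.py, and the caller is assumed to have stripped the trailing newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If line is a "P:" entry, return a pointer to its value with the leading
 * blanks skipped. Return NULL for any other line or for an empty value.
 */
static const char *profile_path(const char *line)
{
	if (!line || strncmp(line, "P:", 2) != 0)
		return NULL;

	line += 2;
	while (*line == ' ' || *line == '\t')
		line++;

	return *line ? line : NULL;
}

/* Usage: only the P: line yields a path; the M: line is skipped. */
static void example(void)
{
	const char *p = profile_path("P:\tDocumentation/process/maintainer-netdev.rst");

	assert(p && strcmp(p, "Documentation/process/maintainer-netdev.rst") == 0);
	assert(profile_path("M:\tSomeone <someone@example.org>") == NULL);
}
```

Comparing the set of paths collected this way against the toctree entries would show both kinds of mismatch Dan mentions.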
>
> diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
> index 6020d188e13d..58e2af333692 100644
> --- a/Documentation/maintainer/maintainer-entry-profile.rst
> +++ b/Documentation/maintainer/maintainer-entry-profile.rst
> @@ -92,24 +92,8 @@ full series, or privately send a reminder email. This section might also
> list how review works for this code area and methods to get feedback
> that are not directly from the maintainer.
>
> -Existing profiles
> ------------------
> -
> -For now, existing maintainer profiles are listed here; we will likely want
> -to do something different in the near future.
> -
> -.. toctree::
> - :maxdepth: 1
> -
> - ../doc-guide/maintainer-profile
> - ../nvdimm/maintainer-entry-profile
> - ../arch/riscv/patch-acceptance
> - ../process/maintainer-soc
> - ../process/maintainer-soc-clean-dts
> - ../driver-api/media/maintainer-entry-profile
> - ../process/maintainer-netdev
> - ../driver-api/vfio-pci-device-specific-driver-acceptance
> - ../nvme/feature-and-quirk-policy
> - ../filesystems/nfs/nfsd-maintainer-entry-profile
> - ../filesystems/xfs/xfs-maintainer-entry-profile
> - ../mm/damon/maintainer-profile
> +Maintainer Handbooks
> +--------------------
> +
> +For examples of other subsystem handbooks see
> +Documentation/process/maintainer-handbooks.rst.
> diff --git a/Documentation/process/maintainer-handbooks.rst b/Documentation/process/maintainer-handbooks.rst
> index 976391cec528..bc9299a04b1f 100644
> --- a/Documentation/process/maintainer-handbooks.rst
> +++ b/Documentation/process/maintainer-handbooks.rst
> @@ -9,14 +9,33 @@ The purpose of this document is to provide subsystem specific information
> which is supplementary to the general development process handbook
> :ref:`Documentation/process <development_process_main>`.
>
> +For developers, see below for all the known subsystem specific guides.
> +If the subsystem you are contributing to does not have a guide listed
> +here, it is fair to seek clarification of questions raised in
> +Documentation/maintainer/maintainer-entry-profile.rst.
> +
> +For maintainers, consider documenting additional requirements and
> +expectations if submissions routinely overlook specific submission
> +criteria. See Documentation/maintainer/maintainer-entry-profile.rst.
> +
> Contents:
>
> .. toctree::
> :numbered:
> :maxdepth: 2
>
> + maintainer-kvm-x86
> maintainer-netdev
> maintainer-soc
> maintainer-soc-clean-dts
> + maintainer-soc-clean-dts
> maintainer-tip
> - maintainer-kvm-x86
> + ../arch/riscv/patch-acceptance
> + ../doc-guide/maintainer-profile
> + ../driver-api/media/maintainer-entry-profile
> + ../driver-api/vfio-pci-device-specific-driver-acceptance
> + ../filesystems/nfs/nfsd-maintainer-entry-profile
> + ../filesystems/xfs/xfs-maintainer-entry-profile
> + ../mm/damon/maintainer-profile
> + ../nvdimm/maintainer-entry-profile
> + ../nvme/feature-and-quirk-policy
Looks good to me.
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
--
Thanks,
Mauro
^ permalink raw reply
* Re: [PATCH] docs: fix typos in kernel documentation
From: Jonathan Corbet @ 2026-04-14 12:54 UTC (permalink / raw)
To: fru1tworld; +Cc: skhan, linux-doc, fru1tworld
In-Reply-To: <20260414084553.22762-1-fruitworld.planet@gmail.com>
Thank you for working to improve our documentation.
fru1tworld <fruitworld.planet@gmail.com> writes:
> reinitalizes => reinitializes
> unpriviledged => unprivileged
> the the => the (duplicated word)
> sub-struture => sub-structure
These changes generally look OK, but...
> Signed-off-by: fru1tworld <fruitworld.planet@gmail.com>
We need a proper signoff with your real name, please.
> ---
> Documentation/block/data-integrity.rst | 2 +-
> Documentation/core-api/list.rst | 2 +-
> Documentation/core-api/real-time/differences.rst | 2 +-
This one has already been fixed; it's always best to prepare your
patches against docs-next or linux-next.
> Documentation/gpu/drm-uapi.rst | 2 +-
> 4 files changed, 4 insertions(+), 4 deletions(-)
Thanks,
jon
^ permalink raw reply
* RE: [PATCH v7 6/6] docs: iio: adc: ad4691: add driver documentation
From: Sabau, Radu bogdan @ 2026-04-14 12:54 UTC (permalink / raw)
To: David Lechner, Lars-Peter Clausen, Hennerich, Michael,
Jonathan Cameron, Sa, Nuno, Andy Shevchenko, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Uwe Kleine-König,
Liam Girdwood, Mark Brown, Linus Walleij, Bartosz Golaszewski,
Philipp Zabel, Jonathan Corbet, Shuah Khan
Cc: linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <9c36ee85-12da-41e8-b9ab-e32b7ec29e75@baylibre.com>
> -----Original Message-----
> From: David Lechner <dlechner@baylibre.com>
> Sent: Saturday, April 11, 2026 12:39 AM
...
> > +Buffer data format
> > +==================
> > +
> > +The IIO buffer data format (``in_voltageN_type``) is the same across all
> > +paths: 16-bit unsigned big-endian samples with no shift.
> > +
> > ++-------------------------+-------------+----------+-------+
> > +| Path | storagebits | realbits | shift |
> > ++=========================+=============+==========+=======+
> > +| Triggered buffer | 16 | 16 | 0 |
> > ++-------------------------+-------------+----------+-------+
> > +| CNV Burst offload (DMA) | 16 | 16 | 0 |
> > ++-------------------------+-------------+----------+-------+
> > +| Manual offload (DMA) | 16 | 16 | 0 |
> > ++-------------------------+-------------+----------+-------+
>
> Not sure this table is helpful since all values are the same everywhere.
>
> Also, doesn't SPI offload have storagebits == 32?
I tried using 16 storage bits for offload too, so that the same channels
macro can be used. For Manual, the sample is received in the next transfer,
and for CNV Burst only the receive transfers are RX-streamed, so 16 storage
bits suffice for both.
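For reference, a minimal sketch of how userspace could unpack a buffer
in the format described above (16-bit unsigned big-endian samples, no
shift). The function name decode_samples() is hypothetical, and the
sketch assumes the raw bytes contain only back-to-back 16-bit samples,
with no timestamp or padding.

```python
import struct


def decode_samples(raw):
    """Decode back-to-back 16-bit unsigned big-endian samples."""
    count = len(raw) // 2
    # ">" selects big-endian, "H" is an unsigned 16-bit integer.
    return list(struct.unpack(f">{count}H", raw[:count * 2]))
```

Since realbits equals storagebits and the shift is 0, no masking or
shifting is needed after the byte swap.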
^ permalink raw reply
* Re: [PATCH V10 00/10] famfs: port into fuse
From: Miklos Szeredi @ 2026-04-14 13:19 UTC (permalink / raw)
To: Joanne Koong
Cc: John Groves, Bernd Schubert, John Groves, Dan Williams,
Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJnrk1a06zkUmXW5EFiUmgAoFauwtzsYvnotaPH0ifVtyh7iDQ@mail.gmail.com>
On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
> Overall, my intention with bringing this up is just to make sure we're
> at least aware of this alternative before anything is merged and
> permanent. If Miklos and you think we should land this series, then
> I'm on board with that.
TBH, I'd prefer not to add the famfs specific mapping interface if not
absolutely necessary. This was the main sticking point originally,
but there seemed to be no better alternative.
However with the bpf approach this would be gone, which is great.
So let us please at least have a try at this. I'm not into bpf yet,
but willing to learn.
Thanks,
Miklos
^ permalink raw reply
* Re: [PATCH V10 00/10] famfs: port into fuse
From: John Groves @ 2026-04-14 13:41 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Joanne Koong, Bernd Schubert, John Groves, Dan Williams,
Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <CAJfpegvVTcV89=q3L326aGQjhduBcv7PVg5QKftGLjNZmCLmaw@mail.gmail.com>
On 26/04/14 03:19PM, Miklos Szeredi wrote:
> On Fri, 10 Apr 2026 at 21:44, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> > Overall, my intention with bringing this up is just to make sure we're
> > at least aware of this alternative before anything is merged and
> > permanent. If Miklos and you think we should land this series, then
> > I'm on board with that.
>
> TBH, I'd prefer not to add the famfs specific mapping interface if not
> absolutely necessary. This was the main sticking point originally,
> but there seemed to be no better alternative.
>
> However with the bpf approach this would be gone, which is great.
>
> So let us please at least have a try at this. I'm not into bpf yet,
> but willing to learn.
>
> Thanks,
> Miklos
Thanks for responding...
My short response: Noooooooooo!!!!!!
I very strongly object to making this a prerequisite to merging. This
is an untested idea that will certainly delay us by at least a couple
of merge windows when products are shipping now, and the existing approach
has been in circulation for a long time. It is TOO LATE!!!!!!
Famfs is not a science project, it's enablement for actual products and
early versions are available now!!!
That doesn't mean we couldn't convert later IF THERE ARE NO HIDDEN PROBLEMS.
What are the risks of converting to BPF?
- I don't know how to do it - so it'll be slow (kinda like my fuse learning
curve, which cost about a year because this is not that similar to anything
else that was already in fuse).
- Those of us who are involved don't fully understand either the security
or performance implications of this. It
- Famfs is enabling access to memory and mapping fault handling must be
at "memory speed". We know that BPF walks some data structures when a
program executes. That exposes us to additional serialized L3 cache
misses each time we service a mapping fault (any TLB & page table miss).
This should be studied side-by-side with the existing approach under
multiple loads before being adopted for production.
- This has never been done in production, and we're throwing it in the way
of a project that has been soaking for years and needs to support early
shipments of products.
If this is the only path, I'd like to revive famfs as a standalone file
system. I'm still maintaining that and it's still in use.
Please reconsider Miklos. To use an American football metaphor, this moves
the goal posts by a mile, and that's not reasonable!!!
Thanks,
John
^ permalink raw reply
* Re: [PATCH V10 00/10] famfs: port into fuse
From: Miklos Szeredi @ 2026-04-14 14:18 UTC (permalink / raw)
To: John Groves
Cc: Joanne Koong, Bernd Schubert, John Groves, Dan Williams,
Bernd Schubert, Alison Schofield, John Groves, Jonathan Corbet,
Shuah Khan, Vishal Verma, Dave Jiang, Matthew Wilcox, Jan Kara,
Alexander Viro, David Hildenbrand, Christian Brauner,
Darrick J . Wong, Randy Dunlap, Jeff Layton, Amir Goldstein,
Jonathan Cameron, Stefan Hajnoczi, Josef Bacik, Bagas Sanjaya,
Chen Linxuan, James Morse, Fuad Tabba, Sean Christopherson,
Shivank Garg, Ackerley Tng, Gregory Price, Aravind Ramesh,
Ajay Joshi, venkataravis@micron.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, nvdimm@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org, djbw
In-Reply-To: <ad4_jFsR951c2Mtn@groves.net>
On Tue, 14 Apr 2026 at 15:41, John Groves <John@groves.net> wrote:
> My short response: Noooooooooo!!!!!!
:) Seems like this is a highly emotional topic... I suggest that we
go ahead with bpf experiments, then discuss results and path forward
at LSM.
Thanks,
Miklos
^ permalink raw reply
* [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
From: Kiryl Shutsemau (Meta) @ 2026-04-14 14:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, David Hildenbrand, Lorenzo Stoakes, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Liam R . Howlett, Zi Yan,
Jonathan Corbet, Shuah Khan, Sean Christopherson, Paolo Bonzini,
linux-mm, linux-kernel, linux-doc, linux-kselftest, kvm,
Kiryl Shutsemau (Meta)
This series adds userfaultfd support for tracking the working set of
VM guest memory, enabling VMMs to identify cold pages and evict them
to tiered or remote storage.
== Problem ==
VMMs managing guest memory need to:
1. Track which pages are actively used (working set detection)
2. Safely evict cold pages to slower storage
3. Fetch pages back on demand when accessed again
For shmem-backed guest memory, working set tracking partially works
today: MADV_DONTNEED zaps PTEs while pages stay in page cache, and
re-access auto-resolves from cache. But safe eviction still requires
synchronous fault interception to prevent data loss races.
For anonymous guest memory (needed for KSM cross-VM deduplication),
there is no mechanism at all — clearing a PTE loses the page.
== Solution ==
The series introduces a unified userfaultfd interface that works
across both anonymous and shmem-backed memory:
UFFD_FEATURE_MINOR_ANON: extends MODE_MINOR registration to anonymous
private memory. Uses the PROT_NONE hinting mechanism (same as NUMA
balancing) to make pages inaccessible without freeing them.
UFFD_FEATURE_MINOR_ASYNC: auto-resolves minor faults without handler
involvement. The kernel restores PTE permissions immediately and the
faulting thread continues. Works for anonymous, shmem, and hugetlbfs.
UFFDIO_DEACTIVATE: marks pages as deactivated. For anonymous memory,
sets PROT_NONE on PTEs (pages stay resident). For shmem/hugetlbfs,
zaps PTEs (pages stay in page cache).
UFFDIO_SET_MODE: toggles MINOR_ASYNC at runtime, synchronized via
mmap_write_lock. Enables the VMM workflow: async mode for lightweight
detection, sync mode for race-free eviction.
PAGE_IS_UFFD_DEACTIVATED: PAGEMAP_SCAN category flag for efficient
batch detection of cold (still-deactivated) anonymous pages.
== VMM Workflow ==
UFFDIO_DEACTIVATE(all) -- async, no vCPU stalls
sleep(interval)
PAGEMAP_SCAN -- find cold pages
UFFDIO_SET_MODE(sync) -- block faults for eviction
pwrite + MADV_DONTNEED cold pages -- safe, faults block
UFFDIO_SET_MODE(async) -- resume tracking
The same workflow applies to shmem, with a different PAGEMAP_SCAN mask
(!PAGE_IS_PRESENT instead of PAGE_IS_UFFD_DEACTIVATED).
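The cycle above can be illustrated with a toy model. This is only a
sketch of the bookkeeping, written in Python for brevity: the class
ToyTracker and its methods are hypothetical and do not call any kernel
interface. Async mode is modelled as an access silently clearing the
deactivated state; sync mode is modelled as the access being held until
a handler resolves it.

```python
class ToyTracker:
    """Toy model of the deactivate/scan/evict cycle; not a kernel interface."""

    def __init__(self, npages):
        self.deactivated = [False] * npages
        self.sync = False   # False models async mode, True models sync mode
        self.blocked = []   # accesses that would wait for the handler

    def deactivate_all(self):
        # Models UFFDIO_DEACTIVATE over the whole range.
        self.deactivated = [True] * len(self.deactivated)

    def set_mode(self, sync):
        # Models UFFDIO_SET_MODE.
        self.sync = sync

    def touch(self, page):
        """Model a guest access; return True if the access proceeds."""
        if not self.deactivated[page]:
            return True
        if self.sync:
            self.blocked.append(page)
            return False
        # Async mode: the fault is auto-resolved and the page counts as hot.
        self.deactivated[page] = False
        return True

    def scan_cold(self):
        # Models the PAGEMAP_SCAN step: pages that are still deactivated.
        return [i for i, d in enumerate(self.deactivated) if d]


tracker = ToyTracker(4)
tracker.deactivate_all()
tracker.touch(1)
tracker.touch(3)
cold = tracker.scan_cold()   # pages never touched during the interval
tracker.set_mode(sync=True)  # accesses to cold pages now block
```

In the model, eviction of the pages in cold would happen while sync mode
is set, so a concurrent access is held back rather than racing with the
write-out.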
== NUMA Balancing ==
NUMA balancing scanning is skipped on anonymous VM_UFFD_MINOR VMAs to
avoid protnone conflicts. NUMA locality stats are fed from the uffd
fault path via task_numa_fault() so the scheduler retains placement
data. Shmem VMAs are unaffected (UFFDIO_DEACTIVATE zaps PTEs there,
no protnone involved).
== Testing ==
The series includes 6 new selftests covering async/sync modes,
PAGEMAP_SCAN cold detection, GUP through protnone, UFFDIO_SET_MODE
toggling, and cleanup on close. All 73 uffd unit tests pass
(including hugetlb) across defconfig, allnoconfig, allmodconfig,
and randomized configs.
Kiryl Shutsemau (Meta) (12):
userfaultfd: define UAPI constants for anonymous minor faults
userfaultfd: add UFFD_FEATURE_MINOR_ANON registration support
userfaultfd: implement UFFDIO_DEACTIVATE ioctl
userfaultfd: UFFDIO_CONTINUE for anonymous memory
mm: intercept protnone faults on VM_UFFD_MINOR anonymous VMAs
userfaultfd: auto-resolve shmem and hugetlbfs minor faults in async
mode
sched/numa: skip scanning anonymous VM_UFFD_MINOR VMAs
userfaultfd: enable UFFD_FEATURE_MINOR_ANON
mm/pagemap: add PAGE_IS_UFFD_DEACTIVATED to PAGEMAP_SCAN
userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
selftests/mm: add userfaultfd anonymous minor fault tests
Documentation/userfaultfd: document working set tracking
Documentation/admin-guide/mm/userfaultfd.rst | 141 ++++-
fs/proc/task_mmu.c | 11 +-
fs/userfaultfd.c | 184 +++++-
include/linux/huge_mm.h | 6 +
include/linux/mm.h | 2 +
include/linux/sched/numa_balancing.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/trace/events/sched.h | 3 +-
include/uapi/linux/fs.h | 1 +
include/uapi/linux/userfaultfd.h | 40 +-
kernel/sched/fair.c | 13 +
mm/huge_memory.c | 33 +-
mm/hugetlb.c | 3 +-
mm/memory.c | 51 +-
mm/mprotect.c | 9 +-
mm/shmem.c | 3 +-
mm/userfaultfd.c | 164 +++++-
tools/testing/selftests/mm/uffd-unit-tests.c | 458 +++++++++++++++
18 files changed, 1096 insertions(+), 48 deletions(-)
--
2.51.2
^ permalink raw reply
* [RFC, PATCH 01/12] userfaultfd: define UAPI constants for anonymous minor faults
From: Kiryl Shutsemau (Meta) @ 2026-04-14 14:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Peter Xu, David Hildenbrand, Lorenzo Stoakes, Mike Rapoport,
Suren Baghdasaryan, Vlastimil Babka, Liam R . Howlett, Zi Yan,
Jonathan Corbet, Shuah Khan, Sean Christopherson, Paolo Bonzini,
linux-mm, linux-kernel, linux-doc, linux-kselftest, kvm,
Kiryl Shutsemau (Meta)
In-Reply-To: <20260414142354.1465950-1-kas@kernel.org>
Add UAPI definitions for userfaultfd working set tracking on anonymous
memory:
- UFFD_FEATURE_MINOR_ANON: minor fault support for anonymous memory
- UFFD_FEATURE_MINOR_ASYNC: auto-resolve minor faults without handler
- UFFDIO_DEACTIVATE: mark pages as deactivated (protnone or PTE zap)
Not yet added to UFFD_API_FEATURES or UFFD_API_RANGE_IOCTLS.
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Assisted-by: Claude:claude-opus-4-6
---
include/uapi/linux/userfaultfd.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 2841e4ea8f2c..336d07e1b6de 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -79,6 +79,7 @@
#define _UFFDIO_WRITEPROTECT (0x06)
#define _UFFDIO_CONTINUE (0x07)
#define _UFFDIO_POISON (0x08)
+#define _UFFDIO_DEACTIVATE (0x09)
#define _UFFDIO_API (0x3F)
/* userfaultfd ioctl ids */
@@ -103,6 +104,8 @@
struct uffdio_continue)
#define UFFDIO_POISON _IOWR(UFFDIO, _UFFDIO_POISON, \
struct uffdio_poison)
+#define UFFDIO_DEACTIVATE _IOR(UFFDIO, _UFFDIO_DEACTIVATE, \
+ struct uffdio_range)
/* read() structure */
struct uffd_msg {
@@ -230,6 +233,18 @@ struct uffdio_api {
*
* UFFD_FEATURE_MOVE indicates that the kernel supports moving an
* existing page contents from userspace.
+ *
+ * UFFD_FEATURE_MINOR_ANON indicates that minor fault interception
+ * is supported for anonymous private memory. Pages are made
+ * inaccessible via UFFDIO_DEACTIVATE (sets PROT_NONE while
+ * preserving the page) and faults are delivered when the pages
+ * are re-accessed.
+ *
+ * UFFD_FEATURE_MINOR_ASYNC indicates asynchronous minor fault
+ * mode. When set, faults on deactivated pages are auto-resolved
+ * by the kernel (PTE permissions restored immediately) without
+ * delivering a message to the userfaultfd handler. Use
+ * PAGEMAP_SCAN to find pages that were not re-accessed.
*/
#define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
#define UFFD_FEATURE_EVENT_FORK (1<<1)
@@ -248,6 +263,8 @@ struct uffdio_api {
#define UFFD_FEATURE_POISON (1<<14)
#define UFFD_FEATURE_WP_ASYNC (1<<15)
#define UFFD_FEATURE_MOVE (1<<16)
+#define UFFD_FEATURE_MINOR_ANON (1<<17)
+#define UFFD_FEATURE_MINOR_ASYNC (1<<18)
__u64 features;
__u64 ioctls;
--
2.51.2
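As a cross-check of the constants in this patch, a small sketch that
computes the resulting values, written in Python for brevity. It assumes
the asm-generic ioctl encoding (8 bits nr, 8 bits type, 14 bits size,
_IOC_READ == 2); architectures with their own asm/ioctl.h layout will
produce different numbers. The helper name ioc_read() is hypothetical.

```python
import struct

# asm-generic/ioctl.h layout (an assumption; not valid on every architecture).
_IOC_NRSHIFT = 0
_IOC_TYPESHIFT = 8
_IOC_SIZESHIFT = 16
_IOC_DIRSHIFT = 30
_IOC_READ = 2

UFFDIO = 0xAA              # ioctl type used by userfaultfd
_UFFDIO_DEACTIVATE = 0x09  # from this patch

# struct uffdio_range is two __u64 fields (start, len).
UFFDIO_RANGE_SIZE = struct.calcsize("=QQ")


def ioc_read(ioc_type, nr, size):
    """Equivalent of _IOR(type, nr, struct) under the generic layout."""
    return ((_IOC_READ << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (ioc_type << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


UFFDIO_DEACTIVATE = ioc_read(UFFDIO, _UFFDIO_DEACTIVATE, UFFDIO_RANGE_SIZE)
UFFD_FEATURE_MINOR_ANON = 1 << 17
UFFD_FEATURE_MINOR_ASYNC = 1 << 18
```

Under these assumptions UFFDIO_DEACTIVATE evaluates to 0x8010AA09.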
^ permalink raw reply related