Linux Documentation
* Re: [PATCH] docs: update rust-analyzer command
From: Miguel Ojeda @ 2026-05-13 10:29 UTC (permalink / raw)
  To: Onur Özkan, Tamir Duberstein, Jesung Yang
  Cc: rust-for-linux, linux-doc, linux-kernel, ojeda, boqun, gary,
	bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, dakr, corbet,
	skhan, alexs, si.yanteng, dzm91
In-Reply-To: <20260513092017.265269-1-work@onurozkan.dev>

On Wed, May 13, 2026 at 11:20 AM Onur Özkan <work@onurozkan.dev> wrote:
>
> diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst
> index a6ec3fa94d33..df5b54b51deb 100644
> --- a/Documentation/rust/quick-start.rst
> +++ b/Documentation/rust/quick-start.rst
> @@ -314,7 +314,7 @@ definition, and other features.
>  ``rust-analyzer`` needs a configuration file, ``rust-project.json``, which
>  can be generated by the ``rust-analyzer`` Make target::
>
> -       make LLVM=1 rust-analyzer
> +       make LLVM=1 prepare rust-analyzer

Perhaps we should add a brief sentence after this code block
explaining why `prepare` is there, e.g. adapted from the commit
message:

    For the best experience, it is recommended to make ``prepare``
    together with the ``rust-analyzer`` target so that all generated
    files (e.g. proc macros) are available.

Cc'ing Tamir and Jesung.

Cheers,
Miguel


* [PATCH v3 0/2] hwmon: Add Murata D1U74T-W PSU driver
From: Abdurrahman Hussain @ 2026-05-13 10:33 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, devicetree, linux-kernel, linux-doc,
	Abdurrahman Hussain, kernel test robot

This series adds a PMBus driver for the Murata D1U74T-W AC/DC power
supply unit, used in some Open Compute Project platforms.

The PSU is PMBus-compliant and uses the linear data format. The driver
exposes:

  - input/output voltage, current and power telemetry,
  - three temperature sensors,
  - dual fan tachometer monitoring,

through the standard hwmon/pmbus sysfs interface. Probe verifies the
PMBUS_MFR_ID and PMBUS_MFR_MODEL fields before binding so the driver
only attaches to actual D1U74T-W hardware.
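
As an illustration of the linear data format mentioned above, here is a
minimal userspace sketch of decoding a PMBus LINEAR11 word. It is not
taken from the driver (pmbus_core does the real conversion in the
kernel); the function name linear11_to_double is hypothetical, and the
sketch assumes the standard LINEAR11 layout of a signed 5-bit exponent
in bits 15:11 and a signed 11-bit mantissa in bits 10:0.

```c
#include <stdint.h>

/*
 * Decode a PMBus LINEAR11 word (hypothetical helper, userspace sketch).
 * Bits 15:11 hold a two's-complement exponent N, bits 10:0 hold a
 * two's-complement mantissa Y; the real-world value is Y * 2^N.
 */
static double linear11_to_double(uint16_t word)
{
	int exponent = word >> 11;	/* raw 5-bit field, 0..31 */
	int mantissa = word & 0x7ff;	/* raw 11-bit field, 0..2047 */

	/* Sign-extend both fields by hand. */
	if (exponent > 15)
		exponent -= 32;
	if (mantissa > 1023)
		mantissa -= 2048;

	/* Scale by 2^exponent without needing libm. */
	if (exponent >= 0)
		return (double)mantissa * (double)(1 << exponent);

	return (double)mantissa / (double)(1 << -exponent);
}
```

For example, the word 0xF064 has exponent -2 and mantissa 100, so it
decodes to 25.0. Output voltage typically uses a different linear
encoding whose exponent comes from VOUT_MODE, which this sketch does
not cover.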

Patch 1 adds the compatible string to trivial-devices.yaml. The
binding declares only compatible and reg (no regulators, no supplies),
so a standalone binding file is not warranted.

Patch 2 adds the driver, hwmon documentation, Kconfig/Makefile entries
and MAINTAINERS section.

Signed-off-by: Abdurrahman Hussain <abdurrahman@nexthop.ai>
---
Changes in v3 (addresses the sashiko automated review):
- Patch 2: move the new MAINTAINERS entry into the correct
  alphabetical position in the M section (between MULTIPLEXER
  SUBSYSTEM and MUSB MULTIPOINT) instead of leaving it wedged
  between CRPS DRIVER and CRYPTO API.
- Patch 2: rewrite the sysfs-entries table in
  Documentation/hwmon/d1u74t.rst to match the attributes the chip
  actually exposes. The previous table listed the PMBus-spec
  maximal set (crit/lcrit/max/min for in1/in2, crit for temp,
  max/max_alarm for curr1, etc.) but the chip only implements a
  subset; pmbus_core consequently only creates a subset of attrs.
  Cross-checked against two D1U74T-W units; both expose the same
  attribute set. Also fixes the in2_* descriptions that incorrectly
  referred to "input voltage" rather than output voltage (in2 is
  vout1).
- Patch 2: use dev_err_probe() for the MFR_ID-mismatch error path
  in d1u74t_probe(), matching the surrounding error-handling style.
- Patch 2: gate the MFR_MODEL strncmp() on rc >= 8 so a short
  block-read response cannot make the comparison read stale bytes
  left over from the previous MFR_ID read into the same buffer.
- Patch 1 is unchanged from v2.
- Link to v2: https://patch.msgid.link/20260512-d1u74t-v2-0-431d00fbb1c4@nexthop.ai
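
The stale-bytes concern behind the rc >= 8 gate can be sketched in
plain userspace C. This is only an illustration of the idea, not the
driver code: block_has_prefix is a hypothetical helper, and len stands
in for the return value of a block read (negative on error, otherwise
the number of bytes actually received).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true only if a block-read response of length len is long
 * enough to contain prefix and actually starts with it. Checking the
 * length first means bytes beyond len (possibly left over from an
 * earlier read into the same buffer) are never compared.
 */
static bool block_has_prefix(const char *buf, int len, const char *prefix)
{
	size_t n = strlen(prefix);

	if (len < 0 || (size_t)len < n)
		return false;

	return strncmp(buf, prefix, n) == 0;
}
```

With this shape, a 3-byte response is rejected even when the buffer
still happens to hold a matching string from a previous read.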

Changes in v2:
- Patch 1: move the binding into trivial-devices.yaml rather than
  carrying a standalone murata,d1u74t.yaml. The device only declares
  compatible and reg, with no regulators or supplies, so the
  standalone binding was not warranted (Conor Dooley review).
- Patch 2: fix the d1u74t.rst title underline (was 18 '=' chars under
  a 20-char title, docutils warning from the kernel test robot).
- Link to v1: https://patch.msgid.link/20260511-d1u74t-v1-0-623c2bc1532a@nexthop.ai

To: Rob Herring <robh@kernel.org>
To: Krzysztof Kozlowski <krzk+dt@kernel.org>
To: Conor Dooley <conor+dt@kernel.org>
To: Abdurrahman Hussain <abdurrahman@nexthop.ai>
To: Guenter Roeck <linux@roeck-us.net>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: devicetree@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-hwmon@vger.kernel.org
Cc: linux-doc@vger.kernel.org

---
Abdurrahman Hussain (2):
      dt-bindings: trivial-devices: Add Murata D1U74T PSU
      hwmon: (pmbus/d1u74t) Add Murata D1U74T PSU driver

 .../devicetree/bindings/trivial-devices.yaml       |  2 +
 Documentation/hwmon/d1u74t.rst                     | 81 ++++++++++++++++++++
 Documentation/hwmon/index.rst                      |  1 +
 MAINTAINERS                                        |  7 ++
 drivers/hwmon/pmbus/Kconfig                        |  9 +++
 drivers/hwmon/pmbus/Makefile                       |  1 +
 drivers/hwmon/pmbus/d1u74t.c                       | 86 ++++++++++++++++++++++
 7 files changed, 187 insertions(+)
---
base-commit: 5d6919055dec134de3c40167a490f33c74c12581
change-id: 20260511-d1u74t-c0cba8f1c344

Best regards,
--  
Abdurrahman Hussain <abdurrahman@nexthop.ai>



* [PATCH v3 1/2] dt-bindings: trivial-devices: Add Murata D1U74T PSU
From: Abdurrahman Hussain @ 2026-05-13 10:33 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, devicetree, linux-kernel, linux-doc,
	Abdurrahman Hussain
In-Reply-To: <20260513-d1u74t-v3-0-27bcd6852c45@nexthop.ai>

The Murata D1U74T-W is a PMBus-compliant AC/DC power supply unit. The
binding only declares the compatible string and i2c reg, with no
additional properties (no regulators, no supplies), so add it to
trivial-devices.yaml rather than carrying a standalone binding file.

Signed-off-by: Abdurrahman Hussain <abdurrahman@nexthop.ai>
---
 Documentation/devicetree/bindings/trivial-devices.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml
index 23fd4513933a..19c8c7220858 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -352,6 +352,8 @@ properties:
           - mps,mp9941
             # Monolithic Power Systems Inc. digital step-down converter mp9945
           - mps,mp9945
+            # Murata D1U74T-W power supply unit
+          - murata,d1u74t
             # Temperature sensor with integrated fan control
           - national,lm63
             # Temperature sensor with integrated fan control

-- 
2.53.0



* [PATCH v3 2/2] hwmon: (pmbus/d1u74t) Add Murata D1U74T PSU driver
From: Abdurrahman Hussain @ 2026-05-13 10:33 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Jonathan Corbet, Shuah Khan
  Cc: linux-hwmon, devicetree, linux-kernel, linux-doc,
	Abdurrahman Hussain, kernel test robot
In-Reply-To: <20260513-d1u74t-v3-0-27bcd6852c45@nexthop.ai>

Add a PMBus driver for Murata D1U74T power supplies.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605122253.zInzmUeX-lkp@intel.com/
Signed-off-by: Abdurrahman Hussain <abdurrahman@nexthop.ai>
---
 Documentation/hwmon/d1u74t.rst | 81 +++++++++++++++++++++++++++++++++++++++
 Documentation/hwmon/index.rst  |  1 +
 MAINTAINERS                    |  7 ++++
 drivers/hwmon/pmbus/Kconfig    |  9 +++++
 drivers/hwmon/pmbus/Makefile   |  1 +
 drivers/hwmon/pmbus/d1u74t.c   | 86 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 185 insertions(+)

diff --git a/Documentation/hwmon/d1u74t.rst b/Documentation/hwmon/d1u74t.rst
new file mode 100644
index 000000000000..2856aba97c3a
--- /dev/null
+++ b/Documentation/hwmon/d1u74t.rst
@@ -0,0 +1,81 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+Kernel driver d1u74t
+====================
+
+Supported chips:
+
+  * Murata D1U74T
+
+    Prefix: 'd1u74t'
+
+    Addresses scanned: -
+
+    Datasheet: Only available under NDA.
+
+Authors:
+    Abdurrahman Hussain <abdurrahman@nexthop.ai>
+
+
+Description
+-----------
+
+This driver implements support for Murata D1U74T Power Supply with
+PMBus support.
+
+The driver is a client driver to the core PMBus driver.
+Please see Documentation/hwmon/pmbus.rst for details on PMBus client drivers.
+
+
+Usage Notes
+-----------
+
+This driver does not auto-detect devices. You will have to instantiate the
+devices explicitly. Please see Documentation/i2c/instantiating-devices.rst for
+details.
+
+
+Sysfs entries
+-------------
+
+======================= ======================================================
+curr1_label		"iin"
+curr1_input		Measured input current
+curr1_alarm		Input current alarm
+curr1_rated_max		Maximum rated input current
+
+curr2_label		"iout1"
+curr2_input		Measured output current
+curr2_max		Maximum output current
+curr2_max_alarm		Output current high alarm
+curr2_crit		Critical high output current
+curr2_crit_alarm	Output current critical high alarm
+curr2_rated_max		Maximum rated output current
+
+in1_label		"vin"
+in1_input		Measured input voltage
+in1_alarm		Input voltage alarm
+in1_rated_min		Minimum rated input voltage
+in1_rated_max		Maximum rated input voltage
+
+in2_label		"vout1"
+in2_input		Measured output voltage
+in2_alarm		Output voltage alarm
+in2_rated_min		Minimum rated output voltage
+in2_rated_max		Maximum rated output voltage
+
+power1_label		"pin"
+power1_input		Measured input power
+power1_alarm		Input power alarm
+power1_rated_max	Maximum rated input power
+
+temp[1-3]_input		Measured temperature
+temp[1-3]_max		Maximum temperature
+temp[1-3]_max_alarm	Maximum temperature alarm
+temp[1-3]_rated_max	Maximum rated temperature
+
+fan1_alarm		Fan 1 warning
+fan1_fault		Fan 1 fault
+fan1_input		Fan 1 speed in RPM
+fan1_target		Fan 1 target
+======================= ======================================================
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 8b655e5d6b68..97b1ef65b1c1 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -60,6 +60,7 @@ Hardware Monitoring Kernel Drivers
    corsair-psu
    cros_ec_hwmon
    crps
+   d1u74t
    da9052
    da9055
    dell-smm-hwmon
diff --git a/MAINTAINERS b/MAINTAINERS
index b2040011a386..3106cf725dfc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18249,6 +18249,13 @@ F:	drivers/mux/
 F:	include/dt-bindings/mux/
 F:	include/linux/mux/
 
+MURATA D1U74T PSU DRIVER
+M:	Abdurrahman Hussain <abdurrahman@nexthop.ai>
+L:	linux-hwmon@vger.kernel.org
+S:	Maintained
+F:	Documentation/hwmon/d1u74t.rst
+F:	drivers/hwmon/pmbus/d1u74t.c
+
 MUSB MULTIPOINT HIGH SPEED DUAL-ROLE CONTROLLER
 M:	Bin Liu <b-liu@ti.com>
 L:	linux-usb@vger.kernel.org
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 8f4bff375ecb..ee93b22d2887 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -113,6 +113,15 @@ config SENSORS_CRPS
 	  This driver can also be built as a module. If so, the module will
 	  be called crps.
 
+config SENSORS_D1U74T
+	tristate "Murata D1U74T Power Supply"
+	help
+	  If you say yes here you get hardware monitoring support for the Murata
+	  D1U74T Power Supply.
+
+	  This driver can also be built as a module. If so, the module will
+	  be called d1u74t.
+
 config SENSORS_DELTA_AHE50DC_FAN
 	tristate "Delta AHE-50DC fan control module"
 	help
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index 7129b62bc00f..8cf7d3075371 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -76,3 +76,4 @@ obj-$(CONFIG_SENSORS_XDPE1A2G7B)	+= xdpe1a2g7b.o
 obj-$(CONFIG_SENSORS_ZL6100)	+= zl6100.o
 obj-$(CONFIG_SENSORS_PIM4328)	+= pim4328.o
 obj-$(CONFIG_SENSORS_CRPS)	+= crps.o
+obj-$(CONFIG_SENSORS_D1U74T)	+= d1u74t.o
diff --git a/drivers/hwmon/pmbus/d1u74t.c b/drivers/hwmon/pmbus/d1u74t.c
new file mode 100644
index 000000000000..286ba492e336
--- /dev/null
+++ b/drivers/hwmon/pmbus/d1u74t.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2026 Nexthop Systems.
+ */
+
+#include <linux/i2c.h>
+#include <linux/of.h>
+#include <linux/pmbus.h>
+
+#include "pmbus.h"
+
+static const struct i2c_device_id d1u74t_id[] = {
+	{ "d1u74t" },
+	{},
+};
+MODULE_DEVICE_TABLE(i2c, d1u74t_id);
+
+static struct pmbus_driver_info d1u74t_info = {
+	.pages = 1,
+	/* PSU uses default linear data format. */
+	.func[0] = PMBUS_HAVE_PIN | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT |
+		   PMBUS_HAVE_IIN | PMBUS_HAVE_VIN | PMBUS_HAVE_STATUS_INPUT |
+		   PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT | PMBUS_HAVE_TEMP |
+		   PMBUS_HAVE_TEMP2 | PMBUS_HAVE_TEMP3 |
+		   PMBUS_HAVE_STATUS_TEMP | PMBUS_HAVE_FAN12 |
+		   PMBUS_HAVE_STATUS_FAN12,
+};
+
+static int d1u74t_probe(struct i2c_client *client)
+{
+	char buf[I2C_SMBUS_BLOCK_MAX + 2] = { 0 };
+	struct device *dev = &client->dev;
+	int rc;
+
+	rc = i2c_smbus_read_block_data(client, PMBUS_MFR_ID, buf);
+	if (rc < 0)
+		return dev_err_probe(dev, rc, "Failed to read PMBUS_MFR_ID\n");
+
+	if (rc != 9 || strncmp(buf, "Murata-PS", 9)) {
+		buf[rc] = '\0';
+		return dev_err_probe(dev, -ENODEV,
+				     "Unsupported Manufacturer ID '%s'\n",
+				     buf);
+	}
+
+	rc = i2c_smbus_read_block_data(client, PMBUS_MFR_MODEL, buf);
+	if (rc < 0)
+		return dev_err_probe(dev, rc,
+				     "Failed to read PMBUS_MFR_MODEL\n");
+
+	if (rc < 8 || strncmp(buf, "D1U74T-W", 8)) {
+		buf[rc] = '\0';
+		return dev_err_probe(dev, -ENODEV, "Model '%s' not supported\n",
+				     buf);
+	}
+
+	rc = pmbus_do_probe(client, &d1u74t_info);
+	if (rc)
+		return dev_err_probe(dev, rc, "Failed to probe\n");
+
+	return 0;
+}
+
+static const struct of_device_id d1u74t_of_match[] = {
+	{
+		.compatible = "murata,d1u74t",
+	},
+	{},
+};
+MODULE_DEVICE_TABLE(of, d1u74t_of_match);
+
+static struct i2c_driver d1u74t_driver = {
+	.driver = {
+		.name = "d1u74t",
+		.of_match_table = d1u74t_of_match,
+	},
+	.probe = d1u74t_probe,
+	.id_table = d1u74t_id,
+};
+
+module_i2c_driver(d1u74t_driver);
+
+MODULE_AUTHOR("Abdurrahman Hussain");
+MODULE_DESCRIPTION("PMBus driver for Murata D1U74T-W power supplies");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("PMBUS");

-- 
2.53.0



* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Greg KH @ 2026-05-13 10:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Willy Tarreau, Leon Romanovsky, skhan, security, workflows,
	linux-doc, linux-kernel
In-Reply-To: <87wlx8o87g.fsf@trenco.lwn.net>

On Tue, May 12, 2026 at 11:20:51AM -0600, Jonathan Corbet wrote:
> Willy Tarreau <w@1wt.eu> writes:
> 
> > The use of automated tools to find bugs in random locations of the kernel
> > induces a rise in security reports even if most of them should just be
> > reported as regular bugs. This patch is an attempt at drawing a line
> > between what qualifies as a security bug and what does not, hoping to
> > improve the situation and ease decision on the reporter's side.
> >
> > It defers the enumeration to a new file, threat-model.rst, that tries
> > to enumerate various classes of issues that are and are not security
> > bugs. This should make it easier to update this file for various
> > subsystem-specific rules without having to revisit the security bug
> > reporting guide.
> 
> One thing here:
> 
> [...]
> 
> > +* **Capability-based protection**:
> > +
> > +  * users not having the ``CAP_SYS_ADMIN`` capability may not alter the
> > +    kernel's configuration, memory nor state, change other users' view of the
> > +    file system layout, grant any user capabilities they do not have, nor
> > +    affect the system's availability (shutdown, reboot, panic, hang, or making
> > +    the system unresponsive via unbounded resource exhaustion).
> 
> That is pretty demonstrably not true, and will likely elicit challenges
> at some point.  There are a lot of "make me root" capabilities that
> enable users to do all of those things; consider CAP_DAC_OVERRIDE as an
> obvious example.  I think that just about all of the capabilities will
> enable at least one of those things - that's why the capabilities exist
> in the first place.  So I think this needs to be written far more
> generally.

You are right, there are more capabilities, but we get bug reports all
the time that basically come down to "a user with CAP_SYS_ADMIN can go
and do..." which are pointless for us to be handling.  Just got one a
few minutes ago, so LLMs are churning this crap out quite frequently.

So any rewording of this to prevent us from getting these pointless
reports would be great.

> As a lower-priority thing, lockdown mode is meant to at least try to
> provide some stronger guarantees, and lockdown circumvention seems to
> normally be viewed as a security bug.  Worth a mention?

lockdown issues are best discussed on the list where the lockdown people
are as most of us feel that really isn't a "security" thing at all :)

thanks,

greg k-h


* Re: [PATCH v3 3/3] Documentation: security-bugs: clarify requirements for AI-assisted reports
From: Greg KH @ 2026-05-13 10:30 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Willy Tarreau, Leon Romanovsky, skhan, security, workflows,
	linux-doc, linux-kernel
In-Reply-To: <87se7wo861.fsf@trenco.lwn.net>

On Tue, May 12, 2026 at 11:21:42AM -0600, Jonathan Corbet wrote:
> Willy Tarreau <w@1wt.eu> writes:
> 
> > AI tools are increasingly used to assist in bug discovery. While these
> > tools can identify valid issues, reports that are submitted without
> > manual verification often lack context, contain speculative impact
> > assessments, or include unnecessary formatting. Such reports increase
> > triage effort, waste maintainers' time and may be ignored.
> >
> > Reports where the reporter has verified the issue and the proposed fix
> > typically meet quality standards. This documentation outlines specific
> > requirements for length, formatting, and impact evaluation to reduce
> > the effort needed to deal with these reports.
> >
> > Cc: Greg KH <gregkh@linuxfoundation.org>
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Reviewed-by: Leon Romanovsky <leon@kernel.org>
> > Signed-off-by: Willy Tarreau <w@1wt.eu>
> > ---
> >  Documentation/process/security-bugs.rst | 57 +++++++++++++++++++++++++
> >  1 file changed, 57 insertions(+)
> 
> One nit:
> 
> > +  * **Impact Evaluation**: Many AI-generated reports lack an understanding of
> > +    the kernel's threat model and go to great lengths inventing theoretical
> > +    consequences.
> 
> If only we had a shiny new document describing that threat model that we
> could reference here... :)

Ah yes, a link to that would make things better, but don't we have that
elsewhere in this series?

thanks,

greg k-h


* Re: [PATCH] Documentation: iio: fix typo in triggered-buffers example
From: Andy Shevchenko @ 2026-05-13 10:55 UTC (permalink / raw)
  To: Stepan Ionichev
  Cc: corbet, jic23, dlechner, nuno.sa, andy, skhan, gregkh, hcazarim,
	linux-doc, linux-iio, linux-kernel
In-Reply-To: <20260513100657.8498-1-sozdayvek@gmail.com>

On Wed, May 13, 2026 at 03:06:57PM +0500, Stepan Ionichev wrote:
> The function call example in triggered-buffers.rst uses "polfunc"

No need to repeat the file name here, see below what is better to use.

> (single 'l') while the function is defined as "pollfunc" (double
> 'l') on line 24 and referenced as "pollfunc" further down on
> line 56. Fix the misspelling so the example is consistent.

Line references are fragile, try to describe in terms of section and example
(names).

...

Code fix is good, nevertheless.

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH] docs: update rust-analyzer command
From: Gary Guo @ 2026-05-13 11:04 UTC (permalink / raw)
  To: Onur Özkan, rust-for-linux, linux-doc, linux-kernel
  Cc: ojeda, boqun, gary, bjorn3_gh, lossin, a.hindborg, aliceryhl,
	tmgross, dakr, corbet, skhan, alexs, si.yanteng, dzm91
In-Reply-To: <20260513092017.265269-1-work@onurozkan.dev>

On Wed May 13, 2026 at 10:19 AM BST, Onur Özkan wrote:
> On a fresh checkout, generating rust-project.json alone is not enough
> for rust-analyzer to work reliably. The issue only becomes apparent
> later when the LSP fails on a proc macro or binding types/functions.
>
> Recommend running prepare together with the rust-analyzer target so the
> generated files expected by rust-analyzer are available from the start.

This should be fixed by marking `prepare` as a dependency of `rust-analyzer`
instead.

Best,
Gary

>
> Link: https://rust-for-linux.zulipchat.com/#narrow/channel/597064-rust-analyzer
> Signed-off-by: Onur Özkan <work@onurozkan.dev>
> ---
>  Documentation/rust/quick-start.rst                    | 2 +-
>  Documentation/translations/zh_CN/rust/quick-start.rst | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/rust/quick-start.rst b/Documentation/rust/quick-start.rst
> index a6ec3fa94d33..df5b54b51deb 100644
> --- a/Documentation/rust/quick-start.rst
> +++ b/Documentation/rust/quick-start.rst
> @@ -314,7 +314,7 @@ definition, and other features.
>  ``rust-analyzer`` needs a configuration file, ``rust-project.json``, which
>  can be generated by the ``rust-analyzer`` Make target::
>  
> -	make LLVM=1 rust-analyzer
> +	make LLVM=1 prepare rust-analyzer
>  
>  
>  Configuration
> diff --git a/Documentation/translations/zh_CN/rust/quick-start.rst b/Documentation/translations/zh_CN/rust/quick-start.rst
> index 5f0ece6411f5..3f7efd3a63ad 100644
> --- a/Documentation/translations/zh_CN/rust/quick-start.rst
> +++ b/Documentation/translations/zh_CN/rust/quick-start.rst
> @@ -291,7 +291,7 @@ rust-analyzer
>  ``rust-analyzer`` 需要一个配置文件, ``rust-project.json``, 它可以由 ``rust-analyzer``
>  Make 目标生成::
>  
> -       make LLVM=1 rust-analyzer
> +       make LLVM=1 prepare rust-analyzer
>  
>  
>  配置



* Re: [RFC net-next 0/4] devlink: Add boot-time defaults
From: Jiri Pirko @ 2026-05-13 11:11 UTC (permalink / raw)
  To: Mark Bloch
  Cc: Parav Pandit, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	Andrew Lunn, David S. Miller, Jonathan Corbet, Shuah Khan,
	Simon Horman, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Andrew Morton, Borislav Petkov (AMD), Randy Dunlap, Dave Hansen,
	Christian Brauner, Petr Mladek, Peter Zijlstra (Intel),
	Thomas Gleixner, Pawan Gupta, Dapeng Mi, Kees Cook, Marco Elver,
	Eric Biggers, NBU-Contact-Li Rongqing (EXTERNAL),
	Paul E. McKenney, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
In-Reply-To: <29868c1b-5751-421a-9f2b-2ac0f3324904@nvidia.com>

Wed, May 13, 2026 at 07:53:05AM CEST, mbloch@nvidia.com wrote:
>
>
>On 12/05/2026 21:35, Jiri Pirko wrote:
>> Tue, May 12, 2026 at 05:25:21PM CEST, parav@nvidia.com wrote:
>>>
>>>
>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>> Sent: 12 May 2026 07:37 PM
>>>>
>>>> Tue, May 12, 2026 at 03:48:32PM CEST, parav@nvidia.com wrote:
>>>>>
>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>> Sent: 12 May 2026 02:16 PM
>>>>>>
>>>>>> Mon, May 11, 2026 at 08:21:37PM +0200, parav@nvidia.com wrote:
>>>>>>>
>>>>>>>> From: Mark Bloch <mbloch@nvidia.com>
>>>>>>>> Sent: 10 May 2026 06:02 PM
>>>>>>>>
>>>>>>>
>>>>>>> [..]
>>>>>>>
>>>>>>>>> I look at it from the perspective that from some CX generation,
>>>>>>>>> switchdev mode should be default. So that is a device-based decision.
>>>>>>>>> I believe as such it can optionally be permanently configured (nv config)
>>>>>>>>> on older device. Why not?
>>>>>>>>
>>>>>>> Because sometimes switchdev_inactive is needed and sometimes not.
>>>>>>> Such knob is not device decision.
>>>>>>
>>>>>> That is what I would call corner case. In that, user can use userspace
>>>>>> configuration to change the mode in runtime.
>>>>>>
>>>>> Corner vs common depends on users one talks to. :)
>>>>> If fw has switchdev (active) as the default and the user needs to run
>>>>> switchdev_inactive, it will actually break their switching applications.
>>>>
>>>> Can you describe the actual breakage please?
>>>>
>>> Driver default was switchdev so all the traffic is forwarded to the switch,
>>> and user didn't have chance to setup the fdb rules.
>>> So packets are dropped but user didn't expect the traffic to be forwarded.
>> 
>> User may switch mode to switchdev_inactive early on, before any of the
>> representors are created. What's the issue then?
>
>That is the ordering problem I am trying to solve.
>
>On a DPU, the host PF cannot finish loading until the ECPF moves the eswitch to
>switchdev/switchdev_inactive. So we need to do that transition during ECPF
>driver init, as early as possible. Waiting for userspace means the host PF stays
>blocked until userspace is up and has the right logic.
>
>That is not always true in practice: the driver may be built in, loaded from an
>initramfs, or the initramfs may simply not contain the devlink policy we need.
>
>Also, after talking with Parav, my understanding is that we need to support both
>switchdev and switchdev_inactive, since different customers want different boot
>behavior. Once we do the transition, the host PF can load and may start sending
>packets. At that point the initial mode already matters: in switchdev_inactive
>packets are dropped until userspace programs the pipeline; in switchdev they may
>reach the FDB before the pipeline is ready.
>
>So I do not think an early userspace transition is equivalent here. The initial
>mode needs to be known by the kernel before userspace runs, which is why I am
>proposing the devlink= command line default.

Okay fair enough. Could you please at least make sure this is mode only
config and no one would ever think about abusing this for any other
configuration? Perhaps call it "devlink_eswitch_mode=" to remove
the "devlink=" namespace flexibility?
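
To make the "mode only" idea concrete, here is a rough userspace sketch
of a parser that accepts nothing but a known eswitch mode. It is not
from the RFC: the enum, the function name parse_eswitch_boot_mode and
the exact accepted strings are assumptions for illustration, using only
the two mode names discussed in this thread.

```c
#include <string.h>

/* Hypothetical result type; UNSET means "leave the driver default". */
enum eswitch_boot_mode {
	ESWITCH_BOOT_MODE_UNSET = 0,
	ESWITCH_BOOT_MODE_SWITCHDEV,
	ESWITCH_BOOT_MODE_SWITCHDEV_INACTIVE,
};

/*
 * Parse the value of a hypothetical "devlink_eswitch_mode=" option.
 * Only exact mode names are accepted; anything else, including extra
 * comma-separated keys, is ignored rather than interpreted.
 */
static enum eswitch_boot_mode parse_eswitch_boot_mode(const char *val)
{
	if (!val)
		return ESWITCH_BOOT_MODE_UNSET;
	if (!strcmp(val, "switchdev"))
		return ESWITCH_BOOT_MODE_SWITCHDEV;
	if (!strcmp(val, "switchdev_inactive"))
		return ESWITCH_BOOT_MODE_SWITCHDEV_INACTIVE;

	return ESWITCH_BOOT_MODE_UNSET;
}
```

Matching whole strings only, with no key=value sub-syntax, is one way
to keep such an option from growing into a general configuration
channel.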


* Re: [PATCH] docs: update rust-analyzer command
From: Miguel Ojeda @ 2026-05-13 11:20 UTC (permalink / raw)
  To: Gary Guo
  Cc: Onur Özkan, rust-for-linux, linux-doc, linux-kernel, ojeda,
	boqun, bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, dakr,
	corbet, skhan, alexs, si.yanteng, dzm91
In-Reply-To: <DIHHZLJR583G.2PP9PSQG3HTSR@garyguo.net>

On Wed, May 13, 2026 at 1:04 PM Gary Guo <gary@garyguo.net> wrote:
>
> This should be fixed by marking `prepare` as a dependency of `rust-analyzer`
> instead.

Onur suggested that in Zulip, but it is not a real dependency (in the
sense of generating the file) nor a hard one (in the sense that
rust-analyzer works to some degree without a build).

I am not opposed to making the target about "setup rust-analyzer"
rather than "just generate the file", since I think that is what most
people want, but I wonder if someone out there may be already relying
on generating the file without building.

Another alternative is an informational message about it as a middle
ground between "just in the docs" and "not being possible to avoid
part of the build" (and without introducing yet one more target, which
is another option too).

Cheers,
Miguel


* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Willy Tarreau @ 2026-05-13 11:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Jonathan Corbet, Leon Romanovsky, skhan, security, workflows,
	linux-doc, linux-kernel
In-Reply-To: <2026051333-puzzle-smokiness-8096@gregkh>

On Wed, May 13, 2026 at 12:29:34PM +0200, Greg KH wrote:
> On Tue, May 12, 2026 at 11:20:51AM -0600, Jonathan Corbet wrote:
> > Willy Tarreau <w@1wt.eu> writes:
> > 
> > > The use of automated tools to find bugs in random locations of the kernel
> > > > induces a rise in security reports even if most of them should just be
> > > reported as regular bugs. This patch is an attempt at drawing a line
> > > between what qualifies as a security bug and what does not, hoping to
> > > improve the situation and ease decision on the reporter's side.
> > >
> > > It defers the enumeration to a new file, threat-model.rst, that tries
> > > to enumerate various classes of issues that are and are not security
> > > > bugs. This should make it easier to update this file for various
> > > subsystem-specific rules without having to revisit the security bug
> > > reporting guide.
> > 
> > One thing here:
> > 
> > [...]
> > 
> > > +* **Capability-based protection**:
> > > +
> > > +  * users not having the ``CAP_SYS_ADMIN`` capability may not alter the
> > > +    kernel's configuration, memory nor state, change other users' view of the
> > > +    file system layout, grant any user capabilities they do not have, nor
> > > +    affect the system's availability (shutdown, reboot, panic, hang, or making
> > > +    the system unresponsive via unbounded resource exhaustion).
> > 
> > That is pretty demonstrably not true, and will likely elicit challenges
> > at some point.  There are a lot of "make me root" capabilities that
> > enable users to do all of those things; consider CAP_DAC_OVERRIDE as an
> > obvious example.  I think that just about all of the capabilities will
> > enable at least one of those things - that's why the capabilities exist
> > in the first place.  So I think this needs to be written far more
> > generally.
> 
> You are right, there are more capabilities, but we get bug reports all
> the time that basically come down to "a user with CAP_SYS_ADMIN can go
> and do..." which are pointless for us to be handling.  Just got one a
> few minutes ago, so LLMs are churning this crap out quite frequently.
> 
> So any rewording of this to prevent us from getting these pointless
> reports would be great.

Honestly we're seeing this through the angle of a patch that lists a
single paragraph but the doc is already becoming quite long. I'm a bit
afraid of adding long enumerations, or sentences which do not immediately
translate to something recognizable by reporters. Not that it cannot be
done, but I think the current situation warrants incremental improvements
by fixing what doesn't work well. And indeed most of the capability-based
reports currently revolve around "I already have CAP_{SYS,NET}_ADMIN
and ...". That might remain a good start for now.

> > As a lower-priority thing, lockdown mode is meant to at least try to
> > provide some stronger guarantees, and lockdown circumvention seems to
> > normally be viewed as a security bug.  Worth a mention?
> 
> lockdown issues are best discussed on the list where the lockdown people
> are as most of us feel that really isn't a "security" thing at all :)

I don't remember when we last got a report for it but it's not frequent.
Again, I think we should continue to focus on efficiency, i.e. the number
of improperly routed reports we can stop per word written/read.

Willy


* Re: [PATCH v3 3/3] Documentation: security-bugs: clarify requirements for AI-assisted reports
From: Willy Tarreau @ 2026-05-13 11:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Jonathan Corbet, Leon Romanovsky, skhan, security, workflows,
	linux-doc, linux-kernel
In-Reply-To: <2026051353-apricot-kleenex-fa57@gregkh>

On Wed, May 13, 2026 at 12:30:10PM +0200, Greg KH wrote:
> > One nit:
> > 
> > > +  * **Impact Evaluation**: Many AI-generated reports lack an understanding of
> > > +    the kernel's threat model and go to great lengths inventing theoretical
> > > +    consequences.
> > 
> > If only we had a shiny new document describing that threat model that we
> > could reference here... :)
> 
> Ah yes, a link to that would make things better, but don't we have that
> elsewhere in this series?

It's in the same patch; I think Jon was being sarcastic here. I thought I
had addressed that one but apparently I was wrong :-/

willy


* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-13 11:39 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Christian König, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CABdmKX2uwZ12kYJYPJGfWxuMBOJS=64b1GRj72tfB5D=NKM22w@mail.gmail.com>

On Tue, May 12, 2026 at 8:53 PM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Tue, May 12, 2026 at 3:14 AM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > On 5/12/26 11:10, Albert Esteve wrote:
> > > On embedded platforms a central process often allocates dma-buf
> > > memory on behalf of client applications. Without a way to
> > > attribute the charge to the requesting client's cgroup, the
> > > cost lands on the allocator, making per-cgroup memory limits
> > > ineffective for the actual consumers.
> > >
> > > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > > the mem_accounting module parameter enabled, the buffer is charged
> > > to the allocator's own cgroup.
> > >
> > > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > > all accounting through a single MEMCG_DMABUF path.
> > >
> > > Usage examples:
> > >
> > >   1. Central allocator charging to a client at allocation time.
> > >      The allocator knows the client's PID (e.g., from binder's
> > >      sender_pid) and uses pidfd to attribute the charge:
> > >
> > >        pid_t client_pid = txn->sender_pid;
> > >        int pidfd = pidfd_open(client_pid, 0);
> > >
> > >        struct dma_heap_allocation_data alloc = {
> > >            .len             = buffer_size,
> > >            .fd_flags        = O_RDWR | O_CLOEXEC,
> > >            .charge_pid_fd   = pidfd,
> > >        };
> > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > >        close(pidfd);
> > >        /* alloc.fd is now charged to client's cgroup */
> > >
> > >   2. Default allocation (no pidfd, mem_accounting=1).
> > >      When charge_pid_fd is not set and the mem_accounting module
> > >      parameter is enabled, the buffer is charged to the allocator's
> > >      own cgroup:
> > >
> > >        struct dma_heap_allocation_data alloc = {
> > >            .len      = buffer_size,
> > >            .fd_flags = O_RDWR | O_CLOEXEC,
> > >        };
> > >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> > >        /* charged to current process's cgroup */
> > >
> > > Current limitations:
> > >
> > >  - Single-owner model: a dma-buf carries one memcg charge regardless of
> > >    how many processes share it. Means only the first owner (and exporter)
> > >    of the shared buffer bears the charge.
> > >  - Only memcg accounting supported. While this makes sense for system
> > >    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> > >    charging also for the dmem controller.
> >
> > > Well, that doesn't look so bad; it at least seems to tackle the problem at hand for Android and some other embedded use cases.
>
> Yeah I think this might work. I know of 3 cases, and it trivially
> solves the first two. The third requires some work on our end to
> extend our userspace interfaces to include the pidfd but it seems
> doable. I'm checking with our graphics folks.
>
> 1) Direct allocation from user (e.g. app -> allocation ioctl on
> /dev/dma_heap/foo)
> No changes required to userspace. mem_accounting=1 charges the app.
>
> 2) Single hop remote allocation (e.g. app -> AHardwareBuffer_allocate
> -> gralloc)
> gralloc has the caller's pid as described in the commit message. Open
> a pidfd and pass it in the dma_heap_allocation_data.
>
> 3) Double hop remote allocation (e.g. app -> dequeueBuffer ->
> SurfaceFlinger -> gralloc)
> In this case gralloc knows SurfaceFlinger's pid, but not the app's. So
> we need to add the app's pidfd to the SurfaceFlinger -> gralloc
> interface, or transfer the memcg charge from SurfaceFlinger to the app
> after the allocation.
> It'd be nice to avoid the charge transfer option entirely, but if we
> need it that doesn't seem so bad in this case because it's a bulk
> charge for the entire dmabuf rather than per-page. So the exporter
> doesn't need to get involved (we wouldn't need a new dma_buf_op) and
> we wouldn't have to worry about looping and locking for each page.
>
> > > I'm just not sure if this is future-proof and will work for all use cases, e.g. cloud gaming, native context for automotive etc...
> >
> > Essentially the problem boils down to two limitations:
> > > 1) a piece of memory can only be charged to one cgroup; the framework doesn't have a concept of charging shared memory to multiple groups
>
> Yup, memcg already has this problem with pagecache and shmem.
>
> > 2) when memory references in the form of file descriptors are passed between applications we have no way of changing the accounting to a different cgroup
> >
> > > The passing of the memory reference already has a well-defined uAPI, and if we could solve those two limitations we would not only solve the problem without introducing new uAPI (with potential new security risks) but also solve it for all other use cases which use file descriptors as well, e.g. memfd, accel and GPU drivers etc...
> >
> > On the other hand it is really nice to finally see this tackled for at least DMA-buf heaps.
>
> I have a question about this part. Albert I guess you are interested
> only in accounting dmabuf-heap allocations, or do you expect to add
> __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
> non-dmabuf-heap exporters?

We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
controller are on the radar for follow-up/parallel work (there will be
dragons and will surely need discussion). For DRM and V4L2 the
long-term intent is migration to heaps, which would make direct
accounting on those paths unnecessary. udmabufs are already
memcg-charged, so adding a separate MEMCG_DMABUF would double count.
Are there any other exporters you had in mind that would benefit from
this approach?

BR,
Albert.

>
> > > On the GPU side I saw yet another attempt by a driver to do some kind of special driver-specific accounting to solve this just a few weeks ago. And to be honest, such single-driver island approaches tend to break more often than they work correctly.
> >
> > Regards,
> > Christian.
> >
> > >
> > > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> > >  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> > >  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> > >  drivers/dma-buf/heaps/system_heap.c     |  2 --
> > >  include/uapi/linux/dma-heap.h           |  6 +++++
> > >  5 files changed, 53 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 8bdbc2e866430..824d269531eb1 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> > >               structures.
> > >
> > >         dmabuf (npn)
> > > -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> > > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > > +             Amount of memory used for exported DMA buffers allocated by or on
> > > +             behalf of the cgroup. Stays with the allocating cgroup regardless
> > > +             of how the buffer is shared.
> > >
> > >         workingset_refault_anon
> > >               Number of refaults of previously evicted anonymous pages.
> > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > index ce02377f48908..23fb758b78297 100644
> > > --- a/drivers/dma-buf/dma-buf.c
> > > +++ b/drivers/dma-buf/dma-buf.c
> > > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> > >        */
> > >       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> > >
> > > -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > -     mem_cgroup_put(dmabuf->memcg);
> > > +     if (dmabuf->memcg) {
> > > +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > > +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > > +             mem_cgroup_put(dmabuf->memcg);
> > > +     }
> > >
> > >       dmabuf->ops->release(dmabuf);
> > >
> > > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > >               dmabuf->resv = resv;
> > >       }
> > >
> > > -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > > -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > > -                                   GFP_KERNEL)) {
> > > -             ret = -ENOMEM;
> > > -             goto err_memcg;
> > > -     }
> > > -
> > >       file->private_data = dmabuf;
> > >       file->f_path.dentry->d_fsdata = dmabuf;
> > >       dmabuf->file = file;
> > > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> > >
> > >       return dmabuf;
> > >
> > > -err_memcg:
> > > -     mem_cgroup_put(dmabuf->memcg);
> > >  err_file:
> > >       fput(file);
> > >  err_module:
> > > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > > index ac5f8685a6494..ff6e259afcdc0 100644
> > > --- a/drivers/dma-buf/dma-heap.c
> > > +++ b/drivers/dma-buf/dma-heap.c
> > > @@ -7,13 +7,17 @@
> > >   */
> > >
> > >  #include <linux/cdev.h>
> > > +#include <linux/cgroup.h>
> > >  #include <linux/device.h>
> > >  #include <linux/dma-buf.h>
> > >  #include <linux/dma-heap.h>
> > > +#include <linux/memcontrol.h>
> > > +#include <linux/sched/mm.h>
> > >  #include <linux/err.h>
> > >  #include <linux/export.h>
> > >  #include <linux/list.h>
> > >  #include <linux/nospec.h>
> > > +#include <linux/pidfd.h>
> > >  #include <linux/syscalls.h>
> > >  #include <linux/uaccess.h>
> > >  #include <linux/xarray.h>
> > > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> > >                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> > >
> > >  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > > -                              u32 fd_flags,
> > > -                              u64 heap_flags)
> > > +                              u32 fd_flags, u64 heap_flags,
> > > +                              struct mem_cgroup *charge_to)
> > >  {
> > >       struct dma_buf *dmabuf;
> > > +     unsigned int nr_pages;
> > > +     struct mem_cgroup *memcg = charge_to;
> > >       int fd;
> > >
> > >       /*
> > > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > >       if (IS_ERR(dmabuf))
> > >               return PTR_ERR(dmabuf);
> > >
> > > +     nr_pages = len / PAGE_SIZE;
> > > +
> > > +     if (memcg)
> > > +             css_get(&memcg->css);
> > > +     else if (mem_accounting)
> > > +             memcg = get_mem_cgroup_from_mm(current->mm);
> > > +
> > > +     if (memcg) {
> > > +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > > +                     mem_cgroup_put(memcg);
> > > +                     dma_buf_put(dmabuf);
> > > +                     return -ENOMEM;
> > > +             }
> > > +             dmabuf->memcg = memcg;
> > > +     }
> > > +
> > >       fd = dma_buf_fd(dmabuf, fd_flags);
> > >       if (fd < 0) {
> > >               dma_buf_put(dmabuf);
> > > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > >  {
> > >       struct dma_heap_allocation_data *heap_allocation = data;
> > >       struct dma_heap *heap = file->private_data;
> > > +     struct mem_cgroup *memcg = NULL;
> > > +     struct task_struct *task;
> > > +     unsigned int pidfd_flags;
> > >       int fd;
> > >
> > >       if (heap_allocation->fd)
> > > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> > >       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > >               return -EINVAL;
> > >
> > > +     if (heap_allocation->charge_pid_fd) {
> > > +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
> > > +             if (IS_ERR(task))
> > > +                     return PTR_ERR(task);
> > > +
> > > +             memcg = get_mem_cgroup_from_mm(task->mm);
> > > +             put_task_struct(task);
> > > +     }
> > > +
> > >       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > >                                  heap_allocation->fd_flags,
> > > -                                heap_allocation->heap_flags);
> > > +                                heap_allocation->heap_flags,
> > > +                                memcg);
> > > +     mem_cgroup_put(memcg);
> > >       if (fd < 0)
> > >               return fd;
> > >
> > > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > > index 03c2b87cb1112..95d7688167b93 100644
> > > --- a/drivers/dma-buf/heaps/system_heap.c
> > > +++ b/drivers/dma-buf/heaps/system_heap.c
> > > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> > >               if (max_order < orders[i])
> > >                       continue;
> > >               flags = order_flags[i];
> > > -             if (mem_accounting)
> > > -                     flags |= __GFP_ACCOUNT;
> > >               page = alloc_pages(flags, orders[i]);
> > >               if (!page)
> > >                       continue;
> > > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > > --- a/include/uapi/linux/dma-heap.h
> > > +++ b/include/uapi/linux/dma-heap.h
> > > @@ -29,6 +29,10 @@
> > >   *                   handle to the allocated dma-buf
> > >   * @fd_flags:                file descriptor flags used when allocating
> > >   * @heap_flags:              flags passed to heap
> > > + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> > > + *                   charged for this allocation; 0 means charge the calling
> > > + *                   process's cgroup
> > > + * @__padding:               reserved, must be zero
> > >   *
> > >   * Provided by userspace as an argument to the ioctl
> > >   */
> > > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> > >       __u32 fd;
> > >       __u32 fd_flags;
> > >       __u64 heap_flags;
> > > +     __u32 charge_pid_fd;
> > > +     __u32 __padding;
> > >  };
> > >
> > >  #define DMA_HEAP_IOC_MAGIC           'H'
> > >
> >
>



* Re: [PATCH] docs: update rust-analyzer command
From: Gary Guo @ 2026-05-13 11:44 UTC (permalink / raw)
  To: Miguel Ojeda, Gary Guo
  Cc: Onur Özkan, rust-for-linux, linux-doc, linux-kernel, ojeda,
	boqun, bjorn3_gh, lossin, a.hindborg, aliceryhl, tmgross, dakr,
	corbet, skhan, alexs, si.yanteng, dzm91
In-Reply-To: <CANiq72m27gP3z0t80bSUds=2tDcnt9JU5X-=4nW9XwpZsiLuAA@mail.gmail.com>

On Wed May 13, 2026 at 12:20 PM BST, Miguel Ojeda wrote:
> On Wed, May 13, 2026 at 1:04 PM Gary Guo <gary@garyguo.net> wrote:
>>
>> This should be fixed by marking `prepare` as a dependency of `rust-analyzer`
>> instead.
>
> Onur suggested that in Zulip, but it is not a real dependency (in the
> sense of generating the file) nor a hard one (in the sense that
> rust-analyzer works to some degree without a build).
>
> I am not opposed to it to make the target about "setup rust-analyzer"
> rather than "just generate the file", since I think that is what most
> people want, but I wonder if someone out there may be already relying
> on generating the file without building.


Well, I have always run it in a build directory, as it expects a .config
file, so I cannot run it from the source tree alone. Running it would also
sync the config, which would invoke rustc and others already due to
rustc-option, so arguably it's already running part of the build.

The only issue with a "prepare" dependency is that it also builds the kernel
crate. But I suppose with the new build system update this won't be necessary?

Best,
Gary

>
> Another alternative is an informational message about it as a middle
> ground between "just in the docs" and "not being possible to avoid
> part of the build" (and without introducing yet one more target, which
> is another option too).
>
> Cheers,
> Miguel



* Re: [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason
From: David Hildenbrand (Arm) @ 2026-05-13 11:48 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
	Shuah Khan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett, linux-mm,
	linux-kernel, linux-doc, linux-kselftest, linux-trace-kernel,
	kernel-team, Lance Yang
In-Reply-To: <agMpqhpgmezqnaA_@gmail.com>

On 5/12/26 15:33, Breno Leitao wrote:
> On Tue, May 12, 2026 at 10:21:50AM +0200, David Hildenbrand (Arm) wrote:
>>
>>>  		}
>>>  		goto unlock_mutex;
>>>  	} else if (res < 0) {
>>> -		if (is_reserved)
>>> +		/*
>>> +		 * Promote a stable unhandlable kernel page diagnosed by
>>> +		 * get_hwpoison_page() to MF_MSG_KERNEL alongside reserved
>>> +		 * pages; transient lifecycle races stay as MF_MSG_GET_HWPOISON.
>>> +		 */
>>> +		if (is_reserved || gp_status == MF_GET_PAGE_UNHANDLABLE)
>>>  			res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>>
>>
>> It's all a bit of a mess. get_hwpoison_page() should just indicate that a page
>> is unhandlable if it is PG_reserved?
> 
> Are you saying that we should identify if the page is PG_reserved in
> get_hwpoison_page() instead of in memory_failure(), as done in the
> previous patch ("mm/memory-failure: report MF_MSG_KERNEL for reserved
> pages") ?
> 
>> Why can't we just return a special error code from get_hwpoison_page()? We have
>> plenty of errno values to choose from.
> 
> Something like:
> 
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 866c4428ac7ef..0a6d83575833e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -878,7 +878,7 @@ static const char *action_name[] = {
>  };
>  
>  static const char * const action_page_types[] = {
> -	[MF_MSG_KERNEL]			= "reserved kernel page",
> +	[MF_MSG_KERNEL]			= "unrecoverable kernel page",
>  	[MF_MSG_KERNEL_HIGH_ORDER]	= "high-order kernel page",
>  	[MF_MSG_HUGE]			= "huge page",
>  	[MF_MSG_FREE_HUGE]		= "free huge page",
> @@ -1394,6 +1394,21 @@ static int get_any_page(struct page *p, unsigned long flags)
>  	int ret = 0, pass = 0;
>  	bool count_increased = false;
> 
> +	if (PageReserved(p)) {
> +		ret = -ENOTRECOVERABLE;
> +		goto out;
> +	}
> +
>  	if (flags & MF_COUNT_INCREASED)
>  		count_increased = true;
>  
> @@ -1422,7 +1437,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>  				shake_page(p);
>  				goto try_again;
>  			}
> -			ret = -EIO;
> +			ret = -ENOTRECOVERABLE;
>  			goto out;
>  		}
>  	}
> @@ -1441,10 +1456,10 @@ static int get_any_page(struct page *p, unsigned long flags)
>  			goto try_again;
>  		}
>  		put_page(p);
> -		ret = -EIO;
> +		ret = -ENOTRECOVERABLE;
>  	}
>  out:
> -	if (ret == -EIO)
> +	if (ret == -EIO || ret == -ENOTRECOVERABLE)
>  		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
>  
>  	return ret;
> @@ -2431,6 +2448,9 @@ int memory_failure(unsigned long pfn, int flags)
>  			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>  		}
>  		goto unlock_mutex;
> +	} else if (res == -ENOTRECOVERABLE) {
> +		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> +		goto unlock_mutex;
>  	} else if (res < 0) {
>  		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>  		goto unlock_mutex;

That would probably read nicer as

switch (res) {
case 0: ...
case 1: ...
case -ENOTRECOVERABLE:  ...
case ...
default:
}
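
To make the shape concrete, here is a minimal userspace sketch of such a
dispatch. It is only an illustration: the enum and the function name are
hypothetical stand-ins rather than kernel symbols, and it assumes, as in the
quoted diff, that 1 means a reference was taken, 0 is the free or high-order
case, and -ENOTRECOVERABLE marks a stable unhandlable page.

```c
#include <errno.h>

/* Hypothetical outcome labels; these are not the kernel's MF_MSG_* values. */
enum sketch_outcome {
	SKETCH_CONTINUE,		/* res == 1: reference taken, keep going */
	SKETCH_FREE_OR_HIGH_ORDER,	/* res == 0: free or high-order page */
	SKETCH_MSG_KERNEL,		/* stable, unhandlable kernel page */
	SKETCH_MSG_GET_HWPOISON,	/* any other failure */
};

/* Map the result of a get_hwpoison_page()-like call to one outcome. */
enum sketch_outcome classify_get_page_result(int res)
{
	switch (res) {
	case 1:
		return SKETCH_CONTINUE;
	case 0:
		return SKETCH_FREE_OR_HIGH_ORDER;
	case -ENOTRECOVERABLE:
		return SKETCH_MSG_KERNEL;
	default:
		/* Unlike a "res < 0" test, this also catches values > 1. */
		return SKETCH_MSG_GET_HWPOISON;
	}
}
```

The default branch is the one behavioural difference from the if/else chain,
so it would need a second look if positive values other than 1 can occur.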

> 
> 
> If that is what you are suggesting, maybe we can create another
> MF_MSG_RESERVED, and another return value for get_any_page() to track
> the reserved pages?

I guess "reserved" is really just like most other kernel pages. So I wouldn't
special-case them here.

Or would there be a good reason?

-- 
Cheers,

David


* Re: [PATCH v11 4/5] platform/chrome: Protect cros_ec_device lifecycle with revocable
From: Jason Gunthorpe @ 2026-05-13 11:51 UTC (permalink / raw)
  To: Tzung-Bi Shih
  Cc: Arnd Bergmann, Greg Kroah-Hartman, Bartosz Golaszewski,
	Linus Walleij, Benson Leung, linux-kernel, chrome-platform,
	driver-core, linux-doc, linux-gpio, Rafael J. Wysocki,
	Danilo Krummrich, Jonathan Corbet, Shuah Khan, Laurent Pinchart,
	Wolfram Sang, Johan Hovold, Paul E . McKenney
In-Reply-To: <20260513091043.6766-5-tzungbi@kernel.org>

On Wed, May 13, 2026 at 05:10:42PM +0800, Tzung-Bi Shih wrote:
> The cros_ec_device can be unregistered when the underlying device is
> removed.  Other kernel drivers that interact with the EC may hold a
> pointer to the cros_ec_device, creating a risk of a use-after-free
> error if the EC device is removed while still being referenced.
> 
> To prevent this, leverage the revocable and convert the underlying
> device drivers to resource providers of cros_ec_device.
> 
> ---
> v11:
> - No changes.

Two people are opposing this and yet no changes? Why haven't you
followed my advice to fix the bug in this driver in the obvious way?

Jason


* [linux-next:master 8363/14863] Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml: properties:mediatek,audio-codec:items:0: 'anyOf' conditional failed, one must be fixed:
From: kernel test robot @ 2026-05-13 12:06 UTC (permalink / raw)
  To: Khushal Chitturi
  Cc: oe-kbuild-all, Mark Brown, Krzysztof Kozlowski, linux-doc

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head:   e98d21c170b01ddef366f023bbfcf6b31509fa83
commit: 472d77bdc511d96434b3679ad022bfa35d3861c1 [8363/14863] ASoC: dt-bindings: mediatek,mt8173-rt5650-rt5514: convert to DT schema
config: arc-randconfig-2052-20260513 (https://download.01.org/0day-ci/archive/20260513/202605131424.Bkw0s7qx-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 11.5.0
dtschema: 2026.5.dev9+gdf9ad30c5
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260513/202605131424.Bkw0s7qx-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605131424.Bkw0s7qx-lkp@intel.com/

dtcheck warnings: (new ones prefixed by >>)
   Documentation/devicetree/bindings/spi/st,stm32mp25-ospi.yaml: properties:st,syscfg-dlyb:items:1: 'anyOf' conditional failed, one must be fixed:
   	'items' is a required property
   	'minItems' is a required property
   	'maxItems' is a required property
   	from schema $id: http://devicetree.org/meta-schemas/items.yaml
   Documentation/devicetree/bindings/remoteproc/qcom,sc7280-mss-pil.yaml: properties:qcom,smem-states:items:0: 'anyOf' conditional failed, one must be fixed:
   	'items' is a required property
   	'minItems' is a required property
   	'maxItems' is a required property
   	from schema $id: http://devicetree.org/meta-schemas/items.yaml
>> Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml: properties:mediatek,audio-codec:items:0: 'anyOf' conditional failed, one must be fixed:
   	'items' is a required property
   	'minItems' is a required property
   	'maxItems' is a required property
   	from schema $id: http://devicetree.org/meta-schemas/items.yaml
   Documentation/devicetree/bindings/sound/mediatek,mt8173-rt5650-rt5514.yaml: properties:mediatek,audio-codec:items:1: 'anyOf' conditional failed, one must be fixed:
   	'items' is a required property
   	'minItems' is a required property
   	'maxItems' is a required property
   	from schema $id: http://devicetree.org/meta-schemas/items.yaml
   Documentation/devicetree/bindings/remoteproc/qcom,sc7280-adsp-pil.yaml: properties:qcom,smem-states:items:0: 'anyOf' conditional failed, one must be fixed:

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Albert Esteve @ 2026-05-13 12:41 UTC (permalink / raw)
  To: Christian König
  Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet,
	Shuah Khan, Sumit Semwal, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Andrew Morton, Benjamin Gaignard,
	Brian Starkey, John Stultz, T.J. Mercier, Christian Brauner,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <8ef38815-6ae9-4359-86d4-042554357639@amd.com>

On Tue, May 12, 2026 at 12:14 PM Christian König
<christian.koenig@amd.com> wrote:
>
> On 5/12/26 11:10, Albert Esteve wrote:
> > On embedded platforms a central process often allocates dma-buf
> > memory on behalf of client applications. Without a way to
> > attribute the charge to the requesting client's cgroup, the
> > cost lands on the allocator, making per-cgroup memory limits
> > ineffective for the actual consumers.
> >
> > Add charge_pid_fd to struct dma_heap_allocation_data. When set to
> > a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
> > memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
> > inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
> > the mem_accounting module parameter enabled, the buffer is charged
> > to the allocator's own cgroup.
> >
> > Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
> > system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
> > page allocations. Keeping __GFP_ACCOUNT would charge the same pages
> > twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
> > all accounting through a single MEMCG_DMABUF path.
> >
> > Usage examples:
> >
> >   1. Central allocator charging to a client at allocation time.
> >      The allocator knows the client's PID (e.g., from binder's
> >      sender_pid) and uses pidfd to attribute the charge:
> >
> >        pid_t client_pid = txn->sender_pid;
> >        int pidfd = pidfd_open(client_pid, 0);
> >
> >        struct dma_heap_allocation_data alloc = {
> >            .len             = buffer_size,
> >            .fd_flags        = O_RDWR | O_CLOEXEC,
> >            .charge_pid_fd   = pidfd,
> >        };
> >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> >        close(pidfd);
> >        /* alloc.fd is now charged to client's cgroup */
> >
> >   2. Default allocation (no pidfd, mem_accounting=1).
> >      When charge_pid_fd is not set and the mem_accounting module
> >      parameter is enabled, the buffer is charged to the allocator's
> >      own cgroup:
> >
> >        struct dma_heap_allocation_data alloc = {
> >            .len      = buffer_size,
> >            .fd_flags = O_RDWR | O_CLOEXEC,
> >        };
> >        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
> >        /* charged to current process's cgroup */
> >
> > Current limitations:
> >
> >  - Single-owner model: a dma-buf carries one memcg charge regardless of
> >    how many processes share it. Means only the first owner (and exporter)
> >    of the shared buffer bears the charge.
> >  - Only memcg accounting supported. While this makes sense for system
> >    heap buffers, other heaps (e.g., CMA heaps) will require selectively
> >    charging also for the dmem controller.
>
> Well, that doesn't look so bad; it at least seems to tackle the problem at hand for Android and some other embedded use cases.
>
> I'm just not sure if this is future-proof and will work for all use cases, e.g. cloud gaming, native context for automotive etc...
>
> Essentially the problem boils down to two limitations:
> 1) a piece of memory can only be charged to one cgroup; the framework doesn't have a concept of charging shared memory to multiple groups
> 2) when memory references in the form of file descriptors are passed between applications we have no way of changing the accounting to a different cgroup
>
> The passing of the memory reference already has a well-defined uAPI, and if we could solve those two limitations we would not only solve the problem without introducing new uAPI (with potential new security risks) but also solve it for all other use cases which use file descriptors as well, e.g. memfd, accel and GPU drivers etc...

Honestly, adding a hook to fd-passing uAPI to manage charge transfers
sounds like a promising solution requiring no uAPI changes. However,
it still does not cover all paths, e.g., dup() or fork(). Shared memory
also sounds hard to tackle: the best policy depends on the use case and
would probably require userspace configuration. All in all,
charge_pid_fd covers a well-defined and immediately practical subset.
The uAPI cost is small
and the mechanism is explicit about what it does and doesn't solve. A
general solution, if it ever converges, would likely supersede
charge_pid_fd for most cases, which is a fine outcome if it solves the
problem more completely.
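
As a small illustration of the dup()/fork() point, here is a userspace
sketch using an ordinary temporary file as a stand-in for a dma-buf (an
assumption made purely so that it runs anywhere). The function name is
hypothetical. Both the dup()ed descriptor and the one inherited across
fork() refer to the same open file description, and no fd-passing call
such as sendmsg() with SCM_RIGHTS is involved, so a hook placed there
would not see either path.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write nbytes through a dup()ed descriptor in the parent and nbytes more
 * through the descriptor a fork()ed child inherits, then return the file
 * offset seen through the original descriptor (-1 on error).
 */
long shared_offset_after_dup_and_fork(size_t nbytes)
{
	char path[] = "/tmp/fd-share-XXXXXX";
	int fd, fd2, status;
	long off = -1;
	pid_t pid;
	size_t i;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on until the last close */

	fd2 = dup(fd);		/* second descriptor, same file description */
	if (fd2 < 0)
		goto out;
	for (i = 0; i < nbytes; i++)
		if (write(fd2, "d", 1) != 1)
			goto out_dup;

	pid = fork();		/* child inherits fd without any fd passing */
	if (pid < 0)
		goto out_dup;
	if (pid == 0) {
		for (i = 0; i < nbytes; i++)
			if (write(fd, "f", 1) != 1)
				_exit(1);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		goto out_dup;

	/* The offset is shared, so it reflects both sets of writes. */
	off = (long)lseek(fd, 0, SEEK_CUR);
out_dup:
	close(fd2);
out:
	close(fd);
	return off;
}
```

Calling it with 8 yields an offset of 16 through the original descriptor,
which shows that all three holders share one underlying object.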

Either way, if you have a specific approach in mind for solving any of
the above limitations, I'd be happy to look into it further.

BR,
Albert.

>
> On the other hand it is really nice to finally see this tackled for at least DMA-buf heaps. On the GPU side I saw yet another attempt by a driver to do some kind of special driver-specific accounting to solve this just a few weeks ago. And to be honest, such single-driver island approaches tend to break more often than they work correctly.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Albert Esteve <aesteve@redhat.com>
> > ---
> >  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
> >  drivers/dma-buf/dma-buf.c               | 16 ++++---------
> >  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
> >  drivers/dma-buf/heaps/system_heap.c     |  2 --
> >  include/uapi/linux/dma-heap.h           |  6 +++++
> >  5 files changed, 53 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 8bdbc2e866430..824d269531eb1 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1636,8 +1636,9 @@ The following nested keys are defined.
> >               structures.
> >
> >         dmabuf (npn)
> > -             Amount of memory used for exported DMA buffers allocated by the cgroup.
> > -             Stays with the allocating cgroup regardless of how the buffer is shared.
> > +             Amount of memory used for exported DMA buffers allocated by or on
> > +             behalf of the cgroup. Stays with the allocating cgroup regardless
> > +             of how the buffer is shared.
> >
> >         workingset_refault_anon
> >               Number of refaults of previously evicted anonymous pages.
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index ce02377f48908..23fb758b78297 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
> >        */
> >       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
> >
> > -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > -     mem_cgroup_put(dmabuf->memcg);
> > +     if (dmabuf->memcg) {
> > +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
> > +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
> > +             mem_cgroup_put(dmabuf->memcg);
> > +     }
> >
> >       dmabuf->ops->release(dmabuf);
> >
> > @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >               dmabuf->resv = resv;
> >       }
> >
> > -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
> > -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
> > -                                   GFP_KERNEL)) {
> > -             ret = -ENOMEM;
> > -             goto err_memcg;
> > -     }
> > -
> >       file->private_data = dmabuf;
> >       file->f_path.dentry->d_fsdata = dmabuf;
> >       dmabuf->file = file;
> > @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
> >
> >       return dmabuf;
> >
> > -err_memcg:
> > -     mem_cgroup_put(dmabuf->memcg);
> >  err_file:
> >       fput(file);
> >  err_module:
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index ac5f8685a6494..ff6e259afcdc0 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -7,13 +7,17 @@
> >   */
> >
> >  #include <linux/cdev.h>
> > +#include <linux/cgroup.h>
> >  #include <linux/device.h>
> >  #include <linux/dma-buf.h>
> >  #include <linux/dma-heap.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/err.h>
> >  #include <linux/export.h>
> >  #include <linux/list.h>
> >  #include <linux/nospec.h>
> > +#include <linux/pidfd.h>
> >  #include <linux/syscalls.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/xarray.h>
> > @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
> >                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
> >
> >  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > -                              u32 fd_flags,
> > -                              u64 heap_flags)
> > +                              u32 fd_flags, u64 heap_flags,
> > +                              struct mem_cgroup *charge_to)
> >  {
> >       struct dma_buf *dmabuf;
> > +     unsigned int nr_pages;
> > +     struct mem_cgroup *memcg = charge_to;
> >       int fd;
> >
> >       /*
> > @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> >       if (IS_ERR(dmabuf))
> >               return PTR_ERR(dmabuf);
> >
> > +     nr_pages = len / PAGE_SIZE;
> > +
> > +     if (memcg)
> > +             css_get(&memcg->css);
> > +     else if (mem_accounting)
> > +             memcg = get_mem_cgroup_from_mm(current->mm);
> > +
> > +     if (memcg) {
> > +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
> > +                     mem_cgroup_put(memcg);
> > +                     dma_buf_put(dmabuf);
> > +                     return -ENOMEM;
> > +             }
> > +             dmabuf->memcg = memcg;
> > +     }
> > +
> >       fd = dma_buf_fd(dmabuf, fd_flags);
> >       if (fd < 0) {
> >               dma_buf_put(dmabuf);
> > @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> >  {
> >       struct dma_heap_allocation_data *heap_allocation = data;
> >       struct dma_heap *heap = file->private_data;
> > +     struct mem_cgroup *memcg = NULL;
> > +     struct task_struct *task;
> > +     unsigned int pidfd_flags;
> >       int fd;
> >
> >       if (heap_allocation->fd)
> > @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
> >       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> >               return -EINVAL;
> >
> > +     if (heap_allocation->charge_pid_fd) {
> > +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
> > +             if (IS_ERR(task))
> > +                     return PTR_ERR(task);
> > +
> > +             memcg = get_mem_cgroup_from_mm(task->mm);
> > +             put_task_struct(task);
> > +     }
> > +
> >       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> >                                  heap_allocation->fd_flags,
> > -                                heap_allocation->heap_flags);
> > +                                heap_allocation->heap_flags,
> > +                                memcg);
> > +     mem_cgroup_put(memcg);
> >       if (fd < 0)
> >               return fd;
> >
> > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > index 03c2b87cb1112..95d7688167b93 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
> >               if (max_order < orders[i])
> >                       continue;
> >               flags = order_flags[i];
> > -             if (mem_accounting)
> > -                     flags |= __GFP_ACCOUNT;
> >               page = alloc_pages(flags, orders[i]);
> >               if (!page)
> >                       continue;
> > diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
> > index a4cf716a49fa6..e02b0f8cbc6a1 100644
> > --- a/include/uapi/linux/dma-heap.h
> > +++ b/include/uapi/linux/dma-heap.h
> > @@ -29,6 +29,10 @@
> >   *                   handle to the allocated dma-buf
> >   * @fd_flags:                file descriptor flags used when allocating
> >   * @heap_flags:              flags passed to heap
> > + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
> > + *                   charged for this allocation; 0 means charge the calling
> > + *                   process's cgroup
> > + * @__padding:               reserved, must be zero
> >   *
> >   * Provided by userspace as an argument to the ioctl
> >   */
> > @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
> >       __u32 fd;
> >       __u32 fd_flags;
> >       __u64 heap_flags;
> > +     __u32 charge_pid_fd;
> > +     __u32 __padding;
> >  };
> >
> >  #define DMA_HEAP_IOC_MAGIC           'H'
> >
>


^ permalink raw reply

* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: Paolo Bonzini @ 2026-05-13 12:43 UTC (permalink / raw)
  To: David Woodhouse, Marc Zyngier
  Cc: Jonathan Corbet, Shuah Khan, kvm, linux-doc, linux-kernel,
	Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <48b06e5655d56ff6eda30e563b34894fa0eb2f07.camel@infradead.org>

On 5/13/26 11:24, David Woodhouse wrote:
> On Wed, 2026-05-13 at 09:42 +0100, Marc Zyngier wrote:
>> If userspace is not a total joke, it will read all the ID registers,
>> and configure what it wants to see, assuming it is a feature that can
>> be configured (not everything can, because the architecture itself is
>> not fully backward compatible).
>>
>> Yes, this is buggy at times, because the combinatorial explosion of
>> CPU capabilities and supported features makes it pretty hard to test
>> (and really nobody actually does). But overall, it works, and QEMU is
>> growing an infrastructure to manage it in a "user friendly" way.
> 
> Yes, that is precisely what I'm asking for. I'm prepared to deal with
> the fact that KVM/Arm64 is not a stable and mature platform like x86
> is, and that userspace has to find all the random changes from one
> version to the next, and explicitly pin things down to be compatible.
> 
> All I'm asking for is that KVM makes it *possible* to pin things down
> to the behaviour of previously released Linux/KVM kernels.
> 
>> But really, this isn't what David is asking. He's demanding "bug for
>> bug" compatibility. For that, we have two possible cases:
> 
> No, I am not asking you to meet that bar. I merely observed that x86
> does and that it would be nice. But we are a *long* way from that.

x86 doesn't do bug-for-bug compatibility, thankfully - we have quirks
but only 11 of them, or about one per year since we started adding them.
We only add quirks, generally speaking, when 1) we change the way file
descriptors are initialized, 2) guests in the wild were relying on it,
or 3) it prevents restoring state saved from an old kernel.  Is there
anything else?

So you're asking something not really far from this:

>> - this is a behaviour that is not allowed by the architecture: we fix
>>    it for good. We do that on every release. Some minor, some much more
>>    visible. And there is no way we will add this sort of "bring the
>>    bugs back" type of behaviours. Specially when it is really obvious
>>    that no SW can make any reasonable use of the defect. We allow
>>    userspace to keep behaving as before, but the guest will not see a
>>    non-compliant behaviour.

... where
https://lore.kernel.org/kvm/e03f092dfbb7d391a6bf2797ba01e122ba080bcd.camel@infradead.org/
is an example of a bug that "no SW can make any reasonable use of".

> Marc, this is complete nonsense and you should know better.
> Once a behaviour is present in a released version of Linux/KVM, we
> can't just declare it "wrong" and unilaterally impose a change in
> guest-visible behaviour on *running* guests as a side-effect of a
> kernel upgrade.
> 
> The criterion for *KVM* to remain compatible is "once it has been in a
> released version of the kernel". Not "once it is in the architecture".

That is *also* obviously nonsense though, isn't it (see example above)? 
The truth is in the middle, "once it is in the architecture" is likely 
too narrow but "once it is in a Linux release" is way too broad.  And 
besides, both miss the point of *configurability* which is the basis of 
it all.
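
As a rough illustration of what configurability buys userspace, here is
a minimal sketch of computing a common baseline for an ID register
across two hosts. It assumes every 4-bit field is unsigned and that a
lower value is always the safe choice, which is not true of all real
fields (some are signed or not ordered at all), and the function name is
made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: for each 4-bit field of a 64-bit ID register,
 * keep the lower of the two hosts' values. Assumes unsigned,
 * lower-is-safe fields only.
 */
uint64_t id_reg_common_baseline(uint64_t host_a, uint64_t host_b)
{
	uint64_t out = 0;

	for (unsigned int shift = 0; shift < 64; shift += 4) {
		uint64_t fa = (host_a >> shift) & 0xf;
		uint64_t fb = (host_b >> shift) & 0xf;

		out |= (fa < fb ? fa : fb) << shift;
	}
	return out;
}
```

A VMM would write a value derived this way back before running the
guest, so that the guest sees the same thing on either host.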

The main difference between x86 and Arm is the default state at 
creation; x86 defaults to a blank slate, mostly; and when we didn't do 
that, we regretted it later (cue the STUFF_FEATURE_MSRS quirk).  It's 
too late to change the behavior for Arm, but I think we can agree that 
patches such as 
https://lore.kernel.org/kvm/20260511113558.3325004-2-dwmw2@infradead.org/ 
("KVM: arm64: vgic: Allow userspace to set IIDR revision 1") are what 
the letter and spirit of this proposal is about.

Marc did not mention having to deal with guests in the wild.  Let's 
ignore it for now because even defining "guests in the wild" is hard; 
and anyway it's not related to the patch that triggered the discussion.

So we have the third case, "restoring state saved from an old kernel". 
If this case arises, I do believe that Arm will have to deal with it and 
introduce quirks or KVM_GET/SET_REG hacks.  Maybe it hasn't happened 
yet, lucky you.

Overall, even if we may disagree about the details, are we really on 
terribly distant grounds, or are we not?

Paolo


^ permalink raw reply

* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Jonathan Corbet @ 2026-05-13 12:52 UTC (permalink / raw)
  To: Willy Tarreau, Greg KH
  Cc: Leon Romanovsky, skhan, security, workflows, linux-doc,
	linux-kernel
In-Reply-To: <agRfFoMC2Gcu0Esz@1wt.eu>

Willy Tarreau <w@1wt.eu> writes:

> On Wed, May 13, 2026 at 12:29:34PM +0200, Greg KH wrote:
>> On Tue, May 12, 2026 at 11:20:51AM -0600, Jonathan Corbet wrote:
>> > Willy Tarreau <w@1wt.eu> writes:

>> > > +* **Capability-based protection**:
>> > > +
>> > > +  * users not having the ``CAP_SYS_ADMIN`` capability may not alter the
>> > > +    kernel's configuration, memory nor state, change other users' view of the
>> > > +    file system layout, grant any user capabilities they do not have, nor
>> > > +    affect the system's availability (shutdown, reboot, panic, hang, or making
>> > > +    the system unresponsive via unbounded resource exhaustion).
>> > 
>> > That is pretty demonstrably not true, and will likely elicit challenges
>> > at some point.  There are a lot of "make me root" capabilities that
>> > enable users to do all of those things; consider CAP_DAC_OVERRIDE as an
>> > obvious example.  I think that just about all of the capabilities will
>> > enable at least one of those things - that's why the capabilities exist
>> > in the first place.  So I think this needs to be written far more
>> > generally.
>> 
>> You are right, there are more capabilities, but we get bug reports all
>> the time that basically come down to "a user with CAP_SYS_ADMIN can go
>> and do..." which are pointless for us to be handling.  Just got one a
>> few minutes ago, so LLMs are churning this crap out quite frequently.
>> 
>> So any rewording of this to prevent us from getting these pointless
>> reports would be great.
>
> Honestly we're seeing this through the angle of a patch that lists a
> single paragraph but the doc is already becoming quite long. I'm a bit
> afraid of adding long enumerations, or sentences which do not immediately
> translate to something recognizable by reporters. Not that it cannot be
> done, but I think the current situation warrants incremental improvements
> by fixing what doesn't work well. And indeed most of the capabilities
> based reports currently revolve around "I already have CAP_{SYS,NET}_ADMIN
> and ...". That might remain a good start for now.

I definitely wouldn't argue for making it longer, and enumerating all of
the make-me-root capabilities would be silly.  I would consider just
replacing CAP_SYS_ADMIN with "elevated capabilities" or some such.  That
might rule out legitimate reports where some capability provides an
access it shouldn't, but I suspect you could live with that :)

Thanks,

jon

^ permalink raw reply

* Re: [PATCH v3 3/3] Documentation: security-bugs: clarify requirements for AI-assisted reports
From: Jonathan Corbet @ 2026-05-13 12:53 UTC (permalink / raw)
  To: Willy Tarreau, Greg KH
  Cc: Leon Romanovsky, skhan, security, workflows, linux-doc,
	linux-kernel
In-Reply-To: <agRfXQvN7ZDTNGQG@1wt.eu>

Willy Tarreau <w@1wt.eu> writes:

> On Wed, May 13, 2026 at 12:30:10PM +0200, Greg KH wrote:
>> > One nit:
>> > 
>> > > +  * **Impact Evaluation**: Many AI-generated reports lack an understanding of
>> > > +    the kernel's threat model and go to great lengths inventing theoretical
>> > > +    consequences.
>> > 
>> > If only we had a shiny new document describing that threat model that we
>> > could reference here... :)
>> 
>> Ah yes, a link to that would make things better, but don't we have that
>> elsewhere in this series?
>
> It's in the same patch, I think Jon was sarcastic here. I thought I had
> addressed that one but apparently I was wrong :-/

I'm just saying that this particular text should link to that document,
don't make readers go searching for it.  I can certainly add a patch
doing that if you like.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH v3 3/3] Documentation: security-bugs: clarify requirements for AI-assisted reports
From: Willy Tarreau @ 2026-05-13 12:58 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Greg KH, Leon Romanovsky, skhan, security, workflows, linux-doc,
	linux-kernel
In-Reply-To: <87a4u3mpxk.fsf@trenco.lwn.net>

On Wed, May 13, 2026 at 06:53:11AM -0600, Jonathan Corbet wrote:
> Willy Tarreau <w@1wt.eu> writes:
> 
> > On Wed, May 13, 2026 at 12:30:10PM +0200, Greg KH wrote:
> >> > One nit:
> >> > 
> >> > > +  * **Impact Evaluation**: Many AI-generated reports lack an understanding of
> >> > > +    the kernel's threat model and go to great lengths inventing theoretical
> >> > > +    consequences.
> >> > 
> >> > If only we had a shiny new document describing that threat model that we
> >> > could reference here... :)
> >> 
> >> Ah yes, a link to that would make things better, but don't we have that
> >> elsewhere in this series?
> >
> > It's in the same patch, I think Jon was sarcastic here. I thought I had
> > addressed that one but apparently I was wrong :-/
> 
> I'm just saying that this particular text should link to that document,
> don't make readers go searching for it.  I can certainly add a patch
> doing that if you like.

That would be kind, thank you Jon. If you prefer, feel free to modify my
patch directly, provided you haven't published it yet.

Thanks!
willy

^ permalink raw reply

* Re: [PATCH v3 2/3] Documentation: security-bugs: explain what is and is not a security bug
From: Willy Tarreau @ 2026-05-13 13:00 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Greg KH, Leon Romanovsky, skhan, security, workflows, linux-doc,
	linux-kernel
In-Reply-To: <87ecjfmpzj.fsf@trenco.lwn.net>

On Wed, May 13, 2026 at 06:52:00AM -0600, Jonathan Corbet wrote:
> Willy Tarreau <w@1wt.eu> writes:
> 
> > On Wed, May 13, 2026 at 12:29:34PM +0200, Greg KH wrote:
> >> On Tue, May 12, 2026 at 11:20:51AM -0600, Jonathan Corbet wrote:
> >> > Willy Tarreau <w@1wt.eu> writes:
> 
> >> > > +* **Capability-based protection**:
> >> > > +
> >> > > +  * users not having the ``CAP_SYS_ADMIN`` capability may not alter the
> >> > > +    kernel's configuration, memory nor state, change other users' view of the
> >> > > +    file system layout, grant any user capabilities they do not have, nor
> >> > > +    affect the system's availability (shutdown, reboot, panic, hang, or making
> >> > > +    the system unresponsive via unbounded resource exhaustion).
> >> > 
> >> > That is pretty demonstrably not true, and will likely elicit challenges
> >> > at some point.  There are a lot of "make me root" capabilities that
> >> > enable users to do all of those things; consider CAP_DAC_OVERRIDE as an
> >> > obvious example.  I think that just about all of the capabilities will
> >> > enable at least one of those things - that's why the capabilities exist
> >> > in the first place.  So I think this needs to be written far more
> >> > generally.
> >> 
> >> You are right, there are more capabilities, but we get bug reports all
> >> the time that basically come down to "a user with CAP_SYS_ADMIN can go
> >> and do..." which are pointless for us to be handling.  Just got one a
> >> few minutes ago, so LLMs are churning this crap out quite frequently.
> >> 
> >> So any rewording of this to prevent us from getting these pointless
> >> reports would be great.
> >
> > Honestly we're seeing this through the angle of a patch that lists a
> > single paragraph but the doc is already becoming quite long. I'm a bit
> > afraid of adding long enumerations, or sentences which do not immediately
> > translate to something recognizable by reporters. Not that it cannot be
> > done, but I think the current situation warrants incremental improvements
> > by fixing what doesn't work well. And indeed most of the capabilities
> > based reports currently revolve around "I already have CAP_{SYS,NET}_ADMIN
> > and ...". That might remain a good start for now.
> 
> I definitely wouldn't argue for making it longer, and enumerating all of
> the make-me-root capabilities would be silly.  I would consider just
> replacing CAP_SYS_ADMIN with "elevated capabilities" or some such.  That
> might rule out legitimate reports where some capability provides an
> access it shouldn't, but I suspect you could live with that :)

I think it could indeed work like this, without distorting the rest
of the paragraph, while giving broader coverage. Do you think you could
amend/update it? I'm not trying to add to your burden, it's just that
it will take me more time before I can provide an update :-/

Thanks,
Willy

^ permalink raw reply

* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: Eric Auger @ 2026-05-13 13:03 UTC (permalink / raw)
  To: Paolo Bonzini, David Woodhouse, Marc Zyngier
  Cc: Jonathan Corbet, Shuah Khan, kvm, linux-doc, linux-kernel,
	Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Raghavendra Rao Ananta, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <ba08dfe9-932b-40c3-9fdf-fc891d52e1d8@redhat.com>

Hi,

On 5/13/26 2:43 PM, Paolo Bonzini wrote:
> On 5/13/26 11:24, David Woodhouse wrote:
>> On Wed, 2026-05-13 at 09:42 +0100, Marc Zyngier wrote:
>>> If userspace is not a total joke, it will read all the ID registers,
>>> and configure what it wants to see, assuming it is a feature that can
>>> be configured (not everything can, because the architecture itself is
>>> not fully backward compatible).
>>>
>>> Yes, this is buggy at times, because the combinatorial explosion of
>>> CPU capabilities and supported features makes it pretty hard to test
>>> (and really nobody actually does). But overall, it works, and QEMU is
>>> growing an infrastructure to manage it in a "user friendly" way.
>>
>> Yes, that is precisely what I'm asking for. I'm prepared to deal with
>> the fact that KVM/Arm64 is not a stable and mature platform like x86
>> is, and that userspace has to find all the random changes from one
>> version to the next, and explicitly pin things down to be compatible.
>>
>> All I'm asking for is that KVM makes it *possible* to pin things down
>> to the behaviour of previously released Linux/KVM kernels.
>>
>>> But really, this isn't what David is asking. He's demanding "bug for
>>> bug" compatibility. For that, we have two possible cases:
>>
>> No, I am not asking you to meet that bar. I merely observed that x86
>> does and that it would be nice. But we are a *long* way from that.
>
> x86 doesn't do bug-for-bug compatibility, thankfully - we have quirks
> but only 11 of them, or about one per year since we started adding
> them.  We only add quirks, generally speaking, when 1) we change the
> way file descriptors are initialized, 2) guests in the wild were
> relying on it, or 3) it prevents restoring state saved from an old
> kernel.  Is there anything else?
>
> So you're asking something not really far from this:
>
>>> - this is a behaviour that is not allowed by the architecture: we fix
>>>    it for good. We do that on every release. Some minor, some much more
>>>    visible. And there is no way we will add this sort of "bring the
>>>    bugs back" type of behaviours. Specially when it is really obvious
>>>    that no SW can make any reasonable use of the defect. We allow
>>>    userspace to keep behaving as before, but the guest will not see a
>>>    non-compliant behaviour.
>
> ... where for example
> https://lore.kernel.org/kvm/e03f092dfbb7d391a6bf2797ba01e122ba080bcd.camel@infradead.org/
> is an example of a bug that "no SW can make any reasonable use of".
>
>> Marc, this is complete nonsense and you should know better.
>> Once a behaviour is present in a released version of Linux/KVM, we
>> can't just declare it "wrong" and unilaterally impose a change in
>> guest-visible behaviour on *running* guests as a side-effect of a
>> kernel upgrade.
>>
>> The criterion for *KVM* to remain compatible is "once it has been in a
>> released version of the kernel". Not "once it is in the architecture".
>
> That is *also* obviously nonsense though, isn't it (see example
> above)? The truth is in the middle, "once it is in the architecture"
> is likely too narrow but "once it is in a Linux release" is way too
> broad.  And besides, both miss the point of *configurability* which is
> the basis of it all.
>
> The main difference between x86 and Arm is the default state at
> creation; x86 defaults to a blank slate, mostly; and when we didn't do
> that, we regretted it later (cue the STUFF_FEATURE_MSRS quirk).  It's
> too late to change the behavior for Arm, but I think we can agree that
> patches such as
> https://lore.kernel.org/kvm/20260511113558.3325004-2-dwmw2@infradead.org/
> ("KVM: arm64: vgic: Allow userspace to set IIDR revision 1") are what
> the letter and spirit of this proposal is about.
>
> Marc did not mention having to deal with guests in the wild.  Let's
> ignore it for now because even defining "guests in the wild" is hard;
> and anyway it's not related to the patch that triggered the discussion.
>
> So we have the third case, "restoring state saved from an old kernel".
> If this case arises, I do believe that Arm will have to deal with it
> and introduce quirks or KVM_GET/SET_REG hacks.  Maybe it hasn't
> happened yet, lucky you. 

For info, this QEMU series was merged recently:

[PATCH v10 0/7] Mitigation of "failed to load
cpu:cpreg_vmstate_array_len" migration failures
https://lore.kernel.org/all/20260420140552.104369-1-eric.auger@redhat.com/#r

It brings infrastructure to mitigate some migration failures across
different kernel versions.

There is also [PATCH v4 00/17] kvm/arm: Introduce a customizable aarch64
KVM host model, which is under review:
https://lore.kernel.org/all/20260503073541.790215-1-eric.auger@redhat.com/

That series aims to make it possible to set writable ID regs on the host
passthrough vCPU model.

Thanks

Eric


>
> Overall, even if we may disagree about the details, are we really on
> terribly distant grounds, or are we not?
>
> Paolo
>


^ permalink raw reply

* Re: [PATCH RFC] printk: remove BOOT_PRINTK_DELAY
From: Petr Mladek @ 2026-05-13 13:04 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Jonathan Corbet, Shuah Khan, Russell King, Florian Fainelli,
	Ray Jui, Scott Branden, Broadcom internal kernel review list,
	Steven Rostedt, John Ogness, Sergey Senozhatsky, Andrew Morton,
	Sebastian Andrzej Siewior, Randy Dunlap, Clark Williams,
	linux-doc, linux-kernel, linux-arm-kernel, linux-rpi-kernel,
	linux-rt-devel, Linus Torvalds
In-Reply-To: <CALqELGxhXO=kzh9bpztd9=Ug9ykPL2NALo9Apq3=Oj6aeiEcKg@mail.gmail.com>

On Wed 2026-05-06 23:37:01, Andrew Murray wrote:
> On Tue, 5 May 2026 at 15:26, Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Tue 2026-05-05 14:45:00, Andrew Murray wrote:
> > > The CONFIG_BOOT_PRINTK_DELAY option enables support for the boot_delay
> > > kernel parameter, which allows a configurable delay to be added before
> > > each and every printk is emitted. This is a DEBUG_KERNEL option that is
> > > helpful for debugging as kernel output can be slowed down during boot
> > > allowing messages to be seen before scrolling off the screen, or to
> > > correlate timing between some physical event and console output.
> > >
> > > However, since the introduction of nbcon and the legacy printer thread for
> > > PREEMPT_RT kernels, printk records are now emitted to the console
> > > asynchronously to the caller of printk and its boot_delay. The delay added
> > > by boot_delay continues to slow down the calling process, but may not have
> > > any impact on the rate at which records are emitted to the console. For
> > > example, if delay_use is set to 100ms, and the printer thread has a
> > > backlog of more than 100ms, perhaps due to a slow serial console, then the
> > > records will appear to be printed without any delay between them.
> > >
> > > It would be unhelpful to add a delay to the printer thread, and it would
> > > not be possible to disallow selection of CONFIG_BOOT_PRINTK_DELAY at build
> > > time as it's not possible to detect which consoles are nbcon enabled at
> > > build time. Therefore, let's remove this feature.
> >
> > Heh, Randy proposed to remove "boot_delay" a few days ago.
> > This RFC goes even further and removes both "boot_delay" and
> > "printk_delay".
> 
> Apologies, I didn't see this. I'll co-ordinate with Randy.

No need to apologize.

> > Honestly, I do not feel comfortable with this. The delay seems to
> > be handy when there is only graphical console. I would suggest
> > to do:
> >
> >    1. Obsolete "boot_delay" with "printk_delay" as
> >       proposed in Randy's thread, see
> >       https://lore.kernel.org/all/afn2sYKKsqG4QBVX@pathway.suse.cz/
> 
> Your suggestion was:
> 
> " 1. Add "printk_delay" early_param() which would allow
>      to set "printk_delay_msec" via command line."
> 
> And I assume the intent is to replicate the functionality of
> boot_delay, by allowing printk_delay to be used to introduce delays
> from early_param time? Thus deprecating delay_use.

Exactly.

> 
> "  2. Modify boot_delay_setup() to set "printk_delay_msec" as well.
>      In addition, it might print a message that it has been
>      obsoleted by "printk_delay" and will be removed."
> 
> Given the intent may be to deprecate boot_delay, I'm not sure that
> setting printk_delay_msec as well would be beneficial, as this would
> extend its functionality to add delays beyond SYSTEM_RUNNING which is
> where boot_delay stops. Unless you mean to use boot_delay as an alias
> to an early_param hook for printk_delay?

I do not think that this is a big problem. As you write below, it is
a debug feature. IMHO, people debugging boot problems won't mind when
the delay continues beyond SYSTEM_RUNNING. And if anyone complains
then we would at least know that there are people using this feature ;-)

> It seems that there are also differences in behavior between
> printk_delay and boot_delay, with printk_delay unconditionally adding
> delays to all printks, and delay_use which considers the loglevel.

The unconditional delay does not make much sense. I consider it a bug.
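
To spell out what a loglevel-aware delay means, here is a minimal
userspace model. It assumes the same rule boot_delay applies today,
namely that a record which would be suppressed on the console should not
cost any delay; the function name and parameter list are invented for
this sketch and are not kernel API.

```c
#include <stdbool.h>

/*
 * Hypothetical model: return the delay (in ms) to apply for a record.
 * A record is suppressed when its level is not below console_loglevel,
 * unless ignore_loglevel is set. Suppressed records get no delay.
 */
unsigned int delay_for_record(int level, int console_loglevel,
			      bool ignore_loglevel, unsigned int delay_msec)
{
	bool suppressed = level >= console_loglevel && !ignore_loglevel;

	return suppressed ? 0 : delay_msec;
}
```

With console_loglevel set to 4, a KERN_DEBUG (7) record would add no
delay while a KERN_ERR (3) record would add the full delay.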

> >
> >    2. Move printk_delay() from vprintk_emit() to
> >       console_emit_next_record() and nbcon_emit_next_record().
> >
> >       For nbcon console, even better would be to use a sleeping
> >       wait in nbcon_kthread_func(). But it would need some
> >       changes to call it only when a record was really emitted.
> >       Also we would need to use the busy wait in
> >       __nbcon_atomic_flush_pending_con().
> 
> This makes sense.
> 
> If the use case (in a post kthread printk thread world), is only
> relevant for graphical consoles, then I do wonder if printk_delay and
> boot_delay can be replaced with a more specific solution? Now that we
> have printk threads, the time in which a printk is presented to the
> user may not relate to when it was created, and I fear people may
> continue to debug issues that rely on that assumption.
> 
> I think the most pragmatic solution for now is:
> - Move the printk delay to the point where the printk is actually
> printed (e.g. console_flush_one_record and descendants)
> - Add an early_param to allow for printk_delay_msec to be set
> - Deprecate boot_delay, by using it as an alias for setting
> printk_delay_msec, and include a user message that it is being
> deprecated and that it now extends to beyond boot (which could impact
> performance on non PREEMPT_RT and non nbcon systems)

Sounds good.

> - Update printk_delay function to use the appropriate mechanism to
> delay based on stage of boot and using printk_delay_msec instead of
> boot_delay.

Good point! I thought that mdelay() can be used even for the early
messages because parse_early_param() is called right before
parse_args() in start_kernel() in init/main.c.

But parse_early_param() might be called even earlier, for example,
by setup_arch in arch/x86/kernel/setup.c. And it is called before

  + tsc_early_init()
    + tsc_enable_sched_clock()
      + loops_per_jiffy = get_loops_per_jiffy()

which seems to be used by

  + mdelay()
    + udelay()
      + __const_udelay()

Anyway, this has to be sorted out before printk_delay_msec can be set
via an early parameter.
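
One possible shape for that decision, as a minimal sketch only: skip the
delay until the delay loop is calibrated, sleep where the context
allows it, and busy-wait otherwise. The enum, the function and both
flags are hypothetical names for this example, and skipping early
messages is just one of the policies that could be chosen.

```c
#include <stdbool.h>

/* Hypothetical set of delay mechanisms for the emission path. */
enum delay_method {
	DELAY_SKIP,      /* delay loop not calibrated yet: do not delay */
	DELAY_BUSY_WAIT, /* atomic flush path: mdelay()-style busy wait */
	DELAY_SLEEP,     /* printer kthread: sleeping wait */
};

/* Hypothetical policy: pick a mechanism from boot stage and context. */
enum delay_method pick_delay_method(bool delay_calibrated, bool can_sleep)
{
	if (!delay_calibrated)
		return DELAY_SKIP;
	return can_sleep ? DELAY_SLEEP : DELAY_BUSY_WAIT;
}
```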

> If that makes sense I can fashion a patchset.

That would be great.

Best Regards,
Petr

PS: Note that I am traveling the following week so my review might
    get delayed.

^ permalink raw reply

