* Re: [PATCH 07/12] HID: spi_hid: add ACPI support for SPI over HID
From: Dmitry Torokhov @ 2026-03-11 5:27 UTC (permalink / raw)
To: Jingyuan Liang
Cc: Jiri Kosina, Benjamin Tissoires, Jonathan Corbet, Mark Brown,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, linux-input, linux-doc,
linux-kernel, linux-spi, linux-trace-kernel, devicetree, hbarnor,
Angela Czubak
In-Reply-To: <20260303-send-upstream-v1-7-1515ba218f3d@chromium.org>
On Tue, Mar 03, 2026 at 06:12:59AM +0000, Jingyuan Liang wrote:
> From: Angela Czubak <acz@semihalf.com>
>
> Detect SPI HID devices described in ACPI.
>
> Signed-off-by: Angela Czubak <acz@semihalf.com>
> Signed-off-by: Jingyuan Liang <jingyliang@chromium.org>
> ---
> drivers/hid/spi-hid/Kconfig | 15 +++
> drivers/hid/spi-hid/Makefile | 1 +
> drivers/hid/spi-hid/spi-hid-acpi.c | 253 +++++++++++++++++++++++++++++++++++++
> drivers/hid/spi-hid/spi-hid-core.c | 27 +---
> drivers/hid/spi-hid/spi-hid.h | 44 +++++++
> 5 files changed, 316 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/hid/spi-hid/Kconfig b/drivers/hid/spi-hid/Kconfig
> index 836fdefe8345..114b1e00da39 100644
> --- a/drivers/hid/spi-hid/Kconfig
> +++ b/drivers/hid/spi-hid/Kconfig
> @@ -10,6 +10,21 @@ menuconfig SPI_HID
>
> if SPI_HID
>
> +config SPI_HID_ACPI
> + tristate "HID over SPI transport layer ACPI driver"
> + depends on ACPI
> + select SPI_HID_CORE
> + help
> + Say Y here if you use a keyboard, a touchpad, a touchscreen, or any
> + other HID based devices which are connected to your computer via SPI.
> + This driver supports ACPI-based systems.
> +
> + If unsure, say N.
> +
> + This support is also available as a module. If so, the module
> + will be called spi-hid-acpi. It will also build/depend on the
> + module spi-hid.
> +
> config SPI_HID_CORE
> tristate
> endif
> diff --git a/drivers/hid/spi-hid/Makefile b/drivers/hid/spi-hid/Makefile
> index 92e24cddbfc2..753c7b7a7844 100644
> --- a/drivers/hid/spi-hid/Makefile
> +++ b/drivers/hid/spi-hid/Makefile
> @@ -7,3 +7,4 @@
>
> obj-$(CONFIG_SPI_HID_CORE) += spi-hid.o
> spi-hid-objs = spi-hid-core.o
> +obj-$(CONFIG_SPI_HID_ACPI) += spi-hid-acpi.o
> diff --git a/drivers/hid/spi-hid/spi-hid-acpi.c b/drivers/hid/spi-hid/spi-hid-acpi.c
> new file mode 100644
> index 000000000000..612e74fe72f9
> --- /dev/null
> +++ b/drivers/hid/spi-hid/spi-hid-acpi.c
> @@ -0,0 +1,253 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * HID over SPI protocol, ACPI related code
> + *
> + * Copyright (c) 2021 Microsoft Corporation
> + * Copyright (c) 2026 Google LLC
> + *
> + * This code was forked out of the HID over SPI core code, which is partially
> + * based on "HID over I2C protocol implementation:
> + *
> + * Copyright (c) 2012 Benjamin Tissoires <benjamin.tissoires@gmail.com>
> + * Copyright (c) 2012 Ecole Nationale de l'Aviation Civile, France
> + * Copyright (c) 2012 Red Hat, Inc
> + *
> + * which in turn is partially based on "USB HID support for Linux":
> + *
> + * Copyright (c) 1999 Andreas Gal
> + * Copyright (c) 2000-2005 Vojtech Pavlik <vojtech@suse.cz>
> + * Copyright (c) 2005 Michael Haboustak <mike-@cinci.rr.com> for Concept2, Inc
> + * Copyright (c) 2007-2008 Oliver Neukum
> + * Copyright (c) 2006-2010 Jiri Kosina
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/reset.h>
> +#include <linux/uuid.h>
> +
> +#include "spi-hid.h"
> +
> +/* Config structure is filled with data from ACPI */
> +struct spi_hid_acpi_config {
> + struct spihid_ops ops;
> +
> + struct spi_hid_conf property_conf;
> + u32 post_power_on_delay_ms;
> + u32 minimal_reset_delay_ms;
> + struct acpi_device *adev;
> +};
> +
> +/* HID SPI Device: 6e2ac436-0fcf41af-a265-b32a220dcfab */
> +static guid_t spi_hid_guid =
> + GUID_INIT(0x6E2AC436, 0x0FCF, 0x41AF,
> + 0xA2, 0x65, 0xB3, 0x2A, 0x22, 0x0D, 0xCF, 0xAB);
> +
> +static int spi_hid_acpi_populate_config(struct spi_hid_acpi_config *conf,
> + struct acpi_device *adev)
> +{
> + acpi_handle handle = acpi_device_handle(adev);
> + union acpi_object *obj;
> +
> + conf->adev = adev;
> +
> + /* Revision 3 for HID over SPI V1, see specification. */
> + obj = acpi_evaluate_dsm_typed(handle, &spi_hid_guid, 3, 1, NULL,
> + ACPI_TYPE_INTEGER);
> + if (!obj) {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID input report header address failed");
> + return -ENODEV;
> + }
> + conf->property_conf.input_report_header_address = obj->integer.value;
> + ACPI_FREE(obj);
> +
> + obj = acpi_evaluate_dsm_typed(handle, &spi_hid_guid, 3, 2, NULL,
> + ACPI_TYPE_INTEGER);
> + if (!obj) {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID input report body address failed");
> + return -ENODEV;
> + }
> + conf->property_conf.input_report_body_address = obj->integer.value;
> + ACPI_FREE(obj);
> +
> + obj = acpi_evaluate_dsm_typed(handle, &spi_hid_guid, 3, 3, NULL,
> + ACPI_TYPE_INTEGER);
> + if (!obj) {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID output report header address failed");
> + return -ENODEV;
> + }
> + conf->property_conf.output_report_address = obj->integer.value;
> + ACPI_FREE(obj);
> +
> + obj = acpi_evaluate_dsm_typed(handle, &spi_hid_guid, 3, 4, NULL,
> + ACPI_TYPE_BUFFER);
> + if (!obj) {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID read opcode failed");
> + return -ENODEV;
> + }
> + if (obj->buffer.length == 1) {
> + conf->property_conf.read_opcode = obj->buffer.pointer[0];
> + } else {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID read opcode, too long buffer");
> + ACPI_FREE(obj);
> + return -ENODEV;
> + }
> + ACPI_FREE(obj);
> +
> + obj = acpi_evaluate_dsm_typed(handle, &spi_hid_guid, 3, 5, NULL,
> + ACPI_TYPE_BUFFER);
> + if (!obj) {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID write opcode failed");
> + return -ENODEV;
> + }
> + if (obj->buffer.length == 1) {
> + conf->property_conf.write_opcode = obj->buffer.pointer[0];
> + } else {
> + acpi_handle_err(handle,
> + "Error _DSM call to get HID write opcode, too long buffer");
> + ACPI_FREE(obj);
> + return -ENODEV;
> + }
> + ACPI_FREE(obj);
> +
> + /* Value not provided in ACPI,*/
> + conf->post_power_on_delay_ms = 5;
> + conf->minimal_reset_delay_ms = 150;
> +
> + if (!acpi_has_method(handle, "_RST")) {
> + acpi_handle_err(handle, "No reset method for acpi handle");
> + return -ENODEV;
I would return -EINVAL as we have the device with right _DSM but without
mandated by the spec _RST.
> + }
> +
> + /* FIXME: not reading hid-over-spi-flags, multi-SPI not supported */
> +
> + return 0;
> +}
> +
> +static int spi_hid_acpi_power_none(struct spihid_ops *ops)
> +{
> + return 0;
> +}
> +
> +static int spi_hid_acpi_power_down(struct spihid_ops *ops)
> +{
> + struct spi_hid_acpi_config *conf = container_of(ops,
> + struct spi_hid_acpi_config,
> + ops);
> +
> + return acpi_device_set_power(conf->adev, ACPI_STATE_D3);
> +}
> +
> +static int spi_hid_acpi_power_up(struct spihid_ops *ops)
> +{
> + struct spi_hid_acpi_config *conf = container_of(ops,
> + struct spi_hid_acpi_config,
> + ops);
> + int error;
> +
> + error = acpi_device_set_power(conf->adev, ACPI_STATE_D0);
> + if (error) {
> + dev_err(&conf->adev->dev, "Error could not power up ACPI device: %d.", error);
> + return error;
> + }
> +
> + if (conf->post_power_on_delay_ms)
> + msleep(conf->post_power_on_delay_ms);
> +
> + return 0;
> +}
> +
> +static int spi_hid_acpi_assert_reset(struct spihid_ops *ops)
> +{
> + return 0;
> +}
> +
> +static int spi_hid_acpi_deassert_reset(struct spihid_ops *ops)
> +{
> + struct spi_hid_acpi_config *conf = container_of(ops,
> + struct spi_hid_acpi_config,
> + ops);
> +
> + return device_reset(&conf->adev->dev);
> +}
> +
> +static void spi_hid_acpi_sleep_minimal_reset_delay(struct spihid_ops *ops)
> +{
> + struct spi_hid_acpi_config *conf = container_of(ops,
> + struct spi_hid_acpi_config,
> + ops);
> + usleep_range(1000 * conf->minimal_reset_delay_ms,
> + 1000 * (conf->minimal_reset_delay_ms + 1));
I'd probably use "fsleep(conf->minimal_reset_delay_ms * 1000)".
> +}
> +
> +static int spi_hid_acpi_probe(struct spi_device *spi)
> +{
> + struct device *dev = &spi->dev;
> + struct acpi_device *adev;
> + struct spi_hid_acpi_config *config;
> + int error;
> +
> + adev = ACPI_COMPANION(dev);
> + if (!adev) {
> + dev_err(dev, "Error could not get ACPI device.");
> + return -ENODEV;
> + }
> +
> + config = devm_kzalloc(dev, sizeof(struct spi_hid_acpi_config),
> + GFP_KERNEL);
> + if (!config)
> + return -ENOMEM;
> +
> + if (acpi_device_power_manageable(adev)) {
> + config->ops.power_up = spi_hid_acpi_power_up;
> + config->ops.power_down = spi_hid_acpi_power_down;
> + } else {
> + config->ops.power_up = spi_hid_acpi_power_none;
> + config->ops.power_down = spi_hid_acpi_power_none;
> + }
> + config->ops.assert_reset = spi_hid_acpi_assert_reset;
> + config->ops.deassert_reset = spi_hid_acpi_deassert_reset;
> + config->ops.sleep_minimal_reset_delay =
> + spi_hid_acpi_sleep_minimal_reset_delay;
> +
> + error = spi_hid_acpi_populate_config(config, adev);
> + if (error) {
> + dev_err(dev, "%s: unable to populate config data.", __func__);
> + return error;
> + }
I would add a blank line.
> + return spi_hid_core_probe(spi, &config->ops, &config->property_conf);
> +}
> +
> +static const struct acpi_device_id spi_hid_acpi_match[] = {
> + { "ACPI0C51", 0 },
> + { "PNP0C51", 0 },
> + { },
No comma on sentinels.
> +};
> +MODULE_DEVICE_TABLE(acpi, spi_hid_acpi_match);
> +
> +static struct spi_driver spi_hid_acpi_driver = {
> + .driver = {
> + .name = "spi_hid_acpi",
> + .owner = THIS_MODULE,
> + .acpi_match_table = ACPI_PTR(spi_hid_acpi_match),
This is dependent on ACPI, so no need to sue ACPI_PTR().
> + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> + .dev_groups = spi_hid_groups,
> + },
> + .probe = spi_hid_acpi_probe,
> + .remove = spi_hid_core_remove,
> +};
> +
> +module_spi_driver(spi_hid_acpi_driver);
> +
> +MODULE_DESCRIPTION("HID over SPI ACPI transport driver");
> +MODULE_AUTHOR("Angela Czubak <aczubak@google.com>");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/hid/spi-hid/spi-hid-core.c b/drivers/hid/spi-hid/spi-hid-core.c
> index e3273846267e..02beb209a92d 100644
> --- a/drivers/hid/spi-hid/spi-hid-core.c
> +++ b/drivers/hid/spi-hid/spi-hid-core.c
> @@ -43,6 +43,9 @@
> #include <linux/wait.h>
> #include <linux/workqueue.h>
>
> +#include "spi-hid.h"
> +#include "spi-hid-core.h"
> +
> /* Protocol constants */
> #define SPI_HID_READ_APPROVAL_CONSTANT 0xff
> #define SPI_HID_INPUT_HEADER_SYNC_BYTE 0x5a
> @@ -105,30 +108,6 @@ struct spi_hid_output_report {
> u8 *content;
> };
>
> -/* struct spi_hid_conf - Conf provided to the core */
> -struct spi_hid_conf {
> - u32 input_report_header_address;
> - u32 input_report_body_address;
> - u32 output_report_address;
> - u8 read_opcode;
> - u8 write_opcode;
> -};
> -
> -/**
> - * struct spihid_ops - Ops provided to the core
> - * @power_up: do sequencing to power up the device
> - * @power_down: do sequencing to power down the device
> - * @assert_reset: do sequencing to assert the reset line
> - * @deassert_reset: do sequencing to deassert the reset line
> - */
> -struct spihid_ops {
> - int (*power_up)(struct spihid_ops *ops);
> - int (*power_down)(struct spihid_ops *ops);
> - int (*assert_reset)(struct spihid_ops *ops);
> - int (*deassert_reset)(struct spihid_ops *ops);
> - void (*sleep_minimal_reset_delay)(struct spihid_ops *ops);
> -};
> -
> static struct hid_ll_driver spi_hid_ll_driver;
>
> static void spi_hid_populate_read_approvals(const struct spi_hid_conf *conf,
> diff --git a/drivers/hid/spi-hid/spi-hid.h b/drivers/hid/spi-hid/spi-hid.h
> new file mode 100644
> index 000000000000..1fdd45262647
> --- /dev/null
> +++ b/drivers/hid/spi-hid/spi-hid.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2021 Microsoft Corporation
> + * Copyright (c) 2026 Google LLC
> + */
> +
> +#ifndef SPI_HID_H
> +#define SPI_HID_H
> +
> +#include <linux/spi/spi.h>
> +#include <linux/sysfs.h>
> +
> +/* struct spi_hid_conf - Conf provided to the core */
> +struct spi_hid_conf {
> + u32 input_report_header_address;
> + u32 input_report_body_address;
> + u32 output_report_address;
> + u8 read_opcode;
> + u8 write_opcode;
> +};
> +
> +/**
> + * struct spihid_ops - Ops provided to the core
> + * @power_up: do sequencing to power up the device
> + * @power_down: do sequencing to power down the device
> + * @assert_reset: do sequencing to assert the reset line
> + * @deassert_reset: do sequencing to deassert the reset line
> + */
> +struct spihid_ops {
> + int (*power_up)(struct spihid_ops *ops);
> + int (*power_down)(struct spihid_ops *ops);
> + int (*assert_reset)(struct spihid_ops *ops);
> + int (*deassert_reset)(struct spihid_ops *ops);
> + void (*sleep_minimal_reset_delay)(struct spihid_ops *ops);
> +};
> +
> +int spi_hid_core_probe(struct spi_device *spi, struct spihid_ops *ops,
> + struct spi_hid_conf *conf);
> +
> +void spi_hid_core_remove(struct spi_device *spi);
> +
> +extern const struct attribute_group *spi_hid_groups[];
> +
> +#endif /* SPI_HID_H */
I am not sure if this belongs to this patch or if it should be better in
the patch introducing the main driver from the beginning.
For the ACPI part:
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [PATCH 15/61] trace: Prefer IS_ERR_OR_NULL over manual NULL check
From: Masami Hiramatsu @ 2026-03-11 5:13 UTC (permalink / raw)
To: Steven Rostedt
Cc: Philipp Hahn, amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel,
dri-devel, gfs2, intel-gfx, intel-wired-lan, iommu, kvm,
linux-arm-kernel, linux-block, linux-bluetooth, linux-btrfs,
linux-cifs, linux-clk, linux-erofs, linux-ext4, linux-fsdevel,
linux-gpio, linux-hyperv, linux-input, linux-kernel, linux-leds,
linux-media, linux-mips, linux-mm, linux-modules, linux-mtd,
linux-nfs, linux-omap, linux-phy, linux-pm, linux-rockchip,
linux-s390, linux-scsi, linux-sctp, linux-security-module,
linux-sh, linux-sound, linux-stm32, linux-trace-kernel, linux-usb,
linux-wireless, netdev, ntfs3, samba-technical, sched-ext,
target-devel, tipc-discussion, v9fs, Masami Hiramatsu,
Mathieu Desnoyers
In-Reply-To: <20260310100750.303af303@gandalf.local.home>
On Tue, 10 Mar 2026 10:07:50 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 10 Mar 2026 12:48:41 +0100
> Philipp Hahn <phahn-oss@avm.de> wrote:
>
> > Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> > check.
>
> Why?
>
> >
> > Change generated with coccinelle.
> >
> > To: Steven Rostedt <rostedt@goodmis.org>
> > To: Masami Hiramatsu <mhiramat@kernel.org>
> > To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-trace-kernel@vger.kernel.org
> > Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> > ---
> > kernel/trace/fprobe.c | 2 +-
> > kernel/trace/kprobe_event_gen_test.c | 2 +-
> > kernel/trace/trace_events_hist.c | 2 +-
> > 3 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> > index dcadf1d23b8a31f571392d0c49cbd22df1716b4f..a94ce810d83b90f55d1178a9bd29c78fd068df4c 100644
> > --- a/kernel/trace/fprobe.c
> > +++ b/kernel/trace/fprobe.c
> > @@ -607,7 +607,7 @@ static int fprobe_module_callback(struct notifier_block *nb,
> > do {
> > rhashtable_walk_start(&iter);
> >
> > - while ((node = rhashtable_walk_next(&iter)) && !IS_ERR(node))
> > + while (!IS_ERR_OR_NULL((node = rhashtable_walk_next(&iter))))
>
> Ug, No!
>
> That looks so much worse than the original.
Hmm, now IS_ERR_OR_NULL() is an inline function, so it is safe.
But if you want to use IS_ERR_OR_NULL() here, it will be better something like
node = rhashtable_walk_next(&iter);
while (!IS_ERR_OR_NULL(node)) {
fprobe_remove_node_in_module(mod, node, &alist);
node = rhashtable_walk_next(&iter);
}
Thanks,
>
> -- Steve
>
> > fprobe_remove_node_in_module(mod, node, &alist);
> >
> > rhashtable_walk_stop(&iter);
> > diff --git a/kernel/trace/kprobe_event_gen_test.c b/kernel/trace/kprobe_event_gen_test.c
> > index 5a4b722b50451bfdee42769a6d3be39c055690d1..a1735ca273f0b756aa1fcfcdab30ddad9bc51c5f 100644
> > --- a/kernel/trace/kprobe_event_gen_test.c
> > +++ b/kernel/trace/kprobe_event_gen_test.c
> > @@ -75,7 +75,7 @@ static struct trace_event_file *gen_kretprobe_test;
> >
> > static bool trace_event_file_is_valid(struct trace_event_file *input)
> > {
> > - return input && !IS_ERR(input);
> > + return !IS_ERR_OR_NULL(input);
> > }
> >
> > /*
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index 73ea180cad555898693e92ee397a1c9493c7c167..59df215e1dfd9349eca1c0823ed709ec7285f766 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -3973,7 +3973,7 @@ trace_action_create_field_var(struct hist_trigger_data *hist_data,
> > */
> > field_var = create_target_field_var(hist_data, system, event, var);
> >
> > - if (field_var && !IS_ERR(field_var)) {
> > + if (!IS_ERR_OR_NULL(field_var)) {
> > save_field_var(hist_data, field_var);
> > hist_field = field_var->var;
> > } else {
> >
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH 02/12] HID: Add BUS_SPI support and define HID_SPI_DEVICE macro
From: Dmitry Torokhov @ 2026-03-11 5:11 UTC (permalink / raw)
To: Jingyuan Liang
Cc: Jiri Kosina, Benjamin Tissoires, Jonathan Corbet, Mark Brown,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, linux-input, linux-doc,
linux-kernel, linux-spi, linux-trace-kernel, devicetree, hbarnor,
Jarrett Schultz, Dmitry Antipov
In-Reply-To: <20260303-send-upstream-v1-2-1515ba218f3d@chromium.org>
On Tue, Mar 03, 2026 at 06:12:54AM +0000, Jingyuan Liang wrote:
> From: Jarrett Schultz <jaschultz@microsoft.com>
>
> If connecting a hid_device with bus field indicating BUS_SPI print out
> "SPI" in the debug print.
>
> Macro sets the bus field to BUS_SPI and uses arguments to set vendor
> product fields.
>
> Signed-off-by: Dmitry Antipov <dmanti@microsoft.com>
> Signed-off-by: Jingyuan Liang <jingyliang@chromium.org>
> ---
> drivers/hid/hid-core.c | 3 +++
> include/linux/hid.h | 2 ++
> 2 files changed, 5 insertions(+)
>
> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
> index a5b3a8ca2fcb..813c9c743ccd 100644
> --- a/drivers/hid/hid-core.c
> +++ b/drivers/hid/hid-core.c
> @@ -2316,6 +2316,9 @@ int hid_connect(struct hid_device *hdev, unsigned int connect_mask)
> case BUS_I2C:
> bus = "I2C";
> break;
> + case BUS_SPI:
> + bus = "SPI";
> + break;
> case BUS_SDW:
> bus = "SOUNDWIRE";
> break;
> diff --git a/include/linux/hid.h b/include/linux/hid.h
> index dce862cafbbd..957f322a0ebd 100644
> --- a/include/linux/hid.h
> +++ b/include/linux/hid.h
> @@ -786,6 +786,8 @@ struct hid_descriptor {
> .bus = BUS_BLUETOOTH, .vendor = (ven), .product = (prod)
> #define HID_I2C_DEVICE(ven, prod) \
> .bus = BUS_I2C, .vendor = (ven), .product = (prod)
> +#define HID_SPI_DEVICE(ven, prod) \
> + .bus = BUS_SPI, .vendor = (ven), .product = (prod)
>
> #define HID_REPORT_ID(rep) \
> .report_type = (rep)
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [PATCH 01/12] Documentation: Correction in HID output_report callback description.
From: Dmitry Torokhov @ 2026-03-11 5:10 UTC (permalink / raw)
To: Jingyuan Liang
Cc: Jiri Kosina, Benjamin Tissoires, Jonathan Corbet, Mark Brown,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, linux-input, linux-doc,
linux-kernel, linux-spi, linux-trace-kernel, devicetree, hbarnor,
Jarrett Schultz, Dmitry Antipov
In-Reply-To: <20260303-send-upstream-v1-1-1515ba218f3d@chromium.org>
On Tue, Mar 03, 2026 at 06:12:53AM +0000, Jingyuan Liang wrote:
> From: Jarrett Schultz <jaschultz@microsoft.com>
>
> Originally output_report callback was described as must-be asynchronous,
> but that is not the case in some implementations, namely i2c-hid.
> Correct the documentation to say that it may be asynchronous.
>
> Signed-off-by: Dmitry Antipov <dmanti@microsoft.com>
> Signed-off-by: Jingyuan Liang <jingyliang@chromium.org>
> ---
> Documentation/hid/hid-transport.rst | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/hid/hid-transport.rst b/Documentation/hid/hid-transport.rst
> index 6f1692da296c..2008cf432af1 100644
> --- a/Documentation/hid/hid-transport.rst
> +++ b/Documentation/hid/hid-transport.rst
> @@ -327,8 +327,8 @@ The available HID callbacks are:
>
> Send raw output report via intr channel. Used by some HID device drivers
> which require high throughput for outgoing requests on the intr channel. This
> - must not cause SET_REPORT calls! This must be implemented as asynchronous
> - output report on the intr channel!
> + must not cause SET_REPORT calls! This call might be asynchronous, so the
> + caller should not expect an immediate response!
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Thanks.
--
Dmitry
^ permalink raw reply
* Re: [PATCH 57/61] reset: Prefer IS_ERR_OR_NULL over manual NULL check
From: Masami Hiramatsu @ 2026-03-11 4:59 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Philipp Zabel
In-Reply-To: <20260310-b4-is_err_or_null-v1-57-bd63b656022d@avm.de>
On Tue, 10 Mar 2026 12:49:23 +0100
Philipp Hahn <phahn-oss@avm.de> wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Semantich change: Previously the code only printed the warning on error,
> but not when the pointer was NULL. Now the warning is printed in both
> cases!
>
> Change found with coccinelle.
>
> To: Philipp Zabel <p.zabel@pengutronix.de>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
> drivers/reset/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/reset/core.c b/drivers/reset/core.c
> index fceec45c8afc1e74fe46311bdc023ff257e8d770..649bb4ebabb20a09349ccbfc62f8280621df450e 100644
> --- a/drivers/reset/core.c
> +++ b/drivers/reset/core.c
> @@ -715,7 +715,7 @@ EXPORT_SYMBOL_GPL(reset_control_bulk_acquire);
> */
> void reset_control_release(struct reset_control *rstc)
> {
> - if (!rstc || WARN_ON(IS_ERR(rstc)))
> + if (WARN_ON(IS_ERR_OR_NULL(rstc)))
This changes the behavior when rstc == NULL.
WARN_ON does not hit when rstc == NULL in the original code.
Thanks,
> return;
>
> if (reset_control_is_array(rstc))
>
> --
> 2.43.0
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* [PATCH] Remove unused headers in x86/tools, scripts, pps, input
From: Oli @ 2026-03-11 3:01 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Steven Rostedt
Cc: Mathieu Desnoyers, Masami Hiramatsu, Rodolfo Giometti,
Henrik Rydberg, Dmitry Torokhov, Nathan Chancellor,
Nicolas Schier, linux-kernel, linux-trace-kernel, linux-kbuild,
linux-input, x86
[-- Attachment #1.1: Type: text/plain, Size: 2304 bytes --]
From c78a0572f5ec2b927f9b723af687e6ef913561a4 Mon Sep 17 00:00:00 2001
From: Eddie Hudgins <Oochiolio@gmail.com>
Date: Tue, 10 Mar 2026 21:53:07 -0500
Subject: [PATCH] Signed-off-by: Eddie Hudgins <Oochiolio@gmail.com>
arch/x86/tools: Removed headers in relocs_32.c scripts/basic: Removed
headers
in fixdep.c drivers/pps: Removed headers in pps.c drivers/input: Removed
headers in input-mt.c
These changes compile for x86, x86_64, and powerpc (Those were the only
ones fairly tested) under defconfig. This aims to clean up code and
simplify the files for developers. This will also contribute to start of
decluttering the environment.
---
arch/x86/tools/relocs_32.c | 1 -
drivers/input/input-mt.c | 1 -
drivers/pps/pps.c | 3 ---
scripts/basic/fixdep.c | 1 -
4 files changed, 6 deletions(-)
diff --git a/arch/x86/tools/relocs_32.c b/arch/x86/tools/relocs_32.c
index 9442ff78be83..9e4668e74993 100644
--- a/arch/x86/tools/relocs_32.c
+++ b/arch/x86/tools/relocs_32.c
@@ -1,6 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
#include "relocs.h"
-
#define ELF_BITS 32
#define ELF_MACHINE EM_386
diff --git a/drivers/input/input-mt.c b/drivers/input/input-mt.c
index c06e98fbd77c..b553b7f2313a 100644
--- a/drivers/input/input-mt.c
+++ b/drivers/input/input-mt.c
@@ -7,7 +7,6 @@
#include <linux/input/mt.h>
#include <linux/export.h>
-#include <linux/slab.h>
#include "input-core-private.h"
#define TRKID_SGN ((TRKID_MAX + 1) >> 1)
diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
index c6b8b6478276..a9a8802c2399 100644
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -7,14 +7,11 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
-#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/idr.h>
#include <linux/mutex.h>
-#include <linux/cdev.h>
#include <linux/poll.h>
#include <linux/pps_kernel.h>
#include <linux/slab.h>
diff --git a/scripts/basic/fixdep.c b/scripts/basic/fixdep.c
index cdd5da7e009b..feb9e7d8984d 100644
--- a/scripts/basic/fixdep.c
+++ b/scripts/basic/fixdep.c
@@ -89,7 +89,6 @@
* but I don't think the added complexity is worth it)
*/
-#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
--
2.43.0
[-- Attachment #1.2: Type: text/html, Size: 2829 bytes --]
[-- Attachment #2: 0001-Signed-off-by-Eddie-Hudgins-Oochiolio-gmail.com.patch --]
[-- Type: application/octet-stream, Size: 2297 bytes --]
From c78a0572f5ec2b927f9b723af687e6ef913561a4 Mon Sep 17 00:00:00 2001
From: Eddie Hudgins <Oochiolio@gmail.com>
Date: Tue, 10 Mar 2026 21:53:07 -0500
Subject: [PATCH] Signed-off-by: Eddie Hudgins <Oochiolio@gmail.com>
arch/x86/tools: Removed headers in relocs_32.c scripts/basic: Removed headers
in fixdep.c drivers/pps: Removed headers in pps.c drivers/input: Removed
headers in input-mt.c
These changes compile for x86, x86_64, and powerpc (Those were the only ones fairly tested) under defconfig. This aims to clean up code and simplify the files for developers. This will also contribute to start of decluttering the environment.
---
arch/x86/tools/relocs_32.c | 1 -
drivers/input/input-mt.c | 1 -
drivers/pps/pps.c | 3 ---
scripts/basic/fixdep.c | 1 -
4 files changed, 6 deletions(-)
diff --git a/arch/x86/tools/relocs_32.c b/arch/x86/tools/relocs_32.c
index 9442ff78be83..9e4668e74993 100644
--- a/arch/x86/tools/relocs_32.c
+++ b/arch/x86/tools/relocs_32.c
@@ -1,6 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
#include "relocs.h"
-
#define ELF_BITS 32
#define ELF_MACHINE EM_386
diff --git a/drivers/input/input-mt.c b/drivers/input/input-mt.c
index c06e98fbd77c..b553b7f2313a 100644
--- a/drivers/input/input-mt.c
+++ b/drivers/input/input-mt.c
@@ -7,7 +7,6 @@
#include <linux/input/mt.h>
#include <linux/export.h>
-#include <linux/slab.h>
#include "input-core-private.h"
#define TRKID_SGN ((TRKID_MAX + 1) >> 1)
diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
index c6b8b6478276..a9a8802c2399 100644
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -7,14 +7,11 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
-#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/idr.h>
#include <linux/mutex.h>
-#include <linux/cdev.h>
#include <linux/poll.h>
#include <linux/pps_kernel.h>
#include <linux/slab.h>
diff --git a/scripts/basic/fixdep.c b/scripts/basic/fixdep.c
index cdd5da7e009b..feb9e7d8984d 100644
--- a/scripts/basic/fixdep.c
+++ b/scripts/basic/fixdep.c
@@ -89,7 +89,6 @@
* but I don't think the added complexity is worth it)
*/
-#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
--
2.43.0
^ permalink raw reply related
* Re: [net-next,1/2] mptcp: better mptcp-level RTT estimator
From: Jakub Kicinski @ 2026-03-11 2:45 UTC (permalink / raw)
To: matttbe
Cc: Jakub Kicinski, martineau, davem, netdev, pabeni, fw, horms,
edumazet, linux-kernel, mptcp, geliang, mhiramat,
linux-trace-kernel, rostedt, mathieu.desnoyers
In-Reply-To: <20260309-net-next-mptcp-reduce-rbuf-v1-1-8f471206f9c5@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
mptcp: better mptcp-level RTT estimator
The current MPTCP-level RTT estimator has several issues. On high speed
links, the MPTCP-level receive buffer auto-tuning happens with a
frequency well above the TCP-level's one. That in turn can cause
excessive/unneeded receive buffer increase.
On such links, the initial rtt_us value is considerably higher than the
actual delay, and the current mptcp_rcv_space_adjust() updates
msk->rcvq_space.rtt_us with a period equal to the such field previous
value. If the initial rtt_us is 40ms, its first update will happen after
40ms, even if the subflows see actual RTT orders of magnitude lower.
Additionally:
- setting the msk RTT to the maximum among all the subflows RTTs makes
DRS constantly overshooting the rcvbuf size when a subflow has
considerable higher latency than the other(s).
- during unidirectional bulk transfers with multiple active subflows,
the TCP-level RTT estimator occasionally sees considerably higher
value than the real link delay, i.e. when the packet scheduler reacts
to an incoming ACK on given subflow pushing data on a different
subflow.
- currently inactive but still open subflows (i.e. switched to backup
mode) are always considered when computing the msk-level RTT.
Address the all the issues above with a more accurate RTT estimation
strategy: the MPTCP-level RTT is set to the minimum of all the subflows
actually feeding data into the MPTCP receive buffer, using a small
sliding window.
> While at it, also use EWMA to compute the msk-level scaling_ratio, to
> that MPTCP can avoid traversing the subflow list is
> mptcp_rcv_space_adjust().
This isn't a bug, but there are two typos here that make the sentence
hard to parse. Should this be 'so that MPTCP can avoid traversing the
subflow list in mptcp_rcv_space_adjust()'?
Use some care to avoid updating msk and ssk level fields too often.
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
[ ... ]
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 0bd1ee860316..6ec65c0ae655 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
[ ... ]
> @@ -422,6 +435,27 @@ static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
> return msk->first_pending;
> }
>
> +static inline void mptcp_init_rtt_est(struct mptcp_sock *msk)
> +{
> + int i;
> +
> + for (i = 0; i < MPTCP_RTT_SAMPLES; ++i)
> + msk->rcv_rtt_est.samples[i] = U32_MAX;
> + msk->rcv_rtt_est.next_sample = 0;
> + msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
> +}
> +
> +static inline u32 mptcp_rtt_us_est(const struct mptcp_sock *msk)
> +{
> + u32 rtt_us = msk->rcv_rtt_est.samples[0];
^^^^
Is samples[0] missing a READ_ONCE()?
The function comment says 'Lockless access of collected samples' and the
writer mptcp_rcv_rtt_update() uses WRITE_ONCE() for all sample indices
(including index 0) with a comment 'Pairs with READ_ONCE() in
mptcp_rtt_us_est()':
net/mptcp/protocol.c:mptcp_rcv_rtt_update() {
...
/* Pairs with READ_ONCE() in mptcp_rtt_us_est(). */
id = msk->rcv_rtt_est.next_sample;
WRITE_ONCE(msk->rcv_rtt_est.samples[id], rtt_us);
...
}
However, samples[0] is read with a plain load while samples[1..4] correctly
use READ_ONCE():
> + int i;
> +
> + /* Lockless access of collected samples. */
> + for (i = 1; i < MPTCP_RTT_SAMPLES; ++i)
> + rtt_us = min(rtt_us, READ_ONCE(msk->rcv_rtt_est.samples[i]));
> + return rtt_us;
> +}
The writer runs under mptcp_data_lock (spin_lock_bh) while the reader runs
under the socket lock (msk_owned_by_me) -- these are different locks, so
WRITE_ONCE/READ_ONCE pairing is required. KCSAN would flag this as a data
race.
> +
> static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk)
> {
> struct mptcp_sock *msk = mptcp_sk(sk);
[ ... ]
> @@ -523,6 +557,8 @@ struct mptcp_subflow_context {
> u32 map_data_len;
> __wsum map_data_csum;
> u32 map_csum_len;
> + u32 prev_rtt_us;
^^^^
Is this field used anywhere?
The field 'prev_rtt_us' is added to struct mptcp_subflow_context but doesn't
appear to be referenced anywhere in the codebase. Only 'prev_rtt_seq' is used
in mptcp_rcv_rtt_update().
> + u32 prev_rtt_seq;
> u32 request_mptcp : 1, /* send MP_CAPABLE */
> request_join : 1, /* send MP_JOIN */
> request_bkup : 1,
--
pw-bot: cr
^ permalink raw reply
* Re: [PATCH v2 0/3] powerpc64/ftrace: fixes clang build issues
From: Madhavan Srinivasan @ 2026-03-11 2:13 UTC (permalink / raw)
To: linuxppc-dev, Hari Bathini
Cc: Naveen N. Rao, Michael Ellerman, linux-trace-kernel, Mark Rutland,
Steven Rostedt, Masami Hiramatsu, llvm, Maryam Moghadas,
Christophe Leroy
In-Reply-To: <20260127084926.34497-1-hbathini@linux.ibm.com>
On Tue, 27 Jan 2026 14:19:23 +0530, Hari Bathini wrote:
> Support for -fpatchable-function-entry on ppc64le was added in Clang
> with [1]. Faced a couple of issues while building the linux kernel
> with Clang though. Patches 1 & 2 address those issues. The last patch
> provides workaround to ensure, an open issue [2] in Clang with
> -fpatchable-function-entry support, does not impact the linux
> ftrace subsystem.
>
> [...]
Applied to powerpc/fixes.
[1/3] powerpc64: make clang cross-build friendly
https://git.kernel.org/powerpc/c/73cdf24e81e4eba52a40a6b10c6cf285d0ac23fd
[2/3] powerpc64/ftrace: fix OOL stub count with clang
https://git.kernel.org/powerpc/c/875612a7745013a43c67493cb0583ee3f7476344
[3/3] powerpc64/ftrace: workaround clang recording GEP in __patchable_function_entries
https://git.kernel.org/powerpc/c/db54c28702f7270e74dce36c84cb0db4cec96389
cheers
^ permalink raw reply
* Re: [PATCH 56/61] clk: Prefer IS_ERR_OR_NULL over manual NULL check
From: Chen-Yu Tsai @ 2026-03-11 2:07 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Michael Turquette, Stephen Boyd,
Daniel Lezcano, Thomas Gleixner
In-Reply-To: <20260310-b4-is_err_or_null-v1-56-bd63b656022d@avm.de>
On Tue, Mar 10, 2026 at 9:57 PM Philipp Hahn <phahn-oss@avm.de> wrote:
>
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Semantich change: Previously the code only printed the warning on error,
> but not when the pointer was NULL. Now the warning is printed in both
> cases!
>
> Change found with coccinelle.
>
> To: Michael Turquette <mturquette@baylibre.com>
> To: Stephen Boyd <sboyd@kernel.org>
> To: Daniel Lezcano <daniel.lezcano@kernel.org>
> To: Thomas Gleixner <tglx@kernel.org>
> Cc: linux-clk@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
> drivers/clk/clk.c | 4 ++--
> drivers/clocksource/timer-pxa.c | 2 +-
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 47093cda9df32223c1120c3710261296027c4cd3..35146e3869a7dd93741d10b7223d4488a9216ed1 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -4558,7 +4558,7 @@ void clk_unregister(struct clk *clk)
> unsigned long flags;
> const struct clk_ops *ops;
>
> - if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
> + if (WARN_ON_ONCE(IS_ERR_OR_NULL(clk)))
> return;
>
> clk_debug_unregister(clk->core);
> @@ -4744,7 +4744,7 @@ void __clk_put(struct clk *clk)
> {
> struct module *owner;
>
> - if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
> + if (WARN_ON_ONCE(IS_ERR_OR_NULL(clk)))
clk_get_optional() returns NULL if the clk isn't present.
Drivers would just pass this to clk_put(). Your change here would cause
this pattern to emit a very big warning.
I don't think this change should be landed.
ChenYu
> return;
>
> clk_prepare_lock();
> diff --git a/drivers/clocksource/timer-pxa.c b/drivers/clocksource/timer-pxa.c
> index 7ad0e5adb2ffac4125c34710fc67f4b45f30331d..f65fb0b7fc318b766227e5e7a4c0fb08ba11c8f9 100644
> --- a/drivers/clocksource/timer-pxa.c
> +++ b/drivers/clocksource/timer-pxa.c
> @@ -218,7 +218,7 @@ void __init pxa_timer_nodt_init(int irq, void __iomem *base)
>
> timer_base = base;
> clk = clk_get(NULL, "OSTIMER0");
> - if (clk && !IS_ERR(clk)) {
> + if (!IS_ERR_OR_NULL(clk)) {
> clk_prepare_enable(clk);
> pxa_timer_common_init(irq, clk_get_rate(clk));
> } else {
>
> --
> 2.43.0
>
>
^ permalink raw reply
* [PATCH v9 4/4] ring-buffer: Add persistent ring buffer selftest
From: Masami Hiramatsu (Google) @ 2026-03-11 1:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177319273059.130641.10882692460536780093.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a self-destractive test for the persistent ring buffer. This
will invalidate some sub-buffer pages in the persistent ring buffer
when kernel gets panic, and check whether the number of detected
invalid pages is the same as record after reboot.
This can ensure the kernel correctly recover partially corrupted
persistent ring buffer when boot.
The test only runs on the persistent ring buffer whose name is
"ptracingtest". And user has to fill it up with events before
kernel panics.
To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
and you have to setup the kernel cmdline;
reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
panic=1
And run following commands after the 1st boot;
cd /sys/kernel/tracing/instances/ptracingtest
echo 1 > tracing_on
echo 1 > events/enable
sleep 3
echo c > /proc/sysrq-trigger
After panic message, the kernel will reboot and run the verification
on the persistent ring buffer, e.g.
Ring buffer meta [1] invalid buffer page detected
Ring buffer meta [1] is from previous boot! (318 pages discarded)
Ring buffer testing [1]: PASSED (318/318)
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v9:
- Test also reader pages.
---
include/linux/ring_buffer.h | 1 +
kernel/trace/Kconfig | 15 +++++++++++++
kernel/trace/ring_buffer.c | 49 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.c | 4 ++++
4 files changed, 69 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 876358cfe1b1..927b6e8587cb 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -238,6 +238,7 @@ int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
+ RB_FL_TESTING = 1 << 1,
};
#ifdef CONFIG_RING_BUFFER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 49de13cae428..2e6f3b7c6a31 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -1202,6 +1202,21 @@ config RING_BUFFER_VALIDATE_TIME_DELTAS
Only say Y if you understand what this does, and you
still want it enabled. Otherwise say N
+config RING_BUFFER_PERSISTENT_SELFTEST
+ bool "Enable persistent ring buffer selftest"
+ depends on RING_BUFFER
+ help
+ Run a selftest on the persistent ring buffer which names
+ "ptracingtest" (and its backup) when panic_on_reboot by
+ invalidating ring buffer pages.
+ Note that user has to enable events on the persistent ring
+ buffer manually to fill up ring buffers before rebooting.
+ Since this invalidates the data on test target ring buffer,
+ "ptracingtest" persistent ring buffer must not be used for
+ actual tracing, but only for testing.
+
+ If unsure, say N
+
config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 91b3f18d707b..bfe213c89b43 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -63,6 +63,7 @@ struct ring_buffer_cpu_meta {
unsigned long commit_buffer;
__u32 subbuf_size;
__u32 nr_subbufs;
+ __u32 nr_invalid;
int buffers[];
};
@@ -2096,6 +2097,11 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
cpu_buffer->cpu, discarded);
+ if (meta->nr_invalid)
+ pr_info("Ring buffer testing [%d]: %s (%d/%d)\n",
+ cpu_buffer->cpu,
+ (discarded == meta->nr_invalid) ? "PASSED" : "FAILED",
+ discarded, meta->nr_invalid);
return;
invalid:
@@ -2498,12 +2504,55 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+static void rb_test_inject_invalid_pages(struct trace_buffer *buffer)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct ring_buffer_cpu_meta *meta;
+ struct buffer_data_page *dpage;
+ unsigned long ptr;
+ int subbuf_size;
+ int invalid = 0;
+ int cpu;
+ int i;
+
+ if (!(buffer->flags & RB_FL_TESTING))
+ return;
+
+ guard(preempt)();
+ cpu = smp_processor_id();
+
+ cpu_buffer = buffer->buffers[cpu];
+ meta = cpu_buffer->ring_meta;
+ ptr = (unsigned long)rb_subbufs_from_meta(meta);
+ subbuf_size = meta->subbuf_size;
+
+ /* Invalidate even pages. */
+ for (i = 0; i < meta->nr_subbufs; i += 2) {
+ int idx = meta->buffers[i];
+
+ dpage = (void *)(ptr + idx * subbuf_size);
+ /* Skip unused pages */
+ if (!local_read(&dpage->commit))
+ continue;
+ local_add(subbuf_size + 1, &dpage->commit);
+ invalid++;
+ }
+
+ pr_info("Inject invalidated %d pages on CPU%d\n", invalid, cpu);
+ meta->nr_invalid = invalid;
+}
+#else /* !CONFIG_RING_BUFFER_PERSISTENT_SELFTEST */
+#define rb_test_inject_invalid_pages(buffer) do { } while (0)
+#endif
+
/* Stop recording on a persistent buffer and flush cache if needed. */
static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
{
struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
ring_buffer_record_off(buffer);
+ rb_test_inject_invalid_pages(buffer);
arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
return NOTIFY_DONE;
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 23de3719f495..eccc1ff22f71 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9336,6 +9336,8 @@ static void setup_trace_scratch(struct trace_array *tr,
memset(tscratch, 0, size);
}
+#define TRACE_TEST_PTRACING_NAME "ptracingtest"
+
static int
allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size)
{
@@ -9348,6 +9350,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, int size
buf->tr = tr;
if (tr->range_addr_start && tr->range_addr_size) {
+ if (!strcmp(tr->name, TRACE_TEST_PTRACING_NAME))
+ rb_flags |= RB_FL_TESTING;
/* Add scratch buffer to handle 128 modules */
buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
tr->range_addr_start,
^ permalink raw reply related
* [PATCH v9 3/4] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-11 1:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177319273059.130641.10882692460536780093.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).
If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffersa that ar found to be corrupted.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v9:
- Add meta->subbuf_size check.
- Fix a typo.
- Handle invalid reader_page case.
Changes in v8:
- Add comment in rb_valudate_buffer()
- Clear the RB_MISSED_* flags in rb_valudate_buffer() instead of
skipping subbuf.
- Remove unused subbuf local variable from rb_cpu_meta_valid().
Changes in v7:
- Combined with Handling RB_MISSED_* flags patch, focus on validation at boot.
- Remove checking subbuffer data when validating metadata, because it should be done
later.
- Do not mark the discarded sub buffer page but just reset it.
Changes in v6:
- Show invalid page detection message once per CPU.
Changes in v5:
- Instead of showing errors for each page, just show the number
of discarded pages at last.
Changes in v3:
- Record missed data event on commit.
---
kernel/trace/ring_buffer.c | 98 ++++++++++++++++++++++++++------------------
1 file changed, 58 insertions(+), 40 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9f4ee9e3803d..91b3f18d707b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
return local_read(&bpage->page->commit);
}
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+ return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
static void free_buffer_page(struct buffer_page *bpage)
{
/* Range pages are not to be freed */
@@ -1791,7 +1797,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
unsigned long *subbuf_mask)
{
int subbuf_size = PAGE_SIZE;
- struct buffer_data_page *subbuf;
unsigned long buffers_start;
unsigned long buffers_end;
int i;
@@ -1799,6 +1804,11 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
if (!subbuf_mask)
return false;
+ if (meta->subbuf_size != PAGE_SIZE) {
+ pr_info("Ring buffer boot meta [%d] invalid subbuf_size\n", cpu);
+ return false;
+ }
+
buffers_start = meta->first_buffer;
buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
@@ -1815,11 +1825,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- subbuf = rb_subbufs_from_meta(meta);
-
bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
- /* Is the meta buffers and the subbufs themselves have correct data? */
+ /*
+ * Ensure the meta::buffers array has correct data. The data in each subbufs
+ * are checked later in rb_meta_validate_events().
+ */
for (i = 0; i < meta->nr_subbufs; i++) {
if (meta->buffers[i] < 0 ||
meta->buffers[i] >= meta->nr_subbufs) {
@@ -1827,18 +1838,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
- pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
- return false;
- }
-
if (test_bit(meta->buffers[i], subbuf_mask)) {
pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
return false;
}
set_bit(meta->buffers[i], subbuf_mask);
- subbuf = (void *)subbuf + subbuf_size;
}
return true;
@@ -1902,13 +1907,22 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+ struct ring_buffer_cpu_meta *meta)
{
unsigned long long ts;
+ unsigned long tail;
u64 delta;
- int tail;
- tail = local_read(&dpage->commit);
+ /*
+ * When a sub-buffer is recovered from a read, the commit value may
+ * have RB_MISSED_* bits set, as these bits are reset on reuse.
+ * Even after clearing these bits, a commit value greater than the
+ * subbuf_size is considered invalid.
+ */
+ tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
+ if (tail > meta->subbuf_size)
+ return -1;
return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
}
@@ -1919,6 +1933,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
struct buffer_page *head_page, *orig_head;
unsigned long entry_bytes = 0;
unsigned long entries = 0;
+ int discarded = 0;
int ret;
u64 ts;
int i;
@@ -1929,14 +1944,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
/* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer reader page is invalid\n");
- goto invalid;
+ pr_info("Ring buffer meta [%d] invalid reader page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&cpu_buffer->reader_page->entries, 0);
+ local_set(&cpu_buffer->reader_page->page->commit, 0);
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(cpu_buffer->reader_page);
+ local_set(&cpu_buffer->reader_page->entries, ret);
}
- entries += ret;
- entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
- local_set(&cpu_buffer->reader_page->entries, ret);
ts = head_page->page->time_stamp;
@@ -1964,7 +1984,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
break;
/* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0)
break;
@@ -2043,21 +2063,24 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer meta [%d] invalid buffer page\n",
- cpu_buffer->cpu);
- goto invalid;
- }
-
- /* If the buffer has content, update pages_touched */
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
-
- entries += ret;
- entry_bytes += local_read(&head_page->page->commit);
- local_set(&head_page->entries, ret);
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&head_page->entries, 0);
+ local_set(&head_page->page->commit, 0);
+ } else {
+ /* If the buffer has content, update pages_touched */
+ if (ret)
+ local_inc(&cpu_buffer->pages_touched);
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ local_set(&head_page->entries, ret);
+ }
if (head_page == cpu_buffer->commit_page)
break;
}
@@ -2071,7 +2094,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->entries, entries);
local_set(&cpu_buffer->entries_bytes, entry_bytes);
- pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+ pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
+ cpu_buffer->cpu, discarded);
return;
invalid:
@@ -3258,12 +3282,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
return NULL;
}
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
- return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
static __always_inline unsigned
rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
{
^ permalink raw reply related
* [PATCH v9 2/4] ring-buffer: Flush and stop persistent ring buffer on panic
From: Masami Hiramatsu (Google) @ 2026-03-11 1:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177319273059.130641.10882692460536780093.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v9:
- Fix typo of & to &&.
- Fix typo of "Generic"
Changes in v6:
- Introduce asm/ring_buffer.h for arch_ring_buffer_flush_range().
- Use flush_cache_vmap() instead of flush_cache_all().
Changes in v5:
- Use ring_buffer_record_off() instead of ring_buffer_record_disable().
- Use flush_cache_all() to ensure flush all cache.
Changes in v3:
- update patch description.
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/Kbuild | 1 +
arch/arm64/include/asm/ring_buffer.h | 10 ++++++++++
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/riscv/include/asm/Kbuild | 1 +
arch/s390/include/asm/Kbuild | 1 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 1 +
arch/x86/include/asm/Kbuild | 1 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/ring_buffer.h | 13 +++++++++++++
kernel/trace/ring_buffer.c | 22 ++++++++++++++++++++++
23 files changed, 65 insertions(+)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 483965c5a4de..b154b4e3dfa8 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += agp.h
generic-y += asm-offsets.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4c69522e0328..483caacc6988 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -5,5 +5,6 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3..decad5f2c826 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -3,6 +3,7 @@ generic-y += early_ioremap.h
generic-y += extable.h
generic-y += flat.h
generic-y += parport.h
+generic-y += ring_buffer.h
generated-y += mach-types.h
generated-y += unistd-nr.h
diff --git a/arch/arm64/include/asm/ring_buffer.h b/arch/arm64/include/asm/ring_buffer.h
new file mode 100644
index 000000000000..62316c406888
--- /dev/null
+++ b/arch/arm64/include/asm/ring_buffer.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_ARM64_RING_BUFFER_H
+#define _ASM_ARM64_RING_BUFFER_H
+
+#include <asm/cacheflush.h>
+
+/* Flush D-cache on persistent ring buffer */
+#define arch_ring_buffer_flush_range(start, end) dcache_clean_pop(start, end)
+
+#endif /* _ASM_ARM64_RING_BUFFER_H */
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 3a5c7f6e5aac..7dca0c6cdc84 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 1efa1e993d4b..0f887d4238ed 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 9034b583a88a..7e92957baf6a 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -10,5 +10,6 @@ generic-y += qrwlock.h
generic-y += user.h
generic-y += ioctl.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
generic-y += statfs.h
generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index b282e0dd8dc1..62543bf305ff 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,6 @@ generated-y += syscall_table.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 7178f990e8b3..0030309b47ad 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += syscalls.h
generic-y += tlb.h
generic-y += user.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 684569b2ecd6..9771c3d85074 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,5 +12,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 28004301c236..0a2530964413 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += cmpxchg.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..8aa34621702d 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -8,4 +8,5 @@ generic-y += spinlock_types.h
generic-y += spinlock.h
generic-y += qrwlock_types.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c89..d48d158f7241 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2e23533b67e3..805b5aeebb6f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,4 +5,5 @@ generated-y += syscall_table_spu.h
generic-y += agp.h
generic-y += mcs_spinlock.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += early_ioremap.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index bd5fc9403295..7721b63642f4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -14,5 +14,6 @@ generic-y += ticket_spinlock.h
generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 80bad7de7a04..0c1fc47c3ba0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 4d3f10ed8275..f0403d3ee8ab 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,4 +3,5 @@ generated-y += syscall_table.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 17ee8a273aa6..49c6bb326b75 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..2a1629ba8140 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += module.lds.h
generic-y += parport.h
generic-y += percpu.h
generic-y += preempt.h
+generic-y += ring_buffer.h
generic-y += runtime-const.h
generic-y += softirq_stack.h
generic-y += switch_to.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 4566000e15c4..078fd2c0d69d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -14,3 +14,4 @@ generic-y += early_ioremap.h
generic-y += fprobe.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 13fe45dea296..e57af619263a 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -6,5 +6,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/include/asm-generic/ring_buffer.h b/include/asm-generic/ring_buffer.h
new file mode 100644
index 000000000000..930d96571f23
--- /dev/null
+++ b/include/asm-generic/ring_buffer.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic arch dependent ring_buffer macros.
+ */
+#ifndef __ASM_GENERIC_RING_BUFFER_H__
+#define __ASM_GENERIC_RING_BUFFER_H__
+
+#include <linux/cacheflush.h>
+
+/* Flush cache on ring buffer range if needed */
+#define arch_ring_buffer_flush_range(start, end) flush_cache_vmap(start, end)
+
+#endif /* __ASM_GENERIC_RING_BUFFER_H__ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 353a5aa1b612..9f4ee9e3803d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6,6 +6,7 @@
*/
#include <linux/sched/isolation.h>
#include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
#include <linux/trace_events.h>
#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>
@@ -30,6 +31,7 @@
#include <linux/oom.h>
#include <linux/mm.h>
+#include <asm/ring_buffer.h>
#include <asm/local64.h>
#include <asm/local.h>
#include <asm/setup.h>
@@ -589,6 +591,7 @@ struct trace_buffer {
unsigned long range_addr_start;
unsigned long range_addr_end;
+ struct notifier_block flush_nb;
struct ring_buffer_meta *meta;
@@ -2471,6 +2474,16 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+/* Stop recording on a persistent buffer and flush cache if needed. */
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+ struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+ ring_buffer_record_off(buffer);
+ arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
+ return NOTIFY_DONE;
+}
+
static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
int order, unsigned long start,
unsigned long end,
@@ -2590,6 +2603,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
mutex_init(&buffer->mutex);
+ /* Persistent ring buffer needs to flush cache before reboot. */
+ if (start && end) {
+ buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+ atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+ }
+
return_ptr(buffer);
fail_free_buffers:
@@ -2677,6 +2696,9 @@ ring_buffer_free(struct trace_buffer *buffer)
{
int cpu;
+ if (buffer->range_addr_start && buffer->range_addr_end)
+ atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
irq_work_sync(&buffer->irq_work.work);
^ permalink raw reply related
* [PATCH v9 1/4] ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-11 1:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177319273059.130641.10882692460536780093.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since the validation loop in rb_meta_validate_events() updates
the same cpu_buffer->head_page->entries, the other subbuf entries
are not updated.
Fix to use head_page to update the entries field, since it is the
cursor in this loop.
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
kernel/trace/ring_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f16f053ef77d..353a5aa1b612 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2053,7 +2053,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
entries += ret;
entry_bytes += local_read(&head_page->page->commit);
- local_set(&cpu_buffer->head_page->entries, ret);
+ local_set(&head_page->entries, ret);
if (head_page == cpu_buffer->commit_page)
break;
^ permalink raw reply related
* [PATCH v9 0/4] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-03-11 1:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
Hi,
Here is the 9th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:
https://lore.kernel.org/all/177303264034.767813.5345788067082238396.stgit@mhiramat.tok.corp.google.com/
In this version, I fixed bugs/typos in [2/4][3/4] and add a bugfix patch
[1/4] and a test[4/4]. Also, add a meta->subbuf_size validation[3/4].
Thank you,
---
Masami Hiramatsu (Google) (4):
ring-buffer: Fix to update per-subbuf entries of persistent ring buffer
ring-buffer: Flush and stop persistent ring buffer on panic
ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
ring-buffer: Add persistent ring buffer selftest
arch/alpha/include/asm/Kbuild | 1
arch/arc/include/asm/Kbuild | 1
arch/arm/include/asm/Kbuild | 1
arch/arm64/include/asm/ring_buffer.h | 10 ++
arch/csky/include/asm/Kbuild | 1
arch/hexagon/include/asm/Kbuild | 1
arch/loongarch/include/asm/Kbuild | 1
arch/m68k/include/asm/Kbuild | 1
arch/microblaze/include/asm/Kbuild | 1
arch/mips/include/asm/Kbuild | 1
arch/nios2/include/asm/Kbuild | 1
arch/openrisc/include/asm/Kbuild | 1
arch/parisc/include/asm/Kbuild | 1
arch/powerpc/include/asm/Kbuild | 1
arch/riscv/include/asm/Kbuild | 1
arch/s390/include/asm/Kbuild | 1
arch/sh/include/asm/Kbuild | 1
arch/sparc/include/asm/Kbuild | 1
arch/um/include/asm/Kbuild | 1
arch/x86/include/asm/Kbuild | 1
arch/xtensa/include/asm/Kbuild | 1
include/asm-generic/ring_buffer.h | 13 +++
include/linux/ring_buffer.h | 1
kernel/trace/Kconfig | 15 +++
kernel/trace/ring_buffer.c | 169 ++++++++++++++++++++++++++--------
kernel/trace/trace.c | 4 +
26 files changed, 192 insertions(+), 40 deletions(-)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH 38/61] net: Prefer IS_ERR_OR_NULL over manual NULL check
From: Russell King (Oracle) @ 2026-03-11 0:16 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Igor Russkikh, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Pavan Chebbi, Michael Chan, Potnuri Bharat Teja, Tony Nguyen,
Przemek Kitszel, Taras Chornyi, Maxime Coquelin, Alexandre Torgue,
Iyappan Subramanian, Keyur Chudgar, Quan Nguyen, Heiner Kallweit
In-Reply-To: <20260310-b4-is_err_or_null-v1-38-bd63b656022d@avm.de>
On Tue, Mar 10, 2026 at 12:49:04PM +0100, Philipp Hahn wrote:
> diff --git a/drivers/net/mdio/mdio-xgene.c b/drivers/net/mdio/mdio-xgene.c
> index a8f91a4b7fed0927ee14e408000cd3a2bfb9b09a..09b30b563295c6085dc1358ac361301e5cf6b2a8 100644
> --- a/drivers/net/mdio/mdio-xgene.c
> +++ b/drivers/net/mdio/mdio-xgene.c
> @@ -265,7 +265,7 @@ struct phy_device *xgene_enet_phy_register(struct mii_bus *bus, int phy_addr)
> struct phy_device *phy_dev;
>
> phy_dev = get_phy_device(bus, phy_addr, false);
> - if (!phy_dev || IS_ERR(phy_dev))
> + if (IS_ERR_OR_NULL(phy_dev))
As noted in reply to your cover message, the check for NULL here is
incorrect - get_phy_device() returns either a valid pointer or an
error pointer, but never NULL.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* Re: [PATCH 00/61] treewide: Use IS_ERR_OR_NULL over manual NULL check - refactor
From: Russell King (Oracle) @ 2026-03-11 0:09 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs, Julia Lawall, Nicolas Palix, Chris Mason,
David Sterba, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
Theodore Ts'o, Andreas Dilger, Steve French, Paulo Alcantara,
Ronnie Sahlberg, Shyam Prasad N, Tom Talpey, Bharath SM,
Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
Christian Schoenebeck, Gao Xiang, Chao Yu, Yue Hu, Jeffle Xu,
Sandeep Dhavale, Hongbo Li, Chunhai Guo, Miklos Szeredi,
Konstantin Komarov, Andreas Gruenbacher, Kees Cook, Tony Luck,
Guilherme G. Piccoli, Jan Kara, Phillip Lougher, Alexander Viro,
Christian Brauner, Jan Kara, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Tejun Heo, David Vernet, Andrea Righi,
Changwoo Min, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, Luis Chamberlain, Petr Pavlu, Daniel Gomez,
Sami Tolvanen, Aaron Tomlin, Sylwester Nawrocki, Liam Girdwood,
Mark Brown, Jaroslav Kysela, Takashi Iwai, Max Filippov,
Paolo Bonzini, John Johansen, Paul Moore, James Morris,
Serge E. Hallyn, Andrew Morton, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Benjamin Marzinski, David S. Miller, David Ahern,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Jamal Hadi Salim, Jiri Pirko,
Marcelo Ricardo Leitner, Xin Long, Trond Myklebust,
Anna Schumaker, Chuck Lever, Jeff Layton, NeilBrown,
Olga Kornievskaia, Dai Ngo, Jon Maloy, Johannes Berg,
Catalin Marinas, John Crispin, Thomas Bogendoerfer,
Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Zhenyu Wang,
Zhi Wang, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
Tvrtko Ursulin, Alex Deucher, Christian König, Sandy Huang,
Heiko Stübner, Andy Yan, Igor Russkikh, Andrew Lunn,
Pavan Chebbi, Michael Chan, Potnuri Bharat Teja, Tony Nguyen,
Przemek Kitszel, Taras Chornyi, Maxime Coquelin, Alexandre Torgue,
Iyappan Subramanian, Keyur Chudgar, Quan Nguyen, Heiner Kallweit,
Marc Zyngier, Thomas Gleixner, Andrew Lunn, Gregory Clement,
Sebastian Hesselbarth, Vinod Koul, Linus Walleij, Ulf Hansson,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Martin K. Petersen,
Eduardo Valentin, Keerthy, Rafael J. Wysocki, Daniel Lezcano,
Zhang Rui, Lukasz Luba, Alex Williamson, Mark Greer,
Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
Shuah Khan, Kieran Bingham, Mauro Carvalho Chehab, Joerg Roedel,
Will Deacon, Robin Murphy, Lee Jones, Pavel Machek, Dave Penkler,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Justin Sanders, Jens Axboe, Georgi Djakov, Michael Turquette,
Stephen Boyd, Philipp Zabel, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Pali Rohár, Dmitry Torokhov
In-Reply-To: <20260310-b4-is_err_or_null-v1-0-bd63b656022d@avm.de>
On Tue, Mar 10, 2026 at 12:48:26PM +0100, Philipp Hahn wrote:
> While doing some static code analysis I stumbled over a common pattern,
> where IS_ERR() is combined with a NULL check. For that there is
> IS_ERR_OR_NULL().
One thing you need to check for each of these cases is whether the tests
are actually correct.
There are certainly cases amongst those that you have identified where
the check for NULL is redundant.
For example, get_phy_device() never returns NULL, yet in your netdev
patch, you have at least one instance where the return value of
get_phy_device() is checked for NULL and IS_ERR() which you then
turn into IS_ERR_OR_NULL(). Instead, the NULL check should be dropped
(as a separate patch.)
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* Re: [PATCH v2 2/2] tree-wide: rename do_exit() to task_exit()
From: Steven Rostedt @ 2026-03-11 0:02 UTC (permalink / raw)
To: Christian Brauner
Cc: Linus Torvalds, linux-kernel, linux-modules, linux-nfs, bpf,
kunit-dev, linux-doc, linux-trace-kernel, netfs, io-uring, audit,
rcu, kvm, virtualization, netdev, linux-mm, linux-security-module,
Christian Loehle, linux-fsdevel
In-Reply-To: <20260310-work-kernel-exit-v2-2-30711759d87b@kernel.org>
On Tue, 10 Mar 2026 15:56:10 +0100
Christian Brauner <brauner@kernel.org> wrote:
> diff --git a/tools/testing/selftests/bpf/progs/tracing_failure.c b/tools/testing/selftests/bpf/progs/tracing_failure.c
> index 65e485c4468c..5144f4cc5787 100644
> --- a/tools/testing/selftests/bpf/progs/tracing_failure.c
> +++ b/tools/testing/selftests/bpf/progs/tracing_failure.c
> @@ -25,7 +25,7 @@ int BPF_PROG(tracing_deny)
> return 0;
> }
>
> -SEC("?fexit/do_exit")
> +SEC("?fexit/task_exit")
> int BPF_PROG(fexit_noreturns)
> {
> return 0;
> diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
> index fee479295e2f..7e00d8ecd110 100644
> --- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
> +++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
> @@ -82,7 +82,7 @@ check_error 'f vfs_read arg1=^' # NO_ARG_BODY
> # multiprobe errors
> if grep -q "Create/append/" README && grep -q "imm-value" README; then
> echo "f:fprobes/testevent $FUNCTION_FORK" > dynamic_events
> -check_error '^f:fprobes/testevent do_exit%return' # DIFF_PROBE_TYPE
> +check_error '^f:fprobes/testevent task_exit%return' # DIFF_PROBE_TYPE
>
> # Explicitly use printf "%s" to not interpret \1
> printf "%s" "f:fprobes/testevent $FUNCTION_FORK abcd=\\1" > dynamic_events
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc
> index f0d5b7777ed7..a95e3824690a 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc
> @@ -5,7 +5,7 @@
>
> # Choose 2 symbols for target
> SYM1=$FUNCTION_FORK
> -SYM2=do_exit
> +SYM2=task_exit
> EVENT_NAME=kprobes/testevent
>
> DEF1="p:$EVENT_NAME $SYM1"
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
> index 8f1c58f0c239..b55ea3c05cfa 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
> @@ -87,7 +87,7 @@ esac
> # multiprobe errors
> if grep -q "Create/append/" README && grep -q "imm-value" README; then
> echo "p:kprobes/testevent $FUNCTION_FORK" > kprobe_events
> -check_error '^r:kprobes/testevent do_exit' # DIFF_PROBE_TYPE
> +check_error '^r:kprobes/testevent task_exit' # DIFF_PROBE_TYPE
>
> # Explicitly use printf "%s" to not interpret \1
> printf "%s" "p:kprobes/testevent $FUNCTION_FORK abcd=\\1" > kprobe_events
These tests need to pass on old kernels too. So we can't just do a
"s/do_exit/task_exit/" conversion. It needs to test for task_exit first,
and if not found, fallback to do_exit.
See how we handled the _do_fork() > kernel_clone() rename:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/ftrace/test.d/functions#n182
-- Steve
^ permalink raw reply
* Re: [PATCH v2 2/2] tree-wide: rename do_exit() to task_exit()
From: Frederic Weisbecker @ 2026-03-10 22:30 UTC (permalink / raw)
To: Christian Brauner
Cc: Linus Torvalds, linux-kernel, linux-modules, linux-nfs, bpf,
kunit-dev, linux-doc, linux-trace-kernel, netfs, io-uring, audit,
rcu, kvm, virtualization, netdev, linux-mm, linux-security-module,
Christian Loehle, linux-fsdevel
In-Reply-To: <20260310-work-kernel-exit-v2-2-30711759d87b@kernel.org>
Le Tue, Mar 10, 2026 at 03:56:10PM +0100, Christian Brauner a écrit :
> Rename do_exit() to task_exit() so it's clear that this is an api and
> not a hidden internal helper.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
--
Frederic Weisbecker
SUSE Labs
^ permalink raw reply
* Re: [PATCH v2 1/2] kthread: remove kthread_exit()
From: Frederic Weisbecker @ 2026-03-10 22:26 UTC (permalink / raw)
To: Christian Brauner
Cc: Linus Torvalds, linux-kernel, linux-modules, linux-nfs, bpf,
kunit-dev, linux-doc, linux-trace-kernel, netfs, io-uring, audit,
rcu, kvm, virtualization, netdev, linux-mm, linux-security-module,
Christian Loehle, linux-fsdevel
In-Reply-To: <20260310-work-kernel-exit-v2-1-30711759d87b@kernel.org>
Le Tue, Mar 10, 2026 at 03:56:09PM +0100, Christian Brauner a écrit :
> In 28aaa9c39945 ("kthread: consolidate kthread exit paths to prevent use-after-free")
> we folded kthread_exit() into do_exit() when we fixed a nasty UAF bug.
> We left kthread_exit() around as an alias to do_exit(). Remove it
> completely.
Thanks for fixing that UAF! I unfortunately missed it.
>
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Link: https://lore.kernel.org/1ff1bce2-8bb4-463c-a631-16e14f4ea7e2@arm.com
> Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
--
Frederic Weisbecker
SUSE Labs
^ permalink raw reply
* [PATCH v2 1/3] tracing: Have futex syscall trace event show specific user data
From: Steven Rostedt @ 2026-03-10 20:09 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
Ian Rogers, Suleiman Souhlal
In-Reply-To: <20260310200954.285663884@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
Add specific reporting of the futex system call. This allows for debugging
the futex code a bit easier. Instead of just showing the values passed
into the futex system call, read the value of the user space memory
pointed to by the addr parameter.
Also make the op parameter more readable by parsing the values to show
what the command is:
futex_requeue_p-3251 [002] ..... 2101.068479: sys_futex(uaddr: 0x55e79a4da834 (0x80000cb1), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
futex_requeue_p-3248 [001] ..... 2101.068970: sys_futex(uaddr: 0x7f859072f990 (0xcb2), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val: 3250)
futex_requeue_p-3252 [005] ..... 2101.069108: sys_futex(uaddr: 0x55e79a4da838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7ffe61076aa0, uaddr2: 0x55e79a4da834, uaddr2: 94453214586932, val3: 0)
futex_requeue_p-3252 [005] ..... 2101.069410: sys_futex(uaddr: 0x55e79a4da834 (0x80000cb1), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/all/20260303214942.428502100@kernel.org/
- Moved the printing of the futex parameters to kernel/futex/syscall.c to
keep it closer to the actual system call.
include/linux/futex.h | 7 +-
kernel/futex/syscalls.c | 89 ++++++++++++++++++
kernel/trace/trace_syscalls.c | 168 +++++++++++++++++++++++++++++++++-
3 files changed, 260 insertions(+), 4 deletions(-)
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 9e9750f04980..9fc47aa01a8b 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -82,6 +82,10 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3);
int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
+#ifdef CONFIG_FTRACE_SYSCALLS
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr);
+#endif
+
#ifdef CONFIG_FUTEX_PRIVATE_HASH
int futex_hash_allocate_default(void);
void futex_hash_free(struct mm_struct *mm);
@@ -114,7 +118,8 @@ static inline int futex_hash_allocate_default(void)
}
static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
-
+static inline void futex_print_syscall(struct seq_buf *s, int nr_args,
+ unsigned long *args, u32 *ptr) { }
#endif
#endif
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index 743c7a728237..a1cd512aa502 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -171,6 +171,18 @@ static __always_inline bool futex_cmd_has_timeout(u32 cmd)
return false;
}
+static __always_inline bool futex_cmd_has_addr2(u32 cmd)
+{
+ switch (cmd) {
+ case FUTEX_REQUEUE:
+ case FUTEX_CMP_REQUEUE:
+ case FUTEX_WAKE_OP:
+ case FUTEX_WAIT_REQUEUE_PI:
+ return true;
+ }
+ return false;
+}
+
static __always_inline int
futex_init_timeout(u32 cmd, u32 op, struct timespec64 *ts, ktime_t *t)
{
@@ -207,6 +219,83 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
}
+#ifdef CONFIG_FTRACE_SYSCALLS
+static const char * __futex_cmds[] =
+{
+ "FUTEX_WAIT", "FUTEX_WAKE", "FUTEX_FD", "FUTEX_REQUEUE",
+ "FUTEX_CMP_REQUEUE", "FUTEX_WAKE_OP", "FUTEX_LOCK_PI",
+ "FUTEX_UNLOCK_PI", "FUTEX_TRYLOCK_PI", "FUTEX_WAIT_BITSET",
+ "FUTEX_WAKE_BITSET", "FUTEX_WAIT_REQUEUE_PI", "FUTEX_CMP_REQUEUE_PI",
+ "FUTEX_LOCK_PI2",
+};
+
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr)
+{
+ unsigned int op, cmd;
+ bool done = false;
+
+ for (int i = 0; !done && i < nr_args; i++) {
+
+ if (seq_buf_has_overflowed(s))
+ break;
+
+ switch (i) {
+ case 0:
+ seq_buf_printf(s, "uaddr: 0x%lx", args[i]);
+ if (ptr) {
+ u32 val = *ptr;
+ if (val < 10)
+ seq_buf_printf(s, " (%u)", val);
+ else
+ seq_buf_printf(s, " (0x%x)", val);
+ }
+ continue;
+ case 1:
+ op = args[i];
+ cmd = op & FUTEX_CMD_MASK;
+ if (cmd <= FUTEX_LOCK_PI2)
+ seq_buf_printf(s, ", %s", __futex_cmds[cmd]);
+ else
+ seq_buf_puts(s, ", UNKNOWN");
+
+ if (op & FUTEX_PRIVATE_FLAG)
+ seq_buf_puts(s, "|FUTEX_PRIVATE_FLAG");
+ if (op & FUTEX_CLOCK_REALTIME)
+ seq_buf_puts(s, "|FUTEX_CLOCK_REALTIME");
+ continue;
+ case 3:
+ if (!futex_cmd_has_timeout(cmd)) {
+
+ if (!futex_cmd_has_addr2(cmd)) {
+ done = true;
+ continue;
+ }
+
+ seq_buf_printf(s, ", val2: 0x%x", (u32)(long)args[i]);
+ continue;
+ }
+
+ if (!args[i])
+ continue;
+
+ seq_buf_printf(s, ", timespec: 0x%lx", args[i]);
+ continue;
+ case 4:
+ if (!futex_cmd_has_addr2(cmd)) {
+ done = true;
+ continue;
+ }
+ seq_buf_printf(s, ", uaddr2: 0x%lx", args[i]);
+ continue;
+ case 5:
+ seq_buf_printf(s, ", val3: %lu", args[i]);
+ done = true;
+ continue;
+ }
+ }
+}
+#endif
+
/**
* futex_parse_waitv - Parse a waitv array from userspace
* @futexv: Kernel side list of waiters to be filled
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 37317b81fcda..8cb3af569157 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -6,11 +6,13 @@
#include <linux/slab.h>
#include <linux/kernel.h>
#include <linux/module.h> /* for MODULE_NAME_LEN via KSYM_SYMBOL_LEN */
+#include <linux/futex.h>
#include <linux/ftrace.h>
#include <linux/perf_event.h>
#include <linux/xarray.h>
#include <asm/syscall.h>
+
#include "trace_output.h"
#include "trace.h"
@@ -237,6 +239,27 @@ sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_metadat
return trace_handle_return(s);
}
+static enum print_line_t
+sys_enter_futex_print(struct syscall_trace_enter *trace, struct syscall_metadata *entry,
+ struct trace_seq *s, struct trace_event *event, int ent_size)
+{
+ void *end = (void *)trace + ent_size;
+ void *ptr;
+
+ /* Set ptr to the user space copied area */
+ ptr = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
+ if (ptr + 4 > end)
+ ptr = NULL;
+
+ trace_seq_printf(s, "%s(", entry->name);
+
+ futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr);
+
+ trace_seq_puts(s, ")\n");
+
+ return trace_handle_return(s);
+}
+
static enum print_line_t
print_syscall_enter(struct trace_iterator *iter, int flags,
struct trace_event *event)
@@ -267,6 +290,10 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE)))
return sys_enter_openat_print(trace, entry, s, event);
break;
+ case __NR_futex:
+ if (!tr || !(tr->trace_flags & TRACE_ITER(VERBOSE)))
+ return sys_enter_futex_print(trace, entry, s, event, iter->ent_size);
+ break;
default:
break;
}
@@ -437,6 +464,69 @@ sys_enter_openat_print_fmt(struct syscall_metadata *entry, char *buf, int len)
return pos;
}
+static int __init
+sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
+{
+ int pos = 0;
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "\"uaddr: 0x%%lx (0x%%lx) cmd=%%s%%s%%s");
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " val: 0x%%x timeout/val2: 0x%%llx");
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " uaddr2: 0x%%lx val3: 0x%%x\", ");
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " REC->uaddr,");
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " REC->__value,");
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " __print_symbolic(REC->op & 0x%x, ", FUTEX_CMD_MASK);
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_WAIT\"}, ", FUTEX_WAIT);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_WAKE\"}, ", FUTEX_WAKE);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_FD\"}, ", FUTEX_FD);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_REQUEUE\"}, ", FUTEX_REQUEUE);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_CMP_REQUEUE\"}, ", FUTEX_CMP_REQUEUE);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_WAKE_OP\"}, ", FUTEX_WAKE_OP);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_LOCK_PI\"}, ", FUTEX_LOCK_PI);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_UNLOCK_PI\"}, ", FUTEX_UNLOCK_PI);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_TRYLOCK_PI\"}, ", FUTEX_TRYLOCK_PI);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_WAIT_BITSET\"}, ", FUTEX_WAIT_BITSET);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_WAKE_BITSET\"}, ", FUTEX_WAKE_BITSET);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_WAIT_REQUEUE_PI\"}, ", FUTEX_WAIT_REQUEUE_PI);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_CMP_REQUEUE_PI\"}, ", FUTEX_CMP_REQUEUE_PI);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ "{%d, \"FUTEX_LOCK_PI2\"}),", FUTEX_LOCK_PI2);
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " (REC->op & %d) ? \"|FUTEX_PRIVATE_FLAG\" : \"\",",
+ FUTEX_PRIVATE_FLAG);
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " (REC->op & %d) ? \"|FUTEX_CLOCK_REALTIME\" : \"\",",
+ FUTEX_CLOCK_REALTIME);
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " REC->val, REC->utime,");
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " REC->uaddr, REC->val3");
+ return pos;
+}
+
static int __init
__set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
{
@@ -447,6 +537,8 @@ __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
switch (entry->syscall_nr) {
case __NR_openat:
return sys_enter_openat_print_fmt(entry, buf, len);
+ case __NR_futex:
+ return sys_enter_futex_print_fmt(entry, buf, len);
default:
break;
}
@@ -523,6 +615,21 @@ static void __init free_syscall_print_fmt(struct trace_event_call *call)
kfree(call->print_fmt);
}
+static int __init futex_fields(struct trace_event_call *call, int offset)
+{
+ char *arg;
+ int ret;
+
+ arg = kstrdup("__value", GFP_KERNEL);
+ if (WARN_ON_ONCE(!arg))
+ return -ENOMEM;
+ ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
+ FILTER_OTHER);
+ if (ret)
+ kfree(arg);
+ return ret;
+}
+
static int __init syscall_enter_define_fields(struct trace_event_call *call)
{
struct syscall_trace_enter trace;
@@ -544,6 +651,9 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
offset += sizeof(unsigned long);
}
+ if (!ret && meta->syscall_nr == __NR_futex)
+ return futex_fields(call, offset);
+
if (ret || !meta->user_mask)
return ret;
@@ -689,6 +799,48 @@ static int syscall_copy_user_array(char *buf, const char __user *ptr,
return 0;
}
+static int
+syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
+{
+ struct syscall_user_buffer *sbuf;
+ const char __user *ptr;
+ char *buf;
+
+ /* buf_size of zero means user doesn't want user space read */
+ if (!buf_size)
+ return -1;
+
+ /* If the syscall_buffer is NULL, tracing is being shutdown */
+ sbuf = READ_ONCE(syscall_buffer);
+ if (!sbuf)
+ return -1;
+
+ ptr = (char __user *)args[0];
+
+ *buffer = trace_user_fault_read(&sbuf->buf, ptr, 4, NULL, NULL);
+ if (!*buffer)
+ return -1;
+
+ /* Add room for the value */
+ *size += 4;
+
+ buf = *buffer;
+
+ return 0;
+}
+
+static void syscall_put_futex(struct syscall_metadata *sys_data,
+ struct syscall_trace_enter *entry,
+ char *buffer)
+{
+ u32 *ptr;
+
+ /* Place the futex key into the storage */
+ ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
+
+ *ptr = *(u32 *)buffer;
+}
+
static char *sys_fault_user(unsigned int buf_size,
struct syscall_metadata *sys_data,
struct syscall_user_buffer *sbuf,
@@ -905,6 +1057,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (syscall_get_data(sys_data, args, &user_ptr,
&size, user_sizes, &uargs, tr->syscall_buf_sz) < 0)
return;
+ } else if (syscall_nr == __NR_futex) {
+ if (syscall_get_futex(args, &user_ptr, &size, tr->syscall_buf_sz) < 0)
+ return;
}
size += sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
@@ -921,6 +1076,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (mayfault)
syscall_put_data(sys_data, entry, user_ptr, size, user_sizes, uargs);
+ else if (syscall_nr == __NR_futex)
+ syscall_put_futex(sys_data, entry, user_ptr);
+
trace_event_buffer_commit(&fbuffer);
}
@@ -971,14 +1129,18 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
{
struct syscall_metadata *sys_data = call->data;
struct trace_array *tr = file->tr;
+ bool enable_faults;
int ret = 0;
int num;
num = sys_data->syscall_nr;
if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
return -ENOSYS;
+
+ enable_faults = sys_data->user_mask || num == __NR_futex;
+
guard(mutex)(&syscall_trace_lock);
- if (sys_data->user_mask) {
+ if (enable_faults) {
ret = syscall_fault_buffer_enable();
if (ret < 0)
return ret;
@@ -986,7 +1148,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
if (!tr->sys_refcount_enter) {
ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
if (ret < 0) {
- if (sys_data->user_mask)
+ if (enable_faults)
syscall_fault_buffer_disable();
return ret;
}
@@ -1011,7 +1173,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
WRITE_ONCE(tr->enter_syscall_files[num], NULL);
if (!tr->sys_refcount_enter)
unregister_trace_sys_enter(ftrace_syscall_enter, tr);
- if (sys_data->user_mask)
+ if (sys_data->user_mask || num == __NR_futex)
syscall_fault_buffer_disable();
}
--
2.51.0
^ permalink raw reply related
* [PATCH v2 3/3] tracing: Show TID and flags for PI futex system call trace event
From: Steven Rostedt @ 2026-03-10 20:09 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
Ian Rogers, Suleiman Souhlal
In-Reply-To: <20260310200954.285663884@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
For the futex system call trace event for FUTEX_LOCK_PI and
FUTEX_UNLOCK_PI commands, show the TID from the value (which is usually in
hex) as well as translate the flags (DIED and WAITERS).
pi_mutex_hammer-1098 [000] ..... 121.876928: sys_futex(uaddr: 0x560f40cc8180 (0x450) tid: 1104, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7f2f9d4b1e50 (0.000100000))
pi_mutex_hammer-1128 [000] ..... 121.877120: sys_futex(uaddr: 0x560f40cc8180 (0x8000042a) tid: 1066 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7f2f8e493e50 (0.000100000))
pi_mutex_hammer-1106 [000] ..... 121.877242: sys_futex(uaddr: 0x560f40cc8180 (0x80000452) tid: 1106 (WAITERS), FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
This makes it easier to see the hand off of a mutex and who the owner was.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/all/20260303174442.548b6524@gandalf.local.home/
- Updated to have the print processing in kernel/futex/syscall.c
kernel/futex/syscalls.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index a46706d6bc6c..24d912d9b20d 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -211,6 +211,9 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
unsigned int op, cmd;
bool done = false;
+ op = args[1];
+ cmd = op & FUTEX_CMD_MASK;
+
for (int i = 0; !done && i < nr_args; i++) {
if (seq_buf_has_overflowed(s))
@@ -225,11 +228,30 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
seq_buf_printf(s, " (%u)", val);
else
seq_buf_printf(s, " (0x%x)", val);
+
+ switch(cmd) {
+ case FUTEX_LOCK_PI:
+ case FUTEX_UNLOCK_PI:
+ seq_buf_printf(s, " tid: %d",
+ val & FUTEX_TID_MASK);
+
+ if (!(val & (FUTEX_OWNER_DIED|FUTEX_WAITERS)))
+ break;
+
+ seq_buf_puts(s, " (");
+ if (val & FUTEX_WAITERS)
+ seq_buf_puts(s, "WAITERS");
+ if (val & FUTEX_OWNER_DIED) {
+ if (op & FUTEX_WAITERS)
+ seq_buf_putc(s, '|');
+ seq_buf_puts(s, "DIED");
+ }
+ seq_buf_putc(s, ')');
+ break;
+ }
}
continue;
case 1:
- op = args[i];
- cmd = op & FUTEX_CMD_MASK;
if (cmd <= FUTEX_LOCK_PI2)
seq_buf_printf(s, ", %s", __futex_cmds[cmd]);
else
--
2.51.0
^ permalink raw reply related
* [PATCH v2 2/3] tracing: Update futex syscall trace event to show more commands
From: Steven Rostedt @ 2026-03-10 20:09 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
Ian Rogers, Suleiman Souhlal
In-Reply-To: <20260310200954.285663884@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
Make the futex syscall trace event a little more smart. Have it read the
futex_op instruction to determine what else it can save and print. For the
appropriate options, it will read the utime (timespec) parameter and show
its output as well as the uaddr2.
futex_requeue_p-1154 [004] ..... 144.568339: sys_futex(uaddr: 0x5652b178d834 (0x482), FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
futex_requeue_p-1162 [002] ..... 144.568696: sys_futex(uaddr: 0x7f763b7fece0 (2), FUTEX_WAIT|FUTEX_PRIVATE_FLAG, val: 2)
futex_requeue_p-1151 [000] ..... 144.568700: sys_futex(uaddr: 0x7f763b7fece0 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG, val: 1)
futex_requeue_p-1162 [002] ..... 144.568705: sys_futex(uaddr: 0x7f763b7fece0 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG, val: 1)
futex_requeue_p-1151 [000] ..... 144.568715: sys_futex(uaddr: 0x7f764369e990 (0x483), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, val: 1155)
futex_requeue_p-1155 [005] ..... 144.569420: sys_futex(uaddr: 0x5652b178d838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, val: 0, timespec: 0x7ffdacfba500 (143.890024054), uaddr2: 0x5652b178d834 (0), val3: 0)
futex_requeue_p-1155 [005] ..... 144.569454: sys_futex(uaddr: 0x5652b178d834 (0), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG, val: 0)
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/all/20260303214942.587739736@kernel.org/
- Updated to have the print processing in kernel/futex/syscall.c
include/linux/futex.h | 35 ++++++++-
kernel/futex/syscalls.c | 48 ++++++-------
kernel/trace/trace_syscalls.c | 129 +++++++++++++++++++++++++++++-----
3 files changed, 164 insertions(+), 48 deletions(-)
diff --git a/include/linux/futex.h b/include/linux/futex.h
index 9fc47aa01a8b..976fa257ab5c 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -82,8 +82,35 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
u32 __user *uaddr2, u32 val2, u32 val3);
int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4);
+static __always_inline bool futex_cmd_has_timeout(u32 cmd)
+{
+ switch (cmd) {
+ case FUTEX_WAIT:
+ case FUTEX_LOCK_PI:
+ case FUTEX_LOCK_PI2:
+ case FUTEX_WAIT_BITSET:
+ case FUTEX_WAIT_REQUEUE_PI:
+ return true;
+ }
+ return false;
+}
+
+static __always_inline bool futex_cmd_has_addr2(u32 cmd)
+{
+ switch (cmd) {
+ case FUTEX_REQUEUE:
+ case FUTEX_CMP_REQUEUE:
+ case FUTEX_WAKE_OP:
+ case FUTEX_WAIT_REQUEUE_PI:
+ return true;
+ }
+ return false;
+}
+
#ifdef CONFIG_FTRACE_SYSCALLS
-void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr);
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
+ u32 *ptr1, u32 *ptr2, unsigned long *ts1,
+ unsigned long *ts2);
#endif
#ifdef CONFIG_FUTEX_PRIVATE_HASH
@@ -119,7 +146,11 @@ static inline int futex_hash_allocate_default(void)
static inline int futex_hash_free(struct mm_struct *mm) { return 0; }
static inline int futex_mm_init(struct mm_struct *mm) { return 0; }
static inline void futex_print_syscall(struct seq_buf *s, int nr_args,
- unsigned long *args, u32 *ptr) { }
+ unsigned long *args, u32 *ptr1,
+ u32 *ptr2, unsigned long *ts1,
+ unsigned long *ts2) { }
+static __always_inline bool futex_cmd_has_timeout(u32 cmd) { return false; }
+static __always_inline bool futex_cmd_has_addr2(u32 cmd) { return false; }
#endif
#endif
diff --git a/kernel/futex/syscalls.c b/kernel/futex/syscalls.c
index a1cd512aa502..a46706d6bc6c 100644
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -158,31 +158,6 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
return -ENOSYS;
}
-static __always_inline bool futex_cmd_has_timeout(u32 cmd)
-{
- switch (cmd) {
- case FUTEX_WAIT:
- case FUTEX_LOCK_PI:
- case FUTEX_LOCK_PI2:
- case FUTEX_WAIT_BITSET:
- case FUTEX_WAIT_REQUEUE_PI:
- return true;
- }
- return false;
-}
-
-static __always_inline bool futex_cmd_has_addr2(u32 cmd)
-{
- switch (cmd) {
- case FUTEX_REQUEUE:
- case FUTEX_CMP_REQUEUE:
- case FUTEX_WAKE_OP:
- case FUTEX_WAIT_REQUEUE_PI:
- return true;
- }
- return false;
-}
-
static __always_inline int
futex_init_timeout(u32 cmd, u32 op, struct timespec64 *ts, ktime_t *t)
{
@@ -229,7 +204,9 @@ static const char * __futex_cmds[] =
"FUTEX_LOCK_PI2",
};
-void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u32 *ptr)
+void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args,
+ u32 *ptr1, u32 *ptr2, unsigned long *ts1,
+ unsigned long *ts2)
{
unsigned int op, cmd;
bool done = false;
@@ -242,8 +219,8 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u3
switch (i) {
case 0:
seq_buf_printf(s, "uaddr: 0x%lx", args[i]);
- if (ptr) {
- u32 val = *ptr;
+ if (ptr1) {
+ u32 val = *ptr1;
if (val < 10)
seq_buf_printf(s, " (%u)", val);
else
@@ -279,6 +256,15 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u3
continue;
seq_buf_printf(s, ", timespec: 0x%lx", args[i]);
+
+ if (!ts1 || !ts2)
+ continue;
+
+ if (!*ts1 && !*ts2) {
+ seq_buf_puts(s, " (0)");
+ continue;
+ }
+ seq_buf_printf(s, " (%lu.%09lu)", *ts1, *ts2);
continue;
case 4:
if (!futex_cmd_has_addr2(cmd)) {
@@ -286,6 +272,12 @@ void futex_print_syscall(struct seq_buf *s, int nr_args, unsigned long *args, u3
continue;
}
seq_buf_printf(s, ", uaddr2: 0x%lx", args[i]);
+
+ if (!ptr2)
+ continue;
+
+ seq_buf_printf(s, " (%x)", *ptr2);
+
continue;
case 5:
seq_buf_printf(s, ", val3: %lu", args[i]);
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8cb3af569157..de1fa97547a3 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -239,21 +239,35 @@ sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_metadat
return trace_handle_return(s);
}
+struct futex_data {
+ u32 val1;
+ u32 val2;
+ unsigned long ts1;
+ unsigned long ts2;
+};
+
static enum print_line_t
sys_enter_futex_print(struct syscall_trace_enter *trace, struct syscall_metadata *entry,
struct trace_seq *s, struct trace_event *event, int ent_size)
{
+ struct futex_data *data;
void *end = (void *)trace + ent_size;
- void *ptr;
+ unsigned long *ts1 = NULL, *ts2 = NULL;
+ u32 *ptr1 = NULL, *ptr2 = NULL;
/* Set ptr to the user space copied area */
- ptr = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
- if (ptr + 4 > end)
- ptr = NULL;
+ data = (void *)trace->args + sizeof(unsigned long) * entry->nb_args;
+ if ((void *)data + sizeof(*data) <= end) {
+ ptr1 = &data->val1;
+ ptr2 = &data->val2;
+ ts1 = &data->ts1;
+ ts2 = &data->ts2;
+ }
trace_seq_printf(s, "%s(", entry->name);
- futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr);
+ futex_print_syscall(&s->seq, entry->nb_args, trace->args, ptr1, ptr2,
+ ts1, ts2);
trace_seq_puts(s, ")\n");
@@ -472,9 +486,9 @@ sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
pos += snprintf(buf + pos, LEN_OR_ZERO,
"\"uaddr: 0x%%lx (0x%%lx) cmd=%%s%%s%%s");
pos += snprintf(buf + pos, LEN_OR_ZERO,
- " val: 0x%%x timeout/val2: 0x%%llx");
+ " val: 0x%%x timeout/val2: 0x%%llx (%%lu.%%lu)");
pos += snprintf(buf + pos, LEN_OR_ZERO,
- " uaddr2: 0x%%lx val3: 0x%%x\", ");
+ " uaddr2: 0x%%lx (0x%%lx) val3: 0x%%x\", ");
pos += snprintf(buf + pos, LEN_OR_ZERO,
" REC->uaddr,");
@@ -520,10 +534,12 @@ sys_enter_futex_print_fmt(struct syscall_metadata *entry, char *buf, int len)
FUTEX_CLOCK_REALTIME);
pos += snprintf(buf + pos, LEN_OR_ZERO,
- " REC->val, REC->utime,");
+ " REC->val, REC->utime, REC->__ts1, REC->__ts2,");
pos += snprintf(buf + pos, LEN_OR_ZERO,
- " REC->uaddr, REC->val3");
+ " REC->uaddr,");
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ " REC->__value2, REC->val3");
return pos;
}
@@ -626,7 +642,39 @@ static int __init futex_fields(struct trace_event_call *call, int offset)
ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
FILTER_OTHER);
if (ret)
- kfree(arg);
+ goto free;
+ offset += sizeof(int);
+
+ arg = kstrdup("__value2", GFP_KERNEL);
+ if (WARN_ON_ONCE(!arg))
+ return -ENOMEM;
+ ret = trace_define_field(call, "u32", arg, offset, sizeof(int), 0,
+ FILTER_OTHER);
+ if (ret)
+ goto free;
+ offset += sizeof(int);
+
+ arg = kstrdup("__ts1", GFP_KERNEL);
+ if (WARN_ON_ONCE(!arg))
+ return -ENOMEM;
+ ret = trace_define_field(call, "unsigned long", arg, offset,
+ sizeof(unsigned long), 0, FILTER_OTHER);
+ if (ret)
+ goto free;
+ offset += sizeof(long);
+
+ arg = kstrdup("__ts2", GFP_KERNEL);
+ if (WARN_ON_ONCE(!arg))
+ return -ENOMEM;
+ ret = trace_define_field(call, "unsigned long", arg, offset,
+ sizeof(unsigned long), 0, FILTER_OTHER);
+ if (ret)
+ goto free;
+
+ return 0;
+
+free:
+ kfree(arg);
return ret;
}
@@ -799,11 +847,51 @@ static int syscall_copy_user_array(char *buf, const char __user *ptr,
return 0;
}
+struct tp_futex_data {
+ u32 cmd;
+ const u32 __user *val1;
+ const u32 __user *val2;
+ void __user *timeout;
+};
+
+static int syscall_copy_futex(char *buf, const char __user *ptr,
+ size_t size, void *data)
+{
+ struct tp_futex_data *tp_data = data;
+ struct futex_data *fdata = (void *)buf;
+ int cmd = tp_data->cmd & FUTEX_CMD_MASK;
+ int ret;
+
+ memset(fdata, 0, sizeof(*fdata));
+
+ if (tp_data->val1) {
+ ret = __copy_from_user(&fdata->val1, tp_data->val1, 4);
+ if (ret)
+ return -1;
+ }
+
+ if (tp_data->val2 && futex_cmd_has_addr2(cmd)) {
+ ret = __copy_from_user(&fdata->val2, tp_data->val2, 4);
+ if (ret)
+ return -1;
+ }
+
+ if (tp_data->timeout && futex_cmd_has_timeout(cmd)) {
+ /* Copies both ts1 and ts2 */
+ ret = __copy_from_user(&fdata->ts1, tp_data->timeout,
+ sizeof(long) * 2);
+ if (ret)
+ return -1;
+ }
+
+ return 0;
+}
+
static int
syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
{
struct syscall_user_buffer *sbuf;
- const char __user *ptr;
+ struct tp_futex_data tp_data;
char *buf;
/* buf_size of zero means user doesn't want user space read */
@@ -815,14 +903,18 @@ syscall_get_futex(unsigned long *args, char **buffer, int *size, int buf_size)
if (!sbuf)
return -1;
- ptr = (char __user *)args[0];
+ tp_data.cmd = args[1];
+ tp_data.val1 = (u32 __user *)args[0];
+ tp_data.val2 = (u32 __user *)args[4];
+ tp_data.timeout = (u64 __user *)args[3];
- *buffer = trace_user_fault_read(&sbuf->buf, ptr, 4, NULL, NULL);
+ *buffer = trace_user_fault_read(&sbuf->buf, NULL, 0,
+ syscall_copy_futex, &tp_data);
if (!*buffer)
return -1;
- /* Add room for the value */
- *size += 4;
+ /* Add room for values */
+ *size += sizeof(struct futex_data);
buf = *buffer;
@@ -833,12 +925,13 @@ static void syscall_put_futex(struct syscall_metadata *sys_data,
struct syscall_trace_enter *entry,
char *buffer)
{
- u32 *ptr;
+ struct futex_data *fdata = (void *)buffer;
+ struct futex_data *data;
/* Place the futex key into the storage */
- ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
+ data = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
- *ptr = *(u32 *)buffer;
+ *data = *fdata;
}
static char *sys_fault_user(unsigned int buf_size,
--
2.51.0
^ permalink raw reply related
* [PATCH v2 0/3] tracing: Read user data from futex system call trace event
From: Steven Rostedt @ 2026-03-10 20:09 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Brian Geffon, John Stultz,
Ian Rogers, Suleiman Souhlal
We are looking at the performance of futexes and require a bit more
information when tracing them.
The two patches here extend the system call reading of user space to
create specific handling of the futex system call. It now reads the
user space relevant data (the addr, utime and addr2), as well as
parses the flags. This adds a little smarts to the trace event as
it only shows the parameters that are relevant, as well as parses
utime as either a timespec or as val2 depending on the futex_op.
Here's an example of the new output:
sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAKE|FUTEX_PRIVATE_FLAG)
sys_futex(uaddr: 0x56196292e834 (0x4a7) tid: 1191, FUTEX_UNLOCK_PI|FUTEX_PRIVATE_FLAG)
sys_futex(uaddr: 0x56196292e834 (0) tid: 0, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
sys_futex(uaddr: 0x56196292e830 (0), FUTEX_WAIT|FUTEX_PRIVATE_FLAG)
sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (4aa), val3: 0)
sys_futex(uaddr: 0x56196292e834 (0x4aa) tid: 1194, FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
sys_futex(uaddr: 0x7f7ed6b29990 (0x4ab), FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME)
sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
sys_futex(uaddr: 0x56196292e838 (0), FUTEX_WAIT_REQUEUE_PI|FUTEX_PRIVATE_FLAG, timespec: 0x7ffc1b91a9f0 (163.048528790), uaddr2: 0x56196292e834 (800004aa), val3: 0)
sys_futex(uaddr: 0x56196292e834 (0x800004aa) tid: 1194 (WAITERS), FUTEX_LOCK_PI|FUTEX_PRIVATE_FLAG)
Changes since v1: https://lore.kernel.org/all/20260303214735.002154462@kernel.org/
- Moved the printing of the futex parameters to kernel/futex/syscall.c to
keep it closer to the actual system call.
- Moved the always_inline functions: futex_cmd_has_timeout()
and futex_cmd_has_addr2() into include/linux/futex.h so that they
can be both used by the futex and tracing syscall code.
Steven Rostedt (3):
tracing: Have futex syscall trace event show specific user data
tracing: Update futex syscall trace event to show more commands
tracing: Show TID and flags for PI futex system call trace event
----
include/linux/futex.h | 38 +++++-
kernel/futex/syscalls.c | 129 ++++++++++++++++++---
kernel/trace/trace_syscalls.c | 261 +++++++++++++++++++++++++++++++++++++++++-
3 files changed, 411 insertions(+), 17 deletions(-)
^ permalink raw reply
* Re: [PATCH 00/61] treewide: Use IS_ERR_OR_NULL over manual NULL check - refactor
From: Kuan-Wei Chiu @ 2026-03-10 18:40 UTC (permalink / raw)
To: Philipp Hahn
Cc: amd-gfx, apparmor, bpf, ceph-devel, cocci, dm-devel, dri-devel,
gfs2, intel-gfx, intel-wired-lan, iommu, kvm, linux-arm-kernel,
linux-block, linux-bluetooth, linux-btrfs, linux-cifs, linux-clk,
linux-erofs, linux-ext4, linux-fsdevel, linux-gpio, linux-hyperv,
linux-input, linux-kernel, linux-leds, linux-media, linux-mips,
linux-mm, linux-modules, linux-mtd, linux-nfs, linux-omap,
linux-phy, linux-pm, linux-rockchip, linux-s390, linux-scsi,
linux-sctp, linux-security-module, linux-sh, linux-sound,
linux-stm32, linux-trace-kernel, linux-usb, linux-wireless,
netdev, ntfs3, samba-technical, sched-ext, target-devel,
tipc-discussion, v9fs
In-Reply-To: <20260310-b4-is_err_or_null-v1-0-bd63b656022d@avm.de>
Hi Philipp,
On Tue, Mar 10, 2026 at 12:48:26PM +0100, Philipp Hahn wrote:
> While doing some static code analysis I stumbled over a common pattern,
> where IS_ERR() is combined with a NULL check. For that there is
> IS_ERR_OR_NULL().
>
> I've written a Coccinelle patch to find and patch those instances.
> The patches follow grouped by subsystem.
>
> Patches 55-58 may be dropped as they have a (minor?) semantic change:
> They use WARN_ON() or WARN_ON_ONCE(), but only in the IS_ERR() path, not
> for the NULL check. Iff it is okay to print the warning also for NULL,
> then the patches can be applied.
>
> While generating the patch set `checkpatch` complained about mixing
> [un]likely() with IS_ERR_OR_NULL(), which already uses likely()
> internally. I found and fixed several locations, where that combination
> has been used.
Thanks for the patchset. However, I think we need a explanation for why
switching to IS_ERR_OR_NULL() is an improvement over the existing code.
IMHO, the necessity of IS_ERR_OR_NULL() often highlights a confusing or
flawed API design. It usually implies that the caller is unsure whether
a failure results in an error pointer or a NULL pointer. Rather than
doing a treewide conversion of this pattern, I believe it would be much
more meaningful to review these instances case-by-case and fix the
underlying APIs or caller logic instead.
Additionally, a treewide refactoring like this has the practical
drawback of creating unnecessary merge conflicts when backporting to
stable trees.
Regards,
Kuan-Wei
^ permalink raw reply
* Re: [PATCH 03/61] ceph: Prefer IS_ERR_OR_NULL over manual NULL check
From: Viacheslav Dubeyko @ 2026-03-10 18:13 UTC (permalink / raw)
To: dm-devel@lists.linux.dev, phahn-oss@avm.de,
intel-wired-lan@lists.osuosl.org, linux-erofs@lists.ozlabs.org,
linux-security-module@vger.kernel.org, kvm@vger.kernel.org,
linux-sctp@vger.kernel.org, linux-pm@vger.kernel.org,
apparmor@lists.ubuntu.com, linux-ext4@vger.kernel.org,
amd-gfx@lists.freedesktop.org, linux-clk@vger.kernel.org,
linux-mips@vger.kernel.org, linux-media@vger.kernel.org,
netdev@vger.kernel.org, iommu@lists.linux.dev,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-usb@vger.kernel.org,
sched-ext@lists.linux.dev, linux-btrfs@vger.kernel.org,
linux-bluetooth@vger.kernel.org, linux-s390@vger.kernel.org,
samba-technical@lists.samba.org, intel-gfx@lists.freedesktop.org,
linux-trace-kernel@vger.kernel.org, ntfs3@lists.linux.dev,
linux-phy@lists.infradead.org, v9fs@lists.linux.dev,
ceph-devel@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
linux-mtd@lists.infradead.org, linux-scsi@vger.kernel.org,
target-devel@vger.kernel.org, linux-gpio@vger.kernel.org,
cocci@inria.fr, linux-sh@vger.kernel.org,
linux-rockchip@lists.infradead.org,
linux-stm32@st-md-mailman.stormreply.com,
linux-cifs@vger.kernel.org, linux-modules@vger.kernel.org,
linux-sound@vger.kernel.org, bpf@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-input@vger.kernel.org,
linux-leds@vger.kernel.org, dri-devel@lists.freedesktop.org,
linux-hyperv@vger.kernel.org, linux-mm@kvack.org,
linux-nfs@vger.kernel.org, gfs2@lists.linux.dev,
linux-wireless@vger.kernel.org, linux-omap@vger.kernel.org
Cc: idryomov@gmail.com, Alex Markuze, slava@dubeyko.com
In-Reply-To: <20260310-b4-is_err_or_null-v1-3-bd63b656022d@avm.de>
On Tue, 2026-03-10 at 12:48 +0100, Philipp Hahn wrote:
> Prefer using IS_ERR_OR_NULL() over using IS_ERR() and a manual NULL
> check.
>
> Change generated with coccinelle.
>
> To: Ilya Dryomov <idryomov@gmail.com>
> To: Alex Markuze <amarkuze@redhat.com>
> To: Viacheslav Dubeyko <slava@dubeyko.com>
> Cc: ceph-devel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Philipp Hahn <phahn-oss@avm.de>
> ---
> fs/ceph/dir.c | 2 +-
> fs/ceph/snap.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 86d7aa594ea99335af3e91a95c0a418fdc1b8a8a..934250748ae4fd4c148fd27bdf91175047c2877d 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -889,7 +889,7 @@ int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry)
> {
> struct dentry *result = ceph_lookup(dir, dentry, 0);
>
> - if (result && !IS_ERR(result)) {
> + if (!IS_ERR_OR_NULL(result)) {
> /*
> * We created the item, then did a lookup, and found
> * it was already linked to another inode we already
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 52b4c2684f922bfed39550311e793bfe3622cd26..528ad581be160713f91416115659e2dc6f259576 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -902,7 +902,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> bad:
> err = -EIO;
> fail:
> - if (realm && !IS_ERR(realm))
> + if (!IS_ERR_OR_NULL(realm))
> ceph_put_snap_realm(mdsc, realm);
> if (first_realm)
> ceph_put_snap_realm(mdsc, first_realm);
Looks good.
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Thanks,
Slava.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox