Netdev List
 help / color / mirror / Atom feed
* [PATCH RFC v2 6/9] leds: trigger: Add led_trigger_notify_hw_control_changed() interface
From: Rong Zhang @ 2026-06-17 16:48 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

Some hardware can autonomously activate/deactivate hardware control.
After that, the LED hardware notifies the LED driver. Currently, there
is no mechanism for LED drivers to notify the LED core about such events
and initiate a trigger transition to reflect the hardware state.

Add a new interface called led_trigger_notify_hw_control_changed(), so
that LED drivers can call it to notify the LED core about the
transition.

The interface only allows two transitions:

1. "none" => private trigger
2. private trigger => "none"

If the current trigger is neither the private trigger nor "none", no
transition will be made. This protects the currently selected software
trigger.

Note that LED_OFF won't be emitted during the #2 transition, as some
hardware may have selected a new brightness level during its hardware
state transition (e.g., laptop keyboards with a shortcut cycling through
different backlight brightnesses and auto mode).

The interface is designed as a void function as any failure should be
non-fatal and the result of transition should not have any impact on the
LED drivers' event handling procedures.

To use the interface, LEDS_TRIGGERS_HW_CHANGED must be enabled in
Kconfig, and the LED driver must set the LED_TRIG_HW_CHANGED flag for
the classdev.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 Documentation/leds/leds-class.rst | 61 +++++++++++++++++++++++++++
 drivers/leds/led-triggers.c       | 86 +++++++++++++++++++++++++++++++++++++--
 drivers/leds/trigger/Kconfig      |  9 ++++
 include/linux/leds.h              |  8 ++++
 4 files changed, 161 insertions(+), 3 deletions(-)

diff --git a/Documentation/leds/leds-class.rst b/Documentation/leds/leds-class.rst
index 41342ecb5f6b..f250dc938e1f 100644
--- a/Documentation/leds/leds-class.rst
+++ b/Documentation/leds/leds-class.rst
@@ -261,9 +261,70 @@ the end use hw_control_set to activate hw control.
 A trigger can use hw_control_get to check if a LED is already in hw control
 and init their flags.
 
+Alternatively, a private trigger can be implemented along with the LED driver if
+the LED's hw control doesn't fit any generic trigger. To associate the private
+trigger with the LED classdev, their `trigger_type` must be the same. The name
+of the private trigger must be the same as `hw_control_trigger`. Since both the
+LED classdev and the private trigger are in the same LED driver, it's not
+necessary for them to coordinate via `hw_control_*` callbacks.
+
 When the LED is in hw control, no software blink is possible and doing so
 will effectively disable hw control.
 
+Hardware-initiated trigger transition
+=====================================
+
+Some hardware can autonomously activate/deactivate hardware control. After that,
+the LED hardware notifies the LED driver.
+
+If the driver can detect such transitions and thus wants to notify the LED core
+to update the current trigger then the `LED_TRIG_HW_CHANGED` flag must be set in
+flags before registering. To update the current trigger accordingly, call
+`led_trigger_notify_hw_control_changed` on the LED classdev. Calling the method
+on a classdev not registered with the `LED_TRIG_HW_CHANGED` flag or an
+appropriate `hw_control_trigger` string is a bug and will trigger a WARN_ON.
+
+This capability is restricted to the LED device's private trigger. The private
+trigger must have been properly registered (see above) and named after
+`hw_control_trigger`, or else a dev_err() will be triggered.
+
+Only two transitions are defined:
+
+- "none" => private trigger:
+        This happens when the hardware autonomously activates hardware control
+        and when "none" (i.e., no trigger) is currently active. If the private
+        trigger is already active when the method is called, this is essentially
+        a no-op.
+
+        The activation sequence for the private trigger will be executed as
+        normal.
+
+        The LED driver and its private trigger must be able to handle the
+        activation sequence even if the hardware is currently in hardware
+        control.
+
+        If error occurs in the activation sequence, the LED Trigger core reverts
+        the effective trigger to "none".
+
+- private trigger => "none"
+        This happens when the hardware autonomously deactivates hardware control
+        and when the private trigger is currently active. If "none" (i.e., no
+        trigger) is active when the method is called, this is essentially a
+        no-op.
+
+        The deactivation sequence for the private trigger will be executed as
+        normal, except that the current LED brightness is retained. The reason
+        for keeping the brightness unchanged is that some hardware may choose a
+        specific brightness instead of simply turning off the LED after
+        autonomously deactivating hardware control.
+
+        The LED driver and its private trigger must be able to handle the
+        deactivation sequence even if the hardware is not currently in hardware
+        control.
+
+If the current trigger is neither the private trigger nor "none", no transition
+will be made.
+
 Known Issues
 ============
 
diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
index c43229d9c4c1..73e9ce376d02 100644
--- a/drivers/leds/led-triggers.c
+++ b/drivers/leds/led-triggers.c
@@ -7,6 +7,7 @@
  * Author: Richard Purdie <rpurdie@openedhand.com>
  */
 
+#include <linux/bug.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
@@ -162,8 +163,8 @@ ssize_t led_trigger_read(struct file *filp, struct kobject *kobj,
 }
 EXPORT_SYMBOL_GPL(led_trigger_read);
 
-/* Caller must ensure led_cdev->trigger_lock held */
-int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
+static int __led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig,
+			     bool hw_triggered)
 {
 	char *event = NULL;
 	char *envp[2];
@@ -194,7 +195,21 @@ int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
 		led_cdev->trigger_data = NULL;
 		led_cdev->activated = false;
 		led_cdev->flags &= ~LED_INIT_DEFAULT_TRIGGER;
-		led_set_brightness(led_cdev, LED_OFF);
+
+		/*
+		 * Hardware may have selected a new brightness level during its
+		 * hardware control transition, so only reset brightness if we
+		 * are switching to another trigger or if the switching is not
+		 * hardware triggered.
+		 *
+		 * Note that this does not apply to the error path, as running
+		 * into the error path implies a none => private trigger
+		 * transition. This hints that the LED driver and its private
+		 * trigger must have some fundamental bugs, so don't bother
+		 * leaving the LED in an undefined state.
+		 */
+		if (trig || !hw_triggered)
+			led_set_brightness(led_cdev, LED_OFF);
 	}
 	if (trig) {
 		spin_lock(&trig->leddev_list_lock);
@@ -258,6 +273,12 @@ int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
 
 	return ret;
 }
+
+/* Caller must ensure led_cdev->trigger_lock held */
+int led_trigger_set(struct led_classdev *led_cdev, struct led_trigger *trig)
+{
+	return __led_trigger_set(led_cdev, trig, false);
+}
 EXPORT_SYMBOL_GPL(led_trigger_set);
 
 void led_trigger_remove(struct led_classdev *led_cdev)
@@ -448,6 +469,65 @@ int devm_led_trigger_register(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(devm_led_trigger_register);
 
+#ifdef CONFIG_LEDS_TRIGGERS_HW_CHANGED
+static void led_trigger_do_hw_control_transition(struct led_classdev *led_cdev, bool activate,
+						 struct led_trigger *hc_trig)
+{
+	int err = 0;
+
+	if (!led_cdev->trigger) {
+		/* "none" => private trigger. */
+		if (activate)
+			err = __led_trigger_set(led_cdev, hc_trig, true);
+	} else if (led_cdev->trigger == hc_trig) {
+		/* private trigger => "none". */
+		if (!activate)
+			err = __led_trigger_set(led_cdev, NULL, true);
+	} else {
+		/* Other trigger is active. */
+		dev_dbg(led_cdev->dev,
+			"Ignoring hw control transition (%s %s) while %s is active",
+			activate ? "activate" : "deactivate", hc_trig->name,
+			led_cdev->trigger->name);
+
+		return;
+	}
+
+	if (err)
+		dev_warn(led_cdev->dev, "Failed to %s %s in hw control transition: %d",
+			 activate ? "activate" : "deactivate", hc_trig->name, err);
+}
+
+void led_trigger_notify_hw_control_changed(struct led_classdev *led_cdev, bool activate)
+{
+	struct led_trigger *trig;
+
+	/* Restricted to private triggers. */
+	if (WARN_ON(!(led_cdev->flags & LED_TRIG_HW_CHANGED) ||
+		    !led_cdev->hw_control_trigger || !led_cdev->trigger_type))
+		return;
+
+	down_read(&triggers_list_lock);
+	list_for_each_entry(trig, &trigger_list, next_trig) {
+		if (trig->trigger_type == led_cdev->trigger_type &&
+		    !strcmp(trig->name, led_cdev->hw_control_trigger)) {
+			down_write(&led_cdev->trigger_lock);
+			led_trigger_do_hw_control_transition(led_cdev, activate, trig);
+			up_write(&led_cdev->trigger_lock);
+
+			up_read(&triggers_list_lock);
+			return;
+		}
+	}
+	up_read(&triggers_list_lock);
+
+	dev_err(led_cdev->dev,
+		"%s() is called, but the private trigger (%s) is never registered\n",
+		__func__, led_cdev->hw_control_trigger);
+}
+EXPORT_SYMBOL_GPL(led_trigger_notify_hw_control_changed);
+#endif /* CONFIG_LEDS_TRIGGERS_HW_CHANGED */
+
 /* Simple LED Trigger Interface */
 
 void led_trigger_event(struct led_trigger *trig,
diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index c11282a74b5a..798122154049 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -9,6 +9,15 @@ menuconfig LEDS_TRIGGERS
 
 if LEDS_TRIGGERS
 
+config LEDS_TRIGGERS_HW_CHANGED
+	bool "LED hardware-initiated trigger transition support"
+	help
+	  This option enables support for hardware initiated hardware control
+	  transitions, where the LED hardware autonomously switches between
+	  "none" (i.e., no trigger) and its private trigger.
+
+	  See Documentation/leds/leds-class.rst for details.
+
 config LEDS_TRIGGER_TIMER
 	tristate "LED Timer Trigger"
 	help
diff --git a/include/linux/leds.h b/include/linux/leds.h
index 7332034a43c8..479391ddf5e5 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -109,6 +109,7 @@ struct led_classdev {
 #define LED_INIT_DEFAULT_TRIGGER BIT(23)
 #define LED_REJECT_NAME_CONFLICT BIT(24)
 #define LED_MULTI_COLOR		BIT(25)
+#define LED_TRIG_HW_CHANGED	BIT(26)
 
 	/* set_brightness_work / blink_timer flags, atomic, private. */
 	unsigned long		work_flags;
@@ -599,6 +600,13 @@ led_trigger_get_brightness(const struct led_trigger *trigger)
 
 #endif /* CONFIG_LEDS_TRIGGERS */
 
+#ifdef CONFIG_LEDS_TRIGGERS_HW_CHANGED
+void led_trigger_notify_hw_control_changed(struct led_classdev *led_cdev, bool activate);
+#else
+static inline void led_trigger_notify_hw_control_changed(struct led_classdev *led_cdev,
+							 bool activate) {}
+#endif
+
 /* Trigger specific enum */
 enum led_trigger_netdev_modes {
 	TRIGGER_NETDEV_LINK = 0,

-- 
2.53.0


^ permalink raw reply related

* [PATCH RFC v2 5/9] leds: Add trigger_may_offload attribute
From: Rong Zhang @ 2026-06-17 16:47 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

There are multiple triggers implementing hardware control. Only "netdev"
provides a custom attribute to determine if it's offloaded to hardware
(i.e., in hardware control). For other triggers, there is no obvious way
for userspace to determine the trigger state programmatically. Moreover,
userspace can't query if an LED device supports hardware control or
identifies these triggers.

Add a new attribute "trigger_may_offload" to the LED core, so that
userspace can determine:

- if the LED device supports hardware control (supported => visible)
- which trigger is the hardware control trigger selected by the LED
  device
- if the trigger is selected ("<foo_trigger>")
- if the trigger is offloaded ("[foo_trigger]")

Note: the documentation describes the attribute as "returning a list"
despite the LED core currently only supports one hardware control
trigger per LED device. This is intentional to make the attribute
extensible in the future without breaking userspace.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 .../ABI/obsolete/sysfs-class-led-trigger-netdev    | 16 ++++++++
 Documentation/ABI/testing/sysfs-class-led          | 22 +++++++++++
 .../ABI/testing/sysfs-class-led-trigger-netdev     | 13 -------
 Documentation/leds/leds-class.rst                  |  8 ++++
 drivers/leds/led-class.c                           | 23 +++++++++++
 drivers/leds/led-triggers.c                        | 45 ++++++++++++++++++++++
 drivers/leds/leds.h                                |  2 +
 drivers/leds/trigger/ledtrig-netdev.c              |  2 +
 8 files changed, 118 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/obsolete/sysfs-class-led-trigger-netdev b/Documentation/ABI/obsolete/sysfs-class-led-trigger-netdev
new file mode 100644
index 000000000000..8d2fbfaf50c3
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-class-led-trigger-netdev
@@ -0,0 +1,16 @@
+What:		/sys/class/leds/<led>/offloaded
+Date:		June 2026
+KernelVersion:	7.3
+Contact:	linux-leds@vger.kernel.org
+Description:
+		Communicate whether the LED trigger modes are offloaded to
+		hardware or whether software fallback is used.
+
+		If 0, the LED is using software fallback to blink.
+
+		If 1, the LED blinking in requested mode is offloaded to
+		hardware.
+
+		/sys/class/leds/<led>/trigger_may_offload provides a generic
+		method to query the offloaded state of supported triggers,
+		superseding this attribute.
diff --git a/Documentation/ABI/testing/sysfs-class-led b/Documentation/ABI/testing/sysfs-class-led
index 0313b82644f2..edd5a9a74dfd 100644
--- a/Documentation/ABI/testing/sysfs-class-led
+++ b/Documentation/ABI/testing/sysfs-class-led
@@ -78,6 +78,28 @@ Description:
 		(which would often be configured in the device tree for the
 		hardware).
 
+What:		/sys/class/leds/<led>/trigger_may_offload
+Date:		June 2026
+KernelVersion:	7.3
+Contact:	linux-leds@vger.kernel.org
+Description:
+		Names and states of triggers that may be offloaded to hardware.
+		Such triggers are also called "hw control trigger" in some
+		context.
+
+		Only exists when the LED supports trigger offload.
+
+		Reading this file returns a list of triggers that are capable to
+		be offloaded. The optional brackets around the trigger name
+		indicate the state of the current trigger:
+
+		- `foo_trigger`: the trigger is not selected.
+		- `<foo_trigger>`: the trigger is selected, but falls back to
+		  software blink for some reason (e.g., incompatible trigger
+		  parameters)
+		- `[foo_trigger]`: the trigger is selected and offloaded to
+		  hardware.
+
 What:		/sys/class/leds/<led>/inverted
 Date:		January 2011
 KernelVersion:	2.6.38
diff --git a/Documentation/ABI/testing/sysfs-class-led-trigger-netdev b/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
index ed46b37ab8a2..396d37a4b820 100644
--- a/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
+++ b/Documentation/ABI/testing/sysfs-class-led-trigger-netdev
@@ -62,19 +62,6 @@ Description:
 		When offloaded is true, the blink interval is controlled by
 		hardware and won't reflect the value set in interval.
 
-What:		/sys/class/leds/<led>/offloaded
-Date:		Jun 2023
-KernelVersion:	6.5
-Contact:	linux-leds@vger.kernel.org
-Description:
-		Communicate whether the LED trigger modes are offloaded to
-		hardware or whether software fallback is used.
-
-		If 0, the LED is using software fallback to blink.
-
-		If 1, the LED blinking in requested mode is offloaded to
-		hardware.
-
 What:		/sys/class/leds/<led>/link_10
 Date:		Jun 2023
 KernelVersion:	6.5
diff --git a/Documentation/leds/leds-class.rst b/Documentation/leds/leds-class.rst
index 84665200a88d..41342ecb5f6b 100644
--- a/Documentation/leds/leds-class.rst
+++ b/Documentation/leds/leds-class.rst
@@ -179,6 +179,9 @@ ops and needs to declare specific support for the supported triggers.
 
 With hw control we refer to the LED driven by hardware.
 
+A sysfs attribute `trigger_may_offload` is provided for userspace to
+query supported triggers and their states.
+
 LED driver must define the following value to support hw control:
 
     - hw_control_trigger:
@@ -240,6 +243,11 @@ LED trigger must implement the following API to support hw control:
                 return a boolean indicating if the trigger is offloaded to
                 hardware.
 
+                If an LED driver specifies a hw control trigger but the
+                latter doesn't implement this callback, a dev_err_once will
+                be emitted and the LED trigger will be assumed to be not
+                offloaded.
+
 LED driver can activate additional modes by default to workaround the
 impossibility of supporting each different mode on the supported trigger.
 Examples are hardcoding the blink speed to a set interval, enable special
diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index 9e14ae588f78..0ac80b93b8b5 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -90,8 +90,31 @@ static const struct bin_attribute *const led_trigger_bin_attrs[] = {
 	&bin_attr_trigger,
 	NULL,
 };
+
+static DEVICE_ATTR(trigger_may_offload, 0444, led_trigger_may_offload_show, NULL);
+static struct attribute *led_trigger_attrs[] = {
+	&dev_attr_trigger_may_offload.attr,
+	NULL,
+};
+
+static umode_t led_trigger_is_visible(struct kobject *kobj,
+				      struct attribute *attr,
+				      int idx)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct led_classdev *led_cdev = dev_get_drvdata(dev);
+
+	if (attr == &dev_attr_trigger_may_offload.attr &&
+	    !led_cdev->hw_control_trigger)
+		return 0;
+
+	return attr->mode;
+}
+
 static const struct attribute_group led_trigger_group = {
 	.bin_attrs = led_trigger_bin_attrs,
+	.attrs = led_trigger_attrs,
+	.is_visible = led_trigger_is_visible,
 };
 #endif
 
diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
index b1223218bda1..c43229d9c4c1 100644
--- a/drivers/leds/led-triggers.c
+++ b/drivers/leds/led-triggers.c
@@ -313,6 +313,51 @@ void led_trigger_set_default(struct led_classdev *led_cdev)
 }
 EXPORT_SYMBOL_GPL(led_trigger_set_default);
 
+/*
+ * Caller must ensure led_cdev->trigger_lock held,
+ * and led_cdev->trigger->name must match led_cdev->hw_control_trigger.
+ */
+static bool led_trigger_get_offloaded(struct led_classdev *led_cdev)
+{
+	if (likely(led_cdev->trigger->offloaded))
+		return led_cdev->trigger->offloaded(led_cdev);
+
+	dev_err_once(led_cdev->dev,
+		     "hw control trigger %s doesn't implement offloaded(), this is a bug\n",
+		     led_cdev->trigger->name);
+	return false;
+}
+
+ssize_t led_trigger_may_offload_show(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct led_classdev *led_cdev = dev_get_drvdata(dev);
+	bool hit, offloaded = false;
+	struct led_trigger *trig;
+	int len;
+
+	mutex_lock(&led_cdev->led_access);
+	down_read(&led_cdev->trigger_lock);
+
+	trig = led_cdev->trigger;
+
+	hit = trig && !strcmp(led_cdev->hw_control_trigger, trig->name);
+	if (hit)
+		offloaded = led_trigger_get_offloaded(led_cdev);
+
+	/* [offloaded] <active_but_not_offloaded> inactive */
+	len = sysfs_emit(buf, "%s%s%s\n",
+			 offloaded ? "[" : (hit ? "<" : ""),
+			 led_cdev->hw_control_trigger,
+			 offloaded ? "]" : (hit ? ">" : ""));
+
+	up_read(&led_cdev->trigger_lock);
+	mutex_unlock(&led_cdev->led_access);
+
+	return len;
+}
+EXPORT_SYMBOL_GPL(led_trigger_may_offload_show);
+
 /* LED Trigger Interface */
 
 int led_trigger_register(struct led_trigger *trig)
diff --git a/drivers/leds/leds.h b/drivers/leds/leds.h
index bee46651e068..9177e098989b 100644
--- a/drivers/leds/leds.h
+++ b/drivers/leds/leds.h
@@ -27,6 +27,8 @@ ssize_t led_trigger_read(struct file *filp, struct kobject *kobj,
 ssize_t led_trigger_write(struct file *filp, struct kobject *kobj,
 			const struct bin_attribute *bin_attr, char *buf,
 			loff_t pos, size_t count);
+ssize_t led_trigger_may_offload_show(struct device *dev,
+				     struct device_attribute *attr, char *buf);
 
 extern struct rw_semaphore leds_list_lock;
 extern struct list_head leds_list;
diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c
index a26109ca4b1c..21f22eea4ab8 100644
--- a/drivers/leds/trigger/ledtrig-netdev.c
+++ b/drivers/leds/trigger/ledtrig-netdev.c
@@ -487,6 +487,8 @@ static ssize_t offloaded_show(struct device *dev,
 {
 	struct led_netdev_data *trigger_data = led_trigger_get_drvdata(dev);
 
+	dev_warn_once(dev, "offloaded attribute has been deprecated, see trigger_may_offload.\n");
+
 	return sprintf(buf, "%d\n", trigger_data->hw_control);
 }
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH RFC v2 4/9] leds: trigger: netdev: Implement offloaded() callback
From: Rong Zhang @ 2026-06-17 16:47 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

"netdev" can run in hardware control according to hardware capabilities
and trigger options. Implement offloaded() callback to provide its
hardware control state to the LED core.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 drivers/leds/trigger/ledtrig-netdev.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c
index 64c078e997f2..a26109ca4b1c 100644
--- a/drivers/leds/trigger/ledtrig-netdev.c
+++ b/drivers/leds/trigger/ledtrig-netdev.c
@@ -754,10 +754,18 @@ static void netdev_trig_deactivate(struct led_classdev *led_cdev)
 	kfree(trigger_data);
 }
 
+static bool netdev_trig_offloaded(struct led_classdev *led_cdev)
+{
+	struct led_netdev_data *trigger_data = led_get_trigger_data(led_cdev);
+
+	return trigger_data->hw_control;
+}
+
 static struct led_trigger netdev_led_trigger = {
 	.name = "netdev",
 	.activate = netdev_trig_activate,
 	.deactivate = netdev_trig_deactivate,
+	.offloaded = netdev_trig_offloaded,
 	.groups = netdev_trig_groups,
 };
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH RFC v2 3/9] leds: turris-omnia: Implement offloaded() callback for trigger
From: Rong Zhang @ 2026-06-17 16:47 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

"omnia-mcu" is a private hardware control trigger which always stays in
hardware control mode. Implement offloaded() callback with its return
value to be always true to reflect this.

Meanwhile, declare it as a hardware control trigger as it's forgotten
before.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 drivers/leds/leds-turris-omnia.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/leds/leds-turris-omnia.c b/drivers/leds/leds-turris-omnia.c
index 25ee5c1eb820..8e016ca86403 100644
--- a/drivers/leds/leds-turris-omnia.c
+++ b/drivers/leds/leds-turris-omnia.c
@@ -195,10 +195,16 @@ static void omnia_hwtrig_deactivate(struct led_classdev *cdev)
 			err);
 }
 
+static bool omnia_hwtrig_offloaded(struct led_classdev *cdev)
+{
+	return true;
+}
+
 static struct led_trigger omnia_hw_trigger = {
 	.name		= "omnia-mcu",
 	.activate	= omnia_hwtrig_activate,
 	.deactivate	= omnia_hwtrig_deactivate,
+	.offloaded	= omnia_hwtrig_offloaded,
 	.trigger_type	= &omnia_hw_trigger_type,
 };
 
@@ -251,6 +257,7 @@ static int omnia_led_register(struct i2c_client *client, struct omnia_led *led,
 	 * by LED class from the linux,default-trigger property.
 	 */
 	cdev->default_trigger = omnia_hw_trigger.name;
+	cdev->hw_control_trigger = omnia_hw_trigger.name;
 
 	/* Put the LED into software mode */
 	ret = omnia_cmd_write_u8(client, OMNIA_CMD_LED_MODE, OMNIA_CMD_LED_MODE_LED(led->reg) |

-- 
2.53.0


^ permalink raw reply related

* [PATCH RFC v2 2/9] leds: cros_ec: Implement offloaded() callback for trigger
From: Rong Zhang @ 2026-06-17 16:47 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

"chromeos-auto" is a private hardware control trigger which always stays
in hardware control. Implement offloaded() callback with its return
value to be always true to reflect this.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 drivers/leds/leds-cros_ec.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/leds/leds-cros_ec.c b/drivers/leds/leds-cros_ec.c
index bea3cc3fbfd2..f48e3cf6ccf6 100644
--- a/drivers/leds/leds-cros_ec.c
+++ b/drivers/leds/leds-cros_ec.c
@@ -86,12 +86,18 @@ static int cros_ec_led_trigger_activate(struct led_classdev *led_cdev)
 	return cros_ec_led_send_cmd(priv->cros_ec, &arg);
 }
 
+static bool cros_ec_led_trigger_offloaded(struct led_classdev *led_cdev)
+{
+	return true;
+}
+
 static struct led_hw_trigger_type cros_ec_led_trigger_type;
 
 static struct led_trigger cros_ec_led_trigger = {
 	.name = "chromeos-auto",
 	.trigger_type = &cros_ec_led_trigger_type,
 	.activate = cros_ec_led_trigger_activate,
+	.offloaded = cros_ec_led_trigger_offloaded,
 };
 
 static int cros_ec_led_brightness_set_blocking(struct led_classdev *led_cdev,

-- 
2.53.0


^ permalink raw reply related

* [PATCH RFC v2 1/9] leds: Add callback offloaded() to query the state of hardware control trigger
From: Rong Zhang @ 2026-06-17 16:47 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang
In-Reply-To: <20260618-leds-trigger-hw-changed-v2-0-c28c44053cf3@rong.moe>

There are multiple triggers implementing hardware control. However, the
LED core doesn't really know the hardware control state since the
coordination is done directly between the trigger and the LED device.

Add an offloaded() callback so that the LED core can query the hardware
control state.

Signed-off-by: Rong Zhang <i@rong.moe>
---
 Documentation/leds/leds-class.rst | 5 +++++
 include/linux/leds.h              | 1 +
 2 files changed, 6 insertions(+)

diff --git a/Documentation/leds/leds-class.rst b/Documentation/leds/leds-class.rst
index 5db620ed27aa..84665200a88d 100644
--- a/Documentation/leds/leds-class.rst
+++ b/Documentation/leds/leds-class.rst
@@ -235,6 +235,11 @@ LED driver must implement the following API to support hw control:
                 Returns a pointer to a struct device or NULL if nothing
                 is currently attached.
 
+LED trigger must implement the following API to support hw control:
+    - offloaded:
+                return a boolean indicating if the trigger is offloaded to
+                hardware.
+
 LED driver can activate additional modes by default to workaround the
 impossibility of supporting each different mode on the supported trigger.
 Examples are hardcoding the blink speed to a set interval, enable special
diff --git a/include/linux/leds.h b/include/linux/leds.h
index b16b803cc1ac..7332034a43c8 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -485,6 +485,7 @@ struct led_trigger {
 	const char	 *name;
 	int		(*activate)(struct led_classdev *led_cdev);
 	void		(*deactivate)(struct led_classdev *led_cdev);
+	bool		(*offloaded)(struct led_classdev *led_cdev);
 
 	/* Brightness set by led_trigger_event */
 	enum led_brightness brightness;

-- 
2.53.0


^ permalink raw reply related

* [PATCH RFC v2 0/9] leds: Add support for hardware-initiated hardware control trigger transition
From: Rong Zhang @ 2026-06-17 16:47 UTC (permalink / raw)
  To: Lee Jones, Pavel Machek, Jonathan Corbet, Shuah Khan,
	Thomas Weißschuh, Benson Leung, Guenter Roeck,
	Marek Behún, Mark Pearson, Derek J. Clark, Hans de Goede,
	Ilpo Järvinen, Ike Panhc
  Cc: Andrew Lunn, Jakub Kicinski, Vishnu Sankar, Vishnu Sankar,
	linux-leds, netdev, linux-doc, linux-kernel, chrome-platform,
	platform-driver-x86, Rong Zhang

Some laptops can tune their keyboard backlight according to ambient
light sensors (auto mode). This capability is essentially a hardware
control trigger. Meanwhile, such laptops also offer a shrotcut for
cycling through brightness levels and auto mode. For example, on
ThinkBook, pressing Fn+Space cycles keyboard backlight levels in the
following sequence:

  1 => 2 => 0 => auto => 1 ...

Recent ThinkPad models should have similar sequence too.

However, there are some issues preventing us from using a private
hardware control trigger:

1. We want a mechanism to tell userspace which trigger is the hardware
   control one, so that userspace can determine if auto mode is on/off,
   as well as turing it on/off programmatically without obtaining the
   trigger's name via other channels
2. Turing on/off auto mode via the shortcut cannot activate/deactivate
   the corresponding hardware control trigger, making the software state
   out of sync
3. Even with #1 resolved, deactivating the hardware control trigger has
   a side effect of emitting LED_OFF, breaking the shortcut cycle, where
   "auto => 1" requires the driver to deactivate the trigger

This RFC series tries to demonstrate a path on solving these issues:

- Introduce an attribute "trigger_may_offload", so that userspace can
  determine:
  - if the LED device supports hardware control (supported => visible)
  - which trigger is the hardware control trigger selected by the LED
    device
  - if the trigger is selected ("<foo_trigger>")
  - if the trigger is offloaded ("[foo_trigger]")
    - A callback offloaded() is added so that LED triggers can report
      their hardware control state
- Add led_trigger_notify_hw_control_changed() interface, so that LED
  drivers can notify the LED core about hardware-initiated hardware
  control transitions. The LED core will then determine if the
  transition is allowed and switching between "none" (i.e., no trigger)
  and the device's private trigger accordingly
  - This capability is restricted to the device's private trigger. If
    the current trigger is neither the private trigger nor "none", no
    transition will be made
  - This interface is gated behind Kconfig LEDS_TRIGGERS_HW_CHANGED and
    LED device flag LED_TRIG_HW_CHANGED
- Tune the logic of trigger deactivation so that it won't emit LED_OFF
  when the deactivation is triggered by hardware

The last three patches are included in the RFC series to demonstrate how
to these interfaces are supposed to be utilized, so that ideapad-laptop
can expose the auto mode of ThinkBook's keyboard backlight. They can be
submitted separately once the dust settles, if preferred.

[ Summary of other approaches ]

< custom attribute >

Pros:
- simplicity, KISS
- no need to touch the LED core
- extensible as long as it has a sensor-neutral name
  - a sensor-related name could potentially lead to a mess if a future
    device implements auto mode based on multiple different sensors

Cons:
- must have zero influence on brightness_set[_blocking] callbacks
  in order not to break triggers
  - potential interference with triggers and the brightness attribute
- weird semantic (an attribute other than "brightness" and "trigger"
  changes the brightness)

< private hardware control trigger (this series) >

Pros:
- mutually exclusive with other triggers (hence less chaos)
- semantic correctness
- acts as an aggregate switch to turn on/off auto mode even a future
  device implements auto mode based on multiple different sensors
  - extensibility (through trigger attributes)

Cons:
- complexity

[ Previous discussion threads ]

https://lore.kernel.org/r/08580ec5-1d7b-4612-8a3f-75bc2f40aad2@app.fastmail.com
https://lore.kernel.org/r/1dbfcf656cdb4af0299f90d7426d2ec7e2b8ac9e.camel@rong.moe

Signed-off-by: Rong Zhang <i@rong.moe>
---
Changes in v2:
- Restrict the led_trigger_notify_hw_control_changed() interface to
  private triggers only
  - Drop PATCH v1 1/9 ("leds: Load trigger modules on-demand if used as
    hw control trigger"), not relavant any more
- Gate the led_trigger_notify_hw_control_changed() interface behind
  Kconfig LEDS_TRIGGERS_HW_CHANGED and LED device flag
  LED_TRIG_HW_CHANGED
- Fix lock ordering inversion
- ideapad-laptop:
  - Only call led_trigger_notify_hw_control_changed() when needed
  - Serialize keyboard backlight notifications
- Reword commit messages and documentations
- Link to v1: https://patch.msgid.link/20260227190617.271388-1-i@rong.moe

---
Rong Zhang (9):
      leds: Add callback offloaded() to query the state of hardware control trigger
      leds: cros_ec: Implement offloaded() callback for trigger
      leds: turris-omnia: Implement offloaded() callback for trigger
      leds: trigger: netdev: Implement offloaded() callback
      leds: Add trigger_may_offload attribute
      leds: trigger: Add led_trigger_notify_hw_control_changed() interface
      platform/x86: ideapad-laptop: Decouple hardware & classdev brightness for keyboard backlight
      platform/x86: ideapad-laptop: Serialize keyboard backlight notifications
      platform/x86: ideapad-laptop: Fully support auto keyboard backlight

 .../ABI/obsolete/sysfs-class-led-trigger-netdev    |  16 ++
 Documentation/ABI/testing/sysfs-class-led          |  22 +++
 .../ABI/testing/sysfs-class-led-trigger-netdev     |  13 --
 Documentation/leds/leds-class.rst                  |  74 +++++++
 drivers/leds/led-class.c                           |  23 +++
 drivers/leds/led-triggers.c                        | 131 +++++++++++-
 drivers/leds/leds-cros_ec.c                        |   6 +
 drivers/leds/leds-turris-omnia.c                   |   7 +
 drivers/leds/leds.h                                |   2 +
 drivers/leds/trigger/Kconfig                       |   9 +
 drivers/leds/trigger/ledtrig-netdev.c              |  10 +
 drivers/platform/x86/lenovo/Kconfig                |   1 +
 drivers/platform/x86/lenovo/ideapad-laptop.c       | 219 ++++++++++++++++-----
 include/linux/leds.h                               |   9 +
 14 files changed, 481 insertions(+), 61 deletions(-)
---
base-commit: 66affa37cfac0aec061cc4bcf4a065b0c52f7e19
change-id: 20260506-leds-trigger-hw-changed-96a62188cbdf

Thanks,
Rong


^ permalink raw reply

* Re: [PATCH v4 1/3] dt-bindings: net: add Realtek RTL8125 PCIe Ethernet
From: Heiner Kallweit @ 2026-06-17 16:43 UTC (permalink / raw)
  To: ricardo, nic_swsd, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Heiko Stuebner
  Cc: Sebastian Reichel, netdev, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip
In-Reply-To: <20260617-rk3588-dts-rtl-eth-describe-dt-alias-v4-1-2bd38922d129@pardini.net>

On 17.06.2026 14:58, Ricardo Pardini via B4 Relay wrote:
> From: Ricardo Pardini <ricardo@pardini.net>
> 
> Add a binding for fixed/soldered Realtek RTL8125 PCIe Ethernet
> controller.
> 
> The "pciVVVV,DDDD" compatibles are the Open Firmware PCI Bus Binding
> spelling, auto-derived from PCI-SIG vendor/device IDs, but they still
> need a binding when used in a board DT - analogous to "usbVVVV,PPPP"
> compatibles documented in their own bindings (e.g. microchip,lan95xx)
> so board DTs attaching properties (fixed MAC, nvmem cell, ...) to
> these PCI function nodes can be validated.
> 
> Suggested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> Signed-off-by: Ricardo Pardini <ricardo@pardini.net>
> ---
>  .../devicetree/bindings/net/realtek,rtl8125.yaml   | 43 ++++++++++++++++++++++
>  MAINTAINERS                                        |  1 +
>  2 files changed, 44 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/realtek,rtl8125.yaml b/Documentation/devicetree/bindings/net/realtek,rtl8125.yaml
> new file mode 100644
> index 0000000000000..eee13fbc1e6a6
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/realtek,rtl8125.yaml
> @@ -0,0 +1,43 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/realtek,rtl8125.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Realtek RTL8125 2.5 Gigabit PCIe Ethernet Controller
> +
> +maintainers:
> +  - Heiner Kallweit <hkallweit1@gmail.com>
> +
> +description:
> +  The Realtek RTL8125 is a 2.5GBASE-T Ethernet controller with a PCIe host
> +  interface.
> +
> +allOf:
> +  - $ref: ethernet-controller.yaml#
> +
> +properties:
> +  compatible:
> +    const: pci10ec,8125

IIRC we came to the conclusion that the compatible string isn't used in the
relevant code path. Then why add it here? Is there an alignment on this?
If it should be added here, then an explaining comment would be helpful.

> +
> +  reg:
> +    maxItems: 1
> +
> +required:
> +  - compatible
> +  - reg
> +
> +unevaluatedProperties: false
> +
> +examples:
> +  - |
> +    pcie {
> +        #address-cells = <3>;
> +        #size-cells = <2>;
> +
> +        ethernet@0,0 {
> +            compatible = "pci10ec,8125";
> +            reg = <0x10000 0 0 0 0>;
> +            local-mac-address = [00 00 00 00 00 00];
> +        };
> +    };
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c8d4b913f26c1..e5fbd82946aec 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -134,6 +134,7 @@ M:	Heiner Kallweit <hkallweit1@gmail.com>
>  M:	nic_swsd@realtek.com
>  L:	netdev@vger.kernel.org
>  S:	Maintained
> +F:	Documentation/devicetree/bindings/net/realtek,rtl8125.yaml
>  F:	drivers/net/ethernet/realtek/r8169*
>  
>  8250/16?50 (AND CLONE UARTS) SERIAL DRIVER
> 


^ permalink raw reply

* Re: [PATCH net] net/mlx5e: Use sender devcom for MPV master-up
From: manjunath.b.patil @ 2026-06-17 16:28 UTC (permalink / raw)
  To: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky, netdev
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Patrisious Haddad, linux-rdma, linux-kernel, stable
In-Reply-To: <20260610173915.4053423-1-manjunath.b.patil@oracle.com>


On 6/10/26 10:39 AM, Manjunath Patil wrote:
> After PCIe DPC recovery, mlx5 reloads the affected functions and
> replays multiport affiliation events. In the reported failure, the
> first relevant device error was:
> 
>    pcieport 0000:10:01.1: DPC: containment event
>    pcieport 0000:10:01.1: PCIe Bus Error: severity=Uncorrected (Fatal)
>    pcieport 0000:10:01.1:    [ 5] SDES                   (First)
> 
> mlx5 recovered the PCI functions and resumed 0000:11:00.1. During
> that resume, RDMA multiport binding replayed
> MLX5_DRIVER_EVENT_AFFILIATION_DONE and mlx5e sent
> MPV_DEVCOM_MASTER_UP. The host then panicked with:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000010
>    RIP: mlx5_devcom_comp_set_ready+0x5/0x40 [mlx5_core]
>    RDI: 0000000000000000
> 
> Call trace included:
> 
>    mlx5_devcom_comp_set_ready
>    mlx5e_devcom_event_mpv
>    mlx5_devcom_send_event
>    mlx5_ib_bind_slave_port
>    mlx5r_mp_probe
>    mlx5_pci_resume
> 
> MPV devcom registration publishes mlx5e private data to the component
> peer list before mlx5e_devcom_init_mpv() stores the returned component
> device in priv->devcom. A concurrent master-up event can therefore
> reach a peer whose private data is visible but whose priv->devcom
> backpointer is still NULL.
> 
> MPV_DEVCOM_MASTER_UP already carries the sender/master mlx5e private
> data as event_data. The ready bit is stored on the shared devcom
> component, not on an individual peer. Use the sender devcom when
> marking the MPV component ready.
> 
> This preserves the readiness transition while avoiding a NULL
> dereference of the peer devcom pointer during affiliation replay after
> PCI error recovery.
> 
> Fixes: bf11485f8419 ("net/mlx5: Register mlx5e priv to devcom in MPV mode")
> Assisted-by: Codex:gpt-5
> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
> Cc: stable@vger.kernel.org # 6.7+
> ---
Ping!

-Manjunath
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 8f2b3abe0092..f7ff20b97e8c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -211,11 +211,14 @@ static void mlx5e_disable_async_events(struct mlx5e_priv *priv)
>   
>   static int mlx5e_devcom_event_mpv(int event, void *my_data, void *event_data)
>   {
> -	struct mlx5e_priv *slave_priv = my_data;
> +	struct mlx5e_priv *master_priv = event_data;
>   
>   	switch (event) {
>   	case MPV_DEVCOM_MASTER_UP:
> -		mlx5_devcom_comp_set_ready(slave_priv->devcom, true);
> +		if (!master_priv || !master_priv->devcom)
> +			return -EINVAL;
> +
> +		mlx5_devcom_comp_set_ready(master_priv->devcom, true);
>   		break;
>   	case MPV_DEVCOM_MASTER_DOWN:
>   		/* no need for comp set ready false since we unregister after


^ permalink raw reply

* Re: [Bug] incompatibility between 'e1000e' and Aruba AOS-CX switches (too small inter-packet gap)
From: Philippe Andersson @ 2026-06-17 16:18 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Ludovic Calmant, Fabian Noël
In-Reply-To: <ee384978-4eca-4a49-b1e7-55be7698970f@lunn.ch>


[-- Attachment #1.1.1: Type: text/plain, Size: 1362 bytes --]

On 17/06/2026 18:07, Andrew Lunn wrote:
>>> Can you run a git bisect from the last
>>> known good kernel version to the first known bad version?
>> Not really, as there is no "known good kernel". All kernels are good ones,
>> as long as older ProCurve switches are used (e.g. HPE/Aruba ProCurve 5406R
>> or HPE/Aruba ProCurve 2930M-48G-PoE+). The problem only shows when AOS-CX
>> switches are used, and we only started deploying those in production a
>> couple of months ago.
> 
> I was hoping you could narrow it down to one patch which caused the
> issue. But if it never worked...
Sorry to disappoint ;-)

> Then i really think you need to be talking to both vendors and try to
> get them to work together to identify the problem.
This is precisely why I contacted this mailing list. My understanding 
was that this was the proper way to report a potential bug in the 
'e1000e' driver now that it has been incorporated in the Linux kernel 
tree, as per the maintainers listed in the code. But if you know 
otherwise, please tell me.

> Maybe send a NIC to
> the switch vendor, so they have all the hardware, etc.
This is already ongoing.

Ph. A.

-- 

*Philippe Andersson*
Unix System Administrator
IBA Particle Therapy |
Tel: +32-10-475.983
Fax: +32-10-487.707
eMail: pan@iba-group.com
<http://www.iba-worldwide.com>



[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3165 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

^ permalink raw reply

* Re: [Bug] incompatibility between 'e1000e' and Aruba AOS-CX switches (too small inter-packet gap)
From: Andrew Lunn @ 2026-06-17 16:07 UTC (permalink / raw)
  To: Philippe Andersson; +Cc: netdev, Ludovic Calmant, Fabian Noël
In-Reply-To: <3b073409-9096-4467-9baf-11196a70c014@iba-group.com>

> > Can you run a git bisect from the last
> > known good kernel version to the first known bad version?
> Not really, as there is no "known good kernel". All kernels are good ones,
> as long as older ProCurve switches are used (e.g. HPE/Aruba ProCurve 5406R
> or HPE/Aruba ProCurve 2930M-48G-PoE+). The problem only shows when AOS-CX
> switches are used, and we only started deploying those in production a
> couple of months ago.

I was hoping you could narrow it down to one patch which caused the
issue. But if it never worked...

Then i really think you need to be talking to both vendors and try to
get them to work together to identify the problem. Maybe send a NIC to
the switch vendor, so they have all the hardware, etc.

    Andrew

^ permalink raw reply

* Re: [Bug] incompatibility between 'e1000e' and Aruba AOS-CX switches (too small inter-packet gap)
From: Philippe Andersson @ 2026-06-17 15:41 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Ludovic Calmant, Fabian Noël
In-Reply-To: <bdf0a4b8-e414-4df7-833c-3051987c3f2b@lunn.ch>


[-- Attachment #1.1.1: Type: text/plain, Size: 3063 bytes --]

On 16/06/2026 21:34, Andrew Lunn wrote:
>> A support ticket has already been opened with Aruba, but it's unclear at
>> this stage that the problem is on their side.
> 
> How easy is it to reproduce?
Provided you have the required hardware (PC with NIC using the 'e1000e' 
driver and Aruba AOS-CX switch such as HPE/Aruba CX 5420 or HPE/Aruba CX 
6200M -- perhaps others, we only tested with those ones), reproducing 
the issue is easy: running 'iperf3' for a few minutes (the PC with 
'e1000e' plays the role of iperf server). You will get a cluster of 
retransmits at the start of the test, and you may get further clusters 
at random intervals.

Here is an 'iperf3' output that shows the problem after only 10 secs.

-------------------------<cut>----------------------------
Connecting to host 10.1.1.21, port 5201
[  5] local 10.1.1.61 port 55096 connected to 10.1.1.21 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   110 MBytes   923 Mbits/sec   70   1.49 MBytes 

[  5]   1.00-2.00   sec   106 MBytes   891 Mbits/sec    0   1.63 MBytes 

[  5]   2.00-3.00   sec   106 MBytes   891 Mbits/sec    0   1.74 MBytes 

[  5]   3.00-4.00   sec   108 MBytes   902 Mbits/sec    0   1.83 MBytes 

[  5]   4.00-5.00   sec   106 MBytes   891 Mbits/sec    3   1.32 MBytes 

[  5]   5.00-6.00   sec   106 MBytes   891 Mbits/sec    0   1.42 MBytes 

[  5]   6.00-7.00   sec   108 MBytes   902 Mbits/sec    0   1.49 MBytes 

[  5]   7.00-8.00   sec   106 MBytes   891 Mbits/sec    0   1.54 MBytes 

[  5]   8.00-9.00   sec   106 MBytes   891 Mbits/sec    0   1.57 MBytes 

[  5]   9.00-10.00  sec   108 MBytes   902 Mbits/sec    0   1.59 MBytes 

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.04 GBytes   898 Mbits/sec   73             sender
[  5]   0.00-10.02  sec  1.04 GBytes   894 Mbits/sec 
receiver
-------------------------<cut>----------------------------

But you need to do this in parallel with "background network traffic" on 
the PC (such as NFS, for instance). We've not been able yet to 
characterise the minimum amount of traffic necessary for the problem to 
manifest itself but we'll try to do so.

> Can you run a git bisect from the last
> known good kernel version to the first known bad version?
Not really, as there is no "known good kernel". All kernels are good 
ones, as long as older ProCurve switches are used (e.g. HPE/Aruba 
ProCurve 5406R or HPE/Aruba ProCurve 2930M-48G-PoE+). The problem only 
shows when AOS-CX switches are used, and we only started deploying those 
in production a couple of months ago.

What I can tell you is that the problem is still present in 
6.12.90+deb13.1-amd64. This is the most recent kernel we tested.

HTH

Ph. A.

-- 

*Philippe Andersson*
Unix System Administrator
IBA Particle Therapy |
Tel: +32-10-475.983
Fax: +32-10-487.707
eMail: pan@iba-group.com
<http://www.iba-worldwide.com>



[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3165 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 203 bytes --]

^ permalink raw reply

* [net PATCH v3] octeontx2-af: Validate NIX maximum LFs correctly
From: Subbaraya Sundeep @ 2026-06-17 15:40 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep

NIX maximum number of LFs can be set via devlink command
but that can be done before assigning any LFs to a PF/VF.
The condition used to check whether any LFs are assigned is
incorrect. This patch fixes that condition.

Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
v3 changes:
	None, updated changelog
v2 changes:
	Fixed AI review by updating error message
	Updated comment to mention modifying NIXLFs has to be done prior
	to attaching NIXLFs to any PFs/VFs.

 .../marvell/octeontx2/af/rvu_devlink.c        | 27 +++++++++++++------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 6494a9ee2f0d..3b47ecb44d51 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -1510,7 +1510,9 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
 	struct rvu_devlink *rvu_dl = devlink_priv(devlink);
 	struct rvu *rvu = rvu_dl->rvu;
 	u16 max_nix0_lf, max_nix1_lf;
-	struct npc_mcam *mcam;
+	struct rvu_block *block;
+	int blkaddr = 0;
+	int free_lfs;
 	u64 cfg;
 
 	cfg = rvu_read64(rvu, BLKADDR_NIX0, NIX_AF_CONST2);
@@ -1518,14 +1520,23 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
 	cfg = rvu_read64(rvu, BLKADDR_NIX1, NIX_AF_CONST2);
 	max_nix1_lf = cfg & 0xFFF;
 
-	/* Do not allow user to modify maximum NIX LFs while mcam entries
-	 * have already been assigned.
+	/* Do not allow user to modify maximum NIX LFs while NIX LFs
+	 * have already been assigned. Note that modifying NIX LFs count
+	 * can be done only before any LF attach requests from PFs and VFs
+	 * and not later or concurrently.
 	 */
-	mcam = &rvu->hw->mcam;
-	if (mcam->bmap_fcnt < mcam->bmap_entries) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "mcam entries have already been assigned, can't resize");
-		return -EPERM;
+	blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
+	while (blkaddr) {
+		block = &rvu->hw->block[blkaddr];
+
+		free_lfs = rvu_rsrc_free_count(&block->lf);
+		if (free_lfs != block->lf.max) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "NIX LFs already assigned, can't resize");
+			return -EPERM;
+		}
+
+		blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
 	}
 
 	if (max_nix0_lf && val.vu16 > max_nix0_lf) {
-- 
2.48.1


^ permalink raw reply related

* Re: [PATCH RESEND 1/6] sock: add sock_kzalloc helper
From: Thorsten Blum @ 2026-06-17 15:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Herbert Xu, David S. Miller, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, Simon Horman, linux-crypto,
	linux-kernel, netdev
In-Reply-To: <20260615091555.4af017aa@kernel.org>

On Mon, Jun 15, 2026 at 09:15:55AM -0700, Jakub Kicinski wrote:
> On Sun, 14 Jun 2026 17:32:12 +0200 Thorsten Blum wrote:
> > Gentle ping? Patch 1/6 still needs an ack from netdev maintainers.
> 
> Perhaps other maintainers shared my feeling that this is a waste of
> time.

Could you elaborate on why sock_kzfree_s() is okay, but sock_kzalloc()
is not? Both are small, socket-specific zeroing helpers.

sock_kzalloc() has the same number of call sites as sock_kzfree_s(), and
it could also be used in net/ipv6/exthdrs.c in ipv6_renew_options().

Thanks,
Thorsten

^ permalink raw reply

* Re: [net PATCH v2] octeontx2-af: Validate NIX maximum LFs correctly
From: Subbaraya Sundeep @ 2026-06-17 15:31 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel
In-Reply-To: <1781709750-23218-1-git-send-email-sbhatta@marvell.com>

Missed the changelog. Will resend.

Thanks,
Sundeep

pw-bot: changes-requested

On 2026-06-17 at 20:52:30, Subbaraya Sundeep (sbhatta@marvell.com) wrote:
> NIX maximum number of LFs can be set via devlink command
> but that can be done before assigning any LFs to a PF/VF.
> The condition used to check whether any LFs are assigned is
> incorrect. This patch fixes that condition.
> 
> Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
> ---
>  .../marvell/octeontx2/af/rvu_devlink.c        | 27 +++++++++++++------
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> index 6494a9ee2f0d..3b47ecb44d51 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
> @@ -1510,7 +1510,9 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
>  	struct rvu_devlink *rvu_dl = devlink_priv(devlink);
>  	struct rvu *rvu = rvu_dl->rvu;
>  	u16 max_nix0_lf, max_nix1_lf;
> -	struct npc_mcam *mcam;
> +	struct rvu_block *block;
> +	int blkaddr = 0;
> +	int free_lfs;
>  	u64 cfg;
>  
>  	cfg = rvu_read64(rvu, BLKADDR_NIX0, NIX_AF_CONST2);
> @@ -1518,14 +1520,23 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
>  	cfg = rvu_read64(rvu, BLKADDR_NIX1, NIX_AF_CONST2);
>  	max_nix1_lf = cfg & 0xFFF;
>  
> -	/* Do not allow user to modify maximum NIX LFs while mcam entries
> -	 * have already been assigned.
> +	/* Do not allow user to modify maximum NIX LFs while NIX LFs
> +	 * have already been assigned. Note that modifying NIX LFs count
> +	 * can be done only before any LF attach requests from PFs and VFs
> +	 * and not later or concurrently.
>  	 */
> -	mcam = &rvu->hw->mcam;
> -	if (mcam->bmap_fcnt < mcam->bmap_entries) {
> -		NL_SET_ERR_MSG_MOD(extack,
> -				   "mcam entries have already been assigned, can't resize");
> -		return -EPERM;
> +	blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
> +	while (blkaddr) {
> +		block = &rvu->hw->block[blkaddr];
> +
> +		free_lfs = rvu_rsrc_free_count(&block->lf);
> +		if (free_lfs != block->lf.max) {
> +			NL_SET_ERR_MSG_MOD(extack,
> +					   "NIX LFs already assigned, can't resize");
> +			return -EPERM;
> +		}
> +
> +		blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
>  	}
>  
>  	if (max_nix0_lf && val.vu16 > max_nix0_lf) {
> -- 
> 2.48.1
> 

^ permalink raw reply

* [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready
From: Runyu Xiao @ 2026-06-17 15:28 UTC (permalink / raw)
  To: D. Wythe, Dust Li, Sidraya Jayagond, Wenjia Zhang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Mahanta Jambigi, Tony Lu, Wen Gu, Simon Horman, Karsten Graul,
	linux-rdma, linux-s390, netdev, linux-kernel, jianhao.xu,
	runyu.xiao, stable

smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
listen socket's sk_data_ready callback.  smc_clcsock_data_ready() then
immediately takes sk_callback_lock before looking up the SMC listener and
queuing smc_tcp_listen_work().

That is unsafe once the TCP listen socket is leaving TCP_LISTEN.  The TCP
close/flush path can run the installed sk_data_ready callback with
sk_callback_lock already held, so entering smc_clcsock_data_ready() again
tries to take the same rwlock recursively in the same thread.  The nvmet
TCP listener had to make the same state check before taking
sk_callback_lock for this reason.

This issue was found by our static analysis tool and then manually
reviewed against the current tree.

The grounded PoC kept the SMC listen callback installation path:

  smc_listen()
  smc_clcsock_replace_cb()
  sk_data_ready = smc_clcsock_data_ready()

It then modeled the close/flush carrier that invokes the installed
sk_data_ready callback while sk_callback_lock is already held.  Lockdep
reported the same-thread recursive acquisition:

  WARNING: possible recursive locking detected
  smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
  smc_close_flush_work+0x1f/0x30 [vuln_msv]
  *** DEADLOCK ***

Return before taking sk_callback_lock when the underlying TCP socket is no
longer in TCP_LISTEN.  In that state there is no listen accept work to
queue for SMC, and avoiding the callback lock mirrors the fix used by the
TCP nvmet listener.

Fixes: 0558226cebee ("net/smc: Fix slab-out-of-bounds issue in fallback")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
 net/smc/af_smc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 6421c2e1c84d..1af4e3c333ff 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -2631,6 +2631,9 @@ static void smc_clcsock_data_ready(struct sock *listen_clcsock)
 {
 	struct smc_sock *lsmc;
 
+	if (READ_ONCE(listen_clcsock->sk_state) != TCP_LISTEN)
+		return;
+
 	read_lock_bh(&listen_clcsock->sk_callback_lock);
 	lsmc = smc_clcsock_user_data(listen_clcsock);
 	if (!lsmc)
-- 
2.34.1


^ permalink raw reply related

* [net PATCH v2] octeontx2-af: Validate NIX maximum LFs correctly
From: Subbaraya Sundeep @ 2026-06-17 15:22 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2, rkannoth
  Cc: netdev, linux-kernel, Subbaraya Sundeep

NIX maximum number of LFs can be set via devlink command
but that can be done before assigning any LFs to a PF/VF.
The condition used to check whether any LFs are assigned is
incorrect. This patch fixes that condition.

Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
 .../marvell/octeontx2/af/rvu_devlink.c        | 27 +++++++++++++------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 6494a9ee2f0d..3b47ecb44d51 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -1510,7 +1510,9 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
 	struct rvu_devlink *rvu_dl = devlink_priv(devlink);
 	struct rvu *rvu = rvu_dl->rvu;
 	u16 max_nix0_lf, max_nix1_lf;
-	struct npc_mcam *mcam;
+	struct rvu_block *block;
+	int blkaddr = 0;
+	int free_lfs;
 	u64 cfg;
 
 	cfg = rvu_read64(rvu, BLKADDR_NIX0, NIX_AF_CONST2);
@@ -1518,14 +1520,23 @@ static int rvu_af_dl_nix_maxlf_validate(struct devlink *devlink, u32 id,
 	cfg = rvu_read64(rvu, BLKADDR_NIX1, NIX_AF_CONST2);
 	max_nix1_lf = cfg & 0xFFF;
 
-	/* Do not allow user to modify maximum NIX LFs while mcam entries
-	 * have already been assigned.
+	/* Do not allow user to modify maximum NIX LFs while NIX LFs
+	 * have already been assigned. Note that modifying NIX LFs count
+	 * can be done only before any LF attach requests from PFs and VFs
+	 * and not later or concurrently.
 	 */
-	mcam = &rvu->hw->mcam;
-	if (mcam->bmap_fcnt < mcam->bmap_entries) {
-		NL_SET_ERR_MSG_MOD(extack,
-				   "mcam entries have already been assigned, can't resize");
-		return -EPERM;
+	blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
+	while (blkaddr) {
+		block = &rvu->hw->block[blkaddr];
+
+		free_lfs = rvu_rsrc_free_count(&block->lf);
+		if (free_lfs != block->lf.max) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "NIX LFs already assigned, can't resize");
+			return -EPERM;
+		}
+
+		blkaddr = rvu_get_next_nix_blkaddr(rvu, blkaddr);
 	}
 
 	if (max_nix0_lf && val.vu16 > max_nix0_lf) {
-- 
2.48.1


^ permalink raw reply related

* Re: [PATCH net] dpaa2-switch: fix VLAN upper check not rejecting bridge join
From: Ioana Ciornei @ 2026-06-17 15:01 UTC (permalink / raw)
  To: Simon Horman
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev, f.fainelli,
	vladimir.oltean, linux-kernel
In-Reply-To: <20260617114518.857486-1-horms@kernel.org>

On Wed, Jun 17, 2026 at 12:45:18PM +0100, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> dpaa2-switch: fix VLAN upper check not rejecting bridge join
> 
> The blamed commit refactored the prechangeupper event handling but
> failed to actually return an error in case
> dpaa2_switch_prevent_bridging_with_8021q_upper() detected a 802.1q upper
> on a port which tries to join a bridge. Fix this by returning err
> instead of 0.
> 
> > diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> > index 52c1cb9cb7e02..46ae81c2fa013 100644
> > --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> > +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> > @@ -2177,7 +2177,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
> >  	if (err) {
> >  		NL_SET_ERR_MSG_MOD(extack,
> >  				   "Cannot join a bridge while VLAN uppers are present");
> > -		return 0;
> > +		return err;
> 
> [Severity: High]
> If a user adds an 802.1q VLAN upper device to a port after it has already
> successfully joined a bridge, won't this check trigger when the port leaves
> the bridge?
> 
> In drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c,
> dpaa2_switch_prechangeupper_sanity_checks() is called for both bridge joins
> and unlinks from dpaa2_switch_port_prechangeupper():
> 
>     err = dpaa2_switch_prechangeupper_sanity_checks(netdev, upper_dev, extack);
>     if (err)
>         return err;
> 
>     if (!info->linking)
>         dpaa2_switch_port_pre_bridge_leave(netdev);
> 
> Since returning an error aborts the notifier chain, an unlink notification
> would fail this check, preventing dpaa2_switch_port_pre_bridge_leave() from
> running. Does this lead to state inconsistency and skipped switchdev
> unoffloads?
> 
> Could this check be guarded so it only runs when info->linking is true?

Valid point but the fix should be to prevent adding VLAN uppers on
bridged dpaa2-switch ports since the driver does not support the usecase
altogether.


^ permalink raw reply

* Re: [PATCH iproute2-next v4] ip/bond: add lacp_strict support
From: Stephen Hemminger @ 2026-06-17 15:01 UTC (permalink / raw)
  To: Louis Scalbert
  Cc: netdev, andrew+netdev, jv, edumazet, kuba, pabeni, fbl, andy,
	shemminger, maheshb, jonas.gorski, horms
In-Reply-To: <20260617130314.3893243-1-louis.scalbert@6wind.com>

On Wed, 17 Jun 2026 15:03:14 +0200
Louis Scalbert <louis.scalbert@6wind.com> wrote:

> +		} else if (matches(*argv, "lacp_strict") == 0) {
> +			NEXT_ARG();
> +			if (get_index(lacp_strict_tbl, *argv) < 0)
> +				invarg("invalid lacp_strict", *argv);
> +
> +			lacp_strict = get_index(lacp_strict_tbl, *argv);
> +			addattr8(n, 1024, IFLA_BOND_LACP_STRICT, lacp_strict);
>  		} else if (matches(*argv, "tlb_dynamic_lb") == 0) {
>  			NEXT_ARG();
>  			if (get_u8(&tlb_dynamic_lb, *argv, 0)) {

Why not use parse_on_off like other code in this file.


> @@ -642,6 +658,15 @@ static void bond_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
>  			   "all_slaves_active %u ",
>  			   rta_getattr_u8(tb[IFLA_BOND_ALL_SLAVES_ACTIVE]));
>  
> +	if (tb[IFLA_BOND_LACP_STRICT]) {
> +		__u8 lacp_strict = rta_getattr_u8(tb[IFLA_BOND_LACP_STRICT]);
> +		print_string(PRINT_FP,
> +			     "lacp_strict",
> +			     "lacp_strict %s ",
> +			     get_name(lacp_strict_tbl, lacp_strict));
> +		print_bool(PRINT_JSON, "lacp_strict", NULL, lacp_strict);
> +	}
> +

Why not use print_on_off like other options

^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Breno Leitao @ 2026-06-17 14:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Jakub Kicinski, Sebastian Andrzej Siewior,
	John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner,
	netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260617111958.GL49951@noisy.programming.kicks-ass.net>

On Wed, Jun 17, 2026 at 01:19:58PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 17, 2026 at 12:37:30PM +0200, Petr Mladek wrote:
> > On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote:
> > > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote:
> > > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> > > > > to NBCON console infrastructure"). Because from here now on writes are
> > > > > deferred to the nbcon thread. So this purely about -stable in this case.  
> > > > 
> > > > Hmm, I thought netconsole had some reserved skbs and could to writes
> > > > 'atomic' like? That said, it was 2.6 era the last time I looked at
> > > > netconsole.
> > > 
> > > Yes, that part is fine. The problem is that netconsole tries
> > > to reap Tx completions if the Tx queue is full. We can't call
> > > skb destructor in irq context so we put the completed skbs on
> > > a queue and try to arm softirq to get to them later.
> > > Arming softirq causes a ksoftirq wake up.
> > > 
> > > We already skip the completion polling if we detect getting called
> > > from the same networking driver. It's best effort, anyway.
> > > Networking-side fix would be to toss another OR condition into
> > > the skip. But we don't have one that'd work cleanly :S
> > 
> > Alternative solution might be to offload the ksoftirq wake up
> > to an irq_work. It might make this part safe for the
> > console->write_atomic() call.
> > 
> > Well, my understanding is that there are more problems.
> > AFAIK, some drivers do not use an IRQ safe locking, see
> > https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/
> 
> But anything using locking is not ->write_atomic() and should be driven
> from a kthread, no?

Good point. If that's the case, netconsole might not ever be able to drop
CON_NBCON_ATOMIC_UNSAFE for any network-based console driver at all. 

As far as I can tell, there isn't a network driver today whose transmit
path is completely lockless, so, even if we make netpoll lockless.

It's unlikely any NIC will ever achieve this, given that NIC TX
fundamentally relies on a shared DMA ring and doorbell register, which
inherently cannot be made lockless.

So, is it correct to state that CON_NBCON_ATOMIC_UNSAFE will be part of
netconsole forever-ish?

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-next v1] ixgbe: Implement PCI reset handler
From: Temerkhanov, Sergey @ 2026-06-17 14:36 UTC (permalink / raw)
  To: Paul Menzel
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Loktionov, Aleksandr, Bjorn Helgaas, linux-pci@vger.kernel.org
In-Reply-To: <bd5ab9e3-ab93-43ad-a2ce-03d56e2d2ecf@molgen.mpg.de>



> -----Original Message-----
> From: Paul Menzel <pmenzel@molgen.mpg.de>
> Sent: Wednesday, June 17, 2026 11:03 AM
> To: Temerkhanov, Sergey <sergey.temerkhanov@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Loktionov,
> Aleksandr <aleksandr.loktionov@intel.com>; Bjorn Helgaas
> <bhelgaas@google.com>; linux-pci@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1] ixgbe: Implement PCI reset
> handler
> 
> [Cc: +Aleksandr (as in Reviewed-by:), +PCI subsystem]
> 
> Dear Sergey,
> 
> 
> Thank you for your patch.
> 
> Am 17.06.26 um 10:43 schrieb Sergey Temerkhanov:
> > Implement PCI device reset handler to allow the network device to get
> > re-initialized and function after a PCI-level reset.
> 
> Please describe the problem in more detail. When does PCI-level reset occur,
> and what is the current problematic situation?

The actual scenario is when a reset is invoked via sysfs during the operation and the adapter is not
properly restoring its state thereafter resulting in a TX queue timeout.

> 
> Also, what is ixgbe specific compared to a general PCIe implementation?
> 
> Please share details how to test it, and how you tested it.

The test is simple:
- Run traffic on the adapter
- Initiate reset via sysfs
- Observe TX queue WDT timeout w/o this change

> > +#define IXGBE_PCIE_RESET_RETRIES 1000
> 
> Why 1000? Isn’t there a generic PCIe macro? Please extend the commit
> message.

This is going to be replaced in v2

> > +	if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state)) {
> > +		timer_delete_sync(&adapter->service_timer);
> > +		/* Prevent the service task from running while we're resetting.
> */
> 
> One of the two comments seems redundant.

I am clarifying this in v2. Essentially, the timer callback may queue a work, cancel_work_sync()
cancels any instance that may have been already pending.

> 
> > +		cancel_work_sync(&adapter->service_task);
> > +	}
> progress\n");
> 
> How can this happen? What should the user reading this error do?

Under the normal circumstances we should never get here, I am adding a comment in v2

> >   static DEFINE_SIMPLE_DEV_PM_OPS(ixgbe_pm_ops, ixgbe_suspend,
> > ixgbe_resume);
> 
> Kind regards,
> 
> Paul

Regards,
Sergey

^ permalink raw reply

* Re: [PATCH] net/sched: dualpi2: fix GSO backlog accounting
From: Jamal Hadi Salim @ 2026-06-17 14:23 UTC (permalink / raw)
  To: Xingquan Liu
  Cc: netdev, Jiri Pirko, Victor Nogueira, stable,
	Chia-Yu Chang (Nokia)
In-Reply-To: <CAM0EoM=o+kBQNND8ViMe8bZQmFAtATav+CFMmtp1udzu+tpTzA@mail.gmail.com>

On Wed, Jun 17, 2026 at 6:23 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, Jun 16, 2026 at 6:03 PM Xingquan Liu <b1n@b1n.io> wrote:
> >
> > When DualPI2 splits a GSO skb into N segments, it propagates N
> > additional packets to its parent before returning NET_XMIT_SUCCESS.
> > The parent then accounts for the original skb once more, leaving its
> > qlen one larger than the number of packets actually queued.
> >
> > With QFQ as the parent, after all real packets are dequeued, QFQ still
> > has a non-zero qlen while its in-service aggregate has no active
> > classes. qfq_choose_next_agg() returns NULL and qfq_dequeue() passes
> > the result to qfq_peek_skb(), causing a NULL pointer dereference.
> >
> > Count only successfully queued segments and propagate the difference
> > between the original skb and those segments. Return success whenever
> > at least one segment was queued.
> >
> > Fixes: 8f9516daedd6 ("sched: Add enqueue/dequeue of dualpi2 qdisc")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Xingquan Liu <b1n@b1n.io>
> > ---
> >  net/sched/sch_dualpi2.c | 11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
> > index dfec3c99eb45..37d6a8960310 100644
> > --- a/net/sched/sch_dualpi2.c
> > +++ b/net/sched/sch_dualpi2.c
> > @@ -461,7 +461,7 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >                 if (IS_ERR_OR_NULL(nskb))
> >                         return qdisc_drop(skb, sch, to_free);
> >
> > -               cnt = 1;
> > +               cnt = 0;
> >                 byte_len = 0;
> >                 orig_len = qdisc_pkt_len(skb);
> >                 skb_list_walk_safe(nskb, nskb, next) {
> > @@ -488,16 +488,15 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> >                                 byte_len += nskb->len;
> >                         }
> >                 }
> > -               if (cnt > 1) {
> > +               if (cnt > 0) {
> >                         /* The caller will add the original skb stats to its
> >                          * backlog, compensate this if any nskb is enqueued.
> >                          */
> > -                       --cnt;
> > -                       byte_len -= orig_len;
> > +                       qdisc_tree_reduce_backlog(sch, 1 - cnt,
> > +                                                 orig_len - byte_len);
> >                 }
> > -               qdisc_tree_reduce_backlog(sch, -cnt, -byte_len);
> >                 consume_skb(skb);
> > -               return err;
> > +               return cnt > 0 ? NET_XMIT_SUCCESS : err;
> >         }
>
> This looks like a behavior change?
> Ex: If the last segment failed you will return XMIT_SUCCESS whereas
> before it could be with __NET_XMIT_BYPASS, NET_XMIT_CN,  etc.
> I am not sure what the best answer is and maybe it doesnt matter. Did
> you look at what other qdiscs do? I dont have time right now but will
> later - or you can before i get to it.
> Also, you didnt add the owner of this qdisc on your to:  - maybe he
> has some thoughts..
>

After looking at what other qdiscs do, your patch is fine. But please
fixup the commit to something like:

---
When DualPI2 splits a GSO skb into N segments, it propagates N
additional packets to its parent before returning NET_XMIT_SUCCESS.
The parent then accounts for the original skb once more, leaving its
qlen one larger than the number of packets actually queued.

With QFQ as the parent, after all real packets are dequeued, QFQ still
has a non-zero qlen while its in-service aggregate has no active
classes. qfq_choose_next_agg() returns NULL and qfq_dequeue() passes
the result to qfq_peek_skb(), causing a NULL pointer dereference.

Follow the same pattern used by tbf_segment() and taprio: count only
successfully queued segments, propagate the difference between the
original skb and those segments, and return NET_XMIT_SUCCESS whenever
at least one segment was queued.

Fixes: 8f9516daedd6 ("sched: Add enqueue/dequeue of dualpi2 qdisc")
Cc: stable@vger.kernel.org
Signed-off-by: Xingquan Liu <b1n@b1n.io>
-----

Do you know how to create a tdc test that will recreate this? If not
either Victor or myself can help you create one.

cheers,
jamal

> cheers,
> jamal
>
>
> >         return dualpi2_enqueue_skb(skb, sch, to_free);
> >  }
> >
> > base-commit: fbc6a80cb5d3fd4ac4b56e8c9d791dd17be890c4
> > --
> > Xingquan Liu
> >

^ permalink raw reply

* Re: Landlock: LANDLOCK_ACCESS_NET_CONNECT_TCP bypass via TCP Fast Open
From: Mickaël Salaün @ 2026-06-17 14:22 UTC (permalink / raw)
  To: Bryam Vargas
  Cc: Günther Noack, Matthieu Buffet, Paul Moore, Eric Dumazet,
	Neal Cardwell, linux-security-module, netdev, linux-kernel,
	Mikhail Ivanov
In-Reply-To: <20260616201615.275032-1-hexlabsecurity@proton.me>

Hi,

Thanks for the report.  This was previously identified by Mikhail and
Matthieu, see the related issue:
https://github.com/landlock-lsm/linux/issues/41


On Tue, Jun 16, 2026 at 08:16:22PM +0000, Bryam Vargas wrote:
> Hello Mickaël, and Landlock folks,
> 
> A task confined by a Landlock ruleset that handles
> LANDLOCK_ACCESS_NET_CONNECT_TCP and is denied connecting to a given port can
> still establish a TCP connection to that port by using TCP Fast Open, i.e.
> sendto(fd, ..., MSG_FASTOPEN, &dst, dstlen) on a fresh stream socket. The
> network-egress confinement for TCP connect is silently bypassed.
> 
> Affected
> --------
> Any kernel with CONFIG_SECURITY_LANDLOCK=y and Landlock enabled that supports
> the TCP network access rights (Landlock ABI >= 4, since Linux 6.7). Confirmed by
> source inspection on mainline (v7.1-rc7) and reproduced on Linux 7.0.11
> (Landlock ABI 8). No CONFIG beyond Landlock + IPv4/IPv6 TCP; TCP Fast Open client
> is enabled by the per-netns default (net.ipv4.tcp_fastopen has TFO_CLIENT_ENABLE
> set), so no sysctl change and no setsockopt are required.
> 
> Root cause
> ----------
> LANDLOCK_ACCESS_NET_CONNECT_TCP is enforced only by the socket_connect LSM hook
> (hook_socket_connect -> current_check_access_socket). security_socket_connect()
> has exactly one call site in the tree, net/socket.c (the connect(2) syscall).
> 
> TCP Fast Open performs an implicit connect inside sendmsg:
> 
>   tcp_sendmsg_locked()            net/ipv4/tcp.c  (MSG_FASTOPEN branch)
>    -> tcp_sendmsg_fastopen()      net/ipv4/tcp.c
>    -> __inet_stream_connect(..., is_sendmsg=1)  net/ipv4/af_inet.c
>    -> sk->sk_prot->connect()      net/ipv4/af_inet.c  -> tcp_v4_connect()
> 
> This path establishes the connection to the address taken from msg_name but
> never calls security_socket_connect(). The only LSM hook fired on the sendmsg
> path is security_socket_sendmsg(), and Landlock registers no socket_sendmsg
> hook, so LANDLOCK_ACCESS_NET_CONNECT_TCP is never re-checked. __inet_stream_connect()
> itself carries no LSM hook (only the cgroup-BPF pre_connect, a different
> mechanism).
> 
> Notably the kernel already mediates the analogous AF_UNIX implicit-connect on the
> send path via the unix_may_send hook, which Landlock does register
> (hook_unix_may_send) -- so the sendmsg-implies-connect pattern is recognized, but
> the TCP Fast Open case has no equivalent coverage. The MPTCP fast-open path
> (mptcp_sendmsg_fastopen -> __inet_stream_connect) is a second producer of the
> same unmediated connect (by source inspection; not separately reproduced).
> 
> Reproducer
> ----------
> A self-contained, fully unprivileged PoC is available on request. It forks an
> unconfined TFO-capable loopback listener, then in a child applies a Landlock
> ruleset handling LANDLOCK_ACCESS_NET_CONNECT_TCP with no allow rule
> (landlock_create_ruleset() with handled_access_net =
> LANDLOCK_ACCESS_NET_CONNECT_TCP, no landlock_add_rule(), then
> landlock_restrict_self(); every TCP connect is denied) and tries the forbidden
> port two ways:
> 
>   (1) connect(fd, &dst)                 -> -EACCES   (Landlock enforces CONNECT_TCP)
>   (2) sendto(fd2, buf, len, MSG_FASTOPEN, &dst, dstlen)
>                                         -> succeeds; the listener accepts the
>                                            connection and reads the payload.
> 
> Observed on Linux 7.0.11 (Landlock ABI 8):
> 
>   [1] connect(2)            -> ret=-1 errno=13 (Permission denied)
>   [2] sendto(MSG_FASTOPEN)  -> ret=14 errno=0 (OK/queued)
>   [+] listener ACCEPTED the confined child's connection; payload="..."
> 
> connect(2) to the port is denied while sendto(MSG_FASTOPEN) reaches the identical
> port and delivers data.
> 
> Impact
> ------
> A sandbox that uses LANDLOCK_ACCESS_NET_CONNECT_TCP to restrict outbound TCP
> (e.g. to keep a confined component from reaching an internal service or a
> metadata endpoint) can be escaped by an unprivileged, self-confined task with no
> CAP and no namespace transition -- for any destination port, since the
> implicit-connect path never consults the connect hook regardless of address (the
> run above shows one port). It is an integrity
> bypass of the network-confinement property; no memory safety is involved.
> I score it CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:H/A:N (6.5 Medium) -- the
> confined task escapes the policy authority that defined its sandbox, a scope
> change; 5.5 if you treat the Landlock boundary as the same authority (S:U).
> 
> Note on the in-flight UDP series
> --------------------------------
> The "landlock: Add UDP access control support" series (v5, Matthieu Buffet,
> https://lore.kernel.org/r/20260611162107.49278-3-matthieu@buffet.re) adds a
> socket_sendmsg hook, hook_socket_sendmsg(), but it returns 0 for non-UDP
> sockets:
> 
>     if (sk_is_udp(sock->sk))
>             access_request = LANDLOCK_ACCESS_NET_CONNECT_SEND_UDP;
>     else
>             return 0;
> 
> so a TCP socket using MSG_FASTOPEN still bypasses LANDLOCK_ACCESS_NET_CONNECT_TCP
> even after that series lands. It may be most convenient to fix this there.
> 
> Suggested direction
> -------------------
> Re-check LANDLOCK_ACCESS_NET_CONNECT_TCP on the implicit-connect path: either have
> the socket_sendmsg hook evaluate CONNECT_TCP for stream sockets when the call
> performs an implicit connect (mirroring the AF_UNIX unix_may_send handling), or
> place the check inside __inet_stream_connect() so a single chokepoint covers
> connect(2), TCP Fast Open, and the MPTCP fast-open sibling.
> 
> I am happy to send a patch for this if you would like me to.

Yes please.

> 
> Best regards,
> 
> Bryam Vargas
> Independent security researcher, HEXLAB S.A.S., Cali, Colombia
> hexlabsecurity@proton.me
> 
> 

^ permalink raw reply

* Re: [PATCH bpf v3 2/2] selftests/bpf: Cover partial copy of non-linear test_run output
From: sun jian @ 2026-06-17 14:19 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
	toke, lorenzo, paul.chaignon, martin.lau, clm, ihor.solodrai
In-Reply-To: <2dad9b5c184fa101d7cffa1d8f5eea5b5df60f53533d98c68175c9e3ec5ee6ac@mail.kernel.org>

On Wed, Jun 17, 2026 at 6:31 PM <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > index 01f1d1b6715a..9cc898e6a9f7 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > @@ -4,6 +4,10 @@
> >
> >  #include "test_pkt_access.skel.h"
> >
> > +#define NONLINEAR_PKT_LEN 9000
> > +#define NONLINEAR_LINEAR_DATA_LEN 64
> > +#define SHORT_OUT_LEN 100
> > +
>
> [ ... ]
>
> > @@ -20,6 +24,69 @@ static void check_run_cnt(int prog_fd, __u64 run_cnt)
> >             "incorrect number of repetitions, want %llu have %llu\n", run_cnt, info.run_cnt);
> >  }
> >
> > +static void init_pkt(__u8 *pkt, size_t len)
> > +{
> > +     size_t i;
> > +
> > +     for (i = 0; i < len; i++)
> > +             pkt[i] = i & 0xff;
> > +}
>
> A question was raised on v2 about whether pkt_v4 could be reused by
> reducing the linear area to ETH_HLEN, rather than introducing a custom
> init_pkt() with a 9000-byte stack packet.
>
> Can't we reuse pkt_v4 here by reducing the linear area to ETH_HLEN?
> The v3 still adds init_pkt() and the NONLINEAR_PKT_LEN packet, so this
> doesn't seem to have been picked up.
>
> > +
> > +static void test_skb_nonlinear_data_out_partial(struct test_pkt_access *skel)
> > +{
> > +     LIBBPF_OPTS(bpf_test_run_opts, topts);
> > +     __u8 pkt[NONLINEAR_PKT_LEN];
> > +     __u8 out[SHORT_OUT_LEN];
> > +     struct __sk_buff skb = {};
> > +     int prog_fd, err;
> > +
> > +     init_pkt(pkt, sizeof(pkt));
> > +
> > +     skb.data_end = NONLINEAR_LINEAR_DATA_LEN;
> > +
> > +     topts.data_in = pkt;
> > +     topts.data_size_in = sizeof(pkt);
> > +     topts.data_out = out;
> > +     topts.data_size_out = sizeof(out);
> > +     topts.ctx_in = &skb;
> > +     topts.ctx_size_in = sizeof(skb);
> > +
> > +     prog_fd = bpf_program__fd(skel->progs.tc_pass_prog);
>
> [ ... ]
>
> > diff --git a/tools/testing/selftests/bpf/progs/test_pkt_access.c b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > index bce7173152c6..cd284401eebd 100644
> > --- a/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > +++ b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > @@ -150,3 +150,15 @@ int test_pkt_access(struct __sk_buff *skb)
> >
> >       return TC_ACT_UNSPEC;
> >  }
> > +
> > +SEC("tc")
> > +int tc_pass_prog(struct __sk_buff *skb)
> > +{
> > +     return TC_ACT_OK;
> > +}
> > +
> > +SEC("xdp.frags")
> > +int xdp_frags_pass_prog(struct xdp_md *ctx)
> > +{
> > +     return XDP_PASS;
> > +}
>
> A related suggestion on v2 was that, once pkt_v4 is reused, the existing
> BPF program could be reused instead of adding new pass-through programs.
>
> Could tc_pass_prog and xdp_frags_pass_prog be dropped in favour of the
> existing program? The v3 still adds both of these, so this point also
> seems to be open.
>
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27680511802

Hi,

Thanks for checking this.

I tried reusing pkt_v4 and the existing TC program, but they do not fit
the skb case this test is trying to cover.

For skb test_run, IPv4/IPv6 inputs with a too-short L3 header in the
linear area are rejected before bpf_test_finish(). With pkt_v4 and a
linear area of ETH_HLEN, the test fails with -EINVAL before reaching the
partial copy-out path. If the linear area is increased enough to pass the
IPv4 check, pkt_v4 is too small to both trigger the old
copy_size - frag_size path and verify that the copied prefix spans the
linear data and the first fragment. pkt_v6 has the same issue: after
making the IPv6 header linear, only 20 bytes remain in frags.

The existing test_pkt_access program has its own packet-access coverage
goals and is not just a pass-through carrier. With such a short linear
area or small packet fixture, it can fail before the test hits the
bpf_test_finish()'s partial copy-out path. A pass-through TC program is
therefore a better fit, because it keeps the test focused on the
bpf_test_finish() copy-out semantics.

For XDP, this object does not have an existing xdp.frags pass-through
program, so the small XDP frags program is needed to cover the other
caller of the shared bpf_test_finish() path.

Thanks,
Sun Jian

^ permalink raw reply

* Re: [PATCH net] net: rnpgbe: fix mailbox endianness handling
From: Yibo Dong @ 2026-06-17 14:05 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko,
	netdev, linux-kernel, yaojun
In-Reply-To: <26517b8f-33b7-4de7-8fe8-c7dca5fa7a4b@lunn.ch>

On Wed, Jun 17, 2026 at 02:09:00PM +0200, Andrew Lunn wrote:
> > My understanding is as follows:
> > The firmware structures are defined with__le16 / __le32 for wire format,
> > but the original code cast these struct pointers to u32 * before passing
> > them to the mailbox read/write routines:
> > - Send path: (u32 *)&req -> msg buffer -> writel()
> > - Receive path: readl() -> msg buffer -> (u32 *)&reply
> > Sparse only sees pure u32 = u32 assignments here, so no type mismatch is
> > reported.
> 
> Can the code be changed so that it does not need the cast? Casts are
> bad, as you have just shown. This is something i try to push back on,
> it makes you think about types and avoid issues like this.
> 
> 	Andrew
> 
Thinking... Yes. A few possibilities:

1. Make all fields __le32, then extract via shifts:
   struct mbx_fw_cmd_req {
       __le32 word0;  // [15:0]=flags  [31:16]=opcode
       __le32 word1;  // [15:0]=datalen [31:16]=ret_value
       ...
   };
   But that's painful — le32_to_cpu(req.word0) >> 16 vs req.opcode.

2. Use a union to keep named fields while also exposing __le32[] access:
   union mbx_fw_cmd_req_u {
       struct mbx_fw_cmd_req req;
       __le32 dwords[sizeof(struct mbx_fw_cmd_req) / sizeof(__le32)];
   };
   union mbx_fw_cmd_reply_u {
       struct mbx_fw_cmd_reply reply;
       __le32 dwords[sizeof(struct mbx_fw_cmd_reply) / sizeof(__le32)];
   };

   The transport interface becomes:
   int mucse_write_mbx_pf(struct mucse_hw *hw, const __le32 *msg, u16 size);
   int mucse_read_mbx_pf(struct mucse_hw *hw, __le32 *msg, u16 size);

   Callers would use:
   union mbx_fw_cmd_req_u cmd = {};
   cmd.req.opcode = cpu_to_le16(...);
   cmd.req.flags  = cpu_to_le16(...);
   mucse_write_mbx_pf(hw, cmd.dwords, sizeof(cmd.req));

   If the transport layer forgets le32_to_cpu(), sparse would catch it
   because msg is __le32 * and mbx_data_rd32() returns u32.

   The downside is an extra union wrapper and an extra level in field
   access (cmd.req.opcode vs req.opcode) — a minor inconvenience.

Do you have a preference between these, or another approach?

Thanks for the feedback.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox