[PATCH] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg

Linux Input/HID development
 help / color / mirror / Atom feed

* [PATCH] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()
@ 2026-05-15 15:36 Manish Khadka
  2026-05-15 16:18 ` sashiko-bot
  0 siblings, 1 reply; 4+ messages in thread
From: Manish Khadka @ 2026-05-15 15:36 UTC (permalink / raw)
  To: linux-input
  Cc: Derek J . Clark, Mark Pearson, Jiri Kosina, Benjamin Tissoires,
	linux-kernel

hid_go_cfg_probe() initialises drvdata.go_cfg_setup and schedules it
to run 2 ms later:

    INIT_DELAYED_WORK(&drvdata.go_cfg_setup, &cfg_setup);
    schedule_delayed_work(&drvdata.go_cfg_setup, msecs_to_jiffies(2));

cfg_setup() dereferences drvdata.hdev to issue MCU command requests.
hid_go_cfg_remove() tears down sysfs and stops the HID device, ending
with hid_set_drvdata(hdev, NULL), but never drains the delayed work.
If the device is unbound within the 2 ms scheduling delay (a probe
failure rolling back via remove, or a fast rmmod after probe), the
work fires after hid_set_drvdata(NULL) has cleared the back pointer,
leaving cfg_setup() with a NULL or stale drvdata.hdev.

Mirror the sibling driver hid-lenovo-go-s.c, whose hid_gos_cfg_remove()
already calls cancel_delayed_work_sync() on its analogous work, and
drain go_cfg_setup at the top of hid_go_cfg_remove().  The cancel
must come before guard(mutex)(&drvdata.cfg_mutex) because cfg_setup()
acquires that mutex; reversing the order would deadlock.

Fixes: d69ccfcbc955 ("HID: hid-lenovo-go: Add Lenovo Legion Go Series HID Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Manish Khadka <maskmemanish@gmail.com>
---
 drivers/hid/hid-lenovo-go.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/hid/hid-lenovo-go.c b/drivers/hid/hid-lenovo-go.c
index d4d26c783356..ef69869f0a00 100644
--- a/drivers/hid/hid-lenovo-go.c
+++ b/drivers/hid/hid-lenovo-go.c
@@ -2408,6 +2408,13 @@ static int hid_go_cfg_probe(struct hid_device *hdev,

 static void hid_go_cfg_remove(struct hid_device *hdev)
 {
+	/*
+	 * cfg_setup is scheduled from hid_go_cfg_probe() with a 2 ms delay;
+	 * drain it here before tearing down so the workqueue cannot run
+	 * after hid_set_drvdata(NULL) and dereference a stale drvdata.hdev.
+	 */
+	cancel_delayed_work_sync(&drvdata.go_cfg_setup);
+
 	guard(mutex)(&drvdata.cfg_mutex);
 	sysfs_remove_groups(&hdev->dev.kobj, top_level_attr_groups);
 	hid_hw_close(hdev);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()
  2026-05-15 15:36 [PATCH] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove() Manish Khadka
@ 2026-05-15 16:18 ` sashiko-bot
  2026-05-15 17:45   ` [PATCH v2] " Manish Khadka
  0 siblings, 1 reply; 4+ messages in thread
From: sashiko-bot @ 2026-05-15 16:18 UTC (permalink / raw)
  To: Manish Khadka; +Cc: linux-input, dmitry.torokhov

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] AB-BA deadlock between device removal and sysfs operations.
- [High] Global state corruption due to static per-device structs.
- [High] Use-After-Free window due to mixing devm registration with explicit teardown.
- [High] Resource leaks on probe error paths.
- [Medium] Unconditional success return in `mcu_property_out()` masks timeout errors.
- [Medium] Probe arbitrarily fails due to incorrect `schedule_delayed_work()` boolean logic.
- [Low] Commit message mechanism explanation is factually incorrect.
--

commit 8001481b81b1932a98238ba0e2de3079e8807d91
Author: Manish Khadka <maskmemanish@gmail.com>

HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()

This commit drains the delayed go_cfg_setup workqueue at the start of
hid_go_cfg_remove() to prevent it from dereferencing a stale device pointer
after teardown. It places the cancellation before the mutex acquisition to
avoid deadlocks.

Cc: stable@vger.kernel.org

> If the device is unbound within the 2 ms scheduling delay (a probe
> failure rolling back via remove, or a fast rmmod after probe), the
> work fires after hid_set_drvdata(NULL) has cleared the back pointer,
> leaving cfg_setup() with a NULL or stale drvdata.hdev.

This isn't a bug, but the explanation of the mechanism seems slightly
inaccurate.

Does hid_set_drvdata(hdev, NULL) only clear hdev->dev.driver_data rather
than the global drvdata.hdev variable?

> diff --git a/drivers/hid/hid-lenovo-go.c b/drivers/hid/hid-lenovo-go.c
> index d4d26c7833563..ef69869f0a009 100644
> --- a/drivers/hid/hid-lenovo-go.c
> +++ b/drivers/hid/hid-lenovo-go.c
> @@ -2408,6 +2408,13 @@ static int hid_go_cfg_probe(struct hid_device *hdev,
>  
>  static void hid_go_cfg_remove(struct hid_device *hdev)
>  {
> +	/*
> +	 * cfg_setup is scheduled from hid_go_cfg_probe() with a 2 ms delay;
> +	 * drain it here before tearing down so the workqueue cannot run
> +	 * after hid_set_drvdata(NULL) and dereference a stale drvdata.hdev.
> +	 */
> +	cancel_delayed_work_sync(&drvdata.go_cfg_setup);

The driver uses static global structures for device state (drvdata) and
LED attributes.

If multiple Legion Go controllers are connected, or a device is rapidly
re-bound, does re-initializing an active mutex or delayed workqueue, and
re-registering an already-linked LED struct corrupt internal kernel linked
lists and lock wait queues?

> +
>  	guard(mutex)(&drvdata.cfg_mutex);
>  	sysfs_remove_groups(&hdev->dev.kobj, top_level_attr_groups);

In hid_go_cfg_remove(), guard(mutex)(&drvdata.cfg_mutex) acquires the
mutex before calling sysfs_remove_groups(). sysfs_remove_groups() drains
kernfs active references, blocking until any currently executing sysfs
operations finish.

If the driver's sysfs callbacks acquire drvdata.cfg_mutex via
mcu_property_out(), can this cause an AB-BA deadlock if a sysfs operation
runs concurrently with device removal?

>  	hid_hw_close(hdev);
>  	hid_hw_stop(hdev);

The RGB LED class device and RGB sysfs attributes appear to be registered
using devm_* APIs, which clean up resources after the driver's .remove()
callback finishes.

If hid_go_cfg_remove() explicitly stops and closes the hardware here,
could userspace accessing the LED brightness or RGB sysfs files in the
window between hardware teardown and devres cleanup cause a use-after-free
when the sysfs callbacks call hid_hw_output_report()?

>  	hid_set_drvdata(hdev, NULL);
>  }

Looking at hid_go_cfg_probe(), if a step after sysfs_create_groups() fails,
the function returns an error without calling sysfs_remove_groups().

Does this leave dangling sysfs files that could be accessed later?

Also, if hid_go_cfg_probe() returns an error, does hid_go_probe() leak the
hardware open references by failing to call hid_hw_stop() and
hid_hw_close()?

In hid_go_cfg_probe(), the driver checks ret = schedule_delayed_work(...)
and fails the probe if !ret.

Since schedule_delayed_work() returns false (0) when the work is already
pending, and the driver uses a single global workqueue structure, will a
second device fail to probe if it connects before the first device's work
fires?

In mcu_property_out(), wait_for_completion_interruptible_timeout() is used
to wait for hardware responses. If a timeout occurs, ret is set to -EBUSY.

Because the function ends with an unconditional return 0, does this drop
the error code and mask communication failures?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515153607.76175-1-maskmemanish@gmail.com?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()
  2026-05-15 16:18 ` sashiko-bot
@ 2026-05-15 17:45   ` Manish Khadka
  2026-05-15 18:23     ` sashiko-bot
  0 siblings, 1 reply; 4+ messages in thread
From: Manish Khadka @ 2026-05-15 17:45 UTC (permalink / raw)
  To: linux-input
  Cc: Derek J . Clark, Mark Pearson, Jiri Kosina, Benjamin Tissoires,
	linux-kernel

hid_go_cfg_probe() initialises drvdata.go_cfg_setup and schedules it
to run 2 ms later:

    INIT_DELAYED_WORK(&drvdata.go_cfg_setup, &cfg_setup);
    schedule_delayed_work(&drvdata.go_cfg_setup, msecs_to_jiffies(2));

cfg_setup() dereferences drvdata.hdev to issue MCU command requests.
hid_go_cfg_remove() tears down sysfs and stops the HID device, but
never drains the delayed work.  If the device is unbound within the
2 ms scheduling delay (a probe failure rolling back via remove, or a
fast rmmod after probe), the work fires after hid_destroy_device()
has dropped its reference and released the underlying hdev struct,
leaving cfg_setup() with a stale drvdata.hdev pointer.

Mirror the sibling driver hid-lenovo-go-s.c, whose hid_gos_cfg_remove()
already calls cancel_delayed_work_sync() on its analogous work, and
drain go_cfg_setup at the top of hid_go_cfg_remove().  The cancel
must come before guard(mutex)(&drvdata.cfg_mutex) because cfg_setup()
acquires that mutex; reversing the order would deadlock.

Fixes: d69ccfcbc955 ("HID: hid-lenovo-go: Add Lenovo Legion Go Series HID Driver")
Cc: stable@vger.kernel.org
Signed-off-by: Manish Khadka <maskmemanish@gmail.com>
---

v1 -> v2:
  - Address Sashiko AI review feedback:
    * [Low] Correct the inaccurate description of how drvdata.hdev
      becomes stale.  hid_set_drvdata(hdev, NULL) only clears the
      per-hdev driver_data, not the global drvdata.hdev; the actual
      stale-pointer mechanism is hid_destroy_device()'s put_device()
      releasing the underlying hdev struct.  Commit message and the
      inline comment in hid_go_cfg_remove() are corrected accordingly.
  - The other six review points (ABBA deadlock with sysfs_remove_groups,
    global static drvdata corruption on multi-device/rebind, devm vs.
    explicit teardown race, probe error-path leaks, mcu_property_out
    unconditional return 0, schedule_delayed_work boolean misuse)
    are all valid pre-existing bugs in the driver but predate and are
    independent of this patch.  Each deserves its own fix; will be
    addressed in separate patches.
  Thanks to Sashiko AI review for all seven points.
 drivers/hid/hid-lenovo-go.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/hid/hid-lenovo-go.c b/drivers/hid/hid-lenovo-go.c
index d4d26c783356..b9d8cde53136 100644
--- a/drivers/hid/hid-lenovo-go.c
+++ b/drivers/hid/hid-lenovo-go.c
@@ -2408,6 +2408,15 @@ static int hid_go_cfg_probe(struct hid_device *hdev,
 
 static void hid_go_cfg_remove(struct hid_device *hdev)
 {
+	/*
+	 * cfg_setup is scheduled from hid_go_cfg_probe() with a 2 ms delay
+	 * and dereferences drvdata.hdev.  Drain it here before tearing
+	 * down so the workqueue cannot run after hid_destroy_device()'s
+	 * put_device() has released the underlying hdev and dereference
+	 * a stale drvdata.hdev pointer.
+	 */
+	cancel_delayed_work_sync(&drvdata.go_cfg_setup);
+
 	guard(mutex)(&drvdata.cfg_mutex);
 	sysfs_remove_groups(&hdev->dev.kobj, top_level_attr_groups);
 	hid_hw_close(hdev);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()
  2026-05-15 17:45   ` [PATCH v2] " Manish Khadka
@ 2026-05-15 18:23     ` sashiko-bot
  0 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-05-15 18:23 UTC (permalink / raw)
  To: Manish Khadka; +Cc: dmitry.torokhov, linux-input

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [Critical] Hardware state corruption and crashes due to use of a static global struct for device data.
- [High] The patch provides an incomplete fix for the use-after-free and introduces a resource leak because it assumes the driver core calls remove() on probe failure.
- [High] ABBA Deadlock in `hid_go_cfg_remove()` between `cfg_mutex` and sysfs kernfs removal.
- [High] 'Scheduling while atomic' panic in the LED subsystem callback.
- [High] Hardware event synchronization race condition permanently corrupts the command state machine.
- [High] Resource leak on unbind if the device is operating as a generic HID device.
- [High] Undefined behavior in `mcu_property_out()` due to passing a NULL pointer to `memcpy()`.
--

commit 304d13b89ea5c6814d905d7149a191f0e8fad93d
Author: Manish Khadka <maskmemanish@gmail.com>

HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove()

This commit cancels the go_cfg_setup delayed work in hid_go_cfg_remove() to
prevent a use-after-free if the device is unbound before the work executes.

> If the device is unbound within the
> 2 ms scheduling delay (a probe failure rolling back via remove, or a
> fast rmmod after probe), the work fires after hid_destroy_device()

Is the statement about a probe failure rolling back via remove accurate?

In the Linux driver model, the core never calls remove() if probe() fails.

If hid_go_cfg_probe() fails after scheduling the delayed work, the work
remains pending and will dereference the freed hdev when it fires, causing
the exact use-after-free crash this patch attempts to fix.

Additionally, if hid_go_cfg_probe() fails, does hid_go_probe() return the
error code without calling hid_hw_close() and hid_hw_stop(), permanently
leaking the hardware session? Should explicit cleanup be added to the probe
error paths?

> diff --git a/drivers/hid/hid-lenovo-go.c b/drivers/hid/hid-lenovo-go.c
> index d4d26c7833563..b9d8cde531365 100644
> --- a/drivers/hid/hid-lenovo-go.c
> +++ b/drivers/hid/hid-lenovo-go.c
> @@ -2408,6 +2408,15 @@ static int hid_go_cfg_probe(struct hid_device *hdev,
>  
>  static void hid_go_cfg_remove(struct hid_device *hdev)
>  {
> +	/*
> +	 * cfg_setup is scheduled from hid_go_cfg_probe() with a 2 ms delay
> +	 * and dereferences drvdata.hdev.  Drain it here before tearing
> +	 * down so the workqueue cannot run after hid_destroy_device()'s
> +	 * put_device() has released the underlying hdev and dereference
> +	 * a stale drvdata.hdev pointer.
> +	 */
> +	cancel_delayed_work_sync(&drvdata.go_cfg_setup);

This isn't a bug introduced by this patch, but does the driver use a global
static variable (static struct hid_go_cfg drvdata) to store device state?

A second device probe would unconditionally overwrite drvdata.hdev and
re-initialize both drvdata.cfg_mutex and drvdata.go_cfg_setup.

Re-initializing an active mutex destroys its wait lists, and re-initializing
an active delayed work overwrites its linked list pointers, leading to
immediate memory corruption and kernel panics.

Should the device state be dynamically allocated per-instance instead?

> +
>  	guard(mutex)(&drvdata.cfg_mutex);
>  	sysfs_remove_groups(&hdev->dev.kobj, top_level_attr_groups);

This is a pre-existing issue, but does this sequence cause an ABBA deadlock?

sysfs_remove_groups() drains all active kernfs references, blocking until any
currently executing sysfs operations complete.

However, the driver's sysfs callbacks internally call mcu_property_out(),
which acquires drvdata.cfg_mutex. If a sysfs read or write races with driver
removal, the sysfs thread holds the active kernfs reference and waits for
cfg_mutex, while the removal thread holds cfg_mutex and waits for the kernfs
reference.

Could this result in an unrecoverable deadlock?

[ ... ]

This isn't a bug introduced by this patch, but does the driver assign
hid_go_brightness_set to the led_cdev.brightness_set callback?

The LED subsystem requires this callback to be non-blocking because it can
be invoked from atomic contexts. However, hid_go_brightness_set calls
mcu_property_out(), which allocates memory with GFP_KERNEL, locks a mutex,
and blocks via wait_for_completion_interruptible_timeout().

Will this violate the LED API contract and cause scheduling while atomic
panics? Should the callback be assigned to .brightness_set_blocking instead?

This is another pre-existing issue, but does mcu_property_out() call
reinit_completion() after wait_for_completion_interruptible_timeout() returns?

If a command times out and the hardware response arrives shortly afterward,
the interrupt handler signals the completion.

The next call to mcu_property_out() will send a new command but fail to
re-initialize the completion first. Its wait_for_completion will immediately
return by consuming the stale signal, without waiting for the new response.

Does this permanently desynchronize all subsequent hardware communications?
Should reinit_completion() be called prior to sending the output report?

This is also a pre-existing issue, but does hid_go_remove() bypass hardware
cleanup if the device is operating as a generic HID device?

In hid_go_probe(), if get_endpoint_address() fails (e.g., returns -ENODEV),
the driver returns 0 to allow generic HID operation, leaving the hardware
started and open.

During unbind, hid_go_remove() calls get_endpoint_address() again. If it
returns <= 0, the function returns immediately instead of falling through to
the default case.

Does this mean hid_hw_close() and hid_hw_stop() are bypassed entirely,
permanently leaking the HID hardware session and its resources?

Finally, this is a pre-existing issue, but does mcu_property_out()
unconditionally execute memcpy(dmabuf + header_size, data, len)?

Several hardware commands, such as GET_VERSION_DATA, invoke this function
with data == NULL and len == 0.

Under the C standard, passing a NULL pointer to memcpy() is undefined behavior
regardless of the length. This triggers UBSAN kernel panics if enabled and
allows aggressive compiler optimizations to discard the operation.

Should the copy be conditionally guarded by if (len > 0) or if (data)?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515174511.78486-1-maskmemanish@gmail.com?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-15 18:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 15:36 [PATCH] HID: hid-lenovo-go: cancel cfg_setup work in hid_go_cfg_remove() Manish Khadka
2026-05-15 16:18 ` sashiko-bot
2026-05-15 17:45   ` [PATCH v2] " Manish Khadka
2026-05-15 18:23     ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox