* [PATCH 0/2] Input: uinput - Multiple concurrency fixes in ff request handling
@ 2023-12-07 6:34 Vicki Pfau
2023-12-07 6:34 ` [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting Vicki Pfau
2023-12-07 6:34 ` [PATCH 2/2] Input: uinput - Release mutex while unregistering input device Vicki Pfau
0 siblings, 2 replies; 11+ messages in thread
From: Vicki Pfau @ 2023-12-07 6:34 UTC (permalink / raw)
To: Dmitry Torokhov, linux-input; +Cc: Vicki Pfau
While investigating a report of a process hanging on submitting ff data to a
uinput-derived evdev handle I uncovered several issues regarding cross-thread
concurrency.
The first fix is simply making waiting on the completion object interruptible.
Without this, the submitting process cannot be interrupted, meaning it has to
either wait for the uinput-controlling process to read the data, or the timeout
being reached. While this is the usual flow, it being uninterruptible means
that if the uinput-controlling process is misbehaving, the submitting process
cannot be killed, suspended, or otherwise interrupted until the timeout is
reached, which could take an annoyingly long time for users.
The second fix is probably more controversial, and I'm unsure if it's really
the best way to solve this issue. Namely, there exists a small, but
reproducible window where closing a uinput device on the uinput side and
uploading ff data via an evdev handle in a separate process will lead to a
deadlock: the uinput ioctl will claim the mutex, flush requests, then try to
close the input device, which then tries to claim the evdev mutex. However,
when uploading the ff data, the evdev mutex will be claimed, try to claim the
uinput mutex, and hang indefinitely, leading to a deadlock. Since it can never
claim the uinput mutex, it doesn't notice that it should exit early, but since
it can't get the mutex at all, it can't release the evdev mutex.
My approach to solving this involves temporarily releasing the mutex after
flushing requests, allowing the upload to claim the mutex, then closing the
input device without the mutex being held, and finally reclaim the mutex to
rebalance the mutex_unlock later on.
I spent quite a while investigating other approaches while trying to come up
with the least hacky and simplest way to fix this. However, a proper fix might
be more involved and have to touch other subsystems, namely evdev, in which
case I would defer to Dmitry for a better fix, as he's a lot more familiar with
these subsystems.
I also suspect that there's a race condition with uinput_dev_event, as most
call sites are protected by the uinput device mutex, but not all of them.
Namely, it can be called via the input device's event function pointer, which
has no idea that the uinput mutex exists. However, I haven't demonstrated that
there is actually an issue here, so I haven't attempted to fix it.
Vicki Pfau (2):
Input: uinput - Allow uinput_request_submit wait interrupting
Input: uinput - Release mutex while unregistering input device
drivers/input/misc/uinput.c | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2023-12-07 6:34 [PATCH 0/2] Input: uinput - Multiple concurrency fixes in ff request handling Vicki Pfau
@ 2023-12-07 6:34 ` Vicki Pfau
2023-12-08 19:32 ` Dmitry Torokhov
2023-12-07 6:34 ` [PATCH 2/2] Input: uinput - Release mutex while unregistering input device Vicki Pfau
1 sibling, 1 reply; 11+ messages in thread
From: Vicki Pfau @ 2023-12-07 6:34 UTC (permalink / raw)
To: Dmitry Torokhov, linux-input; +Cc: Vicki Pfau
Currently, uinput_request_submit will only fail if the request wait times out.
However, in other places this wait is interruptable, and in this specific
location it can lead to issues, such as causing system suspend to hang until
the request times out. Since the timeout is so long, this can cause the
appearance of a total system freeze. Making the wait interruptable resolves
this and possibly further issues.
Signed-off-by: Vicki Pfau <vi@endrift.com>
---
drivers/input/misc/uinput.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c
index d98212d55108..0330e72798db 100644
--- a/drivers/input/misc/uinput.c
+++ b/drivers/input/misc/uinput.c
@@ -183,7 +183,11 @@ static int uinput_request_submit(struct uinput_device *udev,
if (retval)
goto out;
- if (!wait_for_completion_timeout(&request->done, 30 * HZ)) {
+ retval = wait_for_completion_interruptible_timeout(&request->done, 30 * HZ);
+ if (retval == -ERESTARTSYS)
+ goto out;
+
+ if (!retval) {
retval = -ETIMEDOUT;
goto out;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] Input: uinput - Release mutex while unregistering input device
2023-12-07 6:34 [PATCH 0/2] Input: uinput - Multiple concurrency fixes in ff request handling Vicki Pfau
2023-12-07 6:34 ` [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting Vicki Pfau
@ 2023-12-07 6:34 ` Vicki Pfau
2023-12-08 19:58 ` Dmitry Torokhov
1 sibling, 1 reply; 11+ messages in thread
From: Vicki Pfau @ 2023-12-07 6:34 UTC (permalink / raw)
To: Dmitry Torokhov, linux-input; +Cc: Vicki Pfau
Any pending requests may be holding a mutex from its own subsystem, e.g.
evdev, while waiting to be able to claim the uinput device mutex.
However, unregistering the device may try to claim that mutex, leading
to a deadlock. To prevent this from happening, we need to temporarily
give up the lock before calling input_unregister_device.
Fixes: e8b95728f724 ("Input: uinput - avoid FF flush when destroying device")
Signed-off-by: Vicki Pfau <vi@endrift.com>
---
drivers/input/misc/uinput.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/input/misc/uinput.c b/drivers/input/misc/uinput.c
index 0330e72798db..ac6e5baa2093 100644
--- a/drivers/input/misc/uinput.c
+++ b/drivers/input/misc/uinput.c
@@ -296,17 +296,34 @@ static void uinput_destroy_device(struct uinput_device *udev)
udev->state = UIST_NEW_DEVICE;
if (dev) {
+ udev->dev = NULL;
name = dev->name;
phys = dev->phys;
if (old_state == UIST_CREATED) {
uinput_flush_requests(udev);
+
+ /*
+ * Any pending requests may be holding a mutex from its
+ * own subsystem, e.g. evdev, while waiting to be able
+ * to claim the uinput device mutex. However,
+ * unregistering the device may try to claim that
+ * mutex, leading to a deadlock. To prevent this from
+ * happening, we need to temporarily give up the lock.
+ *
+ * Since this function is only called immediately
+ * before the caller exits the critical section without
+ * doing any further operations on the device, this
+ * is safe and we can immediately reclaim the mutex
+ * when done so the unlock is still balanced.
+ */
+ mutex_unlock(&udev->mutex);
input_unregister_device(dev);
+ mutex_lock(&udev->mutex);
} else {
input_free_device(dev);
}
kfree(name);
kfree(phys);
- udev->dev = NULL;
}
}
@@ -745,7 +762,16 @@ static int uinput_release(struct inode *inode, struct file *file)
{
struct uinput_device *udev = file->private_data;
+ int retval;
+
+ retval = mutex_lock_interruptible(&udev->mutex);
+ if (retval)
+ return retval;
+
uinput_destroy_device(udev);
+
+ mutex_unlock(&udev->mutex);
+
kfree(udev);
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2023-12-07 6:34 ` [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting Vicki Pfau
@ 2023-12-08 19:32 ` Dmitry Torokhov
2023-12-09 3:24 ` Vicki Pfau
0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Torokhov @ 2023-12-08 19:32 UTC (permalink / raw)
To: Vicki Pfau; +Cc: linux-input
Hi Vicki,
On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote:
> Currently, uinput_request_submit will only fail if the request wait times out.
> However, in other places this wait is interruptable, and in this specific
> location it can lead to issues, such as causing system suspend to hang until
> the request times out.
Could you please explain how a sleeping process can cause suspend to
hang?
> Since the timeout is so long, this can cause the
> appearance of a total system freeze. Making the wait interruptable resolves
> this and possibly further issues.
I think you are trying to find a justification too hard and it does not
make sense, however I agree that allowing to kill the process issuing
the request without waiting for the timeout to expire if the other side
is stuck might be desirable.
I think the best way to use wait_for_completion_killable_timeout()
so that stray signals do not disturb userspace, but the processes can
still be terminated.
Thanks.
--
Dmitry
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Input: uinput - Release mutex while unregistering input device
2023-12-07 6:34 ` [PATCH 2/2] Input: uinput - Release mutex while unregistering input device Vicki Pfau
@ 2023-12-08 19:58 ` Dmitry Torokhov
2023-12-09 3:24 ` Vicki Pfau
0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Torokhov @ 2023-12-08 19:58 UTC (permalink / raw)
To: Vicki Pfau; +Cc: linux-input
Hi Vicki,
On Wed, Dec 06, 2023 at 10:34:06PM -0800, Vicki Pfau wrote:
> Any pending requests may be holding a mutex from its own subsystem, e.g.
> evdev, while waiting to be able to claim the uinput device mutex.
> However, unregistering the device may try to claim that mutex, leading
> to a deadlock. To prevent this from happening, we need to temporarily
> give up the lock before calling input_unregister_device.
I do not think we can simply give up the lock, the whole thing with
UI_DEV_DESTROY allowing reusing connection to create a new input device
was a huge mistake because if you try to do UI_DEV_CREATE again on
the same fd you'll end up reusing whatever is in udev instance,
including the state and the mutex, and will make a huge mess.
I think the only reasonable way forward is change the driver so that no
ioctls are accepted after UI_DEV_DESTROY and then start untangling the
locking issues (possibly by dropping the lock on destroy after setting
the status - I think you will not observe the lockups you mention if
your application will stop using UI_DEV_DESTROY and simply closes the
fd).
>
> Fixes: e8b95728f724 ("Input: uinput - avoid FF flush when destroying device")
This is not the commit that introduced the problem, it has been there
since forever.
Thanks.
--
Dmitry
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2023-12-08 19:32 ` Dmitry Torokhov
@ 2023-12-09 3:24 ` Vicki Pfau
2023-12-15 3:04 ` Vicki Pfau
0 siblings, 1 reply; 11+ messages in thread
From: Vicki Pfau @ 2023-12-09 3:24 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: linux-input
Hi Dmitry,
On 12/8/23 11:32, Dmitry Torokhov wrote:
> Hi Vicki,
>
> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote:
>> Currently, uinput_request_submit will only fail if the request wait times out.
>> However, in other places this wait is interruptable, and in this specific
>> location it can lead to issues, such as causing system suspend to hang until
>> the request times out.
>
> Could you please explain how a sleeping process can cause suspend to
> hang?
While I'm not 100% sure how it happens, given I found this by reproducing it before I came up with a theory for why it happened, my guess is that as it's trying to suspend all of userspace programs, it suspends the process that owns the uinput handle, so it can't continue to service requests, while the other process hangs in the uninterruptable call, blocking suspend for 30 seconds until the call times out.
>
>> Since the timeout is so long, this can cause the
>> appearance of a total system freeze. Making the wait interruptable resolves
>> this and possibly further issues.
>
> I think you are trying to find a justification too hard and it does not
> make sense, however I agree that allowing to kill the process issuing
> the request without waiting for the timeout to expire if the other side
> is stuck might be desirable.
This isn't reaching. As I said above, I discovered the patched line of code *after* observing suspend hanging for 30 seconds while trying to reproduce another bug. I wrote this patch, retested, and found that it now suspended immediately, leading to a visible -ERESTARTSYS in strace on coming back from suspend.
I can post the reproduction case somewhere, but the test program is only the evdev client end, with the uinput side being Steam, which I don't have source code for.
>
> I think the best way to use wait_for_completion_killable_timeout()
> so that stray signals do not disturb userspace, but the processes can
> still be terminated.
There's already a mutex_lock_interruptable in uinput_request_send that could cause this to fall back to userspace under similar circumstances. The only difference I can find, which is admittedly a bug in this patch now that I look at it again, is that uinput_dev_event would get called twice, leading to the request getting duplicated.
If there's a better way to handle the suspend case let me know, but this is not a hypothetical issue.
>
> Thanks.
>
Vicki
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Input: uinput - Release mutex while unregistering input device
2023-12-08 19:58 ` Dmitry Torokhov
@ 2023-12-09 3:24 ` Vicki Pfau
0 siblings, 0 replies; 11+ messages in thread
From: Vicki Pfau @ 2023-12-09 3:24 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: linux-input
Hi Dmitry,
On 12/8/23 11:58, Dmitry Torokhov wrote:
> Hi Vicki,
>
> On Wed, Dec 06, 2023 at 10:34:06PM -0800, Vicki Pfau wrote:
>> Any pending requests may be holding a mutex from its own subsystem, e.g.
>> evdev, while waiting to be able to claim the uinput device mutex.
>> However, unregistering the device may try to claim that mutex, leading
>> to a deadlock. To prevent this from happening, we need to temporarily
>> give up the lock before calling input_unregister_device.
>
> I do not think we can simply give up the lock, the whole thing with
> UI_DEV_DESTROY allowing reusing connection to create a new input device
> was a huge mistake because if you try to do UI_DEV_CREATE again on
> the same fd you'll end up reusing whatever is in udev instance,
> including the state and the mutex, and will make a huge mess.
Yeah, I was curious why this was possible in the first place. It seemed overcomplicated compared to just opening a new fd. I suppose that that makes more sense, though it's a bit involved for this.
>
> I think the only reasonable way forward is change the driver so that no
> ioctls are accepted after UI_DEV_DESTROY and then start untangling the
> locking issues (possibly by dropping the lock on destroy after setting
> the status - I think you will not observe the lockups you mention if
> your application will stop using UI_DEV_DESTROY and simply closes the
> fd).
This does sound like a reasonable way forward. Unfortunately, I don't have access to the uinput-side application code, but I have been trying to work with them to flatten out bugs in it. I can pass this suggestion along, though there is still a reproducible deadlock that could theoretically happen with other programs in the meantime (though the likelihood of it being hit without actively trying for it is low).
>
>>
>> Fixes: e8b95728f724 ("Input: uinput - avoid FF flush when destroying device")
>
> This is not the commit that introduced the problem, it has been there
> since forever.
My mistake. If I prepare a v2, which I may not, I'll drop the line.
>
> Thanks.
>
Vicki
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2023-12-09 3:24 ` Vicki Pfau
@ 2023-12-15 3:04 ` Vicki Pfau
2024-01-12 1:37 ` Vi Pfau
2024-01-22 4:03 ` Dmitry Torokhov
0 siblings, 2 replies; 11+ messages in thread
From: Vicki Pfau @ 2023-12-15 3:04 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: linux-input
Hi Dmitry
On 12/8/23 19:24, Vicki Pfau wrote:
> Hi Dmitry,
>
> On 12/8/23 11:32, Dmitry Torokhov wrote:
>> Hi Vicki,
>>
>> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote:
>>> Currently, uinput_request_submit will only fail if the request wait times out.
>>> However, in other places this wait is interruptable, and in this specific
>>> location it can lead to issues, such as causing system suspend to hang until
>>> the request times out.
>>
>> Could you please explain how a sleeping process can cause suspend to
>> hang?
>
> While I'm not 100% sure how it happens, given I found this by reproducing it before I came up with a theory for why it happened, my guess is that as it's trying to suspend all of userspace programs, it suspends the process that owns the uinput handle, so it can't continue to service requests, while the other process hangs in the uninterruptable call, blocking suspend for 30 seconds until the call times out.
>
>>
>>> Since the timeout is so long, this can cause the
>>> appearance of a total system freeze. Making the wait interruptable resolves
>>> this and possibly further issues.
>>
>> I think you are trying to find a justification too hard and it does not
>> make sense, however I agree that allowing to kill the process issuing
>> the request without waiting for the timeout to expire if the other side
>> is stuck might be desirable.
>
> This isn't reaching. As I said above, I discovered the patched line of code *after* observing suspend hanging for 30 seconds while trying to reproduce another bug. I wrote this patch, retested, and found that it now suspended immediately, leading to a visible -ERESTARTSYS in strace on coming back from suspend.
>
> I can post the reproduction case somewhere, but the test program is only the evdev client end, with the uinput side being Steam, which I don't have source code for.
>
>>
>> I think the best way to use wait_for_completion_killable_timeout()
>> so that stray signals do not disturb userspace, but the processes can
>> still be terminated.
>
> There's already a mutex_lock_interruptable in uinput_request_send that could cause this to fall back to userspace under similar circumstances. The only difference I can find, which is admittedly a bug in this patch now that I look at it again, is that uinput_dev_event would get called twice, leading to the request getting duplicated.
After further investigation, it seems this would still be the case even if the request times out--an invalid request would get left in the buffer, which means that while this is a new way to trigger the issue, it's not actually a new issue.
It seems to me that this driver could use a lot of love to get it into better shape, which I could work on, but I'm not actually sure where to begin. Especially if we don't want to break ABI.
>
> If there's a better way to handle the suspend case let me know, but this is not a hypothetical issue.
>
>>
>> Thanks.
>>
>
> Vicki
Vici
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2023-12-15 3:04 ` Vicki Pfau
@ 2024-01-12 1:37 ` Vi Pfau
2024-01-22 4:03 ` Dmitry Torokhov
1 sibling, 0 replies; 11+ messages in thread
From: Vi Pfau @ 2024-01-12 1:37 UTC (permalink / raw)
To: Dmitry Torokhov; +Cc: linux-input
Hello Dmitry,
It's been almost a month since I replied addressing your concerns on
this patch. Can you please comment?
On 12/14/23 19:04, Vicki Pfau wrote:
> Hi Dmitry
>
> On 12/8/23 19:24, Vicki Pfau wrote:
>> Hi Dmitry,
>>
>> On 12/8/23 11:32, Dmitry Torokhov wrote:
>>> Hi Vicki,
>>>
>>> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote:
>>>> Currently, uinput_request_submit will only fail if the request wait times out.
>>>> However, in other places this wait is interruptable, and in this specific
>>>> location it can lead to issues, such as causing system suspend to hang until
>>>> the request times out.
>>>
>>> Could you please explain how a sleeping process can cause suspend to
>>> hang?
>>
>> While I'm not 100% sure how it happens, given I found this by reproducing it before I came up with a theory for why it happened, my guess is that as it's trying to suspend all of userspace programs, it suspends the process that owns the uinput handle, so it can't continue to service requests, while the other process hangs in the uninterruptable call, blocking suspend for 30 seconds until the call times out.
>>
>>>
>>>> Since the timeout is so long, this can cause the
>>>> appearance of a total system freeze. Making the wait interruptable resolves
>>>> this and possibly further issues.
>>>
>>> I think you are trying to find a justification too hard and it does not
>>> make sense, however I agree that allowing to kill the process issuing
>>> the request without waiting for the timeout to expire if the other side
>>> is stuck might be desirable.
>>
>> This isn't reaching. As I said above, I discovered the patched line of code *after* observing suspend hanging for 30 seconds while trying to reproduce another bug. I wrote this patch, retested, and found that it now suspended immediately, leading to a visible -ERESTARTSYS in strace on coming back from suspend.
>>
>> I can post the reproduction case somewhere, but the test program is only the evdev client end, with the uinput side being Steam, which I don't have source code for.
>>
>>>
>>> I think the best way to use wait_for_completion_killable_timeout()
>>> so that stray signals do not disturb userspace, but the processes can
>>> still be terminated.
>>
>> There's already a mutex_lock_interruptable in uinput_request_send that could cause this to fall back to userspace under similar circumstances. The only difference I can find, which is admittedly a bug in this patch now that I look at it again, is that uinput_dev_event would get called twice, leading to the request getting duplicated.
>
> After further investigation, it seems this would still be the case even if the request times out--an invalid request would get left in the buffer, which means that while this is a new way to trigger the issue, it's not actually a new issue.
>
> It seems to me that this driver could use a lot of love to get it into better shape, which I could work on, but I'm not actually sure where to begin. Especially if we don't want to break ABI.
>
>>
>> If there's a better way to handle the suspend case let me know, but this is not a hypothetical issue.
>>
>>>
>>> Thanks.
>>>
>>
>> Vicki
>
> Vici
Vicki
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2023-12-15 3:04 ` Vicki Pfau
2024-01-12 1:37 ` Vi Pfau
@ 2024-01-22 4:03 ` Dmitry Torokhov
2025-12-10 3:37 ` Vicki Pfau
1 sibling, 1 reply; 11+ messages in thread
From: Dmitry Torokhov @ 2024-01-22 4:03 UTC (permalink / raw)
To: Vicki Pfau, Rafael J. Wysocki; +Cc: linux-input
Hi Vicki,
On Thu, Dec 14, 2023 at 07:04:09PM -0800, Vicki Pfau wrote:
> Hi Dmitry
>
> On 12/8/23 19:24, Vicki Pfau wrote:
> > Hi Dmitry,
> >
> > On 12/8/23 11:32, Dmitry Torokhov wrote:
> >> Hi Vicki,
> >>
> >> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote:
> >>> Currently, uinput_request_submit will only fail if the request wait times out.
> >>> However, in other places this wait is interruptable, and in this specific
> >>> location it can lead to issues, such as causing system suspend to hang until
> >>> the request times out.
> >>
> >> Could you please explain how a sleeping process can cause suspend to
> >> hang?
> >
> > While I'm not 100% sure how it happens, given I found this by
> > reproducing it before I came up with a theory for why it happened,
> > my guess is that as it's trying to suspend all of userspace
> > programs, it suspends the process that owns the uinput handle, so it
> > can't continue to service requests, while the other process hangs in
> > the uninterruptable call, blocking suspend for 30 seconds until the
> > call times out.
> >
> >>
> >>> Since the timeout is so long, this can cause the
> >>> appearance of a total system freeze. Making the wait interruptable resolves
> >>> this and possibly further issues.
> >>
> >> I think you are trying to find a justification too hard and it does not
> >> make sense, however I agree that allowing to kill the process issuing
> >> the request without waiting for the timeout to expire if the other side
> >> is stuck might be desirable.
> >
> > This isn't reaching. As I said above, I discovered the patched line
> > of code *after* observing suspend hanging for 30 seconds while
> > trying to reproduce another bug. I wrote this patch, retested, and
> > found that it now suspended immediately, leading to a visible
> > -ERESTARTSYS in strace on coming back from suspend.
> >
I must apologize, you indeed weren't reaching. As far as I can see,
putting tasks into the freezer (which happens during system suspend) is
done via delivering a fake signal to the task. So the task indeed needs
to be in an interruptible state, uninterruptible tasks result in system
failing to suspend.
> > I can post the reproduction case somewhere, but the test program is
> > only the evdev client end, with the uinput side being Steam, which I
> > don't have source code for.
> >
> >>
> >> I think the best way to use wait_for_completion_killable_timeout()
> >> so that stray signals do not disturb userspace, but the processes can
> >> still be terminated.
> >
> > There's already a mutex_lock_interruptable in uinput_request_send
> > that could cause this to fall back to userspace under similar
> > circumstances. The only difference I can find, which is admittedly a
> > bug in this patch now that I look at it again, is that
> > uinput_dev_event would get called twice, leading to the request
> > getting duplicated.
>
> After further investigation, it seems this would still be the case
> even if the request times out--an invalid request would get left in
> the buffer, which means that while this is a new way to trigger the
> issue, it's not actually a new issue.
No, I disagree that it is the same issue. The timeout condition is
pretty much fatal, I expect the caller to exit or stop using the device
if request times out (because either the real device is not responding,
or userspace is not responding, and there is no indication that they
will start responding any time soon). That is why the timeout value is
so generous (30 seconds). In this case we definitely not expect the
request to be re-submitted, either automatically, or explicitly by
userspace.
If we make waiting on the request interruptible we may get interrupted
by a stray signal, and I do not know how both producer (the process
issuing the uinput request) and consumer of the request, will react to
essentially duplicate requests being sent.
I believe we can split this into 2 separate issues:
1. The fact that it is not possible terminate the producer process while
it is waiting for request to be handled (for 30 seconds). I think this
can be safely resolved by switching to
wait_for_completion_killable_timeout(). This will allow fatal signals to
break the wait, and for the process to exit.
2. Producer task failing to enter refrigerator and breaking suspend.
I wonder if the best way to handle that is for uinput to create and
register wakeup source, and then use __pm_stay_awake() and __pm_relax()
to indicate to the rest of the system that suspend is blocked. I believe
userspace should be able to handle this and repeat suspend attempt when
the condition clears...
Rafael, do you have any suggestions here? And I wonder, could we make
killable tasks also enter refrigerator?
Also, now that I think about it more, we should not use slot number for
request->id, as I expect the common situation is to have 1 outstanding
request, so all requests have id 0. Instead we should have a counter and
increase it. This way timeout handling will be more robust, we will not
mistake delayed response to the previous request as response to the
current one.
Thanks.
--
Dmitry
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting
2024-01-22 4:03 ` Dmitry Torokhov
@ 2025-12-10 3:37 ` Vicki Pfau
0 siblings, 0 replies; 11+ messages in thread
From: Vicki Pfau @ 2025-12-10 3:37 UTC (permalink / raw)
To: Dmitry Torokhov, Rafael J. Wysocki; +Cc: linux-input
Hi Dmitry,
It's been nearly 2 years with no updates on this, so I'm going to press on this some again.
On 1/21/24 8:03 PM, Dmitry Torokhov wrote:
> Hi Vicki,
>
> On Thu, Dec 14, 2023 at 07:04:09PM -0800, Vicki Pfau wrote:
>> Hi Dmitry
>>
>> On 12/8/23 19:24, Vicki Pfau wrote:
>>> Hi Dmitry,
>>>
>>> On 12/8/23 11:32, Dmitry Torokhov wrote:
>>>> Hi Vicki,
>>>>
>>>> On Wed, Dec 06, 2023 at 10:34:05PM -0800, Vicki Pfau wrote:
>>>>> Currently, uinput_request_submit will only fail if the request wait times out.
>>>>> However, in other places this wait is interruptable, and in this specific
>>>>> location it can lead to issues, such as causing system suspend to hang until
>>>>> the request times out.
>>>>
>>>> Could you please explain how a sleeping process can cause suspend to
>>>> hang?
>>>
>>> While I'm not 100% sure how it happens, given I found this by
>>> reproducing it before I came up with a theory for why it happened,
>>> my guess is that as it's trying to suspend all of userspace
>>> programs, it suspends the process that owns the uinput handle, so it
>>> can't continue to service requests, while the other process hangs in
>>> the uninterruptable call, blocking suspend for 30 seconds until the
>>> call times out.
>>>
>>>>
>>>>> Since the timeout is so long, this can cause the
>>>>> appearance of a total system freeze. Making the wait interruptable resolves
>>>>> this and possibly further issues.
>>>>
>>>> I think you are trying to find a justification too hard and it does not
>>>> make sense, however I agree that allowing to kill the process issuing
>>>> the request without waiting for the timeout to expire if the other side
>>>> is stuck might be desirable.
>>>
>>> This isn't reaching. As I said above, I discovered the patched line
>>> of code *after* observing suspend hanging for 30 seconds while
>>> trying to reproduce another bug. I wrote this patch, retested, and
>>> found that it now suspended immediately, leading to a visible
>>> -ERESTARTSYS in strace on coming back from suspend.
>>>
>
> I must apologize, you indeed weren't reaching. As far as I can see,
> putting tasks into the freezer (which happens during system suspend) is
> done via delivering a fake signal to the task. So the task indeed needs
> to be in an interruptible state, uninterruptible tasks result in system
> failing to suspend.
>
>>> I can post the reproduction case somewhere, but the test program is
>>> only the evdev client end, with the uinput side being Steam, which I
>>> don't have source code for.
>>>
>>>>
>>>> I think the best way to use wait_for_completion_killable_timeout()
>>>> so that stray signals do not disturb userspace, but the processes can
>>>> still be terminated.
>>>
>>> There's already a mutex_lock_interruptable in uinput_request_send
>>> that could cause this to fall back to userspace under similar
>>> circumstances. The only difference I can find, which is admittedly a
>>> bug in this patch now that I look at it again, is that
>>> uinput_dev_event would get called twice, leading to the request
>>> getting duplicated.
>>
>> After further investigation, it seems this would still be the case
>> even if the request times out--an invalid request would get left in
>> the buffer, which means that while this is a new way to trigger the
>> issue, it's not actually a new issue.
>
> No, I disagree that it is the same issue. The timeout condition is
> pretty much fatal, I expect the caller to exit or stop using the device
> if request times out (because either the real device is not responding,
> or userspace is not responding, and there is no indication that they
> will start responding any time soon). That is why the timeout value is
> so generous (30 seconds). In this case we definitely not expect the
> request to be re-submitted, either automatically, or explicitly by
> userspace.
>
> If we make waiting on the request interruptible we may get interrupted
> by a stray signal, and I do not know how both producer (the process
> issuing the uinput request) and consumer of the request, will react to
> essentially duplicate requests being sent.
>
> I believe we can split this into 2 separate issues:
>
> 1. The fact that it is not possible terminate the producer process while
> it is waiting for request to be handled (for 30 seconds). I think this
> can be safely resolved by switching to
> wait_for_completion_killable_timeout(). This will allow fatal signals to
> break the wait, and for the process to exit.
>
> 2. Producer task failing to enter refrigerator and breaking suspend.
> I wonder if the best way to handle that is for uinput to create and
> register wakeup source, and then use __pm_stay_awake() and __pm_relax()
> to indicate to the rest of the system that suspend is blocked. I believe
> userspace should be able to handle this and repeat suspend attempt when
> the condition clears...
>
> Rafael, do you have any suggestions here? And I wonder, could we make
> killable tasks also enter refrigerator?
>
>
> Also, now that I think about it more, we should not use slot number for
> request->id, as I expect the common situation is to have 1 outstanding
> request, so all requests have id 0. Instead we should have a counter and
> increase it. This way timeout handling will be more robust, we will not
> mistake delayed response to the previous request as response to the
> current one.
>
> Thanks.
>
Though it may not be the "best" approach, the patch I posted has been shipping in SteamOS for 2 years with, as far as I'm aware, no issues. For obvious reasons, I don't want to be carrying downstream patches for any longer than necessary, so I'd like to see what needs to be done to get this properly fixed. I don't think I'm really familiar enough with the subsystem to fix it in the desired way, but hopefully Rafael can chime in this time, or someone else who is more familiar can make some progress.
Vicki
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-12-10 3:43 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-07 6:34 [PATCH 0/2] Input: uinput - Multiple concurrency fixes in ff request handling Vicki Pfau
2023-12-07 6:34 ` [PATCH 1/2] Input: uinput - Allow uinput_request_submit wait interrupting Vicki Pfau
2023-12-08 19:32 ` Dmitry Torokhov
2023-12-09 3:24 ` Vicki Pfau
2023-12-15 3:04 ` Vicki Pfau
2024-01-12 1:37 ` Vi Pfau
2024-01-22 4:03 ` Dmitry Torokhov
2025-12-10 3:37 ` Vicki Pfau
2023-12-07 6:34 ` [PATCH 2/2] Input: uinput - Release mutex while unregistering input device Vicki Pfau
2023-12-08 19:58 ` Dmitry Torokhov
2023-12-09 3:24 ` Vicki Pfau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).