Handling incoming ZLP in cdc-wdm

public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed

* Handling incoming ZLP in cdc-wdm
@ 2025-03-26 15:50 Robert Hodaszi
  2025-03-27 13:21 ` Oliver Neukum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Hodaszi @ 2025-03-26 15:50 UTC (permalink / raw)
  To: linux-usb

Hi,

(Sorry for the long mail, but want to describe exactly what is happening!)

I'm having a weird error with a certain modem (Telit LE910C4) + ModemManager. In some circumstances (e.g. SIM switching while there are some other ongoing transactions already from ModemManager, or stopping ModemManager in the wrong moment (again, ongoing transactions)), I can make the qmi-proxy stuck, and can only SIGKILL it.

qmi-proxy gets stuck in g_unix_input_stream_read() in glib.

ModemManager tries to read an incoming message (/dev/cdc-wdm0), so it calls g_pollable_input_stream_default_read_nonblocking() (glib), which first checks if the stream is readable with g_poll(), and if there is, it reads the data in g_unix_input_stream_read() (glib).

What g_unix_input_stream_read() does: it polls first (uninterruptible (EINTR) loop, this is where it gets stuck finally), then reads the data. If the read function returns with EINTR or EAGAIN, another loop starts, and goes back to poll().

When the issue happens, the modem sends us a lot of zero-length packets. I see a 10+ INTERRUPT_IN URBs, without CONTROL_IN URB, because qmi-proxy is busy with close the connection (sending CONTROL_OUT URBs, and doing other things). In the INTERRUPT_IN URB's last 4 byte (cdc-wdm driver doesn't handle that), I can see the exact same number. I guess, this is the frame ID, as usually that gets incremented. So I think, modem tries to inform us about a pending message over-and-over. That's incrementing the desc->resp_count counter in the cdc-wdm driver.

Finally qmi-proxy gets to the point to try to read in data (and call the aforementioned g_unix_input_stream_read()). But the modem is only sending back ZLPs, I suppose, because it informed us multiple times about the same pending packet, and it doesn't have anything more to send us (and it makes sense to send ZLP in this case).

The problem is, wdm_poll() always return with EPOLLIN even when wdm_in_callback() receives a ZLP, as it sets WDM_READ. So it makes sense for glib to think, there's a pending packet. In wdm_read(), if the packet's length is 0 (desc->length = 0) and WDM_READ is set, we reach

    if (!desc->length)

line, where it puts out another URB (as the resp_count is not 0), clear WDM_READ and go back to "retry". The second time we test WDM_READ, it is obviously not set yet, and as we are reading non-blocking, the function returns with EAGAIN.

And that is the issue here, as glib in this case thinks (with reason), that OK, it has to try to read the packet again, so it goes back to poll.

Then another ZLP succeed, wdm_poll() returns with EPOLLIN, glib calls wdm_read(), which return EAGAIN, etc.

Finally modem runs out from ZLPs as well, and has nothing to send us, so we just wait in wdm_poll(), and we cannot even interrupt this loop, as this is a non-blocking call, and EINTR is disabled.

-----------------------------------

I think, that should be fixed in cdc-wdm. So I'm wondering, what is the right approach here? Should we just return with success with a 0-length packet from wdm_read()? Consider ZLP as an error in wdm_in_callback, and schedule service_outs_intr work, like if desc->rerr is set? Other?

Thanks,
Robert Hodaszi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Handling incoming ZLP in cdc-wdm
@ 2025-03-26 16:03 Hodaszi, Robert
  2025-03-27 13:01 ` Robert Hodaszi
  0 siblings, 1 reply; 15+ messages in thread
From: Hodaszi, Robert @ 2025-03-26 16:03 UTC (permalink / raw)
  To: linux-usb@vger.kernel.org

Hi,

(Sorry for the long mail, but want to describe exactly what is happening!)

I'm having a weird error with a certain modem (Telit LE910C4) + ModemManager. In some circumstances (e.g. SIM switching while there are some other ongoing transactions already from ModemManager, or stopping ModemManager in the wrong moment (again, ongoing transactions)), I can make the qmi-proxy stuck, and can only SIGKILL it.

qmi-proxy gets stuck in g_unix_input_stream_read() in glib.

ModemManager tries to read an incoming message (/dev/cdc-wdm0), so it calls g_pollable_input_stream_default_read_nonblocking() (glib), which first checks if the stream is readable with g_poll(), and if there is, it reads the data in g_unix_input_stream_read() (glib).

What g_unix_input_stream_read() does: it polls first (uninterruptible (EINTR) loop, this is where it gets stuck finally), then reads the data. If the read function returns with EINTR or EAGAIN, another loop starts, and goes back to poll().

When the issue happens, the modem sends us a lot of zero-length packets. I see a 10+ INTERRUPT_IN URBs, without CONTROL_IN URB, because qmi-proxy is busy with close the connection (sending CONTROL_OUT URBs, and doing other things). In the INTERRUPT_IN URB's last 4 byte (cdc-wdm driver doesn't handle that), I can see the exact same number. I guess, this is the frame ID, as usually that gets incremented. So I think, modem tries to inform us about a pending message over-and-over. That's incrementing the desc->resp_count counter in the cdc-wdm driver.

Finally qmi-proxy gets to the point to try to read in data (and call the aforementioned g_unix_input_stream_read()). But the modem is only sending back ZLPs, I suppose, because it informed us multiple times about the same pending packet, and it doesn't have anything more to send us (and it makes sense to send ZLP in this case).

The problem is, wdm_poll() always return with EPOLLIN even when wdm_in_callback() receives a ZLP, as it sets WDM_READ. So it makes sense for glib to think, there's a pending packet. In wdm_read(), if the packet's length is 0 (desc->length = 0) and WDM_READ is set, we reach

    if (!desc->length)

line, where it puts out another URB (as the resp_count is not 0), clear WDM_READ and go back to "retry". The second time we test WDM_READ, it is obviously not set yet, and as we are reading non-blocking, the function returns with EAGAIN.

And that is the issue here, as glib in this case thinks (with reason), that OK, it has to try to read the packet again, so it goes back to poll.

Then another ZLP succeed, wdm_poll() returns with EPOLLIN, glib calls wdm_read(), which return EAGAIN, etc.

Finally modem runs out from ZLPs as well, and has nothing to send us, so we just wait in wdm_poll(), and we cannot even interrupt this loop, as this is a non-blocking call, and EINTR is disabled.

-----------------------------------

I think, that should be fixed in cdc-wdm. So I'm wondering, what is the right approach here? Should we just return with success with a 0-length packet from wdm_read()? Consider ZLP as an error in wdm_in_callback, and schedule service_outs_intr work, like if desc->rerr is set? Other?

Thanks,
Robert Hodaszi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-26 16:03 Handling incoming ZLP in cdc-wdm Hodaszi, Robert
@ 2025-03-27 13:01 ` Robert Hodaszi
  2025-03-27 13:24   ` Oliver Neukum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Hodaszi @ 2025-03-27 13:01 UTC (permalink / raw)
  To: linux-usb@vger.kernel.org

On Wednesday, 26.03.2025 at 17:03 +0100, Hodaszi, Robert <robert.hodaszi@digi.com> wrote:
> Hi,
>
> (Sorry for the long mail, but want to describe exactly what is happening!)
>
> I'm having a weird error with a certain modem (Telit LE910C4) + ModemManager. In some circumstances (e.g. SIM switching while there are some other ongoing transactions already from ModemManager, or stopping ModemManager in the wrong moment (again, ongoing transactions)), I can make the qmi-proxy stuck, and can only SIGKILL it.
>
> qmi-proxy gets stuck in g_unix_input_stream_read() in glib.
>
> ModemManager tries to read an incoming message (/dev/cdc-wdm0), so it calls g_pollable_input_stream_default_read_nonblocking() (glib), which first checks if the stream is readable with g_poll(), and if there is, it reads the data in g_unix_input_stream_read() (glib).
>
> What g_unix_input_stream_read() does: it polls first (uninterruptible (EINTR) loop, this is where it gets stuck finally), then reads the data. If the read function returns with EINTR or EAGAIN, another loop starts, and goes back to poll().
>
> When the issue happens, the modem sends us a lot of zero-length packets. I see a 10+ INTERRUPT_IN URBs, without CONTROL_IN URB, because qmi-proxy is busy with close the connection (sending CONTROL_OUT URBs, and doing other things). In the INTERRUPT_IN URB's last 4 byte (cdc-wdm driver doesn't handle that), I can see the exact same number. I guess, this is the frame ID, as usually that gets incremented. So I think, modem tries to inform us about a pending message over-and-over. That's incrementing the desc->resp_count counter in the cdc-wdm driver.
>
> Finally qmi-proxy gets to the point to try to read in data (and call the aforementioned g_unix_input_stream_read()). But the modem is only sending back ZLPs, I suppose, because it informed us multiple times about the same pending packet, and it doesn't have anything more to send us (and it makes sense to send ZLP in this case).
>
> The problem is, wdm_poll() always return with EPOLLIN even when wdm_in_callback() receives a ZLP, as it sets WDM_READ. So it makes sense for glib to think, there's a pending packet. In wdm_read(), if the packet's length is 0 (desc->length = 0) and WDM_READ is set, we reach
>
>     if (!desc->length)
>
> line, where it puts out another URB (as the resp_count is not 0), clear WDM_READ and go back to "retry". The second time we test WDM_READ, it is obviously not set yet, and as we are reading non-blocking, the function returns with EAGAIN.
>
> And that is the issue here, as glib in this case thinks (with reason), that OK, it has to try to read the packet again, so it goes back to poll.
>
> Then another ZLP succeed, wdm_poll() returns with EPOLLIN, glib calls wdm_read(), which return EAGAIN, etc.
>
> Finally modem runs out from ZLPs as well, and has nothing to send us, so we just wait in wdm_poll(), and we cannot even interrupt this loop, as this is a non-blocking call, and EINTR is disabled.
>
> -----------------------------------
>
> I think, that should be fixed in cdc-wdm. So I'm wondering, what is the right approach here? Should we just return with success with a 0-length packet from wdm_read()? Consider ZLP as an error in wdm_in_callback, and schedule service_outs_intr work, like if desc->rerr is set? Other?
>
>
> Thanks,
> Robert Hodaszi
>
Following on this: returning 0 bytes for read would kill libqmi, as that handles that as an error as well ("connection broken").

So what about this patch?

    diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
    index 86ee39db013f..37873acd18f4 100644
    --- a/drivers/usb/class/cdc-wdm.c
    +++ b/drivers/usb/class/cdc-wdm.c
    @@ -214,6 +214,11 @@ static void wdm_in_callback(struct urb *urb)
            if (desc->rerr == 0 && status != -EPIPE)
                    desc->rerr = status;
     
    +       if (length == 0) {
    +               dev_dbg(&desc->intf->dev, "received ZLP\n");
    +               goto skip_error;
    +       }
    +
            if (length + desc->length > desc->wMaxCommand) {
                    /* The buffer would overflow */
                    set_bit(WDM_OVERFLOW, &desc->flags);
    @@ -227,10 +232,10 @@ static void wdm_in_callback(struct urb *urb)
            }
     skip_error:
     
    -       if (desc->rerr) {
    +       if (desc->rerr || length == 0) {
                    /*
    -                * Since there was an error, userspace may decide to not read
    -                * any data after poll'ing.
    +                * If there was a ZLP or an error, userspace may decide to not
    +                * read any data after poll'ing.
                     * We should respond to further attempts from the device to send
                     * data, so that we can get unstuck.
                     */

Best regards,
Robert Hodaszi


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-26 15:50 Robert Hodaszi
@ 2025-03-27 13:21 ` Oliver Neukum
  2025-03-27 15:23   ` Robert Hodaszi
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Neukum @ 2025-03-27 13:21 UTC (permalink / raw)
  To: Robert Hodaszi, linux-usb

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

Hi,

On 26.03.25 16:50, Robert Hodaszi wrote:

> The problem is, wdm_poll() always return with EPOLLIN even when wdm_in_callback() receives a ZLP, as it sets WDM_READ. So it makes sense for glib to think, there's a pending packet. In wdm_read(), if the packet's length is 0 (desc->length = 0) and WDM_READ is set, we reach
> 
>      if (!desc->length)
> 
> line, where it puts out another URB (as the resp_count is not 0), clear WDM_READ and go back to "retry". The second time we test WDM_READ, it is obviously not set yet, and as we are reading non-blocking, the function returns with EAGAIN.

Arguably the interrupt handler should set the flag for a readable result only
if indeed there is data in the buffer. Could you try the attached patch?

	Regards
		Oliver

[-- Attachment #2: 0001-usb-cdc-wdm-check-for-zero-length-response.patch --]
[-- Type: text/x-patch, Size: 834 bytes --]

From 47311e34cad1cc13ab5d2fbae239f188a6e82c1d Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum@suse.com>
Date: Thu, 27 Mar 2025 14:17:45 +0100
Subject: [PATCH] usb: cdc-wdm: check for zero length response

We should not indicate that there's a response if we have no
data in the buffer.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
---
 drivers/usb/class/cdc-wdm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 86ee39db013f..856488a7cb6b 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -236,6 +236,8 @@ static void wdm_in_callback(struct urb *urb)
 		 */
 		schedule_work(&desc->service_outs_intr);
 	} else {
+		if (!desc->length)
+			goto out;
 		set_bit(WDM_READ, &desc->flags);
 		wake_up(&desc->wait);
 	}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-27 13:01 ` Robert Hodaszi
@ 2025-03-27 13:24   ` Oliver Neukum
  2025-03-27 15:27     ` Robert Hodaszi
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Neukum @ 2025-03-27 13:24 UTC (permalink / raw)
  To: Robert Hodaszi, linux-usb@vger.kernel.org

Hi,

On 27.03.25 14:01, Robert Hodaszi wrote:

>>
> Following on this: returning 0 bytes for read would kill libqmi, as that handles that as an error as well ("connection broken").
> 
> So what about this patch?
> 
>      diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
>      index 86ee39db013f..37873acd18f4 100644
>      --- a/drivers/usb/class/cdc-wdm.c
>      +++ b/drivers/usb/class/cdc-wdm.c
>      @@ -214,6 +214,11 @@ static void wdm_in_callback(struct urb *urb)
>              if (desc->rerr == 0 && status != -EPIPE)
>                      desc->rerr = status;
>       
>      +       if (length == 0) {
>      +               dev_dbg(&desc->intf->dev, "received ZLP\n");
>      +               goto skip_error;
>      +       }
>      +
>              if (length + desc->length > desc->wMaxCommand) {
>                      /* The buffer would overflow */
>                      set_bit(WDM_OVERFLOW, &desc->flags);
>      @@ -227,10 +232,10 @@ static void wdm_in_callback(struct urb *urb)
>              }
>       skip_error:
>       
>      -       if (desc->rerr) {
>      +       if (desc->rerr || length == 0) {
>                      /*
>      -                * Since there was an error, userspace may decide to not read
>      -                * any data after poll'ing.
>      +                * If there was a ZLP or an error, userspace may decide to not
>      +                * read any data after poll'ing.
>                       * We should respond to further attempts from the device to send
>                       * data, so that we can get unstuck.
>                       */

Why do you wish to react to this like an error?
It seems to me that we indeed need to wait for the device in this case.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-27 13:21 ` Oliver Neukum
@ 2025-03-27 15:23   ` Robert Hodaszi
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Hodaszi @ 2025-03-27 15:23 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb

 On Thursday, 27.03.2025 at 14:21 +0100, Oliver Neukum <oneukum@suse.com> wrote:

>
> Arguably the interrupt handler should set the flag for a readable result only
> if indeed there is data in the buffer. Could you try the attached patch?
>
>        Regards
>                Oliver

I think, that will not work. With this patch, if a ZLP comes in, WDM_READ is not set, so wdm_read() won't submit another read urb if resp_count is not zero (won't call service_outstanding_interrupt), and we are not doing here, in wdm_in_callback neither. So the receiving just gets stuck.

That's exactly the reason I did what I did in the patch I sent.

Best regards,
Robert Hodaszi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-27 13:24   ` Oliver Neukum
@ 2025-03-27 15:27     ` Robert Hodaszi
  2025-03-31  9:59       ` Oliver Neukum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Hodaszi @ 2025-03-27 15:27 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb@vger.kernel.org

On Thursday, 27.03.2025 at 14:24 +0100, Oliver Neukum <oneukum@suse.com> wrote:
>
> Why do you wish to react to this like an error?
> It seems to me that we indeed need to wait for the device in this case.
>
>        Regards
>                Oliver
>
See my previous response to your patch. Because:

    1. we have to submit another read URB is resp_count != 0 (call service_outstanding_interrupt), otherwise receiving gets stuck
    2. we should not set WDM_READ, to not confuse userspace
    3. we should not set res_length to 0, otherwise  we get stuck again (actually, should we clear the "if (!desc->reslength)" in wdm_read(), since that's is no longer possible anymore?

Best regards,
Robert Hodaszi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-27 15:27     ` Robert Hodaszi
@ 2025-03-31  9:59       ` Oliver Neukum
  2025-04-02 11:57         ` Robert Hodaszi
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Neukum @ 2025-03-31  9:59 UTC (permalink / raw)
  To: Robert Hodaszi, Oliver Neukum, linux-usb@vger.kernel.org,
	Bjørn Mork

(Adding Bjørn)

On 27.03.25 16:27, Robert Hodaszi wrote:
> On Thursday, 27.03.2025 at 14:24 +0100, Oliver Neukum <oneukum@suse.com> wrote:
>>
>> Why do you wish to react to this like an error?
>> It seems to me that we indeed need to wait for the device in this case.
>>
>>         Regards
>>                 Oliver
>>
> See my previous response to your patch. Because:
> 
>      1. we have to submit another read URB is resp_count != 0 (call service_outstanding_interrupt), otherwise receiving gets stuck

Now that you mention that I am not sure that the resp_count
logic in general is correct. I see no reason user space cannot
exhaust the buffer and get resp_count out of sync.

>      2. we should not set WDM_READ, to not confuse userspace

We agree on that.

>      3. we should not set res_length to 0, otherwise  we get stuck again (actually, should we clear the "if (!desc->reslength)" in wdm_read(), since that's is no longer possible anymore?

AFAICT it can happen if two threads are racing on wdm_read()

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-03-31  9:59       ` Oliver Neukum
@ 2025-04-02 11:57         ` Robert Hodaszi
  2025-04-02 14:01           ` Oliver Neukum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Hodaszi @ 2025-04-02 11:57 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb@vger.kernel.org, Bjørn Mork


2025. 03. 31. 11:59 keltezéssel, Oliver Neukum írta:
>
>>      3. we should not set res_length to 0, otherwise  we get stuck 
>> again (actually, should we clear the "if (!desc->reslength)" in 
>> wdm_read(), since that's is no longer possible anymore?
>
> AFAICT it can happen if two threads are racing on wdm_read()
>
I'm not seeing how could we get stuck. I think, with the patch I sent 
previously, we can just simply remove reslength as is, as it is no 
longer used. Nothing sets it to 0 anymore, as in that case, the setter 
gets skipped, and the only consumer is wdm_read(), where, it is only 
checking for 0.

Best regards,
Robert


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-04-02 11:57         ` Robert Hodaszi
@ 2025-04-02 14:01           ` Oliver Neukum
  2025-04-02 15:01             ` Robert Hodaszi
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Neukum @ 2025-04-02 14:01 UTC (permalink / raw)
  To: Robert Hodaszi, Oliver Neukum, linux-usb@vger.kernel.org,
	Bjørn Mork

On 02.04.25 13:57, Robert Hodaszi wrote:
> 
> 2025. 03. 31. 11:59 keltezéssel, Oliver Neukum írta:
>>
>>>      3. we should not set res_length to 0, otherwise  we get stuck again (actually, should we clear the "if (!desc->reslength)" in wdm_read(), since that's is no longer possible anymore?
>>
>> AFAICT it can happen if two threads are racing on wdm_read()
>>
> I'm not seeing how could we get stuck. I think, with the patch I sent previously, we can just simply remove reslength as is, as it is no longer used. Nothing sets it to 0 anymore, as in that case, the setter gets skipped, and the only consumer is wdm_read(), where, it is only checking for 0.

Hi,

I see what you mean, but I am afraid this introduces an error case.

Suppose we have the following scenario with your patch applied:

wdm_in_callback() gets a ZLP -> we schedule service_interrupt_work()
service_interrupt_work() calls service_outstanding_interrupt()
service_outstanding_interrupt() decrements resp_count to zero
service_interrupt_work() sets WDM_READ and wakes the waiting task

wdm_read() is woken up and finds an empty buffer
If reslength == 0 all is well and it returns -EAGAIN to user space
If reslength != 0 it will return EOF

We must not return EOF.
Hence the longer I think about it it seems to me that

1. reslength is necessary
2. You must set reslength to 0 whenever you get a ZLP

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-04-02 14:01           ` Oliver Neukum
@ 2025-04-02 15:01             ` Robert Hodaszi
  2025-04-02 19:13               ` Oliver Neukum
  0 siblings, 1 reply; 15+ messages in thread
From: Robert Hodaszi @ 2025-04-02 15:01 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb@vger.kernel.org, Bjørn Mork

>
> Hi,
>
> I see what you mean, but I am afraid this introduces an error case.
>
> Suppose we have the following scenario with your patch applied:
>
> wdm_in_callback() gets a ZLP -> we schedule service_interrupt_work()
> service_interrupt_work() calls service_outstanding_interrupt()
> service_outstanding_interrupt() decrements resp_count to zero
> service_interrupt_work() sets WDM_READ and wakes the waiting task
>
> wdm_read() is woken up and finds an empty buffer
> If reslength == 0 all is well and it returns -EAGAIN to user space
> If reslength != 0 it will return EOF
>
> We must not return EOF.
> Hence the longer I think about it it seems to me that
>
> 1. reslength is necessary
> 2. You must set reslength to 0 whenever you get a ZLP
>
>        Regards
>                Oliver
>
That's a good point! So that's the case, when we receive only a single 
ZLP, and nothing else.

But we cannot return with -EAGAIN. If we do that, we're back to sqrt(1), 
and get stuck again.

So what about modifying the service_interrupt_work to no simply set 
WDM_READ if resp_count is 0, but instead to check if there's any real 
message in the buffer, to not confuse consumers. Something like this:

    diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
    index 37873acd18f4..9037379f3603 100644
    --- a/drivers/usb/class/cdc-wdm.c
    +++ b/drivers/usb/class/cdc-wdm.c
    @@ -1010,7 +1010,7 @@ static void service_interrupt_work(struct
    work_struct *work)

             spin_lock_irq(&desc->iuspin);
             service_outstanding_interrupt(desc);
    -       if (!desc->resp_count) {
    +       if (!desc->resp_count && (desc->length || desc->rerr)) {
                     set_bit(WDM_READ, &desc->flags);
                     wake_up(&desc->wait);
             }


Best regards,
Robert


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-04-02 15:01             ` Robert Hodaszi
@ 2025-04-02 19:13               ` Oliver Neukum
  2025-04-03 12:25                 ` Robert Hodaszi
       [not found]                 ` <898977f7-3882-4ffe-8833-c44f06914337@digi.com>
  0 siblings, 2 replies; 15+ messages in thread
From: Oliver Neukum @ 2025-04-02 19:13 UTC (permalink / raw)
  To: Robert Hodaszi, Oliver Neukum, linux-usb@vger.kernel.org,
	Bjørn Mork

On 02.04.25 17:01, Robert Hodaszi wrote:

> 
> But we cannot return with -EAGAIN. If we do that, we're back to sqrt(1), and get stuck again.

Then we have a problem. If we are servicing a read() syscall,
we have to either return data, or, if we cannot do that
we either sleep or return -EAGAIN depending on O_NONBLOCK.

There is nothing we can do about that. This makes me think
that the issue here is poll() rather than in wdm_read()

> So what about modifying the service_interrupt_work to no simply set WDM_READ if resp_count is 0, but instead to check if there's any real message in the buffer, to not confuse consumers. Something like this:

That specific proposal will not work because the issue
is in service_interrupt_work() which can already be scheduled.
We cannot prevent that.

>     diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
>     index 37873acd18f4..9037379f3603 100644
>     --- a/drivers/usb/class/cdc-wdm.c
>     +++ b/drivers/usb/class/cdc-wdm.c
>     @@ -1010,7 +1010,7 @@ static void service_interrupt_work(struct
>     work_struct *work)
> 
>              spin_lock_irq(&desc->iuspin);
>              service_outstanding_interrupt(desc);
>     -       if (!desc->resp_count) {
>     +       if (!desc->resp_count && (desc->length || desc->rerr)) {
>                      set_bit(WDM_READ, &desc->flags);
>                      wake_up(&desc->wait);

And what happens if wdm_read() wakes up because a signal is delivered?

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-04-02 19:13               ` Oliver Neukum
@ 2025-04-03 12:25                 ` Robert Hodaszi
       [not found]                 ` <898977f7-3882-4ffe-8833-c44f06914337@digi.com>
  1 sibling, 0 replies; 15+ messages in thread
From: Robert Hodaszi @ 2025-04-03 12:25 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb@vger.kernel.org, Bjørn Mork

> Then we have a problem. If we are servicing a read() syscall,
> we have to either return data, or, if we cannot do that
> we either sleep or return -EAGAIN depending on O_NONBLOCK.
>
> There is nothing we can do about that. This makes me think
> that the issue here is poll() rather than in wdm_read()
(Resending the mail, because Thunderbird changed format to HTML...)

Yes, exactly, that's what I'm saying. :) It's not a problem with 
wdm_read(). The problem is that wdm_poll() confuses users with returning 
available data, although there's nothing there to read, and wdm_read() 
returns with EAGAIN (correctly). That's why we don't want to set 
WDM_READ when the buffer is empty, and there's no error neither (so we 
just received a ZLP).

Returning with EAGAIN in wdm_read() is totally OK, except if cdc-wdm 
told the caller through poll that there's available data.

>
>> So what about modifying the service_interrupt_work to no simply set 
>> WDM_READ if resp_count is 0, but instead to check if there's any real 
>> message in the buffer, to not confuse consumers. Something like this:
>
> That specific proposal will not work because the issue
> is in service_interrupt_work() which can already be scheduled.
> We cannot prevent that.
>
>>     diff --git a/drivers/usb/class/cdc-wdm.c 
>> b/drivers/usb/class/cdc-wdm.c
>>     index 37873acd18f4..9037379f3603 100644
>>     --- a/drivers/usb/class/cdc-wdm.c
>>     +++ b/drivers/usb/class/cdc-wdm.c
>>     @@ -1010,7 +1010,7 @@ static void service_interrupt_work(struct
>>     work_struct *work)
>>
>>              spin_lock_irq(&desc->iuspin);
>>              service_outstanding_interrupt(desc);
>>     -       if (!desc->resp_count) {
>>     +       if (!desc->resp_count && (desc->length || desc->rerr)) {
>>                      set_bit(WDM_READ, &desc->flags);
>>                      wake_up(&desc->wait);
>
> And what happens if wdm_read() wakes up because a signal is delivered?
>
OK, I might not see the obvious, but I don't understand this now. Let me 
copy here the complete patch, not just the chunks of it. That might make 
more sense. This can be completely wrong though.

    diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
    index 86ee39db013f..9164e311a3e8 100644
    --- a/drivers/usb/class/cdc-wdm.c
    +++ b/drivers/usb/class/cdc-wdm.c
    @@ -92,7 +92,6 @@ struct wdm_device {
          u16            wMaxCommand;
          u16            wMaxPacketSize;
          __le16            inum;
    -    int            reslength;
          int            length;
          int            read;
          int            count;
    @@ -214,6 +213,11 @@ static void wdm_in_callback(struct urb *urb)
          if (desc->rerr == 0 && status != -EPIPE)
              desc->rerr = status;

    +    if (length == 0) {
    +        dev_dbg(&desc->intf->dev, "received ZLP\n");
    +        goto skip_error;
    +    }
    +
          if (length + desc->length > desc->wMaxCommand) {
              /* The buffer would overflow */
              set_bit(WDM_OVERFLOW, &desc->flags);
    @@ -222,15 +226,14 @@ static void wdm_in_callback(struct urb *urb)
              if (!test_bit(WDM_OVERFLOW, &desc->flags)) {
                  memmove(desc->ubuf + desc->length, desc->inbuf, length);
                  desc->length += length;
    -            desc->reslength = length;
              }
          }
      skip_error:

    -    if (desc->rerr) {
    +    if (desc->rerr || length == 0) {
              /*
    -         * Since there was an error, userspace may decide to not read
    -         * any data after poll'ing.
    +         * If there was a ZLP or an error, userspace may decide to not
    +         * read any data after poll'ing.
               * We should respond to further attempts from the device
    to send
               * data, so that we can get unstuck.
               */
    @@ -585,15 +588,6 @@ static ssize_t wdm_read
                  goto retry;
              }

    -        if (!desc->reslength) { /* zero length read */
    -            dev_dbg(&desc->intf->dev, "zero length - clearing
    WDM_READ\n");
    -            clear_bit(WDM_READ, &desc->flags);
    -            rv = service_outstanding_interrupt(desc);
    -            spin_unlock_irq(&desc->iuspin);
    -            if (rv < 0)
    -                goto err;
    -            goto retry;
    -        }
              cntr = desc->length;
              spin_unlock_irq(&desc->iuspin);
          }
    @@ -1005,7 +999,7 @@ static void service_interrupt_work(struct
    work_struct *work)

          spin_lock_irq(&desc->iuspin);
          service_outstanding_interrupt(desc);
    -    if (!desc->resp_count) {
    +    if (!desc->resp_count && (desc->length || desc->rerr)) {
              set_bit(WDM_READ, &desc->flags);
              wake_up(&desc->wait);
          }


So with this:

    1. not setting WDM_READ on ZLP
    2. on ZLP, we just submitting another URB to receive the remaining
    responses, if resp_count != 0, so keeping the receiving alive
    3. in service_interrupt_work(), just setting the WDM_READ if we
    finished response receiving, and there's data in the buffer for the
    user to read out, or we received and error
    4. reslength is no longer needed, as that checked for ZLP-s, but we
    are handling them wdm_in_callback()


Best regards,
Robert



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
       [not found]                 ` <898977f7-3882-4ffe-8833-c44f06914337@digi.com>
@ 2025-04-03 12:58                   ` Oliver Neukum
  2025-04-03 14:42                     ` Robert Hodaszi
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Neukum @ 2025-04-03 12:58 UTC (permalink / raw)
  To: Robert Hodaszi, linux-usb@vger.kernel.org, Bjørn Mork

On 03.04.25 13:49, Robert Hodaszi wrote:

> 
>     +    if (length == 0) {
>     +        dev_dbg(&desc->intf->dev, "received ZLP\n");
>     +        goto skip_error;

Please use a dedicated label for this. No need to retest length

Otherwise it looks good to me.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Handling incoming ZLP in cdc-wdm
  2025-04-03 12:58                   ` Oliver Neukum
@ 2025-04-03 14:42                     ` Robert Hodaszi
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Hodaszi @ 2025-04-03 14:42 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb@vger.kernel.org, Bjørn Mork

On 03.04.25 14:58, Oliver Neukum wrote:
>
> Please use a dedicated label for this. No need to retest length
>
> Otherwise it looks good to me.
>
>        Regards
>                Oliver

Thanks! I sent up a patch.

Best regards,
Robert Hodaszi


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-04-03 14:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-26 16:03 Handling incoming ZLP in cdc-wdm Hodaszi, Robert
2025-03-27 13:01 ` Robert Hodaszi
2025-03-27 13:24   ` Oliver Neukum
2025-03-27 15:27     ` Robert Hodaszi
2025-03-31  9:59       ` Oliver Neukum
2025-04-02 11:57         ` Robert Hodaszi
2025-04-02 14:01           ` Oliver Neukum
2025-04-02 15:01             ` Robert Hodaszi
2025-04-02 19:13               ` Oliver Neukum
2025-04-03 12:25                 ` Robert Hodaszi
     [not found]                 ` <898977f7-3882-4ffe-8833-c44f06914337@digi.com>
2025-04-03 12:58                   ` Oliver Neukum
2025-04-03 14:42                     ` Robert Hodaszi
  -- strict thread matches above, loose matches on Subject: below --
2025-03-26 15:50 Robert Hodaszi
2025-03-27 13:21 ` Oliver Neukum
2025-03-27 15:23   ` Robert Hodaszi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox